
Automated instrument-tracking for 4D video-rate imaging of ophthalmic surgical maneuvers


Abstract

Intraoperative image-guidance provides enhanced feedback that facilitates surgical decision-making in a wide variety of medical fields and is especially useful when haptic feedback is limited. In these cases, automated instrument-tracking and localization are essential to guide surgical maneuvers and prevent damage to underlying tissue. However, instrument-tracking is challenging and often confounded by variations in the surgical environment, resulting in a trade-off between accuracy and speed. Ophthalmic microsurgery presents additional challenges due to the nonrigid relationship between instrument motion and instrument deformation inside the eye, image field distortion, image artifacts, and bulk motion due to patient movement and physiological tremor. We present an automated instrument-tracking method by leveraging multimodal imaging and deep-learning to dynamically detect surgical instrument positions and re-center imaging fields for 4D video-rate visualization of ophthalmic surgical maneuvers. We are able to achieve resolution-limited tracking accuracy at varying instrument orientations as well as at extreme instrument speeds and image defocus beyond typical use cases. As proof-of-concept, we perform automated instrument-tracking and 4D imaging of a mock surgical task. Here, we apply our methods for specific applications in ophthalmic microsurgery, but the proposed technologies are broadly applicable for intraoperative image-guidance with high speed and accuracy.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

Corrections

23 February 2022: A typographical correction was made to Eq. (3).

1. Introduction

Rapid development of imaging technologies has facilitated the use of intraoperative image-guidance in a broad range of medical fields including neurology [1–4], ophthalmology [5–7], hepatology [8–10], and oncology [11]. Image-guided surgery provides enhanced visualization of instrument-tissue interactions and is especially useful when haptic feedback is limited such as in ophthalmic microsurgery, minimally-invasive surgery, and robotic surgery [12]. In these cases, endoscopic or microscopic video feeds are used to relay information regarding instrument position back to the surgeon. However, intraoperative imaging tends to suffer from nonuniform illumination, noise, specular reflections, and variations in the surgical environment that ultimately limit the accuracy of instrument pose estimation and instrument-tracking [13,14]. Instrument fiducials can be used to facilitate registration and tracking, but these methods assume a rigid relationship between the instrument tip and physical space [15]. These factors are also often confounded by surgical dynamics, which result in changes in instrument shape and orientation as well as movement of the instrument out-of-focus during surgery.

1.1 Ophthalmic surgery

Image-guided ophthalmic microsurgery has been demonstrated using intraoperative optical coherence tomography (iOCT) and provides significant benefits compared to conventional microscopic surgery, which offers limited contrast and feedback for submillimeter-thick tissues and precludes visualization of subsurface features [16]. The development of iOCT technology through the use of microscope-mounted handheld probes [5,17] and microscope-integrated OCT systems [6,18] addresses these limitations by enabling high-resolution volumetric imaging of supine patients during surgery. Recent studies have shown that iOCT helps to visualize structural changes during surgery that ultimately guide intraoperative decision-making, result in modification of surgical management, and enable verification of surgical goals [19–22]. Furthermore, preliminary results show that patients undergoing iOCT-assisted macular hole surgery have a higher single-operation success rate and significantly improved visual acuity post-operation [23]. Despite the aforementioned benefits, the broad adoption of iOCT technology is hindered by slow imaging speeds and a lack of automated instrument-tracking, which prevents video-rate 4D visualization and requires manual adjustment of the OCT field-of-view (FOV) [24].

1.2 4D video-rate iOCT

The safety, utility, and efficacy of commercial iOCT systems such as the Rescan 700 (Carl Zeiss Meditec) and the Enfocus (Leica Microsystems) have been well-established for ophthalmic surgery [25,26]. However, these systems operate at line-rates between 10–32 kHz, which limits current visualization to static cross-sectional images and prevents 4D imaging of surgical dynamics [27]. Recent developments in high-speed swept-source laser technology have enabled 4D video-rate OCT imaging. These research-grade systems are over an order of magnitude faster than current commercial iOCT systems and are capable of imaging at line-rates between 100 kHz – 1.67 MHz and at volume rates over 10 Hz [28–31]. 4D imaging of surgical maneuvers provides enhanced feedback of instrument-tissue interactions, which can be used to monitor tissue deformation and prevent damage to underlying ocular tissues. However, these systems suffer from an inherent trade-off between speed, sampling density, and FOV. Imaging at higher speeds reduces detector integration time per scan position, thus limiting system sensitivity [32]. On the other hand, it is possible to maintain high sensitivity by increasing sampling density at the cost of speed or FOV.

1.3 Automated instrument-tracking

Currently, iOCT imaging is performed over a static FOV, thus necessitating custom scan software to offset iOCT scan positions to regions-of-interest (ROIs) and requiring a trained technician to be able to accurately follow instrument positions intraoperatively [33]. Manual adjustment of the OCT FOV during surgery disrupts surgical workflow and has been shown to increase operating times by up to 25 minutes despite the presence of a trained technician [34]. Several automated instrument-tracking methods have been previously developed to segment axial position and estimate instrument pose from OCT B-scans and volumes [35–38]. However, these methods assume that the instrument is continuously localized within the OCT FOV and are thus unable to track lateral movements of the instrument across the surgical field. This issue is confounded by the fact that iOCT imaging is often limited to small ROIs in order to achieve 4D visualization. Our group has previously demonstrated a stereo-vision tracking system that decouples lateral instrument-tracking from axial OCT imaging, enabling tracking with high speed and accuracy across a larger FOV [39]. However, stereo-vision-based tracking methods [40–42] require fiducial markers to be placed on the back of instruments and therefore do not account for field distortion and instrument deformation inside the eye, which result in a nonlinear relationship between fiducials and tracked features. Furthermore, movement of the patient and of the eye dynamically changes the frame of reference during operation.

Conventional instrument-tracking methods for surgery involve the use of color features, instrument geometry, or gradient-based edge detection [43–47]. These methods perform well in controlled lab environments, but tend to suffer in vivo due to changes in appearance and lighting conditions, motion blur, and specular reflections. Recently, deep-learning algorithms have been shown to be more robust to variations in the surgical environment while also maintaining a high tracking accuracy on the order of several pixels of error and a high tracking speed of over 30 Hz [48–50].

Here, we present an automated instrument-tracking method for ophthalmic microsurgery that leverages multimodal imaging and recent advances in deep-learning-based object detection. We use a high-speed spectrally encoded coherence tomography and reflectometry (SECTR) system that combines en face spectrally encoded reflectometry (SER) and cross-sectional OCT imaging for automated instrument-tracking and video-rate 4D imaging [51,52]. A convolutional neural network (CNN) is trained to detect 25-gauge internal limiting membrane (25G ILM) forceps from SER images, and OCT scanning is automatically updated based on instrument position. We also present an updated method for adaptive-sampling by optimizing input scan waveforms to densely sample instrument-tissue interactions without sacrificing speed or FOV [53]. We show that we are able to achieve resolution-limited detection accuracy across the OCT imaging range, and as proof-of-concept, we demonstrate 4D tracking of surgical maneuvers in a phantom at 16 Hz volume rate.

2. Methods

2.1 Automated instrument-tracking framework

The framework for our proposed automated instrument-tracking method (Fig. 1) can be divided into three main processes: simultaneous acquisition of OCT and SER images, detection of instrument position from SER images, and scan waveform modification based on instrument position outputs to re-center the OCT FOV. Although these processes occur in series, the overall workflow must be highly parallelized in order to minimize latency between steps. In particular, automated instrument-tracking requires dynamic modification of SECTR scan signals at the high speeds necessary for video-rate 4D imaging.

Fig. 1. Automated instrument-tracking framework.

2.1.1 SECTR acquisition

A custom-built SECTR engine was used to simultaneously generate en face SER and cross-sectional OCT images. OCT imaging is performed by raster scanning a galvanometer pair (X-Y) separated by a unity-magnification 4f optical relay to maintain telecentricity. SER uses a transmissive grating to spectrally-disperse broadband illumination, which is then optically relayed using a 4f relay across the OCT fast-axis scanning galvanometer (X). SER illumination on the sample consists of a focused line extended source in the spectrally-encoded axis that is aligned to the OCT slow-axis (Y). SER and OCT beam paths use a shared swept-source laser and optics to ensure collinearity and spatiotemporal co-registration, allowing for the acquisition of a single en face SER image (X-Y) and a single OCT B-scan (X-Z) for each sweep of the fast-axis (X) galvanometer mirror. As a result, we are able to use SECTR imaging to track en face features, such as surgical instrument position, from SER images and directly correlate those features with respect to the position of the OCT scan FOV.

Here, we modified our previous benchtop design [51] by moving engine components into a compact optical enclosure placed on a portable cart for clinical application. An optical buffer was used to double the laser sweep rate in order to achieve a 400 kHz line rate for rapid volumetric imaging. In addition, fiber connections were fusion-spliced to minimize insertion loss and maximize system throughput. A high-speed digitizer (ATS-9373, AlazarTech) was used to simultaneously acquire SER and OCT data at 2 gigasamples/second for real-time processing and display. Custom C++ software was used to synchronize scan waveform generation via a DAQ (USB-6351, National Instruments) with data acquisition.

2.1.2 CNN detection

Similar to intraoperative imaging variability, SER images acquired using SECTR tend to suffer from image artifacts that preclude the use of traditional instrument-tracking and segmentation methods (Supplement 1 Fig. 1).

Here, we leverage deep-learning-based instrument-tracking techniques by using a GPU-accelerated CNN [54] for detection of surgical instruments from SER images. In particular, the network implementation utilizes the OpenCV Deep Neural Network library and the NVIDIA GPU Computing Toolkit (CUDA) for rapid training and detection. The open-source network was trained using 4730 manually-labelled SER images of 25G ILM forceps (Supplement 1 Fig. 1, Visualization 1). SER images were acquired in paper phantoms (N = 4290), retinal phantoms (N = 254) and ex vivo bovine eyes (N = 186). In addition, data augmentation was performed by utilizing a horizontal flip and 90° clockwise rotation for each image. The augmented data was then split between training (80%) and validation (20%) sets. CNN training was performed on a computer with an Intel i9-10900X CPU, an NVIDIA GeForce RTX 2080 Ti GPU and 64 GB RAM. Training was run for 400,000 epochs over the course of 24 hours, resulting in a mean average precision value of > 99% based on a complete intersection over union (CIoU) loss function [55].
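The augmentation step described above (one horizontal flip and one 90° clockwise rotation per image) can be reproduced with standard OpenCV calls. The sketch below is illustrative only: the file paths and the augmentAndSave() helper are not from the paper, and in practice the corresponding bounding-box labels must be transformed alongside each image.

```cpp
// Minimal augmentation sketch, assuming SER frames stored as 8-bit grayscale image files.
// Bounding-box labels for the detector would need the same flip/rotation applied to them.
#include <opencv2/core.hpp>
#include <opencv2/imgcodecs.hpp>
#include <string>

void augmentAndSave(const std::string& inPath, const std::string& outPrefix)
{
    cv::Mat ser = cv::imread(inPath, cv::IMREAD_GRAYSCALE);
    if (ser.empty()) return;

    cv::Mat flipped, rotated;
    cv::flip(ser, flipped, 1);                          // horizontal flip (around the vertical axis)
    cv::rotate(ser, rotated, cv::ROTATE_90_CLOCKWISE);  // 90° clockwise rotation

    cv::imwrite(outPrefix + "_flip.png", flipped);
    cv::imwrite(outPrefix + "_rot90.png", rotated);
}
```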

CNN model and weights were integrated directly with the SECTR C++ acquisition software for real-time detection of SER images. Multithreaded programming was used to decouple SECTR acquisition from CNN detection by copying acquired SER images to a buffer. A parallel thread then passed the stored SER image through the trained network for high-speed detection at over 120 Hz.
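As a rough illustration of how a Darknet-format detector [54] can be run through the OpenCV DNN module with the CUDA backend, a hedged single-class detection sketch is shown below. The file names, network input size, and confidence threshold are assumptions; the paper specifies only that the trained model and weights were loaded into the C++ acquisition software for detection at over 120 Hz.

```cpp
// Hedged sketch of single-class (25G ILM forceps) detection from SER frames with OpenCV DNN.
#include <opencv2/core.hpp>
#include <opencv2/dnn.hpp>
#include <opencv2/imgproc.hpp>
#include <string>
#include <vector>

struct Detection { cv::Rect box; float score; };

class ForcepsDetector {
public:
    ForcepsDetector(const std::string& cfgPath, const std::string& weightsPath)
        : net_(cv::dnn::readNetFromDarknet(cfgPath, weightsPath))
    {
        net_.setPreferableBackend(cv::dnn::DNN_BACKEND_CUDA);   // GPU-accelerated inference
        net_.setPreferableTarget(cv::dnn::DNN_TARGET_CUDA);
    }

    // Run one SER frame through the network and return candidate boxes for the single class.
    std::vector<Detection> detect(const cv::Mat& serFrame, float confThresh = 0.5f)
    {
        cv::Mat input = serFrame;
        if (input.channels() == 1)
            cv::cvtColor(serFrame, input, cv::COLOR_GRAY2BGR);  // detector expects 3 channels

        // Assumed 416x416 network input; pixel values scaled to [0, 1].
        cv::Mat blob = cv::dnn::blobFromImage(input, 1.0 / 255.0, cv::Size(416, 416),
                                              cv::Scalar(), true, false);
        net_.setInput(blob);

        std::vector<cv::Mat> outs;
        net_.forward(outs, net_.getUnconnectedOutLayersNames());

        std::vector<Detection> dets;
        for (const cv::Mat& out : outs) {
            for (int r = 0; r < out.rows; ++r) {
                const float* row = out.ptr<float>(r);
                float score = row[4] * row[5];                  // objectness x class probability
                if (score < confThresh) continue;
                // YOLO-style outputs: normalized box center, width, and height.
                int cx = static_cast<int>(row[0] * serFrame.cols);
                int cy = static_cast<int>(row[1] * serFrame.rows);
                int w  = static_cast<int>(row[2] * serFrame.cols);
                int h  = static_cast<int>(row[3] * serFrame.rows);
                dets.push_back({cv::Rect(cx - w / 2, cy - h / 2, w, h), score});
            }
        }
        return dets;  // overlapping boxes would normally be merged with cv::dnn::NMSBoxes
    }

private:
    cv::dnn::Net net_;
};
```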

2.1.3 Scan waveform modification

Finally, scan waveforms must be updated on a volume-by-volume basis for smooth 4D visualization, which corresponds to update rates of at least 16–20 Hz. Typical DAQ devices used to generate scan waveform signals operate in regeneration mode, where scan waveforms are preallocated to an onboard waveform buffer (Fig. 2, red); as a result, scanning must be paused and restarted in order to update scan waveforms. Here, we take advantage of a nonregenerative operating regime that allows us to dynamically modify waveform values per scan volume. An additional thread is used to continuously monitor the size of the waveform buffer and to write new, modified values immediately when the buffer empties (Fig. 2, blue).
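A minimal sketch of this nonregenerative update loop is shown below, assuming a hypothetical DaqOutput wrapper around the vendor driver; the class, its methods, and buildVolumeWaveform() are placeholders rather than the actual SECTR software interface. The key idea is that a dedicated thread writes one volume's worth of samples whenever the onboard buffer has drained, so waveform updates always land on volume boundaries.

```cpp
// Sketch of a nonregenerative waveform-update thread with a mock DAQ wrapper.
#include <atomic>
#include <thread>
#include <vector>

struct GalvoOffsets { double x; double y; };   // volts; updated by the CNN detection thread

// Hypothetical stand-in for the vendor DAQ driver; real code would call the driver API here.
class DaqOutput {
public:
    void disableRegeneration() {}                                 // switch regeneration mode off
    std::size_t bufferSpaceAvailable() const { return freeSamples_; }
    void write(const std::vector<double>& interleavedXY) { freeSamples_ -= interleavedXY.size(); }
    void drain(std::size_t n) { freeSamples_ += n; }              // mock: driver outputs samples
private:
    std::size_t freeSamples_ = 1 << 20;                           // mock initial buffer space
};

// Build one volume of interleaved X/Y samples centered on the tracked instrument position.
std::vector<double> buildVolumeWaveform(const GalvoOffsets& off, std::size_t samplesPerVolume)
{
    std::vector<double> xy(2 * samplesPerVolume);
    for (std::size_t i = 0; i < samplesPerVolume; ++i) {
        xy[2 * i]     = off.x;   // placeholder: real code writes the adaptive fast-axis ramp
        xy[2 * i + 1] = off.y;   // placeholder: real code writes the slow-axis staircase
    }
    return xy;
}

void waveformThread(DaqOutput& daq, std::atomic<bool>& running,
                    std::atomic<GalvoOffsets>& latestOffsets, std::size_t samplesPerVolume)
{
    daq.disableRegeneration();
    while (running.load()) {
        // Write the next volume only once the previous one has drained, so updates
        // occur on volume boundaries and never tear a volume mid-acquisition.
        if (daq.bufferSpaceAvailable() >= 2 * samplesPerVolume) {
            daq.write(buildVolumeWaveform(latestOffsets.load(), samplesPerVolume));
        } else {
            std::this_thread::yield();
        }
    }
}
```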

Fig. 2. DAQ operating modes. Regeneration mode allows continuous output from the scan waveform buffer. Nonregeneration requires new values to be written to the buffer, but enables dynamic scan waveform updates.

The CNN outputs bounding box coordinates (x, y, width, height), where the tracked tip corresponds to the top-left corner (x, y) of the detected surgical instrument for right-handed cases or the top-right corner (x + width, y) for left-handed cases (Visualization 1). These values were sent directly to the waveform generation thread and used to update scanning outputs. Calibration between CNN position outputs and galvanometer scan waveform voltage offsets was performed by fitting the measured offsets required to center the OCT FOV on the instrument from different initial positions (Fig. 3(a)). Due to adaptive-sampling of the fast-axis (see Section 2.2), a higher-order fit was required to calibrate the nonlinear offset-to-position curve, whereas the linearly-sampled slow-axis required only a linear fit. The design of the SECTR system enabled independent calibration of the fast- and slow-axes since each scan mirror is conjugate to the other through a 4f imaging relay. Furthermore, in order to optimize imaging performance, galvanometer scanners (Saturn 5B, ScannerMax) were tuned using a previously-reported method to maximize scan speed and linear FOV for 4D imaging [56].
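The calibration step can be summarized by the sketch below, which converts a detected pixel position into fast- and slow-axis voltage offsets using a polynomial fit for the adaptively-sampled X-axis and a linear fit for the Y-axis. All coefficient values are placeholders; the real values come from the measured fits in Fig. 3(a).

```cpp
// Sketch of pixel-to-voltage offset calibration: polynomial fast-axis fit, linear slow-axis fit.
#include <array>
#include <cstddef>

// Evaluate a polynomial with coefficients ordered from the constant term upward (Horner's rule).
template <std::size_t N>
double evalPoly(const std::array<double, N>& c, double x)
{
    double y = 0.0;
    for (std::size_t i = N; i-- > 0;) y = y * x + c[i];
    return y;
}

struct VoltageOffsets { double x; double y; };

VoltageOffsets pixelToVoltageOffset(double xPix, double yPix, double frameCols, double frameRows)
{
    // Detected tip position relative to the center of the SER frame (in pixels).
    const double dxPix = xPix - 0.5 * frameCols;
    const double dyPix = yPix - 0.5 * frameRows;

    // Placeholder calibration coefficients (constant, linear, quadratic, cubic terms).
    const std::array<double, 4> fastAxisFit = {0.0, 2.1e-3, 0.0, 4.0e-9};
    const double slowAxisGain = 1.5e-3;   // V per pixel, from the linear slow-axis fit

    return { evalPoly(fastAxisFit, dxPix), slowAxisGain * dyPix };
}
```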

Fig. 3. Scan waveform modification calibration. (a) Fast (X) and slow (Y) axis offset calibration. (b) Adaptively-sampled SER images showing dense-sampling of the instrument tip and (c) resampled/linearized images. (d) Input and resampled scan waveforms for images in (b), (c) showing X- and Y-offsets used to center the OCT FOV on the instrument (red) following out-of-plane motion (left images vs. right images).

2.2 Adaptive-sampling

Additionally, input scan waveforms were modified using an adaptive-sampling protocol (Fig. 3(b)-3(d)), which allowed us to dynamically re-center the densely sampled region of each OCT volume on the automatically tracked instrument position to enhance visualization of instrument-tissue interactions. The original linear scan waveform prior to adaptive-sampling can be described mathematically as:

$$Linear(x) = \frac{FOV_x \, V_x}{N_{lines}} \cdot x \tag{1}$$

Here, the linear $X$-axis waveform input is defined as a line with slope equal to the $FOV$ (mm) scaled by a voltage scale factor $V$ (V/mm) and divided by the number of lines, while the slow $Y$-axis scan waveform is held constant per frame to generate a raster scan pattern. For adaptive-sampling, the slope of the linear $X$-axis waveform (Eq. (1)) is scaled by $v_{dense} = 0.25$ to densely sample the center of the FOV between $X_{start}$ and $X_{end}$ by a factor of 4. In addition, the slope of the scan waveform is scaled by $v_{sparse} = 1.50$ (calculated from $v_{dense}$ using Eq. (4)) to sparsely sample the periphery of the frame. Scale factors were chosen to maximize sampling density at the center of the frame while maintaining a wide FOV for tracking as well as a high frame rate for 4D volumetric imaging.

$$Dense(x) = Linear(x) \cdot v_{dense}, \quad X_{start} \le x \le X_{end} \tag{2}$$

$$Sparse(x) = Linear(x) \cdot v_{sparse}, \quad x < X_{start} \;\;\text{or}\;\; x > X_{end} \tag{3}$$

$$v_{sparse} = \frac{N_{lines} - (X_{end} - X_{start}) \cdot v_{dense}}{N_{lines} - (X_{end} - X_{start})} \tag{4}$$
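A compact way to realize Eqs. (1)–(4) is to scale the per-line increment of the linear ramp by $v_{dense}$ inside $[X_{start}, X_{end}]$ and by $v_{sparse}$ elsewhere and then accumulate, which keeps the waveform continuous while preserving the total FOV. The sketch below assumes illustrative scan parameters; with the central 40% of a 500-line frame densely sampled, $v_{dense} = 0.25$ gives $v_{sparse} = 1.50$, matching the values quoted above.

```cpp
// Adaptive-sampling fast-axis waveform built from the piecewise slopes of Eqs. (1)-(4).
#include <vector>

std::vector<double> adaptiveFastAxisWaveform(double fovMm, double voltsPerMm, int nLines,
                                             int xStart, int xEnd, double vDense)
{
    const double baseSlope = fovMm * voltsPerMm / nLines;             // Eq. (1) slope, V per line
    const int denseLines = xEnd - xStart;
    const double vSparse = (nLines - denseLines * vDense) /
                           static_cast<double>(nLines - denseLines);  // Eq. (4)

    std::vector<double> wave(nLines);
    double v = 0.0;
    for (int x = 0; x < nLines; ++x) {
        const bool dense = (x >= xStart && x < xEnd);
        v += baseSlope * (dense ? vDense : vSparse);                  // Eqs. (2)-(3) as increments
        wave[x] = v;
    }
    return wave;
}

// Example (illustrative parameters): 500 lines per frame, central 40% densely sampled.
// auto xWave = adaptiveFastAxisWaveform(7.0, 1.0, 500, 150, 350, 0.25);  // vSparse = 1.50
```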

For visualization and CNN detection, acquired SER images were resampled (Fig. 3(c)) based on the fast-axis galvanometer position output measured via the galvanometer controller (MachDSP, ScannerMax).
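The resampling step can be approximated by a per-column linear interpolation onto a uniform grid using the measured galvanometer positions, as in the hedged sketch below. Single-precision frames and a monotonically increasing position vector are assumed; this is a generic 1D resampling, not the authors' exact implementation.

```cpp
// Linearize an adaptively-sampled SER frame using measured fast-axis galvanometer positions.
#include <opencv2/core.hpp>
#include <algorithm>
#include <vector>

cv::Mat resampleSerFrame(const cv::Mat& ser, const std::vector<double>& measuredPos, int outCols)
{
    CV_Assert(ser.type() == CV_32F && ser.cols == static_cast<int>(measuredPos.size()));
    cv::Mat out(ser.rows, outCols, CV_32F);

    const double x0 = measuredPos.front(), x1 = measuredPos.back();
    for (int c = 0; c < outCols; ++c) {
        const double x = x0 + (x1 - x0) * c / (outCols - 1);          // uniform target position
        // Find the bracketing acquired columns and interpolate between them.
        auto hi = std::lower_bound(measuredPos.begin(), measuredPos.end(), x);
        int i1 = std::clamp<int>(static_cast<int>(hi - measuredPos.begin()), 1, ser.cols - 1);
        int i0 = i1 - 1;
        const double t = (x - measuredPos[i0]) / (measuredPos[i1] - measuredPos[i0]);
        for (int r = 0; r < ser.rows; ++r)
            out.at<float>(r, c) = static_cast<float>((1.0 - t) * ser.at<float>(r, i0)
                                                     + t * ser.at<float>(r, i1));
    }
    return out;
}
```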

2.3 Experimental setup

The SECTR engine was interfaced with an ophthalmic imaging probe placed in a microscope configuration with axial (Z) translation freedom. A pair of 25G ILM forceps were clamped and mounted to a two-axis (XY) motorized translation stage (MLS203, ThorLabs) with 0.1µm position resolution for precise control of instrument position and speed.

In order to quantify the accuracy of the CNN for basic surgical maneuvers, a series of three experiments was performed by varying probe axial/depth (Z) position, instrument orientation, and instrument translation speed (Table 1). Instrument speeds were chosen to cover and extend beyond the range of typical ophthalmic surgical maneuver speeds of 0.1–0.5 mm/s [57–59]. For each set of experimental parameters, SER images of the forceps placed over a paper sample were acquired (Supplement 1 Fig. 2). For Experiment 1, the imaging probe was translated from 0 mm (in-focus) to 21 mm in increments of 3 mm, which corresponds to the measured Rayleigh range of the system (Supplement 1 Fig. 3), in order to simulate motion of the instrument out-of-focus (Fig. 4(a)). For Experiment 2, both instrument depth and translation speed were varied in order to simulate out-of-plane motion and motion blur. Lastly, for Experiment 3, instrument orientation and speed were varied to imitate dynamic movement of the instrument during operation (Fig. 4(b)). For each set of parameters, manual annotations of instrument position from tracked SER images (N = 200) were compared to corresponding CNN outputs to determine pixel error and tracking accuracy.

Fig. 4. SER images as a function of varying (a) probe depth and (b) instrument orientation.

Table 1. CNN accuracy experimental parameters.

3. Results

SER image resolution was measured using a 1951 United States Air Force resolution test chart to be 44.2 µm with a sampling resolution of 15 µm/pixel. Thus, the minimum expected pixel position error corresponding to the resolution limit is 44.2/15 ≈ 2.95 pixels. For each combination of experimental parameters, position error was calculated by taking the difference between neural network position outputs and manual labels of position (N = 200). A one-sample t-test was used to compare each distribution of calculated errors to the resolution limit of the system. Comparing manual annotations of instrument position to CNN outputs at various axial positions, we are able to achieve resolution-limited tracking accuracy (p << 1E-7) from 0 mm to 9 mm, which is beyond the ∼7 mm full axial range of our OCT imaging (Fig. 5, top). Comparing tracking accuracy in this depth range with varying instrument translation speed, we maintain resolution-limited performance up to velocities of 10 mm/s (Fig. 5, middle). Similar performance can also be achieved when varying instrument orientation and speed (Fig. 5, bottom). Despite the presence of error extrema and outliers, distribution means were statistically significantly lower than the resolution limit. These results suggest that the CNN is particularly robust to intraoperative variability, including movement of the instrument in and out-of-focus, high speed maneuvers, and dynamic maneuvers involving rotation and translation of the surgical instrument.
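For reference, the error metric and test statistic described above reduce to a few lines of code. The sketch below assumes a Euclidean pixel error between the CNN and manually labeled tip positions and computes the one-sample t-statistic against the ~2.95-pixel resolution limit; the p-value would then come from the left tail of a t-distribution with N−1 degrees of freedom via a statistics library.

```cpp
// Pixel error and one-sample t-statistic against the resolution limit (assumes errors.size() > 1).
#include <cmath>
#include <vector>

struct Point { double x, y; };

double pixelError(const Point& cnn, const Point& manual)
{
    return std::hypot(cnn.x - manual.x, cnn.y - manual.y);   // Euclidean pixel distance
}

double oneSampleT(const std::vector<double>& errors, double resolutionLimitPix = 2.95)
{
    const double n = static_cast<double>(errors.size());
    double mean = 0.0;
    for (double e : errors) mean += e;
    mean /= n;

    double var = 0.0;
    for (double e : errors) var += (e - mean) * (e - mean);
    var /= (n - 1.0);                                        // sample variance

    return (mean - resolutionLimitPix) / std::sqrt(var / n); // negative t: mean below the limit
}
```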

Fig. 5. Quantification of CNN accuracy as a function of depth for a fixed orientation and speed (top), depth and speed (middle), and orientation and speed (bottom). Pixel error between CNN and manual outputs are compared to the resolution limit of the system (∼2.95 pix.). Box plots span the 25th and 75th percentiles and box waists correspond to distribution means. Box plot whiskers extend to data extrema and outliers are denoted by ‘+’.

Beyond tracking accuracy, it is also important to continuously center the instrument within the imaging FOV in order to provide 4D visualization of instrument-tissue interactions. For each combination of experimental parameters, the standard deviation of CNN outputs was calculated to determine how precisely the instrument was re-centered despite changes in focus, speed, and orientation. Comparing CNN instrument position outputs, we are able to localize the instrument at the center of the image within the resolution limit for typical ophthalmic maneuver speeds below 1mm/s regardless of instrument orientation or depth position (Fig. 6). However, a linear trend in deviation from the center of the frame is observable at higher speeds due to update rate limits. Here, we chose to update scan position waveforms per volume (16Hz) instead of at the CNN output rate (120Hz) in order to prevent discontinuities in volumetric rendering as well as to maintain anatomical accuracy within each volume. Nevertheless, a minimal deviation of ∼15 pix. can be seen even at translation speeds of 10mm/s, which is beyond the speed of conventional ophthalmic surgical maneuvers.

Fig. 6. Instrument deviation from center for tracked SER frames using CNN position outputs for changes in depth and speed (left) and orientation and speed (right). Resolution-limited localization of the forceps at the center of the frame is shown for typical ophthalmic surgical maneuver speeds (shaded box).

Finally, we validated our CNN-based tracking method by performing automated instrument-tracking and 4D video-rate imaging of a mock surgical task. Free-hand maneuvers of 25G ILM forceps by an untrained volunteer were used to evaluate CNN tracking performance in the presence of physiological tremors. In particular, a metallic ring was moved between 4 phantom quadrants of varying height (0mm, 1mm, 2mm, 3mm). Additional features, such as holes and a scale bar (3mm increment), were included in order to better visualize lateral and axial changes as the instrument moves (Supplement 1 Fig. 4). Volumetric imaging was performed using 2560 (Z) x 500 (X) x 50 (Y) pix. (pixels per line x lines per frame x frames per volume) for a frame rate of 800Hz for en face SER and cross-sectional OCT B-scans and volume rate of 16Hz for 3D OCT. CNN detection of forceps from SER images was performed simultaneously at a rate of 120Hz. SER images were acquired over a 25mm (Y) x 7mm (X) FOV and tracking was performed over a maximum FOV of 25mm x 25mm. Following the acquisition, standard OCT post-processing methods were used to generate images from spectral data and a 4D rendering of OCT data was produced using 3D Slicer [60]. Despite lateral movement across the entire sample, changes in depth/focus, opening and closing of the forceps tip, as well as presence of specular reflections from the metallic ring, the forceps remain localized in the OCT FOV (Fig. 7, Visualization 2).

Fig. 7. Automated instrument-tracking in a 4-quadrant phantom. (a) SER image of 25G ILM forceps manipulating a metallic ring and (b) corresponding OCT volume B-scan maximum intensity projection and (c) 4D rendering. (d) SER image following out-of-plane motion and translation of the instrument. (e), (f) Localization of the instrument in the OCT FOV despite lateral movement and axial position change. Scale bar = 1 mm (a), (d) and 0.5 mm (b), (e). See Visualization 2.

4. Discussion and summary

Ophthalmic microsurgery is conventionally performed under a surgical microscope, which precludes visualization of subsurface features and underlying instrument-tissue interactions. iOCT is an emerging technology that provides depth-resolved visualization of surgical maneuvers but is currently limited by slow scan speeds and a lack of automated instrument-tracking, which necessitates manual adjustment of a static imaging FOV. Here, we propose an automated instrument-tracking method that takes advantage of a high-speed SECTR system to simultaneously acquire en face SER and cross-sectional OCT images. Co-registration between the two imaging modalities allows for dynamic updates of OCT scan position based on instrument position detected from SER images. Using a GPU-accelerated CNN, we are able to detect surgical instrument positions at over 120Hz with resolution-limited accuracy despite changes in focus, orientation, and speed. In addition, the proposed method has significant advantages over previously-reported OCT-based, color-based, and gradient-based detection methods by providing widefield lateral tracking in the presence of instrument deformation, soft-tissue movement, mechanical manipulation of the eye, and artifacts such as specular reflections. Furthermore, our method enables tracking for both anterior and posterior segment operations due to the network’s insensitivity to distortions and aberrations induced by the optical system and the optics of the eye. We demonstrated the efficacy of our method by performing automated instrument-tracking and 4D video-rate imaging of a mock surgical task in a phantom.

Currently, the main limitation of our method is the update rate, which occurs once per volume in order to prevent intravolume updates that would degrade 4D visualization. As a result, we have a linear increase in instrument position deviation from the center of each image with increasing instrument velocities. However, ophthalmic microsurgery requires delicate manipulation of tissue, and instrument maneuver speeds are typically between 0.1–0.5 mm/s. Within this speed range, our method achieves high tracking accuracy and localization of the instrument. In addition, we show that our method is robust to extreme changes in speed (0.25–10 mm/s) and defocus (0–9 mm) that are beyond traditional use cases. We also account for changes in instrument orientation which may occur during manipulation of tissue as well as from differences in surgeon preference (left-handed vs. right-handed). At higher speeds beyond ophthalmic surgical use cases, it is also possible to use extrapolation methods, such as Kalman filtering, to achieve better localization of the surgical instrument despite limited update rates.
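As an illustration of this extrapolation idea, the sketch below implements a fixed-gain alpha-beta tracker, the steady-state form of a constant-velocity Kalman filter, to predict the instrument position at the next volume boundary from the 120 Hz CNN measurements. The gains and the constant-velocity assumption are choices made for this sketch rather than values from the paper.

```cpp
// Alpha-beta (fixed-gain, constant-velocity) tracker for extrapolating instrument position
// between volume-rate waveform updates; gains are illustrative.
struct AlphaBetaAxis {
    double pos = 0.0, vel = 0.0;
    double alpha = 0.85, beta = 0.005;            // smoothing gains (illustrative)

    // Predict position dt seconds ahead (e.g., to the next volume boundary at 16 Hz).
    double predict(double dt) const { return pos + vel * dt; }

    // Fold in a new CNN measurement arriving at the 120 Hz detection rate.
    void update(double measuredPos, double dt)
    {
        const double predicted = pos + vel * dt;
        const double residual  = measuredPos - predicted;
        pos = predicted + alpha * residual;
        vel = vel + (beta / dt) * residual;
    }
};

struct InstrumentTracker2D {
    AlphaBetaAxis x, y;
    void update(double mx, double my, double dt) { x.update(mx, dt); y.update(my, dt); }
};
```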

Furthermore, we believe the proposed method can be extended for the identification and tracking of multiple instruments which are often present simultaneously in the surgical field. The implemented CNN is capable of predicting instrument positions across multiple classes at once and can be trained to detect a variety of ophthalmic surgical tools including forceps, picks, loops, membrane scrapers, and light pipes. In addition, the existing 25G ILM forceps model can potentially be extended to facilitate training through transfer learning to generate a robust model for multiple instruments [61,62]. By using an open-source network, we also eliminate the need for extensive hyperparameter tuning and optimization that is typical for many CNN implementations and, thus, broaden the applicability of our automated instrument-tracking framework. Similarly, we can extend our adaptive-sampling protocol to be able to target individual surgical instruments as specified by the surgeon as well as to switch between multiple instruments by leveraging multi-class detection outputs.

In addition, our technology can be directly integrated into the surgical microscope and potentially used for 4D in vivo imaging and automated instrument-tracking [63,64]. We believe our method ultimately enables intraoperative guidance for ophthalmic microsurgery and will facilitate the adoption of iOCT technology. Furthermore, our method can benefit iOCT-guided surgery by lowering the learning curve for surgeons and allowing them to operate as they normally would, in contrast to commercial iOCT systems, which require manual tracking and alignment of a static OCT FOV to ROIs. SECTR-based tracking can also be extended beyond ophthalmic applications for use in fields such as dermatology, where imaging power and speed can be significantly increased for enhanced 4D tracking and visualization [65,66].

Funding

National Institutes of Health (R01-EY030490, R01-EY031769, T32-EB021937).

Acknowledgements

This research was supported by Vanderbilt University, the Vanderbilt Institute for Surgery and Engineering (VISE), and the US National Institutes of Health Grant No. R01-EY030490, R01-EY031769 and T32-EB021937. The content was solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Disclosures

The authors declare no conflicts of interest.

Data Availability

CNN model and accuracy quantification data are available upon request.

Supplemental document

See Supplement 1 for supporting content.

References

1. T. Peters, B. Davey, P. Munger, R. Comeau, A. Evans, and A. Olivier, “Three-dimensional multimodal image-guidance for neurosurgery,” IEEE Trans. Med. Imaging 15(2), 121–128 (1996). [CrossRef]  

2. R. B. Schwartz, L. Hsu, T. Z. Wong, D. F. Kacher, A. A. Zamaní, P. M. Black, E. Alexander, P. E. Stíeg, T. M. Moriarty, C. A. Martín, R. Kikinìs, and F. A. Jolesz, “Intraoperative MR imaging guidance for intracranial neurosurgery: experience with the first 200 cases,” Radiology 211(2), 477–488 (1999). [CrossRef]  

3. R. M. Comeau, A. F. Sadikot, A. Fenster, and T. M. Peters, “Intraoperative ultrasound for guidance and tissue shift correction in image-guided neurosurgery,” Med. Phys. 27(4), 787–800 (2000). [CrossRef]  

4. E. Martin, D. Jeanmonod, A. Morel, E. Zadicario, and B. Werner, “High-intensity focused ultrasound for noninvasive functional neurosurgery,” Ann. Neurol. 66(6), 858–861 (2009). [CrossRef]  

5. P. N. Dayani, R. Maldonado, S. Farsiu, and C. A. Toth, “Intraoperative use of handheld spectral domain optical coherence tomography imaging in macular surgery,” Retina 29(10), 1457–1468 (2009). [CrossRef]  

6. Y. K. Tao, J. P. Ehlers, C. A. Toth, and J. A. Izatt, “Intraoperative spectral domain optical coherence tomography for vitreoretinal surgery,” Opt. Lett. 35(20), 3315 (2010). [CrossRef]  

7. J. P. Ehlers, Y. K. Tao, and S. K. Srivastava, “The value of intraoperative optical coherence tomography imaging in vitreoretinal surgery,” Curr. Opin. Ophthalmol. 25(3), 221–227 (2014). [CrossRef]  

8. D. M. Cash, M. I. Miga, S. C. Glasgow, B. M. Dawant, L. W. Clements, Z. Cao, R. L. Galloway, and W. C. Chapman, “Concepts and preliminary data toward the realization of image-guided liver surgery,” J. Gastrointest. Surg. 11(7), 844–859 (2007). [CrossRef]  

9. T. Aoki, D. Yasuda, Y. Shimizu, M. Odaira, T. Niiya, T. Kusano, K. Mitamura, K. Hayashi, N. Murai, T. Koizumi, H. Kato, Y. Enami, M. Miwa, and M. Kusano, “Image-guided liver mapping using fluorescence navigation system with indocyanine green for anatomical hepatic resection,” World J. Surg. 32(8), 1763–1767 (2008). [CrossRef]  

10. M. Peterhans, A. Vom Berg, B. Dagon, D. Inderbitzin, C. Baur, D. Candinas, and S. Weber, “A navigation system for open liver surgery: design, workflow and first clinical applications,” Int. J. Med. Robot. Comput. Assist. Surg. 7(1), 7–16 (2011). [CrossRef]  

11. L. A. Dawson and D. A. Jaffray, “Advances in image-guided radiation therapy,” J. Clin. Oncol. 25(8), 938–946 (2007). [CrossRef]  

12. O. A. J. Van Der Meijden and M. P. Schijven, “The value of haptic feedback in conventional and robot-assisted minimal invasive surgery and virtual reality training: a current review,” Surg. Endosc. 23(6), 1180–1190 (2009). [CrossRef]  

13. D. Bouget, M. Allan, D. Stoyanov, and P. Jannin, “Vision-based and marker-less surgical tool detection and tracking: a review of the literature,” Med. Image Anal. 35, 633–654 (2017). [CrossRef]  

14. I. Laina, N. Rieke, C. Rupprecht, J. P. Vizcaíno, A. Eslami, F. Tombari, and N. Navab, “Concurrent segmentation and localization for tracking of surgical instruments,” in Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Springer, 2017), 10434 LNCS, pp. 664–672.

15. J. B. West and C. R. Maurer, “Designing optically tracked instruments for image-guided surgery,” IEEE Trans. Med. Imaging 23(5), 533–545 (2004). [CrossRef]  

16. M. T. El-Haddad and Y. K. Tao, “Advances in intraoperative optical coherence tomography for surgical guidance,” Curr. Opin. Biomed. Eng. 3, 37–48 (2017). [CrossRef]  

17. R. Ray, D. E. Baraano, J. A. Fortun, B. J. Schwent, B. E. Cribbs, C. S. Bergstrom, G. B. Hubbard, and S. K. Srivastava, “Intraoperative microscope-mounted spectral domain optical coherence tomography for evaluation of retinal anatomy during macular surgery,” Ophthalmology 118(11), 2212–2217 (2011). [CrossRef]  

18. Y. K. Tao, S. K. Srivastava, and J. P. Ehlers, “Microscope-integrated intraoperative OCT with electrically tunable focus and heads-up display for imaging of ophthalmic surgical maneuvers,” Biomed. Opt. Express 5(6), 1877 (2014). [CrossRef]  

19. “DISCOVER Study: Microscope-integrated Intraoperative OCT Study,” ClinicalTrials.gov (2014).

20. J. P. Ehlers, Y. S. Modi, P. E. Pecen, J. Goshe, W. J. Dupps, A. Rachitskaya, S. Sharma, A. Yuan, R. Singh, P. K. Kaiser, J. L. Reese, C. Calabrise, A. Watts, and S. K. Srivastava, “The DISCOVER study 3-year results: feasibility and usefulness of microscope-integrated intraoperative OCT during ophthalmic surgery,” Ophthalmology 125(7), 1014–1027 (2018). [CrossRef]  

21. “PIONEER: Intraoperative and Perioperative OCT Study,” ClinicalTrials.gov (2018).

22. B. Todorich, C. Shieh, P. J. Desouza, O. M. Carrasco-Zevallos, D. L. Cunefare, S. S. Stinnett, J. A. Izatt, S. Farsiu, P. Mruthyunjaya, A. N. Kuo, and C. A. Toth, “Impact of microscope-integrated OCT on ophthalmology resident performance of anterior segment surgical maneuvers in model eyes,” Investig. Ophthalmol. Vis. Sci. 57(9), OCT146 (2016). [CrossRef]  

23. P. Yee, D. D. Sevgi, J. Abraham, S. K. Srivastava, T. Le, A. Uchida, N. Figueiredo, A. V. Rachitskaya, S. Sharma, J. Reese, and J. P. Ehlers, “iOCT-assisted macular hole surgery: Outcomes and utility from the DISCOVER study,” Br. J. Ophthalmol. 105(3), 403–409 (2021). [CrossRef]  

24. O. M. Carrasco-Zevallos, C. Viehland, B. Keller, M. Draelos, A. N. Kuo, C. A. Toth, and J. A. Izatt, “Review of intraoperative optical coherence tomography: technology and applications [Invited],” Biomed. Opt. Express 8(3), 1607 (2017). [CrossRef]  

25. J. P. Ehlers, P. K. Kaiser, and S. K. Srivastava, “Intraoperative optical coherence tomography using the RESCAN 700: preliminary results from the DISCOVER study,” Br. J. Ophthalmol. 98(10), 1329–1332 (2014). [CrossRef]  

26. A. Runkle, S. K. Srivastava, and J. P. Ehlers, “Microscope-integrated OCT feasibility and utility with the enfocus system in the discover study,” Ophthalmic Surg. Lasers Imaging Retin. 48(3), 216–222 (2017). [CrossRef]  

27. O. M. Carrasco-Zevallos, B. Keller, C. Viehland, L. Shen, M. I. Seider, J. A. Izatt, and C. A. Toth, “Optical coherence tomography for retinal surgery: perioperative analysis to real-time four-dimensional image-guided surgery,” Investig. Ophthalmol. Vis. Sci. 57(9), OCT37–OCT50 (2016). [CrossRef]  

28. O. M. Carrasco-Zevallos, B. Keller, C. Viehland, L. Shen, G. Waterman, B. Todorich, C. Shieh, P. Hahn, S. Farsiu, A. N. Kuo, C. A. Toth, and J. A. Izatt, “Live volumetric (4D) visualization and guidance of in vivo human ophthalmic surgery with intraoperative optical coherence tomography,” Sci. Rep. 6(1), 31689–16 (2016). [CrossRef]  

29. C. Viehland, B. Keller, O. M. Carrasco-Zevallos, D. Nankivil, L. Shen, S. Mangalesh, D. T. Viet, A. N. Kuo, C. A. Toth, and J. A. Izatt, “Enhanced volumetric visualization for real time 4D intraoperative ophthalmic swept-source OCT,” Biomed. Opt. Express 7(5), 1815 (2016). [CrossRef]  

30. O. M. Carrasco-Zevallos, C. Viehland, B. Keller, R. P. McNabb, A. N. Kuo, and J. A. Izatt, “Constant linear velocity spiral scanning for near video rate 4D OCT ophthalmic and surgical imaging with isotropic transverse sampling,” Biomed. Opt. Express 9(10), 5052 (2018). [CrossRef]  

31. J. P. Kolb, W. Draxinger, J. Klee, T. Pfeiffer, M. Eibl, T. Klein, W. Wieser, and R. Huber, “Live video rate volumetric OCT imaging of the retina with multi-MHz A-scan rates,” PLoS One 14(3), e0213144 (2019). [CrossRef]  

32. T. Klein, W. Wieser, L. Reznicek, A. Neubauer, A. Kampik, and R. Huber, “Multi-MHz retinal OCT,” Biomed. Opt. Express 4(10), 1890 (2013). [CrossRef]  

33. P. Hahn, O. Carrasco-Zevallos, D. Cunefare, J. Migacz, S. Farsiu, J. A. Izatt, and C. A. Toth, “Intrasurgical human retinal imaging with manual instrument tracking using a microscope-integrated spectral-domain optical coherence tomography device,” Transl. Vis. Sci. Technol. 4(4), 1–9 (2015). [CrossRef]  

34. J. P. Ehlers, W. J. Dupps, P. K. Kaiser, J. Goshe, R. P. Singh, D. Petkovsek, and S. K. Srivastava, “The prospective intraoperative and perioperative ophthalmic imaging with optical CoherEncE TomogRaphy (PIONEER) study: 2-year results,” Am. J. Ophthalmol. 158(5), 999–1007.e1 (2014). [CrossRef]  

35. B. Keller, M. Draelos, G. Tang, S. Farsiu, A. N. Kuo, K. Hauser, and J. A. Izatt, “Real-time corneal segmentation and 3D needle tracking in intrasurgical OCT,” Biomed. Opt. Express 9(6), 2716 (2018). [CrossRef]  

36. N. Gessert, M. Schlüter, and A. Schlaefer, “A deep learning approach for pose estimation from volumetric OCT data,” Med. Image Anal. 46, 162–179 (2018). [CrossRef]  

37. J. Weiss, N. Rieke, M. A. Nasseri, M. Maier, A. Eslami, and N. Navab, “Fast 5DOF needle tracking in iOCT,” Int. J. Comput. Assist. Radiol. Surg. 13(6), 787–796 (2018). [CrossRef]  

38. M. Zhou, X. Hao, A. Eslami, K. Huang, C. Cai, C. P. Lohmann, N. Navab, A. Knoll, and M. A. Nasseri, “6DOF needle pose estimation for robot-assisted vitreoretinal surgery,” IEEE Access 7, 63113–63122 (2019). [CrossRef]  

39. M. T. El-Haddad and Y. K. Tao, “Automated stereo vision instrument tracking for intraoperative OCT guided anterior segment ophthalmic surgical maneuvers,” Biomed. Opt. Express 6(8), 3014 (2015). [CrossRef]  

40. D. Stoyanov, “Surgical vision,” Ann. Biomed. Eng. 40(2), 332–345 (2012). [CrossRef]  

41. W. Liu, H. Ren, W. Zhang, and S. Song, “Cognitive tracking of surgical instruments based on stereo vision and depth sensing,” in 2013 IEEE International Conference on Robotics and Biomimetics, ROBIO 2013 (IEEE Computer Society, 2013), pp. 316–321.

42. Z. Zhou, B. Wu, J. Duan, X. Zhang, N. Zhang, and Z. Liang, “Optical surgical instrument tracking system based on the principle of stereo vision,” J. Biomed. Opt. 22(6), 065005 (2017). [CrossRef]  

43. S. Voros, J. A. Long, and P. Cinquin, “Automatic detection of instruments in laparoscopic images: A first step towards high-level command of robotic endoscopic holders,” in International Journal of Robotics Research (Sage Publications, 2007), 26(11–12), pp. 1173–1190.

44. S. Speidel, J. Benzko, S. Krappe, G. Sudra, P. Azad, B. P. Müller-Stich, C. Gutt, and R. Dillmann, “Automatic classification of minimally invasive instruments based on endoscopic image sequences,” in Medical Imaging 2009: Visualization, Image-Guided Procedures, and Modeling (SPIE, 2009), 7261(13), p. 72610A.

45. Y. M. Baek, S. Tanaka, H. Kanako, N. Sugita, A. Morita, S. Sora, R. Mochizuki, and M. Mitsuishi, “Full state visual forceps tracking under a microscope using projective contour models,” in Proceedings - IEEE International Conference on Robotics and Automation (Institute of Electrical and Electronics Engineers Inc., 2012), pp. 2919–2925.

46. R. Sznitman, K. Ali, R. Richa, R. H. Taylor, G. D. Hager, and P. Fua, “Data-driven visual tracking in retinal microsurgery,” in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2012 (Springer Verlag, 2012), 7511 LNCS, pp. 568–575.

47. J. Ryu, J. Choi, and H. C. Kim, “Endoscopic vision-based tracking of multiple surgical instruments during robot-assisted surgery,” Artif. Organs 37(1), 107–112 (2013). [CrossRef]  

48. Z. Zhao, S. Voros, Y. Weng, F. Chang, and R. Li, “Tracking-by-detection of surgical instruments in minimally invasive surgery via the convolutional neural network deep learning-based method,” Comput. Assist. Surg. 22(sup1), 26–35 (2017). [CrossRef]  

49. L. Qiu, C. Li, and H. Ren, “Real-time surgical instrument tracking in robot-assisted surgery using multi-domain convolutional neural network,” Healthcare Technology Letters 6(6), 159–164 (2019). [CrossRef]  

50. Z. Zhao, Z. Chen, S. Voros, and X. Cheng, “Real-time tracking of surgical instruments based on spatio-temporal context and deep learning,” Comput. Assist. Surg. 24(sup1), 20–29 (2019). [CrossRef]  

51. M. T. El-Haddad, I. Bozic, and Y. K. Tao, “Spectrally encoded coherence tomography and reflectometry: Simultaneous en face and cross-sectional imaging at 2 gigapixels per second,” J. Biophotonics 11(4), e201700268 (2018). [CrossRef]  

52. J. D. Malone, M. T. El-Haddad, S. S. Yerramreddy, I. Oguz, and Y. K. K. Tao, “Handheld spectrally encoded coherence tomography and reflectometry for motion-corrected ophthalmic optical coherence tomography and optical coherence tomography angiography,” Neurophotonics 6(04), 1 (2019). [CrossRef]  

53. M. T. El-Haddad, J. D. Malone, N. T. Hoang, and Y. K. Tao, “Deep-learning based automated instrument tracking and adaptive-sampling of intraoperative OCT for video-rate volumetric imaging of ophthalmic surgical maneuvers,” in (SPIE, 2019), 10867, p.57.

54. A. Bochkovskiy, C. Y. Wang, and H. Y. M. Liao, “YOLOv4: optimal speed and accuracy of object detection,” arXiv (2020).

55. Z. Zheng, P. Wang, W. Liu, J. Li, R. Ye, and D. Ren, “Distance-IoU loss: Faster and better learning for bounding box regression,” in AAAI 2020 - 34th AAAI Conference on Artificial Intelligence (AAAI Press, 2020), pp. 12993–13000.

56. E. Tang and Y. Tao, “Modeling and optimization of galvanometric point-scanning temporal dynamics,” Biomed. Opt. Express 12(11), 6701–6716 (2021). [CrossRef]  

57. B. Gonenc, P. Gehlbach, J. Handa, R. H. Taylor, and I. Iordachita, “Motorized force-sensing micro-forceps with tremor cancelling and controlled micro-vibrations for easier membrane peeling,” in Proceedings of the IEEE RAS and EMBS International Conference on Biomedical Robotics and Biomechatronics (IEEE Computer Society, 2014), pp. 244–251.

58. B. Gonenc, R. H. Taylor, I. Iordachita, P. Gehlbach, and J. Handa, “Force-sensing microneedle for assisted retinal vein cannulation,” in Proceedings of IEEE Sensors (NIH Public Access, 2014), pp. 698–701.

59. D. Borroni, K. Gadhvi, G. Wojcik, F. Pennisi, N. A. Vallabh, A. Galeone, A. Ruzza, E. Arbabi, N. Menassa, S. Kaye, D. Ponzin, S. Ferrari, and V. Romano, “The Influence of speed during stripping in descemet membrane endothelial keratoplasty tissue preparation,” Cornea 39(9), 1086–1090 (2020). [CrossRef]  

60. A. Fedorov, R. Beichel, J. Kalpathy-Cramer, J. Finet, J. C. Fillion-Robin, S. Pujol, C. Bauer, D. Jennings, F. Fennessy, M. Sonka, J. Buatti, S. Aylward, J. V. Miller, S. Pieper, and R. Kikinis, “3D slicer as an image computing platform for the quantitative imaging network,” Magn. Reson. Imaging 30(9), 1323–1341 (2012). [CrossRef]  

61. S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Trans. Knowl. Data Eng. 22(10), 1345–1359 (2010). [CrossRef]  

62. H. C. Shin, H. R. Roth, M. Gao, L. Lu, Z. Xu, I. Nogues, J. Yao, D. Mollura, and R. M. Summers, “Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning,” IEEE Trans. Med. Imaging 35(5), 1285–1298 (2016). [CrossRef]  

63. J. D. Li, J. D. Malone, M. T. El-Haddad, A. M. Arquitola, K. M. Joos, S. N. Patel, and Y. K. Tao, “Image-guided feedback for ophthalmic microsurgery using multimodal intraoperative swept-source spectrally encoded scanning laser ophthalmoscopy and optical coherence tomography,” in Optical Coherence Tomography and Coherence Domain Optical Methods in Biomedicine XXI (SPIE, 2017), 10053, p. 100530I.

64. M. J. Ringel, E. M. Tang, D. Hu, I. Oguz, and Y. K. Tao, “Intraoperative spectrally encoded coherence tomography and reflectometry (iSECTR) for ophthalmic surgical guidance,” in Optical Coherence Tomography and Coherence Domain Optical Methods in Biomedicine XXV (SPIE, 2021), 11630, p. 116300J.

65. K. Zhang and J. U. Kang, “Real-time intraoperative 4D full-range FD-OCT based on the dual graphics processing units architecture for microsurgery guidance,” Biomed. Opt. Express 2(4), 764 (2011). [CrossRef]  

66. W. Wieser, B. R. Biedermann, T. Klein, C. M. Eigenwillig, R. Huber, D. Huang, E. A. Swanson, C. P. Lin, J. S. Schuman, W. G. Stinson, W. Chang, M. R. Hee, T. Flotte, K. Gregory, C. A. Puliafito, and J. G. Fujimoto, “Multi-Megahertz OCT: high quality 3D imaging at 20 million A-scans and 4.5 GVoxels per second,” Opt. Express 18(14), 14685–14704 (2010). [CrossRef]  

Supplementary Material (3)

Supplement 1       Supplemental document
Visualization 1       Real-time CNN output and detection of 25G ILM forceps in SER images used for training acquired in a retinal phantom (left) and an ex vivo bovine eye (right). CNN outputs are shown by the purple bounding boxes overlaid on each SER image.
Visualization 2       Automated instrument-tracking and 4D video-rate imaging of a mock surgical task.
