
Rigid alignment method for secondary analyses of optical coherence tomography volumes


Abstract

Optical coherence tomography (OCT) provides micron-level resolution of retinal tissue and is widely used in ophthalmology. Millions of pre-existing OCT images are available in research and clinical databases. Analysis of these data often requires, or can benefit significantly from, image registration and reduction of speckle noise. One method of reducing noise is to align and average multiple OCT scans. We propose to use surface feature information and whole-volume information to create a novel and simple pipeline that can rigidly align and average multiple previously acquired 3D OCT volumes from a commercially available OCT device. This pipeline significantly improves both image quality and visualization of clinically relevant image features over single, unaligned volumes from the commercial scanner.

© 2024 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

1.1 Motivation

Optical coherence tomography (OCT) is an imaging modality that uses low-coherence light to capture 3D depth-encoded images of the retina at micron-level resolution [1]. It is used to diagnose, track the progression of, and aid in the treatment of diabetic retinopathy (DR) [2,3], age-related macular degeneration (AMD) [4], glaucoma [5], and other retinal diseases [6,7]. Because OCT is fast, noninvasive, and simple to perform, multiple scans can be acquired per visit. As a result, a large number of OCT volumes are collected and stored in research and clinical databases. However, these OCT scans generally contain speckle noise, movement artifacts, blink artifacts, or decentration [8]. While many artifacts can be minimized by careful imaging procedures, speckle noise is inherent to the OCT image acquisition process [9]. There is a need to reduce speckle noise for better visualization during diagnosis and monitoring of disease progression. Several methods have been proposed to reduce speckle noise, including filtering [10,11], averaging [12,13], and deep learning-based super-resolution [14]. Among these, image registration and averaging are perhaps the most viable strategies, since image acquisition is simple and multiple images can be acquired during each visit.

1.2 Prior work

Feature-based and volume-based techniques have been proposed for registering 3D OCT volumes, with a focus on algorithmic speed and accuracy. Feature-based methods use specific landmarks such as layer depth [15], vessel position [16], and macular position [17] to register the images. Volume-based methods have also been used previously for OCT registration [15,18,19,24,25]. Some volume-based registration methods are implemented using orthogonal scan patterns [24,25], while others are based on unidirectional scan patterns. Intensity-based registration metrics, such as mutual information or cross-correlation, are calculated over all voxels of the 3D volume [18]. Pan et al. proposed techniques that use a mixture of features to perform the registration. Specifically, they perform a layer segmentation and select seven of the resulting eleven surfaces as features. Layer-based registration methods are typically fast because of the small amount of information used; however, their accuracy depends on the quality of the layer segmentation, which is variable in patients with retinal pathology. To create a more robust registration, they also combine vessel-network and voxel intensity-based features to create a nonrigid registration between two OCT volumes [15,19]. Cheng et al. use a volumetric registration algorithm that takes two input 3D OCT volumes and computes a mutual information-based affine registration, followed by a non-rigid B-spline registration to correct for image distortions. This combination is accurate but slow, especially for large, high-resolution volumes. To improve speed, they repeatedly down-sample the volumes using a Gaussian filter and perform the registration on these lower-resolution images [20]. Khansari et al. proposed a similar method that uses normalized mutual information to first create an affine registration and then a B-spline-based non-rigid registration. To improve the speed of the algorithm, they masked some regions of the image (choroid and vitreous) to reduce volume size, used a non-local means filter to smooth the image, and used a pyramidal registration scheme to reduce the complexity of the registration [21].

1.3 Our contribution

In this work we propose a new 3D OCT alignment method, capable of running on commodity hardware, for aligning pre-existing OCT volumes. Our method combines volume-based and layer-based registration. We develop variations of volume and layer-based registration pipelines that include pre-processing steps and surface alignment followed by volume alignment, and we compare them using objective and subjective metrics. The benefits of our combined technique are the speed associated with feature-based alignment and the accuracy associated with volumetric alignment.

2. Methods

2.1 Data collection

This is a retrospective analysis of data collected under IRB approval (IRB00276781, IRB00112080). All subjects consented to the use of their data for research purposes. A pilot dataset (Pilot) consisting of N = 9 repeat OCT acquisitions from a single eye of an asymptomatic, healthy subject was used for exploratory analysis. This dataset was collected over five sittings spanning three months. A larger dataset of N = 16 subjects (Cohort 1) with a history of diabetic retinopathy was used for assessing the quality of the alignment in patients with disease. At least four repeated OCT volumes were available per eye and per session for all Cohort 1 subjects. All OCT volumes were acquired using an Angioplex device (Carl Zeiss Meditec Inc.) and exported using the commercially available review software (Version 11.5) in IMG format. The exported volumes used in this research had axial correction applied by the on-device post-processing software. The OCT volumes have a field of view of 3 × 3 × 2 mm³ (245 × 245 × 1024 voxels) and are highly anisotropic. Quality control was carried out by visual inspection by a trained grader and confirmed by a fellowship-trained retina specialist and a board-certified ophthalmologist with extensive experience using OCTA images (AHK). Images with significant motion artifacts, blink artifacts, or poor signal strength (SS < 8) were discarded. In total, 11 out of 192 (6%) scans were discarded.

2.2 Volume processing module

An alignment module was defined (Fig. 1) using a pair of scans from the same subject, where one image is the fixed OCT volume and the other is the moving OCT volume. The module consists of three steps: pre-processing (PP), surface alignment (SA), and volume alignment (VA). The pre-processing operations are cropping, masking, and resizing. For cropping, the OCT volume was cropped to span from the ILM to the RPE layer, resulting in a volume smaller than the original (due to exclusion of the overlying vitreous and underlying choroid). For masking, the values of the voxels above the ILM and below the RPE layer were set to zero, leaving the resulting volume the same size as the original. For both cropping and masking, each A-scan was processed individually. We resized the volumes from 245 × 245 × 1024 voxels to 300 × 300 × 200 voxels. This creates an isotropic volume with a voxel size of 10 µm (the original volumes are highly anisotropic), which is approximately the lateral resolution of the commercially available OCT device (Cirrus HD-OCT 5000).
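
As a concrete illustration of the masking and resizing operations, the following MATLAB sketch zeroes out voxels outside the ILM-to-RPE band on a per-A-scan basis and then resamples the volume to 300 × 300 × 200 voxels. This is not the authors' code: the axis ordering (x, y, depth), the function name maskRetina, and the per-A-scan boundary indices ilmIdx/rpeIdx (assumed to come from a separate layer segmentation) are illustrative assumptions.

```matlab
% V      : raw OCT volume, 245 x 245 x 1024 (x, y, depth), uint8
% ilmIdx : 245 x 245 array of ILM depth indices (from a layer segmentation)
% rpeIdx : 245 x 245 array of RPE depth indices (from a layer segmentation)
Vmasked = maskRetina(V, ilmIdx, rpeIdx);       % same size as the input
Viso    = imresize3(Vmasked, [300 300 200]);   % ~10 um isotropic voxels

function Vm = maskRetina(V, ilmIdx, rpeIdx)
% Zero out voxels above the ILM and below the RPE, one A-scan at a time.
Vm = V;
for ix = 1:size(V, 1)
    for iy = 1:size(V, 2)
        Vm(ix, iy, 1:ilmIdx(ix, iy) - 1)   = 0;  % vitreous above the ILM
        Vm(ix, iy, rpeIdx(ix, iy) + 1:end) = 0;  % choroid/sclera below the RPE
    end
end
end
```

One way to implement the cropping variant would be to keep only the depth range from the minimum ILM index to the maximum RPE index, yielding a smaller volume rather than a zero-filled one.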

Fig. 1. Alignment module which takes in two input OCT volumes (Fixed and Moving), aligns them based on pre-processing (PP), surface alignment (SA), and volumetric alignment (VA) and outputs the moving OCT volume (aligned to the Fixed OCT volume). The pre-processing operations are cropping, masking, and resizing.

After pre-processing both the fixed and moving volumes, the layer-based surface alignment module was used to extract the ILM surfaces of the fixed and moving OCT volumes. The ILM was extracted by smoothing the volume with a Gaussian filter and then applying a binary threshold. The surfaces were aligned using the iterative closest point (ICP) algorithm implemented in MATLAB (R2022b) via the ‘pcregistericp’ function with the point-to-point cost metric. The transformation obtained from ICP was applied to the entire moving volume. Since the surface of the retina is usually perpendicular to the axial direction, the surface alignment stage has the highest sensitivity to displacements in the axial dimension.
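
The surface-alignment step can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the helper name ilmSurface, the smoothing sigma, the fixed normalized threshold, and the use of imwarp to resample the moving volume (which assumes a consistent (x, y, z) axis convention between the point clouds and the image grid) are all assumptions.

```matlab
% Extract the ILM surface of each pre-processed volume, align the two
% surfaces with ICP, and apply the resulting rigid transform to the volume.
ptsFix = ilmSurface(Vfixed);
ptsMov = ilmSurface(Vmoving);
tform  = pcregistericp(ptsMov, ptsFix, 'Metric', 'pointToPoint');
VmovSA = imwarp(Vmoving, tform, 'OutputView', imref3d(size(Vfixed)));

function pts = ilmSurface(V)
% ILM = topmost suprathreshold voxel of each A-scan after Gaussian smoothing.
BW = rescale(imgaussfilt3(double(V), 2)) > 0.3;   % illustrative sigma/threshold
[nx, ny, ~] = size(BW);
[X, Y] = ndgrid(1:nx, 1:ny);
ilmZ = nan(nx, ny);
for ix = 1:nx
    for iy = 1:ny
        z = find(squeeze(BW(ix, iy, :)), 1, 'first');  % first bright voxel
        if ~isempty(z), ilmZ(ix, iy) = z; end
    end
end
valid = ~isnan(ilmZ);
pts   = pointCloud([X(valid), Y(valid), ilmZ(valid)]);
end
```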

Volumetric alignment was then performed on the fixed volume and the surface-aligned moving volume. Volumetric alignment was implemented in FSL (6.0.5.2) [22] using the ‘flirt’ function, a multi-resolution rigid-body registration algorithm. The fixed and moving OCT volumes were aligned with a 6 degree-of-freedom (DOF) rigid registration, a mutual information cost function, and trilinear interpolation. Rotational search parameters of −15 to 15 degrees around the x-axis, −15 to 15 degrees around the y-axis, and −5 to 5 degrees around the z-axis were used to restrict the search space. The volume alignment stage utilizes the entire fixed and moving OCT volumes, and its purpose is to create a more precise alignment than the previous step. Since surface alignment already accounts for axial displacement, the main effect of volumetric alignment is to correct for rotations around the fovea and lateral translations.
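
The corresponding FLIRT invocation might look like the sketch below, wrapped in a MATLAB system call. It assumes the surface-aligned volumes have already been written to NIfTI files (for example with niftiwrite); the file names are placeholders, and the mapping of the reported x/y/z rotation limits onto FLIRT's searchrx/searchry/searchrz axes is an assumption.

```matlab
% 6-DOF rigid registration with a mutual information cost, trilinear
% interpolation, and a restricted rotational search space (requires FSL
% on the system path).
cmd = ['flirt -in moving_sa.nii.gz -ref fixed.nii.gz ', ...
       '-out moving_va.nii.gz -omat moving_va.mat ', ...
       '-dof 6 -cost mutualinfo -interp trilinear ', ...
       '-searchrx -15 15 -searchry -15 15 -searchrz -5 5'];
status = system(cmd);
assert(status == 0, 'flirt returned a non-zero exit code');
```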

2.3 Alignment pipelines

A set of six automated alignment pipelines was implemented in MATLAB and FSL and compared using the Pilot and Cohort 1 datasets. These pipelines consist of three basic parts: pre-processing (PP), surface alignment (SA), and volumetric alignment (VA) (Fig. 2), with variations in the alignment methodology that could be empirically tested, as described below. The first OCT scan of each session was used as the fixed volume, and the repeated volumes were used as moving volumes. Pipelines 1-3 use the native OCT size of 245 × 245 × 1024 voxels based on a commercially available 3 × 3 × 2 mm³ scan pattern. Pipelines 4-6 use OCT volumes resampled to an isotropic size, with each voxel having a physical size of 10 × 10 × 10 µm³, resulting in volumes of 300 × 300 × 200 voxels. A parameterization of the six pipelines is sketched below.
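
The six pipelines differ only in whether the volumes are resampled isotropically and in which pre-processing variant is applied (none, cropping, or masking); this reading is inferred from Fig. 2 and Section 2.5. A hedged configuration sketch, with a hypothetical runPipeline driver standing in for the alignment module, could look like this:

```matlab
% Pipeline definitions as we read Fig. 2: Pipelines 1-3 operate on the native
% 245 x 245 x 1024 volumes and Pipelines 4-6 on 300 x 300 x 200 isotropically
% resized volumes; within each group the pre-processing is none, cropping, or
% masking, respectively. All pipelines then run SA followed by VA.
pipelines = struct( ...
    'id',      num2cell(1:6), ...
    'resize',  {false, false, false, true, true, true}, ...
    'preproc', {'none', 'crop', 'mask', 'none', 'crop', 'mask'});

results = cell(1, numel(pipelines));
for p = pipelines
    % runPipeline is a placeholder for the PP -> SA -> VA module of Fig. 1.
    results{p.id} = runPipeline(fixedVol, movingVol, p);
end
```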

Fig. 2. Summary of the alignment pipelines: resizing, cropping (crop), surface alignment (SA), and volumetric alignment (VA). Pipelines 1-3 use the native OCT size of 245 × 245 × 1024 voxels based on a commercially available 3 × 3 × 2 mm³ scan pattern. Pipelines 4-6 use OCT volumes resampled to an isotropic size, with each voxel having a physical size of 10 × 10 × 10 µm³, resulting in volumes of 300 × 300 × 200 voxels.

2.4 Volume averaging

To perform volume averaging, we set the first OCT volume for each subject’s eye as the fixed volume, set every other corresponding OCT volume as a moving volume, and performed pairwise alignment (Fig. 3). We then averaged the fixed and aligned volumes. Depending on the analysis performed, four to nine repeat volumes were averaged.
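
A minimal sketch of the averaging step is shown below; alignVolume is a placeholder for the full alignment module (PP, SA, VA) described above, and the accumulation in double precision is an implementation assumption.

```matlab
% vols: cell array of repeated acquisitions from one eye; vols{1} is fixed.
acc = double(vols{1});
for k = 2:numel(vols)
    % Align each subsequent volume to the first, then accumulate.
    acc = acc + double(alignVolume(vols{k}, vols{1}));
end
Vavg = acc / numel(vols);   % averaged volume with reduced speckle noise
```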

Fig. 3. Volume averaging scheme: A set of repeated OCT volumes are taken as input. The first volume is set as the fixed volume and the others are registered onto it. The fixed volume and the aligned volumes are then averaged together.

2.5 Statistical analysis

To assess the alignment quality of the six pipelines, we selected three metrics that have been used in prior work [18,20]: root mean square error (RMSE), normalized cross correlation (NCC), and mutual information (MI). All three metrics were calculated over the entirety of each fixed and moving volume. The volumes had a minimum voxel intensity of 0 and a maximum voxel intensity of 255. All metrics were computed immediately after the pre-processing, surface alignment, and volumetric alignment stages. For Pipelines 1, 2, and 3, the pre-processing metrics were computed on the original volume, after cropping, and after masking, respectively. For Pipelines 4, 5, and 6, the pre-processing metrics were computed on the resized volume, after resizing and cropping, and after resizing and masking, respectively. Root mean square error measures the difference in intensity between voxels of the fixed image and the moving image. A lower RMSE indicates better alignment. The following formula gives RMSE, where $I_F(v)$ is the intensity of the fixed volume at voxel $v$, $I_M(v)$ is the intensity of the moving volume at voxel $v$, and $N_v$ is the total number of voxels.

$$RMSE = \sqrt{\frac{1}{N_v}\sum_v \left(I_F(v) - I_M(v)\right)^2}$$

Normalized cross-correlation measures the strength of the statistical correlation between the voxel intensities of the fixed image and the moving image. NCC ranges from −1 to 1, with values closer to 1 indicating better alignment. The following formula gives NCC, where $I_F(v)$ is the intensity of the fixed volume at voxel $v$, $\overline{I_F}$ is the average intensity of the voxels in the fixed volume, $I_M(v)$ is the intensity of the moving volume at voxel $v$, and $\overline{I_M}$ is the average intensity of the voxels in the moving volume.

$$NCC = \frac{\sum_v \left(I_F(v) - \overline{I_F}\right)\left(I_M(v) - \overline{I_M}\right)}{\sqrt{\sum_v \left(I_F(v) - \overline{I_F}\right)^2 \sum_v \left(I_M(v) - \overline{I_M}\right)^2}}$$

Mutual information measures how well the signal in the moving image can be predicted given the signal intensity in the fixed image. It is especially useful when the images are acquired using different modalities. A higher MI indicates better alignment. The following formula gives MI, where $p_{F,M}(i,j)$ is the joint probability of a voxel having intensity $i$ in the fixed image and intensity $j$ in the moving image, $p_F(i)$ is the probability of a voxel having intensity $i$ in the fixed image, and $p_M(j)$ is the probability of a voxel having intensity $j$ in the moving image.

$$MI = \sum_i \sum_j p_{F,M}(i,j)\,\log_2\!\left(\frac{p_{F,M}(i,j)}{p_F(i)\,p_M(j)}\right)$$
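
For reference, the three alignment metrics can be computed directly from the voxel intensities. The sketch below is an illustration under stated assumptions rather than the authors' implementation: 8-bit intensities, a 256-bin joint histogram for MI, and RMSE normalized by the number of voxels.

```matlab
function [rmseV, nccV, miV] = alignmentMetrics(F, M)
% Alignment metrics between a fixed volume F and a moving volume M
% (both assumed to hold intensities in the range 0-255).
F = double(F(:));  M = double(M(:));

rmseV = sqrt(mean((F - M).^2));                       % root mean square error

dF = F - mean(F);  dM = M - mean(M);
nccV = sum(dF .* dM) / sqrt(sum(dF.^2) * sum(dM.^2)); % normalized cross corr.

% Mutual information from a 256 x 256 joint intensity histogram.
jh   = accumarray([floor(F) + 1, floor(M) + 1], 1, [256 256]);
pFM  = jh / sum(jh(:));
pF   = sum(pFM, 2);                                   % marginal of F (256 x 1)
pM   = sum(pFM, 1);                                   % marginal of M (1 x 256)
pInd = pF * pM;                                       % product of marginals
nz   = pFM > 0;
miV  = sum(pFM(nz) .* log2(pFM(nz) ./ pInd(nz)));
end
```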

The three metrics were collected at three stages of the pipeline: after pre-processing, after surface alignment, and after volumetric alignment. To evaluate the noise reduction in the aligned and averaged images, we selected two metrics: signal-to-noise ratio (SNR) and peak signal-to-noise ratio (PSNR). The signal-to-noise ratio measures the strength of the signal compared to the background noise. For our analysis, we calculated SNR over homogeneous patches of the retina. A higher SNR is desirable since it indicates a lower noise level. According to theory [23], SNR should increase in proportion to the square root of the number of images being averaged. The following formula gives SNR, where $\mu_{signal}$ is the average signal intensity over a homogeneous region of the image and $\sigma_{signal}$ is the standard deviation of the signal intensity in that region.

$$SNR = \frac{\mu_{signal}}{\sigma_{signal}}$$

The peak signal-to-noise ratio is another metric that compares the signal strength of the averaged image to the background noise. A higher PSNR is desirable since it indicates a lower noise level. The following formula gives PSNR, where $MAX_I$ is the maximum possible signal strength of a voxel (256), $I_R$ is the reference image (the average of the fixed image with all the aligned moving images), $I_R(v)$ is the signal intensity at voxel $v$ in the reference image, $I_N$ is the comparison image (the average of the fixed image with the first $N-1$ moving images), and $I_N(v)$ is the signal intensity at voxel $v$ in the comparison image.

$$PSNR = 10\log_{10}\!\left(\frac{MAX_I}{\frac{1}{N_v}\sum_v \left(I_R(v) - I_N(v)\right)^2}\right)$$
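
The two quality metrics can be sketched as follows. The homogeneous region of interest is hypothetical, and the PSNR expression follows the formula as printed above (note that the conventional definition of PSNR uses $MAX_I^2$ in the numerator).

```matlab
% SNR over a homogeneous retinal patch of the averaged volume Vavg
% (the ROI indices below are purely illustrative).
roi  = double(Vavg(140:160, 140:160, 90:110));
snrV = mean(roi(:)) / std(roi(:));

% PSNR between the reference image Iref (average of all aligned volumes)
% and a comparison image In (average of the first N-1 moving volumes),
% following the formula in the text with MAX_I = 256.
maxI  = 256;
mse   = mean((double(Iref(:)) - double(In(:))).^2);
psnrV = 10 * log10(maxI / mse);
```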

3. Results

3.1 Subject description

First, a pilot (Pilot) dataset of nine scans was collected from the right eye of a single healthy adult male subject. The purpose of the Pilot dataset was to evaluate the performance of the pipeline on a relatively simple and homogeneous dataset with modest variability. Next, a larger, non-homogeneous dataset (Cohort 1) containing 16 subjects (32 eyes, with four repeated scans per eye per imaging session) was used to test the robustness of the pipeline on clinical data. Eight of the 16 subjects had data from a second repeat OCT imaging session available within approximately two weeks of the first scan session. In total, there were 24 scan sessions and 192 available scans. During quality control, 11 scans were discarded due to severe motion artifacts, leaving 181 scans to assess the robustness of the algorithm on diseased eyes. Of the 16 subjects, 5 were diabetic (DM), 3 had hypertension (HTN), and 9 had hyperlipidemia (HLD). Furthermore, 2 of the 5 diabetic subjects had mild diabetic retinopathy, and the other 3 had normal eyes. Further details of both datasets are shown in Table 1. Signal strength refers to the amount of backscattered light signal received by the device and ranges from 0 to 10, where a higher number means a better signal. The Pilot dataset had an average signal strength of 9.2 and Cohort 1 had an average signal strength of 9.3. Furthermore, in the Cohort 1 dataset, all of the subjects’ right eyes were dilated.


Table 1. Dataset Description

3.2 Alignment pipelines

All six pipelines described in Fig. 2 were run on the Pilot dataset. The post-alignment quality metrics and run times are shown in Table 2. For each metric, a two-sample, one-tailed t-test was used to compare the best performing pipeline with the next best pipeline. Pipeline 4 obtained a significantly higher mutual information than Pipeline 5, the next best performing pipeline (p < 0.01). Pipeline 4 also had the lowest RMSE (p = 0.070) and highest NCC (p = 0.090) compared to the next best pipeline for each metric, but these differences were not statistically significant. For some pipelines the alignment failed during the volumetric alignment stage, and the aligned moving volume was an all-black image. For those pipelines, NCC could not be computed, and the recorded NCC value is 0.00 ± 0.00. Pipeline 4 had the fastest time per alignment at 3.62 minutes. Finding the ILM through binary thresholding, as described in the Methods section, took an average of 1.72 s per volume. Pipelines 1 and 3 required the longest run times due to the large size of the volumes. For the Pilot dataset, Pipeline 4 had the best results when considering the metrics and run time.


Table 2. Performance Metrics for Registration and Averaging of Pipelines from Fig. 2 on Data from Pilot Subject a

Pipelines 2, 4, 5, and 6 as described in Fig. 2 were run on the Cohort 1 dataset, and the post-alignment quality metrics and run times are shown in Table 3. Based on the Pilot dataset, Pipelines 1 and 3 take around 10 hours per alignment and did not show promising results; they were therefore excluded from further analyses on Cohort 1 data. We used a two-sample, one-tailed t-test to compare the value from the best performing pipeline with the value of the next best pipeline on Cohort 1 data. Pipeline 4 had a significant performance improvement over the next best pipeline for the RMSE (p < 0.01) and MI (p < 0.01) metrics. For NCC, Pipeline 6 performed better than Pipeline 4, but the difference was not significant (p = 0.29). Pipeline 6 had the fastest time per alignment at 3.02 minutes. For the Cohort 1 dataset, Pipeline 4 had the overall best results when considering the metrics and run time.


Table 3. Performance Metrics for Registration and Averaging of Pipelines from Fig. 2 on Data from Cohort 1a

To illustrate the alignment process, we visualized the foveal slice of the OCT images pre-alignment and post-alignment for three fixed and moving image pairs (Fig. 4). In all the alignments, the pre-aligned images have a large axial offset equivalent to several hundred microns.

Fig. 4. Representative foveal B-scans and en face images from the Pilot dataset showing qualitative results of the Pipeline 4 alignment. The moving image (cyan) is overlaid on the fixed image (red) in both the pre-alignment (left) and post-alignment (right) panels.

3.3 Alignment pipelines – module comparison

The six pipelines shown in Fig. 2 were evaluated on the Pilot dataset. The three metrics were collected after the pre-processing, surface alignment, and volumetric alignment stages. The metrics at each stage were plotted to assess whether the pipelines improved the quality of the alignment. From Table S2, Pipelines 1, 2, and 3 had average RMSE values of 44.66, 44.62, and 56.81 at the pre-processing stage, and RMSE values of 47.97, 81.00, and 52.74 after the volumetric alignment stage, respectively. Thus, Pipelines 1, 2, and 3 failed to substantially decrease RMSE, and in some cases increased it, which is undesirable. In comparison, Pipelines 4, 5, and 6 had average RMSE values of 46.18, 44.22, and 55.81 at the pre-processing stage, and RMSE values of 15.33, 39.36, and 21.55 after the volumetric alignment stage, respectively. Thus, Pipelines 4, 5, and 6 substantially decreased RMSE, which is desirable. As Fig. 5 shows, for NCC and MI, Pipelines 4, 5, and 6 generally improve the quality of the alignment at each stage, as shown by the increasing metric values, while Pipelines 1, 2, and 3 tend to decrease the quality or leave it unchanged.

Fig. 5. All six pipelines were tested on the Pilot dataset, and Pipelines 2, 4, 5, and 6 were tested on the Cohort 1 dataset. Root mean square error (RMSE), normalized cross correlation (NCC), and mutual information (MI) were recorded after pre-processing, after surface alignment, and after volumetric alignment (see Fig. 2). Standard error bars are shown. The alignment of Pipelines 4 and 6 after the VA stage was significantly better (p < 0.01) than the alignment after the PP stage of those same pipelines for all three metrics. See Tables S1 and S2 for the underlying values.

Next, the Cohort 1 dataset was tested. Pipelines 4, 5, and 6 perform better as the pipeline progresses, while Pipeline 2 performs worse. Pipelines 1 and 3 were excluded, as mentioned above. The metrics at each stage were plotted to assess whether the pipelines improved the quality of the alignment. To do this, we took our best performing pipeline, Pipeline 4, which isotropically resizes the images, and compared the metrics at different stages of the pipeline. Specifically, we tested whether the quality of the alignment for this single pipeline after the volumetric alignment stage is better than the quality after pre-processing. The one-tailed t-tests for RMSE, NCC, and MI all had p values < 0.01. The SA stage improved the registration metrics for Pipelines 4 and 6, the two best performing pipelines. For Pipeline 4, the best performing pipeline, the SA stage improved RMSE from 26.71 to 13.07 (p < 0.01), NCC from 0.40 to 0.86 (p < 0.01), and MI from 0.36 to 0.82 (p < 0.01). For Pipeline 6, the next best performing pipeline, the SA stage improved RMSE from 40.27 to 13.86 (p < 0.01), NCC from 0.40 to 0.92 (p < 0.01), and MI from 0.29 to 0.79 (p < 0.01). To determine whether the VA stage improved alignment quality, we performed a two-tailed t-test using the data from supplemental Tables S1 and S2. Eight alignments were performed in the Pilot dataset, but the VA stage did not appear to improve the alignment metrics, probably because of the small sample size. However, for the Cohort 1 dataset, 133 alignments were performed. VA significantly lowered RMSE in Pipelines 2 (p < 0.0001), 4 (p = 0.0007), and 5 (p = 0.0306). VA also significantly increased NCC for Pipelines 4 (p = 0.0033) and 5 (p < 0.0001) and significantly increased MI for Pipelines 4 (p < 0.0001) and 5 (p = 0.0004).

3.4 Volume averaging

Using the process outlined in Fig. 3, we averaged the aligned scans from the Pilot and Cohort 1 datasets. To visualize the reduction in speckle noise, we progressively increased the number of images averaged together in the Pilot dataset. As expected, Fig. 6 shows that increasing the number of averaged images reduces speckle noise in the OCT B-scans. We observe that averaging four volumes already provides a marked improvement in image quality, and averaging additional images continues to improve layer boundary and blood vessel definition to some degree.

Fig. 6. Qualitative illustration of the effect of image averaging on OCT B-scan appearance: a) baseline unaveraged volume, b) four averaged volumes, c) seven averaged volumes, d) nine averaged volumes. The yellow boxes show a magnified portion of the retina. The averaged images have less noise, and the layer boundaries as well as the vessels in the ganglion cell layer are qualitatively better defined.

Fig. 7. Quantitative illustration of the effect of image averaging on OCT B-scan intensity profiles. Data are from the Pilot dataset. Three B-scans (left column) and corresponding A-scans (right column) from the inferior macula (top; 0.5 mm below the fovea), the fovea (center), and the superior macula (bottom; 0.5 mm above the fovea) are visualized. For each A-scan, the location in the corresponding B-scan is denoted by the vertical white line in the left column. For each A-scan, traces for the unaveraged volume (n = 1), four averaged volumes (n = 4), and nine averaged volumes (n = 9) are displayed in the plot on the right.

Fig. 8. SNR and PSNR calculated from the Cohort 1 dataset for different numbers of averaged images. Note that the SNR closely follows the best-fit line proportional to the square root of the number of averaged images. Standard deviation bars are shown for both SNR and PSNR.

To better understand the effect of averaging, we compared axial intensity profiles (A-scans) from different regions of the retina in the Pilot dataset at different stages of averaging (Fig. 7). As the number of averaged volumes increases, the axial intensity profile becomes less noisy, especially in the vitreous and choroid regions of the A-scan. For axial intensity profiles in the superior and inferior regions of the B-scan, the retinal region displays four hyper-reflective peaks corresponding to the following anatomic layers: the retinal nerve fiber layer, the inner plexiform layer, the outer plexiform layer, and the retinal pigment epithelium. These peaks are more clearly visible and less noisy in the averaged images.

To further quantitatively evaluate the effect of averaging on image quality, we calculated the average SNR and PSNR on the Cohort 1 dataset (Fig. 8). The average SNR of a single, unaveraged OCT image was 8.6. The average SNR of two averaged images was 13.8, of three averaged images 16.5, and of four averaged OCT scans 18.3. This is consistent with the predicted increase of SNR with √N, where N is the number of images being averaged; for example, going from one to four averages the predicted gain is √4 = 2, close to the observed ratio of 18.3/8.6 ≈ 2.1. The corresponding PSNR values were 30.4, 36.4, and 41.9. This confirms that PSNR also increases as the number of averaged images increases, indicating higher image quality.

4. Discussion

4.1 Summary

In this study, we have demonstrated a simple, automated pipeline for aligning 3D OCT image volumes, comprising three key stages: pre-processing, surface alignment, and volume alignment. Pre-processing consisted of cropping, masking, and resizing. Surface alignment used the ILM layer segmentation and an iterative closest point algorithm, while volume alignment used a rigid-body volumetric registration based on mutual information. The pipelines were evaluated on two distinct datasets. Pipeline 4, which performed isotropic resizing, achieved the best results. On Cohort 1, Pipeline 4 decreased RMSE from 26.71 to 11.75, increased NCC from 0.40 to 0.96, and increased MI from 0.36 to 0.91, with all changes being statistically significant. Furthermore, Pipeline 4 achieved significantly better RMSE and MI values than the next best performing pipeline. When the aligned images from Pipeline 4 were averaged, SNR and PSNR increased.

4.2 Comparison to others

Our approach to registering and aligning OCT image volumes leverages the unique anatomy of the retina, employing a two-step process that combines layer alignment and volume-based alignment. This strategy effectively addresses the challenges posed by retinal anatomy, with layer alignment primarily addressing axial (z-axis) alignment and volume-based adjustments refining the registration, particularly rotations around the z-axis. Our method capitalizes on the strengths of both layer-based and volume-based registration while mitigating their respective weaknesses, with both the SA and VA stages of the pipeline significantly improving RMSE, NCC, and MI on our data. Layer-based methods expedite registration by relying on retinal layer boundaries and measuring accuracy based on the absolute distance between layers in the fixed and moving images. However, these approaches are less robust in handling rotational displacements, which arise from the retina's rotational symmetry around the fundus.

In contrast, volumetric rigid-body methods utilize the entire volume during registration, employing metrics such as mutual information or cross-correlation to gauge alignment quality. These techniques excel in handling translational and rotational displacements but are often slower than feature-based techniques. Our method offers a balanced solution, using layer alignment for a quick initial alignment along the z-axis and volume-based adjustments to refine the registration, particularly for rotational displacements. This results in acceptable accuracy while maintaining efficiency. To simplify the registration, our approach primarily relies on affine transformations, which do not account for nonlinear distortions or image warping. In addition, we screened the volumes using a quality control process and rejected volumes with significant motion artifacts, a major source of nonlinear imaging artifacts. The advantage of using rigid registration is a fast and simple pipeline. In the future, we plan to add non-rigid registration steps (when necessary) to address more significant warping and further expand the applicability of this pipeline.

4.3 Limitations

Our quality control step, while effective in screening out OCT volumes with gross imaging artifacts such as motion artifacts, blink artifacts, and low signal strength, represents a manual intervention. In future work, we will automate this screening process or devise methods to correct these distortions. Achieving these enhancements will pave the way for a fully automated and robust alignment method that can find utility in various clinical and research applications. Additionally, we acknowledge that the current implementation of our method may not be optimized for speed. Streamlining the pipeline and optimizing it for computational efficiency will expedite the alignment process and allow for efficient batch processing of OCT volumes. This optimization holds promise for enhancing throughput and making our method suitable for large-scale OCT data analysis. Furthermore, smoothing or other denoising techniques may improve the accuracy of pipelines 1-3 and will be evaluated in future work.

4.4 Conclusion

Our method offers a compelling solution for OCT image volume registration. Its speed, simplicity, and performance offer a unique contribution to the field. By utilizing the layer anatomy of the retina as well as global information, our approach can accurately align images, which can then be averaged to improve image quality. The ability to streamline the registration process while maintaining precision holds great promise for improving clinical diagnostics and advancing research in the field of OCT imaging.

Funding

National Eye Institute (R01EY030564); National Institutes of Health (UH3NS100614); School of Medicine, Johns Hopkins University (Wilmer Eye Institute Pooled Professor Fund); Johns Hopkins University (Malone Center for Engineering in Healthcare Grant).

Acknowledgments

There are no acknowledgements.

Disclosures

AHK has received consulting fees, grant support and honoraria from Carl Zeiss Meditec unrelated to this project. Carl Zeiss Meditec played no role in the conception, drafting, review or submission of this manuscript.

Data Availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request and IRB approval.

Supplemental document

See Supplement 1 for supporting content (RMSE, NCC, and MI for each pipeline, recorded after the pre-processing, surface alignment, and volumetric alignment stages).

References

1. J. G. Fujimoto, W. Drexler, J. S. Schuman, et al., “Optical Coherence Tomography (OCT) in Ophthalmology: Introduction,” Opt Express 17, 3978 (2009). [CrossRef]  

2. M. Elgafi, A. Sharafeldeen, A. Elnakib, et al., “Detection of Diabetic Retinopathy Using Extracted 3D Features from OCT Images,” Sensors 22(20), 7833 (2022). [CrossRef]  

3. K. Y. Tey, K. Teo, A. C. S. Tan, et al., “Optical coherence tomography angiography in diabetic retinopathy: a review of current applications,” Eye and Vision 6(1), 37 (2019). [CrossRef]  

4. M. Elsharkawy, M. Elrazzaz, M. Ghazal, et al., “Role of optical coherence tomography imaging in predicting progression of age-related macular disease: A survey,” Diagnostics 11(12), 2313 (2021). [CrossRef]  

5. I. I. Bussel, G. Wollstein, and J. S. Schuman, “OCT for glaucoma diagnosis, screening and detection of glaucoma progression,” Br. J. Ophthalmol. 98(Suppl 2), ii15–ii19 (2014). [CrossRef]  

6. C. Danese and P. Lanzetta, “Optical Coherence Tomography Findings in Rhegmatogenous Retinal Detachment: A Systematic Review,” J Clin Med 11(19), 5819 (2022). [CrossRef]  

7. K. Jin Han and Y. Hoon Lee, “Optical coherence tomography automated layer segmentation of macula after retinal detachment repair,” PLoS One 13(5), e0197058 (2018). [CrossRef]  

8. J. Chhablani, T. Krishnan, V. Sethi, et al., “Artifacts in optical coherence tomography,” Saudi Journal of Ophthalmology 28(2), 81–87 (2014). [CrossRef]  

9. J. M. Schmitt, S. H. Xiang, and K. M. Yung, “Speckle In Optical Coherence Tomography,” J Biomed Opt 4(1), 95–105 (1999). [CrossRef]  

10. A. Ozcan, A. Bilenca, A. E. Desjardins, et al., “Speckle Reduction in Optical Coherence Tomography Images Using Digital Filtering,” JOSA A 24(7), 1901–1910 (2007). [CrossRef]  

11. D. C. Adler, T. H. Ko, and J. G. Fujimoto, “Speckle Reduction in Optical Coherence Tomography Images by Use of a Spatially Adaptive Wavelet Filter,” Opt Lett 29(24), 2878 (2004). [CrossRef]  

12. P. Zhang, E. B. Miller, S. K. Manna, et al., “Temporal speckle-averaging of optical coherence tomography volumes for in-vivo cellular resolution neuronal and vascular retinal imaging,” Neurophotonics 6(04), 1 (2019). [CrossRef]  

13. M. Bashkansky and J. Reintjes, “Statistics and reduction of speckle in optical coherence tomography,” Opt. Lett. 25(8), 545–547 (2000).

14. S. Cao, X. Yao, N. Koirala, et al., “Super-resolution technology to simultaneously improve optical & digital resolution of optical coherence tomography via deep learning,” in Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS (Institute of Electrical and Electronics Engineers Inc., 2020), Vol. 2020-July, pp. 1879–1882.

15. L. Pan, F. Shi, D. Xiang, et al., “Octrexpert: A feature-based 3D registration method for retinal OCT images,” IEEE Transactions on Image Processing 29, 3885–3897 (2020). [CrossRef]  

16. M. Golabbakhsh and H. Rabbani, “Vessel-based registration of fundus and optical coherence tomography projection images of retina using a quadratic registration model,” IET Image Process 7(8), 768–776 (2013). [CrossRef]  

17. M. Chen, A. Lang, H. S. Ying, et al., “Analysis of macular OCT images using deformable registration,” Biomed Opt Express 5(7), 2196 (2014). [CrossRef]  

18. L. Pan and X. Chen, “Retinal OCT Image Registration: Methods and Applications,” IEEE Rev Biomed Eng 16, 307–318 (2023). [CrossRef]  

19. L. Pan, L. Guan, and X. Chen, “Segmentation Guided Registration for 3D Spectral-Domain Optical Coherence Tomography Images,” IEEE Access 7, 138833–138845 (2019). [CrossRef]  

20. Y. Cheng, Z. Chu, and R. K. Wang, “Robust three-dimensional registration on optical coherence tomography angiography for speckle reduction and visualization,” Quant Imaging Med Surg 11(3), 879–894 (2020). [CrossRef]  

21. M. M. Khansari, J. Zhang, Y. Qiao, et al., “Automated Deformation-Based Analysis of 3D Optical Coherence Tomography in Diabetic Retinopathy,” IEEE Trans Med Imaging 39(1), 236–245 (2020). [CrossRef]  

22. M. Jenkinson, C. F. Beckmann, T. E. J. Behrens, et al., “FSL,” Neuroimage 62(2), 782–790 (2012). [CrossRef]  

23. M. J. Tapiovaara and R. F. Wagner, “SNR and Noise Measurements for Medical Imaging: I. A Practical Approach Based on Statistical Decision Theory,” Phys. Med. Biol. 38(1), 71–92 (1993). [CrossRef]  

24. M. F. Kraus, J. J. Liu, J. Schottenhamml, et al., “Quantitative 3D-OCT motion correction with tilt and illumination correction, robust similarity measure and regularization,” Biomed. Opt. Express 5(8), 2591–2613 (2014). [CrossRef]  

25. S. B. Ploner, M. F. Kraus, E. M. Moult, et al., “Efficient and high accuracy 3-D OCT angiography motion correction in pathology,” Biomed. Opt. Express 12(1), 125–146 (2021). [CrossRef]  
