Multi-scale GCN-assisted two-stage network for joint segmentation of retinal layers and discs in peripapillary OCT images

Open Access

Abstract

An accurate and automated tissue segmentation algorithm for retinal optical coherence tomography (OCT) images is crucial for the diagnosis of glaucoma. However, due to the presence of the optic disc, the anatomical structure of the peripapillary region of the retina is complicated and is challenging for segmentation. To address this issue, we develop a novel graph convolutional network (GCN)-assisted two-stage framework to simultaneously label the nine retinal layers and the optic disc. Specifically, a multi-scale global reasoning module is inserted between the encoder and decoder of a U-shape neural network to exploit anatomical prior knowledge and perform spatial reasoning. We conduct experiments on human peripapillary retinal OCT images. We also provide public access to the collected dataset, which might contribute to the research in the field of biomedical image processing. The Dice score of the proposed segmentation network is 0.820 ± 0.001 and the pixel accuracy is 0.830 ± 0.002, both of which outperform those from other state-of-the-art techniques.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Glaucoma is the leading cause of irreversible blindness, affecting approximately 64.3 million individuals worldwide [1]. In China, around 13.12 million people were affected by glaucoma in 2015, and the number is projected to reach 25.16 million by 2050 [2], which imposes a heavy burden on public health. Currently, the most effective way to prevent glaucoma-related vision loss is early diagnosis and early intervention. In particular, detecting small morphological changes of the retinal layers, such as the thinning of the retinal nerve fiber layer (RNFL) and ganglion cell layer (GCL), has critical value for the precise diagnosis of glaucoma [3].

Optical coherence tomography (OCT), a non-invasive three-dimensional imaging modality, is commonly used in eye clinics for retinal inspection. Owing to its micrometer-level axial resolution, it provides a unique capability to directly visualize the stratified structure of the retina and assess the corresponding layer thicknesses. The OCT-derived thickness of the peripapillary RNFL is a common indicator in early-stage glaucoma diagnosis [4]. Therefore, precise tissue segmentation of retinal OCT images becomes a critical step towards successful early diagnosis of glaucoma.

However, manual segmentation is time-consuming and laborious, so an accurate automated algorithm is desired by both clinicians and researchers. Numerous automated retinal OCT segmentation techniques have been proposed in the past decades [5–17]. OCTExplorer is a prominent example for retinal layer boundary extraction, which is based on a conventional graph-theory algorithm [8]. Alonso-Caneiro et al. and Tian et al. designed graph-theory-based boundary detectors to extract the choroidal boundary after image pre-processing [6,14]. Mayer et al. proposed an energy-minimization-based algorithm for retinal nerve fiber layer surface segmentation in circular OCT images [10]. Lang et al. presented a random forest boundary classifier to segment eight retinal layers in macular cube images [9]. Chiu et al. reported a kernel regression-based segmentation method for retinal OCT images with diabetic macular edema [7].

Recently, convolutional neural networks (CNNs) have been widely applied to segment images obtained from various modalities and thus enable exciting applications [18–26]. Fully convolutional networks (FCN) [27] and U-Net [28] are two popular candidates for medical image segmentation. For retinal OCT image segmentation, most state-of-the-art models [29–38] could be considered variants of the encoder-decoder architecture like FCN and U-Net. Roy et al. proposed ReLayNet for end-to-end segmentation of macular OCT B-scans into retinal layers and fluid masses [34]. This work is among the first deep learning-based methods for automated segmentation of retinal OCT images. Yang et al. designed an attention-guided channel-to-pixel convolution network for retinal layer segmentation with choroidal neovascularization [37]: a channel-to-pixel block and an "edge loss function" were used to segment retinal layers with blurry boundaries. To address the large morphological variations of the retinal layers, they also employed an attention mechanism. However, these two techniques mainly targeted macular retinal image segmentation rather than that of peripapillary retinal images. Zang et al. developed an automated method for peripapillary retinal layer segmentation [38]. The left and right boundaries of the optic disc were first determined based on the estimated position of Bruch’s membrane opening in radially resampled B-scans. The retinal layer boundaries were then segmented by combining a convolutional neural network with a multi-weight graph search algorithm. Devalla et al. proposed a dilated-residual U-Net (DRUNET) to facilitate end-to-end segmentation of the individual neural and connective tissues of the optic nerve head [30]. However, they only segmented the retina into five layers and did not fully segment the optic disc from its connected tissues. In summary, most aforementioned techniques perform segmentation based on the textural features of the OCT images, while the abundant anatomical priors available in peripapillary retinal OCT images are not utilized.

In this manuscript, we report our recent study on explicitly exploiting the prior knowledge that exists in peripapillary OCT images. We argue that all peripapillary OCT images obtained by following a strict clinical protocol should share a similar anatomical arrangement: the optic nerve head, which is a large structure, is located in the center region of the image, while much thinner retinal layers are stratified on both sides, as shown in Fig. 1. Inspired by Atif et al.’s work that uses a graph to represent the domain knowledge and the structural relationship of the tissues [39], we design a novel multi-scale graph convolutional network (GCN)-assisted two-stage network for joint segmentation of retinal layers and the optic disc in peripapillary OCT images to fully take advantage of the anatomical priors. To show the efficacy of the proposed framework, experiments are conducted on a collected peripapillary OCT dataset, which consists of 122 OCT B-scans from 61 patients, and on another public dataset [7]. The proposed model demonstrates superior performance on both datasets in comparison with the baselines and state-of-the-art methods. In the future, we plan to integrate the proposed segmentation framework into a diagnostic workflow for early-stage glaucoma detection. The dataset and the source codes are publicly available online at https://github.com/Jiaxuan-Li/MGU-Net.

Fig. 1. The comparison between (a) macular OCT image and (b) peripapillary OCT image. The peripapillary image is manually segmented. Ten labels including RNFL, GCL, IPL, INL, OPL, ONL, IS/OS, RPE, choroid, and optic disc are used and annotated by different colors. The layer structure follows an arrangement in which the optic nerve head is located in the center of the image, while much thinner retinal layers are stratified on both sides.

2. Method

In the current study, 10 labels, including the retinal nerve fiber layer (RNFL), ganglion cell layer (GCL), inner plexiform layer (IPL), inner nuclear layer (INL), outer plexiform layer (OPL), outer nuclear layer (ONL), inner/outer photoreceptor segment (IS/OS), retinal pigment epithelium (RPE), choroid, and optic disc, are manually annotated on the OCT dataset to facilitate the training procedure, as illustrated in Fig. 1(b).

2.1 Overview of the segmentation framework

The schematic diagram of the proposed segmentation framework is given in Fig. 2. The entire framework consists of three components: the optic disc detection network, the retinal layer segmentation network, and the fusion module. An input OCT image is first processed by the optic disc detection network, which produces a mask indicating the location of the optic disc along with the corresponding feature map. We then apply the mask to the input image to generate a disc-free image, which is later fed to the retinal layer segmentation network. Similarly, a feature map that delineates the nine retinal layers is obtained and later concatenated with that of the optic disc from the first stage. Finally, a softmax activation function is used in the fusion module to generate the segmented output based on the concatenated feature map. The entire framework is trained in an end-to-end fashion: two loss functions are defined to penalize both the intermediate disc detection and the final segmentation.

Fig. 2. Schematic diagram of the proposed segmentation framework. (a) The entire process consists of three main steps: (1) the first stage is for initial optic disc detection; (2) the second stage is for retinal layer segmentation; (3) the outputs from the previous two stages are fused. (b) A simplified illustration of the multi-scale GCN-assisted U-shape network (MGU-Net). MGU-Net consists of an encoder-decoder pair with a multi-scale global reasoning module (MGRM) inserted in between. The detailed structures of MGU-Net and MGRM can be found in Fig. 6 and Fig. 5, respectively.
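
To make the data flow above concrete, a minimal PyTorch sketch of the two-stage forward pass is given below. It is illustrative only: the class names, the channel counts, the assumption that each stage returns a feature map together with its logits, and the way the predicted disc mask is applied to produce the disc-free image are simplifications rather than the exact released implementation (see the repository linked in Section 1).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoStageFramework(nn.Module):
    """Sketch of the two-stage flow: disc detection -> disc-free image ->
    layer segmentation -> feature fusion (interfaces are assumed)."""
    def __init__(self, disc_net, layer_net, feat_ch=32, n_classes=11):
        super().__init__()
        self.disc_net = disc_net      # stage 1: optic disc detection (an MGU-Net)
        self.layer_net = layer_net    # stage 2: retinal layer segmentation (an MGU-Net)
        # fusion module: 1x1 convolution over the concatenated stage features
        self.fusion = nn.Conv2d(2 * feat_ch, n_classes, kernel_size=1)

    def forward(self, x):
        disc_feat, disc_logits = self.disc_net(x)            # feature map + disc logits
        disc_mask = disc_logits.argmax(dim=1, keepdim=True)  # 1 where the disc is predicted
        disc_free = x * (disc_mask == 0)                     # mask out the disc region
        layer_feat, layer_logits = self.layer_net(disc_free) # nine-layer features
        fused = torch.cat([disc_feat, layer_feat], dim=1)    # concatenate both stages
        out = F.softmax(self.fusion(fused), dim=1)           # final segmentation map
        return disc_logits, layer_logits, out
```

Both the intermediate disc logits and the final map are returned so that the two supervision signals described in Section 2.5 can be applied.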

Specifically, the optic disc detection network and the retinal layer segmentation network are designed to exploit the anatomical priors of the peripapillary region of the retina and to address the segmentation challenges imposed by the variation of the thicknesses among different retinal layers as illustrated in Fig. 3. In the right panel of Fig. 3, it is observed that the non-disc and disc regions are horizontally arranged in a “non-disc”-“disc”-“non-disc” fashion on the global scale. On the other hand, if we zoom in to the “non-disc” region as shown in the left panel of Fig. 3, we could appreciate that the retinal layers with various thicknesses are instead vertically arranged. Therefore, the optic disc detection and the retinal layer segmentation networks are devised with graph reasoning blocks with different design goals: the former is to perform long-range horizontal spatial reasoning, while the latter is to capture the multi-level vertical structures.

Fig. 3. Graph-based representation of peripapillary retinal OCT image. The image possesses a horizontal layout as “non-disc”-“disc”-“non-disc” on the global scale. A zoomed-in view of the “non-disc” region presents a stratified structure instead. Our segmentation framework is designed to exploit the anatomical priors of the peripapillary region of the retina and to address the segmentation challenges caused by the variation of thickness among retinal layers.

2.2 Graph reasoning block

The key module used in both the optic disc detection network and the retinal layer segmentation network is the graph reasoning block.

Inspired by the graph-based global reasoning network [40,41], we devise a graph reasoning block to effectively extract the global features of the nine retinal layers and the optic disc. The schematic diagram of the graph reasoning block, which consists of four operations, is depicted in Fig. 4. First of all, a local feature map $\mathrm {X} \in \mathbb {R}^{\mathrm {C} \times \mathrm {H} \times \mathrm {W}}$ in the latent space is fed to two convolutional layers in parallel to generate two maps: one feature map with reduced dimension and one projection matrix. After that, the reduced-dimension feature is reshaped to $\mathrm {X}_{\mathrm {r}} \in \mathbb {R}^{\mathrm {C}_{\mathrm {r}} \times \mathrm {HW}},$ while the projection matrix is reshaped and transposed to $\mathrm {X}_{\mathrm {a}} \in \mathbb {R}^{\mathrm {C}_{\mathrm {n}} \times \mathrm {HW}}$. A matrix multiplication between $\mathrm {X}_{\mathrm {r}}$ and $\mathrm {X}_{\mathrm {a}}$ is then performed to obtain a node feature map $\mathrm {H} \in \mathbb {R}^{\mathrm {C}_{\mathrm {r}} \times \mathrm {C}_{\mathrm {n}}}$ before it is sent to a GCN block. We further connect $\mathrm {X}$ to a convolutional layer, and reshape the output to an inverse projection matrix $\mathrm {X}_{\mathrm {d}} \in \mathbb {R}^{\mathrm {C}_{\mathrm {n}} \times \mathrm {H} \times \mathrm {W}}$. The output of the GCN block is multiplied by $\mathrm {X}_{\mathrm {d}}$ to transform back to the original latent space, and is then reshaped and passed through another convolutional layer to obtain the feature map $\mathrm {M} \in \mathbb {R}^{\mathrm {C} \times \mathrm {H} \times \mathrm {W}}$. Finally, we perform an element-wise addition of $\mathrm {M}$ and $\mathrm {X}$ to acquire the new feature map $\mathrm {Y} \in \mathbb {R}^{\mathrm {C} \times \mathrm {H} \times \mathrm {W}}$.

Fig. 4. The schematic diagram of a graph reasoning block, which consists of four operations. First of all, the original features are projected to the node space. After that, graph convolutions are performed on the node-space features to extract global node features. In order to fuse the global node features with the original features, they are then inversely projected back to the original feature space before being fused with the original features.

It is clear that the new feature map Y contains information from both the global features and the original features, which enables it to process long-range contextual information.
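
A compact PyTorch sketch of the graph reasoning block is given below. The projection, inverse projection, and residual fusion follow the four operations described above; the specific form of the GCN block (two 1 $\times$ 1 convolutions acting on the node dimensions, in the spirit of the graph-based global reasoning network [40]) and the activation are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class GraphReasoningBlock(nn.Module):
    """Sketch of the graph reasoning block: project to node space, reason with
    a small GCN, project back, and fuse with the input (parameterization assumed)."""
    def __init__(self, c, c_r, c_n):
        super().__init__()
        self.reduce = nn.Conv2d(c, c_r, 1)      # dimension-reduced feature X_r
        self.project = nn.Conv2d(c, c_n, 1)     # projection matrix X_a
        self.reproject = nn.Conv2d(c, c_n, 1)   # inverse projection matrix X_d
        self.gcn_adj = nn.Conv1d(c_n, c_n, 1)   # reasoning across the node dimension
        self.gcn_state = nn.Conv1d(c_r, c_r, 1) # node state update
        self.expand = nn.Conv2d(c_r, c, 1)      # back to the original C channels

    def forward(self, x):
        b, c, h, w = x.shape
        x_r = self.reduce(x).view(b, -1, h * w)            # B x C_r x HW
        x_a = self.project(x).view(b, -1, h * w)           # B x C_n x HW
        nodes = torch.bmm(x_r, x_a.transpose(1, 2))        # node features H: B x C_r x C_n
        # GCN block: information diffusion across nodes, then a state update
        nodes = nodes + self.gcn_adj(nodes.transpose(1, 2)).transpose(1, 2)
        nodes = torch.relu(self.gcn_state(nodes))
        x_d = self.reproject(x).view(b, -1, h * w)         # B x C_n x HW
        m = torch.bmm(nodes, x_d).view(b, -1, h, w)        # back to the latent space
        return x + self.expand(m)                          # element-wise fusion: Y = X + M
```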

2.3 Multi-scale global reasoning module

To address the segmentation challenges caused by the large variation of the retinal thicknesses between different layers, we propose a multi-scale global reasoning module (MGRM) to conduct global reasoning on high-level semantic features in all nine retinal layers. MGRM is composed of multi-scale pooling operators and graph reasoning blocks. It uses multiple effective receptive fields to capture and learn the features of the retinal layers with different thicknesses.

The MGRM splits the input into four different paths, three of which are equipped with pooling layers of different kernel sizes followed by graph reasoning blocks: inspired by [42], the kernel sizes are set to 2 $\times$ 2, 3 $\times$ 3 and 5 $\times$ 5 to encode the information of retinal layers with different thicknesses to global feature maps. Then, the global features are up-sampled to match the size of the original input feature map by bilinear interpolation. Finally, all four paths are re-combined and the features are concatenated. The entire procedure is illustrated in Fig. 5.

Fig. 5. The structure of the multi-scale global reasoning module, which is composed of four branches. The first branch has no pooling operator; the second, third, and fourth branches contain pooling operators with 2 $\times$ 2, 3 $\times$ 3, and 5 $\times$ 5 kernels, respectively.
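
The module can be sketched in PyTorch as follows, reusing the GraphReasoningBlock from the previous sketch. The choice of average pooling and the 1 $\times$ 1 convolution used to fuse the four concatenated paths are assumptions; the branch layout, the 2 $\times$ 2, 3 $\times$ 3, and 5 $\times$ 5 kernels, and the bilinear up-sampling follow the description above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MGRM(nn.Module):
    """Sketch of the multi-scale global reasoning module: one identity branch plus
    three pooled branches, each followed by graph reasoning and bilinear up-sampling."""
    def __init__(self, c, c_r, c_n, pool_sizes=(2, 3, 5)):
        super().__init__()
        self.pool_sizes = pool_sizes
        self.branches = nn.ModuleList(
            [GraphReasoningBlock(c, c_r, c_n) for _ in pool_sizes]
        )
        self.fuse = nn.Conv2d(4 * c, c, kernel_size=1)  # recombine the four paths

    def forward(self, x):
        h, w = x.shape[-2:]
        outs = [x]                                       # first branch: no pooling
        for k, grb in zip(self.pool_sizes, self.branches):
            y = F.avg_pool2d(x, kernel_size=k)           # enlarge the effective receptive field
            y = grb(y)                                   # global reasoning at this scale
            y = F.interpolate(y, size=(h, w), mode='bilinear', align_corners=False)
            outs.append(y)
        return self.fuse(torch.cat(outs, dim=1))         # concatenate and fuse
```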

2.4 Multi-scale GCN-assisted U-shape network

We use a U-shape network developed on the basis of the classic U-Net [28] as the backbone of the proposed multi-scale GCN-assisted U-shape network (MGU-Net). The schematic diagram of MGU-Net is presented in Fig. 6. The MGRM is located in the center of the network to connect the encoding and decoding paths. It captures additional long-range contextual features, which are difficult to acquire in a conventional neural network. After several convolution and max-pooling operations in the encoder, the feature map provides rich spatial features, which are informative for aggregating features and extracting nodes in the following MGRM. It should be noted that the max-pooling kernel sizes differ between the optic disc detection network and the retinal layer segmentation network: the three max-pooling kernels are set to (2, 2, 2) in the retinal layer segmentation network as in Fig. 6, while those of the optic disc detection network are set to (2, 4, 4) to better capture the larger-scale semantic information represented by the optic disc.

Fig. 6. The structure of MGU-Net comprises an encoder, the MGRM, and a decoder. Skip connections concatenate low-level features from the encoding path with the corresponding high-level features in the decoding path.
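
The overall MGU-Net can then be assembled as an encoder, the MGRM bottleneck, and a decoder with skip connections. The skeleton below is a simplified sketch that reuses the MGRM defined earlier; the channel widths, the number of node channels, and the bilinear up-sampling in the decoder are illustrative assumptions, while the stage-specific pooling kernels (2, 2, 2) and (2, 4, 4) follow the text.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    """Two 3x3 convolutions with batch normalization and ReLU (standard U-Net block)."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))

class MGUNet(nn.Module):
    """Skeleton of the multi-scale GCN-assisted U-shape network (widths assumed)."""
    def __init__(self, in_ch=1, n_classes=11, widths=(32, 64, 128), pools=(2, 2, 2)):
        super().__init__()
        self.enc, self.pool = nn.ModuleList(), nn.ModuleList()
        c_prev = in_ch
        for c, k in zip(widths, pools):
            self.enc.append(conv_block(c_prev, c))
            self.pool.append(nn.MaxPool2d(k))
            c_prev = c
        self.mgrm = MGRM(c_prev, c_prev // 2, 16)       # bottleneck global reasoning
        self.up, self.dec = nn.ModuleList(), nn.ModuleList()
        for c, k in zip(reversed(widths), reversed(pools)):
            self.up.append(nn.Upsample(scale_factor=k, mode='bilinear', align_corners=False))
            self.dec.append(conv_block(c_prev + c, c))  # skip connection widens the input
            c_prev = c
        self.head = nn.Conv2d(c_prev, n_classes, 1)

    def forward(self, x):
        skips = []
        for enc, pool in zip(self.enc, self.pool):
            x = enc(x)
            skips.append(x)       # low-level features kept for the skip connections
            x = pool(x)
        x = self.mgrm(x)
        for up, dec, skip in zip(self.up, self.dec, reversed(skips)):
            x = dec(torch.cat([up(x), skip], dim=1))
        return x, self.head(x)    # decoder feature map and per-class logits
```

In this sketch the retinal layer segmentation network would use pools=(2, 2, 2) as above, while the optic disc detection network would use pools=(2, 4, 4); both return the feature map and logits consumed by the fusion module sketched in Section 2.1.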

2.5 Loss function

In our two-stage segmentation framework, two loss functions $\mathcal {L}_{\mathrm {seg}_{1}}$ and $\mathcal {L}_{\mathrm {seg}_{2}}$ are proposed to supervise the two-stage MGU-Nets and to encourage them to segment the optic disc and the nine retinal layers more accurately in an end-to-end fashion. The total loss $\mathcal {L}$ in this study is a weighted sum of the two losses:

$$\mathcal{L}= \mathcal{L}_{\mathrm{seg}_{1}}+\lambda\mathcal{L}_{\mathrm{seg}_{2}}$$
where $\lambda$ weights the two losses. $\mathcal {L}_{\textrm {seg}}$ is defined as the sum of Dice loss and Cross-Entropy loss, which can be described as
$$\mathcal{L}_{\textrm{seg}}=\mathcal{L}_{\textrm{dice}}+\mathcal{L}_{\textrm{ce}}$$
where
$$\mathcal{L}_{\textrm{dice}}=1-\frac{1}{M} \sum_{i=1}^{M}\left(\frac{2 \sum_{x \in \Omega} p_{i}(x) \times g_{i}(x)}{\sum_{x \in \Omega} p_{i}(x)+\sum_{x \in \Omega} g_{i}(x)}\right)$$
$$\mathcal{L}_{\mathrm{ce}}={-}\frac{1}{M} \sum_{i=1}^{M} g_{i} \log \left(p_{i}\right)$$
in which $g_{i}(x)$ and $p_{i}(x)$ denote the ground truth label and the predicted probability of pixel $x$ belonging to class $i$, respectively, $\Omega$ is the set of image pixels, and $M$ is the number of classes in the segmentation network.
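
A minimal PyTorch sketch of Eqs. (1)-(4) is shown below. The small smoothing constant added to the Dice denominator and the per-pixel averaging performed internally by the cross-entropy call are implementation conveniences rather than details taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiceCELoss(nn.Module):
    """L_seg = L_dice + L_ce (Eqs. (2)-(4)); eps avoids division by zero."""
    def __init__(self, eps=1e-6):
        super().__init__()
        self.eps = eps

    def forward(self, logits, target):
        # logits: B x M x H x W; target: B x H x W with integer class labels in [0, M)
        m = logits.shape[1]
        prob = F.softmax(logits, dim=1)
        one_hot = F.one_hot(target, m).permute(0, 3, 1, 2).float()
        inter = (prob * one_hot).sum(dim=(0, 2, 3))            # per-class intersections
        denom = prob.sum(dim=(0, 2, 3)) + one_hot.sum(dim=(0, 2, 3))
        dice = 1.0 - (2.0 * inter / (denom + self.eps)).mean() # averaged over M classes
        ce = F.cross_entropy(logits, target)
        return dice + ce

def total_loss(disc_logits, disc_gt, layer_logits, layer_gt, lam=2.0):
    """Total two-stage loss of Eq. (1): L = L_seg1 + lambda * L_seg2 (lambda = 2)."""
    seg = DiceCELoss()
    return seg(disc_logits, disc_gt) + lam * seg(layer_logits, layer_gt)
```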

3. Experiment design

3.1 Datasets and implementation details

3.1.1 Collected dataset

To verify the effectiveness of the framework, we conducted a series of experiments on collected peripapillary retinal OCT images. All images were de-identified and the procedure was approved by the Internal Review Board of Shanghai General Hospital. The entire dataset covers 61 different subjects, from each of whom 12 radial OCT B-scans were collected at the Ophthalmology Department of Shanghai General Hospital using a DRI OCT-1 Atlantis (Topcon Corporation, Tokyo, Japan). The clinical characteristics of the dataset are provided in Table 1. The image size is 1024 $\times$ 992 pixels, corresponding to a field of view of 20.48 mm $\times$ 7.94 mm.

Table 1. Clinical characteristics of subjects in dataset.

For each subject, 2 radial OCT B-scans were randomly selected to ensure mutual exclusion. Two graders manually annotated these images into the optic disc and nine retinal layers using the ITK-SNAP software [43] under the supervision of a glaucoma specialist. Although each image was annotated by only one grader, the final ground truth was obtained from the consensus of all personnel.

For the experiment, the data were randomly divided into three mutually exclusive subsets for training, validation, and testing on the patient level. The ratio between these sets was 6:2:2. In order to enlarge the size of the collected dataset, we performed data augmentation including horizontal flipping, additive Gaussian noises, and contrast adjustment.
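
The augmentations named above can be sketched as a simple NumPy routine applied to an intensity-normalized image and its label map; the noise level, contrast range, and application probabilities are assumptions, since the exact magnitudes are not stated.

```python
import numpy as np

def augment(image, label, rng=None):
    """Sketch of the training augmentations: horizontal flip,
    additive Gaussian noise, and contrast adjustment (magnitudes assumed)."""
    rng = rng or np.random.default_rng()
    if rng.random() < 0.5:                       # horizontal flip of image and label
        image, label = image[:, ::-1].copy(), label[:, ::-1].copy()
    if rng.random() < 0.5:                       # additive Gaussian noise
        image = image + rng.normal(0.0, 0.02, size=image.shape)
    if rng.random() < 0.5:                       # contrast adjustment around the mean
        gain = rng.uniform(0.8, 1.2)
        image = (image - image.mean()) * gain + image.mean()
    return np.clip(image, 0.0, 1.0), label       # image assumed normalized to [0, 1]
```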

3.1.2 Public dataset

In addition, we tested our proposed technique on the Duke SD-OCT dataset, which was collected by Chiu et al. using a Spectralis HRA+OCT (Heidelberg Engineering, Heidelberg, Germany) [7]. It consists of 110 OCT B-scans obtained from 10 patients with diabetic macular edema (DME) with a size of 496 $\times$ 768 pixels. More details about this dataset could be found in [7].

3.1.3 Experimental details

The proposed method was implemented in PyTorch and trained on NVIDIA Tesla V100 GPUs. During training, the initial learning rate was 0.001 and was reduced by an order of magnitude after every 20 epochs. The number of epochs was 50. The momentum and weight decay coefficients were set to 0.9 and 0.0001, respectively. We used the Adam optimizer to train the model with a mini-batch size of 1. The parameter $\lambda$ in Eq. (1) was empirically set to 2. To ensure a fair comparison, the training hyperparameters were kept constant to achieve the best performance for all the compared methods.
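
The training configuration above corresponds to the following sketch; the model, data loader, and total_loss are assumed to follow the earlier sketches, and mapping the stated momentum of 0.9 to Adam's first beta is our interpretation rather than a documented detail.

```python
import torch

def train(model, train_loader, total_loss, epochs=50):
    """Sketch of the training schedule: Adam, initial lr 1e-3 reduced 10x every
    20 epochs, weight decay 1e-4, mini-batch size 1, 50 epochs."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3,
                                 betas=(0.9, 0.999), weight_decay=1e-4)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)
    for _ in range(epochs):
        for image, disc_gt, layer_gt in train_loader:   # batches of size 1
            optimizer.zero_grad()
            disc_logits, layer_logits, _ = model(image)
            loss = total_loss(disc_logits, disc_gt, layer_logits, layer_gt, lam=2.0)
            loss.backward()
            optimizer.step()
        scheduler.step()
```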

3.2 Comparisons with the state-of-the-arts on collected dataset

We compared our model with state-of-the-art techniques, including U-Net [28], ReLayNet [34] and DRUNET [30]. U-Net is a popular segmentation network used for medical images. ReLayNet is specially designed for segmenting retinal layers and fluid in macular OCT images. DRUNET is proposed to segment optic nerve head tissues in peripapillary OCT images using a dilated and residual U-shape network. It is worth noting that U-Net has four down-sampling and four up-sampling operators, while ReLayNet and DRUNET both perform three down-sampling and three up-sampling operations.

3.3 Comparisons with the state-of-the-arts on public dataset

We repeated the experiments on the Duke SD-OCT dataset, which is macular centered. Since the proposed two-stage framework is originally designed for segmenting peripapillary OCT B-scans, we removed the first stage and only used the second stage to compete against the state-of-the-art models including U-Net, ReLayNet, DRUNET and published results on this public dataset [7,34,35,44].

3.4 Ablation study

To assess the contribution of each component of the proposed framework, we performed several ablation studies on the collected dataset. We compared the performance of the proposed model with that of (1) one-stage baseline, (2) two-stage baseline, (3) with single-scale global reasoning module in the two-stage segmentation framework, and (4) without graph reasoning blocks in multi-scale global reasoning module.

3.5 Evaluation metrics

The Dice score (DSC) and pixel accuracy (PA) between predictions and segmentation references were used for the quantitative evaluation of segmentation performance. They are calculated as:

$$\operatorname{DSC} =\frac{2|X \cap Y|}{|X|+|Y|}$$
and
$$\mathrm{PA} =\frac{|X \cap Y|}{|Y|}$$
where $X$ is the region of prediction and $Y$ is the region of ground truth. The Dice score is used to measure the overlap between the prediction and ground truth. Pixel accuracy is used to calculate the true positive rate of predicted results compared with their ground truth.
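
For reference, a per-class implementation of Eqs. (5) and (6) is sketched below; averaging the per-class values to obtain the reported mean scores is an assumption about the evaluation protocol.

```python
import numpy as np

def dice_and_pa(pred, gt, label):
    """Dice score and pixel accuracy for one class (Eqs. (5)-(6))."""
    x = (pred == label)                      # predicted region X
    y = (gt == label)                        # ground-truth region Y
    inter = np.logical_and(x, y).sum()
    dsc = 2.0 * inter / (x.sum() + y.sum())  # DSC = 2|X ∩ Y| / (|X| + |Y|)
    pa = inter / y.sum()                     # PA = |X ∩ Y| / |Y|
    return dsc, pa
```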

4. Results

4.1 Collected dataset

The experimental results on the collected dataset are listed in Table 2 and Table 3. The proposed method outperforms the selected state-of-the-art methods in most optic nerve head tissue categories except for the optic disc (both Dice and pixel accuracy) and RPE (pixel accuracy): the overall average Dice score for the proposed network is 1.6$\%$, 1.5$\%$, and 1.4$\%$ higher than that of ReLayNet, U-Net, and DRUNET, respectively. A similar trend is observed in the pixel accuracy results.

Table 2. Dice score ($\%$) of the segmentation results on the collected dataset by different methods. The best performance is marked by “*”, the second-best performance is indicated by “**”. Improvement is defined as the difference between the proposed method and the best performance obtained among other techniques.

Table 3. Pixel accuracy ($\%$) of the segmentation results on the collected dataset by different methods. The best performance is marked by “*”, the second-best performance is indicated by “**”. Improvement is defined as the difference between the proposed method and the best performance obtained among other techniques.

Figure 7 shows the segmentation results obtained by various techniques on a normal peripapillary OCT image. The segmented image obtained by the proposed method as shown in Fig. 7(f) presents the best visual quality, while various artifacts are visible in the others. We roughly categorize the artifacts into two groups,

Fig. 7. Segmentation of a normal peripapillary OCT image. (a) Original image. (b) Ground truth. (c) U-Net’s prediction. (d) ReLayNet’s prediction. (e) DRUNET’s prediction. (f) Proposed method’s prediction. Exemplary scattered labels are pointed out by the yellow arrows, while layer discontinuities are marked by white stars. Magnified views are also provided for better visualization.

1) layer discontinuities, where the stratified retinal layers are unexpectedly terminated in the horizontal direction as observed in ReLayNet (Fig. 7(d)) and DRUNET (Fig. 7(e)) and marked by white stars;

2) scattered labels, where an isolated region enclosed by a certain layer is recognized as another class, as pointed out by the yellow arrows in Fig. 7(c)-(e). This type of error is observed in all but the proposed method.

We suggest that the improved performance could be ascribed to the additional prior knowledge incorporated in the proposed framework. For conventional CNN-based algorithms, the pixel-level classification is often sensitive to the textural details, while we regularize this with extra spatial constraints in the proposed technique: the segmentation results have to comply with the learned spatial layout, which dictates that the retinal layers must be arranged as a horizontally continuous and vertically stratified structure.

A similar observation could be made on a diseased sample with a retinal lesion as illustrated in Fig. 8. It is clear that the blurred boundaries and the reduced contrast in the lesion area as circled out in Fig. 8(a) are challenging for conventional CNN-based algorithms. On the other hand, while mis-labeling is also observed in the proposed method, the stratified structure of the retinal layers is well preserved, which again demonstrates the efficacy of the proposed technique.

Fig. 8. Segmentation of a peripapillary OCT image with a retinal lesion. (a) Original image. (b) Ground truth. (c) U-Net’s prediction. (d) ReLayNet’s prediction. (e) DRUNET’s prediction. (f) Proposed method’s prediction. The scattered labels are pointed out by yellow arrows, the layer discontinuities are marked by white stars, and the retinal lesion is circled out by the white dashed line. Magnified views are also provided for better visualization.

4.2 Public dataset

The experimental results obtained on the Duke SD-OCT dataset are reported in Table 4. It is worth mentioning that the Duke dataset is macular-centered, while our segmentation framework is designed for disc-centered images. Therefore, only one stage of the proposed MGU-Net is used in this experiment. Nonetheless, the proposed model achieved the best performance in the ONL-ISM layer and the second-best performance in three retinal layers. The average Dice score achieved by the proposed MGU-Net is the highest if the results reported by Roy et al. are not taken into account, and it does outperform the ReLayNet we reproduced.

Table 4. Dice score of the segmentation results on Duke SD-OCT dataset by different methods and expert 2 annotations. The best performance is marked by “*”, the second-best performance is indicated by “**”.

We also display the segmented images in Fig. 9. The proposed MGU-Net manifests better visual quality in comparison with other OCT retinal image segmentation methods. Consistent with the observations made in Section 4.1, artifacts such as layer discontinuities, which are marked by white stars, are present in the images segmented by U-Net and DRUNET in Fig. 9(c) and Fig. 9(e), respectively.

Fig. 9. Visualization of segmentation results on Duke SD-OCT dataset. (a) Original image. (b) Ground truth. (c) U-Net’s prediction. (d) ReLayNet’s prediction. (e) DRUNET’s prediction. (f) Our proposed method’s prediction. The scattered labels are pointed out by yellow arrows, the discontinuities are marked with white stars.

4.3 Ablation study on the collected dataset

The quantitative results of the ablation study on our dataset are listed in Table 5 and Table 6. The baseline is the U-shape network with three down-sampling operations, which is one level shallower than the U-Net used in the previous sections. We first compare the results of the two-stage baseline with those of the one-stage baseline. If a two-stage network is adopted to segment the optic disc and the retinal layers separately, the Dice score and pixel accuracy are improved by 0.4$\%$ and 1.0$\%$, respectively. When we insert the single-scale graph reasoning block (GRB) into the two-stage baseline, the Dice scores are greatly increased compared with the two-stage baseline; however, the pixel accuracy on RNFL drops slightly. On the other hand, if we apply multi-scale pooling (MSP) to the two-stage baseline, an immediate boost is also observed in all the results except for the pixel accuracy on RNFL. We suspect that the lack of improvement from adding GRB or MSP alone might be ascribed to the complicated morphology of the image: a single-scale GRB could not capture all spatial information at once, while MSP could not perform spatial reasoning at the global level. After introducing the multi-scale global reasoning module, which is the combination of GRB and MSP, into the two-stage network, both the Dice scores and pixel accuracies are lifted, achieving improvements of 0.9$\%$ and 1.2$\%$, respectively, compared with the two-stage baseline. The results of the ablation study demonstrate the effectiveness of the added modules and indicate that the proposed multi-scale GCN-assisted two-stage U-shape network significantly improves the segmentation performance on peripapillary OCT images.

Table 5. Ablation study of each part in our framework through comparing the Dice score ($\%$). The best performance is marked by “*”.

Table 6. Ablation study of each part in our framework through comparing the pixel accuracy ($\%$). The best performance is marked by “*”.

5. Discussion

5.1 Limited size of the collected dataset

In the current study, 122 OCT B-scans from 61 individuals are manually annotated and included in our experiment. While the size of the dataset is relatively small, it should be noted that qualified human data are difficult to acquire, and manually annotating 10 tissues in one OCT image is expensive and time-consuming. To partially overcome this issue, we performed data augmentation on the training set including horizontal flipping, additive Gaussian noise, and contrast adjustment.

5.2 Impact of vessel shadows and retinal lesions on the segmentation results

It is well known that commonly present vessel shadows and retinal lesions might influence automated segmentation algorithms. They often cause blurred layer boundaries, diminished tissue texture, and altered image contrast, which could potentially lead to a decrease in the segmentation accuracy. A good illustration is provided in Fig. 8, where the retinal lesion is circled out in the right panel of Fig. 8(a). For conventional deep learning-based algorithms such as U-Net, ReLayNet, and DRUNET, labeling errors are visible as shown in Fig. 8(c)-(e). On the other hand, the proposed method is more robust to this perturbation. We believe this might be because these algorithms mainly depend on the texture details of the images to perform the pixel-level classifications, while the proposed method explicitly imposes spatial constraints that regularize the task and ensure a better visual outcome in this case, as illustrated in Fig. 8(f). To further address this issue, we plan to perform detection, removal, and inpainting of vessel shadows and retinal lesions [45].

5.3 Label noises in the manual segmentation

It is also worth mentioning that the proposed method might be affected by label noises. Due to limited resources, the manual segmentation of the OCT images was performed collaboratively by two graders under the supervision of a retinal specialist, such that each image was segmented by only one grader. Therefore, it is possible that small label noises were introduced, because the two graders might have different preferences and styles during the annotation [44]. This might slightly impair the performance of the segmentation. Therefore, this noisy label problem deserves to be investigated in future studies [46].

5.4 Proposed framework relies on standardized images

One potential limitation of the proposed framework is that it requires the input images to be well standardized, such that all the anatomical assumptions and spatial constraints we have made are valid. Take the first stage, the optic disc detection network, as an example: it relies on the presumption that the optic disc region and the non-disc regions are arranged in the proper horizontal order, as mentioned in Section 2.1. In the future, this could be addressed by performing a registration step prior to segmentation, with the goal of aligning the optic disc region with a retinal template.

6. Conclusion

To address the challenges imposed by the multi-scale features presented by the optic disc and the retinal layers with various thicknesses, as well as to exploit the existing anatomical priors, a multi-scale global reasoning module, which is capable of long-range contextual spatial reasoning, is proposed and integrated into a U-shape network. Specifically, a two-stage framework is constructed to sequentially segment the optic disc and the retinal layers in peripapillary OCT images. We validate the proposed framework on a collected dataset as well as a public dataset. The experimental results on both datasets show that the proposed method considerably improves the segmentation performance for retinal tissues compared with other state-of-the-art techniques. The proposed method achieves 82.0% and 83.0% in terms of average Dice score and pixel accuracy, which are 1.6% and 2.8% higher than the performance of ReLayNet, respectively. More importantly, the visual quality of the segmented images is greatly enhanced, thanks to the anatomical knowledge learned by the multi-scale global reasoning module. In the future, we will incorporate the proposed segmentation network into the workflow of early-stage glaucoma diagnosis. We also believe the proposed architecture could be transferred to other biomedical image segmentation tasks where an abundance of anatomical priors is available. To facilitate the progression of the field, we make our segmentation dataset as well as the codes available. To the best of our knowledge, this is the first public peripapillary retinal OCT dataset.

Funding

National Key Research and Development Program of China (2019YFB2203104); National Natural Science Foundation of China (61905141, 62035016); Shanghai Sailing Program (19YF1439700); Shanghai Shen Kang Hospital Development Center (SHDC2020CR30538); Shanghai Engineering Research Center of Precise Diagnosis and Treatment of Eye Diseases (19DZ2250100); Science and Technology Commission of Shanghai Municipality (20DZ1100200); Shanghai Public Health System Three-Year Plan-Key Subjects (GWV10.1-XK7); China Scholarship Council (201506230096).

Acknowledgement

The computations in this paper were run on the $\pi$ 2.0 cluster supported by the Center for High Performance Computing at Shanghai Jiao Tong University. We sincerely appreciate the reviewers for their valuable suggestions, which helped improve this work substantially.

Disclosures

The authors declare no conflicts of interest.

References

1. S. Resnikoff, D. Pascolini, D. Etya’ale, I. Kocur, R. Pararajasegaram, G. P. Pokharel, and S. P. Mariotti, “Global data on visual impairment in the year 2002,” Bull. W. H. O. 82, 844–851 (2004).

2. P. Song, J. Wang, K. Bucan, E. Theodoratou, I. Rudan, and K. Y. Chan, “National and subnational prevalence and burden of glaucoma in china: a systematic analysis,” Bull. World Heal. Organ. 7(2), 020705 (2017). [CrossRef]  

3. A. V. Mantravadi and N. Vadhar, “Glaucoma,” Prim. Care 42(3), 437–449 (2015). [CrossRef]  

4. V. Kansal, J. J. Armstrong, R. Pintwala, and C. Hutnik, “Optical coherence tomography for glaucoma diagnosis: an evidence based meta-analysis,” PLoS One 13(1), e0190621 (2018). [CrossRef]  

5. F. A. Almobarak, N. O’Leary, A. S. Reis, G. P. Sharpe, D. M. Hutchison, M. T. Nicolela, and B. C. Chauhan, “Automated segmentation of optic nerve head structures with optical coherence tomography,” Invest. Ophthalmol. Visual Sci. 55(2), 1161–1168 (2014). [CrossRef]  

6. D. Alonso-Caneiro, S. A. Read, and M. J. Collins, “Automatic segmentation of choroidal thickness in optical coherence tomography,” Biomed. Opt. Express 4(12), 2795–2812 (2013). [CrossRef]  

7. S. J. Chiu, M. J. Allingham, P. S. Mettu, S. W. Cousins, J. A. Izatt, and S. Farsiu, “Kernel regression based segmentation of optical coherence tomography images with diabetic macular edema,” Biomed. Opt. Express 6(4), 1172–1194 (2015). [CrossRef]  

8. M. K. Garvin, M. D. Abramoff, X. Wu, S. R. Russell, T. L. Burns, and M. Sonka, “Automated 3-D intraretinal layer segmentation of macular spectral-domain optical coherence tomography images,” IEEE Trans. Med. Imaging 28(9), 1436–1447 (2009). [CrossRef]  

9. A. Lang, A. Carass, M. Hauser, E. S. Sotirchos, P. A. Calabresi, H. S. Ying, and J. L. Prince, “Retinal layer segmentation of macular oct images using boundary classification,” Biomed. Opt. Express 4(7), 1133–1152 (2013). [CrossRef]  

10. M. A. Mayer, J. Hornegger, C. Y. Mardin, and R. P. Tornow, “Retinal nerve fiber layer segmentation on fd-oct scans of normal subjects and glaucoma patients,” Biomed. Opt. Express 1(5), 1358–1383 (2010). [CrossRef]  

11. S. Naz, A. Ahmed, M. U. Akram, and S. A. Khan, “Automated segmentation of rpe layer for the detection of age macular degeneration using oct images,” in 2016 Sixth International Conference on Image Processing Theory, Tools and Applications (IPTA) (2016), pp. 1–4.

12. S. Niu, Q. Chen, L. de Sisternes, D. L. Rubin, W. Zhang, and Q. Liu, “Automated retinal layers segmentation in SD-OCT images using dual-gradient and spatial correlation smoothness constraint,” Comput. Biol. Med. 54, 116–128 (2014). [CrossRef]  

13. P. P. Srinivasan, S. J. Heflin, J. A. Izatt, V. Y. Arshavsky, and S. Farsiu, “Automatic segmentation of up to ten layer boundaries in SD-OCT images of the mouse retina with and without missing layers due to pathology,” Biomed. Opt. Express 5(2), 348–365 (2014). [CrossRef]  

14. J. Tian, P. Marziliano, M. Baskaran, T. A. Tun, and T. Aung, “Automatic segmentation of the choroid in enhanced depth imaging optical coherence tomography images,” Biomed. Opt. Express 4(3), 397–411 (2013). [CrossRef]  

15. C. Wang, Y. X. Wang, and Y. Li, “Automatic choroidal layer segmentation using markov random field and level set method,” IEEE J. Biomed. Heal. Inform. 21(6), 1694–1702 (2017). [CrossRef]  

16. J. Wang, M. Zhang, A. D. Pechauer, L. Liu, T. S. Hwang, D. J. Wilson, D. Li, and Y. Jia, “Automated volumetric segmentation of retinal fluid on optical coherence tomography,” Biomed. Opt. Express 7(4), 1577–1589 (2016). [CrossRef]  

17. P. Zang, S. S. Gao, T. S. Hwang, C. J. Flaxel, D. J. Wilson, J. C. Morrison, D. Huang, D. Li, and Y. Jia, “Automated boundary detection of the optic disc and layer segmentation of the peripapillary retina in volumetric structural and angiographic optical coherence tomography,” Biomed. Opt. Express 8(3), 1306–1318 (2017). [CrossRef]  

18. S. Borkovkina, A. Camino, W. Janpongsri, M. V. Sarunic, and Y. Jian, “Real-time retinal layer segmentation of oct volumes with gpu accelerated inferencing using a compressed, low-latency neural network,” Biomed. Opt. Express 11(7), 3968 (2020). [CrossRef]  

19. J. De Fauw, J. R. Ledsam, B. Romera-Paredes, S. Nikolov, N. Tomasev, S. Blackwell, H. Askham, X. Glorot, B. O’Donoghue, D. Visentin, G. van den Driessche, B. Lakshminarayanan, C. Meyer, F. Mackinder, S. Bouton, K. Ayoub, R. Chopra, D. King, A. Karthikesalingam, C. O. Hughes, R. Raine, J. Hughes, D. A. Sim, C. Egan, A. Tufail, H. Montgomery, D. Hassabis, G. Rees, T. Back, P. T. Khaw, M. Suleyman, J. Cornebise, P. A. Keane, and O. Ronneberger, “Clinically applicable deep learning for diagnosis and referral in retinal disease,” Nat. Med. 24(9), 1342–1350 (2018). [CrossRef]  

20. H. Dong, G. Yang, F. Liu, Y. Mo, and Y. Guo, “Automatic brain tumor detection and segmentation using u-net based fully convolutional networks,” in Medical Image Understanding and Analysis, M. Valdés Hernández and V. González-Castro, eds. (Springer International Publishing, Cham, 2017), pp. 506–517.

21. J. Fan, J. Yang, Y. Wang, S. Yang, D. Ai, Y. Huang, H. Song, A. Hao, and Y. Wang, “Multichannel fully convolutional network for coronary artery segmentation in x-ray angiograms,” IEEE Access 6, 44635–44643 (2018). [CrossRef]  

22. K. Kamnitsas, C. Ledig, V. F. J. Newcombe, J. P. Simpson, A. D. Kane, D. K. Menon, D. Rueckert, and B. Glocker, “Efficient multi-scale 3d cnn with fully connected crf for accurate brain lesion segmentation,” Med. Image Anal. 36, 61–78 (2017). [CrossRef]  

23. X. Li, H. Chen, X. Qi, Q. Dou, C. W. Fu, and P. A. Heng, “H-denseunet: Hybrid densely connected unet for liver and tumor segmentation from CT volumes,” IEEE Trans. Med. Imaging 37(12), 2663–2674 (2018). [CrossRef]  

24. Y. Ma, H. Hao, H. Fu, J. Zhang, J. Yang, J. Liu, Y. Zheng, and Y. Zhao, “ROSE: a retinal OCT-angiography vessel segmentation dataset and new model,” arXiv: 2007.05201 (2020).

25. P. Moeskops, M. A. Viergever, A. M. Mendrik, L. S. de Vries, M. J. Benders, and I. Isgum, “Automatic segmentation of MR brain images with a convolutional neural network,” IEEE Trans. Med. Imaging 35(5), 1252–1261 (2016). [CrossRef]  

26. C. Wu, Y. Xie, L. Shao, J. Yang, D. Ai, H. Song, Y. Wang, and Y. Huang, “Automatic boundary segmentation of vascular doppler optical coherence tomography images based on cascaded U-net architecture,” OSA Continuum 2(3), 677 (2019). [CrossRef]  

27. J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” arXiv: 1411.4038 (2014).

28. O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015, N. Navab, J. Hornegger, W. M. Wells, and A. F. Frangi, eds. (Springer International Publishing, 2015), pp. 234–241.

29. Z. Chai, K. Zhou, J. Yang, Y. Ma, Z. Chen, S. Gao, and J. Liu, “Perceptual-assisted adversarial adaptation for choroid segmentation in optical coherence tomography,” in 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI), pp. 1966–1970.

30. S. K. Devalla, P. K. Renukanand, B. K. Sreedhar, G. Subramanian, L. Zhang, S. Perera, J. M. Mari, K. S. Chin, T. A. Tun, N. G. Strouthidis, T. Aung, A. H. Thiery, and M. J. A. Girard, “Drunet: a dilated-residual U-net deep learning network to segment optic nerve head tissues in optical coherence tomography images,” Biomed. Opt. Express 9(7), 3244–3265 (2018). [CrossRef]  

31. Y. He, A. Carass, Y. Liu, B. M. Jedynak, S. D. Solomon, S. Saidha, P. A. Calabresi, and J. L. Prince, “Deep learning based topology guaranteed surface and mme segmentation of multiple sclerosis subjects from retinal OCT,” Biomed. Opt. Express 10(10), 5042–5058 (2019). [CrossRef]  

32. M. Heisler, M. Bhalla, J. Lo, Z. Mammo, S. Lee, M. J. Ju, M. F. Beg, and M. V. Sarunic, “Semi-supervised deep learning based 3D analysis of the peripapillary region,” Biomed. Opt. Express 11(7), 3843–3856 (2020). [CrossRef]  

33. S. Krishna Devalla, T. H. Pham, S. K. Panda, L. Zhang, G. Subramanian, A. Swaminathan, C. Zhi Yun, M. Rajan, S. Mohan, R. Krishnadas, V. Senthil, J. M. S. de Leon, T. A. Tun, C.-Y. Cheng, L. Schmetterer, S. Perera, T. Aung, A. H. Thiery, and M. J. A. Girard, “Towards label-free 3D segmentation of optical coherence tomography images of the optic nerve head using deep learning,” arXiv: 2002.09635 (2020).

34. A. G. Roy, S. Conjeti, S. P. K. Karri, D. Sheet, A. Katouzian, C. Wachinger, and N. Navab, “Relaynet: retinal layer and fluid segmentation of macular optical coherence tomography using fully convolutional networks,” Biomed. Opt. Express 8(8), 3627 (2017). [CrossRef]  

35. J. Wang, Z. Wang, F. Li, G. Qu, Y. Qiao, H. Lv, and X. Zhang, “Joint retina segmentation and classification for early glaucoma diagnosis,” Biomed. Opt. Express 10(5), 2639–2656 (2019). [CrossRef]  

36. X. Xi, X. Meng, Z. Qin, X. Nie, Y. Yin, and X. Chen, “Ia-net: informative attention convolutional neural network for choroidal neovascularization segmentation in OCT images,” Biomed. Opt. Express 11(11), 6122 (2020). [CrossRef]  

37. X. Yang, X. Chen, and D. Xiang, “Attention-guided channel to pixel convolution network for retinal layer segmentation with choroidal neovascularization,” in Medical Imaging 2020: Image Processing, vol. 11313, I. Išgum and B. A. Landman, eds., International Society for Optics and Photonics (SPIE, 2020), pp. 786–792.

38. P. Zang, J. Wang, T. T. Hormel, L. Liu, D. Huang, and Y. Jia, “Automated segmentation of peripapillary retinal boundaries in oct combining a convolutional neural network and a multi-weights graph search,” Biomed. Opt. Express 10(8), 4340–4352 (2019). [CrossRef]  

39. J. Atif, C. Hudelot, G. Fouquier, I. Bloch, and E. D. Angelini, “From generic knowledge to specific reasoning for medical image interpretation using graph based representations,” in IJCAI (IJCAI, 2007), pp. 224–229.

40. Y. Chen, M. Rohrbach, Z. Yan, Y. Shuicheng, J. Feng, and Y. Kalantidis, “Graph-based global reasoning networks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019).

41. X. Li, Y. Yang, Q. Zhao, T. Shen, Z. Lin, and H. Liu, “Spatial pyramid based graph reasoning for semantic segmentation,” arXiv: 2003.10211 (2020).

42. Z. Gu, J. Cheng, H. Fu, K. Zhou, H. Hao, Y. Zhao, T. Zhang, S. Gao, and J. Liu, “Ce-net: Context encoder network for 2D medical image segmentation,” IEEE Trans. Med. Imaging 38(10), 2281–2292 (2019). [CrossRef]  

43. P. A. Yushkevich, J. Piven, H. C. Hazlett, R. G. Smith, S. Ho, J. C. Gee, and G. Gerig, “User-guided 3D active contour segmentation of anatomical structures: significantly improved efficiency and reliability,” NeuroImage 31(3), 1116–1128 (2006). [CrossRef]  

44. A. Chakravarty and J. Sivaswamy, “A supervised joint multi-layer segmentation framework for retinal optical coherence tomography images using conditional random field,” Comput. Methods Programs Biomed. 165, 235–250 (2018). [CrossRef]  

45. H. Liu, S. Cao, Y. Ling, and Y. Gan, “Inpainting for saturation artifacts in optical coherence tomography using dictionary-based sparse representation,” IEEE Photonics J. 13(2), 3900110 (2021). [CrossRef]  

46. Z. Huang, H. Zhang, A. Laine, E. Angelini, C. Hendon, and Y. Gan, “Co-seg: an image segmentation framework against label corruption,” arXiv: 2102.00523 (2021).
