High-quality color image restoration from a disturbed graded-index imaging system by deep neural networks

Xuanyu Hu; Xuanyu Hu; Xuanyu Hu; Zaipeng Duan; Yang Yang; Yang Yang; Yang Yang; Yehong Tan; Yehong Tan; Yehong Tan; Ruiqi Zhou; Ruiqi Zhou; Ruiqi Zhou; Jiong Xiao; Jiong Xiao; Jiong Xiao; Jinwei Zeng; Jinwei Zeng; Jinwei Zeng; Jinwei Zeng; Jian Wang; Jian Wang; Jian Wang; Jian Wang

doi:10.1364/OE.485664

1. Introduction

Common tissues of bio-organisms are often opaque and scattering with a rapidly varying distribution of refractive indices. Therefore, it is one of the most exciting challenges to obtain high-quality images and videos deep in human tissue. To address this challenge, researchers have proposed imaging methods such as two-photon fluorescence imaging [1,2], optical phase conjugation [3,4], optical memory effects [5,6], and optical transport matrices [7,8] to achieve higher resolution and tissue penetration depth while pursuing faster optimization, lower phototoxicity, and further device miniaturization. Among these approaches, the graded-index (GRIN) waveguides has been served as an interesting candidate [9]. Through a special graded-index refractive distribution, it can periodically expand and converge beam with low inter-modal dispersion and analog imaging transmission [10]. As the core component of the imaging system, the GRIN lens has unique advantages such as long transmission distance, high resolution, a controllable field of view, smaller volume, high exposure, and real-time operation [11,12] thanks to their exceptional properties. However, as a common problem in all imaging system, transmitted images in practical GRIN imaging systems can also become blurred or even unrecognizable due to volatile environment, inaccurate optical alignment, and optical aberration, etc. [13].

To overcome the imaging distortion in the imaging transmission system, recently, researchers have conducted a plethora of research on imaging restoration algorithm based on the multi-mode fiber (MMF) imaging systems and deep neural networks (DNNs) [14]. In particular, convolutional neural networks (CNNs), with their low complexity and high adaptability, have become a mainstream technique for image restoration [15]. The CNNs can be equated to a correction for the point spread function (PSF) of the imaging system, which is not a device but more like a black box with a mapping relationship function. Common MMF is typically made from step-index waveguide that can support multiple modes, and naturally, the image in the MMF waveguide will suffer from severe inter-modal dispersion in a macroscopic distance [16–19]. The transmitted images from MMFs usually exhibit random speckle patterns that is completely deteriorated from the original object. With the help of CNNs, it has been demonstrated that the networks can efficiently learn input-output relations within 0.75 m MMF, and can achieve up to 94% image reconstruction quality [16]. Borhani et al. has proposed the use of U-net to reconstruct handwritten digits with 28*28 resolution propagated through multimode fiber [17], and after restoration, the classification accuracy of up to 95% have been achieved. The above study demonstrated the feasibility of CNNs in MMF imaging systems in experiments, but retained problems of large sample demand, low image size, and long training time. Kürüm et al. have used a deep learning algorithm for a real-time spectral deconstruction of scattered patterns to achieve the reconstruction of RGB images, but with lower resolution [18]. In summary, the transmission of simple handwritten digits or symbols via MMF has been widely studied. However, due to the relatively poor imaging guiding mechanism of MMF, it is generally difficult to recover complex high-quality color images, and the typical network training time is very long.

As one of the major research directions in imaging transmission today, MMF suffers from a serious problem of significant inter-modal dispersion. Intuitively, MMF has varying focal lengths for light entering at different angles, causing even high-quality images to transform into highly intricate speckle upon traversing the MMFs. This raises great difficulties for image restoration, mainly reflected in the need for a massive dataset. For instance, [16] has used 60,000 images for training. On the other hand, Common MMF based on the step index waveguide can support many modes with natural inter-modal dispersion, while the GRIN waveguide boasts of the unique feature of dispersion compensation. Specifically, GRIN waveguides can bend incident light rays periodically over a pitch length, thereby significantly reducing the inter-modal dispersion of the imaging light rays (see Supplement 1 for details). As a result, the aberration introduced by GRIN waveguides is much smaller than that of MMFs, and only a small amount of training data is needed to ensure the restoration effect of DNN networks. Considering only finite number of training sets can be practically applied to a certain DNN, this is of great significance to engineer GRIN waveguide into practical applications

Boasting of the unique imaging guiding mechanism, the GRIN waveguides can be an ideal candidate to work with DNNs and contribute to an ideal imaging transmission system. However, to date the collaboration between GRIN waveguides and DNNs has rarely been reported. As a result, it can be a promising opportunity to research the combination of the two important fields. In this work, we demonstrate the recovery of exemplary distorted 2D color images of 512*512 pixels transmitted by a disturbed GRIN lens system through two typical types of DNNs respectively, as a preliminary study of GRIN imaging system. Compared with the MMF imaging system, the output images of the GRIN imaging system have lower dispersion and distortion owing to the intrinsic physical characteristics of the GRIN lens. Thus, it brings no surprise that the GRIN imaging system excels in the significantly reducing training set and training time with complex high-quality color images. Based on this optical system, we deliberately deteriorate the imaging transmission by defocusing in sample/detector side and blurring, and compare two major types of DNNs, i.e., pix2pix and U-net to restore the distorted images. We find that the pix2pix type can achieve shorter training times.

In the following sections, we will first demonstrate the superior properties of GRIN waveguides by theoretical derivation and then describe the experimental apparatus used to collect the dataset that we train two DNNs to restore distorted images. Using the same dataset to train different DNNs, the performance is evaluated for blurred images restoration under deliberate imaging distortion. Finally, we discuss the advantages of the GRIN waveguides imaging system combined with the pix2pix type DNNs, and eventually envision the promising future applications of this work in the minimally invasive rigid endoscopic applications.

2. Materials and methods

2.1 Imaging of disturbed GRIN systems

Unlike the most common waveguide (e.g., optical fibers), the refractive index of GRIN waveguides exhibits a gradient distribution. Therefore, the light propagating in it follows a continuous arc rather than a saw tooth shape, thus reducing the scattering losses due to interface irregularities of the waveguides [21]. GRIN lens, one of the most common GRIN waveguides, have a refractive index profile that is proportional to the square of the off-axis distance. Theoretical study based on Eikonal equation and ray equation shows that paraxial meridional rays propagate in a sinusoidal pattern with the same period as that in GRIN lens [22]. So, the paraxial meridional rays are periodically converged and diverged in the GRIN lens, which greatly reduces the inter-modal dispersion. As a rigid waveguide, the GRIN lens can have a very accurate and stable intrinsic parameters (length, diameter, index profile etc.), making it an excellent candidate for rigid endoscopy applications. However, the imaging properties of GRIN waveguides are also affected by multiple aspects, which are discussed later in this paper and in Supplement 1.

A sketch of our workflow is presented as Fig. 1. As shown in Fig. 1(a), a sample image is transferred by the GRIN Lens and captured by a complementary metal oxide semiconductor (CMOS) camera. Here, we consider three situations that cause imaging distortion, i.e., changing the camera position to bring the system out of focus, adding random scattering medium interference and changing the sample position. We have finally captured 750 sets of color images with different distortion scenarios. For training and evaluating the networks, the obtained 750 images are randomly split into 700 training and 50 testing sets.

Fig. 1. Schematics of the restoration process. (a) The samples are illuminated by a light source and acquired by a GRIN lens, transferred out GRIN lens, and then imaged on a CMOS camera. We repeat this process by changing the camera and the sample position, adding random scattering medium interference to vary the blurring level of the received images. For the same set of input images, different blurred images are obtained for different distortion scenarios. (b) A part of the images is selected to train the DNNs, another part of the blurred images is given to the trained DNNs for restoration and to evaluate the generalization and robustness of the DNNs.

Download Full Size | PDF

Figure 2 shows the GRIN Lens image transmission system. In the experiment, we load different color images onto the thin-film-transistor liquid crystal display (TFT LCD, 1080 × 2280, BOE), such as the color logo, then each image is imaged by an 8.091 mm GRIN lens (G1P11, Thorlabs) with 1 mm core diameter and numerical aperture (NA) of 0.5, and then magnified and focused by the combination of 20x objective lens (OL, RMS20X-PF, Thorlabs) and focusing lens to image onto a CMOS camera (IMX582, SONY). The CMOS camera is placed on a linear displacement stage (DDS300/M, Thorlabs) to adjust the position.

Fig. 2. GRIN lens image transmission system. Experimental setup for image transmission through the GRIN Lens. GRIN Lens: Gradient-Index Lens; LCD: liquid crystal display; Lens: focusing lens; OL: objective lens; CMOS: complementary metal oxide semiconductor.

Download Full Size | PDF

In our works, we first prepare some complex color patterns as data sets loaded to the LCD, then adjust the LCD locations so that the target patterns are captured by the GRIN lens. Later, we adjust the CMOS camera distance, imaging at an accurately focused plane, to get clear images. then we consider three scenarios to get blurred images: 1) Adjust the COMS camera back 1.5, 2.5, and 3.5 cm respectively. 2) Adjust the LCD back 1.5 and 2.5 cm respectively. 3) Add random scattering media and stain. We repeat this procedure to collect a total of 5250 data sets. This database is divided into a training set and a test set by 14:1 and fed into two types of DNNs to train, separately.

2.2 Images restoration based on DNNs

In the era of big data, deep neural networks have been applied to major fields. Essentially, given a corresponding input and output pair, deep neural networks perform linear or nonlinear operations to find the mapping relationship between input and output by using large amounts of data and GPU computing power. In this paper, we use DNNs to find the mapping relationship between the input image and the corresponding pixel positions of the output image, so that the input image is as similar as possible to the output image after DNNs. Specifically, DNNs uses a convolution operation to apply a convolution kernel (also known as a filter) to each location of the input image, computes a feature map (also known as an activation map) process to extract features such as edges, textures, shapes, etc. in the image, and then maps the features to the target result through pooling and fully connected layers to achieve tasks such as image recognition. The U-net type CNNs developed by Ronneberger et al. [23] are widely used for image segmentation in biomedical applications [24] and have been explored in recent years for fiber image reconstruction [25–27]. It is a nearly symmetric network architecture that consists of down-sampling convolution to extract contextual features and up-sampling deconvolution with image information recovery. It has params of 53.5 M and a floating point of operations (FLOPs) of 37.8 G, see Fig. 3(a) for details. The pix2pix type CNNs developed by Isola et al. [28] are generally used for image semantic synthesis [29] and image super-resolution tasks [30,31]. It is a cGAN [35] network model, a variant of generative adversarial network (GAN). In cGAN, the input of the generator contains a condition vector in addition to random noise, which specifies the conditions that the generated data should satisfy. In this paper, the input image is used as a conditional vector and the output is specified as data similar to the input image, which facilitates image transformation and image generation. In recent years, it has been exploited for image reconstruction [32]. Using GANs [33] structure, the generator adopts U-net for image reconstruction, it has params of 54.4 M and a FLOPs of 18.1 G, see Fig. 3(b) for details, and the discriminator utilizes PatchGAN [28] to judge recovered image quality, reducing the computational effort and improving the results. Subject to the above, both networks can be applied to image reconstruction tasks, but structurally compared, pix2pix with its unique GAN type would be a bit more suitable.

Fig. 3. Details of the implemented networks (a) U-net type image reconstruction CNNs. (b) pix2pix GAN type image reconstruction convolutional neural networks.

Download Full Size | PDF

Here we define generalizability [34] to indicate that the network model obtained by training a finite number of samples can recover even for untrained images, and robustness [34] to indicate the effectiveness and stability of inputting diversification of images. To test these capabilities of both DNNs, the acquired datasets involve three kinds of images: images acquired at 1.5 cm, 2.5 cm, and 3.5 cm behind the image space, images acquired at 1.5 cm and 2.5 cm behind the object space, images acquired with random scattering media and stain on the object. Then the datasets are divided into a training set and a test set with a ratio of 14:1. In U-net, the Adam optimizer with a learning rate of 1 × 10⁻⁴ and momentum parameters β1 = 0.5, β2 = 0.999 is used to minimize the mean square error (MSE) cost function while the Adam optimizer with a learning rate of 2 × 10⁻⁴ and momentum parameters β1 = 0.5, β2 = 0.999 is used to minimize L1 distance and conditional GAN loss in pix2pix. The detail loss function expression of pix2pix networks is:

(1)$$L = \arg mi{n_G}ma{x_D}{L_{cGAN}}\left( {G,D} \right) + \lambda {L_1}\left( G \right)$$

(2)$${L_{cGAN}} = {E_y}[{\log D(y )} ]+ {E_{x,z}}[{\log ({1 - D({G({x,z} )} )} )} ]$$

where G,D represent the Generator Network and Discriminative Network respectively, min_G represents minimizing the pixel-wise mean square error between the output image and the GT image, i.e., making the output image more similar to the GT image, max_D represents maximizing the predicted probability value that the output image and the GT data pair are real images, i.e., making the output image closer to the real image. And E_* represents expectation, and x, y, z represents the input, output image and noise respectively (here provided by dropout). The final loss consists of two parts, one is the cGAN, which can guide image generation by adding conditional information; the other part is the L1 distance, which is used to constrain the difference between the generated image and the real image. The $\lambda $ is hyperparameter that controls the weight between the two kinds of loss.

The networks are trained for a maximum of 500 epochs. The networks are implemented on a single NVIDIA Ge-Force GTX 3080 graphics processing unit using the PyTorch framework Python library. In the illustrated examples, we have implemented image reconstruction of a 512*512-pixel self-picked dataset using U-net and pix2pix on a single GPU at 30FPS and 150FPS, respectively, indicating an excellent operation speed that can facilitate practice.

3. Results

3.1 Comparison of images recovery results

In the first step, we test the image reconstruction recovery ability of U-net and pix2pix for our self-collected datasets, respectively. The image reconstruction results for the color data set with add random scattering media and stain and adjusting image and object image space defocus distance is given in the following Fig. 4, Fig. 5, and Fig. 6 respectively.

Fig. 4. Restoration of image space defocusing. Defocused image restoration at 1.5 cm, 2.5 cm, and 3.5 cm.

Download Full Size | PDF

Fig. 5. Restoration of object space defocusing. The results of object space defocused image restoration at 1.5 cm, 2.5 cm.

Download Full Size | PDF

Fig. 6. Restoration of object space defocusing. The results of object space defocused image restoration at 1.5 cm, 2.5 cm. Restoring of scattering and stain distortion. Add scattering medium and stain to deliberately distort the images and then restore them with DNNs.

Download Full Size | PDF

After we trained the two DNNs with the training set dataset, we used the DNNs to recover the three categories of images of 512*512 pixels in the test set and display them in groups, as shown in Fig. 4, Fig. 5, and Fig. 6.

From Fig. 4, we can see that at a short image space defocus distance, the images are smaller and blurred but still recognizable, and the DNNs recover remarkably well. After increasing the defocused distance, the defocused images become extremely blurred and even unrecognizable, but our network can still recover them to recognizable states. It is noteworthy that, except for the blurring effect, the image space defocusing may also change the transverse position and the size of the image, introducing additional types of image quality. However, the DNNs here exhibit powerful restoration capability that can simultaneously cure all these effects.

As shown in Fig. 5, different from image space defocus, when the object space is defocused, the image blur is not obvious. However, the same thing as image space defocus, the image size changes, and the farther away from the focus plane, the smaller the image. Whether it is image space defocus or object space defocus, the further defocus distance will increase the difficulty of restoration. Both U-net and Pix2Pix can restore the defocus images, although the detail is a bit lacking at a further defocus distance. Similarly, a clear position and size change can be observed in the object space defocusing case, and the DNNs can loyally cure these scaling effects.

From Fig. 6, we can see that by adding random scattering media and stain. In these two situations, the image size is the same, but our network can still recover them to a recognizable level.

3.2 DNNs restoration performance assessment

To quantitatively evaluate the recovery results and explicitly describe the reconstruction performance, the average peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) [19] of the test set are calculated as shown in Tables 1, 2, and 3. PSNR and SSIM are universally used to measure the performance of image reconstruction methods. PSNR is defined as

(3)$$\begin{array}{{c}} {\textrm{PSNR} = 10 \times {{\log }_{10}}\frac{{{{255}^2}}}{{MSE}}} \end{array}$$

(4)$$\begin{array}{{c}} {\textrm{MSE} = \; \frac{1}{{\textrm{H} \times \textrm{W}}}\mathop \sum \limits_{\textrm{i} = 1}^\textrm{H} \mathop \sum \limits_{\textrm{j} = 1}^\textrm{W} {{({\textrm{X}({\textrm{i},\textrm{j}} )- \textrm{Y}({\textrm{i},\textrm{j}} )} )}^2}} \end{array}$$

Table 1. Comparisons between image space defocus distance

View Table | View all tables in this article

Table 2. Comparisons between object space defocus distance

View Table | View all tables in this article

Table 3. Comparisons between object space interference

View Table | View all tables in this article

Here MSE represents the mean square error of the current image X and the reference image Y, X(i,j), Y(i,j) represent the pixel values at the corresponding coordinates, H and W are the height and width of the image, respectively; the number of pixel gray levels is 512. The unit of PSNR is dB. It is straightforward that a larger PSNR indicates a smaller MSE and a closer similarity between the original and restored images.

The structural similarity method evaluates the image quality from the following aspects: brightness, contrast, and structure. The concrete expression of SSIM between X and Y is

(5)$${\textrm{SSIM}({\textrm{X},\textrm{Y}} )= \frac{{({2{\mathrm{\mu }_\textrm{X}}{\mathrm{\mu }_\textrm{Y}} + {\textrm{C}_1}} )({2{\mathrm{\sigma }_{\textrm{XY}}} + {\textrm{C}_2}} )}}{{({\mathrm{\mu }_\textrm{X}^2 + \mathrm{\mu }_\textrm{Y}^2 + {\textrm{C}_1}} )({\mathrm{\sigma }_\textrm{X}^2 + \mathrm{\sigma }_\textrm{Y}^2 + {\textrm{C}_2}} )}}}$$

where C₁, C₂ and C₃ are constants to avoid the denominator to be 0 and maintain stability. Usually C₁= (K₁L)², C₂= (K₂L)², C₃= C₂/2, generally K₁= 0.01, K₂= 0.03, L = 255 (is the dynamic range of pixel values, generally taken as 255)

(6)$${{\mu _X} = \frac{1}{{M \times N}}\mathop \sum \limits_{i = 1}^M \mathop \sum \limits_{j = 1}^N X({i,j} )}$$

(7)$${{\mu _Y} = \frac{1}{{M \times N}}\mathop \sum \limits_{i = 1}^M \mathop \sum \limits_{j = 1}^N Y({i,j} )}$$

(8)$${{\sigma _X} = {{(\frac{1}{{M \times N - 1}}\mathop \sum \limits_{i = 1}^M \mathop \sum \limits_{j = 1}^N {{(X({i,j} )- {\mu _X})}^2})}^{\frac{1}{2}}}}$$

(9)$${{\sigma _Y} = {{(\frac{1}{{M \times N - 1}}\mathop \sum \limits_{i = 1}^M \mathop \sum \limits_{j = 1}^N {{(Y({i,j} )- {\mu _Y})}^2})}^{\frac{1}{2}}}}$$

(10)$${{\sigma _{XY}} = (\frac{1}{{M \times N - 1}}\mathop \sum \limits_{i = 1}^M \mathop \sum \limits_{j = 1}^N ({X({i,j} )- {\mu_X}} )({Y({i,j} )- {\mu_Y}} ))}$$

where structural similarity value ranges from 0 to 1. With the larger value, the image similarity is higher.

Tables 1, 2, and 3 show the average PSNR and SSIM values of U-net and pix2pix under the three types of images. With the help of Fig. 3, it can be clearly illustrated that the pix2pix achieves similar results with much less computational effort than U-net under the comparisons between image space interference datasets; under the Comparisons between object space defocus distance dataset, the performance metrics of the pix2pix network are slightly higher than U-net, indicating the superior performance of GAN for image reconstruction algorithms. By examining the robustness, we find that well-trained DNNs can still work well with the three categories of images. The results also show that pix2pix obtains similar results with much less computational effort than U-net, which reflects the superiority of GAN in the field of image reconstruction. Finally, we observe that the trained DNNs can quickly recover the blurred images from the GRIN lens imaging system under intentional defocusing.

4. Discussion and conclusion

4.1 Discussion

In this paper, we use U-net and pix2pix CNNs for image recovery. As introduced previously, the U-net type CNNs commonly has been used in image segmentation in biomedical applications and image reconstruction in recent years, while the pix2pix type CNNs are generally used for image translation and super-resolution imaging tasks. Compared with pix2pix and U-net types, although the pix2pix type CNNs have been exploited for image reconstruction only recently, we found in this work that the pix2pix network is superior in several important aspects. Firstly, for the generator, pix2pix uses U-net to reconstruct the image, restoring the low-frequency components, i.e., the parts of the image with small gradients and moderate changes. As for the discriminator, pix2pix uses PatchGAN to identify the generated image to offset the high-frequency components, i.e., the parts of the image with large gradients and sharp changes. In PatchGAN, an image is partitioned into different patches of N x N size, and each patch is discriminated as true or false, and the average is taken as the final output which will reduce the dimensionality of the input and improve the speed of operation. Secondly, the CNNs tend to over-fit when the amount of data is small, but the gradient update information of the generator in the GAN networks comes from the discriminator rather than the data samples, which moderates the consequences. Based on the FLOPs and energy consumption perspectives, we demonstrate that the pix2pix type CNNs is more suitable than the traditional U-net CNNs for image recovery applications in the proposed waveguide imaging systems.

Furthermore, we remind that one important feature of the proposed GRIN imaging system, compared with the pixel-based endoscopy, does not impose any pixel limitation for the imaging transmission procedure, so that it loses less details during imaging transmission and can achieve a better overall resolution. Such resolution improvement, on the other aspect, does not physically increase the imaging resolution, as it is still a farfield imaging system and the final resolution is determined by the farfield detector (CCD or CMOS). However, we emphasize that the proposed imaging system can be smoothly implemented into high/super-resolution microscopy technique, such as in [12,13], to make an endoscopy with high resolution and quality. It can be an important step to realize in-vivo high-resolution imaging in a live body. In this paper, experiment was conducted at room temperature to approach the ambient temperature in the medical use scenario.

As an outlook in the field of endoscope, we believe that the GRIN waveguide has irreplaceable advantages over traditional methods, including improved imaging transmission quality, higher robustness, lower computational resource and less human supervision requirement. In comparison, traditional optical imaging system using common spherical/aspheric objective lens combinations can achieve high imaging quality so that additional imaging correction may be minimally needed, but it typically cannot guide images from a complex interior area for endoscopic applications. As a common endoscopic imaging candidate, the MMFs, however, are intrinsically plagued with inter-modal dispersion so that the images are severely deteriorated inside the waveguide and form random speckle patterns. It is not surprising that the GRIN image system has much better image preservation in the waveguide, and requires much less computational resources to train the DNN and achieve image correction. Lastly, the fiber bundle [18], multi-core fiber [36] or scanning single-mode fiber [37] are also used for endoscopic purposes. Notably, fiber bundle-based endoscopic system has been proposed and achieve excellent image correction from a deep learning network [18]. However, in this case, each core of the fiber only transmits undistinguished light intensity as a pixel point, so that the waveguide itself impose additional restriction to imaging resolution. Furthermore, the scanning mechanism also negatively affects the frame rate and exposure time to the photo-detector, which may further deteriorate the image signal-to-noise ratio (SNR). In contrast, the proposed GRIN imaging system boast of the comprehensive effects of dispersion suppression and analog imaging transmission. Combining with the deep learning mechanism, the GRIN imaging system can be excel at image quality from both hardware and software side, and provides inspirational ideas for new endoscope structures.

As a result, these results can provide solutions for various waveguide optics applications, such as in-vivo tissue interior imaging for endoscopy. Regarding DNNs in high-quality reconstruction applications for in-vivo imaging, it is important to note that the major time-consuming part of the process is the DNN training phase. Once the DNN weights are optimized, it effectively becomes an automatic lookup table and can be applied quickly to any input within a few milliseconds achievable by common computational hardware.

In this work we effectively recover disturbed color images from a GRIN imaging system with quality. As a starting point we use deliberately disturbed 2D animated color images from varying disturbance sources as the test images to recover. Especially, it is notable that we have used the same DNN to cleanse the image distortion with the different sources in the GRIN imaging system. For waveguide imaging reconstruction applications, the use of GRIN waveguides requires a hundred times fewer training set images used by later DNNs than the use of step refractive index waveguides with 60000 pairs image (i.e., MMF) [16]. We attribute this to the fact that for step refractive index waveguides, such as MMF, the transmitted images become seriously distorted due to mode coupling and inter-modal dispersion [20], making the final output image like randomly-scattered speckle. In contrast, the GRIN lenses have lower chromatic and inter-modal dispersion so that the imaging distortion is efficiently suppressed inside the waveguides. Therefore, recovering the output distorted images from the GRIN imaging system requires significantly less algorithmic complexity than that of the MMF system, resulting in a significant reduction in the number of required images sets and processing time for the training set of DNNs.

4.2 Conclusion

In conclusion, here we demonstrate an imaging transmission system based on the GRIN waveguide aided with DNNs, which can efficiently cleanse the deliberately disturbed 2D color images with different sources. The required training set and operation proficiency can be much improved from conventional waveguides. Also, we show that the method of pix2pix is superior than U-net in both imaging correction similarity and time. This result defines an important starting point of imaging transmission technique using GRIN waveguide, and may inspire future endoscopic applications with finer imaging correction quality.

Funding

National Natural Science Foundation of China (62125503); Key R&D Program of Hubei Province of China (2020BAB001, 2021BAA024); Shenzhen Science and Technology Program (JCYJ20200109114018750); Innovation Project of Optics Valley Laboratory (OVL2021BG004).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data that support the findings of this study are available from the corresponding author upon reasonable request.

Supplemental document

See Supplement 1 for supporting content.

References

1. K. He, L. Zhao, Y. Chen, X. Huang, Y. Ding, H. Hua, L. Liu, X. Wang, M. Wang, and Y. Zhang, “Label-free multiphoton microscopic imaging as a novel real-time approach for discriminating colorectal lesions: A preliminary study,” J. Gastroenterol. Hepatol. 34(12), 2144–2151 (2019). [CrossRef]

2. N. Fang, Z. Wu, X. Wang, N. Cao, Y. Lin, L. Li, Y. Chen, S. Cai, H. Tu, and D. Kang, “Rapid, label-free detection of intracranial germinoma using multiphoton microscopy,” Neurophoton. 6(03), 1 (2019). [CrossRef]

3. Z. Yaqoob, D. Psaltis, M. S. Feld, and C. Yang, “Optical phase conjugation for turbidity suppression in biological samples,” Nat. Photonics 2(2), 110–115 (2008). [CrossRef]

4. Y. Shen, Y. Liu, C. Ma, and L. V. Wang, “Focusing light through biological tissue and tissue-mimicking phantoms up to 9.6 cm in thickness with digital optical phase conjugation,” J. Biomed. Opt 21(8), 085001 (2016). [CrossRef]

5. S. Feng, C. Kane, P. A. Lee, and A. D. Stone, “Correlations and fluctuations of coherent wave transmission through disordered media,” Phys. Rev. Lett. 61(7), 834–837 (1988). [CrossRef]

6. I. Freund, M. Rosenbluh, and S. Feng, “Memory effects in propagation of optical waves through disordered media,” Phys. Rev. Lett. 61(20), 2328–2331 (1988). [CrossRef]

7. S. Popoff, G. Lerosey, M. Fink, A. C. Boccara, and S. Gigan, “Image transmission through an opaque material,” Nat. Commun. 1(1), 81 (2010). [CrossRef]

8. K. Lee and Y. Park, “Exploiting the speckle-correlation scattering matrix for a compact reference-free holographic image sensor,” Nat. Commun. 7(1), 13359–7 (2016). [CrossRef]

9. W. Tomlinson, “Applications of GRIN-rod lenses in optical fiber communication systems,” Appl. Opt. 19(7), 1127–1138 (1980). [CrossRef]

10. A. Zareei, A. Darabi, M. J. Leamy, and M.-R. Alam, “Continuous profile flexural GRIN lens: Focusing and harvesting flexural waves,” Appl. Phys. Lett. 112(2), 023901 (2018). [CrossRef]

11. T. A. Murray and M. J. Levene, “Singlet gradient index lens for deep in vivo multiphoton microscopy,” J. Biomed. Opt. 17(2), 021106 (2012). [CrossRef]

12. H. Schulz-Hildebrandt, M. Pieper, C. Stehmar, M. Ahrens, C. Idel, B. Wollenberg, P. König, and G. Hüttmann, “Novel endoscope with increased depth of field for imaging human nasal tissue by microscopic optical coherence tomography,” Biomed. Opt. Express 9(2), 636–647 (2018). [CrossRef]

13. I. Kitano, M. Toyama, and H. Nishi, “Spherical aberration of gradient-index rod lenses,” Appl. Opt. 22(3), 396–399 (1983). [CrossRef]

14. R. Kuschmierz, E. Scharf, D. F. Ortegón-González, T. Glosemeyer, and J. W. Czarske, “Ultra-thin 3D lensless fiber endoscopy using diffractive optical elements and deep neural networks,” Light: Advanced Manufacturing 2(4), 1–424 (2021). [CrossRef]

15. X.-J. Mao, C. Shen, and Y.-B. Yang, “Image restoration using convolutional auto-encoders with symmetric skip connections,” arXiv,arXiv:1606.08921 (2016). [CrossRef]

16. B. Rahmani, D. Loterie, G. Konstantinou, D. Psaltis, and C. Moser, “Multimode optical fiber transmission with a deep learning network,” Light: Sci. Appl. 7(1), 69 (2018). [CrossRef]

17. N. Borhani, E. Kakkava, C. Moser, and D. Psaltis, “Learning to see through multimode fibers,” Optica 5(8), 960–966 (2018). [CrossRef]

18. U. Kürüm, P. R. Wiecha, R. French, and O. L. Muskens, “Deep learning enabled real time speckle recognition and hyperspectral imaging using a multimode fiber array,” Opt. Express 27(15), 20965–20979 (2019). [CrossRef]

19. Y. Sun, J. Shi, L. Sun, J. Fan, and G. Zeng, “Image reconstruction through dynamic scattering media based on deep learning,” Opt. Express 27(11), 16032–16046 (2019). [CrossRef]

20. A. Saade, F. Caltagirone, I. Carron, L. Daudet, A. Drémeau, S. Gigan, and F. Krzakala, “Random projections through multiple optical scattering: Approximating kernels at the speed of light,” in International Conference on Acoustics, Speech and Signal Processing (IEEE, 2016), pp. 6215–6219.

21. V. S. Butylkin and M. Shalyaev, “Excitation of stimulated Raman scattering in graded-index fiber waveguides by an arbitrary Gaussian beam,” Sov. J. Quantum Electron. 12(11), 1505–1507 (1982). [CrossRef]

22. D. Kumar and O. Singh II, “Elliptical and circular step-index fibers with conducting helical windings on the core-cladding boundaries for different winding pitch angles-A comparative modal dispersion analysis,” Prog. Electromagn. Res. 52, 1–21 (2005). [CrossRef]

23. O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-assisted Intervention (Springer, 2015), pp. 234–241.

24. T. Falk, D. Mai, R. Bensch, Ö. Çiçek, A. Abdulkadir, Y. Marrakchi, A. Böhm, J. Deubner, Z. Jäckel, and K. Seiwald, “U-Net: deep learning for cell counting, detection, and morphometry,” Nat. Methods 16(1), 67–70 (2019). [CrossRef]

25. L. Zhang, R. Xu, H. Ye, K. Wang, B. Xu, and D. Zhang, “High definition images transmission through single multimode fiber using deep learning and simulation speckles,” Optics and Lasers in Engineering 140, 106531 (2021). [CrossRef]

26. C. Zhu, E. A. Chan, Y. Wang, W. Peng, R. Guo, B. Zhang, C. Soci, and Y. Chong, “Image reconstruction through a multimode fiber with a simple neural network architecture,” Sci. Rep. 11(1), 896 (2021). [CrossRef]

27. J. Zhao, X. Ji, M. Zhang, X. Wang, Z. Chen, Y. Zhang, and J. Pu, “High-fidelity imaging through multimode fibers via deep learning,” JPhys Photonics 3(1), 015003 (2021). [CrossRef]

28. P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” in Conference on Computer Vision and Pattern Recognition (IEEE, 2017), pp. 1125–1134.

29. T.-C. Wang, M.-Y. Liu, J.-Y. Zhu, A. Tao, J. Kautz, and B. Catanzaro, “High-resolution image synthesis and semantic manipulation with conditional gans,” in Conference on Computer Vision and Pattern Recognition (IEEE, 2018), pp. 8798–8807.

30. H. Wang, Y. Rivenson, Y. Jin, Z. Wei, R. Gao, H. Günaydın, L. A. Bentolila, C. Kural, and A. Ozcan, “Deep learning enables cross-modality super-resolution in fluorescence microscopy,” Nat. Methods 16(1), 103–110 (2019). [CrossRef]

31. W. Ouyang, A. Aristov, M. Lelek, X. Hao, and C. Zimmer, “Deep learning massively accelerates super-resolution localization microscopy,” Nat. Biotechnol. 36(5), 460–468 (2018). [CrossRef]

32. C. Belthangady and L. A. Royer, “Applications, promises, and pitfalls of deep learning for fluorescence image reconstruction,” Nat. Methods 16(12), 1215–1225 (2019). [CrossRef]

33. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial networks,” Commun. ACM 63(11), 139–144 (2020). [CrossRef]

34. M. Paschali, S. Conjeti, F. Navarro, and N. Navab, “Generalizability vs. robustness: investigating medical imaging networks using adversarial examples,” in International Conference on Medical Image Computing and Computer-Assisted Intervention (Springer, 2018), pp. 493–501.

35. M. Mirza and S. Osindero, “Conditional generative adversarial nets,” arXiv, arXiv:1411.1784 (2014). [CrossRef]

36. J. Shin, T. Bosworth B, and A. Foster M, “Compressive fluorescence imaging using a multi-core fiber and spatially dependent scattering[J],” Opt. Lett. 42(1), 109–112 (2017). [CrossRef]

37. S. Y. Lee, V. J. Parot, B. E. Bouma, et al., “Confocal 3D reflectance imaging through multimode fiber without wavefront shaping[J],” Optica 9(1), 112–120 (2022). [CrossRef]

Image Space Defocus Distance	Network Type	PSNR	SSIM
1.5cm	U-net	25.96	0.83
1.5cm	Pix2pix	25.67	0.81
2.5cm	U-net	23.94	0.75
2.5cm	Pix2pix	24.07	0.78
3.5cm	U-net	23.49	0.72
3.5cm	Pix2pix	23.54	0.71

Object Space Defocus Distance	Network Type	PSNR	SSIM
1.5cm	U-net	20.69	0.50
1.5cm	Pix2pix	20.45	0.50
2.5cm	U-net	20.36	0.45
2.5cm	Pix2pix	19.98	0.43

Object Space Interference	Network Type	PSNR	SSIM
Random scattering medium	U-net	22.49	0.70
Random scattering medium	Pix2pix	22.73	0.70
Random scattering medium + stain	U-net	23.34	0.73
Random scattering medium + stain	Pix2pix	23.41	0.72

Image Space Defocus Distance	Network Type	PSNR	SSIM
1.5cm	U-net	25.96	0.83
1.5cm	Pix2pix	25.67	0.81
2.5cm	U-net	23.94	0.75
2.5cm	Pix2pix	24.07	0.78
3.5cm	U-net	23.49	0.72
3.5cm	Pix2pix	23.54	0.71

Object Space Defocus Distance	Network Type	PSNR	SSIM
1.5cm	U-net	20.69	0.50
1.5cm	Pix2pix	20.45	0.50
2.5cm	U-net	20.36	0.45
2.5cm	Pix2pix	19.98	0.43

High-quality color image restoration from a disturbed graded-index imaging system by deep neural networks

Abstract

1. Introduction

2. Materials and methods

2.1 Imaging of disturbed GRIN systems

2.2 Images restoration based on DNNs

3. Results

3.1 Comparison of images recovery results

3.2 DNNs restoration performance assessment

4. Discussion and conclusion

4.1 Discussion

4.2 Conclusion

Funding

Disclosures

Data availability

Supplemental document

References

Supplementary Material (1)

Data availability

Cited By

Figures (6)

Tables (3)

Equations (10)

Optics Express