
Deep learning based one-shot optically-sectioned structured illumination microscopy for surface measurement

Open Access

Abstract

Optically-sectioned structured illumination microscopy (OS-SIM) is broadly used for biological imaging and engineering surface measurement owing to its simple, low-cost, scanning-free experimental setup and excellent optical sectioning capability. However, the efficiency of current optically-sectioned methods in OS-SIM remains limited for surface measurement because a set of wide-field images under uniform or structured illumination is needed to derive an optical section at each scanning height. In this paper, a deep-learning-based one-shot optically-sectioned method, called Deep-OS-SIM, is proposed to improve the efficiency of OS-SIM for surface measurement. Specifically, we develop a convolutional neural network (CNN) to learn the statistical invariance of optical sectioning across structured illumination images. By taking full advantage of the high-entropy properties of structured illumination images to train the CNN, fast convergence and low training error are achieved in our method even for low-textured surfaces. The well-trained CNN is then applied to a plane mirror for testing, demonstrating the ability of the method to reconstruct high-quality optical sectioning from only one instead of two or three raw structured illumination frames. Further measurement experiments on a standard step and a milled surface show that the proposed method has accuracy similar to that of existing OS-SIM techniques but with higher imaging speed.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Functional surfaces in products such as integrated circuits, micro-electro-mechanical systems (MEMS), and micro-optic components have been highlighted in scientific research and engineering applications in recent years [1]. Three-dimensional (3D) surface measurement, analysis, and evaluation play important roles in functional and quality control during production. Many technologies have been applied to surface topography measurement, including coherence scanning interferometry, imaging confocal microscopy, optically-sectioned structured illumination microscopy (OS-SIM), and focus variation instruments, wherein OS-SIM is particularly attractive due to its incoherent (low-noise) characteristic and its ability to measure both rough and smooth surfaces with high resolution and accuracy [2–4].

The basic idea of OS-SIM for surface measurement is to project a single-spatial-frequency illumination pattern (e.g., a cosine pattern) onto a sample via the objective and record on the camera a sequence of wide-field images of the illumination pattern at stable phase shifts (e.g., three illumination patterns with a relative spatial phase shift of 2π/3). These recorded raw images are then processed by a contrast evaluation algorithm to reconstruct an optically-sectioned image. By scanning the sample or the objective vertically, a series of optically-sectioned images is obtained. For each pixel, the intensities of the optically-sectioned images at different scanning heights form the axial response signal (ARS), which has a near-Gaussian shape. The surface topography is then reconstructed by locating the peak position of the ARS at each pixel.

In the contrast evaluation algorithm (square-law detection) suggested by Neil et al. [5], three accurately phase-shifted structured illumination images are needed to obtain an optically-sectioned image. One main disadvantage of this method is that acquiring the raw phase-shifted images for optical sectioning takes considerable time. Another disadvantage is the residual spatial modulation caused by phase-shift error and optical aberrations (e.g., spherical aberration). Several techniques have been proposed to circumvent these problems. For instance, Patorski et al. introduced a fast, adaptive demodulation method based on the Hilbert-Huang transform, which requires two grid-illumination images but no precise phase-shift control [6]. Another technique called HiLo was introduced by Mertz's group [7,8], which uses one uniform-illumination and one structured-illumination image to produce an optically-sectioned image. By introducing the HiLo principle, the speed of optical sectioning and the robustness against artifacts caused by imperfect illumination structures are improved [9]. Although the OS-SIM techniques above are promising for 3D surface reconstruction, their efficiency remains a real challenge because multi-frame pattern-illumination images are needed. Based on the Fourier band-pass filtering technique, Wang et al. proposed a one-shot optical sectioning method for OS-SIM, which can reconstruct an optical section from a single modulated structured illumination image [10]. However, the Fourier spectrum components of the modulated images may overlap and cannot be separated perfectly by the band-pass filter. Therefore, single-image demodulation based on Fourier filtering is reliable only for low-lateral-resolution structures coarser than the grid period [8].
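
For reference, the square-law detection of Neil et al. [5] combines the three phase-shifted frames directly. The short sketch below is a straightforward NumPy rendering of that formula; the frame names are assumptions for pre-registered images with relative phase shifts of 2π/3.

```python
import numpy as np

def square_law_section(I1, I2, I3):
    """Optical section from three structured-illumination frames with
    relative phase shifts of 2*pi/3 (square-law detection)."""
    I1, I2, I3 = (np.asarray(I, dtype=np.float64) for I in (I1, I2, I3))
    # Modulation depth at each pixel; out-of-focus light cancels in the differences.
    return np.sqrt((I1 - I2) ** 2 + (I2 - I3) ** 2 + (I1 - I3) ** 2)
```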

Recently, deep learning has been proven to be versatile and efficient in solving nonlinear inverse problems such as deblurring, super-resolution, denoising, and optical sectioning in microscopic imaging [11–14]. For fluorescence microscopy, several deep-learning methods have been developed to improve the efficiency of optical sectioning for biological specimens. For example, Zhang et al. developed a CNN-based computational optical-sectioning method that can reconstruct low-noise optical sections from a wide-field image [15]. Wu et al. designed a deep neural network called Deep-Z+, which can digitally refocus an input wide-field fluorescence image at different axial distances while providing an ability to remove the defocused background comparable to that of a confocal image [16]. These techniques show the great potential of deep learning to improve imaging speed without reducing spatial resolution and signal-to-noise ratio (SNR) in the optical-sectioning task. However, since the CNNs in these techniques are trained on wide-field (uniform illumination) images with low entropy, they are not sufficient to obtain high-quality optical sectioning and reconstruct the surface topography with high accuracy for smooth surfaces lacking rich texture features.

To address the problems above, a novel deep-learning-based one-shot optically-sectioned method, named Deep-OS-SIM, is proposed for surface measurement, especially for smooth surfaces. In this method, only a single structured illumination image is needed to decode the in-focus information for optical sectioning at each z-axis plane. Thus, the acquisition efficiency of raw images is improved by 50% compared to the HiLo method, and by even more compared to other traditional OS-SIM methods (e.g., square-law detection). On the other hand, owing to the improvement of image entropy (IE, see Section 2.2) provided by structured illumination images, Deep-OS-SIM shows better performance for surface measurement than other deep-learning-based methods trained with uniform illumination images. The validity of the proposed method is demonstrated by measuring a plane mirror, a standard step, and a milled surface with our homemade OS-SIM system.

2. Methods

2.1 Basic principle of Deep-OS-SIM for surface measurement

As shown in Fig. 1, a micromirror-based OS-SIM system is constructed for surface measurement using a Köhler illumination module as the illumination source. This flexible illumination module allows us to experimentally obtain the optimal illumination intensity and field of view by adjusting the aperture and field stops. After being totally internally reflected by a prism, the incoherent light from the Köhler illumination module is directed onto a DMD (DLP LightCrafter 6500, Texas Instruments). Here, the DMD acts as a pattern generator to sequentially project different illumination patterns onto the sample via a 20× objective (0.45 numerical aperture, Nikon Inc.). Then, a stack of 2D wide-field images under uniform or structured illumination is captured by a CMOS camera (acA1920-155um, Basler) while scanning the objective with a high-precision objective scanner (P-721, Physik Instrumente). For each scanning height, the optically-sectioned image is then reconstructed from the captured wide-field images using a contrast evaluation algorithm. Finally, the surface topography is obtained by locating the peak position of the ARS at each pixel of the optically-sectioned image stack.

Fig. 1. The schematic of our homemade OS-SIM system.

Although OS-SIM has a relatively faster imaging speed than confocal microscopy, the efficiency of current optical sectioning methods for surface measurement is still limited by the demand for multi-frame pattern-illumination images. To overcome this limitation, a deep-learning-based one-shot optically-sectioned method called Deep-OS-SIM is proposed for OS-SIM. The basic principle of the proposed method is shown in Fig. 2. In brief, the process of Deep-OS-SIM has two phases: training and reconstruction. In the training phase, the raw wide-field images (including structured and uniform illumination images) are first center-cropped into sub-images of 548 × 548 pixels, since pre-processing the image stack at the full field of view of 1920 × 1200 pixels would dramatically increase the computational cost. After cropping, these sub-images are processed by the HiLo algorithm to obtain clean (i.e., free of residual stripe-like artifacts) optically-sectioned images, which serve as the ground truth. Since the HiLo algorithm processes these images at the pixel level, a training dataset with pixel-wise alignment between the input and ground truth images is obtained in Deep-OS-SIM. To prevent overfitting of the network model, we further enlarge the training dataset by data augmentation techniques [17]. After augmentation, the intensity values of the training images are normalized to the range [0, 1] in 32-bit floating-point format to train the network with high precision. With randomly initialized weight parameters, Deep-OS-SIM initially outputs only low-quality optically-sectioned images for given inputs. By iteratively updating the network's parameters with a gradient descent algorithm, the L1-norm distance (L1 loss) between the network outputs and the ground truth images gradually decreases and eventually converges to a stable value, yielding a well-trained network with high-precision prediction performance for optically-sectioned images. During the reconstruction phase, we first acquire a stack of raw structured illumination images with the homemade OS-SIM system and, after pre-processing, feed them into the well-trained CNN to predict the optical sections. Since the CNN is trained in an end-to-end fashion, no tuning of hyperparameters is needed for Deep-OS-SIM to reconstruct the corresponding optically-sectioned image stack. Finally, by using peak location algorithms (e.g., Gaussian fitting), a high-resolution surface height map is reconstructed from the optically-sectioned image stack.
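
A minimal sketch of this peak-location step is given below, assuming the optically-sectioned stack is available as a NumPy array and fitting a Gaussian to the ARS of each pixel with SciPy; the array names and the initial-guess heuristics are illustrative, and the per-pixel loop is left unoptimized for clarity.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(z, a, z0, sigma, offset):
    return a * np.exp(-((z - z0) ** 2) / (2 * sigma ** 2)) + offset

def height_from_ars(os_stack, z_positions):
    """Locate the peak of the axial response signal (ARS) at every pixel.

    os_stack    : (num_z, H, W) optically-sectioned image stack
    z_positions : (num_z,) scanning heights (e.g., in micrometres)
    Returns an (H, W) height map.
    """
    num_z, h, w = os_stack.shape
    height_map = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            ars = os_stack[:, i, j]
            k = int(np.argmax(ars))
            p0 = [ars[k] - ars.min(), z_positions[k], 1.0, ars.min()]
            try:
                popt, _ = curve_fit(gaussian, z_positions, ars, p0=p0, maxfev=2000)
                height_map[i, j] = popt[1]          # fitted peak position
            except RuntimeError:
                height_map[i, j] = z_positions[k]   # fall back to the discrete peak
    return height_map
```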

Fig. 2. Schematic of the Deep-OS-SIM surface measurement system. (a) Training the U-Net using pairs of structured illumination (SI) images and reference optically-sectioned (OS) images. (b) Obtaining the OS image stack from the SI image stack using the well-trained U-Net; the height map is then reconstructed from the OS image stack by a peak location algorithm.

For the network architecture, we choose the U-Net [18] with minor modifications to solve the inverse problem of one-shot optical sectioning. In more detail, the framework we use consists of three parts: an encoder, a decoder, and skip connections. In the encoder, four max-pooling layers down-sample the input image, which not only enlarges the receptive field and improves robustness against small disturbances but also reduces the computational cost. The decoder path, composed of four deconvolution up-sampling blocks, helps learn pyramid-level features and recover the information lost in the encoder path. In the skip-connection module, the resolution information of the encoding and decoding block outputs is integrated to mitigate the vanishing gradient problem. Compared to the original U-Net architecture, a batch normalization [19] layer is added to our network to prevent gradient vanishing and improve the convergence speed of the model. Additionally, the cropping operation in the original U-Net is replaced by zero padding, which implicitly encodes absolute position information in the CNN [20].
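
A minimal PyTorch sketch of one encoder stage with the two modifications mentioned above (batch normalization after each convolution, and zero padding in place of cropping) is shown below; the channel counts and block granularity are illustrative rather than the exact configuration used here.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Two 3x3 convolutions with zero padding and batch normalization,
    as in the modified U-Net encoder/decoder stages."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),  # zero padding keeps H x W
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class EncoderStage(nn.Module):
    """One of the four down-sampling stages: conv block followed by 2x2 max pooling."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = ConvBlock(in_ch, out_ch)
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        skip = self.conv(x)   # feature map passed to the decoder via the skip connection
        return self.pool(skip), skip
```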

For the implementation, the adaptive moment estimation (Adam) optimizer with an initial learning rate of 3×10−4 [21] is employed for stochastic optimization. To prevent the numerical instability caused by an overly large learning rate at the beginning of training, we use a gradual warm-up strategy [22] that linearly increases the learning rate from 0 to the initial learning rate within ten epochs. The learning rate is then gradually decayed to 10−7 using a cosine annealing schedule without restarts [23]. The whole training is performed on Google Colab with the PyTorch framework and takes roughly 2.5 hours. After training, the reconstruction time for a surface height map of 512 × 512 pixels from a stack of 201 structured illumination images is only 8 s (including locating the peak positions of the ARSs in the optically-sectioned image stack).
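
The learning-rate schedule can be sketched as follows under the stated settings (Adam at 3×10−4, a ten-epoch linear warm-up, cosine decay toward 10−7); the model and the total number of epochs are placeholders rather than the exact values used here.

```python
import math
import torch

# Hypothetical placeholders for the network and the number of epochs.
model = torch.nn.Conv2d(1, 1, 3, padding=1)
total_epochs, warmup_epochs = 100, 10
base_lr, final_lr = 3e-4, 1e-7

optimizer = torch.optim.Adam(model.parameters(), lr=base_lr)
loss_fn = torch.nn.L1Loss()   # L1 distance between network output and ground truth

def lr_at(epoch):
    """Linear warm-up over the first ten epochs, then cosine annealing (no restarts)."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return final_lr + 0.5 * (base_lr - final_lr) * (1 + math.cos(math.pi * progress))

for epoch in range(total_epochs):
    for group in optimizer.param_groups:
        group["lr"] = lr_at(epoch)
    # ... iterate over training batches here: compute loss_fn(output, target),
    #     then loss.backward() and optimizer.step() ...
```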

2.2 Improvement of image entropy by structured illumination in Deep-OS-SIM for surface measurement

The central challenge in deep learning is generalization, the ability to perform well on previously unobserved inputs [24]. To achieve good generalization for deep neural networks, we need not only to optimize the network architecture but also to improve the training data. In microscopic imaging, variables such as the physical properties of the illumination, the sample preparation, and the sample position are under full control. Therefore, it is relatively easy for Deep-OS-SIM to gather a large amount of experimental data of comparable quality to train the CNN. On the other hand, Deep-OS-SIM further improves the quality of the training data by introducing the structured illumination pattern, giving good optical-sectioning performance even for low-textured objects. To illustrate the contribution of structured illumination to deep learning in detail, image entropy is introduced to quantitatively measure the variability (or amount of information) present in an image [25]:

$$H = -\sum_{i = 0}^{L - 1} p(r_i)\log_2 p(r_i),$$
where ri denotes the i-th intensity level of an L-level digital image f(x,y), and p(ri) is the probability associated with ri. For a two-dimensional eight-bit grayscale image, L spans the 256 possible pixel values in the interval [0, 255], and p(ri) can be estimated from the normalized histogram of the image by:
$$p(r_i) = \frac{n_i}{MN},$$
where ni is the number of pixels with intensity ri in f, and M and N are the numbers of image rows and columns, respectively. The image entropy reaches its maximum if each ri occurs with the same probability p(ri) = 1/L. In contrast, if the image consists of a uniform grey area with only one grey level ri (so that p(ri) = 1), the image entropy is at its minimum.
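
Computed from the normalized histogram, the image entropy of Eqs. (1)–(2) takes only a few lines; the sketch below assumes an 8-bit grayscale image stored as a NumPy array.

```python
import numpy as np

def image_entropy(img, levels=256):
    """Shannon entropy (in bits) of an L-level grayscale image, Eqs. (1)-(2)."""
    img = np.asarray(img)
    hist, _ = np.histogram(img, bins=levels, range=(0, levels))
    p = hist / img.size          # p(r_i) = n_i / (M * N)
    p = p[p > 0]                 # empty bins contribute 0 * log 0 -> 0
    return -np.sum(p * np.log2(p))
```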

In microscopy, the intensity at the image plane can be described as a two-dimensional slice of the 3D pattern:

$$I_0 = \left. (S \cdot O) \otimes |h_2|^2 \right|_{z = 0},$$
where ⊗ denotes convolution, S is the 3D intensity distribution of the illumination pattern, O is the 3D intensity reflectance of the sample, and h2 is the 3D amplitude point spread function (PSF) of the objective. Therefore, three components contribute to the image entropy: (a) the illumination pattern, (b) the sample itself, and (c) the properties of the optical system (e.g., the 3D PSF). Considering the sample alone, it is not surprising that samples containing rich texture information have higher image entropy. However, if we project suitable illumination patterns (S≠1) onto the samples, an improved image entropy is obtained even for samples with low surface roughness. As shown in Fig. 3, the image entropy of a tilted plane mirror under uniform illumination (S=1) is 3.76. After introducing structured illumination (e.g., a grid period of 6 DMD pixels), the image entropy of the plane mirror at the same height increases by a factor of about 1.8. Thus, a training dataset with sufficient variation is available for Deep-OS-SIM to learn statistically invariant information about optical sectioning. Although a low-frequency structured illumination pattern gives higher image entropy and illumination contrast, which helps the trained network converge, it simultaneously reduces the optical sectioning strength and the reconstruction accuracy of the surface topography. To balance the axial resolution and the convergence of the trained network, a grid pattern of 6 DMD pixels per cycle is chosen in this paper, corresponding to a ratio of ∼0.26 between the spatial frequency of the grid illumination pattern and the cutoff frequency of the imaging system. In addition, the defocus optical transfer function of the microscope, acting as a low-pass filter, greatly attenuates the image entropy as the out-of-focus distance increases. If structured illumination images were fed into the CNN indiscriminately, training data with low image entropy would waste computing resources and could even hinder network convergence. To address this problem, a threshold-based image entropy criterion is further used to control the generation of the training dataset, allowing Deep-OS-SIM to converge faster when trained with the same number of images (see the sketch below).
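
The threshold-based selection could look like the following sketch, where the entropy threshold value and array names are purely illustrative.

```python
import numpy as np

def entropy_filter(image_stack, threshold=5.0, levels=256):
    """Keep only frames whose image entropy exceeds a threshold.

    image_stack : (num_frames, H, W) array of raw structured-illumination images
    threshold   : entropy criterion in bits (the value here is illustrative)
    """
    image_stack = np.asarray(image_stack)
    kept = []
    for frame in image_stack:
        hist, _ = np.histogram(frame, bins=levels, range=(0, levels))
        p = hist / frame.size
        p = p[p > 0]
        if -np.sum(p * np.log2(p)) >= threshold:   # same entropy as Eqs. (1)-(2)
            kept.append(frame)
    return np.stack(kept) if kept else np.empty((0,) + image_stack.shape[1:])
```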

Fig. 3. The images of a tilted plane mirror taken by a camera under (a) structured illumination and (b) uniform illumination. The measured image entropies (IE) are (a) 6.92 and (b) 3.76, respectively.

In summary, structured illumination patterns improve the image entropy of the training dataset, enabling Deep-OS-SIM to achieve better convergence and generalization for surface measurement.

3. Experiments and analysis

To demonstrate the effectiveness of the proposed method for surface measurement, we trained two test networks: Deep-SIOS (identical to Deep-OS-SIM) and Deep-UIOS. Both networks share the same U-Net architecture and hyperparameters, including the learning rate and number of training epochs. The only difference is that Deep-SIOS takes structured illumination images (Fig. 3(a)) as inputs, whereas Deep-UIOS uses uniform illumination images (Fig. 3(b)) as inputs. To train these two networks, 7891 image pairs from 25 measurements of the plane mirror with different fields of view (FOVs) and tilt angles were collected and randomly divided into a training set (76.5%), a validation set (13.5%), and a test set (10%). During the training process, each data pair of the training set was further augmented eight-fold by flipping and rotating the images by a random multiple of 90°, whereas the validation set was not augmented.
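
The eight-fold augmentation can be realized by enumerating the four 90° rotations of each image pair, with and without mirroring. The sketch below assumes the input and ground-truth images are NumPy arrays, enumerates all eight variants rather than sampling them randomly, and applies the identical transform to both images to preserve pixel-wise alignment.

```python
import numpy as np

def augment_pair(image, target):
    """Eight geometric variants of an (input, ground-truth) pair:
    rotations by 0/90/180/270 degrees, each with and without a left-right flip."""
    pairs = []
    for k in range(4):
        rot_img, rot_tgt = np.rot90(image, k), np.rot90(target, k)
        pairs.append((rot_img, rot_tgt))
        pairs.append((np.fliplr(rot_img), np.fliplr(rot_tgt)))
    return pairs
```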

After the training phase, we first compared the training curves of the structural similarity index (SSIM) [26] and the loss function for the different networks. As shown in Fig. 4, the training curves of Deep-SIOS converge readily, whereas those of Deep-UIOS struggle to converge, indicating a large amount of uncertainty in the recovery of optically-sectioned images for Deep-UIOS due to the low IE value. More importantly, Deep-SIOS exhibits considerably lower training error and better generalization performance on the validation set, suggesting that the data-quality problem is well addressed in Deep-SIOS and that accuracy gains are obtained from the increased image entropy.

Fig. 4. The training curves of Deep-SIOS and Deep-UIOS: (left) SSIM, (right) loss. Thin curves denote validation SSIM (loss), and bold curves denote training SSIM (loss).

Next, we blindly tested the two networks with the test set of the plane mirror. As shown in Figs. 5(a) and 5(c), both deep-learning-based methods can efficiently suppress the background, preserve the in-focus component, and provide reconstruction results similar to that obtained with the HiLo algorithm (Fig. 5(b)). However, further analysis of the line profiles in Fig. 5(d) shows that the differences between the Deep-SIOS output and the HiLo image are smaller than those for Deep-UIOS, indicating better performance for Deep-SIOS. Additionally, a more quantitative performance analysis of the outputs of the different networks was conducted using two criteria: (1) mean absolute error (MAE) and (2) SSIM. The results show that Deep-SIOS achieves improvements in both indices, of 2% in SSIM and 103% in MAE. We then experimentally evaluated the optical sectioning strength of the different methods by calculating the full width at half maximum (FWHM) of the ARS curves. As shown in Fig. 5(e), the wider FWHM of Deep-UIOS indicates poorer optical sectioning strength. In contrast, Deep-SIOS shows an FWHM very close to that of HiLo (relative error < 0.4%), indicating a similar optical sectioning strength to the HiLo technique. Regarding computational time, the HiLo algorithm implemented with PyTorch on the CPU (HiLo-CPU) takes ∼140 ms to reconstruct an optically-sectioned image of 512×512 pixels. By contrast, the reconstruction time for the HiLo algorithm implemented with PyTorch on the GPU (HiLo-GPU) is ∼9.6 ms (∼6 ms for transferring the uniform and grid illumination images from CPU to GPU, ∼3 ms for the optical sectioning reconstruction on the GPU, and ∼0.6 ms for transferring the HiLo image from GPU to CPU). For Deep-SIOS, the time to predict an optically-sectioned image is ∼15 ms (∼3 ms for transferring a single grid image from CPU to GPU, ∼11.4 ms for the model computation on the GPU, and ∼0.6 ms for transferring the result image from GPU to CPU). Compared to the HiLo-CPU algorithm, both the HiLo-GPU and Deep-SIOS algorithms are more than 9 times faster in reconstruction owing to GPU parallel computing. Note that the HiLo-GPU algorithm is mainly limited by the transfer time between CPU and GPU, whereas Deep-SIOS reduces this limitation by predicting the optical section from a single structured illumination image. Although the current Deep-SIOS is slightly slower than the HiLo-GPU algorithm because of its many convolutional layer operations, further acceleration could be achieved through model compression and optimization methods (e.g., network pruning, network quantization, knowledge distillation) [27].
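
A sketch of how these two criteria might be computed with scikit-image and NumPy is shown below, assuming the network prediction and the HiLo reference are 2D arrays normalized to [0, 1]; the function name and normalization assumption are illustrative.

```python
import numpy as np
from skimage.metrics import structural_similarity

def evaluate(prediction, ground_truth):
    """SSIM and mean absolute error between a network output and the HiLo reference."""
    prediction = np.asarray(prediction, dtype=np.float64)
    ground_truth = np.asarray(ground_truth, dtype=np.float64)
    ssim = structural_similarity(ground_truth, prediction, data_range=1.0)
    mae = float(np.mean(np.abs(ground_truth - prediction)))
    return ssim, mae
```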

Fig. 5. A comparison of optically-sectioned images reconstructed by (a) Deep-SIOS, (b) HiLo, and (c) Deep-UIOS. The values of SSIM and MAE were used to quantify the performance of the different networks. (d) The line profiles along the corresponding dashed lines in (a)–(c). (e) ARS profiles at the point (200, 200). Gaussian fits give FWHMs of 2.83 µm, 2.84 µm, and 2.94 µm for HiLo, Deep-SIOS, and Deep-UIOS, respectively.

Next, we reconstructed height maps of the test plane mirror for the different networks to evaluate their performance in surface measurement. As shown in Figs. 6(a)–6(c), little difference between the HiLo map and the deep-learning-based maps is seen when only the shape of the tilted plane mirror is considered. To quantify the quality of the surface reconstruction, we then calculated the absolute error (Figs. 6(g)–6(h)) of both the Deep-SIOS and Deep-UIOS maps against the HiLo map. The absolute errors of the Deep-SIOS and Deep-UIOS reconstructions are 10.2 ± 8.4 nm and 59.9 ± 45.3 nm, respectively, demonstrating that our method with high image entropy can reconstruct the surface topography with better accuracy. We then further compared the effects of optical aberrations (mainly field curvature) on the different surface measurement methods by calculating the flatness error using least-squares plane fitting [28]. The results, presented in Figs. 6(d)–6(f), show a significantly larger flatness error for Deep-UIOS. In contrast, Deep-SIOS produces a flatness error similar to, but slightly lower and smoother than, that of HiLo owing to the removal of uncorrelated noise by the generalization of the CNN, as has been found in other works [29]. These results indicate that (1) Deep-SIOS can learn to reconstruct the surface topography with similar fidelity to SIM techniques even when optical aberrations such as field curvature exist; (2) higher image entropy does help the CNN learn more statistical invariance and thus reconstruct the surface topography with high accuracy; and (3) Deep-SIOS can denoise the surface reconstruction to a certain extent owing to the generalization ability of the CNN.
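
A minimal sketch of the plane-removal step is given below: it fits z = ax + by + c to the height map by least squares and returns the residual (flatness-error) map. It illustrates the idea rather than the exact procedure of [28].

```python
import numpy as np

def flatness_error(height_map):
    """Remove the least-squares best-fit plane from a height map and
    return the residual (flatness-error) map."""
    height_map = np.asarray(height_map, dtype=np.float64)
    h, w = height_map.shape
    y, x = np.mgrid[0:h, 0:w]
    A = np.column_stack([x.ravel(), y.ravel(), np.ones(h * w)])   # z = a*x + b*y + c
    coeffs, *_ = np.linalg.lstsq(A, height_map.ravel(), rcond=None)
    plane = (A @ coeffs).reshape(h, w)
    return height_map - plane
```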

Fig. 6. Height maps of the plane mirror reconstructed by (a) Deep-SIOS, (b) Deep-UIOS, and (c) HiLo, respectively. (d-f) Flatness error maps of the tilted test mirror obtained using least-squares plane fitting, corresponding to the height maps in (a-c). (g-h) Absolute error maps for Deep-SIOS and Deep-UIOS.

Furthermore, we imaged and reconstructed a standard step with height H = 1.2 µm (measured height of 1.204 µm by a Form Talysurf PGI 830, Taylor Hobson) to verify the applicability of the networks to specimens with the same or similar surface statistical characteristics as the training data. Note that, for better generalization, more epochs (200) were used to train both networks (although this was unnecessary for Deep-SIOS). For an easy and fair comparison, 201 optically-sectioned slices were used to calculate the maximum intensity projection (MIP) images for the different methods. As shown in Fig. 7(c), Deep-UIOS gives an unsatisfactory result compared to HiLo (Fig. 7(e)), whereas Deep-SIOS exhibits a result similar to HiLo. Additionally, the 3D height maps of the different methods were also estimated and are shown in Fig. 7. Similar to the MIP results, the height map of Deep-SIOS (Fig. 7(b)) shows surface quality comparable to that of HiLo, while the Deep-UIOS map (Fig. 7(d)) exhibits significant distortion. Further, we extracted the cross-section profiles (green for HiLo, red for Deep-SIOS, and blue for Deep-UIOS) using cubic interpolation along the same cutting plane. As shown in Fig. 7(g), there are no significant differences between the Deep-SIOS and HiLo profiles, except at the edges of the step surface. This difference is expected for two reasons. First, steep slopes at the edges of the step reduce the contrast of the structured illumination, degrading the surface reconstruction there. Second, steep slopes exhibit statistical features that the CNN was not trained on, so Deep-SIOS cannot generalize well to these parts. After correction of the flatness error (ISO 25178-607), we calculated the step height as the distance between two parallel lines fitted to the upper and lower plateaus of the profile (see the sketch below). The step heights of the Deep-SIOS and HiLo reconstructions are 1.189 µm and 1.192 µm, respectively, both of which are close to the result measured by the commercial instrument, demonstrating the effectiveness of our method.
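
A sketch of the step-height evaluation under these assumptions: two user-selected plateau regions, a slope shared by both plateaus and fitted by least squares, and the step height taken as the separation of the two parallel lines after tilt removal. The masks and variable names are illustrative rather than the exact procedure of ISO 25178-607.

```python
import numpy as np

def step_height(x, z, upper_mask, lower_mask):
    """Step height from two parallel lines fitted to the upper and lower plateaus.

    x, z                   : 1D arrays of lateral position and profile height
    upper_mask, lower_mask : boolean masks selecting the two plateau regions
    """
    xu, zu = x[upper_mask], z[upper_mask]
    xl, zl = x[lower_mask], z[lower_mask]
    # Shared slope, separate intercepts: z = s*x + c_upper  or  z = s*x + c_lower.
    A = np.zeros((len(xu) + len(xl), 3))
    A[:len(xu), 0], A[:len(xu), 1] = xu, 1.0
    A[len(xu):, 0], A[len(xu):, 2] = xl, 1.0
    coeffs, *_ = np.linalg.lstsq(A, np.concatenate([zu, zl]), rcond=None)
    slope, c_upper, c_lower = coeffs
    # After tilt removal the residual slope is near zero, so the intercept
    # difference is taken as the step height.
    return abs(c_upper - c_lower)
```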

Fig. 7. The MIP (201 slices) images of the 1.2 µm step obtained using three different methods: (a) Deep-SIOS, (c) Deep-UIOS, (e) HiLo. (b), (d), (f) Height maps corresponding to the MIP images in (a), (c), and (e), respectively. (g) The profiles along the cutting plane in (b), (d), and (f).

To further test the performance of our method on rough surfaces (i.e., surfaces with high image entropy), we retrained the two networks above using images of a turned surface with a surface roughness (Ra) of 0.8 µm. The trained networks were then tested on another sample with the same surface roughness, a milled surface of Ra 0.8 µm. For the training dataset, the maximum image entropies of the turned surface under structured and uniform illumination are 6.94 and 6.22, respectively. Figure 8 shows the reconstruction results of the different methods. It is not surprising that Deep-SIOS again shows an MIP image similar to HiLo (the ground truth), while Deep-UIOS exhibits more artifacts, albeit with an improvement compared to Fig. 7. Similarly, the height map and profile of Deep-SIOS along the same cutting plane again give better results, demonstrating the advantage of training CNNs with structured illumination images.

Fig. 8. The MIP (273 slices) imaging results of a milled surface of Ra 0.8 µm obtained using three different methods: (a) Deep-SIOS, (c) Deep-UIOS, (e) HiLo. (b), (d), (f) The 3D height maps corresponding to the MIP images in (a), (c), and (e), respectively. (g) The profiles along the cutting plane in (b), (d), and (f).

In summary, by training the CNN with high-entropy structured illumination images, Deep-SIOS exhibits faster convergence and better generalization than Deep-UIOS. For instance, Deep-UIOS fails to provide satisfactory results when the shape of the object differs from the training data (Figs. 7(c)-(d) and 8(c)-(d)). In contrast, Deep-SIOS still successfully reconstructs the step surface (Figs. 7(a)-(b)) and the milled surface (Figs. 8(a)-(b)) and shows good agreement with the ground truth (HiLo). Additionally, since Deep-SIOS reconstructs the optical section at each height from only a single structured illumination image, the data acquisition time for surface measurement in OS-SIM is reduced by ∼2 s, demonstrating the potential of our method to accelerate surface measurement.

4. Discussion and conclusion

In this paper, we have developed a deep-learning-based optically-sectioned method, called Deep-OS-SIM, which enables rapid, high-quality optical sectioning from a single structured illumination image for surface reconstruction. Since Deep-OS-SIM is data-driven and trained end-to-end, no additional optical components are required, reducing the complexity of the technique. More importantly, owing to the generalization of the neural network, Deep-OS-SIM is applicable to specimens that have the same or similar surface statistical characteristics as the training data. These features highlight the unique potential of the proposed deep-learning-based method for improving the speed of surface measurement without loss of accuracy.

However, it should be noted that Deep-OS-SIM has some limitations, like other deep-learning-based methods, because some degree of specialization is needed. Ideally, the trained CNN would learn to remove the out-of-focus information from a single structured illumination image and reconstruct the surface topography for any object; in practice, it is currently restricted to the specific projected illumination patterns and specimens used in training. Once the statistical characteristics of the sample or the illumination pattern change, Deep-OS-SIM requires retraining for the new imaging parameters. Methods to improve the generalization and robustness of the network will be developed in our future work.

In summary, a deep-learning-based one-shot optically-sectioned method is proposed for OS-SIM, which achieves a better trade-off between quality and speed for surface measurement and shows good application prospects in the 3D surface measurement of microstructures in high-volume manufacturing.

Funding

Science Challenge Project (TZ2018006-0102-02); National Natural Science Foundation of China (51975233, 52005204); China Postdoctoral Science Foundation (2019M662599).

Acknowledgment

The authors thank Prof. Zhang Jianguo for providing the samples.

Disclosures

The authors declare no conflicts of interest.

References

1. R. Leach, Optical measurement of surface topography (Springer, 2011), Vol. 14.

2. M. Vogel, Z. Yang, A. Kessel, C. Kranitzky, C. Faber, and G. Häusler, “Structured-illumination microscopy on technical surfaces: 3D metrology with nanometer sensitivity,” in Optical Measurement Systems for Industrial Inspection VII, (International Society for Optics and Photonics, 2011), 80820S.

3. Z. Xie, Y. Tang, Y. Zhou, and Q. Deng, “Surface and thickness measurement of transparent thin-film layers utilizing modulation-based structured-illumination microscopy,” Opt. Express 26(3), 2944–2953 (2018). [CrossRef]  

4. H. Wang, J. Tan, C. Liu, J. Liu, and Y. Li, “Wide-field profiling of smooth steep surfaces by structured illumination,” Opt. Commun. 366, 241–247 (2016). [CrossRef]  

5. M. A. Neil, R. Juškaitis, and T. Wilson, “Method of obtaining optical sectioning by using structured light in a conventional microscope,” Opt. Lett. 22(24), 1905–1907 (1997). [CrossRef]  

6. K. Patorski, M. Trusiak, and T. Tkaczyk, “Optically-sectioned two-shot structured illumination microscopy with Hilbert-Huang processing,” Opt. Express 22(8), 9517–9527 (2014). [CrossRef]  

7. S. Santos, K. K. Chu, D. Lim, N. Bozinovic, T. N. Ford, C. Hourtoule, A. C. Bartoo, S. K. Singh, and J. Mertz, “Optically sectioned fluorescence endomicroscopy with hybrid-illumination imaging through a flexible fiber bundle,” J. Biomed. Opt. 14(3), 030502 (2009). [CrossRef]  

8. J. Mertz and J. Kim, “Scanning light-sheet microscopy in the whole mouse brain with HiLo background rejection,” J. Biomed. Opt. 15(1), 016027 (2010). [CrossRef]  

9. T. N. Ford, D. Lim, and J. Mertz, “Fast optically sectioned fluorescence HiLo endomicroscopy,” J. Biomed. Opt. 17(2), 021105 (2012). [CrossRef]  

10. H. Wang, W. Liu, Z. Hu, X. Li, and B. Hong, “One-shot optical sectioning structured illumination microscopy,” in AOPC 2019: Optical Sensing and Imaging Technology, (International Society for Optics and Photonics, 2019), 113380F.

11. Y. Wu, Y. Rivenson, Y. Zhang, Z. Wei, H. Günaydin, X. Lin, and A. Ozcan, “Extended depth-of-field in holographic imaging using deep-learning-based autofocusing and phase recovery,” Optica 5(6), 704–710 (2018). [CrossRef]  

12. K. de Haan, Y. Rivenson, Y. Wu, and A. Ozcan, “Deep-Learning-Based Image Reconstruction and Enhancement in Optical Microscopy,” Proc. IEEE 108(1), 30–50 (2020). [CrossRef]  

13. M. Weigert, U. Schmidt, T. Boothe, A. Muller, A. Dibrov, A. Jain, B. Wilhelm, D. Schmidt, C. Broaddus, S. Culley, M. Rocha-Martins, F. Segovia-Miranda, C. Norden, R. Henriques, M. Zerial, M. Solimena, J. Rink, P. Tomancak, L. Royer, F. Jug, and E. W. Myers, “Content-aware image restoration: pushing the limits of fluorescence microscopy,” Nat. Methods 15(12), 1090–1097 (2018). [CrossRef]  

14. L. Jin, B. Liu, F. Zhao, S. Hahn, B. Dong, R. Song, T. C. Elston, Y. Xu, and K. M. Hahn, “Deep learning enables structured illumination microscopy with low light levels and enhanced speed,” Nat. Commun. 11(1), 1–7 (2020). [CrossRef]  

15. X. Zhang, Y. Chen, K. Ning, C. Zhou, Y. Han, H. Gong, and J. Yuan, “Deep learning optical-sectioning method,” Opt. Express 26(23), 30762–30772 (2018). [CrossRef]  

16. Y. Wu, Y. Rivenson, H. Wang, Y. Luo, E. Ben-David, L. A. Bentolila, C. Pritz, and A. Ozcan, “Three-dimensional virtual refocusing of fluorescence microscopy images using deep learning,” Nat. Methods 16(12), 1323–1331 (2019). [CrossRef]  

17. C. Shorten and T. M. Khoshgoftaar, “A survey on image data augmentation for deep learning,” J Big Data 6(1), 60 (2019). [CrossRef]  

18. O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical image computing and computer-assisted intervention, (Springer, 2015), 234–241.

19. S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” arXiv preprint arXiv:1502.03167 (2015).

20. M. A. Islam, S. Jia, and N. D. B. Bruce, “How much position information do convolutional neural networks encode?,” arXiv preprint arXiv:2001.08248 (2020).

21. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980 (2014).

22. T. He, Z. Zhang, H. Zhang, Z. Zhang, J. Xie, and M. Li, “Bag of tricks for image classification with convolutional neural networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019), 558–567.

23. I. Loshchilov and F. Hutter, “Sgdr: Stochastic gradient descent with warm restarts,” arXiv preprint arXiv:1608.03983 (2016).

24. I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning (MIT Press, 2016).

25. R. C. Gonzalez, R. E. Woods, and S. L. Eddins, Digital image processing using MATLAB (Pearson Education India, 2004).

26. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. on Image Process. 13(4), 600–612 (2004). [CrossRef]  

27. Y. Cheng, D. Wang, P. Zhou, and T. Zhang, “A survey of model compression and acceleration for deep neural networks,” arXiv preprint arXiv:1710.09282 (2017).

28. C. Bermudez, A. Felgner, P. Martinez, A. Matilla, C. Cadevall, and R. Artigas, “Residual flatness error correction in three-dimensional imaging confocal microscopes,” in Optical Micro-and Nanometrology VII, (International Society for Optics and Photonics, 2018), 106780M.

29. C. Bai, J. Qian, S. Dang, T. Peng, J. Min, M. Lei, D. Dan, and B. Yao, “Full-color optically-sectioned imaging by wide-field microscopy via deep-learning,” Biomed. Opt. Express 11(5), 2619–2632 (2020). [CrossRef]  
