
Quantized spiral-phase-modulation based deep learning for real-time defocusing distance prediction

Open Access

Abstract

Whole slide imaging (WSI) has become an essential tool in pathological diagnosis, owing to its convenience for remote and collaborative review. However, bringing the sample to the optimal axial position and imaging without defocusing artefacts remains a challenge, as traditional methods are either not universal or time-consuming. Recently, deep learning has been shown to be effective for the autofocusing task by predicting the defocusing distance. Here, we apply quantized spiral phase modulation in the Fourier domain of the captured images before feeding them into a light-weight neural network. This significantly reduces the average prediction error below that of any previous work on an open dataset. The high prediction speed also makes the method suitable for real-time tasks on edge devices with limited computational resources and memory footprint.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

In traditional diagnostics, pathology slides are usually examined by pathologists through a microscope, which is often time-consuming and laborious. Alternatively, in whole slide imaging (WSI) systems (also called automated digital microscopy systems), captured images are aligned and stitched together to produce an image of the entire slide, which pathologists view on a virtual slide viewer [1,2]. Owing to its convenience, especially for remote and collaborative review, WSI has become an essential tool in today’s pathological diagnosis.

However, to obtain digital images with high spatial resolution, high numerical aperture (NA) objective lenses are commonly used in WSI systems. It is well known that, for the same objective lens magnification, a larger NA leads to a smaller depth of field (DOF). For example, a typical WSI system uses a 20X, 0.75 NA objective lens to capture images of a biological sample, and the DOF is usually less than 1$\mathrm{\mu}$m [3]. The small DOF poses a great challenge because of the topography of the sample. Therefore, how to bring the sample to the optimal axial position and image without defocusing artefacts, i.e. autofocusing, remains a problem.

Conventional autofocusing methods can be categorized into two groups: optical methods [4–7] and algorithmic methods [8–11]. Optical methods usually require hardware modifications to the optical system to measure or calculate the relative defocusing distance, for example, a near-infrared laser [4,12], a light-emitting diode (LED) [12] or an additional camera [5–7]. However, these devices may not be compatible with existing microscope hardware [13]. Algorithmic methods usually require capturing a full focal stack and calculating a figure-of-merit (FOM) for each image; the image with the best FOM is treated as in focus. Nevertheless, acquiring the full focal stack is time-consuming and the FOM search may be trapped in a local maximum/minimum [14].

In the recent past, deep learning has been shown to be effective for the autofocusing task. These works can be divided into two classes: defocusing distance prediction [13–17] and virtual image focusing [18,19]. Here we focus on the former because it covers a broader defocusing range and is more direct. Jiang et al. [13] used a convolutional neural network (CNN) to predict the defocusing distance of captured images. This approach avoids time-consuming iterative algorithms and thus greatly improves calculation speed. They also showed that using multi-domain information as input outperforms methods using a single domain. Furthermore, they released their training and testing dataset to the community, and it is used as the dataset in this work. However, the prediction accuracy of their work still has room for improvement.

Dastidar et al. [14,15] proposed an image preprocessing method for the autofocusing task: the pixel-wise difference of two images captured at a fixed vertical separation is fed into the neural network. This method suppresses non-essential features of the image, greatly improves generalization, and was shown to achieve state-of-the-art results on the dataset of [13]. However, it needs to capture an extra image and therefore costs more time; this also limits its prediction range. Other works [16,17] require hardware modification to different degrees, which makes the methods more complicated.

Here, we demonstrate a quantized spiral-phase-modulation (QSPM) approach applied in the Fourier domain of captured images before feeding them into a neural network for defocusing distance prediction. It can be easily implemented on any standard digital microscope without hardware modification, because all image processing operations are executed on a computer. The neural network was trained and tested on the open dataset of Jiang et al. [13]. The average focusing error is 0.16$\mathrm{\mu}$m and 0.21$\mathrm{\mu}$m for samples with the same and different staining protocols, respectively, which is far below the 0.8$\mathrm{\mu}$m DOF used when capturing these images and outperforms other results reported on the same dataset [13–15]. Besides, after a one-time training process, the time for predicting the defocusing distance of one image is less than 0.1 s, making the method suitable for real-time autofocusing. Furthermore, the method achieves superior performance under the most commonly used regular incoherent Köhler illumination condition, and can therefore be implemented in most existing WSI systems.

The paper is organized as follows: in Section 2 we present the details of the proposed QSPM preprocessing method and briefly introduce the dataset and model architecture we use. Section 3 reports the defocusing distance prediction performance and compares it with other works trained and tested on the same dataset. In Section 4, we conclude the paper and discuss future research directions.

2. Methods

2.1 Quantized spiral phase modulation

Images captured with a conventional optical system convey their defocusing information through the amount of blur: the further the object is from the focal plane, the more blurred it appears. However, it is hard to discern defocusing information precisely when the difference in defocusing distance is subtle. A common choice for improving this discrimination is point spread function (PSF) engineering, for example, the double-helix point spread function (DH-PSF) [20–23]. Usually, a DH-PSF system is implemented by a phase mask placed in the Fourier plane of a 4-f optical system. In this case, the PSF is engineered to exhibit two lobes that rotate with defocus, significantly increasing the sensitivity of defocus estimation. Owing to this advantage, the DH-PSF has been used for super-resolution 3D tracking [24–26], imaging [27–29] and manipulation [30] of particles, as well as for depth estimation [22] and single-shot 3D imaging [31].

However, the DH-PSF has two main shortcomings. First, its theoretical light efficiency is only 56$\%$ even after iterative optimization, making it unsuitable for photon-limited applications [23]. Second, the rotational response occurs only in a limited region rather than the whole space, which restricts predictability over a large defocusing range [23]. Therefore, here we use the QSPM presented in [32] as the PSF engineering method, owing to its high efficiency and ease of implementation.

Figure 1 illustrates the PSF rotation in a 4-f system based on QSPM, obtained by adding a quantized spiral phase mask in the Fourier plane. The phase of the $l$-th segment of the mask M is given as:

$$S_l=\exp{[2\pi(l-1)i/3]} \quad {\rm for} \quad\phi \in [2\pi(l-1)/3 , 2\pi l/3],$$
where $l$=1, 2, 3. In this case, the intensity on the image plane can be expressed as [32]:
$$I(r^{\prime},{\varphi}^{\prime}) \propto v^{{\prime}2}I_0[C_1^2 A_{0,1}^2 +C_{{-}2}^2 A_{\frac{1}{2},\frac{3}{2}}^2 +2C_1^2 C_{{-}2}^2 A_{0,1}^2 A_{\frac{1}{2},\frac{3}{2}}^2 \sin(3\varphi^{\prime}+\Delta\Omega+\pi)],$$
where $r^{\prime },{\varphi }^{\prime }$ are the cylindrical coordinates on the image plane. The specific meanings of the other parameters can be found in [32].
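For illustration, a minimal NumPy sketch of the three-segment mask in Eq. (1) is given below; the grid size, the centring convention and the mapping of the azimuthal angle onto pixels are our own assumptions rather than details taken from [32].

import numpy as np

def quantized_spiral_mask(n, segments=3):
    # Three-segment quantized spiral phase mask M of Eq. (1): segment l = 1, 2, 3
    # carries the constant factor exp[2*pi*(l-1)*i/3] over phi in [2*pi*(l-1)/3, 2*pi*l/3).
    y, x = np.indices((n, n)) - n / 2.0            # pixel grid centred on the optical axis
    phi = np.mod(np.arctan2(y, x), 2 * np.pi)      # azimuthal angle in [0, 2*pi)
    l_minus_1 = np.floor(segments * phi / (2 * np.pi))
    return np.exp(1j * 2 * np.pi * l_minus_1 / segments)

M = quantized_spiral_mask(224)                     # mask matching the 224*224 image patches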


Fig. 1. Illustration of the PSF rotation in the 4-f system based on QSPM.


From the formula above, we can see how the intensity distribution on the image plane behaves as the defocusing distance of the object point changes. First, the intensity distribution is bounded by a Gaussian function $I_0$, whose width increases as ${\Delta }z$ becomes larger [32]; as a result, the spot on the image plane spreads as the defocusing distance increases. Besides, when the object is in focus, ${\Delta }{\Omega }$ = 0 [32]. In this situation, the three intensity maxima (lobes) on the image plane, specified by the angles ${\varphi }_{max}^{\prime }$ = ${\pi }$/2, 7${\pi }$/6 and $11{\pi }$/6, are shown in Fig. 1. When the object is out of focus, ${\Delta }{\Omega }$ $\neq$ 0, and the directions of the three lobes are given by ${\varphi }_{max}^{{\prime }{\prime }}$ = ${\varphi }_{max}^{\prime }$ - ${\Delta }{\Omega }$/3. The dependence of ${\varphi }_{max}^{{\prime }{\prime }}$ on the defocusing distance means the intensity spot rotates as the object point changes its position.

It is worth noting that when the object point moves away from the optimal focal position, the PSF rotates clockwise for underfocus (-$\Delta$z) and anticlockwise for overfocus (+$\Delta$z). Compared with a normal 4-f system (no mask in the Fourier plane), whose PSF appears as a single spot diverging with defocus, the spiral phase mask helps extract more defocusing information from the captured images.

Here we use a preprocessing method that applies the QSPM in the Fourier domain of the captured image I and then performs an inverse Fourier transform to obtain a new spatial intensity $I^{\prime }$, as shown in Fig. 2. The post-processed spatial intensity $I^{\prime }$ can be approximated as having been captured with a microscope equipped with the QSPM. The principle is that the spatial intensity of the captured image I can be expressed as:

$$I=\mathscr{F}^{{-}1}(\mathscr{F}^{\prime}(O)),$$
where O is an object point, $\mathscr {F}^{\prime }$ represents an approximate Fourier transformation (because the object may not be at the focal point of the objective lens) and $\mathscr {F}^{-1}$ represents the inverse Fourier transform. Then, as shown in Fig. 2, we obtain the new image:
$$I^{\prime}=\mathscr{F}^{{-}1}(M*\mathscr{F}(I))=\mathscr{F}^{{-}1}(M*\mathscr{F}^{\prime}(O)),$$
which is the same as what would be obtained with a microscope equipped with the QSPM, as in Fig. 1.
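A compact sketch of Eq. (4) under these definitions is given below; the use of fftshift to align the mask with the zero-frequency centre of the spectrum is our assumption about the implementation, and the returned array is complex (Section 2.3 describes how the magnitude and phase channels are actually formed).

import numpy as np

def apply_qspm(I, M):
    # I' = F^{-1}( M * F(I) ), Eq. (4); M must have the same size as I.
    spectrum = np.fft.fftshift(np.fft.fft2(I))             # centred spectrum F(I)
    return np.fft.ifft2(np.fft.ifftshift(M * spectrum))    # complex-valued result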


Fig. 2. Pipeline of image preprocessing. We add a phase mask in the Fourier domain of the captured image and then use an inverse Fourier transform to get the post-processed spatial intensity.


2.2 Dataset

In the dataset provided by [13], stained human pathology slides were captured with a 0.75 NA, 20X objective lens and a 5-megapixel camera at different defocusing distances (from −10$\mathrm{\mu}$m to 10$\mathrm{\mu}$m, 0.5$\mathrm{\mu}$m step size). For this work, we use the images captured under the regular incoherent Köhler illumination condition (termed "incoherent RGB input" in the dataset), which is employed in most existing WSI systems. 128,699 image patches of size 224*224 form the training set. We split it into 115,825 patches for training and 12,874 for validation, ensuring that the two subsets are made up of images from different samples. Two kinds of samples form the test set: stained tissue slides from the same vendor as the training set (not used in the training process), termed "same protocol" (697 images), and tissue slides prepared by another lab, termed "different protocol" (1312 images).

2.3 Image preprocessing

The spatial intensity of an image I in the training set is preprocessed as follows before being fed into the neural network (as in Fig. 2; a code sketch follows the list):

  • 1. I is smoothed with a median filter of size 3*3 to reduce local noise.
  • 2. It is then Fourier transformed to obtain its Fourier magnitude A and Fourier phase P.
  • 3. The phase mask M is applied to P to obtain a new phase distribution $P^{\prime }$.
  • 4. Combining the Fourier magnitude $A$ and the new phase $P^{\prime }$, we perform an inverse Fourier transform to get the post-processed spatial intensity $I^{\prime }$.
  • 5. The post-processed image $I^{\prime }$, the Fourier magnitude A and the Fourier phase P (we use P instead of $P^{\prime }$ because the same M is added to the Fourier phases of all images and would be redundant information) are used as the three input channels of the neural network. The working pipeline is shown in Fig. 3. After obtaining the output of the neural network (the defocusing distance), the slide can be moved to the correct focal position and an in-focus image obtained.
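A sketch of steps 1-5 is given below, assuming SciPy is available for the median filter, a single-channel patch (how the RGB data are converted to a single channel is not specified here), that the post-processed intensity is taken as the magnitude of the inverse transform, and that the three channels are simply stacked; any normalisation applied before the network is omitted.

import numpy as np
from scipy.ndimage import median_filter

def preprocess(I, M):
    # Build the three-channel network input from one captured patch I (steps 1-5).
    I = median_filter(I, size=3)                           # step 1: 3*3 median filter
    F = np.fft.fftshift(np.fft.fft2(I))                    # step 2: Fourier transform
    A, P = np.abs(F), np.angle(F)                          #          magnitude A and phase P
    P_new = P + np.angle(M)                                # step 3: apply the mask phase to P
    F_new = A * np.exp(1j * P_new)                         # step 4: recombine A and P'
    I_new = np.abs(np.fft.ifft2(np.fft.ifftshift(F_new)))  #          post-processed intensity I'
    return np.stack([I_new, A, P], axis=-1)                # step 5: channels (I', A, P)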


Fig. 3. The working pipeline of defocusing distance prediction. We use the post-processed spatial intensity (converted from the Fourier magnitude and Fourier phase with the spiral phase mask), the Fourier magnitude and the Fourier phase as a three-channel input to predict the defocusing distance. The microscope’s focus can then be adjusted to correct the defocus and capture a clear image.


We also performed an ablation study, using the original spatial intensity I, the Fourier magnitude A and the Fourier phase P as the three input channels (i.e. without QSPM), to verify that the improved performance is due to the application of QSPM rather than the use of a more advanced network.

2.4 Properties of the new spatial intensity image

As mentioned above, unlike a normal microscope system, a system with QSPM produces PSFs that differ much more noticeably for object points located at different defocusing distances from the focal plane. As shown in the example in Fig. 4, after processing, the two images with opposite defocusing distances show a distinct difference at the edges. Moreover, coarse color information is known to be a major source of overfitting [33]; the post-processed image also removes this redundant information.


Fig. 4. Two examples of post-processed images and their zoomed-in regions. Notice the distinct difference in edge properties between the two post-processed images.


Compared with the preprocessing method in [14,15] (using the pixel-wise difference of two images captured at a fixed vertical separation as input), QSPM has two main advantages. First, it avoids the increase in local noise caused by subtracting the two images. Second, it requires only a single shot, which significantly simplifies the operation and saves time. The additional transform operations in our work do cost extra time, but as we show later, this time is much shorter than the time needed to capture another image.

2.5 Model architecture

In practical applications, the defocusing distance must be predicted quickly to realize real-time prediction on an edge device with low computational power. Therefore, a network with low computational cost and memory footprint is needed for this task. In this work, we use a light-weight network based on MobileNetV3 [34]. The only change is that the output of its last feature map is flattened and fed into a fully-connected layer for regression.
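A possible realisation in PyTorch is sketched below; the paper does not state which MobileNetV3 variant is used or how the input resolution is handled, so the MobileNetV3-Small backbone, the 224*224 input and the flattened 7*7*576 feature map are illustrative assumptions.

import torch
import torch.nn as nn
from torchvision.models import mobilenet_v3_small

class DefocusNet(nn.Module):
    # MobileNetV3 backbone whose last feature map is flattened and fed into a
    # fully-connected layer for defocusing distance regression.
    def __init__(self):
        super().__init__()
        self.features = mobilenet_v3_small(weights=None).features  # convolutional backbone
        self.fc = nn.Linear(576 * 7 * 7, 1)        # 7*7*576 feature map for a 224*224 input

    def forward(self, x):                          # x: (N, 3, 224, 224) preprocessed patches
        x = self.features(x).flatten(1)            # flatten the last feature map
        return self.fc(x).squeeze(1)               # predicted defocusing distance (in um)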

MobileNetV3 combines several building blocks to optimize the trade-off between accuracy and efficiency, mainly depth-wise separable convolutions, the inverted residual structure and the ’Squeeze-and-Excitation’ (SE) module. A depth-wise separable convolution, combined with a 1*1 convolution, achieves the same effect as a normal convolution with fewer parameters, reducing the computational cost [34]. The inverted residual structure addresses the information loss caused by nonlinear activation functions [35]. The SE module improves the representational capacity of a network by enabling dynamic channel-wise feature recalibration [36]. Beyond these, MobileNetV3 uses platform-aware NAS [37] and the NetAdapt algorithm [38] to optimize each network block and the number of filters per layer. Also, the "h-swish" activation function is used instead of "swish" to increase computation speed.
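As a concrete example of the first of these blocks, a depth-wise separable convolution factorises a standard convolution into a per-channel spatial convolution followed by a 1*1 point-wise convolution; the sketch below omits the batch normalisation, h-swish activation and SE module that the full MobileNetV3 block also contains.

import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    # 3*3 depth-wise convolution (one filter per input channel) followed by a
    # 1*1 point-wise convolution that mixes the channels.
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))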

3. Results

3.1 Model training

The network was trained on a desktop computer with an RTX 2080 Ti GPU for around 10 h. The mean squared error (MSE) loss with an L2 penalty on the weights is used to train the model. The parameters were optimized using stochastic gradient descent with momentum (SGDM) with a mini-batch size of 128 images. We also used a learning rate warmup scheme to increase training stability: the learning rate first increases linearly to 5e-4 and then decays exponentially.
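A sketch of this training configuration follows; the momentum, weight-decay coefficient, warmup length and decay rate are not reported here, so the values below are placeholders.

import torch

model = DefocusNet()                                         # network from Section 2.5
criterion = torch.nn.MSELoss()                               # MSE loss on the defocusing distance
optimizer = torch.optim.SGD(model.parameters(), lr=5e-4,
                            momentum=0.9,                    # SGDM (placeholder momentum)
                            weight_decay=1e-4)               # L2 penalty on the weights (placeholder)

warmup_steps = 1000                                          # placeholder warmup length
def lr_lambda(step):
    if step < warmup_steps:
        return (step + 1) / warmup_steps                     # linear warmup to 5e-4
    return 0.98 ** ((step - warmup_steps) / 1000.0)          # exponential decay (placeholder rate)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)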

3.2 Defocusing distance prediction performance

As in [13], the high-resolution test images (2448 by 2048) are split into smaller segments (224 by 224), the same size as the training set. To avoid the impact of outliers (empty regions and regions with low contrast), we take the median of the predicted defocusing distances across all segments of an image as the prediction for that image. Table 1 shows the average prediction error and its standard deviation for this work (with and without QSPM), for [13] (spatial intensity, Fourier magnitude and Fourier phase as input, CNN) and for [14] (pixel-wise difference of two images with a fixed vertical separation as input, MobileNetV2, best reported).
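The per-image aggregation can be sketched as below, assuming non-overlapping 224*224 tiles with partial tiles at the image border discarded; preprocess and the trained model refer to the sketches from Sections 2.3 and 2.5.

import numpy as np
import torch

def predict_image(image, M, model):
    # Median of per-tile predictions for one 2448*2048 test image.
    h, w = image.shape[:2]
    preds = []
    with torch.no_grad():
        for top in range(0, h - 223, 224):                   # non-overlapping 224*224 tiles
            for left in range(0, w - 223, 224):
                x = preprocess(image[top:top + 224, left:left + 224], M)
                x = torch.from_numpy(x).float().permute(2, 0, 1).unsqueeze(0)  # (1, 3, 224, 224)
                preds.append(model(x).item())
    return float(np.median(preds))                           # median suppresses outlier tiles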


Table 1. Defocusing distance prediction errors (Average $\pm$ standard deviation) obtained on dataset [13] by different methods.

As shown in the table, the neural network trained and tested with the spiral phase mask in the Fourier domain outperforms the best overall performance reported in [14] for both the same and different protocols. Compared with the ablation study (without QSPM), it is clear that the QSPM leads to an obvious performance improvement, which shows that the gain is not only from the upgraded network (MobileNetV3 versus MobileNetV2 in [14]) but mainly from the application of the phase mask in the Fourier domain. This improvement demonstrates the ability of QSPM to suppress non-essential features of images, avoiding overfitting and improving the generalization ability of the neural network.

In Fig. 5 we show the prediction error for every single image, with and without QSPM applied to the input images, for the same and different protocols. Comparing the error distributions of the networks trained and tested with and without the mask, the prediction error with QSPM in the Fourier domain is strongly suppressed. Also, the system used to capture the images has a DOF of 0.8$\mathrm{\mu}$m, so images with an absolute prediction error less than 0.8$\mathrm{\mu}$m can be regarded as "correct predictions". For images captured with the same protocol, almost 99$\%$ (690 of 697) of prediction errors are within the DOF, and for images captured with the different protocol, over 96$\%$ (1264 of 1312) are within the DOF. In contrast, without QSPM in training and testing, only 97.1$\%$ (675 of 697) and 81.9$\%$ (1075 of 1312) are within the DOF for the same and different protocols, respectively. Prediction accuracy and generalization ability are clearly improved by the application of QSPM.


Fig. 5. Defocusing distance prediction error of every single image of (a) with QSPM, same protocol (b) without QSPM, same protocol (c) with QSPM, different protocol (d) without QSPM, different protocol.


The small prediction error of the neural network trained on images with QSPM would allow a smaller DOF and thus the capture of higher-resolution images. As shown in Table 2, for the same protocol, even when the DOF is halved to 0.4$\mathrm{\mu}$m, over 93$\%$ of images are still "correctly predicted". For the different protocol, the ratio within 0.7$\mathrm{\mu}$m is also over 95$\%$. Although today's WSI systems already provide very high resolution, a chance to improve it further is never a bad idea.


Table 2. Ratio of images with prediction error within the DOF range

On our device, the total time (including the preprocessing of Section 2.3 and the prediction time) for predicting the defocusing distance of one image is less than 0.1 s, which makes our system suitable for real-time autofocusing tasks on edge devices.

4. Conclusion and future work

In conclusion, we have presented a quantized spiral-phase-modulation method applied in the Fourier domain of images for real-time defocusing distance prediction by deep learning. The prediction results outperform all results reported on an open dataset. Also, the prediction time is short, which means the method can be applied on an edge device with low computational resources and memory footprint.

We envision several future directions for this work. The main shortcoming is that our model cannot be applied to tilted or uneven specimens. As a next step, we may capture images of tilted or uneven samples as a dataset and try to predict a "defocusing distance map" of the captured image, which would greatly increase the scope of this method. Also, how to minimize the gap between the prediction performance for the same and different protocols is a topic worth exploring. Enhancing the robustness and generalization ability of deep learning models for various types of biological samples will be an important future direction.

Funding

Health@InnoHK program of the Innovation and Technology Commission of the Hong Kong SAR Government; Research Grants Council of the Hong Kong Special Administrative Region of China (CityU T42-103/16-N, HKU 17200219, HKU 17205321, HKU 17209018, HKU C7074-21GF).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. R. S. Weinstein, A. R. Graham, L. C. Richter, G. P. Barker, E. A. Krupinski, A. M. Lopez, K. A. Erps, A. K. Bhattacharyya, Y. Yagi, and J. R. Gilbertson, “Overview of telepathology, virtual microscopy, and whole slide imaging: prospects for the future,” Hum. Pathol. 40(8), 1057–1069 (2009). [CrossRef]  

2. M. N. Gurcan, L. E. Boucheron, A. Can, A. Madabhushi, N. M. Rajpoot, and B. Yener, “Histopathological image analysis: A review,” IEEE Rev. Biomed. Eng. 2, 147–171 (2009). [CrossRef]  

3. E. Abels and L. Pantanowitz, “Current state of the regulatory trajectory for whole slide imaging devices in the USA,” J. Pathol. Informatics 8(1), 23 (2017). [CrossRef]  

4. M. Bathe-Peters, P. Annibale, and M. J. Lohse, “All-optical microscope autofocus based on an electrically tunable lens and a totally internally reflected ir laser,” Opt. Express 26(3), 2359–2368 (2018). [CrossRef]  

5. M. C. Montalto, R. R. McKay, and R. J. Filkins, “Autofocus methods of whole slide imaging systems and the introduction of a second-generation independent dual sensor scanning method,” J. Pathol. Informatics 2(1), 44 (2011). [CrossRef]  

6. K. Guo, J. Liao, Z. Bian, X. Heng, and G. Zheng, “Instantscope: a low-cost whole slide imaging system with instant focal plane detection,” Biomed. Opt. Express 6(9), 3210–3216 (2015). [CrossRef]  

7. J. Liao, L. Bian, Z. Bian, Z. Zhang, C. Patel, K. Hoshino, Y. C. Eldar, and G. Zheng, “Single-frame rapid autofocusing for brightfield and fluorescence whole slide imaging,” Biomed. Opt. Express 7(11), 4763–4768 (2016). [CrossRef]  

8. R. Redondo, G. Cristóbal, G. B. Garcia, O. Deniz, J. Salido, M. del Milagro Fernandez, J. Vidal, J. C. Valdiviezo, R. Nava, B. Escalante-Ramírez, and M. Garcia-Rojo, “Autofocus evaluation for brightfield microscopy pathology,” J. Biomed. Opt. 17(3), 036008 (2012). [CrossRef]  

9. Y. Sun, S. Duthaler, and B. J. Nelson, “Autofocusing in computer microscopy: Selecting the optimal focus algorithm,” Microsc. Res. Tech. 65(3), 139–149 (2004). [CrossRef]  

10. L. Firestone, K. Cook, K. Culp, N. Talsania, and K. Preston Jr., “Comparison of autofocus methods for automated microscopy,” Cytometry 12(3), 195–206 (1991). [CrossRef]  

11. S. Yazdanfar, K. B. Kenny, K. Tasimi, A. D. Corwin, E. L. Dixon, and R. J. Filkins, “Simple and robust image-based autofocusing for digital microscopy,” Opt. Express 16(12), 8670–8677 (2008). [CrossRef]  

12. J. Liao, Y. Jiang, Z. Bian, B. Mahrou, A. Nambiar, A. W. Magsam, K. Guo, S. Wang, Y. ku Cho, and G. Zheng, “Rapid focus map surveying for whole slide imaging with continuous sample motion,” Opt. Lett. 42(17), 3379–3382 (2017). [CrossRef]  

13. S. Jiang, J. Liao, Z. Bian, K. Guo, Y. Zhang, and G. Zheng, “Transform- and multi-domain deep learning for single-frame rapid autofocusing in whole slide imaging,” Biomed. Opt. Express 9(4), 1601–1612 (2018). [CrossRef]  

14. T. R. Dastidar and R. Ethirajan, “Whole slide imaging system using deep learning-based automated focusing,” Biomed. Opt. Express 11(1), 480–491 (2020). [CrossRef]  

15. T. Rai Dastidar, “Automated focus distance estimation for digital microscopy using deep convolutional neural networks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops (2019), pp. 0–0.

16. H. Pinkard, Z. Phillips, A. Babakhani, D. A. Fletcher, and L. Waller, “Deep learning for single-shot autofocus microscopy,” Optica 6(6), 794–797 (2019). [CrossRef]  

17. K. Xin, S. Jiang, X. Chen, Y. He, J. Zhang, H. Wang, H. Liu, Q. Peng, Y. Zhang, and X. Ji, “Low-cost whole slide imaging system with single-shot autofocusing based on color-multiplexed illumination and deep learning,” Biomed. Opt. Express 12(9), 5644–5657 (2021). [CrossRef]  

18. Y. Wu, Y. Rivenson, H. Wang, Y. Luo, E. Ben-David, L. A. Bentolila, C. Pritz, and A. Ozcan, “Three-dimensional virtual refocusing of fluorescence microscopy images using deep learning,” Nat. Methods 16(12), 1323–1331 (2019). [CrossRef]  

19. Y. Luo, L. Huang, Y. Rivenson, and A. Ozcan, “Single-shot autofocusing of microscopy images using deep learning,” ACS Photonics 8(2), 625–638 (2021). [CrossRef]  

20. Y. Y. Schechner, R. Piestun, and J. Shamir, “Wave propagation with rotating intensity distributions,” Phys. Rev. E 54(1), R50–R53 (1996). [CrossRef]  

21. R. Piestun, Y. Y. Schechner, and J. Shamir, “Propagation-invariant wave fields with finite energy,” J. Opt. Soc. Am. A 17(2), 294–303 (2000). [CrossRef]  

22. A. Greengard, Y. Y. Schechner, and R. Piestun, “Depth from diffracted rotation,” Opt. Lett. 31(2), 181–183 (2006). [CrossRef]  

23. S. R. P. Pavani and R. Piestun, “High-efficiency rotating point spread functions,” Opt. Express 16(5), 3484–3489 (2008). [CrossRef]  

24. M. A. Thompson, J. M. Casolari, M. Badieirostami, P. O. Brown, and W. E. Moerner, “Three-dimensional tracking of single mrna particles in saccharomyces cerevisiae using a double-helix point spread function,” Proc. Natl. Acad. Sci. 107(42), 17864–17871 (2010). [CrossRef]  

25. M. A. Thompson, M. D. Lew, M. Badieirostami, and W. E. Moerner, “Localizing and tracking single nanoscale emitters in three dimensions with high spatiotemporal resolution using a double-helix point spread function,” Nano Lett. 10(1), 211–218 (2010). [CrossRef]  

26. S. R. P. Pavani and R. Piestun, “Three dimensional tracking of fluorescent microparticles using a photon-limited double-helix response system,” Opt. Express 16(26), 22048–22057 (2008). [CrossRef]  

27. S. R. P. Pavani, M. A. Thompson, J. S. Biteen, S. J. Lord, N. Liu, R. J. Twieg, R. Piestun, and W. E. Moerner, “Three-dimensional, single-molecule fluorescence imaging beyond the diffraction limit by using a double-helix point spread function,” Proc. Natl. Acad. Sci. 106(9), 2995–2999 (2009). [CrossRef]  

28. S. R. P. Pavani, J. G. DeLuca, and R. Piestun, “Polarization sensitive, three-dimensional, single-molecule imaging of cells with a double-helix system,” Opt. Express 17(22), 19644–19655 (2009). [CrossRef]  

29. C. Jin, J. Zhang, and C. Guo, “Metasurface integrated with double-helix point spread function and metalens for three-dimensional imaging,” Nanophotonics 8(3), 451–458 (2019). [CrossRef]  

30. D. B. Conkey, R. P. Trivedi, S. R. P. Pavani, I. I. Smalyukh, and R. Piestun, “Three-dimensional parallel particle manipulation and tracking by integrating holographic optical tweezers and engineered point spread functions,” Opt. Express 19(5), 3835–3842 (2011). [CrossRef]  

31. R. Berlich, A. Bräuer, and S. Stallinga, “Single shot three-dimensional imaging using an engineered point spread function,” Opt. Express 24(6), 5946–5960 (2016). [CrossRef]  

32. M. Baránek and Z. Bouchal, “Rotating vortex imaging implemented by a quantized spiral phase modulation,” J. Eur. Opt. Soc. 8, 13017 (2013). [CrossRef]  

33. D. Mundhra, B. Cheluvaraju, J. Rampure, and T. R. Dastidar, “Analyzing microscopic images of peripheral blood smear using deep learning,” in Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support (Springer, 2017), pp. 178–185.

34. A. Howard, M. Sandler, G. Chu, L.-C. Chen, B. Chen, M. Tan, W. Wang, Y. Zhu, R. Pang, V. Vasudevan, Q. V. Le, and H. Adam, “Searching for mobilenetv3,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2019), pp. 1314–1324.

35. M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “Mobilenetv2: Inverted residuals and linear bottlenecks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018), pp. 4510–4520.

36. J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018), pp. 7132–7141.

37. M. Tan, B. Chen, R. Pang, V. Vasudevan, M. Sandler, A. Howard, and Q. V. Le, “Mnasnet: Platform-aware neural architecture search for mobile,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019), pp. 2820–2828.

38. T.-J. Yang, A. Howard, B. Chen, X. Zhang, A. Go, M. Sandler, V. Sze, and H. Adam, “Netadapt: Platform-aware neural network adaptation for mobile applications,” in Proceedings of the European Conference on Computer Vision (ECCV) (2018), pp. 285–300.
