
Auto-focusing and quantitative phase imaging using deep learning for the incoherent illumination microscopy system

Open Access

Abstract

Quantitative phase information, which is vital in biomedical studies, is difficult to obtain directly with bright-field microscopy under incoherent illumination. In addition, it is difficult to keep a living sample in focus during long-term observation. Autofocusing and quantitative phase imaging therefore need to be addressed simultaneously in microscopy. Here, we propose a lightweight deep learning-based framework, built on residual structures and constrained by a novel loss function model, that realizes both autofocusing and quantitative phase imaging. It outputs the corresponding in-focus amplitude and phase information at high speed (10 fps) from a single-shot out-of-focus bright-field image. The training data were captured with a purpose-built hybrid incoherent and coherent illumination system. The experimental results verify that in-focus and quantitative phase images of both non-biological and biological samples can be reconstructed with the framework. It provides a versatile quantitative technique for continuous, long-term, label-free monitoring of living cells using a conventional incoherent illumination microscopy system.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

The mystery of the micro-world often contains the essence of macroscopic problems, so people are keen to understand the micro-world with the help of microscopic techniques. The optical microscope, as one of these techniques, plays an irreplaceable role in biomedicine, material chemistry, industrial detection, and other fields because of its non-invasive and high-resolution characteristics [1]. Incoherent illumination is commonly adopted in microscope systems. However, the phase information, which reflects the three-dimensional morphology, internal structure, and refractive-index distribution of the sample, is hard to obtain directly under incoherent illumination. Moreover, observing biological cells often takes several days or even weeks, during which defocusing is likely to occur [2,3]. In living-cell research, these two problems are encountered simultaneously, yet they have so far been solved separately.

A variety of solutions have been developed to obtain in-focus images [4–6]. These methods generally require either extra focusing-aid devices [7] or focusing algorithms [5,6]. The additional hardware may be incompatible with existing microscope hardware, while the algorithmic approaches usually iterate under certain evaluation criteria to find the ideal focal plane [3], which is time-consuming. To obtain quantitative phase information, many quantitative phase imaging (QPI) techniques have been developed. From the perspective of the light source, these techniques can be divided into coherent (laser) phase imaging [8,9] and white-light phase imaging [10,11]. Quantitative phase measurement with coherent illumination usually produces speckle patterns and parasitic fringes that seriously degrade the image quality. White-light illumination eliminates this coherence noise, and its low power makes it suitable for long-term observation and imaging of living cells; coherent light, by contrast, may introduce more phototoxicity to the sample [12]. However, the spatial coherence area of these white-light illumination methods is much smaller than the measured field of view (FOV), which causes the halo effect [13] in the phase image [14–16]. It can be seen that the existing autofocusing and phase imaging methods are independent techniques.

In recent years, deep learning has been demonstrated to be a powerful tool for solving various inverse problems in optical imaging [17], such as optical tomography, optical fiber imaging [20], and holographic image reconstruction [21,22], among others [18–24]. By training a network with a large quantity of accurately paired images, deep learning can map the relationship between the input and the target output distributions without any prior knowledge of the imaging model. Some recent works have utilized deep learning to predict the out-of-focus depth from a single image [25–28]. Although these methods can predict the out-of-focus depth relatively accurately, they must be combined with driving hardware to shift the sample to the focal plane. In addition, for samples with rough surfaces, these deep learning-based methods still require refocusing at each position. More recently, Luo et al. [17] proposed a deep learning-based offline autofocusing method (termed Deep-R), in which a generative adversarial network (GAN) trained with accurately matched in-focus and out-of-focus image pairs blindly autofocuses an out-of-focus image. This work proved that deep learning can be used for autofocusing in a general incoherent illumination microscopy system and further exploited the advantage of low phototoxicity, but only the sample amplitude information can be obtained (without phase information). It is known that there is a mapping relationship between the intensity and phase information of the sample [28,29]. Once intensity images and quantitative phase images of the sample can be collected as a dataset, it becomes possible for a deep learning network to learn this mapping. Several examples have shown that deep learning can recover phase information directly from intensity information. Such deep learning-based methods have been used for phase recovery or post-experimental digital refocusing in digital holography [29–33] or the transport of intensity equation [28], both of which rely on coherent or partially coherent illumination. To our knowledge, no existing deep learning microscopy tool has been shown to directly recover in-focus information and phase information from a single-shot out-of-focus bright-field image under incoherent illumination.

Here, we propose a new learning-based method (termed AF-QPINet) to blindly and rapidly restore the quantitative in-focus phase image from an out-of-focus intensity image recorded by an incoherent illumination microscopy system. Considering the pixel drift caused by vibration and the low similarity of training data caused by different defocus levels, we propose a new loss function model based on perceptual loss, instead of the traditional loss function that measures the pixel-level difference between the network output and the ground truth annotation, to obtain more accurate results. The AF-QPINet consists of two subnets: one is trained to perform autofocusing (termed Autofocusing-Net) and the other learns to perform phase imaging (termed QPI-Net). Using the AF-QPINet, we can transform an ordinary incoherent bright-field microscope into an autofocusing quantitative phase microscope. It combines the phase-sensitivity advantage of coherent imaging with the speckle-free advantage of incoherent illumination. We use out-of-focus images of different samples captured by an incoherent bright-field microscope system to demonstrate the effectiveness and generalization performance of the AF-QPINet and its wide applicability to various data.

2. Method

We suppose that the captured in-focus image is written as ${I_{in}}$ (the out-of-focus depth $z$ is 0 µm) and the corresponding phase image is represented by ${P_{in}}$. The autofocusing process can be written as ${I_{in}} = \Im \{ {I_{de}}\}$, where ${I_{de}}$ represents the captured out-of-focus image and $\Im \{ \cdot \}$ represents the forward process operator that converts the out-of-focus image ${I_{de}}(x,y)$ to the in-focus image ${I_{in}}(x,y)$.

To learn the parametric inverse mapping operator ${\Im _{learn}}$, which represents the autofocusing algorithm, we need a dataset $\delta$ consisting of out-of-focus images and their corresponding ground truth labels (in-focus images). The data can be written as $\delta = \{ (I_{de}^n,I_{in}^n)|n = 1,2, \cdots ,N\}$, where $N$ is the total number of image pairs. Here, we suppose the data distribution in $\delta$ is written as $\kappa$, which is an estimate of the unknown real distribution obtained by independent random sampling from it. The Autofocusing-Net generates a probability distribution $f$, which is determined by the network parameters $\theta$. A suitable $\theta$ (i.e., ${\Im _{learn}}$) is chosen to make $f$ approach $\kappa$. The training process is therefore to find the $\theta$ that minimizes the error between $\kappa$ and $f$, that is, to determine the internal weights of the different convolutional kernels. In addition, the parameters that specify the structure of the network need to be determined before training (type of network, size of the convolutional kernels, number of neurons in each layer, etc.). The training process can be written as

$${\Im _{learn}} = \mathop {\arg \min }\limits_\theta \sum\limits_{n = 1}^N {L(I_{in}^n,{\Im _\theta }\{ I_{de}^n\} )} + {\Psi _\theta }$$
where $L(\cdot)$ is the loss function measuring the error between $I_{in}^n$ and ${\Im _\theta }\{ I_{de}^n\}$, and ${\Psi _\theta }$ is a regularizer to avoid overfitting. Once the mapping operator has been acquired, the Autofocusing-Net can be used to generate in-focus images directly from the corresponding out-of-focus images.
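To make Eq. (1) concrete, the sketch below shows a generic supervised training loop in PyTorch in which the regularizer ${\Psi _\theta }$ is realized as weight decay; the stand-in model, random data, and hyperparameters are placeholders, and the actual loss function and settings are given in Sections 2.4 and 3.1.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data: pairs (I_de, I_in) as tensors of shape (N, 1, 128, 128).
# In practice these come from the focal stacks described in Section 2.2.
i_de = torch.randn(64, 1, 128, 128)
i_in = torch.randn(64, 1, 128, 128)
loader = DataLoader(TensorDataset(i_de, i_in), batch_size=32, shuffle=True)

model = torch.nn.Sequential(  # stand-in for the Autofocusing-Net of Section 2.3
    torch.nn.Conv2d(1, 16, 3, padding=1), torch.nn.ReLU(),
    torch.nn.Conv2d(16, 1, 3, padding=1))

# weight_decay plays the role of the regularizer Psi_theta in Eq. (1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)
loss_fn = torch.nn.MSELoss()  # replaced by the composite loss of Section 2.4 in the paper

for epoch in range(2):                    # a few epochs, just to illustrate the loop
    for x, target in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), target)  # L(I_in^n, F_theta{I_de^n})
        loss.backward()
        optimizer.step()
```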

The same principle can be applied to the generation of phase images with the QPI-Net. We also need a dataset consisting of in-focus images ($I_{in}^n$) and their corresponding ground truth labels (phase images, $P_{in}^n$). The QPI-Net is determined by $\gamma$; the suitable $\gamma$ can be written as ${\Re _{learn}}$, and the training process can be written as

$${\Re _{learn}} = \mathop {\arg \min }\limits_\gamma \sum\limits_{n = 1}^N {F(P_{in}^n,{\Re _\gamma }\{ I_{in}^n\} )} + {\Psi _\gamma }$$
where $F(\cdot)$ is the loss function measuring the error between $P_{in}^n$ and ${\Re _\gamma }\{ I_{in}^n\}$. The QPI-Net can then be used to generate phase images directly from the corresponding in-focus images.

Finally, given a single-shot out-of-focus bright-field image as input, the AF-QPINet outputs the corresponding in-focus image and phase image. The overall process can be written as

$${I_{de}}\buildrel {{\Im _\theta }\{ {I_{de}}\} } \over \longrightarrow {I_{in}}\buildrel {{\Re _\gamma }\{ {I_{in}}\} } \over \longrightarrow {P_{in}}$$
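As a concrete illustration, the cascaded inference of Eq. (3) can be sketched in PyTorch as below, assuming two trained subnets of the architecture described in Section 2.3; the function name and tensor shape are illustrative only.

```python
import torch

@torch.no_grad()
def af_qpi_inference(autofocus_net, qpi_net, i_de):
    """Cascaded inference of Eq. (3).
    autofocus_net, qpi_net: trained torch.nn.Module subnets (Section 2.3);
    i_de: out-of-focus bright-field image, e.g. shape (1, 1, 128, 128)."""
    autofocus_net.eval()
    qpi_net.eval()
    i_in = autofocus_net(i_de)   # I_de -> I_in  (autofocusing)
    p_in = qpi_net(i_in)         # I_in -> P_in  (phase generation)
    return i_in, p_in
```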

2.1 Experimental setup

The key to our work is obtaining the corresponding focal stacks and phase images under incoherent illumination. Differing from previous methods, a new experimental system based on a modified Mach-Zehnder interferometer has been built to obtain the training and testing data. We use this hybrid coherent and incoherent illumination imaging system to obtain focal stacks and off-axis holograms for learning the parametric inverse mapping operators ${\Im _{learn}}$ and ${\Re _{learn}}$. In the same FOV, the off-axis hologram and the in-focus image are captured at the focal plane.

The experimental recording system is shown in Fig. 1 and consists of a coherent illumination system and an incoherent illumination system. For the coherent illumination system (resolution 1.8 µm, DOF 9.232 µm), a laser beam (wavelength: 532 nm) is divided into two beams by beam splitter BS1 after passing through the beam expander. One beam is the object beam, which passes through the sample and is magnified by a microscope objective (10×/0.25 NA) and a tube lens (f = 200 mm); the other is the reference beam. After the two beams pass through BS3, the hologram generated by their interference is captured by a digital camera (pixel pitch 3.45 µm, 1024×768 pixels). For the incoherent illumination system, a white beam is first converted into a partially directional beam by lens L (f = 120 mm), and then similarly passes through the sample and is magnified by the microscope objective and the tube lens. It has to be mentioned that the DOF of the imaging system under the incoherent light source differs slightly from that of the coherent imaging system. To make the data represent both cases, we set the spacing along the z-axis smaller than the minimum DOF, at 2.5 µm. The sample is fixed on a two-dimensional displacement platform.


Fig. 1. Hybrid coherent and incoherent illumination imaging system. BEC, beam expanding collimation; S, shutter; BS, beam splitters; M, mirror; LEDs, LED array (including 20 individual bright LEDs); L, collimation lens; MO, microscope objective; TL, Tube Lens.


The motorized displacement platform (typical absolute accuracy 1.0 µm), annotated ‘Z’, is used to precisely capture images of the sample at each out-of-focus depth, while the manual displacement platform annotated ‘X’ is used to capture images from different FOVs.

2.2 Image capture and preprocessing

In this paper, we first select 3 µm polystyrene microspheres as samples. The sample position is kept fixed in the X direction and shifted along the Z direction to capture focal stacks over an out-of-focus depth range of -100 µm to 100 µm with 2.5 µm spacing, distributed symmetrically around the focal position. At the focal position, by controlling shutters S1 and S2, the in-focus image and its corresponding hologram are captured respectively. The sample is then shifted along X to a different FOV and the above operation is repeated. In total, we capture 50 groups of images (50 raw focal stacks and 50 raw holograms) and segment 2346 sub focal stacks from the 50 raw focal stacks, together with their corresponding 2346 sub-holograms from the 50 raw holograms. As shown in Fig. 2(a), taking a captured raw focal stack (1024×768 pixels) as an example, after numbering the raw focal stack (sequence numbers from -40 to 40), the sub-stacks are segmented (128×128 pixels). We use the entropy function [34] and discrete cosine transform (DCT) coefficients [35] to determine the focus evaluation value of the images in arbitrary sub-stacks in the spatial and frequency domains, respectively; the normalized focus evaluation values are shown in Fig. 2(b).
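For illustration, the sketch below computes simple entropy- and DCT-based focus measures for the slices of a sub-stack; the exact formulations used in [34,35] may differ, and the high-frequency block size keep=8 is an assumed parameter.

```python
import numpy as np
from scipy.fft import dctn

def entropy_focus_value(img, bins=256):
    """Spatial-domain focus measure: Shannon entropy of the gray-level histogram."""
    hist, _ = np.histogram(img, bins=bins, range=(img.min(), img.max() + 1e-12))
    p = hist.astype(np.float64) / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def dct_focus_value(img, keep=8):
    """Frequency-domain focus measure: energy ratio of high-frequency DCT coefficients."""
    c = dctn(img.astype(np.float64), norm="ortho")
    total = np.square(c).sum()
    low = np.square(c[:keep, :keep]).sum()   # low-frequency block (including DC)
    return float((total - low) / (total + 1e-12))

def normalized_curve(stack, metric):
    """Normalize the per-slice metric over a sub-stack (depth, H, W), giving a
    focus-evaluation curve analogous to Fig. 2(b); whether the extremum at the
    in-focus slice is a peak or a valley depends on the metric convention."""
    vals = np.array([metric(s) for s in stack])
    return (vals - vals.min()) / (vals.max() - vals.min() + 1e-12)
```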


Fig. 2. (a) Numbering the raw focal stack and then segmenting sub-images. The in-focus image sequence number is 0. Left side of 0: -1 to -40 (from proximal segment to distal segment). Right side of 0: 1 to 40 (from proximal segment to distal segment). (b) Quantitative representation of the focus evaluation values of the segmented sub-images. (c) Left: the in-focus image captured under spatially incoherent illumination. Right: raw hologram captured at the focal position under spatially coherent illumination and the reconstruction flow of the hologram.


The phase images are obtained by numerical reconstruction of the sub-holograms. In Fig. 2(c), the in-focus image (Target1) and the corresponding phase image (Target2) are shown. The main steps of the hologram reconstruction are as follows: first, filter out the object spectrum of the raw hologram and shift it to the center of the frequency domain; then use the PCA algorithm [36] to compensate the phase distortion; finally, reconstruct the compensated spectrum numerically to obtain the distortion-free phase.
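A minimal sketch of the first reconstruction steps (selecting and recentering the +1-order spectrum) is shown below, assuming a single-channel hologram array; the PCA-based distortion compensation of [36] and phase unwrapping are omitted, and the filter radius is an assumed parameter.

```python
import numpy as np

def reconstruct_offaxis_phase(hologram, radius=40):
    """Minimal off-axis hologram reconstruction: select the +1 order, recenter it,
    inverse-transform, and return the wrapped phase. Aberration compensation
    (PCA-based in the paper) is omitted here."""
    H = np.fft.fftshift(np.fft.fft2(hologram))
    mag = np.abs(H).copy()
    cy, cx = np.array(H.shape) // 2
    mag[cy - radius:cy + radius, cx - radius:cx + radius] = 0  # suppress the DC term
    py, px = np.unravel_index(np.argmax(mag), mag.shape)       # +1 order carrier peak

    # Circular mask around the +1 order
    yy, xx = np.ogrid[:H.shape[0], :H.shape[1]]
    mask = (yy - py) ** 2 + (xx - px) ** 2 <= radius ** 2
    order1 = np.where(mask, H, 0)

    # Shift the selected order to the spectrum center (removes the carrier fringes)
    order1 = np.roll(order1, (cy - py, cx - px), axis=(0, 1))

    field = np.fft.ifft2(np.fft.ifftshift(order1))
    return np.angle(field)  # wrapped phase; unwrapping/compensation would follow
```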

2.3 Model architecture

The two subnets share the same architecture. As shown in Fig. 3(a), our network is inspired by U-Net [37] and the residual network [38]; it consists of a down-sampling path (encoder), four residual blocks, and an up-sampling path (decoder). The down-sampling path consists of three repeating stages, each with two convolutional blocks followed by a 2×2 max-pooling filter with stride 2 (a form of non-linear down-sampling that eliminates non-maximal values, reducing the computational complexity and partly avoiding overfitting). The number of channels in the first stage goes from 1 to 64, and each subsequent stage doubles the number of channels. The four residual blocks after down-sampling increase the accuracy of feature expression through the increased depth. The up-sampling path consists of three repeating stages, each with an up-convolution (transposed convolution) whose output is concatenated with the corresponding feature map from the down-sampling path via a skip connection (feature maps of the same size are connected, so that features learned in the encoding stage are passed to the decoding stage), followed by two convolutional blocks.


Fig. 3. Detailed schematic of the Autofocusing-Net and the QPI-Net architecture. (a) Each blue box corresponds to a multi-channel feature map. Blue-dotted boxes represent copied feature maps. The digits on the top of boxes denote the number of channels. The digits in the format $x \times y$ at the lower left edge of the boxes denote the size of feature maps. (b) Yellow box: detailed structure of the convolutional block. Black box: detailed structure of the residual block.


As shown in Fig. 3(b), in order to accelerate model convergence and improve the training speed, each 3×3 convolution is followed by a batch normalization (BN) layer and an activation function (ReLU) [39]. The residual block includes two 3×3 convolutions and two activation functions. The feature map produced by passing the output of the previous layer through the two convolutions and two activation functions is added to the feature map entering the block, and the sum is output to the next block. This shortcut between the input and output helps optimize the neural network.
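The sketch below is one way to assemble these blocks in PyTorch, consistent with the text above; the exact channel counts, skip-connection layout, and output layer are inferred from Fig. 3 and may differ from the authors' implementation, and the class name AFQPISubnet is ours.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """3x3 convolution -> batch normalization -> ReLU (Fig. 3(b), yellow box)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True))

    def forward(self, x):
        return self.block(x)

class ResidualBlock(nn.Module):
    """Two 3x3 convolutions with ReLU, added to the block input (Fig. 3(b), black box)."""
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return x + self.relu(self.conv2(self.relu(self.conv1(x))))  # shortcut connection

class AFQPISubnet(nn.Module):
    """Encoder (3 stages) -> 4 residual blocks -> decoder (3 stages with skip connections)."""
    def __init__(self, in_ch=1, out_ch=1):
        super().__init__()
        self.enc1 = nn.Sequential(ConvBlock(in_ch, 64), ConvBlock(64, 64))
        self.enc2 = nn.Sequential(ConvBlock(64, 128), ConvBlock(128, 128))
        self.enc3 = nn.Sequential(ConvBlock(128, 256), ConvBlock(256, 256))
        self.pool = nn.MaxPool2d(2)
        self.res = nn.Sequential(*[ResidualBlock(256) for _ in range(4)])
        self.up3 = nn.ConvTranspose2d(256, 256, 2, stride=2)
        self.dec3 = nn.Sequential(ConvBlock(512, 256), ConvBlock(256, 256))
        self.up2 = nn.ConvTranspose2d(256, 128, 2, stride=2)
        self.dec2 = nn.Sequential(ConvBlock(256, 128), ConvBlock(128, 128))
        self.up1 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec1 = nn.Sequential(ConvBlock(128, 64), ConvBlock(64, 64))
        self.out = nn.Conv2d(64, out_ch, 1)

    def forward(self, x):
        s1 = self.enc1(x)              # 128x128, 64 channels
        s2 = self.enc2(self.pool(s1))  # 64x64, 128 channels
        s3 = self.enc3(self.pool(s2))  # 32x32, 256 channels
        b = self.res(self.pool(s3))    # 16x16, 256 channels
        d3 = self.dec3(torch.cat([self.up3(b), s3], dim=1))
        d2 = self.dec2(torch.cat([self.up2(d3), s2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), s1], dim=1))
        return self.out(d1)

# Quick shape check on a 128x128 sub-image
if __name__ == "__main__":
    y = AFQPISubnet()(torch.randn(1, 1, 128, 128))
    print(y.shape)  # torch.Size([1, 1, 128, 128])
```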

2.4 Designed loss function and networks training method

The performance of a deep learning framework depends not only on the choice of network structure but also on the choice of loss function. For network training, the input images and the corresponding target images should be strictly matched. In the experiment, however, the motorized displacement platform may introduce slight random vibration during recording, which inevitably leads to pixel-level registration errors. A traditional deep learning model uses pixel loss to measure the pixel-wise difference between the network output and the ground truth label, which reduces the accuracy of the reconstruction results under such misregistration. In addition, optimizing the network by minimizing a pixel-wise error is equivalent to averaging, which makes the output image blurry [40]. Here, we propose a new loss function model based on the perceptual loss [41] and the structural similarity algorithm [42] to improve our model. As shown in Fig. 4(a), two identical parts exist in both loss function 1 ($L$) and loss function 2 ($F$). The first part: the ground truth label and the network output image are each fed into a copy of the network whose parameters are not updated, so as to extract feature values; the mean squared error (MSE) between the feature values of the output image and those of the ground truth label is then used as part of the objective function, which reduces the influence of pixel offsets and drives the output image closer to the ground truth label. We performed a brute-force search that compared the influence of the feature maps from each layer on the final output, and selected the features extracted from the F2 layer and the F7 layer (details shown in Fig. 4(a)). To quantify this influence, the MSE is defined over $M$ (feature map of the ground truth label passed through the network) and $N$ (feature map of the output image passed through the network). It is written as

$$MSE({M_j},{N_j}) = \frac{1}{n}\sum\limits_{i = 1}^n {||{M_j} - {N_j}|{|^2}}$$
where $i$ indexes the image pairs in a mini-batch, $j$ denotes the layer of the network, and $n$ is the batch size. The optimization objective is the average over the mini-batch of the difference between $M$ and $N$ at each data point.


Fig. 4. Training and testing of the AF-QPINet. (a) Training flow diagram. 1: an arbitrary out-of-focus bright-field image (Input1) passes through the Autofocusing-Net to obtain Output1; 2: Output1 and Target1 are input into the Autofocusing-Net concurrently; 3: calculate the final loss value $L(.)$. Finally, the loss is back-propagated through the Autofocusing-Net to update the network’s parameters. The flow path from 4 to 6 is the training process of QPI-Net, which is consistent with the training process of the Autofocusing-Net (corresponding to the flow path from 1 to 3). (b) Testing flow diagram.


The second part: to generate finer details and higher resolution, we employ the negative structural similarity index (NSSIM) as an objective function [43]. NSSIM is one minus the structural similarity index (SSIM). For a given image $X$ (ground truth label) and $Y$ (the network output image), NSSIM is defined as

$$NSSIM(X,Y) = \frac{1}{n}\sum\limits_{i = 1}^n {[1 - \frac{{(2{\mu _X}{\mu _Y} + {c_1})(2{\sigma _{XY}} + {c_2})}}{{(\mu _X^2 + \mu _Y^2 + {c_1})(\sigma _X^2 + \sigma _Y^2 + {c_2})}}]}$$
where ${\mu _X}$ and ${\mu _Y}$ are the means of the images $X$ and $Y$; $\sigma _X^2$ and $\sigma _Y^2$ represent the variances; ${\sigma _{XY}}$ is the covariance of $X$ and $Y$; and ${c_1}$ and ${c_2}$ are stabilization constants used to prevent division by a small denominator. The loss function can be written as
$$Loss = \frac{{{\lambda _1}}}{2} \ast NSSIM(X,Y) + \frac{1}{4}[{\lambda _2} \ast MSE({M_2},{N_2}) + {\lambda _3} \ast MSE({M_7},{N_7})]$$
where the weights ${\lambda _1} = 0.01$, ${\lambda _2} = 2500$, and ${\lambda _3} = 400$ give the best performance. The large gap between ${\lambda _1}$ and [${\lambda _2}$, ${\lambda _3}$] is due to the different scales of NSSIM and MSE, and the difference between ${\lambda _2}$ and ${\lambda _3}$ is due to the different dimensions of the feature maps. The loss is back-propagated through the network and the parameters are updated with the Adam optimizer, with a learning rate starting from 0.001 and multiplied by 0.95 every ten epochs; a mini-batch size of 32 is adopted to update the parameters of the Autofocusing-Net and a mini-batch size of 16 is adopted for the QPI-Net. In each mini-batch, one iteration of the optimization is performed to update the parameters of the network.
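A minimal sketch of the composite loss in Eq. (6) and the stated optimizer schedule is given below, assuming images normalized to [0, 1]; the SSIM term follows the global (non-windowed) form of Eq. (5) with commonly used stabilization constants, and extract_features is a hypothetical hook standing in for the F2/F7 feature extraction of Fig. 4(a).

```python
import torch
import torch.nn.functional as F

C1, C2 = 0.01 ** 2, 0.03 ** 2  # common SSIM constants for data range 1 (assumed values)

def nssim(x, y):
    """Global (non-windowed) 1 - SSIM as in Eq. (5), averaged over a (B, 1, H, W) batch."""
    mu_x, mu_y = x.mean(dim=(1, 2, 3)), y.mean(dim=(1, 2, 3))
    var_x = x.var(dim=(1, 2, 3), unbiased=False)
    var_y = y.var(dim=(1, 2, 3), unbiased=False)
    cov = ((x - mu_x[:, None, None, None]) * (y - mu_y[:, None, None, None])).mean(dim=(1, 2, 3))
    ssim = ((2 * mu_x * mu_y + C1) * (2 * cov + C2)) / \
           ((mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2))
    return (1 - ssim).mean()

def composite_loss(net, output, target, lam1=0.01, lam2=2500.0, lam3=400.0):
    """Eq. (6): weighted NSSIM plus feature-space MSE from two internal layers.
    `net.extract_features` is a hypothetical helper returning the F2 and F7 feature maps;
    the target features carry no gradient, while gradients reach the network through `output`."""
    with torch.no_grad():
        m2, m7 = net.extract_features(target)   # features of the ground truth label
    n2, n7 = net.extract_features(output)       # features of the network output
    perceptual = lam2 * F.mse_loss(n2, m2) + lam3 * F.mse_loss(n7, m7)
    return 0.5 * lam1 * nssim(target, output) + 0.25 * perceptual

# Adam with the stated schedule: lr starts at 1e-3 and decays by 0.95 every 10 epochs, e.g.
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.95)
```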

As shown in Fig. 4(b), after training, a set of out-of-focus bright-field images passed through the Autofocusing-Net yields the corresponding in-focus images, which then generate the corresponding phase images through the QPI-Net.

3. Results and discussion

3.1 Creation of datasets and networks training details

For the training of Autofocusing-Net, the dataset consists of 2000 sub-stacks (2000 different FOVs; 81 depths; 162000 images) randomly selected from the 2346 sub-stacks, which are further augmented fourfold to 8000 sub-stacks (8000 different FOVs; 81 depths; 648000 sub-images) by rotating them by 0, 90, 180, and 270 degrees. It should be noted that a large training dataset means a long training time, and such a huge amount of data is unfavorable for network training. To improve the training efficiency, it is necessary to reduce redundant training data, so we attempted training with less data. As shown in Fig. 5(a), we used images recorded at $z = -65$ µm and at $z = -80$ µm, respectively, as inputs to train the Autofocusing-Net and tested on images recorded at other positions between $z = -65$ µm and $z = -80$ µm, using the SSIM as the quantitative evaluation metric. As expected, the SSIM peaks at the trained depths and decreases at untrained depths. We then trained the Autofocusing-Net using both the $z = -65$ µm and the $z = -80$ µm datasets. The SSIM increases at untrained positions between $z = -65$ µm and $z = -80$ µm (such as point 1 and point 2), reaching values of about 0.9. We conclude that the training interval can be set to 10 µm with acceptable SSIM. Therefore, the dataset contains image pairs at 17 out-of-focus depths (out-of-focus depth range of -80 µm to 80 µm; 136000 image pairs), whose sequence numbers are $D = 4d$ $(d = 0, \pm 1, \pm 2, \ldots, \pm 8)$. Of the dataset, 85% is used for training (6800 different FOVs; 17 out-of-focus depths; 115600 image pairs) and 15% for validation (1200 different FOVs; 17 out-of-focus depths; 20400 image pairs). The testing dataset consists of the remaining 346 sub-stacks (346 different FOVs; 41 integer out-of-focus depths from -100 µm to 100 µm). A mini-batch of 32 out-of-focus images is input into the Autofocusing-Net for every iteration, and the network is evaluated on the validation dataset (3625 iterations constitute 1 epoch).
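A sketch of the fourfold rotation augmentation and the 85/15 FOV split described above is given below, assuming each sub-stack is stored as a (depth, height, width) array; the random seed and helper names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_by_rotation(stack):
    """Fourfold augmentation of one sub-stack (depth, H, W) by 0/90/180/270-degree rotations."""
    return [np.rot90(stack, k, axes=(1, 2)) for k in range(4)]

def split_fovs(n_fovs, train_frac=0.85):
    """Random 85/15 split of FOV indices into training and validation sets."""
    idx = rng.permutation(n_fovs)
    n_train = int(round(train_frac * n_fovs))
    return idx[:n_train], idx[n_train:]

# Example with the paper's numbers: 8000 augmented FOVs -> 6800 training, 1200 validation.
train_idx, val_idx = split_fovs(8000)
print(len(train_idx), len(val_idx))  # 6800 1200
```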


Fig. 5. (a) The mean SSIM, at each depth, between the output of networks trained for autofocusing and the ground truth labels; the images for network training were taken at only one or two specific depths. Each SSIM curve is averaged over 200 random FOVs from the testing dataset. (b) The validation losses of both Autofocusing-Net and QPI-Net decrease during training. Here 3000 iterations are shown.


For the training of QPI-Net, the dataset consists of 8000 in-focus images and the corresponding phase images; 85% are used for the training dataset (6800 different FOVs) and 15% for the validation dataset (1200 different FOVs). The test output images of the Autofocusing-Net are used as the input to test the QPI-Net. A mini-batch of 16 in-focus images is input into the QPI-Net for every iteration, and the network is also evaluated on the validation dataset (425 iterations constitute 1 epoch). As shown in Fig. 5(b), the validation losses of the Autofocusing-Net and the QPI-Net both decrease and the two networks converge gradually during training.

To avoid over-fitting of the neural network, the model parameters that perform best on the validation dataset are retained. The AF-QPINet is implemented in Python 3.6.8 with PyTorch 1.3.1; the networks are trained and tested on a PC with two Intel Xeon Gold 5117 CPUs @ 2.00 GHz, 128 GB RAM, and an NVIDIA GeForce RTX 2080 Ti GPU. The training of the Autofocusing-Net takes ∼90 hours for 100 epochs and that of the QPI-Net takes ∼5 hours for 100 epochs. The imaging speed of the AF-QPINet reaches 9∼10 fps (the Autofocusing-Net or the QPI-Net individually reach 20 fps).

3.2 Comparison of network performance

Recently, U-Net has been used in many optical imaging applications [44], such as low-light fluorescence imaging [45], phase retrieval [33], and imaging in low-light scenes [46]. Compared with a combined network composed of two U-Net subnets, the proposed AF-QPINet adds residual modules to the network structure and uses a new loss function. To compare the effect of these two modifications, we trained U1-Net and U2-Net (U1-U2) with plain U-Net structures for autofocusing and phase generation, respectively. Their training dataset includes two out-of-focus depths ($z = -20$ µm and $z = 20$ µm), and the other hyperparameters, including the learning rate, number of epochs, and batch size, are the same as those of the Autofocusing-Net and the QPI-Net, respectively. As shown in Fig. 6(a) and Fig. 6(b), during training our proposed networks achieve higher SSIM within fewer epochs. After training, we tested U1-U2 and the AF-QPINet with the same testing dataset. The SSIM over 200 random FOVs from the testing dataset recorded between $z = -20$ µm and $z = 20$ µm is shown in Fig. 6(c) and Fig. 6(d) as violin plots. The AF-QPINet greatly improves the accuracy of the imaging results (an SSIM increase of 0.6∼0.7) and achieves better image quality with fewer training epochs.


Fig. 6. (a) The evolution of the mean SSIM between the training images and their corresponding ground truth labels for Autofocusing-Net and U1-Net over 100 epochs. (b) The evolution of the mean SSIM between the training images and their corresponding ground truth labels for QPI-Net and U2-Net over 100 epochs. (c) The violin plots of SSIM of output results for Autofocusing-Net and U1-Net. (d) The violin plots of SSIM of output results for QPI-Net and U2-Net.


The proposed AF-QPINet works in a series operation mode, in which the output of the Autofocusing-Net is the input of the QPI-Net. We compared it with a multifunction end-to-end network such as the Y-shaped structure [32], in which an in-focus phase is directly retrieved from an out-of-focus intensity image. That route seems more efficient; however, we found that the end-to-end structure is not only time-consuming but also much less accurate than the proposed network, because the end-to-end network must complete two tasks at once: in-focus reconstruction and phase retrieval. As shown in Fig. 7, we trained the QPI-Net with out-of-focus intensity and phase image pairs to obtain Output2*, while Output2 is the final phase image obtained by our method. Comparing the two results, the phase distribution of Output2 is almost bias-free because it is constrained by the result of the Autofocusing-Net, whereas Output2* is obviously unsatisfactory. The SSIM values for our method and the end-to-end method are 0.9603 and 0.8752, respectively. Besides that, we compared our proposed network with a combined network in the reverse order, in which the QPI-Net is applied first and then the Autofocusing-Net. In that case the QPI-Net must convert intensity images, whether in focus or not, into phase images. Training such a network requires a large number of out-of-focus phase images that are hard to obtain, which would lengthen the data preparation, and the accuracy of the generated phase images is bound to be affected by the poor quality of the out-of-focus intensity images. Therefore, the SSIM value of the combined network with inverse order is also lower than that of our proposed structure.


Fig. 7. The comparison with the end-to-end network.


3.3 Output from the proposed network

Figure 8 shows the test results at a trained out-of-focus depth for different FOVs. As shown in Fig. 8(a), the out-of-focus bright-field image is successfully restored to the in-focus image and the corresponding phase image is output; the SSIM and error maps between the output images and their corresponding ground truth labels preliminarily prove the feasibility of the AF-QPINet. In addition, as shown in Fig. 8(b) and (d), the background of the output images appears smoother than that of the corresponding ground truth labels, mainly because of the denoising effect of the networks [47,48]. When we suppress the high-frequency information, the SSIM between the output images and the ground truth labels increases; that is, once the noise interference is eliminated, the results of our method are basically (though not completely) consistent with the actual distribution, as shown in Fig. 8(c) and (e). This further proves the feasibility of the AF-QPINet.


Fig. 8. Test results of polystyrene microspheres at $z ={-} 20$µm (trained depth) from different FOVs. The AF-QPINet input, output and ground truth are shown in (a), the intensity information along the line in the Part A and Part B is displayed on the right of (a), the SSIM is indicated between ground truth and output, and the error1 and error2 are the error maps between AF-QPINet output images and their corresponding ground truth labels. (b) and (c) are the specific result of output1 in the Part3, where the red block diagram in (c) is the intensity value of I and the blue block diagram is the value of II. (d) and (e) are the specific result of output2 in the Part3. The three-dimensional images of phase and corresponding ground truth labels are displayed on both sides of (d), and the red block diagram in (e) is the phase value of I and the blue block diagram is the phase value of II.


For inputs to the AF-QPINet without out-of-focus information overlapping or loss, the outputs agree well with the ground truth. Furthermore, Fig. 9 shows test results at an untrained out-of-focus depth with information overlapping and near different boundaries, where the inputs near the boundaries are out-of-focus bright-field images with partial loss of diffraction information. Part A and Part C show the results for inputs captured near a boundary. Although, compared with the test results of polystyrene microspheres at $z = -20$ µm in Fig. 8(a), the SSIM decreases somewhat and the error maps show slightly larger errors, the outputs still match the ground truth well. Even when only a small portion of the sample is captured near the boundary, a good phase generation result can still be obtained. Part B and Part C show results for inputs captured with overlapping. Interestingly, as shown in Part D, an out-of-focus image captured near a boundary with overlapping can also yield the corresponding in-focus image and phase image through the AF-QPINet.


Fig. 9. Test results of polystyrene microspheres at $z ={-} 35$µm (untrained depth) from different FOVs with overlapping and under different boundaries. Blue boxes: results under different boundaries; red boxes: with overlapping.


The above discussion is based on test images captured at a single out-of-focus depth (trained or untrained). Figure 10(a) shows the mean SSIM calculated across an axial out-of-focus depth range of -100 µm to 100 µm, averaged over 200 random FOVs from the testing dataset. Within the trained depth range of -80 µm to 80 µm, the mean SSIM at both trained and untrained depths is higher than 0.93, with the SSIM at trained depths generally 0.02 higher than at untrained depths (as also noted above). In fact, this small difference in SSIM (0.95 versus 0.93) is visually negligible [30]. As shown in Fig. 10(b), within the depth range of -80 µm to 80 µm, the error maps between the ground truth labels and the output images confirm this observation. Beyond the trained range of -80 µm to 80 µm, the output image quality degrades rapidly and errors appear. By using a training dataset that contains out-of-focus bright-field images captured at more out-of-focus depths, the AF-QPINet could reconstruct the in-focus image and corresponding phase image over a wider range; in practice this mainly depends on the size of the training images, the magnification of the microscope, the size of the sample, etc.


Fig. 10. Test results of polystyrene microspheres at different out-of-focus depths. (a) The mean SSIM at different out-of-focus depths between output images and their corresponding ground truth labels. The out-of-focus images at trained depths (red boxes) are inputs to the AF-QPINet, and those in blue boxes are out-of-focus images at untrained depths. The SSIM curve is averaged over 200 random FOVs from the testing dataset. (b) The corresponding output results in (a).


3.4 Different biological samples

We further trained the AF-QPINet with label-free yeast cells and vicia faba leaf cells. As shown in Fig. 11(a) and Fig. 12(a), the datasets for these samples were obtained by the same procedure as for the polystyrene microspheres. In addition to the inhomogeneities of living organisms, the contours of biological samples are rough, with folds and protrusions; after diffraction, this information overlaps heavily with the sample information, and traditional autofocusing methods can hardly separate them and restore the in-focus information. As shown in Fig. 11(b) and Fig. 12(b), the AF-QPINet can not only restore the in-focus information of the samples but also reconstruct the edge contour details. Similarly, we plotted the intensity and phase values along the lines in (b), where the blue solid lines represent the ground truth. From the results in (c) and Fig. 11(e), the output images of the AF-QPINet are basically consistent with the ground truth. In addition, the curves of the output images are smoother than those of the ground truth labels, which is consistent with the polystyrene microsphere results: for biological samples, the AF-QPINet also removes noise during training, thereby improving the image contrast.


Fig. 11. Test results of yeast cells at different out-of-focus depths. (a) The in-focus image and the reconstructed phase image of the off-axis hologram captured in the experiment. (b) The Autofocusing-Net and QPI-Net input, output, ground truth, and the SSIM between ground truth and output. (c) The intensity value and phase value along the line in (b). (d) The specific result of output at $z ={-} 65$µm. The three-dimensional images of the generated phase image and ground truth are displayed on both sides. (e) Red block diagram: the intensity value along the line in (d); blue block diagram: the phase value along the line in (d).



Fig. 12. Test results of vicia faba leaf cells at different out-of-focus depths. (a) The in-focus image and the reconstructed phase image of the off-axis hologram captured in the experiment. (b) The Autofocusing-Net and QPI-Net input, output, ground truth, and the SSIM between ground truth and output. (c) The intensity value and phase value along the line in (b). (d) The in-focus intensity image of cotton stem. (e) Test results of cotton stem reconstruction with the AF-QPINet trained on a dataset containing only vicia faba leaf cell images.


The AF-QPINet successfully learned the in-focus information and phase information from the out-of-focus bright-field images of complex biological samples, which is in line with our expectations. For the samples used here, the mean SSIM and mean peak signal-to-noise ratio (PSNR) [49] of the results for the different datasets are shown in Table 1 and Table 2. The SSIM and PSNR are each averaged over 200 random FOVs at 17 trained depths and 24 untrained depths. From the results, the SSIM at trained depths is generally 0.02 higher than at untrained depths, and, as expected, the simpler the structure of the sample, the higher the quality of the AF-QPINet output. The SSIM for all our samples is higher than 0.91. Therefore, we believe that the AF-QPINet shows good versatility.


Table 1. The mean SSIM and PSNR between the output of Autofocusing-Net and corresponding ground truth with trained depth and untrained depth of different samples.


Table 2. The mean SSIM and PSNR between the output of QPI-Net and corresponding ground truth with trained depth and untrained depth of different samples.
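For reference, the mean SSIM and PSNR reported in Tables 1 and 2 can be computed with scikit-image as sketched below; note that scikit-image's SSIM is the windowed form, which differs slightly from the global expression in Eq. (5), and data_range assumes images normalized to [0, 1].

```python
import numpy as np
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

def mean_ssim_psnr(outputs, targets, data_range=1.0):
    """Average SSIM/PSNR over a set of output/ground-truth image pairs
    (e.g., 200 random FOVs at the trained or untrained depths)."""
    ssims, psnrs = [], []
    for out, gt in zip(outputs, targets):
        ssims.append(structural_similarity(gt, out, data_range=data_range))
        psnrs.append(peak_signal_noise_ratio(gt, out, data_range=data_range))
    return float(np.mean(ssims)), float(np.mean(psnrs))
```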

In principle, a trained neural network can be applied to similar types of samples. To analyze the generalization capability of the AF-QPINet, cotton stem slices were used as another sample. Using the AF-QPINet trained on a dataset containing only vicia faba leaf cell images, we tested its performance on out-of-focus images of the cotton stem. As shown in Fig. 12(e), the network achieves its optimal blind inference on the same type of samples it was trained with.

3.5 Stability test

In our case, the images of the same FOV were captured within a few minutes, which ensures the accuracy of the dataset acquisition. We then took weeks to capture all the training data of different samples with different FOVs, during which the optical system inevitably changed; the training dataset is therefore captured under different conditions. To test the sensitivity of the network to the experimental setup, we artificially shifted the original field of view to simulate sample deviation caused by instability of the optical elements [48]. The results are shown in Fig. 13. The SSIM values show that the AF-QPINet is not affected by such system disturbances; the network learns these features from the training dataset and shows excellent spatiotemporal compatibility.
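One simple way to implement the artificial field-of-view shift used in this test is to crop a laterally displaced window from the raw frame, as sketched below; the shift of 10 pixels is an assumed value, since the exact displacement is not stated.

```python
import numpy as np

def shift_fov(raw_frame, top, left, size=128, dy=10, dx=10):
    """Crop a 128x128 sub-image displaced by (dy, dx) pixels from its original
    position, simulating sample deviation caused by unstable optical elements."""
    y = int(np.clip(top + dy, 0, raw_frame.shape[0] - size))
    x = int(np.clip(left + dx, 0, raw_frame.shape[1] - size))
    return raw_frame[y:y + size, x:x + size]
```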


Fig. 13. Test of the sensitivity of the results to the setup. The former columns show images in the original FOV, and the latter columns show images in the shifted FOV.


4. Conclusion

In conclusion, we demonstrate a new learning-based microscopy method to obtain in-focus images and phase images from out-of-focus bright-field images, which greatly increases the imaging throughput for specimens. The matched image pairs used as training datasets are obtained by a hybrid coherent and incoherent illumination imaging system, which provides a new approach to acquiring experimentally matched image pairs. Furthermore, we construct a network framework with a novel loss function model that does not require accurate pixel-level image registration and achieves better image quality within fewer training epochs. The improved network framework is suitable for both autofocusing and quantitative phase imaging, and we believe it can be applied to more AI-enhanced imaging tasks. A simple spatially incoherent illumination microscope, which by itself can only record intensity, when combined directly with the AF-QPINet gains the unique capability to blindly and rapidly restore the in-focus image and generate the corresponding phase image from an out-of-focus bright-field image; this also means that the DOF of the incoherent illumination imaging system (∼7.76 µm) is greatly extended (to a depth range of over 160 µm). The AF-QPINet achieves excellent reconstruction results, with SSIM over 0.93 for in-focus images and over 0.91 for phase images. When imaging label-free living cells over long time periods, only a simple incoherent illumination imaging system is needed to obtain bright-field images; the AF-QPINet ensures that the in-focus image can be obtained rapidly and conveniently, while the speckle noise and phototoxicity caused by using a coherent illumination imaging system to obtain phase images are avoided. We believe the AF-QPINet has great application prospects in cell detection, tracking, recognition, and analysis.

Funding

National Natural Science Foundation of China (61775097, 61975081); National Key Research and Development Program of China (2017YFB0503505); Key Laboratory of Virtual Geographic Environment (Nanjing Normal University), Ministry of Education (2017VGE02).

Disclosures

The authors declare that there are no conflicts of interest related to this article.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. Y. Park, C. Depeursinge, and G. Popescu, “Quantitative phase imaging in biomedicine,” Nat. Photonics 12(10), 578–589 (2018). [CrossRef]  

2. H. Pinkard, Z. Phillips, A. Babakhani, D. A. Fletcher, and L. Waller, “Deep learning for single-shot autofocus microscopy,” Optica 6(6), 794–797 (2019). [CrossRef]  

3. M. Kreft, M. Stenovec, and R. Zorec, “Focus-drift correction in time-lapse confocal imaging,” Ann. N. Y. Acad. Sci. (2005).

4. K. Guo, J. Liao, Z. Bian, X. Heng, and G. Zheng, “InstantScope: a low-cost whole slide imaging system with instant focal plane detection,” Biomed. Opt. Express 6(9), 3210–3216 (2015). [CrossRef]  

5. J. Liao, Y. Jiang, Z. Bian, B. Mahrou, A. Nambiar, A. W. Magsam, K. Guo, S. Wang, Y. k. Cho, and G. Zheng, “Rapid focus map surveying for whole slide imaging with continuous sample motion,” Opt. Lett. 42(17), 3379–3382 (2017). [CrossRef]  

6. J. Liao, L. Bian, Z. Bian, Z. Zhang, C. Patel, K. Hoshino, Y. C. Eldar, and G. Zheng, “Single-frame rapid autofocusing for brightfield and fluorescence whole slide imaging,” Biomed. Opt. Express 7(11), 4763–4768 (2016). [CrossRef]  

7. M. Bathe-Peters, P. Annibale, and M. J. Lohse, “All-optical microscope autofocus based on an electrically tunable lens and a totally internally reflected IR laser,” Opt. Express 26(3), 2359–2368 (2018). [CrossRef]  

8. Y. Mao, B.-X. Wang, C. Zhao, G. Wang, R. Wang, H. Wang, F. Zhou, J. Nie, Q. Chen, Y. Zhao, Q. Zhang, J. Zhang, T.-Y. Chen, and J.-W. Pan, “Integrating quantum key distribution with classical communications in backbone fiber network,” Opt. Express 26(5), 6010–6020 (2018). [CrossRef]  

9. G. Popescu, T. Ikeda, R. R. Dasari, and M. S. Feld, “Diffraction phase microscopy for quantifying cell structure and dynamics,” Opt. Lett. 31(6), 775–777 (2006). [CrossRef]  

10. B. Bhaduri, H. Pham, M. Mir, and G. Popescu, “Diffraction phase microscopy with white light,” Opt. Lett. 37(6), 1094–1096 (2012). [CrossRef]  

11. H. Majeed, T. H. Nguyen, M. E. Kandel, A. Kajdacsy-Balla, and G. Popescu, “Label-free quantitative evaluation of breast tissue using Spatial Light Interference Microscopy (SLIM),” Sci. Rep. 8(1), 6875 (2018). [CrossRef]  

12. J. C. Amorim, B. M. Soares, O. A. Alves, M. V. Ferreira, G. R. Sousa, B. Silveira Lde, A. C. Piancastelli, and M. Pinotti, “Phototoxic action of light emitting diode in the in vitro viability of Trichophyton rubrum,” An. Bras. Dermatol. 87(2), 250–255 (2012). [CrossRef]  

13. C. Edwards, B. Bhaduri, T. Nguyen, B. G. Griffin, H. Pham, T. Kim, G. Popescu, and L. L. Goddard, “Effects of spatial coherence in diffraction phase microscopy,” Opt. Express 22(5), 5133–5146 (2014). [CrossRef]  

14. K. Komuro and T. Nomura, “Object plane detection and phase-amplitude imaging based on transport of intensity equation,” Opt. Rev. 24(5), 626–633 (2017). [CrossRef]  

15. L. Zhang, Q. Tang, D. Deng, M. Tao, X. Liu, and X. Peng, “Field-of-view correction for dual-camera dynamic phase imaging based on transport of intensity equation,” Chin. J. Lasers (2019).

16. B. Bhaduri, C. Edwards, H. Pham, R. Zhou, T. H. Nguyen, L. L. Goddard, and G. Popescu, “Diffraction phase microscopy: principles and applications in materials and life sciences,” Adv. Opt. Photonics 6(1), 57–119 (2014). [CrossRef]  

17. Y. Luo, L. Huang, Y. Rivenson, and A. Ozcan, “Single-Shot Autofocusing of Microscopy Images Using Deep Learning,” ACS Photonics 8(2), 625–638 (2021). [CrossRef]  

18. K. J. Halupka, B. J. Antony, M. H. Lee, K. A. Lucy, R. S. Rai, H. Ishikawa, G. Wollstein, J. S. Schuman, and R. Garnavi, “Retinal optical coherence tomography image enhancement via deep learning,” Biomed. Opt. Express 9(12), 6205–6221 (2018). [CrossRef]  

19. Y. Huang, Z. Lu, Z. Shao, M. Ran, J. Zhou, L. Fang, and Y. Zhang, “Simultaneous denoising and super-resolution of optical coherence tomography images based on generative adversarial network,” Opt. Express 27(9), 12289–12307 (2019). [CrossRef]  

20. B. Rahmani, D. Loterie, G. Konstantinou, D. Psaltis, and C. Moser, “Multimode optical fiber transmission with a deep learning network,” Light: Sci. Appl. 7(1), 69 (2018). [CrossRef]  

21. G. Zhang, T. Guan, Z. Shen, X. Wang, T. Hu, D. Wang, Y. He, and N. Xie, “Fast phase retrieval in off-axis digital holographic microscopy through deep learning,” Opt. Express 26(15), 19388–19405 (2018). [CrossRef]  

22. K. Umemura, Y. Matsukawa, Y. Ide, and S. Mayama, “Label-free imaging and analysis of subcellular parts of a living diatom cylindrotheca sp. using optical diffraction tomography,” MethodsX 7, 100889 (2020). [CrossRef]  

23. Y. He, G. Wang, G. Dong, S. Zhu, H. Chen, A. Zhang, and Z. Xu, “Ghost Imaging Based on Deep Learning,” Sci. Rep. 8(1), 6469 (2018). [CrossRef]  

24. Y. Li, Y. Xue, and L. Tian, “Deep speckle correlation: a deep learning approach toward scalable imaging through scattering media,” Optica 5(10), 1181–1190 (2018). [CrossRef]  

25. S. Jiang, J. Liao, Z. Bian, K. Guo, Y. Zhang, and G. Zheng, “Transform- and Multi-Domain Deep Learning for Single-Frame Rapid Autofocusing in Whole Slide Imaging,” Biomed. Opt. Express 9(4), 1601–1612 (2018). [CrossRef]  

26. T. Rai Dastidar and R. Ethirajan, “Whole Slide Imaging System Using Deep Learning-Based Automated Focusing,” Biomed. Opt. Express 11(1), 480–491 (2020). [CrossRef]  

27. Y. Wu, Y. Rivenson, H. Wang, Y. Luo, E. Ben-David, L. A. Bentolila, C. Pritz, and A. Ozcan, “Three-Dimensional Virtual Refocusing of Fluorescence Microscopy Images Using Deep Learning,” Nat. Methods 16(12), 1323–1331 (2019). [CrossRef]  

28. K. Wang, J. Di, Y. Li, Z. Ren, Q. Kemao, and J. Zhao, “Transport of intensity equation from a single intensity image via deep learning,” Opt. Lasers Eng. 134, 106233 (2020). [CrossRef]

29. A. Sinha, J. Lee, S. Li, and G. Barbastathis, “Lensless computational imaging through deep learning,” Optica 4(9), 1117–1125 (2017). [CrossRef]  

30. Y. Wu, Y. Rivenson, Y. Zhang, Z. Wei, H. Günaydin, X. Lin, and A. Ozcan, “Extended depth-of-field in holographic image reconstruction using deep learning based auto-focusing and phase-recovery,” Optica 5(6), 704–710 (2018). [CrossRef]  

31. H. Wang, M. Lyu, and G. Situ, “eHoloNet: a learning-based end-to-end approach for in-line digital holographic reconstruction,” Opt. Express 26(18), 22603–22614 (2018). [CrossRef]  

32. K. Wang, J. Dou, Q. Kemao, J. Di, and J. Zhao, “Y-Net: a one-to-two deep learning framework for digital holographic reconstruction,” Opt. Lett. 44(19), 4765–4768 (2019). [CrossRef]  

33. Z. Ren, Z. Xu, and E. Y. M. Lam, “End-to-end deep learning framework for digital holographic reconstruction,” Adv. Photonics 1(1), 016004 (2019). [CrossRef]  

34. L. Xi, L. Guosui, and J. Ni, “Autofocusing of ISAR images based on entropy minimization,” IEEE Trans. Aerosp. Electron. Syst. 35(4), 1240–1252 (1999). [CrossRef]  

35. X. Jin, Q. Jiang, S. Yao, D. Zhou, R. Nie, S.-J. Lee, and K. He, “Infrared and visual image fusion method based on discrete cosine transform and local spatial frequency in discrete stationary wavelet transform domain,” Infrared Phys. Technol. 88, 1–12 (2018). [CrossRef]  

36. S. Wold, K. Esbensen, and P. Geladi, “Principal component analysis,” Chemom. Intell. Lab. Syst. 2(1-3), 37–52 (1987). [CrossRef]  

37. O. Ronneberger, P. Fischer, and T. Brox, “U-Net: convolutional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015 (Springer International Publishing, Cham, 2015).

38. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016), pp. 770–778.

39. Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature 521(7553), 436–444 (2015). [CrossRef]  

40. I. Moon, K. Jaferzadeh, Y. Kim, and B. Javidi, “Noise-free quantitative phase imaging in Gabor holography with conditional generative adversarial network,” Opt. Express 28(18), 26284–26301 (2020). [CrossRef]  

41. J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time style transfer and super-resolution,” in Computer Vision – ECCV 2016 (Springer International Publishing, Cham, 2016).

42. Z. Wang, J. Chen, and S. C. H. Hoi, “Deep learning for image super-resolution: a survey,” IEEE Trans. Pattern Anal. Mach. Intell.

43. D. Ren, W. Zuo, Q. Hu, P. Zhu, and D. Meng, “Progressive image deraining networks: a better and simpler baseline,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019).

44. Z. Zhang, Y. Zheng, T. Xu, A. Upadhya, Y. J. Lim, A. Mathews, L. Xie, and W. M. Lee, “Holo-UNet: hologram-to-hologram neural network restoration for high fidelity low light quantitative phase imaging of live cells,” Biomed. Opt. Express 11(10), 5478–5487 (2020). [CrossRef]  

45. M. Weigert, U. Schmidt, T. Boothe, A. Müller, A. Dibrov, A. Jain, B. Wilhelm, D. Schmidt, C. Broaddus, S. Culley, M. Rocha-Martins, F. Segovia-Miranda, C. Norden, R. Henriques, M. Zerial, M. Solimena, J. Rink, P. Tomancak, L. Royer, F. Jug, and E. W. Myers, “Content-aware image restoration: pushing the limits of fluorescence microscopy,” Nat. Methods 15(12), 1090–1097 (2018). [CrossRef]  

46. A. Goy, K. Arthur, S. Li, and G. Barbastathis, “Low Photon Count Phase Retrieval Using Deep Learning,” Phys. Rev. Lett. 121(24), 243902 (2018). [CrossRef]

47. K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising,” IEEE Trans. on Image Process. 26(7), 3142–3155 (2017). [CrossRef]  

48. Z. Meng, L. Ding, S. Feng, F. Xing, S. Nie, J. Ma, G. Pedrini, and C. Yuan, “Numerical dark-field imaging using deep-learning,” Opt. Express 28(23), 34266–34278 (2020). [CrossRef]  

49. S. Winkler and P. Mohandas, “The Evolution of Video Quality Measurement: From PSNR to Hybrid Metrics,” IEEE Trans. on Broadcast. 54(3), 660–668 (2008). [CrossRef]  
