Abstract
Optical diffraction tomography (ODT) solves an inverse scattering problem to obtain label-free, 3D refractive index (RI) estimates of biological specimens. This work demonstrates 3D RI retrieval methods suitable for partially-coherent ODT systems supported by intensity-only measurements acquired through axial and angular illumination scanning. This framework allows access to 3D quantitative RI contrast using a simplified non-interferometric technique. We consider a traditional iterative tomographic solver based on a multiple in-plane representation of the optical scattering process and gradient descent optimization adapted for focus-scanning systems, as well as an approach that relies solely on 3D convolutional neural networks (CNNs) to invert the scattering process. The approaches are validated using simulations of the 3D scattering potential for weak-phase 3D biological samples.
© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement
1. Introduction
Optical diffraction tomography (ODT) provides a 3D reconstruction of the refractive index (RI) for nearly optically transparent samples exhibiting weak absorption and scattering properties. Traditionally, ODT approaches use specialized holographic imaging exploiting light sources with high degrees of spatial and temporal coherence, and complex hardware to achieve angular scanning of the sample illumination projection [1–4]. Interferometric approaches are subject to coherent speckle and phase instabilities. Non-interferometric ODT has been demonstrated using partially-coherent sources and traditional wide-field transmission microscopes [5–8]. This approach achieves a simplified hardware solution by utilizing an electronically tunable lens (ETL) to collect a series of through-focus intensity images as the axial focal depth is scanned. For samples that meet the criteria of the first Born approximation, the well-known transport of intensity equation (TIE) provides non-interferometric access to quantitative phase and RI contrast [9]. ODT based solely on axial scanning suffers from an incomplete representation of the 3D optical transfer function (OTF), particularly in the axial dimension. Hardware and software solutions to recover this missing resolution, commonly referred to as the “missing cone” problem, are at the core of almost all inverse scattering applications. In addition, choosing the optimal defocusing distance presents an inherent tradeoff between sensitivity and spatial resolution. While these systems have been shown to provide estimates of the transverse phase [10–15], 3D wavefield inversion is severely under-represented by the measurement space. We propose to improve upon non-interferometric ODT approaches by considering a system comprising fast axial scanning using an ETL as well as fast angular illumination scanning using a fully or partially-coherent source.
Angular scanning can be achieved using beam steering hardware including digital micromirror devices (DMD), spatial light modulators (SLM), and programmable Galvo mirrors. Angular diversity can also be achieved using a 2D LED array, where individual diodes sequentially provide illumination from a discrete and well calibrated direction [15–21].
Traditional ODT reconstruction techniques rely on a formal definition of the forward scattering model which is used directly to solve the corresponding inverse problem. These approaches are subject to the limitations of a rigid mathematical construct, including boundary artifacts and first-order scattering approximations resulting in phase biases, low-frequency blurring, and reduced axial resolution. Deep learning (DL) approaches can deduce the complex nonlinear transformation between the input measurements and desired output without the constraints imposed by a direct definition of an optical scattering model. We propose an inversion scheme based on representing the scattering process using a 3D convolutional neural network (CNN) for improved refractive index and phase retrieval. System performance is evaluated using simulations of the optical scattering and measurement acquisition processes for an ODT imaging system utilizing illumination and defocus scanning. Reconstruction performance is evaluated relative to an optimization technique based on an iterative back-projection gradient descent method using a multiple in-plane scattering model.
2. Optical diffraction tomography
Tomographic reconstruction can be realized by adopting the mathematical framework of an inverse problem. An inverse problem attempts to solve for an unknown sample $s$ using measurements $m$ generated from a forward model $H$ operating on $s$, i.e. $m = H(s)$. Applications of inverse problems in imaging include deconvolution, interpolation, denoising, super-resolution, non-linear wavefield inversion, inverse scattering, and time-reversal [22]. A tomographic inverse problem is depicted in Fig. 1, where the forward model represents the entire tomographic imaging system sampling the measurement space.
Optical diffraction tomography is an ill-posed inverse problem that entails computing the 3D refractive index from 2D scattered fields. The scattering problem can be defined by the inhomogeneous Helmholtz equation given by [23]
$$[{{\nabla^2} + k_0^2} ]\psi ({r,z} )={-} V({r,z} )\psi ({r,z} ),$$
where ${k_0} = 2\pi /\lambda $ is the optical wavenumber, $r = ({x,y} )$ are the transverse spatial coordinates, $z$ is the axial dimension, $\psi ({r,z} )= {\psi _i}({r,z} )+ {\psi _s}({r,z} )$ is the total field with contributions from the on-axis incident and scattered components, and $V({r,z} )$ is the complex scattering potential defined as [5,24]
$$V({r,z} )= k_0^2[{{n^2}({r,z} )- n_m^2} ],$$
where ${n_m}$ is the refractive index of the surrounding medium. The scattering potential is a solution to the Helmholtz equation and is a function of the 3D complex refractive index $n = {n_{re}} + i{n_{im}}$ with amplitude and phase components given by $A$ and $\phi $. Using the first-order Born approximation, $|{{\psi_s}} |\ll |{{\psi_i}} |$, the total field can be solved for without explicit knowledge of the scattered component using the Green’s function solution $G({r,z} )$ to the homogeneous equation for undiffracted light, given by [25]
$$\psi ({r,z} )= {\psi _i}({r,z} )+ \int {G({r - r^{\prime},z - z^{\prime}} )V({r^{\prime},z^{\prime}} ){\psi _i}({r^{\prime},z^{\prime}} )\,dr^{\prime}dz^{\prime}} .$$
3. Non-interferometric ODT system
Coherent ODT has been demonstrated based on digital holographic microscopy and angular illumination diversity achieved using beam steering devices [2]. The interferometric phase is encoded in the coherent spatial mixing of the path-length-matched object and reference beams and is trivially deduced in reconstruction. We consider a simpler non-interferometric ODT realization, suitable for fully or partially-coherent sources which generate spatially incoherent quasi-monochromatic light. Partial coherence can be achieved directly using a filtered halogen lamp, an LED array, or by diffusing the beam of a spatially coherent laser source. The system provides phase sensitivity without issues associated with interferometric optics including coherent optical speckle, mechanical instabilities, aberrations, and phase wrapping. Figure 2 shows the proposed non-interferometric ODT system [15,18–21,27]. In this configuration, angular illumination is achieved by diversifying single diode illumination in a structured LED array. The partially-coherent angular illumination is focused on the object using transmission microscope optics consisting of a condenser and objective lens. Without the need for a reference arm, a simpler optical refocusing module placed after interaction with the sample provides access to quantitative phase information by solving the TIE at axially displaced planes. The refocusing optics consists of a divergent offset lens and electronically tunable lens (OL/ETL) combination placed at the focal plane of a relay lens in a 4$f$ imaging system. Altering the effective focal length of the OL/ETL pair is equivalent to, but much faster than, either axially translating the recording plane of the focal plane array (FPA) or the sample stage, resulting in a re-focusing at the object plane [28].
We consider intensity measurements recorded as the effective focal length of the OL/ETL is varied to provide a focal scan across a variety of illumination projections provided by the illumination scanning module. We define the total number of measurements M as the number of defocus planes ${M_z}$ times the number of angles ${M_\theta }$, i.e. $M = {M_z} \cdot {M_\theta }$. Adequate sampling to support both scanning modalities comes at the cost of increased time and computational complexity in both hardware and reconstruction implementations. Through-focus scanning using an ETL and angular scanning typically achieve 10-20 ms update rates [29–31]. Measurement acquisition rates will also be limited by the integration time required to achieve high contrast intensity images collected through an objective lens, typically on the order of 100-400 ms for partially coherent sources [16,17]. Measurement acquisition rates of 2.5-10 Hz are therefore expected, providing 100 measurements of axial and illumination diversity in 10-40 sec. To improve effective measurement throughput, multi-diode coded illumination patterns can be used to improve reconstruction quality for a limited number of LED scans, drastically reducing the time required to collect measurement support for sample inversion [32].
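The acquisition-time estimate above is a quick tally; a minimal sketch using the figures quoted in the text (100 measurements at the 2.5-10 Hz rates):

```python
# Quick check of the acquisition-time estimate above: M measurements at the
# quoted 2.5-10 Hz rates. All values are taken directly from the text.
M = 100                  # measurements of axial and illumination diversity
t_slow = M / 2.5         # seconds at the slow (2.5 Hz) acquisition rate
t_fast = M / 10.0        # seconds at the fast (10 Hz) acquisition rate
```

This reproduces the 10-40 sec range stated above for a 100-measurement scan.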
4. Simulation of optical scattering
Transmission phase-sensitive imaging of biological samples is a well-suited application for ODT as such samples are often highly optically transparent with valuable information contained in the optical path length induced via propagation through the specimen [16,33,34]. In this section we present simulations of the 3D optical scattering potential and forward scattering process for samples that meet the requirements for ODT inversion.
4.1 Object scattering potential
We consider simulated phantoms representing the phase-only scattering potential of nearly-spherical cells, shown as a 3D distribution of refractive index in Fig. 3 (a). The simulated samples are stochastically generated using over-resolved object-plane coordinates. Stochastic sampling is guided by measured optical properties of cells [35,36] and contains random distributions of sub-cellular organelles modeled as 3D ellipsoids. The complex object field is computed using plane wave illumination at 532 nm. The imaging aperture of an objective lens is modeled using NA = 1.4. Figure 3 (b) shows 16 image-plane realizations of the simulated cells convolved with the incoherent detector point spread function (PSF) and subsampled on the FPA. These will serve as ground truth in subsequent processing and analysis, whereas the object-plane representations will be used to generate image-plane intensity measurements through application of an optical scattering forward model. Transverse image sizes of 96${\times} $96 pixels are considered, representing the FOV for a 10 micron sample assuming a pixel pitch of 6.5 microns and 60x magnification. Approximating the optical scattering process in the cell phantoms provides data for validation of 3D inverse scattering techniques.
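The phantom-generation procedure above can be sketched as follows; this is a minimal illustrative stand-in, with RI values, organelle counts, and sizes chosen as assumptions for demonstration rather than the measured optical properties cited in the text:

```python
import numpy as np

# Minimal sketch of a stochastic cell phantom: a nearly-spherical cell body
# with randomly placed ellipsoidal "organelles". The RI values, sizes, and
# counts are illustrative assumptions, not the cited measured properties.
def cell_phantom(n_vox=96, n_medium=1.33, n_cyto=1.36, n_org=1.39,
                 n_organelles=8, seed=0):
    rng = np.random.default_rng(seed)
    ax = np.linspace(-1, 1, n_vox)
    x, y, z = np.meshgrid(ax, ax, ax, indexing="ij")
    ri = np.full((n_vox,) * 3, n_medium)
    ri[x ** 2 + y ** 2 + z ** 2 <= 0.8 ** 2] = n_cyto   # cell body
    for _ in range(n_organelles):                       # ellipsoidal organelles
        c = rng.uniform(-0.5, 0.5, 3)                   # random center
        s = rng.uniform(0.05, 0.2, 3)                   # random semi-axes
        mask = (((x - c[0]) / s[0]) ** 2 + ((y - c[1]) / s[1]) ** 2
                + ((z - c[2]) / s[2]) ** 2) <= 1
        ri[mask] = n_org
    return ri

ri = cell_phantom()
```

The returned volume serves the role of the 3D refractive index ground truth in the simulated pipeline.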
4.2 Forward scattering model
Light propagation through homogeneous objects can be represented as the 3D convolution of the object scattering potential with the PSF for diffraction. By the mathematical nature of the convolution, this model only accounts for first-order scattering effects or equivalently, light scattered once in any direction. To compute optical scattering through inhomogeneous objects, we consider a multi-slice beam propagation model (MSBP) implementation of the tomographic forward operator that accounts for multiple-scattering along the axis of propagation. The model approximates 3D objects as thin layers and computes the optical field as sequential layer-to-layer propagation [16,18,37,38]. While the model does not account for out-of-plane scattering or plane-to-plane absorption, it does represent multiple in-plane scattering by computing the scattered field iteratively at each plane and is a suitable representation of the optical scattering process in biological samples. Figure 4 describes the mathematical form of the MSBP algorithm. We represent the axial dimension in the plane-to-plane propagation scheme using the z-plane indices p, i.e. $n({r,z} )= {n_{1:p}}(r )$. A plane wave is incident on the object at an angle $\theta = ({{\theta_x},{\theta_y}} )$ relative to the transverse coordinates. The scattered field ${\psi _p}(r )$ is computed by applying the scattering potential in the form of a transmittance function at each slice in a plane-to-plane propagation scheme. The composite scattered field is then propagated forward a distance ${z_f}$ to the final defocusing plane where a generalized pupil function is applied corresponding to both the imaging and detection optics. Figure 5 presents a visual representation of the forward scattering process. In general, the forward operator provides the complex ${\mathrm{\mathbb{C}}^3} \to {\mathrm{\mathbb{C}}^2}$ scattering transformation.
However, for weak phase objects only the real-valued refractive index contributes to the scattering potential, and the magnitude of the scattered field encodes the phase-diffraction information. The modulus-squared operation is applied to represent intensity measurements recorded by a photosensitive FPA.
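The slice-by-slice procedure above can be sketched as a minimal runnable MSBP forward operator. This is a pedagogical stand-in under simplifying assumptions (on-axis plane-wave illumination, no pupil or detection optics, illustrative sampling in microns), not the implementation used in the text:

```python
import numpy as np

# Minimal MSBP sketch: at each axial slice, multiply the field by a thin-
# slice transmittance derived from the local RI, then propagate one slice
# with the angular spectrum method. Parameters are illustrative assumptions.
def msbp_intensity(ri, n_medium=1.33, wavelength=0.532, dx=0.1, dz=0.1):
    nx, ny, nz = ri.shape
    k0 = 2 * np.pi / wavelength
    kx = 2 * np.pi * np.fft.fftfreq(nx, dx)
    ky = 2 * np.pi * np.fft.fftfreq(ny, dx)
    kx, ky = np.meshgrid(kx, ky, indexing="ij")
    kz2 = (n_medium * k0) ** 2 - kx ** 2 - ky ** 2
    prop = np.exp(1j * np.sqrt(np.maximum(kz2, 0.0)) * dz)  # angular spectrum kernel
    prop[kz2 < 0] = 0.0                                     # discard evanescent waves

    psi = np.ones((nx, ny), dtype=complex)                  # on-axis plane wave
    for p in range(nz):
        t = np.exp(1j * k0 * (ri[:, :, p] - n_medium) * dz)  # thin-slice transmittance
        psi = np.fft.ifft2(np.fft.fft2(psi * t) * prop)      # propagate one slice
    return np.abs(psi) ** 2                                  # FPA records intensity

# Sanity check: a homogeneous volume scatters nothing, so the exit
# intensity stays uniform.
I_out = msbp_intensity(np.full((32, 32, 8), 1.33))
```

Angled illumination and the generalized pupil at the defocus plane would be added on top of this skeleton.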
5. ODT-gradient inverse
The minimization problem in Eq. (6) describes a scattering inversion framework that can be solved using an iterative approach based on plane-to-plane forward-backward projection and gradient descent optimization [16,17,38]. We refer to this approach as ODT-Gradient inverse, outlined in Fig. 6. The model is comprised of a forward component that predicts the 3D complex field $\hat{\psi }({r,z,\theta } )$ and 2D complex measurements $\hat{\psi }({r,{z_f},\theta } )$ from the 3D reconstructed RI $\hat{n}({r,z} )$, and an inverse solver which re-estimates the scattering potential from the 3D complex field predictions supported by the through-focus/angle intensity measurements $I({r,{z_f},\theta } )$. In reconstruction we denote an estimate using the hat operator. The forward and backward scattering projections are modeled using the multi-slice beam propagation model described in Section 4.2, however in reconstruction the forward operator operates on the estimated scattering potential to provide access to the 3D complex scattered field and complex measurements, as opposed to intensity-only measurements. During each iteration, the estimate of the refractive index is successively updated in the direction of minimization for each measurement. After each iteration, total-variation (TV) regularization using the ${L^2}$ norm of the 3D refractive index, corresponding to R in Eq. (6), is implemented using a proximal gradient method to update the object reconstruction [8,17,39,40]. This allows for a momentum-based acceleration of the gradient in convex regularization problems.
The scattering estimates from the forward model are used in conjunction with the intensity measurements to update the complex object reconstruction during the back-projection gradient update described in Fig. 7. For a forward prediction at each illumination angle θ and defocus plane ${z_f}$, the residual is initialized as the difference between the complex measurements as predicted by the forward model and the observed measurements. In lieu of complex observations, the residual is initialized by enforcing the phase as predicted by the forward model applied to the square-root of the intensity measurements. For each axial plane p in a plane-by-plane back-propagation scheme, the complex field predicted by the forward model is ‘inverse-scattered’ according to the current estimate of the scattering potential. The gradient is then computed using the residual and subtracted from the previous estimate of the object reconstruction, yielding an object update against the gradient in the direction of minimization with gradient step-size $\mathrm{\varepsilon }$. Finally, the residual is back-projected to the previous plane and the algorithm repeats.
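Two steps of the update above can be sketched in a heavily simplified form. The residual initialization follows the amplitude-replacement rule described in the text; the single-slice gradient expression, step size, and slice thickness are illustrative assumptions rather than the paper's implementation:

```python
import numpy as np

# Sketch of the back-projection update: init_residual keeps the forward
# model's phase while enforcing the measured amplitude; gradient_step
# applies one in-plane update via the derivative of the thin-slice
# transmittance. The exact gradient form here is an illustrative assumption.
def init_residual(psi_pred, intensity_meas):
    psi_meas = np.sqrt(intensity_meas) * np.exp(1j * np.angle(psi_pred))
    return psi_pred - psi_meas

def gradient_step(n_est, psi_in, residual, eps, k0=2 * np.pi / 0.532, dz=0.1):
    t = np.exp(1j * k0 * n_est * dz)                     # thin-slice transmittance
    grad = np.real(-1j * k0 * dz * np.conj(t * psi_in) * residual)
    return n_est - eps * grad                            # step against the gradient
```

When the forward prediction already matches the measured amplitude the residual is zero and the RI estimate is left unchanged, as expected.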
6. ODT-deep inverse
Deep learning realized using CNNs has been broadly applied to optical microscopy applications [35,41–44] and solving inverse problems [45,46], including 3D ODT inverse scattering problems applied to holographic imaging and angular diversity [24]. In the context of Section 5, CNNs have been demonstrated to provide a deep regularization solution to replace the proximal-gradient update method in Fig. 6. In this implementation the network is trained to map the low-to-high resolution transformation provided by regularization routines [47]. Similarly, iterative tomographic inversion has been demonstrated using a CNN architecture trained in situ to learn minimization using a “deep physics prior” loss function as an alternative to solving a regularized optimization problem directly [48]. Both solutions address the missing cone problem by alleviating rigid dependence on scattering model approximations. We propose to address the shortcomings of traditional optimization routines by using a CNN architecture to directly map intensity measurements to refractive index, thus providing a full representation of scattering inversion from intensity-only measurements.
To acquire RI slices from a stack of intensity images we adopted the pix2pix conditional generative adversarial network (cGAN) architecture [49], a well-known model for high resolution image-to-image translation. Figure 8 (a) shows the proposed architecture for non-interferometric ODT imaging and reconstruction applied to a simulated cell phantom. The complex optical field de-focused to a plane ${z_{{m_z},{m_\theta }}}$ is generated and measured as intensity by the ${\mathrm{\mathbb{C}}^3} \to {\mathrm{\mathbb{R}}^2}$ forward model (${H_{{m_z}}}$) operating on the 3D object scattering potential, described by the multi-slice beam propagation forward model. The composite forward model $H = ({{H_0}, \ldots ,{H_{M = {M_z} \cdot {M_\theta }}}} )$ provides a series of 2D measurements as the object is scanned through focus and angle, i.e. $H:\,{\mathrm{\mathbb{C}}^3} \to \mathrm{\mathbb{R}}_{m\in M}^2$ where M is the measurement set. The non-linear inverse model ${\tilde{H}_{{n_z}\in {N_z}}}:\, \mathrm{\mathbb{R}}_{m\in M}^2\, \to {\mathrm{\mathbb{R}}^3}$ is implemented as a fully convolutional cGAN based on 3D convolution/deconvolution to achieve stack-to-volume translation for a set of z-plane reconstructions ${N_z}$, shown in Fig. 8 (b).
6.1 cGAN network architecture
GANs consist of two adversarial models: a generative model G learns the data distribution and is trained to produce outputs that cannot be distinguished from “real” images by an adversarially trained discriminator model D. The discriminator is trained to identify “fake” images produced by the generator and estimates the likelihood that a sample belongs to the training data; i.e. GANs learn a mapping from a random noise vector $\tilde{z}$ to an output image $\tilde{y}$, i.e. $G\, :\tilde{z} \to \tilde{y}$ [50].
Conditional GANs use auxiliary information to condition the generator and discriminator to continuously optimize network parameters and learning; cGANs learn a mapping from an observed image $\tilde{x}$ and random noise vector $\tilde{z}$, to an output image $\tilde{y}$, i.e. $G\, :\, \{{\tilde{x},\tilde{z}} \}\to \tilde{y}$ [51]. Figure 8 (b) depicts the training procedure implemented in the cGAN architecture. The generator output G($\tilde{x}$) and input images x are concatenated and input to the discriminator network to generate a mapping function D1. The input images and conditional images y, in this case consisting of the label images or desired generator outputs, are concatenated and input to the discriminator network to generate a mapping D2. The feature mappings D1 and D2 are used to construct the loss function of the discriminator. The generator output, conditional, and feature mapping D1 are used to construct the loss function of the generator. The composite objective of a conditional GAN can be expressed as [49,51]
$${{\cal L}_{cGAN}}({G,D} )= {\mathrm{\mathbb{E}}_{\tilde{x},\tilde{y}}}[{\log D({\tilde{x},\tilde{y}} )} ]+ {\mathrm{\mathbb{E}}_{\tilde{x},\tilde{z}}}[{\log ({1 - D({\tilde{x},G({\tilde{x},\tilde{z}} )} )} )} ].$$
The use of the discriminator in the min-max objective function provides an excellent model for image-to-image prediction as the conditioned network can represent sharp features, high resolution/high frequency content, and even degenerate distributions, whereas simpler models with commonly used pixel-wise loss functions produce blurry results. It has proven beneficial to regularize the objective function using the mean absolute error (MAE) as the L1 norm will further discourage blurring [49]. The final objective is given by
$${G^\ast } = \arg \mathop {\min }\limits_G \mathop {\max }\limits_D {{\cal L}_{cGAN}}({G,D} )+ {\gamma _G}{{\cal L}_{L1}}(G ),\quad {{\cal L}_{L1}}(G )= {\mathrm{\mathbb{E}}_{\tilde{x},\tilde{y},\tilde{z}}}[{{{\|{\tilde{y} - G({\tilde{x},\tilde{z}} )} \|}_1}} ].$$
The regularization parameter, ${\gamma _G} = 100$, is used to penalize generator mismatch with the conditional labeled data. The cGAN network trained using L1 minimization can learn to create a solution that resembles realistic high-resolution images with high-frequency detail. Network prediction is achieved by applying the trained generator to testing images. The networks are trained using 240 input/output image-stack pairs and tested using 64 hold-out samples. Each training example consists of the intensity measurement stack and 3D RI. For each training epoch, i.e. when each training sample has propagated through the network at least once, the model parameters are updated for each batch of training images using gradient descent optimization and back-propagation. The Adam algorithm is used for stochastic optimization with a learning rate of 0.0002 and momentum parameters ${\beta _1} = 0.5,\,{\beta _2} = 0.999$ [52].
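A single Adam update with the hyperparameters quoted above can be written out explicitly; this NumPy sketch is a stand-in for the framework optimizer used in training, not the training code itself:

```python
import numpy as np

# One Adam update with the hyperparameters quoted above (learning rate
# 0.0002, beta1 = 0.5, beta2 = 0.999). A pedagogical NumPy stand-in for
# the framework optimizer used in training.
def adam_step(theta, grad, m, v, t, lr=2e-4, b1=0.5, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad          # first-moment (momentum) estimate
    v = b2 * v + (1 - b2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - b1 ** t)             # bias correction, step t >= 1
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta, m, v = adam_step(np.array([1.0]), np.array([1.0]),
                        np.zeros(1), np.zeros(1), t=1)
```

With bias correction, the very first step moves each parameter by approximately the learning rate regardless of the gradient's scale.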
6.2 cGAN failure mechanisms
While cGAN is a powerful mathematical construct for image generation, there are common convergence failure mechanisms that can degrade the quality of the generated images. The fundamental GAN failure mechanism is mode collapse. Mode collapse arises from the adversarial training strategy when the generator is over-optimized to produce only a small subspace of plausible outcomes, or dominant modes, that the discriminator has learned to easily identify as a “fake”. Consequently, the discriminator lacks training support to adequately learn the full mapping of plausible input/output translations. In training, the discriminator converges to a local minimum and does not incentivize the generator to produce the non-dominant modes required to adequately train the network. Conditional GANs attempt to minimize this effect by processing the generator output and conditional images through the discriminator independently, and defining a composite loss function for optimization. While the issue of mode collapse has not been fully solved, more advanced mitigation strategies have received much attention in the machine learning community, including unrolled GANs which additionally include future discriminator outputs in the loss function to prevent over-optimization.
A distinct challenge is feature hallucination, which occurs if the generator has learned numerical mappings that are inconsistent with the true bounds of physics-based forward and inverse scattering. Feature hallucinations can arise when training datasets contain large sample-to-sample variations along with even small biases and noise. Increasing the regularization parameter ${\gamma _G}$ in Eq. (9) would further penalize prediction mismatch with the labeled data, effectively desensitizing the learning process and discouraging feature hallucinations, perhaps at the cost of robust network prediction for samples that are not well represented in the training data.
6.3 U-net generator network
The generator network implementation is the well-known U-Net architecture based on the fully convolutional network [49,53], shown in Fig. 9. It consists of a contracting path or encoder and an expansive path or decoder. The encoder/decoder follow typical CNN architectures. The contracting path consists of repeated applications of 4${\times} $4${\times} $4 3D spatial convolutions with 3D downsampling, each followed by a leaky rectified linear unit (LReLU) activation function to alleviate the vanishing gradient problem and improve learning efficiency [54]. At each layer in the contracting path the data is downsampled using stride 2 in the transverse dimensions with variable stride ${s_z}$ in the axial dimension depending on the number of measurements. At each downsampling step the number of feature channels (i.e. the number of independent 3D spatial filters in the network node) is doubled from 16 to 128. The size of the data at each layer in the network is then $({96/{2^l},96/{2^l},M/{s_z}^l} )$, with the final layer in the encoder producing 3${\times} $3${\times} $1 transverse image patches. Every step in the expansive path consists of an up-sampling of the feature map followed by a 4${\times} $4${\times} $4 convolution (“up-convolution”) with single pixel stride that halves the number of feature channels from 128 to 16. At layer interfaces in the expansive path ReLU activation functions are enforced with 50% dropout rate. After the last decoder layer, a 3D convolution is applied to map to the number of output channels and a Tanh activation function is applied to acquire the volume prediction. The network consists of 10 total layers; batch normalization is performed at each layer interface in the encoder and decoder paths to standardize the data, stabilize and improve network training [55]. The operations performed at each layer interface are detailed in Fig. 9.
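The layer-size bookkeeping above can be sketched directly. The transverse stride of 2 and the five downsampling steps (96 → 3) follow the text; the axial stride value ${s_z} = 3$ is an assumption chosen so that 252 measurements collapse to a single axial plane at the final 3${\times}$3${\times}$1 layer:

```python
# Feature-map sizes down the contracting path described above: the
# transverse size halves each layer (stride 2) while the axial size shrinks
# by the measurement-dependent stride s_z. s_z = 3 is an assumed value.
def encoder_shapes(n=96, m=252, s_z=3, layers=5):
    return [(n // 2 ** l, n // 2 ** l, max(m // s_z ** l, 1))
            for l in range(layers + 1)]

shapes = encoder_shapes()
```

For the 96${\times}$96 case with 252 measurements this runs from (96, 96, 252) at the input down to (3, 3, 1) at the encoder output.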
While classical CNN solutions require information to flow through the entire encoder/decoder structure, the “U-Net” implementation provides skip connections between mirror-symmetric layers in the encoder/decoder. Skip connections protect information from being lost as it flows through the network and provide access to low-level information which is crucial for image representation [49,53].
6.4 PatchGAN discriminator network
To further enforce high spatial frequency representations pix2pix adopts the well-known PatchGAN discriminator framework [49], which only penalizes structure at the scale of local image patches. The discriminator is run for a series of convolution patches across the image, averaging all responses to provide an aggregate likelihood that each 3${\times} $3${\times} $1 patch is real rather than produced by the generator. By assuming independence between pixel separations which are large relative to the patch size, the network will learn higher order image texture [49]. The discriminator network, shown in Fig. 10, mimics the first five convolutional layers in the contracting path of the generator network; the number of feature channels doubles from 16 to 128. The final layer in the discriminator network applies a Tanh activation function to 3${\times} $3${\times} $1 patches to generate class likelihood scores that feed the cGAN loss function described in Section 6.1 and train the generator network described in Section 6.3.
7. Results
7.1 Implementation and timing
All results were generated using a Linux-based system with an Intel Xeon W-2295 18-core (36-thread) processor operating at 3.00 GHz base frequency with 64 GB of RAM. To accelerate computations, both ODT inversion algorithms were executed using an NVIDIA RTX A5000 GPU with 24 GB RAM and 8192 CUDA cores. The algorithms were developed using Python 3.9.7 and CUDA Toolkit 11.6. ODT-Gradient inverse utilizes CuPy 9.6.0 for GPU accelerated computing. The cGAN architecture for the ODT-Deep inverse framework was implemented using the TensorFlow (v2.8.0) framework; matrix operations were implemented using single-precision (32-bit) floating point operations evaluated across CUDA cores.
The time complexity of ODT-Gradient inverse and ODT-Deep inverse implemented on the GPU are shown in Fig. 11. Algorithm timing is performed for 3D RI profiles with ${N_z}$=96 axial planes and $N \times N$ pixels per image; performance is shown for an increasing number of measurements and image sizes. The reconstruction time per iteration for ODT-Gradient inverse is shown in Fig. 11 (a). The time complexity of ODT-Deep inverse is presented in Fig. 11 (b): the training time per epoch (black) using 240 input/output image-stack pairs and the testing time per sample (red) are shown. For N = 96, network training per epoch is 1-10 min and measurement stack to RI volume prediction is performed in less than 100 ms for up to several hundred measurements. The total time to reconstruct a full 3D object is then significantly decreased relative to ODT-Gradient inverse, which takes 225 sec to iterate and converge to a 96${\times} $96${\times} $96 3D reconstruction using 400 total measurements and 25 iterations. For moderate image sizes (N > 128), ODT-Gradient reconstruction time is dominated by the series of plane-to-plane 2D spatial FFTs exhibiting $({{N^2} \cdot {N_z}} )\log ({{N^2}} )$ complexity. ODT-Deep inverse is dominated by layers of 3D convolutions, with the most intensive exhibiting $({{{[{N + k} ]}^2} \cdot [{{N_z} + k} ]} )\log ({{{[{N + k} ]}^2} \cdot [{{N_z} + k} ]} )$ complexity for filter size k. Computational limitations are reached for N > 512 due to extensive training times and limited GPU memory to store training measurements and volumes, highlighting a main disadvantage of pre-trained 3D neural networks.
7.2 ODT inverse for 3D RI estimation
We consider 3D refractive index reconstruction and prediction using through-focus and angle measurement simulations of the optical system described in Fig. 2 and optical scattering described in Section 4. Figure 12 (a-c) shows 2D slices through the true RI of a simulated cell phantom. The forward model is applied to the 3D scattering potential to generate intensity measurements which are used as support for the tomographic inversion processes. As a representative example, we will consider 252 measurements consisting of ${M_z} = 7$ defocus planes spanning ${\pm} $3.3 microns for each of ${M_\theta } = 36$ illumination angles spanning (-60,60) degrees in both transverse spatial dimensions, i.e. 6 angles relative to the x and y coordinates, respectively. For alternate measurement sets, the defocus and angular scanning range remained fixed at ${\pm} $3.3 microns and (-60,60) degrees respectively, with increasing resolution for increasing measurements.
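The representative scan geometry above can be written out explicitly; all values here are taken from the text:

```python
import numpy as np

# The representative measurement set described above: 7 defocus planes over
# +/-3.3 um and a 6x6 grid of illumination angles over (-60, 60) degrees,
# giving M = M_z * M_theta = 252 measurements.
defocus_um = np.linspace(-3.3, 3.3, 7)                   # M_z = 7 planes
tx, ty = np.meshgrid(np.linspace(-60, 60, 6),
                     np.linspace(-60, 60, 6))
angles_deg = np.stack([tx.ravel(), ty.ravel()], axis=1)  # M_theta = 36 angles
M = defocus_um.size * angles_deg.shape[0]                # total measurements
```

Alternate measurement sets keep these ranges fixed and change only the number of samples along each axis.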
Figure 13 (a-c) shows 2D slices through the reconstructed RI using ODT-Gradient inverse after 25 optimization iterations. The gradient solver timed 6 sec/iteration to perform 3D object reconstruction with ${N_z} = 96$ axial planes. Figure 14 (a-c) shows 2D slices through the reconstructed RI predicted by ODT-Deep inverse after 100 training epochs and the same measurements as considered in Fig. 13. Network training was executed at a rate of 165 sec/epoch, however the mean time to test a sample and predict a volume with ${N_z} = 96$ axial planes was just 68 ms.
Figures 12–14 show that ODT-Gradient inverse suffers from blurring and reconstruction biases that are not present in the ODT-Deep inverse reconstruction. Figure 15 provides a 3D rendering of the forward scattering process (a) and scattering inversion based on the two approaches discussed (b-c). The 3D projection further demonstrates that ODT-Deep inverse reconstructs the original 3D refractive index with minimal blurring and less overall bias relative to ODT-Gradient inverse. Furthermore, many inhomogeneities that are missing from (b) are fully resolved using the network based approach (c). Deep learning based reconstruction is not without artifacts, as seen by faint vertical striping in Fig. 14 (b-c). The artifacts were found to become less pronounced with increasing training examples and training iterations, at the cost of increased computational resources and extended training times.
Figure 16 shows the root-mean-square error (RMSE) of the 3D RI reconstruction relative to the true scattering potential aggregated using 64 testing samples for ODT-Gradient inverse as a function of optimization iteration (a) and ODT-Deep inverse as a function of training epochs (b). Increased axial and illumination diversity independently improved the quality of reconstruction for both approaches. Increased axial-scanning support for ODT-Gradient inverse improved the convergence for sufficiently high illumination diversity. ODT-Gradient inverse ultimately becomes limited by the numerical inaccuracies of the non-deterministic optimization routine for sufficiently large measurement sets, achieving minimum RI errors on the order of $2 \times {10^{ - 2}}$. The minimum achievable reconstruction error for ODT-Deep inverse was found to be 4${\times} $ below this bound. All results assume a microscope objective with NA = 1.4, demonstrating robust reconstruction performance for high resolution imaging.
8. Conclusions and future work
Non-interferometric phase microscopy significantly decreases the complexity of 2D/3D quantitative imaging systems. Measuring a stack of through-focus intensity images can provide high-fidelity phase-contrast images for samples that meet the requirements of the first Born approximation. 3D RI retrieval methods exist for focal-scanning systems, albeit with drastically reduced axial resolution relative to systems that measure illumination diversity. We considered advanced tomographic RI retrieval approaches designed to meet the unique challenges of non-interferometric ODT systems. To support scattering inversion, we combined through-focus and angle-scanning diversity to encode a more complete 3D representation into the 2D scattered fields, with improved spatial resolution relative to illumination-scanning-only systems. High-fidelity 3D RI images were then obtained using software solutions suited to tomographic reconstruction from large datasets of 3D input/output images.
This work demonstrates, in simulation, accurate 3D refractive index retrieval achieved by inverting the optical scattering process using non-interferometric measurements and an ODT-Deep inverse model based entirely on deep learning (DL) with 3D CNNs. The model consists of a pre-trained 3D cGAN network that translates 2D intensity measurements into a 3D RI volume. A fully learned representation of optical scattering inversion mitigates errors associated with incomplete sampling of the forward scattering kernel, scattering approximations, and numerical instabilities that arise from solving a non-deterministic constrained optimization problem. As is often the case with DL solutions, performance is bounded by the quality of the output images used in training. Using simulations of the phase-only scattering potential for biological samples, we demonstrated improved performance relative to a constrained optimization approach based on a multiple in-plane scattering model and regularized gradient descent. While training the ODT-Deep inverse model requires significant time and computational resources, we demonstrated a significant decrease in reconstruction error and a three-orders-of-magnitude reduction in post-training reconstruction time relative to the ODT-Gradient inverse model. However, DL-generated data can exhibit inaccuracies and feature hallucinations caused by the presence of noise and over-optimization. Our results show that increased axial and illumination diversity both significantly improve the quality of reconstruction for both inversion techniques. Additionally, we showed that increased axial scanning improves convergence, in terms of the number of training epochs, for sufficiently high illumination diversity.
This work serves as a proof of concept for 3D optical scattering inversion implemented using a single 3D convolutional neural network. As a next step we will seek to eliminate the training complexity by adopting a self-supervised learning approach based on an intelligent regularization routine that exploits knowledge of the optical scattering process [19,21]. A meaningful extension would consider scattering inversion for opaque and highly diffractive samples, for which we would expect significantly improved reconstruction quality from the increased optical sectioning provided by axial scanning diversity.
Acknowledgments
We would like to thank The Johns Hopkins Applied Physics Laboratory for supporting this research effort. Portions of this work were presented at the Optica Digital Holography and 3-D Imaging Meeting in 2022 as "3D Refractive Index Estimation Based on Partially-Coherent Optical Diffraction Tomography (PC-ODT) and Deep Learning."
Disclosures
The authors declare no conflicts of interest.
Data availability
Data underlying the results presented in this paper may be obtained from the authors upon reasonable request.
References
1. T. Latychevskaia, F. Gehri, and H. Fink, “Depth-resolved holographic reconstructions by three-dimensional deconvolution,” Opt. Express 18(21), 22527–22544 (2010). [CrossRef]
2. A. Kuś, W. Krauze, and M. Kujawińska, “Active limited-angle tomographic phase microscope,” J. Biomed. Opt. 20(11), 111216 (2015). [CrossRef]
3. Y. Lin, H. Chen, H. Tu, C. Liu, and C. Cheng, “Optically driven full-angle sample rotation for tomographic imaging in digital holographic microscopy,” Opt. Lett. 42(7), 1321–1324 (2017). [CrossRef]
4. V. Balasubramani, A. Kuś, H. Tu, C. Cheng, M. Baczewska, W. Krauze, and M. Kujawińska, “Holographic tomography: techniques and biomedical applications,” Appl. Opt. 60(10), B65–B80 (2021). [CrossRef]
5. J. Soto, J. Rodrigo, and T. Alieva, “Optical diffraction tomography with fully and partially coherent illumination in high numerical aperture label-free microscopy,” Appl. Opt. 57(1), A205–A214 (2018). [CrossRef]
6. J. Soto, J. Rodrigo, and T. Alieva, “Partially coherent illumination engineering for enhanced refractive index tomography,” Opt. Lett. 43(19), 4699–4702 (2018). [CrossRef]
7. J. Soto, J. Rodrigo, and T. Alieva, “Partially coherent optical diffraction tomography for label-free quantitative microscopy,” 16th Workshop on Information Optics (WIO), 1–3 (2017).
8. J. Soto, J. Rodrigo, and T. Alieva, “Partially coherent optical diffraction tomography toward practical cell study,” Front. Phys. 9, 666256 (2021). [CrossRef]
9. C. Zuo, J. Li, J. Sun, Y. Fan, J. Zhang, L. Lu, R. Zhang, B. Wang, L. Huang, and Q. Chen, “Transport of intensity equation: a tutorial,” Opt. Lasers Eng. 135, 106187 (2020). [CrossRef]
10. M. Jenkins, J. Long, and T. Gaylord, “Multifilter phase imaging with partially coherent light,” Appl. Opt. 53(16), D29–D39 (2014). [CrossRef]
11. Y. Bao and T. Gaylord, “Quantitative phase imaging method based on an analytical nonparaxial partially coherent phase optical transfer function,” J. Opt. Soc. Am. A 33(11), 2125–2136 (2016). [CrossRef]
12. Z. Jingshan, R. Claus, J. Dauwels, L. Tian, and L. Waller, “Transport of Intensity phase imaging by intensity spectrum fitting of exponentially spaced defocus planes,” Opt. Express 22(9), 10661–10674 (2014). [CrossRef]
13. L. Waller, L. Tian, and G. Barbastathis, “Transport of Intensity phase-amplitude imaging with higher order intensity derivatives,” Opt. Express 18(12), 12552–12561 (2010). [CrossRef]
14. Z. Jingshan, R. A. Claus, J. Dauwels, L. Tian, and L. Waller, “Non-uniform sampling and Gaussian process regression in transport of intensity phase imaging,” 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 7784–7788 (2014).
15. L. Tian and L. Waller, “Quantitative differential phase contrast imaging in an LED array microscope,” Opt. Express 23(9), 11394–11403 (2015). [CrossRef]
16. S. Chowdhury, M. Chen, R. Eckert, D. Ren, F. Wu, N. Repina, and L. Waller, “High-resolution 3D refractive index microscopy of multiple-scattering samples from intensity images,” Optica 6(9), 1211–1219 (2019). [CrossRef]
17. J. Li, N. Zhou, J. Sun, S. Zhou, Z. Bai, L. Lu, Q. Chen, and C. Zuo, “Transport of intensity diffraction tomography with non-interferometric synthetic aperture for three-dimensional label-free microscopy,” Light: Sci. Appl. 11(1), 154 (2022). [CrossRef]
18. L. Tian and L. Waller, “3D intensity and phase imaging from light field measurements in an LED array microscope,” Optica 2(2), 104–111 (2015). [CrossRef]
19. R. Eckert, “Robust 3D Quantitative Phase Imaging,” PhD Thesis (Department of Electrical Engineering and Computer Science, University of California, Berkeley, 2022).
20. R. Eckert, Z. Phillips, and L. Waller, “Efficient illumination angle self-calibration in Fourier ptychography,” Appl. Opt. 57(19), 5434–5442 (2018). [CrossRef]
21. R. Cao, M. Kellman, D. Ren, R. Eckert, and L. Waller, “Self-calibrated 3D differential phase contrast microscopy with optimized illumination,” Biomed. Opt. Express 13(3), 1671–1684 (2022). [CrossRef]
22. E. Soubies, F. Soulez, M. McCann, T. Pham, L. Donati, T. Debarre, D. Sage, and M. Unser, “Pocket guide to solve inverse problems with GlobalBioIm,” Inverse Problems 35(10), 104006 (2019). [CrossRef]
23. A. Devaney, Mathematical Foundations of Imaging, Tomography and Wavefield Inversion (Cambridge University, 2012), Chap. 6.
24. N. Streibl, “Three-dimensional imaging by a microscope,” J. Opt. Soc. Am. A 2(2), 121–127 (1985). [CrossRef]
25. D. Jin, R. Zhou, Z. Yaqoob, and P. So, “Tomographic phase microscopy: principles and applications in bioimaging,” J. Opt. Soc. Am. B 34(5), B64–B77 (2017). [CrossRef]
26. T. Nguyen, V. Bui, and G. Nehmetallah, “Computational Optical Tomography Using 3D Deep Convolutional Neural Networks (3D-DCNNs),” Opt. Eng. 57(04), 1 (2018). [CrossRef]
27. T. Nguyen, Y. Xue, Y. Li, L. Tian, and G. Nehmetallah, “A deep learning approach for Fourier ptychography microscopy,” Opt. Express 26(20), 26470–26484 (2018). [CrossRef]
28. T. Nguyen, G. Nehmetallah, D. Tran, A. Darudi, and P. Soltani, “Fully automated, high speed, tomographic phase object reconstruction using the transport of intensity equation in transmission and reflection configurations,” Appl. Opt. 54(35), 10443–10453 (2015). [CrossRef]
29. J. Jabbour, B. Malik, C. Olsovsky, R. Cuenca, S. Cheng, J. Jo, Y. Cheng, J. Wright, and K. Maitland, “Optical axial scanning in confocal microscopy using an electrically tunable lens,” Biomed. Opt. Express 5(2), 645–652 (2014). [CrossRef]
30. B. Samanta, F. Pardo, T. Salamon, R. Kopf, and M. Eggleston, “Low-cost electrothermally actuated MEMS mirrors for high-speed linear raster scanning,” Optica 9(2), 251–257 (2022). [CrossRef]
31. S. Shin, K. Kim, J. Yoon, and Y. Park, “Active illumination using a digital micromirror device for quantitative phase imaging,” Opt. Lett. 40(22), 5407–5410 (2015). [CrossRef]
32. A. Matlock and L. Tian, “High-throughput, volumetric quantitative phase imaging with multiplexed intensity diffraction tomography,” Biomed. Opt. Express 10(12), 6432–6448 (2019). [CrossRef]
33. R. Guo, I. Barnea, and N. Shaked, “Limited-angle tomographic phase microscopy utilizing confocal scanning fluorescence microscopy,” Biomed. Opt. Express 12(4), 1869–1881 (2021). [CrossRef]
34. Y. Sung, W. Choi, C. Fang-Yen, K. Badizadegan, R. Dasari, and M. Feld, “Optical diffraction tomography for high resolution live cell imaging,” Opt. Express 17(1), 266–277 (2009). [CrossRef]
35. B. Bazow, T. Phan, V. Lam, C. Raub, and G. Nehmetallah, “Synthetic training of machine learning algorithms for holographic cell imaging,” Proc. SPIE 11731, 1173102 (2021). [CrossRef]
36. B. Bazow, T. Phan, C.B. Raub, and G. Nehmetallah, “Computational multi-wavelength phase synthesis using convolutional neural networks,” Appl. Opt. 61(5), B132–B146 (2022). [CrossRef]
37. Y. Suzuki, S. Watanabe, M. Odaira, T. Okamura, and Y. Kawata, “Numerical simulations of partially coherent illumination for multi-scattering phase objects via a beam propagation method,” Appl. Opt. 58(4), 954–962 (2019). [CrossRef]
38. U. Kamilov, I. Papadopoulos, M. Shoreh, A. Goy, C. Vonesch, M. Unser, and D. Psaltis, “Optical Tomographic Image Reconstruction Based on Beam Propagation and Sparse Regularization,” IEEE Trans. Comput. Imaging 2(1), 59–70 (2016). [CrossRef]
39. X. Jin, L. Li, Z. Chen, L. Zhang, and Y. Xing, “Anisotropic total variation for limited-angle CT reconstruction,” IEEE Nuclear Science Symposuim & Medical Imaging Conference, 2232–2238 (2010).
40. W. Krauze, A. Kuś, and M. Kujawińska, “Limited-angle hybrid optical diffraction tomography system with total-variation-minimization-based reconstruction,” Opt. Eng. 54(5), 054104 (2015). [CrossRef]
41. M. Rubin, O. Stein, N. Turko, Y. Nygate, D. Roitshtain, L. Karako, I. Barnea, R. Giryes, and N. Shaked, “TOP-GAN: Stain-free cancer cell classification using deep learning with a small training set,” Med. Image Anal. 57, 176–185 (2019). [CrossRef]
42. M. Rubin, O. Stein, R. Giryes, D. Roitshtain, and N. T. Shaked, “Quantitative Phase Maps of Live Cells Classified By Transfer Learning and Generative Adversarial Network (GAN),” Computational Optical Sensing and Imaging Conference (COSI), 25–28 (2018).
43. I. Moon and K. Jaferzadeh, “Automated digital holographic image reconstruction with deep convolutional neural networks,” Proc. SPIE 11402, 114020A (2020). [CrossRef]
44. X. Li, H. Qi, S. Jiang, P. Song, G. Zheng, and Y. Zhang, “Quantitative phase imaging via a cGAN network with dual intensity images captured under centrosymmetric illumination,” Opt. Lett. 44(11), 2879–2882 (2019). [CrossRef]
45. A. Lucas, M. Iliadis, R. Molina, and A. Katsaggelos, “Using deep neural networks for inverse problems in imaging: beyond analytical methods,” IEEE Signal Process. Mag. 35(1), 20–36 (2018). [CrossRef]
46. M. McCann, K. Jin, and M. Unser, “Convolutional neural networks for inverse problems in imaging: a review,” IEEE Signal Process. Mag. 34(6), 85–95 (2017). [CrossRef]
47. D. Ryu, D. Ryu, Y. Baek, H. Cho, G. Kim, Y. Kim, Y. Lee, Y. Kim, J. Ye, H. Min, and Y. Park, “DeepRegularizer: rapid resolution enhancement of tomographic imaging using deep learning,” arXiv, arXiv:2009.13777 (2020).
48. K. Zhou and R. Horstmeyer, “Diffraction tomography with a deep image prior,” Opt. Express 28(9), 12872–12896 (2020). [CrossRef]
49. P. Isola, J. Zhu, T. Zhou, and A. Efros, “Image-to-Image Translation with conditional adversarial networks,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 5967–5976 (2017).
50. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial networks,” arXiv, arXiv:1406.2661 (2014).
51. M. Mirza and S. Osindero, “Conditional generative adversarial nets,” arXiv, arXiv:1411.1784 (2014). [CrossRef]
52. D. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,” arXiv, arXiv:1412.6980 (2015). [CrossRef]
53. O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” arXiv, arXiv:1505.04597 (2015).
54. D. Clevert, T. Unterthiner, and S. Hochreiter, “Fast and accurate deep network learning by exponential linear units (ELUs),” arXiv, arXiv:1511.07289 (2016). [CrossRef]
55. S. Ioffe and C. Szegedy, “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift,” arXiv, arXiv:1502.03167 (2015). [CrossRef]