Phase extraction neural network (PhENN) with coherent modulation imaging (CMI) for phase retrieval at low photon counts

Open Access

Abstract

Imaging with low-dose light is important in various fields, especially when minimizing radiation-induced damage to samples is desirable. Due to the quantum nature of photo-electric conversion, the raw image captured at the detector plane is then predominantly a Poisson random process with added Gaussian noise. Under such noisy conditions, highly ill-posed problems such as phase retrieval from raw intensity measurements become prone to strong artifacts in the reconstructions; a situation that deep neural networks (DNNs) have already been shown to help improve. Here, we demonstrate that random phase modulation of the optical field, also known as coherent modulation imaging (CMI), in conjunction with the phase extraction neural network (PhENN) and a Gerchberg-Saxton-Fienup (GSF) approximant, further improves the resilience to noise of the phase-from-intensity imaging problem. We offer design guidelines for implementing the CMI hardware with the proposed computational reconstruction scheme and quantify the reconstruction improvement as a function of photon count.

© 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Retrieving phase information out of intensity images captured by a detector is important in several practical applications where photon-limited illumination of samples is desired [1]. For imaging biological specimens, high-dose light may induce phototoxicity, which can compromise the viability of cells [2], as the cost of a larger signal-to-noise ratio (SNR) in detection. For particle imaging, this concern is often even more severe: for example, imaging integrated circuits requires reduced beam power to avoid destructive side-effects, such as heat-induced deformation [3,4].

There are various ways to retrieve phase information from intensity measurements, for example holography [5–9], ptychography [10–12], or the transport-of-intensity equation (TIE) [13–17]. Coherent diffraction imaging (CDI) [18–20] is popular as a lensless method relying on a single diffraction pattern, without the need for a reference beam.

Under ample illumination, iterative reconstruction algorithms, e.g. Gerchberg-Saxton-Fienup (GSF) [21–23], tend to work quite well for CDI when the objects are isolated and have no strong phase variations. Using a neural network as an inverse operator in CDI also works well but does not confer an appreciable performance advantage over GSF [24]. When the noise becomes significant, using a stronger regularization prior is generally recommended; in that case a machine learning algorithm, such as a deep neural network [25] or a combination of networks [26] trained on a restricted dataset, becomes an effective way to learn this regularization prior. Performance improves further if the raw image is first processed by an approximate inverse, which we call the “Approximant.” The Approximant partially incorporates our knowledge of the physics of the optical system into the subsequent machine learning based inverse algorithm. Alternatively, the Approximant may be thought of as reducing the learning burden, so that the neural network needs to learn less about the physics and more about the prior, even though that distinction is not clearly delineated [27].

Coherent modulation imaging (CMI) introduces a physical constraint by phase modulating the optical field diffracted from the object at some intermediate distance between the object and the camera [28–33]. The modulation may be random or of a well-defined form, e.g. a quadratic phase pattern [34–37]. In this paper, we adopt the random modulation approach. The phase information from the object is encoded in the speckle-like diffraction pattern that is recorded as the raw image at the detector plane. The CMI scheme effectively reduces ill-posedness and eliminates ambiguous solutions, such as the twin image, in the inverse estimate of the phase.

Nevertheless, when the number of illuminating photons is limited, iterative reconstruction algorithms are prone to failing to converge or to producing strong artifacts, even with the random phase modulation of the CMI scheme. The purpose of this paper is to investigate, for the first time to our knowledge, the use of deep neural networks in combination with CMI to obtain several improvements: guaranteed convergence, improved image fidelity through removal of artifacts, and resolution of ambiguities in the phase reconstructions. The general theory of CMI, as implemented here with a neural network-assisted inverse, is developed in Section 2. The CMI design is optimized for use with a neural network-assisted inverse in Section 3. According to this optimization, we constructed the CMI experimental apparatus, trained the algorithms, and conducted extensive qualitative and quantitative tests and comparisons with CDI and the GSF algorithm. The apparatus description and results are in Section 4, and concluding thoughts are in Section 5.

2. Methods

2.1 Coherent modulation imaging scheme in a general sense

The CMI principle is shown in Fig. 1. The phase object is illuminated by a localized wave at normal incidence. A spatial light modulator (SLM) or a fabricated mask with random, binary-phase transitions is placed along the path between the phase object and the measurement plane. The purpose of the mask is to randomly encode the phase information. Fabricated phase masks, in particular, enable sharper phase transitions than SLMs and can be used with X-rays and particle radiation sources such as electrons, but they have some limitations: (1) a binary design leads to sampling artifacts; (2) multi-level masks are generally expensive to fabricate; and (3) the randomness encoded on the mask cannot be altered once the mask has been made. By contrast, multi-bit SLMs, mostly available for visible and near-infrared light, can approximate continuous phase modulations, and the displayed patterns may be altered at will. Therefore, in this work we chose the SLM approach to implement the random phase mask.

Fig. 1. Schematic of the coherent modulation imaging (CMI) setup. The setup involves three physical planes, i.e. an object plane, a modulation plane, and a detector plane.

Let ${\psi }_{\textrm {obj}}(x,y)=\textrm {e}^{i\varphi (x,y)}$ denote the field at the exit plane of the phase object. In the simplest case, when the phase object is well approximated as thin, $\varphi$ directly maps the index of refraction modulation or topography of the object. If the thin film approximation is not satisfied, then more elaborate models [38–41] relate the object structure to $\varphi$, and coupled amplitude modulation may occur in addition to phase modulation. For now, we neglect these effects and consider $\varphi (x,y)$ as a pure phase signal which we wish to reconstruct.

Let $\Phi (x',y')$ represent the (also assumed pure) phase modulation imposed by the CMI mask. Under the paraxial and scalar approximations, the field at the detector plane is expressed as [42]

$${\psi}_{\textrm{det}}\left(x^{\prime\prime},y^{\prime\prime}\right) = \frac{\textrm{e}^{i2\pi z_2/\lambda}}{i\lambda z_2} \iint {\psi}_{\textrm{M}}\left(x',y'\right)\exp\left\{ {i\pi\frac{\left(x^{\prime\prime}-x'\right)^{2}+\left(y^{\prime\prime}-y'\right)^{2}}{\lambda z_2}} \right\}\textrm{d}x'\textrm{d}y',$$
$$\textrm{where}\quad {\psi}_{\textrm{M}}\left(x',y'\right) = {\psi}_{\textrm{m}}\left(x',y'\right)\:\exp\left\{ {\: i\Phi\left(x',y'\right)\:} \right\} \qquad \textrm{and}$$
$${\psi}_{\textrm{m}}\left(x',y'\right) = \frac{\textrm{e}^{i2\pi z_1/\lambda}}{i\lambda z_1} \iint {\psi}_{\textrm{obj}}\left(x,y\right) \exp\left\{ {i\pi\frac{\left(x'-x\right)^{2}+\left(y'-y\right)^{2}}{\lambda z_1}} \right\}\textrm{d}x\textrm{d}y.$$
Here, $z_1, z_2$ are the propagation distances from the object to the mask, and from the mask to the detector, respectively; and ${\psi }_{\textrm {m}}$ and ${\psi }_{\textrm {M}}$ are the fields before and after the randomly modulating mask $\Phi \left (x',y'\right )$, respectively. The mask is assumed to satisfy the thin film approximation. The detector registers the intensity
$${I}_{\textrm{det}}\left(x^{\prime\prime},y^{\prime\prime}\right) = \left|{\psi}_{\textrm{det}}\left(x^{\prime\prime},y^{\prime\prime}\right)\right|^{2}.$$
Since in the frequency domain Eq. (2) becomes a convolution, the effect of the random phase modulation is to spread out the object signal in both the spatial and spatial frequency domains. This means that higher frequencies, which are more at risk of being cut off by the system aperture or scrambled by the severe ill-posedness of the Fresnel propagation operator, get mapped onto lower frequencies where they are in principle easier to recover. In that sense, CMI resembles spread spectrum techniques in communication theory [43] and indeed the CMI method has been alternatively called spread-spectrum phase retrieval (SSPR) [44].
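To make the spectrum-spreading argument concrete, the following minimal sketch simulates the forward model of Eqs. (1)–(4) with the angular spectrum propagator that we also use in the inverse algorithm (Section 2.3). The grid size, pixel pitch, and random object are illustrative assumptions; only the distances $z_1$, $z_2$ and the fair-coin mask follow the settings used later in the paper.

```python
import numpy as np

def angular_spectrum(field, wavelength, pitch, z):
    """Propagate a 2D complex field by distance z with the angular spectrum method."""
    n = field.shape[0]
    fx = np.fft.fftfreq(n, d=pitch)
    FX, FY = np.meshgrid(fx, fx)
    arg = 1.0 - (wavelength * FX) ** 2 - (wavelength * FY) ** 2
    H = np.exp(1j * 2 * np.pi * z / wavelength * np.sqrt(np.maximum(arg, 0.0)))
    return np.fft.ifft2(np.fft.fft2(field) * H)

n, pitch, lam = 256, 8e-6, 633e-9                 # illustrative grid and pitch
z1, z2 = 0.490, 0.0485                            # distances used in Section 3
psi_obj = np.exp(1j * np.random.rand(n, n))       # stand-in pure-phase object
Phi = np.pi * np.random.randint(0, 2, (n, n))     # fair-coin binary phase mask

psi_m = angular_spectrum(psi_obj, lam, pitch, z1)   # object -> mask, Eq. (3)
psi_M = psi_m * np.exp(1j * Phi)                    # apply the mask, Eq. (2)
psi_det = angular_spectrum(psi_M, lam, pitch, z2)   # mask -> detector, Eq. (1)
I_det = np.abs(psi_det) ** 2                        # raw intensity, Eq. (4)
```

Setting `Phi` to zero in this sketch recovers the CDI case discussed next.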

The forward Eqs. (1)–(4) also apply to CDI with the choice $\Phi \left (x',y'\right )=\mathbf {0}$. The difference between CDI and CMI measurements is illustrated in Figs. 2 and 3 in the space and spatial-frequency domains, respectively. The diffraction pattern is discernible in the spatial CDI raw intensity with $10^{3}$ photons/pixel, whereas the CMI raw intensity is completely diffuse. Spreading of the spectrum is also evident in the corresponding power spectral density (PSD) of the CMI raw intensity. These trends are, of course, not discernible in either the highly noisy raw images or their PSDs at $1$ photon/pixel.

Fig. 2. Comparison of spatial-domain measurements with and without the random phase modulation. In these experiments, $z_1=490\ \textrm {mm}$ and $z_2= 48.5\ \textrm {mm}$ for CMI, and the propagation distance is $z_1+z_2 = 538.5\ \textrm {mm}$ for CDI.

Fig. 3. Power spectral densities (PSDs) of the corresponding $500$ intensity measurements from Fig. 2. (Asymmetric periodic artifacts that are clearly visible in the PSDs are due to unwanted fixed-pattern noise (FPN) in our EM-CCD.)

To retrieve the phase $\varphi (x,y)$ from the intensity ${I}_{\textrm {det}}\left (x'',y''\right )$, we construct a two-step inverse algorithm. The first step is to define the forward operator, which essentially is a discretization of the Fresnel Eqs. (1) and (3). This is done in Section 2.2. The forward operator is then used in the computation of the Approximant, and the output is used as input to a neural network that finalizes the computation of the inverse, as described in Section 2.3. The training of the DNN is described in Section 2.4.

2.2 Definition of the forward operator

Our computational window consists of $N\times N$ pixels. Let $\psi _{\textrm {obj},mn},\ m, n=1, \ldots , N$, denote the object field at discrete location $\left (x_m, y_n\right )$. We rasterize ${\psi }_{\textrm {obj}}$ to the $N^{2}\times 1$ vector ${\Psi }_{\textrm {obj}}$ and define the Fresnel kernel $N\times N$ matrices $A$, $B$, $C$, $D$, such that

$$a_{kl} = \frac{\textrm{e}^{i2\pi z_1/\lambda}}{i\lambda z_1}\exp\left\{ i\pi\frac{\left(x_k'-x_l\right)^{2}}{\lambda z_1} \right\}, \qquad b_{kl} = \frac{\textrm{e}^{i2\pi z_1/\lambda}}{i\lambda z_1}\exp\left\{ i\pi\frac{\left(y_k'-y_l\right)^{2}}{\lambda z_1} \right\}, \quad \textrm{and}$$
$$c_{kl} = \frac{\textrm{e}^{i2\pi z_2/\lambda}}{i\lambda z_2}\exp\left\{ i\pi\frac{\left(x_k''-x_l'\right)^{2}}{\lambda z_2} \right\}, \qquad d_{kl} = \frac{\textrm{e}^{i2\pi z_2/\lambda}}{i\lambda z_2}\exp\left\{ i\pi\frac{\left(y_k''-y_l'\right)^{2}}{\lambda z_2} \right\},$$
where $k,l = 1, \ldots , N$. We also define the diagonal mask matrix ${\Psi }_{\textrm {M}}$ such that
$$\Psi_{\mathrm{M},kk} = \exp\left\{ i\Phi_{\mathrm{M},kk} \right\},\ kk=1, \ldots, N^{2},$$
where $\Phi_{\mathrm{M},kk}$ is the phase imparted by the mask on the field at the $kk$-th rasterized pixel. The field at the output plane is then expressed as
$${\Psi}_{\textrm{det}} = \left[ (C\otimes D) {\Psi}_{\textrm{M}} (A\otimes B)\right]{\Psi}_{\textrm{obj}} \equiv H_\mathit{z_1, z_2} {\Psi}_{\textrm{obj}},$$
where $\otimes$ denotes the Kronecker tensor product and $H_{\mathit {z_1, z_2}}$ is the overall linear discrete forward operator. The final intensity measurement $N^{2}\times 1$ vector is
$${I}_{\textrm{det}} = \left|{\Psi}_{\textrm{det}}\right|^{2} = \left|H_\mathit{z_1,z_2}{\Psi}_{\textrm{obj}}\right|^{2},$$
with the modulus-square operating element-wise.
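For small $N$, the discrete operator of Eqs. (5)–(9) can be assembled literally as written; the sketch below does so with NumPy. At realistic window sizes the Kronecker factors would be applied implicitly (e.g. via FFT-based propagation) rather than stored as dense $N^{2}\times N^{2}$ matrices. All numerical values are illustrative.

```python
import numpy as np

def fresnel_kernel(x_out, x_in, z, lam):
    """1D Fresnel kernel matrix, as in Eqs. (5)-(6)."""
    diff = x_out[:, None] - x_in[None, :]
    return (np.exp(1j * 2 * np.pi * z / lam) / (1j * lam * z)
            * np.exp(1j * np.pi * diff ** 2 / (lam * z)))

N, pitch, lam = 32, 8e-6, 633e-9                 # small N so H fits in memory
z1, z2 = 0.490, 0.0485
x = (np.arange(N) - N / 2) * pitch               # same sampling assumed on all planes

A = fresnel_kernel(x, x, z1, lam); B = A.copy()  # object plane -> mask plane
C = fresnel_kernel(x, x, z2, lam); D = C.copy()  # mask plane -> detector plane

Phi = np.pi * np.random.randint(0, 2, N * N)     # rasterized fair-coin mask phase
Psi_M = np.diag(np.exp(1j * Phi))                # Eq. (7)

H = np.kron(C, D) @ Psi_M @ np.kron(A, B)        # Eq. (8): overall operator H
psi_obj = np.exp(1j * np.random.rand(N * N))     # rasterized pure-phase object
I_det = np.abs(H @ psi_obj) ** 2                 # Eq. (9)
```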

2.3 Inverse algorithm

In Fig. 4, the algorithm receives an intensity-only measurement ${I}_{\textrm {det}}\left (x'',y''\right )$ and derives the phase inverse estimate $\hat {\varphi }(x,y)$ using two types of reconstruction method: the GSF algorithm and a deep neural network (DNN). (The hat over $\hat {\varphi }$ distinguishes the phase estimate from the true phase $\varphi$.) Reconstructing the phase with the GSF algorithm alone is similar to the conventional CMI technique [30] and leads to the GSF reconstructions ${\hat {\varphi }}_{\textrm {GSF}}$. The DNN-based algorithm instead first utilizes the intermediate, or Approximant, reconstruction ${\hat {\varphi }}_{\textrm {approx}}$, which forms a training pair with its corresponding ground truth image $\varphi$; training on such pairs produces the final reconstruction ${\hat {\varphi }}_{\textrm {DNN}}$. Performance is then tested by comparing ${\hat {\varphi }}_{\textrm {DNN}}$ to $\varphi$ for “test” pairs excluded from the training set.

Fig. 4. Inverse algorithm for reconstruction using the GSF algorithm and a deep neural network (DNN). The dashed box indicates the Approximant and GSF parts of our overall computational platform. Please refer to Appendix A for additional details.

We denote the phase estimate produced by our DNN algorithm as

$${\hat{\varphi}}_{\textrm{DNN}}= \: \textrm{DNN}\left({\hat{\varphi}}_{\textrm{approx}}\right),$$
where $\textrm {DNN}(\cdot )$ is the input-output relationship of the trained DNN. Training, i.e. specifying the weights ${\mathbf {w}}_{\textrm {DNN}}$, is the nonlinear minimization procedure
$$\mathbf{w}_{\mathrm{DNN}}=\underset{\mathbf{w}}{\operatorname{argmin}} \sum_{n} \zeta\left[{ \varphi_{n},\mathrm{DNN}\left(\hat{\varphi}_{n,\textrm{approx}}\left(\varphi_{n}\right)\right)}\right],$$
where $\zeta$ is the training loss function (TLF); $\varphi _n$ is the true phase of the $n$-th example in the batch selected by a stochastic gradient procedure; and $\hat {\varphi }_{n,\textrm {approx}}$ is the Approximant (DNN input) obtained from the raw intensity ${I}_{\textrm {det}}$ of the $n$-th example in the physical system of Fig. 1 and represented by the forward model of Eqs. (1)–(4). The TLF choice is discussed in more detail in Section 2.4.

The Approximant ${\hat {\varphi }}_{\textrm {approx}}$ is also based on the GSF algorithm with only a single backward step, i.e. half an iteration. This strategy was also followed in [25]. Other Approximant implementations are possible, but we did not investigate them in this paper. We also compute the full TV-denoised version of the GSF algorithm to generate ${\hat {\varphi }}_{\textrm {GSF}}$ for comparison with ${\hat {\varphi }}_{\textrm {DNN}}$.

The combined GSF- and DNN-based inverse algorithms shown in Fig. 4 proceed as follows. First, the intensity measurement ${I}_{\textrm {det}}$ is pre-processed as described in Appendix A. The GSF module is initialized at $t=0$ with a plane wave carrying a truncated Airy pattern. This initial field goes through the forward operator $H_{z_1,z_2}$ to reach the detector plane. The forward operator is realized using the angular spectrum method [45], because this eases sampling requirements given the physical parameters of our system, whereas [30] used the method of [46]. The pre-processed measurement is imposed as the modulus constraint to obtain the Approximant ${\hat {\varphi }}_{\textrm {approx}}$ at $t=0$, as shown in Fig. 4. The TV denoising step on the phase estimate at this stage of the Approximant is optional (see Table 1).

Table 1. Specifications of important parameters for the experiments, pre-processing steps, and training process. Here, the photon count is the effective number of photons or photoelectrons per pixel. ${N}_{\textrm {TV}}$ is the number of iterations of the optional TV denoising applied to $\hat{\varphi}_\text{approx}$.

Continuing on with the computation toward the GSF estimate ${\hat {\varphi }}_{\textrm {GSF}}$, the backpropagation operation is applied to the field estimate as $H^{*}_{z_1,z_2}=H^{-1}_{z_1,z_2}$. A TV denoising process is also applied to the phase estimate [19,47], followed by an update according to the hybrid input-output scheme [23]; this follows the convention in [48]. A subsequent support constraint leads to a detector field estimate, which replaces the previous iterate of the phase estimate, and with $T>0$ the iteration repeats for $t=1,\ldots , T$. GSF reconstructions in Sections 3.1 and 3.2 were obtained with $T = 30$ iterates. A condensed sketch of this loop follows.
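In the sketch below, `forward` and `backward` stand for the angular-spectrum propagators (with the mask applied in between, as in the earlier sketch); the HIO feedback parameter $\beta$, the TV weight, and the exact ordering of the constraint steps are our assumptions, simplified from Fig. 4.

```python
import numpy as np
from skimage.restoration import denoise_tv_chambolle

def gsf(I_det, forward, backward, support, T=30, beta=0.9, tv_weight=0.05):
    amp = np.sqrt(I_det)                   # measured modulus at the detector
    psi = support.astype(complex)          # t = 0: unit field inside the support
                                           # (stand-in for the Airy initialization)
    for t in range(T):
        det = forward(psi)
        det = amp * np.exp(1j * np.angle(det))          # modulus constraint
        est = backward(det)                             # backpropagation, H* = H^{-1}
        phase = denoise_tv_chambolle(np.angle(est), weight=tv_weight)
        est = np.abs(est) * np.exp(1j * phase)          # TV-denoised phase
        psi = np.where(support, est, psi - beta * est)  # HIO-style feedback
    return np.angle(psi)                   # wrapped phase estimate
```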

2.4 Training the deep neural network

For all our DNN results, we used the PhENN (Phase Extraction Neural Network) architecture [24], shown in Fig. 5. It includes encoder-decoder structures and skip connections according to the U-Net [49] principle with residuals [50]. PhENN is known to work for reconstructing phase information out of intensity measurements. For training and testing, images were randomly picked from ImageNet [51] and a segmented IC layout [52]. ImageNet, in particular, is a reasonable choice for both training and testing as it is known to be a highly generic dataset with cross-domain generalization ability, whereas the IC layout is a good example of a highly restricted prior [27]. A comparison of cross-domain generalization ability between the neural networks trained with the two databases can be found in Appendix B.

From each database we randomly drew $5000$ training examples, $450$ validation examples, and $50$ test examples. For training, we used stochastic gradient descent with the Adam optimizer [53] over $100$ epochs, with the initial learning rate set to $0.001$. All computations ran on a desktop with an Intel Core i$9$-$9900$K CPU ($3.60\ \textrm {GHz}$, $16$ $\textrm {MB}$ cache), $64$ $\textrm {GB}$ of RAM, and an NVIDIA GeForce RTX $2080$ GPU with $8$ $\textrm {GB}$ of VRAM.

The TLF was chosen as either the structural similarity index metric (SSIM) [54] or the negative Pearson correlation coefficient (NPCC). The respective TLFs are defined as

$${\zeta}_{\textrm{NPCC}}\big(f,g\big) \equiv -\:\frac{\displaystyle{\sum_{x,y}}\Big(f(x,y)-\big\langle f\big\rangle \Big)\Big(g(x,y)-\big\langle g\big\rangle \Big)}{\sqrt{\displaystyle{\sum_{x,y}}\Big(f(x,y)-\big\langle f\big\rangle \Big)^{2}}\sqrt{\displaystyle{\sum_{x,y}}\Big(g(x,y)-\big\langle g\big\rangle \Big)^{2}}}$$
$${\zeta}_{\textrm{SSIM}}\big(f,g\big) \equiv \frac{\Big(2\big\langle f\big\rangle \big\langle g\big\rangle +c_1\Big)\Big(2\sigma_\mathit{fg}+c_2\Big)}{\Big(\big\langle f\big\rangle ^{2}+\big\langle g\big\rangle ^{2}+c_1\Big)\Big(\mathit{\sigma_f^{2}+\sigma_g^{2}+c_2}\Big)}.$$
For testing, in addition to the TLFs above, various other metrics were used to quantify the performance: peak signal-to-noise ratio (PSNR), normalized root-mean-squared error (NRMSE) [55], and perceptual loss [56,57]. From the testing results we found that, generally, SSIM works better as TLF for the ImageNet [51] dataset and NPCC works better for the IC layout dataset regardless of the number of photons per pixel.
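For reference, Eq. (12) translates directly into code. The NumPy version below is written only to make the normalization explicit; during training the same expression would be written in the differentiable framework used for PhENN.

```python
import numpy as np

def npcc(f, g):
    """Negative Pearson correlation coefficient, Eq. (12)."""
    fz, gz = f - f.mean(), g - g.mean()
    return -np.sum(fz * gz) / np.sqrt(np.sum(fz ** 2) * np.sum(gz ** 2))
```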

3. Simulations and design considerations

In the simulation, the signals were assumed to be Poisson random variables with additive Gaussian noise. The mean rate of the Poisson process was set to either $1$ or $10$ photons per pixel, depending on the noise level of interest. Additionally, in the case of a mean photon arrival level of $1$ per pixel, the Poisson random variables were multiplied by a factor of $50$ to mimic the EM gain of our EM-CCD. The Gaussian noise is uncorrelated with the Poisson statistics and was assumed to have zero mean and a standard deviation of $10$. A sketch of this detection model follows.
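The sketch below implements the detection model just described; how the noiseless intensity is scaled to the target mean photon rate is our assumption.

```python
import numpy as np

rng = np.random.default_rng()

def simulate_measurement(I_det, photons_per_pixel):
    """Poisson shot noise at the target mean rate, EM gain at 1 ph/px, read noise."""
    rate = I_det / I_det.mean() * photons_per_pixel   # scale to target mean rate
    counts = rng.poisson(rate).astype(float)
    if photons_per_pixel == 1:
        counts *= 50.0                                # mimic the EM-CCD gain
    return counts + rng.normal(0.0, 10.0, counts.shape)   # additive read noise
```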

Simulations were conducted under two different scenarios to explain when random phase modulation is favorable for low-photon phase retrieval (Section 3.1) and why deep neural networks are needed for reconstructing phase objects (Section 3.2). Each scenario suggests a criterion that both $z_1$ and $z_2$ should meet. This guided our choice of $(z_1, z_2)$ in the experiments of Section 4.

3.1 When is the random phase modulation favorable?

We swept two design parameters, i.e. $z_1$ and $z_2$, and for each combination computed the values of the perceptual loss [56] and the Pearson correlation coefficient (PCC) between reconstructions and their corresponding ground truth images. These two metrics represent radically different image aspects and, thus, we expect to reduce bias in our conclusions by taking both into consideration. Perceptual loss [56] is a feature loss devised to quantify visual quality using a VGG network pre-trained on the ImageNet database. As in [57], we used PhENN to generate reconstructions and the VGG network to compute the corresponding perceptual losses. Following [57], under photon-limited conditions we extracted the perceptual loss from the ReLU $1$-$2$ layer of the VGG, whereas under ample illumination we used the ReLU $2$-$2$ layer, as recommended by [56]. The PCC, on the other hand, defined as in Eq. (12) but without the minus sign, essentially computes the normalized spatial cross-covariance.

In the CDI scheme, i.e. without the random phase modulation, the performance of the DNN generally decreases as $z_1+z_2$ increases, as expected, since the numerical aperture (NA) of the system then decreases. This is shown in Figs. 6(a) and (d). In CMI, i.e. with the random phase modulation $\Phi \left (x',y'\right )$ incorporated into the system as in Eq. (2), this trend changes: there is an intermediate region where CMI yields better results according to both metrics, as seen in Figs. 6(b) and (e). With larger $z_2$, the improvement becomes smaller as the spatial signal becomes too diffuse; see Figs. 2 and 3. This indicates that overdoing the modulation can make the raw intensity more prone to corruption by the Poisson statistics of the signal and by the readout and dark noise.

Fig. 5. Phase Extraction Neural Network (PhENN) architecture [24].

Fig. 6. The photon arrival level for all panels is $\mathbf {1}$ photon per pixel. For (a-c), the perceptual loss from the VGG network, and for (d-f), the Pearson correlation coefficient (PCC) were used to derive a design criterion for the location of the random phase modulation. The yellow asterisk marks our design for the actual experiments, i.e. $(z_1, z_2) = \left (490\textrm {~mm},\ 48.5\textrm {~mm}\right )$. Black dashed lines are the loci $z_1+z_2=\textrm {constant}$, i.e. $\textrm {NA}=\textrm {constant}$.

To quantify the performance difference with and without the phase mask, the merit ratio ${\gamma }_{\textrm {metric}}$ was defined as

$$\begin{gathered} {\gamma}_{\textrm{metric}} = \frac{\textrm{metric}_\textrm{DNN,CMI}}{\textrm{metric}_\textrm{DNN,CDI}},\\ \textrm{where}\ \ \textrm{metric} = \textrm{(perceptual} \,\textrm{loss)}^{-1},\ \textrm{PCC}, \end{gathered}$$
so that ${\gamma }_{\textrm {perc}}$ and ${\gamma }_{\textrm {pcc}}$ correspond to the ratio based on perceptual loss and PCC, respectively. Based on the results in Figs. 6(c) and (f), it is reasonable to place the random phase mask in the intersection of two regions in these figures where both ${\gamma }_{\textrm {perc}}$ and ${\gamma }_{\textrm {pcc}}$ are relatively high.

3.2 Why is the deep neural network needed under photon-limited conditions?

Figures 7 and 8 show the comparison of GSF and DNN reconstructions according to perceptual loss and PCC, respectively, for various combinations of $\left (z_1,z_2\right )$. The perceptual loss was computed on the ReLU $1$-$2$ layer of the VGG$16$ architecture when the photon arrival level is $1$ photon per pixel [57] and on the ReLU $2$-$2$ layer when it is $10$ photons per pixel [56]. In the same manner as Eq. (14), we take the results from both GSF and DNN reconstructions into consideration by defining the ratio

$$\begin{gathered} {\delta}_{\textrm{metric}} = \frac{\textrm{metric}_\textrm{DNN, CMI}}{\textrm{metric}_\textrm{GSF, CMI}},\\ \textrm{where}\ \ \textrm{metric} = \textrm{(perceptual}\, \textrm{loss)}^{-1},\ \textrm{PCC}, \end{gathered}$$
so ${\delta }_{\textrm {perc}}$ and ${\delta }_{\textrm {pcc}}$ are the merit ratio based on perceptual loss and PCC, respectively.

Fig. 7. All panels are based on $\textbf {perceptual loss}$. To assess the effectiveness of the DNN over GSF under photon-limited conditions, two different photon levels are assumed, i.e. $1$ photon per pixel for (a-c) and $10$ photons per pixel for (d-f). Yellow asterisks and black dashed lines follow the same convention as in Fig. 6.

Fig. 8. Same comparisons as in Fig. 7 but now according to the PCC metric. Photon arrival levels are $1$ photon per pixel for (a-c) and $10$ photons per pixel for (d-f). Yellow asterisks and black dashed lines follow the same convention as in Fig. 6.

The general trend in Figs. 7 and 8 is that the noisier the raw images the more the improvement one may expect in the reconstructions by using DNN over GSF. This is especially true in the region $z_2\;<\;50\textrm {~mm}$.

According to Figs. 6, 7, and 8, the choice ($z_1 = 490\textrm {~mm},\ z_2 = 48.5\textrm {~mm}$) represents a reasonable compromise for performance under noisy conditions according to all combinations of reconstruction algorithms and image quality metrics. This choice is indicated with yellow asterisks in the figures and was used for the experimental apparatus and results in the next section. For a quantitative comparison between the results from this simulation and the experiments, see Appendix C.

4. Experiments and analysis of results

4.1 Optical apparatus

The optical apparatus is schematically depicted in Fig. 9. There are two SLMs: one transmissive and one reflective. The transmissive SLM1 (Holoeye LC2012, pixel pitch: $36\ \mu \textrm {m}$, $1024\times 768$ pixels) displays the phase objects, and the reflective SLM2 (Thorlabs EXULUS-HD2, pixel pitch: $8\ \mu \textrm {m}$, $1920\times 1080$ pixels) implements the random phase pattern in $8$-bit grayscale values. The calibration process for the two SLMs is described in Appendix D. Two linear polarizers, i.e. $\textrm {POL}1$ and $\textrm {POL}2$, properly modulate the optical field for $\textrm {SLM}1$ to display the objects with a maximum phase depth of $\sim 4.6\textrm {~rad}$. $\textrm {HWP}2$ rotates the polarization angle to $45^{\circ }$ with respect to the vertical axis, so that the maximum phase depth of $\textrm {SLM}2$ is $2\pi$.

Fig. 9. Proposed optical apparatus. VND: variable neutral density filter, HWP: half-wave plate, OBJ: objective lens, F: spatial filter, L: lens, A: aperture, POL: polarizer / analyzer, SLM: spatial light modulator, NPBS: non-polarizing beamsplitter, and EM-CCD: electron-multiplying charge coupled device.

We use a coherent light source (Thorlabs HNL210L, power: $20\ \textrm {mW}$, $\lambda = 633\ \textrm {nm}$) followed by a variable neutral density (VND) filter to control the photon flux. The beam is expanded with a collimating lens L$1$ and cropped to $24\textrm {~mm}$ by an aperture A$1$. A second aperture A$2$ is placed right before POL$1$ to limit the spatial extent of the beam to $12\textrm {~mm}$ diameter. Using the design results from Sections 3.1 and 3.2, we propagate the optical field by $z_1=490\textrm {~mm}$ from SLM1 to SLM2 and by $z_2=48.5\textrm {~mm}$ from SLM2 to the EM-CCD (QImaging Rolera EM-C2, pixel pitch: $8\ \mu \textrm {m}$, $1002\times 1004$ pixels). The EM gain settings on the CCD and photon counts for the experiments reported here are in Table 1.

To implement CDI, SLM2 is set to zero phase delay for all pixels and essentially acts as a mirror. For CMI, the random phase modulation $\Phi$ is imposed by SLM2, according to the following procedure: first, a low-resolution version of the random pattern $\Phi$ is designed according to fair coin tosses, independently for each pixel. Unfortunately, pixel-to-pixel crosstalk in SLM2 [44,58] effectively introduces a spurious correlation between the phase values at neighboring pixels when they are displayed on the physical SLM2. In Appendix E we describe a process to effectively decorrelate them so that they represent as accurately as possible the results of the original independent fair coins sampling.

4.2 Experimental Results

For training and testing, images were randomly drawn from the ImageNet and IC layout databases and displayed as true phases $\varphi$ on the transmissive SLM1. Photon arrival levels were controlled to either $1$ or $1000$ photons per pixel, and both the CDI and CMI schemes were used (i.e. with SLM2 unmodulated or modulated, respectively). Results presented in this section are for test examples only.

Figure 10 displays intensity measurements, their Approximants, GSF reconstructions, and DNN reconstructions of two test images, labelled $1$ and $2$, randomly selected from the ImageNet database. With CDI, and regardless of the photon arrival level, some unwanted artifacts, e.g. ripples, are found in the results (Image $1$). The artifacts are more prominent at the low photon count. The results also exhibit ambiguities in phase (visible in Image $2$). With the modulation, many artifacts are removed, as shown in the DNN reconstructions (Image $1$), and the ambiguities in phase are also largely resolved (Image $2$) for both photon counts. Figure 11 similarly shows results from two images randomly selected from the IC layout database and displays similar trends. Thus, in terms of visual appearance, the CMI scheme in conjunction with deep learning generally improves reconstruction quality at low photon counts.

Fig. 10. ImageNet - Results with and without the random phase modulation over different photon levels. Two images were selected from the ImageNet dataset to qualitatively illustrate how the random phase modulation deals with the noise-corrupted measurements due to the low-photon condition. In the experiments, $z_1 = 490\ \textrm {mm},\ z_2 = 48.5\ \textrm {mm}$.

Fig. 11. IC layout - Results with and without the random phase modulation over different photon levels.

Figures 12 and 13 show our two chosen metrics from Section 3, PCC and SSIM, and, in addition, the standard metrics PSNR (peak signal-to-noise ratio) and NRMSE (normalized root-mean-squared error) [55] on the reconstructions produced by the CDI and CMI schemes with deep learning. Generally, the improvement is less noticeable in the IC layout cases. This is not surprising, since these represent a stronger prior that can be learnt by the DNN to regularize effectively even with the more ill-posed CDI; whereas for the less restrictive ImageNet case the priors are weak and the improved conditioning of the CMI forward operator is more helpful.

Fig. 12. ImageNet - Quantitative comparison using various metrics on DNN reconstructions from experimental measurements under different photon levels. Bar graphs denote the mean, with the standard deviation as error bars. PCC: Pearson correlation coefficient, SSIM: structural similarity index, NRMSE: normalized root mean-squared error, and PSNR: peak signal-to-noise ratio.

Fig. 13. IC layout - Quantitative comparison using various metrics on DNN reconstructions from experimental measurements under different photon levels. Bar graphs denote the mean, with the standard deviation as error bars.

We also analyzed the azimuthally averaged power spectral densities (PSDs) of the reconstructions relative to the ground truth. This kind of spectral analysis has proven useful before for understanding how a deep learning inverse algorithm behaves at different spectral bands [26,60]. All cases, with or without CMI, are seen to approach the PSD of the ground truth in Fig. 14(a). A comparison of the ratio of the PSD for CMI to the PSD for CDI, despite some oscillations due to the small values involved, seems to indicate that CMI tends to perform slightly better at low and high frequencies, whereas CDI tends to perform slightly better at intermediate frequencies. These results merit further investigation.

Fig. 14. (a) Power spectral density (PSD) curves, circularly averaged and displayed on a logarithmic scale [59]. (b) Ratiometric comparison of the two PSD curves with and without the random phase modulation under the condition of $1000$ photons per pixel. (c) Same as (b) but under the condition of $1$ photon per pixel.

5. Conclusions and discussion

Phase retrieval from intensity is a highly ill-posed problem and, therefore, CDI reconstruction performance is extremely sensitive to noise. In this paper we found that using CMI together with a DNN inverse results in certain desirable effects. CMI reduces ill-posedness while the DNN can be an effective regularizer, especially with the Approximant to reduce the learning burden. The combination of CMI plus DNN leads generally to improved reconstructions, both qualitatively and quantitatively. Not surprisingly, when the noise is severe, CMI aids the DNN more effectively for data with weak priors, e.g. ImageNet. Conversely, when the priors are strong, as in the IC layout database, the CMI effect is smaller. It would be interesting to investigate, though not within the scope of the present work, how these results apply to more realistic phase objects, e.g. biological cells rather than phase objects implemented on SLMs; and to different bands of the electromagnetic spectrum or to particle imaging.

Appendix A. More details on the inverse algorithm of Fig. 4

Pre-processing in Fig. 4 consists of two steps. First, an affine transformation is applied to the raw intensity measurement ${I}_{\textrm {det}}$. This is because the coordinates of the three planes in Fig. 1 should match each other as closely as possible; otherwise the decoding process fails and severe artifacts are introduced. This optimization step corrects mismatch of the center axes, rotation of the pattern or detector, imperfect alignment of the optical system, and divergence of the beam due to imperfect collimation, which grows as $z = z_1 + z_2$ becomes larger. The affine transform matrix is determined by optimization with the Nelder-Mead method [61,62], with the negative normalized mutual information (NMI) as the loss function [63]. The method seeks an optimal matrix

$$M = T\cdot R\cdot \mathit{Sh}\cdot \mathit{Sc},$$
where $T$ is a translation, $R$ a rotation, $\mathit {Sh}$ a shear, and $\mathit {Sc}$ a scaling matrix.
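A minimal sketch of this registration step, assuming a five-parameter search (translation, rotation, shear, isotropic scale) as in Eq. (16), optimized with SciPy's Nelder-Mead against the negative NMI. The NMI estimator and the warping routine are illustrative choices, not the exact implementation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.ndimage import affine_transform

def nmi(a, b, bins=64):
    """One common NMI estimator: (H(X) + H(Y)) / H(X, Y)."""
    hist, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p = hist / hist.sum()
    def entropy(q):
        q = q[q > 0]
        return -np.sum(q * np.log(q))
    return (entropy(p.sum(axis=1)) + entropy(p.sum(axis=0))) / entropy(p.ravel())

def register(moving, reference):
    """Fit M = T . R . Sh . Sc (Eq. 16) by maximizing NMI with Nelder-Mead."""
    def loss(params):
        ty, tx, theta, shear, scale = params
        c, s = np.cos(theta), np.sin(theta)
        R = np.array([[c, -s], [s, c]])
        Sh = np.array([[1.0, shear], [0.0, 1.0]])
        Sc = scale * np.eye(2)
        warped = affine_transform(moving, R @ Sh @ Sc, offset=(ty, tx))
        return -nmi(warped, reference)
    res = minimize(loss, x0=[0, 0, 0, 0, 1], method="Nelder-Mead")
    return res.x   # optimal (ty, tx, theta, shear, scale)
```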

The second pre-processing step involves tuning parameters that impose a nonlinearity on the raw intensity measurement as $\left ({I}_{\textrm {det}}\right )^{p}$ and control the degree $r$ of a Tukey window $K_r$ applied as a smoothing kernel to the measurement. Changing $p$ away from $1$ may either accentuate or dim the phase contrast in the Approximants: too small a value of $p$ overly obscures the information, while too large a value excessively emphasizes the details. In addition, the Tukey window, for some small value $r>0$, eliminates ripple-like artifacts that otherwise appear at the edges of the Approximants; however, too large a value of $r$ degrades the quality of the Approximants. Therefore, both $p$ and $r$ should be determined interactively, depending on the photon arrival level and the type of dataset of interest. Typically, $p$ is chosen between $0.8$ and $1.2$, and $r$ between $0.1$ and $0.4$.
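A sketch of this second step, under the assumption that the Tukey window $K_r$ is applied multiplicatively, as an apodization of the measurement; the values of $p$ and $r$ below sit within the typical ranges quoted above.

```python
import numpy as np
from scipy.signal.windows import tukey

def preprocess(I_det, p=1.0, r=0.2):
    """Power-law contrast tuning (I_det)^p followed by a 2D Tukey window K_r."""
    n0, n1 = I_det.shape
    K = np.outer(tukey(n0, alpha=r), tukey(n1, alpha=r))  # separable 2D window
    return (I_det ** p) * K
```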

Total-variation (TV) denoising is applied to the phase estimate either to guarantee the convergence of the GSF algorithm or to ease the computational burden of training the DNN architecture [19,47]. For GSF reconstructions, $30$ iterations were sufficient to reach a plateau, and the algorithm did not converge without the TV denoising. In the case of the Approximants, the TV denoising is optional; the number of iterations depends on the photon arrival level of the measurements and the type of dataset, as listed in Table 1. Phase estimates are wrapped after every iteration.
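A minimal sketch of the TV-denoise-and-wrap step, with scikit-image's Chambolle solver standing in for the TV denoiser of [47]; the weight value is an assumption.

```python
import numpy as np
from skimage.restoration import denoise_tv_chambolle

def tv_denoise_phase(phi, n_tv, weight=0.05):
    """Apply N_TV passes of TV denoising, wrapping the phase after each pass."""
    for _ in range(n_tv):
        phi = denoise_tv_chambolle(phi, weight=weight)
        phi = np.angle(np.exp(1j * phi))    # wrap back to (-pi, pi]
    return phi
```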

Appendix B. Cross-domain generalization

The ImageNet and IC layout databases were chosen in this paper because they represent radically different priors. Neural networks trained with the ImageNet database are known to have better generalization ability than those trained with the IC layout database, which is highly restricted and thus acts as a strong regularizer on the neural networks [27]. Therefore, cross-domain generalization performance is better with the ImageNet-trained network than with the IC layout-trained network, as shown in Table 2 and Fig. 15.

Fig. 15. Qualitative results of cross-domain generalization. Columns indicate whether the neural network was trained with the ImageNet or the IC layout database, and rows denote whether the testing inputs are sampled from the ImageNet or the IC layout database. Images are based on experimental intensity measurements under the CMI scheme with a mean of $1$ photon per pixel.

Table 2. Quantitative results of cross-domain generalization in terms of SSIM. Mean and standard deviation are shown. Values are based on experimental intensity measurements under the CMI scheme with a mean of $1$ photon per pixel.

Appendix C. Additional tables of quantitative comparison

Tables 3 and 4 provide additional quantitative results supplementing Fig. 6, and Figs. 7 and 8, respectively. It is noticeable that the perceptual loss metric seems to improve for the experimental over the simulation results. This is somewhat unexpected; slight discrepancies such as this between image evaluations by different numerical metrics are well known and documented in the literature [64–67].

Table 3. Supplement to Fig. 6 - The ImageNet database was used for both simulation and experiment. Values displayed are medians.

Table 4. Supplement to Figs. 7 and 8 - Results from simulation and experiment are quantitatively compared using perceptual loss and PCC. The ImageNet database was used for both simulation and experiment. Values displayed are medians. (*) In the simulation, $10$ photons per pixel is the high photon count; in the experiment, the high photon count was $1000$ photons per pixel.

Appendix D. Spatial light modulator (SLM) calibration

Spatial light modulators assign a different phase delay to each pixel according to a displayed grayscale pattern. In this work, a transmissive SLM1 (Holoeye LC$2012$) and a reflective SLM2 (Thorlabs EXULUS-HD$2$), see Fig. 9, were used to display the phase objects and to apply the random phase modulation on the optical field, respectively. The transmissive SLM was calibrated beforehand to establish the optimum configuration of the two linear polarizers, POL$1$ and POL$2$. The polarization angles were set to modulate the phase up to $\sim 4.6\ \textrm {rad}$, which, however, involves coupled amplitude modulation. The reflective SLM has negligible coupled amplitude modulation, and its calibration curve is close to linear. A Mach-Zehnder interferometer was used for the calibration of both SLMs. Calibration results are presented in Fig. 16.

Fig. 16. SLM calibration curves. (a) Phase modulation of SLM$1$. (b) Coupled amplitude modulation of SLM$1$. (c) Phase modulation of SLM$2$. Horizontal axes of all curves are $8$-bit grayscale values.

Appendix E. Random phase pattern optimization

Unlike fabricated phase masks, patterns displayed on a spatial light modulator may be affected by crosstalk among adjacent pixels [58], especially if they have abrupt changes in phase. If not compensated correctly, this introduces unwanted artifacts into the Approximants because of the mismatch between the encoding and decoding phase patterns. Smoothing the phase profile of the patterns eases the problem. We implemented this as image interpolation with an appropriate kernel size; a sketch follows below. To decide which kernel size enables the SLM to display a phase pattern as close to its original design as possible, we performed a parameter sweep on the size of the interpolation kernel. Figure 17 shows the quantitative comparison using several metrics. The best compensation is achieved for kernel size $=6$, and we found that this maximizes the performance gap between CDI and CMI as well. In Fig. 18, we cross-checked this result in the spatial frequency domain using the PSDs of the ground truth and the reconstructions and found, consistently, kernel size $=6$ to perform best.
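A minimal sketch of the pattern-smoothing sweep, assuming the low-resolution fair-coin pattern is upsampled by bilinear interpolation with kernel (upsampling) size $k$; the exact interpolation scheme of our implementation is not reproduced here.

```python
import numpy as np
from scipy.ndimage import zoom

def smoothed_pattern(n_coarse=128, k=6, rng=np.random.default_rng()):
    """Fair-coin binary pattern, upsampled by factor k with bilinear interpolation."""
    coins = rng.integers(0, 2, (n_coarse, n_coarse)).astype(float)
    return np.pi * zoom(coins, k, order=1)

# Sweep the kernel size as in Fig. 17; k = 6 performed best in our tests.
patterns = {k: smoothed_pattern(k=k) for k in (2, 4, 6, 8)}
```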

Fig. 17. Various quantitative metrics for different sizes of the interpolation kernel. Each box shows the median in the middle, and the $25^\textrm{th}$ and $75^\textrm{th}$ quantiles at the bottom and the top, respectively. (a) NRMSE (normalized root-mean squared error). (b) PCC (Pearson correlation coefficient). (c) SSIM (structural similarity index). (d) PSNR (peak signal-to-noise ratio).

Fig. 18. (a) Original power spectral density (PSD) curves and (b) the ratio of each PSD curve to the reference. The case without any random phase modulation (i.e. CDI) was set as the reference. All cases are results trained with NPCC as the loss function.

Funding

Southern University of Science and Technology (6941806); Intelligence Advanced Research Projects Activity (FA8650-17-C-9113); Korea Foundation for Advanced Studies; National Natural Science Foundation of China (11775105).

Acknowledgments

Thanks to Mo Deng, Subeen Pang, Zhenfei He, and Prof. Jiaming Bai for helpful discussions. I. Kang acknowledges partial support from KFAS (Korea Foundation of Advanced Studies) scholarship, and F. Zhang acknowledges funding from the National Natural Science Foundation of China.

Disclosures

The authors declare no conflicts of interest.

References

1. P. A. Morris, R. S. Aspden, J. E. Bell, R. W. Boyd, and M. J. Padgett, “Imaging with a small number of photons,” Nat. Commun. 6(1), 5913 (2015). [CrossRef]  

2. P. P. Laissue, R. A. Alghamdi, P. Tomancak, E. G. Reynaud, and H. Shroff, “Assessing phototoxicity in live fluorescence imaging,” Nat. Methods 14(7), 657–661 (2017). [CrossRef]  

3. L. Gignac, C. Beslin, J. Gonsalves, F. Stellari, and C.-C. Lin, “High energy bse/se/stem imaging of 8 um thick semiconductor interconnects,” Microsc. Microanal. 20(S3), 8–9 (2014). [CrossRef]  

4. I. Utke, S. Moshkalev, and P. Russell, Nanofabrication using focused ion and electron beams: principles and applications (Oxford University Press, 2012).

5. D. Gabor, “A new microscopic principle,” Nature 161(4098), 777–778 (1948). [CrossRef]  

6. D. J. Brady, K. Choi, D. L. Marks, R. Horisaki, and S. Lim, “Compressive holography,” Opt. Express 17(15), 13040–13049 (2009). [CrossRef]  

7. Y. Rivenson, Y. Zhang, H. Günaydın, D. Teng, and A. Ozcan, “Phase recovery and holographic image reconstruction using deep learning in neural networks,” Light: Sci. Appl. 7(2), 17141 (2018). [CrossRef]  

8. Y. Wu, Y. Rivenson, Y. Zhang, Z. Wei, H. Günaydin, X. Lin, and A. Ozcan, “Extended depth-of-field in holographic imaging using deep-learning-based autofocusing and phase recovery,” Optica 5(6), 704–710 (2018). [CrossRef]  

9. Y. Wu, Y. Luo, G. Chaudhari, Y. Rivenson, A. Calis, K. de Haan, and A. Ozcan, “Bright-field holography: cross-modality deep learning enables snapshot 3d imaging with bright-field contrast using a single hologram,” Light: Sci. Appl. 8(1), 25 (2019). [CrossRef]  

10. M. Holler, A. Díaz, M. Guizar-Sicairos, P. Karvinen, E. Färm, E. Härkönen, M. Ritala, A. Menzel, J. Raabe, and O. Bunk, “X-ray ptychographic computed tomography at 16 nm isotropic 3d resolution,” Sci. Rep. 4(1), 3857 (2015). [CrossRef]  

11. L. Tian, X. Li, K. Ramchandran, and L. Waller, “Multiplexed coded illumination for fourier ptychography with an led array microscope,” Biomed. Opt. Express 5(7), 2376–2389 (2014). [CrossRef]  

12. A. M. Maiden and J. M. Rodenburg, “An improved ptychographical phase retrieval algorithm for diffractive imaging,” Ultramicroscopy 109(10), 1256–1262 (2009). [CrossRef]  

13. N. Streibl, “Phase imaging by the transport equation of intensity,” Opt. Commun. 49(1), 6–10 (1984). [CrossRef]  

14. L. Waller, L. Tian, and G. Barbastathis, “Transport of intensity phase-amplitude imaging with higher order intensity derivatives,” Opt. Express 18(12), 12552–12561 (2010). [CrossRef]  

15. L. Waller, S. S. Kou, C. J. Sheppard, and G. Barbastathis, “Phase from chromatic aberrations,” Opt. Express 18(22), 22817–22825 (2010). [CrossRef]  

16. L. Waller, M. Tsang, S. Ponda, S. Y. Yang, and G. Barbastathis, “Phase and amplitude imaging from noisy images by kalman filtering,” Opt. Express 19(3), 2805–2815 (2011). [CrossRef]  

17. Y. Zhu, A. Shanker, L. Tian, L. Waller, and G. Barbastathis, “Low-noise phase imaging by hybrid uniform and structured illumination transport of intensity equation,” Opt. Express 22(22), 26696–26711 (2014). [CrossRef]  

18. K. A. Nugent, “Coherent methods in the x-ray sciences,” Adv. Phys. 59(1), 1–99 (2010). [CrossRef]  

19. R. Horisaki, R. Takagi, and J. Tanida, “Learning-based imaging through scattering media,” Opt. Express 24(13), 13738–13743 (2016). [CrossRef]  

20. J. Miao, R. L. Sandberg, and C. Song, “Coherent x-ray diffraction imaging,” IEEE J. Sel. Top. Quantum Electron. 18(1), 399–410 (2012). [CrossRef]  

21. R. W. Gerchberg, “A practical algorithm for the determination of phase from image and diffraction plane pictures,” Optik 35, 237–246 (1972).

22. W. Saxton, Computer techniques for image processing in electron microscopy, vol. 10 (Academic Press, 2013).

23. J. R. Fienup, “Phase retrieval algorithms: a comparison,” Appl. Opt. 21(15), 2758–2769 (1982). [CrossRef]  

24. A. Sinha, J. Lee, S. Li, and G. Barbastathis, “Lensless computational imaging through deep learning,” Optica 4(9), 1117–1125 (2017). [CrossRef]  

25. A. Goy, K. Arthur, S. Li, and G. Barbastathis, “Low photon count phase retrieval using deep learning,” Phys. Rev. Lett. 121(24), 243902 (2018). [CrossRef]  

26. M. Deng, S. Li, A. Goy, I. Kang, and G. Barbastathis, “Learning to synthesize: Robust phase retrieval at low photon counts,” Light: Sci. Appl. 9(1), 36 (2020). [CrossRef]  

27. M. Deng, S. Li, I. Kang, N. X. Fang, and G. Barbastathis, “On the interplay between physical and content priors in deep learning for computational imaging,” arXiv preprint arXiv:2004.06355 (2020).

28. F. Zhang, G. Pedrini, and W. Osten, “Phase retrieval of arbitrary complex-valued fields through aperture-plane modulation,” Phys. Rev. A 75(4), 043805 (2007). [CrossRef]  

29. F. Zhang and J. Rodenburg, “Phase retrieval based on wave-front relay and modulation,” Phys. Rev. B 82(12), 121104 (2010). [CrossRef]  

30. F. Zhang, B. Chen, G. R. Morrison, J. Vila-Comamala, M. Guizar-Sicairos, and I. K. Robinson, “Phase retrieval by coherent modulation imaging,” Nat. Commun. 7(1), 13367 (2016). [CrossRef]  

31. X. Dong, X. Pan, C. Liu, and J. Zhu, “Single shot multi-wavelength phase retrieval with coherent modulation imaging,” Opt. Lett. 43(8), 1762–1765 (2018). [CrossRef]  

32. A. Ulvestad, W. Cha, I. Calvo-Almazan, S. Maddali, S. Wild, E. Maxey, M. Duparaz, and S. Hruszkewycz, “Bragg coherent modulation imaging: Strain-and defect-sensitive single views of extended samples,” arXiv preprint arXiv:1808.00115 (2018).

33. W. Tang, J. Yang, W. Yi, Q. Nie, J. Zhu, M. Zhu, Y. Guo, M. Li, X. Li, and W. Wang, “Single-shot coherent power-spectrum imaging of objects hidden by opaque scattering media,” Appl. Opt. 58(4), 1033–1039 (2019). [CrossRef]  

34. J. Wu, H. Zhang, W. Zhang, G. Jin, L. Cao, and G. Barbastathis, “Single-shot lensless imaging with fresnel zone aperture and incoherent illumination,” Light: Sci. Appl. 9(1), 53 (2020). [CrossRef]  

35. G. Williams, H. Quiney, B. Dhal, C. Tran, K. A. Nugent, A. Peele, D. Paterson, and M. De Jonge, “Fresnel coherent diffractive imaging,” Phys. Rev. Lett. 97(2), 025506 (2006). [CrossRef]  

36. G. Williams, H. Quiney, A. Peele, and K. Nugent, “Fresnel coherent diffractive imaging: treatment and analysis of data,” New J. Phys. 12(3), 035020 (2010). [CrossRef]  

37. B. Abbey, K. A. Nugent, G. J. Williams, J. N. Clark, A. G. Peele, M. A. Pfeifer, M. De Jonge, and I. McNulty, “Keyhole coherent diffractive imaging,” Nat. Phys. 4(5), 394–398 (2008). [CrossRef]  

38. B. Chen and J. J. Stamnes, “Validity of diffraction tomography based on the first born and the first rytov approximations,” Appl. Opt. 37(14), 2996–3006 (1998). [CrossRef]  

39. A. Devaney, “Inverse-scattering theory within the rytov approximation,” Opt. Lett. 6(8), 374–376 (1981). [CrossRef]  

40. J. Lim, A. B. Ayoub, E. E. Antoine, and D. Psaltis, “High-fidelity optical diffraction tomography of multiple scattering samples,” Light: Sci. Appl. 8(1), 1–12 (2019). [CrossRef]  

41. T.-A. Pham, E. Soubies, A. Ayoub, J. Lim, D. Psaltis, and M. Unser, “Three-dimensional optical diffraction tomography with lippmann-schwinger model,” IEEE Trans. Comput. Imaging 6, 727–738 (2020). [CrossRef]  

42. J. W. Goodman, Introduction to Fourier optics (Roberts and Company Publishers, 2005).

43. D. Torrieri, Principles of spread-spectrum communication systems, vol. 1 (Springer, 2005).

44. C. Kohler, F. Zhang, and W. Osten, “Characterization of a spatial light modulator and its application in phase retrieval,” Appl. Opt. 48(20), 4003–4008 (2009). [CrossRef]  

45. J. Schmidt, Numerical simulation of optical wave propagation with examples in matlab, (Society of Photo-Optical Instrumentation Engineers (SPIE), 2010).

46. F. Zhang, I. Yamaguchi, and L. Yaroslavsky, “Algorithm for reconstruction of digital holograms with adjustable magnification,” Opt. Lett. 29(14), 1668–1670 (2004). [CrossRef]  

47. L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Phys. D 60(1-4), 259–268 (1992). [CrossRef]  

48. A. Beck and M. Teboulle, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems,” SIAM J. Imaging Sci. 2(1), 183–202 (2009). [CrossRef]  

49. O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention (MICCAI), (Springer, 2015), pp. 234–241.

50. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), (2016), pp. 770–778.

51. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), (2009), pp. 248–255.

52. A. Goy, G. Rughoobur, S. Li, K. Arthur, A. I. Akinwande, and G. Barbastathis, “High-resolution limited-angle phase tomography of dense layered objects using deep neural networks,” Proc. Natl. Acad. Sci. U. S. A. 116(40), 19848–19856 (2019). [CrossRef]  

53. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980 (2014).

54. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. on Image Process. 13(4), 600–612 (2004). [CrossRef]  

55. J. R. Fienup, “Invariant error metrics for image reconstruction,” Appl. Opt. 36(32), 8352–8357 (1997). [CrossRef]  

56. J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time style transfer and super-resolution,” in European Conference on Computer Vision (ECCV), (Springer, 2016), pp. 694–711.

57. M. Deng, A. Goy, S. Li, K. Arthur, and G. Barbastathis, “Probing shallower: perceptual loss trained phase extraction neural network (plt-phenn) for artifact-free reconstruction at low photon budget,” Opt. Express 28(2), 2511–2535 (2020). [CrossRef]  

58. P. Gemayel, B. Colicchio, A. Dieterlen, and P. Ambs, “Cross-talk compensation of a spatial light modulator for iterative phase retrieval applications,” Appl. Opt. 55(4), 802–810 (2016). [CrossRef]  

59. A. van der Schaaf and J. H. van Hateren, “Modelling the power spectra of natural images: statistics and information,” Vision Res. 36(17), 2759–2770 (1996). [CrossRef]  

60. S. Li and G. Barbastathis, “Spectral pre-modulation of training examples enhances the spatial resolution of the phase extraction neural network (phenn),” Opt. Express 26(22), 29340–29352 (2018). [CrossRef]  

61. G. K. Matsopoulos, N. A. Mouravliansky, K. K. Delibasis, and K. S. Nikita, “Automatic retinal image registration scheme using global optimization techniques,” IEEE Trans. Inf. Technol. Biomed. 3(1), 47–60 (1999). [CrossRef]  

62. J. A. Nelder and R. Mead, “A simplex method for function minimization,” Comput. J. 7(4), 308–313 (1965). [CrossRef]  

63. A. Strehl and J. Ghosh, “Cluster ensembles—a knowledge reuse framework for combining multiple partitions,” J. Mach. Learn. Res. 3, 583–617 (2002).

64. G. Barbastathis, A. Ozcan, and G. Situ, “On the use of deep learning for computational imaging,” Optica 6(8), 921–943 (2019). [CrossRef]  

65. R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang, “The unreasonable effectiveness of deep features as a perceptual metric,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., (2018), pp. 586–595.

66. M. Bertero and P. Boccacci, Introduction to inverse problems in imaging (CRC press, 1998).

67. A. M. Eskicioglu and P. S. Fisher, “Image quality measures and their performance,” IEEE Trans. Commun. 43(12), 2959–2965 (1995). [CrossRef]  
