Photon-limited bounds for phase retrieval

Timothy J. Schulz; David J. Brady; Chengyu Wang

doi:10.1364/OE.425796

1. Introduction

Phase retrieval is the recovery of a complex-valued signal from measurements of the squared modulus of its linear transformations [1–3]. In a classic example of this problem, the linear transformations are propagation through free space and direct observation. In other examples, the transformations represent propagation through lenses and phase screens. But in all cases, the detected data are the intensities (or square moduli) of the transformed field, which makes the signal recovery a phase-retrieval problem.

In the discrete formulation, the signal is an $N$-element complex-valued vector $\mathbf {f}$, and the measurements come from $K$ linear transformations of $\mathbf {f}$:

(1)$$\mathbf{g}_k = \mathbf{H}_k \mathbf{f}, ~~~ k = 1, 2, \ldots, K,$$

where each $\mathbf {g}_k$ has $M_k$ elements. These linear transformations model propagation of the signal $\mathbf {f}$ through free-space or through standard optical elements such as lenses or phase screens. Throughout this paper, we use bold font to denote vectors and matrices, and we use standard font to denote their elements.

If we could directly measure both the amplitude and phase of $\mathbf {g}_k$, recovery of $\mathbf {f}$ from the measurements would be relatively straightforward. When we can only observe the intensities $\{ |g_{k,m}|^2 \}$, the signal-recovery problem—known as phase retrieval—is much more challenging.

The total intensity in the transformed signals is:

(2)$$\sum_{k=1}^K \left\| \mathbf{g}_k \right\|^2 = \mathbf{f}^\dagger \left[ \sum_{k=1}^K \mathbf{H}_k^\dagger \mathbf{H}_k \right] \mathbf{f},$$

where the superscript $^\dagger$ denotes the conjugate transpose of a matrix or vector. Therefore, the condition for the system to neither amplify nor attenuate the signal’s total intensity $\left \| \mathbf {f} \right \|^2$ is:

(3)$$\sum_{k=1}^K \mathbf{H}_k^\dagger \mathbf{H}_k = \mathbf{I}_N,$$

where $\mathbf {I}_N$ is an $N \times N$ identity matrix. Systems that amplify or attenuate the signal’s total intensity, but have equal sensitivity to all of the signal’s components, will have

(4)$$\sum_{k=1}^K \mathbf{H}_k^\dagger \mathbf{H}_k = \alpha \mathbf{I}_N,$$

where the non-negative factor $\alpha$ represents the system’s overall gain or attenuation.

Under the the semi-classical detection model [4,5] the measurements $\{ d_{k,m} \}$ are independent Poisson random variables with the expected values $\{ |g_{k,m}|^2 \}$. Because the data are random, an estimate $\widehat {\mathbf {f}}$ of $\mathbf {f}$ from these data is random too, and the estimator’s performance can be quantified by the conditional mean-square error:

(5)$$\begin{aligned}\textrm{MSE}(\mathbf{f}) &= E\left[\left. \left\| \widehat{\mathbf{f}} - \mathbf{f} \right\|^2 ~\right|~ \mathbf{f} \right]\\ &= E\left[ \left. \left\| \widehat{\mathbf{r}} - \mathbf{r} \right\|^2 ~\right|~ \mathbf{f} \right] + E\left[ \left. \left\| \widehat{\mathbf{i}} - \mathbf{i} \right\|^2 ~\right|~ \mathbf{f} \right], \end{aligned}$$

where $\mathbf {r}$ and $\mathbf {i}$ are the real and imaginary parts, respectively, for the complex valued signal $\mathbf {f}$:

(6)$$\mathbf{f} = \mathbf{r} + j \mathbf{i},$$

and $E[\cdot ~|~ \mathbf {f}]$ is the expected value operator, conditional on $\mathbf {f}$. Because the observed intensities for all phase-retrieval systems are the same for $\mathbf {f}$ and $\mathbf {f}e^{j\theta }$, where $\theta$ is any overall phase shift for the signal, the MSE is sometimes modified as:

(7)$$\begin{aligned}\textrm{MSE}_\theta(\mathbf{f}) &= \min_\theta E\left[\left. \left\| \widehat{\mathbf{f}} - \mathbf{f} e^{j\theta} \right\|^2 ~\right|~ \mathbf{f} \right]\\ &= E\left[ \left. \left\| \mathbf{f} \right\|^2 + \left\| \widehat{\mathbf{f}} \right\|^2 - 2 \left| \widehat{\mathbf{f}}^\dagger \mathbf{f} \right| ~\right| ~\mathbf{f} \right], \end{aligned}$$

where the second line results from replacing $\theta$ with the value that sets the derivative with respect to $\theta$ of $\| \widehat {\mathbf {f}} - \mathbf {f}e^{j\theta } \|^2$ to zero.

In the remainder of this paper, we show that an unbiased estimator for any phase-retrieval system that is limited only by quantum (or photon) noise has

(8)$$\textrm{MSE}_{\theta}(\mathbf{f}) \geq~N-1,$$

and we prove the existence of a system that is asymptotically optimal in the sense that its $\textrm {MSE}_\theta (\mathbf {f})$ converges to $N-1$ with increasing $N$. We compare the performance of ptychography-based phase retrieval [6–8] to the bound, and demonstrate that enhancements can make these systems closer to optimal. We also show that an optimal phase-retrieval system performs as well as holography in the presence of only photon noise.

2. Fisher information and the Cramér-Rao lower bound

The Fisher information and its inverse—the Cramér-Rao lower bound (CRLB)—provide a bound on the mean-square error for any unbiased estimator of the unknown parameters [9]. For our situation, $\mathbf {f}$ provides $2N$ unknown parameters, and the trace of the inverse of the $2N \times 2N$ Fisher information matrix provides a lower bound for $\textrm {MSE}(\mathbf {f})$.

For independent Poisson data with the observed intensities $\{ |g_{k,m} |^2 \}$, the Fisher information matrix—relative to the real-valued signal parameters $\mathbf {r}$ and $\mathbf {i}$— is [10]:

(9)$$\mathbf{J} = \left[ \begin{array}{cc} \mathbf{J}_{r,r} &\mathbf{J}_{r,i} \\ \mathbf{J}_{i,r} &\mathbf{J}_{i,i} \end{array} \right],$$

where

(10)$$\mathbf{J}_{\alpha, \beta} = \sum_{k=1}^K \sum_{m=1}^{M_k} \frac{\partial |g_{k,m}|^2}{\partial \mathbf{\alpha}} \frac{\partial |g_{k,m}|^2}{\partial \mathbf{\beta^T}} \frac{1}{|g_{k,m}|^2} .$$

If we represent $g_{k,m}$ as

(11)$$g_{k,m} = \mathbf{f}^T \mathbf{a}_{k,m},$$

where

(12)$$\mathbf{H}_k = \left[ \begin{array}{c} \mathbf{a}_{k,1}^T \\ \mathbf{a}_{k,2}^T \\ \vdots \\ \mathbf{a}_{k,M_k}^T \end{array} \right],$$

and the superscript $^T$ denotes the transpose of a matrix or vector, then the vector derivatives with respect to $\mathbf {r}$ and $\mathbf {i}$ are:

(13)$$\frac{\partial |g_{k,m} |^2}{\partial \mathbf{r}} = \mathbf{a}_{k,m} g_{k,m}^* + \mathbf{a}^*_{k,m} g_{k,m},$$

and

(14)$$\frac{\partial |g_{k,m} |^2}{\partial \mathbf{i}} = j\left( \mathbf{a}_{k,m} g_{k,m}^* - \mathbf{a}^*_{k,m} g_{k,m} \right),$$

where the superscript $^*$ denotes the complex conjugate of each element in a matrix or vector. Then if we assume that the system conserves energy with uniform gain, the elements of the Fisher information are

(15)$$\mathbf{J}_{r,r} = 2 \left( \mathbf{I}_N + \mathbf{C}_R \right),$$

(16)$$\mathbf{J}_{i,i} = 2 \left( \mathbf{I}_N - \mathbf{C}_R \right),$$

and

(17)$$\mathbf{J}_{r,i} = 2 \mathbf{C}_I ,$$

where

(18)$$\begin{aligned}\mathbf{C} &= \sum_{k=1}^K \sum_{m=1}^{M_k} \mathbf{a}_{k,m}^* \mathbf{a}_{k,m}^\dagger \left( \frac{g_{k,m}}{|g_{k,m}|} \right)^2\\ &= \sum_{k=1}^K \mathbf{H}_k^\dagger \mathbf{D}_{g_k} \mathbf{D}_{g_k} \mathbf{H}_k^*\\ &= \mathbf{C}_R + j\mathbf{C}_I. \end{aligned}$$

Here, $\mathbf {D}_{g_k}$ is a diagonal matrix with the elements of $g_{k,m} / |g_{k,m}|$ along its diagonal, and the superscript $^\dagger$ denotes the conjugate transpose of a matrix or vector.

The Fisher information matrix is then

(19)$$\mathbf{J} = 2 \left( \mathbf{I} + \left[ \begin{array}{cc} \mathbf{C}_R &\mathbf{C}_I \\ \mathbf{C}_I &-\mathbf{C}_R \end{array} \right] \right),$$

and its eigenvalues are of the form

(20)$$2 ( 1 \pm \sigma_n ),$$

where $\{ 0 \leq \sigma _n \leq 1 \}$ are the singular values for the performance matrix $\mathbf {C}$. This follows from three observations: (1) the Fisher information is non-negative definite; (2) the eigenvalues and eigenvectors for the block matrix part of the Fisher information satisfy:

(21)$$\left[ \begin{array}{cc} \mathbf{C}_R &\mathbf{C}_I \\ \mathbf{C}_I &-\mathbf{C}_R \end{array} \right] \left[ \begin{array}{c} \mathbf{u}_1 \\ \mathbf{u}_2 \end{array} \right] = \sigma \left[ \begin{array}{c} \mathbf{u}_1 \\ \mathbf{u}_2 \end{array} \right] \Longrightarrow \left[ \begin{array}{cc} \mathbf{C}_R &\mathbf{C}_I \\ \mathbf{C}_I &-\mathbf{C}_R \end{array} \right] \left[ \begin{array}{c} -\mathbf{u}_2 \\ \mathbf{u}_1 \end{array} \right] ={-}\sigma \left[ \begin{array}{c} -\mathbf{u}_2 \\ \mathbf{u}_1 \end{array} \right],$$

and (3) the singular values and vectors for $\mathbf {C}$ relate to the eigenvalues and eigenvectors of the block matrix according to

(22)$$(\mathbf{u}_1 + j\mathbf{u}_2)^\dagger \mathbf{C} (\mathbf{u}_1 - j\mathbf{u}_2) = \sigma.$$

Because the eigenvalues for the inverse Fisher information are

(23)$$\frac{1}{2 ( 1 \pm \sigma_n )},~~n = 1, 2, \ldots, N,$$

the Cramer-Rao Lower Bound (CRLB) for the conditional mean-square error is:

(24)$$\begin{aligned}\textrm{MSE}(\mathbf{f}) &\geq \sum_{n=1}^N \left[ \frac{1}{2 (1 + \sigma_n)} + \frac{1}{2 (1 - \sigma_n)} \right]\\ &= \sum_{n=1}^N \frac{1}{1 - \sigma_n^2}\\ &\geq N, \end{aligned}$$

where the last inequality has equality if and only if the singular values for $\mathbf {C}$ are equal to zero.

Because $\mathbf {f} e^{j\theta }$ provides the same observations as $\mathbf {f}$, we have no way to measure the reference phase for the signal $\mathbf {f}$. As a result, at least one of the singular values for $\mathbf {C}$ will be equal to $1$. But because we are only interested in the relative phase between the elements of $\mathbf {f}$—not the reference phase—we can ignore the error in that dimension of the signal. To do this, we order the singular values as $\sigma _1 \leq \sigma _2 \leq \cdots \leq \sigma _N$ and omit $\sigma _N$—which is equal to 1—from our bound on the phase-adjusted mean-square error:

(25)$$\textrm{MSE}_{\theta}(\mathbf{f}) \geq \sum_{n=1}^{N-1} \frac{1}{1 - \sigma_n^2}.$$

The smallest possible bound a system can attain, then, is $N-1$:

(26)$$\textrm{MSE}_{\theta}(\mathbf{f}) \geq (N-1),$$

and we consider any system with this bound to be optimal.

One might be concerned that the lower bound for the MSE per element is not a function of the expected number of photons per element. This is similar to estimating the amplitude $\alpha$ rather than the intensity $\alpha ^2$ of a Poisson random variable $d$ with mean $\alpha ^2$. In this case, the lower bound for the MSE is $0.25$, and the MSE for the simple estimator $\widehat {\alpha } = \sqrt {d}$ asymptotically approaches $0.25$ with less than $5\%$ difference for values of $\alpha ^2$ greater than 10, and less than $1\%$ difference when $\alpha ^2$ is greater than 100.

It is important to note that this same analysis shows that including a known holographic reference $\rho$ in the measurement model:

(27)$$\mathbf{g}_k = \mathbf{H}_k \mathbf{f} + \rho,$$

will not lower the optimal bound. Furthermore, a phase-quadrature system with

(28)$$\mathbf{H}_k = e^{j k \frac{\pi}{2}}, ~~~k = 1, 2, 3, 4,$$

is optimal in the limit of large $|\rho |$.

Finally, if the system has uniform sensitivity to the signal:

(29)$$\sum_{k=1}^K \mathbf{H}_k^\dagger \mathbf{H}_k = \alpha \mathbf{I}_N,$$

but the gain $\alpha$ is not equal to one, then the MSE bound will be

(30)$$\textrm{MSE}_{\theta}(\mathbf{f}) \geq \frac{1}{\alpha} (N-1).$$

Therefore, we can always analyze a system with uniform sensitivity as though $\alpha = 1$, then modify the lower bound for the actual value for $\alpha$.

This result is different from previous work on lower bounds for phase retrieval in two fundamental ways. First, most analyses have considered lower bounds for the situation of additive Gaussian noise, either in the intensity measurements or in the linear transformations [11,12]. Second, analyses that have considered photon-limited situations (or approximations) have regarded specific measurement systems [13–15]. Our result applies to any system that acquires photon-limited measurements from linear or affine transformations of the field. For some systems, the bound will be higher, but no system can have a smaller lower bound. In the following sections, we demonstrate the important result that a phase-retrieval system exists that can attain this optimal lower bound, and, hence, match the performance of a holographic system without using a reference signal.

2.1 Optimal phase retrieval

For an optimal system—one for which the lower bound for $\textrm {MSE}_\theta (\mathbf {f})$ is $(N-1)$—the rank for the performance matrix $\mathbf {C}$ must be one:

(31)$$\mathbf{C} = \mathbf{u} \mathbf{v}^\dagger,$$

where both $\mathbf {u}$ and $\mathbf {v}$ are unit vectors. According to Eq. (22), these singular vectors are related by

(32)$$\mathbf{v} = \mathbf{u}^* e^{j\phi},$$

for some phase $\phi$. Therefore, if we define the random variable $y$ as:

(33)$$y = \mathbf{x}^\dagger \mathbf{C} \mathbf{x}^*,$$

where $\mathbf {x}$ is a zero-mean complex Gaussian random vector with an identity covariance, then

(34)$$y = x_0^2 e^{j \phi},$$

if and only if $\mathbf {C}$ is a rank one matrix, where $x_0 = \mathbf {x}^\dagger \mathbf {u}$ is a zero-mean complex Gaussian random variable with unit variance. This implies that $\mathbf {C}$ has rank one if and only if $y$ is a zero-mean random variable with $E[ |y|^2 ~|~ \mathbf {f}] = 2$, and we can use this fact to test a matrix $\mathbf {C}$ for optimality. Note that if $\mathbf {C}$ has rank greater than one, then $y$ will have a conditional second moment that is greater than 2. In terms of the system matrices $\{ \mathbf {H}_k \}$, we have

(35)$$\begin{aligned}y &= \mathbf{x}^\dagger \mathbf{C} \mathbf{x}^*\\ &= \mathbf{x}^\dagger \sum_{k=1}^K \mathbf{H}_k^\dagger \mathbf{D}_{g_k} \mathbf{D}_{g_k} \mathbf{H}_k^* \mathbf{x}^*\\ &= \sum_{k=1}^K \mathbf{z}_k^\dagger \mathbf{z}_k^*\\ &= \sum_{k=1}^K \sum_{m=1}^{M_k} \left( z_{k,m}^* \right)^2, \end{aligned}$$

where $\{\mathbf {z}_k = \mathbf {D}^*_{g_k} \mathbf {H}_k \mathbf {x}\}$ are zero-mean complex Gaussian random vectors with

(36)$$E\left[ \mathbf{z}_k \mathbf{z}_l^\dagger \right] = \mathbf{D}^*_{g_k} \mathbf{H}_k \mathbf{H}_l^\dagger \mathbf{D}_{g_l}.$$

Now, suppose we design the system so that

(37)$$\mathbf{H}_k \mathbf{H}_l^\dagger{=} \frac{\delta(k-l)}{\sqrt{K M_k}} \mathbf{I}_{M_k},$$

where $\delta (k)$ is the Kronecker impulse function. In this case,

(38)$$E\left[ \mathbf{z}_k \mathbf{z}_l^\dagger \right] = \frac{\delta(k-l)}{\sqrt{K M_k}} \mathbf{I}_{M_k},$$

and, because $E[z_{k,m}^2] = 0$,

(39)$$\begin{aligned}E\left[ |y|^2 ~|~ \mathbf{f} \right] &= \sum_{l=1}^K \sum_{m=1}^{M_k} E\left[ z_{k,m}^* z_{k,m}^* z_{k,m} z_{k,m} \right]\\ &= \sum_{l=1}^K \sum_{m=1}^{M_k} 2 \left( \frac{1}{\sqrt{K M_k}} \right)^2\\ &= 2. \end{aligned}$$

Therefore, a sufficient condition for an optimal system is

(40)$$\mathbf{H}_k \mathbf{H}_l^\dagger{=} \frac{\delta(k-l)}{\sqrt{K M_k}} \mathbf{I}_{M_k},$$

provided that

(41)$$\sum_{k=1}^K \mathbf{H}_k^\dagger \mathbf{H}_k = \mathbf{I}_N.$$

2.2 Existence of an optimal system

Random modulation of the Fourier plane, as proposed by Candes et al. [2], can achieve quantum-limited signal estimation if the number of measurements is sufficient. Suppose that

(42)$$\mathbf{H}_k = \mathbf{W} \mathbf{D}_{\phi_k}, ~~~ k = 1, 2, \ldots, N,$$

where $\mathbf {W}$ is an $N \times N$ orthonormal Fourier transform:

(43)$$W_{m,n} = e^{{-}j \frac{2\pi}{N} nm}/\sqrt{N},$$

and $\mathbf {D}_{\phi _k}$ is a diagonal matrix with the elements $e^{j\phi _{k,n}}/\sqrt {N}$ along the diagonal, and where $\{ \phi _{k,m} \}$ are independent, identically distributed uniform phases over the interval from $0$ to $2\pi$. In this case,

(44)$$\begin{aligned}\mathbf{H}_k \mathbf{H}_k^\dagger &= \mathbf{W} \mathbf{D}_{\phi_k} \mathbf{D}^\dagger_{\phi_k} \mathbf{W}^\dagger\\ &= \frac{1}{N} \mathbf{I}_N\\ &= \frac{1}{\sqrt{K M_k}} \mathbf{I}_{M_k}, \end{aligned}$$

because $K = M_k = N$. Therefore, this system attains the condition for optimality for $k = l$. For $k \neq l$, we need to show that $\mathbf {H}_k \mathbf {H}_l^\dagger$ is a matrix of zeros, or converges to a matrix of zeros in some sense. To do that, we first note that the phase differences $\theta _m = \phi _{k,m} - \phi _{l,m}$ are independent and uniformly distributed. Then,

(45)$$\left[ \mathbf{H}_k \mathbf{H}_l^\dagger \right]_{n,p} = \frac{1}{N} \sum_{m=1}^N e^{j\theta_m} W_{n,m}W^*_{m,p},$$

and

(46)$$\begin{aligned}E\left\{ \left| \left[ \mathbf{H}_k \mathbf{H}_l^\dagger \right]_{n,p} \right|^2 \right\} &= \frac{1}{N^2} \sum_{m=1}^N \sum_{q=1}^N E\left[ e^{j\left( \theta_m - \theta_q \right)} \right] W_{n,m} W^*_{m,p} W^*_{n,q} W_{q,p}\\ &= \frac{1}{N^2} \sum_{m=1}^N \left| W_{n,m} \right|^2 \left| W_{m,p} \right|^2\\ &= \frac{1}{N^3}, \end{aligned}$$

because $|W_{n,m}| = 1/\sqrt {N}$. Therefore, $\mathbf {H}_k \mathbf {H}_l^\dagger$ is arbitrarily close to zero for a sufficiently large $N$ when $k \neq l$.

We might modify this optimal system by selecting $\mathbf {W}$ as a $2N \times N$ orthonormal Fourier transform, effectively over-sampling the field by a factor of two. In that case, each subsystem acquires $2N$ intensities, and the total number of intensity measurements is $2N^2$. As an example, Fig. 1 shows the MSE bounds for both critical- and over-sampling as a function of the number of elements $N$ for a signal $\mathbf {f}$ whose elements are independent, identically distributed complex Gaussian random variables. Because the number of intensity measurements for an $N$-element signal is $N^2$ for critical sampling and $2N^2$ for over-sampling, this approach to designing an asymptotically optimal system might be impractical from a measurement and processing perspective. Rather than make the number of subsystems $K$ equal to the number of signal elements, a sub-optimal system might set the number of subsystems as a design parameter. As an example, Fig. 2 shows the MSE bounds as a function for the number of frames for a two-dimensional signal with $N = 20^2$ elements. Here, oversampling by a factor of two in each dimension results in four times as many measurements as are obtained for a critically-sampled system. The plot is essentially the same for larger numbers of signal elements $N$.

Fig. 1. The lower bound on the phase-adjusted mean-square error for an asymptotically optimal one-dimensional phase-retrieval system as a function of the number of signal elements.

Download Full Size | PDF

Fig. 2. The lower bound on the phase-adjusted mean-square error for suboptimal phase-retrieval systems as a function of the number of sub-systems (or frames) for a two-dimensional signal with $20 \times 20$ elements.

Download Full Size | PDF

2.3 Performance bound for classic phase retrieval

The classic phase-retrieval problem as described by Gerchberg and Saxton [16] utilizes a two-frame measurement system with

(47)$$\mathbf{H}_1 = \frac{1}{\sqrt{2}} \mathbf{I}_N ~~~\textrm{and}~~~ \mathbf{H}_2 = \frac{1}{\sqrt{2}} \mathbf{W},$$

where $\mathbf {W}$ is an $M \times N$ orthonormal Fourier transform, and the measurements correspond to image- and diffraction-plane intensities. For a two-dimensional signal, the ratio $\sqrt {M/N}$ is the over-sample factor, which must be greater than one for this phase-retrieval system to have a finite lower bound on the MSE. That is, more than one of the performance matrix’s singular values are equal or nearly equal to $1$ when $M = N$. Figure 3 shows the lower bound on $\textrm {MSE}_{\theta }(\mathbf {f})$ as a function of the over-sample factor for a $30 \times 30$ signal $\mathbf {f}$ whose elements are independent, identically distributed complex Gaussian random variables.

Fig. 3. The lower bound on the phase-adjusted mean-square error for Gerchberg-Saxton phase-retrieval as a function of the over-sample factor for a two-dimensional signal with $30 \times 30$ elements.

Download Full Size | PDF

3. Practical systems

The random-phase system described above proves the existence of an optimal phase-retrieval system, but alternative measurement designs based on interferometer arrays or other strategies might also approach the optimal bound. However, as suggested by the Gerchberg-Saxton approach, coding in image and Fourier planes is common and attractive in optical systems. As illustrated in Fig. 4, the field distribution across the back focal plane of a lens is the Fourier transformation of the field distribution on the front focal plane scaled by $\frac {1}{\lambda f}$, where $\lambda$ is the wavelength and $f$ is the focal length [4,17]. Because coherent optical systems preserve the space-bandwidth product, if the field on the front focal plane can be described by $N$ measurements, then $N$ measurements are generally sufficient to characterize the field at the rear focal plane. However, due to the quadratic form of intensity measurement, we have seen that it is in fact beneficial to draw $M=4N$ measurements from the rear focal plane of a two-dimensional system. For a two-dimensional system, we consider the number of elements in each dimension of the signal as $N_x$ so that $N = N_x^2$, and the number of measurements in each dimension as $M_x$, so that $M = M_x^2$. Therefore, if $M_x = 2N_x$, then $M = 4N$. Specifically, we have considered the two-dimensional Fourier kernel with coefficients $h_{mn}=e^{j2\pi \frac {mn}{M_x}}/M_x$ with $M_x \geq 2N_x$. In an optical system, this transformation can be implemented by enforcing a support constraint on the input field, as suggested by the difference in scale between the input and output planes in Fig. 4. The sequence of coded phase modulations prior to the Fourier transform suggested in Eq. (42) can be implemented using the spatial light modulator in the input plane of the figure. Coding might also be implemented in Fourier space by placing a spatial light modulator (SLM) in the pupil plane of an imaging system or by using the 4f optical system illustrated in Fig. 5.

Fig. 4. Fourier transformation using a lens.

Download Full Size | PDF

Fig. 5. Implementation of Fourier filtering with a 4F optical system.

Download Full Size | PDF

These two optical systems provide a particular practical path to implementing the coded measurements described in Section 2.2. To implement a sequence of frames, multiple exposures can be recorded in the output plane while varying the code on the SLM. Of course, one may also expand or duplicate the input field to implement diverse coding in spatially parallel systems. The choice of physical design depends on physical and mathematical complexity. Whereas the optimal measurement strategy of Eq. (42) achieves quantum-limited MSE, the use of $N^2$ measurements to characterize $N$ signal values poses obvious practical difficulties. Ptychography [6,18] and Fourier ptychography [7,8] are phase-retrieval methods that address these issues by using space-frequency transformations of the desired signal. Ptychography simplifies physical sampling by sub-sampling either the object space or its Fourier space. Computational complexity is reduced because iterative phase retrieval updates can act on subregions of the signal, rather than the entire field. Ptychography can be implemented for the systems of Figs. 4 or 5 by sliding the limited-support window across the image- or Fourier-plane. In general, low-pass filtering does not conserve energy, meaning that current Fourier ptychographic systems are far from reaching quantum-limited MSE. However, it is possible to design optical elements that enable single-shot Fourier ptychography [19] and conserve signal power. We will assume that such designs are applied in the following analysis.

We model a ptychography system in discrete form as

(48)$$\mathbf{H}_k = \mathbf{W}_1 \mathbf{S}_k, ~~ k=1,2,\ldots,K,$$

where $\{ \mathbf {S}_k \}$ are $L \times N$ matrices that extract and possibly modify regions of $L = L_x \times L_x$ elements from the signal $\mathbf {f}$, and $\mathbf {W}_1$ is an $M \times L$ orthonormal Fourier transform. For each successive measurement $k$, the $L$-element window is shifted by $\Delta < L_x$ elements. For Fourier ptychography, we model the system as

(49)$$\mathbf{H}_k = \mathbf{W}_1 \mathbf{S}_k \mathbf{W}_0, ~~ k=1,2,\ldots,K,$$

where $\mathbf {W}_0$ is a $P \times N$ orthonormal Fourier transform, $\{ \mathbf {S}_k \}$ are $L \times P$ matrices that extract regions of $L = L_x \times L_x$ elements from the signal’s Fourier transform, and $\mathbf {W}_1$ is an $M \times L$ orthonormal Fourier transform. When $P = N$, we can ignore $\mathbf {W}_0$ for the purpose of error analysis, but when $P > N$, the system over-samples in the Fourier space and the resulting support constraint might improve the MSE bounds. As with standard ptychography, subsequent frames shift the sampling window by $\Delta < L_x$ elements. For uniform sensitivity to the elements of $\mathbf {f}$, we assume that the system design satisfies

(50)$$\sum_{k=1}^K \mathbf{S}_k^\dagger \mathbf{S}_k = \mathbf{I}_N.$$

To accomplish this, we model the Fourier- or image-plane masks $\{\mathbf {S}_k\}$ with circular shifts of a fundamental mask.

The sampling geometry for two-dimensional ptychography is illustrated in Fig. 6. Design of a ptychographic system involves selection of the subaperture ratio $N_x/L_x$, the window overlap $L_x/\Delta$ and the oversampling rate $M_x/L_x$. As an example, we consider $L_x = N_x/4$, $M_x = 2L_x$, and $\Delta = L_x/4$ in each dimension. This results in $256$ subaperture images, each with $M_x^2$ ($128 N^2$) measurements. Note that evaluating the $N_x^2 \times N_x^2$ performance matrix $\mathbf {C}$ for these systems is daunting, but for $N_x = 32$ with a signal whose elements are independent, identically distributed complex Gaussian random variables, the Cramer-Rao lower bound for the MSE is approximately 1.1. We have confirmed this limit for larger fields using simulations. We implemented a Fourier ptychographic system where sampling windows were applied to the signal in the Fourier space, and the selected patches were inverse transformed to generate low-resolution frames. As shown in our code [20], we found that augmenting simple phase retrieval with gradient descent resulted in MSE values that are close the the lower bound. We computed the gradients using the built-in function tf.gradients() in TensorFlow [21]. Our simulations confirmed that the MSE limit of 1.1 times the quantum limit extends to ptychographic systems of larger scale. Simulations also allowed us to analyze the MSE of pytchographic field estimation as a function of aperture size $L_x$, measurement size $M_x$ and apeture displacement $\Delta$. As an example, Fig. 7 shows MSE as a function of $L_x/\Delta$. The ratio $L_x/\Delta$ influences two competing factors: 1) the number of measurements; and 2) the number of photons per measurement. With all other factors held constant, increasing both of these will improve performance. However, increasing one of these factors necessarily decreases the other, and, because of that, we see non-monotonic performance as a function of $L_x/\Delta$.

Fig. 6. The measurement geometry for a discrete ptychography system: $N_x$ is the number of pixels in each dimension of the signal space; $L_x$ is the number of pixels in each dimension of subaperture; and $M_x$ is the number of pixels in each dimension of the sampled frame.

Download Full Size | PDF

Fig. 7. MSE as a function of $L_x/\Delta$. Note that curves for $L_x = 32$ and $L_x=64$ largely overlap when $L_x/\Delta <8$.

Download Full Size | PDF

We showed in Section 2.2 that modulation with phase codes improves MSE over classic Gerchberg-Saxton measurements. As suggested by other authors [22–24], and as illustrated in Fig. 8, the MSE of ptychograhic phase-retrieval systems can also be improved using phase modulation. The figure models modulation of the Fourier plane of each ptychographic measurement by an $L_x\times L_x$ phase mask. The same mask is shifted on to the measurement window in Fourier space for each subsequent measurement. We simulated 3 different masks, including a random phase mask, a binary phase mask defined by modified uniformly redundant arrays (MURA) as described in [4], and a periodic mask whose phase shift at coordinate $(m,n)$ is $2\pi \cos (\frac {\pi }{L_x}m)\cos (\frac {\pi }{L_x}n)$.

Fig. 8. Comparison of the MSE of ptychograhic phase retrieval systems modulated by different phase masks. Because the size of each dimension of MURA must be a prime number $p$ that $p(mod4)=1$, we tested $L_x = 17, 37, 16$ and 113, $M_x = 2L_x-1$ and $\Delta =4,8,16$ and 32 respectively. For other masks, $M_x = 2L_x$ and $\Delta = L_x/4$.

Download Full Size | PDF

4. Summary

Characterization of coherent optical fields under the constraint that only the modulus is observable has been of intense technical interest for the past 75 years. Holographic measurement—which linearizes the problem—has been the dominant method, with phase retrieval being applied in situations when a holographic reference is not feasible. We have shown that with appropriate coding, phase retrieval can achieve the same quantum-limited MSE as holography, which is the best any system can do by measuring the intensities of linear or affine transformations of the field. Whereas actual designs depend on coherence properties of the field and the available sources, detectors and optical elements, we anticipate that the results and strategies described here will impact the long-running competition between holography and phase retrieval.

Disclosures

The authors declare that there are no conflicts of interest related to this article.

Data availability

Data underlying the results presented in this paper are available in Reference [20].

References

1. J. R. Fienup, “Phase retrieval algorithms: a comparison,” Appl. Opt. 21(15), 2758–2769 (1982). [CrossRef]

2. E. J. Candes, X. Li, and M. Soltanolkotabi, “Phase retrieval from coded diffraction patterns,” Appl. Comput. Harmon. Analysis 39(2), 277–299 (2015). [CrossRef]

3. Y. Shechtman, Y. C. Eldar, O. Cohen, H. N. Chapman, J. Miao, and M. Segev, “Phase retrieval with application to optical imaging: A contemporary overview,” IEEE Signal Process. Mag. 32(3), 87–109 (2015). [CrossRef]

4. D. J. Brady, Optical imaging and spectroscopy (John Wiley &Sons, 2009).

5. J. W. Goodman, Statistical Optics (John Wiley &Sons, 1985).

6. J. M. Rodenburg, “Ptychography and related diffractive imaging methods,” Adv. Imaging Electron Phys. 150, 87–184 (2008). [CrossRef]

7. P. C. Konda, L. Loetgering, K. C. Zhou, S. Xu, A. R. Harvey, and R. Horstmeyer, “Fourier ptychography: Current applications and future promises,” Opt. Express 28(7), 9603–9630 (2020). [CrossRef]

8. G. Zheng, R. Horstmeyer, and C. Yang, “Wide-field, high-resolution Fourier ptychographic microscopy,” Nat. Photonics 7(9), 739–745 (2013). [CrossRef]

9. S. M. Kay, Fundamentals of Statistical Signal Processing, Volume 1: Estimation Theory (Prentice Hall, 1993).

10. D. L. Snyder and M. I. Miller, Random point processes in time and space (Springer-Verlag New York, 1991).

11. C. Qian, N. D. Sidiropoulos, K. Huang, L. Huang, and C. S. Hing, “Phase retrieval using feasible point pursuit: Algorithms and the Cramér-Rao bound,” IEEE Trans. Signal Process. 64(20), 5282–5296 (2016). [CrossRef]

12. J. N. Cederquist and C. C. Wackerman, “Phase retrieval error: A lower bound,” J. Opt. Soc. Am. A 4(9), 1788–1792 (1987). [CrossRef]

13. X. Wei, H. P. Urbach, and W. M. J. Coene, “Cramér-Rao lower bound and maximum-likelihood estimation in ptychography with Poisson noise,” Phys. Rev. A 102(4), 043516 (2020). [CrossRef]

14. J. M. Bujjamer and H. E. Grecco, “Unveiling Fourier ptychography via Fisher information,” Opt. Commun. 483, 126642 (2021). [CrossRef]

15. J. N. Cederquist, J. R. Fienup, C. C. Wackerman, S. R. Robinson, and D. Kryskowski, “Wave-front phase estimation from fourier intensity measurements,” J. Opt. Soc. Am. A 6(7), 1020–1026 (1989). [CrossRef]

16. R. Gerchberg and W. Saxton, “A practical algorithm for the determination of phase from image and diffraction plane pictures,” Optik 35, 237–246 (1972).

17. J. W. Goodman, Introduction to Fourier optics (Roberts and Company Publishers, 2005).

18. H. M. L. Faulkner and J. M. Rodenburg, “Movable aperture lensless transmission microscopy: A novel phase retrieval algorithm,” Phys. Rev. Lett. 93(2), 023903 (2004). [CrossRef]

19. X. He, C. Liu, and J. Zhu, “Single-shot aperture-scanning fourier ptychography,” Opt. Express 26(22), 28187–28196 (2018). [CrossRef]

20. C. Wang, D. J. Brady, and T. J. Schulz, “Code used in “photon-limited bounds for phase retrieval”,” (2020). https://github.com/djbradyAtOpticalSciencesArizona/photonLimitedPhaseRetrieval.

21. M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng, “TensorFlow: Large-scale machine learning on heterogeneous systems,” (2015). Software available from tensorflow.org.

22. A. M. Maiden, J. M. Rodenburg, and M. J. Humphry, “Optical ptychography: A practical implementation with useful resolution,” Opt. Lett. 35(15), 2585–2587 (2010). [CrossRef]

23. M. Odstrčil, M. Lebugle, M. Guizar-Sicairos, C. David, and M. Holler, “Towards optimized illumination for high-resolution ptychography,” Opt. Express 27(10), 14981–14997 (2019). [CrossRef]

24. L. Loetgering, X. Liu, A. C. de Beurs, M. Du, G. Kuijper, K. S. Eikema, and S. Witte, “Tailoring spatial entropy in extreme ultraviolet focused beams for multispectral ptychography,” Optica 8(2), 130–138 (2021). [CrossRef]

Photon-limited bounds for phase retrieval

Abstract

1. Introduction

2. Fisher information and the Cramér-Rao lower bound

2.1 Optimal phase retrieval

2.2 Existence of an optimal system

2.3 Performance bound for classic phase retrieval

3. Practical systems

4. Summary

Disclosures

Data availability

References

Data availability

Cited By

Figures (8)

Equations (50)

Optics Express