Optica Publishing Group

Phase-diversity wavefront sensing enhanced by a Fourier-based neural network

Open Access

Abstract

Phase diversity wavefront sensing (PDWS) has been a successful approach to quantifying wavefront aberrations with only a few intensity measurements and nonlinear optimization. However, the inherent non-convexity of the inverse problem may lead to stagnation at a local minimum far from the true solution. Proper initialization of the nonlinear optimization is important to avoid local minima and improve wavefront retrieval accuracy. In this paper, we propose an effective neural network based on low-frequency coefficients in the Fourier domain to determine a better estimate of the unknown aberrations. By virtue of the proposed network, only a small amount of simulation data suffice for a robust training, two orders of magnitude less than those in existing work. Experimental results show that, when compared with some existing methods, our method achieves the highest accuracy while drastically reducing the training time to 1.4 min. The minimum, maximum, and mean values of the root mean square (RMS) residual errors for 800 aberrations are 0.017λ, 0.056λ, and 0.039λ, respectively, and 95% of the RMS residual errors are less than 0.05λ.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Wavefront sensing plays an important role in characterizing and evaluating optical systems. Various wavefront sensing techniques that can quantify optical aberrations have been developed [1–4]. Image-based wavefront sensing, an essential branch in this field, determines the unknown aberrations through numerical optimization from intensity measurements and a parameterized physical model [5]. However, such methods are often susceptible to the phase ambiguity problem. To improve the accuracy, an enhanced image-based method that uses multiple images formed by different imaging channels with known phase diversity has been proposed [6,7]. This phase diversity wavefront sensing (PDWS) method has been successfully applied to a broad range of fields, such as optical metrology [8], biomedical imaging [9], and astronomical imaging [10,11].

Although phase diversity offers additional information to narrow down the solution space, PDWS still suffers from the non-convexity problem. The conventional phase diversity technique defines an error metric that measures the difference between observations and predictions, and then recovers the aberration by minimizing the error metric via nonlinear optimization. Many standard gradient descent optimization algorithms, such as steepest descent (SD), conjugate-gradient (CG), or Broyden-Fletcher-Goldfarb-Shanno (BFGS) [12], have been used to progressively reduce the error metric. As the optimization is non-convex, these gradient descent algorithms are prone to the stagnation problem [13], and the solution can easily get stuck in a local minimum far from the true solution [14]. Many efforts have been made to address this problem and improve the system performance [15–18]. However, to the best of our knowledge, none of the preceding works have shown statistical experimental results at scale to demonstrate robustness in escaping local minima.

The choice of the initial estimate greatly impacts the performance of such gradient-based algorithms. To avoid local minima, it is important to generate informative initial points. One straightforward way is to use a series of random starting guesses in an attempt to randomly fall within the solution space [14]. A bootstrapping strategy for generating starting points optimizes the low-order Zernike coefficients first and then optimizes the higher-order coefficients upon the previous results, demonstrating an increased success rate [19]. Previously, we demonstrated that multiple starting points produced by the k-means clustering algorithm can improve the robustness statistically [20]. However, this strategy significantly increases the processing time and complexity. Recently, artificial neural networks have been introduced to image-based wavefront sensing [21–25]. In recent works [26,27], a convolutional neural network (CNN) based on a modified Google Inception v3 architecture has been trained to estimate optical aberration coefficients. A Monte Carlo analysis shows that the trained CNN provides good initial points and is more effective than multiple random starting guesses. Nevertheless, the CNN training process is still very time-consuming and computationally costly. Moreover, gathering tens of thousands of highly accurate ground-truth data for CNN training is extremely hard or even impossible in most wavefront sensing applications. In contrast, creating training data through simulation offers a cheaper alternative, but the generated data may differ from real data in many factors, such as noise and optical misalignment. There is still a lack of sufficient statistical experimental evidence that a network trained on simulation data can preserve high accuracy when dealing with real-world data.

To overcome the data deficiency in deep-learning-based PDWS methods, we propose a new approach that effectively generates starting points and improves the accuracy and robustness of such methods in general. We employ an enhanced PDWS method with a Fourier-based neural network and a limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) solver. The network is a fully connected multi-layer perceptron (MLP), which establishes the nonlinear mapping between the low-frequency Fourier coefficients of the PSF images and their associated aberration coefficients. The aberration coefficients predicted by the network are then used as the starting point for the L-BFGS solver. Compared with training a CNN to predict starting points, the Fourier-based neural network has several significant advantages, e.g., far fewer training samples, much less training time, and lower power consumption. The key finding in our method is that the differences between the low-frequency Fourier coefficients of the simulated and real PSF images are negligible. As a result, when a network trained on such simulation data is applied to real-world data, the output is still close enough to the true solution. Moreover, only a small amount of simulation data is sufficient for training in our proposed method, greatly relieving the difficulty of gathering tens of thousands of real-world training samples as in typical deep learning training. Statistical experimental results show that the proposed network outperforms existing methods both in accuracy and processing speed, demonstrating an easier yet more practical way to solve the PDWS problem.

2. Theory

2.1 Phase diversity wavefront sensing

An intensity image capture process can be modeled by a convolution operation

$$i_1 = o * h_1,$$
where $o$ is the object, $h_1$ is the intensity point spread function (PSF) for the imaging channel, and $i_1$ is the captured intensity image. In our experimental setup (see Sec. 3.1), we capture the PSF images, so the object approximates a point source of finite size, and thus $i_1 = h_1$. Note that PDWS can also be applied with extended sources as the captured images [24]. The generalized pupil function of an imaging system can be expressed as
$$p = \mathcal{P} \exp{\left[j\phi\right]},$$
where $\mathcal {P}$ and $\phi$ are the binary pupil amplitude and the unknown wavefront phase of the imaging channel, respectively. Under the Fraunhofer approximation, the Fourier transform of the generalized pupil function gives the impulse response function, and the intensity PSF is the squared magnitude of the impulse response function,
$$h_1 = \left| \mathcal{F} \left\{ \mathcal{P} \exp{\left[j\phi\right]} \right\} \right| ^{2},$$
where the $\mathcal {F}$ operator denotes the 2D Fourier transform. Moreover, the phase function usually is described by expanding it into a finite set of Zernike polynomials [28],
$$\phi = \sum_{k=1}^{K} a_k Z_k,$$
where $Z_k$ is the Zernike basis for the expansion, and $a_k$ is the corresponding coefficient. Thus, the captured intensity image is a function of the unknown Zernike coefficients. To solve the phase ambiguity problem and make the system more robust, an extra known phase diversity is added to the imaging channel, and the intensity PSF is given by
$$h_2 = \left| \mathcal{F} \left\{ \mathcal{P} \exp{\left[ j \left(\phi+\theta\right)\right]} \right\} \right| ^{2},$$
where $\theta$ represents the phase diversity. Similarly, the captured intensity image corresponding to $h_2$ is given by
$$i_2 = o * h_2.$$
An objective function that yields the maximum-likelihood estimate of the Zernike coefficients can be derived as [6]
$$\mathcal{L} \left(\left\{a_k\right\}\right) = \sum\left({\left| I_1 \right|^{2} + \left| I_2 \right|^{2}}\right)- \sum\left(\frac{\left|I_1H_1^{*}+I_2H_2^{*}\right|^{2}}{\left| H_1 \right|^{2} + \left| H_2 \right|^{2}}\right),$$
where $I$ and $H$ denote the Fourier transform of $i$ and $h$, respectively, and $\sum$ operator denotes summation in the Fourier domain.
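
For illustration, this metric can be evaluated numerically. The Python sketch below uses a hypothetical toy setup (the grid size, circular pupil, and defocus-like phases are illustrative stand-ins, not the authors' actual settings) to simulate two point-source observations and evaluate the two-diversity metric:

```python
import numpy as np

# Hypothetical toy setup: circular pupil on a 64x64 grid, a defocus-like
# "unknown" phase phi, and a known diversity theta (all in radians).
N = 64
y, x = np.mgrid[-1:1:N * 1j, -1:1:N * 1j]
r2 = x**2 + y**2
P = (r2 <= 1.0).astype(float)          # binary pupil amplitude
phi = 0.7 * (2 * r2 - 1) * P           # "unknown" aberration
theta = 2.0 * (2 * r2 - 1) * P         # known phase diversity

def psf(phase):
    """Intensity PSF: squared magnitude of the FT of the generalized pupil."""
    return np.abs(np.fft.fft2(P * np.exp(1j * phase)))**2

# Noiseless point-source observations, so i_k = h_k; I_k are their FTs.
I1, I2 = np.fft.fft2(psf(phi)), np.fft.fft2(psf(phi + theta))

def pd_metric(phase_est):
    """Two-diversity error metric evaluated at a candidate phase."""
    H1, H2 = np.fft.fft2(psf(phase_est)), np.fft.fft2(psf(phase_est + theta))
    num = np.abs(I1 * np.conj(H1) + I2 * np.conj(H2))**2
    den = np.abs(H1)**2 + np.abs(H2)**2 + 1e-12   # guard against division by zero
    return np.sum(np.abs(I1)**2 + np.abs(I2)**2 - num / den)
```

At the true phase the model explains both images exactly, so the residual is numerically negligible, while a wrong estimate leaves a positive residual.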

Besides the above classic solution from images of two phase diversities, more phase diversities can also be employed [6] to achieve a higher retrieval accuracy, e.g., in the James Webb Space Telescope [29]. Another improved technique uses three phase diversities from two symmetrical defocused images and a focused image [30]. In this case, the objective function can be derived as

$$\mathcal{L} \left(\left\{a_k\right\}\right) = \sum\left({\left| I_1 \right|^{2} + \left| I_2 \right|^{2}+ \left| I_3 \right|^{2}}\right)- \sum\left(\frac{\left|I_1H_1^{*}+I_2H_2^{*}+I_3H_3^{*}\right|^{2}}{\left| H_1 \right|^{2} + \left| H_2 \right|^{2}+ \left| H_3 \right|^{2}}\right),$$
where $I_3$ and $H_3$ relate to the images of the third phase diversity.

A nonlinear solver works to minimize the objective function, and gives an estimate of the aberration coefficients. When the objective function is non-convex, nonlinear optimization algorithms can easily get trapped in a local minimum.

2.2 Neural network based on low-frequency Fourier coefficients

The 2D Fourier transform pair of a two-dimensional function $f\left (x,y\right )$ in the spatial domain is [31]

$$F\left(u,v\right) = \iint f\left(x,y\right)\exp{\left[{-}j2\pi\left(ux+vy\right)\right]}dxdy,$$
and
$$f\left(x,y\right) = \iint F\left(u,v\right)\exp{\left[j2\pi\left(ux+vy\right)\right]}dudv,$$
where $(x, y)$ and $(u, v)$ are spatial and frequency coordinates respectively. $F\left (u,v\right )$ and $f\left (x,y\right )$ are referred to as the Fourier transform pair. Obviously, $F\left (u, v\right )$ is a complex function. Furthermore, for a digital image $f\left (x, y\right )$, $F\left (u, v\right )$ is conjugate symmetric since $f\left (x, y\right )$ is a real function.

The squared magnitude of $F\left (u,v\right )$ is the power spectrum $P\left (u,v\right )$, which indicates how much each frequency contributes to $f\left (x,y\right )$. An illustrative example of a PSF image and its power spectrum is shown in Fig. 1(a) and Fig. 1(b), respectively. The power at the origin, $P\left (0,0\right )$, is the so-called direct component and reflects the average intensity of $f\left (x,y\right )$. The values of $P\left (u, v\right )$ in a region around the origin represent the low-frequency components that provide the basic structure of $f\left (x,y\right )$. For example, the values of $P\left (u, v\right )$ inside a circle centered at the origin, as Fig. 1(b) shows, represent the low-frequency components restricted by a radial cut-off frequency determined by the circle radius. In contrast, the values of $P\left (u, v\right )$ farther from the origin represent the high-frequency components that provide fine details.
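
To make the low-frequency region concrete, the short sketch below computes a power spectrum and masks the components inside a normalized cut-off radius of 0.05 (the Gaussian spot is a hypothetical stand-in for a PSF image). For such a smooth image, the small disc around the origin carries most of the power:

```python
import numpy as np

N = 256
yy, xx = np.mgrid[:N, :N]
# Hypothetical PSF-like image: a centered Gaussian spot.
img = np.exp(-((xx - N / 2)**2 + (yy - N / 2)**2) / (2 * 8.0**2))

F = np.fft.fftshift(np.fft.fft2(img))        # put zero frequency at the center
power = np.abs(F)**2                          # power spectrum P(u, v)

u = np.arange(N) - N // 2
r = np.hypot(*np.meshgrid(u, u)) / (N // 2)   # normalized radial frequency r/r0
low_mask = r <= 0.05                          # low-frequency disc

frac = power[low_mask].sum() / power.sum()    # power fraction inside the disc
```

Although the disc covers well under 1% of the frequency samples, it captures the bulk of the power for a smooth PSF-like image.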


Fig. 1. Motivation of the low-frequency Fourier coefficients representation. (a) Normalized PSF intensity. (b) Power spectrum of (a). Inside the white circle are the low-frequency components within a cut-off frequency. (c) Absolute relative error between the Fourier coefficients of simulated and recorded images. (d) RMS training error (solid red line, circle symbol) and training time (dashed blue line, triangle symbol).


The back-propagation artificial neural network [32,33], one of the most well-known types of multi-layer perceptron neural networks, is used to learn the mapping relationship between the Fourier coefficients of PSF images and their corresponding aberration coefficients. This network is chosen for its unique features, such as its simple structure, parallel processing ability, distributed memory, and high fault tolerance.

In this work, we propose to use only low-frequency Fourier coefficients to train the network, mainly for two reasons. First and foremost, we find that low-frequency Fourier coefficients show much smaller differences between simulated and real images than high-frequency Fourier terms do. Figure 1(c) shows the absolute relative errors (i.e., $\mid \left (a-b\right )/a\mid$ for the numbers $a$ and $b$) between the Fourier coefficients of simulated and real PSF images. We load Zernike coefficients onto a spatial light modulator (SLM) to capture real PSF images. Then, we generate simulated PSF images based on the real system settings and the Zernike coefficients. We calculate the errors for $300$ pairs of simulated and real PSF images, and average the results. The cut-off frequency is normalized as $r/r_0$, where $r_0$ is the maximum radial frequency. As we increase the radius $r$, the discrepancy between the Fourier coefficients of the simulated and real images grows, indicating that low-frequency coefficients are better in terms of image consistency. This is crucial, as we intend to apply a network trained on cheap simulation data rather than expensive real-world data. Second, we also find that low-frequency Fourier coefficients are simple yet sufficient to capture the mapping. We use Fourier coefficients restricted by different circle radii to train the back-propagation neural network. We repeat the training $10$ times for each circle radius and average the training errors and training times. Figure 1(d) shows the RMS training error (solid red line, circle symbol) and the training time (dashed blue line, triangle symbol). The result demonstrates that the training time increases dramatically as the circle radius increases, but there is little qualitative change in the training error when the normalized radius is greater than $0.05$. Therefore, training with low-frequency coefficients is a viable way to improve the efficiency.

The structure of our Fourier-based neural network is shown in Fig. 2. The network consists of an input layer, a hidden layer, and an output layer. Each layer contains a number of neurons, each of which is connected to other neurons in the adjacent layers. The low-frequency Fourier coefficients of PSF images make up an input vector of the network, and corresponding aberration coefficients make up an output vector. Once the network is well trained, an estimate of aberration coefficients can be obtained directly from the network.
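
As a minimal sketch of this architecture, the forward pass below maps a feature vector to aberration coefficients through one hidden layer. The input and output sizes, tanh activation, and random weights are illustrative assumptions only; the hidden-layer width of 25 matches the trained network described in Sec. 3.2.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 40 low-frequency Fourier features in, 12 Zernike terms out;
# 25 hidden neurons as in the trained network described later.
n_in, n_hidden, n_out = 40, 25, 12
W1 = rng.normal(0.0, 0.1, (n_hidden, n_in)); b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 0.1, (n_out, n_hidden)); b2 = np.zeros(n_out)

def predict(x):
    """Forward pass: hidden layer with tanh activation, then a linear output."""
    h = np.tanh(W1 @ x + b1)
    return W2 @ h + b2

x = rng.normal(size=n_in)   # stand-in for a low-frequency Fourier feature vector
a_hat = predict(x)          # estimate of the Zernike coefficient vector
```

In the real pipeline the weights come from back-propagation training (Levenberg-Marquardt in Sec. 3.2), and the prediction `a_hat` seeds the L-BFGS solver.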


Fig. 2. The Fourier-based neural network structure.


Note that the Fourier transform of a PSF image equals the optical transfer function (OTF), so the network is essentially an OTF-based network. Through the implicit OTFs, the network connects PSFs and Zernike coefficients directly, which underpins the mapping between the low-frequency Fourier coefficients and the Zernike coefficients that describe the wavefront aberrations.

2.3 PDWS enhanced by the Fourier-based neural network

The proposed method involves two stages, network training with simulation data, and then aberration inference. Figure 3(a) shows the schematic diagram of the network training process, which can be roughly divided into four steps.

  • 1. Determine the simulation settings, such as the imaging target, detector, illumination, pupil, focal length, phase diversities, noise, and aberrations. These settings should be consistent with the real optical system as much as possible.
  • 2. Generate PSF images by simulation. Based on the simulation settings and the imaging theory in Fourier optics, a group of PSF images with different phase diversities is simulated for each aberration.
  • 3. Build a training dataset. The low-frequency Fourier coefficients of each group of images make up an input vector, and the corresponding aberration coefficients make up an output vector.
  • 4. Train a back-propagation neural network. The number of neurons in the input and output layers matches the input and output vectors, respectively. The hidden layers are properly designed. Then the network is trained with the simulated training dataset.


Fig. 3. Schematic diagram of the working procedure of the proposed PDWS method. (a) Network training. (b) Aberration retrieval.


Figure 3(b) depicts the operation of wavefront retrieval schematically. First, the low-frequency Fourier coefficients are extracted from the collected PSF images to make up an input vector. Then the input vector is fed into the trained network, which immediately gives a prediction of the aberration coefficients. The predicted aberration coefficients are used as a starting point for the nonlinear solver. Finally, an optimized solution is achieved.
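
The role of the network prediction as a starting point can be illustrated with a toy problem. The cosine-perturbed quadratic below is a hypothetical stand-in for the PD error metric, chosen only because it has many local minima; `a_net` mimics a network prediction that lands near the truth:

```python
import numpy as np
from scipy.optimize import minimize

# Toy non-convex objective: a quadratic bowl around a_true plus an
# oscillation that creates many local minima (stand-in for the PD metric).
a_true = np.array([0.3, -0.2, 0.15])

def objective(a):
    d = a - a_true
    return np.sum(d**2) + 0.3 * np.sum(1 - np.cos(8 * np.pi * d))

a_net = a_true + 0.02   # stand-in for the trained network's prediction
a_zero = np.zeros(3)    # classic zero starting point

sol_net = minimize(objective, a_net, method="L-BFGS-B")
sol_zero = minimize(objective, a_zero, method="L-BFGS-B")
```

Started near the truth, L-BFGS descends into the global basin, whereas the zero start typically stalls in a nearby local minimum, mirroring the stagnation problem discussed in Sec. 1.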

3. Experiments

3.1 Experimental setup

Here we briefly describe the laboratory experimental setup, which has been detailed in Ref. [20]. A reflective phase-only spatial light modulator (SLM) that employs Liquid Crystal on Silicon (LCoS) technology is utilized to simulate aberrations as well as phase diversities in the experiment. Figure 4(a) schematically illustrates the optical system setup. The wavefront of the laser beam from a green laser module is modulated by the SLM through phase modulation. Then the modulated beam is focused by an achromatic doublet and recorded by an image sensor. The system also includes other optical devices, such as a neutral density filter, a polarizer to adjust the light attenuation and polarization, an objective lens, a pinhole to filter the wavefront, and a ring-actuated iris diaphragm to adjust the beam aperture. The system is carefully designed so that the focused light intensity can be adequately sampled by the detector [34].


Fig. 4. Experimental setup and examples of recorded PSF images. (a) Schematic diagram of the optical system setup. (c) is the wrapped aberration map to be retrieved, (b) and (d) are the wrapped aberration maps with $-1.0\lambda$ and $1.0\lambda$ defocused PV values with respect to (c). (e), (f), and (g) are the recorded PSF images corresponding to (b), (c), and (d).


The SLM-simulated aberration is a linear combination of Zernike polynomials up to the 15th term, with the 1st, 2nd, and 3rd terms set to zero. A set of random aberration coefficients within a reasonable range is generated by the MATLAB random number generator. The coefficients are selected so that the RMS value and the peak-to-valley (PV) value of the combined aberration stay within $[0.2\lambda, 0.3\lambda ]$ and $[1.0\lambda, 1.5\lambda ]$, respectively. Then the aberration is added to the light path, and the camera records the corresponding PSF images. To achieve a higher phase retrieval accuracy, two phase diversities, namely a defocus diversity of $1.0\lambda$ PV value and a defocus diversity of $-1.0\lambda$ PV value, are separately added to the aberration to generate the second and third observations [30]. Thus, there are three recorded images for each aberration. Examples of the aberration maps (focused and defocused, truncated to $801\times 801$ pixels) and corresponding recorded PSF images (truncated and enlarged to $512\times 512$ pixels) are shown in Figs. 4(b)-(g). A total of $800$ different sets of Zernike coefficients are sent to the SLM to produce the ground-truth aberrations, and $2400$ images are recorded. To account for the optical system's original aberration, a blank aberration map is used to retrieve the original aberration, which is then subtracted as background in aberration retrieval.

3.2 Image simulation and network training

The simulation settings are consistent with the experimental optical system, with a 6 mm aperture size, 100 mm focal length, 532 nm wavelength, 2.4 $\mu m$ camera pixel pitch, and triple phase diversities $\left \{ -\lambda, 0, \lambda \right \}$. A total of $1000$ different sets of random aberration coefficients are generated and selected, following the same rules as in the experiment. PSF images are simulated according to the settings and the aberrations. To simulate the noise condition, an additive Gaussian noise commensurate with the noise of the experimental measurements is introduced [6]. For each simulated image, we calculate the Fourier coefficients using the FFT algorithm, and normalize the Fourier coefficients (i.e., the sum of the power spectrum equals one) to remove the effect of light intensity fluctuation. The low-frequency Fourier coefficients of each group of images make up an input vector, while the corresponding aberration coefficients make up an output vector. Here, the low-frequency zone is specified as an area inside a circle centered at the origin with a normalized radius of 0.05. It is worth noting that, since the direct component does not contribute to the PSF distribution, we exclude the zero-frequency coefficients. Besides, taking the conjugate symmetric property into account, only half of the low-frequency Fourier coefficients are utilized. Moreover, because a neural network generally cannot handle complex values, the real and imaginary parts of the Fourier coefficients are separated to be individual elements in the input vector. All in all, a training dataset is constructed.
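
The input-vector construction described above can be sketched as follows (the image size, group size, and random stand-in images are hypothetical; the real inputs are the simulated PSF groups):

```python
import numpy as np

def input_vector(images, r_cut=0.05):
    """Build the network input from a group of PSF images, following the text:
    normalize to unit total power, drop the zero-frequency term, keep one
    half-plane (conjugate symmetry), and split real/imaginary parts."""
    feats = []
    for img in images:
        F = np.fft.fftshift(np.fft.fft2(img))
        F = F / np.sqrt(np.sum(np.abs(F)**2))     # unit total power
        n = img.shape[0]
        u = np.arange(n) - n // 2
        uu, vv = np.meshgrid(u, u)
        r = np.hypot(uu, vv) / (n // 2)           # normalized radial frequency
        half = (vv > 0) | ((vv == 0) & (uu > 0))  # one half-plane, DC excluded
        keep = (r <= r_cut) & half
        c = F[keep]
        feats.append(np.concatenate([c.real, c.imag]))
    return np.concatenate(feats)

# Hypothetical group of three images (focused + two defocused), random stand-ins.
rng = np.random.default_rng(0)
vec = input_vector([rng.random((64, 64)) for _ in range(3)])
```

The `half` mask keeps one member of each conjugate pair and automatically drops the zero-frequency coefficient, matching the two reductions described above.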

The back-propagation neural network training is implemented using the MATLAB R2021a Neural Net Fitting Toolkit with the Levenberg-Marquardt backpropagation algorithm. The network consists of an input layer, a hidden layer, and an output layer. The number of neurons in the hidden layer is $25$. The samples are randomly partitioned into training ($70\%$), validation ($15\%$), and test sets ($15\%$). Figure 5(a) shows that a total regression value R higher than $0.99$ is achieved in the training, indicating an extremely high correlation between the network outputs and desired outputs. Figure 5(b) shows the histogram of the aberration coefficient errors between the network outputs and desired outputs. More than $90\%$ of the errors are within the range of $[-0.025\lambda, 0.025\lambda ]$. The RMS errors between the network outputs and desired outputs in the training, validation, and test sets are $0.0082\lambda$, $0.0148\lambda$, and $0.0140\lambda$, respectively. The training result suggests that the back-propagation neural network can establish an exceedingly accurate relationship between the low-frequency Fourier coefficients and their corresponding aberration coefficients.


Fig. 5. Network training performance evaluation from (a) the regression R values and (b) the aberration coefficient errors between the network outputs and desired outputs.


3.3 Wavefront retrieval results

We retrieve the aberrations using the proposed method with the trained network. Processing steps such as Fourier coefficient calculation, normalization, and input vector generation are carried out sequentially for the recorded images, just as in the simulation. For each group of images, the input vector is fed into the trained network to generate a prediction of the aberration coefficients, and the predicted coefficients are then used as the starting point for the nonlinear optimization solver. The L-BFGS algorithm is used for the optimization, as it has several remarkable advantages, e.g., low memory requirements, fast convergence, and good stability. More importantly, by using approximate second-order curvature information to determine the search direction, L-BFGS converges more robustly than first-order algorithms on this non-convex optimization problem.

Figure 6(a) shows the statistical result of the RMS residual errors of the 800 aberrations. The minimum, maximum, and mean values of the RMS residual errors are $0.017\lambda$, $0.056\lambda$, and $0.039\lambda$, respectively. $95\%$ of the RMS residual errors are less than $0.05\lambda$. The result indicates that a remarkably high retrieval accuracy can be achieved using our proposed method.


Fig. 6. Statistical results of phase retrieval errors in solving $800$ different aberrations with the proposed method. (a) Accuracy distribution among RMS residual errors. (b) Accuracy distribution among Zernike terms. Error bar is the standard deviation of the Zernike coefficient errors.


Since the method is based on low-frequency Fourier coefficients, we also explore how the network's estimation accuracy varies among Zernike terms. Figure 6(b) shows the accuracy distribution among the 15 Zernike terms in the 800 aberrations, and the errors show no statistical tendency. Therefore, the proposed method does not particularly favor the low-order Zernike terms. We note that the Zernike polynomials represent phase differences in the wavefront errors on the pupil plane, and higher-order Zernike terms contribute more high-frequency phase structure to the outer pupil region. The Fourier coefficients, on the other hand, correspond to the spatial structures of the PSF, which is the squared magnitude of the Fourier transform of the generalized pupil function (Eq. (3)). This non-linear projection does not imply a one-to-one correspondence between the phase structure and the spatial properties of the PSFs.

3.4 Performance comparison

In Table 1, we statistically compare the system performance for the $800$ different aberrations in terms of accuracy and time cost for our proposed method against several other methods, i.e., the classic PDWS with the L-BFGS algorithm [35], the PDWS with the L-BFGS algorithm and a bootstrapping strategy [19], the PDWS with the L-BFGS algorithm and multiple starting points selected randomly [14], and the PDWS that combines the L-BFGS algorithm and multiple starting points generated by the k-means clustering algorithm [20]. The zero point is the only starting point in the classic PDWS [35], while serving as one of the multiple starting points in the modified PDWS methods [14,20]. To adopt the bootstrapping strategy, we first retrieve the second and third orders of the Zernike coefficients and then retrieve all other coefficients based on the achieved lower-order coefficients. Our network is readily trained on a small dataset with only $1000$ samples. The training time is about $1.4$ min without GPU acceleration. All computations are carried out on a laptop with an AMD Ryzen 9 5900HS processor (3.30 GHz, 48 GB RAM).


Table 1. Statistical comparison of different PDWS methods on experimental aberrations.a

From Table 1, the classic PDWS with the L-BFGS algorithm has poor accuracy and fails in many cases. Compared to the classic method, the PDWS with a bootstrapping strategy improves the accuracy at a small cost in time. Only with $M=20$ starting points can the solver with multiple starting points achieve a high retrieval success rate, i.e., $98.38\%$ of the RMS residual errors are less than $0.071\lambda$ (Maréchal criterion [36]) for the randomly selected starting points and $98\%$ for the k-means clustering generated starting points, at about $35$ times the processing time of the classic method. Our proposed method achieves the highest retrieval accuracy while requiring the least processing time. With only one starting point ($M=1$), $95\%$ of the RMS residual errors are less than $0.05\lambda$, and the rest are less than $0.071\lambda$. The average processing time is $0.37$ sec, which includes $0.07$ sec for initial estimation and $0.30$ sec for phase retrieval. Our method also outperforms all existing methods in inference time, resulting in much faster aberration retrieval.

4. Discussion

In our enhanced PDWS method, the final performance relies on some key factors that deserve a thorough analysis as discussed below. We perform a detailed study from several perspectives to illustrate the accuracy and efficiency of the proposed method.

4.1 Generalization

Recently, deep-network-learned features have exhibited exciting accuracy and efficiency in image representation and characterization. In principle, we could try different features and network structures in PDWS training until we obtain satisfactory results. However, it is hard to obtain massive amounts of accurate training data in practical applications, and most of the time we can only train the network with simulation data. Although efforts have been made to keep simulation data as close to real-world data as possible, differences inevitably exist, which degrades the actual system performance. In this work, low-frequency Fourier coefficients, which differ only slightly between simulated and recorded images, are used as distinctive features to represent the PSF images and obtain promising results.

Nevertheless, we are curious whether it is a universal phenomenon that low-frequency Fourier coefficients are more consistent than other Fourier coefficients. This question is very important because it concerns the theoretical foundation of the proposed method. To explore this, we generate PSF images by simulation, make slight perturbations to the simulation settings, generate the PSF images again, and calculate the absolute relative errors between the Fourier coefficients of images obtained from the two simulations. Changes are made to four important factors: object size, unknown aberration, defocus error, and noise level. The radius of the simulated point source, which serves as the imaging object, is enlarged from 2.4 $\mu m$ to 4.8 $\mu m$ to vary the object size. Zernike coefficients are multiplied by random values distributed uniformly in the range [0.8, 1.2] to mimic the aberration perturbations. Two concentric circles in the frequency domain are chosen to form a band-pass filter that picks up the desired frequency bands. The normalized radii of the two circles are $0.01$ and $0.05$ (low-frequency), $0.31$ and $0.35$ (mid-frequency), and $0.61$ and $0.65$ (high-frequency), respectively. We calculate the absolute relative errors for $300$ pairs of images and report the averaged results in Table 2. The results confirm that there are much smaller changes between the low-frequency Fourier terms than between other Fourier terms under different conditions. They also reveal that, even though the low-frequency Fourier terms are relatively sensitive to aberration changes, they are very resistant to changes in the rest of the system parameters, particularly noise. This means that the differences between the simulation settings and the real system have little effect on the low-frequency terms. Thus, it is reasonable to conclude that the differences between the low-frequency Fourier terms of simulated and collected images are generally small, and our proposed method can be used in a wide range of real-world situations.
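
This perturbation experiment can be reproduced in miniature (the noisy Gaussian spot is a hypothetical stand-in for a perturbed PSF; the three band radii follow the ones used above):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 128
yy, xx = np.mgrid[:N, :N]
psf_sim = np.exp(-((xx - N / 2)**2 + (yy - N / 2)**2) / (2 * 6.0**2))  # "simulated"
psf_real = psf_sim + rng.normal(0.0, 0.01, psf_sim.shape)  # perturbed stand-in

Fa = np.fft.fftshift(np.fft.fft2(psf_sim))
Fb = np.fft.fftshift(np.fft.fft2(psf_real))
u = np.arange(N) - N // 2
r = np.hypot(*np.meshgrid(u, u)) / (N // 2)   # normalized radial frequency

def band_error(lo, hi):
    """Mean absolute relative error |(a - b)/a| inside an annular band."""
    m = (r > lo) & (r <= hi)
    return np.mean(np.abs((Fa[m] - Fb[m]) / Fa[m]))

err_low = band_error(0.01, 0.05)
err_mid = band_error(0.31, 0.35)
err_high = band_error(0.61, 0.65)
```

Even this crude perturbation shows the same qualitative pattern as Table 2: the relative error stays small in the low band and grows by orders of magnitude in the mid and high bands, where the PSF spectrum itself is weak.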


Table 2. Effect of parameter variations on Fourier coefficients of PSF images.

4.2 Fourier frequency selection

The low-frequency zone is specified as an area inside a circle centered at the origin. The circle radius determines the radial cut-off frequency of the low-frequency components. To demonstrate the impact of the radial frequency on the system performance, we train the network with Fourier components restricted by different circle radii, retrieve the experimental aberrations, and then calculate the percentage of the RMS residual errors that are less than $0.071\lambda$. Five different normalized radii are selected, namely $0.02$, $0.03$, $0.05$, $0.07$, and $0.08$. To eliminate the influence of network training, we repeat the process $100$ times and calculate the minimum, maximum, and mean values of the percentages. Table 3 shows the statistical result of the percentage calculation. The system achieves higher accuracy as the radius grows from $0.02$ through $0.03$ to $0.05$. However, the accuracy decreases as the radius grows from $0.05$ through $0.07$ to $0.08$. This behavior results from a combination of representation accuracy and image consistency. Within certain limits, the higher the radial frequency, the more Fourier components are used to represent the PSF images, and the training accuracy improves. Beyond that, with much higher radial frequency components, the discrepancy between the Fourier components of the simulated and real images becomes prominent. Besides, a longer training time is also required. Thus, a good radial frequency should strike the optimal balance between representation accuracy and image consistency.


Table 3. System performance with different radii of the frequency circle.

4.3 Limitation and improvement

Although the neural network can provide a good starting point for the nonlinear solver, the method cannot guarantee complete success in finding the best solution. There is still a chance that the starting point lies outside the capture range [14] and the nonlinear search fails to reach the global minimum. Besides, when we repeat the network training many times, there are slight differences among the trained networks as well as their predictions. This has little impact on the overall performance, but it may significantly affect an individual result, increasing the chance of a prediction falling outside the capture range. All network-based methods inevitably encounter this problem. To quantify it, we train a network, recover the experimental aberrations using the trained network, and calculate the percentage of the RMS residual errors that are less than $0.071\lambda$. We repeat the process $100$ times. Table 4 shows the statistical result of the percentage calculation. The minimum, maximum, and mean values of the percentages are $99\%$, $100\%$, and $99.52\%$, respectively, suggesting that the method acquires the best solution with an extremely high probability. However, it also reveals that the method is still not $100\%$ successful.


Table 4. System performance with different numbers of trained networks.

One way to improve this is to utilize multiple starting points generated by different trained networks. The L-BFGS solver searches from each starting point and chooses the solution with the minimum objective function value as the final solution. We randomly train two networks and recover the experimental aberrations with them. Similarly, we repeat the process $100$ times, and Table 4 shows the statistical result. The minimum, maximum, and mean values of the percentages of the RMS residual errors that are less than $0.071\lambda$ increase to $99.5\%$, $100\%$, and $99.86\%$, respectively. We also randomly train three networks and recover the aberrations; the minimum, maximum, and mean values of the percentages increase further to $99.62\%$, $100\%$, and $99.91\%$, respectively. The average processing times for one, two, and three networks are $0.37$ s, $0.74$ s, and $1.12$ s, respectively. The result implies that the probability of finding the best solution approaches $100\%$ as the number of trained networks increases, at the cost of a small increase in processing time, which in our opinion is acceptable since accuracy is more important.
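The multi-start strategy can be sketched with SciPy's L-BFGS-B solver. The toy objective below is a non-convex stand-in for the PDWS metric $L(\{a_k\})$, which we do not reproduce here, and the starting points mimic predictions from independently trained networks:

```python
import numpy as np
from scipy.optimize import minimize

def objective(a):
    # Placeholder for the PDWS metric L({a_k}); a non-convex toy surrogate.
    return np.sum((a ** 2 - 1) ** 2) + 0.1 * np.sum(a)

def multi_start_lbfgs(starting_points):
    """Run L-BFGS from each network-predicted starting point and keep the
    solution with the smallest objective value."""
    results = [minimize(objective, x0, method="L-BFGS-B") for x0 in starting_points]
    return min(results, key=lambda res: res.fun)

# Three hypothetical starting points, e.g., predictions from three networks
starts = [np.array([0.9, -1.1]), np.array([0.5, 0.5]), np.array([-2.0, 2.0])]
best = multi_start_lbfgs(starts)
print(best.fun <= min(objective(s) for s in starts))  # never worse than any start
```

Since each L-BFGS run only decreases the objective from its start, taking the minimum over runs can only improve on the single-network result, at a processing cost that grows linearly with the number of networks.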

4.4 Investigation on wavefront range

Besides high accuracy, the measurable wavefront aberration range is another important metric. Therefore, we perform numerical simulations to evaluate the influence of various wavefront aberration ranges. The simulation settings and perturbation factors are consistent with the experimental system. To guarantee apparent differences among the diversity images, we set the phase diversities to $\left \{ -4\lambda, 0, 4\lambda \right \}$. Three training strategies are used to evaluate how the proposed approach handles these larger wavefront aberrations.

During the simulation, each network is trained on $1000$ wavefront aberrations, and each test is performed on $100$ aberrations to obtain the average of their RMS residual errors. The three strategies differ in their training and test wavefront ranges. In the first case, we train the networks independently with wavefront RMS values bounded in the sub-ranges $[k\lambda, (k+0.1)\lambda ]$, $k=0.2, 0.3,\ldots, 1.0, 1.1$, and then test the $10$ networks using wavefronts from the same sub-ranges. In the second case, we train the network using wavefronts with RMS values bounded in $[1\lambda, 1.1\lambda ]$ and then test it using wavefronts from the extended sub-ranges within $[0.2\lambda, 1.2\lambda ]$. In the third case, the network is trained over the overall range $[0.2\lambda, 1.2\lambda ]$ with wavefronts equally distributed among the sub-ranges, and tested with the same procedure as the second case.
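Generating training wavefronts whose RMS lies in a prescribed sub-range reduces to rescaling random Zernike coefficient vectors. The sketch below assumes RMS-normalized Zernike polynomials, so that the wavefront RMS equals the L2 norm of the coefficient vector (a common convention, though the paper does not state its normalization):

```python
import numpy as np

def sample_zernike_coeffs(n_terms, rms_lo, rms_hi, rng):
    """Draw random Zernike coefficients, then rescale so the wavefront RMS
    (in waves) falls uniformly inside [rms_lo, rms_hi]. Assumes RMS-normalized
    Zernike polynomials: wavefront RMS = L2 norm of the coefficients."""
    a = rng.standard_normal(n_terms)
    target = rng.uniform(rms_lo, rms_hi)
    return a * (target / np.linalg.norm(a))

rng = np.random.default_rng(42)
# E.g., the sub-range [0.2λ, 0.3λ] with 13 terms (Z3..Z15, excluding
# piston/tip/tilt, as a plausible reading of "up to 15th term")
coeffs = sample_zernike_coeffs(13, 0.2, 0.3, rng)
rms = np.linalg.norm(coeffs)
print(0.2 <= rms <= 0.3)
```

Sampling each of the ten sub-ranges equally then yields the uniformly covered overall range used by the third training strategy.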

Figure 7 shows the RMS residual errors in each sub-range for the three cases. Case 1 is expected to have the highest accuracy and serves as the benchmark performance. We notice that the RMS residual error increases as the wavefront RMS value grows; the maximum RMS residual error is $0.059\lambda$, about $5\%$ of the corresponding RMS wavefront value. The results of case 2 indicate that the network can extrapolate its predictions to an untrained, larger wavefront range with acceptable RMS residual errors. The results of both case 1 and case 2 satisfy the Maréchal criterion for wavefronts within $[0.2\lambda, 1.2\lambda ]$. Case 3 demonstrates that the network still works reasonably well over the overall range without exhaustive training for each sub-range.


Fig. 7. Residual RMS errors in various RMS wavefront ranges from three training strategies. In case 1, the network is trained and tested under the same sub-range, acting as a benchmark performance that is well below the Maréchal criterion [36]. Case 2 is trained under a sub-range and tested in other untrained sub-ranges, and Case 3 is trained and tested in the same overall range.


These simulation results show that the residual error increases with larger RMS wavefront values, regardless of whether the wavefronts lie within the training range. The method achieves satisfactory accuracy in the well-trained range under the Maréchal criterion, and even extrapolates to an untrained, broader wavefront range with reasonable errors, demonstrating robust performance.

5. Conclusion

In this work, we develop an efficient method to improve the accuracy and robustness of phase diversity wavefront sensing by introducing a Fourier-based neural network, which predicts better starting points from the Fourier domain. We find that low-frequency Fourier coefficients can preserve the image consistency between simulated and real images, greatly improving the representation efficiency without using all Fourier terms. With only one optimization search of the L-BFGS algorithm, the system achieves superior performance with highly acceptable accuracy and robustness, validated by statistical experimental results and extensive quantitative analyses. The mean RMS residual error on $800$ different aberrations (Zernike polynomials up to the 15th term) is $0.039\lambda$, and $95\%$ of the RMS residual errors are less than $0.05\lambda$, outperforming several state-of-the-art methods in terms of accuracy and processing speed. Furthermore, we investigate the system robustness by changing key parameters (such as the radius of the frequency circle and the number of trained networks) and evaluating their effects on the performance. Statistics show that the method can be extended to various circumstances. With its high accuracy, fast speed, and strong robustness, we envision the developed method could be a promising tool for optical system aberration measurement and optical imaging system alignment.

Funding

Shenzhen Public Technical Service Platform program (GGFW2018020618063670); Fonds Wetenschappelijk Onderzoek (FWOTM1039); Equipment Research Program of the Chinese Academy of Sciences (Y70X25A1HY, YJKYYQ20180039).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. M. Deprez, C. Bellanger, L. Lombard, W. Boucher, and J. Primot, “Piston and tilt interferometry for segmented wavefront sensing,” Opt. Lett. 41(6), 1078–1081 (2016). [CrossRef]  

2. Z. Gao, X. Li, and H. Ye, “Large dynamic range Shack–Hartmann wavefront measurement based on image segmentation and a neighbouring-region search algorithm,” Opt. Commun. 450, 190–201 (2019). [CrossRef]  

3. C. Wu, J. Ko, and C. C. Davis, “Determining the phase and amplitude distortion of a wavefront using a plenoptic sensor,” J. Opt. Soc. Am. A 32(5), 964–978 (2015). [CrossRef]  

4. B. Xin, C. Claver, M. Liang, S. Chandrasekharan, G. Angeli, and I. Shipsey, “Curvature wavefront sensing for the large synoptic survey telescope,” Appl. Opt. 54(30), 9045–9054 (2015). [CrossRef]  

5. Y. Shechtman, Y. C. Eldar, O. Cohen, H. N. Chapman, J. Miao, and M. Segev, “Phase retrieval with application to optical imaging: a contemporary overview,” IEEE Signal Process. Mag. 32(3), 87–109 (2015). [CrossRef]  

6. R. G. Paxman, T. J. Schulz, and J. R. Fienup, “Joint estimation of object and aberrations by using phase diversity,” J. Opt. Soc. Am. A 9(7), 1072–1085 (1992). [CrossRef]  

7. R. L. Kendrick, D. S. Acton, and A. Duncan, “Phase-diversity wave-front sensor for imaging systems,” Appl. Opt. 33(27), 6533–6546 (1994). [CrossRef]  

8. D. Li, S. Xu, D. Wang, and D. Yan, “Large-scale piston error detection technology for segmented optical mirrors via convolutional neural networks,” Opt. Lett. 44(5), 1170–1173 (2019). [CrossRef]  

9. D. Wilding, P. Pozzi, O. Soloviev, G. Vdovin, and M. Verhaegen, “Phase diversity based object estimation in light-sheet fluorescence microscopy,” in Bio-Optics: Design and Application, (Optical Society of America, 2017), paper BoTu2A–2.

10. M. P. Lamb, C. Correia, J.-F. Sauvage, J.-P. Véran, D. R. Andersen, A. Vigan, P. L. Wizinowich, M. A. Van Dam, L. Mugnier, and C. Bond, “Quantifying telescope phase discontinuities external to adaptive optics systems by use of phase diversity and focal plane sharpening,” J. Astron. Telesc. Instrum. 3(3), 039001 (2017). [CrossRef]  

11. D. Wu, C. Yang, H. Li, P. Zhang, X. Zhang, Z. Cao, Q. Mu, and L. Xuan, “Astronomical observation by 2-meter telescope based on liquid crystal adaptive optics with phase diversity,” Opt. Commun. 439, 129–132 (2019). [CrossRef]  

12. F. Li and C. Rao, “Algorithms for phase diversity wavefront sensing,” in Advanced Sensor Systems and Applications IV, vol. 7853 (SPIE, 2010), pp. 478–486.

13. F.-N. Hwang and X.-C. Cai, “Improving robustness and parallel scalability of newton method through nonlinear preconditioning,” in Domain Decomposition Methods in Science and Engineering (Springer, 2005), pp. 201–208.

14. D. B. Moore and J. R. Fienup, “Extending the capture range of phase retrieval through random starting parameters,” in Frontiers in Optics, (Optical Society of America, 2014), paper FTu2C–2.

15. P. Zhang, C. Yang, Z. Xu, Z. Cao, Q. Mu, and L. Xuan, “Hybrid particle swarm global optimization algorithm for phase diversity phase retrieval,” Opt. Express 24(22), 25704–25717 (2016). [CrossRef]  

16. X. Qi, G. Ju, and S. Xu, “Efficient solution to the stagnation problem of the particle swarm optimization algorithm for phase diversity,” Appl. Opt. 57(11), 2747–2757 (2018). [CrossRef]  

17. D. Yue, S. Xu, and H. Nie, “Co-phasing of the segmented mirror and image retrieval based on phase diversity using a modified algorithm,” Appl. Opt. 54(26), 7917–7924 (2015). [CrossRef]  

18. D. Li, S. Xu, X. Qi, D. Wang, and X. Cao, “Variable step size adaptive cuckoo search optimization algorithm for phase diversity,” Appl. Opt. 57(28), 8212–8219 (2018). [CrossRef]  

19. M. D. Bergkoetter and J. R. Fienup, “Increasing capture range of phase retrieval on a large scale laser system,” in Computational Optical Sensing and Imaging, (Optica Publishing Group, 2015), paper CW4E–3.

20. Z. Zhou, Y. Nie, Q. Fu, Q. Liu, and J. Zhang, “Robust statistical phase-diversity method for high-accuracy wavefront sensing,” Opt. Lasers Eng. 137, 106335 (2021). [CrossRef]  

21. H. Guo, Y. Xu, Q. Li, S. Du, D. He, Q. Wang, and Y. Huang, “Improved machine learning approach for wavefront sensing,” Sensors 19(16), 3533 (2019). [CrossRef]  

22. Y. Nishizaki, M. Valdivia, R. Horisaki, K. Kitaguchi, M. Saito, J. Tanida, and E. Vera, “Deep learning wavefront sensing,” Opt. Express 27(1), 240–251 (2019). [CrossRef]  

23. Y. He, Z. Liu, Y. Ning, J. Li, X. Xu, and Z. Jiang, “Deep learning wavefront sensing method for Shack-Hartmann sensors with sparse sub-apertures,” Opt. Express 29(11), 17669–17682 (2021). [CrossRef]  

24. Q. Xin, G. Ju, C. Zhang, and S. Xu, “Object-independent image-based wavefront sensing approach using phase diversity images and deep learning,” Opt. Express 27(18), 26102–26119 (2019). [CrossRef]  

25. G. Ju, X. Qi, H. Ma, and C. Yan, “Feature-based phase retrieval wavefront sensing approach using machine learning,” Opt. Express 26(24), 31767–31783 (2018). [CrossRef]  

26. S. W. Paine and J. R. Fienup, “Machine learning for improved image-based wavefront sensing,” Opt. Lett. 43(6), 1235–1238 (2018). [CrossRef]  

27. S. W. Paine and J. R. Fienup, “Machine learning for avoiding stagnation in image-based wavefront sensing,” Proc. SPIE 10980, 109800T (2019). [CrossRef]  

28. A. Blanc, L. M. Mugnier, and J. Idier, “Marginal estimation of aberrations and image restoration by use of phase diversity,” J. Opt. Soc. Am. A 20(6), 1035–1045 (2003). [CrossRef]  

29. R. E. Carlisle and D. S. Acton, “Demonstration of extended capture range for James Webb Space Telescope phase retrieval,” Appl. Opt. 54(21), 6454–6460 (2015). [CrossRef]  

30. P. Zhang, C. Yang, Z. Xu, Z. Cao, Q. Mu, and L. Xuan, “High-accuracy wavefront sensing by phase diversity technique with bisymmetric defocuses diversity phase,” Sci. Rep. 7(1), 1–10 (2017). [CrossRef]  

31. B. Xu, “Identifying fabric structures with fast Fourier transform techniques,” Text. Res. J. 66(8), 496–506 (1996). [CrossRef]  

32. R. J. Erb, “Introduction to backpropagation neural network computation,” Pharm. Res. 10(2), 165–170 (1993). [CrossRef]  

33. R. Hecht-Nielsen, “Theory of the backpropagation neural network,” in Neural Networks for Perception (Elsevier, 1992), pp. 65–93.

34. G. R. Brady, M. Guizar-Sicairos, and J. R. Fienup, “Optical wavefront measurement using phase retrieval with transverse translation diversity,” Opt. Express 17(2), 624–639 (2009). [CrossRef]  

35. C. R. Vogel, “A limited memory BFGS method for an inverse problem in atmospheric imaging,” in Methods and Applications of Inversion (Springer, 2000), pp. 292–304.

36. M. Born and E. Wolf, Principles of Optics: Electromagnetic Theory of Propagation, Interference and Diffraction of Light (Elsevier, 2013).




Figures (7)

Fig. 1.
Fig. 1. Motivation of the low-frequency Fourier coefficients representation. (a) Normalized PSF intensity. (b) Power spectrum of (a). Inside the white circle are the low-frequency components within a cut-off frequency. (c) Absolute relative error between the Fourier coefficients of simulated and recorded images. (d) RMS training error (solid red line, circle symbol) and training time (dashed blue line, triangle symbol).
Fig. 2.
Fig. 2. The Fourier-based neural network structure.
Fig. 3.
Fig. 3. Schematic diagram of the working procedure of the proposed PDWS method. (a) Network training. (b) Aberration retrieval.
Fig. 4.
Fig. 4. Experimental setup and examples of recorded PSF images. (a) Schematic diagram of the optical system setup. (c) is the wrapped aberration map to be retrieved, (b) and (d) are the wrapped aberration maps with $-1.0\lambda$ and $1.0\lambda$ defocused PV values with respect to (c). (e), (f), and (g) are the recorded PSF images corresponding to (b), (c), and (d).
Fig. 5.
Fig. 5. Network training performance evaluation from (a) the regression R values and (b) the aberration coefficient errors between the network outputs and desired outputs.
Fig. 6.
Fig. 6. Statistical results of phase retrieval errors in solving $800$ different aberrations with the proposed method. (a) Accuracy distribution among RMS residual errors. (b) Accuracy distribution among Zernike terms. Error bar is the standard deviation of the Zernike coefficient errors.

Tables (4)


Table 1. Statistical comparison of different PDWS methods on experimental aberrations.a


Equations (10)


$i_1 = o \ast h_1,$

$p = P \exp[j\phi],$

$h_1 = \left| \mathcal{F}\{ P \exp[j\phi] \} \right|^2,$

$\phi = \sum_{k=1}^{K} a_k Z_k,$

$h_2 = \left| \mathcal{F}\{ P \exp[j(\phi + \theta)] \} \right|^2,$

$i_2 = o \ast h_2,$

$L(\{a_k\}) = \sum \left( |I_1|^2 + |I_2|^2 \right) - \sum \frac{\left| I_1 H_1^{\ast} + I_2 H_2^{\ast} \right|^2}{|H_1|^2 + |H_2|^2},$

$L(\{a_k\}) = \sum \left( |I_1|^2 + |I_2|^2 + |I_3|^2 \right) - \sum \frac{\left| I_1 H_1^{\ast} + I_2 H_2^{\ast} + I_3 H_3^{\ast} \right|^2}{|H_1|^2 + |H_2|^2 + |H_3|^2},$

$F(u,v) = \iint f(x,y) \exp[-j 2\pi (ux + vy)] \, dx \, dy,$

$f(x,y) = \iint F(u,v) \exp[j 2\pi (ux + vy)] \, du \, dv.$
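As a numerical illustration of the imaging model above, the following sketch computes the incoherent PSF $h = |\mathcal{F}\{P\exp[j\phi]\}|^2$ for a circular pupil, with an assumed quadratic phase playing the role of the defocus diversity $\theta$. The grid size, pupil radius, and defocus strength are illustrative choices, not the paper's actual settings:

```python
import numpy as np

def psf_from_pupil(P, phi):
    """Incoherent PSF h = |F{P exp(j*phi)}|^2 for pupil amplitude P and
    wavefront phase phi (radians), normalized to unit total energy."""
    field = P * np.exp(1j * phi)
    h = np.abs(np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(field)))) ** 2
    return h / h.sum()

# Circular pupil on a 128x128 grid with normalized coordinates in [-1, 1]
N = 128
y, x = (np.mgrid[:N, :N] - N / 2) / (N / 2)
P = (np.hypot(y, x) <= 0.5).astype(float)

# Quadratic (defocus-like) phase, about 1 wave PV across the pupil (assumed)
theta = 2 * np.pi * 4.0 * (y ** 2 + x ** 2)

h_focus = psf_from_pupil(P, np.zeros((N, N)))
h_defocus = psf_from_pupil(P, theta)
print(h_focus.max() > h_defocus.max())  # defocus spreads the PSF energy
```

Evaluating this model at the known diversity phases and comparing the resulting $I_1, I_2$ (and $I_3$) against the measured spectra is exactly what the metric $L(\{a_k\})$ quantifies during the L-BFGS search.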