
Direct wavefront sensing with a plenoptic sensor based on deep learning


Abstract

Traditional plenoptic wavefront sensors (PWS) suffer from an obvious step change in their slope response, which leads to poor phase retrieval performance. In this paper, a neural network model combining the transformer architecture with the U-Net model is utilized to restore the wavefront directly from the plenoptic image of the PWS. The simulation results show that the averaged root mean square error (RMSE) of the residual wavefront is less than 1/14λ (Marechal criterion), proving that the proposed method successfully overcomes the nonlinearity inherent in PWS wavefront sensing. In addition, our model performs better than recently developed deep learning models and the traditional modal approach. Furthermore, the robustness of our model to turbulence strength and signal level is also tested, demonstrating its good generalizability. To the best of our knowledge, this is the first time direct wavefront detection has been performed with a deep-learning-based method in PWS-based applications, and it achieves state-of-the-art performance.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

The plenoptic wavefront sensor (PWS) has been proposed in recent years to remedy the deficiencies of the traditional Shack-Hartmann wavefront sensor (SHWS) [1,2]. However, the largest problem of the traditional PWS is the obvious step change of its slope response, which leads to low reconstruction accuracy. Although a modified PWS with defocus modulation has been proposed to improve the linearity, it cannot remove the nonlinear effect completely [3]. Moreover, several improved reconstruction approaches, such as the checkerboard algorithm, have been proposed to solve this problem, but their performance is still unsatisfying [4–6]. Recently, an imaging sensor with a structure similar to the PWS was proposed by Wu et al. [7] and proved to have a great ability to obtain high-resolution images. One key step in their work was obtaining accurate aberrations at different positions of the image. However, limited by the wavefront detection accuracy of the PWS, iterative operations were needed to restore the actual aberrations. Therefore, improving the wavefront sensing accuracy of the PWS remains an urgent task.

In the past few years, artificial neural networks (ANNs) and deep learning models based on convolutional neural networks (CNNs) have been applied to SHWS-based wavefront detection. Li et al. and Gómez et al. proposed to improve the centroid estimation accuracy through ANNs [8,9]. Recently, Zhao et al. utilized the correctly estimated centroids as the input of a U-Net model to predict the lost centroids [10]. However, these methods still use traditional approaches to reconstruct the wavefront from slope measurements. Swanson et al. and Ceruso et al. utilized the U-Net model to directly restore the phase shape from slopes [11,12]. Later, Jia et al. proposed a compressive deep learning model that reconstructs the wavefront only from the slopes of sub-apertures whose spot images have a high signal-to-noise ratio [13]. DuBose et al. utilized both the Hartmann-gram and the slopes as the input of a U-Net-like model to reconstruct the wavefront and achieved superior performance [14]. However, the methods mentioned above still require slope computation from the SHWS for wavefront reconstruction. To tackle this problem, Hu et al. proposed to use the Hartmann-gram as the input of an AlexNet model and directly obtained the Zernike terms [15]. They also utilized a novel U-Net-like model termed SH-Net to predict the wavefront map from the Hartmann-gram directly and obtained superior performance [16]. He et al. modified the ResNet-50 model to restore Zernike coefficients directly from sparse sub-aperture spot images instead of the whole Hartmann-gram [17]. Recently, Guo et al. presented a novel lightweight CNN (termed SH-CNN) to reconstruct the Zernike polynomial coefficients from the Hartmann-gram [18], and they also reviewed the state-of-the-art deep-learning-assisted wavefront sensors in [19]. Although deep-learning-assisted wavefront sensing methods have proved effective when combined with the traditional SHWS, as far as we know there is no published work on using a deep learning model to improve the wavefront detection performance of the PWS. To address this, a deep learning model named PWS-ResUnet (which combines residual connection blocks with the U-Net model) was recently proposed in our previous work [20]. Although the PWS-ResUnet achieves good performance, a slope computation procedure is still needed and the dimension of the restored wavefront is limited by the dimension of the slope measurements.

As is well studied in PWS-based wavefront detection, the nonlinear slope response of the PWS makes it hard to accurately calculate the slope of each sub-aperture. In addition, the limited number of micro-lenses restricts the calculation of the detailed wavefront distribution over each sub-aperture from the centroid displacements. To avoid the above problems, in this paper we propose and demonstrate a new deep-learning-based method to assist the PWS in direct wavefront detection. The basic deep learning model used in this paper is the SwinUnet, a U-Net-like model recently proposed by Cao et al. [21] that has proved efficient for medical image segmentation. The SwinUnet uses the intensity distribution of the whole plenoptic image to accurately map the relationship between the wavefront and the distorted plenoptic pattern, so that it can predict the phase map from a plenoptic image directly. With this method, slope computation is no longer needed and the dimension of the restored wavefront is increased, so that the detailed wavefront distribution can be obtained directly. The statistical results show that the averaged root mean square error of the residual wavefront with our method is less than 1/14λ (Marechal criterion), proving that the proposed method successfully overcomes the nonlinearity inherent in PWS wavefront sensing. Additionally, our method provides a much lower RMSE than the traditional modal approach and a promising performance compared to traditional CNN models. To the best of our knowledge, this is the first time direct wavefront detection has been performed with a deep-learning-based method in PWS-based applications, and it achieves state-of-the-art performance.

This paper is organized as follows. In Section 2, the basic principles of the PWS are introduced. In Section 3, we describe our simulation setup, i.e., the data simulator and the architecture of the model used in this work. Then, the simulation results of different wavefront reconstruction methods are given in Section 4. Finally, we draw our conclusions and outline future work in Section 5.

2. Basic working principles of PWS

The structure of PWS is shown in Fig. 1. It consists of an aperture modulator (AM), an objective lens (OL), a micro lens array (MLA), and an image sensor (IS). In Fig. 1, dAM is the diameter of AM, dobj and fobj represent the diameter and focal length of OL, dMLA is the length of MLA, and d2 and f2 denote the pitch and focal length of MLA.

Fig. 1. The structure of PWS. Red dotted lines denote the complex amplitudes and green dotted lines represent the light rays.

For the sake of brevity, the valid diameter of OL is marked with dvobj = dAM. To avoid overlapping of MLA sub-aperture images, the f-number of OL (only the valid aperture is considered in this paper) should be no less than that of the microlens, which can be expressed as,

$${{{f_{obj}}} / {{d_{vobj}}}} \ge {{{f_2}} / {{d_2}}}.$$

When the f-number of microlens matches the f-number of OL, the PWS can make the best use of the detector pixels (it is assumed to be satisfied in this paper). According to the optical setup of PWS in Fig. 1, we can derive the intensity distribution of a plenoptic image from Fourier optics.

Assuming the phase of the complex amplitude (CA) on the entrance pupil is given by,

$$W({{x_0},{y_0}} )= \sum\limits_{j = 1}^K {{a_j}{Z_j}({{x_0},{y_0}} )} ,$$
where aj is the jth coefficient, and Zj(x0, y0) is the jth modal function (e.g., Zernike or Karhunen–Loève modes) used to decompose the phase up to a maximal order K.
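As a concrete illustration of Eq. (2), the snippet below composes a phase map from a few low-order, Noll-indexed Zernike modes on a unit circular pupil. This is a minimal sketch: the mode set, grid size, and coefficient values are placeholders for illustration and are not the 65-mode basis or amplitudes used in the paper.

```python
import numpy as np

def zernike_phase(coeffs, n_grid=224):
    """Compose W(x0, y0) = sum_j a_j Z_j(x0, y0) from a few low-order
    Noll-indexed Zernike modes on a unit circular pupil (illustrative only)."""
    y, x = np.mgrid[-1:1:1j * n_grid, -1:1:1j * n_grid]
    r2 = x**2 + y**2
    pupil = r2 <= 1.0
    # Noll-normalized modes 4-6: defocus and the two astigmatism terms
    modes = {
        4: np.sqrt(3) * (2 * r2 - 1),      # defocus
        5: 2 * np.sqrt(6) * x * y,         # oblique astigmatism
        6: np.sqrt(6) * (x**2 - y**2),     # vertical astigmatism
    }
    W = np.zeros_like(x)
    for j, a_j in coeffs.items():
        W += a_j * modes[j]
    return W * pupil

# Example: 0.3 rad of defocus plus 0.1 rad of astigmatism (arbitrary values)
W = zernike_phase({4: 0.3, 6: 0.1})
```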

The CA at the back plane of OL can be derived with,

$$\begin{aligned} &{U_2}({x,y} )= \frac{A}{{j\lambda {f_{obj}}}}\exp ({jk{f_{obj}}} )\times P({x,y} )\\ &\quad \mathrm{{\cal F}}{\left\{ {{U_0}({{x_0},{y_0}} )\exp \left[ {j\frac{k}{{2{f_{obj}}}}({x_0^2 + y_0^2} )} \right]} \right\}_{{f_x} = \frac{x}{{\lambda {f_{obj}}}},{f_y} = \frac{y}{{\lambda {f_{obj}}}}}}. \end{aligned}$$

Here ${U_0}({{x_0},{y_0}} )= A\exp ({jW({{x_0},{y_0}} )} )$ is the CA of input pupil plane, $\mathrm{{\cal F}}( \cdot )$ denotes the Fourier transform, k = 2π/λ is the wave number, and P(x, y) is the pupil function of OL.

Then the CA at the front plane of MLA can be computed by,

$$\begin{aligned} &{U_3}({s,t} )= \frac{1}{{j\lambda {f_{obj}}}}\exp ({jk{f_{obj}}} )\exp \left( {j\frac{k}{{2{f_{obj}}}}({{s^2} + {t^2}} )} \right) \times \\ &\quad \mathrm{{\cal F}}{\left\{ {{U_2}({x,y} )\exp \left[ {j\frac{k}{{2{f_{obj}}}}({{x^2} + {y^2}} )} \right]} \right\}_{{f_x} = \frac{s}{{\lambda {f_{obj}}}},{f_y} = \frac{t}{{\lambda {f_{obj}}}}}}, \end{aligned}$$
where (s, t) is the coordinate of MLA plane.

The MLA splits U3 into M × N square sub-apertures (where M × N is the number of MLA lenslet units), and the CA at the back plane of (i, j)th sub-aperture can be described with,

$$\left\{ \begin{array}{l} {U_3}({{s_i},{t_j}} )= {U_3}({s,t} )\textrm{rect}({i,j} )\\ {U_4}({{s_i},{t_j}} )= {U_3}({{s_i},{t_j}} ){e^{\left[ { - j\frac{k}{{2{f_2}}}({{{({{s_i} - {s_{ic}}} )}^2} + {{({{t_j} - {t_{jc}}} )}^2}} )} \right]}} \end{array} \right..$$

Here rect(i, j) denotes an operator which selects the corresponding CA of (i, j)th microlens, (si, tj) is the coordinate of U3, and (sic, tjc) is the center coordinate of (i, j)th microlens.

Then the image produced by (i, j)th microlens can be formulated as,

$${I_{({i,j} )}}({u,v} )= {\left|{\frac{1}{{j\lambda {f_2}}}\mathrm{{\cal F}}{{\left\{ {{U_4}({{s_i},{t_j}} ){\textrm{e}^{\left[ {j\frac{k}{{2{f_2}}}({{s_i}^2 + {t_j}^2} )} \right]}}} \right\}}_{{f_x} = \frac{u}{{\lambda {f_2}}},{f_y} = \frac{v}{{\lambda {f_2}}}}}} \right|^2}.$$

Here (u, v) denotes the coordinate of IS.

Finally, the image captured by IS can be obtained with,

$$I({u,v} )= \sum\limits_{i = 1}^M {\sum\limits_{j = 1}^N {{I_{({i,j} )}}({u,v} )} } .$$
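The propagation chain of Eqs. (3)–(7) can be sketched numerically with FFTs. The fragment below is a schematic only, not the authors' simulator: constant amplitude factors that do not affect the relative intensity are dropped, the pupil function is reused for P(x, y), the lenslet quadratic phases are written in cell-local coordinates, and the precise sampling/scaling bookkeeping of a rigorous Fourier-optics simulation is omitted.

```python
import numpy as np

def plenoptic_image(W, pupil, f_obj, f2, d2, wavelength, dx):
    """Schematic FFT version of Eqs. (3)-(7): entrance pupil -> back of OL ->
    MLA plane -> per-lenslet intensity, keeping only |.|^2-relevant factors."""
    k = 2 * np.pi / wavelength
    n = W.shape[0]
    y, x = np.mgrid[-n // 2:n // 2, -n // 2:n // 2] * dx
    ft = lambda U: np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(U)))

    U0 = pupil * np.exp(1j * W)                                            # Eq. (2)
    U2 = pupil * ft(U0 * np.exp(1j * k / (2 * f_obj) * (x**2 + y**2)))     # Eq. (3)
    U3 = np.exp(1j * k / (2 * f_obj) * (x**2 + y**2)) * \
         ft(U2 * np.exp(1j * k / (2 * f_obj) * (x**2 + y**2)))             # Eq. (4)

    # Split U3 into M x N lenslet cells of p x p samples, apply the lenslet
    # phase (Eq. 5), propagate to the sensor (Eq. 6), and tile the result (Eq. 7).
    p = int(round(d2 / dx))          # samples per lenslet
    M = N = n // p
    yl, xl = np.mgrid[-p // 2:p // 2, -p // 2:p // 2] * dx
    I = np.zeros((n, n))
    for i in range(M):
        for j in range(N):
            cell = U3[i * p:(i + 1) * p, j * p:(j + 1) * p]
            # Both quadratic phases are written in cell-local coordinates; the
            # neglected global terms only shift each sub-image, which the
            # tiling by cell index accounts for implicitly.
            U4 = cell * np.exp(-1j * k / (2 * f2) * (xl**2 + yl**2))       # Eq. (5)
            Iij = np.abs(ft(U4 * np.exp(1j * k / (2 * f2) * (xl**2 + yl**2)))) ** 2
            I[i * p:(i + 1) * p, j * p:(j + 1) * p] = Iij                  # Eqs. (6)-(7)
    return I
```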

Then, as pointed out by Clare et al., the phase gradients of PWS can be calculated by the generalized pyramid method [1], which can be formulated as:

$$\left\{ \begin{array}{l} {S_\varepsilon }({\varepsilon ,\eta } )= \frac{{\sum\nolimits_{m ={-} {{({M - 1} )} / 2}}^{{{({M - 1} )} / 2}} {\sum\nolimits_{n ={-} {{({N - 1} )} / 2}}^{{{({N - 1} )} / 2}} {{I_{m,n}}({\varepsilon ,\eta } )n{d_2}} } }}{{L\sum\nolimits_{m ={-} {{({M - 1} )} / 2}}^{{{({M - 1} )} / 2}} {\sum\nolimits_{n ={-} {{({N - 1} )} / 2}}^{{{({N - 1} )} / 2}} {{I_{m,n}}({\varepsilon ,\eta } )} } }}\\ {S_\eta }({\varepsilon ,\eta } )= \frac{{\sum\nolimits_{m ={-} {{({M - 1} )} / 2}}^{{{({M - 1} )} / 2}} {\sum\nolimits_{n ={-} {{({N - 1} )} / 2}}^{{{({N - 1} )} / 2}} {{I_{m,n}}({\varepsilon ,\eta } )m{d_2}} } }}{{L\sum\nolimits_{m ={-} {{({M - 1} )} / 2}}^{{{({M - 1} )} / 2}} {\sum\nolimits_{n ={-} {{({N - 1} )} / 2}}^{{{({N - 1} )} / 2}} {{I_{m,n}}({\varepsilon ,\eta } )} } }} \end{array} \right.,$$
where $\varepsilon \in \left[ {1,2, \cdots ,P} \right],\,\eta \in \left[ {1,2, \cdots Q} \right]$, the pixel number under each sub-aperture is P × Q, Im,n(ε, η) is the (ε, η)th pixel value imprinted by the (m, n)th microlens, Sε(ε, η) and Sη(ε, η) are the slope responses in x and y directions, respectively. After obtaining the slope measurements, the wavefront can be reconstructed with modal or zonal approaches.
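For reference, a direct transcription of Eq. (8) is given below. It assumes the plenoptic image has already been arranged as an array indexed by (m, n, ε, η), and the lever-arm distance L of the generalized pyramid method [1] is supplied by the caller.

```python
import numpy as np

def pyramid_slopes(I, d2, L):
    """Compute S_eps, S_eta of Eq. (8) from a plenoptic image reorganized as
    I[m, n, eps, eta] (lenslet indices m, n; intra-lenslet pixel indices eps, eta)."""
    M, N, P, Q = I.shape
    m = np.arange(M) - (M - 1) / 2          # symmetric lenslet indices
    n = np.arange(N) - (N - 1) / 2
    denom = L * I.sum(axis=(0, 1))          # L * sum_{m,n} I_{m,n}(eps, eta)
    S_eps = (I * n[None, :, None, None] * d2).sum(axis=(0, 1)) / denom
    S_eta = (I * m[:, None, None, None] * d2).sum(axis=(0, 1)) / denom
    return S_eps, S_eta                     # each of shape (P, Q)
```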

It should be noted that the physical model introduced above is consistent with the models reported in previous works [4–6], which have been validated in real experimental settings. Moreover, careful comparisons with previous works (such as plenoptic images and slope response curves) are provided in Supplement 1 to prove the correctness of our physical model. Therefore, we believe that the simulation experiments based on the proposed physical model are convincing.

3. Neural network model and simulation setup

3.1 Neural network model

The basic deep learning model used in this paper is the SwinUnet, a U-Net-like model recently proposed by Cao et al. [21] that has proved efficient for medical image segmentation. The SwinUnet architecture is similar to the SH-Net model [16] to some degree, but it has a very different internal structure. As presented in Fig. 2, the CNN layers used extensively in the SH-Net model are replaced with Swin-Transformer layers. The Swin-Transformer layer is an attention-mechanism architecture recently proposed by Liu et al.; it has been proved to work well in almost all computer vision tasks and has become a hotspot in deep learning [22]. Unlike the SH-Net, which uses convolution kernels of 3 × 3, 5 × 5, 7 × 7, and 9 × 9 sizes to offer different receptive fields, the Swin-Transformer layer utilizes simple shifted-window partitioning configurations to extract features over a large receptive field (for more information, please refer to [22]). Moreover, instead of simply copying the architecture of Cao's work, in our implementation we modify their model by shrinking the third Swin-Transformer block from 6 layers to 2 layers for the sake of simplicity.
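A hedged sketch of how such a configuration might be expressed is shown below. The field names (`img_size`, `in_chans`, `embed_dim`, `depths`, `num_heads`, `window_size`) mirror the public Swin-Transformer convention, but apart from the stage depths and input size stated in the text, the values are illustrative assumptions rather than the exact settings of our implementation.

```python
# Illustrative encoder configuration: the Swin-Tiny-style stage depths
# [2, 2, 6, 2] are reduced to [2, 2, 2, 2], i.e. the third stage is shrunk
# from 6 to 2 layers as described above.
swinunet_cfg = {
    "img_size": 224,               # input resized to 224 x 224 (Section 3.2)
    "in_chans": 1,                 # single-channel recomposed Hartmann-gram
    "embed_dim": 96,               # channel width after patch embedding (assumed)
    "depths": [2, 2, 2, 2],        # third stage shrunk from 6 to 2 layers
    "num_heads": [3, 6, 12, 24],   # typical Swin-T head counts (assumed)
    "window_size": 7,              # shifted-window size (assumed)
}
```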

Fig. 2. The architecture of SwinUnet. (a) Patch partition indicates downsampling through a 4 × 4 convolution with a stride of 4. The linear embedding layer denotes the combination of a 1 × 1 convolution and a layer normalization operator. The patch merging and patch expanding layers indicate the downsampling and upsampling operators implemented through a fully connected layer and layer normalization. The numbers under each level indicate the number of channels. (b) The Swin-Transformer architecture in detail. W-MSA and SW-MSA denote the window-based and shifted-window-based multi-head self-attention, respectively.

3.2 Simulation setup

To prove the efficiency of the SwinUnet, a numerical study is carried out to simulate the wavefront prediction process with MATLAB and Python. The simulation setup for data generation is shown in Fig. 3. The front focal plane of the OL in the PWS is assumed to be conjugate to a telescope with a 1 m diameter. The detailed parameters of this simulation setup are listed in Table 1. The f-number of the MLA refers to a real product (#64–478, Edmund Optics) and the f-number of the OL is matched to that of the MLA as specified in Eq. (1). Through the Fourier optics theory detailed in Section 2, the distorted plenoptic image can be obtained on the PWS, and a camera is used to monitor the focusing results after compensating for the distortion with the estimated phase. Lastly, the photon noise, readout noise, and dark current of the PWS are considered in the simulation (the noise model from our previous work [23] is used).
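The noise applied to each simulated plenoptic image follows the model of [23]; since its exact parameters are given there rather than here, the snippet below only sketches the generic structure (Poisson photon noise, Gaussian readout noise, and a dark-current contribution), and the parameter values are placeholders, not those of Ref. [23].

```python
import numpy as np

rng = np.random.default_rng(0)

def add_sensor_noise(I, total_photons=1e7, readout_e=3.0, dark_e=0.1):
    """Apply photon (shot) noise, readout noise, and dark current to a noiseless
    plenoptic image I. Parameter values are illustrative placeholders."""
    photons = I / I.sum() * total_photons               # scale to the photon budget
    shot = rng.poisson(photons).astype(float)           # photon (shot) noise
    dark = rng.poisson(dark_e, size=I.shape)            # dark-current electrons
    readout = rng.normal(0.0, readout_e, size=I.shape)  # readout noise
    return np.clip(shot + dark + readout, 0, None)
```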

Fig. 3. Simulation setup for data generation and wavefront detection. L1-L2, relay lenses; BSP, beam splitter plate. The black dotted line indicates the conjugate plane of the aberration; it coincides with the front focal plane of the objective lens (OL) of PWS. The green dotted lines represent the light rays.

Table 1. Parameters of PWS used in the simulation setup

Datasets are simulated with the atmospheric coherence length r0 fixed to 12 cm, and the target is assumed to be infinitely far away so that it can be treated as a point source. The turbulence-induced wavefronts are simulated using 65 Zernike polynomials according to the method proposed by Roddier [24] (the amplitude ranges and statistical characteristics of these Zernike terms are given in Supplement 1). The amplitude of the input light is assumed to be 1 over the entrance pupil. It should be noted that the piston and tip-tilt terms are set to zero, because the piston term does not affect the wavefront distribution while the tip-tilt terms can be rapidly corrected by a separate correction stage [25].

For the network training, a total of 70,000 samples are produced, comprising a training set of 50,000 samples, a validation set of 10,000 samples, and a test set of 10,000 samples. Each sample contains a phase screen as the ground truth and the corresponding plenoptic image. It is worth noting that the size of the plenoptic image is 480 × 480, but it is resized to 224 × 224 for computational efficiency (the simple "imresize" function with bilinear interpolation is adopted to do so). More importantly, the plenoptic image is transformed into a Hartmann-gram-like pattern before downsampling. The transformation process is described in Fig. 4(a) and (b). For the sake of brevity, as presented in Fig. 4(a), we start with a 3 × 3 array of MLA cells, where each MLA cell contains four pixels labeled "A", "B", "C", and "D" for simplicity. The pixel "A" in different sub-apertures represents the same spatial position of the incident wavefront, regardless of the different angular spectra represented. By reorganizing the pixels "A" in all MLA cells according to the indexes of the MLA cells, we get a recomposed sub-aperture (the purple boxes in Fig. 4(b)) whose pixel values denote the spot intensities of the same spatial position of the incident wavefront (thus it can be considered, to some degree, as a sub-aperture of the SHWS [5,26]). The pixel count of each recomposed sub-aperture is therefore equal to the number of MLA cells. Similarly, by applying the same operation to the pixels of the other angular spectra ("B", "C", and "D" in this case), we obtain the transformed image presented in Fig. 4(b).
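Because every MLA cell images the full pupil, the transformation of Fig. 4 is a pure pixel reorganization: all pixels sharing the same intra-cell position are gathered into one recomposed sub-aperture. Under the assumption that the plenoptic image is an array of M × N cells, each of P × Q pixels, the operation reduces to a reshape and transpose, as sketched below; the bilinear resize mirrors the "imresize" step described above but is expressed here with PyTorch for consistency with the training code.

```python
import numpy as np
import torch
import torch.nn.functional as F

def plenoptic_to_hartmanngram(img, M, N, P, Q):
    """Reorganize a plenoptic image of M x N lenslet cells (each P x Q pixels)
    into a Hartmann-gram-like pattern of P x Q sub-apertures (each M x N pixels),
    i.e. out[p*M + m, q*N + n] = img[m*P + p, n*Q + q]."""
    return img.reshape(M, P, N, Q).transpose(1, 0, 3, 2).reshape(P * M, Q * N)

def resize_bilinear(img, size=224):
    """Downsample to size x size with bilinear interpolation (analogous to imresize)."""
    t = torch.from_numpy(img).float()[None, None]        # add batch/channel dims
    return F.interpolate(t, size=(size, size), mode="bilinear",
                         align_corners=False)[0, 0].numpy()
```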

Fig. 4. The transformation procedure applied to the plenoptic image. M and N denote the horizontal and vertical indexes of MLA cells, respectively. Each box in (a) denotes a pixel of the plenoptic image and the value in the parentheses represents the intensity. Purple boxes in (b) denote the sub-apertures of the recomposed Hartmann-gram.

The SwinUnet in our work is implemented in Python (version 3.9.1) with a PyTorch backend using the VS Code environment, and trained on a workstation (Intel Core i9-10920X CPU, NVIDIA RTX 3090 Ti). The optimizer for SwinUnet training is Adam with an initial learning rate of 10^-4 and a scheduler that multiplies the learning rate by 0.9 every 4 epochs. The RMSE between the predicted wavefront and the ground truth is used as the loss function.
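Under these training settings, the training loop can be written as follows. This is a minimal sketch: `model` and `train_loader` stand in for the SwinUnet and our data pipeline, and the number of epochs is a placeholder since it is not specified in the text.

```python
import torch

def rmse_loss(pred, target):
    """RMSE of the residual wavefront between prediction and ground truth."""
    return torch.sqrt(torch.mean((pred - target) ** 2))

def train(model, train_loader, epochs=100, device="cuda"):
    model = model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    # Multiply the learning rate by 0.9 every 4 epochs
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=4, gamma=0.9)
    for epoch in range(epochs):
        for hartmanngram, phase in train_loader:   # 224 x 224 inputs / targets
            hartmanngram, phase = hartmanngram.to(device), phase.to(device)
            optimizer.zero_grad()
            loss = rmse_loss(model(hartmanngram), phase)
            loss.backward()
            optimizer.step()
        scheduler.step()
```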

4. Experimental results

After the network training, we evaluate the performance of our method by comparing the residual wavefront RMSE with those of SH-Net [16], SH-CNN [18], and Zernike modal reconstruction. Specifically, the Zernike modal algorithm decomposes the wavefront into orthogonal Zernike modes and uses the linear relationship between the theoretical gradients of the Zernike modes and the measured slopes of the phase map to reconstruct the wavefront. More detailed descriptions of this method are presented in Roddier's book [27]. It should be noted that the zonal approach and the deep-learning-based methods [19,20] whose output wavefronts have the same dimension as the slope measurements are not considered here, as extra interpolation errors would be introduced when comparing them with our method.
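For completeness, the Zernike modal baseline can be summarized as a least-squares fit of the measured slopes against the theoretical mode gradients [27]. The minimal sketch below assumes a hypothetical helper `grad_zernike(j)` that returns the x- and y-gradient maps of mode j sampled at the sub-aperture positions; it is not the exact implementation used in the comparison.

```python
import numpy as np

def modal_reconstruct(slopes_x, slopes_y, grad_zernike, n_modes=65):
    """Least-squares modal reconstruction: stack theoretical Zernike gradients
    into an interaction matrix G and solve G a = s for the coefficients a."""
    s = np.concatenate([slopes_x.ravel(), slopes_y.ravel()])
    G = np.column_stack([
        np.concatenate([gx.ravel(), gy.ravel()])
        for gx, gy in (grad_zernike(j) for j in range(2, n_modes + 1))  # skip piston
    ])
    coeffs, *_ = np.linalg.lstsq(G, s, rcond=None)
    return coeffs
```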

A set of comparison results is given in Fig. 5. From Fig. 5, it is clear that the wavefront reconstructed by the SwinUnet matches the ground truth much better than those reconstructed by SH-CNN and the modal approach. Additionally, although our model performs only slightly better than SH-Net, the distribution of the wavefront residuals of the SwinUnet is more uniform, whereas that of SH-Net has relatively larger errors near the edge of the pupil. This means that the SwinUnet estimates the wavefront better at different positions of the pupil. As for SH-CNN and the modal approach, although the reconstructed wavefronts are close to the original wavefront, larger residual errors are produced over the whole pupil due to their modal representation.

Fig. 5. Comparison of wavefront reconstruction results of four approaches. (a) Original plenoptic image, (b) recomposed Hartmann-gram, (c) ground truth wavefront, and (d) the output and residual wavefronts. The output wavefronts and their residuals share the same color bar with the ground truth wavefront. The RMSE of each residual is given at the bottom. The gray, purple, blue, and red dotted boxes represent the results of the modal, SH-CNN, SH-Net, and SwinUnet methods, respectively.

The point spread functions (PSFs) before and after compensating for the distorted wavefront are shown in Fig. 6. The corresponding central intensity profiles of these PSFs are displayed in Fig. 6(f). From Fig. 6, it is evident that the wavefront estimated by the SwinUnet compensates for the wavefront distortion better than the other methods, and higher central intensities of the PSF are obtained with our method. The statistical results over 10000 test data sets are displayed in Table 2. Our method achieves the lowest RMSE and standard deviation in the comparison. The averaged RMSE of the SwinUnet is 0.0602λ, which is ∼48.37% lower than that of SH-CNN (0.1166λ) and ∼73.85% lower than that of the modal approach (0.2302λ). Although our method performs only slightly better than SH-Net (0.0637λ), it has an obvious advantage over SH-Net in terms of real-time performance, as analyzed below.

Fig. 6. The PSFs without compensation and after compensation with different approaches. (a) The PSF without compensation, (b-e) the PSFs after compensation with modal, SH-CNN, SH-Net, and SwinUnet, respectively, (f) the central intensity profiles of (a-e). The PSFs are normalized with the maximum value of the ideal Airy spot. For better visual perception, the color bar in (a) is compressed to [0, 9e-3].

Table 2. The RMSEs of distorted and residual wavefronts after compensation with different methods

Besides the wavefront detection accuracy, the other key aspect of wavefront sensing is the real-time performance. Figure 7 presents comparisons of these approaches from several aspects, including the number of training parameters, the floating-point operations (FLOPs), the training time per epoch, the prediction time per data set, and the loss curves during the training procedure. It should be noted that, since the reconstruction matrix of the modal approach is obtained directly, its training parameters and training time are set to 0. From Fig. 7, it is evident that although our model has more training parameters (∼27.143 M) than SH-Net (∼21.377 M), it requires far fewer FLOPs (∼5.901 G, ∼80.65% fewer than SH-Net's ∼30.503 G), resulting in a shorter training time per epoch (∼5.49 minutes versus ∼14.66 minutes for SH-Net). Moreover, the SwinUnet takes ∼2.51 ms to predict a wavefront, which is ∼46.02% less than SH-Net (∼4.65 ms). Although the detection speed of the SwinUnet is slower than that of SH-CNN (0.55 ms) and the modal approach (0.14 ms), its detection accuracy is much higher, which constitutes a tradeoff between accuracy and speed in wavefront detection. It should be noted that the above prediction times of the deep-learning-based methods do not include the time consumed by the image transformation, resizing, and data loading procedures; likewise, the slope calculation time and slope data transmission time are not considered for the modal approach. Additionally, as displayed in Fig. 7(e), the loss curves of our model show a smoother descent, and the training and validation loss curves coincide at the final stage, indicating the good stability of our model. Conversely, for SH-Net and SH-CNN, the obvious gaps between the training and validation loss curves at the final epochs indicate that those models are not fully optimized and exhibit overfitting.
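The parameter counts and prediction times of Fig. 7 can be measured with standard PyTorch utilities, as sketched below (FLOP counting typically relies on an external profiler and is omitted here). The batch shape corresponds to the 224 × 224 single-channel input described in Section 3.2; the warm-up and run counts are illustrative choices.

```python
import time
import torch

def count_parameters(model):
    """Total number of trainable parameters (cf. Fig. 7(a))."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

@torch.no_grad()
def prediction_time_ms(model, device="cuda", n_runs=100):
    """Average single-frame prediction time in milliseconds (cf. Fig. 7(d))."""
    model = model.to(device).eval()
    x = torch.randn(1, 1, 224, 224, device=device)
    for _ in range(10):                      # warm-up iterations
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(n_runs):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - t0) / n_runs * 1e3
```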

Fig. 7. Performance comparison of four approaches on several aspects. (a) The number of training parameters. (b) The floating-point operations needed to reconstruct the wavefront. (c) The training time needed for each epoch. (d) The prediction time. (e) The training and validation loss curves during the training procedure; from left to right: modal algorithm, SH-CNN, SH-Net, and SwinUnet.

The statistical results provided above demonstrate the performance of our method for wavefront detection at a fixed atmospheric coherence length (r0 = 12 cm). To test the generalizability of our method to turbulence strength, an extra 600 data sets under different r0 are generated, and the statistical residual wavefront RMSEs are presented in Fig. 8(a). It is obvious that even when the distorted wavefronts exceed the training wavefront range, our method still offers an acceptable estimation. Compared to the other three methods, our algorithm has the lowest RMSEs as the turbulence strength increases (r0 decreasing from 12 to 6 cm; note that the minimum r0 considered here is 6 cm due to the dynamic range limitation of the simulated PWS). On the other hand, when the turbulence strength decreases (r0 increasing from 12 to 18 cm), our model obtains results competitive with SH-Net and performs better than the modal and SH-CNN approaches. Therefore, stable performance is obtained with our model within the dynamic range of the current PWS, proving the generalizability of our method to turbulence strength.

Fig. 8. Performance comparison of four approaches under various turbulence strengths and signal levels. Each point denotes a statistical result over 100 test data sets. (a) The statistical residual wavefront RMSEs under different atmospheric coherence lengths r0. (b) The statistical residual wavefront RMSEs as the signal level changes.

Moreover, since the SwinUnet is trained at a fixed luminous flux level (photon number equal to 10^7), it is interesting to explore the performance of our model under varying signal levels. Figure 8(b) gives the statistical residual wavefront RMSEs. It is obvious that the performance of the modal approach degrades significantly as the photon number decreases, which results from the increased centroid estimation error. Compared to the slope-based method, the deep-learning-based methods show much better robustness to noise; in particular, the proposed SwinUnet model achieves the lowest RMSEs at all signal levels and performs much better than the other two methods at the lowest signal level (total photon number of 4 × 10^6), demonstrating the good generalizability of our method to the photon budget.

Finally, as mentioned previously, the plenoptic image is transformed into a Hartmann-gram-like pattern before being input to the deep learning models. In fact, we find this to be a very important step for the deep-learning-based methods. In Fig. 9, we compare the statistical residual wavefront RMSEs of the SwinUnet, SH-Net, and SH-CNN models with and without the transformation. It is obvious that the transformation improves the performance of all methods. To study how the transformation affects the image intensity distribution and the learning procedure of the neural network models, we present the plenoptic image and the transformed image in Fig. 10(a) and (b), respectively. Comparing Fig. 10(a) with (b), it is obvious that after the transformation the recomposed image is similar to a typical Hartmann-gram and has a more uniform distribution of intensity features (spot patterns are distributed over almost all areas of the image), while the original plenoptic image has large dark areas in the background. Since the success of deep learning models relies on exploiting long-range dependencies and large receptive fields [22], the transformed image helps the models extract more useful features and thereby improves the wavefront reconstruction accuracy. Conversely, for the plenoptic image, little or no information can be extracted from the background, leading to poor performance of the deep learning models. Therefore, we suggest that adopting an appropriate transformation to make image features uniformly distributed helps enhance the fitting capability of deep learning models and obtain better solutions. To some degree, this finding is also consistent with Zhang's work [28], which applies conformal mapping to pre-process the circular features of PSF images to achieve better focal-plane wavefront sensing.

Fig. 9. The RMSEs of the deep-learning-based models with/without the transformation.

Fig. 10. The original plenoptic image (a) and the recomposed image (b) after transformation.

5. Conclusion

In conclusion, a deep-learning-based method built on the SwinUnet model was proposed to directly reconstruct the wavefront from the plenoptic image of the PWS. To the best of our knowledge, this is the first time a deep-learning-based method has been proposed to directly restore the phase map from a plenoptic image for the PWS. The results showed that the proposed method provides wavefront detection with a lower RMSE (0.0602λ) than the Marechal criterion (1/14λ). As far as we know, this is the best result published for PWS wavefront detection. Additionally, the SwinUnet model performed better than state-of-the-art deep-learning-based methods. In short, our method achieves a good tradeoff between accuracy and speed in wavefront detection compared to the other three methods. Lastly, we also found that adopting an appropriate transformation to make image features uniformly distributed, both in PWS and focal-plane wavefront sensing, helps enhance the fitting capability of deep learning models and obtain better solutions. This paper successfully demonstrated the capability of deep learning techniques to solve the nonlinearity problem of the PWS. In addition, it proved the potential of introducing attention mechanisms into the wavefront sensing area, and we believe better results could be obtained by combining the transformer architecture with the traditional deep-learning-based SHWS.

In our follow-up work, we will focus on establishing a PWS-based experimental setup in the laboratory and validating the performance of the SwinUnet on this setup. In a real optical experiment, various error sources, such as the non-uniformity of the light source, the static aberrations of the optical setup, and assembly errors, will affect the performance of our model. However, we believe that fine-tuning is a useful tool to tackle this problem, since these errors are relatively static and the SwinUnet has displayed good robustness to noise, as shown in Fig. 8(b). Moreover, for further developments, transfer learning could help the SwinUnet perform on different configurations of PWSs or SHWSs.

Funding

National Natural Science Foundation of China (62105337, 12073031); National Key Research and Development Program of China (2021YFF0700700); Scientific Instrument Developing Project of the Chinese Academy of Sciences (ZDKYYQ20200005); Strategic Priority Research Program of the Chinese Academy of Sciences (XDA25020316).

Acknowledgments

We thank Dr. Zhang Yongfeng for useful suggestions when preparing this paper.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

Supplemental document

See Supplement 1 for supporting content.

References

1. R. M. Clare and R. G. Lane, “Phase retrieval from subdivision of the focal plane with a lenslet array,” Appl. Opt. 43(20), 4080–4087 (2004). [CrossRef]  

2. C. Wu, J. Ko, and C. C. Davis, “Determining the phase and amplitude distortion of a wavefront using a plenoptic sensor,” J. Opt. Soc. Am. A 32(5), 964–978 (2015). [CrossRef]  

3. X. Chen, W. Xie, H. Ma, J. Chu, B. Qi, G. Ren, X. Sun, and F. Chen, “Wavefront measurement method based on improved light field camera,” Results Phys. 17, 103007 (2020). [CrossRef]  

4. J. Ko, C. Wu, and C. C. Davis, “An adaptive optics approach for laser beam correction in turbulence utilizing a modified plenoptic camera,” Proc. SPIE 9614, 96140I (2015). [CrossRef]  

5. J. Hu, T. Chen, X. Lin, L. Wang, Q. An, and Z. Wang, “Improved wavefront reconstruction and correction strategy for adaptive optics system with a plenoptic Sensor,” IEEE Photonics J. 13(4), 1–8 (2021). [CrossRef]  

6. Z. Wang, T. Chen, X. Lin, L. Wang, Q. An, and J. Hu, "A local threshold checkerboard algorithm for adaptive optics system with a plenoptic sensor," IEEE Photonics J. 14(1), 1–9 (2022). [CrossRef]

7. J. Wu, Y. Guo, C. Deng, A. Zhang, H. Qiao, Z. Lu, J. Xie, L. Fang, and Q. Dai, “An integrated imaging sensor for aberration-corrected 3D photography,” Nature 612(7938), 62–71 (2022). [CrossRef]  

8. Z. Li and X. Li, “Centroid computation for Shack-Hartmann wavefront sensor in extreme situations based on artificial neural networks,” Opt. Express 26(24), 31675–31692 (2018). [CrossRef]  

9. S. L. Suárez Gómez, C. González-Gutiérrez, E. Díez Alonso, J. D. Santos Rodríguez, M. L. Sánchez Rodríguez, J. Carballido Landeira, A. Basden, and J. Osborn, “Improving adaptive optics reconstructions with a deep learning approach,” in International Conference on Hybrid Artificial Intelligent Systems, R. Nugent, ed. (Springer, 2018), pp. 74–83.

10. M. Zhao, W. Zhao, S. Wang, P. Yang, K. Yang, H. Lin, and L. Kong, “Centroid-Predicted Deep Neural Network in Shack-Hartmann Sensors,” IEEE Photonics J. 14(1), 1–10 (2022). [CrossRef]  

11. R. Swanson, M. Lamb, C. Correia, S. Sivanandam, and K. Kutulakos, “Wavefront reconstruction and prediction with convolutional neural networks,” Proc. SPIE 10703, 52 (2018). [CrossRef]  

12. S. Ceruso, S. Bonaque-Gonzalez, A. Pareja-Rios, D. Carmona-Ballester, and J. Trujillo-Sevilla, “Reconstructing wavefront phase from measurements of its slope, an adaptive neural network based approach,” Opt. Laser. Eng 126, 105906 (2020). [CrossRef]  

13. P. Jia, M. Ma, D. Cai, W. Wang, J. Li, and C. Li, “Compressive Shack–Hartmann wavefront sensor based on deep neural networks,” Mon. Not. R. Astron. Soc. 503(3), 3194–3203 (2021). [CrossRef]  

14. T. B. DuBose, D. F. Gardner, and A. T. Watnik, “Intensity-enhanced deep network wavefront reconstruction in Shack–Hartmann sensors,” Opt. Lett. 45(7), 1699–1702 (2020). [CrossRef]  

15. L. Hu, S. Hu, W. Gong, and K. Si, “Learning-based Shack-Hartmann wavefront sensor for high-order aberration detection,” Opt. Express 27(23), 33504–33517 (2019). [CrossRef]  

16. L. Hu, S. Hu, W. Gong, and K. Si, “Deep learning assisted Shack–Hartmann wavefront sensor for direct wavefront detection,” Opt. Lett. 45(13), 3741–3744 (2020). [CrossRef]  

17. Y. He, Z. Liu, Y. Ning, J. Li, X. Xu, and Z. Jiang, “Deep learning wavefront sensing method for Shack-Hartmann sensors with sparse sub-apertures,” Opt. Express 29(11), 17669–17682 (2021). [CrossRef]  

18. Y. Guo, Y. Wu, Y. Li, X. Rao, and C. Rao, “Deep phase retrieval for astronomical Shack–Hartmann wavefront sensors,” Mon. Not. R. Astron. Soc. 510(3), 4347–4354 (2022). [CrossRef]  

19. Y. Guo, L. Zhong, L. Min, J. Wang, Y. Wu, K. Chen, K. Wei, and C. Rao, “Adaptive optics based on machine learning: a review,” Opto-Electron. Adv. 5(7), 200082 (2022). [CrossRef]  

20. H. Chen, L. Wei, Y. He, J. Yang, X. Li, L. Li, L. Huang, and K. Wei, “Deep learning assisted plenoptic wavefront sensor for direct wavefront detection,” Opt. Express 31(2), 2989–3004 (2023). [CrossRef]  

21. H. Cao, Y. Wang, J. Chen, D. Jiang, X. Zhang, Q. Tian, and M. Wang, “Swin-unet: Unet-like pure transformer for medical image segmentation,” arXiv, arXiv:2105.05537v1 (2021).

22. Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, "Swin transformer: Hierarchical vision transformer using shifted windows," in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV, 2021), pp. 10012–10022.

23. H. Chen, Y. Zhang, H. Bao, L. Li, and K. Wei, “Hartmanngram structural information-assisted aberration measurement for a 4-meter-thin primary mirror with a large dynamic range,” Opt. Commun. 524, 128749 (2022). [CrossRef]  

24. N. A. Roddier, "Atmospheric wavefront simulation using Zernike polynomials," Opt. Eng. 29(10), 1174–1180 (1990). [CrossRef]

25. S. W. Paine and J. R. Fienup, “Machine learning for improved image-based wavefront sensing,” Opt. Lett. 43(6), 1235–1238 (2018). [CrossRef]  

26. L. F. Rodríguez-Ramos and Y. Martín, “The Plenoptic Camera as a wavefront sensor for the European Solar Telescope (EST),” Proc. SPIE 7439, 74390I (2009). [CrossRef]  

27. F. Roddier, Adaptive optics in astronomy (Cambridge University Press, 1999), Chap. 2.

28. Y. Zhang, T. Zhou, L. Fang, L. Kong, H. Xie, and Q. Dai, “Conformal convolutional neural network (CCNN) for single-shot sensorless wavefront sensing,” Opt. Express 28(13), 19218–19228 (2020). [CrossRef]  
