
Deep learning wavefront sensing method for Shack-Hartmann sensors with sparse sub-apertures


Abstract

In this letter, we propose a deep learning wavefront sensing approach for Shack-Hartmann sensors (SHWFS) that predicts the wavefront directly from sub-aperture images, without centroid calculation. The method can accurately reconstruct high-spatial-frequency wavefronts with fewer sub-apertures, breaking the limitation of d/r0 ≈ 1 (where d is the sub-aperture diameter and r0 is the atmospheric coherence length) when using an SHWFS to detect atmospheric turbulence. We also use transfer learning to accelerate training, reducing the training time by 98.4% compared with training the network from scratch. Numerical simulations validate the approach; the mean residual wavefront root-mean-square (RMS) error is 0.08λ. The proposed method provides a new direction for detecting atmospheric turbulence with the SHWFS.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Adaptive optics (AO) is a technique for measuring and correcting wavefront aberrations in real time. It has been used in a wide range of applications, including astronomical observation, laser beam cleanup, and biomedical imaging [1–4]. Wavefront sensing plays a vital role in AO: wavefront sensors (WFS) infer the distorted wavefront phase distribution from light intensity measurements. Several wavefront sensors have been proposed, such as the Shack-Hartmann WFS, the plenoptic WFS, the pyramid WFS, the curvature WFS, and image-based WFS [5–9].

The Shack-Hartmann wavefront sensor (SHWFS) is a conventional pupil-plane wavefront sensor; its simple structure and strong adaptability have made it popular in AO systems. In principle, the SHWFS samples the incident wavefront into sub-wavefronts through a microlens array (MLA). Each sampled sub-wavefront is treated as a plane wave, and the original wavefront is reconstructed from the sub-wavefront slopes through zonal [10,11] or modal [12,13] approaches. Therefore, for atmospheric turbulence detection in AO systems, the sub-aperture diameter $d$ is typically set equal to or smaller than the atmospheric coherence length ${r_0}$. In this case, the number of sub-apertures is on the order of $({D/{r_0}})^2$, where $D$ is the telescope aperture diameter [14].

However, since the total collected energy is divided among the sub-apertures, the sub-aperture spots are easily lost in the noise of the CCD detector. The number of sub-apertures therefore limits the faintest detectable targets. When the beacon energy is insufficient, only a sparse microlens array can be used to maintain an adequate image signal-to-noise ratio (SNR). When the wavefront is under-sampled, the higher-order modes within each sub-aperture alter the energy distribution of the focal spots, leading to incorrect slope calculations. In some extreme cases, the sub-aperture centroid displacements are insufficient to represent the wavefront phase distribution. The slope-based wavefront sensing method is therefore sometimes impractical, and an alternative solution is needed. We believe that a deep learning wavefront sensing method based on sub-aperture images is a suitable solution to these problems.

Currently, deep learning is a research hotspot in optics and has already been applied to SHWFS-based wavefront sensing [15]. The essence of deep learning wavefront sensing is to use massive data to establish the relationship between WFS signals and wavefront aberrations. Guo et al. used an artificial neural network to estimate low-order Zernike coefficients from the calculated x and y spot displacements of an SHWFS [16]. Li et al. applied a neural network model to compute centroids under low signal-to-noise ratio conditions [17]. Swanson et al. reconstructed the wavefront from x and y slopes with a modified U-Net [18]. Hu et al. used spot-array images to detect wavefront aberrations with a convolutional neural network model [19,20]. Xu et al. proposed a wavefront reconstruction algorithm for the SHWFS based on an extreme learning machine [21]. This work provided new approaches for wavefront sensing. In most of these approaches, however, the wavefront is assumed to be sufficiently sampled by the MLA, so they cannot handle cases where the number of sub-apertures is insufficient.

Meanwhile, several image-based deep learning wavefront sensors have also been proposed. Paine et al. used a neural network operating on the point spread function to determine an initial estimate of the wavefront [22]. Nishizaki et al. experimentally demonstrated several image-based wavefront sensors that use defocus, scattering, and overexposure to estimate Zernike coefficients directly from single intensity images with Xception networks [23]. Andersen et al. used focused and defocused images to measure atmospheric turbulence, aiming to apply deep learning to sharpen AO-free astronomical images [24,25]. Wu et al. improved on the work of Andersen et al. by simplifying the structure of the neural network [26]. In general, these methods provide real-time, non-iterative alternatives to focal-plane algorithms such as phase retrieval (PR) [27] and phase diversity (PD) [9]. However, they are not sufficiently accurate for atmospheric turbulence detection. Andersen et al. showed that their PD-based deep learning wavefront sensing approach could only effectively estimate the first 39 Zernike terms, with an RMS error between the measurements and the true values of about ${\lambda / 5}$ [24]. We consider that, when the turbulence is strong, the morphology of the focal spots is too complicated for a neural network to establish a functional relationship.

In this paper, we propose and demonstrate a deep learning wavefront sensing algorithm for sparse-sub-aperture SHWFS. Since the algorithm is built on the ResNet architecture [28], we name it SH-ResNet. We treat the focal spot of each sub-aperture as a diffraction spot shaped by its sub-wavefront and apply image-based wavefront sensing to the sub-aperture spots. The proposed method predicts the wavefront directly from sub-aperture images, breaking the limitation of $d/{r_0} \approx 1$ when using an SHWFS to detect atmospheric turbulence. In addition, we use transfer learning to apply the trained SH-ResNet network to an SHWFS with a different number of sub-apertures. Transfer learning reduces the amount of data required for training and accelerates the training process, allowing SH-ResNet to adapt to fluctuations in beacon energy and turbulence intensity. The wavefront sensing performance of the method was evaluated by numerical simulation and compared with several existing approaches. The results show that the method can accurately reconstruct high-spatial-frequency wavefront distortion with fewer sub-apertures.

The remainder of the article is organized as follows: Section 2.1 introduces the dataset generation method, Section 2.2 describes the network structure of SH-ResNet, and Section 2.3 introduces the application of transfer learning in SH-ResNet. In Section 3, the numerical simulation results are compared with the standard modal algorithm to demonstrate the wavefront detection performance of the SH-ResNet model. Finally, we discuss the results and conclude the paper in Section 4.

2. Method

2.1 Simulation setup

To validate the concept of direct wavefront sensing with SH-ResNet, we conducted a numerical simulation study of the wavefront prediction process with MATLAB and Python. The simulation setup is shown in Fig. 1(a), and Fig. 1(b) shows the MLA of the SHWFS. The MLA arrangement in Fig. 1(b) fits the circular aperture well and yields sub-aperture images with consistent features; we suggest that such consistency facilitates the training of the deep network. To balance image SNR and detection accuracy, the turbulence parameters were set to $d/{r_0} = 2\sim4$ (corresponding to $D/{r_0} = 6\sim12$). This ratio $(d/{r_0})$ is maintained across different SHWFS configurations to keep the statistical features of the focal spots consistent, which is valuable for transfer learning.

Fig. 1. (a) Simulation setup for deep learning wavefront detection. (b) MLA of the SHWFS.

Table 1 shows the key parameters of the numerical simulation. The simulated dataset includes the atmospheric phase screens, the best-fit Zernike coefficients, and the sub-aperture images. Each phase screen was generated by the "power spectrum method" [29]. The modified von Kármán spatial phase power spectrum is:

$$\Phi_\phi^{mvK}(f) = 0.0229\, r_0^{-5/3}\cdot\frac{\exp({-f^2/f_m^2})}{({f^2 + f_0^2})^{11/6}},$$
then a phase screen sample can be generated by:
$$\varphi(\vec{r}) = \mathcal{F}^{-1}\!\left[\sqrt{\Phi_\phi(\vec{f})}\, H(\vec{f})\right],$$
where $\vec{f}$ is the spatial frequency vector, $\Phi_\phi(\vec{f})$ is the power spectrum function, $f_m$ and $f_0$ are the spatial frequencies associated with the inner and outer scales of the turbulence, $\varphi(\vec{r})$ is the phase screen, $\mathcal{F}^{-1}$ denotes the inverse Fourier transform, and $H(\vec{f})$ is a random complex function with unit amplitude and phase uniformly distributed between 0 and 2π.
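For concreteness, the following is a minimal NumPy sketch of this phase screen generator in the style of Schmidt [29] (no subharmonic compensation for the lowest frequencies; the function name and the default inner/outer scales are our illustrative assumptions):

```python
import numpy as np

def phase_screen(N, dx, r0, L0=100.0, l0=0.01, rng=None):
    """One modified von Karman phase screen (rad) via the power spectrum
    method, Eqs. (1)-(2). N: grid size (px), dx: grid spacing (m),
    r0: Fried parameter (m), L0 / l0: outer / inner scale (m)."""
    rng = np.random.default_rng() if rng is None else rng
    df = 1.0 / (N * dx)                          # frequency grid spacing (1/m)
    f1d = (np.arange(N) - N // 2) * df
    fx, fy = np.meshgrid(f1d, f1d)
    f = np.hypot(fx, fy)
    fm = 5.92 / (2 * np.pi * l0)                 # inner-scale frequency
    f0 = 1.0 / L0                                # outer-scale frequency
    # Modified von Karman phase PSD, Eq. (1)
    psd = 0.0229 * r0**(-5 / 3) * np.exp(-(f / fm)**2) / (f**2 + f0**2)**(11 / 6)
    psd[N // 2, N // 2] = 0.0                    # remove the piston (f = 0) term
    # Complex Gaussian spectrum filtered by sqrt(PSD), then inverted, Eq. (2)
    cn = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) \
         * np.sqrt(psd) * df
    return np.real(np.fft.ifftshift(np.fft.ifft2(np.fft.ifftshift(cn))) * N**2)
```

For example, `phase_screen(256, 0.01, r0=0.1)` draws one screen on a 2.56 m grid.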

Table 1. Key Parameters of the Simulation

To find the best-fit Zernike coefficients for a data set, we first sampled the Zernike polynomials on a circular aperture following the numbering scheme of Noll [30]. The coefficients are then obtained by least-squares fitting:

$$w = \sum\limits_{i = 1}^{{n_z}} {{a_i}{v_i}} + r,$$
where $w$ is the total wavefront, ${v_i}$ is a normalized Zernike vector, ${a_i}$ is a Zernike coefficient, $r$ is the vector of residuals, and ${n_z}$ is the number of Zernike modes included in the approximation. Once a phase screen has been generated, the Fresnel diffraction algorithm is used to obtain the sub-aperture images [29]. Photon noise and detector readout noise were also added to enhance the robustness of the network. For simplicity, only monochromatic light was considered, and fluctuations of light intensity within the pupil were ignored. Figure 2 shows a random set of the simulated data: Fig. 2(a) shows the sub-aperture images and Fig. 2(b) the best-fit Zernike coefficients.
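As an illustration, the least-squares fit of Eq. (3) takes only a few lines of NumPy; here `basis` is assumed to hold the Noll-ordered Zernike polynomials precomputed on the simulation grid (the shapes and names below are ours, not the paper's):

```python
import numpy as np

def fit_zernike(screen, mask, basis):
    """Best-fit Zernike coefficients of a phase screen, Eq. (3).

    screen : (N, N) phase screen (rad)
    mask   : (N, N) boolean circular-aperture mask
    basis  : (n_z, N, N) Noll-ordered Zernike modes sampled on the grid
    Returns the coefficient vector a and the residual vector r."""
    V = basis[:, mask].T                 # design matrix: pixels x n_z
    w = screen[mask]                     # total wavefront, vectorized
    a, *_ = np.linalg.lstsq(V, w, rcond=None)
    r = w - V @ a                        # residual after the Zernike fit
    return a, r
```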

Fig. 2. An example of the simulated dataset. (a) The sub-aperture images of the SHWFS. (b) The best-fit Zernike coefficients of the turbulence phase screen; the turbulence phase screen and the best-fit Zernike phase screen are shown as insets.

2.2 Neural network architecture

Because sub-aperture images are used as input, we apply a convolutional neural network (CNN), which handles image data with a reasonable number of network parameters. We selected ResNet and adapted it to our problem. ResNet was proposed in 2015 and achieved excellent results in image classification tasks [28]. Deep networks are often challenging to train because of the "vanishing gradient problem": during backpropagation, the parameters of the earliest layers receive vanishingly small updates. ResNet addresses this with skip connections, which accelerate training, help the parameters reach a better optimum, and improve accuracy. ResNet can be configured with various depths (i.e., numbers of layers), generally between 20 and 164. As a compromise between computation time and accuracy, we opted for 50 layers (ResNet50).

The network diagram of SH-ResNet is displayed in Fig. 3. Seven $64 \times 64$ sub-aperture images are fed sequentially into the ResNet50 backbone for feature extraction, yielding seven $2048 \times 1 \times 1$ feature vectors. We suggest that the statistical features of sub-aperture images under atmospheric turbulence are mainly determined by $d/{r_0}$, and that the data distributions of different sub-aperture images are similar; the features of all sub-aperture images can therefore be extracted with an identical network structure and shared parameters. The feature vectors are concatenated into a $2048 \times 7 \times 1$ feature map, reduced to $2048 \times 1 \times 1$ by a grouped convolution layer (kernel size: $7 \times 1$, groups: 512), and passed through fully connected layers to output 88 Zernike coefficients (the first 91 Zernike polynomials with piston, tip, and tilt removed).
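The following PyTorch sketch shows one way to realize this architecture; the class and layer names are ours, and details such as the single-channel input stem are assumptions rather than the exact configuration used in the paper:

```python
import torch
import torch.nn as nn
import torchvision.models as models

class SHResNet(nn.Module):
    """Sketch of SH-ResNet for 7 sub-apertures: a shared ResNet50 backbone
    extracts one 2048-d vector per sub-aperture image, a grouped convolution
    fuses them, and a fully connected head outputs 88 Zernike coefficients."""

    def __init__(self, n_sub=7, n_zern=88):
        super().__init__()
        backbone = models.resnet50(weights=None)
        # Single-channel stem for grayscale sub-aperture images (assumption)
        backbone.conv1 = nn.Conv2d(1, 64, 7, stride=2, padding=3, bias=False)
        self.features = nn.Sequential(*list(backbone.children())[:-1])  # drop FC
        # Grouped convolution over the sub-aperture axis (kernel 7x1, 512 groups)
        self.fuse = nn.Conv1d(2048, 2048, kernel_size=n_sub, groups=512)
        self.head = nn.Sequential(nn.LeakyReLU(), nn.Dropout(0.1),
                                  nn.Linear(2048, n_zern))

    def forward(self, x):                       # x: (B, n_sub, 1, 64, 64)
        B, n_sub = x.shape[:2]
        f = self.features(x.flatten(0, 1))      # shared weights over sub-images
        f = f.view(B, n_sub, 2048).permute(0, 2, 1)   # (B, 2048, n_sub)
        f = self.fuse(f).squeeze(-1)            # (B, 2048)
        return self.head(f)                     # (B, n_zern)
```

For instance, `SHResNet()(torch.randn(4, 7, 1, 64, 64))` returns a `(4, 88)` tensor of Zernike coefficients.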

Fig. 3. The neural network structure of SH-ResNet. The cyan, orange, blue and green squares represent BottleNeck modules with different parameters; according to the ResNet50 network structure, the number of each module is 3, 4, 6 and 3, respectively (only two of each are shown in the diagram).

Optimizing a CNN requires a loss function and a solver; the solver optimizes the network parameters by minimizing the loss. We chose the Adam (adaptive momentum) solver [31], which smooths gradients and generates adaptive learning rates for different parameters during training. The RMS of the difference between the measured and true coefficients was adopted as the loss function, which corresponds to minimizing the residual wavefront RMS after correction. During training, we scaled the input images to pixel values in the range [0, 1] and used dropout to reduce overfitting. The other hyperparameters were set as follows: (i) batch size: 100; (ii) number of epochs: 8; (iii) initial learning rate: 0.0001; (iv) activation function: LeakyReLU; (v) dropout probability: 0.1.
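Assembled, a single training step might look like the sketch below (data loading and the epoch loop are omitted; for an orthonormal Zernike basis this loss is proportional to the residual wavefront RMS):

```python
import torch

def rms_loss(pred, true):
    # RMS of the Zernike-coefficient error; for a normalized basis this
    # tracks the residual wavefront RMS after correction.
    return torch.sqrt(torch.mean((pred - true) ** 2))

model = SHResNet()                          # from the sketch above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(images, coeffs):
    """One Adam step: images (B, 7, 1, 64, 64), coeffs (B, 88)."""
    model.train()                           # dropout active during training
    optimizer.zero_grad()
    loss = rms_loss(model(images), coeffs)
    loss.backward()
    optimizer.step()
    return loss.item()
```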

2.3 Transfer learning in SH-ResNet

In astronomical adaptive optics, the atmospheric coherence length ${r_0}$ and the beacon energy vary in real time, so the number of sub-apertures in the SHWFS must be adjusted accordingly. Typically, when the number of sub-apertures changes, the neural network model must be retrained from scratch, wasting time and data. Here, we use transfer learning to apply the trained network to an SHWFS with a different number of sub-apertures, accelerating the training process significantly.

Transfer learning exploits similarities between data, tasks, or models to apply a model learned in an old domain to a new one [32]. With the development of deep learning, more and more researchers have focused on deep transfer learning, which reuses part of a pre-trained network in a new network for the target domain. Analyzing a neural network through its hierarchical structure, the earlier layers of a CNN learn generic features (e.g., edge detectors) useful for many tasks, while later layers learn more task-specific features. Therefore, there is no need to train a network from scratch for a new task; instead, the previously trained model can be adapted to the new scheme. This method is termed fine-tuning and is an essential concept in deep transfer learning. Fine-tuning has significant advantages, including savings in time and data, better generalization, and simplicity of implementation [33].

Fine-tuning can also be applied to SH-ResNet for wavefront sensing; we illustrate it with an example. Figure 4 shows the principle and process of fine-tuning. Suppose we have trained the SH-ResNet of Section 2.2, calling it SH-ResNetA, and now wish to train another model, SH-ResNetB, for the SHWFS shown in Fig. 5. Instead of repeating the process of Sections 2.1 and 2.2, we replace the grouped convolution layer and the fully connected layer of SH-ResNetA with new, randomly initialized layers sized for the number of sub-apertures in Fig. 5. The rest of SH-ResNetA is then fixed as a feature extractor whose parameters do not change during training. Note that when the sub-aperture number of the SHWFS is modified, only the number of input images to SH-ResNet differs; the statistical features of the sub-aperture images are maintained, so we believe the feature extractor remains effective. The new parameters are then optimized by backpropagation. Because the number of trainable parameters is small, both the required dataset size and the training time are significantly reduced.
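A minimal sketch of this procedure, reusing the `SHResNet` class from Section 2.2 (our naming; the paper does not prescribe an implementation):

```python
import torch
import torch.nn as nn

def fine_tune(model_a, n_sub_new=19, n_zern=88, lr=1e-4):
    """Adapt a trained SH-ResNetA to a new sub-aperture count.

    The shared ResNet50 feature extractor is frozen; only the freshly
    initialized fusion and output layers are optimized."""
    for p in model_a.features.parameters():
        p.requires_grad = False                          # fixed feature extractor
    # New randomly initialized layers sized for the 19-sub-aperture SHWFS
    model_a.fuse = nn.Conv1d(2048, 2048, kernel_size=n_sub_new, groups=512)
    model_a.head = nn.Sequential(nn.LeakyReLU(), nn.Dropout(0.1),
                                 nn.Linear(2048, n_zern))
    trainable = [p for p in model_a.parameters() if p.requires_grad]
    return torch.optim.Adam(trainable, lr=lr)            # trains new layers only
```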

Fig. 4. Principle of fine-tuning for SH-ResNet.

Fig. 5. MLA of the SHWFS with 19 sub-apertures.

3. Numerical simulation results

3.1 Deep learning results

For network training, 200,000 data sets were generated, 1% of which were used for validation. The network model was implemented with the PyTorch framework on Python 3.8 and trained on a cloud server (Intel Xeon Gold 6278C CPU, NVIDIA Tesla T4 GPU with 16 GB VRAM). The network took about 5 h in total to converge.

Figure 6 shows the mean RMS error as a function of training time for the training and validation datasets. As training proceeds, the mean RMS error of both sets continues to decrease and then gradually converges. No overfitting is observed, indicating that the functional relationship between the sub-aperture images and the wavefront distortion is correctly fitted. It can also be noticed that the RMS error of the validation set is smaller than that of the training set. We consider this reasonable because dropout is active during training: dropout deactivates part of the network parameters during the forward pass, reducing the effective number of connections and thereby suppressing overfitting.

Fig. 6. Training curves of SH-ResNet.

After training, we generated a test set of 1000 data sets to validate the proposed method. We quantitatively evaluate the wavefront detection performance of SH-ResNet by the RMS of the residual wavefront and the Strehl ratio (SR) of the corrected point spread function (PSF). To compare SH-ResNet with existing wavefront sensing approaches, we reproduced the PD-based neural network of Andersen et al. [25]: a set of simulated PSF pairs and Zernike coefficients was generated to train a ResNet50 network, and for convenience of analysis only monochromatic light was considered. In addition, the measurements of a $12 \times 12$ SHWFS reconstructed by the modal algorithm were used as a second comparison.
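For reference, both metrics can be computed as in the following NumPy sketch (the function names, the zero-padding factor, and the use of the PSF peak as a proxy for the on-axis intensity are our assumptions):

```python
import numpy as np

def residual_rms_waves(true_phase, est_phase, mask):
    """Piston-removed RMS of the residual wavefront over the aperture, in waves."""
    res = (true_phase - est_phase)[mask]
    return np.std(res) / (2 * np.pi)               # rad -> waves

def strehl_ratio(residual_phase, mask, pad=4):
    """Strehl ratio of the corrected PSF: peak of the residual-aberrated PSF
    over the diffraction-limited peak (far field via zero-padded FFT)."""
    N = residual_phase.shape[0]
    def psf(phase):
        pupil = np.zeros((pad * N, pad * N), complex)
        pupil[:N, :N] = mask * np.exp(1j * phase)  # aperture field
        return np.abs(np.fft.fft2(pupil)) ** 2
    return psf(residual_phase).max() / psf(np.zeros((N, N))).max()
```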

Figures 7 and 8 compare the three approaches on two sets of test data, illustrating the characteristics of each method. Figures 7(a) and 7(d) show the simulated phase screen, the output wavefronts, and the corresponding residuals. The residual wavefront RMS values for SH-ResNet, the PD-based algorithm, and the modal approach are $0.082\lambda$, $0.17\lambda$, and $0.074\lambda$ in Fig. 7(a), and $0.046\lambda$, $0.089\lambda$, and $0.054\lambda$ in Fig. 7(d). The PSFs before and after compensation of the turbulence phase screen are shown in Figs. 7(b) and 7(e); the SRs of the corrected PSFs are 0.68 (SH-ResNet), 0.34 (PD), and 0.71 (modal) in Fig. 7(b) and 0.98 (SH-ResNet), 0.71 (PD), and 0.87 (modal) in Fig. 7(e). The corresponding central intensity profiles of these PSFs are shown in Figs. 7(c) and 7(f); the curves are taken along the x direction through the brightest pixel of each PSF. Figure 8 shows the original SH-ResNet input image for Fig. 7(a) and the estimated Zernike coefficients compared with the true values.

Fig. 7. Comparison of wavefront detection results for the three approaches. (a) and (d) Simulated turbulence phase screen, the output wavefronts and their residuals. (b) and (e) The PSFs before and after compensation with SH-ResNet, PD approach and modal approach, respectively. (c) and (f) Comparison of central intensity profiles of PSFs in (b) and (e).

Fig. 8. (a) The original Shack-Hartmann image of SH-ResNet in Fig. 7(a). (b) Comparison of estimated Zernike coefficients with the true values.

In Fig. 8(a), the high-order aberrations in the sub-wavefronts cause apparent deformation of the spots, making the sub-aperture centroid displacements insufficient to represent the wavefront phase distribution. However, since SH-ResNet extracts features directly from the sub-aperture images, wavefront aberrations can be accurately detected even when the sub-aperture spots are irregular. The comparison in Fig. 7 shows that the detection result of SH-ResNet matches the true wavefront better than that of the PD-based deep learning method: its residual wavefront RMS is lower, indicating better compensation of wavefront aberrations and a higher Strehl ratio of the focal spots. We suggest that segmenting the wavefront reduces the complexity of the sub-aperture spots, which facilitates the construction of functional relationships by the neural network. Compared with the modal approach, the measurement accuracy of SH-ResNet is approximately equivalent to that obtained with the wavefront sufficiently sampled, but the number of sub-apertures required drops from 102 (only 102 of the 144 sub-apertures are fully illuminated) to 7, approximately 7% of the original number, significantly improving the energy utilization.

The statistics of the 1000 test datasets for the three approaches are displayed in Fig. 9. Figure 9(a) shows the RMS of the residual wavefronts as a function of the turbulence parameter $D/{r_0}$ for the 1000 phase screens; each point represents the residual wavefront RMS after subtracting the estimated Zernike phase screen from the atmospheric phase screen. The mean residual wavefront RMS values are $0.0806\lambda$ (SH-ResNet), $0.1517\lambda$ (PD), and $0.0767\lambda$ (modal). The SR of the corrected PSF as a function of $D/{r_0}$ is shown in Fig. 9(b). Table 2 summarizes the three metrics (RMS, MAE, SR) for each method, where the mean absolute error (MAE) is the average absolute difference between the estimated Zernike coefficients and the true best-fit values.

Fig. 9. The detection results of 1000 wavefronts for the three approaches. (a) The RMS of the residual wavefronts as a function of the turbulence parameter $D/{r_0}$. (b) The Strehl ratio (SR) of the corrected PSFs as a function of $D/{r_0}$.

Table 2. Comparison of Metrics of the Three Approaches

To further verify the prediction performance of the proposed method, the residuals of the test data sets were analyzed statistically; the results are shown in Fig. 10. Figure 10(a) displays the RMS of the best-fit Zernike coefficients and of the residual Zernike coefficients, both averaged over 1000 data sets. Figures 10(b) and 10(c) show histograms of the residual phase screens and the residual Zernike coefficients, together with fitted probability distribution functions. The correlation coefficient matrix of the Zernike coefficient estimation errors is presented in Fig. 10(d). As expected, the residuals after correction are clearly lower than the original Zernike amplitudes, especially for the lower-order aberrations with large amplitudes; this indicates that the proposed algorithm is capable of detecting high-order Zernike modes. Furthermore, both the residual phase screens and the residual Zernike coefficients follow a zero-mean Gaussian distribution, so the residuals can be regarded as white noise or a similar random process, suggesting that the model is an unbiased estimator of the true values. Finally, the correlation coefficient matrix indicates whether the output dimensions are correlated with each other: most Zernike modes are nearly independent, but a few specific mode pairs have high correlation coefficients, which we consider reasonable given the similarity between those Zernike modes.

Fig. 10. Statistical characteristics of the residuals in the test data sets. (a) The RMS of the best-fit Zernikes compared with the RMS of the residual Zernikes; the y-axis is logarithmic. (b) and (c) Distribution histograms of the residual phase screens and the residual Zernike coefficients, with fitted probability distribution functions. (d) The correlation coefficient matrix of the Zernike coefficient estimation errors.

From the above, we conclude that the numerical simulation results are credible. SH-ResNet can accurately reconstruct high-spatial-frequency wavefronts with fewer sub-apertures, and it outperforms PD-based deep learning wavefront sensing in accuracy. These characteristics give SH-ResNet a clear advantage for wavefront detection at low SNR.

3.2 Transfer learning results

Once the SH-ResNet model has been trained, transfer learning can be used to apply it to an SHWFS with a different number of sub-apertures. Here, we fine-tuned the SH-ResNet trained in Section 3.1 for the 19-sub-aperture SHWFS following the procedure of Section 2.3. A total of 20,000 data sets were generated for training and validation. The turbulence parameter was adjusted to $D/{r_0} = 10\sim20$ to keep the ratio $d/{r_0}$ invariant, and the remaining parameters were kept consistent with Table 1. Since the number of trainable parameters is greatly reduced, the entire training process takes only about 5 minutes. A comparison of the training processes for transfer learning and deep learning is shown in Fig. 11, where the x-axis represents the training time and the y-axis the loss function. Other parameters related to the training process are summarized in Table 3. The comparison demonstrates that transfer learning trains 98.4% faster than deep learning from scratch.

Fig. 11. The comparison of the training process for deep learning and transfer learning. The plot contains two sets of coordinate systems (red and blue) that represent the trend of the loss function with training time (unit: s) for deep learning and transfer learning, respectively.

Table 3. Comparison of Training Parameters Between Deep Learning and Transfer Learning

To evaluate the wavefront detection performance of the fine-tuned SH-ResNet, an additional 1000 test data sets were generated after training was completed. Figure 12 shows two random sets from the test dataset. The simulated phase screen, the network output, and the wavefront residuals are shown sequentially in Fig. 12(a). Figure 12(b) shows the PSFs before and after correction and the corresponding central intensity profiles. The residual wavefront RMS and corrected-PSF SR for the two data sets are $0.113\lambda$ / 0.4916 and $0.091\lambda$ / 0.6374, respectively. Figures 12(c) and 12(d) compare the estimated and true Zernike coefficients; the differences in the coefficients measured by SH-ResNet are small, especially for the low-order modes with large amplitudes. From these diagrams, we conclude that the SH-ResNet network trained by fine-tuning provides accurate detection and good compensation of atmospheric turbulence with $D/{r_0} \approx 20$.

Fig. 12. Wavefront detection results of transfer learning approach. (a) The turbulence phase screens, output wavefronts and the corresponding residuals. (b) The PSFs before and after compensation and central intensity profile curves. (c) and (d) Comparison of estimated Zernike coefficients with the true values.

Figure 13 shows the reconstruction results of 1000 random wavefronts: the two subplots show the RMS of the residual wavefronts and the SR of the corrected PSFs as functions of the turbulence parameter $D/{r_0}$. The distribution of the transfer learning results is approximately equivalent to that of the deep learning results. The mean RMS values are $0.0979\lambda$ (SH-ResNet TL), $0.092\lambda$ (SH-ResNet DL), and $0.0958\lambda$ (modal), and the mean SRs are 0.5911 (SH-ResNet TL), 0.6291 (SH-ResNet DL), and 0.5844 (modal); the different training method does not cost measurement accuracy. Moreover, the variance of the SH-ResNet results is smaller than that of the standard modal approach, indicating that SH-ResNet is more robust than the standard modal algorithm; we attribute this to the noise added to the training dataset, which improves the robustness of the neural network and stabilizes the detection results. In summary, fine-tuning is an effective method for training SH-ResNet models: the feature extractor efficiently captures the characteristics of the sub-aperture images, an accurate model is obtained after training, and the training time is significantly reduced, making the otherwise complex training process easy to implement. These properties support the application of the SH-ResNet-based wavefront detection method in actual adaptive optics systems.

Fig. 13. The reconstruction results of 1000 random turbulence wavefronts. (a) The RMS of residual wavefronts as a function of the turbulence parameter $D/{r_0}$. (b) The SR of the PSFs after compensation as a function of $D/{r_0}$.

4. Conclusions

In this article, we proposed and implemented a deep learning wavefront sensing method named SH-ResNet. SH-ResNet estimates the Zernike coefficients of the wavefront from irregular sub-aperture focal spots, breaking the limitation of $d/{r_0} \approx 1$ when using an SHWFS to measure atmospheric turbulence. Numerical simulation experiments quantitatively evaluated the wavefront detection performance of SH-ResNet: the average residual phase RMS and the Strehl ratio of the corrected PSFs are $0.08\lambda$ and 0.72, respectively. These results were obtained with under-sampled wavefronts, requiring only 7% of the sub-apertures of the standard modal approach. This improvement enables adaptive optics applications with low spatial sampling frequency to obtain adequate SNR or high frame rates.

In addition, we used fine-tuning to transfer the trained SH-ResNet to an SHWFS with a different MLA, saving 98.4% of the training time compared with training from scratch. For turbulence phase screens with $D/{r_0} = 10\sim20$, the mean residual wavefront RMS is $0.098\lambda$. In conclusion, the proposed method is capable of detecting atmospheric turbulence with fewer sub-apertures; it is accurate and portable, and well suited to atmosphere-related problems in adaptive optics. As a next step, we intend to apply the proposed algorithm to the closed-loop control of AO systems.

Funding

Young Scientists Fund (61902419).

Acknowledgments

The authors thank Jialong Peng and Jinmei Yao for their help in revising the manuscript.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. M. J. Booth, “Adaptive optical microscopy: the ongoing quest for a perfect image,” Light: Sci. Appl. 3(4), e165 (2014). [CrossRef]  

2. T. Li, L. Huang, and M. Gong, “Wavefront sensing for a nonuniform intensity laser beam by Shack–Hartmann sensor with modified Fourier domain centroiding,” Opt. Eng. 53(4), 044101 (2014). [CrossRef]  

3. P. L. Wizinowich, D. Le Mignant, A. H. Bouchez, R. D. Campbell, J. C. Chin, A. R. Contos, M. A. van Dam, S. K. Hartman, E. M. Johansson, and R. E. Lafon, “The WM Keck Observatory laser guide star adaptive optics system: overview,” Publ. Astron. Soc. Pac. 118(840), 297–309 (2006). [CrossRef]  

4. M. Shaw, K. O’Holleran, and C. Paterson, “Investigation of the confocal wavefront sensor and its application to biological microscopy,” Opt. Express 21(16), 19353–19362 (2013). [CrossRef]  

5. B. C. Platt and R. Shack, “History and Principles of Shack-Hartmann Wavefront Sensing,” J. Refract. Surg. 17(5), S573–S577 (2001). [CrossRef]  

6. C. Wu, J. Ko, and C. C. Davis, “Determining the phase and amplitude distortion of a wavefront using a plenoptic sensor,” J. Opt. Soc. Am. A 32(5), 964–978 (2015). [CrossRef]  

7. R. Ragazzoni, “Pupil plane wavefront sensing with an oscillating prism,” J. Mod. Opt. 43(2), 289–293 (1996). [CrossRef]  

8. F. Roddier, “Curvature sensing and compensation: a new concept in adaptive optics,” Appl. Opt. 27(7), 1223–1225 (1988). [CrossRef]  

9. R. A. Gonsalves, “Phase retrieval and diversity in adaptive optics,” Opt. Eng. 21(5), 829–832 (1982). [CrossRef]  

10. D. L. Fried, “Least-square fitting a wave-front distortion estimate to an array of phase-difference measurements,” J. Opt. Soc. Am. A 67(3), 370–375 (1977). [CrossRef]  

11. A. Talmi and E. N. Ribak, “Wavefront reconstruction from its gradients,” J. Opt. Soc. Am. A 23(2), 288–297 (2006). [CrossRef]  

12. J. Wang and D. E. Silva, “Wave-front interpretation with Zernike polynomials,” Appl. Opt. 19(9), 1510–1518 (1980). [CrossRef]  

13. O. Soloviev and G. Vdovin, “Hartmann-Shack test with random masks for modal wavefront reconstruction,” Opt. Express 13(23), 9570–9584 (2005). [CrossRef]  

14. F. Roddier, Adaptive optics in astronomy (Cambridge University, 1999).

15. Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature 521(7553), 436–444 (2015). [CrossRef]  

16. H. Guo, N. Korablinova, Q. Ren, and J. Bille, “Wavefront reconstruction with artificial neural networks,” Opt. Express 14(14), 6456–6462 (2006). [CrossRef]  

17. Z. Li and X. Li, “Centroid computation for Shack-Hartmann wavefront sensor in extreme situations based on artificial neural networks,” Opt. Express 26(24), 31675–31692 (2018). [CrossRef]  

18. R. Swanson, M. Lamb, C. Correia, S. Sivanandam, and K. Kutulakos, “Wavefront reconstruction and prediction with convolutional neural networks,” in Adaptive Optics Systems VI, (International Society for Optics and Photonics, 2018), 107031F.

19. L. Hu, S. Hu, W. Gong, and K. Si, “Learning-based Shack-Hartmann wavefront sensor for high-order aberration detection,” Opt. Express 27(23), 33504–33517 (2019). [CrossRef]  

20. L. Hu, S. Hu, W. Gong, and K. Si, “Deep learning assisted Shack–Hartmann wavefront sensor for direct wavefront detection,” Opt. Lett. 45(13), 3741–3744 (2020). [CrossRef]  

21. Z. Xu, S. Wang, M. Zhao, W. Zhao, L. Dong, X. He, P. Yang, and B. Xu, “Wavefront reconstruction of a Shack–Hartmann sensor with insufficient lenslets based on an extreme learning machine,” Appl. Opt. 59(16), 4768–4774 (2020). [CrossRef]  

22. S. W. Paine and J. R. Fienup, “Machine learning for improved image-based wavefront sensing,” Opt. Lett. 43(6), 1235–1238 (2018). [CrossRef]  

23. Y. Nishizaki, M. Valdivia, R. Horisaki, K. Kitaguchi, M. Saito, J. Tanida, and E. Vera, “Deep learning wavefront sensing,” Opt. Express 27(1), 240–251 (2019). [CrossRef]  

24. T. Andersen, M. Owner-Petersen, and A. Enmark, “Neural networks for image-based wavefront sensing for astronomy,” Opt. Lett. 44(18), 4618–4621 (2019). [CrossRef]  

25. T. Andersen, M. Owner-Petersen, and A. Enmark, “Image-based wavefront sensing for astronomy using neural networks,” J. Astron. Telesc. Instrum. Syst. 6(3), 034002 (2020). [CrossRef]  

26. Y. Wu, Y. Guo, H. Bao, and C. Rao, “Sub-millisecond phase retrieval for phase-diversity wavefront sensor,” Sensors 20(17), 4877 (2020). [CrossRef]  

27. R. A. Gonsalves and R. Chidlaw, “Wavefront sensing by phase retrieval,” in Applications of Digital Image Processing III, (1979), pp. 32–39.

28. K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016), pp. 770–778.

29. J. D. Schmidt, Numerical Simulation of Optical Wave Propagation with Examples in MATLAB (SPIE, 2010).

30. R. J. Noll, “Zernike polynomials and atmospheric turbulence,” J. Opt. Soc. Am. 66(3), 207–211 (1976). [CrossRef]  

31. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980 (2014).

32. S. J. Pan and Q. Yang, “A Survey on Transfer Learning,” IEEE Trans. Knowl. Data. Eng. 22(10), 1345–1359 (2010). [CrossRef]  

33. J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, “How transferable are features in deep neural networks?” in Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2, (MIT Press, Montreal, Canada, 2014), pp. 3320–3328.
