DNN-based aberration correction in a wavefront sensorless adaptive optics system

Qinghua Tian; Chenda Lu; Bo Liu; Lei Zhu; Xiaolong Pan; Qi Zhang; Leijing Yang; Feng Tian; Xiangjun Xin

doi:10.1364/OE.27.010765

1. Introduction

Adaptive optics (AO) is an effective technique that uses a deformable mirror (DM) or a spatial light modulator (SLM) to correct wavefront distortion caused by atmospheric or maritime turbulence [1,2], thereby improving the performance of the optical system [3]. Some AO systems, known as wavefront-sensorless (WFS-less) AO systems, do not employ a wavefront sensor; instead, they use an adaptive control method to determine the aberration. WFS-less AO systems can be implemented at lower cost and have the potential to avoid the problems associated with the wavefront sensor [4].

WFS-less AO methods can be divided into two categories: model-based methods and model-free methods. Model-free WFS-less AO methods employ stochastic, local, or global search algorithms, such as the stochastic parallel gradient descent (SPGD) algorithm [5,6], the simulated annealing algorithm [7], the genetic algorithm [8], the hybrid input-output algorithm [9], and the Gerchberg-Saxton algorithm [10,11]. These methods, however, usually need many iterations and measurements to complete the global optimization, which makes them difficult to use in a real-time aberration correction system. Model-based WFS-less AO methods, demonstrated in [12–15], speed up the correction by exploiting the structure of the performance metric function. Compared with model-free methods, however, model-based methods offer only a limited improvement in convergence speed at a much higher implementation cost. Reducing computing time without increasing implementation cost is therefore the goal pursued by WFS-less AO methods.

Deep learning (DL), a family of machine learning methods based on learning data representations, has sparked a revolution in artificial intelligence. The biggest difference between DL algorithms and traditional machine learning algorithms lies in the capacity of the model: a deeper model has better fitting ability, whereas conventional machine learning techniques are limited in their ability to process natural data in raw form [16]. Recent studies have applied deep learning to adaptive optics. In [17], Li and Zhao proposed a model-based method with a simple artificial neural network, while in [18], Lohani and Glasser applied a convolutional neural network to classify the degree of distortion. Both methods still rely on a search algorithm to compensate for the distortion; the neural network is used only to narrow the search range.

In this paper, we propose a deep neural network (DNN) based aberration correction method for WFS-less AO systems. The method uses a customized DNN model, which functions like a wavefront sensor, to estimate the aberration. Because it does not require any search process, the computing speed is improved. Numerical simulation shows that the method performs well across different degrees of distortion. Compared with the conventional SPGD method, our method takes less time to correct the same wavefront.

2. Theory of DNN-based AO system

The schematic diagram of our proposed WFS-less AO system is depicted in Fig. 1. The AO system consists of a DM, a lens, a charge-coupled device (CCD) camera, and a controller. In this AO system, the incident beam distorted by atmospheric turbulence is focused by a positive lens. The intensity distribution is then captured by the CCD camera and fed to the controller, which adapts the control signal of the DM to compensate for the wavefront distortion.

Fig. 1 Conceptual scheme of proposed sensor-less AO system model.

The physical processes behind imaging combine the process of propagation with the effects of lenses, mirrors, and other imaging optics. Ignoring the scaling effect of the lens system, the intensity distribution I can be written in shorthand notation as follows [3]:

I(x, y) = \left| U_{0}(-x_{0}, -y_{0}) * F(P(x', y')) \right|^{2} \qquad (1)

where $x_{0}$ and $y_{0}$ are rectangular coordinates in the object plane, $x$ and $y$ are coordinates in the image plane, and $x'$ and $y'$ are coordinates in the pupil plane. The symbol $*$ denotes the convolution operation and $F$ the Fourier transform operation. $U_{0}$ is the field distribution in the object plane and $P(x', y')$ is the pupil function, which is given by:

P(x', y') = A(x', y') e^{i k Φ(x', y')} \qquad (2)

where $A$ is the amplitude of the field, $Φ$ is the wavefront aberration, and $k = 2π/λ$ is the wave number.
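
As a rough numerical illustration of the imaging relation and the pupil function above, the sketch below evaluates the far-field intensity for a point-like source, for which the convolution with $U_{0}$ reduces to $|F(P)|^{2}$. The grid size, the uniform circular aperture, the naive discrete Fourier transform, and the defocus-like test phase are all illustrative assumptions and not the simulation setup used in this paper; `phase_fn` is a hypothetical callback returning $kΦ$ in radians.

```python
import cmath
import math


def pupil_function(n, phase_fn):
    """Sample P = A * exp(i * k * Phi) on an n x n grid covering [-1, 1]^2.

    Assumption: A is 1 inside the unit circle and 0 outside.
    phase_fn(x, y) returns k * Phi in radians.
    """
    pupil = []
    for iy in range(n):
        y = -1.0 + 2.0 * (iy + 0.5) / n
        row = []
        for ix in range(n):
            x = -1.0 + 2.0 * (ix + 0.5) / n
            inside = x * x + y * y <= 1.0
            row.append(cmath.exp(1j * phase_fn(x, y)) if inside else 0j)
        pupil.append(row)
    return pupil


def dft_1d(values):
    """Naive O(n^2) discrete Fourier transform of a sequence."""
    n = len(values)
    return [
        sum(values[k] * cmath.exp(-2j * math.pi * u * k / n) for k in range(n))
        for u in range(n)
    ]


def far_field_intensity(pupil):
    """Return |F(P)|^2, computed as row transforms followed by column transforms.

    The zero-frequency (on-axis) sample is at index [0][0]; no shift is applied.
    """
    n = len(pupil)
    row_pass = [dft_1d(row) for row in pupil]
    col_pass = [dft_1d([row_pass[iy][u] for iy in range(n)]) for u in range(n)]
    return [[abs(col_pass[u][v]) ** 2 for u in range(n)] for v in range(n)]


# Flat wavefront versus a defocus-like phase (hypothetical test aberration).
flat = far_field_intensity(pupil_function(16, lambda x, y: 0.0))
aberrated = far_field_intensity(
    pupil_function(16, lambda x, y: 2.0 * (2.0 * (x * x + y * y) - 1.0))
)
peak_ratio = aberrated[0][0] / flat[0][0]  # below 1 for an aberrated beam
```

The on-axis intensity drops when the phase is not flat while the total energy is unchanged, which is the kind of intensity redistribution that the CCD image carries about the aberration.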

It is well known that $Φ$ can be expressed as a linear combination of Zernike polynomials [19]:

Φ(x', y') = \sum_{j} a_{j} Z_{j} \qquad (3)

where $Z_{j}$ is the jth Zernike polynomial and $a_{j}$ is the jth Zernike coefficient.

As the analytical definition of these Zernike polynomials is not unique, we use the definition suggested by Noll [20] which is given by:

\begin{matrix} Z_{\mathrm{even}\, i} = \sqrt{n + 1} R_{n}^{m}(r) \sqrt{2} \cos(m θ), & m \neq 0, \\ Z_{\mathrm{odd}\, i} = \sqrt{n + 1} R_{n}^{m}(r) \sqrt{2} \sin(m θ), & m \neq 0, \\ Z_{i} = \sqrt{n + 1} R_{n}^{0}(r), & m = 0, \end{matrix} \qquad (4)

where

R_{n}^{m}(r) = \sum_{s = 0}^{(n - m) / 2} \frac{(-1)^{s} (n - s)!}{s! \, [(n + m) / 2 - s]! \, [(n - m) / 2 - s]!} r^{n - 2 s} \qquad (5)

Here $r$ and $θ$ are polar coordinates in the pupil plane. The values of n and m are always integers and satisfy the following conditions: $m \leq n$ and $n - |m|$ is even. The index i is a mode ordering number and is a function of n and m.
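
The definitions above can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual Noll ordering in which even indices carry the cosine term and odd indices the sine term; the function names (`noll_to_nm`, `radial`, `zernike`, `wavefront`) and the use of a signed m to distinguish cosine from sine are choices made for this sketch, not part of the paper.

```python
import math


def noll_to_nm(j):
    """Map a 1-based Noll index j to (n, m).

    Convention of this sketch: m > 0 marks the cosine term (even j),
    m < 0 the sine term (odd j), and m = 0 the rotationally symmetric term.
    """
    n = 0
    remaining = j - 1
    while remaining > n:
        n += 1
        remaining -= n
    m_abs = (n % 2) + 2 * ((remaining + (n + 1) % 2) // 2)
    return n, (m_abs if j % 2 == 0 else -m_abs)


def radial(n, m, r):
    """Radial polynomial R_n^m(r), evaluated term by term from the sum over s."""
    m = abs(m)
    total = 0.0
    for s in range((n - m) // 2 + 1):
        numerator = (-1) ** s * math.factorial(n - s)
        denominator = (
            math.factorial(s)
            * math.factorial((n + m) // 2 - s)
            * math.factorial((n - m) // 2 - s)
        )
        total += numerator / denominator * r ** (n - 2 * s)
    return total


def zernike(j, r, theta):
    """Noll-normalized Zernike polynomial Z_j at polar coordinates (r, theta)."""
    n, m = noll_to_nm(j)
    norm = math.sqrt(n + 1)
    if m == 0:
        return norm * radial(n, 0, r)
    if m > 0:
        return norm * radial(n, m, r) * math.sqrt(2) * math.cos(m * theta)
    return norm * radial(n, m, r) * math.sqrt(2) * math.sin(-m * theta)


def wavefront(coeffs, r, theta):
    """Phi(r, theta) = sum_j a_j Z_j, with coeffs given as {Noll index: a_j}."""
    return sum(a * zernike(j, r, theta) for j, a in coeffs.items())


# Hypothetical coefficients: some tilt (j = 2) plus defocus (j = 4).
phi_edge = wavefront({2: 0.5, 4: 1.0}, 1.0, 0.0)
```

Representing the coefficients as a dictionary keyed by Noll index keeps the piston term (j = 1) optional, which matches the choice of starting the expansion at the second mode.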

According to Eqs. (1)–(5), there is a mapping between the intensity distribution I of the input wavefront and the corresponding Zernike coefficients $a$, which represent the wavefront aberration:

f : I \to a \qquad (6)

Provided that the mapping f can be accurately estimated, the controller of the AO system is able to predict the wavefront aberration. However, f cannot be solved explicitly. In this work, we train a DNN model to fit the mapping f. During training, the DNN model continuously updates its parameters according to the measured loss so as to approximate the objective function, i.e., the mapping f.

Among existing DNN models, convolutional neural networks (CNNs) are chosen for their high efficiency in processing 2-D data. A typical convolutional network is composed of a series of convolutional layers, pooling layers, and fully connected layers. Units in a convolutional layer are organized in feature maps. All units in a feature map share the same convolution filter, which can effectively extract image features [16]. This weight sharing dramatically reduces the number of parameters as well as the risk of overfitting, thereby improving generalization ability [21].

Convolutional layers are generally followed by pooling layers, which can be regarded as a subsampling operation. After max pooling, the feature maps have half as many rows and columns as the convolved feature maps. Fully connected layers are usually placed at the end of the network and connect all of the units in the feature maps to the output tensor.
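
The subsampling effect of max pooling can be illustrated with a small sketch. It assumes a 2 × 2 window with a stride of 2 and a single-channel feature map stored as a list of rows; a trailing odd row or column is simply dropped. The input values are made up for the example.

```python
def max_pool_2x2(feature_map):
    """Max pooling with a 2 x 2 window and stride 2 on a list-of-rows feature map."""
    rows = len(feature_map) // 2
    cols = len(feature_map[0]) // 2
    return [
        [
            max(
                feature_map[2 * i][2 * j],
                feature_map[2 * i][2 * j + 1],
                feature_map[2 * i + 1][2 * j],
                feature_map[2 * i + 1][2 * j + 1],
            )
            for j in range(cols)
        ]
        for i in range(rows)
    ]


# A 4 x 4 feature map becomes a 2 x 2 feature map.
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 2, 1, 1],
]
pooled = max_pool_2x2(feature_map)  # [[4, 5], [3, 7]]
```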

After the customized CNN, a transfer module converts the Zernike coefficients into the control signal of the DM. This conversion can be accomplished through several existing methods [22–25] that are widely used in the controllers of AO systems with a wavefront sensor. Note that the controller contains no search algorithm: the CNN model maps the CCD intensity images directly to Zernike coefficients. The proposed AO system therefore needs no iterations and only a single measurement to compensate for the wavefront distortion.

3. Numerical simulation and discussion

3.1. Data generating and preprocessing

Atmospheric turbulence causes wavefront phase distortion of the light that propagates through it. In the simulation, we assume that an incident Gaussian beam is distorted by a given phase screen generated from Zernike polynomials and their corresponding Zernike coefficients [26]. In the method described in [26], independent random Karhunen–Loève coefficients are computed and then converted to Zernike coefficients according to the Karhunen–Loève Zernike expansion. The resulting Zernike coefficients are not statistically independent but follow the Kolmogorov energy distribution.

Only Zernike modes 2 to 400 are considered: the first Zernike polynomial is a constant, and the impact of modes above 400 is negligible. Since the proposed method does not require a separate tip-tilt correction system, the second and third Zernike modes are included.

The Zernike coefficients scale with $D / r_{0}$, where D is the system aperture diameter and $r_{0}$ is the Fried parameter. $D / r_{0}$ is related to the turbulence intensity: the higher the value of $D / r_{0}$, the stronger the atmospheric turbulence. Eight values of $D / r_{0}$ are considered in this work: 1, 3, 5, 7, 9, 11, 13, and 15. Figures 2(a)–2(h) show the simulated atmospheric phase screens for the different $D / r_{0}$ ratios. It can be seen that a higher value of $D / r_{0}$ leads to larger aberrations.

Fig. 2 Simulated atmospheric turbulence phase screens generated using Zernike polynomials with $D / r_{0}$ equal to (a) 1, (b) 3, (c) 5, (d) 7, (e) 9, (f) 11, (g) 13, and (h) 15.

Figure 3(a) shows the generated Gaussian beam. Figures 3(b)–3(i) show the corresponding far-field intensity images in the image plane, which have been converted to grayscale, normalized, and reshaped to 224 × 224 pixels. Gaussian beams are widely used to describe laser propagation. The beam waist of the Gaussian beam is set to 0.075 m and the propagation distance is 1000 m.

Fig. 3 Original Gaussian beam (a) and far-field intensity patterns of the distorted Gaussian beam with $D / r_{0}$ equal to (b) 1, (c) 3, (d) 5, (e) 7, (f) 9, (g) 11, (h) 13, and (i) 15.

For each value of $D / r_{0}$, 10000 sets of data are generated for the training set and 1000 for the test set. Each set of data consists of an intensity image and its corresponding label, namely the Zernike coefficients. In total, the training set has 80000 sets of data and the test set has 8000. Both the training set and the test set contain intensity images with different degrees of distortion.

3.2. Model setting and evaluation

There are five key elements in establishing a supervised DNN model: input, output, loss, label, and model structure. In this work, the model takes 224 × 224 far-field intensity images as input and produces 400-dimensional Zernike coefficients as output. As mentioned above, we use Zernike polynomials to simulate atmospheric turbulence, so the Zernike coefficients used in the simulation serve as the labels of our model.

The L2 loss function, also known as the least squares error, is given by Eq. (7).

L2\,\mathrm{Loss} = \sum_{i} (y_{\mathrm{predict}} - y_{\mathrm{true}})^{2} \qquad (7)

where $y_{\mathrm{predict}}$ is the predicted value of the Zernike coefficients and $y_{\mathrm{true}}$ is the true value of the Zernike coefficients. The L2 loss function is used in this simulation to minimize the sum of the squared differences between the true values and the predicted values.
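
For a single sample, the L2 loss can be written as the short function below. This is a minimal sketch that sums over the Zernike coefficients of one prediction; whether and how the loss is averaged over a batch is not stated here, so no batch handling is assumed. The coefficient values in the example are invented.

```python
def l2_loss(predicted, true):
    """Sum of squared differences between predicted and true Zernike coefficients."""
    if len(predicted) != len(true):
        raise ValueError("predicted and true must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predicted, true))


# Hypothetical three-coefficient example: 0 + 4 + 4 = 8.
loss = l2_loss([1.0, 2.0, 3.0], [1.0, 0.0, 1.0])
```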

Unfortunately, there is no rigorous theory for determining the optimal model structure. In this paper, we compare several models to obtain a satisfactory result. The training processes of these models are shown in Figs. 4(a)–4(f), where each curve shows the L2 loss on the test set as a function of the epoch. An epoch is one pass of the model over the entire training set, which has 80000 training examples in this simulation. The faint line in each graph is the raw loss value; we smooth it to suppress random fluctuations and make the trend easier to observe. The performance of each model is reflected by the turning point of its curve, which is tagged with the corresponding loss value. After this point, the loss of each model begins to rise because of overfitting. Overfitting happens when the model fits the training data so closely that its performance on new data deteriorates.

Fig. 4 Changes of L2 loss on the test set with MLP (a), CNN3 (b), CNN5 (c), CNN7 (d), CNN9 (e), CNN11 (f).
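
Reading the lowest point off a smoothed test-loss curve can be sketched as follows. The smoothing method used for Fig. 4 is not specified, so an exponential moving average is assumed here purely for illustration; the `weight` parameter and the loss values are hypothetical.

```python
def smooth(values, weight=0.6):
    """Exponential moving average; weight = 0 returns the raw values."""
    smoothed = []
    last = values[0]
    for value in values:
        last = weight * last + (1.0 - weight) * value
        smoothed.append(last)
    return smoothed


def best_epoch(test_losses, weight=0.6):
    """Return (1-based epoch, smoothed loss) at the minimum of the smoothed curve."""
    smoothed = smooth(test_losses, weight)
    index = min(range(len(smoothed)), key=smoothed.__getitem__)
    return index + 1, smoothed[index]


# Hypothetical per-epoch test losses: falling at first, then rising (overfitting).
test_losses = [3.0, 2.5, 2.1, 1.9, 1.95, 2.0, 2.1, 2.2]
raw_epoch, raw_loss = best_epoch(test_losses, weight=0.0)
smooth_epoch, smooth_loss = best_epoch(test_losses)
```

Because smoothing lags the raw curve, the minimum of the smoothed curve can fall a few epochs after the raw minimum, which is worth keeping in mind when a stopping point is chosen from a smoothed plot.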

First, we trained a traditional multilayer perceptron (MLP), here a three-layer fully connected neural network. The MLP model cannot fit the test set well because of its low capacity and poor generalization ability. As shown in Fig. 4(a), the L2 loss of the MLP drops only to 2.21 at its lowest.

CNNs have proved to be effective in processing images. In this work, we therefore trained a series of CNN models of varying depth. For brevity, these models are referred to as “CNN3”, “CNN5”, “CNN7”, “CNN9”, and “CNN11” according to the number of convolutional layers.

As the number of layers increases from 3 to 7, as shown in Figs. 4(b)–4(d), the L2 loss drops from 1.96 to 1.88, which indicates improving correction performance. In general, as the number of layers increases, the CNN can extract more intrinsic features of the CCD intensity images, thus improving the correction accuracy. However, a large number of layers also leads to vanishing gradients and overfitting, which degrade performance. As can be seen in Figs. 4(e)–4(f), as the number of layers continues to grow from 9 to 11, the L2 loss rises from 1.88 to 1.93. An appropriate model should therefore have a moderate number of layers, which is 7 in this simulation.

The details of the CNN7 model structure are shown in Fig. 5. There are 7 convolutional layers, each of which applies a filter operator to its input. It is generally acknowledged that small filters are more effective than large ones, since they need fewer parameters and calculations to achieve the same receptive field. We therefore set the filter size to 3 × 3, the smallest size that works. The number of channels is the number of filters. In this work, we use as many channels as GPU memory allows; the values are 32, 64, 128, and 256, since they match the memory size. Some of the convolutional layers are followed by a max pooling layer with a stride of 2.

Fig. 5 Model structure of CNN7.
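
The argument for small filters can be checked with a short calculation. The sketch below assumes stride-1 convolutions with no pooling in between, ignores bias terms, and uses 64 input and output channels (one of the channel counts listed above) only as an example; the function names are hypothetical.

```python
def stacked_receptive_field(num_layers, kernel=3):
    """Receptive field, along one axis, of num_layers stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel - 1)


def conv_weights(kernel, in_channels, out_channels):
    """Number of weights in one convolutional layer (bias terms ignored)."""
    return kernel * kernel * in_channels * out_channels


channels = 64

# Two stacked 3 x 3 layers cover a 5 x 5 region with fewer weights than one 5 x 5 layer.
two_small = 2 * conv_weights(3, channels, channels)   # 73728
one_5x5 = conv_weights(5, channels, channels)          # 102400

# Three stacked 3 x 3 layers cover a 7 x 7 region.
three_small = 3 * conv_weights(3, channels, channels)  # 110592
one_7x7 = conv_weights(7, channels, channels)          # 200704
```

In general, k stacked 3 × 3 layers need 9k weights per channel pair, while a single filter with the same receptive field needs (2k + 1)² weights, so the saving grows with depth.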

Adding fully connected layers is an effective way of learning non-linear combinations of the features extracted by the convolutional layers. At the end of the model, there are 3 fully connected layers with 2048, 1024, and 400 nodes, each connected to all the nodes in the preceding layer.

3.3. Test results and discussion

To test the performance of the proposed DNN-based AO method, a well-trained CNN7 model is applied to obtain the Zernike coefficients from the corresponding intensity images in the test set. We then convert these Zernike coefficients into the control signal and simulate a DM to compensate for the aberration.

Figure 6(a) illustrates the initial intensity image before compensation and the corresponding wavefront distortion. Figure 6(b) shows the compensation results of the proposed method. It can be seen that the beam is more concentrated after compensation and that the phase distortion is decreased significantly, which means that the proposed method can effectively compensate for the phase distortion. In particular, it can handle drift of the image centroid, since the tip and tilt modes are included in the proposed method. The proposed method can therefore simplify the AO system and reduce its cost.

Fig. 6 Intensity image and corresponding phase distortion before (a) and after (b) compensation ( $D / r_{0} = 7$ ).

Figure 7 shows the change in the root mean square (RMS) of the wavefront aberration under different degrees of distortion. Each group corresponds to a $D / r_{0}$ ratio. The RMS is computed via Eq. (8).

\mathrm{RMS} = \sqrt{\frac{1}{π} \int_{0}^{2 π} \int_{0}^{1} [Φ(r, θ) - \bar{Φ}]^{2} \, r \, dr \, dθ} \qquad (8)

where $Φ$ is the wavefront aberration, $\bar{Φ}$ is the mean of $Φ$ over the aperture, and $r$ and $θ$ are polar coordinates in the pupil plane.

Fig. 7 The changes of the RMS under different degrees of distortion before and after correction.

In each group, the left bar shows the original RMS before correction and the right bar shows the RMS after correction. The RMS declines significantly in every group, which indicates that the CNN model generalizes well across different turbulence intensities.

Figure 8 shows the computation latency of the proposed DNN-based method and of the traditional SPGD method. The latency is measured on a laptop with a Core i7-7500U processor and 8 GB of RAM. The sensing time of the CCD camera and the correction time of the DM are not included. To reach the same RMS values, which are 0.36, 0.78, 1.17, 1.52, 1.89, 2.23, 2.61, and 2.97 for the respective $D / r_{0}$ values, SPGD takes much longer than the proposed method, since SPGD needs hundreds of iterations while the DNN-based method needs none. The latency of the proposed method includes the inference time of the DNN model and the processing time of the transfer module. The DNN model is trained before the AO system is put into use, so the training time is not included in the latency. Furthermore, the same model is used to predict the phase aberration at every $D / r_{0}$ value, so the latency of the DNN-based method is the same (2 ms) at all degrees of distortion.

Fig. 8 The latency of the proposed method compared with SPGD under different degrees of distortion ( $D / r_{0} = 1, 3, 5, 7, 9, 11, 13, 15$ ).
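
To make the iteration cost of SPGD concrete, the sketch below runs the basic two-sided SPGD update on a toy problem. It is not the SPGD configuration used for Fig. 8: the quadratic metric stands in for an image-based performance metric, and the five-element aberration vector, gain, perturbation amplitude, and seed are all hypothetical values chosen so that the toy problem converges.

```python
import random


def spgd(metric, num_channels, iterations, gain=2.0, perturbation=0.1, seed=0):
    """Two-sided SPGD that maximizes metric(u); returns (u, number of metric evaluations)."""
    rng = random.Random(seed)
    u = [0.0] * num_channels
    evaluations = 0
    for _ in range(iterations):
        # Random +/- perturbation applied to all control channels in parallel.
        delta = [perturbation * rng.choice((-1.0, 1.0)) for _ in range(num_channels)]
        j_plus = metric([ui + di for ui, di in zip(u, delta)])
        j_minus = metric([ui - di for ui, di in zip(u, delta)])
        evaluations += 2
        dj = j_plus - j_minus
        # Move each channel along its perturbation, scaled by the metric change.
        u = [ui + gain * dj * di for ui, di in zip(u, delta)]
    return u, evaluations


# Toy aberration (hypothetical); the ideal correction is its negative.
aberration = [0.8, -0.5, 0.3, 0.6, -0.2]


def toy_metric(u):
    """Negative squared residual after correction; the maximum value is 0."""
    return -sum((a + ui) ** 2 for a, ui in zip(aberration, u))


correction, evaluations = spgd(toy_metric, len(aberration), iterations=200)
initial_metric = toy_metric([0.0] * len(aberration))
final_metric = toy_metric(correction)
```

Each iteration costs two metric evaluations, and in a real system every evaluation corresponds to a new DM setting and camera measurement, whereas the DNN-based controller produces its estimate from a single image.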

4. Conclusion

In this paper, a new adaptive optics method is proposed and its performance is demonstrated by numerical simulation. By training a well-designed DNN model, the method obtains the control signal of the DM directly from the CCD images. It needs no iteration and only one measurement, which improves real-time performance. The method does not need a separate tip-tilt correction system, since the DNN model can handle drift of the image centroid. Furthermore, the method can deal with different degrees of distortion. The simulation results show that the proposed method effectively reduces the computation time and significantly reduces the root mean square (RMS) of the wavefront aberration under different turbulence conditions.

Funding

National Natural Science Foundation of China (NSFC) (61575026, 61727817, 61575027); China National Funds for Distinguished Young Scientists (61425022).

References

1. M. Li and M. Cvijetic, “Coherent free space optics communications over the maritime atmosphere with use of adaptive optics for beam wavefront correction,” Appl. Opt. 54(6), 1453–1462 (2015).

2. M. Li, M. Cvijetic, Y. Takashima, and Z. Yu, “Evaluation of channel capacities of OAM-based FSO link with real-time wavefront correction by adaptive optics,” Opt. Express 22(25), 31337–31346 (2014).

3. R. Tyson, Principles of Adaptive Optics (CRC Press, 2010).

4. H. Linhai and C. Rao, “Wavefront sensorless adaptive optics: a general model-based approach,” Opt. Express 19(1), 371–379 (2011).

5. M. A. Vorontsov, G. W. Carhart, M. Cohen, and G. Cauwenberghs, “Adaptive optics based on analog parallel stochastic optimization: analysis and experimental demonstration,” J. Opt. Soc. Am. A 17(8), 1440–1453 (2000).

6. T. Weyrauch and M. A. Vorontsov, “Atmospheric compensation with a speckle beacon in strong scintillation conditions: directed energy and laser communication applications,” Appl. Opt. 44(30), 6388–6401 (2005).

7. S. Zommer, E. N. Ribak, S. G. Lipson, and J. Adler, “Simulated annealing in ocular adaptive optics,” Opt. Lett. 31(7), 939–941 (2006).

8. P. Yang, M. Ao, Y. Liu, B. Xu, and W. Jiang, “Intracavity transverse modes controlled by a genetic algorithm based on Zernike mode coefficients,” Opt. Express 15(25), 17051–17062 (2007).

9. X. Yin, H. Chang, X. Cui, J. X. Ma, Y. J. Wang, G. H. Wu, L. Zhang, and X. Xin, “Adaptive turbulence compensation with a hybrid input-output algorithm in orbital angular momentum-based free-space optical communication,” Appl. Opt. 57(26), 7644–7650 (2018).

10. H. Chang, X. Yin, X. Cui, Z. Zhang, J. Ma, G. Wu, L. Zhang, and X. Xin, “Adaptive optics compensation of orbital angular momentum beams with a modified Gerchberg–Saxton-based phase retrieval algorithm,” Opt. Commun. 405, 271–275 (2017).

11. L. Ming, Y. Li, and J. Han, “Gerchberg–Saxton algorithm based phase correction in optical wireless communication,” Phys. Commun. 25, 323–327 (2017).

12. M. J. Booth, “Wavefront sensorless adaptive optics for large aberrations,” Opt. Lett. 32(1), 5–7 (2007).

13. A. Facomprez, E. Beaurepaire, and D. Débarre, “Accuracy of correction in modal sensorless adaptive optics,” Opt. Express 20(3), 2598–2612 (2012).

14. M. A. A. Neil, M. J. Booth, and T. Wilson, “Closed-loop aberration correction by use of a modal Zernike wave-front sensor,” Opt. Lett. 25(15), 1083–1085 (2000).

15. H. Song, R. Fraanje, G. Schitter, H. Kroese, G. Vdovin, and M. Verhaegen, “Model-based aberration correction in a closed-loop wavefront-sensor-less adaptive optics system,” Opt. Express 18(23), 24070–24084 (2010).

16. Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature 521(7553), 436–444 (2015).

17. Z. Li and X. Zhao, “BP artificial neural network based wave front correction for sensor-less free space optics communication,” Opt. Commun. 385, 219–228 (2017).

18. S. Lohani and R. T. Glasser, “Turbulence correction with artificial neural networks,” Opt. Lett. 43(11), 2611–2614 (2018).

19. G. Dai, “Modal compensation of atmospheric turbulence with the use of Zernike polynomials and Karhunen–Loève functions,” J. Opt. Soc. Am. A 12(10), 2182–2193 (1995).

20. R. J. Noll, “Zernike polynomials and atmospheric turbulence,” J. Opt. Soc. Am. 66(3), 207–211 (1976).

21. J. T. Huang, J. Li, and Y. Gong, “An analysis of convolutional neural networks for speech recognition,” in Proceedings of IEEE Conference on Acoustics, Speech and Signal Processing (IEEE, 2015), pp. 4989–4993.

22. L. Zhu, P. C. Sun, D. U. Bartsch, W. R. Freeman, and Y. Fainman, “Wave-front generation of Zernike polynomial modes with a micromachined membrane deformable mirror,” Appl. Opt. 38(28), 6019–6026 (1999).

23. A. Haber, A. Polo, C. S. Smith, S. F. Pereira, P. Urbach, and M. Verhaegen, “Iterative learning control of a membrane deformable mirror for optimal wavefront correction,” Appl. Opt. 52(11), 2363–2373 (2013).

24. A. Polo, A. Haber, S. F. Pereira, M. Verhaegen, and H. P. Urbach, “An innovative and efficient method to control the shape of push-pull membrane deformable mirror,” Opt. Express 20(25), 27922–27932 (2012).

25. S. Bonora and L. Poletto, “Push-pull membrane mirrors for adaptive optics,” Opt. Express 14(25), 11935–11944 (2006).

26. N. A. Roddier, “Atmospheric wavefront simulation using Zernike polynomials,” Opt. Eng. 29(10), 1174–1181 (1990).

Optics Express