Optica Publishing Group

Global piston restoration of segmented mirrors with recurrent neural networks

Open Access

Abstract

Recurrent neural networks are usually used for processing sequential data. In this paper they are employed to deal with the sequence of diffraction subimages created at every intersection of a segmented mirror. Every subimage is first processed by a convolutional neural network that extracts a set of features from it. The recurrent approach attained higher prediction accuracy than convolution layers alone. Furthermore, a consistency test was added to detect wrong predictions before computing the global piston values. The final system predicts global piston values with rms = 7.34 nm, high reliability, and a capture range of [ − 21λ, 21λ]. Atmospheric seeing, polishing and tip-tilt residual errors were also included in the simulations.

© 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

In the quest to build larger telescopes, monolithic mirrors were abandoned in favor of segmented ones, which are easier to fabricate and maintain. However, this partition of the reflective surface requires the constituent parts to be aligned with high precision, a procedure that becomes very demanding when adaptive optics systems are present. Under these circumstances, the optical path difference between segments must be kept below a small fraction of the wavelength. In the case of high-contrast imaging, where the goal is to image an exoplanet against its parent star, the piston precision error should consequently be of the order of 30 nm rms in order to reach the desired contrast [1].

Several methods have been proposed for adequately cophasing the segmented mirrors of major telescopes. Nowadays, most of them are either pupil-plane techniques, e.g. Shack-Hartmann sensor based methods [2,3] and pyramid sensor based methods [4], or, less commonly, intermediate-plane techniques [5,6].

The former group of methods is robust and precise but requires aligning microlenses or prisms with every segment edge, which is very time consuming and may become impractical as the number of segments grows. The latter, based on curvature sensors, have traditionally been used for cross-checking the measurements of the primary method. They lack a very wide capture range and are very sensitive to atmospheric conditions; nonetheless, they provide a very simple optical design and require no special hardware.

The method discussed in this paper takes as input a four-channel image recorded at a single defocused distance. Each channel contains a measurement taken at one of four different wavelengths, the same four used during the training of the network.

Neural network applications for wavefront sensing are not new [7]; their potential as universal function approximators has long been known. Other methods based on convolutional networks have been proposed for piston detection [8,9]. However, they rely on extended light sources and have not been proven efficient under atmospheric conditions. We show here an architecture that uses recurrent neural networks to extract features from the relations among the junctions of the mirror. This allows the algorithm to give more precise predictions and an ample capture range with more robustness.

This paper is organized as follows. First we give a brief introduction to the physical background of the problem and some mathematical considerations. We continue with the optical setup needed and the simulation process followed to create the data images that feed the network. We then explain the details of the network architecture and give some insights into the learning process. Finally, conclusions, future work and final remarks are found at the end of the paper.

2. Physical background of the problem

The difficulty of detecting the piston misalignment lies in the intrinsic ambiguity conveyed by the periodicity of the diffraction patterns. That attribute of monochromatic light makes optical paths that are a multiple of $\lambda$ apart indistinguishable, $\lambda$ being the wavelength used. A vertical discontinuity in a planar wavefront creates an optical path difference $\Delta \phi$ along a vertical line that produces a diffraction pattern when propagating a distance $z$. The pattern produced by $\Delta \phi = \Delta \phi _0 + n \lambda$ repeats for every $n\in \mathbb {Z}$. In addition, the diffraction pattern produced by a wavefront discontinuity $\Delta \phi = p_0 + \Delta \phi _0$ is a reversed version of the pattern produced by a discontinuity $\Delta \phi = p_0 -\Delta \phi _0$, where $p_0 = n \lambda /2$.

The light beam reflected by a segmented mirror telescope is not collimated but a spherical wave converging to the focus. The propagation distance to the conjugate plane is related to $f$, the focal distance, and $l$, the distance between the focal plane and the defocused plane (where the detector is placed), such that $z = (f-l)f/l$. The propagation distance was chosen according to the conditions described in [10]. The Fresnel equation in its convolutional form was used to propagate the wavefront a distance $z$. The squared magnitude of the complex field is then computed, since the quantity of interest is the intensity image recorded at the detector.
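As a sketch of this propagation step, the convolutional Fresnel propagation can be implemented with FFTs as below. The grid size, sampling, focal distance and defocus are hypothetical values, not the paper's, and the sign convention of the transfer function is an assumption.

```python
import numpy as np

def fresnel_propagate(field, wavelength, dx, z):
    """Fresnel propagation in convolutional form: multiply the field
    spectrum by the Fresnel transfer function and transform back."""
    n = field.shape[0]
    fx = np.fft.fftfreq(n, d=dx)
    FX, FY = np.meshgrid(fx, fx)
    H = np.exp(-1j * np.pi * wavelength * z * (FX**2 + FY**2))
    return np.fft.ifft2(np.fft.fft2(field) * H)

# Effective propagation distance for a converging beam recorded a
# distance l before focus, as given in the text: z = (f - l) f / l.
f_focal, l_defocus = 17.0, 0.15              # metres (hypothetical)
z = (f_focal - l_defocus) * f_focal / l_defocus

# The detector records the intensity, i.e. the squared magnitude.
u0 = np.ones((64, 64), dtype=complex)        # flat wavefront
intensity = np.abs(fresnel_propagate(u0, 700e-9, 1e-4, z)) ** 2
```

A flat wavefront is unchanged by this propagation (only its DC component is present), which provides a quick sanity check of the transfer function.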

3. Simulations

The simulations needed to feed the network were created with the parameters included in Table 1 for a three-ring segmented mirror. For every mirror instance, 36 random global piston positions were drawn from a uniform distribution. That configuration generates piston steps between any two adjacent segments with values in the range $[-21\lambda _0,+21\lambda _0]$, which is the capture range the algorithm is going to be trained for.


Table 1. Simulation parameters

The tip-tilt values of the segments are required to be detected and corrected beforehand. Tip-tilt restoration can be achieved with methods based on curvature sensors or active optics; however, residual tip-tilt may remain afterwards. To account for that, a small random amount of these aberrations was added to every segment, with tip-tilt values in the range $\pm 0.002^{\prime\prime}$. This accuracy was achieved in a previous result [11], a tip-tilt restoration method based on a geometric sensor that measures tip-tilt values on a segmented surface using the Van Dam and Lane algorithms.

The effect of polishing errors on the wavefront has been considered as well. A phase screen with a power spectral density that simulates mid and high spatial frequency errors up to an outer scale of 0.2 m was added to the wavefront. The total rms of these errors was 20 nm.
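A phase screen of this kind can be sketched with Fourier filtering, restricting the PSD to spatial frequencies above the 1/0.2 m$^{-1}$ cutoff and rescaling to the quoted rms. The $f^{-2}$ power-law exponent below is an assumption, since the paper does not specify the PSD shape.

```python
import numpy as np

def polishing_screen(n, dx, outer_scale=0.2, rms_nm=20.0, seed=0):
    """Random phase screen (in nm) with a power-law PSD restricted to
    spatial frequencies above 1/outer_scale, rescaled to a target rms.
    The f^-2 exponent is illustrative, not taken from the paper."""
    rng = np.random.default_rng(seed)
    f = np.fft.fftfreq(n, d=dx)
    FX, FY = np.meshgrid(f, f)
    fr = np.hypot(FX, FY)
    psd = np.zeros_like(fr)
    mask = fr > 1.0 / outer_scale            # keep mid/high frequencies only
    psd[mask] = fr[mask] ** -2.0
    noise = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    screen = np.real(np.fft.ifft2(noise * np.sqrt(psd)))
    screen -= screen.mean()
    return screen * (rms_nm / screen.std())

screen = polishing_screen(128, 0.01)         # 1.28 m grid sampled at 1 cm
```

The screen is rescaled after generation, so its rms matches the 20 nm quoted in the text exactly.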

To simulate atmospheric turbulence, the intensity image at the defocused plane was filtered with the long-exposure transfer function of the atmosphere [12]. The Fried parameter was selected randomly between $0.1$ and $0.2$ m and varies in every instance of the mirror. Table 1 shows only the lower bound, which represents the worst atmospheric condition considered in the simulations.
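This filtering step can be sketched as multiplying the image spectrum by Fried's long-exposure transfer function [12]; the frequency grid below is hypothetical.

```python
import numpy as np

def long_exposure_mtf(fr, wavelength, r0):
    """Fried's long-exposure atmospheric transfer function as a
    function of angular spatial frequency fr (cycles/rad)."""
    return np.exp(-3.44 * (wavelength * fr / r0) ** (5.0 / 3.0))

rng = np.random.default_rng(1)
r0 = rng.uniform(0.1, 0.2)                   # Fried parameter per mirror instance
fr = np.linspace(0.0, 2e6, 256)              # angular frequencies (cycles/rad)
mtf = long_exposure_mtf(fr, 700e-9, r0)      # multiplies the image spectrum
```

The function is 1 at zero frequency and decays monotonically, acting as a low-pass filter whose strength grows as $r_0$ shrinks.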

The signal obtained at the detector from a piston step with monochromatic light is periodic: the diffraction pattern repeats for every piston step value $n\lambda$, where $n\in \mathbb {Z}$ and $\lambda$ is the wavelength used. By adding three more simultaneous measurements at different wavelengths it is possible to create a many-to-one correspondence between data and piston step values. The wavelengths used were $\lambda _0 = 700$ nm as the reference wavelength and three shorter ones, $\lambda _1=0.930\lambda _0$, $\lambda _2=0.860\lambda _0$ and $\lambda _3=0.790\lambda _0$. The theoretical capture range allowed by these wavelengths can be deduced through the method explained in [13]. How closely this limit is approached in practice depends on the measurement uncertainty and on the method itself.
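The many-to-one correspondence can be illustrated numerically: a single wavelength encodes at most the piston step modulo $\lambda$, so two steps exactly one $\lambda_0$ apart look identical at $\lambda_0$, but the three shorter wavelengths break the degeneracy. The helper below is illustrative only.

```python
import numpy as np

lam0 = 700.0                                  # reference wavelength, nm
lams = np.array([1.0, 0.930, 0.860, 0.790]) * lam0

def residuals(step_nm):
    """The most a monochromatic pattern can encode: the piston
    step modulo that wavelength."""
    return np.mod(step_nm, lams)

a = residuals(350.0)
b = residuals(350.0 + lam0)   # one lambda_0 away: same signal at lambda_0
```

The first components of `a` and `b` coincide, while the components from the shorter wavelengths differ, which is what makes the combined four-channel measurement unambiguous over a wide range.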

4. Learning process

In every training step, new simulated images like the one shown in Fig. 1 are produced, and a sequence with the 48 intersection subimages is created. An example of this $42\times 42$ pixel cutout is shown with a dashed line in the upper left part of the same figure. Half of the junctions have been flipped from left to right so that all of them share the same Y-shaped pattern orientation.

Fig. 1. Diffraction image from the simulations. The junction of three segments is shown with a dashed line.

The network is trained in a fully supervised manner, which means that correct labels must be supplied together with the input data. The labels are not the global piston values themselves but the optical path differences between any two neighbouring segments.

There are three piston step values to be detected in each intersection. Figure 2 shows on a straight line all possible piston step values $\Delta \phi$ between two adjacent segments within the capture range. The capture range is divided into intervals of size $\lambda /2$ called ambiguity ranges. Piston step values are decomposed into a numerical value $p\in [0,\lambda /2)$ and a categorical value associated with the ambiguity range. The value $p$ is the distance from the piston step value $\Delta \phi _0$ to the closest reference $p_0= n \lambda$, with $n\in \mathbb {Z}$. In Fig. 2 a shorter capture range is shown for illustrative purposes.
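Under these conventions, one possible decomposition for the $\pm 21\lambda_0$ capture range can be sketched as follows; the exact labelling of the 84 ambiguity ranges is an assumption.

```python
import numpy as np

LAM = 700.0   # reference wavelength lambda_0, nm

def decompose(step_nm):
    """Split a piston step in [-21*LAM, +21*LAM) into a categorical
    ambiguity-range index (here labelled 0..83; the labelling scheme
    is an assumption) and the numerical value p in [0, LAM/2): the
    distance to the closest reference n*LAM."""
    idx = int((step_nm + 21 * LAM) // (LAM / 2))   # which lambda/2 interval
    n = np.rint(step_nm / LAM)                     # closest reference multiple
    p = abs(step_nm - n * LAM)
    return idx, p
```

For example, a zero step falls in the central range with $p = 0$, while a step of $3\lambda_0/4$ falls one range higher, with $p$ measured back from the reference at $\lambda_0$.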

Fig. 2. Piston step value decomposition into numerical and categorical values.

The network architecture is split into two branches; see the diagram in Fig. 3 for clarity. One branch is trained to predict the ambiguity range the piston step lies in, i.e. a classification task. The other branch decides a floating-point value in the interval $[0, \lambda /2)$ within that ambiguity range, i.e. a regression task. The same procedure was used in [14]. Each branch contains a three-layer Convolutional Neural Network (CNN) [15] that extracts features from the intersection image. Each layer has 16 kernels and a ReLU activation function. The CNNs are followed by a one-layer Recurrent Neural Network (RNN) that uses a GRU unit [16] in a bidirectional scheme [17,18]. The recurrent neural network takes advantage of the strong relation between the piston step values at one intersection and the piston step values at all other intersections of the mirror. The output of the RNN is followed by linear units in the regression branch and a softmax unit in the classification branch.

Fig. 3. Schematic representation of the network architecture.

Figure 3 shows an unwrapped version of the network. All blocks in the figure with the same name share weights. The network has two losses that are minimized simultaneously with mini-batch gradient descent and the Adam update rule [19].

The network is fed with a sequence of pseudo four-color intersection images. The same piece of data goes to both branches of the network; each branch looks in the data for the patterns needed to fulfill the task it is being trained to perform.

The regression branch is trained by minimizing the mean squared error between the scores given by the network and the actual piston step values within one ambiguity range. It can be understood as the fine-tuning part of the prediction. In fact, this task can be achieved with the information contained in a single wavelength, and it shows better convergence properties than the classification branch.

Figure 4 shows the evolution of the regression branch mse for both the recurrent and the convolution approaches. The convolution approach suppresses the recurrent network part of the architecture design. Predictions with the recurrent approach achieve a mean squared error of $0.00027\lambda _0^2$ with respect to the reference wavelength $\lambda _0=700$ nm. That means $rms = 11.50$ nm, an improvement of $16.79 \%$ over the convolution approach.
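The quoted rms follows from the mean squared error as $rms = \sqrt{mse}\,\lambda_0$; the same conversion applied to the final global-piston error quoted later in this section recovers the 7.34 nm of the abstract.

```python
import numpy as np

lam0 = 700.0                                # reference wavelength, nm
rms_recurrent = np.sqrt(0.00027) * lam0     # regression branch: ~11.50 nm
rms_global = np.sqrt(0.00011) * lam0        # final global pistons: ~7.34 nm
```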

Fig. 4. Regression loss during training.

The classification branch is trained by minimizing the softmax loss between predictions and ground truth labels. The span of each ambiguity range is $\lambda /2$, and the piston steps between segments generated by the simulations are allowed to be within $\pm 21\lambda _0$. This means that every piston step can belong to one of 84 different ambiguity ranges.

Figure 5 shows how the accuracy of the classification branch evolves over training for both approaches. A $94\%$ accuracy was achieved with the convolution approach, whereas a $98\%$ accuracy was reached with the recurrent one, i.e. an error rate three times lower with the recurrent approach. This final accuracy is a hard cap for this particular architecture: the training process plateaus at this value.

Fig. 5. Classification accuracy during training.

When the data set is limited, a greater part of it is devoted to training the algorithm and the remaining data is used for testing. The goal in a supervised machine learning problem is to achieve good performance on the test set, which contains examples that the algorithm has not seen during training. In our scenario, new synthetic data samples are created at every training step. These data can be considered both training data and test data, since they are used in the learning process yet have never been seen by the algorithm before. In consequence, the accuracy achieved by the algorithm during training can also be considered a good representation of the model’s predictive power.

The network predicts six values for each intersection: three ambiguity ranges $C_1$, $C_2$ and $C_3$, which can take values from the set $\{1,\ldots ,84\}$, and three numerical values $p_1$, $p_2$ and $p_3$ within the range $[0,\lambda _0/2)$. These outcomes are combined to obtain the three actual piston steps among segments predicted by the network, $\Delta \phi _1$, $\Delta \phi _2$ and $\Delta \phi _3$, which can take values in the range $\pm 21\lambda _0$. The set of all piston step predictions for all adjacent segments forms an overdetermined linear system of equations, $\boldsymbol {A}\boldsymbol {x}=\boldsymbol {b}$, in which the unknowns are the global piston values measured with respect to one of the segments. We can find these global piston values by solving the system through singular value decomposition. Before that, however, it is crucial to check the consistency of the predictions. One way of doing so exploits the fact that the three piston step values of every junction must sum to zero: if the predicted piston steps for an intersection are $\Delta \phi _1=\phi _1-\phi _2$, $\Delta \phi _2=\phi _2-\phi _3$ and $\Delta \phi _3=\phi _3-\phi _1$, the consistency test states that:

$$\begin{aligned} \left| \Delta \phi_1 + \Delta \phi_2 + \Delta \phi_3 \right| \leq K. \end{aligned}$$

If the absolute value of the sum of the three predicted piston step values is above some threshold $K$, then the predictions of that intersection are not included in the matrix $\boldsymbol {A}$. The threshold $K$ must be a value close to the accuracy of the regression branch. The rank of the matrix $\boldsymbol {A}$ then encodes the number of mirror segments that can be resolved with SVD.
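The consistency filter and the global solve can be sketched as below; the segment indexing and the choice of segment 0 as the piston reference are assumptions, and `np.linalg.lstsq` performs an SVD-based least-squares solve of the overdetermined system.

```python
import numpy as np

def consistent(d1, d2, d3, K):
    """Consistency test of Eq. (1): the three predicted piston steps
    around a junction must sum to approximately zero."""
    return abs(d1 + d2 + d3) <= K

def solve_global_pistons(edges, steps, n_seg):
    """Assemble the overdetermined system A x = b from the surviving
    edge predictions steps[k] = phi_i - phi_j, then solve it with an
    SVD-based least-squares routine, fixing segment 0 as reference."""
    A = np.zeros((len(edges), n_seg))
    for row, (i, j) in enumerate(edges):
        A[row, i], A[row, j] = 1.0, -1.0
    x, *_ = np.linalg.lstsq(A[:, 1:], np.asarray(steps), rcond=None)
    return np.concatenate(([0.0], x))

# Toy example: 3 segments with true pistons [0, 5, -3] nm.
edges = [(0, 1), (1, 2), (2, 0)]
steps = [-5.0, 8.0, -3.0]                    # phi_i - phi_j per edge
pistons = solve_global_pistons(edges, steps, 3)
```

In the paper the system is much larger (up to three step predictions per junction over 48 junctions), but the structure is the same: one signed row per surviving prediction.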

Figure 6 plots the number of global piston values that can be solved from the predictions during training. At every step, the network analyses a whole sequence of junctions from a mirror instance like the one in Fig. 1. It needs only 1000 such images to be able to solve the global piston values for all 36 segments at once. The error rate decreases over time; after 2000 iterations, no single segment was left unresolved in any mirror instance presented to the algorithm.

Fig. 6. Number of global piston values that can be resolved during training.

The mean squared error can be computed between the 36 global piston values predicted by the network and the 36 ground truth values that were initially used in the simulations. That metric makes sense only when all segments can be resolved, so Fig. 7 shows these error values only when none of the segments was unresolved; this is why the first part of the graph is empty. Some outlier data points can be observed as well. They occur only at the beginning of the training, when some errors bypass the consistency check due to the general instability of the predictions. After 3000 iterations the system predictions reach a mean squared error of $0.00011\lambda _0^2$.

Fig. 7. Global piston error over training in logarithmic scale.

5. Conclusions and future work

This paper has shown a robust and reliable way of measuring global piston values in a segmented mirror. Several sources of error, such as atmospheric turbulence, tip-tilt residuals and polishing errors, were added to the simulations to test this robustness. The method needs only four visible wavelengths in order to operate, and no special hardware is required.

Once the network is fully trained the method is very fast, since predicting the piston values takes a single forward pass of the network, and it can be used at any time during the observation. The use of a recurrent neural network together with the consistency test allowed the algorithm to perform satisfactorily over a capture range as broad as $\pm 14.7$ µm. This result almost doubles our previous one, where only CNNs were used [14]. It might still not be enough for phasing segments after replacements, but it is suitable for keeping track of the nightly drift and for cross-checking the measurements of other methods [20]. The capture range could be broadened further by adding more wavelengths to the training.

The classification errors that we found were virtually zero after enough training steps. Moreover, these errors can be detected by the algorithm in advance, so it is known which segments are unresolved in a given trial.

The global piston values measured with this method have a mean $rms = 7.34$ nm, with a variability corresponding to a standard deviation of 5.5 nm. That result suffices for the adaptive optics system to work properly and meets other, more stringent science requirements [21,22]. All piston values referenced in this paper are measured at the wavefront.

Funding

Universidad de La Laguna.

Acknowledgements

This work was supported by Wooptix, a spinoff company of the Universidad de La Laguna.

Disclosures

The authors declare no conflicts of interest.

References

1. N. Yaitskova, “Adaptive optics correction of segment aberration,” J. Opt. Soc. Am. A 26(1), 59–71 (2009). [CrossRef]  

2. G. Chanan, C. Ohara, and M. Troy, “Phasing the mirror segments of the Keck telescopes II: the narrow-band phasing algorithm,” Appl. Opt. 39(25), 4706–4714 (2000). [CrossRef]

3. G. Chanan, M. Troy, F. Dekens, S. Michaels, J. Nelson, T. Mast, and D. Kirkman, “Phasing the mirror segments of the Keck telescopes: the broadband phasing algorithm,” Appl. Opt. 37(1), 140–155 (1998). [CrossRef]

4. S. Esposito, E. Pinna, A. Puglisi, A. Tozzi, and P. Stefanini, “Pyramid sensor for segmented mirror alignment,” Opt. Lett. 30(19), 2572–2574 (2005). [CrossRef]  

5. J. M. Rodriguez-Ramos and J. J. Fuensalida, “Piston detection of a segmented mirror telescope using a curvature sensor: preliminary results with numerical simulations,” in Optical Telescopes of Today and Tomorrow, vol. 2871 (International Society for Optics and Photonics, 1997), pp. 613–617.

6. V. G. Orlov, S. Cuevas, F. Garfias, V. V. Voitsekhovich, and L. J. Sanchez, “Co-phasing of segmented mirror telescopes with curvature sensing,” in Telescope Structures, Enclosures, Controls, Assembly/Integration/Validation, and Commissioning, vol. 4004 (International Society for Optics and Photonics, 2000), pp. 540–552.

7. P. C. McGuire, D. G. Sandler, M. Lloyd-Hart, and T. A. Rhoadarmer, “Adaptive optics: Neural network wavefront sensing, reconstruction, and prediction,” in Scientific Applications of Neural Nets, (Springer, 1999), pp. 97–138.

8. D. Li, S. Xu, D. Wang, and D. Yan, “Large-scale piston error detection technology for segmented optical mirrors via convolutional neural networks,” Opt. Lett. 44(5), 1170–1173 (2019). [CrossRef]  

9. X. Ma, Z. Xie, H. Ma, Y. Xu, G. Ren, and Y. Liu, “Piston sensing of sparse aperture systems with a single broadband image via deep learning,” Opt. Express 27(11), 16058–16070 (2019). [CrossRef]  

10. A. Schumacher and N. Devaney, “Phasing segmented mirrors using defocused images at visible wavelengths,” Mon. Not. R. Astron. Soc. 366(2), 537–546 (2006). [CrossRef]  

11. J. J. Fernández-Valdivia, A. L. Sedano, S. Chueca, J. S. Gil, and J. M. Rodríguez-Ramos, “Tip-tilt restoration of a segmented optical mirror using a geometric sensor,” Opt. Eng. 52(5), 056601 (2013). [CrossRef]

12. D. L. Fried, “Optical resolution through a randomly inhomogeneous medium for very long and very short exposures,” J. Opt. Soc. Am. 56(10), 1372–1379 (1966). [CrossRef]  

13. M. G. Löfdahl and H. Eriksson, “Algorithm for resolving 2π ambiguities in interferometric measurements by use of multiple wavelengths,” Opt. Eng. 40(6), 984–991 (2001). [CrossRef]

14. D. Guerra-Ramos, L. Díaz-García, J. Trujillo-Sevilla, and J. M. Rodríguez-Ramos, “Piston alignment of segmented optical mirrors via convolutional neural networks,” Opt. Lett. 43(17), 4264–4267 (2018). [CrossRef]  

15. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, (2012), pp. 1097–1105.

16. J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, “Empirical evaluation of gated recurrent neural networks on sequence modeling,” arXiv preprint arXiv:1412.3555 (2014).

17. M. Schuster and K. K. Paliwal, “Bidirectional recurrent neural networks,” IEEE Trans. Signal Process. 45(11), 2673–2681 (1997). [CrossRef]  

18. A. Graves, “Supervised sequence labelling,” in Supervised sequence labelling with recurrent neural networks, (Springer, 2012), pp. 5–13.

19. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980 (2014).

20. N. Yaitskova, F. Gonte, F. Derie, L. Noethe, I. Surdej, R. Karban, K. Dohlen, M. Langlois, S. Esposito, E. Pinna, M. Reyes, L. Montoya, and D. Terrett, “The active phasing experiment: Part i. concept and objectives,” in Ground-based and Airborne Telescopes, vol. 6267 (International Society for Optics and Photonics, 2006), p. 62672Z.

21. C. Cavarroc, A. Boccaletti, P. Baudoz, T. Fusco, and D. Rouan, “Fundamental limitations on earth-like planet detection with extremely large telescopes,” Astron. Astrophys. 447(1), 397–403 (2006). [CrossRef]  

22. P. Martinez, A. Boccaletti, M. Kasper, C. Cavarroc, N. Yaitskova, T. Fusco, and C. Vérinaud, “Comparison of coronagraphs for high-contrast imaging in the context of extremely large telescopes,” Astron. Astrophys. 492(1), 289–300 (2008). [CrossRef]  
