
Chromatic aberration correction employing reinforcement learning


Abstract

In fluorescence microscopy, a multitude of labels is used that bind to different structures of biological samples. These often require excitation at different wavelengths and lead to different emission wavelengths. The presence of different wavelengths can induce chromatic aberrations, both in the optical system and induced by the sample. These detune the optical system, as the focal positions shift in a wavelength-dependent manner, and ultimately decrease the spatial resolution. We present the correction of chromatic aberrations using an electrically tunable achromatic lens driven by reinforcement learning. The tunable achromatic lens consists of two lens chambers filled with different optical oils and sealed with deformable glass membranes. By deforming the membranes of both chambers in a targeted manner, the chromatic aberrations present in the system can be manipulated to tackle both systematic and sample-induced aberrations. We demonstrate chromatic aberration correction of up to 2200 mm and a shift of the focal spot positions of 4000 mm. To control this non-linear system with four input voltages, several reinforcement learning agents are trained and compared. The experimental results show that the trained agent can correct system- and sample-induced aberrations and thereby improve the imaging quality; this is demonstrated using a biomedical sample, in this case human thyroid tissue.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

By using fluorescence microscopy, different tissue structures in biological samples can be investigated which are usually invisible without labeling. To excite a structure of interest, a certain excitation wavelength $\lambda _{ex}$ is used, which causes the structure to emit fluorescent light with a longer wavelength $\lambda _{em}$ [1]. Often several labels are employed to mark different parts of the sample, so the focal spots of several structures may be shifted in a wavelength-dependent manner. The misalignment of the focal spots with respect to the wavelength is called chromatic aberration (CA) [2–5]. CAs reduce the resolution of images because structures emitting fluorescent light with slightly different emission wavelengths are out of focus, so their signal is blurred or even not detected at all. Furthermore, the excitation path might also be affected, so that the focal spots do not hit the structures under investigation properly. This makes the correction of CA important for various applications in biomedicine; it has been addressed in methods such as retinal imaging [6–8], dermoscopes [9] and transmission electron microscopy [10]. To correct CAs, either algorithms for post-processing [2,11] or achromatic doublet lenses [3,12] are commonly used. However, fixed-lens solutions such as achromatic doublet lenses or lens systems correct only for the systematic CA that they would induce themselves and do not correct for CAs induced by other parts of the system or by the sample.

Recently, an adaptive achromatic lens (AAL) was introduced, consisting of two fluid-filled lens chambers sealed off with deformable membranes [13]. This AAL enables tuning of the wavelengths for which the system shall be corrected. However, it comes at the cost of multiple input voltages and piezoelectric actuators, which introduce hysteresis into the behavior of the AAL. Therefore, we use reinforcement learning (RL), a machine learning technique, to control the AAL. In recent publications, RL has shown high potential as an alternative way to control adaptive optics for aberration correction [14–17]. Compared to other machine learning methods, e.g. deep learning, RL does not need a huge training dataset and is suitable for solving a sequence of decisions instead of only one input [18]. The principle of training an RL agent is illustrated in Fig. 1.

Fig. 1. Reinforcement learning mechanism (the given vectors are specific to the AAL; refer to section 3.1).

In RL, an agent is trained to solve a predefined task by using a set of actions over a certain sequence of steps. To train an agent for a specific task, a training environment is needed, usually a mathematical description that models the problem to solve [18]. Prior to training, the agent can only generate random actions, limited to the action space. During training, the environment generates new system observations based on the last actions of the agent. The agent then chooses a new action with respect to the system observation. In the learning process, the agent afterwards receives a reward based on the chosen action. The reward is a quantitative criterion through whose value the agent can learn whether the last action or set of actions was good or not. This interaction between agent and environment repeats until the agent learns a useful way of choosing actions based on the observations; this learned behavior is then called a policy [19].
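As a minimal illustration, the following Python sketch shows this interaction loop using the classic gym API [27] that the frameworks used later in this paper build on. The environment id is a stand-in, since the actual AAL environment (section 3.1) is custom.

```python
import gym

# Minimal sketch of the agent-environment loop of Fig. 1, using the classic
# gym API [27]. "Pendulum-v1" is a stand-in; the actual AAL environment is
# custom (section 3.1). Before training, actions are random samples.
env = gym.make("Pendulum-v1")
obs = env.reset()                                   # initial observation

done = False
while not done:
    action = env.action_space.sample()              # untrained: random action
    obs, reward, done, info = env.step(action)      # environment responds
env.close()
```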

2. Characterization of the adaptive achromatic lens

In this section we provide details of the AAL, the experimental setup and the results of its characterization.

2.1 Function principle

The AAL consists of two separate fluid-filled adaptive lenses, similar to the single lenses presented in [13,20]. Both lens chambers are filled with different optical oils and placed back-to-back as shown in Fig. 2. This leads to different refractive indices $n_{1}$, $n_{2}$ of the chambers filled with decalin and paraffin. The surface of each chamber can be deformed by two piezo-ceramic rings, thereby deforming the wavefront passing through the lens.

Fig. 2. Illustration of the adaptive achromatic lens containing two lens chambers and four piezo-ceramic rings. (a) 3D model. (b) Cross section.

Each surface has one ring placed on top (actuator voltages $V_{1},V_{3}$) and one underneath (actuator voltages $V_{2},V_{4}$). These piezo rings can be contracted or elongated by applying different voltages. As shown in [21], with two rings per surface it is possible to employ bending and buckling of the surface to shift the position of the focal spot. This shift, in combination with the different refractive indices of the optical fluids in the lens chambers, enables the AAL to realign the focal spots for different wavelengths and satisfy Eq. (1) [3]. Here, $f_{1}$ and $f_{2}$ are the focal lengths for wavelengths 1 and 2, each passing through the decalin and paraffin chambers; the CA is calculated as the difference of both.

$$CA = |f_{1} - f_{2}| \overset{!}{=} 0$$

For a correction of CA, this value has to be minimized until $f_1 \cong f_2$ holds. This is assumed to be fulfilled if $|f_1-f_2|\leq 2.2~\mu m$. This threshold corresponds to $0.0001\,\%$ of the maximum CA correction range of the AAL (see section 2.3).
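Expressed in code, the correction criterion of Eq. (1) with the above threshold reads, as a minimal sketch:

```python
def ca_corrected(f1_mm: float, f2_mm: float, tol_mm: float = 2.2e-3) -> bool:
    """Correction criterion of Eq. (1): |f1 - f2| <= 2.2 um.

    f1_mm, f2_mm: focal lengths for the two wavelengths in millimetres.
    The default tolerance of 2.2 um is 0.0001 % of the 2200 mm maximum
    CA correction range of the AAL (section 2.3).
    """
    return abs(f1_mm - f2_mm) <= tol_mm
```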

2.2 Experimental setup for characterization

For the characterization of the AAL, wavelength-multiplexed digital holography is used, which requires an object beam and a reference beam for each wavelength [22–24]. For the experiment, two wavelengths of $\lambda _1 = 532~nm$ (compact laser module CPS532, Thorlabs Inc., Newton, NJ) and $\lambda _2 = 635~nm$ (compact laser diode module LDM635, Thorlabs Inc., Newton, NJ) were chosen. A scheme of the experimental setup is shown in Fig. 3. The laser beams of both wavelengths are collimated by lens groups 1 and 2 and then combined by a beam splitter. The setup in Fig. 3 consists of three beam paths: two reference paths for the wavelengths $\lambda _{1}$ and $\lambda _{2}$ and one object path going through the AAL. To obtain a dual-wavelength hologram, it is necessary to separate the reference beams according to their wavelengths. This is realised by two filters (long pass FEL550 and short pass FES550, both from Thorlabs Inc., Newton, NJ) inside the reference paths. The 4f-system of lens group 3 (all lenses from Thorlabs Inc., Newton, NJ) projects the AAL with a magnification of $M = 0.5$ onto the camera (UI-3582LE, colour depth 12 bit and pixel pitch $2.2~\mu m$, IDS Imaging, Obersulm, Germany), so that one off-axis hologram is obtained for both wavelengths. Because the 4f-system (lens group 3) demagnifies the plane of the AAL onto the camera by a factor of 2, the effective pixel pitch in the AAL plane becomes $4.4~\mu m$, which yields a field of view in our setup of about $11264~\mu m \times 8448~\mu m$. Reference measurements of our system are provided in Supplement 1.

Fig. 3. Experimental setup to characterize the behavior of the adaptive achromatic lens (AAL). Plane A shows the deviation of the lateral beam positions, which leads to off-axis holography. Inset B presents an example of a captured hologram. The spatial separation of the spectra in Fourier space is shown in inset C. M = magnification; d = diameter; f = focal length.

Through the interference of object and reference beams on the camera, a hologram is captured that can afterwards be reconstructed numerically [22]. Here, we use two different wavelengths to obtain the CA of the current setup including the AAL. A simple solution for capturing the holograms of two different wavelengths simultaneously would be to use two separate cameras. However, this would lead to a complex setup requiring additional optical elements to separate the interfering beams. Instead, we separate the spectra in the Fourier plane as shown in inset C of Fig. 3. This separation requires off-axis holography, which shifts the spectra in Fourier space depending on the angle and fringe spacing of the hologram. In off-axis holography, a certain angle between reference and object beams is necessary. This is ensured by adjusting the lateral beam positions in plane A as shown in Fig. 3 [23].
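A minimal numerical sketch of this sideband filtering is given below; it assumes standard off-axis reconstruction with circular crop regions and does not reproduce the authors' exact reconstruction code. The carrier positions `centers` must be located in the spectrum beforehand.

```python
import numpy as np

def reconstruct_dual_wavelength(hologram, centers, radius):
    """Separate the two wavelength spectra of one off-axis hologram.

    hologram : 2D array (single camera frame containing both interference
               patterns, cf. inset B of Fig. 3)
    centers  : [(row, col), (row, col)] carrier positions of the two
               sidebands in the Fourier plane (inset C)
    radius   : crop radius around each carrier (assumed circular filter)
    """
    spectrum = np.fft.fftshift(np.fft.fft2(hologram))
    rows, cols = np.indices(hologram.shape)
    fields = []
    for (r0, c0) in centers:
        mask = (rows - r0) ** 2 + (cols - c0) ** 2 <= radius ** 2
        sideband = np.where(mask, spectrum, 0)          # isolate one spectrum
        fields.append(np.fft.ifft2(np.fft.ifftshift(sideband)))
    return fields  # complex object fields for lambda_1 and lambda_2
```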

2.3 Characterization results

Here, we provide the characterization results derived from the experimental setup presented in section 2.2. To avoid ambient influences, room temperature and pressure were held constant during all experiments ($20.5~^\circ C$, $1~bar$).

The AAL has four control voltages: $V_{1}$ and $V_{2}$ actuate the membrane of the decalin-filled lens chamber, and $V_{3}$ and $V_{4}$ actuate the membrane of the paraffin-filled lens chamber. The voltages are limited to the range $-30~V$ to $120~V$ in order not to damage the piezo-ceramic actuators.

To investigate the capability of each piezo ring of the AAL, a single voltage was tuned from $0~V$ to $120~V$, then to $-30~V$ and back to $0~V$, while the three other voltages remained at $0~V$. The results of this measurement series are shown in Fig. 4; details of the calculations used to derive the focal lengths are provided in Supplement 1. It can be seen that the focal power has a tuning range of $1.2~Dpt$. Furthermore, the overall lens behavior when tuning $V_1$ is similar to that when tuning $V_4$, and the same holds for $V_2$ and $V_3$. This can be explained by the positions of the actuated piezo rings: $V_1$ and $V_4$ drive the rings that are glued on different membranes but face the same side (laser side), while the rings of $V_2$ and $V_3$ face the camera side of the setup. The CA shown in Fig. 4 is derived from the difference of the foci, $CA = f_{635~nm}-f_{532~nm}$. This leads to a maximum range for CA correction of $\Delta CA_{max} = 2200~mm$ (see Fig. 4, tuning voltage 4).

Fig. 4. Tuning of a single voltage from $0~V$ to $+120~V$ (step 24), then to $-30~V$ (step 54) and back to $0~V$ in steps of $5~V$, while all other voltages remain at $0~V$.

Furthermore, the membrane positions for maximum CA introduction were investigated. Of the two operation modes of the lens, the bending mode was chosen for this experiment, because the other mode, the buckling mode, additionally induces spherical aberrations. For maximum CA, both membranes need to bend in the same direction; this is achieved by applying the voltages as shown in Fig. 5.

Fig. 5. Measured focal lengths and derived chromatic aberrations for the extremal positions of the lens membranes with bending in both directions.

For bending in the two directions, contrary behaviors of the measured focal length are observed. However, bending to one side (see Fig. 5, left) yields double the CA range compared to bending to the other side (see Fig. 5, right). The overall maximum range of CA correction was the same as in the first characterization study, where only one voltage was tuned: $\Delta CA_{max} = 2200~mm$. The focal length shifting range also had the same maximum value of $4000~mm$ as in the single-voltage tuning study.

3. Control by reinforcement learning

Since the AAL is a multi-input system, it is hard to control with classical algorithms. Furthermore, the conventionally used iterative algorithms may lead to photo-toxicity and photo-bleaching due to prolonged exposure of the sample [25], which also increases measurement times. In addition, a system calibration before starting the measurement is often necessary and prolongs the measurement time further. RL has the potential to overcome these issues.

In this work, the AAL behavior was formulated as a mathematical description and used to build a training environment (refer to Fig. 1 in section 1). Notably, a random system-induced CA was added to the environment in order to train RL agents with good generalization performance. A chosen RL agent (see Fig. 6) was trained in this environment. After training, the trained agent was tested on the experimental setup introduced in section 2.2; the results from this phase are presented in section 3.4.

Fig. 6. Agent architecture.

3.1 Design of training and environment

To train an RL agent to control the AAL, and thereby dynamically correct chromatic aberrations, a simulation-based training environment was chosen. Referring to Fig. 1, the behavior of the AAL was modelled as the focal power at a certain wavelength in dependence of the applied voltages $V_1$ to $V_4$, the lens parameters (e.g. aperture, optical fluid) and a random system-induced CA.

The current CA of the whole system is the first element of the agent's observation; the other four elements are the applied voltages $V_1$ to $V_4$. In total, five values represent the observation of the system, resulting in Eq. (2).

$$obs = [CA, V_1, V_2, V_3, V_4]$$

The four actions are the changes of the voltages relative to the previous voltages. Both action and observation space are chosen to be continuous.
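In gym notation, these spaces could be declared as follows; the per-step voltage-change limit of ±5 V is an illustrative assumption, as the paper does not state the action bounds.

```python
import numpy as np
from gym import spaces

# Continuous observation and action spaces in gym notation. The +/-5 V
# per-step action bound is an illustrative assumption (not stated in the
# paper); voltage limit violations are handled via the reward, Eq. (3).
observation_space = spaces.Box(
    low=-np.inf, high=np.inf, shape=(5,), dtype=np.float32
)  # obs = [CA, V1, V2, V3, V4], Eq. (2)

action_space = spaces.Box(
    low=-5.0, high=5.0, shape=(4,), dtype=np.float32
)  # actions = voltage changes [dV1, dV2, dV3, dV4] w.r.t. the last step
```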

The reward function $r = r_1+r_2+r_3$ consists of three parts: a penalty for voltage boundary violations (see Eq. (3) with $i \in \left \{ 1,2,3,4 \right \}$), a reward or penalty for reducing or increasing the chromatic aberration compared to the last step (see Eq. (4)), and a possible reward once the chromatic aberration is reduced below the threshold. Each step starts with initial rewards of $r_1 = r_2 = r_3 = 0$. If no voltage boundary is violated, $r_1$ stays $0$ at this step. As Eq. (3) shows, each voltage that violates the boundaries contributes $-1$; thus, $r_1$ is the sum of the penalties resulting from the four voltages.

$$\begin{aligned} r_{1} & =\sum_{i=1}^{4} r_{1}^{i}, \\ r_{1}^{i} & =\begin{cases} -1 & \text{if } V_{i}<-30 \text{ or } V_{i}>120 \\ 0 & \text{else} \end{cases}, \quad i\in \left \{ 1,2,3,4 \right \} \end{aligned}$$

Function $r_2$ is determined by the decrease or increase of the chromatic aberration. Thereby, $r_2$ quantifies the progress of the agent:

$$r_2 = \begin{cases} 1 & \quad \text{if } |CA_{current}|<|CA_{last}| \\ -1 & \quad \text{if } |CA_{current}|>|CA_{last}| \\ 0 & \quad \text{if } |CA_{current}|=|CA_{last}| \end{cases}$$

In Eq. (4), $CA_{current}$ denotes the chromatic aberration at the current step and $CA_{last}$ the chromatic aberration at the last step. If the agent lowers the remaining chromatic aberration below the threshold of $0.001~Dpt$, which is defined in the training code as the mark that the chromatic aberration approximates zero, it receives an immediate reward of $r_3=500$ and the episode ends at once. If the agent does not achieve this goal, the training episode ends after 500 steps. The total number of epochs for training was set to 500,000. After every 10,000 epochs, the agent was evaluated. The training was performed on an Nvidia RTX A6000 GPU using the Python frameworks stable-baselines3 [26] and gym [27]. One training session took about 20 minutes, with eight training environments running in parallel to speed up the process.
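A compact sketch of this reward logic, directly following Eqs. (3)–(4) and the thresholds stated above (not the authors' original training code), could look as follows:

```python
def reward(voltages, ca_current, ca_last, threshold=0.001):
    """Reward r = r1 + r2 + r3 following Eqs. (3)-(4) and the text.

    voltages  : the four applied voltages V1..V4
    ca_*      : chromatic aberration (Dpt) at the current / last step
    threshold : 0.001 Dpt success threshold from the training code
    Returns (reward, done); the episode ends once |CA| < threshold.
    """
    r1 = sum(-1 for v in voltages if v < -30 or v > 120)   # boundary penalty
    if abs(ca_current) < abs(ca_last):
        r2 = 1                                             # CA reduced
    elif abs(ca_current) > abs(ca_last):
        r2 = -1                                            # CA increased
    else:
        r2 = 0
    done = abs(ca_current) < threshold
    r3 = 500 if done else 0                                # success bonus
    return r1 + r2 + r3, done
```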

The agent architecture used in the training is illustrated in Fig. 6. It is a recurrent proximal policy optimization (PPO) agent [28] with one critic and one long short-term memory (LSTM) actor. In Fig. 6, a feature extractor serving as input layer feeds the observation with five elements, namely the current CA and the last four voltages, to an LSTM actor and a critic. The LSTM actor passes the signals through two LSTM layers, which handle the intrinsic hysteresis of the piezo actuators of the AAL. Accepting the same input as the LSTM actor, the critic passes the signals through one linear layer, also known as a fully connected layer, and one rectified linear unit (ReLU) layer, which serves as activation layer. A multilayer perceptron (MLP) extractor concatenates the outputs of the LSTM actor and the critic with a 64-neuron linear layer and forwards the signals to a single-neuron ReLU layer. The output of this MLP extractor is passed through a value net to evaluate the current policy. In parallel, the same output is forwarded to a two-layer policy net. An action net with a four-neuron linear layer receives the signal from the policy net and generates the four actions (the four voltage changes), which are fed back to the environment at the following step. The neuron numbers of all layers are marked in Fig. 6.
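A training sketch under these assumptions is shown below. The paper cites stable-baselines3 [26]; recurrent PPO is provided by its companion package sb3-contrib, whose `RecurrentPPO` with an `MlpLstmPolicy` only approximates the actor-critic-with-LSTM structure of Fig. 6. `make_aal_env` is a hypothetical factory for the custom AAL environment, and the hyperparameter values shown are illustrative rather than the tuned values of Table 1.

```python
from sb3_contrib import RecurrentPPO
from stable_baselines3.common.vec_env import SubprocVecEnv

# make_aal_env: hypothetical factory returning one instance of the custom
# AAL training environment described in section 3.1.
env = SubprocVecEnv([make_aal_env for _ in range(8)])  # 8 parallel envs

model = RecurrentPPO(
    "MlpLstmPolicy",        # LSTM actor to cope with the piezo hysteresis
    env,
    learning_rate=3e-4,     # illustrative; tuned values are in Table 1
    gamma=0.99,             # discount factor, cf. Eq. (5)
    clip_range=0.2,         # keeps the new policy close to the last one
    verbose=1,
)
model.learn(total_timesteps=500_000)
model.save("ppo_aal")
```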

3.2 Training results

To find an optimal set of hyperparameters, different combinations were tested and compared. An overview of the tuned parameters and their corresponding values is given in Table 1. In an RL context, the training process can be regarded as a process of updating the weights of the neurons in the agent architecture, thus influencing the policy of the agent. The learning rate $\eta$ controls the influence of the last step on the update of the neuron weights. Its value interval is $\eta \in [0,1]$: a value of 1 means the old weights are totally discarded and replaced by the new ones; a value of 0 means the weights are not updated at all. The clipping range is used to keep the updated policy close to the last policy, which avoids abnormal jumps in the changes of a policy. Another important measure is the discount factor $\gamma$, which balances long-term against short-term reward as shown in Eq. (5). Its value interval is $\gamma \in [0,1]$: a value of 1 means only the long-term reward is considered; a value of 0 means only the short-term reward is considered [19,26].

$$G_t = \displaystyle\sum_{k=0}^{\infty} \gamma^k R_{t+k+1}$$

Here, $R_t$ is the reward at time $t$ and $G_t$ is the overall return in a Markov process, i.e. the sum of the short-term reward and the expected long-term rewards in the future [18].
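For a finite episode, $G_t$ can be evaluated directly; the following sketch uses an invented reward sequence purely for illustration:

```python
def discounted_return(rewards, gamma):
    """G_t of Eq. (5) for t = 0 over a finite reward sequence."""
    g = 0.0
    for k, r in enumerate(rewards):        # r corresponds to R_{t+k+1}
        g += (gamma ** k) * r
    return g

# gamma -> 1 weights long-term reward, gamma -> 0 only the immediate reward:
print(discounted_return([-1, -1, 1, 500], gamma=0.99))   # ~484.14
```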

Table 1. Tuned hyperparameters for different training sessions.

Figure 7 shows the average reward and episode length over all epochs of the training process. It can be seen that the low learning rate of agent PPO_1 led to unsatisfactory results, because it was hardly able to achieve a positive reward or to shorten the episode length during training. Agent PPO_5 was also not able to achieve the same reward at the end of the training as the other agents; the excessively large discount factor might be the reason for this failure. In comparison, agent PPO_4 with a discount factor of 0.99, larger than 0.95 and slightly smaller than 0.995, performed much better. Therefore, PPO_1 and PPO_5 were not considered further. All other agents successfully maximized the reward during training. However, agents PPO_2 and PPO_4 needed more epochs to do so than agent PPO_3.

Fig. 7. Agent training with tuned hyperparameters.

Considering the training curves of PPO_2, PPO_3 and PPO_4 (see Fig. 7), all six curves show constant small ripples, but the overall trends of both curves of PPO_4 fluctuate more drastically, for example from epoch 5000 to 13,000 and at around 24,000, 28,000, 40,000 and 49,000 in the episode-length curve, and from 0 to 10,000 in the reward curve. In other words, agent PPO_4 shows greater volatility and instability than agents PPO_2 and PPO_3.

In addition to the training curves, the continuous evaluation alongside training (see Fig. 8) gives an estimate of the agent performance. Here, the success rate, episode length and cumulative reward are used as merit functions. Episode length and reward are calculated in the same way as in the training process; however, during evaluation no backpropagation and no policy update are carried out. The success rate $s$ over $n$ episodes is defined as the fraction of successful episodes among all episodes of the evaluation (see Eq. (6)).

$$s = \frac{\sum n_{success}}{\sum n}$$

From the evaluation results in Fig. 8, it can be seen clearly that agents PPO_1 and PPO_5 were not able to learn a promising policy in time. Agents PPO_2 and PPO_4 again show higher volatility in reward and episode length, as well as in the success rate. Hence, agent PPO_3 was chosen as the best agent and used for further simulations and a validation run on the experimental setup introduced in section 2.2.

Fig. 8. Evaluation of trained agents with tuned hyperparameters.

3.3 Simulation results

For the first test, the best agent (PPO_3) was evaluated in the simulation of the AAL. The agent ran one episode in the same environment as during training, but with frozen weights in the actor and critic nets. The results are illustrated in Fig. 9: all four actuator voltages start at $0~V$ before the first step and are then changed according to the policy of the trained agent. The lower plot in Fig. 9 shows the resulting CA for each step of the simulation episode. The episode starts with a CA of $0.071~Dpt$, the same value as in the experimental setup with a human thyroid slide in section 3.4, and the CA was reduced to $0.00016~Dpt$.
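A sketch of such a frozen-weights rollout with sb3-contrib's recurrent PPO interface is shown below; `AALEnv` is a hypothetical class standing in for the simulated AAL environment.

```python
import numpy as np
from sb3_contrib import RecurrentPPO

# AALEnv: hypothetical class standing in for the simulated AAL environment.
model = RecurrentPPO.load("ppo_aal")      # weights stay frozen, no learning
env = AALEnv()

obs = env.reset()
lstm_states = None                        # internal LSTM memory of the actor
episode_start = np.ones((1,), dtype=bool) # marks the start of the episode
done = False
while not done:
    action, lstm_states = model.predict(
        obs, state=lstm_states, episode_start=episode_start,
        deterministic=True,               # follow the learned policy only
    )
    obs, reward, done, info = env.step(action)
    episode_start = np.zeros((1,), dtype=bool)
```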

Fig. 9. Detailed validation result of the PPO_3 agent over one episode based on simulations, after completing the training. It took 11 steps to reduce the remaining chromatic aberration to $0.00016~Dpt$.

As expected from a well-trained agent, the voltage limits (upper limit: $+120~V$, lower limit: $-30~V$) are not violated and there are no voltage jumps between steps. Furthermore, the CA curve (lower plot in Fig. 9) shows a smooth decrease, and the episode stops once the CA is reduced to the desired range of $\pm 0.001~Dpt$.

3.4 Experimental results

To prove the achromatic ability of the trained agent in the real world, the trained agent PPO_3 was tested on the optical setup from section 2.2. To this end, the axial focal spot position obtained from the captured hologram was used as the first value of the observation defined in Eq. (2). The other values of the observation are the voltages applied to the AAL; they are calculated directly from the actions, since all voltages start at $V_i = 0$ at the first step.

From the experimental result (see Fig. 10), the trained agent was able to correct the initial chromatic aberration of more than $0.07~Dpt$ to a value below $0.0001~Dpt$ within 3 steps. This correction was even faster than in the simulation after finishing the training (see Fig. 9). Generally speaking, the real CA in the experimental setup was reduced faster and with smaller voltages than in the previous simulation based on the modeled environment. This might be caused by an imperfect modeling of the system and AAL behavior. However, these differences do not impair the capability of the agent, which is still able to minimize the CA to a range of $|CA|<0.0001~Dpt$. Compared with the chromatic aberration of a commercial lens, for example the LA1484-A by Thorlabs, which has a focal power difference between $532~nm$ and $635~nm$ laser light of around $0.032362~Dpt$, calculated with the simplified focal-power relation in Eq. (7) (here $f$ denotes the focal power in diopters, $n$ the refractive index and $R$ the radius of curvature of the lens), our AAL can reduce the CA below this value.

$$f=\frac{n-1}{R}.$$
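This figure can be roughly cross-checked as follows; the radius of curvature and the N-BK7 refractive indices used here are assumed literature values, not taken from the paper, so the result only reproduces the order of magnitude.

```python
# Rough cross-check of the quoted focal-power difference via Eq. (7) for a
# plano-convex N-BK7 lens such as the LA1484-A. Radius and refractive
# indices are assumed literature values (not from the paper).
R = 0.155                       # radius of curvature in m (assumed)
n_532, n_635 = 1.5195, 1.5151   # N-BK7 indices at 532 nm / 635 nm (assumed)

P_532 = (n_532 - 1) / R         # focal power in Dpt, Eq. (7)
P_635 = (n_635 - 1) / R
print(abs(P_532 - P_635))       # ~0.028 Dpt, same order as the quoted value
```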

Fig. 10. Experimental result of the PPO_3 agent over one episode on the optical lab setup; after 3 steps the remaining absolute chromatic aberration was reduced below $0.0001~Dpt$.

Furthermore, the correction of CA improves brightness, image contrast and resolution, as shown in Fig. 11(A). Compared to the contrast value of 212 of the image without correction, the image after correction shows a contrast value of 235. In (e, f, g) of Fig. 11(A), the green parts are clearly reduced compared to (b, c, d), which indicates a significant reduction of the lateral chromatic aberration.

Fig. 11. (A) Images of a stained human thyroid sample before and after the CA correction. (a) General view of the human thyroid sample before the CA correction. (b, c, d) Zoomed-in sections with more details before the CA correction. (e, f, g) Zoomed-in sections after the CA correction, corresponding to (b), (c) and (d) respectively. (B) Comparison of line profiles in the RGB channels at the same position in (d) and (g). The analyzed parts are two lines with dimensions of 210$\times$1 pixels along the length (210 pixels) indicated in (d) and (g).

In Fig. 11(B), more peaks and troughs are visible in the line plots of the red and green channels, and they are shifted to almost the same positions (marked with vertical dotted lines), which indicates that the intensity distributions of the different wavelengths become similar. In other words, this is a sign that the foci of both wavelengths lie closer together on the image plane after the correction than before, i.e. an effective longitudinal CA correction. In particular, all three plots show a new peak in all color channels at pixel 170 after the correction. Before the correction this peak was shifted, e.g. in the red channel to pixel 150, which also indicates a correction of the lateral CA. In summary, although the training goal was based on the correction of longitudinal CA, the result shows both lateral and longitudinal CA correction to some extent. The sample used in this application example was a stained human thyroid microscope slide (item 316700 from Carolina Biological Supply Company, Burlington, VT, US). The sample was placed directly in front of the AAL, in the focal plane of the 4f-system, to image it onto the camera.
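The line-profile extraction behind Fig. 11(B) can be sketched as follows; the file name and line coordinates are placeholders.

```python
import numpy as np
import imageio.v2 as iio

# Sketch of the line-profile analysis of Fig. 11(B): a 210x1 pixel line is
# extracted per colour channel at the same position in the images before
# and after correction. File name and line coordinates are placeholders.
img = iio.imread("thyroid_after_correction.png")   # H x W x 3 RGB image
row, col0, length = 100, 50, 210                   # assumed line position
profiles = {c: img[row, col0:col0 + length, i].astype(float)
            for i, c in enumerate("RGB")}
# Coinciding peak positions in the R and G profiles indicate that the foci
# of both wavelengths overlap on the image plane after the correction.
```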

In addition to CA, spherical aberrations also exist in the system, and they are not negligible in magnitude (see Fig. 12). Before correction, the AAL is nearly an aplanatic lens with approximately plane surfaces. When voltages are applied to the AAL, the lens surfaces begin to bend and their curvatures are altered. This induces spherical aberration [21]. However, the spherical aberration was not included in the environment for the agent training. Hence, the aim of the agent was formulated independently of the spherical aberration.

Fig. 12. Spherical aberrations during the experimental validation in the lab.

4. Discussion

Since the training simulation was based only on the theoretical model of the AAL, aberrations other than CAs were omitted. As there are clearly spherical aberrations and probably others as well, the simulation could be enhanced by adding these influences to the training and also correcting for them.

Furthermore, no temperature or ambient pressure changes were included in the simulation. However, this does not pose a problem, since the trained agent ultimately reacts to the observation of the real system, which already includes these influences. Moreover, temperature and ambient pressure do not affect the overall aim of correcting the aberrations.

In Fig. 11(B), larger values can be observed in the blue channel than in the green one, although the sample was illuminated only with green and red laser light. This could be caused by background scattering from the surroundings or by changes of the light by the sample. The purplish-red instead of ideally red image in Fig. 11(A) may be secondary evidence of this. This behavior has to be investigated further in the future in order to improve the AAL and to adapt it to other wavelengths besides $532~nm$ and $635~nm$.

The AAL can be considered as an optical subsystem within a larger optical system such as a microscope. This is due to the ability of the AAL to deform the wavefront of light in a slightly different way for different wavelengths. Such a deformation then affects the imaging ability of the whole system. However, a successful integration into a system always depends on the control and on the position of the camera used to capture the observations. We showed that training based only on the optical model of the AAL and a random CA leads to a working RL agent, which can control an optical system that was not modelled in the training environment. We presume this is because the random CA in the training helped the agent to develop good generalization abilities. Nevertheless, training based on different optical models as training environments should be investigated in future work. Furthermore, the AAL is currently limited to the correction of two wavelengths ($532~nm$ and $635~nm$); by adding a third lens chamber, as in apochromats, it might be possible to correct for more wavelengths.

5. Conclusion

The control of an adaptive achromatic lens was presented and its potential for chromatic aberration correction was investigated. The lens enables the correction of lateral and axial chromatic aberrations and can thereby improve contrast and image quality in microscopy, which is crucial for biomedical applications. Furthermore, we reported details about the training, validation and testing of a reinforcement learning agent that controls the adaptive achromatic lens for chromatic aberration correction. The correction of chromatic aberration induced by a biological sample was demonstrated to prove the ability of the trained agent to correct not only system-induced but also sample-induced aberrations. In the experimental test, the chromatic aberration was reduced from $0.071~Dpt$ to below $0.0001~Dpt$. Thus, in the future this technique can be transferred to other relevant biomedical applications, where the simultaneous investigation of structures with different fluorescent stains offers a time advantage.

Funding

Deutsche Forschungsgemeinschaft (Cz 55/32-2).

Acknowledgments

The authors thank E. Scharf and B. Krug (both from the Chair of Measurement and Sensor System Technique at TU Dresden) for assistance with designing and manufacturing the electronic control unit for the adaptive achromatic lens. Furthermore, we want to thank H. Gowda and U. Wallrabe from the Department of Microsystems Engineering at the University of Freiburg for providing the adaptive achromatic lens.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request. Source code for RL agent training, simulation and validation presented in this paper are available in this GitLab repository [29].

Supplemental document

See Supplement 1 for supporting content regarding the calculation of the focal length based on the obtained hologram and reference measurements of the experimental system.

References

1. J. W. Lichtman and J.-A. Conchello, “Fluorescence microscopy,” Nat. Methods 2(12), 910–919 (2005). [CrossRef]  

2. J. Chang, H. Kang, and M. G. Kang, “Correction of axial and lateral chromatic aberration with false color filtering,” IEEE Trans. on Image Process. 22(3), 1186–1198 (2013). [CrossRef]  

3. F. Jenkins, Fundamentals of optics (McGraw-Hill, New York, 2001).

4. W. Smith, Modern optical engineering : the design of optical systems (McGraw Hill, New York, 2000), 3rd ed.

5. H. Gross, Handbook of optical systems, vol. 3 (Wiley-VCH, Weinheim, 2005).

6. E. J. Fernández, A. Unterhuber, B. Považay, B. Hermann, P. Artal, and W. Drexler, “Chromatic aberration correction of the human eye for retinal imaging in the near infrared,” Opt. Express 14(13), 6213–6225 (2006). [CrossRef]  

7. P. A. Howarth and A. Bradley, “The longitudinal chromatic aberration of the human eye, and its correction,” Vision Res. 26(2), 361–366 (1986). [CrossRef]  

8. X. Jiang, J. A. Kuchenbecker, P. Touch, and R. Sabesan, “Measuring and compensating for ocular longitudinal chromatic aberration,” Optica 6(8), 981 (2019). [CrossRef]  

9. P. Wighton, T. K. Lee, H. Lui, D. McLean, and M. S. Atkins, “Chromatic aberration correction: an enhancement to the calibration of low-cost digital dermoscopes,” Ski. Res. Technol. 17(3), 339–347 (2011). [CrossRef]  

10. B. Freitag, S. Kujawa, P. Mul, J. Ringnalda, and P. Tiemeijer, “Breaking the spherical and chromatic aberration barrier in transmission electron microscopy,” Ultramicroscopy 102(3), 209–214 (2005). [CrossRef]  

11. J. T. Korneliussen and K. Hirakawa, “Camera processing with chromatic aberration,” IEEE Trans. on Image Process. 23(10), 4539–4552 (2014). [CrossRef]  

12. F. L. Pedrotti, L. M. Pedrotti, and L. S. Pedrotti, Introduction to Optics (Cambridge University Press, Cambridge, 2017), 3rd ed.

13. H. G. B. Gowda, M. C. Wapler, and U. Wallrabe, “Tunable doublets: piezoelectric glass membrane lenses with an achromatic and spherical aberration control,” Opt. Express 30(26), 46528–46540 (2022). [CrossRef]  

14. H. Ke, B. Xu, Z. Xu, L. Wen, P. Yang, S. Wang, and L. Dong, “Self-learning control for wavefront sensorless adaptive optics system through deep reinforcement learning,” Optik 178, 785–793 (2019). [CrossRef]  

15. B. Pou, F. Ferreira, E. Quinones, D. Gratadour, and M. Martin, “Adaptive optics control with multi-agent model-free reinforcement learning,” Opt. Express 30(2), 2991–3015 (2022). [CrossRef]  

16. J. Nousiainen, C. Rajani, M. Kasper, and T. Helin, “Adaptive optics control using model-based reinforcement learning,” Opt. Express 29(10), 15327–15344 (2021). [CrossRef]  

17. K. Hu, Z. X. Xu, W. Yang, and B. Xu, “Build the structure of wfsless ao system through deep reinforcement learning,” IEEE Photonics Technol. Lett. 30(23), 2033–2036 (2018). [CrossRef]  

18. E. Bilgin, Mastering Reinforcement Learning with Python (Packt Publishing Ltd., Birmingham, United Kingdom, 2020), 1st ed.

19. P. Winder, Reinforcement Learning (O’Reilly Media, Inc., Boston, MA, 2021), 1st ed.

20. K. Philipp, A. Smolarski, N. Koukourakis, A. Fischer, M. Stürmer, U. Wallrabe, and J. W. Czarske, “Volumetric HiLo microscopy employing an electrically tunable lens,” Opt. Express 24(13), 15029–15041 (2016). [CrossRef]  

21. K. Philipp, F. Lemke, M. C. Wapler, U. Wallrabe, N. Koukourakis, and J. W. Czarske, “Spherical aberration correction of adaptive lenses,” in Adaptive Optics and Wavefront Control for Biological Systems III, vol. 10073, T. G. Bifano, J. Kubby, and S. Gigan, eds., International Society for Optics and Photonics (SPIE, 2017), pp. 7–13.

22. N. Verrier and M. Atlan, “Off-axis digital hologram reconstruction: some practical considerations,” Appl. Opt. 50(34), H136 (2011). [CrossRef]  

23. J. Kühn, T. Colomb, F. Montfort, F. Charrière, Y. Emery, E. Cuche, P. Marquet, and C. Depeursinge, “Real-time dual-wavelength digital holographic microscopy with a single hologram acquisition,” Opt. Express 15(12), 7231 (2007). [CrossRef]  

24. T. K. Kreis, Handbook of Holographic Interferometry (John Wiley & Sons, Weinheim, Germany, 2004), 1st ed.

25. J. Icha, M. Weber, J. C. Waters, and C. Norden, “Phototoxicity in live fluorescence microscopy, and how to avoid it,” BioEssays 39(8), 1700003 (2017). [CrossRef]  

26. A. Raffin, A. Hill, A. Gleave, A. Kanervisto, M. Ernestus, and N. Dormann, “Stable-baselines3: Reliable reinforcement learning implementations,” Journal of Machine Learning Research 22(268), 1–8 (2021).

27. G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba, “Openai gym,” (2016).

28. J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” (2017).

29. K. Schmidt, “RJ with AAL,” Gitlab (2017). https://gitlab.hrz.tu-chemnitz.de/kasc102c--tu-dresden.de/rl_with_aal

