
Inferring the solution space of microscope objective lenses using deep learning

Open Access

Abstract

Lens design extrapolation (LDE) is a data-driven approach to optical design that aims to generate new optical systems inspired by reference designs. Here, we build on a deep learning-enabled LDE framework with the aim of generating a significant variety of microscope objective lenses (MOLs) that are similar in structure to the reference MOLs, but with varied sequences—defined as a particular arrangement of glass elements, air gaps, and aperture stop placement. We first formulate LDE as a one-to-many problem—specifically, generating varied lenses for any set of specifications and lens sequence. Next, by quantifying the structure of a MOL from the slopes of its marginal ray, we improve the training objective to capture the structures of the reference MOLs (e.g., Double-Gauss, Lister, retrofocus, etc.). From only 34 reference MOLs, we generate designs across 7432 lens sequences and show that the inferred designs accurately capture the structural diversity and performance of the dataset. Our contribution answers two current challenges of the LDE framework: incorporating a meaningful one-to-many mapping, and successfully extrapolating to lens sequences unseen in the dataset—a problem much harder than the one of extrapolating to new specifications.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Can we extract and exploit the features common to useful lens designs? Traditional lens design approaches do not fully harness the wealth of information contained in lens design databases. Such databases are typically used only to select a starting point for optimization; provided at least one entry matches the desired lens configuration, only the selected entry contributes to the optimization process at any given time. As a data-driven approach to lens design [1–4], lens design extrapolation (LDE) aims to make better use of such databases by extrapolating them to generate new designs. LDE-based methods can generate high-quality starting points to be used in conjunction with traditional approaches such as evolutionary algorithms [5–7] and other global optimization methods [8], local search [9,10], and saddle point construction [11].

Deep learning-enabled LDE approaches [12–14] train a deep neural network (DNN) model that learns the mapping from the desired input specifications to all required lens variables. The model is trained to reproduce the reference designs contained in the dataset (supervised training), and to extrapolate this knowledge by jointly minimizing an optical loss function (OLF) applied on inferred designs across a wide variety of specifications (unsupervised training). A key idea in deep learning-enabled LDE is to use a dynamic model [13,14]: by learning a shared representation for all lens sequences (or lens configurations), the model enables knowledge transfer across sequences. As an example, a dynamic model could learn to output designs with an even number of glass elements that are extrapolated from reference designs with an odd number of glass elements.

In this work, we build on deep learning-enabled LDE [12–14] (Sec. 2.) and narrow our focus to moderate-NA microscope objective lenses (MOLs). To this end, we elaborate a unified framework (Sec. 3.1) for both inferred and reference MOLs—the latter composed of 34 exemplars from the MOL database of [15–17]. Here, we aim to extrapolate not only across new specifications but also across new sequences, while inferring designs that are similar in structure to the reference MOLs. We train the model to extrapolate across 7432 sequences of 5–10 glass elements. In contrast, prior work [12–14] did not fully exploit the knowledge transfer capabilities of dynamic models: the only lens sequences considered were those of the reference designs.

This change exposes a limitation of the aforementioned semi-supervised training objective: the trained model favors minimizing the OLF at the expense of diversity, and infers designs that cover only a subset of the reference MOL structures. Yet, structural diversity is required to extrapolate the solution space, and desirable when generating starting points for real-world problems with varied underlying OLFs. Addressing this issue requires changes to both the model architecture and the training objective. For the former, we use a one-to-many mapping (Sec. 3.2) by modifying the model to have $K = {8}$ output branches (see Fig. 1).

Fig. 1. Upon receiving a given set of specifications and lens sequence, the model outputs $K = {8}$ different lenses that share the same sequence but differ in structure. Here, the lens sequence is composed of single lenses (SLs) and cemented doublets (CDs) separated by air gaps (–). In all layout plots, the aperture stop is shown in orange, and the scale bar indicates the size of the designs in units of effective focal length (EFL).

In adapting the training objective, we formulate a criterion that quantifies the structure of a MOL using the slopes of its marginal ray (Sec. 3.3). This criterion, which correlates well with standard MOL classification, can be used to compare MOLs regardless of their number of glass elements. Using this criterion, we augment the semi-supervised training objective: each output branch of the model is trained to infer designs with distinctive lens structures, and encouraged to capture a particular subset of the reference MOL structures (Sec. 3.4).

The trained model, by extrapolating across specifications, lens sequences, and lens structures, can be used as a mechanism to find varied, high-quality starting points for microscopy (Sec. 4.). For lens designers, inferring designs from the trained model can be used to bypass the optimization process as well as the search for a proper lens sequence. In particular, by mapping a significant portion of the solution space, the model can be used as an educational tool to better understand the capabilities of different lens structures such as Double-Gauss, Lister, and retrofocus. Finally, our contribution answers two challenges of the LDE framework [18]: extrapolating across new lens sequences and providing a meaningful one-to-many mapping.

2. Preliminary considerations

2.1 Deep optics

Many optical design components can be simulated numerically in a differentiable manner and optimized using gradient-based optimization. Since this is also the case for deep neural networks (DNNs), there have been increasing efforts to integrate the simulation of optical designs within automatic differentiation frameworks [19,20] and combine them with deep learning-related applications, hence deep optics. By itself, the use of automatic differentiation for lens design can also provide a significant speed-up during optimization as well as increased numerical accuracy [18,21,22]. In end-to-end optics design [23–29], when training DNNs for computer vision tasks (e.g., depth estimation, increased depth of field, etc.), an additional processing step is added to simulate the capture of images using a given optical design; for additional performance, the optical design is optimized simultaneously along with the DNNs used for image processing. Deep learning-enabled LDE is a form of deep optics in which the DNN that outputs the designs is trained by applying automatic differentiation through ray-tracing operations.

2.2 Deep learning-enabled LDE

Deep learning-enabled LDE [12–14] uses at its core a DNN model that acts as a function approximator and learns a mapping from the set of input specifications to the set of output lens variables. The model accepts as input the lens sequence—defined herein as a particular arrangement of glass elements, air gaps, and aperture stop placement—and the desired specifications, in particular the field and aperture requirements. It returns all variables required to model the lens: surface curvatures, glass and air thicknesses, and glass variables for each glass element.

The DNN model is characterized by numerous parameters that need to be optimized to learn a high-quality mapping between input specifications and output lens variables, the process of which is called "training the model". In [12–14], the training objective consisted of extrapolating the solution space (unsupervised training) by learning from reference designs (supervised training).

Variable representation. To facilitate the training task, deep learning-enabled LDE uses intermediate lens variables that may differ from traditional lens design software. Here, we mention some aspects that have been used in prior work [14] and that are reused herein.

One consideration with both inferred and reference designs is to prevent the model from having to learn how to scale designs, by normalizing each lens to a unit effective focal length (EFL). To this end, as is common practice, the last curvature of the design is solved directly, while the other curvatures are inferred by the model as is. Another practice is to have the model output intermediate thickness variables $t'$ that are then smoothly rectified to positive values with $t=\ln (\exp (t') + 1)$; this prevents some training instability caused by negative thickness values. For glass materials, the dispersion curve of each inferred glass is represented by a discrete set of intermediate glass variables. In this text, we denote a lens by $l$ in general and use the notation $l'$ when considering the set of intermediate variables explicitly.
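As an illustration, here is a minimal sketch of the smooth thickness rectification described above; the function name is ours, and in practice a numerically stable softplus implementation (e.g., from a deep learning framework) would be preferable for large inputs.

```python
import numpy as np

def rectify_thickness(t_prime):
    """Smoothly map an intermediate thickness t' to a positive value with
    t = ln(exp(t') + 1), i.e., the softplus function."""
    return np.log(np.exp(t_prime) + 1.0)

# Negative intermediate values still yield small positive thicknesses.
print(rectify_thickness(np.array([-2.0, 0.0, 1.5])))  # ~[0.127, 0.693, 1.701]
```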

Model architecture and dynamic formulation. The model architecture consists of all operations that take place in the model as a set of inputs is hierarchically remapped to a set of outputs. In the context of LDE, we highlight an important distinction in model architectures: whether an architecture is static as in [12]—the number of inputs and outputs is predefined, and only a single lens sequence can be learned—or dynamic as in [13,14]. In LDE, a dynamic architecture allows knowledge sharing between different lens sequences—a single model learns a shared representation for all designs regardless of their lens sequence. Theoretically, a dynamic model can learn a set of features for all designs, then combine and adapt them depending on the input specifications and lens sequence. In this work, we explicitly leverage the knowledge-transfer capacity of dynamic architectures as we try to extrapolate across a vast diversity of lens sequences. We emphasize that for our application, all dynamic architectures are functionally the same whether they use recurrent, convolutional, or self-attention layers; the specifics (e.g., number of layers, dimensionality of hidden representations, activation functions, etc.) are of minor importance in understanding the approach.

Unsupervised training. The primary task in [12–14] is unsupervised training: the model, given random sets of specifications and lens sequences as input, is trained to minimize an optical loss function (OLF) for all output lenses. Where unsupervised training differs from standard lens design is that the lenses are not optimized directly; rather, the model parameters are optimized so that the model progressively learns to output better lenses. Additionally, the same OLF is used for all lenses regardless of their lens sequence and must be designed accordingly. In this work, we build on the OLF of [14], which targeted the overall quality of a starting point, namely: optical performance (RMS spot size), ray tracing success, feasibility of glass materials, and shape of the glass elements.

Training domain. In unsupervised training, an important aspect is the choice of the training domain from which the specifications are drawn, which may be selected independently for each lens sequence. At inference time, the model should only be given specifications that are similar to the ones seen during training; otherwise, it will generalize poorly.

Supervised training. LDE aims to generate new designs inspired by a set of reference designs. Formally, each dataset entry $j$ is composed of the lens variables $l^{*}_j$ and lens sequence of the reference design as well as the specifications for which it was optimized. In supervised training, the model is trained to reproduce a given reference lens when shown the corresponding reference specifications. Even though supervised training only has a direct effect on lenses inferred from reference specifications, it offers a large gain in optical performance when extrapolating across new specifications [14]. Intuitively, supervised training provides the model with examples of successful lenses, thus narrowing the search across the solution space.

When extrapolating across lens sequences different from the reference designs, supervised training has a much weaker effect. In particular, the model is not forced to reproduce the structures seen in the reference designs. Solving this challenge is a primary motivation of our work as we try to extrapolate across a vast diversity of lens sequences.

3. Method

In this section, we build on the LDE framework of [14] to output high-quality microscope objective lenses (MOLs) from a vast variety of lens sequences, while capturing the structural diversity of the dataset. First, in adapting the framework to MOLs, we discuss many considerations concerning the dataset, training domain, OLF, and glass model (Sec. 3.1). Next, we provide the model with a one-to-many mapping (Sec. 3.2) that outputs varied lenses for a given set of specifications and lens sequence. To this end, we design an operation to quantify the structure of a MOL using the marginal ray slopes (Sec. 3.3), which we use in adapting the training objective to capture the structural diversity of the dataset (Sec. 3.4).

3.1 Microscope objective specifics

In this work, all MOLs need to be incorporated within a unified framework, whether they are part of the dataset or inferred by the model. Considering the rich history of the development of MOLs, design standards, and specifications, deciding on such a unified framework is no small task and must achieve a balance between simplicity, practicability, and exactitude. In this section, we explain how we model MOLs and their specifications, describe the dataset, explain our choice for the universal OLF used in unsupervised training, and detail our glass model.

Following the convention of microscope objective design, we consider the systems in reverse: the small object field is considered as the image, while the intermediate field is considered as the object. Furthermore, although finite-conjugate microscope objectives have been widely used in the past, here we only consider modern infinite-conjugate systems with tube lenses. While recognizing that manufacturers have different strategies concerning the lateral color correction of MOLs, we only consider the solo correction type in which the lateral color is fully corrected within the objective lens. This allows us to discard tube lenses and only consider the MOLs.

Dataset. We use a subset of the microscope objective dataset constructed by Zhang et al. [15–17] from 448 patent entries. The dataset covers magnifications of 0.5–250x and numerical apertures (NA) of 0.025–1.70, with varied working distances (WD) and various color and field correction levels. Here, we focus on 34 examples with low magnification (10–20x) and medium NA (0.15–0.5) that fit within our unified framework (e.g., infinite conjugate and spherical lenses only). Some of them are represented in Fig. 2, which includes diverse structures such as Double-Gauss (a–c), Lister (d–f), and retrofocus (g–j) designs.

Fig. 2. Subset of the 34 reference MOLs and their corresponding patent numbers (the scale is in units of EFL). Multiple lens structures can be recognized, such as Double-Gauss (a–c), Lister (d–f) and retrofocus (g–j) lenses.

Lister-type lenses, with a conventional structure consisting of two positive groups, allow excellent color aberration correction but offer limited NA, field curvature correction, and WD. Double-Gauss designs, characterized by their quasi-symmetric structure, realize medium NA and WD with excellent color and field correction. Retrofocus designs have a negative rear group (relative to the object) and positive front and middle groups, which ensures long WD with a small "retrofocus factor" [16]. In this work, one of our main goals is to see whether these features are reproduced, utilized, or combined by the DNN model in extrapolating the solution space.

Under moderate NA, including a coverglass in a MOL does not usually require structural changes to the lens; when a reference MOL has a coverglass, we simply remove it and refocus the lens. We also include some designs with moderate vignetting (10–20%), even though we assume no vignetting within our unified framework.

Specifications. The first-order specifications of a MOL usually include the magnification, numerical aperture (NA), and intermediate field size. The magnification is inherently tied to the ratio between the effective focal length (EFL) of the objective and tube lens, which we effectively ignore since all reference and inferred MOLs are normalized to unit EFL. To avoid taking into account the scale of the designs, we only consider designs with similar magnification for our reference MOLs. For compatibility with the existing LDE framework, the numerical aperture is converted to the entrance pupil diameter (EPD)—equivalent to the inverse of the f-number, because of unit EFL—while the image size is converted to object space field angle, denoted as the half field-of-view (HFOV). As a result, the specifications given as input to the model consist only of the EPD and HFOV (radians). For convenience, we mention all aperture values in this text in terms of NA. An overview of the specifications is given in Supplement 1.
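A minimal sketch of this specification conversion, assuming the paraxial relations $\mathrm{EPD} = \mathrm{EFL}/N$ and $\mathrm{NA} \approx 1/(2N)$ and a field angle derived from the tube lens focal length; the function name and signature are illustrative.

```python
import math

def specs_to_model_inputs(na, half_image_size_mm, tube_lens_efl_mm=200.0):
    """Convert MOL specifications to the (EPD, HFOV) model inputs under unit EFL,
    using EPD = EFL/N = 2*NA and HFOV = atan(half image size / tube lens EFL)."""
    epd = 2.0 * na                                           # in units of EFL
    hfov = math.atan(half_image_size_mm / tube_lens_efl_mm)  # radians
    return epd, hfov

# Nominal 20x/0.40 case: 200 mm tube lens, 25 mm intermediate image size.
epd, hfov = specs_to_model_inputs(0.40, 12.5)
print(epd, math.degrees(hfov))  # 0.8, ~3.57 deg
```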

Lens sequences and training domain. When training the model, we must decide on the lens sequences that will be drawn during unsupervised training, as well as their training domain, i.e., the range of valid specifications (see Fig. 3). Unlike prior work [12–14], which considered only the lens sequences found in the reference dataset, we generate a wide variety of lens sequences. We consider all possible combinations of up to 6 lens components—either a single glass element, a cemented doublet, or a cemented triplet—and keep only the sequences with a total of 5–10 glass elements. For each combination, we consider every optical surface as a potential aperture stop placement, as well as every air gap between two lens components (in which case the model infers air thicknesses on each side), for a total of 7432 lens sequences.

Fig. 3. Training domain used in our experiments, represented by the boxed area.

Next, we select a reasonably wide training domain that encompasses the specifications of many reference MOLs. While recognizing that lenses with more glass elements can fulfill more challenging specifications, we use the same domain for all lens sequences: during training, the NA is drawn randomly and uniformly between 0.35–0.45 and the HFOV between 2.5–4.5°. Near the center of the training domain ($\mathrm {NA}=0.40$ and $\mathrm {HFOV}=3.57^{\circ}$), we define the "nominal case" corresponding to a common 20x/0.40 objective with a 200 mm tube lens and a 25 mm intermediate image size; in the following experiments, the nominal case is used as a case study.

Constraints and optical loss function. Our OLF builds on the one presented in [14], which minimized (1) the average geometric RMS spot size at different fields and wavelengths, (2) ray failures and overlapping surfaces, (3) undesirable diameter-to-thickness ratios of glass elements, and (4) the distance between the inferred intermediate glass variables and their closest counterparts in common glass catalogs. We add new targets to the OLF to encourage longer WD and limit the total track length (TTL) to 6 times the EFL.

The ray-tracing operation is similar to the one described in [14], except that we now consider the "g" Fraunhofer line (435.8 nm) on top of the "C" (656.3 nm), "d" (587.6 nm), "F" (486.1 nm) lines, as most reference MOLs in our dataset were optimized for those wavelengths. We use equal weights for all four Fraunhofer lines. The spot size is averaged across the field values $\{0, 0.7, 1\} \times \mathrm {HFOV}$ with weights $\{2, 1, 1\}$. Even though most MOLs have some degree of telecentricity, for simplicity we ignore this aspect in our OLF and ray-tracing operation. The complete formulation of the OLF is given in Supplement 1.
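A sketch of the field and wavelength weighting described above; `rms_spot_size` is a hypothetical stand-in for the actual differentiable ray-traced spot size computation.

```python
import numpy as np

# Relative field values and weights used for the spot size average, and the four
# Fraunhofer lines (equal weights), as described above.
REL_FIELDS = np.array([0.0, 0.7, 1.0])
FIELD_WEIGHTS = np.array([2.0, 1.0, 1.0])
WAVELENGTHS_NM = [656.3, 587.6, 486.1, 435.8]  # C, d, F, g lines

def averaged_spot_size(rms_spot_size, hfov_rad):
    """Weighted average of the RMS spot size over fields and wavelengths;
    `rms_spot_size(field_rad, wavelength_nm)` stands in for the ray-traced value."""
    spots = np.array([[rms_spot_size(f * hfov_rad, w) for w in WAVELENGTHS_NM]
                      for f in REL_FIELDS])
    per_field = spots.mean(axis=1)  # equal weights across wavelengths
    return float(np.average(per_field, weights=FIELD_WEIGHTS))
```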

Glass model. The glass model used in previous work [14] is inadequate for MOLs, as it assumes normal dispersion and only considers the refractive index at the "d" wavelength $n_\mathrm {d}$ and the Abbe number $v_\mathrm {d}$. Here, we choose to bypass traditional glass variables such as the Abbe number and infer the refractive indices at all four wavelengths directly, though in an intermediate representation $\mathbf {g} = \{g_1, g_2, g_3, g_4\}$. We follow the procedure described in [18], which uses principal component analysis (PCA) to normalize and disentangle the refractive indices at different wavelengths. The invertible PCA model is used to linearly transform the refractive indices between their intermediate and original representations: $\mathbf {g} = \mathrm {PCA}\left (n_\mathrm {C}, n_\mathrm {d}, n_\mathrm {F}, n_\mathrm {g}\right )$. We use 698 glass materials from the Schott [30] and Ohara [31] glass catalogs—the same glass materials as [18]—to both fit our PCA model and formulate the glass penalty used in our OLF.
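A possible implementation of this glass model using scikit-learn, assuming the catalog refractive indices have already been gathered into an array; whitening the components is our assumption for the normalization step.

```python
from sklearn.decomposition import PCA

def fit_glass_pca(catalog_indices):
    """Fit an invertible linear model mapping catalog refractive indices
    (n_C, n_d, n_F, n_g), shape (n_glasses, 4), to normalized, decorrelated
    intermediate variables g = (g1, g2, g3, g4)."""
    pca = PCA(n_components=4, whiten=True)  # whitening assumed for normalization
    return pca.fit(catalog_indices)

# g = pca.transform(indices)           # refractive indices -> glass variables
# indices = pca.inverse_transform(g)   # glass variables -> refractive indices
```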

3.2 Adapted model for one-to-many mapping

In practice, we observe that simply adapting the LDE framework of [14] to MOLs leads to unsatisfactory results since the model has no incentive to capture the structural diversity of the dataset. As seen in Fig. 4, a model trained using the aforementioned semi-supervised training objective outputs a majority of retrofocus or Lister-type lenses with low element power, and mostly ignores the other structures. In particular, even though there exist many viable lens configurations for a single set of specifications and lens sequence—the mapping is one-to-many—the framework of [14] can only capture one. To output lenses whose structural diversity matches that of the dataset, both the model and training objective need to be adapted.

Fig. 4. Random subset of designs inferred from a model trained without the augmented training objective ($K=1$), for 5–10 glass elements (the scale is in units of EFL). The designs do not capture the structural diversity of the dataset. Most importantly, only one design can be obtained for a given lens sequence and set of specifications.

Machine learning applications that require a one-to-many mapping usually opt for a continuous, smooth formulation: additional "latent" variables are given as input to the model; as the latent variables are slightly altered, so are the model outputs. In lens design, however, "good" designs for given specifications are inherently multimodal, meaning that a lens that is altered continuously between one form and the other, e.g. from retrofocus to Double-Gauss, will behave poorly in most of the intermediate forms. Based on this intuition, we opt for a simpler discrete formulation such that the model outputs not one, but $K$ lenses for a given set of specifications and lens sequence. To capture a reasonably large structural diversity in our experiments, we set the number of output lens branches $K$ to 8. The complete model architecture, based on a transformer encoder structure [32], is detailed in Supplement 1.

Paraxial image solve. In contrast to [14], instead of treating the distance from the last optical surface to the image plane $t_\mathrm {last}$ the same way as the other thicknesses, we implement a paraxial image solve (PIM)—similar to optical design software—so that only the defocus is inferred and optimized. Specifically, every time a lens is inferred, we compute the back focal length (BFL) using all other lens variables and add it to the intermediate thickness variable ${t'}_{\mathrm {last}}$ inferred by the model: $t_{\mathrm {last}} = {t'}_{\mathrm {last}} + \mathrm {BFL}$. This change leads to a major improvement in training speed and performance, as shown in the ablation study presented in Supplement 1.
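A sketch of the paraxial image solve, assuming a simple surface-by-surface paraxial marginal-ray trace with the object at infinity; the data layout and sign conventions are illustrative, not the exact implementation.

```python
def paraxial_bfl(curvatures, thicknesses, indices):
    """Back focal length from a paraxial marginal-ray trace with the object at
    infinity. `curvatures[i]` is the curvature of surface i, `indices[i]` the
    refractive index of the medium after surface i (1.0 for air), and
    `thicknesses[i]` the axial gap after surface i (the last gap is unused)."""
    y, u, n = 1.0, 0.0, 1.0  # ray height, ray angle, index before the surface
    for i, (c, n_next) in enumerate(zip(curvatures, indices)):
        u = (n * u - y * c * (n_next - n)) / n_next  # paraxial refraction
        n = n_next
        if i < len(curvatures) - 1:
            y += thicknesses[i] * u                  # transfer to next surface
    return -y / u

# Paraxial image solve: the model only infers the defocus t'_last.
# t_last = t_prime_last + paraxial_bfl(curvatures, thicknesses, indices)
```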

3.3 Structural distance

The existing LDE training objective has two issues: it (1) provides a weak form of supervision—the model is forced to output MOLs similar to the reference MOLs only when given the reference specifications—and (2) does not encourage structural diversity whatsoever since the unsupervised training objective only seeks to minimize the OLF. Ideally, we want the inferred designs to be close in structure to each other if they are outputted from the same branch, and different otherwise. Additionally, we want the structures of most reference MOLs to be well represented in the inferred designs.

To address those challenges, we require a criterion to quantify the structure $s$ of a lens, from which a measure of structural distance $\mathrm {sd}(s_1, s_2)$ between two lens structures can be established. We require this distance to be valid between designs of different lens sequences, in particular when comparing designs that do not have the same number of glass elements. In formulating such a criterion, we first observe from the reference MOLs shown in Fig. 2 that the shape of the marginal ray is strongly related to the structure of a design: in retrofocus designs, the light first diverges, then converges; in Double-Gauss designs, there is a noticeable "chokepoint" in the rear or middle elements. With this in mind, for the specific case of MOLs, we use the slopes of the marginal ray of a lens—computed using the paraxial approximation, hence paraxial marginal ray (PMR)—to estimate its structure $s$. Figure 5 illustrates how the structural distance is computed, using as examples the 10 reference MOLs from Fig. 2.

Fig. 5. Illustration of the process to compute the structural distance, using the 10 reference MOLs of Fig. 2. In (a), the PMR coordinates are first scaled to unit TTL (top), then passed through a derivative-of-Gaussian filter (bottom), giving us an estimation of each structure $s$. In (b), from the PMR slopes of every selected reference MOL, we compute the pairwise structural distance using Eq. (1). Designs with similar structures are generally close to one another (a–c, d–f, g–j).

We define an operation $S: \mathcal {R}^{|l|}\rightarrow \mathcal {R}^{d}$ that takes all variables of a lens $l$ as input and returns the estimated structure $s$: a $d$-dimensional vector that corresponds to the PMR slopes at evenly-spaced longitudinal coordinates $z = \{0, 1, \dots, d - 1\} / (d - 1)$ with $d = 51$ in our experiments. In order, we (1) compute the coordinates of the PMR at every optical surface and scale them to $\mathrm {TTL} = 1$, (2) interpolate them linearly at the set of coordinates $z$, and (3) estimate the slopes using the derivative-of-Gaussian filter ($\sigma =1/16$). Finally, we define the structural distance between the PMR slopes $s_1, s_2$ of two designs using a scaled Euclidean distance:

$$\mathrm{sd}(\mathrm{s_1}, \mathrm{s_2}) = \frac{1}{d^{1/2}}\lVert s_1 - s_2\rVert_2 ~.$$
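A sketch of the structure operation $S$ and the structural distance of Eq. (1), assuming the PMR coordinates at every surface have already been computed; the exact normalization of the derivative-of-Gaussian filter is our assumption.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

D = 51            # number of samples along the normalized optical axis
SIGMA = 1.0 / 16  # filter width in units of the normalized axis (TTL = 1)

def lens_structure(surface_z, pmr_height):
    """Estimate the structure s of a lens from its paraxial marginal ray (PMR).
    `surface_z` and `pmr_height` give the axial position and PMR height at
    every optical surface; computing them is assumed done elsewhere."""
    z = np.asarray(surface_z, dtype=float)
    z = (z - z[0]) / (z[-1] - z[0])           # scale the coordinates to unit TTL
    zs = np.linspace(0.0, 1.0, D)
    y = np.interp(zs, z, pmr_height)          # linear interpolation at the z samples
    dz = zs[1] - zs[0]
    # Derivative-of-Gaussian filter: a smoothed estimate of the PMR slopes.
    return gaussian_filter1d(y, sigma=SIGMA / dz, order=1) / dz

def structural_distance(s1, s2):
    """Scaled Euclidean distance between two structures, Eq. (1)."""
    return float(np.linalg.norm(s1 - s2)) / np.sqrt(len(s1))
```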

3.4 Augmented training objective

The augmented training objective includes three new structural losses. Additionally, for compatibility with the new one-to-many mapping, it adapts the unsupervised loss $L_\mathrm {U}$ and supervised loss $L_\mathrm {S}$ that were inherited from the framework of [14].

Unsupervised loss. At the beginning of every training iteration, we first sample $b_\mathrm {U} = 512$ sets of specifications $i$—including the lens sequence—from the training domain, and process them through the model to obtain $b_\mathrm {U} \times K$ lenses $l_{i, k}$.

As in [14], we compute the unsupervised loss $L_\mathrm {U}$ by averaging the OLF of the inferred designs using the geometric mean. To prevent the number of ray-tracing operations from scaling with $K$, for each set of specifications $i$, we consider only the lens inferred from a random branch $k^{(i)}$:

$$L_\mathrm{U} = \exp \left( \frac{1}{b_\mathrm{U}} \sum_{i = 1}^{b_\mathrm{U}} \ln \mathrm{OLF}\left(\mathrm{NA}_i, \mathrm{HFOV}_i, l_{i, k^{(i)}}\right) \right) ~.$$
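A minimal PyTorch sketch of Eq. (2), where the OLF values are assumed to be computed elsewhere on lenses drawn from one random branch per set of specifications.

```python
import torch

def unsupervised_loss(olf_values):
    """Geometric mean of the OLF over the batch, Eq. (2). `olf_values` has shape
    (b_U,): one OLF evaluation per set of specifications, each on a lens drawn
    from a single random branch so that ray tracing does not scale with K."""
    return torch.exp(torch.log(olf_values).mean())

# Random branch selection (one lens per specification):
# k_idx = torch.randint(0, K, (b_U,))
# selected = [lenses[i][k_idx[i]] for i in range(b_U)]
```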

Structural loss functions. The three structural losses are computed from the same inferred lenses $l_{i, k}$ as above, using their corresponding structures $s_{i, k} = S(l_{i, k})$.

The structural diversity loss $L_\mathrm {SD}$ requires each branch to output designs that are, on average, distinct in structure from the other branches. At every training iteration, we first estimate the average structure $\overline {s_k}$ of the designs inferred by each branch $k$ (see Fig. 6):

$$\overline{s_k} = \frac{1}{b_\mathrm{U}} \sum_{i = 1}^{b_\mathrm{U}}s_{i, k} ~.$$

Fig. 6. The PMR slopes criterion is used to compute (a) the average structures $\overline{s_k}$ and (b) the reference structures $s^{*}_{k}$—both are required when computing the structural loss functions. In (a), the illustrated mean and standard deviation correspond to the fully trained model. The distance between branches is enforced by the structural diversity loss $L_\mathrm{SD}$, while the deviations within a branch are controlled by the structural adherence loss $L_\mathrm{SA}$. In (b), the reference structures each represent a subset of the 34 reference MOLs. The dataset representation loss $L_\mathrm{DR}$ encourages the average structures to be similar to the reference structures, hence the similarities between both graphs.

Then, we enforce the pairwise distance between the $K$ average structures $\overline {s_k}$ to exceed the threshold $t_\mathrm {SD}$ (set to $0.25$). For this, we use the ramp function $R(x) = \max (0, x)$ so that the loss only influences the training process when the threshold is not met:

$$L_\mathrm{SD} = \frac{2}{K^{2}-K} \sum_{k = 1}^{K - 1}\sum_{k' = k + 1}^{K} R\left( \mathrm{t}_\mathrm{SD} - \mathrm{sd}(\overline{s_k}, \overline{s_{k'}}) \right) ~.$$
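A PyTorch sketch of Eq. (4), assuming the branch-average structures are stacked into a tensor of shape $(K, d)$.

```python
import torch

def structural_diversity_loss(s_mean, t_sd=0.25):
    """L_SD, Eq. (4): penalize pairs of branch-average structures, stacked in
    `s_mean` of shape (K, d), whose structural distance is below t_SD."""
    K, d = s_mean.shape
    sd = torch.cdist(s_mean, s_mean) / d ** 0.5                 # pairwise sd
    pairs = torch.triu(torch.ones(K, K, dtype=torch.bool), diagonal=1)
    return torch.relu(t_sd - sd[pairs]).mean()                  # ramp + average
```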

The structural adherence loss $L_\mathrm {SA}$ ensures that each inferred design is closer in structure to its corresponding branch $k$ than to the other branches $k'$. We compare the structural distance between each inferred design and the $K$ average structures; if the smallest distance occurs for $k' \ne k$, we penalize the gap in distance:

$$L_\mathrm{SA} = \frac{1}{b_\mathrm{U}K} \sum_{i = 1}^{b_\mathrm{U}} \sum_{k = 1}^{K} \mathrm{sd}\left({s_{i, k}}, {\overline{s_k}}\right) - \min_{k'} \mathrm{sd}\left({s_{i, k'}}, {\overline{s_{k'}}}\right) ~.$$
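A PyTorch sketch following the textual description above, penalizing the gap whenever a design's nearest average structure belongs to another branch; the exact indexing inside the minimum reflects our reading of Eq. (5).

```python
import torch

def structural_adherence_loss(s, s_mean):
    """Structural adherence loss sketch: each inferred structure s[i, k]
    (s has shape (b_U, K, d)) should be closer to its own branch average
    s_mean[k] (shape (K, d)) than to any other branch average; otherwise
    the gap in structural distance is penalized."""
    b_u, K, d = s.shape
    # dist[i, k, k'] = sd(s_{i,k}, mean structure of branch k')
    dist = torch.linalg.norm(s.unsqueeze(2) - s_mean.view(1, 1, K, d), dim=-1) / d ** 0.5
    own = dist.diagonal(dim1=1, dim2=2)       # distance to own branch, (b_U, K)
    nearest = dist.min(dim=-1).values         # distance to closest branch, (b_U, K)
    return (own - nearest).mean()
```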

The dataset representation loss $L_\mathrm {DR}$ is used to encourage the inferred designs to capture the structures of the reference MOLs $l^{*}_j$, which we denote as $s_j = S(l^{*}_j)$. To formulate the loss, we first associate each branch to a corresponding reference structure $s^{*}_k$, where each reference structure should represent a distinct subset of the structures $s_j$. To this end, we first apply the K-means clustering algorithm on the structures $s_j$ to form $K$ clusters, then select the resulting $K$ centroids as our reference structures (see Fig. 6). The structural distance between each inferred lens $l_{i, k}$ and its corresponding reference structure $s^{*}_k$ should not exceed the threshold $t_\mathrm {DR}$ (set to $0.05$):

$$L_\mathrm{DR} = \frac{1}{b_\mathrm{U}K} \sum_{i = 1}^{b_\mathrm{U}} \sum_{k = 1}^{K} R\left( \mathrm{sd}\left({s_{i, k}}, {s^{*}_k}\right) - t_\mathrm{DR} \right) ~.$$

This loss is meant to act as a prior at the beginning of training to ensure that all the reference structures are initially represented; to allow the model to search for the best designs, we decay its weight to 0 during the first half of training using a cosine half-cycle.
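A sketch of how the reference structures and Eq. (6) could be computed, using scikit-learn's K-means; the random seed and number of initializations are arbitrary choices.

```python
import torch
from sklearn.cluster import KMeans

def reference_structures(dataset_structures, K=8, seed=0):
    """Cluster the structures of the 34 reference MOLs (shape (n_ref, d)) into
    K groups and keep the centroids as the reference structures s*_k."""
    km = KMeans(n_clusters=K, n_init=10, random_state=seed).fit(dataset_structures)
    return torch.as_tensor(km.cluster_centers_, dtype=torch.float32)

def dataset_representation_loss(s, s_ref, t_dr=0.05):
    """L_DR, Eq. (6): keep each inferred structure (s, shape (b_U, K, d)) within
    a structural distance t_DR of its branch's reference structure s*_k."""
    d = s.shape[-1]
    dist = torch.linalg.norm(s - s_ref, dim=-1) / d ** 0.5
    return torch.relu(dist - t_dr).mean()
```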

Supervised loss and total loss. The supervised loss is computed from a batch separate from the other loss functions: we draw $b_\mathrm {S} = 64$ supervised samples that each correspond to one of the reference MOLs $j$, and process them through the model. With the supervised loss, the model learns to reproduce the reference MOLs when shown the corresponding specifications. In [14], the supervised loss $L_\mathrm {S}$ was computed by comparing the set of intermediate lens variables $l'_i$ of each inferred lens to the corresponding reference design ${l'}^{*}_{j^{(i)}}$:

$$L_\mathrm{S} = \frac{1}{b_\mathrm{S}} \sum_{i = 1}^{b_\mathrm{S}} \frac{1}{|l'_i|} \lVert l'_i - {l'}^{*}_{j^{(i)}} \rVert_2^{2} ~.$$

With a one-to-many mapping, however, only one of the $K$ inferred lenses should be compared to the reference design. Thus, to find the branch $k^{(j)}$ that best represents each reference MOL $j$, we compare its structure $s_j$ to the aforementioned $K$ reference structures $s^{*}_k$—prior to training—and select the branch whose structural distance is the lowest:

$$k^{(j)} = \underset{k}{\mathrm{argmin}}~ \mathrm{sd}\left({s_j}, {s^{*}_k}\right) ~.$$
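A minimal sketch of the branch assignment of Eq. (8), assuming the reference structures and the dataset structures are stored as tensors.

```python
import torch

def assign_branches(dataset_structures, s_ref):
    """For each reference MOL j, select the branch whose reference structure
    s*_k is closest to s_j (Eq. (8)); this is done once, before training."""
    d = s_ref.shape[-1]
    dist = torch.cdist(dataset_structures, s_ref) / d ** 0.5
    return dist.argmin(dim=-1)  # one branch index per reference MOL
```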

Finally, to compute the total loss $L$ at every training iteration, we associate a weight $\lambda$ with each loss term except the unsupervised loss, and sum:

$$L = L_\mathrm{U} + \lambda_\mathrm{S} L_\mathrm{S} + \lambda_\mathrm{SD} L_\mathrm{SD} + \lambda_\mathrm{SA} L_\mathrm{SA} + \lambda_\mathrm{DR} L_\mathrm{DR} ~.$$

We set $\lambda _\mathrm {S}$ to 10 and weight the three structural losses initially with $\lambda _\mathrm {SD} = \lambda _\mathrm {SA} = \lambda _\mathrm {DR} = 1$, while only decaying the latter. We provide additional training details in Supplement 1.
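A sketch of the total loss of Eq. (9) with the weights above, including one possible reading of the cosine half-cycle decay applied to $\lambda_\mathrm{DR}$ during the first half of training.

```python
import math

def lambda_dr(step, total_steps, lambda_init=1.0):
    """Cosine half-cycle decay of lambda_DR, from lambda_init to 0 over the
    first half of training, then held at 0."""
    frac = min(step / (0.5 * total_steps), 1.0)
    return lambda_init * 0.5 * (1.0 + math.cos(math.pi * frac))

def total_loss(l_u, l_s, l_sd, l_sa, l_dr, step, total_steps,
               lam_s=10.0, lam_sd=1.0, lam_sa=1.0):
    """Total training objective, Eq. (9), with the weights reported above."""
    return (l_u + lam_s * l_s + lam_sd * l_sd + lam_sa * l_sa
            + lambda_dr(step, total_steps) * l_dr)
```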

4. Experiments

The trained model can infer lenses corresponding to 7432 different lens sequences for any set of specifications in the aforementioned training domain (see Fig. 3). To narrow our analysis, in this section we do a case study in which we aim to infer designs corresponding to the nominal case ($\mathrm {NA} = 0.40$, $\mathrm {HFOV} = 3.57^{\circ}$, $\mathrm {mag} = 20$x, $\mathrm {EFL} = {10}\,{\mathrm{mm} }$). In the following, all results are generated from the same trained model where the input specifications (NA, HFOV) are set to the values above, but all lens sequences are considered. Due to the deterministic nature of our model and one-to-many mapping (with $K= {8}$ output branches), once the input specifications are frozen, we can generate exactly $ {7432} \times {8} = {59\;456}$ different designs. These experiments aim to replicate the design process in which a lens designer uses our trained model to find an initial design that satisfies the nominal specifications.

The model was trained over 500 000 steps, which took about 22 hours on a single Nvidia A100 GPU. Querying the trained model on all 7432 lens sequences takes about one second.

4.1 Qualitative results and structural diversity

Using the aforementioned nominal case, we show the 2D layouts of a subset of the inferred MOLs in Fig. 7. Specifically, for each output branch and number of glass elements between 5–10, we consider all lens sequences with the correct amount of glass elements, and show the 2D layout of the one that minimizes the OLF. Other randomly picked examples are shown in Supplement 1.

Fig. 7. Subset of designs inferred from all branches $k$ of the trained model, using the nominal specifications (the scale is in units of EFL). For each branch, only six lens sequences are shown out of 7432 candidates. The selected lens sequences are those that minimize the OLF for the given branch and number of elements.

Qualitatively, we can observe a significant structural diversity in the inferred designs. Designs from branches 1 and 3 are clear examples of retrofocus designs. In branches 5 and 8 can be found Lister-type lenses, characterized by the presence of two positive groups. Designs from branches 2 and 6 most resemble Double-Gauss lenses, though only a few examples could be clearly classified as such. Interestingly, the model does not always put two negative lenses back-to-back to reproduce the "chokepoint" structure of Double-Gauss designs.

As intended, there is a strong connection within each branch. In some cases, as more glass elements are added, it appears that a lens "evolves" into a more sophisticated version of the same family.

4.2 Optical performance

Here, we show how the inferred designs fare in terms of on-axis RMS spot size (Fig. 8)—computed from all wavelengths using geometrical optics—and working distance (Fig. 9). We group the 59 456 inferred MOLs by branch index and number of glass elements, and show each distribution. The designs are scaled to the focal length of the MOL in the nominal case, namely $\mathrm {EFL} = {10}\,{\mathrm{mm} }$.

Fig. 8. Distribution of the on-axis RMS spot size of all inferred designs, grouped by number of elements and output branch, for the nominal case (lower is better). For reference, the Airy disk diameter at the "d" Fraunhofer line is shown with a dashed line.

Fig. 9. Distribution of the WD of all inferred designs, grouped by number of elements and output branch, for the nominal case. Longer WD enables more applications.

Figures 8 and 9 illustrate the strengths and weaknesses of all branches. We observe that branches 4 and 5 have a decent on-axis optical performance with only a few glass elements. In contrast, most of the other branches perform poorly with a limited number of elements but have a large gain in performance as more elements are added. Branches that output Double-Gauss-like designs (branches 2 and 6) are an interesting case: not all lens sequences can be used to create a "chokepoint" structure, which may explain why only a moderate proportion of the designs can attain a good performance. We also observe that most branches strike a balance between optical performance and WD: one usually comes at the cost of the other, as is expected with MOLs. This tradeoff is shown explicitly in Fig. 10 for all 8-element designs in the nominal case.

Fig. 10. Tradeoff between on-axis RMS spot size and WD for all 8-element designs, for the nominal case. The Pareto front is populated by designs from different branches.

Figure 11 shows the spot size distribution over different field values. The spot size increases monotonically and steadily with the field, which justifies sampling the spot size only at the relative field values {0, 0.7, 1} in our OLF formulation.

Fig. 11. Uniformity of the spot size over the field across all output branches, obtained from the distribution of all inferred designs in the nominal case (lower is better). The Airy disk diameter at the "d" Fraunhofer line is shown with a dashed line.

These results highlight another benefit of the approach: by leveraging how the model maps a significant portion of the solution space, we can quantify the strengths and weaknesses of every lens structure to supplement the lens designer’s knowledge.

We provide more results in Supplement 1: additional analyses that show the distribution of the inferred glass materials and distortion, as well as ablation studies—experiments where we remove one or more components of our method to justify why they are required.

5. Conclusion

Although the LDE framework of [14] has been successful in extrapolating across a wide range of specifications, it cannot extrapolate across new lens sequences in a way that captures the structural diversity of the dataset, due to the diminished impact of supervised training. In particular, previous LDE frameworks [12–14] were in essence one-to-one mappings and could not output multiple lens structures for a given set of specifications and lens sequence. Here, using the PMR slopes criterion as a way to quantify the structure of a MOL, we showed that the DNN model can be used to create a one-to-many mapping that faithfully captures the structural diversity of the dataset. The trained model extrapolates across diverse specifications, lens sequences, and lens structures; as such, it can be used to retrieve high-quality, highly diverse starting points for microscopy. Additionally, by mapping a large portion of the solution space representing "good" MOLs, it can also be used to supplement the knowledge of lens designers regarding the capabilities and limitations of each type of lens structure.

From the perspective of microscope objective design, we note that future experiments could address more sophisticated tasks, in particular larger NA. With the current 20x/0.40 task, the aim is to evaluate the general structural diversity. At higher magnification and NA, the basic structure is always retrofocus, but the detailed structures of the middle and rear groups may vary; investigating this aspect could be a worthwhile avenue. With higher NA, it may also be necessary to improve how ray aiming operates and to change the image quality criterion from spot size to wavefront error.

We note that the idea of quantifying the structure of a lens design could find success on its own in other lens design approaches, in particular global optimization. When the aim is to return a diversity of solutions, it is usually desired to only return solutions that are sufficiently distant from one another. In establishing a measure of distance between two designs, using the lens variables directly requires weighting different types of variables arbitrarily, and may not necessarily correlate with how lens designers would classify those designs. The sampled slopes of normalized marginal rays can be compared regardless of the lens sequence, and may be more correlated with the lens structure. In other applications, using the chief ray or other paraxial rays could also replace or supplement the marginal ray used herein.

Funding

Natural Sciences and Engineering Research Council of Canada; Canada First Research Excellence Fund.

Acknowledgments

This research was supported by the Natural Sciences and Engineering Research Council of Canada and by the Sentinel North program of Université Laval, made possible, in part, thanks to funding from the Canada First Research Excellence Fund.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

Supplemental document

See Supplement 1 for supporting content.

References

1. C. Gannon and R. Liang, “Using machine learning to create high-efficiency freeform illumination design tools,” arXiv:1903.11166 (2018).

2. T. Yang, D. Cheng, and Y. Wang, “Direct generation of starting points for freeform off-axis three-mirror imaging system design using neural network based deep-learning,” Opt. Express 27(12), 17228–17238 (2019). [CrossRef]  

3. T. Yang, D. Cheng, and Y. Wang, “Designing freeform imaging systems based on reinforcement learning,” Opt. Express 28(20), 30309–30323 (2020). [CrossRef]  

4. W. Chen, T. Yang, D. Cheng, and Y. Wang, “Generating starting points for designing freeform imaging optical systems based on deep learning,” Opt. Express 29(17), 27845–27870 (2021). [CrossRef]  

5. C. Gagné, J. Beaulieu, M. Parizeau, and S. Thibault, “Human-competitive lens system design with evolution strategies,” Appl. Soft Comput 8(4), 1439–1452 (2008). [CrossRef]  

6. B. F. Carneiro de Albuquerque, F. Luis de Sousa, and A. S. Montes, “Multi-objective approach for the automatic design of optical systems,” Opt. Express 24(6), 6619 (2016). [CrossRef]  

7. C. Menke, “Application of particle swarm optimization to the automatic design of optical systems,” Proc. SPIE 10690, 1 (2018). [CrossRef]  

8. M. Isshiki, H. Ono, K. Hiraga, J. Ishikawa, and S. Nakadate, “Lens Design: Global Optimization with Escape Function,” Opt. Rev. 2(6), 463–470 (1995). [CrossRef]  

9. J. Meiron, “Damped Least-Squares Method for Automatic Lens Design,” J. Opt. Soc. Am. 55(9), 1105–1109 (1965). [CrossRef]  

10. M. van Turnhout and F. Bociort, “Chaotic behavior in an algorithm to escape from poor local minima in lens design,” Opt. Express 17(8), 6436–6450 (2009). [CrossRef]  

11. M. van Turnhout, P. van Grol, F. Bociort, and H. P. Urbach, “Obtaining new local minima in lens design by constructing saddle points,” Opt. Express 23(5), 6679 (2015). [CrossRef]  

12. G. Côté, J.-F. Lalonde, and S. Thibault, “Extrapolating from lens design databases using deep learning,” Opt. Express 27(20), 28279–28292 (2019). [CrossRef]  

13. G. Côté, J.-F. Lalonde, and S. Thibault, “Introducing a dynamic deep neural network to infer lens design starting points,” Proc. SPIE 11104, 8–14 (2019). [CrossRef]  

14. G. Côté, J.-F. Lalonde, and S. Thibault, “Deep learning-enabled framework for automatic lens design starting point generation,” Opt. Express 29(3), 3841–3854 (2021). [CrossRef]  

15. Y. Zhang and H. Gross, “Systematic design of microscope objectives. part i: System review and analysis,” Adv. Opt. Technol. 8(5), 313–347 (2019). [CrossRef]  

16. Y. Zhang and H. Gross, “Systematic design of microscope objectives. part ii: Lens modules and design principles,” Adv. Opt. Technol. 8(5), 349–384 (2019). [CrossRef]  

17. Y. Zhang and H. Gross, “Systematic design of microscope objectives. part iii: miscellaneous design principles and system synthesis,” Adv. Opt. Technol. 8(5), 385–402 (2019). [CrossRef]  

18. G. Côté, J.-F. Lalonde, and S. Thibault, “On the use of deep learning for lens design,” Proc. SPIE 12078, 230–236 (2021). [CrossRef]  

19. A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala, “Pytorch: An imperative style, high-performance deep learning library,” in Advances in neural information processing systems vol. 32, (2019), pp. 8024–8035.

20. M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng, “TensorFlow: Large-scale machine learning on heterogeneous systems,” arXiv:1603.04467 (2016).

21. J.-B. Volatier, Á. Menduiña-Fernández, and M. Erhard, “Generalization of differential ray tracing by automatic differentiation of computational graphs,” J. Opt. Soc. Am. A 34(7), 1146 (2017). [CrossRef]  

22. C. Wang, N. Chen, and W. Heidrich, “Lens design optimization by back-propagation,” in International Optical Design Conference 2021, vol. 12078 (SPIE, 2021), pp. 312–318.

23. H. Haim, S. Elmalem, R. Giryes, A. M. Bronstein, and E. Marom, “Depth Estimation From a Single Image Using Deep Learned Phase Coded Mask,” IEEE Trans. Comput. Imaging 4(3), 298–310 (2018). [CrossRef]  

24. J. Chang and G. Wetzstein, “Deep Optics for Monocular Depth Estimation and 3D Object Detection,” in Proceedings of IEEE International Conference on Computer Vision, (IEEE, 2019), pp. 10192–10201.

25. C. A. Metzler, H. Ikoma, Y. Peng, and G. Wetzstein, “Deep Optics for Single-Shot High-Dynamic-Range Imaging,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, (IEEE, 2020), pp. 1375–1385.

26. Q. Sun, E. Tseng, Q. Fu, W. Heidrich, and F. Heide, “Learning Rank-1 Diffractive Optics for Single-Shot High Dynamic Range Imaging,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, (IEEE, 2020), pp. 1386–1396.

27. Q. Sun, C. Wang, Q. Fu, X. Dun, and W. Heidrich, “End-to-end complex lens design with differentiate ray tracing,” ACM Trans. Graph. 40, 1–13 (2021). [CrossRef]  

28. E. Tseng, A. Mosleh, F. Mannan, K. St-Arnaud, A. Sharma, Y. Peng, A. Braun, D. Nowrouzezahrai, J.-F. Lalonde, and F. Heide, “Differentiable compound optics and processing pipeline optimization for end-to-end camera design,” ACM Trans. Graph. 40(2), 1–19 (2021). [CrossRef]  

29. A. Halé, P. Trouvé-Peloux, and J.-B. Volatier, “End-to-end sensor and neural network design using differential ray tracing,” Opt. Express 29(21), 34748–34761 (2021). [CrossRef]  

30. Schott Corporation, “Optical Glass Catalog,” (2021).

31. Ohara Corporation, “Optical Glass Catalog,” (2021).

32. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances in neural information processing systems vol. 30, (2017), pp. 5998–6008.
