
Metalens enhanced ray optics: an end-to-end wave-ray co-optimization framework


Abstract

We present a fully differentiable framework for seamlessly integrating wave optical components with geometrical lenses, offering an approach to enhance the performance of large-scale end-to-end optical systems. In this study, we focus on the integration of a metalens, a geometrical lens, and image data. Through the use of gradient-based optimization techniques, we demonstrate the design of nonparaxial imaging systems and the correction of aberrations inherent in geometrical optics. Our framework enables efficient and effective optimization of the entire optical system, leading to improved overall performance.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

The end-to-end method has emerged as a novel optical design paradigm in recent years, driven by advancements in computer hardware and the demand for new imaging capabilities [1,2]. This method offers advantages in terms of user customization and flexible, intuitive figures of merit [3] compared to the conventional strategy of designing and combining individual optical components. By improving system performance at a holistic level, the end-to-end method can also refine systems designed through the conventional approach.

Optical systems typically comprise numerous components, and their design is guided either by wave optics [4,5] or geometrical optics [6]. Wave optical components can be accurately described by first-principles Maxwell's equations, enabling the exploration of exotic optical phenomena such as negative refractive index [7] and ultra-broadband behaviors [8]. However, direct simulations based on these principles are often computationally expensive and do not scale well. In contrast, geometrical ray tracing is more suitable for larger systems and can easily model long propagation distances and large incident angles [9]. This approach finds extensive use in virtual reality (VR) systems [6]. However, the limitations of traditional optics and manufacturing techniques result in errors such as spherical and chromatic aberrations when simple lenses are used in VR systems. Bulky multilens sets can correct these aberrations but are impractical for wearable devices, limiting the potential for vivid near-eye displays. To address this challenge, researchers have explored a single metalens layer capable of correcting colors and manipulating light simultaneously [10]. Combining metalenses with ray tracing nevertheless remains important, because all-metalens VR systems are very expensive.

In recent years, significant progress has been made in the differentiable simulation of wave optics [11,12] and geometrical optics [13,14]. This progress paves the way for combining the properties of wave optics and the scalability of geometrical optics in a differentiable way, enabling the end-to-end modeling of nonparaxial imaging systems with large incident angles or the correction of aberrations in geometrical optics. However, achieving this combination is not straightforward. The description of light in wave optical systems relies on fields [15], whereas geometrical optics uses discrete rays. The conversion between these descriptions introduces information loss unless a high-order model of rays is employed. Moreover, the discretization of rays makes this conversion challenging to differentiate, if not impossible.

Research has demonstrated that combining meta-devices with geometrical optical components can enhance optical performance and correct aberrations [16]. In this study, we introduce a gradient-based end-to-end optimization strategy that enables efficient and simultaneous optimization of wave optical components, geometrical optical components, and software/algorithm components. By leveraging gradients, we can efficiently explore the design space and achieve optimal performance across multiple aspects of the optical system. The proposed framework is outlined in Fig. 1.

Fig. 1. The end-to-end wave-ray co-optimization framework. The inputs consist of plane waves from multiple directions, which are used to construct the input wavefront. The wavefront then passes through a metalens and is converted back to rays, which are directed through a geometrical lens to simulate the point spread functions. From these functions, we can simulate the output image. The optimization process involves jointly optimizing the metalens and geometrical lens to minimize the difference between the input and output images.

As illustrated in Fig. 1, the process begins by sourcing rays from the input image, which accumulate to form the incident wavefront. This wavefront then passes through a wave optical component, after which rays are extracted and directed through a geometrical lens. This setup enables simulation of the output image for any focal point and assessment of how the output image changes as the parameters of the wave optical component or the geometrical lens are varied. By minimizing the difference between the output image and the input image, an end-to-end optimization pipeline can be constructed. As examples, we show how to use this system for nonparaxial optical optimization and for minimizing chromatic aberrations in the Results section.

2. Optimization framework

2.1 Mathematical model

To begin, we assume that the light is described by a perfect plane wave with a uniform wave vector $\vec {k}_{\text {in}}$ when observed from the far field. The input wavefront at a position $\vec {r}$ can then be represented by the equation:

$$\phi_{\text{in}}(\vec{r}) = \exp{\left(j\vec{k}_{\text{in}}\cdot\vec{r}\right)}.$$

It is worth noting that our framework is also applicable to point sources. In this scenario, the input wavefront would be $\phi _{\text {in}}(\vec {r}) = \exp {\left (jk_0\Vert \vec {r}_{\text {in}} - \vec {r}\Vert \right )}$, where $\vec {r}_{\text {in}}$ is the position of the point source and $k_0$ is the wavenumber in vacuum.

As the wavefront passes through a flat metalens, the modulation of the wavefront can be described as a combination of the amplitude $A$ and phase $S$ modulations:

$$\phi_{\text{mod}} = A(\vec{r})\exp{\left(jS(\vec{r})\right)}.$$

Since the thickness of the metalens is negligible compared to the scale of the entire system, it can be ignored in the ray tracing part. The amplitude and phase modulation produced by the metalens, along with its gradients with respect to the geometries of the meta-atoms, are modeled by a differentiable neural network such as SIREN [17]. The main reason for this surrogate is that a first-principles solver is often much more costly than the other parts of our system. The output of the neural network is multiplied with the input wavefront to obtain the output wavefront:

$$\phi_{\text{out}} = \phi_{\text{mod}}\phi_{\text{in}}.$$

Our objective is to extract rays from the output wavefront $\phi _{\text {out}}$.
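As a concrete illustration, the sketch below (our own, not the paper's code) builds Eqs. (1)-(3) on a sampled grid in PyTorch; the grid parameters and the `modulation` function, a hypothetical stand-in for the SIREN surrogate described above, are assumptions for demonstration only.

```python
import torch

n, pitch = 200, 1e-6                               # assumed metalens sampling
x = (torch.arange(n) - n / 2) * pitch
Y, X = torch.meshgrid(x, x, indexing="ij")

k0 = 2 * torch.pi / 520e-9                         # green light, 520 nm
k_in = k0 * torch.tensor([0.1, 0.0, (1 - 0.01) ** 0.5])  # tilted unit direction
phi_in = torch.exp(1j * (k_in[0] * X + k_in[1] * Y))     # Eq. (1) at z = 0

def modulation(X, Y):
    # hypothetical stand-in for the SIREN surrogate: amplitude A and phase S
    A = torch.ones_like(X)
    S = -0.5 * k0 * (X ** 2 + Y ** 2) / 5e-3       # e.g. a focusing phase
    return A, S

A, S = modulation(X, Y)
phi_out = A * torch.exp(1j * S) * phi_in           # Eqs. (2)-(3)
```

Because every step above is a differentiable tensor operation, gradients with respect to the modulation parameters follow automatically from autograd.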

2.2 Differentiable conversion between waves and rays

The conversion between waves and rays consists of two parts. The conversion from rays to waves is straightforward: a ray with intensity $\left |A_k\right |^2$ (or amplitude $A_k$) and direction $\hat {\nu }$ represents a plane wave $A_k\exp {\left (j\vec {k}\cdot \vec {r}\right )}$, where $\vec {k} = (2\pi /\lambda )\hat {\nu }$ is the wavevector. All the transformations here are differentiable and can thus be implemented with automatic differentiation, like the other parts of our algorithm.

For the conversion from waves to rays, we propose two methods that can be combined with differentiable ray tracing; that is, the outputs must be quantities with definite directions and starting positions. Both methods assume the light can be represented by a linear combination of plane waves; in other words, the light is in a pure state [18]. For light in a mixed state, it is impossible to extract plane waves from the frequency-domain wavefront alone. We summarize the advantages and limitations of each method in Table 1; the details of the two methods are presented in Sections 3 and 4.

Table 1. Summary of the two wave-to-ray conversion methods.

Method | Assumption | Limitation
Windowed Fourier transform | Light is a linear combination of plane waves | Finite window size blurs ray directions (uncertainty principle)
Phase gradient | Phase is well-defined; each point is locally a plane wave | Not applicable under amplitude modulation or for mixed states

As shown in Table 1, the windowed Fourier transform does not assume the field is locally a plane wave; instead, it linearly decomposes the light into several plane waves. However, due to the uncertainty principle, the size of the window in the windowed Fourier transform is inversely proportional to the uncertainty of the output ray direction. The phase gradient method is free from this finite-window uncertainty, but it assumes the phase of the wavefront is well-defined; in other words, each point on the wavefront can be locally viewed as a plane wave.

2.3 Differentiable ray tracing

Once rays are extracted from the wavefront, they can be passed into the geometrical lens for ray tracing [19]. This comes at the cost of neglecting diffraction, which is acceptable for visible light propagating over distances of several millimeters, given a large enough aperture [20].

An optical component often has smooth boundaries, which means its bidirectional reflectance distribution function (BRDF) [21] is a delta function, or a summation of several delta functions. The tracing process can be modeled sequentially by a unified trace method for different objects. Each time, the trace function takes the input positions $\vec {r}$ and wavevectors $\vec {k}$, and outputs the hit positions and the refracted wavevectors. For efficiency, both inputs and outputs are treated as large tensors carrying the information of all rays. We neglect reflection; when total internal reflection occurs, we set the intensity of the corresponding ray to zero, and its position can be arbitrary. By ignoring branching and pruning rays far from the optical axis, we can vectorize the process and exploit the GPU computation of PyTorch to trace more than $10^4$ rays.
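For illustration, here is a minimal sketch of one vectorized trace step, assuming refraction at a spherical surface via Snell's law; the function name, arguments, and indices $n_1$, $n_2$ are our own illustrative choices, not the paper's actual API.

```python
import torch

def trace_sphere(r, k, center, radius, n1, n2):
    """r: (N,3) ray origins, k: (N,3) unit directions.
    Returns hit positions, refracted unit directions, and a validity mask."""
    oc = r - center
    b = (oc * k).sum(-1)
    c = (oc * oc).sum(-1) - radius ** 2
    disc = b ** 2 - c                                  # ray-sphere quadratic
    valid = disc > 0
    t = -b - torch.sqrt(disc.clamp(min=0.0))           # nearest intersection
    valid = valid & (t > 0)
    hit = r + t.unsqueeze(-1) * k
    n = (hit - center) / radius                        # outward surface normal
    cos_i = -(n * k).sum(-1)
    eta = n1 / n2
    sin2_t = eta ** 2 * (1.0 - cos_i ** 2)
    valid = valid & (sin2_t < 1.0)                     # drop total internal reflection
    cos_t = torch.sqrt((1.0 - sin2_t).clamp(min=0.0))
    k_out = eta * k + (eta * cos_i - cos_t).unsqueeze(-1) * n  # Snell's law
    return hit, k_out, valid
```

Keeping everything as batched tensor operations, as described above, is what allows PyTorch to both vectorize the tracing on the GPU and differentiate it end to end.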

2.4 Image simulation with differentiable point spread function

Once a ray hits the image plane at a sensor pixel, the sensor records a signal. Because we adopt the ray tracing model, diffraction patterns and interference are neglected. However, the positions of pixels are fixed in space, which is not inherently differentiable. To develop a differentiable point spread function (PSF) for end-to-end image optimization, we first write the point spread function as a summation of delta functions:

$$\text{PSF}(\vec{r}) = \frac{1}{B}\sum_{k}\left|A_k\right|^2\delta(\vec{r} - \vec{r}_k)$$
where $B$ is a normalization term that ensures the PSF sums to $1$.

At first glance, the delta functions in the PSF are not differentiable. In practice, we can either fit the PSF with a Gaussian function [22] or relax each delta function into a function with small variance. However, the relaxation introduces non-physical biases; moreover, if the variance is too small, the gradients approach zero far from the hit points and infinity near them. Therefore, we develop the differentiable PSF by considering the output image $G(\vec {r})$ as a convolution of the input image $F(\vec {r})$ with the spatially variant PSF

$$\begin{aligned}G(\vec{r}) &= F(\vec{r}) * \text{PSF}(\vec{r}) \\&= \frac{1}{B} \int\int \sum_{k}\left|A_k\right|^2\delta(\vec{r}' - \vec{r}_k)F(\vec{r} - \vec{r}'){\rm d} s(\vec{r}') \\&= \frac{1}{B}\sum_{k}\left|A_k\right|^2 F(\vec{r} - \vec{r}_k)\end{aligned}$$

Although the input image is always discretized, the color value at any position not aligned with the pixel grid can be found by interpolating the neighboring grid cells. This inspires us to precompute a PSF kernel. If the image size is $N_y\times N_x$, the PSF kernel should be of size $(2N_y-1)\times (2N_x-1)$ to cover the whole image, even when convolving at a corner of the image. If a ray hits outside the kernel, it never contributes to the sensor signal.

Now we describe the geometrical PSF as follows. We initialize the PSF kernel as a zero matrix. Each time a ray hits $\vec {r}_k$, assuming $\vec {r}_k$ lies between the four pixels $(i, j)$, $(i, j+1)$, $(i+1, j)$, and $(i+1, j+1)$ of the PSF kernel, we have

$$\alpha_k = \frac{y_k - y_i}{y_{i+1} - y_i}$$
$$\beta_k = \frac{x_k - x_j}{x_{j+1} - x_j},$$
we can add elements to the PSF kernel as follows:
$$P(i, j) +{=} \sum_k\left|A_k\right|^2(1-\alpha_k)(1-\beta_k)$$
$$P(i, j+1) +{=} \sum_k\left|A_k\right|^2(1-\alpha_k)\beta_k$$
$$P(i+1, j) +{=} \sum_k\left|A_k\right|^2\alpha_k(1-\beta_k)$$
$$P(i+1, j+1) +{=} \sum_k\left|A_k\right|^2\alpha_k\beta_k$$

The resulting PSF is differentiable thanks to the assumption that the image is smooth and can be interpolated; the gradients of $\alpha _k$ and $\beta _k$ with respect to $\vec {r}_k$ follow directly. The final convolution has no padding, so the output is the same size as the input image. To account for the spatial variance of the PSF, we can precompute a sampled grid of PSFs (e.g., $9\times 9$); the output image is then the convolution of the interpolated PSFs with the input image.
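A minimal PyTorch sketch of this bilinear splatting (Eqs. (6)-(11)) is given below, using `index_put_` with `accumulate=True` as a differentiable scatter-add; the function name, kernel layout, and bounds handling are illustrative assumptions.

```python
import torch

def splat_psf(hits, amps, pitch, ny, nx):
    """hits: (N,2) hit coordinates (y, x) on the image plane, amps: (N,) ray
    amplitudes. Returns a (2*ny-1, 2*nx-1) PSF kernel, normalized to sum to 1."""
    P = torch.zeros(2 * ny - 1, 2 * nx - 1)
    # continuous pixel coordinates, with the kernel center at the origin
    y = hits[:, 0] / pitch + (ny - 1)
    x = hits[:, 1] / pitch + (nx - 1)
    i, j = y.floor().long(), x.floor().long()
    a, b = y - i.float(), x - j.float()        # alpha_k, beta_k of Eqs. (6)-(7)
    w = amps.abs() ** 2
    inside = (i >= 0) & (i < 2 * ny - 2) & (j >= 0) & (j < 2 * nx - 2)
    i, j, a, b, w = i[inside], j[inside], a[inside], b[inside], w[inside]
    # Eqs. (8)-(11): bilinear scatter-add, differentiable w.r.t. hit positions
    P.index_put_((i,     j    ), w * (1 - a) * (1 - b), accumulate=True)
    P.index_put_((i,     j + 1), w * (1 - a) * b,       accumulate=True)
    P.index_put_((i + 1, j    ), w * a * (1 - b),       accumulate=True)
    P.index_put_((i + 1, j + 1), w * a * b,             accumulate=True)
    return P / P.sum().clamp(min=1e-12)
```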

2.5 Optimization and practical consideration

The image reconstruction loss through the optical system can be formulated as

$$\mathcal{L} = \frac{1}{S}\int\int\Vert F(\vec{r}) - G(\vec{r})\Vert^2_2{\rm d} s$$
where $S$ is the area of the image. This works for both paraxial and nonparaxial PSFs. For multiple channels such as RGB, we sum the loss functions of the different colors.

However, directly optimizing the image can get stuck in a local minimum. We find it better to obtain the initial guess by optimizing the paraxial properties first. To do this, we first optimize the hit point positions $\vec {r}_k$ on the image plane for a perpendicularly incident plane wave. The position-based loss can be formulated as

$$\mathcal{L} = \sum_k \Vert \vec{r}_k -\vec{r}_0\Vert^2_2$$
where $\vec {r}_0 = (0, 0)$ is the position of the focal point.
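For reference, a minimal PyTorch sketch of the two losses (Eqs. (12) and (13)); the tensor names `F_img`, `G_img`, `hits`, and `r0` are illustrative, not the paper's API.

```python
import torch

def image_loss(F_img, G_img):
    # Eq. (12): mean squared difference between input and simulated output
    return ((F_img - G_img) ** 2).mean()

def position_loss(hits, r0):
    # Eq. (13): squared distance of hit points from the focal point r0
    return ((hits - r0) ** 2).sum()
```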

3. Windowed Fourier transform

A natural idea for converting waves to rays is to decompose the wavefront into multiple plane waves. For each plane wave, we would sample an infinite number of rays along the corresponding direction at different positions, which is impossible in practice. Therefore, we need to choose a window for the Fourier transform.

The windowed Fourier transform, also called the short-time Fourier transform, is widely used in audio processing to extract the pitches of sound present at a given time $t$ [23]. For light waves, we need to consider the whole output plane, slid over by a 2D window. At any position $\vec {r}$, the ray intensity pointing toward an arbitrary wavevector $\vec {k}$ is given by

$${\left|A(\vec{r}, \vec{k})\right|}^2 = \left|\int_{\Omega} \phi_{\text{out}}(\vec{r}')\exp{\left({-}j\vec{k}\cdot(\vec{r}' - \vec{r})\right)}{\rm d} s(\vec{r}')\right|^2$$
where $\Omega$ is the 2D window. From the equation, we see that the result is blurred by the Fourier transform of the aperture $\Omega$: the smaller the aperture, the blurrier the result. However, increasing the window size delocalizes the position of the ray, increasing the width of the light beam. In summary, the window size is a hyperparameter that must be tuned carefully so the extracted rays match actual ray propagation. To illustrate the blurriness, we use a simple example: a 1D wavefront generated by a uniform wavevector $\vec {k}$. We extract amplitudes along different directions using the windowed Fourier transform and find that the smaller the window, the blurrier the extracted directions, even though the output should contain only one direction. The visualization is shown in Fig. 2.

Fig. 2. Visualization of the blurriness caused by the finite window size. Increasing the window size reduces the blurriness.
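As a concrete illustration of this 1D experiment, the sketch below (our own, with assumed grid sizes and a rectangular window) extracts plane-wave intensities from a tilted plane wave using torch.fft; a larger window `L` yields a narrower peak around the true direction.

```python
import torch

def windowed_ft(phi, dx, L, center):
    """phi: (N,) complex 1D wavefront sampled with spacing dx. Returns the
    spatial frequencies and |A|^2 for a rectangular window of width L
    centered at sample index `center` (Eq. (14))."""
    half = int(L / dx / 2)
    patch = phi[center - half : center + half]
    A = torch.fft.fftshift(torch.fft.fft(patch)) * dx
    kx = 2 * torch.pi * torch.fft.fftshift(torch.fft.fftfreq(patch.numel(), d=dx))
    return kx, A.abs() ** 2

# A tilted plane wave exp(j k0 sin(theta) x) should yield a single peak at
# kx = k0 sin(theta); a smaller L widens the sinc-shaped peak around it.
dx, N = 0.05e-6, 4096
x = torch.arange(N) * dx
k0 = 2 * torch.pi / 520e-9
phi = torch.exp(1j * k0 * 0.1 * x)              # sin(theta) = 0.1
kx, intensity = windowed_ft(phi, dx, L=100e-6, center=N // 2)
```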

Although it is possible to tune the window size empirically, the best window size $L$ can be derived from theory. Consider a 1D rectangular window with width $L$; the output direction $k$ is blurred by the Fourier transform of the rectangular window:

$$\mathcal{F}\left(\text{rect}\left(\frac{x}{L}\right)\right) \propto \text{sinc}{\left(\frac{Lk_x}{2}\right)} = \text{sinc}{\left(\frac{L\pi}{\lambda}\frac{k_x}{k_0}\right)} \approx \text{sinc}{\left(\frac{L\pi}{\lambda}\theta\right)}$$

At distance $d$, the blurriness on the image plane is characterized by the convolution kernel $\text {sinc}{\left (\frac {\pi L}{\lambda }\frac {x}{d}\right )}$, whose width is $2d\lambda / L$. Since a ray from a window of size $L$ has a beam width $L$, the final uncertainty is $\max {\left \{2d\lambda / L, L\right \}}$. The minimum blurriness is achieved when $L = \sqrt {2d\lambda }$. For a wavelength around $500~{\rm nm}$ and a distance around $10~{\rm mm}$, the optimal value of $L$ is around $100~\mu {\rm m}$.

The gradient can be backpropagated via a similar formula

$$\frac{\partial \mathcal{L}}{\partial \phi^*_{\text{out}}(\vec{r})} = \int_{\Omega} \left[ \frac{\partial \mathcal{L}}{\partial A(\vec{r}', \vec{k}')}\exp{\left(j\vec{k}'\cdot(\vec{r}' - \vec{r})\right)}\right]^*{\rm d} s(\vec{r}')$$
where $\vec {k}'$ is calculated using the target point and $\vec {r}'$. Notice that for backpropagation, we utilize Wirtinger derivative [24], which is compatible with PyTorch.

The outputs of the windowed Fourier transform are the amplitudes of rays pointing in arbitrary directions $\vec {k}$. As an approximation, it is possible to extract the ray directions with the top-$k$ amplitudes, and this top-$k$ selection can be differentiated by solving an optimal transport problem [25]. This method, however, is very slow even when implemented on GPUs. Another way is to decompose the optimization into two steps. In the first step, we solve for the directions of the output rays that optimize the imaging quality. In the second step, we optimize the amplitude and phase of the metalens to maximize the amplitude of the rays pointing in the target directions. We demonstrate the optimization using two setups: in the first, the phase modulation is coated on one side of the convex lens; in the second, the phase modulation is separated from the convex lens. We assume the incident waves come only from the direction perpendicular to the flat metalens, so the incident wavefront is uniform. The setup and resulting wavefront are shown in Fig. 3.

Fig. 3. The co-optimization of the metalens and geometric lens. (a) The metalens is separated from the convex lens by $2~{\rm mm}$, with size $2~{\rm mm}$ in both $x$ and $y$ dimensions. The curvature radius of the convex lens is fixed at $3~{\rm mm}$, and the optimized wavefront $\phi _{\text {mod}}$ is shown in (b). (c) The metalens is coated on one side of the convex lens, and the curvature radius of both sides of the convex lens is optimized from $3.5~{\rm mm}$ to $3.2~{\rm mm}$. The aperture of the metalens viewed from the $z$ direction is $2~{\rm mm}\times 2~{\rm mm}$; the aperture of the convex lens is $0.5~{\rm mm}\times 0.5~{\rm mm}$. The optimized wavefront on the metalens is shown in (d). The focal point is $5~{\rm mm}$ from the center of the convex lens. For simplicity we only optimize the center PSF.

As shown in Fig. 3, our algorithm is able to optimize the convex lens together with the metalens. For simplicity, we only optimize the normally incident plane wave, without considering plane waves from different angles. We can observe that the wavefront $\phi _{\text {mod}}$ differs inside and outside the aperture of the geometric lens. This is because rays from the metalens that hit nothing travel in straight lines, whereas rays that hit the convex lens change direction.

4. Phase gradient method

Since a change of amplitude splits rays into multiple directions, we often only need to optimize the phase modulation for a better result. We assume the wave is described by a scalar field:

$$\phi_{\text{out}} = \exp{\left(jS_{\text{out}}(\vec{r})\right)}$$

If the wave optics component acts like a phase modulator, it only changes $S_{\text {out}}$. The phase gradient deflects the direction of the output rays according to the generalized law of refraction [26]. Assuming the output wave is locally a plane wave, we propose the equivalent wavevector extractor:

$$\vec{k} = \frac{\partial S_{\text{out}}}{\partial \vec{r}}$$

Note that here we assume that at each position only one $\vec {k}$ is possible. This works for coherent plane waves without amplitude modulation, or as an approximation of the average propagation direction for incoherent light.

To implement the equivalent wavevector extractor, assume the phase is discretized on a uniform grid with step size $h$; the three components of the wavevector are

$$k_x(i, j) = 0.5(S(i, j+1) - S(i, j-1))/h$$
$$k_y(i, j) = 0.5(S(i+1, j) - S(i-1, j))/h$$
$$k_z(i, j) = \sqrt{k^2_0 - k^2_x(i, j) - k^2_y(i, j)}$$

By implementing these steps in PyTorch, we can use Autograd [27] to compute the gradient of $\vec {k}$ with respect to the phase $S$. Since the majority of our tasks primarily involve optimizing phase modulation, both methods yield similar outcomes. However, due to its faster execution and the absence of window size tuning, we adopt the phase gradient method for our subsequent experiments.
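A minimal sketch of the extractor (Eqs. (19)-(21)) in PyTorch is given below; the central differences act on interior grid points, and the clamping of evanescent components to zero is our own assumption.

```python
import torch

def extract_wavevectors(S, h, k0):
    """S: (H, W) output phase on a uniform grid with step h. Returns kx, ky,
    kz on interior grid points via central differences, Eqs. (19)-(21)."""
    kx = (S[1:-1, 2:] - S[1:-1, :-2]) / (2 * h)
    ky = (S[2:, 1:-1] - S[:-2, 1:-1]) / (2 * h)
    kz2 = k0 ** 2 - kx ** 2 - ky ** 2
    kz = torch.sqrt(kz2.clamp(min=0.0))     # clamp evanescent components to 0
    return kx, ky, kz
```

Because slicing, subtraction, and the square root are all differentiable tensor operations, autograd propagates gradients from the extracted wavevectors back to the phase $S$ with no extra work.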

5. Results

5.1 Paraxial and nonparaxial optical design

We conducted all experiments on an internal platform with an Nvidia Tesla P100 GPU and an Intel Xeon (Skylake) CPU; the code can easily be migrated to other platforms as long as an Nvidia GPU is available. First, we validate the design capability of our framework by co-optimizing a setup with both an optical phase modulator and a convex lens, adopting the position-based loss first.

The results are shown in Fig. 4. The phase lens has a diameter of $2~{\rm mm}$, and we assume the geometrical lens is placed $2~{\rm mm}$ away from the phase lens, with an aperture of $0.5~{\rm mm}$. The focal point is $5~{\rm mm}$ away from the phase lens. We optimize so that all rays from the wave plate hit the image plane at the center. Note that this setup is only for demonstration. For nonparaxial optical design, we need to make sure the aperture of the geometrical lens is large enough that plane waves from large incident angles still pass through the convex lens. We assume the curvature radius of the convex lens is fixed at $3~{\rm mm}$.

Fig. 4. To improve the near-axis point spread function, we utilize the position-based loss function described in Eq. (13). We begin by applying a uniform phase modulation and then allow the output rays to pass through a convex lens with a fixed curvature radius of $3~{\rm mm}$. The left column depicts the progression of the point spread function for the entire system. The middle column shows the resulting output image after passing through the hybrid optical system. The right column displays the phase modulation during the optimization process.

From Fig. 4, we observe that the initial PSF consists of a small focused circle surrounded by a ring. This is because the light is perpendicularly incident on the convex lens: rays entering the aperture are focused, while rays outside the aperture do not change direction. We show that by optimizing the whole system, we are able to focus both sets of rays at the focal point. For the simulation, we only consider green light ($520~{\rm nm}$); this can easily be extended to other wavelengths by adding the corresponding loss terms.

Next, we consider a more realistic scene with light incident from large angles. For each incident angle, the light is modeled by a plane wave with a fixed wavevector $\vec {k}$. Because our setup consists of two elements, a phase modulator and a geometrical lens, the aperture of the geometrical lens must be large enough for the rays from the modulator to pass through the second lens. An illustration of the setup is shown in Fig. 5.

Fig. 5. This figure depicts the device setup for the nonparaxial optical design. Plane waves from various directions undergo an optimized phase modulation of size $2~{\rm mm}$ and then pass through a large convex lens. The convex lens aperture is kept at $7.9~{\rm mm}$ to allow plane waves from incident angles as high as $\pm 12^\circ$ to pass through. The phase modulation and convex lens are separated by a distance of $2~{\rm mm}$, and the sensor and convex lens are also separated by $2~{\rm mm}$. We jointly optimize the phase modulation and convex lens to enhance the imaging quality on the sensor from the far field.

For each plane wave, the phase at any coordinate on the metalens can be computed easily. Assume the $xy$-coordinates on the metalens are $x_{\text {inc}}$ and $y_{\text {inc}}$, and the transverse wavevector of the plane wave is $\vec {k}_{\text {inc}} = (k_x, k_y)$; then the incident phase is

$$S_{\text{inc}} = k_x x_{\text{inc}} + k_y y_{\text{inc}}.$$

If the phase contribution of the optimizable modulator is $S_{\text {mod}}(x, y)$, the output phase can be written as

$$S_{\text{out}} = S_{\text{inc}} + S_{\text{mod}}.$$

This output phase can be put into Eq. (18) to get the output wavevector $\vec {k}$. After ray tracing, each plane wave hits a set of points on the imaging plane. If we optimize for an image with pixel size $p = 13~\mu {\rm m}$, the sensor size is assumed to be $pN_y\times pN_x$. In practice, we sample the centers of the PSFs uniformly within the sensor.
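For illustration, a minimal sketch of forming the output phase for one tilted plane wave (Eqs. (22)-(23)); the grid step, resolution, and incidence angle are assumptions chosen for demonstration.

```python
import torch

h, n = 1e-6, 2000                              # assumed grid step and size
x = (torch.arange(n) - n / 2) * h
Y, X = torch.meshgrid(x, x, indexing="ij")
k0 = 2 * torch.pi / 520e-9

theta = torch.deg2rad(torch.tensor(8.0))       # within the ±12° field of view
kx_inc = k0 * torch.sin(theta)                 # transverse wavevector component
S_inc = kx_inc * X                             # Eq. (22), with k_y = 0

S_mod = torch.zeros(n, n, requires_grad=True)  # optimizable modulation phase
S_out = S_inc + S_mod                          # Eq. (23), fed into Eq. (18)
```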

We optimize the curvature radius of the geometrical lens and the phase modulation together. Because directly optimizing the image can easily get stuck in local minima, in practice we first minimize the spreading of the geometrical point spread functions (spot diagrams). Optimization based on this loss alone is not sufficient, however, because different point spread functions may contribute differently to the final imaging quality, and it is hard to determine the weights between them. Moreover, different regions of the image may have different importance; for example, a region with more detail may need a better point spread function. We therefore use the spot-diagram result as a good initial guess for optimizing the image. The optimized point spread functions and the resulting image are shown in Fig. 6. To further analyze the results, we also visualize the phase and lens radius during the optimization, as shown in Fig. 7.

Fig. 6. (a) To optimize the point spread functions and the resulting image, we first set the lens radius to $400~{\rm mm}$ and initialize the phase modulation to be uniform. Then, we train both the geometrical lens and the wave optical component by minimizing the deviation of the points in the simulated point spread functions from their corresponding target centers for the plane waves. Finally, we fine-tune the results by minimizing the difference between the input image and the output image. (b) The loss-versus-iteration curve of the fine-tuning process. (c) The ideal output image, which is the input of our optical system.

Fig. 7. The optimization progress of phase modulation and curvature radius for the geometrical lens. (a) The initial state with a uniform phase modulation and a lens radius of $400~{\rm mm}$. (b) The optimized phase modulation and curvature radius obtained by minimizing the deviation of the geometrical point spread functions from their target centers. (c) The further refined phase modulation and lens radius obtained by minimizing the difference between the input and output images. (d) The difference between (b) and (c).

Despite a significant improvement in imaging quality, the resulting image still appears blurry compared to the training image in Fig. 6(c). This blurriness primarily arises from the limitations of a simple system composed of two optical components; more sophisticated imaging systems require careful design and selection of multilens sets and their configurations. To further enhance the optimization results, we can consider strategies such as exploring different initializations of the PSF or optimizing over a dataset of randomized images, which help mitigate local minima in gradient-based optimization. While our current approach optimizes the optical system using only one image, the performance may degrade when applied to other images; we anticipate, however, that the training method can still enhance imaging performance for images with similar pixel distributions. To improve performance in a general scenario, a more effective training strategy is to collect a comprehensive dataset and optimize the optical design over all available images [1]. This approach is particularly suitable for designing optical systems that cater to specialized tasks within relatively simple scenarios, such as eye-tracking or indoor depth sensing. By incorporating a diverse range of images during optimization, we can achieve more robust and versatile performance across different scenarios.

5.2 Enhanced geometrical optics for virtual reality

Recent advancements have demonstrated that precise control of the phase for discrete wavelengths can be achieved through careful design of meta-atoms on a metalens [28]. This finding is particularly relevant for VR displays, where the light source is restricted to several discrete bands (e.g., red, green, and blue), enabling the correction of color aberrations and other lens-induced aberrations. This flexibility offers significant advantages at the system level, enhancing the overall performance of the VR experience.

When we enhance a geometrical lens with a metalens, we can correct the aberrations inherent in geometrical optics without introducing more weight and thickness. To design a metalens with multiple meta-atoms, one approach is to construct a library of meta-atoms, determine the desired phase shift across the lens, and assign meta-atoms accordingly. However, we observed that while this strategy can approximate the desired phase distribution, it results in fluctuating phase gradients on the metalens, leading to undesired diffracted beams in the far field. To overcome this challenge, we opted for an end-to-end training strategy, iteratively refining the geometries of the meta-atoms using gradients derived from the loss function. As evaluating the performance of all meta-atoms with a wave simulation such as rigorous coupled-wave analysis (RCWA) is computationally intensive, we employ a trained neural network as a surrogate model to expedite both the forward simulation and the backward gradient calculation. This enables us to navigate the optimization efficiently and achieve better results for the metalens design. The surrogate model characterizes how the shape of a meta-atom (represented by four parameters) affects the correction phase and wavefront for the aberrations. In this study, we focus on rectangular meta-atoms and simulate the effective indices $n_{\text {eff}}$ of different colors (wavelengths of 488 nm, 532 nm, and 658 nm) from the fundamental mode. The effective indices are solved using the RCWA solver provided in [12]. The effective phase modulation is calculated as $\phi _{\text {mod}} = k_0 n_{\text {eff}} h$, where $k_0$ is the wavenumber in vacuum and $h$ is the height of the meta-atom. We emphasize that employing varying heights for the meta-atoms aims to ensure maximum coverage of the $2\pi$ phase difference for all three wavelengths on the transmission side. While a fixed height can still yield an improved design, the margin for improvement is reduced. Alternatively, it is possible to utilize complex shapes with more degrees of freedom to accommodate the phase shifts [29,30] while maintaining a consistent height for all meta-atoms. To optimize such a metalens, it is simply a matter of training the surrogate model to predict the spectral responses from the new geometric parameters; all other steps of the optimization strategy remain unchanged.

For the surrogate model, we employ a SIREN neural network consisting of four hidden layers, each with eight neurons. The inputs are four meta-atom shape descriptors ($w_0$, $w_1$, $w_2$, and $w_3$), while the outputs are the effective indices ($n_b$, $n_g$, and $n_r$) for blue, green, and red light, respectively. The meta-atoms can be fabricated from silicon surrounded by SiO2. To train the neural network, we simulate the results for 10,000 different configurations and evaluate its performance on an additional 1,000 randomly generated configurations. The loss is the sum of the squared distances between the outputs and the ground-truth effective indices of the different colors. We train the neural network for 5,000 iterations. The final training loss is $7.5\times 10^{-4}$ and the testing loss is $9.8\times 10^{-4}$, which is accurate enough for our tasks. The architecture of the neural network and the shape of the meta-atoms are illustrated in Fig. 8.

Fig. 8. The architecture of the surrogate neural network and its parameterization. We input four parameters and use the simulated response for the three colors (blue, green, red) to train a simple multi-layer SIREN. To avoid overfitting, we choose a rather simple structure. A diagram of the cross-section of a meta-atom is shown on the right.
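As an illustration of the surrogate described above, the following sketch builds a four-hidden-layer, eight-neuron SIREN [17] in PyTorch; the $\omega_0 = 30$ activation frequency, the plain initialization, the optimizer, and the training-step names are our own assumptions, not the paper's exact setup.

```python
import torch
import torch.nn as nn

class Sine(nn.Module):
    # SIREN's periodic activation; omega_0 = 30 is an assumed value
    def forward(self, x):
        return torch.sin(30.0 * x)

dims = [4, 8, 8, 8, 8, 3]            # 4 shape inputs, four hidden layers, 3 indices
layers = []
for i in range(len(dims) - 1):
    layers.append(nn.Linear(dims[i], dims[i + 1]))
    if i < len(dims) - 2:
        layers.append(Sine())
surrogate = nn.Sequential(*layers)

# Training step: squared distance to the RCWA ground-truth effective indices.
opt = torch.optim.Adam(surrogate.parameters(), lr=1e-3)

def train_step(w, n_true):           # w: (B, 4) shapes, n_true: (B, 3) indices
    loss = ((surrogate(w) - n_true) ** 2).sum(-1).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```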

We start with a single-sided spherical lens with curvature radius $2.1~{\rm mm}$. The lens is located at $z = 2~{\rm mm}$. From Lensmaker’s equation, the focal point should be at

$$f = z + \frac{R}{n-1} = 2 + \frac{2.1}{1.46 - 1} \approx 6.56 ~{\rm mm}$$

However, when we simulate the system through ray tracing, the actual focal point is near $7.2~{\rm mm}$ because of lens aberrations. Furthermore, rays of different colors focus at different positions, as shown in Fig. 9(a). Based on this, we add the SIREN-surrogated metalens at $z = 0~{\rm mm}$ to correct the errors. After the wave-ray co-optimization, the final lens radius is $2098.88~\mu {\rm m}$, and all colors are focused at $z = 7.2~{\rm mm}$ with minimal chromatic aberration, as shown in Fig. 9(b). The four optimized parameters and the heights of the meta-atoms are shown in Fig. 10.

Fig. 9. The initial and optimized hit points for different $z$-planes. Initially, rays of different colors focus at different positions, and the focus quality is limited by geometrical aberrations. After optimization, rays of all colors achieve better focus quality at $z = 7.2~{\rm mm}$.

Fig. 10. The optimized parameters and heights of the meta-atoms. The units in the figures are $\mu {\rm m}$. The size of the metalens is $2\times 2~{\rm mm^2}$. Each meta-atom has a period of $1~\mu {\rm m}$, so the full metalens resolution is $2000\times 2000$. Because of GPU memory limitations, we optimize at a lower resolution of $200\times 200$; for fabrication, the results can be interpolated to a higher resolution.

It has been demonstrated that by jointly optimizing the metalens and the geometrical lens, significant improvements can be achieved in terms of focus quality and color aberration reduction.

6. Conclusion

To summarize, our work proposes a fully differentiable framework for co-optimizing a metalens and a geometrical lens, which is, to our knowledge, the first attempt to combine ray tracing and wavefront modulation in a gradient-based optimization approach. However, our framework relies on certain assumptions, such as neglecting diffraction and using the geometrical point spread function as the response of the optical system.

In our algorithm, we primarily focus on optimizing the intensity distribution for imaging tasks. However, the framework can feasibly be extended to the general optimization of other optical properties, such as polarization, phase, and amplitude. To achieve this generalization, certain modifications are needed: we would shift from intensity-based to electric-field-based transformations, so that all operations are performed on vector quantities instead of scalar values. For instance, rays would need to carry electric fields instead of intensities while tracing through the optical system, and the remaining parts of the algorithm would require similar revisions. Exploring this topic systematically would be a separate and intriguing avenue of investigation. In addition, we plan to optimize the end-to-end framework over a large image dataset, connecting it with neural networks for more domain-specific tasks. We also acknowledge recent advancements in path tracing techniques, such as [31], which can handle wave effects more effectively; it will be interesting to explore differentiable optimization approaches for such techniques for the purpose of optical design.

Acknowledgments

We thank Zhao Dong and Shuang Zhao for their valuable suggestions.

Disclosures

The authors declare that there are no conflicts of interest related to this article.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. V. Sitzmann, S. Diamond, Y. Peng, X. Dun, S. Boyd, W. Heidrich, F. Heide, and G. Wetzstein, “End-to-end optimization of optics and image processing for achromatic extended depth of field and super-resolution imaging,” ACM Trans. Graph. 37(4), 1–13 (2018). [CrossRef]  

2. Z. Lin, C. Roques-Carmes, R. Pestourie, M. Soljačić, A. Majumdar, and S. G. Johnson, “End-to-end nanophotonic inverse design for imaging and polarimetry,” Nanophotonics 10(3), 1177–1187 (2021). [CrossRef]  

3. S. K. Nayar, “Computational cameras: Redefining the image,” Computer 39(8), 30–38 (2006). [CrossRef]  

4. K. Miyamoto, “The phase fresnel lens,” J. Opt. Soc. Am. 51(1), 17–20 (1961). [CrossRef]  

5. M. Pan, Y. Fu, M. Zheng, H. Chen, Y. Zang, H. Duan, Q. Li, M. Qiu, and Y. Hu, “Dielectric metalens for miniaturized imaging systems: progress and challenges,” Light: Sci. Appl. 11(1), 195 (2022). [CrossRef]  

6. O. Cakmakci, Y. Qin, P. Bosel, and G. Wetzstein, “Holographic pancake optics for thin and lightweight optical see-through augmented reality,” Opt. Express 29(22), 35206–35215 (2021). [CrossRef]  

7. D. R. Smith, J. B. Pendry, and M. C. Wiltshire, “Metamaterials and negative refractive index,” Science 305(5685), 788–792 (2004). [CrossRef]  

8. W. Cheng, J. Feng, Y. Wang, Z. Peng, S. Zang, H. Cheng, X. Ren, Y. Shuai, H. Liu, J. Wu, and J. Yang, “Genetic algorithms designed ultra-broadband achromatic metalens in the visible,” Optik 258, 168868 (2022). [CrossRef]  

9. E. Hart, “Practical solutions for ray tracing content compatibility in unreal engine 4,” in Ray Tracing Gems II: Next Generation Real-Time Rendering with DXR, Vulkan, and OptiX (2021), pp. 845–858.

10. Z. Li, P. Lin, Y.-W. Huang, J.-S. Park, W. T. Chen, Z. Shi, C.-W. Qiu, J.-X. Cheng, and F. Capasso, “Meta-optics achieves rgb-achromatic focusing for virtual reality,” Sci. Adv. 7(5), eabe4458 (2021). [CrossRef]  

11. T. W. Hughes, I. A. Williamson, M. Minkov, and S. Fan, “Forward-mode differentiation of maxwell’s equations,” ACS Photonics 6(11), 3010–3016 (2019). [CrossRef]  

12. Z. Zhu and C. Zheng, “Differentiable scattering matrix for optimization of photonic structures,” Opt. Express 28(25), 37773–37787 (2020). [CrossRef]  

13. C. Zhang, L. Wu, C. Zheng, I. Gkioulekas, R. Ramamoorthi, and S. Zhao, “A differential theory of radiative transfer,” ACM Trans. Graph. 38(6), 1–16 (2019). [CrossRef]  

14. A. Teh, M. O’Toole, and I. Gkioulekas, “Adjoint nonlinear ray tracing,” ACM Trans. Graph. 41(4), 1–13 (2022). [CrossRef]  

15. D. M. Sullivan, Electromagnetic simulation using the FDTD method (John Wiley & Sons, 2013).

16. W. T. Chen, A. Y. Zhu, J. Sisler, Y.-W. Huang, K. M. Yousef, E. Lee, C.-W. Qiu, and F. Capasso, “Broadband achromatic metasurface-refractive optics,” Nano Lett. 18(12), 7801–7808 (2018). [CrossRef]  

17. V. Sitzmann, J. Martel, A. Bergman, D. Lindell, and G. Wetzstein, “Implicit neural representations with periodic activation functions,” Advances in Neural Information Processing Systems 33, 7462–7473 (2020).

18. C. J. Isham, Lectures on Quantum Theory: Mathematical and Structural Foundations (Allied Publishers, 2001).

19. P. Shirley and R. K. Morley, Realistic ray tracing (AK Peters, Ltd., 2008).

20. G. B. Airy, “On the diffraction of an object-glass with circular aperture,” Transactions of the Cambridge Philosophical Society 5, 283 (1835).

21. F. E. Nicodemus, “Directional reflectance and emissivity of an opaque surface,” Appl. Opt. 4(7), 767–775 (1965). [CrossRef]  

22. A. Halé, P. Trouvé-Peloux, and J.-B. Volatier, “End-to-end sensor and neural network design using differential ray tracing,” Opt. Express 29(21), 34748–34761 (2021). [CrossRef]  

23. J. A. Moorer, “A note on the implementation of audio processing by short-term fourier transform,” in 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), (IEEE, 2017), pp. 156–159.

24. R. Hunger, An introduction to complex differentials and complex differentiability (Munich University of Technology, Inst. for Circuit Theory and Signal Processing, 2007).

25. Y. Xie, H. Dai, M. Chen, B. Dai, T. Zhao, H. Zha, W. Wei, and T. Pfister, “Differentiable top-k with optimal transport,” Advances in Neural Information Processing Systems 33, 20520–20531 (2020).

26. N. Yu, P. Genevet, M. A. Kats, F. Aieta, J.-P. Tetienne, F. Capasso, and Z. Gaburro, “Light propagation with phase discontinuities: generalized laws of reflection and refraction,” Science 334(6054), 333–337 (2011). [CrossRef]  

27. A. Paszke, S. Gross, F. Massa, et al., “Pytorch: An imperative style, high-performance deep learning library,” Advances in Neural Information Processing Systems 32 (2019).

28. A. C. Overvig, S. Shrestha, S. C. Malek, M. Lu, A. Stein, C. Zheng, and N. Yu, “Dielectric metasurfaces for complete and independent control of the optical amplitude and phase,” Light: Sci. Appl. 8(1), 92 (2019). [CrossRef]  

29. W. T. Chen, A. Y. Zhu, V. Sanjeev, M. Khorasaninejad, Z. Shi, E. Lee, and F. Capasso, “A broadband achromatic metalens for focusing and imaging in the visible,” Nat. Nanotechnol. 13(3), 220–226 (2018). [CrossRef]  

30. S. Wang, P. C. Wu, V.-C. Su, Y.-C. Lai, M.-K. Chen, H. Y. Kuo, B. H. Chen, Y. H. Chen, T.-T. Huang, J.-H. Wang, R.-M. Lin, C.-H. Kuan, T. Li, Z. Wang, S. Zhu, and D. P. Tsai, “A broadband achromatic metalens in the visible,” Nat. Nanotechnol. 13(3), 227–232 (2018). [CrossRef]  

31. S. Steinberg and L.-Q. Yan, “A generic framework for physical light transport,” ACM Trans. Graph. 40(4), 1–20 (2021). [CrossRef]  
