Deep-learning-enabled electromagnetic near-field prediction and inverse design of metasurfaces

Tevfik Bulent Kanmaz; Efe Ozturk; Hilmi Volkan Demir; Cigdem Gunduz-Demir

doi:10.1364/OPTICA.498211

1. INTRODUCTION

Metasurfaces are artificially designed electromagnetic (EM) structures that allow the realization and engineering of a desired wavefront profile using sub-wavelength metaunits. An array of different orientations of such metaunits, each of which essentially acts as a scatterer, forms an optical structure that is ultra-thin compared to its counterpart conventional optical tools [1–4]. The interactions of light with the sub-wavelength units make it possible to control the phase and magnitude of the scattered light [5,6]. Therefore, metasurfaces are competent optical devices that can be utilized for numerous tasks including image formation, focusing, optical vortex generation, spectroscopy, and augmented reality [7–9]. The metaunits composed of all-dielectric materials have been utilized in recent metasurface architectures. These dielectric metaunits are shown to yield minor losses across a wide range of frequencies compared to designs that adopt metal-based units [10–12].

As much as metasurface designs offer benefits in terms of optical performance given their thickness, the design process can be cumbersome and is usually not straightforward. A mainstream approach in their design is to “phase-match” the responses of consecutive units as a designer seeks to achieve the right interference of the EM waves [13,14]. However, this requires a repetitive process of trial and error. Also, the EM response of a surface may not be a direct superposition of the responses of the individual units but rather a collective EM response if there exist inter-coupling effects among the metaunits [15]. In general, one typically needs to anticipate a design with intuition and optimize it with a predefined metaunit set or, alternatively, to propose an analytical solution for the model. Nevertheless, finding an analytical solution for an arbitrary design can be practically impossible. Moreover, the EM response of even a single metaunit may become unpredictable with minor changes in the geometry and materials of the units. These difficulties give rise to a strong motivation for the investigation of efficient tools capable of finding the EM response. Conventionally, iterative numerical methods and simulation techniques, including the finite element method (FEM), finite difference time domain (FDTD), and the finite integration technique (FIT), are utilized [16]. These methods provide accurate and reliable solutions even for complex structures. Nonetheless, they are essentially brute-force methods that require immense computational resources to reach a solution [17,18]. As the conventional metasurface design process relies on trial and error, the inefficiency of the numerical methods makes it considerably time-consuming and, in most cases, requires an experienced metasurface designer.

Fig. 1. For the forward problem, the input is a cross-section nanopillar map representing the metasurface geometry, and the output is a near-field electric field response. The arrow between the input and output indicates the use of our proposed neural network. For the inverse design problem, input–output relations are the opposite and the arrows should be reversed. In other words, the input is the near-field electric field response, and the output is the refractive index cross-section nanopillar map.

In this paper, we propose and demonstrate a deep-learning-based solution to overcome these disadvantages of conventional methods. Deep neural networks have shown promising results for solving many scientific and engineering problems, including the prediction of the EM response of metasurfaces [19–23]. However, these previous studies have their specific limitations. Previously, the analysis of the EM response of metasurfaces was focused only on the metaunits’ spectrum response. Additionally, they were limited to a small number of discrete metaunits. This decreases the total number of possible designs but makes it easier to generate a dataset and train a model. Moreover, the inter-coupling effects of the metastructures were generally fully omitted or confined to one axis of interaction only [23]. Overall, these previous studies do not provide full EM information for a complex and high degree-of-freedom (DOF) design that accounts for non-constrained inter-coupling effects. Thus, they cannot offer a fully capable alternative for numerical solvers such as FDTD, for example, for the implementation of all-dielectric inter-coupled nanopillar fabrics.

In this work, different from the previous literature, we introduce a multi-task deep neural network design to predict the complete phase map information for a high-DOF inter-coupled metasurface geometry (forward problem). In addition, we propose to use a single-task deep neural network with a similar architecture for the inverse design problem, in which the aim is to find the metasurface geometry from a given EM phase profile [24]. To the best of our knowledge, there exist no conventional tools or deep-learning-based models for the solution of this inverse design problem, which makes our design the first to achieve such metasurface geometry prediction. Working with three types of metasurface design configurations, each of which has a different DOF (Fig. 1), our experiments revealed that the proposed neural networks showed high performance for both the forward and inverse design problems.

2. METHOD

In this work, two main problems are studied: 1) the forward problem of EM near-field response prediction, in which the input is a cross-section map of a metasurface, and the output is an EM near-field intensity map, and 2) the inverse design problem of metasurface geometry prediction, in which the input is an EM phase profile and the output is a metasurface geometry. Both of these prediction problems are addressed for three types of metasurface design configurations, as illustrated in Fig. 1, by designing deep neural networks.

A. Metasurface Design Configurations

The selected three configurations use different positionings of the fixed-size pillars of the same material. In our experiments, we fixed the size and the material type since one can achieve almost any desired phase profile by changing the positions of these pillars, and thus their level of inter-coupling, to cover local phase accumulation across the entire range from 0 to $2\pi$.

In general, one may prefer metaunits that lead to devices and metasurface designs as efficient and broadband as possible. Although the metaunit designs that use the Pancharatnam–Berry phase approach have produced promising results in recent years [25], the resulting devices typically suffer from low device efficiency and large lattice sizes. The lattice structures are forced to be large because the inter-coupling effects between the metaunits must be reduced. On the contrary, metalens structures that utilize the inter-coupling effects between the metaunits do not suffer from significant lattice constraints and show great efficiency [23,26]. Despite the performance level of these designs near the optimum device operating frequency, they tend to fail at providing the same efficiency over a broad spectrum [27]. Considering these undesired limitations, in our designs, we use a dielectric metaunit that provides full phase accumulation coverage by controlling the inter-coupling between the metaunits while offering broadband efficiency over the entire 400–700 nm visible range. The ${{\rm TiO}_2}$ nanopillar optimized in [28] is adopted as our fundamental design block. These nanopillars have a fixed radius of 45 nm and a fixed height of 600 nm and are placed on top of a ${{\rm SiO}_2}$ substrate [Fig. 2(a)].

Fig. 2. (a) 3D illustration of a single nanopillar used as the metaunit. For the metaline configuration, (b) refractive index cross-section of an example positioning of the nanopillars and (c) their EM near-field intensity map obtained by the numerical simulation for the forward problem. For the triangular grid configuration, (d) refractive index cross-section of an example positioning of the nanopillars and (e) their EM near-field intensity map obtained by the numerical simulation. (f) Illustration of randomly positioned nanopillars. (g) Illustration of circularly positioned nanopillars, which will be used for the achromatic metalens design.

The metaline configuration is used as a basic proof of concept, in which three pillars are aligned; i.e., the simulated structure is a “metaline” that is periodic and infinite along one direction. The position of one pillar is fixed at the center of the cross-sectional area, and the other two meta pillars’ distances to the center change in each input sample. The simulation data were collected at a wavelength of 550 nm. The resolution of a two-dimensional (2D) cross-sectional input image is set to $128 \times 48\;{\rm pixels}$. The pillars are represented on this input image as circles with a fixed radius. With the selected image resolution and the radius, this geometry ensures that the interactions between the pillars are confined to one axis only. This reduces the problem’s complexity and DOF. In this work, we use this configuration to show that the inter-coupling effects between metapillars can be predicted with a deep neural network. An example positioning of the pillars and its EM near-field intensity map obtained by simulation are illustrated in Figs. 2(b) and 2(c), respectively.

In the triangular grid configuration, the pillars are randomly placed on a triangular grid, which limits the number of places that a pillar can be located (see Fig. S1 in Supplement 1). The lattice spacing is set to 5 nm, which is small enough to induce the coupling of the pillars from every direction. The coupling interactions between the pillars are, therefore, not restricted to one axis but effective in all directions. Likewise, the simulation data were collected at a wavelength of 550 nm. In this configuration, the input image size is fixed as $128 \times 128$. As a result of this fixed size, the number of randomly located metapillars changes from one sample to another (in our experiments, this number varies from 10 to 27 due to the packing limit). Note that this configuration has the limitation of having a fixed total area, which will be relaxed in the next configuration. An example pillar positioning and its simulated map are given in Figs. 2(d) and 2(e), respectively.

The random pillar configuration is used for the case where there are no constraints on the number and position of the pillars or on the total simulation area. The random positioning of the pillars is illustrated in Fig. 2(f). This configuration simulates the case where an arbitrarily large simulation area is chosen to calculate its EM near-field response. In our experiments, metapillars are randomly located across a simulation area of $2048 \times 2048\;{\rm pixels}$. This results in an average of $2950.7 \pm 13.8$ pillars in each simulation. Here it is worth noting that, although we fixed this simulation area in the experiments, our neural network design provides a generic and computationally feasible solution that can be applied to an arbitrarily selected area. Furthermore, to show the applicability of this solution to multiple frequencies, for this configuration, we collected the simulation data at seven spectral points between 400 and 700 nm; these points are equally spaced in the frequency domain.
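
The spacing of the seven spectral points can be illustrated with a short Python sketch. It assumes that both band edges (400 and 700 nm) are themselves sample points, which the text does not state explicitly; the function name `spectral_points` is hypothetical.

```python
# Speed of light in vacuum (m/s), exact SI value.
C = 299_792_458.0

def spectral_points(lambda_min_nm=400.0, lambda_max_nm=700.0, n_points=7):
    """Return (frequency_THz, wavelength_nm) pairs equally spaced in frequency.

    Assumption: both band edges are included as sample points.
    """
    f_low = C / (lambda_max_nm * 1e-9)   # lowest frequency (longest wavelength)
    f_high = C / (lambda_min_nm * 1e-9)  # highest frequency (shortest wavelength)
    step = (f_high - f_low) / (n_points - 1)
    points = []
    for i in range(n_points):
        f = f_low + i * step
        points.append((f / 1e12, C / f * 1e9))
    return points

for f_thz, lam_nm in spectral_points():
    print(f"{f_thz:7.1f} THz  ->  {lam_nm:6.1f} nm")
```

Under this assumption the third point falls at about 535.3 THz, i.e., 560 nm, which is consistent with the example spectral point used in the Results section. Equal spacing in frequency means the points are not equally spaced in wavelength.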

To solve this problem numerically with FDTD, the computation may take prohibitively long for large simulation areas [29], making the simulation infeasible. On the other hand, the proposed neural network solution works much more efficiently, even for large areas. To demonstrate large-area simulation in a real-world application, the neural network trained for the random pillar configuration is transferred and fine-tuned to design an achromatic metalens [28], which is shown in Fig. 2(g). With the selected pillar radius, this achromatic metalens design covers a simulation area of $1410 \times 1410\;{\rm pixels}$, different from the simulation area previously selected for the random pillar configuration. Nevertheless, the proposed approach remains applicable to this area.

B. Deep Neural Network Designs

This work uses deep neural networks with the U-Net architecture, which is a very well-known model in the field of computer vision [30]. This architecture is an encoder–decoder network, where the encoder accepts an image (a 2D map) as its input and the decoder generates another image that has the same resolution as the input image. In the forward problem, the input image is a binary map specifying the metasurface geometry. To acquire the input, refractive index cross-section maps just above the substrate are converted to binary maps to represent the metasurface. In these binary maps, the space (pixel positions) occupied by the pillars is represented with 1 (white), and the empty space (filled with air) is represented with 0 (black). The output of the forward problem is the EM near-field response, which is considered as a six-channel image containing the real and imaginary parts of the three Cartesian components of the vectorial electric field, i.e., ${\rm Re}({E_x})$, ${\rm Im}({E_x})$, ${\rm Re}({E_y})$, ${\rm Im}({E_y})$, ${\rm Re}({E_z})$, and ${\rm Im}({E_z})$. For the inverse design problem, the inputs and the outputs are the opposite of the forward problem. In other words, the input is the EM near-field response map, and the output is the binary map specifying the metasurface geometry.
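
As an illustration of the binary input maps described above, the following Python sketch rasterizes circular pillars onto a pixel grid. It is a minimal example, not the data pipeline used in this work: the function name `rasterize_pillars`, the pixel radius, the pillar offsets, and the choice of 128 as the width and 48 as the height are all illustrative assumptions.

```python
def rasterize_pillars(centers, radius, width, height):
    """Return a height x width binary map (list of rows).

    Pixels within `radius` of a pillar center are set to 1 (pillar);
    all other pixels stay 0 (air).
    """
    grid = [[0] * width for _ in range(height)]
    r2 = radius * radius
    for cx, cy in centers:
        # Only visit the bounding box of each circle, clipped to the map.
        x0, x1 = max(0, int(cx - radius)), min(width - 1, int(cx + radius) + 1)
        y0, y1 = max(0, int(cy - radius)), min(height - 1, int(cy + radius) + 1)
        for y in range(y0, y1 + 1):
            for x in range(x0, x1 + 1):
                if (x - cx) ** 2 + (y - cy) ** 2 <= r2:
                    grid[y][x] = 1
    return grid

# Hypothetical metaline-like sample: one fixed central pillar and two pillars
# at varying distances along the same axis (all values are illustrative).
sample = rasterize_pillars([(64, 24), (34, 24), (89, 24)],
                           radius=6, width=128, height=48)
print(sum(map(sum, sample)), "pillar pixels out of", 128 * 48)
```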

In order for the encoder–decoder network to generate (predict) an output from a given input, the encoder path extracts features from the input using a series of convolutional and pooling (downsampling) layers. The decoder path then constructs the output map from the extracted features by upsampling and convolving them in consecutive layers. In the U-Net architecture, a long-skip connection (concatenation operation) passes the feature map from an encoder layer to its symmetric decoder layer in order for the upsampling process to better recover the fine-grained spatial information lost during downsampling. The schematic overview of the encoder and decoder architecture used in our network designs is sketched in Fig. 3.

Fig. 3. Schematic overview of the encoder and decoder architectures used in our network designs. The numbers on the top of an encoder–decoder block are the number of feature maps used by the block.

In our designs, the encoder path includes five consecutive blocks of two convolutions and one max pooling, whereas the decoder path consists of five consecutive blocks of upsampling, concatenation, and two convolutions. All convolutions use $3 \times 3$ filters and are followed by the rectified linear unit (ReLU) activation function. The dropout layer, with a factor of 0.2, is added after the first convolution to prevent overfitting. The pooling/upsampling layers use $2 \times 2$ filters. The number of feature maps used by the first encoder block is 32. This number is doubled after each pooling layer and halved after each upsampling layer. The block number is selected as five since it yields a sufficient field of view to extract coupling information between the pillars with the selected filter sizes and the image resolution. Note that this is one of the typical architectures used by the U-Net-based models [30].
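
The field-of-view argument can be made concrete with a short Python sketch that traces the number of feature maps, the spatial size, and the receptive field through the encoder blocks. It assumes stride-1 convolutions with "same" padding and stride-2 pooling, which the text does not state explicitly; `encoder_summary` is a hypothetical helper.

```python
def encoder_summary(input_size=128, n_blocks=5, first_filters=32):
    """Trace feature-map count, spatial size, and receptive field per block.

    Each block: two 3x3 convolutions (stride 1, 'same' padding assumed),
    then one 2x2 max pooling with stride 2 (assumed).
    """
    rf, jump = 1, 1           # receptive field and sample spacing, in input pixels
    size, filters = input_size, first_filters
    rows = []
    for block in range(1, n_blocks + 1):
        for _ in range(2):    # two 3x3 convolutions
            rf += (3 - 1) * jump
        rf += (2 - 1) * jump  # 2x2 pooling
        jump *= 2
        size //= 2
        rows.append((block, filters, size, rf))
        filters *= 2          # doubled after each pooling layer
    return rows

for block, filters, size, rf in encoder_summary():
    print(f"block {block}: {filters:4d} maps, {size:3d} px after pooling, "
          f"receptive field {rf} px")
```

Under these assumptions the receptive field after the fifth block is 156 input pixels, which exceeds the 128-pixel side of the triangular grid input, so features at the bottom of the encoder can in principle depend on every pillar in the map.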

Fig. 4. Single-task network for the inverse design problem. This network has one encoder and a single decoder that use the architecture given in Fig. 3. This single decoder uses the feature maps, which are the outputs of the encoder.

In this work, we propose two neural network designs. The first one is a single-task network for the inverse design problem. This network has one encoder and one decoder since the one-channel output map, representing the metasurface geometry, is estimated from multi-channel input maps, representing the real and imaginary components of the electric field in Cartesian coordinates. This single-task network takes a six-channel image for the metaline and triangular grid configurations (Fig. 4), and a 42-channel image, which corresponds to the real and imaginary parts of all EM near-field responses collected at seven different frequency points, for the random pillar configuration. The second one is a multi-task network designed for the forward problem to predict the real and imaginary parts of the Cartesian components of the electric field maps in the near-field from the metasurface geometry. This network includes one shared encoder path and six decoder paths, one for predicting each real or imaginary part (Fig. 5). Here we use a multi-task network since the multi-task learning paradigm is known as an effective means to learn different but related tasks. Learning multiple tasks from a single shared encoder decreases the likelihood of overfitting since this requires learning a shared representation that works adequately well for all the tasks [31].
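
The channel counts quoted above (six for a single spectral point, 42 for seven) follow from three Cartesian components times two parts (real and imaginary) times the number of spectral points. The Python sketch below shows one possible bookkeeping scheme; the frequency-major channel ordering and the names `channel_index` and `n_channels` are assumptions made for illustration, not the layout used in this work.

```python
COMPONENTS = ("Ex", "Ey", "Ez")
PARTS = ("Re", "Im")

def n_channels(n_freqs):
    """Number of input channels for n_freqs spectral points."""
    return n_freqs * len(COMPONENTS) * len(PARTS)

def channel_index(freq_idx, component, part, n_freqs=7):
    """Map (spectral point, field component, Re/Im) to a channel index.

    Assumed ordering: frequency-major, then component, then Re before Im.
    """
    if not 0 <= freq_idx < n_freqs:
        raise ValueError("freq_idx out of range")
    comp = COMPONENTS.index(component)
    return (freq_idx * len(COMPONENTS) + comp) * len(PARTS) + PARTS.index(part)

print(n_channels(1), n_channels(7))                                 # 6 42
print(channel_index(0, "Ex", "Re"), channel_index(6, "Ez", "Im"))   # 0 41
```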

Fig. 5. Multi-task network for the forward problem. This network has one encoder but multiple decoders that also use the architecture given in Fig. 3. These decoders use the same feature maps learned by the shared encoder.

The networks were designed and trained in Python using the TensorFlow framework. The network weights were optimized by backpropagation. The AdaDelta optimizer was used to adaptively adjust the learning rate. We used the categorical cross-entropy loss to train the single-task network since estimating a binary map corresponds to a classification problem for each pixel. On the other hand, we used the mean squared error loss to train the multi-task network since estimating continuous EM near-field responses corresponds to a regression problem for each pixel. The datasets were generated using the Lumerical FDTD solver. Refractive index monitors and frequency domain field monitors were used in the simulations to obtain the input–output pairs. Please see the description of the FDTD setup, Fig. S2, and Table S1 in Supplement 1 for more details of data generation and training.
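
To make the two loss choices concrete, the following framework-free Python sketch evaluates both per pixel. It is not the TensorFlow training code of this work: for two classes with one-hot labels, the categorical cross-entropy reduces to the binary form written here, and the function names are hypothetical.

```python
import math

def pixelwise_cross_entropy(target, prob_pillar, eps=1e-7):
    """Mean two-class cross-entropy over all pixels.

    target: 2D list of 0/1 labels (1 = pillar).
    prob_pillar: 2D list with the predicted probability of the pillar class.
    """
    total, count = 0.0, 0
    for t_row, p_row in zip(target, prob_pillar):
        for t, p in zip(t_row, p_row):
            p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
            total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
            count += 1
    return total / count

def mean_squared_error(target, prediction):
    """Mean squared error over all pixels of a continuous-valued map."""
    diffs = [(t - p) ** 2
             for t_row, p_row in zip(target, prediction)
             for t, p in zip(t_row, p_row)]
    return sum(diffs) / len(diffs)

# Classification loss for the inverse design (geometry) output.
print(pixelwise_cross_entropy([[1, 0]], [[0.9, 0.2]]))
# Regression loss for one field channel of the forward output.
print(mean_squared_error([[0.5, -0.1]], [[0.4, 0.1]]))
```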

3. RESULTS

A. Forward Problem for Metaline and Triangular Grid Configurations

The multi-task network design illustrated in Fig. 5 was used to predict the six-channel EM near-field response from metasurface geometry input maps. Subsequently, these six channels were used to calculate the corresponding EM near-field intensity map. Visual results obtained by the proposed network on three exemplary test set samples are illustrated in Figs. 6(a)–6(c) and Figs. 6(d)–6(f) for the metaline and triangular grid configurations, respectively. The percentage error maps calculated on the simulated and predicted maps of these samples are illustrated in Fig. S7 of Supplement 1. Additionally, for each sample, the mean squared error (MSE) was calculated between the EM near-field intensity (or irradiance) maps obtained by the FDTD solver and predicted by the proposed multi-task network. As the unit of irradiance is ${\rm W}/{\rm m^2}$, the MSE values are given in the unit of ${\rm W^2}/{\rm m^4}$. For the metaline and triangular grid configurations, the MSEs averaged over test set samples were $1.29{\rm e} - 03 \pm 0.12{\rm e} - 03$ and $5.76{\rm e} - 07 \pm 0.02{\rm e} - 07$, respectively. These low MSEs indicated that the maps were predicted quite accurately, as also supported by the visual results.
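
The step from the six predicted channels to an intensity map, and the per-sample MSE, can be sketched in plain Python as follows. The sketch takes the intensity as $|E_x|^2 + |E_y|^2 + |E_z|^2$ and omits any constant prefactor that converts it to irradiance; that simplification and the function names are assumptions of this example.

```python
def intensity_map(channels):
    """Combine six field channels into an intensity map.

    channels: six 2D lists ordered Re(Ex), Im(Ex), Re(Ey), Im(Ey), Re(Ez), Im(Ez).
    Returns |Ex|^2 + |Ey|^2 + |Ez|^2 per pixel (constant prefactor omitted).
    """
    height, width = len(channels[0]), len(channels[0][0])
    return [[sum(ch[y][x] ** 2 for ch in channels) for x in range(width)]
            for y in range(height)]

def mse(a, b):
    """Mean squared error between two equally sized 2D maps."""
    diffs = [(va - vb) ** 2 for ra, rb in zip(a, b) for va, vb in zip(ra, rb)]
    return sum(diffs) / len(diffs)

# Toy 1 x 2 example with made-up field values.
simulated = [[[1.0, 0.0]], [[2.0, 0.0]], [[0.0, 1.0]],
             [[0.0, 0.0]], [[0.0, 0.0]], [[0.0, 2.0]]]
predicted = [[[1.0, 0.0]], [[2.0, 0.0]], [[0.0, 1.0]],
             [[0.0, 0.0]], [[0.0, 0.0]], [[0.0, 1.0]]]
print(mse(intensity_map(simulated), intensity_map(predicted)))
```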

Fig. 6. Visual results on exemplary test samples. For the metaline configuration, (a) metasurface geometry maps, and EM near-field intensity maps (b) obtained by the FDTD solver and (c) predicted by the multi-task network. MSEs for these samples were 1.3e-03, 2.1e-03, and 2.4e-03, respectively. For the triangular grid configuration, (d) metasurface geometry maps, and EM near-field intensity maps (e) obtained by the FDTD solver and (f) predicted by the multi-task network. MSEs for these samples were 5.8e-07, 2.5e-06, and 1.6e-06, respectively.

B. Forward Problem for Random Pillar Configuration

The random pillar configuration allows the use of an arbitrarily selected area and the collection of simulated data at an arbitrary number of spectral points. Thus, we made two modifications. The first concerned the number of spectral points. The previous configurations were simulated for a single spectral point at a particular wavelength, so we designed a multi-task network with six decoders, each of which predicted a real or an imaginary part of one Cartesian component of the vectorial electric field as the EM near-field response. The random pillar configuration, on the other hand, allows data to be simulated at N different spectral points, which necessitates training $6 \times {\rm N}$ decoders at the same time. In our design, this number would be 42 for the selected seven spectral points. This would correspond to simultaneously optimizing the weights of a larger network, which would require more training data and demand more powerful processors and larger memory resources. As a result, at some point, simultaneous training of all these decoders would become infeasible. To alleviate this problem, we used six independently trained multi-task networks, one for each real or imaginary part of one Cartesian component of the EM near-field response. Each network has one encoder and multiple decoders, as given in Fig. 5, but this time each decoder predicts the response obtained for one of the seven spectral points (see Fig. S4). The encoder and decoder architectures of these networks are the same as those given in Fig. 3.

The second modification was to handle arbitrarily selected large simulation areas. To this end, we designed all networks to take $256 \times 256$ input tiles that were cropped out of a metasurface geometry map. Networks were trained on the tiles randomly cropped out of the training samples. Then, to predict the EM near-field intensity map of a test sample, we estimated the maps for overlapping tiles and averaged all predictions estimated for the same pixel. The overlapping tiles were obtained by sliding a window over the map with an increment of 64 pixels (Fig. 7). In our experiments, we only considered the predictions in the middle $128 \times 128$ section of each window. We made these choices considering the inter-coupling effects between the pillars. We provide the rationale behind these choices and the details of the averaging algorithm in Sections S3.1 and S3.2 of Supplement 1, respectively.
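
The sliding-window averaging can be sketched in plain Python as below. This is a simplified illustration rather than the algorithm of Section S3.2: `predict_tile` stands in for a trained network, the function names are hypothetical, and border pixels that no central region reaches are simply left as `None` here, whereas the actual treatment of the borders is described in Supplement 1.

```python
def window_starts(length, tile=256, stride=64):
    """Start coordinates of tiles that fit fully inside an axis of `length` pixels."""
    return list(range(0, length - tile + 1, stride))

def predict_large_map(geometry, predict_tile, tile=256, stride=64, center=128):
    """Average overlapping tile predictions, keeping each tile's central region.

    geometry: 2D list (H x W). predict_tile: callable mapping a tile (2D list)
    to a same-sized 2D list. Returns (averaged map, per-pixel coverage count).
    """
    height, width = len(geometry), len(geometry[0])
    margin = (tile - center) // 2
    acc = [[0.0] * width for _ in range(height)]
    cnt = [[0] * width for _ in range(height)]
    for y0 in window_starts(height, tile, stride):
        for x0 in window_starts(width, tile, stride):
            patch = [row[x0:x0 + tile] for row in geometry[y0:y0 + tile]]
            pred = predict_tile(patch)
            # Accumulate only the central part of the tile prediction.
            for dy in range(margin, margin + center):
                for dx in range(margin, margin + center):
                    acc[y0 + dy][x0 + dx] += pred[dy][dx]
                    cnt[y0 + dy][x0 + dx] += 1
    avg = [[acc[y][x] / cnt[y][x] if cnt[y][x] else None for x in range(width)]
           for y in range(height)]
    return avg, cnt

# Scaled-down demo with the same 4:1:2 ratio of tile, stride, and center sizes.
demo = [[float(16 * y + x) for x in range(16)] for y in range(16)]
avg, cnt = predict_large_map(demo, lambda t: t, tile=8, stride=2, center=4)
print(cnt[7][7], len(window_starts(2048)))
```

With a central region twice as wide as the stride, an interior pixel is covered by two windows along each axis, so four predictions are averaged for it.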

Fig. 7. Illustration of estimating EM near-field and far-field responses in the random pillar configuration by the sliding window approach.

In addition to analyzing the performance of our network on predicting EM near-field responses, we examined its effects on EM far-field response projections. To do so, we separately calculated the far-field responses of the simulated and the predicted near-field responses, using the Fresnel approximation, and compared these far-field responses visually and quantitatively. One can calculate the EM far-field response by taking the convolution of the EM near-field response with the free space transfer function [32]. However, as the analytical convolution integral is intractable in all but the simplest diffraction geometries, one needs approximations. In our experiments, we used the Fresnel approximation (see Fig. S6) as it is applicable in most cases while relying on relatively few assumptions [33].
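
For reference, the transfer-function form of Fresnel propagation can be written as follows. The notation is the standard scalar-diffraction one and is not taken from this paper: $U(x,y;0)$ is a near-field component, $z$ the propagation distance, $\lambda$ the wavelength, $k = 2\pi/\lambda$, and $(f_x, f_y)$ the spatial frequencies.

$$U(x,y;z) = \mathcal{F}^{-1}\left\{ \mathcal{F}\{U(x,y;0)\}\, H(f_x,f_y) \right\}, \qquad H(f_x,f_y) = e^{ikz}\exp\left[-i\pi\lambda z\left(f_x^2+f_y^2\right)\right]$$

Here $H$ is the paraxial approximation of the exact free space transfer function $\exp\left[ikz\sqrt{1-(\lambda f_x)^2-(\lambda f_y)^2}\right]$, and the far-field intensity follows as $|U(x,y;z)|^2$.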

For the random pillar configuration, data were simulated at seven spectral points, which were again equally spaced in the frequency domain. For each spectral point, the MSE between the maps obtained by the simulation and predicted by the proposed multi-task neural network was calculated. Table 1 reports the average MSE calculated on the test set samples for each spectral point separately. As seen in this table, the errors were acceptably low, which also led to accurate visual results. For an example spectral point (with 535.3 THz frequency corresponding to 560 nm wavelength), the visual results obtained on an exemplary test sample are given in Fig. 8. The percentage error map calculated on the simulated and predicted EM near-field maps is illustrated in Fig. S7. The visual results on the same test sample for the other spectral points can be found in Fig. S8.

Table 1. For the Random Pillar Configuration, MSEs for the Prediction of EM Near-Field and Far-Field Intensity Maps at Different Frequencies^a

Fig. 8. (a) Example of the refractive index cross-section of randomly positioned nanopillars. (b) Its EM near-field intensity map obtained by the simulation. (c) EM far-field intensity map calculated from the simulated EM near-field intensity map using the Fresnel approximation. (d) Refractive index cross-section estimated from the EM near-field response by the single-task network. Two red circles indicate the position of incorrectly predicted pillars. (e) EM near-field intensity map predicted by the multi-task network. (f) EM far-field intensity map calculated from the predicted EM near-field intensity map using the Fresnel approximation.

In addition to calculating the MSE metrics directly on the simulated and predicted values, we used an additional assessment method to evaluate the performance of EM near-field prediction. This second method focuses on assessing the spatial information of strong-field localizations in the near-field intensity maps because such strongly localized field sites constitute hot spots in the intensity distribution and the spatial locations of these local hot spots are important from an electromagnetic point of view. This carries useful information because a large collection of photonic functions to be carried out by metasurfaces are closely related to the set of these hot spots and their spatial distributions. To detect the intensity peaks, which correspond to the local hot spots, we compared the responses of the simulated and predicted EM near-field maps against a threshold that was automatically selected for each map separately, considering the mean and the standard deviation of the responses on this map. The details of this selection can be found in Supplement 1. Then, we identified peak regions in each map as the pixels with responses greater than this threshold. The overlapping peak regions in the simulated and predicted maps were considered as true positives, and the precision, recall, and f-score metrics were calculated for quantitative evaluation. The visual and quantitative results of this experiment are shown in Fig. S9 and Table S2. These results indicated that our networks were able to detect the peak regions with high accuracy, leading to f-scores greater than 77.00% for the first six frequencies. For the seventh one, the f-score was 73.78%. We repeated this experiment for the EM far-field intensity prediction and its results are also reported in Supplement 1.
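
A minimal Python sketch of this hot-spot evaluation is given below. The exact threshold rule is specified in Supplement 1 and is not reproduced here; the form mean + k × standard deviation with k = 1 is only an assumed placeholder, and the function names are hypothetical.

```python
import statistics

def peak_mask(intensity, k=1.0):
    """Mark pixels above mean + k * std of the map (k is an illustrative choice)."""
    flat = [v for row in intensity for v in row]
    threshold = statistics.fmean(flat) + k * statistics.pstdev(flat)
    return [[v > threshold for v in row] for row in intensity]

def pixel_scores(sim_mask, pred_mask):
    """Pixel-level precision, recall, and f-score of predicted peak regions."""
    tp = fp = fn = 0
    for s_row, p_row in zip(sim_mask, pred_mask):
        for s, p in zip(s_row, p_row):
            if s and p:
                tp += 1      # peak in both maps
            elif p:
                fp += 1      # peak only in the prediction
            elif s:
                fn += 1      # peak only in the simulation
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score

# Toy maps with made-up values; each map gets its own threshold.
sim = [[0, 0, 0, 0], [0, 10, 10, 0], [0, 0, 0, 0]]
pred = [[0, 0, 0, 0], [0, 10, 0, 0], [0, 0, 0, 10]]
print(pixel_scores(peak_mask(sim), peak_mask(pred)))
```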

C. Inverse Design Problem for Metaline and Triangular Grid Configurations

The single-task network design illustrated in Fig. 4 was used to predict the metasurface geometry from the EM near-field intensity maps. The experiments revealed that the predictions were quite successful for both configurations. For the metaline and triangular grid configurations, the visual results predicted by the proposed network on exemplary test samples are depicted in Fig. 9 together with the original simulation designs that were used to create the EM near-field monitor data, which were the network inputs for this prediction. As seen in this figure, a near-perfect prediction performance was achieved. Additionally, we designed a specific test for the metaline configuration to understand the robustness of its network to the positioning of the three pillars. This test aimed to investigate the performance of the geometry prediction when the three pillars were arbitrarily located. Figure 9(c) shows such a test sample, which is much more complex than the training set samples that allow only 1D pillar positioning. As seen in Fig. 9(d), the network successfully generalized the learned interactions among pillars even when they were arbitrarily located.

Fig. 9. For the metaline configuration, (a) original simulation design of a test sample and (b) its inverse design prediction. For another sample that was used to understand the robustness of the network, (c) original simulation design and (d) inverse design prediction. For the triangular grid configuration, (e) original simulation design of a test sample and (f) its inverse design prediction.

D. Inverse Design Problem for Random Pillar Configuration

The single-task network was tested on five test set samples. The input fed to this network is the FDTD-simulated EM near-field map. The original simulation design and the prediction are given in Fig. 8 for the first sample, and in Fig. S10 for the others. For quantitative assessment, first, the number of true positive (TP) pillars was calculated by comparing the pillars $P$ that were predicted by the network with the pillars $S$ in the original simulation design. A predicted pillar ${p_i} \in P$ is a true positive if its centroid lies inside a pillar ${s_j} \in S$ and if there exists no other predicted pillar whose centroid lies inside the same pillar ${s_j}$. Then, the ${\rm precision} = {\rm TP}/|P|$, ${\rm recall} = {\rm TP}/|S|$, and ${ f}$-score metrics were calculated. The ${f}$-score metric is the harmonic mean of precision and recall. Table 2 reports these metrics for each test set sample separately. This table reveals that the inverse design prediction gave near-perfect performance scores.
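
The matching rule above can be expressed as a short Python sketch. It assumes that pillars are circles of a common radius, that design pillars do not overlap, and that predicted pillars have already been reduced to centroid coordinates; the function name `match_pillars` is hypothetical.

```python
import math

def match_pillars(predicted_centroids, design_centers, radius):
    """Return (TP, precision, recall, f-score) for predicted pillar centroids.

    A design pillar yields one true positive only if exactly one predicted
    centroid falls inside it, following the rule described above.
    """
    hits = [0] * len(design_centers)
    for px, py in predicted_centroids:
        for j, (sx, sy) in enumerate(design_centers):
            if math.hypot(px - sx, py - sy) <= radius:
                hits[j] += 1
                break  # non-overlapping pillars: at most one can contain it
    tp = sum(1 for h in hits if h == 1)
    precision = tp / len(predicted_centroids) if predicted_centroids else 0.0
    recall = tp / len(design_centers) if design_centers else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return tp, precision, recall, f_score

# Toy example: one clean match, one pillar hit twice, one missed pillar,
# and one prediction far from any pillar (all coordinates are made up).
design = [(10, 10), (30, 10), (50, 10)]
predicted = [(11, 10), (29, 12), (31, 9), (80, 80)]
print(match_pillars(predicted, design, radius=5))
```

In the toy example, the pillar that receives two predicted centroids contributes no true positive, so only one of the four predictions counts.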

Table 2. For the Inverse Design Prediction in the Random Pillar Configuration, the Precision, Recall, and F-score Percentages Obtained on the Test Set Samples

E. Verification on Achromatic Lens Design

To assess the applicability of the proposed network designs on a real-world application, we conducted experiments on a specific achromatic metalens design given in [28], which consists of a periodically structured metasurface configuration (Fig. 10). To better predict the EM near-field intensity map from this specific geometry, we transferred the weights of the network pretrained for the random pillar configuration and fine-tuned these weights on new samples containing periodic and dense nanopillar structures. Next, the newly trained network was used to predict the EM near-field intensity map for the achromatic metalens design. This transfer learning is illustrated in Fig. 11. Here the new training samples were not cropped from any part of the achromatic metalens design but from synthetically generated structures containing pillars on a regular grid (one example is shown in Fig. 11). As a result, there existed no bias in the training samples towards overfitting this achromatic metalens design.

Fig. 10. (a) Refractive index cross-section of a circular achromatic metalens. (b) Its EM near-field intensity map obtained by simulation. (c) EM far-field intensity map calculated from the simulated EM near-field intensity map. (d) Refractive index cross-section estimated from the EM near-field response by the single-task network. Here predictions are indicated as red circles, and the boundaries of pillars in the original design, shown in (a), are indicated as white for comparison. (e) EM near-field intensity map predicted by the multi-task network. (f) EM far-field intensity map calculated from the predicted EM near-field intensity map.

Fig. 11. Illustration of transfer learning for achromatic metalens design prediction.

Fig. 12. (a)–(c) Inverse designs predicted by three separately trained networks for the same achromatic lens. EM far-field responses (d) calculated for the original design and (e)–(g) obtained by simulating the predictions shown in (a)–(c) with the FDTD solver.

For the achromatic metalens design in Fig. 10(a), the full EM near-field intensity maps obtained by the simulation and predicted by the proposed network are illustrated in Figs. 10(b) and 10(e), respectively. The MSE between these maps was 4.23e-07. The percentage error map calculated on the simulated and predicted EM near-field maps is also illustrated in Fig. S7. Since the main assumptions of the Fresnel approximation were not satisfied for this particular application, another approach was necessary to calculate the far-field projections. To this end, we used the “farfieldexact3d” script method, available in the Lumerical FDTD solver. The predictions were fed to the FDTD environment again and labeled as near-field DFT monitor information. Then, the “farfieldexact3d” script method was called to calculate the EM far-field response at the desired points in the 3D space. The same script method was utilized for the simulation data as well. The EM far-field intensity maps found by the same procedure are compared, as displayed in Figs. 10(c) and 10(f), for the simulation and prediction, respectively. The MSE between these far-field maps was 1.14e-07.
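For reference, the MSE between a simulated and a predicted intensity map is the mean of the squared pixel-wise differences. The sketch below is a minimal plain-Python version; the function name `mse` and the list-of-lists map format are assumptions, and any normalization applied to the maps before the values reported above were computed is not reproduced here.

```python
def mse(simulated, predicted):
    """Mean squared error between two 2D maps given as lists of rows."""
    flat_sim = [v for row in simulated for v in row]
    flat_pred = [v for row in predicted for v in row]
    if len(flat_sim) != len(flat_pred) or not flat_sim:
        raise ValueError("maps must be non-empty and have the same size")
    return sum((s - p) ** 2 for s, p in zip(flat_sim, flat_pred)) / len(flat_sim)


# Toy 2 x 2 example: the maps differ by 2 in a single pixel.
simulated = [[0.0, 1.0], [2.0, 3.0]]
predicted = [[0.0, 1.0], [2.0, 5.0]]
print(mse(simulated, predicted))
```

The same function applies unchanged to the far-field maps, since both comparisons are made pixel by pixel on maps of equal size.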

One metasurface geometry map predicted by the network for this achromatic metalens design is shown in Fig. 10(d). However, since inverse designs are not unique (i.e., different designs can, in theory, produce the same EM response), we trained and tested multiple models. Figures 12(a)–12(c) illustrate three different inverse designs, corresponding to three separately trained models, predicted for the same achromatic lens. In all three, the first eight concentric circles were predicted at the same locations as the pillars in the original design. However, the ninth layer was not predicted in any of the three runs, and the region inside this missing layer differed from run to run.

At the focal plane of a circular lens, the EM far-field response is more important than the EM near-field response. Thus, all three inverse designs were simulated with the Lumerical FDTD solver to determine their EM near-field and far-field responses; the calculated EM far-field responses are presented in Figs. 12(e)–12(g). The EM far-field intensity map of the original design is given in Fig. 12(d) for comparison. As seen in these visuals, the intensity distributions of the EM far-field responses were almost identical, while the focusing efficiency of the predictions was slightly lower than that of the original design. A possible explanation is that, as is common in neural network design, the input data were normalized to enhance network learning, and this normalization did not take into consideration the impact of free-space propagation in a circular design on the EM near-field intensity predictions. Despite this limitation, the EM far-field responses of all three inverse designs displayed similar characteristics, regardless of their focusing efficiency differences.

4. DISCUSSION AND CONCLUSION

Our experiments showed that encoder–decoder networks with the U-Net architecture can predict EM near-field responses and design metasurface configurations. They offer several advantages. First, the proposed network designs reduced the required computational time compared to iterative numerical methods, which must perform a complicated, computationally demanding, and time-consuming simulation for each EM near-field response, whereas a trained neural network predicts these responses faster. Note that, although network training can be time consuming, it needs to be done only once, after which the network can be used for many predictions without further training.
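As a rough illustration of this amortization argument, let $T_{\rm train}$ denote the one-time cost of generating the training data and training the network, $t_{\rm sim}$ the time of one numerical simulation, and $t_{\rm pred}$ the time of one network prediction. All three symbols are introduced here only for illustration and are not quantities reported in this work. Under these assumptions, the network is the cheaper option once the number of evaluations $N$ satisfies

$$N\,t_{\rm pred} + T_{\rm train} < N\,t_{\rm sim} \quad\Longleftrightarrow\quad N > \frac{T_{\rm train}}{t_{\rm sim} - t_{\rm pred}}, \qquad t_{\rm sim} > t_{\rm pred}.$$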

Second, compared to the previous networks implemented for metasurface design, our use of an encoder–decoder network made it possible to predict the EM near-field intensity map rather than only the transmission spectra, which capture a single parameter. Thus, our proposal provided a full EM simulation result that could be analyzed and processed further. Third, this work facilitated the prediction of a metasurface pillar geometry for the inverse design problem. Traditional simulation tools do not provide a direct inverse design prediction and rely heavily on trial and error. Consequently, our strategy of directly mapping an EM near-field response to the corresponding metasurface design has the potential to substantially change the design approach. Unlike previous studies, this inverse solution did not set any constraints on the total number and location of metaunits or on the total simulation area.

In summary, this work introduced a novel deep learning approach to predict the full EM near-field response of metasurfaces. It provided a complete solution for the vectorial electric field in the near-field, making it the first account of a deep learning approach to achieve this task. Moreover, this approach offered a solution for the inverse design problem, which finds the metasurface geometry from the EM near-field responses, for the first time. The type of device, the operating frequency range, and the type of design geometry do not primarily affect the performance of our approach. Therefore, it is possible to apply the proposed approach in analyzing and designing different electromagnetic structures and photonic devices. For example, as future work, one can explore differently shaped metaunits (rectangular, triangular, etc.) as the building blocks used to construct metasurfaces and different lattices to stack them in such metasurfaces. In these cases, the network should be trained with new inputs consisting of these new metaunits and their inter-coupled fabrics.

Funding

Agency for Science, Technology and Research (M21J9b0085); Türkiye Bilimsel ve Teknolojik Araştırma Kurumu (20AG001, 121C266, 121N395, 120N076, 119N343); Türkiye Bilimler Akademisi.

Acknowledgment

H.V.D. gratefully acknowledges the support from the TUBA and TUBITAK 2247-A National Leader Researchers Program.

Disclosures

The authors declare no conflicts of interest.

Data availability

Training sets, codes, and trained network models will be available at [34].

Supplemental document

See Supplement 1 for supporting content.

REFERENCES

1. P. D. Peter, “Light scattering by small particles,” Science 127, 477–478 (1958).

2. F. Walter, G. Li, C. Meier, S. Zhang, and T. Zentgraf, “Ultrathin nonlinear metasurface for optical image encoding,” Nano Lett. 17, 3171–3175 (2017).

3. C. D. Robert, K. Mohammadreza, C. Wei Ting, O. Jaewon, and C. Federico, “Broadband high-efficiency dielectric metasurfaces for the visible spectrum,” Proc. Natl. Acad. Sci. USA 113, 10473–10478 (2016).

4. M. Decker, I. Staude, M. Falkner, J. Dominguez, D. N. Neshev, I. Brener, T. Pertsch, and Y. S. Kivshar, “High-efficiency dielectric Huygens’ surfaces,” Adv. Opt. Mater. 3, 813–820 (2015).

5. J. Qiang, H. Leyong, G. Guangzhou, L. Junjie, W. Yongtian, and H. Lingling, “Arbitrary amplitude and phase control in visible by dielectric metasurface,” Opt. Express 30, 13530–13539 (2022).

6. A. C. Overvig, S. Shrestha, S. C. Malek, M. Lu, A. Stein, C. Zheng, and N. Yu, “Dielectric metasurfaces for complete and independent control of the optical amplitude and phase,” Light Sci. Appl. 8, 92 (2019).

7. E. Arbabi, A. Arbabi, S. M. Kamali, Y. Horie, M. Faraji-Dana, and A. Faraon, “MEMS-tunable dielectric metasurface lens,” Nat. Commun. 9, 812 (2018).

8. D. Neshev and I. Aharonovich, “Optical metasurfaces: new generation building blocks for multi-functional optics,” Light Sci. Appl. 7, 58 (2018).

9. A. Hammad, R. A. Abdur, M. Husnul, A. M. Mahmood, M. Nasir, and N. Sadia, “Phase engineering with all-dielectric metasurfaces for focused-optical-vortex (FOV) beams with high cross-polarization efficiency,” Opt. Mater. Express 10, 434–448 (2020).

10. Y. Wenhong, S. Xiao, Q. Song, Y. Liu, Y. Wu, S. Wang, J. Yu, J. Han, and D. P. Tsai, “All-dielectric metasurface for high-performance structural color,” Nat. Commun. 11, 1864 (2020).

11. P. Genevet, F. Capasso, F. Aieta, M. Khorasaninejad, and R. Devlin, “Recent advances in planar optics: from plasmonic to dielectric metasurfaces,” Optica 4, 139–152 (2017).

12. M. Khorasaninejad, Z. Shi, A. Y. Zhu, W. T. Chen, V. Sanjeev, A. Zaidi, and F. Capasso, “Achromatic metalens over 60 nm bandwidth in the visible and metalens with reverse chromatic dispersion,” Nano Lett. 17, 1819–1824 (2017).

13. X. Ding, F. Monticone, K. Zhang, L. Zhang, D. Gao, S. N. Burokur, A. de Lustrac, Q. Wu, C.-W. Qiu, and A. Alù, “Ultrathin Pancharatnam–Berry metasurface with maximal cross-polarization efficiency,” Adv. Mater. 27, 1195–1200 (2015).

14. Z. Junxiao, Q. Haoliang, C. Ching-Fu, Z. Junxiang, L. Guangru, W. Qianyi, L. Hailu, W. Shuangchun, and L. Zhaowei, “Optical edge detection based on high-efficiency dielectric metasurface,” Proc. Natl. Acad. Sci. USA 116, 11137–11140 (2019).

15. L. Q. Cong, Y. K. Srivastava, and R. Singh, “Inter and intra-metamolecular interaction enabled broadband high-efficiency polarization control in metasurfaces,” Appl. Phys. Lett. 108, 011110 (2016).

16. Y. Vahabzadeh, N. Chamanara, K. Achouri, and C. Caloz, “Computational analysis of metasurfaces,” IEEE J. Multiscale Multiphys. Comput. Tech. 3, 37–49 (2017).

17. X. Lin, Y. Rivenson, N. T. Yardimci, M. Veli, Y. Luo, M. Jarrahi, and A. Ozcan, “All-optical machine learning using diffractive deep neural networks,” Science 361, 1004–1008 (2018).

18. J. Peurifoy, Y. C. Shen, L. Jing, Y. Yang, F. Cano-Renteria, B. G. DeLacy, J. D. Joannopoulos, M. Tegmark, and M. Soljacic, “Nanophotonic particle simulation and inverse design using artificial neural networks,” Sci. Adv. 4, eaar4206 (2018).

19. S. An, C. Fowler, B. Zheng, M. Y. Shalaginov, H. Tang, H. Li, L. Zhou, J. Ding, A. M. Agarwal, C. Rivero-Baleine, K. A. Richardson, T. Gu, J. Hu, and H. Zhang, “A deep learning approach for objective-driven all-dielectric metasurface design,” ACS Photon. 6, 3196–3207 (2019).

20. C. C. Nadell, B. Huang, J. M. Malof, and W. J. Padilla, “Deep learning for accelerated all-dielectric metasurface design,” Opt. Express 27, 27523–27535 (2019).

21. F. Ghorbani, S. Beyraghi, J. Shabanpour, H. Oraizi, H. Soleimani, and M. Soleimani, “Deep neural network-based automatic metasurface design with a wide frequency range,” Sci. Rep. 11, 7102 (2021).

22. T. S. Qiu, X. Shi, J. F. Wang, Y. F. Li, S. B. Qu, Q. Cheng, T. J. Cui, and S. Sui, “Deep learning: a rapid and efficient route to automatic metasurface design,” Adv. Sci. 6, 1900128 (2019).

23. S. S. An, B. W. Zheng, M. Y. Shalaginov, H. Tang, H. Li, L. Zhou, Y. X. Dong, M. Haerinia, A. M. Agarwal, C. Rivero-Baleine, M. Kang, K. A. Richardson, T. Gu, J. J. Hu, C. Fowler, and H. L. Zhang, “Deep convolutional neural networks to predict mutual coupling effects in metasurfaces,” Adv. Opt. Mater. 10, 2102113 (2022).

24. Z. Li, R. Pestourie, Z. Lin, S. G. Johnson, and F. Capasso, “Empowering metasurfaces with inverse design: principles and applications,” ACS Photon. 9, 2178–2192 (2022).

25. M. V. Berry, “Quantal phase-factors accompanying adiabatic changes,” Proc. R. Soc. London A 392, 45–57 (1984).

26. O. Akin and H. V. Demir, “High-efficiency low-crosstalk dielectric metasurfaces of mid-wave infrared focal plane arrays,” Appl. Phys. Lett. 110, 143106 (2017).

27. A. Nagarajan, K. van Erve, and G. Gerini, “Ultra-narrowband polarization insensitive transmission filter using a coupled dielectric-metal metasurface,” Opt. Express 28, 773–787 (2020).

28. H. B. Yağcı and H. V. Demir, “‘Meta-atomless’ architecture based on an irregular continuous fabric of coupling-tuned identical nanopillars enables highly efficient and achromatic metasurfaces,” Appl. Phys. Lett. 118, 081105 (2020).

29. R. Osgood Jr. and X. Meng, Numerical Methods (Springer, 2021).

30. O. Ronneberger, P. Fischer, and T. Brox, U-Net: Convolutional Networks for Biomedical Image Segmentation, Lecture Notes in Computer Science (2015), Chap. 28, pp. 234–241.

31. R. Caruana, “Multitask learning,” Mach. Learn. 28, 41–75 (1997).

32. M. J. Barth, R. R. McLeod, and R. W. Ziolkowski, “A near and far-field projection algorithm for finite-difference time-domain codes,” J. Electromagn. Waves Appl. 6, 5–18 (1992).

33. P. Pellat-Finet, “Fresnel diffraction and the fractional-order Fourier-transform,” Opt. Lett. 19, 1388–1390 (1994).

34. T. B. Kanmaz, E. Ozturk, H. V. Demir, and C. Gunduz-Demir, “Code for 'Deep learning-enabled electromagnetic near-field prediction and inverse design of metasurfaces',” Koc University (2023), https://mysite.ku.edu.tr/cgunduz/downloads/em-estimator.

Frequency (THz)	Wavelength (nm)	Near-Field (MSE)	Far-Field (MSE)
428.3	700	5.0e-08	4.6e-06
481.8	622	8.0e-08	3.9e-06
535.3	560	1.8e-07	1.5e-07
588.9	509	2.0e-07	6.5e-07
642.4	467	4.3e-07	4.2e-10
695.9	430	4.7e-07	3.2e-07
749.5	400	5.6e-07	5.2e-08

Sample	Precision (%)	Recall (%)	F-score (%)
Sample 1 (Fig. 8)	98.90	99.08	98.99
Sample 2 (Fig. S10)	99.01	99.14	99.07
Sample 3 (Fig. S10)	98.87	99.01	98.94
Sample 4 (Fig. S10)	99.11	99.04	99.08
Sample 5 (Fig. S10)	99.29	99.42	99.35

Optica

Tevfik Bulent Kanmaz	https://orcid.org/0009-0000-7597-5193
Efe Ozturk	https://orcid.org/0009-0002-9131-2677
Cigdem Gunduz-Demir	https://orcid.org/0000-0003-0724-1942