Hybrid supervised and reinforcement learning for the design and optimization of nanophotonic structures

Open Access

Abstract

From higher computational efficiency to enabling the discovery of novel and complex structures, deep learning has emerged as a powerful framework for the design and optimization of nanophotonic circuits and components. However, both data-driven and exploration-based machine learning strategies have limitations in their effectiveness for nanophotonic inverse design. Supervised machine learning approaches require large quantities of training data to produce high-performance models and have difficulty generalizing beyond training data given the complexity of the design space. Unsupervised and reinforcement learning-based approaches, on the other hand, can require very lengthy training or optimization times. Here we demonstrate a hybrid supervised learning and reinforcement learning approach to the inverse design of nanophotonic structures and show this approach can reduce training data dependence, improve the generalizability of model predictions, and significantly shorten exploratory training times. The presented strategy thus addresses several contemporary deep learning-based challenges, while opening the door for new design methodologies that leverage multiple classes of machine learning algorithms to produce more effective and practical solutions for photonic design.

© 2024 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Nanophotonic devices and circuits can manipulate the phase, amplitude, and polarization of light within an ultra-compact footprint. Due to these unique properties, nanophotonic devices are now widely used in a broad range of next-generation technologies and applications, such as wireless communications [1,2], passive/active thermal management [3,4], and optical displays for virtual or augmented reality [5,6]. For example, to achieve arbitrary transformations to an incident wave front, metagratings or metasurfaces utilize subwavelength scattering elements and phase-shifting materials to suppress undesired diffraction orders and reroute incident power towards desired ones with high efficiency [7]. This fundamental yet versatile capability has allowed metagratings to tailor the electromagnetic spectrum and realize holograms [8,9], beam steerers [10,11], and ultrathin optical components [12,13]. However, the development of new and complex nanophotonic devices, including metagratings and metasurfaces, faces tremendous bottlenecks due to the large and nonlinear design space of materials and geometries that must be explored.

To meet the increasing demand for high-performance and application-specific nanophotonic devices, several strategies have been proposed and deployed over the past decade to solve the inverse design problem, which is the retrieval of the optimal material and structure in response to the desired optical behavior. Conventional inverse design methods include evolutionary algorithms [14] and adjoint-based optimization [15]. Though such algorithms have been successfully applied to various photonic design problems [16], in recent years, inverse design frameworks based on artificial intelligence and deep learning have emerged and proved advantageous in numerous respects. In comparison to conventional algorithms, deep learning-based methods have demonstrated superior device performance, computational efficiency, and the ability to derive new physical insights from the investigated design space [17,18,19,20]. Despite significant progress in deep learning for photonic design, many challenges and limitations remain, particularly within each class of machine learning algorithms.

Deep and machine learning approaches can be grouped into three main classes of algorithms: supervised learning (SL), unsupervised learning (USL), and reinforcement learning (RL) [21]. SL and USL typically involve training a neural network, or related model, on large quantities of labeled and unlabeled data, respectively. Once trained, the models can arrive at solutions orders of magnitude faster than conventional inverse design methods, since the model has in principle captured a high-dimensional nonlinear function approximation between the inputs and outputs [22]. In the photonics context, SL and USL have been applied to core-shell nanoparticles [23], infrared-controlled metasurfaces [24,25], optical thin-films [26,27], and more [28–31]. Though promising, the acquisition of large training datasets may not always be practical, particularly when a near-optimal solution may have already been found by the time sufficient training data is acquired. Additionally, it is well-known that SL-trained models struggle to solve problems that are too far beyond the original training dataset [32,33,34]. Thus, the main limitations of both SL and USL are their training data requirements and poor model generalizability, while their core strengths are post-training computational speed and efficiency.

On the other hand, in RL, a model known as an “agent” is trained through trial-and-error by evaluating its actions within an environment and producing a corresponding reward (e.g., a favorable action will be met with a positive reward and vice-versa). Therefore, there is no need for any training data, and RL is not bound to “prior knowledge” (as in SL and USL) since the method is inherently capable of self-exploration and exploitation. As a result, several works have demonstrated RL for the design of photonic devices and components [35,36,37,38]. Recent results show that RL can achieve lower variance and higher performance than the adjoint method as well as adjoint optimization-enhanced generative networks that rely on little to no training data [39,40]. Another unique property of SL and RL (compared to conventional optimization algorithms) is the ability to reuse and apply a solution (or trained model) to other similar problems via transfer learning [40,41]. However, due to the exploratory nature of RL, it is substantially slower than SL or USL models that are trained on readily-available datasets, with problems taking up to days [41,42] or even an entire month [43,44,45] to solve.

In this work, we aim to combine the strengths of supervised and reinforcement learning for the design and optimization of nanophotonic structures, which in turn addresses their individual weaknesses to achieve superior design performance. Specifically, we show that RL can enhance SL by using the former to explore beyond the latter’s initial training dataset. Correspondingly, we also demonstrate that SL can enhance RL by providing the RL algorithm with a better starting design for training/optimization, which in turn allows RL to achieve better performance. By utilizing a relatively small training dataset (∼5,000 data points), we first train a convolutional neural network (CNN) and leverage its high speed to inverse design a silicon-on-insulator metagrating in response to an arbitrary target electric field profile (shown in Fig. 1(a)). After generating the metagrating design, we refine device performance further than that achievable through the SL model alone by passing the design through an RL process (shown in Fig. 1(b)). The RL agent sequentially adjusts the metagrating design by either making grating widths smaller or larger to minimize the difference between the target and output electric field patterns. Through this workflow, we extend the generalizability of the SL model’s predictions, while simultaneously reducing RL training time by providing the algorithm with a better starting point for training. We validate our results by comparing the individual and combined application of SL and RL for a variety of design targets.

Fig. 1. Inverse design of transmissive metagratings. (a) Illustration of design problem. Incident light (1.5 µm) transmitted through an unknown metagrating design, consisting of patterned strips of Si on SiO2, produces a particular electric field profile ($E_{Target}$). (b) Deep learning design framework integrating supervised and reinforcement learning. A convolutional neural network predicts an approximate metagrating design, then reinforcement learning improves metagrating performance by modifying design parameters until $E_{Target} - E_{Output}$ is minimized and the unknown design is found. This joint method allows us to overcome the limitations associated with each individual deep learning algorithm.


2. Results and discussion

Supervised learning data preparation and model evaluation

We first evaluated the ability of a supervised learning model to perform the inverse design of metagratings by building a neural network model that was trained on a relatively small training dataset. As shown in Fig. 1(b), our design problem is graphically illustrated as a 2D metagrating captured along the XY-plane of the device, where a plane wave (1.5 µm under X-polarization) was injected at normal incidence above the structure (propagating along the Y-axis), and a transmissive electric field profile was generated below the structure. The metagrating consisted of patterned strips of Si on SiO2 (of uniform heights/thicknesses) that extended infinitely along the Z-axis and maintained unit-cell periodicity along the X-axis. A unique metagrating design was represented by 13 Si strips with individual widths that varied from 0 to 800 nm in 200 nm increments (which reflects realistic lithographic limits), with 0 nm indicating the absence of the strip. Thus, the total number of possible parameter combinations in the design space was on the order of $10^9$. In addition, symmetry boundary conditions were applied to the center of the structure along the Y-axis. Using this setup, 5,000 sets of metagrating parameters were randomly generated and simulated in MaxwellFDFD [46] (a MATLAB library) to capture their near-field ($< 2D^2/\lambda$) electromagnetic responses for deep learning. Here, we calculated the electric field magnitudes ($E = \sqrt{E_x^2 + E_y^2 + E_z^2}$) below the metagrating structure, and utilized this profile as image inputs for the deep learning model. To ensure sufficient signal-to-noise ratio, we scaled the color bar from 0 to 1 V/m and represented the electric field in the ‘hot’ color scheme within MATLAB. We note that this method of representing the target E-field in principle allows a more complex specification of phase and amplitude, and enables a designer to easily “draw” the required device response for a given application.
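As a concrete illustration, this field-to-image preprocessing might look like the following minimal Python sketch; the array shapes, file name, and random placeholder fields are assumptions for illustration (the authors performed this step in MATLAB):

```python
import numpy as np
import matplotlib.pyplot as plt

def field_to_image(Ex, Ey, Ez, path):
    """Render |E| on the near-field plane as a 'hot'-colormapped image,
    with the color scale fixed from 0 to 1 V/m as described above."""
    E = np.sqrt(np.abs(Ex)**2 + np.abs(Ey)**2 + np.abs(Ez)**2)
    plt.imsave(path, E, cmap='hot', vmin=0.0, vmax=1.0)

# Example with placeholder arrays standing in for a MaxwellFDFD export:
shape = (64, 256)
Ex, Ey, Ez = [np.random.rand(*shape) for _ in range(3)]
field_to_image(Ex, Ey, Ez, 'sample_field.png')
```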

Next, we trained a CNN in the TensorFlow framework by using the E-field images as inputs and grating widths as outputs (detailed in the Supplement 1). During hyperparameter tuning and model training, two types of loss behaviors were observed. As shown in Fig. 2(a), the model frequently overfitted, with the training and validation losses (mean squared error, or MSE) diverging. When additional measures were taken to reduce overfitting (e.g., dropout, neural architecture search, and data augmentation or noise reduction), model accuracy was compromised and underfitting occurred (shown in Fig. 2(b)). This suggests that regardless of model complexity, there were insufficient samples available to train the CNN [47]. To further assess the limitations of SL with insufficient training data, we tested the model’s ability to inverse design metagratings with target E-field profiles that were associated with “known designs” (which were generated in advance using full-wave simulations). We note that the “known designs” and target fields were used here for benchmarking purposes, since the evaluation of “unsimulated” target fields requires a more thorough assessment of which targets are physically achievable within a given design space.
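For reference, a minimal TensorFlow/Keras sketch of such an inverse-design CNN is shown below; the layer sizes, image resolution, and optimizer settings are illustrative assumptions rather than the actual architecture, which is detailed in the Supplement 1:

```python
import tensorflow as tf

def build_inverse_cnn(img_shape=(64, 256, 3), n_gratings=13):
    """Map an E-field image to 13 normalized grating widths in [0, 1]."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=img_shape),
        tf.keras.layers.Conv2D(32, 3, activation='relu'),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation='relu'),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dropout(0.3),   # one of the overfitting controls mentioned above
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(n_gratings, activation='sigmoid'),  # widths scaled to [0, 1]
    ])
    model.compile(optimizer='adam', loss='mse')   # MSE loss, as tracked in Fig. 2(a,b)
    return model

# model = build_inverse_cnn()
# model.fit(train_images, train_widths, validation_split=0.2, epochs=50)
```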

Fig. 2. Supervised learning (SL) for the inverse design of metagratings. Inputs to the neural network were E-field images and outputs were device parameters. (a,b) Indications of training a model with insufficient data. Training and validation losses either (a) overfit or (b) underfit regardless of model complexity. Using the best achievable model, (c) example design targets were tested, which consist of metagratings with various E-field profiles: a single focal point (top), collimated beam (middle), and dual collimated beams (bottom). Targets were known designs that were withheld from the training dataset. (d) Designs predicted by a model trained with insufficient training data. Inset images show the corresponding metagrating designs.


Figure 2(c) presents several example design targets: a single focal point (top), collimated beam (middle), and dual collimated beams (bottom), all of which were withheld from the training dataset. Furthermore, we confirmed that no structures/designs existed in the training data which corresponded to the sought design targets. Each target was passed into the CNN that had the least overfitting and lowest validation loss (from Fig. 2(b); detailed in the Supplement 1), then the model’s output designs were simulated for final verification. Simulated designs are shown in Fig. 2(d), where we observe only marginal similarities between the target and designed E-field profiles.

Reinforcement learning configuration and evaluation

After identifying cases where the SL model failed to generalize beyond the training data, we investigated the effectiveness of RL towards performing the same design task outlined in the previous section. Here, we implemented the proximal policy optimization (PPO) algorithm [48], which previously demonstrated world championship winning results in electronic sports [49]. To apply PPO to our design problem, the following components and configurations were required: action definitions, interactions between states and environment, and the reward function. Two actions were defined for each optimizable element (to either increase or decrease the grating width by 200 nm), yielding a total of 26 actions. We note that this width-changing increment is a user-defined parameter that can be tailored to account for nano-fabrication compatibility. At the initial timestep $t$, the state of the metagrating was represented as the following vector: $s_t = [s_1, s_2, \ldots, s_{13}]$. After performing an action in the next timestep, the state was modified accordingly. For instance, given an action $a_1$ (up to $a_{26}$), which modifies the first position within the state, the following interaction takes place: $s_{t+1}(a_1) = [s_1 + 0.2, s_2, \ldots, s_{13}]$. State values were normalized from 0 to 1 for ease of training. At each timestep, the state was passed into the electromagnetic solver (MaxwellFDFD), which simulated the structure to produce an output image. We then compared the output image to the user-specified target image by using the structural similarity index (SSIM) [50] to quantify the difference between the two images. An SSIM of 1 denotes a perfect match between images; however, we subtracted the SSIM from 1 so that 0 instead represents a perfect match. This change was applied in order to reflect the figure-of-merit (FOM) evolution of standard optimization techniques [51], which typically minimize the FOM during optimization to monitor performance improvements.
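The state/action interaction and inverted-SSIM FOM described above can be sketched as a Gymnasium-style environment; here, simulate_field is a hypothetical stand-in for the MaxwellFDFD call, the per-step reward is a simple placeholder (the nonlinear reward functions are discussed next), and all names are illustrative rather than our exact implementation:

```python
import numpy as np
import gymnasium as gym
from skimage.metrics import structural_similarity as ssim

class MetagratingEnv(gym.Env):
    """13 grating widths normalized to [0, 1]; 26 discrete actions that
    widen or narrow one grating by 0.2 (i.e., 200 nm) per timestep."""
    N, STEP = 13, 0.2

    def __init__(self, target_img, simulate_field, max_steps=20):
        self.action_space = gym.spaces.Discrete(2 * self.N)
        self.observation_space = gym.spaces.Box(0.0, 1.0, (self.N,), np.float32)
        self.target = target_img
        self.simulate = simulate_field      # hypothetical FDFD wrapper: widths -> image
        self.max_steps = max_steps
        self.initial_state = np.full(self.N, 0.2, dtype=np.float32)

    def _fom(self):
        out = self.simulate(self.state)     # FDFD solve -> output field image
        return 1.0 - ssim(out, self.target, data_range=1.0)  # 0 = perfect match

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.state = self.initial_state.copy()
        self.t, self.prev_fom = 0, self._fom()  # store FOM of previous timestep
        return self.state, {}

    def step(self, action):
        idx, sign = action % self.N, (1.0 if action < self.N else -1.0)
        self.state[idx] = np.clip(self.state[idx] + sign * self.STEP, 0.0, 1.0)
        fom = self._fom()
        r = self.prev_fom - fom             # placeholder linear reward; see below
        self.prev_fom, self.t = fom, self.t + 1
        return self.state, r, False, self.t >= self.max_steps, {}
```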

For the agent to learn the best actions to take under specific states, a reward function was required, which penalizes or rewards the agent based on its action/state responses. To calculate the reward function, we stored the SSIM values of the previous timestep during training, such that the next action was rewarded positively if ΔSSIM was negative and vice-versa. In other words, we incentivized the agent to perform an action that produces an output image with higher similarity to the target image. In Tables S1-S3 of the Supplement 1, we performed hyperparameter tuning on the PPO algorithm and evaluated numerous linear and nonlinear reward functions. Here, we observe that the nonlinear reward functions (e.g., sigmoidal) performed significantly better than linear functions, particularly during the early stages of training, where we exponentially rewarded/penalized actions in response to larger changes in ΔSSIM. To expedite our RL trials, we note that the reward function and hyperparameter tuning were performed using a custom-made arbitrary function regression environment (detailed in the Supplement 1) with the same number of parameters as our design space, rather than the MaxwellFDFD-integrated environment. We emphasize that our study consists of three stages: model optimization on the expedited environment, final model training in the FDFD environment, and final model testing on the FDFD environment. Due to the comparatively lengthy training times of the FDFD environment, our expedited environment is intended to provide a faster alternative that allows us to capture the ideal hyperparameters for an RL algorithm with 26 actions over parameters (ranging from 0 to 1) that apply design changes in increments of 0.2. Once the optimal hyperparameters were determined for the parameter definition representing our design space, we applied these parameters to the RL algorithm and trained it using FDFD simulations to evaluate the generated designs. The trained RL model was then applied to new design targets that were not in the training dataset.
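A minimal sketch of one such sigmoidal reward is shown below; the gain k and the output scaling are illustrative assumptions, as the exact reward functions we evaluated are given in Tables S1-S3 of the Supplement 1:

```python
import math

def sigmoidal_reward(delta_fom, k=20.0):
    """Reward in (-1, 1) for the per-step change in the FOM (1 - SSIM).
    delta_fom = previous FOM - current FOM, so positive means improvement;
    the sigmoid rewards/penalizes larger changes disproportionately more.
    The gain k is illustrative, not the tuned value."""
    return 2.0 / (1.0 + math.exp(-k * delta_fom)) - 1.0
```

In the environment sketch above, this function would replace the placeholder linear reward `r = self.prev_fom - fom`.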

After optimizing our reward function, we trained the RL algorithm using the same targets shown in Fig. 2 for 5,000 episodes and 20 timesteps per episode (which is greater than the number of timesteps needed to reach the known designs). As the starting point for training, an initial state of all 200 nm wide gratings was used for each target (i.e., $s_t = [0.2_1, 0.2_2, \ldots, 0.2_{13}]$). To produce a larger sample size, three designs were generated for each target, and each design took approximately one week of training. Calculations were run on a machine with a 16-core 3.40 GHz processor and 64 GB of RAM. Training results are presented in Fig. 3, where we generally observe poor convergence and design performance. Since model training is initialized using random weights and actions, we note that each RL run produced a different design after 5,000 episodes. The results here indicate that longer training times or additional model optimization were required.
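For illustration, a training run of this form could be configured as follows; the RL library (stable-baselines3) is an assumption rather than our exact tooling, and MetagratingEnv, simulate_field, and target_img refer to the hypothetical sketches above:

```python
from stable_baselines3 import PPO

env = MetagratingEnv(target_img, simulate_field, max_steps=20)  # 20 timesteps/episode
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=5_000 * 20)   # 5,000 episodes of 20 timesteps each
model.save("ppo_metagrating")
```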

Fig. 3. Reinforcement learning (RL) for the inverse design of metagratings. The example design targets were fed into the proximal policy optimization (PPO) algorithm and trained for 5,000 episodes (∼1 week) and 20 timesteps per episode using a fixed metagrating starting design. Three designs were generated for each target, and poor design convergence can be observed, indicating that longer training times or additional model optimization were required. Inset images show the corresponding metagrating designs.


Combining supervised and reinforcement learning

In this section, we combined SL and RL to simultaneously address the previously demonstrated limitations of each deep learning method. We used the CNN-predicted metagrating designs from Fig. 2 as starting points for RL (sketched below), and present the results in Fig. 4. As an additional comparison, we also combined SL with conventional optimization and present the results in Figure S1 of the Supplement 1. Specifically, we used the same CNN-predicted metagrating designs as starting points for particle swarm optimization (PSO). We note that the combination of neural networks with PSO has been demonstrated in prior works [52], and PSO itself is a widely used algorithm for electromagnetic optimization with comparable performance to evolutionary algorithms [53]. In Fig. 4, we observe that after 5,000 episodes, the SL + RL approach produced designs with significantly higher accuracy and performance than the RL-only results (in relation to the input targets). From a qualitative standpoint, the SL + RL designs exhibited the sought behaviors, including: a single focal point (top), collimated beam (middle), and dual collimated beams (bottom). It can also be seen that the SL + RL approach performs better than the SL + PSO method in the same amount of time and number of optimization steps (the degree of which is quantified in Fig. 5). The results here indicate that the SL model was able to assist the RL algorithm by reducing the time required for the latter to obtain an adequate solution. However, we note that the dual collimated beam designs only produced faint signals or low-power E-fields, which suggests that the degree of accuracy may depend on the complexity of the target design, since such targets may possess fewer solutions (or require more precise solutions) in comparison to simpler E-field profiles. Longer training times may thus produce results with greater accuracy, though exploring this is beyond the scope of the current study. Furthermore, we note that the number of episodes used in this study was selected based on the observed minimum requirements necessary to produce meaningful improvements to the CNN-predicted designs. We therefore anticipate that a CNN model trained on a larger dataset may reduce the number of RL episodes required for the optimization to converge, which may also be explored in future works.
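A minimal sketch of this SL-to-RL hand-off is shown below, reusing the hypothetical environment and PPO setup from the previous sections; the grid-snapping of the CNN output is an assumption for mapping continuous predictions onto the 200 nm (0.2 normalized) increments:

```python
import numpy as np
from stable_baselines3 import PPO

def sl_then_rl(target_img, cnn, env, episodes=5_000):
    """Seed the RL search with the CNN's predicted design instead of the
    fixed all-200-nm starting state used in the RL-only runs."""
    widths = cnn.predict(target_img[None, ...])[0]           # fast SL forward pass
    env.initial_state = (np.round(widths / 0.2) * 0.2).astype(np.float32)
    model = PPO("MlpPolicy", env)
    model.learn(total_timesteps=episodes * env.max_steps)    # RL refinement stage
    return model
```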

Fig. 4. Combining SL and RL for the inverse design of metagratings. Pre-trained CNN predictions were used as the starting point for RL, which subsequently ran for 5,000 episodes and 20 timesteps per episode. In comparison to SL- and RL-only predictions, the joint approach produced superior accuracy and converged in less time. The degree of accuracy appears to depend on the complexity of the input target. Inset images show the corresponding metagrating designs.


Fig. 5. Inverse design performance comparison. Structural similarity index (SSIM) values of the generated designs were used to evaluate the performance of the (a) SL-only, (b) RL-only, (c) SL + PSO, and (d) SL + RL models. Higher performance is indicated by a lower SSIM. In comparison to SL-only, RL-only, and SL + PSO models, SL + RL predictions produced significantly lower variance and mean SSIM.


In Figs. 5(a)-(d), we quantitatively compared the performance of each method by evaluating the SSIM values of the SL-only, RL-only, SL + PSO, and SL + RL designs, respectively. Figure 5(a) presents the SSIM of the design outputs from the best three CNN models found during model optimization. Figures 5(b) and 5(d) show the SSIM of the RL designs corresponding to Fig. 3 and Fig. 4, respectively. Figure 5(c) shows the SSIM values of the SL + PSO designs corresponding to Figure S1 of the Supplement 1. We observe that the SL + RL designs resulted in 4 times less variance (standard deviation < 1%) and 65% higher performance (lower SSIM) than SL-only designs. The SL + RL designs also had 10 times less variance and over 30% higher performance than the RL-only designs in the same number of episodes. Furthermore, SL + RL achieved a 50% SSIM improvement in comparison to SL + PSO. Therefore, in this study, we show that RL can be used to generalize the predictions of an SL model by exploring beyond the latter’s training dataset, while SL can assist RL in its search process by using a relatively small training dataset to capture high-level features first. Additionally, in Figure S2 of the Supplement 1, we captured additional RL and SL + RL training times, recorded their SSIM design performance values, and extrapolated the results to approximate the peak performance training times of each method. Here, we observe that it would take approximately 30 days for RL to achieve the best result that we obtained through SL + RL in 7 days, thus showing a 4x improvement in training time and efficiency.

We note that prior attempts at combining SL and RL required iterating through pre-trained or weight-fixed model predictions [54], which does not fit the contemporary definition of RL, where the model learns (and optimizes its weights) from the trial-and-error process itself. Thus, to our knowledge, this is the first study to explore the simultaneous application of SL and RL for nanophotonic design. Additionally, since our goal was to demonstrate the advantages of merging different deep learning techniques, we note that our implementation of RL can be enhanced with many future improvements to increase design complexity. For example, more advanced action definitions can be applied to simultaneously adjust multiple elements rather than a single element at a time, further reward function engineering may yield faster convergence rates, and parallelizing the environment or electromagnetic solver can also expedite state evaluations tremendously. In terms of application and practicality, we believe that RL is uniquely applicable over traditional algorithms in situations where local minima-trapping during optimization is a prevalent issue [39,40]. Additionally, for design spaces which require repeated exploration (such as obtaining vastly different optical behaviors with similar materials/structures), SL can drastically reduce the computation time for RL (which we validate in Figure S2 of the Supplement 1).

While many advantages have been presented in our combination of SL and RL, we note that several potential limitations have been observed as well. Most prominently, the lengthy training times associated with RL may still scale prohibitively with design space complexity. For example, since 2D design problems can take RL days to resolve, 3D design problems can be expected to take weeks or months without sufficient hardware resources. However, we note that many contemporary advancements to deep learning, which have not yet been adapted for the photonics domain, show promise in improving training time and efficiency. For complex designs that require many actions, we note that other works have demonstrated RL in the context of hundreds or even thousands of parameters [55]. Additional parameterization strategies such as level-set or topology optimization methods can also be integrated with RL to increase design complexity. Importantly, with our combined SL and RL approach, we show that training data requirements can be reduced, and model predictions do not have to be limited by the initial dataset, all by strategically harnessing the strengths of distinct deep learning algorithms and methodologies. The methods and approaches shown in this work are generally applicable and can be adapted to domains beyond nanophotonic design.

3. Conclusions

In summary, we present a multi-class deep learning strategy for nanophotonic circuit and device inverse design that combines distinct machine learning algorithm classes, particularly supervised learning (SL) and reinforcement learning (RL), to extend the capabilities of each individual method. Notably, SL suffers from its large training data requirements and inability to generalize well beyond its training dataset, which we highlight here in the context of metagrating design. On the other hand, RL provides explorative design possibilities without the need for training data, and has demonstrated better performance than conventional optimization algorithms (such as particle swarm optimization or PSO), but at the cost of very lengthy training times (ranging from days to months). To overcome their respective limitations, here we trained an SL model with a relatively small dataset to identify a high-level solution, then applied RL to achieve a more precise solution. Our results show that SL + RL designs (of metagratings with arbitrarily-defined E-field profiles) resulted in 4 times less variance (standard deviation < 1%) and 65% higher performance than SL-only designs, and 50% higher performance than SL + PSO designs. In comparison to RL-only designs, SL + RL achieved over 10 times less variance and over 30% higher performance in the same number of episodes or training cycles. Our work therefore highlights the potential of combining multiple classes of machine learning to produce more effective and practical solutions for photonic design. We anticipate future work exploring the efficacy of this approach in designing metasurfaces given completely new, unsimulated field profiles, where it might offer a new degree of freedom to optical designers. We thus believe these results will inspire the development of further hybrid algorithms or approaches that address major contemporary challenges associated with the applications of deep learning for photonic device and circuit optimization.

Funding

Defense Advanced Research Projects Agency (W911NF2110345); National Science Foundation (ECCS-2146577).

Acknowledgements

This work used computational and storage services associated with the Hoffman2 Shared Cluster provided by UCLA Institute for Digital Research and Education's Research Technology Group.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are available at [56].

Supplemental document

See Supplement 1 for supporting content.

References

1. H. Zhao, Y. Shuang, M. Wei, et al., “Metasurface-assisted massive backscatter wireless communication with commodity Wi-Fi signals,” Nat. Commun. 11(1), 3926 (2020). [CrossRef]  

2. J. Y. Dai, W. K. Tang, J. Zhao, et al., “Wireless communications through a simplified architecture based on time-domain digital coding metasurface,” Adv. Mater. Technol. 4(7), 1900044 (2019). [CrossRef]  

3. W. Li and S. Fan, “Nanophotonic control of thermal radiation for energy applications,” Opt. Express 26(12), 15995–16021 (2018). [CrossRef]  

4. O. Ilic, C. M. Went, and H. A. Atwater, “Nanophotonic heterostructures for efficient propulsion and radiative cooling of relativistic light sails,” Nano Lett. 18(9), 5583–5589 (2018). [CrossRef]  

5. K. P. Jha, X. Ni, C. Wu, et al., “Metasurface enabled remote quantum interference,” Phys. Rev. Lett. 115(2), 025501 (2015). [CrossRef]  

6. R. Bekenstein, I. Pikovski, H. Pichler, et al., “Quantum metasurfaces with atom arrays,” Nat. Phys. 16(6), 676–681 (2020). [CrossRef]  

7. Y. Ra’di, D. L. Sounas, and A. Alù, “Metagratings: beyond the limits of graded metasurfaces for wave front control,” Phys. Rev. Lett. 119(6), 067404 (2017). [CrossRef]  

8. Z. L. Deng, J. Deng, X. Zhuang, et al., “Facile metagrating holograms with broadband and extreme angle tolerance,” Light Sci. Appl. 7(1), 78 (2018). [CrossRef]  

9. Z. Deng, J. Deng, G. P. Wang, et al., “Metagrating holograms with ultra-wide incident angle tolerances and high diffraction efficiencies,” CLEO, JW2A.111 (2018). [CrossRef]  

10. C. McDonnell, J. Deng, S. Sideris, et al., “Terahertz metagrating emitters with beam steering and full linear polarization control,” Nano Lett. 22(7), 2603–2610 (2022). [CrossRef]  

11. A. Casolaro, A. Toscano, A. Alu, et al., “Dynamic beam steering with reconfigurable metagratings,” IEEE Trans. Antennas Propag. 22(7), 1542–1552 (2019). [CrossRef]  

12. Y. Hu, Y. Zhang, G. Su, et al., “Realization of ultrathin waveguides by elastic metagratings,” Commun. Phys. 5(1), 62 (2022). [CrossRef]  

13. W. Wan, M. Luo, and Y. Su, “Ultrathin polarization-insensitive, broadband visible absorber based rectangular metagratings,” Opt. Commun. 458, 124857 (2020). [CrossRef]  

14. S. So, Y. Yang, S. Son, et al., “Highly suppressed solar absorption in a daytime radiative cooler designed by genetic algorithm,” Nanophotonics 11(9), 2107–2115 (2022). [CrossRef]  

15. J. Wang, Y. Shi, T. Hughes, et al., “Adjoint-based optimization of active nanophotonic devices,” Opt. Express 26(3), 3236–3248 (2018). [CrossRef]  

16. S. Molesky, Z. Lin, A. Y. Piggott, et al., “Inverse design in nanophotonics,” Nat. Photonics 12(11), 659–670 (2018). [CrossRef]  

17. W. Ma, Z. Liu, Z. A. Kudyshev, et al., “Deep learning for the design of photonic structures,” Nat. Photonics 15(2), 77–90 (2021). [CrossRef]  

18. J. Jiang, M. Chen, and J. A. Fan, “Deep neural networks for the evaluation and design of photonic devices,” Nat. Rev. Mater. 6(8), 679–700 (2020). [CrossRef]  

19. C. Yeung, D. Ho, B. Pham, et al., “Enhancing adjoint optimization-based photonic inverse design with explainable machine learning,” ACS Photonics 9(5), 1577–1585 (2022). [CrossRef]  

20. C. Yeung, J. Tsai, B. King, et al., “Elucidating the behavior of nanophotonic structures through explainable machine learning algorithms,” ACS Photonics 7(8), 2309–2318 (2020). [CrossRef]  

21. A. Seyyedabbasi, R. Aliyev, F. Kiani, et al., “Hybrid algorithms based on combining reinforcement learning and metaheuristic methods to solve global optimization problems,” Knowledge-Based Systems 223(107044), 107044 (2021). [CrossRef]  

22. C. C. Nadell, B. Huang, J. M. Malof, et al., “Deep learning for accelerated all-dielectric metasurface design,” Opt. Express 27(20), 27523–27535 (2019). [CrossRef]  

23. J. Peurifoy, Y. Shen, L. Jing, et al., “Nanophotonic particle simulation and inverse design using artificial neural networks,” Sci. Adv. 4(6), 6 (2018). [CrossRef]  

24. C. Yeung, R. Tsai, B. Pham, et al., “Global inverse design across multiple photonic structure classes using generative deep learning,” Adv. Optical Mater. 9, 2170079 (2021). [CrossRef]  

25. C. Yeung, J. Tsai, B. King, et al., “Multiplexed supercell metasurface design and optimization with tandem residual networks,” Nanophotonics 10(3), 1133–1143 (2021). [CrossRef]  

26. A. Lininger, M. Hinczewski, and G. Strangi, “General inverse design of layered thin-film materials with convolutional neural networks,” ACS Photonics 8(12), 3641–3650 (2021). [CrossRef]  

27. M. Fouchier, M. Zerrad, M. Lequime, et al., “Design of multilayer optical thin-films based on light scattering properties and using deep neural networks,” Opt. Express 29(20), 32627–32638 (2021). [CrossRef]  

28. S. Beyraghi, F. Ghorbani, J. Shabanpour, et al., “Microwave bone fracture diagnosis using deep neural network,” Sci. Rep. 13(1), 16957 (2023). [CrossRef]  

29. F. Ghorbani and H. Soleimani, “Simultaneous estimation of wall and object parameters in TWR using deep neural network,” International Journal of Antennas and Propagation 2022, 1–10 (2022). [CrossRef]  

30. F. Ghorbani, J. Shabanpour, S. Beyraghi, et al., “A deep learning approach for inverse design of the metasurface for dual-polarized waves,” Appl. Phys. A 127(11), 869 (2021). [CrossRef]  

31. F. Ghorbani, S. Beyraghi, J. Shabanpour, et al., “Deep neural network-based automatic metasurface design with a wide frequency range,” Sci. Rep. 11(1), 7102 (2021). [CrossRef]  

32. S. Fort, P. K. Nowak, S. Jastrzebski, et al., “Stiffness: a new perspective on generalization in neural networks,” arXiv, arXiv:1901.09491 (2020). [CrossRef]  

33. R. Novak, Y. Bahri, D. A. Abolafia, et al., “Sensitivity and generalization in neural networks: an empirical study,” arXiv, arXiv:1802.08760 (2018). [CrossRef]  

34. J. Lenaerts, H. Pinson, and V. Ginis, “Artificial neural networks for inverse design of resonant nanophotonic components with oscillatory loss landscapes,” Nanophotonics 10(1), 385–392 (2020). [CrossRef]  

35. H. Wang, Z. Zheng, C. Ji, et al., “Automated multi-layer optical design via deep reinforcement learning,” Mach. Learn.: Sci. Technol. 2(2), 025013 (2021). [CrossRef]  

36. H. Hwang, M. Lee, and J. Seok, “Deep reinforcement learning with a critic-value-based branch tree for the inverse design of two-dimensional optical devices,” Applied Soft Computing 127, 109386 (2022). [CrossRef]  

37. T. Shah, L. Zhuo, P. Lai, et al., “Reinforcement learning applied to metamaterial design,” J. Acoust. Soc. Am. 150(1), 321–338 (2021). [CrossRef]  

38. S. Banerji, A. Hamrick, A. Majumder, et al., “Ultra-compact design of power splitters via machine learning,” IEEE Photonics Conference (IPC), pp. 1–2 (2020).

39. D. Seo, D. W. Nam, J. Park, et al., “Structural optimization of a one-dimensional freeform metagrating deflector via deep reinforcement learning,” ACS Photonics 9(2), 452–458 (2022). [CrossRef]  

40. S. Hooten, R. G. Beausoleil, and T. V. Vaerenbergh, “Inverse design of grating couplers using the policy gradient method from reinforcement learning,” Nanophotonics 10(15), 3843–3856 (2021). [CrossRef]  

41. T. Badloe, I. Kim, and J. Rho, “Biomimetic ultra-broadband perfect absorbers optimised with reinforcement learning,” Phys. Chem. Chem. Phys. 22(4), 2337–2342 (2020). [CrossRef]  

42. I. Sajedian, T. Badloe, and J. Rho, “Optimisation of colour generation from dielectric nanostructures using reinforcement learning,” Opt. Express 27(4), 5874–5883 (2019). [CrossRef]  

43. I. Sajedian, H. Lee, and J. Rho, “Double-deep Q-learning to increase the efficiency of metasurface holograms,” Sci. Rep. 9(1), 10899 (2019). [CrossRef]  

44. I. Sajedian, T. Badloe, H. Lee, et al., “Deep Q-network to produce polarization-independent perfect solar absorbers: a statistical report,” Nano Convergence 7(1), 26 (2020). [CrossRef]  

45. I. Sajedian, H. Lee, and J. Rho, “Design of high transmission color filters for solar cells directed by deep Q-learning,” Sol. Energy 195, 670–676 (2020). [CrossRef]  

46. W. Shin and S. Fan, “Choice of the perfectly matched layer boundary condition for frequency-domain Maxwell's equations solvers,” J. Comput. Phys. 231(8), 3406–3431 (2012). [CrossRef]  

47. P. Thanapol, K. Lavangnananda, P. Bouvry, et al., “Reducing overfitting and improving generalization in training convolutional neural network (CNN) under limited sample sizes in image recognition,” 5th International Conference on Information Technology (InCIT), pp. 300–305 (2020).

48. J. Schulman, F. Wolski, P. Dhariwal, et al., “Proximal policy optimization algorithms,” arXiv, arXiv:1707.06347 (2017). [CrossRef]  

49. C. Berner, G. Brockman, B. Chan, et al., “Dota 2 with large scale deep reinforcement learning,” arXiv, arXiv:1912.06680 (2019). [CrossRef]  

50. Z. Wang, A. C. Bovik, H. R. Sheik, et al., “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. on Image Process. 13(4), 600–612 (2004). [CrossRef]  

51. C. Yeung, B. Pham, R. Tsai, et al., “DeepAdjoint: an all-in-one photonic inverse design framework integrating data-driven machine learning with optimization algorithms,” ACS Photonics 10, 884–891 (2022). [CrossRef]  

52. X. Guo, J. Lu, Y. Li, et al., “Inverse design for coating parameters in nano-film growth based on deep learning neural network and particle swarm optimization algorithm,” Photonics 9(8), 513 (2022). [CrossRef]  

53. D. W. Boeringer and D. H. Werner, “Particle swarm optimization versus genetic algorithms for phased array synthesis,” IEEE Trans. Antennas Propagat. 52(3), 771–779 (2004). [CrossRef]  

54. Z. Huang, X. Liu, and J. Zang, “The inverse design of structural color using machine learning,” Nanoscale 11, 21748–21758 (2019). [CrossRef]  

55. N. Stiennon, L. Ouyang, J. Wu, et al., “Learning to summarize from human feedback,” Proceedings of the 34th International Conference on Neural Information Processing Systems, 253, pp. 3008–3021 (2020).

56. C. Yeung and A. Raman, “Raman Lab @ UCLA,” Github (2021), https://github.com/Raman-Lab-UCLA.

Supplementary Material (1)

Supplement 1: Supplementary information and visualization

