Hybrid inverse design scheme for nanophotonic devices based on encoder-aided unsupervised and supervised learning

Open Access

Abstract

Machine learning methods have been regarded as practical tools for the inverse design of nanophotonic devices. However, for devices with complex design targets, such as spectrums with multiple peaks and valleys, these data-driven approaches still suffer from problems such as overfitting. To address this, we propose a hybrid inverse design scheme that combines supervised and unsupervised learning. Compared with previous inverse design schemes based on artificial neural networks (ANNs), clustering algorithms and an encoder model are introduced for data preprocessing. A typical metamaterial composed of multiple metal strips, which can produce tunable dual plasmon-induced transparency phenomena, is designed to verify the performance of the proposed hybrid scheme. Compared with ANNs trained directly on the entire dataset, the loss functions (mean squared error) of the ANNs in our hybrid scheme are reduced by more than 51% for both the training and test datasets under the same training conditions. Our hybrid scheme thus offers an efficient improvement for inverse design tasks with complex targets.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

The development of integrated photonic devices raises the demand for precise and generalizable design methods. As a result, various inverse design schemes inspired by computational algorithms have been proposed in recent years [1,2]. They aim to model the relation between structural parameters and device characteristics from data rather than from physical formulas [1,3]. Machine learning, which has advanced rapidly in recent years and has been applied in many research areas, has also been introduced to the inverse design of photonic devices [2,4–6]. Among these approaches, supervised learning is an effective tool for building the models used in inverse design schemes [5,7]. Generally, the datasets for training supervised learning models are generated randomly from the parameter space. However, a critical problem remains unsolved for most of these supervised inverse design schemes: the models generalize poorly to cases that occupy a small proportion of the training dataset, which degrades the inverse design results. For example, spectrums that carry the characteristics of the device's function, such as the multiple plasmon-induced transparency (PIT) phenomenon, can only be produced by nanophotonic devices within limited regions of the structural parameter space [8,9]. With the traditional dataset generation method, where the dataset is sampled evenly over the structural parameter space, these key spectrums are overwhelmed by the far more numerous regular instances. The resulting imbalanced dataset leads to overfitting of the supervised learning models. Some researchers mitigated this problem by increasing the proportion of the data most relevant to the design target in the training dataset [10]. They adjusted the balance of the dataset by presetting different material groups used in the structures, and obtained higher accuracies than training the model with an evenly sampled dataset of the same size. However, this method is not universal, because it only applies to problems with different types of structures and requires manual grouping. A method that can automatically classify arbitrary datasets before training the supervised models is therefore of great significance.

As mature unsupervised methods for identifying the characteristic differences among unlabeled instances, clustering algorithms have been intensively employed in many fields, such as economic data analysis [11] and weather forecasting [12]. In particular, they have also been applied in some inverse design schemes for different types of fundamental devices [13–15]. In those works, clustering is employed to verify the effectiveness of encoders that compress high-dimensional data into a low-dimensional representation. It has also been used for the identification of photonic modes [16]. However, in these studies the clustering algorithms only serve to check whether the machine-learning classifications are reasonable; so far, they have not been introduced into inverse design problems to improve the performance of the inverse design itself, such as its accuracy. Thus, in this article, we propose a hybrid scheme based on supervised and unsupervised learning, i.e., clustering algorithms, for the inverse design of nanophotonic devices. Different from previous applications [13–16], the unsupervised learning algorithms in our scheme are used to divide the dataset before the supervised learning models are trained. This scheme is expected to improve the performance of ANN-based inverse design, for which many attractive approaches have been proposed, such as the tandem neural network [7], the mixture density network [17], and dimensionality reduction [18]. Furthermore, to improve the clustering by making the distribution of the high-dimensional data, i.e., the spectrums, more even, an encoder model, which has been proven to be an effective tool [19,20], is introduced before the clustering step. The encoder-aided clustering algorithm effectively reduces the overfitting and inadequate training caused by the unequal proportions of different spectral characteristics in the training data of the supervised models, and the reduced dimensionality of the encoded characteristics also lowers the computational cost of the clustering operation.

In order to verify the effectiveness of our inverse design scheme, a classical planar metamaterial is selected as the device to be designed. Based on this structure, a special optical phenomenon, dual PIT, can be observed and adjusted. In practice, the ANNs assisted by clustering algorithms and the encoder model perform significantly better than the ANN model trained directly on the entire database. Concretely, the mean squared errors (MSE) for both the training and test datasets are reduced by more than 50%. Thus, our hybrid scheme offers an efficient optimization for inverse design work, especially for tasks with complex design targets. Moreover, the characteristics-encoding approach we use can also be integrated into other research that requires evenly distributed features or lower computational resources.

2. Scheme description and device structure

2.1 Hybrid inverse design scheme

The flowchart of our inverse design scheme is shown in Fig. 1. Unlike the common approach in previous machine-learning-based inverse design works, the ANNs in our scheme are assisted by clustering algorithms and a trained encoder model, which effectively improve the performance of the inverse design. In detail, the design process can be divided into four parts. (i) Data preparation: since the outputs of the inverse design ANNs are structural parameters, which are discrete, the inverse design ANNs are difficult to converge with a small dataset. Besides, the simulations needed to obtain data are time-consuming. To extend the dataset for inverse design, a forward ANN is built in advance and trained by the original dataset obtained from repeated finite-difference time-domain (FDTD) simulations. After verifying the effectiveness of this prediction model, we use the trained forward ANN as a tool to extend the dataset, which is an effective approximation given the high prediction accuracy attained by the trained forward ANN [21]. (ii) Characteristics encoding: this part introduces the encoder-decoder model, which is trained by the extended dataset obtained in the first part. With this model, the 1000-dimensional discretized transmission spectrums of all the instances in the dataset are encoded into code sequences. (iii) Inverse design: the extended dataset is first divided into N parts (N ≥ 2) according to the encoded characteristics of the transmission spectrums by employing clustering algorithms. Independent inverse design ANNs, in which the discretized transmissions are the input and the structural parameters are the output, are then trained by instances from their corresponding clusters. The total loss of the inverse design is the weighted summation of the losses from all the independent models and is calculated as:

$$Loss = \frac{\sum_{i = 1}^{N} w_i \times loss_i}{\sum_{i = 1}^{N} w_i}$$
where wi (i = 1, 2, …, N) is the proportion of the number of instances in cluster i relative to the entire database, and lossi is the loss of the ANN model trained by the data from cluster i. (iv) Model usage: as shown in the “Model usage” part of Fig. 1, a specific spectrum to be designed is first encoded by the trained encoder model, and the distances from its code sequence to the centroids of the different clusters are calculated to decide which cluster it belongs to. The designed structural parameters are then predicted by the corresponding inverse ANN.
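
The weighted total loss above and the model-usage routing of step (iv) can be summarized by the following minimal Python sketch. It assumes Keras-style `encoder` and `inverse_anns` objects exposing a `predict` method and a `centroids` array produced by the clustering step; these names are placeholders rather than the authors' implementation.

```python
import numpy as np

def total_loss(losses, weights):
    """Weighted total loss: sum(w_i * loss_i) / sum(w_i)."""
    losses, weights = np.asarray(losses), np.asarray(weights)
    return float(np.sum(weights * losses) / np.sum(weights))

def design_structure(target_spectrum, encoder, centroids, inverse_anns):
    """Route a target spectrum to the inverse ANN of its nearest cluster (step iv).

    target_spectrum : (1000,) discretized transmission spectrum
    encoder         : trained encoder mapping 1000-D spectra to low-dimensional codes
    centroids       : (N, d) cluster centroids obtained in the clustering step
    inverse_anns    : list of N trained inverse-design ANNs, one per cluster
    """
    code = encoder.predict(target_spectrum[np.newaxis, :])   # (1, d) code of the target
    dists = np.linalg.norm(centroids - code, axis=1)          # distance to each centroid
    k = int(np.argmin(dists))                                 # index of the closest cluster
    return inverse_anns[k].predict(target_spectrum[np.newaxis, :])  # 7 structural parameters
```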

Fig. 1. Flowchart of the hybrid inverse design scheme based on encoder-aided unsupervised and supervised learning.

2.2 Planar PIT metamaterials

In order to verify the performance of the hybrid inverse design scheme, a structure that can provide complex and varied spectral characteristics is needed. Thus, we select a periodic metamaterial whose basic unit consists of four aluminum strips on a silica substrate, as shown in Fig. 2(a). The four metal strips are labeled 1, 2, 3, and 4, where strips 1 and 4 lie along the y direction and strips 2 and 3 lie along the x direction. The structural parameters of strips 2 and 3 are set identically, and the structural unit is symmetric about the red dashed line in Fig. 2(a). The numerical simulations are carried out by the FDTD method. In the simulations, the frequency-dependent permittivity of aluminum is described by the Drude model $\varepsilon_{\mathrm{Al}} = \varepsilon_\infty - \omega_p^2 / (\omega^2 + i\omega\gamma)$, where $\varepsilon_\infty = 3.7$, $\omega_p = 2.24 \times 10^{16}$ rad/s, and $\gamma = 1.22 \times 10^{14}$ rad/s are the dielectric constant at infinite frequency, the plasma frequency, and the damping constant, respectively [22]. Periodic boundary conditions are employed for the structural unit in the x and y directions, and a perfectly matched layer boundary condition in the z direction. The incident light, polarized with its electric field along the y direction, illuminates the structural unit at normal incidence.
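
As a quick reference, the Drude permittivity above can be evaluated numerically with the short sketch below; it simply encodes the quoted parameters and is not part of the FDTD workflow itself.

```python
import numpy as np

# Drude parameters for aluminum quoted in the text [22]
eps_inf = 3.7        # dielectric constant at infinite frequency
omega_p = 2.24e16    # plasma frequency (rad/s)
gamma   = 1.22e14    # damping constant (rad/s)

def eps_al(wavelength_um):
    """Frequency-dependent permittivity of Al from the Drude model."""
    c = 2.998e8                                      # speed of light (m/s)
    omega = 2 * np.pi * c / (wavelength_um * 1e-6)   # angular frequency (rad/s)
    return eps_inf - omega_p**2 / (omega**2 + 1j * omega * gamma)

print(eps_al(1.5))   # permittivity near the middle of the studied band (~1.5 um)
```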

Fig. 2. (a) Structure schematic of the planar PIT metamaterials. The period of the structural unit is 640 nm for both the x and y direction. The lengths of the strips are set as L1 = 300 nm, L2 = 260 nm, and L4 = 500 nm, respectively. The widths of the strips are set as W1 = 50 nm, W2 = 100 nm, and W4 = 50 nm, respectively. Considering the limit of the structural size in actual fabrication, the thicknesses of metal strips are set the same, which is D = 100 nm. The g1 = 30 nm, g2 = 60 nm, and g3 = 30 nm are gaps between strips 1 and 2, 2 and 3, and 2 and 4, respectively. (b) Simulated transmission spectrums corresponding to structures with different combinations of metal strips.

As illustrated by the black line in Fig. 2(b), a dual PIT can be observed in the transmission spectrum of the metamaterial with four strips, which can be attributed to the destructive interference between two bright modes and one dark mode. Concretely, since the electric field of the incident light is polarized along the y direction, the resonances on strips 1 and 4 are strongly coupled to the incident light, while the resonances on strips 2 and 3 cannot be excited by the incident light and jointly act as the dark mode. To clarify the coupling mechanism between the different modes, structures with individual strips removed are also investigated, as shown in Fig. 2(b). It can be found that the resonances on strips 2 and 3 cannot be excited directly by the incident light (green line), but they can be excited through strong coupling with a bright mode, giving rise to a narrow transmission peak (blue and red lines). In other words, two single PITs can be obtained by the structure without strip 1 and the structure without strip 4, respectively. These interferences introduce significant variations into the transmission spectrums, which makes traditional inverse design methods difficult to apply; our hybrid scheme is expected to address this problem. Besides, the dual PIT phenomenon can be explained and analyzed by employing a three-level system and the Lorentzian model. A detailed theoretical analysis is presented in Appendix A.

3. Dataset extension and characteristics encoding

3.1 Forward ANN and dataset extension

In our scheme, a forward ANN model for transmission prediction is first constructed for dataset extension. The 'SELU' activation function [23] and the 'Adam' optimizer are used for this ANN. After trying different network structures, we establish a fully connected neural network with a structure of [7-200-200-200-400-400-800-800-1000]. The learning rate is set to 0.0001, and the Adam decay rates are set as β1 = 0.9 and β2 = 0.999. To verify the effectiveness of the ANNs, the MSE is selected to evaluate the performance of the networks. The MSE can be calculated as [24]:

$$MSE(y,\ y') = \frac{1}{n}\sum_{i = 0}^{n - 1} (y_i - y_i')^2,$$
where n is the number of samples, y and y' represent the predicted sample and the simulated one, respectively, and yi and yi' are their corresponding discrete features.

A dataset of 10,000 instances is collected by repeated FDTD simulations for training the forward ANN model. The structural parameters of every instance are generated randomly from the parameter space without repetition. The variation ranges of the structural parameters are set as follows: L1 [300 nm - 400 nm], L4 [400 nm - 500 nm], L2 [200 nm - 280 nm], W1 [10 nm - 100 nm], W4 [10 nm - 100 nm], W2 [10 nm - 100 nm], and D [10 nm - 200 nm], each with a step of 10 nm, which provides far more design options than are needed for the designs. All the instances are divided into two parts randomly: 80 percent are used for training the model and the remaining 20 percent are used for testing. After 2000 iterations of training, the MSEs of the training set and test set reduce to 0.00005 and 0.00016, respectively. Two instances selected randomly from the training and test sets are shown in Fig. 3; the simulated transmission spectrums and those predicted by the ANN agree well. After that, we obtain an additional 50,000 instances by applying this trained model to non-repetitive sets of structural parameters for dataset extension.
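
For concreteness, the forward ANN described above can be sketched in Keras as follows. The layer widths, SELU activation, Adam settings, and MSE loss follow the text; the framework choice, output activation, batch size, and other unstated details are assumptions rather than the authors' exact implementation.

```python
import tensorflow as tf

def build_forward_ann():
    """Fully connected forward model: 7 structural parameters -> 1000-point spectrum."""
    widths = (200, 200, 200, 400, 400, 800, 800)
    model = tf.keras.Sequential(
        [tf.keras.Input(shape=(7,))]
        + [tf.keras.layers.Dense(n, activation='selu') for n in widths]
        + [tf.keras.layers.Dense(1000)]   # output layer; its activation is not specified in the text
    )
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4, beta_1=0.9, beta_2=0.999),
        loss='mse')
    return model

# forward_ann = build_forward_ann()
# forward_ann.fit(x_train, y_train, epochs=2000, validation_data=(x_test, y_test))
```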

Fig. 3. Simulated transmission spectrum and ANN-predicted transmission spectrum of two instances chosen randomly from (a) the training set and (b) the test set.

3.2 Characteristics encoding model

After the dataset extension, the ANN for characteristics encoding is constructed and trained by the extended dataset. As shown in Fig. 4(a), the complete model consists of an encoder with a network structure of [1000-1600-1200-800-400-d] and a decoder with a network structure of [d-400-800-1200-1600-1000], where d denotes the dimensionality of the code in the encoder-decoder model. The 'Sigmoid' activation [25] is chosen for the last layers of both the encoder and the decoder, and the 'ReLU' activation [26] is chosen for all the other layers. The other hyperparameter settings are the same as for the forward ANN. Since the dimensionality of the bottleneck is the critical element that determines the performance of an encoder-decoder model [27], different values of d are tried for the encoder-decoder model we build. As can be seen in Fig. 4(b), the MSEs of the trained models almost converge once d is larger than 5. Considering both the reconstruction accuracy of the model and the dimensionality reduction of the code sequences, d is set to 5.
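
A minimal Keras sketch of this encoder-decoder model is given below, assuming the layer widths, activations, and bottleneck size d = 5 stated above; initializers and other unstated settings are assumptions.

```python
import tensorflow as tf

D_CODE = 5  # bottleneck dimensionality chosen in the text

def dense_stack(widths, activation='relu'):
    return [tf.keras.layers.Dense(n, activation=activation) for n in widths]

# Encoder [1000-1600-1200-800-400-d] and decoder [d-400-800-1200-1600-1000];
# sigmoid on the last layer of each half, ReLU elsewhere.
encoder = tf.keras.Sequential(
    [tf.keras.Input(shape=(1000,))]
    + dense_stack((1600, 1200, 800, 400))
    + [tf.keras.layers.Dense(D_CODE, activation='sigmoid')])

decoder = tf.keras.Sequential(
    [tf.keras.Input(shape=(D_CODE,))]
    + dense_stack((400, 800, 1200, 1600))
    + [tf.keras.layers.Dense(1000, activation='sigmoid')])

autoencoder = tf.keras.Sequential([encoder, decoder])
autoencoder.compile(optimizer=tf.keras.optimizers.Adam(1e-4), loss='mse')
# autoencoder.fit(spectra, spectra, epochs=200)   # spectra: (n_samples, 1000) array
```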

Fig. 4. (a) Diagram of the ANN-based encoder-decoder model. (b) MSEs of the training and test dataset of the encoder-decoder model as a function of the dimensionality of the code. Input and output transmission spectrums of two randomly selected instances from (c) the training set and (d) the test set.

After 200 epochs of training, the MSEs of the training and test sets reduce to 0.9582 × 10−4 and 1.1909 × 10−4, respectively. We randomly select two instances from the training and test sets and compare the transmission spectrums constructed from their input and output data, as depicted in Fig. 4(c) and 4(d), respectively. Clearly, the trained encoder-decoder model can encode the 1000-dimensional transmittance data into a 5-dimensional code and recover the original transmittances from the code sequence. Once this model is established, the encoder can be used independently for all discretized transmission data of this structure.

4. Inverse design and discussion

After obtaining the required dataset and encoder model, we cluster the processed dataset. To demonstrate the effectiveness of our proposed hybrid scheme, different clustering algorithms are combined with the ANNs as mentioned in section 2.1. Two typical fundamental clustering algorithms based on different principles are employed: agglomerative hierarchical clustering with average linkage [28] (AHC) and mini-batch K-means [29] (MKm). When performing clustering, it is critical to determine the number of clusters. We choose the silhouette coefficient [30], which has been widely used in clustering research, to determine the number of clusters required for the planar PIT metamaterial dataset. The silhouette coefficient compares, for each sample, its average distance to the samples in its own cluster with its average distance to the samples in the nearest neighboring cluster, and it can be calculated as:

$$silhouette\ coefficient = \frac{1}{n}\sum_{i = 1}^{n} \frac{b(i) - a(i)}{\max\{a(i), b(i)\}}$$
where a(i) is the average distance between sample i and all the other samples in its own cluster, and b(i) is the average distance between sample i and all the samples in the cluster closest to the one it belongs to. A larger silhouette coefficient means the differences between clusters are more significant. It can be seen from Fig. 5(a) that, for the planar PIT metamaterial dataset, the clustering result with N = 3 clusters has the highest silhouette coefficient, which means it has the best clustering performance. After clustering, the physical responses of the instances in each cluster share some underlying patterns. A detailed discussion is presented in Appendix B.
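
The cluster-number selection described above can be reproduced with scikit-learn, as in the sketch below. The `codes` array of encoded spectra is assumed to come from the trained encoder, and the candidate range of cluster numbers is an assumption.

```python
from sklearn.cluster import MiniBatchKMeans, AgglomerativeClustering
from sklearn.metrics import silhouette_score

def silhouette_vs_clusters(codes, candidates=range(2, 8)):
    """Silhouette coefficients of MKm and AHC (average linkage) for several cluster numbers.

    codes : (n_samples, 5) array of encoded spectra from the trained encoder
    """
    scores = {}
    for n in candidates:
        mkm_labels = MiniBatchKMeans(n_clusters=n, random_state=0).fit_predict(codes)
        ahc_labels = AgglomerativeClustering(n_clusters=n, linkage='average').fit_predict(codes)
        scores[n] = {'MKm': silhouette_score(codes, mkm_labels),
                     'AHC': silhouette_score(codes, ahc_labels)}
    return scores

# The cluster number with the highest silhouette coefficient (N = 3 here) is kept,
# and its labels are used to split the dataset for the independent inverse ANNs.
```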

Fig. 5. (a) Silhouette coefficients of the clusters obtained from the AHC and MKm with different cluster numbers. (b) The flowchart of three different inverse design procedures. (c) The universal architecture of a single ANN in inverse design used in this article. Comparison of performance evaluators: (d) MSE, (e) R2, (f) Sp of the inverse design ANNs in three procedures.

The datasets in the 3 clusters are then fed into their corresponding ANN models, as Procedure 3 in Fig. 5(b). To better illustrate the advantages of our inverse design scheme, we carry out two other inverse design processes for comparison, shown as Procedures 1 and 2 in Fig. 5(b). In detail, in Procedure 1, the ANN model is trained and tested on the entire database directly. The training set is randomly selected from the instances with a ratio of 0.8, and the remaining instances are used as the test set. In Procedure 2, the entire dataset is clustered into three groups based on clustering the discretized transmissions of the instances. In Procedure 3, the entire dataset is clustered into three groups based on the corresponding 5-dimensional codes obtained from the trained encoder model described in section 3.2. In Procedures 2 and 3, three independent ANN models are trained and tested on the corresponding clustered datasets.

For consistency of the comparison, the network structures and other settings are kept the same. Notably, although the trainable weights of the ANNs in the hybrid scheme are three times those of Procedure 1, the training time and computing resources are basically the same, since the total amount of training data is the same as for the ANN in Procedure 1. After attempting different network structures, activations, and hyperparameters, we fix the same ANN architecture and settings, as shown in Fig. 5(c). It is worth mentioning that, since our scheme provides different models for different cluster groups, even better performance could be obtained if the network settings were optimized for each group separately. The 'SELU' activation and 'Adam' optimizer are chosen for this ANN. The learning rate is set to 0.0001, and the Adam decay rates are set as β1 = 0.9 and β2 = 0.999. To reduce overfitting, which is a common problem for ANNs, we introduce dropout layers into our inverse design models. With dropout, each neuron stops working with a certain probability p (set to 0.1 in our models) during forward propagation. This makes the model generalize better because it cannot depend heavily on any local feature [31].
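
A sketch of one such inverse-design ANN with dropout is shown below. The 1000-dimensional input, 7-dimensional output, SELU activation, Adam settings, and dropout rate follow the text, but the hidden-layer widths are placeholders; the actual architecture is the one shown in Fig. 5(c).

```python
import tensorflow as tf

def build_inverse_ann(hidden=(800, 400, 200, 100), dropout_rate=0.1):
    """Inverse model: 1000-point spectrum -> 7 structural parameters (hidden widths assumed)."""
    layers = [tf.keras.Input(shape=(1000,))]
    for n in hidden:
        layers.append(tf.keras.layers.Dense(n, activation='selu'))
        layers.append(tf.keras.layers.Dropout(dropout_rate))   # silences 10% of neurons per pass
    layers.append(tf.keras.layers.Dense(7))                    # 7 structural parameters
    model = tf.keras.Sequential(layers)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4, beta_1=0.9, beta_2=0.999),
        loss='mse')
    return model

# One independent model per cluster in Procedure 3:
# inverse_anns = [build_inverse_ann() for _ in range(3)]
```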

The performances of these three procedures are shown in Fig. 5(d-f). Here, to evaluate the performance of our proposed scheme more comprehensively, three different evaluators are employed: MSE, R2, and Sp. Notably, appropriate metrics for the inverse design can also be obtained through metric learning [32]. Concretely, the R2 is defined as [24]:

$$R^2(y,\ y') = 1 - \frac{\sum_{i = 1}^{7} (y_i - y_i')^2}{\sum_{i = 1}^{7} (y_i - y'')^2},$$
where y'' is the average of all samples for a single feature, and the Sp is defined as:
$$S_p = \left(1 - \frac{\left|\sum_{i = 1}^{7} (y_i - y_i')\right|}{y_i'}\right) \times 100,$$

If the predicted features are completely consistent with the real ones, Sp is 100; a higher Sp means the trained model performs better. By comparing the blue bars and black bars in Fig. 5(d-f), it can be found that the clustering algorithms have a positive influence on the performance of the inverse design ANNs, i.e., lower MSE, higher R2, and higher Sp. The encoder model further improves the performance of the inverse design, as the red bars show. The detailed accuracy evaluators of the three inverse design processes are shown in Table 1, with the variances of the metrics given in parentheses. Concretely, by implementing our scheme, the MSE is decreased by 51% - 86%, the R2 is improved by 2.32% - 3.65%, and the Sp is improved by 3.12% - 4.77%. In addition, the three metrics above all focus on the structural parameters. Thus, we add a further evaluator that considers the designed transmission spectrum, named SDTW, as shown in Table 1. The SDTW represents the distance between the target spectrum and the designed spectrum, i.e., the spectrum of the metamaterial with the inversely designed structural parameters, and it is calculated by the dynamic time warping algorithm [33]. Here, the trained forward ANN from the data preparation step is used to generate the designed spectrum instead of FDTD simulation. In total, by applying the encoder-aided clustering algorithms to the dataset division, the accuracy of the inverse design ANNs can be effectively promoted, and the encoder model additionally reduces the running time of the clustering algorithms owing to the greatly reduced data dimension. Apparently, our inverse design scheme can not only make progress in the design of nanophotonic devices facing overfitting problems but also benefit work that needs to map data into a low-dimensional space. Moreover, the training results of Procedure 3 with other clustering algorithms are presented in Appendix C.
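
For reference, a plain dynamic-time-warping distance between two discretized spectrums can be computed as below. The absolute-difference cost and the absence of normalization are assumptions; the exact settings behind the SDTW values in Table 1 are not specified in the text.

```python
import numpy as np

def dtw_distance(s, t):
    """Dynamic-time-warping distance between two 1-D sequences (O(n*m) textbook version)."""
    n, m = len(s), len(t)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])                    # local mismatch cost
            D[i, j] = cost + min(D[i - 1, j],                  # insertion
                                 D[i, j - 1],                  # deletion
                                 D[i - 1, j - 1])              # match
    return D[n, m]

# sdtw_like = dtw_distance(target_spectrum, designed_spectrum)  # smaller is better
```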

Table 1. The performance evaluators of the inverse design ANNs in different procedures

To better illustrate how Procedure 3 reduces overfitting, we compare the performance of Procedure 1 and Procedure 3 with MKm on test datasets from the different clusters. All the MSEs are significantly reduced, as shown in Fig. 6(a), which shows that dividing the dataset into different types of data reduces the impact of overfitting. As is commonly known, the scale of the dataset generally influences the training effectiveness of ANNs. To assess the data efficiency of our models, we tried datasets of different scales, as shown in Fig. 6(b). For 100 epochs of training, increasing the dataset scale effectively improves the performance of the ANN in Procedure 1 until the dataset reaches 70,000 instances, beyond which further scaling brings no improvement. For Procedure 3, however, each clustered dataset is smaller, so this trend still holds. As shown by the orange bars in Fig. 6(a), when the entire database is enlarged to 100,000 sets, the MSEs of all the clusters are further reduced, which also demonstrates the advantage of Procedure 3.

Fig. 6. (a) Test MSEs of independent ANNs trained by the entire database, different clusters and scaled up clusters (100,000) when designing test datasets of different clusters. (b) Iterations of the training and test loss functions of the ANNs trained by different scales of database.

Concretely, two typical sets of instances designed with the directly trained ANN (Procedure 1) and with our scheme (Procedure 3) are depicted in Fig. 7. For the target spectrum in Fig. 7(a), both Procedure 1 and Procedure 3 yield similar spectrums, but the result of Procedure 3 is more accurate (MSE1 = 0.0148, MSE3 = 0.0008). For the target spectrum in Fig. 7(b), an approximate transmission spectrum is obtained with Procedure 3, while the result of Procedure 1 differs significantly, with an additional obvious dip and peak appearing in the transmission spectrum (MSE1 = 0.0255, MSE3 = 0.0027). Apparently, Procedure 3 improves the performance in both cases, whether the physical response is sensitive to deviations of the structural parameters or not.

Fig. 7. (a-b) Target transmission spectrum and simulated transmission spectrums corresponding to the structural parameters designed by Procedure 1 and 3, respectively.

Moreover, we also employ Procedures 1 and 3 to design for a target transmission that does not exist in the simulation database. The target spectrum is set as a combination of two constant segments and two Gaussian curves, as shown by the red dashed line in Fig. 8, and it is generated as:

$$\textrm{target} = \begin{cases} 0.946 & \lambda < 1.1\ \mathrm{\mu m}\ \textrm{or}\ \lambda \ge 1.9\ \mathrm{\mu m} \\ 1 - \dfrac{1}{\sqrt{2\pi}} e^{-\frac{(\lambda - 1.3)^2}{0.02}} & 1.1\ \mathrm{\mu m} \le \lambda < 1.5\ \mathrm{\mu m} \\ 1 - \dfrac{2}{\sqrt{2\pi}} e^{-\frac{(\lambda - 1.7)^2}{0.02}} & 1.5\ \mathrm{\mu m} \le \lambda < 1.9\ \mathrm{\mu m} \end{cases}$$
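
The piecewise target above can be generated numerically as follows; the wavelength grid (range and number of points) is an assumption chosen to cover the band studied in this work.

```python
import numpy as np

wl = np.linspace(0.9, 2.0, 1000)          # wavelength grid in micrometers (assumed)
target = np.full_like(wl, 0.946)          # constant baseline outside the two dips

band1 = (wl >= 1.1) & (wl < 1.5)
band2 = (wl >= 1.5) & (wl < 1.9)
target[band1] = 1 - (1 / np.sqrt(2 * np.pi)) * np.exp(-(wl[band1] - 1.3) ** 2 / 0.02)
target[band2] = 1 - (2 / np.sqrt(2 * np.pi)) * np.exp(-(wl[band2] - 1.7) ** 2 / 0.02)
# 'target' is the red dashed curve of Fig. 8, ready to be fed to the inverse design ANNs.
```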

Fig. 8. Target spectrum generated by combining Gaussian distributions and the design results applying the inverse design ANNs in Procedures 1 and 3.

The transmission spectrums of the structures designed by Procedures 1 and 3 are also shown in Fig. 8. Procedure 3 provides a more accurate design than Procedure 1. Although the transmission spectrum designed by Procedure 3 is not completely consistent with the target, the valleys at the required wavelengths are obtained with the required proportions. Thus, our proposed Procedure 3 is also more effective than Procedure 1 when dealing with spectrums that are not physically attainable by our planar metal metamaterials.

5. Conclusion

In this paper, we propose a hybrid inverse design scheme applying ANNs assisted by clustering algorithms and an encoder model to resolve the performance imbalance caused by overfitting in ANN-based nanophotonic inverse design. To verify the effectiveness of this scheme, we design typical planar metal metamaterials that exhibit tunable dual PITs in their transmission spectrums. The results reveal that, compared with traditional ANN-based inverse design schemes, our proposed hybrid scheme offers an efficient performance improvement in the design of structural parameters from the corresponding transmission spectrums, especially for complex design targets. Concretely, by employing the encoder-aided MKm clustering algorithm in the preprocessing of the dataset, the loss functions of the inverse design ANNs are reduced by 86% and 61% for the training and test datasets, respectively.

Appendix A: Principle of dual PIT phenomenon and Lorentz model

In order to analyze the dual PIT phenomena occurring in the transmission spectrum of the planar metal metamaterials described in section 2.2, the Lorentzian oscillator model [34] is employed. The dual PIT can be explained by a three-level system [35], as shown in Fig. 9(a). The bright modes on strip 1 and strip 4 (labeled in Fig. 2(a)) are expressed as |B1> and |B2>, respectively, and the dark mode is expressed as |D>. Two types of pathways exist in this system, |0> - |B1,2> and |0> - |B1,2> - |D> - |B1,2>, and their destructive interference results in the dual PIT, as shown in Fig. 8. The following equation can be obtained [34]:

$$\begin{bmatrix} \omega - \omega_{\mathrm{B},1} + i\gamma_{\mathrm{B},1} & \kappa_1 & 0 \\ \kappa_1 & \omega - \omega_{\mathrm{D}} + i\gamma_{\mathrm{D}} & \kappa_2 \\ 0 & \kappa_2 & \omega - \omega_{\mathrm{B},2} + i\gamma_{\mathrm{B},2} \end{bmatrix} \times \begin{bmatrix} \widetilde{B_1} \\ \widetilde{D} \\ \widetilde{B_2} \end{bmatrix} = -\begin{bmatrix} -\alpha_1 \widetilde{E_0} \\ 0 \\ -\alpha_2 \widetilde{E_0} \end{bmatrix}$$
where ωB,1 and ωB,2 are the resonance frequencies of the bright modes, ωD is the resonance frequency of the dark mode, γB,1, γB,2, and γD are the damping factors of the corresponding modes, κ1 and κ2 are the coupling strengths between the bright modes |B1>, |B2> and the dark mode |D>, and α1 and α2 are the coupling strengths between the two bright modes and the incident light, respectively. We define the following variables, which depend only on the corresponding resonance modes:
$$\begin{cases} \gamma_1 = \omega - \omega_{\mathrm{B},1} + i\gamma_{\mathrm{B},1} \\ \gamma_2 = \omega - \omega_{\mathrm{D}} + i\gamma_{\mathrm{D}} \\ \gamma_3 = \omega - \omega_{\mathrm{B},2} + i\gamma_{\mathrm{B},2} \end{cases}$$

Fig. 9. (a) Three-level diagram of the coupling system for the dual PIT in the transmission spectrum of the planar metal metamaterial. (b) Simulated transmission spectrum of the planar PIT metamaterial and the corresponding fitting curve from the Lorentz model. (c) Magnetic field distribution on the x-y plane at the wavelength of transmission dip A. (d) Magnetic field distribution on the x-y plane at the wavelength of transmission dip B. (e) Magnetic field distribution on the x-y plane at the wavelength of transmission dip C.

Then, by solving the coupled equations above, we obtain the transmittance of the structure expressed as:

$$T = 1 - \left|\frac{\widetilde{B_1}}{\widetilde{E_0}}\right|^2 - \left|\frac{\widetilde{B_2}}{\widetilde{E_0}}\right|^2$$
where the amplitudes of the bright modes B1 and B2 are:
$$\widetilde{B_1} = \frac{\kappa_1 \widetilde{E_0}\,(\kappa_1\alpha_1/\gamma_1 + \kappa_2\alpha_2/\gamma_3)}{\gamma_1\left(\dfrac{\kappa_1^2}{\gamma_1} + \dfrac{\kappa_2^2}{\gamma_3} - \gamma_2\right)} - \frac{\alpha_1\widetilde{E_0}}{\gamma_1},\quad \widetilde{B_2} = \frac{\kappa_2 \widetilde{E_0}\,(\kappa_1\alpha_1/\gamma_1 + \kappa_2\alpha_2/\gamma_3)}{\gamma_3\left(\dfrac{\kappa_1^2}{\gamma_1} + \dfrac{\kappa_2^2}{\gamma_3} - \gamma_2\right)} - \frac{\alpha_2\widetilde{E_0}}{\gamma_3}$$

As can be found in Fig. 9(b), the curve obtained from the Lorentzian oscillator model fits the simulated transmission spectrum well in the wavelength range from 900 nm to 2000 nm, where the dual PIT occurs. The parameters in (10) are fitted as κ1 = 23.45, κ2 = 42.27, α1 = 62.62, α2 = 33.5, γB,1 = 3.349 THz, γB,2 = 7.026 THz, γD = 63.9 THz, ωB,1 = 207.6 THz, ωD = 241.9 THz, and ωB,2 = 259 THz. The magnetic field distributions at the wavelengths of the three dips are shown in Fig. 9(c-e). The field distributions of dips A and C show that the resonances on strips 1 and 4 can be excited directly by the incident light, acting as bright modes, while the resonances on strips 2 and 3 can only be excited by coupling with the two bright modes, as shown in the magnetic field distribution of dip B.
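
With these fitted values, the analytical transmittance defined by the expressions above can be evaluated directly, as in the sketch below. Treating ω as the optical frequency c/λ expressed in THz is an assumption made for illustration.

```python
import numpy as np

# Fitted parameters of the three-level Lorentzian model quoted above (frequencies in THz)
k1, k2 = 23.45, 42.27               # coupling strengths between bright and dark modes
a1, a2 = 62.62, 33.5                # coupling strengths between bright modes and incident light
gB1, gB2, gD = 3.349, 7.026, 63.9   # damping factors
wB1, wD, wB2 = 207.6, 241.9, 259.0  # resonance frequencies

def transmittance(omega):
    """T(omega) from the transmittance and amplitude expressions above."""
    g1 = omega - wB1 + 1j * gB1
    g2 = omega - wD + 1j * gD
    g3 = omega - wB2 + 1j * gB2
    common = (k1 * a1 / g1 + k2 * a2 / g3) / (k1**2 / g1 + k2**2 / g3 - g2)
    B1 = k1 * common / g1 - a1 / g1      # B1~ / E0~
    B2 = k2 * common / g3 - a2 / g3      # B2~ / E0~
    return 1 - abs(B1)**2 - abs(B2)**2

wavelengths_um = np.linspace(0.9, 2.0, 500)
omegas_thz = 299.792458 / wavelengths_um           # c / lambda in THz
T = np.array([transmittance(w) for w in omegas_thz])
```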

Appendix B: Distribution of the codes in different clusters and corresponding transmission spectrums of typical individuals

We analyze the relationship between the ranges of the structural parameters and the clusters and try to identify the role of each structural parameter in the different clusters. The distributions of the structural parameters in the different clusters are presented in Fig. 10. Different clusters show obvious distribution characteristics for some structural parameters. For example, most individuals in Cluster 1 have a small L4, while most individuals in Clusters 2 and 3 have a large L4, which corresponds to the excitation of the bright mode at dip C shown in Fig. 9. The individuals in Cluster 3 generally have a small D, which tends to excite all the resonant modes.

Fig. 10. Distribution of structural parameters in different clusters clustered by MKm. Every circle in a row represents an individual with a corresponding value of the structural parameter in the column.

To analyze the characteristics of the different clusters, we also examine the codes of the 3 clusters obtained by MKm clustering, as shown in Fig. 11. Notably, this kind of analysis has been used intensively in inverse design to relate structural parameters to physical responses [36,37]. The transmission spectrums corresponding to the three centroid individuals are shown in Fig. 11. It can be found that structures near the centroids of the different clusters yield single PITs with lower peaks, single PITs with higher peaks, and dual PIT spectral shapes for clusters 1, 2, and 3, respectively. We also investigated the individuals located at the midpoints between the centroids; their spectral shapes can be approximately regarded as transitional forms between the centroids.

Fig. 11. Distribution of the codes of instances in different groups clustered by MKm and the transmission spectrums corresponding to the instances closest to the centroid of each cluster and to the middle points between different centroids. Here, black dots, red dots, and blue dots represent the codes of clusters 1, 2, and 3, respectively. The white enlarged labels are the individuals closest to the centroids of the clusters.

Appendix C: Performance of Procedure 3 merged with different clustering algorithms

As described in section 4, different kinds of clustering algorithms can be merged into the hybrid inverse design scheme proposed in this article. In addition to the AHC and MKm used in the main content, three more complex clustering algorithms are merged into the hybrid scheme: agglomerative hierarchical clustering using Ward linkage [38] (Ward), balanced iterative reducing and clustering using hierarchies [39] (BIRCH), and spectral clustering [40] (SpC). The performance evaluators of the inverse design ANNs in Procedure 3 merged with the different clustering algorithms are shown in Table 2. It can be seen that the ANNs assisted by the different clustering methods all obtain better performance than the bare ANNs for our planar PIT metamaterials. Notably, the performances of these clustering-assisted ANNs are not identical, because each clustering algorithm is suited to different kinds of data. Concretely, MKm is the least time-consuming, while SpC is the most time-consuming and achieves the best performance. Thus, according to the complexity of the data and the acceptable time cost, an appropriate clustering algorithm can be selected in our hybrid inverse design scheme to obtain good performance. Moreover, the variational autoencoder, which can perform encoding and clustering at the same time, could be an option for further research.
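
Since the clustering step only produces labels used to split the dataset, the back-end is interchangeable; the sketch below shows how the three additional scikit-learn clusterers could be swapped in, under the same assumptions as before (a `codes` array from the trained encoder and N = 3 clusters).

```python
from sklearn.cluster import AgglomerativeClustering, Birch, SpectralClustering

# Drop-in alternatives for the clustering step of Procedure 3; only the labels change,
# the encoder and the downstream inverse ANNs stay the same.
clusterers = {
    'Ward':  AgglomerativeClustering(n_clusters=3, linkage='ward'),
    'BIRCH': Birch(n_clusters=3),
    'SpC':   SpectralClustering(n_clusters=3, random_state=0),
}
# labels = {name: c.fit_predict(codes) for name, c in clusterers.items()}
```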

Table 2. The evaluators of bare ANNs and ANNs assisted by different clustering algorithms and the time costs of different clustering algorithms on a server with 56 threads [Intel Xeon CPU E5-2680 v4 @ 2.40 GHz]

Funding

National Natural Science Foundation of China (62171055, 61821001, 62135009, 61705015, 61625104, 61971065); State Key Laboratory of Information Photonics and Optical Communications (IPOC2020ZT08, IPOC2020ZT03).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. S. Molesky, Z. Lin, A. Y. Piggott, et al., “Inverse design in nanophotonics,” Nat. Photonics 12(11), 659–670 (2018). [CrossRef]  

2. T. Zhang, J. Wang, Q. Liu, et al., “Efficient spectrum prediction and inverse design for plasmonic waveguide systems based on artificial neural networks,” Photonics Res. 7(3), 368–380 (2019). [CrossRef]  

3. F. Ferranti, “Feature-Based Inverse Modeling of Nanophotonic Devices,” in 2023 Photonics & Electromagnetics Research Symposium (PIERS), (IEEE, 2023), pp. 1526–1531.

4. J. Peurifoy, Y. Shen, L. Jing, et al., “Nanophotonic particle simulation and inverse design using artificial neural networks,” Sci. Adv. 4(6), eaar4206 (2018). [CrossRef]  

5. Y. A. Yilmaz, A. M. Alpkilic, A. Yeltik, et al., “Inverse design of efficient and compact 1× N wavelength demultiplexer,” Opt. Commun. 454, 124522 (2020). [CrossRef]  

6. F. Ferranti, “Feature-based machine learning for the efficient design of nanophotonic structures,” Photonics and Nanostructures-Fundamentals and Applications 52, 101077 (2022). [CrossRef]  

7. D. Liu, Y. Tan, E. Khoram, et al., “Training deep neural networks for the inverse design of nanophotonic structures,” ACS Photonics 5(4), 1365–1369 (2018). [CrossRef]  

8. W. Y. Li, X. Zhai, X. J. Shang, et al., “Multi-spectral plasmon induced transparency based on three-dimensional metamaterials,” Opt. Mater. Express 7(12), 4269–4276 (2017). [CrossRef]  

9. J. Zhou, S. Yu, T. Zhang, et al., “Theoretical analysis of multiple plasmon-induced absorption effects in plasmonic waveguides side-coupled with resonators structure and its applications,” in Fifth Symposium on Novel Optoelectronic Detection Technology and Application, (International Society for Optics and Photonics, 2019), pp. 1102341.

10. C. Qiu, X. Wu, Z. Luo, et al., “Nanophotonic inverse design with deep neural networks based on knowledge transfer using imbalanced datasets,” Opt. Express 29(18), 28406–28415 (2021). [CrossRef]  

11. Z. Zhu and N. Liu, “Early warning of financial risk based on K-means clustering algorithm,” Complexity 2021, 1–12 (2021). [CrossRef]  

12. F. Ferstl, M. Kanzler, M. Rautenhaus, et al., “Time-hierarchical clustering and visualization of weather forecast ensembles,” IEEE Trans. Visual. Comput. Graphics 23(1), 831–840 (2017). [CrossRef]  

13. Y. Li, M. Deng, Z. Liu, et al., “Inverse Design of Unidirectional Transmission Nanostructures Based on Unsupervised Machine Learning,” Adv. Opt. Mater. 10, 2200127 (2022). [CrossRef]  

14. W. Ma, F. Cheng, Y. Xu, et al., “Probabilistic representation and inverse design of metamaterials based on a deep generative model with semi-supervised learning strategy,” Adv. Mater. 31(35), 1901111 (2019). [CrossRef]  

15. W. Ma and Y. Liu, “A data-efficient self-supervised deep learning model for design and characterization of nanophotonic structures,” Sci. China Phys. Mech. Astron. 63(8), 284212 (2020). [CrossRef]  

16. C. Barth and C. Becker, “Machine learning classification for field distributions of photonic modes,” Commun. Phys. 1(1), 58 (2018). [CrossRef]  

17. R. Unni, K. Yao, and Y. Zheng, “A Deep Convolutional Mixture Density Network for Inverse Design of Layered Photonic Structures,” ACS Photonics 7(10), 2703–2712 (2020). [CrossRef]  

18. M. Zandehshahvar, Y. Kiarashi, M. Chen, et al., “Inverse design of photonic nanostructures using dimensionality reduction: reducing the computational complexity,” Opt. Lett. 46(11), 2634–2637 (2021). [CrossRef]  

19. W. Kong, J. Chen, Z. Huang, et al., “Bidirectional cascaded deep neural networks with a pretrained autoencoder for dielectric metasurfaces,” Photonics Res. 9(8), 1607–1615 (2021). [CrossRef]  

20. J. Ma, Y. Huang, M. Pu, et al., “Inverse Design of Broadband Metasurface Absorber Based on Convolutional Autoencoder Network and Inverse Design Network,” J. Phys. D: Appl. Phys. 53(46), 464002 (2020). [CrossRef]  

21. N. J. Dinsdale, P. R. Wiecha, M. Delaney, et al., “Deep learning enabled design of complex transmission matrices for universal optical components,” ACS Photonics 8(1), 283–295 (2021). [CrossRef]  

22. M. A. Ordal, R. J. Bell, R. W. Alexander, et al., “Optical properties of fourteen metals in the infrared and far infrared: Al, Co, Cu, Au, Fe, Pb, Mo, Ni, Pd, Pt, Ag, Ti, V, and W,” Appl. Opt. 24(24), 4493–4499 (1985). [CrossRef]  

23. G. Klambauer, T. Unterthiner, A. Mayr, et al., “Self-normalizing neural networks,” in Advances in Neural Information Processing Systems, (2017), pp. 971–980.

24. D. Melati, M. K. Dezfouli, Y. Grinberg, et al., “Design of Compact and Efficient Silicon Photonic Micro Antennas With Perfectly Vertical Emission,” IEEE J. Select. Topics Quantum Electron. 27(1), 1–10 (2021). [CrossRef]  

25. G. Cybenko, “Approximation by superpositions of a sigmoidal function,” Math. Control. Signals, Syst. 2(4), 303–314 (1989). [CrossRef]  

26. V. Nair and G. E. Hinton, “Rectified linear units improve restricted boltzmann machines,” in Proceedings of the 27th international conference on machine learning (ICML-10), (2010), pp. 807–814.

27. D. Bank, N. Koenigstein, and R. Giryes, “Autoencoders,” arXiv, arXiv:2003.05991 (2020). [CrossRef]  

28. W. H. Day and H. Edelsbrunner, “Efficient algorithms for agglomerative hierarchical clustering methods,” J. Classif. 1(1), 7–24 (1984). [CrossRef]  

29. J. MacQueen, “Some methods for classification and analysis of multivariate observations,” in Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics, Fifth Berkeley Symposium on Mathematical Statistics and Probability (University of California Press, 1967), pp. 281–297.

30. P. J. Rousseeuw, “Silhouettes: a graphical aid to the interpretation and validation of cluster analysis,” J. Comput. Appl. Math. 20, 53–65 (1987). [CrossRef]  

31. G. E. Hinton, N. Srivastava, A. Krizhevsky, et al., “Improving neural networks by preventing co-adaptation of feature detectors,” arXiv, arXiv:1207.0580 (2012). [CrossRef]  

32. M. Zandehshahvar, Y. Kiarashi, M. Zhu, et al., “Metric learning: harnessing the power of machine learning in nanophotonics,” ACS Photonics 10(4), 900–909 (2023). [CrossRef]  

33. M. Müller, Dynamic time warping, Information retrieval for music and motion (2007), pp. 69–84.

34. A. Artar, A. A. Yanik, and H. Altug, “Multispectral plasmon induced transparency in coupled meta-atoms,” Nano Lett. 11(4), 1685–1689 (2011). [CrossRef]  

35. S. Zhang, D. A. Genov, Y. Wang, et al., “Plasmon-induced transparency in metamaterials,” Phys. Rev. Lett. 101(4), 047401 (2008). [CrossRef]  

36. M. Zandehshahvar, Y. Kiarashinejad, M. Zhu, et al., “Manifold learning for knowledge discovery and intelligent inverse design of photonic nanostructures: breaking the geometric complexity,” ACS Photonics 9(2), 714–721 (2022). [CrossRef]  

37. Y. Kiarashinejad, M. Zandehshahvar, S. Abdollahramezani, et al., “Knowledge discovery in nanophotonics using geometric deep learning,” Adv. Intell. Syst. 2(2), 1900132 (2020). [CrossRef]  

38. J. H. Ward Jr, “Hierarchical grouping to optimize an objective function,” J. Am. Stat. Assoc. 58(301), 236–244 (1963). [CrossRef]  

39. T. Zhang, R. Ramakrishnan, and M. Livny, “BIRCH: an efficient data clustering method for very large databases,” Sigmod Rec. 25(2), 103–114 (1996). [CrossRef]  

40. M. Fiedler, “Algebraic connectivity of graphs,” Czech. Math. J. 23(2), 298–305 (1973). [CrossRef]  
