
Deep learning-based modeling of photonic crystal nanocavities


Abstract

A deep learning (DL)-based approach is proposed to accurately model the relationship between the design parameters and the $Q$ factor of photonic crystal (PC) nanocavities. A convolutional neural network (CNN), which consists of two convolutional layers and three fully-connected layers, is trained on a large-scale dataset of 12,500 nanocavities. The experimental results show that the CNN achieves state-of-the-art performance in terms of prediction accuracy (up to 99.9999%) and convergence speed (an orders-of-magnitude speedup). The proposed approach overcomes the shortcomings of existing methods and paves the way for DL-based, on-demand, data-driven optimization of PC nanocavities applicable to the rapid design of nanoscale lasers and photonic integrated circuits. We will open-source the database and code as one of our main contributions to the photonics research community.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Photonic crystal (PC) nanocavities have been the subject of much scientific endeavor in the semiconductor and photonics community [1–6] and remain of high research interest among research groups around the globe. PCs are important because they offer a way to control the propagation of light in materials with minimal energy loss; nanocavities are often created by introducing defects into the periodic lattice of air holes or vertical rods in the PC slab. Examples of such structures include the L3 cavity (three missing collinear holes), the L5 cavity (five missing collinear holes), and the H0 cavity (shifted center holes). Applications of PC nanocavities range from nanoscale lasers [7], LEDs [8], and optical fibers [9] to waveguides [10], Mach-Zehnder switches [11], and solar cells [12]. Over the years, much effort has been devoted to designing PC nanocavities with high $Q$ factors [13–17], because a high $Q$ factor means that photons are trapped inside the structure for a longer time, which increases the light-matter interaction time. Before optimization, however, one must first characterize the relationship between the design parameters and the $Q$ factor. Conventionally, this characterization has been done manually in simulation software with a trial-and-error approach, which is both computationally costly and time-consuming. Data-driven approaches, on the other hand, model PC nanocavities in a far more economical and time-efficient way and have been propelled by the rapid advance of artificial intelligence (AI) and data science.

At the core of AI lies machine learning, whose most prominent branch in recent years has been deep learning (DL), owing to its data-driven nature. DL is mainly realized by deep neural networks (DNNs), which find application in an increasing number of areas such as face recognition [18], autonomous driving [19], and medical imaging [20]. So far, various efforts utilizing DL and DNNs to characterize the relationship between design parameters and optical properties of photonic structures have been demonstrated [13,21–25]. For instance, a single-layer convolutional neural network (CNN) followed by two fully-connected (FC) layers was trained on a dataset of 1000 samples to characterize a heterostructure 2D PC nanocavity [13]. The obtained regression gave rise to a fairly high $Q$ factor after optimization with the gradient descent method [13]. However, that algorithm did not consider the effect of changes in air hole radius on the $Q$ factor, resulting in a smaller design parameter space. Moreover, a dataset of 1000 samples is generally considered too small for training a CNN, and as a result the reported prediction error was as high as 16% [13]. Another attempt modeled an ultrasmall H0 PC nanocavity using simple FC layers trained with 300 data points [21]. That work demonstrated the effectiveness of FC layers for characterizing PC nanocavities with a small design parameter space and a small dataset; however, such an approach often fails on larger-scale problems. Next, a DL framework mapping the design space of 1D photonic crystals to optical properties was developed using FC layers trained on 50,000 data points [22]. The obtained minimum test loss of 0.0081 [22] could have been reduced further had a CNN been used, since the dataset is fairly large. Therefore, there is still much room for improvement in DL-based regression characterization of PC nanocavities in high-dimensional design parameter spaces and with large-scale datasets.

In this report, we propose a state-of-the-art regression modeling paradigm for 2D PC nanocavities that takes advantage of (1) DL's diverse repertoire of frameworks for learning highly complex and nonlinear regression mappings in a data-driven fashion, and (2) the CNN's unique ability to efficiently recognize intricate patterns and extract useful information from image-like data. We first generate 12,500 different L3 PC nanocavities by randomly shifting the locations and radii of 54 air holes in the slab and calculating their $Q$ factors in Lumerical FDTD; each nanocavity then becomes a data sample. We then design and train a tailored two-layer CNN followed by three FC layers to learn the complex relationship between the design parameters (i.e., the locations and radii of the holes) and the $Q$ factor. Our DNN-enabled regression modeling paradigm is trained on this large-scale dataset (12,500 samples) and maps a high-dimensional design parameter space (54 air holes, 162 parameters) to a scalar $Q$ factor. The trained DNN predicts $Q$ with an accuracy as high as 99.9999%, beating previously published works, and learns a regression mapping with a record-high correlation coefficient surpassing 99%. When tested on a new validation set, the DNN model still achieved a prediction accuracy of up to 99.9993%, showing strong generalization capacity. The proposed regression modeling approach paves the way for DL-based on-demand optimization of PC nanocavities applicable to the rapid prototyping of integrated photonic devices.

2. L3 PC nanocavity

2.1 Base structure

We use a 2D L3 PC nanocavity as the base structure for data generation. The indium phosphide (InP) base structure chosen (see Fig. 1) has the following physical parameters: refractive index $n = 3.4$, lattice constant $a = 320$ nm, air hole radius $r = 89.6$ nm, and the air holes at the left and right ends of the L3 defect shifted horizontally by dx$_{\textrm {shift}} = 0.15a$. This shift of the two edge holes has been shown to increase the $Q$ factor by an order of magnitude [3]. Figure 1(a) shows the electric field distribution of the fundamental resonance mode; the structure has a $Q$ factor of $4.24\times 10^{5}$ and a modal volume (V$_{\textrm {cav}}$, where "cav" stands for cavity) of 0.97 $(\frac {\lambda }{n})^{3}$, i.e., effective cubic wavelengths, both calculated in FDTD. The area inside the box in Fig. 1(b) indicates where the DL-based regression learning takes place, as explained in detail in the next subsection, Data Collection.
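
For concreteness, the base-structure parameters can be collected in a few lines; this is a minimal illustrative snippet (the variable names are ours, not part of any simulation script):

```python
# Base-structure parameters of the L3 nanocavity (values from the text).
n_index = 3.4          # refractive index of InP
a = 320e-9             # lattice constant, 320 nm
r = 89.6e-9            # air hole radius, 89.6 nm (= 0.28 * a)
dx_shift = 0.15 * a    # horizontal shift of the two edge holes = 48 nm
```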


Fig. 1. L3 PC nanocavity used as a basis for data generation and regression learning. (a) Electric field ($E_{y}$) profile exhibiting the fundamental resonance mode. (b) Top view of the InP base structure showing its lattice constant $a$, air hole radius $r$, refractive index $n$, and $Q$ factor and modal volume V$_{\textrm {cav}}$ both calculated in FDTD. The red box encircling 54 holes forms a 5x12 array and indicates the area we apply DL regression to. See Dataset 1 for underlying air hole coordinate values [26].


2.2 Data collection

The boxed area in Fig. 1(b) is where our algorithm takes effect; it contains a total of 54 air holes (8 in the center row, 12 each in the rows immediately above and below the center row, and 11 each in the rows above and below those). We chose this area because it is where the electric field profile is strongest and therefore has the largest impact on the $Q$ factor. These 54 air holes have their design parameters, i.e., their $(x, y)$ locations and radii $r$, randomly shifted as in Fig. 2 according to a Gaussian distribution with mean = 0 nm and standard deviation = $a$/640 = 0.5 nm (standard deviations larger than 0.5 nm cause $Q$ to quickly drop and stay below $2\times 10^{5}$ and are thus undesirable). Randomly sampling training data from Gaussian distributions is a way to maximize the generalization capacity of the model so that unseen samples can later be properly predicted by the DNN [27]. We create a total of 12,500 such randomly shifted L3 structures, calculate their $Q$ factors in FDTD, and this becomes our dataset for the DL task. Figure 2(a)-(b) shows two examples of randomly generated nanocavities and their corresponding electric field profiles, with their $Q$ factors and V$_{\textrm {cav}}$'s marked. Figure 2(c)-(d) shows the same structures with (red circles) and without (black circles) the random design parameter shifts. Figure 3 exhibits our complete training dataset of 12,500 samples: Fig. 3(a)-(c) shows histograms of the design parameter shifts of one of the 54 holes across all samples, whereas Fig. 3(d)-(e) shows the corresponding distributions of FDTD-calculated $Q$ factors and V$_{\textrm {cav}}$'s. We take the log of the $Q$ values to reduce the variance among the data. Lastly, to form tensors with aligned dimensions (i.e., 5x12) in PyTorch, we artificially fill in the 4 missing holes in the middle of the center row (i.e., at the cavity region) as well as 1 missing hole each at the leftmost spot of the uppermost row and the rightmost spot of the lowermost row of the boxed area. These 6 extra holes have zero displacement and radius-change values and simply serve as imaginary placeholders in our tensors; in other words, they have no effect on the calculations. We then encapsulate the collected design parameter data into a 4D tensor of dimension 12,500x3x5x12, where the second dimension corresponds to the three axes $(x, y, r)$ and the last two dimensions correspond to the 5x12 array formed by the 60 holes in the boxed region (schematically illustrated in Fig. 4). All data preprocessing is performed in MATLAB owing to its powerful matrix computing packages. The collected dataset is fed into the DNN layers to learn the mapping between design parameters and the $Q$ factor, as laid out in depth in the next section.
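
A minimal NumPy sketch of this tensor assembly follows (our actual preprocessing was done in MATLAB; the cavity placeholder slots and the log base are assumptions of this sketch, and the stand-in $Q$ values are random rather than FDTD outputs):

```python
import numpy as np

rng = np.random.default_rng(seed=0)
a = 320e-9
sigma = a / 640                      # 0.5 nm standard deviation (from the text)

N, ROWS, COLS = 12_500, 5, 12

# Gaussian (x, y, r) shifts for the 5x12 grid of holes, mean 0.
X = rng.normal(0.0, sigma, size=(N, 3, ROWS, COLS))

# Zero out the 6 imaginary placeholder holes: 4 at the cavity region of the
# center row (column indices assumed) plus the leftmost slot of the top row
# and the rightmost slot of the bottom row (as stated in the text).
mask = np.zeros((ROWS, COLS), dtype=bool)
mask[2, 4:8] = True                  # 4 missing cavity holes (assumed slots)
mask[0, 0] = mask[4, 11] = True      # end-of-row placeholders
X[:, :, mask] = 0.0

# Targets: log of the FDTD-computed Q factors to reduce variance among the
# data. In practice q_fdtd comes from the 12,500 Lumerical FDTD runs.
q_fdtd = rng.uniform(1e5, 5e5, size=N)   # random stand-in values
y = np.log10(q_fdtd)                     # log base is our assumption
```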


Fig. 2. Two data samples randomly generated from the base structure in Fig. 1 by shifting the air holes’ locations and radii inside the boxed area. Top panel: electric field ($E_{y}$) profiles of the fundamental resonance mode. Bottom panel: air hole configurations before (black circles) and after (red circles) random shifts. (a),(c) Sample 1; (b),(d) Sample 2. $Q$ factors and V$_{\textrm {cav}}$’s of both samples calculated in FDTD are shown. See Dataset 1 for an example of underlying hole shift values [26].



Fig. 3. Histograms of the dataset containing 12,500 samples, generated for training the DNN model. Top panel: design parameters of the L3 nanocavity including random (a) x displacements, (b) y displacements, and (c) radius variations, all following a Gaussian distribution with 0 mean and $a$/640 standard deviation. Bottom panel: the corresponding (d) $Q$ factors and (e) V$_{\textrm {cav}}$’s both calculated in FDTD.



Fig. 4. Architecture of the DNN made up of 2 CNN layers and 3 FC layers to learn the relationship between design parameters (input) and the $Q$ factor (output) of the L3 nanocavity. Input is an Nx3x5x12 tensor where N represents batch size and output is a scalar value. Hyperparameters of the DNN, such as kernels and number of neurons (nodes), are labeled on the schematic.


3. Deep learning with DNNs

3.1 DNN architecture

CNNs are most effective for deep learning tasks that involve recognizing and learning patterns in images and videos; in recent years they have demonstrated their prowess in face recognition [28], object detection [29], and autonomous driving [30]. The CNN's unique advantage stems from its ability to divide the input feature into many subregions and repeatedly apply convolutions, pooling, dropout, and a series of other techniques to downsample the features and learn from them in parallel, layer by layer [31]. Although the L3 structure does not constitute an image in the strict sense, we can treat it as one: the air holes resemble the pixels of an image, while the shifts in $x, y$, and $r$ (i.e., the design parameters) of each air hole can be seen as the RGB channels of each pixel. With that in mind, a DNN model consisting of two CNN layers followed by three FC layers is constructed, as shown in Fig. 4, to learn the highly complex and nonlinear regression relationship. The input to the CNN is the 3-channel 12,500x3x5x12 tensor, which is fed into convolution layer 1 (Conv1) with 20 kernels of size 3x3, followed by average pooling and batch normalization (BN). Similarly, convolution layer 2 (Conv2) has 40 kernels of size 3x3 and is again followed by average pooling and BN. The output from Conv2 is fed into FC1, which has 240 nodes, after which a ReLU activation function modulates FC1's output. The modulated values are sent into FC2 with 120 nodes, which in turn is followed by FC3 made up of 50 nodes; FC3 eventually outputs a single value to predict the $Q$ factor. In our model, average pooling reduces the computational cost by summarizing the features contained in a feature map, BN makes the DNN's computations faster and more stable by normalizing the input features, and the ReLU function ($f(x) = \max(0, x)$) drops values that are less than zero [31]. All of the DNN's structural settings stated herein are fine-tuned to achieve the best training results in terms of prediction error, training loss, and convergence rate.
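
The architecture can be sketched in PyTorch as below; the padding, pooling windows, and final scalar projection are assumptions of this sketch, chosen so that the stated layer widths fit the 3x5x12 input:

```python
import torch
import torch.nn as nn

class QFactorCNN(nn.Module):
    """Sketch of the 2-conv + 3-FC DNN of Fig. 4. Padding, pooling, and the
    scalar output head are assumed, not taken from the authors' code."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 20, kernel_size=3, padding=1),   # Conv1: 20 kernels, 3x3
            nn.AvgPool2d(2),                              # average pooling
            nn.BatchNorm2d(20),                           # batch normalization
            nn.Conv2d(20, 40, kernel_size=3, padding=1),  # Conv2: 40 kernels, 3x3
            nn.AvgPool2d(2),
            nn.BatchNorm2d(40),
        )
        # Two 2x2 poolings shrink the 5x12 map to 1x3, so 40 * 1 * 3 = 120.
        self.fc1 = nn.Linear(40 * 1 * 3, 240)   # FC1: 240 nodes
        self.fc2 = nn.Linear(240, 120)          # FC2: 120 nodes
        self.fc3 = nn.Linear(120, 50)           # FC3: 50 nodes
        self.out = nn.Linear(50, 1)             # scalar Q prediction (assumed head)

    def forward(self, x):
        x = self.features(x).flatten(1)
        x = torch.relu(self.fc1(x))             # ReLU modulates FC1's output
        x = self.fc3(self.fc2(x))
        return self.out(x).squeeze(1)
```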

3.2 Training and test setup

The complete workflow of the DL regression modeling of our L3 nanocavity is outlined in Fig. 5 and explained here. The collected dataset is randomly divided into a training set and a test set with an 8:2 ratio (i.e., 10,000 training samples and 2500 test samples). The splitting is done randomly to ensure that all kinds of features are evenly distributed, maximizing the generalization capability of the DNN [31]. We then train the DNN on the training set, with the design parameter shifts as input and the $Q$ factors as output (illustrated in Fig. 4), while simultaneously evaluating the training results on the test set. The optimal hyperparameter settings of the DNN are determined to be the following: number of epochs = 300 (with shuffling enabled), training batch size = 64, test batch size = 100, and stochastic gradient descent (SGD) for weight optimization with learning rate = 0.01 and momentum = 0.5. Shuffling is done at each epoch to increase the randomness of the training data for better generalization [31]; momentum is used in the SGD optimizer to avoid gradient plateaus and non-optimal solutions [31]. The loss, also known as the cost, is calculated by the mean squared error (MSE) [31], as seen in Eq. (1), and evaluates the average squared difference between the $Q_{\textrm {NN}}$ predicted by the DNN and the target $Q_{\textrm {FDTD}}$ over all $N$ samples. These hyperparameters are optimized to achieve an accurate regression mapping featuring fast convergence and minimal losses. All the DL code is written in Python with the popular PyTorch library; a minimal sketch of the training setup is given after Eq. (1).

$$\textrm{MSE} = \frac{1}{N}\sum_{i=1}^{N} \left(Q_{\textrm{NN},i} - Q_{\textrm{FDTD},i}\right)^{2}$$
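
The following sketch wires together the hyperparameters stated above, reusing the QFactorCNN and the X, y arrays sketched earlier (practical details such as per-feature normalization are omitted):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

X_t = torch.as_tensor(X, dtype=torch.float32)   # (12500, 3, 5, 12) shifts
y_t = torch.as_tensor(y, dtype=torch.float32)   # log-Q targets

# 8:2 random split into 10,000 training and 2,500 test samples.
train_set, test_set = random_split(TensorDataset(X_t, y_t), [10_000, 2_500])
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)  # reshuffled each epoch
test_loader = DataLoader(test_set, batch_size=100)

model = QFactorCNN()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
loss_fn = torch.nn.MSELoss()                    # Eq. (1)

for epoch in range(300):
    model.train()
    for xb, yb in train_loader:
        optimizer.zero_grad()
        loss_fn(model(xb), yb).backward()
        optimizer.step()
    model.eval()
    with torch.no_grad():                       # per-epoch test-set MSE
        test_mse = sum(loss_fn(model(xb), yb).item() * len(yb)
                       for xb, yb in test_loader) / len(test_set)
```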


Fig. 5. The workflow of DL regression modeling of the L3 nanocavity from the initial data generation to the final validation step.


4. Results and discussion

To evaluate the quality of the trained model, beyond the MSE of Eq. (1) we introduce two further performance metrics relevant to our purposes: the absolute prediction error $\epsilon _{abs}$ and the Pearson correlation coefficient $R$. The former measures the absolute difference between the DNN-predicted $Q_{\textrm {NN}}$ and the target $Q_{\textrm {FDTD}}$, normalized by the target (see Eq. (2)), and evaluates how good a prediction the trained model makes. The latter measures how $Q_{\textrm {NN}}$ varies with $Q_{\textrm {FDTD}}$, i.e., their degree of association or similarity [32] (see Eq. (3)); $R$ is defined as the covariance of the two arguments divided by the product of their standard deviations.

$$\epsilon_{abs} = \frac{|Q_{\textrm{NN}} - Q_{\textrm{FDTD}}|}{Q_{\textrm{FDTD}}}\times 100\%$$
$$R = \frac{cov(Q_{\textrm{NN}}, Q_{\textrm{FDTD}})}{\sigma(Q_{\textrm{NN}})\sigma(Q_{\textrm{FDTD}})}$$
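
Both metrics are straightforward to compute; below is a minimal NumPy sketch of Eqs. (2) and (3) (the function names are ours, and for $R$ the result is equivalent to NumPy's corrcoef):

```python
import numpy as np

def abs_error_pct(q_nn, q_fdtd):
    """Absolute prediction error of Eq. (2), per sample, in percent."""
    return np.abs(q_nn - q_fdtd) / q_fdtd * 100.0

def pearson_r(q_nn, q_fdtd):
    """Pearson correlation coefficient of Eq. (3)."""
    cov = np.cov(q_nn, q_fdtd)[0, 1]             # sample covariance
    return cov / (np.std(q_nn, ddof=1) * np.std(q_fdtd, ddof=1))
```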

Figure 6(a)-(b) shows the convergence of the training and test processes, visualized by plotting the prediction error $\epsilon _{abs}$ and the MSE against the number of epochs, respectively. As seen in Fig. 6(a), both the training and test $\epsilon _{abs}$ decay rapidly within the first 50 epochs and converge to mean prediction errors of 0.00014% and 0.01%, respectively, within a total of 300 epochs. The latter is over 1 order of magnitude lower than the previously reported value of 0.6% [23]. The lowest test $\epsilon _{abs}$ achieved by our model is 0.000147%, over 4 orders of magnitude lower than the previously reported values of 16% [13] and 1% [24]. Also, our convergence rate (300 epochs) is over 2 orders of magnitude faster than the previously reported values of $10^{5}$ epochs [13] and $10^4$ epochs [24]. Figure 6(b) shows that both the training and test MSE decay rapidly within the first 25 epochs and converge to mean losses of 0.00022 and 0.00025, respectively, within 300 epochs. The lowest test MSE achieved by our model is 0.000247 at 300 epochs, over 1 order of magnitude lower than the previously reported values of 0.008 at 4000 epochs [24] and 0.0081 at 500 epochs [22]. These results mean that our DNN model, with a prediction accuracy of up to 99.9999% ($\approx$ 100% - 0.000147%), can predict $Q$ factors from a given set of design parameters better than previously proposed models.


Fig. 6. Visualizing the performance of the trained DNN model by: (a)-(b) convergence trends of the prediction error $\epsilon _{abs}$ and MSE loss, respectively. (c)-(d) correlation between DNN-predicted $Q_{\textrm {NN}}$ and target $Q_{\textrm {FDTD}}$ for training and test datasets, respectively. Diagonal line has slope = 1.


Figure 6(c)-(d) demonstrates the correlation between $Q_{\textrm {NN}}$ and $Q_{\textrm {FDTD}}$ for the training and test datasets during the regression learning phase. In Fig. 6(c), $Q_{\textrm {NN}}$ and $Q_{\textrm {FDTD}}$ during training are highly correlated, with all data points tightly centered around the line of slope 1; this is supported by the high correlation coefficient $R$ = 0.991 calculated by Eq. (3). Figure 6(d) shows a similar pattern for the test data, with a corresponding $R$ = 0.990. Both $R$ values are substantially higher than the previously reported values of 0.97 for training and 0.92 for test [13], and 0.976 for training (test value unavailable) [25]. These correlation results are evidence that the regression relationship learned by our DNN model is by far the most accurate reported and can be used for future DL-based optimization of L3 nanocavities to achieve higher $Q$ factors.

To validate the trained DNN model, we randomly generated 250 brand-new L3 nanocavities as a validation dataset, drawn from the same Gaussian distribution used before. We then tested the trained model on this validation set and computed the corresponding prediction error $\epsilon _{abs}$ distribution (see Fig. 7). The inset of Fig. 7 tabulates the minimum, average, and median of the distribution, and the histogram shows that the majority of $\epsilon _{abs}$ values are below 0.2%. The $\epsilon _{abs}$ values in Fig. 7 are expected to be larger than those obtained during training because the DNN model has not previously seen these data samples. Nonetheless, a median prediction error of 0.154% proves that our model has excellent generalization capacity, and the minimum prediction error of 0.000721% means that, for L3 nanocavities, our model can predict $Q$ factors from a given set of design parameters with an accuracy of up to 99.9993%.
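
A sketch of this validation step, where x_val and q_val are hypothetical arrays holding the 250 new design-parameter samples and their FDTD-computed (log-)Q targets:

```python
model.eval()
with torch.no_grad():
    q_pred = model(torch.as_tensor(x_val, dtype=torch.float32)).numpy()

errors = abs_error_pct(q_pred, q_val)            # Eq. (2), per sample
print(f"min = {errors.min():.6f}%, mean = {errors.mean():.3f}%, "
      f"median = {np.median(errors):.3f}%")      # Fig. 7 inset statistics
```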


Fig. 7. Distribution of prediction errors $\epsilon _{abs}$ of the trained model when tested on a validation dataset consisting of 250 new data samples. Inset: minimum, average and median of the $\epsilon _{abs}$ distribution.


Last but not least, our refined DNN model can be used as an efficient tool for designing not only L3 PC nanocavities but also other types of PCs. To fit the model to other PCs, one should use the same Gaussian distributions to randomly vary the design parameters when generating data samples. A separate dataset must, however, be generated and computed in FDTD for each PC structure, as every structure has its own specific set of physical and optical properties. Nonetheless, the same DNN model can easily be applied to these different scenarios with only minor tuning, as it has proven to have high generalization capacity. The broad applicability and versatility of our model signify its potential for driving the realization of photonic integrated circuits and the recently emerged all-optical neural networks [33,34].

5. Comparison with fully-connected layers

To better illustrate the state-of-the-art performance of our CNN model, we quantitatively compare it here to the more commonly used FC model (i.e., a multi-layer perceptron) using the same dataset (12,500 data points) and under the same training conditions. Training results of the three-layer FC model are shown in Fig. 8, which graphs the convergence trends of the prediction error $\epsilon _{abs}$ and the MSE loss as well as the correlation coefficients. The data points in Fig. 8(c)-(d) are visibly more scattered than those in Fig. 6(c)-(d), demonstrating that the FC model learns a less accurate regression mapping. The quantitative comparison between the CNN and FC models using the key metrics is tabulated in Table 1: the CNN outperforms the FC model in all aspects by an appreciable margin and is thus more suitable for modeling L3 PC nanocavities. The CNN's superior performance can be attributed to the fact that, compared to FC layers, it reduces model complexity and prevents overfitting through weight sharing. Our choice of a CNN over conventional FC layers is thus justified.
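
For reference, a hedged sketch of such a three-layer FC baseline on the flattened 3x5x12 input (the hidden-layer widths are our assumption, since the text does not state them):

```python
import torch.nn as nn

fc_model = nn.Sequential(
    nn.Flatten(),                    # 3 * 5 * 12 = 180 input features
    nn.Linear(180, 240), nn.ReLU(),  # assumed hidden widths
    nn.Linear(240, 120), nn.ReLU(),
    nn.Linear(120, 1),               # scalar Q prediction
)
```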


Fig. 8. Comparison between our DNN model based on a CNN and the more commonly used FC layers. Performance of the trained FC model visualized by: (a)-(b) convergence trends of the prediction error $\epsilon _{abs}$ and MSE loss, respectively; (c)-(d) correlation between DNN-predicted $Q_{\textrm {NN}}$ and target $Q_{\textrm {FDTD}}$ for training and test datasets, respectively. Diagonal line has slope = 1.



Table 1. Comparison between the CNN and FC models with the same dataset and under the same training conditions. Tabulated performance metrics include correlation coefficients, minimum test prediction error $\epsilon _{abs}$, and minimum MSE loss.

6. Conclusion

We have proposed an improved regression characterization method for 2D PC nanocavities based on DL methodology. A DNN composed of two CNN layers and three FC layers was successfully trained to map a high-dimensional design parameter space to the $Q$ factor using a large-scale dataset. The trained model has demonstrated an ability to predict $Q$ factors from given design parameters with a record-high fidelity of 99.9999%, considerably better than all previously published results, and its convergence speed is several orders of magnitude faster. When tested on a new validation set, the DNN model achieved prediction accuracies for $Q$ factors of up to 99.9993%, indicating strong generalization capacity. The proposed regression characterization scheme not only overcomes known disadvantages of existing schemes but also lays the foundation for DL-based, data-driven, on-demand optimization of PC nanocavities applicable to the rapid design of nanoscale lasers [35,36] and photonic integrated circuits [33,34].

Funding

National Natural Science Foundation of China (11474365); Shenzhen Key Laboratory Fund (ZDSYS201603311644527); Shenzhen Science and Technology Innovation Program (KQCX20140522143114399); Shenzhen Fundamental Research Program (JCYJ20150611092848134, JCYJ20150929170644623); Foundation of NANO X (18JG01).

Acknowledgments

We thank Mr. Kinley Lam and Mr. Jiachen Liu for their helpful discussions on parallel computing and for providing us access to cluster resources. We thank Mr. Qian Chen for his helpful discussions on neural networks and computer vision techniques. We thank Mr. Feng Yin for his helpful discussions on deep learning and sample collection.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are available in Dataset 1, Ref. [26]. Machine learning code and simulation models used in the production of this work are available from R.L. upon reasonable request at the current stage and will be made public in the future following the completion of this project.

References

1. W. Ma, Z. Liu, Z. A. Kudyshev, A. Boltasseva, W. Cai, and Y. Liu, “Deep learning for the design of photonic structures,” Nat. Photonics 15(2), 77–90 (2021). [CrossRef]  

2. J. Jiang, M. Chen, and J. A. Fan, “Deep neural networks for the evaluation and design of photonic devices,” Nat. Rev. Mater. pp. 1–22 (2020).

3. Y. Akahane, T. Asano, B.-S. Song, and S. Noda, “High-q photonic nanocavity in a two-dimensional photonic crystal,” Nature 425(6961), 944–947 (2003). [CrossRef]  

4. J. C. Knight, “Photonic crystal fibres,” Nature 424(6950), 847–851 (2003). [CrossRef]  

5. Y. Ota, F. Liu, R. Katsumi, K. Watanabe, K. Wakabayashi, Y. Arakawa, and S. Iwamoto, “Photonic crystal nanocavity based on a topological corner state,” Optica 6(6), 786–789 (2019). [CrossRef]  

6. X. Gan, Y. Gao, K. Fai Mak, X. Yao, R.-J. Shiue, A. Van Der Zande, M. E. Trusheim, F. Hatami, T. F. Heinz, and J. Hone, “Controlling the spontaneous emission rate of monolayer MoS2 in a photonic crystal nanocavity,” Appl. Phys. Lett. 103(18), 181119 (2013). [CrossRef]  

7. T. Zhou, M. Tang, G. Xiang, B. Xiang, S. Hark, M. Martin, T. Baron, S. Pan, J.-S. Park, and Z. Liu, “Continuous-wave quantum dot photonic crystal lasers grown on on-axis si (001),” Nat. Commun. 11(1), 977 (2020). [CrossRef]  

8. X. Liu, K. Mashooq, T. Szkopek, and Z. Mi, “Improving the efficiency of transverse magnetic polarized emission from algan based leds by using nanowire photonic crystal,” IEEE Photonics J. 10(4), 1–11 (2018). [CrossRef]  

9. H. Zhang, X. Zhang, H. Li, Y. Deng, X. Zhang, L. Xi, X. Tang, and W. Zhang, “A design strategy of the circular photonic crystal fiber supporting good quality orbital angular momentum mode transmission,” Opt. Commun. 397, 59–66 (2017). [CrossRef]  

10. S. Mahmoodian, K. Prindal-Nielsen, I. Söllner, S. Stobbe, and P. Lodahl, “Engineering chiral light–matter interaction in photonic crystal waveguides with slow light,” Opt. Mater. Express 7(1), 43–51 (2017). [CrossRef]  

11. J. R. Hendrickson, R. Soref, and R. Gibson, “Improved 2×2 Mach–Zehnder switching using coupled-resonator photonic-crystal nanobeams,” Opt. Lett. 43(2), 287–290 (2018). [CrossRef]  

12. S. Bhattacharya and S. John, “Designing high-efficiency thin silicon solar cells using parabolic-pore photonic crystals,” Phys. Rev. Appl. 9(4), 044009 (2018). [CrossRef]  

13. T. Asano and S. Noda, “Optimization of photonic crystal nanocavities based on deep learning,” Opt. Express 26(25), 32704–32717 (2018). [CrossRef]  

14. Y. Tanaka, T. Asano, and S. Noda, “Design of photonic crystal nanocavity with Q-factor of $10^{9}$,” J. Lightwave Technol. 26(11), 1532–1539 (2008). [CrossRef]  

15. Y. Lai, S. Pirotta, G. Urbinati, D. Gerace, M. Minkov, V. Savona, A. Badolato, and M. Galli, “Genetically designed l3 photonic crystal nanocavities with measured quality factor exceeding one million,” Appl. Phys. Lett. 104(24), 241101 (2014). [CrossRef]  

16. Y. Taguchi, Y. Takahashi, Y. Sato, T. Asano, and S. Noda, “Statistical studies of photonic heterostructure nanocavities with an average q factor of three million,” Opt. Express 19(12), 11916–11921 (2011). [CrossRef]  

17. E. Kuramochi, M. Notomi, S. Mitsugi, A. Shinya, T. Tanabe, and T. Watanabe, “Ultrahigh-Q photonic crystal nanocavities realized by the local width modulation of a line defect,” Appl. Phys. Lett. 88(4), 041112 (2006). [CrossRef]  

18. J. Yim, H. Jung, B. Yoo, C. Choi, D. Park, and J. Kim, “Rotating your face using multi-task deep neural network,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2015), pp. 676–684.

19. M. Zhang, Y. Zhang, L. Zhang, C. Liu, and S. Khurshid, “Deeproad: Gan-based metamorphic testing and input validation framework for autonomous driving systems,” in 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE), (IEEE, 2018), pp. 132–142.

20. H. Greenspan, B. Van Ginneken, and R. M. Summers, “Guest editorial deep learning in medical imaging: overview and future promise of an exciting new technique,” IEEE Trans. Med. Imaging 35(5), 1153–1159 (2016). [CrossRef]  

21. R. Abe, T. Takeda, R. Shiratori, S. Shirakawa, S. Saito, and T. Baba, “Optimization of an h0 photonic crystal nanocavity using machine learning,” Opt. Lett. 45(2), 319–322 (2020). [CrossRef]  

22. R. Singh, A. Agarwal, and B. W. Anthony, “Mapping the design space of photonic topological states via deep learning,” Opt. Express 28(19), 27893–27902 (2020). [CrossRef]  

23. T. Christensen, C. Loh, S. Picek, D. Jakobović, L. Jing, S. Fisher, V. Ceperic, J. D. Joannopoulos, and M. Soljačić, “Predictive and generative machine learning models for photonic crystals,” Nanophotonics 9(13), 4183–4192 (2020). [CrossRef]  

24. S. Chugh, S. Ghosh, A. Gulistan, and B. Rahman, “Machine learning regression approach to the nanophotonic waveguide analyses,” J. Lightwave Technol. 37(24), 6080–6089 (2019). [CrossRef]  

25. T. Asano and S. Noda, “Iterative optimization of photonic crystal nanocavity designs by using deep neural networks,” Nanophotonics 8(12), 2243–2256 (2019). [CrossRef]  

26. R. Li, “Data repository for supplementary materials,” figshare (2021), https://github.com/Arcadianlee/Deep-Learning-Based-Modeling-of-PC-Nanocavities.git.

27. C. Rasmussen and C. Williams, Gaussian Processes for Machine Learning (Massachusetts Institute of Technology, 2006).

28. G. Hu, Y. Yang, D. Yi, J. Kittler, W. Christmas, S. Z. Li, and T. Hospedales, “When face recognition meets with deep learning: an evaluation of convolutional neural networks for face recognition,” in Proceedings of the IEEE international conference on computer vision workshops, (2015), pp. 142–150.

29. S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: towards real-time object detection with region proposal networks,” IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2017). [CrossRef]  

30. P. Li, X. Chen, and S. Shen, “Stereo r-cnn based 3D object detection for autonomous driving,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2019), pp. 7644–7652.

31. I. Goodfellow, Y. Bengio, A. Courville, and Y. Bengio, Deep Learning, vol. 1 (MIT Press Cambridge, 2016).

32. J. Benesty, J. Chen, Y. Huang, and I. Cohen, “Pearson correlation coefficient,” in Noise Reduction in Speech Processing (Springer, 2009), pp. 1–4.

33. J. Feldmann, N. Youngblood, C. D. Wright, H. Bhaskaran, and W. Pernice, “All-optical spiking neurosynaptic networks with self-learning capabilities,” Nature 569(7755), 208–214 (2019). [CrossRef]  

34. B. J. Shastri, A. N. Tait, T. F. de Lima, W. H. Pernice, H. Bhaskaran, C. D. Wright, and P. R. Prucnal, “Photonics for artificial intelligence and neuromorphic computing,” Nat. Photonics 15(2), 102–114 (2021). [CrossRef]  

35. Y. Zeng, U. Chattopadhyay, B. Zhu, B. Qiang, J. Li, Y. Jin, L. Li, A. G. Davies, E. H. Linfield, and B. Zhang, “Electrically pumped topological laser with valley edge modes,” Nature 578(7794), 246–250 (2020). [CrossRef]  

36. Y. Li, J. Zhang, D. Huang, H. Sun, F. Fan, J. Feng, Z. Wang, and C.-Z. Ning, “Room-temperature continuous-wave lasing from monolayer molybdenum ditelluride integrated with a silicon nanobeam cavity,” Nat. Nanotechnol. 12(10), 987–992 (2017). [CrossRef]  

Supplementary Material (1)

Dataset 1: Base structure configuration and sample x and y displacements [26].

