
Automatic optimization of miniaturized bound states in the continuum cavity


Abstract

Bound states in the continuum (BICs) provide, what we believe to be, a novel and efficient way of trapping light. However, using BICs to confine light in a compact three-dimensional volume remains challenging, since the energy leakage at the lateral boundaries dominates the cavity loss when the footprint shrinks to a considerably small size, and sophisticated boundary designs therefore become inevitable. Conventional design methods fail to solve the lateral boundary problem because a large number of degrees of freedom (DOFs) are involved. Here, we propose a fully automatic optimization method to improve the lateral confinement of a miniaturized BIC cavity. Briefly, we combine a random parameter adjustment process with a convolutional neural network (CNN) to automatically predict the optimal boundary design in a parameter space containing many DOFs. As a result, the quality factor accounting for lateral leakage increases from 4.32 × 10⁴ in the baseline design to 6.32 × 10⁵ in the optimized design. This work confirms the effectiveness of using CNNs for photonic optimization and will motivate the development of compact optical cavities for on-chip lasers, OLEDs, and sensor arrays.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Trapping light into a compact volume has always been challenging due to the wave-propagating nature of light. Recently, it has been shown that a novel class of optical resonances, known as bound states in the continuum (BICs), provides an alternative way of trapping light compared with conventional optical cavities, in which outgoing light is blocked by mirrors [1–6]. BICs are localized wave functions whose energies are embedded in the radiation continuum but which do not couple to the radiation field. Since the first observation of a BIC in an optical system in 2011 [7], they have attracted much attention for their theoretically infinite lifetime, narrow bandwidth, and robustness. In principle, the radiative energy loss can be completely eliminated through perfect BICs [1,2,8], enabling numerous applications from lasers [9–12] and sensors [13–15] to nonlinear optical devices [16–20].

Although a general theory rules out ideal three-dimensional (3D) confinement of BICs [4,21], it is still possible to realize compact 3D quasi-BIC cavities with a finite footprint with the help of lateral light confinement. From an application point of view, quasi-BIC cavities trap light well enough to facilitate many novel photonic devices. Recently, we proposed a miniaturized (mini-) BIC cavity in which the out-of-plane radiation is suppressed by the BICs, while the side leakage is reduced by a photonic crystal (PhC) heterostructure in the transverse direction. As a result, we experimentally obtained a quality factor ($Q$) above $1\times 10^6$ in a footprint of only $20~\mu$m $\times$ $20~\mu$m [22].

Nevertheless, achieving high-quality lateral confinement for a mini-BIC remains challenging, especially when the footprint shrinks further. Although light leakage at the side boundaries is in principle forbidden by the photonic band gap (PBG) of the heterostructure, the mini-BIC mode inevitably tunnels through the boundary region and couples to the outer space, because the boundary region cannot be infinitely thick and the PBG cannot be infinitely deep in a real device. An intuitive way to suppress the energy leakage is to fine-tune the geometrical parameters so as to decrease the momentum mismatch between the cavity and boundary regions, thereby minimizing the outward leakage as much as possible. However, this raises a difficult cavity-design optimization problem, since a large number of degrees of freedom (DOFs) are involved; the problem is neither easy to solve with conventional numerical simulations nor amenable to a simple theoretical strategy for finding an optimal design.

For optimization problems in a large parameter space, machine learning (ML) algorithms have become new and effective approaches and have been successfully applied to the design of photonic devices. ML turns out to be efficient when the relationship between inputs and outputs is complex and intractable [23]. While traditional methods fail in these cases, an ML method can build models describing the relationship based on data analysis. Early efforts date back to the 1990s [24,25], when genetic algorithms were enlisted in optical design. With the rapid development of ML, deep learning (DL), whose foundations were laid in the 1980s [26], has emerged as an even more powerful tool for photonic design, especially when the design problem requires the analysis of huge amounts of data [27]. Artificial neural networks (ANNs) with millions of free parameters were then studied once the computational power became available. Today, a large number of optical devices are designed using DL. New training and regularization techniques for DL are still being developed [28–30], and successful cases have multiplied in recent years [31–37].

Among ANNs, convolutional neural networks (CNNs), a class of deep networks that use convolution operations to extract features of the inputs, have been found to be promising for processing spatially correlated data [38,39]. Recalling that optical patterns are always spatially correlated owing to the continuity of waves, CNNs are naturally suited to describing light fields, and a series of CNN-based algorithms have been adopted in optical design, ranging from inverse scattering problems [40], controlling light scattering [41], metasurface design [42,43], and the prediction of properties of novel optical materials [44,45], to the optimization of optical microcavities [34,46].

In this work, we propose an automatic optimization method to improve the lateral confinement of a mini-BIC cavity. Briefly, we choose the displacement of each PhC layer in the boundary region as a tunable parameter and train a CNN on a dataset of randomly generated cavities. We then use the trained network to predict an optimal cavity design. Verified by numerical simulation, the in-plane quality factor increases from $4.32\times 10^4$ in the original design to $6.32\times 10^5$ in the optimized design, which confirms the effectiveness of our method.

2. Theory and principle

2.1 Light confinement in the mini-BIC cavity

The goal of this work is to find an optimal mini-BIC cavity design when the number of PhC layers in the boundary region is pre-defined. The in-plane geometry of the BIC cavity is schematically illustrated in Fig. 1(a); the PhC is patterned into a silicon-on-insulator (SOI) slab with a silicon layer thickness of $600$ nm (see our previous work [22] for details). Two PhC regions, namely the cavity region A and a surrounding boundary region B, together form a PhC heterostructure. It is noteworthy that, compared to our previous design consisting of $37\times 37$ air holes [22], the cavity in this work shrinks further to $21\times 21$ air holes in total. We choose the number $N=21$ as a hard constraint since such a cavity footprint is small enough to facilitate many devices such as on-chip lasers, OLEDs, and sensor arrays. Our target is to make the $Q$ of the mini-BIC cavity as high as possible.


Fig. 1. Principles of the miniaturized BIC cavity. (a) Schematic of the cavity and its energy dissipation channels. The cavity consists of photonic crystal regions A and B, marked by white boxes. (b) Left: Momentum distribution of the four lowest-order finite-size modes, which are highly localized to points of a square lattice in reciprocal space with a spacing of $\pi /L$, where $L$ is the length of region A. Modes are labeled M$_{pq}$ according to their in-plane momentum quantum numbers. Right: Multiple BICs carrying integer topological charges appear on the bulk band TE-A in momentum space, composing a topological constellation. The constellation matches the discrete momentum of M$_{11}$ at the given geometrical parameters. (c) Band structures of regions A (blue) and B (orange) near the 2nd-order $\Gamma$ point; the target band TE-A of region A lies within the bandgap between the TE-A and TE-C/D bands of region B. (d) Log-scale trend of the radiative $Q$ (blue) and side-leakage $Q$ (orange) with increasing layer number in region B. The side-leakage $Q$ is always much lower than the radiative $Q$, indicating that the dominant loss is the in-plane leakage.


As shown in Fig. 1(a), there exist two main energy dissipation channels that lower the $Q$ of the cavity, i.e., the out-of-plane radiation and the in-plane side leakage. It has been proved that the radiation can be suppressed by arranging the BICs in momentum space [5,22,47]. In this work, we again focus on the BICs on the TE-A band near the 2nd-order $\Gamma$ point in region A. When the cavity region has a finite size, the continuous band of an infinitely periodic PhC turns into a set of discrete modes. These modes are labeled M$_{pq}$ according to their in-plane momenta in the first quadrant at ($p\pi /L$, $q\pi /L$), where $L$ is the length of region A. The momentum distributions of these modes are highly localized to a set of points that form a square lattice in momentum space with a spacing of $\pi /L$, as illustrated in the left panel of Fig. 1(b). By tuning the lattice constant $a$, the topological charges carried by the BICs evolve in momentum space. When the topological charges match the major momentum components of M$_{11}$, the radiation loss is significantly suppressed, as illustrated in the right panel of Fig. 1(b). We fix the size of region A to $N_a=5$ and find the lattice constant $a$ in region A that optimizes the mini-BIC cavity.

On the other hand, we use the boundary region B to handle the side leakage. The total number of PhC layers and the size of region A are fixed at $N=21$ and $N_a=5$, respectively, which leaves $N_b=8$ layers of region B in each lateral direction to surround region A, as shown in Fig. 1(a). The band structures of the infinite PhCs corresponding to regions A and B are illustrated in Fig. 1(c), showing that the target mode M$_{11}$ is laterally confined by the PBG of region B, because the TE-A band of region A is embedded in the PBG between the TE-A and TE-C/D bands of region B. Compared to our previous work [22], the total cavity footprint shrinks to $\sim 10~\mu$m $\times$ $10~\mu$m, which is only $1/4$ of our previous design. A discussion on further shrinking the size is given in the Appendix.

However, as the footprint shrinks, the side leakage becomes more severe. The increase in lateral losses can be attributed to two main factors. First, for a smaller region A, a larger fraction of the light reaches the boundary between regions A and B, making the optical mode more sensitive to the boundary. Second, for a thinner region B, the light can more easily tunnel through to the outer space, causing more energy leakage.

We quantitatively evaluate the in-plane losses from numerical simulations, characterized by a side-leakage $Q$ that only counts the energy dissipation in the lateral directions. Figure 1(d) records the radiative and side-leakage $Q$s as the number of layers in region B increases. Although the side-leakage $Q$ increases when region B gets thicker, it is always much lower than the radiative $Q$ regardless of the layer number. We therefore conclude that the side leakage dominates the energy loss of a mini-BIC cavity with a small footprint. It is noteworthy that, although increasing the layer number in region B will suppress the side leakage, it will also result in a larger total footprint. Instead, our goal is to optimize $Q$ within a given total footprint, which has much potential for compact on-chip systems.

To suppress the side leakage, a sophisticated design of the layer displacements in region B would be effective. However, such a design requires delicate parameter tuning, which is not an easy task. Intuitively, a gradually varying displacement that forms an adiabatic evolution of the geometric parameters can lower the mismatch between any two adjacent layers and thus benefit the lateral light confinement [48]. However, finding an optimal displacement function is still challenging, since many factors contribute to the side leakage together; for instance, the distance between adjacent layers shifts the local effective lattice constant, which modifies the local band structure and gives a different tunneling effect.

The fine-tuning of the displacement function thus raises a complicated optimization problem. With conventional design methods based on numerical simulation, such optimization is computationally heavy, since it is hard to predict candidate designs from an intuitive physical theory. For the mini-BIC cavity, even assuming the system preserves $C_{4v}$ symmetry, the displacements of the individual layers involve a large number of DOFs that form a huge parameter space. To solve the optimization problem efficiently, we propose an automatic method, which is described in detail in the next subsection.

2.2 Framework of the automatic optimization

In order to find an optimal design in the large parameter space, we propose a fully automatic optimization method. As elaborated above, the lateral leakage is dominant in a small-footprint mini-BIC cavity, so we focus only on the in-plane geometry in the optimization. Accordingly, we apply an effective refractive index $n_{\text {eff}}=3.25$ of silicon to represent the 600-nm-thick SOI slab and use a two-dimensional (2D) model for our investigation. With this $n_{\text {eff}}$, the band structure of the heterostructure is similar to that of the 3D model. Details of the 2D model are given in the Appendix. Such a simplification significantly reduces the cost of the numerical simulations. As a starting point (i.e., the base structure), the lattice constants $a$ in regions A and B are set to $513$ nm and $552$ nm, respectively, with a gap distance $g$ of $552$ nm between the two regions; the radii of all air holes are fixed at $175$ nm.

Figure 2(a) illustrates the tunable DOFs ($d_i$) in our design. We fix the design of region A to best capture the suppression of out-of-plane radiation and only include the design of region B in the optimization. Starting from the base structure mentioned above, we allow the displacements $d_i~(i=1,2,\ldots,8)$ of all eight layers (denoted Layers $1$-$8$) in region B to vary independently. We assume all the air holes in the same layer move simultaneously to preserve $C_{4v}$ symmetry, resulting in an “expanding" or “shrinking" of the layer.
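To make this parameterization concrete, the following is a minimal sketch (Python) of how the eight displacements map onto the boundary geometry. It assumes a simple radial-offset picture in which each boundary ring sits at its nominal distance from the cavity center plus its displacement $d_i$; the function name and the way the nominal positions are computed are illustrative assumptions, not the exact construction used in the simulations.

```python
import numpy as np

# Base geometry (nm) from Sec. 2.2
A_LATTICE = 513.0   # lattice constant in region A
B_LATTICE = 552.0   # lattice constant in region B
GAP = 552.0         # gap between regions A and B
N_A = 5             # region A contains 5 x 5 air holes
N_B = 8             # region B contains 8 boundary layers

def layer_positions(d):
    """Illustrative center-to-layer distance of each boundary ring.

    d : array of 8 signed displacements (nm); a negative value shrinks the
        ring toward the cavity center, a positive value expands it.  All
        four sides of a ring move together, so C4v symmetry is preserved.
    """
    d = np.asarray(d, dtype=float)
    edge_a = (N_A - 1) / 2 * A_LATTICE            # outermost row of region A
    nominal = edge_a + GAP + B_LATTICE * np.arange(N_B)
    return nominal + d

# The base structure corresponds to zero displacement of every layer
print(layer_positions(np.zeros(N_B)))
```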


Fig. 2. Framework of the automatic optimization. (a) Schematic of the eight layer displacements in the optimization. Region B is divided into 8 layers, and the displacement of each layer varies independently. The layers shrink or expand to preserve $C_{4v}$ symmetry. (b) Workflow of the optimization. Starting from the base structure, the displacements are first randomly adjusted: 1100 sets of displacements are created and the corresponding $Q$s are simulated. Using this dataset as training data, a neural network is trained. The trained network is then used to optimize the design, with the best design in the training data selected as the new starting point. Finally, the real $Q$ of the predicted optimal design is validated by simulations.


It is noteworthy that, although the evolution of the geometry is restricted to a certain pattern here for simplicity, the method can be directly extended to more DOFs for a more sophisticated design. Generally speaking, $C_{4v}$ symmetry is not a necessary condition for the optimization algorithm. Both the random parameter adjustment and the convolutional neural network can be applied to hexagonal or asymmetric lattices [49], as long as we tune a different set of parameters. Various parameters, including the air hole radius, can be included in the optimization. Also, the shift of the air holes can be in any direction, and all the geometric parameters of an anisotropic unit cell can be considered for a higher number of DOFs.

In this work, we aim to optimize a miniaturized BIC cavity on a square lattice. In-plane $C_2$ symmetry is required to ensure the existence of the BIC, and we choose the higher $C_{4v}$ symmetry to simplify the problem for a clearer verification of the algorithm. The BIC cavity optimization could be executed on a hexagonal lattice as well.

We summarize the workflow of the automatic optimization method, as illustrated in Fig. 2(b):

  • 1. Search for designs in the vicinity of the base structure through random parameter adjustment. This step creates 1100 sets of layer displacements and then calculates the corresponding $Q$s.
  • 2. Train a neural network to describe the relationship between the displacements and the cavity $Q$s. The dataset from step 1 is used for the training.
  • 3. Apply the trained network to optimize the structure. Optimization starts from the best design found in step 1.
  • 4. Validate the $Q$s of the optimal designs predicted in step 3 with numerical simulations.

3. Results

3.1 Random parameter adjustment

In general, there exists more than one local maximum in the parameter space, especially for a complicated optimization problem. To avoid the search being trapped around a single local maximum, a large region of the parameter space should be covered. Accordingly, our random search consists of a coarse adjustment with large parameter variations and a fine adjustment that is relatively local. The coarse adjustment is intended to cover a large enough parameter space, while the fine adjustment tunes the parameters in the vicinity of the best design output by the coarse search.

Specifically, we consider a displacement to be large when its absolute value exceeds a threshold of $|\delta |=39$ nm, which corresponds to a search range of $78$ nm in the lattice-constant difference. We randomly adjust the displacements $d_i$ of the eight layers following a normal distribution with mean $\mu =0$ and standard deviation $\sigma =20$ nm. Consequently, the probability that the absolute value of a displacement exceeds $39$ nm is around $0.0512$. In other words, by generating 200 structures, we expect the displacements in more than 10 cases to exceed the threshold $|\delta |$, indicating that such randomness covers a sufficiently large space in the coarse search.
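A minimal sketch of this coarse random adjustment is given below (Python/NumPy). The threshold check merely reproduces the 0.0512 probability quoted above; simulate_q is a placeholder for the FEM simulation, not a real function.

```python
import numpy as np

rng = np.random.default_rng(0)

SIGMA_COARSE = 20.0   # nm, standard deviation of the coarse adjustment
THRESHOLD = 39.0      # nm, |delta| above which a displacement counts as large
N_STRUCTURES = 200    # number of structures in the coarse search
N_LAYERS = 8          # eight boundary layers

# Draw 200 candidate designs, each a set of eight layer displacements
coarse_designs = rng.normal(0.0, SIGMA_COARSE, size=(N_STRUCTURES, N_LAYERS))

# Fraction of displacements exceeding the threshold (expected ~0.0512)
print("P(|d| > 39 nm) ~", np.mean(np.abs(coarse_designs) > THRESHOLD))

# Each design would then be evaluated with the FEM solver, e.g.
# q_coarse = np.array([simulate_q(d) for d in coarse_designs])
# best_coarse = coarse_designs[np.argmax(q_coarse)]
```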

After the coarse search, the $Q$s of the 200 generated structures are calculated using finite-element method (FEM) simulations (COMSOL Multiphysics). The simulated $Q_{\text {sim}}$ ranges from $2.22\times 10^2$ to $4.41\times 10^5$. The highest $Q$ is obtained with the displacements of the eight layers set to $-22.10$ nm, $-15.83$ nm, $-29.02$ nm, $-2.26$ nm, $-46.73$ nm, $-10.40$ nm, $-36.47$ nm, and $-26.20$ nm, from Layer 1 to Layer 8.

Next, we use the optimal output of the coarse search as the new starting point and run a second round of parameter search with a smaller fluctuation in $d_i$. The displacements are randomly adjusted following a normal distribution with $\mu =0$ and $\sigma =10$ nm, a smaller value than in the coarse search. This time, 1100 designs are generated, whose $Q_{\text {sim}}$ ranges from $2.17\times 10^2$ to $5.32\times 10^5$. The displacements of the design with the highest $Q$ are $-20.40$ nm, $-10.77$ nm, $-19.48$ nm, $-18.20$ nm, $-27.46$ nm, $-11.75$ nm, $-59.19$ nm, and $-42.61$ nm, respectively. There is no obvious trend in these two sets of optimized parameters, which indicates a complicated optimization landscape in which traditional methods are inefficient.
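Continuing the sketch above, the fine adjustment recenters the random search on the best coarse design with a narrower spread (the displacement values are taken from the text; simulate_q remains a placeholder):

```python
# Best design from the coarse search (nm), Layer 1 to Layer 8
best_coarse = np.array([-22.10, -15.83, -29.02, -2.26,
                        -46.73, -10.40, -36.47, -26.20])

SIGMA_FINE = 10.0     # nm, narrower spread than the coarse search
N_FINE = 1100         # number of designs in the fine adjustment

fine_designs = best_coarse + rng.normal(0.0, SIGMA_FINE, size=(N_FINE, N_LAYERS))

# These 1100 designs are simulated with FEM and later form the CNN dataset
# q_fine = np.array([simulate_q(d) for d in fine_designs])
```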

3.2 Training the convolutional neural network

Through random parameter adjustment, we obtain a design with a highest $Q$ of $5.32\times 10^5$. We find that the randomization method becomes less effective at pushing the $Q$ higher, even when we apply a much narrower random distribution. Instead, we propose to use a neural network for further optimization, since the network can capture the intrinsic relationship between $d_i$ and $Q$ and thus reveal promising optimization directions.

Among all types of neural networks, we choose a convolutional neural network (CNN) to model the dependence of the cavity $Q$ on $d_i$ and thus predict designs with higher $Q$s. In a PhC structure, the displacement of one layer strongly influences its adjacent layers owing to the continuity conditions of electromagnetic waves. Taking into account this strong spatial correlation of the layer displacements, we believe a CNN is a more promising candidate for this problem than neural networks without convolution operations.

The configuration of the network is illustrated in Fig. 3, and the detailed network building process is given in the Appendix. The input is a vector containing the displacements of the eight layers $d_i$ in sequence, with negative or positive values representing shrinking or expanding of the layer with respect to the cavity center. The first layer of the CNN is a convolutional layer with 10 filters of kernel size 5 and stride 1; one-element padding is added to each end of the input. The 60 outputs of the convolutional layer are then passed through fully connected layers of 60, 200, and 50 elements, connected via rectified linear units (ReLU). Finally, the 50 units of the last fully connected stage are fully connected to the output node, which contains a single element representing the $Q$ of the given design.
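A minimal PyTorch sketch of this architecture is shown below. The layer sizes follow the description above and Fig. 3 (10 filters, kernel size 5, stride 1, one-element padding, then 60-200-50-1 fully connected stages); the exact placement of the ReLU activations and the class name are our assumptions.

```python
import torch
import torch.nn as nn

class QPredictor(nn.Module):
    """CNN mapping eight layer displacements to a predicted log10(Q)."""

    def __init__(self):
        super().__init__()
        # Input shape: (batch, 1 channel, 8 displacements); with one-element
        # padding and kernel size 5 the output length is 6, i.e. 10 x 6 = 60.
        self.conv = nn.Conv1d(in_channels=1, out_channels=10,
                              kernel_size=5, stride=1, padding=1)
        self.fc = nn.Sequential(
            nn.Linear(60, 200),
            nn.ReLU(),
            nn.Linear(200, 50),
            nn.ReLU(),
            nn.Linear(50, 1),   # single output node: predicted log10(Q)
        )

    def forward(self, d):
        # d: (batch, 8) layer displacements in nm
        x = torch.relu(self.conv(d.unsqueeze(1)))   # -> (batch, 10, 6)
        return self.fc(x.flatten(1)).squeeze(-1)    # -> (batch,)
```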


Fig. 3. Configuration of the convolutional neural network, which is composed of one input layer of layer displacements $d_i$, one convolution layer with ten filters, two fully connected layers, and one output node for the predicted $Q$ factor $Q_{\text {pre}}$. The kernel size of the filters in the convolution layer is set to 5, and one-element padding is added to the head and end of the input. The output of the convolution layer, which contains $6\times 10$ elements, is fully connected to 200 elements in the next layer, then fully connected to 50 elements, and finally to the $1\times 1$ output node.


To train the network, it is necessary to generate a dataset large enough to cover the structures under consideration together with their corresponding $Q$s. We use the randomly generated displacements and their $Q$s from the fine-tuning random adjustment as the dataset, which contains 1100 different designs in total. We adopt 1000 designs to train the network, while the remaining 100 designs are used as the testing set. Given that the input only involves eight elements (namely 8 DOFs), we believe a data size of 1000 is large enough for training the CNN.

The distribution of the simulated $Q$s in the training dataset is illustrated in Fig. 4(a), showing $Q$s ranging from $2.17\times 10^2$ to $5.32\times 10^5$. Notice that the $Q$ distribution is quite asymmetric, with much more data at the low-$Q$ end; we therefore use $\log_{10}(Q)$ instead of $Q$ to make the distribution more symmetric [34,46]. Figure 4(b) shows the distribution of $\log_{10}(Q)$, with two peaks in the high-$Q$ and low-$Q$ regions, respectively. A finer-resolution analysis of the highest bar in Fig. 4(a) is shown in the inset, where a peak between 300 and 2000 is observed, corresponding to the low-$Q$ peak between 2.5 and 3.3 in Fig. 4(b). It is noteworthy that, although the low-$Q$ peak seems irrelevant to the optimization, removing such data from the training dataset leads to worse training results. We argue that the reason may lie in the fact that the low-$Q$ data provide information about parameters that are unfavorable for high $Q$s, which prevents the network from collapsing onto extremely low $Q$s.
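As a small sketch of the dataset preparation (assuming the 1100 displacement sets and their simulated $Q$s are already stored as arrays; the file names are placeholders), the targets are converted to $\log_{10}(Q)$ before the 1000/100 split:

```python
import numpy as np
import torch

# Placeholder file names for the 1100 random designs and their simulated Qs
displacements = np.load("displacements.npy")   # shape (1100, 8), in nm
q_sim = np.load("q_sim.npy")                   # shape (1100,)

# Train on log10(Q) so the strongly skewed Q distribution becomes more symmetric
x = torch.tensor(displacements, dtype=torch.float32)
y = torch.tensor(np.log10(q_sim), dtype=torch.float32)

# 1000 designs for training, the remaining 100 for testing
x_train, y_train = x[:1000], y[:1000]
x_test, y_test = x[1000:], y[1000:]
```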


Fig. 4. Histogram distribution of the training data with a size of 1000. (a) Distribution of the simulated $Q_{\text {sim}}$. Inset: high-resolution distribution of the highest bar, implying a peak between 300 and 2000. (b) Histogram distribution of $\log_{10}(Q_{\text {sim}})$ calculated from the same data.


The CNN is trained using the error back-propagation method [26], which enables high-speed calculation of the gradients needed to minimize the difference between the actual $Q$ obtained from the simulations ($Q_{\text {sim}}$) and the $Q$ predicted by the network ($Q_{\text {pre}}$). The loss function for back-propagation is the mean square error (MSE) between the logarithms of $Q_{\text {sim}}$ and $Q_{\text {pre}}$:

$$L_1 = \left| \log_{10}Q_{\text{sim}}- \log_{10}Q_{\text{pre}}\right|^2$$

During training, the weights of the network are gradually and automatically adjusted to reduce $L_1$ with a stochastic gradient method. To evaluate the accuracy of the prediction, the standard deviation of the output $\log_{10}Q_{\text {pre}}$ from the true answer $\log_{10}Q_{\text {sim}}$ is calculated for both the testing data ($\sigma _{\text {test}}$) and the training data ($\sigma _{\text {train}}$). The prediction error $E$ is then defined as follows, implying that 68.27$\%$ of the $Q_{\text {pre}}$ values are statistically distributed within $(1\pm E)\times Q_{\text {sim}}$ [34,46]:

$$E_{\text{test(train)}} = \frac{10^{\sigma_{\text{test(train)}}}-10^{-\sigma_{\text{test(train)}}}}{2}$$
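A short sketch of how $\sigma$ and $E$ could be evaluated from the trained network and the tensors prepared above (the network is assumed to output $\log_{10}Q_{\text{pre}}$ directly, as in the earlier sketch):

```python
import torch

def prediction_error(model, x, y):
    """Return sigma and E for displacements x and log10(Q_sim) targets y."""
    model.eval()
    with torch.no_grad():
        log_q_pre = model(x)                     # predicted log10(Q)
    sigma = torch.sqrt(torch.mean((log_q_pre - y) ** 2)).item()
    e = (10.0 ** sigma - 10.0 ** (-sigma)) / 2.0
    return sigma, e

# Example: sigma_test, e_test = prediction_error(model, x_test, y_test)
```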

We plot the prediction errors for the training dataset (blue curve) and testing dataset (orange curve) during the network training process, as shown in Fig. 5(a). After 100 training iterations, the loss function $L_1$ converges, and both prediction errors drop below 3$\%$, indicating that the CNN is well trained.


Fig. 5. The training of the CNN. (a) The evolution of prediction error for training (blue) and testing (orange) datasets during training iteration. Insets: the details after 50 iteration steps, showing the convergence of the prediction errors. (b) The relation between the simulated $Q_{\text {sim}}$ and CNN-predicted $Q_{\text {pre}}$ for training (blue dots) and testing (orange dots) datasets. The correlation between $Q_{\text {sim}}$ and $Q_{\text {pre}}$ of the training and testing data are 0.89 and 0.73, respectively.


Figure 5(b) shows the correlation between $Q_{\text {pre}}$ and $Q_{\text {sim}}$ for both the training (blue dots) and testing (orange dots) datasets. The correlations between $Q_{\text {sim}}$ and $Q_{\text {pre}}$ for the training and testing datasets are 0.89 and 0.73, respectively. We notice that the deviation from the $Q_{\text {pre}}=Q_{\text {sim}}$ line becomes larger when $Q_{\text {sim}}$ lies in the central region of the distribution. The reason could be that the randomly generated training data contain fewer samples in this region.

3.3 Cavity structure optimization and validation

Next, we optimize the mini-BIC cavity structure using the trained CNN. During this optimization step, all the parameters of the CNN are frozen; instead, the displacements $d_i$, as the input data, are gradually changed with the back-propagation method.

As the starting point, the input data are set to the cavity design with the highest $Q$ in the training dataset, which is $Q=5.31\times 10^5$. Then, the displacements are optimized with a new loss function $L_2$, defined as the MSE between $Q_{\text {pre}}$ and a target $Q$, $Q_{\text {tar}}$, which we set to a very high value ($10^{10}$). The loss function $L_2$ is explicitly written as:

$$L_2 = \left| \log_{10}Q_{\text{tar}}- \log_{10}Q_{\text{pre}}\right|^2$$

By gradually reducing $L_2$, the quality factor $Q_{\text {pre}}$ of the predicted cavity design increases accordingly, which can push the $Q$ beyond the best record obtained from the random parameter adjustment.

It is noteworthy that the learning rate $\eta$ must be controlled carefully during this optimization process. Although the trained CNN builds a data model relating the displacements to the $Q$s, the model is more reliable in the regions of the parameter space where the training data are concentrated. If the structure under optimization departs too far from the base structure, unexpected electromagnetic field distributions (such as line defects, point defects, etc.) arise, which are beyond the knowledge of the network and may mislead the optimization. To prevent the optimization from jumping too quickly into a remote and unknown region, we carefully control the learning rate $\eta$; in our optimization, we sweep $\eta$ from 0.0001 to 0.05.
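A minimal sketch of this input-space optimization is given below, reusing the QPredictor model and data tensors from the earlier sketches. The network weights are frozen and only the displacement vector receives gradients; the use of plain gradient descent and the snapshot interval are illustrative choices, and the learning rate is one value of the swept range.

```python
import torch

Q_TAR = 1e10                                  # target Q used in the loss L2
LOG_Q_TAR = torch.tensor(Q_TAR).log10()

def optimize_displacements(model, d_start, lr=0.001, n_steps=70000):
    """Optimize the input displacements d_i with the CNN weights frozen."""
    for p in model.parameters():              # freeze the trained network
        p.requires_grad_(False)

    d = d_start.clone().detach().requires_grad_(True)
    opt = torch.optim.SGD([d], lr=lr)
    snapshots = []

    for step in range(n_steps):
        log_q_pre = model(d.unsqueeze(0)).squeeze()
        loss = (LOG_Q_TAR - log_q_pre) ** 2   # loss L2 of Eq. (3)
        opt.zero_grad()
        loss.backward()
        opt.step()
        if step % 1000 == 0:                  # candidates for FEM validation
            snapshots.append(d.detach().clone())
    return snapshots

# Example: start from the highest-Q design in the training data
# candidates = optimize_displacements(model, x_train[y_train.argmax()])
```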

Finally, we perform numerical simulations on the optimized designs suggested by the network to validate their real $Q$s. We obtain $Q_{\text {sim}}$ for the designs produced by the trained CNN at different $\eta$ and iteration steps, as shown in Fig. 6(a). When $\eta$ is relatively low, ranging from 0.0001 to 0.001, $Q_{\text {sim}}$ increases after passing through the stabilizing region (grey shaded), as shown in the left panel of Fig. 6(a). However, if $\eta$ is higher, such as 0.01 or 0.05, $Q_{\text {sim}}$ evolves quickly because the parameters move in large steps, and it drops sharply after a number of iterations (right panel, Fig. 6(a)). The collapse of $Q$ can be attributed to the breakdown of the CNN model far from the training data.


Fig. 6. Results of CNN optimization. (a) The simulated $Q$s of the structures that are predicted by the network, at different learning rates $\eta$ and iteration steps. The grey shaded area indicates the region where the process is not yet stabilized. Left: with learning rates $\eta$ set within 0.0001-0.001, the $Q$s of the predicted designs increase. The best design is marked by the black circle, giving $Q = 6.32\times 10^5$. Right: with $\eta$ set within 0.01-0.05, the optimized $Q$ drops quickly during the process. (b) The distribution of the magnetic field amplitude $|H_z|$ of the mini-BIC mode M$_{11}$ in the optimal design.


Among all the optimized designs, the optimal one is obtained with $\eta$ set to 0.001 after 70,000 iteration steps, giving $Q = 6.32\times 10^5$ (marked by the black circle in Fig. 6(a)). The modal distribution of the optimized mini-BIC cavity is presented in Fig. 6(b), in which regions A and B are highlighted by blue and orange boxes, respectively. We can identify from the modal pattern that the optimized mode belongs to M$_{11}$, with its $Q$ being one order of magnitude higher than that of the base structure.

4. Conclusion

In this work, we propose a fully automatic design method to optimize the lateral light confinement of a PhC heterostructure cavity in the continuum, namely the mini-BIC cavity. By combining random parameter adjustment with CNN-assisted optimization, the displacements of the boundary layers are tuned to their optimal values to raise the $Q$ of the mini-BIC while keeping the total footprint unchanged. After the optimization, the $Q$ increases from $4.32\times 10^4$ for the base structure to $6.32\times 10^5$ for the CNN's output, about 14.6 times higher than the original design. This work confirms the effectiveness of using CNNs for photonic device optimization in a large parameter space, especially when the varying parameters have spatially correlated influences.

5. Appendix

5.1. 2D modeling of the 3D PhC structure

In our device, the vertical energy leakage is suppressed by the BIC, and we optimize the in-plane light confinement with a fully automatic method. To depict the in-plane energy confinement of the real 3D structure with a 2D model, we tune the refractive index of the 2D model so that the photonic bandgaps behave similarly. First, we model the real 3D infinite photonic crystals with the parameters of regions A and B, respectively, to obtain the band structures of the two regions. Then, we build infinite 2D photonic crystals with the same in-plane geometry as the 3D ones and sweep the refractive index of silicon until the relative position of the region-A band within the bandgap of region B becomes nearly the same as in the real 3D case. The final effective refractive index is set to 3.25. The band structures of the 3D and 2D structures are shown in Fig. 7. From the simulation results, the bandgaps between the TE-C/D and TE-A bands of region B are almost equal, and the relative positions of the TE-A band of region A are similar as well.


Fig. 7. Bandstructures of the heterostructure in the 3D (left) and 2D (right) models. Orange (blue) lines are bands in region B (A).


5.2. Discussion on further shrinking the size

In our design, $N_a = 5$ corresponds to a total cavity size of about 10 $\mu$m, which is considerably small and promising for compact on-chip devices such as lasers, OLEDs, and sensor arrays. We argue that such a footprint might be quite close to the minimum size for the mini-BIC configuration. From a physics point of view, the periodicity in region A is a prerequisite for a well-defined band structure and for the BICs, so shrinking to an even smaller $N_a$ would turn the BIC into a defect mode and thus increase the vertical energy loss. In this case, typically for $N_a$ below 3, the Bragg scattering would mostly depend on the lattice constant in region B and can hardly fulfill the requirements of a BIC. Besides, when region A shrinks, the eigenfrequencies drop, which raises the side-leakage risk, as shown in Fig. 8.


Fig. 8. Field distribution ($H_z$) of the target mode for different region A sizes $N_a$.


5.3. Detailed CNN building process

The CNN is built in Python using PyTorch, an optimized tensor library for deep learning [50]. As the starting point, a CNN is constructed and all of its weights are initialized with a fixed seed. Then, the training data are divided into batches of a fixed size and fed into the network in sequence. After each batch, the network is updated with the built-in error back-propagation method. When all of the training data have been injected, the prediction error is recorded and another pass over the data is started. The training process is repeated for a fixed number of iterations.
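The following is a minimal PyTorch sketch of this training procedure, reusing QPredictor, the data tensors, and prediction_error from the earlier sketches. The batch size, learning rate, optimizer, and number of epochs are illustrative; the paper specifies only a fixed seed, a fixed batch size, and error back-propagation with a stochastic gradient method.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)                      # fixed seed for weight initialization
model = QPredictor()                      # CNN from the earlier sketch

loader = DataLoader(TensorDataset(x_train, y_train), batch_size=50)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
mse = nn.MSELoss()                        # loss L1 of Eq. (1) on log10(Q)

errors = []
for epoch in range(100):                  # fixed number of training iterations
    model.train()
    for xb, yb in loader:                 # inject the batches in sequence
        optimizer.zero_grad()
        loss = mse(model(xb), yb)
        loss.backward()                   # built-in error back-propagation
        optimizer.step()
    # record the prediction error after each pass over the training data
    errors.append(prediction_error(model, x_test, y_test))
```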

Funding

National Key Research and Development Program of China (2020YFB1806405, 2022YFA1404804); National Natural Science Foundation of China (62135001); China Postdoctoral Science Foundation (2021M690239); Major Key Project of PCL (PCL2021A04, PCL2021A14).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. C. W. Hsu, B. Zhen, J. Lee, S.-L. Chua, S. G. Johnson, J. D. Joannopoulos, and M. Soljacic, “Observation of trapped light within the radiation continuum,” Nature 499(7457), 188–191 (2013). [CrossRef]  

2. J. Lee, B. Zhen, S.-L. Chua, W. Qiu, J. D. Joannopoulos, M. Soljacic, and O. Shapira, “Observation and differentiation of unique high-q optical resonances near zero wave vector in macroscopic photonic crystal slabs,” Phys. Rev. Lett. 109(6), 067401 (2012). [CrossRef]  

3. E. N. Bulgakov and D. N. Maksimov, “Topological bound states in the continuum in arrays of dielectric spheres,” Phys. Rev. Lett. 118(26), 267401 (2017). [CrossRef]  

4. C. W. Hsu, B. Zhen, A. D. Stone, J. D. Joannopoulos, and M. Soljacic, “Bound states in the continuum,” Nat. Rev. Mater. 1(9), 16048 (2016). [CrossRef]  

5. J. Jin, X. Yin, L. Ni, M. Soljacic, B. Zhen, and C. Peng, “Topologically enabled ultrahigh- Q guided resonances robust to out-of-plane scattering,” Nature 574(7779), 501–504 (2019). [CrossRef]  

6. X. Gao, L. Yang, H. Lin, L. Zhang, J. Li, F. Bo, Z. Wang, and L. Lu, “Dirac-vortex topological cavities,” Nat. Nanotechnol. 15(12), 1012–1018 (2020). [CrossRef]  

7. Y. Plotnik, O. Peleg, F. Dreisow, M. Heinrich, S. Nolte, A. Szameit, and M. Segev, “Experimental observation of optical bound states in the continuum,” Phys. Rev. Lett. 107(18), 183901 (2011). [CrossRef]  

8. R. Ulrich, “Symposium on optical and acoustical micro-electronics,” in Symposium on Optical and Acoustical Micro-Electronics, (New York, NY, 1975), p. 359.

9. M. Imada, S. Noda, A. Chutinan, T. Tokuda, M. Murata, and G. Sasaki, “Coherent two-dimensional lasing action in surface-emitting laser with triangular-lattice photonic crystal structure,” Appl. Phys. Lett. 75(3), 316–318 (1999). [CrossRef]  

10. D. Ohnishi, T. Okano, M. Imada, and S. Noda, “Room temperature continuous wave operation of a surface-emitting two-dimensional photonic crystal diode laser,” Opt. Express 12(8), 1562 (2004). [CrossRef]  

11. A. Kodigala, T. Lepetit, Q. Gu, B. Bahari, Y. Fainman, and B. Kanté, “Lasing action from photonic bound states in continuum,” Nature 541(7636), 196–199 (2017). [CrossRef]  

12. S. T. Ha, Y. H. Fu, N. K. Emani, Z. Pan, R. M. Bakker, R. Paniagua-Domínguez, and A. I. Kuznetsov, “Directional lasing in resonant semiconductor nanoantenna arrays,” Nat. Nanotechnol. 13(11), 1042–1047 (2018). [CrossRef]  

13. Y. Liu, W. Zhou, and Y. Sun, “Optical refractive index sensing based on high- Q bound states in the continuum in free-space coupled photonic crystal slabs,” Sensors 17(8), 1861 (2017). [CrossRef]  

14. J. Lv, Z. Chen, X. Yin, Z. Zhang, W. Hu, and C. Peng, “High-sensitive refractive index sensing enabled by topological charge evolution,” IEEE Photonics J. 12(5), 1–10 (2020). [CrossRef]  

15. Y. Chen, C. Zhao, Y. Zhang, and C. W. Qiu, “Integrated molar chiral sensing based on high- Q metasurface,” Nano Lett. 20(12), 8696–8703 (2020). [CrossRef]  

16. Z. Liu, Y. Xu, Y. Lin, J. Xiang, T. Feng, Q. Cao, J. Li, S. Lan, and J. Liu, “High- Q quasibound states in the continuum for nonlinear metasurfaces,” Phys. Rev. Lett. 123(25), 253901 (2019). [CrossRef]  

17. L. Xu, K. Zangeneh Kamali, L. Huang, M. Rahmani, A. Smirnov, R. Camacho-Morales, Y. Ma, G. Zhang, M. Woolley, D. Neshev, and A. E. Miroshnichenko, “Dynamic nonlinear image tuning through magnetic dipole quasi- BIC ultrathin resonators,” Adv. Sci. 6(15), 1802119 (2019). [CrossRef]  

18. K. Koshelev, Y. Tang, K. Li, D.-Y. Choi, G. Li, and Y. Kivshar, “Nonlinear metasurfaces governed by bound states in the continuum,” ACS Photonics 6(7), 1639–1644 (2019). [CrossRef]  

19. J. Wang, M. Clementi, M. Minkov, A. Barone, J.-F. Carlin, N. Grandjean, D. Gerace, S. Fan, M. Galli, and R. Houdré, “Doubly resonant second-harmonic generation of a vortex beam from a bound state in the continuum,” Optica 7(9), 1126 (2020). [CrossRef]  

20. B. Wang, W. Liu, M. Zhao, J. Wang, Y. Zhang, A. Chen, F. Guan, X. Liu, L. Shi, and J. Zi, “Generating optical vortex beams by momentum-space polarization vortices centred at bound states in the continuum,” Nat. Photonics 14(10), 623–628 (2020). [CrossRef]  

21. M. G. Silveirinha, “Trapping light in open plasmonic nanostructures,” Phys. Rev. A 89(2), 023813 (2014). [CrossRef]  

22. Z. Chen, X. Yin, J. Jin, Z. Zheng, Z. Zhang, F. Wang, L. He, B. Zhen, and C. Peng, “Observation of miniaturized bound states in the continuum with ultra-high quality factors,” Sci. Bull. 67(4), 359–366 (2022). [CrossRef]  

23. D. Zibar, H. Wymeersch, and I. Lyubomirsky, “Machine learning under the spotlight,” Nat. Photonics 11(12), 749–751 (2017). [CrossRef]  

24. T. Eisenhammer, M. Lazarov, M. Leutbecher, U. Schöffel, and R. Sizmann, “Optimization of interference filters with genetic algorithms applied to silver-based heat mirrors,” Appl. Opt. 32(31), 6310 (1993). [CrossRef]  

25. S. Martin, J. Rivory, and M. Schoenauer, “Synthesis of optical multilayer systems using genetic algorithms,” Appl. Opt. 34(13), 2247 (1995). [CrossRef]  

26. D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” Nature 323(6088), 533–536 (1986). [CrossRef]  

27. Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature 521(7553), 436–444 (2015). [CrossRef]  

28. V. Nair and G. E. Hinton, “Rectified linear units improve restricted boltzmann machines,” in Proceedings of the 27th international conference on machine learning (ICML-10), (2010), pp. 807–814.

29. N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” The Journal of Machine Learning Research 15, 1929–1958 (2014).

30. S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in International conference on machine learning, (pmlr, 2015), pp. 448–456.

31. I. Malkiel, M. Mrejen, A. Nagler, U. Arieli, L. Wolf, and H. Suchowski, “Plasmonic nanostructure design and characterization via Deep Learning,” Light: Sci. Appl. 7(1), 60 (2018). [CrossRef]  

32. Z. Liu, D. Zhu, S. P. Rodrigues, K.-T. Lee, and W. Cai, “Generative Model for the Inverse Design of Metasurfaces,” Nano Lett. 18(10), 6570–6576 (2018). [CrossRef]  

33. W. Ma, F. Cheng, and Y. Liu, “Deep-Learning-Enabled On-Demand Design of Chiral Metamaterials,” ACS Nano 12(6), 6326–6334 (2018). [CrossRef]  

34. T. Asano and S. Noda, “Optimization of photonic crystal nanocavities based on deep learning,” Opt. Express 26(25), 32704 (2018). [CrossRef]  

35. C. C. Nadell, B. Huang, J. M. Malof, and W. J. Padilla, “Deep learning for accelerated all-dielectric metasurface design,” Opt. Express 27(20), 27523 (2019). [CrossRef]  

36. W. Ma, F. Cheng, Y. Xu, Q. Wen, and Y. Liu, “Probabilistic Representation and Inverse Design of Metamaterials Based on a Deep Generative Model with Semi-Supervised Learning Strategy,” Adv. Mater. 31(35), 1901111 (2019). [CrossRef]  

37. T. Christensen, C. Loh, S. Picek, D. Jakobović, L. Jing, S. Fisher, V. Ceperic, J. D. Joannopoulos, and M. Soljačić, “Predictive and generative machine learning models for photonic crystals,” Nanophotonics 9(13), 4183–4192 (2020). [CrossRef]  

38. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” Commun. ACM 60(6), 84–90 (2017). [CrossRef]  

39. W. Ma, Z. Liu, Z. A. Kudyshev, A. Boltasseva, W. Cai, and Y. Liu, “Deep learning for the design of photonic structures,” Nat. Photonics 15(2), 77–90 (2021). [CrossRef]  

40. L. Li, L. G. Wang, F. L. Teixeira, C. Liu, A. Nehorai, and T. J. Cui, “DeepNIS: Deep Neural Network for Nonlinear Electromagnetic Inverse Scattering,” IEEE Trans. Antennas Propag. 67(3), 1819–1825 (2019). [CrossRef]  

41. A. Turpin, I. Vishniakou, and J. d. Seelig, “Light scattering control in transmission and reflection with neural networks,” Opt. Express 26(23), 30911 (2018). [CrossRef]  

42. L. Li, H. Ruan, C. Liu, Y. Li, Y. Shuang, A. Alù, C.-W. Qiu, and T. J. Cui, “Machine-learning reprogrammable metasurface imager,” Nat. Commun. 10(1), 1082 (2019). [CrossRef]  

43. Q. Zhang, C. Liu, X. Wan, L. Zhang, S. Liu, Y. Yang, and T. J. Cui, “Machine-Learning Designs of Anisotropic Digital Coding Metasurfaces,” Adv. Theory Simul. 2(2), 1800132 (2019). [CrossRef]  

44. A. Maksov, O. Dyck, K. Wang, K. Xiao, D. B. Geohegan, B. G. Sumpter, R. K. Vasudevan, S. Jesse, S. V. Kalinin, and M. Ziatdinov, “Deep learning analysis of defect and phase evolution during electron beam-induced transformations in WS2,” npj Comput. Mater. 5(1), 12 (2019). [CrossRef]  

45. T. Xie and J. C. Grossman, “Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties,” Phys. Rev. Lett. 120(14), 145301 (2018). [CrossRef]  

46. T. Asano and S. Noda, “Iterative optimization of photonic crystal nanocavity designs by using deep neural networks,” Nanophotonics 8(12), 2243–2256 (2019). [CrossRef]  

47. B. Zhen, C. W. Hsu, L. Lu, A. D. Stone, and M. Soljacic, “Topological nature of optical bound states in the continuum,” Phys. Rev. Lett. 113(25), 257401 (2014). [CrossRef]  

48. E. Centeno and D. Cassagne, “Graded photonic crystals,” Opt. Lett. 30(17), 2278 (2005). [CrossRef]  

49. C. Fang, Q. Yang, Q. Yuan, X. Gan, J. Zhao, Y. Shao, Y. Liu, G. Han, and Y. Hao, “High-q resonances governed by the quasi-bound states in the continuum in all-dielectric metasurfaces,” Opto-Electron. Adv. 4(6), 200030 (2021). [CrossRef]  

50. A. Paszke, S. Gross, F. Massa, et al., “Pytorch: An imperative style, high-performance deep learning library,” Advances in Neural Information Processing Systems 32 (2019).




