Optica Publishing Group

Inverse design of a nano-photonic wavelength demultiplexer with a deep neural network approach

Open Access

Abstract

In this paper, we propose a pre-trained-combined neural network (PTCN) as a comprehensive solution to the inverse design of an integrated photonic circuit. By utilizing both the initially pre-trained inverse and forward models with a joint training process, our PTCN model shows remarkable tolerance to the quantity and quality of the training data. As a proof-of-concept demonstration, the inverse design of a wavelength demultiplexer is used to verify the effectiveness of the PTCN model. The correlation coefficient of the prediction by the presented PTCN model remains greater than 0.974 even when the size of the training data is decreased to 17% of its original volume. The experimental results show good agreement with the predictions and demonstrate a wavelength demultiplexer with an ultra-compact footprint of 2.6×2.6µm², a high transmission efficiency with a transmission loss of −2dB, a low reflection of −10dB, and low crosstalk of around −7dB simultaneously.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Photonic integrated circuits (PICs), which are compatible with existing complementary metal-oxide-semiconductor fabrication techniques, have found different applications in the fields of optical communications [1], optical computing [2,3], and biomedical apparatus [4]. As the complexity and functionality of PICs continue to evolve, an effective and efficient inverse design approach that is applicable to different application scenarios is highly desirable [5–7]. Recently, neural networks (NNs) have shown great potential in photonic signal processing in various areas [8–14]. Inverse design using an NN, where a trained model is built to predict inverse geometric solutions for a desired spectral transmission, has also emerged as a powerful tool for the rapid design of ultra-compact photonic integrated devices with various functionalities [6,7,15–30], such as polarization-insensitive grating couplers [19] and optical power splitters [21–23]. However, due to the non-unique relationship between the geometry and the spectral transmission, NN-based inverse design for nano-photonic devices is usually hard to converge [29,30]. To solve the non-uniqueness issue, a tandem architecture consisting of two separate networks, an inverse network (design network) and a fixed forward network (spectrum network), has recently been presented [29–35]. The forward models in the reported tandem networks are all pre-trained on topology patterns sampled from the simulation process, whereas the patterns fed to them by the inverse network can potentially span a much larger input domain than the sampled structures used in the pre-training stage. Therefore, a network with a fixed forward model cannot effectively adapt to such a data domain shift unless the whole space of candidate structures is intensively sampled for pre-training.
Even with intensive sampling, the fixed forward model still cannot fully remove the domain shift problem, because the outputs of the inverse network are not truly binary but rather near-binary values within [0,1], as required for calculating the gradient during training. Consequently, the network model becomes unstable, which leads to undesired optical performance of the nano-photonic components in integrated photonic circuits.

In this paper, a pre-trained-combined network (PTCN), where both the forward and inverse models are jointly trained, is proposed to mitigate the domain shift problem. In addition, to reduce the number of required input parameters, a spectrum generation (SG) model is introduced at the inference stage to generate a target spectrum with a user-defined crossing wavelength point (CWP) for the prediction by the PTCN. The potential of the PTCN model in the inverse design of PICs is demonstrated by designing a wavelength demultiplexer, which plays a significant role in various applications as a basic building block of PICs [36]. The proposed network achieves a high validation accuracy of around 99%, which is 2% higher than that of the conventional tandem model. Meanwhile, the PTCN shows remarkable stability and robustness to the quantity and quality of the training data. The correlation coefficient of the prediction by the presented PTCN model remains greater than 0.974, even when the size of the training data is decreased to 17%. The wavelength demultiplexer predicted by the PTCN is demonstrated experimentally. The measured results show low crosstalk of −7dB and a low reflection of less than −10dB, agreeing well with the prediction.

1.1 Structure design

To verify the effectiveness of the proposed PTCN model, a 1×2 wavelength demultiplexer is inverse-designed on a standard SOI platform with a 220nm top silicon layer and a 2µm buried oxide layer, as shown in Fig. 1(a). The three-port device consists of a multimode region with a footprint of 2.6×2.6µm² and three identical linear tapers to couple light into and out of the 450nm-wide waveguides. Here, the multimode region is chosen as the inverse design region and is discretized into 20×20 square pixels, where each pixel can be switched between two states: a silicon square with or without an air hole. The states of the pixels vary the topology pattern of the region represented by a binary matrix (P), and this in turn changes the spectra of the device (T), which include the transmission responses at ports 1 and 2 and the reflection at the input port, as illustrated by the blue, red and green lines in Fig. 1(a). The labeled data (P, T) are then used to train the neural network. As a proof of concept, the pixel pitch and the air hole diameter are chosen to be 130nm and 90nm, respectively, both of which are feasible for mass-manufacturing fabrication, including deep UV lithography and etching processes.
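As a quick consistency check on the geometry described above (simple arithmetic on the stated values, not an additional result), 20 pixels at a 130nm pitch span one side of the design region:

$$20 \times 130\,\textrm{nm} = 2600\,\textrm{nm} = 2.6\,\mu \textrm{m}$$

This matches the stated 2.6×2.6µm² footprint. Assuming each air hole is centred in its pixel, which the text does not state explicitly, two neighbouring 90nm holes would be separated by $130 - 90 = 40$nm of silicon.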

Fig. 1. (a) Schematic diagram of the 1×2 wavelength demultiplexer. The states of the pixels vary the topology pattern of the multimode region, which changes the optical spectra at ports 1 and 2 and the reflection (R). (b) Flowchart of the data collection process based on a direct binary search algorithm. (c) The overall architecture of the PTCN and SG models.

1.2 Training data collection for arbitrary CWP

Because a 20×20 binary pattern leads to a vast search space with $2^{400}$ possible solutions, a modified direct binary search (DBS) algorithm with simulated annealing is adopted to obtain high-quality training data with low transmission loss and high wavelength selectivity. As illustrated in Fig. 1(b), the data collection process for a wavelength demultiplexer at a specific CWP is as follows. (i) The algorithm randomly generates an initial 20×20 binary matrix; (ii) The spectral responses of the initialized structure are obtained by 3D finite-difference time-domain (FDTD) simulation (Ansys Lumerical). The simulation results are then used to calculate the original loss using a mean squared error (MSE)-based loss function, which is expressed as

$${L_{MSE}} = \frac{1}{{3m}}\sum\limits_{i = 1}^{3m} {{{(N_i^s - N_i^t)}^2}} \textrm{ }$$
where $N_i^s$ and $N_i^t$ are the ith elements of the simulated and target spectrum sequences, respectively, m is the number of wavelength points, and the 3m-length vector collects the wavelength points of the spectral responses at port 1, port 2, and the reflection; (iii) The algorithm randomly selects and flips a pixel of the current matrix to generate an updated matrix; (iv) To avoid being trapped in local optima, a simulated annealing strategy is adopted. After every γ iterations (γ = 200), the pixel flip is forced to be accepted, and a new iteration starts by returning to Step (iii). Otherwise, whether the pixel flip is retained is determined as follows. First, the MSE of the updated structure is evaluated, and then the MSEs of the updated and the current binary patterns are compared. The binary pattern with the lower MSE is taken as the initial value of the next iteration. The process is repeated until the maximum number of iterations is reached or the MSE meets a pre-defined threshold ɛ, which is set as 0.01.
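Steps (i) to (iv) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_simulate` is a hypothetical stand-in for the 3D-FDTD solver with no physical meaning, the iteration cap `max_iters` is an assumed value, and the loss of a force-accepted flip is evaluated here only so that the running loss stays up to date.

```python
import random


def mse_loss(simulated, target):
    """Mean squared error between two equal-length spectrum sequences."""
    if len(simulated) != len(target):
        raise ValueError("spectra must have the same length")
    return sum((s - t) ** 2 for s, t in zip(simulated, target)) / len(target)


def toy_simulate(pattern):
    """Hypothetical stand-in for the 3D-FDTD solver (NOT a physical model).

    Returns the fill fraction of each third of the flattened pattern,
    playing the role of a 3m-length spectrum with m = 1.
    """
    flat = [pixel for row in pattern for pixel in row]
    third = len(flat) // 3
    return [sum(flat[k * third:(k + 1) * third]) / third for k in range(3)]


def direct_binary_search(target, simulate, size=20, gamma=200, eps=0.01,
                         max_iters=5000, seed=0):
    """Modified DBS: greedy pixel flips with a forced acceptance every gamma steps."""
    rng = random.Random(seed)
    # Step (i): random initial binary matrix.
    pattern = [[rng.randint(0, 1) for _ in range(size)] for _ in range(size)]
    # Step (ii): loss of the initial structure.
    loss = mse_loss(simulate(pattern), target)
    iterations = 0
    while iterations < max_iters and loss > eps:
        iterations += 1
        # Step (iii): flip one randomly chosen pixel.
        i, j = rng.randrange(size), rng.randrange(size)
        pattern[i][j] ^= 1
        new_loss = mse_loss(simulate(pattern), target)
        # Step (iv): force-accept every gamma-th flip; otherwise keep the
        # flip only if it lowers the MSE.
        if iterations % gamma == 0 or new_loss < loss:
            loss = new_loss
        else:
            pattern[i][j] ^= 1  # revert the flip
    return pattern, loss, iterations


# Illustrative target for the toy simulator (arbitrary values).
pattern, final_loss, n_iter = direct_binary_search([0.9, 0.1, 0.5], toy_simulate)
```

In a real data collection run, `simulate` would wrap the FDTD solver, and every evaluated pair (pattern, spectrum) would be stored as a training sample.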

The aforementioned data collection process is repeated to obtain wavelength demultiplexers operating at different CWPs, from 1500nm to 1600nm, with an increment of 10nm. During the process, the parameters γ and ɛ are kept at 200 and 0.01, respectively. The whole training set contains 48000 samples. Based on the number of iterations in the data collection process, we divide the samples into two groups: 33000 high-performance samples used for network training and 15000 low-performance samples used for model robustness evaluation. Specifically, the samples from the first 1200 iterations of the data collection process for each CWP target are taken as the low-performance data. Among these, the lowest MSE loss is larger than a pre-defined convergence threshold ɛ’, which is set as 0.075. In contrast, the high-performance data are selected from iterations with an iteration number larger than 1300, where the highest MSE loss among all selected high-performance data is lower than ɛ’.
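The split rule described above can be expressed as a small filter. The sketch below is illustrative only: the record layout (a dict with `iteration` and `mse` keys), the 1-based iteration index, and the sample values are assumptions made for the example, not details given in the paper.

```python
def split_by_iteration(records, low_max_iter=1200, high_min_iter=1300):
    """Split DBS records into (high_performance, low_performance) lists.

    Each record is a dict with an 'iteration' key (assumed 1-based step
    index within one DBS run); other keys are passed through unchanged.
    """
    low = [r for r in records if r["iteration"] <= low_max_iter]
    high = [r for r in records if r["iteration"] > high_min_iter]
    return high, low


def thresholds_hold(high, low, eps_prime=0.075):
    """Check the stated property: every low-performance MSE exceeds
    eps_prime and every high-performance MSE lies below it."""
    return (all(r["mse"] > eps_prime for r in low)
            and all(r["mse"] < eps_prime for r in high))


# Hypothetical records; the MSE values are made up for illustration.
records = [
    {"iteration": 10, "mse": 0.30},
    {"iteration": 1200, "mse": 0.08},
    {"iteration": 1250, "mse": 0.076},  # falls in neither group
    {"iteration": 1301, "mse": 0.05},
    {"iteration": 2000, "mse": 0.02},
]
high, low = split_by_iteration(records)
```

Samples between iterations 1200 and 1300 belong to neither group under this reading of the text.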

1.3 PTCN architecture

The architecture of the PTCN is depicted in Fig. 1(c). The inverse NN, which takes the existing power spectrum T as the input and predicts the corresponding topology pattern P’, is pre-trained using the collected high-quality dataset. Specifically, the network is implemented by nine fully connected layers with 400 nodes in the output layer and 100 nodes in each of the remaining layers, as indicated in Fig. 1(c) (blue region). To prevent the vanishing gradient problem, a residual connection is introduced every two layers to create an identity mapping and reuse the features of the previous layers. As the inverse NN gives a binary output of either “0” or “1”, the task can be considered as a multi-class multi-label classification problem. The binary cross-entropy (BCE) loss employed as the loss function of the inverse network is given by

$${L_{BCE}} = - \frac{1}{N}\sum\limits_{i = 1}^N {\left[ {{P_i}\log (P_i^{\prime}) + (1 - {P_i})\log (1 - P_i^{\prime})} \right]}$$
where N is the number of output dimensions, and $P_i$ and $P_i^{\prime}$ are the ith elements of the ground-truth and predicted binary matrices, respectively. Under the guidance of the BCE loss, the inverse network tries to generate a topology pattern that is as close as possible to the ground truth.
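To make the BCE loss concrete, the following sketch evaluates it on a tiny example. It is an illustration under assumptions: the clipping constant `eps`, the 0.5 decision threshold, and the definition of pixel accuracy as the fraction of correctly thresholded pixels are choices made here, since the paper does not spell them out.

```python
import math


def bce_loss(truth, predicted, eps=1e-12):
    """Binary cross-entropy averaged over all N output pixels."""
    if len(truth) != len(predicted):
        raise ValueError("inputs must have the same length")
    total = 0.0
    for p, q in zip(truth, predicted):
        q = min(max(q, eps), 1.0 - eps)  # clip to keep log() finite
        total += p * math.log(q) + (1 - p) * math.log(1.0 - q)
    return -total / len(truth)


def pixel_accuracy(truth, predicted, threshold=0.5):
    """Fraction of pixels whose thresholded prediction matches the truth."""
    hits = sum(int(q >= threshold) == p for p, q in zip(truth, predicted))
    return hits / len(truth)


# Hypothetical 4-pixel example (the paper uses N = 400).
truth = [1, 0, 1, 1]
predicted = [0.9, 0.2, 0.4, 0.8]
loss = bce_loss(truth, predicted)
accuracy = pixel_accuracy(truth, predicted)
```

Here the third pixel is predicted as 0.4 although the ground truth is 1, so it contributes the largest term to the loss and is the one misclassified pixel.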

Meanwhile, a forward network is introduced and pre-trained, which takes the existing topology patterns P as the input and approximates the corresponding spectra T’, as illustrated in Fig. 1(c) (yellow region). The forward model is based on the fully connected deep neural network structure, consisting of 7 layers with a width of 400 neurons. The mean absolute error (MAE) is used as the loss function of the forward network, which is defined as

$${L_{MAE}} = \frac{1}{{3m}}\sum\limits_{i = 1}^{3m} {|{T_i} - T_i^{\prime}|} $$
where $T_i$ and $T_i^{\prime}$ are the ith elements of the ground-truth and predicted optical transmission spectra arrays, respectively.

After both models are pre-trained individually, a novel tandem model is constructed in an end-to-end manner to enable communication between the forward model and the inverse model. As the weights of the pre-trained forward network are not fixed during the joint-training process, the forward model can effectively adapt to the input data generated from the inverse network’s outputs during training, thus obviating the domain shift problem of the conventional tandem architecture. This tandem-like model is fine-tuned using a linear combination of the two loss functions given by

$${L_{total}} = {L_{BCE}} + \theta {L_{MAE}}\textrm{ }$$
where $\theta $ is the weight coefficient that balances the prediction ability between the well-performing forward network and the inverse network. To allow the gradients to be calculated during backpropagation, the output of the inverse network is a floating-point value within the range [0,1]. Considering that the forward network is trained with samples of the actual binary type (0 and 1), a customized activation function is introduced at the connecting point between the inverse and the forward models. Here, the activation function is set to be a sigmoid-based function given by
$$y = \frac{1}{{1 + {e^{ - \beta (x - \alpha )}}}}$$
where β is the slope coefficient, α is the threshold, and x is the output of the inverse network. This function maps the floating-point number from the inverse model’s output layer to a value that is extremely close to either 0 or 1, making the input of the forward network more similar to the training set.
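The effect of the sigmoid-based activation and the combined loss can be illustrated numerically. The values β = 50, α = 0.5 and θ = 1 used below are assumptions chosen for the example; the paper does not report the values it used.

```python
import math


def steep_sigmoid(x, beta=50.0, alpha=0.5):
    """Sigmoid-based activation y = 1 / (1 + exp(-beta * (x - alpha))).

    beta and alpha are illustrative defaults, not values from the paper.
    """
    return 1.0 / (1.0 + math.exp(-beta * (x - alpha)))


def mae_loss(truth, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(t - p) for t, p in zip(truth, predicted)) / len(truth)


def total_loss(bce, mae, theta=1.0):
    """Linear combination L_total = L_BCE + theta * L_MAE."""
    return bce + theta * mae


# Inverse-network outputs in [0, 1] pushed towards 0 or 1.
raw_outputs = (0.05, 0.45, 0.55, 0.95)
near_binary = [steep_sigmoid(x) for x in raw_outputs]
```

With these assumed parameters, outputs far from the threshold are mapped to values within about 1e-9 of 0 or 1, while outputs close to α remain soft, so the function stays differentiable and gradients can still flow through the connection point.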

It is noted that the proposed SG model allows the use of a single specified CWP value, which is the input of the SG model, to generate the 3m-length target spectrum (T''). The generated spectrum T'' is then fed to the well-trained PTCN model to produce the required wavelength demultiplexer topology (P’), as indicated by the red arrow in Fig. 1(c), thus significantly reducing the number of required input parameters. Specifically, the SG model is constructed from six fully connected layers with the ReLU activation function to produce a 3m-length vector. To train the model, the loss function of the SG model is defined as the MAE between the generated power spectrum (T'') and the ground truth (T). Note that the SG model is independent of the presented PTCN model. While the SG model is used to facilitate the generation of the wavelength demultiplexer’s spectra in the demonstration, the PTCN model, whose input is a numerical vector, can be generalized to the inverse design of nano-photonic devices with arbitrary input.

2. Results and analysis

2.1 Neural network training results

A fast-decaying learning rate is used, tuned to {0.0003, 0.0003, 0.00005, 0.0001} for the pre-trained inverse model, the pre-trained forward model, the entire PTCN model, and the SG model, respectively. Both the inverse and the forward networks are pre-trained separately on the same training set with a batch size of 32 before being connected to each other. The 33000 high-performance samples obtained via 3D-FDTD simulation are used for training the model with a train-validation ratio of 9:1. Figure 2 shows the convergence of the loss and accuracy for the conventional tandem network and the presented PTCN. As illustrated in Fig. 2(a), the traditional tandem network has a prediction accuracy of around 97%, which means that among 400 topology pixels, there are 12 falsely predicted pixels. By contrast, our PTCN model exhibits a validation accuracy of around 99%, with only 5 falsely predicted pixels, as shown in Fig. 2(b).
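The pixel counts quoted above follow from simple arithmetic, assuming that accuracy here means the fraction of the 400 pixels predicted correctly (the text does not define it explicitly):

$$400 \times (1 - 0.97) = 12, \qquad 1 - \frac{5}{400} = 0.9875 \approx 99\%$$

Five wrong pixels thus correspond to an accuracy of 98.75%, which is consistent with the stated value of around 99%.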

Fig. 2. The training loss and the validation accuracy as functions of epoch number for (a) the conventional tandem network and (b) the proposed PTCN model. Training loss: green line; Validation loss: red line; Training accuracy: orange line; Validation accuracy: purple line.

2.2 Neural network evaluation

To test the stability of the PTCN, the size of the training set is varied to 100%, 83%, 67%, 50%, 33%, and 17% of the original dataset volume. Figure 3(a) shows the correlation coefficient between the target and predicted CWPs obtained via the PTCN and conventional tandem models at different training dataset sizes. To ensure a fair comparison, the network architectures of the two models, including the depth and width, are set to be the same. Meanwhile, both models are optimized on the same training dataset using the grid search method [29]. In addition, the MAE versus the training data volume for the conventional tandem model and the PTCN model is displayed in Fig. 3(b). The PTCN model clearly outperforms the traditional tandem model when the dataset size is reduced. The correlation coefficient of our PTCN model stays above 0.974, showing excellent robustness to small datasets. In contrast, the conventional tandem model struggles to fit the training set, especially when the dataset size is reduced below 50%.
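The two evaluation metrics used above can be computed as follows. This is a sketch under assumptions: the correlation coefficient is taken to be the Pearson coefficient, which the paper does not state explicitly, and the CWP values are made-up numbers for illustration, not results from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def mean_abs_error(xs, ys):
    """Mean absolute error, here in nm when the inputs are CWPs in nm."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical target and predicted CWPs in nm (illustrative values only).
target_cwp = [1500.0, 1520.0, 1540.0, 1560.0]
predicted_cwp = [1502.0, 1519.0, 1543.0, 1558.0]
r = pearson_r(target_cwp, predicted_cwp)
mae_nm = mean_abs_error(target_cwp, predicted_cwp)
```

The correlation coefficient captures how well the predicted CWPs track the targets overall, while the MAE reports the typical offset in nanometres, which is why both are plotted in Fig. 3.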

Fig. 3. (a) The correlation coefficient between the target and predicted CWP values obtained via the PTCN (blue) and conventional tandem (yellow) models at different training data sizes. (b) MAEs of the CWP values between the target and predictions obtained via the conventional tandem (yellow) and PTCN (blue) models at different training data sizes. (c) The target and predicted CWP values based on the full dataset of 48000 samples. Target values: red dashed line; PTCN: blue dots; Conventional tandem: yellow dots. (d) CWP mismatch versus the target CWP obtained via the PTCN (blue dots) and conventional tandem (yellow dots) models. (e-f) Demonstration of the predicted wavelength demultiplexers with the target CWPs of 1521 nm (e) and 1566 nm (f), where the insets show the topology patterns predicted by the PTCN and conventional tandem models. Optical transmission spectra simulated via 3D-FDTD at the two output ports and the reflection spectrum based on the PTCN and conventional tandem predicted topology patterns. PTCN port 1: blue solid line; Conventional tandem port 1: red solid line; PTCN port 2: blue dashed line; Conventional tandem port 2: red dashed line; PTCN reflection: green solid line; Conventional tandem reflection: green dashed line.

Moreover, to evaluate the robustness of the PTCN model to the quality of the training samples, the 15000 noisy (low-performance) samples obtained in Section 1.2 are added to the dataset. Hence, the total sample number of the new dataset for training both the conventional tandem model and the PTCN model is increased from 33000 to the full dataset of 48000. Figure 3(c) compares the target and predicted CWPs based on the full dataset, and the CWP mismatch at different target CWPs is plotted in Fig. 3(d). The binary patterns predicted by the conventional tandem structure show a larger CWP mismatch than those of the proposed PTCN model. The MAE of the CWP predicted via the PTCN model based on the full dataset is reduced from 2.63nm to 2.58nm compared with the result obtained from the original dataset with 33000 training samples. In contrast, the MAE of the conventional tandem model is increased from 3.65nm to 8.08nm. Figure 3(e-f) demonstrates two wavelength demultiplexers whose target CWPs are 1521nm and 1566nm. The predicted topology patterns are shown in the insets. The simulated spectral responses of the wavelength demultiplexers predicted by the conventional tandem model differ considerably from the target values, showing a degradation in transmission and a CWP shift of over 13nm, while the proposed PTCN model retains superior prediction accuracy and robustness when low-quality data are added.
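A CWP mismatch of the kind plotted in Fig. 3(d) requires extracting the CWP from a simulated spectrum. The sketch below shows one plausible way to do so, assuming that the CWP is the wavelength at which the port 1 and port 2 transmission curves cross and using linear interpolation between sampled points; the paper does not describe its extraction procedure, and the spectra below are made-up values.

```python
def crossing_wavelength(wavelengths, port1, port2):
    """Return the first wavelength where port1 and port2 transmissions cross.

    Uses linear interpolation between neighbouring samples and returns
    None if the two curves never cross in the sampled range.
    """
    for k in range(len(wavelengths) - 1):
        d0 = port1[k] - port2[k]
        d1 = port1[k + 1] - port2[k + 1]
        if d0 == 0.0:
            return float(wavelengths[k])
        if d0 * d1 < 0.0:  # sign change between samples k and k + 1
            frac = d0 / (d0 - d1)
            return wavelengths[k] + frac * (wavelengths[k + 1] - wavelengths[k])
    if wavelengths and port1[-1] == port2[-1]:
        return float(wavelengths[-1])
    return None


# Hypothetical three-point spectra (wavelengths in nm, linear transmission).
wavelengths = [1540.0, 1550.0, 1560.0]
port1 = [0.8, 0.6, 0.2]
port2 = [0.2, 0.4, 0.8]
cwp = crossing_wavelength(wavelengths, port1, port2)
mismatch = cwp - 1550.0  # offset from an assumed 1550 nm target
```

In this toy example the curves cross a quarter of the way between 1550nm and 1560nm, giving a CWP of about 1552.5nm and a mismatch of about 2.5nm from the assumed target.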

2.3 Inverse prediction of wavelength demultiplexer

Once the PTCN model is properly trained, a variety of design targets can be achieved. The presented PTCN model learns the non-linear relationship between the topology pattern and user-defined spectra, e.g., the power spectra presented in this paper. By sampling the mapping function with given spectra, the PTCN model could generate useful devices that are not present in the training dataset, and potentially generalize to devices outside the training range as long as they follow the same distribution as the training set [37]. To verify the design, wavelength demultiplexer samples with five different CWP targets (1530nm, 1550nm, 1560nm, 1570nm, and 1590nm) are passed through the well-trained PTCN model. The predicted structure patterns from the PTCN model for the different CWPs are shown in Fig. 4(a) (i-v). These are used in 3D-FDTD simulations to evaluate the optical performance of the predicted samples. The results are presented in Fig. 4(b), showing that the CWPs of the predicted samples agree well with the target values, with less than ±5.3nm offset. This is less than 2.65% of the total simulated wavelength range of 200nm. Moreover, a high transmission efficiency with a transmission loss of around −2dB and low crosstalk of about −8dB can be achieved at the passband for both port 1 and port 2, with a reflection power of less than −15dB at the input port for most of the wavelengths. The performance of the wavelength demultiplexer, including the crosstalk at the passband, can be further improved by expanding the geometrical design space while retaining the same joint training process of the PTCN model. A doubled device size extends the size of the input and output layers to 1600 nodes. This only accounts for around 0.3% of the total number of trainable parameters of the network, which is about half a million in this design. Therefore, the increase in the computational cost of training the PTCN is negligible.
The 3D-FDTD simulated electric field intensity distributions of TE polarization at port 1 and port 2 of the demultiplexer are presented in Fig. 4(c) and Fig. 4(d), respectively. The E-field intensity at port 1 and port 2 reveals that light at different wavelengths is successfully coupled to the target output port, which demonstrates the effectiveness of the predicted wavelength demultiplexer.
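Two of the figures quoted in this subsection can be reproduced by simple arithmetic. The second expression assumes that doubling the device size means doubling each side of the 20×20 pixel grid at a fixed pitch, which is an interpretation rather than something the text states:

$$\frac{5.3\,\textrm{nm}}{200\,\textrm{nm}} = 2.65\%, \qquad (2 \times 20)^2 = 1600$$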

Fig. 4. Demonstration of five wavelength demultiplexer examples with splitting wavelengths of 1530 nm, 1550 nm, 1560 nm, 1570 nm, and 1590 nm from (i) to (v). (a) The wavelength demultiplexer structure patterns generated by the PTCN model for the different splitting wavelengths (silicon shown in grey and air holes shown in white). (b) 3D-FDTD simulated transmission spectra at the two output ports and the reflection spectrum. (c) The E-field intensity result of predicted structures at the output of port 1 and different CWPs: (i) 1460 nm (ii) 1470 nm (iii) 1480 nm (iv) 1510 nm (v) 1510 nm. (d) The E-field intensity result of predicted structures at the output of port 2 and different CWPs: (i) 1600 nm (ii) 1630 nm (iii) 1630 nm (iv) 1630 nm (v) 1650 nm.

2.4 Fabrication and measurement results

As a proof of concept, two wavelength demultiplexers with CWPs of 1550nm and 1560nm were fabricated on a commercially available SOI wafer (Soitec), where the 220-nm-thick silicon waveguide sits on top of a 2µm buried oxide layer and a 725µm silicon substrate layer. The device layer with the two demultiplexers and their respective taper waveguides was patterned via e-beam lithography with optimized proximity effect correction to ensure consistent pixel pitch and etched hole diameter across the entire pattern. The pattern was subsequently etched to a depth of 220nm through an inductively coupled plasma reactive ion etching process. The insets in Fig. 5(a-b) show the SEM images of the fabricated wavelength demultiplexers with target CWPs of 1550nm and 1560nm, respectively. To couple the light between an optical fiber and an SOI waveguide, additional lithography and etching processes were completed to define vertical grating couplers (VGCs) with an etch depth of 70nm. A VGC loop that consists of two paired VGCs was also fabricated for calibration purposes. The measured and predicted transmission efficiencies for the fabricated wavelength demultiplexers with CWPs of 1550nm and 1560nm are plotted in Fig. 5(a) and Fig. 5(b), respectively, which show good agreement with only a ±3nm offset between the measured and predicted CWPs. Note that the measurement range in the experiment is mainly determined by the operational bandwidth of the VGCs, which can be extended to over 100nm [38]. The fabricated demultiplexer exhibits a relatively low transmission loss of around −2dB and crosstalk lower than −7dB at the passband, with a reflection power of less than −10dB at the input port for most of the wavelengths.

Fig. 5. Transmission efficiency of the wavelength demultiplexer with target CWPs of (a) 1550 nm and (b) 1560 nm. The transmission efficiencies at port 1 and port 2 of the device, as well as the reflection, are represented by blue, red and green colors, respectively. The solid lines indicate the simulation results, while the dashed lines display the measured values.

3. Conclusion

In this paper, a novel PTCN for the inverse design of nano-photonic devices has been presented. By creating a joint training process for the inverse and forward models in the neural network, our approach successfully solves the domain shift problem of the conventional tandem architecture. Meanwhile, the proposed model exhibits high stability and robustness to variations in the quantity and quality of the training data. The wavelength demultiplexer predicted via the PTCN is experimentally demonstrated, showing a low crosstalk of −7dB and a low reflection of less than −10dB, agreeing well with the prediction.

Funding

Australian Research Council.

Disclosures

The authors declare that there are no conflicts of interest related to this article.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. G.-H. Duan, S. Olivier, C. Jany, S. Malhouitre, A. L. Liepvre, A. Shen, X. Pommarede, G. Levaufre, N. Girard, D. Make, G. Glastre, J. Decobert, F. Lelarge, R. Brenot, and B. Charbonnier, “Hybrid III-V silicon photonic integrated circuits for optical communication applications,” IEEE J. Sel. Top. Quantum Electron. 22(6), 379–389 (2016). [CrossRef]  

2. A. N. Tait, T. F. de Lima, E. Zhou, A. X. Wu, M. A. Nahmias, B. J. Shastri, and P. R. Prucnal, “Neuromorphic photonic networks using silicon photonic weight banks,” Sci. Rep. 7(1), 7430 (2017). [CrossRef]  

3. B. J. Shastri, A. N. Tait, T. Ferreira de Lima, W. H. P. Pernice, H. Bhaskaran, C. D. Wright, and P. R. Prucnal, “Photonics for artificial intelligence and neuromorphic computing,” Nat. Photonics 15(2), 102–114 (2021). [CrossRef]  

4. E. Luan, H. Shoman, D. M. Ratner, K. C. Cheung, and L. Chrostowski, “Silicon photonic biosensors using label-free detection,” Sensors 18(10), 3519 (2018). [CrossRef]  

5. S. Molesky, Z. Lin, A. Y. Piggott, W. Jin, J. Vucković, and A. W. Rodriguez, “Inverse design in nanophotonics,” Nat. Photonics 12(11), 659–670 (2018). [CrossRef]  

6. J. Huang, H. Ma, D. Chen, H. Yuan, J. Zhang, Z. Li, J. Han, J. Wu, and J. Yang, “Digital nanophotonics: the highway to the integration of subwavelength-scale photonics,” Nanophotonics 10(3), 1011–1030 (2021). [CrossRef]  

7. S. Mao, L. Cheng, C. Zhao, F. N. Khan, Q. Li, and H. Y. Fu, “Inverse design for silicon photonics: from iterative optimization algorithms to deep neural networks,” Appl. Sci. 11(9), 3822 (2021). [CrossRef]  

8. H. Ying, M. Y. Zhu, J. Zhang, S. Sygletos, F. Li, X. T. Huang, Y. Jiang, X. W. Yi, and K. Qiu, “Complex neural network equalization of optical SSB PAM-4 signal in direct-detection Kramers-Kronig receiver,” in Conference on Lasers and Electro-Optics (CLEO) (IEEE, 2018), 1–2.

9. A. Pepe, Z. Wei, and H. Y. Fu, “Heuristic, machine learning approach to 8-CSK decision regions in RGB-LED visible light communication,” OSA Continuum 3(3), 473–482 (2020). [CrossRef]  

10. D. Piccinotti, K. F. MacDonald, S. A. Gregory, I. Youngs, and N. I. Zheludev, “Artificial intelligence for photonics and photonic materials,” Rep. Prog. Phys. 84(1), 012401 (2021). [CrossRef]  

11. J. Fang, A. Swain, R. Unni, and Y. Zheng, “Decoding optical data with machine learning,” Laser Photonics Rev. 15(2), 2000422 (2021). [CrossRef]  

12. G. Genty, L. Salmela, J. M. Dudley, D. Brunner, A. Kokhanovskiy, S. Kobtsev, and S. K. Turitsyn, “Machine learning and applications in ultrafast photonics,” Nat. Photonics 15(2), 91–101 (2021). [CrossRef]  

13. Q. Liu, B. Gily, and M. P. Fok, “Adaptive Photonic Microwave Instantaneous Frequency Estimation Using Machine Learning,” IEEE Photonics Technol. Lett. 33(24), 1511–1514 (2021). [CrossRef]  

14. A. Venketeswaran, N. Lalam, J. Wuenschell, P. R. Ohodnicki Jr, M. Badar, K. P. Chen, P. Lu, Y. Duan, B. Chorpening, and M. Buric, “Recent advances in machine learning for fiber optic sensor applications,” Adv. Intell. Syst. 4(1), 2100067 (2022). [CrossRef]  

15. Z. Liu, D. Zhu, L. Raju, and W. Cai, “Tackling photonic inverse design with machine learning,” Adv. Sci. 8(5), 2002923 (2021). [CrossRef]  

16. S. So, T. Badloe, J. Noh, J. Bravo-Abad, and J. Rho, “Deep learning enabled inverse design in nanophotonics,” Nanophotonics 9(5), 1041–1057 (2020). [CrossRef]  

17. N. Wang, W. Yan, Y. Qu, S. Ma, S. Z. Li, and M. Qiu, “Intelligent designs in nanophotonics: from optimization towards inverse creation,” PhotoniX 2(1), 1–35 (2021). [CrossRef]  

18. P. R. Wiecha, A. Arbouet, C. Girard, and O. L. Muskens, “Deep learning in nano-photonics: inverse design and beyond,” Photonics Res. 9(5), B182–200 (2021). [CrossRef]  

19. D. Gostimirovic and W. N. Ye, “An open-source artificial neural network model for polarization-insensitive silicon-on-insulator subwavelength grating couplers,” IEEE J. Sel. Top. Quantum Electron. 25(3), 1–5 (2019). [CrossRef]  

20. B. Hu, B. Wu, D. Tan, J. Xu, and Y. Chen, “Robust inverse-design of scattering spectrum in core-shell structure using modified denoising autoencoder neural network,” Opt. Express 27(25), 36276–36285 (2019). [CrossRef]  

21. M. H. Tahersima, K. Kojima, T. Koike-Akino, D. Jha, B. Wang, C. Lin, and K. Parsons, “Deep neural network inverse design of integrated photonic power splitters,” Sci. Rep. 9(1), 1368 (2019). [CrossRef]  

22. Y. Tang, K. Kojima, T. Koike-Akino, Y. Wang, P. Wu, Y. Xie, M. H. Tahersima, D. K. Jha, K. Parsons, and M. Qi, “Generative deep learning model for inverse design of integrated nanophotonic devices,” Laser Photon. Rev. 14(12), 2000287 (2020). [CrossRef]  

23. M. H. Tahersima, K. Kojima, T. Koike-Akino, D. Jha, B. Wang, C. Lin, and K. Parsons, “Deep neural network inverse modeling for integrated photonics,” in Optical Fiber Communication Conference (OFC) (2019), paper W3B.5.

24. N. J. Dinsdale, P. R. Wiecha, M. Delaney, J. Reynolds, M. Ebert, I. Zeimpekis, D. J. Thomson, G. T. Reed, P. Lalanne, K. Vynck, and O. L. Muskens, “Deep learning enabled design of complex transmission matrices for universal optical components,” ACS Photonics 8(1), 283–295 (2021). [CrossRef]  

25. K. Kojima, M. H. Tahersima, T. Koike-Akino, D. K. Jha, Y. Tang, Y. Wang, and K. Parsons, “Deep neural networks for inverse design of nanophotonic devices,” J. Lightwave Technol. 39(4), 1010–1019 (2021). [CrossRef]  

26. Q. Wang, M. Makarenko, A. Burguete Lopez, F. Getman, and A. Fratalocchi, “Advancing statistical learning and artificial intelligence in nanophotonics inverse design,” Nanophotonics 0 (2021).

27. H. Wankerl, M. L. Stern, A. Mahdavi, C. Eichler, and E. W. Lang, “Parameterized reinforcement learning for optical system optimization,” J. Phys. D: Appl. Phys. 54(30), 305104 (2021). [CrossRef]  

28. Y. Xu, X. Zhang, Y. Fu, and Y. Liu, “Interfacing photonics with artificial intelligence: an innovative design strategy for photonic structures and devices based on artificial neural networks,” Photonics Res. 9(4), B135–152 (2021). [CrossRef]  

29. D. Liu, Y. Tan, E. Khoram, and Z. Yu, “Training deep neural networks for the inverse design of nanophotonic structures,” ACS Photonics 5(4), 1365–1369 (2018). [CrossRef]  

30. Z. Zhen, C. Qian, Y. Jia, Z. Fan, R. Hao, T. Cai, B. Zheng, H. Chen, and E. Li, “Realizing transmitted metasurface cloak by a tandem neural network,” Photonics Res. 9(5), B229–235 (2021). [CrossRef]  

31. L. Gao, X. Li, D. Liu, L. Wang, and Z. Yu, “A bidirectional deep neural network for accurate silicon color design,” Adv. Mater. 31(51), 1905467 (2019). [CrossRef]  

32. C. Qiu, X. Wu, Z. Luo, H. Yang, G. Wang, N. Liu, and B. Huang, “Simultaneous inverse design continuous and discrete parameters of nanophotonic structures via back-propagation inverse neural network,” Opt. Commun. 483, 126641 (2021). [CrossRef]  

33. J. Trisno, H. Wang, H. T. Wang, R. J. H. Ng, S. Daqiqeh Rezaei, and J. K. W. Yang, “Applying machine learning to the optics of dielectric nanoblobs,” Adv. Photonics Res. 1(2), 2000068 (2020). [CrossRef]  

34. L. Xu, M. Rahmani, Y. Ma, D. A. Smirnova, K. Z. Kamali, F. Deng, Y. K. Chiang, L. Huang, H. Zhang, S. Gould, D. N. Neshev, and A. E. Miroshnichenko, “Enhanced light–matter interactions in dielectric nanostructures via machine-learning approach,” Adv. Photonics 2(02), 1 (2020). [CrossRef]  

35. S. Mao, L. Cheng, F. N. Khan, Z. Geng, Q. Li, and H. Y. Fu, “Inverse design of high-dimensional nanostructured 2×2 optical processors based on deep convolutional neural networks,” J. Lightwave Technol. 40(9), 2926–2932 (2022). [CrossRef]  

36. A. Y. Piggott, J. Lu, K. G. Lagoudakis, J. Petykiewicz, T. M. Babinec, and J. Vučković, “Inverse design and demonstration of a compact and broadband on-chip wavelength demultiplexer,” Nat. Photonics 9(6), 374–377 (2015). [CrossRef]  

37. R. F. Mello and M. A. Ponti, Machine Learning: A Practical Approach on the Statistical Learning Theory (Springer, 2018).

38. N. V. Sapra, D. Vercruysse, L. Su, K. Y. Yang, J. Skarda, A. Y. Piggott, and J. Vučković, “Inverse design and demonstration of broadband grating couplers,” IEEE J. Sel. Top. Quantum Electron. 25(3), 1–7 (2019). [CrossRef]  

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.


Figures (5)

Fig. 1. (a) Schematic diagram of the 1×2 wavelength demultiplexer. The states of the pixels set the topology pattern of the multimode region, which changes the optical spectra at ports 1 and 2 and the reflection (R). (b) Flowchart of the data collection process based on a direct binary search algorithm. (c) The overall architecture of the PTCN and SG models.
Fig. 2. Training and validation loss and accuracy as functions of the epoch number for (a) the conventional tandem network and (b) the proposed PTCN model. Training loss: green line; Validation loss: red line; Training accuracy: orange line; Validation accuracy: purple line.
Fig. 3. (a) The correlation coefficient between the target and predicted CWP values obtained via the PTCN (blue) and conventional tandem (yellow) models at different training data sizes. (b) MAEs of the CWP values between the target and predictions obtained via the conventional tandem (yellow) and PTCN (blue) models at different training data sizes. (c) The target and predicted CWP values based on the full dataset of 48000 samples. Target values: red dashed line; PTCN: blue dots; Conventional tandem: yellow dots. (d) CWP mismatch versus the target CWP obtained via the PTCN (blue dots) and conventional tandem (yellow dots) models. (e, f) Demonstration of the predicted wavelength demultiplexers with the target CWPs of 1521 nm (e) and 1566 nm (f), where the insets show the topology patterns predicted by the PTCN and conventional tandem models. The curves show the optical transmission spectra at the two output ports and the reflection spectrum, simulated via 3D-FDTD for the topology patterns predicted by the PTCN and conventional tandem models. PTCN port 1: blue solid line; Conventional tandem port 1: red solid line; PTCN port 2: blue dashed line; Conventional tandem port 2: red dashed line; PTCN reflection: green solid line; Conventional tandem reflection: green dashed line.
Fig. 4. Five example wavelength demultiplexers with splitting wavelengths of 1530 nm, 1550 nm, 1560 nm, 1570 nm, and 1590 nm, labeled (i) to (v). (a) The wavelength demultiplexer structure patterns generated by the PTCN model for the different splitting wavelengths (silicon shown in grey and air holes shown in white). (b) 3D-FDTD simulated transmission spectra at the two output ports and the reflection spectrum. (c) The E-field intensity of the predicted structures at the output of port 1 and different CWPs: (i) 1460 nm (ii) 1470 nm (iii) 1480 nm (iv) 1510 nm (v) 1510 nm. (d) The E-field intensity of the predicted structures at the output of port 2 and different CWPs: (i) 1600 nm (ii) 1630 nm (iii) 1630 nm (iv) 1630 nm (v) 1650 nm.
Fig. 5. Transmission efficiency of the wavelength demultiplexer with target CWPs of (a) 1550 nm and (b) 1560 nm. The transmission efficiencies at port 1 and port 2 of the device and the reflection are shown in blue, red, and green, respectively. The solid lines indicate the simulation results, while the dashed lines display the measured values.

Equations (5)

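$$ L_{MSE} = \frac{1}{3m} \sum_{i=1}^{3m} \left( N_i^{s} - N_i^{t} \right)^2 $$
$$ L_{BCE} = -\frac{1}{N} \left[ \sum_{i=1}^{N} P_i \log\left(P_i'\right) + \left(1 - P_i\right) \log\left(1 - P_i'\right) \right] $$
$$ L_{MAE} = \frac{1}{3m} \sum_{i=1}^{3m} \left| T_i - T_i' \right| $$
$$ L_{total} = L_{BCE} + \theta L_{MAE} $$
$$ y = \frac{1}{1 + e^{-\beta (x - \alpha)}} $$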
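
The loss terms and the sigmoid listed above can be sketched numerically as follows. This is a minimal NumPy illustration, not the authors' implementation: the function and argument names are hypothetical, the second symbol in each pair (for example the predicted pixel state or predicted transmission) is assumed to be the network output, L_BCE is assumed to be the standard binary cross-entropy averaged over N pixels, and the clipping constant eps is added only to avoid log(0). How the sigmoid is used inside the network is not stated in this listing, so it is shown only as a standalone function.

```python
import numpy as np


def mse_loss(n_s, n_t):
    """L_MSE: mean squared difference over all 3m entries."""
    n_s = np.asarray(n_s, dtype=float)
    n_t = np.asarray(n_t, dtype=float)
    return float(np.mean((n_s - n_t) ** 2))


def bce_loss(p_target, p_pred, eps=1e-12):
    """L_BCE: binary cross-entropy averaged over N pixel states.

    eps is an assumption of this sketch; it keeps log() finite.
    """
    p_target = np.asarray(p_target, dtype=float)
    p_pred = np.clip(np.asarray(p_pred, dtype=float), eps, 1.0 - eps)
    terms = p_target * np.log(p_pred) + (1.0 - p_target) * np.log(1.0 - p_pred)
    return float(-np.mean(terms))


def mae_loss(t_target, t_pred):
    """L_MAE: mean absolute difference over all 3m spectral points."""
    t_target = np.asarray(t_target, dtype=float)
    t_pred = np.asarray(t_pred, dtype=float)
    return float(np.mean(np.abs(t_target - t_pred)))


def total_loss(p_target, p_pred, t_target, t_pred, theta):
    """L_total = L_BCE + theta * L_MAE."""
    return bce_loss(p_target, p_pred) + theta * mae_loss(t_target, t_pred)


def shifted_sigmoid(x, alpha, beta):
    """y = 1 / (1 + exp(-beta * (x - alpha)))."""
    x = np.asarray(x, dtype=float)
    return 1.0 / (1.0 + np.exp(-beta * (x - alpha)))
```

In this sketch theta weights the spectral error against the pixel-state error, alpha sets the midpoint of the sigmoid (where y = 0.5), and a larger beta makes the transition around that midpoint steeper.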
© Copyright 2024 | Optica Publishing Group. All rights reserved, including rights for text and data mining and training of artificial technologies or similar technologies.