
C-DONN: compact diffractive optical neural network with deep learning regression

Open Access

Abstract

A new method to improve the integration level of an on-chip diffractive optical neural network (DONN) is proposed based on a standard silicon-on-insulator (SOI) platform. The metaline, which represents a hidden layer in the integrated on-chip DONN, is composed of subwavelength silica slots, providing a large computation capacity. However, the physical propagation of light through the subwavelength metalines generally requires an approximate characterization using slot groups and extra length between adjacent layers, which limits further improvement of the integration of on-chip DONNs. In this work, a deep mapping regression model (DMRM) is proposed to characterize the process of light propagation in the metalines. This method improves the integration level of the on-chip DONN to over 60,000 neurons per square millimeter and eliminates the need for approximate conditions. Based on this theory, a compact DONN (C-DONN) is developed and benchmarked on the Iris plants dataset to verify its performance, yielding a testing accuracy of 93.3%. This method provides a potential solution for future large-scale on-chip integration.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Concomitant with the substantial development of deep neural networks (DNNs) [1,2], many complex tasks, including computer vision [3,4], natural language processing [5,6], and medical science [7-9], can be processed efficiently. Advanced DNN algorithms place rigorous demands on computing platforms [10-12] such as central processing units (CPUs), graphics processing units (GPUs), and field-programmable gate arrays (FPGAs), which are responsible for massive data throughput and computation. However, processing DNNs requires substantial computing resources and energy when facing complex problems. Therefore, a neural network implementation with higher speed and lower power is urgently required. Optical neural networks (ONNs) have become a substitute for their electronic counterparts in complicated tasks due to their low power consumption, ultra-broad bandwidth, and light-speed parallel processing. Different kinds of ONNs have been proposed, including artificial neural computing using a nanophotonic neuron medium [13], integrated spiking neural networks using phase-change materials (PCMs) [14], a coherent approach based on integrated Mach-Zehnder interferometers (MZIs) [15-19], and other structures such as ring modulator arrays [20-23]. Nonetheless, the computing scale of these architectures is limited by computing density and energy consumption. For instance, the footprints of the computing units in Refs. [14] and [15] are approximately $285 \times 250\ \mathrm{\mu m^2}$ and $55 \times 220\ \mathrm{\mu m^2}$, respectively. Each unit in these cases requires a continuous energy supply, which restricts large-scale expansion.

Free-space diffractive optical neural networks (DONNs) [24-29] have gained increasing attention for their superior computational capability and passive nature, processing data using the inherent parallelism of optics. DONNs have the advantage of mapping a greater number of neurons and connections onto optics compared with other ONNs. However, free-space DONNs are based on discrete diffractive units with a relatively large volume, posing integration challenges for compact systems. In addition, their complex calibration processes can introduce additional errors. In previous works [30,31], we demonstrated an integrated diffractive optical neural network on a silicon-on-insulator (SOI) platform, leveraging metasurface technologies. This architecture uses one-dimensional metasurfaces to offer enhanced computational ability and to accommodate a larger number of neurons and connections within the optical parameters. Nevertheless, current on-chip DONNs [30-37] require a group of silica slots to represent a neuron and need extra length between adjacent hidden layers due to approximation rules. These constraints limit the integration level.

In this study, we take one step forward by proposing a novel neuron mapping method, eliminating the need for approximation techniques and enhancing the level of integration. Given the complexity of numerically analyzing the change in effective refractive index (ERI) for identical slots, we have developed a deep mapping regression model (DMRM) for our neuron mapping rule. This model characterizes the propagation of light within the metalines. By gathering a substantial amount of data, the DMRM is trained to approximate the intricate Maxwell interaction within each hidden layer [38]. We then employ the particle swarm optimization (PSO) algorithm [39] to search for the optimal neuron parameters in the hidden layers. To validate the effectiveness of our compact DONN (C-DONN), we benchmark its classification performance on the Iris dataset [40] and employ a 2.5D variational finite-difference time-domain (var-FDTD) solver to gather training data for the DMRM and to verify the overall diffractive metasystem. Our C-DONN achieves a test accuracy of 93.3% as verified by var-FDTD. The proposed C-DONN offers comparable performance with a significantly higher integration level and a simpler structure, presenting promising potential for large-scale computing applications.

2. Structure of C-DONN

The C-DONNs designed herein comprise one or more metalines, forming an entirely optical physical network. This network can tackle complex tasks through the interference of transmitted light. In contrast to conventional artificial neural networks, the novel characteristic here is the composition of the hidden layer, made up of silica slots. Each slot's length, while maintaining a fixed width and thickness, acts as a trainable parameter. Each such parameter is defined as an individual neuron, correlating to the changes in the optical field after propagation through the metalines. This study treats each slot as an independent diffractive unit. As light propagates through these slots, the length variation of the slots allows for control over different diffraction outputs, thereby providing solutions for corresponding complex tasks.

When the parameters of neurons in the hidden layers of the C-DONN are optimized, these parameters can be mapped onto the physical structure (length of slots) of the C-DONN. The schematic diagram of our proposed C-DONN based on an SOI platform is shown in Fig. 1.


Fig. 1. Structure of proposed C-DONN. The DONN consists of 2 hidden layers (metalines), 4 input waveguides, and 3 detectors. Each hidden layer contains 50 slots ($Si{O_2}$). The length of the slab ($Si$) waveguide is 53 µm and the width is 30 µm. There are 2 layers of cladding ($Si{O_2}$) at the top and bottom of the waveguide layer. The thickness of each slot is n, the width is m, the max length is p, and the distance between adjacent slots is q. Once the length of slots is optimized, the DONN can be fabricated by microelectronic processes.


In the optimized C-DONN configuration, each metaline, measuring 30 µm in width (W), is treated as a hidden layer and distributed along the Y-axis. Each metaline includes 50 neurons, corresponding to 50 slots. As depicted in Fig. 1, the length and width of a metaline are set at 3 µm and 30 µm, respectively. Each slot is characterized by a thickness (n) of 220 nm and a width (m) of 200 nm. The slots are spaced 300 nm apart (q), and the maximum length of each slot (p) equals the metaline's length (i.e., 3 µm). The output layer is defined by three detector regions, labelled “D1”, “D2”, and “D3”, aligned in a linear configuration. Each detector spans a width of 5 µm, with a center-to-center separation of 7 µm between adjacent detectors. These detectors correspond to specific classification categories, with the classification based on the optical field intensity detected within each respective detector region.

3. Modeling of C-DONN

3.1 Deep mapping regression model

In our previous studies [30,31], the process of neuron mapping was typically reduced to the phase alteration of the optical field produced by the length variations of the silica slots. However, the relationship between the optical field and these slots is governed by a complex function that is challenging to express directly in mathematical form, so we used a group of slots to represent a single neuron. We also increased the layer spacing to mitigate the impact of the incident angle of light, thereby stabilizing the effective refractive index (ERI) and facilitating a more precise mathematical analysis. In the present research, we employ the DMRM to approximate the intricate function representing light propagation within a line of slots, referred to as a metaline.

Assume that the width, thickness, and lattice constant of the slots are fixed and that their lengths are collected in the vector ${L_{slot}}$. The maximum length of each slot is L. The starting position of a metaline is ${x_0}$, and the electric field distribution of the optical field at ${x_0}$ is ${E_{{x_0}}}$, so the field distribution of the optical field at $({x_0} + L)$ µm can be calculated according to Eq. (1):

$${E_{{x_0} + L}} = F({E_{{x_\textrm{0}}}},\textrm{ }{L_{slot}})$$
where F is the complex function describing light propagation in the metaline. When the width, thickness, and lattice constant of the slots are fixed, the optical field succeeding the metaline, ${E_{{x_0} + L}}$, depends only on the optical field preceding the metaline and the lengths of the slots. Therefore, it is only necessary to find the mapping function F; given the slot lengths in the metaline and the optical field preceding it, the optical field succeeding the metaline can then be calculated quickly. With this method, the neuron mapping process can be represented by a neural network, which no longer requires slot groups and is valid at any layer spacing.

In pursuit of the mapping function F, the approach involves using the DMRM to approximate the light propagation process within the metaline. For this work, we employ var-FDTD to generate both the training and testing datasets. Figure 2(a) illustrates the simulation diagram for the var-FDTD. The input component consists of four waveguides, with light propagating via the SOI platform to the input of the hidden layer, the metaline. Each metaline is defined by a specific number of slots, corresponding to a particular number of neuron parameters. The training set is acquired by varying the input while randomly generating slot lengths. Specifically, the distance between the input waveguides and the metaline is set at 15 µm, the metaline contains 50 slots, and the maximum length of each slot is 3 µm. Under these parameter configurations, var-FDTD is employed to compute and generate 72,000 samples, which are used to train and validate the DMRM. Each sample encompasses the slot lengths, the complex amplitude of the electric field in the y-direction (${E_y}$) of the input optical field preceding the metaline, and the ${E_y}$ of the output optical field succeeding the metaline.


Fig. 2. (a) Simulation diagram of var-FDTD for acquiring training data. The distance between the slots and the input source is fixed at 15 µm. Monitor 1 collects the ${E_y}$ of the optical field preceding the metaline, and monitor 2 collects the ${E_y}$ of the optical field succeeding the metaline. (b) The process of collecting training data on the CPU. The training set includes the ${E_y}$ of the input source (real and imaginary parts), the lengths of the slots (${L_{slot}}$), and the ${E_y}$ of the output (real and imaginary parts). (c) The process of DMRM training on the GPU. The input data include the input ${E_y}$ and ${L_{slot}}$, which are fed into separate fully connected layers. The two parts are then concatenated and passed to further fully connected layers. The output layer is defined as the output ${E_y}$, and the overall model plays the role of the mapping function F.


The structure of the DMRM is fundamentally a deep neural network tailored for regression tasks. Unlike standard neural networks, the parameters in this network are entirely complex-valued. We use the ComplexPyTorch library [41,42] within the PyTorch platform to build the model's architecture because of its capability to handle complex-valued data. Given the complex-valued nature of the optical field information, this structure facilitates a more precise representation of the interaction relationships, thereby better approximating the light propagation within the metaline. The input layer of the DMRM comprises two components. The first is the ${E_y}$ field of the light preceding the metaline (673 dimensions in this study, determined by the designed structure and simulation accuracy), while the second is the lengths of the slots (50 dimensions in this study, corresponding to the designed structure). Because the ${E_y}$ field and the slot lengths do not represent the same physical quantity, each part is fed into its own fully connected layer, followed by a concatenation layer. After concatenation, the subsequent deep fully connected layers contain more neurons and layers. The output layer is defined as the ${E_y}$ field of the light succeeding the metaline (673 dimensions). We employ the complex $ReLU$ function as the activation function after each layer for optimal fitting. Each part of the network includes a real processing segment and an imaginary processing segment, effectively doubling the number of parameters. The network structure of the DMRM is illustrated in Fig. 2(c).
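For concreteness, the sketch below shows how such a two-branch, complex-valued regression network could be assembled with complexPyTorch layers. The hidden-layer width and depth, and the casting of the real-valued slot lengths to complex tensors, are illustrative assumptions rather than the exact architecture used in this work.

```python
# Minimal sketch of a DMRM-style complex-valued regression network.
# Hidden widths/depth and the complexPyTorch layer names (ComplexLinear,
# ComplexReLU) are assumptions for illustration, not the exact design.
import torch
import torch.nn as nn
from complexPyTorch.complexLayers import ComplexLinear, ComplexReLU


class DMRMSketch(nn.Module):
    def __init__(self, field_dim=673, slot_dim=50, hidden=1024):
        super().__init__()
        # The Ey field and the slot lengths are different physical quantities,
        # so each gets its own fully connected branch before concatenation.
        self.field_branch = ComplexLinear(field_dim, hidden)
        self.slot_branch = ComplexLinear(slot_dim, hidden)
        self.act = ComplexReLU()
        self.trunk = nn.Sequential(
            ComplexLinear(2 * hidden, hidden), ComplexReLU(),
            ComplexLinear(hidden, hidden), ComplexReLU(),
            ComplexLinear(hidden, field_dim),  # Ey succeeding the metaline
        )

    def forward(self, e_in, l_slot):
        # e_in:   (batch, 673) complex Ey preceding the metaline
        # l_slot: (batch, 50) real slot lengths, cast to complex for the network
        a = self.act(self.field_branch(e_in))
        b = self.act(self.slot_branch(l_slot.to(torch.complex64)))
        return self.trunk(torch.cat([a, b], dim=-1))
```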

As for network training, Figs. 2(b) and 2(c) depict the entire training process: Fig. 2(b) shows the data collection process on the CPU, while Fig. 2(c) details the DMRM training process on the GPU. The input ${E_y}$ and ${L_{slot}}$ constitute the aforementioned two-part input, with the output ${E_y}$ serving as the ground truth. Training uses the Adam optimizer and the gradient descent algorithm. The DMRM can be accessed at https://github.com/Liuwc01/Deep-Mapping-Regression-Model.

Assuming that the data size is N, in the training process, complex-valued mean square error (CMSE) is used as the loss function, which is defined as Eq. (2):

$$CMSE(x,y) = \frac{1}{N}\sum\limits_N {[{{({x_{real}} - {y_{real}})}^2} + {{({x_{imag}} - {y_{imag}})}^2}]}$$
where x is the output of the DMRM and y is the ground truth from the var-FDTD simulation. Since x and y are both complex-valued, ${x_{real}}$ and ${x_{imag}}$ are the real and imaginary parts of x, and likewise for ${y_{real}}$ and ${y_{imag}}$. The loss curves of the training set and validation set are shown in Fig. 3(a).
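As a concrete reference, a minimal implementation of Eq. (2) in PyTorch could look as follows; the batch size is arbitrary, and during training the returned loss would be back-propagated through the DMRM with the Adam optimizer as described above.

```python
import torch

def cmse(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Eq. (2): mean of squared real- and imaginary-part errors.
    return torch.mean((x.real - y.real) ** 2 + (x.imag - y.imag) ** 2)

# Dummy usage with 673-dimensional complex fields (batch of 8 samples):
x = torch.randn(8, 673, dtype=torch.complex64)  # DMRM prediction
y = torch.randn(8, 673, dtype=torch.complex64)  # var-FDTD ground truth
loss = cmse(x, y)                               # scalar; call loss.backward() in training
```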

Goodness of fit (${R^2}$) is chosen as the measure of fitting effect of the DMRM, which can be calculated as Eq. (3):

$$R^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS} = \frac{1}{N}\sum\limits_N {\frac{\sum\nolimits_{i = 1}^n (\widehat{y_i} - \bar{y}_i)^2}{\sum\nolimits_{i = 1}^n (y_i - \bar{y}_i)^2}}$$
where ${y_i}$ is the ground truth, $\widehat {{y_i}}$ is the DMRM output, and ${\bar{y}_i}$ is the average value of ${y_i}$. Both the real and imaginary parts of the predicted optical field and the ground truth are considered. The goodness-of-fit (${R^2}$) curves of the validation set are shown in Fig. 3(b).
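A short sketch of how the goodness of fit can be evaluated is given below; following the text, it would be applied separately to the real and imaginary parts of the predicted field, and the 1 - RSS/TSS form of Eq. (3) is used.

```python
import numpy as np

def r_squared(y_pred: np.ndarray, y_true: np.ndarray) -> float:
    # 1 - RSS/TSS, one of the equivalent forms in Eq. (3).
    rss = np.sum((y_true - y_pred) ** 2)
    tss = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - rss / tss

# Apply to real and imaginary parts separately, as in Fig. 3(b):
# r2_real = r_squared(pred.real, truth.real); r2_imag = r_squared(pred.imag, truth.imag)
```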


Fig. 3. (a) The loss curves on the training set (black line) and validation set (red line) for the DMRM during the fitting procedure. (b) The goodness of fit (${R^2}$) curves on the real part (black line) and imaginary part (red line) of the validation set during the learning procedure.


For a specific design, we manipulate both the learning rate and the number of neurons per layer to conduct a hyperparameter search aimed at minimizing validation error. Drawing from our experience, the selection of hyperparameters has a strong correlation with the size of the training set. Given the size of the training set in this scenario, the hyperparameters for training are established prior to the final training phase. We have observed that minor structural modifications to the model do not significantly impact the CMSE.

Upon the conclusion of the training phase, the DMRM weights are fixed and stored in a file, enabling convenient retrieval and utilization. Subsequently, we employ the var-FDTD method to generate multiple test sets for assessing the DMRM's performance. During this procedure, the DMRM's input consists of several sets of ${E_y}$ values from the input optical fields and corresponding slot lengths. By comparing the ${E_y}$ values of the output optical field from both DMRM and var-FDTD simulations, we can intuitively observe the approximation effect and evaluate the DMRM's performance.

Figure 4 presents a comparative analysis of the DMRM output and the var-FDTD simulation results for a randomly selected group of test data. As seen in Fig. 4(a), a substantial discrepancy exists between the DMRM predictions and the simulation results before training. However, Fig. 4(b) shows that the post-training DMRM predictions closely align with the simulation results. This high degree of similarity, especially evident in the overall trend, peaks, and valleys of the waveform, provides substantial evidence of the effectiveness of the DMRM.


Fig. 4. (a) The comparison between the DMRM output and the var-FDTD simulation results before training for the same sample in the testing set. (b) The comparison between the DMRM output and the var-FDTD simulation results after training for the same sample in the testing set. The DMRM is effective and matches well in the trend, peaks, and valleys of the waveform.


For the above reasons, the neuron mapping approximation method proposed herein is found to be effective. Specifically, once the number of neurons and the chip design size are determined, the mapping process does not require additional approximation conditions; instead, it is approximated via the regression network of the DMRM. This approach achieves high accuracy within the acceptable error range, as demonstrated by the tests above. Without loss of generality, this neuron mapping rule can be applied to the design of any hidden layer within the neural network system. Consequently, the size parameters of the C-DONN can be freely designed to accommodate specific requirements.

3.2 On-chip electromagnetic propagation model

Building on the DMRM and the C-DONN structure described above, we propose an on-chip electromagnetic propagation model (OEPM) based on the DMRM and the Huygens-Fresnel principle, modified according to the restricted propagation conditions. The propagation of the optical field between adjacent layers of the C-DONN can be described by Eq. (4), which was verified in our previous works [30,31]:

$$a(m,n) = \frac{1}{{j\lambda }} \cdot (\frac{{1 + \cos \theta }}{{2r}}) \cdot \exp (j\frac{{2\pi r{n_{slab}}}}{\lambda }) \cdot \gamma \exp (j\Delta \phi )$$
where m represents the point on the chip located at $({{x_m},{y_m}} )$, n represents the point on the chip located at $({{x_n},{y_n}} )$, $\lambda$ is the working wavelength, ${n_{slab}}$ is the effective refractive index (ERI) of the slab waveguide, $\cos\theta = ({x_n} - {x_m})/r$, $r = \sqrt{({x_n} - {x_m})^2 + ({y_n} - {y_m})^2}$ is the distance between point m and point n on the chip, $\gamma$ is a specific amplitude coefficient, and $\Delta \phi$ is a fixed phase delay relative to the classical Huygens-Fresnel principle when light propagates a certain distance in a slab waveguide.

Utilizing the DMRM for propagation within the hidden layers and Eq. (4) for propagation between every two successive layers allows for a comprehensive description of the on-chip propagation of light, as depicted in Fig. 5(a). Given the resolution of the optical field simulation (a 673-dimensional vector), any x plane in the C-DONN can, without loss of accuracy, be regarded as a secondary source represented by a 673-dimensional wave vector. Therefore, for two x planes at a fixed distance, the propagation can be expressed as a propagation matrix A (673 by 673). The elements of this matrix are given by Eq. (4) and describe the optical field propagated from the i-th pixel in the ${x_1}$ plane to the j-th pixel in the ${x_2}$ plane. Figure 5 illustrates the entire on-chip propagation process, which can be expressed as Eq. (5):

$$\left\{ {\begin{array}{{c}} {{E_y}({x_2}) = {E_y}({x_1}) \cdot A({x_2} - {x_1})}\\ {{A_{i,j}} = a(i,j)}\\ {{E_y}({x_3}) = F[{E_y}({x_2}),\textrm{ }{L_{slot}}]} \end{array}} \right.$$
where ${E_y}(x )$ is a row vector representing the electric field at plane $x$, A is the propagation matrix, with correction factors, for the distance from ${x_1}$ to ${x_2}$, and $a({i,j} )$ is given by Eq. (4). F is the function realized by the DMRM, and ${L_{slot}}$ denotes the lengths of the slots in the metaline. Figure 5(b) shows the network abstraction of the forward propagation process. The slab propagation described by matrix A and the propagation within each metaline are combined through Eq. (5), which together represent the overall on-chip propagation process.
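The sketch below assembles the propagation matrix A of Eq. (4) and chains it with the metaline mapping F of Eq. (5). The pixel pitch (taken as 30 µm / 673 pixels), the wavelength, n_slab, γ, and Δφ are placeholder values, and `dmrm_F` stands for any callable realizing the trained mapping F.

```python
import numpy as np

def propagation_matrix(dx_um, n_pix=673, pitch_um=30.0 / 673,
                       wavelength_um=1.55, n_slab=2.84, gamma=1.0, dphi=0.0):
    """Eq. (4): element A[i, j] propagates pixel i of plane x1 to pixel j of plane x2."""
    y = (np.arange(n_pix) - n_pix / 2) * pitch_um        # transverse pixel positions
    dy = y[None, :] - y[:, None]                         # y_n - y_m for all pixel pairs
    r = np.sqrt(dx_um ** 2 + dy ** 2)                    # distance between pixels m and n
    cos_theta = dx_um / r
    return (1.0 / (1j * wavelength_um)) * (1.0 + cos_theta) / (2.0 * r) \
        * np.exp(1j * 2.0 * np.pi * r * n_slab / wavelength_um) \
        * gamma * np.exp(1j * dphi)

def forward_one_block(e_x1, dx_um, l_slot, dmrm_F):
    """Eq. (5): slab propagation over dx_um, then the metaline handled by F."""
    e_x2 = e_x1 @ propagation_matrix(dx_um)              # E_y(x2) = E_y(x1) . A
    return dmrm_F(e_x2, l_slot)                          # E_y(x3) = F[E_y(x2), L_slot]
```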


Fig. 5. (a) Schematic of the proposed C-DONN; each point on a given layer acts as a secondary source of a wave, and the on-chip propagation in the different regions can be calculated by matrix A and function F, respectively. (b) Network abstraction of the forward propagation. Red circles represent the input source and output detectors. Blue circles represent the ${E_y}$ of the optical field preceding or succeeding each metaline.


To substantiate the effectiveness of the OEPM, Fig. 6 displays the magnitude of the optical field distribution, as simulated by the var-FDTD method and calculated by the OEPM, on the output surfaces of the first and second hidden layers, as well as the detection surface. This reveals a strong alignment between the OEPM calculation results and the var-FDTD simulation outcomes, thereby supporting the efficacy of the OEPM.


Fig. 6. The simulation results and comparisons of the var-FDTD simulation and the OEPM calculation for the proposed C-DONN. (a), (b), and (c) show the positions of the monitors and the comparisons of the magnitude of the optical field on the monitoring surfaces, as simulated by var-FDTD and calculated by the OEPM, respectively.


4. Verification of C-DONNs on Iris dataset

4.1 Parameters optimization algorithm

Upon establishing the on-chip architectural design, the parameters can be optimized. In this study, the parameters to be optimized are the ${L_{slot}}$ values in function F of Eq. (5), analogous to the neurons of the network. We employ the PSO algorithm [39] for parameter optimization, adjusting the fitness value (FV) and the optimization process to avoid premature convergence. Assume a particle swarm comprising M particles seeking the optimal position within an N-dimensional space, with each particle assigned a ‘position’ denoted as ${x_i}$. For every particle, this position represents a potential solution to the problem. The FV is obtained by substituting ${x_i}$ into the objective function, and each particle is evaluated according to its FV. Throughout each search, we record the optimal position of every particle. After each iteration, the best among the optimal positions of all particles is taken as the optimal position of the entire swarm. This search process is repeated until a predefined number of iterations is reached. After each positional search, the speed ${v_i}$ and position ${x_i}$ of each particle are updated. This update can be calculated according to Eq. (6):

$$\left\{ {\begin{array}{{c}} {v_i^d = wv_i^d + {c_1}{r_1}(p_i^d - x_i^d) + {c_2}{r_2}(p_g^d - x_i^d)}\\ {x_i^d = x_i^d + \alpha v_i^d} \end{array}} \right.$$
where $i = 1,2,\ldots ,\; M$ and $d = 1,2,\ldots ,\; N$. w is a non-negative inertia factor, which is significant for the convergence of the algorithm; the larger its value, the wider the range over which particles leap, making it easier to find the global optimum. ${p_i}$ and ${p_g}$ are the local and global optimal positions, respectively. The acceleration constants ${c_1}$ and ${c_2}$ are also non-negative and adjust the weights of the local and global optimal values. ${r_1}$ and ${r_2}$ are random numbers in the range [0, 1], and $\alpha$ is a constraint factor that controls the weight of the speed.
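A compact sketch of the update rule of Eq. (6) for slot-length particles is shown below; the inertia and acceleration constants, and the clipping of slot lengths to [0, 3] µm, are illustrative choices rather than the exact settings used in this work.

```python
import numpy as np

def pso_step(x, v, p_best, g_best, w=0.7, c1=2.0, c2=2.0, alpha=1.0):
    """One PSO iteration following Eq. (6).

    x, v, p_best: (M, N) arrays of positions, velocities, personal bests.
    g_best:       (N,) array, global best position of the swarm.
    """
    r1 = np.random.rand(*x.shape)
    r2 = np.random.rand(*x.shape)
    v = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)
    x = np.clip(x + alpha * v, 0.0, 3.0)   # keep slot lengths within [0, 3] um
    return x, v
```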

As mentioned above, in each iteration the updating of ${p_i}$ and ${p_g}$ is governed by the FVs of the particle swarm, which are critical for different tasks. In this work, the output is determined by the detector readouts corresponding to the classification categories. The intensity of the optical field at the $i$-th detector can be measured as in Eq. (7):

$${S_i} = {|{{E_y}({x_0}) \cdot A(d) \cdot S{V_i}} |^2}$$
where ${E_y}({{x_0}} )$ is a row vector representing the electric field at the output of the last hidden layer, $d$ is the distance along the x-axis from the output plane to the detectors, and $S{V_i}$ is a column vector whose entries of “0” and “1” mark the extent of the $i$-th detection area in the detection plane.

Using the intensity of the optical field as the basis for classification results, we have selected two functions to compute the FV. We initially optimize $F{V_1}$ of the classification results, proceeding to optimize $F{V_2}$ once a certain threshold is attained. This approach can circumvent ‘premature’ optimization results and, to a certain extent, prevent the particle swarm from converging to a local optimum. The fitness values can be computed using Eq. (8) and Eq. (9):

$$F{V_1} = \sum\nolimits_i {\left\{ {\begin{array}{{c}} {0\textrm{ if }{y_i} = \max [{S_i}]}\\ {1\textrm{ if }{y_i} \ne \max [{S_i}]} \end{array}} \right.}$$
$$F{V_2} = \sum\nolimits_i {\frac{{\sum {{{({S_i} - G{T_i})}^\textrm{2}}} }}{n}}$$
where ${S_i}$ represents the intensities of the optical field at the n detectors for the i-th input sample, while $G{T_i}$ denotes the ground truth in one-hot encoding. The max function returns the index of the detector with the largest intensity, which is compared with the ground-truth class index ${y_i}$. $F{V_1}$ captures the classification accuracy of a particle on the training set, and $F{V_2}$ reflects the distinction between different classification outcomes. The comprehensive optimization process is depicted in Fig. 7. Initially, the features of the Iris plants dataset are mapped onto phases and loaded onto the input source. The OEPM then performs calculations using the slot lengths and the input source to obtain the intensities at the output detectors, from which the accuracy and loss of the classification result are determined. During optimization, the set threshold is used to decide whether $F{V_1}$ or $F{V_2}$ is computed and compared. The parameters for Gbest (global optimal value) and Pbest (personal optimal value), as well as the speeds and positions of the particle swarm, are then updated. The optimized parameters are fed back into the forward propagation for the next iteration. A sketch of these quantities is given below.
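The following sketch computes the detector intensities of Eq. (7) and the two fitness values of Eqs. (8) and (9). The 0/1 detector masks, the array shapes, and the helper names are assumptions; `A_d` denotes the propagation matrix to the detection plane, built as above.

```python
import numpy as np

def detector_intensities(e_out, A_d, sv):
    # Eq. (7): S_i = |E_y(x0) . A(d) . SV_i|^2; sv has one 0/1 column per detector.
    field_at_plane = e_out @ A_d                 # propagate to the detection plane
    return np.abs(field_at_plane @ sv) ** 2      # shape: (n_samples, n_detectors)

def fv1(S, labels):
    # Eq. (8): number of misclassified samples (argmax detector vs. class index).
    return int(np.sum(np.argmax(S, axis=1) != labels))

def fv2(S, one_hot):
    # Eq. (9): summed mean-square distance between intensities and one-hot truth
    # (in practice the intensities may need scaling to be comparable to 0/1 values).
    return float(np.sum(np.sum((S - one_hot) ** 2, axis=1) / S.shape[1]))
```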

4.2 Benchmarking on Iris dataset

Leveraging the OEPM and the optimization algorithm, we employ the Iris plants dataset to validate the performance of the proposed C-DONNs. The dataset comprises 150 samples, each containing four input features that are loaded onto the corresponding input waveguides in the form of phases. Each sample belongs to one of three classes of iris plants, namely “Virginica”, “Versicolor”, and “Setosa”. The dataset is divided into a training set and a test set in a 4:1 ratio, as sketched below. The parameters of the on-chip DONN system are pre-trained; once the design and fabrication are complete, the operation of the C-DONN is fully optical.
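A minimal sketch of the data preparation is given below. The 4:1 split follows the text, whereas the scikit-learn loader, the min-max normalization, and the [0, 2π) phase range are assumptions introduced only for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# 150 samples, 4 features, 3 classes (Setosa, Versicolor, Virginica).
X, y = load_iris(return_X_y=True)

# Map each feature onto a phase; the [0, 2*pi) range and min-max scaling are assumed.
X_min, X_max = X.min(axis=0), X.max(axis=0)
phases = 2.0 * np.pi * (X - X_min) / (X_max - X_min)

# 4:1 train/test split, as stated in the text.
ph_train, ph_test, y_train, y_test = train_test_split(
    phases, y, test_size=0.2, random_state=0, stratify=y)

# Each row of ph_train gives the 4 phases loaded onto the 4 input waveguides.
```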


Fig. 7. The optimization process of the C-DONN. The features are mapped to phases. The forward propagation, computed by the OEPM, yields the detector intensities and the corresponding accuracy and loss for the optimization. The optimization process updates the parameters of the particle swarm algorithm, which are passed to the next iteration.


In this study, two configurations, C-DONN-1 and C-DONN-2, are optimized, comprising one and two hidden layers, respectively. The initial step involves modulating the input feature values onto the phase of the input light. Subsequently, the C-DONN parameters (i.e., the lengths of the silica slots) are optimized using the DMRM and the optimization algorithm. These pre-trained parameters are then mapped to silicon-based structures that vary in slot length. Importantly, no additional constraints are needed to ensure accurate neuron mapping, as the OEPM accounts for the errors in the overall propagation process. The result is a highly compact C-DONN design that significantly enhances on-chip neuron integration. In terms of numerical calculations, the classification accuracies for C-DONN-1 and C-DONN-2 are 93.3% and 96.7%, respectively. We further find that increasing the number of hidden layers does not bring a significant improvement in prediction accuracy. Therefore, considering the minimum requirements, network performance, classification accuracy, and energy consumption, it seems prudent to employ two hidden layers in the C-DONN design. Figures 8(a) and (b) display the loss and precision values of the training set during the optimization process and the confusion matrices of the testing results for C-DONN-1 and C-DONN-2.


Fig. 8. The loss curves on the training set (black line), the accuracy curves on the test set (red line), and the confusion matrices of the testing results for the optimized (a) C-DONN-1 and (b) C-DONN-2.


Next, var-FDTD is employed to assess the performance of C-DONN-2. Figure 9 presents the var-FDTD simulation results and power distributions of C-DONN-2 under three distinct input types. The optical field propagating within the silicon slab and through the two hidden layers is accurately represented and guided, eventually focusing on the correct classification area. This fulfills the requirements of the classification task.


Fig. 9. The var-FDTD simulation results and power distributions of the proposed on-chip C-DONN-2. D1, D2, and D3 show the power at the 3 detectors corresponding to “Virginica”, “Versicolor”, and “Setosa”, respectively. (a), (b), and (c) show the simulation results and power distributions under 3 different input types of iris plants, respectively.


Figure 10 displays the confusion matrices for the Iris test set for both C-DONN-1 and C-DONN-2 based on var-FDTD simulation. Comparing these results with the numerical calculation outcomes shown in Fig. 8, there is a strong correlation in terms of classification results and accuracy. The numerical calculation accuracy for the test set of C-DONN-1 and C-DONN-2 stands at 93.3% and 96.6%, respectively. In contrast, the accuracy for the var-FDTD simulation is 90.0% and 93.3%, respectively. Consequently, the agreement between the two methods is remarkable, with a matching degree of 96.5% for C-DONN-1 and 96.6% for C-DONN-2.


Fig. 10. The confusion matrices of (a) C-DONN-1 and (b) C-DONN-2, generated based on the results of var-FDTD.


5. Discussion

5.1 Performance of proposed DONN framework

During the modeling process of our C-DONNs, we employed the DMRM to replace existing approximation methods, aiming to reduce the footprint and quantify the approximation-induced error. However, a disparity between numerical calculation results and simulation results is evident. Despite the high degree of matching between the two, discrepancies persist. These deviations largely stem from the cumulative error of the DMRM and the propagation between adjacent layers. To mitigate this issue, expanding the scale of the DMRM and corresponding training data during the DMRM's design can further minimize the loss.

The C-DONN-2 model proposed in this study encompasses two metalines comprising 100 trainable neurons (100 slots). It achieves a classification accuracy of 93.3% (var-FDTD verified) on the Iris plants dataset. This accuracy surpasses that of other on-chip DONNs [31], even though our C-DONN requires fewer neurons and thus achieves a higher degree of integration. This shows that our model can more accurately and comprehensively represent the influence of the silica slot structure on the metaline's optical field. Once the design is finalized, our C-DONN structure is composed entirely of optical components, consuming energy only during the signal loading and result detection stages. The computation itself remains entirely passive.

Regarding fabrication imperfections, compared with previous work [31], the proposed method further decreases the number of neurons needed for classification and increases both the accuracy of the mapping process and the integration level. Once fabrication is complete, the errors of the proposed C-DONN will therefore be easier to compensate.

5.2 Impact of training data size on DMRM

To ascertain the influence of training data size on the DMRM's fitting efficiency, we implemented different training data sizes in the model's fitting process and logged the corresponding fitting CMSE loss curves. Figure 11 displays the loss change curves corresponding to the training data sizes of 12,000, 24,000, 48,000, and 72,000, respectively. In this experiment, we modified model structures (such as the number of hidden layers) to better accommodate the size of the training data. As shown in Fig. 11(a), a significant overfitting phenomenon is present. However, Fig. 11(d) demonstrates only minimal overfitting. As the training set size increases, the model's degree of overfitting markedly decreases. Simultaneously, due to the model structure adjusting and becoming more complex as the training data increases, the training loss also decreases. This substantiates that augmenting the training data exerts a positive influence on the model. Thus, by enlarging the training data and deepening the model structure, the DMRM's fitting efficiency can be further enhanced.


Fig. 11. The loss curves on the training set (black line) and validation set (red line) for the DMRM with (a) 12,000, (b) 24,000, (c) 48,000, and (d) 72,000 training samples.


5.3 Computational speed and latency

The speed of the C-DONN is defined as the number of input vectors that can be processed per unit time, which is limited by the high-speed photodetectors. Assuming that the rate of the photodetector is 50 GHz, the system has M layers, and each layer contains N neurons, the number of operations per second (OPS) can be calculated by Eq. (10) [30]:

$$R = 2M \times {N^2} \times 5 \times {10^{10}}\textrm{ OPS}$$

In this study, taking C-DONN-2 as an example with $M = 1$ and $N = 50$, the computing speed of C-DONN-2 can reach $2.5 \times 10^{14}$ OPS. This is much higher than the performance of modern GPUs.

The latency of the C-DONN is defined as the overall time between the start of the loading signal (input source, ${E^{in}}$) and the detection of the output signal (computing an inference result, ${E^{out}}$), i.e., the travel time for an optical input through all layers. In our C-DONN, the latency can be calculated by Eq. (11) [30]:

$$Latency = {D_{nw}} \times c_1^{ - 1} + m \times {D_m} \times c_2^{ - 1} + ({D_{wf}} + (m - 1) \times {D_p} + {D_f}) \times c_3^{ - 1}$$

As an example, for our designed C-DONN-2, with ${n_{eff1}} = 2.33$, ${n_{eff2}} = 2.166$, and ${n_{eff3}} = 2.84$, the latency is approximately 0.52 ps. Because the integration level is much higher, the latency is much lower than that of other existing on-chip DONNs.

5.4 Footprint and integration

At present, interference-based ONNs leveraging MZIs [15], as well as pulse-based ONNs utilizing micro-ring resonators [14], cannot significantly increase the neuron count because of the substantial footprint of each individual device. On-chip DONNs have addressed this issue, but the potential to further reduce the footprint and enhance the integration is restricted by the neuron mapping approximation.

In our design, the width of the compact DONNs (C-DONNs) is tied to the neuron count in each layer, in this case 30 µm. The length is related to the number of layers, denoted as m, and is calculated as $(15 + 18m)\ \mathrm{\mu m}$ $(m = 1, 2, 3, \ldots)$. Consequently, the footprint of a C-DONN-m is approximately $30\ \mathrm{\mu m} \times (15 + 18m)\ \mathrm{\mu m}$. Taking C-DONN-2 as an example, the footprint is approximately $30\ \mathrm{\mu m} \times 51\ \mathrm{\mu m} = 1530\ \mathrm{\mu m^2}$. As no additional approximation conditions are needed, each layer contains 50 trainable parameters, so the proposed on-chip DONN-2 consists of 100 trainable neurons. According to our current design approach, with a fixed spacing of 15 µm between hidden layers, more than 60,000 neurons can be integrated per square millimeter (for C-DONN-2, 100 neurons over 1530 µm² corresponds to roughly $6.5 \times 10^4$ neurons per mm²). This is a substantial improvement over currently proposed on-chip DONNs [30-37].

Table 1 presents a comparison between the proposed C-DONN and other ONNs and DONNs. Our work proposes a design methodology that improves the integration level of existing on-chip integrated DONNs by an order of magnitude. Since the distance between hidden layers can be freely chosen during the modeling process, the density of integrated neurons can be increased further according to the design requirements.


Table 1. Comparison between the proposed C-DONN and other ONNs and DONNs

5.5 Scalability

For the input dimension, the proposed C-DONNs in our study have four input channels, corresponding to the inputs of the Iris dataset. However, the number of input channels can be increased to accommodate different tasks, depending on the number of input features required. An increase in input dimensions will require a larger chip footprint and more neurons. Moreover, as dimensions and neurons increase, the DMRM training process will demand more computational power and time. However, once the DMRM is trained, the C-DONN, owing to its superior integration level and mapping accuracy, can handle higher dimensional features using fewer neurons and a smaller footprint. This suggests that our proposed design method can better achieve input dimension scalability. Nevertheless, when input dimensions significantly increase (e.g., with image input), the linear arrangement of input ports may present challenges. An alternative approach involves using the DONN as a convolutional kernel, conducting multiple local operations on high-dimensional inputs [43]. For the scalability of depth, an increase in depth may complicate the training process and affect mapping accuracy. Additionally, adding more hidden layers will result in a larger footprint, error accumulation, and higher insertion loss. Despite these challenges, the higher integration level and smaller footprint of our design method can better accommodate an increase in depth.

6. Conclusion

In summary, this paper introduces a method for designing a C-DONN architecture based on the SOI platform. By using the DMRM to approximate the complex neuron mapping process, we have validated this method through classification of the Iris dataset. Without requiring additional approximations, this method accurately represents both the mapping and propagation processes. The proposed C-DONN design is highly compact and integrated, and its chip manufacturing process is compatible with CMOS processing, facilitating large-scale, cost-effective fabrication. Compared with other ONNs, our C-DONN boasts a simpler structure, passive all-optical operation, and massive-scale neuron integration. Moreover, the neuron integration per unit area surpasses that of existing on-chip DONNs by an order of magnitude, while using significantly fewer neurons for equivalent performance.

Funding

National Natural Science Foundation of China (62135009); Beijing Municipal Science and Technology Commission (Z221100005322010).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature 521(7553), 436–444 (2015). [CrossRef]

2. B. Yegnanarayana, Artificial Neural Networks (PHI Learning Pvt. Ltd., 2009).

3. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” Commun. ACM 60(6), 84–90 (2017). [CrossRef]  

4. D. A. Forsyth and J. Ponce, Computer Vision: A Modern Approach (Prentice Hall, 2002).

5. T. Ananthanarayana, P. Srivastava, A. Chintha, A. Santha, B. Landy, J. Panaro, A. Webster, N. Kotecha, S. Sah, and T. Sarchet, “Deep learning methods for sign language translation,” ACM Trans. Access. Comput. 14(4), 1–30 (2021). [CrossRef]  

6. K. Chowdhary, “Natural language processing,” in Fundamentals of Artificial Intelligence, 603–649 (2020).

7. C. Angermueller, T. Pärnamaa, L. Parts, and O. Stegle, Mol. Syst. Biol. 12(7), 878 (2016).

8. C. Angermueller, T. Pärnamaa, L. Parts, and O. Stegle, “Deep learning for computational biology,” Mol. Syst. Biol. 12(7), 878 (2016). [CrossRef]  

9. G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A. Van Der Laak, B. Van Ginneken, and C. I. Sánchez, “A survey on deep learning in medical image analysis,” Med. Image Anal. 42, 60–88 (2017). [CrossRef]  

10. J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach (Elsevier, 2011).

11. D. Kirk, “NVIDIA CUDA software and GPU parallel computing architecture,” in ISMM, (2007), 103–104.

12. C. Zhang, P. Li, G. Sun, Y. Guan, B. Xiao, and J. Cong, “Optimizing FPGA-based accelerator design for deep convolutional neural networks,” in Proceedings of the 2015 ACM/SIGDA international symposium on field-programmable gate arrays, (2015), 161–170.

13. E. Khoram, A. Chen, D. Liu, L. Ying, Q. Wang, M. Yuan, and Z. Yu, “Nanophotonic media for artificial neural inference,” Photonics Res. 7(8), 823–827 (2019). [CrossRef]  

14. J. Feldmann, N. Youngblood, C. D. Wright, H. Bhaskaran, and W. H. Pernice, “All-optical spiking neurosynaptic networks with self-learning capabilities,” Nature 569(7755), 208–214 (2019). [CrossRef]  

15. Y. Shen, N. C. Harris, S. Skirlo, M. Prabhu, T. Baehr-Jones, M. Hochberg, X. Sun, S. Zhao, H. Larochelle, and D. Englund, “Deep learning with coherent nanophotonic circuits,” Nat. Photonics 11(7), 441–446 (2017). [CrossRef]  

16. M. Y.-S. Fang, S. Manipatruni, C. Wierzynski, A. Khosrowshahi, and M. R. DeWeese, “Design of optical neural networks with component imprecisions,” Opt. Express 27(10), 14009–14029 (2019). [CrossRef]  

17. I. A. Williamson, T. W. Hughes, M. Minkov, B. Bartlett, S. Pai, and S. Fan, “Reprogrammable electro-optic nonlinear activation functions for optical neural networks,” IEEE J. Sel. Top. Quantum Electron. 26(1), 1–12 (2020). [CrossRef]  

18. T. W. Hughes, M. Minkov, Y. Shi, and S. Fan, “Training of photonic neural networks through in situ backpropagation and gradient measurement,” Optica 5(7), 864–871 (2018). [CrossRef]  

19. H. Zhang, M. Gu, X. Jiang, J. Thompson, H. Cai, S. Paesani, R. Santagati, A. Laing, Y. Zhang, and M. Yung, “An optical neural chip for implementing complex-valued neural network,” Nat. Commun. 12(1), 457 (2021). [CrossRef]  

20. C. Huang, S. Fujisawa, T. F. de Lima, A. N. Tait, E. C. Blow, Y. Tian, S. Bilodeau, A. Jha, F. Yaman, and H.-T. Peng, “A silicon photonic–electronic neural network for fibre nonlinearity compensation,” Nat. Electron. 4(11), 837–844 (2021). [CrossRef]  

21. F. Ashtiani, A. J. Geers, and F. Aflatouni, “An on-chip photonic deep neural network for image classification,” Nature 606(7914), 501–506 (2022). [CrossRef]  

22. J. Feldmann, N. Youngblood, M. Karpov, H. Gehring, X. Li, M. Stappers, M. Le Gallo, X. Fu, A. Lukashchuk, and A. S. Raja, “Parallel convolutional processing using an integrated photonic tensor core,” Nature 589(7840), 52–58 (2021). [CrossRef]  

23. J. Bueno, S. Maktoobi, L. Froehly, I. Fischer, M. Jacquot, L. Larger, and D. Brunner, “Reinforcement learning in a large-scale photonic recurrent neural network,” Optica 5(6), 756–760 (2018). [CrossRef]  

24. T. Yan, J. Wu, T. Zhou, H. Xie, F. Xu, J. Fan, L. Fang, X. Lin, and Q. Dai, “Fourier-space diffractive deep neural network,” Phys. Rev. Lett. 123(2), 023901 (2019). [CrossRef]  

25. Z. Wu, M. Zhou, E. Khoram, B. Liu, and Z. Yu, “Neuromorphic metasurface,” Photonics Res. 8(1), 46–50 (2020). [CrossRef]  

26. X. Lin, Y. Rivenson, N. T. Yardimci, M. Veli, Y. Luo, M. Jarrahi, and A. Ozcan, “All-optical machine learning using diffractive deep neural networks,” Science 361(6406), 1004–1008 (2018). [CrossRef]  

27. D. Mengu, Y. Luo, Y. Rivenson, and A. Ozcan, “Analysis of diffractive optical neural networks and their integration with electronic neural networks,” IEEE J. Sel. Top. Quantum Electron. 26(1), 1–14 (2020). [CrossRef]  

28. Z. Xu, X. Yuan, T. Zhou, and L. Fang, “A multichannel optical computing architecture for advanced machine vision,” Light: Sci. Appl. 11(1), 255 (2022). [CrossRef]  

29. T. Zhou, X. Lin, J. Wu, Y. Chen, H. Xie, Y. Li, J. Fan, H. Wu, L. Fang, and Q. Dai, “Large-scale neuromorphic optoelectronic computing with a reconfigurable diffractive processing unit,” Nat. Photonics 15(5), 367–373 (2021). [CrossRef]  

30. T. Fu, Y. Zang, H. Huang, Z. Du, C. Hu, M. Chen, S. Yang, and H. Chen, “On-chip photonic diffractive optical neural network based on a spatial domain electromagnetic propagation model,” Opt. Express 29(20), 31924–31940 (2021). [CrossRef]  

31. T. Fu, Y. Zang, Y. Huang, Z. Du, H. Huang, C. Hu, M. Chen, S. Yang, and H. Chen, “Photonic machine learning with on-chip diffractive optics,” Nat. Commun. 14(1), 70 (2023). [CrossRef]  

32. T. Yan, R. Yang, Z. Zheng, X. Lin, H. Xiong, and Q. Dai, “All-optical graph representation learning using integrated diffractive photonic computing units,” Sci. Adv. 8(24), eabn7630 (2022). [CrossRef]  

33. S. Zarei and A. Khavasi, “Realization of optical logic gates using on-chip diffractive optical neural networks,” Sci. Rep. 12(1), 15747 (2022). [CrossRef]  

34. Z. Wang, L. Chang, F. Wang, T. Li, and T. Gu, “Integrated photonic metasystem for image classifications at telecommunication wavelength,” Nat. Commun. 13(1), 2131 (2022). [CrossRef]  

35. H. Zhu, J. Zou, H. Zhang, Y. Shi, S. Luo, N. Wang, H. Cai, L. Wan, B. Wang, and X. Jiang, “Space-efficient optical computing with an integrated chip diffractive neural network,” Nat. Commun. 13(1), 1044 (2022). [CrossRef]  

36. Z. Wang, T. Li, A. Soman, D. Mao, T. Kananen, and T. Gu, “On-chip wavefront shaping with dielectric metasurface,” Nat. Commun. 10(1), 3547 (2019). [CrossRef]  

37. S. Zarei, M.-r. Marzban, and A. Khavasi, “Integrated photonic neural network based on silicon metalines,” Opt. Express 28(24), 36668–36684 (2020). [CrossRef]  

38. J. Peurifoy, Y. Shen, L. Jing, Y. Yang, F. Cano-Renteria, B. G. DeLacy, J. D. Joannopoulos, M. Tegmark, and M. Soljačić, “Nanophotonic particle simulation and inverse design using artificial neural networks,” Sci. Adv. 4(6), eaar4206 (2018). [CrossRef]  

39. J. Kennedy and R. Eberhart, “Particle swarm optimization,” in Proceedings of ICNN'95-international conference on neural networks, (IEEE, 1995), 1942–1948.

40. C. Blake, “UCI repository of machine learning databases,” http://www.ics.uci.edu/~mlearn/MLRepository.html (1998).

41. S. H. Strogatz, “Exploring complex networks,” Nature 410(6825), 268–276 (2001). [CrossRef]

42. C. Trabelsi, O. Bilaniuk, Y. Zhang, D. Serdyuk, S. Subramanian, J. F. Santos, S. Mehri, N. Rostamzadeh, Y. Bengio, and C. J. Pal, “Deep complex networks,” arXiv, arXiv:1705.09792 (2017).

43. Y. Huang, T. Fu, H. Huang, S. Yang, and H. Chen, “Sophisticated deep learning with on-chip optical diffractive tensor processing,” Photonics Res. 11(6), 1125 (2023). [CrossRef]  
