The diamond mesh, a phase-error- and loss-tolerant field-programmable MZI-based optical processor for optical neural networks

Farhad Shokraneh; Simon Geoffroy-Gagnon; Odile Liboiron-Ladouceur

doi:10.1364/OE.395441

1. Introduction

Field-programmable optical structures have proven to be promising for vector-matrix multiplications in machine learning tasks, such as neural networks (NNs), thanks to the inherent parallelism of optics [1–6]. Employing power-efficient computational accelerators in silicon photonic platforms allows for constructing larger NNs to meet the requirements of current and upcoming applications [7]. Over the last few years, silicon photonics has experienced substantial progress in fabrication techniques to manufacture interferometric optical processors with small footprints and high power efficiency [1,8–10]. However, imperfections in the optical components and the measurement equipment play a key role in the performance of the optical devices. As a result, the effects of these imperfections should be analyzed thoroughly before the device is fabricated, which in turn motivates the design of structures that are more robust against these performance-degrading factors. The use of reconfigurable Mach-Zehnder interferometers (MZIs) allows for controlling the power and the relative phase of the outputs, which can be exploited for implementing field-programmable optical interferometers with different mesh topologies [11,12]. In this regard, the transformation matrix of a given application can be determined by the successive products of the appropriately defined unitary transformation matrices of the MZIs. Such an optical matrix-vector multiplier is constructed based on the phase extraction from the decomposition of its transformation matrix into that of the constituent MZIs [13,14].

This paper presents the performance analysis of a multiport field-programmable MZI-based optical processor employed in a single layer optical NN (ONN). The results are compared to the Reck mesh used in our previous works [13,14]. Both structures can be experimentally calibrated and programmed for a given application. Compared to the Reck structure, the proposed diamond mesh employs $(N-1)(N-2)/2$ additional reconfigurable MZIs, which makes its topology more symmetric, allowing for consistent path losses from the inputs to the outputs of the structure. The additional MZIs in the diamond mesh allow the structure to be programmed in a way that optimal light intensities are directed towards the output ports for a better classification performance of the implemented ONNs. This is achieved by excluding the light intensity that causes incorrect classifications through the tapered-out waveguides of these MZIs. The analytical results presented in this work show the effects of experimental uncertainties on the classification accuracy of the ONNs implemented by different sizes of the Reck and the diamond structures. Our results on the classification performance and scalability of the two structures show that the additional MZIs in the proposed diamond mesh provide extra degrees of freedom in optimizing the weight matrix of the implemented ONNs. Consequently, the diamond mesh is more robust to fabrication process variations and experimental imperfections, i.e., insertion loss (IL) of the MZIs and phase errors.

2. Theory and background

A multiport field-programmable MZI-based optical processor is a mesh of 2$\times$2 reconfigurable MZIs. As shown in Fig. 1, a 2$\times$2 reconfigurable MZI is composed of two 3-dB couplers with one phase shifter ($\theta$) on the top arm in between, and another one ($\phi$) on the top output of the MZI.

Fig. 1. Layout of a 2$\times$2 reconfigurable MZI which consists of two 3-dB couplers and two phase shifters ($\theta$ and $\phi$).

By adjusting the phase shifters $\theta$ and $\phi$, the MZI can perform all rotations in the unitary group of degree two, U(2). The phase shifters $\theta$ and $\phi$ control the transmitted field and the relative phase at the outputs, respectively [12]. In this regard, the linear transformation matrix of the MZI can be given by

(1)$$\hspace{-0.25cm} \begin{array}{cc} [D_{MZI}]= je^{j\Big(\dfrac{ \theta}{2}\Big)}\begin{bmatrix} e^{j \phi}\sin{\Big(\dfrac{\theta}{2}\Big)} & e^{j \phi}\cos{\Big(\dfrac{\theta}{2}\Big)} \\ \cos{\Big(\dfrac{\theta}{2}\Big)} & -\sin{\Big(\dfrac{\theta}{2}\Big)} \end{bmatrix}. \end{array}$$

It is essential to note that the field coupling coefficient of the couplers is $\sqrt {\rho }$ = $\sqrt {0.5}$. If the input fields are $E_{I_1}$ = $E_{in} \,e^{j0}$ and $E_{I_2}$ = 0, then, using the transformation matrix of the MZI given in Eq. (1), the electric fields at the outputs of the MZI can be determined by

(2)$$E_{O_1}= j e^{j\Big(\dfrac{\theta}{2}\Big)}\, E_{in}\,\sin{\Big(\dfrac{\theta}{2}\Big)} \, e^{j \phi},$$

(3)$$E_{O_2}= j e^{j\Big(\dfrac{\theta}{2}\Big)}\, E_{in}\,\cos{\Big(\dfrac{\theta}{2}\Big)}\, e^{j 0}.$$

According to Eqs. (2) and (3), the outputs are coherent optical modes and the relative phase of the outputs can be adjusted by the phase shifter $\phi$.
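As a quick numerical check of Eqs. (1) to (3), the following minimal sketch builds the MZI matrix with NumPy, verifies that it is unitary, and compares the matrix-vector product with the closed-form output fields. The function name `mzi_matrix` and the phase values are illustrative choices, not taken from the source.

```python
import numpy as np

def mzi_matrix(theta, phi):
    """2x2 transfer matrix of a reconfigurable MZI, following Eq. (1)."""
    s, c = np.sin(theta / 2), np.cos(theta / 2)
    return 1j * np.exp(1j * theta / 2) * np.array(
        [[np.exp(1j * phi) * s, np.exp(1j * phi) * c],
         [c, -s]]
    )

theta, phi = 0.7, 1.3  # arbitrary illustrative phases (radians)
D = mzi_matrix(theta, phi)

# Unitarity: the inverse of D equals its conjugate transpose.
is_unitary = np.allclose(D @ D.conj().T, np.eye(2))

# Light enters the top port only: E_I1 = E_in, E_I2 = 0.
E_in = 1.0
E_out = D @ np.array([E_in, 0.0])

# Closed-form output fields of Eqs. (2) and (3).
common = 1j * np.exp(1j * theta / 2) * E_in
E_O1 = common * np.sin(theta / 2) * np.exp(1j * phi)
E_O2 = common * np.cos(theta / 2)
```

The first column of the matrix reproduces Eqs. (2) and (3) directly, which is why $\phi$ appears only in the top output.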

In this context, the field intensity $\mathcal {I}$ is expressed as

(4)$$\mathcal{I} \propto (E\cdot \overline{E})=|E|^2,$$

where $\overline {E}$ denotes the complex conjugate of $E$. Consequently, the power splitting ratio of the 3-dB couplers in the MZI is $\rho$ = 0.5 and the field intensities at the two outputs are obtained by

(5)$$\mathcal{I}_{O_1}= |E_{in}|^2\,\sin^2{\Big(\dfrac{\theta}{2}\Big)}= \mathcal{I}_{in}\,\sin^2{\Big(\dfrac{\theta}{2}\Big)},$$

(6)$$\mathcal{I}_{O_2}= |E_{in}|^2\,\cos^2{\Big(\dfrac{\theta}{2}\Big)}= \mathcal{I}_{in}\,\cos^2{\Big(\dfrac{\theta}{2}\Big)},$$

where $T_{O_1}=\sin ^2{\left ( \frac {\theta }{2}\right )}$ and $T_{O_2}=\cos ^2{\left (\frac {\theta }{2}\right )}$ represent the power transmission coefficients of the MZI output ports $O_1$ and $O_2$, respectively, such that $T_{O_1}+T_{O_2}=1$. Setting $\theta =0$ leads to total transmission at $O_2$, whereas $\theta =\pi$ results in maximum transmission at $O_1$. As can be inferred from Eqs. (5) and (6), the internal phase shifter $\theta$ controls the power levels at the outputs of the MZI. It is also expected that the input light intensity is equal to the total light intensity at the two outputs of an ideal lossless MZI, which can be concluded from Eqs. (5) and (6) as follows:

(7)$$\mathcal{I}_{in}= \mathcal{I}_{O_1}+\mathcal{I}_{O_2}.$$

The proposed 4$\times$4, i.e., $N=4$, diamond reconfigurable MZI-based optical processor is composed of $n=(N-1)^2=9$ MZIs. As shown in Fig. 2, the structure has $(N-1)(N-2)/2=3$ additional MZIs, i.e., MZIs (7), (8) and (9), as compared to the Reck structure investigated in our previous works [13,14]. Both structures can be experimentally calibrated and programmed for a given application.
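The MZI counts quoted above can be tabulated for other mesh sizes with a few lines of Python. This is only an arithmetic sketch: the Reck count $N(N-1)/2$ is obtained here as the difference between the two expressions given in the text, and the function names are illustrative.

```python
def diamond_mzi_count(n_ports):
    """Number of MZIs in an N x N diamond mesh: (N - 1)^2."""
    return (n_ports - 1) ** 2

def additional_mzi_count(n_ports):
    """MZIs the diamond mesh adds over the Reck mesh: (N - 1)(N - 2) / 2."""
    return (n_ports - 1) * (n_ports - 2) // 2

def reck_mzi_count(n_ports):
    """Number of MZIs in an N x N triangular (Reck) mesh: N(N - 1) / 2."""
    return n_ports * (n_ports - 1) // 2

# Print one row per mesh size: N, Reck, diamond, additional.
for n in (4, 8, 16, 32, 64):
    print(n, reck_mzi_count(n), diamond_mzi_count(n), additional_mzi_count(n))
```

For $N=4$ this gives six MZIs for the Reck mesh and nine for the diamond mesh, matching the labels (1) to (6) and (7) to (9) used in the text.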

Fig. 2. Layout of the 4$\times$4 diamond reconfigurable MZI-based structure.

As shown in Fig. 2, the structure linearly relates the four inputs to the four outputs through the field interactions between its nine MZIs.

To construct the unitary transformation matrix of the 4$\times$4 diamond structure shown in Fig. 2, the 2$\times$2 unitary transformation matrix of each MZI is defined on a two-dimensional subspace within a $(2N-2=6)$-dimensional Hilbert space ($H_{6 \times 6}$) based on its location in the mesh. These matrices are denoted by $[D^{(n)}]_{H_{6 \times 6}}$, where $n$=1, 2, 3,…, 9 represents the labels of the MZIs in the structure. As a result, the contribution of each MZI is reflected into the unitary transformation matrix of the entire structure with respect to its connections to the optical paths within the mesh. In this regard, the unitary transformation matrix of the field-programmable MZI-based structure can be calculated by

(8)$$\begin{array}{c} [T_{U(6)}]= [D^{(6)}]_{H_{6 \times 6}}\, \cdot \,[D^{(5)}]_{H_{6 \times 6}}\, \cdot \,[D^{(4)}]_{H_{6 \times 6}} \cdot \,[D^{(9)}]_{H_{6 \times 6}}\, \cdot \,[D^{(3)}]_{H_{6 \times 6}}\, \cdot \\[0.2 cm] \, [D^{(2)}]_{H_{6 \times 6}} \,\cdot \,[D^{(8)}]_{H_{6 \times 6}}\, \cdot \,[D^{(7)}]_{H_{6 \times 6}}\, \cdot \,[D^{(1)}]_{H_{6 \times 6}}\,. \end{array}$$
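The construction in Eq. (8) can be sketched numerically by embedding each 2$\times$2 MZI matrix into a 6$\times$6 identity and multiplying in the stated order. In the sketch below the mode pair assigned to each MZI label is a placeholder assumption (the actual connectivity is defined by Fig. 2, which is not reproduced here), and the phases are random; only the embedding and the product order follow the text.

```python
import numpy as np

def mzi_matrix(theta, phi):
    """2x2 transfer matrix of a reconfigurable MZI, following Eq. (1)."""
    s, c = np.sin(theta / 2), np.cos(theta / 2)
    return 1j * np.exp(1j * theta / 2) * np.array(
        [[np.exp(1j * phi) * s, np.exp(1j * phi) * c],
         [c, -s]]
    )

def embed(D, top_mode, dim=6):
    """Place a 2x2 matrix on modes (top_mode, top_mode + 1) of a dim x dim identity."""
    full = np.eye(dim, dtype=complex)
    full[top_mode:top_mode + 2, top_mode:top_mode + 2] = D
    return full

# ASSUMPTION: MZI label -> index of its upper mode (0-based). Placeholder layout,
# not read from Fig. 2 of the source.
MODES = {1: 2, 2: 3, 4: 4, 7: 1, 3: 2, 5: 3, 8: 0, 9: 1, 6: 2}

# Left-to-right factor order of Eq. (8).
ORDER = [6, 5, 4, 9, 3, 2, 8, 7, 1]

rng = np.random.default_rng(0)
phases = {n: rng.uniform(0.0, 2.0 * np.pi, size=2) for n in MODES}

# Accumulate T = D6 . D5 . D4 . D9 . D3 . D2 . D8 . D7 . D1.
T = np.eye(6, dtype=complex)
for n in ORDER:
    theta, phi = phases[n]
    T = T @ embed(mzi_matrix(theta, phi), MODES[n])
```

Whatever the mode assignment, the product of embedded unitary blocks is itself unitary, which is the property the decomposition described next relies on.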

To determine the required phases for implementing the unitary transformation matrix $[T_{U(6)}]$ given by an application, the matrix is decomposed based on the protocol given in [13]. Figure 3 depicts the order of the decomposition of $[T_{U(6)}]$, which is carried out through its successive product with the inverse transformation matrices of the MZIs defined in a six-dimensional Hilbert space. Due to the unitary property of the U(2) matrix given by Eq. (1), the inverse transformation matrix of each reconfigurable MZI in the structure is equal to its conjugate transpose. The decomposition process is based on the fact that the structure implementing $[T_{U(6)}]$ in the forward propagation is set up to eliminate the effects of the MZIs, one by one, in the reverse direction, i.e., when the light propagates within the structure from right to left. As shown in Fig. 3, the decomposition of $[T_{U(6)}]$ starts from the layer of MZIs on the far left part of the structure indicated by the blue box (MZIs (1), (2), and (4), respectively), followed by the second layer of MZIs, i.e., the red box (MZIs (7), (3) and (5), in the named order), and finally the green box (MZIs (8), (9) and (6), respectively).

Fig. 3. Schematic illustration of the decomposition order of the MZIs in the 4$\times$4 diamond reconfigurable MZI-based optical processor.

Therefore,

(9)$$\begin{array}{c} [T_{U(6)}]\, \cdot \, [D^{(1)}]^{{-}1}_{H_{6 \times 6}}\, \cdot \,[D^{(2)}]^{{-}1}_{H_{6 \times 6}}\, \cdot \,[D^{(4)}]^{{-}1}_{H_{6 \times 6}} \cdot \,[D^{(7)}]^{{-}1}_{H_{6 \times 6}}\,\cdot \,[D^{(3)}]^{{-}1}_{H_{6 \times 6}}\, \\ \hspace{-0.2 cm} \cdot \, [D^{(5)}]^{{-}1}_{H_{6 \times 6}}\, \cdot \,[D^{(8)}]^{{-}1}_{H_{6 \times 6}}\, \cdot \,[D^{(9)}]^{{-}1}_{H_{6 \times 6}}\, \cdot \,[D^{(6)}]^{{-}1}_{H_{6 \times 6}}\,= [I]_{(6)}. \end{array}$$

In each step of the decomposition process, an off-diagonal matrix element in the resultant matrix becomes zero, and thanks to the unitary property, it is not changed by the transformations in the following steps. Furthermore, once all off-diagonal elements in a row become zero, the corresponding diagonal element is set to one and all off-diagonal elements in the corresponding column become zero. Eventually, this process results in an identity matrix of order six, ($[I]_{(6)}$). Therefore, the required phases to implement $[T_{U(6)}]$ can be calculated [13]. The resultant matrices in the different steps of the decomposition process for $[T_{U(6)}]$ are expressed by $[T^{(n)}]_{H_{6\times 6}}$, each of which is multiplied by the inverse transformation matrix of the $n^{th}$ MZI, $[D^{(n)}]^{-1}_{H_{6 \times 6}}$, to null the off-diagonal matrix element in the corresponding step, given by

(10)$$[T^{(n)}]_{6\times 6} = \left[\begin{matrix} * & * & * & \color{brown}{*} & 0 & 0 \\ * & * & * & \color{brown}{*} & \color{brown}{*} & 0 \\ * & * & \color{blue}{*} & \color{blue}{*} & \color{blue}{*} & \color{blue}{*} \\ {\color{brown}{*}^{\color{brown}{(8)}}} & {\color{brown}{*}^{\color{brown}{(9)}}} & {\color{blue}{*}^{\color{blue}{(6)}}} & \color{blue}{*} & \color{blue}{*} & \color{blue}{*} \\ 0 & {\color{brown}{*}^{\color{brown}{(7)}}} & {\color{blue}{*}^{\color{blue}{(3)}}} & {\color{blue}{*}^{\color{blue}{(5)}}} & \color{blue}{*} & \color{blue}{*} \\ 0 & 0 & {\color{blue}{*}^{\color{blue}{(1)}}} & {\color{blue}{*}^{\color{blue}{(2)}}} & {\color{blue}{*}^{\color{blue}{(4)}}} & \color{blue}{*} \end{matrix}\right],$$

where $*$ in different colors denotes unknown matrix elements in the resultant matrices during the decomposition process. The matrix elements ${\color{blue}{*^{(n)}}}$ and ${\color{blue}{*}}$ correspond to the Reck mesh matrix elements within $[T_{U(6)}]$ of the diamond mesh. These elements are set to zero or one during the decomposition process through the multiplication of the related resultant matrices in the corresponding steps by the inverse transformation matrix of MZIs (1) to (6), i.e., $[D^{(n)}]^{-1}_{H_{6 \times 6}}$. As a result, one can determine the required phases for implementing $[T_{U(4)}]$ corresponding to the triangular (Reck) mesh studied in our previous works [13,14]. Furthermore, the matrix elements ${\color{brown}{*^{(n)}}}$ and ${\color{brown}{*}}$ in the related resultant matrix $[T^{(n)}]_{6\times 6}$ are associated with MZIs (7), (8) and (9), which are set to zero through the decomposition process of the diamond mesh. The use of these additional MZIs in the diamond mesh makes its topology more symmetric with evenly distributed MZIs which allows for more power-balanced optical paths from the inputs to the outputs of the structure. Consequently, the structure is more robust with respect to fabrication and experimental imperfections, such as insertion loss of the MZIs and the phase errors. The diamond mesh constructs a unitary transformation matrix of degree six from which a $4\times 6$ subspace is used for its performance analysis in ONNs and the results are compared with that of the $4\times 4$ Reck mesh.

The diamond mesh shown in Fig. 2 can be viewed as a 6$\times$6 Reck mesh in which six MZIs are excluded from the structure. As a result, it can optimally eliminate the excess light degrading the classification performance of the device through the tapered-out waveguides of its three additional MZIs. Those six excluded MZIs correspond to the matrix elements $T_{2,1}$, $T_{3,1}$, $T_{3,2}$, $T_{5,1}$, $T_{6,1}$ and $T_{6,2}$ in the lower triangle of $[T^{(n)}]_{6\times 6}$ given by Eq. (10). As can be inferred from this equation, in the transformation matrix of the diamond mesh, the effects of the six MZIs are eliminated from the unitary transformation matrix of the corresponding Reck mesh. As a result, the programming protocol for the Reck mesh presented in [13] can be used to program the diamond mesh, as shown in Fig. 3. This allows the two topologies to be compared within the same programming framework. The next section shows how the proposed diamond structure is employed to implement single layer ONNs, and the obtained classification accuracies are compared to those of the Reck mesh studied in [14].

3. Optical neural networks for classification

NNs are machine learning models that can be used for performing various computational tasks. In this sense, a series of matrix multiplications denoted by $\mathbf {W}$ are interleaved with nonlinear activation functions $H(\cdot )$ to perform the required machine learning tasks, such as classification in voice and image recognition [15,16] and autonomous driving control systems [17]. Such a classification process involves taking multi-dimensional inputs, $\mathbf {I}^0$, and sorting them into their respective classes, $c$. Each input $\mathbf {I}^0 \in \mathbb {R}^f$ is a multi-dimensional sample that represents multiple features, where $f$ is the number of features. The output vector $\mathbf {\hat {O}}$ in the NN is classified based on the index of the element with the maximum value. The size of the weight matrix $\mathbf {W}^{k}$ in an NN, represented by $m^k$, can vary from one layer to another, where $m$ is the size of the matrix in the $k^{\mathrm {th}}$ layer, $k \in \{1,\,\ldots ,\, K\}$. Therefore, the output of layer $k$ is expressed as $\mathbf {O}^k = H(\mathbf {W}^{k} \cdot \mathbf {O}^{k-1})$. Consequently, the NN with an input $\mathbf {I}^0 \in \mathbb {R}^f$ produces a final output $\mathbf {\hat {O}} \in \mathbb {R}^{c}$, where $c$ is the number of possible classes. In other words, the output of layer $1$ represented by $\mathbf {O}^1 = H(\mathbf {W}^1 \cdot \mathbf {I}^0)$ passes through the rest of the layers, being transformed by $\mathbf {W}^k$ and $H^k(\cdot )$ to generate the final output $\mathbf {\hat {O}} = H(\mathbf {W}^{K} \cdot \mathbf {O}^{K-1})$, where the maximum argument in the final output of the NN $\mathbf {\hat {O}}$ designates the class of the input sample.
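The layer recursion $\mathbf {O}^k = H(\mathbf {W}^{k} \cdot \mathbf {O}^{k-1})$ and the argmax classification rule described above can be sketched as follows. The choice of `np.tanh` for $H$, the layer sizes, and the random weights are assumptions made purely for illustration.

```python
import numpy as np

def forward(weights, x, activation=np.tanh):
    """Apply O^k = H(W^k . O^(k-1)) for k = 1..K, starting from O^0 = x."""
    out = x
    for W in weights:
        out = activation(W @ out)
    return out

def classify(weights, x, activation=np.tanh):
    """Predicted class: index of the largest element of the final output."""
    return int(np.argmax(forward(weights, x, activation)))

rng = np.random.default_rng(0)
f, hidden, c = 4, 5, 3  # illustrative sizes: features, hidden units, classes
weights = [rng.normal(size=(hidden, f)), rng.normal(size=(c, hidden))]

x = rng.normal(size=f)           # one input sample I^0
o_hat = forward(weights, x)      # final output vector, one entry per class
label = classify(weights, x)
```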

The optimal weight matrix in ONNs is obtained by performing backpropagation all the way down to the phases in the MZI phase shifters, using interferometric measurements to obtain the loss gradient with the Neuroptica Python package [18]. This permits backpropagation to the phases, as opposed to simply to the NN matrix weights. The algorithms in Neuroptica use the adjoint electric field method to implement the in-situ backpropagation routine, where loss gradients are obtained by interferometric measurements rather than by gradient descent [3,19]. It should be noted that while the results are similar to those of the standard backpropagation algorithm described in this work, the phases themselves are modified rather than the matrix weights [20]. During the training process of the NN, the weight matrix elements are optimized through backpropagation, where the output vector is compared to the true class of the sample, called the ground truth $\mathbf {O}$, by using a loss function $\mathcal {L}$ [21]. The loss function is a distance metric such as the Mean Squared Error (MSE) between the obtained values of $\mathbf {\hat {O}}$ and $\mathbf {O}$, given by

(11)$$\mathcal{L}_{\mathrm{(MSE)}} = \frac{1}{S}\sum_{p=1}^S(\mathbf{\hat{O}}_p - \mathbf{O}_p)^2,$$

where $S$ is the total number of samples in the training dataset. The ground truth $\mathbf {O}$ is a one-hot-encoded vector with one element set to 1 depending on the class of the sample, while every other element is set to 0 [22]. MSE is used here only to train a linear ONN and was chosen for its simplicity. In a single layer NN, which is the case in this work, there is no difference between using categorical cross-entropy or MSE as the loss function, since they both achieve a final validation classification accuracy of 100%. An optimal $\mathbf {W}$ results in a lower loss function value, which increases the training classification accuracy [23,24]. The backpropagation algorithm modifies the weight matrix elements in the NN based on the product of a learning rate $\alpha$ with the negative gradient of the loss function $\mathcal {L}$ with respect to the weights $\mathbf {W}$ in the NN. If this process is repeated enough times, the loss function decreases and, as a result, the classification accuracy increases [21,25]. The update rule is

(12)$$\mathbf{W}^{t + 1} = \mathbf{W}^t - \alpha \cdot {\nabla}_{\mathbf{W}^t} \mathcal{L},$$

where ${\nabla}$ denotes the gradient operator and $t$ is the current epoch, i.e., a single training cycle of the entire training process. It should be noted that in a single layer NN, a linearly separable dataset can be classified perfectly with no need for a nonlinear activation function at the output [14,26]. The reason lies in the fact that most nonlinear activation functions are monotonic and do not affect the classification of a sample in a single layer NN. The optimal weight matrix in ONNs is obtained by performing backpropagation to find the optimal phases in the MZI phase shifters. According to Eqs. (1) and (8), the phase shifters $\theta$ and $\phi$ of the MZIs determine the weight matrix elements of the structure. The use of a linearly separable dataset allows for evaluating the proposed diamond structure itself and investigating the device performance improvement compared to the Reck structure, rather than enhancing the NN algorithm. Additionally, the datasets were separated in an 80:20 ratio between the training set and the validation set. The dataset is 100% classifiable by a digital NN, such that the ONNs implemented by the two structures can be assessed in the presence of experimental imperfections, i.e., phase errors and IL of each MZI.
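To make Eqs. (11) and (12) concrete, the sketch below trains a single linear layer by plain gradient descent on the MSE loss with one-hot targets. It is a simplified digital stand-in: it updates the weight matrix directly, whereas the ONN training described in this work updates the MZI phases through Neuroptica. The toy dataset, the learning rate, and the function names are assumptions for illustration only.

```python
import numpy as np

def mse_loss(W, X, O):
    """Eq. (11): squared error summed over outputs, averaged over the S samples."""
    residual = X @ W.T - O
    return np.mean(np.sum(residual ** 2, axis=1))

def mse_grad(W, X, O):
    """Gradient of the MSE loss with respect to W for a linear layer O_hat = W x."""
    residual = X @ W.T - O
    return 2.0 * residual.T @ X / X.shape[0]

rng = np.random.default_rng(0)
S, f, c = 200, 4, 4                      # samples, features, classes
labels = rng.integers(0, c, size=S)
O = np.eye(c)[labels]                    # one-hot ground truth
X = O + 0.1 * rng.normal(size=(S, f))    # toy, linearly separable inputs

W = np.zeros((c, f))
alpha = 0.1                              # learning rate (illustrative value)
losses = []
for epoch in range(200):
    losses.append(mse_loss(W, X, O))
    W = W - alpha * mse_grad(W, X, O)    # Eq. (12)

# No activation is applied: a monotonic H would not change the argmax.
accuracy = np.mean(np.argmax(X @ W.T, axis=1) == labels)
```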

3.1 Single layer ONNs implemented by the 4$\times$4 Reck and diamond meshes

Figure 4(a) shows the dataset used to characterize a 4$\times$4 single layer ONN. The dataset consists of four multivariate Gaussian distributions represented by four different colors, each of which is composed of a set of four-dimensional points, i.e., $\mathbf {I} \in \mathbb {R}^4$. The four different Gaussian distributions are classified using a one-hot encoding scheme such that the correct output vector for a single sample is $\mathbf {O} \in \mathbb {R}^4$, in which one element is set to 1 depending on the class of the sample, while the rest are set to zero [22]. Figure 4(b) shows the backpropagation results for the diamond mesh with 0 dB loss per MZI used in the single layer ONN. According to this figure, the loss function reaches a minimum value of approximately 0.18, while the ONN achieves a final validation accuracy of 98.75%. It is essential to note that the backpropagation process optimizes the weight matrix to minimize the loss function [21].

Fig. 4. Dataset and training algorithm of the single layer ONN implemented by the 4$\times$4 diamond mesh with 0 dB loss per MZI; (a) Multivariate Gaussian dataset, (b) Backpropagation (training) process of the single layer ONN for weight matrix optimization through minimizing the loss function value.

According to Fig. 4(b), the training and validation classification accuracies are well matched which implies that the ONN is still generalizing quite well after 200 epochs. Using the validation dataset, the validation accuracy is determined and compared with the training accuracy of the ONN.
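A dataset of the kind described above (one multivariate Gaussian cluster per class, one-hot labels, 80:20 training/validation split) might be generated along the following lines. The cluster means, the isotropic covariance, and the sample counts are hypothetical choices; the source does not state the parameters it used.

```python
import numpy as np

def make_gaussian_dataset(n_dim, samples_per_class, spread=0.1, seed=0):
    """One isotropic Gaussian cluster per class in n_dim dimensions."""
    rng = np.random.default_rng(seed)
    # ASSUMPTION: class k is centred on the k-th unit vector.
    means = np.eye(n_dim)
    cov = spread ** 2 * np.eye(n_dim)
    X = np.concatenate([
        rng.multivariate_normal(means[k], cov, size=samples_per_class)
        for k in range(n_dim)
    ])
    labels = np.repeat(np.arange(n_dim), samples_per_class)
    return X, labels

def train_val_split(X, labels, train_fraction=0.8, seed=0):
    """Shuffle the samples and split them into training and validation sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    cut = int(round(train_fraction * len(X)))
    train, val = order[:cut], order[cut:]
    return (X[train], labels[train]), (X[val], labels[val])

X, labels = make_gaussian_dataset(n_dim=4, samples_per_class=100)
(train_X, train_y), (val_X, val_y) = train_val_split(X, labels)
train_one_hot = np.eye(4)[train_y]   # one-hot ground truth vectors
```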

It should be noted that the training process of the ONN is carried out for ideal scenarios with perfect MZIs, i.e., with neither loss nor phase uncertainties. The standard deviation values of the phases $\theta$ and $\phi$ and the loss per MZI are then added to perform the analysis for the given mesh of MZIs. Figure 5 shows the classification accuracy of the single layer ONN implemented by the 4$\times$4 structure size of the Reck mesh and the diamond mesh as a function of the phase error standard deviation, $\sigma _\theta$ and $\sigma _\phi$, for 0 dB and 1 dB loss per MZI with a standard deviation of 0.5 dB. The results in this work are obtained by adding different normally distributed error terms, each with a specific standard deviation, to the phases and the insertion loss per MZI for every sample. In this regard, the phases and the insertion loss of the MZIs were defined by

(13)$$\theta = \theta_{true} + \mathcal{N}(0, \sigma^2_\theta),$$

(14)$$\phi = \phi_{true} + \mathcal{N}(0, \sigma^2_\phi),$$

(15)$$IL = IL_{mean} + \mathcal{N}(0, \sigma^2_{IL}),$$

where $\mathcal {N}$ is the added Gaussian noise with zero mean and a standard deviation of $\sigma$ for the corresponding parameter. The simulated phase errors and loss variations are applied only in the testing phase of the ONN: a different normally distributed error term is drawn for every sample, the entire testing phase is repeated multiple times, and the mean of the resulting classification accuracies is taken as the final classification accuracy [20].
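The perturbation model of Eqs. (13) to (15) can be sketched for a single MZI as shown below. How the insertion loss enters the transfer matrix is not spelled out in the text, so the sketch assumes a uniform power loss in dB applied to both outputs, i.e., a field scaling of $10^{-IL/20}$; the helper name `noisy_mzi` and all numerical values are illustrative.

```python
import numpy as np

def mzi_matrix(theta, phi):
    """2x2 transfer matrix of a reconfigurable MZI, following Eq. (1)."""
    s, c = np.sin(theta / 2), np.cos(theta / 2)
    return 1j * np.exp(1j * theta / 2) * np.array(
        [[np.exp(1j * phi) * s, np.exp(1j * phi) * c],
         [c, -s]]
    )

def noisy_mzi(theta_true, phi_true, sigma_theta, sigma_phi,
              il_mean_db, sigma_il_db, rng):
    """Draw one perturbed MZI matrix according to Eqs. (13)-(15)."""
    theta = theta_true + rng.normal(0.0, sigma_theta)   # Eq. (13)
    phi = phi_true + rng.normal(0.0, sigma_phi)         # Eq. (14)
    il_db = il_mean_db + rng.normal(0.0, sigma_il_db)   # Eq. (15)
    # ASSUMPTION: IL is a power loss in dB common to both outputs.
    # Negative draws (net gain) are not clipped in this sketch.
    amplitude = 10.0 ** (-il_db / 20.0)
    return amplitude * mzi_matrix(theta, phi)

rng = np.random.default_rng(0)
x = np.array([1.0, 0.0])

ideal = noisy_mzi(0.7, 1.3, 0.0, 0.0, 0.0, 0.0, rng)   # no noise, no loss
lossy = noisy_mzi(0.7, 1.3, 0.0, 0.0, 1.0, 0.0, rng)   # exactly 1 dB loss
power_out = np.sum(np.abs(lossy @ x) ** 2)

# Monte Carlo average of the output powers over repeated noisy draws.
trials = [np.abs(noisy_mzi(np.pi / 2, 0.0, 0.1, 0.1, 1.0, 0.5, rng) @ x) ** 2
          for _ in range(1000)]
mean_split = np.mean(trials, axis=0)
```

In a full mesh analysis the same draw would be repeated independently for every MZI and every test sample before the classification accuracies are averaged.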

Fig. 5. Classification performance of the single layer ONNs implemented by the 4$\times$4 structure size of the Reck mesh and the proposed diamond mesh. The results are obtained as a function of the phase error standard deviation, $\sigma _\theta$ and $\sigma _\phi$, for 0 dB and 1 dB loss per MZI with 0.5 dB standard deviation.

According to Fig. 5(a), as expected in perfect condition, i.e., 0 dB loss per MZI with a standard deviation of 0.5 dB and zero rad phase uncertainty, both the Reck mesh and the diamond mesh provide 100% classification accuracy. However, the trend is different in practice since the phase error and the IL of the MZIs are almost inevitable. As shown in Fig. 5(b), in the case of 1 dB loss per MZI with a standard deviation of 0.5 dB and zero rad phase error, the ONN implemented by a 4$\times$4 diamond mesh classifies the data samples with 85% accuracy, whereas the classification accuracy obtained by the Reck structure is approximately 75%. As can be seen in this figure, the diamond mesh tends to outperform the Reck mesh when the phase uncertainty is less than 0.5 rad, which makes it the better candidate for practical scenarios. It should be noted that for large phase uncertainties, the classification accuracy of both the Reck-mesh-based and the diamond-mesh-based ONNs decays to that of a random classifier. The better performance of the diamond mesh can be associated with its ability to pass out, through the tapered-out waveguides of its three additional MZIs, the excess light intensity that would otherwise degrade its classification performance. Furthermore, the additional MZIs in the structure provide extra degrees of freedom during the training of the ONN to minimize the loss function value $\mathcal {L}_{\mathrm {(MSE)}}$. Moreover, the symmetric topology of the diamond mesh allows for more phase error robustness and better loss-balanced optical paths from its inputs to outputs.

Figure 6 demonstrates how the classification performance of the 4$\times$4 Reck-topology-based and diamond-topology-based single layer ONNs is degraded by phase errors $\sigma _{\theta }$ and $\sigma _\phi$ in the constituent MZIs. As can be seen in the case of the 4$\times$4 mesh size of the two structures, the classification accuracy is more sensitive to $\sigma _{\theta }$ than to $\sigma _\phi$. This can be associated with the fact that in a single MZI, only the phase shifter $\theta$ determines the power splitting ratio at the outputs. However, in a mesh of MZIs, $\phi$ phase shifters also affect the power splitting ratios of the subsequent MZIs. In small meshes, such as the 4$\times$4 Reck mesh or diamond mesh with only three layers of MZIs, the impact of $\phi$ phase shifters on the power at the outputs is less than that of $\theta$ phase shifters, being more similar to that of a single MZI. This is mainly because, as the light propagates through the waveguides of the structure from the input ports towards the outputs, the $\phi$ phase shifters of the MZIs in the first layer, i.e., MZIs (1), (2), and (4), do not play a role in the adjustment of the power levels at the outputs of the corresponding MZIs. However, they affect the power splitting ratio of the MZIs in the next two layers. Additionally, the $\phi$ phase shifters in the last layer of MZIs, i.e., MZI (6) in the case of the Reck mesh and MZIs (8), (9), and (6) in the diamond mesh, do not have an impact on the power splitting ratio of the corresponding MZIs. Consequently, $\sigma _\phi$ is less significant than $\sigma _{\theta }$ for the classification accuracy of the 4$\times$4 ONNs. However, as the mesh size increases, the number of MZI layers in the structure becomes larger, and thus the impact of $\sigma _\phi$ on the classification accuracy of the structures becomes on par with that of $\sigma _{\theta }$. This fact will be investigated in the next section of the manuscript.
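The different roles of $\theta$ and $\phi$ discussed above can be illustrated with a two-MZI toy cascade built from Eq. (1). The sketch below is not a model of either mesh: it simply places two MZIs in series on the same pair of modes, with illustrative phase values, and compares the output powers.

```python
import numpy as np

def mzi_matrix(theta, phi):
    """2x2 transfer matrix of a reconfigurable MZI, following Eq. (1)."""
    s, c = np.sin(theta / 2), np.cos(theta / 2)
    return 1j * np.exp(1j * theta / 2) * np.array(
        [[np.exp(1j * phi) * s, np.exp(1j * phi) * c],
         [c, -s]]
    )

def output_powers(field):
    """Optical power at each output port."""
    return np.abs(field) ** 2

x = np.array([1.0, 0.0])   # light in the top input only
theta = np.pi / 2          # 50:50 splitting in every MZI

# Single MZI: changing phi leaves the output powers untouched.
single_a = output_powers(mzi_matrix(theta, 0.0) @ x)
single_b = output_powers(mzi_matrix(theta, np.pi) @ x)

def cascade(phi_first, phi_second):
    """Output powers of two MZIs in series on the same two modes."""
    return output_powers(
        mzi_matrix(theta, phi_second) @ mzi_matrix(theta, phi_first) @ x
    )

cascade_a = cascade(0.0, 0.0)
cascade_b = cascade(np.pi, 0.0)   # only phi of the FIRST MZI changed
cascade_c = cascade(0.0, np.pi)   # only phi of the LAST MZI changed
```

In this toy case, changing $\phi$ of the first MZI from 0 to $\pi$ moves all of the power from one output to the other, while changing $\phi$ of the last MZI has no effect on the output powers.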

Fig. 6. Classification accuracy of the single layer ONNs implemented by (a) the 4$\times$4 Reck mesh, and (b) the 4$\times$4 diamond mesh with respect to phase errors $\sigma _{\theta }$ and $\sigma _\phi$.

Figure 7 shows the classification accuracy of the ONN exploiting the Reck and the diamond topologies as a function of phase uncertainty, $\sigma _{\theta }$ and $\sigma _\phi$, and loss per MZI. In this analysis, the standard deviation of the loss per MZI is set to 0.5 dB. It can be seen that compared to the Reck mesh, the proposed diamond topology is less sensitive to the IL of the MZIs for the data sample classification.

Fig. 7. Classification accuracy of the single layer ONNs implemented by (a) the 4$\times$4 Reck mesh, and (b) the 4$\times$4 diamond mesh with respect to phase error, $\sigma _{\theta }$ and $\sigma _\phi$, and loss per MZI. The results were obtained by setting the standard deviation of the loss per MZI to 0.5 dB.

According to Figs. 5, 6 and 7, the 4$\times$4 single layer diamond-topology-based ONN outperforms the Reck-topology-based one when classifying the data samples in the presence of inevitable fabrication process variations and experimental imperfections. This can be attributed to the symmetric topology of the diamond mesh, which makes its optical paths more loss-balanced compared to the Reck mesh. The additional MZIs in the diamond structure also provide extra degrees of freedom in the ONN weight matrix optimization during the backpropagation process. Furthermore, the additional MZIs in the diamond mesh allow the excess light intensity, which would otherwise have degraded the ONN classification performance, to be eliminated through their tapered output waveguides. The triangular mesh, however, maintains all optical power propagating through its waveguides.

4. Scalability of the Reck and the proposed diamond meshes in ONNs

The growing demand for fast, power-efficient and large-scale NNs has highlighted the benefits of optics for applications in optimizing these machine learning models. Over the last few years, silicon photonics has proven to be a promising platform for implementing multiport reconfigurable interferometric structures that are small in footprint, low cost, fast, and power efficient. These structures have increasingly gained attention in different photonic research communities as they can efficiently perform the essential matrix multiplications in NNs [10,14,27,28]. This section analyzes the performance of a single layer ONN with appropriate datasets implemented by different sizes of the Reck and the proposed diamond structures. It shows that as the size of the ONN increases, the classification accuracy becomes more sensitive to phase errors and IL of the constituent MZIs in the optical device due to the larger number of reconfigurable MZIs. Our results show that the proposed $N \times N$ diamond mesh, with $(N-1)(N-2)/2$ additional MZIs, outperforms the $N \times N$ Reck structure in terms of scalability. This is attributed to the additional symmetry present in the diamond topology, resulting in more loss-balanced optical paths from the inputs to the outputs of the device. Additionally, the larger number of MZIs in the diamond structure provides additional degrees of freedom, which allows the ONN to provide better differentiation between the correct class and the other classes.

Fig. 8. Back-propagation of the single layer ONNs implemented by the Reck and the proposed diamond meshes of 32$\times$32 structure size with 0 dB loss per MZI. The unitary transformation (weight) matrices in both scenarios are optimized through minimizing the $\mathcal {L}_{\mathrm {(MSE)}}$ value. The diamond mesh achieves a loss function value as low as 0.153 compared to that of the Reck scenario at 1.21.

Figure 8 compares the backpropagation process of a 32$\times$32 ONN implemented by the Reck and the diamond meshes with 0 dB loss per MZI.

As can be seen in Figs. 8(a) and 8(b), the weight matrix is optimized through minimizing the loss function, where the diamond mesh achieves a validation classification accuracy of 97.71% with a final $\mathcal {L}_{\mathrm {(MSE)}}$ value as low as 0.15, compared to the Reck mesh achieving 97.40% with a final $\mathcal {L}_{\mathrm {(MSE)}}$ value of 1.21. The lower loss function value in the diamond scenario translates to a smaller difference between the output vector and the ground truth vector for each sample during classification.

Figure 9 shows the classification accuracy of the single layer ONNs implemented by various sizes of the two structures as a function of the phase errors $\sigma _{\theta }$ and $\sigma _{\phi }$. As shown in the figure, the area of high classification accuracy in both cases decreases with the size of the structures, i.e., the size of the ONN. The diamond topology is more robust to experimental phase uncertainties, which makes it better suited for large scale ONNs. The reason is that, unlike the Reck mesh, the diamond topology can optimally direct the required light intensity towards its outputs for correct classification and discard the part of the intensity that degrades classification through the tapered output waveguides of its additional MZIs. Furthermore, as shown in Fig. 8, the additional MZIs in the diamond mesh yield extra degrees of freedom for the training process of the ONNs, allowing for a better minimization of the loss function value.
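To make the role of $\sigma _{\theta }$ and $\sigma _{\phi }$ concrete, the sketch below perturbs the two phases of a single MZI with zero-mean Gaussian noise and compares the output powers. The transfer matrix is one common convention for an MZI with ideal 50:50 couplers and is an assumption here, as are the function names; a full analysis would perturb every MZI in the mesh.

```python
import cmath
import math
import random


def mzi_matrix(theta, phi):
    """2x2 unitary of an ideal MZI (assumed convention): internal phase theta, external phase phi."""
    g = 1j * cmath.exp(1j * theta / 2)
    s, c = math.sin(theta / 2), math.cos(theta / 2)
    e = cmath.exp(1j * phi)
    return [[g * e * s, g * c], [g * e * c, -g * s]]


def output_powers(matrix, field_in):
    """Optical power at each output port for a given input field vector."""
    return [abs(sum(m * x for m, x in zip(row, field_in))) ** 2 for row in matrix]


def noisy_output_powers(theta, phi, sigma_theta, sigma_phi, field_in, rng):
    """Output powers after adding Gaussian phase errors to theta and phi."""
    noisy_theta = theta + rng.gauss(0.0, sigma_theta)
    noisy_phi = phi + rng.gauss(0.0, sigma_phi)
    return output_powers(mzi_matrix(noisy_theta, noisy_phi), field_in)


rng = random.Random(0)  # fixed seed for repeatability
ideal = output_powers(mzi_matrix(math.pi / 2, 0.0), [1.0, 0.0])
noisy = noisy_output_powers(math.pi / 2, 0.0, 0.013, 0.013, [1.0, 0.0], rng)
print(ideal, noisy)
```

Because the perturbed matrix is still unitary, the total output power is conserved and only the splitting between ports shifts; in a mesh these small shifts accumulate over every MZI a path traverses.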

Fig. 9. Classification accuracy of the single layer ONNs implemented by different sizes of the Reck structure and the proposed diamond mesh. The results are obtained as a function of phase uncertainties $\sigma _{\theta }$ and $\sigma _{\phi }$.

It should be noted that in this analysis every mesh size has its own dataset, whose dimensionality equals the mesh size, i.e., the N=64 meshes use a 64-dimensional multivariate Gaussian dataset, while the N=32 meshes use a 32-dimensional one.

Figure 10 depicts the performance of different sizes of the single layer ONNs constructed by the Reck and the diamond meshes with regard to the phase errors, $\sigma _{\theta }$ and $\sigma _{\phi }$, and the IL of each MZI in the structures. The Figure of Merit (FoM), expressed in $\mathrm {Rad} \cdot \mathrm {dB}$, represents the area of the region in which the classification accuracy remains above 75%. According to Fig. 10, the FoM is reduced in larger structures, implying a higher sensitivity to experimental phase errors and to the IL of the MZIs. As the structure size increases, the effect of phase error on the classification performance of the single layer ONNs becomes more significant, particularly in the Reck-topology-based scenarios. For instance, a voltage noise standard deviation of 8.67 mV, corresponding to $\sigma _{\theta }$ and $\sigma _{\phi }$ of 0.013 Rad [14], results in a classification accuracy degradation of 3% in the 64$\times$64 Reck-topology-based ONN compared to 0.56% in the corresponding diamond-topology-based one.
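One way to evaluate such an area-based FoM numerically is sketched below. It assumes the accuracy has been sampled on a regular grid of phase error and loss values and approximates the area by counting grid cells above the threshold; the grid values and step sizes are invented for illustration and are not results from this work.

```python
def figure_of_merit(accuracy_grid, d_sigma_rad, d_loss_db, threshold=75.0):
    """Approximate area (Rad*dB) of the region where accuracy (%) exceeds the threshold.

    accuracy_grid[i][j] is the accuracy at the i-th loss value and j-th phase error value,
    sampled with uniform steps d_loss_db and d_sigma_rad.
    """
    cells_above = sum(1 for row in accuracy_grid for acc in row if acc > threshold)
    return cells_above * d_sigma_rad * d_loss_db


# Hypothetical accuracy samples: rows are loss steps, columns are phase error steps.
toy_grid = [
    [98.0, 95.0, 80.0, 60.0],
    [96.0, 90.0, 76.0, 55.0],
    [90.0, 78.0, 70.0, 50.0],
]

fom = figure_of_merit(toy_grid, d_sigma_rad=0.05, d_loss_db=0.5)
print(fom)
```

A larger FoM means that the accuracy stays above 75% over a wider range of phase errors and losses, so a single number summarizes the size of the usable operating region.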

Fig. 10. Classification accuracy of the single layer ONNs implemented by different sizes of the Reck mesh and the proposed diamond mesh. The results are obtained as a function of the IL (loss) per MZI and phase uncertainty, $\sigma _{\theta }$ and $\sigma _{\phi }$.

Figure 11 summarizes the performance analysis of the corresponding ONNs implemented by the Reck and the proposed diamond meshes with different structure sizes. Figure 11(a) compares the related FoM values of the classification accuracy area above 75% and Fig. 11(b) shows the final loss function values (MSE) obtained through the corresponding backpropagation processes.

Fig. 11. Performance comparison between the Reck and the proposed diamond structures in different sizes $(N)$ of the single layer ONNs; (a) figure of merit (FoM), and (b) final loss function values (MSE).

It can be seen that the diamond mesh is expected to outperform the Reck mesh experimentally, as it exhibits higher FoM and lower final loss function values for various sizes of ONNs. The higher FoM at any given size of the diamond structure translates to higher classification accuracies under experimental conditions, where phase errors and the IL of the MZIs are inevitable. Additionally, the lower final loss function value (MSE) in the diamond mesh can be attributed to the extra degrees of freedom provided by the $(N-1)(N-2)/2$ additional MZIs, leading to a more optimal weight matrix in the ONN. Moreover, the diamond mesh can eliminate the excess light intensity that degrades its classification accuracy through the tapered-out waveguides of its additional MZIs, whereas the Reck mesh does not have this possibility.
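The importance of loss-balanced paths follows from how per-MZI IL accumulates. The sketch below converts a loss in dB per MZI into the linear power transmission of a path and into the imbalance between two paths of different length; the loss value and path lengths are hypothetical and are not taken from either mesh layout.

```python
def path_transmission(il_db_per_mzi: float, num_mzis: int) -> float:
    """Linear power transmission of a path crossing num_mzis MZIs with equal IL (dB) each."""
    return 10 ** (-(il_db_per_mzi * num_mzis) / 10)


def path_imbalance_db(il_db_per_mzi: float, shortest: int, longest: int) -> float:
    """Loss difference (dB) between the longest and the shortest path."""
    return il_db_per_mzi * (longest - shortest)


il = 0.5  # dB per MZI, illustrative value
for k in (1, 5, 10):  # hypothetical numbers of MZIs along a path
    print(k, path_transmission(il, k))
print(path_imbalance_db(il, 1, 10))
```

When paths differ strongly in length, the same per-MZI IL attenuates some matrix elements far more than others and distorts the implemented weight matrix, whereas equal path lengths lead mostly to a common attenuation.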

5. Conclusion

A phase-error- and loss-tolerant field-programmable MZI-based optical processor with diamond mesh topology was proposed to implement various sizes of single layer ONNs. The classification performance of the proposed structure was compared to that of the Reck mesh. Our results confirm that the diamond structure is a good practical candidate for ONNs. This is related to the $(N-1)(N-2)/2$ additional MZIs in the diamond mesh compared to the Reck structure, which can be tuned to exclude, through their tapered-out waveguides, the excess light that degrades the classification performance. Furthermore, these additional MZIs make the diamond mesh more robust to phase errors and to the IL of the MZIs, owing to its more symmetric topology, i.e., more loss-balanced optical paths, and to the additional degrees of freedom in the ONN weight matrix optimization.

Funding

Canada Research Chairs; Natural Sciences and Engineering Research Council of Canada (NSERC).

Disclosures

The authors declare no conflicts of interest.

References

1. I. A. D. Williamson, T. W. Hughes, M. Minkov, B. Bartlett, S. Pai, and S. Fan, “Reprogrammable Electro-Optic Nonlinear Activation Functions for Optical Neural Networks,” IEEE J. Sel. Top. Quantum Electron. 26(1), 1–12 (2020).

2. D. Perez, E. S. Gomariz, and J. Capmany, “Programmable True-Time Delay Lines Using Integrated Waveguide Meshes,” J. Lightwave Technol. 36(19), 4591–4601 (2018).

3. T. W. Hughes, M. Minkov, Y. Shi, and S. Fan, “Training of Photonic Neural Networks Through in Situ Backpropagation and Gradient Measurement,” Optica 5(7), 864–871 (2018).

4. R. Burgwal, W. R. Clements, D. H. Smith, J. C. Gates, W. S. Kolthammer, J. J. Renema, and I. A. Walmsley, “Using an Imperfect Photonic Network to Implement Random Unitaries,” Opt. Express 25(23), 28236–28245 (2017).

5. D. A. B. Miller, “Self-Aligning Universal Beam Coupler,” Opt. Express 21(5), 6360–6370 (2013).

6. D. A. B. Miller, “Self-Configuring Universal Linear Optical Component,” Photonics Res. 1(1), 1–15 (2013).

7. R. Hamerly, A. Sludds, L. Bernstein, M. Prabhu, C. Roques-Carmes, J. Carolan, Y. Yamamoto, M. Soljacic, and D. Englund, “Towards Large-Scale Photonic Neural-Network Accelerators,” in 2019 IEEE International Electron Devices Meeting (IEDM), (2019), pp. 22.8.1–22.8.4.

8. L. Chrostowski, H. Shoman, M. Hammood, H. Yun, J. Jhoja, E. Luan, S. Lin, A. Mistry, D. Witt, N. A. F. Jaeger, S. Shekhar, H. Jayatilleka, P. Jean, S. B. Villers, J. Cauchon, W. Shi, C. Horvath, J. N. Westwood-Bachman, K. Setzer, M. Aktary, N. S. Patrick, R. J. Bojko, A. Khavasi, X. Wang, T. Ferreira de Lima, A. N. Tait, P. R. Prucnal, D. E. Hagan, D. Stevanovic, and A. P. Knights, “Silicon Photonic Circuit Design Using Rapid Prototyping Foundry Process Design Kits,” IEEE J. Sel. Top. Quantum Electron. 25(5), 1–26 (2019).

9. J. M. Shainline, S. M. Buckley, R. P. Mirin, and S. W. Nam, “Superconducting Optoelectronic Circuits for Neuromorphic Computing,” Phys. Rev. Appl. 7(3), 034013 (2017).

10. Y. Shen, N. C. Harris, S. Skirlo, M. Prabhu, T. Baehr-Jones, M. Hochberg, X. Sun, S. Zhao, H. Larochelle, D. Englund, and M. Soljačić, “Deep Learning with Coherent Nanophotonic Circuits,” Nat. Photonics 11(7), 441–446 (2017).

11. W. R. Clements, P. C. Humphreys, B. J. Metcalf, W. S. Kolthammer, and I. A. Walmsley, “Optimal Design for Universal Multiport Interferometers,” Optica 3(12), 1460–1465 (2016).

12. M. Reck, A. Zeilinger, H. J. Bernstein, and P. Bertani, “Experimental Realization of any Discrete Unitary Operator,” Phys. Rev. Lett. 73(1), 58–61 (1994).

13. F. Shokraneh, M. S. Nezami, and O. Liboiron-Ladouceur, “Theoretical and Experimental Analysis of a 4 × 4 Reconfigurable MZI-Based Linear Optical Processor,” J. Lightwave Technol. 38(6), 1258–1267 (2020).

14. F. Shokraneh, S. Geoffroy-Gagnon, M. S. Nezami, and O. Liboiron-Ladouceur, “A Single Layer Neural Network Implemented by a 4 × 4 MZI-Based Optical Processor,” IEEE Photonics J. 11(6), 1–12 (2019).

15. J. Fu, H. Zheng, and T. Mei, “Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-Grained Image Recognition,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2017), pp. 4476–4484.

16. G. K. Venayagamoorthy, V. Moonasar, and K. Sandrasegaran, “Voice recognition using neural networks,” in Proceedings of the 1998 South African Symposium on Communications and Signal Processing-COMSIG ’98 (Cat. No. 98EX214), (1998), pp. 29–32.

17. B. Wu, A. Wan, F. Iandola, P. H. Jin, and K. Keutzer, “SqueezeDet: Unified, Small, Low Power Fully Convolutional Neural Networks for Real-Time Object Detection for Autonomous Driving,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), (2017), pp. 446–454.

18. B. Bartlett, M. Minkov, T. Hughes, and I. Williamson, “Neuroptica: An Optical Neural Network Simulator,” (2019) (Available: https://github.com/fancompute/neuroptica).

19. G. Veronis, R. W. Dutton, and S. Fan, “Method for Sensitivity Analysis of Photonic Crystal Devices,” Opt. Lett. 29(19), 2288–2290 (2004).

20. S. Geoffroy-Gagnon, “Neuroptica,” (GitLab, 2020) (Available: https://gitlab.com/simongg/neuroptica).

21. M. A. Nielsen, Neural Networks and Deep Learning (Determination Press, 2018).

22. F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and É. Duchesnay, “Scikit-learn: Machine learning in Python,” J. Machine Learn. Res. 12, 2825–2830 (2011).

23. I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning (MIT Press, 2016).

24. Y. LeCun, L. Bottou, G. Orr, and K. Muller, “Efficient BackProp,” in Neural Networks: Tricks of the Trade, G. Orr and K. Muller, eds. (Springer, 1998).

25. Y. LeCun, “A Theoretical Framework for Back-Propagation,” in Artificial Neural Networks: concepts and theory, P. Mehra and B. Wah, eds. (IEEE Computer Society Press, Los Alamitos, CA, 1992).

26. H. Wu, “Stability Analysis for Periodic Solution of Neural Networks with Discontinuous Neuron Activations,” Nonlinear Anal.: Real World Appl. 10(3), 1717–1729 (2009).

27. T. W. Hughes, M. Minkov, Y. Shi, and S. Fan, “Training of photonic neural networks through in situ backpropagation and gradient measurement,” Optica 5(7), 864–871 (2018).

28. Q. Cheng, J. Kwon, M. Glick, M. Bahadori, L. P. Carloni, and K. Bergman, “Silicon Photonics Codesign for Deep Learning,” Proceedings of the IEEE pp. 1–22 (2020).

Optics Express

Farhad Shokraneh	https://orcid.org/0000-0002-6894-170X
Odile Liboiron-Ladouceur	https://orcid.org/0000-0001-6238-5346