Abstract
We develop a learning algorithm to build an architecture-agnostic model of a reconfigurable optical interferometer. Programming a unitary transformation of the optical modes of an interferometer either follows an analytical expression yielding a unitary matrix given a set of phase shifts, or requires an optimization routine if an analytic decomposition does not exist. Our algorithm adopts a supervised learning strategy which matches a model of an interferometer to a training set populated by samples produced by the device under study. A simple optimization routine then uses the trained model to output the phase shifts corresponding to a desired unitary transformation of the interferometer with a given architecture. Our result provides a recipe for efficient tuning of interferometers even without a rigorous analytical description, which opens the opportunity to explore new architectures of interferometric circuits.
© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement
1. Introduction
Linear optical interferometers are rapidly becoming an indispensable tool in quantum optics [1] and optical information processing [2]. The interest in linear optics grows due to the broader availability of integrated photonic fabrication technology to the scientific community. The key feature of a state-of-the-art integrated linear interferometer is reconfigurability, which enables the device to change its effect on the input optical modes on demand. This possibility has made programmable linear optical circuits particularly appealing for information processing challenges. In particular, reconfigurable interferometers are the main ingredient of contemporary linear optical quantum computing experiments [3–5] and are considered as the core of optical hardware accelerators for deep learning applications [6,7]. Furthermore, the fabrication quality and improved scalability of reconfigurable photonic circuits have led to the emergence of the field-programmable photonic array concept - a multipurpose photonic circuit that can serve many possible applications by means of low-level programming of the device [8].
A unitary transformation matrix $U$ completely describes the operation of a linear optical interferometer. The matrix $U$ couples the input optical modes of the device to the output ones: $a^{(out)}_{j}=\sum _{i}U_{ij}a^{(in)}_{i}$. The architecture of the interferometer parametrizes the transformation $U = U(\{\varphi \})$ by tunable parameters $\{\varphi \}$, which are typically phase shifters controlling the relative phases between arms of the interferometer. The architecture is called universal if it allows reaching any $N\times N$ unitary matrix by appropriately setting the phase shifts $\{\varphi \}$. Programming the device then essentially boils down to establishing the correspondence between the desired matrix $U_{0}$ and the appropriate parameter set $\{\varphi ^{0}\}$. The Hurwitz analytical decomposition of an $N\times N$ unitary matrix [9] is a well-known example of a universal architecture. It admits a straightforward implementation using simple optical components [10,11] - two-port Mach-Zehnder interferometers (MZI) with two controllable phase shifters each. The interferometer architecture based on this decomposition is a mesh layout of MZIs, which is very easy to program - an efficient inverse algorithm returns the values for each phase shifter given the unitary matrix $U_{0}$. The simplicity of this architecture comes at the cost of extremely stringent fabrication tolerances. Universal operation is achieved if and only if the beamsplitters in the MZI blocks are perfectly balanced, which is never the case in a real device. Numerical optimization methods have been adopted to mitigate the effect of imperfections [12,13], but the simple programming flow is lost.
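To make the parametrization $U(\{\varphi\})$ concrete, a single MZI block with two phase shifters can be sketched as follows (a minimal NumPy sketch; the beamsplitter convention and the placement of the phase shifters are our assumptions - real devices differ by fixed phase offsets):

```python
import numpy as np

def mzi(theta, phi):
    """2x2 transfer matrix of a Mach-Zehnder interferometer: an external
    phase shifter phi, followed by two balanced 50:50 beamsplitters
    enclosing an internal phase shifter theta (one common convention)."""
    bs = np.array([[1, 1j], [1j, 1]], dtype=complex) / np.sqrt(2)
    ps_int = np.diag([np.exp(1j * theta), 1])   # internal phase shifter
    ps_ext = np.diag([np.exp(1j * phi), 1])     # external phase shifter
    return bs @ ps_int @ bs @ ps_ext

# Any phase settings yield a unitary transformation of the two modes
u = mzi(0.7, 1.3)
assert np.allclose(u @ u.conj().T, np.eye(2))
```

With imbalanced beamsplitters the set of reachable $2\times 2$ unitaries of a single block shrinks, which is precisely the fabrication-tolerance issue discussed above.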
The challenge of overcoming the effect of fabrication defects has also led to the development of more sophisticated architectures [14–18] which have no simple analytical description and can only be programmed using optimization routines. Running an optimization routine to set up a real physical device transformation requires the experimental execution of a resource-intensive transformation matrix reconstruction procedure [19] at each iteration of the chosen optimization algorithm. From the end-user perspective, the necessity to optimize the device configuration each time the transformation needs to be changed is unacceptable. Several works report progress in designing optimization algorithms for precise adjustment of phase shifts in an interferometer [20–23]. Firstly, optimization in a high-dimensional parameter space is itself a time-consuming procedure requiring sophisticated tuning and, what is more, there is no guarantee that the global minimum will be reached. Secondly, the algorithms providing fast convergence in multiparameter optimization problems are typically gradient-based, and the precision of the gradient estimation of the objective function implemented by the physical device is limited by measurement noise. Lastly, even though the number of switching cycles of the phase shifters is not strictly limited, spending the device resource during tedious optimization procedures severely degrades the lifetime of the programmable circuit. It is worth mentioning that, besides optimization methods, new circuit architectures with auxiliary elements implementing a feedback loop have been proposed to tackle the photonic circuit programming challenge [24,25].
In this work, we develop an efficient algorithm for programming a linear optical interferometer with a complex architecture. We employ one of the main methods of machine learning - supervised learning, widely applied to neural network training [26–28]. The interferometer model is learnt using a set of sample transformations corresponding to different phase shifts. The trained model is then used to quickly find the phase shifts for a given unitary transformation by means of an optimization routine applied to the model rather than to the physical device. Our learning algorithm is divided into two stages: the training stage - finding the model of the interferometer using the training set of sample transformations - and the programming stage - determining the phase shifts of the interferometer model corresponding to the required transformation.
2. Formulation of the problem
We devised our algorithm to solve the problem of programming a multimode interferometer consisting of alternating phase-shifting and mode-mixing layers. This architecture has been proven to deliver close-to-universal performance and has no direct connection linking the elements of the matrix to the phase shifts of the interferometer [15]. It therefore serves as a perfect example to demonstrate the gist of our algorithm. The interferometer circuit topology is outlined in Fig. 1(a). The unitary matrix $U$ is expressed as
$$U(\bar{\Phi}) = \Phi_{L} U_{L}\, \Phi_{L-1} U_{L-1} \cdots \Phi_{1} U_{1}, \qquad (1)$$
where the fixed basis matrices $U_{\ell }$ describe the mode-mixing layers and $\Phi_{\ell } = \mathrm{diag}(e^{i\varphi _{\ell 1}}, \ldots , e^{i\varphi _{\ell N}})$ are diagonal matrices of the tunable phase shifts $\varphi _{\ell k}$.
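A minimal NumPy sketch of this layered expansion (the function and variable names are ours, not the paper's code):

```python
import numpy as np

def compose(basis, phases):
    """Compose U = Phi_L U_L ... Phi_1 U_1 from L fixed N x N mode-mixing
    (basis) matrices and an L x N array of phase shifts."""
    u = np.eye(basis[0].shape[0], dtype=complex)
    for u_l, phi_l in zip(basis, phases):              # layers 1 .. L
        u = np.diag(np.exp(1j * phi_l)) @ u_l @ u
    return u

# Example: N = L = 4 with random unitary mixing layers
rng = np.random.default_rng(42)
N = L = 4
basis = [np.linalg.qr(rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N)))[0]
         for _ in range(L)]
u = compose(basis, rng.uniform(0, 2 * np.pi, size=(L, N)))
assert np.allclose(u @ u.conj().T, np.eye(N))          # product of unitaries is unitary
```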
The problem underpinning the difficulty of programming the required unitary in the multimode architecture is that the basis matrices $U_{\ell }$ of a fabricated device do not match the ones implied by the optical circuit design. The interferometer universality may not be degraded [15], but an efficient evaluation of the phase shifts $\varphi _{\ell k}$ becomes impossible since the $U_{\ell }$ are no longer known. The absence of the $U_{\ell }$ matrices brings us to the first step of our learning algorithm - a reconstruction of the elements of the basis matrices utilizing the information stored in the training set $\mathcal {T}$ gathered from the device of interest (see Fig. 1(b)). The set $\mathcal {T}$ includes $M$ pairs $(\bar {U}^{(i)},\bar {\Phi }^{(i)})$ obtained by seeding the device with random phase shifts $\bar {\Phi }^{(i)}$ and applying the unitary reconstruction procedure of choice [19,30] to get the $\bar {U}^{(i)}$. The basis matrices $U_{\ell }$ are then determined as a solution of the optimization problem
$$\{U_{\ell }^{0}\} = \underset{\{U_{\ell }\}}{\mathrm{argmin}} \sum_{i=1}^{M} J\!\left(U(\{U_{\ell }\}, \bar{\Phi}^{(i)}),\, \bar{U}^{(i)}\right), \qquad (2)$$
where $J$ is a matrix distance metric specified below.
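The objective of this optimization problem can be sketched as follows, assuming the layered expansion of Eq. (1) and a Frobenius-type distance $J$ (the helper names are ours):

```python
import numpy as np

def compose(basis, phases):
    """Trial interferometer model: U = Phi_L U_L ... Phi_1 U_1."""
    u = np.eye(basis[0].shape[0], dtype=complex)
    for u_l, phi_l in zip(basis, phases):
        u = np.diag(np.exp(1j * phi_l)) @ u_l @ u
    return u

def training_loss(basis, training_set):
    """Objective of the basis-matrix reconstruction: summed Frobenius
    distances between the model predictions and the reconstructed
    unitaries over the training pairs (U_bar, Phi_bar)."""
    return sum(np.sum(np.abs(compose(basis, phi) - u_bar) ** 2)
               for u_bar, phi in training_set)
```

The loss vanishes exactly when the trial basis matrices reproduce every training sample, which is the fixed point the learning stage searches for.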
3. Learning algorithm
In this section, we present the algorithm which learns the interferometer model $\mathcal {M}$ from the initial data contained in the training set $\mathcal {T}$. We reduce the learning problem to the multiparameter optimization of a nonlinear functional $J$, present the mathematical framework of the learning algorithm, and exemplify its performance on the multimode interferometer.
3.1 Figure of merit
The figure of merit $J$ to be studied in our work is the Frobenius norm $J_{FR}$:
$$J_{FR}(U, \bar{U}) = \sum_{i,j=1}^{N} |u_{ij} - \bar{u}_{ij}|^{2}, \qquad (3)$$
where $u_{ij}, \bar {u}_{ij}$ are the complex elements of the unitary matrices $U, \bar {U}$. This metric vanishes only for identical matrices, that is, $J_{FR} (U, \bar {U}) = 0$ if and only if the magnitudes and the phases of the elements of $U$ and $\bar {U}$ coincide. Expression (3) can be rewritten using the Hadamard product $(A \odot B)_{i, j}=(A)_{i, j} \cdot (B)_{i, j}$ and takes the following form:
$$J_{FR} = \sum_{i,j}\left[(U-\bar{U})\odot (U-\bar{U})^{*}\right]_{ij}. \qquad (4)$$
The gradient of $J_{FR}$ with respect to a real parameter set $\{\bar {\alpha }\}$ is given by:
$$\partial_{\bar{\alpha}} J_{FR} = 2\operatorname{Re}\sum_{i,j}\left[\partial_{\bar{\alpha}} U \odot (U-\bar{U})^{*}\right]_{ij}. \qquad (5)$$
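In code, the figure of merit and its gradient, given $\partial_{\bar\alpha}U$, are one-liners; the sketch below also checks the gradient formula against a finite difference for a toy one-parameter family (the names are ours):

```python
import numpy as np

def j_fr(u, u_bar):
    """Frobenius figure of merit: sum_ij |u_ij - u_bar_ij|^2."""
    return np.sum(np.abs(u - u_bar) ** 2)

def j_fr_grad(du, u, u_bar):
    """Gradient of J_FR w.r.t. one real parameter, given dU/dalpha.
    The Hadamard product is NumPy's element-wise '*'."""
    return 2.0 * np.real(np.sum(du * np.conj(u - u_bar)))

# Finite-difference check for the toy family U(a) = diag(exp(i a), 1)
u_of = lambda a: np.diag([np.exp(1j * a), 1.0])
a, eps = 0.3, 1e-7
du = np.diag([1j * np.exp(1j * a), 0.0])       # analytic dU/da
u_bar = np.diag([np.exp(0.5j), 1.0])
fd = (j_fr(u_of(a + eps), u_bar) - j_fr(u_of(a), u_bar)) / eps
assert abs(j_fr_grad(du, u_of(a), u_bar) - fd) < 1e-5
```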
3.2 Computing the gradients of $J$
A gradient-based optimization algorithm benefits substantially from analytical expressions for the gradients of the optimized function. It turns out that the multimode interferometer expansion Eq. (1) admits simple analytic forms of the gradients over the elements $u_{ij} = x_{ij} + iy_{ij}$ of the basis matrices $U_{\ell }$ and over the phase shifts $\varphi _{\ell k}$. We will derive the analytical expressions for the gradients $\partial _{x^{(\ell )}_{ij}} J_{FR}$, $\partial _{y^{(\ell )}_{ij}} J_{FR}$ and $\partial _{\varphi _{\ell k}} J_{FR}$ required during the learning and tuning stages of the algorithm, respectively. Eq. (5) implies that the calculation of these gradients reduces to finding expressions for $\partial _{x^{(\ell )}_{ij}} U$, $\partial _{y^{(\ell )}_{ij}} U$ and $\partial _{\varphi _{\ell k}} U$.
We will first focus on the $\partial _{x^{(\ell )}_{ij}} J_{FR}$ and $\partial _{y^{(\ell )}_{ij}} J_{FR}$ gradients. In order to simplify the computation we introduce $L$ auxiliary matrices $A_{\ell }$ and another $L$ auxiliary matrices $B_{\ell }$ as the partial products taken from the expansion Eq. (1):
$$A_{\ell } = \Phi_{L} U_{L} \cdots U_{\ell +1}\Phi_{\ell }, \qquad B_{\ell } = \Phi_{\ell -1}U_{\ell -1}\cdots \Phi_{1} U_{1}, \qquad (6)$$
so that $U = A_{\ell }\, U_{\ell }\, B_{\ell }$ for every $\ell = 1,\ldots ,L$. The matrices $A_{\ell }$ and $B_{\ell }$ can be calculated iteratively:
$$A_{L} = \Phi_{L}, \quad A_{\ell -1} = A_{\ell }\, U_{\ell }\, \Phi_{\ell -1}; \qquad B_{1} = I, \quad B_{\ell +1} = \Phi_{\ell }\, U_{\ell }\, B_{\ell }. \qquad (7)$$
Next, given that $x_{ij}^{({\ell })}$ and $y_{ij}^{({\ell })}$ are the real and imaginary parts of $u_{ij}^{({\ell })}$ respectively, we get the expressions for the gradients:
$$\partial _{x^{(\ell )}_{ij}} U = A_{\ell }\, E^{(ij)}\, B_{\ell }, \qquad \partial _{y^{(\ell )}_{ij}} U = i\, A_{\ell }\, E^{(ij)}\, B_{\ell }, \qquad (8)$$
where $E^{(ij)}$ denotes the matrix with a single unit element in the $i$-th row and $j$-th column and zeros elsewhere.
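A self-contained finite-difference check of these gradient expressions, assuming the layered form $U=\Phi_L U_L\cdots\Phi_1 U_1$ and 0-based indexing; `E` below stands for the single-entry matrix $E^{(ij)}$ (our notation):

```python
import numpy as np

rng = np.random.default_rng(0)
N = L = 3

def haar(n):
    """Haar-random unitary via QR of a complex Ginibre matrix."""
    q, r = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
    return q * (np.diag(r) / np.abs(np.diag(r)))

basis = [haar(N) for _ in range(L)]                        # U_1 .. U_L
P = [np.diag(np.exp(1j * rng.uniform(0, 2 * np.pi, N))) for _ in range(L)]

def compose(basis):
    u = np.eye(N, dtype=complex)
    for u_l, p_l in zip(basis, P):
        u = p_l @ u_l @ u                                  # Phi_L U_L ... Phi_1 U_1
    return u

# Partial products for layer l (0-based): U = A @ basis[l] @ B
l = 1
A = P[l].copy()
for u_k, p_k in zip(basis[l + 1:], P[l + 1:]):
    A = p_k @ u_k @ A                                      # Phi_L U_L ... U_{l+1} Phi_l
B = np.eye(N, dtype=complex)
for u_k, p_k in zip(basis[:l], P[:l]):
    B = p_k @ u_k @ B                                      # Phi_{l-1} U_{l-1} ... Phi_1 U_1
assert np.allclose(A @ basis[l] @ B, compose(basis))

# dU / d(Re u_ij) = A E^{(ij)} B, checked against a finite difference
i, j, eps = 0, 2, 1e-7
E = np.zeros((N, N)); E[i, j] = 1.0
pert = [m.copy() for m in basis]
pert[l] = pert[l] + eps * E
fd = (compose(pert) - compose(basis)) / eps
assert np.allclose(A @ E @ B, fd, atol=1e-5)
```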
Once the basis matrices of the model $\mathcal {M}$ are learnt, we can use them to calculate the gradients $\partial _{\varphi _{\ell k}} J_{FR}$. The derivation of $\partial _{\varphi _{\ell k}} J_{FR}$ also requires introducing $L + 1$ auxiliary matrices $C_{\ell }$ and $L + 1$ matrices $D_{\ell }$ ($\ell = 0, \ldots , L$),
$$C_{\ell } = U_{\ell }\, \Phi_{\ell -1} U_{\ell -1} \cdots \Phi_{1} U_{1}, \qquad D_{\ell } = \Phi_{L} U_{L} \cdots \Phi_{\ell +1} U_{\ell +1}, \qquad (9)$$
representing the partial products from the general expansion Eq. (1), so that $U = D_{\ell }\, \Phi_{\ell }\, C_{\ell }$. The iterative formula establishes $C_{\ell }$ and $D_{\ell }$ for each index $\ell$:
$$C_{0} = I, \quad C_{\ell } = U_{\ell }\, \Phi_{\ell -1}\, C_{\ell -1} \ (\Phi_{0} \equiv I); \qquad D_{L} = I, \quad D_{\ell -1} = D_{\ell }\, \Phi_{\ell }\, U_{\ell }. \qquad (10)$$
Once the $C_{\ell }$ and $D_{\ell }$ are computed, the gradients $\partial _{\varphi _{\ell k}} U$ are given by
$$\partial _{\varphi _{\ell k}} U = i\, e^{i\varphi _{\ell k}}\, D_{\ell }\, E^{(kk)}\, C_{\ell }. \qquad (11)$$
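The phase-shift gradients admit the same kind of sanity check; the sketch below builds the partial products around one phase layer and compares the analytic derivative with a finite difference (0-based indexing and helper names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
N = L = 3

basis = [np.linalg.qr(rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N)))[0]
         for _ in range(L)]
phases = rng.uniform(0, 2 * np.pi, size=(L, N))

def compose(phases):
    u = np.eye(N, dtype=complex)
    for u_l, phi_l in zip(basis, phases):
        u = np.diag(np.exp(1j * phi_l)) @ u_l @ u          # Phi_L U_L ... Phi_1 U_1
    return u

# Partial products around phase layer l (0-based): U = D @ Phi_l @ C
l, k = 1, 2
C = basis[0].copy()
for idx in range(1, l + 1):
    C = basis[idx] @ np.diag(np.exp(1j * phases[idx - 1])) @ C
D = np.eye(N, dtype=complex)
for idx in range(L - 1, l, -1):
    D = D @ np.diag(np.exp(1j * phases[idx])) @ basis[idx]
assert np.allclose(D @ np.diag(np.exp(1j * phases[l])) @ C, compose(phases))

# dU/dphi_{lk} = i e^{i phi_lk} D[:, k] (C[k, :]) as an outer product
du = 1j * np.exp(1j * phases[l, k]) * np.outer(D[:, k], C[k, :])

eps = 1e-7
pert = phases.copy(); pert[l, k] += eps
fd = (compose(pert) - compose(phases)) / eps
assert np.allclose(du, fd, atol=1e-5)
```

The outer-product form avoids materializing $E^{(kk)}$ and makes each phase gradient an $O(N^2)$ operation once the partial products are cached.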
4. Numerical experiment
In this section, we provide key performance metrics of the interferometer model learning algorithm. We test the scaling properties of the algorithm with respect to the training set size $M$ and the number of interferometer modes $N$. We set the number of layers $L=N$ because this configuration already delivers universal performance [15]. To certify the quality of the model $\mathcal {M}$ we employ the cross-validation methodology, which requires testing the model on a set of examples that were not included in the training set $\mathcal {T}$.
The simulation of the learning algorithm follows a series of steps. We first generate the training set $(\bar {U}^{(i)},\bar {\Phi }^{(i)})$ using the multimode interferometer expansion Eq. (1). The phases are sampled randomly from a uniform distribution on $[0, 2\pi )$. The basis matrices $U_{\ell }$ are drawn from the Haar-uniform distribution using the QR decomposition [31]. In a real-life setting, the elements of $\mathcal {T}$ are the outcomes $\bar {U}^{(i)}_{rec}$ of the unitary reconstruction algorithms [19,30] applied to a reconfigurable interferometer programmed with the phases $\bar {\Phi }^{(i)}$. The subtleties of experimentally gathering an appropriate training set are discussed in Sec. 5.
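The Haar-uniform sampling step can be sketched following Mezzadri's recipe [31]; note that the diagonal phase fix is required - the raw Q factor of a QR decomposition alone is not Haar-distributed:

```python
import numpy as np

def haar_unitary(n, rng=None):
    """Draw an n x n unitary from the Haar measure: QR-decompose a complex
    Ginibre matrix and fix the phases of R's diagonal (Mezzadri [31])."""
    rng = rng or np.random.default_rng()
    z = (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))   # rescale columns to make the measure Haar

u = haar_unitary(5)
assert np.allclose(u @ u.conj().T, np.eye(5))
```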
A proper interferometer model must accurately predict the unitary matrix of the real device for a given set of applied phases $\Phi$. The purpose of the cross-validation procedure is to estimate the predictive strength of the model $\mathcal {M}$. We generate the test set $(\hat {U}^{(i)},\hat {\Phi }^{(i)})$ comprised of randomly selected phases $\hat {\Phi }^{(i)}$ and the corresponding $\hat {U}^{(i)}$. The cross-validation routine uses each sample from the test set to verify whether the interferometer model with phases $\hat {\Phi }^{(i)}$ outputs the unitary $\hat {U}^{(i)}$. The model is considered to pass the cross-validation if $J_{FR}(U,\hat {U}) \leq 10^{-2}$. This criterion has been derived empirically by analyzing the convergence behaviour of $J_{FR}$ on the test set: if the model passes cross-validation, $J_{FR}(U,\hat {U})$ experiences a rapid decrease down to values below $10^{-2}$.
The model is initialized with basis matrices $U_{\ell }$ selected either randomly or using a priori knowledge available from the design of the physical elements realizing the basis matrices. We will study both cases and refer to the random initialization as the black-box model. At each epoch of the learning process, we estimate the average gradient over a collection of examples from $\mathcal {T}$ and update the basis matrices according to the optimization algorithm (stochastic L-BFGS-B algorithm, SciPy package). Instead of averaging the gradient over the complete training set, we randomly draw a subset of $m=5$ pairs $(\bar {U}, \bar {\Phi })$ each epoch and use this subset for averaging. The value $m=5$ has been determined empirically, as it provides a substantial computational speed-up while still keeping high training accuracy. We do not use a unitary parametrization of the basis matrices during the learning procedure and treat them simply as complex-valued square matrices. Since the parameters of the model are then the real parts $x_{ij}^{(\ell )}$ and the imaginary parts $y_{ij}^{(\ell )}$ of each basis matrix $U_{\ell }$, the updated basis matrices do not belong to the unitary space. We use the polar decomposition $A = H V$, where $H$ is a Hermitian and $V$ a unitary matrix, to project each updated complex matrix onto the closest unitary $U_{\ell }$ at each step of the optimization [32]. This method helps to avoid local minima which may arise due to a sophisticated unitary matrix parametrization.
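The projection step can be sketched with SciPy's polar decomposition (the helper name is ours); the unitary polar factor is the Frobenius-closest unitary [32]:

```python
import numpy as np
from scipy.linalg import polar

def closest_unitary(a):
    """Return the unitary polar factor of A (scipy gives A = U P with
    U unitary and P positive semidefinite), i.e. the unitary matrix
    closest to A in Frobenius norm."""
    u, _ = polar(a)
    return u

# A perturbed unitary projects back onto the unitary group
rng = np.random.default_rng(7)
a = np.linalg.qr(rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4)))[0]
v = closest_unitary(a + 0.1 * rng.normal(size=(4, 4)))
assert np.allclose(v @ v.conj().T, np.eye(4))
```

Projecting an already-unitary matrix returns it unchanged, so the projection is idempotent on the feasible set.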
The simulation code is written in Python employing Numpy and Scipy packages. The code is publicly available on GitLab [33].
4.1 Model - black box
We first consider the black-box model. This scenario implies that no a priori information about the basis matrices $U_{\ell }$ is available, and the interferometer is represented by a black-box system with $N^2$ variable phase parameters $\varphi _{{\ell }k}$. Therefore, the model is initialized with a completely random guess: the initial basis matrices are sampled from the Haar-random unitary distribution [31]. The model cross-validation testing (see Fig. 2(a)) determines the size $M$ of the training set for each value of $N$. The convergence behaviour of the model on the cross-validation set (Fig. 2(b)) indicates the optimal volume of the training set $\mathcal {T}$. The black-box treatment of the interferometer model enabled successful training of $\mathcal {M}$ only up to 6 modes; larger interferometer models are unattainable for our optimization procedure. The plateau in the convergence behaviour on the test set becomes significant already at $N = 6$ (see Supplement 1 for a convergence plot example). For $N > 6$, we failed to observe learning in the black-box setting - the average value of the figure of merit remains at the plateau. The work [34] suggests that the reason for a plateau in high-dimensional optimization problems is the presence of a large number of saddle points in the optimized function landscape rather than local minima. Several algorithms exploiting the idea of adaptive gradients have been developed to tackle the problem of escaping the plateau [35,36]. The adoption of an appropriate algorithm may solve the learning problem in the black-box setting.
Until this moment, the elements of $\mathcal {T}$ included matrices $\bar {U}^{(i)}$ that were artificially generated using Eq. (1) initialized with the phase set $\bar {\Phi }^{(i)}$. Gathering the same set using a real device would require the reconstruction of the $\bar {U}^{(i)}_{exp}$ matrices to be performed with absolute precision, which is never the case in an experiment. The learning algorithm has to be designed to tolerate a certain amount of discrepancy between the ideal $\bar {U}_{ideal}$ and reconstructed $\bar {U}_{exp}$ matrices. These deviations are the inevitable consequence of imperfections of the measurement tools used during the reconstruction process. We have modelled the behaviour of the learning algorithm seeded with a training set including the phase shifts $\bar {\Phi }^{(i)}$ and unitaries $\bar {U}^{(i)}_{exp}$ slightly deviating from their theoretical counterparts $\bar {U}^{(i)}_{ideal}$.
The deviation between $\bar {U}_{exp}$ and $\bar {U}_{ideal}$ is introduced as the polar projection [32] of the perturbed $\bar {U}_{ideal}$ onto the unitary space:
$$\bar{U}_{exp} = \mathcal{P}_{U}\left[\bar{U}_{ideal} + \alpha (X + iY)\right], \qquad (12)$$
with $\mathcal{P}_{U}[\,\cdot\,]$ denoting the projection onto the closest unitary matrix,
where $X$ and $Y$ are random real-valued matrices of size $N \times N$ whose elements are sampled from the normal distribution $\mathcal {N}(0,\,1)$. The degree of deviation is controlled by the real-valued parameter $\alpha$. Fig. 2(c) illustrates the convergence of the model for the simplest case $N=2$ supplied with a training set sampled with a given deviation $\alpha$. The value $\alpha \approx 0.04$ indicates the threshold at which the model fails to pass the cross-validation criterion $J_{FR} \leq 10^{-2}$. Averaging was performed over $1000$ models learnt using training sets corresponding to different basis matrices $U_{\ell }^{(0)}$. For each model, we performed the cross-validation test with $1000$ phase shift sets.

4.2 Model with a priori knowledge
The black-box model results presented in Sec. 4.1 evidence that the optimization complexity of the model with arbitrary initial basis matrices grows rapidly with the training set volume $M$. In this section, we study the choice of the initial approximation of the basis matrices $U_{\ell }$ which enables learning for larger dimensions $N$. The case when the basis matrices $U_{\ell }$ are completely unknown does not adequately reflect the real situation: in practice, the basis matrices are implemented by optical circuits with well-defined geometry and optical properties. Prototypes of these circuits can be tested beforehand to verify the circuit's performance, including the mode transformation that it generates. Contemporary integrated photonics fabrication technologies guarantee reproducibility up to a certain level of precision specific to each technological line. Summing up, the basis matrix unitary transformation $U_{\ell }^{est}$ can be estimated in advance. This estimate serves as the initial guess for the optimization algorithm at the model training stage. This section demonstrates how such knowledge substantially simplifies the optimization and enables learning the models of interferometers with $N$ up to at least a few tens of modes.
We use the estimated matrices $U_{\ell }^{est}$ as the initial guess for our optimization routine. These matrices are chosen to be close to the ideal basis matrices $U_{\ell }$ used for the training set generation. We get the initial guess using the procedure described by Eq. (12), substituting $\alpha$ with the parameter $\beta$ to avoid notation collision. Fig. 3 shows the convergence of the learning algorithm employing the knowledge of the basis matrix unitaries up to a certain precision. The a priori information about the basis matrices enabled learning the model of the interferometer up to $N=20$ using the same computational hardware. The larger the interferometer dimension $N$, the higher the precision of the $U_{\ell }^{est}$ estimate must be. Fig. 3(b) illustrates the regions of the $\beta$ value, depending on the size of the interferometer, where the algorithm still accurately learns the model $\mathcal {M}$.
The scalability of interferometer model learning aided with a priori knowledge about the basis matrices is limited by the precision $\beta$ of the initial approximation $U_{\ell }^{est}$. We provide in Fig. 3(c) the relation between the error parameter $\beta$ and a conventional fidelity $F(U_{\ell }^{est}, U_{\ell }^{exact}) \equiv \dfrac {1}{N^2} | \mathrm{Tr}((U_{\ell }^{exact})^{\dagger } U_{\ell }^{est}) |^2$, where $U_{\ell }^{exact}$ denotes the exact basis matrices of the mode-mixing layers. The curves were computed by averaging the fidelity values between $U_{\ell }^{est}(\beta )$ and $U_{\ell }^{exact}$ over 1000 samples for each point. For the largest $N=20$ interferometer, the value $\beta = 0.11$ required to pass the cross-validation test corresponds to approximately 80% fidelity. Experimental results reported in [1,30] indicate that the fidelity of reconstruction of an unknown unitary matrix can exceed 90%. Fig. 3(c) also indicates that for a fixed value of $\beta$ the fidelity drops with larger $N$. This gives us hope that our algorithm can be implemented for large interferometers. However, the reconstruction of unitary matrices for large $N$ still remains challenging.
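The relation between the perturbation strength and the fidelity $F$ can be estimated numerically; the sketch below follows the perturb-and-project recipe of Eq. (12) with $\beta$ in place of $\alpha$ (the sample count of 100 is our choice for speed, not the paper's 1000):

```python
import numpy as np
from scipy.linalg import polar

rng = np.random.default_rng(3)

def haar_unitary(n):
    q, r = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
    return q * (np.diag(r) / np.abs(np.diag(r)))

def perturb(u, beta):
    """Perturb a unitary with complex Gaussian noise of strength beta and
    project back onto the unitary group (Eq. (12) with beta)."""
    n = u.shape[0]
    noise = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return polar(u + beta * noise)[0]

def fidelity(u_est, u_exact):
    """F = |Tr(U_exact^dag U_est)|^2 / N^2, equal to 1 exactly when the two
    unitaries coincide up to a global phase."""
    n = u_exact.shape[0]
    return np.abs(np.trace(u_exact.conj().T @ u_est)) ** 2 / n ** 2

u = haar_unitary(20)
assert np.isclose(fidelity(u, u), 1.0)
# Average fidelity at a fixed perturbation strength
f_mean = np.mean([fidelity(perturb(u, 0.11), u) for _ in range(100)])
assert 0.0 < f_mean < 1.0
```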
5. Discussion
The demonstrated approach to interferometer programming stands out with several major advantages. First and foremost, the method is agnostic to the architecture of the interferometer. Any universal interferometer reported in the literature [10,11,15,16] admits the form of the expansion Eq. (1) - optical mode-mixing elements interleaved with phase shifters. Hence, both the gist of the algorithm and the mathematical framework fit any architecture of choice. We have elaborated two examples demonstrating the applicability of our method to various architectures. First, we have simulated learning of the basis matrices of an $N=5$ multiport interferometer with many more layers than is required for universal operation; the simulation results are presented in Fig. 4(a). Next, we have decomposed a standard Clements $N=5$ interferometer into a sequence of basis matrices and phase-shifting layers. Each beamsplitter reflectivity $r=\cos \theta$ was modified by adding a normally distributed random number to the parameter $\theta$
$$\theta \rightarrow \theta + \sigma \, \mathcal{N}(0,1), \qquad (13)$$
where $\mathcal {N}(0,1)$ represents a random number sampled from the standard normal distribution and $\sigma$ sets the perturbation scale. Figure 4(b) presents convergence curves for 50 instances of learning the unknown basis matrices of the Clements interferometer. Each learning instance was seeded with basis matrices $U^{(est)}$ equal to matrices comprised of perfectly balanced beamsplitters. The result clearly indicates that standard layouts based on MZI blocks can be programmed using the proposed algorithm. The architecture-agnostic feature of the algorithm remains valid as long as the mode mixers and the phase shifters can be considered as independent elements.

Next, the output of the learning algorithm is a complete interferometer model taking into account the transformations of the mode-mixing elements in the fabricated device. The model answers the question of how closely the required unitary $U_{0}$ can be approximated by the specific device under study, and can pinpoint areas of the unitary space inaccessible to the device due to design restrictions or fabrication flaws. This feature should be compared with the typical calibration data used for programming an interferometer based on MZI blocks: the phase shifters are calibrated, but no knowledge is available about the real transformation of the beamsplitters comprising the MZIs. This leads to the necessity of running an optimization procedure to improve the fidelity of the implemented unitary if some of the beamsplitters do not meet the designed transmission coefficients.

Lastly, the presented algorithm is essentially a reconstruction routine for the inner fixed optical elements of a complex interferometric circuit. Hence, it can be adopted for probing the quality of optical subcircuits located inside a larger optical scheme.

The bottlenecks of the proposed algorithm are related to practical issues. The $J_{FR}$ Frobenius metric requires exact measurement of the moduli and phases of the unitary matrix elements.
Several reconstruction methods have been proposed and verified [19,30,37–39]. Some of them [30,37] provide only partial information about the transformation matrix of the interferometer, omitting phases that are impossible to reconstruct using the method-specific dataset. Any method will inevitably suffer from path-dependent optical loss, which is impossible to distinguish and attribute to a particular path inside the circuit. Another issue not covered by our algorithm arises from the crosstalk between the phase shifters. Our framework assumes that the phases in different paths are set independently, which is not exactly the case due to crosstalk between different phase-modulating elements. Luckily, integrated photonic modulator implementations typically exhibit extremely low crosstalk [40,41].
We believe that our results will enable opportunities to employ new programmable optical interferometer architectures for both classical and quantum applications.
Funding
Russian Science Foundation (19-72-10069).
Acknowledgements
This work was supported by Russian Science Foundation (RSF), project No: 19-72-10069.
Disclosures
The authors declare no conflicts of interest.
Data availability
Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.
Supplemental document
See Supplement 1 for supporting content.
References
1. J. Carolan, C. Harrold, C. Sparrow, E. Martin-Lopez, N. J. Russell, J. W. Silverstone, P. J. Shadbolt, N. Matsuda, M. Oguma, M. Itoh, G. D. Marshall, M. G. Thompson, J. C. F. Matthews, T. Hashimoto, J. L. O’Brien, and A. Laing, “Universal linear optics,” Science 349(6249), 711–716 (2015). [CrossRef]
2. N. C. Harris, J. Carolan, D. Bunandar, M. Prabhu, M. Hochberg, T. Baehr-Jones, M. L. Fanto, A. M. Smith, C. C. Tison, P. M. Alsing, and D. Englund, “Linear programmable nanophotonic processors,” Optica 5(12), 1623 (2018). [CrossRef]
3. J. Wang, S. Paesani, Y. Ding, R. Santagati, P. Skrzypczyk, A. Salavrakos, J. Tura, R. Augusiak, L. Mančinska, D. Bacco, D. Bonneau, J. W. Silverstone, Q. Gong, A. Acín, K. Rottwitt, L. K. Oxenløwe, J. L. O’Brien, A. Laing, and M. G. Thompson, “Multidimensional quantum entanglement with large-scale integrated optics,” Science 360(6386), 285–291 (2018). [CrossRef]
4. J. Wang, F. Sciarrino, A. Laing, and M. G. Thompson, “Integrated photonic quantum technologies,” Nat. Photonics 14(5), 273–284 (2020). [CrossRef]
5. H. Zhang, M. Gu, X. D. Jiang, J. Thompson, H. Cai, S. Paesani, R. Santagati, A. Laing, Y. Zhang, M. H. Yung, Y. Z. Shi, F. K. Muhammad, G. Q. Lo, X. S. Luo, B. Dong, D. L. Kwong, L. C. Kwek, and A. Q. Liu, “An optical neural chip for implementing complex-valued neural network,” Nat. Commun. 12(1), 457 (2021). [CrossRef]
6. R. Hamerly, L. Bernstein, A. Sludds, M. Soljačić, and D. Englund, “Large-scale optical neural networks based on photoelectric multiplication,” Phys. Rev. X 9(2), 021032 (2019). [CrossRef]
7. G. Wetzstein, A. Ozcan, S. Gigan, S. Fan, D. Englund, M. Soljačić, C. Denz, D. A. B. Miller, and D. Psaltis, “Inference in artificial intelligence with deep optics and photonics,” Nature 588(7836), 39–47 (2020). [CrossRef]
8. D. Pérez-López, A. López, P. DasMahapatra, and J. Capmany, “Multipurpose self-configuration of programmable photonic circuits,” Nat. Commun. 11(1), 6359 (2020). [CrossRef]
9. A. Hurwitz, “Über die Erzeugung der Invarianten durch Integration,” Nachr. Ges. Wiss. Göttingen pp. 71–90 (1897).
10. M. Reck, A. Zeilinger, H. J. Bernstein, and P. Bertani, “Experimental realization of any discrete unitary operator,” Phys. Rev. Lett. 73(1), 58–61 (1994). [CrossRef]
11. W. R. Clements, P. C. Humphreys, B. J. Metcalf, W. S. Kolthammer, and I. A. Walmsley, “Optimal design for universal multiport interferometers,” Optica 3(12), 1460–1465 (2016). [CrossRef]
12. R. Burgwal, W. R. Clements, D. H. Smith, J. C. Gates, W. S. Kolthammer, J. J. Renema, and I. A. Walmsley, “Using an imperfect photonic network to implement random unitaries,” Opt. Express 25(23), 28236–28245 (2017). [CrossRef]
13. I. V. Dyakonov, I. A. Pogorelov, I. B. Bobrov, A. A. Kalinkin, S. S. Straupe, S. P. Kulik, P. V. Dyakonov, and S. A. Evlashin, “Reconfigurable photonics on a glass chip,” Phys. Rev. Appl. 10(4), 044048 (2018). [CrossRef]
14. R. Tang, T. Tanemura, and Y. Nakano, “Integrated reconfigurable unitary optical mode converter using mmi couplers,” IEEE Photonics Technol. Lett. 29(12), 971–974 (2017). [CrossRef]
15. M. Y. Saygin, I. V. Kondratyev, I. V. Dyakonov, S. A. Mironov, S. S. Straupe, and S. P. Kulik, “Robust architecture for programmable universal unitaries,” Phys. Rev. Lett. 124(1), 010501 (2020). [CrossRef]
16. S. A. Fldzhyan, M. Y. Saygin, and S. P. Kulik, “Optimal design of error-tolerant reprogrammable multiport interferometers,” Opt. Lett. 45(9), 2632 (2020). [CrossRef]
17. R. Tanomura, R. Tang, S. Ghosh, T. Tanemura, and Y. Nakano, “Robust integrated optical unitary converter using multiport directional couplers,” J. Lightwave Technol. 38(1), 60–66 (2020). [CrossRef]
18. R. Tang, R. Tanomura, T. Tanemura, and Y. Nakano, “Ten-port unitary optical processor on a silicon photonic chip,” ACS Photonics 8(7), 2074–2080 (2021). [CrossRef]
19. M. Tillmann, C. Schmidt, and P. Walther, “On unitary reconstruction of linear optical networks,” J. Opt. 18(11), 114002 (2016). [CrossRef]
20. S. Pai, B. Bartlett, O. Solgaard, and D. A. B. Miller, “Matrix optimization on universal unitary photonic devices,” Phys. Rev. Appl. 11(6), 064044 (2019). [CrossRef]
21. S. Bandyopadhyay, R. Hamerly, and D. Englund, “Hardware error correction for programmable photonics,” Optica 8(10), 1247 (2021). [CrossRef]
22. T. W. Hughes, M. Minkov, Y. Shi, and S. Fan, “Training of photonic neural networks through in situ backpropagation and gradient measurement,” Optica 5(7), 864 (2018). [CrossRef]
23. S. Pai, I. A. D. Williamson, M. Minkov, T. W. Hughes, O. Solgaard, S. Fan, and D. A. B. Miller, “Parallel fault-tolerant programming and optimization of photonic neural networks,” in Conference on Lasers and Electro-Optics, (Optical Society of America, 2020), p. SM1E.5.
24. D. A. B. Miller, “Self-aligning universal beam coupler,” Opt. Express 21(5), 6360 (2013). [CrossRef]
25. R. Hamerly, S. Bandyopadhyay, and D. Englund, “Accurate Self-Configuration of Rectangular Multiport Interferometers,” arXiv e-prints arXiv:2106.03249 (2021).
26. M. A. Nielsen, Neural Networks and Deep Learning (Determination Press, 2015).
27. I. Ohn and Y. Kim, “Smooth function approximation by deep neural networks with general activation functions,” Entropy 21(7), 627 (2019). [CrossRef]
28. S. Ferrari and R. F. Stengel, “Smooth function approximation using neural networks,” IEEE Trans. Neural Netw. 16(1), 24–38 (2005). [CrossRef]
29. N. N. Skryabin, I. V. Dyakonov, M. Y. Saygin, and S. P. Kulik, “Waveguide lattice based architecture for multichannel optical transformations,” arXiv e-prints arXiv:2103.02664 (2021).
30. D. Suess, N. Maraviglia, R. Kueng, A. Maïnos, C. Sparrow, T. Hashimoto, N. Matsuda, D. Gross, and A. Laing, “Rapid characterisation of linear-optical networks via PhaseLift,” arXiv e-prints arXiv:2010.00517 (2020).
31. F. Mezzadri, “How to generate random matrices from the classical compact groups,” arXiv e-prints math-ph/0609050 (2006).
32. K. Fan and A. J. Hoffman, “Some metric inequalities in the space of matrices,” Proc. Am. Math. Soc. 6(1), 111 (1955). [CrossRef]
33. S. Kuzmin, “Nnoptic,” https://gitlab.com/SergeiKuzmin/nnoptic (2020).
34. R. Pascanu, Y. N. Dauphin, S. Ganguli, and Y. Bengio, “On the saddle point problem for non-convex optimization,” arXiv e-prints arXiv:1405.4604 (2014).
35. M. Staib, S. J. Reddi, S. Kale, S. Kumar, and S. Sra, “Escaping Saddle Points with Adaptive Gradient Methods,” arXiv e-prints arXiv:1901.09149 (2019).
36. D. P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,” arXiv e-prints arXiv:1412.6980 (2014).
37. A. Laing and J. L. O’Brien, “Super-stable tomography of any linear optical device,” arXiv e-prints arXiv:1208.2868 (2012).
38. S. Rahimi-Keshari, M. A. Broome, R. Fickler, A. Fedrizzi, T. C. Ralph, and A. G. White, “Direct characterization of linear-optical networks,” Opt. Express 21(11), 13450 (2013). [CrossRef]
39. N. Spagnolo, E. Maiorino, C. Vitelli, M. Bentivegna, A. Crespi, R. Ramponi, P. Mataloni, R. Osellame, and F. Sciarrino, “Learning an unknown transformation via a genetic approach,” Sci. Rep. 7(1), 14316 (2017). [CrossRef]
40. R. Zhang, Y. He, Y. Zhang, S. An, Q. Zhu, X. Li, and Y. Su, “Ultracompact and low-power-consumption silicon thermo-optic switch for high-speed data,” Nanophotonics 10(2), 937–945 (2020). [CrossRef]
41. L. Jiang, X. Chen, K. Kim, G. de Valicourt, Z. R. Huang, and P. Dong, “Electro-optic crosstalk in parallel silicon photonic mach-zehnder modulators,” J. Lightwave Technol. 36(9), 1713–1720 (2018). [CrossRef]