Abstract
We develop a learning algorithm to build an architecture-agnostic model of a reconfigurable optical interferometer. Programming a unitary transformation of the optical modes of an interferometer either follows an analytical expression yielding a unitary matrix given a set of phase shifts, or requires an optimization routine if an analytic decomposition does not exist. Our algorithm adopts a supervised learning strategy which matches a model of an interferometer to a training set populated by samples produced by the device under study. A simple optimization routine then uses the trained model to output the phase shifts corresponding to a desired unitary transformation of the interferometer with a given architecture. Our result provides a recipe for efficient tuning of interferometers even without a rigorous analytical description, which opens the opportunity to explore new architectures of interferometric circuits.
© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement
1. Introduction
Linear optical interferometers are rapidly becoming an indispensable tool in quantum optics [1] and optical information processing [2]. The interest in linear optics grows due to the broader availability of integrated photonic fabrication technology to the scientific community. The key feature of a state-of-the-art integrated linear interferometer is reconfigurability, which enables the device to change its effect on the input optical modes on demand. This possibility has made programmable linear optical circuits particularly appealing for information processing challenges. In particular, reconfigurable interferometers are the main ingredient of contemporary linear optical quantum computing experiments [3–5] and are considered as the core of optical hardware accelerators for deep learning applications [6,7]. Furthermore, the fabrication quality and improved scalability of reconfigurable photonic circuits have led to the emergence of the field-programmable photonic array concept - a multipurpose photonic circuit that can serve many possible applications by means of low-level programming of the device [8].
A unitary transformation matrix $U$ completely describes the operation of a linear optical interferometer. The matrix $U$ couples the input optical modes of the device to the output ones: $a^{(out)}_{j}=\sum _{i}U_{ij}a^{(in)}_{i}$. The architecture of the interferometer parametrizes the transformation $U = U(\{\varphi \})$ by tunable parameters $\{\varphi \}$, which are typically phase shifters controlling the relative phases between arms of the interferometer. The architecture is called universal if it allows reaching any $N\times N$ unitary matrix by appropriately setting the phase shifts $\{\varphi \}$. Programming the device then essentially boils down to establishing the correspondence between the desired matrix $U_{0}$ and the appropriate parameter set $\{\varphi ^{0}\}$. The Hurwitz analytical decomposition of an $N\times N$ unitary matrix [9] is a well-known example of a universal architecture. It admits a straightforward implementation using simple optical components [10,11] - two-port Mach-Zehnder interferometers (MZI) with two controllable phase shifters each. The interferometer architecture based on this decomposition is a mesh layout of MZIs, which is very easy to program - an efficient inverse algorithm returns the values for each phase shifter given the unitary matrix $U_{0}$. The simplicity of this architecture comes at the cost of extremely stringent fabrication tolerances. Universal operation is achieved if and only if the beamsplitters in the MZI blocks are perfectly balanced, which is never the case in a real device. Numerical optimization methods have been adopted to mitigate the effect of imperfections [12,13], but the simple programming flow is lost.
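To make the parametrization $U(\{\varphi\})$ concrete, a single MZI block with two phase shifters can be sketched as follows (a minimal NumPy sketch; the beamsplitter convention and the placement of the phase shifters are our assumptions - real devices differ by fixed phase offsets):

```python
import numpy as np

def mzi(theta, phi):
    """2x2 transfer matrix of a Mach-Zehnder interferometer: an external
    phase shifter phi, followed by two balanced 50:50 beamsplitters
    enclosing an internal phase shifter theta (one common convention)."""
    bs = np.array([[1, 1j], [1j, 1]], dtype=complex) / np.sqrt(2)
    ps_int = np.diag([np.exp(1j * theta), 1])   # internal phase shifter
    ps_ext = np.diag([np.exp(1j * phi), 1])     # external phase shifter
    return bs @ ps_int @ bs @ ps_ext

# Any phase settings yield a unitary transformation of the two modes
u = mzi(0.7, 1.3)
assert np.allclose(u @ u.conj().T, np.eye(2))
```

With imbalanced beamsplitters the set of reachable $2\times 2$ unitaries of a single block shrinks, which is precisely the fabrication-tolerance issue discussed above.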
The challenge of overcoming the effect of fabrication defects has also led to the development of more sophisticated architectures [14–18] which have no simple analytical description and can only be programmed using optimization routines. Running an optimization routine to set up a real physical device transformation requires the experimental execution of a resource-intensive transformation matrix reconstruction procedure [19] at each iteration of the chosen optimization algorithm. From the end-user perspective, the necessity to optimize the device configuration each time the transformation needs to be changed is unacceptable. Several works report progress in designing optimization algorithms for precise adjustment of phase shifts in an interferometer [20–23]. Firstly, optimization in a high-dimensional parameter space is itself a time-consuming procedure requiring sophisticated tuning and, what is more, there is no guarantee that the global minimum will be reached. Secondly, the algorithms providing fast convergence in multiparameter optimization problems are typically gradient-based, and the precision of the gradient estimation of the objective function implemented by the physical device is limited by measurement noise. Lastly, even though the number of switching cycles of the phase shifters is not strictly limited, spending the device resource during tedious optimization procedures severely degrades the lifetime of the programmable circuit. It is worth mentioning that, besides optimization methods, new circuit architectures with auxiliary elements implementing a feedback loop have been proposed to tackle the photonic circuit programming challenge [24,25].
In this work, we develop an efficient algorithm for programming a linear optical interferometer with a complex architecture. We employ one of the main methods of machine learning - supervised learning, widely applied to neural network training [26–28]. The interferometer model is learnt using a set of sample transformations corresponding to different phase shifts. The trained model is then used to quickly find the phase shifts for a given unitary transformation by means of an optimization routine applied to the model rather than to the physical device. Our learning algorithm is divided into two stages: the training stage - finding the model of the interferometer using the training set of sample transformations - and the programming stage - determining the phase shifts of the interferometer model corresponding to the required transformation.
2. Formulation of the problem
We devised our algorithm to solve the problem of programming a multimode interferometer consisting of alternating phase-shifting and mode-mixing layers. This architecture has been proven to deliver close-to-universal performance and has no direct connection linking the elements of the matrix to the phase shifts of the interferometer [15]. It therefore serves as a perfect example to demonstrate the gist of our algorithm. The interferometer circuit topology is outlined in Fig. 1(a). The unitary matrix $U$ is expressed as
$$U(\bar{\Phi}) = \Phi_{L} U_{L}\, \Phi_{L-1} U_{L-1} \cdots \Phi_{1} U_{1}, \qquad (1)$$
where the fixed basis matrices $U_{\ell }$ describe the mode-mixing layers and $\Phi_{\ell } = \mathrm{diag}(e^{i\varphi _{\ell 1}}, \ldots , e^{i\varphi _{\ell N}})$ are diagonal matrices of the tunable phase shifts $\varphi _{\ell k}$.
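A minimal NumPy sketch of this layered expansion (the function and variable names are ours, not the paper's code):

```python
import numpy as np

def compose(basis, phases):
    """Compose U = Phi_L U_L ... Phi_1 U_1 from L fixed N x N mode-mixing
    (basis) matrices and an L x N array of phase shifts."""
    u = np.eye(basis[0].shape[0], dtype=complex)
    for u_l, phi_l in zip(basis, phases):              # layers 1 .. L
        u = np.diag(np.exp(1j * phi_l)) @ u_l @ u
    return u

# Example: N = L = 4 with random unitary mixing layers
rng = np.random.default_rng(42)
N = L = 4
basis = [np.linalg.qr(rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N)))[0]
         for _ in range(L)]
u = compose(basis, rng.uniform(0, 2 * np.pi, size=(L, N)))
assert np.allclose(u @ u.conj().T, np.eye(N))          # product of unitaries is unitary
```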
The problem underpinning the difficulty of programming the required unitary in the multimode architecture is that the basis matrices $U_{\ell }$ of a fabricated device do not match the ones implied by the optical circuit design. The interferometer universality may not be degraded [15], but an efficient evaluation of the phase shifts $\varphi _{\ell k}$ becomes impossible since the $U_{\ell }$ are no longer known. The absence of the $U_{\ell }$ matrices brings us to the first step of our learning algorithm - a reconstruction of the elements of the basis matrices utilizing the information stored in the training set $\mathcal {T}$ gathered from the device of interest (see Fig. 1(b)). The set $\mathcal {T}$ includes $M$ pairs $(\bar {U}^{(i)},\bar {\Phi }^{(i)})$ obtained by seeding the device with random phase shifts $\bar {\Phi }^{(i)}$ and applying the unitary reconstruction procedure of choice [19,30] to get the $\bar {U}^{(i)}$. The basis matrices $U_{\ell }$ are then determined as a solution of the optimization problem
$$\{U_{\ell }^{0}\} = \underset{\{U_{\ell }\}}{\mathrm{argmin}} \sum_{i=1}^{M} J\!\left(U(\{U_{\ell }\}, \bar{\Phi}^{(i)}),\, \bar{U}^{(i)}\right), \qquad (2)$$
where $J$ is a matrix distance metric specified below.
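The objective of this optimization problem can be sketched as follows, assuming the layered expansion of Eq. (1) and a Frobenius-type distance $J$ (the helper names are ours):

```python
import numpy as np

def compose(basis, phases):
    """Trial interferometer model: U = Phi_L U_L ... Phi_1 U_1."""
    u = np.eye(basis[0].shape[0], dtype=complex)
    for u_l, phi_l in zip(basis, phases):
        u = np.diag(np.exp(1j * phi_l)) @ u_l @ u
    return u

def training_loss(basis, training_set):
    """Objective of the basis-matrix reconstruction: summed Frobenius
    distances between the model predictions and the reconstructed
    unitaries over the training pairs (U_bar, Phi_bar)."""
    return sum(np.sum(np.abs(compose(basis, phi) - u_bar) ** 2)
               for u_bar, phi in training_set)
```

The loss vanishes exactly when the trial basis matrices reproduce every training sample, which is the fixed point the learning stage searches for.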
3. Learning algorithm
In this section, we present the algorithm which learns the interferometer model $\mathcal {M}$ from the initial data contained in the training set $\mathcal {T}$. We reduce the learning problem to the multiparameter optimization of a nonlinear functional $J$, present the mathematical framework of the learning algorithm, and exemplify its performance on the multimode interferometer.
3.1 Figure of merit
The figure of merit $J$ to be studied in our work is the Frobenius norm $J_{FR}$:
$$J_{FR}(U, \bar{U}) = \sum_{i,j=1}^{N} |u_{ij} - \bar{u}_{ij}|^{2}, \qquad (3)$$
where $u_{ij}, \bar {u}_{ij}$ are the complex elements of the unitary matrices $U, \bar {U}$. This metric vanishes only for identical matrices, that is, $J_{FR} (U, \bar {U}) = 0$ if and only if the magnitudes and the phases of the elements of $U$ and $\bar {U}$ coincide. Expression (3) can be rewritten using the Hadamard product $(A \odot B)_{i, j}=(A)_{i, j} \cdot (B)_{i, j}$ and takes the following form:
$$J_{FR} = \sum_{i,j}\left[(U-\bar{U})\odot (U-\bar{U})^{*}\right]_{ij}. \qquad (4)$$
The gradient of $J_{FR}$ with respect to a real parameter set $\{\bar {\alpha }\}$ is given by:
$$\partial_{\bar{\alpha}} J_{FR} = 2\operatorname{Re}\sum_{i,j}\left[\partial_{\bar{\alpha}} U \odot (U-\bar{U})^{*}\right]_{ij}. \qquad (5)$$
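In code, the figure of merit and its gradient, given $\partial_{\bar\alpha}U$, are one-liners; the sketch below also checks the gradient formula against a finite difference for a toy one-parameter family (the names are ours):

```python
import numpy as np

def j_fr(u, u_bar):
    """Frobenius figure of merit: sum_ij |u_ij - u_bar_ij|^2."""
    return np.sum(np.abs(u - u_bar) ** 2)

def j_fr_grad(du, u, u_bar):
    """Gradient of J_FR w.r.t. one real parameter, given dU/dalpha.
    The Hadamard product is NumPy's element-wise '*'."""
    return 2.0 * np.real(np.sum(du * np.conj(u - u_bar)))

# Finite-difference check for the toy family U(a) = diag(exp(i a), 1)
u_of = lambda a: np.diag([np.exp(1j * a), 1.0])
a, eps = 0.3, 1e-7
du = np.diag([1j * np.exp(1j * a), 0.0])       # analytic dU/da
u_bar = np.diag([np.exp(0.5j), 1.0])
fd = (j_fr(u_of(a + eps), u_bar) - j_fr(u_of(a), u_bar)) / eps
assert abs(j_fr_grad(du, u_of(a), u_bar) - fd) < 1e-5
```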
3.2 Computing the gradients of $J$
A gradient-based optimization algorithm benefits substantially from analytical expressions for the gradients of the optimized function. It turns out that the multimode interferometer expansion Eq. (1) admits simple analytic forms of the gradients over the elements $u_{ij} = x_{ij} + iy_{ij}$ of the basis matrices $U_{\ell }$ and over the phase shifts $\varphi _{\ell k}$. We will derive the analytical expressions for the gradients $\partial _{x^{(\ell )}_{ij}} J_{FR}$, $\partial _{y^{(\ell )}_{ij}} J_{FR}$ and $\partial _{\varphi _{\ell k}} J_{FR}$ required during the learning and tuning stages of the algorithm, respectively. Eq. (5) implies that the calculation of these gradients reduces to finding expressions for $\partial _{x^{(\ell )}_{ij}} U$, $\partial _{y^{(\ell )}_{ij}} U$ and $\partial _{\varphi _{\ell k}} U$.
We will first focus on the $\partial _{x^{(\ell )}_{ij}} J_{FR}$ and $\partial _{y^{(\ell )}_{ij}} J_{FR}$ gradients. In order to simplify the computation we introduce $L$ auxiliary matrices $A_{\ell }$ and another $L$ auxiliary matrices $B_{\ell }$ as the partial products taken from the expansion Eq. (1):
$$A_{\ell } = \Phi_{L} U_{L} \cdots U_{\ell +1}\Phi_{\ell }, \qquad B_{\ell } = \Phi_{\ell -1}U_{\ell -1}\cdots \Phi_{1} U_{1}, \qquad (6)$$
so that $U = A_{\ell }\, U_{\ell }\, B_{\ell }$ for every $\ell = 1,\ldots ,L$. The matrices $A_{\ell }$ and $B_{\ell }$ can be calculated iteratively:
$$A_{L} = \Phi_{L}, \quad A_{\ell -1} = A_{\ell }\, U_{\ell }\, \Phi_{\ell -1}; \qquad B_{1} = I, \quad B_{\ell +1} = \Phi_{\ell }\, U_{\ell }\, B_{\ell }. \qquad (7)$$
Next, given that $x_{ij}^{({\ell })}$ and $y_{ij}^{({\ell })}$ are the real and imaginary parts of $u_{ij}^{({\ell })}$ respectively, we get the expressions for the gradients:
$$\partial _{x^{(\ell )}_{ij}} U = A_{\ell }\, E^{(ij)}\, B_{\ell }, \qquad \partial _{y^{(\ell )}_{ij}} U = i\, A_{\ell }\, E^{(ij)}\, B_{\ell }, \qquad (8)$$
where $E^{(ij)}$ denotes the matrix with a single unit element in the $i$-th row and $j$-th column and zeros elsewhere.
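A self-contained finite-difference check of these gradient expressions, assuming the layered form $U=\Phi_L U_L\cdots\Phi_1 U_1$ and 0-based indexing; `E` below stands for the single-entry matrix $E^{(ij)}$ (our notation):

```python
import numpy as np

rng = np.random.default_rng(0)
N = L = 3

def haar(n):
    """Haar-random unitary via QR of a complex Ginibre matrix."""
    q, r = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
    return q * (np.diag(r) / np.abs(np.diag(r)))

basis = [haar(N) for _ in range(L)]                        # U_1 .. U_L
P = [np.diag(np.exp(1j * rng.uniform(0, 2 * np.pi, N))) for _ in range(L)]

def compose(basis):
    u = np.eye(N, dtype=complex)
    for u_l, p_l in zip(basis, P):
        u = p_l @ u_l @ u                                  # Phi_L U_L ... Phi_1 U_1
    return u

# Partial products for layer l (0-based): U = A @ basis[l] @ B
l = 1
A = P[l].copy()
for u_k, p_k in zip(basis[l + 1:], P[l + 1:]):
    A = p_k @ u_k @ A                                      # Phi_L U_L ... U_{l+1} Phi_l
B = np.eye(N, dtype=complex)
for u_k, p_k in zip(basis[:l], P[:l]):
    B = p_k @ u_k @ B                                      # Phi_{l-1} U_{l-1} ... Phi_1 U_1
assert np.allclose(A @ basis[l] @ B, compose(basis))

# dU / d(Re u_ij) = A E^{(ij)} B, checked against a finite difference
i, j, eps = 0, 2, 1e-7
E = np.zeros((N, N)); E[i, j] = 1.0
pert = [m.copy() for m in basis]
pert[l] = pert[l] + eps * E
fd = (compose(pert) - compose(basis)) / eps
assert np.allclose(A @ E @ B, fd, atol=1e-5)
```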
Once the basis matrices of the model $\mathcal {M}$ are learnt, we can use them to calculate the gradients $\partial _{\varphi _{\ell k}} J_{FR}$. The derivation of $\partial _{\varphi _{\ell k}} J_{FR}$ also requires introducing $L + 1$ auxiliary matrices $C_{\ell }$ and $L + 1$ matrices $D_{\ell }$ ($\ell = 0, \ldots , L$),
$$C_{\ell } = U_{\ell }\, \Phi_{\ell -1} U_{\ell -1} \cdots \Phi_{1} U_{1}, \qquad D_{\ell } = \Phi_{L} U_{L} \cdots \Phi_{\ell +1} U_{\ell +1}, \qquad (9)$$
representing the partial products from the general expansion Eq. (1), so that $U = D_{\ell }\, \Phi_{\ell }\, C_{\ell }$. The iterative formula establishes $C_{\ell }$ and $D_{\ell }$ for each index $\ell$:
$$C_{0} = I, \quad C_{\ell } = U_{\ell }\, \Phi_{\ell -1}\, C_{\ell -1} \ (\Phi_{0} \equiv I); \qquad D_{L} = I, \quad D_{\ell -1} = D_{\ell }\, \Phi_{\ell }\, U_{\ell }. \qquad (10)$$
Once the $C_{\ell }$ and $D_{\ell }$ are computed, the gradients $\partial _{\varphi _{\ell k}} U$ are given by
$$\partial _{\varphi _{\ell k}} U = i\, e^{i\varphi _{\ell k}}\, D_{\ell }\, E^{(kk)}\, C_{\ell }. \qquad (11)$$
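The phase-shift gradients admit the same kind of sanity check; the sketch below builds the partial products around one phase layer and compares the analytic derivative with a finite difference (0-based indexing and helper names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
N = L = 3

basis = [np.linalg.qr(rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N)))[0]
         for _ in range(L)]
phases = rng.uniform(0, 2 * np.pi, size=(L, N))

def compose(phases):
    u = np.eye(N, dtype=complex)
    for u_l, phi_l in zip(basis, phases):
        u = np.diag(np.exp(1j * phi_l)) @ u_l @ u          # Phi_L U_L ... Phi_1 U_1
    return u

# Partial products around phase layer l (0-based): U = D @ Phi_l @ C
l, k = 1, 2
C = basis[0].copy()
for idx in range(1, l + 1):
    C = basis[idx] @ np.diag(np.exp(1j * phases[idx - 1])) @ C
D = np.eye(N, dtype=complex)
for idx in range(L - 1, l, -1):
    D = D @ np.diag(np.exp(1j * phases[idx])) @ basis[idx]
assert np.allclose(D @ np.diag(np.exp(1j * phases[l])) @ C, compose(phases))

# dU/dphi_{lk} = i e^{i phi_lk} D[:, k] (C[k, :]) as an outer product
du = 1j * np.exp(1j * phases[l, k]) * np.outer(D[:, k], C[k, :])

eps = 1e-7
pert = phases.copy(); pert[l, k] += eps
fd = (compose(pert) - compose(phases)) / eps
assert np.allclose(du, fd, atol=1e-5)
```

The outer-product form avoids materializing $E^{(kk)}$ and makes each phase gradient an $O(N^2)$ operation once the partial products are cached.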
4. Numerical experiment
In this section, we provide key performance metrics of the interferometer model learning algorithm. We test the scaling properties of the algorithm with respect to the training set size $M$ and the number of interferometer modes $N$. We set the number of layers $L=N$ because this configuration already delivers universal performance [15]. To certify the quality of the model $\mathcal {M}$ we employ the cross-validation methodology, which requires testing the model on a set of examples that were not included in the training set $\mathcal {T}$.
The simulation of the learning algorithm follows a series of steps. We first generate the training set $(\bar {U}^{(i)},\bar {\Phi }^{(i)})$ using the multimode interferometer expansion Eq. (1). The phases are sampled randomly from a uniform distribution on $[0, 2\pi )$. The basis matrices $U_{\ell }$ are drawn from the Haar-uniform distribution using the QR decomposition [31]. In a real-life setting, the elements of $\mathcal {T}$ are the outcomes $\bar {U}^{(i)}_{rec}$ of the unitary reconstruction algorithms [19,30] applied to a reconfigurable interferometer programmed with the phases $\bar {\Phi }^{(i)}$. The subtleties of experimentally gathering an appropriate training set are discussed in Sec. 5.
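The Haar-uniform sampling step can be sketched following Mezzadri's recipe [31]; note that the diagonal phase fix is required - the raw Q factor of a QR decomposition alone is not Haar-distributed:

```python
import numpy as np

def haar_unitary(n, rng=None):
    """Draw an n x n unitary from the Haar measure: QR-decompose a complex
    Ginibre matrix and fix the phases of R's diagonal (Mezzadri [31])."""
    rng = rng or np.random.default_rng()
    z = (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))   # rescale columns to make the measure Haar

u = haar_unitary(5)
assert np.allclose(u @ u.conj().T, np.eye(5))
```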
A proper interferometer model must accurately predict the unitary matrix of the real device for a given set of applied phases $\Phi$. The purpose of the cross-validation procedure is to estimate the predictive strength of the model $\mathcal {M}$. We generate the test set $(\hat {U}^{(i)},\hat {\Phi }^{(i)})$ comprised of randomly selected phases $\hat {\Phi }^{(i)}$ and the corresponding $\hat {U}^{(i)}$. The cross-validation routine uses each sample from the test set to verify whether the interferometer model with phases $\hat {\Phi }^{(i)}$ outputs the unitary $\hat {U}^{(i)}$. The model is considered to pass the cross-validation if $J_{FR}(U,\hat {U}) \leq 10^{-2}$. This criterion has been derived empirically by analyzing the convergence behaviour of $J_{FR}$ on the test set: if the model passes cross-validation, $J_{FR}(U,\hat {U})$ experiences a rapid decrease down to values below $10^{-2}$.
The model is initialized with basis matrices $U_{\ell }$ selected either randomly or using a priori knowledge available from the design of the physical elements realizing the basis matrices. We will study both cases and refer to the random initialization as the black-box model. At each epoch of the learning process, we estimate the average gradient over a collection of examples from $\mathcal {T}$ and update the basis matrices according to the optimization algorithm (stochastic L-BFGS-B algorithm, SciPy package). Instead of averaging the gradient over the complete training set, we randomly draw a subset of $m=5$ pairs $(\bar {U}, \bar {\Phi })$ each epoch and use this subset for averaging. The value $m=5$ has been determined empirically, as it provides a substantial computational speed-up while still keeping high training accuracy. We do not use a unitary parametrization of the basis matrices during the learning procedure and treat them simply as complex-valued square matrices. Since the parameters of the model are then the real parts $x_{ij}^{(\ell )}$ and the imaginary parts $y_{ij}^{(\ell )}$ of each basis matrix $U_{\ell }$, the updated basis matrices do not belong to the unitary space. We use the polar decomposition $A = H V$, where $H$ is a Hermitian and $V$ a unitary matrix, to project each updated complex matrix onto the closest unitary $U_{\ell }$ at each step of the optimization [32]. This method helps to avoid local minima which may arise due to a sophisticated unitary matrix parametrization.
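The projection step can be sketched with SciPy's polar decomposition (the helper name is ours); the unitary polar factor is the Frobenius-closest unitary [32]:

```python
import numpy as np
from scipy.linalg import polar

def closest_unitary(a):
    """Return the unitary polar factor of A (scipy gives A = U P with
    U unitary and P positive semidefinite), i.e. the unitary matrix
    closest to A in Frobenius norm."""
    u, _ = polar(a)
    return u

# A perturbed unitary projects back onto the unitary group
rng = np.random.default_rng(7)
a = np.linalg.qr(rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4)))[0]
v = closest_unitary(a + 0.1 * rng.normal(size=(4, 4)))
assert np.allclose(v @ v.conj().T, np.eye(4))
```

Projecting an already-unitary matrix returns it unchanged, so the projection is idempotent on the feasible set.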
The simulation code is written in Python employing Numpy and Scipy packages. The code is publicly available on GitLab [33].
4.1 Model - black box
We first consider the black-box model. This scenario implies that no a priori information about the basis matrices $U_{\ell }$ is available, and the interferometer is represented by a black-box system with $N^2$ variable phase parameters $\varphi _{{\ell }k}$. Therefore, the model is initialized with a completely random guess: the initial basis matrices are sampled from the Haar-random unitary distribution [31]. The model cross-validation testing (see Fig. 2(a)) determines the size $M$ of the training set for each value of $N$. The convergence behaviour of the model on the cross-validation set (Fig. 2(b)) indicates the optimal volume of the training set $\mathcal {T}$. The black-box treatment of the interferometer model enabled successful training of $\mathcal {M}$ only up to 6 modes; larger interferometer models are unattainable for our optimization procedure. The plateau in the convergence behaviour on the test set becomes significant already at $N = 6$ (see Supplement 1 for a convergence plot example). For $N > 6$, we failed to observe learning in the black-box setting - the average value of the figure of merit remains at the plateau. The work [34] suggests that the reason for a plateau in high-dimensional optimization problems is the presence of a large number of saddle points in the optimized function landscape rather than local minima. Several algorithms exploiting the idea of adaptive gradients have been developed to tackle the problem of escaping the plateau [35,36]. The adoption of an appropriate algorithm may solve the learning problem in the black-box setting.
Until this moment, the elements of $\mathcal {T}$ included matrices $\bar {U}^{(i)}$ that were artificially generated using Eq. (1) initialized with the phase set $\bar {\Phi }^{(i)}$. Gathering the same set using a real device would require the reconstruction of the $\bar {U}^{(i)}_{exp}$ matrices to be performed with absolute precision, which is never the case in an experiment. The learning algorithm has to be designed to tolerate a certain amount of discrepancy between the ideal $\bar {U}_{ideal}$ and reconstructed $\bar {U}_{exp}$ matrices. These deviations are the inevitable consequence of imperfections of the measurement tools used during the reconstruction process. We have modelled the behaviour of the learning algorithm seeded with a training set including the phase shifts $\bar {\Phi }^{(i)}$ and unitaries $\bar {U}^{(i)}_{exp}$ slightly deviating from their theoretical counterparts $\bar {U}^{(i)}_{ideal}$.
The deviation between $\bar {U}_{exp}$ and $\bar {U}_{ideal}$ is introduced as the polar projection [32] of the perturbed $\bar {U}_{ideal}$ onto the unitary space:
$$\bar{U}_{exp} = \mathcal{P}_{U}\left[\bar{U}_{ideal} + \alpha (X + iY)\right], \qquad (12)$$
with $\mathcal{P}_{U}[\,\cdot\,]$ denoting the projection onto the closest unitary matrix,
where $X$ and $Y$ are random real-valued matrices of size $N \times N$ whose elements are sampled from the normal distribution $\mathcal {N}(0,\,1)$. The degree of deviation is controlled by the real-valued parameter $\alpha$. Fig. 2(c) illustrates the convergence of the model for the simplest case $N=2$ supplied with a training set sampled with a given deviation $\alpha$. The value $\alpha \approx 0.04$ indicates the threshold at which the model fails to pass the cross-validation criterion $J_{FR} \leq 10^{-2}$. Averaging was performed over $1000$ models learnt using training sets corresponding to different basis matrices $U_{\ell }^{(0)}$. For each model, we performed the cross-validation test with $1000$ phase shift sets.

4.2 Model with a priori knowledge
The black-box model results presented in Sec. 4.1 evidence that the optimization complexity of the model with arbitrary initial basis matrices grows rapidly with the training set volume $M$. In this section, we study the choice of the initial approximation of the basis matrices $U_{\ell }$ which enables learning for larger dimensions $N$. The case when the basis matrices $U_{\ell }$ are completely unknown does not adequately reflect the real situation: in practice, the basis matrices are implemented by optical circuits with well-defined geometry and optical properties. Prototypes of these circuits can be tested beforehand to verify the circuit's performance, including the mode transformation that it generates. Contemporary integrated photonics fabrication technologies guarantee reproducibility up to a certain level of precision specific to each technological line. Summing up, the basis matrix unitary transformation $U_{\ell }^{est}$ can be estimated in advance. This estimate serves as the initial guess for the optimization algorithm at the model training stage. This section demonstrates how such knowledge substantially simplifies the optimization and enables learning the models of interferometers with $N$ up to at least a few tens of modes.
We use the estimated matrices $U_{\ell }^{est}$ as the initial guess for our optimization routine. These matrices are chosen to be close to the ideal basis matrices $U_{\ell }$ used for the training set generation. We get the initial guess using the procedure described by Eq. (12), substituting $\alpha$ with the parameter $\beta$ to avoid notation collision. Fig. 3 shows the convergence of the learning algorithm employing the knowledge of the basis matrix unitaries up to a certain precision. The a priori information about the basis matrices enabled learning the model of the interferometer up to $N=20$ using the same computational hardware. The larger the interferometer dimension $N$, the higher the precision of the $U_{\ell }^{est}$ estimate must be. Fig. 3(b) illustrates the regions of the $\beta$ value, depending on the size of the interferometer, where the algorithm still accurately learns the model $\mathcal {M}$.
The scalability of interferometer model learning aided with a priori knowledge about the basis matrices is limited by the precision $\beta$ of the initial approximation $U_{\ell }^{est}$. We provide in Fig. 3(c) the relation between the error parameter $\beta$ and a conventional fidelity $F(U_{\ell }^{est}, U_{\ell }^{exact}) \equiv \dfrac {1}{N^2} | \mathrm{Tr}((U_{\ell }^{exact})^{\dagger } U_{\ell }^{est}) |^2$, where $U_{\ell }^{exact}$ denotes the exact basis matrices of the mode-mixing layers. The curves were computed by averaging the fidelity values between $U_{\ell }^{est}(\beta )$ and $U_{\ell }^{exact}$ over 1000 samples for each point. For the largest $N=20$ interferometer, the value $\beta = 0.11$ required to pass the cross-validation test corresponds to approximately 80% fidelity. Experimental results reported in [1,30] indicate that the fidelity of reconstruction of an unknown unitary matrix can exceed 90%. Fig. 3(c) also indicates that for a fixed value of $\beta$ the fidelity drops with larger $N$. This gives us hope that our algorithm can be implemented for large interferometers. However, the reconstruction of unitary matrices for large $N$ still remains challenging.
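The relation between the perturbation strength and the fidelity $F$ can be estimated numerically; the sketch below follows the perturb-and-project recipe of Eq. (12) with $\beta$ in place of $\alpha$ (the sample count of 100 is our choice for speed, not the paper's 1000):

```python
import numpy as np
from scipy.linalg import polar

rng = np.random.default_rng(3)

def haar_unitary(n):
    q, r = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
    return q * (np.diag(r) / np.abs(np.diag(r)))

def perturb(u, beta):
    """Perturb a unitary with complex Gaussian noise of strength beta and
    project back onto the unitary group (Eq. (12) with beta)."""
    n = u.shape[0]
    noise = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return polar(u + beta * noise)[0]

def fidelity(u_est, u_exact):
    """F = |Tr(U_exact^dag U_est)|^2 / N^2, equal to 1 exactly when the two
    unitaries coincide up to a global phase."""
    n = u_exact.shape[0]
    return np.abs(np.trace(u_exact.conj().T @ u_est)) ** 2 / n ** 2

u = haar_unitary(20)
assert np.isclose(fidelity(u, u), 1.0)
# Average fidelity at a fixed perturbation strength
f_mean = np.mean([fidelity(perturb(u, 0.11), u) for _ in range(100)])
assert 0.0 < f_mean < 1.0
```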
5. Discussion
The demonstrated approach to interferometer programming stands out with several major advantages. First and foremost, the method is agnostic to the architecture of the interferometer. Any universal interferometer reported in the literature [10,11,15,16] admits the form of the expansion Eq. (1) - optical mode-mixing elements interleaved with phase shifters. Hence, both the gist of the algorithm and the mathematical framework fit any architecture of choice. We have elaborated two examples demonstrating the applicability of our method to various architectures. First, we have simulated learning of the basis matrices of an $N=5$ multiport interferometer with many more layers than is required for universal operation; the simulation results are presented in Fig. 4(a). Next, we have decomposed a standard Clements $N=5$ interferometer into a sequence of basis matrices and phase-shifting layers. Each beamsplitter reflectivity $r=\cos \theta$ was modified by adding a normally distributed random number to the parameter $\theta$
$$\theta \rightarrow \theta + \sigma \, \mathcal{N}(0,1), \qquad (13)$$
where $\mathcal {N}(0,1)$ represents a random number sampled from the standard normal distribution and $\sigma$ sets the perturbation scale. Figure 4(b) presents convergence curves for 50 instances of learning the unknown basis matrices of the Clements interferometer. Each learning instance was seeded with basis matrices $U^{(est)}$ equal to matrices comprised of perfectly balanced beamsplitters. The result clearly indicates that standard layouts based on MZI blocks can be programmed using the proposed algorithm. The architecture-agnostic feature of the algorithm remains valid as long as the mode mixers and the phase shifters can be considered as independent elements.

Next, the output of the learning algorithm is a complete interferometer model taking into account the transformations of the mode-mixing elements in the fabricated device. The model answers the question of how closely the required unitary $U_{0}$ can be approximated by the specific device under study, and can pinpoint areas of the unitary space inaccessible to the device due to design restrictions or fabrication flaws. This feature should be compared with the typical calibration data used for programming an interferometer based on MZI blocks: the phase shifters are calibrated, but no knowledge is available about the real transformation of the beamsplitters comprising the MZIs. This leads to the necessity of running an optimization procedure to improve the fidelity of the implemented unitary if some of the beamsplitters do not meet the designed transmission coefficients.

Lastly, the presented algorithm is essentially a reconstruction routine for the inner fixed optical elements of a complex interferometric circuit. Hence, it can be adopted for probing the quality of optical subcircuits located inside a larger optical scheme.

The bottlenecks of the proposed algorithm are related to practical issues. The $J_{FR}$ Frobenius metric requires exact measurement of the moduli and phases of the unitary matrix elements.
Several reconstruction methods have been proposed and verified [19,30,37–39]. Some of them [30,37] provide only partial information about the transformation matrix of the interferometer, omitting phases that are impossible to reconstruct using the method-specific dataset. Any method will inevitably suffer from path-dependent optical loss, which is impossible to distinguish and attribute to a particular path inside the circuit. Another issue not covered by our algorithm arises from the crosstalk between the phase shifters. Our framework assumes that the phases in different paths are set independently, which is not exactly the case due to crosstalk between different phase-modulating elements. Luckily, integrated photonic modulator implementations typically exhibit extremely low crosstalk [40,41].
We believe that our results will enable opportunities to employ new programmable optical interferometer architectures for both classical and quantum applications.
Funding
Russian Science Foundation (19-72-10069).
Acknowledgements
This work was supported by Russian Science Foundation (RSF), project No: 19-72-10069.
Disclosures
The authors declare no conflicts of interest.
Data availability
Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.
Supplemental document
See Supplement 1 for supporting content.
References
1. J. Carolan, C. Harrold, C. Sparrow, E. Martin-Lopez, N. J. Russell, J. W. Silverstone, P. J. Shadbolt, N. Matsuda, M. Oguma, M. Itoh, G. D. Marshall, M. G. Thompson, J. C. F. Matthews, T. Hashimoto, J. L. O’Brien, and A. Laing, “Universal linear optics,” Science 349(6249), 711–716 (2015). [CrossRef]
2. N. C. Harris, J. Carolan, D. Bunandar, M. Prabhu, M. Hochberg, T. Baehr-Jones, M. L. Fanto, A. M. Smith, C. C. Tison, P. M. Alsing, and D. Englund, “Linear programmable nanophotonic processors,” Optica 5(12), 1623 (2018). [CrossRef]
3. J. Wang, S. Paesani, Y. Ding, R. Santagati, P. Skrzypczyk, A. Salavrakos, J. Tura, R. Augusiak, L. Mančinska, D. Bacco, D. Bonneau, J. W. Silverstone, Q. Gong, A. Acín, K. Rottwitt, L. K. Oxenløwe, J. L. O’Brien, A. Laing, and M. G. Thompson, “Multidimensional quantum entanglement with large-scale integrated optics,” Science 360(6386), 285–291 (2018). [CrossRef]
4. J. Wang, F. Sciarrino, A. Laing, and M. G. Thompson, “Integrated photonic quantum technologies,” Nat. Photonics 14(5), 273–284 (2020). [CrossRef]
5. H. Zhang, M. Gu, X. D. Jiang, J. Thompson, H. Cai, S. Paesani, R. Santagati, A. Laing, Y. Zhang, M. H. Yung, Y. Z. Shi, F. K. Muhammad, G. Q. Lo, X. S. Luo, B. Dong, D. L. Kwong, L. C. Kwek, and A. Q. Liu, “An optical neural chip for implementing complex-valued neural network,” Nat. Commun. 12(1), 457 (2021). [CrossRef]
6. R. Hamerly, L. Bernstein, A. Sludds, M. Soljačić, and D. Englund, “Large-scale optical neural networks based on photoelectric multiplication,” Phys. Rev. X 9(2), 021032 (2019). [CrossRef]
7. G. Wetzstein, A. Ozcan, S. Gigan, S. Fan, D. Englund, M. Soljačić, C. Denz, D. A. B. Miller, and D. Psaltis, “Inference in artificial intelligence with deep optics and photonics,” Nature 588(7836), 39–47 (2020). [CrossRef]
8. D. Pérez-López, A. López, P. DasMahapatra, and J. Capmany, “Multipurpose self-configuration of programmable photonic circuits,” Nat. Commun. 11(1), 6359 (2020). [CrossRef]
9. A. Hurwitz, “Über die Erzeugung der Invarianten durch Integration,” Nachr. Ges. Wiss. Göttingen pp. 71–90 (1897).
10. M. Reck, A. Zeilinger, H. J. Bernstein, and P. Bertani, “Experimental realization of any discrete unitary operator,” Phys. Rev. Lett. 73(1), 58–61 (1994). [CrossRef]
11. W. R. Clements, P. C. Humphreys, B. J. Metcalf, W. S. Kolthammer, and I. A. Walmsley, “Optimal design for universal multiport interferometers,” Optica 3(12), 1460–1465 (2016). [CrossRef]
12. R. Burgwal, W. R. Clements, D. H. Smith, J. C. Gates, W. S. Kolthammer, J. J. Renema, and I. A. Walmsley, “Using an imperfect photonic network to implement random unitaries,” Opt. Express 25(23), 28236–28245 (2017). [CrossRef]
13. I. V. Dyakonov, I. A. Pogorelov, I. B. Bobrov, A. A. Kalinkin, S. S. Straupe, S. P. Kulik, P. V. Dyakonov, and S. A. Evlashin, “Reconfigurable photonics on a glass chip,” Phys. Rev. Appl. 10(4), 044048 (2018). [CrossRef]
14. R. Tang, T. Tanemura, and Y. Nakano, “Integrated reconfigurable unitary optical mode converter using mmi couplers,” IEEE Photonics Technol. Lett. 29(12), 971–974 (2017). [CrossRef]
15. M. Y. Saygin, I. V. Kondratyev, I. V. Dyakonov, S. A. Mironov, S. S. Straupe, and S. P. Kulik, “Robust architecture for programmable universal unitaries,” Phys. Rev. Lett. 124(1), 010501 (2020). [CrossRef]
16. S. A. Fldzhyan, M. Y. Saygin, and S. P. Kulik, “Optimal design of error-tolerant reprogrammable multiport interferometers,” Opt. Lett. 45(9), 2632 (2020). [CrossRef]
17. R. Tanomura, R. Tang, S. Ghosh, T. Tanemura, and Y. Nakano, “Robust integrated optical unitary converter using multiport directional couplers,” J. Lightwave Technol. 38(1), 60–66 (2020). [CrossRef]
18. R. Tang, R. Tanomura, T. Tanemura, and Y. Nakano, “Ten-port unitary optical processor on a silicon photonic chip,” ACS Photonics 8(7), 2074–2080 (2021). [CrossRef]
19. M. Tillmann, C. Schmidt, and P. Walther, “On unitary reconstruction of linear optical networks,” J. Opt. 18(11), 114002 (2016). [CrossRef]
20. S. Pai, B. Bartlett, O. Solgaard, and D. A. B. Miller, “Matrix optimization on universal unitary photonic devices,” Phys. Rev. Appl. 11(6), 064044 (2019). [CrossRef]
21. S. Bandyopadhyay, R. Hamerly, and D. Englund, “Hardware error correction for programmable photonics,” Optica 8(10), 1247 (2021). [CrossRef]
22. T. W. Hughes, M. Minkov, Y. Shi, and S. Fan, “Training of photonic neural networks through in situ backpropagation and gradient measurement,” Optica 5(7), 864 (2018). [CrossRef]
23. S. Pai, I. A. D. Williamson, M. Minkov, T. W. Hughes, O. Solgaard, S. Fan, and D. A. B. Miller, “Parallel fault-tolerant programming and optimization of photonic neural networks,” in Conference on Lasers and Electro-Optics, (Optical Society of America, 2020), p. SM1E.5.
24. D. A. B. Miller, “Self-aligning universal beam coupler,” Opt. Express 21(5), 6360 (2013). [CrossRef]
25. R. Hamerly, S. Bandyopadhyay, and D. Englund, “Accurate Self-Configuration of Rectangular Multiport Interferometers,” arXiv e-prints arXiv:2106.03249 (2021).
26. M. A. Nielsen, Neural Networks and Deep Learning (Determination Press, 2015).
27. I. Ohn and Y. Kim, “Smooth function approximation by deep neural networks with general activation functions,” Entropy 21(7), 627 (2019). [CrossRef]
28. S. Ferrari and R. F. Stengel, “Smooth function approximation using neural networks,” IEEE Trans. Neural Netw. 16(1), 24–38 (2005). [CrossRef]
29. N. N. Skryabin, I. V. Dyakonov, M. Y. Saygin, and S. P. Kulik, “Waveguide lattice based architecture for multichannel optical transformations,” arXiv e-prints arXiv:2103.02664 (2021).
30. D. Suess, N. Maraviglia, R. Kueng, A. Maïnos, C. Sparrow, T. Hashimoto, N. Matsuda, D. Gross, and A. Laing, “Rapid characterisation of linear-optical networks via PhaseLift,” arXiv e-prints arXiv:2010.00517 (2020).
31. F. Mezzadri, “How to generate random matrices from the classical compact groups,” arXiv e-prints math-ph/0609050 (2006).
32. K. Fan and A. J. Hoffman, “Some metric inequalities in the space of matrices,” Proc. Am. Math. Soc. 6(1), 111 (1955). [CrossRef]
33. S. Kuzmin, “Nnoptic,” https://gitlab.com/SergeiKuzmin/nnoptic (2020).
34. R. Pascanu, Y. N. Dauphin, S. Ganguli, and Y. Bengio, “On the saddle point problem for non-convex optimization,” arXiv e-prints arXiv:1405.4604 (2014).
35. M. Staib, S. J. Reddi, S. Kale, S. Kumar, and S. Sra, “Escaping Saddle Points with Adaptive Gradient Methods,” arXiv e-prints arXiv:1901.09149 (2019).
36. D. P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,” arXiv e-prints arXiv:1412.6980 (2014).
37. A. Laing and J. L. O’Brien, “Super-stable tomography of any linear optical device,” arXiv e-prints arXiv:1208.2868 (2012).
38. S. Rahimi-Keshari, M. A. Broome, R. Fickler, A. Fedrizzi, T. C. Ralph, and A. G. White, “Direct characterization of linear-optical networks,” Opt. Express 21(11), 13450 (2013). [CrossRef]
39. N. Spagnolo, E. Maiorino, C. Vitelli, M. Bentivegna, A. Crespi, R. Ramponi, P. Mataloni, R. Osellame, and F. Sciarrino, “Learning an unknown transformation via a genetic approach,” Sci. Rep. 7(1), 14316 (2017). [CrossRef]
40. R. Zhang, Y. He, Y. Zhang, S. An, Q. Zhu, X. Li, and Y. Su, “Ultracompact and low-power-consumption silicon thermo-optic switch for high-speed data,” Nanophotonics 10(2), 937–945 (2020). [CrossRef]
41. L. Jiang, X. Chen, K. Kim, G. de Valicourt, Z. R. Huang, and P. Dong, “Electro-optic crosstalk in parallel silicon photonic mach-zehnder modulators,” J. Lightwave Technol. 36(9), 1713–1720 (2018). [CrossRef]