
Linear optical random projections without holography

Open Access

Abstract

We introduce what we believe to be a novel method to perform linear optical random projections without the need for holography. Our method consists of a computationally trivial combination of multiple intensity measurements to mitigate the information loss usually associated with the absolute-square non-linearity imposed by optical intensity measurements. Both experimental and numerical findings demonstrate that the resulting matrix consists of real-valued, independent, and identically distributed (i.i.d.) Gaussian random entries. Our optical setup is simple and robust, as it does not require interference between two beams. We demonstrate the practical applicability of our method by performing dimensionality reduction on high-dimensional data, a common task in randomized numerical linear algebra with relevant applications in machine learning.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

For most intents and purposes, the propagation of light is linear. The input and output fields of an optical system can therefore be described as vectors $x$ and $y$ that are connected via the transmission matrix $M$ of the system: $y = Mx$. This notion, together with the ubiquitous use of vector-matrix multiplications in data processing, is one reason for the continuous interest in building optical processors. However, since detectors register the intensity of light, and not its complex amplitude, they inevitably introduce the modulus-square non-linearity and detect $|y|^2$, not $y$. All information about the phase of the complex output field $y$ is therefore lost when optical intensity is the measurement variable. Holographic techniques can be used to recover the phase and, with it, the linearity of the operation. However, they come at a cost: holographic methods require a reference beam that is made to interfere with the output field, which imposes stringent stability requirements. Furthermore, in the case of phase-shift holography [1], at least three images must be taken to recover the phase, slowing down the data processing rate. Off-axis holography [2] requires only one image, but this experimental simplification comes at the price of non-trivial digital post-processing, and the maximum output dimension is significantly smaller than the number of pixels on the intensity-recording camera, since the interference fringes for each output feature need to be resolved.

In this work, we present a method to optically perform a linear vector-matrix multiplication without holography, where the matrix elements are independent and identically distributed random variables drawn from a normal distribution. This operation is surprisingly versatile: distances between vectors are approximately conserved as shown by the Johnson-Lindenstrauss lemma [3], and the field of randomized numerical linear algebra (RNLA) is exploiting such operations in a variety of ways to be able to solve linear algebra tasks for very high dimensional data [4,5]. Consequently, an optical processor that performs random projections can lend itself to all these uses, while keeping the well-known advantages of optical computing: since the computation is executed as the light propagates through the optical system, it is entirely passive, very fast, and highly parallel.

Our method requires as few as $2N+2$ images to perform $N$ linear random projections, does not rely on interference with a reference beam, and only requires computationally trivial post-processing. It therefore allows for a very stable and simple design of a fast optical linear processor. To our knowledge, the method presented in the following has not been reported before.

2. Presentation of the method

We start in the most general setting where $M$ is an arbitrary linear transform, and $x$ is an arbitrary input vector. We choose a fixed input vector $x_a$, which in the following we refer to as the anchor vector, and we require that $Mx_a$ exclusively comprises non-zero elements. In practice we achieve this by excluding those dimensions of the output vector space for which the corresponding elements of $|M x_a|$ are smaller than some threshold. In the case of random projections, which is the focus of this manuscript, this procedure merely leads to a reduction of the dimensionality of the output vector space, and has no other discernible consequences.

In the following, the complex conjugate is denoted by an overline $\overline {(\ldots )}$, and $\mathfrak {Re}(\ldots )$ is the real part of a complex value. The absolute value $|\ldots |$, multiplication $(\ldots ) \odot (\ldots )$, division $(\ldots )/(\ldots )$ and power $(\ldots )^p$ are element-wise operations. We start with the following identity:

$$|M(x_a - x)|^2 = |M x_a|^2 + |M x|^2 - 2\mathfrak{Re}\left(M x_a \odot \overline{M x}\right)$$

Next, we construct a new linear transform $\tilde {M}x = (Mx) \odot \exp (-i \arg (Mx_a))$. $\tilde {M}$ is defined such that $\tilde {M}x_a$ is real and non-negative. It follows that for any $x$ we have $|\tilde {M}x| = |Mx|$, $M x_a \odot \overline {M x} = \tilde {M} x_a \odot \overline {\tilde {M} x}$, and $\tilde {M}x_a = |\tilde {M}x_a| = |Mx_a|$. This allows us to separate the real part of the product in Eq. (1) into the product of two real parts:

$$\begin{aligned}|M(x_a - x)|^2 = |M x_a|^2 + |M x|^2 - 2|M x_a| \odot \mathfrak{Re}\left(\tilde{M} x\right) \end{aligned}$$
$$\begin{aligned} \Rightarrow Lx := \mathfrak{Re}\left(\tilde{M} x\right) = \frac{|M x_a|^2 + |M x|^2 - |M(x_a - x)|^2}{2|M x_a|} \end{aligned}$$

We note that on the left-hand side of Eq. (3), we have a linear transformation $L$ of the vector $x$, while the right-hand side only contains quantities that can be obtained from intensity measurements in an optical setup. Although the mathematical derivation does not constrain the choice of the anchor vector $x_a$ other than that $|M x_a|$ must only have non-zero elements, there are experimental considerations that we explain later in this manuscript. Of course, this derivation only makes practical sense if one can construct a linear transform that is "useful": note that we do not obtain the real part of the optical field $Mx$ at the camera, but the generally dissimilar real part of $\tilde {M}x$. Note also that knowledge of the real part and absolute value of $\tilde {M} x$ only allows us to determine the absolute value of its imaginary part, and not the signs of its elements. In the following, we describe how we employ this procedure to create a random matrix with Gaussian i.i.d. entries.
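The combination of intensity measurements in Eq. (3) can be checked numerically. The following sketch assumes a synthetic complex Gaussian transmission matrix standing in for the diffuser; the dimensions and seed are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 256, 512                          # input and output dimensions (arbitrary)

# Complex Gaussian "transmission matrix" standing in for the diffuser.
M = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)

x_a = np.ones(n)                         # anchor vector (all-ones, as in the text)
x = rng.integers(0, 2, n).astype(float)  # arbitrary binary input

# Three intensity measurements, as a camera would record them.
I_a    = np.abs(M @ x_a) ** 2
I_x    = np.abs(M @ x) ** 2
I_diff = np.abs(M @ (x_a - x)) ** 2

# Eq. (3): combine the intensities into the linear projection Lx.
Lx = (I_a + I_x - I_diff) / (2 * np.abs(M @ x_a))

# Ground truth Re(M~x), available here only because M is known in simulation.
M_tilde = M * np.exp(-1j * np.angle(M @ x_a))[:, None]
assert np.allclose(Lx, np.real(M_tilde @ x))
```

In an experiment only the three intensity images are available; the final comparison against $\mathfrak{Re}(\tilde{M}x)$ is possible here because the simulated $M$ is known.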

We are using an optical processing unit (OPU) that has been developed by LightOn [6], the principle of which has previously been described in [7,8]. These OPUs have been successfully used in various machine learning settings such as adversarial robustness [9,10], differential privacy [11], reservoir computing [12] and randomized numerical linear algebra [13]. A schematic drawing of an OPU is shown in Fig. 1(a), and further technical details can be found in its caption. In short, a digital micromirror device (DMD) is used to imprint information (vector $x$) onto a laser beam. The DMD contains individually controllable mirrors that reflect the light either into a beam block (OFF-state) or towards a diffusive medium (ON-state). We are therefore limited to input vectors with binary entries (0 or 1). As this binary modulated beam propagates through the diffusive medium, it is randomly scattered multiple times. This naturally results in a random optical transmission matrix with normally distributed complex entries [14].


Fig. 1. Panel a) shows the schematic of the experiment: A collimated laser beam (Cobolt Samba 04-03; 100 mW @ 532 nm) is incident on a DMD (DLP4500 from Texas Instruments with Ajile AJD-4500-UT controller). Pixels in the OFF position reflect the light towards a beam dump. Pixels in the ON position direct the light towards a focusing lens (f = 25 mm). The binary pattern displayed on the DMD is therefore encoded as a 2D binary amplitude pattern in the beam cross-section. In the focal plane of the lens is an optically thick ceramic diffuser. Light is multiply and randomly scattered as it propagates to the other side. It is this diffusive medium that results in the random transmission matrix of the overall system. The speckle emanating from the diffuser is captured by a camera (Basler ACA2000 340KMS). A small section of a speckle pattern is shown in panel b). The maximum frame rate is limited by the camera: at full frame and 8-bit resolution it is 340 Hz, but it can be increased when reading out only part of the sensor.


We have found experimentally and in numerical simulations that in the canonical basis, the elements of $L$ as derived in Eq. (3) have a bias that depends on the choice of the anchor vector $x_a$. In order to remove this bias and to obtain a matrix whose elements are symmetrically distributed around zero, we work with vectors in the Hadamard basis. This is an orthogonal set of vectors $\{h_i\}$ with binary entries of either -1 or 1. We project them via $L(h_i) = L(h_i^+) - L(h_i^-)$, where $h_i^+$ and $h_i^-$ only have entries 0 or 1 and can therefore be displayed on the DMD. Calculating this difference removes the bias. All of the entries of $h_0$ are equal to 1, making it an obvious choice for the anchor vector $x_a$, since it allows us to display $x_a - x$ on the DMD for all binary vectors $x$, which is necessary for the linear construction. This choice of basis vectors and anchor vector is also advantageous for the following reason: all of the DMD pixels reflect light towards the camera when displaying $x_a$, and half of them do for all vectors $h_i^+$ and $h_i^-$. That is, the average intensity of the speckle pattern $|M x_a|^2$ detected by the camera is only twice as big as $|M h_i^+|^2$ and $|M h_i^-|^2$. It is therefore possible to fix the exposure time and laser intensity such that a good use of the dynamic range of the camera is guaranteed, which in turn leads to a minimization of errors due to quantization of the camera data.
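The debiasing in the Hadamard basis can be simulated end-to-end. This minimal sketch again assumes a synthetic Gaussian $M$; the Hadamard matrix is built with the standard Sylvester construction, and $h_0$ (the anchor) is excluded since its projection $|Mx_a|$ is positive by construction:

```python
import numpy as np

def hadamard(k):
    """Sylvester construction of a k x k Hadamard matrix (k a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < k:
        H = np.block([[H, H], [H, -H]])
    return H

rng = np.random.default_rng(1)
n, m = 64, 256
M = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)

x_a = np.ones(n)                       # h_0, the all-ones vector, as anchor
denom = 2.0 * np.abs(M @ x_a)

def L(x):
    """Eq. (3), from three simulated intensity measurements (x binary)."""
    return (np.abs(M @ x_a) ** 2 + np.abs(M @ x) ** 2
            - np.abs(M @ (x_a - x)) ** 2) / denom

H = hadamard(n)                        # rows h_i with entries +/-1
h_plus = (H > 0).astype(float)         # binary patterns displayable on a DMD
h_minus = (H < 0).astype(float)

# Project every h_i (i >= 1) as L(h_i^+) - L(h_i^-); h_0 is the anchor itself.
cols = np.stack([L(hp) - L(hm) for hp, hm in zip(h_plus[1:], h_minus[1:])],
                axis=1)
# The bias cancels: the entries are symmetrically distributed around zero.
print(cols.mean(), cols.std())
```

The columns of `cols` play the role of the columns of the retrieved matrix $L$ in the Hadamard basis; their mean is close to zero, in contrast to the biased projections of canonical-basis vectors.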

3. Experimental results

We first verify that the transform $L$ defined in Eq. (3) is indeed linear. For this, we are not obliged to use vectors in the Hadamard basis, but can simply generate three binary input vectors $x_1$, $x_2$, and $x_3$ that fulfill the condition $x_3 = x_1 + x_2$, for which $L(x_3) - L(x_1) - L(x_2)$ should be equal to zero. The result is shown in Fig. 2. We note that the distributions of $L(x_i)$ are not centered around zero due to the aforementioned bias of $L$ in the canonical basis. The distribution of $L(x_3) - L(x_1) - L(x_2)$, on the other hand, is centered around zero, and its width is about 13 times smaller than those of $L(x_i)$. We have found that small elements of $|M x_a|$ contribute to this residual broadening: since we divide by this vector, the consequently small signal-to-noise ratio of small elements leads to large fluctuations of the corresponding elements of $Lx$. To limit this effect we exclude elements of $Lx$ for which the recorded $|M x_a|^2$ is smaller than an adjustable threshold. In practice we set this threshold to 20 for our 8-bit resolution data for all the results presented in this paper. Camera noise does not contribute significantly to the broadening in our setup. We are not certain where the remaining width comes from. One possible factor that we could not test was microscopic vibrations of the setup, which would result in small relative translations between the speckle field and the camera sensor.
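The exclusion of low-signal output pixels described above is simple to sketch. Fully developed speckle has exponentially distributed intensity, which we use here to simulate an 8-bit recording of the anchor image; the mean of 40 counts is an assumption for illustration, while the threshold of 20 counts is the value quoted in the text:

```python
import numpy as np

rng = np.random.default_rng(2)
# Simulated 8-bit camera recording of the anchor intensity |M x_a|^2.
# Fully developed speckle is exponentially distributed; mean counts assumed ~40.
I_anchor = np.minimum(rng.exponential(scale=40.0, size=100_000), 255)
I_anchor = I_anchor.astype(np.uint8)

threshold = 20                         # counts, as used in the text for 8-bit data
keep = I_anchor >= threshold           # output dimensions retained for L
print(f"usable output fraction: {keep.mean():.2f}")
```

For exponential statistics the surviving fraction is $e^{-\text{threshold}/\text{mean}}$, so the threshold trades output dimensionality against the signal-to-noise ratio of the division in Eq. (3).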


Fig. 2. Basic test of linearity: The distributions of $L(x_1)$ and $L(x_2)$ are very similar and overlap. The standard deviations are $2.75$ for all $L(x_i)$. The standard deviation of $L(x_3) - L(x_1) - L(x_2)$ is $0.21$.


We then investigate the distribution of the matrix elements of $L$, measured in the Hadamard basis. To obtain the best results, we implement procedures that compensate for imperfections in the experimental setup: the Gaussian-shaped laser beam illuminates the DMD with higher light intensity at the center than at the edges. To compensate for this, we assign a random distribution of DMD pixels to each input vector element, creating distributed macro-pixels. Consequently, each macro-pixel receives, on average, equal light intensity. Similarly, we only utilize camera pixels with roughly equal average illumination, as the incident intensity on the detector is not uniform.
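One simple way to realize such distributed macro-pixels is a random assignment of physical mirrors to input elements; the exact assignment used in the experiment is not specified, so the following is an illustrative sketch (DMD dimensions taken from the DLP4500, input dimension arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
rows, cols_dmd = 1140, 912             # DLP4500 micromirror array
n_in = 10_000                          # chosen macro-pixel (input) dimension

# Randomly assign every physical mirror to one macro-pixel, so each macro-pixel
# samples the Gaussian beam profile evenly on average.
assignment = rng.permutation(rows * cols_dmd) % n_in

def to_dmd_frame(x):
    """Expand a binary input vector (length n_in) to a full binary DMD frame."""
    return x[assignment].reshape(rows, cols_dmd)

x = rng.integers(0, 2, n_in)
frame = to_dmd_frame(x)
```

Because the `permutation` scatters each macro-pixel's mirrors uniformly over the array, every input element receives nearly the same total illumination regardless of the beam profile.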

Next, we account for the size of an average speckle grain on the camera, which has a standard deviation of approximately 0.9 camera pixels. This implies that measured intensities on neighboring camera pixels are correlated, and we use only every third pixel to remove these correlations from $L$. We have measured the correlations between neighboring DMD pixels and confirmed they are negligible.

Therefore our system has the flexibility to trade between maximum "cleanliness" – i.e. Gaussian distributed and i.i.d. elements – of the random matrix, or maximum input and output dimensions. Here we choose the former and obtain a system with a maximum (binary) input dimension of $\sim 10^5$ and a maximum output dimension of $\sim 3 \times 10^4$.

Figure 3 shows some properties of an experimentally retrieved matrix $L$ with $4096 \times 4096$ elements, represented in the Hadamard basis. The projections of the Hadamard vectors are the columns of $L$. We compare these experimental results to a numerical simulation of our method, for which we create $M$ with Gaussian i.i.d. elements. We find that the distribution of the matrix elements of $L$ is Gaussian in both cases, and we attribute the small deviation of the experimental data to remaining and unidentified artifacts of our setup. Furthermore, as described earlier, we eliminate correlations of the matrix elements of $L$ by excluding neighboring camera pixels. To verify this method we compare the singular value distribution of $L$ to the Marchenko-Pastur law [14,15]. In our experience, this is a sensitive probe: the singular value distribution deviates quickly from the theoretical prediction even for small residual correlations. Finally, we show that $L$ is approximately orthogonal by calculating $L L^T$.
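The two checks shown in Fig. 3 can be reproduced for an ideal i.i.d. Gaussian matrix standing in for the retrieved $L$; the dimension is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1024
L_mat = rng.standard_normal((n, n))    # stand-in for the retrieved matrix L

# Quarter-circle law: the singular values of a square i.i.d. matrix, scaled by
# 1/sqrt(n), follow the density p(s) = sqrt(4 - s^2) / pi on [0, 2].
s = np.linalg.svd(L_mat, compute_uv=False) / np.sqrt(n)
hist, edges = np.histogram(s, bins=20, range=(0.0, 2.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
expected = np.sqrt(4.0 - centers**2) / np.pi
print("max deviation from quarter circle:", np.abs(hist - expected).max())

# Approximate orthogonality: L L^T / n is close to the identity.
G = L_mat @ L_mat.T / n
print("diagonal:", G.diagonal().mean(), "+/-", G.diagonal().std())
```

Residual correlations between rows or columns pull the singular value histogram away from the quarter-circle density, which is what makes this comparison a sensitive diagnostic.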


Fig. 3. Left: A normal distribution (N. D.), simulated data, and the distribution of the elements of the experimental matrix. All distributions are normalized such that their standard deviations are equal to 1. Center: The Marchenko-Pastur (M-P) law for square random i.i.d. matrices – i.e. the quarter circle law –, compared to the SVDs of simulated data and of the experimental matrix. Right: A $20 \times 20$ subset of $L L^T$, showing that the rows and columns of the experimental transform $L$ are approximately orthogonal. The diagonal and the off-diagonal values of the entire $4096 \times 4096$ matrix are $0.999 \pm 0.023$ and $0 \pm 0.016$, respectively.


In Fig. 3 we established that we can obtain a very clean random matrix. To demonstrate that we can also scale to high dimensions, we perform a dimensionality reduction based on the Johnson-Lindenstrauss lemma [3] at the maximum available input dimension of our system: $912 \times 1140 \approx 1.0 \times 10^6$, equal to the total number of DMD pixels. Consequently, the transform $L$ is now influenced by the inhomogeneous illumination of the DMD, which affects the magnitude of its matrix elements. We generate 100 input vectors and project them down to a dimension of 1000. The Johnson-Lindenstrauss lemma guarantees that pairwise distances between vectors are approximately preserved: $\parallel x_i - x_j \parallel \approx \lambda \parallel L(x_i) - L(x_j) \parallel$. For our test, we have normalized $L$ such that $\lambda = 1$. Figure 4 demonstrates that our experimental setup carries out the dimensionality reduction without significant performance degradation compared to a numerical random projection.
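A purely numerical version of this test is straightforward; the dimensions here are reduced relative to the experiment to keep the sketch lightweight, and the Gaussian projection is scaled by $1/\sqrt{k}$ so that $\lambda = 1$:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(4)
d, k, n_vec = 10_000, 500, 20          # ambient dim, target dim, # of vectors

X = rng.integers(0, 2, size=(n_vec, d)).astype(float)   # binary inputs
P = rng.standard_normal((k, d)) / np.sqrt(k)            # scaled so lambda = 1

Y = X @ P.T
rel_errors = []
for i, j in combinations(range(n_vec), 2):
    d_full = np.linalg.norm(X[i] - X[j])
    d_proj = np.linalg.norm(Y[i] - Y[j])
    rel_errors.append((d_proj - d_full) / d_full)

# Pairwise distances are preserved up to a relative error of order 1/sqrt(2k).
print("std of relative error:", np.std(rel_errors))
```

The spread of the relative errors shrinks as the target dimension $k$ grows, matching the qualitative behavior seen in Fig. 4.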


Fig. 4. Dimensionality reduction via the Johnson-Lindenstrauss lemma. 100 input vectors are projected from the maximal input dimension of our system ($\approx 1.0 \times 10^6$) to a 1000-dimensional space. We have done this numerically and on the experimental setup. In the former, the random matrix has normally distributed i.i.d. elements. Left: The scatter-plot shows that pairwise distances are conserved. Right: A histogram showing the relative errors induced by the compression. Our optical setup performs as well as the numerical projection: In this data set the standard deviation of the simulation is 0.21, and that of the experiment is 0.22.


So far, we have used input vectors in the Hadamard basis in order to remove the bias of the transform $L$. This makes intuitive sense since the projection of each vector $h_i$ involves the difference of two projections, and any constant bias is thus removed by this subtraction. However, this implies that $4N+1$ images need to be taken for $N$ random projections. We have also developed a different way to obtain similar results that requires only $2N+2$ measurements, thus speeding up the process: We first note that any affine transformation $\alpha (x)$ can be written as a linear transform plus a constant offset, $\alpha (x) = Fx + C$, and that two linear transforms applied in series result in another linear transform. The idea is then to prepare the input data with an affine transform before applying $L$: $x\rightarrow L(\alpha (x))$. The total bias $L(\alpha (0)) = L(C)$ can then be subtracted at the end. Constrained by the binary nature of our DMD, we choose the XOR operation with a constant vector $A$ as our affine operation: $\alpha (x) = x \oplus A$. With this modification, our method reads ($\lnot$ denotes the element-wise binary NOT operator):

$$\begin{aligned}K(x) := L(A \oplus x) - L(A \oplus 0) \end{aligned}$$
$$\begin{aligned} = \frac{|M(x \oplus A)|^2 - |M(x \oplus \lnot A)|^2 + |M(\lnot A)|^2 - |M(A)|^2}{2|M(x_a)|} \end{aligned}$$

We see in the numerator of Eq. (5) the differences of two pairs of corresponding terms. Following the same logic as above, any constant bias is therefore removed. We have repeated all the tests in this paper using $K$ and obtained similar results.
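The XOR-based scheme can again be verified against a synthetic Gaussian $M$. The sketch below adopts the sign convention $K(x) = L(x \oplus A) - L(A)$ and checks linearity directly (two new intensity images per input vector; the constant images and the anchor intensity are acquired once):

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 128, 512
M = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)

x_a = np.ones(n, dtype=int)            # all-ones anchor
A = rng.integers(0, 2, n)              # fixed binary vector defining the XOR

def I(v):
    """Simulated camera frame: the intensity |Mv|^2."""
    return np.abs(M @ v) ** 2

# Constant terms, measured once (together with the anchor image).
I_A, I_notA = I(A), I(1 - A)
denom = 2.0 * np.abs(M @ x_a)

def K(x):
    """Only two new intensity measurements per binary input vector x."""
    return (I(x ^ A) - I(x ^ (1 - A)) + I_notA - I_A) / denom

# K is a bias-free linear transform: for disjoint binary x1, x2 we have
# x1 ^ x2 = x1 + x2 and hence K(x1 + x2) = K(x1) + K(x2).
x1 = np.zeros(n, dtype=int); x1[: n // 2] = 1
x2 = np.zeros(n, dtype=int); x2[n // 2 :] = 1
assert np.allclose(K(x1 ^ x2), K(x1) + K(x2))
assert np.allclose(K(np.zeros(n, dtype=int)), 0.0)
```

Because $A \oplus x = A + (1 - 2A) \odot x$ is affine in $x$, subtracting the constant $L(A)$ leaves $K(x) = L((1 - 2A) \odot x)$, a genuinely linear (and bias-free) transform of $x$.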

4. Discussion

We have presented a novel method that allows us to perform linear random projections without any holographic techniques. This simplifies the experimental setup and increases its stability, since interference with a reference beam is no longer necessary. In order to obtain the cleanest random matrix, we were restricted to using a subset of the pixels of our camera and needed to create macro-pixels on the DMD. Consequently, the size of the accessible random matrix is reduced. However, we have shown with a demonstration of the Johnson-Lindenstrauss lemma that for some RNLA algorithms a "perfect" random matrix is not necessary, and the system becomes usable at its maximum dimensions. We also want to note that the optical setup could be improved to increase the accessible dimensionality while retaining the desirable mathematical properties of the random matrix: output correlations can be minimized, and the input and output illuminations can be made more homogeneous. A larger bit depth of the camera might further improve the precision of the optical calculation, provided the signal-to-noise ratio of the least significant bits is sufficiently high. Input and output devices with large pixel counts are now available as off-the-shelf components: the manufacturers of the SLM and camera used in our setup, for example, offer the TI DLP801XE with $\sim 0.92\times 10^7$ pixels and the Basler boA9344-70cm with $\sim 6.5\times 10^7$ pixels, opening the possibility for matrices with on the order of $10^{14}$ elements. Such a matrix would occupy on the order of 100 TB in single precision when implemented numerically, which is about three orders of magnitude more than the maximum memory available on state-of-the-art NVIDIA H100 and A100 GPUs, showing the enormous potential of optics in such very high-dimensional settings.

Implementing our method with any spatial light modulator that acts on the amplitude of light is straightforward, and we hope that it can therefore find immediate application in laboratories that utilize similar experimental configurations. Although we have only developed this method for random projections, we would like to note that it may also find applications for other linear optical transforms. Consider, for example, the optical Fourier transform of a point source centered on the optical axis. The result is an output field with a constant phase. Using the same formalism and this point source as the anchor vector $x_a$, it follows that up to a constant phase factor $\tilde {M} = M$, and $L$ is the real part of the optical Fourier transform. In general, there may be systems where the liberty in choosing the anchor vector $x_a$, perhaps in combination with gray-scale amplitude SLMs, or SLMs that can control both amplitude and phase, could open up further applications. Finally, this method may be useful in setups where the optical transmission matrix is learned, modified, or designed to achieve a certain purpose, as for example in [16–19]. When incorporating our method from the beginning, the system could directly be optimized such that $L$ yields the desired linear transform.

Funding

H2020 Future and Emerging Technologies (899794); European Research Council (SMARTIES-724473).

Acknowledgments

We acknowledge support from EU Horizon 2020 FET-OPEN OPTOLogic. S.G. acknowledges funding from the European Research Council ERC Consolidator Grant (SMARTIES-724473).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. I. Yamaguchi and T. Zhang, “Phase-shifting digital holography,” Opt. Lett. 22(16), 1268–1270 (1997). [CrossRef]  

2. E. Cuche, F. Bevilacqua, and C. Depeursinge, “Digital holography for quantitative phase-contrast imaging,” Opt. Lett. 24(5), 291–293 (1999). [CrossRef]  

3. W. B. Johnson and J. Lindenstrauss, “Extensions of Lipschitz mappings into a Hilbert space,” Contemp. Math. 26, 189–206 (1984). [CrossRef]

4. P. Drineas and M. W. Mahoney, “RandNLA: Randomized numerical linear algebra,” Commun. ACM 59(6), 80–90 (2016). [CrossRef]

5. P.-G. Martinsson and J. A. Tropp, “Randomized numerical linear algebra: Foundations and algorithms,” Acta Numerica 29, 403–572 (2020). [CrossRef]  

6. C. Brossollet, A. Cappelli, I. Carron, C. Chaintoutis, A. Chatelain, L. Daudet, S. Gigan, D. Hesslow, F. Krzakala, J. Launay, S. Mokaadi, F. Moreau, K. Müller, R. Ohana, G. Pariente, I. Poli, and E. Tommasone, “LightOn optical processing unit: Scaling-up AI and HPC with a non von Neumann co-processor,” arXiv, arXiv:2107.11814 (2021). [CrossRef]

7. A. Saade, F. Caltagirone, I. Carron, L. Daudet, A. Drémeau, S. Gigan, and F. Krzakala, “Random projections through multiple optical scattering: Approximating kernels at the speed of light,” in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), (IEEE, 2016), pp. 6215–6219.

8. R. Ohana, J. Wacker, J. Dong, S. Marmin, F. Krzakala, M. Filippone, and L. Daudet, “Kernel computations from large-scale random features obtained by optical processing units,” in ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), (IEEE, 2020), pp. 9294–9298.

9. A. Cappelli, J. Launay, L. Meunier, R. Ohana, and I. Poli, “ROPUST: Improving robustness through fine-tuning with photonic processors and synthetic gradients,” arXiv, arXiv:2108.04217 (2021). [CrossRef]

10. A. Cappelli, R. Ohana, J. Launay, L. Meunier, I. Poli, and F. Krzakala, “Adversarial robustness by design through analog computing and synthetic gradients,” in ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), (IEEE, 2022), pp. 3493–3497.

11. R. Ohana, H. Medina, J. Launay, A. Cappelli, I. Poli, L. Ralaivola, and A. Rakotomamonjy, “Photonic differential privacy with direct feedback alignment,” Advances in Neural Information Processing Systems 34, 22010–22020 (2021).

12. M. Rafayelyan, J. Dong, Y. Tan, F. Krzakala, and S. Gigan, “Large-scale optical reservoir computing for spatiotemporal chaotic systems prediction,” Phys. Rev. X 10(4), 041037 (2020). [CrossRef]  

13. D. Hesslow, A. Cappelli, I. Carron, L. Daudet, R. Lafargue, K. Müller, R. Ohana, G. Pariente, and I. Poli, “Photonic co-processors in HPC: Using LightOn OPUs for randomized numerical linear algebra,” arXiv, arXiv:2104.14429 (2021). [CrossRef]

14. S. M. Popoff, G. Lerosey, R. Carminati, M. Fink, A. C. Boccara, and S. Gigan, “Measuring the transmission matrix in optics: an approach to the study and control of light propagation in disordered media,” Phys. Rev. Lett. 104(10), 100601 (2010). [CrossRef]  

15. V. A. Marchenko and L. A. Pastur, “Distribution of eigenvalues for some sets of random matrices,” Matematicheskii Sbornik 114, 507–536 (1967). [CrossRef]  

16. C. M. Valensise, I. Grecco, D. Pierangeli, and C. Conti, “Large-scale photonic natural language processing,” Photonics Res. 10(12), 2846–2853 (2022). [CrossRef]  

17. M. W. Matthès, P. Del Hougne, J. De Rosny, G. Lerosey, and S. M. Popoff, “Optical complex media as universal reconfigurable linear operators,” Optica 6(4), 465–472 (2019). [CrossRef]  

18. G. Jacucci, L. Delloye, D. Pierangeli, M. Rafayelyan, C. Conti, and S. Gigan, “Tunable spin-glass optical simulator based on multiple light scattering,” Phys. Rev. A 105(3), 033502 (2022). [CrossRef]  

19. M. G. Anderson, S.-Y. Ma, T. Wang, L. G. Wright, and P. L. McMahon, “Optical transformers,” arXiv, arXiv:2302.10360 (2023). [CrossRef]  

