Optica Publishing Group

Programmable matrix operation with reconfigurable time-wavelength plane manipulation and dispersed time delay

Open Access

Abstract

We propose a novel optical computing architecture for massively parallel matrix manipulation based on reconfigurable time-wavelength plane manipulation and dispersed time delay. Two linear weighting methods, in either the wavelength or the time domain, are proposed and validated. We perform the autocorrelation of a 7-bit m-sequence at a speed of 1.18×10¹¹ multiplications and accumulations per second (MACs/s) and a multiplication of a 4 × 4 matrix and a 4 × 1 vector at 2.69×10⁹ MACs/s. Edge extraction of 32 × 32 binary images is also realized in simulation by optical 2D convolution at 5×10⁸ MACs/s. The proposed optical computing unit can be a key building block for processing complex computing tasks with advanced deep learning algorithms and is promising for future photonic neural network circuits.

© 2019 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Artificial neural network (ANN) enabled machine learning technologies [1] have been widely applied in many areas, including computer vision, speech recognition and natural language processing [2–4]. However, ANNs based on statistical learning theory (SLT) involve a large amount of matrix manipulation, which poses rigorous requirements on current computing hardware as data capacity and network depth increase, due to the bandwidth limitation and power inefficiency of electronics. Optical computing based on photonic signal processing [5–7] is a promising technology to break the physical bottlenecks of its electrical counterparts [8], attributed to the intrinsic parallelism and low power consumption of high-speed optical operation [9–12]. Significant efforts have been made in exploring matrix manipulation by optics. The famous Stanford multiplier [13] was initially proposed using free-space optics, composed of arrays of high-speed LEDs and avalanche photodiodes, to perform a 32-point DFT at 3×10⁹ complex samples per second. However, this delicate architecture is limited by its bulky volume and fixed computation scale. Alternative solutions aiming at fully optical matrix multiplication in integrated architectures have been proposed by leveraging the versatile functions of optical components. A semiconductor laser [14] has been utilized to perform optical vector and matrix operations at 5×10⁸ operations/s, and microring modulator arrays have been exploited for matrix-vector multiplication [15] at 8×10⁷ MACs/s and for photonic neuromorphic computing [16] with up to TeraMACs/s processing capability. The integrated Mach-Zehnder interferometer (MZI) array [17] is another popular solution for monolithic photonic neural network circuits [18], based on the theory of unitary matrix transformation enabled by an MZI mesh [17, 19], with a potential operation speed of TeraFLOPs/s.
However, the limitation of the aforementioned schemes lies mainly in their poor scalability and flexibility: the computation scale is fixed once these optical devices are fabricated, leaving little possibility for the devices to adapt to a different order of matrix multiplication.

In this work, we address these issues by proposing a novel matrix manipulation scheme with a simple hardware architecture that establishes a mapping between mathematical operation and optical configuration [20]. Linear weighting, the cornerstone of matrix multiplication, is performed in both the wavelength and time domains in the proposed architecture. As a proof of concept, we perform the autocorrelation of a 7-bit m-sequence by both simulation and experiment at 1.18×10¹¹ MACs/s in the wavelength domain. A fourth-order matrix-vector multiplication at 2.69×10⁹ MACs/s by experiment and the edge extraction of 32 × 32 binary images at 5×10⁸ MACs/s by simulation are also realized in the time domain. Furthermore, the properties that affect the system scalability are discussed, and the potential of this architecture as a building block for complex matrix-involved computing tasks is carefully considered.

Fig. 1 Principle of the proposed linear weighting operation architecture with broadband modulation and dispersion effect. ΔT: tunable width of time gate (here ΔT = T/N), PD: photodetector.

2. Proof-of-concept of optical matrix manipulation

2.1. Principle of optical matrix manipulation

Linear weighting is one of the most basic operations in matrix manipulation, which can be expressed as follows:

$$y = \sum_{i=1}^{N} w_i x_i = w_1 x_1 + w_2 x_2 + \dots + w_N x_N = W X^T \tag{1}$$
where $W = [w_1, w_2, \dots, w_N]$ and $X^T = [x_1, x_2, \dots, x_N]$ are the weight coefficient basis and the input vector, respectively. The whole mathematical operation can be mapped into a plane of time and wavelength in optics, as shown in Fig. 1. An unweighted input vector X with N elements is modulated on top of N channels of parallel wavelengths simultaneously via a broadband modulator (e.g. a Mach-Zehnder modulator, MZM) with data duration T. The raw input elements are then weighted by a weight bank with N coefficients, performing the multiplications between the elements of the input vector and the weight basis. A dispersive medium is designed precisely to delay the weighted multi-wavelength light by the period of one element (i.e. T/N), stretching the time-wavelength plane into a parallelogram shape and overlapping the light of different wavelengths in the time dimension. A time gate (which can be realized by generating an optical square-pulse wave via an MZM) with tunable width samples the overlapped wavelengths at a certain time slot in order to obtain a weighted vector. The delayed and sampled channels are then received by a photodetector (PD), which retrieves only the intensity information of the signal at the arrival time and thus acts as a natural adder in the optical field, summing the elements of the weighted vector.
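The mapping above can be sketched numerically. The following minimal simulation (with assumed 4-element vectors, not values from the paper) places a weighted copy of the input vector on each wavelength channel, skews the channels by one slot each to mimic the dispersive delay, and sums them as a photodetector would; the weight bank is loaded in reverse order so that the skewed plane aligns $w_i$ with $x_i$, consistent with the time reversal noted in Section 2.2.

```python
import numpy as np

# Minimal sketch (assumed values, not the paper's data) of the
# weighting principle: N wavelength channels each carry the input
# vector x over duration T; a weight bank scales channel i; a
# dispersive delay of T/N per channel skews the time-wavelength
# plane; the photodetector sums channel intensities.
N = 4
x = np.array([1.0, 0.5, 0.25, 0.75])   # input vector (assumed)
w = np.array([0.2, 0.4, 0.6, 0.8])     # weight basis (assumed)
w_bank = w[::-1]                       # load weights time-reversed so
                                       # the overlap aligns w_i with x_i

slots = 2 * N - 1                      # timeline after stretching
plane = np.zeros((N, slots))
for i in range(N):
    # channel i: weighted copy of x, delayed by i slots of width T/N
    plane[i, i:i + N] = w_bank[i] * x
pd_output = plane.sum(axis=0)          # PD sums all wavelengths

# the fully overlapped slot (index N-1) yields y = W X^T
assert np.isclose(pd_output[N - 1], w @ x)
```

Sampling the PD output away from the fully overlapped slot yields partial sums, which is exactly what the convolution mapping of Section 2.2 exploits.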

The degrees of freedom of the weighting scheme include the wavelength and time dimensions, and the intrinsic parallelism of optics not only provides flexibility in the order of matrix multiplication, but also allows the architecture to scale up for massive high-dimensional parallel computation. We therefore propose two approaches for the weight bank design, as described in the following sections.

2.2. Wavelength domain weighting

The concept of wavelength domain weighting (WDW) can be interpreted as a wavelength filter in which light of each wavelength is modulated in intensity individually. The illustration in Fig. 2 shows the weight bank of the WDW method in a time-wavelength plane, in which each wavelength carries a weight (the weight coefficients differ among the wavelength channels (columns) but remain the same across time slots (rows)). A wavelength selective switch (WSS) or a wavelength selective modulator can be exploited to achieve the weighting in the wavelength domain. Based on this theory, the autocorrelation function of a 7-bit m-sequence, a type of pseudo-random binary sequence (PRBS) generated by a linear feedback shift register, is realized and the rigid wavelength weighting is performed.

Fig. 2 Weight bank design in wavelength domain.

The autocorrelation function is defined as follows:

$$y(n) = \sum_{m=-\infty}^{\infty} x(m)\,x(m-n) = x(n) * x(-n) \tag{2}$$
where x(n) is a discrete sequence. For computational convenience, we can let $w(n) = x(-n)$ so that the autocorrelation of a sequence can be seen as the discrete convolution expressed in Eq. (3):
$$y(n) = x(n) * w(n) = \sum_{m=0}^{L-1} x(m)\,w(n-m) = x(0)w(n) + x(1)w(n-1) + \dots + x(L-1)w(n-L+1) \tag{3}$$

Here the lengths of x(n) and w(n) are both N. For convenience of expression, let $N_x$ be the length of x(n) and $N_w$ the length of w(n), so that $N_x = N_w = N$ and $0 \le n < N$. Therefore the length of y(n) is $L = N_x + N_w - 1$. To facilitate the mapping from the mathematical expression to the time-wavelength plane, Eq. (3) can be rewritten in matrix multiplication form as follows:

$$Y = xW = [\,x(0)\;\; x(1)\;\; \dots\;\; x(N_x-1)\,]
\begin{bmatrix}
w(0) & w(1) & \cdots & w(L-2) & w(L-1) \\
0 & w(0) & \cdots & w(L-3) & w(L-2) \\
0 & 0 & \cdots & w(L-4) & w(L-3) \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & w(L-N_x+1) & w(L-N_x)
\end{bmatrix} \tag{4}$$
x is the input vector and W is the Toeplitz matrix in which $w(n) = 0$ for $N_w \le n < L$. $Y = [\,Y_0\;\; Y_1\;\; \dots\;\; Y_{L-1}\,]$ is the result vector of length L. The convolution result can therefore be expressed in a vertical form as shown in Fig. 3(a), which resembles the time-wavelength plane after the dispersion delay in Fig. 1, with the columns representing the elements of Y. The plane before the dispersive medium can thus be obtained as in Fig. 3(b), and the weight bank can consequently be calculated as in Fig. 3(c) according to Section 2.1. Here, the weight coefficients w(i) are the same across time slots but differ among wavelength channels, which fits the WDW method naturally. Also note that in this case the m-sequence is binary, which makes the weighted input vector on each wavelength either zero when w(i) = 0 or the input itself when w(i) = 1. Accordingly, the WDW here can be seen as switching light on or off at different wavelengths (the time-wavelength plane only contains wavelengths whose weights equal one) rather than manipulating the power of particular wavelengths, and therefore a light source with unequally spaced wavelengths should be generated.
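The Toeplitz mapping of Eq. (4) can be checked with a short sketch (plain NumPy, using the paper's 7-bit m-sequence): W is built row by row per Eq. (4), and xW is verified to reproduce both the discrete convolution of Eq. (3) and the autocorrelation of Eq. (2).

```python
import numpy as np

# Sketch: the 7-bit m-sequence autocorrelation as the matrix product
# Y = xW of Eq. (4), with W the shifted (Toeplitz-style) weight matrix.
x = np.array([1, 0, 0, 1, 1, 1, 0])    # the paper's m-sequence
w = x[::-1]                            # w(n) = x(-n), time-reversed
Nx = len(x)
L = 2 * Nx - 1                         # output length N_x + N_w - 1

# row i of W is w shifted right by i positions, zero-padded (Eq. 4)
W = np.zeros((Nx, L), dtype=int)
for i in range(Nx):
    W[i, i:i + Nx] = w

Y = x @ W                              # matrix-vector form of Eq. (3)
assert np.array_equal(Y, np.convolve(x, w))
assert np.array_equal(Y, np.correlate(x, x, mode='full'))
```

The central element of Y is the zero-lag autocorrelation peak, equal to the number of ones in the sequence (4 for this m-sequence).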

Fig. 3 (a) The mathematics-optics mapping of the result of the convolution. (b) Time-wavelength plane before a dispersive medium. (c) The weight bank of the convolution. (d) Unequal-spacing wavelengths can be generated according to the distribution of zeros and ones in weight bank.

Fig. 4 (a) The experiment setup of optical matrix manipulation using WDW method. OFC: optical frequency comb, EDFA: erbium-doped fiber amplifier, PPG: pulse pattern generator, PC: polarization controller, MZM: Mach-Zehnder modulator, DCF: dispersion compensation fiber, PD: photodetector, OSC: oscilloscope. (b) OFC source with wavelength spacing of 18GHz. (c) Unequal spacing wavelength distribution after waveshaper and EDFA. (d) Simulation and experiment results of the autocorrelation of a 7-bit m-sequence.

The experiment setup is presented in Fig. 4(a). A pulse pattern generator (PPG) sends a 13-bit sequence periodically at a data rate of 16.875 Gb/s, where the first 6 bits are zeros used as a guard time slot to prevent interference between computing periods, and the 7-bit x(n) is [ 1 0 0 1 1 1 0 ], which is the m-sequence. The weight vector w(n) is therefore [ 0 1 1 1 0 0 1 ] according to Eq. (2), which indicates that the weight in autocorrelation is the time-reversed input vector. An optical frequency comb (OFC) with a spacing of 18 GHz is generated as a wavelength division multiplexing (WDM) source, as shown in Fig. 4(b), and a waveshaper is exploited as a WSS for tuning the tooth number and wavelength spacing based on the aforementioned theory. The OFC is modulated simultaneously by the input vector x(n) via an MZM, and its time-reversed form w(n) is realized by using the waveshaper as a weight bank in the WDW method. Here, unequal spacings of 36 GHz and 108 GHz are realized as shown in Fig. 4(c), with an extinction ratio (ER) around 32 dB, which exactly fits the distribution of zeros and ones of the weight vector. A dispersion compensation fiber (DCF) with a dispersion coefficient around -160 ps/nm/km is used as the dispersive medium for the dispersion delay, which determines the fiber length to be 1.29 km according to Eq. (5):

$$\Delta\tau = \frac{1}{f} = D\,\Delta\lambda\, l \tag{5}$$
where Δτ is the delayed bit width, f is the modulation speed, D is the dispersion coefficient, Δλ is the wavelength spacing and l is the length of the DCF. The periodic output of the autocorrelation pulse is presented in Fig. 4(d) with a period of 770 ps, showing excellent agreement between simulation and experiment. The computation speed is calculated as 1.18×10¹¹ MACs/s.
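Eq. (5) fixes the fiber length once the modulation speed, dispersion coefficient and channel spacing are chosen. A back-of-the-envelope sketch (assuming the 36 GHz base spacing of Fig. 4(c) sets the unit delay, and converting frequency spacing to wavelength spacing at 1550 nm):

```python
# Sketch of the Eq. (5) design rule: the per-channel delay must equal
# one bit width, dtau = 1/f = D*dlam*l, which fixes the DCF length l.
# Assumption: the 36 GHz base spacing sets the unit delay.
C = 3e8                       # speed of light, m/s
lam = 1550e-9                 # carrier wavelength, m
f = 16.875e9                  # modulation speed, b/s
D = 160e-12 / (1e-9 * 1e3)    # |D| = 160 ps/nm/km, in s/m^2
df = 36e9                     # base channel spacing, Hz

dlam = lam**2 * df / C        # wavelength spacing, ~0.29 nm
l = (1 / f) / (D * dlam)      # required DCF length, metres
print(f"DCF length: {l / 1e3:.2f} km")   # close to the 1.29 km quoted
```

With the nominal D = -160 ps/nm/km this lands within a few metres of the 1.29 km in the text; the experimental value absorbs the "around" in the stated dispersion coefficient.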

2.3. Time domain weighting

The schematic of time domain weighting (TDW) is presented in Fig. 5. If we regard the weight bank as a matrix, the distribution of weight coefficients in the TDW method in the time-wavelength plane is the transpose of that in the WDW method. Here, each time slot carries a weight (the weights differ among the time slots (rows) but remain the same across wavelength channels (columns)). A second MZM can be cascaded to modulate all wavelengths simultaneously after the first-stage broadband modulation of the input vector. We calculate a multiplication of a 4 × 4 matrix and a 4 × 1 vector experimentally, and simulate a 2D convolution for edge extraction of 32 × 32 binary images, based on the proposed TDW method.

Fig. 5 Weight bank design in time domain.

2.3.1. Matrix-vector multiplication

The fourth-order matrix-vector multiplication can be expressed as Eq. (6):

$$y^T = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix} =
\begin{bmatrix}
h_{11} & h_{12} & h_{13} & h_{14} \\
h_{21} & h_{22} & h_{23} & h_{24} \\
h_{31} & h_{32} & h_{33} & h_{34} \\
h_{41} & h_{42} & h_{43} & h_{44}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = H x^T \tag{6}$$
where $x^T$ and $y^T$ are the input and output vectors respectively, and H is the weight matrix. Eq. (6) can be decomposed into four linear weighted additions as follows:
$$\begin{aligned}
y_1 &= h_{11}x_1 + h_{12}x_2 + h_{13}x_3 + h_{14}x_4 \\
y_2 &= h_{21}x_1 + h_{22}x_2 + h_{23}x_3 + h_{24}x_4 \\
y_3 &= h_{31}x_1 + h_{32}x_2 + h_{33}x_3 + h_{34}x_4 \\
y_4 &= h_{41}x_1 + h_{42}x_2 + h_{43}x_3 + h_{44}x_4
\end{aligned} \tag{7}$$

Accordingly, the fourth-order matrix-vector multiplication can be simplified to four linear weighted additions, and each weighted addition can be implemented by the TDW method. In order to improve scalability, we solve these four weighted additions on one single weighting system by utilizing time division multiplexing (TDM), where each slice computes one element of the result vector. Here, the whole operation time is cut into four slices, and each slice contains four wavelength channels with duration T. The input vector x modulates the time slices simultaneously at a data rate of N/T as shown in Fig. 6(a), where N = 4. The weight matrix H is decomposed into four row vectors $H_1$, $H_2$, $H_3$ and $H_4$, where $H_i = [\,h_{i1}\;\; h_{i2}\;\; h_{i3}\;\; h_{i4}\,]$, i = 1, 2, 3, 4. Each row vector then modulates its time slice again to perform the multiplication, as shown in Fig. 6(b). Hereto, the weight matrix is mapped into optics piece by piece via TDM, each piece containing one row of the matrix, and weighting is consequently performed via TDW. The dispersive medium then delays the wavelengths by T/N in time, and a PD sums the delayed channels to perform the accumulation. One can sample the result in the middle of each time slice to obtain the results of the four weighted additions as shown in Fig. 6(c), which are exactly the four elements of the output vector y. Note that the duration of each time slice will extend to $2T - T/N$ after dispersion, therefore guard time is necessary to avoid conflict between slices. Here, the guard time between slices is set to T in order to keep a one-bit-width gap between slices after stretching.
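The TDM-assisted scheme of Fig. 6 can be sketched end to end (with an assumed H and x, not the experimental data): each slice carries x modulated by one row of H, each wavelength is delayed by one extra bit slot, and the PD sum sampled at the fully overlapped slot of each slice returns one element of y = Hx.

```python
import numpy as np

# Sketch (assumed values) of the TDM-assisted TDW scheme of Fig. 6:
# slice i carries x weighted by row H_i (cascaded MZM1 x MZM2);
# dispersion delays wavelength k by k bit slots; the PD sum sampled
# at the fully overlapped slot recovers one element of y = Hx.
N = 4
H = np.array([[1, 0, 1, 1],
              [0, 1, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 1]], dtype=float)   # assumed weight matrix
x = np.array([1.0, 0.5, 0.25, 0.75])        # assumed input vector

guard = N                        # guard bits between slices (= T here)
slice_len = N + guard
pd = np.zeros(N * slice_len + N)             # PD output timeline
for i in range(N):                           # slice i weights by row i
    slot0 = i * slice_len
    weighted = x * H[i]                      # two cascaded modulations
    for k in range(N):                       # wavelength k: +k bit delay
        pd[slot0 + k:slot0 + k + N] += weighted

# sample the fully overlapped slot (index N-1) of each slice
y = np.array([pd[i * slice_len + N - 1] for i in range(N)])
assert np.allclose(y, H @ x)
```

The guard of N bits keeps adjacent stretched slices from overlapping, mirroring the one-bit-width gap described above.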

Fig. 6 Principle of TDM assisted TDW method for fourth-order matrix-vector multiplication. Here, N = 4. (a) Broadband modulation of the input vector in four time slices. (b) Weighting by each row vector of the matrix H in each time slice. (c) Computation result can be sampled at the middle of each slice.

Fig. 7 Experiment demonstration of fourth-order matrix-vector multiplication. (a) Experiment setup. MUX: multiplexer, PC: polarization controller, MZM: Mach-Zehnder modulator, AWG: arbitrary waveform generator, EA: electrical amplifier, EDFA: erbium-doped fiber amplifier, DCF: dispersion compensation fiber, PD: photodetector, OSC: oscilloscope. (b) The optical waveform of x and H. (c) Dispersive time delay of the modulated wavelengths by dispersion effect of the well-designed DCF. (d) Computation result of the multiplication. Elements of the output vector can be sampled at the middle of each time slice.

Fig. 7(a) shows the experiment setup of the fourth-order matrix-vector multiplication. Four channels are multiplexed as the WDM source, with wavelengths of 1550.12 nm, 1553.12 nm, 1556.12 nm and 1559.12 nm respectively. An arbitrary waveform generator (AWG) modulates x via MZM1 and H via MZM2 periodically at a data rate of 5.37 Gb/s, performing the multiplication through the cascaded structure. The waveforms of x and H are presented in Fig. 7(b), in which the guard time is realized by 4 zero bits. A 455 m DCF with a dispersion coefficient D around -136 ps/nm/km is exploited to delay the channels by around 186 ps, which is exactly the width of one bit, as illustrated in Fig. 7(c). The relationship among dispersion coefficient, wavelength spacing, DCF length and modulation speed also satisfies Eq. (5). A PD with a 3-dB bandwidth of 25 GHz calculates the accumulation, and the elements of the output vector appear at the middle of each slice. The multiplication results are shown in Fig. 7(d). The spacing between two computing slices is around 1.49 ns. The computation speed is calculated as 1.08×10¹⁰ MACs/s for each vector-vector weighted addition, and 2.69×10⁹ MACs/s for the whole matrix multiplication.

2.3.2. The edge extraction enabled by optical 2D convolution process

Convolutional neural networks (CNNs) [21] are a kind of ANN that has been impressively employed in computer vision tasks such as image classification and recognition [22, 23]. However, performing the convolution layers in CNNs, where input images convolve with filter kernels as shown in Fig. 8, involves a dramatic increase in parameters and operations. The whole operation can be expressed as the 2D convolution in Eq. (8):

$$S(u,v) = (I * K)(u,v) = \sum_m \sum_n I(m,n)\,K(u-m,\,v-n) \tag{8}$$
where I is the input matrix (or image), K is the kernel function and S is the convolution result. Assuming a square input matrix of order M, a kernel of order F, a convolution stride t and a padding number P, the order of the result matrix is ((M−F+2P)/t)+1. Here, we set the stride t and the padding number P to 1 and the kernel size F to 3 so that the result matrix has the same order as the input matrix. Because of the commutativity of convolution, Eq. (8) is equivalent to Eq. (9):
$$S(u,v) = (I * K)(u,v) = \sum_m \sum_n I(u-m,\,v-n)\,K(m,n) \tag{9}$$

Therefore, the 2D convolution can be considered as bunches of weighted additions over time-shifted image samples I(u−m, v−n), and the TDM-assisted TDW method can consequently be employed based on the theory in Section 2.3.1.
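The "dimensional reduction" view can be sketched with plain NumPy (an assumed 8 × 8 random binary image rather than the 32 × 32 inputs of the simulation, with the standard 3 × 3 Laplacian as kernel): each output pixel is a 9-term weighted addition, a dot product between a flattened image patch and the flattened (flipped) kernel, and this matches a direct nested-sum evaluation of the convolution.

```python
import numpy as np

# Sketch of the dimensional-reduction step: with stride t = 1,
# padding P = 1 and kernel order F = 3, every output pixel is a
# 9-term weighted addition -- one term per wavelength channel.
M, F, P = 8, 3, 1                      # small image for illustration
rng = np.random.default_rng(0)
img = rng.integers(0, 2, (M, M))       # assumed binary image
K = np.array([[0, 1, 0],
              [1, -4, 1],
              [0, 1, 0]])              # 3x3 Laplacian kernel

out = (M - F + 2 * P) + 1              # output order formula, = M here
pad = np.pad(img, P)
kvec = K[::-1, ::-1].ravel()           # flip kernel for convolution

# weighted-addition form (Eq. 9): one dot product per output pixel
S = np.empty((out, out))
for u in range(out):
    for v in range(out):
        S[u, v] = pad[u:u + F, v:v + F].ravel() @ kvec

# reference: direct nested-sum convolution over the padded image
ref = np.zeros((out, out))
for u in range(out):
    for v in range(out):
        for m in range(F):
            for n in range(F):
                ref[u, v] += pad[u + F - 1 - m, v + F - 1 - n] * K[m, n]
assert np.allclose(S, ref)
```

The 9 entries of `kvec` correspond to the 9 wavelength channels of the simulation below; the sliding patch supplies the time-shifted weights.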

Fig. 8 Principle of 2D convolution process.

Fig. 9 Simulation demonstration of 2D convolution. (a) Simulation setup of edge extraction of binary images. IMG: input images, WDM: wavelength division multiplexing, MZM: Mach-Zehnder modulator, DCF: dispersion compensation fiber, PD: photodetector. (b) 32×32 binary images. (c) Extracted edges of the binary images based on proposed method. (d) Sampling errors with DCF length varies. Red dash line: the limiting error of a successful edge extraction.

Here, the optical 2D convolution process is simulated in Lumerical INTERCONNECT as shown in Fig. 9(a). We use 32 × 32 binary images as the input and a 3 × 3 Laplacian operator as the convolution kernel for edge extraction. The kernel size determines the number of wavelength channels selected by a wavelength filter; we therefore generate 9 wavelengths in the 1550 nm band with a spacing of 100 GHz. Dimensional reduction turns the 2D images and the kernel into 1D data streams so that the whole convolution operation can be simplified to linear weighted additions. Here, we use 5 bits to code each kernel pixel for computational convenience, where the first bit is a sign bit representing the polarity (positive or negative) and the remaining bits are the binary form of the pixel's absolute value. The 1D data streams of the input image and the kernel are modulated by two cascaded MZMs. Note that a pixel of the input binary image is expressed by one binary bit while a kernel pixel is expressed in 5 binary bits, therefore the modulation speeds of the two data streams must be matched carefully: the modulation speed of the kernel data should be 5 times that of the input image data. Here, we set the modulation speed to 5 Gb/s for the kernel data and 1 Gb/s for the input image data. The desired DCF length can be calculated by Eq. (5) as 7.8125 km. Signal determination based on an optimized threshold is applied after sampling and quantification at the output of the PD in order to turn the results into binary. The data stream is decoded and finally processed back, by dimensional recovery, into a matrix whose elements represent the pixels of the image after convolution. The binary images and extracted edges are shown in Figs. 9(b) and 9(c). The computation speed is calculated as 5.12×10¹¹ MACs/s per weighted addition and 5×10⁸ MACs/s for the whole convolution operation.
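The 5-bit sign-magnitude coding of the kernel pixels can be sketched as follows (a minimal illustration; `encode_pixel`/`decode_pixel` are hypothetical helper names, and the 4-bit magnitude assumes kernel values no larger than 15): the flattened Laplacian becomes a 45-bit stream, which is why the kernel stream runs 5× faster than the 1-bit-per-pixel image stream.

```python
# Sketch of the 5-bit sign-magnitude coding used for kernel pixels:
# bit 0 is the sign (1 = negative), bits 1-4 the binary magnitude.
# Helper names are hypothetical, not from the paper.
def encode_pixel(p):
    sign = 1 if p < 0 else 0
    mag = format(abs(p), '04b')            # 4-bit binary magnitude
    return [sign] + [int(b) for b in mag]

def decode_pixel(bits):
    mag = int(''.join(str(b) for b in bits[1:]), 2)
    return -mag if bits[0] else mag

laplacian = [0, 1, 0, 1, -4, 1, 0, 1, 0]   # 3x3 kernel, flattened
stream = [b for p in laplacian for b in encode_pixel(p)]
assert len(stream) == 45                   # 9 pixels x 5 bits each
assert [decode_pixel(stream[i:i + 5])
        for i in range(0, 45, 5)] == laplacian
```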
By changing the DCF length l, different sampling values can be obtained under the same sampling rate. By calculating the mean square error between the sampling values when l satisfies Eq. (5) and those when it does not, we obtain a sampling error that evaluates the tolerance of the system, as shown in Fig. 9(d). An extraction is considered successful only when the edge of the output image is clearly visible, as the red dashed line in Fig. 9(d) indicates. Here, the edge can be extracted clearly when l varies from 7.7375 km to 7.8625 km, but extraction fails when l is out of this range, where an unmatched time delay is introduced during the alignment between the input image patches and the kernel. Moreover, the variation of the DCF length is equivalent to variations of the wavelength spacing and the modulation speed, according to Eq. (5).

3. Discussion

Owing to the use of multiplexing techniques, this architecture has the potential to realize high-order or multi-dimensional matrix manipulation. Wavelength parallelism provides a unique degree of freedom for extending the order of the MAC operation by increasing the number of utilized wavelengths. However, increasing the number of wavelengths to expand the scale of operation affects the TDW and WDW weighting methods differently. For TDW, increasing the number of utilized wavelengths has no significant impact on the hardware structure or power penalty of the system, owing to the broad modulating bandwidth of broadband modulators. For WDW, however, each wavelength is manipulated by one wavelength controller (such as a microring modulator), and each additional wavelength corresponds to an additional device, leading to greater insertion loss that affects the computation accuracy. Also note that extension by simply adding more wavelengths would cause arithmetic errors due to the wavelength dependence of the dispersion coefficient D of the exploited dispersive medium, producing unequal time delays for light at different wavelengths, which ultimately reduces the signal-to-noise ratio. One possible solution is to design a dispersive medium with a stable and flat dispersion slope over a certain wavelength range. Another practical approach is to reduce the wavelength span by narrowing the wavelength spacing, which makes D almost constant and yields equal time delays. Nevertheless, there is still a floor on the wavelength spacing to avoid channel crosstalk. Fig. 10 shows the computation results of the experimental setup of Section 2.3.1 with wavelength spacings of 3 nm, 2 nm, and 1 nm, where the level of each computation value fades as the spacing narrows, increasing the error of sampling and quantification.

Fig. 10 Computing levels with (a) 3 nm, (b) 2 nm and (c) 1 nm wavelength spacing.

The TDM technique is introduced to support the calculation of high-dimensional weighted additions. Different weighted additions are performed in one weighting system but in different time slices, which greatly reduces the space complexity of the system. However, the low complexity of the TDM-assisted system is obtained by calculating the weighted additions in time sequence instead of in parallel: the reduction of space complexity is traded for increased time complexity and reduced computation efficiency. Assume that we calculate a multiplication of an M×N matrix and an N×1 vector in parallel (namely, calculating M weighted additions simultaneously with multiple weighting systems); the computation speed (Com.Speed 1) can be calculated as Eq. (11). If the multiplication were instead performed by the TDM-assisted TDW method of Section 2.3.1, the computation speed (Com.Speed 2) of the whole system would consequently be slowed down due to the time consumption, as expressed in Eq. (12):

$$\mathrm{Com.Speed\,1} = \frac{f}{2N} \times M \times N = \frac{f}{2} \times M \;\;\mathrm{MACs/s} \tag{11}$$
$$\mathrm{Com.Speed\,2} = \frac{f}{2N} \times \frac{M \times N}{M} = \frac{f}{2} \;\;\mathrm{MACs/s} \tag{12}$$
where f is the modulation speed, N is the length of the input vector, and 2N means an additional guard time of N bits is also considered, which can be reduced to 2N−1 according to Section 2.3.1. The M in the denominator of Eq. (12) indicates that the whole operation calculates M weighted additions for the M elements of the result vector in M time slices respectively, making it M times slower in computation speed than using multiple weighting systems. However, since the whole operation is performed in one single weighting system, the size of the TDM-based system is 1/M of that of the parallel architecture. Besides, Com.Speed 2 is only related to the modulation speed according to Eq. (12), so the computation speed can be compensated by increasing the bandwidth of the system and the modulation speed. Another problem in performing the TDM-assisted TDW method is the transmission delay between the two cascaded MZMs. The optical delay will cause a mismatched phase between the RF signal and the modulated carrier. Therefore, clock alignment is required to compensate the mismatch, either by controlling the delay through the length of the optical path between the two MZMs, or by introducing an electronic delay equal to the optical delay through the modulation coding of the second RF generator.
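Plugging the Section 2.3.1 numbers into Eqs. (11) and (12) reproduces the reported speeds (a simple check, using the 5.37 Gb/s modulation rate and N = M = 4):

```python
# Check of Eqs. (11)-(12): M parallel weighting systems vs. one
# TDM-shared system, with the Section 2.3.1 parameters.
f = 5.37e9                  # modulation speed, b/s
N = M = 4                   # vector length and number of additions

parallel = f / (2 * N) * M * N    # Eq. (11): (f/2) x M MACs/s
tdm = f / (2 * N) * M * N / M     # Eq. (12): f/2 MACs/s

print(f"parallel: {parallel:.3g} MACs/s")   # (f/2) x M, ~1.07e10
print(f"TDM:      {tdm:.3g} MACs/s")        # f/2, matching the
                                            # reported 2.69e9 MACs/s
```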

According to Eqs. (11) and (12), the computation speed of the weighting system is proportional to the modulation speed, which is mainly determined by the bandwidth of the utilized modulators. Supposing a microring modulator or an MZM has a modulation speed of the reported 64 Gb/s [24] or 56 Gb/s [25], respectively, with Nλ wavelengths utilized, the computation speed (without TDM) can be calculated as 6.4×10¹⁰×Nλ MACs/s for WDW and 5.6×10¹⁰×Nλ MACs/s for TDW, where Nλ is related to the wavelength spacing and system bandwidth.

The proposed optical matrix manipulation system has great potential to process complex matrix-involved ANN tasks, known as optical neural networks (ONNs), which have attracted much attention in both academia and industry [16, 18, 26, 27]. General ANN models can be simplified as multiplications between matrices and the input vector, with nonlinear activation after each multiplication. The model can be mapped into optics by cascading or recycling the proposed system to perform the matrix operation between layers. Nonlinear activation can also be implemented in optics through the physical properties of optical devices, as demonstrated previously with saturable absorption [28] and bistability [29]. The whole architecture can be implemented on a monolithic chip owing to the use of common optical devices and the simple topological structure. The WDM source can be implemented by generating an OFC with a ring resonator [30, 31], and the weight bank can be realized by ring modulators in WDW or an MZM in TDW. On-chip dispersive media have been widely studied, with relatively high dispersion coefficients and low dispersion slopes [32, 33]. However, the power budget should be carefully considered when building complex matrix manipulation circuits such as ONNs, due to the large insertion loss of devices and the zero gain of silicon material.

4. Conclusion

We propose a fully optical computing architecture for massively parallel matrix manipulation with multiplexing techniques. Two weighting methods are exploited and demonstrated: sequence autocorrelation is performed with the WDW method at 1.18×10¹¹ MACs/s, while matrix-vector multiplication and 2D convolution are implemented with the TDW method at 2.69×10⁹ MACs/s and 5×10⁸ MACs/s, respectively. The proposed architecture has great potential for large-scale matrix operations with small footprint and low power consumption.

Funding

National Natural Science Foundation of China (NSFC) (61605111); National Key Research and Development Program of China (2016YFB0200205).

References

1. Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature 521, 436–444 (2015). [CrossRef]   [PubMed]  

2. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems 25, (2012), pp. 1097–1105.

3. G. Hinton, L. Deng, D. Yu, G. E. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, and B. Kingsbury, “Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups,” IEEE Signal Process. Mag. 29, 82–97 (2012). [CrossRef]  

4. T. Young, D. Hazarika, S. Poria, and E. Cambria, “Recent trends in deep learning based natural language processing [review article],” IEEE Comput. Intell. Mag. 13, 55–75 (2018). [CrossRef]  

5. C. Shu and Q. Xie, “Programmable schemes on temporal processing of optical pulses for high-speed photonic subsystems,” in Optical Fiber Communication Conference (OFC) 2019, (Optical Society of America, 2019), p. M1B.1.

6. Z. Jiang, C.-B. Huang, D. E. Leaird, and A. M. Weiner, “Optical arbitrary waveform processing of more than 100 spectral comb lines,” Nat. Photonics 1, 463–467 (2007). [CrossRef]  

7. J. Yao, “Microwave photonics,” J. Light. Technol. 27, 314–335 (2009). [CrossRef]  

8. N. P. Jouppi, C. Young, N. Patil, D. Patterson, G. Agrawal, R. Bajwa, S. Bates, S. Bhatia, N. Boden, A. Borchers, and R. Boyle, “In-datacenter performance analysis of a tensor processing unit,” in Proceedings of the 44th Annual International Symposium on Computer Architecture, (ACM, 2017), pp. 1–12.

9. H. J. Caulfield and S. Dolev, “Why future supercomputing requires optics,” Nat. Photonics 4, 261–263 (2010). [CrossRef]  

10. H. J. Caulfield and S. Dolev, “The role of optics in computing,” Nat. Photonics 4, 406–407 (2010). [CrossRef]  

11. J. Touch, A. Badawy, and V. Sorger, “Optical computing,” Nanophotonics 6, 503–505 (2017). [CrossRef]  

12. T. Zhang, Q. Qiu, Z. Fan, J. Su, and M. Xu, “Experimental study on a 4-b serial optical digital to analog convertor,” IEEE Photonics J. 10, 1–9 (2018). [CrossRef]  

13. J. W. Goodman, A. R. Dias, and L. M. Woody, “Fully parallel, high-speed incoherent optical method for performing discrete fourier transforms,” Opt. Lett. 2, 1–3 (1978). [CrossRef]   [PubMed]  

14. D. Brunner, M. C. Soriano, and I. Fischer, “High-speed optical vector and matrix operations using a semiconductor laser,” IEEE Photonics Technol. Lett. 25, 1680–1683 (2013). [CrossRef]  

15. L. Yang, R. Ji, L. Zhang, J. Ding, and Q. Xu, “On-chip CMOS-compatible optical signal processor,” Opt. Express 20, 13560–13565 (2012). [CrossRef]   [PubMed]  

16. M. A. Nahmias, H. Peng, T. F. de Lima, C. Huang, A. N. Tait, B. J. Shastri, and P. R. Prucnal, “A teramac neuromorphic photonic processor,” in 2018 IEEE Photonics Conference (IPC), (2018), pp. 1–2.

17. D. A. B. Miller, “Self-aligning universal beam coupler,” Opt. Express 21, 6360–6370 (2013). [CrossRef]   [PubMed]  

18. Y. Shen, N. C. Harris, S. Kirlo, M. Prabhu, T. Baehr-Jones, M. Hochberg, X. Sun, S. Zhao, H. Larochelle, D. Englund, and M. Soljačić, “Deep learning with coherent nanophotonic circuits,” Nat. Photonics 11, 441–446 (2017). [CrossRef]  

19. A. Ribeiro, A. Ruocco, L. Vanacker, and W. Bogaerts, “Demonstration of a 4 × 4-port universal linear circuit,” Optica 3, 1348–1357 (2016). [CrossRef]  

20. Y. Huang, W. Zhang, F. Yang, and Z. He, “Optical matrix manipulation based on frequency comb modulation and dispersed time delay,” in Optical Fiber Communication Conference (OFC) 2019, (Optical Society of America, 2019), p. M1B.4.

21. Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE 86, 2278–2324 (1998). [CrossRef]  

22. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems 25, (Curran Associates Inc, 2012), pp. 1097–1105.

23. M. D. Zeiler and R. Fergus, “Visualizing and understanding convolutional networks,” in Computer Vision – ECCV 2014, (Springer International Publishing, 2014), pp. 818–833.

24. J. Sun, M. Sakib, J. Driscoll, R. Kumar, H. Jayatilleka, Y. Chetrit, and H. Rong, “A 128 Gb/s PAM4 silicon microring modulator,” in Optical Fiber Communication Conference Postdeadline Papers, (Optical Society of America, 2018), p. Th4A.7.

25. G. Denoyer, A. Chen, B. Park, Y. Zhou, A. Santipo, and R. Russo, in 2014 European Conference on Optical Communication (ECOC), (2014).

26. A. N. Tait, T. F. de Lima, E. Zhou, A. X. Wu, M. A. Nahmias, B. J. Shastri, and P. R. Prucnal, “Neuromorphic photonic networks using silicon photonic weight banks,” Sci. Rep. 7, 7430–7449 (2017). [CrossRef]  

27. Y. Paquot, F. Duport, A. Smerieri, J. Dambre, B. Schrauwen, M. Haelterman, and S. Massar, “Optoelectronic reservoir computing,” Sci. Rep. 2, 287–292 (2012). [CrossRef]  

28. Z. Cheng, H. K. Tsang, X. Wang, K. Xu, and J. Xu, “In-plane optical absorption and free carrier absorption in graphene-on-silicon waveguides,” IEEE J. Sel. Top. Quantum Electron. 20, 43–48 (2014). [CrossRef]  

29. B. Xu and N.-B. Ming, “Experimental observations of bistability and instability in a two-dimensional nonlinear optical superlattice,” Phys. Rev. Lett. 71, 3959–3962 (1993). [CrossRef]   [PubMed]  

30. P. Del’Haye, A. Schliesser, O. Arcizet, T. Wilken, R. Holzwarth, and T. J. Kippenberg, “Optical frequency comb generation from a monolithic microresonator,” Nature 450, 1214–1217 (2007). [CrossRef]  

31. J. Lin, H. Sepehrian, Y. Xu, L. A. Rusch, and W. Shi, “Frequency comb generation using a CMOS compatible SiP DD-MZM for flexible networks,” IEEE Photonics Technol. Lett. 30, 1495–1498 (2018). [CrossRef]  

32. K. Wu and A. W. Poon, “Dispersion engineering of high-Q Si3N4 microdisk resonators,” in Conference on Lasers and Electro-Optics, (Optical Society of America, 2018), p. SW4B.4.

33. Y. Li, J. Li, Y. Huo, M. Chen, S. Yang, and H. Chen, “Spatial-mode-coupling-based dispersion engineering for integrated optical waveguide,” Opt. Express 26, 2807–2816 (2018). [CrossRef]   [PubMed]  

Figures (10)

Fig. 1 Principle of the proposed linear weighting operation architecture with broadband modulation and dispersion effect. ΔT: tunable width of time gate (here ΔT = T/N), PD: photodetector.
Fig. 2 Weight bank design in wavelength domain.
Fig. 3 (a) The mathematics-optics mapping of the result of the convolution. (b) Time-wavelength plane before a dispersive medium. (c) The weight bank of the convolution. (d) Unequal-spacing wavelengths can be generated according to the distribution of zeros and ones in the weight bank.
Fig. 4 (a) Experiment setup of optical matrix manipulation using the WDW method. OFC: optical frequency comb, EDFA: erbium-doped fiber amplifier, PPG: pulse pattern generator, PC: polarization controller, MZM: Mach-Zehnder modulator, DCF: dispersion compensation fiber, PD: photodetector, OSC: oscilloscope. (b) OFC source with wavelength spacing of 18 GHz. (c) Unequal-spacing wavelength distribution after the waveshaper and EDFA. (d) Simulation and experiment results of the autocorrelation of a 7-bit m-sequence.
Fig. 5 Weight bank design in time domain.
Fig. 6 Principle of the TDM-assisted TDW method for fourth-order matrix-vector multiplication. Here, N = 4. (a) Broadband modulation of the input vector in four time slices. (b) Weighting by each row vector of the matrix H in each time slice. (c) The computation result can be sampled at the middle of each slice.
Fig. 7 Experiment demonstration of fourth-order matrix-vector multiplication. (a) Experiment setup. MUX: multiplexer, PC: polarization controller, MZM: Mach-Zehnder modulator, AWG: arbitrary waveform generator, EA: electrical amplifier, EDFA: erbium-doped fiber amplifier, DCF: dispersion compensation fiber, PD: photodetector, OSC: oscilloscope. (b) The optical waveforms of x and H. (c) Dispersive time delay of the modulated wavelengths induced by the well-designed DCF. (d) Computation result of the multiplication. Elements of the output vector can be sampled at the middle of each time slice.
Fig. 8 Principle of the 2D convolution process.
Fig. 9 Simulation demonstration of 2D convolution. (a) Simulation setup for edge extraction of binary images. IMG: input images, WDM: wavelength division multiplexing, MZM: Mach-Zehnder modulator, DCF: dispersion compensation fiber, PD: photodetector. (b) 32×32 binary images. (c) Extracted edges of the binary images based on the proposed method. (d) Sampling errors as the DCF length varies. Red dashed line: the limiting error for a successful edge extraction.
Fig. 10 Computing levels with (a) 3 nm, (b) 2 nm, and (c) 1 nm wavelength spacing.

Equations (11)


$$y = \sum_{i=1}^{N} w_i x_i = w_1 x_1 + w_2 x_2 + \dots + w_N x_N = \mathbf{W}\mathbf{X}^T$$
$$y(n) = \sum_{m=-\infty}^{\infty} x(m)\,x(m-n) = x(n) * x(-n)$$
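The autocorrelation relation above can be checked numerically. A minimal NumPy sketch follows, using one standard bipolar 7-bit m-sequence (generated from the polynomial x³+x+1) as an illustration; the zero-lag peak equals the sequence length:

```python
import numpy as np

# Autocorrelation y(n) = sum_m x(m) x(m-n) = x(n) * x(-n),
# evaluated for a bipolar 7-bit m-sequence.
x = np.array([1, 1, 1, -1, -1, 1, -1])  # one length-7 m-sequence (illustrative)
y = np.correlate(x, x, mode="full")     # equivalent to convolving x(n) with x(-n)
assert y[len(x) - 1] == 7               # zero-lag peak equals the sequence length
```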
$$y(n) = x(n) * w(n) = \sum_{m=0}^{L-1} x(m)\,w(n-m) = x(0)w(n) + x(1)w(n-1) + \dots + x(L-1)w(n-L+1)$$
$$\mathbf{Y} = \mathbf{x}\mathbf{W} = \begin{bmatrix} x(0) & x(1) & \cdots & x(N_x-1) \end{bmatrix} \begin{bmatrix} w(0) & w(1) & \cdots & w(L-2) & w(L-1) \\ 0 & w(0) & \cdots & w(L-3) & w(L-2) \\ 0 & 0 & \cdots & w(L-4) & w(L-3) \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & w(L-N_x+1) & w(L-N_x) \end{bmatrix}$$
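The banded-matrix form of the convolution can be verified with a short NumPy sketch (array values chosen arbitrarily for illustration): building the shifted-weight matrix W row by row and multiplying by the input vector reproduces direct linear convolution.

```python
import numpy as np

def conv_matrix(w, Nx):
    """Build the Nx x (Nx+L-1) banded matrix with W[m, n] = w(n-m)."""
    L = len(w)
    W = np.zeros((Nx, Nx + L - 1))
    for m in range(Nx):
        W[m, m:m + L] = w  # row m carries w(0..L-1) shifted right by m
    return W

x = np.array([1.0, 2.0, 3.0])          # input vector (illustrative)
w = np.array([0.5, 1.0, 0.25, 0.75])   # weight sequence (illustrative)
y = x @ conv_matrix(w, len(x))
assert np.allclose(y, np.convolve(x, w))  # matches direct convolution
```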
$$\Delta\tau = \frac{1}{f} = D\,\Delta\lambda\,l$$
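This relation fixes the wavelength spacing needed so that the dispersive delay between adjacent wavelengths matches one symbol period. A quick numeric sketch, with D, l, and f set to placeholder values (not the experimental parameters of this work):

```python
# Delay step between adjacent wavelengths: dtau = D * dlam * l.
# Setting dtau = 1/f aligns each weighted copy to one symbol period.
D = -160e-12 / (1e-9 * 1e3)  # dispersion: -160 ps/nm/km in SI units (assumed)
l = 2e3                       # DCF length in meters (assumed)
f = 10e9                      # modulation rate in Hz (assumed)

dlam_needed = (1 / f) / abs(D * l)  # required wavelength spacing, in meters
print(dlam_needed * 1e9, "nm")      # prints 0.3125 nm for these values
```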
$$\mathbf{y}^T = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} & h_{13} & h_{14} \\ h_{21} & h_{22} & h_{23} & h_{24} \\ h_{31} & h_{32} & h_{33} & h_{34} \\ h_{41} & h_{42} & h_{43} & h_{44} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \mathbf{H}\mathbf{x}^T$$
$$\begin{aligned} y_1 &= h_{11}x_1 + h_{12}x_2 + h_{13}x_3 + h_{14}x_4 \\ y_2 &= h_{21}x_1 + h_{22}x_2 + h_{23}x_3 + h_{24}x_4 \\ y_3 &= h_{31}x_1 + h_{32}x_2 + h_{33}x_3 + h_{34}x_4 \\ y_4 &= h_{41}x_1 + h_{42}x_2 + h_{43}x_3 + h_{44}x_4 \end{aligned}$$
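The row-by-row expansion of the fourth-order product is what each time slice physically accumulates in the TDW scheme: slice k sums h_k1·x1 through h_k4·x4 at the photodetector. A NumPy sketch with random illustrative values confirms the expansion equals the direct product:

```python
import numpy as np

# Each output element y[k] is the accumulated sum of per-wavelength
# products h_ki * x_i, as in the expanded equations above.
rng = np.random.default_rng(1)
H = rng.random((4, 4))  # illustrative 4x4 weight matrix
x = rng.random(4)       # illustrative input vector

y = np.array([sum(H[k, i] * x[i] for i in range(4)) for k in range(4)])
assert np.allclose(y, H @ x)  # matches the direct matrix-vector product
```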
$$S(u,v) = (I * K)(u,v) = \sum_{m}\sum_{n} I(m,n)\,K(u-m, v-n)$$
$$S(u,v) = (I * K)(u,v) = \sum_{m}\sum_{n} I(u-m, v-n)\,K(m,n)$$
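The 2D convolution definition above, written out directly, is what underlies the edge-extraction simulation. The sketch below applies it to a small binary square; the 3×3 Laplacian kernel is a common edge-detection choice used here for illustration (the paper's actual kernel is not reproduced):

```python
import numpy as np

def conv2d(I, K):
    """Full 2D convolution S(u,v) = sum_{m,n} I(u-m, v-n) K(m,n)."""
    Mi, Ni = I.shape
    Mk, Nk = K.shape
    S = np.zeros((Mi + Mk - 1, Ni + Nk - 1))
    for u in range(S.shape[0]):
        for v in range(S.shape[1]):
            for m in range(Mk):
                for n in range(Nk):
                    if 0 <= u - m < Mi and 0 <= v - n < Ni:
                        S[u, v] += I[u - m, v - n] * K[m, n]
    return S

I = np.zeros((8, 8))
I[2:6, 2:6] = 1.0  # binary image: a 4x4 square of ones
K = np.array([[0, -1, 0], [-1, 4, -1], [0, -1, 0]])  # Laplacian kernel (assumed)
edges = conv2d(I, K)
# Flat interior cancels to zero; the square's border gives a nonzero response.
```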
$$Com.Speed\ 1 = \frac{f}{2N} \times M \times N = \frac{f}{2} \times M\ \ \text{MACs/s}$$
$$Com.Speed\ 2 = \frac{f}{2N} \times \frac{M \times N}{M} = \frac{f}{2}\ \ \text{MACs/s}$$
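As a quick arithmetic check on the two throughput expressions (f, M, and N below are illustrative placeholders, not the experimental values):

```python
f = 10e9      # modulation rate in Hz (illustrative)
M, N = 4, 4   # matrix dimensions (illustrative)

# Com.Speed 1 = f/(2N) * M * N, which simplifies to (f/2) * M MACs/s
speed1 = f / (2 * N) * M * N
# Com.Speed 2 = f/(2N) * (M*N)/M, which simplifies to f/2 MACs/s
speed2 = f / (2 * N) * (M * N) / M

assert speed1 == f / 2 * M
assert speed2 == f / 2
```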