Optica Publishing Group

BP-based supervised learning algorithm for multilayer photonic spiking neural network and hardware implementation

Open Access

Abstract

We introduce a supervised learning algorithm for a photonic spiking neural network (SNN) based on back propagation (BP). In this algorithm, the information is encoded into spike trains of different strengths, and the SNN is trained according to different patterns composed of different spike numbers of the output neurons. Furthermore, a classification task is performed numerically and experimentally based on the supervised learning algorithm in the SNN. The SNN is composed of photonic spiking neurons based on vertical-cavity surface-emitting lasers, which are functionally similar to leaky integrate-and-fire neurons. The results demonstrate that the algorithm can be implemented on hardware. In the pursuit of ultra-low power consumption and ultra-low delay, it is of great significance to design and implement a hardware-friendly learning algorithm for photonic neural networks and to realize hardware-algorithm collaborative computing.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

The biologically plausible spiking neural network (SNN) is often called the third generation of neural networks. Information in an SNN is expressed in the form of spikes, which is thought to be power efficient and has attracted considerable attention in many fields [1–5]. Spike-timing-dependent plasticity (STDP) is a prominent mechanism observed biologically and is typically used for the unsupervised and supervised training of SNNs [6–9]. In traditional neural networks, back propagation (BP) is the most efficient way of training. However, because the spike signal is non-differentiable, traditional supervised learning algorithms based on gradient descent cannot be applied directly to SNNs. To overcome this problem, researchers have developed effective methods to train SNNs directly with gradient descent. There are mainly two approaches: one is to modify the spiking neuron model into a differentiable form [10,11], and the other is to avoid calculating the derivative, a notable example being the use of a surrogate gradient function in back propagation [12–15]. At present, there is no universally accepted supervised training algorithm for SNNs.

SNNs provide an ideal neuromorphic-computing paradigm for realizing energy-efficient hardware owing to their high bio-fidelity and intrinsic sparse, event-driven processing capability [8]. Recently, various hardware emulations of SNNs have been developed in very-large-scale integration systems such as TrueNorth and Loihi [16–18]. However, electronic SNNs are subject to a fundamental bandwidth versus connection-density tradeoff. Integrated photonics has unique advantages for information processing, including fast speed, large bandwidth, low power consumption, and high parallelism, compared with its electronic counterpart [19–25]. In this sense, photonic computing has attracted wide attention [23]. In recent years, successful demonstrations of photonic integrated circuits have mainly focused on linear functions such as weight matrix multiplication (WMM), implemented on conventional optical components such as micro-resonators (MRs) [22], MRs with phase-change materials such as Ge2Sb2Te5 [26,27], and Mach-Zehnder modulators (MZMs) [19,20]. There are also studies on nonlinear activation functions [28–30]. An on-chip all-optical neuron on silicon has been implemented by heterogeneously integrating a III-V laser onto a silicon-on-insulator substrate, which can serve as the nonlinear activation function within a silicon photonic neural network [29]. Moreover, a programmable, low-loss all-optical activation function device based on silicon has also been demonstrated [30].

In addition to the implementation of linear WMM and nonlinear activation functions, the emulation of photonic spiking neurons, the main elements of a photonic SNN, has also attracted considerable interest. A commonly used scheme for the photonic spiking neuron is based on PCM [21], which is scalable and exhibits integrate-and-fire behavior, but lacks temporal encoding and asynchronous information processing [31]. Spiking neuron models with an inner spiking mechanism are mostly based on Q-switching and phase-locking in lasers [32,33]. Vertical-cavity surface-emitting lasers (VCSELs) [23,34–37], two-section lasers with an integrated or embedded saturable absorber such as VCSELs with saturable absorber (VCSELs-SA) [38,39] and Fabry–Pérot lasers [40], distributed feedback semiconductor lasers [41], and others [42–44] have been demonstrated numerically or experimentally. Based on photonic spiking neuron models, several works have focused on the training algorithm and architecture design of photonic SNNs [40,45,46]. For example, we previously demonstrated hardware-algorithm collaborative computing with a photonic spiking neuron chip based on an integrated Fabry–Pérot laser with a saturable absorber [40].

In this paper, we propose a supervised learning algorithm for a photonic SNN, in which the information is encoded into spike trains of different strengths and the SNN is trained according to different patterns composed of different spike numbers of the output neurons. The rest of the paper is organized as follows. Section 2 describes the BP-based algorithm and the network structure. Section 3 presents the numerical results of classification in an SNN. Section 4 presents the experimental setup and results based on the proposed supervised learning algorithm. Finally, the overall conclusion is summarized in Section 5.

2. Method

We propose to train an SNN with a BP-inspired algorithm, which is described as follows. The ${N_i}$ input neurons correspond to the features of the dataset, and the ${N_o}$ output neurons correspond to the categories of the dataset. Suppose the hidden layer has ${N_h}$ neurons; then we have a multi-layer feedforward network with ${N_i}$ input neurons, ${N_h}$ hidden-layer neurons, and ${N_o}$ output neurons. The connection weight between the $i\textrm{-th}$ neuron of the input layer and the $h\textrm{-th}$ neuron of the hidden layer is ${\omega _{ih}}$, and that between the $h\textrm{-th}$ neuron of the hidden layer and the $j\textrm{-th}$ neuron of the output layer is ${\omega _{hj}}$.

We denote the output of the $i\textrm{-th}$ input neuron by ${x_i}$. Thus, the $h\textrm{-th}$ hidden neuron receives the weighted sum of input signals ${\alpha _h} = \sum\nolimits_{i = 1}^{{N_i}} {{\omega _{ih}}{x_i}}$, and its output is denoted ${y_h}$, which contains ${m_h}$ spikes. The input of the $j\textrm{-th}$ output neuron is ${\beta _j} = \sum\nolimits_{h = 1}^{{N_h}} {{\omega _{hj}}{y_h}}$, and its output is ${y_j}$, containing ${m_j}$ spikes. The $j\textrm{-th}$ output neuron is supposed to release ${d_j}$ spikes. Hence, the error of the network can be written as:

$$E = \frac{1}{2}\sum\nolimits_{j = 1}^{{N_o}} {{{({{m_j} - {d_j}})}^2}}$$

For traditional BP algorithm, the weight can be adjusted as follows:

$$\Delta {w_{hj}} = - \eta \frac{{\partial E}}{{\partial {w_{hj}}}},\,\Delta {\omega _{ih}} = - \eta \frac{{\partial E}}{{\partial {\omega _{ih}}}}$$
where $\eta $ is the learning rate. In the photonic SNN, the activation function is emulated by the photonic spiking neuron. It is known that the number of output spikes is positively related to the intensity of the input spike train. However, owing to the complex numerical model of the photonic neuron, the derivative is difficult to calculate. Since the derivative around a spike is quite large, we set it to a constant, here 1. Then, according to the traditional BP algorithm [47], the weights can be adjusted as follows:
$$\Delta {\omega _{hj}} = - \eta \cdot \frac{{\partial E}}{{\partial {y_j}}} \cdot \frac{{\partial {y_j}}}{{\partial {\beta _j}}} \cdot \frac{{\partial {\beta _j}}}{{\partial {\omega _{hj}}}} = \eta ({d_j} - {m_j}){y_h}$$
$$\Delta {\omega _{ih}} = - \eta \sum\nolimits_{j = 1}^{{N_o}} {\frac{{\partial E}}{{\partial {y_j}}} \cdot \frac{{\partial {y_j}}}{{\partial {\beta _j}}} \cdot \frac{{\partial {\beta _j}}}{{\partial {\alpha _h}}} \cdot \frac{{\partial {\alpha _h}}}{{\partial {\omega _{ih}}}}} = \eta \sum\nolimits_{j = 1}^{{N_o}} {{\omega _{hj}}({d_j} - {m_j})} \,{x_i}$$
$${\omega _{hj}} = {\omega _{hj}} + \Delta {\omega _{hj}}$$
$${\omega _{\textrm{i}h}} = {\omega _{\textrm{i}h}} + \Delta {\omega _{\textrm{i}h}}$$
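The weight updates above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name, shapes, and default hyperparameters are assumptions, while the update rules, the surrogate derivative of 1, and the clipping of negative weights to a small positive value follow the text.

```python
import numpy as np

# Sketch of the BP-inspired update, assuming a surrogate derivative of 1
# through each spiking neuron. w_ih has shape (N_i, N_h), w_hj (N_h, N_o).
def update_weights(w_ih, w_hj, x, y_h, m, d, eta=0.01, w_min=1e-3):
    """x: input values (N_i,), y_h: hidden outputs (N_h,),
    m: output spike counts (N_o,), d: target spike counts (N_o,)."""
    err = d - m                            # (d_j - m_j)
    dw_hj = eta * np.outer(y_h, err)       # eta * (d_j - m_j) * y_h
    dw_ih = eta * np.outer(x, w_hj @ err)  # eta * sum_j w_hj * (d_j - m_j) * x_i
    w_hj = w_hj + dw_hj
    w_ih = w_ih + dw_ih
    # All-optical weights must stay positive: clip negatives to a small value.
    return np.maximum(w_ih, w_min), np.maximum(w_hj, w_min)
```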

In the proposed photonic SNN, VCSELs-SA or VCSELs are adopted as the photonic spiking neurons of the hidden and output layers, because VCSELs have unique inherent advantages such as high energy efficiency, high-speed modulation capability, and low bias currents [46]. Moreover, the VCSEL-SA is equivalent to the well-known leaky integrate-and-fire (LIF) neuron; its gain variable corresponds to the membrane potential in the LIF model [38]. In the update formulas, if the output spike number of a hidden-layer VCSEL-SA neuron is ${m_h}\textrm{ = }0$, we set it to 1 to ensure the effective training of the weights ${\omega _{hj}}$. In addition, when a weight becomes negative after updating, it is adjusted to a small positive value so that all inputs contribute to the training of the network. The network is trained to minimize the loss function, which measures the inconsistency between the predicted value and the target; the smaller the loss, the better the model fits.

In the traditional BP algorithm, the input value is mapped to an output value of 0 or 1, where 1 corresponds to neuron excitation and 0 to neuron inhibition. Because there is no reasonable physical expression for negative weights in an all-optical SNN, and an output of 0 means no spike generation, such a mapping makes network training difficult. Therefore, we modify the input signal into a cluster of spikes, and the output is mapped to 1 or 2 in this network, where 1 denotes that the output neuron releases one spike and 2 corresponds to two output spikes.

The input layer first converts the initial sample input into the form of a Gaussian spike train. Specifically, the input feature ${\textrm{X}_i}$ is converted into a Gaussian spike train ${\textrm{x}_i}$ of length ${T_p}$ = 3 ns, where the full width at half maximum of a single pulse is 0.14 ns and the interval between two spikes is 0.28 ns, as shown in Fig. 1. More importantly, the input feature is reflected in the peak intensity of the spike train, i.e., ${k_e}\textrm{ = }{\textrm{X}_i}$. The neurons in the hidden layer convert the weighted Gaussian spike train into photonic spikes. The neurons in the output layer receive and generate spikes as shown in Fig. 1. In this case, larger inputs can elicit spike responses from the hidden-layer and output-layer neurons. The coupling weights of the network are trained to produce pre-defined patterns according to the input patterns.
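The input encoding described above can be sketched as follows. The pulse parameters (3 ns window, 0.14 ns FWHM, 0.28 ns interval) come from the text; the sampling step, pulse placement, and function name are assumptions for illustration.

```python
import numpy as np

# Sketch: a feature X_i sets the peak intensity k_e of a 3 ns train of
# Gaussian pulses (FWHM 0.14 ns, pulse interval 0.28 ns). dt is assumed.
def encode_feature(X_i, T_p=3.0, fwhm=0.14, interval=0.28, dt=0.01):
    t = np.arange(0.0, T_p, dt)                        # time axis in ns
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))  # Gaussian sigma from FWHM
    centers = np.arange(interval / 2, T_p, interval)   # assumed pulse centers
    train = np.zeros_like(t)
    for c in centers:
        train += np.exp(-((t - c) ** 2) / (2.0 * sigma ** 2))
    return X_i * train / train.max()                   # peak intensity k_e = X_i
```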


Fig. 1. The structure of the neural network. ${N_{x1}}$-${N_{x4}}$ are the input-layer neurons, ${N_{h1}}$-${N_{h4}}$ are the hidden-layer neurons, and ${N_{o1}}$-${N_{o2}}$ are the output-layer neurons. The hidden-layer and output-layer neurons are simulated by the VCSEL-SA model.


3. Numerical results

Here we use the iris dataset for demonstration. The iris dataset has a total of 150 samples; we choose 99 samples as the training set and the remaining 51 as the test set. Each sample has 4 features: sepal length, sepal width, petal length, and petal width. The distribution of the iris dataset is shown in Figs. 2(a)-(d), with the four features on the Y-axis and the three classes on the X-axis. The input features are first pre-encoded: the range of the iris data, [0.1, 7.9], is scaled linearly to [7000, 16000], which is the intensity range of the Gaussian spike trains that encode the input information.
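The linear pre-encoding step is simple enough to state as a one-liner; a sketch with the endpoint values taken from the text (the function name is illustrative):

```python
# Linearly rescale a raw iris feature from [0.1, 7.9] into the injection
# strength range [7000, 16000] used for the Gaussian spike trains.
def scale_feature(v, lo=0.1, hi=7.9, out_lo=7000.0, out_hi=16000.0):
    return out_lo + (v - lo) * (out_hi - out_lo) / (hi - lo)
```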


Fig. 2. The distribution of the iris dataset (a)-(d), and the training loss and test accuracy versus training epoch (e). The test accuracies for the three categories are 100%, 88.24%, and 88.24%, presented as Acc1, Acc2, and Acc3, respectively.


We choose a network structure with 4 input neurons, 4 hidden neurons, and 2 output neurons, as shown in Fig. 1. Different combinations of the output spike numbers of the output neurons are used to classify the three categories. The labels are designed as [1, 1] for Iris-setosa, [1, 2] for Iris-versicolor, and [2, 2] for Iris-virginica. The network is trained according to the introduced algorithm until the loss reaches its minimum or the epoch reaches a preset maximum. The training loss and test accuracy are presented in Fig. 2(e). The loss no longer decreases after about the 300-th training epoch, indicating convergence of the network. The mean test accuracy is about 92.16%, shown as the orange line Acc in Fig. 2(e). The test accuracies for the three categories are 100%, 88.24%, and 88.24%, presented as Acc1, Acc2, and Acc3, respectively. Furthermore, we also test networks with 3 and 5 hidden neurons (keeping 4 input and 2 output neurons); the mean test accuracies are about 88.24% and 94.12%, respectively. In the following, the network with four hidden neurons is analyzed.
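The label decoding step described above is a direct lookup from output spike counts to category; a minimal sketch (the names `LABELS` and `decode` are illustrative, the patterns come from the text):

```python
# Match the spike counts [m_1, m_2] of the two output neurons against
# the pre-defined label patterns for the three iris classes.
LABELS = {
    (1, 1): "Iris-setosa",
    (1, 2): "Iris-versicolor",
    (2, 2): "Iris-virginica",
}

def decode(spike_counts):
    """Return the class name for a spike-count pattern, or 'unclassified'."""
    return LABELS.get(tuple(spike_counts), "unclassified")
```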


Fig. 3. The evolution of the weights from the input to the hidden layer (a) and from the hidden to the output layer (b). The weight distributions of ${\omega _{\textrm{i}h}}$ (c) and ${\omega _{hj}}$ (d) at the 300-th training epoch.


The weight evolution is further presented in Fig. 3. The coupling weights from the input layer to the hidden layer are shown in Fig. 3(a), and those from the hidden layer to the output layer in Fig. 3(b). ${\omega _{ih}}$ changes considerably from its initial values, whereas ${\omega _{hj}}$ varies little, as shown in the inset. This may be because the initial values of ${\omega _{ih}}$ are distributed over a larger range than those of ${\omega _{hj}}$. The weight changes are also affected by the network structure, task characteristics, and so on. The weight distributions of ${\omega _{\textrm{i}h}}$ and ${\omega _{hj}}$ at the 300-th training epoch are shown in Figs. 3(c) and (d). In the trained photonic neural network, two input-layer neurons (${N_{x1}}$ and ${N_{x3}}$) are strongly connected to the hidden-layer neurons, while one hidden-layer neuron (${N_{h4}}$) is hardly connected to one output-layer neuron (${N_{o2}}$).

We then show the input pattern of one sample from each category and the corresponding output calculated with the trained weights. The signal received by each output neuron, i.e., the weighted sum of the hidden-layer outputs, is shown in blue in Fig. 4, and the corresponding output pattern in red. Figures 4(a1)-(b2) give an example from Iris-setosa: in Figs. 4(b1)-(b2), both output neurons generate a single spike, denoted [1, 1], which corresponds to Iris-setosa. A sample from Iris-versicolor is shown in Figs. 4(c1)-(d2), where the output pattern is [1, 2]. Similarly, the output pattern of Iris-virginica is [2, 2], as presented in Figs. 4(e1)-(f2). These results show that the inference results are consistent with the targets.


Fig. 4. The input and output patterns of the output neurons for 3 samples from different categories. (a1)-(b2) Example from Iris-setosa with output [1, 1]; (c1)-(d2) example from Iris-versicolor with output [1, 2]; (e1)-(f2) example from Iris-virginica with output [2, 2]. (a1)-(f1) correspond to the inputs and outputs of ${N_{o1}}$; (a2)-(f2) correspond to the inputs and outputs of ${N_{o2}}$.


To demonstrate the scalability and universality of the algorithm, we also trained on the Molecular Biology (Splice-junction Gene Sequences) dataset. This problem consists of two subtasks: recognizing exon/intron boundaries (EI sites) and intron/exon boundaries (IE sites). We choose a network with 240 input neurons, 32 hidden neurons, and 1 output neuron; one output spike represents an EI site and two output spikes represent an IE site. The loss no longer decreases after about the 600-th training epoch, indicating convergence, and the mean test accuracy is about 97.38%. For classifying MNIST, we choose a network with 784 input neurons, 100 hidden neurons, and 10 output neurons, and the recognized class is given by the neuron that fires the maximum number of spikes. The mean test accuracy is about 97.96% over epochs [81, 100].
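For the MNIST case, the readout described above (the class is the output neuron firing the most spikes) is a simple argmax; a sketch, with the function name as an assumption:

```python
import numpy as np

# MNIST readout: the recognized class is the index of the output neuron
# that fires the maximum number of spikes.
def predict_class(spike_counts):
    return int(np.argmax(spike_counts))
```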

4. Experimental results and discussion

Next, we combine the algorithm with the photonic hardware architecture. Based on the structure of Fig. 1 and the trained weights of Fig. 3, we achieved a mean test accuracy of 92.16% in simulation. Here, we map the neurons and the trained weights to a hardware platform, as illustrated in Fig. 5. Specifically, the output-layer neurons of Fig. 1 are implemented with two commercial 1550 nm VCSELs in Fig. 5. The test data and the outputs of the input/hidden-layer neurons are weighted and summed in a computer. The computed values serve as the stimuli of the output-layer neurons (VCSEL neurons) and are generated by an arbitrary waveform generator (AWG, Tektronix AWG70002) operating at 10 GSa/s in Fig. 5. The rest of the experimental setup is introduced as follows.


Fig. 5. The experimental setup for the demonstration of the proposed SNN. TL1, TL2: tunable laser; OI1, OI2: optical isolator; PC1, PC2, PC3 and PC4: polarization controllers; MZM1, MZM2: Mach-Zehnder intensity modulators; VOA1, VOA2: variable optical attenuator; LDC Bias & Temp: laser diode bias current and temperature controller. EDFA1, EDFA2: erbium doped fiber amplifier; PD1, PD2: photodetector; OSC: oscilloscope; AWG: arbitrary waveform generator.


Laser diode bias current and temperature controllers (LDC Bias&Temp, LDC-3724C) were used to control the temperature and bias currents of the VCSELs. Two tunable lasers (TL1 and TL2) generate continuous-wave (CW) light, followed by optical isolators (OI1 and OI2) to minimize back reflections. The electronic signals generated by the AWG were amplified by radio-frequency amplifiers, and the two amplified electronic signals were modulated onto the CW light by Mach-Zehnder intensity modulators (MZM, Fujitsu FTM7928FB). Two polarization controllers (PC1 and PC2) adjust the polarization state of the optical path to match that of the MZMs, while PC3 and PC4 adjust the polarization states of the modulated signals to match VCSEL1 and VCSEL2, respectively. Variable optical attenuators (VOA1 and VOA2) adjust the optical power of the modulated signals, which are then injected into VCSEL1 and VCSEL2 through three-port circulators. Erbium-doped fiber amplifiers (EDFA1 and EDFA2) amplify the output signals of VCSEL1 and VCSEL2, respectively. The output signal is converted into an electrical signal by photodetectors (PD, Agilent HP11982A) and monitored on an oscilloscope (OSC, Keysight DSOV334A). The spectra of all optical signals are observed with an optical spectrum analyzer (OSA, Advantest Q8384).

The measured threshold current of VCSEL1 is 2.2 mA at 297.28 K, and that of VCSEL2 is 2.6 mA at 298.28 K. In the experiment, the bias current (temperature) of VCSEL1 and VCSEL2 is kept constant at 5 mA (297.28 K) and 5 mA (298.28 K), respectively.

The spectra of the two VCSELs under optical injection are presented in Fig. 6. The optical signals are injected into the YP modes of the VCSELs, and the peaks in the spectra reveal the two orthogonal polarization modes. The wavelengths of the XP and YP modes of VCSEL1 are 1558.189 nm and 1558.384 nm, and those of VCSEL2 are 1558.175 nm and 1558.779 nm, respectively.


Fig. 6. Spectra of VCSEL1 (a) and VCSEL2 (b) under optical injection.


It is worth noting that in the numerical neuron model, the neurons are injected with ascending pulses, whereas in the VCSEL the neuron-like characteristics are triggered by descending disturbance pulses. Therefore, in the hardware co-design, the ascending pulses injected into the VCSEL-SA model are mapped into corresponding descending disturbance pulses injected into the VCSEL neurons. In the experiment, the spike interval was twice that in the simulation, and the full width at half maximum of a spike was 200 ps. The electrical signals generated by the AWG are then modulated and injected into the two VCSEL neurons through the MZMs to realize the classification and recognition function of the output layer.

In addition, we note from the simulation results in Fig. 4 that the first output spike of a VCSEL-SA neuron falls within 0∼2 ns after the injection of the input spike train begins (the injection starts at 4 ns), and the second output spike falls within 2∼4 ns after that starting time. This is due to the refractory period and the suppressed response of the hidden-layer neurons: when the Gaussian pulse train is injected, the interval between the two output spikes of the hidden-layer neurons depends on the intensity of the injected pulses. Moreover, the spikes observed in the experiment are not single, clean spikes but are usually accompanied by burrs, and the input pulse interval of the VCSEL is twice that of the simulation. Therefore, we specify that in the experiment the two spiking windows are 0∼4 ns and 4∼8 ns, respectively: if any spikes occur during 0∼4 ns, no matter how many, the neuron is regarded as releasing only one spike. Compared with an SNN based on precise timing information, this scheme is more hardware friendly and robust.
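The window-based readout described above can be sketched as follows: each 4 ns window contributes at most one counted spike regardless of how many threshold crossings it contains. The threshold, sampling, and function name are assumptions for illustration.

```python
import numpy as np

# Sketch of the window-based spike readout: any activity above threshold
# in 0~4 ns counts as one spike, any in 4~8 ns as a second.
def count_window_spikes(t, signal, threshold, windows=((0.0, 4.0), (4.0, 8.0))):
    """t in ns; return the number of windows containing at least one spike."""
    above = signal > threshold
    n = 0
    for lo, hi in windows:
        mask = (t >= lo) & (t < hi)
        if np.any(above & mask):
            n += 1
    return n
```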

In the experiment, three samples were injected into the two VCSEL neurons; successive samples were separated by 14 ns to avoid interaction between spikes, and each sample lasts 8 ns. The weighted-sum signals of random samples of Iris-setosa, Iris-versicolor, and Iris-virginica from the simulation are injected into the output-layer neurons. The experimental results are presented in Fig. 7, where the blue traces denote the inputs of the neurons and the red traces the output signals. Figure 7(a1) shows the injected signals of the VCSEL1 neuron, corresponding from left to right to (a1), (c1), and (e1) in Fig. 4. Figure 7(a2) shows the output response of VCSEL1: examining the signal within the relative windows of 0∼4 ns and 4∼8 ns, the output spikes correspond to [1, 1, 2], matching (b1), (d1), and (f1) in Fig. 4. Similarly, Fig. 7(b1) shows the injected signals of the VCSEL2 neuron, corresponding to (a2), (c2), and (e2) in Fig. 4, and Fig. 7(b2) shows the output response of VCSEL2, mapped as [1, 2, 2] in sequence, corresponding to (b2), (d2), and (f2) in Fig. 4. Combining the results of the two VCSELs, the recognized labels are [1, 1], [1, 2], and [2, 2] for the three samples, realizing correct classification and recognition.


Fig. 7. The experimental results. (a1)-(a2) Input and output of VCSEL1; (b1)-(b2) input and output of VCSEL2. The intervals 21∼29 ns, 43∼51 ns, and 65∼73 ns correspond to the samples of Iris-setosa, Iris-versicolor, and Iris-virginica, respectively.


To demonstrate the stability of the experimental results, we repeated the experiment for 200 trials. The output signals of VCSEL1 and VCSEL2 are presented in Figs. 8(a) and 8(b), with output patterns [1, 1, 2] and [1, 2, 2], respectively. The outputs are almost constant over the 200 trials, verifying the stability of the results. To further demonstrate the classification capacity of the output-layer neurons, we choose 47 samples from the test dataset to run on the hardware platform. All 47 samples are classified correctly in simulation with the structure of Fig. 1 and the weights of Fig. 3. Running these 47 samples for two epochs, 87 instances are classified correctly on the experimental platform, so the accuracy of mapping from simulation to hardware is about 92.5% (87/(47 × 2) = 92.5%). The losses are likely due to the noise and limited precision of the experimental platform. For instance, in simulation, two stimuli whose injection strengths differ by 0.1 (a.u.) can be distinguished by the photonic neuron, whereas they cannot be distinguished by the photonic neuron of the experimental platform. Consequently, the overall classification accuracy of the iris dataset on the photonic experimental platform is about 85.25% (92.16% from simulation × 92.5% from mapping to hardware = 85.25%).


Fig. 8. Experimental temporal maps plotting superimposed time series captured at the outputs of VCSEL1 (a) and VCSEL2 (b).


5. Conclusion

In conclusion, we designed a new supervised learning algorithm for SNN classification tasks and realized hardware-algorithm co-design. In the network, the input information is encoded in the intensities of Gaussian spike trains, and the output is decoded from the spike patterns of the output neurons; such a coding scheme is more robust in experiment. We numerically and experimentally demonstrated the SNN on the iris dataset classification task.

In an actual optical neuromorphic platform, the computed results often differ greatly from the theoretical results because of device inconsistency. For instance, different kinds of devices (such as VCSELs, Fabry–Pérot lasers, and distributed feedback semiconductor lasers) have different excitability thresholds and different integration time windows. Therefore, it is also necessary to study algorithmic compensation methods for device inconsistency. Finally, how to implement spike coding with good tolerance for the photonic SNN structure and algorithm is the focus of future research and requires collaborative design and optimization of the architecture and algorithm.

Funding

National Key Research and Development Program of China (2021YFB2801900, 2021YFB2801901, 2021YFB2801902, 2021YFB2801904); National Natural Science Foundation of China (61974177, 61674119, 62204196, 62205258); National Outstanding Youth Science Fund Project of National Natural Science Foundation of China (62022062); State Key Laboratory of Advanced Optical Communication Systems and Networks (2022GZKF010); Fundamental Research Funds for the Central Universities (QTZX23041); National Key Laboratory Foundation 2021-JCJQ-LB-006 (6142411512119).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. R. Xiao, Y. Wan, B. Yang, H. Zhang, H. Tang, D. F. Wong, and B. Chen, “Towards energy-preserving natural language understanding with spiking neural networks,” IEEE/ACM Trans. Audio Speech Lang. Process. 31, 439–447 (2023). [CrossRef]  

2. M. Dampfhoffer, T. Mesquida, A. Valentian, and L. Anghel, “Are SNNs really more energy-efficient than ANNs? an in-depth hardware-aware study,” IEEE Trans. Emerg. Top. Comput. Intell. 1–11 (2022) (to be published).

3. J. Büchel, G. Lenz, Y. Hu, S. Sheik, and M. Sorbaro, “Adversarial attacks on spiking convolutional neural networks for event-based vision,” Front. Neurosci. 16, 1068193 (2022).

4. A. Balaji, S. Song, A. Das, J. Krichmar, N. Kandasamy, and F. Catthoor, “Enabling resource-aware mapping of spiking neural networks via spatial decomposition,” IEEE Comput. Archit. Lett. 13(3), 142–145 (2020). [CrossRef]  

5. M. Arsalan, A. Santra, and V. Issakov, “Power-efficient gesture sensing for edge devices: mimicking fourier transforms with spiking neural networks,” Appl. Intell. 1, 1–16 (2022). [CrossRef]  

6. F. Ponulak and A. Kasiński, “Supervised learning in spiking neural networks with ReSuMe: sequence learning, classification, and spike shifting,” Neural Comput. 22(2), 467–510 (2010). [CrossRef]  

7. R. Gütig and H. Sompolinsky, “The tempotron: a neuron that learns spike timing-based decisions,” Nat. Neurosci. 9(3), 420–428 (2006). [CrossRef]  

8. C. Lee, G. Srinivasan, P. Panda, and K. Roy, “Deep spiking convolutional neural network trained with unsupervised spike-timing-dependent plasticity,” IEEE Trans. Cogn. Dev. Syst. 11(3), 384–394 (2019). [CrossRef]  

9. Y. Zhang, S. Xiang, X. Guo, A. Wen, and Y. Hao, “A modified supervised learning rule for training a photonic spiking neural network to recognize digital patterns,” Sci. China Inf. Sci. 64(2), 122403 (2021). [CrossRef]  

10. S. Zhou, X. Li, Y. Chen, S. T. Chandrasekaran, and A. Sanyal, “Temporal-coded deep spiking neural network with easy training and robust performance,” In Proceedings of the AAAI Conference on Artificial Intelligence, 35(12), 11143–11151 (2021).

11. Y. Wu, L. Deng, G. Li, J. Zhu, and L. Shi, “Spatio-temporal backpropagation for training high-performance spiking neural networks,” Front. Neurosci. 12, 331 (2018).

12. J. H. Lee, T. Delbruck, and M. Pfeiffer, “Training deep spiking neural networks using backpropagation,” Front. Neurosci. 10, 508 (2016).

13. S. Xiang, S. Jiang, X. Liu, T. Zhang, and L. Yu, “Spiking VGG7: deep convolutional spiking neural network with direct training for object recognition,” Electronics 11(13), 2097 (2022). [CrossRef]  

14. Y. Wu, L. Deng, G. Li, J. Zhu, Y. Xie, and L. Shi, “Direct training for spiking neural networks: faster, larger, better,” In Proceedings of the AAAI conference on artificial intelligence, 33(01), 1311–1318 (2019).

15. E. O. Neftci, H. Mostafa, and F. Zenke, “Surrogate gradient learning in spiking neural networks: bringing the power of gradient-based optimization to spiking neural networks,” IEEE Signal Process. Mag. 36(6), 51–63 (2019). [CrossRef]  

16. P. A. Merolla, J. V. Arthur, R. Alvarez-Icaza, et al., “A million spiking-neuron integrated circuit with a scalable communication network and interface,” Science 345(6197), 668–673 (2014). [CrossRef]  

17. M. Davies, N. Srinivasa, T. H. Lin, et al., “Loihi: a neuromorphic manycore processor with on-chip learning,” IEEE Micro 38(1), 82–99 (2018). [CrossRef]  

18. S. Moradi, N. Qiao, F. Stefanini, and G. Indiveri, “A scalable multicore architecture with heterogeneous memory structures for dynamic neuromorphic asynchronous processors (DYNAPs),” IEEE Trans. Biomed. Circuits Syst. 12(1), 106–122 (2018). [CrossRef]  

19. A. Cem, O. Jovanovic, S. Yan, Y. Ding, D. Zibar, and F. D. Ros, “Data-efficient modeling of optical matrix multipliers using transfer learning,” arXiv, arXiv:2211.16038 (2022).

20. F. Brückerhoff-Plückelmann, I. Bente, D. wendland, J. Feldmann, C. D. Wright, H. Bhaskaran, and W. Pernice, “A large scale photonic matrix processor enabled by charge accumulation,” Nanophotonics 12(5), 819–825 (2023). [CrossRef]  

21. M. Eslaminia and S. Le Beux, “Toward large scale all-optical spiking neural networks,” in 2022 IFIP/IEEE 30th International Conference on Very Large-Scale Integration (VLSI-SoC), Patras, Greece, 1–6 Oct. 2022 - Oct. 2022.

22. M. J. Filipovich, Z. Guo, M. AI-Qadasi, B. A. Marquez, H. D. Morison, V. J. Sorger, P. R. Prucnal, S. Shekhar, and B. J. Shastri, “Silicon photonic architecture for training deep neural networks with direct feedback alignment,” Optica 9(12), 1323–1332 (2022). [CrossRef]  

23. D. Owen-Newns, J. Robertson, M. Hejda, and A. Hurtado, “Photonic spiking neural networks with highly efficient training protocols for ultrafast neuromorphic computing systems,” arXiv, arXiv:2211.12239 (2022).

24. B. Shi, N. Calabretta, and R. Stabile, “Parallel photonic convolutional processing on-chip with cross-connect architecture and cyclic AWGs,” IEEE J. Select. Topics Quantum Electron. 29(2), 1–10 (2023). [CrossRef]  

25. C. Wu, X. Yang, Y. Chen, and M. Li, “Photonic bayesian neural network using programmed optical noises,” IEEE J. Select. Topics Quantum Electron. 29(2: Optical Computing), 1–6 (2023). [CrossRef]  

26. X. Lian, J. Jiang, J. Fu, X. Wan, X. Liu, Z. Cai, and L. Wang, “Phase-change nanophotonic circuits with crossbar electrodes and integrated microheaters,” IEEE Electron Device Lett. 43(12), 2192–2195 (2022). [CrossRef]  

27. Z. Liu, P. Guo, P. Zhao, W. Hou, and L. Guo, “An energy-efficient non-volatile silicon photonic accelerator for convolutional neural networks (NVSP-CNNs),” In Asia Communications and Photonics Conference, T4A-244. Optical Society of America, (2021). [CrossRef]  

28. I. Teofilovic, J. Crnjanski, M. Banovic, M. Krstic, and D. Gvozdic, “An all-optical perceptron for binary classification,” in 2021 29th Telecommunications Forum (TELFOR), Belgrade, Serbia, 1–4, Nov. 2021 - Nov. 2021.

29. B. Tossoun, D. Liang, and R. G. Beausoleil, “Heterogeneously integrated III/V-on-Si injection seeding laser neuron,” in 2022 28th International Semiconductor Laser Conference (ISLC), Matsue, Japan, 1–2 Oct. 2022 - Oct. 2022.

30. Z. Fu, Z. Wang, P. Bienstman, R. Jiang, J. Wang, and C. Wu, “Programmable low-power consumption all-optical nonlinear activation functions using a micro-ring resonator with phase-change materials,” Opt. Express 30(25), 44943–44953 (2022). [CrossRef]  

31. A. Jha, C. Huang, H.-T. Peng, B. Shastri, and P. R. Prucnal, “Photonic spiking neural networks and graphene-on-silicon spiking neurons,” J. Lightwave Technol. 40(9), 2901–2914 (2022). [CrossRef]  

32. K. E. Chlouverakis and M. J. Adams, “Two-section semiconductor lasers subject to optical injection,” IEEE J. Select. Topics Quantum Electron. 10(5), 982–990 (2004). [CrossRef]  

33. A. G. Vladimirov and D. Turaev, “Model for passive mode locking in semiconductor lasers,” Phys. Rev. A 72(3), 033808 (2005). [CrossRef]  

34. T. Deng, J. Robertson, and A. Hurtado, “Controlled propagation of spiking dynamics in vertical-cavity surface-emitting lasers: towards neuromorphic photonic networks,” IEEE J. Select. Topics Quantum Electron. 23(6), 1–8 (2017). [CrossRef]  

35. S. Y. Xiang, H. Zhang, X. Guo, J. Li, A. Wen, W. Pan, and Y. Hao, “Cascadable neuron-like spiking dynamics in coupled VCSELs subject to orthogonally polarized optical pulse injection,” IEEE J. Select. Topics Quantum Electron. 23(6), 1–7 (2017). [CrossRef]  

36. M. Willemsen, A. S. van de Nes, M. P. van Exter, J. P. Woerdman, M. Brunner, and R. Hövel, “Self-pulsations in vertical-cavity semiconductor lasers,” Appl. Phys. Lett. 77(22), 3514–3516 (2000). [CrossRef]  

37. T. Deng, J. Robertson, Z. Wu, G. Xia, X. Lin, X. Tang, Z. Wang, and A. Hurtado, “Stable propagation of inhibited spiking dynamics in vertical-cavity surface-emitting lasers for neuromorphic photonic networks,” IEEE Access 6, 67951–67958 (2018). [CrossRef]  

38. M. A. Nahmias, B. J. Shastri, A. N. Tait, and P. R. Prucnal, “A leaky integrate-and-fire laser neuron for ultrafast cognitive computing,” IEEE J. Select. Topics Quantum Electron. 19(5), 1–12 (2013). [CrossRef]  

39. J. L. A. Dubbeldam and B. Krauskopf, “Self-pulsations of lasers with saturable absorber: dynamics and bifurcations,” Opt. Commun. 159(4-6), 325–338 (1999). [CrossRef]  

40. S. Xiang, Y. Shi, X. Guo, Y. Zhang, H. Wang, D. Zheng, Z. Song, Y. Han, S. Gao, S. Zhao, B. Gu, H. Wang, X. Zhu, L. Hou, X. Chen, W. Zheng, X. Ma, and Y. Hao, “Hardware-algorithm collaborative computing with photonic spiking neuron chip based on an integrated Fabry–Perot laser with a saturable absorber,” Optica 10(2), 162–171 (2023). [CrossRef]  

41. B. Ma, J. Chen, and W. Zou, “A DFB-LD-based photonic neuromorphic network for spatiotemporal pattern recognition,” 2020 Optical Fiber Communications Conference and Exhibition (OFC), San Diego, CA, USA, 1–3 (2020).

42. V. A. Pammi, K. Alfaro-Bittner, M. G. Clerc, and S. Barbay, “Photonic computing with single and coupled spiking micropillar lasers,” IEEE J. Select. Topics Quantum Electron. 26(1), 1–7 (2020). [CrossRef]  

43. T. Chen, P. Zhou, Y. Huang, Y. Zeng, S. Xiang, and N. Li, “Boolean logic gates implemented by a single photonic neuron based on a semiconductor Fano laser,” Opt. Commun. 1(8), 1859–1866 (2022). [CrossRef]  

44. J. Feldmann, N. Youngblood, C. D. Wright, H. Bhaskaran, and W. H. P. Pernice, “All-optical spiking neurosynaptic networks with self-learning capabilities,” Nature 569(7755), 208–214 (2019). [CrossRef]  

45. Y. Han, S. Xiang, Z. Ren, C. Fu, A. Wen, and Y. Hao, “Delay-weight plasticity-based supervised learning in optical spiking neural networks,” Photonics Res. 9(4), B119–B127 (2021). [CrossRef]  

46. S. Xiang, Z. Ren, Z. Song, Y. Zhang, X. Guo, G. Han, and Y. Hao, “Computing primitive of fully VCSEL-based all-optical spiking neural network for supervised learning and pattern classification,” IEEE Trans. Neural Netw. Learning Syst. 32(6), 2494–2505 (2021). [CrossRef]  

47. D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” Nature 323(6088), 533–536 (1986). [CrossRef]  

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

Figures (8)

Fig. 1.
Fig. 1. The structure of the neural network. ${N_{x1}}$–${N_{x4}}$ are neurons in the input layer, ${N_{h1}}$–${N_{h4}}$ are neurons in the hidden layer, and ${N_{o1}}$–${N_{o2}}$ are neurons in the output layer. The neurons of the hidden and output layers are simulated by the VCSEL-SA model.
Fig. 2.
Fig. 2. (a)–(d) The distribution of the iris dataset; (e) the training loss and test accuracy versus training epoch. The test accuracies for the three categories are 100%, 88.24%, and 88.24%, denoted Acc1, Acc2, and Acc3, respectively.
Fig. 3.
Fig. 3. The evolution of the weights from the input to hidden layer (a) and of those from the hidden to output layer (b). The weight distributions of ${\omega _{\textrm{i}h}}$ (c) and ${\omega _{hj}}$ (d) at the 300th training epoch.
Fig. 4.
Fig. 4. The input and output patterns of the output neurons for three samples from different categories. (a1)–(b2) An example from Iris-setosa and the corresponding output [1, 1]; (c1)–(d2) an example from Iris-versicolor and the corresponding output [1, 2]; (e1)–(f2) an example from Iris-virginica and the corresponding output [2, 2]. (a1)–(f1) correspond to the inputs and outputs of ${N_{o1}}$; (a2)–(f2) correspond to the inputs and outputs of ${N_{o2}}$.
Fig. 5.
Fig. 5. The experimental setup for the demonstration of the proposed SNN. TL1, TL2: tunable laser; OI1, OI2: optical isolator; PC1, PC2, PC3 and PC4: polarization controllers; MZM1, MZM2: Mach-Zehnder intensity modulators; VOA1, VOA2: variable optical attenuator; LDC Bias & Temp: laser diode bias current and temperature controller. EDFA1, EDFA2: erbium doped fiber amplifier; PD1, PD2: photodetector; OSC: oscilloscope; AWG: arbitrary waveform generator.
Fig. 6.
Fig. 6. Spectra of VCSEL1 (a) and VCSEL2 (b) under optical injection.
Fig. 7.
Fig. 7. The experimental results. (a1)–(a2) Input and output of VCSEL1; (b1)–(b2) input and output of VCSEL2. The interval 21–29 ns corresponds to the Iris-setosa sample, 43–51 ns to the Iris-versicolor sample, and 65–73 ns to the Iris-virginica sample.
Fig. 8.
Fig. 8. Experimental temporal maps plotting superimposed time series captured at the outputs of VCSEL1 (a1) and VCSEL2 (a2).

Equations (6)


$$E = \frac{1}{2}\sum_{j=1}^{N_o} (m_j - d_j)^2$$
$$\Delta \omega_{hj} = -\eta \frac{\partial E}{\partial \omega_{hj}}, \quad \Delta \omega_{ih} = -\eta \frac{\partial E}{\partial \omega_{ih}}$$
$$\Delta \omega_{hj} = -\eta \frac{\partial E}{\partial y_j}\frac{\partial y_j}{\partial \beta_j}\frac{\partial \beta_j}{\partial \omega_{hj}} = \eta (d_j - m_j)\, y_h$$
$$\Delta \omega_{ih} = -\eta \frac{\partial E}{\partial y_j}\frac{\partial y_j}{\partial \beta_j}\frac{\partial \beta_j}{\partial \alpha_h}\frac{\partial \alpha_h}{\partial \omega_{ih}} = \eta \sum_{j=1}^{N_o} \omega_{hj}(d_j - m_j)\, x_i$$
$$\omega_{hj} \leftarrow \omega_{hj} + \Delta \omega_{hj}$$
$$\omega_{ih} \leftarrow \omega_{ih} + \Delta \omega_{ih}$$
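The weight-update rules above can be sketched in a few lines of code. The following is a minimal NumPy sketch, not the authors' implementation: it assumes the simplified derivatives used in the equations (the derivative of the output readout with respect to its input is taken as 1), and the array names (`x`, `y_h`, `m`, `d`, `w_ih`, `w_hj`) are hypothetical.

```python
import numpy as np

def bp_step(x, y_h, m, d, w_ih, w_hj, eta=0.1):
    """One simplified BP weight update following the equations above.

    x    : input vector, shape (N_i,)
    y_h  : hidden-layer outputs, shape (N_h,)
    m, d : actual and desired spike counts of the output neurons, shape (N_o,)
    w_ih : input-to-hidden weights, shape (N_i, N_h)
    w_hj : hidden-to-output weights, shape (N_h, N_o)
    """
    err = d - m                             # (d_j - m_j)
    dw_hj = eta * np.outer(y_h, err)        # Δω_hj = η (d_j - m_j) y_h
    dw_ih = eta * np.outer(x, w_hj @ err)   # Δω_ih = η Σ_j ω_hj (d_j - m_j) x_i
    return w_ih + dw_ih, w_hj + dw_hj
```

Applying `bp_step` once per sample implements the final two update equations, with the error at the output layer propagated back to the input-to-hidden weights through the hidden-to-output weights.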