Hybrid photonic deep convolutional residual spiking neural networks for text classification

Yahui Zhang; Shuiying Xiang; Shuqing Jiang; Yanan Han; Xingxing Guo; Ling Zheng; Yuechun Shi; Yue Hao

doi:10.1364/OE.497218

1. Introduction

With the rapid development of artificial intelligence (AI) and big data technology, the volume of generated data is growing exponentially. Conventional computers based on the von Neumann architecture find it increasingly challenging to meet the computational demands of this data explosion. Implementing spiking neural networks (SNNs) on a neuromorphic platform offers a promising solution for energy-efficient and low-latency computation [1].

The SNN, regarded as the third generation of artificial neural networks (ANNs), uses discrete spikes for message transmission and computation, and offers powerful capabilities for nonlinear computation, asynchronous event-driven information processing, self-learning and so on [2]. Compared to traditional ANNs, SNNs are more biologically plausible. In addition, SNNs are energy efficient and consume little power, which makes them more suitable for embedded applications with constrained resources [3,4]. However, the signals transmitted in SNNs are discrete binary signals that cannot be differentiated, and the impulsive form of the activation function makes the direct application of gradient-based optimization algorithms challenging. Training difficulty is therefore one of the significant challenges of SNNs. Various training algorithms, such as unsupervised learning based on the spike-timing-dependent plasticity (STDP) rule [5,6], SpikeProp [7,8], the Tempotron learning rule [9], and ReSuMe [10], have been developed for shallow SNNs. However, because of the shallow structure, the application of these SNNs is still limited to small-scale datasets.

For deep SNNs, the indirect training method, also called ANN-to-SNN conversion, has attracted a lot of attention due to its high performance [11–13]. For a conversion with rate encoding, the ReLU activation function is represented by the firing rate of the integrate-and-fire (IF) neuron. The conversion relies on training the ANN with the backpropagation algorithm, and thus avoids the difficulties of direct SNN training. The conversion method shows high performance in inference and can be implemented with large-scale network structures and datasets. The reported accuracy for a converted SNN with the VGG-16 network architecture reached 91.55% on CIFAR-10 and 69.96% on ImageNet [14]. However, this approach has inherent limitations, such as performance degradation due to constraints imposed on the original ANN and the need to simulate several hundred to several thousand time steps during forward inference, which can result in additional latency and energy consumption.

To reduce the inference latency of SNN, direct training with backpropagation has been rapidly developed. With the introduction of surrogate gradients [15,16], the problem of non-differentiable spike functions can be avoided, allowing direct training of deep SNNs that can successfully perform on large datasets. Surrogate gradient works by replacing the gradient of the spiking neuron function with a similar-shaped function that can be differentiated. During forward propagation, the output of the spiking neuron is either 0 or 1 with the step function. During back propagation, if the spiking neuron fires, a surrogate function like the arctan function is used to calculate its gradient. If the neuron does not fire a spike, then the derivative of the activation of the corresponding neuron is zero.
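The forward/backward split described above can be illustrated with a minimal sketch. It assumes one common form of the arctan surrogate, g(x) = arctan(πα x/2)/π + 1/2, where x is the membrane voltage minus the threshold; the slope parameter `alpha` and the function names are illustrative assumptions, and the exact expression used by the authors is not given in the text.

```python
import math


def heaviside(x):
    """Forward pass: emit a spike (1.0) when x = U - U_th is positive, else 0.0."""
    return 1.0 if x > 0 else 0.0


def atan_surrogate(x, alpha=2.0):
    """Smooth stand-in for the step function (assumed form, alpha is hypothetical)."""
    return math.atan(math.pi / 2 * alpha * x) / math.pi + 0.5


def atan_surrogate_grad(x, alpha=2.0):
    """Derivative of atan_surrogate, used in place of the step function's gradient."""
    return (alpha / 2) / (1 + (math.pi / 2 * alpha * x) ** 2)


# The forward output stays binary, while the backward pass sees a finite slope
# that is largest at the threshold and decays away from it.
for x in (-1.0, -0.1, 0.0, 0.1, 1.0):
    print(x, heaviside(x), round(atan_surrogate_grad(x), 4))
```

The step function itself has zero derivative almost everywhere, so substituting a smooth derivative only in the backward pass is what lets gradient-based optimizers update the weights.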

Photonic neuromorphic computing has become increasingly popular in recent years because of its inherent advantages, such as high speed, wide bandwidth, and massive parallelism [17–25]. The training algorithms for photonic SNN have also been developed extensively [26–31]. However, due to the limited integration scale of photonic integrated circuits (PICs), photonic SNNs are currently only capable of shallow architectures and simple AI tasks.

Natural language processing (NLP) is a crucial field of research within AI [32]. Text classification is a significant application of NLP and plays a key role in text information processing [33–39]. Text classification refers to the automatic classification and tagging of text by a computer according to certain classification standards. There are numerous applications for text classification, including sentiment analysis, news classification, topic labeling, topic classification, question answering systems, natural language inference, relationship classification, event prediction and so on. However, photonic SNNs have not yet been utilized for solving text classification tasks.

In this work, we propose a deep convolutional residual spiking neural network (DCRSNN) and hybrid photonic DCRSNN for solving text classification. By incorporating residual connections into the network, the network can effectively conduct deep training, improve the representational capacity of the network, and make SNNs more suitable for text processing tasks. We utilize the surrogate gradient method to enable end-to-end training of deep SNN on four different public datasets, and conduct simulation verification and comparative analysis under different reset modes, surrogate functions and observation time (T).

The rest of the paper is organized as follows. Section 2 describes the architecture of the DCRSNN and the hybrid photonic DCRSNN, with a focus on each functional part. In Section 3, the training methods and datasets are described. In Section 4, the text classification results under different conditions are presented. Conclusions are provided in Section 5.

2. Network structure and principle

The overall structure of the DCRSNN network is presented in Fig. 1. It comprises three main functional parts: the spike encoder, the spike feature extractor and the spike classifier. Here, the input samples of length L are first transformed into word vectors of width W through word embedding techniques [40,41]. The resulting word vectors are then convolved by the first convolution layer (Conv1), and the calculated feature map is repeated T times in the time dimension. The resulting tensor of size T × L × 1 × C is then passed into the spiking neuron layer for activation. After feature extraction by the convolution layers Conv2 and Conv3, a tensor of size T × (L−2) × 1 × C is obtained. After the maximum pooling and Conv2 convolution layers, the text length is reduced, but the depth remains the same. The loop of maximum pooling and Conv2 convolution continues until the text length is 1, at which point the loop stops. Finally, a tensor of size T × 1 × 1 × C is obtained. After calculating the firing rate, a feature map of size 1 × 1 × C is obtained. In Fig. 1, the blue rectangle represents the word vector, the blue cuboid represents the feature map, and the red cuboid represents the leaky integrate-and-fire (LIF) neuron.

Fig. 1. The overall structure of the DCRSNN network for text classification. The network is divided into three functional parts: the spike encoder, the spike feature extractor and the spike classifier. L is the number of words in an input sample. W is the width of the word vectors, here W = 300. C denotes the number of convolutional kernels, here C = 250. T is the number of time steps.

As data propagate through the network, the depth of the tensor remains unchanged before and after each convolutional operation, and only the dimension of the plane changes. The dimension of the plane is controlled by the convolutional layers and the pooling layer. The parameters of the convolution and pooling layers are as follows. The number of convolution kernels is 250 for Conv1, Conv2 and Conv3. For Conv1, the kernel size is k = 3 × 300, the stride is s = 1, and the padding p = (0, 0, 1, 1) means that padding is applied only at the top and bottom, i.e., along the text-length axis. For Conv2, the kernel size is k = 3 × 1, the stride is s = 1, and p = (0, 0, 1, 1). For Conv3, the kernel size is k = 3 × 1, the stride is s = 1, and no padding is used. For MaxPool1, the kernel size is k = 3 × 1 and s = 2.
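The effect of these parameters on the plane dimensions can be checked with the usual output-size formula, floor((n + padding − k) / s) + 1. The sketch below is only an illustration: the sample length L = 50 is hypothetical, and because the padding of MaxPool1 is not stated in the text, both an unpadded variant and a variant with one assumed padding row at the bottom are shown.

```python
def out_len(n, k, s=1, pad_before=0, pad_after=0):
    """Output size along one axis for kernel size k, stride s and explicit padding."""
    return (n + pad_before + pad_after - k) // s + 1


L, W = 50, 300  # L is a hypothetical sample length; W = 300 as stated for the word vectors

width = out_len(W, k=300)                                    # Conv1 kernel spans the full width
conv1 = out_len(L, k=3, s=1, pad_before=1, pad_after=1)      # padded: length preserved
conv2 = out_len(conv1, k=3, s=1, pad_before=1, pad_after=1)  # padded: length preserved
conv3 = out_len(conv2, k=3, s=1)                             # unpadded: length L - 2

pool_unpadded = out_len(conv3, k=3, s=2)               # no padding
pool_padded = out_len(conv3, k=3, s=2, pad_after=1)    # assumed one padding row at the bottom

print(width, conv1, conv2, conv3, pool_unpadded, pool_padded)
```

With the assumed bottom padding, the pooling output is exactly half of its input length, which matches the halving described for the feature extractor; without padding it is one short of half.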

In the following, the principles of the basic operations and of the three functional parts are introduced.

2.1 Spiking neuron model

A spiking neuron model is a mathematical model used to simulate the working principle of biological neurons. Here, we consider the LIF model, which is commonly used in computational neuroscience [42].

The charging equation of membrane voltage U(t) of LIF neurons can be expressed as:

$$U(t) = U(t-1) + \frac{1}{\tau}\left(-U(t-1) + X(t)\right) \tag{1}$$

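Here, τ is the membrane time constant and X(t) is the input to the neuron at time step t. When the membrane voltage exceeds the threshold U_th, the neuron releases a spike. The discrete equation of the firing stage of the spiking neuron is as follows: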

$$S[t] = \begin{cases} 1, & \textrm{if}\; U[t] > U_{\textrm{th}} \\ 0, & \textrm{otherwise} \end{cases} \tag{2}$$
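Equations (1) and (2) can be sketched as a single update step. The reset rule is not written out in this section, so the sketch assumes the two usual definitions: a hard reset returns the voltage to 0 after a spike, and a soft reset subtracts the threshold. The values τ = 2 and the constant input of 1.5 are illustrative assumptions.

```python
def lif_step(u_prev, x, tau=2.0, u_th=1.0, reset="hard"):
    """One LIF update: charge (Eq. 1), fire (Eq. 2), then reset (assumed rules)."""
    u = u_prev + (x - u_prev) / tau   # charging, Eq. (1)
    spike = 1 if u > u_th else 0      # firing, Eq. (2)
    if spike:
        # Hard reset: back to 0. Soft reset: subtract the threshold.
        u_after = 0.0 if reset == "hard" else u - u_th
    else:
        u_after = u
    return spike, u_after


def run(inputs, reset):
    """Drive one neuron with a sequence of inputs and record spikes and voltages."""
    u, spikes, trace = 0.0, [], []
    for x in inputs:
        s, u = lif_step(u, x, reset=reset)
        spikes.append(s)
        trace.append(u)
    return spikes, trace


inputs = [1.5, 1.5, 1.5, 1.5]  # hypothetical constant input over four time steps
hard_spikes, hard_trace = run(inputs, "hard")
soft_spikes, soft_trace = run(inputs, "soft")
print(hard_spikes, hard_trace)
print(soft_spikes, soft_trace)
```

In this run both modes fire at the same steps, but the soft reset keeps the residual voltage above the threshold (0.125 after the first spike), so information about how far the threshold was exceeded carries over to later steps.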

2.2 Spike convolution and pooling

Figure 2 shows the convolution operation of the input spikes on the LIF neuron. At time T = 1, the input spike train is {0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1}, the size of convolution kernel is 3 × 3, and the step size is 1. After convolution operation, the membrane voltage of spiking neuron is {0.05, 0.0, -0.35, 0.5}, and the threshold of the neuron membrane voltage is 1. At the current moment, if the membrane voltage is less than the threshold value, it will continue to charge, while if the membrane voltage reaches the threshold, it will discharge and release a spike and then reset the membrane voltage. Finally, the output spike is {0, 0, 0, 0}. At T = 2, the input spike train is {0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1}. The input spike at T = 2 also combines with the membrane voltage at time T = 1 to get the membrane voltage at time T = 2 as {0.075, -0.2, -0.175, 1.05}. The output spike train is {0, 0, 0, 1} after discharge operation. Similarly, at T = 3, the membrane voltage is {0.1375, 0.4, 0.2625, 0.85}, and the output spike train is {0, 0, 0, 0}.

Fig. 2. Spike convolution operation for a LIF neuron with hard reset.

Figure 3 shows the maximum pooling and average pooling operations of LIF neurons using hard reset at three time steps. At time T = 1, the input spike train is {0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0}. The pooling kernel size is 2 × 2 and the stride is 2. For average pooling, the membrane voltage after pooling at time T = 1 is {0.125, 0.125, 0.25, 0.0}. Since all membrane voltages are below the threshold, no spike is released. At T = 2, the input spike train is {0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0}, and the membrane voltages after pooling are {0.1875, 0.1875, 0.25, 0.0}, so again no spike is released. Finally, at time T = 3, the input spike train is {1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1}. After pooling, the neuron membrane voltage is {0.2188, 0.3438, 0.25, 0.25}, and the spike train after discharge is {0, 0, 0, 0}. For maximum pooling, the output spike train after pooling is {1, 1, 1, 0}, {1, 1, 1, 0} and {1, 1, 1, 1} at time steps T = 1, T = 2, and T = 3, respectively.

Fig. 3. Spike pooling operation for a LIF neuron with hard reset.
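The pooling numbers quoted for Fig. 3 can be reproduced with a short sketch. It assumes that each 16-element spike train is a 4 × 4 frame in row-major order and that the LIF time constant is τ = 2; neither is stated explicitly, but together they reproduce the quoted membrane voltages.

```python
def pool2x2(frame, reduce):
    """Apply a 2 x 2 window with stride 2 to a 4 x 4 frame stored as a flat list."""
    out = []
    for r in (0, 2):
        for c in (0, 2):
            window = [frame[(r + dr) * 4 + (c + dc)] for dr in (0, 1) for dc in (0, 1)]
            out.append(reduce(window))
    return out


frames = [  # input spike trains at T = 1, 2, 3, copied from the text
    [0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1],
]

TAU, U_TH = 2.0, 1.0  # TAU is an assumption; U_TH = 1 as in the text
u = [0.0] * 4
avg_voltages, avg_spikes, max_spikes = [], [], []
for frame in frames:
    x = pool2x2(frame, lambda w: sum(w) / len(w))        # average pooling
    u = [ui + (xi - ui) / TAU for ui, xi in zip(u, x)]   # LIF charging, Eq. (1)
    s = [1 if ui > U_TH else 0 for ui in u]              # firing, Eq. (2)
    avg_voltages.append(list(u))
    avg_spikes.append(s)
    u = [0.0 if si else ui for ui, si in zip(u, s)]      # hard reset after a spike
    max_spikes.append(pool2x2(frame, max))               # maximum pooling of the spikes

print(avg_voltages)
print(max_spikes)
```

Average pooling dilutes the sparse binary input so that no neuron reaches the threshold within three steps, whereas maximum pooling passes a spike whenever any input in the window fires.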

2.3 Spike convolution encoder

The spike convolutional encoder consists of three main parts: the word embedding layer, the first convolutional layer and the spiking neuron layer.

Text is unstructured data that cannot be used directly in numerical computation. Word embedding is a technique that allows text to be represented as low-dimensional vectors. By mapping words to vectors in a vector space, semantically similar words are clustered together. This approach has broad applicability and is useful for diverse tasks. There are currently two popular embedding algorithms: word2vec [40] and GloVe [41].

The input sample is sent to a pre-trained model such as GloVe, which converts the words in the text into word vectors, as shown in Fig. 1. Some representative words/symbols and their corresponding word vectors are listed in Table 1. The word vectors are then fed into a convolutional layer (Conv1), resulting in a tensor of size L × 1 × C, since the Conv1 kernel spans the full word-vector width W. This tensor is repeated T times along the time dimension (T is the observation time step of the spiking neuron) before being input into the spiking neuron layer for spiking activation. Finally, a tensor of size T × L × 1 × C is obtained. The encoder is integrated into the head of the entire DCRSNN network and participates in backpropagation during training, allowing the parameters of the encoding layer to be learned. By using the spike encoder, we convert the text data into spike signals, which are then processed by the spike feature extractor to extract higher-dimensional features.
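The repeat-then-activate step of the encoder can be sketched for individual feature values as follows. The feature values, τ = 2 and the hard reset to 0 are illustrative assumptions; the point is only that presenting the same analog value at every time step yields a spike train whose rate grows with that value.

```python
def encode(value, T=4, tau=2.0, u_th=1.0):
    """Present the same value at each of T steps to a LIF neuron (assumed hard reset to 0)."""
    u, spikes = 0.0, []
    for _ in range(T):
        u = u + (value - u) / tau      # charging
        s = 1 if u > u_th else 0       # firing
        spikes.append(s)
        if s:
            u = 0.0                    # hard reset
    return spikes


features = [0.5, 1.5, 3.0]             # hypothetical Conv1 outputs
trains = [encode(v) for v in features]
rates = [sum(t) / len(t) for t in trains]
print(trains)
print(rates)
```

A value below the threshold never fires, a moderate value fires on alternate steps, and a large value fires at every step, so the analog feature map is turned into a rate-coded spike tensor.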

Table 1. Words/symbols and the corresponding word vectors of the GloVe pre-trained model [41].

2.4 Spike feature extractor

The spike feature extractor is a crucial component responsible for extracting features from the spike train generated by the spike encoder. The process starts with passing the spike train through two convolution layers (Conv2 and Conv3), resulting in a T × (L−2) × 1×C tensor. The tensor is then combined with a residual connection [43–45], as shown in Fig. 4 (a). Following this, we perform the maximum pooling operation (MaxPool1), which reduces the length of the tensor to half of its original size, and the resultant tensor is denoted as PX. After this, features between adjacent words are extracted by Conv2 convolution, and the results are denoted as CX. Finally, following the idea of residual connection in ResNet [43], we add PX and CX to get X (i.e., X = PX + CX). If the spike train length of X is not equal to 1, we repeat the cyclic process shown in Fig. 4(b) until the spike train length becomes L = 1. The final result will be a feature tensor of size T × 1 × 1×C.

Fig. 4. The residual connection of the spike feature extractor. (a) Skip connection, (b) loop until L = 1.
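The pooling, convolution and addition loop can be sketched on a single channel as follows. This is a simplified illustration rather than the authors' implementation: the kernel weights and the input are hypothetical, one zero is assumed to be appended before pooling so that the length halves, and the spiking activation that follows each convolution in the real network is omitted.

```python
def max_pool1d(x, k=3, s=2, pad_after=1):
    """Max pooling along the text axis; one trailing zero is an assumed padding."""
    padded = x + [0.0] * pad_after
    return [max(padded[i:i + k]) for i in range(0, len(padded) - k + 1, s)]


def conv1d_same(x, w):
    """Length-preserving convolution with a size-3 kernel and one zero on each side."""
    padded = [0.0] + x + [0.0]
    return [sum(w[j] * padded[i + j] for j in range(3)) for i in range(len(x))]


def residual_pyramid(x, w):
    """Repeat X = PX + CX until the text length is 1, recording each length."""
    lengths = [len(x)]
    while len(x) > 1:
        px = max_pool1d(x)                     # PX: pooled input
        cx = conv1d_same(px, w)                # CX: features between adjacent positions
        x = [p + c for p, c in zip(px, cx)]    # residual addition
        lengths.append(len(x))
    return x, lengths


w = [0.25, 0.5, 0.25]                          # hypothetical kernel weights
x0 = [1.0, 0.0, 0.0, 1.0, 1.0, 0.0]            # hypothetical single-channel input
out, lengths = residual_pyramid(x0, w)
print(lengths, out)
```

Because each pass halves the length, the number of passes grows only logarithmically with the input length, and the skip path PX gives the gradient a direct route through every stage.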

2.5 Spike classifier

In DCRSNN, the spike classifier uses a fully connected layer to map input features to the output space. The firing rate of a spiking neuron is then calculated, where more spikes emitted within the observation time T lead to a higher firing rate for a particular class. This increased firing rate leads to a higher probability of determining that particular class. On the other hand, for the hybrid photonic DCRSNN, the spike classifier uses the photonic spiking neuron as the spiking activation.
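The firing-rate readout can be sketched as follows. The spike values are hypothetical and the function names are illustrative; the sketch only shows how spike counts over the observation time T are turned into a class decision.

```python
def firing_rates(spikes):
    """spikes[t][k] is the 0/1 output of class neuron k at time step t."""
    T = len(spikes)
    n_classes = len(spikes[0])
    return [sum(step[k] for step in spikes) / T for k in range(n_classes)]


def predict(spikes):
    """Pick the class whose output neuron fired most often within T steps."""
    rates = firing_rates(spikes)
    return max(range(len(rates)), key=rates.__getitem__)


# Hypothetical output spikes for T = 4 time steps and 2 classes.
spikes = [[0, 1], [1, 1], [0, 1], [0, 0]]
print(firing_rates(spikes), predict(spikes))
```

During training, the same firing rates would be compared with the target labels through the loss function rather than passed to an argmax.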

3. Training methods and datasets

3.1 Datasets

In the field of NLP, commonly used datasets include MR [46], AG News [47], IMDB [48], and Yelp review polarity [49]. MR (Movie Reviews) is a dataset used in sentiment analysis experiments. It consists of sentences labeled according to their overall sentiment polarity or their subjective status (subjective or objective), and is divided into two categories. The training set has 8530 samples, and the test set has 2132 samples. AG News is a news topic classification dataset with four categories: World, Sports, Business and Sci/Tech. Each class contains 30,000 training samples and 1,900 test samples. The IMDB dataset contains 50,000 movie reviews labeled by sentiment (positive/negative). The number of training samples is 40,000, and the number of test samples is 10,000. Yelp review polarity is a collection of user reviews from the Yelp website. It consists of textual reviews written by users, along with corresponding ratings and sentiment labels indicating the polarity of each review (positive or negative). The dataset covers various businesses and represents diverse customer experiences and opinions. It contains 560,000 reviews for training and 38,000 for testing.

We pre-processed the text data in each dataset by removing special characters, converting all letters to lowercase, and eliminating word abbreviations. Once pre-processing was complete, we used the pre-trained GloVe model to convert the samples in the dataset into word vectors [41]. Additional parameters for the various datasets can be found in Table 2.

Table 2. Sample parameters of different text datasets.

3.2 Training parameters

The hardware environment of this experiment consists of four 18-core Intel Xeon Gold 6240 CPUs with a running frequency of 2.60 GHz and a memory capacity of 32 GB, together with an Nvidia Tesla V100 GPU with 32 GB of video memory. The software environment is based on Python 3.9. The deep learning framework PyTorch (version 1.11.0) was used in combination with the spiking neural network framework SpikingJelly [50] (version 0.0.0.0.8).

The parameters of the training process are shown in Table 3. The simulation time T ranges from 1 to 16 ms with a time step of 1 ms. The spiking neurons are LIF neurons with a membrane voltage threshold of 1 mV. During training, the Adam optimization algorithm [51] is used with an initial learning rate of 0.001. The learning rate is adjusted through exponential decay with a base of 0.9. The cross-entropy loss function is used. Input samples are mapped to 300-dimensional vectors using pre-trained GloVe word vectors. The training batch size is 256, the number of convolution kernels is 250, and the number of training epochs is 50.
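The learning rate schedule can be sketched as below, under two assumptions that the text does not confirm: the value 0.9 is the multiplicative decay factor, and the decay is applied once per epoch. In PyTorch this behavior would typically come from an exponential learning rate scheduler; the plain function here only shows the resulting values.

```python
def learning_rate(epoch, lr0=1e-3, gamma=0.9):
    """Exponentially decayed learning rate, lr0 * gamma**epoch (per-epoch decay assumed)."""
    return lr0 * gamma ** epoch


EPOCHS = 50  # number of training epochs stated in the text
schedule = [learning_rate(e) for e in range(EPOCHS)]
print(schedule[0], schedule[1], schedule[10], schedule[-1])
```

Under these assumptions the learning rate falls by roughly a factor of three every ten epochs, so most of the large weight updates happen early in training.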

Table 3. Training parameters of SCNN model.

3.3 Training algorithm

The training process of the DCRSNN is based on the surrogate gradient method, which is similar to the traditional backpropagation algorithm [15]. The network weights are adjusted iteratively in order to minimize the loss function. Each iteration includes a forward and a backward propagation step. During forward propagation, input samples are fed into the network using the current weights, and the output is calculated. Then, during backpropagation, the difference between the output and the expected output is calculated, and this difference is used to update the weights so as to reduce the error. The training process is shown in Algorithm 1.

Algorithm 1. Training process of the DCRSNN (provided as an image in the original article).

4. Results and discussions

According to the experimental method described above, the trained network was tested under different conditions. Comparison experiments were conducted with different reset methods, surrogate functions and observation times, and tested on MR, AG News, IMDB and Yelp review polarity datasets.

4.1 Different reset methods and surrogate functions

Table 4 displays the maximum convergence classification accuracy achieved by the model on the four datasets when using maximum pooling with different reset modes and surrogate functions at an observation time of T = 4. The bold numbers indicate the maximum classification accuracy of the DCRSNN across the reset modes and surrogate functions for each dataset. The performance of the deep pyramid CNN (DPCNN) architecture is also included in these experiments for comparison. The maximum classification accuracy obtained by DPCNN across the four datasets is 77.6%, 91.43%, 91.54% and 94.35%, respectively.

Table 4. The classification accuracy of the model using maximum pooling under different reset modes and surrogate functions with T = 4.

Comparing the soft and hard reset modes, soft-arctan is better than hard-arctan on all four datasets. For the MR and AG News datasets, soft-softsign is more accurate than hard-softsign, whereas for the IMDB and Yelp Review Polarity datasets, hard-softsign is more accurate than soft-softsign. For the AG News, IMDB and Yelp Review Polarity datasets, soft-sigmoid is more accurate than hard-sigmoid, whereas for the MR dataset, hard-sigmoid is more accurate than soft-sigmoid. The relative accuracy of soft and hard resets thus varies with the surrogate function and the dataset. In general, the accuracy of the soft reset method in the deep convolutional SNN is slightly better than that of the hard reset.

Comparing the surrogate functions used by the spiking neurons, the sigmoid function achieves better classification accuracy than the softsign function on the four datasets, whereas the relative ranking of the sigmoid, softsign and arctan functions varies from dataset to dataset.

The DCRSNN had the highest classification accuracy for MR, AG News, IMDB and Yelp review polarity sets using soft-arctan, which was 76.36%, 91.03%, 88.06% and 93.99% respectively, very close to the DPCNN method.

4.2 Different pooling methods

Next, we evaluate the effects of different pooling methods on the network performance using a fixed T = 4 with a soft reset method and arctan surrogate function. As can be seen from Fig. 5, on the MR dataset, the convergence accuracy of maximum pooling on the test set is about 76.3%. Under the same conditions, the convergence accuracy for average pooling is 74.3%. On the AG News dataset, the accuracy of maximum pooling is about 90.54%, which is 0.34% higher than that for average pooling. On the IMDB dataset, the accuracy of maximum pooling is 88.06%, which is 1% higher than that of the average pooling method. For the Yelp Review Polarity dataset, convergence accuracy of maximum pooling and average pooling is roughly the same, with convergence accuracy of about 93.58%.

Fig. 5. Convergence accuracy of models on different data sets under maximum pooling and average pooling. The soft reset method is employed, and arctan surrogate function is adopted.

4.3 Different observation time T

The observation time T is a crucial parameter in the SNN model. A value of T that is too small reduces the computational complexity of the model, but cannot accurately describe how the neuron state evolves. Conversely, a value of T that is too large increases the computational cost of the network. Therefore, selecting an appropriate value for T is important, as it enables the model to capture the voltage variation of the neurons while keeping the computational cost acceptable.

Here, we adopted maximum pooling and soft-arctan, and studied the maximum convergence accuracy for the observation time ranging from T = 1 ms to 12 ms. The experimental results are shown in Fig. 6. For the MR dataset, as observation time T increases, the maximum classification accuracy initially improves and reaches 76.36% at T = 4, which is only 1.24% lower than that of DPCNN. From T = 4, the accuracy fluctuates around 75%. For AG News dataset, the convergence accuracy fluctuates around 91% with the highest convergence accuracy being 91.25% at T = 8, which is 0.18% lower than that of DPCNN. For the IMDB dataset, the maximum convergence accuracy fluctuates around 88%, with the highest convergence accuracy of 88.35% at T = 7, which is 0.49% lower than that of DPCNN. For the Yelp Review Polarity dataset, the maximum convergence accuracy fluctuates around 93%, with the highest convergence accuracy being 93.96% at T = 8. This value is 0.4% lower than that for the DPCNN.

Fig. 6. The maximum convergence accuracy of the model on different data sets under different observation time T.

For a broader comparison, we also considered other models on these NLP datasets. Table 5 shows the classification results of various models on the relevant datasets, including deep neural network models such as TextCNN [33], CharCNN [34], DPCNN [35] and BERT [36], along with SNN-based models such as Spiking Transformer [37] and Converted TextCNN [38]. Of all these models, BERT exhibited the best classification performance, with an accuracy of 95.49% on the IMDB dataset and 98.11% on Yelp review polarity. Among the SNN-based models, however, our model achieved a maximum accuracy of 76.36% on the MR dataset, which is 0.91% higher than that of Converted TextCNN (75.45%). Additionally, the maximum accuracy on the IMDB dataset (88.06%) is 1.7% higher than that achieved by Spiking Transformer (86.36%). These results suggest that the SNN model proposed in this paper holds great potential for processing text data.

Table 5. Classification accuracy of other models

4.4 Testing performance for hybrid photonic DCRSNN

Finally, we discuss the hybrid photonic DCRSNN architecture, whose spike classifier is realized using photonic spiking activation. The hybrid photonic DCRSNN has several advantages, such as high speed, wide bandwidth, and low power consumption [52]. We proposed and fabricated a monolithically integrated photonic spiking neuron chip based on a distributed feedback (DFB) laser with an embedded saturable absorber (DFB-SA) [53]. The DFB-SA chip contains a gain region and a saturable absorber (SA) region. As can be seen from Fig. 7(a), self-pulsation output can be observed under proper bias conditions. Here, the spiking activation function is obtained by fitting the experimentally measured dependence of the spike frequency on the gain current, as shown in Fig. 7(b) and 7(c). By adopting the ANN-to-SNN conversion method with the fitted photonic spiking activation, we performed experiments on the four datasets, and the results are presented in Table 6. We find that the accuracy for the four datasets is comparable to that shown in Table 4. The highest accuracy achieved is 76.17% for MR, 91.09% for AG News, 87.92% for IMDB and 94.15% for Yelp review polarity.

Fig. 7. (a) The self-pulsation output when the bias current of the gain region is I_G = 130 mA and the reverse bias voltage of the SA region is V_SA = -0.4 V; the measured dependence of the self-pulsation frequency on the gain current for (b) V_SA = -0.4 V and (c) V_SA = -0.8 V.

Table 6. The classification accuracy of the hybrid photonic DCRSNN using maximum pooling and T = 4

5. Conclusion

In conclusion, we proposed a deep convolutional SNN with residual connections for text classification tasks. The surrogate gradient direct training algorithm is employed for training the DCRSNN. Four datasets are benchmarked to evaluate the network performance. The results show that high classification accuracy can be achieved on all four datasets. With different combinations of reset methods and surrogate functions, the highest accuracy reaches 76.36% for MR, 91.03% for AG News, 88.06% for IMDB and 93.99% for Yelp review polarity. We further consider a hybrid photonic DCRSNN architecture and obtain comparable testing performance. To the best of our knowledge, our proposed approach achieves the best performance among the reported SNN-based text classifiers. This work provides a solution for extending the application of SNNs and photonic SNNs to the field of text classification, and is thus valuable for photonic neuromorphic computing and information processing.

Funding

National Key Research and Development Program of China (2021YFB2801900, 2021YFB2801901, 2021YFB2801902, 2021YFB2801904); National Natural Science Foundation of China (No. 61974177, No. 61674119, No. 62204196, No. 62205258); National Outstanding Youth Science Fund Project of National Natural Science Foundation of China (62022062); Fundamental Research Funds for the Central Universities (QTZX23041).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. K. Roy, A. Jaiswal, and P. Panda, “Towards spike-based machine intelligence with neuromorphic computing,” Nature 575(7784), 607–617 (2019). [CrossRef]

2. C. D. Schuman, S. R. Kulkarni, M. Parsa, J. P. Mitchell, P. Date, and B. Kay, “Opportunities for neuromorphic computing algorithms and applications,” Nat. Comput. Sci. 2(1), 10–19 (2022). [CrossRef]

3. W. Maass, “Networks of spiking neurons: The third generation of neural network models,” Neural Networks 10(9), 1659–1671 (1997). [CrossRef]

4. A. Taherkhani, A. Belatreche, Y. Li, G. Cosma, L. P. Maguire, and T. M. McGinnity, “A review of learning in biologically plausible spiking neural networks,” Neural Networks 122, 253–272 (2020). [CrossRef]

5. T. Masquelier and S. J. Thorpe, “Unsupervised learning of visual features through spike timing dependent plasticity,” PLoS Comput. Biol. 3(2), e31 (2007). [CrossRef]

6. P. U. Diehl and M. Cook, “Unsupervised learning of digit recognition using spike-timing-dependent plasticity,” Frontiers Comput. Neurosci. 9, 99 (2015). [CrossRef]

7. S. M. Bohte, J. N. Kok, and H. L. Poutre, “Error-backpropagation in temporally encoded networks of spiking neurons,” Neurocomputing 48(1-4), 17–37 (2002). [CrossRef]

8. Y. Wu, L. Deng, G. Li, J. Zhu, and L. Shi, “Spatio-temporal backpropagation for training high-performance spiking neural networks,” Front. Neurosci. 12, 331 (2018). [CrossRef]

9. R. Gütig and H. Sompolinsky, “The tempotron: A neuron that learns spike timing-based decisions,” Nature Neurosci. 9(3), 420–428 (2006). [CrossRef]

10. F. Ponulak and A. Kasiński, “Supervised learning in spiking neural networks with ReSuMe: sequence learning, classification, and spike shifting,” Neural Comput. 22(2), 467–510 (2010). [CrossRef]

11. Y. Cao, Y. Chen, and D. Khosla, “Spiking deep convolutional neural networks for energy-efficient object recognition,” Int. J. Comput. Vision 113(1), 54–66 (2015).

12. B. Rueckauer, I. A. Lungu, Y. Hu, M. Pfeiffer, and S.-C. Liu, “Conversion of continuous-valued deep networks to efficient event-driven networks for image classification,” Front. Neurosci. 11, 682 (2017).

13. J. H. Ding, Z. F. Yu, Y. H. Tian, and T. J. Huang, “Optimal ANN-SNN conversion for fast and accurate inference in deep spiking neural networks,” arXiv, arXiv:2105.11654 (2021).

14. A. Sengupta, Y. Ye, R. Wang, C. Liu, and K. Roy, “Going deeper in spiking neural networks: VGG and residual architectures,” Front. Neurosci. 13, 95 (2019).

15. E. O. Neftci, H. Mostafa, and F. Zenke, “Surrogate gradient learning in spiking neural networks: Bringing the power of gradient-based optimization to spiking neural networks,” IEEE Signal Proc. Mag. 36(6), 51–63 (2019).

16. B. Cramer, S. Billaudelle, S. Kanya, A. Leibfried, A. Grubl, V. Karasenko, C. Pehle, K. Schreiber, Y. Stradmann, J. Weis, J. Schemmel, and F. Zenke, “Surrogate gradients for analog neuromorphic computing,” Proc. Natl. Acad. Sci. 119(4), e2109194119 (2022).

17. S. Xiang, S. Jiang, X. Liu, T. Zhang, and L. Yu, “Spiking VGG7: Deep convolutional spiking neural network with direct training for object recognition,” Electronics 11(13), 2097 (2022).

18. B. J. Shastri, A. N. Tait, T. Ferreira de Lima, W. H. Pernice, H. Bhaskaran, C. D. Wright, and P. R. Prucnal, “Photonics for artificial intelligence and neuromorphic computing,” Nat. Photonics 15(2), 102–114 (2021).

19. Y. Shen, N. C. Harris, S. Skirlo, M. Prabhu, T. Baehr-Jones, M. Hochberg, X. Sun, S. Zhao, H. Larochelle, D. Englund, and M. Soljacic, “Deep learning with coherent nanophotonic circuits,” Nat. Photonics 11(7), 441–446 (2017).

20. A. N. Tait, T. Ferreira de Lima, E. Zhou, A. X. Wu, M. A. Nahmias, B. J. Shastri, and P. R. Prucnal, “Neuromorphic photonic networks using silicon photonic weight banks,” Sci. Rep. 7(1), 7430 (2017).

21. H.-T. Peng, M. A. Nahmias, T. F. De Lima, A. N. Tait, and B. J. Shastri, “Neuromorphic photonic integrated circuits,” IEEE J. Sel. Top. Quantum Electron. 24(6), 1 (2018).

22. J. Feldmann, N. Youngblood, C. D. Wright, H. Bhaskaran, and W. H. Pernice, “All-optical spiking neurosynaptic networks with self-learning capabilities,” Nature 569(7755), 208–214 (2019).

23. H. Zhou, Y. Zhao, G. Xu, X. Wang, Z. Tan, J. Dong, and X. Zhang, “Chip-scale optical matrix computation for PageRank algorithm,” IEEE J. Sel. Top. Quantum Electron. 26(2), 1–10 (2020).

24. C. Huang, S. Bilodeau, T. F. De Lima, A. N. Tait, P. Y. Ma, E. C. Blow, A. Jha, H.-T. Peng, B. J. Shastri, and P. R. Prucnal, “Demonstration of scalable microring weight bank control for large-scale photonic integrated circuits,” APL Photonics 5(4), 040803 (2020).

25. Q. Cen, H. Ding, T. Hao, S. Guan, Z. Qin, J. Lyu, W. Li, N. Zhu, K. Xu, Y. Dai, and M. Li, “Large-scale coherent Ising machine based on optoelectronic parametric oscillator,” Light: Sci. Appl. 11(1), 333 (2022).

26. S. Xiang, Y. Zhang, J. Gong, X. Guo, L. Lin, and Y. Hao, “STDP-based unsupervised spike pattern learning in a photonic spiking neural network with VCSELs and VCSOAs,” IEEE J. Sel. Top. Quantum Electron. 25(6), 1–9 (2019).

27. S. Xiang, Z. Ren, Z. Song, Y. Zhang, X. Guo, G. Han, and Y. Hao, “Computing primitive of fully VCSEL-based all-optical spiking neural network for supervised learning and pattern classification,” IEEE Trans. Neural Networks Learn. Syst. 32(6), 2494–2505 (2021).

28. S. Xiang, Z. Ren, Y. Zhang, Z. Song, X. Guo, G. Han, and Y. Hao, “Training a multi-layer photonic spiking neural network with modified supervised learning algorithm based on photonic STDP,” IEEE J. Sel. Top. Quantum Electron. 27(2), 1–9 (2021).

29. Y. Han, S. Xiang, Z. Ren, C. Fu, A. Wen, and Y. Hao, “Delay-weight plasticity-based supervised learning in optical spiking neural networks,” Photonics Res. 9(4), B119–B127 (2021).

30. Z. Song, S. Y. Xiang, S. Zhao, Y. Zhang, X. Guo, Y. Tian, Y. Shi, and Y. Hao, “A hybrid-integrated photonic spiking neural network framework based on an MZI array and VCSELs-SA,” IEEE J. Sel. Top. Quantum Electron. 29(2: Optical Computing), 1–11 (2023).

31. S. Xiang, Y. Shi, X. Guo, Y. Zhang, H. Wang, D. Zheng, Z. Song, Y. Han, S. Gao, S. Zhao, B. Gu, H. Wang, X. Zhu, L. Hou, X. Chen, W. Zheng, X. Ma, and Y. Hao, “Hardware-algorithm collaborative computing with photonic spiking neuron chip based on integrated Fabry-Pérot laser with saturable absorber,” Optica 10(2), 162–171 (2023).

32. J. Hirschberg and C. D. Manning, “Advances in natural language processing,” Science 349(6245), 261–266 (2015).

33. Y. Chen, “Convolutional neural network for sentence classification,” University of Waterloo, 2015.

34. X. Zhang, J. Zhao, and Y. LeCun, “Character-level convolutional networks for text classification,” Adv. Neural Inf. Process. Syst., 2015, 28.

35. R. Johnson and T. Zhang, “Deep pyramid convolutional neural networks for text categorization,” Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vol. 1: Long Papers, pp. 562–570, 2017.

36. J. Devlin, M. W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” Proc. 2019 Conf. of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (Minneapolis, MN: Association for Computational Linguistics), pp. 4171–4186, 2019.

37. E. Mueller, V. Studenyak, D. Auge, and A. Knoll, “Spiking transformer networks: A rate coded approach for processing sequential data,” 2021 7th International Conference on Systems and Informatics (ICSAI). IEEE, pp. 1–5, 2021.

38. C. Lv, J. Xu, and X. Zheng, “Spiking convolutional neural networks for text classification,” ICLR 2023.

39. J. Huang, A. Serb, S. Stathopoulos, and T. Prodromakis, “Text classification in memristor-based spiking neural networks,” Neuromorph. Comput. Eng. 3(1), 014003 (2023).

40. T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” arXiv, arXiv:1301.3781 (2013).

41. J. Pennington, R. Socher, and C. D. Manning, “GloVe: Global vectors for word representation,” Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), 2014, pp. 1532–1543.

42. E. Izhikevich, “Which model to use for cortical spiking neurons?” IEEE Trans. Neural Netw. 15(5), 1063–1070 (2004).

43. K. He, X. Zhang, S. Ren, J. Dai, and J. Sun, “Deep residual learning for image recognition,” Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.

44. Y. Hu, H. Tang, and G. Pan, “Spiking deep residual network,” IEEE Trans. Neural Netw., Nov. 2021 (Early Access).

45. W. Fang, Z. Yu, Y. Chen, T. Huang, T. Masquelier, and Y. Tian, “Deep residual learning in spiking neural networks,” Advances in Neural Information Processing Systems, 2021, 34.

46. MR Corpus. http://www.cs.cornell.edu/people/pabo/movie-review-data/, 2002.

47. AG Corpus. http://www.di.unipi.it/~gulli/AG_corpus_of_news_articles.html, 2004.

48. Q. Diao, M. Qiu, C. Y. Wu, A. Smola, J. Jiang, and C. Wang, “Jointly modeling aspects, ratings and sentiments for movie recommendation (JMARS),” Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. 2014, pp. 193–202.

49. D. Tang, B. Qin, and T. Liu, “Document modeling with gated recurrent neural network for sentiment classification,” Proceedings of the 2015 conference on empirical methods in natural language processing. 2015, pp. 1422–1432.

50. GitHub. Available online: https://github.com/fangwei123456/spikingjelly (accessed on 17 December 2019).

51. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv, arXiv:1412.6980 (2014).

52. P. R. Prucnal, B. J. Shastri, T. F. de Lima, M. A. Nahmias, and A. N. Tait, “Recent progress in semiconductor excitable lasers for photonic spike processing,” Adv. Opt. Photonics 8(2), 228–299 (2016).

53. Y. Shi, S. Xiang, X. Guo, Y. Zhang, H. Wang, D. Zheng, Y. Zhang, Y. Han, Y. Zhao, X. Zhu, X. Chen, X. Li, and Y. Hao, “Photonic integrated spiking neuron chip based on self-pulsating DFB with saturable absorber,” Photonics Res. 11(8), 1382–1389 (2023).

Word or symbol	Word vectors
The	0.04656, 0.21318, -0.0074364, …, -0.20989, 0.053913
,	-0.25539, -0.25723, 0.13169, …, -0.12226, 0.35499
.	-0.12559, 0.01363, 0.10306, …, -0.022394, 0.13684
Of	-0.076947, -0.021211, 0.21271, …, -0.29183, -0.046533
…	…
To	-0.25756, -0.057132, -0.6719, …, 0.046744, -0.070621
And	0.038466, -0.039792, 0.082747, …, 0.011807, 0.059703
In	-0.44399, 0.12817, -0.25247, …, -0.082191, -0.06255

Dataset	MR	AG News	IMDB	Yelp review polarity
Total number of samples	10662	127600	50000	598000
Training set size	8530	120000	40000	560000
Test set size	2132	7600	10000	38000
Number of classes	2	4	2	2
Average number of words per sample	25	50	290	165

Description	Value	Description	Value
Time window T	0–12 ms	Loss function	Cross entropy
Time step dt	1 ms	Pre-trained word vectors	GloVe
Membrane voltage threshold U_th	1 mV	Training batch size	256
Optimizer	Adam	Number of convolutional kernels	250
Initial learning rate	0.001	Maximum number of training epochs	50
Learning rate decay	Exponential decay
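
As a rough illustration of how the settings in the table above might be gathered in code, the following sketch stores them in a plain Python dataclass and evaluates an exponentially decayed learning rate. All names here (`TrainConfig`, `learning_rate`, the field names) are hypothetical. The decay factor `lr_gamma` is not given in the table and is an assumed placeholder, as is the reading of the 0–12 ms window at dt = 1 ms as 12 simulation steps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values taken from the training-parameter table.
    dt_ms: float = 1.0            # time step dt
    v_threshold: float = 1.0      # membrane voltage threshold U_th (mV)
    optimizer: str = "Adam"
    initial_lr: float = 1e-3
    loss: str = "cross_entropy"
    word_vectors: str = "GloVe"
    batch_size: int = 256
    num_kernels: int = 250
    max_epochs: int = 50
    # ASSUMPTION: a 0-12 ms window at dt = 1 ms is read as 12 steps.
    time_steps: int = 12
    # ASSUMPTION: the table only says "exponential decay"; the
    # per-epoch factor below is a placeholder, not a reported value.
    lr_gamma: float = 0.95


def learning_rate(cfg: TrainConfig, epoch: int) -> float:
    """Exponentially decayed learning rate: lr_0 * gamma ** epoch."""
    return cfg.initial_lr * cfg.lr_gamma ** epoch


if __name__ == "__main__":
    cfg = TrainConfig()
    for epoch in (0, 10, cfg.max_epochs - 1):
        print(epoch, learning_rate(cfg, epoch))
```

With a deep learning framework, the same schedule would normally be delegated to the framework's own exponential scheduler rather than computed by hand.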

Model	Reset method	Surrogate function	MR accuracy (%)	AG News accuracy (%)	IMDB accuracy (%)	Yelp review polarity accuracy (%)
DPCNN	-	-	77.6	91.43	88.84	94.36
DCRSNN	Soft	arctan	76.36	90.54	88.06	93.58
	Soft	softsign	74.44	90.30	87.36	93.24
	Soft	sigmoid	74.86	91.03	87.87	93.99
	Hard	arctan	74.62	90.17	87.54	93.45
	Hard	softsign	73.87	89.93	87.67	92.88
	Hard	sigmoid	75.09	90.80	87.73	93.87
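
The reset methods and surrogate functions compared in this table can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a non-leaky integrate-and-fire neuron for brevity (the neuron model actually used is defined in Section 2.1), a reset potential of 0 for the hard reset, and commonly used forms of the arctan, sigmoid and softsign surrogate derivatives with a hypothetical sharpness parameter `alpha`. The function names are illustrative.

```python
import math

V_TH = 1.0     # threshold U_th from the training-parameter table
V_RESET = 0.0  # ASSUMPTION: reset potential used by the hard reset


def if_step(v, x, reset="soft"):
    """One update of a non-leaky integrate-and-fire neuron.

    Returns (spike, new_membrane_potential).
    Soft reset subtracts the threshold and keeps the residual charge;
    hard reset sets the potential back to V_RESET.
    """
    v = v + x
    spike = 1 if v >= V_TH else 0
    if spike:
        v = v - V_TH if reset == "soft" else V_RESET
    return spike, v


def surrogate_grad(u, kind="arctan", alpha=2.0):
    """Smooth stand-in for the derivative of the Heaviside step.

    u is the distance to threshold (v - V_TH). The forward pass still
    uses the hard threshold; only the backward pass uses this value.
    """
    if kind == "arctan":
        return (alpha / 2) / (1 + (math.pi / 2 * alpha * u) ** 2)
    if kind == "sigmoid":
        s = 1 / (1 + math.exp(-alpha * u))
        return alpha * s * (1 - s)
    if kind == "softsign":
        return alpha / (2 * (1 + abs(alpha * u)) ** 2)
    raise ValueError(f"unknown surrogate: {kind}")


if __name__ == "__main__":
    print(if_step(0.0, 1.5, reset="soft"))
    print(if_step(0.0, 1.5, reset="hard"))
    for kind in ("arctan", "sigmoid", "softsign"):
        print(kind, surrogate_grad(0.0, kind), surrogate_grad(1.0, kind))
```

All three surrogates peak at the threshold (u = 0) and decay away from it; they differ mainly in how quickly the gradient falls off, which is one plausible reason the accuracies in the table differ only slightly between them.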

Model	Reset method	Surrogate function	MR accuracy (%)	AG News accuracy (%)	IMDB accuracy (%)	Yelp review polarity accuracy (%)
Hybrid photonic DCRSNN	Soft	arctan	74.53	91.07	87.73	93.68
	Soft	softsign	76.17	90.97	87.21	93.53
	Soft	sigmoid	74.20	89.61	87.92	94.15
	Hard	arctan	73.92	91.09	87.59	93.67
	Hard	softsign	73.08	90.68	87.64	93.29
	Hard	sigmoid	73.92	89.64	87.63	94.06

Optics Express