Bi-directional gated recurrent unit neural network based nonlinear equalizer for coherent optical communication system

Xinyu Liu; Yongjun Wang; Xishuo Wang; Hui Xu; Chao Li; Xiangjun Xin

doi:10.1364/OE.416672

1. Introduction

The emergence of data-intensive applications, such as e-commerce, e-science and high-definition video, has brought us into the "big data" era [1]. With the rapid development of Internet and big data services, the resulting bandwidth demands have stimulated research and development of high-speed, large-capacity optical transmission and highly flexible optical network technologies [2–8]. By combining high-order modulation formats, coherent detection and digital signal processing (DSP), coherent optical communication systems can simultaneously achieve high spectral efficiency, large capacity and long-haul transmission [9]. However, in high-speed coherent optical communication systems, nonlinear impairments caused by devices and fiber transmission links can severely limit both the transmission distance and the achievable capacity [10]. It is therefore crucial to compensate for these nonlinear impairments.

To compensate for the nonlinear impairments effectively, many nonlinear equalization methods for the receiver DSP have been proposed. Well-known DSP techniques for nonlinear equalization include digital back-propagation (DBP) [11], Volterra series approximations [12] and perturbation-based nonlinear equalization [13]. These methods have been widely investigated because of their effectiveness in mitigating system nonlinear effects [14–17]. Recently, machine learning (ML) techniques have been explored for nonlinear equalization in optical communication systems. K-means [18], fuzzy logic C-means [19], hierarchical clustering [19] and Gaussian mixture models (GMM) [20] have been used to cluster constellation points. K-nearest neighbors (KNN) [21] and support vector machines (SVM) [22] have been used as nonlinear classifiers of constellation points. Hidden Markov models [23] and a variety of neural networks (NNs), such as deep neural networks (DNNs) [24,25], convolutional neural networks (CNNs) [26] and recurrent neural networks (RNNs) [27,28], have been proposed as nonlinear equalizers. The long short-term memory (LSTM) network is a special RNN architecture that models long-term dependencies well and can extract the correlation between past and current data [29]. LSTM neural network based nonlinear equalizers (LSTM NLEs) have been successfully applied in intensity modulation/direct detection (IM/DD) systems [30] and visible light communications (VLC) [31].

Long-haul fiber transmission leads to the accumulation of nonlinearity-induced distortion. In principle, all bi-directional RNNs (bi-RNNs) can efficiently handle not only the inter-symbol interference (ISI) among preceding and succeeding symbols caused by chromatic dispersion (CD), but also the nonlinear impairments caused by devices and fiber transmission links [32]. The effectiveness of a bi-directional LSTM (bi-LSTM) neural network based nonlinearity mitigation technique has been verified by numerical simulation in digital coherent optical communication systems [32]. However, the LSTM cell has a complex structure with an input gate, an output gate and a forget gate, which results in a large number of parameters and heavy computation. The gated recurrent unit (GRU), proposed in 2014, contains only two gates (a reset gate and an update gate) [33]. The GRU is therefore a less complex variant of the LSTM and has fewer parameters.

In this paper, a bi-directional GRU (bi-GRU) neural network is proposed for nonlinear equalization in coherent optical communication systems. The proposed bi-GRU neural network based nonlinear equalizer (bi-GRU NLE) is verified experimentally in a 120 Gb/s 64-quadrature amplitude modulation (64-QAM) coherent optical communication system with a transmission distance of 375 km. A complexity analysis is also conducted in two respects: the number of parameters and the number of multiplications required to equalize one symbol. The results show that the bi-GRU NLE can significantly mitigate nonlinear distortion and improve the Q-factor. With the aid of the bi-GRU NLE, the Q-factor exceeds the hard-decision forward error correction (HD-FEC) limit of 8.52 dB when the launched optical power is in the range of -3 dBm to 3 dBm. When the launched optical power is in the range of 0 dBm to 2 dBm, the Q-factor performance of the bi-GRU NLE and the bi-LSTM NLE is similar, while the bi-GRU NLE has about 20.2$\%$ fewer parameters, a shorter average training time, and requires about 24.5$\%$ fewer multiplications per equalized symbol than the bi-LSTM NLE.

2. Principles of bi-GRU based nonlinear equalizer

2.1 Bi-GRU model

RNNs are able to process sequential data [34] and to learn information from previous data when dealing with the current data. The LSTM and GRU are improved RNN models with powerful modeling capabilities for long-term dependencies, and the GRU is a less complex variant of the LSTM [35]. Figure 1 shows the detailed structure of a GRU unit, which is described in Ref. [33]. A GRU unit is composed of a reset gate ${r_t}$ and an update gate ${z_t}$. The output ${h_t}$ is determined by both the current input ${x_t}$ and the previous state ${h_{t - 1}}$ under the control of these two gates. The outputs of the gates and the GRU unit are calculated as follows:

(1)$$\begin{array}{l} {r_t} = \sigma \left( {{W_r}{x_t} + {U_r}{h_{t - 1}} + {b_r}} \right)\\ {z_t} = \sigma \left( {{W_z}{x_t} + {U_z}{h_{t - 1}} + {b_z}} \right)\\ {{\tilde h}_t} = \tanh \left[ {{W_h}{x_t} + {U_h}\left( {{r_t} \odot {h_{t - 1}}} \right) + {b_h}} \right]\\ {h_t} = \left( {1 - {z_t}} \right) \odot {h_{t - 1}} + {z_t} \odot {{\tilde h}_t} \end{array}$$

where ${W_r}$, ${U_r}$, ${W_z}$, ${U_z}$, ${W_h}$ and ${U_h}$ are the weight matrices; ${b_r}$, ${b_z}$ and ${b_h}$ are the combined bias vectors for the input ${x_t}$ and the previous state ${h_{t - 1}}$; $\sigma$ is the logistic sigmoid function; $\tanh$ is the hyperbolic tangent activation function; and $\odot$ denotes the Hadamard product.
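As an illustration of Eq. (1), the following is a minimal sketch of one GRU update written with the Python standard library only. The function names (`sigmoid`, `affine`, `gru_step`) and the dictionary layout of the parameters are assumptions made for this example and are not taken from the authors' implementation. Matrices are stored as lists of rows, one row per hidden unit, so that `W x` matches the notation of Eq. (1).

```python
import math

def sigmoid(v):
    """Logistic sigmoid, the sigma of Eq. (1)."""
    return 1.0 / (1.0 + math.exp(-v))

def affine(W, x, U, h, b):
    """Return W x + U h + b, with W of size H x F and U of size H x H."""
    return [
        sum(w * xi for w, xi in zip(W[j], x))
        + sum(u * hi for u, hi in zip(U[j], h))
        + b[j]
        for j in range(len(b))
    ]

def gru_step(x, h_prev, p):
    """One GRU update following Eq. (1); p maps names such as 'Wr' to parameters."""
    r = [sigmoid(a) for a in affine(p["Wr"], x, p["Ur"], h_prev, p["br"])]  # reset gate
    z = [sigmoid(a) for a in affine(p["Wz"], x, p["Uz"], h_prev, p["bz"])]  # update gate
    rh = [ri * hi for ri, hi in zip(r, h_prev)]  # Hadamard product of r_t and h_{t-1}
    h_cand = [math.tanh(a) for a in affine(p["Wh"], x, p["Uh"], rh, p["bh"])]
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, h_cand)]

# Toy check with F = 2 inputs (I and Q) and H = 2 hidden units, all parameters zero:
# both gates equal 0.5 and the candidate state is 0, so the state is halved.
zero_m = [[0.0, 0.0], [0.0, 0.0]]
zero_b = [0.0, 0.0]
params = {"Wr": zero_m, "Ur": zero_m, "br": zero_b,
          "Wz": zero_m, "Uz": zero_m, "bz": zero_b,
          "Wh": zero_m, "Uh": zero_m, "bh": zero_b}
h_next = gru_step([1.0, -1.0], [0.4, -0.2], params)
```

With all parameters set to zero the update reduces to $h_t = 0.5\,h_{t-1}$, which gives a quick sanity check of the gating arithmetic.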

Fig. 1. Detailed structure of a GRU unit.


Models with a bi-directional structure can learn information from both previous and subsequent data when dealing with the current data. The structure of the bi-GRU model is shown in Fig. 2. The bi-GRU model is determined by the states of two unidirectional GRUs running in opposite directions [36]. One GRU moves forward, beginning from the start of the data sequence; the other moves backward, beginning from the end of the data sequence. This allows information from both the past and the future to influence the current state. The bi-GRU is defined as follows:

(2)$$\begin{array}{c} \overrightarrow {{h_t}} = GR{U_{fwd}}\left( {{x_t},\overrightarrow {{h_{t - 1}}} } \right)\\ \overleftarrow {{h_t}} = GR{U_{bwd}}\left( {{x_t},\overleftarrow {{h_{t + 1}}} } \right)\\ {h_t} = \overrightarrow {{h_t}} \oplus \overleftarrow {{h_t}} \end{array}$$

where $\overrightarrow {{h_t}}$ is the state of the forward GRU, $\overleftarrow {{h_t}}$ is the state of the backward GRU, and $\oplus$ denotes the concatenation of two vectors.
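The two opposite passes and the concatenation of Eq. (2) can be sketched as follows. This is only an illustration: the helper name `bidirectional` is hypothetical, and a toy running-sum cell stands in for the GRU so that the result can be checked by hand.

```python
def bidirectional(xs, step_fwd, step_bwd, h0_fwd, h0_bwd):
    """Run two recurrent cells in opposite directions and concatenate their states."""
    fwd, h = [], h0_fwd
    for x in xs:                              # forward pass, t = 1 .. T
        h = step_fwd(x, h)
        fwd.append(h)
    bwd, h = [None] * len(xs), h0_bwd
    for t in range(len(xs) - 1, -1, -1):      # backward pass, t = T .. 1
        h = step_bwd(xs[t], h)
        bwd[t] = h
    # List concatenation plays the role of the vector concatenation in Eq. (2).
    return [f + b for f, b in zip(fwd, bwd)]

def running_sum(x, h):
    """Toy stand-in for a GRU cell: accumulate the input into the state."""
    return [h[0] + x[0]]

states = bidirectional([[1.0], [2.0], [3.0]], running_sum, running_sum, [0.0], [0.0])
```

Each entry of `states` has twice the size of a single-direction state, which is why the linear layer that follows a bi-GRU layer with hidden size H receives 2H inputs.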

Fig. 2. Structure of a bi-GRU model.


2.2 Bi-GRU based nonlinear equalization scheme for M-QAM signals

We propose a nonlinear equalization scheme for M-QAM signals based on the bi-GRU network. Denote the received M-QAM signal sequence as $\boldsymbol {r} = [{\boldsymbol {r}_1},{\boldsymbol {r}_2},\ldots ,{\boldsymbol {r}_T}]$, where the vector ${{\boldsymbol {r}}_i}\left ( {i = 1,2,\ldots ,T} \right )$ corresponds to the i-th symbol and contains two values, the in-phase (I) and quadrature (Q) components: ${\boldsymbol {r}_i} = \left [ {{I_i},{Q_i}} \right ]$. The corresponding predicted sequence is denoted as $\boldsymbol {y} = [{y_1},{y_2},\ldots ,{y_T}]$, where each value ${y_i} \in \left \{ {1,2,\ldots ,M} \right \}$ stands for one of the M classes of the M-QAM signal (for 64-QAM, M is equal to 64).

Figure 3 shows the architecture of the proposed bi-GRU network for nonlinear equalization. The first layer is the input layer. For each symbol ${{\boldsymbol {r}}_i}\left ( {i = 1,2,\ldots ,T} \right )$, we group the current symbol ${\boldsymbol {r}_i}$ with its k preceding and k succeeding symbols as ${\boldsymbol {x}^{(i)}} = \left [ {{\boldsymbol {r}_{i - k}},\ldots ,{\boldsymbol {r}_i},\ldots ,{\boldsymbol {r}_{i + k}}} \right ]$, which is used as the input sequence of the bi-GRU network. The second layer is the bi-GRU model layer. The number of recurrent time steps of the bi-GRU model depends on the length of the input sequence; in this paper, it is set to 2k+1. The hidden state ${\boldsymbol {h}_t}$ carries the symbol information among recurrent time steps. The outputs of the bi-GRU model layer are fully connected to the linear layer, whose number of nodes equals the number of classes of the M-QAM signal. Denote the output of the linear layer as $\boldsymbol {z} = \left [ {{z_1},{z_2},\ldots ,{z_M}} \right ]$. The Softmax layer outputs the probabilities that the current symbol ${\boldsymbol {r}_i}$ maps to each class, which can be expressed as:

(3)$$P\left( {y = j\left| {\boldsymbol{x}} \right.} \right) = \frac{{{e^{{z_j}}}}}{{\sum\nolimits_{m = 1}^M {{e^{{z_m}}}} }},\quad j = 1,2,\ldots,M.$$

where $P\left ( {y = j\left | {\boldsymbol {x}} \right .} \right )$ represents the probability that the current symbol ${\boldsymbol {r}_i}$ maps to the j-th class, ${z_j}\left ( {j = 1,2,\ldots ,M} \right )$ is the j-th output of the linear layer, and M is the number of classes of the M-QAM signal.
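Eq. (3) can be evaluated as in the short sketch below. Subtracting the largest logit before exponentiating is a common numerical safeguard added for this example; it is not mentioned in the text and does not change the resulting probabilities. The function name `softmax` is chosen for illustration.

```python
import math

def softmax(z):
    """Class probabilities of Eq. (3) from the linear-layer outputs z."""
    m = max(z)                              # shift for numerical stability (assumption)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

probs = softmax([2.0, 1.0, 0.0, -1.0])
```

The probabilities sum to one and preserve the ordering of the logits, so the most probable class is the one with the largest linear-layer output.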

Fig. 3. Structure of bi-GRU based nonlinear equalizer.


Finally, the output layer returns the class with the maximum probability as the predicted class of the current symbol ${\boldsymbol {r}_i}$. In this way, the predicted class ${y_i}\left ( {i = 1,2,\ldots ,T} \right )$ of the i-th symbol ${{\boldsymbol {r}}_i}\left ( {i = 1,2,\ldots ,T} \right )$ is obtained.
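The input windows of length 2k+1 and the final maximum-probability decision described in this subsection can be sketched as follows. The text does not say how the first and last k symbols of the sequence are handled, so the zero padding used here is an assumption, as are the helper names `build_windows` and `decide`.

```python
def build_windows(symbols, k, pad=(0.0, 0.0)):
    """Return one window [r_{i-k}, ..., r_i, ..., r_{i+k}] per received symbol.

    Zero padding at the sequence edges is an assumption of this sketch.
    """
    padded = [pad] * k + list(symbols) + [pad] * k
    return [padded[i:i + 2 * k + 1] for i in range(len(symbols))]

def decide(probs):
    """Return the 1-based index of the class with the largest probability."""
    return max(range(len(probs)), key=probs.__getitem__) + 1

received = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]   # toy (I, Q) pairs
windows = build_windows(received, k=1)
label = decide([0.1, 0.7, 0.2])
```

Each window has 2k+1 entries with the current symbol in the middle, so the bi-GRU layer runs for 2k+1 recurrent time steps per equalized symbol.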

2.3 Complexity analysis

We analyze the complexity of the proposed bi-GRU NLE and compare it with that of the bi-LSTM neural network based nonlinear equalizer (bi-LSTM NLE) and the bi-RNN based nonlinear equalizer (bi-RNN NLE). The analysis covers two aspects: the number of parameters of each nonlinear equalizer, and the number of multiplications it requires to equalize one symbol.

The parameters of the proposed bi-GRU NLE include the parameters of the bi-GRU layer and those of the linear layer. For the GRU unit shown in Fig. 1, according to Eq. (1), the parameters comprise three weight matrices for the input ${x_t}$, three weight matrices for the previous state ${h_{t - 1}}$, three bias vectors for the input ${x_t}$, and three bias vectors for the previous state ${h_{t - 1}}$. Let the size of the input feature ${x_t}$ be $1 \times F$ and the size of the hidden state ${h_{t}}$ be $1 \times H$. Then each weight matrix for the input ${x_t}$ has size $F \times H$, each weight matrix for the previous state ${h_{t - 1}}$ has size $H \times H$, and each bias vector has size $1 \times H$. Thus, the number of parameters of a GRU unit is ${N_{P\_GRU}} = 3 \times \left ( {FH + {H^2} + 2H} \right )$. The bi-GRU layer is composed of two unidirectional GRUs running in opposite directions, so its number of parameters is ${N_{P\_bi - GRU}} = 2 \times 3 \times \left ( {FH + {H^2} + 2H} \right )$. The outputs of the bi-GRU layer are fully connected to the linear layer of M units, so the output of the linear layer has size $1 \times M$. According to Eq. (2), the output of the bi-GRU layer has size $1 \times 2H$. The weight matrix of the linear layer therefore has size $2H \times M$ and its bias vector has size $1 \times M$, which gives ${N_{P\_Linear}} = 2HM + M$ parameters. Therefore, the number of parameters of the proposed bi-GRU NLE can be summarized as:

(4)$${N_{P\_bi - GRU\textrm{ }NLE}} = {N_{P\_bi - GRU}} + {N_{P\_Linear}} = 2 \times 3 \times \left( {FH + {H^2} + 2H} \right) + 2HM + M.$$

For bi-LSTM NLE, according to the description of an LSTM unit in Ref. [32], parameters of an LSTM unit contain four weight matrices for the input ${x_t}$ , four weight matrices for the previous state ${h_{t - 1}}$ , four bias vectors for input ${x_t}$ , and four bias vectors for the previous state ${h_{t - 1}}$ . Therefore, the number of parameters of the bi-LSTM NLE can be summarized as:

(5)$${N_{P\_bi - LSTM\textrm{ }NLE}} = 2 \times 4 \times \left( {FH + {H^2} + 2H} \right) + 2HM + M.$$

For bi-RNN NLE, according to the description of the recurrent unit for RNN in Ref. [27], the parameters of the recurrent unit comprise one weight matrix for the input ${x_t}$, one weight matrix for the previous state ${h_{t - 1}}$, one bias vector for the input ${x_t}$, and one bias vector for the previous state ${h_{t - 1}}$. Therefore, the number of parameters of the bi-RNN NLE can be summarized as:

(6)$${N_{P\_bi - RNN\textrm{ }NLE}} = 2 \times \left( {FH + {H^2} + 2H} \right) + 2HM + M.$$
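Eqs. (4)-(6) differ only in the number of gate blocks per direction, which the sketch below makes explicit. The values F = 2 (one I and one Q value per symbol) and M = 64 follow from the text; H = 64 is not stated in this section and is inferred here because it reproduces the parameter counts listed in Table 1. The function name `n_params` is chosen for illustration.

```python
def n_params(gates, F, H, M):
    """Parameter count of Eqs. (4)-(6).

    gates = 3 for GRU, 4 for LSTM, 1 for a plain RNN (per direction).
    """
    recurrent = 2 * gates * (F * H + H * H + 2 * H)   # two directions
    linear = 2 * H * M + M                            # 2H inputs to M classes, plus bias
    return recurrent + linear

F, H, M = 2, 64, 64   # H = 64 is an inferred value, see the note above
counts = {name: n_params(g, F, H, M)
          for name, g in (("bi-GRU", 3), ("bi-LSTM", 4), ("bi-RNN", 1))}
print(counts)   # {'bi-GRU': 34368, 'bi-LSTM': 43072, 'bi-RNN': 16960}
```

With these values the three counts agree with the "Number of parameters" column of Table 1.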

The number of multiplications required for the bi-GRU NLE to equalize one symbol includes the multiplications in the bi-GRU layer and those in the linear layer. Note that the activation functions are assumed to be implemented through a look-up table (LUT) [24], so they contribute no multiplications.

For the bi-GRU layer, the number of multiplications can be calculated as:

(7)$${N_{M\_bi - GRU}} = 2 \times \left[ {3 \times \left( {FH + {H^2}} \right) + 3H} \right] \times L,$$

where L is the length of the input sequence. For the linear layer, the number of multiplications is $2HM$.

Thus, the number of multiplications required for the bi-GRU NLE to equalize per symbol can be calculated as:

(8)$${N_{M\_bi - GRU\textrm{ }NLE}} = 2 \times \left[ {3 \times \left( {FH + {H^2}} \right) + 3H} \right] \times L + 2HM.$$

For the nonlinear equalizers based on bi-LSTM, the number of multiplications required to equalize per symbol can be calculated as:

(9)$${N_{M\_bi - LSTM\textrm{ }NLE}} = 2 \times \left[ {4 \times \left( {FH + {H^2}} \right) + 3H} \right] \times L + 2HM.$$

For the nonlinear equalizers based on bi-RNN, the number of multiplications required to equalize per symbol can be calculated as:

(10)$${N_{M\_bi - RNN\textrm{ }NLE}} = 2 \times \left( {FH + {H^2}} \right) \times L + 2HM.$$

It follows that, when the input and output structures of the networks are the same, the bi-GRU NLE has fewer parameters and multiplication operations than the bi-LSTM NLE. A model with fewer parameters has less computational overhead.
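Eqs. (8)-(10) can be evaluated in the same way. In the sketch below, each cell type is described by the number of matrix blocks and the number of element-wise (Hadamard) products per time step, read off from the equations. As before, H = 64 is an inferred value, and L = 23 corresponds to 2k+1 with k = 11, the best-performing window reported in Section 4. With these assumptions the results match the "Number of multiplications" column of Table 1.

```python
# (matrix blocks, Hadamard products) per direction and time step, from Eqs. (8)-(10)
CELLS = {"bi-GRU": (3, 3), "bi-LSTM": (4, 3), "bi-RNN": (1, 0)}

def n_mults(cell, F, H, M, L):
    """Multiplications needed to equalize one symbol, Eqs. (8)-(10)."""
    blocks, hadamard = CELLS[cell]
    per_step = blocks * (F * H + H * H) + hadamard * H
    return 2 * per_step * L + 2 * H * M   # two directions, then the linear layer

F, H, M, L = 2, 64, 64, 23   # H is inferred; L = 2k + 1 with k = 11
mults = {cell: n_mults(cell, F, H, M, L) for cell in CELLS}
print(mults)   # {'bi-GRU': 599936, 'bi-LSTM': 794240, 'bi-RNN': 202496}
```

The recurrent layer dominates the total because its cost is multiplied by the sequence length L, whereas the linear layer is applied once per symbol.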

3. Experimental setup

Figure 4(a) depicts the experimental setup of the 120 Gb/s 64-QAM coherent optical communication system. At the transmitter side, symbol sequences of the 64-QAM signals are generated by a MATLAB program and then uploaded to an arbitrary waveform generator (AWG) with a sampling rate of 25 GSa/s. Each analog signal is amplified by an electrical amplifier (EA) and then sent into the in-phase/quadrature (I/Q) modulator. The nominal linewidth of the external cavity laser (ECL) is 100 kHz. The modulated optical signal is then generated by the I/Q modulator. Polarization multiplexing is realized by the polarization-division-multiplexing (PDM) module, which consists of a polarization-maintaining optical coupler (PM-OC), an optical delay line, a polarization controller (PC), and a polarization beam combiner (PBC). An Erbium-doped fiber amplifier (EDFA) amplifies the signal, and a variable optical attenuator (VOA) adjusts the signal power. The transmission link consists of 5 spans of G.652D single-mode fiber (SMF), each 75 km long, with an EDFA at the end of each span to compensate for the fiber loss. At the receiver side, an ECL with a linewidth of 100 kHz is used as the local oscillator (LO) for coherent detection. The optical polarization- and phase-diversity coherent receiver front-end is composed of an LO, two polarization beam splitters (PBSs), two 90-degree optical hybrids, and four balanced photodetectors (BPDs). The X- and Y-polarization components of the received optical signal and of the local oscillator are separately combined and detected by two identical phase-diversity receivers, each composed of a 90-degree optical hybrid and two BPDs. A 4-channel digital phosphor oscilloscope (DPO) with a sampling rate of 100 GSa/s is used to digitize the signals.

Fig. 4. (a) Experimental setup; (b) Receiver side offline DSP flow.


The receiver-side offline DSP is shown in Fig. 4(b). It consists of a low-pass filter, amplitude normalization, chromatic dispersion compensation (CDC), clock recovery, resampling, the Gram-Schmidt orthogonalization procedure (GSOP), constant modulus algorithm (CMA) equalization, frequency offset estimation (FOE), carrier phase estimation (CPE) based on blind phase search, the bi-GRU neural network based nonlinear equalizer (bi-GRU NLE), 64-QAM demapping and bit error ratio (BER) calculation.

The bi-GRU network is built, trained and evaluated in PyTorch 1.6.0. Cross-entropy loss is chosen as the loss function, and the Adam optimizer is employed to optimize the bi-GRU network. The whole data set is divided into training data (60$\%$) and testing data (40$\%$). The maximum number of training epochs is set to 300 and the learning rate is set to 0.001. When the accuracy does not improve for 20 successive epochs, training is terminated early to prevent overfitting.
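The stopping rule described above (at most 300 epochs, stop after 20 successive epochs without improvement) can be sketched independently of any deep learning framework. In this sketch `run_epoch` is a hypothetical callback that trains for one epoch and returns the monitored accuracy; the text does not state on which data that accuracy is measured, so this is left to the caller.

```python
def train_with_early_stopping(run_epoch, max_epochs=300, patience=20):
    """Call run_epoch(epoch) until max_epochs or until accuracy stalls.

    Returns (best accuracy, epoch index of the best accuracy, epochs run).
    """
    best_acc, best_epoch, epochs_run = float("-inf"), -1, 0
    for epoch in range(max_epochs):
        acc = run_epoch(epoch)
        epochs_run = epoch + 1
        if acc > best_acc:
            best_acc, best_epoch = acc, epoch
        elif epoch - best_epoch >= patience:
            break   # no improvement for `patience` successive epochs
    return best_acc, best_epoch, epochs_run

# Toy accuracy curve: improves for the first ten epochs, then stays flat.
result = train_with_early_stopping(lambda epoch: min(epoch, 9) / 10)
```

For the toy curve, the best accuracy is reached at epoch index 9 and training stops after 20 further epochs without improvement, well before the 300-epoch limit.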

4. Results and discussion

In this section, we will analyze the experimental results of the bi-GRU NLE in a 120 Gb/s 64-QAM coherent optical communication system with a transmission distance of 375 km.

The measured launched optical power ranges from -4 dBm to 5 dBm. The received constellation diagrams of the 64-QAM signals before the nonlinear equalizer are illustrated in Fig. 5. The launched optical powers (LOPs) corresponding to Fig. 5(a)-(c) are -4 dBm, -1 dBm and 5 dBm, respectively. It can be seen that the 64-QAM signals suffer from severe nonlinear distortion, which is mainly induced by fiber nonlinearity and device nonlinearity.

Fig. 5. Received constellation diagrams of the 64-QAM signals before the NLE. (a) LOP = -4 dBm; (b) LOP = -1 dBm; (c) LOP = 5 dBm.


Figure 6 presents the BER performance of the bi-GRU NLE with different numbers of preceding and succeeding symbols k. The launched optical power is 1 dBm. As shown in the figure, when k exceeds 3, the bi-GRU NLE achieves a BER below the HD-FEC limit of 3.8$\times$10$^{-3}$ [37]. The best BER is obtained when k is 11, and a larger k does not lead to a better BER. For smaller k, the equalizer cannot handle all the inter-symbol interference. For larger k, either a more complex structure is required to handle the inter-symbol interference, or overfitting may occur during training and degrade the performance of the nonlinear equalization.

Fig. 6. BER performance of bi-GRU NLE with different numbers of preceding and succeeding symbols k when the launched optical power is 1 dBm.


Figure 7 plots the Q-factor versus the launched optical power for 120 Gb/s 64-QAM after 375 km SMF transmission with the bi-GRU NLE, bi-LSTM NLE and bi-RNN NLE, and without (w/o) an NLE. The nonlinearity equalization capability is assessed by the Q-factor, which is calculated from the BER as $Q = 20{\log _{10}}\left [ {\sqrt 2 \, {\textrm{erfc}^{ - 1}}\left ( {2BER} \right )} \right ]$ [38]. As shown by the square-marked curve (w/o NLE) in Fig. 7, in the launched optical power range from -4 dBm to -1 dBm the optical signal-to-noise ratio (OSNR) rises with the launched optical power and the system performance improves. Signals with low launched optical power are mainly affected by strong amplified spontaneous emission (ASE) noise, and the constellation points are widely scattered, as shown in Fig. 5(a) and (b). As the launched optical power increases further, the effects of fiber nonlinearity grow; once the launched optical power exceeds a certain value, the system performance is mainly degraded by fiber nonlinearity. Signals with higher launched optical power suffer from severe nonlinear distortion, especially the outer constellation points with larger power, as shown in Fig. 5(c).

Fig. 7. Q-factor vs. the launched optical power with bi-GRU NLE, bi-LSTM NLE, bi-RNN NLE and without (w/o) employing NLE, for 120 Gb/s 64-QAM after 375 km SMF transmission; Constellation diagrams of the X-polarization (X-pol) and Y-polarization (Y-pol) components of the 64-QAM signal without employing NLE when the launched optical power is 1 dBm.


As shown in Fig. 7, with the bi-RNN NLE, bi-LSTM NLE and bi-GRU NLE the Q-factors well exceed the soft-decision forward error correction (SD-FEC) limit of 6.25 dB (corresponding to the SD-FEC limit of 2$\times$10$^{-2}$ in BER [37]). The constellation diagrams of the X-polarization (X-pol) and Y-polarization (Y-pol) components of the 64-QAM signal without an NLE at a launched optical power of 1 dBm are also shown in Fig. 7. The optimum launched optical power is extended by 2 dB after nonlinear equalization. The bi-GRU NLE clearly outperforms the bi-RNN NLE. Furthermore, with the aid of the bi-GRU NLE, the Q-factor exceeds the HD-FEC limit of 8.52 dB (corresponding to the HD-FEC limit of 3.8$\times$10$^{-3}$ in BER) when the launched optical power is in the range of -3 dBm to 3 dBm. With the aid of the bi-LSTM NLE, the Q-factor exceeds the HD-FEC limit of 8.52 dB when the launched optical power is in the range of -3 dBm to 4 dBm. In particular, when the launched optical power is in the range of 0 dBm to 2 dBm, the performance of the bi-GRU NLE and the bi-LSTM NLE is similar, with a Q-factor difference within 0.1 dB.
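The two FEC limits quoted in dB can be checked against their BER values with the Q-factor relation used in this section. The sketch below relies on the identity $\sqrt 2 \, {\textrm{erfc}^{-1}}(2p) = -\Phi^{-1}(p)$, where $\Phi^{-1}$ is the inverse standard normal distribution function available in the Python standard library. The function name `q_factor_db` is chosen for illustration.

```python
import math
from statistics import NormalDist

def q_factor_db(ber):
    """Q-factor in dB from the BER: 20 log10( sqrt(2) * erfcinv(2 BER) )."""
    if not 0.0 < ber < 0.5:
        raise ValueError("BER must lie strictly between 0 and 0.5")
    # sqrt(2) * erfcinv(2 p) equals minus the inverse normal CDF evaluated at p
    return 20.0 * math.log10(-NormalDist().inv_cdf(ber))

q_hd = q_factor_db(3.8e-3)   # HD-FEC limit expressed as a Q-factor
q_sd = q_factor_db(2e-2)     # SD-FEC limit expressed as a Q-factor
```

Evaluating the function at the two BER limits gives roughly 8.5 dB and 6.25 dB, consistent with the HD-FEC and SD-FEC Q-factor limits quoted in the text.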

Table 1 shows a complexity comparison of the bi-GRU NLE, bi-LSTM NLE and bi-RNN NLE in the processing of the 64-QAM signal. All the programs were run on a personal computer with an Intel Core i5-8300H CPU @ 2.30 GHz and 8 GB of random access memory (RAM). Note that the nonlinear equalizer is first trained offline, and its parameters are fixed during equalization. Fewer parameters contribute to shorter testing time and less computational overhead. Based on the foregoing experimental results, when the launched optical power is in the range of 0 dBm to 2 dBm, the Q-factor performance of the bi-GRU NLE and the bi-LSTM NLE is similar, while the bi-GRU NLE has about 20.2$\%$ fewer parameters, a shorter average training time, and requires about 24.5$\%$ fewer multiplications per equalized symbol than the bi-LSTM NLE. The bi-GRU NLE therefore has a lower computational cost than the bi-LSTM NLE.
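The two percentages follow directly from the bi-GRU and bi-LSTM entries of Table 1, as the following worked arithmetic shows:

$$\frac{{43072 - 34368}}{{43072}} = \frac{{8704}}{{43072}} \approx 20.2\% ,\qquad \frac{{794240 - 599936}}{{794240}} = \frac{{194304}}{{794240}} \approx 24.5\% .$$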

Table 1. Complexity comparison of different nonlinear equalizers


5. Conclusion

In this paper, we propose a bi-GRU neural network based nonlinear equalizer for coherent optical communication systems. The performance of the bi-GRU NLE has been experimentally evaluated in a 120 Gb/s 64-QAM coherent optical communication system with a transmission distance of 375 km. The complexity of the bi-GRU NLE, bi-LSTM NLE and bi-RNN NLE is analyzed theoretically. The experimental results reveal that the bi-GRU NLE, bi-LSTM NLE and bi-RNN NLE all yield significant Q-factor improvements. With the aid of the bi-GRU NLE, the Q-factor exceeds the HD-FEC limit of 8.52 dB when the launched optical power is in the range of -3 dBm to 3 dBm. In the launched optical power range from 0 dBm to 2 dBm, the Q-factor performance of the bi-GRU NLE and the bi-LSTM NLE is similar, while the bi-GRU NLE has about 20.2$\%$ fewer parameters, a shorter average training time, and requires about 24.5$\%$ fewer multiplications per equalized symbol than the bi-LSTM NLE. The computational cost of the bi-GRU NLE is therefore lower than that of the bi-LSTM NLE.

Funding

National Natural Science Foundation of China (62075014, 61675030); National Key Research and Development Program of China (2018YFB1801203).

Disclosures

The authors declare no conflicts of interest.

References

1. P. Lu, L. Zhang, X. Liu, J. Yao, and Z. Zhu, “Highly efficient data migration and backup for big data applications in elastic optical inter-data-center networks,” IEEE Network 29(5), 36–42 (2015).

2. H. Chien, J. Yu, Y. Cai, B. Zhu, X. Xiao, Y. Xia, X. Wei, T. Wang, and Y. Chen, “Approaching terabits per carrier metro-regional transmission using beyond-100GBd coherent optics with probabilistically shaped DP-64QAM modulation,” J. Lightwave Technol. 37(8), 1751–1755 (2019).

3. M. Kong, X. Li, J. Zhang, K. Wang, X. Xin, F. Zhao, and J. Yu, “High spectral efficiency 400 Gb/s transmission by different modulation formats and advanced DSP,” J. Lightwave Technol. 37(20), 5317–5325 (2019).

4. Q. Zhou, F. Zhang, and C. Yang, “AdaNN: Adaptive neural network-based equalizer via online semi-supervised learning,” J. Lightwave Technol. 38(16), 4315–4324 (2020).

5. Z. Zhu, W. Lu, L. Zhang, and N. Ansari, “Dynamic service provisioning in elastic optical networks with hybrid single-/multi-path routing,” J. Lightwave Technol. 31(1), 15–22 (2013).

6. L. Gong, X. Zhou, X. Liu, W. Zhao, W. Lu, and Z. Zhu, “Efficient resource allocation for all-optical multicasting over spectrum-sliced elastic optical networks,” J. Opt. Commun. Netw. 5(8), 836–847 (2013).

7. Y. Yin, H. Zhang, M. Zhang, M. Xia, Z. Zhu, S. Dahlfort, and S. J. B. Yoo, “Spectral and spatial 2D fragmentation-aware routing and spectrum assignment algorithms in elastic optical networks [invited],” J. Opt. Commun. Netw. 5(10), A100–A106 (2013).

8. L. Gong and Z. Zhu, “Virtual optical network embedding (VONE) over elastic optical networks,” J. Lightwave Technol. 32(3), 450–460 (2014).

9. S. J. Savory, “Digital coherent optical receivers: Algorithms and subsystems,” IEEE J. Sel. Top. Quantum Electron. 16(5), 1164–1179 (2010).

10. Q. Zhuge, M. Fu, H. Lun, X. Liu, and W. Hu, “Fiber nonlinearity mitigation and compensation for capacity-approaching optical transmission systems,” in Asia Communications and Photonics Conference (ACPC) 2019, (Optical Society of America, 2019), p. T4B.1.

11. E. Ip and J. M. Kahn, “Compensation of dispersion and nonlinear impairments using digital backpropagation,” J. Lightwave Technol. 26(20), 3416–3425 (2008).

12. L. Liu, L. Li, Y. Huang, K. Cui, Q. Xiong, F. N. Hauske, C. Xie, and Y. Cai, “Intrachannel nonlinearity compensation by inverse Volterra series transfer function,” J. Lightwave Technol. 30(3), 310–316 (2012).

13. L. Dou, Z. Tao, L. Li, W. Yan, T. Tanimura, T. Hoshida, and J. C. Rasmussen, “A low complexity pre-distortion method for intra-channel nonlinearity,” in Optical Fiber Communication Conference/National Fiber Optic Engineers Conference 2011, (Optical Society of America, 2011), p. OThF5.

14. H. Nakashima, T. Oyama, C. Ohshima, Y. Akiyama, Z. Tao, and T. Hoshida, “Digital nonlinear compensation technologies in coherent optical communication systems,” in Optical Fiber Communication Conference, (Optical Society of America, 2017), p. W1G.5.

15. Y. Wu, H. Lun, M. Fu, X. Zeng, X. Liu, Q. Liu, L. Yi, W. Hu, and Q. Zhuge, “Degenerated look-up table-based perturbative fiber nonlinearity compensation algorithm for probabilistically shaped signals,” Opt. Express 28(9), 13401–13413 (2020).

16. H. Lun, Q. Zhuge, Z. Xiao, S. Fu, M. Tang, D. Liu, W. Hu, and D. V. Plant, “Single-step digital backpropagation for subcarrier-multiplexing transmissions,” Opt. Express 27(25), 36680–36690 (2019).

17. N. Stojanovic, F. Karinou, Z. Qiang, and C. Prodaniuc, “Volterra and Wiener equalizers for short-reach 100G PAM-4 applications,” J. Lightwave Technol. 35(21), 4583–4594 (2017).

18. J. Zhang, W. Chen, M. Gao, and G. Shen, “K-means-clustering-based fiber nonlinearity equalization techniques for 64-QAM coherent optical communication system,” Opt. Express 25(22), 27570–27580 (2017).

19. E. Giacoumidis, A. Matin, J. Wei, N. J. Doran, L. P. Barry, and X. Wang, “Blind nonlinearity equalization by machine-learning-based clustering for single- and multichannel coherent optical OFDM,” J. Lightwave Technol. 36(3), 721–727 (2018).

20. M. Xu, J. Zhang, H. Zhang, Z. Jia, J. Wang, L. Cheng, L. A. Campos, and C. Knittle, “Multi-stage machine learning enhanced DSP for DP-64QAM coherent optical transmission systems,” in Optical Fiber Communication Conference (OFC) 2019, (Optical Society of America, 2019), p. M2H.1.

21. J. Zhang, M. Gao, W. Chen, and G. Shen, “Non-data-aided k-nearest neighbors technique for optical fiber nonlinearity mitigation,” J. Lightwave Technol. 36(17), 3564–3572 (2018).

22. E. Giacoumidis, S. Mhatli, M. F. C. Stephens, A. Tsokanos, J. Wei, M. E. McCarthy, N. J. Doran, and A. D. Ellis, “Reduction of nonlinear intersubcarrier intermixing in coherent optical OFDM by a fast Newton-based support vector machine nonlinear equalizer,” J. Lightwave Technol. 35(12), 2391–2397 (2017).

23. F. Tian, Q. Zhou, and C. Yang, “Gaussian mixture model-hidden Markov model based nonlinear equalizer for optical fiber transmission,” Opt. Express 28(7), 9728–9737 (2020).

24. I. Aldaya, E. Giacoumidis, A. Tsokanos, M. Jarajreh, Y. Wen, J. Wei, G. Campuzano, M. L. F. Abbade, and L. P. Barry, “Compensation of nonlinear distortion in coherent optical OFDM systems using a MIMO deep neural network-based equalizer,” Opt. Lett. 45(20), 5820–5823 (2020).

25. S. Zhang, F. Yaman, K. Nakamura, T. Inoue, V. Kamalov, L. Jovanovski, V. Vusirikala, E. Mateo, Y. Inada, and T. Wang, “Field and lab experimental demonstration of nonlinear impairment compensation using neural networks,” Nat. Commun. 10(1), 3033 (2019).

26. C. Chuang, L. Liu, C. Wei, J. Liu, L. Henrickson, W. Huang, C. Wang, Y. Chen, and J. Chen, “Convolutional neural network based nonlinear classifier for 112-Gbps high speed optical link,” in Optical Fiber Communication Conference, (Optical Society of America, 2018), p. W2A.43.

27. Q. Zhou, C. Yang, A. Liang, X. Zheng, and Z. Chen, “Low computationally complex recurrent neural network for high speed optical fiber transmission,” Opt. Commun. 441, 121–126 (2019).

28. B. Karanov, D. Lavery, P. Bayvel, and L. Schmalen, “End-to-end optimized transmission over dispersive intensity-modulated channels using bidirectional recurrent neural networks,” Opt. Express 27(14), 19650–19663 (2019).

29. S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Comput. 9(8), 1735–1780 (1997).

30. X. Dai, X. Li, M. Luo, Q. You, and S. Yu, “LSTM networks enabled nonlinear equalization in 50-Gb/s PAM-4 transmission links,” Appl. Opt. 58(22), 6079–6084 (2019).

31. X. Lu, C. Lu, W. Yu, L. Qiao, S. Liang, A. P. T. Lau, and N. Chi, “Memory-controlled deep LSTM neural network post-equalizer used in high-speed PAM VLC system,” Opt. Express 27(5), 7822–7833 (2019).

32. S. Deligiannidis, A. Bogris, C. Mesaritakis, and Y. Kopsinis, “Compensation of fiber nonlinearities in digital coherent systems leveraging long short-term memory neural networks,” J. Lightwave Technol. 38(21), 5991–5999 (2020).

33. K. Cho, B. V. Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, “Learning phrase representations using RNN encoder–decoder for statistical machine translation,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), (Association for Computational Linguistics, Doha, Qatar, 2014), pp. 1724–1734.

34. I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning (Massachusetts Institute of Technology, 2016).

35. K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, and J. Schmidhuber, “LSTM: A search space odyssey,” IEEE Trans. Neural Netw. Learning Syst. 28(10), 2222–2232 (2017).

36. C. Xiong, S. Merity, and R. Socher, “Dynamic memory networks for visual and textual question answering,” in Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48, (JMLR.org, 2016), ICML’16, pp. 2397–2406.

37. X. Wang, Q. Zhang, X. Xin, R. Gao, Q. Tian, F. Tian, C. Wang, X. Pan, Y. Wang, and L. Yang, “Robust weighted K-means clustering algorithm for a probabilistic-shaped 64QAM coherent optical communication system,” Opt. Express 27(26), 37601–37613 (2019).

38. E. Giacoumidis, S. Mhatli, T. Nguyen, S. T. Le, I. Aldaya, M. E. McCarthy, A. D. Ellis, and B. J. Eggleton, “Comparison of DSP-based nonlinear equalizers for intra-channel nonlinearity compensation in coherent optical OFDM,” Opt. Lett. 41(11), 2509–2512 (2016).

Nonlinear equalizer	Number of parameters	Average training time (s)	Number of multiplications per symbol
Bi-GRU NLE	34368	1369	599936
Bi-LSTM NLE	43072	1473	794240
Bi-RNN NLE	16960	970	202496


Optics Express