81-GHz W-band 60-Gbps 64-QAM wireless transmission based on a dual-GRU equalizer

Cuiwei Liu; Chen Wang; Wen Zhou; Feng Wang; Miao Kong; Jianjun Yu

doi:10.1364/OE.448845

1. Introduction

Data-intensive applications, such as cloud computing, high-definition video and Internet traffic, have brought us into the “big data” era [1]. To cope with future mobile data traffic, new advanced technologies have been put forward for the upcoming beyond-5G (B5G) networks. For instance, millimeter-wave (mm-wave) technology is a promising candidate for providing large capacity [2–10], since it occupies an ultra-wide bandwidth from 30 GHz to 300 GHz. Enhanced mobile broadband (eMBB) poses a main challenge for the 5G radio network, namely to increase the capacity and the peak data rate. It is worth noting that W-band radio-over-fiber (ROF) wireless transmission is a promising direction for future long-distance, large-capacity mobile communication networks owing to its good directivity, large bandwidth and low transmission loss in air. Reported results indicate that W-band (75–110 GHz) transmission has great potential for all-weather long-distance communication applications.

As is well known, the diffraction ability of an mm-wave signal worsens as the frequency increases, which induces a larger loss. In addition, the long range between B5G base stations causes a large transmission loss. At present, coherent detection of m-QAM signals offers high receiver sensitivity and linear detection, enabling channel impairment compensation in the digital domain [11–13]. At the receiver side, m-QAM signals can be received via coherent detection and processed by carrier recovery techniques to improve the tolerance to phase and frequency noise. Moreover, nonlinearity in the wireless system originates from nonlinear devices, including mixers and power amplifiers in the wireless transmitter and low-noise amplifiers in the receiver [14]. Nonlinear impairment also occurs in a photonics-aided mm-wave ROF system because of the electro-optical components and fiber nonlinearity.

To compensate for the nonlinear impairments effectively, many nonlinear equalization methods for the receiver DSP have been proposed. In the past, blind equalization, which achieves channel equalization without the aid of a training sequence, has been a favorite of practitioners; examples are the constant modulus algorithm (CMA) and the cascaded multi-modulus algorithm (CMMA) [15]. However, CMA has a large residual error after convergence and is not suitable for equalizing a nonlinear channel. In [16], the Volterra nonlinear compensation (VNC) method is applied in a W-band heterodyne-detection millimeter-wave (MMW) and fiber converged system. However, an oversized training and evaluation set results in an “over-fitting” effect. In recent years, research has shown that one of the most effective methods for modeling wireless channels is based on neural networks [17–27]. To obtain the optimum training result, a large quantity of training data is required. In Ref. [28], there is a need to clearly specify the training and testing strategy, including the size and allocation of the data sets adopted. Without such information, it is unclear whether over-fitting is induced. Long short-term memory (LSTM) is a special RNN architecture that has powerful modeling capabilities for long-term dependencies and extracts the correlation between past and current data [29]. It has been applied in intensity modulation/direct detection (IM/DD) systems [30] and visible light communications (VLC) [31]. However, the structure of the LSTM cell is complex, with an input gate, an output gate and a forget gate, resulting in heavy computation of the parameters. The gated recurrent unit (GRU), proposed in 2014, contains only two gates (namely a reset gate and an update gate) [32]. The GRU is a less complex variant of the LSTM and has fewer parameters.

In this paper, a dual-directional GRU (dual-GRU) neural network is proposed for nonlinear equalization in a W-band ROF system. With the proposed dual-GRU equalizer, the BER performance can be further improved. We experimentally demonstrate 60-Gbit/s 64-QAM signal generation and 1.2-m wireless transmission at 81 GHz with a BER below 2×10⁻². The results show that the proposed scheme greatly improves the receiver sensitivity compared with the traditional CMMA for higher-order modulation. In addition, a receiver sensitivity improvement of as much as 1 dB is achieved compared with a Volterra filtering equalizer at a BER of 2×10⁻².

2. Principle

2.1 Volterra nonlinear equalization

The main feature of Volterra nonlinear equalization is that it can compensate linear and nonlinear impairments simultaneously. In this work, we employed VNC in the W-band wireless transmission. The basic principle of the training and the usage of VNC is shown in Fig. 1.

Fig. 1. Schematic diagram of the training and the usage of VNC.


The VNC that we utilize is based on the second-order Volterra series, which can be expressed as follows,

(1)$$y(n) = \sum\limits_{{k_1} = 0}^{{M_1} - 1} {{h_{{k_1}}}(n)x(n - {k_1})} + \sum\limits_{{k_1} = 0}^{{M_2} - 1} {\sum\limits_{{k_2} = {k_1}}^{{M_2} - 1} {{h_{{k_1}{k_2}}}(n)x(n - {k_1})x(n - {k_2})} },$$

where ${M_1}$ and ${M_2}$ are the numbers of linear and nonlinear taps, i.e. the first- and second-order taps, respectively. ${h_{{k_1}}}$ and ${h_{{k_1}{k_2}}}$ are the linear and nonlinear tap coefficients. The first-order terms of the Volterra series compensate linear impairment, and the second-order terms compensate nonlinear impairment. Both sets of tap coefficients are updated according to the least-mean-square (LMS) error criterion using the training sequence.
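As a minimal sketch of Eq. (1) and of one LMS tap update, the following pure-Python example may help. It assumes real-valued samples and treats samples before the start of the record as zero; the names `sample`, `volterra_output` and `lms_update`, the dictionary layout for the second-order taps and the step size `mu` are illustrative choices, not taken from the paper.

```python
def sample(x, idx):
    """Return x[idx], treating samples outside the record as zero (assumption)."""
    return x[idx] if 0 <= idx < len(x) else 0.0


def volterra_output(x, n, h1, h2):
    """Evaluate Eq. (1) at index n.

    h1 : list of M1 linear taps, h1[k1]
    h2 : dict {(k1, k2): tap} with k1 <= k2 < M2, the second-order taps
    """
    linear = sum(h1[k1] * sample(x, n - k1) for k1 in range(len(h1)))
    nonlinear = sum(h * sample(x, n - k1) * sample(x, n - k2)
                    for (k1, k2), h in h2.items())
    return linear + nonlinear


def lms_update(x, n, h1, h2, desired, mu):
    """One LMS step: each tap moves along error times its own regressor."""
    error = desired - volterra_output(x, n, h1, h2)
    for k1 in range(len(h1)):
        h1[k1] += mu * error * sample(x, n - k1)
    for (k1, k2) in h2:
        h2[(k1, k2)] += mu * error * sample(x, n - k1) * sample(x, n - k2)
    return error


# Toy values (hypothetical): M1 = 2 linear taps, M2 = 2 nonlinear taps.
x = [1.0, 2.0, 3.0]
h1 = [0.5, 0.25]
h2 = {(0, 0): 0.1, (0, 1): 0.2, (1, 1): 0.0}
y = volterra_output(x, 2, h1, h2)
err_before = lms_update(x, 2, h1, h2, desired=5.0, mu=0.005)
err_after = 5.0 - volterra_output(x, 2, h1, h2)
```

With these toy values the error on the same sample shrinks after the update, which is the behaviour the training stage relies on.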

2.2 Dual-GRU based nonlinear equalization scheme for M-QAM signals

The gated recurrent unit neural network controls the input, memory and other information through a gating mechanism [33] and makes predictions at the current time step. As shown in Fig. 2(a), the GRU has two gates (a reset gate and an update gate). Intuitively, the reset gate determines how the new input information is combined with the previous memory. The update gate defines how much of the previous memory is saved to the current time step. The update gate ${z_t}$ is calculated by the following formula,

(2)$${z_t} = \sigma ({W^{(z)}}{x_t} + {U^z}{h_{t - 1}}).$$

where ${x_t}$ is the input vector at time index t, which is multiplied by the weight matrix ${W^{(z)}}$. ${h_{t - 1}}$ holds the information from the previous time step t−1 and is multiplied by the weight matrix ${U^z}$. The two results are added together and a logistic sigmoid activation function ($\sigma$) is applied to compress the result to between 0 and 1. The update gate determines how much information from the past should be transferred to the present unit, i.e., how much information from the previous and current time steps should be passed on to the next moment. This is powerful because the model can decide to copy all of the information from the past, which reduces the risk of vanishing gradients.

Fig. 2. (a) Detailed structure of a GRU unit. (b) Structure of a dual-GRU model. (c) Structure of a GRU model.


The reset gate mainly determines how much of the past information needs to be discarded. We calculate the reset gate ${r_t}$ at time step t using the following formula,

(3)$${r_t} = \sigma ({W^{(r)}}{x_t} + {U^r}{h_{t - 1}}).$$

This formula has the same form as that of the update gate, except that the weight matrices are different. As before, ${h_{t - 1}}$ and ${x_t}$ are multiplied by their weight matrices, and the sum is input into the sigmoid activation function to obtain the output of the reset gate.

After the reset gate, the new memory content is calculated, with the reset gate selecting how much of the information stored from the past is used. The process can be represented by Eq. (4),

(4)$${h_t}^\prime = tanh({W^{(h)}}{x_t} + {r_t} \odot {U^h}{h_{t - 1}}),$$

where the input ${x_t}$ and the previous information ${h_{t - 1}}$ are first multiplied by the weight matrices ${W^{(h)}}$ and ${U^h}$, respectively. Then the Hadamard (element-wise) product of the reset gate ${r_t}$ and ${U^h}{h_{t - 1}}$ is taken. Because the reset gate is a vector whose elements range from 0 to 1, it controls how far the gate is opened. For example, an element is completely forgotten if the corresponding gate value is 0. The Hadamard product thus determines how much of the past information is retained. Next, the sum of ${W^{(h)}}{x_t}$ and the Hadamard product is passed through the hyperbolic tangent ($\tanh$) activation function.

Subsequently, the vector ${h_t}$ retains the information of the current cell and delivers it to the next cell. In this process, the update gate controls the mixing of the candidate state ${h_t}^\prime$ from the current memory and the state ${h_{t - 1}}$ from the previous time step. This can be expressed as,

(5)$${h_t} = {z_t} \odot {h_{t - 1}} + (1 - {z_t}) \odot {h_t}^\prime ,$$

where ${z_t}$ is the output of the update gate. The Hadamard product of ${z_t}$ and ${h_{t - 1}}$ is the information carried over from the previous time step to the present one, and the Hadamard product of $(1 - {z_t})$ and ${h_t}^\prime$ is the contribution of the current candidate state. The final output of the gated recurrent unit is the state ${h_t}$ at the current time step.

To solve the vanishing-gradient problem of the standard RNN, the GRU employs two gating vectors (i.e., the update gate and the reset gate). The special feature of these two gates is that they can retain information in the previous memory even if that information is not useful for producing the current state.
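Equations (2)-(5) can be traced with a short pure-Python sketch of a single GRU time step. Biases are omitted because Eqs. (2)-(5) show none, matrices are stored as nested lists acting on column vectors, and the helper names `matvec`, `sigmoid` and `gru_step` are illustrative, not from the paper.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def gru_step(x_t, h_prev, W_z, U_z, W_r, U_r, W_h, U_h):
    """One GRU time step following Eqs. (2)-(5)."""
    # Eq. (2): update gate
    z = [sigmoid(a + b) for a, b in zip(matvec(W_z, x_t), matvec(U_z, h_prev))]
    # Eq. (3): reset gate
    r = [sigmoid(a + b) for a, b in zip(matvec(W_r, x_t), matvec(U_r, h_prev))]
    # Eq. (4): candidate state, reset gate applied element-wise to U_h h_{t-1}
    h_cand = [math.tanh(a + ri * b)
              for a, ri, b in zip(matvec(W_h, x_t), r, matvec(U_h, h_prev))]
    # Eq. (5): mix previous state and candidate state with the update gate
    return [zi * hp + (1.0 - zi) * hc
            for zi, hp, hc in zip(z, h_prev, h_cand)]


# Toy check (hypothetical sizes F = 2, H = 2): with all-zero weights both
# gates equal 0.5 and the candidate is tanh(0) = 0, so h_t = 0.5 * h_prev.
zeros = [[0.0, 0.0], [0.0, 0.0]]
h_t = gru_step([1.0, -1.0], [0.4, -0.2], zeros, zeros, zeros, zeros, zeros, zeros)
```

The all-zero case is only a sanity check of the gating arithmetic; trained weights would of course give gate values that depend on the input.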

Since the modulation format deployed in this paper is M-QAM, which consists of in-phase (I) and quadrature (Q) components, our proposed model with a dual-directional structure can process the I and Q data in parallel. Figure 2(b) shows the schematic diagram of the dual-GRU model, and Fig. 2(c) shows the mechanism of a typical GRU model. The proposed dual-GRU model is composed of two GRUs, each forming a one-way link: one processes the I data and the other processes the Q sequence. The dual-GRU model thus captures more detailed feedback information. The dual-GRU is defined as follows,

(6)$$\begin{aligned} h_t^I &= GR{U_I}(x_t^I,h_{t - 1}^I),\\ h_t^Q &= GR{U_Q}(x_t^Q,h_{t - 1}^Q), \end{aligned}$$

where $h_t^I$ is the state of the I-GRU, $h_t^Q$ is the state of the Q-GRU.

We establish a dual-GRU network model for nonlinear equalization of M-QAM signals. The received M-QAM signal sequence is denoted as $r = [{r_1},{r_2},\ldots ,{r_T}]$, where the index $i = 1,2,\ldots ,T$ denotes the i-th symbol and each vector ${r_i}$ contains two values (i.e., the in-phase (I) and quadrature (Q) components). The corresponding predicted probabilities of the M-QAM constellation points, denoted as $y = [{y_1},{y_2},\ldots ,{y_M}]$, are obtained from the output softmax layer. Here, we treat M-QAM equalization as a multi-class classification problem. In our experiment, a 64-QAM signal is transmitted, thus M is equal to 64 and the maximum element of y gives the predicted symbol.

Figure 3 shows the configuration of the proposed dual-GRU network for nonlinear equalization. The first layer is the input layer. For each symbol ${r_i} = [{I_i},{Q_i}]$, $i = 1,2,\ldots ,T$, we package the input batch $x(i)$ with a size parameter k, defined as $x(i) = [{r_{i - k}},\ldots ,{r_i},\ldots ,{r_{i + k}}]$, so that it spans $2k + 1$ symbols centered on ${r_i}$. The second layer is the dual-GRU model layer. The recurrent complexity of the dual-GRU model depends on the length of the input sequence. The outputs of the dual-GRU model layer are fully connected to the linear layer. The number of nodes in the linear layer equals the square root of the modulation order, $\sqrt M$, for M-QAM formats. The output of the linear layer is given as ${Z^I} = [Z_1^I,Z_2^I,\ldots ,Z_{\sqrt M }^I]$. The softmax layer outputs the probabilities, expressed as,

(7)$${P_I}(y = j|{x^I}) = \frac{{{e^{Z_j^I}}}}{{\sum\limits_{m = 1}^{\sqrt M } {{e^{Z_m^I}}} }},\quad j = 1,2,\ldots ,\sqrt M ,$$

where ${P_I}(y = j|{x^I})$ represents the probability that the current symbol ${I_i}$ maps to the $j$-th class, and $Z_j^I\ (j = 1,2,\ldots ,\sqrt M )$ is the output of the linear layer. The predicted symbol ${I_i}$ is then the class with the maximum probability. For the GRU model of the Q path, the training process is the same as that of the I path. The final predicted class ${y_i}\ (i = 1,2,\ldots ,T)$ can be expressed as,

(8)$${y_i} = {P_I} \oplus {P_Q}.$$
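The softmax decision on each path and the combination in Eq. (8) might look as follows in pure Python. This is only a sketch: classes are 0-based here, and reading the $\oplus$ in Eq. (8) as pairing the I decision with the Q decision, with a joint index `j_i * sqrt(M) + j_q`, is an assumption made for illustration; the paper does not define the mapping. The names `softmax` and `predict_symbol` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over the sqrt(M) linear-layer outputs."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_symbol(z_i, z_q):
    """Pick the most probable I and Q classes and pair them (assumed mapping)."""
    p_i = softmax(z_i)
    p_q = softmax(z_q)
    j_i = max(range(len(p_i)), key=p_i.__getitem__)
    j_q = max(range(len(p_q)), key=p_q.__getitem__)
    return j_i, j_q, j_i * len(p_q) + j_q


# 64-QAM: sqrt(M) = 8 amplitude levels per path (logit values are made up).
z_i = [0.0] * 8
z_q = [0.0] * 8
z_i[3] = 2.0
z_q[5] = 1.0
j_i, j_q, symbol = predict_symbol(z_i, z_q)
prob_sum = sum(softmax(z_i))
```

Classifying I and Q separately keeps each softmax at $\sqrt M$ outputs instead of $M$, which is what allows the linear layer to stay small.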

Fig. 3. Configuration of dual-GRU based nonlinear equalizer.


2.3 Complexity analysis

We analyze the complexity of the proposed dual-GRU NLE [34] and compare it with that of the Volterra nonlinear equalizer [30]. We consider two aspects of complexity: the number of parameters and the number of multiplications. The parameters of the GRU unit shown in Fig. 2(a) comprise three weight matrices for the input ${x_t}$, three weight matrices for the past state ${h_{t - 1}}$, three bias vectors for the input ${x_t}$, and three bias vectors for the past state ${h_{t - 1}}$. The size of the input feature ${x_t}$ is denoted as $1 \times F$ and the size of the hidden state ${h_t}$ as $1 \times H$. Then each weight matrix for the input ${x_t}$ has size $F \times H$, each weight matrix for the previous state ${h_{t - 1}}$ has size $H \times H$, and each bias vector has size $1 \times H$. Thus, the number of parameters of a GRU unit is ${N_{P - GRU}} = 3 \times (FH + {H^2} + 2H).$ The proposed dual-GRU layer is composed of two GRUs, so its number of parameters is ${N_{P - dual - GRU}} = 6 \times (FH + {H^2} + 2H).$ The outputs of the dual-GRU layer are fully connected to the linear layer of $\sqrt M$ units, and the size of the output of the linear layer is $2 \times \sqrt M$. According to Eq. (6), the size of the output of the dual-GRU layer is $2 \times H$. The weight matrix at the input of the linear layer therefore has size $2H \times 2\sqrt M$ and the bias vector has size $2 \times \sqrt M$, so the number of parameters of the linear layer is ${N_{P - linear}} = 4H\sqrt M + 2\sqrt M$. Therefore, the number of parameters of the proposed dual-GRU NLE can be summarized as:

(9)$$\begin{aligned} &{N_{P - dual - GRUNLE}} = {N_{P - linear}} + {N_{P - dual - GRU}}\\ &= 6 \times (FH + {H^2} + 2H) + 4H\sqrt M + 2\sqrt M . \end{aligned}$$
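The parameter count can be reproduced by counting the tensors listed above. In this sketch the linear-layer term is computed directly from the stated $2H \times 2\sqrt M$ weight size and $2 \times \sqrt M$ bias size; the function names and the example values of F, H and M are hypothetical, not the settings used in the experiment.

```python
import math


def gru_params(F, H):
    """Three input matrices (F x H), three recurrent matrices (H x H),
    and six bias vectors (1 x H) per GRU unit."""
    return 3 * (F * H + H * H + 2 * H)


def dual_gru_nle_params(F, H, M):
    """Parameters of two GRUs plus the fully connected linear layer."""
    root_m = math.isqrt(M)
    if root_m * root_m != M:
        raise ValueError("M must be a perfect square (square M-QAM)")
    linear = (2 * H) * (2 * root_m) + 2 * root_m  # weights + biases
    return 2 * gru_params(F, H) + linear


# Small illustrative sizes: F = 2 inputs, H = 4 hidden units, 64-QAM.
single = gru_params(2, 4)
total = dual_gru_nle_params(2, 4, 64)
```

The per-GRU count has the same form as a single-layer `torch.nn.GRU`, which also stores separate input and recurrent bias vectors for each of its three gate blocks.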

Note that the activation function is assumed to be implemented through a look-up table (LUT) [35]. For the dual-GRU layer, the number of multiplications can be calculated as,

(10)$${N_{M - dual - GRU}} = 2 \times [3(FH + {H^2}) + 3H] \times L,$$

where $L$ is the length of the input sequence. For the linear layer, the number of multiplications is $2H\sqrt M$. Thus, the number of multiplications required for the dual-GRU NLE per symbol can be calculated as,

(11)$${N_{M - dual - GRUNLE}} = 2 \times [3(FH + {H^2}) + 3H] \times L + 2H\sqrt M .$$

According to Eq. (1), the number of multiplications for Volterra filtering can be calculated as,

(12)$${N_{M - VolterraNLE}} = S \times [{l_1} \times L + 2 \times \frac{{({l_1} + 1){l_2}}}{2}], $$

where ${l_1}$ and ${l_2}$ are memory lengths for the first and second kernels, respectively. $L$ is the length of the input sequence. S is the number of equalized symbols.
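Equations (11) and (12) are easy to evaluate for a given configuration. The sketch below transcribes both formulas as written; the function names and the example argument values are hypothetical and are not the settings behind the counts reported later in the paper.

```python
import math


def dual_gru_nle_mults(F, H, L, M):
    """Eq. (11): multiplications per symbol for the dual-GRU NLE."""
    recurrent = 2 * (3 * (F * H + H * H) + 3 * H) * L
    linear = 2 * H * math.isqrt(M)
    return recurrent + linear


def volterra_nle_mults(S, l1, l2, L):
    """Eq. (12): the factor 2 and the division by 2 cancel."""
    return S * (l1 * L + (l1 + 1) * l2)


# Illustrative values only.
gru_count = dual_gru_nle_mults(F=2, H=4, L=3, M=64)
volterra_count = volterra_nle_mults(S=1, l1=3, l2=2, L=5)
```

Note how the dual-GRU count grows with $H^2 L$, so the hidden size and the input sequence length dominate the cost.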

3. Experimental setup

Figure 4 depicts the experimental setup of our demonstrated W-band 64-QAM delivery over 10 km single mode fiber (SMF) and 1.2 m wireless distance.

Fig. 4. The experimental setup for W-band 64-QAM signal delivery over 10-km SMF and a 1.2-m wireless link. ECL: external cavity laser, I/Q MOD: I/Q modulator, EDFA: Erbium-doped fiber amplifier, ATT: attenuator, DAC: digital-to-analog converter, EA: electrical amplifier, PD: photodiode, LNA: low-noise amplifier, OSC: oscilloscope. (a) Offline DSP. (b) The received electrical spectrum of the 64-QAM signal.

In the transmission link, the adoption of two individual lasers is a relatively simple, flexible and cost-effective architecture for W-band mm-wave signal generation. The baseband 64-QAM signal is converted to analog by the digital-to-analog converter (DAC) of an arbitrary waveform generator (AWG) with a sampling rate of 80 GSa/s. After amplification by a cascaded electrical amplifier (EA), the boosted 64-QAM signal drives the IQ modulator. The external cavity laser (ECL1) at 1551.35 nm, with a linewidth of 100 kHz and an average power of 16 dBm, serves as the signal light source and is modulated by the IQ modulator. The deployed MZM has a 3-dB optical bandwidth of 30 GHz, a half-wave voltage of 2.7 V at 1 GHz, and a 5-dB insertion loss. ECL2 at a center wavelength of 1550.26 nm works as a local oscillator (LO), with a frequency spacing of 81 GHz from the modulated ECL1 lightwave. The coupled light beam is delivered over 10-km standard single-mode fiber (SSMF), and its optical power is adjusted by an attenuator (ATT) to obtain the optimum input power into the photodiode (PD). The PD adopted in our experiment operates over the frequency range of 10–170 GHz at a −2-V DC bias, and its output power is −7 dBm.

At the W-band wireless transmitter, the generated 81-GHz signal is emitted from a W-band horn antenna (HA). Its power is first amplified by a low-noise amplifier (LNA) to obtain a suitable input power into the HA. A paired W-band HA is employed to receive the W-band signal. However, a W-band wireless link composed solely of a pair of HAs would not work without appropriate W-band amplifiers, owing to the low SNR. Here, we employ a pair of identical lenses (Lens 1 and Lens 2) between the HAs to focus the mm-wave signal and thereby boost the received power; their diameter and focal length are 10 cm and 20 cm, respectively. The transmitting HA is placed at the focus of Lens 1, and the wireless mm-wave signal is then focused at the position of the receiving W-band HA.

At the receiver end, the received signal at 81 GHz is first down-converted into an intermediate-frequency (IF) signal by a commercial W-band mixer with a 9.5-dB conversion loss, driven by a local oscillator (LO) source operating at 75 GHz. The IF signal at 6 GHz is then boosted by an electrical amplifier (EA) with 33-dB gain and 14-dBm saturation output power over the DC to 50 GHz band. Finally, the boosted signal is captured by a digital storage oscilloscope (OSC) with a sampling rate of 120 GSa/s and an electrical bandwidth of 45 GHz. As shown in inset (a) of Fig. 4, the captured signal is processed offline by DSP steps including down-conversion to baseband, resampling, frequency offset estimation (FOE) and carrier phase recovery (CPR). In particular, we also compare the BER performance of the CMMA equalizer, the CMMA-Volterra equalizer and the dual-GRU equalizer after these DSP steps. Figure 4(b) shows the received electrical spectrum of the 64-QAM signal, with the IF at 6 GHz. The devices used in the experiment are not ideal: the frequency responses of the low-noise amplifier (LNA) and the mixer are not flat across the W-band, and this imperfect frequency response causes the fluctuation of the electrical spectrum seen in Fig. 4(b).

The dual-GRU network is built, trained and evaluated in PyTorch 1.6.0. In our model, the cross-entropy loss is chosen as the loss function, and the Adam optimizer is employed to optimize the dual-GRU network. The whole data set is divided into training data (50%) and testing data (50%).

4. Experimental results and discussions

PRBS patterns are used for the training and testing sets. Therefore, to make the pattern irregular, we shuffle the training samples to break the generation pattern of the PRBS. We compare the BER with and without shuffling in Fig. 5(a), which shows that the BER of the 64-QAM signal with shuffling is clearly higher than that without shuffling. This confirms that, after shuffling, the performance comes from learning the channel characteristics rather than from learning the pattern. The BER versus received optical power on the training set and the testing set is shown in Fig. 5(b). The performance of the dual-GRU equalizer on the training set is similar to that on the testing set, which indicates that there is no over-fitting issue in our experiment. Furthermore, the MSE on the training set and the testing set versus the training epoch is shown in Fig. 5(c). We employ a dropout layer to avoid over-fitting. During training, the dropout layer randomly turns off some neurons (sets their outputs to zero). In the testing stage, all neurons are active to obtain the optimal performance of the neural network. It can be seen from Fig. 5(c) that the MSE on the testing set is lower than the MSE on the training set, which also verifies our statement.
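The data handling described above (a 50%/50% split with shuffling applied to the training samples only) could be sketched as below. Taking the first half of the record as the training set, the fixed `seed` and the function name `split_and_shuffle` are assumptions for illustration; the paper does not state how the two halves are allocated.

```python
import random


def split_and_shuffle(samples, labels, train_fraction=0.5, seed=0):
    """Split into training and testing pairs, then shuffle the training pairs.

    Each sample stays paired with its own label, so shuffling only breaks
    the ordering of the PRBS-derived sequence, not the supervision.
    """
    n_train = int(len(samples) * train_fraction)
    train = list(zip(samples[:n_train], labels[:n_train]))
    test = list(zip(samples[n_train:], labels[n_train:]))
    random.Random(seed).shuffle(train)  # testing order is left untouched
    return train, test


# Toy data: ten "received samples" with labels equal to twice the sample.
samples = list(range(10))
labels = [2 * s for s in samples]
train, test = split_and_shuffle(samples, labels)
```

Shuffling pairs rather than samples and labels separately is the important design point: the network still sees correct targets, only in an order that no longer follows the PRBS generator.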

Fig. 5. (a) BER performance vs shuffle and un-shuffle. (b) the BER versus received optical power on training set and testing set (c) MSE value on training set and testing set versus the training epoch

Based on our experimental system for 10-Gbaud W-band 64-QAM wireless transmission, we further investigate the performance of dual-GRU equalization in terms of the training data size, the neuron number in the hidden layer, and the number of training epochs.

4.1 Training data size

Assuming that the size of the dataset is fixed, a larger portion for training increases the computational complexity and time delay, whereas a smaller one can result in worse performance. Therefore, the ratio of training data to testing data plays a critical role in the dual-GRU network. Figure 6 shows the tested BER performance versus training data size, where the red and blue curves are for 64-QAM and 16-QAM transmission, with 0-dBm and −1-dBm optical power into the PD, respectively.

Fig. 6. The BER versus the training data size. (a)-(c) the constellation diagrams of the 16-QAM signals when the training data are 10000/18000/26000, respectively. (d)-(f) the constellation diagrams of the 64-QAM signals when the training data are 10000/20000/26000, respectively.

It is evident that the BER declines as the training data size increases. When the number of 64-QAM training samples exceeds 20000, the BER drops below 1×10⁻²; considering the tradeoff between performance and complexity, 20000 is therefore set as the 64-QAM training size in the following investigation. For 16-QAM signals, the BER falls significantly while the training size is below 18000, so 18000 is selected as the benchmark for the following discussion. Figures 6(a)-(c) illustrate the received constellation diagrams of the 16-QAM signals when 10000/18000/26000 samples are used as the training data, respectively. Similarly, those of the 64-QAM signals with 10000/20000/26000 training samples are shown in Figs. 6(d)-(f), respectively.

4.2 Training iterative epoch

Another key factor that affects the training speed and precision is the number of training epochs. Generally, a single epoch is not enough to obtain optimal weight values, which is referred to as “under-fitting”, while too many epochs lead to “over-fitting” and high complexity. Figure 7(a) depicts the MSE versus epoch when the optical power into the PD is fixed at 0 dBm; both curves show that the MSE falls sharply at the early stage and then converges slowly as the epoch increases.

Fig. 7. (a) MSE versus the epoch when the optical power is fixed at 0 dBm. (b) BER of the 16-QAM signal versus the epoch. (c) BER of the 64-QAM signal versus the epoch. (d)-(f) Constellation diagrams of the 16-QAM signal with 20/40/100 epochs, respectively. (g)-(i) Constellation diagrams of the 64-QAM signal with 20/40/100 epochs, respectively.

To further explore the effect of the epoch on performance, the BERs of 16-QAM and 64-QAM are shown in Figs. 7(b)-(c). They follow the same tendency as the MSE and converge to 1.19×10⁻⁴ and 1.34×10⁻², respectively, at 40 epochs. Figures 7(d)-(f) and (g)-(i) are the constellation diagrams of the recovered 16-QAM and 64-QAM signals with 20/40/100 epochs, respectively. As the number of epochs rises, the constellation points become clearly separated; finally, 40 epochs are chosen in view of complexity and BER performance.

4.3 Neuron number in hidden layer

Increasing the number of neuron cells in a hidden layer often leads to “over-fitting” and high complexity. In an over-fitted model, the performance on the training set improves remarkably, but the model underperforms on the testing set. Figure 8 gives the relationship between the BER performance and the number of neuron cells in one hidden layer when the input power is 0 dBm.

Fig. 8. BER performance vs the neuron number. (a-c) the constellation diagrams of the 16-QAM when the neuron number is 28, 378 and 528, respectively. (d-f) the constellation diagram of the 64-QAM when the neuron number is 28, 278 and 528, respectively.

The blue curve represents the BER of the 16-QAM signal versus the neuron number in a hidden layer. It can be seen that the BER decreases rapidly while the number of neurons is small and becomes stable once it exceeds 250. Figures 8(a)-(c) show the constellation diagrams of the received 16-QAM signals when the neuron number is 28, 378 and 528, respectively. Figure 8(a) shows that the trained 16-QAM signals are under-fitted when there are only 28 neuron cells in a hidden layer. In Fig. 8(b), the quality of the constellation diagram is improved with 378 cells in one hidden layer. Although more cells are deployed in the trained network, little improvement is achieved owing to over-fitting, as shown in Fig. 8(c). Therefore, the neuron number is fixed at 378 to meet both the complexity and the training accuracy requirements. Here, the BER of the received 16-QAM signals drops to 9.3×10⁻⁵.

The red curve illustrates the BER of the 64-QAM signals versus the neuron number in a hidden layer. As the neuron number grows, the BER improves in a manner similar to that of the blue curve. Figures 8(d)-(f) show the constellation diagrams when the neuron number is 28, 278 and 528, respectively. It can be seen that the optimal neuron number is 278, at which the BER of the 64-QAM signal falls to 7.8×10⁻³.

4.4 Performance comparison of nonlinear dual-GRU, Volterra equalization and linear CMMA equalization

First, we compare the performance of the dual-GRU and the traditional CMMA algorithm. Figure 9 illustrates the BERs of 16-QAM signals versus the input optical power. The BER improves significantly as the input optical power rises, achieving the best performance when the optical power is −1 dBm. When the optical power is greater than −4 dBm, the BER satisfies the HD-FEC threshold (3.8×10⁻³). Figures 9(a)-(b) show the constellation diagrams of 16-QAM signals employing the traditional CMMA algorithm and the dual-GRU, respectively, when the optical power is −1 dBm. At this power, the BER drops to 1.26×10⁻⁴. As seen from Fig. 9, the dual-GRU method improves the BER performance compared with the traditional CMMA algorithm, with a gain of approximately 0.5 dB at the BER of HD-FEC. We further study the BER performance of the different algorithms for the higher-order 64-QAM signals.

Fig. 9. BER of 16-QAM signal versus the input optical power. (a) The constellation diagram employing the traditional CMMA algorithm. (b) The constellation diagram with dual-GRU.

Figure 10 illustrates the BERs of 64-QAM signals versus the input optical power using different algorithms. It can be seen that the BER decreases gradually as the optical power increases. When the optical power is 0 dBm, the BER performance is optimal. Figures 10(a)-(c) show the constellation diagrams of 64-QAM signals based on CMMA, Volterra and GRU, and the BERs are 0.033 (which does not satisfy the SD-FEC threshold), 0.0198 and 0.021, respectively. Figure 10(d) is the constellation diagram with the dual-GRU method. It is worth noting that the BER drops to 0.0135.
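The four BER values quoted above can be compared against the FEC threshold with a few lines of Python. This is a small worked check that assumes the SD-FEC threshold is the 2×10⁻² BER quoted in the introduction and conclusions; the HD-FEC value of 3.8×10⁻³ is the one given for 16-QAM. The dictionary and variable names are illustrative.

```python
SD_FEC = 2e-2    # assumed SD-FEC BER threshold (value quoted as 2x10^-2)
HD_FEC = 3.8e-3  # HD-FEC BER threshold stated in the text

# BERs of 64-QAM at 0-dBm optical power, as reported for Fig. 10(a)-(d).
bers = {"CMMA": 0.033, "Volterra": 0.0198, "GRU": 0.021, "dual-GRU": 0.0135}

meets_sd_fec = {name: ber <= SD_FEC for name, ber in bers.items()}
meets_hd_fec = {name: ber <= HD_FEC for name, ber in bers.items()}
best = min(bers, key=bers.get)
```

Under that assumed threshold, the Volterra and dual-GRU results fall within the SD-FEC limit at 0 dBm, while the CMMA and single-GRU results do not, and none reaches the HD-FEC limit.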

Fig. 10. BER of 64-QAM signal versus the input optical power based on different algorithms. The constellation diagram with (a) CMMA algorithm. (b) Volterra algorithm. (c) GRU algorithm. (d) dual-GRU algorithm.

As can be seen from Fig. 10, compared with the traditional CMMA algorithm, the dual-GRU algorithm has a gain of 2 dB in receiver sensitivity at the BER of SD-FEC. Compared with the Volterra algorithm and the GRU method, the dual-GRU algorithm has a gain of 1 dB in receiver sensitivity at the BER of SD-FEC.

Since neural network algorithms are generally believed to incur higher complexity, we compare the number of multiplications of the Volterra, GRU and dual-GRU algorithms. According to the formulas in Section 2, the multiplicative complexity of the Volterra algorithm is 11890 in this paper, much less than the 174080 of the GRU algorithm. The complexity of the dual-GRU algorithm is 348160, almost thirty times that of Volterra. Although complexity is sacrificed, the dual-GRU algorithm substantially mitigates the nonlinear distortion in the ROF system.
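The quoted multiplication counts can be checked against each other with simple arithmetic; the variable names below are illustrative, and the three counts are the ones reported in the text.

```python
volterra_mults = 11890
gru_mults = 174080
dual_gru_mults = 348160

# The dual-GRU runs two GRUs, so its count is twice that of a single GRU.
is_double = dual_gru_mults == 2 * gru_mults

# Complexity relative to the Volterra equalizer.
gru_ratio = gru_mults / volterra_mults
dual_gru_ratio = dual_gru_mults / volterra_mults
```

The dual-GRU ratio comes out slightly above 29, consistent with the statement that it is almost thirty times the Volterra complexity.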

5. Conclusions

In this paper, a new dual-GRU equalizer for a 60-Gbps 64-QAM ROF transmission system over 10-km SMF and a 1.2-m wireless link at 81 GHz is experimentally demonstrated. We optimized the dual-GRU equalizer in terms of the training data size, the neuron number and the number of training epochs. We compared the BER performance of the novel dual-GRU equalizer, the classical CMMA equalizer and the Volterra nonlinear equalizer. Compared with the classical CMMA equalizer, the experimental results show that the new method improves the receiver sensitivity of the 16-QAM signal by 0.5 dB at the BER of HD-FEC. For the 64-QAM signal, the dual-GRU equalizer achieves a receiver sensitivity improvement of 2 dB over the traditional CMMA algorithm at the BER of SD-FEC, and an improvement of 1 dB at a BER of 2×10⁻² in comparison with the Volterra nonlinear equalizer. With the proposed dual-GRU scheme, the BER performance of the signal is substantially improved. Therefore, the proposed dual-GRU equalization scheme is promising for future B5G ROF-based communication applications.

Funding

National Key Research and Development Program of China (2018YFB1801703); National Natural Science Foundation of China (61675048, 61720106015, 61805043, 61835002, 91938202).

Disclosures

The authors declare no conflicts of interest.

Data availability

The raw/processed data required to reproduce these findings cannot be shared at this time as the data also forms part of an ongoing study.

References

1. T. S. Rappaport, S. Sun, R. Mayzus, H. Zhao, Y. Azar, K. Wang, G. N. Wong, J. K. Schulz, M. Samimi, and F. Gutierrez, “Millimeter wave mobile communications for 5G cellular: It will work!” IEEE Access 1, 335–349 (2013).

2. X. Li, J. Zhang, J. Xiao, Z. Zhang, Y. Xu, and J. Yu, “W-band 8QAM vector signal generation by MZM-based photonic frequency octupling,” IEEE Photonics Technol. Lett. 27(12), 1257–1260 (2015).

3. W. Zhou and C. Qin, “Simultaneous generation of 40, 80 and 120 GHz optical millimeter-wave from one Mach–Zehnder modulator and demonstration of millimeter-wave transmission and down-conversion,” Opt. Commun. 398, 101–106 (2017).

4. T. Nagatsuma, “Photonic generation of millimeter waves and its applications,” in Proc. Opt. Fiber Commun. Conf., Los Angeles, CA, USA, 2012, Paper OM2B.7.

5. X. Li, J. Yu, J. Xiao, and Y. Xu, “Fiber-wireless-fiber link for 128-Gb/s PDM-16QAM signal transmission at W-band,” IEEE Photonics Technol. Lett. 26(19), 1948–1951 (2014).

6. H. T. Huang, C. T. Lin, C. H. Ho, W. L. Liang, C. C. Wei, Y. H. Cheng, and S. Chi, “High spectral efficient W-band OFDM-RoF system with direct-detection by two cascaded single-drive MZMs,” Opt. Express 21(14), 16615–16620 (2013).

7. C. B. Huang, J. W. Shi, N. W. Chen, H. P. Chuang, J. E. Bowers, and C. L. Pan, “Remotely up-converted 20-Gbit/s error-free wireless on–off-keying data transmission at W-band using an ultra-wideband photonic transmitter mixer,” IEEE Photonics J. 3(2), 209–219 (2011).

8. A. Kanno, K. Inagaki, I. Morohashi, T. Sakamoto, T. Kuri, I. Hosako, T. Kawanishi, Y. Yoshida, and K. I. Kitayama, “40 Gb/s W-band (75–110 GHz) 16-QAM radio-over-fiber signal generation and its wireless transmission,” Opt. Express 19(26), B56–B63 (2011).

9. R. Puerta, J. Yu, X. Li, Y. Xu, J. J. V. Olmos, and I. T. Monroy, “Single carrier dual-polarization 328-Gb/s wireless transmission in a D-band millimeter wave 2 × 2 MU-MIMO radio-over-fiber system,” J. Lightwave Technol. 36(2), 587–593 (2018).

10. T. Schneider, “Ultrahigh-bitrate wireless data communications via THz links; Possibilities and challenges,” J. Infrared, Millimeter, Terahertz Waves 36(2), 159–179 (2015).

11. J. Zhang, J. Yu, X. Li, K. Wang, and W. Zhou, “200 Gbit/s/λ PDM-PAM-4 PON system based on intensity modulation and coherent detection,” J. Opt. Commun. Netw. 12(1), A1–A8 (2020).

12. J. Zhang, J. S. Wey, J. Shi, and J. Yu, “Single-wavelength 100-Gb/s PAM-4 TDM-PON achieving over 32-dB power budget using simplified and phase insensitive coherent detection,” in Proc. Eur. Conf. Opt. Commun., Rome, Italy, pp. 1–3 (2018).

13. C. Xie, S. Spiga, P. Dong, P. J. Winzer, and M. Amann, “Generation and transmission of 100-Gb/s PDM 4-PAM using directly modulated VCSELs and coherent detection,” in Proc. Optical Fiber Commun. Conf., San Francisco, CA, USA, 2014, Paper Th3K.2.

14. K. M. Gharaibeh, “Nonlinear Distortion in Wireless Systems: Modeling and Simulation with Matlab,” Wiley: Hoboken, NJ, USA, 2011.

15. I. Fijalkow, C. E. Manlove, and C. R. Johnson, “Adaptive fractionally spaced blind CMA equalization: Excess MSE,” IEEE Trans. Signal Process. 46(1), 227–231 (1998).

16. L. Zhao, R.-K. Shiu, W. Zhou, R. Zhang, S. Shen, Y. Li, J. Yu, and G.-K. Chang, “Nonlinear compensation in W-band MM-wave communication system with heterodyne coherent detection,” Opt. Fiber Technol. 54, 102099 (2020).

17. L. Sehovac and K. Grolinger, “Deep Learning for Load Forecasting: Sequence to Sequence Recurrent Neural Networks With Attention,” IEEE Access 8, 36411–36426 (2020).

18. P. Li, L. Yi, L. Xue, and W. Hu, “56 Gbps IM/DD PON based on 10G-Class Optical Devices with 29 dB Loss Budget Enabled by Machine Learning,” Optical Fiber Commun. Conf., paper M2B.2 (2018).

19. B. Karanov, M. Chagnon, F. Thouin, T. A. Eriksson, H. Bülow, D. Lavery, P. Bayvel, and L. Schmalen, “End-to-End Deep Learning of Optical Fiber Communications,” J. Lightwave Technol. 36(20), 4843–4855 (2018).

20. S. Guob, G. Peng, A. Yangd, and Y. Qiaoe, “Deep Neural Network Based Chromatic Dispersion Estimation With Ultra-Low Sampling Rate for Optical Fiber Communication Systems,” IEEE Access 7, 84155–84162 (2019).

21. S. Deligiannidis, A. Bogris, C. Mesaritakis, and Y. Kopsinis, “Compensation of Fiber Nonlinearities in Digital Coherent Systems Leveraging Long Short-Term Memory Neural Networks,” J. Lightwave Technol. 38(21), 5991–5999 (2020).

22. T. Xu, T. Xu, and I. Darwazeh, “Deep Learning for Interference Cancellation in Non-Orthogonal Signal Based Optical Communication Systems,” Progress In Electromagnetics Research Symposium (PIERS) (2019).

23. C. Häger and H. D. Pfister, “Nonlinear Interference Mitigation via Deep Neural Networks,” Optical Fiber Commun. Conf., paper W3A.4 (2018).

24. D. Wang, M. Zhan, J. Li, Z. Li, and J. Li, “Intelligent constellation diagram analyzer using convolutional neural network-based deep learning,” Opt. Express 25(15), 17150–17166 (2017).

25. O. Kotlyar, M. K. Kopae, M. Pankratova, A. Vasylchenkova, J. E. Prilepsky, and S. K. Turitsyn, “Convolutional long-short term memory neural network equaliser for nonlinear Fourier transform-based optical transmission systems,” Opt. Express 29(7), 11254–11267 (2021).

26. W. Wang and B. Chang, “Graph-based Dependency Parsing with Bidirectional LSTM,” in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (2016).

27. W. Zhou, L. Zhao, J. Zhang, K. Wang, J. Yu, Y. W. Chen, S. Shen, R. K. Shiu, and G. K. Chang, “135-GHz D-Band 60-Gbps PAM-8 Wireless Transmission Employing a Joint DNN Equalizer With BP and CMMA,” J. Lightwave Technol. 38(14), 3592–3601 (2020).

28. T. A. Eriksson, H. Bülow, and A. Leven, “Applying Neural Networks in Optical Communication Systems: Possible Pitfalls,” IEEE Photonics Technol. Lett. 29(23), 2091–2094 (2017).

29. S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Comput. 9(8), 1735–1780 (1997).

30. X. Dai, X. Li, M. Luo, Q. You, and S. Yu, “LSTM networks enabled nonlinear equalization in 50-Gb/s PAM-4 transmission links,” Appl. Opt. 58(22), 6079–6084 (2019).

31. X. Lu, C. Lu, W. Yu, L. Qiao, S. Liang, A. P. T. Lau, and N. Chi, “Memory-controlled deep LSTM neural network post-equalizer used in high-speed PAM VLC system,” Opt. Express 27(5), 7822–7833 (2019).

32. K. Cho, B. van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, “Learning phrase representations using RNN encoder–decoder for statistical machine translation,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), (Association for Computational Linguistics, Doha, Qatar, 2014), pp. 1724–1734.

33. K. Cho, B. van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, “Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation,” Computer Science, 2014. https://arxiv.org/pdf/1406.1078.pdf

34. X. Liu, Y. Wang, X. Wang, H. Xu, C. Li, and X. Xin, “Bi-directional gated recurrent unit neural network based nonlinear equalizer for coherent optical communication system,” Opt. Express 29(4), 5923–5933 (2021).

35. I. Aldaya, E. Giacoumidis, A. Tsokanos, M. Jarajreh, Y. Wen, J. Wei, G. Campuzano, M. L. F. Abbade, and L. P. Barry, “Compensation of nonlinear distortion in coherent optical OFDM systems using a MIMO deep neural network-based equalizer,” Opt. Lett. 45(20), 5820–5823 (2020).


Optics Express