Direct decoding of nonlinear OFDM-QAM signals using convolutional neural network


Abstract

The nonlinear Fourier transform (NFT), a technique with great potential to overcome the capacity limit of fibre-optic communication systems, faces speed and accuracy bottlenecks in practice. Machine learning using convolutional neural networks shows great potential in NFT-based applications. We have developed a convolutional neural network for decoding information in NFT-based communication and numerically demonstrated its performance in comparison to a fast NFT algorithm. The comparison indicates the potential of convolutional neural networks to replace NFT calculations for the decoding of information.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Mitigating the effects of optical fibre nonlinearity is one of the key challenges for today’s fibre optical communication. Fibre nonlinearity causes nonlinear interference noise [1] that impacts symbol detection, which leads to a fundamental limitation on channel capacity known as the Shannon limit [2]. New techniques based on the Nonlinear Fourier Transform (NFT) can be used to mitigate the influence of nonlinearity in communication channels. In NFT-based communications, the information is encoded using invariant entities of the NFT. However, due to the complexity of NFT operations, speed and accuracy remain major bottlenecks for real-world applications despite years of development [3,4]. In recent years, a few fast NFT algorithms have been developed [5-9], but none has been optimized to reach the speed of the linear fast Fourier transform (FFT).

Machine learning (ML) has been demonstrated to be useful and has surpassed researchers’ expectations in many areas of science and technology. Applications of ML in optical communications are still in their early stages of development [10-13], but ML has already shown promising potential to overcome many outstanding challenges, including in the field of NFT communication [14-17]. However, the applications in NFT communication have mainly been for post-processing, such as constellation classification [17] and equalisation [15], rather than replacing the direct NFT calculation. Jones et al. [14], as an exception, have applied a neural network directly to the time domain of second-order soliton pulses (data carried by the discrete spectrum of the NFT with 2 eigenvalues [18]) with QPSK modulation. To the best of our knowledge, there has been no attempt to use a neural network to bypass the NFT calculation in more complex settings than second-order solitons, such as signals encoded in the continuous spectrum with Orthogonal Frequency-Division Multiplexing (OFDM) and Quadrature Amplitude Modulation (QAM).

The theory of the NFT implies that information encoded in the nonlinear Fourier domain remains unchanged in certain ways; the question is how to reveal this obfuscated information. ML, especially with the use of convolutional neural networks (CNN) [19], has demonstrated the capacity to extract features from obfuscated data, one example being the application of CNNs to facial recognition [20]. With respect to NFT communications, the main question is whether a CNN can be employed to recognise and extract the hidden information in the NFT domain.

In this work, we develop a simulation model with which we demonstrate the capability of using a CNN to decode NFT signals directly. Figure 1 shows the schematic of our approach. Our transmission is similar to a typical OFDM-QAM system, except that the usual frequency domain is replaced by a nonlinear frequency domain. In other words, the time-domain signal is obtained by invoking an inverse NFT instead of the inverse Fourier transform. With the NFT, the multiplexing scheme is also called Nonlinear Frequency-Division Multiplexing (NFDM). The propagation of optical signals through an optical fibre is simulated using a split-step Fourier method [21], where fibre loss is assumed to be perfectly compensated by continuous amplification along the fibre. Other impairments such as insertion loss and signal distortions at the transmitter and receiver are ignored. The output signals are then decoded directly using a CNN to recover the binary data. At the same time, we also decode the same output signals using the direct NFT approach (NFT, demodulation and decision-making) and compare the accuracy and speed of the two approaches. We train the network for pulses with a fixed propagation length as well as with random propagation lengths. In all cases, an improvement in accuracy and up to a 60-fold speed increase are observed with the CNN approach.

Fig. 1. Simulation scheme. Two approaches are used at the receiver end: CNN approach and direct NFT approach.

The paper is structured as follows. In Section 2, a brief introduction to the NFT is given. We then define the problem and parameters in Section 3. In Section 4, we show the neural network training results with time-domain and frequency-domain signals, respectively, and compare them with the direct NFT approach. We conclude in Section 5.

2. Brief introduction to NFT

Optical signals propagating inside an optical fibre are governed by the nonlinear Schrödinger equation,

$$\frac{\partial}{\partial z}q(t,z) ={-}j\frac{\partial^{2}}{\partial t^{2}}q(t,z) - 2j\left|q(t,z)\right|^{2} q(t,z),$$
where $q(t,z)$ is the normalized optical signal in the time domain at normalized propagation distance $z$ [18]. The pulse $q$ and distance $z$ can be related to real-world units through the following relations:
$$ q={\sqrt\frac{\gamma L}{2}}A\textrm{,}\quad z=\frac{l}{L}\textrm{,}\quad t=\sqrt{\frac{2}{\left|\beta_2\right|L}}\tau, $$
where $A$ is the real-world optical signal amplitude, $L$ is the total optical fibre length, $\tau$ is the real-world time, $\gamma$ is the nonlinear coefficient of the optical fibre and $\beta _2$ is the group velocity dispersion of the fibre.
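For illustration, the following minimal Python sketch maps a real-world field $A(\tau)$ at distance $l$ to the normalized quantities $(q,t,z)$ defined above. The fibre parameters ($\gamma$, $\beta_2$, $L$) and the example pulse are illustrative assumptions and are not taken from the simulations reported in this paper.

import numpy as np

# Illustrative fibre parameters (assumptions, not the values used in this paper)
gamma = 1.3e-3     # nonlinear coefficient, 1/(W m)
beta2 = -21.7e-27  # group velocity dispersion, s^2/m
L = 500e3          # total fibre length, m

def to_normalized(A, tau, l):
    """Map a real-world field A(tau) at distance l to normalized (q, t, z)."""
    q = np.sqrt(gamma * L / 2.0) * A            # q = sqrt(gamma*L/2) * A
    t = np.sqrt(2.0 / (abs(beta2) * L)) * tau   # t = sqrt(2/(|beta2|*L)) * tau
    z = l / L                                   # z = l/L
    return q, t, z

# Example: a Gaussian pulse sampled over a 1 ns window, halfway along the fibre
tau = np.linspace(-0.5e-9, 0.5e-9, 4096)
A = np.sqrt(72e-3) * np.exp(-(tau / 2e-10) ** 2)  # field in sqrt(W)
q, t, z = to_normalized(A, tau, 250e3)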

Due to the second term on the right-hand side of the equation (the nonlinear term), the signals in different linear frequency channels can influence each other depending on the signal power, which causes signal degradation when the power is high. One way to solve this problem is to apply the NFT, after which the channels at different nonlinear frequencies no longer influence each other.

Performing an NFT on a time-domain signal $q(t)$ is equivalent to solving the following differential equation [18]

$$v_{t}=\left( \begin{array}{cc} -i\lambda & q(t) \\ -q^{*}(t) & i\lambda \end{array} \right)v$$
with the boundary conditions:
$$v^1(t,\lambda) = \left(\begin{array}{l} v^1_1(t,\lambda)\\ v^1_2(t,\lambda) \end{array} \right) \to \left( \begin{array}{l} 0 \\ 1 \end{array} \right)e^{j\lambda t}, \quad t\to +\infty$$
$$v^{2}(t,\lambda) = \left(\begin{array}{l} v^{2}_1(t,\lambda)\\ v^{2}_2(t,\lambda) \end{array} \right) \to \left( \begin{array}{l} 1 \\ 0 \end{array} \right)e^{-j\lambda t}, \quad t\to -\infty$$
$$v^1(t,\lambda^*) \to \left( \begin{array}{r} 0 \\ 1 \end{array} \right)e^{j\lambda^* t}, \quad t\to +\infty$$
$$v^{2}(t,\lambda^*) \to \left( \begin{array}{r} -1 \\ 0 \end{array} \right)e^{-j\lambda^* t}, \quad t\to -\infty $$
where $v_t$ is the derivative of $v$ and $\lambda$ is the complex nonlinear frequency also known as the eigenvalue.

Two scattering coefficients $a(\lambda )$ and $b(\lambda )$ can be defined through the relations

$$ v^{2}(t,\lambda)=a(\lambda)\tilde{v}^1(t,\lambda^*)+b(\lambda)v^1(t,\lambda) \textrm{,} $$
$$ \tilde{v}^{2}(t,\lambda)={-}b^*(\lambda^*)\tilde{v}^1(t,\lambda^*)+a^*(\lambda^*)v^1(t,\lambda)\textrm{,} $$
where
$$ \tilde{v}^1(t,\lambda^*)=\left( \begin{array}{r} v^{1\ast}_2(t,\lambda^\ast)\\ -v^{1\ast}_1(t,\lambda^\ast) \end{array} \right), \quad \tilde{v}^{2}(t,\lambda^*)=\left(\begin{array}{r} v^{2\ast}_2(t,\lambda^\ast)\\ -v^{2\ast}_1(t,\lambda^\ast) \end{array} \right) $$

Since $a(\lambda )$ and $b(\lambda )$ are invariant in time, they can be evaluated at any point in time. One choice is to evaluate at $t\to +\infty$. Solving $v^{2}(t,\lambda )$ from $t\to -\infty$ and using the boundary conditions, the following $a(\lambda )$ and $b(\lambda )$ can be obtained [18]:

$$ a(\lambda)=\displaystyle\lim_{t\to\infty} v^{2}_1(t,\lambda)e^{j\lambda t}, \quad b(\lambda)=\displaystyle\lim_{t\to\infty} v^{2}_2(t,\lambda)e^{{-}j\lambda t}. $$

One may notice that $a(\lambda )$ and $b(\lambda )$ are well defined for real $\lambda$, but for complex $\lambda$, $\tilde {v}^1(t,\lambda ^*)\to \infty$ when $t\to \infty$, so Eqs. (7) and (8) only have solutions when $a(\lambda )=0$. As a result, the NFT spectrum is continuous for real $\lambda$, whereas for complex $\lambda$ there are only discrete points. The continuous and discrete NFT spectra are defined as

$$ Q^{c}(\lambda)=b(\lambda)/a(\lambda), \quad \lambda\in\mathbb{R}{,}$$
$$ Q^{d}(\lambda)=b(\lambda)/a^\prime(\lambda), \quad \lambda\in\mathbb{C^+}{,}$$
where $\displaystyle a^{\prime} (\lambda )=\frac {\partial a(\lambda )}{\partial \lambda }$. In the nonlinear Fourier domain, the NFT spectra at different propagation lengths can be related to those at the beginning of the propagation through the following relations:
$$ Q^{c}(\lambda,z) = Q^{c}(\lambda,0)e^{-4j\lambda^{2} z} {,} \quad \lambda\in\mathbb{R}{,}$$
$$ Q^{d}(\lambda,z) = Q^{d}(\lambda,0)e^{-4j\lambda^{2} z} {,} \quad \lambda\in\mathbb{C^{+}}{,}$$

Both the continuous and the discrete spectrum can be used to carry information. The work in Ref. [22] used the discrete spectrum $Q^{d}(\lambda )$. In this work, we focus on the continuous spectrum $Q^{c}(\lambda )$.
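To make the scattering definitions above concrete, the following Python sketch computes the continuous spectrum $Q^{c}(\lambda )=b(\lambda )/a(\lambda )$ of a sampled signal by integrating the Zakharov-Shabat system with $q(t)$ held piecewise constant over each sample. It is only an illustration of the definitions, not the fast NFT (FNFT) routine used for the comparisons later in this paper, and the test pulse is arbitrary.

import numpy as np

def forward_nft_continuous(q, t, lams):
    """Continuous NFT spectrum Q^c(lam) = b(lam)/a(lam) of a sampled signal q(t).
    The scattering problem is integrated with q held constant over each sample,
    starting from the boundary condition v -> (1, 0)^T exp(-j*lam*t) as t -> -inf."""
    h = t[1] - t[0]
    Qc = np.zeros(len(lams), dtype=complex)
    for i, lam in enumerate(lams):
        v = np.array([np.exp(-1j * lam * t[0]), 0.0], dtype=complex)
        for qn in q:
            k = np.sqrt(lam ** 2 + abs(qn) ** 2)
            M = np.array([[-1j * lam, qn], [-np.conj(qn), 1j * lam]])
            # exp(M*h) = cos(k*h)*I + (sin(k*h)/k)*M because M^2 = -k^2 * I
            T = np.cos(k * h) * np.eye(2) + h * np.sinc(k * h / np.pi) * M
            v = T @ v
        a = v[0] * np.exp(1j * lam * t[-1])   # limit defining a(lam)
        b = v[1] * np.exp(-1j * lam * t[-1])  # limit defining b(lam)
        Qc[i] = b / a
    return Qc

# Example: a weak rectangular pulse on a coarse grid (slow, illustration only)
t = np.linspace(-20, 20, 512)
q = np.where(np.abs(t) < 1, 0.3 + 0j, 0.0 + 0j)
Qc = forward_nft_continuous(q, t, np.linspace(-5, 5, 101))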

2.1 Q-modulation and b-modulation

OFDM and QAM are widely used techniques in optical communication. In particular, OFDM-QAM has recently become popular in the research field of NFT-based communication systems [23-28]. The modulation is usually applied to the continuous spectrum $Q^{c}(\lambda )$ directly, but modulating $b(\lambda )$ (b-modulation) has also been recommended as a better alternative [29-34].

The modulation can often be expressed as

$$s(\lambda)=\sum_n c_n \omega_n(\lambda)\textrm{,}$$
where $n$ indexes the channels, $c_n$ are the complex data symbols from the quadrature-amplitude modulation (QAM) and $\omega _n(\lambda )$ is the carrier waveform. Conventionally, a $\textrm{sinc}$ function is used as $\omega_n(\lambda )$ [24]. Other choices, such as a raised cosine or a flat-top window with cardinal sine carrier waveform, have also been proposed [30]. The function $s(\lambda )$ is the modulated spectrum; it is often scaled to match the desired signal power. In the Q-modulation case, $s(\lambda )$ is $Q^{c}(\lambda )$, and an inverse NFT can be performed to obtain the temporal signal $q(t)$ [24]. In the case of b-modulation, $s(\lambda )$ is $b(\lambda )$, and an additional step is needed to generate $a(\lambda )$ from $b(\lambda )$ before the inverse NFT step can be taken [29].

3. Machine learning tools and setups

MATLAB’s deep learning toolbox (MATLAB 2019a) is used in this work. The toolbox provides a collection of utility tools to help design, analyse and train networks without extensive knowledge of machine learning. This work does not emphasise the optimization of the neural network but rather demonstrates the application of CNNs and supervised learning to bypass NFT calculations [35].

Supervised learning can solve two types of problems: regression and classification. Regression problems give continuous-valued outputs, such as the NFT spectrum $Q^{c}(\lambda )$, while classification problems give discrete-valued outputs, such as the symbols $c_n$ in Eq. (14). In our work, we decode the information from the received signal directly, hence the network is set up for a classification problem.

3.1 Parameters of OFDM-QAM

To set up the problem, we first need a reasonable format for the OFDM-QAM. QAM constellations usually contain 16, 32, 64, 128, etc. points, and 16-, 32- and 64-QAM are common formats in NFT-related works [15,16,23-26,30,31,36]. MATLAB (2019a)’s deep learning toolbox is mainly intended for image and sequence data. For our problem, we use the “imageInputLayer” from the toolbox. A downside of using this input layer is that it only accepts 8-bit input values (256 levels), which is a relatively low resolution for digitising high-density QAMs. Because of this, and to demonstrate the concept, 16-QAM is chosen for this work ($c_n\in \{-3,-1,1,3\}+i\{-3,-1,1,3\}$). A simple encoding method is used, in which each alphabet of the constellation represents a 4-bit number between 0 and 15.
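For concreteness, a minimal sketch of this 4-bit to 16-QAM mapping is given below; the particular bit-to-point assignment is an illustrative assumption, since the text only specifies that each constellation point represents a 4-bit number.

import numpy as np

LEVELS = np.array([-3, -1, 1, 3])

def bits4_to_symbol(value):
    """Map an integer 0..15 to a 16-QAM point in {-3,-1,1,3} + j{-3,-1,1,3}.
    Two bits select the real level and two bits the imaginary level
    (an illustrative assignment)."""
    return LEVELS[value & 0b11] + 1j * LEVELS[value >> 2]

def symbol_to_bits4(c):
    """Inverse mapping, usable for hard decisions at the receiver."""
    re = int(np.argmin(np.abs(LEVELS - c.real)))
    im = int(np.argmin(np.abs(LEVELS - c.imag)))
    return (im << 2) | re

assert all(symbol_to_bits4(bits4_to_symbol(v)) == v for v in range(16))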

An $n$-channel 16-QAM pulse can represent $16^n$ different signals ($4\times n$ bits). When the number of channels is small, a single neural network can be used to classify all the input signals. For instance, a 3-channel 16-QAM pulse represents 12-bit signals, hence 4096 different signals for the neural network to classify. It can be seen that this approach quickly becomes impractical as the number of channels increases. However, we propose a straightforward approach to bypass this problem: use a separate network for each channel. In this way, each network only needs to identify 16 classes at a time, and a high channel-count problem can therefore be scaled down to a group of single-channel problems. In this work, 16 channels are used. With 16 channels, the total number of classes is around $1.8\times 10^{19}$, which is far too many to be classified by a single network, yet it is sufficient to demonstrate our approach while the networks can still be trained relatively quickly (2 to 3 days on a 28-core 2.4 GHz cluster node).
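As an illustration of this per-channel scaling, the following Python sketch shows how 16 independent 16-class classifiers could be combined into a single 64-bit decision; the classifier objects are placeholders and not the trained networks of this work.

import numpy as np

def decode_pulse(pulse_image, channel_nets):
    """Run one 16-class classifier per channel on the same input and return
    the predicted 4-bit symbol index for each channel. `channel_nets` is a
    placeholder for the list of trained single-channel classifiers, each
    returning 16 class scores."""
    return [int(np.argmax(net(pulse_image))) for net in channel_nets]

def symbols_to_bits(symbols):
    """Concatenate the per-channel 4-bit symbols into one bit sequence."""
    return [int(b) for s in symbols for b in format(s, "04b")]

# Toy usage with dummy "classifiers" that return random scores
rng = np.random.default_rng(0)
dummy_nets = [lambda x, r=rng: r.random(16) for _ in range(16)]
bits = symbols_to_bits(decode_pulse(np.zeros((4096, 2)), dummy_nets))
assert len(bits) == 64   # 16 channels x 4 bits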

3.2 Parameters of NFT

The training data is generated using Q-modulation, where a flat-top window function, Eq. (15), is used as the carrier wave [30]

$$\omega_n(\lambda)=\sum_{m=0}^{15}a_m\left[\textrm{sinc}(T\frac{\lambda-n\lambda_s}{\pi}-m)+\textrm{sinc}(T\frac{\lambda-n\lambda_s}{\pi}+m)\right]\textrm{,}$$
where $\lambda$ is the nonlinear frequency, $\lambda _s$ is the nonlinear frequency shift between the channels, $T$ is the size of the time window of the pulse, $a_m$ are the coefficients taken from [30], $n$ is an integer from -7 to 8 indicating the channel number and $\textrm {sinc}(x)$ is defined as $\sin (\pi x)/\pi x$. This waveform is chosen because of its neat shape in the nonlinear frequency domain; its special properties associated with b-modulation are not utilized in this work. In the NFT frequency space, a frequency window of $\lambda \in [-13,13]$ is used with a frequency resolution of 0.01. In the time domain, a time window of $t\in [-T/2,T/2]$ is used with 4096 sampling points across it. An example of the function $s(\lambda )$ for $T=100$ and $\lambda _s=0.471$ is shown in Fig. 2.
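The following sketch evaluates the flat-top carrier of Eq. (15) and assembles the modulated spectrum $s(\lambda )$ of Eq. (14) on the grid described above. The coefficients $a_m$ are taken from Ref. [30] and are therefore left as placeholders here; only $a_0$ is set, which degenerates the carrier to a plain sinc.

import numpy as np

lam = np.arange(-13.0, 13.0, 0.01)   # nonlinear frequency grid, resolution 0.01
T = 100.0                            # time window size
lam_s = 0.471                        # nonlinear frequency shift between channels

# Placeholder flat-top coefficients; the actual a_m values are given in Ref. [30].
a_m = np.zeros(16)
a_m[0] = 1.0

def carrier(n):
    """Flat-top carrier omega_n(lambda) of Eq. (15) for channel n.
    np.sinc(x) = sin(pi*x)/(pi*x), matching the sinc definition in the text."""
    u = T * (lam - n * lam_s) / np.pi
    return sum(a_m[m] * (np.sinc(u - m) + np.sinc(u + m)) for m in range(16))

def modulate(symbols):
    """Q-modulation s(lambda) = sum_n c_n*omega_n(lambda), channels n = -7..8 (Eq. (14))."""
    s = np.zeros_like(lam, dtype=complex)
    for n, c in zip(range(-7, 9), symbols):
        s += c * carrier(n)
    return s

levels = np.array([-3, -1, 1, 3])
vals = np.random.randint(0, 16, 16)                  # 16 random 4-bit symbols
s = modulate(levels[vals & 3] + 1j * levels[vals >> 2])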

Fig. 2. The $s(\lambda )$ function of an example pulse. Left: real part of $s(\lambda )$. Right: imaginary part of $s(\lambda )$.

4. Training CNN for NFT

A signal pulse $q(t)$ that arrives at the receiver is significantly distorted due to fibre nonlinearity and chromatic dispersion. However, its NFT spectrum still contains the information it carries. Therefore, recovering the information is essentially a matter of finding the relation between the shape of the pulse and the information it contains. CNNs are used for extracting features from images and have shown great success; the same approach can be applied to find the hidden features in $q(t)$ that carry the information. Here, we use a simple neural network design that has been used to recognise handwritten digits [37]. The network architecture for a single channel is shown in Fig. 3. For a multi-channel network, the same input is passed to multiple copies of the same single-channel network.

Fig. 3. Network architecture for a single channel, Convolution/Conv: Convolutional layer, ReLU: Rectifier Linear Unit layer, Maxpool: Max pooling layer, FC: Fully Connected layer, Softmax: Softmax layer

The network consists of multiple layers. The first layer is called the input layer, which interfaces the input data with the rest of the network. The last layer is called the output layer, which gives the prediction results of the network. In this case, the input to the network is the optical signal and the output is the data symbols $c_n$. For a CNN, a special type of layer called a convolutional layer is used, which is the key to feature recognition. It is often used together with an activation layer (ReLU) and a pooling layer (Maxpool). MATLAB’s deep learning toolbox has all the layers designed and ready for use, and one can easily design a network by drag and drop in the graphical design interface. The network design is a trial-and-error process. A Bayesian optimization approach can be applied to refine the design of the network. Such optimization is not used here since this work focuses on demonstrating the feasibility of using a CNN to directly decode nonlinear OFDM signals.

To fit the signal pulse into the 8-bit input layer, the complex $q(t)$ is scaled using the following equation

$$U(t)=128\times\frac{q(t)}{M}+127\textrm{,}$$
where $M$ is chosen such that $|q(t)|$ of all input pulses is smaller than or equal to $M$. The real and imaginary parts of $U(t)$ are then represented as two rows of pixels in an $N \times 2$ bitmap image, where $N$ is the number of sampling points across $q(t)$ [38]. The parameters for the rest of the layers used in this work are listed in Table 1.
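To make the input encoding and the layer stack of Fig. 3 concrete, the following sketch implements Eq. (16) and a single-channel classifier in PyTorch, used here only as a stand-in for the MATLAB toolbox layers. Since Table 1 is not reproduced in this text, the filter count, kernel size and pooling size are illustrative assumptions, and the scaling of Eq. (16) is read as applying to the real and imaginary parts separately.

import numpy as np
import torch
import torch.nn as nn

def encode_pulse(q, M):
    """Scale the complex q(t) into an 8-bit N x 2 'image' (Eq. (16)), applying
    the scaling to the real and imaginary parts separately."""
    re = 128.0 * q.real / M + 127.0
    im = 128.0 * q.imag / M + 127.0
    img = np.stack([re, im], axis=-1)                 # shape (N, 2)
    return np.clip(np.round(img), 0, 255).astype(np.uint8)

class ChannelNet(nn.Module):
    """Single-channel classifier: Conv -> ReLU -> Maxpool -> FC -> Softmax (Fig. 3).
    Filter count and kernel/pool sizes below are illustrative, not those of Table 1."""
    def __init__(self, n_samples=4096, n_classes=16):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=(9, 2), padding=(4, 0)),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(4, 1)),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(8 * (n_samples // 4), n_classes),
            nn.Softmax(dim=1),
        )

    def forward(self, x):          # x: (batch, 1, n_samples, 2)
        return self.classifier(self.features(x))

# One such network is trained per channel; all receive the same encoded pulse.
img = encode_pulse(np.zeros(4096, dtype=complex), M=1.0)
x = torch.tensor(img, dtype=torch.float32).reshape(1, 1, 4096, 2)
probs = ChannelNet()(x)            # 16 class probabilities for this channel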

Table 1. Network layer parameters

4.1 Training with time domain signals

4.1.1 Fixed propagation length

Signals that contain only continuous spectral components broaden in time under the influence of chromatic dispersion as they propagate through optical networks. If the propagation length is known, it has been proposed to apply a shift to the phase of $Q^{c}(\lambda )$ (called pre-compensation) before sending the pulse down the optical network. In this way, the maximum temporal span (or the guard interval) of the pulse can be reduced [23]. In this part of the work, the pulses are sent through the whole length of an optical fibre with a half-length pre-compensation at the beginning.

For the training data, a total of 100,000 random signals with $T=100$ are numerically generated and propagated through an optical fibre using a numerical pulse propagation simulation. Each signal pulse has 4096 discrete temporal samples, which makes the input to the neural network $4096\times 2$ in size. A split-step Fourier method is used to simulate the pulse propagation with a step size of $10^{-5}$ [21]. A batch size of 500 and a learning rate of 0.002 are used in the training with a stochastic gradient descent with momentum (SGDM) optimizer. No noise is included in the training data. Figure 4 shows the linear spectra of an example pulse before and after the optical fibre. The difference in the linear spectra demonstrates the severe nonlinear distortion experienced by the optical pulse. The networks for the 16 channels are trained using the same training data. The training is terminated when the prediction accuracy (the number of correct predictions versus the total number of predictions) reaches 100%. To validate the networks, independent testing data are generated: sets of 1,000 random signals are generated in the same way as the training data. In the different test data sets, distributed white Gaussian noise with different noise power is added to the pulses along the propagation.
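For reference, a minimal split-step Fourier sketch of the lossless, normalized propagation of Eq. (1) is given below. The step size matches the $10^{-5}$ quoted above; the launch pulse and grid are illustrative and are not taken from the training set.

import numpy as np

def ssfm(q0, t, z_total, dz=1e-5):
    """Symmetrized split-step Fourier solver for the normalized NLSE of Eq. (1):
    dq/dz = -j*d^2q/dt^2 - 2j*|q|^2*q (lossless, as assumed in this work)."""
    N = len(t)
    dt = t[1] - t[0]
    omega = 2.0 * np.pi * np.fft.fftfreq(N, d=dt)   # angular frequencies
    half_disp = np.exp(1j * omega ** 2 * dz / 2.0)  # half dispersion step
    q = q0.astype(complex)
    for _ in range(int(round(z_total / dz))):
        q = np.fft.ifft(half_disp * np.fft.fft(q))  # half dispersion
        q *= np.exp(-2j * np.abs(q) ** 2 * dz)      # full nonlinearity
        q = np.fft.ifft(half_disp * np.fft.fft(q))  # half dispersion
    return q

# Illustrative launch pulse, propagated over half the normalized length (z = 0.5)
t = np.linspace(-50, 50, 4096)
q_rx = ssfm(0.5 * np.exp(-t ** 2) * (1 + 0j), t, z_total=0.5)  # ~50,000 steps, slow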

Fig. 4. The linear spectra of an example pulse at the transmitter (TX) and receiver (RX).

Figure 5 shows the bit error rate (BER) of the predictions made by the trained networks as a function of optical signal-to-noise ratio (OSNR) in comparison to the BER obtained using a fast direct NFT algorithm (FNFT). The dotted curves are the BER of each channel, and the solid curves with circles are the total BER of all channels. The red and blue curves are obtained using the CNN and the direct NFT, respectively. As one can see from the figure, the network can (1) make predictions as accurate as the direct NFT approach at low noise levels and (2) achieve higher accuracy at high noise levels. Furthermore, on our desktop computer (i7-6800k CPU @ 3.4 GHz with 48 GB RAM), the evaluation speed of the CNN approach is over 5 times faster than the FNFT implementation (https://github.com/FastNFT/FNFT). We have to point out that the speed comparison is rough: neither FNFT nor the CNN is optimized, and the conditions are not strictly fair. The comparison results only show the potential of the CNN approach.

Fig. 5. The bit error rate of CNN prediction with time-domain signals and a fixed propagation lengths as a function of optical signal-to-noise ratio. Dotted curves: BER of individual channels. Solid curves with circles: the total BER of all channels. Red curves: CNN predictions. Blue curves: direct NFT.

4.1.2 Random propagation length

The networks trained in the previous section only work with pulses that propagate through a fibre with the same length as the training pulses. This is not convenient for practical implementation; it is preferable if a network can work with different propagation lengths. A straightforward approach to overcome this obstacle is to generate training data with random propagation lengths and train the networks with the new data. However, this approach was quickly proven inefficient. Training of the networks becomes harder as the nonlinear frequencies of the channels move away from zero (recall that $\lambda \in [-13,13]$). According to Eq. (12), the nonlinear phase accumulated during pulse propagation depends on both $\lambda$ and the propagation distance. For a given range of random propagation lengths, a larger $\lambda$ implies a larger range of accumulated nonlinear phase, which makes decoding and training more difficult.

To circumvent this obstacle, we apply the following NFT properties to the received signals to convert the non-zero nonlinear frequency channels to zero frequency [18]:

$$ e^{j\phi}q(t) \underset{\textrm{INFT}}{\stackrel{\textrm{NFT}}{\rightleftharpoons}} \; e^{{-}j\phi} Q^{c}(\lambda), $$
$$ q(t-t_0) \rightleftharpoons \; e^{{-}2j \lambda t_0} Q^{c}(\lambda), $$
$$ q(t)e^{{-}2j \lambda_0 t} \rightleftharpoons \; Q^{c}(\lambda-\lambda_0). $$

Here the right arrows represent the NFT and the left arrows represent the inverse NFT. Assuming channel $k$ of the received signal needs to be decoded at distance $z$, we can write the signal as:

$$q(t,z) \rightleftharpoons \; {Q^{c}(\lambda,z)} , $$
$$= \; Q^{c}(\lambda,0) e^{{-}4j \lambda^{2} z} , $$
$$ = \; Q^{c}_k(\lambda-\lambda_k,0) e^{{-}4j \lambda^{2} z} . $$
$Q^{c}_k(\lambda)$ is $Q^{c}(\lambda)$ shifted in nonlinear frequency by $\lambda_k$, where $\lambda_k$ is the central frequency of channel k. We apply another nonlinear frequency shift of $-\lambda_k$ to Eq. (22), using Eq. (19), to obtain
$$q(t,z) e^{2j \lambda_{k} t} \rightleftharpoons \; Q^{c}_{k}(\lambda+\lambda_{k}-\lambda_{k},0)e^{{-}4j(\lambda+\lambda_{k})^{2}z} $$
$$= \; Q^{c}_{k}(\lambda,0)e^{{-}4j \lambda^{2} z} e^{{-}8j \lambda\lambda_{k} z} e^{{-}4j \lambda_{k}^{2} z} .$$

The term $e^{{-}4j \lambda_{k}^{2} z}$ is a constant and using Eq. (17), we have

$$q(t,z) e^{2j \lambda_{k} t} e^{{-}4j \lambda_{k}^{2} z} \rightleftharpoons \; Q^{c}_{k}(\lambda,0)e^{{-}4j \lambda^{2} z} e^{{-}8j \lambda\lambda_{k} z} e^{{-}4j \lambda_{k}^{2} z} e^{4j \lambda_{k}^{2} z} $$
$$= \; Q^{c}_{k}(\lambda,0)e^{{-}4j \lambda^{2} z} e^{{-}8j \lambda\lambda_{k} z}. $$

Defining $t_{k} = 4 \lambda_{k} z$, we find

$$q(t,z) e^{2j \lambda_{k} t} e^{{-}j \lambda_{k} t_{k}} \rightleftharpoons \; Q^{c}_{k}(\lambda,0)e^{{-}4j \lambda^{2} z} e^{{-}2j \lambda t_{k}}. $$

Shifting $q(t)$ in time by $-t_{k}$ using Eq. (18), we arrive at

$$q(t+t_{k},z) e^{2j \lambda_{k} (t+t_{k})} e^{{-}j \lambda_{k} t_{k}} \rightleftharpoons \; Q^{c}_{k}(\lambda,0)e^{{-}4j \lambda^{2} z} e^{{-}2j \lambda t_{k}} e^{2j \lambda t_{k}}, $$
$$q(t+t_{k},z) e^{2j \lambda_{k} t+j\lambda_{k} t_{k}} \rightleftharpoons \; Q^{c}_{k}(\lambda,0)e^{{-}4j \lambda^{2} z} $$
$$= \; Q^{c}_{k}(\lambda,z) . $$

Now, since

$$q_{k}(t,z) \rightleftharpoons \; Q^{c}_{k}(\lambda,z) , $$
we finally arrive at
$$ q_{k}(t,z) = \; q(t+t_{k},z) e^{2j \lambda_{k} t+j\lambda_{k} t_{k}} . $$

The transformation of Eq. (32) is equivalent to shifting the signal $q(t,z)$ in time and in linear frequency. Note that to use the transformation in Eq. (32), one needs to know the propagation length of the signals at the receiver. Applying the transformation in Eq. (32) to all of the received pulses, we can train the neural networks as easily as in the fixed propagation length case.

Figure 6 shows the BER of the trained networks for random propagation lengths as a function of OSNR. The size of the training dataset is 100,000. The propagation distance is randomized between 0 and 50% of the total fibre length; we propagate a maximum of 50% of the full fibre length in this test to obtain the same level of pulse broadening as in the fixed-length case, since no pre-compensation is applied here. In this case, we choose $T=40$, which leads to a broader bandwidth in the nonlinear frequency domain according to Eq. (15), resulting in an approximately 2.5-fold increase in input signal energy compared to the previous case. For this case, the real-world peak signal power of the pulse at launch is around 72 mW for a maximum propagation length of 500 km.
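For illustration, the receiver-side transformation of Eq. (32) used above can be sketched in a few lines of Python; the time shift is implemented by linear interpolation (an implementation choice), and the example input is an arbitrary stand-in rather than a simulated received pulse.

import numpy as np

def shift_channel_to_zero(q, t, lam_k, z):
    """Apply Eq. (32): q_k(t,z) = q(t + t_k, z)*exp(2j*lam_k*t + 1j*lam_k*t_k),
    with t_k = 4*lam_k*z, moving channel k to zero nonlinear frequency."""
    t_k = 4.0 * lam_k * z
    q_shift = (np.interp(t + t_k, t, q.real, left=0.0, right=0.0)
               + 1j * np.interp(t + t_k, t, q.imag, left=0.0, right=0.0))
    return q_shift * np.exp(2j * lam_k * t + 1j * lam_k * t_k)

# Example: shift channel n = 3 (centre frequency 3*lam_s) of a stand-in pulse
lam_s, z = 0.471, 0.25
t = np.linspace(-20, 20, 4096)
q_rx = np.exp(-t ** 2) + 0j            # stand-in received signal, not from the simulation
q_k = shift_channel_to_zero(q_rx, t, lam_k=3 * lam_s, z=z)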

Fig. 6. The bit error rate of CNN prediction with time-domain signals and random propagation lengths as a function of optical signal-to-noise ratio. Dotted curves: BER of individual channels. Solid curve with circles: the total BER of all channels.

4.2 Training with frequency domain signals

For pulses with a purely continuous spectrum, the soliton order $N<1$, which implies that dispersion effects are dominant [21]. The optical pulses propagating in the fibre undergo significant broadening in the time domain due to fibre dispersion. As the pulse spreads in time, the instantaneous power $|q(t)|^{2}$ reduces, which leads to a reduction of the nonlinear interactions between different linear frequency components. Hence, it is expected that the nonlinear interactions are concentrated in the initial part of the propagation (without pre-compensation), in which most of the spectral change happens. This observation indicates that training neural networks using the pulse in the linear frequency domain can be more efficient than using the pulse in the time domain, since for most of the pulse propagation the linear spectrum of the pulse changes insignificantly. One may also downsample the pulses in the frequency domain to further improve efficiency. The only disadvantage is the need for additional equipment to perform the linear Fourier transforms on the received pulses before decoding.

Using the same fixed propagation length dataset as before, we re-train the networks using the linear spectrum of the pulses. There are 4096 data points in the time domain, but after the linear Fourier transform the pulses are concentrated in the middle of the spectrum. An example of a pulse’s spectrum is shown in the inset of Fig. 7. We cut out the empty spectrum and use only the 500 data points in the middle (between the two dashed red lines) to train the networks. In this case, the training time is also shorter because the size of the input to the network is smaller, while the accuracy of the trained network, as shown in Fig. 7, remains approximately the same as before. Furthermore, the validation speed on the desktop computer is about 12 times faster (including the time of performing the linear FFTs) than using time-domain signals.
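A minimal sketch of this frequency-domain pre-processing is shown below: the received pulse is Fourier transformed and only the central 500 bins (points 1801 to 2300 of the 4096-point spectrum, as in the Fig. 7 inset) are kept as the network input; the placeholder pulse is illustrative.

import numpy as np

def frequency_domain_input(q, keep=slice(1800, 2300)):
    """FFT the received pulse, centre the spectrum, and keep the middle 500 bins
    (1-based points 1801-2300 of the 4096-point spectrum, per the Fig. 7 inset)."""
    spectrum = np.fft.fftshift(np.fft.fft(q))
    return spectrum[keep]

q_rx = np.zeros(4096, dtype=complex)   # placeholder received pulse
x_freq = frequency_domain_input(q_rx)  # 500 complex samples fed to the network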

Fig. 7. The bit error rate of CNN prediction with frequency-domain signals and a fixed propagation length as a function of optical signal-to-noise ratio. Dotted curves: BER of individual channels. Solid curve with circles: the total BER of all channels. Inset: an example of the spectrum of a pulse. The two dashed red lines denote the spectral region that is used for training (500 points from 1801 to 2300).

The same approach is applied to the dataset with random propagation lengths. The maximum spectral width of the pulses in this dataset is larger than in the fixed propagation length dataset due to the higher pulse energy; therefore, 700 points are used. The result is shown in Fig. 8. The accuracy of the networks’ predictions is much higher than that of the time-domain ones, especially at high noise levels. We also downsample the frequency data to 350 points and re-train the networks to further improve the speed. The resulting BER is still better than that of the time-domain case.

Fig. 8. The bit error rate of CNN prediction with frequency-domain signals and random propagation lengths as a function of optical signal-to-noise ratio. Two sets of data with different sampling rate are presented. Dotted curves: BER of individual channels. Solid curve with circles: the total BER of all channels.

In the next example, we demonstrate the scalability of our approach by decoding 128-subchannel 16-QAM signals. Here, $T=300$ is chosen and $2^{14}$ sampling points are used across the time windows of the pulses. Figure 9 shows the BER as a function of OSNR for both the CNN approach (red) and the direct NFT approach (blue). The light-coloured dotted curves are the BER of each channel while the solid curves with circles are the total BER of all channels. From the plot, one can see that the CNN approach has a lower BER for small OSNR, which agrees with the previous examples. For large OSNR, the total BER of the direct NFT approach is lower, but there is a larger spread between the BERs of individual channels than for the CNN approach. We believe that increasing the pulse sampling rate can improve the accuracy of the NFT routine and that increasing the size of the training dataset can improve the accuracy of the CNN approach at high OSNR. Nevertheless, this example demonstrates that the CNN approach can be a good candidate to replace the NFT calculation in practical applications.

Fig. 9. The bit error rate of CNN prediction with time-domain signals and random propagation lengths as a function of optical signal-to-noise ratio. Dotted curves: BER of individual channels. Solid curve with circles: the total BER of all channels.

5. Conclusion

In this paper, we have used convolutional neural networks to directly decode information from nonlinear OFDM-QAM signals. We have shown that a CNN can be trained to extract information from the pulses in two scenarios: 1) the propagation length is fixed, and 2) the propagation length is random but known at the receiver end. The CNN can successfully decode information with rather high accuracy, and it also shows resilience to the influence of noise: higher than 60% accuracy can be achieved even when the noise energy is at 10% of the signal energy. The accuracy of the CNN prediction at high noise levels significantly outperforms the conventional direct NFT method. Furthermore, the neural networks decode information at a rather promising speed, which shows the potential of the CNN approach in future applications. The examples shown in this work did not include fibre loss; however, our preliminary results show that the network can be trained for fibre losses of the order of $0.2\,\textrm{dB/km}$ with similar results. In practice, the optical fibres used in a communication network can have variations in chromatic dispersion, nonlinearity and loss. The impact of these variations on the performance of the neural networks can be considered in future studies. While the neural network developed in this study is not optimised, it demonstrates the possibility of using neural networks for NFT-based communication systems. The research on using machine learning for optical communication is still at an early stage, and there are many opportunities and great potential for future development in this field.

Funding

Australian Research Council (DP190102896).

Acknowledgements

This research was supported fully by the Australian Government through the Australian Research Council (DP190102896).

Disclosures

The authors declare that there are no conflicts of interest related to this article.

References

1. A. H. Shafie, T. M. Bazan, and K. M. Hassan, “The effect of FWM and SRS on the performance of WDM systems with optical amplifiers,” in 2018 35th National Radio Science Conference (NRSC), (2018), pp. 425–432.

2. C. E. Shannon, “A Mathematical Theory of Communication,” Bell Syst. Tech. J. 27(3), 379–423 (1948). [CrossRef]  

3. S. K. Turitsyn, J. E. Prilepsky, S. T. Le, S. Wahls, L. L. Frumin, M. Kamalian, and S. A. Derevyanko, “Nonlinear Fourier transform for optical data processing and transmission: advances and perspectives,” Optica 4(3), 307–322 (2017). [CrossRef]  

4. I. T. Lima, T. D. S. DeMenezes, V. S. Grigoryan, M. O’Sullivan, and C. R. Menyuk, “Nonlinear Compensation in Optical Communications Systems With Normal Dispersion Fibers Using the Nonlinear Fourier Transform,” J. Lightwave Technol. 35(23), 5056–5068 (2017). [CrossRef]  

5. S. Wahls and H. V. Poor, “Fast inverse nonlinear Fourier transform for generating multi-solitons in optical fiber,” in 2015 IEEE International Symposium on Information Theory (ISIT), (2015), pp. 1676–1680.

6. S. Chimmalgi, P. J. Prins, and S. Wahls, “Fast Nonlinear Fourier Transform Algorithms Using Higher Order Exponential Integrators,” IEEE Access 7, 145161–145176 (2019). [CrossRef]  

7. V. Vaibhav, “Numerical Methods for Fast Nonlinear Fourier Transformation, Part I: Exponential Runge-Kutta and Linear Multistep Methods,” arXiv:1812.04701 [physics] (2018).

8. V. Aref, S. T. Le, and H. Buelow, “An Efficient Nonlinear Fourier Transform Algorithm for Detection of Eigenvalues from Continuous Spectrum,” in Optical Fiber Communication Conference (OFC), (Optical Society of America, 2019), p. M1I.5.

9. A. Vasylchenkova, J. E. Prilepsky, D. Shepelsky, and A. Chattopadhyay, “Direct nonlinear Fourier transform algorithms for the computation of solitonic spectra in focusing nonlinear Schrödinger equation,” Commun. Nonlinear Sci. Numer. Simul. 68, 347–371 (2019). [CrossRef]  

10. O. Sidelnikov, A. Redyuk, and S. Sygletos, “Equalization performance and complexity analysis of dynamic deep neural networks in long haul transmission systems,” Opt. Express 26(25), 32765–32776 (2018). [CrossRef]  

11. F. Musumeci, C. Rottondi, A. Nag, I. Macaluso, D. Zibar, M. Ruffini, and M. Tornatore, “An Overview on Application of Machine Learning Techniques in Optical Networks,” IEEE Commun. Surv. Tutorials 21(2), 1383–1408 (2019). [CrossRef]  

12. M. Schaedler, C. Bluemm, M. Kuschnerov, F. Pittalá, S. Calabró, and S. Pachnicke, “Deep Neural Network Equalization for Optical Short Reach Communication,” Appl. Sci. 9(21), 4675 (2019). [CrossRef]  

13. P. J. Freire, V. Neskorniuk, A. Napoli, B. Spinnler, N. Costa, G. Khanna, E. Riccardi, J. E. Prilepsky, and S. K. Turitsyn, “Complex-Valued Neural Network Design for Mitigation of Signal Distortions in Optical Links,” Journal of Lightwave Technology (Early Access).

14. R. T. Jones, S. Gaiarin, M. P. Yankov, and D. Zibar, “Time-Domain Neural Network Receiver for Nonlinear Frequency Division Multiplexed Systems,” IEEE Photonics Technol. Lett. 30(12), 1079–1082 (2018). [CrossRef]  

15. O. Kotlyar, M. Pankratova, M. Kamalian, A. Vasylchenkova, J. E. Prilepsky, and S. K. Turitsyn, “Unsupervised and supervised machine learning for performance improvement of NFT optical transmission,” in 2018 IEEE British and Irish Conference on Optics and Photonics (BICOP), (2018), pp. 1–4.

16. O. Kotlyar, M. Kamalian-Kopae, J. E. Prilepsky, M. Pankratova, and S. K. Turitsyn, “Machine learning for performance improvement of periodic NFT-based communication system,” in 45th European Conference on Optical Communication (ECOC 2019), (2019), pp. 1–4.

17. O. Kotlyar, M. Pankratova, M. Kamalian-Kopae, A. Vasylchenkova, J. E. Prilepsky, and S. K. Turitsyn, “Combining nonlinear Fourier transform and neural network-based processing in optical communications,” Opt. Lett. 45(13), 3462–3465 (2020). [CrossRef]  

18. M. I. Yousefi and F. R. Kschischang, “Information Transmission Using the Nonlinear Fourier Transform, Part I: Mathematical Tools,” IEEE Trans. Inf. Theory 60(7), 4312–4328 (2014). [CrossRef]  

19. I. Goodfellow, Deep learning / Ian Goodfellow, Yoshua Bengio and Aaron Courville., Adaptive computation and machine learning (MIT, Cambridge, MA, 2017).

20. S. Lawrence, C. Giles, A. C. Tsoi, and A. Back, “Face recognition: a convolutional neural-network approach,” IEEE Trans. Neural Netw. 8(1), 98–113 (1997). [CrossRef]  

21. G. P. Agrawal, Nonlinear Fiber Optics (Academic, 2013).

22. R. T. Jones, S. Gaiarin, M. P. Yankov, and D. Zibar, “Noise Robust Receiver for Eigenvalue Communication Systems,” in 2018 Optical Fiber Communications Conference and Exposition (OFC), (2018), pp. 1–3.

23. S. T. Le, V. Aref, and H. Buelow, “125 Gbps Pre-Compensated Nonlinear Frequency-Division Multiplexed Transmission,” in 2017 European Conference on Optical Communication (ECOC), (2017), pp. 1–3.

24. V. Aref, S. T. Le, and H. Buelow, “Modulation Over Nonlinear Fourier Spectrum: Continuous and Discrete Spectrum,” J. Lightwave Technol. 36(6), 1289–1295 (2018). [CrossRef]  

25. W. A. Gemechu, M. Song, Y. Jaouen, S. Wabnitz, and M. I. Yousefi, “Comparison of the Nonlinear Frequency Division Multiplexing and OFDM in Experiment,” in 2017 European Conference on Optical Communication (ECOC), (2017), pp. 1–3.

26. S. T. Le, I. D. Phillips, J. E. Prilepsky, M. Kamalian, A. D. Ellis, P. Harper, and S. K. Turitsyn, “Achievable Information Rate of Nonlinear Inverse Synthesis Based 16QAM OFDM Transmission,” in ECOC 2016; 42nd European Conference on Optical Communication, (2016), pp. 1–3.

27. S. A. Derevyanko, J. E. Prilepsky, and S. K. Turitsyn, “Capacity estimates for optical transmission based on the nonlinear Fourier transform,” Nat. Commun. 7(1), 12710 (2016). [CrossRef]  

28. S. Civelli, E. Forestieri, and M. Secondini, “Mitigating the Impact of Noise on Nonlinear Frequency Division Multiplexing,” Appl. Sci. 10(24), 9099 (2020). [CrossRef]  

29. S. Wahls, “Generation of Time-Limited Signals in the Nonlinear Fourier Domain via b-Modulation,” in 2017 European Conference on Optical Communication (ECOC), (2017), pp. 1–3.

30. T. Gui, G. Zhou, C. Lu, A. P. T. Lau, and S. Wahls, “Nonlinear frequency division multiplexing with b-modulation: shifting the energy barrier,” Opt. Express 26(21), 27978–27990 (2018). [CrossRef]  

31. S. T. Le and H. Buelow, “High Performance NFDM Transmission with b-modulation,” in Photonic Networks; 19th ITG-Symposium, (2018), pp. 1–6.

32. X. Yangzhang, V. Aref, S. T. Le, H. Bulow, and P. Bayvel, “400 Gbps Dual-Polarisation Non-Linear Frequency-Division Multiplexed Transmission with B-Modulation,” in 2018 European Conference on Optical Communication (ECOC), (IEEE, Rome, 2018), pp. 1–3.

33. A. Vasylchenkova, M. Pankratova, J. Prilepsky, N. Chichkov, and S. Turitsyn, “Signal-Dependent Noise for B-Modulation NFT-Based Transmission,” in 2019 Conference on Lasers and Electro-Optics Europe European Quantum Electronics Conference (CLEO/Europe-EQEC), (2019), p. 1.

34. S. Wahls, S. Chimmalgi, and P. J. Prins, “Wiener-Hopf Method for b-Modulation,” in Optical Fiber Communication Conference (OFC), (Optical Society of America, 2019), p. W2A.50.

35. S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach (Pearson, Upper Saddle River, 2010).

36. S. T. Le, I. D. Philips, J. E. Prilepsky, M. Kamalian, A. D. Ellis, P. Harper, and S. K. Turitsyn, “Equalization-Enhanced Phase Noise in Nonlinear Inverse Synthesis Transmissions,” in ECOC 2016; 42nd European Conference on Optical Communication, (2016), pp. 1–3.

37. MathWorks, “Create Simple Deep Learning Network for Classification - MATLAB & Simulink Example - MathWorks Australia,” (2020).

38. The $N\times 2$ arrangement allows the filters in the convolutional layer to be applied to both the real and imaginary parts of the pulse simultaneously to pick out features in the phase.
