Design of fully interpretable neural networks for digital coherent demodulation

Xiatao Huang; Wenshan Jiang; Xingwen Yi; Jing Zhang; Taowei Jin; Qianwu Zhang; Bo Xu; Kun Qiu

doi:10.1364/OE.472406

1. Introduction

With the enormous growth of internet traffic, optical fiber communication networks are constantly challenged to extend the upper bounds of capacity and transmission distance [1,2]. The performance of optical fiber communication systems can be significantly degraded by transmission impairments such as chromatic dispersion (CD), polarization mode dispersion (PMD), laser phase noise (PN), and fiber nonlinearities [3,4]. Fortunately, coherent optical detection combined with digital signal processing (DSP) allows powerful equalization and mitigation of various linear impairments in the electrical domain with satisfactory performance [5,6]. Compensation of nonlinear impairments further raises the fundamental capacity limit of fiber transmission and is an important research area [7]. Many nonlinear compensation techniques have been investigated, such as digital back-propagation (DBP) [3], nonlinear pre-distortion [8], Volterra equalization [9,10], optical phase conjugation [11,12], and the nonlinear Fourier transform [13]. Among them, DBP is generally considered a benchmark for fiber nonlinearity compensation [14].

Recently, neural networks (NNs) have found widespread application in both industry and academia [15,16]. For optical fiber communication systems, NNs have also been investigated for fiber nonlinearity equalization, carrier recovery, and so on [17–19]. However, these schemes discard the theoretical framework of the traditional communication system and replace certain DSP modules of the transceiver with various NNs, which leads to two problems. Firstly, it is difficult to analyze the inner working principles of the NN and to interpret the learned parameter configurations. Secondly, the performance and complexity of the NN depend strongly on the network hyper-parameters, which are typically chosen based on experience. To solve these problems, guided by the deep unfolding theory in [20,21], which establishes the relationship between an iterative algorithm and an NN architecture, a fully interpretable mapping from an optical communication system to an NN for transceiver-joint optimization has been proposed [22]. In that work, the pre-equalization at the transmitter side and the adaptive filter at the receiver side are replaced with two convolutional layers whose kernels are updated, while the channel, the transceiver, and the other DSP blocks are regarded as fixed functions. This method has a much clearer physical meaning than previous “black-box” NN-based equalizers, bringing the optimal coherent receiver equalization to a new level. For fiber nonlinearity, a learned DBP (LDBP), which converts each step of DBP into a layer of a trainable NN, has been proposed; all the taps of the chromatic dispersion compensation (CDC) filters and the nonlinear coefficients in each step are jointly optimized [23,24].

In this paper, we extend LDBP and propose a fully interpretable NN (FINN) that completely replaces the whole digital coherent demodulation process and is directly mapped from the conventional coherent DSP modules. There are three main advantages. Firstly, each layer of the NN has a clear physical background instead of being a black box. This allows us to interpret the learned solutions and understand how they approach a global optimum. Because the physics-based model uses trainable filters in each step, its complexity can also be reduced. Secondly, the number of kernels in each layer of the NN can be predetermined according to the physical model. In general, a limited number of kernels helps avoid overfitting and accelerates convergence. Thirdly, with the final optimized NN, we can easily tell whether the conventional DSP modules achieve the optimal equalization. In our work, we initialize the NN with the help of conventional DSP to speed up the training process. We then replace the dispersion compensation layer with a cascaded DBP layer and demonstrate that FINN-DBP outperforms conventional DBP with the same number of propagation steps. Our work illustrates that combining NN optimization with existing domain knowledge can stimulate new insights into communication problems, providing a promising choice for mitigating both linear and nonlinear distortions in long-haul coherent optical systems.

2. Brief outline of DSP and the framework of NN

The role of DSP is to recover the transmitted data from the received signal. Figure 1(a) shows the stages of DSP. We then present a strict mapping from the coherent DSP demodulation algorithm to NN modeling and optimization. The whole NN schematic is shown in Fig. 1(b). The NN follows the coherent DSP module by module, and the number of kernels in each layer is the same as the number of taps in the corresponding DSP module. Note that we do not use any other layers such as ReLU, batch normalization, or sigmoid layers in the NN. The complexity of the NN is therefore the same as that of traditional DSP. As NNs are designed primarily for real-valued data, each complex sequence is treated as two real-valued sequences in this paper in order to take full advantage of tensors. The input data to the NN is reshaped to a 3-dimensional (3D) tensor with the structure (batch size, 4, data length), where 4 is the number of sequences used for the real and imaginary parts of the x- and y-polarizations.

Fig. 1. (a) Coherent DSP; (b) the mapped FINN structure proposed in our study; (c) the mapped DBP layer with K sections in total; (d) DBP structure.

2.1 Linear chromatic dispersion compensation

In the absence of fiber nonlinearity, the effect of chromatic dispersion on the pulse $\textrm{A}({z,t} )= {[{{A_x}({z,t} ),{A_y}({z,t} )\; } ]^T}$ can be modeled by the following differential equation [6]:

(1)$$\frac{{\partial \textrm{A}({z,t} )}}{{\partial z}} = j\frac{{D{\lambda ^2}}}{{4\pi c}}\frac{{{\partial ^2}\textrm{A}({z,t} )}}{{\partial {t^2}}}$$

where z is the propagation distance, $\lambda $ is the wavelength of the light, c is the speed of light, and D is the dispersion parameter of the fiber. By solving (1), it can be shown that $\textrm{A}({0,\omega } )= \exp\left( { - j\frac{{DL{\lambda^2}}}{{4\pi c}}{\omega^2}} \right)\textrm{A}({L,\omega } )$, so the original signal can be recovered from the dispersed signal. Although chromatic dispersion compensation (CDC) can be implemented in either the time or the frequency domain [3], a time domain implementation using finite impulse response (FIR) filters is more efficient in real-time applications. By considering the impulse response of the fiber dispersion, a signal sampled every T seconds (usually at a rate above the Nyquist sampling frequency) can be recovered by applying an FIR filter to the signal with tap weights given by [6]

(2)$${a_k} = \sqrt {\frac{{jc{T^2}}}{{D{\lambda ^2}z}}} \exp \left( { - j\frac{{\pi c{T^2}}}{{D{\lambda^2}z}}{k^2}} \right),\quad - \left\lfloor {\frac{{N(z )}}{2}} \right\rfloor \le k \le \left\lfloor {\frac{{N(z )}}{2}} \right\rfloor ,\quad \textrm{and}\quad N(z )= 2\left\lfloor {\frac{{\left|D \right|{\lambda^2}z}}{{c{T^2}}}} \right\rfloor + 1$$

where $\lfloor x \rfloor$ is the integer part of x rounded towards minus infinity. These tap weights form the basis for the compensation of chromatic dispersion using an FIR filter.
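As a rough illustration of Eq. (2), the following Python sketch evaluates the tap weights for a given link. The function name `cdc_taps` is hypothetical, and the 1550 nm carrier wavelength is an assumption made only for this example; the other example values follow the link described in Section 3. It is not the authors' implementation.

```python
import cmath
import math

C = 299_792_458.0  # speed of light in vacuum, m/s


def cdc_taps(D, wavelength, z, T):
    """Return the FIR tap weights a_k of Eq. (2) as a list of complex numbers.

    All arguments are in SI units: D in s/m^2, wavelength and z in m, T in s.
    """
    ratio = D * wavelength**2 * z / (C * T**2)   # D*lambda^2*z / (c*T^2), dimensionless
    n_taps = 2 * math.floor(abs(ratio)) + 1      # N(z) in Eq. (2), always odd
    half = n_taps // 2                           # floor(N/2)
    scale = cmath.sqrt(1j / ratio)               # sqrt(j*c*T^2 / (D*lambda^2*z))
    return [scale * cmath.exp(-1j * math.pi * k * k / ratio)
            for k in range(-half, half + 1)]


# Example with assumed values: 16.7 ps/(nm km) = 16.7e-6 s/m^2, a 1550 nm carrier
# (assumption), 600 km of fiber, and T = 1/(32 GHz), i.e. 2 samples per symbol
# at 16 GBaud.
taps = cdc_taps(D=16.7e-6, wavelength=1550e-9, z=600e3, T=1 / 32e9)
print(len(taps), abs(taps[0]))
```

Because the exponent in Eq. (2) depends only on $k^2$, the computed taps are symmetric about $k = 0$ and all have the same magnitude.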

It is known that an FIR filter can be represented by a one-dimensional (1D) convolutional layer whose weights are equal to the coefficients of the FIR filter. We therefore use one basic convolutional layer to perform the CDC. Since the filter taps of the CDC are complex-valued, to perform convolution with 3D data we separate the real and imaginary parts of the filter taps, forming the final ($N(z )$, 4, 4) 3D CDC layer. Figure 2 shows the structure of CDC with complex coefficients and the corresponding CDC layer of a single polarization with real-valued sequences. Note that the x- and y-polarizations share the same weights/kernels, so only one polarization is shown in Fig. 2. The CDC layer is similar to a conventional time domain CDC. The difference is that its taps are trainable parameters optimized by the NN, while the number of filter taps $N(z )$ is calculated from Eq. (2) as in conventional CDC. According to the physical properties of dispersion, the taps of the CDC filter should be symmetric. Hence, we impose symmetry constraints on the convolution kernel, which guarantees that the CDC layer after training also satisfies this physical property. The symmetry constraints help to determine and optimize the neurons of the convolutional NN, which greatly reduces the number of independent taps and accelerates convergence. Note that we add padding on both sides to guarantee that the input and output of the CDC layer have the same size.
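The separation into real and imaginary parts can be sketched as follows. This minimal pure-Python example checks that filtering a complex sequence with complex taps gives the same result as four real-valued convolutions, which is the idea behind the real-valued CDC layer. The helper names are hypothetical, and `symmetrize` shows only one possible way to impose a symmetry constraint; the text does not specify how the constraint is implemented.

```python
def conv_same(x, h):
    """Filter sequence x with an odd-length FIR h, zero-padded so len(out) == len(x)."""
    half = len(h) // 2
    out = []
    for n in range(len(x)):
        acc = 0
        for k, hk in enumerate(h):
            i = n + half - k              # sample of x that tap k multiplies
            if 0 <= i < len(x):
                acc += hk * x[i]
        out.append(acc)
    return out


def complex_fir_as_real(x_re, x_im, h_re, h_im):
    """Complex filtering written as four real-valued convolutions."""
    y_re = [a - b for a, b in zip(conv_same(x_re, h_re), conv_same(x_im, h_im))]
    y_im = [a + b for a, b in zip(conv_same(x_im, h_re), conv_same(x_re, h_im))]
    return y_re, y_im


def symmetrize(h):
    """One possible symmetry constraint (an assumption): average h with its mirror image."""
    return [(a + b) / 2 for a, b in zip(h, reversed(h))]


# Toy data, chosen only for illustration.
x = [1 + 2j, -0.5 + 0.3j, 0.7 - 1.1j, 2 - 0.4j, -1.5 + 0.9j]
h = symmetrize([0.2 - 0.1j, 1.0 + 0.5j, -0.3 + 0.4j])

ref = conv_same(x, h)                     # direct complex filtering
y_re, y_im = complex_fir_as_real([v.real for v in x], [v.imag for v in x],
                                 [v.real for v in h], [v.imag for v in h])
```

With a symmetric kernel, only about half of the taps are independent, which is consistent with the reduction in trainable parameters described above.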

Fig. 2. The structure of CDC and corresponding CDC layer of x/y-polarization. $h$: taps of CDC filter given by Eq. (2). $h(R )$: real parts of h. $h(I )$: imaginary parts of h. Dark blue and light blue circles represent 1-dimensional convolutions.

2.2 Fiber nonlinear compensation

The nonlinear propagation of a polarization division multiplexed (PDM) optical signal in single mode fibers can be described by the coupled nonlinear Schrödinger equation (NLSE) [3]. This equation can be reduced to the Manakov equation under the common assumption that the nonlinear interaction length is much greater than the length scale of random polarization rotations. The split-step Fourier method (SSFM) is an iterative solution to the nonlinear propagation equation, which can fully compensate for the deterministic nonlinear signal-signal interaction (NSSI). In practical implementations, SSFM is realized by DBP, in which the received signal is backward propagated through a virtual fiber. This virtual fiber channel is split into multiple sections. In each section, a frequency domain CDC and a power-dependent phase rotation are applied in turn, as shown in Fig. 3. The frequency domain is used because a time domain filter with a finite number of taps invariably introduces a truncation error, which, especially for cascaded multi-span transmission, leads to an undesired overall response [24]. The phase rotation for a PDM signal is written as [25]

(3)$$\begin{array}{c} {{h_\textrm{X}}({\Delta z} )= x\exp \left( {j\frac{8}{9}\gamma ({{{\left|x \right|}^2} + {{\left|y \right|}^2}} )\Delta z} \right)}\\ {{h_\textrm{Y}}({\Delta z} )= y\exp \left( {j\frac{8}{9}\gamma ({{{\left|x \right|}^2} + {{\left|y \right|}^2}} )\Delta z} \right)} \end{array}$$

where $x,\; y$ are the signals in the x- and y-polarization, respectively. $\gamma \; $ is the nonlinear Kerr coefficient. The factor 8/9 follows the Manakov version of DBP [14] by averaging the nonlinear contributions from each polarization.

Fig. 3. The structure of DBP and corresponding DBP layer of x/y-polarization with a single section. $\alpha $: fiber attenuation. $h$: taps of CDC filter. $\gamma $: fiber nonlinear coefficient. $\kappa $: nonlinear scaling factors. Dark blue and light blue circles represent 1-dimensional convolutions.

Since the DBP algorithm iteratively applies the CDC and the power-dependent nonlinear compensation (NLC), the DBP layer, as a direct mapping of the DSP, should contain multiple alternating CDC and NLC layers, as shown in Fig. 1(c). The DBP contains K sections with different step lengths. Figure 3 shows the structure of DBP and the corresponding DBP layer of the x/y-polarization with a single section. We use frequency domain CDC with the fast Fourier transform and inverse fast Fourier transform in each DBP step. The CDC layer in each step has the same form as the CDC layer in Fig. 2. For each section, the CDC layer is a convolutional layer with ($N({\Delta z} )$, 4, 4) 3D taps, but here the taps need to be optimized to minimize the accumulation of the cascaded truncation error and to improve the compensation performance [24]. The NLC layer is an element-wise application of the nonlinear activation function described as

(4)$$\begin{array}{l} {\sigma _x}({\Delta z} )= x\exp \left( {j\frac{8}{9}\kappa \gamma ({{{\left|x \right|}^2} + {{\left|y \right|}^2}} )\Delta z} \right)\\ {\sigma _y}({\Delta z} )= y\exp \left( {j\frac{8}{9}\kappa \gamma ({{{\left|x \right|}^2} + {{\left|y \right|}^2}} )\Delta z} \right). \end{array}$$

We introduce four identical trainable nonlinear scaling factors $\kappa $ (for the real and imaginary parts of the x- and y-polarizations) with a 3D (1, 1, 1) tap in each NLC layer. The fiber nonlinear Kerr coefficient is relatively small, which can slow down the convergence of the model and reduce the accuracy of the trained model. Therefore, the nonlinear scaling factor $\kappa $ is important: it keeps the average absolute value of the trainable parameter around 1, the same order of magnitude as the parameters of the other layers. The NLC layer does not change the size of the dataset.
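The element-wise activation of Eq. (4) can be sketched in a few lines of Python. The function name `nlc_layer` and the toy sample values are assumptions for illustration; the units are taken to be 1/(W·m) for $\gamma$, metres for $\Delta z$, and $\sqrt{\mathrm{W}}$ for the field samples.

```python
import cmath


def nlc_layer(x, y, gamma, dz, kappa=1.0):
    """Apply the power-dependent phase rotation of Eq. (4) sample by sample.

    x, y  : lists of complex samples of the x- and y-polarization
    gamma : nonlinear coefficient, dz : step length, kappa : trainable scaling factor
    """
    out_x, out_y = [], []
    for xs, ys in zip(x, y):
        power = abs(xs) ** 2 + abs(ys) ** 2                   # |x|^2 + |y|^2
        rot = cmath.exp(1j * (8.0 / 9.0) * kappa * gamma * power * dz)
        out_x.append(xs * rot)                                # same rotation on both
        out_y.append(ys * rot)                                # polarizations
    return out_x, out_y


# Toy samples (illustrative only); gamma = 1.2 1/(W km) = 1.2e-3 1/(W m).
x = [0.03 + 0.01j, -0.02 + 0.04j, 0.01 - 0.03j]
y = [0.02 - 0.02j, 0.04 + 0.00j, -0.01 + 0.02j]
out_x, out_y = nlc_layer(x, y, gamma=1.2e-3, dz=50e3, kappa=1.0)
```

The rotation changes only the phase of each sample, so the output has the same length and the same per-sample power as the input, and setting `kappa` to zero turns the layer into an identity.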

2.3 Matched filter

Matched filtering is a process for detecting a known signal that is embedded in noise. If the input signal is $s(t )= w(t )+ n(t )$, where $w(t )$ is the transmit signal and $n(t )$ is white noise, matched filter theory states that the signal-to-noise ratio (SNR) of the received signal in additive white Gaussian noise is maximized when the filter has an impulse response that is the time-reversal of the transmit signal. When the signal is of length $T$, the matched filter with the same number of taps is defined as [26]

(5) $$h(t )= w({T - t} ).$$

In DSP, the matched filter vector is convolved with the signal samples to recover the symbols. It is therefore natural to design the matched filter layer as a convolutional layer mapped from the coherent DSP. Four identical conventional convolutional layers with 3D ($M$, 1, 1) taps are used, where $M$ is the length of the pulse shaping filter $w(t )$. We add padding on both sides to guarantee that the input and output of this layer have the same size.
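A discrete-time reading of Eq. (5) is $h[k] = w[M - 1 - k]$, i.e. the pulse reversed in time. The sketch below, with hypothetical helper names and an arbitrary toy pulse, builds these taps and applies them with a same-size zero-padded convolution. It assumes a real-valued pulse, as Eq. (5) is written without conjugation.

```python
def conv_same(x, h):
    """Filter sequence x with an odd-length FIR h, zero-padded so len(out) == len(x)."""
    half = len(h) // 2
    out = []
    for n in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            i = n + half - k              # sample of x that tap k multiplies
            if 0 <= i < len(x):
                acc += hk * x[i]
        out.append(acc)
    return out


def matched_filter_taps(w):
    """Discrete form of Eq. (5): time-reverse the (real-valued) pulse w."""
    return list(reversed(w))


# Toy pulse of length M = 5 (illustrative only, not a raised-cosine design).
w = [1.0, 2.0, 3.0, -1.0, 0.5]
h = matched_filter_taps(w)

# Feeding the pulse itself through its matched filter yields its autocorrelation,
# whose centre sample equals the pulse energy sum(w[k]**2).
y = conv_same(w, h)
energy = sum(v * v for v in w)
```

The centre output sample equals the pulse energy and is the largest sample of the output, which is the property that makes the time-reversed pulse the SNR-maximizing choice.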

2.4 Polarization demultiplexing and equalization

In PDM optical systems, the transmitter launches complex amplitudes of both x- and y-polarized electric fields. While propagating through the fiber, the signal suffers from the effects of PMD and polarization-dependent loss (PDL) from birefringence. After carrier frequency offset compensation, polarization demultiplexing and PMD equalization can be implemented in the digital domain by using four complex-valued multi-tap FIR filters arranged in the widely used 2 × 2 butterfly configuration, as shown in Fig. 4. This type of equalizer can demultiplex PDM signals with significant crosstalk and can also equalize linear impairments. The output signals from the equalization stage ($\textrm{x}^{\prime}$ and $\textrm{y}^{\prime}$) at time k are related to the input signal vectors containing samples $({k - R + 1} )$ to k by [27]

(6)$$\begin{aligned} &\mathrm{x^{\prime}}(k )= {h_{xx}} \cdot \textrm{x}(k )+ {h_{yx}} \cdot \textrm{y}(k )\\ &{\mathrm{y^{\prime}}(k )= {h_{yy}} \cdot \textrm{y}(k )+ {h_{xy}} \cdot \textrm{x}(k )} \end{aligned}$$

where ${h_{xx}},{h_{xy}},{h_{yx}},\; \textrm{and}\; {h_{yy}}$ are the T/2-spaced weighting coefficients of the FIR filters. The length R of the tap vector is equal to the length of the impulse response of the distortion that can be compensated. The state of polarization, the PMD of the system, and the laser phase noise are time-varying, so an adaptive equalizer is usually required to compensate for these dynamic effects. Because the state of polarization changes slowly, it can be treated as a static effect as in [22,26], and ${h_{xx}},{h_{xy}},{h_{yx}},\; \textrm{and}\; {h_{yy}}$ are all kept unchanged after convergence. In general, the weighting parameters obtained from the least mean squares (LMS) or recursive least squares (RLS) algorithm can be used to initialize the polarization NN module. To track variations of the state of polarization, we can train only the polarization demultiplexing module of the NN with transfer learning or incremental learning. In the training stage, we can then update all of the kernels in the whole-network optimization described in Section 2.6 to optimize the overall system performance.

Fig. 4. The structure of pol demux and corresponding pol demux layer. ${h_{xx}},{h_{xy}},{h_{yx}},\; \textrm{and}\; {h_{yy}}$: taps of the equalizers used for demultiplexing. Dark blue and light blue circles represent 1-dimensional convolutions.

In our proposed NN structure, under the static assumption mentioned above, the polarization demultiplexing and equalization operation is equivalent to convolving with the weighting coefficients. Figure 4 shows the structure of polarization demultiplexing and the corresponding polarization demultiplexing layer. The traditional DSP module uses multi-tap FIR filters in a butterfly configuration. In the polarization demultiplexing layer, we implement a conventional convolutional layer, but with all the initial coefficients ${h_{xx}},{h_{xy}},{h_{yx}},\; \textrm{and}\; {h_{yy}}$ flipped, to realize this module. The complex-valued signal processing also needs to be separated into real and imaginary parts, forming the final ($R$, 4, 4) 3D convolutional layer, where R denotes the number of filter taps mentioned above.
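The butterfly structure of Eq. (6) and the role of the flipped coefficients can be sketched as below. Convolution layers in deep-learning frameworks such as PyTorch's `Conv1d` compute a sliding dot product (cross-correlation), so FIR taps are reversed before being loaded as kernels; that is presumably the reason for the flip mentioned above. All function names and the single-tap rotation example are assumptions for illustration.

```python
import math


def conv_same(x, h):
    """True FIR convolution with an odd-length h, zero-padded to len(x)."""
    half = len(h) // 2
    return [sum(hk * x[n + half - k] for k, hk in enumerate(h)
                if 0 <= n + half - k < len(x)) for n in range(len(x))]


def xcorr_same(x, w):
    """Sliding dot product, as computed by typical NN convolution layers."""
    half = len(w) // 2
    return [sum(wk * x[n - half + k] for k, wk in enumerate(w)
                if 0 <= n - half + k < len(x)) for n in range(len(x))]


def butterfly(x, y, h_xx, h_xy, h_yx, h_yy):
    """2 x 2 butterfly equalizer of Eq. (6)."""
    x_out = [a + b for a, b in zip(conv_same(x, h_xx), conv_same(y, h_yx))]
    y_out = [a + b for a, b in zip(conv_same(y, h_yy), conv_same(x, h_xy))]
    return x_out, y_out


# Toy example: the two polarizations are mixed by a fixed rotation of angle theta,
# and single-tap filters holding the inverse rotation undo the mixing.
theta = 0.3
c, s = math.cos(theta), math.sin(theta)
x = [1 + 1j, -1 + 1j, 1 - 1j, -1 - 1j]
y = [-1 - 1j, 1 + 1j, -1 + 1j, 1 - 1j]
x_mix = [c * a + s * b for a, b in zip(x, y)]
y_mix = [-s * a + c * b for a, b in zip(x, y)]
x_rec, y_rec = butterfly(x_mix, y_mix, h_xx=[c], h_xy=[s], h_yx=[-s], h_yy=[c])

# Flipping the taps makes a cross-correlation layer reproduce the FIR convolution.
h = [0.1, 0.8 + 0.2j, -0.3j]
same = all(abs(a - b) < 1e-12
           for a, b in zip(conv_same(x, h), xcorr_same(x, h[::-1])))
```

In a practical equalizer the filters have R taps each and are initialized from a converged adaptive algorithm, as described in the text; the single-tap case is used here only to keep the check easy to follow.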

2.5 Frequency offset and phase estimation

Due to fabrication imperfections of optical devices and environmental variations, the frequencies of the transmitter and local oscillator (LO) lasers are not accurately aligned. Besides, the linewidth of the lasers introduces phase noise, which is generally modeled as a Wiener process. The purpose of the carrier recovery algorithm in the coherent receiver is to remove the impairments of carrier frequency offset and phase noise by processing a discrete data sample sequence. Considering the symbol phase only, the sampled value of the $k$-th received symbol is assumed to be [28]

(7)$$S(k )= \exp \{{j({{\theta_s}(k )+ \Delta \omega kT + {\theta_L}(k )+ {\theta_{ASE}}(k )} )} \}$$

where ${\theta _s}(k )$ represents the modulated phase, $\Delta \omega kT$ is the additional phase caused by the frequency offset $\Delta \omega$, ${\theta _L}(k )$ is the phase noise from the laser linewidth, and ${\theta _{ASE}}(k )$ is related to the amplified spontaneous emission (ASE) noise. The frequency offset varies slowly relative to the symbol rate, so the frequency offset of multiple consecutive symbols can be regarded as the same. In this case, a series of consecutive symbols in a block can be processed together to estimate the frequency offset. By calculating the phase difference between adjacent symbols, ${\theta _L}$ can be removed.

The conventional carrier frequency offset compensation algorithm needs to calculate the frequency offset first, whereas the NN can compensate for this offset directly. It is reasonable to design the frequency offset compensation layer as two identical (x- and y-polarization) activation function layers that optimize a single parameter $\Delta \omega $ with a 3D (1, 1, 1) tap:

(8)$$\begin{aligned} {\sigma _{\Delta \omega }}(x )&= x\exp ({ - j\Delta \omega \textrm{T}} )\\ {\sigma _{\Delta \omega }}(y )&= y\exp ({ - j\Delta \omega \textrm{T}} ) \end{aligned}$$

where $\textrm{T}$ is an upper triangular matrix with elements $\textrm{T}({a,b} ) = T$ for $b \ge a$ and 0 otherwise, and $T$ is the sampling period.
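One way to read Eq. (8) is that the column sums of the upper triangular matrix give the accumulated time $T, 2T, 3T, \ldots$ of each sample, so that sample $b$ is rotated by $-\Delta \omega \, bT$. The following sketch implements this reading; it is an interpretation, and the function names and the 1 MHz offset are assumptions for illustration.

```python
import cmath
import math


def cumulative_times(n, T):
    """Column sums of the n x n upper triangular matrix with entries T for b >= a."""
    return [sum(T for a in range(n) if b >= a) for b in range(n)]   # T, 2T, ..., nT


def fo_compensate(x, delta_omega, T):
    """Rotate sample b by -delta_omega * (b + 1) * T, one reading of Eq. (8)."""
    times = cumulative_times(len(x), T)
    return [xs * cmath.exp(-1j * delta_omega * t) for xs, t in zip(x, times)]


# Toy check: impose a known offset on a few symbols, then remove it again.
T = 1 / 32e9                          # sampling period for 2 samples/symbol at 16 GBaud
delta_omega = 2 * math.pi * 1e6       # assumed 1 MHz frequency offset, in rad/s
x = [1 + 0j, 1j, -1 + 0j, 0.5 - 0.5j]
received = [xs * cmath.exp(1j * delta_omega * (b + 1) * T) for b, xs in enumerate(x)]
recovered = fo_compensate(received, delta_omega, T)
```

Since the layer has a single trainable parameter, the gradient of the loss with respect to $\Delta \omega$ is all that has to be propagated through it during training.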

The residual phase noise is caused by the linewidth of the laser, and its variance increases linearly with the observation interval. Traditional DSP phase recovery algorithms, such as blind phase search (BPS), need dozens of test phases, which increases complexity, and a decision module, which is not suitable for an NN because its gradient is difficult to calculate. Therefore, phase recovery is regarded as a static layer and kept unchanged throughout the whole training, the same as in traditional DSP. We do not pack the BPS into the NN for training; instead, we record the phase rotation of each symbol from the traditional DSP and treat it as a static layer of the NN before calculating the loss function.

2.6 Optimization based on NN and practical considerations

As mentioned above, we replace the digital filters of the coherent demodulation with a FINN, in which the CDC layer and the polarization demultiplexing layer are equivalent to convolutional layers of the NN, and the NLC layer and the frequency offset compensation layer are equivalent to activation functions with the necessary calibration. The inputs of the NN are the samples before equalization and the outputs of the NN are the symbols after equalization. We use frequency domain CDC in the DBP steps to avoid the accumulation of truncation errors. By optimizing the CDC filters and the hyper-parameters, the FINN can mitigate the accumulation of truncation errors and save resources. Furthermore, NN-based optimization requires a differentiable channel model to compute the gradients. In our proposed NN structure, the phase noise layer is recorded as a static layer and the other layers are differentiable, making the whole cascaded NN a differentiable model. Therefore, the FINN can optimize an individual module as well as the whole demodulation process treated as a single entity.

The parameter vector is commonly optimized by using a variant of stochastic gradient descent (SGD). Choosing a suitable parameter-initialization scheme is important for successful training since SGD is a local search method. Our approach leverages the fact that the proposed NN framework is based on a well-established numerical method: we initialize the parameters such that the initial performance of the NN is close to that of the system with standard coherent DSP equalization. After designing the NN, the second step is to obtain the optimal equalization filter coefficients by training the NN. The goal of training is to minimize the mean square error (MSE) between the output symbol $\hat{x}$ and the input symbol x, which is used as the loss function. The loss function can be expressed as

(9)$$L = {\left|{\hat{x} - x} \right|^2}$$

The NN is trained using the Adam optimizer, an SGD method based on adaptive estimation of the first-order and second-order moments. The training process consists of iteratively computing the gradients of the loss with respect to the trainable parameters mentioned above, using them to update those parameters, and evaluating the performance of the system in terms of the MSE loss.
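To make the update rule concrete, the sketch below applies the standard Adam update with an MSE loss to a deliberately tiny problem: a single real gain that is started from a "conventional" value of 1.0 and trained towards 1.2. The toy model, the function names, and the learning rate of 0.01 are assumptions chosen so that the example runs quickly; they do not reproduce the FINN or its training settings.

```python
import math


def mse(pred, target):
    """Mean square error, the averaged form of the loss in Eq. (9)."""
    return sum(abs(p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def adam_fit_gain(x, target, g0=1.0, lr=1e-2, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=2000):
    """Fit a scalar gain g so that g * x approximates target, using Adam."""
    g, m, v = g0, 0.0, 0.0
    for t in range(1, steps + 1):
        # Gradient of the MSE loss with respect to g.
        grad = sum(2.0 * (g * xi - ti) * xi for xi, ti in zip(x, target)) / len(x)
        m = beta1 * m + (1.0 - beta1) * grad           # first-moment estimate
        v = beta2 * v + (1.0 - beta2) * grad * grad    # second-moment estimate
        m_hat = m / (1.0 - beta1 ** t)                 # bias corrections
        v_hat = v / (1.0 - beta2 ** t)
        g -= lr * m_hat / (math.sqrt(v_hat) + eps)     # parameter update
    return g


# Toy data: the target is the input scaled by 1.2.
x = [1.0, -1.0, 3.0, -3.0]
target = [1.2 * xi for xi in x]

loss_before = mse([1.0 * xi for xi in x], target)      # loss at the initial gain
g_final = adam_fit_gain(x, target)
loss_after = mse([g_final * xi for xi in x], target)
```

Starting close to the solution keeps the initial loss small, which mirrors the motivation given above for initializing the NN from conventional DSP.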

3. Experimental setup and results

Figure 5 shows our experimental setup. At the transmitter side, one 16-GBaud 16-QAM sequence with 65536 symbols mapped from a random bit source is generated. After raised-cosine pulse shaping with a roll-off factor of 0.1, the signal is loaded into a 50-GSa/s arbitrary waveform generator, whose outputs are amplified by an SHF-807 to drive the modulator. A polarization division multiplexing (PDM) emulator is used to emulate polarization multiplexed signals through a polarization beam splitter (PBS)-delay-polarization beam combiner (PBC) approach. An EDFA is used to adjust and monitor the signal power before the re-circulating loop, which comprises one span of 100-km standard single mode fiber (attenuation 0.2 dB/km, CD parameter 16.7 ps/(nm·km), and nonlinear coefficient 1.2 1/(W·km)), one inline EDFA with a noise figure of 5 dB, and an optical band-pass filter (BPF) with a bandwidth of 1 nm to prevent saturation of the EDFAs [29]. The number of spans is 6. At the receiver, the local oscillator is an external cavity laser (ECL) with <100 kHz linewidth. After coherent detection, the received signals are digitized by a 4-channel real-time oscilloscope with a sampling rate of 50 GSa/s per channel and resampled to 2 samples per symbol for the ensuing FINN. The offline DSP functions include two technical routes. The first is the traditional CDC or DBP algorithm, which includes downsampling, ideal phase rotation, and data recovery. Note that the conventional DSP is used only for the initialization of the FINN and for comparison. The second is the proposed FINN-CDC or FINN-DBP, which is equivalent to the corresponding function of the first route but with an NN optimization process. FINN-CDC refers to replacing the digital filters of the receiver with FINN kernels that contain only the CDC function and compensate for the linear distortions, while FINN-DBP refers to a FINN containing the DBP layer, which can also compensate for the nonlinear distortions. As explained previously, we use the NN structure in Fig. 1 for the training procedure. To train and test the NN, the dataset with 65536 symbols is organized into 549 sets, each containing 120 transmit symbols. The dataset is used to train the NN iteratively with the batch size set to 32. We allocate 75% of the data as the training dataset and the remaining 25% as the testing dataset. The testing dataset is used to verify that the network reaches its optimized performance.

Fig. 5. Experimental setup. IQ-Tx: IQ transmitter; PDM Emu.: polarization division multiplex emulator; SW: optical switches; EDFA: erbium-doped fiber amplifier; SSMF: standard single-mode fiber; BPF: optical band-pass filter; VOA: variable optical attenuator; LO: local oscillator.

First, we focus on CDC, 1sps-DBP, 2sps-DBP, FINN-CDC, FINN-1sps-DBP, and FINN-2sps-DBP with a logarithmic step-size distribution [30]. The number of kernels for each layer and of taps for each DSP module is presented in Table 1. The FINN implicitly learns the DSP parameters from a given link, so it can readily be applied to links with heterogeneous spans. Note that the performance improvements of the FINN originate from optimizing the parameters in the DSP, and it incurs no additional computational complexity; e.g., the linear DSP route with CDC has almost the same complexity as FINN-CDC. Moreover, we use frequency domain CDC with the fast Fourier transform and inverse fast Fourier transform for the CDC in the DBP steps. A CDC filter with a finite number of taps inevitably introduces a truncation error. The truncation error accumulates coherently in multi-span transmission, leading to an undesired overall response. By optimizing the CDC filters and the hyper-parameters, the FINN can mitigate the accumulation of truncation errors and save resources. This motivates the full implementation of an NN that completely replaces the whole digital coherent demodulation process. The SNR of the transceiver chain is no better than 20 dB (exactly 19.93 dB) in our back-to-back (B2B) configuration and serves as the maximum SNR of the system. The SNR performance is reported to show the effectiveness of the proposed NN training method. The SNR is defined as ${\parallel} x{\parallel ^2}/{\parallel} x\, - \,\hat{x}\,{\parallel ^2}\,$, where x denotes the transmitted symbol and $\hat{x}$ denotes the recovered symbol after the NN. The Adam optimizer is used for training, and a batch size of 120 symbols for each iteration provides optimal results for our experimental data. Adam’s learning rate is set to 10⁻⁴, ${\beta _1}$ is 0.9, and ${\beta _2}$ is 0.999. The training link is built in PyTorch to perform the optimization of the parameters, while the performance is evaluated using the testing data. Padding is selected to guarantee that all the inputs and outputs of the convolutional layers have the same length. We observed that the NN converges within about two thousand iterations. Therefore, we set the maximum number of iterations to 6000. In the meantime, early stopping is used, which means that the training procedure stops when the loss no longer decreases on the testing dataset. As the noise of the EDFA is randomly generated at each training iteration, no batch is identical during the training. This also prevents overfitting problems that may arise when a training dataset of a fixed and limited size is used multiple times [31]. After training, we evaluate the performance of the equalization using the evaluation link.
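The SNR metric defined above can be computed as in the following short sketch, which reports the ratio in dB. The function name `snr_db` and the toy symbols are assumptions for illustration.

```python
import math


def snr_db(x, x_hat):
    """SNR = ||x||^2 / ||x - x_hat||^2, returned in dB."""
    signal = sum(abs(a) ** 2 for a in x)
    error = sum(abs(a - b) ** 2 for a, b in zip(x, x_hat))
    return 10.0 * math.log10(signal / error)


# Toy check: a recovered sequence whose amplitude is off by 10% has an error
# power 100 times smaller than the signal power.
x = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]
x_hat = [1.1 * a for a in x]
value = snr_db(x, x_hat)
```

In this toy case the result is approximately 20 dB, since the error amplitude is one tenth of the signal amplitude.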

Table 1. Filter kernels/taps used in FINN and DSP

Next, the SNRs as a function of launched optical power for the different equalization methods over a 600-km transmission link and the corresponding constellation diagrams are shown in Fig. 6. The DBP method improves the optimum launch power with respect to CDC by 2 dB, from 0 dBm to 2 dBm. With FINN-DBP, the peak SNR is further increased compared with DBP, while the optimum launch power for FINN-DBP remains the same as that for DBP. Compared with CDC, FINN-CDC yields a negligible improvement (less than 0.06-dB SNR gain). This is because no nonlinear functions are used in FINN-CDC, so the whole NN can deal only with linear effects. This also confirms that the traditional coherent DSP can cancel out almost all the linear interference and achieves its best performance. Comparing the different methods at the optimal launch power, 2sps-DBP has 0.40-dB and 0.73-dB gain over 1sps-DBP and CDC, respectively, which results from nonlinear compensation. Besides, at the optimal launch power, FINN-2sps-DBP and FINN-1sps-DBP bring about 0.59-dB and 0.53-dB SNR improvement over 2sps-DBP and 1sps-DBP, respectively. Figures 6(b) and (c) display the constellations of FINN-2sps-DBP and FINN-CDC at their optimal launch powers. Overall, FINN-DBP contributes substantially to further improving the transmission performance of long-haul coherent optical systems.

Fig. 6. (a) Performance of different equalization methods. The constellation diagram for (b): FINN-2sps-DBP @ 2 dBm and (c): FINN-CDC @ 0 dBm.

Because our proposed FINN structure is a physics-based model and fully interpretable, we can then identify the state of each layer and associate the optimized coefficients of the NN with the corresponding digital filter. Figure 7 and Fig. 8 show the experimental results for FINN-CDC with a launch power of 0 dBm and FINN-DBP with a launch power of 2 dBm, respectively. Figure 7(a) and Fig. 8(a) show the MSE evolution during training. We initialize the FINN in a good state, so the initial MSE is small and the FINN converges fast. After about 2000 iterations, the performance has essentially converged. This behavior can also be attributed to mini-batch gradient descent, which saves considerable time when verifying the effectiveness of parameter adjustments. The MSEs between the input and output symbols after training in Fig. 8(a) are smaller than those of FINN-CDC shown in Fig. 7(a).

Fig. 7. Experimental results for FINN-CDC with launch power of 0 dBm. (a) MSE evolution for training. (b) The evolution of the CDC layer of the combined amplitude and phase response. The magnitude of (c): matched filter layer, (d): frequency offset layer and (e1-e4): polarization demultiplexing and equalization layer before and after training.

Fig. 8. Experimental results for FINN-2sps-DBP with launch power of 2 dBm. (a) MSE evolution for training. (b) The evolution of the DBP layer of the combined amplitude and phase response. (c) The evolution of the scaling factor. The magnitude of (d): matched filter layer, (e): frequency offset layer and (f1-f4): polarization demultiplexing and equalization layer before and after training.

Figures 7(b-e) display the coefficients of all the layers before and after training with FINN-CDC, where the DSP configurations are used for initialization. Almost all the coefficients are unchanged, reflecting that traditional CDC-based DSP already achieves its best performance. However, in Figs. 8(b-f), the learned coefficients of each NN layer change for FINN-2sps-DBP with 2-dBm launch power. In Fig. 8(b), both the amplitude and the phase response of the DBP layer change after FINN-DBP training. After NN learning, the in-band amplitude response becomes flat, and the in-band phase response remains almost the same during the training process, which is in accord with the physical properties of dispersion. Meanwhile, the out-of-band response is less important; its amplitude only needs to be lower than a certain level. This indicates that for band-limited signals, we do not need to compensate for dispersion over the whole frequency band, thereby reducing the implementation cost. Figure 8(c) depicts the evolution of the nonlinear scaling factors. It shows that traditional DBP, which uses only one scaling factor along the whole link, is not an optimal solution. The nonlinear scaling factors show a similar “U” shape during the training process, which has also been pointed out in [32]. Note that, because the optimization of the DBP layer changes the outputs of the nonlinear compensation, the ensuing layers also change for the overall optimization, as shown in Figs. 8(d-f).

4. Conclusion

In this paper, we replace the whole set of digital filters of conventional coherent optical demodulation with fully interpretable NNs to compensate for the optical system impairments. This offers a novel solution to coherent demodulation that combines the advantages of conventional demodulation and NNs to save resources and optimize performance. We experimentally demonstrate the equalization performance of the proposed FINN when various fiber impairments are considered. Initializing FINN and predetermining the number of kernels according to the physical model accelerate convergence. In addition, we demonstrate that it is sufficient to compensate for the dispersion of the in-band components. Finally, taking advantage of NN optimization, FINN-DBP can achieve global optimization, bringing about 0.59-dB and 0.53-dB SNR improvements compared with conventional DBP at 1 sps and 2 sps, respectively. The proposed method shows great potential for coherent demodulation, providing a better choice for future coherent optical systems.

Funding

National Key Research and Development Program of China (2018YFB1801704); National Natural Science Foundation of China (61871082, 61871408, 62035018, 62111530150); Open Fund of IPOC (IPOC2020A011); Science and Technology Commission of Shanghai Municipality (SKLSFO2021-01); Fundamental Research Funds for the Central Universities (ZYGX2019J008, ZYGX2020ZB043).

Acknowledgement

This work was supported by the National Key Research and Development Program of China (2018YFB1801704), the National Natural Science Foundation of China (NSFC) (61871082, 62111530150, 61871408, and 62035018), the Open Fund of IPOC (BUPT) (No. IPOC2020A011), STCSM (SKLSFO2021-01), and the Fundamental Research Funds for the Central Universities (ZYGX2020ZB043 and ZYGX2019J008).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. P. Stefano, “End-to-end capacities of a quantum communication network,” Commun. Phys. 2(1), 1–10 (2019).

2. R.-J. Essiambre, G. Kramer, P. J. Winzer, G. J. Foschini, and B. Goebel, “Capacity limits of optical fiber networks,” J. Lightwave Technol. 28(4), 662–701 (2010).

3. E. Ip and J. M. Kahn, “Compensation of dispersion and nonlinear impairments using digital backpropagation,” J. Lightwave Technol. 26(20), 3416–3425 (2008).

4. L. Maher, M. Galdino, T. Sato, K. Xu, S. Shi, S. Kilmurray, B. Savory, R. Thomsen, P. Killey, and Bayvel, “Linear and nonlinear impairment mitigation in a Nyquist spaced DP-16QAM WDM transmission system with full-field DBP,” 2014 The European Conference on Optical Communication (ECOC). IEEE, 2014, presented at the 39th Eur. Conf. Optical Communication, Cannes, France, 2014, paper P.5.10.

5. T. Xu, J. Li, A. Djupsjöbacka, R. Schatz, G. Jacobsen, and S. Popov, “Quasi real-time 230-Gbit/s coherent transmission field trial over 820 km SSMF using 57.5-Gbaud dual-polarization QPSK,” Asia Communications and Photonics Conference. Optical Society of America, 2013.

6. S. J. Savory, “Digital filters for coherent optical receivers,” Opt. Express 16(2), 804–817 (2008).

7. E. Temprana, E. Myslivets, B.P.-P. Kuo, L. Liu, V. Ataie, N. Alic, and S. Radic, “Overcoming Kerr-induced capacity limit in optical fiber transmission,” Science 348(6242), 1445–1448 (2015).

8. L. Galdino, D. Semrau, D. Lavery, G. Saavedra, C. B. Czegledi, E. Agrell, R. Killey, and P. Bayvel, “On the limits of digital back-propagation in the presence of transceiver noise,” Opt. Express 25(4), 4564–4578 (2017).

9. F. Guiomar, J. Reis, A. Teixeira, and A. Pinto, “Mitigation of intra-channel nonlinearities using a frequency-domain Volterra series equalizer,” Opt. Express 20(2), 1360–1369 (2012).

10. B. Xu and M. Brandt-Pearce, “Comparison of FWM- and XPM-induced crosstalk using the Volterra series transfer function method,” J. Lightwave Technol. 21(1), 40–53 (2003).

11. I. Phillips, M. Tan, MF. Stephens, M. Mccarthy, and Andrew D. Ellis, “Exceeding the nonlinear-Shannon limit using Raman laser based amplification and optical phase conjugation,” OFC 2014. IEEE, 2014.

12. X. Yi, X. Chen, D. Sharma, C. Li, M. Luo, Q. Yang, Z. Li, and K. Qiu, “Digital coherent superposition of optical OFDM subcarrier pairs with Hermitian symmetry for phase noise mitigation,” Opt. Express 22(11), 13454–13459 (2014).

13. S. Derevyanko, J. Prilepsky, and S. Turitsyn, “Capacity estimates for optical transmission based on the nonlinear Fourier transform,” Nat. Commun. 7(1), 12710 (2016).

14. J. Gonçalves, C. Martins, F. Guiomar, T. Cunha, J. Pedro, A. Pinto, and P. Lavrador, “Nonlinear compensation with DBP aided by a memory polynomial,” Opt. Express 24(26), 30309–30316 (2016).

15. T. O’Shea and J. Hoydis, “An introduction to deep learning for the physical layer,” IEEE Trans. Cogn. Commun. Netw. 3(4), 563–575 (2017).

16. I. V. Tetko, D. J. Livingstone, and A. I. Luik, “Neural network studies. 1. Comparison of overfitting and overtraining,” J. Chem. Inf. Comput. Sci. 35(5), 826–833 (1995).

17. Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature 521(7553), 436–444 (2015).

18. B. Karanov, M. Chagnon, F. Thouin, T. A. Eriksson, H. Bulow, D. Lavery, P. Bayvel, and L. Schmalen, “End-to-end deep learning of optical fiber communications,” J. Lightwave Technol. 36(20), 4843–4855 (2018).

19. F. N. Khan, Q. Fan, C. Lu, and A. P. T. Lau, “An Optical Communication’s Perspective on Machine Learning and Its Applications,” J. Lightwave Technol. 37(2), 493–516 (2019).

20. J. R. Hershey, J. L. Roux, and F. Weninger, “Deep unfolding: Model-based inspiration of novel deep architectures,” arXiv preprint arXiv:1409.2574 (2014).

21. A. Balatsoukas-Stimming and C. Studer, “Deep unfolding for communications systems: A survey and some new directions,” 2019 IEEE International Workshop on Signal Processing Systems (SiPS). IEEE, 2019.

22. Z. Zhai, H. Jiang, M. Fu, L. Liu, L. Yi, W. Hu, and Q. Zhuge, “An Interpretable Mapping From a Communication System to a Neural Network for Optimal Transceiver-Joint Equalization,” J. Lightwave Technol. 39(17), 5449–5458 (2021).

23. C. Häger and H. D. Pfister, “Nonlinear Interference Mitigation via Deep Neural Networks,” in Optical Fiber Communication Conference, (Optical Society of America, 2018), p. W3A.4.

24. C. Häger and H. D. Pfister, “Physics-based deep learning for fiber-optic communication systems,” IEEE J. Select. Areas Commun. 39(1), 280–294 (2021).

25. B. I. Bitachon, A. Ghazisaeidi, M. Eppenberger, B. Baeurle, and J. Leuthold, “Deep learning based digital backpropagation demonstrating SNR gain at low complexity in a 1200 km transmission link,” Opt. Express 28(20), 29318–29334 (2020).

26. J. C. Bancroft, “Introduction to matched filters,” CREWES Research 297 (2002).

27. C. R. S. Fludger, T. Duthel, D. van den Borne, C. Schulien, E. Schmidt, T. Wuth, J. Geyer, E. De Man, G. Khoe, and H. de Waardt, “Coherent equalization and POLMUX-RZ-DQPSK for robust 100-GE transmission,” J. Lightwave Technol. 26(1), 64–72 (2008).

28. J. Zhao, Y. Liu, and T. Xu, “Advanced DSP for coherent optical fiber communication,” Appl. Sci. 9(19), 4192 (2019).

29. Y. Gao, J. C. Cartledge, A. S. Karar, Scott S.-H. Yam, M. O’Sullivan, C. Laperle, A. Borowiec, and K. Roberts, “Reducing the complexity of perturbation based nonlinearity pre-compensation using symmetric EDC and pulse shaping,” Opt. Express 22(2), 1209–1219 (2014).

30. G. Bosco, A. Carena, V. Curri, R. Gaudino, P. Poggiolini, and S. Benedetto, “Suppression of spurious tones induced by the split-step method in fiber systems simulation,” IEEE Photonics Technol. Lett. 12(5), 489–491 (2000).

31. S. Gaiarin, F. Da Ros, R. T. Jones, and D. Zibar, “End-to-end optimization of coherent optical communications over the split-step Fourier method guided by the nonlinear Fourier transform theory,” J. Lightwave Technol. 39(2), 418–428 (2021).

32. Q. Fan, G. Zhou, T. Gui, C. Lu, and A. P. T. Lau, “Advancing theoretical understanding and practical performance of signal processing for nonlinear optical communications through machine learning,” Nat. Commun. 11(1), 1–11 (2020).


Optics Express

Layers	CDC	FINN-1sps-DBP (CDC)	FINN-1sps-DBP (NLC)	FINN-2sps-DBP (CDC)	FINN-2sps-DBP (NLC)	Matched	FOE	Pol Demux
Kernels	85	15 × 6	1 × 6	(13 + 3) × 6	(1 + 1) × 6	65	1	21 × 4
Modules	CDC	1sps-DBP (CDC)	1sps-DBP (NLC)	2sps-DBP (CDC)	2sps-DBP (NLC)	Matched	FOE	Pol Demux
Taps	85	FFT/IFFT 69 × 6	1 × 6	FFT/IFFT (69 + 69) × 6	(1 + 1) × 6	65	1	21 × 4