Learning-based digital back propagation to compensate for fiber nonlinearity considering self-phase and cross-phase modulation for wavelength-division multiplexed systems

Takashi Inoue; Ryosuke Matsumoto; Shu Namiki

doi:10.1364/OE.454841

1. Introduction

In high-capacity and long-distance wavelength-division multiplexed (WDM) optical fiber transmission systems, the capacity and distance are limited by the nonlinearity of the optical fibers [1,2]. Digital back propagation (DBP) has been proposed as a promising technique to compensate for nonlinear waveform distortion [3,4]. DBP is implemented in digital signal processing (DSP) after a transmitted signal is received, and it estimates the signal’s waveform at the transmitter. This is done by computing the backward propagation of the received field through the transmission line with the split-step Fourier method (SSFM), which solves the nonlinear Schrödinger equation (NLSE) as a physical model of lightwave propagation in an optical fiber [5]. The SSFM is calculated on a “virtual transmission line,” in which each span of the real transmission line is split into several steps, and each step consists of alternating linear and nonlinear segments corresponding to the linear and nonlinear parts of the NLSE. Using many steps per span makes the result more accurate but also increases the computational complexity. Therefore, the number of steps per span is typically limited to one or two in practical DSP implementations.

A conventional DBP can, in principle, compensate for waveform distortion caused by self-phase modulation (SPM) and cross-phase modulation (XPM) between different channels of a WDM signal. This is because both originate from the same physical property of an optical fiber, and they can be treated identically if the entire field of multiple channels is calculated together. DBP with such a structure is called “multi-channel DBP” or “full-field DBP” [6]. However, the large bandwidth needed to process multiple channels together requires a considerably high waveform sampling rate in both the time and frequency domains. In addition, many steps per span are necessary in the SSFM because the dispersion length [5] decreases as the signal bandwidth increases. Both the high sampling rate and the short dispersion length lead to an impractically high computational complexity for DSP. Consequently, in practical situations, “single-channel DBP” is implemented for individual channels to compensate only for the waveform distortion caused by the SPM-related nonlinear phase shift; the XPM-related phase shift from the other channels is typically ignored. A useful method has been proposed to calculate the XPM-induced phase shift even in an SSFM implemented for individual channels [7]. In this method, the intensity of a neighboring channel is assumed to be invariant throughout a nonlinear segment, except for the time shift due to the group-velocity dispersion (GVD) effect, the so-called “walkoff.” Under this assumption, the Fourier transform allows the XPM-induced phase shift from the moving intensity of the neighboring channel to be integrated over the entire distance of the nonlinear segment.
This method has been applied to “single-channel DBP” with inter-channel connections to calculate the effect of XPM on a WDM signal [8,9], and the required computational complexity is considerably reduced compared to the “multi-channel DBP.” However, a substantially higher calculation cost is still necessary compared to conventional DBP schemes that consider only SPM with one- or two-step/span configurations.

The performance of DBP depends on setting appropriate parameters for the virtual transmission line. DBP with inaccurate parameters may even degrade signal quality compared with the case without any nonlinearity compensation. The learning-based DBP (LDBP) technique has been proposed and widely studied to address this limitation [10–15]. With this technique, the parameters of a virtual transmission line are optimized using the stochastic gradient descent (SGD) algorithm such that the mean-square error (MSE) between the signal waveforms before transmission and after the DBP process is minimized, that is, the signal quality is maximized. Furthermore, a previous study has shown that LDBP has the potential to reduce the required number of steps per span of a virtual transmission line [13]. Hence, LDBP realizes high-performance nonlinearity compensation with reduced computational complexity at the expense of a parameter-learning process. However, most LDBP techniques reported thus far operate on a channel-by-channel basis and compensate only for the SPM-related waveform distortion; the influence of XPM is not considered. Recently, LDBP techniques that consider XPM have been reported [16,17], in which a finite impulse response (FIR) filter that takes the intensities of the neighboring channels as its input calculates the XPM-induced phase shift. FIR-based LDBPs can handle arbitrary impairments that are well described in the frequency domain, and their large number of taps makes them highly flexible. However, this flexibility makes it difficult to design DSP circuits for various practical situations and can destabilize the convergence of the many tap coefficients during learning.
Therefore, it is important to further investigate LDBP that addresses XPM through an established physical mechanism with a few optimized parameters, rather than through an FIR filter, so that the transmission performance of WDM signals can be improved stably in practical situations.

In this paper, we propose an LDBP technique to compensate for the waveform distortion of a WDM signal induced by the nonlinear phase shift originating from SPM and XPM. The proposed LDBP has the structure of a “single-channel DBP” with inter-channel connections that account for XPM as the interaction between neighboring channels of the WDM signal, and an approximation is applied to the propagation of the amplitudes over the nonlinear segments [7–9]. Accordingly, the LDBP uses a filter in the frequency domain to calculate the XPM-induced phase shift, and it has a few parameters per channel to be optimized. In particular, directly learning the walkoff parameters between different channels is effective for compensating for the XPM-induced phase shift in a transmission line where the local distribution of fiber dispersion is unknown. We develop a framework to systematically optimize the parameters of the LDBP based on the SGD algorithm and explicitly derive the equations to update them. Unlike the LDBPs using FIR filters with freely adjustable tap coefficients [16,17], the proposed LDBP is constrained by the physical model so that it is dedicated to compensating for the waveform distortion originating from the GVD and the phase shift induced by SPM and XPM. Therefore, the parameters of the model are expected to converge to their optimal values for compensating for the XPM-induced distortion after the learning process. We then conduct a transmission experiment to verify the effectiveness of the proposed LDBP technique. In the experiment, we employ a recirculation-loop configuration with transmission distances of 6, 10, and 16 spans of approximately 80-km standard single-mode fiber (SSMF), and we transmit 11-channel, 32-Gbaud, dual-polarization (DP) signals modulated with either uniform 16-ary quadrature amplitude modulation (16QAM) or probabilistically shaped (PS) 64QAM [18,19].
Using the received waveforms of the DP-16QAM signals, we implement a learning process to optimize particular parameters of the virtual transmission line for each distance and confirm that the parameters converge. We apply the LDBP with the fixed optimized parameters to the received waveforms of the DP-PS-64QAM signals and compare its performance with that of DBPs with one- or two-step/span configurations, with and without the structure that addresses XPM. We observe that the proposed LDBP that considers XPM with a one-step/span configuration exhibits the best performance, and that the learning process is beneficial, especially when XPM is considered in the compensation. To the best of our knowledge, this is the first experimental demonstration of LDBP compensating for the XPM-induced nonlinear phase shift. Finally, we estimate the computational complexity required to operate the LDBP after the learning process is complete. The estimate shows that the calculation cost of the one-step/span LDBP that considers XPM is comparable with that of a conventional DBP with a two-step/span configuration that considers only SPM.

2. Operation principle

2.1 Structure and mathematical model of proposed LDBP

The propagation of a DP signal in an optical fiber is described by the coupled NLSE [5,20,21],

(1)$$i\frac{\partial A_p}{\partial z}+\frac{\beta_2}{2}\frac{\partial^2A_p}{\partial t^2}-i\frac{\beta_3}{6}\frac{\partial^3A_p}{\partial t^3}-\frac{8}{9}\gamma_0\left(\left|A_p\right|^2+\delta\left|A_{3-p}\right|^2\right)A_p={-}i\frac{\alpha}{2}A_p,$$

where $A_p\left (z,t\right )$ is the complex envelope of an optical signal, which is a function of distance $z$ and time $t$. $p=1$ or $2$ denotes one of the two orthogonal polarization components of the signal. $\alpha$, $\beta _2$, and $\beta _3$ are the coefficients of the propagation loss, second-order GVD, and third-order GVD of a fiber, respectively, which are categorized as linear propagation effects. $\gamma _0$ is the nonlinear coefficient of a fiber, which is proportional to the nonlinear refractive index $n_2$ and inversely proportional to the effective area $A_\mathrm {eff}$ of the fiber [5]; it is an important parameter for characterizing nonlinear effects such as SPM and XPM during propagation. $\delta$ is the cross-polarization phase modulation (XPolM) coefficient, and it is set to $\delta =1$ for the Manakov model [20,21]. Eq. (1) is derived with the Fourier transform of a temporal waveform $A(t)$ defined as $\tilde {A}(\omega )=\mathcal {F}\left [A(t)\right ]=\int _{-\infty }^{\infty }{A(t)e^{-i\omega t}\mathrm {d}t}.$

We consider a $C$-channel WDM signal with a frequency spacing $\Delta \omega$ and define the envelope as

(2)$$A_p(t)=\sum_{n}{A_{p,n}(t)e^{i\omega_nt}},$$

where $A_{p,n}(t)$ is the complex envelope of the channel with index $n$ and polarization component $p$, and $\omega _n=n\Delta \omega$ is the carrier angular frequency of channel $n$ relative to the central channel. We assume an odd channel number $C$, and the index $n$ is assigned as $-(C-1)/2\leq n\leq (C-1)/2$. Throughout this study, we define the central channel with index $n=0$ as the “target channel” and focus on its quality when compensating for the nonlinear waveform distortion caused by the interplay between the GVD and nonlinear phase shifts such as those due to SPM and XPM.
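A minimal Python sketch of this indexing convention follows; it lists the channel indices for an odd channel count $C$ and the carrier offsets $\omega_n=n\Delta\omega$. The 11-channel, 50-GHz grid is taken from the experiment in Section 3, and the function names are hypothetical.

```python
import math

def channel_indices(C):
    """Indices n = -(C-1)/2, ..., (C-1)/2 for an odd channel count C; n = 0 is the target."""
    if C % 2 != 1:
        raise ValueError("the channel count C must be odd")
    half = (C - 1) // 2
    return list(range(-half, half + 1))

def carrier_offsets(C, spacing_hz):
    """Angular carrier offsets omega_n = n * delta_omega (rad/s) relative to the target channel."""
    delta_omega = 2 * math.pi * spacing_hz
    return {n: n * delta_omega for n in channel_indices(C)}

idx = channel_indices(11)             # 11 channels, as in Section 3
print(idx[0], idx[-1], len(idx))      # -5 5 11
offsets = carrier_offsets(11, 50e9)   # 50-GHz grid, as in Section 3
```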

We split each span of a real transmission line into several steps to define a virtual transmission line for an SSFM-based DBP process. Figure 1 depicts a schematic of a virtual transmission line with a two-step/span configuration for a two-span real transmission line as an example; the optical power of the signal, which varies owing to propagation loss and lumped amplification between the spans, is also shown. In Fig. 1, each span of the real transmission line is split into two steps such that the nonlinear lengths [5] of the two steps are identical. A nonlinear segment $\mathcal {N}^{(j)}$ of the virtual transmission line is defined for each step of the real one, and its distance $h^{(j)}$ equals the physical distance of the step. Although a linear segment could be defined for each of the first and second halves of a step, we combine the second linear segment of a step with the first linear segment of the next step. For example, in Fig. 1, the first linear segment $\mathcal {L}^{(0)}$ has a length of $h^{(1)}/2$, the following segments $\mathcal {L}^{(j)}$ have lengths of $[h^{(j)}+h^{(j+1)}]/2$ for $1\leq j\leq 3$, and the last segment $\mathcal {L}^{(4)}$ has a length of $h^{(4)}/2$. Consequently, linear and nonlinear segments are calculated alternately in the order $\mathcal {L}^{(0)}$, $\mathcal {N}^{(1)}$, $\mathcal {L}^{(1)}$, $\cdots$, $\mathcal {L}^{(4)}$, as shown in Fig. 1.
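The segment bookkeeping of Fig. 1 can be sketched in Python as follows. The sketch assumes that “identical nonlinear lengths” means equal effective lengths $\int e^{-\alpha z}\mathrm{d}z$ over the steps of a span, and it assumes a typical SSMF loss of 0.2 dB/km; both are our assumptions rather than values stated here, and the helper names are hypothetical.

```python
import math

def db_per_km_to_alpha(loss_db_per_km):
    """Convert a fiber loss in dB/km to the power attenuation coefficient alpha (1/km)."""
    return loss_db_per_km * math.log(10) / 10

def equal_effective_length_steps(span_km, alpha, n_steps):
    """Split one span into n_steps steps with equal effective length (1 - exp(-alpha z)) / alpha.

    Assumption: 'identical nonlinear lengths' is interpreted as equal accumulated
    effective length, i.e. an equal forward-direction nonlinear phase per step.
    """
    total = 1 - math.exp(-alpha * span_km)
    bounds = [-math.log(1 - k / n_steps * total) / alpha for k in range(n_steps + 1)]
    return [b - a for a, b in zip(bounds, bounds[1:])]

def linear_segment_lengths(step_lengths):
    """Lengths of L^(0), ..., L^(m): h1/2, (h_j + h_{j+1})/2, ..., h_m/2."""
    h = step_lengths
    inner = [(a + b) / 2 for a, b in zip(h, h[1:])]
    return [h[0] / 2] + inner + [h[-1] / 2]

alpha = db_per_km_to_alpha(0.2)                      # assumed SSMF loss of 0.2 dB/km
span = equal_effective_length_steps(80.0, alpha, 2)  # two steps in one 80-km span
two_spans = span * 2                                 # h^(1)..h^(4) for the two-span line of Fig. 1
print([round(v, 1) for v in span])                   # roughly [14.5, 65.5]: short high-power step first
print(len(linear_segment_lengths(two_spans)))        # 5 linear segments, L^(0)..L^(4)
```

Under this interpretation, the first step of each span, where the signal power is highest, is much shorter than the second.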

Fig. 1. Linear segments $\mathcal {L}^{(j)}$ $(0\leq j\leq 4)$ and nonlinear segments $\mathcal {N}^{(j)}$ $(1\leq j\leq 4)$ of virtual transmission line with two-step/span configuration for two-span real transmission line and the order of calculation. Decaying optical power along the forward propagation distance in a real transmission line is also depicted.

By collecting the terms related to linear propagation effects in the NLSE (1), we obtain the following equation for a linear segment:

(3)$$i\frac{\partial A_p}{\partial z}+\frac{\beta_2}{2}\frac{\partial^2A_p}{\partial t^2}-i\frac{\beta_3}{6}\frac{\partial^3A_p}{\partial t^3}={-}i\frac{\alpha}{2}A_p.$$

The amplitude of each channel component after propagating through distance $h$ is then given by integrating Eq. (3) with respect to $z$ in the Fourier domain.

(4)$$\tilde{A}_{p,n}\left(h,\omega\right)=\exp{\left[i\left(D_2\omega^2+D_3\omega^3-T_n\omega\right)-\frac{\alpha}2 h\right]}\tilde{A}_{p,n}\left(0,\omega\right),$$

where $D_2(h)=-h\beta _2/2$ and $D_3(h)=-h\beta _3/6$ are the accumulated second- and third-order dispersion coefficients over the distance $h$, respectively, and $T_n(h)=h(\beta _2\omega _n+\beta _3\omega _n^2/2)$ is the walkoff of channel $n$ observed from the target channel. Notably, $d_n\equiv \beta _2\omega _n+\beta _3\omega _n^2/2$ corresponds to the walkoff per unit length, that is, the difference between the inverse group velocities of channel $n$ and the target channel. $d_n$ is an important factor in compensating for the XPM-induced nonlinear phase shift, as discussed later; hereafter, we call $d_n$ the “walkoff parameter.” The amplitude described in Eq. (4) can be written in the time domain using the Fourier operator $\mathcal {F}$ as follows:
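The linear segment of Eq. (5) and the walkoff parameter $d_n$ can be sketched in Python as below. A naive DFT with the sign convention defined after Eq. (1) keeps the example dependency-free (a practical implementation would use an FFT), and $\beta_2\approx-21.7$ ps$^2$/km is an assumed typical SSMF value; the function names are hypothetical.

```python
import cmath
import math

def dft(x, sign=-1):
    """Naive DFT; sign=-1 matches F[A] = integral of A(t) exp(-i w t) dt."""
    N = len(x)
    return [sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / N) for m in range(N))
            for k in range(N)]

def idft(X):
    return [v / len(X) for v in dft(X, sign=+1)]

def angular_freqs(N, dt):
    """Angular frequency of each DFT bin; bins at and above N/2 map to negative frequencies."""
    return [2 * math.pi * (k if 2 * k < N else k - N) / (N * dt) for k in range(N)]

def walkoff_parameter(beta2, beta3, omega_n):
    """d_n = beta2*omega_n + beta3*omega_n**2/2, the walkoff per unit length."""
    return beta2 * omega_n + beta3 * omega_n ** 2 / 2

def linear_segment(A, dt, h, beta2, beta3, alpha, d_n):
    """Eq. (5): multiply the spectrum by exp[i(D2 w^2 + D3 w^3 - T_n w) - alpha h / 2]."""
    D2, D3, T_n = -h * beta2 / 2, -h * beta3 / 6, h * d_n
    spec = dft(A)
    w = angular_freqs(len(A), dt)
    return idft([S * cmath.exp(1j * (D2 * wk**2 + D3 * wk**3 - T_n * wk) - alpha * h / 2)
                 for S, wk in zip(spec, w)])

# Units: ps, km, rad/ps. 50-GHz neighbor: d_1 is about -6.8 ps/km (assumed beta2 = -21.7 ps^2/km)
d1 = walkoff_parameter(-21.7, 0.0, 2 * math.pi * 0.05)
# With beta2 = beta3 = alpha = 0, a walkoff of exactly 3 samples is a pure circular delay.
A = [complex(m % 5, -m) for m in range(16)]
B = linear_segment(A, dt=1.0, h=1.0, beta2=0.0, beta3=0.0, alpha=0.0, d_n=3.0)
```

With this sign convention, a positive $T_n$ delays the waveform, so the 3-sample walkoff in the example reproduces the input shifted circularly by three samples; over an 80-km span, $d_1\approx-6.8$ ps/km accumulates to roughly 545 ps of walkoff.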

(5)$$A_{p,n}(h,t)= \mathcal{F}^{{-}1}\exp{\left[i\left(D_2\omega^2+D_3\omega^3-T_n\omega\right)-\frac{\alpha}2h\right]}{\mathcal{F}A}_{p,n}(0,t).$$

The collection of terms related to the nonlinear propagation effects in the NLSE (1) yields

(6)$$i\frac{\partial A_p}{\partial z}-\frac{8}{9}\gamma_0\left(\left|A_p\right|^2+\delta\left|A_{3-p}\right|^2\right)A_p=0.$$

We substitute Eq. (2) into Eq. (6), expand the nonlinear terms for the individual channels, and obtain an equation that describes the propagation of the target channel $A_{p,0}$ as follows:

(7)$$\frac{\partial A_{p,0}}{\partial z} ={-}i\frac{8}{9}\gamma_0\left[\left|A_{p,0}\right|^2+\delta\left|A_{3-p,0}\right|^2+\sum_{n\neq0}\left(2\left|A_{p,n}\right|^2+\delta\left|A_{3-p,n}\right|^2\right)\right]A_{p,0}.$$

In deriving Eq. (7), we ignored the terms related to nonlinear polarization crosstalk [14,21] because the nonlinear phase shift induced by SPM and XPM is the dominant contributor to the waveform distortion of WDM signals. To integrate Eq. (7) with respect to $z$, we introduce an assumption for the propagation of the signal: the intensity of each channel, $|A_{p,n}(t)|^2$, is invariant over the distance, except for the damped optical power due to the loss and the walkoff owing to the dispersion of the fiber [7]. This results in an approximated description for the intensity, $|A_{p,n}(z,t)|^2=\exp (-\alpha z)|A_{p,n}(0,t-d_nz)|^2$, where $d_n$ is the walkoff parameter defined above. Under this assumption, we can integrate Eq. (7) with respect to $z$ over an interval of $[-h/2, h/2]$ and obtain the following solution:

(8)$$ A_{p,0}(h,t)= A_{p,0}(0,t)\exp\left[i\left\{\varphi_p^\mathrm{SPM}(h,t)+\varphi_p^\mathrm{XPM}(h,t)\right\}\right], $$

(9)$$ \varphi_p^\mathrm{SPM}\left(h,t\right) = gH_0(h)\left[P_{p,0}(t)+\delta P_{3-p,0}(t)\right], $$

(10)$$ \varphi_p^\mathrm{XPM}\left(h,t\right) = g\mathcal{F}^{{-}1}\left[\sum_{n\neq0}{H_n\left(h,\omega\right)\left\{2\tilde{P}_{p,n}(\omega)+\delta\tilde{P}_{3-p,n}(\omega)\right\}}\right], $$

where $g=-\frac 89\gamma _0$, $P_{p,0}(t)=|A_{p,0}(0,t)|^2$, $\tilde {P}_{p,n}(\omega )=\mathcal {F}[|A_{p,n}(0,t)|^2]$, $H_0(h)=\frac {2}{\alpha }\sinh {\left (\frac {\alpha }{2}h\right )}$, $H_n(h,\omega )=\frac {2}{\zeta _n(\omega )}\sinh {\left [\frac {\zeta _n(\omega )}{2}h\right ]}$, and $\zeta _n(\omega )=\alpha +id_n\omega$. Using Eqs. (8)–(10), we can calculate the nonlinear phase shift of the target channel caused by SPM and XPM after a nonlinear segment with a distance of $h$.
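The filter $H_n(h,\omega)$ follows from a single Fourier-domain integral. Under the walkoff assumption, the delay $d_nz$ becomes a linear spectral phase with the transform convention defined after Eq. (1), $\mathcal{F}[|A_{p,n}(z,t)|^2]=e^{-\alpha z}e^{-id_n\omega z}\tilde{P}_{p,n}(\omega)$, so integrating over the nonlinear segment gives

$$\int_{-h/2}^{h/2}e^{-\alpha z}e^{-id_n\omega z}\tilde{P}_{p,n}(\omega)\,\mathrm{d}z=\tilde{P}_{p,n}(\omega)\int_{-h/2}^{h/2}e^{-\zeta_n(\omega)z}\,\mathrm{d}z=\frac{2}{\zeta_n(\omega)}\sinh\left[\frac{\zeta_n(\omega)}{2}h\right]\tilde{P}_{p,n}(\omega)=H_n(h,\omega)\tilde{P}_{p,n}(\omega).$$

Multiplying by $g$ and the coupling factors $2$ and $\delta$, and transforming back to the time domain, yields Eq. (10). Setting $d_n=0$ reduces $\zeta_n$ to $\alpha$ and $H_n$ to $H_0(h)$, which is why the SPM term in Eq. (9) needs no frequency-domain filtering.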

Figure 2 shows a block diagram of the linear and nonlinear segments of a virtual transmission line for a DBP process that considers SPM and XPM for a three-channel WDM signal with channel indices $n=-1, 0, +1$, where the thick red arrows from $\mathcal {L}_{p,n}^{(j)}$ to $\mathcal {N}_{p,n}^{(j+1)}$ represent the evolution related to SPM calculated by Eq. (9), and the thin blue arrows represent that related to XPM calculated by Eq. (10). $x_{p,n}$ is the input amplitude for polarization component $p$ and channel index $n$ of the received signal after a real transmission line, and $y_{p,n}$ is the output amplitude of the virtual transmission line. The linear segments for the orthogonal polarization components, $\mathcal {L}_{1,n}^{(j)}$ and $\mathcal {L}_{2,n}^{(j)}$, share parameters such as $D_2^{(j)}$, $D_3^{(j)}$, and $T_n^{(j)}$; similarly, $\mathcal {N}_{1,n}^{(j)}$ and $\mathcal {N}_{2,n}^{(j)}$ share $g^{(j)}$, $\delta ^{(j)}$, and $d_n^{(j)}$. The DBP structure shown in Fig. 2, which calculates the nonlinear phase shift including XPM using Eq. (10), was originally proposed in Ref. [8]; we aim to develop a scheme to optimize its parameters using an iterative learning procedure of the kind typically used in machine learning. Table 1 summarizes the parameters to be optimized in this paper. In addition to these parameters, the third-order GVD coefficient $D_3$ and the propagation loss coefficient $\alpha$ could also be optimized if required. To compensate for the XPM-induced nonlinear phase shift, we use the frequency-domain filter $H_n(h,\omega )$ defined in Eq. (10), whereas Refs. [16,17] proposed the use of an FIR filter in the time domain. Both filters perform mathematically equivalent operations after the learning process is completed. However, $H_n(h,\omega )$ has a specific functional form derived from the physical mechanism of XPM.
Therefore, useful prior information, such as the walkoff parameter, is available at the beginning of the learning process, and because the filter is dedicated to compensating for the XPM-induced phase shift, it is advantageous for reaching the optimal parameters.
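The nonlinear segment of Fig. 2 for the target channel, Eqs. (8)–(10), can be sketched in Python as follows, again with a naive DFT to stay self-contained. The field layout (a dictionary keyed by polarization and channel index) and all numerical values are hypothetical, and taking the real part of the inverse transform to handle the unpaired Nyquist bin is our implementation choice.

```python
import cmath
import math
import random

def dft(x, sign=-1):
    N = len(x)
    return [sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / N) for m in range(N))
            for k in range(N)]

def angular_freqs(N, dt):
    return [2 * math.pi * (k if 2 * k < N else k - N) / (N * dt) for k in range(N)]

def H0(alpha, h):
    """(2/alpha) sinh(alpha h/2), which tends to h as alpha -> 0."""
    return 2 / alpha * math.sinh(alpha * h / 2) if alpha else h

def Hn(alpha, d_n, w, h):
    """(2/zeta) sinh(zeta h/2) with zeta = alpha + i d_n w; tends to h as zeta -> 0."""
    zeta = alpha + 1j * d_n * w
    return 2 / zeta * cmath.sinh(zeta * h / 2) if zeta != 0 else h

def nonlinear_segment(A, dt, h, g, delta, alpha, d):
    """Target-channel update of Eqs. (8)-(10).

    A: {(p, n): samples} for polarization p in (1, 2) and channel n; d: {n: d_n}.
    Returns {p: output samples of the target channel n = 0}.
    """
    N = len(A[(1, 0)])
    w = angular_freqs(N, dt)
    P = {key: [abs(a) ** 2 for a in val] for key, val in A.items()}
    neighbors = sorted({n for (_, n) in A if n != 0})
    out = {}
    for p in (1, 2):
        q = 3 - p
        phi_spm = [g * H0(alpha, h) * (P[(p, 0)][k] + delta * P[(q, 0)][k]) for k in range(N)]
        spec = [0j] * N
        for n in neighbors:
            Pp, Pq = dft(P[(p, n)]), dft(P[(q, n)])
            for k in range(N):
                spec[k] += Hn(alpha, d[n], w[k], h) * (2 * Pp[k] + delta * Pq[k])
        # Inverse DFT; the imaginary residue (rounding, unpaired Nyquist bin) is discarded.
        phi_xpm = [g * v.real / N for v in dft(spec, sign=+1)]
        out[p] = [A[(p, 0)][k] * cmath.exp(1j * (phi_spm[k] + phi_xpm[k])) for k in range(N)]
    return out

# Three-channel example (n = -1, 0, +1) with random fields; all values are illustrative.
random.seed(1)
N = 16
A = {(p, n): [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]
     for p in (1, 2) for n in (-1, 0, 1)}
out = nonlinear_segment(A, dt=1.0, h=2.0, g=-0.01, delta=1.0, alpha=0.1, d={-1: 0.8, 1: -0.8})
```

Setting all $d_n$ to zero turns every $H_n$ into $H_0$, so the XPM phase reduces to a purely time-domain sum of neighbor intensities; this is a convenient sanity check for an implementation.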

Fig. 2. Linear segments $\mathcal {L}_{p,n}^{(j)}$ and nonlinear segments $\mathcal {N}_{p,n}^{(j)}$ in a virtual transmission line considering SPM and XPM for 3-channel WDM signal.

Table 1. Parameters to be optimized by the developed iterative learning scheme.

Notably, the structure shown in Fig. 2 is a type of network in which the outputs of the linear transformations are interconnected before entering the nonlinear functions. Because its parameters are optimized with a machine learning technique, the network discussed here can also be regarded as a neural network based on a physical model of lightwaves propagating through optical fibers [11].

2.2 Development of learning procedure

We developed a framework based on the SGD algorithm to optimize the parameters used in the DBP process presented above. We define an error function $J(\boldsymbol {\theta })$ of a parameter set $\boldsymbol {\theta }$ as follows:

(11)$$J(\boldsymbol{\theta})=\sum_k\left\|\mathbf{e}(k;\boldsymbol{\theta})\right\|^2=\sum_k\mathbf{e}^T\mathbf{e}^{{\ast}},$$

where $\mathbf {e}(k;\boldsymbol {\theta })=\mathbf {y}(k;\boldsymbol {\theta })-\mathbf {t}(k)$ is the difference between the amplitude of the signal after the DBP process, $\mathbf {y}(k;\boldsymbol {\theta })=[y_1(k;\boldsymbol {\theta }) ~ y_2(k;\boldsymbol {\theta })]^T$, and the training data, $\mathbf {t}(k)=[t_1(k) ~ t_2(k)]^T$, both of which are vector signals of the orthogonal polarization components. $k$ is the discrete-time index, and superscripts $T$ and $\ast$ denote the transpose and the complex conjugate, respectively. The training data $\mathbf {t}(k)$ are identical to the waveform before transmission, which is assumed to be known in the learning process for $\boldsymbol {\theta }$. The parameter set $\boldsymbol {\theta }$ can be iteratively updated using the following equation:

(12)$$\boldsymbol{\theta}_{i+1}=\boldsymbol{\theta}_i-\frac{\eta}2\boldsymbol{\nabla}J,$$

where $\boldsymbol {\theta }_i$ is the parameter set at the $i$-th step of the iterative process, $\eta$ is the step-size parameter, and the gradient $\boldsymbol {\nabla }J$ is given by

(13)$$\boldsymbol{\nabla} J =\frac{\partial J}{\partial\boldsymbol{\theta}} =\sum_k \frac{\partial}{\partial\boldsymbol{\theta}}\left\|\mathbf{e}(k;\boldsymbol{\theta})\right\|^2 =2\sum_k\mathrm{Re}\left[\frac{\partial \mathbf{e}^T}{\partial\boldsymbol{\theta}}\mathbf{e}^{{\ast}}\right] =2\sum_k\mathrm{Re}\left[\frac{\partial \mathbf{y}^T}{\partial\boldsymbol{\theta}}(\mathbf{y}^{{\ast}}-\mathbf{t}^{{\ast}})\right],$$

where we applied $\partial \mathbf {e}/\partial \boldsymbol {\theta }=\partial \mathbf {y}/\partial \boldsymbol {\theta }$ because the training data $\mathbf {t}(k)$ are independent of the parameter set $\boldsymbol {\theta }$. Then, we finally obtain

(14)$$\boldsymbol{\theta}_{i+1}=\boldsymbol{\theta}_i-\eta\sum_k\mathrm{Re}\left[\frac{\partial \mathbf{y}^T}{\partial\boldsymbol{\theta}}(\mathbf{y}^{{\ast}}-\mathbf{t}^{{\ast}})\right].$$
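The update rule of Eq. (14) for a single real parameter can be illustrated with a hypothetical toy model $y_k=x_k e^{i\theta}$, for which $\partial y_k/\partial\theta=iy_k$. The model, the QPSK test symbols, and the step size are assumptions chosen only to show the complex-valued gradient at work; they are not part of the proposed LDBP.

```python
import cmath

def sgd_update(theta, y, dy, t, eta):
    """One step of Eq. (14) for a scalar real parameter theta.

    y, dy, t: flat lists of complex samples (concatenating both polarizations is
    equivalent to summing the 2-element vector product of Eq. (14) over k).
    """
    grad = sum((dyk * (yk - tk).conjugate()).real for yk, dyk, tk in zip(y, dy, t))
    return theta - eta * grad

# Hypothetical toy model y = x * exp(i*theta); the target is x rotated by 0.7 rad.
x = [cmath.exp(1j * cmath.pi / 4 * (2 * m + 1)) for m in range(4)] * 4  # 16 unit-power QPSK symbols
t = [xk * cmath.exp(0.7j) for xk in x]
theta, eta = 0.0, 0.5 / 16
for _ in range(60):
    y = [xk * cmath.exp(1j * theta) for xk in x]
    dy = [1j * yk for yk in y]  # dy/dtheta = i*y for this model
    theta = sgd_update(theta, y, dy, t, eta)
print(round(theta, 6))  # 0.7
```

For this model, the update reduces to $\theta\leftarrow\theta-\eta\sum_k|x_k|^2\sin(\theta-0.7)$, so the chosen $\eta\sum_k|x_k|^2=0.5$ roughly halves the phase error per iteration near convergence.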

The parameter set $\boldsymbol {\theta }$ can be updated using Eq. (14), and we require the gradient $\partial \mathbf {y}/\partial \theta$ for each parameter $\theta$. Using the chain rule of differentiation, we derive equations to calculate the gradients with respect to all parameters in a virtual transmission line.

As shown in Fig. 2, we define the input and output of a linear segment $\mathcal {L}_{p,n}^{(j)}$ as $x_{p,n}^{(j)}$ and $y_{p,n}^{(j)}$, respectively, which are related by Eq. (5) as follows:

(15)$$y_{p,n}^{(j)}=\mathcal{F}^{{-}1}\exp\left[i\left(D_2^{(j)}\omega^2+D_3^{(j)}\omega^3-T_n^{(j)}\omega\right)-\frac{\alpha}{2}h\right]\mathcal{F}x_{p,n}^{(j)},$$

where the distance $h$ is set to $h=h^{(1)}/2$ for $j=0$, $h=[h^{(j)}+h^{(j+1)}]/2$ for $1\leq j\leq m-1$, and $h=h^{(m)}/2$ for $j=m$; $h^{(j)}$ is the distance of the nonlinear segment $\mathcal {N}_{p,n}^{(j)}$, and $m$ is the total number of nonlinear segments in the virtual transmission line. In addition, the walkoff $T_n^{(j)}$ is given by $T_n^{(0)}=h^{(1)}d_n^{(1)}/2$, $T_n^{(j)}=[h^{(j)}d_n^{(j)}+h^{(j+1)}d_n^{(j+1)}]/2$ for $1\leq j\leq m-1$, and $T_n^{(m)}=h^{(m)}d_n^{(m)}/2$ using the distance $h^{(j)}$ and the walkoff parameter $d_n^{(j)}$ of $\mathcal {N}_{p,n}^{(j)}$. We optimize the second-order dispersion coefficient $D_2^{(j)}$ in $\mathcal {L}_{p,n}^{(j)}$ using the iterative learning scheme discussed above. To this end, we differentiate Eq. (15) directly with respect to $D_2^{(j)}$:

(16)$$\frac{\partial y_{p,n}^{(j)}}{\partial D_2^{(j)}}= \mathcal{F}^{{-}1}(i\omega^2)\exp\left[i\left(D_2^{(j)}\omega^2+D_3^{(j)}\omega^3-T_n^{(j)}\omega\right)-\frac{\alpha}{2}h\right]\mathcal{F}x_{p,n}^{(j) }.$$

Equation (16) shows that the gradient of the output $y_{p,n}^{(j)}$ of $\mathcal {L}_{p,n}^{(j)}$ with respect to $D_2^{(j)}$ can be calculated from the input $x_{p,n}^{(j)}$ of $\mathcal {L}_{p,n}^{(j)}$. Meanwhile, the gradient of $y_{p,n}^{(j)}$ with respect to any parameter $\varepsilon$ that appears in the segments preceding $\mathcal {L}_{p,n}^{(j)}$ is calculated as

(17)$$\frac{\partial y_{p,n}^{(j)}}{\partial\varepsilon}= \mathcal{F}^{{-}1}\exp\left[i\left(D_2^{(j)}\omega^2+D_3^{(j)}\omega^3-T_n^{(j)}\omega\right)-\frac{\alpha}{2}h\right] \mathcal{F}\frac{\partial x_{p,n}^{(j)}}{\partial\varepsilon}.$$

This means that the input gradient, $\partial x_{p,n}^{(j)}/\partial \varepsilon$, is transformed into the output gradient, $\partial y_{p,n}^{(j)}/\partial \varepsilon$, by the same operation as in Eq. (15). The gradient given by Eq. (16) is added to the set of gradients $\partial y_{p,n}^{(j)}/\partial \varepsilon$ and passed on to the next segment $\mathcal {N}_{p,n}^{(j+1)}$.
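Eqs. (16) and (17) can be checked numerically. In the sketch below, the gradient with respect to $D_2$ of a first linear segment is computed with Eq. (16), carried through a second linear segment with Eq. (17), and compared with a central finite difference. The segment parameters are hypothetical, and a naive DFT stands in for an FFT.

```python
import cmath
import math
import random

def dft(x, sign=-1):
    N = len(x)
    return [sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / N) for m in range(N))
            for k in range(N)]

def idft(X):
    return [v / len(X) for v in dft(X, sign=+1)]

def angular_freqs(N, dt):
    return [2 * math.pi * (k if 2 * k < N else k - N) / (N * dt) for k in range(N)]

def linear_op(x, dt, D2, D3, T, loss, weight=lambda w: 1.0):
    """Eq. (15) with loss = alpha*h/2; `weight` multiplies the transfer function.

    weight = i*w**2 turns Eq. (15) into Eq. (16), the derivative with respect to D2.
    """
    w = angular_freqs(len(x), dt)
    X = dft(x)
    return idft([Xk * weight(wk) * cmath.exp(1j * (D2 * wk**2 + D3 * wk**3 - T * wk) - loss)
                 for Xk, wk in zip(X, w)])

random.seed(0)
dt = 1.0
x = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(16)]
seg0 = dict(D2=0.3, D3=0.01, T=0.5, loss=0.1)   # hypothetical parameters of L^(0)
seg1 = dict(D2=-0.2, D3=0.0, T=0.0, loss=0.05)  # hypothetical parameters of L^(1)

grad0 = linear_op(x, dt, weight=lambda w: 1j * w**2, **seg0)  # Eq. (16) at L^(0)
grad1 = linear_op(grad0, dt, **seg1)                           # Eq. (17) through L^(1)

# Central finite difference of the two-segment output with respect to D2 of L^(0)
eps = 1e-6
plus = linear_op(linear_op(x, dt, **{**seg0, "D2": seg0["D2"] + eps}), dt, **seg1)
minus = linear_op(linear_op(x, dt, **{**seg0, "D2": seg0["D2"] - eps}), dt, **seg1)
fd = [(a - b) / (2 * eps) for a, b in zip(plus, minus)]
print(max(abs(a - b) for a, b in zip(grad1, fd)))  # small: finite-difference error only
```

Because a linear segment is linear in its input, the same operator propagates both the amplitude and every gradient, which is what makes the forward-mode bookkeeping inexpensive.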

In a nonlinear segment $\mathcal {N}_{p,n}^{(j)}$, we optimize the nonlinear coefficient $g^{(j)}$, XPolM coefficient $\delta ^{(j)}$, and walkoff parameters $d_n^{(j)}$. The input and output amplitudes of $\mathcal {N}_{p,n}^{(j)}$ are $y_{p,n}^{(j-1)}$ and $x_{p,n}^{(j)}$, respectively, and their relationship for the target channel is described by Eqs. (8)–(10) as follows:

(18)$$x_{p,0}^{(j)}=y_{p,0}^{(j-1)}\exp{\left[i\varphi_{p,0}^{(j)}\right]},$$

(19)$$\begin{aligned} \varphi_{p,0}^{(j)}&=g^{(j)}\mathcal{F}^{{-}1}\left[H_0^{(j)}\left\{\tilde{P}_{p,0}^{(j-1)}(\omega)+\delta^{(j)}\tilde{P}_{3-p,0}^{(j-1)}(\omega)\right\}\right.\\ &\left.+\sum_{n\neq0}{H_n^{(j)}(\omega)\left\{2\tilde{P}_{p,n}^{(j-1)}(\omega)+\delta^{(j)}\tilde{P}_{3-p,n}^{(j-1)}(\omega)\right\}}\right], \end{aligned}$$

(20)$$H_0^{(j)}=\frac{2}{\alpha}\sinh\left[\frac{\alpha}{2}h^{(j)}\right],$$

(21)$$H_n^{(j)}(\omega)=\frac{2}{\zeta_n^{(j)}}\sinh\left[\frac{\zeta_n^{(j)}}{2}h^{(j)}\right],$$

(22)$$\zeta_n^{(j)}(\omega)=\alpha+id_n^{(j)}\omega,$$

where $\tilde {P}_{p,n}^{(j-1)}(\omega )=\mathcal {F}[|y_{p,n}^{(j-1)}(t)|^2]$. We can calculate the gradients of the output $x_{p,n}^{(j)}$ with respect to the parameters to be optimized by differentiating Eq. (18),

(23)$$\begin{aligned} \frac{\partial x_{p,0}^{(j)}}{\partial g^{(j)}} &=i\mathcal{F}^{{-}1}\left[H_0^{(j)}\left\{\tilde{P}_{p,0}^{(j-1)}(\omega)+\delta^{(j)}\tilde{P}_{3-p,0}^{(j-1)}(\omega)\right\}\right.\\ &\left.+\sum_{n\neq0}{H_n^{(j)}(\omega)\left\{2\tilde{P}_{p,n}^{(j-1)}(\omega)+\delta^{(j)}\tilde{P}_{3-p,n}^{(j-1)}(\omega)\right\}}\right]x_{p,0}^{(j)}, \end{aligned}$$

(24)$$ \frac{\partial x_{p,0}^{(j)}}{\partial\delta^{(j)}} =ig^{(j)}\mathcal{F}^{{-}1}\left[H_0^{(j)}\tilde{P}_{3-p,0}^{(j-1)}(\omega)+\sum_{n\neq0}{H_n^{(j)}(\omega)\tilde{P}_{3-p,n}^{(j-1)}(\omega)}\right]x_{p,0}^{(j)}, $$

(25)$$ \frac{\partial x_{p,0}^{(j)}}{\partial d_n^{(j)}} =ig^{(j)}\mathcal{F}^{{-}1}\left[\frac{\partial H_n^{(j)}(\omega)}{\partial d_n^{(j)}}\left\{2\tilde{P}_{p,n}^{(j-1)}(\omega)+\delta^{(j)}\tilde{P}_{3-p,n}^{(j-1)}(\omega)\right\}\right]x_{p,0}^{(j)}. $$

Thus, in a nonlinear segment, $\mathcal {N}_{p,n}^{(j)}$, the gradient of the output $x_{p,n}^{(j)}$ with respect to its parameters can be calculated from the output itself and the input intensity spectra $\tilde {P}_{p,n}^{(j-1)}(\omega )$. In Eq. (25), $\partial H_n(\omega )/\partial d_n$ can be calculated using Eqs. (21) and (22) as follows:

(26)$$\frac{\partial H_n(\omega)}{\partial d_n}=i\frac{\omega}{\zeta_n^2}\left[\zeta_nh\cosh\left(\frac{\zeta_n}2h\right)-2\sinh\left(\frac{\zeta_n}2h\right)\right].$$
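A quick finite-difference check of Eq. (26) in Python follows. The numerical values (loss, step length, walkoff parameter, and angular frequency in km and ps units) are illustrative assumptions only.

```python
import cmath

def H_n(d_n, alpha, w, h):
    """Eq. (21): (2/zeta) sinh(zeta h/2) with zeta = alpha + i d_n w (Eq. (22))."""
    zeta = alpha + 1j * d_n * w
    return 2 / zeta * cmath.sinh(zeta * h / 2)

def dH_n_dd(d_n, alpha, w, h):
    """Eq. (26): (i w / zeta^2) [zeta h cosh(zeta h/2) - 2 sinh(zeta h/2)]."""
    zeta = alpha + 1j * d_n * w
    return 1j * w / zeta**2 * (zeta * h * cmath.cosh(zeta * h / 2) - 2 * cmath.sinh(zeta * h / 2))

# Illustrative values: alpha = 0.046 /km, h = 40 km, d_n = -6.8 ps/km, w = 0.1 rad/ps
args = dict(alpha=0.046, w=0.1, h=40.0)
d, eps = -6.8, 1e-6
analytic = dH_n_dd(d, **args)
numeric = (H_n(d + eps, **args) - H_n(d - eps, **args)) / (2 * eps)
print(abs(analytic - numeric))  # agreement up to finite-difference error
```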

The gradient of the output $x_{p,n}^{(j)}$ with respect to any parameter $\varepsilon$ that appears in the segments preceding $\mathcal {N}_{p,n}^{(j)}$ is obtained as follows:

(27)$$ \frac{\partial x_{p,0}^{(j)}}{\partial\varepsilon}=\left[i\frac{\partial\varphi_{p,0}^{(j)}}{\partial\varepsilon}y_{p,0}^{(j-1)}+\frac{\partial y_{p,0}^{(j-1)}}{\partial\varepsilon}\right]\exp{\left[i\varphi_{p,0}^{(j)}\right]}, $$

(28)$$\begin{aligned} \frac{\partial\varphi_{p,0}^{(j)}}{\partial\varepsilon} &=g^{(j)}\mathcal{F}^{{-}1}\left[H_0^{(j)}\left\{\frac{\partial\tilde{P}_{p,0}^{(j-1)}(\omega)}{\partial\varepsilon}+\delta^{(j)}\frac{\partial\tilde{P}_{3-p,0}^{(j-1)}}{\partial\varepsilon}\right\}\right.\\ & \left.+\sum_{n\neq0}{H_n^{(j)}(\omega)\left\{2\frac{\partial\tilde{P}_{p,n}^{(j-1)}}{\partial\varepsilon}+\delta^{(j)}\frac{\partial\tilde{P}_{3-p,n}^{(j-1)}}{\partial\varepsilon}\right\}}\right]. \end{aligned}$$

Equations (27) and (28) show that the input gradient $\partial y_{p,0}^{(j-1)}/\partial \varepsilon$ is transformed into the output gradient $\partial x_{p,0}^{(j)}/\partial \varepsilon$ by calculations similar to Eqs. (18)–(22). The gradients given by Eqs. (23)–(25) are added to the set of gradients $\partial x_{p,n}^{(j)}/\partial \varepsilon$ and passed on to the next segment, $\mathcal {L}_{p,n}^{(j)}$.

Although Eqs. (27) and (28) give the exact gradient $\partial x_{p,0}^{(j)}/\partial \varepsilon$ in a nonlinear segment $\mathcal {N}_{p,n}^{(j)}$, we introduce an approximation to simplify Eq. (28). Figure 3(a) shows a virtual transmission line for a DBP process for a two-channel WDM signal, and we consider calculating the gradient of the output $y_{1,0}$ with respect to $D_2^{(0)}$ defined in $\mathcal {L}_{p,0}^{(0)}$. The gradient $\partial y_{p,0}^{(0)}/\partial D_2^{(0)}$ is given by Eq. (16) and is passed on to the next nonlinear segments. The gradient is repeatedly transformed at each of the nonlinear and linear segments using Eqs. (27) and (17), respectively, until it reaches the last linear segment $\mathcal {L}_{p,0}^{(2)}$. The paths of the gradient from $\mathcal {L}_{p,0}^{(0)}$ to $\mathcal {N}_{p,1}^{(1)}$ and from $\mathcal {L}_{p,1}^{(1)}$ to $\mathcal {N}_{p,0}^{(2)}$ arise from XPM between the two channels; however, their influence on the final output gradient $\partial y_{1,0}/\partial D_2^{(0)}$ is negligible. That is, a slight variation of $D_2^{(0)}$ in $\mathcal {L}_{p,0}^{(0)}$ affects the amplitude of channel $n=1$ through XPM in $\mathcal {N}_{p,1}^{(1)}$, which in turn affects the amplitude of channel $n=0$ through XPM in $\mathcal {N}_{p,0}^{(2)}$; because the variation must pass through the XPM interaction twice, the resulting change in the target-channel amplitude is very small. Therefore, we neglect the gradient paths across channels and consider only the paths within a channel, as shown in Fig. 3(b). With this approximation, Eq. (28) simplifies to

(29)$$\begin{aligned} \frac{\partial\varphi_{p,0}^{(j)}}{\partial\varepsilon} &=g^{(j)}H_0^{(j)}\left\{\frac{\partial P_{p,0}^{(j-1)}(t)}{\partial\varepsilon}+\delta^{(j)}\frac{\partial P_{3-p,0}^{(j-1)}(t)}{\partial\varepsilon}\right\}\\ &=2g^{(j)}H_0^{(j)}\mathrm{Re}\left[\frac{\partial y_{p,0}^{(j-1)}}{\partial\varepsilon}\left\{y_{p,0}^{(j-1)}\right\}^\ast{+}\delta^{(j)}\frac{\partial y_{3-p,0}^{(j-1)}}{\partial\varepsilon}\left\{y_{3-p,0}^{(j-1)}\right\}^\ast\right]. \end{aligned}$$

Thus, in a nonlinear segment, the output gradient with respect to a previously appearing parameter, $\partial x_{p,0}^{(j)}/\partial \varepsilon$, can be calculated simply from the input amplitudes $y_{p,0}^{(j-1)}$ and $y_{3-p,0}^{(j-1)}$ and their gradients with respect to the same parameter via Eqs. (27) and (29).
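When no neighboring channels are present, Eq. (28) reduces exactly to Eq. (29), so the forward-mode rule of Eqs. (27) and (29) can be verified against a finite difference. In the sketch below, the input depends on a hypothetical parameter $\varepsilon$ as $y_p+\varepsilon v_p$, all numerical values are illustrative, and the polarization indices 1 and 2 of the text map to list indices 0 and 1.

```python
import cmath
import random

def nonlinear_target_only(y1, y2, g, H0, delta):
    """Eqs. (18)-(19) without neighbors: x_p = y_p exp(i g H0 (|y_p|^2 + delta |y_q|^2))."""
    phi1 = [g * H0 * (abs(a) ** 2 + delta * abs(b) ** 2) for a, b in zip(y1, y2)]
    phi2 = [g * H0 * (abs(b) ** 2 + delta * abs(a) ** 2) for a, b in zip(y1, y2)]
    return ([a * cmath.exp(1j * f) for a, f in zip(y1, phi1)],
            [b * cmath.exp(1j * f) for b, f in zip(y2, phi2)])

def propagate_gradient(y, dy, g, H0, delta):
    """Eqs. (27) and (29): carry the input gradients dy = (dy1, dy2) through the segment."""
    out = []
    for p in (0, 1):
        q = 1 - p
        grads = []
        for k in range(len(y[p])):
            dphi = 2 * g * H0 * ((dy[p][k] * y[p][k].conjugate()).real
                                 + delta * (dy[q][k] * y[q][k].conjugate()).real)
            phi = g * H0 * (abs(y[p][k]) ** 2 + delta * abs(y[q][k]) ** 2)
            grads.append((1j * dphi * y[p][k] + dy[p][k]) * cmath.exp(1j * phi))
        out.append(grads)
    return out

def rand_field(n):
    return [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]

random.seed(2)
y = (rand_field(8), rand_field(8))
v = (rand_field(8), rand_field(8))  # dy/d(epsilon), hypothetical
g, H0, delta, eps = -0.05, 1.5, 1.0, 1e-6

grad = propagate_gradient(y, v, g, H0, delta)
xp = nonlinear_target_only(*[[a + eps * b for a, b in zip(y[p], v[p])] for p in (0, 1)], g, H0, delta)
xm = nonlinear_target_only(*[[a - eps * b for a, b in zip(y[p], v[p])] for p in (0, 1)], g, H0, delta)
fd = [[(a - b) / (2 * eps) for a, b in zip(xp[p], xm[p])] for p in (0, 1)]
```

With neighboring channels present, the same code would implement the approximation of Fig. 3(b), since it ignores the gradient paths that cross channels.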

Fig. 3. (a) Paths of gradient with respect to $D_2^{(0)}$ originating from $\mathcal {L}_{1,0}^{(0)}$ in a virtual transmission line for 2-channel WDM signal. (b) Simplified paths of the gradient.

To summarize a step of the iterative learning process for the parameter set $\boldsymbol {\theta }$ discussed above, at each segment, we calculate the output amplitude and its gradient with respect to the parameters therein and those that appeared in the previous segments, and then we hand them over to the next segment. By repeating such calculations for alternating linear and nonlinear segments, we finally obtain the output amplitude of the target channel after an entire virtual transmission line, $y_{p,0}$, and its gradient with respect to all parameters that are to be optimized, $\partial y_{p,0}/\partial \theta$. We then update the parameter set $\boldsymbol {\theta }$ using Eq. (14). The iterative updating process eventually yields a converged parameter set, and a DBP process using the optimized parameters should effectively compensate for the waveform distortion caused by the nonlinear phase shift owing to SPM and XPM.

3. Experimental verification

We verified the performance of the proposed LDBP technique using a recirculation-loop transmission experiment, in which we transmitted 50-GHz spaced, 11-channel WDM, 32-Gbaud, DP-16QAM or DP-PS-64QAM signals over distances of 6, 10, and 16 spans of approximately 80-km SSMF links.

We focus on compensating for waveform distortion caused by GVD, SPM, and the nonlinear phase shift originating from XPM and XPolM. We do not consider polarization-mode dispersion (PMD), which can affect the performance of DBPs [6,22,23]. Because we transmit WDM signals with high-order modulation formats such as PS-64QAM, the transmission distance is limited by optical noise accumulation and nonlinear waveform distortion, and PMD is not significant under the present experimental conditions. However, if the channel count and total bandwidth, or the symbol rate of each channel, are further increased, the impact of random fiber birefringence will need to be addressed.

In a preliminary study based on numerical simulations, we observed an important characteristic of the learning process and applicability of the proposed LDBP technique: once the parameter set of a virtual transmission line for a particular distance is optimized using a signal with a modulation format of a particular complexity and a relatively large optical launch power, the DBP operation with the same parameters can be applied to signals received after the same transmission line under other conditions, namely, with any modulation format and with launch powers smaller than that used in the learning process. For example, when the waveform of a uniform 16QAM signal with a fixed launch power of $+3$ dBm/channel is used as the training data, the DBP operation with the optimized parameters is also applicable to signals with more complex modulation formats, such as PS-64QAM, and with launch powers below $+3$ dBm/channel. Note that QPSK signals are not appropriate as training data: they have a constant intensity, so the error signal varies little and the learning process does not progress effectively. Based on these findings, we implemented a learning process for the LDBP for each transmission distance using the received waveforms of the DP-16QAM signals with a fixed launch power of $+3$ dBm/channel as the training data. We then applied the LDBP with the optimized parameters to the received waveforms of the DP-PS-64QAM signals with launch powers ranging from $-4$ to $+2$ dBm/channel and evaluated its performance in compensating for the nonlinear waveform distortion. Note also that the proposed LDBP, once trained, operates independently of the signal format. Although we examined the LDBP performance using PS-64QAM signals in the experiment, uniform 64QAM or higher-order QAM signals could also be examined as long as they are correctly demodulated in the receiver.
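The remark about QPSK can be made concrete: the nonlinear phase in each segment is driven by the signal intensity, and a constant-modulus format offers no symbol-to-symbol intensity variation. The short sketch below, an illustration rather than part of the learning procedure, compares the normalized intensity variance of ideal, equiprobable QPSK and uniform 16QAM constellations; `normalized_intensity_variance` is a hypothetical helper.

```python
import itertools
import numpy as np

def normalized_intensity_variance(points):
    """Variance of |x|^2 / E[|x|^2] over equiprobable constellation points."""
    pts = np.asarray(points)
    p = pts.real ** 2 + pts.imag ** 2      # exact for integer-valued grids
    p = p / p.mean()
    return p.var()

qpsk = [complex(i, q) for i, q in itertools.product((-1, 1), repeat=2)]
qam16 = [complex(i, q) for i, q in itertools.product((-3, -1, 1, 3), repeat=2)]

print(normalized_intensity_variance(qpsk))    # 0.0: constant intensity
print(normalized_intensity_variance(qam16))   # approximately 0.32
```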

As mentioned above, we attempted to compensate for the nonlinear waveform distortion of only the target channel. Therefore, in both the learning process using DP-16QAM signals and the evaluation process using DP-PS-64QAM signals, we calculated the linear segments of the virtual transmission line for all channels and the nonlinear segments only for the target channel. That is, the distortion of the neighboring channels was not compensated for; however, its influence on the target channel through XPM was negligible, as discussed in the derivation of Eq. (29).

3.1 Setup for transmission experiment

Figure 4 depicts the setup for the recirculation-loop transmission experiment. In the transmitter, eleven tunable laser sources (TLSs) emitted continuous waves at optical frequencies from 192.85 to 193.35 THz with 50-GHz spacing, to which the channel indices $-5, -4, \ldots, +5$ were assigned. The waves were combined by a $16\times 1$ polarization-maintaining (PM) coupler, amplified by a PM erbium-doped fiber amplifier (EDFA), and modulated by a lithium-niobate DP IQ modulator (IQM) driven by a four-channel electrical signal from an arbitrary waveform generator (AWG) with a sampling rate of 64 GSa/s. The modulated optical signal had a symbol rate of 32 Gbaud, a waveform shaped by a root-raised-cosine (RRC) filter with a roll-off factor of 0.1, and a modulation format of either uniform 16QAM or PS-64QAM, with information rates of 4 and 5 bits/symbol/polarization, respectively. For the PS-64QAM signal, we applied the distribution $p(x)\propto \exp (-\lambda x^2)$ with $\lambda =0.0638$ to the amplitudes $x=\pm 1, \pm 3, \pm 5, \pm 7$ of each of the in-phase and quadrature components, and we used a constant composition distribution matcher (CCDM) [19] to encode the input bit patterns into the shaped symbols. Each signal was periodic, with a period of 65,536 symbols per polarization: 88 fixed header symbols modulated by a $2^{11}-1$ pseudo-random bit sequence (PRBS) for symbol synchronization, 64,800 symbols modulated by a random bit sequence generated from a random seed, and 648 constant-amplitude dummy symbols, one for every 100 symbols of the 64,800-symbol set. For each modulation format, we generated four waveform patterns using different random seeds. Note that all eleven channels of the generated WDM signal had the same waveform at the transmitter, so one might be concerned that only SPM would occur during transmission.
However, different channels experienced walkoff owing to GVD after transmission over multiple SSMF spans. For example, the walkoff between neighboring channels separated by 50 GHz (approximately 0.4 nm) after one 80-km SSMF span with a GVD of 17 ps/nm/km is approximately $17\times 80\times 0.4\approx 544$ ps, corresponding to 17.4 symbols at 32 Gbaud. Therefore, XPM between channels with mutually uncorrelated waveforms in the time domain can be expected after transmission over more than one span.
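Several numbers in the transmitter description can be cross-checked with a few lines of plain Python. The sketch verifies the frame composition, computes the entropy of the shaped 8-level amplitude distribution $p(x)\propto\exp(-\lambda x^2)$, and reproduces the walkoff estimate. Two assumptions are made: the entropy is only an upper bound on the rate a finite-length CCDM actually achieves, and the 50-GHz spacing is converted to 0.4 nm, the usual approximation near 1550 nm.

```python
import math

# Frame composition per polarization (values from the text)
header, payload, dummy = 88, 64_800, 648
assert header + payload + dummy == 65_536
assert payload // 100 == dummy              # one dummy symbol per 100 payload symbols

# Entropy of the PS-64QAM amplitude distribution per real dimension
lam = 0.0638
levels = [-7, -5, -3, -1, 1, 3, 5, 7]
weights = [math.exp(-lam * x * x) for x in levels]
total = sum(weights)
probs = [w / total for w in weights]
h_dim = -sum(p * math.log2(p) for p in probs)
h_pol = 2 * h_dim                           # in-phase + quadrature of one polarization
print(f"entropy: {h_pol:.3f} bits/symbol/polarization")

# Walkoff between neighbours 50 GHz apart after one 80-km span
D_ps_nm_km, L_km, dlambda_nm, baud = 17.0, 80.0, 0.4, 32e9
walkoff_ps = D_ps_nm_km * L_km * dlambda_nm
walkoff_symbols = walkoff_ps * 1e-12 * baud
print(f"walkoff: {walkoff_ps:.0f} ps = {walkoff_symbols:.1f} symbols")
```

The entropy comes out at about 5.00 bits/symbol/polarization, consistent with the stated information rate of the PS-64QAM signal.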

Fig. 4. Experimental setup.


The 11-channel WDM signal from the transmitter was amplified by an EDFA, followed by a variable optical attenuator (VOA) to adjust the optical power and an optical band-pass filter (BPF) to remove out-of-band optical noise. Two synchronized acousto-optic modulators (AOMs) and a 3-dB optical coupler were used to switch the optical signal into and out of the recirculation loop. The loop contained two spans, each consisting of an EDFA, a BPF, a VOA, and an SSMF. The fiber lengths in the first and second spans of the loop were 84.1 and 80.5 km, respectively, intentionally set to different values. A polarization scrambler synchronized to the switching operation of the AOMs randomly rotated the signal’s polarization in every loop.

At the receiver, a tunable BPF with a bandwidth of 50 GHz extracted one of the 11 channels of the WDM signal. The extracted channel was amplified to $+3$ dBm, regardless of the transmission conditions, by an EDFA operating in automatic power control mode, and was detected by a coherent receiver consisting of a TLS as the local oscillator (LO), an optical front end for coherent detection, and a real-time oscilloscope (OSC) with a bandwidth of 33 GHz and a sampling rate of 80 GSa/s. We captured the received waveform for a duration of 12.5 $\mu$s, so each recorded file contained 400,000 symbols. Offline DSP was then performed to demodulate the received waveforms. Note that each channel of the WDM signal was received individually and asynchronously after the LO frequency was tuned to that channel. Therefore, to operate the proposed LDBP while taking the propagation of multiple channels into account, we had to synchronize the timing of all channels in the offline DSP as if they had been captured simultaneously by multiple receivers.
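Because each channel was captured separately, the channels must be brought to a common time reference. One common way to do this with a known header is circular cross-correlation, which suits periodic frames. The sketch below is a generic illustration, not the authors' DSP: it uses a shortened 4,096-symbol frame, one sample per symbol, and a random QPSK header as a stand-in for the PRBS-modulated header; `find_header_offset` is a hypothetical helper. Taking the magnitude of the correlation makes the peak insensitive to a residual carrier phase.

```python
import numpy as np

def find_header_offset(rx, header):
    """Index at which the known header starts in a periodic received frame,
    taken from the peak of the FFT-based circular cross-correlation."""
    ref = np.zeros(len(rx), dtype=complex)
    ref[: len(header)] = header
    corr = np.fft.ifft(np.fft.fft(rx) * np.conj(np.fft.fft(ref)))
    return int(np.argmax(np.abs(corr)))

rng = np.random.default_rng(2)
qpsk = lambda n: (rng.choice([-1, 1], n) + 1j * rng.choice([-1, 1], n)) / np.sqrt(2)

frame_len, header_len = 4096, 88
header = qpsk(header_len)                                   # stand-in for the PRBS header
tx = np.concatenate([header, qpsk(frame_len - header_len)])

true_shift = 1234
rx = np.roll(tx, true_shift) * np.exp(1j * 0.7)             # unknown delay and carrier phase
rx += 0.1 * (rng.normal(size=frame_len) + 1j * rng.normal(size=frame_len))

offset = find_header_offset(rx, header)
aligned = np.roll(rx, -offset)                              # header now starts at index 0
print(offset)
```

Applying the same procedure to every channel puts all of them on the transmitter's symbol grid, after which their relative timing can be manipulated deliberately.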

The demodulation process for each received channel was implemented offline in the following order, as depicted in the upper row of Fig. 5: fixed GVD compensation, RRC filtering as the matched filter, polarization alignment and demultiplexing [24], resampling to two samples per symbol, clock extraction and retiming, carrier frequency and phase recovery together with 3-tap $2\times 2$ butterfly adaptive equalization (AEQ) adjusted by the decision-directed least-mean-square (DD-LMS) algorithm, symbol decision, decoding to a bit sequence, and evaluation of the signal quality. The sampling rate of two samples per symbol was applied to all waveforms in all demodulation and LDBP processes. To recover the carrier frequency and phase, they were first coarsely estimated using the dummy symbols, and then a decision-directed blind process for fine tracking was performed based on block estimation using a Kalman filter [25]. The 3-tap AEQ was effective in compensating for the accumulated polarization crosstalk induced by fiber birefringence or by XPM between the orthogonal polarization components of the DP signal [21]. The reference amplitudes used in the decision-directed processes of the carrier recovery and AEQ were adaptively adjusted [26], and the obtained reference amplitudes were also used for the final symbol decision.
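A 3-tap $2\times 2$ butterfly equalizer adapted by DD-LMS can be sketched as follows. This is a textbook form, not the authors' implementation: it runs at one sample per symbol instead of two, makes plain QPSK decisions instead of using adaptively adjusted reference amplitudes, and omits the joint carrier recovery. The synthetic test applies a fixed polarization rotation (a hypothetical 0.4 rad) that the equalizer has to undo; the function names are hypothetical.

```python
import numpy as np

def qpsk_decide(y):
    return (np.sign(y.real) + 1j * np.sign(y.imag)) / np.sqrt(2)

def dd_lms_butterfly(x, n_taps=3, mu=1e-2):
    """x: (2, N) array, one sample per symbol. Returns equalized (2, N) output and taps."""
    half = n_taps // 2
    w = np.zeros((2, 2, n_taps), dtype=complex)
    w[0, 0, half] = w[1, 1, half] = 1.0                  # start from identity (centre tap)
    xp = np.pad(x, ((0, 0), (half, half)))
    y = np.empty_like(x)
    for k in range(x.shape[1]):
        window = xp[:, k : k + n_taps]                   # (2, n_taps) input samples
        out = np.einsum("pqm,qm->p", w, window)          # butterfly: each output sees both inputs
        err = qpsk_decide(out) - out                     # decision-directed error
        w += mu * err[:, None, None] * np.conj(window)[None, :, :]   # LMS tap update
        y[:, k] = out
    return y, w

rng = np.random.default_rng(3)
n_sym, theta = 5000, 0.4
s = qpsk_decide(rng.normal(size=(2, n_sym)) + 1j * rng.normal(size=(2, n_sym)))
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
x = R @ s + 0.02 * (rng.normal(size=(2, n_sym)) + 1j * rng.normal(size=(2, n_sym)))

y, w = dd_lms_butterfly(x)
tail = slice(-1000, None)
ser = np.mean(qpsk_decide(y[:, tail]) != s[:, tail])
mse = np.mean(np.abs(y[:, tail] - s[:, tail]) ** 2)
print(ser, mse)
```

After convergence the centre taps approximate the inverse rotation $R^{\mathsf T}$. With so few taps the filter mainly undoes polarization mixing and barely shapes the spectrum.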

Fig. 5. Full demodulation process for evaluation (upper row) and partial process for back propagation (lower).


The fully demodulated waveform used for the signal-quality evaluation was unsuitable as the input to the DBP process because the RRC filter changed the spectral shape of the received signal, which can affect the compensation performance. Therefore, we obtained a demodulated waveform without RRC filtering, in which the polarization rotation, clock timing, and carrier phase were recovered using information retrieved from the full demodulation process, as shown in the lower row of Fig. 5. Notably, the tap coefficients of the 3-tap AEQ from the full demodulation process were also applied in the fixed equalization; an equalizer with so few taps compensated only for polarization crosstalk and did not significantly reshape the signal’s spectrum. Finally, we collected the waveforms of all received channels and aligned their symbol timing to that at the transmitter using the fixed-pattern header symbols. We then reimposed the total accumulated GVD of the transmission line on the waveforms so that the walkoff between channels observed at the receiver was reproduced. The resulting waveforms of all channels were used as the input to the virtual transmission line of the LDBP. These waveforms and the ideal waveform defined at the transmitter were consolidated into a dataset for the learning process. The waveforms were sampled at two samples per symbol.
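Reimposing accumulated GVD on a baseband channel amounts to an all-pass phase filter in the frequency domain. In the sketch below, each channel's phase is evaluated at its offset angular frequency $\omega+\omega_n$, so a channel 50 GHz away acquires a group delay of magnitude $|\beta_2\omega_n L|$ relative to the target channel. The sign of the phase depends on the FFT and field conventions, so only the magnitude of the delay is checked. `apply_gvd` and the Gaussian test pulse are hypothetical; $\beta_2=-D\lambda^2/(2\pi c)$ is the standard conversion from the dispersion parameter.

```python
import numpy as np

C_NM_PER_PS = 299_792.458                          # speed of light in nm/ps

def beta2_ps2_per_km(D_ps_nm_km, lambda_nm):
    """Standard conversion beta2 = -D * lambda^2 / (2*pi*c)."""
    return -D_ps_nm_km * lambda_nm ** 2 / (2 * np.pi * C_NM_PER_PS)

def apply_gvd(field, dt_ps, beta2, length_km, omega_offset=0.0):
    """Apply exp(i*beta2*(w + w_n)^2*L/2) to a baseband field sampled every dt_ps."""
    omega = 2 * np.pi * np.fft.fftfreq(field.size, d=dt_ps)     # rad/ps
    phase = 0.5 * beta2 * (omega + omega_offset) ** 2 * length_km
    return np.fft.ifft(np.fft.fft(field) * np.exp(1j * phase))

lambda_nm = C_NM_PER_PS / 193.1                    # 193.1 THz -> about 1552.5 nm
b2 = beta2_ps2_per_km(17.0, lambda_nm)
dt, n = 0.5, 8192                                  # 0.5-ps grid, 4096-ps window
t = (np.arange(n) - n // 2) * dt
pulse = np.exp(-(t / 50.0) ** 2 / 2)               # Gaussian test pulse, T0 = 50 ps

w_n = 2 * np.pi * 0.05                             # neighbour at +50 GHz, rad/ps
out = apply_gvd(pulse, dt, b2, 80.0, omega_offset=w_n)

def centroid(a):
    p = np.abs(a) ** 2
    return np.sum(t * p) / np.sum(p)

shift = centroid(out) - centroid(pulse)
print(f"beta2 = {b2:.2f} ps^2/km, delay = {shift:.1f} ps, "
      f"expected magnitude {abs(b2 * w_n * 80):.1f} ps")
```

The delay comes out at about 547 ps, consistent with the 544-ps estimate given earlier, which rounds the channel spacing to 0.4 nm.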

The two SSMFs in the recirculation loop exhibited slightly different dispersion characteristics. We measured the dispersion of each fiber by disconnecting it from the loop, transmitting optical signals through it, and analyzing them with the offline DSP. Using one SSMF at a time, we transmitted 32-Gbaud single-channel DP-QPSK signals with carrier frequencies similar to those of the 11-channel WDM signal and demodulated the received waveforms with the offline DSP while varying the accumulated GVD value in the fixed dispersion compensation stage. We identified the value that provided the maximum signal quality. As a result, the 84.1-km SSMF in the first span of the loop had a dispersion of 17.14 ps/nm/km and a slope of 0.062 ps/nm$^2$/km at 193.1 THz, and the 80.5-km SSMF in the second span had a dispersion and slope of 16.55 ps/nm/km and 0.058 ps/nm$^2$/km, respectively. For a single loop with a total length of 164.6 km, the mean dispersion and slope were 16.85 ps/nm/km and 0.060 ps/nm$^2$/km, respectively. The difference between the dispersion characteristics of the two fibers is important for compensating for nonlinear waveform distortion, especially when XPM is considered, because the walkoff between channels depends on the local dispersion and must be configured appropriately for an accurate calculation of XPM. We show below that the proposed LDBP can correctly learn the different walkoff parameters of the two SSMFs.
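The loop-averaged values quoted above agree with the length-weighted means of the two fibers' parameters. The short calculation below reproduces them, assuming simple length weighting.

```python
# Measured parameters of the two SSMFs in the loop (values from the text)
spans = [
    {"length_km": 84.1, "D": 17.14, "S": 0.062},   # first span
    {"length_km": 80.5, "D": 16.55, "S": 0.058},   # second span
]
total_km = sum(sp["length_km"] for sp in spans)
mean_D = sum(sp["length_km"] * sp["D"] for sp in spans) / total_km
mean_S = sum(sp["length_km"] * sp["S"] for sp in spans) / total_km
accumulated_D = mean_D * total_km                   # ps/nm per loop

print(f"loop length {total_km:.1f} km, mean D {mean_D:.2f} ps/nm/km, "
      f"mean slope {mean_S:.3f} ps/nm^2/km, {accumulated_D:.0f} ps/nm per loop")
```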

3.2 Learning progress of LDBP for 11-channel DP-16QAM signal

We present the experimentally obtained learning progress of the proposed LDBP in detail. For each transmission distance, we trained LDBPs with and without the structure that considers XPM between different channels as a nonlinear phase shift. We employed a virtual transmission line with a one-step/span configuration, in which the numbers of linear and nonlinear segments were $S+1$ and $S$, respectively, where $S$ is the total number of spans. We optimized the dispersion coefficient $D_2^{(j)}$ in the linear segments $\mathcal {L}^{(j)}$ for $0\leq j\leq S$, and the nonlinear coefficient $g^{(j)}$, XPolM coefficient $\delta ^{(j)}$, and walkoff parameters $d_n^{(j)}$ for the neighboring channels with indices $n$ in the nonlinear segments $\mathcal {N}^{(j)}$ for $1\leq j\leq S$. We ignored the intra-channel third-order dispersion effect and the associated terms, including $D_3^{(j)}$ in Eqs. (15), (16), and (17). However, we accounted for the dispersion slope in the walkoff between different channels by setting the initial values of the walkoff parameters to $d_n^{(j)}=\beta _2\omega _n+\beta _3\omega _n^2/2$, where $\beta _2$ and $\beta _3$ were calculated from the measured mean dispersion characteristics of the real transmission line.

We calculated the signal propagation through the virtual transmission line using dimensionless parameters in a normalized space, with $t_0=1$ [ps], $z_0=1$ [km], and $P_0=1$ [mW] as the constants for normalizing time, distance, and optical power, respectively. Before starting the learning process, we determined the initial values of the parameters of the virtual transmission line according to a basic SSFM. The initial values of $D_2^{(j)}$ in the first and last linear segments were set to $-443.7$ and those in the intermediate segments to $-887.5$. The loss coefficient $\alpha$ was fixed at $0.0442$. All nonlinear segments had the following common initial values: distance $h^{(j)}=82.3$, nonlinear coefficient $g^{(j)}=-9.78\times 10^{-4}$, XPolM coefficient $\delta ^{(j)}=1$, and walkoff parameters $d_n^{(j)}=-6.78n$ for channel index $n$. To determine these initial values, we assumed the following SSMF parameters: a span length of 82.3 km, a dispersion of 16.85 ps/nm/km, a dispersion slope of 0.060 ps/nm$^2$/km, a nonlinear coefficient $\gamma$ of 1.1 rad/W/km, and a propagation loss of 0.192 dB/km. Figure 6 shows a schematic of the initial one-step/span virtual transmission line for a six-span real transmission line as an example. The accumulated dispersion $D_2^{(j)}$ of the first and last linear segments, $j=0$ and $6$, was initially set to half of that of the intermediate segments, $1\leq j\leq 5$; therefore, $\mathcal {L}^{(0)}$ and $\mathcal {L}^{(6)}$ are drawn smaller than the other linear segments in Fig. 6. In contrast, each nonlinear segment $\mathcal {N}^{(j)}$ approximately matched an individual span of the real transmission line, although its distance was initially set to the mean length of the two SSMFs.
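The initial values listed above can be related to the stated SSMF parameters through a few conversions, sketched below in the normalized units ($t_0=1$ ps, $z_0=1$ km, $P_0=1$ mW) with the reference wavelength taken at 193.1 THz. The conversions for $\alpha$, $\beta_2$, and $\beta_3$ are standard. The relations $D_2\approx\beta_2h/2$ for an intermediate linear segment and $g\approx-(8/9)\gamma$ (8/9 being the usual Manakov polarization-averaging factor) are not stated in the text; they are inferred here from the quoted numbers and are labeled as assumptions in the code.

```python
import math

C_NM_PER_PS = 299_792.458
lam = C_NM_PER_PS / 193.1                 # reference wavelength in nm (193.1 THz)
D, S = 16.85, 0.060                       # ps/nm/km, ps/nm^2/km (loop means)
h, gamma_per_mW_km, loss_dB_km = 82.3, 1.1e-3, 0.192

# Standard conversions
alpha = loss_dB_km * math.log(10) / 10                    # 1/km, power loss coefficient
k = lam ** 2 / (2 * math.pi * C_NM_PER_PS)                # nm*ps
beta2 = -D * k                                            # ps^2/km
beta3 = k ** 2 * (S + 2 * D / lam)                        # ps^3/km

# ASSUMED relations, inferred from the quoted values (not stated in the text)
D2_intermediate = beta2 * h / 2
g_init = -(8 / 9) * gamma_per_mW_km

def walkoff(n, spacing_THz=0.05):
    """d_n = beta2*w_n + beta3*w_n^2/2 in ps/km for channel index n."""
    w = 2 * math.pi * spacing_THz * n                     # rad/ps
    return beta2 * w + beta3 * w ** 2 / 2

print(f"alpha={alpha:.4f}, D2={D2_intermediate:.1f}, D2/2={D2_intermediate / 2:.1f}, "
      f"g={g_init:.3e}, d_1={walkoff(1):.3f}, d_-1={walkoff(-1):.3f}")
```

With these inputs, the sketch gives $\alpha\approx0.0442$, $D_2\approx-887.2$ and $-443.6$, $g\approx-9.78\times10^{-4}$, and $d_{\pm1}$ within about 0.02 of $\mp6.78$. The slope term shifts $d_{\pm1}$ by less than 0.01.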

Fig. 6. Schematic of 6-span real transmission line and corresponding 1-step/span virtual transmission line at the beginning of the learning process.


Four datasets of 11-channel DP-16QAM signals were prepared in advance and used repeatedly to update the parameters of the virtual transmission line in an iterative learning process. In each step of the iterative process, we chose one of the four datasets. The waveforms of all channels of the received signal were loaded from the dataset and set as the input to the virtual transmission line. We calculated the evolution of the signal in the alternating linear and nonlinear segments of the virtual transmission line using Eq. (15) and Eqs. (18)–(22), respectively. Notably, the linear segments were calculated for all channels, whereas the nonlinear segments were calculated only for the target channel. In addition to the signal evolution, we calculated the gradient of the waveform of the target channel with respect to the parameters to be optimized using Eqs. (16) and (17) for the linear segments and Eqs. (23)–(27) and (29) for the nonlinear segments. Finally, we obtained the output waveform of the target channel after the entire virtual transmission line and its gradient with respect to all parameters of the line to be optimized. We then randomly chose 1,024 symbols from the output waveform, which contained 65,536 symbols, and calculated the gradient of the error function with respect to the parameters, $\boldsymbol {\nabla } J$, using Eq. (13). Randomly choosing 1,024 of the 65,536 symbols in a dataset was effective in avoiding over-fitting despite the repeated use of only a few datasets. Instead of the simple updating formula in Eq. (12), we used the AdaBelief algorithm [27] for faster convergence. With $\mu _0=0$ and $\nu _0=0$, the formula to update the parameter $\theta$ at the $i$-th step of the iterative process is given by

(30)$$\theta_i=\theta_{i-1}-\eta\frac{\mu_i}{\sqrt{\nu_i}+e},$$

where $\mu _i=b_1\mu _{i-1}+(1-b_1)\nabla J_i$, $\nu _i=b_2\nu _{i-1}+(1-b_2)\left (\nabla J_i-\mu _i\right )^2$, $\nabla J_i$ is the gradient of the error function $J$ with respect to the parameter $\theta$ at the $i$-th iteration given by Eq. (13), and $b_1=0.9$, $b_2=0.999$, and $e=10^{-8}$ are constants. The parameter $\eta$ in Eq. (30) controls the learning rate. In this experiment, we used $\eta =1.0$, $1.0\times 10^{-7}$, $7.0\times 10^{-5}$, and $3.0\times 10^{-4}$ to update $D_2$, $g$, $\delta$, and $d_n$, respectively. These $\eta$ values differ in magnitude because the parameters and their gradients have different scales. By cyclically using the four datasets, we iterated the update step up to $10^5$ times and obtained an optimized parameter set of the virtual transmission line for each real line with 6, 10, and 16 spans.

We focus on the learning progress for the six-span transmission line as an example and analyze how the parameters were optimized; the learning progress for the other span numbers showed a similar tendency. Figure 7 presents the MSE of the signal against the number of iterations in the learning process. The MSE was calculated between the amplitude after the virtual transmission line and that at the transmitter. We compared the case that considered both SPM and XPM for the nonlinear phase shift with the case of SPM only. Figure 7 also shows the moving average of the rapidly oscillating MSE. The MSEs decreased immediately after the learning process started and converged after approximately 10,000 iterations in both cases. The initial MSE exceeded $0.01$, and the moving average finally decreased to $8.2\times 10^{-3}$ and $9.5\times 10^{-3}$ in Figs. 7(a) and (b), respectively.
Thus, accounting for XPM in addition to SPM in the nonlinear phase shift lowered the converged MSE of the WDM signal after long-haul transmission by approximately 14%, from $9.5\times 10^{-3}$ to $8.2\times 10^{-3}$.
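A compact version of the update in Eq. (30) is sketched below. It writes the moment recursions in the standard AdaBelief form without bias correction and uses a separate $\eta$ per parameter, as in the experiment. The class name and the quadratic toy objective are hypothetical; the objective only mimics parameters of very different scales and is not the LDBP error function.

```python
import numpy as np

class AdaBelief:
    """Element-wise update theta_i = theta_{i-1} - eta * mu_i / (sqrt(nu_i) + e)."""

    def __init__(self, eta, b1=0.9, b2=0.999, e=1e-8):
        self.eta = np.asarray(eta, dtype=float)   # one learning rate per parameter
        self.b1, self.b2, self.e = b1, b2, e
        self.mu = np.zeros_like(self.eta)         # mu_0 = 0
        self.nu = np.zeros_like(self.eta)         # nu_0 = 0

    def step(self, theta, grad):
        self.mu = self.b1 * self.mu + (1 - self.b1) * grad
        self.nu = self.b2 * self.nu + (1 - self.b2) * (grad - self.mu) ** 2
        return theta - self.eta * self.mu / (np.sqrt(self.nu) + self.e)

# Toy objective J = sum(scale * (theta - target)^2) with very different parameter scales
target = np.array([-887.5, -9.78e-4])
scale = np.array([1e-4, 1e4])
theta = np.zeros(2)
opt = AdaBelief(eta=[1.0, 1e-6])                  # per-parameter learning rates
for i in range(3000):
    grad = 2 * scale * (theta - target)
    theta = opt.step(theta, grad)
print(theta)
```

Because the step is proportional to $\eta\,\mu_i/\sqrt{\nu_i}$, its size is set mainly by $\eta$ rather than by the raw gradient magnitude, which is why $\eta$ is chosen per parameter group according to each parameter's scale.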

Fig. 7. MSE and its moving average against the iteration number of the learning process when considering (a) both SPM and XPM and (b) only SPM for a nonlinear phase shift.


Figures 8–10 show the progress of the parameters in the iterative learning process when both SPM and XPM were considered for the nonlinear phase shift. All parameters converged in fewer than 50,000 iterations and remained stable over further iterations. Figure 8 depicts the progress of the dispersion parameter $D_2^{(j)}$ in $\mathcal {L}^{(j)}$ for $0\leq j\leq 6$: $D_2^{(0)}$ and $D_2^{(6)}$ changed significantly from their initial values, whereas $D_2^{(j)}$ for $1\leq j\leq 5$ remained close to their initial values with slight modifications. Specifically, $D_2^{(0)}$ of the first linear segment approached those of the intermediate segments, whereas $D_2^{(6)}$ of the last segment approached 0. That is, the first six linear segments from the receiver side dominated the GVD compensation, and the last linear segment $\mathcal {L}^{(6)}$, closest to the transmitter, served to fine-tune the residual dispersion of the signal. The converged parameters clearly differ from a typical SSFM setting, in which the first and last linear segments are assigned half of the accumulated dispersion of the first and last steps, respectively, as shown in Fig. 6. This is a consequence of the learning process minimizing the MSE of the signal for a virtual transmission line with the one-step/span configuration. Notably, the converged values in this case were not necessarily identical to the physical values of the real transmission line. Therefore, the LDBP considered herein does not necessarily provide the best result for accurately monitoring the parameters of a transmission line, as reported in Refs. [28,29].

Fig. 8. Learning progress of dispersion coefficient $D_2^{(j)}$ in linear segments $\mathcal {L}^{(j)}$.


Fig. 9. Learning progress of (a) nonlinear coefficient $g^{(j)}$ and (b) XPolM coefficient $\delta ^{(j)}$ in nonlinear segments $\mathcal {N}^{(j)}$.


Fig. 10. Learning progress of walkoff parameters $d_n^{(j)}$ in nonlinear segments $\mathcal {N}^{(j)}$ for (a) all channels except for the target channel $n=0$, and (b) the channel with $n=-5$.


Figure 9 shows the learning progress of the nonlinear coefficient $g^{(j)}$ and the XPolM coefficient $\delta ^{(j)}$ in $\mathcal {N}^{(j)}$ for $1\leq j\leq 6$. After convergence, only the nonlinear coefficient of the last segment, $g^{(6)}$, had a larger absolute value than its initial value. Thus, in contrast to the result for $D_2^{(j)}$, the nonlinear phase shift was dominant in the nonlinear segment closest to the transmitter. The progress of the XPolM coefficient $\delta ^{(j)}$ was correlated with that of $g^{(j)}$; that is, $\delta ^{(j)}$ was small when $g^{(j)}$ had a large absolute value, and vice versa. Although the mechanism behind these behaviors is still under investigation, we attribute it to the specific experimental condition in which all channels had the same waveform and overlapped just after the transmitter in the real transmission line.

Figure 10(a) shows the progress of the walkoff parameters $d_n^{(j)}$ for channel indices $-5\leq n\leq 5$, excluding $n=0$, in the nonlinear segments $\mathcal {N}^{(j)}$ for $1\leq j\leq 6$. For each $n$, the values of $d_n^{(j)}$ converged to two branches. As an example, Fig. 10(b) shows the progress of $d_{-5}^{(j)}$, where the values for even $j$ are larger than the initial value, whereas those for odd $j$ are smaller. This result indicates that the nonlinear segments $\mathcal {N}^{(j)}$ with even $j$ correspond to the first SSMF in the recirculation loop, with a length of 84.1 km, and those with odd $j$ correspond to the second SSMF, with a length of 80.5 km, as shown in Fig. 6. The first SSMF had a slightly larger dispersion and slope than the second, so a larger walkoff was expected in the former. Consistently, $d_n^{(j)}$ for even $j$ converged to values larger in magnitude than the initial value, and those for odd $j$ to smaller ones. Correctly setting the walkoff conditions between neighboring channels is important for achieving good performance in DBPs that consider XPM. The results in Fig. 10 suggest that the developed learning procedure effectively optimized the walkoff parameters $d_n^{(j)}$ toward physically reasonable values.
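The two branches in Fig. 10 can be anticipated with a rough estimate. Assume, for illustration, that every nonlinear segment keeps the common distance $h=82.3$ km, so that the learned $d_n^{(j)}$ must absorb both the local dispersion and the actual fiber length. The effective walkoff parameter of a fiber of length $L$ is then approximately $(\beta_2\omega_n+\beta_3\omega_n^2/2)L/h$. This ignores the loss weighting of XPM along the fiber, so the numbers are indicative only; `d_eff` is a hypothetical helper.

```python
import math

C_NM_PER_PS = 299_792.458
LAM = C_NM_PER_PS / 193.1                     # nm
H_SEGMENT = 82.3                              # km, common initial segment distance

def d_eff(n, D, S, fiber_km, spacing_THz=0.05):
    """ASSUMED effective walkoff parameter (ps/km over a segment of length H_SEGMENT)
    for channel n in a fiber with dispersion D, slope S, and length fiber_km."""
    k = LAM ** 2 / (2 * math.pi * C_NM_PER_PS)
    beta2, beta3 = -D * k, k ** 2 * (S + 2 * D / LAM)
    w = 2 * math.pi * spacing_THz * n                         # rad/ps
    return (beta2 * w + beta3 * w ** 2 / 2) * fiber_km / H_SEGMENT

n = -5
mean_fiber = d_eff(n, 16.85, 0.060, H_SEGMENT)   # loop-mean fiber at nominal length
first = d_eff(n, 17.14, 0.062, 84.1)             # first SSMF in the loop
second = d_eff(n, 16.55, 0.058, 80.5)            # second SSMF in the loop
print(f"d_-5: mean fiber {mean_fiber:.2f}, first SSMF {first:.2f}, second SSMF {second:.2f}")
```

With these inputs, the estimate for $d_{-5}$ is about 35.4 for the first SSMF and 32.7 for the second. These values bracket both the mean-fiber value of about 34.0 and the initial value of $-6.78\times(-5)=33.9$, matching the direction of the two branches.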

We also attempted to optimize the parameters of an LDBP with a two-step/span configuration; however, the learning did not proceed well, and convergence was not achieved. Because the LDBP with the one-step/span configuration already achieves sufficiently high performance, as shown below, we attribute the unsuccessful learning for the two-step/span configuration to redundant parameters in the virtual transmission line.

3.3 Performance of LDBP for 11-channel DP-PS-64QAM signal

To confirm the effectiveness of the proposed LDBP for each transmission distance, we applied the LDBP with its optimized parameters fixed to the received waveforms of the 11-channel DP-PS-64QAM signals with launch powers ranging from $-4$ to $+2$ dBm/channel. We then evaluated the quality of the target channel, which was located at the center of the WDM channel grid and was most affected by the nonlinear waveform distortion caused by XPM. For each condition, the qualities of the four different patterns of the 11-channel DP-PS-64QAM signal were evaluated and averaged. The input to the virtual transmission line had 75,536 symbols, and we extracted the central 65,536 symbols from the output, discarding 5,000 symbols at each end, so that the walkoff between channels was treated correctly.
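The 5,000-symbol margin at each end can be compared with the largest accumulated walkoff. The back-of-the-envelope check below uses the initial walkoff parameter $|d_{\pm5}|\approx5\times6.78$ ps/km over 16 spans of about 82.3 km and ignores the learned deviations of $d_n^{(j)}$.

```python
symbols_in, symbols_out = 75_536, 65_536
guard_each_side = (symbols_in - symbols_out) // 2            # symbols dropped at each end

d_max_ps_per_km = 6.78 * 5                                   # outermost channels n = +/-5
spans, span_km, symbol_ps = 16, 82.3, 1e12 / 32e9            # 31.25-ps symbol period
max_walkoff_symbols = d_max_ps_per_km * span_km * spans / symbol_ps

print(guard_each_side, round(max_walkoff_symbols))           # margin vs worst-case walkoff
```

The worst-case walkoff of roughly 1,400 symbols after 16 spans stays well within the 5,000-symbol margin.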

Figure 11 summarizes the signal qualities obtained for each transmission distance, where the Q factor and normalized generalized mutual information (NGMI) [30,31] of the signals are plotted against the launch power. The Q factor was calculated from the measured bit-error rate (BER) using $Q^2 \mathrm {[dB]} =20\log _{10}[\sqrt 2\mathrm {erfc}^{-1}(2\times BER)]$, and it was consistent with the NGMI. We used one-step/span LDBPs with and without the structure that considers XPM. For comparison, we also calculated the signal qualities when DBPs without a learning process were applied, using one- and two-step/span configurations, each with and without the structure that considers XPM. These DBPs had virtual-transmission-line parameters similar to those of the LDBP before the learning process. Figure 11 also shows the results for the signal without any DBP scheme, where the Q factor degraded rapidly for launch powers above $-2$ dBm/channel owing to nonlinear waveform distortion. All LDBP and DBP schemes compensated for the distortion and increased the Q factor at higher launch powers compared with the case without DBP, and the proposed one-step/span LDBP that considers XPM achieved the best performance among them. When only SPM was considered for the nonlinear phase shift, the one-step/span LDBP and the two-step/span DBP performed equivalently. Both outperformed the one-step/span DBP, which suggests that the learning process reduced the number of steps required for the virtual transmission line while still compensating for the distortion, as stated in Ref. [13]. When both SPM and XPM were considered, the one-step/span LDBP and the two-step/span DBP further improved the signal qualities compared with the SPM-only case. Notably, the one-step/span LDBP was even better than the two-step/span DBP owing to the learning process.
Obtaining optimized walkoff parameters close to the physically expected values, as discussed in Sec. 3.2, was effective for accurately compensating for the nonlinear phase shift, including XPM. Furthermore, good performance was achieved for a transmission line whose exact physical parameters were unknown. Considering that the one-step/span LDBP and the two-step/span DBP performed equivalently when only SPM was compensated for, we conclude that the learning process is more significant when the fiber nonlinearity of WDM signals is compensated for by accounting for both SPM and XPM.
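The Q-factor conversion quoted above needs only the Python standard library, because $\sqrt2\,\mathrm{erfc}^{-1}(2\,\mathrm{BER})$ equals the inverse standard-normal CDF evaluated at $1-\mathrm{BER}$. A minimal sketch follows; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def q_factor_db(ber):
    """Q^2 [dB] = 20*log10(sqrt(2)*erfcinv(2*BER)), valid for 0 < BER < 0.5."""
    q_linear = NormalDist().inv_cdf(1.0 - ber)   # equals sqrt(2)*erfcinv(2*BER)
    return 20 * math.log10(q_linear)

for ber in (1e-2, 1e-3):
    print(f"BER {ber:.0e} -> Q {q_factor_db(ber):.2f} dB")
```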

Finally, we examined how the performance depends on the number of channels considered in compensating for XPM with the one-step/span LDBP. Figure 12 shows the Q factor of the target channel of the DP-PS-64QAM signal after the 16-span transmission line at a launch power of $+1$ dBm/channel against the number of neighboring channels considered in the LDBP. When the number of neighboring channels was $2n$, the channels with indices $-n, \ldots, -1, +1, \ldots, +n$ were considered in compensating for XPM. As shown in Fig. 12, the signal quality improved as the number of neighboring channels increased. However, there is a trade-off between performance and computational complexity, as discussed in the next section.

Fig. 11. Q factor (upper) and NGMI (lower) of the target channel of the DP-PS-64QAM signal against the launch power after transmission over 6, 10, and 16 spans under various conditions of LDBP/DBP. Insets: Constellation diagrams for the launch power of $+1$ dBm/channel, observed for no DBP (left), two-step/span DBP considering only SPM (center), and one-step/span LDBP considering SPM and XPM (right).


Fig. 12. Q factor against the number of neighboring channels considered for XPM.


4. Evaluation of computational complexity

We evaluated the computational complexity of the proposed LDBP and found it to be comparable to that of a conventional DBP. After the learning process is completed, an LDBP operates as a DBP, and its complexity depends on the number of steps per span and on whether XPM is considered in its structure. In the previous section, we confirmed that the one-step/span LDBP that considers XPM exhibited the best performance among the compared DBP schemes and that, when only SPM was considered, the two-step/span DBP outperformed the one-step/span DBP. Therefore, we estimate and compare the computational costs of operating a DBP with a one-step/span configuration with XPM, as the proposed scheme, and one with a two-step/span configuration without XPM, as a conventional scheme. Although DBP variants with improved performance and reduced complexity have been proposed [32–34], we evaluate a conventional DBP with a basic structure [3,4] and the proposed LDBP to compare the complexities determined by the number of steps per span and by the presence or absence of the XPM structure.

We count the total number of real multiplications because they dominate the scale of a DSP circuit. Most calculations involve complex numbers, and a product of two complex numbers, $a\times b=\mathrm {Re}[a]\mathrm {Re}[b]-\mathrm {Im}[a]\mathrm {Im}[b]+i(\mathrm {Re}[a]\mathrm {Im}[b]+\mathrm {Im}[a]\mathrm {Re}[b])$, is equivalent to four real multiplications. A fast Fourier transform (FFT) requires $\frac N2(\log _2N-2)$ complex multiplications when the data length $N$ is a power of 2; thus, the number of real multiplications is $2N(\log _2N-2)$. In addition, all parameters that are independent of the signal amplitude are calculated in advance and stored in a memory or a look-up table, so their calculations are not counted. We consider a DBP process for a WDM signal of $C$ channels received after a real transmission line with $S$ spans, where each channel has two polarization components, each of length $N$ in the time domain.

First, we count the real multiplications required to calculate Eq. (5) for one polarization component of a signal in a linear segment. The forward and inverse FFTs cost $2N(\log _2N-2)$ real multiplications each. The phase-shift factor, given by the exponential part of Eq. (5), is calculated in advance and stored in memory. Its product with $\tilde {A}_{p,n}(0,\omega )$ costs $4N$ real multiplications. Consequently, the total number of real multiplications in a linear segment is $2\times 2N(\log _2N-2)+4N=4N(\log _2N-1)$.
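The counting rules above translate directly into code. The hypothetical helpers below rebuild the linear-segment total from its parts and check it against the closed form $4N(\log_2N-1)$.

```python
import math

def fft_real_mults(N):
    """Real multiplications of one length-N FFT (N a power of 2): 4 * (N/2)*(log2 N - 2)."""
    return 2 * N * (int(math.log2(N)) - 2)

def linear_segment_mults(N):
    """Forward FFT + inverse FFT + complex product with the stored phase factor (4N)."""
    return 2 * fft_real_mults(N) + 4 * N

N = 4096
assert linear_segment_mults(N) == 4 * N * (int(math.log2(N)) - 1)
print(linear_segment_mults(N))    # 180224 real multiplications per polarization
```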

Next, we consider the nonlinear segment, which is described by Eqs. (8), (9), and (10), and count the real multiplications required for a single polarization component of the target channel. Given the nonlinear phase shift $\varphi$ of Eq. (8), a sequence of $N$ real numbers, we use a Taylor expansion up to the fourth order, $\exp (i\varphi )\approx 1-\frac 12\varphi ^2+\frac 1{24}(\varphi ^2)^2+i(\varphi -\frac 16\varphi ^2\times \varphi )$, to obtain the phase shift factor [8]. This operation requires $6N$ real multiplications: $\frac 12\times \varphi ^2$ costs $2N$, $\frac 1{24}\times (\varphi ^2)^2$ costs $2N$ when the result of $\varphi ^2$ is reused, and $\frac 16\times \varphi ^2\times \varphi$ costs $2N$. In addition, the complex product of $A_{p,0}(0,t)$ and $\exp (i\varphi )$ costs $4N$ real multiplications. We then focus on the SPM-related phase shift $\varphi _p^\mathrm {SPM}$ in Eq. (9). Computing $P_{p,0}(t)=\mathrm {Re}[A_{p,0}(t)]^2+\mathrm {Im}[A_{p,0}(t)]^2$ costs $2N$ real multiplications, whereas $P_{3-p,0}(t)$ is calculated separately, so its cost is not included here. Multiplying $P_{p,0}(t)$ by the coefficient $gH_0$ and $P_{3-p,0}(t)$ by $gH_0\delta$ costs $N$ real multiplications each; therefore, the total number of real multiplications required to calculate $\varphi _p^\mathrm {SPM}$ is $4N$. Finally, we focus on the XPM-related phase shift $\varphi _p^\mathrm {XPM}$ in Eq. (10). Although the intensity spectrum of the target channel, $\tilde {P}_{p,0}(\omega )$, does not appear in Eq. (10), it must be distributed to the other channels so that each of them can calculate its own XPM-related phase shift. Therefore, the $2N(\log _2N-2)$ real multiplications of this FFT are counted here.
However, $\tilde {P}_{3-p,0}(\omega )$ and $\tilde {P}_{p,n}(\omega )$ for $n\neq 0$ are provided by the separate calculation units of the other channels, so their costs are excluded. The product of the real coefficient $2g$ and the complex spectrum $\tilde {P}_{p,n}(\omega )$ costs $2N$ real multiplications, and the product of $\delta g$ and $\tilde {P}_{3-p,n}(\omega )$ also costs $2N$. We assume that $H_n(h,\omega )$ is stored in memory; its product with $2g\tilde {P}_{p,n}(\omega )+g\delta \tilde {P}_{3-p,n}(\omega )$ costs $4N$ real multiplications. With $8N$ real multiplications for each of the $C-1$ neighboring channels, the total is $8N(C-1)$. Including the final inverse FFT, we estimate the total number of real multiplications for $\varphi _p^\mathrm {XPM}$ to be $2N(\log _2N-2)+8N(C-1)+2N(\log _2N-2)=4N(\log _2N+2C-4)$.
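The fourth-order Taylor evaluation of $\exp (i\varphi )$ and the nonlinear-segment tallies derived above can be sketched as follows. The function names are hypothetical, and the accuracy check assumes a per-segment nonlinear phase within $\pm 0.3$ rad, an illustrative range rather than a value from the experiment.

```python
import numpy as np

# Constants are precomputed, as the text assumes for amplitude-independent values.
INV2, INV6, INV24 = 1.0 / 2.0, 1.0 / 6.0, 1.0 / 24.0


def taylor_exp_i(phi: np.ndarray) -> np.ndarray:
    """Fourth-order Taylor approximation of exp(i*phi) for real phi.

    exp(i phi) ~ 1 - phi^2/2 + (phi^2)^2/24 + i(phi - phi^2*phi/6),
    using 6N real multiplications for N samples.
    """
    phi2 = phi * phi                                  # N (shared by all terms)
    real = 1.0 - INV2 * phi2 + INV24 * (phi2 * phi2)  # N + 2N (reuses phi2)
    imag = phi - INV6 * (phi2 * phi)                  # 2N
    return real + 1j * imag


def nonlinear_spm_mults(n: int) -> int:
    """Real multiplications of one SPM-only nonlinear segment (one polarization)."""
    taylor = 6 * n             # exp(i*phi) by the expansion above
    rotate = 4 * n             # complex product A_{p,0}(0,t) * exp(i*phi)
    spm_phase = 2 * n + 2 * n  # P_{p,0}: 2N; gH0*P_{p,0} and gH0*delta*P_{3-p,0}: N each
    return taylor + rotate + spm_phase  # 14N


def xpm_phase_mults(n: int, channels: int) -> int:
    """Extra real multiplications for phi_p^XPM (one polarization)."""
    fft = 2 * n * (int(np.log2(n)) - 2)
    per_neighbor = 2 * n + 2 * n + 4 * n  # 2g*P, delta*g*P, product with H_n
    return fft + per_neighbor * (channels - 1) + fft  # FFT + neighbors + inverse FFT


phi = np.linspace(-0.3, 0.3, 101)  # assumed range of per-segment nonlinear phase [rad]
max_err = np.max(np.abs(taylor_exp_i(phi) - np.exp(1j * phi)))
print(max_err)
print(nonlinear_spm_mults(1024), xpm_phase_mults(1024, 11))  # 14336 114688
```

Reusing `phi2` for both the quartic and the cubic terms is what keeps the expansion at $6N$ real multiplications; the truncation error grows quickly with $|\varphi|$, so the approximation relies on a small nonlinear phase per segment.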

For a DBP with $P$ steps per span over an $S$-span transmission line, there are $PS+1$ linear segments and $PS$ nonlinear segments. Therefore, the computational complexity of a DBP considering only SPM, expressed as the total number of real multiplications $\mathcal {M}_\mathrm {SPM}$ for a waveform of length $N$ (per polarization component), is as follows:

(31)$$\mathcal{M}_\mathrm{SPM}(S, P, N)=4N(\log_2N-1)(PS+1)+14NPS.$$
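Eq. (31) follows from multiplying the per-segment costs derived above by the number of segments of each type; written out term by term:

$$\begin{aligned}\mathcal{M}_\mathrm{SPM}(S,P,N)&=(PS+1)\underbrace{\left[2\times 2N(\log_2N-2)+4N\right]}_{\text{linear segment}}+PS\underbrace{\left[6N+4N+4N\right]}_{\text{nonlinear segment (SPM only)}}\\&=4N(\log_2N-1)(PS+1)+14NPS,\end{aligned}$$

where the three nonlinear terms are the Taylor expansion of $\exp (i\varphi )$, the complex product with $A_{p,0}(0,t)$, and the SPM-related phase shift $\varphi _p^\mathrm {SPM}$, respectively.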

When both SPM and XPM are considered in the nonlinear phase shift for the $C$-channel WDM signal, the total number of real multiplications, $\mathcal {M}_\mathrm {XPM}$, is

(32)$$\mathcal{M}_\mathrm{XPM}(S, P, C, N)=\mathcal{M}_\mathrm{SPM}(S, P, N) + 4N(\log_2N+2C-4)PS.$$
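Eqs. (31) and (32) translate directly into code. The sketch below (function names hypothetical) also accumulates the same count segment by segment, as in the derivation above, so the two routes can be checked against each other; like that derivation, it counts real multiplications for one polarization component.

```python
import math


def m_spm(s: int, p: int, n: int) -> int:
    """Eq. (31): real multiplications of a DBP considering only SPM."""
    log2n = int(math.log2(n))
    return 4 * n * (log2n - 1) * (p * s + 1) + 14 * n * p * s


def m_xpm(s: int, p: int, c: int, n: int) -> int:
    """Eq. (32): Eq. (31) plus the XPM phase cost of every nonlinear segment."""
    log2n = int(math.log2(n))
    return m_spm(s, p, n) + 4 * n * (log2n + 2 * c - 4) * p * s


def m_xpm_by_segments(s: int, p: int, c: int, n: int) -> int:
    """The same count accumulated segment by segment, as in the derivation."""
    log2n = int(math.log2(n))
    fft = 2 * n * (log2n - 2)
    linear = 2 * fft + 4 * n                   # one linear segment
    nonlinear_spm = 6 * n + 4 * n + 4 * n      # Taylor, rotation, SPM phase
    nonlinear_xpm = 2 * fft + 8 * n * (c - 1)  # FFT/IFFT and C-1 neighbors
    segments = p * s
    return (segments + 1) * linear + segments * (nonlinear_spm + nonlinear_xpm)


if __name__ == "__main__":
    n = 1024
    print(m_spm(10, 2, n))
    for c in (5, 11, 21):
        print(c, m_xpm(10, 1, c, n), m_xpm(10, 2, c, n))
```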

Using Eqs. (31) and (32), we calculated the computational complexity $\mathcal {M}_\mathrm {SPM}$ for a $P=2$ step/span configuration as the conventional scheme and $\mathcal {M}_\mathrm {XPM}$ for a $P=1$ step/span configuration as the proposed LDBP, for WDM signals of $C=5$, $11$, and $21$ channels and a transmission line of $S=10$ spans. For comparison, we also calculated $\mathcal {M}_\mathrm {XPM}$ for a $P=2$ step/span configuration, which corresponds to a conventional DBP that considers XPM without learning. Fig. 13 shows the results: $\mathcal {M}_\mathrm {XPM}$ with $P=1$ exceeds $\mathcal {M}_\mathrm {SPM}$ and increases with the channel number $C$, and this relationship does not change with $N$. However, $\mathcal {M}_\mathrm {XPM}$ with $P=1$ is always approximately half of that with $P=2$, indicating that the reduction in the number of steps per span achieved through learning reduces the computational complexity. For a data length of $N=1024$, the ratios of $\mathcal {M}_\mathrm {XPM}$ with $P=1$ to $\mathcal {M}_\mathrm {SPM}$ for $C=5$, $11$, and $21$ are 1.14, 1.60, and 2.37, respectively. Thus, the computational complexity of the proposed LDBP considering XPM with a one-step/span configuration remains of the same order as that of the conventional scheme considering only SPM with a two-step/span configuration.
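As an arithmetic check, the closed forms can be evaluated by hand at $N=1024$ ($\log_2N=10$) and $S=10$; the values below follow directly from Eqs. (31) and (32) and are not read off Fig. 13:

$$\begin{aligned}\mathcal{M}_\mathrm{SPM}(10,2,N)&=4N\cdot 9\cdot 21+14N\cdot 20=1036N,\\ \mathcal{M}_\mathrm{XPM}(10,1,C,N)&=\underbrace{4N\cdot 9\cdot 11+14N\cdot 10}_{536N}+40N(2C+6)=\begin{cases}1176N, & C=5,\\ 1656N, & C=11,\\ 2456N, & C=21,\end{cases}\\ \mathcal{M}_\mathrm{XPM}(10,2,C,N)&=1036N+80N(2C+6)=\begin{cases}2316N, & C=5,\\ 3276N, & C=11,\\ 4876N, & C=21.\end{cases}\end{aligned}$$

The ratios to $\mathcal{M}_\mathrm{SPM}$ are $1176/1036\approx 1.14$, $1656/1036\approx 1.60$, and $2456/1036\approx 2.37$, and the ratios of the $P=1$ to the $P=2$ XPM-aware counts are about 0.51, 0.51, and 0.50, consistent with the "approximately half" relationship.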

Fig. 13. Computational complexity against data length $N$ for operating DBPs, $\mathcal {M}_\mathrm {SPM}(10, 2, N)$, $\mathcal {M}_\mathrm {XPM}(10, 1, C, N)$, and $\mathcal {M}_\mathrm {XPM}(10, 2, C, N)$ for channel number $C=5$ (a), $11$ (b), and $21$ (c).


5. Conclusion

We proposed an LDBP technique that considers SPM and XPM to effectively compensate for the nonlinear waveform distortion of WDM signals at a reasonable calculation cost. The LDBP employed a channel-by-channel structure but included inter-channel interaction to account for XPM, and an approximation was applied to the evolution of the neighboring channels in the nonlinear segments. We developed a learning procedure that optimizes the parameters of this structure iteratively. To verify the learning process and the performance of the proposed LDBP, we conducted a transmission experiment using 11-channel WDM DP-16QAM and DP-PS-64QAM signals over 6, 10, and 16 spans of approximately 80-km SSMF links. Using the received waveforms of the DP-16QAM signal for each transmission distance, we observed the successful convergence of particular parameters in a virtual transmission line with a one-step/span configuration, and we analyzed how the parameters were optimized by the learning process for the six-span case. We then applied the LDBP with the fixed optimum parameters to the DP-PS-64QAM signals and evaluated its compensation performance for the nonlinear waveform distortion. The proposed LDBP considering XPM with a one-step/span configuration exhibited the best performance. It even outperformed the two-step/span DBP considering XPM, owing to its ability to learn from a real transmission line whose physical parameters are not accurately known. The LDBP was effective when XPM was considered rather than when only SPM was considered. We also estimated the computational complexity of the proposed LDBP and found that the calculation cost of the proposed LDBP considering XPM with a one-step/span configuration remained of the same order as that of a conventional DBP that considers only SPM with a two-step/span configuration.

In conclusion, the proposed LDBP technique can be a powerful tool for mitigating the nonlinear waveform distortion of WDM signals, and it can be implemented with reasonable computational complexity. Although we did not consider the nonlinear polarization crosstalk induced by XPM [14] in this paper, we will develop an LDBP scheme that addresses both the XPM-induced nonlinear phase shift and polarization crosstalk to further improve the performance of WDM signal transmission.

Funding

National Institute of Information and Communications Technology (20401).

Acknowledgments

The authors thank M. Pelusi for the helpful discussion. The computational resource of the AI Bridging Cloud Infrastructure (ABCI) provided by AIST was partly used for this work.

Disclosures

The authors declare no conflicts of interest.

Data availability

The data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. P. Mitra and J. Stark, “Nonlinear limits to the information capacity of optical fibre communications,” Nature 411(6841), 1027–1030 (2001).

2. R.-J. Essiambre, G. Kramer, P. J. Winzer, G. J. Foschini, and B. Goebel, “Capacity limits of optical fiber networks,” J. Lightwave Technol. 28(4), 662–701 (2010).

3. X. Li, X. Chen, G. Goldfarb, E. Mateo, I. Kim, F. Yaman, and G. Li, “Electronic post-compensation of WDM transmission impairments using coherent detection and digital signal processing,” Opt. Express 16(2), 880–888 (2008).

4. E. Ip and J. M. Kahn, “Compensation of dispersion and nonlinear impairments using digital backpropagation,” J. Lightwave Technol. 26(20), 3416–3425 (2008).

5. G. P. Agrawal, Nonlinear Fiber Optics (Academic, 2001).

6. G. Liga, T. Xu, A. Alvarado, R. I. Killey, and P. Bayvel, “On the performance of multichannel digital backpropagation in high-capacity long-haul optical transmission,” Opt. Express 22(24), 30053–30062 (2014).

7. J. Leibrich and W. Rosenkranz, “Efficient numerical simulation of multichannel WDM transmission systems limited by XPM,” IEEE Photonics Technol. Lett. 15(3), 395–397 (2003).

8. E. F. Mateo, F. Yaman, and G. Li, “Efficient compensation of inter-channel nonlinear effects via digital backward propagation in WDM optical transmission,” Opt. Express 18(14), 15144–15154 (2010).

9. E. F. Mateo, X. Zhou, and G. Li, “Improved digital backward propagation for the compensation of inter-channel nonlinear effects in polarization-multiplexed WDM systems,” Opt. Express 19(2), 570–583 (2011).

10. T. Tanimura, T. Hoshida, T. Tanaka, L. Li, S. Oda, H. Nakashima, Z. Tao, and J. C. Rasmussen, “Semi-blind nonlinear equalization in coherent multi-span transmission system with inhomogeneous span parameters,” Proc. OFC/NFOEC2010, Paper OMR6 (2010).

11. C. Häger and H. D. Pfister, “Nonlinear interference mitigation via deep neural networks,” Proc. OFC2018, Paper W3A.4 (2018).

12. Q. Fan, G. Zhou, T. Gui, C. Lu, and A. P. T. Lau, “Advancing theoretical understanding and practical performance of signal processing for nonlinear optical communications through machine learning,” Nat. Commun. 11(1), 3694 (2020).

13. B. Bitachon, A. Ghazisaeidi, M. Eppenberger, B. Baeuerle, M. Ayata, and J. Leuthold, “Deep learning based digital backpropagation demonstrating SNR gain at low complexity in a 1200km transmission link,” Opt. Express 28(20), 29318–29334 (2020).

14. D. Tang, Z. Wu, Z. Sun, X. Tang, and Y. Qiao, “Joint intra and inter-channel nonlinearity compensation based on interpretable neural network for long-haul coherent systems,” Opt. Express 29(22), 36242–36256 (2021).

15. Q. Fan, C. Lu, and A. P. T. Lau, “Combined neural network and adaptive DSP training for long-haul optical communications,” J. Lightwave Technol. 39(22), 7083–7091 (2021).

16. S. Civelli, E. Forestieri, A. Lotsmanov, D. Razdoburdin, and M. Secondini, “Coupled-channel enhanced SSFM for digital backpropagation in WDM systems,” Proc. OFC2021, Paper M3H.7 (2021).

17. O. Sidelnikov, A. Redyuk, S. Sygletos, M. Fedoruk, and S. Turitsyn, “Advanced convolutional neural networks for nonlinearity mitigation in long-haul WDM transmission systems,” J. Lightwave Technol. 39(8), 2397–2406 (2021).

18. F. Buchali, F. Steiner, G. Böcherer, L. Schmalen, P. Schulte, and W. Idler, “Rate adaptation and reach increase by probabilistically shaped 64-QAM: An experimental demonstration,” J. Lightwave Technol. 34(7), 1599–1609 (2016).

19. G. Böcherer, P. Schulte, and F. Steiner, “Probabilistic shaping and forward error correction for fiber-optic communication systems,” J. Lightwave Technol. 37(2), 230–244 (2019).

20. C. R. Menyuk and B. S. Marks, “Interaction of polarization mode dispersion and nonlinearity in optical fiber transmission systems,” J. Lightwave Technol. 24(7), 2806–2826 (2006).

21. Z. Tao, W. Yan, L. Liu, L. Li, S. Oda, T. Hoshida, and J. C. Rasmussen, “Simple fiber model for determination of XPM effects,” J. Lightwave Technol. 29(7), 974–986 (2011).

22. R. Dar and P. J. Winzer, “On the limits of digital back-propagation in fully loaded WDM systems,” IEEE Photonics Technol. Lett. 28(11), 1253–1256 (2016).

23. C. B. Czegledi, G. Liga, D. Lavery, M. Karlsson, E. Agrell, S. J. Savory, and P. Bayvel, “Digital backpropagation accounting for polarization-mode dispersion,” Opt. Express 25(3), 1903–1915 (2017).

24. B. Szafraniec, B. Nebendahl, and T. Marshall, “Polarization demultiplexing in Stokes space,” Opt. Express 18(17), 17928–17939 (2010).

25. T. Inoue and S. Namiki, “Carrier recovery for M-QAM signals based on a block estimation process with Kalman filter,” Opt. Express 22(13), 15376–15387 (2014).

26. T. Inoue and S. Namiki, “Adaptive adjustment of reference constellation for demodulating 16QAM signal with intrinsic distortion due to imperfect modulation,” Opt. Express 21(24), 29120–29128 (2013).

27. J. Zhuang, T. Tang, Y. Ding, S. Tatikonda, N. Dvornek, X. Papademetris, and J. S. Duncan, “AdaBelief optimizer: Adapting stepsizes by the belief in observed gradients,” arXiv:2010.07468 (2020).

28. T. Tanimura, S. Yoshida, K. Tajima, S. Oda, and T. Hoshida, “Fiber-longitudinal anomaly position identification over multi-span transmission link out of receiver-end signals,” J. Lightwave Technol. 38(9), 2726–2733 (2020).

29. T. Sasai, M. Nakamura, E. Yamazaki, S. Yamamoto, H. Nishizawa, and Y. Kisaka, “Digital backpropagation for optical path monitoring: loss profile and passband narrowing estimation,” Proc. ECOC2020, Paper Tu2D-1 (2020).

30. A. Alvarado, E. Agrell, D. Lavery, R. Maher, and P. Bayvel, “Replacing the soft-decision FEC limit paradigm in the design of optical communication systems,” J. Lightwave Technol. 33(20), 4338–4352 (2015).

31. J. Cho and P. J. Winzer, “Probabilistic constellation shaping for optical fiber communications,” J. Lightwave Technol. 37(6), 1590–1607 (2019).

32. L. B. Du and A. J. Lowery, “Improved single channel backpropagation for intra-channel fiber nonlinearity compensation in long-haul optical communication systems,” Opt. Express 18(16), 17075–17088 (2010).

33. D. Rafique, M. Mussolin, M. Forzati, J. Mårtensson, M. N. Chugtai, and A. D. Ellis, “Compensation of intra-channel nonlinear fibre impairments using simplified digital back-propagation algorithm,” Opt. Express 19(10), 9453–9460 (2011).

34. M. Secondini, D. Marsella, and E. Forestieri, “Enhanced split-step Fourier method for digital backpropagation,” Proc. ECOC2014, Paper We.3.3.5 (2014).

Variable	Parameter
$D_{2}^{(j)}$	Accumulated second-order dispersion
$g^{(j)}$	Nonlinear coefficient
$\delta^{(j)}$	XPolM coefficient
$d_{n}^{(j)}$	Walkoff parameters for channel index $n$

Optics Express