Neural network architectures for optical channel nonlinear compensation in digital subcarrier multiplexing systems

Abstract

In this work, we propose to use various artificial neural network (ANN) structures for modeling and compensation of intra- and inter-subcarrier fiber nonlinear interference in digital subcarrier multiplexing (DSCM) optical transmission systems. We perform nonlinear channel equalization by employing different ANN cores that include convolutional neural network (CNN) and long short-term memory (LSTM) layers. First, we develop a fiber nonlinearity compensator for DSCM systems based on a fully-connected network across all subcarriers. In subsequent steps, and borrowing from the perturbation analysis of fiber nonlinearity, we gradually upgrade the proposed designs towards modular structures with better performance-complexity tradeoffs. Our study shows that incorporating proper macro structures in the design of ANN nonlinear equalizers for DSCM systems can be crucial in the development of practical solutions for future generations of coherent optical transceivers.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

For high-speed long-haul fiber-optic transmission, the nonlinear interference arising from the Kerr effect is a major bottleneck that limits the achievable transmission rates. This interference can be equalized by approximating and inverting the nonlinear Schrödinger equation through digital back-propagation (DBP) [1–3] or perturbation-based nonlinear compensation (PNLC) [4,5]. These solutions require accurate information about the optical channel, and their prohibitive complexity has limited their application in real-time processing with agile and flexible requirements. DBP in particular has been widely used as a benchmarking algorithm for evaluating the performance of other nonlinear compensation (NLC) solutions due to its algorithmic simplicity and limited number of hyper-parameters. However, it faces serious challenges for fixed-point implementation in coherent modems due to the higher required over-sampling rate, the use of multiple (inverse) fast Fourier transform (FFT) modules that affect the linear equalization path, and the need for higher fixed-point precision to maintain accuracy.

Alternatively, a variety of ANN solutions have recently been proposed for fiber nonlinearity compensation. Early works tried to squeeze out additional performance by feeding triplets, inspired by the perturbation analysis of fiber nonlinearity, to a feed-forward neural network [6]. Later works drew inspiration from DBP and aimed to incorporate deep convolutional neural networks (CNNs) for this task [7,8]. The use of advanced recurrent neural networks (RNNs), such as long short-term memory (LSTM) modules, which are better suited to the equalization of time-series processes, has also attracted great interest [9,10], with more recent works employing transformer structures for this task [11]. In fact, the pattern- and medium-dependent characteristics of nonlinear propagation make it a suitable problem to be tackled by a variety of toolsets and solutions from the ANN domain. An ANN-based nonlinear equalizer is generally more flexible than conventional methods in the sense that it can be more easily updated for different transmission scenarios without the need for accurate channel parameter feedback. Also, ANN nonlinear equalizers can be extended to include the functionalities of traditional DSP modules and form a more general equalizer. Furthermore, an ANN design in which the compensation process is learned through data can potentially lead to a large reduction in computational complexity [12]. The flexibility and universality of machine learning solutions can be improved by using reinforcement learning (RL), especially for adaptive applications in optical environments where acquiring enough data for training and retraining is challenging [13].

In this work, we consider the application of ANNs to the compensation of fiber nonlinearity distortions in coherent optical communication systems. We focus on advanced ANN structures with the ability to generate appropriate features without relying on an external pre-processing module. We particularly study digital subcarrier multiplexing (DSCM) systems since their design flexibility makes them a promising solution for coherent optical modems [14,15]. Simplified DSP development with lower-speed processing per subcarrier, flexible channel-matched transmission, robust clock recovery, and an easy transition to a point-to-multi-point (P2MP) architecture are some of the advantages of DSCM systems.

Here, we develop macro ANN structures inspired by the fiber nonlinearity distortion mechanism that governs the nonlinear interaction across different subcarriers; these structures are shown to be more efficient in terms of inference complexity, model representation, and training efficiency. We propose various ANN structures for modeling and compensation of intra- and inter-subcarrier fiber nonlinearities in DSCM systems, and explore scalability and performance-versus-complexity tradeoffs of the presented solutions. The models differ in how received symbols across digital subcarriers are employed to train ANN cores for intra-subcarrier self-phase modulation (iSPM) and inter-subcarrier cross-phase modulation (iXPM) nonlinear impairments. Starting with a fully-connected network across all subcarriers, we move toward upgrading the design with modular ANN cores and sequential training stages. In other words, we start with black-box ANN models and then propose more efficient and flexible modular designs inspired by nonlinear perturbation analysis. All models here are universal from the ANN-core choice perspective. Specifically, we choose the building block for all the proposed structures in this work to be an ANN core with a combination of CNN and LSTM layers. One important aspect of this work is to generalize the neural network designs such that a block of data is generated in each equalization step, since parallelization is an essential feature of coherent modems. We explore parallelization of these designs and the impact of block-processing on the performance-complexity tradeoffs of these models. The results suggest that one can obtain orders-of-magnitude reduction in computational complexity by moving towards block equalization in this fashion when RNN-based solutions are deployed.

The remainder of this paper is organized as follows: In Section 2, the basics of nonlinear compensation for the fiber channel are briefly discussed. In Section 3, the multi-purpose ANN-core structure that serves as the main building block of the proposed models is explained. The details of various ANN structures for NLC in DSCM are presented in Section 4, while Section 5 is devoted to the numerical setup and results. In Section 6, we discuss the impact of the dispersion map on the design of nonlinear equalizers for DSCM systems. Finally, we conclude the paper in Section 7.

2. Nonlinear compensation for optical fiber channel

The dual-polarization evolution of the optical field over a fiber link can be described by the Manakov equation [16], in which the linear and nonlinear propagation effects appear as follows:

$$\frac{\partial{u_{x/y}}}{\partial{z}} + \frac{\alpha}{2}u_{x/y} + j\frac{\beta}{2}\frac{\partial^2u_{x/y}}{\partial{t^2}} = j\frac{8}{9}\gamma\Bigl[|u_x|^2+|u_y|^2\Bigr]u_{x/y},$$
where $u_{x/y} = u_{x/y}(t,z)$ represents the optical field of polarization $x$ and $y$, respectively, $\alpha$ is the attenuation coefficient, $\beta$ is the group velocity dispersion (GVD), and $\gamma$ is the nonlinear coefficient. Nonlinear interference can be equalized by approximating and inverting the above equation through DBP [1–3], where the fiber is modeled as a series of linear and nonlinear sections through a first-order approximation of the Manakov equation. On the other hand, by employing the perturbation analysis [4], one can represent the optical field as the solution of linear propagation plus a symbol-domain perturbation term that encapsulates the accumulated nonlinear distortion on every symbol. It is shown that the first-order perturbation term can be modeled by a weighted sum of triplets of transmitted symbols plus a constant phase rotation [5,17].
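
For reference, a commonly used discrete-time form of this triplet model is sketched below for the $x$ polarization; the exact normalization and the constant-phase term vary between formulations [5,17], so this should be read as a schematic rather than the exact expression used in this work:
$$\Delta u_x[k] \approx j\sum_{m,n} C_{m,n}\Bigl(u_x[k+m]\,u_x^*[k+m+n] + u_y[k+m]\,u_y^*[k+m+n]\Bigr)u_x[k+n],$$
where $C_{m,n}$ are the perturbation coefficients determined by the link parameters and the pulse shape, and the corresponding expression for $\Delta u_y$ follows by swapping the polarizations.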

Considering the lumped nonlinear compensation methods, a block diagram of the equalization module is presented in Fig. 1, where the pre-processing buffer generates appropriate inputs for a given method. Specifically, it includes a module that calculates appropriate PNLC triplets for the regular perturbation-based method or for an artificial neural network nonlinear compensation (ANN-NLC) approach that operates on externally generated triplet features [6]. In ANN-NLC solutions that directly operate on Rx-DSP outputs [8,10,18,19], this pre-processing buffer is tasked with providing an extended block of soft symbols needed to efficiently equalize the nonlinear interference.

Fig. 1. Block diagram for lumped perturbation-based nonlinear compensation.

Considering the first-order perturbation as the dominant nonlinear term, an appropriate scaling can be employed to adapt the nonlinear error estimates in case the training and inference stages are performed at different optical launch powers:

$$\alpha = 10^{\bigl(P_\text{inference}(\text{dB}) - P_\text{train}(\text{dB})\bigr)/10},$$
where $P_\text{train}$ is the optical launch power for the training dataset and $P_\text{inference}$ is the respective optical launch power of the data in the inference (equalization) stage.
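
A one-line helper reproducing Eq. (2) is shown below; the function name and the assumption that the scaling is applied multiplicatively to the ANN nonlinear-error estimates are illustrative.

```python
def rescale_nl_estimates(nl_estimates, p_train_dbm, p_inference_dbm):
    """Scale nonlinear-error estimates per Eq. (2) when inference power differs from training."""
    alpha = 10 ** ((p_inference_dbm - p_train_dbm) / 10)  # linear power ratio
    return alpha * nl_estimates
```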

In this work, we consider various lumped ANN structures for fiber nonlinearity compensation in DSCM systems. We describe an evolutionary approach to designing advanced ANN models that do not rely on externally generated features (such as triplets). Hence, by using input symbols in a delay-line format, the model learns relevant features according to the imposed structure through the available layers. The proposed ANN-NLC equalizers estimate the nonlinear distortions of each subcarrier in one polarization of a DSCM signal given the relevant information from all digital subcarriers across both polarizations. Due to the nature of signal propagation in fiber and symmetries in the medium, it has been shown that the same model can be used to generate nonlinear error estimates for the other polarization by simply swapping the input signals with their respective counterparts from the other polarization. This alleviates the need to train separate models for each polarization and enables efficient learning of a more generalized model.

3. Multi-purpose ANN-core structure

The presented ANNs mainly explore different higher-level structures that govern the interaction between each target digital subcarrier and its neighbors, in search of more powerful and efficient models to learn intra- and inter-subcarrier nonlinear distortion. Hence, the models are universal from the ANN-core choice perspective, and the core can be replaced with other designs. Specifically, we choose the building block for all the proposed models in this work to be an ANN core comprising a combination of CNN and LSTM layers. The first layer is a 1-dimensional CNN followed by a Leaky ReLU activation function, which is tasked with feature generation. The CNN features are fed into an LSTM module with a bi-directional structure to extract the time dependency of the input features.

Real-time processing in coherent optical receivers requires a great deal of parallelization. Hence, a block of input symbols is processed at each clock cycle and the results are delivered to the next module in the pipeline. In order to address this requirement and also save computational resources for equalization, one can share the overhead corresponding to the initialization of each LSTM chain by forming longer chains. By employing block-processing, we expand the input and output sequences to provide nonlinear estimates for a block of $N$ consecutive time instances in one round of forward and backward LSTM state transitions.

With this modification, we exploit the excellent efficiency of LSTMs in handling memory. Generally, LSTMs are highly suitable for reducing the processing overhead of a sequential input stream since they aim to capture the most relevant representations of the past observed inputs in the form of hidden states. These hidden state variables are updated as new inputs are processed sequentially. However, the output remains an explicit function of the inputs and hidden state variables at every time instance. Consequently, equalization of any extra input only increases the total computation by one extra LSTM processing step. To leverage this capability, simplify training, and avoid the challenges of long back-propagation through time in LSTMs, these neural networks are trained with regular symbol-based processing while block-processing is employed during deployment and evaluation. Note that by using block-processing in the equalization path, we introduce an approximation into a network that was trained with different initial hidden states. However, with a long-enough training block size and a sufficient filter-tap size, one can show that the changes in the states are minimal [19]. This is reflected in the complexity figures, as we deploy trained models with different block sizes $N$ in the numerical results.

A block diagram of the proposed ANN equalization core is depicted in Fig. 2. The LSTM network is trained using a fixed sequence of features corresponding to $2k+1$ time instances, where $k$ is the filter-tap size on each side of the target symbol. In the equalization path, we deploy the same network over input feature sequences corresponding to $2k+N$ time instances to obtain output features associated with the symbols in the middle $N$ time slots. In this case, input features corresponding to the first $k+N$ time instances, $i \in \{-k+1,\ldots,N\}$, are sequentially fed into a forward LSTM unit initialized with zero memory, producing output features and evolving the internal memory states. Similarly, a backward LSTM unit starts with zero memory and evolves using the CNN features corresponding to the last $k+N$ time instances of the $2k+N$ window, $i \in \{1,\ldots, N+k\}$, in the opposite direction. The outputs of the forward and backward LSTM modules for the middle $N$ time instances are concatenated to form the LSTM block outputs. Finally, the LSTM block outputs may pass through a linear or a multi-layer perceptron (MLP) stage with Leaky ReLU activation functions (for all but the last layer) that ultimately provides estimates of the real and imaginary parts of the nonlinear interference per output. Note that, as we discuss further in Section 4, the final MLP layer can be separated from the ANN core and trained individually in some architectures.
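
A minimal PyTorch sketch of this CNN+bi-LSTM core is given below. The 4-channel input layout (in-phase and quadrature of both polarizations), the Leaky ReLU activation, and the middle-$N$ slicing follow the description above; the class and argument names, the default hyper-parameter values, and the use of a single bidirectional LSTM over the full $2k+N$ window (which produces the same middle outputs as the separately clocked forward/backward units, at the cost of a few redundant steps) are illustrative choices rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class ANNCore(nn.Module):
    def __init__(self, in_channels=4, n_filters=24, n_kernel=11, n_hidden=32, n_mlp=0):
        super().__init__()
        # 1-D CNN feature generator followed by Leaky ReLU (no padding, unit stride)
        self.cnn = nn.Conv1d(in_channels, n_filters, kernel_size=n_kernel)
        self.act = nn.LeakyReLU()
        # bidirectional LSTM extracts the time dependency of the CNN features
        self.lstm = nn.LSTM(n_filters, n_hidden, batch_first=True, bidirectional=True)
        # output stage: 2 estimates per time slot (real/imag of the nonlinear error)
        if n_mlp > 0:
            self.head = nn.Sequential(nn.Linear(2 * n_hidden, n_mlp),
                                      nn.LeakyReLU(),
                                      nn.Linear(n_mlp, 2))
        else:
            self.head = nn.Linear(2 * n_hidden, 2)

    def forward(self, x, n_out=1):
        # x: (batch, in_channels, 2t + n_out) window of soft symbols,
        # i.e. t taps on each side of the n_out target time instances
        feats = self.act(self.cnn(x)).transpose(1, 2)   # (batch, 2k + n_out, n_filters)
        out, _ = self.lstm(feats)                       # (batch, 2k + n_out, 2*n_hidden)
        k = (out.shape[1] - n_out) // 2
        middle = out[:, k:k + n_out, :]                 # keep the middle n_out time slots
        return self.head(middle)                        # (batch, n_out, 2)
```

The same instance can be trained with n_out = 1 (symbol-based processing) and later deployed with a larger n_out in the block-processing equalization path.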

Fig. 2. Multi-purpose ANN-core structure for the equalization path.

In order to get a measure of complexity for the multi-purpose ANN core, we consider a CNN-LSTM network with an MLP output layer. Let us consider the equalization of $N$ symbols with a processing window of $N_{w} = 2t+N$. The number of real multiplications per symbol (RM) for the CNN is:

$$CNN_{RM} = \frac{4N_fN_{ke}(N_{w}-N_{ke}+1)}{N},$$
where $N_f$ is the number of filters and $N_{ke}$ is the kernel size, for four input channels corresponding to the in-phase and quadrature symbols of the X and Y polarizations. In case information from multiple subcarriers is fed as input to the ANN core, $N_f$ should be scaled accordingly. The convolutional layer is assumed to have zero padding with unit stride and dilation. For the LSTM network, consider the input sequence length for each direction to be $N_s = k + N$, where $k = t - (N_{ke}-1)/2$ is the extra symbol length at each side of the LSTM input. In this case, the combined RM for the forward and backward LSTMs is given by:
$$LSTM_{RM} = \frac{2N_sN_h(4(N_f+N_h)+3)}{N},$$
where $N_h$ is the hidden size. Finally, for an MLP with a single hidden layer at the output of the LSTM network, the RM is given by:
$$MLP_{RM} = 2n_mN_h + 2n_m,$$
where $n_m$ is the hidden layer size. If the MLP contains more than one hidden layer, the extra multiplications should be added accordingly. Furthermore, in the absence of any hidden layer, $MLP_{RM} = 4N_h$, where the factor of 4 accounts for the two directions of the LSTM and the two real estimates for the in-phase and quadrature components per output.
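
The three expressions above can be collected into a small helper that returns the per-symbol real-multiplication count of a single ANN core; the variable names follow Eqs. (3)-(5), the default of four input channels corresponds to one subcarrier, and this is a bookkeeping sketch rather than the accounting code behind the reported figures.

```python
def ann_core_rm_per_symbol(n_f, n_ke, n_h, t, n_block, n_mlp=0, in_channels=4):
    """Real multiplications per output symbol for one CNN + bi-LSTM (+ MLP) core."""
    n_w = 2 * t + n_block                     # processing window length N_w = 2t + N
    k = t - (n_ke - 1) // 2                   # extra symbols per side after the CNN
    n_s = k + n_block                         # LSTM sequence length per direction
    cnn_rm = in_channels * n_f * n_ke * (n_w - n_ke + 1) / n_block        # Eq. (3)
    lstm_rm = 2 * n_s * n_h * (4 * (n_f + n_h) + 3) / n_block             # Eq. (4)
    if n_mlp > 0:
        mlp_rm = 2 * n_mlp * n_h + 2 * n_mlp                              # Eq. (5)
    else:
        mlp_rm = 4 * n_h                      # linear head: 2 outputs x 2N_h inputs
    return cnn_rm + lstm_rm + mlp_rm

# example: per-core cost without and with block processing (N = 1 vs N = 1024)
# ann_core_rm_per_symbol(24, 11, 32, t=20, n_block=1)
# ann_core_rm_per_symbol(24, 11, 32, t=20, n_block=1024)
```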

Note that to obtain the complexity of each structure in Section 4, we need to calculate and accumulate the RMs associated with the ANN cores in the equalization path for all subcarriers. Thus, we mainly use the number of real multiplications per super-symbol (RMpS) as the complexity metric for each realization of an architecture. A super-symbol denotes the combined output symbols of all digital subcarriers across one polarization at each time instance. While we limit the scope of this paper to a DSCM system with four subcarriers, this metric enables us to further compare the results with other single-carrier and DSCM transmission systems that operate at a similar baud rate and are tailored for the same throughput in future studies.

4. ANN structures for NLC in DSCM

4.1 Common-core (CC)

The first structure for joint NLC in DSCM is a fully-connected, black-box approach that contains only one ANN core. As depicted in Fig. 3(a), this single ANN core is tasked with providing nonlinear distortion estimates for all subcarriers of one polarization using a window of received symbols from all subcarriers in both polarizations. Note that employing a model with a common core (CC), which lacks any enforced structure that separates the iSPM and iXPM nonlinear contributions, can be seen as a double-edged sword. On one hand, it increases the number of training parameters compared to a specialized physics-informed ANN where a predetermined structure is enforced on the ANN architecture. On the other hand, by not imposing any structure on the construction of the network, we allow maximum entanglement of iSPM and iXPM features through the different layers of the ANN core. This can potentially lead to higher efficiency by allowing the network to avoid duplicating terms that can be shared within a single, fully-connected structure. However, there is always the possibility that the ANN core structure may not be inherently powerful enough for the underlying nonlinear mechanism to learn all the appropriate features, even when higher-complexity realizations are allowed. This could severely limit the performance, especially in the absence of adequate training data, and defeat the purpose.

Fig. 3. ANN-NLC structures: (a) common-core and (b) separate-core per subcarrier.

4.2 Separate-core (SC) per subcarrier

In order to obtain a subcarrier-based structure and parallelize the model, a separate ANN core is dedicated to generating nonlinear distortion estimates for each subcarrier output, resulting in a separate-core (SC) per-subcarrier architecture. This design is illustrated in Fig. 3(b). Note that, similar to CC, the ANN cores in SC still operate on input information from all subcarriers. The motivation here is to employ separate and smaller cores per subcarrier in order to be more effective in fine-tuning the model parameters. This is important since inner and outer subcarriers may experience different balances of iSPM and iXPM nonlinear distortions. Also, in terms of flexibility, in case some subcarriers are inactive due to network throughput demands, such as hitless capacity upgrades or P2MP scenarios, the parallel design of SC could be deployed more efficiently than the single connected-core architecture of CC. However, one potential drawback of this structure is that resource sharing between the equalization paths of different subcarriers is prevented.

4.3 Modular-I (M1)

In order to obtain more flexible ANN-NLC models, we move on from the black-box approach and incorporate deeper insights from the perturbation analysis of fiber nonlinearity. Specifically, the underlying mathematics behind the iSPM triplet coefficients in the perturbation analysis depends only weakly on the absolute position of a subcarrier in the spectrum [4].

Additionally, the iXPM nonlinearity mechanism relies on the relative position of the target and interfering subcarriers. Hence, only a small set of iSPM and iXPM cores needs to be trained, and multiple instances of the trained ANN cores can be deployed in the equalization path. Furthermore, smaller and more efficient networks can be deployed by involving only the iXPM contributions of the immediate neighboring subcarriers, where the iXPM contributions are strongest [4]. Figure 4 illustrates a set of ANN cores for the M1 design, where one iSPM and four iXPM cores are trained to model the intra- and inter-subcarrier nonlinearities for up to two neighboring subcarriers on each side. Note that the input to an iSPM core is a window of the target subcarrier symbols, while the iXPM cores employ symbols from both the target and interfering subcarriers.

Fig. 4. Structural design of ANN-NLC using Modular-I. (a) illustrates the trained cores and (b) illustrates the implementation for a 4-subcarrier system.

Let us take a look at an implementation of the M1 ANN equalizer based on the suggested trained modules for a DSCM system with four subcarriers. The block diagram of this modular ANN equalizer is depicted in Fig. 4(b), where four iSPM cores compensate the self-nonlinearities originating from each subcarrier. Moreover, the two inner and the two outer subcarriers additionally employ three and two iXPM cores, respectively. Note that ANN cores with the same color share the same layouts and weights, leading to more efficient training, specifically with limited data. Provided that the channel parameters, subcarrier bandwidth, and spacing remain the same, additional cores with the learned weights and biases from this example can be deployed for systems with a higher number of subcarriers. With a proper training strategy, the proposed structure allows us to separate the iSPM and iXPM contributions and informatively direct computational resources to the best route. This is evident in the numerical results, where we explore moving beyond iSPM compensation for various modular designs.

Another advantage of this modular design appears in certain scenarios, such as hitless capacity upgrades or P2MP operation, wherein certain subcarriers may be turned off. In this case, SC and especially CC models trained over a fully loaded system may not be efficiently utilized, as the statistics of the inputs to the ANN core(s) corresponding to the deactivated subcarrier(s) would be vastly different from the training stage. Additionally, it would be almost impossible to effectively identify and disable routes within the ANN that correspond to the absent subcarriers in order to save power or reduce penalty. However, a modular design can be readily reconfigured to accommodate such scenarios by deactivating the equalization paths corresponding to the absent subcarriers, leading to a flexible and power-efficient deployment.
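
The sketch below illustrates how such an M1 equalizer can be assembled for four subcarriers from one shared iSPM core and a few shared iXPM cores (weight sharing across instances corresponds to the same-color cores in Fig. 4). The `make_core` factory, the class name, and the simple summation of per-core estimates are illustrative assumptions; in practice each core would be an instance of the CNN+LSTM core of Section 3 with an input channel count matching the number of involved subcarriers.

```python
import torch
import torch.nn as nn

class ModularM1(nn.Module):
    """Modular-I equalizer sketch: shared iSPM core + shared iXPM cores per offset."""
    def __init__(self, make_core, n_subcarriers=4, offsets=(-2, -1, 1, 2)):
        super().__init__()
        self.n_sc = n_subcarriers
        self.ispm = make_core(n_sc_inputs=1)               # one shared iSPM core
        # one shared iXPM core per neighbour offset, reused for every subcarrier pair
        self.ixpm = nn.ModuleDict({str(o): make_core(n_sc_inputs=2) for o in offsets})

    def forward(self, sc_windows):
        # sc_windows[i]: (batch, 4, window) soft-symbol window of subcarrier i (I/Q, X/Y)
        estimates = []
        for i in range(self.n_sc):
            est = self.ispm(sc_windows[i])                 # intra-subcarrier contribution
            for off, core in self.ixpm.items():
                j = i + int(off)
                if 0 <= j < self.n_sc:                     # only existing neighbours contribute
                    pair = torch.cat([sc_windows[i], sc_windows[j]], dim=1)
                    est = est + core(pair)                 # accumulate iXPM contribution
            estimates.append(est)                          # nonlinear estimate for subcarrier i
        return estimates

# e.g. make_core = lambda n_sc_inputs: ANNCore(in_channels=4 * n_sc_inputs)
```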

4.4 Modular-II (M2)

The next step in the evolution of ANN-NLC for DSCM is rooted in two observations. First, the perturbation analysis [5,17,20] suggests that the iXPM perturbation coefficients $C_{m,n}^{(-\ell )}$ governing the interaction of subcarrier $i$ and its $\ell$-th neighbor on the right, $i+\ell$, are similar to those of subcarrier $i$ and its $\ell$-th neighbor on the left, $i-\ell$, provided that we employ a simple transformation, i.e.,

$$C_{m,n}^{(-\ell)} = {C_{{-}m,n}^{(\ell)}}^*,$$
where $m$ and $n$ are the symbol indices. Additionally, since these perturbation coefficients mainly rely on the relative position of the subcarriers, the similarity can be extended to subcarrier $i+\ell$ and its $\ell$-th neighbor on the left, $i$. Note that the iXPM$(+\ell )$ and iXPM$(-\ell )$ cores in M1 for subcarriers $i$ and $i+\ell$, respectively, are fed solely by inputs from these two subcarriers. This hints at potential computational savings by merging the iXPM$(+\ell )$ and iXPM$(-\ell )$ cores that operate on the same subcarriers in M1 into a super core iXPM$(\pm \ell )$, potentially obtaining a more efficient structure that preserves similar performance levels at a lower complexity.

The output features of these super-cores, along with the appropriate iSPM features, are passed to MLP modules prior to aggregation for each subcarrier. Note that the MLP layers are detached from the ANN cores in this design, and a set of $2\ell +1$ MLP modules is trained in this approach to model the integration of the iSPM features with up to $2\ell$ iXPM core features involving neighboring subcarriers. The trained MLP modules are appropriately instantiated in the inference path for each subcarrier. Figure 5 shows a block diagram of this model with four subcarriers and $\ell =2$.
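
A sketch of this idea is shown below: the shared iXPM$(\pm\ell)$ super-core features are computed once per subcarrier pair and consumed by the detached MLP heads of both involved subcarriers. The factory functions, class names, and the use of one head per subcarrier position (instead of explicitly enumerating the $2\ell+1$ head types) are simplifying assumptions for illustration.

```python
import torch
import torch.nn as nn

class ModularM2(nn.Module):
    """Modular-II sketch: merged iXPM super-cores with detached per-subcarrier MLP heads."""
    def __init__(self, make_feature_core, make_head, n_subcarriers=4, max_l=1):
        super().__init__()
        self.n_sc, self.max_l = n_subcarriers, max_l
        self.ispm = make_feature_core(n_sc_inputs=1)       # shared iSPM feature core
        # one merged iXPM(+/-l) super core per neighbour distance l
        self.super_cores = nn.ModuleDict({str(l): make_feature_core(n_sc_inputs=2)
                                          for l in range(1, max_l + 1)})
        # detached MLP heads (here one per subcarrier position in a fixed 4-SC layout)
        self.heads = nn.ModuleList([make_head(position=i) for i in range(n_subcarriers)])

    def forward(self, sc_windows):
        ispm_feats = [self.ispm(w) for w in sc_windows]    # per-subcarrier iSPM features
        pair_feats = {}
        for l in range(1, self.max_l + 1):
            for i in range(self.n_sc - l):
                pair = torch.cat([sc_windows[i], sc_windows[i + l]], dim=1)
                # computed once per pair, reused by both subcarriers i and i + l
                pair_feats[(i, i + l)] = self.super_cores[str(l)](pair)
        estimates = []
        for i in range(self.n_sc):
            feats = [ispm_feats[i]]
            for l in range(1, self.max_l + 1):
                if i - l >= 0:
                    feats.append(pair_feats[(i - l, i)])
                if i + l < self.n_sc:
                    feats.append(pair_feats[(i, i + l)])
            estimates.append(self.heads[i](torch.cat(feats, dim=-1)))
        return estimates
```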

Fig. 5. Modular-II design for DSCM ANN-NLC.

In summary, the potential efficiency advantage of M2 is twofold. First, merging cores that are believed to contain a significant amount of shared computation for feature generation can increase model efficiency. Second, reducing the number of distinct parameters of a network by replicating trained modules can greatly improve training efficiency and model generalization. Also, as mentioned before, the modular design provides additional flexibility in crafting more intelligent solutions for different network operation scenarios.

5. Numerical results

5.1 System model

The simulation setup includes typical Tx, channel, and Rx modules for a DSCM transmission scenario. To focus on fiber nonlinearity, we consider ideal electrical components and an ideal Mach-Zehnder modulator. Additionally, the DACs/ADCs are ideal, with no quantization or clipping effects. The dual-polarization fiber channel is modeled by the split-step Fourier method [21] with adaptive step size and a maximum nonlinear phase rotation of 0.05 degrees to ensure sufficient accuracy. At the Rx side, the output sequence from the carrier recovery (CR) is used to train and evaluate the nonlinear equalizer. Standard DSP algorithms are employed for detection and processing of the received signal at the Rx. The block diagram of this system is depicted in Fig. 6. Note that, to retain the ability of a conventional coherent receiver to correct correlated phase noise (which here originates from nonlinear propagation), we deployed the carrier recovery before the ANN-NLC module. This ensures that the linear equalization already provides the nonlinear phase compensation capability of a coherent receiver without a dedicated NLC module. Hence, the reported ANN-NLC gains are relative to the best linear performance.

Fig. 6. System model for DSCM system.

To evaluate and optimize the different algorithms, we focus on a single-channel DSCM system operating at 32 Gbaud with four subcarriers and a uniform 16QAM modulation format. The signal on each subcarrier is digitally generated using a root-raised-cosine pulse shape with a roll-off factor of $1/16$. The link consists of 40 spans of standard single-mode fiber of 80 km length, each followed by an optical amplifier with a $6$ dB noise figure. Furthermore, for most of the numerical results we consider a symmetric dispersion map, in which 50% of the total dispersion is digitally pre-compensated at the transmitter side. This in turn allows us to simplify the diagrams and avoid unnecessary complications at this stage. Section 6 is devoted to the extension of this design to other dispersion maps, where we provide ANN-NLC structures optimized for a post chromatic dispersion compensation (CDC) scenario. The training and evaluation of the models are performed using datasets obtained at $2$ dBm launch power. This is close to the optimal launch power when DBP at 2 Sa/sym with 1 and 2 steps per span is employed to benchmark these results. Note that for this setup, a Q-factor of $Q=7.88$ dB is obtained at the optimal launch power of $1$ dBm in the absence of fiber-nonlinearity compensation.

5.2 ANN optimization workflow

All the models here are trained and evaluated on simulation data using $2^{18}$ symbols per digital subcarrier. The training and evaluation data are generated from pseudo-random streams with different generator seeds using a permuted congruential generator (PCG64). Also, $20\%$ of the training dataset was set aside for validation of the model during the training process. The root mean squared error (RMSE) between the model outputs and the nonlinear error (the difference between the transmitted symbols and the received values) constitutes the loss, which is used in the back-propagation process to update the model coefficients. All models were trained using the Adam optimizer with a learning rate of $0.001$ for at least $200$ epochs, unless terminated by the early-stopping mechanism that tracks the validation loss and prevents over-fitting. We mainly used mini-batches of length $512$ in obtaining these results. Minor performance differences were observed when exploring mini-batch sizes as low as $128$ and as high as $2048$, provided that the learning rate and the number of epochs were optimized accordingly. Additionally, we employed a learning-rate scheduler that reduces the learning rate by $20\%$ when the loss stops decreasing for 10 epochs. For each model, the coefficients associated with the lowest validation loss across all training epochs were saved at the end of the training stage. Note that the code for this simulation setup, along with the ANN algorithms, is implemented in Python using the PyTorch library.
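
A condensed PyTorch sketch of this workflow is given below. The optimizer, learning rate, scheduler factor, and best-checkpoint selection follow the description above; the early-stopping patience of 10 epochs, the loader variables, and the function signature are assumptions made for illustration.

```python
import copy
import torch

def train(model, train_loader, val_loader, epochs=200, patience=10):
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    # cut the learning rate by 20% when the tracked loss plateaus for 10 epochs
    sched = torch.optim.lr_scheduler.ReduceLROnPlateau(opt, factor=0.8, patience=10)
    best_loss, best_state, stall = float("inf"), None, 0
    for epoch in range(epochs):
        model.train()
        for x, err in train_loader:                  # err = tx_symbols - rx_soft_symbols
            opt.zero_grad()
            loss = torch.sqrt(torch.mean((model(x) - err) ** 2))   # RMSE loss
            loss.backward()
            opt.step()
        model.eval()
        with torch.no_grad():                        # validation RMSE over the held-out 20%
            val = torch.sqrt(torch.mean(torch.cat(
                [(model(x) - err) ** 2 for x, err in val_loader]).flatten()))
        sched.step(val)
        if val < best_loss:                          # keep the best-validation checkpoint
            best_loss, best_state, stall = val.item(), copy.deepcopy(model.state_dict()), 0
        else:
            stall += 1
            if stall >= patience:                    # simple early stopping (assumed patience)
                break
    model.load_state_dict(best_state)
    return model
```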

In order to explore the performance versus complexity tradeoff, more than a thousand models of each design are trained. After training, each model is tested over different block sizes. Table 1 lists the ANN-core hyper-parameters and their sweeping ranges. The sweeping resolution of each parameter within each participating ANN core is individually adjusted for each model structure. We use scatter plots reflecting the performance-complexity of different realizations of each model based on a common test dataset obtained from a separate transmission simulation using noise and bit sequences generated by different random-number generator algorithms and seeds. The envelope associated with the best-performing models at various complexity constraints is generated in order to compare the different architectures.
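
As an aside, such a performance-complexity envelope can be obtained with a simple Pareto-front pass over the scattered (complexity, Q-factor) points; the helper below is purely illustrative post-processing, not the authors' code.

```python
def performance_envelope(points):
    """points: iterable of (rmps, q_db) tuples -> Pareto front sorted by complexity."""
    best_q = float("-inf")
    front = []
    for rmps, q in sorted(points):      # ascending complexity
        if q > best_q:                  # keep only realizations that improve the best Q so far
            front.append((rmps, q))
            best_q = q
    return front
```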

Table 1. List of hyper-parameters for ANN core that operates on a sequence length $T=2t+1$ with $t \in [5:40]$.

5.3 Numerical results comparison

Here, we provide a performance versus complexity tradeoff comparison of the various optimized ANN equalizers over different block sizes. Figure 7 illustrates the inference cost of the various models in terms of RMpS. From an ANN design point of view, it is important to allocate additional complexity efficiently in order to improve performance, since the majority of the models demonstrate subpar efficiency. For example, increasing the hidden size of the LSTM may not be an efficient strategy to improve performance if the filter-tap size $k$ is not large enough to capture the nonlinear memory.

Fig. 7. A comparison of performance as a function of RMpS among different explored ANN-NLC solutions for DSCM.

It can be seen that using a separate ANN core for each subcarrier did not significantly change the outcome of SC compared to CC. Their best performance remains around 8.8 dB, and the performance-complexity tradeoffs for these models remain very similar for all block sizes. One can clearly observe various advantages of the modular solutions compared to the black-box approaches represented by CC and SC. Both modular solutions offer a clear advantage in both the low- and high-complexity regions, while the M2 structure, specifically, demonstrates a superior performance-complexity tradeoff across all complexity regions among all structures. Note that the performance of iSPM-only compensation is capped around 8.6 dB. Employing additional cores to compensate for the inter-subcarrier nonlinearities due to the immediate neighboring subcarrier ($\ell =1$) on each side (iXPM1) can significantly increase the maximum performance to around 9 dB, unlocking 0.4 dB of gain compared to iSPM compensation at 2 dBm launch power. We further explored another scenario by incorporating the iXPM contributions of two subcarriers from each side. However, the results are omitted as we did not observe a meaningful additional performance gain in this scenario. This result is corroborated by the findings in Section 6, where we show the perturbation coefficients corresponding to the iXPM contributions of the second neighbors for this setup; the magnitude of these coefficients is around 10 dB lower than the iXPM contributions from the immediate neighbors.

Note that the best performance obtained from the modular solutions is generally 0.2 dB higher than CC and SC. This suggests that these solutions can learn more efficiently from limited training data due to a more generalized structure with fewer trainable parameters. The tradeoff between performance and complexity in the mid-tier performance region with Q around 8.6 dB is particularly noteworthy, where the non-modular designs can compete with M1. Note that this region is the onset of switching away from iSPM-only NLC to incorporating iXPM nonlinearities from the immediate neighboring subcarriers. This suggests that the CC and SC architectures can converge to moderately efficient structures by internally sharing the resources of iXPM compensation between neighboring subcarriers. This type of resource sharing is one of the main distinctive features of the M2 model compared to M1, which is reflected in its superior efficiency in this region.

In order to demonstrate the advantages of block-processing, performance versus complexity evaluations for different block sizes are illustrated in Fig. 8 for the M2 model as an example. A substantial complexity reduction for a very minimal performance loss can be obtained by parallelizing the trained ANN core and deploying the solution with a block size $N>1$, provided that the model is sufficiently generalized in the training stage. In the high-performance region ($Q>8.8$ dB), we can achieve a complexity reduction by a factor of $20$ for $N=1024$. However, the complexity advantage shrinks in lower performance regions (e.g., a factor of $5$ for $Q\sim 8.4$ dB), where the best models generally have a lower filter-tap size and incorporate less nonlinear memory.

Fig. 8. Impact of block-size on performance vs. complexity of the best M2 models.

Next, the performance envelopes for all models as a function of the number of training parameters are depicted in Fig. 9. The number of training parameters is related to the memory required to store and retrieve model parameters as the link configuration is modified over time. This metric also measures the efficiency of a model in providing a certain performance level with the fewest independent parameters, which is closely tied to the generalization of the ANN. For a mid-tier performance of around 8.6 dB, the modular solutions generally require approximately 2 to 4 times fewer parameters than CC and SC. Note that the CC and SC solutions in principle have access to all subcarrier information and are not limited to the iSPM+iXPM1 architectures of M1 and M2. However, this assumed advantage results in a significant loss for the CC and SC solutions when the number of training parameters is below 40,000. We attempted to close this performance gap by increasing the number of epochs for the non-modular solutions and further optimizing the learning rates, without much success. This may indicate that practical ANN design in the presence of various limitations and constraints for this problem is far from a plug-and-play exercise and requires careful design using insights from the physical model.

Fig. 9. A comparison of performance as a function of number of training parameters amongst different explored ANN-NLC solutions for DSCM.

Next, we explore the applicability of the proposed models to similar links operating at different optical launch powers. Figure 10 illustrates the performance as a function of the optical launch power, where multiple graphs are presented for the best models obtained under different complexity budget constraints. As stated earlier, all models were trained at $2$ dBm optical launch power. Note that the selected models from all structures demonstrate good generalization and can provide nonlinear performance gains over a wide range of launch powers, spanning from the linear regime to the deeply nonlinear regime. We provide DBP plots with different numbers of steps per span (StPS) to benchmark the performance of the various proposed ANN-NLC structures at different complexity levels. Unlike the ANN-NLC solutions that operate at the symbol rate, DBP operates at 2 Sa/sym to maintain its performance. Note that a complexity comparison with DBP in terms of RMpS is not performed here, since DBP faces implementation challenges beyond multiplication complexity, as mentioned in Section 1.

Fig. 10. Performance of different ANN-NLC solutions as a function of optical launch power for given complexity constraint budgets. DBP results with 1 and 2 steps per span are included in all panels as a performance reference.

Finally, Fig. 11 illustrates the constellation diagrams for all subcarriers with and without nonlinear compensation using M2. Note that the proposed ANN-NLC solution clearly improves the signal quality of all subcarriers without introducing constellation artifacts. Additionally, including a feed-forward carrier recovery (such as the maximum-likelihood (ML)-CR [22]), which is commonly part of the linear equalization, has a significant impact on the overall performance of the system. In this example, the ML-CR partially mitigates the nonlinear phase noise and improves the signal quality even without a dedicated nonlinear equalizer. This is evident by comparing the constellation diagrams for linear equalization with and without the ML-CR stage. The ANN-NLC improves the performance by addressing the remaining nonlinear phase and amplitude distortions.

Fig. 11. Impact of ML-CR and fiber nonlinearity compensation (M2 iSPM+iXPM1) on the signal constellation.

6. Impact of dispersion map

So far, we have shown the application of ANN-NLC equalizers in transmission scenarios with a symmetric dispersion map. As depicted in Fig. 12, the windows of symbols of interest from the target and interfering subcarriers for the iSPM and iXPM triplet features are symmetric around the reference symbols in a symmetric dispersion map. This is the main reason that symmetric windows of soft values are selected as inputs to the iSPM and iXPM cores in the previous designs. However, in the presence of an asymmetric dispersion map, such as post dispersion compensation, the regions of most significant iXPM features are neither symmetric nor centered around the reference symbol of the interfering subcarrier, as shown in Fig. 13. Hence, one needs to adjust the input features for each iXPM core according to the dispersion-induced group delay between the involved subcarriers. Another approach is to introduce delay lines at the input and output of the ANN equalizer and maintain a symmetric window of inputs for the ANN cores. Specifically, to ensure proper operation of the equalizer in this case, we introduce a progressive delay amounting to half of the dispersion-induced group delay between subcarriers prior to the ANN equalization. To reverse this impact, another delay line is added at the output of the ANN equalizer. Note that the window size for each iXPM core needs to be as large as the maximum group delay between the associated subcarriers. This ensures that the symbols that impact the target symbol are appropriately included. Figure 14 illustrates a block diagram of this solution.
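
The progressive delay and its inverse can be implemented as simple per-subcarrier symbol shifts, as in the sketch below; the `group_delay_syms` values (per-subcarrier dispersion-induced group delays in symbols, relative to a reference subcarrier) are assumed to be computed from the link parameters, and the circular shift stands in for a real delay line.

```python
import numpy as np

def apply_progressive_delay(subcarrier_symbols, group_delay_syms, undo=False):
    """Shift each subcarrier stream by half of its dispersion-induced group delay."""
    shifted = []
    for sc, tau in zip(subcarrier_symbols, group_delay_syms):
        shift = int(round(tau / 2))
        shifted.append(np.roll(sc, -shift if undo else shift))  # circular shift as a stand-in
    return shifted

# usage: align before the ANN equalizer, equalize, then undo the alignment
# aligned = apply_progressive_delay(sc_streams, gd_syms)
# equalized = ann_nlc(aligned)                  # hypothetical equalizer call
# restored = apply_progressive_delay(equalized, gd_syms, undo=True)
```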

Fig. 12. Magnitude of iSPM ($\ell =0$) and iXPM ($\ell \not =0$) perturbation coefficients $C_{m,n}^{(\ell )}$ for the DSCM simulation setup with sym-CDC: (a) $\ell =-2$, (b) $\ell =-1$, (c) $\ell =0$, (d) $\ell =1$, (e) $\ell =2$.

Fig. 13. Magnitude of iSPM ($\ell =0$) and iXPM ($\ell \not =0$) perturbation coefficients $C_{m,n}^{(\ell )}$ for the DSCM simulation setup with post-CDC: (a) $\ell =-2$, (b) $\ell =-1$, (c) $\ell =0$, (d) $\ell =1$, (e) $\ell =2$.

Fig. 14. Delay adjustment for post-CDC dispersion map.

We have modified the simulation setup to provide a performance comparison of a few ANN-NLC structures for the symmetric- and post-CDC scenarios. Similar trends are observed for the CC and M2 solutions with post-CDC in Fig. 15, attesting to the applicability of the proposed solution for addressing asymmetric dispersion maps. Note that similar performance gains are achieved by switching from iSPM to iSPM+iXPM1 nonlinear equalization for these schemes. Also, we observe that the complexity of all NLC solutions with post-CDC is higher than that of their respective counterparts with symmetric CDC for a given performance level. This can be attributed to the larger memory of the iSPM and iXPM nonlinearities in the link with post-CDC, which is corroborated by comparing the extent and magnitude of the perturbation coefficients in Fig. 12 and Fig. 13.

Fig. 15. Comparison on the impact of dispersion map on effectiveness of ANN-NLC using envelope associated to best performing models at different block-sizes.

7. Conclusion

In this work, we studied different ANN approaches for the compensation of intra-channel nonlinearities in DSCM systems. By training and evaluating various models over a comprehensive grid of parameters, we explored the performance-complexity tradeoff of each approach and discussed their scalability, potential, and weaknesses. Starting from black-box approaches to designing ANN models, we gradually moved towards modular designs inspired by the perturbation analysis of fiber nonlinearity. This approach proved to be more efficient in terms of training as well as inference complexity and model storage requirements. We further demonstrated a pragmatic approach to adapting the proposed solutions to links with asymmetric dispersion maps. While these networks were exclusively designed for fiber nonlinearity compensation, a similar approach can be further studied in the context of component nonlinearity compensation in DSCM systems.

Note that all these designs can be further optimized along other avenues. Notable approaches such as weight pruning and quantization, as well as a future extension to quantization-aware training in the form of quantized [12] and binary [23] neural networks, can be explored to drastically reduce the complexity of these models. Nevertheless, we believe that the presented study provides a fair comparison and a good starting point towards that goal by focusing on the macro design of ANN equalizers tailored to the characteristics of the fiber nonlinearity distortion mechanism in multi-subcarrier systems. Furthermore, for real-world application of the presented solutions, the models can be initially trained on simulation and offline data and then adapted to practical scenarios through transfer learning [24].

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

Supplemental document

See Supplement 1 for supporting content (additional graphs omitted from the paper).

References

1. E. Ip and J. M. Kahn, “Compensation of dispersion and nonlinear impairments using digital backpropagation,” J. Lightwave Technol. 26(20), 3416–3425 (2008). [CrossRef]  

2. L. B. Du and A. J. Lowery, “Improved single channel backpropagation for intra-channel fiber nonlinearity compensation in long-haul optical communication systems,” Opt. Express 18(16), 17075–17088 (2010). [CrossRef]  

3. E. F. Mateo, F. Yaman, and G. Li, “Efficient compensation of inter-channel nonlinear effects via digital backward propagation in WDM optical transmission,” Opt. Express 18(14), 15144–15154 (2010). [CrossRef]  

4. A. Mecozzi and R.-J. Essiambre, “Nonlinear Shannon limit in pseudolinear coherent systems,” J. Lightwave Technol. 30(12), 2011–2024 (2012). [CrossRef]  

5. Z. Tao, L. Dou, W. Yan, L. Li, T. Hoshida, and J. C. Rasmussen, “Multiplier-free intrachannel nonlinearity compensating algorithm operating at symbol rate,” J. Lightwave Technol. 29(17), 2570–2576 (2011). [CrossRef]  

6. S. Zhang, F. Yaman, K. Nakamura, T. Inoue, V. Kamalov, L. Jovanovski, V. Vusirikala, E. Mateo, Y. Inada, and T. Wang, “Field and lab experimental demonstration of nonlinear impairment compensation using neural networks,” Nat. Commun. 10, 1–8 (2019). [CrossRef]  

7. C. Häger and H. D. Pfister, “Nonlinear interference mitigation via deep neural networks,” in Optical Fiber Communication Conference (Optical Society of America, 2018), paper W3A–4.

8. O. Sidelnikov, A. Redyuk, S. Sygletos, M. Fedoruk, and S. Turitsyn, “Advanced convolutional neural networks for nonlinearity mitigation in long-haul WDM transmission systems,” J. Lightwave Technol. 39(8), 2397–2406 (2021). [CrossRef]  

9. S. Deligiannidis, A. Bogris, C. Mesaritakis, and Y. Kopsinis, “Compensation of fiber nonlinearities in digital coherent systems leveraging long short-term memory neural networks,” J. Lightwave Technol. 38(21), 5991–5999 (2020). [CrossRef]  

10. P. J. Freire, Y. Osadchuk, B. Spinnler, A. Napoli, W. Schairer, N. Costa, J. E. Prilepsky, and S. K. Turitsyn, “Performance versus complexity study of neural network equalizers in coherent optical systems,” J. Lightwave Technol. 39(19), 6085–6096 (2021). [CrossRef]  

11. B. B. Hamgini, H. Najafi, A. Bakhshali, and Z. Zhang, “Application of transformers for nonlinear channel compensation in optical systems,” arXiv, arXiv:2304.13119 (2023). [CrossRef]  

12. I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, “Quantized neural networks: Training neural networks with low precision weights and activations,” The J. Mach. Learn. Res. 18, 6869–6898 (2017).

13. J. W. Nevin, S. Nallaperuma, N. A. Shevchenko, X. Li, M. S. Faruk, and S. J. Savory, “Machine learning for optical fiber communication systems: An introduction and overview,” APL Photonics 6(12), 121101 (2021). [CrossRef]  

14. D. Krause, A. Awadalla, A. S. Karar, H. Sun, and K.-T. Wu, “Design considerations for a digital subcarrier coherent optical modem,” in Optical Fiber Communications Conference and Exhibition (2017), pp. 1–3.

15. M. Qiu, Q. Zhuge, M. Chagnon, Y. Gao, X. Xu, M. Morsy-Osman, and D. V. Plant, “Digital subcarrier multiplexing for fiber nonlinearity mitigation in coherent optical communication systems,” Opt. Express 22(15), 18770–18777 (2014). [CrossRef]  

16. P. K. A. Wai and C. R. Menyuk, “Polarization mode dispersion, decorrelation, and diffusion in optical fibers with randomly varying birefringence,” J. Lightwave Technol. 14(2), 148–157 (1996). [CrossRef]  

17. Y. Gao, J. C. Cartledge, A. S. Karar, S. S.-H. Yam, M. O’Sullivan, C. Laperle, A. Borowiec, and K. Roberts, “Reducing the complexity of perturbation based nonlinearity pre-compensation using symmetric EDC and pulse shaping,” Opt. Express 22(2), 1209–1219 (2014). [CrossRef]  

18. P. J. Freire, A. Napoli, B. Spinnler, N. Costa, S. K. Turitsyn, and J. E. Prilepsky, “Neural networks-based equalizers for coherent optical transmission: Caveats and pitfalls,” IEEE J. Sel. Top. Quantum Electron. 28(4), 1–23 (2022). [CrossRef]  

19. H. Ming, X. Chen, X. Fang, L. Zhang, C. Li, and F. Zhang, “Ultralow complexity long short-term memory network for fiber nonlinearity mitigation in coherent optical communication systems,” J. Lightwave Technol. 40(8), 2427–2434 (2022). [CrossRef]  

20. F. Frey, L. Molle, R. Emmerich, C. Schubert, J. K. Fischer, and R. F. Fischer, “Single-step perturbation-based nonlinearity compensation of intra-and inter-subcarrier nonlinear interference,” in European Conference on Optical Communication (IEEE, 2017), p. P1.SC3.53.

21. G. P. Agrawal, Nonlinear Fiber Optics, 2nd ed. (Academic, 1995).

22. X. Zhou, “An improved feed-forward carrier recovery algorithm for coherent receivers with m-qam modulation format,” IEEE Photonics Technol. Lett. 22(14), 1051–1053 (2010). [CrossRef]  

23. C. Yuan and S. S. Agaian, “A comprehensive review of binary neural network,” arXiv, arXiv:2110.06804 (2022). [CrossRef]  

24. P. J. Freire, D. Abode, J. E. Prilepsky, N. Costa, B. Spinnler, A. Napoli, and S. K. Turitsyn, “Transfer learning for neural networks-based equalizers in coherent optical systems,” J. Lightwave Technol. 39(21), 6733–6745 (2021). [CrossRef]  
