Optical performance monitoring using digital coherent receivers and convolutional neural networks

Hyung Joon Cho; Siddharth Varughese; Daniel Lippiatt; Richard Desalvo; Sorin Tibuleac; Stephen E. Ralph

doi:10.1364/OE.406294

1. Introduction

Global IP traffic has been doubling every three years, and this growth is expected to continue due to increasing capacity demands from wired customers, both residential and business [1–2], as well as from significant increases in mobile broadband traffic, the latter driven by an increased number of users and the deployment of 5G. Importantly, no matter how a user connects to the internet infrastructure, these signals are quickly aggregated within an optical network via wireless backhaul or campus enterprise and eventually into the core optical network. To meet these demands, both the line rates and spectral efficiency of all optical links continue to increase, enabled by complex signaling techniques implemented at both the transmitter and receiver.

While today’s high capacity dense wavelength division multiplexing (DWDM) networks allow for increased aggregate data rates, the tight optical channel spacing which characterizes these networks impedes traditional monitoring methods such as optical signal-to-noise ratio (OSNR) monitoring via conventional spectrum analysis. Likewise, capacity maximization of these systems depends on many other parameters aside from OSNR such as launch power and modulation format (MF), which are now dynamically optimized using intelligent transport equipment. However, these performance parameters are difficult to assess and real-time deterministic models are computationally very expensive. Therefore, in order to deploy and maintain optimal operation of these links and the entire network as a whole, it is increasingly necessary to create and implement real-time signal monitors that may be less dependent on the complex demodulation algorithms to ensure unbiased monitoring of the received signals.

Until recently, studies on optical performance monitoring (OPM) have relied on traditional digital signal processing (DSP) techniques. These DSP-based OPMs, however, generally require iterative computation for each metric monitored [3] or may require significant overhead to extract the required information [4]. This can be computationally expensive and might not be feasible for real-time monitoring. Recently, machine learning (ML) methods have received considerable attention in the field of optical communication systems [5–6]. These include applications in optical network fault management systems [7–9], linear and nonlinear OSNR estimation [10,11], modulation format identification (MFI) [12–14], and OPM [15–18]. The majority of these works utilized a deep neural network (DNN) structure, which generally requires significant pre-processing of the input data sets to enable the neural network (NN) to extract learnable features. Convolutional neural networks (CNNs), on the other hand, include filters (convolutions and pooling) which are jointly optimized with the NN to enable classification. CNNs are known to work well for image classification; hence, we surmised that CNNs are a good choice for classifying Stokes-space and I-Q constellations. Indeed, we previously demonstrated accurate OSNR estimation using ML techniques employing only I-Q constellation images [19]. These efforts have shown the potential of ML for performance monitoring; however, the full benefits and tradeoffs of where in the demodulation path to extract signal metrics and which ML methods are optimum have not been explored. Receiver-side demodulation algorithms associated with digital coherent receivers (DCRs) provide multiple points where signal feature extraction is possible; however, as the signal traverses the different DSP modules, signal-like features are enhanced and distortion-like features are suppressed. Additionally, extracting information earlier in the demodulation may be more beneficial, as it allows one to use this information for validating and optimizing later demodulation stages. Thus, there may exist an application-based performance tradeoff when applying ML techniques at various demodulation stages.

Here, we demonstrate the use of convolutional neural networks (CNNs) to estimate OSNR and BER and to identify MF from received signals at two distinct points within the demodulation process. In the first case, we use three-dimensional (3D) Stokes-space constellation density matrices obtained after the signal has undergone chromatic dispersion (CD) compensation and timing recovery. Using 3D-CNN at the receive side of a multi-channel experimental transmission system, we demonstrate MF identification with >99.8% accuracy, OSNR estimation with <0.5 dB average discrepancy and BER estimation with percentage error of <25% for all formats and all OSNRs investigated (percent error is computed as the mean absolute percent error). In the second case, we apply 2D-CNN to two-dimensional (2D) I-Q constellation density matrices obtained after the signal has gone through all demodulation processes except symbol unmapping. All investigated techniques were validated on both simulated and experimental waveforms. Our results demonstrate that important signal metrics can be effectively extracted at multiple stages within the demodulation process, enabling robust OPM that can be used to optimize and validate the demodulation process and to identify changes in network performance.

2. Methodology

2.1 Data preprocessing

The distinct modules of demodulation algorithms for conventional DCRs along with the two locations where features are extracted for our OPM techniques are shown in Fig. 1. The DSP algorithms are grouped into MF-independent and MF-dependent modules. The earliest stage chosen for feature extraction is immediately after timing recovery but prior to polarization demultiplexing where CD compensation has already occurred. Although CD compensation and timing recovery might be implemented within an ML paradigm, it is unlikely an ML approach would be more efficient than the prescriptive algorithms [20]. Choosing to perform OSNR and BER monitoring at this earlier stage of the demodulation chain ensures minimal impact from subsequent modules which may enhance noise or reduce signal content particularly when not operating optimally. The second point of feature extraction investigated is near the end of the demodulation process immediately prior to symbol unmapping. This point is likely to be readily available in many commercial systems and we show that features extracted here can provide accurate estimates of the signal’s OSNR which can be used in combination with other metrics, FEC and estimated BER for example, to validate the performance of the demodulation algorithms. Other link metrics beyond those extracted here may also be available from the accessed waveforms. Furthermore, whether extracted after timing recovery or just prior to symbol demapping, these metrics can be used to inform the control layer of the status of the optical network to enable proactive maintenance.

Fig. 1. DSP configuration within a digital coherent receiver for dual polarization-QAM transmission systems showing two distinct stages where waveforms are accessed to identify modulation format, and estimate OSNR and BER; 3D-CNN after timing recovery and 2D-CNN before symbol unmapping.

Depicting signals in Stokes-space reveals unique constellation patterns for different MFs. These patterns are resistant to OSNR variations since these variations do not affect the centroids of the clusters in Stokes-space. Additionally, performing OPM in Stokes-space allows us to calculate useful signal metrics before performing the subsequent demodulation steps. Performance monitoring using projections of Stokes-space constellations onto two dimensions has been previously demonstrated [21,22]. However, these 2D projections may underestimate spatial variations due to polarization mode dispersion (PMD) and polarization dependent loss (PDL) [23,24]. Therefore, we choose to use 3D Stokes-space constellations to minimize information loss and minimize sensitivity to PMD and PDL. We use conventional Stokes-space mapping [23] and 40,000 symbols to generate 3D constellation density matrices for each waveform. The 3D matrices are first scaled to ensure all symbols fit within the matrix: we use the lowest-OSNR waveforms to set the scaling and leave this fixed for all OSNRs for each MF. We explored bin sizes from 10 × 10 × 10 to 40 × 40 × 40. The smaller numbers of bins did not capture the density variations sufficiently well, and the larger numbers of bins increased processing times without providing any performance improvements. Therefore, we chose 30 × 30 × 30 as the optimal tradeoff between performance and processing times. Stokes-space-based density matrices for four different MFs and two different OSNRs are shown in Figs. 2(a)-(d). To deal with the random rotation of polarization during signal transmission, which could affect the constellation density distributions in 3D space, we applied principal component analysis (PCA), which is widely used to divide the received symbols into subspaces that include the principal components, to the timing-recovered symbols [25]. We utilized the score values on each of the three orthogonal axes created by the PCA process to ensure a fixed reference axis for different Stokes-space constellations. Indeed, this is equivalent to projecting rotated images on a fixed reference axis. The conventional I-Q constellation density matrices used in our 2D-CNN have a bin size of 30 × 30, Figs. 2(e)-(h).
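
The preprocessing described above can be sketched as follows. This is a minimal illustration rather than the processing chain used in this work: the function name `stokes_density_matrix`, the synthetic DP-QPSK test signal, the noise level, and the sign convention chosen for the third Stokes parameter are all assumptions made for the example. PCA is carried out with a singular value decomposition, and the scores along the three principal axes are binned into a 30 × 30 × 30 matrix.

```python
import numpy as np

def stokes_density_matrix(ex, ey, bins=30, limit=None):
    """Map dual-polarization samples to Stokes space, align the axes with PCA,
    and bin the PCA scores into a bins x bins x bins density matrix."""
    # Stokes parameters S1, S2, S3 (the sign of S3 depends on convention).
    s1 = np.abs(ex) ** 2 - np.abs(ey) ** 2
    s2 = 2.0 * np.real(ex * np.conj(ey))
    s3 = -2.0 * np.imag(ex * np.conj(ey))
    points = np.stack([s1, s2, s3], axis=1)

    # PCA via SVD of the centered points; the scores are the coordinates of
    # each symbol along the three orthogonal principal axes.
    centered = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt.T

    # Symmetric range large enough to hold every symbol. Passing a fixed
    # `limit` (e.g. from the lowest-OSNR waveform) keeps the scaling constant.
    if limit is None:
        limit = np.abs(scores).max()
    edges = [np.linspace(-limit, limit, bins + 1)] * 3
    density, _ = np.histogramdd(scores, bins=edges)
    return density, limit

def complex_noise(rng, n, sigma):
    return sigma * (rng.standard_normal(n) + 1j * rng.standard_normal(n))

# Synthetic DP-QPSK test signal (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
n_symbols = 40_000
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
ex = rng.choice(qpsk, n_symbols) + complex_noise(rng, n_symbols, 0.1)
ey = rng.choice(qpsk, n_symbols) + complex_noise(rng, n_symbols, 0.1)

density, limit = stokes_density_matrix(ex, ey)
print(density.shape, int(density.sum()))
```

Returning `limit` allows the scaling found for one waveform to be reused for others, which mirrors the fixed scaling per MF described above.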

Fig. 2. (a)-(d) 3D density matrices of Stokes-space constellations from two different views and (e)-(h) 2D density matrices of conventional I-Q constellations for various modulation formats and OSNRs. Only x-polarized constellations are shown in (e)-(h).

Lastly, we note that our Stokes-space and I-Q images are formed from contiguous samples of the data stream. However, these images can be formed with equal integrity using downsampled data. Thus, access, either immediately after timing recovery or post demodulation, can be implemented with low data rate paths.

2.2 Convolutional neural networks

CNNs are a type of neural network that are widely used in image processing and classification [26,27]. They find useful relations between input images and output classification labels by assigning learnable weights and biases to various parts of the image through the application of a set of kernels and nonlinear activation functions. The dimension of the CNN is the dimension of the input image; we use 3D-CNN to analyze Stokes-space-based constellations and 2D-CNN to analyze conventional I-Q constellations.

The 3D-CNN employed here consists of 3 convolutional layers where the kernels are applied, 3 pooling stages where the feature size is reduced, and 2 fully-connected (FC) layers that provide the final classification/regression, Fig. 3. The numbers of kernels, pooling stages (and their step size; stride) and layers are hyperparameters that are chosen by the user heuristically. We zero-padded the input to the first convolution layer to 33 × 33 × 33 to minimize information loss in the periphery of the constellations. Hence the size of the first layer feature map is 30 × 30 × 30. Each convolutional layer is followed by a batch normalization layer and an activation function. The batch normalization layer adjusts the feature maps to zero mean and unit variance to prevent overfitting [28]. The activation function is a nonlinear rectified linear unit (ReLU) that extracts features. The ReLU is a piecewise linear function that outputs the input if it is positive, and outputs zero if it is negative [29,30].
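
The size bookkeeping and the two per-layer operations described above can be illustrated with a short sketch. The helper names below are hypothetical, and `batch_normalize` is a simplification: it normalizes a single feature map and omits the learnable scale and offset as well as the batch statistics used in practice.

```python
import numpy as np

def conv_output_size(n_in, kernel, stride=1, dilation=1):
    """Length of one axis after a convolution applied to an already padded input."""
    effective_kernel = dilation * (kernel - 1) + 1
    return (n_in - effective_kernel) // stride + 1

def batch_normalize(x, eps=1e-5):
    """Shift and scale a feature map to zero mean and unit variance (simplified)."""
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def relu(x):
    """Pass positive inputs unchanged and set negative inputs to zero."""
    return np.maximum(x, 0.0)

# A 30-point axis zero-padded to 33 points and convolved with a 4-point kernel
# returns to 30 points, matching the first-layer feature map size.
first_layer_size = conv_output_size(33, kernel=4)
print(first_layer_size)

activated = relu(batch_normalize(np.array([-2.0, -1.0, 0.0, 1.0, 2.0])))
print(activated)
```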

Fig. 3. 3D-CNN architecture for modulation format identification, and OSNR and BER estimation. Three convolutional layers and 3 corresponding pooling layers were implemented. As an example, the initial 30 × 30 × 30 Stokes-space density matrix is convolved with 36 4 × 4 × 4 kernels yielding 36 3D feature maps. For this first layer, the density matrix is padded to capture information in the periphery of the constellations. Each convolutional layer is followed by a batch normalization layer to mitigate overfitting and a nonlinear rectified linear unit to extract features (not shown). Pooling reduces the matrix size and is accomplished by averaging over a 2 × 2 × 2 window that is scanned with a 2 × 2 × 2 step size (i.e., stride). Kernels are initialized using the Glorot method to prevent vanishing or exploding gradients during the training process [31,32]. Dropout layers with a factor of 0.1 are implemented to prevent overfitting. MF is identified as one of 4 output neurons representing each format. OSNR/BER is estimated through CNN models trained with regression fitting.

Each pooling layer reduces the feature map size by using a sliding window with a specific stride, typically equal to or less than the window size, assigning a single value at each step, and thus downsampling the input while preserving features and reducing computational complexity. Although the pooled images may have specific structure [13], these images do not often add much understanding to the CNN results. We use average pooling here to accommodate uncertainties from residual Stokes-space image rotation that may not be fully compensated by PCA. We ensured the density variations were captured by using appropriately small kernel sizes. The global average pooling layer (stride equals window size) after the third convolution stage computes the overall mean of its input. Finally, the features learned in the last convolutional layer are processed by two fully-connected layers and signal metrics such as MF, OSNR and BER are computed.
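
A minimal sketch of average pooling with the stride equal to the window size, and of global average pooling, is given below. The function names and the test array are illustrative assumptions; deep learning frameworks provide equivalent layers.

```python
import numpy as np

def average_pool_3d(x, window=2):
    """Non-overlapping average pooling (stride equal to the window size)."""
    d0, d1, d2 = (s // window for s in x.shape)
    # Drop any trailing samples that do not fill a complete window.
    x = x[: d0 * window, : d1 * window, : d2 * window]
    blocks = x.reshape(d0, window, d1, window, d2, window)
    return blocks.mean(axis=(1, 3, 5))

def global_average_pool(x):
    """Reduce a whole feature map to its overall mean."""
    return x.mean()

feature_map = np.arange(30 ** 3, dtype=float).reshape(30, 30, 30)
pooled = average_pool_3d(feature_map)
print(pooled.shape)
```

Because each output value is the mean of a small neighborhood, a slight shift or rotation of the density pattern changes the pooled values only gradually, which is the motivation for average pooling given above.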

For MFI, the last layer in the CNN architecture consists of four mutually exclusive output neurons (one for each MF investigated here). For OSNR and BER estimation, the last layer consists of a single neuron since the output is a real-valued scalar obtained through regression. Additionally, the number of kernels in each convolutional layer for MFI was increased by 25% to accommodate the larger training data sets from all MFs. A dilated convolution technique (rate = 2) was also employed in the first convolutional layer. Dilated convolution inserts zeros between the original kernel elements, expanding the kernel size, to increase its receptive field and thereby accommodate uncertainties in the rotational position of the 3D constellation without greatly increasing the number of CNN parameters. In our investigations, we observed that unscaled target variables often resulted in unstable learning owing to large updates in the gradient descent [33]. This was especially true for BER estimation as it had a large dynamic range. Hence, when training our CNN for BER estimation, we used a logarithmic scale to avoid large gradient updates and enable better learning.
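
The two techniques in this paragraph can be illustrated with a short sketch. The helper names are hypothetical, and the base-10 logarithm is an assumption, since only a logarithmic scale is specified.

```python
import numpy as np

def dilate_kernel(kernel, rate=2):
    """Insert (rate - 1) zeros between neighbouring kernel elements on every axis."""
    size = tuple(rate * (k - 1) + 1 for k in kernel.shape)
    dilated = np.zeros(size, dtype=kernel.dtype)
    dilated[tuple(slice(None, None, rate) for _ in kernel.shape)] = kernel
    return dilated

def ber_to_target(ber):
    """Compress the dynamic range of BER before regression (base 10 assumed)."""
    return np.log10(ber)

def target_to_ber(target):
    """Invert the scaling applied by ber_to_target."""
    return 10.0 ** target

kernel = np.ones((4, 4, 4))
dilated = dilate_kernel(kernel, rate=2)
# The receptive field grows while the number of nonzero weights is unchanged.
print(kernel.shape, dilated.shape, int(np.count_nonzero(dilated)))

targets = ber_to_target(np.array([1e-5, 2e-3, 1e-1]))
print(targets)
```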

We used the same CNN architectures for 2D-CNN and 3D-CNN based OPM. However, the dimensions of the input image were reduced to accommodate 2D density matrices. Furthermore, in case of 2D-CNN, the density matrices for X- and Y- polarizations were treated as two separate channels [34].
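
A two-channel 2D input of this kind can be assembled as sketched below. The function name, the synthetic 16QAM symbols, and the noise level are assumptions made for the example.

```python
import numpy as np

def iq_density_channels(x_pol, y_pol, bins=30, limit=None):
    """Build a (2, bins, bins) array holding the I-Q density matrices of the
    X and Y polarizations as two separate channels."""
    if limit is None:
        limit = max(np.abs(p.real).max() for p in (x_pol, y_pol))
        limit = max(limit, max(np.abs(p.imag).max() for p in (x_pol, y_pol)))
    edges = np.linspace(-limit, limit, bins + 1)
    channels = [
        np.histogram2d(p.real, p.imag, bins=[edges, edges])[0]
        for p in (x_pol, y_pol)
    ]
    return np.stack(channels, axis=0)

def noisy_16qam(rng, n, sigma=0.3):
    """Synthetic 16QAM symbols with additive Gaussian noise (illustration only)."""
    levels = np.array([-3.0, -1.0, 1.0, 3.0])
    symbols = rng.choice(levels, n) + 1j * rng.choice(levels, n)
    return symbols + sigma * (rng.standard_normal(n) + 1j * rng.standard_normal(n))

rng = np.random.default_rng(1)
n_symbols = 40_000
images = iq_density_channels(noisy_16qam(rng, n_symbols), noisy_16qam(rng, n_symbols))
print(images.shape)
```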

3. DWDM link

The experimental DWDM link consists of three co-propagating channels each with 32 GBaud dual polarization (DP) optical signals centered near 1550 nm, Fig. 4(a). The channels are tightly spaced at 37.5 GHz, Fig. 4(b), precluding the use of standard methods of determining the OSNR of the center channel. The electrical signals used to drive the QAM transmitters were generated using a standard PRBS15 pattern. The link consists of three 90 km spans of standard single-mode fiber (SSMF) with launch powers of approximately +2dBm. Prior to the DCR, the signal is optically bandpass filtered (3.5 order Super-Gaussian, BW = 37.5 GHz) selecting only the center channel. The received optical signal was detected via balanced photodiodes of 45 GHz bandwidth and ADCs with 30 GHz bandwidth. Therefore, the optical signal at the input to the receiver, and the electrical signal input to the demodulation DSP, Fig. 4(d), do not contain any information about the noise floor that can be used to determine the OSNR using conventional means. Thus, all noise is in-band, mimicking the conditions of a tightly spaced DWDM system. The BER is measured, both in simulation and experimentally, by correlating the output symbols from the DSP to the reference PRBS15 sequence and counting the bit errors.
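
The BER measurement described above can be sketched as follows. This is an illustration under stated assumptions: the generator polynomial x^15 + x^14 + 1 is a common choice for PRBS15 but is not specified here, symbol-to-bit demapping is omitted, and the names `prbs15` and `count_bit_errors` are hypothetical. Alignment uses a circular cross-correlation computed with FFTs.

```python
import numpy as np

PERIOD = 2 ** 15 - 1

def prbs15(n_bits, seed=0x7FFF):
    """Generate PRBS15 bits with a 15-bit shift register (x^15 + x^14 + 1 assumed)."""
    state = seed
    bits = np.empty(n_bits, dtype=np.uint8)
    for i in range(n_bits):
        new_bit = ((state >> 14) ^ (state >> 13)) & 1
        state = ((state << 1) | new_bit) & 0x7FFF
        bits[i] = new_bit
    return bits

def count_bit_errors(received, reference):
    """Align received bits to one period of the reference by circular
    cross-correlation, then count mismatches.
    Requires len(received) <= len(reference)."""
    ref = 1.0 - 2.0 * reference.astype(float)          # map bits to +1/-1
    rx = np.zeros(len(reference))
    rx[: len(received)] = 1.0 - 2.0 * received.astype(float)
    corr = np.fft.ifft(np.fft.fft(ref) * np.conj(np.fft.fft(rx))).real
    offset = int(np.argmax(corr))                      # best alignment lag
    aligned = reference[(offset + np.arange(len(received))) % len(reference)]
    errors = int(np.count_nonzero(received != aligned))
    return errors, errors / len(received)

reference = prbs15(PERIOD)
received = reference[1000:3000].copy()
received[[3, 500, 1999]] ^= 1                          # inject three bit errors
errors, ber = count_bit_errors(received, reference)
print(errors, ber)
```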

Fig. 4. (a) Experimental DWDM link configured with three DP-32 Gbaud QAM signals copropagating through three spans of 90 km SSMF. (b) Optical spectra of tightly spaced transmitted channel, (c) optical spectra with temporarily increased channel spacing for OSNR measurement, (d) electrical spectrum of received signal after digitization; PS: polarization scrambler; OBPF: optical bandpass filter.

The OSNR was varied by amplified spontaneous emission (ASE) noise loading before the optical bandpass filter and measured at the input to the bandpass filter using an optical spectrum analyzer (OSA) by temporarily increasing the channel spacing from 37.5 GHz to 112.5 GHz, Fig. 4(c). Changing the channel spacing has minimal impact on the signal’s OSNR, and therefore accurate OSNR measurements in tightly spaced channels are ensured. Waveforms were captured with typical OSNRs for the specific format in 0.5 dB increments. For each OSNR and MF, 200 waveforms of 40,000 symbols each were captured, and appropriate density matrices were created for the CNN. The data was split as 80%/20% for CNN training and testing, respectively.
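
As a small illustration of the data split, the sketch below divides the 200 waveforms captured for one OSNR and MF into 160 training and 40 testing waveforms. Random shuffling with a fixed seed is an assumption; how the waveforms were assigned to each set is not specified here.

```python
import numpy as np

def split_waveforms(n_waveforms=200, train_fraction=0.8, seed=0):
    """Return shuffled index arrays for the training and testing sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_waveforms)
    n_train = int(round(train_fraction * n_waveforms))
    return order[:n_train], order[n_train:]

train_idx, test_idx = split_waveforms()
print(len(train_idx), len(test_idx))
```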

For the simulation efforts, we used R-Soft’s OptSim environment, and each component was configured to closely match the corresponding experimental component [35]. Our ML methods were first validated in this simulation environment with a single-channel back-to-back (B2B) setup. We then compared the simulation results to the experimental single-channel B2B setup, and finally explored the performance of ML on the three-channel setup with 270 km of transport fiber.

4. Simulation results

4.1 3D-CNN after timing recovery

Figure 5 shows the MFI accuracy of our 3D-CNN for each format over a corresponding range of OSNRs. High identification accuracy is obtained for all explored OSNRs and formats, and nearly perfect MFI was achieved for OSNRs as much as 3 dB lower than those required to reach the FEC thresholds (2.0 × 10⁻³). For example, in a back-to-back configuration, 32 Gbaud QPSK required an OSNR of 10.5 dB to achieve a BER of 2.0 × 10⁻³, yet accurate identification is demonstrated for OSNRs as low as 7.5 dB.

Fig. 5. Modulation format identification versus OSNR for all simulated waveforms in a back to back single channel configuration. Waveforms are extracted immediately after timing recovery and processed with 3D-constellation density matrices in Stokes-space.

For the CNN-estimated OSNR we define an estimation discrepancy as the difference between the simulated or measured OSNR and the CNN estimate for the same data. The average estimation discrepancy is the average of the estimation discrepancies over a specified data set. We note it is not proper to identify this as an error due to uncertainties within the conventional OSNR measurement technique resulting in ambiguity regarding which value is more accurate.
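
These definitions can be written out as a short sketch. The use of the absolute difference is an assumption, since the sign convention of the discrepancy is not stated, and the function name and the four sample values are purely illustrative.

```python
import numpy as np

def discrepancy_summary(reference_osnr, estimated_osnr):
    """Return the average discrepancy per reference OSNR, the overall average,
    and the maximum discrepancy (absolute differences assumed), all in dB."""
    reference_osnr = np.asarray(reference_osnr, dtype=float)
    estimated_osnr = np.asarray(estimated_osnr, dtype=float)
    discrepancy = np.abs(estimated_osnr - reference_osnr)
    per_osnr = {
        float(value): float(discrepancy[reference_osnr == value].mean())
        for value in np.unique(reference_osnr)
    }
    return per_osnr, float(discrepancy.mean()), float(discrepancy.max())

# Hypothetical values: two test waveforms at each of two reference OSNRs.
reference = [10.0, 10.0, 10.5, 10.5]
estimate = [10.1, 9.9, 10.5, 10.8]
per_osnr, average, maximum = discrepancy_summary(reference, estimate)
print(per_osnr, average, maximum)
```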

The CNN-estimated OSNR is plotted vs. the measured OSNR for all 40 simulated testing waveforms for each OSNR in Fig. 6. The measured OSNR is that determined within the simulation environment and did not vary significantly among the 3000 waveforms captured, exhibiting a variance <0.02 dB. The average estimation discrepancy for each OSNR (and MF) is also shown in Fig. 6 as “average discrepancy”. Furthermore, if we average the estimation discrepancies over the entire 600 testing samples for each MF, which consist of 40 testing samples for each of 15 different OSNRs, we find an overall average discrepancy of <0.1 dB for each explored MF. Estimation discrepancies were independent of OSNR, demonstrating that 3D-CNN based OSNR estimation was not prone to overfitting. Estimation performance degraded slightly with higher order modulation formats due to the associated complexity of the Stokes-space constellations. Regardless, the maximum OSNR estimation discrepancy for any single waveform was <0.5 dB for all explored MFs.

Fig. 6. CNN-estimated OSNR for every simulated format in a back-to-back single-channel configuration. Each OSNR depicts 40 distinct tested waveforms. Waveforms are extracted immediately after timing recovery and processed with 3D-constellation density matrices in Stokes-space. The average discrepancy is <0.1 dB for all OSNRs and MFs.

The BER estimation and OSNR estimation accuracy can be seen by overlaying two distinct data sets, Fig. 7: BER vs. OSNR (X’s) extracted directly from simulation and CNN-estimated BER vs. CNN-estimated OSNR (diamonds) from the simulated Stokes-space based constellations. When training the CNN, we used the average BER of each OSNR as our BER targets; the BER determined within the simulation environment did not vary by more than 8% among waveforms within each configuration. CNN-estimated BER and OSNR are shown to closely follow their corresponding simulated metrics. Indeed, the CNN extracts BER and OSNR as well as direct assessment within the simulation environment does, even at low OSNRs. The mean absolute percentage error of the estimated BER was <8.4% for all formats.
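
The error metric quoted for BER can be computed as in the following sketch. The function name and the sample values are illustrative assumptions.

```python
import numpy as np

def mean_absolute_percentage_error(reference_ber, estimated_ber):
    """Mean of |estimate - reference| / reference, expressed in percent."""
    reference_ber = np.asarray(reference_ber, dtype=float)
    estimated_ber = np.asarray(estimated_ber, dtype=float)
    return 100.0 * np.mean(np.abs(estimated_ber - reference_ber) / reference_ber)

# Hypothetical values: each estimate is 10% away from its reference.
mape = mean_absolute_percentage_error([1e-3, 2e-3], [1.1e-3, 1.8e-3])
print(mape)
```

Because the error is normalized by the reference BER, it weights low and high BER values equally despite the large dynamic range of BER.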

Fig. 7. BER vs. OSNR (X’s) extracted directly from simulation overlaid with CNN-estimated BER vs. CNN-estimated OSNR (diamonds) from the simulated Stokes-space based constellations for every investigated format in a back to back single channel configuration.

4.2 2D-CNN after final channel equalization

The OSNR estimation performance of 2D-CNN applied to simulated conventional I-Q constellations obtained immediately prior to symbol unmapping is shown in Fig. 8. The estimated OSNR is plotted vs. the simulated OSNR for each format. Each simulated OSNR consists of 40 data points, one for each waveform tested, demonstrating that there were no outlier estimations. The average discrepancy for each OSNR is also shown. The overall average OSNR estimation discrepancy for all OSNRs of each format was <0.05 dB. Furthermore, the maximum OSNR estimation discrepancy was <0.2 dB for all explored OSNRs and MFs. These results demonstrate very accurate OSNR estimation, better than those obtained from waveforms extracted after timing recovery, Fig. 6. Since most impairments have been compensated at this DSP stage, the signal and noise characteristics are more visible here, allowing for better OSNR estimates.

Fig. 8. OSNR estimation after the final stage of equalization with conventional 2D-constellation density matrices using simulated transmission waveforms. Each OSNR depicts 40 data points corresponding to the 40 tested waveforms of each configuration. The average discrepancies are well below 0.1 dB for all OSNRs of each format.

These simulation results demonstrate that both 3D-CNN applied to waveforms early in the demodulation process and 2D-CNN applied to I-Q constellation images are effective in estimating useful fundamental signal metrics from simulated waveforms.

5. Experimental results

5.1 3D-CNN after timing recovery

Experimentally, we first examine signals from a single-channel B2B setup to provide a comparison to our simulation results and a reference for the subsequent experimental multichannel results. We first note that the experimentally captured waveforms had measured OSNRs within ±0.1 dB of the target OSNR. Thus, our ML training and testing data sets exhibited a small variance. Estimation discrepancies are computed with respect to the exact target OSNR. Thus, these estimates carry an inherent uncertainty on the order of 0.1 dB.

The MFI using 3D-CNN on waveforms extracted after timing recovery yielded ∼100% identification accuracies over typical OSNR ranges for all explored MFs, with performance degrading for OSNRs more than 1.5 dB below the corresponding FEC threshold, Fig. 9. The experimental performance is only slightly degraded compared to our results using simulated waveforms in Fig. 5.

Fig. 9. Modulation format identification versus OSNR for each format in a back to back single channel configuration. Experimental waveforms are extracted immediately after timing recovery and processed with 3D-constellation density matrices in Stokes-space.

The CNN-estimated OSNR is plotted vs. the experimentally measured OSNR for each of the 40 experimental test waveforms for each OSNR, Fig. 10. The average discrepancy for each OSNR is also shown. The overall average OSNR estimation discrepancy for all OSNRs of each format was <0.5 dB for all explored MFs, increasing with format complexity. Additionally, the maximum discrepancy was <1.0, 1.3, 1.5 and 2 dB for QPSK, 8QAM, 16QAM, and 32QAM, respectively. These estimation discrepancies, maximum and average, are somewhat larger than those observed with simulated waveforms, Fig. 6. However, these results demonstrate useful OSNR estimations when using waveforms extracted early in the demodulation process from signals that are not amenable to conventional OSNR measurement methods.

Fig. 10. CNN-estimated OSNR in a back to back single channel configuration. Experimental waveforms are extracted immediately after timing recovery and processed with 3D-constellation density matrices in Stokes-space. Each OSNR depicts 40 data points corresponding to the 40 tested waveforms of each configuration.

The larger estimation discrepancies observed when using experimental waveforms are partly due to the initial variance of the measured data. This initial variance was relatively higher for higher OSNRs, especially for DP-32QAM. Additionally, we note that the presence of a polarization scrambler in the experimental setup may cause increased residual rotations of the Stokes-space constellations, degrading the performance of 3D-CNN.

The BER estimation and OSNR estimation accuracy can be seen by overlaying two distinct data sets, Fig. 11; BER vs. OSNR (X’s) extracted directly from experiment and CNN-estimated BER vs. CNN-estimated OSNR (diamonds) from the experimental Stokes-space based constellations. We used the average experimental BER (11% variation) for each OSNR as our BER targets when training the CNN. BER estimation results after timing recovery exhibited 20.9% error as compared to <8.4% with simulation data. However, 3D-CNN is still able to accurately capture the behavior of experimental BER early in the demodulation process.

Fig. 11. BER vs. OSNR (X’s) extracted directly from experiment overlaid with CNN-estimated BER vs. CNN-estimated OSNR (diamonds) from the experimental Stokes-space based constellations for every investigated format in a back to back single channel configuration.

Next, we estimate the performance of these ML techniques on the three-channel link with 270 km of SSMF fiber. MFI performance after timing recovery using 3D-CNN is shown in Fig. 12. The performance is similar to the experimental single channel B2B results, Fig. 9, demonstrating the robustness of the method within a DWDM system. Performance is slightly degraded at very low OSNRs owing to fiber impairments such as PMD and PDL. The launch powers used may only produce modest nonlinearities. It was observed that most misclassifications occurred between nearest neighbor MFs. Nevertheless, an overall MFI accuracy of 99.8% was achieved for typical OSNR ranges, demonstrating the ability of 3D-CNN and Stokes-space-based MFI to extract metrics early in the demodulation process.

Fig. 12. Modulation format identification versus OSNR for each investigated format in a multi-channel transmission through 270 km SSMF. Experimental waveforms are extracted immediately after timing recovery and processed with 3D-constellation density matrices in Stokes-space.

The CNN-estimated OSNR is plotted vs. the measured OSNR for each of the 40 experimental test waveforms in the multi-channel setup with 270 km of transmission fiber, Fig. 13. These OSNR estimation results are similar to their corresponding single-channel results, Fig. 10. The average discrepancy for each OSNR is also shown. The overall average OSNR estimation discrepancy for all OSNRs of each format is <0.5 dB, Fig. 13. The method yields accurate OSNR for typical OSNRs with <5% of data yielding >1 dB discrepancy. These results demonstrate that 3D-CNN can be effective in assessing OSNR in deployed DWDM links. Additionally, the maximum discrepancy was <1.3, 1.6, 1.9 and 2 dB for QPSK, 8QAM, 16QAM, and 32QAM, respectively. Higher discrepancies are likely due to PMD and PDL and their effects on Stokes-space-based constellations [23]. We note that the OSNR estimation performance for DP-32QAM is restricted to a smaller OSNR range due to the maximum achievable OSNR in our experimental link. Improved performance may be achieved through additional waveform preprocessing such as equalization using a constant modulus algorithm [14,22] or by extracting data after the polarization demultiplexing module. However, our goal is to minimize preprocessing and extract waveforms prior to modulation format dependent DSP.

Fig. 13. CNN-estimated OSNR for multi-channel transmission through 270 km of SSMF. Experimental waveforms are extracted immediately after timing recovery and processed with 3D-constellation density matrices in Stokes-space. Each OSNR depicts 40 data points corresponding to the 40 tested waveforms of each configuration.

The BER estimation and OSNR estimation accuracy can be seen by overlaying two distinct data sets, Fig. 14: BER vs. OSNR (X’s) extracted directly from experiment and CNN-estimated BER vs. CNN-estimated OSNR (diamonds) from the experimental Stokes-space based constellations. The BER estimation using 3D-CNN accurately captured the experimental BER behavior (9.7% variation) and exhibited a percentage error of 22.3%. The slight increase in the estimation discrepancy compared to single-channel B2B is likely from additional impairments incurred in the DWDM link. However, the increase was <5%, further demonstrating the benefits of using 3D-CNN early in the demodulation process and the robustness of the method when identifying changes in network performance.

Fig. 14. BER vs. OSNR (X’s) extracted directly from experiment overlaid with CNN-estimated BER vs. CNN-estimated OSNR (diamonds) from the experimental Stokes-space based constellations for every investigated format in a 270 km multi-channel configuration.

5.2 2D-CNN after final equalization

The OSNR estimation performance of 2D-CNN applied to experimental conventional I-Q constellations obtained immediately prior to symbol unmapping for single-channel back-to-back and multichannel 270 km transmission is shown in Figs. 15 and 16, respectively. Again, the results of all 40 tested waveforms are plotted. The average discrepancy for each OSNR is also shown. The overall average OSNR estimation discrepancy for all OSNRs was <0.1 dB for all explored MFs, demonstrating that at this DSP stage, constellation densities are more representative of the signal’s OSNR. Additionally, the maximum discrepancy was <0.2, 0.2, 0.3 and 0.7 dB for QPSK, 8QAM, 16QAM, and 32QAM, respectively. As was expected from our simulation results, the OSNR estimations are better than those obtained from waveforms extracted after timing recovery since the additional demodulation modules enable improved visibility of the signal and noise characteristics. We note that the performance of 2D-CNN is closer to simulations than that of 3D-CNN, primarily because of polarization scrambling, which affects the signal in Stokes space but is compensated later during polarization demultiplexing.

Fig. 15. OSNR estimation after the final stage of equalization with conventional 2D-constellation density matrices using experimental B2B waveforms. Each OSNR depicts 40 data points corresponding to the 40 tested waveforms for each configuration. The average discrepancy was <0.1 dB for all OSNRs of each format.

Fig. 16. OSNR estimation after the final stage of equalization with conventional 2D-constellation density matrices using experimental waveforms from multi-channel transmission through 270 km SSMF. Each OSNR depicts 40 data points corresponding to the 40 tested waveforms of each configuration. The average discrepancies are shown to be <0.1 dB for all OSNRs of each format.

Finally, OSNR estimation performance in the three-channel link before symbol unmapping is shown in Fig. 16. Although fiber impairments marginally degraded the performance of 2D-CNN, the overall average OSNR estimation discrepancies remain below 0.1 dB and the maximum estimation discrepancy was <0.5 dB for all explored MFs and OSNRs. Specifically, the maximum discrepancy was <0.3, 0.3, 0.4 and 0.4 dB for QPSK, 8QAM, 16QAM, and 32QAM, respectively. The estimation results presented here consist of 0.5 dB steps of OSNR; however, a continuum of OSNRs can be estimated with similar accuracy, since the CNNs were trained with regression fitting, in which the kernel values of the CNNs are fitted to capture constellation density patterns over a continuously varying OSNR.
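The average and maximum discrepancies quoted above can be tabulated per OSNR setting along the following lines. This is a minimal sketch that assumes the discrepancy is the absolute difference between the estimated and reference OSNR; the function name `discrepancy_stats` and the sample values are hypothetical and are not taken from the experiment.

```python
from collections import defaultdict


def discrepancy_stats(true_osnr_db, estimated_osnr_db):
    """Return {reference OSNR: (mean |error|, max |error|)}, all in dB."""
    errors = defaultdict(list)
    for truth, estimate in zip(true_osnr_db, estimated_osnr_db):
        errors[truth].append(abs(estimate - truth))
    return {
        osnr: (sum(errs) / len(errs), max(errs))
        for osnr, errs in sorted(errors.items())
    }


# Hypothetical estimates for two OSNR settings on a 0.5 dB grid
# (three waveforms each here; the experiment used 40 per setting).
truth = [20.0, 20.0, 20.0, 20.5, 20.5, 20.5]
estimate = [20.1, 19.9, 20.0, 20.8, 20.5, 20.4]

stats = discrepancy_stats(truth, estimate)
for osnr, (mean_err, max_err) in stats.items():
    print(f"OSNR {osnr:.1f} dB: mean {mean_err:.2f} dB, max {max_err:.2f} dB")
```

Since the estimator is a regressor, the `estimate` values are continuous and need not fall on the 0.5 dB grid used for the reference settings.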

These results validate the robustness of 2D-CNN over a wide range of OSNR within a tightly spaced DWDM system using I-Q constellation densities as the input features. Hence, the method can be used to estimate the OSNR with an average accuracy as good as conventional spectral analysis when the noise floor can be observed.

To investigate how the number of symbols affects the OSNR estimation performance, we tested with a reduced number of symbols for 16QAM for both simulated and experimental waveforms, Fig. 17. The B2B single-channel configuration has limited nonlinear effects and more clearly reveals the impact of the number of symbols. Although a larger number of symbols improves the estimation performance, the overall average discrepancy was less than 0.2 dB for more than 4000 total symbols, with the exception of experimental data taken immediately after timing recovery. The 3D Stokes-space assessment (after timing recovery) is expected to require more symbols for equal performance due to the larger number of constellation points: 60 in Stokes space compared to 16 in the I-Q constellation for 16QAM. The same reasoning explains why the estimation discrepancy of DP-32QAM increases, since the number of unique symbols in Stokes space increases to 240. Although it may be possible to improve performance by capturing more waveforms to increase the number of symbols in each bin, particularly for higher-order constellations, the underlying stability of the OSNR and the limited value of identifying an OSNR to better than 0.1 dB suggest that a few thousand symbols are sufficient for the waveforms investigated here. Lastly, we note that we have purposely examined constellations (Stokes space and I-Q) since these do not require contiguous symbols to create. Thus, meaningful performance metrics can be determined from constellations created by dramatically down-sampled acquisition, and these are likely to be readily obtainable from within DCRs.
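The count of 60 Stokes-space points for 16QAM can be checked by enumerating every noiseless pair of X- and Y-polarization symbols and collecting the distinct Stokes vectors. The sketch below assumes an ideal square constellation on an integer grid and the common convention S1 = |Ex|^2 - |Ey|^2, S2 = 2 Re(Ex Ey*), S3 = 2 Im(Ex Ey*); the helper names are hypothetical, and only square QAM orders are handled.

```python
from itertools import product


def square_qam(order):
    """Integer-grid square QAM points, e.g. order=16 gives levels -3, -1, 1, 3."""
    side = int(round(order ** 0.5))
    if side * side != order:
        raise ValueError("only square QAM orders are supported")
    levels = range(-(side - 1), side, 2)
    return [(i, q) for i in levels for q in levels]


def stokes_points(order):
    """Distinct noiseless (S1, S2, S3) vectors for dual-polarization square QAM."""
    points = set()
    for (a, b), (c, d) in product(square_qam(order), repeat=2):
        # Ex = a + jb on the X polarization, Ey = c + jd on the Y polarization.
        s1 = a * a + b * b - c * c - d * d  # |Ex|^2 - |Ey|^2
        s2 = 2 * (a * c + b * d)            # 2 Re(Ex Ey*)
        s3 = 2 * (b * c - a * d)            # 2 Im(Ex Ey*)
        points.add((s1, s2, s3))
    return points


n_iq = len(square_qam(16))
n_stokes = len(stokes_points(16))
print(n_iq, n_stokes)
```

Integer arithmetic keeps the comparison of Stokes vectors exact. For a fixed number of received symbols, spreading them over 60 clusters rather than 16 leaves fewer symbols per cluster, which is consistent with the larger symbol requirement noted above.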

Fig. 17. Average discrepancy of CNN-estimated OSNR vs. number of symbols used to create constellation density matrices for both 3D Stokes-space and 2D IQ constellations. Simulation and experimental waveforms for 16-QAM in a back-to-back single channel configuration are shown.

6. Conclusion

In this work, we demonstrated that machine learning can be used to accurately determine performance metrics of tightly spaced DWDM optical links using waveforms extracted from the demodulation process, without requiring the transmission of special symbols or sequences and without interfering with the demodulation process. Experimental waveforms accessed early in the demodulation process (immediately after timing recovery) yielded modulation format identification with >99.8% accuracy and OSNR estimation with an average discrepancy of <0.5 dB for all investigated formats and OSNRs. The estimated BER accurately captures the experimentally obtained BER behavior with <25% error. Accurate estimation of signal characteristics at early demodulation stages can provide critical performance monitoring metrics, particularly at low OSNRs, as it can help inform the remaining demodulation stages that lower performance is related to OSNR and not to other impairments.

Similarly, we demonstrated OSNR estimation with an average discrepancy of <0.1 dB using experimental waveforms accessed before symbol unmapping. The CNNs performed systematically better on waveforms created from simulations than on those obtained from our experimental setup. These results demonstrate the importance of testing machine learning techniques on data obtained from testbeds that create realistic link environments, or from deployed systems, and not on simulation data alone.

Although our methods used continuously sampled waveforms, they will perform similarly with down-sampled data, since we used constellation images as input and these can be formed with equivalent integrity from down-sampled data. Thus, low-data-rate access to the sampled data immediately after timing recovery or post demodulation is suitable for creating the input images necessary for our ML methods. Consequently, these methods do not add excessive complexity to DCRs and can be practically deployed in commercial systems.
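The following sketch illustrates why non-contiguous acquisition is adequate: a constellation density matrix is a normalized 2D histogram, so a random subset of symbols reproduces nearly the same matrix as the full record. The 20 x 20 grid, the noise level, the symbol counts, and the function name `density_matrix` are assumptions for illustration; they are not the settings used in this work.

```python
import random


def density_matrix(symbols, bins=20, limit=4.0):
    """Normalized 2D histogram of (I, Q) samples over [-limit, limit)."""
    grid = [[0] * bins for _ in range(bins)]
    width = 2 * limit / bins
    for i, q in symbols:
        col = int((i + limit) // width)
        row = int((q + limit) // width)
        if 0 <= row < bins and 0 <= col < bins:  # drop out-of-range samples
            grid[row][col] += 1
    total = sum(map(sum, grid)) or 1
    return [[count / total for count in row] for row in grid]


rng = random.Random(0)
levels = (-3, -1, 1, 3)
# Hypothetical noisy 16QAM symbols standing in for equalized receiver output.
symbols = [
    (rng.choice(levels) + rng.gauss(0, 0.2), rng.choice(levels) + rng.gauss(0, 0.2))
    for _ in range(20000)
]

full = density_matrix(symbols)
# Keep a random, non-contiguous 20% of the symbols.
sparse = density_matrix(rng.sample(symbols, 4000))

max_diff = max(
    abs(f - s) for full_row, sparse_row in zip(full, sparse)
    for f, s in zip(full_row, sparse_row)
)
print(f"largest per-bin difference: {max_diff:.4f}")
```

The per-bin difference shrinks as the retained symbol count grows, which matches the trend of estimation discrepancy versus number of symbols in Fig. 17.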

Our methods can provide the overall network status of the physical layer, thereby informing the control layer to enable proactive and reactive maintenance. The method can be implemented in elastic optical networks as a low-cost OPM tool since the scheme uses standard DCR DSP. The method can also be extended to identify other useful signal characteristics, such as accumulated CD, PMD, and nonlinear contributions to OSNR, to broaden its functionality.

Funding

ADVA Optical Networking, Inc.; L3Harris Technologies, Inc.; Georgia Electronic Design Center.

Disclosures

The authors declare no conflicts of interest.

References

1. P. J. Winzer, D. T. Neilson, and A. R. Chraplyvy, “Fiber-optic transmission and networking: the previous 20 and the next 20 years [Invited],” Opt. Express 26(18), 24190–24239 (2018). [CrossRef]

2. Cisco, “Annual Internet Report (2018–2023) White Paper,” On-line: https://www.cisco.com/c/en/us/solutions/collateral/executive-perspectives/annual-internet-report/white-paper-c11-741490.html (2020).

3. Z. Dong, A. P. T. Lau, and C. Lu, “OSNR monitoring for QPSK and 16-QAM systems in presence of fiber nonlinearities for digital coherent receivers,” Opt. Express 20(17), 19520–19534 (2012). [CrossRef]

4. C. Rasmussen and M. Aydinlik, “Optical signal-to-noise ratio monitoring and measurement in optical communications systems,” U.S. patent 2015/0365165 A1 (2017).

5. F. Musumeci, C. Rottondi, A. Nag, I. Macaluso, D. Zibar, M. Ruffini, and M. Tornatore, “An overview on application of machine learning techniques in optical networks,” IEEE Commun. Surv. Tutorials 21(2), 1383–1408 (2019). [CrossRef]

6. J. Thrane, J. Wass, M. Piels, J. C. M. Diniz, R. Jones, and D. Zibar, “Machine learning techniques for optical performance monitoring from directly detected PDM-QAM signals,” J. Lightwave Technol. 35(4), 868–875 (2017). [CrossRef]

7. F. Musumeci, C. Rottondi, G. Corani, S. Shahkarami, F. Cugini, and M. Tornatore, “A Tutorial on Machine Learning for Failure Management in Optical Networks,” J. Lightwave Technol. 37(16), 4125–4139 (2019). [CrossRef]

8. S. Varughese, D. Lippiatt, T. Richter, S. Tibuleac, and S. E. Ralph, “Identification of Soft Failures in Optical Links Using Low Complexity Anomaly Detection,” in Proc. Optical Fiber Communication Conference (OFC, 2019), paper W2A.46.

9. S. Varughese, D. Lippiatt, T. Richter, S. Tibuleac, and S. E. Ralph, “Low Complexity Soft Failure Detection and Identification in Optical Links using Adaptive Filter Coefficients,” in Proc. Optical Fiber Communication Conference (OFC, 2020), paper M2J.4.

10. F. J. V. Caballero, D. J. Ives, C. Laperle, D. Charlton, Q. Zhuge, M. O’Sullivan, and S. J. Savory, “Machine learning based linear and nonlinear noise estimation,” J. Opt. Commun. Netw. 10(10), D42–D51 (2018). [CrossRef]

11. D. Lippiatt, S. Varughese, T. Richter, S. Tibuleac, and S. E. Ralph, “Joint Linear and Nonlinear Noise Estimation of Optical Links by Exploiting Carrier Phase Recovery,” in Proc. Optical Fiber Communication Conference (OFC, 2020), paper Th2A.49.

12. F. N. Khan, K. Zhong, W. H. Al-Arashi, C. Yu, C. Lu, and A. P. T. Lau, “Modulation format identification in coherent receivers using deep machine learning,” IEEE Photonics Technol. Lett. 28(17), 1886–1889 (2016). [CrossRef]

13. D. Wang, M. Zhang, J. Li, Z. Li, J. Li, C. Song, and X. Chen, “Intelligent constellation diagram analyzer using convolutional neural network-based deep learning,” Opt. Express 25(15), 17150–17166 (2017). [CrossRef]

14. F. N. Khan, K. Zhong, X. Zhou, W. H. Al-Arashi, C. Yu, C. Lu, and A. P. T. Lau, “Joint OSNR monitoring and modulation format identification in digital coherent receivers using deep neural networks,” Opt. Express 25(15), 17767–17776 (2017). [CrossRef]

15. Y. Cheng, S. Fu, M. Tang, and D. Liu, “Multi-task deep neural network (MT-DNN) enabled optical performance monitoring from directly detected PDM-QAM signals,” Opt. Express 27(13), 19062–19074 (2019). [CrossRef]

16. A. Salehiomran, G. Gao, and Z. Jiang, “Linear and Nonlinear Noise Monitoring in Coherent Systems Using Fast BER Measurement and Neural Networks,” in Proc. European Conference on Optical Communication (ECOC, 2019), paper W3D.3.

17. Z. Wang, A. Yang, P. Guo, and P. He, “OSNR and nonlinear noise power estimation for optical fiber communication systems using LSTM based deep learning technique,” Opt. Express 26(16), 21346–21357 (2018). [CrossRef]

18. R. Proietti, X. Chen, A. Castro, G. Liu, H. Lu, K. Zhang, J. Guo, Z. Zhu, L. Velasco, and S. J. B. Yoo, “Experimental demonstration of cognitive provisioning and alien wavelength monitoring in multi-domain EON,” in Proc. Optical Fiber Communication Conference (OFC, 2018), paper W4F.7.

19. H. J. Cho, D. Lippiatt, S. Varughese, and S. E. Ralph, “Convolutional neural networks for optical performance monitoring,” in Proc. IEEE Avionics and Vehicle Fiber-Optics and Photonics Conf. (AVFOP, 2019), paper WD2.

20. H. J. Cho, S. Varughese, D. Lippiatt, and S. E. Ralph, “Convolutional recurrent machine learning for OSNR and launch power estimation: a critical assessment,” in Proc. Optical Fiber Communication Conference (OFC, 2020), paper M2J.5.

21. W. Zhang, D. Zhu, Z. He, N. Zhang, X. Zhang, H. Zhang, and Y. Li, “Identifying modulation formats through 2D Stokes planes with deep neural networks,” Opt. Express 26(18), 23507–23517 (2018). [CrossRef]

22. A. Yi, L. Yan, H. Liu, L. Jiang, Y. Pan, B. Luo, and W. Pan, “Modulation format identification and OSNR monitoring using density distributions in Stokes axes for digital coherent receivers,” Opt. Express 27(4), 4471–4479 (2019). [CrossRef]

23. P. Isautier, J. Pan, R. DeSalvo, and S. E. Ralph, “Stokes space-based modulation format recognition for autonomous optical receivers,” J. Lightwave Technol. 33(24), 5157–5163 (2015). [CrossRef]

24. P. Isautier, K. Mehta, A. J. Stark, and S. E. Ralph, “Robust Architecture for Autonomous Coherent Optical Receivers,” J. Opt. Commun. Netw. 7(9), 864–875 (2015). [CrossRef]

25. S. Varughese, J. Langston, S. E. Ralph, and R. DeSalvo, “Blind polarization identification and demultiplexing using statistical learning,” in Proc. IEEE Photonics Conference (IPC), 319–320 (2017).

26. R. C. Gonzalez, “Deep convolutional neural networks,” IEEE Signal Process. Mag. 35(6), 79–87 (2018). [CrossRef]

27. W. Liu, C. Qin, K. Gao, H. Li, Z. Qin, and Y. Cao, “Research on Medical Data Feature Extraction and Intelligent Recognition Technology Based on Convolutional Neural Network,” IEEE Access 7, 150157 (2019). [CrossRef]

28. S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in Proceedings of The 32nd International Conference on Machine Learning, 37, 448–456 (2015).

29. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” Proc. Advances Neural Information Processing Systems, 1097–1105, (2012).

30. Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature 521(7553), 436–444 (2015). [CrossRef]

31. X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in Proc. of the Thirteenth International Conference on Artificial Intelligence and Statistics, 9, 249–256 (2010).

32. Y. Bengio, A. Courville, and P. Vincent, “Representation learning: a review and new perspectives,” IEEE Trans. Pattern Anal. Mach. Intell. 35(8), 1798–1828 (2013). [CrossRef]

33. R. Yamashita, M. Nishio, R. K. G. Do, and K. Togashi, “Convolutional neural networks: an overview and application in radiology,” Insights Imaging 9(4), 611–629 (2018). [CrossRef]

34. J. Jiang, X. Feng, F. Liu, Y. Xu, and H. Huang, “Multi-spectral RGB-NIR image classification using double-channel CNN,” IEEE Access 7, 20607–20613 (2019). [CrossRef]

35. S. Varughese, J. Langston, V. A. Thomas, S. Tibuleac, and S. E. Ralph, “Frequency dependent ENoB requirements for M-QAM optical links: an analysis using an improved digital to analog converter model,” J. Lightwave Technol. 36(18), 4082–4089 (2018). [CrossRef]

Optics Express