
Low-complexity full-field ultrafast nonlinear dynamics prediction by a convolutional feature separation modeling method

Open Access

Abstract

The modeling and prediction of ultrafast nonlinear dynamics in optical fiber are essential for studies of laser design, experimental optimization, and other fundamental applications. The traditional propagation modeling method based on the nonlinear Schrödinger equation (NLSE) has long been regarded as extremely time-consuming, especially for designing and optimizing experiments. The recurrent neural network (RNN) has been implemented as an accurate intensity prediction tool with reduced complexity and good generalization capability. However, the complexity caused by long input grids and the flexibility of the neural network structure should be further optimized for broader applications. Here, we propose a convolutional feature separation modeling method to predict full-field ultrafast nonlinear dynamics with low complexity, strong generalization ability, and high accuracy, where the linear effects are first modeled by NLSE-derived methods and a convolutional deep learning method is then implemented for nonlinearity modeling. With this method, the temporal relevance of nonlinear effects is substantially shortened, and the parameters and scale of the neural network can be greatly reduced. The running time achieves a 94% reduction versus the NLSE and an 87% reduction versus the RNN without accuracy deterioration. In addition, the input pulse conditions, including the number of grid points, durations, peak powers, and propagation distance, can be generalized accurately during the prediction process. The results represent a remarkable improvement in ultrafast nonlinear dynamics prediction, and this work also provides a novel perspective on feature separation modeling for quickly and flexibly studying nonlinear characteristics in other fields.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Most systems inherently exhibit interactions between linear and nonlinear effects, and their outputs evolve with many variables over time [1,2]. Complex nonlinear dynamics have long been regarded as challenging and unpredictable in many research fields, including biology, chemistry, hydrodynamics, and engineering [3–5]. In optics and photonics, nonlinear pulse propagation in optical fiber waveguides is a typical complex nonlinear evolution, especially for high-power ultrafast pulses. The analysis, control, and prediction of ultrafast nonlinear dynamics in optical fiber are of great importance for laser design, experimental optimization, remote sensing, and other fundamental research [6–10]. The nonlinear Schrödinger equation (NLSE) can accurately describe pulse propagation in optical fiber, and the split-step Fourier method (SSFM)-based numerical solution has been proven highly accurate compared with pulse transmission in experiments [11,12]. However, numerical methods typically involve high computation costs because many iterative complex operations are required. The long simulation time imposes a severe bottleneck on using conventional methods to predict the complete pulse propagation in the optical fiber [13].

Recently, machine learning has been used for nonlinear dynamics prediction and fiber transmission modeling owing to its strong nonlinear data fitting ability, flexible design structure, and low computational complexity. The physics-informed neural network (PINN) was proposed to study pulse propagation based on the NLSE [14], and the nonlinear dynamics of pulses can be predicted by applying physical knowledge under some conditions [15,16]. However, when the initial pulse parameters change, the PINN must be re-trained; that is, the generalization ability of the PINN-based method is extremely limited. The recurrent neural network (RNN) has been regarded as a good prediction tool for time-series data and has been applied to predict ultrafast nonlinear dynamics with high accuracy and good generalization over multiple conditions [17]. That work predicts the intensity profiles in the time and spectral domains with two separate NNs, and the phase information of the pulses is neglected. In addition, owing to the internal loop unit of the RNN, its parallelism and complexity are limited. A feed-forward neural network (FNN) was introduced to predict the intensity and phase profiles together in the temporal (or spectral) domain with lower complexity but higher prediction errors [18]. In the RNN and FNN works, the input pulses must be truncated or down-sampled to keep the input grid short and the complexity low [17,18]. When the input grid cannot be manually shortened and is as long as in the NLSE simulation, the computational complexity of the RNN and FNN becomes higher. Furthermore, these structures can only accept a fixed input length, so they cannot handle flexible input pulse lengths. Therefore, a faster and more flexible prediction method for full-field ultrafast nonlinear dynamics with long input grids is still an open issue.

Here, we focus on full-field pulse propagation modeling by collecting full-length temporal pulses represented by complex numbers as training datasets. This representation inherently includes the phase information, obtained by phase calculation from the complex numbers, and the spectral information, obtained by performing a Fourier transform of the temporal pulses. It eliminates the need for two separate models for the temporal and spectral domains, and the complex-number representation allows physical knowledge to be applied to the pulses. For fast and accurate ultrafast dynamics prediction, the linear features are modeled by NLSE-derived methods, and the nonlinear features are modeled by a dedicated convolutional neural network (CNN). This convolutional feature separation modeling (FSM) method has the following advantages (owing to the accuracy degradation of the FNN, our work is mainly compared with the RNN):

  • (1) The temporal relevance of nonlinear effects between grid points is reduced by the FSM, resulting in a short local correlation along the time axis. In this case, a CNN structure with much shorter kernel lengths can achieve accurate nonlinearity modeling. The required parameters of the CNN are only 0.14% of those of the RNN at 2048 pulse grid points. The running times of the NLSE, RNN, and CNN at 2048 points are 763 s, 46 s, and 5.9 s, respectively, demonstrating a 94% reduction versus the NLSE and an 87% reduction versus the RNN.
  • (2) Compared with the RNN, the simple CNN structure with FSM achieves equal accuracy within the training transmission distances and higher accuracy beyond the training distances. This local correlation calculation structure is demonstrated to have high stability and good distance generalization capability.
  • (3) The strong generalization ability is also verified. The input pulse conditions (including pulse durations and peak powers) and transmission distances can be generalized accurately and quickly. Furthermore, based on convolution computation, pulse propagation with dynamic input pulse lengths can be flexibly achieved by CNN. On one hand, the pulse can be truncated to a shorter length for complexity reduction. On the other hand, the input length can be elongated for multiple pulse propagation.

The results demonstrate that the CNN with FSM is a full-field ultrafast nonlinear dynamics prediction method with high accuracy, strong generalization ability, and low complexity. This work also provides a new perspective on feature separation modeling with local correlation calculations for fast and accurate nonlinear dynamics studies in other fields.

2. Principle of the convolutional feature separation modeling method

Ultrafast pulse propagation in optical fiber exhibits rapid nonlinear changes determined by the interplay between a range of nonlinear and linear effects. Specifically, the linear effects always lead to a long temporal dependency between the grid points of the pulses [19]. The nonlinear effects are determined by the input pulse power and the fiber channel characteristics. The interaction between the linear and nonlinear features accumulates along the transmission. To predict the nonlinear dynamics quickly and accurately, the proposed method accounts for the temporal and distance dependency of the pulses by modeling the linear and nonlinear features separately and injecting multiple preceding known pulses into the NN.

We use the numerical solution method to generate training data at different distances with the same distance interval. Note that the distance interval of the training dataset is set larger than that of the NLSE simulation to reduce complexity. The data are represented by their real and imaginary parts, so the temporal intensity and phase profiles can be generated directly by one NN, and the spectral profiles can be easily obtained by the fast Fourier transform (FFT).
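As an illustration of this data representation, the following minimal Python sketch (array names, shapes, and the real/imaginary packing order are our own assumptions, since the paper does not specify them) shows how the temporal intensity, and the spectral profile via the FFT, are recovered from one complex-valued pulse stored as 2P real numbers:

import numpy as np

P = 1024                                   # grid points of one pulse (HOS case)
field_ri = np.random.randn(2 * P)          # placeholder training sample [Re | Im]

field = field_ri[:P] + 1j * field_ri[P:]   # assumed packing: first P real parts, last P imaginary parts
temporal_intensity = np.abs(field) ** 2                                # |A(t)|^2
spectral_intensity = np.abs(np.fft.fftshift(np.fft.fft(field))) ** 2   # |A(omega)|^2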

The schematic of the convolutional FSM architecture is presented in Fig. 1(a), where the modeling procedure is divided into two steps: an NLSE-derived linear model for linear feature modeling and a CNN-based nonlinear model for nonlinear feature modeling. The CNN model contains an input layer, one-dimensional convolution (conv1d) layers [20], dense layers, and an output layer. Iterations are implemented to reach longer transmission distances.


Fig. 1. The architecture of the convolutional neural network (CNN) with feature separation modeling (FSM). a, Schematic of the convolutional FSM architecture, showing the NLSE-derived linear model, the CNN-based nonlinear model, and the iteration steps for longer propagation. b, The detailed structure and data arrangement of the CNN with FSM. The pulses from the previous distances are first processed by the linear feature model. The pulse data are then arranged into a two-dimensional matrix D × 2P, where D = 10 is the number of previous distances, P is the number of pulse grid points, and 2P reflects each complex number being represented by two real numbers. Then 3 conv1d layers and 4 hidden dense layers with shared parameters are applied for the nonlinear feature modeling. Ch represents the number of kernel channels in each conv1d layer. The matrices are shown by rectangles with different colours representing different processing stages, and the colour depth of each small square in the matrix indicates the data value.


A detailed scheme of the CNN structure with FSM and the data arrangement are illustrated in Fig. 1(b). First, the pulses from the previous distances (from z − 10Δz to z − Δz) are collected when predicting the output pulse at distance z. Δz is the interval distance we set along the propagation direction. When predicting the first interval distance (z = Δz), the previous input pulses are all equal to the input pulse, i.e., A(−9Δz) = … = A(−Δz) = A(0), where A represents the optical pulse profile. Input pulses from multiple distances improve the prediction accuracy, and the optimal number of distances is set to 10 by an exhaustive parameter search. During the prediction process, the output pulse at distance z is fed back to the input to predict the pulse at the next distance, A(z + Δz). This iteration also provides the NN with distance generalization ability.
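The recursive use of the previous D = 10 pulses and the feedback of each prediction can be summarized by the following sketch (function and variable names are illustrative and assume a `model` that maps a (D, 2P) matrix to the next pulse of length 2P; this is our paraphrase of the procedure, not the authors' code):

import numpy as np

def predict_evolution(A0, model, n_steps, D=10):
    # A0: input pulse of length 2P; each step advances the field by one interval dz
    history = [A0.copy() for _ in range(D)]      # A(-9*dz), ..., A(-dz) = A(0)
    evolution = [A0.copy()]
    for _ in range(n_steps):
        window = np.stack(history[-D:], axis=0)  # (D, 2P) input matrix of Fig. 1(b)
        A_next = model(window)                   # predicted pulse at the next distance
        history.append(A_next)                   # feed the prediction back as input
        evolution.append(A_next)
    return np.stack(evolution, axis=0)           # full evolution map, (n_steps+1, 2P)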

Second, the NLSE-derived linear model is implemented in one interval distance by prior knowledge, which can be expressed by

$$\tilde{A}(\omega ,z + \Delta z) = \tilde{A}(\omega ,z)\exp \left( j\left( \sum\limits_{k \ge 2} \frac{\beta_k}{k!}\,\omega^k \right)\Delta z \right),$$
where ω is the angular frequency component and $\beta_k$ are the dispersion coefficients associated with the Taylor series expansion of the propagation constant β(ω). This linear modeling operation achieves the long-time-dependency feature modeling in one step. The process is based on the physical knowledge described by the NLSE; therefore, the linear features are modeled accurately and do not depend on the neural network performance. In addition, this FSM method can be used whenever dispersion is present and places few restrictions on longer transmission distances. The FSM method has already been used for long-haul multi-channel optical fiber transmission modeling, which shows its effectiveness in wide applications [21].
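A minimal NumPy sketch of this dispersion-only step, assuming the field is stored as a complex temporal array and following the sign conventions of Eq. (1), is given below (the function name and argument layout are our own):

import numpy as np
from math import factorial

def linear_step(A_t, dt, dz, betas):
    # A_t: complex temporal field; dt: time-grid spacing; dz: step length
    # betas: dict {k: beta_k} of dispersion coefficients, k >= 2 (consistent units assumed)
    omega = 2 * np.pi * np.fft.fftfreq(A_t.size, d=dt)               # angular frequency grid
    phase = sum(bk / factorial(k) * omega ** k for k, bk in betas.items())
    return np.fft.ifft(np.fft.fft(A_t) * np.exp(1j * phase * dz))    # Eq. (1) applied in one step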

After the linear modeling operation, conv1d layers and local dense layers are used for nonlinear feature modeling. The convolution kernel length and dense layer dimension we use are relatively small, based on the hypothesis that the nonlinear features have only short-time dependency. If the linear and nonlinear effects were modeled together, the NN would be required to model the global long-time correlation, leading to a rapid increase in complexity and in the number of parameters. By adopting the method of linear and nonlinear FSM, we realize local correlation modeling and effectively reduce the parameters and complexity of the NN.

As shown in Fig. 1(b), the detailed CNN structure for nonlinear feature modeling consists of convolutional layers and fully connected layers with shared parameters. The input data are arranged as a two-dimensional matrix whose dimension is (D × 2P), where D is the number of preceding distances and P is the number of complex grid points of a pulse. The 2P term refers to the real (R) and imaginary (I) parts of the P grid points, because the inputs of an NN are typically real numbers. The use of a complex-valued neural network is left for future discussion and is not covered here. After rearrangement, several one-dimensional convolution layers are applied for local correlation extraction. Each convolution operation contains multiple channels (Ch). The size of the convolution kernel is related to the designed output dimension. We keep the output dimension of each convolutional layer at (Ch × P), which is achieved by setting the stride size and padding number.

After obtaining the convolutional layer output, the data of each column (of dimension Ch) are processed by a small fully connected NN and mapped to the prediction data of the corresponding time grid point. Compared with flattening into one-dimensional data followed by a full connection, our proposed method effectively reduces the scale and complexity of the dense NN structures. After local correlation extraction by the convolutional layers, the output columns share a consistent distribution and similar characteristics. On the basis of this intuition, we share the parameters among the fully connected NNs, which gives the NN structure fewer parameters and fast convergence. The input of each dense layer is (Ch × 1) and the dense layer output dimension is set to 2, representing one complex point. The total number of fully connected NNs is P, corresponding to the number of columns of the convolutional layer output. The outputs of these fully connected NNs are concatenated to form the output of length 2P, which represents the predicted pulse at distance z.
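A compact PyTorch sketch of this nonlinear-feature network is shown below. The framework, the treatment of the D × 2P matrix as 2D input channels over P grid points, and the use of 1×1 convolutions to realize the shared per-column dense layers are our own interpretation of Fig. 1(b), not the authors' released code:

import torch.nn as nn

class FSMConvNet(nn.Module):
    # Assumes the D x 2P input is viewed as 2*D real channels over P time points.
    def __init__(self, D=10, Ch=90):
        super().__init__()
        self.conv = nn.Sequential(                       # 3 conv1d layers, output kept at (Ch, P)
            nn.Conv1d(2 * D, Ch, kernel_size=18, padding='same'), nn.ReLU(),
            nn.Conv1d(Ch, Ch, kernel_size=9, padding='same'), nn.ReLU(),
            nn.Conv1d(Ch, Ch, kernel_size=9, padding='same'), nn.ReLU(),
        )
        self.dense = nn.Sequential(                      # shared dense layers applied per grid point
            nn.Conv1d(Ch, Ch, 1), nn.ReLU(),
            nn.Conv1d(Ch, Ch, 1), nn.ReLU(),
            nn.Conv1d(Ch, Ch, 1), nn.ReLU(),
            nn.Conv1d(Ch, Ch, 1), nn.ReLU(),
            nn.Conv1d(Ch, 2, 1),                         # 2 outputs per point: real and imaginary parts
        )

    def forward(self, x):                                # x: (batch, 2*D, P)
        return self.dense(self.conv(x))                  # (batch, 2, P), i.e. the next pulse

Under these assumptions the network has roughly 2 × 10^5 trainable parameters regardless of P, consistent with the parameter count reported in the complexity section.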

During the training process, the mean absolute error (MAE) loss between the predicted pulse and the actual pulse in the label is calculated for NN training. Through a hyperparameter search, the number of convolution layers is set to 3 and the number of fully connected hidden layers to 4. The number of kernel channels is 90, i.e., Ch = 90. We select a convolution kernel size of 18 for the first conv1d layer and 9 for the other two conv1d layers. If the FSM method is not used, the kernel size must be larger to maintain high prediction accuracy. The hidden dimension of the dense layers is the same as that of their input. The optimizer is RMSprop [22] and the number of epochs is 120. The learning rate is initialized to 1e-3 and then decreases over the epochs according to a cosine annealing schedule with warm restarts [23].
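A training-loop sketch matching these settings is given below; PyTorch is assumed (the paper only states that Python is used), `train_loader` is a hypothetical data loader yielding (input window, target pulse) pairs, and the restart period `T_0` of the scheduler is our own placeholder since it is not reported:

import torch

model = FSMConvNet()                                        # sketch model from above
loss_fn = torch.nn.L1Loss()                                 # mean absolute error (MAE)
optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=10)

for epoch in range(120):
    for x, y in train_loader:                               # x: (batch, 2*D, P), y: (batch, 2, P)
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()                                        # cosine annealing with warm restarts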

The above descriptions stress that the CNN with FSM can predict the full-field ultrafast nonlinear dynamics with a single NN and without any pulse truncation. Meanwhile, the convolution kernel is fed with fixed-length data at a time and then slid continuously to cover the full input length. Therefore, one convolution kernel flexibly supports dynamic processing of different input pulse lengths, which allows pulses to be truncated or extended beyond the number of points of the training pulses. We demonstrate two applications of this dynamic input length capability during the prediction stage. The first, shown in Fig. 2, is to dynamically truncate pulses so that the NN model is applied only to the high-energy portion; this helps reduce the prediction cost and is further discussed in the complexity section. The second is to accept multiple input pulses in one elongated window, which is useful for applying a trained model to longer input pulse lengths without re-training and is further investigated in the generalization section.
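A possible formulation of the truncation scheme of Fig. 2, with a user-defined power threshold and a contiguous high-energy window, is sketched below (the function names and the contiguous-window choice are our own assumptions, not the authors' implementation):

import numpy as np

def hybrid_step(window_after_linear, cnn_predict, threshold):
    # window_after_linear: (D, P) complex pulses already passed through the linear model
    latest = window_after_linear[-1]
    idx = np.nonzero(np.abs(latest) ** 2 >= threshold)[0]    # assumes at least one point exceeds the threshold
    lo, hi = idx.min(), idx.max() + 1                        # contiguous high-power segment
    out = latest.copy()                                      # low-power wings copied directly to the output
    out[lo:hi] = cnn_predict(window_after_linear[:, lo:hi])  # CNN refines only the truncated segment
    return out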


Fig. 2. Diagram of the dynamic input pulse length to the CNN based on FSM. A threshold defined by the user demands and channel conditions separates the high-power and low-power parts of the pulse at each step. The high-power part of the pulse after the linear model is input to the CNN for nonlinear modeling, while the low-power part after the linear model is copied directly to the output predicted pulse.


3. Nonlinear dynamics modeling setup and results

3.1 Modeling setup

To verify the nonlinear dynamics prediction capability of the proposed CNN structure based on the hybrid modeling method, we need the full nonlinear evolution maps of injected ultrafast pulses in the optical fiber as the training dataset. We chose two typical ultrafast nonlinear scenarios: higher-order soliton (HOS) compression and broadband optical super-continuum (SC) generation [24]. The FSM method and CNN structure can also be applied to other NLSE-like cases. Note that when the scenario changes, the parameters need to be reset. For example, when the modeling distance is much longer, a larger NN step length is required; otherwise, the training dataset becomes very large and the training time very long. When the step length is long, the accumulated dispersion also increases, so the kernel size should be slightly larger. The parameters for both cases are selected according to previous typical works and are listed in Table 1 [12,23]. In the first case, the pulse duration $\Delta \mathrm{\tau}$ (full-width at half-maximum, FWHM) and peak power P0 are randomly varied over 0.77–1.43 ps and 18.4–34.2 W. The soliton number $N = \sqrt {\mathrm{\gamma}{P_0}T_0^2/|{{\beta_2}} |} $ in this field varies from 3.5 to 8.9 ($\mathrm{\gamma}$ is the nonlinear parameter, ${\beta _2}$ is the group velocity dispersion parameter, and ${T_0} = \Delta \mathrm{\tau}/1.763$). The total transmission distance of the training data is 1300 cm and the NN step distance is 13 cm, so the total number of NN iteration steps is 100. In this case, we demonstrate generalization of the number of input pulse points, duration, peak power, and distance beyond the training dataset conditions. For SC generation, the input peak power varies from 500 W to 2000 W, and the pulse duration is fixed at 0.1 ps. The SC generation process includes more complex nonlinear dynamics induced by Raman and self-steepening effects, and its results verify the ability to predict complex nonlinear dynamics. The grid points of the training pulses are 1024 and 2048 for HOS compression and SC generation, respectively. We emphasize that the full set of pulse points can be predicted accurately and quickly by the proposed CNN structure with FSM.
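For reference, the soliton number used to characterize the HOS case can be computed as follows; the fiber parameters γ and β2 are given in Table 1 and are therefore only placeholders here:

from math import sqrt

def soliton_number(gamma, P0, dtau_fwhm, beta2):
    # N = sqrt(gamma * P0 * T0^2 / |beta2|), with T0 = FWHM / 1.763 for a sech pulse
    T0 = dtau_fwhm / 1.763
    return sqrt(gamma * P0 * T0 ** 2 / abs(beta2))

# Units must be consistent, e.g. gamma in W^-1 m^-1, P0 in W, dtau in s, beta2 in s^2/m;
# the values below are illustrative placeholders, not the Table 1 parameters.
N = soliton_number(gamma=0.1, P0=30.0, dtau_fwhm=0.8e-12, beta2=-20e-27)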


Table 1. Modeling parameters

3.2 Results

3.2.1 Comparisons of different methods

To express the prediction gap between the NN-based model and the NLSE simulation quantitatively, we record the prediction errors of different cases and conditions. The root normalized mean square error (RNMSE) can be described as

$$RNMSE = \frac{1}{n}\sum\limits_{n} \sqrt {\frac{\sum\nolimits_{p,d} {\left\| A_{n,p,d} - \hat{A}_{n,p,d} \right\|^2}}{\sum\nolimits_{p,d} {\left\| A_{n,p,d} \right\|^2}}},$$
where A and $\hat{A}$ denote the complex-valued transmitted pulses simulated by the NLSE and predicted by the NN-based model, respectively. The variables p and d represent the grid point and distance indices, and n denotes the number of random input pulse conditions.
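A direct NumPy implementation of Eq. (2), assuming the pulses are stored as complex arrays indexed by (realization, grid point, distance), reads:

import numpy as np

def rnmse(A_true, A_pred):
    # A_true, A_pred: complex arrays of shape (n, P, n_distances), cf. Eq. (2)
    num = (np.abs(A_true - A_pred) ** 2).sum(axis=(1, 2))
    den = (np.abs(A_true) ** 2).sum(axis=(1, 2))
    return np.mean(np.sqrt(num / den))          # average over the n input conditions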

First, the RNN structure and our designed CNN structure are compared with and without FSM. The RNN structure we test is the same as that in [17], consisting of 1 recurrent layer, 2 hidden dense layers, and an output layer. The output dimension of the recurrent layer is the same as that of the input, and the previous pulse window is 10. Each input pulse consists of 1024 grid points in the HOS compression scene, so the input dimension of each RNN cell is 1024 for full-length pulse prediction. The kernel size of the CNN is set to 18 whether FSM is performed or not, which allows a fair prediction accuracy comparison at the same complexity.

As shown in Fig. 3, the RNMSE vs. distance is presented to show the iteration errors during transmission. The results are tested under the conditions of HOS compression. Each RNMSE is computed over the $n = 100$ full-time pulses in the testing dataset with different input conditions. Distances up to 13 m are included in training, and distances beyond 13 m are not, in order to test generalization. For both methods, the prediction error accumulates over the propagation distance, because errors at previous distances propagate and further reduce the accuracy at subsequent distances. The RNMSE of the CNN at 13 m improves significantly from 0.15 without FSM to 0.09 with FSM, which highlights the effectiveness of FSM for CNNs with small kernel sizes. Additional simulations show that the same level of accuracy can be achieved without FSM by increasing the convolution kernel size (to larger than 82), but the complexity increases significantly and the model with many parameters is hard to train. As for the RNN, the FSM operation has no effect on the prediction accuracy in this full-length input structure: since the full-length grid points are input to the RNN cell, global computation over all points is achieved, which already provides good capability for modeling the channel effects (linear and nonlinear effects together). With FSM, the input points of each RNN cell could be shortened by sliding a length-limited window over the full length, which would reduce the number of input grid points and the computational complexity; however, compared with the CNN structure, this RNN operation must be designed manually and results in a new data processing flow, which is beyond the scope of this article. Note that the errors of the RNN increase sharply after 13 m, which indicates weak distance generalization. In contrast, the iterative errors of the CNN increase steadily and remain lower than those of the RNN, which demonstrates the good distance generalization ability and stability of the local correlation calculation structure. We emphasize that while ensuring high performance, the CNN with FSM has much lower complexity, which is further described in the complexity section.


Fig. 3. The RNMSE vs. distances of CNN and RNN with (w) or without (w/o) FSM. The RNMSE at each distance is averaged over the 100 full-time pulses in the testing dataset with random input conditions.


3.2.2 Accuracy

We also present detailed results of HOS compression and SC generation to demonstrate the prediction accuracy of the CNN with FSM. The overall pulse evolution with transmission distance and the detailed pulse profiles at selected distances are illustrated. The results are simulated with the NLSE and predicted by the CNN with FSM for accuracy comparison.

Figure 4 illustrates the temporal intensity evolution (Fig. 4(a)), spectral intensity evolution (Fig. 4(b)), and phase changes with time and distance (Fig. 4(c)) of the HOS propagation dynamics. The input peak power for these results is 30 W and the duration is 0.8 ps; the corresponding soliton number is about 4.7. These conditions are not included in the training dataset. From Fig. 4(a) and Fig. 4(b), one can see the high consistency between the pulse propagation predicted by the CNN with FSM and that simulated from the NLSE for both the temporal and spectral intensity evolutions. Note that the narrowest pulse in the temporal evolution and the maximum expansion in the spectral evolution occur at the same distance, which demonstrates the accurate prediction of the maximal compression distance. For a clearer visual comparison, we select three distances (1.3 m, 7.8 m, and 10.4 m) to plot the intensity profiles in the temporal and spectral domains, where the full-time and full-spectrum intensity profiles overlap almost completely. The RNMSE over the full temporal evolution is calculated as 0.028, a lower value than that of the RNN, indicating the high accuracy of the pulse prediction over the full evolution.


Fig. 4. Temporal evolution, spectra evolution and pulse profiles of high-order soliton (HOS) compression simulated by NLSE and predicted by CNN with FSM. a-b, Temporal (a), and spectral (b) intensity evolution of NLSE (left panel) and CNN (right panel), and comparison between the predicted (red lines) and simulated (blue lines) profiles at selected distances (middle panel). c, Pulse phase profiles in time domain simulated by NLSE (blue dash lines) and predicted by CNN (red dotted lines). All the results correspond to the input pulse with 0.8ps duration and 30W input peak power. The corresponding soliton number is about 4.7.


We also plot the pulse phase profiles in the time domain at different distances in Fig. 4(c) to demonstrate the phase prediction accuracy of the CNN with FSM. For a grid point p represented by a complex number, the phase can be calculated as $Phase(p )= \arctan \left[ {\frac{{I(p )}}{{R(p )}}} \right]$, where I(p) and R(p) represent the imaginary and real parts of point p, respectively. The phase value lies in the range [-$\mathrm{\pi }$, $\mathrm{\pi }$], which can result in a discontinuous phase curve. Therefore, the phase unwrapping algorithm is used to present continuous, physically meaningful phase curves [25]. The phase curves of the NLSE simulation and the CNN prediction coincide closely, demonstrating high phase prediction accuracy. Full-field nonlinear dynamics prediction is thus successfully achieved by the proposed CNN structure with FSM.
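In practice this amounts to two NumPy calls per profile; we use the quadrant-aware arctan2 rather than a plain arctan, which is the usual implementation choice rather than something stated in the paper:

import numpy as np

# field: complex temporal (or spectral) profile at a given distance
phase = np.arctan2(field.imag, field.real)   # wrapped phase in [-pi, pi]
phase_unwrapped = np.unwrap(phase)           # remove the 2*pi jumps to obtain a continuous curve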

We next implement the CNN with FSM to predict more complex nonlinear dynamics, i.e., the generation of a broadband SC. This case considers femtosecond pulse propagation in a highly nonlinear fiber with anomalous dispersion, where the delayed Raman response and self-steepening effects produce stronger nonlinear dynamics than self-phase modulation [26,27]. Results are shown in Fig. 5(a) and Fig. 5(b) for the temporal and spectral intensity evolution, respectively. The associated intensity profiles at 1 cm, 6 cm, and 16 cm are also plotted. The input peak power for these results is 1.6 kW. The excellent visual agreement between the simulated and predicted evolution maps can be seen in both the temporal and spectral domains. For the temporal evolution, the generation process induced by high-order dispersion and Raman perturbation, including initial compression, soliton fission, and dispersive emission [23], is described perfectly by the CNN. For the spectral evolution, the generation processes, including dispersive wave generation, continuous redshift, and extremely broadband spectral output, are reproduced perfectly by the CNN. The RNMSE over the full spectral evolution is 0.08, slightly higher than that of the HOS compression case owing to the more complex nonlinear dynamics, but still at a high accuracy level, as demonstrated by the accurate pulse profiles. Figure 5(c) plots the pulse phase profiles in the spectral domain simulated by the NLSE and predicted by the CNN. Because the phase changes drastically with frequency, the phase unwrapping algorithm is also used for a clear comparison of the phase curves. The close coincidence proves the phase prediction accuracy of the CNN with FSM.


Fig. 5. Temporal evolution, spectra evolution and pulse profiles of supercontinuum (SC) generation simulated by NLSE and predicted by CNN with FSM. a-b, Temporal (a) and spectral (b) intensity evolution of NLSE (left panel) and CNN (right panel), and comparison between the predicted (red lines) and simulated (blue lines) profiles at selected distances (middle panel). c, Pulse phase profiles in spectral domain simulated by NLSE (blue dash lines) and predicted by CNN (red dotted lines). All the results correspond to the input pulse with 0.1ps duration and 1.6 kW input peak power.


3.2.3 Generalization

It is well known that the generalization capability of an NN is a major challenge and an important criterion for applications. Only if a neural network can effectively generalize over the input conditions can it find a wide range of practical applications. For ultrafast pulse propagation, different input conditions always lead to different nonlinear evolution maps.

The input peak power and pulse duration are usually the quantities of interest in ultrafast pulse studies. Generalization over durations and peak powers was reported in the previous literature, where an RNN was investigated [17]. Here, we demonstrate the generalization capability of the CNN for different input conditions. The durations and peak powers are randomly varied over 0.77–1.43 ps and 18.4–34.2 W in the training dataset, so the designed CNN can realize accurate predictions within these ranges. To demonstrate the generalization ability of the CNN with FSM, we show the temporal intensity profiles for different durations and input peak powers in the HOS compression case. These conditions are hand-picked and simulated separately. We investigate the generalization against input pulse conditions in Fig. 6. In the upper two rows of Fig. 6, the durations are both 0.8 ps, and the input peak powers are 20 W and 30 W, respectively. In the first and third rows, the input peak powers are both 20 W, and the durations are 0.8 ps and 1.4 ps. Under these three sets of conditions, the input pulses compress and undergo fission at different rates, causing vastly different pulse shapes at the same distance. The pulses simulated by the NLSE and predicted by the CNN overlap closely at each distance, which shows the generalization ability against input pulse durations and peak powers.


Fig. 6. Generalization capabilities validation of CNN with FSM for high-order soliton compression temporal dynamics. Pulse intensity profiles at selected distances for different input conditions are shown, including 0.8ps duration and 20W input peak power (the first row), 0.8ps duration and 30W input peak power (the second row), and 1.4ps duration and 20W input peak power (the third row). The distances presented are 1.3 m, 7.8 m, 10.4 m, and 19.5 m. Among them, 19.5 m exceeds the maximum length in the training dataset (13 m).


Besides generalization over input conditions, we next test and demonstrate the ability of the CNN to predict distances farther than those in the training set, which, to our knowledge, is the first investigation of distance generalization capability. The fourth column of Fig. 6 shows the prediction result at 19.5 m, 150% of the training transmission length of 13 m. The results are acquired by iterating the CNN model for another 50 steps beyond the original 100 steps. Although the RNMSE of the pulse at 19.5 m is larger than that within the training distance range, the temporal peak location and intensity agree very well between the NLSE simulation and the CNN prediction. The difference mainly comes from the low-energy part of the pulse base, while the high-power part of the pulse is predicted accurately, maintaining the pulse shape and reflecting the nonlinear dynamics. The ability to predict beyond the distances in the training data shows that the CNN does learn the nonlinearity features of ultrafast pulse propagation. As far as we know, such generalization has not yet been reported in the literature on ultrafast pulse propagation modeling.

We also test the generalization ability of the CNN with FSM for the SC generation scenario. As shown by the pulse condition settings in Table 1, the input peak power ranges from 500 W to 2000 W and the fiber length is within 20 cm in the training dataset. The generalization results are presented in Fig. 7, where we show the temporal intensity profiles for different input peak powers and transmission distances in the SC generation case. The input peak powers are set to 673 W, 1014 W, and 1390 W for presentation; these conditions never occurred in the training dataset. The transmission length varies between 1.0 cm and 35 cm, the latter reaching 175% of the training distance. The pulses simulated by the NLSE and predicted by the CNN overlap closely for all settings, which shows the good generalization ability over input peak power and distance in the SC generation scenario.


Fig. 7. Generalization capabilities validation of CNN with FSM for super-continuum (SC) generation temporal dynamics. Pulse intensity profiles at selected distances for different input peak powers are shown, including 673 W (the first row), 1014 W (the second row), and 1390 W (the third row). The distances presented are 1.0 cm, 16.0 cm, 28.0 cm, and 35.0 cm. Among them, 35.0 cm exceeds the maximum length in the training dataset (20 cm).


The flexibility of the CNN also enables modeling multiple input pulses in one elongated window. In Fig. 8 we extend the temporal window size from 10 ps to 40 ps and the corresponding grid points from 1024 to 4096, which enables inputting 4 pulses of different durations and peak powers at once. All peaks propagate with great accuracy compared with the NLSE simulations. This example demonstrates the CNN's generalization over pulse duration, power, transmission distance, and simulation grid size.


Fig. 8. Propagation of multiple pulses in a single simulation. The training dataset contains only one pulse per window. During the testing process, the number of input points can be elongated to form multiple pulses with different conditions. Results are tested in the high-order soliton compression case.


3.2.4 Complexity

To compare the complexity of the RNN and CNN, we theoretically calculate the total numbers of NN parameters and multiplication operations. The number of parameters directly reflects the memory requirement, and the number of multiplications is a common criterion for evaluating the complexity of an algorithm [28].

The CNN structure is described in detail above. The RNN structure we test is the same as that in [17] and is also described above. Although both models can reduce the dimension of the input layer by truncating the input pulse, to compare the complexity of the two structures more fairly, we do not truncate when calculating the numbers of parameters and multiplications. Figure 9(a) illustrates the number of parameters vs. the number of pulse grid points for the RNN and CNN. One can see that the parameter count of the RNN increases quadratically with the number of input points, since the dimension of the hidden layer is the same as that of the input layer. If the hidden dimension were fixed at a small value, the number of parameters could be reduced to a linear trend, but the accuracy would be reduced. The parameters of the CNN remain unchanged as the number of points increases, because the designed CNN structure is independent of the number of input pulse grid points, as shown in Section 2. The parameters required by the CNN are much fewer than those of the RNN. Specifically, the RNN needs about 14 billion parameters at 2048 pulse grid points, while the CNN remains constant at about 200 thousand. Namely, the required parameters of the CNN are only 0.14% of those of the RNN at 2048 pulse grid points, a decrease of orders of magnitude. The much smaller parameter count greatly reduces the storage cost.
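For the CNN, the constant parameter count can be checked directly from the sketch model of Section 2 (under our architectural assumptions):

n_params = sum(p.numel() for p in FSMConvNet().parameters() if p.requires_grad)
print(n_params)   # roughly 2e5 with the assumed layer sizes, independent of the grid points P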


Fig. 9. The number of parameters (a) and multiplications (b) of CNN and RNN. The grid point number refers to the sampled pulse length input to the neural network. Note that the input pulses are not truncated for a fair comparison between different models.


The number of multiplications vs. the number of grid points for the RNN and CNN is illustrated in Fig. 9(b). The results show a nearly linear increase for the CNN and a quadratic increase for the RNN; the trend with the number of pulse grid points is determined by the structure and dimensions of the NN. The numbers of multiplications of the CNN and RNN with 2048 pulse points are 400 million and 14 billion, respectively. As the number of grid points increases, the complexity reduction brought by the CNN structure becomes more obvious. The much smaller number of multiplications demonstrates the lower computational complexity of the CNN.

We also record the running times of the CNN, RNN, and NLSE under the same hardware and software conditions. The NLSE is implemented by the classical split-step Fourier method, and the channel conditions are the same as for SC generation. The codes of the three methods are run on the same server with two Intel Xeon Gold 6146R processors using Python. The model for each condition is run five times and the average value is taken as the time result. As shown in Fig. 10, the time increases with the number of grid points and the number of realizations on the central processing unit (CPU), and the CNN time is shorter than those of the RNN and NLSE. For example, 10 realizations for different pulses with 2048 points take about 763 s with the NLSE, 46 s with the RNN, and 5.9 s with the CNN. The running time of the CNN thus achieves a 94% reduction versus the NLSE and an 87% reduction versus the RNN. If the NNs are run on a graphics processing unit (GPU), the time can be further reduced.


Fig. 10. The running time to compute evolution maps of supercontinuum using NLSE, RNN, and CNN with FSM on CPU. The transmission distance is 20 cm, and the number of computed maps is 10 and 100.


In addition, the input pulse can be truncated to accelerate the computation, as shown in Fig. 2. This is done dynamically and does not require retraining the model. In the HOS compression case with 100 realizations, the input pulse initially includes 1024 complex points; it can be truncated to 760 points at the cost of a 5% increase in RNMSE, reducing the time from 19 s to 13 s. In SC generation, the input pulse initially includes 2048 complex points; it can be truncated to 970 points with a 0.2% increase in RNMSE, reducing the time from 50 s to 18 s with 100 realizations.

The above results demonstrate that the CNN with FSM can accurately, flexibly, and rapidly predict the full-field ultrafast nonlinear dynamics. The intensity and phase information of the full-length input pulse in both the temporal and spectral domains can be predicted. Compared with the RNN, the CNN with FSM presents equal accuracy within the training transmission distances and higher accuracy beyond the training distances. Owing to the local convolution computation, the input pulse length can be flexibly shortened or elongated, which provides a more flexible nonlinear dynamics prediction method. In addition, after the linear feature modeling by the traditional model-driven method, the residual nonlinear features can be regarded as short-time-dependent relations, so a CNN structure with fewer parameters can be applied for local feature computation, leading to a prediction method with a much shorter running time. Overall, the CNN with FSM achieves high accuracy, robust generalization, and low complexity.

4. Conclusion

In conclusion, we propose a convolutional FSM method for nonlinear dynamics prediction of the full-field ultrafast pulse evolution. Compared with the existing methods, our proposed method generalizes better and runs faster because of the designed small-scale CNN structure and the linear and nonlinear feature separation operation. The high degree of overlap of the intensity and phase profiles and the low RNMSEs demonstrate the accuracy of the proposed method. At 2048 pulse grid points, the required parameters of the CNN are only 0.14% of those of the RNN, and the lower complexity results in an 87% reduction in computing time compared with the RNN. In addition, the generalization ability of the modeling scheme under different input pulse conditions is verified. The CNN structure can realize dynamic input pulse length prediction in the testing process owing to its local correlation computing capability. We believe this FSM method with local correlation computation will have positive impacts on future nonlinear physics research, owing to its flexible, high-accuracy, and low-complexity design for nonlinear dynamics analysis and prediction.

Funding

National Natural Science Foundation of China (62025503).

Disclosures

The authors declare no conflicts of interest.

Data availability

The data is obtained from the fiber channel numerical simulation based on NLSE, and the related channel code is obtained from [29].

References

1. P. G. Drazin and P. D. Drazin, Nonlinear system (Cambridge University Press, 1992).

2. D. Swaroop, J. K. Hedrick, P. P. Yip, and J. C. Gerdes, “Dynamic surface control for a class of nonlinear systems,” IEEE Trans. Autom. Control 45(10), 1893–1899 (2000). [CrossRef]  

3. E. Mosekilde, Topics in nonlinear dynamics: Applications to physics, biology and economic systems (World Scientific, 2003).

4. S. H. Strogatz, Nonlinear dynamics and chaos: with applications to physics, biology, chemistry, and engineering (CRC press, 2018).

5. M. Casdagli, “Nonlinear prediction of chaotic time series,” Phys. D 35(3), 335–356 (1989). [CrossRef]  

6. O. Tzang, A. M. Caravaca-Aguirre, K. Wagner, and R. Piestun, “Adaptive wavefront shaping for controlling nonlinear multimode interactions in optical fibres,” Nat. Photonics 12(6), 368–374 (2018). [CrossRef]  

7. G. Pu, L. Yi, L. Zhang, and W. Hu, “Intelligent programmable mode-locked fiber laser with a human-like algorithm,” Optica 6(3), 362–369 (2019). [CrossRef]  

8. P. N. Slater, “Remote sensing: optics and optical systems,” Reading (1980).

9. B. A. Flusberg, E. D. Cocker, W. Piyawattanametha, et al., “Fiber-optic fluorescence imaging,” Nat. Methods 2(12), 941–950 (2005). [CrossRef]  

10. C. R. Petersen, N. Prtljaga, M. Farries, J. Ward, B. Napier, G. Rhys Lloyd, J. Nallala, N. Stone, and O. Bang, “Mid-infrared multispectral tissue imaging using a chalcogenide fiber supercontinuum source,” Opt. Lett. 43(5), 999–1002 (2018). [CrossRef]  

11. G. P. Agrawal, Nonlinear fiber optics, 4th ed. (Academic Press, 2006).

12. A. Tikan, C. Billet, G. El, A. Tovbis, M. Bertola, T.T. Sylvestre, F. Gustave, S. Randoux, G. Genty, P. Suret, and J. M. Dudley, “Universality of the Peregrine soliton in the focusing dynamics of the cubic nonlinear Schrödinger equation,” Phys. Rev. Lett. 119(3), 033901 (2017). [CrossRef]  

13. G. Genty, L. Salmela, J. M. Dudley, Daniel Brunner, A. Kokhanovskiy, S. Kobtsev, and S. K. Turitsyn, “Machine learning and applications in ultrafast photonics,” Nat. Photonics 15(2), 91–101 (2021). [CrossRef]  

14. M. Raissi, P. Perdikaris, and G. E. Karniadakis, “Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations,” J. Comput. Phys. 378, 686–707 (2019). [CrossRef]  

15. G. Z. Wu, Y. Fang, Y. Y. Wang, G. C. Wu, and C. Q. Dai, “Predicting the dynamic process and model parameters of the vector optical solitons in birefringent fibers via the modified PINN,” Chaos, Solitons Fractals 152, 111393 (2021). [CrossRef]  

16. X. Jiang, D. Wang, Q. Fan, M. Zhang, C. Lu, and A. P. T. Lau, “Physics-informed Neural Network for Nonlinear Dynamics in Fiber Optics,” arXiv, arXiv:2109.00526 (2021).

17. L. Salmela, N. Tsipinakis, A. Foi, C. Billet, J. M. Dudley, and G. Genty, “Predicting ultrafast nonlinear dynamics in fiber optics with a recurrent neural network,” Nat. Mach. Intell. 3(4), 344–354 (2021). [CrossRef]  

18. L. Salmela, M. Hary, M. Mabed, A. Foi, J. M. Dudley, and G. Genty, “Feed-forward neural network as nonlinear dynamics integrator for supercontinuum generation,” Opt. Lett. 47(4), 802–805 (2022). [CrossRef]  

19. M. Tateda, N. Shibata, and S. Seikai, “Interferometric method for chromatic dispersion measurement in a single-mode optical fiber,” IEEE J. Quantum Electron. 17(3), 404–407 (1981). [CrossRef]  

20. Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, “Backpropagation applied to handwritten zip code recognition,” Neural. Comput. 1(4), 541–551 (1989). [CrossRef]  

21. H. Yang, Z. Niu, H. Zhao, S. Xiao, W. Hu, and L. Yi, “Fast and accurate waveform modeling of long-haul multi-channel optical fiber transmission using a hybrid model-data driven scheme,” J. Lightwave Technol. 40(14), 4571–4580 (2022). [CrossRef]  

22. G. Hinton, N. Srivastava, and K. Swersky, “Neural networks for machine learning, Lecture 6a: Overview of mini-batch gradient descent,” (2012).

23. T. Tieleman and G. Hinton, “Lecture 6.5 - rmsprop: Divide the gradient by a running average of its recent magnitude,” (2012).

24. J. M. Dudley, G. Genty, and S. Coen, “Supercontinuum generation in photonic crystal fiber,” Rev. Mod. Phys. 78(4), 1135–1184 (2006). [CrossRef]  

25. K. Itoh, “Analysis of the phase unwrapping algorithm,” Appl. Opt. 21(14), 2470 (1982). [CrossRef]  

26. R. W. Hellwarth, “Theory of stimulated Raman scattering,” Phys. Rev. 130(5), 1850–1852 (1963). [CrossRef]  

27. F. DeMartini, C. H. Townes, T. K. Gustafson, et al., “Self-steepening of light pulses,” Phys. Rev. 164(2), 312–323 (1967). [CrossRef]  

28. A. Napoli, Z. Maalej, V. A. J. M. Sleiffer, M. Kuschnerov, D. Rafique, E. Timmers, B. Spinnler, T. Rahman, L. D. Coelho, and N. Hanik, “Reduced complexity digital back-propagation methods for optical communication systems,” J. Lightwave Technol. 32(7), 1351–1362 (2014). [CrossRef]  

29. pyNLO group, “pyNLO: Nonlinear optics modeling for Python,” GitHub (2015), https://github.com/pyNLO/PyNLO.
