
Data-driven inverse design of mode-locked fiber lasers

Open Access

Abstract

The diverse applications of mode-locked fiber lasers (MLFLs) impose various demands on the laser output, including the pulse duration, energy, and shape. Simulation is an excellent method to guide the design and construction of an MLFL for on-demand laser output. Traditional simulation of an MLFL uses the split-step Fourier method (SSFM) to solve the nonlinear Schrödinger (NLS) equation, which suffers from high computational complexity. As a result, the inverse design of MLFLs via the traditional SSFM-based simulation method relies on design experience. Here, a completely data-driven approach for the inverse design of MLFLs is proposed, which significantly reduces the computational complexity and achieves a fast, automatic inverse design of MLFLs. We utilize a recurrent neural network to realize fast and accurate MLFL modeling; the desired cavity settings meeting the output demands are then searched via a deep reinforcement learning algorithm. The results prove that the data-driven method enables the accurate inverse design of an MLFL to produce a preset target femtosecond pulse with a certain duration and pulse energy. In addition, the cavity settings generating soliton molecules with different target separations can also be located via the data-driven inverse design. With GPU acceleration, the time consumption of the data-driven inverse design of an MLFL is less than 1.3 hours. The proposed data-driven approach is applicable to guide the inverse design of an MLFL to meet the demands of various applications.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Owing to their extremely high peak power, wide spectra, and short durations, femtosecond pulses hold a vital position in high-precision measurement [1,2], material processing [3], signal processing [4], and optical communication [5]. The mode-locked fiber laser (MLFL) is the primary means of generating femtosecond pulses. To fit different applications, the cavity parameters of an MLFL need to be specially designed to generate femtosecond pulses with different characteristics (e.g., pulse width, energy, and shape). Traditionally, the cavity parameters of an MLFL are repeatedly modified in experiments to obtain the target femtosecond pulse, which is verified by complex experimental characterizations. This experimental approach is inefficient and relies heavily on design experience. Simulation is an effective method to guide experiments. The numerical simulation model of the MLFL is implemented based on the split-step Fourier method (SSFM), which iteratively solves the nonlinear Schrödinger equation (NLSE) [6]. Then, a trial-and-error-based manual inverse design is carried out to search for the desired cavity parameters according to the laser model simulation results. However, searching for the desired cavity setting manually is considerably time-consuming and offers little confidence of locating the optimal solution. Further, as the number of cavity parameters (e.g., cavity length, pump power) increases, the task evolves into a multi-dimensional optimization problem that is very hard to solve empirically. Algorithm-driven automatic inverse design, consisting of systematic modeling and multi-parameter optimization, is an effective method for such multi-dimensional design problems and has been employed in the construction of various optical systems (e.g., nanophotonic devices [7], Raman fiber amplifiers [8,9]).
Recently, given a numerical model of an MLFL based on the generalized NLSE solved by the SSFM, a genetic algorithm (GA) enabled the automatic search of cavity parameters (e.g., pump power and fiber length) [10]. However, the iterative solution process of the SSFM costs considerable time in realizing the inverse design of an MLFL. Therefore, a rapid and accurate method for designing MLFLs to meet the requirements of different applications is urgently needed.

In recent years, as an emerging technology, deep learning has been applied extensively in the field of photonics and has broken numerous technical barriers [11–15]. The improved deep deterministic policy gradient (DDPG) algorithm has been applied to control MLFLs to enhance robustness and responsiveness against environmental instabilities [16,17]. Furthermore, deep learning provides a rapid and accurate means of modeling nonlinear fiber-optic systems [18,19]. Previously, we demonstrated a long short-term memory (LSTM) model with prior information feeding to accurately model MLFLs, which is 6 times faster than the SSFM on an identical hardware platform [20]. Accurate and rapid modeling of MLFLs can certainly benefit their inverse design and parameter optimization. Thus, fast inverse design of MLFLs may be achieved by involving deep learning in both MLFL modeling and cavity-parameter optimization.

In this work, a completely data-driven method for rapid and accurate inverse design of MLFLs is proposed. The data-driven method consists of rapid MLFL modeling and multi-parameter optimization. Concretely, our previous LSTM model with prior information feeding for MLFL modeling is wrapped by a DDPG model for multi-parameter optimization. The proposed algorithmic combination quickly and accurately optimizes the cavity parameters of the MLFL to generate the target ultrafast pulses. The results demonstrate that the data-driven method can design different MLFLs to produce femtosecond pulses with certain durations and energies, as well as soliton molecules with various separations. With GPU acceleration, the proposed method completes the design of an MLFL in merely 1.3 hours. We believe the proposed method could assist researchers in quickly building an ideal MLFL and can be extensively referred to in the inverse design of other optical systems.

2. Principles

2.1 Rapid and accurate modeling of the MLFL

Figure 1 shows a typical erbium-doped MLFL based on a saturable absorber (SA) for validating the inverse design algorithm. The MLFL consists of single-mode fiber (SMF), a wavelength division multiplexer (WDM), an SA, a 90/10 optical coupler, and 0.3 m of erbium-doped fiber (EDF) that acts as the gain medium pumped via a laser diode (LD). The attenuation, second-order dispersion, third-order dispersion, and mode-field diameter of the SMF are 0.2 dB/km, -0.022 ps²/m, 0.000086 ps³/m, and 10.4 µm, respectively.

Fig. 1. The simulated MLFL based on a saturable absorber. LD, laser diode; WDM, wavelength division multiplexer; EDF, erbium-doped fiber; SMF, single-mode fiber; SA, saturable absorber; ISO, isolator.

The evolution of the light field in the MLFL is described by the NLSE expressed as Eq. (1) [21],

$$\frac{{\partial u}}{{\partial z}} + \frac{{({\alpha - g} )}}{2}u + \frac{{i{\beta _2}}}{2}\frac{{{\partial ^2}u}}{{\partial {t^2}}} - \frac{{{\beta _3}}}{6}\frac{{{\partial ^3}u}}{{\partial {t^3}}} - \frac{g}{{\Omega _g^2}}\frac{{{\partial ^2}u}}{{\partial {t^2}}} = i\gamma {|u |^2}u$$
where u is the slowly varying optical field envelope, z is the propagation distance, t is the propagation time, $\alpha $ is the attenuation of the fiber, ${\beta _2}$ and ${\beta _3}$ are the second- and third-order dispersion coefficients, and $\gamma $ is the nonlinear coefficient. In particular, ${\mathrm{\Omega }_g}$ is the MLFL gain bandwidth and g is the saturation gain given in Eq. (2) [22],
$$g = {g_0}/({1 + {E_p}/{E_s}} )$$
where ${g_0}$ is the small-signal gain, and ${E_p}$ and ${E_s}$ are the pulse energy and gain saturation energy, respectively. The small-signal gain ${g_0}$ and the cavity length ${L_c}$ can be tuned to optimize the width, energy, and shape of the formed pulses in the simulation. To model MLFLs, the SSFM is the commonly used method to iteratively solve the NLSE; it is based on the idea of treating the linear and nonlinear effects separately over a small segment of fiber. As a result, the SSFM is computationally expensive and time-consuming.
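As a concrete illustration, one symmetric split step of Eq. (1) can be sketched as below: the linear terms (loss/gain, dispersion, gain filtering) are applied in the frequency domain over half a step, the Kerr nonlinearity over a full step in the time domain, and a second linear half-step closes the segment. This is a minimal sketch, not the authors' implementation; the Fourier convention ($\partial /\partial t \to -i\omega$) and the symmetric splitting are our assumptions.

```python
import numpy as np

def ssfm_step(u, dz, dt, beta2, beta3, alpha, g, omega_g, gamma):
    """One symmetric split step of Eq. (1): linear half-step in the
    frequency domain (loss/gain, beta2, beta3, gain filtering), full
    nonlinear Kerr step in the time domain, second linear half-step.
    Fourier convention: d/dt -> -i*omega."""
    n = u.size
    omega = 2.0 * np.pi * np.fft.fftfreq(n, d=dt)
    lin = (-(alpha - g) / 2.0
           + 1j * beta2 / 2.0 * omega**2
           + 1j * beta3 / 6.0 * omega**3
           - g / omega_g**2 * omega**2)
    half = np.exp(lin * dz / 2.0)
    u = np.fft.ifft(half * np.fft.fft(u))            # first linear half-step
    u = u * np.exp(1j * gamma * np.abs(u)**2 * dz)   # nonlinear Kerr step
    u = np.fft.ifft(half * np.fft.fft(u))            # second linear half-step
    return u
```

The per-step FFT pair is what makes the SSFM expensive: modeling thousands of roundtrips requires millions of such steps.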

An LSTM model was proposed to rapidly and accurately model MLFLs, which is 6 times faster than the SSFM [20]. The training data set of the LSTM model is gathered via the SSFM-based MLFL model, where ${g_0}$ and ${L_c}$ are randomly selected in the ranges of 2.5∼3.5 ${m^{ - 1}}$ and 1.03∼2.05 m (corresponding to repetition frequencies in the range of 100∼200 MHz), respectively. The waveform generated via the SSFM consists of 2048 points with a high time resolution of 4.9 fs. The sequence of waveforms $({{P_i},\; {P_{i + 1}},\; \ldots ,{P_{i + w - 1}}} )$, ${g_0}$, and ${L_c}$ serve as input features of the LSTM model to predict the waveform ${P_{i + w}}$ of the next roundtrip, where ${P_i}$ denotes the waveform of the i-th roundtrip and w denotes the length of the sequence. Finally, given an initial waveform and preset ${g_0}$ and ${L_c}$, the LSTM model can accurately predict the internal dynamics of the MLFL via roundtrip-iteration calculations.
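The surrogate model described above can be sketched in PyTorch as follows; the hidden size and the way the prior information (${g_0}$, ${L_c}$) is concatenated to every time step are our assumptions, not the exact architecture of Ref. [20].

```python
import torch
import torch.nn as nn

class RoundtripLSTM(nn.Module):
    """Sketch of the prior-information-fed LSTM: each time step is one
    roundtrip waveform concatenated with (g0, Lc); the output is the
    predicted waveform of the next roundtrip. Hidden size is a placeholder."""
    def __init__(self, n_points=2048, hidden=256):
        super().__init__()
        self.lstm = nn.LSTM(n_points + 2, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_points)

    def forward(self, waveforms, g0, lc):
        # waveforms: (batch, w, n_points); g0, lc: (batch,)
        b, w, _ = waveforms.shape
        params = torch.stack([g0, lc], dim=-1).unsqueeze(1).expand(b, w, 2)
        x = torch.cat([waveforms, params], dim=-1)  # feed prior info each step
        out, _ = self.lstm(x)
        return self.head(out[:, -1])                # predicted P_{i+w}
```

In use, the predicted waveform is appended to the input window and the oldest waveform dropped, so the cavity dynamics are rolled out roundtrip by roundtrip.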

2.2 Multi-parameter optimization algorithm of the MLFL

The purpose of the inverse design is to search for an optimal cavity setting enabling the MLFL to produce the target pulse. As shown in Fig. 2, the workflow of the inverse design consists of the environment and the agent. According to the current state (i.e., the actual pulse) of the environment, the agent selects an action, which is a cavity setting composed of ${g_0}$ and ${L_c}$ in this case. Then, the environment updates the state and calculates the reward value based on the next state and the target state. The DDPG algorithm is used as the agent to optimize the policy to maximize the reward value.

Fig. 2. The workflow of the inverse design of an MLFL.

The training architecture of the DDPG algorithm is illustrated in Fig. 3; it consists of two actor networks with the same structure and two critic networks with the same structure. The actor network ${\mu _\theta }({{s_t}} )$ contains an input layer with 2048 nodes, 3 hidden layers with 128, 64, and 64 nodes, respectively, and an output layer with 2 nodes, where $\theta $ represents the parameters of the network. The actor network serves as a policy network to choose the action ${a_t}$ based on the current state ${s_t}$, where the action consists of ${g_0}$ and ${L_c}$ and the state is the actual pulse. Then, the MLFL model generates the next state ${s_{t + 1}}$ after executing ${a_t}$. The critic network ${q_\omega }({{s_t}\; ,{a_t}} )$ estimates the Q value, i.e., the expected sum of rewards from ${s_t}$ till the end of each episode, where $\omega $ denotes the parameters of the critic network. The target actor and critic networks are replicas of the actor and critic networks and are applied to estimate the future Q value.
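The two network families can be sketched as below; the actor layer sizes follow the text, while the critic layer sizes are not stated in the paper and are placeholders, as is the tanh squashing of the actor output.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """mu_theta(s): maps the 2048-point pulse to (g0, Lc). Layer sizes
    follow the text; the tanh output squashing is our assumption."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2048, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 2), nn.Tanh())

    def forward(self, s):
        return self.net(s)  # rescaled to the (g0, Lc) ranges downstream

class Critic(nn.Module):
    """q_omega(s, a): Q value of a cavity setting given the current pulse.
    Layer sizes are placeholders (not stated in the paper)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2048 + 2, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 1))

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))
```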

Fig. 3. The training architecture of the DDPG algorithm.

The reward ${R_t}$ plays an important role in the DDPG algorithm, as it guides the training of the actor and critic networks. The differences in the pulse width, energy, and shape between the actual and target pulse are considered in the reward ${R_t}$, as shown in Eq. (3).

$${R_t} ={-} \{{\alpha [{MSE({{s_{t + 1}},{s_{target}}} )} ]+ \beta |{{W_{{s_{t + 1}}}} - {W_{{s_{target}}}}} |+ \gamma |{max({{s_{t + 1}}} )- max({{s_{target}}} )} |} \}$$
where $\alpha $, $\beta $, and $\gamma $ are the weights for the pulse shape, duration, and energy, respectively. ${W_{{s_{t + 1}}}}$ and ${W_{{s_{target}}}}$ are the temporal full widths at half maximum (FWHM) of the actual and target pulse, respectively. MSE indicates the mean-square error between the two pulses. The weight factors are hyperparameters of the DDPG algorithm. The typical magnitudes of the MSE, energy difference, and duration difference can be estimated from a randomly selected action. To ensure that the effects of pulse shape, energy, and duration on the reward value are consistent, the weight factors are set to balance the magnitude differences between these terms. Nevertheless, the weight factors do not need to be re-optimized for different target states. A larger reward is obtained after executing a decent action, meaning that the difference between the actual and target pulse becomes smaller. The target pulse is designed according to a hyperbolic-secant function [23].
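A sketch of Eq. (3) in code form is given below; the FWHM estimate via half-maximum threshold crossings and the default weight values are our assumptions.

```python
import numpy as np

def fwhm(pulse, dt):
    """Temporal FWHM via half-maximum threshold crossings (an assumption;
    the paper does not specify the estimator)."""
    above = np.where(pulse >= pulse.max() / 2.0)[0]
    return (above[-1] - above[0]) * dt

def reward(s_next, s_target, dt, w_shape=1.0, w_dur=1.0, w_energy=1.0):
    """Eq. (3): negative weighted sum of the shape (MSE), duration (FWHM),
    and peak-power mismatches. The weight values are placeholders."""
    mse = np.mean((s_next - s_target) ** 2)
    dur = abs(fwhm(s_next, dt) - fwhm(s_target, dt))
    peak = abs(s_next.max() - s_target.max())
    return -(w_shape * mse + w_dur * dur + w_energy * peak)
```

Identical pulses give the maximum reward of zero; any mismatch makes the reward more negative.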

The implementation of the DDPG algorithm includes data sampling and network learning. To train the model, numerous tuples (${s_t}$, ${a_t}$, ${s_{t + 1}}$, ${R_t}$) are stored in an experience buffer after the interaction between the environment and the actor network. Then, batches sampled from the experience buffer are used to train the actor and critic networks. The policy gradient is utilized to update the actor network ${\mu _\theta }({{s_t}} )$ to maximize the Q value by minimizing the loss function $Los{s_a}$, as shown in Eq. (4) [24],

$$Los{s_a} ={-} \frac{1}{N}\mathop \sum \limits_{j = 1}^N {q_\omega }({{s_j},{\mu_\theta }({{s_j}} )} )$$
where N is the number of samples. We use the temporal-difference error (TD-error) method to train the critic network ${q_\omega }({{s_t},{a_t}} )$ to accurately evaluate the action determined by the actor network. The loss function of the critic network is shown in Eq. (5) [24].
$$Los{s_c} = \frac{1}{N}\mathop \sum \limits_{j = 1}^N {[{{q_\omega }({{s_j},{a_j}} )- ({{q_{\bar{\omega }}}({{s_{j + 1}},{\mu_{\bar{\theta }}}({{s_{j + 1}}} )} )+ {R_j}} )} ]^2}$$
$$\left\{ {\begin{array}{{c}} {\bar{\theta } = \delta \ast \bar{\theta } + ({1 - \delta } )\ast \theta }\\ {\bar{\omega } = \delta \ast \bar{\omega } + ({1 - \delta } )\ast \omega } \end{array}} \right.$$

A soft-update strategy indicated in Eq. (6) [24] is applied for updating the parameters of the target networks, where $\delta $ is the soft-update weight. At the beginning of each episode, the action is randomly initialized. During the warm-up stage, the agent samples the environment for K steps per episode. Each episode ends when the reward exceeds a preset threshold, which usually signifies that the cavity setting corresponding to the target pulse has been found, or when the number of sampling steps reaches the preset maximum. The critic and actor networks are respectively trained 30 and 10 times after the end of each episode. The training process ends when the preset number of episodes is met.
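The learning step described by Eqs. (4)∼(6) can be condensed into a single update routine, sketched below; following the standard DDPG formulation [24], the critic loss is written as the mean squared TD-error, and the optimizer choice and batch layout are our assumptions.

```python
import torch

def ddpg_update(actor, critic, actor_t, critic_t, batch, opt_a, opt_c,
                delta=0.995):
    """One DDPG learning step following Eqs. (4)-(6).
    batch = (s, a, s_next, r) sampled from the experience buffer;
    actor_t / critic_t are the target networks of Eq. (6)."""
    s, a, s_next, r = batch
    # Critic update, Eq. (5): squared TD-error against the target networks.
    with torch.no_grad():
        q_target = r + critic_t(s_next, actor_t(s_next)).squeeze(-1)
    td = critic(s, a).squeeze(-1) - q_target
    loss_c = (td ** 2).mean()
    opt_c.zero_grad(); loss_c.backward(); opt_c.step()
    # Actor update, Eq. (4): maximize Q by minimizing -Q.
    loss_a = -critic(s, actor(s)).mean()
    opt_a.zero_grad(); loss_a.backward(); opt_a.step()
    # Soft update of the target networks, Eq. (6).
    for net, tgt in ((actor, actor_t), (critic, critic_t)):
        for p, pt in zip(net.parameters(), tgt.parameters()):
            pt.data.mul_(delta).add_((1.0 - delta) * p.data)
    return loss_a.item(), loss_c.item()
```

With $\delta$ close to 1, the target networks trail the online networks slowly, which stabilizes the bootstrapped Q targets.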

3. Results

The LSTM model for rapid modeling and the DDPG model for multi-parameter optimization are both implemented using the PyTorch framework and run on an NVIDIA RTX 3090 GPU. First, we demonstrate the training process of the agent to achieve the inverse design of an MLFL. Then, the proposed method is used to design different MLFLs to generate the desired solitons and soliton molecules.

Given a target pulse with a temporal FWHM of 260 fs and an energy of 24 pJ, Fig. 4(a) shows the reward value versus the number of episodes. The DDPG model warms up in the first 100 episodes to collect enough experience for training. During warm-up, ${g_0}$ and ${L_c}$ are randomly selected in the ranges of 2.4∼3.6 ${m^{ - 1}}$ and 1.03∼2.05 m, respectively, which results in extremely small reward values. When the training process starts, the reward value gradually increases and eventually converges to a stable value near zero. Figure 4(b) records the variation of ${g_0}$ and ${L_c}$ over 400 episodes. After training, ${g_0}$ and ${L_c}$ gradually converge to 3.03 ${m^{ - 1}}$ and 1.37 m, respectively. The evolution of the actual pulse during the training process and the comparison between the actual and target pulse are demonstrated in Fig. 4(c). The final MSE between the actual and target pulse is below 1.5e-4, indicating that the method can accurately search for an optimal cavity setting for the MLFL to generate the target pulse.

Fig. 4. The training process of the agent. (a) Reward convergence. (b) Convergences of the ${g_0}$ and ${L_c}$. (c) Comparison of the evolved actual and target pulses.

The data-driven method is used to search for target pulses of different energies and durations. To evaluate the performance of the data-driven method, the SSFM is employed to obtain a stable pulse using the cavity setting acquired via the inverse design process. Figure 5 compares the pulse obtained via the SSFM, the pulse searched by the data-driven method, and the target pulse; the insets show the convergences of ${g_0}$ and ${L_c}$ via the data-driven method. With the data-driven method, ${g_0}$ and ${L_c}$ gradually converge to stable values after training. According to Fig. 5(a)∼(d), the proposed method exhibits excellent performance in searching for target pulses with energies in the range of 16∼24 pJ. The maximum normalized MSEs between the searched pulse and the target pulse and between the searched pulse and the SSFM pulse are 4.3e-4 and 8.7e-5, respectively. In addition, MLFLs generating pulses with different durations are also designed via the data-driven method, as shown in Fig. 5(c), (e), and (f). In Fig. 5(d), there are obvious deviations between the shapes of the searched pulse and the target pulse, especially at the bottom of the pulses. The deviations are due to the near-zero net dispersion of the cavity, which prevents the MLFL from producing a hyperbolic-secant pulse matching the target function.

Fig. 5. The comparison of the traditional soliton resolved via SSFM, pulse searched by data-driven method and target pulse, and convergences of the ${g_0}$ and ${L_c}$ (insets) via the data-driven method. (a) FWHM = 260 fs, energy = 16 pJ. (b) FWHM = 260 fs, energy = 18 pJ. (c) FWHM = 260 fs, energy = 20 pJ. (d) FWHM = 260 fs, energy = 24 pJ. (e) FWHM = 230 fs, energy = 20 pJ. (f) FWHM = 290 fs, energy = 20 pJ.

Figure 6 shows the inverse design results of MLFLs for generating soliton molecules with different inter-soliton separations. The target soliton molecule consists of two hyperbolic-secant pulses with an inter-soliton separation of 1.3 ps, as shown in Fig. 6(a). The normalized MSE between the searched soliton molecule and the target soliton molecule is 1.8e-4, which signifies a successful inverse design of an MLFL to generate soliton molecules on demand. However, a solution for the target state must exist in the parametric space for the inverse design to work effectively; here, this means the target state is attainable by altering ${g_0}$ and ${L_c}$ within the given ranges. Otherwise, the inverse design can fail and the target state cannot be located. For instance, Fig. 6(b) demonstrates the inverse design results when the target inter-soliton separation becomes 1.6 ps. There is a significant difference between the searched and target soliton molecules, with the normalized MSE rising to 4.6e-3. To verify that a 1.6 ps target inter-soliton separation does not exist in the parametric space, we used the grid search method to calculate the reward value under different cavity settings, where the ranges of ${g_0}$ and ${L_c}$ are each divided into 20 equal divisions. The LSTM model predicts the actual state for each selected ${g_0}$ and ${L_c}$, and the reward is then calculated from the actual state and the target state. As shown in Fig. 7, the optimal ${g_0}$ and ${L_c}$ found via the grid search are 3.15 ${m^{ - 1}}$ and 1.37 m, respectively, corresponding to a reward value of -8.97. Nevertheless, the optimized reward value of the data-driven method is -8.61. The result demonstrates that although the model has no exact solution for the target state, the inverse design is still able to locate a state that is very close to the target state.
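The grid search used for this check can be sketched as follows; `reward_fn`, which is assumed to wrap the LSTM forward model and the reward of Eq. (3), and the exact scan ranges are our assumptions.

```python
import numpy as np

def grid_search(reward_fn, g0_range=(2.4, 3.6), lc_range=(1.03, 2.05), n=20):
    """Scan an n x n grid of (g0, Lc) and return the best reward together
    with the cavity setting that produced it. reward_fn(g0, lc) is a
    hypothetical wrapper around the LSTM forward model and Eq. (3)."""
    best_r, best_g0, best_lc = -np.inf, None, None
    for g0 in np.linspace(g0_range[0], g0_range[1], n):
        for lc in np.linspace(lc_range[0], lc_range[1], n):
            r = reward_fn(g0, lc)
            if r > best_r:
                best_r, best_g0, best_lc = r, g0, lc
    return best_r, best_g0, best_lc
```

Because the grid reward never exceeds that of the data-driven search, the 1.6 ps target can be judged unattainable within the given parameter ranges.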

Fig. 6. The comparison of the soliton molecule resolved via SSFM, pulse searched by data-driven method, and target pulse. (a) The inter-soliton separation is 1.3 ps. (b) The inter-soliton separation is 1.6 ps.

Fig. 7. The reward values obtained in the grid search.

4. Discussion

The time required for the multi-parameter optimization depends on numerous factors, with the step setting of the warm-up stage being the critical one. During the warm-up stage, the agent randomly interacts with the environment to generate data without updating the weights. Previously, the warm-up stage was set to 100 episodes, which consumes a significant amount of simulation time. To reduce the simulation time, we shorten the warm-up stage to 20 episodes. In addition, the learning rates of the actor and critic networks are optimized during the training process. Figure 8 demonstrates the search results for different target pulses after shortening the warm-up stage. According to Fig. 8(a)∼(c), the proposed method achieves similar efficiency for different target pulses. As shown in Fig. 8(a), the reward value of the DDPG algorithm converges to a stable value after only 70 episodes when searching for the target pulse with a temporal FWHM of 260 fs and an energy of 24 pJ. With the improved training strategy, Fig. 8(d) demonstrates the comparison of the actual pulse and target pulse during the inverse design process. After the warm-up stage, the MSE between the two pulses gradually decreases to 2.14e-4. After optimizing the algorithm, an MLFL is accurately designed in 1.3 hours. In contrast to traditional methods, the design complexity of the data-driven approach does not depend on the target pulse, because it solves for the pulses through an AI model rather than iterations of the SSFM.

Fig. 8. The search process for the target pulse after algorithm optimization. (a) Reward value convergence (FWHM = 260 fs, energy = 24 pJ). (b) Reward value convergence (FWHM = 230 fs, energy = 20 pJ). (c) Reward value convergence (FWHM = 290 fs, energy = 24 pJ). (d) Comparison of the actual and target pulses during the search process (FWHM = 260 fs, energy = 24 pJ).

5. Conclusion

To summarize, we propose a data-driven approach to design MLFLs that generate target femtosecond pulses. Concretely, combined with fast and accurate MLFL modeling enabled by an LSTM model, a reinforcement learning algorithm is employed to search for the desired cavity setting to produce the target pulses. The data-driven method can accurately design different MLFLs with durations ranging from 230 fs to 290 fs and energies ranging from 16 pJ to 24 pJ. Moreover, the inverse design of MLFLs to generate different inter-soliton separations is also realized via this method. With GPU acceleration, the reward value of the DDPG algorithm converges within 70 episodes, and the search time is less than 1.3 hours. The data-driven method certainly improves the efficiency of designing an MLFL and paves the way for intelligent laser design. We believe the proposed approach can be referred to when performing the inverse design of other optical systems.

Funding

National Natural Science Foundation of China (62025503, 62205199, 62227821); China Postdoctoral Science Foundation (2022M722059).

Disclosures

The authors declare that there are no conflicts of interest related to this article.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. G. B. Rieker, F. R. Giorgetta, W. C. Swann, J. Kofler, A. M. Zolot, L. C. Sinclair, E. Baumann, C. Cromer, G. Petron, C. Sweeney, P. P. Tans, I. Coddington, and N. R. Newbury, "Frequency-comb-based remote sensing of greenhouse gases over kilometer air paths," Optica 1(5), 290–298 (2014).

2. J. Lee, Y.-J. Kim, K. Lee, S. Lee, and S.-W. Kim, "Time-of-flight measurement with femtosecond light pulses," Nat. Photonics 4(10), 716–720 (2010).

3. K. Sugioka and Y. Cheng, "Ultrafast lasers—reliable tools for advanced materials processing," Light: Sci. Appl. 3(4), e149 (2014).

4. P. Ghelfi, F. Laghezza, F. Scotti, et al., "A fully photonics-based coherent radar system," Nature 507(7492), 341–345 (2014).

5. S. Liu, Y. Cui, E. Karimi, and B. A. Malomed, "On-demand harnessing of photonic soliton molecules," Optica 9(2), 240–250 (2022).

6. G. P. Agrawal, Nonlinear Fiber Optics (Academic Press, 2013).

7. R. Yan, T. Wang, X. Jiang, Q. Zhong, X. Huang, L. Wang, X. Yue, H. Wang, and Y. Wang, "Efficient inverse design and spectrum prediction for nanophotonic devices based on deep recurrent neural networks," Nanotechnology 32(33), 335201 (2021).

8. G. C. M. Ferreira, S. P. N. Cani, M. J. Pontes, and M. E. V. Segatto, "Optimization of distributed Raman amplifiers using a hybrid genetic algorithm with geometric compensation technique," IEEE Photonics J. 3(3), 390–399 (2011).

9. D. Zibar, A. M. Rosa Brusin, U. C. de Moura, F. Da Ros, V. Curri, and A. Carena, "Inverse system design using machine learning: the Raman amplifier case," J. Lightwave Technol. 38(4), 736–753 (2020).

10. J. S. Feehan, S. R. Yoffe, E. Brunetti, M. Ryser, and D. A. Jaroszynski, "Computer-automated design of mode-locked fiber lasers," Opt. Express 30(3), 3455–3473 (2022).

11. S. Chugh, A. Gulistan, S. Ghosh, and B. M. A. Rahman, "Machine learning approach for computing optical properties of a photonic crystal fiber," Opt. Express 27(25), 36414–36425 (2019).

12. G. Pu and B. Jalali, "Neural network enabled time stretch spectral regression," Opt. Express 29(13), 20786–20794 (2021).

13. C. M. Valensise, A. Giuseppi, G. Cerullo, and D. Polli, "Deep reinforcement learning control of white-light continuum generation," Optica 8(2), 239–242 (2021).

14. S. Boscolo, J. M. Dudley, and C. Finot, "Modelling self-similar parabolic pulses in optical fibres with a neural network," Results in Optics 3, 100066 (2021).

15. G. Genty, L. Salmela, J. M. Dudley, D. Brunner, A. Kokhanovskiy, S. Kobtsev, and S. K. Turitsyn, "Machine learning and applications in ultrafast photonics," Nat. Photonics 15(2), 91–101 (2021).

16. Z. Li, S. Yang, Q. Xiao, T. Zhang, Y. Li, L. Han, D. Liu, X. Ouyang, and J. Zhu, "Deep reinforcement with spectrum series learning control for a mode-locked fiber laser," Photonics Res. 10(6), 1491–1500 (2022).

17. Q. Yan, Q. Deng, J. Zhang, Y. Zhu, K. Yin, T. Li, D. Wu, and T. Jiang, "Low-latency deep-reinforcement learning algorithm for ultrafast fiber lasers," Photonics Res. 9(8), 1493–1501 (2021).

18. L. Salmela, N. Tsipinakis, A. Foi, C. Billet, and G. Genty, "Predicting ultrafast nonlinear dynamics in fibre optics with a recurrent neural network," Nat. Mach. Intell. 3(4), 344–354 (2021).

19. X. Jiang, D. Wang, Q. Fan, M. Zhang, C. Lu, and A. P. T. Lau, "Physics-informed neural network for nonlinear dynamics in fiber optics," Laser Photonics Rev. 16(9), 2100483 (2022).

20. G. Pu, R. Liu, H. Yang, Y. Xu, W. Hu, M. Hu, and L. Yi, "Fast predicting the complex nonlinear dynamics of mode-locked fiber laser by a recurrent neural network with prior information feeding," Laser Photonics Rev. 17(6), 2200363 (2023).

21. X. Liu, D. Popa, and N. Akhmediev, "Revealing the transition dynamics from Q switching to mode locking in a soliton laser," Phys. Rev. Lett. 123(9), 093901 (2019).

22. F. Ö. Ilday, J. R. Buckley, W. G. Clark, and F. W. Wise, "Self-similar evolution of parabolic pulses in a laser," Phys. Rev. Lett. 92(21), 213902 (2004).

23. U. Keller, Ultrafast Lasers (Springer, 2021), Chap. 4.

24. T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, "Continuous control with deep reinforcement learning," arXiv:1509.02971 (2019).
