## Abstract

The generation of user-defined optical temporal waveforms with picosecond resolution is an essential task for many applications, ranging from telecommunications to laser engineering. Realizing this functionality in an on-chip reconfigurable platform remains a significant challenge. Towards this goal, autonomous optimization methods are fundamental to counter fabrication imperfections and environmental variations, as well as to enable a wider range of accessible waveform shapes and durations. In this work, we introduce and demonstrate a self-adjusting on-chip optical pulse-shaper based on the concept of temporal coherence synthesis. The scheme enables on-the-fly reconfigurability of output optical waveforms by using an all-optical sampling technique in combination with an evolutionary optimization algorithm. We further show that particle-swarm optimization can outperform more commonly used algorithms in terms of convergence time. Hence, our system combines all key ingredients for realizing fully on-chip smart optical waveform generators for next-generation applications in telecommunications, laser engineering, and nonlinear optics.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

## 1. INTRODUCTION

The incorporation of machine learning and “smart” optimization techniques into photonic technologies has enabled widespread enhancements in device functionalities beyond the capabilities of traditional optical systems [1]. Recent examples include the neural-network-enabled reconstruction of optical fields [2], the control of complex cavity dynamics [3–6], and the tailoring of nonlinear light generation [7,8]. The advantage of using machine-learning approaches becomes particularly apparent for the optimization of complex photonic integrated circuits with many degrees of control—where numerical modeling alongside fabrication imperfections makes the task increasingly difficult.

An application field that can especially benefit from such approaches is on-chip optical waveform generation. Optical waveform generators (OWGs) are key to boosting optical signal processing functionalities [9]. In particular, OWGs enable crucial enhancements in frequency shifting [10], nonlinear conversion processes [11], and all-optical signal processing [12]. On-chip OWG implementations are particularly desired, as they offer high efficiencies, intrinsic environmental stability, and scalability [13–15]. Nevertheless, to the best of our knowledge, a practical on-chip scheme that enables reconfigurable customized picosecond wave shaping in a user-friendly, energy- and cost-efficient fashion has yet to be demonstrated.

Ideally, an OWG should (i) autonomously output the target waveform (user-friendliness); (ii) minimize the experimental requirements for driving the system and reading out (or monitoring) the waveform; and (iii) feature long-term reliability, low losses, fiber connectivity, and maximal functionality. Previously demonstrated integrated OWGs offered only one or two of these key features at a time. Moreover, the waveform accuracy of these techniques has often been limited, since practical imperfections, such as individual device fidelities or fabrication tolerances, degrade the achievable performance relative to the initially designed or simulated one.

In contrast to schemes based on *a priori* design approaches, adaptive *a posteriori* optimization of phase and field amplitudes has been shown to enable enhanced flexibility and reconfigurability for optical pulse shaping, as demonstrated in free-space and fiber-based setups [16–18]. More generally, to fulfill the demands of out-of-lab applications, artificial neural networks and evolutionary optimization algorithms have recently found use in ultrafast optics [1]. Such approaches have brought about important advances in solving experimental multiparameter problems, such as adaptive control of nonlinear laser dynamics [3,4,19,20], temporal and spectral pulse control [21,22], and performance optimization of nonlinear light generation schemes [7,23]. Commonly, such experimental schemes use “smart” optimization techniques, such as genetic algorithms (GAs), running on a central processor that acquires a measurement of the system output, compares it to a target functionality, and applies corrective feedback to system control parameters. Despite the success associated with GAs, other search algorithms may outperform them on certain tasks. For example, particle swarm optimization (PSO) [24] offers a promising alternative, as it is straightforward to implement, demands fewer computational resources, and has been shown to converge faster for multidimensional problems [25–27]. Despite these advantages, PSO use in optics has mainly been confined to inverse design problems [28–30], while not finding widespread use for practical photonics system optimization.

Key to applying autonomous optimization techniques in an experimental environment are fast, efficient, and unambiguous optical detection methods for the system output. Such detection can be achieved with classical techniques such as optical spectrum analyzers in the Fourier domain or full-field (amplitude and phase) reconstruction methods for the temporal characterization of optical pulses [31]. However, these techniques impose stringent requirements on the measurement equipment and may be unsuitable for online optimization tasks or user-friendly stand-alone operation. Time-lens techniques are available for the straightforward measurement of ultrafast temporal pulse profiles in fiber systems [32]. However, they are difficult to integrate on a single chip, hard to apply to narrowband picosecond pulses, may still require broadband (gigahertz) optical detection, and may cause artifacts (e.g., due to temporal modulations originating from the time-domain interference of spectral components from the input pulse with a nonflat phase).

In this work, we demonstrate an autonomous OWG framework that has the potential for full on-chip integrability. Our approach combines a novel reconfigurable picosecond pulse-shaping technique with an ultrafast all-optical sampling method and a PSO algorithm for the autoadjustment of the optical system output. Specifically, for user-defined pulse shaping, we make use of temporal coherence synthesis (introduced in Section 2 and Fig. 1) using a concatenation of thermally tunable, unbalanced Mach–Zehnder interferometers (UMZIs) on a low-loss silicon-oxy-nitride chip platform. For self-optimization, we make use of an all-optical sampling read-out based on seeded four-wave mixing (FWM), which provides a slowly varying output signal for waveform reconstruction with minimal electronic requirements (see Section 3). In Section 4, we numerically and experimentally show that the setup, in combination with a PSO algorithm, allows for high-accuracy performance while providing improved convergence relative to widely adopted GAs. These results also pave the way for new implementations of smart photonics beyond the limitations of GAs.

## 2. RECONFIGURABLE OPTICAL WAVEFORM GENERATION USING TEMPORAL COHERENCE SYNTHESIS

Today, on-chip implementations of OWG techniques employ Fourier or Taylor synthesis for shaping pulses up to a few hundred picoseconds [13,14]. In those methods, the optical field is decomposed into spectral components, which are then weighted, phase-controlled, and interfered in order to prepare a temporal field with the targeted shape. Current implementations predominantly rely on the use of predefined amplitude and phase weights obtained *a priori* from simulations and encoded into the physical system via fabrication, making the system performance vulnerable to imperfections. Moreover, the number of components required for picosecond pulse shaping (e.g., integrated resonant structures, amplitude and phase modulators for each incorporated frequency band) leads to high system complexities, resulting in high insertion losses, and a large number of free parameters for external control.

A promising alternative, called *temporal coherence synthesis*, has recently been demonstrated in free space [33,34]. This concept builds on cascaded optical splitting and delaying of an input pulse using unbalanced interferometers with increasing temporal delays. By controlling the amplitude and phase at each interferometric stage, the constructive interference at the system output can be shaped as defined by the user.

Our demonstration scales the initial proof-of-concept by miniaturizing the components necessary for temporal coherence synthesis onto a single CMOS-compatible high-index silica chip. Similar to the scheme depicted in Fig. 1, we use a concatenation of up to five UMZIs with bit-wise increasing delays $\Delta {t_n} = T \cdot {2^{n - 1}}$, where $T = 1\;{\rm{ps}}$, with $\Delta {t_n}$ thus spanning from $\Delta {t_1} = 1{\rm{\;ps}}$ to $\Delta {t_5} = 16\;{\rm{ps}}$, in a compact footprint (${11.75} \times {9.00}\;{\rm{mm}}^2$) [7]. The platform offers excellent transmission efficiencies, with ultralow propagation losses (linear loss ${\lt}\;{0.06}\;{\rm{dB/cm}}$ at 1550 nm [35]) and low coupling losses to standard SMF28 fibers (1.4 dB insertion loss per facet). Each individual on-chip interferometer output is connected with thermally controllable beam splitters for an *ad hoc* adjustment of the amplitude weights (${\lt}\;{{100}}\;{\rm{ms}}$ switching time), which can be addressed directly through a microcontroller unit (MCU). At each interferometric stage, the incoming optical field is split at a specific ratio, controlled by thermally tuning the waveguide coupler at the input, mediated by the MCU. One copy of the split pulse then propagates through the corresponding delay section, while the other part propagates through a fixed short path. Afterwards, the two pulses interfere in another controllable waveguide coupler to create a new temporal waveform. A particular advantage of cascading MZIs is that the optical energy does not leave the coupled waveguide system until the optical pulse reaches the last coupler. Hence, the device loss, and thus the system’s energy efficiency, is determined by the coupling coefficients at the last coupler of the setup. It is important to note that our platform does not presently provide phase control over the individual delays. 
The relative phase gained from each active delay is arbitrary, but constant, as it is intrinsic to our on-chip implementation, which lacks additional heaters on the short path. We discuss the implications of this on the device performance in Section 4.
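As a rough illustration of the split-and-delay principle described above, the following sketch propagates a sampled Gaussian field through an idealized, lossless cascade of five delays enclosed by six couplers. It is not the model of Eq. (S2) in Supplement 1: the coupler matrix convention, the zero relative phase per delay, the single-pass geometry, the time grid, and all function names are assumptions made purely for illustration.

```python
import math

T_PS = 1.0                                              # shortest delay T = 1 ps (from the text)
DELAYS_PS = [T_PS * 2 ** (n - 1) for n in range(1, 6)]  # 1, 2, 4, 8, 16 ps
DT_PS = 0.5                                             # assumed time-grid step


def gaussian_field(fwhm_ps, t_ps):
    """Field amplitude of a Gaussian pulse whose intensity FWHM is fwhm_ps."""
    sigma = fwhm_ps / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return [complex(math.exp(-t * t / (4.0 * sigma * sigma))) for t in t_ps]


def delay(field, delay_ps):
    """Shift a sampled field later in time by delay_ps, padding with zeros."""
    k = int(round(delay_ps / DT_PS))
    return [0j] * k + field[:len(field) - k]


def coupler(top, bottom, kappa):
    """Ideal lossless 2x2 coupler with power cross-coupling ratio kappa."""
    t, c = math.sqrt(1.0 - kappa), 1j * math.sqrt(kappa)
    new_top = [t * a + c * b for a, b in zip(top, bottom)]
    new_bottom = [c * a + t * b for a, b in zip(top, bottom)]
    return new_top, new_bottom


def sdl_output(field_in, kappas):
    """Single pass through six couplers enclosing five delays."""
    top, bottom = field_in, [0j] * len(field_in)
    for n, kappa in enumerate(kappas):
        top, bottom = coupler(top, bottom, kappa)
        if n < len(DELAYS_PS):
            # Long arm is delayed; the short arm serves as the time reference.
            bottom = delay(bottom, DELAYS_PS[n])
    return top, bottom


def energy(field):
    """Pulse energy (arbitrary units) of a sampled field."""
    return sum(abs(e) ** 2 for e in field) * DT_PS


t_axis = [i * DT_PS - 100.0 for i in range(601)]        # -100 ps to +200 ps
pulse = gaussian_field(22.0, t_axis)                    # 22 ps input as in Section 3
out_top, out_bottom = sdl_output(pulse, [0.5, 0.0, 0.5, 0.0, 0.5, 0.5])
print(energy(pulse), energy(out_top) + energy(out_bottom))
```

In this idealized picture, the total energy over both output ports stays equal to the input energy, and only the setting of the last coupler decides how much of it reaches the selected output port, in line with the statement above that the optical energy does not leave the coupled waveguide system before the last coupler.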

The access to amplitude weights in the split-and-delay line (SDL) device allows for autonomous tailoring of the output optical waveform towards a target temporal profile (see PSO in Fig. 1). The system output is monitored by making use of optical sampling, which enables the unambiguous reconstruction of the output temporal envelope (see optical sampling in Fig. 1). The measured waveform can then be compared to the target waveform, with the difference between the two used to create an electronic feedback to adjust the amplitude weights of the SDL, as we detail in the following sections.

## 3. EXPERIMENTAL METHODS

#### A. Setup

The experimental setup is divided into two main parts: pulse shaping and optical sampling, as depicted in Fig. 2. The optical shaper uses a commercial mode-locked fiber laser (Pritel FFL) as the input source, emitting Gaussian pulses with a duration (full width at half-maximum, FWHM) of $\Delta {\tau _{{\rm{FWHM}},1}} \approx 22\;{\rm{ps}}$ at a repetition rate of ${f_{{\rm{rep}},1}} = 10\;{\rm{MHz}}$, centered at 1544.6 nm. The pulse processing is performed by means of the integrated on-chip pulse-splitter, i.e., the SDL, as described in Section 2. In order to increase the number of accessible delay combinations, the pulse propagates through the sample twice [36], i.e., after the pulse exits the sample, it is reflected off a fiber mirror and propagates backwards through the chip again.

Waveform sampling is realized using a nonlinear optical AND gate. The AND gate is based on seeded, degenerate FWM in a highly nonlinear fiber (HNLF, OFS Fitel Denmark ApS., 1 km length, 1546 nm zero-dispersion wavelength) [37]. A second mode-locked fiber laser (Menlo FC1500-250-WG, ${f_{{\rm{rep}},2}} = {250.27}\;{\rm{MHz}}$, $\Delta {\tau _{{\rm{FWHM}},2}} \approx 4.4\;{\rm{ps}}$ after spectral filtering at ${1557.9}\;{\rm{nm}}\;{{\pm}}\;{2.4}\;{\rm{nm}}$) with a slightly different repetition rate than the pulse shaper (i.e., ${f_{{\rm{rep}},2}} = n{f_{{\rm{rep}},1}} + \delta\!{f}$) is used as the sampling probe [38,39]. Both optical sources are coupled to the HNLF and undergo phase-matched parametric amplification at times when they temporally overlap. By using locked but asynchronous repetition rates, the seed (sampling) pulse passively scans the full duration of the (shaped) pump pulse at a relatively slow repetition rate, namely, with a sampling period $\Delta T = 1/\delta\!{f} = 3.7\;\mu{\rm{s}}$. As a result, an optical idler is generated at the corresponding sampling times in another frequency band. This band is spectrally filtered using a tunable optical bandpass filter (Finisar Waveshaper, ${1532.4}\;{\rm{nm}}\;{{\pm}}\;{3.8}\;{\rm{nm}}$), slightly amplified, and directed to a fiber-coupled photodiode (Finisar XPDV2120R). A real-time oscilloscope (Agilent DSO-X 92804A) is used for interfacing and automatic readout of the photodiode signal. Finally, a numerical Hilbert transform is used to extract the temporal envelope of the sampled signal [40]. For additional readout stability, both lasers are locked to one another (i.e., the input source is used as an external reference clock to lock the repetition rate of the sampling source) so as to minimize effects caused by any cavity drift during the experiment. This locking yielded a reliable and constant spacing $\Delta T$ of the sampling points over several weeks.

The microsecond sampling period was calibrated to the picosecond scale by accurately measuring the repetition rate difference of the lasers used. This calibration factor was confirmed by performing an autocorrelation measurement of the unshaped Gaussian pump pulse. The autocorrelation width ($\Delta {\tau _{{\rm{FWHM}},1,{\rm{AC}}}} = 21.85\;{\rm{ps}}$) matches the measured envelope width of our optical sampling scope well ($\Delta {\tau _{{\rm{FWHM}},1,{\rm{Sampling}}}} = 21.26\;{\rm{ps}}$); see Fig. 2(b). In future implementations, highly nonlinear on-chip waveguides [35] or shorter HNLFs can be used to reduce system cost further [37,41].

The asynchronous sampling method offers a few important advantages for the measurement of output pulses in the picosecond regime. First, the low-repetition idler can be detected with less demanding equipment (e.g., in terms of detection bandwidth) [37] compared to other ultrafast measurement schemes. For instance, dispersive time-stretch methods are not very effective for pulse widths ${\gt}{{10}}\;{\rm{ps}}$, as highly dispersive elements are then required to magnify the input temporal features to the tens of nanoseconds range for efficient resolution with ultrafast photodiodes. Alternatively, direct optical detection methods (e.g., ultrafast photodetection) are prohibitively expensive and still limited in temporal resolution, even when considering state-of-the-art 100 GHz bandwidth optoelectronics. Second, the idler power ${P_{i,N}}$ at the sampling point $N$ is directly proportional to the pump power ${P_p}$ squared, i.e., ${P_{i,N}} \sim {\gamma ^2}{P_s}(t)P_p^2({t - N\tau})$, with seed power ${P_s}$, nonlinear parameter $\gamma$, and temporal resolution $\tau$. This allows for the direct detection of the square-amplitude profile of the shaped pump waveform from the idler pulse energy. The resolution of the system (here $\tau \approx 4.3\;{\rm{ps}}$) can be finely adjusted via the repetition rate difference $\delta\!{f}$ between the two lasers and is calculated as $\tau = {({n{f_{{\rm{rep}},1}}})^{- 1}} \cdot {V^{- 1}}$, with stretching factor $V = n{f_{{\rm{rep}},1}}/\delta\!{f}$. Hence, the detection method is particularly suitable for unambiguously measuring picosecond non-Gaussian pulses, such as triangular, square, or sawtooth waveforms, which would all otherwise yield triangular autocorrelation traces with barely distinguishable features.
A single measurement of a waveform took between 6 s and 10 s, which mainly includes the waiting time for thermal stabilization of the modified SDL switches (${\sim}{{2}}\;{\rm{s}}$), the acquisition time of the oscilloscope (${\sim}{{2}}\;{\rm{s}}$), and the time for retrieving the waveform envelope.
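The sampling parameters quoted above follow directly from the two repetition rates. The short sketch below reproduces that arithmetic using only the formulas and laser values given in the text; the variable names are ours, and choosing $n$ as the harmonic of ${f_{{\rm{rep}},1}}$ nearest to ${f_{{\rm{rep}},2}}$ is an assumption.

```python
f_rep1 = 10e6          # shaper (pump) laser repetition rate in Hz
f_rep2 = 250.27e6      # sampling (seed) laser repetition rate in Hz

n = round(f_rep2 / f_rep1)           # nearest harmonic of the pump rate (assumed choice of n)
delta_f = f_rep2 - n * f_rep1        # repetition-rate offset delta f in Hz
sampling_period = 1.0 / delta_f      # sampling period Delta T in s
stretch = n * f_rep1 / delta_f       # stretching factor V
tau = 1.0 / (n * f_rep1 * stretch)   # temporal resolution tau in s

print(f"n = {n}, delta_f = {delta_f / 1e3:.0f} kHz")
print(f"Delta T = {sampling_period * 1e6:.2f} us, V = {stretch:.0f}")
print(f"tau = {tau * 1e12:.2f} ps")
```

With these inputs, the offset is 270 kHz, which gives the quoted sampling period of about 3.7 µs and a resolution of about 4.3 ps. Because $\tau$ scales linearly with $\delta\!{f}$, a smaller offset yields a finer time step at the cost of a longer scan.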

#### B. Waveform Optimization Algorithm

The optically sampled waveform was used to autonomously optimize the OWG output using an evolutionary algorithm. Commonly, GAs are used to perform a smart search for the most performant system parameters with respect to minimizing a given error or cost function [1]. The performance of GAs critically depends on a reasonable guess of initial parameters, since ill-defined GA starting conditions will result in a limited exploration of the search space and thus lead to exceptionally long measurement times. It should be noted that in practice, the measurement time represents one of the main bottlenecks in applying machine-learning concepts to optical systems. Choosing a good initial parameter set is particularly difficult in nonintuitive systems involving many independent degrees of freedom. Other algorithms may therefore be more advantageous to make a complex system efficiently converge to an output state with respect to a given target. For our implementation of adaptive optical pulse shaping, we instead chose a PSO algorithm to determine device settings (i.e., the voltages for each thermal switch) in order to obtain a target output waveform. In our system, we expect the PSO to perform more efficiently, as identifying suitable initial system settings to obtain specific waveforms, such as triangular pulses, is exceedingly difficult.

The PSO algorithm, similar to GA, is a nature-inspired, population-based, metaheuristic optimization algorithm [42]. The PSO starts with an initial population of candidate solutions (particle swarm of size $M$) in an $N$-dimensional search space (i.e., $N$ is the number of variables per particle to optimize). By following strict self- and swarm-optimization rules, the PSO systematically minimizes the particle potential in this space given by a cost function [see Eq. (S1) in Supplement 1] illustrated in Fig. 1 by a color map in a simplified 2D parameter space.

In our experiment, we used a swarm size of $M = {{60}}$ with $N = {{6}}$ system parameters each (corresponding to six variable couplers enclosing five delays). Each particle in the swarm represents a set of six electrode voltage values that control the splitting ratios of the six on-chip MZIs. Each particle is assigned an initial position and inertia, represented in the inset of Fig. 1 as magenta-colored particles and dashed lines, respectively. Relying on simple vector multiplications, the PSO does not require elaborate operations such as mutation and crossover, as in the case of GAs, to explore the solution space at large. In cases of successful convergence, the algorithm returns the system parameters that correspond to the (ideally global) cost function minimum, indicated in the inset of Fig. 1 with yellow particles. In order to speed up the optimization progress, we reduced the maximum possible iterations to 60 and the stall iterations (i.e., number of iterations after which the algorithm stops when not improving) to 12. Other parameters can be found in Supplement 1.
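To make the update rule concrete, the following is a minimal global-best PSO sketch that uses the swarm size, iteration limit, and stall limit quoted above. It is not the implementation used in this work: the inertia and self/social weights, the initial velocity range, the clamping to a normalized $[0, 1]$ range, and the toy cost function are all assumptions (the actual settings are given in Supplement 1, and in the experiment each cost evaluation is a waveform measurement).

```python
import random


def pso(cost, dim, bounds, swarm_size=60, max_iter=60, stall_iter=12,
        inertia=0.7, c_self=1.5, c_social=1.5, seed=0):
    """Minimize cost inside a box; returns (best_position, best_cost, evaluations)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(swarm_size)]
    vel = [[0.1 * rng.uniform(lo - hi, hi - lo) for _ in range(dim)]
           for _ in range(swarm_size)]
    p_best = [p[:] for p in pos]                 # best position seen by each particle
    p_cost = [cost(p) for p in pos]
    evaluations = swarm_size
    g_idx = min(range(swarm_size), key=p_cost.__getitem__)
    g_best, g_cost = p_best[g_idx][:], p_cost[g_idx]   # best position seen by the swarm
    stalled = 0
    for _ in range(max_iter):
        improved = False
        for i in range(swarm_size):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_self * r1 * (p_best[i][d] - pos[i][d])
                             + c_social * r2 * (g_best[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            c = cost(pos[i])
            evaluations += 1
            if c < p_cost[i]:
                p_best[i], p_cost[i] = pos[i][:], c
                if c < g_cost:
                    g_best, g_cost, improved = pos[i][:], c, True
        stalled = 0 if improved else stalled + 1
        if stalled >= stall_iter:                # stop after too many idle iterations
            break
    return g_best, g_cost, evaluations


# Hypothetical stand-in for the measured cost: distance to made-up "ideal" settings.
target = [0.2, 0.8, 0.5, 0.1, 0.9, 0.4]


def toy_cost(x):
    return sum((a - b) ** 2 for a, b in zip(x, target))


best, best_cost, n_eval = pso(toy_cost, dim=6, bounds=(0.0, 1.0))
print(best_cost, n_eval)
```

The returned evaluation count is the quantity that matters in the laboratory, since every evaluation corresponds to one device setting that has to be applied and measured.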

The PSO performance critically relies on the selection of a meaningful cost function ${f_{{\rm{Cost}}}}$. We used the *cosine similarity* ${\cos}(\theta)$, a measure of similarity between two vectors, ${\textbf{A}}$ and ${\textbf{B}}$, as the basis of our ${f_{{\rm{Cost}}}}$, which we defined as

Here, we define a $Q$ *factor* with $Q = {\rm{sgn}}({\cos\theta}) \cdot {10^{| {{\cos}\theta} |}}$ that reaches 10 in the case of perfect similarity, 0 in the case of missing correlation, and $- 10$ in the case of dissimilarity. Note that the $Q$ factor was not used within the optimization process, but it is helpful for better distinguishing the results in what follows.
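As a small worked example of the two similarity measures named above, the sketch below computes the standard cosine similarity, $\cos\theta = {\textbf{A}} \cdot {\textbf{B}}/(\| {\textbf{A}} \|\,\| {\textbf{B}} \|)$, and the $Q$ factor as defined in the text for two equally sampled waveforms. The mapping from $\cos\theta$ to ${f_{{\rm{Cost}}}}$ [Eq. (1)] is deliberately not reproduced here; the function names and the example waveforms are hypothetical, and both inputs are assumed to be nonzero.

```python
import math


def cosine_similarity(a, b):
    """cos(theta) = A.B / (|A| |B|) for two equally sampled, nonzero waveforms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def q_factor(a, b):
    """Q = sgn(cos theta) * 10**|cos theta|, as defined in the text."""
    c = cosine_similarity(a, b)
    sign = (c > 0) - (c < 0)          # sgn(0) = 0, so uncorrelated inputs give Q = 0
    return sign * 10.0 ** abs(c)


# Hypothetical target (triangle) and a slightly distorted "measured" trace.
triangle = [0.0, 0.25, 0.5, 0.75, 1.0, 0.75, 0.5, 0.25, 0.0]
measured = [0.05, 0.2, 0.55, 0.7, 0.95, 0.8, 0.45, 0.3, 0.02]

print(cosine_similarity(triangle, measured), q_factor(triangle, measured))
```

Inverting the definition gives $|{\cos\theta}| = \log_{10}|Q|$, so a $Q$ factor of 9.08, for instance, corresponds to $\cos\theta \approx 0.958$. The logarithmic scale thus spreads out values of $\cos\theta$ that would otherwise all sit close to 1.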

For performance comparison we also tested a standard GA (see Supplement 1 for additional information), which has been used in a previous demonstration [7]. The GA uses the same cost function given in Eq. (1). Moreover, we implemented a fully software-based PSO and GA by modeling the functionality of the programmable MZI cascade [see Eq. (S2) in Supplement 1], which we later use to study the practical constraints of our system. Using the model, we also tested various ${{\rm{L}}_1}$ and ${{\rm{L}}_2}$ distance metrics. We noticed that such simulations never yielded satisfactory waveforms (independently of the PSO settings or chosen waveform). We attribute this to a limited specificity of the distance metrics, leading to a large ambiguity in fitness values and hence to a “noisy” fitness landscape (i.e., good and nonideal solutions might be very close in cost value). Consequently, the experiments have been exclusively performed based on the cosine similarity [i.e., Eq. (1)].

## 4. RESULTS

In order to demonstrate the capabilities of our pulse-shaping approach, we tested four waveforms of particular interest for optical signal processing [10–12], i.e., positive and negative sawtooth, triangle, and flattop pulses.

Figure 3 shows the results for both experiment and simulation for the case of an input pulse with 21 ps duration: The top row [Figs. 3(a)–3(d)] depicts the simulation results for optimizing only the amplitude ratios of the switches (similar to the experimental configuration), the middle row [Figs. 3(e)–3(h)] shows the experimental results, and the bottom row [Figs. 3(i)–3(l)] shows the results for an ideal configuration, where both the amplitude and phase settings for each delay can be optimized. In all cases, an ideal waveform shape was targeted, while no constraints were applied to the FWHM width of the output (see Supplement 1). Thus, the algorithm finds the waveform with the lowest cost function value, corresponding to the best overlap with a given waveform shape.

The experimental data [Figs. 3(e)–3(h)] demonstrate that, by incorporating the delays from the five interferometers (propagated through twice), an output pulse duration (FWHM) longer than 45 ps can be achieved (see also Fig. 4 for more details). Both simulated and measured results [Figs. 3(a)–3(h)] match the targeted waveforms considerably well in all cases, despite the arbitrary phase settings for each delay and the access to only amplitude control. Improvements on the smoothness of the waveforms can be achieved with individual phase control per delay arm, as demonstrated in the simulation results shown in Figs. 3(i)–3(l). Nevertheless, amplitude-only temporal processing on each delay performs surprisingly well, especially considering the low complexity of the circuit. In order to test system performances, we require the system to only match the shape of a given waveform, without adding any additional, separate constraints on its duration (the constrained case is discussed in Fig. 4 for the positive sawtooth and in Supplement 1 for the flattop). We note that undesired modulations appear in all cases and, as in Figs. 3(b)–3(d), such freedom may even lead to the issue that the input pulse does not change (i.e., no shaping occurs). This is also a result of missing sensitivity in the cost function. In other words, the cost function does not change significantly for small deviations between target and waveform. This weakness is common to all evolutionary algorithms and can only be overcome using custom fitting methods and more elaborate cost evaluations.

Although the waveforms from the simulation (Fig. 3, first row) are similar to those obtained in the experiment, there is a mismatch in the retrieved optimal settings. For example, for the positive sawtooth [Figs. 3(b) and 3(f)], similar pulse widths are output (e.g., ${\tau _{{\rm{sim}}}} = \;{46.4}\;{\rm{ps}}$, ${\tau _{{\exp}}} = \;{39.1}\;{\rm{ps}}$) at significantly different switch settings (ratios simulation: [0.71, 0, 0.67, 0, 0.48, 0.69]; ratios experiment: [0.56, 0, 0.12, 0.08, 0.07, 0.76]). This difference arises from simulation simplifications such as zero phase settings per delay, and the lack of simulated cross talk between the switches. This finding emphasizes the discrepancy between simulated sample design and practical performance that can be caused by certain shortcomings of the model. Autonomous optimization approaches, such as ours, enable the mitigation of such design bottlenecks.

The considerable difference between the best switch settings returned from simulation versus those returned from experiment suggests that simulation results cannot be easily transferred to speed up experimental convergence. In such systems, where simulated and experimental parameter settings differ significantly, the PSO is expected to outperform a standard GA. We tested the potential performance improvement of the PSO in both simulations and experiments. Table 1 summarizes the simulation results (see also Fig. S1 in Supplement 1). The PSO shows a faster convergence in all tested cases, i.e., it requires fewer populations to be tested to reach the targeted cost value. For a fair comparison of the convergence improvement, we chose the optimized cost function value from our simulations above [i.e., ${f_{{\rm{Cost}}}} = {1.135}$; see Fig. 3(b)] and used it as a threshold to measure the needed populations and respective runtime for each algorithm. While both PSO runs surpassed this threshold with a maximum of 200 tested populations (i.e., ${\lt}{{4}}\;{\rm{s}}$ runtime), the GA with a population size of 100 required 1200 tests (i.e., ${\sim}{{22}}\;{\rm{s}}$ runtime), and the GA with a population size of 20 did not reach the set cost value within the given maximum number of iterations. Thus, our simulations indicate that the PSO converges faster than the GA, even at small population sizes. This advantage becomes particularly important in experiments, where each tested population requires update/detection time.

In our measurements, we tested a GA with the optimized settings recently reported in the literature using a similar platform [26] [i.e., large initial population (500) and low number of iterations (15 generations)]. Here, a large population size was chosen to specifically enhance the exploration of the solution space. Our PSO (swarm size of 100, 100 iterations, and 20 stall iterations) reached a Q factor of 9.08 (cost 1.1010) after a total of 9800 tested populations, while the GA finished with a lower Q factor of 8.68 (cost 1.1527) after a total of 7500 tested populations (see Fig. S2 in Supplement 1). An important difference between the two runs is that the PSO stopped after 20 stall iterations, during which it achieved no further improvements, adding an overhead of 2000 tested populations. The GA, however, finished when its maximum number of iterations was reached without any stall iteration being registered, indicating poor convergence. Hence, our measurements confirm the trend observed in the simulations: the PSO converges notably better, achieving a high waveform accuracy (i.e., a higher Q factor than the GA) within a number of tested populations for which the GA still does not approach a stable minimum.
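The evaluation counts quoted above can be cross-checked with simple bookkeeping, sketched below. The number of swarm rounds is inferred from the reported total rather than stated in the text. The conversion to laboratory time is only a rough estimate: it assumes that every tested population member costs one waveform measurement and that the 6 s to 10 s per measurement quoted in Section 3.A also applies to these runs.

```python
pso_swarm, pso_stall = 100, 20
pso_tested = 9800                        # reported total for the PSO run
ga_population, ga_generations = 500, 15

pso_rounds = pso_tested // pso_swarm             # swarm evaluations (inferred)
stall_overhead = pso_stall * pso_swarm           # tests spent in the final stall phase
ga_tested = ga_population * ga_generations       # total tests for the GA run


def lab_hours(tested, seconds_per_measurement):
    """Rough wall-clock estimate, assuming one measurement per tested member."""
    return tested * seconds_per_measurement / 3600.0


for seconds in (6, 10):
    print(seconds, round(lab_hours(pso_tested, seconds), 1),
          round(lab_hours(ga_tested, seconds), 1))
```

Under these assumptions, both runs occupy the setup for many hours, which illustrates why the number of tested populations, rather than the computation time, dominates the practical cost of the optimization.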

The faster convergence is intrinsic to the PSO principle. Each particle’s momentum is partially calculated from the *swarm intelligence* [42] [i.e., the social adjustment mechanics outlined in Supplement 1 together with Eq. (S1)]; hence, a smaller swarm size implies larger deviations from the global optimum and thus a larger individual inertia and a faster exploration of the solution space. In other words, when the swarm’s knowledge is little, individuals scatter more at a faster pace. However, it should be noted that the convergence speed is highly problem-dependent, and the social- and self-adjustment weights as well as the neighborhood size of the swarm may need to be carefully adjusted. A smaller initial swarm size can also lead to poor convergence, as there might be too few particles to explore the full parameter space, a trade-off also known as *exploration vs. exploitation* [43]. In any case, the benefit over the GA remains when no initial parameter estimate exists.

Finally, we demonstrate the versatility and waveform scalability of our approach in terms of the targeted output pulse duration. While existing *a priori* approaches are often tailored towards only a narrow range of pulse durations, we show the scalability of our OWG output based on simulations for different input settings (see Fig. 4). To this end, we optimized a positive sawtooth waveform of arbitrary but fixed pulse width in the range from 4 to 150 ps (see Supplement 1 for more information on target waveform construction) using the same PSO settings as for Fig. 3(b) (i.e., without phase optimization) and three different input pulse durations of 3, 20, and 50 ps. We performed a similar analysis for other waveforms with similar outcomes (see Fig. S3 in Supplement 1 for a flattop waveform target).

The waveforms returned were evaluated with a least square fit (via postprocessing) in order to retrieve the output pulse FWHM width. Figure 4(a) shows the final cost value for each target; three examples of the returned waveforms are presented in the inset. The results demonstrate that the relation between target and output pulse width is close to linear [see Fig. 4(b)] up to a factor of 10 (e.g., in the case of a 3 ps input width, one can obtain a 30 ps output) with a relatively small error [Fig. 4(c)]. Notably, each input pulse width features an individual target width that best fits the given waveform, indicated by the lowest cost values in Fig. 4(a) (i.e., at ${\sim}{7.3}\;{\rm{ps}}$, ${\sim}{55.5}\;{\rm{ps}}$, ${\sim}{94.6}\;{\rm{ps}}$ target width for 3, 20, and 50 ps input pulses, respectively). For other target widths, the pulses start to differ more strongly from the target as the optimized cost function values increase in Fig. 4(a), and the respective output pulse widths diverge linearly in Figs. 4(b) and 4(c). Those deviations are mainly caused by increasingly severe modulations of the pulse envelope. These modulations appear primarily when the target waveform is inaccessible for a given input using the provided on-chip delays, i.e., when the platform’s limits are reached in terms of (i) resolution (i.e., given by the shortest delay, here 1 ps), (ii) maximally achievable delay (here 62 ps, i.e., twice the sum of all delays due to double propagation), or (iii) possible delay combinations. Additionally, the shortest delay also limits the minimal input duration, which should be chosen larger than the minimum resolution of the MZI cascade in order to undergo significant pulse shaping. Nevertheless, our results clearly demonstrate the versatility of our approach, ultimately showing the advantages of simpler optimization algorithms applied to dynamical optical systems over *ab initio* simulation methods for sample design.
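The platform limits listed above (1 ps resolution, 62 ps maximum delay, and the set of possible delay combinations) can be checked by enumerating the delays a pulse copy can accumulate. The sketch below assumes an idealized picture in which each copy either takes or bypasses each delay arm; it counts delay values only and ignores amplitudes, phases, and the fact that both passes share the same coupler settings.

```python
from itertools import product

delays_ps = [2 ** (n - 1) for n in range(1, 6)]      # 1, 2, 4, 8, 16 ps

# Every subset of the five delay arms gives one accumulated single-pass delay.
single_pass = {sum(d for d, used in zip(delays_ps, bits) if used)
               for bits in product((0, 1), repeat=len(delays_ps))}

# Double propagation: delays of the forward and backward pass add up.
double_pass = {a + b for a in single_pass for b in single_pass}

print(min(single_pass), max(single_pass), len(single_pass))
print(min(double_pass), max(double_pass), len(double_pass))
```

Because the delays increase bit-wise, a single pass reaches every integer delay from 0 to 31 ps and the double pass every integer delay from 0 to 62 ps, both in steps of the shortest delay $T = 1\;{\rm{ps}}$.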

## 5. CONCLUSION AND OUTLOOK

In conclusion, we demonstrated picosecond pulse shaping by temporal coherence synthesis on a fiber-coupled, reconfigurable SDL chip combined with a cost-effective optical readout and an autonomous optimization technique. The demonstrated device can achieve arbitrary optical waveform shapes of several tens of picoseconds with on-the-fly reconfigurability using a potentially chip-integrable pulse sampling scheme. Notably, our experimental results demonstrated the shaping of pulses with similar and broader widths than previously demonstrated using tailored photonic chips based on Fourier synthesis [13]. Moreover, our simulations indicate a pulse processing potential of $\ge\! 100\;{\rm{ps}}$, where previously demonstrated on-chip implementations become increasingly complex as they involve a higher number of optical on-chip components (incl. multiple microresonators, phase shifters, and beam combiners [14,15]). Also, contrary to other on-chip approaches, no high-speed detection equipment or *a priori* simulations for weight determination are needed in our approach. In fact, our sample was not tailored for this specific application, yet could be made useful through the employed optimization technique. In particular, our findings are empowered by the PSO algorithm, which succeeds in reaching performances comparable to other on-chip systems on a user-friendly platform with significantly fewer optical components.

The implemented processing platform features a low power consumption of at most 1.8 W during operation (${\sim}{{300}}\;{\rm{mW}}$ per switch for the largest voltage applied, optical monitoring and software-based optimization excluded), mainly from the current that is required to hold the correct splitting weights. In terms of versatility, we envision a more ideal platform design that implements each of the bit-wise increasing delays twice, instead of propagating through the same delay two times, while adding amplitude and phase control to each delay individually (at the expense of system complexity). Future improvements might also include complete on-chip system integration, using, for example, soliton microcombs [44] as a shaping and sampling source and nonlinear waveguides for the optical sampling [35,45].
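The delay and power figures quoted in this work can be cross-checked with a few lines of arithmetic. The sketch below assumes five bit-wise increasing delay lines of 1, 2, 4, 8, and 16 ps and six switches; both numbers are inferences from the quoted 1 ps resolution, the 62 ps maximum delay, and the 1.8 W total at about 300 mW per switch, and are not stated explicitly here. All variable names are hypothetical.

```python
from itertools import product

# Assumption: five bit-wise increasing delays (in ps), inferred from the
# quoted 1 ps shortest delay and 62 ps maximum delay after double propagation.
delays_ps = [2 ** k for k in range(5)]

single_pass_sum_ps = sum(delays_ps)        # sum of all on-chip delays
max_delay_ps = 2 * single_pass_sum_ps      # light passes the chip twice

# Relative delays reachable in one pass if each delay line is either
# fully taken or fully bypassed (binary switch settings only).
single_pass_delays = sorted({
    sum(d for d, taken in zip(delays_ps, choice) if taken)
    for choice in product((0, 1), repeat=len(delays_ps))
})

# Assumption: one switch before each delay line plus one at the output.
n_switches = len(delays_ps) + 1
power_per_switch_w = 0.300                 # worst case, largest voltage
max_power_w = n_switches * power_per_switch_w
```

Under these assumptions, the binary switch settings alone cover every integer delay up to the single-pass sum in 1 ps steps; intermediate splitting ratios, which the optimization actually tunes, add weighted superpositions of these paths.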

We further showed that the use of the PSO algorithm can be advantageous for unbiased optimization tasks over simple random search algorithms and the commonly used GA, both in terms of convergence speed and achievable accuracy. We emphasize that convergence within a low number of evaluations is especially important in experimental demonstrations, since the time-limiting element is not the computational time of the algorithm, but the measurement time per parameter setting (i.e., per swarm particle or population member). Thus, the capability of the PSO to converge quickly with a small population might significantly speed up practical searches for target system outputs, promising performance boosts for ultrafast photonic applications such as interruption-free, adaptive generation of laser pulses, or control of complex cavity dynamics.
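To make the role of the evaluation count concrete, the following is a minimal sketch of a textbook global-best PSO with an inertia weight. It is not the implementation used in this work: the control parameters (`inertia`, `c_personal`, `c_global`), the swarm size, the normalized parameter range, and the toy cost function standing in for a measured waveform cost are all assumptions chosen for illustration.

```python
import random

def pso(cost, dim, n_particles=10, n_iter=30, bounds=(0.0, 1.0),
        inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Global-best PSO. Returns (best_position, best_cost, history, n_evals)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p) for p in pos]    # one "measurement" per particle
    n_evals = n_particles
    g = min(range(n_particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    history = [gbest_cost]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia plus attraction to personal and global best positions.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (pbest[i][d] - pos[i][d])
                             + c_global * r2 * (gbest[d] - pos[i][d]))
                # Clamp to the allowed range (e.g., normalized control voltages).
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            c = cost(pos[i])               # in an experiment: set, wait, measure
            n_evals += 1
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
        history.append(gbest_cost)
    return gbest, gbest_cost, history, n_evals

# Toy stand-in for the measured cost: squared distance to a hidden setting
# of six normalized control parameters (hypothetical values).
hidden_setting = [0.2, 0.8, 0.5, 0.1, 0.9, 0.4]

def toy_cost(x):
    return sum((xi - hi_) ** 2 for xi, hi_ in zip(x, hidden_setting))

best, best_cost, history, n_evals = pso(toy_cost, dim=len(hidden_setting))
```

The returned `n_evals` equals `n_particles * (n_iter + 1)`, which is the quantity that sets the duration of an experimental run when each cost evaluation requires a physical measurement. A small swarm that converges in few iterations therefore translates directly into a shorter optimization time.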

We would also like to stress that we do not suggest using the PSO exclusively in place of the GA. Either can run more efficiently than the other, depending on the problem at hand [25–27]. Yet, based on our experience, we recommend using the PSO for optimization tasks that involve many free parameters but lack an intuition for a good set of initial parameters. For such unbiased problems, the PSO seems to converge to a good solution straightforwardly, without much effort from the user in optimizing the algorithm’s own control parameters. Future work will address convergence speed and performance, including the use of more sophisticated *adaptive* PSOs, the combination of algorithms such as the PSO and GA [46], and the inclusion of more elaborate machine-learning approaches [8].

## Funding

Strategic Priority Research Program of the Chinese Academy of Sciences (XDB24030300); Research Grants Council, University Grants Committee (GRF 11213618); Canada Research Chairs; Natural Sciences and Engineering Research Council of Canada (Alliance, Banting Program, CGS-M, Vanier Program); Conseil Régional de Nouvelle-Aquitaine (SCIR project, SPINAL project); Agence Nationale de la Recherche (Optimal project, ANR-20-CE30-0004); H2020 European Research Council (950618); Fonds de recherche du Québec—Nature et technologies (PBEEE).

## Acknowledgment

B. W. acknowledges funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program under grant agreement No. 950618. R. M. is affiliated with Affiliation 6 as an adjoint professor.

## Disclosures

The authors declare no conflicts of interest.

## Data availability

All experimental and numerical data shown in this work are available from the authors upon request.

## Supplemental document

See Supplement 1 for supporting content.

## REFERENCES

**1.** G. Genty, L. Salmela, J. M. Dudley, D. Brunner, A. Kokhanovskiy, S. Kobtsev, and S. K. Turitsyn, “Machine learning and applications in ultrafast photonics,” Nat. Photonics **15**, 91–101 (2021).

**2.** T. Zahavy, A. Dikopoltsev, D. Moss, G. I. Haham, O. Cohen, S. Mannor, and M. Segev, “Deep learning reconstruction of ultra-short pulses,” Optica **5**, 666–673 (2018).

**3.** U. Andral, R. Si Fodil, F. Amrani, F. Billard, E. Hertz, and P. Grelu, “Fiber laser mode locked through an evolutionary algorithm,” Optica **2**, 275–278 (2015).

**4.** G. Pu, L. Yi, L. Zhang, and W. Hu, “Intelligent programmable mode-locked fiber laser with a human-like algorithm,” Optica **6**, 362–369 (2019).

**5.** F. Meng and J. M. Dudley, “Toward a self-driving ultrafast fiber laser,” Light Sci. Appl. **9**, 26 (2020).

**6.** G. Pu, L. Yi, L. Zhang, C. Luo, Z. Li, and W. Hu, “Intelligent control of mode-locked femtosecond pulses by time-stretch-assisted real-time spectral analysis,” Light Sci. Appl. **9**, 13 (2020).

**7.** B. Wetzel, M. Kues, P. Roztocki, C. Reimer, P. L. Godin, M. Rowley, B. E. Little, S. T. Chu, E. A. Viktorov, D. J. Moss, A. Pasquazi, M. Peccianti, and R. Morandotti, “Customizing supercontinuum generation via on-chip adaptive temporal pulse-splitting,” Nat. Commun. **9**, 4884 (2018).

**8.** C. M. Valensise, A. Giuseppi, G. Cerullo, and D. Polli, “Deep reinforcement learning control of white-light continuum generation,” Optica **8**, 239–242 (2021).

**9.** S. T. Cundiff and A. M. Weiner, “Optical arbitrary waveform generation,” Nat. Photonics **4**, 760–766 (2010).

**10.** F. Parmigiani, P. Petropoulos, M. Ibsen, P. J. Almeida, T. T. Ng, and D. J. Richardson, “Time domain add–drop multiplexing scheme enhanced using a saw-tooth pulse shaper,” Opt. Express **17**, 8362–8369 (2009).

**11.** J. A. Fülöp, Z. Major, B. Horváth, F. Tavella, A. Baltuška, and F. Krausz, “Shaping of picosecond pulses for pumping optical parametric amplification,” Appl. Phys. B **87**, 79–84 (2007).

**12.** A. I. Latkin, S. Boscolo, R. S. Bhamber, and S. K. Turitsyn, “Doubling of optical signals using triangular pulses,” J. Opt. Soc. Am. B **26**, 1492–1496 (2009).

**13.** S. Liao, Y. Ding, J. Dong, T. Yang, X. Chen, D. Gao, and X. Zhang, “Arbitrary waveform generator and differentiator employing an integrated optical pulse shaper,” Opt. Express **23**, 12161–12173 (2015).

**14.** S. Liao, Y. Ding, J. Dong, S. Yan, X. Wang, and X. Zhang, “Photonic arbitrary waveform generator based on Taylor synthesis method,” Opt. Express **24**, 24390–24400 (2016).

**15.** Y. Xie, L. Zhuang, and A. J. Lowery, “Picosecond optical pulse processing using a terahertz-bandwidth reconfigurable photonic integrated circuit,” Nanophotonics **7**, 837–852 (2018).

**16.** T. Baumert, T. Brixner, V. Seyfried, M. Strehle, and G. Gerber, “Femtosecond pulse shaping by an evolutionary algorithm with feedback,” Appl. Phys. B **65**, 779–782 (1997).

**17.** S. Thomas, A. Malacarne, F. Fresi, L. Potì, and J. Azaña, “Fiber-based programmable picosecond optical pulse shaper,” J. Lightwave Technol. **28**, 1832–1843 (2010).

**18.** S. Boscolo, J. M. Dudley, and C. Finot, “Modelling self-similar parabolic pulses in optical fibres with a neural network,” Results Opt. **3**, 100066 (2021).

**19.** C. E. Preda, B. Ségard, and P. Glorieux, “Genetic drive of a laser,” Opt. Lett. **29**, 1885–1887 (2004).

**20.** R. I. Woodward and E. J. R. Kelleher, “Towards ‘smart lasers:’ self-optimisation of an ultrafast pulse source using a genetic algorithm,” Sci. Rep. **6**, 37616 (2016).

**21.** M. Veli, D. Mengu, N. T. Yardimci, Y. Luo, J. Li, Y. Rivenson, M. Jarrahi, and A. Ozcan, “Terahertz pulse shaping using diffractive surfaces,” Nat. Commun. **12**, 37 (2021).

**22.** S. Boscolo and C. Finot, “Artificial neural networks for nonlinear pulse shaping in optical fibers,” Opt. Laser Technol. **131**, 106439 (2020).

**23.** L. Michaeli and A. Bahabad, “Genetic algorithm driven spectral shaping of supercontinuum radiation in a photonic crystal fiber,” J. Opt. **20**, 055501 (2018).

**24.** J. Kennedy and R. Eberhart, “Particle swarm optimization,” in *International Conference on Neural Networks (ICNN)* (IEEE, 1995), pp. 1942–1948.

**25.** R. C. Eberhart and Y. Shi, “Comparison between genetic algorithms and particle swarm optimization,” in *Evolutionary Programming VII*, V. W. Porto, N. Saravanan, D. Waagen, and A. E. Eiben, eds., Lecture Notes in Computer Science (Springer, 1998), Vol. 1447, pp. 611–616.

**26.** M. Clerc and J. Kennedy, “The particle swarm-explosion, stability, and convergence in a multidimensional complex space,” IEEE Trans. Evol. Comput. **6**, 58–73 (2002).

**27.** R. Hassan, B. Cohanim, O. de Weck, and G. Venter, “A comparison of particle swarm optimization and the genetic algorithm,” in *46th AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics and Materials Conference* (American Institute of Aeronautics and Astronautics, 2005).

**28.** H. M. Jiang, K. Xie, and Y. F. Wang, “Design of multi-pumped Raman fiber amplifier by particle swarm optimization,” Guangdianzi Jiguang/J. Optoelectron. Laser **15**, 1190–1193 (2004).

**29.** K. Han, Y. Huang, F. Liu, X. Pang, P. Hu, G. Liu, H. Qin, F. Zhang, X. Ge, X. Liu, and X. Geng, “An intelligent method to design laser resonator with particle swarm optimization algorithm,” Optoelectron. Lett. **14**, 425–428 (2018).

**30.** D. Guo, L. Yin, and G. Yuan, “New automatic optical design method based on combination of particle swarm optimization and least squares,” Opt. Express **27**, 17027–17040 (2019).

**31.** K. Ohno, T. Tanabe, and F. Kannari, “Adaptive pulse shaping of phase and amplitude of an amplified femtosecond pulse laser by direct reference to frequency-resolved optical gating traces,” J. Opt. Soc. Am. B **19**, 2781–2790 (2002).

**32.** J. Girardot, F. Billard, A. Coillet, É. Hertz, and P. Grelu, “Autosetting mode-locked laser using an evolutionary algorithm and time-stretch spectral characterization,” IEEE J. Sel. Top. Quantum Electron. **26**, 1100108 (2020).

**33.** Y. Park, M. H. Asghari, T.-J. Ahn, and J. Azaña, “Transform-limited picosecond pulse shaping based on temporal coherence synthesization,” Opt. Express **15**, 9584–9599 (2007).

**34.** L. Yin, H. Wang, B. A. Reagan, and J. J. Rocca, “Programmable pulse synthesizer for the generation of Joule-level picosecond laser pulses of arbitrary shape,” Opt. Express **27**, 35325–35335 (2019).

**35.** D. J. Moss, R. Morandotti, A. L. Gaeta, and M. Lipson, “New CMOS-compatible platforms based on silicon nitride and Hydex for nonlinear optics,” Nat. Photonics **7**, 597–607 (2013).

**36.** F. Liu, S. Huang, S. Si, G. Zhao, K. Liu, and S. Zhang, “Generation of picosecond pulses with variable temporal profiles and linear polarization by coherent pulse stacking in a birefringent crystal shaper,” Opt. Express **27**, 1467–1478 (2019).

**37.** T. Dingkang, Z. Jianguo, L. Yuanshan, and Z. Wei, “Ultrashort optical pulse monitoring using asynchronous optical sampling technique in highly nonlinear fiber,” Chin. Opt. Lett. **8**, 630–633 (2010).

**38.** N. B. Hébert, S. Boudreau, J. Genest, and J.-D. Deschênes, “Coherent dual-comb interferometry with quasi-integer-ratio repetition rates,” Opt. Express **22**, 29152–29160 (2014).

**39.** L. Antonucci, X. Solinas, A. Bonvalet, and M. Joffre, “Asynchronous optical sampling with arbitrary detuning between laser repetition rates,” Opt. Express **20**, 17928–17937 (2012).

**40.** Y. Shmaliy, *Continuous-Time Signals* (Springer Netherlands, 2006).

**41.** M. Chemnitz, M. Baumgartl, T. Meyer, C. Jauregui, B. Dietzek, J. Popp, J. Limpert, and A. Tünnermann, “Widely tuneable fiber optical parametric amplifier for coherent anti-Stokes Raman scattering microscopy,” Opt. Express **20**, 26583–26595 (2012).

**42.** D. Simon, *Evolutionary Optimization Algorithms* (Wiley, 2013).

**43.** S. Chen and J. Montgomery, “Particle swarm optimization with thresheld convergence,” in *IEEE Congress on Evolutionary Computation* (IEEE, 2013), pp. 510–516.

**44.** B. Stern, X. Ji, Y. Okawachi, A. L. Gaeta, and M. Lipson, “Battery-operated integrated frequency comb generator,” Nature **562**, 401–405 (2018).

**45.** A. Pasquazi, M. Peccianti, Y. Park, B. E. Little, S. T. Chu, R. Morandotti, J. Azaña, and D. J. Moss, “Sub-picosecond phase-sensitive optical pulse characterization on a chip,” Nat. Photonics **5**, 618–623 (2011).

**46.** J. Zhang, J. Cai, Y. Meng, and T. Meng, “Genetic algorithm particle swarm optimization based hardware evolution strategy,” WSEAS Trans. Circuits Syst. **13**, 274–283 (2014).