
Confinement loss prediction in diverse anti-resonant fibers through neural networks

Open Access

Abstract

In this work, a genetic algorithm (GA) is employed to optimize convolutional neural networks (CNNs) for predicting the confinement loss (CL) in anti-resonant fibers (ARFs), achieving a CL-magnitude prediction accuracy of 90.6%, which, to the best of our knowledge, represents the highest accuracy to date and marks the first instance of using a single model to predict CL across diverse ARF structures. Unlike previous work that defined ARF structures with parameter groups, we use anchor points to describe these structures, thus eliminating the differences in expression among them. This improvement allows the model to gain insight into specific structural characteristics, thereby enhancing its generalization capabilities. Furthermore, we demonstrate a particle swarm optimization (PSO) algorithm, driven by our model, for the design of ARFs, validating the model's robust predictive accuracy and versatility. Compared with calculating CL by the finite element method (FEM), this model significantly reduces the computation time and provides a speed-up method for fiber design driven by numerical calculation.

© 2024 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

The discovery of the low-loss property of Kagome hollow-core fiber has led to a surge in interest in the use of negative curvature medium boundaries to minimize losses in hollow-core fibers (HCFs) [1]. This has given rise to the development of anti-resonant fibers (ARFs), an evolution of Kagome fibers, which maintain the negative curvature core boundary while simplifying the cladding structure. The past decade has seen increased focus on ARFs due to their ongoing loss reduction and superior single-mode performance [2–5].

A key feature of ARFs is the cladding, which is constructed by stacking membranes. When these thin membranes meet the antiresonance condition, the electromagnetic field at one or both of the air-glass interface membranes is minimized, reducing surface scattering loss (SSL), especially in the near-infrared wavelength range [6,7]. Confinement loss (CL), which is structurally dependent, instead often becomes the primary limitation [7]. Over the past decade, efforts have been made to develop analytical theories and adjustability studies for single-wall ARFs to reduce CL and improve design efficiency [6,8–10]. However, with low-loss records continually surpassed by the conjoined-tube fiber (CTF) [4] and the nested antiresonant nodeless fiber (NANF) [2], analyzing the boundary conditions of emerging ARF structures has become increasingly complex.

The finite element method (FEM) offers a viable solution to these complex boundary conditions by breaking them down into simpler components. However, optical simulations using FEM typically require high-precision meshing of the ARF structure, with maximum element sizes for air and glass regions set at $\lambda /4$ and $\lambda /6$, respectively, to ensure accurate results [2]. These simulations can be computationally intensive, so fiber design often relies on the designer's experience or extensive calculations. While intelligent algorithms have been introduced for fiber design [11–14], they remain subject to extensive computational requirements, exacerbating the issue of insufficient computing power.

In recent years, the expansion of machine learning into optics, particularly in the fields of sensing and optical devices [15–17], has sparked keen interest among researchers. Some investigations have endeavored to forecast CL by employing models trained on sets of ARF structural parameters [18,19]. These models, however, are inherently limited by their specialized parameter space, rendering them ineffective beyond the specific structures they were trained on. This specificity restricts their utility to particular configurations and diminishes their effectiveness when the task is to design tube shapes rather than to assemble predefined components.

We contend that the challenges observed are rooted in two main deficiencies: the absence of a universally applicable method for structural definition and a lack of adaptive modifications in the extant models. The former hinders the models’ ability to generalize beyond certain structures, while the latter makes it difficult to associate structural segments, ultimately compromising both the predictive accuracy and the training efficiency of the models.

In this study, we employed a method of defining various ARF structures through the reconstruction of positioning anchor points. This method allows for the definition of various axially symmetrical shapes of the same topological structure, eliminating the representational differences among structures such as NANF, CTF, and U-shaped ARF, and it also supports the definition of a large number of fused structures obtained in our previous research [14], which we call bulb-shaped ARFs. Each set of positioning anchor points constitutes a sample, with FEM used to calculate the CL of these structures as labels for the dataset. We emphasize that while traditional optimization or design methods continually expend substantial computing power to solve problems directly, we instead employ this computing power to create a tool that resolves the problem quickly.

Based on this unique structural definition method, segmenting the tubes based on the position of the anti-resonant layer becomes feasible. We use convolutional neural networks (CNNs) to extract structural features of the tubes, with different channels representing different segments. Pooling and convolutional layers are used to eliminate the relative positional differences of positioning anchor points brought about by different discretization schemes and to extract associations between different segments. Consequently, the model can accept input structures of several types rather than a single type. Additionally, we adopted a training method for the neural network similar to population-based training (PBT), except that here, the hybrid objects of the genetic algorithm (GA) are not hyperparameters but the biases and weights of different layers of the neural network. This swarm-intelligence-based training scheme enables simultaneous deployment across multiple devices. The results demonstrate that our proposed predictive model exhibits generalizability and higher CL prediction accuracy, and that the training method employed enhances both the training speed and the predictive capability of the model. The model proposed in this work also shows promising application prospects in the design of other types of non-uniform waveguides.

2. ARF structural deconstruction and dataset generation

The interstitial distance between tubes in ARFs is typically designed to be narrow, yet non-contacting. This design choice is made to avert extra losses induced by Fano resonances [8,20], which are generated by sharp interfaces at contact points, and to curtail light leakage through interstices. Furthermore, to diminish light leakage at the azimuthal coordinates of the tubes, their morphologies should be tailored to minimize the coupling between the internal air modes and the core modes, especially the fundamental mode. Presently, many meticulously designed structures exist that can alleviate such leakage, and a multitude of designs boasting ultra-low CL can be traced back to a rudimentary dual-ring topology. Representative examples of these ARFs encompass NANFs [2], CTFs [4], and U-shaped fibers [21]. The tube structures in these fibers exhibit analogous topologies and limitations, implying that they can be similarly decomposed and common features extracted.

Considering that ARFs exhibit a light-guiding mechanism similar to Bragg fibers, the relative positioning of the anti-resonant layers significantly impacts their guiding capabilities [22], especially CL. Therefore, despite the computational challenges posed by the complexity of boundary conditions in solving Maxwell's equations by FEM, it seems possible to quickly predict performance by extracting structural features of the tubes and using appropriate algorithms. In simpler terms, this approach is akin to training a mental arithmetic expert using a multiplication table. The preliminary phase in training this "mental arithmetic expert" involves endowing it with the proficiency to "comprehend" and "interpret" the structural nuances of ARFs.

For the purposes of this study, we refine and expand upon the anchor-point-reconstruction-based method for structural description. Figure 1 delineates this technique, which provides a comprehensive and adaptable structural definition for a variety of dual-ring topological constructs within ARFs. Given that this investigation does not delve into algorithm-driven structural modifications, the categorization of anchor points is deemed superfluous. We elucidated an archetypal methodology for structural definition and its application in our earlier research [14]. Herein, we present a further extension of this method, with a particular focus on the segmentation and correlation of fragments.

Fig. 1. Classification of edges and points in different structures. (a) CTF. (b) NANF. (c) The bulb-shaped ARF. (d) Anti-resonant layer assisted ARF. (e) U-shaped nested ARF. (f) Partial enlarged drawing containing free points and (g) sample construction based on free point coordinates.

The process begins with the establishment of the backbone for the double-ring topology inside the tube structure, involving the identification of the medial line symbolizing the tube's thickness. This line is segmented at its nodes, effectively dividing the double-ring topology into three distinct sections. These sections are then classified through an intra-ARF sorting based on the radial distance between each segment's midpoint and the ARF's centroid, as depicted in Fig. 1(a)-(f). The segments are color-coded, with cyan, green, and yellow representing the three segment categories, respectively. Subsequently, each segment undergoes n-point sampling.
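
As a minimal illustration of this sorting step, the sketch below orders a tube's three medial-line segments by the radial distance of their midpoints from the centroid. The function and variable names are hypothetical, and the innermost-to-outermost ordering is our assumption.

```python
import numpy as np

def classify_segments(segments, centroid):
    """Sort a tube's three medial-line segments by the radial distance of
    their midpoints from the ARF centroid; the returned order maps onto
    the three input channels (cyan, green, yellow in Fig. 1)."""
    def radial_distance(seg):
        midpoint = seg[len(seg) // 2]              # midpoint of the sampled segment
        return np.linalg.norm(midpoint - centroid)
    return sorted(segments, key=radial_distance)   # innermost to outermost (assumed)

# Dummy example: three segments of (m, 2) points around a centroid at the origin.
segments = [np.random.rand(9, 2) + offset for offset in (2.0, 4.0, 3.0)]
ordered = classify_segments(segments, centroid=np.zeros(2))
```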

Considering the inherent symmetry of the tube structure, sampling half of the tube is sufficient to capture the entire structural attributes. The sampled data is then distributed into three separate channels based on segment category, resulting in a data array of dimensions $n\times 2\times 3$ for each structural entity. Here, $n$ is standardized to 16. A total of 85,783 such arrays were successfully generated, and their CL values were calculated by FEM. These include 20,467 bulb-shaped ARF structures obtained through swarm-intelligence-driven scatter point iteration from our prior research [14] and 9,904 randomly interpolated structures obtained through low-density positioning point interpolation followed by high-density sampling; the rest are high-density sampling point sets of classical structures obtained through parameter scanning. The membrane thickness of these structures is set to $0.372\mu m$, obtained from $t=(m-0.5) \lambda /\left [2\left (n_{1}^{2}-n_{0}^{2}\right )^{1 / 2}\right ]$ [23], where $t$ is the tube thickness satisfying the antiresonant condition of order $m$ at wavelength $\lambda$, and $n_1$ and $n_0$ are the refractive indices of silica glass and air, respectively. The index of silica glass at different wavelengths is calculated by the Sellmeier equation. The CL distribution and structure proportions are displayed in Fig. 2, where structures with CL less than 1 dB/m total 78,312.
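
As a quick sanity check of this thickness formula, the short sketch below evaluates it for the first antiresonance order at 1.55 $\mu m$, taking $n_1 \approx 1.444$ for silica (an approximate Sellmeier value we assume here); it reproduces the 0.372 $\mu m$ used for the dataset.

```python
import numpy as np

# Antiresonant membrane thickness t = (m - 0.5) * lambda / (2 * sqrt(n1^2 - n0^2))
# for the first antiresonance order m = 1 at lambda = 1.55 um.
m, lam = 1, 1.55          # order and wavelength in micrometers
n1, n0 = 1.444, 1.0       # assumed silica index at 1.55 um, and air
t = (m - 0.5) * lam / (2 * np.sqrt(n1**2 - n0**2))
print(f"t = {t:.3f} um")  # -> t = 0.372 um, matching the dataset thickness
```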

Fig. 2. CL distribution of the 78,312 ARF structural samples used for model training.

Here we want to clarify that Fig. 2 does not reflect the CL performance differences between structures due to the varying acquisition methods. The introduction of different structures aims to compel the predictive model to explore the influence of the shape of the anti-resonant layer and its relative position on the ARF’s light guiding ability. In other words, it seeks to find the correlation between the positioning points within the same anti-resonant layer and between different anti-resonant layers and their CL performance.

3. Strategies for constructing and training neural networks

Compared to the parametrically defined structures commonly used in fiber design [24,25], the scatter-point reconstruction approach to structure definition offers some unique advantages. In addition to eliminating discrepancies between classical categories of ARFs, as introduced in the previous section, this method can depict structural details more precisely, such as the configuration of interconnections between anti-resonant layers or how the cladding connects to the tubes. Clearly, the scatter-point reconstruction modality enables a more comprehensive expression of structural characteristics, but these features are not as readily discernible as parameter sets. Instead, they are concealed in the relationships between anchor points.

CNNs, through their multilayered convolutional and pooling operations, construct a hierarchical feature extraction framework well-suited to this problem. Figures 3(a) and (b) depict our use of CNNs to extract ARF structural features to aid in predicting CL.

Fig. 3. CL prediction model utilizing CNN: (a) preprocessing of ARF cross sections; (b) architecture of the CNN; (c) training of the CNN with a GA-optimized approach.

In the endeavor to predict the CL of ARFs using CNNs, the initial challenge addressed is the enhancement of the model's sensitivity to interpolated segments. Although a uniform decomposition strategy has been employed for the individual segments, it is not a given that each segment's shape and position contribute equally to the CL, nor is there a clear correspondence between the shapes and positions of any two segments interposed between anti-resonant layers. Indeed, these relationships are inherently ambiguous. Therefore, for the preprocessed dataset comprising three channels, a large convolutional kernel is employed initially to discern correlations across broader spatial extents, succeeded by pooling layers and another convolutional layer to extract higher-order features.

The second challenge is the imbalance in loss function sensitivity. With mean squared error (MSE), for example, directly using the CL as labels implies higher sensitivity to larger CL values, skewing predictions toward higher magnitudes of CL and impeding effective prediction of lower CL values. To address this, a logarithmic transformation was applied to the CL in $dB/m$, yielding a relatively balanced label distribution between −5 and 3. To mitigate training difficulty, extreme values were excluded, retaining only samples with loss below 1 dB/m, corresponding to label values below zero. This curation resulted in a dataset of 78,312 samples.
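
A minimal sketch of this label preparation, with hypothetical CL values standing in for the FEM results:

```python
import numpy as np

# Hypothetical FEM-computed CL values (dB/m) and their structural arrays.
cl = np.array([3.2e-4, 7.5e-1, 1.8, 4.1e-5])
samples = np.random.rand(4, 16, 2, 3)

labels = np.log10(cl)              # logarithmic transform balances MSE sensitivity
keep = labels < 0                  # retain only samples with CL below 1 dB/m
labels, samples = labels[keep], samples[keep]
print(labels)                      # e.g. [-3.49, -0.12, -4.39]; the 1.8 dB/m sample is dropped
```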

The third challenge involves quality degradation from high-loss mode patterns. Given that extra loss peaks due to resonance between the fundamental and nodal modes are common in some structures, some label errors are inevitable. While these errors do not stem from computation, such occurrences are indeed detrimental to the assessment of ARF performance, as the resonance peaks are very narrow and do not reflect the bandwidth characteristics. To avoid extensive computations for data cleansing, we use small batches for frequent updates and regularization, reducing overfitting and the impact of errors. This comes at the cost of diminished GPU acceleration and potential training instability. We counter this with a GA for layer-exchange, shared across the CNN population, with fitness based on accuracy. The GA refines layer weights and biases, stabilizing parameters and hastening convergence during small-batch training. However, population-based training scales parameter numbers with population size, posing challenges for individual PCs. To address this, we developed a distributed framework for training neural networks across devices. Devices communicate through direct cables, local area network (LAN), or wide area network (WAN) and share clipboards, establishing a group-call channel for signal interaction. This approach eliminates the need for high-performance hardware, allowing for task allocation and dynamic assignment in large-scale training.
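
The layer-exchange crossover can be sketched as follows. This is a minimal PyTorch rendition under our assumptions (the paper does not specify a framework), in which the child inherits each layer's weights and biases wholesale from one of two same-architecture parents; the fitness-based parent selection is omitted.

```python
import copy
import random
import torch.nn as nn

def layer_exchange(parent_a: nn.Module, parent_b: nn.Module) -> nn.Module:
    """Crossover for same-architecture CNNs: every layer's weight and bias
    tensors are inherited together from one randomly chosen parent."""
    state_a, state_b = parent_a.state_dict(), parent_b.state_dict()
    # Group parameter tensors by layer prefix (e.g. "conv1" from "conv1.weight").
    layers = {name.rsplit(".", 1)[0] for name in state_a}
    donors = {layer: random.choice((state_a, state_b)) for layer in layers}
    child = copy.deepcopy(parent_a)
    child.load_state_dict({name: donors[name.rsplit(".", 1)[0]][name].clone()
                           for name in state_a})
    return child
```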

In Fig. 4, we illustrate the proposed framework. As depicted in Fig. 4(a), the entire computational framework comprises a broadcast channel and multiple PCs, with at most one signaling present in the broadcast channel at any given time. The signaling can be categorized into workload-signaling and collection-signaling, which respectively incorporate the PC that most recently received the signaling into the computation-loop or collection-loop, updating the signaling accordingly.

Fig. 4. (a) Composition of the distributed framework; (b) program flowchart on a single PC; (c) schematic diagram of the workload-signaling; (d) schematic diagram of the collection-signaling.

During the computation-loop, a protective time is designated for each generation to notify all PCs to download the information of the new generation. Upon expiration of this protective time, the PC first extracts the first to-be-trained network embedded in the signaling, passing the remaining part as new signaling. The extracted network is trained, its fitness is calculated and stored locally, then the PC proceeds to receive the next instruction.

The structure of the workload-signaling is illustrated in Fig. 4(c). When all $net$ blocks in the workload-signaling have been distributed, an $idle$ block remains, indicating the completion of workload distribution. It is identified by the next PC, which updates it into a collection-signaling. As shown in Fig. 4(d), the collection-signaling appears as a table containing networks and their fitness. A PC receiving the collection-signaling fills in the trained neural networks and their fitness at the corresponding positions, updating the signaling and continuing to listen until the table is complete or collection times out.

Collection-timeout refers to the condition where information collection is not completed within a specified time, leaving vacancies in the table. This window is set to 1.5 times the longest single-network training time, counted from when a PC begins broadcasting the collection-signaling, and amounts to approximately 7 minutes. The first PC that experiences a collection-timeout reissues the workload-signaling according to the vacancies in the collection-signaling, distributing training tasks to other PCs. This step detects offline devices and compensates for lost data, implying that each PC can exit the framework at any time.

These steps are repeated until the table is complete. The first PC to receive a collection-signaling without vacancies then crossbreeds the neural networks according to the GA. The newly generated networks and other information are packaged into a workload-signaling with a protective timestamp, completing one iteration.
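
A highly simplified sketch of one PC's event loop under this protocol is given below; the message fields, injected callables, and transport are hypothetical stand-ins for the clipboard-based signaling described above.

```python
import time
from typing import Callable

def run_node(receive: Callable, send: Callable, train_and_score: Callable,
             next_generation: Callable, population: list,
             local_results: dict, timeout_s: float = 420.0):
    """One PC's event loop for the broadcast-channel protocol (sketch only)."""
    while True:
        msg = receive()                                   # at most one signal in flight
        if msg["kind"] == "workload":
            net, rest = msg["nets"][0], msg["nets"][1:]
            # Take the first to-be-trained network and pass the rest on; when
            # nothing remains, convert the signal into an empty collection table.
            send({"kind": "workload", "nets": rest} if rest
                 else {"kind": "collection", "table": {}, "t0": time.time()})
            local_results[net["id"]] = train_and_score(net)
        else:                                             # collection-signaling
            msg["table"].update(local_results)            # fill in locally held results
            if len(msg["table"]) == len(population):
                send(next_generation(msg["table"]))       # GA crossover -> new workload
            elif time.time() - msg["t0"] > timeout_s:     # ~7 min collection-timeout
                vacant = [n for n in population if n["id"] not in msg["table"]]
                send({"kind": "workload", "nets": vacant})  # reissue lost work
            else:
                send(msg)                                 # keep the table circulating
```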

In this study, we formed a cluster of 11 CPU cores across three devices for training the neural network. Two cores come from a tablet equipped with an Intel Core i5-8250U processor; each can complete approximately 1600 iterations per day, has the lowest priority, and is classified as a Class A CPU. Six cores come from a desktop computer featuring an Intel Core i7-8700 processor; each can complete around 1700 iterations per day, has a slightly higher priority, and is classified as a Class B CPU. A workstation (a Dell T5820 with an Intel Core i9-10900X processor) also joined the cluster. The workstation has 10 cores, each able to complete approximately 4000 iterations per day; we have access to three of them, which have the highest priority and are classified as Class C CPUs. The tablet and desktop are connected via a direct cable, while the workstation is connected to the tablet through a WAN connection.

Due to the influence of population size and priority-based load distribution, certain CPUs experience idle intervals in the workflow. As a result, computational efficiency does not increase linearly with computational potential. In Fig. 5, we illustrate the impact of CPU deployment on the calculation acceleration ratio and computational potential. Here, the calculation acceleration ratio is defined as the ratio of the training time using a single Class A CPU to that of the cluster, minus 1. Including the tablet in the cluster did not significantly contribute to acceleration; it was included partly because the desktop lacks internet connectivity and a direct connection to the workstation, requiring the tablet as an intermediary, and partly because the tablet serves to promptly replace offline devices and maintain data. The total computation time for 5000 iterations of training the neural network presented in Ref. [26] was approximately 76 hours.
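
Written out, with symbols we introduce here for clarity, the acceleration ratio plotted in Fig. 5 is
$$S=\frac{T_{\mathrm{A}}}{T_{\mathrm{cluster}}}-1$$
where $T_{\mathrm{A}}$ is the training time on a single Class A CPU and $T_{\mathrm{cluster}}$ is the training time on the deployed cluster, so a lone Class A CPU gives $S=0$.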

Fig. 5. The relationship between calculation acceleration ratio and CPU configurations in cluster deployments.

4. Results and discussion

Significant disparities exist in the ability of different neural networks to extract structural features of ARFs. Figure 6 illustrates a comparative analysis of the training outcomes on a single PC using a backpropagation neural network (BPNN), a single-channel convolutional neural network (SC-CNN), and a three-channel convolutional neural network (TC-CNN). For this comparison, neither input normalization nor network hybridization was employed, though labels were processed logarithmically, and the number of fully connected layers was consistently set at seven. The so-called SC-CNN involves the row-wise concatenation of each sample's three channels from the dataset introduced in Section 2, forming a series of $16\times 6$ matrices that serve as input.

Fig. 6. Comparing true and predicted values based on (a) BPNN, (b) SC-CNN, and (c) TC-CNN; and (e) predicted accuracy of the order of CL-magnitude, (f) MSE, and (g) $R^{2}$.

In comparison with the CNNs, the BPNN did not exhibit satisfactory predictive performance. This is evidenced not only by the inferior metrics presented in Fig. 6(e)-(g), but also by the substantial discrepancies between the predicted results and the true values in Fig. 6(a), particularly within the bar region spanning actual values of $10^{-4}$–$10^{-3}$ and predicted values of about $10^{-3}$, where a dense cluster of prediction errors occurred. Upon examination, we found that these errors mostly arose in predictions concerning bulb-shaped ARFs, corresponding to the concentrated distribution of such samples within this CL range depicted in Fig. 2. Since these fibers were generated algorithmically, with the optimal solutions converging, a multitude of structurally similar samples were produced, complicating the extraction of structural features and leading to significant confusion. In other words, the BPNN failed to demonstrate an ability to accurately predict loss based on the local curvature and overall shape of the boundaries; instead, it appeared to merely memorize the approximate shapes of different structures.

There is also a significant disparity in the predictive capabilities of the two CNN models. As depicted in Fig. 6(b), the SC-CNN predictions exhibit considerable errors, yet they do not present the dense error cluster observed in the BPNN predictions. This suggests that the CNN's ability to predict the CL is not merely based on simple shape recognition but rather on predictive capacities rooted in structural details. In comparison to the SC-CNN, the TC-CNN demonstrates superior predictive accuracy, higher correlation coefficients, and lower MSE, indicating that distinguishing and correlating different segments plays a positive role in extracting advanced structural features and enhancing the prediction of CL in ARFs. Therefore, we select the TC-CNN as the foundation for subsequent endeavors. The default configuration involves sequential layers: a convolutional layer, a pooling layer, another convolutional layer, and seven hidden layers. The default parameters are as follows (a sketch of this architecture follows the list):

  • First convolutional layer: 3×2 kernel, 128 filters, stride 3.
  • Second convolutional layer: 2×1 kernel, 256 filters, stride 2.
  • Pooling layer: average pooling with a 2×1 kernel and stride 2.
  • Hidden layer node counts: 256, 128, 128, 64, 16, 4, 1.
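
For concreteness, below is a PyTorch sketch of this default TC-CNN; the framework choice and the ReLU activations are our assumptions, while the conv-pool-conv ordering and the listed sizes follow the text. The commented shapes confirm that the default kernels and strides reduce the $3\times 16\times 2$ input to 256 features.

```python
import torch
import torch.nn as nn

# Input: batch x 3 channels x 16 rows x 2 columns (the n x 2 x 3 anchor arrays).
model = nn.Sequential(
    nn.Conv2d(3, 128, kernel_size=(3, 2), stride=3),    # -> 128 x 5 x 1
    nn.AvgPool2d(kernel_size=(2, 1), stride=2),         # -> 128 x 2 x 1
    nn.Conv2d(128, 256, kernel_size=(2, 1), stride=2),  # -> 256 x 1 x 1
    nn.Flatten(),                                       # -> 256 features
    nn.Linear(256, 256), nn.ReLU(),                     # seven hidden layers with the
    nn.Linear(256, 128), nn.ReLU(),                     # default node counts
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 16), nn.ReLU(),
    nn.Linear(16, 4), nn.ReLU(),
    nn.Linear(4, 1),                                    # predicted log10(CL)
)

x = torch.randn(8, 3, 16, 2)     # a dummy batch of 8 anchor-point arrays
print(model(x).shape)            # torch.Size([8, 1])
```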

Figure 7(a) illustrates the impact of the number of hidden layers on prediction performance. Starting from the default configuration of seven hidden layers, the five-layer setup removes the second and fourth layers, and the three-layer configuration further excludes the fifth and sixth. Five hidden layers yield enhanced performance, whereas the larger node count of the seven-layer network leads to overfitting. In Fig. 7(b), varying the first convolutional layer's kernel parameters shows that our CNN is insensitive to changes in the receptive field, so we refrain from modifying the default settings. Given the small input size, limited optimization potential exists in the pooling layer and the second convolutional layer, so no further investigation was conducted there. Figure 7(c) demonstrates the impact of the node count in each layer on prediction performance. Doubling or halving the default node counts both led to poor accuracy, prompting retention of the default configuration.

Fig. 7. Impact of hyper-parameter configuration on neural network training effectiveness: (a) number of hidden layers; (b) convolutional kernel specifications; (c) number of nodes.

In the context of our structural definition and segmentation methodologies, the distinctive feature of this CNN application, quite different from conventional image processing, is the ordered nature of the input matrix. According to anti-resonant waveguide theory, the relationships between a given anchor point, its neighboring points, and the corresponding points in other segments can influence the waveguiding properties. Consequently, it is worth considering whether the network's perception of local details should be enhanced, even at some cost. Herein, we have implemented a unique data preprocessing approach intended to partially substitute for batch normalization in CNNs. This method is akin to min-max normalization but is distinct in that it is applied independently to each anchor point, as shown in Eq. (1).

$$a_{ijk,\text{norm}} = \frac{a_{ijk}-\min (A_{ijk})}{\max (A_{ijk})-\min (A_{ijk})}$$
where $a_{ijk}$ denotes the data at the $i^{th}$ row and $j^{th}$ column in the $k^{th}$ channel of the input matrix. The set ${A}_{ijk}$ comprises the data from all samples located at the $i^{th}$ row and $j^{th}$ column across the $k^{th}$ channel. It is a location-based min-max normalization.
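
Eq. (1) translates directly into a few lines of array code; a sketch with a dummy dataset of $16\times 2\times 3$ anchor-point arrays:

```python
import numpy as np

def location_minmax(A):
    """Location-based min-max normalization (Eq. 1): each (row, column,
    channel) position is scaled independently across all samples.
    A has shape (num_samples, 16, 2, 3)."""
    a_min = A.min(axis=0, keepdims=True)   # per-position minimum over samples
    a_max = A.max(axis=0, keepdims=True)   # per-position maximum over samples
    return (A - a_min) / (a_max - a_min)   # assumes max > min at every position

A = np.random.rand(1000, 16, 2, 3)         # dummy dataset of anchor-point arrays
A_norm = location_minmax(A)
print(A_norm.min(), A_norm.max())          # 0.0 ... 1.0
```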

This processing disassembles the uniform coordinate system centered on the centroid into multiple coordinate systems based on corresponding sampling points among samples, and it applies anisotropic scaling. This amplifies the rate of change in the coordinates, making the differences between samples and among different points within the same sample more pronounced. The trade-off is that the data becomes more abstract, potentially increasing the difficulty in correlating widely separated values within the matrix. For comparison, we have additionally trained two TC-CNNs with and without batch normalization, without employing any hybrid strategies during the training process. The training outcomes are presented in Fig. 8.

Fig. 8. Comparing true and predicted values by TC-CNN (a) without batch normalization, (b) with batch normalization, and (c) with location-based min-max normalization; and (e) predicted accuracy of the order of CL-magnitude, (f) MSE, and (g) $R^{2}$.

Examining the predictive outcomes, the three processing methods did not exert a significant impact on predictive performance compared to the influence of network type. The network exhibiting optimal performance is the one using location-based min-max normalization for data preprocessing, culminating in a CL-level prediction accuracy of 90.6%. It also achieved a higher $R^2$ value of 0.944 and a lower MSE, ultimately reaching $2.8\times 10^{-5}$. These results validate the effectiveness of our strategy of enhancing local features. The other two schemes plateaued at an accuracy rate of approximately 88%, with no significant differences observed between them. However, the network incorporating batch normalization demonstrated a slower convergence rate. We attribute this to the batch-based training procedure, where the scaling factor for normalization is determined by the sample features within each batch, leading to fluctuations. Consequently, the normalization layer's mapping of the same sample is unstable across iterations. Furthermore, the size features that affect ARF performance become obscured; hence, the batch normalization commonly employed in CNNs not only failed to improve network performance but also increased the complexity of training.

Given the training instability associated with small batch sizes and the increased vulnerability to label noise when utilizing large batch sizes, we have adopted a GA to facilitate the training of the CNNs. Here, the crossovers were conducted among layers within neural networks of the same architecture. We established a population size of 10 and conducted two training sessions on four PCs, with batch sizes of 1,000 and 10,000, respectively. The training performance is showcased in Fig. 9. For comparative purposes, training outcomes without the crossover strategy are also depicted. In this context, we introduce two criteria for assessing prediction accuracy: one is precise prediction, defined as having a relative error of less than $10{\%}$, as shown in Fig. 9(a); the other, consistent with previous discussions, pertains to the prediction of the order of magnitude of CL, as depicted in Fig. 9(b). The $R^2$ and the MSE are presented in Fig. 9(c) and (d), respectively.
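
The two accuracy criteria can be made concrete as below. Treating the labels as $\log_{10}$ of CL in dB/m follows Section 3; applying the 10% criterion to CL itself and taking the order of magnitude as the floor of the label are our assumptions.

```python
import numpy as np

def precise_accuracy(pred_log, true_log):
    """Fraction of predictions whose CL relative error is below 10%
    (criterion applied to CL, not its logarithm -- an assumption)."""
    cl_pred, cl_true = 10.0 ** pred_log, 10.0 ** true_log
    return np.mean(np.abs(cl_pred - cl_true) / cl_true < 0.10)

def magnitude_accuracy(pred_log, true_log):
    """Fraction of predictions landing in the correct order of magnitude
    of CL, taken here as the floor of the log10 label."""
    return np.mean(np.floor(pred_log) == np.floor(true_log))

pred = np.array([-3.02, -1.40])   # hypothetical model outputs
true = np.array([-3.05, -1.90])   # hypothetical FEM labels
print(precise_accuracy(pred, true), magnitude_accuracy(pred, true))  # 0.5 1.0
```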

Fig. 9. Variation of training performance with iteration count: (a) prediction accuracy with relative error less than $10{\%}$; (b) prediction accuracy for the order of magnitude of CL; (c) $R^2$; (d) MSE.

According to the training results, the four training strategies achieved comparable predictive accuracy, with the magnitude of CL predicted correctly around $90{\%}$ of the time and the precise prediction accuracy being approximately $51{\%}$. An inspection of the $R^2$ and MSE suggests slightly better performance for training based on small batches. More pronounced differences among the training schemes are evident during the training process: training without GA assistance and with small batches exhibited strong fluctuations, especially in the accuracy of precise predictions. Using larger batches mitigated these fluctuations, but at the expense of slower convergence and some degradation in metrics such as $R^2$ and MSE. GA-assisted training, in contrast, avoided these issues while simultaneously accelerating convergence, an effect that also manifested in large-batch training, where it likewise enhanced the stability of the training process.

In Fig. 10(b), we showcase the predictive performance against actual losses, trained with a batch size of 1000 and employing hybrid auxiliary training. The variation in prediction accuracy with CL range is illustrated in Fig. 10(a). At the extremes of the prediction range, specifically $CL > 0.05 \, \text {dB/m}$ and $CL < 5 \times 10^{-5} \, \text {dB/m}$, there is a noticeable decrease in prediction accuracy and an increase in MSE. Although this trend appears symmetric in Fig. 10(a), the distribution of relative errors in Fig. 10(c) reveals distinctions between these two intervals.

Fig. 10. Relationship between TC-CNN predictions and loss intervals: (a) accuracy and MSE; (b) comparison of predicted CL and actual CL; (c) distribution of relative errors.

For $CL > 0.05 \, \text {dB/m}$, the relative error distribution is dispersed. In this high-loss interval, various factors contribute to elevated losses, such as extended node modes, boundary modes, or airy modes resonating with the fundamental mode. The cause of high losses is therefore not solely determined by the structure's confinement of the fundamental mode. Consequently, the labeled values do not effectively reflect the relationship between the structure and its confining capability, degrading predictive performance. Concerning the $CL < 5 \times 10^{-5} \, \text {dB/m}$ interval, as illustrated in Fig. 2, both the number and the variety of samples are limited. Consequently, the model exhibits poor predictive performance in this interval. Given this observation, when applying this model for predictions, it is advisable to define reliable intervals and to disregard predictions that fall outside them.
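
In practice this amounts to a one-line mask; a sketch using the interval bounds quoted above (the sample values are hypothetical):

```python
import numpy as np

# Keep only predictions inside the empirically reliable interval
# 5e-5 dB/m < CL < 0.05 dB/m, i.e. labels between ~-4.3 and ~-1.3.
pred_log = np.array([-5.1, -3.4, -2.0, -0.8])          # hypothetical model outputs
reliable = (pred_log > np.log10(5e-5)) & (pred_log < np.log10(0.05))
print(pred_log[reliable])                              # [-3.4 -2. ]
```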

Additionally, another distinctive error pattern manifests as inaccuracies in labels, specifically those corresponding to a resonance loss peak at the reference wavelength of 1550 nm. This phenomenon is illustrated in Fig. 11: as the wavelength increases, the membrane-modes excited by the nodes cut off and extend into the air, rapidly reducing their effective index until it even intersects with that of the fundamental mode. At this point, the propagation constants of the fundamental mode and the membrane-mode match, leading to resonance and inducing an additional peak in the loss spectrum. Considering that such resonances occur randomly and are relatively sparse, they do not reflect the CL performance of these ARFs. When such a resonance happens to coincide with the wavelength of 1550 nm, the label value based on the FEM calculation becomes unreliable.

Fig. 11. Correspondence between CL-peaks of the fundamental mode and crossings of effective index in the range of 1.4-1.6 $\mu m$: (a) variation of effective index for each mode; (b) CL spectrum obtained by FEM calculation.

Fortunately, such labeling errors in the FEM calculations are limited in number and hardly impact the predictive capability of the model. As indicated by the blue line in Fig. 11(b), our predictions continue to accurately reflect the CL performance of these ARFs. However, in the comparison plot between predicted values and FEM results (Fig. 10(b)), this may be misconstrued as an offset in the predicted values.

5. Application

In this application, we have employed the trained CNN to predict the CL of each ARF as the fitness driving the iterations of a particle swarm optimization (PSO) algorithm. The initial population is composed of 2,000 samples with labeled values ranging from −3 to −2, corresponding to CL between $10^{-3} dB/m$ and $10^{-2} dB/m$. The program is executed in the commercial software MATLAB, as shown in Code 1, Ref. [26]. Figure 12(a) presents the flowchart of this application. To prevent singular values from affecting the iteration process, we have excluded fitness values below −6.5 from the fitness update. To ensure a superior high-order mode extinction ratio (HOMER) and low bending loss (BL), the core size should be restricted; consequently, we prohibit structures with reference core radii exceeding 20 $\mu m$ from providing the best fitness. The parameter update follows the PSO algorithm.
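
The original Code 1 is a MATLAB program; the sketch below re-expresses the loop in Python under our assumptions. Here `predict` (the trained CNN) and `radius` (the reference core radius of each structure) are injected callables, the PSO coefficients are generic defaults rather than the paper's values, and the two exclusion rules are simplified to assigning infinite fitness.

```python
import numpy as np

def cnn_pso(predict, radius, swarm, iters=100, w=0.7, c1=1.5, c2=1.5):
    """PSO over anchor-point arrays, using the CNN's predicted log10(CL)
    as fitness (lower is better). `swarm` has shape (pop, 16, 2, 3)."""
    vel = np.zeros_like(swarm)
    pbest = swarm.copy()
    pbest_fit = predict(swarm).astype(float)
    # Simplified exclusion rules: fitness below -6.5 and core radii above
    # 20 um are barred from driving the update by assigning +inf.
    pbest_fit[(pbest_fit < -6.5) | (radius(swarm) > 20.0)] = np.inf
    gbest = pbest[np.argmin(pbest_fit)]
    for _ in range(iters):
        r1, r2 = np.random.rand(2, *swarm.shape)
        vel = w * vel + c1 * r1 * (pbest - swarm) + c2 * r2 * (gbest - swarm)
        swarm = swarm + vel
        fit = predict(swarm).astype(float)
        fit[(fit < -6.5) | (radius(swarm) > 20.0)] = np.inf
        better = fit < pbest_fit                    # update personal bests
        pbest[better], pbest_fit[better] = swarm[better], fit[better]
        gbest = pbest[np.argmin(pbest_fit)]         # update global best
    return gbest
```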

Fig. 12. (a) Flowchart of ARF design based on CNN-driven PSO; (b) evolution of CL during the iteration process.

During this iteration we performed prediction on 200,000 samples using one PC outfitted with an i5-8250U CPU and 8 GB of RAM. The entire operation was completed in a mere 40 seconds, during which samples with initial losses on the order of $10^{-3} dB/m$ were optimized down to $10^{-5} dB/m$. The optimal result is depicted in Fig. 13. Within the 700 nm bandwidth from 0.9 to 1.6 $\mu m$, this structure achieves low-loss transmission below $10^{-4} dB/m$ while maintaining a HOMER of several hundred. Additionally, owing to the smaller core size, there are no modes in the core with effective indices close to that of the fundamental mode, as illustrated in Fig. 13(b) with $f=2 t \sqrt {n^{2}-1} / \lambda$. The structure exhibits excellent BL performance, and no additional resonance peaks are observed either along the gap or in the direction crossing the tube; the bending threshold is approximately $R_b=5 cm$. For comparison, our preceding efforts involved six PCs in a parallel computing configuration, along with numerically driven iterative calculations, to process 20,467 samples, roughly a tenth of the current sample size, over the course of one week, also achieving loss reductions down to $10^{-5} dB/m$ [14]. The current work accelerates the per-sample evaluation from minute-scale to millisecond-scale, substantially enhancing design throughput.

Fig. 13. FEM verification of the prediction-driven design results: (a) comparison of CL spectra between the fundamental mode and $LP_{11}$; (b) normalized effective index; (c) bending loss.

The evolution of a sample across generations is depicted in Visualization 1 (Ref. [26]). This sample is one of the 2,000 individuals in the population mentioned above. Driven by the model's predictions, 100 iterations were performed. We compared the predicted values with FEM-calculated values and found close agreement between them; however, performing these 100 FEM calculations in the traditional way took us three days. The corresponding geometric files and datasets are presented in Ref. [27] and Ref. [28], respectively. Although the algorithm does not guarantee a reduction in CL with every refinement, owing to the restriction on the core radius, it effectively enhances HOMER, achieving excellent mode purity.

6. Conclusions

In this work, we have achieved the definition of a variety of ARFs using a unified approach by decomposing them into sequential anchor points, thereby eradicating classification boundaries, improving the model's perception of structural details, and constructing a dataset encompassing diverse ARFs. To address the instability and large fluctuations encountered during training with small batch sizes, as well as the issue of excessive training data volume, we introduced a training strategy based on GA-driven layer-exchange among networks and constructed a group-call training framework. Moreover, we investigated the impact of network type and normalization strategy on model performance, as well as the effects of batch size and layer-exchange on training efficacy. The results demonstrate that the use of a CNN, coupled with min-max normalization performed individually for each position, yields optimal network performance. GA-driven layer-exchange accelerates the training process and significantly mitigates fluctuations in training outcomes. The final neural network model achieved a CL-magnitude prediction accuracy of 90.6%, the highest precision to our knowledge. Compared to traditional FEM-based evaluation of CL for fiber performance assessment, our model avoids the evaluation bias introduced by additional loss peaks, thereby providing a more comprehensive reflection of the ARF's guiding performance. Lastly, we presented an application example to illustrate the model's robust generalizability and reliability. Utilizing the model's predicted values as fitness, PSO iterations were driven, and the final structure achieves a CL as low as $3\times 10^{-5} dB/m$ at $1.23\mu m$, with a corresponding HOMER of approximately 2000. The effectiveness of the iterative results demonstrates the excellent accuracy and robustness of the prediction algorithm.

This method addresses the challenge of extensive computations required for ARF design driven by FEM, particularly for ARFs with boundary conditions that are difficult to analytically resolve, reducing computation times from the order of minutes to milliseconds. It provides an effective tool for ARF design, exhibiting strong generalizability. The principles and methodologies involved in this study are equally applicable to the design of other non-uniform optical waveguides and can facilitate the exploration of structural possibilities for optical waveguides with various applications.

Funding

National Natural Science Foundation of China (61827817, 62221001, 62235003); State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University (RCS2019ZZ007).

Acknowledgments

This work is jointly supported by the National Natural Science Foundation of China under Grants 62235003, 62221001, and 61827817, and by the State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University (Contract No. RCS2019ZZ007).

Disclosures

The authors declare that there are no conflicts of interest related to this article.

Data availability

Data underlying the results presented in this paper are available in Dataset 1, Ref. [27] and Dataset 2, Ref. [28].

References

1. F. Benabid, J. C. Knight, G. Antonopoulos, et al., “Stimulated Raman scattering in hydrogen-filled hollow-core photonic crystal fiber,” Science 298(5592), 399–402 (2002). [CrossRef]  

2. F. Poletti, “Nested antiresonant nodeless hollow core fiber,” Opt. Express 22(20), 23807–23828 (2014). [CrossRef]  

3. M. B. S. Nawazuddin, N. V. Wheeler, J. R. Hayes, et al., “Lotus-shaped negative curvature hollow core fiber with 10.5 db/km at 1550 nm wavelength,” J. Lightwave Technol. 36(5), 1213–1219 (2018). [CrossRef]  

4. S.-f. Gao, Y.-y. Wang, W. Ding, et al., “Hollow-core conjoined-tube negative-curvature fibre with ultralow loss,” Nat. Commun. 9(1), 2828 (2018). [CrossRef]  

5. G. T. Jasion, H. Sakr, J. R. Hayes, et al., “0.174 db/km hollow core double nested antiresonant nodeless fiber (dnanf),” in 2022 Optical Fiber Communications Conference and Exhibition (OFC), (2022), pp. 1–3.

6. E. N. Fokoua, F. Poletti, and D. J. Richardson, “Analysis of light scattering from surface roughness in hollow-core photonic bandgap fibers,” Opt. Express 20(19), 20980–20991 (2012). [CrossRef]  

7. E. N. Fokoua, S. A. Mousavi, G. T. Jasion, et al., “Loss in hollow-core optical fibers: mechanisms, scaling rules, and limits,” Adv. Opt. Photonics 15(1), 1–85 (2023). [CrossRef]  

8. L. Vincetti and V. Setti, “Extra loss due to fano resonances in inhibited coupling fibers based on a lattice of tubes,” Opt. Express 20(13), 14350–14361 (2012). [CrossRef]  

9. L. Vincetti and V. Setti, “Theoretical and experimental investigation of light guidance in hollow-core anti-resonant fiber,” Acta Phys. Sin. 67(12), 124201 (2018). [CrossRef]  

10. W. Ding and Y. Wang, “Analytic model for light guidance in single-wall hollow-core anti-resonant fibers,” Opt. Express 22(22), 27242–27256 (2014). [CrossRef]  

11. S. Chugh, A. Gulistan, S. Ghosh, et al., “Machine learning approach for computing optical properties of a photonic crystal fiber,” Opt. Express 27(25), 36414–36425 (2019). [CrossRef]  

12. X. Hu and A. Schülzgen, “Design of negative curvature hollow core fiber based on reinforcement learning,” J. Lightwave Technol. 38(7), 1959–1965 (2020). [CrossRef]  

13. F. Meng, X. Zhao, J. Ding, et al., “Discovering extremely low confinement-loss anti-resonant fibers via swarm intelligence,” Opt. Express 29(22), 35544–35555 (2021). [CrossRef]  

14. G. Zhenyu, N. Tigang, P. Li, et al., “Antiresonant fiber structures based on swarm intelligence design,” Opt. Express 31(16), 26777–26790 (2023). [CrossRef]  

15. A. G. Leal-Junior, V. Campos, C. Díaz, et al., “A machine learning approach for simultaneous measurement of magnetic field position and intensity with fiber bragg grating and magnetorheological fluid,” Opt. Fiber Technol. 56, 102184 (2020). [CrossRef]  

16. A. Leal-Junior, H. Rocha, P. L. Almeida, et al., “Force estimation with sustainable hydroxypropyl cellulose sensor using convolutional neural network,” IEEE Sens. J. 24(2), 1366–1373 (2024). [CrossRef]  

17. C. Zhu, O. Alsalman, and W. Naku, “Machine learning for a vernier-effect-based optical fiber sensor,” Opt. Lett. 48(9), 2488–2491 (2023). [CrossRef]  

18. F. Meng, X. Zhao, J. Ding, et al., “Use of machine learning to efficiently predict the confinement loss in anti-resonant hollow-core fiber,” Opt. Lett. 46(6), 1454–1457 (2021). [CrossRef]  

19. F. Meng, J. Ding, Y. Zhao, et al., “Artificial intelligence designer for optical fibers: Inverse design of a hollow-core anti-resonant fiber based on a tandem neural network,” Results Phys. 46, 106310 (2023). [CrossRef]  

20. L. Vincetti and V. Setti, “Fano resonances in polygonal tube fibers,” J. Lightwave Technol. 30(1), 31–37 (2012). [CrossRef]  

21. W. Zheng, “Theoretical study and performance simulation of a novel hollow-core antiresonant fiber,” Master’s thesis, Guangdong University of Technology (2022).

22. Y. Wang and W. Ding, “Confinement loss in hollow-core negative curvature fiber: A multi-layered model,” Opt. Express 25(26), 33122–33133 (2017). [CrossRef]  

23. C. Wei, R. J. Weiblen, C. R. Menyuk, et al., “Negative curvature fibers,” Adv. Opt. Photonics 9(3), 504–561 (2017). [CrossRef]  

24. W. Wang, T. Ning, L. Pei, et al., “Design and characteristics of rectangular-assistant ring-core fiber for space division multiplexing,” Acta Opt. Sin. 41, 0906001 (2021). [CrossRef]  

25. W. Xu, L. Pei, J. Wang, et al., “Mode field modulation and gain equalization with central depressed few-mode erbium-doped fiber,” Acta Opt. Sin. 42, 2106002 (2022). [CrossRef]  

26. Z. Gu, “Three-channel-CNN application example for CL prediction in ARF,” figshare (2023), https://doi.org/10.6084/m9.figshare.25183070.

27. Z. Gu, “The mphbin files of the evolution of a sample across generations,” figshare (2024), https://doi.org/10.6084/m9.figshare.25183118.

28. Z. Gu, “The sets of the evolution of a sample across generations,” figshare (2024), https://doi.org/10.6084/m9.figshare.25183106.

Supplementary Material (4)

Code 1: Example code demonstrating ARF design using prediction-driven PSO iterations.
Dataset 1: The mphbin files of the evolution of a sample across generations.
Dataset 2: The sets of the evolution of a sample across generations.
Visualization 1: The evolution of a sample across generations.

