
Intelligent water perimeter security event recognition based on NAM-MAE and distributed optic fiber acoustic sensing system

Open Access

Abstract

Distributed optical fiber acoustic sensing (DAS) based on phase-sensitive optical time-domain reflectometry can realize distributed monitoring of multi-point disturbances along an optical fiber, making it suitable for water perimeter security applications. However, owing to the complex environment and the various noises produced by the system, continuous and effective recognition of disturbance signals is difficult. In this study, we propose a Noise Adaptive Mask-Masked Autoencoder (NAM-MAE) algorithm based on a novel mask mode of the Masked Autoencoder (MAE) and apply it to intelligent event recognition in DAS. In this method, fewer but more accurate features are fed into the deep learning model for recognition by directly shielding the noise. Taking the fading noise generated by the system as an example, water perimeter security event data collected in DAS underwater acoustic experiments are used, and NAM-MAE is compared with other models. The results indicate that NAM-MAE achieves higher training accuracy and faster convergence than the other models, with a final test accuracy of 96.6134%. This demonstrates the feasibility and superiority of the proposed method.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Water perimeter security is of great significance for protecting marine resources, preventing illegal intrusion, and maintaining maritime security. However, traditional water perimeter security systems have limitations such as difficult sensor layout and limited coverage. Distributed optical fiber acoustic sensing (DAS) based on phase-sensitive optical time-domain reflectometry (OTDR) is a new sensing technology that realizes continuous, distributed detection of acoustic signals using backward Rayleigh scattering in the optical fiber. It inherits the advantages of common optical fiber sensing technologies, including immunity to electromagnetic interference, excellent concealment, corrosion resistance, and insulation, and it also supports long-distance, distributed, real-time measurement of dynamic strain (vibration and sound waves) along the fiber. Therefore, this system is particularly suitable for perimeter security in important areas [1–6].

However, the main sources of noise in current DAS systems are interference fading and polarization fading. When the coherent detection pulse light passes through the optical fiber, a random scattering medium, the Rayleigh scattered signals exhibit interference fading during coherent superposition. When the polarization state of the coherent detection light is orthogonal to that of the local light, the two beams do not interfere, resulting in polarization fading. Both phenomena greatly weaken the signal at the fading points, leading to severe signal distortion, and this currently presents the primary challenge in DAS systems. Many scholars have addressed this problem from different angles. References [7,8] achieved good suppression by improving the system structure and applying assisted algorithms; however, improving the system structure always increases equipment cost and complexity. References [9,10] achieved fading suppression solely through algorithms, without increasing system cost, and such methods can be applied directly to existing systems. Underwater, the coupling between the optical cable and the environment is low, making the signal more susceptible to fading caused by factors such as water waves. Therefore, effective fading suppression plays an important role in security event identification in the system.

In recent years, several methods have been proposed for intelligent event recognition in distributed optical fiber sensing systems, achieving good results [11–13]. Classical machine learning is still in use; examples include the Support Vector Machine (SVM) [14–16], Relevance Vector Machine (RVM) [17], eXtreme Gradient Boosting (XGBoost) [18], and Probabilistic Neural Network (PNN) [19], among other methods [20–23]. They are usually accompanied by corresponding feature extraction algorithms, such as the Hilbert transform, Mel-cepstrum coefficients, variational mode decomposition, and principal component analysis. These machine-learning methods have significantly contributed to the study of signal characteristics and have achieved high recognition accuracy. However, limitations such as the high complexity of signal preprocessing and the inability to handle large datasets render them unable to adapt to the large volume of complex signal data collected by DAS systems.

By contrast, deep learning can automatically learn the most representative features from raw data without human intervention or manually designed feature extractors. It can be trained on large-scale datasets to achieve high-precision prediction or classification of data with complex structures and patterns; therefore, deep learning is better suited to DAS intelligent event recognition. Some scholars believe that the time-frequency information of DAS signals can be directly input into a Convolutional Neural Network (CNN) for event recognition and classification [24–26], whereas others believe that DAS signals must first be transformed so that more evident features are input into the CNN [27]. Some scholars have improved classic CNN models to enhance performance [28–30]. These methods apply CNN-based models to process 2D images; they typically have small model sizes, train quickly, and can achieve high accuracy. However, time-frequency analysis is better suited to scenarios in which the cable and the environment are highly coupled and the vibration point can be quickly located, whereas underwater detection mainly concerns fluctuations, which require simultaneous analysis of temporal and spatial information.

Several researchers have suggested feasible concepts to address these limitations [31–36]. Shi et al. proposed a transfer-training event recognition method that freezes the front-end structure of a pre-trained network and trains the remaining layers in the transfer-learning direction. The accuracy of this method reached 96.16% when training with more samples and 95.56% with fewer samples [31]. Transfer learning can improve model generalization and training speed; however, manually labeling large amounts of data during the pre-training phase greatly increases labor costs. Tian et al. proposed a channel temporal convolutional network based on channel attention, combined with spatial attention and a bidirectional long short-term memory network, for signal recognition. This method exhibited good classification performance, with a 93.4% accuracy rate and a zero interference alarm rate [32]. Liu et al. proposed a ResNet with time-frequency features as input and added a Convolutional Block Attention Module (CBAM). The average accuracy of the field test of this method exceeded 99.014%, and it remained above 91.08% when dealing with multiple scenes and inconsistent cable-laying environments [33]. These two methods are based on the attention mechanism and focus on the key information of different channels and specific signal structures to further improve system recognition performance. However, actively extracted features may introduce bias through the limited interpretability of the attention mechanism. Almudévar et al. and Zhang et al. proposed unsupervised learning models for signal classification and recognition in distributed optical fiber sensing systems [34,35]. The main advantage of unsupervised learning is that no human labeling is required; relaxing the training-set requirement increases the applicability of the technique to event detection in real environments without prior knowledge of the underlying events. However, in the absence of a clear objective, the results of unsupervised learning models are usually unstable. Although accuracy can be improved by manually correlating the results of unsupervised learning with actual categories [36], this incurs additional labor costs. In general, the above methods provide several improvements in the signal classification and recognition abilities of DAS, but each addresses the problem from a single perspective. Combining these concepts from multiple directions can further improve model performance in water-area signal recognition.

Processing the water-area signals collected by a DAS system requires fading suppression and simultaneous analysis of temporal and spatial information, so that features without active bias can be extracted and used for supervised classification. Combining the advantages of representation learning, the attention mechanism, and self-supervised learning, this study proposes an intelligent event-recognition algorithm based on a Noise Adaptive Mask-Masked AutoEncoder (NAM-MAE) and applies it to a DAS system. This method takes the discovery of noise as its premise and generates a noise-free signal by masking the noise before inputting it into a deep learning model, so that the model can learn from it directly, thereby significantly reducing the complexity of signal preprocessing. A scalable self-supervised learner, the Masked AutoEncoder (MAE) [37], is used as the deep learning model. It combines the mask method with the attention mechanism to achieve significantly high accuracy with the simplest encoder-decoder model structure. The NAM-MAE algorithm therefore uses a new mask method that is better suited to complete event recognition by DAS. First, based on the differential concept, the amplitude signal of the DAS is analyzed to determine the fading points, and the digital difference is used to highlight these noise points. Then, the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm is used for judgment, effectively extracting the noise points and generating a noise discrimination matrix. Finally, on the one hand, the upstream task of the MAE performs model pre-training, in which the basic characteristics of DAS signals are learned autonomously using random masks; on the other hand, the downstream task of the MAE changes the mask mode and generates a mask according to the noise discrimination matrix to complete the training for intelligent event recognition. In the experiment, five types of events, comprising four typical water perimeter security events and a no-event class collected in the field, are used as classification data, and NAM-MAE is compared with MAE and five other related models. The experimental results demonstrate the effectiveness and superiority of the proposed method.

2. Methods

2.1 Distributed optic fiber acoustic sensing

The structure of the DAS system is illustrated in Fig. 1. The output light of an ultra-narrow-linewidth laser with a wavelength of 1550.12 nm is divided into two paths after passing through a fiber coupler: 90% serves as the probe light and 10% as the local light. The probe light enters an acousto-optic modulator (AOM) driven by an arbitrary waveform generator (AWG), where it is modulated into a frequency-shifted probe pulse. The probe pulse then enters a fiber amplifier for amplification, passes through a fiber circulator into the sensing fiber, and the Rayleigh backscattered light returns. After interfering with the local light in an optical coupler, the light is converted into an electrical signal by a balanced detector and acquired on a computer using an acquisition card with a sampling rate of 1 GS/s.

Fig. 1. DAS system structure and physical diagram.

The DAS system passes the collected signal through a digital I/Q demodulation algorithm [38], obtaining two orthogonal signals:

$$\begin{array}{l} I \propto {E_s}(z){E_L}(z)\cos {\phi _s}(z),\\ Q \propto {E_s}(z){E_L}(z)\sin {\phi _s}(z), \end{array}$$
where $E_s$ is the Rayleigh backscattered light field, $E_L$ is the local light field, and $z$ is the position along the fiber. The amplitude and phase information of the I and Q signals are synthesized as follows:
$${E_s}(z){E_L}(z) \propto \sqrt {{I^2} + {Q^2}} ,$$
$${\phi _s}(z) \propto \arctan (Q/I) + k\pi ,$$
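As an illustration of how Eqs. (2) and (3) are applied in practice, the following minimal sketch (not from the original paper) combines hypothetical demodulated in-phase and quadrature traces `i_sig` and `q_sig` into amplitude and phase:

```python
import numpy as np

def synthesize_amplitude_phase(i_sig: np.ndarray, q_sig: np.ndarray):
    """Combine demodulated I/Q traces into amplitude and phase, following Eqs. (2)-(3)."""
    amplitude = np.sqrt(i_sig ** 2 + q_sig ** 2)   # E_s(z)E_L(z) ∝ sqrt(I^2 + Q^2)
    phase = np.arctan2(q_sig, i_sig)               # phi_s(z) in (-pi, pi], absorbing the k*pi term
    return amplitude, phase
```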

Figure 2 shows the signal collected by DAS. According to Eq. (2), the DAS amplitude signal is obtained, as shown in Fig. 2(a). It can be observed that the amplitudes at the four positions of 38, 87, 152, and 197 m are significantly small; these points are the fading points. Figure 2(b) shows the phase data collected by DAS. Excluding the transitions from π to -π, the same four positions of 38, 87, 152, and 197 m exhibit jumps, further confirming that these points are fading points. It can be concluded that the fading-point positions determined from the amplitude signal coincide with the fading-point positions of the phase signal.

Fig. 2. Signals collected by DAS: (a) amplitude signal; (b) phase signal.

2.2 Fading analysis

The backscattered Rayleigh light ES at position z is as follows:

$${E_S}(z,t) = r(z,t){E_0}f(t)\cos (\theta )\textrm{exp} ( - \alpha z/2 - j{\omega _s}t)\textrm{exp} (j2\int_0^z {\beta dy} ),$$
where $r(z,t)$ is the random Rayleigh scattering coefficient, $E_0$ is the intensity of the incident probe light, $f(t)$ is the probe pulse waveform, $\cos(\theta)$ is the polarization-related factor, $\theta$ is the polarization angle between $E_S$ and $E_L$, and $\beta$ is the propagation constant, which can be expressed as $\beta = 2\pi {n_{eff}}[{T(z),\varepsilon (z)}]/\lambda$.

In the DAS system, theoretically, the polarization states of ES and EL remain unchanged. However, there are random non-uniformities and irregularities in the optical fiber that cause changes in their polarization states. When the polarization states of ES and EL are orthogonal, cos(θ) = 0, resulting in polarization fading. In this case, ES and EL do not interfere, resulting in a very low amplitude, or even 0.

In the sensing fiber, the integrated scattering coefficient within each pulse width is ideally identical for every pulse transmitted along the fiber. However, because the sensing fiber is affected by temperature and strain variations along its length, the effective refractive index changes. According to Ref. [39], the comprehensive scattering coefficient is expressed as follows:

$${r_c}(z )= \int_z^{z + \Delta z} {rf(x/v)\textrm{exp} (j\left[ {2(\omega /c)\int_z^x {\Delta {n_{eff}}(y)dy + 2{{\overline n }_{eff}}} \Delta \omega /c} \right]dx)} ,$$
where $\Delta n_{eff}$ is the change in the effective refractive index. It can be seen from Eq. (5) that the integrated scattering coefficient is determined by the change in refractive index. Because temperature and strain are unevenly distributed along the fiber, their effects, although very small individually, accumulate, causing the integrated scattering coefficient to change randomly. As a result, the amplitude of the Rayleigh scattering waveform at some positions becomes very small or even zero, producing the interference fading effect. When the random change in refractive index makes the amplitudes of the two-channel mixing signals very small at the same positions simultaneously, the corresponding positions of the amplitude signal synthesized using Eq. (2) exhibit fading.

The differential describes the rate of change of a function. When fading occurs there is a significant attenuation in the amplitude, so differentiation can be used to identify the fading points. According to Eqs. (2), (4), and (5), and converting the complex function to a real function, the equation for the amplitude A(z) is obtained:

$$\begin{aligned} A(z) &\propto |{{E_L}(z){E_S}(z)} | = |{{E_L}(z){E_0}{e^{ - \alpha z}}\cos (\theta ){r_c}(z)} |\\ &=\left|{{E_L}(z){E_0}{e^{ - \alpha z}}\cos (\theta )\int_z^{z + \Delta z} {\overline r } f(x/v)\cos (\left[ {2(\omega /c)\int_z^x {\Delta {n_{eff}}(y)dy + 2{{\overline n }_{eff}}} \Delta \omega /c} \right]dx)} \right|. \end{aligned}$$

As shown in Fig. 2(a), the fading points exhibit significantly low amplitudes and cusps. By the properties of the absolute value, when there is a position z at which A(z) equals 0, the derivative A'(z) is discontinuous at that point. In Eq. (6), when polarization fading occurs, cos(θ)→0; when interference fading occurs, rc(z)→0. The amplitude information after the first-order differential is shown in Fig. 3(a); the fading points appear as discontinuities, that is, the fading effect produces non-differentiable points in the signal. At a fading point, the difference between the left and right derivatives is larger than at a normal point. Although further differentiation is theoretically impossible at such points, the DAS system collects digital signals, so the digital difference of the amplitude information after the first-order differential reflects its variation and highlights the positions of the non-differentiable points. The amplitude information after the digital difference is shown in Fig. 3(b); the fading-point positions are prominent, with amplitudes several times larger than those of the other positions.
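The differencing steps described above can be sketched as follows; this is an illustrative reconstruction that assumes the amplitude trace is a one-dimensional array sampled at 1 m intervals, not the authors' exact implementation:

```python
import numpy as np

def highlight_fading_points(amplitude: np.ndarray) -> np.ndarray:
    """Approximate the first-order differential with a finite difference, then take the
    digital difference again so that non-differentiable (fading) positions stand out."""
    first_diff = np.diff(amplitude)        # ≈ A'(z); discontinuous at fading points
    second_diff = np.diff(first_diff)      # digital difference of A'(z); spikes at fading points
    return np.abs(second_diff)
```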

Fig. 3. (a) Amplitude information after the first-order differential; (b) amplitude information after the digital difference; (c) result of the DBSCAN algorithm applied to the amplitude information after the digital difference.

2.3 DBSCAN principle

The analysis in Fig. 3(b) indicates that the difference between the fading and normal points is relatively large; because the fading points are sparse and the normal points are compact, it is appropriate to apply a density-based clustering algorithm, of which the most typical is DBSCAN. It defines a cluster as the largest set of density-connected samples derived from the density-reachability relation, and it can partition regions of sufficiently high density into clusters and find arbitrarily shaped clusters in noisy spatial data. The DBSCAN algorithm requires two hyperparameters: the neighborhood radius R and the minimum number of points Minpoints. Points whose number of neighboring samples within radius R is greater than or equal to Minpoints are called core points. Points that are not core points but lie in the neighborhood of a core point are called boundary points. Points reachable from one core point through a chain of core points are called density-reachable points. According to Fig. 3(b), the data points are distributed mostly near 0, so the standard deviation can be used to measure the degree of dispersion; R is therefore set to the standard deviation of the amplitude. Theoretically, a single fading point produces a mutated value at only one point. However, in practice there is often more than one fading point, so Minpoints must be set according to how many fading points are likely to occur. According to the algorithm definition, Minpoints must be greater than 3; after many experiments, we recommend setting Minpoints in the range of 6-15.

The flow of the DBSCAN algorithm is described as follows:

  • 1) Set parameters R and Minpoints. Select a data object point p from the spatial amplitude information after the digital difference.
  • 2) For parameters R and Minpoints, if the selected data object point p is a core point, then all data object points density-reachable from p form a cluster.
  • 3) If the selected data object point p is the edge point, select another data object point.
  • 4) Repeat steps (2) and (3) until all points are processed.

The DBSCAN algorithm is applied to the amplitude information after the digital difference, as shown in Fig. 3(c), where Class 1 represents the normal points and Class 0 represents the noise points, that is, the fading points. Through this judgment, the fading points of each group of signals to be processed can be determined, and a fading discriminant matrix is formed for subsequent processing.
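The fading judgment can be sketched with the DBSCAN implementation in scikit-learn; this is an assumed realization of the procedure described above (the function name and the choice of `min_points` within the recommended 6-15 range are illustrative):

```python
import numpy as np
from sklearn.cluster import DBSCAN

def fading_discriminant(diff_trace: np.ndarray, min_points: int = 8) -> np.ndarray:
    """Label each position as normal (Class 1) or fading (Class 0) with DBSCAN.
    The neighborhood radius R (eps) is set to the standard deviation of the trace."""
    x = diff_trace.reshape(-1, 1)                  # cluster the 1-D difference amplitudes
    eps = float(np.std(diff_trace))
    labels = DBSCAN(eps=eps, min_samples=min_points).fit_predict(x)
    # DBSCAN assigns -1 to sparse outliers, which correspond to the fading points
    return np.where(labels == -1, 0, 1)
```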

2.4 NAM-MAE

According to Eq. (3), the range of the phase data collected by the DAS system is [-π, π]. It can therefore be normalized to [0, 1] and used as model input in the form of a regular image. This representation is more general than phase signals processed by traditional methods. Figure 4 shows the process of converting the phase information into model input: as shown in Fig. 4(a), the phase information from 300 m of fiber over 3 s is combined to form the 300 × 300 input shown in Fig. 4(b).
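A minimal sketch of this normalization, assuming the phase data are already arranged as a 300 (time) × 300 (position) array:

```python
import numpy as np

def phase_to_image(phase_block: np.ndarray) -> np.ndarray:
    """Map phase samples from [-pi, pi] to [0, 1] so the block can be used as an image input."""
    return (phase_block + np.pi) / (2.0 * np.pi)
```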

Fig. 4. Process of converting phase information into model input: (a) spatial phase information along the timeline; (b) model input.

The structure of the MAE model is shown in Fig. 5. The MAE approach masks random patches of the input image and reconstructs the missing pixels. It is based on two core designs. First, masking most of the input image yields an important and meaningful self-supervised task. Second, it uses an asymmetric encoder-decoder architecture. The encoder is a Vision Transformer (ViT), which contains L transformer blocks and operates only on the unmasked patches (without mask tokens). The transformer block is the basic building unit of the transformer model and consists of a multi-head attention mechanism and a feedforward network. The multi-head attention mechanism implements self-attention and can capture long-range dependencies between different positions in the sequence. The feedforward network comprises a multi-layer perceptron (MLP) and layer normalization and is responsible for the nonlinear transformation and mapping of the multi-head attention output. Stacking transformer blocks forms the transformer model. The decoder is a transformer containing N transformer blocks, where N is much smaller than L; the structure is therefore asymmetric, and the decoder is lightweight relative to the encoder. The decoder reconstructs the original image from the latent representation and the mask tokens. The mean square error (MSE) between the reconstructed image and the target image is used as the loss function to judge the performance of each training step. Combining these two designs enables efficient training of large models, speeding up training (3x or more) and improving accuracy; the scalable method allows high-capacity models with good generalizability to be learned.
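The random-masking step that MAE pre-training relies on can be illustrated as follows; this sketch assumes a single-channel 300 × 300 input divided into 10 × 10 patches with a 75% mask ratio and is not the authors' full implementation (the reconstruction loss is the MSE computed on the masked patches):

```python
import numpy as np

def random_patch_mask(img: np.ndarray, patch: int = 10, mask_ratio: float = 0.75, rng=None):
    """Randomly mask a fraction of non-overlapping patches; return the masked image and the
    boolean patch-grid mask (True = masked) on which the MSE reconstruction loss is computed."""
    if rng is None:
        rng = np.random.default_rng()
    gh, gw = img.shape[0] // patch, img.shape[1] // patch        # 30 x 30 patch grid
    n_mask = int(round(mask_ratio * gh * gw))
    chosen = rng.permutation(gh * gw)[:n_mask]
    mask = np.zeros((gh, gw), dtype=bool)
    mask[np.unravel_index(chosen, (gh, gw))] = True
    masked = img.copy()
    for r, c in zip(*np.nonzero(mask)):
        masked[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch] = 0.0  # blank masked patches
    return masked, mask
```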

Fig. 5. Structure of the MAE model.

In the original MAE model, the mask mode is a random mask aimed at natural images. In DAS, however, to further reduce the interference of fading noise with useful information, the mask mode of the MAE must be replaced by a special masking method based on the location information of the fading points. Therefore, this study proposes the NAM-MAE model, which generates masks for the MAE from a faded matrix. In the model, the DAS amplitude signal is used as the basis for generating the faded matrix, whereas the phase signal is used as the input image of size 300 × 300. The structure of the model is shown in Fig. 6. The model is divided into two parts: an upstream pre-trained model and a downstream classification task model. The DAS collects phase signals separately for the upstream and downstream tasks. The phase signal of the upstream task mainly enables the model to learn features autonomously through unsupervised learning; it has no situational limitations and can be phase data collected by the DAS system in either the experimental or the application stage. The phase signal of the downstream task is the disturbance event signal collected by the DAS system in the actual application environment.

Fig. 6. Structure of the NAM-MAE model: (a) pre-trained model architecture; (b) classification task model architecture; (c) faded mask generation process.

The first part is the pre-trained model structure of the MAE, as shown in Fig. 6(a). The pre-trained model masks the phase signal with a random mask to generate the input image of the upstream task, with the mask ratio set to 75%. The input image is fed into the encoder, for which the MAE uses ViT-Base [40]. The output is then fed into the decoder, a transformer consisting of 4 transformer blocks, and finally the reconstructed image is obtained. This is an unsupervised learning process in which the model learns the features of the phase signal autonomously.

The second part is the downstream classification task model structure of the MAE, as shown in Fig. 6(b). The downstream model masks the phase signal with the faded mask to generate the input image of the downstream classification task. The downstream task structure of the original MAE is fine-tuned to extract only the class token. The MLP head serves as the output layer of the downstream task model: it takes the class token generated by the encoder as input, processes the features through a linear layer and a nonlinear activation function, and generates the final prediction. The pre-logits layer is a linear transform in the MLP head that precedes the output layer; it maps the feature vectors from the previous layers to a higher dimension and forms the representation before the nonlinear transformation. The classification task is trained on the five collected event types.
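A hedged PyTorch sketch of the classification head described here follows; the module name, the embedding dimension (768, following ViT-Base), and the pre-logits dimension are assumptions, not the authors' exact configuration:

```python
import torch
import torch.nn as nn

class MLPHead(nn.Module):
    """Classification head: class token -> pre-logits (linear + tanh) -> 5-class logits."""
    def __init__(self, embed_dim: int = 768, pre_logits_dim: int = 768, num_classes: int = 5):
        super().__init__()
        self.pre_logits = nn.Sequential(nn.Linear(embed_dim, pre_logits_dim), nn.Tanh())
        self.classifier = nn.Linear(pre_logits_dim, num_classes)

    def forward(self, encoder_tokens: torch.Tensor) -> torch.Tensor:
        # encoder_tokens: (batch, 1 + num_patches, embed_dim); the class token sits at index 0
        cls_token = encoder_tokens[:, 0]
        return self.classifier(self.pre_logits(cls_token))
```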

The NAM generation process is illustrated in Fig. 6(c) and proceeds as follows. The fading judgment is generated by DBSCAN after the amplitude signal collected by the DAS system is differenced digitally twice. Following the principle of the ViT model, the faded matrix is divided into 30 × 30 patches of size 10 × 10. Patches that the faded matrix judges to contain fading points are masked (indicated by the blue boxes in the faded matrix). The overall mask ratio is also set to 75%: on top of the faded mask, random masking is applied to the remaining positions until the mask ratio is reached, producing the final faded mask used by the MAE.
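The mask-generation step can be sketched as follows, assuming the fading discriminant matrix covers the same 300 × 300 samples as the input (0 = fading point, 1 = normal); the helper name and the way the extra random masking is drawn are illustrative:

```python
import numpy as np

def generate_faded_mask(discriminant: np.ndarray, patch: int = 10, mask_ratio: float = 0.75,
                        rng=None) -> np.ndarray:
    """Build the NAM mask on the 30 x 30 patch grid: first mask every patch containing a
    fading point, then randomly mask further patches until the overall ratio is reached."""
    if rng is None:
        rng = np.random.default_rng()
    gh, gw = discriminant.shape[0] // patch, discriminant.shape[1] // patch
    mask = np.zeros((gh, gw), dtype=bool)
    for r in range(gh):
        for c in range(gw):
            block = discriminant[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch]
            mask[r, c] = bool(np.any(block == 0))      # patch contains at least one fading point
    deficit = int(round(mask_ratio * gh * gw)) - int(mask.sum())
    if deficit > 0:
        free = np.flatnonzero(~mask.ravel())
        mask.ravel()[rng.choice(free, size=deficit, replace=False)] = True
    return mask
```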

3. Experiment and results

In this study, the water perimeter security events collected in the underwater acoustic experiment of the DAS system were used as experimental data. The DAS system was placed on the shore, and a section of winding acoustically sensitive optical fiber cable, approximately 300 m long, was placed in the lake; a rope was used at the end to connect the optical fiber cable to the opposite bank. The cable was tied with floating balls and weights every 10 m so that it was suspended approximately 3 m underwater. Each sample is defined as the phase data collected by the DAS system over 300 m of optical cable, with one point per metre, for a duration of 3 s, giving a sample size of 300 × 300.

The DAS system was used to collect four typical water perimeter security events and one static state without any events; therefore, a total of five events were analyzed, as shown in Fig. 7. These events are as follows: (I) Drone ship: the use of remote-controlled vessels for surface reconnaissance; (II) Boating: driving a vessel, such as a kayak, close to a cable; (III) Demolition optic cable: using tools to forcibly destroy the fiber optic cable; (IV) Sonar detection: the use of sound waves of certain frequencies for detection; (V) No event: only ambient noise around the cable.

Fig. 7. Five types of events arranged at the experimental site.

The data visualization of the five events is shown in Fig. 8. The first row is the amplitude signal, and the second row is the phase signal. It can be seen that the fluctuation range of Class I is wide and dense, the fluctuations of Class II are wide and sparse, Class III has the widest and densest fluctuation range, Class IV shows a fixed-frequency fluctuation at 50 m, and Class V is generally stable.

Fig. 8. Data visualization of the five events: (a) amplitude signal; (b) phase signal.

The experiments were conducted using 10-fold cross-validation to reduce the randomness of the experimental results and improve their reliability. Each type of data was divided into 10 parts, and 10 runs were performed, using eight groups as the training set and two groups as the validation set in turn; these sets were mutually exclusive. The validation set was used to evaluate the training results after each iteration and return the evaluation to the network to improve the next iteration. A separate test set of 2,000 samples per class, completely disjoint from the training and validation sets, was then used to evaluate the fully trained model and its performance. The corresponding accuracy was obtained for each run, and after the 10 runs the average of all results was used as the performance estimate of the model. The datasets used in the experiments are listed in Table 1.
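One way to realize this split, shown here as an assumed sketch rather than the authors' exact procedure, is to cut the sample indices into ten folds and, for each run, take two folds for validation and the remaining eight for training:

```python
import numpy as np

def ten_fold_splits(n_samples: int, seed: int = 0):
    """Yield 10 (train_idx, val_idx) pairs: eight folds for training, two folds for validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), 10)
    for i in range(10):
        j = (i + 1) % 10                                   # second validation fold
        val = np.concatenate([folds[i], folds[j]])
        train = np.concatenate([folds[k] for k in range(10) if k not in (i, j)])
        yield train, val
```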


Table 1. Data sets used in the experiments


3.1 Comparison between NAM-MAE and MAE

To demonstrate the superiority of NAM-MAE in the masking method, it is compared with the MAE using a random mask. The two models differ only in terms of the masking method, and all other model parameters are the same.

The training dataset used for the upstream and downstream tasks of NAM-MAE comprised 50% of the total training dataset, and the same was true for the MAE model. Each of the two models was trained for over 500 rounds. A comparison plot of the average values of the training results with ten-fold cross-validation is shown in Fig. 9. The accuracy curve is shown in Fig. 9(a), and the loss curve is shown in Fig. 9(b). The red line in the figure represents the training curve of the MAE; it can be observed that the curve significantly oscillates during the training process, indicating that multiple training processes are relatively unstable. The blue line represents the training curve of NAM-MAE. Compared with MAE, the training curve of NAM-MAE is relatively flat. This shows that even after multiple training sessions, the model training results remain relatively stable. Further, the overall training accuracy is higher than that of MAE, thus indicating that the learning ability of NAM-MAE is stable and efficient.

Fig. 9. Comparison of the average values of the ten-fold cross-validation training results of the two models: (a) accuracy curve; (b) loss curve.

The performance index pairs for the two models are listed in Table 2. In the training set, the average training accuracy and average training loss of MAE are 95.3979% and 0.2196, respectively. The average training accuracy of NAM-MAE is 99.2333%, which is 3.8351% higher than that of MAE; moreover, the average training loss of NAM-MAE is 0.0983, which is 0.1213 lower than that of MAE. In the test set, the average test accuracy of MAE and NAM-MAE are 93.1912% and 96.6134%, respectively; the average test accuracy of NAM-MAE is 3.4232% higher than that of MAE. The three indicators of the overall test of NAM-MAE are 3.3333% higher than those of MAE, on average.


Table 2. Comparison of the performance indicators of the two models

In summary, with the improved mask strategy, NAM-MAE is more stable and efficient at extracting useful information from the signal, trains with higher accuracy, and generalizes better. This demonstrates the advantages of NAM-MAE in signal augmentation and self-supervised learning and confirms that NAM-MAE improves on MAE.

3.2 Comparison of NAM-MAE with other models

To prove the effectiveness and superiority of NAM-MAE in terms of the attention mechanism and self-supervised learning, five existing event recognition methods were selected for comparison: CNN-CBAM [33], Vision-Transformer [40], Swin-Transformer [41], SE-ResNet [42], and ResNet-18. The CNN-CBAM and SE-ResNet methods add attention modules to CNN models; comparison with NAM-MAE tests whether the transformer is superior to CNN attention mechanisms for underwater acoustic signal classification. The Vision-Transformer method is the NAM-MAE encoder without the pre-training process; this comparison tests whether self-supervised learning is better than supervised learning alone. The Swin-Transformer method retrains supervised pre-trained model weights on the experimental data; this comparison tests whether self-supervised learning with unsupervised pre-training is superior to transfer learning with supervised pre-training. ResNet-18 is a classical CNN model, and its comparison tests whether a traditional, typical CNN can match the newer and more powerful NAM-MAE model. The parameters of these methods were set according to previous reports or past experience, and the same dataset and preprocessing were used for 500 rounds of training to ensure a fair comparison.

The training accuracy results of NAM-MAE and the five comparison models are shown in Fig. 10. The figure shows that NAM-MAE has higher training accuracy and faster convergence than the other models; its learning speed and learning ability are very strong. Although the final training accuracy of the Swin-Transformer is high, its overall convergence is significantly slower, and no convergence trend is observed after 500 rounds of training, indicating that its learning speed is lower than that of NAM-MAE. The overall convergence of the Vision-Transformer is slow, and its training accuracy is lower than that of the other models. The training accuracy curve of SE-ResNet is smooth and converges quickly, but its final training accuracy is slightly lower and its learning ability is not as effective as that of NAM-MAE. CNN-CBAM has an average convergence speed, and its final training accuracy is not high. The initial learning speed of ResNet-18 is fast, but it converges quickly to the lowest accuracy. Therefore, the overall performance of these models is not as good as that of NAM-MAE.

Fig. 10. Training accuracy curves for the five models.

The confusion matrices of NAM-MAE and the five comparison methods are shown in Fig. 11. For each event, the percentages of correctly and incorrectly classified samples among all classified samples are given; the horizontal and vertical axes represent the predicted and true labels, respectively.

Fig. 11. Confusion matrices of NAM-MAE and the five comparison models: (a) NAM-MAE; (b) Swin-Transformer; (c) Vision-Transformer; (d) SE-ResNet; (e) CNN-CBAM; (f) ResNet-18.

Based on the confusion matrix, several evaluation metrics are obtained: 1) Accuracy: the number of correctly predicted samples divided by the total number of samples; 2) Precision: for a given predicted label, the number of correct predictions divided by the total number of predictions with that label; 3) Recall: for a given true label, the number of correct predictions divided by the total number of samples with that label; 4) F1-score: the harmonic mean of precision and recall, F1-score = 2 × precision × recall / (precision + recall); 5) Average precision: the weighted average of the per-class precisions; 6) Average F1-score: the weighted average of the per-class F1-scores [43]. Table 3 presents a comparison of the performance indicators of the models.
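For reference, these metrics can be computed from a confusion matrix as in the following sketch (support-weighted averages are assumed for the "average" metrics, matching the definitions above):

```python
import numpy as np

def metrics_from_confusion(cm: np.ndarray) -> dict:
    """Compute accuracy, per-class precision/recall/F1, and support-weighted averages
    from a confusion matrix cm[true_label, predicted_label]."""
    support = cm.sum(axis=1)
    precision = np.diag(cm) / np.maximum(cm.sum(axis=0), 1)
    recall = np.diag(cm) / np.maximum(support, 1)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    weights = support / support.sum()
    return {"accuracy": np.trace(cm) / cm.sum(),
            "precision": precision, "recall": recall, "f1": f1,
            "avg_precision": float(weights @ precision),
            "avg_f1": float(weights @ f1)}
```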


Table 3. Performance indicators of the five models

From the results in Fig. 11 and Table 3, from the perspective of single-label recognition, NAM-MAE correctly classifies the true labels of Classes 4 and 5, and its overall indicators for Classes 1 and 3 are the highest. However, NAM-MAE has lower recall than the Vision-Transformer for the true-label classification of Class 2 and lower precision and F1-score than the Swin-Transformer for the true-label classification of Class 4; the other models each have an advantage in judging certain events. Nevertheless, from the perspective of overall recognition, the accuracy, average precision, and average F1-score of NAM-MAE are 0.9661, 0.9667, and 0.9662, respectively, approximately 9.5–18.9% higher than those of the other models.

4. Conclusion

This study proposed an intelligent event recognition algorithm with a noise-adaptive mask MAE that combines the advantages of representation learning, the attention mechanism, and self-supervised learning, and applied it to a DAS system. Taking fading noise, the most significant noise in a DAS system, as the main research object, the method analyzes the fading points of DAS signals based on the differential concept, uses the DBSCAN algorithm to judge and extract the noise points, and generates a noise discrimination matrix. The upstream task of the MAE is used for model pre-training, and the downstream task generates a mask from the noise discrimination matrix through the noise adaptive mask to complete the intelligent event recognition training. In the experiment, five types of water perimeter security events collected at the DAS underwater acoustic test site were used as classification data, and NAM-MAE was compared with the MAE and five other related models. The results showed that the average training accuracy of NAM-MAE was 99.2333%, 3.8351% higher than that of MAE, and the average training loss was 0.0983, 0.1213 lower than that of MAE; the test performance indices were 3.3333% higher than those of MAE on average. NAM-MAE also had higher training accuracy and faster convergence than the other five models, with an accuracy of 0.9661, an average precision of 0.9667, and an average F1-score of 0.9662, approximately 9.5–18.9% higher than those of the other models. The experimental results show that event recognition can be achieved using only a few effective features. This method provides a new idea and direction for big-data processing and recognition in DAS systems. Future work could focus on identifying more types of noise to mask and on the setting of hyperparameters such as the mask ratio of the model.

Funding

National Key Research and Development Program of China (2019YFC0409105; 2022YFC3003801); Department of Science and Technology of Jilin Province (20210204193YY, 20220203133SF, YDZ1202201ZYTS137); The Shenyang Science and Technology Plan Public Health R&D Special Project (21-172-9-18); Science, Technology and Innovation Commission of Shenzhen Municipality (No. JCYJ20220818101206015, No. JCYJ20220818101609021).

Disclosures

All of the authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. Z. Wang, L. Zhang, S. Wang, N. Xue, F. Peng, M. Fan, W. Sun, X. Qian, J. Rao, and Y. Rao, “Coherent Phi-OTDR based on I/Q demodulation and homodyne detection,” Opt. Express 24(2), 853–858 (2016). [CrossRef]  

2. D. Chen, Q. Liu, and Z. He, “High-fidelity distributed fiber-optic acoustic sensor with fading noise suppressed and sub-meter spatial resolution,” Opt. Express 26(13), 16138–16146 (2018). [CrossRef]  

3. D. Chen, Q. Liu, Y. Wang, H. Li, and Z. He, “Fiber-optic distributed acoustic sensor based on a chirped pulse and a non-matched filter,” Opt. Express 27(20), 29415–29424 (2019). [CrossRef]  

4. T. F. B. Marie, Y. Bin, H. Dezhi, and A. Bowen, “Principle and application state of fully distributed fiber optic vibration detection technology based on Φ-OTDR: A review,” IEEE Sens. J. 21(15), 16428–16442 (2021). [CrossRef]  

5. H. Li, C. Fan, Z. Shi, B. Yan, J. Chen, Z. Yan, D. Liu, P. Shum, and Q. Sun, “Spatio-temporal joint oversampling-downsampling technique for ultra-high resolution fiber optic distributed acoustic sensing,” Opt. Express 30(16), 29639–29654 (2022). [CrossRef]  

6. W. Cao, G. Cheng, G. Xing, and B. Liu, “Near-field target localisation based on the distributed acoustic sensing optical fibre in shallow water,” Opt. Fiber Technol. 75, 103198 (2023). [CrossRef]  

7. X. He, X. Xu, M. Zhang, S. Xie, F. Liu, L. Gu, Y. Zhang, Y. Yang, and H. Lu, “On the phase fading effect in the dual-pulse heterodyne demodulated distributed acoustic sensing system,” Opt. Express 28(22), 33433–33447 (2020). [CrossRef]  

8. Z. Zhao, H. Wu, J. Hu, K. Zhu, Y. Dang, Y. Yan, M. Tang, and C. Lu, “Interference fading suppression in φ-OTDR using space-division multiplexed probes,” Opt. Express 29(10), 15452–15462 (2021). [CrossRef]  

9. Y. Lu, X. Hu, Z. Yu, Q. Zhu, and Z. Meng, “Fading noise reduction in distributed acoustic sensing using an optimal weighted average algorithm,” Appl. Opt. 60(34), 10643–10648 (2021). [CrossRef]  

10. F. Jiang, Z. Zhang, Z. Lu, H. Li, Y. Tian, Y. Zhang, and X. Zhang, “High-fidelity acoustic signal enhancement for phase-OTDR using supervised learning,” Opt. Express 29(21), 33467–33480 (2021). [CrossRef]  

11. D. F. Kandamali, X. Cao, M. Tian, Z. Jin, H. Dong, and K. Yu, “Machine learning methods for identification and classification of events in ϕ-OTDR systems: a review,” Appl. Opt. 61(11), 2975–2997 (2022). [CrossRef]  

12. J. Li, Y. Wang, P. Wang, Q. Bai, Y. Gao, H. Zhang, and B. Jin, “Pattern recognition for distributed optical fiber vibration sensing: A review,” IEEE Sens. J. 21(10), 11983–11998 (2021). [CrossRef]  

13. Z. Peng, H. Wen, J. Jian, A. Gribok, M. Wang, S. Huang, H. Liu, Z.-H. Mao, and K. P. Chen, “Identifications and classifications of human locomotion using Rayleigh-enhanced distributed fiber acoustic sensors with deep neural networks,” Sci. Rep. 10(1), 21014 (2020). [CrossRef]  

14. B. M. T. Fouda, B. Yang, D. Han, and B. An, “Pattern recognition of optical fiber vibration signal of the submarine cable for its safety,” IEEE Sens. J. 21(5), 6510–6519 (2021). [CrossRef]  

15. H. Jia, S. Liang, S. Lou, and X. Sheng, “A k-Nearest Neighbor Algorithm-Based Near Category Support Vector Machine Method for Event Identification of φ-OTDR,” IEEE Sens. J. 19(10), 3683–3689 (2019). [CrossRef]  

16. N. Yang, Y. Zhao, J. Chen, and F. Wang, “Real-time classification for Φ-OTDR vibration events in the case of small sample size datasets,” Opt. Fiber Technol. 76, 103217 (2023). [CrossRef]  

17. Y. Wang, P. Wang, K. Ding, H. Li, J. Zhang, X. Liu, Q. Bai, D. Wang, and B. Jin, “Pattern recognition using relevant vector machine in optical fiber vibration sensing system,” IEEE Access 7, 5886–5895 (2019). [CrossRef]  

18. H. Meng, S. Wang, C. Gao, and F. Liu, “Research on recognition method of railway perimeter intrusions based on Φ-OTDR optical fiber sensing technology,” IEEE Sens. J. 21(8), 9852–9859 (2021). [CrossRef]  

19. X. Wang, A. Zhang, S. Liang, and S. Lou, “Event identification of a phase-sensitive OTDR sensing system based on principal component analysis and probabilistic neural network,” Infrared Phys. Technol. 114, 103630 (2021). [CrossRef]  

20. T. F. B. Marie, Y. Bin, H. Dezhi, and A. Bowen, “A Hybrid Model Integrating MPSE and IGNN for Events Recognition along Submarine Cables,” IEEE Trans. Instrum. Meas. 71, 1–13 (2022). [CrossRef]  

21. S. Xu, Z. Qin, W. Zhang, X. Xiong, H. Li, Z. Wei, O. A. Postolache, and C. Mi, “A novel method of recognizing disturbance events in Φ-OTDR based on affinity propagation clustering and perturbation signal selection,” IEEE Sens. J. 21(12), 13272–13282 (2021). [CrossRef]  

22. Y. Huang, S. Cheng, Y. Li, X. Chen, J. Dai, C. Hu, C. Deng, F. Pang, X. Zhang, and T. Wang, “High-efficient disturbance event recognition method of ϕ-OTDR utilizing region-segmentation differential phase signals,” Appl. Opt. 61(22), 6609–6616 (2022). [CrossRef]  

23. Y. Huang, H. Zhao, X. Zhao, B. Lin, F. Meng, J. Ding, S. Lou, X. Wang, J. He, X. Sheng, and others, “Pattern recognition using self-reference feature extraction for φ-OTDR,” Appl. Opt. 61(35), 10507–10518 (2022). [CrossRef]  

24. Q. Sun, Q. Li, L. Chen, J. Quan, and L. Li, “Pattern recognition based on pulse scanning imaging and convolutional neural network for vibrational events in Φ-OTDR,” Optik 219, 165205 (2020). [CrossRef]  

25. Z. Ge, H. Wu, C. Zhao, and M. Tang, “High-Accuracy Event Classification of Distributed Optical Fiber Vibration Sensing Based on Time–Space Analysis,” Sensors 22(5), 2053 (2022). [CrossRef]  

26. Y. Shi, Y. Wang, L. Zhao, and Z. Fan, “An event recognition method for Φ-OTDR sensing system based on deep learning,” Sensors 19(15), 3421 (2019). [CrossRef]  

27. X. Zhao, H. Sun, B. Lin, H. Zhao, Y. Niu, X. Zhong, Y. Wang, Y. Zhao, F. Meng, J. Ding, and others, “Markov transition fields and deep learning-based event-classification and vibration-frequency measurement for φ-OTDR,” IEEE Sens. J. 22(4), 3348–3357 (2022). [CrossRef]  

28. I. A. Barantsov, A. B. Pnev, K. I. Koshelev, V. S. Tynchenko, V. A. Nelyub, and A. S. Borodulin, “Classification of Acoustic Influences Registered with Phase-Sensitive OTDR Using Pattern Recognition Methods,” Sensors 23(2), 582 (2023). [CrossRef]  

29. N. Yang, Y. Zhao, and J. Chen, “Real-time Φ-OTDR vibration event recognition based on image target detection,” Sensors 22(3), 1127 (2022). [CrossRef]  

30. W. Xu, F. Yu, S. Liu, D. Xiao, J. Hu, F. Zhao, W. Lin, G. Wang, X. Shen, W. Wang, and others, “Real-time multi-class disturbance detection for Φ-OTDR based on YOLO algorithm,” Sensors 22(5), 1994 (2022). [CrossRef]  

31. Y. Shi, Y. Li, Y. Zhang, Z. Zhuang, and T. Jiang, “An easy access method for event recognition of Φ-OTDR sensing system based on transfer learning,” J. Lightwave Technol. 39(13), 4548–4555 (2021). [CrossRef]  

32. M. Tian, H. Dong, X. Cao, and K. Yu, “Temporal convolution network with a dual attention mechanism for φ-OTDR event classification,” Appl. Opt. 61(20), 5951–5956 (2022). [CrossRef]  

33. X. Liu, H. Wu, Y. Wang, Y. Tu, Y. Sun, L. Liu, Y. Song, Y. Wu, and G. Yan, “A Fast Accurate Attention-Enhanced ResNet Model for Fiber-Optic Distributed Acoustic Sensor (DAS) Signal Recognition in Complicated Urban Environments,” in Photonics (MDPI, 2022), 9(10), p. 677.

34. A. Almudévar, P. Sevillano, L. Vicente, J. Preciado-Garbayo, and A. Ortega, “Unsupervised Anomaly Detection Applied to Φ-OTDR,” Sensors 22(17), 6515 (2022). [CrossRef]  

35. J. Zhang, X. Zhao, Y. Zhao, X. Zhong, Y. Wang, F. Meng, J. Ding, Y. Niu, X. Zhang, L. Dong, and others, “Unsupervised learning method for events identification in φ-OTDR,” Opt. Quantum Electron. 54(7), 457 (2022). [CrossRef]  

36. Z. Peng, J. Jian, H. Wen, A. Gribok, M. Wang, H. Liu, S. Huang, Z.-H. Mao, and K. P. Chen, “Distributed fiber sensor and machine learning data analytics for pipeline protection against extrinsic intrusions and intrinsic corrosions,” Opt. Express 28(19), 27277–27292 (2020). [CrossRef]  

37. K. He, X. Chen, S. Xie, Y. Li, P. Dollár, and R. Girshick, “Masked autoencoders are scalable vision learners,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2022), pp. 16000–16009.

38. J. Zhou, Z. Pan, Q. Ye, H. Cai, R. Qu, and Z. Fang, “Phase demodulation technology using a multi-frequency source discrimination of interference-fading induced false alarm in a Φ-OTDR system,” Chin. J. Laser 40(9), 0905003 (2013). [CrossRef]  

39. J. Zhou, Z. Pan, Q. Ye, H. Cai, R. Qu, and Z. Fang, “Characteristics and explanations of interference fading of a φ-OTDR with a multi-frequency source,” J. Lightwave Technol. 31(17), 2947–2954 (2013). [CrossRef]  

40. A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, and others, “An image is worth 16 × 16 words: Transformers for image recognition at scale,” arXiv, arXiv:2010.11929 (2020). [CrossRef]  

41. Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, “Swin transformer: Hierarchical vision transformer using shifted windows,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (2021), pp. 10012–10022.

42. J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018), pp. 7132–7141.

43. M. Sun, M. Yu, P. Lv, A. Li, H. Wang, X. Zhang, T. Fan, and T. Zhang, “Man-made threat event recognition based on distributed optical fiber vibration sensing and SE-WaveNet,” IEEE Trans. Instrum. Meas. 70, 1–11 (2021). [CrossRef]  



