Perimeter monitoring of urban buried pipeline threated by construction activities based on distributed fiber optic sensing and real-time object detection

Suzhen Li; Renzhu Peng; Zelong Liu; Xuqiang Liu

doi:10.1364/OE.509487

1. Introduction

As one of the most important infrastructures, the urban pipeline network (UPN) plays an important role in the transmission and distribution of gas, water, and other energy or resources. For convenience of installation and maintenance, most pipelines are directly buried under roads or greenbelts. Third-party activities, whether civil construction or man-made sabotage, have severely affected the normal operation of the pipelines. According to the latest statistics in China [1], pipeline failures caused by construction activities accounted for more than 34% of all accidents from 2019 to 2021. Monitoring and early warning of construction activities are therefore of great significance for pipeline protection and maintenance.

A few monitoring systems have recently been developed to prevent pipeline accidents caused by third-party intrusion. According to the type of collected signal, the techniques can be divided into image/video monitoring, sound monitoring and vibration monitoring. Image/video monitoring generally collects images or videos of third-party activities near pipelines by fixed cameras or unmanned aerial vehicles (UAV) [2–4]. Fixed cameras are usually installed in the gate station for intrusion monitoring of regional borders rather than buried pipelines, while UAV technology combined with image recognition is highly susceptible to bad weather and obstruction by buildings. A sound monitoring system [5] has been proposed for road cutters, excavator breaking hammers and electric hammers. It is, however, challenged by its limited sensing range and by diverse urban noise [6]. As for vibration monitoring, the basic idea is to detect the ground motion induced by construction activities using accelerometers, velocimeters or other vibration sensors [7–14]. Compared with traditional point sensors, distributed fiber optic sensors (DFOS) show great potential since they can monitor a long pipeline with a single optical fiber. Besides, communication optical cables in cities are usually laid near pipelines, which enhances the applicability of this technique in engineering practice.

The phase-sensitive optical time-domain reflectometer (φ-OTDR) is one of the optical fiber sensors for distributed vibration monitoring. Its basic measurement principle is to launch incident light into the optical fiber and obtain the Rayleigh backscattering signals, from which vibration can be monitored by calculating the differential curves of the captured signals [15,16]. The φ-OTDR was first developed by Taylor and Lee [17]. Later, Choi and Taylor added an Er³⁺ amplifier to improve the sensitivity of this technology [18]. Juarez et al. further extended the monitoring distance by using a laser with ultra-narrow bandwidth and low frequency shift, which realized vibration measurements with a spatial resolution of 100 m [19]. Combining the Mach-Zehnder interferometer with φ-OTDR, Zhu et al. developed a system achieving 5 m spatial resolution and 3 MHz frequency response for a 1064 m optical fiber in the experiment [20]. In the early stage of development, φ-OTDR could only perform qualitative measurement rather than accurately reconstruct the vibration waveform. To solve this problem, Pan et al. proposed a phase demodulation scheme based on digital coherent detection, which successfully extracted the dynamic information of external vibration and thus realized quantitative vibration measurement [21].

In general, the signals obtained by the φ-OTDR system are composed of multiple time series from different channels, each of which corresponds to a certain spatial location along the optical fiber. The recognition of unusual events such as construction activities is essentially to identify the abnormal signals of the optical fiber and determine the corresponding time and channel/location. In the early applications of the φ-OTDR system to third-party activity monitoring, the vibration intensity value was directly used as the indicator of abnormal events by comparing it with a threshold. Although simple and straightforward, this is far from an effective identification measure in practice. Considering the time-varying characteristics of the signals due to various activities, the multi-scale wavelet decomposition method [8] and the contextual information extraction method [9,10] have been proposed to recognize the time-series data. Since spatial resolution is a key issue for a distributed sensing system and different activities may induce field vibration with specific propagation patterns, some methods have been proposed to extract both temporal and spatial information from the vibration signals to recognize the patterns of different third-party activities. As a powerful tool for data processing and mining, the convolutional neural network (CNN) has been employed to deal with the time-space samples of the optical fiber [22–24]. The CNN greatly reduces the false positive rate and enables the model to distinguish between different third-party activities. However, as a classification model, the CNN can only determine whether there is a third-party activity in a time-space sample. Consequently, the sample size must be set small enough to ensure that the time and location of the activity can be identified precisely, which significantly reduces the recognition speed. In addition, the receptive field of the CNN model is limited by the small sample size, which makes the model unable to perceive the differences among activities at large time-space scales.

The recent development of object detection algorithms has opened up new possibilities for pattern recognition of third-party activities using the φ-OTDR based fiber optic sensing technique. Unlike the CNN classification models, object detection algorithms place no limitation on the size of the time-space samples [25,26]. Conceptually, a larger time-space sample gives the recognition model a more suitable receptive field and improves the processing speed of the monitoring method. The vibration signals from all the channels of an optical fiber can be used directly as time-space samples, and the alarm areas obtained on the time-space samples by object detection algorithms indicate where and when third-party activities are taking place. As one of the best object detection algorithms, the faster region-based convolutional neural network (Faster R-CNN) can reliably search for multiple targets in grayscale images [27], and hence provides a potential tool to improve the recognition performance on time-space samples acquired by the distributed fiber optic sensor.

Using the φ-OTDR system for distributed vibration sensing, a perimeter monitoring method based on Faster R-CNN models is proposed for the protection of urban buried pipelines from construction activities. The rest of this paper is arranged as follows: Section 2 presents the methodology and the technical details; Section 3 describes the field tests and a preliminary analysis of the collected data; in Section 4, the proposed method is quantitatively evaluated on the field-test datasets in terms of detection ability, false alarm probability and recognition speed. Finally, the conclusions are drawn in Section 5.

2. Methodology

2.1 Overview of the monitoring method

Using the distributed optical fiber placed alongside the buried pipeline, the ambient vibration is constantly monitored and used as the input for recognition of construction activities. The overall scheme of the monitoring method is presented in Fig. 1 and consists of two Faster R-CNN models for two-level recognition. The model-1 is designed for a quick scan of the full-scale data to detect and locate the suspicious regions of construction activities. Focusing on the identified suspicious regions, the model-2 identifies the type of the activities as well as the corresponding time and location by analyzing the samples in finer detail.

Fig. 1. The overall scheme of the monitoring method.


Conceptually, the design of this monitoring scheme has clear advantages over the existing DFOS-based recognition methods. Since construction activities are unusual events, the abnormal signals they induce are highly localized within the signals acquired along the optical fiber over space and time. An object detection algorithm is therefore effective for quickly searching for and extracting the object signals from large amounts of raw data. Besides, the two-level strategy helps to achieve recognition results with both efficiency and accuracy. The model-1 quickly captures the suspicious regions from a relatively large sample scale, after which the model-2 is devoted to the more sophisticated tasks and the improvement of recognition accuracy. The key techniques are elaborated as follows.

2.2 Segmentation of time-space samples

In actual application scenarios, the vibration of the full-length optical fiber is monitored and recorded at a certain sampling rate in a dataset containing the signals of all channels, which forms a two-dimensional matrix of signals varying with time and space/channel. It can be seen from the dashed boxes in Fig. 1 that time-space samples of different sizes are adopted in succession. The initial size is ${t_0} \times {c_0}$, where ${t_0}$ is the time duration of the latest record together with a few preceding records and ${c_0}$ is the total number of channels of the tested optical fiber. The sizes of the samples fed to the model-1 and the model-2 are ${t_1} \times {c_1}$ and ${t_2} \times {c_2}$, respectively. In general, ${t_0} > {t_1} > {t_2}$, ${c_0} > {c_1} > {c_2}$, and their values should be carefully selected considering the actual application scenarios as well as the requirements for recognition efficiency and accuracy. The time-space samples are generated from the raw signal with a sliding framing window:

(1)$${S_i}(t,c) = W[{({i - 1} )\times {d_\textrm{t}} + t,({i - 1} )\times {d_\textrm{c}} + c} ]\quad t = 1,2,\ldots ,{N_t},\,c = 1,2,\ldots ,{N_c}$$

where S_i(t, c) is the value in the t-th row and c-th column of the i-th time-space sample; d_t and d_c are the shifts of the framing window on rows (time) and columns (space), respectively; W[.] is the operator mapping the coordinates of data points in the raw signal to the coordinates in the generated i-th time-space sample S_i; N_t and N_c are the numbers of rows and columns in a time-space sample, respectively, which determine the receptive field of the samples. Increasing N_t and N_c reduces the number of time-space samples and improves the recognition speed, but may lead to the loss of detail needed for sample recognition.
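The framing window of Eq. (1) can be sketched in a few lines of NumPy. This is only an illustrative reading of the segmentation step: the function name `segment_samples`, the synthetic record and the shift values are assumptions, indices are 0-based, and the windows are enumerated over a grid of time and channel offsets rather than with the single index i of Eq. (1).

```python
import numpy as np

def segment_samples(raw, n_t, n_c, d_t, d_c):
    """Cut a (time x channel) matrix into n_t x n_c windows.

    raw      : 2-D array, rows = time steps, columns = channels
    d_t, d_c : window shifts along time and channel (illustrative values)
    Returns (row_offset, col_offset, window) tuples so that a detection
    inside a window can be mapped back to raw-signal coordinates.
    """
    samples = []
    n_rows, n_cols = raw.shape
    for r0 in range(0, n_rows - n_t + 1, d_t):
        for c0 in range(0, n_cols - n_c + 1, d_c):
            samples.append((r0, c0, raw[r0:r0 + n_t, c0:c0 + n_c]))
    return samples

# Synthetic record: 1000 time steps x 500 channels (not field data).
rng = np.random.default_rng(0)
raw = rng.random((1000, 500))
samples = segment_samples(raw, n_t=200, n_c=200, d_t=100, d_c=100)
print(len(samples), samples[0][2].shape)
```

Keeping the offsets alongside each window is what allows an alarm box found in a sample to be converted back to an absolute time and channel.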

2.3 Recognition model based on Faster R-CNN

The Faster R-CNN is one of the most advanced object detection algorithms and has the advantage of fast detection. It can locate and recognize multiple objects in images through a model consisting of a feature extraction network, a region proposal network and a classification network [27]. For object detection in grayscale images, the input of the model is a two-dimensional matrix composed of the values of the grayscale image, and the output of the model is the classification of the detected objects and their positions in the image.

It should be noted that detecting construction activities in the large-scale datasets of fiber optic signals is essentially the task of searching for the regions where the construction activities appear in the time-space samples. Therefore, the Faster R-CNN algorithm is adopted here to develop the recognition models, as presented in Fig. 2, which consists of the following three steps.

(1) The ResNet50 network [28] is employed to extract the feature map of the time-space samples. ResNet50 is a classic CNN architecture. Compared with plain CNN architectures, it mitigates the vanishing and exploding gradient problems in deep network training by introducing residual learning, which makes optimization easier. ResNet50 is 50 layers deep, and the increased depth enables the extraction of richer features, contributing to enhanced model performance.
(2) The region proposal network (RPN) is used to determine the suspicious regions of the construction activities. It is mainly composed of anchor generation, anchor classification by a softmax layer and bounding-box regression. The suspicious regions are obtained from the classification results and the regressed anchors. The RPN is a crucial component in modern object detection frameworks. It employs anchor boxes, a set of pre-defined bounding boxes of various scales and aspect ratios that serve as reference boxes for predicting potential object regions. The RPN slides a small window over the feature map generated by the CNN backbone. At each position, it predicts whether an object is present and adjusts the coordinates of the associated anchor box.
(3) The vibration intensity values in every suspicious region are extracted from the time-space samples and processed into a two-dimensional matrix of the same size by the Region of Interest Pooling (RoI Pooling) method [29]. After that, the matrices of the suspicious regions are classified by a softmax layer, and bounding-box regression is applied again to obtain more accurate region coordinates of the object. The purpose of RoI Pooling is to extract fixed-size feature maps from proposed regions of varying size, allowing standard fully connected layers to be applied. It divides each proposed region into a fixed number of spatial grid cells. Within each cell, a pooling operation is performed to obtain a representative value, and the results from all cells are combined to form the output feature map.
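The pooling operation in step (3) can be illustrated with a small NumPy sketch. It is a simplified, single-channel version of RoI max pooling written for this explanation, not the implementation used in the models; the function name `roi_max_pool`, the end-exclusive box convention and the toy feature map are assumptions, and the region is assumed to be non-empty.

```python
import numpy as np

def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool one region of a 2-D feature map to a fixed out_h x out_w grid.

    roi = (r0, c0, r1, c1), end-exclusive, in feature-map coordinates.
    """
    r0, c0, r1, c1 = roi
    region = feature_map[r0:r1, c0:c1]
    # Bin edges split the region into out_h x out_w roughly equal cells.
    row_edges = np.linspace(0, region.shape[0], out_h + 1).astype(int)
    col_edges = np.linspace(0, region.shape[1], out_w + 1).astype(int)
    pooled = np.empty((out_h, out_w), dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Guarantee at least one element per cell for small regions.
            rs, re = row_edges[i], max(row_edges[i + 1], row_edges[i] + 1)
            cs, ce = col_edges[j], max(col_edges[j + 1], col_edges[j] + 1)
            pooled[i, j] = region[rs:re, cs:ce].max()
    return pooled

feature_map = np.arange(48).reshape(6, 8)
pooled = roi_max_pool(feature_map, roi=(1, 2, 5, 8), out_h=2, out_w=3)
print(pooled)  # rows: [19 21 23] and [35 37 39]
```

Regions of any size are reduced to the same `out_h` × `out_w` shape, which is what lets a single classification head process every proposal.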

Fig. 2. Recognition model based on Faster R-CNN.


The two Faster R-CNN models have the same structure. After the time-space samples of the two models are obtained, they are plotted as grayscale pictures of 1600 × 800 pixels, which are regarded as the inputs of the two Faster R-CNN models. The structures of the two models are listed in Table 1, and the hyperparameters for training are presented in Table 2.

Table 1. The structures of the two Faster RCNN models


Table 2. Training hyperparameters of the two Faster RCNN models


In general, the CNN algorithm and the Faster R-CNN algorithm are both composed of a convolution part and a decision part. The convolution part extracts feature maps from the inputs, and the decision part is then applied to recognize and classify based on the feature maps. Although the two algorithms are similar in this respect, the differences between them give the Faster R-CNN a clear advantage in efficiency. The details are as follows:

(1) Convolution Part: The outputs of a CNN are the classification results of its inputs. Therefore, the localization of third-party intrusion is unavailable by simply deploying an existing, mature CNN model. One feasible way to realize localization is to segment the input time-space signals into small pieces and recognize each small sample; the location is then obtained by mapping the small samples back to their coordinates in the raw signals. As a result, the computational efficiency of the CNN is much lower than that of the Faster R-CNN in this case, because the CNN must run recognition many more times (once per small sample).
(2) Decision Part: The decision part of the CNN flattens the feature maps calculated by the convolution part and then uses fully connected layers to make the decision. For the Faster R-CNN, the decision part depends on the region proposal network (RPN). The number of parameters in the fully connected layers is far larger than that in the RPN, so the computational efficiency of the CNN decision part is also lower than that of the Faster R-CNN.

2.4 Two kernel recognition models

Based on the methods described in Sections 2.2 and 2.3, two recognition models are developed using the time-space samples of the large scale t₁×c₁ and the small scale t₂×c₂, respectively.

Before the training of the model-1, the vibration intensity values over 0.5 are set to 0.5 to highlight the suspicious regions in the samples. Via an n × 1 max pooling layer, the t₁×c₁ samples are reduced to (t₁/n)×c₁ to ensure that the aspect ratio of the samples is acceptable. The processed samples are then used to train the model-1 to detect the suspicious regions in the t₁×c₁ samples. The model-1 can effectively reduce the recognition time because of its larger sample size, but its excessive sensitivity may cause environmental noise to be recognized as construction activities. The high sensitivity and wide recognition range make the model-1 more suitable for the preliminary recognition of vibration signals.
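The two preprocessing steps for the model-1 samples can be sketched as follows. This is a minimal NumPy illustration, assuming that the n × 1 pooling acts along the time axis and that trailing rows which do not fill a complete block are dropped; the function name `preprocess_model1` and the value n = 12 are hypothetical, since n is not stated in the text.

```python
import numpy as np

def preprocess_model1(sample, n, clip=0.5):
    """Clip intensities at `clip`, then apply n x 1 max pooling along time."""
    clipped = np.minimum(sample, clip)
    t, c = clipped.shape
    t_trim = (t // n) * n  # drop trailing rows that do not fill a block
    blocks = clipped[:t_trim].reshape(t_trim // n, n, c)
    return blocks.max(axis=1)

# Tiny example: 4 time steps x 2 channels, pooled in blocks of 2 rows.
sample = np.array([[0.1, 0.9],
                   [0.3, 0.2],
                   [0.7, 0.4],
                   [0.0, 0.1]])
small = preprocess_model1(sample, n=2)

# A 9600 x 500 sample with a hypothetical n = 12 becomes 800 x 500.
large = preprocess_model1(np.zeros((9600, 500)), n=12)
print(small, large.shape)
```

Max pooling, unlike averaging, keeps short high-intensity bursts visible after the time axis is shortened, which matches the stated aim of highlighting suspicious regions.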

Comparatively, the model-2 with the smaller sample size retains more details of the signals and thus recognizes various construction activities more accurately. Since the sample size of the model-2 is very small, it is not appropriate to send an alarm directly based on the recognition result of a single sample. A warning strategy is hence proposed to improve the recognition results, with the following rules: (1) If the duration of the detected activity is less than 3 s, the result does not trigger an alarm; (2) An alarm is issued only when two consecutive t₂×c₂ samples corresponding to the same location are recognized as construction activities.
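The two rules of the warning strategy can be expressed as a short filter over the model-2 results at one location. The sketch below is one possible reading, assuming that a sample counts toward rule (2) only if it also satisfies rule (1); the `Detection` record and the function `alarm_flags` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    is_construction: bool  # model-2 labelled the sample as a construction activity
    duration_s: float      # duration of the detected activity in seconds

def alarm_flags(detections, min_duration_s=3.0):
    """Return one alarm flag per consecutive sample at a single location.

    A sample is valid only if it is a construction activity lasting at
    least min_duration_s; an alarm needs two consecutive valid samples.
    """
    valid = [d.is_construction and d.duration_s >= min_duration_s
             for d in detections]
    return [i > 0 and valid[i] and valid[i - 1] for i in range(len(valid))]

sequence = [Detection(True, 5.0), Detection(True, 1.0), Detection(True, 4.0),
            Detection(True, 6.0), Detection(False, 0.0)]
flags = alarm_flags(sequence)
print(flags)  # [False, False, False, True, False]
```

In this example the short 1 s event breaks the first run, so the alarm is raised only at the fourth sample, after two valid samples in a row.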

3. Field tests

3.1 Overview of the field tests

Field tests are conducted on a long-distance urban gas pipeline in service. A 5.25 km-long single-mode optical fiber cable is laid along the pipeline at a burial depth of 1 m. As shown in Fig. 3, the cable covers the most common scenarios in cities. The pulse repetition rate used in this work is 2000 Hz, and the spatial resolution is set to 10 m. To increase the processing speed of the sensing system, an averaging algorithm is used to reduce the noise of the signals collected by the φ-OTDR, and the sampling rate of the system is reduced to 8 Hz. As a result, the φ-OTDR sensing system collects the vibration intensity along the length of the optical fiber in 525 channels, each corresponding to a 10-meter portion of the fiber, with intensity values in the range of 0 to 2.5.

Fig. 3. Optical fiber layout.


The optical cable is of type GYTA53-48B, with an outer Φ40 HDPE sleeve. Except for the crossing under the road, it is directly buried along with the pipeline in the green belt. The direct burial installation method and the structure of the optical cable are shown in Fig. 4. The optical fiber is placed near the buried pipeline, and the interval between the pipeline and the cable is over 300 mm. The end of the cable is placed in the pipeline pressure regulating station, and the φ-OTDR devices are deployed near the cable end. The devices are powered from a mains outlet, so they can operate for long periods. The vibration is simulated along the direction perpendicular to the optical cable, at distances of 0∼15 m from it.

Fig. 4. Buried pipeline and optical cable.


As presented in Fig. 5, four common third-party activities, namely pickaxe, shovel, hammer and electric hammer, are carried out on the ground near the pipeline at six locations. The details are listed in Table 3. The distances between the fiber and the excavation points are also varied to investigate the attenuation characteristics of the vibration signals induced by the different threats. Besides, 85 hours of environmental vibration signals are obtained by the 5.25 km-long fiber across urban regions of residences, roads, parks, businesses and metros. Because all areas through which the fiber passes are typical urban areas with strong vibrations, environmental vibration signals similar to those of third-party activities, such as people jumping, vehicles and subways, can be collected to rigorously test the proposed monitoring method. Finally, a database of over 86 hours with an 8 Hz sampling rate and 525 channels is constructed for further investigation.

Fig. 5. Vibration signal collection.


Table 3. Details of third-party intrusion experiments


Using the sample segmentation method in Section 2.2, the signals from channel 26 to channel 525 of the records listed in Table 4 are processed with the parameters given in Table 5 to obtain two sample sets, where d_t and d_c are the shifts of the framing window on rows (time) and columns (space), respectively. Sample set-1, used to train the model-1, contains 477 samples of size 9600 × 500, equivalent to 1200 s × 5 km. Sample set-2, used to train the model-2, contains 64766 samples of size 200 × 200, equivalent to 25 s × 2 km. The training samples of the two models are randomly selected from the two sample sets and the remaining samples form the testing sets; the numbers of both are listed in Table 6. The recognition rates of the two trained models are computed on the testing samples.
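As a quick check of the stated equivalences, using only the 8 Hz sampling rate and the 10 m channel length given in Section 3.1, the sample sizes convert as:

$$\frac{200}{8\ \textrm{Hz}} = 25\ \textrm{s},\quad 200 \times 10\ \textrm{m} = 2\ \textrm{km},\qquad \frac{9600}{8\ \textrm{Hz}} = 1200\ \textrm{s},\quad 500 \times 10\ \textrm{m} = 5\ \textrm{km}$$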

Table 4. Record collected in the field test


Table 5. Parameters for the two sample sets


Table 6. Number of samples for different activities


Table 7. Recognition rate


3.2 Preliminary analysis of vibration signals

The time series records of the 525 channels are combined to obtain the total matrix with the two dimensions of time and space. Afterwards, 200 × 200 matrices of different activities, each containing the vibration intensity values within 25 s and 2 km, are extracted from the total matrix for preliminary analysis. In Fig. 6, the 200 × 200 matrices are displayed as grayscale images. The vibration intensity peak, the activity duration and the number of channels affected by an activity can preliminarily characterize different activities near the pipeline. The peaks of environmental vibration are usually very low, as presented in Fig. 6(a), except that the peaks caused by metros can reach 2. The electric hammer is the excavation activity with the highest peak value, 2.2, and is clearly different from the environmental vibration. The peaks of the excavation activities induced by pickaxes, spades and hammers are similar to each other and significantly higher than the peaks of normal urban activities other than metros. The activity durations of pickaxes, spades and hammers are within one second, while those of electric hammers and metros can reach 10 seconds or more. The number of channels affected by pickaxes, spades and hammers is usually less than 4, whereas electric hammers and metros affect 5 and 8 channels, respectively.

Fig. 6. The vibration intensity values of different activities.


Compared with the third-party vibration signals, the amplitude of the environmental noise is relatively small. Figure 7 compares a colored heatmap with a grayscale heatmap of the time-space samples. The environmental noise is visible in the colored picture but barely visible in the grayscale picture. Therefore, converting the colored heatmap to grayscale can be regarded as an effective method to reduce the impact of environmental noise.
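One way to render a time-space sample as an 8-bit grayscale image is sketched below. It assumes a linear mapping of the 0 to 2.5 intensity range onto gray levels 0 to 255; the mapping actually used to plot the pictures is not specified in the text, and `to_grayscale` and `v_max` are illustrative names.

```python
import numpy as np

def to_grayscale(sample, v_max=2.5):
    """Linearly map vibration intensities in [0, v_max] to 8-bit gray levels."""
    scaled = np.clip(sample / v_max, 0.0, 1.0)
    return np.round(scaled * 255).astype(np.uint8)

# Low-level noise (0.05) lands near black, an electric-hammer peak (2.2) near white.
gray = to_grayscale(np.array([[0.0, 0.05],
                              [2.2, 3.0]]))
print(gray)
```

Under this assumed mapping, an intensity of 0.05 becomes gray level 5 out of 255, which is consistent with the observation that weak environmental noise is hard to see in the grayscale picture.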

Fig. 7. Comparison of colored and grayscale picture.


Figure 8 presents the signal-to-noise ratio (SNR) of different vibration signals. It can be seen that: (1) The vibration made by manual excavation, including pickaxe, spade and hammer, varies periodically in amplitude, so these activities are hard to discriminate from each other. (2) The SNR values indicate that the vibration signals are much stronger than the environmental noise. (3) The peaks of the signals caused by the excavator, electric hammer and metro are more continuous.

Based on the above comparison, three conclusions can be drawn:

(1) The extracted two-dimensional matrices can well reveal the vibration near a certain length of pipe over a period of time.
(2) The characteristics of different activities in time or space, such as the vibration intensity peak, the activity duration and the number of channels affected by activities, provide indications to recognize the third-party threats under urban environmental vibration. Further, the combination of characteristics in time and space may give a more accurate description of the vibration activities.
(3) According to the severity of the third-party threats and the similarity of the signals, the excavation caused by pickaxes, spades and hammers is referred to as manual excavation, while the electric hammer excavation is referred to as mechanical excavation.

Fig. 8. The signal-to-noise ratio (SNR) of different activities.


It can also be seen from Fig. 6 that if the time-space optical signals are taken as grayscale images, an abnormal event can be treated as an object. The detection of third-party threats can then be regarded as searching for the activity regions in two-dimensional matrices of vibration intensity values. The signals due to third-party activities occupy only a very small proportion of the optical signals, so an object detection process that first determines the threat regions and then classifies them is very helpful for improving the recognition efficiency.

4. Method validation

Using the testing sets shown in Table 6 and following the procedure shown in Fig. 1, the recognition results are obtained and evaluated in terms of the detecting capability, the anti-noise ability and the recognition speed. For comparison, another method based on a CNN model from our previous work [23] is also applied to the same testing set. The following three indices are proposed for the quantitative comparison of the testing results.

(1) Recognition rate

The recognition rate (R), which represents the detection capability, is defined as:

(2)$$R = {S_\textrm{r}}/{S_\textrm{a}}$$

where S_r is the number of the construction activities detected by the tested model or method in the testing set; S_a is the total number of the construction activities in the testing set.

(2) Normalized false alarm count

The normalized false alarm count (F), which corresponds to the anti-noise ability, is defined as the number of false alarms generated per hour for 1 km of fiber:

(3)$$F = \frac{{{S_\textrm{f}}{N_{\textrm{unit}}}}}{{{N_\textrm{b}}}}$$

where S_f is the number of the false alarms for the testing set; N_unit is the number of vibration intensity values collected per hour for 1 km of fiber; N_b is the number of the vibration intensity values in the testing set:

(4)$${N_\textrm{b}} = {N_t} \times {N_c} \times {N_\textrm{s}}$$

in which N_t and N_c are the numbers of rows and columns of a time-space sample defined in Section 2.2, and N_s is the number of the samples in the testing set.

(3) Normalized recognition time

The normalized recognition time (A) is the recognition time of the vibration signals generated by 1 km of fiber in 1 hour, which can be calculated by:

(5)$$A = \frac{{T{N_{\textrm{unit}}}}}{{{N_\textrm{b}}}}$$

where T is the time required to recognize all the vibration intensity values in the testing set.
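The three indices follow directly from Eqs. (2) to (5) and can be computed as in the sketch below. The function names and all input counts are made up for illustration and are not results from the field tests; only N_unit is derived from the text, as 8 values per second × 3600 s × 100 channels per km (10 m per channel).

```python
def recognition_rate(s_r, s_a):
    """Eq. (2): detected activities over all activities in the testing set."""
    return s_r / s_a

def values_in_testing_set(n_t, n_c, n_s):
    """Eq. (4): rows x columns x number of samples."""
    return n_t * n_c * n_s

def normalized_false_alarms(s_f, n_unit, n_b):
    """Eq. (3): false alarms per hour per km of fiber."""
    return s_f * n_unit / n_b

def normalized_recognition_time(t_total, n_unit, n_b):
    """Eq. (5): recognition time for one hour of data from 1 km of fiber."""
    return t_total * n_unit / n_b

# Values per hour per km at 8 Hz with 10 m channels.
n_unit = 8 * 3600 * (1000 // 10)

# Hypothetical testing set: 1000 samples of 200 x 200 values.
n_b = values_in_testing_set(n_t=200, n_c=200, n_s=1000)
r = recognition_rate(s_r=95, s_a=100)
f = normalized_false_alarms(s_f=5, n_unit=n_unit, n_b=n_b)
a = normalized_recognition_time(t_total=50.0, n_unit=n_unit, n_b=n_b)
print(n_unit, n_b, r, f, a)
```

Because F and A are both scaled by N_unit / N_b, models evaluated on samples of different sizes can be compared on the same per-hour, per-kilometer basis.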

The three indices of the two models are compared to quantitatively evaluate their performance. As listed in Table 7, the overall recognition rates (R) of the model-1 and model-2 are 98.85% and 99.05%, respectively, indicating that both models can effectively detect third-party threats in vibration signals. For the manual excavation, the recognition rate of the model-2 reaches 99.73%, which is better than that of the model-1, because the smaller receptive field of the model-2 helps to detect the manual excavation threats with smaller vibration intensity values.

Table 8 and Table 9 are the confusion matrices of the model-1 and model-2. They indicate that the specific type of manual excavation is hard to identify, whereas the electric hammer (mechanical excavation) and the metro can be effectively discriminated. Therefore, the third-party threats are divided into manual and mechanical excavation.

Table 8. Confusion matrix of model-1


Table 9. Confusion matrix of model-2


In terms of the anti-noise ability, as shown in Table 10, the model-2 reduces the number of false alarms by 58.14% compared with the model-1. In terms of the recognition speed, the normalized recognition time listed in Table 11 demonstrates that the recognition speed of the Faster R-CNN model-1 is 134 times that of the model-2. A 30-second video (see Visualization 1) is attached as supplementary material to verify the feasibility of the method. The video covers one hour of testing data from a third-party intrusion simulation (testing time 3:00 pm to 4:00 pm, at the 1.96 km location, including manual excavation and mechanical excavation; each simulation lasts around 1 minute, and the interval between simulations is also around 1 minute). Apart from the vibration simulated at 1.96 km, the rest is environmental noise and metro-induced vibration (at 4.5 km). Since metro vibration is not a third-party threat, the proposed method does not send an alarm even when the object detection models detect it. As a result, the vibration at 4.5 km is not boxed in the video. The results indicate that the proposed method can recognize third-party intrusion with a low false alarm rate.

Table 10. Normalized false alarm count


Table 11. Normalized recognition time


The YOLOv3 model and the Fast R-CNN model are also tested on the two sample sets (sample set-1 for the model-1; sample set-2 for the model-2). The results, presented in Tables 10 and 11, indicate the following: (1) With respect to accuracy and computational efficiency, Faster R-CNN is better than Fast R-CNN. (2) The recognition rates and false alarm counts of Faster R-CNN and YOLOv3 are almost the same, but the computational efficiency of YOLOv3 is better than that of Faster R-CNN. (3) Theoretically, Faster R-CNN is a two-stage object detection algorithm. Compared with a one-stage object detection algorithm such as YOLOv3, a two-stage algorithm should have higher accuracy but lower computational efficiency. In the recognition of third-party threats, the recognition rate is more important than computational efficiency, so the Faster R-CNN is chosen. In this case, because the dataset is relatively small, the recognition results of Faster R-CNN are almost the same as those of YOLOv3. In future work, more data will be collected to improve the performance of the models.

The monitoring method combines the recognition results of model-1 and model-2, and a monitoring strategy is employed to further improve its performance. Its detection capability is similar to that of model-1 and model-2, with an average recognition rate of 98.85%. Owing to the collaboration of the two models and the added monitoring strategy, the index F, which measures the anti-noise capability of the monitoring method, falls to a very low level. For the same number of vibration signals, the monitoring method reduces the number of false alarms by 99.00% and 98.28% compared with model-1 and model-2, respectively. In addition, the recognition speed of the monitoring method is 2041.12% higher than that of model-2.
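The false alarm reductions quoted above follow from the normalized false alarm counts F in Table 10. The following is a minimal sketch of that arithmetic using the tabulated (rounded) Faster R-CNN values; the helper name `reduction_pct` is illustrative and does not come from the paper.

```python
# Normalized false alarm counts F, Faster R-CNN rows of Table 10.
F_MODEL_1 = 0.3198
F_MODEL_2 = 0.1856
F_COMBINED = 0.0032  # the proposed monitoring method


def reduction_pct(baseline: float, improved: float) -> float:
    """Relative reduction of `improved` with respect to `baseline`, in percent."""
    return (1.0 - improved / baseline) * 100.0


vs_model_1 = reduction_pct(F_MODEL_1, F_COMBINED)
vs_model_2 = reduction_pct(F_MODEL_2, F_COMBINED)

print(f"vs model-1: {vs_model_1:.2f}%")  # vs model-1: 99.00%
print(f"vs model-2: {vs_model_2:.2f}%")  # vs model-2: 98.28%
```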

Meanwhile, the normalized recognition time of the proposed combined method is much closer to that of model-1 than to that of model-2. A reasonable explanation is as follows. According to the logic of the combined method depicted in Fig. 1, model-2 precisely recognizes third-party intrusion after model-1 has coarsely detected a suspicious signal. In the raw vibration signals collected, however, third-party intrusion events last much less time than non-damaging vibration (see Table 4 for details). Therefore, the coarse detection executed by model-1 dominates the recognition process of the combined method, which explains why its recognition speed is much closer to that of model-1 than to that of model-2.
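The explanation above can be illustrated with a toy two-level cascade. Fig. 1 is not reproduced in this section, so the control flow below is an assumption based only on the description in the text: a coarse detector (standing in for model-1) screens every large sample, and a fine detector (standing in for model-2) runs only on the samples the coarse stage flags. All names (`monitor`, `coarse`, `fine`, the string labels) are hypothetical placeholders, not the authors' implementation.

```python
from typing import Callable, Iterable, List, Sequence, Set, Tuple

Sample = Sequence[str]  # toy stand-in: a large sample is a list of patch contents
THREATS: Set[str] = {"manual excavation", "mechanical excavation"}


def monitor(
    samples: Iterable[Sample],
    coarse: Callable[[Sample], bool],
    fine: Callable[[str], str],
) -> Tuple[List[int], int]:
    """Return the indices of alarmed samples and the number of fine-stage calls."""
    alarms: List[int] = []
    fine_calls = 0
    for index, sample in enumerate(samples):
        if not coarse(sample):
            continue  # quiet sample: the slower second stage is skipped
        for patch in sample:
            fine_calls += 1
            if fine(patch) in THREATS:
                alarms.append(index)
                break  # one confirmed threat is enough to alarm on this sample
    return alarms, fine_calls


def coarse(sample: Sample) -> bool:
    """Hypothetical model-1: flags any sample that is not pure noise."""
    return any(patch != "noise" for patch in sample)


def fine(patch: str) -> str:
    """Hypothetical model-2: returns the class of a small patch."""
    return patch


# Toy data: 98 quiet samples, one metro passage, one excavation event.
quiet = ["noise"] * 4
samples = [quiet] * 98 + [
    ["noise", "metro", "noise", "noise"],
    ["noise", "noise", "manual excavation", "noise"],
]

alarms, fine_calls = monitor(samples, coarse, fine)
print(alarms, fine_calls)  # [99] 7
```

In this toy run the coarse stage is executed 100 times while the fine stage is called only 7 times, and the metro sample raises no alarm. When suspicious samples are rare, the total cost of such a cascade is dominated by the coarse stage.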

In testing, the Faster R-CNN based monitoring method proposed in this paper shows several advantages over the CNN-based method [23]. The recognition rate for mechanical excavation is significantly improved owing to the large receptive field afforded by the large sample size. In addition, the proposed monitoring method reduces the number of false alarms by 88.89%. For the same number of vibration signals, the recognition time of the Faster R-CNN based method is only 0.41% of that of the CNN-based method. A comprehensive comparison is listed in Table 12, which indicates the feasibility and effectiveness of the proposed method.

Table 12. Comprehensive comparisons of the pipeline third-party intrusion identification based on CNN models

5. Conclusions

Using a φ-OTDR system for distributed fiber optic vibration measurements, a method based on the advanced Faster R-CNN model is proposed for monitoring and early warning of construction activities close to the pipeline in a variety of urban scenarios. The main conclusions are summarized as follows.

(1) The proposed method adopts the basic idea of object detection by treating the time-space optical signals as grayscale images and the construction activities as objects, which results in a larger receptive field and faster recognition. The two-level strategy also helps to achieve recognition results that are both efficient and accurate.
(2) Field tests have been conducted on a 5.25 km-long optical fiber cable laid alongside a gas pipeline that crosses the most common scenarios in a city. The measurements show that the signals due to third-party activities occupy only a very small proportion of the optical signals under various urban ambient noise, which confirms the feasibility of the object detection process for recognizing construction activities.
(3) The superiority of the monitoring strategy combining two models has been validated on the field-test datasets. The 86 hours of vibration signals over the 5.25 km distance are recognized within 6.6 minutes, with a recognition rate of 98.85% for construction activities. The proposed method reduces false alarms by around 99% and enhances the recognition speed by 2041.12% compared with the small-scale model-2.
(4) Compared with the CNN-based method, the proposed Faster R-CNN based monitoring method reduces false alarms by 88.89%. In addition, the recognition speed is significantly improved: using the same testing datasets, the recognition time of the Faster R-CNN based method is only 0.82% of that of the CNN-based method.
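Conclusion (1) rests on treating a block of time-space signal as a grayscale image. The preprocessing actually used is not described in this section, so the sketch below is only one plausible way to do it: min-max scaling of a 2-D block (rows as time steps, columns as positions along the fiber) to 8-bit gray levels. The function name `to_grayscale` and the choice of min-max scaling are assumptions; a real pipeline would typically use an array library rather than nested lists.

```python
from typing import List, Sequence


def to_grayscale(block: Sequence[Sequence[float]]) -> List[List[int]]:
    """Min-max scale a 2-D time-space block to 8-bit gray levels (0..255)."""
    lo = min(min(row) for row in block)
    hi = max(max(row) for row in block)
    span = hi - lo
    if span == 0:
        # A perfectly flat block carries no contrast: map it to black.
        return [[0 for _ in row] for row in block]
    return [[round(255 * (value - lo) / span) for value in row] for row in block]


# Toy block: 2 time steps x 3 fiber positions (arbitrary amplitudes).
block = [
    [0.0, 1.0, 4.0],
    [3.0, 0.0, 1.0],
]
image = to_grayscale(block)
print(image)  # [[0, 64, 255], [191, 0, 64]]
```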

Funding

National Natural Science Foundation of China (52378525); State Key Laboratory for Disaster Reduction in Civil Engineering (SLDRCE19-B-25).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. Z. Liu and S. Li, “Pipeline incident statistics from 2019 to 2021,” (Tongji University, 2021).

2. Y. Zhu, Z. Lei, W. Zheng, et al., “Research on substation perimeter isolation based on phased array radar and multi-video fusion technology,” in Journal of Physics: Conference Series (IOP Publishing, 2019), p. 022054.

3. A. Khaloo, D. Lattanzi, K. Cunningham, et al., “Unmanned aerial vehicle inspection of the Placer River Trail Bridge through image-based 3D modelling,” Struct. Infrastruct. Eng. 14(1), 124–136 (2018).

4. M. Aliyari, B. Ashrafi, and Y. Z. Ayele, “Hazards identification and risk assessment for UAV–assisted bridge inspections,” Struct. Infrastruct. Eng. 18(3), 412–428 (2022).

5. Z. Liu and S. Li, “A sound monitoring system for prevention of underground pipeline damage caused by construction,” Autom. Constr. 113, 103125 (2020).

6. R. Xiao, P. Joseph, and J. Li, “The leak noise spectrum in gas pipeline systems: Theoretical and experimental investigation,” J. Sound Vibr. 488, 115646 (2020).

7. F. Peng, H. Wu, X.-H. Jia, et al., “Ultra-long high-sensitivity Φ-OTDR for high spatial resolution intrusion detection of pipelines,” Opt. Express 22(11), 13804–13810 (2014).

8. H. Wu, S. Xiao, X. Li, et al., “Separation and determination of the disturbing signals in phase-sensitive optical time domain reflectometry (Φ-OTDR),” J. Lightwave Technol. 33(15), 3156–3162 (2015).

9. J. Tejedor, J. Macias-Guarasa, H. F. Martins, et al., “A novel fiber optic based surveillance system for prevention of pipeline integrity threats,” Sensors 17(2), 355 (2017).

10. J. Tejedor, J. Macias-Guarasa, H. F. Martins, et al., “Real field deployment of a smart fiber-optic surveillance system for pipeline integrity threat detection: Architectural issues and blind field test results,” J. Lightwave Technol. 36(4), 1052–1062 (2018).

11. Q. Sun, H. Feng, X. Yan, et al., “Recognition of a phase-sensitivity OTDR sensing system based on morphologic feature extraction,” Sensors 15(7), 15179–15197 (2015).

12. N. Wang, N. Fang, and L. Wang, “Intrusion recognition method based on echo state network for optical fiber perimeter security systems,” Opt. Commun. 451, 301–306 (2019).

13. S. El-Zahab, E. M. Abdelkader, and T. Zayed, “An accelerometer-based leak detection system,” Mech. Syst. Sig. Process. 108, 276–291 (2018).

14. K. Wang, Z. Liu, X. Qian, et al., “Dynamic characteristics and damage recognition of blast-induced ground vibration for natural gas transmission pipeline and its integrated systems,” Mech. Syst. Sig. Process. 136, 106472 (2020).

15. R. Zinsou, X. Liu, Y. Wang, et al., “Recent progress in the performance enhancement of phase-sensitive OTDR vibration sensing systems,” Sensors 19(7), 1709 (2019).

16. P. Lu, N. Lalam, M. Badar, et al., “Distributed optical fiber sensing: Review and perspective,” Appl. Phys. Rev. 6(4), 041302 (2019).

17. H. F. Taylor and C. E. Lee, “Apparatus and method for fiber optic intrusion sensing,” (Google Patents, 1993).

18. K. N. Choi and H. F. Taylor, “Spectrally stable Er-fiber laser for application in phase-sensitive optical time-domain reflectometry,” IEEE Photon. Technol. Lett. 15(3), 386–388 (2003).

19. J. C. Juarez, E. W. Maier, K. N. Choi, et al., “Distributed fiber-optic intrusion sensor system,” J. Lightwave Technol. 23(6), 2081–2087 (2005).

20. T. Zhu, Q. He, X. Xiao, et al., “Modulated pulses based distributed vibration sensing with high frequency response and spatial resolution,” Opt. Express 21(3), 2953–2963 (2013).

21. Z. Pan, K. Liang, Q. Ye, et al., “Phase-sensitive OTDR system based on digital coherent detection,” in 2011 Asia Communications and Photonics Conference and Exhibition (ACP) (IEEE, 2011), pp. 1–6.

22. Q. Sun, Q. Li, L. Chen, et al., “Pattern recognition based on pulse scanning imaging and convolutional neural network for vibrational events in Φ-OTDR,” Optik 219, 165205 (2020).

23. S. Li, R. Peng, and Z. Liu, “A surveillance system for urban buried pipeline subject to third-party threats based on fiber optic sensing and convolutional neural network,” Struct. Health Monit. 20(4), 1704–1715 (2021).

24. Z. Zhang, P. Cui, and W. Zhu, “Deep learning on graphs: A survey,” IEEE Trans. Knowl. Data Eng. 34(1), 249–270 (2022).

25. Z.-Q. Zhao, P. Zheng, S.-t. Xu, et al., “Object detection with deep learning: A review,” IEEE Trans. Neural Networks Learn. Syst. 30(11), 3212–3232 (2019).

26. Z. Sha, H. Feng, X. Rui, et al., “PIG Tracking utilizing fiber optic distributed vibration sensor and YOLO,” J. Lightwave Technol. 39(13), 4535–4541 (2021).

27. S. Ren, K. He, R. Girshick, et al., “Faster R-CNN: Towards real-time object detection with region proposal networks,” arXiv, arXiv:1506.01497 (2015).

28. K. He, X. Zhang, S. Ren, et al., “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016), pp. 770–778.

29. R. Girshick, “Fast R-CNN,” in Proceedings of the IEEE International Conference on Computer Vision (2015), pp. 1440–1448.

30. H. Wu, J. Chen, X. Liu, et al., “One-dimensional CNN-based intelligent recognition of vibrations in pipeline monitoring with DAS,” J. Lightwave Technol. 37(17), 4359–4366 (2019).

31. J. Chen, H. Wu, X. Liu, et al., “A real-time distributed deep learning approach for intelligent event recognition in long distance pipeline monitoring with DOFS,” in 2018 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC) (IEEE, 2018), pp. 290–2906.

32. H. Zhang, J. Gao, and B. Hong, “Φ-OTDR signal identification method based on multimodal fusion,” Sensors 22(22), 8795 (2022).

33. Y. Shi, Y. Li, Y. Zhang, et al., “An easy access method for event recognition of Φ-OTDR sensing system based on transfer learning,” J. Lightwave Technol. 39(13), 4548–4555 (2021).

34. Z. Li, Z. Wang, J. Li, et al., “Technology research of φ-OTDR fiber perimeter intrusion system based on CNN-LSTM algorithm,” in Third International Conference on Advanced Algorithms and Neural Networks (AANN 2023) (SPIE, 2023), pp. 98–105.

35. Y. Yang, H. Zhang, and Y. Li, “Pipeline safety early warning by multifeature-fusion CNN and LightGBM analysis of signals from distributed optical fiber sensors,” IEEE Trans. Instrum. Meas. 70, 1–13 (2021).

Model	Parts	Structure	Inputs	Outputs
Faster R-CNN Model-1 & Faster R-CNN Model-2	Feature extraction backbone	ResNet-50	1600 × 800 × 1 grayscale picture	100 × 50 × 1024 feature maps
	Region Proposal Network	Softmax Layer	100 × 50 × 1024 feature maps	100 × 50 × 9 × 2 positive and negative probabilities of anchor boxes
		Bounding Box Regression	100 × 50 × 9 anchor boxes	100 × 50 × 9 × 4 coordinates of anchor boxes
		Proposal Layer	Outputs of Softmax Layer and Bounding Box Regression	600 anchors with high positive probability
	Classification RoI Pooling	CNN + Softmax	100 × 50 × 1024 feature maps and high positive probability anchors	Boxed regions of third-party threat in time-space samples
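The sizes in the table above are mutually consistent, which can be checked with a few lines of arithmetic. The sketch below only restates the tabulated numbers; the variable names are illustrative.

```python
INPUT_W, INPUT_H = 1600, 800    # grayscale input size from the table
FEATURE_W, FEATURE_H = 100, 50  # backbone output size from the table
ANCHORS_PER_CELL = 9            # anchor boxes per feature-map location
KEPT_PROPOSALS = 600            # anchors kept by the proposal layer

# Downsampling factor of the backbone along each axis.
stride_w = INPUT_W // FEATURE_W
stride_h = INPUT_H // FEATURE_H

# Every feature-map location carries 9 anchors, each with
# 2 class scores and 4 box coordinates.
total_anchors = FEATURE_W * FEATURE_H * ANCHORS_PER_CELL
score_outputs = total_anchors * 2
box_outputs = total_anchors * 4

print(stride_w, stride_h)                       # 16 16
print(total_anchors)                            # 45000
print(score_outputs, box_outputs)               # 90000 180000
print(f"{KEPT_PROPOSALS / total_anchors:.2%}")  # 1.33%
```

Only about 1.3% of the candidate anchors are passed on to the classification stage.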

Model	Hyperparameter	Value
Faster R-CNN Model-1 & Faster R-CNN Model-2	Learning rate	0.0001
	Batch size	16
	Epoch	150
	Optimizer	Adam
	Anchor box confidence	0.5

Location Number	Distance to φ-OTDR devices (km)	Simulated Vibration Types
1	1.26	Pickaxe, Spade, Hammer, Electrical Hammer
2	1.96	Pickaxe, Spade, Hammer, Electrical Hammer
3	2.05	Spade, Electrical Hammer
4	2.89	Pickaxe, Spade, Hammer, Electrical Hammer
5	3.3	Pickaxe, Spade, Hammer, Electrical Hammer
6	3.7	Pickaxe, Spade, Hammer, Electrical Hammer

Third-party threat	Transverse distance between vibration and pipeline (m)	Number of records	Total duration of records (s)
Pickaxe	0-15	20	662
Spade	0-15	57	2103
Hammer	0-15	26	761
Electric hammer	0-15	37	2861
No threat	N/A	N/A	306023
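The durations in the table above show how small a share of the recorded signal the simulated threats occupy. A minimal sketch of that arithmetic follows; it assumes the listed records do not overlap in time, which this section does not state explicitly.

```python
# Total record durations in seconds, copied from the table above.
threat_durations_s = {
    "Pickaxe": 662,
    "Spade": 2103,
    "Hammer": 761,
    "Electric hammer": 2861,
}
no_threat_s = 306023

threat_s = sum(threat_durations_s.values())
total_s = threat_s + no_threat_s

print(threat_s)                     # 6387
print(f"{total_s / 3600:.1f} h")    # 86.8 h
print(f"{threat_s / total_s:.2%}")  # 2.04%
```

Under that assumption, threat activity accounts for roughly 2% of about 86.8 hours of recordings.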

Activity	Sample set 1		Sample set 2
Activity	Training set 1	Testing set 1	Training set 2	Testing set 2
Mechanical excavation	28	42	206	375
Manual excavation	43	58	126	363
Metro	0	377	123	64028
Environmental vibration	0	—	0	—
Overall	71	477	455	64766

Optics Express

Tested model/method	Proposed deep learning model	Manual excavation	Mechanical excavation	Overall
Model-1	Fast R-CNN	90.62%	90.15%	90.28%
	Faster R-CNN	98.48%	100.00%	98.85%
	YOLOv3	99.24%	98.96%	99.17%
Model-2	Fast R-CNN	89.60%	88.89%	89.30%
	Faster R-CNN	99.73%	98.35%	99.05%
	YOLOv3	99.47%	98.90%	99.19%
The proposed method	Faster R-CNN	98.48%	100.00%	98.85%
CNN-based method [23]	CNN	98.61%	93.75%	97.12%

	Electrical Hammer	Pickaxe	Hammer	Shovel
Electrical Hammer	92	0	0	0
Pickaxe	0	42	31	15
Hammer	3	7	56	22
Shovel	1	10	29	48
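Summary rates can be read off the confusion matrix above with a short calculation. The overall rate (diagonal over total) does not depend on the matrix orientation. The per-class rates below assume that each row holds the samples of one true class, which is not stated in this section, so they should be read with that caveat.

```python
labels = ["Electrical Hammer", "Pickaxe", "Hammer", "Shovel"]
matrix = [
    [92, 0, 0, 0],
    [0, 42, 31, 15],
    [3, 7, 56, 22],
    [1, 10, 29, 48],
]

total = sum(sum(row) for row in matrix)
correct = sum(matrix[i][i] for i in range(len(labels)))
overall = correct / total

# Assumption: rows are true classes, columns are predicted classes.
per_class = {
    name: matrix[i][i] / sum(matrix[i]) for i, name in enumerate(labels)
}

print(f"overall: {overall:.2%}")  # overall: 66.85%
for name, rate in per_class.items():
    print(f"{name}: {rate:.2%}")
# Electrical Hammer: 100.00%
# Pickaxe: 47.73%
# Hammer: 63.64%
# Shovel: 54.55%
```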

Tested model/method	Proposed deep learning model	Number of Data Points in Datasets N_b (N_t × N_c × N_s)	Total Number of False Alarm Times S_f	Normalized False Alarm Count F
Model-1	Fast R-CNN	1.81e9 (9600 × 500 × 377)	286	0.4550
	Faster R-CNN	1.81e9 (9600 × 500 × 377)	201	0.3198
	YOLOv3	1.81e9 (9600 × 500 × 377)	51	0.0811
Model-2	Fast R-CNN	2.56e9 (200 × 200 × 64028)	274	0.3082
	Faster R-CNN	2.56e9 (200 × 200 × 64028)	165	0.1856
	YOLOv3	2.56e9 (200 × 200 × 64028)	45	0.0506
The proposed method	Faster R-CNN	1.81e9 (9600 × 500 × 377)	2	0.0032
CNN-based method [23]	CNN	1.81e9 (40 × 3 × 1.51e7)	18	0.0286

Tested model/method	Proposed deep learning model	Number of Data Points in Datasets N_b (N_t × N_c × N_s)	Required Recognition Time of All Samples in Dataset T (s)	Normalized Recognition Time A (s)
Model-1	Fast R-CNN	2.29e9 (9600 × 500 × 477)	1431	1.79
	Faster R-CNN	2.29e9 (9600 × 500 × 477)	64	0.08
	YOLOv3	2.29e9 (9600 × 500 × 477)	51	0.06
Model-2	Fast R-CNN	2.56e9 (200 × 200 × 64028)	192084	216.07
	Faster R-CNN	2.56e9 (200 × 200 × 64028)	9521	10.71
	YOLOv3	2.56e9 (200 × 200 × 64028)	7363	8.28
The proposed method	Faster R-CNN	2.29e9 (9600 × 500 × 477)	395	0.50
CNN-based method [23]	CNN	2.29e9 (40 × 3 × 1.91e7)	48351	122.41
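The speed ratio between the two Faster R-CNN models can be checked from the normalized recognition times A in the table above. The sketch uses the tabulated (rounded) values, so the result is approximate.

```python
# Normalized recognition time A (s), Faster R-CNN rows of the table above.
A_MODEL_1 = 0.08
A_MODEL_2 = 10.71

# How many times longer model-2 takes than model-1 on the same amount of data.
speed_ratio = A_MODEL_2 / A_MODEL_1
print(round(speed_ratio))  # 134
```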

Classification Methods	Fiber Length (km)	Spatial Resolution (m)	Size of Each Sample	Accuracy (%)	Recognition Time of a Sample (s)	Average Recognition Time for Every Data Point in a Sample (s)
1D-CNN [30]	35	10	508 × 1	98.19%	0.0027	5.3E-6
2D-CNN [30,31]	35	10	122 × 9	91.57%	0.0082	7.4E-6
CNN and Transformer [32]	20	—	90 × 90 × 3, 2048 × 1	98.54%	—	—
2D-CNN [33]	—	—	227 × 227 × 3	96.16%	—	—
CNN-LSTM [34]	2	30	—	96.62%	—	—
2D-CNN + Light GBM [35]	48	20	100 × 7	96.16%	—	—
2D-CNN [23]	5.25	10	40 × 3	98%	0.005	4.3E-5
The method in this work	5.25	10	9600 × 500	98.85%	0.83	1.7E-7
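The last column of the table above is the per-sample recognition time divided by the number of data points in a sample. A minimal sketch for two of the rows follows; the helper name `time_per_point` is illustrative.

```python
from math import prod
from typing import Tuple


def time_per_point(sample_time_s: float, sample_shape: Tuple[int, ...]) -> float:
    """Average recognition time per data point of one sample, in seconds."""
    return sample_time_s / prod(sample_shape)


this_work = time_per_point(0.83, (9600, 500))  # the method in this work
cnn_1d = time_per_point(0.0027, (508, 1))      # 1D-CNN [30]

print(f"{this_work:.1e}")  # 1.7e-07
print(f"{cnn_1d:.1e}")     # 5.3e-06
```

Although one 9600 × 500 sample takes far longer to process than one 508 × 1 sample, the cost per data point is more than an order of magnitude lower.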