
Footsteps detection and identification based on distributed optical fiber sensor and double-YOLO model


Abstract

A footstep detection and recognition method based on a distributed optical fiber sensor and a double-YOLO model is proposed. The sound of footsteps is detected by phase-sensitive optical time-domain reflectometry (Φ-OTDR), and the footsteps are located and identified by the double-YOLO method. The Φ-OTDR can cover a much larger sensing range than traditional sensors. Based on the stride and step frequency of the gait, the double-YOLO method can determine the walker's ID. Preliminary field experiments show that this method can detect, locate and identify the footsteps of three persons, achieving about 86.0% identification accuracy, a 12.6% improvement over the single-YOLO method. This footstep detection and recognition method may promote the development of gait-based clinical diagnosis and person identification applications.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Footstep signal recognition is considered a biometric technology and has wide applications such as personal identification [1,2], medical assistance [3] and robot research and development [4,5]. The existing footstep recognition methods are various and have their own advantages and disadvantages. Some researchers detect footsteps through pressure measurement. M. Wagner [3] proposed a knowledge-assisted gait analysis system to help clinicians base clinical decisions on objective data; the gait was detected by two force plates. J. A. Cantoral-Ceballos [6,7] used optical fiber to form an intelligent carpet to detect the gait and footprint of a person, but it covers only a short sensing range. K. Koho [8] proposed a footstep pattern matching method that uses a pressure-sensitive floor to collect the pressure signal and segmental semi-Markov models to detect footsteps from the floor data. P. Bours [9] used the sound of walking to recognize the person and achieved nearly 80% recognition accuracy. Some researchers detect the gait through vision. G. Batchuluun [10] proposed a gait-based human identification method that recognizes a person's gait in video and achieved over 97% recognition accuracy. Nowadays, many researchers use wearable sensors to quantitatively analyze the gait for clinical diagnosis and treatment procedures [11,12]. However, most of the existing gait-based identification sensors are single-point sensors, which can only cover a small sensing range (several meters). If the footsteps are to be detected and the walker's ID identified over a long range, these sensors are not suitable. For example, in perimeter security for a plaza, an airport or a prison, some large areas are accessible only to specific authorized persons, and if intruded, the number of intruders needs to be known. In medical applications, monitoring a patient's movement over a long range (from tens of meters up to a kilometer) to evaluate motor ability requires both a long sensing range and identification ability.

The distributed optical fiber sensor (DOFS) based on phase-sensitive optical time-domain reflectometry (Φ-OTDR) can easily cover a large sensing range and detect vibrations along the sensing fiber [13]. It is often used for long-range perimeter security. However, the signal identification ability of DOFS is poor [14]. Currently, there is no DOFS that can identify an individual's ID: when a walking signal is detected, additional staff are required to confirm the identity and number of walkers. Inspired by gait analysis, we make use of the personal information hidden in the gait pattern. In this paper, we propose a gait detection and identification method based on DOFS with a deep learning detection model to detect the sound of walking. Compared to the other existing gait detection methods, the proposed method can easily cover a large sensing range, such as several kilometers, and then detect the footsteps and identify the walker's identity (ID). To the best of our knowledge, this is the first time that a DOFS can identify a walker's ID. The sensing fiber can be embedded in a carpet or simply buried underground, making the method suitable for both indoor and outdoor gait detection. By analyzing the gait information within the walking sound, the sensing system can recognize the walker's ID. The You Only Look Once (YOLO) network is an object detection model that can locate and identify objects simultaneously [15–18]. In order to locate and analyze the walking sound more accurately, a double-YOLO model is applied. The preliminary result shows that this method reaches about 86.0% recognition accuracy when detecting the footsteps of three individuals. The added walker ID identification ability may further promote the application of DOFS in perimeter security, as it avoids the manual confirmation otherwise required when footsteps are detected. Owing to the embedded nature of optical fibers, the system can also be used for unobtrusive monitoring in a large building instead of video cameras. Besides, remote gait monitoring may enable objective evaluation and tracking of motor impairment in the medical area, and the distributed sensing characteristics allow simultaneous use by multiple persons.

2. Detection and recognition methodology based on optical fiber sensor and double-YOLO method

2.1 Main structure of Φ-OTDR sensing system

Φ-OTDR is a distributed optical fiber sensing technology. By injecting a highly coherent probe light pulse into the sensing fiber, the Rayleigh backscattered light (RBL) carries information about stress changes along the sensing fiber, and its components interfere with each other. An external vibration changes the optical phase in the sensing fiber and thus the light intensity of the RBL. By analyzing the RBL, the external vibration can be recovered. For completeness, the structure of the Φ-OTDR and the gait detection setup are shown in Fig. 1.

Fig. 1. The distributed optical fiber sensing system and the gait detection.

In a relatively quiet outdoor scene, the optical fiber is buried below the soil and the target persons walk near the sensing fiber. The probe light is generated by an ultra-narrow-linewidth laser (NLL), modulated into pulses by an acousto-optic modulator (AOM), amplified by an erbium-doped fiber amplifier (EDFA), and injected into the sensing optical fiber. The Rayleigh backscattered trace (RBT) from the sensing fiber is directly detected by a photodetector (PD), acquired by a data acquisition card (DAC) and then processed by a personal computer (PC). The RBTs collected within a certain time are assembled into a space-time matrix: the horizontal direction of the matrix represents the position along the sensing fiber and the vertical direction represents the pulse repetition sequence. The collected space-time matrix is used to locate and analyze the footsteps.
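For illustration, the following minimal sketch (our reconstruction, not the authors' code) shows how consecutive RBTs could be stacked into such a space-time matrix; the 50 MHz sampling rate is taken from Section 3.1 and the ~2×10⁸ m/s group velocity is a standard single-mode fiber value.

```python
import numpy as np

FS = 50e6        # DAC sampling frequency (Hz), from Section 3.1
V_FIBER = 2e8    # approximate light group velocity in fiber (m/s)

def traces_to_matrix(traces):
    """Stack consecutive Rayleigh backscattered traces into a
    space-time matrix: rows follow the pulse repetition sequence
    (time), columns follow the position along the fiber (space)."""
    matrix = np.vstack(traces)
    # each sample spans v / (2 * FS) metres of fiber; the factor 2
    # accounts for the round trip of the backscattered light
    metres_per_column = V_FIBER / (2 * FS)   # = 2 m here
    return matrix, metres_per_column
```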

2.2 Footstep detection based on the double-YOLO method

Object recognition and object detection are two main tasks in machine vision. An object recognition task only identifies the class of an object but does not determine its position and outline (its center and bounding box). An object detection task determines the position, outline and class of the object. Typical neural networks for object recognition include VGG, AlexNet and ResNet.

Classic object detection networks, such as RCNN, Fast-RCNN and Faster-RCNN, are based on region-of-interest (ROI) algorithms, which first find the regions of interest and then identify the object classes. The ROI extraction process walks through the whole scale of the input image, which takes a lot of time and makes it difficult for these networks to run in real time. The YOLO model, first proposed by Joseph Redmon et al., solves the real-time issue [19–21]: it locates and labels objects in a single round of processing. YOLO-v5 is the fifth version of YOLO, offering better detection performance and faster inference. The structure of YOLO-v5, the baseline model of our method, is shown in Fig. 2. The YOLO-v5 model is composed of a backbone part, a neck part and a prediction part. The backbone part consists of the Focus, CBS, CSP and SPP modules, whose details are also shown in Fig. 2; it slices the input image into four subsampled parts and extracts the image features. The neck part mainly consists of CSP modules; the CSP module fuses features obtained at different scales and avoids network degradation by adding Res-units. The prediction part outputs the predicted boxes, predicted object classes and confidence probabilities.
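As a concrete starting point, a pre-trained YOLO-v5 baseline can be obtained from the public ultralytics/yolov5 repository (our assumption; the paper does not state which implementation was used):

```python
import torch

# load a small pre-trained YOLO-v5 model via torch.hub
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
model.conf = 0.5  # boxes above this confidence count as positive outputs

# hypothetical usage on one pre-processed grayscale image:
# results = model('footsteps_608x608.png')
# results.xyxy[0] -> [x1, y1, x2, y2, confidence, class] per detection
```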

Fig. 2. The structure of YOLO-v5. (Conv: convolution layer; BN: batch normalization; SiLU: sigmoid linear unit; Concat: concatenate.)

As the RCNN-series networks are slow and cannot meet the potential real-time need, we chose the YOLO-series networks for our research. However, because the detailed footstep features of different walkers are very similar, the YOLO-v5 network can locate the footsteps but performs poorly at walker ID identification. According to gait analysis research [8,9,22], the stride and step frequency are dynamic features of a person's gait and can be used to identify the walker. Thus, a double-YOLO method, consisting of a YOLO-1 model and a YOLO-2 model, is proposed to identify the walkers and locate their corresponding footsteps. The basic workflow is shown in Fig. 3 and Fig. 4 and mainly contains four stages: signal acquisition and data pre-processing, training set preparation, training of the two models, and offline testing with result fusion.

Fig. 3. Construction of the proposed footstep detection method.

Fig. 4. The test and result fusion process.

The YOLO-1 model is trained to identify the walker by stride and step frequency and outputs a large predicted box containing a series of footsteps. The YOLO-2 model is trained to accurately locate each single footstep. The final results, including both the walker's ID and the footstep locations, are acquired by fusing the two model outputs: if a single-footstep box predicted by YOLO-2 lies within a large box predicted by YOLO-1, that footstep box and the corresponding walker's ID are added to the final output. In the test process, the data image is input into YOLO-1 and YOLO-2 simultaneously, and the walker's ID and the location of each single footstep are determined, as shown in Fig. 4.
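The test-time flow can be summarized by the following sketch (helper names are our own; the `fuse` criterion is the IOUDY rule detailed in Section 3.3):

```python
def double_yolo_detect(image, yolo1, yolo2, fuse):
    """Run both models on the same grayscale image and fuse the boxes."""
    id_boxes = yolo1(image)    # large boxes: footstep series + walker ID
    step_boxes = yolo2(image)  # small boxes: individual footsteps
    fused = []
    for step in step_boxes:
        for series in id_boxes:
            if fuse(step.box, series.box):   # step lies in the series box
                fused.append((step.box, series.walker_id))
                break
    return fused   # footstep locations, each tagged with a walker ID
```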

In order to train the two YOLO-v5 models toward different goals and to test the proposed method, three labeling methods are introduced. Labeling Method-1: a series of continuous footsteps is taken as one data sample and the label is the walker's ID. Labeling Method-2: a single footstep is taken as one data sample, and the label is simply "footstep", used to distinguish it from the background. Labeling Method-3: a single footstep is taken as one data sample, and the label is the walker's ID. Note that each grayscale image contains several footsteps, so one grayscale image can be labeled by all three labeling methods simultaneously; a sketch of the resulting annotations is given below.
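Assuming the standard YOLO txt annotation format (`class cx cy w h`, coordinates normalized to the image size), the three labelings of one image might look as follows; all numbers are illustrative:

```python
# Labeling Method-1: one large box around a footstep series,
# class index = walker ID (0 for No.I here)
method1 = ["0 0.45 0.50 0.70 0.90"]

# Labeling Method-2: one small box per footstep, single class 'footstep'
method2 = ["0 0.20 0.48 0.06 0.12",
           "0 0.33 0.52 0.06 0.12",
           "0 0.46 0.49 0.06 0.12"]

# Labeling Method-3: the same small boxes, class index = walker ID
method3 = ["0 0.20 0.48 0.06 0.12",
           "0 0.33 0.52 0.06 0.12",
           "0 0.46 0.49 0.06 0.12"]
```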

The grayscale images used for training are labeled by both Labeling Method-1 and Labeling Method-2, yielding the corresponding Training Set-1 and Training Set-2. Training Set-1 is used to train the YOLO-1 model and Training Set-2 is used to train the YOLO-2 model. The grayscale images used for testing are labeled by Labeling Method-3; both the walker's ID and the location of each single footstep are evaluated to assess the footstep detection performance. Figure 5 shows a typical grayscale image with the different labeling methods.

Fig. 5. Typical data samples and their labels under Labeling Method-1 (a), Labeling Method-2 (b), and Labeling Method-3 (c).

3. Footstep collection and detection experiment

3.1 Data collection and pre-processing

The setup of the Φ-OTDR system used in the experiment is shown in Fig. 1. The linewidth of the NLL is 3 kHz. The probe pulse width is 200 ns and the repetition rate is 10 kHz. The sampling frequency of the DAC is 50 MHz. The sensing optical fiber is a kilometer-long G652 single-mode fiber with steel wire reinforcement and polyvinyl chloride cladding, buried 5 cm below the ground surface. The buried sensing fiber passes through brick floor, grassland, clay and sand to simulate different environments, and the data collected from the different environments are mixed together. Three persons, denoted No.I (female, 160 cm, 60 kg), No.II (male, 175 cm, 75 kg), and No.III (male, 175 cm, 65 kg), walked along the sensing fiber on different days and the RBTs were recorded. Trials with one person walking alone and with two persons walking toward each other were both conducted. Note that the walkers walked along the sensing fiber within a 5 m radial distance, which increases the data diversity.

The RBTs collected from 110 m to 340 m and within 20 s are extracted and transformed into a time-space matrix. The column direction of the matrix stands for the temporal domain and the row direction for the spatial domain. The step signal is compared with the background noise in the frequency domain, as shown in Fig. 6, which shows that the detected step signal lies essentially below 1 kHz. Therefore, the direct component (DC) in each column is removed by a 25 Hz–2.5 kHz bandpass filter; the details of each footstep signal are kept in this frequency band. Typical single-step waveforms of the different persons are shown in Fig. 7. Then, the time-space matrices are transformed into grayscale images. YOLO-v5 supports any input size that is a multiple of 32. A very large input image reduces the detection ability for small objects, increases the training time and requires a large computing buffer, whereas a very small input image may reduce the detection ability due to information loss in image compression. Usually, the input size of a YOLO model is several hundred pixels [21–23]. In order to find a feasible input size for footstep detection, a preliminary test was conducted: 284 grayscale images were labeled by Labeling Method-3 and divided into training, validation and test sets in an 8:1:1 ratio, and a YOLO-v5 model pre-trained on ImageNet was retrained with different input sizes. The classification accuracy of YOLO-v5 on the validation set with respect to the (square) input size is shown in Fig. 8, which shows that all input sizes of 608 pixels and larger are feasible choices for footstep detection. Note that only a feasible input size is needed here. Taking the computer hardware (NVIDIA GeForce RTX 2080 Ti with 4352 CUDA cores and 11 GB memory) into consideration, 608 × 608 was chosen as the grayscale image size.
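A minimal pre-processing sketch under these settings (our reconstruction; the filter order and libraries are assumptions) could be:

```python
import numpy as np
import cv2
from scipy.signal import butter, sosfiltfilt

def preprocess(matrix, fs=10e3):
    """matrix: (time, space) time-space matrix; fs: pulse repetition
    rate in Hz, which sets the temporal sampling of each column."""
    # 25 Hz - 2.5 kHz band-pass removes the DC component per column
    sos = butter(4, [25, 2500], btype='bandpass', fs=fs, output='sos')
    filtered = sosfiltfilt(sos, matrix, axis=0)
    # normalize to an 8-bit grayscale image
    g = filtered - filtered.min()
    g = (255 * g / g.max()).astype(np.uint8)
    # YOLO-v5 accepts multiples of 32; 608 x 608 is chosen in the paper
    return cv2.resize(g, (608, 608))
```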

Fig. 6. Frequency comparison between step signal and background noise.

Fig. 7. Typical footsteps within 10 s: No.I (a), No.II (b), and No.III (c). Each red dotted box marks one footstep.

Fig. 8. The effect of input size.


3.2 Data set preparation

All of the grayscale images are divided into training, validation and test sets in an 8:1:1 ratio; the details are shown in Table 1. In the training process, the grayscale images in the training set are labeled by Labeling Method-1, Labeling Method-2 and Labeling Method-3, forming Training Set-1, Training Set-2 and Training Set-3, respectively. Because Labeling Method-1 treats a series of footsteps as one target sample, which reduces the number of target samples, an image augmentation that overlaps two adjacent grayscale images by 50% is applied to Training Set-1, as sketched below. Training Set-1 is used to train YOLO-1, Training Set-2 to train YOLO-2, and Training Set-3 to train a single-YOLO model for comparison. The grayscale images in the validation set are also labeled by the corresponding labeling method for each model's validation. The grayscale images in the test set are labeled by Labeling Method-3, forming the final test set, which is used for both the single-YOLO and double-YOLO methods.
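The 50%-overlap augmentation might be realized as follows (a sketch, assuming adjacent images are consecutive in time along the vertical axis):

```python
import numpy as np

def overlap_augment(images):
    """images: list of (H, W) grayscale arrays in temporal order.
    Returns the originals plus one extra image per adjacent pair,
    spanning the second half of one image and the first half of
    the next (50% overlap with each)."""
    extra = []
    for a, b in zip(images[:-1], images[1:]):
        h = a.shape[0]
        extra.append(np.vstack([a[h // 2:, :], b[:h // 2, :]]))
    return images + extra
```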

Table 1. The Information of Each Dataset

3.3 Judgment and combination of the two YOLO model outputs

Accuracy, precision, recall and F1-score are four common evaluation indicators for classification tasks. Their definitions are as follows,

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$$
$$Precision = \frac{TP}{TP + FP}$$
$$Recall = \frac{TP}{TP + FN}$$
$$F_1\text{-}score = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall}$$
where TP means that both the predicted label and the real label of a detected object are positive, TN means that both are negative, FP means that the predicted label is positive but the real label is negative, and FN means that the predicted label is negative but the real label is positive. The positive and negative classes are determined per class; for example, the walker class No.I is positive and the other classes are negative when calculating these indicators for class No.I. The final indicators over all classes are the weighted averages of the per-class indicators, with the object numbers as the weights. Mean average precision (mAP) is another common evaluation indicator in object detection tasks; it reflects the comprehensive accuracy (combining precision and recall) of a YOLO model, and a larger mAP means better performance. The nuisance alarm rate (NAR) is a common evaluation indicator for distributed optical fiber sensors; in this task, NAR refers to the rate at which background is mistaken for a footstep.
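For reference, a direct sketch of these indicators and the weighted average over walker classes:

```python
def indicators(tp, tn, fp, fn):
    """Per-class accuracy, precision, recall and F1 from the counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, precision, recall, f1

def weighted_average(values, counts):
    """Final indicator over all classes, weighted by object numbers."""
    return sum(v * c for v, c in zip(values, counts)) / sum(counts)
```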

In the object detection task, a YOLO model outputs a predicted box, a confidence probability and a label for each detected object. The predicted box gives the location of the detected object in the image, and the confidence probability reflects both the probability that the box contains the object and the accuracy of the box. Usually, if the confidence probability is larger than 0.5, the corresponding predicted box is regarded as a positive output of YOLO-v5. Intersection over union (IOU) is a common parameter to evaluate the accuracy of a predicted box; its definition, the overlap of the real target box and the predicted box, is shown in Fig. 9(a). Inspired by IOU, IOUDY is proposed to combine the outputs of YOLO-1 and YOLO-2 in the double-YOLO method. YOLO-1 and YOLO-2 separately output predicted boxes for the same input grayscale image, and IOUDY is defined as the overlap of these two kinds of predicted boxes, as shown in Fig. 9(b). If the ratio of the overlap to the corresponding YOLO-2 predicted box is larger than a set threshold, the corresponding walker's ID is attached to the single footstep. These predicted single-footstep boxes with their walker IDs are the final output of the double-YOLO method.
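The fusion rule can be written compactly as follows (a sketch with boxes as (x1, y1, x2, y2) tuples; this is the `fuse` criterion referenced in Section 2.2):

```python
def overlap_area(a, b):
    """Intersection area of two boxes given as (x1, y1, x2, y2)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0.0) * max(h, 0.0)

def fuse(step_box, id_box, threshold=0.5):
    """IOUDY rule: attach the walker ID when the overlap with the
    YOLO-1 box, normalized by the YOLO-2 box area, exceeds the
    threshold (chosen as 0.5 in Section 3.4)."""
    area = (step_box[2] - step_box[0]) * (step_box[3] - step_box[1])
    return overlap_area(step_box, id_box) / area > threshold
```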

Fig. 9. Definition of IOU (a) and IOUDY (b).

3.4 Effect of IOUDY threshold

The effect of the IOUDY threshold is tested experimentally. In order to accelerate training, all of the YOLO-v5 models used in this paper are pre-trained on ImageNet. The YOLO-1 model is retrained on Training Set-1 and the YOLO-2 model on Training Set-2. Then the combination of YOLO-1 and YOLO-2 (the double-YOLO method) is evaluated on the validation set labeled with Labeling Method-3. The results with IOUDY thresholds from 0.5 to 0.9 are shown in Table 2. With a stricter IOUDY threshold, the evaluation indicators decrease slightly. Therefore, the IOUDY threshold is chosen as 0.5 in the following tests.

Table 2. The effect of IOUDY threshold

3.5 Comparison between different labeling methods

Different labeling methods provide different feature information and lead the deep learning model in different directions. Both Labeling Method-1 and Labeling Method-3 provide the walker's ID information; the difference is that Labeling Method-1 additionally provides stride and step frequency information. To test whether this additional information is useful for walker ID identification, two YOLO-v5 models are separately retrained on Training Set-1 and Training Set-3, with a training batch size of 16 and a maximum of 700 epochs. The model trained on Training Set-1 is in fact the YOLO-1 model of the double-YOLO method, and the model trained on Training Set-3 is denoted the single-YOLO model. Both are tested on the test set; when the YOLO-1 model is tested, Labeling Method-1 is used. The accuracy and mAP of the YOLO-1 model are 81.8% and 84.9%, versus 73.5% and 63.3% for the single-YOLO model. This result shows that the additional stride and step frequency information effectively helps the YOLO-v5 model to identify the walker's ID.

3.6 Comparison among SVM, single-YOLO method and double-YOLO method

In this section, we compare the performance of a basic machine learning method (support vector machine, SVM), a deep network (single-YOLO) and the improved method (double-YOLO). We first test the classification performance of the SVM. The footsteps are first located by the moving differential method [24], and the time series of the footsteps are extracted as the data samples for the SVM; the data shown in Table 1 are used for this test and the labels are the walker IDs. The SVM classification results are shown in Table 3. The SVM achieves only 31.2% classification accuracy. This is because the footstep waveforms of different individuals are very similar, and the distance between the sensing point and the walker keeps changing as the walker moves, which makes the detected vibration intensity unstable and further blurs the signal features of different footsteps. Besides, the distortion between the light intensity detected by the DOFS and the true footstep waveform further increases the difficulty of temporal waveform classification.
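The SVM baseline could be reproduced along these lines (scikit-learn assumed; the paper does not name the implementation or kernel):

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

clf = make_pipeline(StandardScaler(), SVC(kernel='rbf'))
# X: (n_footsteps, n_samples) time series located by the moving
# differential method [24]; y: walker IDs from Table 1
# clf.fit(X_train, y_train)
# print(clf.score(X_test, y_test))   # reported result: 0.312
```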

Table 3. The classification test by SVM

Then we test footstep detection and recognition with the YOLO networks. The YOLO-1 model provides the walker's ID information and the YOLO-2 model provides the precise footstep positions. In order to evaluate the generalization ability of the proposed double-YOLO method, 10-fold cross-validation, a commonly used model validation method, is applied. In 10-fold cross-validation, the model is repeatedly trained and tested 10 times: all of the grayscale images are divided into 10 subsets, and in each round one subset is chosen as the test set while the other 9 subsets are used for training, so that every grayscale image takes a turn in the test set. The average detection accuracy over the 10 tests is used as the final detection accuracy for method evaluation. The single-YOLO method is used for comparison. The training batch size is 16 and the maximum epoch is 700. The IOU thresholds and confidence thresholds of single-YOLO, YOLO-1 and YOLO-2 are all set to 50% and 0.5, respectively, and the IOUDY threshold is 50%.
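The cross-validation loop follows the standard pattern (a sketch; `train_and_test` is a placeholder for retraining and evaluating the models on one split):

```python
import numpy as np
from sklearn.model_selection import KFold

def cross_validate(images, train_and_test, n_splits=10):
    """Average accuracy of `train_and_test(train, test)` over folds."""
    accs = []
    for tr_idx, te_idx in KFold(n_splits=n_splits, shuffle=True).split(images):
        train = [images[i] for i in tr_idx]
        test = [images[i] for i in te_idx]
        accs.append(train_and_test(train, test))
    return float(np.mean(accs))   # final accuracy over the 10 rounds
```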

A typical output of the two methods is shown in Fig. 10, where the grayscale image is converted to a color map for better visualization. It comes from the case of walkers No.I and No.III walking toward each other. Both methods locate most of the footsteps and distinguish the corresponding walker IDs. However, due to the similarity between the detailed footstep vibration waveforms of different persons, some footsteps belonging to No.III are wrongly judged as footsteps of No.II by the single-YOLO method.

Fig. 10. Typical footsteps recognition result of double-YOLO (a) and single-YOLO (b).

The evaluation indicators of the two methods under 10-fold cross-validation are shown in Table 4. The average accuracy of the double-YOLO method is 86.0%, which is 12.6% higher than that of the single-YOLO method; the mAP improvement is 19.1%. The accuracy and mAP comparisons are also shown in Fig. 11, from which it can be seen that the stability of the double-YOLO method is also improved. The confusion matrices of the single-YOLO and double-YOLO methods in the first test round are shown in Fig. 12. The single-YOLO method may erroneously predict some background areas as footsteps, and the double-YOLO method reduces this misjudgment: YOLO-2 is trained to detect just the footsteps without walker ID information and barely mispredicts the background, and the probability of the same background area being predicted as a footstep by both YOLO models at the same time is greatly reduced. Meanwhile, the two YOLO networks are trained to focus on footstep features at different scales, which further suppresses misidentification.

Fig. 11. 10-fold cross-validation result of double-YOLO model and single-YOLO model.

Fig. 12. Confusion matrices of single-YOLO (a) and double-YOLO method (b).

Table 4. Comparison between Single and Double YOLO Method

A paired t-test is further conducted to confirm the performance improvement statistically. The statistical test results are shown in Table 5: the double-YOLO method significantly (p < 0.01) improves the detection performance.
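Such a test can be run on the per-fold accuracies of the two methods (a sketch using scipy; the per-fold values themselves are those behind Fig. 11):

```python
from scipy.stats import ttest_rel

def compare(acc_double, acc_single, alpha=0.01):
    """Paired t-test over the same 10 cross-validation folds."""
    t_stat, p_value = ttest_rel(acc_double, acc_single)
    return p_value < alpha   # True -> statistically significant gain
```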

Table 5. Paired t-test Results of Two Methods

3.7 Leave-one-location-out test

In order to further test the generalization ability of the proposed method, a leave-one-location-out test is conducted. There are 51 grayscale images from around the 230 m location; all of them are picked to form the test set, and the remaining 179 and 54 grayscale images form the training set and validation set, respectively. With this segmentation, the test data come from a location completely different from the training data. The single-YOLO and double-YOLO methods are compared; the labeling methods and other settings are the same as in Section 3.6. The result is shown in Table 6: the double-YOLO method's performance is similar to the test result in Section 3.6, reaching 83.5% accuracy and 83.4% mAP.
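The split amounts to holding out one fiber location (a sketch; the `location_m` field and tolerance are our assumptions):

```python
def leave_one_location_out(samples, held_out_m=230, tolerance_m=10):
    """Test on images from around one location, train on the rest."""
    test = [s for s in samples
            if abs(s['location_m'] - held_out_m) <= tolerance_m]
    rest = [s for s in samples
            if abs(s['location_m'] - held_out_m) > tolerance_m]
    return rest, test   # `rest` is further split into train/validation
```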

Table 6. Leave-one-location-out Test Result

4. Discussion

4.1 Comparison with the state of the art in footstep detection

The electrical pressure sensor placed on the floor is the most commonly used sensor for detecting footsteps. It can accurately detect the detailed waveform of a footstep and support clinical diagnosis; however, as a point sensor, its sensing range is limited (a fixed point or a few meters) [3,8]. Visual foot detection and ID identification can reach a very high identification accuracy (over 95%) [10], but the detection is severely affected by lighting conditions and the detection range is only several tens of meters. Plastic optical fiber can also be used for gait detection [6,7]: the form of the optical fiber allows the sensor to be embedded into other materials, such as an intelligent carpet, covering a relatively larger sensing range (several meters) than traditional electrical pressure sensors with a smaller sensor size. However, the large propagation attenuation makes plastic optical fiber unsuitable for long-range sensing. The DOFS based on single-mode optical fiber inherits the advantages of fiber-optic sensing and offers a much longer sensing range (over several kilometers); besides, it can detect multiple footsteps at different locations at the same time. However, footstep detection and walker ID identification have been the weakness of DOFS. Based on the deep learning approach proposed in this paper, the DOFS can identify the walker's ID with 86.0% accuracy; to the best of our knowledge, this is the first time that a DOFS can recognize a walker's ID while locating and detecting the footsteps. Although the accuracy is lower than that of visual methods, the much longer sensing range makes it applicable in more situations, such as the surroundings of a hospital, an office building or a prison. The comparison between different types of gait sensors is listed in Table 7.

Table 7. Comparison of Different Types of Gait Sensors

The double-YOLO method also has its limitations. The detected Φ-OTDR traces are divided into several pieces in the spatial domain before being input into the YOLO network. Because the YOLO network is not suited to detecting very small targets (less than 32 pixels in the image), it is better to limit one data sample to Φ-OTDR traces spanning no more than 1/64 of the image size multiplied by the optical pulse length. In our case (40 m pulse length and 608 × 608 image size), the limit is about 380 m per data sample; if overly long Φ-OTDR traces are converted into one data sample, the detection rate may drop. Another potential challenge is the overlap of footsteps: if two footsteps from two persons overlap in both time and space, the double-YOLO method cannot decide to whom they belong.
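For clarity, the 380 m figure follows directly from this rule (the 40 m pulse length corresponds to the 200 ns pulse width at roughly 2 × 10⁸ m/s in fiber):

$$L_{\max} = \frac{608\ \textrm{pixels}}{64} \times 40\ \textrm{m} = 9.5 \times 40\ \textrm{m} = 380\ \textrm{m}$$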

4.2 Sensing length and feasibility

The main idea of this paper is to use a DOFS to detect footsteps and recognize the walker's ID. A regular Φ-OTDR setup can reach an over-20 km sensing range, and with distributed amplification technology it can even cover ∼100 km. Limited by the experimental conditions, we only had one kilometer of optical fiber buried underground. However, owing to the distributed sensing characteristics, each point of the sensing fiber is an independent sensor; the major difference between sensing points at the near end and the far end is the signal-to-noise ratio (SNR), which is better at the near end. In a regular Φ-OTDR setup, the SNR over the first 10 km of fiber is very high and the sensed vibration waveform can be recognized. Our work is a preliminary experiment showing the effectiveness of the DOFS with the double-YOLO method in detecting footsteps and identifying the walker's ID when the SNR is good (i.e., when the detected vibration waveform can be recognized). The initial results are promising, but more research is still needed, including longer test ranges, more subjects, more samples per subject and more varied environments. Although only a 1 km test was conducted in this paper, it is still much longer than the other existing footstep detection methods, and long enough to cover a small garden, a room or even a small house.

5. Conclusion

Gait is a biometric of a person. This paper proposes a gait detection and identification method that uses a DOFS to detect the sound of walking and a double-YOLO model to identify the walker's ID. The stride and step frequency features of the gait are used to enhance the walker identification performance. Experiments on identifying three walking individuals show that the proposed method can effectively locate the footsteps and identify who is walking. Compared to the single-YOLO method, the double-YOLO method improves the classification accuracy and mAP to 86.0% and 81.6%, improvements of 12.6% and 19.1%, respectively. These initial results are promising, and this footstep detection method may promote the development of gait-based clinical diagnosis and person identification applications, such as perimeter security and long-term patient movement monitoring.

Funding

National Natural Science Foundation of China (61801283); Basic and Applied Basic Research Foundation of Guangdong Province (2019A1515011060, 2021A1515012001).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. Y. Shoji, T. Takasuka, and H. Yasukawa, “Personal identification using footstep detection,” in 2004 International Symposium on Intelligent Signal Processing and Communication Systems, Seoul, South Korea (2004), pp. 43–47.

2. G. Qian, J. Zhang, and A. Kidané, “People Identification Using Floor Pressure Sensing and Analysis,” IEEE Sens. J. 10(9), 1447–1460 (2010). [CrossRef]  

3. M. Wagner, D. Slijepcevic, B. Horsak, et al., “KAVAGait: Knowledge-Assisted Visual Analytics for Clinical Gait Analysis,” IEEE Trans. Visual. Comput. Graphics 25(3), 1528–1542 (2019). [CrossRef]  

4. T. Li, P. Kuo, Y. Ho, et al., “A biped gait learning algorithm for humanoid robots based on environmental impact assessed artificial bee colony,” IEEE Access 3(1), 13–26 (2015). [CrossRef]  

5. D. Zhao, J. Yang, M. O. Okoye, et al., “Walking Assist Robot: A Novel Non-Contact Abnormal Gait Recognition Approach Based on Extended Set Membership Filter,” IEEE Access 7, 76741–76753 (2019). [CrossRef]  

6. J. Cantoral-Ceballos, N. Nurgiyatna, P. Wright, et al., “Intelligent Carpet System, Based on Photonic Guided-Path Tomography, for Gait and Balance Monitoring in Home Environments,” IEEE Sens. J. 15(1), 279–289 (2015). [CrossRef]  

7. O. Costilla-Reyes, P. Scully, and K. Ozanyan, “Temporal Pattern Recognition in Gait Activities Recorded With a Footprint Imaging Sensor System,” IEEE Sens. J. 16(24), 8815–8822 (2016). [CrossRef]  

8. K. Koho, J. Suutala, T. Seppänen, et al., “Footstep pattern matching from pressure signals using Segmental Semi-Markov Models,” in 12th European Signal Processing Conference, Vienna, Austria (2004).

9. P. Bours and A. Evensen, “The Shakespeare experiment: Preliminary results for the recognition of a person based on the sound of walking,” in 2017 International Carnahan Conference on Security Technology, Madrid, Spain (2017).

10. G. Batchuluun, H. Yoon, J. Kang, et al., “Gait-Based Human Identification by Combining Shallow Convolutional Neural Network-Stacked Long Short-Term Memory and Deep Convolutional Neural Network,” IEEE Access 6, 63164–63186 (2018). [CrossRef]  

11. S. Chen, J. Lach, B. Lo, et al., “Toward Pervasive Gait Analysis With Wearable Sensors: A Systematic Review,” IEEE J. Biomed. Health Inform. 20(6), 1521–1537 (2016). [CrossRef]  

12. M. I. Mohamed Refai, B. -J. F. van Beijnum, J. H. Buurke, et al., “Gait and Dynamic Balance Sensing Using Wearable Foot Sensors,” IEEE Trans. Neural Syst. Rehabil. Eng. 27(2), 218–227 (2019). [CrossRef]  

13. Y. Shi, Y. Wang, L. Zhao, et al., “An Event Recognition Method for Φ-OTDR Sensing System Based on Deep Learning,” Sensors 19(15), 3421–3430 (2019). [CrossRef]  

14. Y. Shi, J. Chen, S. Dai, et al., “Φ-OTDR Event Recognition System Based on Valuable Data Selection,” J. Lightwave Technol. (2023).

15. U. S. Shukla and P. R. Mahapatra, “Optimization of biased proportional navigation (guided projectiles),” IEEE Trans. Aerosp. Electron. Syst. 25(1), 73–79 (1989). [CrossRef]  

16. G. Yang, W. Feng, J. Jin, et al., “Face Mask Recognition System with YOLOV5 Based on Image Recognition,” in IEEE 6th International Conference on Computer and Communications, Chengdu, China (2020).

17. F. Zhou, H. Zhao, and Z. Nie, “Safety Helmet Detection Based on YOLOv5,” in IEEE International Conference on Power Electronics, Computer Applications, Shenyang, China (2021).

18. X. Chao, G. Sun, H. Zhao, et al., “Identification of Apple Tree Leaf Diseases Based on Deep Learning Models,” Symmetry 12(7), 1065–1082 (2020). [CrossRef]  

19. Y. Tian, G. Yang, Z. Wang, et al., “Apple detection during different growth stages in orchards using the improved YOLO-V3 model,” Comput. Electron. Agric. 157, 417–426 (2019). [CrossRef]

20. C. Lin, J. Lu, G. Wang, et al., “Graininess-Aware Deep Feature Learning for Robust Pedestrian Detection,” IEEE Trans. on Image Process. 29, 3820–3834 (2020). [CrossRef]  

21. J. Redmon, S. Divvala, R. Girshick, et al., “You Only Look Once: Unified, Real-Time Object Detection,” in International Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA (2016), pp. 779–788.

22. Z. Sha, H. Feng, X. Rui, et al., “PIG Tracking Utilizing Fiber Optic Distributed Vibration Sensor and YOLO,” J. Lightwave Technol. 39(13), 4535–4541 (2021). [CrossRef]  

23. Y. Song, Z. Xie, X. Wang, et al., “MS-YOLO: Object Detection Based on YOLOv5 Optimized Fusion Millimeter-Wave Radar and Machine Vision,” IEEE Sens. J. 22(15), 15435–15447 (2022). [CrossRef]  

24. Y. Lu, T. Zhu, L. Chen, et al., “Distributed Vibration Sensor Based on Coherent Detection of Phase-OTDR,” J. Lightwave Technol. 28(22), 3243–3249 (2010). [CrossRef]  

