## Abstract

Real-time detection and tracking of fast-moving objects has important applications in various fields. However, available methods, especially low-cost ones, can hardly achieve real-time and long-duration object detection and tracking. Here we report an image-free and cost-effective method for detecting and tracking a fast-moving object in real time and for long duration. The method employs a spatial light modulator and a single-pixel detector for data acquisition. It uses Fourier basis patterns to illuminate the target moving object and collects the resulting light signal with the single-pixel detector. The proposed method detects and tracks the object directly from the single-pixel measurements, without image reconstruction, and its detection and tracking algorithm is computationally efficient. We experimentally demonstrate that the method can achieve a temporal resolution of 1,666 frames per second by using a 10,000 Hz digital micro-mirror device. The latency of the method is on the order of microseconds. Additionally, the method acquires only 600 bytes of data for each frame. The method therefore allows fast-moving object detection and tracking in real time and for long duration. This image-free approach might open up a new avenue for spatial information acquisition in a highly efficient manner.

© 2019 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

## 1. Introduction

Object detection and tracking has found important applications in the military, industry, and scientific research. Real-time object detection and tracking allows rapid response to emergencies. Various approaches to detecting and tracking fast-moving objects have been proposed [1–14], but few can achieve real-time and long-duration detection and tracking in a cost-effective manner. These approaches fall mainly into two types: image-free and image-based.

Radar [5] is an image-free technique for object detection and tracking. It sends out electromagnetic pulses in the radio or microwave range and collects the pulses reflected from the target object. To locate a moving object, radar measures the travel time of the transmitted pulses (also known as time-of-flight [6]). LiDAR [7–10] works on the same principle as radar, but uses light waves instead of radio waves. Radars are generally used for remote objects and a large field-of-view. Radar systems with high spatial resolution are generally costly [11].

Image-based methods [12–14] rely on photography. Image sensors capture images of the target object, and image post-processing and analysis algorithms then detect and track the target object in the images. The accuracy of object detection and tracking depends on the quality of the captured images and the performance of the algorithms used. However, image quality might be affected by two factors. One is motion blur caused by high-speed moving objects. The other is the short exposure time needed to reduce that motion blur, which might lead to a low signal-to-noise ratio. For detection and tracking of fast-moving objects, high-speed cameras are preferable because they have a high frame rate and can provide images with a relatively high signal-to-noise ratio even when the exposure time is short. However, high-speed cameras are expensive and their data throughput is generally huge. In practice, data storage and bandwidth are limited, so long-duration object detection and tracking remains a challenge. Moreover, computationally complex image analysis algorithms and limited data-processing capacity make real-time object detection and tracking more challenging still.

Object detection essentially determines the presence of an object. It is a binary (yes-or-no) problem and in theory requires little information. Object tracking, on the other hand, acquires the two-dimensional (2-D) or three-dimensional (3-D) spatial coordinates of an object. Mathematically, there are only 2 or 3 unknowns to be determined, so object tracking in theory requires little information as well. From this perspective, image-based methods (such as high-speed photography) waste resources by acquiring a huge amount of image data for object detection and tracking. This redundancy in data acquisition is the main reason why real-time and long-duration object detection and tracking is challenging.

Recently, D. Shi *et al*. proposed an approach based on single-pixel imaging [15]. This method uses the ‘slice theory’ of the Hadamard transform to obtain one-dimensional (1-D) images instead of 2-D images to locate the target object, which reduces the redundant data acquired. By using a 22,000 Hz digital micro-mirror device (DMD), this method achieves a temporal resolution of 1/177 seconds and a spatial resolution of 256×256 pixels. The method demonstrates that reducing redundancy in data acquisition is an effective way to achieve real-time and long-duration object detection and tracking.

Here we propose an image-free and cost-effective method for fast-moving object detection and tracking in real time and for long duration. The principle of the proposed image-free method is completely different from that of existing image-free approaches. Inspired by single-pixel imaging [16,17], the proposed method acquires the spatial information of the target object by using spatial light modulation and single-pixel detection. The proposed method is based on Fourier single-pixel imaging [18–23], but acquires no images for object detection and tracking. Instead, it uses 6 Fourier basis patterns for structured light modulation to measure only 2 Fourier coefficients of the complete Fourier spectrum of the object image. As the Fourier transform is a global-to-point transformation, the 2 Fourier coefficients provide sufficient knowledge of the presence and/or motion of the object. Moreover, the property that a translation in the spatial domain results in a linear phase shift in the Fourier domain allows us to estimate the displacement of a moving object from the 2 Fourier coefficients acquired. The algorithm for object detection and tracking of the proposed method is computationally efficient. Consequently, the proposed method enables real-time object detection and tracking.

## 2. Principle

Fourier single-pixel imaging was initially proposed for image acquisition. It is characterized by the use of Fourier basis (that is, sinusoidal intensity) patterns for spatial light modulation. As such, the spatial information of the object is encoded into a 1-D temporal light signal. By measuring the intensity of the resulting light signal with a single-pixel detector, Fourier single-pixel imaging recovers the Fourier spectrum of the object image, and the object image can then be obtained by applying an inverse Fourier transform to the recovered spectrum.

Each Fourier basis pattern $P(x,y)$ is characterized by its spatial frequency pair $(f_x,f_y)$ and initial phase $\varphi_0$:

$$P(x,y) = A + B\cos(2\pi f_x x + 2\pi f_y y + \varphi_0), \tag{1}$$

where $(x,y)$ denotes the 2-D coordinate in the spatial domain, $A$ is the average intensity of the pattern, and $B$ denotes the contrast. Using the Fourier basis patterns for modulating the illumination light field or the detection light field, the resulting light intensity $D$ is equivalent to an inner product of the object image $I$ and the Fourier basis pattern $P$:

$$D = \iint I(x,y)\,P(x,y)\,\mathrm{d}x\,\mathrm{d}y, \tag{2}$$

where we use $D$ to denote the single-pixel measurement for simplicity. The inner product is either the real or the imaginary part of a Fourier coefficient [18,20].
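As a minimal numerical sketch of the pattern model and the inner product described above, the following Python snippet builds one sinusoidal pattern and computes a simulated single-pixel measurement for a toy scene. The function names, the 64×64 scene, and the choice $A = B = 0.5$ are illustrative assumptions, not values taken from the experiments.

```python
import numpy as np

def fourier_basis_pattern(shape, fx, fy, phi0, A=0.5, B=0.5):
    """Sinusoidal intensity pattern A + B*cos(2*pi*(fx*x + fy*y) + phi0).

    fx and fy are in cycles per pixel; x indexes columns, y indexes rows.
    A and B are assumed values that keep the pattern within [0, 1].
    """
    rows, cols = shape
    y, x = np.mgrid[0:rows, 0:cols]
    return A + B * np.cos(2 * np.pi * (fx * x + fy * y) + phi0)

def single_pixel_measurement(image, pattern):
    """Discrete inner product of the scene and the pattern."""
    return float(np.sum(image * pattern))

# Toy scene (assumption): a bright square on a dark background.
image = np.zeros((64, 64))
image[20:30, 35:45] = 1.0

pattern = fourier_basis_pattern(image.shape, fx=2 / 64, fy=0.0, phi0=0.0)
D = single_pixel_measurement(image, pattern)
```

With $\varphi_0 = 0$, the simulated measurement equals $A$ times the total scene intensity plus $B$ times the real part of the discrete Fourier coefficient at the same frequency, which is how a single detector reading carries information about the whole scene.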

The integral in Eq. (2) implies that the Fourier transform is a global-to-point transformation. Specifically, every single coefficient in the Fourier domain receives contributions from all points in the spatial domain. Consequently, a change in the spatial domain generally affects all coefficients in the Fourier domain. Exploiting this property, we can detect the presence and/or motion of objects in the scene by monitoring the change of one or a few Fourier coefficients. This can be done with Fourier single-pixel imaging, which acquires Fourier coefficients by using Fourier basis patterns for spatial light modulation.

By modulating either the illumination light field or the detection light field with Fourier basis patterns, we can observe continuous and large variations in the single-pixel measurements when an object enters the scene or starts moving in the scene. A single Fourier coefficient is sufficient for moving object detection by our method. Here we propose to use 2 Fourier coefficients, for two reasons. The first is that object tracking by the proposed method, as will be introduced below, uses 2 Fourier coefficients. Object detection and object tracking can share the 2 Fourier coefficients, enabling efficient data acquisition. The second is that the more Fourier coefficients are used, the more reliable object detection will be. As only 2 Fourier coefficients need to be measured instead of the complete Fourier spectrum, our method is efficient in terms of data acquisition and advantageous for fast-moving object detection.

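For moving object tracking, we exploit the linear phase shift property of the Fourier transform. The property illustrated by Eq. (3) indicates that an image with a displacement of $(x_0,y_0)$ in the spatial domain results in a phase shift of $(-2\pi f_x x_0, -2\pi f_y y_0)$ in the Fourier domain:

$$I(x - x_0, y - y_0) = \mathcal{F}^{-1}\left\{ \tilde{I}(f_x,f_y)\exp\left[-j2\pi(f_x x_0 + f_y y_0)\right] \right\}, \tag{3}$$

where $\tilde{I}$ denotes the Fourier spectrum of the image $I$ and $\mathcal{F}^{-1}$ denotes the inverse Fourier transform.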

The displacement of the object can be obtained from the phase term $\varphi = -2\pi(f_x x_0 + f_y y_0)$ in Eq. (3), where $\tilde{I}$ is replaced by $\tilde{I} - \tilde{I}_{\textrm{bg}}$. As there are 2 unknowns ($x_0$ and $y_0$) in Eq. (3), we can establish a system of 2 equations to solve for them. This linear equation system requires 2 Fourier coefficients. For simplicity, we propose to acquire $\tilde{I}(f_x,0)$ and $\tilde{I}(0,f_y)$ for $x_0$ and $y_0$, respectively.

Fourier coefficients can be acquired with the *N*-step phase-shifting method. Specifically, the 3-step phase-shifting method [20] is chosen in this work, as it has been demonstrated to be efficient in data acquisition and robust to noise and illumination fluctuation. To acquire a Fourier coefficient $\tilde{I}(f_x,f_y)$ by using the 3-step phase-shifting method, 3 Fourier basis patterns are needed. The 3 Fourier basis patterns have an identical spatial frequency pair $(f_x,f_y)$ but different initial phases, that is, $\varphi_0 = 0$, $2\pi/3$, and $4\pi/3$, respectively. The Fourier coefficient is obtained from the 3 corresponding single-pixel measurements:

$$\tilde{I}(f_x,f_y) = \left(2D_0 - D_{2\pi/3} - D_{4\pi/3}\right) + \sqrt{3}\,j\left(D_{2\pi/3} - D_{4\pi/3}\right), \tag{4}$$

where $D$ denotes a single-pixel measurement and the subscript of $D$ denotes the initial phase of the Fourier basis pattern.
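The 3-step phase-shifting combination can be checked numerically. The sketch below simulates three measurements of a random test image with ideal (non-binarized) sinusoidal patterns and compares the combined value with the corresponding FFT coefficient. The image, the frequency indices `kx` and `ky`, and the pattern parameters `A` and `B` are assumptions made only for this illustration.

```python
import numpy as np

def three_step_coefficient(d0, d1, d2):
    """Combine measurements taken at phi0 = 0, 2*pi/3, 4*pi/3.

    The result is proportional to the Fourier coefficient; for patterns
    A + B*cos(...), the proportionality factor is 3*B.
    """
    return (2 * d0 - d1 - d2) + 1j * np.sqrt(3) * (d1 - d2)

rng = np.random.default_rng(0)
image = rng.random((32, 48))          # arbitrary test scene (assumption)
rows, cols = image.shape
y, x = np.mgrid[0:rows, 0:cols]

kx, ky = 3, 0                         # frequency indices: fx = kx/cols, fy = ky/rows
A, B = 0.5, 0.5                       # assumed pattern offset and contrast
phases = (0.0, 2 * np.pi / 3, 4 * np.pi / 3)

measurements = [
    np.sum(image * (A + B * np.cos(2 * np.pi * (kx * x / cols + ky * y / rows) + p)))
    for p in phases
]
coeff = three_step_coefficient(*measurements)
reference = np.fft.fft2(image)[ky, kx]
```

In this simulation `coeff` equals `3 * B * reference`: the constant term contributed by `A` cancels in both brackets, which is consistent with the robustness to background illumination attributed to the 3-step method.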

We define one frame of the proposed method as one detection and localization of the target object. As object detection and object tracking can use the same 2 Fourier coefficients and each coefficient uses 3 patterns, only 6 Fourier basis patterns are needed for one frame. With the corresponding 6 single-pixel measurements, object detection and tracking can be done simultaneously. As the pattern display rate is commonly lower than the data sampling rate, the frame rate of the proposed method is limited by the pattern display rate and equals the reciprocal of the time for displaying 6 patterns. Digital micro-mirror devices (DMDs) are a common choice for high-speed spatial light modulation. However, Fourier basis patterns are grayscale while DMDs can only generate binary patterns. Thus, we propose to use binarized Fourier basis patterns for spatial light modulation. The binarization strategy is detailed in [20]. The patterns are denoted by $P_i$ $(i = 1,2, \cdots ,6)$ and the corresponding single-pixel measurements are denoted by $D_i$ $(i = 1,2, \cdots ,6)$. The first 3 patterns are for the acquisition of $\tilde{I}(f_x,0)$, that is, $P_1 = P(f_x,0,0)$, $P_2 = P(f_x,0,2\pi/3)$, and $P_3 = P(f_x,0,4\pi/3)$, where the arguments of $P$ are $(f_x,f_y,\varphi_0)$. Similarly, the last 3 patterns are for $\tilde{I}(0,f_y)$.

Assuming the target moving object is not present at the beginning, the single-pixel measurements should in theory be constant, but in practice they vary within a small range. The variation might be caused by noise and/or ambient light. As soon as the moving object enters the scene or starts moving, it causes a large variation in the single-pixel measurements. We use the following criterion to determine the presence of a moving object. If the difference between $\bar{D}$ (the average of $D_i$, $i = 1,2, \cdots ,6$) of the current frame and the average of $\bar{D}$ over the latest 5 frames (current frame excluded) is larger than a threshold, we conclude that a moving object is present. The threshold is adaptive and depends on the average and the standard deviation of $\bar{D}$ in the previous frames, as detailed in the experiments section. Similarly, we conclude that the object has exited the scene or stopped moving if the difference is smaller than the threshold.
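One way to read this criterion is as a sliding-window test on the frame averages. The sketch below is an interpretation under stated assumptions rather than the authors' implementation: the multiplier `k`, the synthetic signal, and the helper name `exceeds_threshold` are all hypothetical, and `k` is chosen here only to suit the synthetic data.

```python
import numpy as np
from collections import deque

def exceeds_threshold(recent_means, current_mean, k):
    """True if current_mean lies outside mu +/- k*sigma of the recent frames."""
    mu = np.mean(recent_means)
    sigma = np.std(recent_means)
    return abs(current_mean - mu) > k * sigma

# Synthetic frame averages (assumption): a small alternating ripple around 1.0,
# then a drop at frame 25 when an object starts blocking part of the light.
n_frames = 40
ripple = 1e-3 * np.where(np.arange(n_frames) % 2 == 0, 1.0, -1.0)
frame_means = 1.0 + ripple
frame_means[25:] -= 0.05

window = deque(maxlen=5)              # the latest 5 frames, current one excluded
first_detection = None
for idx, d_bar in enumerate(frame_means):
    if len(window) == 5 and first_detection is None:
        if exceeds_threshold(window, d_bar, k=3.0):
            first_detection = idx
    window.append(d_bar)
```

For this synthetic signal the test first fires at frame 25, the frame where the drop begins, while the ripple alone never exceeds the window statistics.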

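As soon as a moving object is detected, object tracking starts. Specifically, the single-pixel measurements are substituted into Eq. (4) to obtain the Fourier coefficients $\tilde{I}(f_x,0)$ and $\tilde{I}(0,f_y)$, from which the displacement can be derived through:

$$x_0 = -\frac{1}{2\pi f_x}\arg\left\{\tilde{I}(f_x,0) - \tilde{I}_{\textrm{bg}}(f_x,0)\right\}, \qquad y_0 = -\frac{1}{2\pi f_y}\arg\left\{\tilde{I}(0,f_y) - \tilde{I}_{\textrm{bg}}(0,f_y)\right\}, \tag{5}$$

where $\arg\{\cdot\}$ denotes the argument (phase angle) of a complex number.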
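The phase-to-position step follows from the relation $\varphi = -2\pi(f_x x_0 + f_y y_0)$ evaluated at $(f_x,0)$ and $(0,f_y)$. The sketch below illustrates it on a synthetic scene, computing the two coefficients directly rather than from phase-shifted measurements. The uniform background, the rectangular object, the frequencies, and the helper names are assumptions for illustration; the estimate is folded into one fringe period because the phase is only known modulo $2\pi$.

```python
import numpy as np

def coefficient(image, fx, fy):
    """Fourier coefficient of `image` at (fx, fy), in cycles per pixel."""
    rows, cols = image.shape
    y, x = np.mgrid[0:rows, 0:cols]
    return np.sum(image * np.exp(-2j * np.pi * (fx * x + fy * y)))

def phase_to_position(coef, coef_bg, f):
    """Invert phi = -2*pi*f*position, folded into one fringe period [0, 1/f)."""
    phi = np.angle(coef - coef_bg)
    return (-phi / (2 * np.pi * f)) % (1.0 / f)

fx = fy = 1 / 64                      # one fringe period across the toy scene
background = np.full((64, 64), 0.2)   # static background (assumption)
scene = background.copy()
scene[10:16, 40:48] += 1.0            # object covering rows 10-15, columns 40-47

x0 = phase_to_position(coefficient(scene, fx, 0.0),
                       coefficient(background, fx, 0.0), fx)
y0 = phase_to_position(coefficient(scene, 0.0, fy),
                       coefficient(background, 0.0, fy), fy)
```

For this symmetric toy object the estimates coincide with its centre, `x0 = 43.5` and `y0 = 12.5` pixels. Subtracting the background coefficients removes the static part of the scene before the phase is read out.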

## 3. Experiments

We demonstrate the proposed method with experiments. As Fig. 1(a) shows, our experimental set-up consists of an illumination system, a detection system, and a target moving object. The illumination system consists of a 10-watt white LED, a DMD (Texas Instruments DLP Discovery 6500 development kit), and a projection lens.

The illumination system generates Fourier basis patterns using the DMD, which is 0.65 inch in size and has 1920×1080 micro-mirrors with a mirror pitch of 7.6 µm. The DMD is illuminated by the LED and operates at its highest refresh rate, that is, 10,000 Hz. Fourier basis patterns displayed on the DMD are projected onto the image plane through the projection lens. The field-of-view is determined by the size of the patterns at the image plane. In our experiment, the field-of-view is 60 mm×34 mm.

We simulate a fast-moving object using a hollow metallic ball. Initially, the ball is at the top of a ‘track’. When the ball is released, it slides down along the track due to gravity (also watch Visualization 1 and Visualization 2). The track is prepared by bending a metallic wire. The track is flat and placed at the image plane of the DMD (illustrated by the red dashed box in Fig. 1(a)). As such, the ball is restricted to motion in the image plane of the DMD, where the illumination patterns are in focus. We prepare two tracks in our experiments. One is h-shaped (Fig. 1(b)) and the other is s-shaped (Fig. 1(c)).

The detection system consists of a collecting lens and an amplified photodiode (PDA) (Thorlabs PDA100A2). The target moving object is illuminated by the Fourier basis patterns. The resulting light is collected by the PDA through the collecting lens. The PDA converts the light intensities into analog electric signals, which are digitized by the data acquisition board (National Instruments USB-6343 BNC) and fed to the computer. The data acquisition board operates at its maximum sampling rate, that is, 500,000 samples per second. The computer processes the input single-pixel measurements for object detection and tracking. In order to evaluate the accuracy of object tracking, we use a side camera (Point Grey CM3-U3-50S5M-CS) to capture images of the target object and track the object by image analysis. The camera operates at its highest frame rate, 70 frames per second. The exposure time is set to 0.2 ms.

We generate 6 binarized Fourier basis patterns (Fig. 2) for structured illumination. The size of the patterns is 1920×1080 pixels. The spatial frequency pair of $P_1 \sim P_3$ is $(f_x = 2/1080, f_y = 0)$ and the spatial frequency pair of $P_4 \sim P_6$ is $(f_x = 0, f_y = 2/1920)$. The patterns are binarized using the Floyd-Steinberg dithering algorithm with the upsampling ratio $k = 1$ [20]. The patterns are repeatedly displayed on the DMD in sequence.
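To make the binarization step concrete, the following is a minimal sketch of standard Floyd-Steinberg error diffusion applied to a small sinusoidal pattern without upsampling. It is a generic textbook version written for illustration, not the code used in [20]; the 32×64 pattern size and the function name are assumptions chosen to keep the pure-Python loop fast.

```python
import numpy as np

def floyd_steinberg(gray):
    """Binarize an image with values in [0, 1] by Floyd-Steinberg error diffusion."""
    work = gray.astype(float).copy()
    rows, cols = work.shape
    out = np.zeros((rows, cols), dtype=np.uint8)
    for r in range(rows):
        for c in range(cols):
            old = work[r, c]
            new = 1.0 if old >= 0.5 else 0.0
            out[r, c] = int(new)
            err = old - new
            # Diffuse the quantization error to not-yet-visited neighbours;
            # error that would fall outside the image is dropped.
            if c + 1 < cols:
                work[r, c + 1] += err * 7 / 16
            if r + 1 < rows:
                if c > 0:
                    work[r + 1, c - 1] += err * 3 / 16
                work[r + 1, c] += err * 5 / 16
                if c + 1 < cols:
                    work[r + 1, c + 1] += err * 1 / 16
    return out

rows, cols = 32, 64                   # small illustrative size (assumption)
y, x = np.mgrid[0:rows, 0:cols]
gray = 0.5 + 0.5 * np.cos(2 * np.pi * (2 / cols) * x)   # two fringes across the width
binary = floyd_steinberg(gray)
```

Because the quantization error is passed on to neighbouring pixels rather than discarded, the local average of `binary` follows the grayscale fringe, which is why a binary DMD pattern can stand in for a sinusoidal one.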

As the sampling rate of the data acquisition board is 50 times the DMD refresh rate, we take 50 samples for each illumination pattern. Each single-pixel measurement $D_i$ is the average of 50 samples. For each frame, 6 patterns and the corresponding 6 single-pixel measurements are required. Thus, each frame consumes 600 bytes of data (16 bits (2 bytes)/sample × 50 samples/measurement × 6 measurements/frame). The data throughput of the proposed method in our experimental configuration is 1,000,000 bytes (0.95 megabytes) per second, which is much less than that of high-speed photography systems. As the DMD switches 10,000 patterns per second and each frame requires 6 patterns, the temporal resolution we achieve is 600 µs. Figure 3 shows the acquired $\bar{D}$ of 600 frames for the two tracks.
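The figures in this paragraph follow from a few lines of arithmetic, reproduced below as a quick check. The variable names are illustrative; the input values are the ones stated in the text.

```python
dmd_rate_hz = 10_000          # patterns displayed per second
daq_rate_hz = 500_000         # samples acquired per second
patterns_per_frame = 6
bytes_per_sample = 2          # 16-bit samples

samples_per_pattern = daq_rate_hz // dmd_rate_hz
frame_time_s = patterns_per_frame / dmd_rate_hz
frame_rate_hz = dmd_rate_hz / patterns_per_frame
bytes_per_frame = bytes_per_sample * samples_per_pattern * patterns_per_frame
throughput_bytes_per_s = bytes_per_frame * frame_rate_hz
throughput_mib_per_s = throughput_bytes_per_s / 2**20

print(samples_per_pattern, bytes_per_frame)
print(round(frame_time_s * 1e6), "us per frame")
print(round(frame_rate_hz, 1), "frames per second")
print(round(throughput_mib_per_s, 2), "MiB per second")
```

The 0.95 figure corresponds to binary megabytes (1,000,000 / 2^20), and a frame time of 600 µs corresponds to about 1,666 frames per second.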

As Fig. 3 shows, the single-pixel measurements are stationary before the target moving object enters the scene. As soon as the moving object enters, the single-pixel measurements drop rapidly because the ball blocks a portion of the light. The measurements vary continuously while the object moves inside the scene and become stationary again after the object leaves the scene.

In our experiments, the threshold used in the criterion for object detection is $\mu \pm (1/40)\sigma$, where $\mu$ and $\sigma$ are the average and the standard deviation, respectively, of the $\bar{D}$ values for the latest 5 frames (current frame excluded). According to the criterion, we conclude that for the h-shaped track the hollow ball enters the scene at the 142nd frame and exits the scene at the 422nd frame. For the s-shaped track, the hollow ball enters at the 183rd frame and exits at the 400th frame. The tracking results are shown in Fig. 4. As the figure shows, the results of the proposed image-free method coincide with the results obtained by high-speed photography. The lengths of the h-shaped track and the s-shaped track are 0.135 m and 0.158 m, respectively. The corresponding average moving speeds are 0.801 m/s and 1.214 m/s, respectively. Such speeds are high, because the ball crosses the scene in less than 0.2 seconds (also watch Visualization 1 and Visualization 2). Due to its limited frame rate, the camera captures only 11 images for the h-shaped track and 9 images for the s-shaped track. With so few images captured by the camera, the path of the hollow ball is difficult to recover, as the positioning points are sparse in space (also watch Visualization 3). In contrast, the tracking results of the proposed method recover the path of the hollow ball well. In particular, the density of the positioning points varies with the speed of the ball: the points are dense where the ball moves slowly (for instance, at the corners) and sparse where it moves fast.

## 4. Discussion

The lowest detectable moving speed is determined by the pixel size of the illumination pattern at the image plane, because motion cannot be detected if the displacement of the object within one frame interval is less than one pixel at the image plane. The highest detectable moving speed is determined by the spatial frequency of the Fourier basis patterns used. Due to the periodicity of the Fourier basis patterns, a $2\pi$ phase ambiguity arises if the displacement of the object within one frame interval is larger than one fringe period. Thus, the higher the spatial frequency of the Fourier basis patterns, the lower the highest detectable speed.
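As a rough, back-of-the-envelope illustration of these two limits, the sketch below applies the stated criteria (one pixel per frame for the lower bound, one fringe period per frame for the upper bound) to the experimental configuration. The resulting numbers are derived here and are not reported in the paper; the assumption is that the 1920-pixel side of the pattern spans the 60 mm side of the field-of-view, the 1080-pixel side spans the 34 mm side, and two fringe periods fit across each side.

```python
frame_time_s = 6 / 10_000     # 6 patterns at 10,000 patterns per second

def speed_bounds(fov_mm, n_pixels, cycles_across_fov, frame_time_s):
    """Return (lowest, highest) detectable speed in m/s along one axis."""
    pixel_mm = fov_mm / n_pixels
    period_mm = fov_mm / cycles_across_fov
    v_min = pixel_mm * 1e-3 / frame_time_s    # one pixel per frame
    v_max = period_mm * 1e-3 / frame_time_s   # one fringe period per frame
    return v_min, v_max

long_axis = speed_bounds(60.0, 1920, 2, frame_time_s)
short_axis = speed_bounds(34.0, 1080, 2, frame_time_s)

print("60 mm axis: %.3f to %.1f m/s" % long_axis)
print("34 mm axis: %.3f to %.1f m/s" % short_axis)
```

Under these assumptions the bounds are roughly 0.05 m/s to 50 m/s along the 60 mm axis and 0.05 m/s to 28 m/s along the 34 mm axis, so the average ball speeds reported in Section 3 lie well inside the range.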

In our experiment, the data acquisition time is limited by the spatial light modulator. One could use a faster spatial light modulator (for example, a high-speed LED matrix [24,25]) to reduce the data acquisition time and achieve a higher frame rate.

The computational complexity of the proposed method is low. On average, the computational time is 67.3 µs for each frame (evaluated on a computer with an Intel 8700K 3.7 GHz CPU, 24 GB RAM, and MATLAB 2014a). In other words, the latency is on the order of microseconds. As the computational time is shorter than the data acquisition time, object detection and tracking can be conducted on-the-fly and data accumulation can be avoided, which allows for long-duration object detection and tracking.

The proposed method is potentially a cost-effective solution to real-time object detection and tracking. As only 6 binary patterns are needed, one can use a low-cost programmable projector, instead of the DMD development kit used in our experiment, for spatial light modulation.

We acknowledge that the proposed method at the current stage has two limitations. One is that it can detect and track only one moving object at a time. The other is that it can only achieve 2-D tracking. Extending the method to multiple-object detection and 3-D tracking is our future work.

## 5. Conclusion

We propose an image-free and cost-effective method for object detection and tracking. The proposed method achieves a temporal resolution of 1/1666 seconds by using a 10,000 Hz DMD. The computationally efficient algorithm of the proposed method enables low latency. Thus, the method is suitable for high-speed moving object detection and tracking in real time and for long duration. Given the wide spectral response of single-pixel detectors, the proposed method potentially works at invisible wavebands, allowing hidden moving object detection and tracking. This image-free method also opens up a new avenue for spatial information acquisition in a highly efficient manner.

## Funding

Fundamental Research Funds for the Central Universities (11618307); National Natural Science Foundation of China (61875074, 61905098).

## Disclosures

The authors declare no conflicts of interest.

## References

**1.** N. Ogawa, H. Oku, K. Hashimoto, and M. Ishikawa, “Microrobotic visual control of motile cells using high-speed tracking system,” IEEE Trans. Robot. **21**(4), 704–712 (2005).

**2.** C. Theobalt, I. Albrecht, J. Haber, M. Magnor, and H. P. Seidel, “Pitching a baseball: tracking high-speed motion with multi-exposure images,” ACM Trans. Graph. **23**(3), 540–547 (2004).

**3.** P. Li, D. Wang, L. Wang, and H. Lu, “Deep visual tracking: Review and experimental comparison,” Pattern Recogn. **76**, 323–338 (2018).

**4.** H. Yang, L. Shao, F. Zheng, L. Wang, and Z. Song, “Recent advances and trends in visual tracking: A review,” Neurocomputing **74**(18), 3823–3831 (2011).

**5.** P. Bahl, V. N. Padmanabhan, V. Bahl, and V. Padmanabhan, “RADAR: An in-building RF-based user location and tracking system,” in *Proceedings IEEE INFOCOM 2000* (IEEE, 2000), pp. 775.

**6.** A. Velten, T. Willwacher, O. Gupta, A. Veeraraghavan, M. G. Bawendi, and R. Raskar, “Recovering three-dimensional shape around a corner using ultrafast time-of-flight imaging,” Nat. Commun. **3**(1), 745 (2012).

**7.** U. Wandinger, “Introduction to lidar,” in *Lidar* (Springer, New York, 2005).

**8.** F. G. Fernald, “Analysis of atmospheric lidar observations: some comments,” Appl. Opt. **23**(5), 652–653 (1984).

**9.** S. Sato, M. Hashimoto, M. Takita, K. Takagi, and T. Ogawa, “Multilayer lidar-based pedestrian tracking in urban environments,” in *2010 IEEE Intelligent Vehicles Symposium* (IEEE, 2010), pp. 849–854.

**10.** G. Gariepy, F. Tonolini, R. Henderson, J. Leach, and D. Faccio, “Detection and tracking of moving objects hidden from view,” Nat. Photonics **10**(1), 23–26 (2016).

**11.** Y. Fang, M. Ichiro, and H. Berthold, “Depth-based target segmentation for intelligent vehicles: Fusion of radar and binocular stereo,” IEEE Trans. Intell. Transport. Syst. **3**(3), 196–202 (2002).

**12.** M. S. Wei, F. Xing, and Z. You, “A real-time detection and positioning method for small and weak targets using a 1D morphology-based approach in 2D images,” Light: Sci. Appl. **7**(5), 18006 (2018).

**13.** D. Comaniciu, V. Ramesh, and P. Meer, “Kernel-based object tracking,” IEEE Trans. Pattern Anal. Machine Intell. **25**(5), 564–577 (2003).

**14.** W. Zhong, H. Lu, and M. H. Yang, “Robust object tracking via sparsity-based collaborative model,” *IEEE Conference on Computer vision and pattern recognition* (IEEE, 2015), pp. 1838–1845.

**15.** D. Shi, K. Yin, J. Huang, K. Yuan, W. Zhu, C. Xie, D. Liu, and Y. Wang, “Fast tracking of moving objects using single-pixel imaging,” Opt. Commun. **440**, 155–162 (2019).

**16.** M. P. Edgar, G. M. Gibson, and M. J. Padgett, “Principles and prospects for single-pixel imaging,” Nat. Photonics **13**(1), 13–20 (2019).

**17.** M. Sun and J. Zhang, “Single-pixel imaging and its application in three-dimensional reconstruction: a brief review,” Sensors **19**(3), 732 (2019).

**18.** Z. Zhang, X. Ma, and J. Zhong, “Single-pixel imaging by means of Fourier spectrum acquisition,” Nat. Commun. **6**(1), 6225 (2015).

**19.** Z. Zhang and J. Zhong, “Three-dimensional single-pixel imaging with far fewer measurements than effective image pixels,” Opt. Lett. **41**(11), 2497–2500 (2016).

**20.** Z. Zhang, X. Wang, G. Zheng, and J. Zhong, “Fast Fourier single-pixel imaging via binary illumination,” Sci. Rep. **7**(1), 12029 (2017).

**21.** Z. Zhang, S. Liu, J. Peng, M. Yao, G. Zheng, and J. Zhong, “Simultaneous spatial, spectral, and 3D compressive imaging via efficient Fourier single-pixel measurements,” Optica **5**(3), 315–319 (2018).

**22.** R. She, W. Liu, Y. Lu, Z. Zhou, and G. Li, “Fourier single-pixel imaging in the terahertz regime,” Appl. Phys. Lett. **115**(2), 021101 (2019).

**23.** Z. Zhang, X. Wang, G. Zheng, and J. Zhong, “Hadamard single-pixel imaging versus Fourier single-pixel imaging,” Opt. Express **25**(16), 19619–19639 (2017).

**24.** Z. H. Xu, W. Chen, J. Penuelas, M. Padgett, and M. J. Sun, “1000 fps computational ghost imaging using LED-based structured illumination,” Opt. Express **26**(3), 2427–2434 (2018).

**25.** E. Balaguer, P. Carmona, C. Chabert, F. Pla, J. Lancis, and E. Tajahuerce, “Low-cost single-pixel 3D imaging by using an LED array,” Opt. Express **26**(12), 15623–15631 (2018).