
Single-pixel foreground imaging without a priori background sensing

Open Access

Abstract

The single-pixel imaging technique, which differs significantly from conventional multi-pixel imaging, uses the signal recorded by a single-pixel detector together with a stream of structured illumination patterns to reconstruct an image. We design and experimentally demonstrate a real-time single-pixel foreground imaging system that requires fewer samples and no a priori sensing of the background, achieved by performing incremental principal component analysis on the online compressed sampling data. A fast ℓ1 compressed sensing algorithm is adopted to realize real-time foreground imaging at 10 frames per second with an image size of 127 × 127 pixels and a compression ratio of 3%. When applied to a surveillance system that requires long-distance video transmission, this scheme can greatly reduce the compression ratio and allow the system to work with smaller communication bandwidths.

© 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

A surveillance video scene can be divided into two parts: the background and the foreground. The background refers to the static scene of the video image, and the foreground refers to the moving targets [1]. Foreground extraction has attracted great attention in the fields of image processing and computer vision [1–5]. It aims at simultaneously separating the video background and extracting the moving objects from a video stream. Typical applications of background subtraction include object tracking [6–8], urban traffic detection [9], long-term scene monitoring [10], and video compression [11]. In a traditional surveillance system, high-quality scene images are first captured by a camera. The images are then transmitted to the processing terminal through a transmission network. Owing to the high transmission rates required by high-definition images, real-time monitoring at the terminal is impossible over a limited-bandwidth transmission network.

Single-pixel imaging (SPI) is a computational imaging scheme [12] that has emerged in the last decade, and many SPI architectures have been demonstrated for various applications: infrared and terahertz imaging [13,14], microscopy [15], gas leak monitoring [16], wavefront reconstruction [17,18], three-dimensional imaging [19], ultrahigh-precision hyperspectral imaging [20], ghost imaging with machine learning for high-throughput cytometry [21], and quantitative phase microscopy under weak light intensity [22]. Traditional imaging systems need a multi-pixel camera to record the image. In contrast, SPI illuminates the object with a sequence of structured patterns and uses a single-pixel detector to record the light intensity reflected from the object [23]. This imaging technique is inherently cost effective beyond visible optical wavelengths, such as infrared or X-ray wavelengths [24], for which array sensors are expensive. To date, only a few works have addressed foreground extraction in the field of SPI. In 2013, Magana-Loaiza et al. presented a compressive sensing protocol that tracks a moving object using entangled photons [25]. Instead of imaging the moving object, Shi and colleagues extended this scheme to acquire only target locations [26]. Combining the scheme with a complementary modulation technique, Yu et al. retrieved object silhouettes and trajectories with high image quality and strong robustness against changing illumination and noise [27]. However, all these schemes demand a pre-measurement of the background before foreground extraction.

Recently, Sun et al. proposed a temporal intensity difference correlation ghost imaging scheme, in which an image of a moving object could be separated from a complex scene at a compression ratio of $31\%$ [28]. Here, we implement real-time foreground extraction in a surveillance system with the SPI technique, without a priori sensing of the background. In our scheme, the sparsity $K$ of the imaging scene becomes significantly small, which results in fewer measurements. We experimentally realize real-time foreground imaging of $10$ frames per second with an image size of $127\times 127$ pixels and a compression ratio of $3\%$. A fast and stable $\ell _1$ compressed sensing algorithm is used in our scheme.

2. Principles and methods

Reducing the number of samples has always been a major goal of SPI [25,29]. In 2006, Wakin and colleagues at Rice University applied compressed sensing to SPI and achieved image reconstruction from partial information about the scene [30]. Limited by the weak sparsity of real-world scenes, their scheme could not reduce the compression ratio to a value small enough for real-time imaging. In 2016, Soldevila et al. introduced balanced detection with simultaneous complementary illumination [31], which halved the sampling of SPI. Recently, Higham et al. demonstrated convolutional auto-encoder networks that recover real-time $128\times 128$ pixel video at $30$ frames per second with a compression ratio of $2\%$ [32].

Compared with deep learning methods [32], compressed sensing algorithms do not require large amounts of training data. The $\ell _1$ optimization problem that we solve in this work is defined as [33]

$$\textrm{min}_x\|{\Psi}x\|_1, \quad \textrm{subject to} \; {\Phi}x=b,$$
where $x$ denotes the image of the scene, treated as a vector in $\mathbb {R}^N$, and $N$ is the number of pixels. $\Psi$ is a change-of-basis matrix, and ${\Psi }x$ is sparse, ideally. $\Phi$ is the measurement operator, represented as an $M\times N$ matrix, and $b={\Phi }x$ is the $M$-dimensional vector of linear samples of $x$. Provided $\Phi$ and $\Psi$ are sufficiently incoherent (roughly that rows of $\Phi$ cannot be sparsely expressed in terms of columns of ${\Psi }$, and vice versa), the solution of Eq. (1) will be exactly $x$. For the specific convex optimization problem of Eq. (1), $M$ and $N$ should satisfy [34]
$$M{\geqslant}CK\log(N/K),$$
for some constant $C$, where $K$ is the image sparsity $\|{\Psi }x\|_0$. From Eq. (2), we know that the smaller the image sparsity $K$, the fewer the measurements $M$ required for recovering the scene image $x$.
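
For concreteness, the following is a minimal sketch (in Python) that evaluates the bound of Eq. (2) for representative sparsity levels. The constant $C = 1$ is an assumption for illustration only, since Eq. (2) leaves $C$ unspecified:

```python
import math

def min_measurements(K: int, N: int, C: float = 1.0) -> int:
    """Evaluate the right-hand side of Eq. (2): M >= C*K*log(N/K)."""
    return math.ceil(C * K * math.log(N / K))

# Illustrative numbers only (C = 1 is an assumption): for a 127x127 image,
# N = 16129. A sparse foreground with K ~ 500 needs on the order of
# 1.7e3 measurements, whereas a full scene with K ~ 8000 needs ~5.6e3 --
# which is why shrinking K (imaging only the foreground) cuts the sampling.
if __name__ == "__main__":
    N = 127 * 127
    for K in (500, 2000, 8000):
        print(f"K={K:5d}  M>={min_measurements(K, N)}")
```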

The workflow of the SPI-based foreground surveillance system is shown in Fig. 1. The $n$th frame of the real-time scene in Fig. 1(a) is expressed as $x_n$. The scene $x_n$ is sampled by a single-pixel camera, and the data are denoted as $b_n$ in Fig. 1(b). Since compressive measurement is used in this sampling process, the dimension of $b_n$ is small and it can be transmitted to the terminal over a small-bandwidth network. Figure 1(c) depicts the series of data processing procedures after the samples arrive at the terminal. A database stores the sampling data $\{b_n,b_{n-1},b_{n-2}, \ldots , b_1\}$ from $n$ frames, containing the current frame and the previous $n-1$ frames. Since the background of the $n$ frames in the database does not change significantly, we can find the largest principal component ${\Phi }^T b^g$ of the database $\{{\Phi }^T b_n,{\Phi }^T b_{n-1},{\Phi }^T b_{n-2}, \ldots ,{\Phi }^T b_1\}$ with the incremental principal component analysis (IPCA) algorithm [35]. $b^g$ can be expressed as ${\Phi }x^g+\epsilon$, where $x^g$ represents the unknown background and $\epsilon$ is the error induced by estimating the background with the maximum principal component and by the noise in the sampling data $b_n$. As the IPCA needs sufficient data to ensure that the maximum principal component is close to the background, system initialization takes approximately 10 frames. The foreground sampling result of the current frame can be expressed as

$$b_{n}^{f}=\Phi({\Phi}^T b_n-{\Phi}^T b^g).$$


Fig. 1. Schematic of foreground surveillance system based on single-pixel imaging: (a) the scene, (b) the measurement system, and (c) the terminal of the system.

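
To illustrate the terminal-side background update, the sketch below tracks the leading principal component of the back-projected stream $\{{\Phi }^T b_n\}$ in the spirit of the candid covariance-free IPCA of Ref. [35]. The amnesic parameter of the original algorithm is omitted, and reading the background estimate ${\Phi }^T b^g$ as the projection of the current frame onto the leading component is our interpretation of the scheme, not code from the authors:

```python
import numpy as np

class LeadingComponentTracker:
    """Minimal sketch of the first-component CCIPCA update (Ref. [35]),
    without the amnesic parameter. Each input u is a back-projected
    frame Phi^T b_n, flattened to a vector."""

    def __init__(self):
        self.v = None  # unnormalized estimate of the leading eigenvector
        self.n = 0     # number of frames processed

    def update(self, u: np.ndarray) -> np.ndarray:
        """Fold in one frame and return the background estimate Phi^T b^g,
        read here as the projection of u onto the leading component."""
        self.n += 1
        u = u.astype(float)
        if self.v is None:
            self.v = u.copy()  # assumes the first frame is not all zeros
        else:
            n = self.n
            # CCIPCA update: v <- ((n-1)/n) v + (1/n) (u . v / |v|) u
            self.v = (n - 1) / n * self.v \
                     + (u @ self.v / np.linalg.norm(self.v)) / n * u
        e = self.v / np.linalg.norm(self.v)
        return (u @ e) * e

# The foreground sample of Eq. (3) then follows by re-applying Phi:
#   b_f = Phi @ (u - tracker.update(u))
```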

Since the foreground $x_{n}^{f}$ is sufficiently sparse, $\Psi$ in Eq. (1) is directly selected as the identity matrix $I$ instead of the total variation (TV) transform operator used in the traditional scheme [36]. This improves the speed of our image reconstruction algorithm, as the TV transform is not computed in each iteration. In addition, to account for the error $\epsilon$ contained in $b^{f}_{n}$, the foreground $x_{n}^{f}$ is obtained by solving a noise-aware model:

$$\textrm{min}_{x_{n}^{f}}\|x_{n}^{f}\|_1+\frac{1}{2\rho}\|{\Phi}x_{n}^{f}-b_{n}^{f}\|^2_2,$$
where $\rho$ should be chosen close to zero and slightly adjusted according to the noise level. We use the YALL1 algorithm to solve the optimization problem of Eq. (4); YALL1 is one of the fastest algorithms for solving the $\ell _1$ problem [37]. $\Phi$ is selected as a partial Fourier spectrum sampling operator. The method of acquiring the Fourier spectrum of an object with the SPI technique was proposed by Zhang et al. in 2015 [38]. It uses phase-shifting sinusoidal structured illumination for spectrum acquisition, and the scene image is reconstructed by applying the inverse Fourier transform to the obtained spectrum. The differential data processing inherent in the four-step phase-shift method effectively suppresses noise during data acquisition.
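
YALL1 itself is a MATLAB package [37]; as a language-neutral illustration of Eq. (4), the sketch below solves the same unconstrained model with plain iterative soft-thresholding (ISTA), with a masked unitary FFT standing in for the partial Fourier operator $\Phi$. This is an assumed stand-in to show the model's structure, not the authors' solver:

```python
import numpy as np

def soft(z: np.ndarray, t: float) -> np.ndarray:
    """Complex soft-thresholding: shrink magnitudes by t."""
    mag = np.maximum(np.abs(z), 1e-12)
    return np.where(mag > t, (1 - t / mag) * z, 0)

def ista_l1(mask: np.ndarray, b: np.ndarray, rho: float = 0.05,
            n_iter: int = 50) -> np.ndarray:
    """Sketch: min_x ||x||_1 + (1/(2*rho)) ||Phi x - b||_2^2  (Eq. (4)),
    with Phi = mask * unitary 2-D FFT. b holds the sampled Fourier values
    on the mask (zeros elsewhere). Illustrative ISTA, not YALL1."""
    x = np.zeros(mask.shape, dtype=complex)
    step = rho  # for unitary FFT sampling ||Phi||^2 = 1, so 1/L = rho
    for _ in range(n_iter):
        residual = np.fft.fft2(x, norm="ortho") * mask - b   # Phi x - b
        grad = np.fft.ifft2(residual * mask, norm="ortho") / rho  # Phi^T r / rho
        x = soft(x - step * grad, step)
    return x.real
```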

3. Experimental

3.1 Computational simulations

We tested our imaging proposal on two sets of real surveillance video images. The raw data were obtained from two publicly available background subtraction datasets with precisely known ground truths [39,40], and the images were resampled to a resolution of $239\times 239$ pixels. Figures 2(a1)–(a5) and Figs. 2(d1)–(d5) show a series of raw images from the original videos, and Figs. 2(b1)–(b5) and Figs. 2(e1)–(e5) are the foreground ground truths. In the foreground imaging process, the number of samples depends on the size and sparsity of the foreground target. Since the energy of natural scenes is concentrated in the low-frequency region, a radial partial Fourier spectrum with more low-frequency components is chosen as our sampling matrix [33]. To reduce the number of samples while keeping the reconstructed foreground image undistorted, the number of radial lines in the Fourier sampling matrix is chosen for each target; an example of such a sampling matrix is shown in Fig. 4(b). In the computational simulations, the sampling matrix contains 20 radial lines in the Fourier domain. Considering the symmetry of the Fourier spectrum of real functions, we choose a compression ratio of $4\%$: for each reconstructed foreground, $4\%$ of the Fourier spectrum of the raw images is sampled [38]. Then, the method proposed in Section 2 is adopted to update the maximum principal component and extract the foreground sampling result $b^{f}_{n}$. Finally, the foreground image is reconstructed by using Eq. (4) with $\rho =0.05$. In the scenes of Figs. 2(a1)–(a5), one person is walking across the rail from right to left. Figures 2(c1)–(c5) show the reconstructed foreground images.
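
A minimal sketch of how such a radial sampling mask might be generated follows. The exact line-selection rule used by the authors is not specified, so equally spaced diameters through the DC term are an illustrative assumption:

```python
import numpy as np

def radial_fourier_mask(n: int, n_lines: int) -> np.ndarray:
    """Boolean mask of n_lines radial lines (diameters through DC) on an
    n x n centered Fourier grid, in the spirit of Fig. 4(b). The equal
    angular spacing is an illustrative assumption."""
    mask = np.zeros((n, n), dtype=bool)
    c = n // 2
    r = np.arange(-c, n - c)                 # radii along each diameter
    for k in range(n_lines):
        theta = np.pi * k / n_lines          # angles in [0, pi)
        rows = np.clip(c + np.round(r * np.sin(theta)).astype(int), 0, n - 1)
        cols = np.clip(c + np.round(r * np.cos(theta)).astype(int), 0, n - 1)
        mask[rows, cols] = True
    return np.fft.ifftshift(mask)            # move DC to the fft2 origin

# e.g. radial_fourier_mask(239, 20) gives the simulation geometry; every
# line passes through the low-frequency center, matching the observation
# that the energy of natural scenes concentrates at low frequencies.
```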


Fig. 2. Simulated results from two sets of real surveillance videos. (a1)–(a5) and (d1)–(d5) are 10 image frames obtained from raw surveillance videos, (b1)–(b5) and (e1)–(e5) are the corresponding foreground ground-truth. (c1)–(c5) and (f1)–(f5) are the reconstructed foreground using the proposed method with a compression ratio of $4\%$. (g) and (h) are the curves of the mean MSE versus the compression ratio for the two sets of videos.


Figures 2(d1)–(d5) belong to a surveillance video of an airport, where an airplane passes through the field of view from right to left. The sampling matrix is the same as in the above simulation. Figures 2(f1)–(f5) show the reconstructed foreground images. Figures 2(g) and 2(h) show the mean squared error (MSE) between the reconstructed foreground images and the ground truth at different compression ratios, averaged over all frames. The results show that our method with $4\%$ Fourier spectrum sampling works well for reconstructing small moving targets.
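
As we read it, the metric in Figs. 2(g) and 2(h) is a per-pixel squared error averaged over all frames; a one-line sketch (any image normalization is an assumption):

```python
import numpy as np

def mean_mse(recon: list, truth: list) -> float:
    """Per-pixel squared error averaged over all frames, as we read the
    metric of Figs. 2(g)-(h); image normalization is an assumption."""
    return float(np.mean([np.mean((r - t) ** 2) for r, t in zip(recon, truth)]))
```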

3.2 Laboratory experiments

The experimental setup is shown in Fig. 3. An LED lamp illuminates the scene, which is composed of a static background and a moving target located at a distance of approximately $2$ m from the SPI system. The scene is imaged onto the plane of a high-speed digital micromirror device (DMD, Texas Instruments DLP7000) by lens L1. A sequence of spatially dithered binary Fourier patterns [41] is loaded into the DMD memory and displayed with an exposure time of $44$ $\mathrm{\mu}$s per pattern. An amplified photodetector (Thorlabs PDA36A-EC) measures the total light intensity reflected from the DMD. The light intensity data are sampled by an analogue-to-digital converter (NI USB-6361) and then transmitted to a personal computer (AMD Ryzen 7 2700X eight-core CPU at 3.7 GHz and 16 GB of memory) for foreground extraction.


Fig. 3. Experimental setup. L1, L2: lens, Scene: the moving targets and static background, DMD: digital micromirror device, PDA: amplified photodetector, PC: personal computer, ADC: analogue-to-digital converter, M: mirror, LED: white LED light source.


Figure 4 shows the two experimental imaging scenes and the corresponding partial Fourier spectrum sampling matrices. In Figs. 4(a) and 4(d), a toy soldier and a cardboard card drawn with a circular pattern are placed on a stepper motor; they move from left to right and from right to left, respectively. The size of the reconstructed image is set to $127\times 127$ pixels. The sampling matrix of the scene in Fig. 4(a) contains 10 radial lines in the Fourier domain, as shown in Fig. 4(b). Owing to the symmetry of the Fourier spectrum, the experimental sampling matrix can be chosen as in Fig. 4(c), which contains $621$ Fourier components, giving a compression ratio of $4\%$. The foreground target in Fig. 4(d) is smaller than that in Fig. 4(a), so the number of samples can be reduced further. Figure 4(e) is the corresponding sampling matrix, which contains $8$ radial lines in the Fourier domain. Figure 4(f) shows the experimental sampling matrix with $501$ Fourier components, corresponding to a compression ratio of $3\%$.
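
A quick consistency check of the quoted compression ratios (the retained component counts are taken from the text, not re-derived):

```python
# 127 x 127 reconstruction grid:
N = 127 * 127                             # 16129 pixels
print(f"621 components: {621 / N:.1%}")   # -> 3.9%, quoted as 4%
print(f"501 components: {501 / N:.1%}")   # -> 3.1%, quoted as 3%
```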


Fig. 4. Experimental scenes and sampling matrices in the Fourier domain. (a) and (d) are two real scenes recorded by a color camera, in which the moving targets have different sizes. (b) is the sampling matrix corresponding to scene (a); it contains $10$ radial lines in the Fourier domain. (c) is the experimental sampling matrix of the scene when considering the symmetry of the Fourier spectrum of a real function; it contains $621$ Fourier components. (e) is the sampling matrix corresponding to scene (d); it contains $8$ radial lines in the Fourier domain. (f) is the experimental sampling matrix of scene (d); it contains $501$ Fourier components.


Figures 5(k)–5(o) are five sample frames from the raw video data (Visualization 1) with a sampling interval of $5$ frames. The first two rows in Fig. 5 are intermediate results of the image reconstruction process in Fig. 1. Figures 5(a)–5(e) are obtained as ${\Phi }^T b_n$ in Fig. 1, i.e., the direct inverse Fourier transform of the partial spectrum shown in Fig. 4(b). Figures 5(f)–5(j) are obtained by subtracting the maximum principal component of the current and all previous frames from Figs. 5(a)–5(e); they can be expressed as ${\Phi }^T b_n-{\Phi }^T b^g$. To reduce the imaging time, we obtain the Fourier spectrum with a three-step phase shift of the sinusoidal patterns [41] instead of the usual four-step phase shift. To capture $621$ Fourier components, $1863$ patterns are displayed on the DMD, and the corresponding sampling time for each reconstructed image is $0.08$ s. When processing the online data stream, the IPCA does not need to calculate the covariance matrix: the principal component is updated as new data arrive, so the computation time for the largest principal component is negligible. The YALL1 reconstruction consumes most of the data-processing time. The number of main iterations of the YALL1 algorithm for each reconstructed image is less than 50 in all experiments, and the reconstruction takes about $0.04$ s. The real-time imaging frame rate can thus reach $8$ frames per second. Although we have reduced the compression ratio to $4\%$, the sampling time still dominates. If the size of the foreground targets decreases further, the compression ratio can be reduced to achieve a higher real-time frame rate. All experiments were performed in the same noise environment, and the model parameter $\rho$ in Eq. (4) is set to $0.05$.
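
For reference, the sketch below recovers one complex Fourier coefficient from the three phase-shifted single-pixel readings, using the standard three-step formula for phases 0, $2\pi/3$, $4\pi/3$ (up to a constant scale; see Ref. [41] for the exact form used by the authors), together with the timing arithmetic quoted above:

```python
import numpy as np

def three_step_coefficient(d1: float, d2: float, d3: float) -> complex:
    """One Fourier coefficient from three single-pixel measurements taken
    with sinusoidal patterns phase-shifted by 0, 2*pi/3, 4*pi/3. Standard
    three-step formula up to a constant scale (cf. Ref. [41])."""
    return (2 * d1 - d2 - d3) + 1j * np.sqrt(3) * (d2 - d3)

# Timing check with the numbers from the text:
patterns = 621 * 3                  # 1863 patterns per frame
print(patterns * 44e-6)             # ~0.082 s sampling, quoted as 0.08 s
```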


Fig. 5. Sample frames of foreground extraction. (a)–(e) are the results of the inverse Fourier transform after partial Fourier spectral sampling of a moving target, equivalent to ${\Phi }^T b_n$ in Eq. (3). (f)–(j) are obtained by removing the largest principal component from (a)–(e), respectively; they are the results of ${\Phi }^T b_n-{\Phi }^T b^g$ in Eq. (3). $b_{n}^{f}$ in Eq. (4) is obtained by applying the measurement operator $\Phi$ to (f)–(j), and the sample frames (k)–(o) from the raw video data (Visualization 1) are reconstructed by solving the optimization problem of Eq. (4).


Figures 6(a)–6(d) are four frames selected from the raw video data (Visualization 2, Visualization 3, Visualization 4, and Visualization 5), showing four different toy soldiers moving from left to right. Most of the details of the toy soldiers can be reconstructed. Figures 6(e)–6(h) show samples from the raw video data (Visualization 6, Visualization 7, Visualization 8, and Visualization 9) of four different cards driven by a stepper motor from right to left in the scene shown in Fig. 4(d). Drawn on the four cards are a circle, a triangle, the letter S, and the letter C, respectively. As these foreground targets are smaller than the toy soldiers, we can switch the sampling matrix to the one shown in Fig. 4(f). By reducing the number of samples and properly adjusting the stopping condition of the YALL1 algorithm, we achieve real-time foreground imaging of $10$ frames per second.


Fig. 6. Foreground extraction with different scenes. (a)–(d) are four frames taken from the raw video data (Visualization 2, Visualization 3, Visualization 4, and Visualization 5), respectively, which are real-time imaging results of four different toy soldiers moving from left to right in the scene shown in Fig. 4(a). (e)–(h) are four frames taken from the raw video data (Visualization 6, Visualization 7, Visualization 8, and Visualization 9), respectively, which are real-time imaging results of four different cards moving from right to left in the scene shown in Fig. 4(d). The insets in the upper left corner of each result show the experimental targets.


In actual surveillance scenarios, targets generally appear in areas with complicated backgrounds, such as roads and sidewalks, whose components or colors may change drastically. The quality of the extracted foreground image often deteriorates when the target passes through such complicated background regions. We experimentally constructed a scene with a sharply changing background, shown in Fig. 7(a), where a piece of brown cardboard is placed between the white background and the toy soldier. The toy soldier passes the brown cardboard from right to left. Figures 7(b)–7(d) show three experimental frames from the raw video data (Visualization 10), representing the partial entry of the target into the cardboard area, full entry into the cardboard area, and exit from the cardboard area, respectively. Although the sharply changing background affects the reconstructed foreground, the outline of the target remains clear, demonstrating the strong robustness of the proposed method against sharply changing backgrounds.


Fig. 7. Sample frames from the raw video data (Visualization 10) when the background of the scene changes drastically. (a) The scene captured by a color camera. (b) Part of the target enters the area with the cardboard background (changing background area). (c) Target fully enters the changing background area. (d) Part of the target leaves the changing background area.


4. Conclusion

In this work, we experimentally demonstrated a foreground surveillance system based on the single-pixel imaging technique, which greatly decreases the compression ratio. More importantly, the images of the foreground targets can be reconstructed without a priori background sensing. Combining compressive sensing and IPCA, we achieved real-time foreground monitoring of $10$ frames per second with an image size of $127\times 127$ pixels and a compression ratio of $3\%$. Our scheme greatly reduces the bandwidth requirements for transmitting sampled data. Furthermore, the wide operational optical bandwidth ($400-2500$ nm) of the DMD could allow the foreground monitoring system to be extended to non-visible wavelengths, such as infrared, enabling the surveillance system to work at night. A limitation of the proposed method is its weak capability to separate a complex foreground from a complex background, such as a dynamic background with illumination changes, smog, or snow. Temporal artifacts occasionally appear in a few reconstructed frames; they arise from the IPCA, a non-convex algorithm whose accuracy cannot be guaranteed theoretically. In future work, we will focus on developing better-performing algorithms for more complex background subtraction problems.

Funding

National Natural Science Foundation of China (11534008, 11971374, 91736104); Ministry of Science and Technology of the People's Republic of China (2016YFA0301404); Natural Science Foundation of Shaanxi Province (2017JQ1025); Doctoral Program Foundation of Institutions of Higher Education of China (2016M592772); Fundamental Research Funds for the Central Universities.

Disclosures

The authors declare that there are no conflicts of interest related to this article.

References

1. M. Piccardi, “Background subtraction techniques: a review,” in 2004 IEEE International Conference on Systems, Man and Cybernetics (IEEE Cat. No. 04CH37583), vol. 4 (IEEE, 2004), pp. 3099–3104.

2. T. Bouwmans, “Recent advanced statistical background modeling for foreground detection-a systematic survey,” Recent Pat. Comput. Sci. 4(3), 147–176 (2011). [CrossRef]

3. S. Brutzer, B. Höferlin, and G. Heidemann, “Evaluation of background subtraction techniques for video surveillance,” in CVPR 2011, (IEEE, 2011), pp. 1937–1944.

4. Y. Benezeth, P.-M. Jodoin, B. Emile, H. Laurent, and C. Rosenberger, “Comparative study of background subtraction algorithms,” J. Electron. Imaging 19(3), 033003 (2010). [CrossRef]  

5. A. Sobral and A. Vacavant, “A comprehensive review of background subtraction algorithms evaluated with synthetic and real videos,” Comput. Vis. Image Underst. 122, 4–21 (2014). [CrossRef]

6. C. Beleznai, B. Fruhstuck, and H. Bischof, “Multiple object tracking using local pca,” in 18th International Conference on Pattern Recognition (ICPR’06), vol. 3 (IEEE, 2006), pp. 79–82.

7. R. Chen, Y. Tong, J. Yang, and M. Wu, “Video foreground detection algorithm based on fast principal component pursuit and motion saliency,” Comput. Intell. Neurosci. 2019, 1–11 (2019). [CrossRef]

8. C. Guyon, T. Bouwmans, and E.-H. Zahzah, “Foreground detection by robust pca solved via a linearized alternating direction method,” in International Conference Image Analysis and Recognition, (Springer, 2012), pp. 115–122.

9. S.-C. S. Cheung and C. Kamath, “Robust background subtraction with foreground validation for urban traffic video,” EURASIP J. Adv. Signal Process. 2005(14), 726261 (2005). [CrossRef]  

10. A. W. Senior, Y. Tian, and M. Lu, “Interactive motion analysis for video surveillance and long term scene monitoring,” in Asian Conference on Computer Vision, (Springer, 2010), pp. 164–174.

11. W. Cao, Y. Wang, J. Sun, D. Meng, C. Yang, A. Cichocki, and Z. Xu, “Total variation regularized tensor rpca for background subtraction from compressive measurements,” IEEE Trans. on Image Process. 25(9), 4075–4090 (2016). [CrossRef]  

12. M. P. Edgar, G. M. Gibson, and M. J. Padgett, “Principles and prospects for single-pixel imaging,” Nat. Photonics 13(1), 13–20 (2019). [CrossRef]  

13. M. P. Edgar, G. M. Gibson, R. W. Bowman, B. Sun, N. Radwell, K. J. Mitchell, S. S. Welsh, and M. J. Padgett, “Simultaneous real-time visible and infrared video with single-pixel detectors,” Sci. Rep. 5(1), 10669 (2015). [CrossRef]  

14. D. Shrekenhamer, C. M. Watts, and W. J. Padilla, “Terahertz single pixel imaging with an optically controlled dynamic spatial light modulator,” Opt. Express 21(10), 12507–12518 (2013). [CrossRef]  

15. N. Radwell, K. J. Mitchell, G. M. Gibson, M. P. Edgar, R. Bowman, and M. J. Padgett, “Single-pixel infrared and visible microscope,” Optica 1(5), 285–289 (2014). [CrossRef]  

16. G. M. Gibson, B. Sun, M. P. Edgar, D. B. Phillips, N. Hempler, G. T. Maker, G. P. Malcolm, and M. J. Padgett, “Real-time imaging of methane gas leaks using a single-pixel camera,” Opt. Express 25(4), 2998–3005 (2017). [CrossRef]  

17. R. Liu, S. Zhao, P. Zhang, H. Gao, and F. Li, “Complex wavefront reconstruction with single-pixel detector,” Appl. Phys. Lett. 114(16), 161901 (2019). [CrossRef]  

18. S. Zhao, R. Liu, P. Zhang, H. Gao, and F. Li, “Fourier single-pixel reconstruction of a complex amplitude optical field,” Opt. Lett. 44(13), 3278–3281 (2019). [CrossRef]  

19. M.-J. Sun, M. P. Edgar, G. M. Gibson, B. Sun, N. Radwell, R. Lamb, and M. J. Padgett, “Single-pixel three-dimensional imaging with time-based depth resolution,” Nat. Commun. 7(1), 12010 (2016). [CrossRef]  

20. K. Shibuya, T. Minamikawa, Y. Mizutani, H. Yamamoto, K. Minoshima, T. Yasui, and T. Iwata, “Scan-less hyperspectral dual-comb single-pixel-imaging in both amplitude and phase,” Opt. Express 25(18), 21947–21957 (2017). [CrossRef]  

21. S. Ota, R. Horisaki, Y. Kawamura, M. Ugawa, I. Sato, K. Hashimoto, R. Kamesawa, K. Setoyama, S. Yamaguchi, K. Fujiu, K. Waki, and H. Noji, “Ghost cytometry,” Science 360(6394), 1246–1251 (2018). [CrossRef]

22. K. Shibuya, H. Araki, and T. Iwata, “Photon-counting-based diffraction phase microscopy combined with single-pixel imaging,” Jpn. J. Appl. Phys. 57(4), 042501 (2018). [CrossRef]  

23. S. M. Khamoushi, Y. Nosrati, and S. H. Tavassoli, “Sinusoidal ghost imaging,” Opt. Lett. 40(15), 3452–3455 (2015). [CrossRef]  

24. A.-X. Zhang, Y.-H. He, L.-A. Wu, L.-M. Chen, and B.-B. Wang, “Tabletop x-ray ghost imaging with ultra-low radiation,” Optica 5(4), 374–377 (2018). [CrossRef]  

25. O. S. Magana-Loaiza, G. A. Howland, M. Malik, J. C. Howell, and R. W. Boyd, “Compressive object tracking using entangled photons,” Appl. Phys. Lett. 102(23), 231104 (2013). [CrossRef]  

26. D. Shi, K. Yin, J. Huang, K. Yuan, W. Zhu, C. Xie, D. Liu, and Y. Wang, “Fast tracking of moving objects using single-pixel imaging,” Opt. Commun. 440, 155–162 (2019). [CrossRef]  

27. W.-K. Yu, X.-R. Yao, X.-F. Liu, L.-Z. Li, and G.-J. Zhai, “Compressive moving target tracking with thermal light based on complementary sampling,” Appl. Opt. 54(13), 4249–4254 (2015). [CrossRef]  

28. S. Sun, H. Lin, Y. Xu, J. Gu, and W. Liu, “Tracking and imaging of moving objects with temporal intensity difference correlation,” Opt. Express 27(20), 27851–27861 (2019). [CrossRef]  

29. O. Katz, Y. Bromberg, and Y. Silberberg, “Compressive ghost imaging,” Appl. Phys. Lett. 95(13), 131110 (2009). [CrossRef]  

30. M. B. Wakin, J. N. Laska, M. F. Duarte, D. Baron, S. Sarvotham, D. Takhar, K. F. Kelly, and R. G. Baraniuk, “An architecture for compressive imaging,” in 2006 International Conference on Image Processing, (IEEE, 2006), pp. 1273–1276.

31. F. Soldevila, P. Clemente, E. Tajahuerce, N. Uribe-Patarroyo, P. Andrés, and J. Lancis, “Computational imaging with a balanced detector,” Sci. Rep. 6(1), 29181 (2016). [CrossRef]  

32. C. F. Higham, R. Murray-Smith, M. J. Padgett, and M. P. Edgar, “Deep learning for real-time single-pixel video,” Sci. Rep. 8(1), 2369 (2018). [CrossRef]  

33. E. Candes, J. Romberg, and T. Tao, “Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Inf. Theory 52(2), 489–509 (2006). [CrossRef]  

34. R. Chartrand, “Nonconvex compressive sensing and reconstruction of gradient-sparse images: random vs. tomographic fourier sampling,” in 2008 15th IEEE International Conference on Image Processing, (IEEE, 2008), pp. 2624–2627.

35. J. Weng, Y. Zhang, and W.-S. Hwang, “Candid covariance-free incremental principal component analysis,” IEEE Trans. Pattern Anal. Machine Intell. 25(8), 1034–1040 (2003). [CrossRef]  

36. J. Yang, Y. Zhang, and W. Yin, “A fast alternating direction method for tvl1-l2 signal reconstruction from partial fourier data,” IEEE J. Sel. Top. Signal Process. 4(2), 288–297 (2010). [CrossRef]  

37. “YALL1 package,” http://yall1.blogs.rice.edu/ (2010).

38. Z. Zhang, X. Ma, and J. Zhong, “Single-pixel imaging by means of fourier spectrum acquisition,” Nat. Commun. 6(1), 6225 (2015). [CrossRef]  

39. “walker,” http://www.cs.cmu.edu/yaser/new_backgroundsubtraction.htm (2010).

40. “airport,” http://www.agvs-caac.com/ (2010).

41. Z. Zhang, X. Wang, G. Zheng, and J. Zhong, “Fast fourier single-pixel imaging via binary illumination,” Sci. Rep. 7(1), 12029 (2017). [CrossRef]  

Supplementary Material (10)

Visualization 1
Visualization 2
Visualization 3
Visualization 4
Visualization 5
Visualization 6
Visualization 7
Visualization 8
Visualization 9
Visualization 10
