
Depth acquisition in single-pixel imaging with multiplexed illumination

Open Access

Abstract

Single-pixel imaging (SPI) has drawn wide attention due to its high signal-to-noise ratio and wide working spectrum, providing a feasible solution when array sensors are expensive or not available. In conventional SPI, the target’s depth information is lost in the acquisition process due to the 3D-to-1D projection. In this work, we report an efficient depth acquisition method that enables existing SPI systems to obtain reflectance and depth information without any additional hardware. The technique employs a multiplexed illumination strategy containing both random and sinusoidal codes, which simultaneously encodes the target’s spatial and depth information into a single measurement sequence. In the reconstruction phase, we build a convolutional neural network to decode both spatial and depth information from the 1D measurements. Compared to the conventional scene acquisition method, the end-to-end deep-learning reconstruction reduces both the sampling ratio (to 30%) and the computational complexity (by two orders of magnitude). Both simulations and experiments validate the method’s effectiveness and high efficiency for additional depth acquisition in single-pixel imaging without additional hardware.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Single-pixel imaging (SPI) is an emerging computational imaging scheme that uses a single-pixel detector to acquire images [1–3]. It modulates the target light field with a series of patterns, and acquires a measurement sequence of the converged encoded light using a single-pixel detector. A reconstruction algorithm (such as linear correlation, compressed sensing, or deep learning) [4–6] is then employed to decode the target image from the one-dimensional measurements. SPI maintains a high signal-to-noise ratio and a wide working spectrum, providing a feasible solution when array sensors are expensive or not available. It has been applied in multiple fields such as multispectral imaging [7,8], gas detection [9], and object classification [10].

However, the target’s depth information is lost in conventional SPI, even though it is crucial for a number of applications such as robotics and virtual reality [11]. Because the modulation is only two-dimensional in space and lacks depth encoding, the decoded image contains only 2D information. A few strategies have been proposed to empower SPI with depth acquisition. Fang et al. borrowed the parallax idea from 3D modeling [12], and employed multiple single-pixel detectors placed at different locations to acquire images from different view angles [13,14]. These images with parallax are fused together to produce the target’s 3D information. Zhang et al. introduced an optical grating into the SPI system for depth modulation, which enables depth recovery from the reconstructed 2D image with anamorphic grating stripes using conventional fringe phase estimation algorithms [15,16]. Sun et al. employed a time-of-flight laser with ultra-short pulses, which enables the single-pixel detector to distinguish photons with different arrival times; the depth information is then extracted from the time dimension [17]. Although these techniques enable depth acquisition for SPI, they require additional hardware, including multiple single-pixel detectors, an optical grating, or a time-of-flight laser, which increases both system cost and experimental effort.

In this work, we report an efficient depth acquisition method for SPI without any additional hardware. The technique employs a multiplexed illumination strategy combining random and sinusoidal encoding, which simultaneously modulates the spatial and depth information of the target. The three-dimensional information is thereby coupled into a one-dimensional measurement sequence of the single-pixel detector. A convolutional neural network is built to reconstruct both reflectance and depth information directly from the 1D measurements. Compared to the conventional iterative SPI algorithms, the end-to-end framework effectively decreases both the number of measurements and the computational complexity.

2. Method

The reported SPI scheme is sketched in Fig. 1, and its measurement formation is presented in Fig. 2. The multiplexed modulation contains two components, a random code and a sinusoidal code, which together encode the 2D spatial and depth information. Mathematically, the multiplexed modulation is denoted as

$$\boldsymbol{P}^{k}=\boldsymbol{P}_{r}^{k} \odot \boldsymbol{P}_{s},$$
where $\odot$ denotes Hadamard product, $\boldsymbol {P}_{r}^{k}\in \boldsymbol {R}^{n \times n}$ is the $k$th random coding pattern, and $\boldsymbol {P}_{s} \in \boldsymbol {R}^{n \times n}$ represents the sinusoidal code as
$$P_{s}(x, y)=a+b \sin \left(u x+v y+\varphi_{0}\right),$$
where $(x,y)$ denotes the space coordinate, $a$ is the background intensity, $b$ is the stripe’s amplitude, $u$ and $v$ are its angular frequencies, and $\varphi _{0}$ is the initial phase.
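
As a concrete illustration, the following minimal NumPy sketch generates a multiplexed pattern according to Eqs. (1)-(2). The 64 × 64 pixel size and the angular frequencies follow the simulation settings of Section 3; the constant background value and the function names are our own illustrative assumptions.

```python
import numpy as np

def sinusoidal_pattern(n=64, a=0.5, b=0.5, u=np.pi / 5, v=np.pi / 5, phi0=0.0):
    """P_s(x, y) = a + b * sin(u*x + v*y + phi0), Eq. (2)."""
    x, y = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return a + b * np.sin(u * x + v * y + phi0)

def multiplexed_pattern(n=64, rng=None):
    """P^k = P_r^k (element-wise) P_s, Eq. (1)."""
    rng = np.random.default_rng() if rng is None else rng
    p_r = rng.integers(0, 2, size=(n, n)).astype(float)  # binary random code
    return p_r * sinusoidal_pattern(n)                    # Hadamard product

patterns = [multiplexed_pattern() for _ in range(1229)]   # about 30% of 64 x 64 pixels
```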


Fig. 1. The reported single-pixel imaging scheme. The light is modulated by a series of multiplexed patterns with both random and sinusoidal codes. The reflected light from the target is collected by a single-pixel detector. The measurement sequence is input into a neural network for the end-to-end reconstruction of both reflectance and depth information.



Fig. 2. The measurement formation of the reported method. The left part of the figure represents an example of the multiplexed pattern ${P}^{k}$ modulating the scene (depth $H(x, y)$ and reflectance $O(x, y)$). The middle part shows the modulation $\boldsymbol {P_{s}}^{\prime }$ and $\boldsymbol {P^{\prime k}}_{r}$ deformed as depth changes. The right part indicates the measurement sequence $M$ collected by a single-pixel detector, which contains both depth and reflectance information of the target.


Due to the off-axis setting of illumination and detection, the multiplexed patterns are deformed with depth variation. The sinusoidal stripes become

$$P_{s}^{\prime}(x, y)=a+b \sin \left(u x+v y+\Delta \varphi(x, y)+\varphi_{0}\right),$$
where $\Delta \varphi (x, y)$ is a function of the target’s depth distribution $H(x,y)$ as [18]
$$\Delta \varphi(x, y)=\frac{2 \pi l \tan \alpha H(x, y)}{t[l-H(x, y)]},$$
where $l$ is the distance between detector and target, $\alpha$ is the path angle between illumination and detection, and $t$ is the sinusoidal period. The encoded light is collected by a single-pixel detector, and the depth information is coupled into the measurement denoted as
$$M^{k}=\sum_{(x, y)} O(x, y) \odot P^{\prime k}(x, y) = \sum_{(x, y)} O(x, y) \odot P_{s}^{\prime}(x, y) \odot P_{r}^{\prime k}(x, y),$$
where $O(x,y)$ is the target’s 2D spatial reflectance.
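
The measurement model of Eqs. (3)-(5) can be simulated as in the sketch below. The distance $l$ is a placeholder value, and the deformation of the random code itself is neglected in this simplified sketch.

```python
import numpy as np

def deformed_sinusoid(H, a=0.5, b=0.5, u=np.pi / 5, v=np.pi / 5, phi0=0.0,
                      l=1000.0, alpha=np.pi / 12, t=10.7 / np.sqrt(2)):
    """P'_s(x, y) with the depth-dependent phase shift of Eqs. (3)-(4)."""
    n = H.shape[0]
    x, y = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    delta_phi = 2 * np.pi * l * np.tan(alpha) * H / (t * (l - H))  # Eq. (4)
    return a + b * np.sin(u * x + v * y + delta_phi + phi0)        # Eq. (3)

def single_pixel_measurements(O, H, random_codes):
    """M^k = sum over (x, y) of O * P'_s * P_r^k for each random code, Eq. (5)."""
    p_s = deformed_sinusoid(H)
    return np.array([np.sum(O * p_s * p_r) for p_r in random_codes])
```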

In order to efficiently reconstruct the depth $H(x,y)$ and reflectance $O(x,y)$ from the one-dimensional measurement sequence $M(k)$, we build an end-to-end convolutional neural network as shown in Fig. 3. The depth reconstruction subnet contains two parts, including the self-encoding subnet (CONV1) and the parallel residual subnet (CONV2). CONV1 first uses a fully connected layer to transform the one-dimensional measurements into a two-dimensional spatial representation, and then employs three-dimensional convolution kernels to extract target features. CONV2 consists of two parallel residual subnets, each consisting of a set of residual blocks and convolution blocks. The parallel framework enables effective recovery of the phase variation ($\Delta \varphi$) caused by the stripe deformation ($P_{s}^{\prime }$) [19], from which the corresponding depth distribution ($H$) is output. The residual blocks help avoid training saturation and gradient vanishing [20]. For reflectance reconstruction, CONV3 contains a fully connected layer, three 3D convolution layers, and a set of residual blocks.


Fig. 3. The end-to-end reconstruction network. The depth reconstruction subnet includes the self-encoding subnet (CONV1) and the parallel residual subnet (CONV2). The CONV1 contains a fully connected layer and three 3D convolution layers. The convolution kernel sizes are (9 $\times$ 9 $\times$ 1), (1 $\times$ 1 $\times$ 64) and (5 $\times$ 5 $\times$ 32), respectively. The CONV2 consists of two parallel residual subnets, each containing a set of residual blocks and convolution blocks. A concatenate layer is employed to connect the two subnets and output the reconstructed depth map. The reflectance reconstruction subnet (CONV3) contains a fully connected layer, three 3D convolution layers, and a set of residual blocks.
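
For reference, below is a minimal PyTorch sketch of the depth branch (CONV1 and CONV2) as we read Fig. 3. The number of residual blocks, the channel widths, and the interpretation of the (k × k × c) kernel sizes as 2D convolutions over c feature channels are our assumptions, since the text does not fully specify them; CONV3 for reflectance follows the same fully-connected + convolution + residual structure and is omitted for brevity.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return torch.relu(x + self.body(x))  # skip connection eases training

class DepthNet(nn.Module):
    def __init__(self, n_meas, img_size=64):
        super().__init__()
        self.img_size = img_size
        # CONV1: a fully connected layer maps the 1D measurements to a 2D map,
        # followed by three convolution layers (kernel sizes as in Fig. 3).
        self.fc = nn.Linear(n_meas, img_size * img_size)
        self.conv1 = nn.Sequential(
            nn.Conv2d(1, 64, 9, padding=4), nn.ReLU(inplace=True),
            nn.Conv2d(64, 32, 1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, 5, padding=2), nn.ReLU(inplace=True))
        # CONV2: two parallel residual branches, concatenated before the output.
        self.branch_a = nn.Sequential(*[ResidualBlock(32) for _ in range(3)])
        self.branch_b = nn.Sequential(*[ResidualBlock(32) for _ in range(3)])
        self.out = nn.Conv2d(64, 1, 3, padding=1)

    def forward(self, m):
        x = self.fc(m).view(-1, 1, self.img_size, self.img_size)
        x = self.conv1(x)
        x = torch.cat([self.branch_a(x), self.branch_b(x)], dim=1)
        return self.out(x)  # reconstructed depth map H
```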


3. Simulation

We employed the ShapeNet dataset [21] (available at https://www.shapenet.org/) to train and test the network. The dataset contains 55 common object categories with 51,300 3D models. We randomly took 10,000 samples for training, 3,800 samples for validation, and 1,200 samples for testing. The training, validation, and testing samples are independent of each other with no duplicates. The single-pixel measurements were synthesized following Eq. (5). Each illumination pattern and image contain $64 \times 64$ pixels. The sinusoidal parameters are as follows: $a$ equals the 2D frontal intensity map of the dataset, $b=0.5$, both $u$ and $v$ equal $\pi / 5$, $\varphi_{0}=0$, and $t=\frac {10.7}{\sqrt {2}}$. The random code was randomly generated with values of $0$ and $1$. Referring to the classical crossed-optical-axes geometry of the phase measuring profilometry (PMP) technique [22], the off-axis angle between illumination and detection was set to $\alpha =\pi / 12$. We note that the dataset is not limited to the above models, and the number of samples can be adjusted according to specific requirements. Owing to the generalization ability of the network, these training samples are sufficient for testing on everyday objects, as validated by the following experiments. We have made the dataset and network public at bianlab.github.io for non-commercial use.

During network training, the training parameters were empirically set as follows: the learning rate was set to 0.0001, the batch size was set to 120, and all the bias terms were initialized to 0. We used the ReLU function [23] for activation, and the network’s parameters were updated by the adaptive moment estimation (ADAM) optimization technique [24]. The loss function follows the Huber loss [25] as $L_{total}=\sigma _{MSE} L_{MSE}+\sigma _{MAE} L_{MAE}$. In practice, we set $\sigma _{MSE}=0.999$ and $\sigma _{MAE}=0.001$ for fast convergence, which took around 100 epochs. We implemented the network under the PyTorch framework [26]. The computer is equipped with an Intel i7-9700K processor (3.6 GHz), 64 GB RAM, and an NVIDIA RTX 2080 Ti graphics card. The training process took 80.98 seconds for 10 epochs.
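
A sketch of this training configuration is given below, reusing the DepthNet sketch above. The measurement count for a 30% sampling ratio (about 1229 patterns for 64 × 64 pixels) and the structure of the training loop are our assumptions beyond the stated hyperparameters.

```python
import torch
import torch.nn as nn

model = DepthNet(n_meas=1229)  # roughly 30% of 64 * 64 = 4096 pixels
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
mse_fn, mae_fn = nn.MSELoss(), nn.L1Loss()

def total_loss(pred, target, w_mse=0.999, w_mae=0.001):
    # L_total = sigma_MSE * L_MSE + sigma_MAE * L_MAE
    return w_mse * mse_fn(pred, target) + w_mae * mae_fn(pred, target)

# One training step (the data loader with batch size 120 is omitted):
# optimizer.zero_grad()
# loss = total_loss(model(measurement_batch), depth_batch)
# loss.backward()
# optimizer.step()
```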

First, we studied the effect of the sampling ratio on the depth reconstruction accuracy of various methods. The sampling ratio is defined as the ratio between the number of patterns and the number of pixels in the 2D reflectance image. The mean squared error (MSE) was used to quantify reconstruction accuracy. We chose different sampling ratios and trained the corresponding networks. The testing results are shown in Fig. 4. The reconstruction results of the conventional SPI algorithms (including the alternating projection (AP) [27], discrete cosine transform (DCT) [1], and total variation regularization (TV) [4] methods) are also presented for comparison, with the fringe analysis method [15] employed for depth recovery. We can see that the reported deep learning technique requires the lowest sampling ratio ($30\%$) to reach a given reconstruction accuracy (MSE $<0.02$), with the highest reconstruction efficiency ($\sim$0.04 s). In comparison, the conventional algorithms require roughly two orders of magnitude more computation and a larger sampling ratio. The nonmonotonic trend of AP+Fringe and DCT+Fringe arises because, when the sampling ratio is below 0.8, the depth information cannot be reliably recovered by the conventional algorithms, so the reconstruction errors show no regular pattern. When the sampling ratio exceeds 0.8, the MSE gradually decreases and levels off as the sampling ratio increases.
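
For clarity, the two quantities plotted in Fig. 4 can be written out explicitly as below; the function names are illustrative.

```python
import numpy as np

def sampling_ratio(num_patterns, img_size=64):
    # ratio between pattern number and pixel number of the 2D image
    return num_patterns / (img_size * img_size)  # e.g. 1229 / 4096 ~ 0.30

def depth_mse(h_rec, h_gt):
    # mean squared error between reconstructed and ground-truth depth maps
    return np.mean((np.asarray(h_rec) - np.asarray(h_gt)) ** 2)
```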


Fig. 4. Depth reconstruction accuracy under different sampling ratios. (a) Comparison of depth reconstruction accuracy among various methods at different sampling ratios (from 0.1 to 3). The curves plot the depth MSE, and the table shows the reconstruction time at $30\%$ sampling ratio. (b) The exemplar recovered depth of the TV algorithm and our method.


Second, we investigated the network’s robustness to measurement noise. We added different levels of Gaussian white noise to the measurements, leading to different signal-to-noise ratios (SNR) of the input. The reconstruction results of both depth and reflectance are presented in Fig. 5. The error map shows the absolute difference between the reconstructed map and the ground truth, with the sampling ratio being $30\%$ and the input SNR being 30 dB. We can see that the network effectively retrieves both reflectance and depth from the single-pixel measurements, even in the presence of measurement noise. The noise test on the entire testing dataset (1,200 samples) is shown in Fig. 5(b), where the red lines mark the averages over all samples. The structural similarity index (SSIM) is higher than 0.8 for almost all samples at the different SNR levels, and the MSE is lower than 0.02. The results validate that the network maintains robustness to measurement noise as well as generalization across different samples.
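
The noise test can be reproduced with a short routine such as the one below, which scales additive white Gaussian noise to a target SNR of the measurement sequence; the exact noise-generation procedure is not detailed in the text, so this is an assumption.

```python
import numpy as np

def add_measurement_noise(measurements, snr_db=30.0, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    m = np.asarray(measurements, dtype=float)
    signal_power = np.mean(m ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))  # SNR given in dB
    noise = rng.normal(0.0, np.sqrt(noise_power), size=m.shape)
    return m + noise
```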


Fig. 5. Simultaneous reflectance and depth reconstruction on synthetic measurements using the reported technique. (a) Exemplar reconstructed images and corresponding error maps of different targets in the testing set, with the sampling ratio being $30\%$ and the input SNR being 30dB. (b) Quantitative reconstruction results under different levels of measurement noise for all the testing samples. The red line marks the average value of SSIM and MSE of all the testing samples.


4. Experiment

To further validate the effectiveness of the reported technique, we built a proof-of-concept setup. A projector (X416C XGA, Panasonic) was employed to provide the multiplexed illumination, and a single-pixel detector (PDA100A2, Thorlabs) was used to collect the light reflected from the target. Their placement geometry was the same as that in the above simulation. Because the actual light-intensity measurements are difficult to match directly to the simulated ones, we calibrated the intensity response in advance and subtracted the fixed background intensity from the measurements so that the values conform to the theoretical model. The illumination pattern from the projector was zoomed to match different sample sizes, with the off-axis angle between illumination and detection kept unchanged. The sampling ratio was set to 30$\%$. To ensure successful reconstruction of depth details, we set the illumination pattern to 160$\times$160 pixels, with the same angular frequencies ($u$ and $v$) as in the above simulation.
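
A minimal sketch of this calibration step is given below, assuming an affine intensity-response model (a fixed background offset and a gain factor obtained from the prior calibration); the actual calibration procedure and values are specific to the setup.

```python
import numpy as np

def calibrate_measurements(raw, background, gain=1.0):
    # subtract the fixed background intensity and rescale to match Eq. (5)
    return gain * (np.asarray(raw, dtype=float) - background)
```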

In the first experiment, we used a hemispherical gypsum sample as the target to test the technique’s depth reconstruction accuracy. The hemisphere’s radius is 12 mm, as shown in the top row of Fig. 6(a). With the multiplexed illumination projected on the sample, its depth information was coupled into the single-pixel measurements. The recovered depth map is shown in Fig. 6(b), with the corresponding depth profile across the hemisphere’s center presented in Fig. 6(c). We can see that the depth profile coincides well with the ground truth. We note that there is some deviation at the interface between the target and the background. This may be because the off-axis viewing angle introduces shading at the interface, which reduces reconstruction accuracy [22].


Fig. 6. Depth acquisition experiments on a hemisphere sample and a plaster face sample. (a) The two samples. (b) The reconstructed depth distributions using the reported technique. (c) The corresponding depth profiles.


In the second experiment, we used a plaster face sample as the target to validate the technique’s depth acquisition ability on complex objects. As shown in the bottom row of Fig. 6(a), the sample is 100 mm in length, 90 mm in width, and 42 mm in maximum height. The reconstructed depth map and the corresponding face profile are presented in Fig. 6(b) and Fig. 6(c). We can see that the facial details, including the lips and nose, were successfully recovered.

5. Conclusion

In summary, we have demonstrated an efficient technique that enables existing SPI systems to obtain additional depth information, without any additional hardware or increased measurements. The difference between the reported technique and conventional SPI lies in the multiplexed illumination strategy that combines both random and sinusoidal codes. The modulation couples the target’s 3D information into 1D measurements in a multiplexing manner. An end-to-end neural network is built to efficiently decode both depth and reflectance distributions using deep learning, tackling the tradeoff between high reconstruction accuracy and low computational complexity.

We would like to note that although the random pattern also deforms as the depth varies, the multiplexed illumination is still required to recover both the 2D reflectance and the depth information, because the reconstruction task would be too ill-posed and underdetermined if only the random pattern were applied. We further explain the reason from the perspective of the Fourier spectrum. If the sinusoidal pattern is not multiplexed, both the 2D reflectance and the depth information are buried in the low-frequency region, and the depth information is hard to extract. When the sinusoidal pattern is introduced, the depth information is separated from the reflectance and shifted to the high-frequency region, where it can be recovered algorithmically.
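
A small numerical illustration of this spectral argument is given below (values are illustrative only): multiplying a smooth image by a sinusoidal carrier shifts its spectrum toward the carrier frequency, so the fringe-encoded depth content is separated from the low-frequency reflectance content.

```python
import numpy as np

n = 64
x, y = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
image = np.exp(-((x - n / 2) ** 2 + (y - n / 2) ** 2) / 200.0)  # smooth, low-frequency object
carrier = np.sin(np.pi / 5 * x + np.pi / 5 * y)                  # sinusoidal pattern

spec_plain = np.fft.fftshift(np.fft.fft2(image))            # energy concentrated near DC
spec_modulated = np.fft.fftshift(np.fft.fft2(image * carrier))
# The energy of spec_modulated concentrates around the carrier frequencies (+/-u, +/-v)
# instead of the DC peak, which is why the depth-encoding fringes can be isolated.
```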

Compared with conventional 3D imaging systems using array sensors, the reported technique inherits the advantages of SPI, including a high signal-to-noise ratio and a wide working spectrum. Although its depth accuracy may not be comparable to methods using multiple detectors [12–14] or the LiDAR method [17], it maintains high efficiency, requiring no additional hardware or extra measurements while offering a short reconstruction time and a low sampling ratio, which provides potential for high-throughput imaging.

The technique can be further extended. First, the multi-rate input strategy [28] can be adopted to improve the network’s generalization across different sampling ratios. Second, multiplexed illumination combining orthogonal basis patterns (Fourier, Hadamard, etc.) [29] with the sinusoidal patterns, or using learnt patterns [10,30], can be further studied to improve reconstruction quality and reduce the sampling ratio. Third, the multispectral SPI scheme [7,8] can be incorporated to further increase throughput in the spectral dimension, building a multimodal imaging system with only one single-pixel detector.

Funding

Fundamental Research Funds for the Central Universities (3052019024); National Natural Science Foundation of China (61827901, 61971045, 61991451); National Key Research and Development Program of China (2020YFB0505601).

Disclosures

The authors declare that there are no conflicts of interest related to this article.

References

1. M. F. Duarte, M. A. Davenport, D. Takhar, J. N. Laska, T. Sun, K. F. Kelly, and R. G. Baraniuk, “Single-pixel imaging via compressive sampling,” IEEE Signal Process. Mag. 25(2), 83–91 (2008). [CrossRef]  

2. M. P. Edgar, G. M. Gibson, and M. J. Padgett, “Principles and prospects for single-pixel imaging,” Nat. Photonics 13(1), 13–20 (2019). [CrossRef]  

3. G. M. Gibson, S. D. Johnson, and M. J. Padgett, “Single-pixel imaging 12 years on: a review,” Opt. Express 28(19), 28190–28208 (2020). [CrossRef]  

4. L. Bian, J. Suo, Q. Dai, and F. Chen, “Experimental comparison of single-pixel imaging algorithms,” J. Opt. Soc. Am. A 35(1), 78–87 (2018). [CrossRef]  

5. C. F. Higham, R. Murray-Smith, M. J. Padgett, and M. P. Edgar, “Deep learning for real-time single-pixel video,” Sci. Rep. 8(1), 2369 (2018). [CrossRef]  

6. G. Barbastathis, A. Ozcan, and G. Situ, “On the use of deep learning for computational imaging,” Optica 6(8), 921–943 (2019). [CrossRef]  

7. L. Bian, J. Suo, G. Situ, Z. Li, J. Fan, F. Chen, and Q. Dai, “Multispectral imaging using a single bucket detector,” Sci. Rep. 6(1), 24752 (2016). [CrossRef]  

8. Z. Li, J. Suo, X. Hu, C. Deng, J. Fan, and Q. Dai, “Efficient single-pixel multispectral imaging via non-mechanical spatio-spectral modulation,” Sci. Rep. 7(1), 41435 (2017). [CrossRef]  

9. G. M. Gibson, B. Sun, M. P. Edgar, D. B. Phillips, N. Hempler, G. T. Maker, G. P. Malcolm, and M. J. Padgett, “Real-time imaging of methane gas leaks using a single-pixel camera,” Opt. Express 25(4), 2998–3005 (2017). [CrossRef]  

10. H. Fu, L. Bian, and J. Zhang, “Single-pixel sensing with optimal binarized modulation,” Opt. Lett. 45(11), 3111–3114 (2020). [CrossRef]  

11. J. Etgen, S. H. Gray, and Y. Zhang, “An overview of depth imaging in exploration geophysics,” Geophys. 74(6), WCA5–WCA17 (2009). [CrossRef]  

12. Y. Fang, J. Xie, G. Dai, M. Wang, F. Zhu, T. Xu, and E. Wong, “3d deep shape descriptor,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, (IEEE, 2015), pp. 2319–2328.

13. B. Sun, M. P. Edgar, R. Bowman, L. E. Vittert, S. Welsh, A. Bowman, and M. Padgett, “3d computational imaging with single-pixel detectors,” Science 340(6134), 844–847 (2013). [CrossRef]  

14. S. S. Welsh, M. P. Edgar, R. Bowman, P. Jonathan, B. Sun, and M. J. Padgett, “Fast full-color computational imaging with single-pixel detectors,” Opt. Express 21(20), 23068–23074 (2013). [CrossRef]  

15. Z. Zhang and J. Zhong, “Three-dimensional single-pixel imaging with far fewer measurements than effective image pixels,” Opt. Lett. 41(11), 2497–2500 (2016). [CrossRef]  

16. Z. Zhang, S. Liu, J. Peng, M. Yao, G. Zheng, and J. Zhong, “Simultaneous spatial, spectral, and 3d compressive imaging via efficient fourier single-pixel measurements,” Optica 5(3), 315–319 (2018). [CrossRef]  

17. M.-J. Sun, M. P. Edgar, G. M. Gibson, B. Sun, N. Radwell, R. Lamb, and M. J. Padgett, “Single-pixel three-dimensional imaging with time-based depth resolution,” Nat. Commun. 7(1), 12010 (2016). [CrossRef]  

18. M. Takeda, “Fourier fringe analysis and its application to metrology of extreme physical phenomena: a review,” Appl. Opt. 52(1), 20–29 (2013). [CrossRef]  

19. S. Feng, Q. Chen, G. Gu, T. Tao, L. Zhang, Y. Hu, W. Yin, and C. Zuo, “Fringe pattern analysis using deep learning,” Adv. Photonics 1(2), 025001 (2019). [CrossRef]  

20. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of IEEE conference on computer vision and pattern recognition, (IEEE, 2016), pp. 770–778.

21. A. X. Chang, T. Funkhouser, L. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese, M. Savva, S. Song, H. Su, J. Xiao, L. Yi, and F. Yu, “Shapenet: An information-rich 3d model repository,” arXiv preprint arXiv:1512.03012 (2015).

22. M. Takeda and K. Mutoh, “Fourier transform profilometry for the automatic measurement of 3-d object shapes,” Appl. Opt. 22(24), 3977–3982 (1983). [CrossRef]  

23. X. Glorot, A. Bordes, and Y. Bengio, “Deep sparse rectifier neural networks,” in Proceedings of the 14th international conference on artificial intelligence and statistics, (Academic, 2011), pp. 315–323.

24. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980 (2014).

25. P. J. Huber, “Robust estimation of a location parameter,” in Breakthroughs in statistics, (Springer, 1992), pp. 492–518.

26. A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala, “Pytorch: An imperative style, high-performance deep learning library,” in Advances in neural information processing systems, (2019), pp. 8026–8037.

27. K. Guo, S. Jiang, and G. Zheng, “Multilayer fluorescence imaging on a single-pixel detector,” Biomed. Opt. Express 7(7), 2425–2431 (2016). [CrossRef]  

28. Y. Xu and K. F. Kelly, “Compressed domain image classification using a multi-rate neural network,” arXiv preprint arXiv:1901.09983 (2019).

29. Z. Zhang, X. Wang, G. Zheng, and J. Zhong, “Hadamard single-pixel imaging versus fourier single-pixel imaging,” Opt. Express 25(16), 19619–19639 (2017). [CrossRef]  

30. A. Turpin, G. Musarra, V. Kapitany, F. Tonolini, A. Lyons, I. Starshynov, F. Villa, E. Conca, F. Fioranelli, R. Murray-Smith, and D. Faccio, “Spatial images from temporal data,” Optica 7(8), 900–905 (2020). [CrossRef]  
