
Semantic ghost imaging based on recurrent-neural-network

Open Access

Abstract

Ghost imaging (GI) illuminates an object with a sequence of light patterns and obtains the corresponding total echo intensities with a bucket detector. The correlation between the patterns and the bucket signals results in the image. Because this mechanism differs from that of traditional imaging methods, GI has received extensive attention during the past two decades. However, this mechanism also makes GI suffer from slow imaging speed and poor imaging quality. In previous work, each sample, consisting of an illumination pattern and its detected bucket signal, was treated independently of the others, and the correlation is therefore a linear superposition of the sequential data. Inspired by human speech, in which sequential words are linked by semantic logic and even an incomplete sentence can convey the correct meaning, we here propose a different perspective: there is potentially a non-linear connection between the sequential samples in GI. We therefore built a system based on a recurrent neural network (RNN), called GI-RNN, which enables recovering high-quality images at low sampling rates. The test with MNIST’s handwritten digits shows that, at a sampling rate of 1.28%, the image quality of GI-RNN is 12.58 dB higher than that of the traditional basic correlation algorithm and 6.61 dB higher than that of the compressed sensing algorithm. After being trained on natural images, GI-RNN exhibits a strong generalization ability. Not only does GI-RNN work well with standard images such as “cameraman”, but it can also recover real natural scenes at a 3% sampling rate with SSIMs greater than 0.7.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Ghost imaging (GI), which was first demonstrated by Pittman and Shih in 1995, is regarded as a novel imaging method, different from conventional methods based on first-order interference [1]. By employing second-order correlation, GI offers many promising features such as lens-less imaging, turbulence-free imaging and high detection sensitivity. During the past two decades, GI has motivated a large body of research in both academia and applications [2–16]. However, GI has been suffering from slow imaging speed and poor imaging quality, hindering its development in applications. These problems lie in the detection scheme of GI: a large number of light patterns are used to illuminate an object, and the corresponding total intensities returned from the object (called bucket signals) are detected. Mathematically, this process measures the distribution of the spatial frequencies of the object. An illumination pattern can be treated as a spatial-frequency basis, and the corresponding bucket signal is the weight of that basis in the distribution. Summing over the distribution yields the image of the object, which is called the correlation calculation in GI. This is similar to a Fourier transform. The difference is that GI can employ various spatial-frequency bases, including random speckle patterns and orthogonal bases such as Fourier transform bases [17,18], Hadamard matrices [19–22] and wavelets [23]. Typically, the number of detections (called samples) not only directly determines the imaging speed, but also proportionally influences the image quality. One solution is to optimize the post-processing algorithms, such as using compressed sensing (CS) algorithms to reduce the sampling rate [24–28]. Recently, artificial intelligence (AI) has been utilized to improve the image quality and reduce the sampling rate as well [29–37]. Another solution is to manipulate the structure of the illumination field, for example exploiting orthogonal patterns to reduce the number of samples, including Hadamard methods, Fourier transform methods, wavelet GI and tri-directional probing GI.

At the early stage, the deep neural networks for GI worked as an image post-processing method [29–31]: the input was the low-quality images obtained by traditional GI approaches, and the output was the improved images. This approach relies on how much information remains in the pre-recovered images, which limits the capability of the neural network. In the next stage, the bucket signals were used as the input of the networks [33,35,37]. In [33], stable reconstruction on MNIST was achieved at a 6.25% sampling rate; nevertheless, it only worked with binary images (MNIST as the training dataset). Ref. [35] proposed DAttNet, a network model similar to U-Net, which successfully reconstructed images at a 5.45% sampling rate. DAttNet works with Hadamard matrices, which in turn limits the resolution to no more than 128×128. Reference [37] proposed a single-pixel imaging method with a network combining an RNN and a CNN. It does not work well with randomly ordered Hadamard patterns, so it used optimized patterns generated by another neural network, and recovered images at a sampling rate of 8.1%. These methods [33,35,37] did not use the speckle patterns as input or during the training process, although the acquisition of the bucket signals must be done with a specific set of patterns.

We here propose a GI method based on an RNN, called GI-RNN, which uses both the bucket signals and their corresponding speckle patterns as inputs. It simply uses random speckle patterns without any specific optimization. This work is inspired not only by the absence of pattern information in previous networks, but more by a general observation about the ghost-imaging process: in previous work, although the detection of GI is sequential, each sample is assumed to be independent of the others, and the correlation is a simple accumulation of the products between the illumination patterns and the corresponding bucket signals. We propose a different perspective: the previous sampling results non-linearly influence the next step of the correlation calculation. This idea was inspired by human language. A stream of words conveys a logical meaning, and most of the time people can predict the exact meaning of a speech before hearing all the words. In this analogy, a pair consisting of an illumination pattern and its feedback bucket signal corresponds to a “word”. These “words”, one after another, gradually depict an image (the “meaning”). If the previous “words” are used to predict the upcoming “word” and the overall “meaning”, GI may be able to reconstruct a good-quality image at sampling rates far below the Nyquist limit. Based on this hypothesis, we built a network on a recurrent neural network (RNN) [38], called GI-RNN, to “translate” a high-quality image from a small number of patterns and bucket signals. RNNs are powerful and robust in dealing with sequential data, for tasks such as language translation, speech recognition and image captioning. The measured data of GI are sequentially input to the layers of GI-RNN. Each layer not only takes a pattern–bucket pair as input, but also uses the output of the previous layer as input. These two types of inputs are fed into an activation function with different weights, yielding the output of the layer. GI-RNN was first trained on the MNIST database at different sampling rates. The results show that it can achieve a good-quality image at a sampling rate of 1.28%, at which the traditional basic correlation (BC) algorithm and the CS algorithm cannot give a correct result. At a sampling rate of 25%, the images recovered with GI-RNN are on average 12.58 dB higher than those of the BC algorithm and 6.61 dB higher than those of the CS algorithm. We also trained GI-RNN on a natural-image database, and it can then recover complicated images (such as “cameraman”) at a sampling rate of 6.25%. To test its generalization ability, we took photos of landscapes in our hometown; the SSIMs between the recovered images and the original photos exceed 0.7 at a sampling rate of 3%.

2. GI-RNN

GI illuminates an object with a sequence of light patterns and records the total intensities reflected from (or transmitted through) the object with a bucket detector. The correlation between the patterns and the corresponding bucket signals recovers the image of the object. The traditional second-order correlation is calculated as

$$T=\frac{1}{t}\sum_{i=1}^t P_i\cdot B_i,$$
where $t$ is the total number of detections, ${P_i}$ represents the speckle pattern of the $i$-th illumination and ${B_i}$ represents the bucket signal of the $i$-th detection. The recovered GI image $T$ contains a background, which can be removed from the recovered image with an optimal threshold, or by subtracting the DC component of the bucket signal before the calculation, i.e.,
$$T=\frac{1}{t}\sum_{i=1}^t P_i\cdot (B_i-\langle B\rangle),$$
where $\langle \cdot \rangle$ represents the ensemble average. However, to match the RNN formalism, we keep the original form and rewrite it as
$$T_t=\frac{1}{t}\sum_{i=1}^{t-1} P_i\cdot B_i+\frac{1}{t} P_{t}\cdot B_t = \frac{t-1}{t}T_{t-1}+\frac{1}{t} P_{t}\cdot B_t,$$
where ${T_t}$ represents the result of image reconstruction at time $t$, ${P_t}$ represents the speckle pattern and ${B_t}$ the bucket signal at time ${t}$, and ${T_{t-1}}$ represents the result of image reconstruction at time ${t-1}$.
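To make Eqs. (1)–(3) concrete, the following is a minimal NumPy sketch of the basic correlation reconstruction. The object, resolution and number of illuminations are illustrative stand-ins, not the paper’s experimental settings.

```python
import numpy as np

rng = np.random.default_rng(0)
H = W = 28          # illustrative resolution (MNIST-sized)
t = 196             # illustrative number of illuminations (25% of 784 pixels)

obj = rng.random((H, W))          # stand-in object (unknown in a real experiment)
P = rng.random((t, H, W))         # speckle patterns P_i
B = (P * obj).sum(axis=(1, 2))    # bucket signals B_i

# Eq. (2): background-free correlation, subtracting the DC component <B>
T = (P * (B - B.mean())[:, None, None]).mean(axis=0)

# Eq. (3): Eq. (1) written as a recursive update, T_t from T_{t-1}
T_rec = np.zeros((H, W))
for i in range(1, t + 1):
    T_rec = (i - 1) / i * T_rec + P[i - 1] * B[i - 1] / i
```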

An RNN is a type of neural network for sequential data. It applies recursion along the direction of the sequence, with all nodes connected in a chain. Owing to this sequential structure, it can extract both temporal and semantic information from the data. The state of the system at time $t$ can be expressed as

$${h_t} = f\left( {W \cdot{h_{t - 1}} + U \cdot {x_t}} \right),$$
where ${h_t}$ represents the state at time $t$, ${x_t}$ represents the input at time $t$, and ${h_{t - 1}}$ represents the state at the previous time $t-1$. $W$ and $U$ denote the parameters to be learned. Equation (4) shows that the state at time $t$ is determined by the state at time $t-1$ and the input at time $t$. Figure 1 shows the relationship between an RNN and basic GI.
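As an illustration, a few lines of NumPy implementing the recurrence of Eq. (4), with tanh as the activation $f$; all dimensions and weights here are toy values, not the network used in the paper.

```python
import numpy as np

def rnn_step(h_prev, x_t, W, U):
    # Eq. (4): the new state mixes the previous state and the current input
    return np.tanh(W @ h_prev + U @ x_t)

rng = np.random.default_rng(1)
hidden, n_in = 8, 4                                # toy dimensions
W = rng.normal(scale=0.1, size=(hidden, hidden))   # state-to-state weights (learned)
U = rng.normal(scale=0.1, size=(hidden, n_in))     # input-to-state weights (learned)

h = np.zeros(hidden)
for x_t in rng.normal(size=(5, n_in)):             # a toy sequence of 5 inputs
    h = rnn_step(h, x_t, W, U)
```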

Fig. 1. Comparison of RNN and GI.

Comparing Eq. (3) with Eq. (4), the two architectures are similar, but the basic correlation of GI assumes that each sample is independent, and the correlation is a superposition of the samples, whereas the RNN takes the output of the previous layer as an input of the next layer, in addition to the sequentially detected data. The weights of the sequential data and of the previous output are different and are optimized through learning. Through an activation function, the output of each layer introduces non-linearity between the sequential samples. With optimized parameters, the implementation of GI via an RNN can exploit this semantic predictive ability and greatly reduce the sampling rate. Through training, the GI system obtains the parameters optimized for the training set and achieves better results at low sampling rates. The schematic diagram of GI-RNN is shown in Fig. 2.

Fig. 2. Schematic diagram of GI-RNN.

GI-RNN consists of three parts: a pre-processing module, an RNN backbone and a predictor. The pre-processing module converts each speckle pattern and its corresponding bucket signal into a form suitable for the RNN input. In GI-RNN, long short-term memory (LSTM) is used as the RNN model. LSTM can effectively learn long-term dependencies. When the target resolution is high, the data sequence is relatively long even when the sampling rate is low; the network is therefore required to learn long-term dependencies, so LSTM is adopted. In addition, LSTM effectively mitigates the vanishing- and exploding-gradient problems in long-sequence training. The input of the predictor is the output of the last LSTM layer, and its output is the target image.
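Under our reading of this description, a PyTorch sketch of the GI-RNN structure might look as follows: each time step feeds one flattened speckle pattern concatenated with its bucket value into a multi-layer LSTM, and a linear predictor maps the final hidden state to the image. The class name `GIRNN` and the exact pre-processing are our assumptions; the layer sizes follow Sec. 3.1.

```python
import torch
import torch.nn as nn

class GIRNN(nn.Module):
    """Sketch of GI-RNN: pre-processing -> multi-layer LSTM -> linear predictor."""
    def __init__(self, n_pixels=784, hidden=1024, layers=5):
        super().__init__()
        # input size 785 = flattened 28x28 pattern (784) + 1 bucket value
        self.lstm = nn.LSTM(input_size=n_pixels + 1, hidden_size=hidden,
                            num_layers=layers, batch_first=True)
        self.predictor = nn.Linear(hidden, n_pixels)  # last hidden state -> image

    def forward(self, patterns, buckets):
        # patterns: (batch, t, 784); buckets: (batch, t, 1)
        x = torch.cat([patterns, buckets], dim=-1)    # pre-processing: (batch, t, 785)
        out, _ = self.lstm(x)
        return self.predictor(out[:, -1])             # predict from the last time step
```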

3. Demonstration results

3.1 Training settings

In this paper, we adopt a multi-layer LSTM structure with 5 layers. The network input size is 785 and the hidden-state size is 1024. The predictor input size is 1024 (equal to the LSTM hidden-state size) and its output size is 784. During training, we reshape the target image into a one-dimensional vector of length 784 as the ground truth. The mean-squared error (MSE) between the reconstructed image and the ground truth is used as the loss function, and Adam is used as the optimizer. After a large number of experiments, we found that the optimal initial learning rate is 0.0001, with zero weight decay. The training set in this work is MNIST, with an image resolution of 28×28. We randomly selected 9000 images from MNIST for training, and the testing samples are randomly selected from the MNIST test set. The hardware configuration is an NVIDIA GeForce RTX 3060 GPU and an 11th Gen Intel Core i7-11700F CPU. We trained the network independently for each sampling rate.
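A training-loop sketch consistent with these settings (MSE loss, Adam, learning rate 1e-4, zero weight decay), reusing the `GIRNN` class sketched in Sec. 2. The synthetic tensors and the epoch count are placeholders of ours; in practice the batches would be built from MNIST and the simulated measurements.

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = GIRNN().to(device)                 # GIRNN as sketched above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=0.0)
loss_fn = nn.MSELoss()

# synthetic stand-in batch just to make the loop runnable
t_steps, batch = 10, 4
patterns = torch.rand(batch, t_steps, 784, device=device)
buckets = torch.rand(batch, t_steps, 1, device=device)
target = torch.rand(batch, 784, device=device)   # ground-truth images, flattened

for epoch in range(100):                   # epoch count not stated in the paper
    optimizer.zero_grad()
    recon = model(patterns, buckets)
    loss = loss_fn(recon, target)
    loss.backward()
    optimizer.step()
```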

3.2 Results on MNIST

Figure 3 shows the schematic diagram of GI. We first demonstrated GI-RNN on MNIST, and then on natural images such as “cameraman”.

Fig. 3. Schematic diagram of ghost imaging.

The demonstrations were conducted at sampling rates of 0.38%, 1.02%, 6.25% and 25%, and the results are shown in Fig. 4.

Fig. 4. Reconstruction results at different sampling rates.

Figure 4 shows that GI-RNN obtains stable results (for the 10 targets in the testing set) at a sampling rate of 6.25%, and for some targets an image can be obtained at 1.02% or even 0.38% (which corresponds to only 3 illuminations).

3.3 Comparison with other methods

We then compared GI-RNN with the BC algorithm and the CS algorithm at sampling rates of 25% and 1.28%. The image reconstruction results are shown in Fig. 5. The BC algorithm cannot reconstruct the target at such low sampling rates, and the CS algorithm (FISTA) can only partially reconstruct the targets at 25%. GI-RNN, on the other hand, achieves stable results even at 1.28%. The peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) curves of the reconstruction results are shown in Fig. 6; the reconstruction results of GI-RNN are significantly better than those of the other two methods.
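For reference, PSNR and SSIM of the kind plotted in Fig. 6 can be computed with scikit-image’s standard metric implementations; `gt` and `recon` below are stand-in arrays, not the paper’s data.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

rng = np.random.default_rng(2)
gt = rng.random((28, 28))                                      # stand-in ground truth
recon = np.clip(gt + 0.05 * rng.normal(size=(28, 28)), 0, 1)   # stand-in reconstruction

psnr = peak_signal_noise_ratio(gt, recon, data_range=1.0)
ssim = structural_similarity(gt, recon, data_range=1.0)
print(f"PSNR = {psnr:.2f} dB, SSIM = {ssim:.3f}")
```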

Fig. 5. Comparison results with other algorithms at different sampling rates.

Fig. 6. PSNR/SSIM comparison results of different algorithms. (a) PSNR results. (b) SSIM results.

3.4 Results on complicated targets

Finally, we demonstrated GI-RNN on complicated targets through simulation. Following [39], we use the 400 images in [40] and crop a 180×180 region from each image. We then cut 67599 image blocks of size 32×32 for training. The testing set is from [41], containing 12 pictures with a resolution of 256×256; similar to the training set, we obtained 10931 images of size 32×32 as the testing set for the complicated-target demonstration. To prevent over-fitting of the model, we set the number of LSTM layers to 2 and add a dropout layer with a probability of 0.5. We demonstrated our method on complicated targets at sampling rates of 6.25% and 12.5%.
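A sketch of the patch preparation described above: crop a 180×180 region from a source image and slide a 32×32 window over it. The stride is our assumption (the paper does not state how the blocks were cut), chosen because a stride of 12 gives 13×13 = 169 blocks per image, i.e. about 400×169 = 67600, close to the 67599 blocks reported.

```python
import numpy as np

def extract_patches(img, crop=180, size=32, stride=12):
    """Crop the top-left crop x crop region and tile it with a sliding window."""
    img = img[:crop, :crop]
    return [img[r:r + size, c:c + size]
            for r in range(0, crop - size + 1, stride)
            for c in range(0, crop - size + 1, stride)]

patches = extract_patches(np.random.rand(200, 300))  # stand-in source image
```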

Column (a) of Fig. 7 shows the ground truth, and columns (b) and (c) show the results at 6.25% and 12.5% sampling rates, respectively. Excluding the data-acquisition time, GI-RNN takes 0.0028 s on average to reconstruct a complex image. Figure 7 shows that reliable results are obtained at a 6.25% sampling rate, and the results improve as the sampling rate increases. These demonstrations show that GI-RNN also performs well on complicated targets.

Fig. 7. Results of complicated targets. (a) Ground truth. (b) Results at 6.25% sampling rate. (c) Results at 12.5% sampling rate.

We further demonstrated the generalization ability of GI-RNN on two landscape scenes from our hometown, each with a resolution of 1024×1024. Each scene is divided into 1024 sub-images of size 32×32. We use GI-RNN to image each sub-image, and then fuse all the recovered sub-images to obtain the whole recovered image; a sketch of this block-wise scheme follows below. When the sampling rate is set to 3%, the number of illumination patterns for each sub-image is 32×32×3%$\approx$31, so the total number of illumination patterns is 1024×31=31744. The overall sampling rate is therefore still 31744/(1024×1024)$\approx$3%, the same as if we directly illuminated the whole image with 1024×1024 patterns at a 3% sampling rate. Figure 8 shows the results. Rows (a) and (b) of Fig. 8 show, from left to right, the results at sampling rates of 3%, 6.25%, 12.5% and 25%, followed by the real scene. The SSIMs of the images recovered at 3% exceed 0.7.
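A minimal sketch of this block-wise scheme: split the 1024×1024 scene into non-overlapping 32×32 sub-images, reconstruct each one, and stitch the results. Here `reconstruct_block` is a placeholder standing in for the measurement-plus-GI-RNN pipeline (an identity stub is passed just to make the sketch runnable).

```python
import numpy as np

def reconstruct_scene(scene, reconstruct_block, block=32):
    H, W = scene.shape                       # e.g. 1024 x 1024 -> 1024 sub-images
    out = np.zeros_like(scene, dtype=float)
    for r in range(0, H, block):
        for c in range(0, W, block):
            # each 32x32 block is imaged with ~31 patterns at a 3% sampling rate
            out[r:r + block, c:c + block] = reconstruct_block(
                scene[r:r + block, c:c + block])
    return out

recovered = reconstruct_scene(np.random.rand(1024, 1024), lambda b: b)
```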

Fig. 8. Results of complicated targets. (a) Scene 1 results at different sampling rates. (b) Scene 2 results at different sampling rates. Under each recovered image are the PSNR and SSIM in comparison with the original. From left to right, the sampling rates are 3%, 6.25%, 12.5% and 25%.

4. Conclusion

In this paper, we propose a novel GI method that realizes the imaging process of GI on the architecture of an RNN. We introduce the concept of logical association from NLP into GI. In this way, we correlate the states in the imaging process of GI through learnable parameters, which can be exploited to greatly reduce the sampling rate. We take each speckle pattern and the corresponding bucket signal as the input of each layer of the RNN, and the output of the RNN is the imaging result of GI. Through training, we can learn the optimal parameters for given types of targets, which enables GI to achieve image reconstruction at a much lower sampling rate than traditional methods. Extensive demonstration results show that GI-RNN can recover good-quality images at a sampling rate of 1.28% for handwritten digits and 6.25% for complicated objects. At such low sampling rates, traditional GI and compressed-sensing GI cannot even recover correct results. More importantly, GI-RNN exhibits the ability to recover complicated scenes with high image quality (SSIM > 0.7) at a very low sampling rate of 3%.

Funding

National Natural Science Foundation of China (61901353); 111 Project (B14040); Fundamental Research Funds for the Central Universities (xhj032021005, xjh012019029).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. T. B. Pittman, Y. H. Shih, D. V. Strekalov, and A. V. Sergienko, “Optical imaging by means of two-photon quantum entanglement,” Phys. Rev. A 52(5), R3429–R3432 (1995). [CrossRef]  

2. A. Valencia, G. Scarcelli, M. D’Angelo, and Y. Shih, “Two-photon imaging with thermal light,” Phys. Rev. Lett. 94(6), 063601 (2005). [CrossRef]  

3. J. H. Shapiro, “Computational ghost imaging,” Phys. Rev. A 78(6), 061802 (2008). [CrossRef]  

4. R. Meyers, K. S. Deacon, and Y. Shih, “Ghost-imaging experiment by measuring reflected photons,” Phys. Rev. A 77(4), 041801 (2008). [CrossRef]  

5. Y. Bromberg, O. Katz, and Y. Silberberg, “Ghost imaging with a single detector,” Phys. Rev. A 79(5), 053840 (2009). [CrossRef]  

6. F. Ferri, D. Magatti, L. A. Lugiato, and A. Gatti, “Differential ghost imaging,” Phys. Rev. Lett. 104(25), 253603 (2010). [CrossRef]  

7. R. E. Meyers, K. S. Deacon, and Y. Shih, “Turbulence-free ghost imaging,” Appl. Phys. Lett. 98(11), 111115 (2011). [CrossRef]  

8. S. M. Khamoushi, Y. Nosrati, and S. H. Tavassoli, “Sinusoidal ghost imaging,” Opt. Lett. 40(15), 3452–3455 (2015). [CrossRef]  

9. P. Ryczkowski, M. Barbier, A. T. Friberg, J. M. Dudley, and G. Genty, “Ghost imaging in the time domain,” Nat. Photonics 10(3), 167–170 (2016). [CrossRef]  

10. D. Pelliccia, A. Rack, M. Scheel, V. Cantelli, and D. M. Paganin, “Experimental x-ray ghost imaging,” Phys. Rev. Lett. 117(11), 113902 (2016). [CrossRef]  

11. R. Khakimov, B. Henson, D. K. Shin, S. Hodgman, R. Dall, K. Baldwin, and A. Truscott, “Ghost imaging with atoms,” Nature 540(7631), 100–103 (2016). [CrossRef]  

12. S. Ota, R. Horisaki, Y. Kawamura, M. Ugawa, I. Sato, K. Hashimoto, R. Kamesawa, K. Setoyama, S. Yamaguchi, K. Fujiu, K. Waki, and H. Noji, “Ghost cytometry,” Science 360(6394), 1246–1251 (2018). [CrossRef]  

13. C. Amiot, P. Ryczkowski, A. T. Friberg, J. M. Dudley, and G. Genty, “Supercontinuum spectral-domain ghost imaging,” Opt. Lett. 43(20), 5025–5028 (2018). [CrossRef]  

14. X. Zhang, H. Yin, R. Li, J. Hong, S. Ai, W. Zhang, C. Wang, J. Hsieh, Q. Li, and P. Xue, “Adaptive ghost imaging,” Opt. Express 28(12), 17232–17240 (2020). [CrossRef]  

15. W. Gong, “Sub-Nyquist ghost imaging by optimizing point spread function,” Opt. Express 29(11), 17591–17601 (2021). [CrossRef]  

16. H. Cui, J. Cao, Q. Hao, D. Zhou, M. Tang, K. Zhang, and Y. Zhang, “Omnidirectional ghost imaging system and unwrapping-free panoramic ghost imaging,” Opt. Lett. 46(22), 5611–5614 (2021). [CrossRef]  

17. Z. Zhang, S. Liu, J. Peng, M. Yao, G. Zheng, and J. Zhong, “Simultaneous spatial, spectral, and 3D compressive imaging via efficient Fourier single-pixel measurements,” Optica 5(3), 315 (2018). [CrossRef]  

18. Z. Zhang, X. Ma, and J. Zhong, “Single-pixel imaging by means of Fourier spectrum acquisition,” Nat. Commun. 6(1), 6225 (2015). [CrossRef]  

19. L. Wang and S. Zhao, “Fast reconstructed and high-quality ghost imaging with fast Walsh–Hadamard transform,” Photonics Res. 4(6), 240–244 (2016). [CrossRef]  

20. M. F. Li, X. F. Mo, L. J. Zhao, J. Huo, Y. Ran, L. Kai, and A. N. Zhang, “Single-pixel remote imaging based on Walsh-Hadamard transform,” Acta Phys. Sin. 65(6), 064201 (2016). [CrossRef]  

21. M.-J. Sun, L.-T. Meng, M. P. Edgar, M. J. Padgett, and N. Radwell, “A Russian dolls ordering of the Hadamard basis for compressive single-pixel imaging,” Sci. Rep. 7(1), 1–7 (2017). [CrossRef]  

22. W. K. Yu, “Super sub-Nyquist single-pixel imaging by means of cake-cutting Hadamard basis sort,” Sensors 19(19), 4122 (2019). [CrossRef]  

23. M. Xi, H. Chen, Y. Yuan, G. Wang, Y. He, Y. Liang, J. Liu, H. Zheng, and Z. Xu, “Bi-frequency 3D ghost imaging with Haar wavelet transform,” Opt. Express 27(22), 32349–32359 (2019). [CrossRef]  

24. O. Katz, Y. Bromberg, and Y. Silberberg, “Compressive ghost imaging,” Appl. Phys. Lett. 95(13), 131110 (2009). [CrossRef]  

25. V. Katkovnik and J. Astola, “Compressive sensing computational ghost imaging,” J. Opt. Soc. Am. A 29(8), 1556–1567 (2012). [CrossRef]  

26. M. Amann and M. Bayer, “Compressive adaptive computational ghost imaging,” Sci. Rep. 3(1), 1545 (2013). [CrossRef]  

27. L. Long-Zhen, Y. Xu-Ri, L. Xue-Feng, Y. Wen-Kai, and Z. Guang-Jie, “Super-resolution ghost imaging via compressed sensing,” Acta Phys. Sin. 63(22), 224201 (2014). [CrossRef]  

28. H. Zhang, Y. Xia, and D. Duan, “Computational ghost imaging with deep compressed sensing,” Chin. Phys. B 30(12), 124209 (2021). [CrossRef]  

29. M. Lyu, W. Wang, H. Wang, H. Wang, G. Li, N. Chen, and G. Situ, “Deep-learning-based ghost imaging,” Sci. Rep. 7(1), 17865 (2017). [CrossRef]  

30. T. Shimobaba, Y. Endo, T. Nishitsuji, T. Takahashi, Y. Nagahama, S. Hasegawa, M. Sano, R. Hirayama, T. Kakue, A. Shiraki, and T. Ito, “Computational ghost imaging using deep learning,” Opt. Commun. 413, 147–151 (2018). [CrossRef]  

31. Y. He, G. Wang, G. Dong, S. Zhu, H. Chen, A. Zhang, and Z. Xu, “Ghost imaging based on deep learning,” Sci. Rep. 8(1), 6469 (2018). [CrossRef]  

32. G. Wang, H. Zheng, W. Wang, Y. He, J. Liu, H. Chen, Y. Zhou, and Z. Xu, “De-noising ghost imaging via principal components analysis and compandor,” Opt. Lasers Eng. 110, 236–243 (2018). [CrossRef]  

33. F. Wang, H. Wang, H. Wang, G. Li, and G. Situ, “Learning from simulation: An end-to-end deep-learning approach for computational ghost imaging,” Opt. Express 27(18), 25560–25572 (2019). [CrossRef]  

34. T. Bian, Y. Yi, J. Hu, Y. Zhang, Y. Wang, and L. Gao, “A residual-based deep learning approach for ghost imaging,” Sci. Rep. 10(1), 12149 (2020). [CrossRef]  

35. H. Wu, R. Wang, G. Zhao, H. Xiao, D. Wang, J. Liang, X. Tian, L. Cheng, and X. Zhang, “Sub-Nyquist computational ghost imaging with deep learning,” Opt. Express 28(3), 3846–3853 (2020). [CrossRef]  

36. Z. Zhang, C. Wang, W. Gong, and D. Zhang, “Ghost imaging of blurred object based on deep-learning,” Appl. Opt. 60(13), 3732–3739 (2021). [CrossRef]  

37. I. Hoshi, T. Shimobaba, T. Kakue, and T. Ito, “Single-pixel imaging using a recurrent neural network combined with convolutional layers,” Opt. Express 28(23), 34069–34078 (2020). [CrossRef]  

38. A. Graves, A.-r. Mohamed, and G. Hinton, “Speech recognition with deep recurrent neural networks,” in 2013 IEEE international conference on acoustics, speech and signal processing, (IEEE, 2013), pp. 6645–6649.

39. Y. Chen and T. Pock, “Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration,” IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1256–1272 (2017). [CrossRef]  

40. D. Martin, C. Fowlkes, D. Tal, and J. Malik, “A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics,” in Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001, vol. 2 (IEEE, 2001), pp. 416–423.

41. K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising,” IEEE Trans. on Image Process. 26(7), 3142–3155 (2017). [CrossRef]  
