
Anti-noise computational imaging using unsupervised deep learning

Open Access

Abstract

Computational imaging enables spatial information retrieval of objects with the use of single-pixel detectors. By combining measurements and computational methods, it is possible to reconstruct images in a variety of situations that are challenging or impossible with traditional multi-pixel cameras. However, these systems typically suffer from significant loss of imaging quality due to various noise sources under challenging measurement conditions such as single-photon detection, undersampling, and complicated environments. Here, we provide an unsupervised deep learning (UnDL) based anti-noise approach to deal with this problem. The proposed method does not require any clean experimental data for pre-training, which effectively alleviates the difficulty of model training (especially for biomedical imaging scenes, where training ground truth is inherently difficult to obtain). Our results show that the UnDL-based imaging approach considerably outperforms conventional single-pixel computational imaging methods in reconstructing the target image against noise. Moreover, the well-trained model generalizes to imaging a real biological sample and can accurately image 64 × 64 resolution objects at a high speed of 20 fps with a 5% sampling ratio. This method can be used in various solvers for general computational imaging and is expected to effectively suppress noise for high-quality biomedical imaging in generalizable complicated environments.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Conventional imaging techniques record the light from the scene and directly acquire its image with an array detector that consists of millions of pixels. In contrast to conventional direct imaging, computational imaging is an indirect imaging technique that combines illumination, the optical system, image sensors, and post-processing algorithms. As a typical computational imaging scheme, computational ghost imaging (GI) reconstructs an image from a sequence of one-dimensional bucket signals acquired under structured illumination [1,2]. This single-pixel computational imaging scheme has many advantages, such as high sensitivity, high detection efficiency, large spectral range, and low cost [3,4]. Thanks to these benefits, GI has become an attractive imaging technique and has been applied in many fields, such as lidar [5,6], X-ray imaging [7–9], and microscopy [10].

However, conventional computational GI inherently suffers from long imaging times and noise contamination at low sampling ratios [11,12], especially in single-photon detection under complicated noisy conditions, which limits its practical applications in many photon-limited situations, e.g., in-vivo biomedical imaging. To tackle these problems, many imaging approaches have been demonstrated in recent years. These approaches can be roughly divided into three categories: (1) improved iteration-based methods, (2) model-based methods, and (3) learning-based methods. Improved iteration-based methods introduce extra information into conventional GI to improve the imaging quality. For example, differential ghost imaging (DGI) [13] and normalized ghost imaging (NGI) [14] place an extra bucket detector in the reference beam to record a weighting signal for enhancing the imaging quality of GI. Sun et al. used digital microscanning to improve the signal-to-noise ratio (SNR) of single-pixel imaging [15].

However, to reconstruct high signal-to-noise ratio (SNR) images, these algorithms still require many samples. Improved iteration-based methods are difficult to extend to extreme imaging scenes such as noisy backgrounds, weak illumination, and complex imaging targets. In addition, to analyze the noise in GI systems, some works theoretically model the noise [16] and give a theoretical analysis of the SNR of the system [17]. Compared with iteration-based methods, model-based methods usually exploit physics or image priors to reduce the number of samples ($M$ patterns $<N$ pixels) while ensuring a high-quality imaging result. Specifically, by exploiting the sparsity of natural images, compressive sensing based GI (CSGI) [2,5,11,18] significantly reduces the sampling ratio. In fact, CSGI converts the reconstruction process of conventional GI into an optimization problem with a sparsity constraint [12,19]. Recently, Liu et al. used an untrained convolutional neural network (CNN) to restore the image at a low sampling ratio, a method called UNNCGI [20]. UNNCGI is another model-based method, which regards the neural network as the solver of the inverse problem of GI. In addition, based on a sparsity prior, Wang et al. recently implemented far-field super-resolution GI using a deep CNN [21]. Unfortunately, these model-based methods take much time to iteratively optimize an image and do not generalize well. Thus, short imaging time and high imaging quality appear to be contradictory goals in GI. To break this predicament, some researchers have employed a well-trained CNN for real-time high-quality GI [22–27]; such approaches are categorized as learning-based methods. Benefiting from the powerful fitting ability of deep CNNs, learning-based GI can achieve encouraging imaging quality even under extremely low sampling ratio conditions [28]. However, a widely perceived challenge of these supervised learning-based methods in real application scenarios is the difficulty of collecting a sufficiently large training dataset with paired ground-truth data, especially in cases where capturing a labeled dataset is practically impossible, such as biomedical imaging. In addition, these methods are also difficult to adapt to practical scenes because of their poor generalization performance.

In this paper, we present an unsupervised deep-learning (UnDL) based anti-noise computational imaging framework that achieves better reconstruction quality at a much lower sampling ratio without requiring any paired experimental observations for training. The proposed UnDL method consists of two modules: a physics-based imaging module and a learning-based enhancing module, as shown in Fig. 1(b). The input one-dimensional signal collected by a bucket detector is first processed by the imaging module to obtain a preliminary imaging result with a high level of noise. The imaging module is designed based on physical priors of the imaging system rather than on the distribution of training data, which ensures its transferability and generalizability across different scenes. This preliminary image is then denoised by the enhancing module, which is designed according to the statistical characteristics of the noise in the input image. By combining these two modules, the proposed UnDL framework achieves strong generalization ability and excellent anti-noise performance. Meanwhile, this UnDL framework can be applied to any other computational imaging system as a plug-and-play algorithm. We quantitatively demonstrate the image quality improvement of the proposed method via both simulations and experiments under sub-Nyquist sampling. Besides, UnDL does not need any clean or paired data, in contrast to conventional state-of-the-art supervised deep learning methods, which makes it more suitable for practical applications. In addition, benefiting from the fast inference speed of deep learning, our experiments show that the imaging rate of the UnDL method can reach 20 Hz (48 ms per image) with a common 20 kHz DMD. Specifically, the main contributions of this paper include:

  • We propose an unsupervised framework for noise-robust computational imaging which only needs noisy simulated data for training, suggesting its potential to enable scientific and biomedical applications where the ground truth is hard to capture.
  • The proposed method can retrieve a high-SNR image in 48 ms ($\sim$20 Hz) with single-photon detection, breaking the trade-off between imaging time and imaging quality in computational imaging.
  • We experimentally demonstrate the proposed method by imaging a butterfly wing and show that it outperforms several other widespread techniques in terms of reconstruction quality, imaging speed, and generalization ability.

Fig. 1. Overview of the proposed anti-noise unsupervised deep-learning (UnDL) computational imaging method. (a) Experimental setup of the UnDL imaging system with a programmable DMD for one-beam GI. (b) The reconstruction flowchart of the unsupervised deep-learning based imaging algorithm. The imaging module processes $M$ input intensities into a preliminary noisy image with a physics-based imaging neural network. Then, this noisy image is denoised by the enhancing module, which is trained in an unsupervised way.

2. Methods

2.1 Imaging scheme

The schematic of the proposed UnDL for GI is depicted in Fig. 1, including the optical system and the imaging flow. As shown in Fig. 1(a), a laser beam is first spectrally filtered and then expanded by a collimating lens to illuminate the digital micromirror device (DMD). The DMD is loaded with a sequence of $M$ random patterns ${\{P_i \}}_{i=1}^M$. The light reflected from the DMD is thereby spatially modulated and propagates to the object. To alleviate diffraction-induced distortion and to scale the illumination, a lens and a diaphragm are placed in front of the object. After interacting with the object, the light is gathered by another lens and recorded by the bucket detector. To improve the imaging quality in weak-signal conditions, a single-photon detector combined with the time-correlated single-photon counting (TCSPC) technique is used to detect and record the echo signal.

Next, the $M$ intensities ${\{s_i\}}_{i=1}^M$ and the corresponding illumination patterns are used for image reconstruction. As shown in Fig. 1(b), the imaging module processes the $M$ input intensities into a preliminary noisy image with a physics-based imaging neural network. This noisy image is then denoised by the enhancing module, which is trained in an unsupervised way. For simplicity, we denote the operation of the imaging module as $f_{IM}$; the preliminary result $\tilde {I}$ of size $\sqrt {N}\times \sqrt {N}$ is then given by

$$\tilde{I} = f_{IM}(s),$$
where $s$ represents the vector containing the $M$ intensities, $s\in {\mathbb {R}}^M$, and $f_{IM}$ represents the proposed imaging module. In this paper, a physics-based imaging network is designed as the imaging module, i.e., $f_{IM}$ simulates the physical process of the DGI algorithm [13]
$$I_{\text{DGI}} = \sum_{i=1}^{\infty} \left( s_i - \frac{\left\langle s\right\rangle}{\left\langle r\right\rangle}r_i\right)\left(P_i - \left\langle P\right\rangle \right),$$
where $\left\langle s\right\rangle $ and $\left\langle P\right\rangle $ represent the ensemble averages of the intensities and patterns, respectively; $r_i$ denotes the reference intensity, defined as the average of all elements of $P_i$, and $\left\langle r\right\rangle $ is the ensemble average of all $\{r_i\}$. However, in practical applications, it is impossible to acquire $M\to \infty$ measurements, and the imaging scene is full of various noise sources. Therefore, the actual imaging formulation of $f_{IM}$ is corrected to
$$\tilde{I} = \sum_{i=1}^{M} \left( \left(s_i + n'_i\right) - \frac{\bar{s} + \bar{n}'}{\bar{r}}r_i\right)\left(P_i - \bar{P} \right).$$
In Eq. (3), $\bar {s}$, $\bar {r}$, $\bar {n}'$, and $\bar {P}$ represent the mean values of the intensities, reference intensities, noise, and patterns, respectively. In actual low-light imaging conditions, the system noise includes detection noise (shot noise) and background noise. However, it is difficult to model the complex and unknown system noise of a real imaging environment. In this work, we simply aggregate these two kinds of noise into the bucket signals.
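For concreteness, the following NumPy sketch computes Eq. (3); the noise $n'_i$ is assumed to be already folded into the measured bucket signal $s$, and the function and variable names are ours.

```python
import numpy as np

def dgi_reconstruct(s, P):
    """Differential ghost imaging, Eq. (3); noise is assumed folded into s.

    s : (M,) measured bucket intensities
    P : (M, H, W) illumination patterns
    Returns the preliminary image of shape (H, W).
    """
    r = P.mean(axis=(1, 2))                    # reference intensities r_i
    w = s - (s.mean() / r.mean()) * r          # differential weights
    return np.tensordot(w, P - P.mean(axis=0), axes=1)
```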

After processing by $f_{IM}$, we obtain only a noisy image. Therefore, the enhancing module is designed to enhance $\tilde {I}$, which can be written as

$$I= f_{EM}(\tilde{I}),$$
where $I$ denotes the final imaging output of our UnDL with $I\in {\mathbb {R}}^{\sqrt {N}\times \sqrt {N}}$, and $f_{EM}$ denotes the operation of the enhancing module. Here, we design a CNN-based pipeline to implement $f_{EM}$, inspired by the statistical properties in Eq. (1) and Eq. (3). Statistically, it is reasonable to assume that the system noise is white Gaussian noise with expectation $\mathbb{E}(n')=0$. Then, Eq. (3) can be written as
$$\begin{aligned} \tilde{I} & = I + \sum_{i=1}^{M} n'_iP_i + n^{\prime\prime} \\ & = I + n, \end{aligned}$$
where $I$ denotes the clear image output by $f_{EM}$ and $n''$ denotes the noise caused by undersampling. We combine these two noise terms into $n$.

In fact, correlations exist among the pixels of a natural image, i.e., the pixels of $I$ are not statistically independent [29]. Consequently, a pixel value $I(x,y)$ can be predicted from the pixels near it. If we merely assume that the expectation of $n$ is zero, regardless of its specific distribution, i.e., $\mathbb {E}(n)=0$, it follows that

$${\mathbb{E}}\left(\tilde{I}(x,y)\right) = I(x,y).$$

Equation (6) inspires us to enhance the noisy image generated by $f_{IM}$ in an unsupervised way. In the learning process of a CNN, the minimum of the $L_2$ loss is attained at the arithmetic mean of the training data [29]. In our case, the objective function of $f_{EM}$ can be defined as

$$\min\sum_{(x,y)}\left| f_{\text{EM}}\left(\tilde{I}_{RF\left(x,y\right)}\right) - \tilde{I}\left(x,y\right) \right|^2,$$
where $\tilde {I}_{RF\left (x,y\right )}$ denotes the receptive field of the pixel $\tilde {I}(x,y)$. The receptive field in this paper refers to the sub-region surrounding the pixel $\tilde {I}(x,y)$ (see Fig. 1(b)). We use all pixels of $\tilde {I}_{RF\left (x,y\right )}$ in this sub-region, except the middle pixel, to predict the middle pixel $\tilde {I}(x,y)$. The detailed architectures and training strategies of the imaging module and enhancing module are described later.
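As an illustration, the PyTorch sketch below evaluates the Eq. (7) objective in the blind-spot style of Noise2Void [29]. The $5\times 5$ receptive-field size matches Sec. 2.2, but the exact masking scheme (overwriting the hidden pixel with a random neighbour) is our assumption.

```python
import torch

def blindspot_loss(f_em, noisy, k=5):
    """Eq. (7): predict a hidden pixel from its k x k receptive field.

    f_em  : the enhancing network
    noisy : (B, 1, H, W) preliminary images from the imaging module
    """
    B, _, H, W = noisy.shape
    ys = torch.randint(k // 2, H - k // 2, (B,))
    xs = torch.randint(k // 2, W - k // 2, (B,))
    masked = noisy.clone()
    for b in range(B):
        # hide the centre pixel by overwriting it with a random neighbour
        dy, dx = torch.randint(-(k // 2), k // 2 + 1, (2,)).tolist()
        masked[b, 0, ys[b], xs[b]] = noisy[b, 0, ys[b] + dy, xs[b] + dx]
    pred = f_em(masked)
    idx = torch.arange(B)
    # L2 loss evaluated only at the hidden positions
    return ((pred[idx, 0, ys, xs] - noisy[idx, 0, ys, xs]) ** 2).mean()
```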

2.2 Network architecture and training strategy

As shown in Fig. 2(a), the proposed UnDL is an end-to-end deep neural network that takes the measured intensity vector $s$ as input and outputs the reconstruction result $I$. Similar to a recent approach called TST-DL [24], UnDL is trained in two steps: first the imaging module and then the enhancing module. Nevertheless, UnDL differs markedly from TST-DL in that UnDL can be trained without clean data.

Fig. 2. The detailed network architectures of the imaging module and enhancing module. (a) The overall data-flow architecture of our approach. (b) The specific architecture of the physics-based imaging network (FBI net). (c) The detailed architecture of the residual block.

In order to obtain preliminary imaging results based on a physical prior, we develop a network implementation of the imaging module. Specifically, we design a physics-based imaging network (FBI net), which takes the simulated intensity vector $s$ as input and outputs the DGI result. As shown in Fig. 2(b), the FBI net is built with a fully connected layer, a reshape layer, and three convolutional layers with Rectified Linear Unit (ReLU) activations. The input and output dimensions of the fully connected layer are $M$ and $N$, respectively. The reshape layer adjusts the feature shape to $\sqrt {N}\times \sqrt {N}$. In the training stage, we simulate a large number of bucket signals from the MNIST dataset [30] with extra noise added, and the corresponding DGI images serve as ground truths to train the proposed FBI net. The cost function is simply the pixel-wise $L_1$ loss, and SGD with a learning rate of 0.001 is selected as the optimizer.
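A PyTorch sketch of this architecture is given below; the layer sequence follows Fig. 2(b), while the kernel sizes and channel width are our assumptions since the text does not specify them.

```python
import torch.nn as nn

class FBINet(nn.Module):
    """Physics-based imaging net of Fig. 2(b): a fully connected layer
    (M -> N), a reshape to sqrt(N) x sqrt(N), then three ReLU-activated
    convolutional layers."""
    def __init__(self, m, n=64 * 64, ch=32):   # ch: assumed channel width
        super().__init__()
        self.fc = nn.Linear(m, n)
        self.side = int(n ** 0.5)
        self.convs = nn.Sequential(
            nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 1, 3, padding=1), nn.ReLU(),
        )

    def forward(self, s):                      # s: (B, M) bucket intensities
        x = self.fc(s).view(-1, 1, self.side, self.side)
        return self.convs(x)                   # (B, 1, 64, 64) DGI-like image
```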

The network architecture of the enhancing module is based on the generative network of SRGAN [31], which contains several residual blocks [32]. The detailed implementation of the enhancing module is shown in Fig. 2(c). In the training stage, a rectangular region of the input image is selected randomly, and a random pixel is removed from this region as the pixel to be predicted. By updating according to Eq. (7), the enhancing module gradually learns to output the clear image. Note that we use simulated data from an existing database (e.g., MNIST [30]) to generate preliminary results $\tilde {I}$ for training the enhancing module. We use the Adam optimizer with a learning rate of 0.0005 to train the enhancing module. After converging to an optimal point, we concatenate the imaging module and enhancing module into an end-to-end whole and continue training at a learning rate of 0.0001 with SGD. In the test stage, we directly feed the experimental measurements into this end-to-end network to obtain the final result. All of the simulations and experiments are performed on the PyTorch-1.9.1 platform in a computer with an NVIDIA GTX 1080 GPU.
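For reference, a sketch of one residual block and the staged optimizer setup follows; the batch-norm placement mirrors the SRGAN generator [31], and the channel width is our assumption.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """SRGAN-style residual block [31,32] with an identity skip connection."""
    def __init__(self, ch=64):                 # ch: assumed channel width
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.PReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch),
        )

    def forward(self, x):
        return x + self.body(x)

# Staged training as described above (fbi, f_em: the two modules):
#   opt_em  = torch.optim.Adam(f_em.parameters(), lr=5e-4)   # enhancing only
#   opt_e2e = torch.optim.SGD(                               # end-to-end stage
#       list(fbi.parameters()) + list(f_em.parameters()), lr=1e-4)
```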

The proposed UnDL framework is an end-to-end computational imaging framework but needs to be trained in stages. Here we elaborate the training and testing procedures of the UnDL method.

Step 1 Prepare simulated training data for the imaging module. First, we add zero-mean noise to the clean images. Then, simulated 1D intensities at 5%, 10%, and 20% sampling ratios are generated. Next, according to Eq. (3), we obtain DGI results with noise. We use these simulated intensity measurements and the corresponding DGI images to train the imaging module (FBI net); see the sketch after Step 3.

Step 2 Once the imaging module converges, we use its outputs as the training set for the enhancing module. This training stage does not need any paired or clean data. Specifically, as shown in Fig. 1(b), we randomly sample sub-regions of size $5\times 5$ from each input image. In each sub-region, the center pixel is removed, and the remaining pixels, i.e., its receptive field, are used to predict the center pixel. According to Eq. (6) and Eq. (7), the enhancing module converges to output a clear image.

Step 3 After the above two steps, the proposed UnDL can take real imaging measurements as input and output high-quality reconstruction results.
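To make Step 1 concrete, a minimal data-simulation sketch (reusing the dgi_reconstruct helper from Sec. 2.1) might look as follows; load_mnist_image is a hypothetical loader, and the normalization of the bucket signal relative to the noise level is our assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
side, sr = 64, 0.05                        # image side length, 5% sampling ratio
N = side * side
M = int(sr * N)                            # M = 204 patterns at SR = 5%
P = rng.random((M, side, side))            # random illumination patterns
img = load_mnist_image()                   # hypothetical: 64 x 64 image in [0, 1]
s = P.reshape(M, -1) @ img.ravel() / N     # normalized noise-free bucket signal
s_noisy = s + rng.normal(0.0, 0.5, M)      # zero-mean Gaussian noise, std 0.5
dgi_target = dgi_reconstruct(s_noisy, P)   # noisy DGI label for the FBI net
```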

3. Results

In this section, we perform a comprehensive study of the effectiveness of the proposed method. The same random pattern sequence is used for illumination in both the simulations and the real imaging experiments. The pattern size and image size are both set to $64\times 64$. In the simulations, we evaluate the proposed method on the MNIST dataset quantitatively and discuss its generalization ability on another dataset. Gaussian noise with a standard deviation of 0.5 and a mean of 0 is added to the bucket signal. In the experiments, we use the model learned from simulation to image real objects.

3.1 Simulation

We now compare the performance of the proposed method with DGI [13] and CSTV [33], two widely used single-pixel imaging methods. CSTV is a compressive sensing method implemented with total variation regularization. As a model-based method, CSTV [33] exploits image sparsity to optimize the reconstruction iteratively. The comparison results at sampling ratios (SR) of 5%, 10%, and 20% are shown in Fig. 3. To quantitatively compare the different methods, we compute the peak signal-to-noise ratio (PSNR) and the structural similarity index measure (SSIM) [34] of the restored results. The PSNR and SSIM are widely used for measuring image reconstruction performance. Given the restored image $I$ and its ground-truth image $\hat {I}$, the PSNR and SSIM are defined as follows:

$$\text{PSNR}(I, \hat{I}) = 10\log_{10}{\frac{255^2}{\text{MSE}(I, \hat{I})}},$$
$$\text{SSIM}(I,\hat{I}) = \frac{(2\mu_I\mu_{\hat{I}}+C_1)(2\delta_{I\hat{I}}+C_2)}{(\mu_I^2 + \mu_{\hat{I}}^2 +C_1)(\delta_I^2 + \delta_{\hat{I}}^2 +C_2)},$$
where $\text {MSE}(I, \hat {I})$ denotes the mean square error between $I$ and $\hat {I}$, $\mu _I$ and $\mu _{\hat {I}}$ are the mean values of $I$ and $\hat {I}$, $\delta _I$ and $\delta _{\hat {I}}$ are the standard deviations of $I$ and $\hat {I}$, and $\delta _{I\hat {I}}$ denotes the covariance between $I$ and $\hat {I}$. $C_1$ and $C_2$ are two constants that avoid division by zero. PSNR is positive, and SSIM ranges from 0 to 1; a higher PSNR or SSIM value means higher reconstruction quality.
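For reference, Eqs. (8) and (9) can be computed globally as below; the $C_1$, $C_2$ defaults follow the customary choices $(0.01L)^2$ and $(0.03L)^2$ from [34], and library SSIM implementations are usually windowed, so values may differ slightly.

```python
import numpy as np

def psnr(img, ref, peak=255.0):
    """Eq. (8): peak signal-to-noise ratio in dB."""
    mse = np.mean((img.astype(float) - ref.astype(float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def ssim(img, ref, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """Eq. (9) evaluated over the whole image (global statistics)."""
    mu_i, mu_r = img.mean(), ref.mean()
    var_i, var_r = img.var(), ref.var()
    cov = np.mean((img - mu_i) * (ref - mu_r))
    return ((2 * mu_i * mu_r + c1) * (2 * cov + c2)) / (
        (mu_i ** 2 + mu_r ** 2 + c1) * (var_i + var_r + c2))
```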

Fig. 3. Comparison among DGI, CSTV and our proposed UnDL at the 5%, 10% and 20% sampling ratios. (a)-(i) Examples of imaging results of three methods under different sampling ratios. The corresponding PSNR and SSIM values are shown at the bottom of each image. (j), (k) Comprehensive comparisons of DGI, CSTV and our proposed UnDL.

Visually, one can see from Figs. 3(a)-(i) that the imaging quality improves with increasing sampling ratio. In contrast to the subjective visual assessment, DGI's PSNR value at SR=20% is lower than that at SR=10%, which is caused by high-level noise. Comparing the reconstruction results within the same row, CSTV is slightly better than DGI, and the proposed UnDL is much better than the other two methods, as verified by the PSNR and SSIM values at the same sampling ratio. In Figs. 3(j) and (k), we depict histograms with error bars to compare the PSNR and SSIM values of the three methods. As shown in Figs. 3(j) and (k), the proposed method significantly improves the imaging quality under all conditions. Furthermore, our imaging quality at SR=5% is also higher than that of the results restored by DGI and CSTV at SR=20%. We can also see from Figs. 3(c), (f), and (i) that noise still exists in our reconstruction results, which may be caused by two factors. One is that our noise assumption is not perfectly accurate due to the high complexity of the imaging environment. The other is that the effects of undersampling cannot be completely eliminated, because reconstruction from undersampled data is inherently an underdetermined problem. Nevertheless, we can exploit the image prior and an approximate noise prior to improve the imaging quality.

To further evaluate the generalization ability of our unsupervised learning method, we compare the proposed method with a popular supervised learning algorithm, the U-Net [35]. Recently, the U-Net has been widely used in deep learning based GI [20,23]. Following the setup of existing works [20,23], we train the U-Net with clean paired data from the MNIST handwritten digit dataset [30] and test the trained model on the STL-10 natural image dataset [36]. Specifically, we simulate intensity measurements with the same illumination patterns at a 20% sampling ratio and then feed this 1D vector to the well-trained U-Net and to our UnDL. To evaluate the generalization performance quantitatively, we calculate the correlation coefficients between the imaging results and the ground truth, as shown in Fig. 4. The correlation coefficient measures the similarity of two images; a larger value means the image is more similar to the ground truth. Comparing the correlation coefficient values and the generalization results, we find that our method generalizes well whereas the supervised U-Net fails. In fact, poor generalization performance is a common problem of data-driven learning algorithms. In contrast, the proposed UnDL combines the physical imaging process with a data-independent unsupervised enhancing strategy instead of learning the distribution of images. Therefore, our method obtains good results even though the model is trained on a different database.

Fig. 4. Generalization results of different methods. The notation "corr" means the correlation coefficient with gray-level groundtruth.

In addition, we examine the effectiveness of the FBI net. Following the training strategy described above, the FBI net is trained with 60,000 pairs of 1D intensity vectors and corresponding DGI results. With this large training dataset, we find that the DGI mapping can be easily fitted. Figure 5 shows the inference results of the FBI net at different sampling ratios (SR). The outputs of our FBI net are noisy images with characteristics similar to the DGI results. To describe this clearly, we compute their correlation coefficients with the ground truth. At the same sampling ratio, the output of the FBI net has correlation coefficients similar to those of the DGI results. This observation is important for the downstream enhancing module, because it takes the FBI net reconstructions as input.

Fig. 5. The inference results of our physics-based imaging net. The left, middle and right dotted boxes are the results under 5%, 10%, and 20% sample ratios, respectively. All these images have $64\times 64$ pixels. The correlation coefficient (corr) with groundtruth is marked below each image.

To systematically study the robustness of the proposed method to different noise levels, we simulate the imaging results under different noise settings. Specifically, the model is trained on data with additive Gaussian noise (zero mean, standard deviation 0.5), and the trained model is tested under different standard deviations (Std). As shown in Fig. 6, the right two charts depict the trend of image quality under noise interference. For both DGI and UnDL, increasing the noise standard deviation leads to a gradual decrease in image quality (PSNR or SSIM). Nevertheless, the imaging quality of UnDL remains much better than that of DGI throughout, a conclusion that can also be drawn from the four imaging examples of "9", "3", "2", and "5" on the left.

Fig. 6. Imaging results of the proposed UnDL at different noise levels. (a) Example results of DGI and UnDL at different sampling rates (5%-20%) and different noise standard deviations (from 0.1 to 0.9). (b) The mean values of PSNR on MNIST test set for each noise level. (c) The mean values of SSIM on MNIST test set for each noise level.

3.2 Optical experiments

The optical system we built for the experimental demonstration is shown in Fig. 1(a). The light from a pulsed laser is first filtered by a filter with a transmission wavelength of 680 nm. After collimation, the light illuminates the DMD (DLC9500P24, 1080$\times$1920). We use an imaging lens (Lens 1) to project the speckle field at the DMD plane onto the surface of the object. The transmitted light is collected by Lens 2 and finally recorded by a detector. Each pattern has 64$\times$64 pixels, and each pixel consists of 8$\times$8 micromirror units. The focal lengths of the two lenses in Fig. 1(a) are both 100 mm. In our experiment, we used a single-photon avalanche diode (SPAD) as the detector. The digital signal recorded by the SPAD is then fed into a time-correlated single-photon counting (TCSPC) module (HydraHarp-400) to accumulate the final intensity. The TCSPC counter is triggered externally by the DMD; thus, the exposure time is 50 µs. The restoration pipeline is shown in Fig. 1(b). To estimate the experimental SNR, we use a CCD camera to take a clear picture of the target as the ground truth and simulate the noise-free bucket signal on the computer according to the illumination intensity. Then, we collect the bucket signals of the target 10 times at a 20% sampling ratio to calculate the SNR of the experimental environment, defined as $\text{SNR} = 10\lg\frac{V_s}{|V_s-V_d|}$, where $V_s$ and $V_d$ are the average intensities of the pure and noisy signals, respectively. The SNR ranges from $-0.18$ dB to 1.33 dB.
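A one-line helper for this SNR definition (function and variable names are ours):

```python
import numpy as np

def experimental_snr_db(pure, noisy):
    """SNR = 10 lg( V_s / |V_s - V_d| ); V_s, V_d are mean intensities."""
    vs, vd = np.mean(pure), np.mean(noisy)
    return 10.0 * np.log10(vs / abs(vs - vd))
```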

As mentioned above, our method is based on a physical imaging module and an enhancing module. The output of the imaging module retains all the noise present in the imaging system, e.g., illumination noise, background noise, and detection noise. This noisy preliminary image is then improved by the enhancing module. To train the enhancing module, we assume that the total noise in the imaging results tends to be zero-mean. Under this assumption, the enhancing module can be trained with noisy data and still output a clear reconstruction result. In the experiments, we use the model learned from simulation to image real objects. The reconstructed image has $N=64\times 64$ pixels, and the sampling ratios of 5%, 10%, and 20% correspond to $M=204$, $M=409$, and $M=819$ measurements, respectively. Following [21], we measure the full width at half maximum (FWHM) to estimate the speckle grain size on the object plane; the result suggests that the diffraction limit of the experimental GI system is $810~\mu$m.

In our proof-of-principle experiment, two real objects are used: a handmade transparent pentagram and a butterfly specimen (Parantica aglea) from Yunnan, China. The imaging results are shown in Fig. 7. On the one hand, our method shows a significant improvement over DGI and CSTV, especially in noise level. This improvement can be attributed to the effectiveness of the enhancing module. On the other hand, the reconstruction contrast of our UnDL is much higher than that of the other two methods. When the sampling ratio is very low, the DGI results are swamped by noise. Although the CSTV results appear smooth because of the total-variation optimization strategy, they lack the structural information of the object.

Fig. 7. The experimental results of imaging a handmade pentagram and a part of butterfly wings under different sample ratios with DGI, CSTV, and UnDL.

To verify the anti-noise effectiveness of UnDL, we select a region of the butterfly wing for imaging and perform a more detailed comparison of the different methods. The results are shown in Fig. 8. Comparing Figs. 8(a)-(i), our method outperforms the other two methods in anti-noise performance. To further demonstrate the effectiveness of our method, we plot the normalized pixel values along the highlighted line (the red lines in Figs. 8(g)-(i)) at a 20% sampling ratio; the line charts are depicted in Figs. 8(j)-(l). Both CSTV (Fig. 8(k)) and our method (Fig. 8(l)) show obvious peaks compared with DGI (Fig. 8(j)). Figures 8(j), (k), and (l) are plotted on the same coordinate scale, which indicates that our method reaches a higher contrast than CSTV. To show the imaging contrast clearly, Fig. 8(n) draws the 3D contour map of Fig. 8(i). Note that some noise remains in our experimental results (see Figs. 8(c), (f), and (i)), introduced by undersampling, imperfect assumptions, and environmental light.

Fig. 8. The comprehensive study on imaging a butterfly specimen under different sampling ratios. (a)-(i) Imaging results of three methods under different sample ratios. (j)-(l) Pixel values of the red highlight lines in (g), (h), and (i), respectively. (m) Picture of the butterfly specimen and a snapshot of the imaging region. (n) The contour map of (i). (o) DGI result under 200% sampling ratio.

To quantify the imaging time at similar imaging quality, we conducted an oversampling experiment at SR=200%. The imaging result is shown in Fig. 8(o), which visually has a noise level similar to that of Fig. 8(i). The imaging time consists of two parts: the acquisition time and the reconstruction time. We present the imaging time in Table 1. The acquisition time is based on reconstructing 64$\times$64 images at the 20 kHz modulation rate of the DMD. Because of the long reconstruction time of CSTV, we do not compare it with DGI and our method. Although the second and third rows of Table 1 show that our imaging time (0.041 s + 0.085 s = 0.126 s) is slightly longer than the DGI imaging time (0.041 s + 0.056 s = 0.097 s) at a 20% sampling ratio, our imaging quality is much better than that of DGI (see Figs. 8(g) and (i)). In addition, the frame rate of our method can still reach about 8 fps (frames per second). In contrast, to acquire imaging quality similar to Fig. 8(i), DGI requires approximately 10$\times$ the samples, and its frame rate is only 1 fps. Moreover, according to Figs. 3(j) and (k), our imaging quality at 5% and 10% sampling ratios is higher than the DGI result at 20%. Meanwhile, our frame rate at 5% can reach 20 fps. Overall, compared with existing methods, the UnDL proposed in this paper is more suitable for practical single-pixel imaging (SPI) and has stronger versatility. Furthermore, within the number of mirror units of the DMD, the higher the image resolution, the more illumination patterns are required. Although the parameters of the deep neural network also increase, this adds negligibly to the reconstruction time. Therefore, at the same sampling ratio, a larger number of imaging pixels mainly increases the acquisition time.

Table 1. Time spent on practical imaging.

4. Conclusion

Computational imaging suffers from a trade-off between imaging quality and imaging time. To tackle this problem, we designed a physics-informed unsupervised deep learning framework. By introducing a physical prior in the imaging step and a noise prior in the enhancing step, the proposed UnDL breaks the mutual restriction between imaging quality and imaging time. Furthermore, our approach is trained with simulated data and does not require any time-consuming database collection. The experiments and simulations verify its effectiveness in noise suppression and fast imaging. Compared with widely used methods, the proposed framework is capable of reconstructing high-quality images with $64\times 64$ resolution at a 5% sampling ratio in 48 ms ($\sim$20 Hz), which enables real-time practical applications. In addition, we have shown that the proposed framework generalizes well to different imaging environments. Due to the uncontrollability and complexity of imaging conditions, some noise remains in the imaging results, so our UnDL can be further improved. We believe the proposed method can be developed as a solver for various general imaging scenes, especially in complicated and extreme environments.

To further speed up computational imaging in the future, improving the modulation speed of the spatial light modulation device is the most direct approach [37]. Besides, reducing the sampling rate or optimizing the illumination patterns [28,38] is another way to improve the imaging speed. In addition, we will improve our noise model and pursue super-resolution imaging and high pixel-resolution reconstruction based on unsupervised deep learning.

Funding

National Natural Science Foundation of China (61631014, 61905140); Science and Technology Commission of Shanghai Municipality (22142201900).

Disclosures

The authors declare that there are no conflicts of interest related to this article.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. J. H. Shapiro, “Computational ghost imaging,” Phys. Rev. A 78(6), 061802 (2008). [CrossRef]  

2. M. F. Duarte, M. A. Davenport, D. Takhar, J. N. Laska, T. Sun, K. F. Kelly, and R. G. Baraniuk, “Single-pixel imaging via compressive sampling,” IEEE Signal Process. Mag. 25(2), 83–91 (2008). [CrossRef]  

3. M. P. Edgar, G. M. Gibson, and M. J. Padgett, “Principles and prospects for single-pixel imaging,” Nat. Photonics 13(1), 13–20 (2019). [CrossRef]  

4. G. M. Gibson, S. D. Johnson, and M. J. Padgett, “Single-pixel imaging 12 years on: a review,” Opt. Express 28(19), 28190–28208 (2020). [CrossRef]  

5. C. Zhao, W. Gong, M. Chen, E. Li, H. Wang, W. Xu, and S. Han, “Ghost imaging lidar via sparsity constraints,” Appl. Phys. Lett. 101(14), 141123 (2012). [CrossRef]  

6. N. Radwell, S. D. Johnson, M. P. Edgar, C. F. Higham, R. Murray-Smith, and M. J. Padgett, “Deep learning optimized single-pixel lidar,” Appl. Phys. Lett. 115(23), 231101 (2019). [CrossRef]  

7. H. Yu, R. Lu, S. Han, H. Xie, G. Du, T. Xiao, and D. Zhu, “Fourier-transform ghost imaging with hard x rays,” Phys. Rev. Lett. 117(11), 113901 (2016). [CrossRef]  

8. A. Zhang, Y. He, L. Wu, L. Chen, and B. Wang, “Tabletop x-ray ghost imaging with ultra-low radiation,” Optica 5(4), 374–377 (2018). [CrossRef]  

9. M. P. Olbinado, D. M. Paganin, Y. Cheng, and A. Rack, “X-ray phase-contrast ghost imaging using a single-pixel camera,” Optica 8(12), 1538–1544 (2021). [CrossRef]  

10. N. Radwell, K. J. Mitchell, G. M. Gibson, M. P. Edgar, R. Bowman, and M. J. Padgett, “Single-pixel infrared and visible microscope,” Optica 1(5), 285–289 (2014). [CrossRef]  

11. O. Katz, Y. Bromberg, and Y. Silberberg, “Compressive ghost imaging,” Appl. Phys. Lett. 95(13), 131110 (2009). [CrossRef]  

12. S. Han, H. Yu, X. Shen, H. Liu, W. Gong, and Z. Liu, “A review of ghost imaging via sparsity constraints,” Appl. Sci. 8(8), 1379 (2018). [CrossRef]  

13. F. Ferri, D. Magatti, L. Lugiato, and A. Gatti, “Differential ghost imaging,” Phys. Rev. Lett. 104(25), 253603 (2010). [CrossRef]  

14. B. Sun, S. S. Welsh, M. P. Edgar, J. H. Shapiro, and M. J. Padgett, “Normalized ghost imaging,” Opt. Express 20(15), 16892–16901 (2012). [CrossRef]  

15. M. Sun, M. P. Edgar, D. B. Phillips, G. M. Gibson, and M. J. Padgett, “Improving the signal-to-noise ratio of single-pixel imaging using digital microscanning,” Opt. Express 24(10), 10476–10485 (2016). [CrossRef]  

16. M. Sun, Z. Xu, and L. Wu, “Collective noise model for focal plane modulated single-pixel imaging,” Opt. Lasers Eng. 100, 18–22 (2018). [CrossRef]  

17. Y. Jauregui-Sánchez, P. Clemente, P. Latorre-Carmona, E. Tajahuerce, and J. Lancis, “Signal-to-noise ratio of single-pixel cameras based on photodiodes,” Appl. Opt. 57(7), B67–B73 (2018). [CrossRef]  

18. X. Li, N. Qi, S. Jiang, Y. Wang, X. Li, and B. Sun, “Noise suppression in compressive single-pixel imaging,” Sensors 20(18), 5341 (2020). [CrossRef]  

19. A. Pastuszczak, R. Stojek, P. Wróbel, and R. Kotyński, “Differential real-time single-pixel imaging with fourier domain regularization: applications to vis-ir imaging and polarization imaging,” Opt. Express 29(17), 26685–26700 (2021). [CrossRef]  

20. S. Liu, X. Meng, Y. Yin, H. Wu, and W. Jiang, “Computational ghost imaging based on an untrained neural network,” Opt. Lasers Eng. 147, 106744 (2021). [CrossRef]  

21. F. Wang, C. Wang, M. Chen, W. Gong, Y. Zhang, S. Han, and G. Situ, “Far-field super-resolution ghost imaging with a deep neural network constraint,” Light: Sci. Appl. 11(1), 1–11 (2022). [CrossRef]  

22. M. Lyu, W. Wang, H. Wang, H. Wang, G. Li, N. Chen, and G. Situ, “Deep-learning-based ghost imaging,” Sci. Rep. 7(1), 17865 (2017). [CrossRef]  

23. F. Wang, H. Wang, H. Wang, G. Li, and G. Situ, “Learning from simulation: An end-to-end deep-learning approach for computational ghost imaging,” Opt. Express 27(18), 25560–25572 (2019). [CrossRef]  

24. R. Shang, K. Hoffer-Hawlik, F. Wang, G. Situ, and G. P. Luke, “Two-step training deep learning framework for computational imaging without physics priors,” Opt. Express 29(10), 15239–15254 (2021). [CrossRef]  

25. T. Shimobaba, Y. Endo, T. Nishitsuji, T. Takahashi, Y. Nagahama, S. Hasegawa, M. Sano, R. Hirayama, T. Kakue, A. Shiraki, and T. Ito, “Computational ghost imaging using deep learning,” Opt. Commun. 413, 147–151 (2018). [CrossRef]  

26. I. Hoshi, T. Shimobaba, T. Kakue, and T. Ito, “Single-pixel imaging using a recurrent neural network combined with convolutional layers,” Opt. Express 28(23), 34069–34078 (2020). [CrossRef]  

27. R. Zhu, H. Yu, Z. Tan, R. Lu, S. Han, Z. Huang, and J. Wang, “Ghost imaging based on y-net: a dynamic coding and decoding approach,” Opt. Express 28(12), 17556–17569 (2020). [CrossRef]  

28. S. Rizvi, J. Cao, K. Zhang, and Q. Hao, “Deepghost: real-time computational ghost imaging via deep learning,” Sci. Rep. 10(1), 11400 (2020). [CrossRef]  

29. A. Krull, T.-O. Buchholz, and F. Jug, “Noise2void-learning denoising from single noisy images,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2019), pp. 2129–2137.

30. Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE 86(11), 2278–2324 (1998). [CrossRef]  

31. C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi, “Photo-realistic single image super-resolution using a generative adversarial network,” in Proceedings of the IEEE conference on computer vision and pattern recognition (2017), pp. 4681–4690.

32. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition (2016), pp. 770–778.

33. L. Bian, J. Suo, Q. Dai, and F. Chen, “Experimental comparison of single-pixel imaging algorithms,” J. Opt. Soc. Am. A 35(1), 78–87 (2018). [CrossRef]  

34. A. Horé and D. Ziou, “Image quality metrics: Psnr vs. ssim,” in 20th International Conference on Pattern Recognition (2010), pp. 2366–2369.

35. O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention (Springer International Publishing, 2015), pp. 234–241.

36. A. Coates, A. Ng, and H. Lee, “An analysis of single-layer networks in unsupervised feature learning,” in Proceedings of the fourteenth international conference on artificial intelligence and statistics (2011), pp. 215–223.

37. Z. Xu, W. Chen, J. Penuelas, M. Padgett, and M.-J. Sun, “1000 fps computational ghost imaging using led-based structured illumination,” Opt. Express 26(3), 2427–2434 (2018). [CrossRef]  

38. M. Sun, L. Meng, M. P. Edgar, M. J. Padgett, and N. Radwell, “A russian dolls ordering of the hadamard basis for compressive single-pixel imaging,” Sci. Rep. 7(1), 3464 (2017). [CrossRef]  
