Optica Publishing Group

Deringing and denoising in extremely under-sampled Fourier single pixel imaging

Open Access

Abstract

Undersampling in Fourier single-pixel imaging (FSI) is often employed to reduce imaging time for real-time applications. However, the undersampled reconstruction contains ringing artifacts (Gibbs phenomenon) because the high-frequency target information is not recorded. Furthermore, when a 3-step FSI strategy (fewer measurements but weaker noise suppression) is used with a low-grade sensor (i.e., a photodiode), this ringing couples with noise to produce unwanted artifacts that lower image quality. To improve the imaging quality of real-time FSI, a fast image reconstruction framework based on a deep convolutional autoencoder network (DCAN) is proposed. Through context learning over FSI artifacts, the network is capable of deringing, denoising, and recovering details in 256 × 256 images. The promising experimental results show that the proposed deep-learning-based FSI outperforms conventional FSI in terms of image quality even at very low sampling rates (1–4%).

© 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

The basic single-pixel imaging (SPI) [1] scheme employing random binary patterns fails to reconstruct high-quality images (compared to conventional imaging) even with a large number of measurements (much higher than the number of image pixels). Recently, basis scan schemes [2] have emerged as a successor to random-pattern-based SPI. These schemes employ patterns drawn from a complete orthogonal basis and use inverse transforms to reconstruct high-quality images. Among these schemes, Fourier single-pixel imaging (FSI) is widely used for reconstructing high-quality images [3]. FSI uses a digital micromirror device (DMD) to illuminate a target with phase-shifted sinusoidal patterns and collects the back-reflected light with an ordinary photodiode. An inverse Fourier transform (IFT) is subsequently used to reconstruct a high-quality target image. Compared to another basis scan strategy, Hadamard single-pixel imaging (HSI), FSI has proven efficient in reconstructing under-sampled images [2]. Similarly, FSI achieves better image quality than other SPI schemes such as differential imaging [4], normalized SPI [5], and frequency-locked SPI [6,7]. The high-quality reconstruction of FSI has opened doors for its application in many areas [8,9].

To achieve high-quality reconstruction, FSI requires a full spectral sweep across the target scene, which costs a large number of measurements (equal to the number of target image pixels) and increases the imaging time. The imaging time of FSI comprises the data acquisition time and the image reconstruction time. The image reconstruction time in FSI is almost negligible because it is simply an IFT with little computational cost. It is worth mentioning that this inherently low reconstruction time readily improves the efficiency of FSI compared to basic SPI, where specialized compressed sensing (CS) algorithms [10–12] are used for this job. The data acquisition time of FSI depends on the modulation speed of the spatial light modulator (SLM). Recent technological advances allow DMDs (the commonly used SLM) to operate at a maximum of 22 kHz (fast FSI [13]), setting an upper bound on imaging speed. To increase the imaging speed of FSI, the data acquisition time is reduced by capturing undersampled images. In practice, FSI has been demonstrated for dynamic imaging at high frame rates by reconstructing images at a low sampling rate of 2% [13,14], which deteriorates image quality.

The main source of image quality deterioration in undersampled FSI is the presence of ringing artifacts (the Gibbs phenomenon) [15]. Ringing distorts image sharpness by producing oscillatory artifacts. The reason for ringing in undersampled FSI is its reliance on capturing low-frequency spectral coefficients while ignoring high-frequency information about the target. This lack of high-frequency information in the captured spectrum results in rough approximations of sharp edges and oscillatory artifacts in the reconstructed image, shown in Fig. 1. For real-time FSI, 3-step FSI is an apt choice, requiring 25% fewer measurements than 4-step FSI. However, the 3-step reconstruction using a low-grade sensor (photodiode) presents slightly more noise than the 4-step scheme (which has better differential suppression), as can be observed in Fig. 1. The cumulative effect of (environmental) noise and ringing produces unwanted artifacts in the reconstructed image, lowering image quality. To enhance the image quality of real-time FSI, it is important to develop a fast image reconstruction framework that can achieve deringing and denoising at low computational cost. Artifact suppression [16,17] and deringing [18] in image processing have been addressed through different algorithms, which are either application specific [19] or computationally expensive [20]. Considering the unique artifacts appearing in undersampled FSI, a context-learning-based approach that can learn reconstruction from those artifacts is deemed suitable.

Fig. 1. Ringing artifacts and noise comparison (3-step vs. 4-step at 4% sampling ratio) in experimental FSI.
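The mechanism behind the artifacts in Fig. 1 can be reproduced with a short simulation: keep only a centred circular disk of Fourier coefficients (a stand-in for the 'circular' undersampled FSI acquisition described later) and invert. This is a minimal sketch; the function name and image size are illustrative, not part of the paper's code.

```python
import numpy as np

def lowpass_fsi(img, sampling_ratio):
    """Simulate undersampled 'circular' FSI: keep only the Fourier
    coefficients inside a centred disk whose area is sampling_ratio * N^2,
    then invert. Gibbs ringing appears near sharp edges."""
    n = img.shape[0]
    radius_sq = sampling_ratio * n * n / np.pi  # disk area = S * N^2
    f = np.fft.fftshift(np.fft.fft2(img))
    yy, xx = np.mgrid[:n, :n]
    mask = (xx - n // 2) ** 2 + (yy - n // 2) ** 2 <= radius_sq
    return np.real(np.fft.ifft2(np.fft.ifftshift(f * mask)))

# A sharp step target: ringing shows as oscillations (and overshoot)
# on both sides of the edge, as in the undersampled panels of Fig. 1.
target = np.zeros((64, 64))
target[:, 32:] = 1.0
recon = lowpass_fsi(target, 0.04)
```

Because the disk always contains the zero-frequency coefficient, the mean intensity is preserved exactly; only the edge structure is corrupted.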

Deep learning (DL) has been applied to solve complex problems in computational imaging [21]. DL approaches can extract distinctive features from a large dataset and have been successfully employed for unsupervised learning in many applications [22–24]. In particular, DL has been applied to solve the inverse problem in SPI [25,26], providing a powerful alternative to compressive sensing. The application of DL in SPI mostly demands training on experimental data [27], which is tedious considering the long imaging time. Recently, we proposed a deep learning framework to improve the imaging quality of 4-step FSI [28]. The network showed improvements in processing relatively low-resolution (96×96) images at high sampling rates (≥10%). However, the applicability of that method is limited by its requirement of training on physical imaging data for experimental performance. For dynamic scene reconstruction, it is quite difficult (if not infeasible) to develop a large dataset from practical data, and the imaging capability of the method is limited to specific targets. Similarly, the noise robustness of the model came into question when applied to experimental data. The network pipeline did not support scale variances, and image details were lost at low sampling rates. A novel method that learns from carefully designed simulation data was recently proposed in [29]. This method trains the network on simulation data by mimicking experimental detections and mapping them to training data. However, the network uses a very deep architecture and is only optimized for a simple/limited (MNIST digits) dataset. Therefore, it remains challenging to design a network trained on simulation data that is applicable to diverse complex targets with real-time operation.

In this work, we exploit the context-learning capability of convolutional layers [30] to solve the problem of image denoising and deringing in 3-step-sampling-based FSI. Considering the limitations of existing schemes, a novel network architecture with the following salient features is proposed: (a) improved image quality under extreme compression (96–98%); (b) training (on diverse complex scenes) through simulation data for practical imaging, so the trained network is applicable to all kinds of scenes without the need to fix priors or re-train; (c) a novel parallel architecture with feature learning at different scales; (d) real-time, powerful denoising and deringing capabilities; (e) the ability to be integrated with other sampling strategies to yield improved image reconstruction; (f) adaptive operation, in the sense that it applies deringing and denoising by inspecting features in the image and does not distort the image (e.g., in the absence of noise in simulations); (g) denoising (and deringing) performance that matches (and outperforms) state-of-the-art image processing algorithms. The network can simultaneously handle deringing and denoising in real time, whereas these problems are otherwise handled separately by computationally expensive, application-specific image processing algorithms. The network can reconstruct high-quality 256×256 images from undersampled data (1–4%). DL-based FSI can replace the conventional FSI method for real-time applications where a better-quality image is required at higher frame rates.

2. Principles and methods

2.1 Fourier single-pixel imaging

FSI is based on the idea that an image can be decomposed as a combination of inner products between sinusoidal patterns and intensity coefficients. To implement this, FSI [3] acquires the Fourier spectrum of the target scene by spectrally sweeping the target with phase-shifting sinusoidal illumination patterns. The corresponding spectral coefficients, encoded as light intensity (back-scattered from the target), are collected by an ordinary photodiode. Applying an inverse Fourier transform to the acquired coefficients reconstructs the target image.

FSI can be performed using either 3-step or 4-step method. The 3-step FSI requires 25% fewer measurements (1.5×image pixels) compared to 4-step FSI (2×image pixels) and is more suitable for real-time imaging. Although 4-step FSI has slightly better differential noise cancellation than 3-step FSI, the performance of 3-step FSI can be enhanced using a denoising autoencoder. Therefore, the proposed imaging framework employs 3-step FSI to reduce acquisition time and achieves denoising through the proposed DCAN.

In FSI, the sinusoidal pattern for the frequency pair F = (fx, fy) across the image plane is generated using the expression [3]:

$$P_\phi(x,y;f_x,f_y) = a + b\cos(2\pi f_x x + 2\pi f_y y + \phi)$$
where a is the average image intensity and b is the contrast. The back-scattered intensity, integrated over the target scene, is given by:
$$I_\phi(F) = \iint r(x,y)\,P_\phi(x,y;F)\,dx\,dy$$
where r(x, y) is the reflectivity distribution across the target plane. Considering environmental noise and random reflections near the scene, the total response encapsulated by the detector is written as [3]:
$$R_\phi(F) = R_n + k\,I_\phi(F)$$
where k is associated with the size of the detector [3], and Rn is related to random light fluctuations around the detector. To acquire the corresponding coefficients at a particular frequency, we generate the phase sequence $P_{\phi=0} \to C_0$, $P_{\phi=2\pi/3} \to C_{2\pi/3}$, $P_{\phi=4\pi/3} \to C_{4\pi/3}$; the phase shift between adjacent patterns is constant. By acquiring the response $R_\phi$ for the different phase values, an asymmetric differential mechanism can be applied to cancel out noise, given by [2]:
$$\big(2R_0(F) - R_{2\pi/3}(F) - R_{4\pi/3}(F)\big) + \sqrt{3}\,j\big(R_{2\pi/3}(F) - R_{4\pi/3}(F)\big) \approx F\{r(x,y)\}$$
Further applying the IFT, we obtain:
$$F^{-1}\big\{\big(2R_0(F) - R_{2\pi/3}(F) - R_{4\pi/3}(F)\big) + \sqrt{3}\,j\big(R_{2\pi/3}(F) - R_{4\pi/3}(F)\big)\big\} \approx \tilde{r}(x,y)$$
where $\tilde{r}(x,y)$ is the undersampled reconstructed image, which is subsequently fed to the DCAN model. The reconstructed image $\tilde{r}(x,y)$ contains ringing artifacts produced by the low-pass response $l_d$ of undersampled FSI, which filters out all frequencies outside a circular disk of radius $\mathcal{W}_{\textrm d}$ centered at zero frequency in the Fourier domain. The proposed scheme uses the 'circular' FSI strategy to acquire coefficients, with the origin of the Fourier spectrum at the center (lowest frequency). Thus the noisy, ringing-affected reconstruction can be written as:
$$\tilde{r}(x,y) = {l_d}(r(x,y)) + n$$
Given $\tilde{r}(x,y)$, denoising and deringing yield an estimate $\hat{r}(x,y)$ of the original image such that $\hat{r}(x,y) \approx r(x,y)$. To solve this inverse problem, DL is employed for accurate reconstruction of the original image via context learning. The undersampled FSI reconstructions at very low sampling rates of 1–5% are passed to the DCAN, which applies its learned model to improve image resolution and remove the artifacts present in the under-sampled images.
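The 3-step differential combination above can be checked numerically. The sketch below (our own minimal discretization, with a = b = 0.5 and illustrative function names) measures the three phase-shifted detector responses for one frequency pair and applies the asymmetric combination; up to a constant scale factor of 3b, the result matches the corresponding DFT coefficient, and the ambient term Rn cancels exactly.

```python
import numpy as np

def measure(r, fx, fy, phi, a=0.5, b=0.5, ambient=0.0):
    """Single-pixel response R_phi: reflectivity r integrated against one
    sinusoidal pattern P_phi, plus an ambient (noise floor) term R_n."""
    n = r.shape[0]
    y, x = np.mgrid[:n, :n] / n   # unit-square image plane
    p = a + b * np.cos(2 * np.pi * (fx * x + fy * y) + phi)
    return ambient + (r * p).sum()

def coefficient_3step(r, fx, fy, ambient=0.0):
    """Asymmetric 3-step differential: cancels the DC/ambient term and
    returns (up to the scale 3b) the Fourier coefficient F{r}(fx, fy)."""
    r0 = measure(r, fx, fy, 0.0, ambient=ambient)
    r1 = measure(r, fx, fy, 2 * np.pi / 3, ambient=ambient)
    r2 = measure(r, fx, fy, 4 * np.pi / 3, ambient=ambient)
    return (2 * r0 - r1 - r2) + 1j * np.sqrt(3) * (r1 - r2)
```

For a random test scene, `coefficient_3step(r, fx, fy)` agrees with `3 * b * np.fft.fft2(r)[fy, fx]`, and the same value is returned regardless of the ambient level, illustrating why only the measurement noise (not the background) survives into $\tilde{r}(x,y)$.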

2.2 Deep learning based FSI

To recover high-quality images from extremely undersampled FSI reconstructions (sampling ratio ≤ 4%), it is important to retain image details along with denoising and deringing. To achieve both denoising-deringing and detail retention, the proposed imaging framework employs the parallel-structured network shown in Fig. 2(a). The network consists of two stages. The lower stage (E1-E2-D2-D1) is a denoising autoencoder used for suppressing ringing artifacts and denoising the image. While suppressing unwanted artifacts, the lower stage also loses some image details. To recover fine details, the network uses another stage (F1-F2-F3) which does not suppress noise to a great extent and is able to retain more details (seen in the feature maps of the upper stage in Fig. 2(b)). By integrating the two stages, the network is trained to suppress artifacts using the lower stage and recover fine image details through the upper stage. The network employs convolutional layers (Conv2D) to extract features and remove corruptions using sets of trainable filters with 3×3 kernels, shown in Fig. 2(a). All symmetric encoding-decoding layers are connected by skip connections to propagate feature information and gradients to the deeper layers. The DCAN takes an undersampled input $\tilde{r}(x,y)$ and maps it to an intermediate space $a = f(\Theta \tilde{r})$ with weights $\Theta$ during encoding in the lower stage. Similar encoding happens for the upper stage, $c = h(\Phi \tilde{r})$. In the decoding stages, the mappings a and c are mapped back into reconstructed feature spaces $b = g({\Theta ^T}a)$ and $d = j({\Phi ^T}c)$ bearing the same shape as $r(x,y)$. The final image is reconstructed by combining the outputs of the two stages, $\hat{r}(x,y) = b + d$. The parameters $\Theta$ and $\Phi$ are optimized by training the network to learn an end-to-end mapping from $\tilde{r}({x,y})$ to $r({x,y})$.
For the reconstructed target $\hat{r}({x,y})$, the loss function that favors a high peak signal-to-noise ratio (PSNR, in dB) over m training examples is expressed as:

$$\mathop{\min}\limits_{\Theta,\Phi} \textrm{Loss}(\Theta,\Phi) = \frac{1}{m}\sum\limits_{i=1}^m {[{\hat{r}_i(x,y) - r_i(x,y)}]^2}$$
The network is initialized with Xavier initialization [31]. To increase training efficiency, a batch normalization (BN) layer [32] is used after every Conv2D layer. Non-linear activations are provided by rectified linear units (ReLU) at every stage to avoid the 'vanishing gradient' problem. Max-pooling (or up-sampling) layers are used to reduce (or restore) dimensions and provide translational invariance. To mitigate over-fitting, l2-regularization combined with Gaussian noise layers is used. The network architecture is carefully designed and fine-tuned to improve image quality at low computational cost. To update the network parameters and minimize the loss, Adam optimization [33] with standard back-propagation is employed. The base learning rate for all layers is fixed at $10^{-4}$. We train our network on the STL-10 [34] DL dataset. All images are converted to gray scale and normalized before training. The training is performed on 10,000 unlabeled images. A test set (of 1000 images) is used to verify network performance during training, and a validation set (2000 images) is used to test the performance of the final model. The training data was prepared within one week through simulations. The proposed model is implemented using Keras with TensorFlow on an Intel i7 CPU with 16 GB RAM.
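A minimal Keras sketch of the two-stage parallel DCAN described above is given below. The filter counts and the additive skip connections are our illustrative assumptions; the paper fixes only the stage layout (E1-E2-D2-D1 in parallel with F1-F2-F3), the 3×3 kernels, BN + ReLU after each Conv2D, max-pooling/up-sampling, skip connections, the b + d output combination, Adam at a base learning rate of 10^-4, and the MSE loss.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def conv_bn_relu(x, filters):
    """Conv2D (3x3) -> BatchNormalization -> ReLU, as used at every stage."""
    x = layers.Conv2D(filters, 3, padding='same')(x)
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)

def build_dcan(size=256):
    inp = layers.Input((size, size, 1))

    # Lower stage: denoising autoencoder E1-E2-D2-D1 with skip connections.
    e1 = conv_bn_relu(inp, 32)
    p1 = layers.MaxPooling2D()(e1)
    e2 = conv_bn_relu(p1, 64)
    p2 = layers.MaxPooling2D()(e2)
    mid = conv_bn_relu(p2, 64)                       # intermediate space a
    d2 = conv_bn_relu(layers.UpSampling2D()(mid), 64)
    d2 = layers.Add()([d2, e2])                      # skip E2 -> D2
    d1 = conv_bn_relu(layers.UpSampling2D()(d2), 32)
    d1 = layers.Add()([d1, e1])                      # skip E1 -> D1
    b = layers.Conv2D(1, 3, padding='same')(d1)      # lower-stage output b

    # Upper stage: F1-F2-F3 at full resolution, retains fine detail.
    f = conv_bn_relu(inp, 32)
    f = conv_bn_relu(f, 32)
    d = layers.Conv2D(1, 3, padding='same')(f)       # upper-stage output d

    out = layers.Add()([b, d])                       # r_hat = b + d
    return Model(inp, out)

model = build_dcan(64)   # small size for a quick structural check
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4), loss='mse')
```

The model is fully convolutional, so the same trained weights apply to any input size divisible by 4 (two pooling stages), which is convenient when moving between simulation and experimental image sizes.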

Fig. 2. (a) The proposed DCAN architecture; (b) stage-wise feature maps through the DCAN for the 'boats' test image. The network first extracts information at E1 & F1. This information is recovered at the end at F3 & D1, where it is combined to reconstruct the final composite image. The high-frequency information is captured by the upper stage, as seen in the feature maps. The information in the lower stage is first encoded (E1 to intermediate) and then decoded (intermediate to D1) to remove noise. Overall, the filters learn to rectify ringing during training.

3. Simulations and experiments

3.1 Simulations

To visualize the unwanted artifacts in undersampled FSI, the reconstruction of the 'peppers' test image for different sampling ratios is shown in Fig. 3. For an N×N image, the sampling ratio (or rate) S defines the number of basis patterns required for reconstruction as (2×N×N)×S for the 4-step and (1.5×N×N)×S for the 3-step scheme. It can be observed that the FSI reconstruction of a 256×256 image for S ≥ 10% is very clear, encompassing all target details. By qualitative comparison, it can be inferred that a clear target reconstruction in FSI is achieved at S = 10%. Therefore, in this paper we benchmark FSI at S = 10% for performance comparisons (with the proposed DL-FSI) because of its clear reconstruction and real-time support (compared to higher S). For S < 10%, it can be observed that even though substantial target information is present in the undersampled reconstruction, this information is affected by ringing artifacts, lowering image quality. Since real-time FSI relies on undersampled reconstruction [13,14], it is important to develop an imaging framework capable of deringing and denoising to recover clean images.
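The pattern-count bookkeeping above can be written as a one-liner (a sketch; the helper name is ours):

```python
def n_patterns(n, s, steps=3):
    """Number of illumination patterns for an NxN image at sampling
    ratio s: (2*N*N)*s for 4-step FSI, (1.5*N*N)*s for 3-step FSI."""
    factor = 2.0 if steps == 4 else 1.5
    return round(factor * n * n * s)
```

For a 256×256 image, 4-step FSI at S = 10% needs 13,107 patterns, while 3-step FSI at S = 4% needs only 3,932; this gap is what drives the frame-rate differences reported later.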

Fig. 3. Simulation of FSI reconstruction for the 'peppers' test image for different values of S. The spectrum is sampled by 'circularly' sweeping from low to high frequency within a band of radius $\mathcal{W}_{\textrm d}$.

The proposed DL-FSI framework is optimized through numerical simulations. To achieve powerful deringing, the DL-FSI model is trained on thousands of undersampled images with trademark FSI artifacts. Apart from deringing, the network has inherent denoising characteristics arising from its architecture. The denoising feature is useful for practical imaging, where the reconstruction differs slightly from simulation and contains noise along with ringing. For training and testing, the STL-10 dataset is used. The network is trained on input images reconstructed by FSI at S = 1–5%, with the corresponding ground truth as the output label. For quantitative comparison, both DL-FSI and FSI methods are evaluated over a validation dataset (2000 images) not seen by DL-FSI during training. The reconstruction results are quantified using two metrics, i.e., PSNR and Structural SIMilarity (SSIM) [35]. The results on the validation dataset for DL-FSI 1% (where the input to the DCAN is the 1% FSI reconstruction), DL-FSI 2%, DL-FSI 3%, and FSI 10% are plotted as histograms in Fig. 4. The distributions in the histograms indicate that FSI 10% gives slightly better performance than DL-FSI, which is understandable because of its higher sampling rate. However, the DL-FSI method also outperforms FSI for many images in the dataset. To further quantify this performance, the average SSIM and PSNR values for the validation dataset are presented in Fig. 4. Among the DL-FSI methods, DL-FSI 3% gives the best performance in terms of SSIM and PSNR, with average SSIM and PSNR (0.67 and 21 dB) quite comparable to FSI 10% (0.74 and 22 dB). Therefore, the lower bound for reliable reconstruction with the proposed model is set to S = 3%. Although the image quality of FSI 10% is slightly better than DL-FSI 3%, the latter outperforms the former in terms of image reconstruction time.
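For reference, the PSNR used in these comparisons can be computed as below; SSIM follows the standard definition of [35] (available, e.g., as scikit-image's `structural_similarity`). This is a minimal sketch for images normalized to [0, 1]:

```python
import numpy as np

def psnr(ref, test, peak=1.0):
    """Peak signal-to-noise ratio in dB between a reference image and a
    reconstruction, both float arrays with the same dynamic range."""
    mse = np.mean((np.asarray(ref, float) - np.asarray(test, float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```

As a sanity check, a uniform error of 0.1 on a [0, 1] image gives an MSE of 0.01 and hence a PSNR of exactly 20 dB.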

Fig. 4. Quantitative results on validation dataset (2000 images) for FSI 10% and DL-FSI (1%, 2%, and 3%).

The reconstruction performance of DL-FSI is further investigated on the standard test set (e.g., peppers, pirate, mandrill) to prove its flexibility of application on diverse complex scenes. The reconstruction results for different targets and sampling ratios are shown in Fig. 5(a). The results show that the undersampled FSI reconstructions contain ringing artifacts which corrupt sharp edges and fine details in the image. Conversely, the DL-FSI model, after rigorous training on thousands of undersampled images (containing inherent ringing artifacts), is capable of deringing any undersampled image. DL-FSI is able to recover a clean image even at a low sampling ratio of S = 1% ('house' image). However, it is observed that for complex scenes and practical imaging, S = 3–4% yields good reconstruction. The denoising characteristics of DL-FSI are tested by adding white Gaussian noise (the 'awgn' function in Matlab) to the (intensity) measurement data with fixed ringing at FSI 4%. It is important to mention that the network is not trained on noisy data; rather, it is the network architecture (using scale variances, the autoencoder structure, and skip connections) that provides efficient denoising. The results in Fig. 5(b) indicate that the network is able to recover a clean image even at very high levels of noise corruption. This validates that the network is robust against noise, capable of suppressing noise which may appear during practical imaging. The performance of DL-FSI can also be compared with a state-of-the-art denoising algorithm such as BM3D [20] for different noise levels. From Fig. 6, it can be observed that the denoising performance of DL-FSI is very efficient. It is noteworthy that in the absence of ringing, the proposed method does not distort the image and only performs denoising, demonstrating its feature-learning strength. The performance of DL-FSI is further compared with FSI 10%, shown in Fig. 7, along with image insets for observing recovered image details.

Fig. 5. Reconstruction results: (a) on different sampling ratios (b) FSI 4% with added noise and corresponding DL-FSI reconstructions (SSIM w.r.t ground truth).

Fig. 6. Comparison of image reconstruction between BM3D and DL-FSI models at two noise levels.

Fig. 7. Comparison of image reconstruction between FSI 10% and DL-FSI models for 'boats' and 'pirate' test images. The bottom row shows insets zoomed from the images in the top row.

It can clearly be seen for the 'boats' and 'pirate' images that both DL-FSI 3% and DL-FSI 4% efficiently reconstruct low-level features similar to FSI 10%, but without ringing. Conversely, DL-FSI 2% is only able to recover coarse details of the target and blurs out fine details because of excessive ringing in its FSI counterpart. Although the FSI 10% reconstruction looks similar to the fully sampled image, it still contains ringing artifacts, shown in the zoomed insets. From this qualitative comparison it can be concluded that DL-FSI at S = 1–2% can be used for ringing suppression and rudimentary target reconstruction without recovering low-level details, whereas both DL-FSI 3% and 4% outperform FSI 10% in reconstruction quality by suppressing ringing and recovering fine details accurately. Similarly, compared to DL-FSI 3%, fine details are better recovered by DL-FSI 4%, as seen in Fig. 7.

The reconstruction results quantified by SSIM for 6 other targets from the standard test set are presented in Table 1. The trend in the table shows that the proposed DL-FSI method outperforms FSI at the corresponding sampling ratios. The SSIM values for both methods at S = 1% are almost similar, although DL-FSI has slightly better values. Here, a major improvement is not observed because of heavy ringing and a lack of target information in FSI. As FSI captures more target information, DL-FSI outperforms FSI by a bigger margin. This trend holds until S = 6%, after which the information in the FSI reconstruction surpasses that of DL-FSI. The results in Table 1 are based on simulation, but this performance improves for physical imaging, where the denoising capability of DL-FSI plays its part along with deringing. This improvement is further verified through physical experiments. Overall, the simulation results indicate that the performance of conventional FSI is significantly improved by integrating the DL framework to achieve better quality reconstruction with a very small computational time (discussed in the next section).

Table 1. SSIM values for 6 targets reconstructed using FSI and DL-FSI

Finally, the performance of DL-FSI is compared with other basis scan and optimization-based schemes. Specifically, the models of Fourier domain regularized inversion (FDRI) [36] (DCT sampling with standard parameters) and Hadamard single-pixel imaging (HSI) [2] are compared with DL-FSI to demonstrate its artifact suppression capability. The comparison is made at similar compression of the corresponding basis sets (3–4%). From Fig. 8, it can be seen that the reconstruction of DL-FSI at lower sampling rates is clean (without artifacts) compared to FDRI and HSI (both of which show artifacts). In terms of reconstruction time, both FDRI (about 24 ms) and DL-FSI (26 ms) have small values compared to HSI (81 ms). However, a thorough comparison between these methods is beyond the scope of this paper.

Fig. 8. Qualitative and quantitative comparison of DL-FSI with other methods (HSI and FDRI).

3.2 Physical experiments

The efficiency and feasibility of the proposed method are demonstrated through experiments. The imaging configuration of DL-FSI is shown in Fig. 9. An integrated projection system with a TI DLP 6500 DMD module is used to illuminate the target with sinusoidal patterns (binarized using the approach in [13]). The scene to be captured is printed on A4-sized paper (placed in the background of a 3D object for some experiments) and kept at a distance of 500 mm from the projector and photodetector. Light back-scattered from the scene is collected (lensless) by the photodetector (Thorlabs; 13 mm² active area). Intensity measurements from the photodetector are digitized using a 16-bit data acquisition (DAQ) card (Gage CSEG8 sampling at 2 MS/s). Customized software developed in LabVIEW synchronously controls both the DMD and the photodetector. An Intel i7 CPU with 16 GB RAM is used for data processing.

Fig. 9. Experimental imaging setup.

The proposed DL-FSI is trained on undersampled FSI images and is capable of deringing along with inherent denoising. This denoising capability is present because the proposed model is based on a denoising autoencoder, and it is important for experimental imaging, where the reconstructions differ slightly from simulations. In the first experiment, the proposed method is tested using the 'house' test image placed in the background of a tree-shaped toy. The reconstructions for both FSI and DL-FSI are shown in Fig. 10 for different sampling ratios. It is apparent from visual inspection that the FSI reconstruction for S < 10% contains ringing artifacts and noise. To improve the image quality of conventional FSI at undersampling ratios of 1–4%, the DCAN model is applied. The corresponding DL-FSI reconstructions in Fig. 10 indicate that for S = 3–4% the target image can be clearly and accurately recovered by DL-FSI. The DL-FSI 4% reconstruction is slightly better than DL-FSI 3%. For quantitative analysis, the SSIM and PSNR values given in Fig. 10 are calculated with respect to a clean FSI 10% image (processed by DL-FSI). The results of DL-FSI 3% and DL-FSI 4% are further compared with 4-step FSI 10% (in all experiments, we use 4-step FSI for S ≥ 10%) in Fig. 11. It can be seen that at S = 10% the FSI reconstruction gives better quality without ringing. However, DL-FSI at lower sampling ratios can still provide a clean reconstruction, compared to FSI 10%, with all image details.

Fig. 10. Qualitative and quantitative comparison of FSI with DL-FSI.

Fig. 11. Qualitative comparison of 4-step FSI 10% with 3-step based DL-FSI 3% and 4%.

The performance of under-sampled DL-FSI is further compared with 4-step FSI (at higher sampling rates of S = 10% and 20%) using a toy bear placed in front of a USAF test chart. From the reconstructions in Fig. 12, it can be seen that DL-FSI does well in reconstructing a clean image. Although the resolution of DL-FSI is low (because of the low sampling rate), the sharpness of the image visually compensates for it. By comparison, the information embedded in the DL-FSI reconstruction is almost similar to FSI 10%, with some differences in sharpness, which is understandable considering the higher sampling rate of FSI. For instance, the small-scale markings at the center of the chart are unclear in both FSI 10% and DL-FSI 4%.

Fig. 12. Resolution comparison for 4-step FSI (10% and 20%) with 3-step DL-FSI (4%).

For a complex scene, the ‘Lake’ target image is used. The reconstructions are presented in Fig. 13. It can be seen that both DL-FSI 3% and DL-FSI 4% clearly recover multiple objects (like clouds, trees, boat, lake, shadows and background trees) from their low-quality FSI counterparts.

Fig. 13. FSI vs. DL-FSI reconstruction for 'Lake' test image.

To quantify imaging time, the timing values for conventional FSI (4-step) and DL-FSI are presented in Table 2. The imaging time is based on the 3-step FSI scheme, reconstructing 256 × 256 images, and a ∼22 kHz modulation speed. In Table 2, the total imaging time (IT) equals the data acquisition time (IAQ) plus the reconstruction time (IR). The reconstruction time (IR) for FSI is the time required by the IFT, whereas for DL-FSI it is the combined time of the IFT plus DCAN processing. The reconstruction time for DL-FSI remains the same for different sampling ratios, which is an attractive feature of the DL-based model.

Table 2. Experimental imaging time for FSI and DL-FSI.

It can be seen from Table 2 that the proposed DL-FSI (3–4%) achieves 4–6 times higher frame rates than FSI 10%. An important observation from Table 2 is that as the sampling ratio increases, the difference between the number of frames generated by DL-FSI and conventional FSI (3-step) reduces to 1. This indicates that the computational time of the DCAN model becomes insignificant at the sampling ratios that give good reconstruction quality. On the other hand, the frame rate of DL-FSI is overall higher than that of the commonly used 4-step FSI, which has better noise suppression at the cost of extra measurements. DL-FSI can afford to take fewer measurements and compensate through DL-based noise suppression.
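The arithmetic behind IT = IAQ + IR can be sketched as follows; the ∼26 ms DL-FSI reconstruction time and the 22 kHz DMD rate are taken from the text, while the helper name and default arguments are our assumptions:

```python
def imaging_time(n=256, s=0.04, steps=3, dmd_hz=22_000, t_recon=0.026):
    """Total imaging time I_T = I_AQ + I_R: pattern count divided by the
    DMD modulation rate, plus a (roughly constant) reconstruction time."""
    factor = 2.0 if steps == 4 else 1.5
    t_acq = factor * n * n * s / dmd_hz   # acquisition time in seconds
    return t_acq + t_recon

# DL-FSI 4% on a 256x256 image: ~0.18 s acquisition + 26 ms processing,
# i.e. on the order of 5 frames per second.
```

This also shows why the DCAN overhead becomes insignificant: at S = 3–4% the fixed 26 ms processing is already small next to the acquisition time, and the gap only widens as S grows.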

Finally, results on the ‘pirate’ image are shown in Figs. 14 and 15. Figure 15 provides a qualitative comparison between conventional FSI 10% and DL-FSI 4%, along with plots of the image profile (pixel intensity values along the colored lines). Figure 15(a) shows that the DL-FSI 4% reconstruction is sharp and clear (with high contrast) compared to FSI 10%. The image profiles in Fig. 15(b) also confirm that DL-FSI 4% (blue plot) closely approximates the ground truth (red plot). In contrast, the FSI 10% profile (green plot) is offset from the ground truth because of ringing artifacts.
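An image profile of the kind plotted in Fig. 15(b) is simply the pixel intensities sampled along one line of the image. A minimal sketch with synthetic stand-in arrays (not the paper's actual reconstructions): a step-edge ground truth and a "ringed" version with an alternating overshoot, mimicking the offset FSI profile described above.

```python
import numpy as np

def line_profile(img, row):
    """Pixel intensities along one horizontal line (image row)."""
    return np.asarray(img, dtype=float)[row, :]

# Synthetic example: step edge vs. the same edge with a ringing-like offset.
gt = np.zeros((8, 8))
gt[:, 4:] = 1.0
ringed = gt + 0.15 * np.cos(np.pi * np.arange(8))[None, :]

err_gt = np.abs(line_profile(gt, 3) - line_profile(gt, 3)).mean()
err_ring = np.abs(line_profile(ringed, 3) - line_profile(gt, 3)).mean()
print(err_gt, err_ring)  # the ringed profile deviates from ground truth
```

Plotting such profiles for the reconstruction and the ground truth makes the intensity offset caused by ringing directly visible, as in Fig. 15(b).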

Fig. 14. FSI vs. DL-FSI reconstruction for ‘pirate’ test image.

Fig. 15. (a) reconstruction comparison of 4-step FSI 10% with 3-step DL-FSI 4%, (b) plots of image profile.

4. Conclusion

The paper presents a real-time DL-based imaging framework employing the 3-step Fourier sampling strategy to improve the quality of extremely undersampled images (1-4% sampling). Compared to existing optimization methods, DL is more efficient and flexible in solving image reconstruction problems and does not rely on fixed priors, dictionary learning, or pre-calculated matrices. The reconstruction quality of basis scan methods (such as FSI and HSI) and optimization methods (e.g., FDRI or total-variation-based algorithms) degrades under undersampling. In FSI in particular, ringing artifacts combined with noise (from sources such as the detector, light fluctuations, and fast data sampling) degrade image quality. To improve the imaging efficiency of FSI, we present a novel DCAN architecture (DL-FSI) that can also be integrated with other sampling strategies. The proposed DL-FSI is deep yet fast, needs training on simulation data only, and has adaptive denoising and deringing capability owing to its feature learning. The denoising capability of DL-FSI is inherited from its denoising autoencoder architecture with skip connections, and is further improved by a separate parallel stage that retains high-frequency information. Through context learning over ringing artifacts, the proposed method rectifies ringing to further improve image quality. Experimental results demonstrate that DL-FSI reconstructs good-quality 256 × 256 images at 96-97% compression and 5-6 Hz frame rates. The imaging speed and quality of DL-FSI are significantly improved compared to conventional FSI, and DL-FSI also outperforms other basis scan and optimization strategies, such as HSI and FDRI, in terms of artifact-free image reconstruction.
Moreover, the denoising and deringing capability of DL-FSI competes well with state-of-the-art image processing algorithms that are specifically designed for a particular task at high computational cost. The proposed method can therefore be applied to different applications without task-specific training.

Funding

National Natural Science Foundation of China (61871031, 61875012); Natural Science Foundation of Beijing Municipality (4182058).

Disclosures

The authors declare no conflicts of interest.

References

1. J. Shapiro, “Computational ghost imaging,” Phys. Rev. A 78(6), 061802 (2008). [CrossRef]  

2. Z. Zhang, X. Wang, G. Zheng, and J. Zhong, “Hadamard single-pixel imaging versus Fourier single-pixel imaging,” Opt. Express 25(16), 19619–19639 (2017). [CrossRef]  

3. Z. Zhang, X. Ma, and J. Zhong, “Single-pixel imaging by means of Fourier spectrum acquisition,” Nat. Commun. 6(1), 6225 (2015). [CrossRef]  

4. F. Ferri, D. Magatti, L. Lugiato, and A. Gatti, “Differential ghost imaging,” Phys. Rev. Lett. 104(25), 253603 (2010). [CrossRef]  

5. B. Sun, S. S. Welsh, M. P. Edgar, J. H. Shapiro, and M. J. Padgett, “Normalized ghost imaging,” Opt. Express 20(15), 16892–16901 (2012). [CrossRef]  

6. B. Sun, M. P. Edgar, R. Bowman, L. E. Vittert, S. Welsh, A. Bowman, and M. J. Padgett, “3D computational imaging with single-pixel detectors,” Science 340(6134), 844–847 (2013). [CrossRef]  

7. S. S. Welsh, M. P. Edgar, R. Bowman, P. Jonathan, B. Sun, and M. J. Padgett, “Fast full-color computational imaging with single-pixel detectors,” Opt. Express 21(20), 23068–23074 (2013). [CrossRef]  

8. J. Peng, M. Yao, J. Cheng, Z. Zhang, S. Li, G. Zheng, and J. Zhong, “Micro-tomography via single-pixel imaging,” Opt. Express 26(24), 31094–31105 (2018). [CrossRef]  

9. S. Zhao, R. Liu, P. Zhang, H. Gao, and F. Li, “Fourier single-pixel reconstruction of a complex amplitude optical field,” Opt. Lett. 44(13), 3278–3281 (2019). [CrossRef]  

10. D. L. Donoho, “Compressed sensing,” IEEE Trans. Inf. Theory 52(4), 1289–1306 (2006). [CrossRef]  

11. V. Katkovnik and J. Astola, “Compressive sensing computational ghost imaging,” J. Opt. Soc. Am. A 29(8), 1556–1567 (2012). [CrossRef]  

12. O. Katz, Y. Bromberg, and Y. Silberberg, “Compressive ghost imaging,” Appl. Phys. Lett. 95(13), 131110 (2009). [CrossRef]  

13. Z. Zhang, X. Wang, G. Zheng, and J. Zhong, “Fast Fourier single-pixel imaging via binary illumination,” Sci. Rep. 7(1), 12029 (2017). [CrossRef]  

14. J. Huang, D. Shi, K. Yuan, S. Hu, and Y. Wang, “Computational-weighted Fourier single-pixel imaging via binary illumination,” Opt. Express 26(13), 16547–16559 (2018). [CrossRef]  

15. J. Gibbs, “Fourier’s series,” Nature 59(1522), 200 (1898). [CrossRef]  

16. L. C. P. Croton, G. Ruben, K. S. Morgan, D. M. Paganin, and M. J. Kitchen, “Ring artifact suppression in X-ray computed tomography using a simple, pixel-wise response correction,” Opt. Express 27(10), 14231–14245 (2019). [CrossRef]  

17. B. Münch, P. Trtik, F. Marone, and M. Stampanoni, “Stripe and ring artifact removal with combined wavelet - Fourier filtering,” Opt. Express 17(10), 8567–8591 (2009). [CrossRef]  

18. C. Jung and L. Jiao, “Novel bayesian deringing method in image interpolation and compression using a SGLI prior,” Opt. Express 18(7), 7138–7149 (2010). [CrossRef]  

19. J. Veraart, E. Fieremans, I. O. Jelescu, F. Knoll, and D. S. Novikov, “Gibbs ringing in diffusion MRI,” Magn. Reson. Med. 76(1), 301–314 (2016). [CrossRef]  

20. A. Foi, V. Katkovnik, and K. Egiazarian, “Pointwise shape-adaptive DCT for high-quality denoising and deblocking of grayscale and color images,” IEEE Trans. Image Process. 16(5), 1395–1411 (2007). [CrossRef]  

21. G. Barbastathis, A. Ozcan, and G. Situ, “On the use of deep learning for computational imaging,” Optica 6(8), 921–943 (2019). [CrossRef]  

22. B. Cai, X. Xu, K. Jia, C. Qing, and D. Tao, “Dehazenet: An end-to-end system for single image haze removal,” IEEE Trans. Image Process. 25(11), 5187–5198 (2016). [CrossRef]  

23. G. Satat, M. Tancik, O. Gupta, B. Heshmat, and R. Raskar, “Object classification through scattering media with deep learning on time resolved measurement,” Opt. Express 25(15), 17466–17479 (2017). [CrossRef]  

24. P. Caramazza, A. Boccolini, D. Buschek, M. Hullin, C. F. Higham, R. Henderson, and D. Faccio, “Neural network identification of people hidden from view with a single-pixel, single-photon detector,” Sci. Rep. 8(1), 11945 (2018). [CrossRef]  

25. X. Zhai, Z. Cheng, Z. Liang, Y. Chen, Y. Hu, and Y. Wei, “Computational ghost imaging via adaptive deep dictionary learning,” Appl. Opt. 58(31), 8471–8478 (2019). [CrossRef]  

26. C. F. Higham, R. Murray-Smith, M. J. Padgett, and M. P. Edgar, “Deep learning for real-time single-pixel video,” Sci. Rep. 8(1), 2369 (2018). [CrossRef]  

27. Y. He, G. Wang, G. Dong, S. Zhu, H. Chen, A. Zhang, and Z. Xu, “Ghost imaging based on deep learning,” Sci. Rep. 8(1), 6469 (2018). [CrossRef]  

28. S. Rizvi, J. Cao, K. Zhang, and Q. Hao, “Improving Imaging Quality of Real-time Fourier Single-pixel Imaging via Deep learning,” Sensors 19(19), 4190 (2019). [CrossRef]  

29. F. Wang, H. Wang, H. Wang, G. Li, and G. Situ, “Learning from simulation: An end-to-end deep-learning approach for computational ghost imaging,” Opt. Express 27(18), 25560–25572 (2019). [CrossRef]  

30. P. Vincent, H. Larochelle, Y. Bengio, and P. A. Manzagol, “Extracting and composing robust features with denoising autoencoders,” in Proceedings of the 25th International Conference on Machine Learning (ACM, 2008), pp. 1096–1103.

31. X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in Proceedings of AISTATS (2010).

32. S. Ioffe and C. Szegedy, “Batch normalization: accelerating deep network training by reducing internal covariate shift,” in Proceedings of International conference on Machine Learning (2015), pp. 448–456.

33. D. Kingma and J. Ba, “Adam: a method for stochastic optimization,” in ICLR (2015).

34. A. Coates, H. Lee, and A. Y. Ng, “An analysis of single layer networks in unsupervised feature learning,” in Proceedings of AISTATS (2011).

35. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. Image Process. 13(4), 600–612 (2004). [CrossRef]  

36. K. M. Czajkowski, A. Pastuszczak, and R. Kotyński, “Real-time single-pixel video imaging with Fourier domain regularization,” Opt. Express 26(16), 20009–20022 (2018). [CrossRef]  

Equations (7)

$$P_\phi(x, y; f_x, f_y) = a + b\cos(2\pi f_x x + 2\pi f_y y + \phi)$$
$$I_\phi(x, y; F) = r(x, y)\, P_\phi(x, y; F)$$
$$R_\phi(F) = R_n + k I_\phi(x, y; F)$$
$$\left[2R_0(F) - R_{2\pi/3}(F) - R_{4\pi/3}(F)\right] + \sqrt{3}\, j\left[R_{2\pi/3}(F) - R_{4\pi/3}(F)\right] \propto \mathcal{F}\{r(x, y)\}$$
$$\mathcal{F}^{-1}\left\{\left[2R_0(F) - R_{2\pi/3}(F) - R_{4\pi/3}(F)\right] + \sqrt{3}\, j\left[R_{2\pi/3}(F) - R_{4\pi/3}(F)\right]\right\} \propto \tilde{r}(x, y)$$
$$\tilde{r}(x, y) = l_d(r(x, y)) + n$$
$$\min\,\mathrm{Loss}(\Theta, \Phi) = \frac{1}{m}\sum_{i=1}^{m}\left[\hat{r}(x, y) - r(x, y)\right]^2$$
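The 3-step combination above can be sketched in numpy. This is a hedged simulation, not the paper's implementation: detector responses for the three phase-shifted sinusoidal patterns are simulated by projecting each pattern onto a known reflectivity r(x, y), the complex Fourier coefficient is assembled per the combination formula, and an inverse FFT recovers the image. The constant detector offset R_n cancels in the combination (2·c − c − c = 0), which is why it is omitted here. Full sampling is assumed; zeroing the high frequencies before the IFT would produce the ringing discussed in the paper.

```python
import numpy as np

def fsi_3step(r, a=0.5, b=0.5):
    """Reconstruct r(x, y) from simulated 3-step FSI measurements."""
    n = r.shape[0]
    y, x = np.mgrid[0:n, 0:n]                    # pixel coordinates
    spectrum = np.zeros((n, n), dtype=complex)
    for u in range(n):                           # f_x index
        for v in range(n):                       # f_y index
            resp = []
            for phi in (0.0, 2 * np.pi / 3, 4 * np.pi / 3):
                # phase-shifted sinusoidal pattern P_phi
                p = a + b * np.cos(2 * np.pi * (u * x + v * y) / n + phi)
                resp.append((r * p).sum())       # single-pixel response
            r0, r1, r2 = resp
            # 3-step combination: proportional to the Fourier coefficient
            spectrum[v, u] = (2 * r0 - r1 - r2) + np.sqrt(3) * 1j * (r1 - r2)
    # combination yields 3b * F{r}; normalize before the inverse transform
    return np.fft.ifft2(spectrum).real / (3 * b)
```

For a small test image the reconstruction matches the input to floating-point precision; in practice the per-frequency loop is replaced by hardware pattern projection, and only a fraction S of the frequencies is ever measured.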