
Photon-level single-pixel 3D tomography with masked attention network

Open Access

Abstract

Tomography plays an important role in characterizing the three-dimensional structure of samples in specialized scenarios. In this paper, a masked attention network is presented to eliminate interference from different layers of the sample, substantially enhancing the resolution of photon-level single-pixel tomographic imaging. Simulation and experimental results demonstrate that the axial and lateral resolution of the imaging system can be improved by about 3 and 2 times, respectively, at a sampling rate of 3.0 %. The scheme is expected to be seamlessly integrated into various tomography systems, which is conducive to promoting tomographic imaging in biology, medicine, and materials science.

© 2024 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Tomographic imaging is a noninvasive, high-resolution volumetric imaging technology that plays a critical role in observing the three-dimensional (3D) structure of biological tissues and detecting internal defects in industrial samples [1–4]. The technique utilizes sophisticated equipment to control the sample or the illumination beam, enabling optical sectioning of the sample. These individual sections are then stacked to reconstruct the 3D structure of the sample [5–7]. However, tomographic imaging of samples in extreme environments places higher demands on imaging systems, such as a broader spectral response, tolerance of more extreme lighting conditions, and increased flexibility [8–10].

Single-pixel imaging (SPI) is a typical computational imaging technique that offers significant advantages in challenging lighting conditions [11–13]. It utilizes a set of spatial patterns to illuminate the sample, uses a bucket detector for data acquisition, and forms an image from the measurements via an inversion algorithm [14–16]. Since SPI employs only a high-sensitivity bucket detector to capture echo signals, it has significant advantages in detection sensitivity, spectral response range, and imaging cost. Moreover, the unique imaging modality of SPI is compatible with Compressive Sensing (CS) theory, enabling the recovery of high-quality images at a sub-Nyquist sampling rate (SR) [17–19]. Furthermore, SPI provides increased flexibility, as its pixel resolution can be freely regulated through the pixel resolution of the illumination patterns [20]. Consequently, single-pixel tomography has gradually attracted considerable attention [21]. Researchers have successfully applied this technology to biological microscopic imaging and imaging through scattering media [22–24]. However, the technology still faces the challenges of insufficient axial resolution and restricted lateral resolution.

In recent years, deep learning has been extensively applied to improve the imaging quality of SPI and to alleviate the restrictions of tomography [25–28]. Specifically, well-designed neural networks in SPI can efficiently compensate for the information loss caused by undersampling and recover superior images even at a low SR [29–34]. In the field of tomographic imaging, neural networks have enhanced both the axial and lateral resolution of tomography, allowing for isotropic imaging while maintaining high resolution [35–37]. Consequently, deep learning is anticipated to offer a viable approach for enhancing the performance of single-pixel tomography.

In this paper, we propose a Masked Attention Network (MANet) to eliminate interference in optical sections of samples. It successfully addresses the issue of overlapped sections caused by the limited axial resolution of photon-level single-pixel tomography. The results demonstrate that MANet can effectively enhance the axial and lateral resolution by 3 times and 2 times, respectively. More importantly, the plug-and-play nature of MANet allows flexible integration into most tomography systems.

2. Methods

2.1 Single-pixel tomographic imaging via deep learning

In the single-pixel tomographic imaging system, the layer of the sample in the focal plane is modulated by a series of patterns, and the scattered light is recorded by the bucket detector, as shown in Fig. 1(a). According to the principle of CS, the series of patterns constitutes the measurement matrix $\phi \in R^{M\times N}$, where $M$ represents the number of patterns and $N$ is the pixel number of the target image. The one-dimensional (1D) measurements $y$ recorded by the detector can be expressed as:

$$y_i = \phi_i x \quad (i = 1, 2, 3, \ldots, M)$$
where $x$ is the target image, represented by a 1D column vector of dimension $N \times 1$. The measurements $y$ refer to the photon counts, with dimension $M\times 1$. However, because of undersampling, the number of unknowns in Eq. (1) is much larger than the number of equations (measurements), making Eq. (1) an underdetermined system. It is difficult to recover the $N$-dimensional original signal directly from the $M$-dimensional measurements ($M\ll N$). Therefore, the image $x$ is typically reconstructed by iterative algorithms such as Total Variation Augmented Lagrangian Alternating Direction (TVAL3) [38] and Orthogonal Matching Pursuit [39].
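As a concrete illustration, the following minimal sketch simulates the forward model of Eq. (1) in NumPy. The random binary measurement matrix is a stand-in for the Hadamard patterns adopted later, and the least-squares estimate merely demonstrates why an underdetermined inversion needs sparsity-exploiting solvers such as TVAL3.

```python
import numpy as np

# Minimal sketch of the SPI forward model in Eq. (1).
N_SIDE = 128                  # pattern resolution (128 x 128)
N = N_SIDE * N_SIDE           # pixels of the target image
SR = 0.03                     # 3.0 % sampling rate
M = int(SR * N)               # number of patterns, M = 491 << N

rng = np.random.default_rng(0)
phi = rng.integers(0, 2, size=(M, N)).astype(float)  # measurement matrix (M x N)
x = rng.random(N)             # placeholder target image as a 1D column vector
y = phi @ x                   # 1D photon-count measurements (M x 1)

# With M << N, Eq. (1) is underdetermined: the minimum-norm least-squares
# solution is a poor estimate, which is why sparsity-based iterative solvers
# (e.g., TVAL3, OMP) are used in practice.
x_ls, *_ = np.linalg.lstsq(phi, y, rcond=None)
print(M, x_ls.shape)          # 491 (16384,)
```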


Fig. 1. Schematic diagram of single-pixel tomographic imaging system. (a) Detailed working principle of 3D single-pixel tomographic imaging via deep learning. (b) Reconstructed results of the sample.


Due to the inherent limitations of the imaging system, the restored image of a specific layer is usually influenced by the scattered light from neighboring layers. This will result in structures from different layers appearing in the restored image. Consequently, the low-quality reconstructed image is fed into MANet to remove the interference from different layers:

$$x_c = \operatorname{MANet}(x)$$

During the whole imaging process, the top of the sample is initially positioned in the focal plane of the tomography system for imaging. Subsequently, the sample is shifted incrementally along its axial direction with a fixed step size, and the imaging process described above is executed at each step. After all layers of the sample have been systematically covered, the individual 2D layers are stacked to reconstruct the 3D structure of the sample, as illustrated in Fig. 1(b). The step size of the axial movement determines the axial resolution of the system. The acquisition loop is summarized in the sketch below.
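In the sketch, the helper names (move_stage, acquire, reconstruct, manet) are hypothetical placeholders rather than the authors' actual interfaces; it only makes the layer-by-layer workflow explicit.

```python
import numpy as np

def scan_volume(n_layers, step_um, move_stage, acquire, reconstruct, manet):
    """Hypothetical sketch of the layer-by-layer tomographic acquisition:
    one 2D section is acquired, reconstructed, and cleaned per axial step,
    and the sections are stacked into a 3D volume."""
    layers = []
    for k in range(n_layers):
        move_stage(z=k * step_um)      # shift the sample along its axial direction
        y = acquire()                  # bucket-detector photon counts for all patterns
        x_low = reconstruct(y)         # low-quality section, e.g., via TVAL3
        x_clean = manet(x_low)         # remove inter-layer interference, Eq. (2)
        layers.append(x_clean)
    return np.stack(layers, axis=0)    # (n_layers, 128, 128) volume
```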

2.2 Masked attention network

The architecture of MANet is depicted in Fig. 2(a). It utilizes U-net [40] as the foundational framework and contains four down-sampling blocks, a dual-domain masked attention module, and four up-sampling blocks. Each down-sampling block consists of two convolution layers with a kernel size of $3\times 3$ and a max pooling layer with stride 2. Similarly, each up-sampling block includes an up-sampling layer followed by two convolution layers. The dual-domain masked attention module mainly contains a Spatial Masked Attention Module (SMAM) and a Channel Masked Attention Module (CMAM), as illustrated in Fig. 2(b). The SMAM conducts a cross-product operation between the reshaped feature maps and their transpose, subsequently yielding a probability distribution via the softmax activation function [41]. The probability distribution represents the spatial correlation between each pixel and the other pixels in the feature map, indicating the spatial regions that demand attention from the neural network. Similarly, the CMAM assigns distinct weights to individual channels based on an analysis of their interrelations. A certain proportion ($20 {\% }\sim 50 {\% }$) of the entries in the probability distributions is then randomly set to 0, thus achieving random masking of the feature maps. The masking mechanism forces the model to comprehensively understand and reconstruct the image content rather than interference features, thereby enhancing the generalization of MANet. During testing, the masking mechanism is deactivated, preserving the intact probability distribution without any randomization to 0. Furthermore, it is worth emphasizing that the deconvolution layer in the up-sampling block is replaced by bilinear interpolation, since bilinear interpolation greatly reduces the number of model parameters while maintaining performance. A minimal PyTorch sketch of the SMAM is given below.
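The sketch illustrates the SMAM principle described above: a non-local-style spatial attention map computed from the cross-product of reshaped feature maps, randomly masked during training only. The learnable residual weight is our assumption, and the exact projections in MANet may differ; the CMAM is analogous, with a channel-wise $(B, C, C)$ attention map instead.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SMAM(nn.Module):
    """Sketch of the Spatial Masked Attention Module: spatial attention from
    the cross-product of reshaped feature maps, randomly masked in training."""
    def __init__(self, mask_ratio=(0.2, 0.5)):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.gamma = nn.Parameter(torch.zeros(1))   # assumed learnable residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        flat = x.view(b, c, h * w)                        # reshape feature maps
        attn = torch.bmm(flat.transpose(1, 2), flat)      # (B, HW, HW) cross-product
        attn = F.softmax(attn, dim=-1)                    # spatial probability distribution
        if self.training:                                 # mask only during training
            ratio = float(torch.empty(1).uniform_(*self.mask_ratio))
            attn = attn * (torch.rand_like(attn) >= ratio)  # zero 20-50 % at random
        out = torch.bmm(flat, attn.transpose(1, 2))       # re-weight the features
        return x + self.gamma * out.view(b, c, h, w)      # residual connection
```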


Fig. 2. The architecture of MANet. (a) The backbone of MANet. Conv 3$\times$3 represents a convolution layer with kernel size $3\times 3$, BN is batch normalization, Dilated Conv is a dilated convolution layer with kernel size $3\times 3$, and Concat means concatenation. (b) The principle of SMAM and CMAM.


In the implementation of the neural network, the Mean Square Error (MSE) [42] is used as the loss function for image restoration, minimizing the pixel-wise discrepancy between the recovered images and the ground truth:

$$Loss_{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(GT_i-\operatorname{MANet}(x_i)\right)^2$$
where $N$ is the number of pixels, $GT_i$ represents the ground truth, and $\operatorname{MANet}(x_i)$ is the pixel recovered by MANet. During training, the number of epochs is 100 and the batch size is 64. For the learning rate, we use a dynamic strategy: the learning rate gradually decreases as training progresses, which makes the model more stable at the end of training. The initial learning rate is set to 0.001 and is halved every 20 epochs. MANet is implemented in PyTorch (version 2.0.0) and trained on a workstation equipped with 128 GB RAM, two Intel Xeon Gold C6226R CPUs, and two RTX A6000 GPUs.
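A minimal sketch of this training configuration follows; the optimizer choice (Adam) and the dummy model and data are our assumptions, since the text specifies only the loss, epoch count, batch size, and learning-rate schedule.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(1, 1, kernel_size=3, padding=1)   # toy stand-in for MANet
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # optimizer is assumed
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)
criterion = nn.MSELoss()                            # loss of Eq. (3)

x_low = torch.rand(64, 1, 128, 128)                 # toy batch of low-quality inputs
gt = torch.rand(64, 1, 128, 128)                    # toy ground-truth labels

for epoch in range(100):                            # 100 epochs, batch size 64
    optimizer.zero_grad()
    loss = criterion(model(x_low), gt)
    loss.backward()
    optimizer.step()
    scheduler.step()                                # halve the LR every 20 epochs
```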

Initially, MANet is pre-trained on MNIST reshaped to 128$\times$128 pixels, which takes 3 hours. MANet is then fine-tuned on the interference dataset and the two-line dataset, with each fine-tuning requiring 7 minutes. The MNIST dataset comprises 60,000 training images and 10,000 test images of handwritten digits. The interference dataset consists of 3,000 images, each overlaid with the characters '3', 'D', and 'I'. The rotation angle of the characters varies between $-40^\circ$ and $40^\circ$, while their size ranges from 40 to 90. The grayscale value of characters in the focal plane is set to $250\sim 255$, while that of characters in adjacent layers is set between $180\sim 220$. The axial resolution of the dataset can be tuned by adjusting the grayscale value of the characters located in adjacent layers: high axial resolution corresponds to closer distances between layers, and hence increased grayscale values of characters in adjacent layers, whereas low axial resolution corresponds to reduced grayscale values. The two-line dataset includes 3,000 images. The line spacing in the two-line images ranges from 1 to 45 pixels, the line width varies between 4 and 10 pixels, and the line height ranges from 50 to 70 pixels. The rotation angle of the two lines varies between $-50^\circ$ and $50^\circ$.
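To make the dataset construction concrete, the sketch below composes one interference image under the stated parameter ranges. The use of Pillow's bundled default font (which accepts a size argument in Pillow ≥ 10.1) and the maximum-based compositing of layers are our assumptions.

```python
import numpy as np
from PIL import Image, ImageDraw, ImageFont

rng = np.random.default_rng(0)

def render_char(ch, size, angle, gray, canvas=128):
    """Render one rotated character at a given grayscale value (assumed renderer)."""
    img = Image.new("L", (canvas, canvas), 0)
    font = ImageFont.load_default(size)          # placeholder font (Pillow >= 10.1)
    ImageDraw.Draw(img).text((20, 20), ch, fill=int(gray), font=font)
    return img.rotate(angle)

# focal-plane character is bright (250-255); adjacent layers are dimmer (180-220)
params = [("3", (250, 256)), ("D", (180, 221)), ("I", (180, 221))]
layers = [
    render_char(ch, int(rng.integers(40, 91)), float(rng.uniform(-40, 40)),
                rng.integers(*gray_range))
    for ch, gray_range in params
]
overlaid = np.max(np.stack([np.asarray(l) for l in layers]), axis=0)  # interference image
```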

During the fine-tuning stage, the forward process of SPI is performed on both the interference dataset and the two-line dataset to simulate imaging outcomes. These simulated results, along with their corresponding high-quality images, are utilized as image-label pairs for fine-tuning MANet. To improve the noise robustness of MANet, noise is introduced in the forward process of SPI. Specifically, the normalized image is first subjected to an element-wise multiplication and summation with the patterns. The patterns are Hadamard [43] patterns with a resolution of $128\times 128$. The result is divided by the pixel number of the pattern to obtain the probability of a photon returning for each pattern. A response timeline of the Single Photon Avalanche Diode (SPAD) is then established for each pattern duration, in which the probability of detecting a photon at each moment equals the probability obtained above. Subsequently, a response timeline for noise photons is established, wherein the occurrence of photon response events is randomly determined to simulate noise photons. Finally, the total photon counts, including noise, are obtained by adding the two timelines. The timeline length is fixed at 60,000. Both datasets incorporate high-level and low-level noise. Under low-level noise conditions, the count of noise photons fluctuates between 200 and 400; under high-level noise conditions, it varies between 700 and 900. Images with high-level and low-level noise are equally distributed in both fine-tuning datasets.
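The photon-counting noise model described above can be summarized as follows; the Bernoulli sampling of the signal timeline and the uniform placement of noise events follow the text, while the specific arrays are illustrative stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 60_000                                   # fixed timeline length

def photon_counts(pattern, image, n_noise):
    """Total photon counts (signal + noise) for one pattern duration."""
    p = np.sum(pattern * image) / pattern.size          # per-moment photon-return probability
    signal_timeline = rng.random(T) < p                 # SPAD response timeline
    noise_timeline = np.zeros(T, dtype=int)
    noise_timeline[rng.choice(T, size=n_noise, replace=False)] = 1  # random noise events
    return int(signal_timeline.sum() + noise_timeline.sum())       # add the two timelines

pattern = rng.integers(0, 2, (128, 128)).astype(float)   # stand-in for a Hadamard pattern
image = rng.random((128, 128))                            # normalized image
low = photon_counts(pattern, image, int(rng.integers(200, 401)))   # low-level noise
high = photon_counts(pattern, image, int(rng.integers(700, 901)))  # high-level noise
```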

3. Numerical simulation

3.1 Sampling rate

To assess the robustness of MANet, this section compares the reconstruction results recovered by Differential Ghost Imaging (DGI) [44], TVAL3, a Fully Convolutional Network (FCN) [45], and MANet at different SRs. DGI is a classical ghost imaging algorithm, and FCN is a deep learning algorithm. Initially, we apply the SPI forward model to acquire 1D measurements of the three test images, each exclusively featuring the character '3', 'D', or 'I', respectively. Subsequently, DGI and TVAL3 are adopted to reconstruct the images. The deep learning algorithms, FCN and MANet, do not directly translate the 1D measurements into images; instead, they enhance the low-quality images reconstructed by TVAL3. We introduce low-level and high-level noise into the forward model. The Peak signal-to-noise ratio (PSNR) [46] and Structural Similarity (SSIM) [47] of the reconstruction results at SRs of 1.5 %, 3.0 %, and 5.0 % are depicted in Fig. 3. The SR is the ratio of the number of patterns to the pixel number of the reconstructed image:

$$SR=\frac{M_P}{N}$$
where $M_P$ is the number of patterns, and $N$ is the pixel number of reconstructed image.


Fig. 3. Quantitative comparison of DGI, TVAL3, FCN, and MANet in terms of SR. (a-c) PSNR of the results obtained under noise-free (a), low-level noise (b), and high-level noise (c) conditions, respectively. (d-f) SSIM of the results obtained under noise-free (d), low-level noise (e), and high-level noise (f) conditions, respectively.


The PSNR is defined as:

$$\mathrm{PSNR}=10 \cdot \log_{10}\left(\frac{\left(MAX_I\right)^2}{MSE}\right)$$
where $MSE$ is the Mean Square Error defined in Section 2.2, and $MAX_I$ is the maximum grayscale value of the image.

The SSIM is defined as:

$$\operatorname{SSIM}=\frac{\left(2 \mu_{GT}\,\mu_{\operatorname{MANet}(x)}+C_1\right)\left(2 \sigma_{GT,\operatorname{MANet}(x)}+C_2\right)}{\left(\mu_{GT}^2+\mu_{\operatorname{MANet}(x)}^2+C_1\right)\left(\sigma_{GT}^2+\sigma_{\operatorname{MANet}(x)}^2+C_2\right)}$$
where $\mu_{GT}$ is the mean of the ground truth, $\mu_{\operatorname{MANet}(x)}$ is the mean of the image reconstructed by MANet, and $\sigma_{GT,\operatorname{MANet}(x)}$ is the covariance of the ground truth and the image reconstructed by MANet. $\sigma_{GT}$ is the standard deviation of the ground truth, and $\sigma_{\operatorname{MANet}(x)}$ is the standard deviation of the image reconstructed by MANet. $C_1$ and $C_2$ are constants, set to 0.01 and 0.03, respectively.
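For reference, both metrics can be computed with scikit-image, as in the following sketch with mock data (our own example, not the authors' evaluation code).

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

rng = np.random.default_rng(0)
gt = rng.random((128, 128))                                      # mock ground truth
rec = np.clip(gt + 0.05 * rng.standard_normal(gt.shape), 0, 1)   # mock reconstruction

psnr = peak_signal_noise_ratio(gt, rec, data_range=1.0)  # Eq. (5)
ssim = structural_similarity(gt, rec, data_range=1.0)    # Eq. (6)
print(f"PSNR = {psnr:.2f} dB, SSIM = {ssim:.3f}")
```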

The results indicate that, regardless of the noise level, MANet exhibits performance comparable to FCN at a SR of 1.5 %, surpassing the conventional algorithms. When the SR is increased to 3.0 %, however, MANet exhibits a substantial performance improvement, surpassing the other algorithms significantly. Nevertheless, as more patterns are introduced, the performance gains of MANet become less discernible. Consequently, balancing imaging efficiency against image quality, we set the SR of the subsequent experiments to 3.0 %. Moreover, in the presence of noise, the PSNR of images recovered by the traditional algorithms decreases as the SR increases beyond 3.0 %. The reason for this phenomenon is that, when the photon count rate is low, introducing more patterns results in increased noise levels, consequently degrading image quality [48].

Additionally, for a more straightforward performance comparison of these algorithms, we present imaging results at a SR of 3.0 % in Fig. 4, considering noise-free, low-level noise, and high-level noise scenes. It can be observed that TVAL3 and DGI are highly sensitive to noise: under high-level noise conditions, the character '3' restored by DGI is dirty or even corrupted by strong noise. In contrast, despite the presence of noise, MANet always recovers clean, high-contrast images. Therefore, MANet surpasses the other algorithms in both quantitative evaluation indices and visual appearance, demonstrating its remarkable robustness.


Fig. 4. Qualitative comparisons of DGI, TVAL3, FCN, and MANet in terms of anti-noise robustness. Avg. PSNR represents the average PSNR of the three reconstructed images. Avg. SSIM represents the average SSIM of the three reconstructed images.


3.2 Axial resolution

To demonstrate the efficacy of MANet in enhancing the axial resolution of single-pixel tomography, we compare the reconstruction results of different algorithms. First, we simulate overlapped images of the 3D structured characters '3DI' along the axial direction, which are not included in the training dataset. Figure 5($a_1$) simulates the scenario where character '3' is positioned at the focal plane of the imaging system (grayscale values of '3', 'D', and 'I' are 255, 215, and 185), and the light from characters 'D' and 'I' cannot be eliminated. Figures 5($a_2-a_3$) simulate the cases where 'D' (grayscale values of '3', 'D', 'I' are 215, 255, and 215) and 'I' (grayscale values of '3', 'D', 'I' are 185, 215, and 255) are individually situated in the focal plane, respectively. Then, we conduct the forward sampling and image reconstruction operations on the simulated results. Low-level noise is introduced in the simulation.


Fig. 5. Comparison of DGI, TVAL3, FCN, and MANet for interference elimination from various layers. ($a_1-a_3$) Characters '3', 'D', and 'I' are individually situated in the focal plane of the imaging system. ($b_1-b_3$) Results obtained by DGI. ($c_1-c_3$) Results generated by TVAL3. ($d_1-d_3$) Results recovered by FCN. ($e_1-e_3$) Results reconstructed by MANet. (f) Cross-sectional profile of the white dotted line in ($b_2$), ($c_2$), ($d_2$), ($e_2$), and the ground truth of character 'D'. GT represents the ground truth.


Figures 5($b_1-e_3$) illustrate the reconstructed images. Both TVAL3 and DGI clearly encounter difficulty in distinguishing signals from different layers of the 3D sample: structures from all layers are recovered indiscriminately, with a noticeable presence of noise in the reconstructed images. In contrast, the deep learning algorithms show superior performance by accurately capturing the intricate structure of each layer, and their efficacy in noise suppression is also commendable. FCN, however, still grapples with the precise recovery of fine details and struggles to produce a clear 'D', whereas MANet exhibits a clear reconstruction of each layer of the sample. We also use PSNR and SSIM to quantitatively evaluate the quality of the restored images. The PSNR of all images recovered by MANet exceeds 27 dB. Figure 5(f) shows that the structure recovered by MANet is almost consistent with the ground truth. All these results collectively showcase the exceptional stability and robustness of MANet.

3.3 Lateral resolution

Similarly, the reconstruction results of different algorithms are compared to illustrate the efficacy of MANet in enhancing the lateral resolution of single-pixel tomography. First, simulations are conducted to generate two-line images with varying line spacings (10, 8, 4, 2, and 1 pixels, corresponding to Figs. 6($a_1-a_5$) respectively), which are not included in the training dataset. The line width is fixed at 8 pixels. We then perform the forward sampling and image reconstruction operations on the simulated images. Low-level noise is also introduced in the forward process. The restored images of samples with different lateral resolutions are shown in Figs. 6($b_1-e_5$).


Fig. 6. Comparison of the lateral imaging resolution of DGI, TVAL3, FCN, and MANet. ($a_1-a_5$) Two-line images with line spacings of 10 pixels, 8 pixels, 4 pixels, 2 pixels, and 1 pixel. ($b_1-b_5$) Results recovered by DGI. ($c_1-c_5$) Results obtained by TVAL3. ($d_1-d_5$) Results generated by FCN. ($e_1-e_5$) Results reconstructed by MANet. (f) Cross-sectional profile of the white dotted line in ($a_3$), ($b_3$), ($c_3$), ($d_3$), ($e_3$). (g) Cross-sectional profile of the white dotted line in ($a_5$), ($b_5$), ($c_5$), ($d_5$), ($e_5$).


When the distance between the two lines is less than 4 pixels, DGI cannot distinguish them well, and TVAL3 fails for distances below 2 pixels. Unexpectedly, FCN exhibits poor performance: when the line spacing is set to 4 pixels, FCN can no longer accurately restore two distinct lines, which is due to the excessive up-sampling factor in the up-sampling layers of the network. In contrast, MANet reconstructs an almost perfect image under the same conditions and still offers impressive results even when the distance between the two lines is as small as 1 pixel. Figures 6(f-g) show the cross-sections along the white dotted lines in the reconstructed images for line spacings of 4 pixels and 1 pixel, respectively. Compared to the other methods, MANet significantly improves the lateral resolution, which is conducive to the accurate reconstruction of 3D images.

4. Optical experiments

To prove the feasibility of the above scheme, we design an experimental setup as shown in Fig. 7. The laser (MGL-III-532 nm) emits continuous light, which pumps the 3D sample at a certain angle after passing through a beam expander (Thorlabs GB10-A). The sample is fixed on a translation stage (Thorlabs PT3A/M). The fluorescence from the 3D sample at the focal plane is captured by the objective lens (Olympus IOPCOL129410X-VIS, 0.25 NA) and subsequently imaged through lens$_1$ onto the Digital Micromirror Device (DMD) (UPOLabs HDSLM136D70-DDR, $1024\times 768$). The DMD displays a set of patterns with a resolution of $128\times 128$ to modulate the image, and the reflected light from the DMD is collected by lens$_2$ into a SPAD (SIMINICS SPD500). Finally, the signals from the SPAD are recorded by a Time-Correlated Single Photon Counting system (SIMINICS FT1040) and accumulated to obtain the final intensity. The complete 3D image of the sample can be reconstructed by recovering each layer of the sample.


Fig. 7. Optical experiment setup. The laser emits continuous light. The ND filter attenuates the light intensity. The beam expander enlarges the diameter of the parallel input beam. Ob is the objective lens, which collects the fluorescence. The DMD modulates the image of the sample. The SPAD detects the signals reflected by the DMD.


In the axial resolution experiment, dye (Rhodamine 6G, Aladdin) is sprayed onto cover slips by an inkjet printer to form the characters '3', 'D', and 'I' as samples. The size of all characters is 1.5 mm. These customized cover slips can be vertically stacked to form a 3D sample representing '3DI'. The vertical spacing between the individual characters can be regulated by a 3D translation stage. Experiments are conducted on 3D samples with axial intervals of 2760 $\mu m$, 1729 $\mu m$, 1237 $\mu m$, 759 $\mu m$, and 605 $\mu m$ at a SR of 3.0 %.

When the axial interval between the characters is 1729 $\mu m$, TVAL3 fails to reject scattered light originating from other layers in the reconstructed image of character 'I' in Fig. 8(a). As the axial distance decreases, it becomes more challenging to distinguish the structures of different layers within the sample. In contrast, MANet can almost perfectly recover images of each layer down to an axial distance of 605 $\mu m$. Moreover, pixels with grayscale values exceeding 0.3 are selected in the reconstructed images to reconstruct the 3D sample with an axial interval of 759 $\mu m$; the results are shown in Figs. 8(b-c), and the thresholding step is sketched below. The 3D structure obtained by TVAL3 exhibits significant blurriness, rendering the characters barely discernible, whereas the samples generated using MANet demonstrate remarkable clarity. Therefore, compared with single-pixel tomography utilizing TVAL3, MANet enhances the axial resolution by approximately 3 times (1729 $\mu m$ / 605 $\mu m$). The experimental results are consistent with the simulation results.
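The following minimal sketch illustrates the thresholding and stacking used to assemble the 3D rendering; the data are placeholders, and the voxel visualization is our suggestion rather than the authors' rendering code.

```python
import numpy as np

rng = np.random.default_rng(0)
sections = rng.random((3, 128, 128))   # placeholder: one cleaned 2D section per layer

voxels = sections > 0.3                # keep pixels with grayscale values above 0.3
# The binary sections, stacked along the axial direction, form the 3D structure;
# they can be visualized, e.g., with matplotlib's Axes3D.voxels(voxels).
print(voxels.shape, int(voxels.sum()))
```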


Fig. 8. Evaluation of the effectiveness of MANet in improving axial resolution. (a) Reconstructed images generated by TVAL3 and MANet with a $4 \times 4$ $mm^2$ field of view. (b-c) 3D structures obtained from TVAL3 and MANet.


In the lateral resolution experiments, the test samples are also customized cover slips. Two fluorescent lines with different lateral spacings (700 $\mu m$, 300 $\mu m$, 100 $\mu m$, 50 $\mu m$, and 30 $\mu m$) are printed on the cover slips. Figure 9 shows the imaging results generated by the different algorithms at a SR of 3.0 %. When the distance between the two lines is 50 $\mu m$, TVAL3 encounters challenges in recovering the double line, and it fails entirely when the distance is reduced to 30 $\mu m$. However, MANet can resolve features down to 30 $\mu m$ while effectively suppressing noise. Additionally, Fig. 9(c) presents the cross-sectional profiles of the white dotted lines in each image; images reconstructed by MANet always exhibit higher contrast. The calculations indicate that MANet successfully enhances the lateral resolution of the imaging system by approximately 2 times.


Fig. 9. Lateral-resolution comparison of single-pixel tomographic imaging using different algorithms. Row (a) Results generated by TVAL3. Row (b) Results obtained by MANet. (c) Cross-sectional profile of the white dotted lines in rows (a-b).


5. Conclusion

In conclusion, MANet is proposed to enhance the resolution of photon-level single-pixel tomography, enabling high-resolution 3D imaging by effectively eliminating interference in optical sections. Within a $4 \times 4$ $mm^2$ field of view, MANet enhances the axial and lateral resolution of single-pixel tomography by 3 times and 2 times, respectively. The wide field of view makes the scheme suitable for tasks with high real-time requirements while avoiding certain limitations of scanning imaging. Moreover, the compatibility of MANet with other tomography technologies allows it to be integrated into diverse tomography systems in a plug-and-play manner, obviating the need for any supplementary hardware.

Funding

National Key Research and Development Program of China (2022YFA1404201); National Natural Science Foundation of China (6191101445, 62127817, 62305239); Science and Technology Major Special Project of Shanxi Province (202201010101005); Fundamental Research Program of Shanxi Province (202203021222104, 202203021222107, 202203021222113, 202203021222133).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. J. Fang, K. Huang, E. Wu, et al., “Mid-infrared single-photon 3D imaging,” Light: Sci. Appl. 12(1), 144 (2023). [CrossRef]

2. C. Walsh, P. Tafforeau, W. Wagner, et al., “Imaging intact human organs with local resolution of cellular structures using hierarchical phase-contrast tomography,” Nat. Methods 18(12), 1532–1541 (2021). [CrossRef]  

3. R. Su, M. Kirillin, E. W. Chang, et al., “Perspectives of mid-infrared optical coherence tomography for inspection and micrometrology of industrial ceramics,” Opt. Express 22(13), 15804–15819 (2014). [CrossRef]  

4. Z. Cai, R. Zhang, N. Zhou, et al., “Programmable aperture light-field microscopy,” Laser Photonics Rev. 17(9), 2300217 (2023). [CrossRef]  

5. X. Chen, C. Zhang, P. Lin, et al., “Volumetric chemical imaging by stimulated Raman projection microscopy and tomography,” Nat. Commun. 8(1), 15117 (2017). [CrossRef]

6. A. G. Podoleanu, “Optical coherence tomography,” J. Microsc. 247(3), 209–219 (2012). [CrossRef]  

7. Z. Yuan, D. Yang, W. Wang, et al., “Self super-resolution of optical coherence tomography images based on deep learning,” Opt. Express 31(17), 27566–27581 (2023). [CrossRef]  

8. J. Zhao, J. Dai, B. Braverman, et al., “Compressive ultrafast pulse measurement via time-domain single-pixel imaging,” Optica 8(9), 1176–1185 (2021). [CrossRef]  

9. X. Yang, Z. Yu, L. Xu, et al., “Underwater ghost imaging based on generative adversarial networks with high imaging quality,” Opt. Express 29(18), 28388–28405 (2021). [CrossRef]  

10. X. Yang, P. Jiang, M. Jiang, et al., “High imaging quality of Fourier single pixel imaging based on generative adversarial networks at low sampling rate,” Opt. Lasers Eng. 140, 106533 (2021). [CrossRef]

11. K. Song, Y. Bian, K. Wu, et al., “Single-pixel imaging based on deep learning,” arXiv, arXiv:2310.16869 (2023).

12. L. Gao, W. Zhao, A. Zhai, et al., “OAM-basis wavefront single-pixel imaging via compressed sensing,” J. Lightwave Technol. 41(7), 2131–2137 (2023). [CrossRef]

13. P. He, L. Gao, W. Zhao, et al., “Wavefront single-pixel imaging using a flexible SLM-based common-path interferometer,” Opt. Lasers Eng. 168, 107633 (2023). [CrossRef]

14. M. P. Edgar, G. M. Gibson, and M. J. Padgett, “Principles and prospects for single-pixel imaging,” Nat. Photonics 13(1), 13–20 (2019). [CrossRef]  

15. P. Kilcullen, T. Ozaki, and J. Liang, “Compressed ultrahigh-speed single-pixel imaging by swept aggregate patterns,” Nat. Commun. 13(1), 7879 (2022). [CrossRef]  

16. S. Sun, W. Zhao, A. Zhai, et al., “DCT single-pixel detecting for wavefront measurement,” Opt. Laser Technol. 163, 109326 (2023). [CrossRef]

17. D. L. Donoho, “Compressed sensing,” IEEE Trans. Inf. Theory 52(4), 1289–1306 (2006). [CrossRef]  

18. M. F. Duarte, M. A. Davenport, D. Takhar, et al., “Single-pixel imaging via compressive sampling,” IEEE Signal Process. Mag. 25(2), 83–91 (2008). [CrossRef]  

19. O. Katz, Y. Bromberg, and Y. Silberberg, “Compressive ghost imaging,” Appl. Phys. Lett. 95(13), 131110 (2009). [CrossRef]  

20. U. Kim, H. Quan, S. H. Seok, et al., “Quantitative refractive index tomography of millimeter-scale objects using single-pixel wavefront sampling,” Optica 9(9), 1073–1083 (2022). [CrossRef]  

21. P. A. Stockton, J. J. Field, J. Squier, et al., “Single-pixel fluorescent diffraction tomography,” Optica 7(11), 1617–1620 (2020). [CrossRef]  

22. J. Peng, M. Yao, J. Cheng, et al., “Micro-tomography via single-pixel imaging,” Opt. Express 26(24), 31094–31105 (2018). [CrossRef]  

23. L. Pan, Y. Shen, J. Qi, et al., “Single photon single pixel imaging into thick scattering medium,” Opt. Express 31(9), 13943–13958 (2023). [CrossRef]  

24. Z. Du, W. Zhao, A. Zhai, et al., “DMD-based single-pixel off-axis interferometry for wavefront reconstruction of a biological sample,” Appl. Phys. Lett. 123(3), 033702 (2023). [CrossRef]

25. X. Chang, Z. Wu, D. Li, et al., “Self-supervised learning for single-pixel imaging via dual-domain constraints,” Opt. Lett. 48(7), 1566–1569 (2023). [CrossRef]  

26. X. Zhang, C. Deng, C. Wang, et al., “VGenNet: Variable generative prior enhanced single pixel imaging,” ACS Photonics 10(7), 2363–2373 (2023). [CrossRef]

27. K. Ning, B. Lu, X. Wang, et al., “Deep self-learning enables fast, high-fidelity isotropic resolution restoration for volumetric fluorescence microscopy,” Light: Sci. Appl. 12(1), 204 (2023). [CrossRef]  

28. B. Huang, J. Li, B. Yao, et al., “Enhancing image resolution of confocal fluorescence microscopy with deep learning,” PhotoniX 4(1), 1–22 (2023). [CrossRef]  

29. M. Jia, Z. Wei, L. Yu, et al., “Noise-disentangled single-pixel imaging under photon-limited conditions,” IEEE Trans. Comput. Imaging 9, 594–606 (2023). [CrossRef]  

30. Y. Wang, K. Huang, J. Fang, et al., “Mid-infrared single-pixel imaging at the single-photon level,” Nat. Commun. 14(1), 1073 (2023). [CrossRef]  

31. W. Huang, F. Wang, X. Zhang, et al., “Learning-based adaptive under-sampling for Fourier single-pixel imaging,” Opt. Lett. 48(11), 2985–2988 (2023). [CrossRef]

32. K. Song, Z. Zhao, Y. Ma, et al., “A multitask dual-stream attention network for the identification of KRAS mutation in colorectal cancer,” Med. Phys. 49(1), 254–270 (2022). [CrossRef]

33. K. Song, Z. Zhao, J. Wang, et al., “Segmentation-based multi-scale attention model for KRAS mutation prediction in rectal cancer,” International Journal of Machine Learning and Cybernetics, 1–17 (2022).

34. P. Jiang, J. Liu, L. Wu, et al., “Fourier single pixel imaging reconstruction method based on the U-net and attention mechanism at a low sampling rate,” Opt. Express 30(11), 18638–18654 (2022). [CrossRef]

35. Y. Zhang, T. Liu, M. Singh, et al., “Neural network-based image reconstruction in swept-source optical coherence tomography using undersampled spectral data,” Light: Sci. Appl. 10(1), 155 (2021). [CrossRef]  

36. S. Huang, R. Wang, R. Wu, et al., “SNR-Net OCT: brighten and denoise low-light optical coherence tomography images via deep learning,” Opt. Express 31(13), 20696–20714 (2023). [CrossRef]

37. Y. Huang, Z. Lu, Z. Shao, et al., “Simultaneous denoising and super-resolution of optical coherence tomography images based on generative adversarial network,” Opt. Express 27(9), 12289–12307 (2019). [CrossRef]  

38. C. Li, An efficient algorithm for total variation regularization with applications to the single pixel camera and compressive sensing (Rice University, 2010).

39. J. A. Tropp and A. C. Gilbert, “Signal recovery from partial information via orthogonal matching pursuit,” IEEE Trans. Inf. Theory 53(12), 4655–4666 (2007). [CrossRef]  

40. O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18, (Springer, 2015), pp. 234–241.

41. S. Sharma, S. Sharma, and A. Athaiya, “Activation functions in neural networks,” Towards Data Sci 6, 310–316 (2017).

42. H. Zhao, O. Gallo, I. Frosio, et al., “Loss functions for image restoration with neural networks,” IEEE Trans. Comput. Imaging 3(1), 47–57 (2017). [CrossRef]  

43. P. G. Vaz, D. Amaral, L. R. Ferreira, et al., “Image quality of compressive single-pixel imaging using different Hadamard orderings,” Opt. Express 28(8), 11666–11681 (2020). [CrossRef]

44. F. Ferri, D. Magatti, L. Lugiato, et al., “Differential ghost imaging,” Phys. Rev. Lett. 104(25), 253603 (2010). [CrossRef]  

45. J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proceedings of the IEEE conference on computer vision and pattern recognition, (2015), pp. 3431–3440.

46. R. Zhang, P. Isola, A. A. Efros, et al., “The unreasonable effectiveness of deep features as a perceptual metric,” in Proceedings of the IEEE conference on computer vision and pattern recognition, (2018), pp. 586–595.

47. Z. Wang, A. C. Bovik, H. R. Sheikh, et al., “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. on Image Process. 13(4), 600–612 (2004). [CrossRef]  

48. Y. Fan, M. Bai, S. Wu, et al., “High-efficiency single-photon compressed sensing imaging based on the best choice scheme,” Opt. Express 31(5), 7589–7598 (2023). [CrossRef]  
