Rapid single-photon color imaging of moving objects

Open Access

Abstract

This paper outlines an experimental demonstration of a Bayesian image reconstruction approach to achieve rapid single-photon color imaging of moving objects. The capacity to extract the color of objects is important in a variety of target identification and computer vision applications. Nonetheless, it remains challenging to achieve high-speed color imaging of moving objects in low-photon flux environments. The low-photon regime presents particular challenges for efficient spectral separation and identification, while unsupervised image reconstruction algorithms are often slow and computationally expensive. In this paper, we address both of these difficulties using a combination of hardware and computational solutions. We demonstrate color imaging using a Single-Photon Avalanche Diode (SPAD) detector array for rapid, low-light-level data acquisition, with an integrated color filter array (CFA) for efficient spectral unmixing. High-speed image reconstruction is achieved using a bespoke Bayesian algorithm to produce high-fidelity color videos. The analysis is conducted first on simulated data allowing different pixel formats and photon flux scenarios to be investigated. Experiments are then performed using a plasmonic metasurface-based CFA, integrated with a 64 × 64 pixel format SPAD array. Passive imaging is conducted using white-light illumination of multi-colored, moving targets. Intensity information is recorded in a series of 2D photon-counting SPAD frames, from which accurate color information is extracted using the fast Bayesian method introduced herein. The per-frame reconstruction rate proves to be hundreds of times faster than the previous computational method. Furthermore, this approach yields additional information in the form of uncertainty measures, which can be used to assist with imaging system optimization and decision-making in real-world applications. The techniques demonstrated point the way towards rapid video-rate single-photon color imaging. The developed Bayesian algorithm, along with more advanced SPAD technology and utilization of time-correlated single-photon counting (TCSPC) will permit live 3D, color videography in extremely low-photon flux environments.

Published by Optica Publishing Group under the terms of the Creative Commons Attribution 4.0 License. Further distribution of this work must maintain attribution to the author(s) and the published article's title, journal citation, and DOI.

1. Introduction

Single-photon detection techniques have become an essential capability for ultra-sensitive photonic measurements in low-light environments. As an increasingly popular single-photon detector, Single-Photon Avalanche Diodes (SPADs) are an emerging technology capable of excellent single-photon detection sensitivity and an extremely high frame rate [1]. Their reduced size and operating voltages, as well as increased quantum efficiencies, offer advantages over photomultiplier tube (PMT) and microchannel plate (MCP) detectors. Their architecture allows for in-pixel circuitry which can facilitate single-photon counting (SPC), time-correlated single-photon counting (TCSPC), and detector time-gating capabilities, while their low cost and high achievable frame rates provide advantages over other highly sensitive charge-coupled device (CCD) detectors. Due to these capabilities, SPAD detectors have been driving significant advances in a range of imaging applications. Depending on the light source used for illumination, SPADs can be used as active or passive sensors. Over the years, they have been widely used in active imaging systems through synchronization with an active light source, e.g., a laser. Such active imaging applications include, for example, Light Detection and Ranging (LiDAR) [2–5], object detection and tracking [6,7], and microscopy [8–10]. More recently, the development of high-resolution SPAD arrays [11] has enabled the capture of single-exposure two-dimensional (2D) intensity images of scenes under passive lighting. Compared to more mature technologies such as the electron multiplying CCD (EMCCD) [12], intensified CCD (ICCD) [13] or scientific CMOS (sCMOS) image sensor [14], SPAD sensor arrays have so far not scaled up to comparable pixel formats. While scanning [15] and micro-scanning [16] can make up for this shortfall in spatial resolution, these approaches face particular challenges with moving targets. Nevertheless, the increasing development of large-format SPAD arrays has enabled their use as passive sensors in a broader range of imaging applications, such as dynamic range imaging [17,18] and burst vision [19,20].

Efficient computational methods have been proposed to perform image reconstruction from the data acquired by SPAD arrays. By modeling the single-photon counting statistics of the SPAD data [21], different signal processing methods have been investigated for the reconstruction of, for example, depth images [22,23], color images [24], fluorescence lifetime images [25], and the 3D structure of hidden objects [26,27]. Most recently, a technology-agnostic, application-agnostic, and training-free photon processing pipeline was proposed for general-purpose SPAD data processing [28]. In this work, we consider the use of SPAD arrays with an efficient color reconstruction method for multi-spectral imaging.

SPAD devices have been adopted for use in a number of multi-spectral imaging scenarios by using a variety of spectral unmixing approaches. Recording a set of sequential measurements at different wavelengths using a filter wheel composed of a set of narrow bandpass filters [29] allows high-accuracy color reconstruction of static objects, but suffers from image registration defects in dynamic scenarios and potential sample bleaching in sensitive biological samples. Alternatively, the incident light beam may be split into its individual spectral components using a diffraction grating [30], which has the advantage of retaining all of the light while recording all wavelengths simultaneously but has a much larger Size, Weight, and Power (SWaP) cost due to requiring additional detectors and bulky optics. In an active illumination scenario, each emitted wavelength may be separated temporally, with TCSPC used to determine the relative intensities based on their times of arrival [31]. However, this approach cannot be used in passive imaging.

Color reconstruction from SPAD intensity readings represents a multi-spectral imaging inverse problem. Given photon counts detected by SPAD arrays, different algorithms have been proposed to estimate the number of detected photons per wavelength, which cannot be observed directly due to the combination of the color filters in the degradation model and the presence of noise in the observations. In [24], color filters for red, green, and blue (RGB) light were designed for ultralow-light-level imaging using a SPAD image sensor; a color image reconstruction algorithm was then used to recover the intensity values of the RGB channels via Maximum A Posteriori (MAP) estimation using the well-known Alternating Direction Method of Multipliers (ADMM)-based approach [32,33]. Under active laser illumination or broadband white-light illumination, different colors were correctly recovered by the ADMM-based algorithm. However, this algorithm suffers from several limitations when applied to rapid single-photon color imaging of moving objects. First, it takes a long computational time to reconstruct the color information of a single image, due to (i) the sequential nature of ADMM, (ii) its slow convergence for ill-posed problems, and (iii) the manual tuning of the regularization parameter(s). In addition, this MAP-based ADMM approach only provides point estimates for the RGB channels without uncertainty quantification. Such limitations prevent this ADMM-based algorithm from being applied to image moving objects through image sequences, i.e., to reconstruct color videos as addressed by this work. In contrast to [24], where the objects are static and the observed data form a single image, in this work the colored objects are moving and the SPAD array records videos. For such dynamic imaging scenes under low-light-level conditions, SPADs provide an ideal solution for video-rate imaging due to their extreme single-photon detection sensitivity and high frame-rate capability. By using a short exposure duration (5 milliseconds or less in the experiments reported here), the motion problems occurring when imaging dynamic scenes using conventional sensors, such as motion blur, ghosting, and flicker, can be mitigated during video SPAD data acquisition. In addition, a frame-by-frame reconstruction algorithm requiring no frame aggregation also helps to suppress motion blur in the reconstructed videos. Furthermore, in the work presented herein, the objects are assumed to move slowly relative to the frame rate of the SPAD array. To enable live color reconstruction of each frame, a fast color reconstruction algorithm is applied here. Our approach is based on the method proposed in [34] for grayscale image restoration problems in the presence of Gaussian noise, and the method proposed in [35] for RGB color image reconstruction from a single-channel grayscale image corrupted by Poisson noise. In this work, the two methods are extended to the reconstruction of color videos and, to the best of our knowledge, applied for the first time to measured frame sequences from grayscale SPAD arrays. Compared to the ADMM-based method from [24], this algorithm offers a much lower computational cost, with reconstruction speeds hundreds of times faster, and allows for automatic tuning of the regularization parameter(s) over time. Moreover, it provides additional uncertainty measures about the color video estimates, making it easier to exploit the output of the SPAD arrays.

2. Experimental setup

SPAD sensor technology is continually advancing in terms of format, sensitivity, and timing resolution. For instance, a recent state-of-the-art design provides $3.2$ megapixel resolution (2,072 $\times$ 1,548 pixel format) with only 6.39 µm pixel pitch and $100$ picoseconds timing resolution [36]. Exploiting the vast potential of these sensors requires further development on two main fronts: systems integration and software acceleration. In terms of system integration, this work presents a proof-of-concept full-color single-photon video imaging system. The aforementioned spectral filtering disadvantages of adopting SPAD devices in multi-spectral imaging scenarios are avoided by the use of a plasmonic metasurface color filter array (CFA) flip-chip bonded to the SPAD array, as will be discussed next.

2.1 Plasmonic metasurface CFA

The wide-ranging scientific research applications of multispectral imaging with SPAD arrays demand a bespoke approach to spectral filtering. For example, biological fluorescence imaging may require filtering for a different set of wavelengths than astrophysical spectroscopy. Plasmonic metasurface-based CFAs have proven to be comparable in quality to traditional thin-film or dye-based filters while having the advantage of easy customization [24,37]. This customizability comes from the filter’s novel design which allows for a single stage of lithography regardless of design or number of filters. This fabrication methodology also allows for potential direct on-chip production. Advances in metasurface design will provide the potential for many other interesting optical effects to be included directly on-chip, such as polarisation [38], lensing [39], beam steering [40], or phase shifting [41], where such benefits cannot be attained with conventional CFAs [42].

The plasmonic metasurface CFA used in this work is composed of a thin $70$ nanometer (nm) film of aluminum deposited on a $515$ micrometer (µm) thick fused silica substrate. A series of circular and elliptical nanoholes are etched into the metal, and $50$ nm into the substrate below, using electron-beam lithography and reactive ion etching (RIE). One unit cell of the metasurface structure is depicted in Fig. 1(a), with the relevant parameters highlighted. The values of the parameters in Fig. 1(a) for each filter are given in Table 1.


Table 1. The values of metasurface filter design parameters in Fig. 1(a).


Fig. 1. (a) Metasurface structure with defined unit cell of period ‘p’, middle and corner circles of radii ‘r$_1$’ and ‘r$_2$’ respectively, and central ellipses with long and short axes defined by ‘a’ and ‘b’ respectively. (b) Integration of the plasmonic metasurface CFA with the $64\times 64$ pixel SPAD array. The micrograph of the SPAD array shows the $61.5$ µm pixel pitch; an SEM image of the filter metasurface is also shown.


To accurately determine the object color from the SPAD data, the transmission spectrum of each filter needs to be known. Figure 2 shows the average transmission spectra of each filter set of the CFA, from which accurate transmission coefficients can be extracted. The peak transmission wavelengths, indicated in the figure, are $470$ nm, $570$ nm, and $670$ nm. The emission spectrum of the white-light source used in the optical setup (described below) is also provided (arbitrary units) in the figure.


Fig. 2. Average transmission coefficients of each filter set (blue, green, and red) with the emission spectrum of the white-light source overlaid (arbitrary units). Highlighted are the peak transmission wavelengths of the blue (470 nm), green (570 nm), and red (670 nm) filters.


2.2 SPAD–CFA integration

The sensor used in this work is a relatively basic $64\times 64$ pixel Si CMOS SPAD array with photon counting capability and a $61.5$ µm pixel pitch [43]. Within each pixel is a square active area of $11.6$ µm side length, resulting in a fill factor of $3.56$%. The CFA is integrated with the SPAD array using a SET ACCµRA100 flip-chip bonder and a UV-curing optical adhesive. The SPAD array–CFA assembly is shown in Fig. 1(b).

2.3 Optical setup

In this work, the colored objects are placed on a rotating platform and illuminated with a broad-spectrum thermal white-light source. A $200\,$ millimeter (mm) Canon SLR camera objective lens was used to collect the reflected light and focus it onto the SPAD array. As no TCSPC timing circuitry is available in this sensor, single-photon counting was performed within a pre-determined frame duration, with data collected using exposures in the range of 0.1 milliseconds (ms) to 5 ms per frame. The per-pixel photon counts are read frame by frame via an FPGA and transmitted via the USB 3.0 protocol to a computer. The system is illustrated in Fig. 3.


Fig. 3. Passive imaging optical setup. A broadband white-light source flood illuminates the object scene. The return light is collected by a $200\,$mm Canon SLR camera objective lens and focused onto the SPAD array.


When working in the single-photon regime, it is critical to determine the average background counts prior to conducting any imaging so they can be removed in processing. For this purpose, during system calibration in the experiments for this work, a series of 200 frames at 5 ms exposure ($\sim 40$ seconds at 5 fps) were collected with the light source active while the camera lens was covered. The average background counts in each pixel are composed of individual detector dark counts and ambient light passing through the air gap between the SPAD array and the SLR lens (see Fig. 3). In the case of changing ambient light levels, this calibration should be repeated. However, if operating in uncontrolled environments, the SPAD array and SLR lens should be enclosed in a self-contained unit to avoid changing background levels; in that case, the background counts would consist purely of detector dark counts and recalibration would become unnecessary.
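
As a minimal sketch of this calibration step (with illustrative array shapes and dark-count levels, not the measured values), the per-pixel background estimate is simply the mean over the stack of dark frames:

```python
import numpy as np

def estimate_background(dark_frames):
    """Per-pixel mean background counts from a stack of dark (lens-covered) frames.

    dark_frames : array of shape (n_frames, 64, 64) holding photon counts.
    Returns the per-pixel average b_n used later in the forward model.
    """
    return dark_frames.mean(axis=0)

# Synthetic stand-in for the 200 calibration frames at 5 ms exposure.
rng = np.random.default_rng(0)
dark_frames = rng.poisson(lam=0.3, size=(200, 64, 64))  # hypothetical dark-count level
b = estimate_background(dark_frames)
print(b.shape)  # (64, 64)
```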

The SPAD array is capable of performing data collection at a maximum rate of $5$ fps. The data is defined per frame as $\mathbf {Y}_t$, where $\mathbf {Y}$ is the 2D $64\times 64$ pixel array of photon counts, and $t$ represents the frame number or time window. These frames are then fed into the image reconstruction algorithm. From a methodological viewpoint, the main novelty of this work lies in the structure and efficient implementation of a bespoke color reconstruction algorithm when our experimental setup is used for imaging moving objects, as will be discussed next.

3. Efficient color reconstruction algorithm from SPADs of moving objects

This section first formulates the forward observation model when using the optical setup, then describes the Bayesian model adopted, and finally presents an efficient Bayesian estimation strategy to reconstruct color videos.

3.1 Forward degradation model and Bayesian model

For the $n$-th pixel on the SPAD array, let $\boldsymbol a_n = [a_n^{(R)}, a_n^{(G)},a_n^{(B)}]\in \mathbb {R}^{1\times 3}$ denote the peak transmission coefficients of the red, green, and blue filters, i.e., the three coefficients highlighted in Fig. 2. All the coefficient vectors $\left\{ {{\boldsymbol a_n}} \right\}_{n = 1}^N$ of the SPAD array are assumed to be known from calibration measurements and not to change over time. At the $t$-th video frame, the color at the $n$-th pixel location is represented by a vector $\boldsymbol x_{n,t} = [x^{(R)}_{n,t},x^{(G)}_{n,t},x^{(B)}_{n,t}] \in \mathbb {R}^{1\times 3}$ containing the red, green, and blue spectral information, respectively. Let $\lambda _{n,t}$ represent the average light flux reaching the $n$-th pixel. We model $\lambda _{n,t}$ by

$$\lambda_{n,t} = \boldsymbol a_n \boldsymbol x_{n,t}^T+b_n,$$
where $b_n$ denotes the average dark counts and background at the $n$-th pixel, which was measured pixel-wise during the system calibration described in the previous section. This model assumes that the average flux in each pixel and each frame is a linear combination of $\boldsymbol a_n$ and $\boldsymbol x_{n,t}$. Note that this model neglects possible cross-talk between pixels, together with temporal blur, i.e., we assume that objects are moving at a pace much slower than the frame rate. In photon-limited imaging scenarios, $\lambda _{n,t}$ is not directly observed. Instead, the observations, denoted by $\{y_{n,t}\}_{n,t}$, are corrupted by shot noise, traditionally modeled by a Poisson distribution. The (color) video reconstruction problem consists of estimating ${\left\{ {{\boldsymbol x_{n,t}}} \right\}_{n,t}}$ from the observed (grayscale) videos $\{y_{n,t}\}_{n,t}$. In this work, a Bayesian method is applied on a per-frame basis for rapid reconstruction.
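
As an illustration of this forward model, the sketch below draws Poisson-distributed counts for one frame from known coefficients, colors, and background; the coefficient sets, background level, and random colors are placeholders rather than measured values.

```python
import numpy as np

def simulate_frame(a, x, b, rng=None):
    """Simulate one grayscale SPAD frame from the linear forward model.

    a : (N, 3) per-pixel RGB transmission coefficients a_n
    x : (N, 3) per-pixel RGB intensities x_{n,t} for the current frame
    b : (N,)   per-pixel mean background/dark counts b_n
    Returns y : (N,) Poisson-distributed photon counts.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = np.sum(a * x, axis=1) + b            # lambda_{n,t} = a_n x_{n,t}^T + b_n
    return rng.poisson(np.maximum(lam, 0.0))   # shot-noise corrupted observations

# Toy example: 64 x 64 = 4096 pixels, coefficients drawn from three illustrative sets.
rng = np.random.default_rng(1)
N = 64 * 64
sets = np.array([[0.7, 0.2, 0.1], [0.1, 0.2, 0.7], [0.1, 0.7, 0.2]])
a = sets[rng.integers(0, 3, size=N)]
x = rng.uniform(0.0, 1.0, size=(N, 3))
b = np.full(N, 0.2)
y = simulate_frame(a, x, b, rng)
```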

In our Bayesian model, conditioned on $\{\lambda _{n,t}\}_{n,t}$, the observed photon counts $\{y_{n,t}\}_{n,t}$ across pixels and frames are assumed to be mutually independent. At the $t$-th video frame, the likelihood function of all the recorded counts ${\boldsymbol{y_{t}}} = \{y_{n,t}\}_{n=1}^N$ is expressed as

$$f_{y|x}\left(\boldsymbol y_t|\boldsymbol x_t\right) = \prod_{n=1}^N f_{y|x}\left(\boldsymbol y_{n,t}|\boldsymbol x_{n,t}\right)= \prod_{n=1}^N \mathcal{P}\left(\textrm{max}(\boldsymbol a_{n}\boldsymbol x_{n,t}^T+b_n,0)\right).$$

The $\max (\cdot )$ function in Eq. (2) is not necessary when $\boldsymbol a_{n}\boldsymbol x_{n,t}^T+b_n$ is positive. However, it is introduced in the Poisson likelihood to ensure that the mean of the Poisson distribution is non-negative, since $\boldsymbol a_{n}\boldsymbol x_{n,t}^T+b_n \,>\, 0$ is not enforced when estimating $\{\boldsymbol x_{n,t}\}_{n,t}$. This constraint relaxation is adopted to facilitate hyperparameter estimation [34,44], as will be discussed below.
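
For illustration, a minimal evaluation of the corresponding negative log-likelihood (dropping terms independent of $\boldsymbol x_t$) might look as follows; the small constant added before the logarithm is our own guard against $\log 0$ and is not part of the model above.

```python
import numpy as np

def neg_log_likelihood(x, y, a, b, eps=1e-9):
    """Relaxed Poisson negative log-likelihood of one frame (terms in y! dropped).

    x : (N, 3) candidate RGB image (positivity of a_n x_n^T + b_n not enforced)
    y : (N,)   observed photon counts
    a : (N, 3) transmission coefficients; b : (N,) mean background counts
    """
    lam = np.maximum(np.sum(a * x, axis=1) + b, 0.0) + eps  # max(., 0) relaxation
    return np.sum(lam - y * np.log(lam))
```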

To keep the reconstruction method fast and tractable, we assign independent prior distributions to the unknown R, G, and B spectral channels of each frame. For each spectral channel of ${\boldsymbol x_t} = \left\{ {{\boldsymbol x_{n,t}}} \right\}_{n = 1}^N$, a smoothness-promoting prior distribution is adopted. It relies on the anisotropic total variation (TV) [45,46], which promotes sparsity of image gradients and can be expressed as

$$\begin{aligned} f_x(\boldsymbol x_t|\boldsymbol \theta_t) \propto \prod_{c = R, G, B} \left[ \prod_{(i,j)\in \mathcal{V}} e^{-\theta_t^{(c)}|{x_{i,t}^{(c)}-x_{j,t}^{(c)}}|}\right], \end{aligned}$$
where $\mathcal {V}$ denotes the set of indices of pixel pairs for a first-order neighboring structure [47] and $\boldsymbol \theta _t = [\theta _t^{(R)}, \theta _t^{(G)}, \theta _t^{(B)}] \in (0,+\infty )^3$ is a set of three hyperparameters. The larger the hyperparameters, the smoother the image $\boldsymbol x_t$ is expected to be.
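
As a point of reference, the negative logarithm of this prior reduces, up to an additive constant, to per-channel sums of absolute differences between neighboring pixels weighted by $\theta_t^{(c)}$. A minimal sketch, assuming the frame is stored as an (H, W, 3) array:

```python
import numpy as np

def neg_log_tv_prior(x_rgb, theta):
    """Anisotropic TV negative log-prior of one frame, up to an additive constant.

    x_rgb : (H, W, 3) RGB frame
    theta : length-3 array of per-channel hyperparameters theta_t^(c)
    Sums |x_i - x_j| over horizontal and vertical first-order neighbor pairs.
    """
    dh = np.abs(np.diff(x_rgb, axis=1)).sum(axis=(0, 1))  # horizontal differences per channel
    dv = np.abs(np.diff(x_rgb, axis=0)).sum(axis=(0, 1))  # vertical differences per channel
    return float(np.dot(theta, dh + dv))
```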

Following Bayes' rule, the posterior distribution of $\boldsymbol x_t$ conditioned on $\boldsymbol y_t$ and $\boldsymbol \theta _t$ can be expressed as

$$p(\boldsymbol x_t|\boldsymbol y_t,\boldsymbol \theta_t) = \frac{f_{y|x}(\boldsymbol y_t|\boldsymbol x_t)f_x(\boldsymbol x_t|\boldsymbol \theta_t)}{\displaystyle \int f_{y|x}(\boldsymbol y_t|\boldsymbol x_t)f_x(\boldsymbol x_t|\boldsymbol \theta_t) \text{d} \boldsymbol x_t}.$$

The hyperparameters in $\boldsymbol \theta _t$ can have a significant impact on the reconstruction quality [48]. In particular, since we consider sequences of frames, the smoothness levels of the frames can vary over time, so these parameters also need to be adjusted over time when imaging dynamic scenes. To reduce the overall level of user supervision, we adopt a Bayesian estimation strategy that allows $\boldsymbol \theta _t$ to be adjusted automatically along with the estimation of $\boldsymbol x_t$, as will be presented next.

3.2 Bayesian estimation strategy

The proposed Bayesian estimation strategy is an iterative method with a sequential estimation of $\boldsymbol x_t$ and $\boldsymbol {\theta }_t$ in each iteration. Firstly, given the estimate $\boldsymbol {\hat {\theta }}_t$ from the previous iteration, Bayesian inference is conducted on $p(\boldsymbol x_t|\boldsymbol y_t,\boldsymbol{\hat {\theta }}_t)$ to estimate $\boldsymbol x_t$. Classical estimators are the Maximum A Posteriori (MAP) and the Minimum Mean Squared Error (MMSE) estimators. The MAP estimator has been extensively used as a preferred point estimator as it can be computed using optimization methods (convex optimization here). The MAP estimator, previously investigated in [24], can be expressed as

$$\boldsymbol {\hat{x}}_{t,\text{MAP}} = \text{arg}\max_{x_{t}} p(\boldsymbol x_t|\boldsymbol y_t,\boldsymbol {\hat{\theta}}_t).$$

On the other hand, the MMSE estimator, given by

$$\boldsymbol {\hat{x}}_{t,\text{MMSE}} =\mathbb{E}_{p(\boldsymbol x_t|\boldsymbol y_t,\boldsymbol {\hat{\theta}}_t)}\left[\boldsymbol x_t\right],$$
is an alternative point estimator that offers advantages over $\boldsymbol{\hat {x}}_{t,\text {MAP}}$. Specifically, the same tools used to compute the MMSE estimates can further be used to compute the posterior marginal variances of each element in $\boldsymbol x_t$. Here at each frame, these $N \times 3$ variances are gathered in a set denoted by $\mathbb {V}(\boldsymbol x_t)$ to provide pixel-wise uncertainty measures that can complement the MMSE estimates. In practice, such uncertainty measures can be used for subsequent decision-making and planning. However, computing $\boldsymbol{\hat {x}}_{t,\text {MMSE}}$ and $\mathbb {V}(\boldsymbol x_t)$ is intractable, as they require integration over the high dimensional unknown ${\pmb x}_t$.
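
To make these estimators concrete on a toy problem (not the reconstruction problem addressed in this paper), the sketch below evaluates a one-dimensional posterior for a single Poisson-observed flux with an exponential prior on a grid; the MAP estimate is the grid maximizer, while the MMSE estimate and the posterior variance, i.e., the uncertainty measure, require integrating over the unknown, done here by simple summation. The prior rate and observed count are arbitrary choices.

```python
import numpy as np

# Toy 1D illustration: scalar flux lam >= 0 with an Exp(rate 0.5) prior,
# observed through Poisson noise. The un-normalized posterior is evaluated
# on a grid; MAP, MMSE, and the posterior variance follow directly.
y_obs = 3                                                # hypothetical photon count
grid = np.linspace(1e-3, 20.0, 4000)
dx = grid[1] - grid[0]
log_post = y_obs * np.log(grid) - grid - 0.5 * grid      # log[Poisson(y|lam) * prior(lam)]
post = np.exp(log_post - log_post.max())
post /= post.sum() * dx                                  # normalize on the grid

lam_map = grid[np.argmax(post)]                          # posterior mode (MAP)
lam_mmse = np.sum(grid * post) * dx                      # posterior mean (MMSE)
lam_var = np.sum((grid - lam_mmse) ** 2 * post) * dx     # posterior variance (uncertainty)
print(lam_map, lam_mmse, lam_var)
```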

In this work, an efficient method is applied to approximate $\boldsymbol{\hat {x}}_{t,\text {MMSE}}$ and $\mathbb {V}(\boldsymbol x_t)$ by using Expectation Propagation [49]. The reconstructed R, G, and B channels of each video frame as well as the associated uncertainty measure are inferred from a simple approximating posterior distribution that is easier to manipulate than the exact posterior distribution $p(\boldsymbol x_t|\boldsymbol y_t,\boldsymbol{\hat {\theta }}_t)$. Specifically, $p(\boldsymbol x_t|\boldsymbol y_t,\boldsymbol{\hat {\theta }}_t)$ is approximated by the following multivariate Gaussian distribution,

$$p(\boldsymbol x_t|\boldsymbol y_t,\boldsymbol {\hat{\theta}}_t) \approx \boldsymbol{\mathcal{N}}(\boldsymbol x_t;\boldsymbol \mu_{t},\boldsymbol \Sigma_t),$$
whose mean vector $\boldsymbol \mu _t$ and marginal variances $\text {diag}(\boldsymbol \Sigma _t)$ directly provide approximations of the MMSE estimate $\boldsymbol{\hat {x}}_{t,\text {MMSE}}$ and the posterior marginal variance $\mathbb {V}(\boldsymbol x_t)$, i.e.,
$$\boldsymbol{\hat{x}}_{t,\text{MMSE}} \approx \boldsymbol \mu_t, \,\, \mathbb{V}(\boldsymbol x_t) \approx \text{diag}(\boldsymbol \Sigma_t).$$

In the next step of the same iteration, $\boldsymbol{\mathcal {N}}(\boldsymbol x_t;\boldsymbol \mu _{t},\boldsymbol \Sigma _t)$ is then used to automatically estimate the prior hyperparameters $\boldsymbol \theta _t$ in $f_x(\boldsymbol x_t|\boldsymbol \theta _t)$ at each video frame, via a variational Expectation Maximization (EM) approach [50]. Similar to [34,35], the resulting algorithm, referred to as EP-EM, allows for color video reconstruction in an unsupervised manner. It should be noted that this EP-EM algorithm is made possible and fast by omitting positivity constraints on $\boldsymbol x_t$. This relaxation does not seem to have a significant effect on the quality of the reconstructed images in our experiments.

4. Experimental results

In this section, the proposed EP-EM algorithm is first applied to the reconstruction of single frames of different formats and under different observation conditions, by using simulated data for which ground truth is available. This simulation experiment is used to compare quantitatively the MAP estimator [24,33] and the proposed EP-EM approach, relying on an approximate MMSE estimator. Both methods use the same $\ell _1$-norm TV prior distribution. The EP-EM method automatically estimates the hyperparameters, while the ADMM-based method requires setting the hyperparameters manually. In our experiments, to facilitate parameter setting for the ADMM-based method, its regularization hyperparameters are set to the values estimated by the EP-EM method. In the second subsection, the EP-EM method is then applied to the color reconstruction of real SPAD videos acquired by the experimental setup presented in Section 2.

4.1 Reconstruction of single simulated frames

The two color images in Fig. 4 are used as ground truth data $\boldsymbol x = [\boldsymbol x^{(R)}, \boldsymbol x^{(G)}, \boldsymbol x^{(B)}]\in \mathbb {R}^{512\times 512\times 3}$ to simulate individual frames $\boldsymbol y\in \mathbb {R}^{512 \times 512}$ following (2). The frame index $t$ is omitted for single frames. In the simulation, the pixel values in $\boldsymbol x$ are normalized between $[0,1]$. The transmission coefficients $\boldsymbol a_n = [a_n^{(R)}, a_n^{(G)},a_n^{(B)}]\in \mathbb {R}^{1\times 3}$ ($\forall n=1,\dots, N$) are randomly selected from one of three coefficient vector sets $[0.7, 0.2, 0.1], \,[0.1, 0.2, 0.7]$ and $[0.1, 0.7, 0.2]$. To mimic real acquisition scenarios with the experimental setup described in Section 2, we generated data according to the statistics of measured data. First, the SPAD array contains approximately $5$% of hot pixels so all the simulated images have been corrupted randomly by the same proportion of hot pixels. It is worth noting that as long as the hot pixel locations are known (they can be identified for instance using the method from [51]), they do not induce severe artifacts in the reconstructed images. Second, background levels vary across pixels of our SPAD array, so we used a long exposure to measure, after hot pixel removal, a spatial average background level and the associated (spatial) variance. With the same long exposure, we measured an average observed photon count. From there, we were able to simulate data according to $y_n \sim \mathcal {P}(\alpha \boldsymbol a_n \boldsymbol x_n^T + b_n)$ to mimic background and signal levels comparable to those measured with exposures of $0.1\,$ms, $0.5\,$ms, $1\,$ms, and $5\,$ms (see analysis of real data below). This was achieved by changing the value of $\alpha$ and the parameters of the gamma distribution used to generate the spatially variable background levels in order to fit the experimental background means and variances.
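
A hedged sketch of this data-generation procedure is given below; the coefficient sets match those stated above, while the hot-pixel handling, background mean and variance, and scaling value are illustrative choices rather than the exact measured statistics.

```python
import numpy as np

def simulate_spad_frame(x, alpha, bg_mean, bg_var, hot_frac=0.05, rng=None):
    """Simulate a SPAD frame via y_n ~ Poisson(alpha * a_n x_n^T + b_n).

    x        : (H, W, 3) ground-truth RGB image with values in [0, 1]
    alpha    : signal scaling chosen to match a given exposure
    bg_mean, bg_var : target spatial mean and variance of the background levels
    hot_frac : fraction of pixels marked as hot (set to saturated counts here).
    """
    rng = np.random.default_rng() if rng is None else rng
    H, W, _ = x.shape
    sets = np.array([[0.7, 0.2, 0.1], [0.1, 0.2, 0.7], [0.1, 0.7, 0.2]])
    a = sets[rng.integers(0, 3, size=(H, W))]            # per-pixel coefficients a_n
    # Gamma-distributed background matching the requested mean and variance.
    k, scale = bg_mean ** 2 / bg_var, bg_var / bg_mean
    b = rng.gamma(shape=k, scale=scale, size=(H, W))
    lam = alpha * np.sum(a * x, axis=2) + b
    y = rng.poisson(lam).astype(float)
    hot = rng.random((H, W)) < hot_frac                  # random hot-pixel mask
    y[hot] = y.max() * 10.0                              # crude stand-in for hot pixels
    return y, a, b, hot

# Example: a 64 x 64 frame at an illustrative signal scaling.
x_true = np.random.default_rng(1).uniform(size=(64, 64, 3))
y, a, b, hot = simulate_spad_frame(x_true, alpha=2.0, bg_mean=0.5, bg_var=0.1)
```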


Fig. 4. Two ground truth color images used to simulate individual frames.


Figure 5 presents the images reconstructed from the simulated frames by the ADMM-based and the proposed EP-EM methods and reports their reconstruction time (RT). The average useful photons per pixel (PPP), i.e., the average observed photon count after subtracting the background counts, is also computed for each simulated frame. It can be seen from the results that both methods manage to reconstruct images that are visually similar to the ground truth, and the visual image quality improves as the exposure increases. Note that when the exposure is short (0.1 ms), leading to PPP $<\,1$ on average, the EP-EM method faithfully restores accurate color information, while the images reconstructed by the ADMM-based method present severe blocky artifacts. As the exposure increases, although the two methods produce similar color images, the EP-EM method provides smoother estimates than those of the ADMM-based method. It is worth noting that the $\ell _1$-norm TV prior hyperparameters used in the ADMM-based method are estimated by the proposed EP-EM method. In practice, when there is no access to the hyperparameter estimates, the ADMM-based method requires additional manual hyperparameter tuning in a trial-and-error manner to obtain the presented reconstruction results, making it a less practical approach.


Fig. 5. Experimental results: reconstruction of 512 $\times$ 512 pixels simulated frames under different exposure conditions. PPP: the average useful photons per pixel. RT: reconstruction time.


Figure 6 plots the Normalized Root Mean Squared Error (NRMSE) and Structural Similarity Index Measure (SSIM) [52] versus exposure to quantitatively compare the quality of the images reconstructed by the ADMM-based and the EP-EM methods in Fig. 5. The Root Mean Squared Error (RMSE) is first computed between the reconstructed images and the ground truth images and then divided by $\alpha$ to obtain the NRMSE. The lower the NRMSE or the higher the SSIM, the better the quality of the reconstructed images. For a fair comparison, 20 realizations with the same exposure are generated. The NRMSE and SSIM values reported in Fig. 6 are the mean values of the NRMSEs and SSIMs over the 20 noise realizations. It can be seen from the two plots that at exposures of 0.5 ms, 1 ms, and 5 ms, the EP-EM method provides lower NRMSEs and higher SSIMs, i.e., better image reconstruction quality, than the ADMM-based method. In the low-photon-count regime where the exposure is short (0.1 ms), although the NRMSEs and SSIMs of the ADMM-based method are similar to or slightly better than those of the EP-EM method, the color images reconstructed by the ADMM-based method suffer from severe blocky artifacts in Fig. 5.
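
The NRMSE computation described here is straightforward to reproduce. A minimal sketch, with placeholder reconstructions standing in for the EP-EM and ADMM outputs (the SSIM can be computed with, e.g., scikit-image's structural_similarity):

```python
import numpy as np

def nrmse(x_hat, x_true, alpha):
    """RMSE between a reconstruction and the ground truth, divided by alpha."""
    return np.sqrt(np.mean((x_hat - x_true) ** 2)) / alpha

def mean_nrmse(reconstructions, x_true, alpha):
    """Average NRMSE over several noise realizations of the same exposure."""
    return float(np.mean([nrmse(x_hat, x_true, alpha) for x_hat in reconstructions]))

# Placeholder reconstructions standing in for 20 noise realizations.
rng = np.random.default_rng(2)
x_true = rng.uniform(size=(64, 64, 3))
recs = [x_true + rng.normal(scale=0.1, size=x_true.shape) for _ in range(20)]
print(mean_nrmse(recs, x_true, alpha=1.0))
```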


Fig. 6. Normalized Root Mean Squared Error (NRMSE on the left y-axis) and Structural Similarity Index Measure (SSIM on the right y-axis) of the simulated frames reconstructed by the ADMM-based and the EP-EM methods in Fig. 5.


In addition to providing visually and quantitatively better quality reconstructed color images than the ADMM-based MAP estimator, the EP-EM method provides uncertainty measures for its estimates. Note that the uncertainties are computed as a by-product of the EP-EM method, thus requiring no post-processing or additional computational resources. Figure 7 displays three-dimensional (3D) views of the uncertainties of the images reconstructed by the EP-EM method in Fig. 5. The trends of the 3D points in this figure show that the uncertainties reduce from the short exposure (0.1 ms) to the long exposure (5 ms). This is because longer exposures allow more photon counts to be collected. Such uncertainty measures can therefore act as an additional source of information indicating how reliable the estimates are.


Fig. 7. 3D view of uncertainty measures of the cubic color images and LEGO color images reconstructed by the EP-EM method in Fig. 5. In this figure, each 3D point denotes the uncertainties of a single color pixel in the reconstructed color images in the R, G, and B channels. The different point colors represent different exposures. To visualize the uncertainties on each individual channel, the 3D point clouds are projected onto the 2D planes (RG-plane, RB-plane, and BG-plane).


A significant benefit of the EP-EM method over the ADMM-based method is its fast reconstruction time reported in Fig. 5. To compare the computational efficiency of the two methods for different frame sizes, the original 512 $\times$ 512 pixels ground truth LEGO color image is resized to $32\times 32$, $64\times 64$, $128\times 128$, and $256\times 256$ pixels to simulate different sizes of SPAD arrays. The simulated data sets of different pixel sizes are generated using the same statistics as above. The computational time to reconstruct the images is reported in Table 2. The reported times are for a Matlab implementation running on an Apple M1 processor with an 8-core CPU. From the table, it can be seen that, compared to the ADMM-based method, the EP-EM method significantly speeds up the reconstruction of individual frames, and the computational advantage increases with the image size. Indeed, in the most extreme cases (i.e., large pixel format and long exposure time), the proposed EP-EM method is close to $370$ times faster.


Table 2. CPU computational time (in seconds) comparison between the ADMM-based method and the EP-EM method for different frame sizes and exposures.

For completeness and to illustrate the benefits of larger SPAD arrays, Fig. 8 depicts examples of images reconstructed from simulated data generated with per pixel signals and background levels in line with real measurements (see Visualization 2). The only difference between the images is the number of pixels observed. As expected, for given observation conditions, higher-resolution frames improve the overall quality of the reconstructed images, which opens promising capabilities using SPAD arrays larger than the 64 $\times$ 64 pixel array used in this work.


Fig. 8. Assessment of the reconstruction quality (via the EP-EM method) under realistic observation conditions. (a) Example of the reconstructed 64 $\times$ 64 pixels frame from real SPAD data (exposure of 5 ms, see Visualization 2). (b) Reconstructed frames of size 64 $\times$ 64, 128 $\times$ 128, 256 $\times$ 256, and 512 $\times$ 512 pixels, for data simulated with per pixel signal and background levels similar to those of the data acquired in (a).


This subsection has demonstrated the significant advantages of the EP-EM method over the MAP-based ADMM method in terms of the visual and quantitative color reconstruction quality, the uncertainty measures provided, and most importantly the fast computational time, using simulated data of single frames. Next, real SPAD videos recorded by the experimental setup in Section 2 will be used to further verify the effectiveness of the EP-EM method.

4.2 Reconstruction of real SPAD videos

Using the passive imaging setup in Fig. 3, two multi-colored objects were imaged on a rotating platform to generate real SPAD videos. Different views of the moving objects are shown in Fig. 9. A total of 200 SPAD frames were captured while the platform was moving, with an exposure of 0.1 ms, 0.5 ms, 1 ms, or 5 ms. The videos reconstructed by the EP-EM method are provided in the supplementary files of this paper (see Visualization 1 and Visualization 2). From the videos, we observe that, as expected, the longer the exposure, the better the reconstruction quality, due to the larger number of photons collected. Owing to the high computational cost and the need for manual hyperparameter tuning for each video frame, only a subset of frames was reconstructed by the ADMM-based method for comparison. This comparison is shown in Fig. 10. In a similar fashion to Fig. 5, the average useful PPP of the observed frames and the reconstruction time (RT) of the two methods are reported. It can be seen from the comparison that the EP-EM method manages to reconstruct the true color of the objects, while the ADMM-based method presents the same blocky artifacts as in Fig. 5.


Fig. 9. Two views of the moving objects used to generate the real SPAD videos (see Visualization 1 and Visualization 2) using the passive imaging optical setup in Fig. 3.



Fig. 10. Experimental results on a set of selected frames of the real SPAD videos (see Visualization 1 and Visualization 2). PPP is the average useful photons per pixel.


Figure 11 presents the 3D view of the uncertainty measures for the color frames reconstructed by the EP-EM method in Fig. 10. In a similar fashion to the experiments on simulated data, the uncertainty levels decrease as the frame exposure increases. As a result, there is a trade-off between the exposure and the uncertainty measures. On the one hand, increasing the exposure per frame helps to reduce the uncertainty of the reconstructed videos. On the other hand, a longer exposure may result in longer data acquisition and potentially motion blur. Nonetheless, from the results on the real SPAD video data, it can be seen that the EP-EM method has the ability to provide uncertainty measures that reflect the different photon count levels in the observed data.


Fig. 11. 3D view of uncertainty measures of the color video frames reconstructed by the EP-EM method in Fig. 10. Here similar to Fig. 7, each 3D point denotes the uncertainties of a single pixel in the reconstructed color frames in the red, green, and blue channels, and the projection onto 2D planes is to better visualize the difference between uncertainty levels. The different point colors represent different exposures.


5. Conclusion

This paper demonstrates a new scheme that combines system integration with software acceleration to achieve rapid single-photon color imaging of moving objects in the low-photon flux regime. The system integration has illustrated the applicability of the SPAD–CFA assembly for passive imaging with white-light illumination for fast-moving objects. Videos were recorded by the system in a low-light environment, and an efficient Bayesian approach was applied to facilitate efficient video reconstruction. Compared to the ADMM-based method, the EP-EM method offers distinct advantages of better reconstruction quality, automatic hyperparameter tuning, provision of uncertainty measures, and most importantly of significantly improved computation time.

Using a relatively small format ($64\times 64$ pixels) and high dark count SPAD array, the EP-EM method allows image reconstruction approximately $300$ times faster than the previous method. Simulation data shows that with larger format SPAD arrays the benefits will be even greater (close to 400$\times$ faster), while higher quality SPAD arrays containing lower dark count rates should allow high quality, high-speed color imaging in even more extreme photon-starved environments.

This paper paves the way towards developing a live, video-rate, full-color single-photon imaging system. Live color image reconstruction could be achieved by GPU implementation of the EP-EM method. If combined with a high temporal resolution SPAD detector array, this implementation could facilitate the development of a real-time full-color imaging system with applications across a wide scientific spectrum. Such a system could find use in low-light environment security applications, or facilitate dynamic fluorescence imaging measurements of sensitive biological samples. The capability of more advanced SPAD arrays to conduct TCSPC could benefit autonomous vehicles by allowing them to record combined 3D and color LiDAR data without suffering image registration or misalignment between data recorded by separate systems. Indoor gaming systems could similarly exploit such a device to render players and environments for augmented reality or gesture-based game play.

Funding

Royal Academy of Engineering (RF201617/16/31); Engineering and Physical Sciences Research Council (EP/N003446/1, EP/S000631/1, EP/S026428/1, EP/T00097X/1); UK MOD University Defence Research Collaboration (UDRC).

Acknowledgments

LEGO® is a trademark of the LEGO Group of companies, which does not sponsor, authorize, or endorse this material.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. P. Vines, K. Kuzmenko, J. Kirdoda, D. C. Dumas, M. M. Mirza, R. W. Millar, D. J. Paul, and G. S. Buller, “High performance planar germanium-on-silicon single-photon avalanche diode detectors,” Nat. Commun. 10(1), 1086 (2019). [CrossRef]  

2. A. Gupta, A. Ingle, and M. Gupta, “Asynchronous single-photon 3D imaging,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (2019), pp. 7909–7918.

3. F. Heide, S. Diamond, D. B. Lindell, and G. Wetzstein, “Sub-picosecond photon-efficient 3D imaging using single-photon sensors,” Sci. Rep. 8(1), 17726 (2018). [CrossRef]  

4. D. B. Lindell, M. O’Toole, and G. Wetzstein, “Single-photon 3D imaging with deep sensor fusion,” ACM Trans. Graph. 37(4), 1–12 (2018). [CrossRef]  

5. J. Rapp and V. K. Goyal, “A few photons among many: Unmixing signal and noise for photon-efficient active imaging,” IEEE Trans. Comput. Imaging 3(3), 445–459 (2017). [CrossRef]  

6. G. Gariepy, F. Tonolini, R. Henderson, J. Leach, and D. Faccio, “Detection and tracking of moving objects hidden from view,” Nat. Photonics 10(1), 23–26 (2016). [CrossRef]  

7. G. Mora-Martín, A. Turpin, A. Ruget, A. Halimi, R. Henderson, J. Leach, and I. Gyongy, “High-speed object detection with a single-photon time-of-flight image sensor,” Opt. Express 29(21), 33184–33196 (2021). [CrossRef]  

8. A. Gola, L. Pancheri, C. Piemonte, and D. Stoppa, “A SPAD-based hybrid system for time-gated fluorescence measurements,” in Advanced Photon Counting Techniques V, Vol. 8033 (International Society for Optics and Photonics, 2011), p. 803315.

9. V. Zickus, M.-L. Wu, K. Morimoto, V. Kapitany, A. Fatima, A. Turpin, R. Insall, J. Whitelaw, L. Machesky, C. Bruschini, D. Faccio, and E. Charbon, “Fluorescence lifetime imaging with a megapixel spad camera and neural network lifetime estimation,” Sci. Rep. 10(1), 20986 (2020). [CrossRef]  

10. C. Bruschini, H. Homulle, I. M. Antolovic, S. Burri, and E. Charbon, “Single-photon avalanche diode imagers in biophotonics: review and outlook,” Light: Sci. Appl. 8(1), 87 (2019). [CrossRef]  

11. K. Morimoto, A. Ardelean, M.-L. Wu, A. C. Ulku, I. M. Antolovic, C. Bruschini, and E. Charbon, “Megapixel time-gated SPAD image sensor for 2D and 3D imaging applications,” Optica 7(4), 346–354 (2020). [CrossRef]  

12. F. Bestvater, Z. Seghiri, M. S. Kang, N. Gröner, J. Y. Lee, K.-B. Im, and M. Wachsmuth, “EMCCD-based spectrally resolved fluorescence correlation spectroscopy,” Opt. Express 18(23), 23818–23828 (2010). [CrossRef]  

13. R. Cubeddu, D. Comelli, C. D’Andrea, P. Taroni, and G. Valentini, “Time-resolved fluorescence imaging in biology and medicine,” J. Phys. D: Appl. Phys. 35(9), R61 (2002). [CrossRef]

14. Z. Li, S. Kawahito, K. Yasutomi, K. Kagawa, J. Ukon, M. Hashimoto, and H. Niioka, “A time-resolved CMOS image sensor with draining-only modulation pixels for fluorescence lifetime imaging,” IEEE Trans. Electron Devices 59(10), 2715–2722 (2012). [CrossRef]  

15. R. Tobin, A. Halimi, A. McCarthy, X. Ren, K. J. McEwan, S. McLaughlin, and G. S. Buller, “Long-range depth profiling of camouflaged targets using single-photon detection,” Opt. Eng. 57(03), 1 (2017). [CrossRef]  

16. E. Wade, R. Tobin, A. McCarthy, and G. S. Buller, “Sub-pixel micro scanning for improved spatial resolution using single-photon LiDAR,” in Advanced Photon Counting Techniques XV, Vol. 11721 (SPIE, 2021), p. 8–16.

17. I. M. Antolovic, C. Bruschini, and E. Charbon, “Dynamic range extension for photon counting arrays,” Opt. Express 26(17), 22234–22248 (2018). [CrossRef]  

18. A. Ingle, A. Velten, and M. Gupta, “High flux passive imaging with single-photon sensors,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2019), pp. 6760–6769.

19. S. Ma, S. Gupta, A. C. Ulku, C. Bruschini, E. Charbon, and M. Gupta, “Quanta burst photography,” ACM Trans. Graph. 39(4), 79 (2020). [CrossRef]  

20. S. Ma, P. Mos, E. Charbon, and M. Gupta, “Burst vision using single-photon cameras,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2023), pp. 5375–5385.

21. S. Scholes, G. Mora-Martín, F. Zhu, I. Gyongy, P. Soan, and J. Leach, “Fundamental limits to depth imaging with single-photon detector array sensors,” Sci. Rep. 13(1), 176 (2023). [CrossRef]  

22. A. Gupta, A. Ingle, A. Velten, and M. Gupta, "Photon-flooded single-photon 3D cameras,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (IEEE, 2019), pp. 6770–6779.

23. Y. Altmann, X. Ren, A. McCarthy, G. S. Buller, and S. McLaughlin, “Lidar waveform-based analysis of depth images constructed using sparse single-photon data,” IEEE Trans. on Image Process. 25(5), 1935–1946 (2016). [CrossRef]  

24. Y. D. Shah, P. W. Connolly, J. P. Grant, D. Hao, C. Accarino, X. Ren, M. Kenney, V. Annese, K. G. Rew, Z. M. Greener, A. Yoann, F. Daniele, G. S. Buller, and D. R. S. Cumming, “Ultralow-light-level color image reconstruction using high-efficiency plasmonic metasurface mosaic filters,” Optica 7(6), 632–639 (2020). [CrossRef]  

25. J. L. Lagarto, F. Villa, S. Tisa, F. Zappa, V. Shcheslavskiy, F. S. Pavone, and R. Cicchi, “Real-time multispectral fluorescence lifetime imaging using single photon avalanche diode arrays,” Sci. Rep. 10(1), 1–10 (2020). [CrossRef]  

26. J. H. Nam, E. Brandt, S. Bauer, X. Liu, M. Renna, A. Tosi, E. Sifakis, and A. Velten, “Low-latency time-of-flight non-line-of-sight imaging at 5 frames per second,” Nat. Commun. 12(1), 6526 (2021). [CrossRef]  

27. G. Musarra, A. Lyons, E. Conca, Y. Altmann, F. Villa, F. Zappa, M. J. Padgett, and D. Faccio, “Non-line-of-sight three-dimensional imaging with a single-pixel camera,” Phys. Rev. Appl. 12(1), 011002 (2019). [CrossRef]  

28. J. Lee, A. Ingle, J. V. Chacko, K. W. Eliceiri, and M. Gupta, “CASPI: collaborative photon processing for active single-photon imaging,” Nat. Commun. 14(1), 3158 (2023). [CrossRef]  

29. M. Perenzoni, N. Massari, D. Perenzoni, L. Gasparini, and D. Stoppa, “A 160 × 120 pixel analog-counting single-photon imager with time-gating and self-referenced column-parallel A/D conversion for fluorescence lifetime imaging,” IEEE J. Solid-State Circuits 51(1), 155–167 (2016). [CrossRef]  

30. G. S. Buller, R. D. Harkins, A. McCarthy, P. A. Hiskett, G. R. MacKinnon, G. R. Smith, R. Sung, A. M. Wallace, R. A. Lamb, K. D. Ridley, and J. G. Rarity, “Multiple wavelength time-of-flight sensor based on time-correlated single-photon counting,” Rev. Sci. Instruments 76(8), 083112 (2005). [CrossRef]  

31. X. Ren, Y. Altmann, R. Tobin, A. Mccarthy, S. Mclaughlin, and G. S. Buller, “Wavelength-time coding for multispectral 3D imaging using single-photon LiDAR,” Opt. Express 26(23), 30146–30161 (2018). [CrossRef]  

32. S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Found. Trends Mach. Learn. 3(1), 1–122 (2010). [CrossRef]  

33. Y. Altmann, A. Maccarone, A. McCarthy, S. McLaughlin, and G. S. Buller, “Spectral classification of sparse photon depth images,” Opt. Express 26(5), 5514–5530 (2018). [CrossRef]  

34. D. Yao, S. McLaughlin, and Y. Altmann, “Fast scalable image restoration using total variation priors and Expectation Propagation,” IEEE Trans. on Image Process. 31, 5762–5773 (2022). [CrossRef]  

35. D. Yao, Y. Altmann, and S. McLaughlin, "Color image restoration in the low photon-count regime using Expectation Propagation,” in International Conference on Image Processing (IEEE, 2022), pp. 3126–3130.

36. K. Morimoto, J. Iwata, M. Shinohara, et al., “3.2 megapixel 3D-stacked charge focusing SPAD for low-light imaging and depth sensing,” in International Electron Devices Meeting (IEEE, 2021), pp. 20.2.4.–3130.

37. P. W. Connolly, J. Valli, Y. D. Shah, Y. Altmann, J. Grant, C. Accarino, C. Rickman, D. R. Cumming, and G. S. Buller, “Simultaneous multi-spectral, single-photon fluorescence imaging using a plasmonic colour filter array,” J. Biophotonics 14, e202000505 (2021). [CrossRef]  

38. P. C. Wu, W.-Y. Tsai, W. T. Chen, Y.-W. Huan, T.-Y. Chen, J.-W. Chen, C. Y. Liao, C. H. Chu, G. Sun, and D. P. Tsai, “Versatile polarization generation with an aluminum plasmonic metasurface,” Nano Lett. 17(1), 445–452 (2017). [CrossRef]  

39. M. Khorasaninejad and F. Capasso, “Metalenses: Versatile multifunctional photonic components,” Science 358(6367), eaam8100 (2017). [CrossRef]  

40. Z. Li, E. Palacios, S. Butun, and K. Aydin, “Visible-frequency metasurfaces for broadband anomalous reflection and high-efficiency spectrum splitting,” Nano Lett. 15(3), 1615–1621 (2015). [CrossRef]  

41. Y. Yu, A. Zhu, R. Paniagua-Domínguez, Y. Fu, B. Luk’yanchuk, and A. Kuznetsov, “High-transmission dielectric metasurface with 2π phase control at visible wavelengths,” Laser Photonics Rev. 9(4), 412–418 (2015). [CrossRef]  

42. P. W. Connolly, Y. D. Shah, J. Valli, A. Sykes, C. Accarino, Y. Altmann, C. Rickman, D. R. Cumming, and G. S. Buller, “Advances in metasurface-based mosaic filters for single-photon detector arrays,” in Emerging Imaging and Sensing Technologies for Security and Defence VII12274 (SPIE, 2022), pp. 105–112.

43. C. Accarino, G. Melino, V. F. Annese, M. A. Al-Rawhani, Y. D. Shah, D. Maneuski, C. Giagkoulovits, J. P. Grant, S. Mitra, C. Buttar, and D. R. S. Cumming, “A 64 × 64 SPAD array for portable colorimetric sensing, fluorescence and X-Ray imaging,” IEEE Sensors J. 19(17), 7319–7327 (2019). [CrossRef]  

44. M. Pereyra, J. M. Bioucas-Dias, and M. A. Figueiredo, “Maximum-a-posteriori estimation with unknown regularisation parameters,” in 23rd European Signal Processing Conference (IEEE, 2015), pp. 230–234.

45. J. Besag, “Digital image processing: Towards Bayesian image analysis,” J. Appl. Stat. 16(3), 395–407 (1989). [CrossRef]  

46. L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Phys. D 60(1-4), 259–268 (1992). [CrossRef]  

47. P. F. Felzenszwalb and D. P. Huttenlocher, “Efficient graph-based image segmentation,” Int. J. Comput. Vis. 59(2), 167–181 (2004). [CrossRef]  

48. A. F. Vidal, V. De Bortoli, M. Pereyra, and A. Durmus, “Maximum likelihood estimation of regularization parameters in high-dimensional inverse problems: An empirical Bayesian approach part I: Methodology and experiments,” SIAM J. Imaging Sci. 13(4), 1945–1989 (2020). [CrossRef]  

49. T. P. Minka and M. Oshima, “Expectation Propagation for approximate Bayesian inference,” in Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (Morgan Kaufmann Publishers Inc., 2001), 362–369.

50. G. Celeux, F. Forbes, and N. Peyrard, “EM procedures using mean field-like approximations for Markov model-based image segmentation,” Pattern Recognit. 36(1), 131–144 (2003). [CrossRef]  

51. P. Connolly, X. Ren, R. Henderson, and G. S. Buller, “Hot pixel classification of single-photon avalanche diode detector arrays using a log-normal statistical distribution,” Electron. Lett. 55(1), 1004–1006 (2019). [CrossRef]  

52. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. on Image Process. 13(4), 600–612 (2004). [CrossRef]  

Supplementary Material (2)

Visualization 1: Color reconstruction of the moving “cubic” object shown in Fig. 9. A selected frame of this video is shown in Fig. 10.
Visualization 2: Color reconstruction of the moving “LEGO” object shown in Fig. 9. A selected frame of this video is shown in Figs. 8 and 10.

