Computational multi-depth single-photon imaging

Dongeek Shin; Feihu Xu; Franco N. C. Wong; Jeffrey H. Shapiro; Vivek K Goyal

doi:10.1364/OE.24.001873

1. Introduction

The ability to acquire 3D structure of a scene is important in many applications, such as biometrics [1], terrestrial mapping [2], and gaming [3]. Time-of-flight methods for 3D acquisition illuminate the scene and analyze the backreflected light signal [4, 5]. However, as illustrated in Fig. 1, scenes that include partially-reflective or partially-occluding objects have complex patterns of light being reflected at different depths even at a single pixel. For such scenes, one can analyze multiple light returns to fully reconstruct the multiple depths present in the field-of-view. This is known as the problem of multi-depth reconstruction from full-waveform measurements [6].

Fig. 1 Examples of active imaging scenarios in which the scene response is a sum of responses from multiple reflectors. (a) Imaging scene with a partially-reflecting object (shown in gray dashed line). (b) Imaging scene with a partially-occluding object.

Download Full Size | PDF

Conventional time-of-flight imaging sensors, such as amplitude-modulated continuous-wave (AMCW) modules, aim to reconstruct multi-depth profiles by the methods of transient imaging [7–9]. For low-power and long-range 3D imaging applications, a sensitive single-photon detector is used instead in the active imaging setup for low-light level operations [10–13]. Recent advances in single-photon imaging system design allowed experimental demonstration of low-flux time-of-flight imaging at long ranges (∼100 m) [14, 15]. Given a pulsed illumination source, time histogramming methods of the photon detections from the backreflected pulse waveform have been used for pixelwise reconstruction of scene depths using single-photon imagers [16, 17]. For example, the high sensitivity of time-correlated single-photon imaging systems was demonstrated when full photon histogram measurements were used to track pulses of light in flight [18].

For low-flux multi-depth imaging of scene reflectors in particular, one may choose to identify the peaks in the photon histogram by brute-force search over time bins. However, since this leads to a large processing time (polynomial in the number of time bins, with degree equal to the number of depths), fast algorithms using parametric deconvolution or finite-rate-of-innovation methods have been developed [19, 20]. Assuming accurate waveform measurement, the compressive depth acquisition camera (CoDAC) [21] framework also exploits parametric deconvolution, but for estimating positions of extended planar facets rather than multiple depths per transverse location.

The previously described multi-depth imaging methods using single-photon detectors only give accurate results when the image acquisition time is long enough that the number of photon detections is sufficiently high to form an accurate histogram. The problem of recovering the multi-depth information is generally difficult in low-flux scenarios due to the low signal-to-noise ratio of observing only a few photon detections, and extraneous background light and detector dark counts. For moderate numbers of detected photons, a statistical approach has been used to estimate the multi-depth profile by learning a mixture of Gaussians (MoG) model that interprets the photon detection data as samples from a distribution of the full-waveform observation. In the mixture model, the mode of each mixture component corresponds to a depth value of a target. Learning of the mixture model is achieved either by using the expectation-maximization (EM) algorithm [22] or the Markov chain Monte Carlo (MCMC) method [23,24]. However, MoG-based multi-depth estimation involves a non-convex cost function and generates locally optimal solutions whose accuracy is generally poor when the number of detected photons is low. Thus, existing methods are limited in their ability to recover a scene’s multi-depth information in a photon-efficient manner.

In this paper, we demonstrate a full-waveform imaging framework that accurately recovers the multi-depth profile from single-photon observations. For example, as detailed in Section 4.2, our framework was successful in accurately reconstructing the depth features of a mannequin behind a partially-reflecting medium using only 19 signal photon detections with root mean-square (RMS) depth error of 11.4 cm. Compared to the conventional MoG-based method, which gave RMS depth error of 48.7 cm, this is an improvement by a factor of 4.2. Unlike previous works, we show that the multi-depth estimation problem from single-photon observations can be reformulated as a convex optimization problem by combining the statistics of photon detection data with sparsity of multi-depth profiles. By adapting the iterative shrinkage-thresholding algorithm (ISTA) [25, 26] used for robust sparse signal pursuit to our single-photon imaging setup, we accurately solve for the global optimum of the convex optimization problem to obtain a multi-depth solution from a small number of photon detections.

2. Imaging framework

2.1. Imaging setup

Figure 2 depicts the single-photon imaging setup we used to estimate the 3D structure of scene. A focused optical source, such as a laser, periodically illuminates a patch of the scene using the pulse waveform function s(t) starting at time 0. Let T_r be the pulse repetition period in seconds, N be the total number of independent pulse illuminations made at the image pixel, and T_p be the pulsewidth defined as the RMS duration of the pulse waveform. The single-photon detector, in conjunction with a time correlator, is used to time stamp individual photon detections that are generated by the backreflected waveform from the scene plus extraneous detections arising from background light and dark counts. The recorded time of a photon detection is relative to the time of the most recent pulse illumination. We raster scan the optical source over multiple pixels in the scene to recover a spatially-resolved depth profile.

Fig. 2 (Top) Full-waveform single-photon imaging setup for estimation of depths of multiple objects. In this example, a pulsed optical source illuminates a pixel of the scene that includes a partially-reflective object occluding a target of interest. The optical flux incident at the single-photon detector combines the backreflected waveform from multiple reflectors in the scene pixel with extraneous background light. (Bottom left) The photon detections, shown as spikes, are generated by the N-pulse rate function Nλ(t) following an inhomogeneous Poisson process. The green and blue spikes represent photon detections from the first and second reflector, respectively; the red spikes represent the unwanted photon detections from background light and dark counts. (Bottom right) Our convex optimization processing enables accurate reconstruction of multiple depths of reflectors in the scene from a small number of photon detections.

Download Full Size | PDF

2.2. Observation model at a single pixel

We first derive the relationship between the depths of multiple reflectors in the scene and our observed data of photon detections. To avoid unnecessary notation, we characterize the light transport and detection statistics for a single pixel; the same model applies at each pixel.

Let r(t) be the total optical flux that is incident on the single-photon detector. Through the linearity and time invariance of optical flux transport, we can write

r (t) = (h * s) (t) + B,

where h(t) is the impulse response of the scene, B is the constant background light flux in photons per second, and * represents convolution. Then, the rate function λ(t) that describes the photon detections at the single-photon detector is

λ (t) = η (h_{d} * r) (t) + d,

where η is the detector’s quantum efficiency, d is the detector dark count rate, and h_d(t) is the detector’s response function. For simplicity, we assume a normalized detector response function so that ∫h_d(t)dt = 1. Substituting Eq. (1) into Eq. (2) then gives

λ (t) = η (h * \tilde{s}) (t) + (η B + d),

where

\tilde{s} = h_{d} * s

is the effective pulse waveform after accounting for the detector response. Note that because ηB + d is constant in t and ∫h_d(t)dt = 1, the detector response function does not appear in the second term. Hereafter, we assume that the scene is entirely within the maximum unambiguous range of the imager, so h(t) = 0 for t > T_r. We also assume that T_p ≪ T_r, so we may avoid complicating the exposition by assuming

(h * \tilde{s}) (t) = 0

for t ∉[0,T_r).

A time-correlated single-photon detector records the time-of-detection of a photon within a timing resolution of Δ seconds. We choose a pulse repetition period that is much longer than the timing resolution (Δ ≪ T_r) and assume that T_r is divisible by Δ. Then, we see that m = T_r/Δ is the number of time bins that may contain photon detections. In other words, m is the dimension of the photon count vector y, where y_k ∈ ℕ for k = 1,2,…,m. (We use y_k to indicate the scalar value at the kth index of the vector y.) Using the probabilistic theory of photon detections [27], after N pulse illuminations at an image pixel, the kth bin of the observed photon count histogram is distributed as

\begin{array}{l} y_{k} \sim Poisson (N \int_{(k - 1) Δ}^{k Δ} λ (t) d t) \\ = Poisson (\underset{Mean count of signal photons}{\underset{︸}{N η \int_{(k - 1) Δ}^{k Δ} (h * \tilde{s}) (t) d t}} + \underset{Mean count of background photons plus dark counts}{\underset{︸}{N Δ (η B + d)}}) . \end{array}

We would like to reach an approximation in which the Poisson parameter of y_k is given by the product of a known matrix and an unknown vector.

We will approximate $(h * \tilde{s}) (t)$ with a sampling period of Δ′, where we have Δ′ = T_r/n for some n ∈ ℤ ⁺. We can approximate the first term in the Poisson parameter in Eq. (4) as

\begin{array}{l} N η \int_{(k - 1) Δ}^{k Δ} (h * \tilde{s}) (t) d t = \int_{(k - 1) Δ}^{k Δ} \int_{0}^{T_{r}} N η h (y) \tilde{s} (t - y) d y d t \\ \overset{(a)}{=} \int_{(k - 1) Δ}^{k Δ} \sum_{j = 1}^{n} \int_{(j - 1) Δ^{'}}^{j Δ^{'}} N η h (y) \tilde{s} (t - y) d y d t \\ = \sum_{j = 1}^{n} \int_{(k - 1) Δ}^{k Δ} \int_{(j - 1) Δ^{'}}^{j Δ^{'}} h (y) N η \tilde{s} (t - y) d y d t \\ \overset{(b)}{\approx} \sum_{j = 1}^{n} \int_{(k - 1) Δ}^{k Δ} \int_{(j - 1) Δ^{'}}^{j Δ^{'}} \frac{x_{j}}{Δ^{'}} \frac{S_{k, j}}{Δ} d y d t \\ = \sum_{j = 1}^{n} S_{k, j} x_{j}, \end{array}

where (a) follows from partitioning [0,T_r) into n subintervals and (b) from replacing h(y) and

N η h (y) \tilde{s} (t - y)

by constant approximations on (y,t) ∈ [(j − 1) Δ′, jΔ′) × [(k − 1)Δ, kΔ); specifically, we define

x_{j} = \int_{(j - 1) Δ^{'}}^{j Δ^{'}} h (y) d y, for j = 1, \dots, n,

S_{k, j} = \frac{1}{Δ^{'}} \int_{(k - 1) Δ}^{k Δ} \int_{(j - 1) Δ^{'}}^{j Δ^{'}} N η \tilde{s} | (t - y) d y d t, for k = 1, \dots, m, j = 1, \dots, n .

Note that the quality of the approximation (b) will depend on the size of Δ′. Finally, using the derived approximations, the observation model of Eq. (4) can be rewritten in a concise matrix-vector form as

y_{k} \sim Poisson ({(Sx + b 1_{m})}_{k}), for k = 1, 2, \dots, m,

where x is an n × 1 vector, S is an m × n matrix, and |1_m is an m × 1 vector of ones, and

b = N Δ (η B + d) .

2.3. Observation likelihood expressions

Our goal of multi-depth reconstruction is to accurately estimate x from y. Using Eq. (6), the photon count histogram vector y has the probability mass function

p_{Y} (y; x, S, b) = \prod_{k = 1}^{m} \frac{\exp {- {(Sx + b 1_{m})}_{k}} {(Sx + b 1_{m})}_{k}^{y_{k}}}{y_{k}!} .

We can thus write the negative log-likelihood of x as

ℒ (x; y, S, b) = - \log p_{Y} (y; x, S, b) ≐ \sum_{k = 1}^{m} [{(Sx)}_{k} - y_{k} \log {(Sx + b 1_{m})}_{k}],

where ≐ indicates equality up to terms independent of x. By checking the positive-semidefiniteness of the Hessian matrix of the negative log-likelihood function, it is straightforward to prove that ℒ (x;y,S,b) is convex in x.

2.4. Characteristics of the impulse response functions of natural scenes

It has been shown that the following K-reflector model is effective in describing the impulse response of a natural scene with multiple reflectors [8,23]:

h (t) = \sum_{i = 1}^{K} a^{(i)} δ (t - 2 d^{(i)} / c), t \in [0, T_{r}),

where a⁽ⁱ⁾ and d⁽ⁱ⁾ are respectively the reflectivity and depth values of the ith reflector at an image pixel, δ(·) denotes the delta function, c is the speed of light, and K is the number of reflectors. Let us choose the indexing rule so that d⁽¹⁾ < d⁽²⁾ <⋯< d⁽^K⁾. Then, we define the minimum separation of reflector depths as

d_{s} = \min_{i = 1, \dots, K - 1} | d^{(i)} - d^{(i + 1)} | .

We observe that, assuming the K-reflector model and cΔ′/2 < d_s, exactly K elements of x are nonzero and those entries have values ${a^{(i)}}_{i = 1}^{K}$ .

3. Novel image reconstruction algorithm

The multi-depth profile can be estimated by minimizing the negative log-likelihood function, while including a signal support constraint that the number of nonzero elements in x must be equal to K. However, the ℓ₀-norm constraint set, which describes the set of vectors with K nonzero elements, is a non-convex set. With no additional assumptions, exactly solving an optimization problem constrained to this set is computationally infeasible, since the problem is NP-hard [28]. In order to design a computationally feasible algorithm, we apply the convex relaxation whereby the ℓ₁-norm serves as a proxy for the ℓ₀-norm [29]. Our proposed imaging framework also includes a constraint on reflectivity values being nonnegative. Thus, we obtain the multi-depth profile estimate ${\hat{x}}_{OPT}$ by solving the following ℓ₁-penalized and constrained likelihood optimization problem:

\begin{array}{l} \underset{x}{minimize} ℒ (x; y, S, b) + τ {‖ x ‖}_{1} \\ subject to x_{k} \geq 0, k = 1, 2, \dots, n, \end{array}

where τ > 0 is the variational parameter controlling the degree of penalizing the non-sparsity of the signal. Because ℒ (x;y,S,b) and the ℓ₁-norm are both convex functions in x and the nonnegative cone is a convex set, the minimization problem given in (OPT) is a convex optimization problem.

ISTA is a celebrated algorithm for rapidly solving the ℓ₁-penalized constrained likelihood optimization problem when the data is corrupted by additive white Gaussian noise. However, instead of using a Gaussian likelihood, (OPT) is derived based on the model for single-photon observations. Thus, we modify the first step of ISTA that takes a gradient descent in the least-squares cost to one that takes a gradient descent in the negative log likelihood obtained from photon observations given in Eq. (9). We can compute the gradient of our negative log likelihood function as

\begin{array}{l} \nabla_{x} ℒ (x; y, S, b) = \sum_{k = 1}^{m} \nabla_{x} [{(Sx)}_{k} - y_{k} \log {(Sx + b 1_{m})}_{k}] = \sum_{k = 1}^{m} [{(S^{T})}_{k} - \frac{y_{k}}{{(Sx + b 1_{m})}_{k}} {(S^{T})}_{k}] \\ = S^{T} [1_{m} - div (y, Sx + b 1_{m})], \end{array}

where we used (S^T)_k to denote the kth column of S^T and div(·, ·) represents elementwise division of the vector in the first argument by the vector in the second one. We then modify the second step of ISTA that performs a shrinkage-thresholding operation on the gradient-descent solution to include the nonnegativity constraint of scene reflectivities. Our extra nonnegativity constraint replaces the shrinkage-thresholding operation with a shrinkage-rectification operation. The shrinkage-thresholding used in ISTA and the shrinkage-rectification used in our algorithm are illustrated in Fig. 3.

Fig. 3 Illustration of a shrinkage-thresholding operation used as a step in ISTA (left) and the shrinkage-rectification operation used as a step in our SPISTA (right) that includes the nonnegativity constraint. Here the operations map scalar v to scalar z (variables only used for illustration purposes), with regularization parameter τ.

Download Full Size | PDF

Our modified ISTA algorithm is given in Algorithm 1. After solving (OPT) using Algorithm 1, we apply post-processing on ${\hat{x}}_{OPT}$ that sets small residual nonzero elements to zero and groups closely-neighboring nonzero elements into an average depth. The depth grouping step is to ensure maximum sparsity of the multi-depth profile, by using the assumption that a tight cluster of depth estimates originates from a single reflector in the scene. This end-to-end processing is summarized in Algorithm 2. To illustrate the role of the post-processing, Fig. 4 shows the intermediate SPISTA output and the final output of Algorithm 2 for one pixel of our experiment detailed in Section 4.2.2 (see pixel marked in experimental setup figure).

Algorithm 1. Single-photon iterative shrinkage-thresholding algorithm (SPISTA)

View Table | View all tables in this article

Algorithm 2. Computational multi-depth single-photon imaging

View Table | View all tables in this article

Fig. 4 Illustration of steps of Algorithm 2 using experimental photon-count data for the single-pixel multi-depth example of partially-occluding reflectors in Section 4.2.2. (a) The raw photon count vector y obtained by imaging a pixel with two depths. Other than the photon detections describing the two targets of interest, we observe extraneous photon detections from background and dark counts. (b) The output solution of SPISTA in Algorithm 1. Note that the extraneous background and detector dark counts are suppressed. (c) The final solution of Algorithm 2 that groups depths of SPISTA output.

Download Full Size | PDF

In summary, we have developed a low-flux multi-depth imaging framework that incorporates the statistics of single-photon detections with the sparsity of the multi-depth profile at a pixel to formulate a convex optimization problem in (OPT). This is unlike existing histogram-based low-flux methods, such as MoG, which solve a non-convex problem directly and do not guarantee high accuracy solutions due to local minima. Our framework modifies ISTA to include accurate photodetection statistics and the nonnegativity constraint to accurately solve (OPT). Our algorithm also employs post-processing to ensure filtering of residual signals and clustering depth estimates to maximize the level of sparsity of the final multi-depth estimate.

4. Results

4.1. Simulations

4.1.1. Performance of two-path recovery

Using simulations, we first study the multi-depth estimation performance for K = 2, motivated by the two-Dirac recovery scenario of second-order multipath interference from reflective surfaces [30] and looking through a transparent object [31] in conventional high light-level time-of-flight imaging. We focus on comparing two algorithms: the MoG-based estimator using a greedy histogram-data-fitting strategy and our proposed imager using convex optimization. Let {d₁,d₂}, with d₁ < d₂, be the set of true depths at a pixel. Also, let ${{\hat{d}}_{1}, {\hat{d}}_{2}}$ , with ${\hat{d}}_{1} < {\hat{d}}_{2}$ , be the set of identified depths obtained using either the MoG method or our proposed framework. Then, we use the pulsewidth-normalized root mean-square error (NRMSE) to quantify the recovery performance of the two-path signal:

NRMSE ({d_{1}, d_{2}}, {{\hat{d}}_{1}, {\hat{d}}_{2}}) = \frac{1}{c T_{p} / 2} \sqrt{E [\frac{1}{2} ({(d_{1} - {\hat{d}}_{1})}^{2} + {(d_{2} - {\hat{d}}_{2})}^{2})]},

such that if NRMSE is below 1, then the imager has achieved sub-pulsewidth depth accuracy. When more than two paths were estimated by the algorithm, two depth values with highest intensities were used for NRMSE computation.

Figure 5 shows Monte Carlo simulated performance results of pixelwise two-path estimation using the MoG-based method and our method. The results are presented for low (b = 0.1) and high (b = 0.5) background levels. For learning MoG components given photon observation samples, we used the EM algorithm. Simulation parameters were set as the following: the number of detector time bins m = 100, the number of discretized depth bins n = 100, RMS pulsewidth T_p = 0.3 ns, and bin width Δ = 1 ns. The pulse shape was discrete Gaussian. The number of Monte Carlo trials for the simulation was 2000. For each Monte Carlo trial, two entries out of n were generated in a uniformly random fashion. (A two-path reflector profile was chosen from n-choose-2 combinations at random.) Both selected entries were set to 1, in order to simulate two reflectors with unit reflectivities. For our algorithm, we chose the regularization parameter τ = b, based on a heuristic that higher penalty is required for higher background levels. We chose the convergence parameter δ = 0.01, and the residual filtering parameter ε = 0.1. Also, our initialization ${\hat{x}}^{(0)}$ was chosen to be y. We see that for both low and high background levels, our proposed framework uniformly outperforms the existing MoG method for various numbers of photons backreflected from the scene. For example, for both b = 0.1 and b = 0.5, the difference in RMSE between our framework and MoG is around 2 given 10 signal photon detections. This translates to RMS depth error reduction of 9 cm, since 1 NRMSE equals cT_p/2 = 4.5 cm. Also, our method successfully achieves sub-pulsewidth depth accuracy (NRMSE less than 1) when the number of signal photons exceeds ∼30, while the MoG method fails to do so.

Fig. 5 Simulated performance of MoG-based method and proposed framework in recovering signals with K = 2 for two different background levels. Signal photon detections are detections originating from scene response and do not include the background-light plus dark-count detections. Note that the units of NRMSE are in meters, after being normalized by the pulsewidth; 1 NRMSE equals cT_p/2 = 4.5 cm. The plots also include error bars indicating the ±1 standard errors.

Download Full Size | PDF

In this simulation, we required an average of 85 SPISTA iterations per pixel. The average per pixel processing time of Algorithm 2 was measured to be ∼0.004 seconds. Our algorithm’s processing time is short because the computational time of a SPISTA update is linear in the number of depth bins n, since the most costly operation in SPISTA is the size-n convolution, and the post-processing step only requires a linear search over n bins. The average per pixel processing time of the MoG method was measured to be ∼0.019 seconds. All processing was done using a laptop with Intel(R) Core(TM) i7-4500u CPU @ 1.80 GHz.

4.1.2. Resolvability

In the problem of multi-depth estimation, it is natural to ask how small the depth separation of two adjacent reflectors can be so that the proposed algorithm can still accurately resolve two reflectors instead of one. Figure 6 shows simulation results that describe how the number of reflectors estimated by our algorithm (number of nonzero elements in ${\hat{x}}_{OPT}$ ) varies with the distance between two reflectors and the relative reflectivities of the two reflectors. We fixed the mean number of photons coming from the first target with unit reflectivity as 100 by changing N, the number of illumination trials, in Eq. (5c). The reflectivity of the second target was set to be one of the values in the set {1, 1/2, 1/4, 1/8}. Here we had m = n = 100, Δ = 1 cm, δ = 10⁻², ε = 0.1, τ = 0.5, and b = 0. Algorithm 2 was applied to 2000 independent simulated experiments and the results in Fig. 6 show the mean number of estimated reflectors. In Fig. 6, for all relative reflectivity settings, we observe that if the distance between two reflectors is too small (around 3cT_p), then our algorithm falsely recognizes two reflectors as a single reflector. Even when there is a large separation between reflectors (larger than 3cT_p), if the ratio between target reflectivities is too small (such as 1/8), then the number of reflectors is likely to be falsely recognized as 1 by our Algorithm 2.

Fig. 6 Simulated results of mean estimates of the number of reflectors produced by Algorithm 2 at a pixel, when the RMS pulsewidth is set to be cT_p = 2 cm. Here we show plots when the reflectivity ratio between the first and second target is (a) 1 (blue line), (b) 1/2 (cyan line), (c) 1/4 (yellow line), and (d) 1/8 (red line),

Download Full Size | PDF

4.2. Experimental results

We demonstrate the performance of the proposed multi-depth imager using an experimental single-photon imaging setup. A PicoQuant LDH series pulsed laser diode with center wavelength of 640 nm, pulsewidth T_p = 270 ps, and repetition period T_r = 100 ns was used as the illumination source. We observed that the laser spot size cast on a planar object at a 1 m distance was around 1 mm, implying that the beam solid angle was around 7.9 × 10⁻⁷ sr. A Thorlabs GVS012 two-axis galvo was used to raster scan the scene with a field-of-view of 40° × 40°. A lensless Micro Photon Devices PDM series Geiger-mode avalanche photodiode detector with quantum efficiency η = 0.35, timing jitter less than 50 ps, and dark counts per second less than 2 × 10⁴ was used for detection of photons. A PicoQuant model HydraHarp 400 time-correlator was used to record the detection times of individual photons. Figure 7 shows a photograph of our experimental single-photon imaging setup. We injected extraneous background light using an incandescent lamp. In this paper, we chose to use the raster-scanning setup simply due to availability of equipment; one can observe that our computational framework can be applied without modification when employing an imaging setup that includes a flood illumination source and an array of single-photon detectors.

Fig. 7 Experimental setup with a raster-scanning source and a single SPAD detector. The red arrows show the path of the optical signal from laser source, and the green arrows show the path of the electrical signal that indicates whether a photon has been detected or not.

Download Full Size | PDF

For all experiments, we obtained the pulse waveform matrix S and background-light plus dark-count value b through a calibration step prior to the scene measurements. We obtained the pulse shape by projecting the laser light at the wall, which was a reflective plane 4 meters from the imager, and collecting a photon count histogram. We time-gated an interval of the photon count histogram at around 4 × 2/c seconds, such that the interval contained a clean representation of the pulse histogram. By creating a convolution matrix using the calibrated pulse waveform shape, we got a measurement of S from Eq. (5c). The value of b from Eq. (7) was obtained by taking a baseline measurement of the incandescent light with laser light not present in scene. Code and data are available in [32].

4.2.1. Imaging through a partially-reflective object

Figure 8 shows experimental results of imaging through a partially-reflective object, which is the multi-depth imaging scenario in Fig. 1(a), using the MoG-based and proposed multi-depth estimation methods with an average of 46 photon detections at each pixel. We used a stack of plastic sheets enclosed in a plexiglass case as the partially-reflective object, with an average reflectivity of ∼50%. Through calibration we found that the probabilities of a photon coming from the scatterer, the scene behind the scatterer, and background light or dark counts were equal to 0.44, 0.42, and 0.14, respectively. These numbers were calibrated using a single histogram with 10000 photons gathered from all pixels, where each pixel contributed a single photon to the aggregate histogram. Thus, the number of photons that originated from the scene-of-interest behind the scatterer is calculated to be 46 × 0.42 ≈ 19. The raster-scanning resolution was set to be 100 × 100 in this experiment.

Fig. 8 (Left) Photograph of the mannequin placed behind a partially-scattering object from the single-photon imagers’ point of view. (Right) Experimental results for estimating depth of the mannequin through the partially-reflective material using MoG-based and our estimators, given that our imaging setup is at longitudinal position z = 0. Our multi-depth results were generated using the parameters τ = 0.1, δ = 10⁻⁴, ε = 0.1, and ${\hat{x}}^{(0)} S^{T} y$ .

Download Full Size | PDF

We see that the existing MoG-based method fails to recover useful depth features of the mannequin, but our method successfully does so. For example, in the side view of the reconstructed depth, the result from our method differentiates the longitudinal locations of the face and the torso of the mannequin, unlike the result from the MoG method. We were able to form a ground truth depth map of the mannequin by using the log-matched filtering solution [27] on a larger dataset of 500 photons per pixel, after time-gating the photon arrivals at around 2.6 ns such that remaining photons only describe the mannequin scene placed at around 4 m. With respect to this ground truth, the conventional MoG solution gave 48.7 cm of RMS depth error and ours gave 11.4 cm. The RMS depth error was computed as the square root of the average of squared depth errors over all pixels. Our framework thus gave an improvement in reducing erroneous pixel depth by a factor of 4.2, compared to the MoG method, for the task of imaging a scene with partially-reflective object.

In this experiment of imaging through a partially-reflecting object, we required an average of 98 SPISTA iterations per pixel. The average per pixel processing time of Algorithm 2 was measured to be ∼0.036 seconds and its total processing time for the spatially-resolved multi-depth reconstruction was ∼6 minutes. The average per pixel processing time of the MoG method was measured to be ∼0.003 seconds and its total processing time was ∼30 seconds.

4.2.2. Imaging a partially-occluding object

For experimental validation of multi-depth estimation for scenes with partially-occluding objects, we used a photon detection dataset that was collected by Dheera Venkatraman for the first-photon imaging project [11] with the same raster-scanning setup. Our experimental scene consisted of a sunflower and the background wall as shown in Fig. 9. This experiment models the multi-depth imaging scenario in Fig. 1(b). Here we have a higher raster-scanning resolution of 300 × 300, such that many pixels are at the depth boundaries of the two reflective objects: the sunflower and the wall. There are multiple returns at such pixels, where the sunflower petals partially occlude the wall behind it. This artifact is also known as one of mixed pixels in the context of time-of-flight imaging [33]. In our data, the probabilities of a photon originating from the scene and from background light or dark counts are 0.8 and 0.2, respectively. These numbers were calibrated using a single histogram with 90000 photons gathered from all pixels, where each pixel contributed to a single photon to the aggregate histogram.

Fig. 9 Single-photon imaging setup for estimating multi-depth from partial occlusions at depth boundary pixels. Sample data from 38 photon detections is shown below for the pixel (94,230) where partial occlusions occur. We show experimental results of multi-depth recovery for this scene using the MoG-based and our methods.

Download Full Size | PDF

Figure 10 shows how the proposed imager compares to the MoG estimator for the sunflower and wall scene. The mean number of photon detections over all pixels was measured to be 26 for this experiment. In the figure, we observe that our proposed multi-depth imager successfully distinguishes the sunflower’s petals from its leaves and the wall behind it, even though there exist mixed-depth pixels at boundaries and high background light plus dark counts. This is most visible in the side view, where we see that the noisy depth estimates present in the MoG results are absent when using our method. Similar to the previous imaging experiment, we were able to form a ground truth depth map of the sunflower by using the log-matched filtering solution on a larger dataset of 500 photons per pixel, after time-gating the photon arrivals at around 1.6 ns such that remaining photons only describe the sunflower placed at around 2.5 m, and not the wall behind it. Then, we computed that while the conventional MoG solution gave 46.5 cm of RMS depth error, ours gave 4.3 cm. Our framework thus gave an improvement in RMS depth error by a factor of 10.8, compared to the MoG method, for the task of imaging a scene with partially-occluded object.

Fig. 10 Experimental results of depth reconstruction of sunflower occluding a wall, given that our imaging setup is at z = 0. Using our imaging framework, the mixed-pixel artifacts at the depth boundary of the flower and background light plus dark count noise are suppressed. Our multi-depth results were generated using the parameters τ = 0.2, δ = 10⁻⁴, ε = 0.1, and ${\hat{x}}^{(0)} = S^{T} y$ .

Download Full Size | PDF

In this experiment of imaging a partially-occluding object, we required an average of 7 SPISTA iterations per pixel. The average per pixel processing time of Algorithm 2 was measured to be ∼0.0014 seconds and its total processing time for the spatially-resolved multi-depth reconstruction was ∼2 minutes. The average per pixel processing time of the MoG method was measured to be ∼0.0057 seconds and its total processing time was ∼8.5 minutes.

5. Conclusion and future work

In this paper, we presented a robust framework for reconstruction of multi-depth profiles of a scene using a single-photon detector. Our novel imaging method accurately models the single-photon detection statistics from multiple reflectors in the scene, while exploiting the fact that multi-depth profiles can be expressed as sparse signals. Using our signal model the multi-depth estimation problem is a sparse deconvolution problem. We designed an algorithm inspired by ISTA that reaches the globally optimal solution of the convex relaxation of the sparse deconvolution problem with high computational efficiency, demonstrated by the sub-second per-pixel processing time in our simulations and experiments. Using both simulations and experiments for scenes including partially-reflecting and partially-occluding objects, we demonstrated that our imaging framework outperforms the existing MoG-based method for multi-depth estimation in the presence of background light and dark counts.

Unlike the parameter-free MoG-based multi-depth imaging method, our framework introduces a number of free parameters, such as the regularization parameter. For our experiments, we used the heuristic of choosing the parameters based on the calibrated background level. In practice, when the signal-to-background ratio varies over multiple imaging experiments, one can employ cross-validation techniques to learn the scalar parameters from multiple experiments [34].

For future work, it is of interest to study how applying other post-processing methods, such as 3D point cloud filtering, can improve multi-depth recovery performance. Also, optoelectronic techniques, such as range gating and narrowband optical filtering, can be incorporated to reject background counts at the data acquisition level to have a more accurate multi-depth reconstruction.

Acknowledgments

This material is based upon work supported in part by a Samsung Scholarship, the US National Science Foundation under Grant No. 1422034, and the MIT Lincoln Laboratory Advanced Concepts Committee. We thank Dheera Venkatraman for his assistance with the experiments.

References and links

1. K. I. Chang, K. W. Bowyer, and P. J. Flynn, “An evaluation of multimodal 2D+3D face biometrics,” IEEE Trans. Pattern Anal. Mach. Intell. 27, 619–624 (2005). [CrossRef] [PubMed]

2. J. Côté, J. Widlowski, R. A. Fournier, and M. M. Verstraete, “The structural and radiative consistency of three-dimensional tree reconstructions from terrestrial LIDAR,” Remote Sens. Environ. 113, 1067–1081 (2009). [CrossRef]

3. C. D. Mutto, P. Zanuttigh, and G. M. Cortelazzo, Time-of-Flight Cameras and Microsoft Kinect™ (Springer-Verlag, 2012). [CrossRef]

4. R. Blahut, Theory of Remote Image Formation (Cambridge University Press, 2004). [CrossRef]

5. J. Liang, L. Gao, P. Hai, C. Li, and L. V. Wang, “Encrypted three-dimensional dynamic imaging using snapshot time-of-flight compressed ultrafast photography,” Sci. Rep. 5, 15504 (2015). [CrossRef] [PubMed]

6. C. Mallet and F. Bretar, “Full-waveform topographic lidar: State-of-the-art,” ISPRS J. Photogramm. Remote Sensing 64, 1–16 (2009). [CrossRef]

7. F. Heide, M. B. Hullin, J. Gregson, and W. Heidrich, “Low-budget transient imaging using photonic mixer devices,” ACM Trans. Graphics 32, 45 (2013) [CrossRef]

8. D. Freedman, Y. Smolin, E. Krupka, I. Leichter, and M. Schmidt, “SRA: Fast removal of general multipath for ToF sensors,” in European Conference on Computer Vision (ECCV), (Springer, 2014), pp. 234–249.

9. H. Qiao, J. Lin, Y. Lin, M. B. Hullin, and W. Dai, “Resolving transient time profile in ToF imaging via log-sum sparse regularization,” Opt. Lett. 40, 918–921 (2015). [CrossRef] [PubMed]

10. S. Pellegrini, G. S. Buller, J. M. Smith, A. M. Wallace, and S. Cova, “Laser-based distance measurement using picosecond resolution time-correlated single-photon counting,” Meas. Sci. Technol. 11, 712–716 (2000). [CrossRef]

11. A. Kirmani, D. Venkatraman, D. Shin, A. Colaço, F. N. C. Wong, J. H. Shapiro, and V. K. Goyal, “First-photon imaging,” Science 343, 58–61 (2014). [CrossRef]

12. D. Shin, A. Kirmani, V. K. Goyal, and J. H. Shapiro, “Computational 3D and reflectivity imaging with high photon efficiency,” in Proceedings of IEEE International Conference on Image Processing, (IEEE, New York, 2014), pp. 46–50.

13. D. Shin, A. Kirmani, V. K. Goyal, and J. H. Shapiro, “Photon-efficient computational 3D and reflectivity imaging with single-photon detectors,” IEEE Trans. Computational Imaging 1, 112–125 (2015). [CrossRef]

14. C. Niclass, A. Rochas, P. A. Besse, and E. Charbon, “Design and characterization of a CMOS 3-D image sensor based on single photon avalanche diodes,” IEEE J. Solid-State Circuits 40, 1847–1854 (2005). [CrossRef]

15. C. Niclass, M. Soga, H. Matsubara, S. Kato, and M. Kagami, “A 100-m range 10-frame/s 340 96-pixel time-of-flight depth sensor in 0.18-CMOS,” IEEE J. Solid-State Circuits 48, 559–572 (2013). [CrossRef]

16. S. Bellisai, D. Bronzi, F. Villa, S. Tisa, A. Tosi, and F. Zappa, “Single-photon pulsed-light indirect time-of-flight 3D ranging,” Opt. Express 21, 5086–5098 (2013). [CrossRef] [PubMed]

17. Y. Altmann, X. Ren, A. McCarthy, G. S. Buller, and S. McLaughlin, “Lidar waveform based analysis of depth images constructed using sparse single-photon data,” arXiv:1507.02511 [stat:AP] (2015).

18. G. Gariepy, N. Krstajić, R. Henderson, C. Li, R. R. Thomson, G. S. Buller, B. Heshmat, R. Raskar, J. Leach, and D. Faccio, “Single-photon sensitive light-in-flight imaging,” Nat. Commun. 6, 7021 (2015).

19. J. Castorena, C. D. Creusere, and D. Voelz, “Using finite moment rate of innovation for lidar waveform complexity estimation,” in Conference Record of the 44th Asilomar Conference on Signals, Systems and Computers, (IEEE, New York, 2010), pp. 608–612.

20. J. Castorena and C. D. Creusere, “Compressive sampling of LIDAR: Full-waveforms as signals of finite rate of innovation,” in Proceedings of the 20th European Signal Processing Conference, (IEEE, New York, 2012), pp. 984–988.

21. A. Kirmani, A. Colaço, F. N. C. Wong, and V. K. Goyal, “Exploiting sparsity in time-of-flight range acquisition using a single time-resolved sensor,” Opt. Express 19, 21485–21507 (2011). [CrossRef] [PubMed]

22. A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” J. Roy. Statist. Soc., Ser. B 39, 1–38 (1977).

23. A. Wallace, J. Ye, N. Krichel, A. McCarthy, R. J. Collins, and G. S. Buller, “Full waveform analysis for long-range 3D imaging laser radar,” EURASIP J. Adv. Signal Process. 2010, 896708 (2010). [CrossRef]

24. S. Hernandez-Marin, A. M. Wallace, and G. J. Gibson, “Creating multi-layered 3D images using reversible jump MCMC algorithms,” in Advances in Visual Computing, G. Bebis, R. Boyle, B. Parvin, D. Koracin, P. Remagnino, A. Nefian, G. Meenakshisundaram, V. Pascucci, J. Zara, J. Molineros, H. Theisel, and T. Malzbender., eds. (Springer, Berlin, 2006), pp. 405–416.

25. I. Daubechies, M. Defrise, and C. De Mol, “An iterative thresholding algorithm for linear inverse problems with a sparsity constraint,” Commun. Pure Appl. Math. 57, 1413–1457 (2004). [CrossRef]

26. A. Beck and M. Teboulle, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems,” SIAM J. Imag. Sci. 2, 183–202 (2009). [CrossRef]

27. D. Snyder, Random Point Processes (Wiley, 1975).

28. B. K. Natarajan, “Sparse approximate solutions to linear systems,” SIAM J. Comput. 24, 227–234 (1995). [CrossRef]

29. M. Elad, Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing (Springer, 2010). [CrossRef]

30. S. Fuchs, M. Suppa, and O. Hellwich, “Compensation for multipath in ToF camera measurements supported by photometric calibration and environment integration,” in Computer Vision Systems, M. Chen, B. Leibe, and B. Neumann, eds. (Springer, 2013), pp. 31–41. [CrossRef]

31. A. Bhandari, A. Kadambi, R. Whyte, C. Barsi, M. Feigin, A. Dorrington, and R. Raskar, “Resolving multipath interference in time-of-flight imaging via modulation frequency diversity and sparse regularization,” Opt. Lett. 39, 1705–1708 (2014). [CrossRef] [PubMed]

32. “GitHub repository for multi-depth single-photon imaging,” https://github.com/photon-efficient-imaging/full-waveform/.

33. D. Lefloch, R. Nair, F. Lenzen, H. Schäfer, L. Streeter, M. Cree, R. Koch, and A. Kolb, “Technical foundation and calibration methods for time-of-flight cameras,” in Time-of-Flight and Depth Imaging. Sensors, Algorithms, and Applications, M. Grzegorzek, C. Theobalt, R. Koch, and A. Kolb, eds. (Springer, 2013), pp. 3–24. [CrossRef]

34. G. H. Golub, M. Heath, and G. Wahba, “Generalized cross-validation as a method for choosing a good ridge parameter,” Technometrics 21, 215–223 (1979). [CrossRef]

Input: y,S,b,τ, ${\hat{x}}^{(0)}$ ,δ
Output: ${\hat{x}}^{(k)}$
Initialize k ← 0;
repeat
k←k+1;	▷ iteration update
${\hat{x}}^{(k)} \leftarrow {\hat{x}}^{(k - 1)} - S^{T} [1_{m} - div (y, S {\hat{x}}^{(k - 1)} + b 1_{m})];$	▷ gradient descent
${\hat{x}}^{(k)} \leftarrow \max ({\hat{x}}^{(k)} - τ, 0);$	▷ shrinkage-rectification
until ${‖ {\hat{x}}^{(k - 1)} - {\hat{x}}^{(k)} ‖}_{2}^{2} < δ$	▷ convergence criterion

Input: y,S,b,τ, ${\hat{x}}^{(0)}$ ,δ
Output: ${\hat{x}}^{(k)}$
Initialize k ← 0;
repeat
k←k+1;	▷ iteration update
${\hat{x}}^{(k)} \leftarrow {\hat{x}}^{(k - 1)} - S^{T} [1_{m} - div (y, S {\hat{x}}^{(k - 1)} + b 1_{m})];$	▷ gradient descent
${\hat{x}}^{(k)} \leftarrow \max ({\hat{x}}^{(k)} - τ, 0);$	▷ shrinkage-rectification
until ${‖ {\hat{x}}^{(k - 1)} - {\hat{x}}^{(k)} ‖}_{2}^{2} < δ$	▷ convergence criterion

Computational multi-depth single-photon imaging

Abstract

1. Introduction

2. Imaging framework

2.1. Imaging setup

2.2. Observation model at a single pixel

2.3. Observation likelihood expressions

2.4. Characteristics of the impulse response functions of natural scenes

3. Novel image reconstruction algorithm

4. Results

4.1. Simulations

4.1.1. Performance of two-path recovery

4.1.2. Resolvability

4.2. Experimental results

4.2.1. Imaging through a partially-reflective object

4.2.2. Imaging a partially-occluding object

5. Conclusion and future work

Acknowledgments

References and links

Cited By

Figures (10)

Tables (2)

Equations (16)

Optics Express