3D target detection and spectral classification for single-photon LiDAR data

Open Access

Abstract

3D single-photon LiDAR imaging has an important role in many applications. However, full deployment of this modality will require the analysis of low signal-to-noise ratio target returns and very high volumes of data. This is particularly evident when imaging through obscurants or in high ambient background light conditions. This paper proposes a multiscale approach for 3D surface detection from the photon timing histogram to permit a significant reduction in data volume. The resulting surfaces are background-free and can be used to infer depth and reflectivity information about the target. We demonstrate this by proposing a hierarchical Bayesian model for 3D reconstruction and spectral classification of multispectral single-photon LiDAR data. The reconstruction method promotes spatial correlation between point-cloud estimates and uses a coordinate gradient descent algorithm for parameter estimation. Results on simulated and real data show the benefits of the proposed target detection and reconstruction approaches when compared to state-of-the-art processing algorithms.

Published by Optica Publishing Group under the terms of the Creative Commons Attribution 4.0 License. Further distribution of this work must maintain attribution to the author(s) and the published article's title, journal citation, and DOI.

1. Introduction

Single-photon (SP) light detection and ranging (LiDAR), used with the time-correlated single-photon counting technique, is receiving significant interest as an emerging approach in numerous applications such as automotive LiDAR [1–3], remote sensing for environmental sciences [4,5], and imaging at long range [6–11], through obscurants [12–14], or with multiple wavelengths [15–17]. An SP-LiDAR system operates by illuminating the scene with laser pulses and recording the arrival times of the reflected photons with respect to their emission times using an SP detector, typically a single-photon avalanche diode (SPAD) detector. A histogram of photon counts with respect to time-of-flight is constructed for each individual SPAD pixel. By using a focal plane array composed of multiple SPAD detectors, it is possible to image the entire optical field simultaneously. This process can be repeated using different laser wavelengths to obtain additional multispectral information on the observed scene. The resulting timing histograms enable object detection [18,19] as well as estimation of the target’s depth and reflectivity profiles [17,20].

However, SP-LiDAR faces several challenges that limit further deployment in real-world applications. Large data volume constitutes a major challenge for SP-LiDAR imaging due to the acquisition of millions of events per second, typically stored in large histogram cubes. This challenge is more evident when using multiple wavelengths to observe the scene and when imaging through scattering media, where the target return signal is attenuated and the background noise is significantly increased due to optical backscatter, e.g., when imaging through atmospheric obscurants or underwater [12–14,17]. The use of LiDAR in atmospheric obscurants means that the backscattered return depends both on the distance of the scattering event from the transceiver and on the optical configuration used, typically resulting in a non-uniform background noise level in the timing histogram [21]. A similar non-uniformity is also observed as a result of the pile-up effect [22,23] caused by high counting rates, where the probability of a second event within a given timing period is too high, leading to a higher background level near the beginning of the timing period. Another limitation is the detection of multiple surfaces per-pixel, which usually occurs when imaging through semi-transparent materials (e.g., windows, camouflage) or in long-range depth profiling, as shown in [24].

Several approaches have been proposed to deal with these challenges. To reduce data volume, the authors of [18,19] formulated a surface detection problem using a Bayesian framework to determine whether a pixel contains photons related to a surface or only spurious background counts due to ambient light or photons scattered by the observation environment (air, water, etc.). The approach in [20] pushed the problem further by also determining the material of a detected surface using multispectral imaging. However, these approaches assume a single peak per pixel and operate on each pixel independently (pixel-wise), hence not exploiting spatial correlation between pixels to improve the detection. Multiple solutions have also been proposed to improve the robustness of the processing in low-light or high-background scenarios [25–28]. Altmann et al. [25] introduced a Bayesian approach to regularise the data fidelity term, while Shin et al. [26], Halimi et al. [28], and Rapp and Goyal [27] proposed optimisation approaches as a means to regularise the maximum likelihood solution. However, none of these approaches is adapted to scenarios where the one-surface-per-pixel assumption is not valid. In that regard, many papers have been dedicated to tackling the multiple-return problem in the context of 3D LiDAR imaging. Bayesian approaches [24,29] recovered the 3D profile of a target with multiple returns per-pixel. These approaches used reversible jump Markov chain Monte Carlo (RJ-MCMC) as the inference method to sample from the posterior distribution of interest and promote spatial correlation between points belonging to the same surface, showing good performance. However, RJ-MCMC incurs a large computational cost, which hinders its use for real-time imaging. 3D reconstruction from a point-cloud point of view has also shown promising results in [30,31] for handling an unknown number of objects per pixel, but these approaches do not provide uncertainty about the estimates. This shows the need for a fast and robust pre-processing algorithm to reduce data volume and extract only the useful information, which can then be used to perform higher-level processing with quantified uncertainty.

This paper proposes a saliency-based pre-processing algorithm to perform 3D target detection while accounting for multiscale information and the possible presence of multiple peaks per pixel. The resulting 3D map can be used to filter the raw data so that only useful regions are considered, hence reducing data volume and eliminating background counts. The resulting clean data can be used to perform higher-level inference as a post-processing step (see Fig. 1). We propose a hierarchical Bayesian model and a corresponding estimation algorithm for spectral classification and 3D reconstruction of the detected surfaces. Comparisons with state-of-the-art algorithms show that the proposed framework is robust and provides good results even with a high background level and in a low photon-return regime.


Fig. 1. Schematic representation of the main steps of the proposed framework. The top path represents the pre-processing, i.e., 3D target detection strategy. The bottom path shows the post-processing, i.e., 3D reconstruction and spectral classification strategy.


To summarize, the main contributions of the paper are:

  • A new computationally efficient 3D target detection algorithm that uses multiscale information to locate an unknown number of surfaces per-pixel when the background noise can be non-uniform.
  • A new hierarchical Bayesian algorithm that performs 3D spectral classification and scene reconstruction while promoting spatial correlation between neighbouring surfaces in the 3D point cloud.

The paper is structured as follows. The problem formulation and the proposed 3D target detection algorithm are presented in Section 2. The proposed Bayesian model and estimation algorithm for 3D spectral classification and scene reconstruction are presented in Section 3. Results on simulated and real data are analysed in Section 4. Conclusions and future work are finally reported in Section 5.

2. Pre-processing: 3D target detection

2.1 Observation model

We consider 3D histogram cubes $\boldsymbol {Y}$ of LiDAR photon counts of dimension $N \times L \times T$, where $N$ is the number of scanned spatial positions (i.e., $N = N_{\textrm {row}} \times N_{\textrm {col}}$ pixels), $L$ is the number of spectral wavelengths and $T$ is the number of time bins. Akin to [27,29,32], each photon count $y_{n,l,t}$, where ${n \in \{1,\ldots,N\}}$, $l \in \{1,\ldots,L\}$ and $t \in \{1,\ldots,T\}$, is assumed to follow a Poisson distribution as follows:

$${y}_{n,l,t} | {r}_{n,1:K_s,l}, d_{n,1:K_s}, b_{n,t,l} \sim \mathcal{P}\left\lbrace \sum_{c=1}^{K_s}{\left[ {r}_{n,c,l}\,g_{l}(t-d_{n,c}) \right]} +b_{n,t,l} \right\rbrace,$$
where $\mathcal {P}(\cdot)$ denotes a Poisson distribution, $K_s$ represents the maximum number of peaks per pixel (a pixel might have fewer than $K_s$ peaks), $r_{n,c,l} \geq 0$ is the spectral signature observed at the $l$th wavelength, $d_{n,c} \in \{1,2,\ldots,T\}$ represents the position of an object surface at a given range from the sensor, $b_{n,t,l} \geq 0$ is the background level associated with dark counts and ambient illumination, which can be non-uniform as it depends on $t$, and $g_{l}$, with $\sum _{t=1}^Tg_{l}(t)=1$, is the normalized system impulse response function (IRF), whose shape can differ between wavelength channels (assumed known from a calibration step). To deal with photon-sparse regimes where pixels can have few or no detected photons, a multiscale representation was considered in several recent works by collecting photons from neighbouring pixels [16,27]. This is equivalent to filtering the histogram cubes with a uniform kernel as in the multiscale approximation model [17]. In this paper, we denote the filtered histograms by $\boldsymbol {Y}^{q}$ when filtered with the $q$th uniform kernel (e.g., of size $3 \times 3, 5 \times 5,$ etc.).
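As an illustration of this observation model and of the multiscale filtering, the following minimal Python sketch simulates a histogram cube according to Eq. (1) and filters it with uniform spatial kernels; the Gaussian-shaped IRF, the scene parameters, and all variable names are assumptions made for illustration only, not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import uniform_filter

# Illustrative sketch of Eq. (1): each count follows a Poisson distribution whose rate
# sums K_s surface returns and a background term. All names and values are assumptions.
rng = np.random.default_rng(0)
N_row, N_col, L, T, K_s = 32, 32, 4, 128, 2

idx = np.arange(T)
g = np.exp(-0.5 * (np.minimum(idx, T - idx) / 2.0) ** 2)   # circular Gaussian IRF centred at bin 0
g = np.tile(g / g.sum(), (L, 1))                           # g_l, normalised per wavelength

r = rng.gamma(2.0, 1.5, size=(N_row, N_col, K_s, L))       # reflectivities r_{n,c,l}
d = rng.integers(20, T - 20, size=(N_row, N_col, K_s))     # depths d_{n,c}
b = 0.05 * np.ones((N_row, N_col, T, L))                   # background b_{n,t,l}

rate = b.copy()
for c in range(K_s):
    for l in range(L):
        # circularly shift the IRF to the surface depth and scale it by the reflectivity
        shifted = np.take(g[l], (idx[None, None, :] - d[..., c, None]) % T)
        rate[..., l] += r[..., c, l, None] * shifted
Y = rng.poisson(rate)                                      # histogram cube of counts

# Multiscale representation: spatial filtering with uniform kernels (here 1x1, 3x3, 5x5)
scales = [1, 3, 5]
Y_multiscale = [uniform_filter(Y.astype(float), size=(q, q, 1, 1)) for q in scales]
```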

2.2 Saliency-based target detection

In this work, we extend the saliency framework proposed in [33,34] to high-dimensional data such as LiDAR data. This method inspired several state-of-the-art algorithms dealing with RGB [35] or RGB-D data [36]. The approach can efficiently highlight the salient regions of a scene while preserving object boundaries and discarding high-frequency components due to noise, texture, or blocking artifacts. In the context of SP LiDAR, we propose to compute the saliency matrix $\boldsymbol {S}$ of size $N \times T$ from the observed cube of histograms, as follows:

$$\boldsymbol{S}=\left\|\sum_{q=1}^Q \sum_{l=1}^L \lambda_q\left(\boldsymbol{Y}_l^q \underset{t}{\circledast} \boldsymbol{g}_l\right)-\hat{\boldsymbol{B}}_l\right\|_1$$
where $\left \Vert. \right \Vert _{1}$ is the L1 norm, ${{\boldsymbol {Y}}}^q$ is the $q$th filtered histogram cube of counts of size $N \times L \times T$, $ {\circledast }_{t}$ is the convolution operator along $t$, ${\boldsymbol {g}_l}$ is the impulse response function of the $l$th wavelength, and $\hat {\boldsymbol {B}}_{l}$ is the background noise matrix of size $N \times T$ (estimated in Section 2.3). The vector $\boldsymbol {\lambda } = [\lambda _1,\ldots, \lambda _Q]$, whose entries sum up to one ($\sum ^Q_{q=1} \lambda _q =1$), represents the contribution of each scale to the saliency of each voxel of the cube. Note that Eq. (2) is fast to compute as it only involves convolution and summation operators.
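A possible implementation of Eq. (2) is sketched below, reusing the cube Y_multiscale and the IRF g from the previous sketch; the background estimate is set to a zero placeholder here and replaced by the estimate of Section 2.3, and the reading of the L1 norm as an accumulation over wavelengths is an assumption.

```python
import numpy as np
from scipy.ndimage import convolve1d

def saliency_matrix(Y_multiscale, g, B_hat, lam):
    """Sketch of Eq. (2): return an (N_row, N_col, T) saliency map from Q filtered cubes."""
    S = np.zeros(Y_multiscale[0].shape[:3])
    for l in range(g.shape[0]):
        acc = np.zeros_like(S)
        for q, Yq in enumerate(Y_multiscale):
            # temporal matched filtering with the l-th wavelength IRF, weighted by lambda_q
            acc += lam[q] * convolve1d(Yq[..., l], g[l], axis=2, mode='wrap')
        S += np.abs(acc - B_hat[..., l])          # L1 accumulation over wavelengths
    return S

lam = np.ones(len(Y_multiscale)) / len(Y_multiscale)   # scale weights summing to one
B_hat = np.zeros_like(Y_multiscale[0])                 # placeholder; see Section 2.3
S = saliency_matrix(Y_multiscale, g, B_hat, lam)
```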

The saliency matrix in (2) contains voxels of background and others originating from target peaks. To detect salient regions, we assume the saliency values of background voxels to follow a Gamma distribution $\mathcal {G}$, and signal voxels to take larger values, as follows

$$\begin{array}{lll} s_{n,t} \sim & \mathcal{G} (\alpha_b, \beta_b), & \textrm{if } (n,t) \textrm{ is a background voxel},\\ {}& \textrm{otherwise } & (n,t) \textrm{ contains a surface }. \end{array}$$

A simple hypothesis test then allows us to separate signal from background voxels. This is obtained by first fitting a gamma distribution to ${\boldsymbol {S}}$, followed by thresholding the saliency matrix, i.e., $S > \textrm {Thresh}(\alpha _b, \beta _b)$, where $\textrm {Thresh}(\alpha _b, \beta _b)$ depends on the gamma parameters and can be fixed based on the desired probability of false alarm. This results in a binary 3D map, denoted ${\boldsymbol {M}}$, of size $N \times T$, separating regions of target returns (i.e., white voxels in Fig. 1) from those due to background (i.e., black voxels in Fig. 1). The saliency-based target detection algorithm is summarized in Algo. 1.
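A hedged sketch of this hypothesis test is given below: a gamma distribution is fitted to the saliency values and the threshold is set from the quantile corresponding to a desired probability of false alarm (the value of p_fa is an illustrative assumption).

```python
import numpy as np
from scipy import stats

def detect_targets(S, p_fa=1e-3):
    # fit a gamma distribution to the (strictly positive) saliency values
    alpha_b, loc, beta_b = stats.gamma.fit(S[S > 0], floc=0)
    # threshold at the quantile set by the desired probability of false alarm
    thresh = stats.gamma.ppf(1.0 - p_fa, alpha_b, loc=loc, scale=beta_b)
    return S > thresh                              # binary 3D detection map M

M = detect_targets(S, p_fa=1e-3)
print("detected voxels:", int(M.sum()))
```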


Algorithm 1. Saliency-based target detection algorithm.

2.3 Background estimation

Our approach requires an estimate of the background level, as indicated in (2). The presence of obscurants causes the background level $b_{n,l,t}$ to be non-uniformly distributed w.r.t. pixels, wavelengths, and/or time bins (depth dimension). Our background estimation strategy assumes $b_{n,l,t}$ to be smooth, which is the case if the obscurant is spatially homogeneous. This implies that the low-pass filtered histogram $y^{Q}_{n,t,l}$, where $Q$ is the coarsest scale of the histogram, can be represented by the sum of sparse signal returns (related to the reflected target) and a 3D smooth background $\hat {b}_{n,l,t}$. Akin to [17], an efficient estimate can be obtained by assuming the same temporal shape of the background for all pixels, as follows

$${\hat{b}}_{n,l,t} = \max \left( \underline{\textit{b}}_{n,l} + \bar{b}_{l,t} -\bar{\bar{b}}_{l}, 0\right) ,$$
where ${\bar {b}}_{l,t}, \underline {\textit {b}}_{n,l}$ represent the temporal and spatial shapes of the background, respectively, and $\bar {\bar {b}}_{l} = \sum _t \bar {b}_{l,t}/T$. The temporal shape can be approximated as follows
$${\bar{b}}_{l,t} = median \left( y^{Q}_{\Pi_n,l,t} \right) ,$$
where $\Pi _n$ represents the indices of the lowest 10% values of $y^{Q}_{n,l,t}$, used to only consider background and reject signal returns, and median denotes the median operator. For a given time bin, this strategy assumes that at least 10% of pixels contain only background without a target, which is often satisfied except when observing a perfectly lateral plane having the same depth value for all pixels. The spatial shape of the background can also be estimated as the median value over time bins, as follows:
$$\underline{\textit{b}}_{n,l} = median \left( y^{Q}_{n,l,:} \right).$$
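The background estimation of Eqs. (4)-(6) could be sketched as follows, under the simplifying assumption that the 10% of pixels with the lowest total counts are background-only; the cube shape and variable names are carried over from the earlier sketches.

```python
import numpy as np

def estimate_background(Y_Q, frac=0.10):
    """Sketch of Eqs. (4)-(6); Y_Q is the coarsest-scale cube of shape (N_row, N_col, T, L)."""
    N_row, N_col, T, L = Y_Q.shape
    Y_flat = Y_Q.reshape(-1, T, L)                       # pixels x bins x wavelengths
    n_low = max(1, int(frac * Y_flat.shape[0]))
    low_idx = np.argsort(Y_flat.sum(axis=(1, 2)))[:n_low]
    b_temporal = np.median(Y_flat[low_idx], axis=0)      # temporal shape, (T, L)
    b_spatial = np.median(Y_Q, axis=2)                   # spatial shape, (N_row, N_col, L)
    b_mean = b_temporal.mean(axis=0)                     # per-wavelength mean of the temporal shape
    # combine the two shapes as in Eq. (4) and clip negative values
    b_hat = b_spatial[:, :, None, :] + b_temporal[None, None, :, :] - b_mean
    return np.maximum(b_hat, 0.0)                        # (N_row, N_col, T, L)

B_hat = estimate_background(Y_multiscale[-1])            # replaces the earlier zero placeholder
```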

3. Post-processing: Bayesian 3D reconstruction and spectral classification

Given the estimated 3D target map and background level, one can isolate the target peaks and approximate their clean histograms of counts by removing the estimated background returns (see Section 3.3). As a result, we obtain $N_s = N \times K_s$ small histograms called surfaces, where $K_s$ represents the maximum number of surfaces per pixel, i.e., some pixels might have fewer surfaces. These small histograms can then be used to perform higher-level processing such as object recognition, classification, etc. In this paper, we focus on 3D reconstruction and spectral classification of the observed objects. More precisely, we assume the availability of a spectral library containing $K$ objects (defined by their $K$ spectral signatures ${\boldsymbol {m}}_k,$ for $k \in 1,\ldots, K$) and aim to spatially localise the objects contained in the scene. In this case, the observation model becomes

$${y}_{s,l,t} | {r}_{s,l}, d_{s} \sim \mathcal{P}[h_s {r}_{s,l}\,g_{l}(t-d_{s}) ],$$
where $r_{s,l}$ represents the spectral signature of the $s$th surface, and $h_s$ is a positive coefficient allowing the model to account for different illumination conditions. Indeed, an object can be observed under different illumination conditions (e.g., shadowed regions), leading to the same spectral shape but different amplitude levels. It is worth noting that the model in (7) uses surface indices $s$ instead of pixels: in comparison to (1), a surface is associated with the $c$th peak of the $n$th pixel. Note also that (7) assumes the absence of background, as it was removed in the pre-processing step. The joint likelihood, when assuming the observed surfaces, wavelengths, and bins to be mutually independent, is then given by:
$${\mathit p({\boldsymbol{Y}} | \boldsymbol{R}, {\boldsymbol{h}}, {\boldsymbol{d}}) = \prod_{s=1}^{N_s} \prod_{l=1}^{L} \prod_{t=1}^{T} p({y_{s,l,t}} | h_{s}, r_{s,l}, d_{s}) }$$
where ${\boldsymbol {h}} = \left (h_1,\ldots, h_{N_s} \right )$, ${\boldsymbol {d}} = \left (d_1,\ldots, d_{N_s} \right )$ and $\boldsymbol {R}$ is a matrix gathering $r_{s,l}, \forall s,l$. The post-processing problem addressed in this section consists of performing a spectral segmentation and 3D reconstruction of a target by determining the spatial class label, the spectral response, and the depth of each surface. This is an ill-posed inverse problem, which is regularized using a Bayesian framework combining the data statistics in (8) with available knowledge about the unknown parameters, introduced through prior distributions as detailed below.

3.1 Prior distributions

Prior of ${\boldsymbol {R}}$: Akin to [18,20,37], the classification problem intends to assign to each pixel a label associated with one of the $K$ known spectral classes. The reflectivity conjugate prior accounts for this by considering a mixture of $K$ gamma distributions as follows:

$$\mathit P(r_{s,l}|u_s,\alpha_{k,l}^{r},\beta_{k,l}^{r},K) = \sum_{k=1}^{K} \delta (u_s-k) \mathcal{G}(r_{s,l};\alpha_{k,l}^{r},\beta_{k,l}^{r})$$
where $u_s \in \{1,\ldots,K\}$ is a latent variable that indicates the class label, $\delta (.)$ is the Dirac delta distribution centred at 0, and $\mathcal {G}(r_{s,l};\alpha _{k,l}^{r},\beta _{k,l}^{r})$ represents a gamma density with shape and scale hyperparameters $\left (\alpha _{k,l}^{r}, \beta _{k,l}^{r}\right )$. The shape hyperparameter is fixed, while the scale parameter is estimated to account for spectral variability of the signatures between pixels. Both parameters encourage the mean of the estimated signatures to remain close to the $K$ known spectral signatures by promoting a small variance. Unlike [18,20,37], where the absence of a target is embodied in the prior distribution, the prior (9) assumes the presence of a target as a result of the saliency-based object detection in Section 2.

Prior of ${\boldsymbol {H}}$: The reflectivity modulation parameter $h_s$ should be positive and is assumed to vary smoothly from one pixel to its neighbours. These properties are promoted by assigning a hidden gamma Markov random field (GMRF) prior [38] to $\textbf {h}$, which also ensures the tractability of the resulting posterior distribution [13,25]. More precisely, we introduce an ${N}_\textrm {row} \times N_\textrm {col} \times K_s$ auxiliary tensor with elements $w_s \in \mathbb {R}^+$ and define a 3D graph between $\textbf {H}$ and ${\boldsymbol {W}}$ such that each $h_s$ is associated with $4K_s$ elements of ${\boldsymbol {W}}$ and vice-versa (see Fig. 2(a)). The resulting joint distribution is computationally appealing as the conditional prior distributions of $h$ and $w$ reduce to conjugate inverse gamma ($\mathcal {IG}$) and gamma ($\mathcal {G}$) distributions as follows:

$$\mathit h_{s} | W,\boldsymbol{\rho} \sim \mathcal{G}(\theta_1^h,(\theta_2^h)^{{-}1}),\;\; \textrm{and } \;\; \mathit w_{s} | H,\boldsymbol{\rho} \sim \mathcal{IG}(\theta_1^{w},\theta_2^{w})$$
where $\theta _1^h = \sum ^{4K_s}_{s'=1} c_{s,s'} \rho _{s',s}$, $\theta _2^h = \sum ^{4K_s}_{s'=1} \frac {c_{s,s'} \rho _{s',s}}{w_{s'}}$, $\theta _1^{w} = \sum ^{4K_s}_{s'=1} c_{s,s'} \rho _{s',s}$, $\theta _2^{w} = \sum ^{4K_s}_{s'=1} c_{s,s'} \rho _{s',s} h_{s'}$, $\rho _{s,s'}$ is a coupling parameter that controls the level of spatial smoothness imposed by the GMRF, and $c_{s,s'}$ is a binary variable that indicates the existence or absence of a surface, with $c_{s,:} = 0$ if the $s$th surface is not detected (see Fig. 2(a)).
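To make the graph structure behind (10) concrete, the sketch below computes the conditional parameters for one surface from an assumed per-surface neighbour list; the containers neighbours, w, h, rho, and c are hypothetical and only illustrate the bookkeeping.

```python
# Minimal sketch of the GMRF conditional parameters in Eq. (10); neighbours[s] is assumed
# to hold the indices of the 4*K_s auxiliary variables linked to surface s.
def gmrf_params_h(s, neighbours, w, rho, c):
    """Return (theta_1^h, theta_2^h) for the conditional gamma prior of h_s."""
    theta1 = sum(c[s, sp] * rho[sp, s] for sp in neighbours[s])
    theta2 = sum(c[s, sp] * rho[sp, s] / w[sp] for sp in neighbours[s])
    return theta1, theta2

def gmrf_params_w(s, neighbours, h, rho, c):
    """Return (theta_1^w, theta_2^w) for the conditional inverse-gamma prior of w_s."""
    theta1 = sum(c[s, sp] * rho[sp, s] for sp in neighbours[s])
    theta2 = sum(c[s, sp] * rho[sp, s] * h[sp] for sp in neighbours[s])
    return theta1, theta2
```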


Fig. 2. Connection graphs of the proposed Bayesian model and GMRF prior.


Prior of ${\boldsymbol {u}}$: A Potts prior is assigned to the class variable ${\boldsymbol {u}}$ to promote spatial correlation between class labels. The prior of ${\boldsymbol {u}}$ is then obtained using the Hammersley-Clifford theorem:

$$f\left({\boldsymbol{u}}\right) = \frac{1}{G(\gamma)} \exp\left[\gamma \sum_{s=1}^{N_s}{\sum_{s' \in \eta(s) } \delta \left(u_{s}-u_{s'} \right)} \right]$$
where $\gamma >0$ is the granularity coefficient, $G(\gamma )$ is a normalizing (or partition) constant and $\eta (s)$ denotes the $4 K_s$ neighbours of the $s$th surface.

Prior of ${\boldsymbol {d}}$: In the absence of additional knowledge, the depth parameter $d_s$ is assigned a non-informative uniform prior as follows:

$$\mathit p({d}_{s} = t ) = \frac{1}{T} \quad , \forall{t} \in \{1,\ldots,T\}.$$

Nonetheless, the depth prior can be adjusted in the event of additional information regarding the position of a target.

Prior of $\boldsymbol {\beta }$: The hyperparameter $\beta _{k,l}$ controls the mean and variance of the unknown spectral signatures ${\boldsymbol {R}}$ for each class $k$. The $\beta _{k,l}$ hyperparameter is assigned an inverse-gamma distribution

$$\mathit \beta_{k,l} \sim \mathcal{IG}(\upsilon_{k,l},\epsilon_{k,l})$$
where $\upsilon _{k,l}$ and $\epsilon _{k,l}$ are fixed values related to the mean and the variance of the hyperparameter $\beta _{k,l}$.

3.2 Joint posterior distribution

From the joint likelihood in (8) and the prior distributions considered in Section 3.1, we can build the joint posterior distribution of $\boldsymbol {\Theta }=\left (\boldsymbol {r},\boldsymbol {B},{\boldsymbol {d}}, {\boldsymbol {u}}, {\boldsymbol {h}}, {\boldsymbol {w}} \right )$ given the 3D histograms $\boldsymbol {Y_s}$ and the hyperparameters $\boldsymbol {\Theta _{H}}=\left (K, \boldsymbol {\Upsilon }, \boldsymbol {A}, \rho, \boldsymbol {C} \right )$, where $\boldsymbol {A}$ and $\boldsymbol {\Upsilon }$ gather $\{\alpha \}_{k,l}$ and $\{\upsilon, \epsilon \}_{k,l}$, respectively. Using Bayes' rule, the joint posterior distribution can be formulated as follows:

$${\mathit p(\boldsymbol{\Theta}|\boldsymbol{Y}, \boldsymbol{\Theta_{H}}) \propto p(\boldsymbol{Y}|\boldsymbol{\Theta}) p(\boldsymbol{B}|\boldsymbol{\Upsilon},K)p({\boldsymbol{H}},{\boldsymbol{W}}|\boldsymbol{\rho})p(\boldsymbol{r}|\boldsymbol{A},\boldsymbol{B},K,{\boldsymbol{u}}) p({\boldsymbol{u}}) p({\boldsymbol{d}}). \;\; }$$

The directed acyclic graph (DAG) of the proposed hierarchical Bayesian model is summarized in Fig. 2(b).

3.3 Graph estimation - multiscale information

This section indicates how to obtain clean surfaces of counts interconnected through a graph. Given the multiscale histograms $\boldsymbol {Y}^{q}_{l}$, the estimated noise $\hat {{\boldsymbol {B}}}_l$, and the 3D binary detection map, one can approximate the background-free signal on the detected surfaces as follows

$${}^{*}y_{s, l, t}^{q}=\max \left(y_{s, l, t}^{q} \underset{t}{\circledast} g_l-\hat{b}_{s, l, t},\, 0\right),$$
where $s$ represents a surface detected with the saliency-based strategy in Section 2.2. This equation highlights the availability of multiscale information, which is often exploited in state-of-the-art algorithms to improve robustness to noise [16,24,27,39–42]. However, filtering the histograms of counts with a low-pass uniform filter reduces spatial resolution due to the attenuation of high-frequency components (edges). Different approaches have been used to select one scale among the available set of scales, such as a median strategy or attention [43,44]. Here, the goal is to select a scale $\bar {q}$ among the $Q$ spatially downsampled versions of the signal histogram of counts ${}^{*}\boldsymbol {Y}^{1:Q}$ for each surface $s$. The criterion applied to select a scale $\bar {q}$ is as follows:
$$\mathit {\bar{q}_s}=argmin_{q_s \in \{2,\ldots,Q\}} \, |d^G_s-d^{q_s}_s|$$
where ${\bar {q}_s}$ is the selected scale for the $s$th detected surface and $d^G_s$ and $d^{q_s}_s$ are given by
$$\mathit {d^G_s}=argmax_{d_s \in \{1,\ldots,T\}} \, \sum^Q_{q_s=1} \sum^L_{l=1} \sum^T_{t=1} {}^{*}\boldsymbol{y}_{s,t,l}^{q_s} g_l(t-d_s)$$
$$\mathit {d^{q_s}_s}=argmax_{d_s \in \{1,\ldots,T\}} \, \sum^L_{l=1} \sum^T_{t=1} {}^{*}\boldsymbol{y}_{s,t,l}^{q_s} g_l(t-d_s).$$
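A simplified sketch of this scale-selection step, combining Eq. (15) with the criterion (16)-(18), is given below; the list surface_pixels of detected (row, column) coordinates is an assumed input, and the inner correlation with the IRF in (17)-(18) is approximated by a single matched filter per scale.

```python
import numpy as np
from scipy.ndimage import convolve1d

def select_scales(Y_multiscale, g, B_hat, surface_pixels):
    """Pick, for each detected surface, the scale whose depth agrees best with all scales."""
    L, T = g.shape
    Q = len(Y_multiscale)
    chosen = []
    for (i, j) in surface_pixels:
        resp = np.zeros((Q, T))
        for q, Yq in enumerate(Y_multiscale):
            for l in range(L):
                # background-free, IRF-matched response for this pixel and scale (cf. Eq. (15))
                clean = np.maximum(
                    convolve1d(Yq[i, j, :, l], g[l], mode='wrap') - B_hat[i, j, :, l], 0.0)
                resp[q] += clean
        d_global = resp.sum(axis=0).argmax()           # depth pooling all scales, cf. Eq. (17)
        d_per_scale = resp.argmax(axis=1)              # per-scale depths, cf. Eq. (18)
        q_bar = 1 + np.abs(d_per_scale[1:] - d_global).argmin()   # Eq. (16), q in {2,...,Q}
        chosen.append(q_bar)
    return chosen
```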

For readability purposes, the star ${}^{*}$ and the selected scale $\bar {q}_s$ are omitted from the denoised surface histogram ${}^{*}y^{\bar {q}_s}_{s,l,t}$ in (7) and the following equations, so that ${}^{*}y^{\bar {q}_s}_{s,l,t}$ is simply denoted ${y}_{s,l,t}$. Note that the proposed model assumes a maximum of $K_s$ surfaces per-pixel. However, some pixels might have fewer surfaces, in which case we introduce a binary matrix $\boldsymbol{C}$ of size $N_\textrm {row} \times N_\textrm {col} \times K_s$ to indicate whether a surface is detected. If a surface $s$ is not detected, $c_{s,:}$ is set to $0$ and all connections to this surface have a null weight (see Fig. 2(a)).

3.4 Estimation strategy

This paper approximates the maximum a posteriori parameter estimates using a coordinate descent algorithm (CDA). The CDA aims at maximizing the posterior in (14) with respect to the parameters of interest, namely $\boldsymbol {\Theta }=\left (\boldsymbol {r},\boldsymbol {B},{\boldsymbol {d}}, {\boldsymbol {u}}, {\boldsymbol {h}}, {\boldsymbol {w}} \right )$. Due to the large number of parameters of interest and the complexity of the joint distribution, the CDA is a suitable candidate for this estimation, as in [13,45–47]. Indeed, this algorithm iteratively updates the unknown parameters by maximizing their conditional distributions, each with respect to a single parameter while keeping the others fixed, until a convergence criterion is met, as detailed in Algo. 2.


Algorithm 2. Estimation algorithm.

From (14), we can easily demonstrate that the conditional distributions associated with the parameters of interest are

$$\mathit r_{s,l}| \boldsymbol{H}, \boldsymbol{B_{s}}, K, \boldsymbol{A}, \boldsymbol{Y_s} \sim \mathcal{G} \left[ \overline{y}_{s,l}+\alpha_{l,k},\left(h_s+\frac{1}{\beta_{k,l}}\right)^{{-}1} \right]$$
$$\mathit \beta_{s,k,l}| \boldsymbol{r}, K, \boldsymbol{A}, \boldsymbol{\Upsilon}, \boldsymbol{\rho} \sim \mathcal{IG}\left(\alpha_{l,k}+\upsilon_{l,k},r_{s,l}+\epsilon_{l,k} \right)$$
$$\mathit h_{s}| W, \boldsymbol{r}, \boldsymbol{\rho}, \boldsymbol{Y_s} \sim \mathcal{G} \left[\overline{\overline{y}}_{s}+L\left(\theta^h_1-1\right),\left(\sum^L_{l=1} r_{s,l}+L\theta^h_2\right)^{{-}1} \right]$$
$$\mathit w_{s} | \mathcal{C},H,\boldsymbol{\rho} \sim \mathcal{IG}\left(\theta_1^{w},\theta_2^{w} \right)$$
where $\overline {y}_{s,l}= \sum ^T_{t=1} y_{s,l,t}$, $\overline {\overline {y}}_s = \sum ^L_{l=1}\overline {y}_{s,l}$ and the mode of each distribution is uniquely reached and used in the CDA updates, as follows:
$$u_s^{\text {map}}=\underset{k \in\{1, \ldots, K\}}{\operatorname{argmax}} \,p\left(u_s \mid \boldsymbol{H}, W, \boldsymbol{r}, \boldsymbol{B}_{\boldsymbol{s}}, \boldsymbol{A}, \boldsymbol{Y}_{\boldsymbol{s}}, \boldsymbol{\Upsilon}, \boldsymbol{\rho}\right)$$
$$\begin{aligned} \mathit r^{map}_{s,l} = \frac{ \overline{y}_{s,l}+\alpha_{l,k}-1}{h_s+\frac{1}{\beta_{k,l}}}, & \qquad \quad \mathit \beta^{map}_{s,l,k}= \frac{r_{s,l}+\epsilon_{l,k}}{\alpha_{l,k}+\upsilon_{l,k}+1} \end{aligned}$$
$$\begin{aligned} \mathit h^{map}_{s}= \frac{\overline{\overline{y}}_{s}+L(\theta^h_1-1)}{\sum^L_{l=1} r_{s,l}+L\theta^h_2}, & \qquad \quad \mathit w^{map}_{s} =\frac{\theta_2^{w}}{\theta_1^{w}+1}. \end{aligned}$$
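The sketch below illustrates one coordinate-descent sweep using the closed-form modes above, under the assumptions that the class label of each surface is fixed for the sweep, that y_bar holds the time-summed counts per surface and wavelength, and that the class-level hyperparameters and GMRF statistics are precomputed arrays; all names are illustrative.

```python
import numpy as np

def cda_sweep(y_bar, h, k, alpha, beta, upsilon, eps, theta1_h, theta2_h, theta1_w, theta2_w):
    """One sweep of the closed-form updates; y_bar: (N_s, L), k: (N_s,) labels, class arrays: (K, L)."""
    L = y_bar.shape[1]
    r = (y_bar + alpha[k] - 1.0) / (h[:, None] + 1.0 / beta[k])          # r_{s,l}^{map}
    beta_new = (r + eps[k]) / (alpha[k] + upsilon[k] + 1.0)              # beta_{s,l,k}^{map}
    h_new = (y_bar.sum(axis=1) + L * (theta1_h - 1.0)) \
            / (r.sum(axis=1) + L * theta2_h)                             # h_s^{map}
    w_new = theta2_w / (theta1_w + 1.0)                                  # w_s^{map}
    return r, beta_new, h_new, w_new
```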

3.4.1 Stopping criteria

Algorithm 2 is an iterative process that requires the definition of stopping criteria. In this paper, we consider two criteria, either of which terminates the algorithm when satisfied. The first criterion compares the new values of the estimated parameters $\boldsymbol {\Theta }$ to the previous ones and stops the algorithm if the changes are smaller than a user-defined threshold, that is:

$$\mathit {\sqrt{\frac{1}{\textit{N}_s}\sum_{s=1}^{\textit{N}_s}{\Big({\pi_s^{(i+1)} - \pi_s^{(i)}}\Big)^2}}} \leq {\xi_{\pi}}$$
where $\textit {N}_s$ denotes the number of surfaces, $\pi _s^{(i)}$ and $\pi _s^{(i+1)}$ denote the parameter of interest of the $s$th surface at iterations $i$ and $i+1$, respectively, and the threshold ${\xi _{\pi }}$ can be chosen according to factors such as the magnitude of the parameter of interest (fixed to $10^{-1}$ in all our experiments). The second criterion is a maximum number of iterations $I_{max}$.
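A minimal sketch of this stopping rule is given below; the threshold value and the loop skeleton are illustrative.

```python
import numpy as np

def converged(param_new, param_old, xi=1e-1):
    """Root-mean-square change between two iterations against a user-defined threshold."""
    return np.sqrt(np.mean((param_new - param_old) ** 2)) <= xi

# illustrative use inside the CDA loop:
# for i in range(I_max):
#     h_new = cda_sweep(...)[2]        # coordinate updates
#     if converged(h_new, h):
#         break
#     h = h_new
```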

4. Experiments

4.1 Comparison algorithms and evaluation criteria

The proposed algorithm performs joint target detection and 3D scene reconstruction under minimal assumptions, such as allowing multiple points per pixel. To highlight the robustness and benefits of the proposed algorithm, it is compared to several state-of-the-art target detection and reconstruction algorithms, including:

  • The histogram-based TD algorithm (HTD) [18]: a fast target detection algorithm that operates on per-pixel histograms and assumes one surface per-pixel.
  • The event-based TD algorithm (ETD) [19]: a fast target detection algorithm that operates on per-pixel raw ToFs and assumes one surface per-pixel.
  • The multispectral TD algorithm (MSTD) [20]: a fast per-pixel signature-based target detection algorithm that performs range estimation and multispectral classification.
  • The multispectral 3D reconstruction algorithm (MS3D) [17]: assumes a single surface per-pixel and is designed for robust processing of multispectral LiDAR data acquired through obscurants.
  • The RT3D algorithm [31]: assumes multiple surfaces per-pixel and is used when analysing robustness to noise and photon-sparse regimes on single-spectral data.
  • The MuSaPoP algorithm [24]: assumes multiple surfaces per-pixel and is used when analysing multispectral LiDAR data.

Different metrics are considered to evaluate the TD and 3D reconstruction estimates (a sketch of their computation is given after the list):

  • True detection $F_{\textrm {true}}(\tau )$: probability of true detection with respect to the distance $\tau$; a detection is counted as true if there is a point in the ground truth in the same pixel such that $|d_{s}^{\textrm {true}} - d_{s'}^{\textrm {est}}| \leq \tau$.
  • False detection $F_{\textrm {false}}(\tau )$: number of estimated points that cannot be assigned to a ground-truth point within a distance $\tau$.
  • Mean absolute error of intensity at distance $\tau$, denoted IAE: sum of absolute errors across all detected points $S_{est}$ of the intensity estimates $I^{\textrm {est}}_{s,l}$, normalized with respect to the total number of ground-truth points $S_{true}$, i.e., $\frac {1}{S_{true}} \sum ^{S_{est}}_{s=1} \sum ^L_{l=1} | I^{\textrm {est}}_{s,l} - I^{\textrm {true}}_{s,l}|$, where $I_{s,l}=h_{s} r_{s,l}$ and $I^{\textrm {true}}_{s,l}$ is the intensity of the closest ground-truth point to the estimated one. Note that if a point was falsely estimated, or a ground-truth point was not found, it is counted as an error of $\sum ^{L}_{l=1} |I_{s,l}|$.
  • Mean absolute error of depth at distance $\tau$, denoted DAE: sum of absolute errors of the depth estimates across all detected points $S_{est}$, normalized with respect to the total number of true detections $F_{\textrm {true}}(\tau )$, i.e., $\frac {1}{F_{\textrm {true}}(\tau )} \sum ^{S_{est}}_{s=1} | d_{s}^{\textrm {true}} - d_{s'}^{\textrm {est}}|$. The ground-truth and estimated points are paired using the probability of detection $F_{\textrm {true}}(\tau )$.
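The following sketch illustrates how the true detection, false detection, and DAE metrics could be computed, assuming the estimated and ground-truth points are stored as dictionaries mapping a pixel index to a list of depths (an illustrative data structure).

```python
def detection_metrics(est_points, gt_points, tau):
    """Sketch of F_true, F_false and DAE at tolerance tau (same depth units as the inputs)."""
    n_true, n_false, depth_err = 0, 0, 0.0
    for pix, est_depths in est_points.items():
        gt_depths = gt_points.get(pix, [])
        for d_est in est_depths:
            if gt_depths:
                err = min(abs(d_est - d_gt) for d_gt in gt_depths)
                if err <= tau:
                    n_true += 1
                    depth_err += err
                    continue
            n_false += 1                                  # no ground-truth point within tau
    n_gt = sum(len(v) for v in gt_points.values())
    f_true = n_true / max(n_gt, 1)                        # probability of true detection
    dae = depth_err / max(n_true, 1)                      # mean absolute depth error
    return f_true, n_false, dae
```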

4.2 Evaluation on simulated data

In this section, the proposed algorithm is tested on a simulated cluttered Art scene extracted from the Middlebury dataset (available at http://vision.middlebury.edu/stereo/data/). This scene is commonly used as a standard benchmark in many LiDAR imaging experiments [17,27]. To simulate a scene with multiple surfaces per-pixel, two cluttered Art scenes are concatenated in the depth direction. The numbers of pixels and bins are, respectively, $N=185\times 232$ and $T=164$. Two values of average photons per-pixel (PPP) and signal-to-background ratio (SBR) are considered, and the data are corrupted by uniform and non-uniform types of background. The proposed algorithm is used with $Q=4$ ($1 \times 1, 5 \times 5, 7 \times 7, 11 \times 11$), $\rho >1$, $\gamma =4$, and $\lambda _q = 1/Q, \forall q$. Figure 3 shows qualitative results for the 3D target profile (second and third rows) and the 3D spectral reconstruction (fourth and fifth rows) of the Art scene. This figure shows robust detection results in the presence of multiple surfaces per-pixel and for different levels of SBR and PPP. For uniform background, the detection results are good even at low SBR and PPP levels, but the reflectivity appears underestimated in extreme cases. Slightly lower performance is observed for non-uniform background at low PPP levels, where the algorithm yields a few false detections and estimates lower reflectivity values. Figure 4 shows quantitative results for the 3D target detection and spectral reconstruction experiments. The first row depicts the detection performance using true and false detected points, and the reconstruction results using IAE and DAE, w.r.t. distance for four SBR and PPP couples. As expected, these metrics improve when PPP and/or SBR increase. However, we notice that the overall performance is more sensitive to a decrease in PPP than in SBR. The second row shows the same performance w.r.t. PPP and SBR with the distance fixed to $\tau$=4 mm. Again, the performance improves for larger PPP and/or SBR. However, we notice that false detections worsen as the SBR level increases for low PPP levels.


Fig. 3. 3D Target detection and spectral reconstruction of the proposed algorithm on the Art scene. The performance is evaluated for several levels of PPP and SBR under uniform and non-uniform background for two surfaces per-pixel.



Fig. 4. Quantitative results of the proposed TD and reconstruction algorithm on the Art scene. The performance is evaluated for several levels of PPP and SBR, different background shapes and for two surfaces per-pixel.


4.3 Evaluation on real data

This section evaluates our algorithm on different datasets acquired with real LiDAR systems. The target detection, classification, and spectral reconstruction performance are evaluated on scenes with one and multiple surfaces per-pixel. In the following experimental scenarios, $Q=4$ with kernel sizes $1 \times 1, 3 \times 3, 7 \times 7, 9 \times 9$, $\rho >1$, $\gamma =4$, and $\boldsymbol {\lambda } = [0,1,0,0]$.

4.3.1 One surface per-pixel

This subsection considers two real scenes that contain at most one surface per-pixel. The first scene is a life-sized polystyrene head scanned at a stand-off distance of 325 metres at midday. This scene consists of $N=200\times 200$ pixels and $T = 1700$ histogram bins per pixel with a timing bin width of 16 ps, and has an average SBR of 0.25 with a 5th-95th percentile interval of (0.01, 0.6). The acquisition time per-pixel is 1 ms, which corresponds to an average PPP of 30 photons (more details on this scene can be found in [48]). The second scene is a Lego figurine, 42 mm tall and 30 mm wide, scanned at a stand-off distance of 1.8 m using a TCSPC module with an acquisition time per-pixel of 160 ms (40 ms acquisition time per wavelength). The spatial, spectral, and temporal dimensions of the single-photon Lego data are, respectively, $N=200 \times 200$ pixels, $L=4$ wavelengths, and $T = 1500$ bins with a timing bin width of 2 ps. The data were acquired in the presence of ambient illumination, leading to an SBR of $1.3$. The Lego exhibits three classes of interest ($K=3$) whose spectral signatures (related to $\boldsymbol {A}$ and $\boldsymbol {B_s}$) are extracted from pixels acquired with a negligible background contribution and the maximum acquisition time per-pixel. The reader is referred to [49] for more details regarding the Lego scene. Figure 5 compares the estimated 3D target detection maps of the proposed algorithm with the 2D maps of the state-of-the-art TD algorithms. Both ETD and HTD require post-processing with a TV regularization [18,19] to obtain smooth detection maps, while the proposed method provides smooth maps thanks to the use of multiscale information. Note also that ETD and HTD provide 2D maps, while the proposed method detects 3D maps. Table 1 shows the accuracy of the proposed and MSTD algorithms for different acquisition times. The proposed algorithm achieves high accuracy and outperforms the MSTD algorithm post-processed with a TV regularization for acquisition times higher than $0.4$ ms per-pixel.


Fig. 5. Comparison of the proposed TD algorithm with ETD and HTD algorithms. The scene is (top) a mannequin head with 1ms acquisition time per-pixel (PPP $\approx$ 30), (bottom) a Lego figurine with 2ms acquisition time per-pixel (PPP $\approx$ 50).



Table 1. Target detection accuracy performance of the proposed and MSTD algorithms for different acquisition times on the Lego figurine scene.

Figure 6 shows the 3D reconstruction of the mannequin head (first row) and the Lego figurine data (second row) for 1 ms (PPP=30) and 2 ms (PPP=50) acquisition time per-pixel, respectively. The top row of this figure shows similar 3D reconstruction performance for all algorithms when applied to the mannequin head. For the Lego scene, we notice that the 3D reconstructions of the proposed algorithm and MS3D are closer to the ground truth than that of MuSaPoP. Table 2 shows the computational times of the proposed framework against the state-of-the-art algorithms considered in this subsection.


Fig. 6. Comparison of the 3D reconstruction performance of the proposed algorithm with the MS3D, and MuSaPoP algorithms. The scene is (top) a mannequin head with 1ms acquisition time per-pixel (PPP $\approx$ 30), (bottom) a Lego figurine with 2ms acquisition time per-pixel (PPP $\approx$ 50).



Table 2. Computational times of the proposed framework compared to other state-of-the-art algorithms for different scenes.

Figure 7 compares the 3D classification of the Lego scene obtained with the proposed algorithm and the MSTD algorithm. The proposed approach shows fewer false detections and better accuracy.


Fig. 7. 3D Spectral classification of the Lego figurine scene with 2ms acquisition time per-pixel (PPP $\approx$ 50) of the proposed and MSTD algorithms. The results of MSTD are post-processed using a TV denoiser to improve performance.


4.3.2 Multiple surfaces per-pixel

This subsection evaluates the target detection, classification, and spectral reconstruction on real data containing multiple surfaces per-pixel. The first scene consists of a mannequin located 4 metres behind a partially scattering object, with $N = 99 \times 99$ pixels, $T = 4000$ bins, and a mean photon count per-pixel of 45. This LiDAR scene is publicly available online (https://github.com/photon-efficient-imaging/full-waveform/tree/master/fcns). The second scene consists of a man standing behind camouflage at a 230 m stand-off distance from the LiDAR sensor. Different acquisition times (from 0.5 to 3.2 ms per-pixel) were used, yielding 7 to 44.6 photons per-pixel on average, of which approximately 2.1-13.3 photons correspond to background levels. The LiDAR cube has $N = 159 \times 78$ pixels and $T = 600$ histogram bins.

Figure 8 compares the proposed 3D target detection and reconstruction with those of state-of-the-art algorithms (MuSaPoP/ManiPoP and RT3D) on the two real scenes. Different acquisition times are considered for the man behind camouflage and shown in the first three rows. The proposed algorithm shows consistent results across these cases and better depth estimates for small acquisition times when compared to the other state-of-the-art algorithms. The other scene (4th row), which depicts a mannequin behind a scattering object, shows similar performance for all algorithms. Table 3 reports quantitative performance using the metrics defined in Section 4.1, considering as reference the proposed TD and reconstruction estimates obtained with 3.2 ms acquisition time per-pixel. This table indicates the best performance for the proposed approach in terms of both true detection rates and IAE. This comes at the cost of a small increase in false alarm rate at low acquisition times.


Fig. 8. Comparison of the proposed 3D reconstruction algorithm with MuSaPoP, and RT3D algorithms. (1st, 2nd and 3rd rows) the scene is the man behind camouflage with different acquisition times per-pixel (0.5ms, 1ms, and 3.2ms). (4th row) the scene is a mannequin behind a scattering semi-transparent surface. From left to right, results of the 3D target detection and reconstruction of the proposed algorithm, reconstructions of MuSaPoP and RT3D.



Table 3. Probabilities of true and false detection for the man behind camouflage scene for different acquisition times with $\tau = 1cm$.

5. Conclusions

This paper proposed 3D target detection and spectral classification algorithms for multispectral 3D single-photon LiDAR data. The detection algorithm exploited multiscale information to approximate the background level and detect signal peaks. This pre-processing step allowed the unmixing of signal information from the background, hence reducing data volume and enabling higher-level post-processing. In comparison to existing single-photon TD algorithms, the proposed strategy delivered smooth detection maps by considering spatial correlation between pixels, and allowed the detection of multiple peaks per pixel. We designed a hierarchical Bayesian model and estimation algorithm to perform spectral classification on the cleaned multispectral data. The results compared well with state-of-the-art 3D reconstruction algorithms on both simulated and real data. Future work will investigate the acceleration of these algorithms using parallel processing tools (e.g., GPUs) to enable real-time processing.

Funding

Royal Academy of Engineering (RF/201718/17128, RF/202021/20/307); Engineering and Physical Sciences Research Council (EP/S026428/1, EP/T00097X/1).

Disclosures

The authors declare no conflicts of interest.

Data availability

A demo of the Matlab code is not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. A. M. Wallace, A. Halimi, and G. S. Buller, “Full waveform LiDAR for adverse weather conditions,” IEEE Trans. Veh. Technol. 69(7), 7064–7077 (2020). [CrossRef]  

2. J. Rapp, J. Tachella, Y. Altmann, S. McLaughlin, and V. K. Goyal, “Advances in single-photon LiDAR for autonomous vehicles: Working principles, challenges, and recent advances,” IEEE Signal Process. Mag. 37(4), 62–71 (2020). [CrossRef]  

3. P. Lindner and G. Wanielik, “3D LiDAR processing for vehicle safety and environment recognition,” in Workshop on Computational Intelligence in Vehicles and Vehicular Systems (IEEE, 2009), pp. 66–71.

4. A. M. Wallace, A. McCarthy, C. J. Nichol, X. Ren, S. Morak, D. Martinez-Ramirez, I. H. Woodhouse, and G. S. Buller, “Design and evaluation of multispectral LiDAR for the recovery of arboreal parameters,” IEEE Trans. Geosci. Remote Sensing 52(8), 4942–4954 (2014). [CrossRef]  

5. T. Hakala, J. Suomalainen, S. Kaasalainen, and Y. Chen, “Full waveform hyperspectral LiDAR for terrestrial laser scanning,” Opt. Express 20(7), 7119–7127 (2012). [CrossRef]  

6. H. Tan, J. Peng, Z. Xiong, D. Liu, X. Huang, Z.-P. Li, Y. Hong, and F. Xu, “Deep learning based single-photon 3D imaging with multiple returns,” in International Conference on 3D Vision (IEEE, 2020), pp. 1196–1205.

7. Z.-P. Li, X. Huang, P.-Y. Jiang, Y. Hong, C. Yu, Y. Cao, J. Zhang, F. Xu, and J.-W. Pan, “Super-resolution single-photon imaging at 8.2 kilometers,” Opt. Express 28(3), 4076–4087 (2020). [CrossRef]  

8. A. McCarthy, R. J. Collins, N. J. Krichel, V. Fernández, A. M. Wallace, and G. S. Buller, “Long-range time-of-flight scanning sensor based on high-speed time-correlated single-photon counting,” Appl. Opt. 48(32), 6241–6251 (2009). [CrossRef]  

9. Z.-P. Li, X. Huang, Y. Cao, B. Wang, Y.-H. Li, W. Jin, C. Yu, J. Zhang, Q. Zhang, C.-Z. Peng, F. Xu, and J.-W. Pan, “Single-photon computational 3D imaging at 45 km,” Photonics Res. 8(9), 1532–1540 (2020). [CrossRef]  

10. A. M. Pawlikowska, A. Halimi, R. A. Lamb, and G. S. Buller, “Single-photon three-dimensional imaging at up to 10 kilometers range,” Opt. Express 25(10), 11919–11931 (2017). [CrossRef]  

11. Z. Li, E. Wu, C. Pang, B. Du, Y. Tao, H. Peng, H. Zeng, and G. Wu, “Multi-beam single-photon-counting three-dimensional imaging LiDAR,” Opt. Express 25(9), 10189–10195 (2017). [CrossRef]  

12. A. Maccarone, A. McCarthy, X. Ren, R. E. Warburton, A. M. Wallace, J. Moffat, Y. Petillot, and G. S. Buller, “Underwater depth imaging using time-correlated single-photon counting,” Opt. Express 23(26), 33911–33926 (2015). [CrossRef]  

13. A. Halimi, A. Maccarone, A. McCarthy, S. McLaughlin, and G. S. Buller, “Object depth profile and reflectivity restoration from sparse single-photon data acquired in underwater environments,” IEEE Trans. Comput. Imaging 3(3), 472–484 (2017). [CrossRef]  

14. R. Tobin, A. Halimi, A. McCarthy, M. Laurenzis, F. Christnacher, and G. S. Buller, “Three-dimensional single-photon imaging through obscurants,” Opt. Express 27(4), 4590–4611 (2019). [CrossRef]  

15. Y. Altmann, A. Maccarone, A. Halimi, A. McCarthy, G. Buller, and S. McLaughlin, “Efficient range estimation and material quantification from multispectral LiDAR waveforms,” in 2016 Sensor Signal Processing for Defence (SSPD) (IEEE, 2016), pp. 1–5.

16. A. Halimi, R. Tobin, A. McCarthy, J. Bioucas-Dias, S. McLaughlin, and G. S. Buller, “Robust restoration of sparse multidimensional single-photon LiDAR images,” IEEE Trans. Comput. Imaging 6, 138–152 (2020). [CrossRef]  

17. A. Halimi, A. Maccarone, R. A. Lamb, G. S. Buller, and S. McLaughlin, “Robust and guided bayesian reconstruction of single-photon 3d lidar data: Application to multispectral and underwater imaging,” IEEE Trans. Comput. Imaging 7, 961–974 (2021). [CrossRef]  

18. J. Tachella, Y. Altmann, S. McLaughlin, and J.-Y. Tourneret, “On fast object detection using single-photon LiDAR data,” in Wavelets and Sparsity XVIII, vol. 11138 (International Society for Optics and Photonics, 2019), p. 111380T.

19. A. Halimi, A. Wallace, G. S. Buller, and S. McLaughlin, “Fast surface detection using single-photon detection events,” in Sensor Signal Processing for Defence Conference (IEEE, 2020), pp. 1–5.

20. M. A. A. Belmekki, R. Tobin, G. S. Buller, S. McLaughlin, and A. Halimi, “Fast task-based adaptive sampling for 3d single-photon multispectral lidar data,” IEEE Trans. Comput. Imaging 8, 174–187 (2022). [CrossRef]  

21. G. Satat, M. Tancik, and R. Raskar, “Towards photography through realistic fog,” in International Conference on Computational Photography (IEEE, 2018), pp. 1–10.

22. F. Heide, S. Diamond, D. B. Lindell, and G. Wetzstein, “Sub-picosecond photon-efficient 3d imaging using single-photon sensors,” Sci. Rep. 8(1), 17726–8 (2018). [CrossRef]  

23. A. K. Pediredla, A. C. Sankaranarayanan, M. Buttafava, A. Tosi, and A. Veeraraghavan, “Signal processing based pile-up compensation for gated single-photon avalanche diodes,” arXiv:1806.07437 (2018). [CrossRef]  

24. J. Tachella, Y. Altmann, M. Márquez, H. Arguello-Fuentes, J.-Y. Tourneret, and S. McLaughlin, “Bayesian 3D reconstruction of subsampled multispectral single-photon LiDAR signals,” IEEE Trans. Comput. Imaging 6, 208–220 (2020). [CrossRef]  

25. Y. Altmann, X. Ren, A. McCarthy, G. S. Buller, and S. McLaughlin, “LiDAR waveform-based analysis of depth images constructed using sparse single-photon data,” IEEE Transactions on Image Processing 25(5), 1935–1946 (2016). [CrossRef]  

26. D. Shin, A. Kirmani, V. K. Goyal, and J. H. Shapiro, “Photon-efficient computational 3-d and reflectivity imaging with single-photon detectors,” IEEE Trans. Comput. Imaging 1(2), 112–125 (2015). [CrossRef]  

27. J. Rapp and V. K. Goyal, “A few photons among many: Unmixing signal and noise for photon-efficient active imaging,” IEEE Trans. Comput. Imaging 3(3), 445–459 (2017). [CrossRef]  

28. A. Halimi, Y. Altmann, A. McCarthy, X. Ren, R. Tobin, G. S. Buller, and S. McLaughlin, “Restoration of intensity and depth images constructed using sparse single-photon data,” in 24th European Signal Processing Conference (IEEE, 2016), pp. 86–90.

29. S. Hernandez-Marin, A. M. Wallace, and G. J. Gibson, “Bayesian analysis of LiDAR signals with multiple returns,” IEEE Trans. Pattern Anal. Mach. Intell. 29(12), 2170–2180 (2007). [CrossRef]  

30. C. Mallet, F. Lafarge, M. Roux, U. Soergel, F. Bretar, and C. Heipke, “A marked point process for modeling lidar waveforms,” IEEE Trans. on Image Process. 19(12), 3204–3221 (2010). [CrossRef]  

31. J. Tachella, Y. Altmann, N. Mellado, A. McCarthy, R. Tobin, G. S. Buller, J.-Y. Tourneret, and S. McLaughlin, “Real-time 3d reconstruction from single-photon lidar data using plug-and-play point cloud denoisers,” Nat. Commun. 10(1), 4984 (2019). [CrossRef]  

32. Y. Altmann, A. Maccarone, A. McCarthy, G. Buller, and S. McLaughlin, “Joint range estimation and spectral classification for 3d scene reconstruction using multispectral lidar waveforms,” in Statistical Signal Processing Workshop (IEEE, 2016), pp. 1–5.

33. R. Achanta, F. Estrada, P. Wils, and S. Süsstrunk, “Salient region detection and segmentation,” in International conference on computer vision systems (Springer, 2008), pp. 66–75.

34. R. Achanta, S. Hemami, F. Estrada, and S. Susstrunk, “Frequency-tuned salient region detection,” in Conference on computer vision and pattern recognition (IEEE, 2009), pp. 1597–1604.

35. H. Mei, G.-P. Ji, Z. Wei, X. Yang, X. Wei, and D.-P. Fan, “Camouflaged object segmentation with distraction mining,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2021), pp. 8772–8781.

36. T. Zhou, D.-P. Fan, M.-M. Cheng, J. Shen, and L. Shao, “Rgb-d salient object detection: A survey,” Comp. Visual Media 7(1), 37–69 (2021). [CrossRef]  

37. M. A. A. Belmekki, S. McLaughlin, and A. Halimi, “Fast classification and depth estimation for multispectral single-photon lidar data,” in Sensor Signal Processing for Defence Conference (IEEE, 2021), pp. 1–5.

38. H. Rue and L. Held, Gaussian Markov Random Fields: Theory and Applications (Chapman and Hall/CRC, 2005).

39. S. Chen, A. Halimi, X. Ren, A. McCarthy, X. Su, S. McLaughlin, and G. S. Buller, “Learning non-local spatial correlations to restore sparse 3d single-photon data,” IEEE Trans. on Image Process. 29, 3119–3131 (2020). [CrossRef]  

40. W. Marais and R. Willett, “Proximal-gradient methods for poisson image reconstruction with bm3d-based regularization,” in 7th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (IEEE, 2017), pp. 1–5.

41. A. Halimi, J. Koo, R. A. Lamb, G. S. Buller, and S. McLaughlin, “Robust bayesian reconstruction of multispectral single-photon 3d lidar data with non-uniform background,” in International Conference on Acoustics, Speech and Signal Processing (IEEE, 2022), pp. 1531–1535.

42. S. Chen, W. Hao, X. Su, Z. Zhang, S. Wang, and W. Zhu, “Restoration of sparse multispectral single photon lidar data,” in International Conference on Signal Processing, Communications and Computing (IEEE, 2020), pp. 1–5.

43. J. Koo, A. Halimi, A. Maccarone, G. S. Buller, and S. McLaughlin, “A bayesian based unrolling approach to single-photon lidar imaging through obscurants,” in 30th European Signal Processing Conference (2022).

44. J. Koo, A. Halimi, and S. McLaughlin, “A Bayesian based deep unrolling algorithm for single-photon lidar systems,” IEEE J. Sel. Top. Signal Process. 16(4), 762–774 (2022). [CrossRef]  

45. A. Halimi, C. Mailhes, J.-Y. Tourneret, and H. Snoussi, “Bayesian estimation of smooth altimetric parameters: Application to conventional and delay/doppler altimetry,” IEEE Trans. Geosci. Remote Sensing 54(4), 2207–2219 (2016). [CrossRef]  

46. A. Halimi, G. S. Buller, S. McLaughlin, and P. Honeine, “Denoising smooth signals using a bayesian approach: Application to altimetry,” IEEE J. Sel. Top. Appl. Earth Observations Remote Sensing 10(4), 1278–1289 (2017). [CrossRef]  

47. A. Halimi, P. Honeine, and J. M. Bioucas-Dias, “Hyperspectral unmixing in presence of endmember variability, nonlinearity, or mismodeling effects,” IEEE Trans. on Image Process. 25(10), 4565–4579 (2016). [CrossRef]  

48. Y. Altmann, X. Ren, A. McCarthy, G. S. Buller, and S. McLaughlin, “Robust bayesian target detection algorithm for depth imaging from sparse single-photon data,” IEEE Trans. Comput. Imaging 2, 456–467 (2016). [CrossRef]  

49. R. Tobin, Y. Altmann, X. Ren, A. McCarthy, R. A. Lamb, S. McLaughlin, and G. S. Buller, “Comparative study of sampling strategies for sparse photon multispectral LiDAR imaging: towards mosaic filter arrays,” J. Opt. 19(9), 094006 (2017). [CrossRef]  
