
Computational framework for steady-state NLOS localization under changing ambient illumination conditions


Abstract

Non-line-of-sight (NLOS) imaging of hidden objects is a challenging yet vital task, facilitating important applications such as rescue operations, medical imaging, and autonomous driving. In this paper, we develop a computational steady-state NLOS localization framework that works accurately and robustly under various illumination conditions. For this purpose, we build a physical NLOS image acquisition hardware system and a corresponding virtual setup to obtain real-captured and simulated steady-state NLOS images under different ambient illuminations. Then, we utilize the captured NLOS images to train/fine-tune a multi-task convolutional neural network (CNN) architecture to perform simultaneous background illumination correction and NLOS object localization. Evaluation results on both simulated and real-captured NLOS images demonstrate that the proposed method can effectively suppress the severe disturbance caused by variations in ambient light, significantly improving the accuracy and stability of steady-state NLOS localization using consumer-grade RGB cameras. The proposed method potentially paves the way to practical steady-state NLOS imaging solutions for around-the-clock and all-weather operations.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Recently, with the rapid development of computational imaging and optical sensing technologies [1–8], a novel non-line-of-sight (NLOS) imaging solution has been proposed to detect, identify, and reconstruct objects beyond the line of sight. The NLOS imaging technique significantly expands the “field of vision” and could potentially be utilized for hidden obstacle detection during autonomous driving, personnel search in rescue operations, endoscopic examination of organs, and other critical sensing applications [9]. However, NLOS imaging of hidden objects represents a challenging task for two major reasons. First, objects outside the line of sight only contribute to the acquired signal measurement through indirect reflections on visible diffuse surfaces, causing severe signal attenuation and an extremely weak object-related reflective signal at the sensor [10]. Moreover, the acquired measurements have undergone at least three diffuse reflections (wall, object, wall) and do not sample all Fourier coefficients, so estimating the shape or position of the hidden objects is an ill-posed inverse problem [11].

According to the time resolution of the deployed detectors, existing NLOS imaging techniques can be divided into two major categories: transient and steady-state methods. Transient NLOS imaging techniques obtain the 3D spatial information of hidden objects by calculating the time of flight of photons, thus they typically require detectors with extremely high time resolutions (down to the picosecond level). For instance, Velten et al. performed high-quality 3D reconstruction of hidden objects, proving the feasibility of transient NLOS imaging using a streak camera [12]. However, streak cameras are subject to critical drawbacks such as high cost, low photon efficiency, and long acquisition times. To reduce the cost of imaging hardware systems, many researchers built transient NLOS imaging solutions using single-photon avalanche diode (SPAD) detectors, which achieve higher photon efficiency and signal-to-noise ratio (SNR) [13–20]. Although SPAD detectors are generally less expensive than streak cameras and still provide sufficient time resolution for 3D reconstruction of hidden objects, they suffer from noticeable limitations when compared with conventional intensity or RGB cameras, including lower fill factor, lower photon efficiency, higher cost, longer sampling time, and higher hardware system complexity [21]. In recent years, steady-state NLOS imaging solutions using low-cost RGB cameras and low-power laser sources have been proposed to determine the location and identity of hidden targets, providing important information for various applications such as personnel search and rescue, industrial/medical inspection, and autonomous driving [9–11].

In steady-state NLOS imaging solutions, signals from the hidden object undergo multiple diffuse reflections and are then passively acquired by an intensity camera. In real-world situations, the attenuated reflective signal from hidden objects may be significantly weaker than the ambient light. Therefore, it is critically important to remove interference caused by ambient illumination for high-accuracy detection/identification of hidden objects. When the intensity of ambient illumination remains constant or only changes within a small range, a commonly used approach is to apply a background subtraction operation to separate photons originating from ambient illumination from those reflected by the hidden objects [22–24]. A noticeable limitation of such calibration-based background subtraction operations is that they require prior knowledge of the ambient illumination (e.g., intensity images of the scene without hidden objects) [23]. Moreover, these fixed-value background subtraction methods only work satisfactorily when the ambient illumination condition remains unchanged, which cannot always be guaranteed during practical NLOS imaging tasks. Therefore, it is desirable to develop accurate and robust steady-state NLOS imaging solutions that can adaptively handle ambient illumination changes.

To address the above-mentioned limitations, we present a complete steady-state NLOS data acquisition and processing framework including (1) physical and virtual setups for capturing realistic and simulated steady-state NLOS images under various ambient light conditions and (2) an end-to-end multi-task convolutional neural network (CNN) model for simultaneous correction of ambient light interference and localization of the hidden objects. Figure 1 shows the overall workflow of our proposed steady-state NLOS data acquisition and processing framework. More specifically, the data acquisition setup consists of a typical NLOS scene with hidden objects (15 different mannequins), a conventional RGB digital camera, a controllable ambient light source, and an active laser light source (532.22 nm). Here, we build a physical data acquisition hardware system and its corresponding virtual rendering engine to obtain both real-captured and simulated steady-state NLOS images under different ambient illuminations. Inspired by recent advances in NLOS imaging based on deep learning techniques [25–28], we design an end-to-end CNN model consisting of two consecutive sub-networks to learn the complex mapping function between the NLOS raw images and the position of hidden objects in the presence of severe ambient light interference. The captured NLOS images are used to train/fine-tune the multi-task CNN architecture to perform simultaneous background illumination correction and NLOS object localization. The proposed approach can effectively eliminate the influence of ambient light during steady-state NLOS tasks without resorting to prior scene knowledge and thus achieves more accurate and robust hidden object localization results under various illumination conditions. The contributions of this paper are summarized as follows:

Fig. 1. Overview of our proposed computational steady-state NLOS imaging framework.

(1) We construct a new steady-state NLOS image dataset that contains more than 13,000 simulated and 700 realistic RGB images captured under very different ambient illumination conditions. This new dataset could be utilized to facilitate the training of CNN-based steady-state NLOS detection and recognition models and to perform quantitative evaluations. The captured dataset will be made publicly available in the future.

(2) We experimentally reveal that incorporating background illumination correction as a supplementary task into the overall NLOS imaging pipeline is the most effective strategy for suppressing ambient illumination interference and achieving more accurate NLOS localization results. Based on this finding, we design a multi-task CNN-based framework to perform high-quality background illumination correction and NLOS object localization simultaneously.

2. Problem formulation

The light emitted from the active laser source is received by the detector after three diffuse reflections. As illustrated in Fig. 2, the propagation path of the light is from the laser source $\rightarrow$ visible relay wall $\rightarrow$ invisible hidden object $\rightarrow$ visible relay wall $\rightarrow$ RGB/intensity camera. Note that the relay wall acts as a virtual light emitter during the first reflection and a virtual light detector during the third reflection.

Fig. 2. The light propagation path of steady-state NLOS imaging.

Assuming that the laser source emits highly parallel directional light that reaches a point $\mathbf {o}$ on the hidden object after the first diffuse reflection, the incurred irradiance at the point $\mathbf {o}$ can be depicted as follows:

$$E\left( \mathbf{o} \right) = \int_{{\mathbf{\Omega _1}}} {{\rho _1}E\left( {{\mathbf{w_1}}} \right)} {f_1}\left( {\vec {{\mathbf{l_1}}} ,\vec {{\mathbf{v_1}}} ,\vec {{\mathbf{n_1}}} } \right)d\vec {{\mathbf{v_1}}} ,$$
where $E(\mathbf {w_1})$ denotes the irradiance at a point $\mathbf {w_1}$ on the relay wall, $f_1$ denotes the corresponding Bidirectional Reflectance Distribution Function (BRDF), and $\rho _1$ denotes the surface albedo of the wall. $\vec {\mathbf {l_1}}$, $\vec {\mathbf {v_1}}$, and $\vec {\mathbf {n_1}}$ represent the incoming, outgoing, and normal unit vectors of light at the point $\mathbf {w_1}$, respectively. $\mathbf {\Omega _1}$ represents the solid angle formed by all the reflected light from the relay wall at the point $\mathbf {o}$.

Then the light reaches a point $\mathbf {w_2}$ on the relay wall after the second diffuse reflection and the irradiance at the point $\mathbf {w_2}$ can be described as

$$E\left( {{\mathbf{w_2}}} \right) = \int_{{\mathbf{\Omega_2}}} {{\rho _2}} E\left( \mathbf{o} \right){f_2}\left( {\vec {{\mathbf{l_2}}} ,\vec {{\mathbf{v_2}}} ,\vec {{\mathbf{n_2}}} } \right)d\vec {{\mathbf{v_2}}},$$
where $\vec {\mathbf {l_2}}$, $\vec {\mathbf {v_2}}$ and $\vec {\mathbf {n_2}}$ represent the incoming, outgoing and normal unit vectors of light at the point $\mathbf {o}$, respectively. $f_2$ denotes the BRDF of the hidden object, $\rho _2$ represents the surface albedo of the hidden object, and $\mathbf {\Omega _2}$ represents the solid angle formed by all the reflected light from the object at the point $\mathbf {w_2}$.

Finally, the light is acquired by the detector after the third diffuse reflection. More specifically, the intensity $I_{i,j}$ observed by a pixel $\mathbf {p}_{i,j}$ in the detector can be expressed as

$$I_{i,j} = \int_{{\mathbf{\Omega _3}}} {{\rho _3}E\left( {{\mathbf{w_2}}} \right)} {f_3}\left( {\vec {{\mathbf{l_3}}} ,\vec {{\mathbf{v_3}}} ,\vec {{\mathbf{n_3}}} } \right)d\vec {{\mathbf{v_3}}},$$
where $\vec {\mathbf {l_3}}$, $\vec {\mathbf {v_3}}$ and $\vec {\mathbf {n_3}}$ represent the incoming, outgoing and normal unit vectors of light at the point $\mathbf {w_2}$, respectively. $f_3$ denotes the BRDF of the relay wall at point $\mathbf {w_2}$, and $\mathbf {\Omega _3}$ represents the solid angle formed by all the reflected light from the relay wall at the point $\mathbf {p}_{i,j}$. Note $\rho _1 = \rho _3$ and $f_1=f_3$ if the wall material is uniform.
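To make the three-bounce transport in Eqs. (1)–(3) concrete, the following sketch evaluates it numerically under strong simplifying assumptions: all surfaces are Lambertian, the laser spot $\mathbf{w_1}$, the object point $\mathbf{o}$, and the wall point $\mathbf{w_2}$ are treated as single points, and the solid-angle integrals are replaced by point-to-point transfer terms of the form $\cos\theta_\text{out}\cos\theta_\text{in}/(\pi d^{2})$. The geometry, albedo values, and laser irradiance below are hypothetical and serve only to illustrate how strongly the signal attenuates over three bounces.

```python
import numpy as np

def transfer(src, dst, n_src, n_dst):
    """Point-to-point Lambertian transfer term: cos(theta_out) * cos(theta_in) / (pi * d^2)."""
    d = dst - src
    r2 = float(np.dot(d, d))
    w = d / np.sqrt(r2)
    return max(np.dot(n_src, w), 0.0) * max(np.dot(n_dst, -w), 0.0) / (np.pi * r2)

def three_bounce_intensity(w1, o, w2, cam, n_wall, n_obj, n_cam,
                           rho_wall, rho_obj, E_laser):
    """Signal reaching one camera pixel after wall -> object -> wall, i.e. Eqs. (1)-(3) in point form."""
    E_o  = rho_wall * E_laser * transfer(w1, o, n_wall, n_obj)    # Eq. (1): laser spot to object
    E_w2 = rho_obj  * E_o     * transfer(o, w2, n_obj, n_wall)    # Eq. (2): object back to relay wall
    return rho_wall * E_w2    * transfer(w2, cam, n_wall, n_cam)  # Eq. (3): relay wall to detector

# Hypothetical geometry (metres): relay wall lies in the x-y plane at z = 0.
w1  = np.array([0.0, 0.3, 0.0]);  n_wall = np.array([0.0, 0.0, 1.0])
o   = np.array([0.2, 0.3, 0.4]);  n_obj  = np.array([0.0, 0.0, -1.0])
w2  = np.array([0.1, 0.3, 0.0])
cam = np.array([-0.3, 0.3, 0.5]); n_cam  = np.array([0.0, 0.0, -1.0])
print(three_bounce_intensity(w1, o, w2, cam, n_wall, n_obj, n_cam,
                             rho_wall=0.7, rho_obj=0.5, E_laser=1.0))
```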

Based on the light transport model described in Eqs. (1)–(3), the 2D position of the hidden object can be determined by solving for the vectors $\vec {\mathbf {l_1}}$, $\vec {\mathbf {v_1}}$, $\vec {\mathbf {n_1}}$, $\vec {\mathbf {l_2}}$, $\vec {\mathbf {v_2}}$, $\vec {\mathbf {n_2}}$, $\vec {\mathbf {l_3}}$, $\vec {\mathbf {v_3}}$, and $\vec {\mathbf {n_3}}$ from the signal measurement $\mathbf {I} \in \mathbb {R}^{W \times H}$ ($W$ and $H$ are the image width and height, respectively) acquired by a conventional RGB/intensity camera [22]. However, this is a challenging ill-posed inverse problem that cannot be solved without additional priors such as the BRDFs of the relay wall and hidden objects. Another noticeable problem of steady-state NLOS imaging solutions is that the real-measured intensity images $\mathbf {I^\text {R}}$ contain a significant amount of ambient light, which is difficult to separate from the signals reflected from the hidden objects and therefore adversely affects the accuracy of detection/identification of hidden objects [23]. In this paper, we model the overall light transport as a non-linear mapping function $\Gamma$ defining the relationship between the position of the hidden object ($x$, $y$) and the signal measurement $\mathbf {I}$ as

$$\mathbf{I} = \Gamma \left( x,y \right),$$
and the relationship between the real-measured intensity image $\mathbf {I^\text {R}}$ and the one without ambient light $\mathbf {I}$ is
$$\mathbf{I^\text{R}} = \Psi(\mathbf{I}).$$
Then the 2D position $(x, y)$ of the hidden object can be estimated by solving two consecutive inverse problems as
$$(x,y) = {\Gamma ^{ - 1}}\left[ {{\Psi^{ - 1}}\left( \mathbf{I^\text{R}} \right)} \right].$$
Based on Eq. (6), we present a data-driven computational framework including steady-state NLOS image acquisition and processing, simultaneously restoring the ambient-light-free image $\mathbf {I}$ and estimating the centroid position of the hidden object.

3. NLOS localization under changing ambient illuminations

As mentioned above, the real-measured steady-state intensity/RGB images contain a significant amount of ambient light which will adversely affect the performance of NLOS imaging tasks (e.g., hidden object detection/localization) [23]. To achieve accurate and stable NLOS localization results under various illumination conditions, we divide the challenging task into two consecutive sub-tasks including background illumination correction and NLOS localization. The proposed two-stage CNN model for NLOS Localization Under Changing Ambient Illuminations (NLOS-LUCAI) is illustrated in Fig. 3.

Fig. 3. The overall architecture of the proposed two-stage NLOS-LUCAI model for background illumination correction and NLOS localization.

Background illumination correction: All raw images are resized to $128\times 128$ pixels and the pixel values are scaled to [0, 1] before being fed into the background illumination correction network (BICN), which adopts an encoder-decoder architecture [29]. Given preprocessed $128 \times 128 \times 3$ RGB images with background illumination interference as input, the encoder extracts informative features at various convolutional stages. More specifically, the encoder path consists of five stages, each comprising a $3 \times 3$ convolution layer and a residual block that contains a $3 \times 3$ convolution layer followed by a batch normalization (BN) layer and a ReLU activation function. Meanwhile, $2 \times 2$ max pooling layers connecting different stages are employed to reduce the spatial size and increase the receptive fields. The introduction of residual blocks helps prevent degradation and accelerates the convergence of the network. The low-resolution feature maps then pass through the decoder path, which consists of four of the above-mentioned convolution-plus-residual-block stages interleaved with $2 \times 2$ upsampling convolutional (Up-Conv) layers. The decoder is followed by a $1 \times 1$ convolution, through which the network outputs reconstructed $128 \times 128 \times 3$ RGB images after background illumination correction. Note that each convolution layer (besides the final $1 \times 1$ conv) is followed by BN and ReLU. In addition, skip connections are inserted between the encoder and decoder to supplement spatial information and enhance gradient propagation.
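The description above leaves some details (e.g., channel widths) unspecified; the following PyTorch sketch is one plausible reading of the BICN, assuming hypothetical channel counts of 32–512, and shows the five-stage residual encoder, the four-stage decoder with $2 \times 2$ Up-Conv layers, the skip connections, and the final $1 \times 1$ convolution.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """3x3 conv + BN + ReLU wrapped with an identity shortcut."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1),
                                  nn.BatchNorm2d(ch), nn.ReLU(inplace=True))
    def forward(self, x):
        return x + self.body(x)

def stage(in_ch, out_ch):
    """One encoder/decoder stage: 3x3 conv (BN + ReLU) followed by a residual block."""
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1),
                         nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
                         ResBlock(out_ch))

class BICN(nn.Module):
    """Encoder-decoder background illumination correction network (channel widths are assumptions)."""
    def __init__(self, chs=(32, 64, 128, 256, 512)):
        super().__init__()
        self.enc = nn.ModuleList([stage(3 if i == 0 else chs[i - 1], c)
                                  for i, c in enumerate(chs)])          # 5 encoder stages
        self.pool = nn.MaxPool2d(2)                                     # 2x2 max pooling between stages
        self.up = nn.ModuleList([nn.ConvTranspose2d(chs[i], chs[i - 1], 2, stride=2)
                                 for i in range(len(chs) - 1, 0, -1)])  # 2x2 Up-Conv layers
        self.dec = nn.ModuleList([stage(2 * chs[i - 1], chs[i - 1])
                                  for i in range(len(chs) - 1, 0, -1)]) # 4 decoder stages
        self.out = nn.Conv2d(chs[0], 3, 1)                              # final 1x1 convolution

    def forward(self, x):
        skips = []
        for i, enc in enumerate(self.enc):
            x = enc(x if i == 0 else self.pool(x))
            skips.append(x)
        for up, dec, skip in zip(self.up, self.dec, reversed(skips[:-1])):
            x = dec(torch.cat([up(x), skip], dim=1))   # skip connections between encoder and decoder
        return torch.sigmoid(self.out(x))              # corrected RGB image in [0, 1]

x = torch.rand(1, 3, 128, 128)     # preprocessed input image
print(BICN()(x).shape)             # torch.Size([1, 3, 128, 128])
```

With five encoder stages and four max pooling operations, the $128 \times 128$ input is reduced to an $8 \times 8$ bottleneck and restored to full resolution by the decoder.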

NLOS localization: The NLOS localization network (NLN) consists of a feature extractor and a regressor. Given the background-illumination-corrected $128 \times 128 \times 3$ RGB images, we adopt the ResNet-18 [30] backbone as the feature extractor. A global average pooling (GAP) layer then converts the feature maps into a vector for subsequent regression. The regressor consists of four fully connected (FC) layers that directly predict the 2D location $(x, y)$ of a hidden target.
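A corresponding sketch of the NLN, assuming a torchvision ResNet-18 backbone (torchvision >= 0.13 API) and hypothetical hidden widths for the four FC layers:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class NLN(nn.Module):
    """ResNet-18 feature extractor + GAP + 4-layer FC regressor for the 2D target position."""
    def __init__(self):
        super().__init__()
        backbone = resnet18(weights=None)
        self.features = nn.Sequential(*list(backbone.children())[:-2])  # drop avgpool and fc
        self.gap = nn.AdaptiveAvgPool2d(1)                               # global average pooling
        self.regressor = nn.Sequential(                                  # hidden widths are assumptions
            nn.Linear(512, 256), nn.ReLU(inplace=True),
            nn.Linear(256, 128), nn.ReLU(inplace=True),
            nn.Linear(128, 64),  nn.ReLU(inplace=True),
            nn.Linear(64, 2))                                            # predicted (x, y)

    def forward(self, x):
        f = self.gap(self.features(x)).flatten(1)
        return self.regressor(f)

xy = NLN()(torch.rand(4, 3, 128, 128))
print(xy.shape)  # torch.Size([4, 2])
```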

Joint multi-task training: To facilitate the training of the proposed NLOS-LUCAI model, we design a multi-term loss function $\mathcal {L}$ to drive the parameter learning as

$${\mathcal{L}} = {\lambda _1}{\mathcal{L}_\text{BIC}} + {\lambda _2}{\mathcal{L}_\text{LOC}},$$
where $\mathcal {L}_\text {BIC}$ and $\mathcal {L}_\text {LOC}$ denote the loss terms for training background illumination correction and NLOS localization sub-networks, respectively. $\lambda _1$ and $\lambda _2$ are the coefficients controlling the weights of $\mathcal {L}_\text {BIC}$ and $\mathcal {L}_\text {LOC}$ loss terms.

The objective of the background illumination correction sub-task is to learn the non-linear mapping model $\Psi ^{-1}(\cdot )$ in Eq. (6) which converts a real-measured image $\mathbf {I^\text {R}}$ to the version without ambient light interference. The model is optimized by minimizing the pixel-wise difference between the predicted ambient-light-free image $\mathbf {I'}$ and corresponding ground truth $\mathbf {I}$. In our experiment, we adopt the smooth L1 loss function to drive the weight learning process as follows:

$$ \mathcal{L}_\text{BIC} = \frac{1}{WH}{\sum_{i = 1}^{W}{\sum_{j = 1}^{H}{{smooth}_{L1}\left( {I_{i,j} - I^{'}_{i,j}} \right)}}}, $$
$$ {smooth}_{L1}\left( x \right) = \begin{cases} 0.5x^{2} & \text{if } \left| x \right| < 1 \\ \left| x \right| - 0.5 & \text{otherwise} \end{cases},$$
where $I_{i,j}\in \mathbf {I}$ and $I^{'}_{i,j}\in \mathbf {I'}$. Here, we utilize the images captured/simulated under the Level 0 lighting condition (turning off the external light source to eliminate ambient light) as the ground truth $\mathbf {I}$ to supervise the training of the background illumination correction sub-network.
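For reference, Eqs. (8)–(9) correspond to PyTorch's built-in smooth L1 loss with its default threshold of 1; a minimal check is given below (note that the built-in mean is taken over all pixels and channels, which differs from the per-channel $W \times H$ normalization of Eq. (8) only by a constant factor):

```python
import torch
import torch.nn.functional as F

I_pred = torch.rand(8, 3, 128, 128)          # predicted ambient-light-free images I'
I_gt   = torch.rand(8, 3, 128, 128)          # Level 0 (no ambient light) ground truth I
loss_bic = F.smooth_l1_loss(I_pred, I_gt)    # Eqs. (8)-(9), averaged over all pixels and channels
print(loss_bic.item())
```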

After restoring the ambient-light-free image $\mathbf {I'}$, the NLOS localization sub-network is subsequently deployed to learn $\Gamma ^{-1}(\cdot )$ defined in Eq. (6) and predict the centroid position of the hidden objects. For $N$ NLOS images containing reflection information of hidden objects, the localization loss term $\mathcal {L}_\text {LOC}$ is defined as follows:

$${\mathcal{L}_\text{LOC}} = \frac{1}{N}\sum_{i = 1}^N {\left( {|{x^\text{GT}_i} - {x_i}|^2 + |{y^\text{GT}_i} - {y_i}|^2} \right)},$$
where $(x, y)$ and $(x^\text {GT}, y^\text {GT})$ represent the estimation and ground truth of 2D location of a hidden target, respectively.

Through the joint training of two subsequent tasks, our proposed NLOS-LUCAI model incorporates background illumination correction as a supplementary task into the overall NLOS imaging pipeline, performing high-quality background illumination correction and NLOS localization simultaneously. Comparative results with other alternatives to achieve accurate NLOS localization under changing ambient lighting conditions are provided in Sec. 4.4.
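As a summary of the joint multi-task training described above, the following sketch (reusing the BICN and NLN classes sketched earlier, with a hypothetical mini-batch in place of the real dataloader) performs one optimization step of Eq. (7) using the pre-training loss weights $\lambda_1 = 1$ and $\lambda_2 = 20$:

```python
import torch
import torch.nn.functional as F

bicn, nln = BICN(), NLN()                              # sub-networks sketched above
optimizer = torch.optim.Adam([{"params": bicn.parameters(), "lr": 1.5e-4},
                              {"params": nln.parameters(),  "lr": 1e-4}])
lambda_1, lambda_2 = 1.0, 20.0                         # pre-training loss weights

# One hypothetical mini-batch: raw images I_R, Level-0 ground truth I, target positions (x, y)
I_R   = torch.rand(8, 3, 128, 128)
I_gt  = torch.rand(8, 3, 128, 128)
xy_gt = torch.rand(8, 2)

I_corr  = bicn(I_R)                                    # background illumination correction
xy_pred = nln(I_corr)                                  # NLOS localization on the corrected image
loss_bic = F.smooth_l1_loss(I_corr, I_gt)              # Eqs. (8)-(9)
loss_loc = ((xy_pred - xy_gt) ** 2).sum(dim=1).mean()  # Eq. (10)
loss = lambda_1 * loss_bic + lambda_2 * loss_loc       # Eq. (7)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```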

4. NLOS data acquisition and experiments

In this section, we systematically evaluate the performance of our proposed NLOS-LUCAI model and compare it with other alternatives for hidden object localization under changing ambient lighting conditions on both simulated and real-captured NLOS images. In our experiments, we calculate the mean absolute error (MAE) and the root mean squared error (RMSE) of all testing samples to quantitatively assess the performance of NLOS localization under individual illumination conditions. Moreover, we adopt the peak signal-to-noise ratio (PSNR) to evaluate the pixel-level difference between the background illumination correction results and the ambient-light-free images captured when turning off the external light source (Level 0).
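For completeness, one plausible implementation of the three evaluation metrics is shown below (the exact error definition used in the tables, e.g., per-axis versus Euclidean error, is an assumption here):

```python
import numpy as np

def mae_rmse(pred_xy, gt_xy):
    """Localization errors in the same units as the coordinates (mm in the paper's tables)."""
    err = np.linalg.norm(pred_xy - gt_xy, axis=1)       # Euclidean error per test sample
    return err.mean(), np.sqrt((err ** 2).mean())       # MAE, RMSE

def psnr(img, ref, peak=1.0):
    """Peak signal-to-noise ratio between a corrected image and the Level 0 reference, both in [0, 1]."""
    mse = np.mean((img - ref) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

pred = np.array([[10.0, 22.0], [35.0, 41.0]])           # hypothetical estimates (mm)
gt   = np.array([[12.0, 20.0], [30.0, 40.0]])
print(mae_rmse(pred, gt))
print(psnr(np.random.rand(128, 128, 3), np.random.rand(128, 128, 3)))
```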

4.1 NLOS data acquisition

In this paper, we construct a typical NLOS scene with hidden objects of different shapes and obtain real-captured (725 frames) and simulated (13,122 frames) steady-state intensity images under different ambient illuminations. Figure 4 shows some simulated and real-captured NLOS images under different ambient illuminations and when the target object is in different positions. The simulated images could be utilized to pre-train NLOS localization models, while the real-captured ones are further used to fine-tune the pre-trained models when they are applied to realistic NLOS scenes. The captured dataset will be made publicly available in the future.

Fig. 4. Some simulated and real-captured NLOS images under different ambient illuminations and when the target object is in different positions.

4.1.1 Real-captured images

As illustrated in Fig. 5, the physically built NLOS data acquisition setup consists of a relay wall, an occluder, hidden objects, a conventional RGB digital camera, an ambient illuminator and controller, and an active laser light source. An XWJG XG-FV0412 RGB camera with a field of view (FOV) of $76$ degrees is utilized to capture images of $1920 \times 1080$ pixels. A $74$ mW laser source with $0.437\%$ power fluctuation and a wavelength of $532.22$ nm provides active illumination.

Fig. 5. The built steady-state NLOS data acquisition setup.

The hidden objects are 3D printed mannequins made of resin, and the relay wall and occluder are made of frosted acrylic. The hidden mannequins are located in an $8$ cm $\times$ $8$ cm square area with a $2$ cm interval ($25$ different locations), which is outside the direct line of sight of the camera. In total, we manufactured 4 mannequins of different body shapes and sizes (height: 8.06 cm$\sim$8.40 cm, width: 2.26 cm$\sim$3.61 cm) as shown in Fig. 6, and used mannequins #1 - #3 for capturing training NLOS images and mannequin #4 for capturing testing NLOS images.

Fig. 6. 3D printed mannequins (height: 8.06 cm$\sim$8.40 cm, width: 2.26 cm$\sim$3.61 cm) for capturing the training and testing NLOS images.

Different from other existing NLOS data acquisition setups [21–23], we deploy an external light source (30 W LED light) and a digital controller to generate scenes under various ambient illumination conditions. In total, we experimentally set 9 different ambient lighting conditions (Level 0 - 8) and capture $725$ realistic NLOS images of 4 mannequins. More specifically, we adjust the power of the external light source from 9 W to 30 W in 3 W increments to generate the Level 1 - 8 lighting conditions. Note that Level 0 denotes the lighting condition without ambient light (external light source turned off). As illustrated in Fig. 7, we only make use of images captured under the Level 0, 1, 2, 4, 5, 7, 8 illumination conditions to train the NLOS localization model and validate its effectiveness and generalization ability under all 8 illumination conditions including two unseen settings (Level 3 and 6).

Fig. 7. The overview of real-captured NLOS images for localization model training and testing under 9 different ambient lighting conditions.

4.1.2 Simulated images

It is practically difficult and time-consuming to collect a large number of real-captured NLOS images to train localization models with good generalization ability. Following previous works [31,32], this paper addresses the problem of inadequate data by simulating the same NLOS scene in a physically based renderer, which makes it easy to obtain a large number of virtually rendered images that share similar characteristics with the real-captured ones. More specifically, we set the spatial layout consistent with the realistic scene in the physically based ray tracing engine Blender, and use the microfacet-based Cook-Torrance model with the GGX distribution to simulate the material properties of the relay wall and target objects. In our experimental setup, the relay wall acts as a virtual light source, so we directly simulate the laser spot on the relay wall to illuminate the hidden scene/objects. We adjust the light energy, wavelength (532.22 nm), and light attenuation model (quadratic) in the Blender node editor to simulate NLOS images with similar characteristics and intensity distribution as the real-captured ones, as illustrated in Fig. 4. The size of each simulated steady-state NLOS image is $240 \times 135$ pixels, and each one takes about 8 s to render on a GeForce GTX 1660Ti GPU.
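For readers unfamiliar with the material model mentioned above, the following NumPy sketch writes out the Cook-Torrance specular term with the GGX normal distribution, a Smith-Schlick geometry term, and the Schlick Fresnel approximation; the roughness and base reflectance values are hypothetical, and Blender's internal parameterization may differ in detail.

```python
import numpy as np

def ggx_specular(n, v, l, roughness=0.4, f0=0.04):
    """Cook-Torrance specular BRDF with the GGX distribution (all vectors are unit length)."""
    h = (v + l) / np.linalg.norm(v + l)                    # half vector
    nl, nv = max(n @ l, 1e-6), max(n @ v, 1e-6)
    nh, vh = max(n @ h, 0.0), max(v @ h, 0.0)
    a2 = roughness ** 4                                    # alpha = roughness^2, a2 = alpha^2
    D = a2 / (np.pi * (nh * nh * (a2 - 1.0) + 1.0) ** 2)   # GGX normal distribution
    k = (roughness + 1.0) ** 2 / 8.0                       # Smith-Schlick geometry factor
    G = (nl / (nl * (1 - k) + k)) * (nv / (nv * (1 - k) + k))
    F = f0 + (1.0 - f0) * (1.0 - vh) ** 5                  # Schlick Fresnel approximation
    return D * G * F / (4.0 * nl * nv)

n = np.array([0.0, 0.0, 1.0])        # surface normal
v = np.array([0.0, 0.5, 0.866])      # view direction
l = np.array([0.3, -0.3, 0.906])     # light direction
print(ggx_specular(n, v, l))
```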

In total, we set $81$ sampling points in an $8$ cm $\times$ $8$ cm square hidden area with an interval of $1$ cm. As illustrated in Fig. 8, we render the three-bounce diffuse reflection images of 15 mannequins with different sizes and postures (12 mannequins for training and the remaining 3 for testing). Similar to our realistic NLOS data acquisition setup, we also simulate scenes with different ambient illumination conditions (Level 0 - 17). As illustrated in Fig. 9, $11,664$ training images are rendered under 12 illumination conditions (Level 0, 1, 3, 4, 6, 7, 9, 10, 12, 13, 15, 16) and another $1,458$ testing images are rendered under the other 6 illumination conditions (Level 2, 5, 8, 11, 14, 17).

Fig. 8. Virtually rendered mannequins for simulating training and testing NLOS images.

Fig. 9. The overview of simulated NLOS images for localization model training and testing under 18 different ambient lighting conditions.

4.2 Implementation details

Pre-training: We implement the proposed NLOS-LUCAI model in the PyTorch framework (version $1.2.0$) and pre-train it on an NVIDIA GeForce GTX 1660Ti GPU with CUDA 11.1 for 50 epochs with a batch size of 8. The Adam solver is utilized to optimize the weights of the two sub-networks. The learning rate of the BICN is set to $1.5 \times 10^{-4}$ and reduced by a factor of $0.7$ every $5$ epochs, and the learning rate of the NLN is set to $10^{-4}$ and reduced by a factor of $0.9$ every epoch. Here, $\lambda _1$ and $\lambda _2$ are empirically set to 1 and 20, respectively.

Fine-tuning: We fine-tune the pre-trained NLOS-LUCAI model using the same configuration for 200 epochs with a batch size of 8. The Adam solver is utilized to optimize the weights of the two sub-networks. The learning rate of the BICN is set to $1.5 \times 10^{-4}$ and reduced by a factor of $0.7$ every $5$ epochs, and the learning rate of the NLN is set to $5 \times 10^{-5}$ and reduced by a factor of $0.9$ every 4 epochs. Both $\lambda _1$ and $\lambda _2$ are empirically set to 1. The source code of the NLOS-LUCAI model will be made publicly available in the future.
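The step-decay schedules described above can be realized with PyTorch's StepLR scheduler; a minimal sketch of the pre-training configuration, assuming the bicn and nln modules sketched in Sec. 3 and one possible arrangement with separate optimizers for the two sub-networks:

```python
import torch
from torch.optim.lr_scheduler import StepLR

bicn_opt = torch.optim.Adam(bicn.parameters(), lr=1.5e-4)
nln_opt  = torch.optim.Adam(nln.parameters(),  lr=1e-4)
bicn_sched = StepLR(bicn_opt, step_size=5, gamma=0.7)   # x0.7 every 5 epochs
nln_sched  = StepLR(nln_opt,  step_size=1, gamma=0.9)   # x0.9 every epoch

for epoch in range(50):                                 # 50 pre-training epochs, batch size 8
    # ... one epoch of joint training as sketched in Sec. 3 ...
    bicn_sched.step()
    nln_sched.step()
```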

4.3 Experimental results

We set up experiments to evaluate the performance of different methods for NLOS localization under changing ambient lighting conditions: (a) a baseline method which directly uses the real-captured RGB image $\mathbf {I^\text {R}}$ with ambient light to train the NLOS localization network, (b) the background subtraction method which acquires individual background images $\mathbf {I^\text {B}}$ under various illumination conditions and computes the background-corrected results as $\mathbf {I^\text {R}} - \mathbf {I^\text {B}}$ to train the NLOS localization network, (c) our proposed NLOS-LUCAI model, (d) the model consisting of 3 convolutional layers and 1 fully connected layer proposed by Tancik et al. [23], and (e) the model consisting of 3 convolutional layers and 2 fully connected layers proposed by Chandran et al. [33]. Note that we re-implemented the two data-driven NLOS localization methods proposed by Tancik et al. [23] and Chandran et al. [33] and applied the same network settings described in these papers [23,33]. In our experiments, we utilized the background-corrected images ($\mathbf {I^\text {R}} - \mathbf {I^\text {B}}$) to train these two light-weight NLOS localization networks. All of the evaluated NLOS localization methods are trained/fine-tuned using our own simulated/real-captured NLOS images to ensure a fair comparison.
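For clarity, the calibration-based background correction used by methods (b), (d), and (e) above amounts to a per-pixel subtraction of a pre-captured background frame; a minimal sketch (the clipping and dtype handling are assumptions):

```python
import numpy as np

def background_subtract(I_R, I_B):
    """Calibration-based correction: subtract the per-pixel background image and clip to a valid range."""
    return np.clip(I_R.astype(np.float32) - I_B.astype(np.float32), 0.0, None)

I_R = np.random.randint(0, 256, (128, 128, 3), dtype=np.uint8)  # raw frame with hidden object
I_B = np.random.randint(0, 256, (128, 128, 3), dtype=np.uint8)  # background frame, same illumination
print(background_subtract(I_R, I_B).shape)
```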

4.3.1 Evaluation results on simulated data

We make use of the $11,664$ images of 12 mannequins rendered in 81 locations and under 12 illumination conditions (Level 0, 1, 3, 4, 6, 7, 9, 10, 12, 13, 15, 16) to train the above-mentioned NLOS localization models. Their performance is then evaluated on $1,458$ images of 3 different mannequins rendered under 6 unseen illumination conditions (Level 2, 5, 8, 11, 14, 17). Figure 10 visualizes the localization errors in 81 locations under illumination Level 2 and Table 1 summarizes the quantitative results.

Fig. 10. (a) Top view of the scene layout and NLOS localization error visualization of (b) the baseline method, (c) the background subtraction method, and (d) our proposed NLOS-LUCAI model on the simulated images rendered under Level 2 illumination condition.


Table 1. Quantitative NLOS localization results (MAE [mm] and RMSE [mm]) on simulated images under various illumination conditions. Red indicates the best performance.

It is experimentally observed that the two light-weight CNN models (Tancik et al. [23] and Chandran et al. [33]) cannot achieve satisfactory NLOS localization results. The evaluation results suggest that it is important to deploy proven CNN architectures such as the encoder-decoder [29] and ResNet [30] to extract distinctive multi-scale features and accelerate network convergence for high-performance NLOS tasks. The experimental results also reveal that it is important to compensate for interference caused by ambient illumination in order to achieve satisfactory localization accuracy under changing ambient lighting conditions. Applying either calibration-based or model-based background correction decreases NLOS localization errors (lower MAE and RMSE values). Moreover, our proposed NLOS-LUCAI model achieves more accurate NLOS localization results than the other alternatives. As shown in Fig. 10 (b), (c), and (d), such improvements are particularly obvious for the regions far away from the relay wall (e.g., the upper right corner), where light signals reflected from the hidden objects are weaker and more susceptible to ambient light interference.

Another noticeable advantage of the proposed NLOS-LUCAI model is that it can be trained using images captured under a number of pre-defined lighting conditions (Level 0, 1, 3, 4, 6, 7, 9, 10, 12, 13, 15, 16) and then effectively extended to unseen illuminations. Therefore, NLOS-LUCAI can generate high-quality NLOS localization results for the Level 2, 5, 8, 11, 14, 17 illumination conditions which are not included in the training phase. In comparison, the background subtraction methods require capturing calibration images of the scene without objects under each individual illumination to properly calculate the ambient-light-free images for the subsequent localization task. It is worth mentioning that the proposed NLOS-LUCAI model only takes about 31 ms on an NVIDIA GeForce GTX 1660Ti GPU to process a single input frame and thus can facilitate real-time NLOS tracking of hidden targets.

4.3.2 Evaluation results on real-captured data

We further evaluate the performance of different NLOS localization methods on the real-captured images. More specifically, the training dataset includes $525$ realistic NLOS images of #1 - #3 3D printed mannequins captured in 25 different locations and under 7 illumination conditions (Level 0, 1, 2, 4, 5, 7, 8). The testing dataset contains $200$ images of #4 mannequin captured under all 8 illumination conditions including two unseen settings (Level 3 and 6).

Instead of directly using the real-captured images to train the NLOS localization models, we first perform model pre-training using sufficient simulated images and then fine-tune the pre-trained models using a relatively small number of realistic NLOS training samples. As illustrated in Fig. 11, such a pre-training/fine-tuning strategy achieves significantly higher localization accuracy, which is consistent with previous studies [34,35]. Figure 11 shows the quantitative results on real-captured NLOS images. Our proposed NLOS-LUCAI model performs simultaneous background illumination correction and NLOS localization, and thus can effectively remove interference caused by ambient illumination and achieve high-accuracy localization of hidden objects. Similar to the experimental results on simulated images, our proposed NLOS-LUCAI model also achieves more accurate localization results when applied to realistic NLOS scenes, validating its effectiveness and generalization ability.
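The pre-training/fine-tuning strategy itself is straightforward to express; a minimal sketch, assuming the bicn and nln modules from Sec. 3 and a hypothetical checkpoint path:

```python
import torch

# After pre-training on the simulated dataset, save both sub-networks (hypothetical filename)
torch.save({"bicn": bicn.state_dict(), "nln": nln.state_dict()}, "nlos_lucai_pretrained.pth")

# Before fine-tuning on the smaller real-captured dataset, restore the pre-trained weights,
# then continue training with the Sec. 4.2 fine-tuning settings (lambda_1 = lambda_2 = 1)
ckpt = torch.load("nlos_lucai_pretrained.pth")
bicn.load_state_dict(ckpt["bicn"])
nln.load_state_dict(ckpt["nln"])
```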

Fig. 11. Quantitative comparison of different NLOS localization methods on real-captured data with or without fine-tuning. (a) the model proposed by Tancik et al. [23], (b) the model proposed by Chandran et al. [33], (c) our proposed NLOS-LUCAI model.

4.4 Performance analysis

In this section, we set up experiments to evaluate the performance of three different schemes for training the proposed NLOS-LUCAI model: (1) single-task training of NLOS localization without referring to the ambient-light-free images, (2) individual (separate) training of the background illumination correction and NLOS localization sub-networks, and (3) joint multi-task training of background illumination correction and NLOS localization. For a fair comparison, the experiments are conducted using the same CNN architecture as shown in Fig. 3. The detailed configurations of the three schemes are summarized in Table 2.


Table 2. The configurations of three different schemes to train the proposed NLOS-LUCAI model.

In Table 3, we show quantitative evaluation results on simulated NLOS images. We calculate PSNR and RMSE to evaluate the performance of the background illumination correction and NLOS localization sub-tasks, respectively. Although the individual multi-task training scheme generates ambient-light-free images that are closer to the ground truth (higher PSNR values), its performance is not satisfactory on the subsequent NLOS localization sub-task. These experimental results suggest that it is not reasonable to separate the training of two highly dependent and complementary sub-tasks. In comparison, our proposed joint multi-task training scheme incorporates background illumination correction as a supplementary task into the overall NLOS localization pipeline, providing the best option for suppressing ambient illumination interference and achieving accurate localization results.


Table 3. Quantitative evaluation results (PSNR [dB] and RMSE [mm]) of different training schemes on simulated images. Note the single-task training scheme directly estimates NLOS localization results and does not generate background illumination corrected images for calculating PSNR values. Red indicates the best performance.

5. Conclusion

In recent years, numerous steady-state NLOS imaging solutions using conventional intensity cameras have been proposed for fast, accurate, and low-cost localization and identification of hidden objects, alleviating the requirement for expensive time-of-flight sensors with extremely high temporal resolutions. However, these intensity-based NLOS imaging methods typically require prior knowledge of the ambient illumination condition to separate the object-related reflective photons from the undesirable ambient light, hindering their practical use in real-world scenarios. In this paper, we present a complete steady-state NLOS data acquisition and processing framework to address the above-mentioned limitation. We build physical and virtual NLOS image acquisition systems for capturing realistic and simulated images under various ambient light conditions. Moreover, we design an end-to-end multi-task NLOS-LUCAI model for simultaneous correction of ambient light interference and localization of the hidden objects. Experiments have verified the benefits of the proposed framework, including a reduced requirement for prior scene knowledge and better performance under changing ambient light.

In this paper, we only consider the uniform lighting condition and evaluate the performance of data-driven NLOS localization methods when the intensity of ambient illumination changes. In the future, we plan to further investigate how variation in ambient illumination distribution affects the performance of NLOS localization. Also, we plan to make use of the proposed framework to solve other NLOS imaging tasks such as target identification. Finally, we plan to further optimize the efficiency and robustness of the proposed method, expanding the “field of vision” of sensing systems for performance-improved rescue operations, medical imaging, and autonomous driving.

Funding

National Natural Science Foundation of China (52075485).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. S. Li, M. Deng, J. Lee, A. Sinha, and G. Barbastathis, “Imaging through glass diffusers using densely connected convolutional networks,” Optica 5(7), 803–813 (2018). [CrossRef]  

2. Y. Li, Y. Xue, and L. Tian, “Deep speckle correlation: a deep learning approach toward scalable imaging through scattering media,” Optica 5(10), 1181–1190 (2018). [CrossRef]  

3. Y. Sun, Z. Xia, and U. S. Kamilov, “Efficient and accurate inversion of multiple scattering with deep learning,” Opt. Express 26(11), 14678–14688 (2018). [CrossRef]  

4. M. Aittala, P. Sharma, L. Murmann, A. Yedidia, G. Wornell, B. Freeman, and F. Durand, “Computational mirrors: Blind inverse light transport by deep matrix factorization,” Adv. Neural Inf. Process. Syst. 32, 14311–14321 (2019).

5. S. Li, Q. Wang, X. Wei, Z. Cao, and Q. Zhao, “Three-dimensional reconstruction of integrated implosion targets from simulated small-angle pinhole images,” Opt. Express 28(23), 34848–34859 (2020). [CrossRef]  

6. H. Yanagihara, T. Kakue, Y. Yamamoto, T. Shimobaba, and T. Ito, “Real-time three-dimensional video reconstruction of real scenes with deep depth using electro-holographic display system,” Opt. Express 27(11), 15662–15678 (2019). [CrossRef]  

7. W. Zhao, B. Zhang, C. Xu, L. Duan, and S. Wang, “Optical Sectioning Tomographic Reconstruction of Three-Dimensional Flame Temperature Distribution Using Single Light Field Camera,” IEEE Sens. J. 18(2), 528–539 (2018). [CrossRef]  

8. A. Bodenmann, B. Thornton, and T. Ura, “Generation of High-resolution Three-dimensional Reconstructions of the Seafloor in Color using a Single Camera and Structured Light,” J. Field Robotics 34(5), 833–851 (2017). [CrossRef]  

9. T. Maeda, G. Satat, T. Swedish, L. Sinha, and R. Raskar, “Recent advances in imaging around corners,” arXiv preprint arXiv:1910.05613 (2019).

10. D. Faccio, A. Velten, and G. Wetzstein, “Non-line-of-sight imaging,” Nat. Rev. Phys. 2(6), 318–327 (2020). [CrossRef]  

11. R. Geng, Y. Hu, and Y. Chen, “Recent advances on non-line-of-sight imaging: Conventional physical models, deep learning, and new scenes,” arXiv preprint arXiv:2104.13807 (2021).

12. A. Velten, D. Wu, A. Jarabo, B. Masia, C. Barsi, C. Joshi, E. Lawson, M. G. Bawendi, D. Gutierrez, and R. Raskar, “Femto-photography: Capturing and visualizing the propagation of light,” ACM Transactions on Graph. (SIGGRAPH 2013) 32, 4 (2013). [CrossRef]  

13. M. O’Toole, D. B. Lindell, and G. Wetzstein, “Confocal non-line-of-sight imaging based on the light-cone transform,” Nature 555(7696), 338–341 (2018). [CrossRef]  

14. B. Ahn, A. Dave, A. Veeraraghavan, I. Gkioulekas, and A. C. Sankaranarayanan, “Convolutional approximations to the general non-line-of-sight imaging operator,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, (2019), pp. 7889–7899.

15. M. O’Toole, F. Heide, D. B. Lindell, K. Zang, S. Diamond, and G. Wetzstein, “Reconstructing transient images from single-photon sensors,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2017), pp. 2289–2297.

16. M. Buttafava, J. Zeman, A. Tosi, K. Eliceiri, and A. Velten, “Non-line-of-sight imaging using a time-gated single photon avalanche diode,” Opt. Express 23(16), 20997–21011 (2015). [CrossRef]  

17. F. Xu, G. Shulkind, C. Thrampoulidis, J. H. Shapiro, A. Torralba, F. N. C. Wong, and G. W. Wornell, “Revealing hidden scenes by photon-efficient occlusion-based opportunistic active imaging,” Opt. Express 26(8), 9945–9962 (2018). [CrossRef]  

18. S. Chan, R. E. Warburton, G. Gariepy, J. Leach, and D. Faccio, “Non-line-of-sight tracking of people at long range,” Opt. Express 25(9), 10109–10117 (2017). [CrossRef]  

19. M. L. Manna, J.-H. Nam, S. A. Reza, and A. Velten, “Non-line-of-sight-imaging using dynamic relay surfaces,” Opt. Express 28(4), 5331–5339 (2020). [CrossRef]  

20. M. Laurenzis, J. Klein, E. Bacher, and N. Metzger, “Multiple-return single-photon counting of light in flight and sensing of non-line-of-sight objects at shortwave infrared wavelengths,” Opt. Lett. 40(20), 4815–4818 (2015). [CrossRef]  

21. W. Chen, S. Daneau, C. Brosseau, and F. Heide, “Steady-state non-line-of-sight imaging,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2019), pp. 6783–6792.

22. J. Klein, C. Peters, J. Martín, M. Laurenzis, and M. B. Hullin, “Tracking objects outside the line of sight using 2d intensity images,” Sci. Rep. 6(1), 32491 (2016). [CrossRef]  

23. M. Tancik, G. Satat, and R. Raskar, “Flash photography for data-driven hidden scene recovery,” arXiv preprint arXiv:1810.11710 (2018).

24. K. L. Bouman, V. Ye, A. B. Yedidia, F. Durand, G. W. Wornell, A. Torralba, and W. T. Freeman, “Turning corners into cameras: Principles and methods,” in 2017 IEEE International Conference on Computer Vision (ICCV), (2017), pp. 2289–2297.

25. P. Caramazza, A. Boccolini, D. Buschek, M. Hullin, C. F. Higham, R. Henderson, R. Murray-Smith, and D. Faccio, “Neural network identification of people hidden from view with a single-pixel, single-photon detector,” Sci. Rep. 8, 11945 (2018). [CrossRef]  

26. M. Tancik, T. Swedish, G. Satat, and R. Raskar, “Data-driven non-line-of-sight imaging with a traditional camera,” in Imaging Systems and Applications, (Optical Society of America, 2018), pp. IW2B–6.

27. N. Scheiner, F. Kraus, F. Wei, B. Phan, F. Mannan, N. Appenrodt, W. Ritter, J. Dickmann, K. Dietmayer, B. Sick, and F. Heide, “Seeing around street corners: Non-line-of-sight detection and tracking in-the-wild using doppler radar,” in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2020), pp. 2065–2074.

28. X. Lei, L. He, Y. Tan, K. X. Wang, X. Wang, Y. Du, S. Fan, and Z. Yu, “Direct object recognition without line-of-sight using optical coherence,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2019), pp. 11729–11738.

29. L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, “Encoder-decoder with atrous separable convolution for semantic image segmentation,” in Proceedings of the European conference on computer vision (ECCV), (2018), pp. 801–818.

30. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2016), pp. 770–778.

31. J. Grau Chopite, M. B. Hullin, M. Wand, and J. Iseringhausen, “Deep non-line-of-sight reconstruction,” in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2020), pp. 957–966.

32. G. Satat, M. Tancik, O. Gupta, B. Heshmat, and R. Raskar, “Object classification through scattering media with deep learning on time resolved measurement,” Opt. Express 25(15), 17466–17479 (2017). [CrossRef]  

33. S. Chandran and S. Jayasuriya, “Adaptive lighting for data-driven non-line-of-sight 3d localization and object identification,” in 30th British Machine Vision Conference, BMVC 2019, (2020).

34. N. Tajbakhsh, J. Y. Shin, S. R. Gurudu, R. T. Hurst, C. B. Kendall, M. B. Gotway, and J. Liang, “Convolutional neural networks for medical image analysis: Full training or fine tuning?” IEEE Transactions on Med. Imaging 35(5), 1299–1312 (2016). [CrossRef]  

35. K. Nogueira, O. A. Penatti, and J. A. Dos Santos, “Towards better exploiting convolutional neural networks for remote sensing scene classification,” Pattern Recognit. 61, 539–556 (2017). [CrossRef]  


