
Adaptive compressed photon counting 3D imaging based on wavelet trees and depth map sparse representation

Open Access

Abstract

We demonstrate a photon counting 3D imaging system with short-pulsed structured illumination and a single-pixel photon counting detector. The proposed multiresolution photon counting 3D imaging technique acquires a high-resolution 3D image from a coarse image and details at successively finer resolutions, sampled along the wavelet trees and their depth map sparse representations. Both the required number of measurements and the reconstruction time can be significantly reduced, which makes the proposed technique suitable for scenes with high spatial resolution. The experimental results indicate that both the reflectivity and the depth map of a scene at resolutions up to 512×512 pixels can be acquired and retrieved with practical acquisition times as low as 17.5 seconds. In addition, we demonstrate that the technique has the ability to image in the presence of partially-transmissive occluders, and to directly acquire novelty images that reveal changes in a scene.

© 2016 Optical Society of America

1. Introduction

Photon counting 3D imaging has attracted considerable attention in recent years for its ability to acquire the 3D structure and reflectivity of a scene at ultra-low light levels. Owing to its high sensitivity and excellent temporal resolution, its potential applications span from biological imaging of delicate samples [1, 2] to long-range remote sensing [3, 4].

In a photon counting 3D imaging system, range is obtained based on the time-of-flight (TOF) principle [5], which measures the time required for photons to travel from the laser transmitter to the target and back to the receiver. Conventional photon counting systems achieve high temporal resolution but suffer from poor spatial resolution, and fall into two categories: raster-scanning systems and detector arrays. In the former, transverse spatial resolution is obtained by a single photon counting detector via raster scanning [5–7]. Since the scanning time is linearly proportional to the number of pixels, obtaining a high-resolution image takes too much time, and the mechanical movement decreases the stability of the system. The latter instead uses photon counting detector arrays to acquire the 3D information of the scene, such as the APD arrays used in the Jigsaw system created at MIT Lincoln Laboratory [8, 9]. Recently, Gariepy et al. demonstrated the potential of SPAD cameras for ultrafast imaging, with applications to light-in-flight imaging [10] and the tracking of moving objects hidden from view [11]. While single Geiger-mode APDs are well developed, APD arrays remain limited in high-resolution fabrication, with resolutions of only 128×32 commercially available [12]. Moreover, photon counting detector arrays suffer from high dark count rates, pixel cross-talk, and resource-intensive correlation processing in TOF ranging. Although these devices are still in their infancy, SPAD camera technology is expected to play a significant role in the future.

Several works have addressed the limited spatial resolution in 3D imaging. Baoqing Sun et al. [13] utilized structured illumination and spatially separated photodiodes to obtain multiple images with different shading information, which allows the surface gradients, and hence the 3D image, to be reconstructed. Kirmani et al. [14] proposed a raster-scanning-based method that uses only the first photon detection at each pixel to reduce the data acquisition time. Recently, Mingjie Sun et al. [15] presented a single-pixel 3D imaging system that employs Hadamard matrices to provide pulsed structured illumination and a high-speed photodiode to sample the time-varying intensity of the back-scattered light from a scene. Another class of methods employs a single-element time-resolved detector sensitive to individual photons in a single-pixel camera configuration [16, 17], where the transverse resolution is obtained by applying compressed sensing (CS) principles [18–20]. Howland et al. [21] proposed a photon counting laser radar system for 3D imaging in which spatial resolution is obtained by compressed sensing without scanning or array detection. Range is determined by gating on a range of interest; thus, a full range sweep is required to acquire a complete scene depth map, with a separate reconstruction for each range of interest. Kirmani et al. [22, 23] presented a method for acquiring depth maps based on parametric signal modeling and the sparsity of the Laplacian of the depth map of a typical scene. However, the method is computationally complex and requires substantial processing power and a long time to reconstruct the depth map. Subsequently, Howland et al. [24] improved their protocol by separately recovering the reflectivity and a modulated image that contains information on both the reflectivity and the depth of the scene. The protocol is simple and effective, but it fails to image in the presence of partially-transmissive occluders.

Although these CS-based methods achieve non-scanning range acquisition with a single-pixel photon counting detector, the reconstruction usually relies on iterative algorithms to solve optimization problems, which introduces a computational overhead. The computation time increases exponentially with the spatial resolution, which limits the applications. Fortunately, these shortcomings can be overcome by a novel single-pixel image sampling technique called adaptive compressed sampling (ACS), reported by Averbuch et al. [25]. The technique takes a coarse image at low resolution and adaptively senses the regions that contain most of the edges at a higher resolution, guided by the wavelet trees. The image is then recovered by performing an inverse wavelet transform, which avoids the computational overhead of CS and thus speeds up the display stage. Similar ideas have been proposed to improve the performance of ghost imaging systems [26, 27]. In our previous work, we extended the concept of wavelet trees by adding the information of the sibling wavelet coefficients, and proposed an improved version (adaptive compressed sampling based on extended wavelet trees, EWT-ACS) [28]. EWT-ACS optimizes the edge-searching process to improve image quality and further reduces the number of measurements by exploiting redundancies in the sparse sampling process. The technique has been applied to single-pixel video acquisition [29] and color imaging [30].

In this manuscript, we apply the idea of adaptively sensing edge regions based on extended wavelet trees to 3D imaging, and propose a multiresolution photon counting 3D imaging technique. Both the reflectivity and the depth map are obtained with short-pulsed structured illumination and a single-pixel photon counting detector. The proposed technique is well suited to scenes with high spatial resolution, for the following reasons. First, the depth map of a scene is generally more compressible, or sparser, than the reflectivity. By directly exploiting the sparsity of the depth map of a natural scene in the wavelet domain, the acquisition time can be significantly reduced. Second, since the depth map is retrieved through an inverse wavelet transform instead of the computationally intensive optimization performed in CS, the time consumed to retrieve the depth map is also reduced, making the technique suitable for applications of real-time compressed 3D imaging such as object tracking. Third, even though the resolution of the final 3D image can be high, the number of measurements remains small owing to the adaptivity of the wavelet-trees-based sampling strategy. Fourth, the adaptive sampling technique is quality oriented, allowing more control over the image quality.

2. Methodology

2.1 Depth map sparse representation

In the photon counting 3D imaging system, a 3D image of the scene can be divided into a reflectivity image and a depth map, which can be acquired simultaneously. As only a small portion of the reflectivity and depth map of a natural scene consists of edges, only a small portion of their wavelet coefficients, corresponding to the regions with edges, are significantly different from zero, which means both of them are sparse or compressible in the wavelet domain.

A comparison of the sparsity of the reflectivity and the depth map is presented in Fig. 1, where Fig. 1(a) and 1(b) are the reflectivity and depth map of the target scene, and Fig. 1(c) and 1(d) are the corresponding wavelet representations. As can be seen, the depth map has fewer significant coefficients than the corresponding reflectivity. The same trend is shown by the curves presented in Fig. 1(e), where the coefficients shown in Fig. 1(c) and 1(d) are sorted by magnitude. The curve of the reflectivity (blue, solid) descends more slowly than that of the depth map (red, dashed). About 19.24% of the coefficients in the wavelet representation of the reflectivity are nonzero, while the figure is only about 8.33% for the depth map.
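This comparison can be reproduced in outline with PyWavelets. The sketch below is illustrative only: the house scene of Fig. 1 is not available here, so a random texture stands in for the reflectivity and a piecewise-constant array for the depth map, and the relative threshold is an assumed parameter.

```python
import numpy as np
import pywt

def sparsity_fraction(img, wavelet="haar", level=3, rel_thresh=1e-3):
    """Fraction of wavelet coefficients whose magnitude exceeds a small
    threshold relative to the largest coefficient magnitude."""
    coeffs = pywt.wavedec2(img, wavelet, level=level)
    arr, _ = pywt.coeffs_to_array(coeffs)
    mags = np.abs(arr)
    return float(np.mean(mags > rel_thresh * mags.max()))

# Stand-ins for Fig. 1(a) and 1(b): textured reflectivity vs. a
# piecewise-constant depth map.
reflectivity = np.random.rand(512, 512)
depth_map = np.zeros((512, 512))
depth_map[128:384, 128:384] = 1.0

print("reflectivity nonzero fraction:", sparsity_fraction(reflectivity))
print("depth map nonzero fraction:  ", sparsity_fraction(depth_map))
```

As expected, the piecewise-constant stand-in for the depth map yields a much smaller fraction of significant coefficients than the textured stand-in for the reflectivity.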

Fig. 1 Comparison of the sparsity of the scene reflectivity and depth map in the wavelet domain. The target scene is a house with a resolution of 512×512 pixels, whose reflectivity and depth map are presented in (a) and (b), together with their three-level wavelet representations shown in (c) and (d). (e) presents curves where the coefficients shown in (c) and (d) are sorted by magnitude.

As can be seen from Fig. 1, depth maps are generally sparser than reflectivity, since the edges in reflectivity are caused by object contours, surface texture, and illumination variations, while only object contours contribute to the edges in depth maps. Therefore, to acquire a high quality 3D image with a single-pixel photon counting detector, it is more efficient to acquire the scene based on the sparsity of the depth map rather than that of the reflectivity.

2.2 Wavelet-trees-based sampling strategy

As can be seen from the wavelet decomposition, an image is composed of low-frequency and high-frequency coefficients. The low-frequency coefficients represent a coarse version of the image, while the high-frequency coefficients represent details at higher resolutions. Only a small portion of the high-frequency coefficients, corresponding to regions with edges, are large; these are called significant coefficients, and the corresponding regions with edges are called significant regions. Therefore, an image can be approximated, without solving any regression problem, using a low-resolution preview and a few significant coefficients, which is the principle of wavelet compression [31].

However, although the reflectivity and the depth map are sparse or compressible in the wavelet domain, this is not immediately useful for 3D imaging, since one has to know the locations of the significant coefficients before sampling them. Fortunately, the wavelet decomposition provides a hierarchical data structure called wavelet trees for significant region estimation [32, 33]. Wavelet trees are composed of the coefficients at different resolutions and different orientations but with the same spatial location. The coefficient at the lower resolution is called a parent coefficient, and the four coefficients at the higher resolution are called its children coefficients. As the spatial region of a child coefficient is a subset of the spatial region of its parent, coefficient values tend to propagate across resolutions along the wavelet trees. In addition, being projections of the same spatial information in three orientations, the three coefficients at the same resolution and spatial location, called sibling coefficients, are strongly correlated. Combining the parent–children correlation and the sibling correlation, extended wavelet trees are used to find significant coefficients at a higher resolution. A threshold thr_coef is defined as the criterion for estimating the significance of coefficients, i.e. the sharpness of the edges in the corresponding regions. If the magnitude of a parent coefficient is larger than thr_coef, its four children coefficients are considered significant and need to be sampled. In addition, the sampled sibling coefficients of the parent coefficient can be used to improve the estimation accuracy [28].
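A minimal sketch of this significance test follows, assuming the three subbands of one decomposition level are available as numpy arrays; thr_coef and the child-indexing convention are illustrative assumptions, not the exact implementation of [28].

```python
import numpy as np

def significant_children(parent_band, thr_coef):
    """One subband (LH, HL or HH) at the coarse scale: flag the four
    children of every parent whose magnitude exceeds thr_coef."""
    mask = np.abs(parent_band) > thr_coef
    # parent (r, c) owns the 2x2 child block at (2r:2r+2, 2c:2c+2)
    return np.kron(mask, np.ones((2, 2), dtype=bool))

def regions_to_sample(lh, hl, hh, thr_coef):
    """Extended wavelet trees: sample a child region if ANY of the three
    sibling subbands flags it (parent-children plus sibling correlation)."""
    return (significant_children(lh, thr_coef)
            | significant_children(hl, thr_coef)
            | significant_children(hh, thr_coef))
```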

In wavelet-trees-based sampling, the coefficients thought to be significant are sampled directly using the patterns that form the wavelet basis. As an example, a horizontal Haar wavelet coefficient can be written as [34]

$$w_{j,k}^{1}=\left\langle f,\psi_{j,k}^{1}\right\rangle=2^{j}\left[\int_{2^{-j}k_{1}}^{2^{-j}(k_{1}+1)}\int_{2^{-j}k_{2}}^{2^{-j}(k_{2}+1/2)}f(x_{1},x_{2})\,\mathrm{d}x_{1}\mathrm{d}x_{2}-\int_{2^{-j}k_{1}}^{2^{-j}(k_{1}+1)}\int_{2^{-j}(k_{2}+1/2)}^{2^{-j}(k_{2}+1)}f(x_{1},x_{2})\,\mathrm{d}x_{1}\mathrm{d}x_{2}\right],\tag{1}$$
where $f$ represents an image, $x_1, x_2$ are indices over pixels, $j$ is the scale (which determines the resolution of the coefficients), $k=(k_1,k_2)$ is a 2D location, and $e\in\{1,2,3\}$ labels the three types of coefficients with different spatially oriented frequencies: LH ($e=1$) detects horizontal edges, HL ($e=2$) detects vertical edges, and HH ($e=3$) detects diagonal edges. The wavelet basis function $\psi_{j,k}^{e}$ represents the edges at scale $j$, location $k$, and orientation $e$. As Eq. (1) shows, a horizontal coefficient is measured with two patterns corresponding to the two halves of the wavelet basis function, and its value is the difference of the two measurements multiplied by a scale factor of $2^{j}$. In addition, the number of measurements can be further reduced by reusing the measurements of sibling coefficients sampled previously.
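The following sketch illustrates Eq. (1) as a differential single-pixel measurement under idealized, noise-free conditions; haar_lh_patterns and measure are hypothetical helpers for illustration, not part of the actual instrument control code.

```python
import numpy as np

def haar_lh_patterns(n, j, k1, k2):
    """Binary DMD patterns (n x n) for the positive and negative halves of
    the horizontal Haar wavelet at scale j and location (k1, k2); the
    support is a square of side n * 2^-j pixels, split along x2."""
    side = n >> j
    r0, c0 = k1 * side, k2 * side
    pos = np.zeros((n, n))
    neg = np.zeros((n, n))
    pos[r0:r0 + side, c0:c0 + side // 2] = 1
    neg[r0:r0 + side, c0 + side // 2:c0 + side] = 1
    return pos, neg

def measure(pattern, scene):
    """Idealized bucket measurement: total light returned under a pattern."""
    return float(np.sum(pattern * scene))

scene = np.random.rand(8, 8)                    # toy scene
pos, neg = haar_lh_patterns(8, j=1, k1=0, k2=1)
w = 2 ** 1 * (measure(pos, scene) - measure(neg, scene))   # Eq. (1)
```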

After all the significant wavelet coefficients are sampled, the image can be recovered using an inverse wavelet transform. For more details, please refer to our previous work [28].

2.3 Adaptive compressed photon counting 3D imaging

The schematic of our 3D imaging system is shown in Fig. 2. The combination of a pulsed laser driven by a pulse generator [the black arrow in Fig. 2] and a digital micro-mirror device (DMD) provides structured illumination to a scene containing targets at different depths. The back-scattered photons are detected by a photon counting type photomultiplier tube (PMT) module. A time-correlated single-photon counting (TCSPC) module correlates the photon arrivals [the green arrow in Fig. 2] with the transmitted pulses [the red arrow in Fig. 2] to find each photon’s TOF. For each pattern, the measurement is performed multiple times and the output of the TCSPC module [the blue arrow in Fig. 2] is sent to a computer. A histogram of photon travel times is then computed to obtain information about surfaces at different depths within the field of view and to exclude the effects of noise, such as detector dark counts, ambient light, and shot noise. Acting as a pattern generator, the computer also generates the patterns to be displayed at a higher resolution [the yellow arrow in Fig. 2] based on the coefficients previously sampled. Detailed information about the experimental setup is provided in Section 3.1.
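In outline, the per-pattern processing implied by Fig. 2 amounts to histogramming photon arrival times. A minimal sketch follows, with the bin width taken from Section 3.2 and the arrival-time array assumed to come from the TCSPC module.

```python
import numpy as np

BIN_WIDTH_PS = 20    # 4 ps hardware bins, five merged (see Section 3.2)
L_BINS = 100         # spans the ~30 cm range of interest at 3 mm per bin

def histogram_arrivals(arrival_times_ps):
    """One timing histogram per projected pattern, built from the photon
    arrival times (in ps, relative to the range window) reported by the
    TCSPC module."""
    edges = np.arange(L_BINS + 1) * BIN_WIDTH_PS
    hist, _ = np.histogram(arrival_times_ps, bins=edges)
    return hist
```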


Fig. 2 Experimental setup. The combination of a pulsed laser and a DMD is implemented to provide structured illumination onto a scene, and the back-scattered photons are detected by a photomultiplier tube module (photon counting type). The timing histograms computed by a TCSPC module are used to reconstruct both reflectivity and depth maps.


The multiresolution 3D imaging technique relies on the idea of adaptively sensing the regions of significant wavelet coefficients at higher resolutions, based on the wavelet trees and the depth map sparse representation. An overview of the proposed technique is given in Fig. 3.

Fig. 3 Overview of the 3D imaging technique at the s-th stage. The combination of a pulsed laser and DMD patterns at the s-th stage (a) provides structured illumination to a scene. The back-scattered photons (b) are time-correlated with the transmitted pulses to compute the photon counting histograms (c). A wavelet coefficient cube (d) (the intensity of the coefficients is enhanced for better illustration) is obtained by combining the computed wavelet coefficients and the low-resolution coefficients previously sampled, and is converted to real space via an inverse wavelet transform to obtain an image cube (e). Histograms recording the TOF of photons reaching different transverse pixels (f) are then obtained, from which the depth map (g) and reflectivity (h) can be estimated. Finally, the patterns to be displayed at the next stage are generated by significant region estimation based on the wavelet trees and depth map sparse representation. The acquisition and data processing continue until the expected resolution is reached.

In order to acquire a 3D image with high resolution, the acquisition process is divided into several stages for multiresolution acquisition. At the first stage, the technique starts with an $N_1\times N_1$ pixel preview of the target scene, obtained by using a set of point-by-point scanning patterns $\{P_1^i\}$ ($s=1$, $i=1,2,\ldots,m_1$), where $s$ is an index over stages, $i$ is an index over patterns, and $m_s$ denotes the total number of patterns to be displayed at the $s$-th stage; $m_1=N_1^2$ due to the point-by-point scanning mechanism. The structured laser pulses shown in Fig. 3(a) are projected onto the scene, and the back-scattered photons are presented in Fig. 3(b). A set of photon counting histograms $\{H_1^i\}$ ($i=1,2,\ldots,m_1$) is computed by the TCSPC module, shown in Fig. 3(c), where $H_1^i$ denotes the photon counting histogram associated with the $i$-th pattern at the first stage. Each photon counting histogram has $L$ time bins corresponding to different depths $d_l=ct_l/2$ ($l=1,2,\ldots,L$), where $l$ is an index over depth along the longitudinal axis $z$, $c$ is the speed of light, and $t_l$ is the time of flight. Owing to the point-by-point scanning mechanism, each photon counting histogram $H_1^i$ simply records the TOF of the photons reaching an individual transverse pixel $(x,y)$ ($x=1,2,\ldots,N_1$, $y=1,2,\ldots,N_1$) during the measurement, as shown in Fig. 3(f). The depth map preview $I_1^D$ of the scene is then obtained by finding the maximum of the photon counting histogram for each pixel, while the reflectivity preview $I_1^R$ is computed by averaging the photon counting histograms, shown in Fig. 3(g) and 3(h) respectively.
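A compact sketch of these first-stage estimates follows, assuming the $m_1=N_1^2$ histograms have been reshaped into an (N1, N1, L) array; the Poisson test data and the half-bin offset are illustrative assumptions.

```python
import numpy as np

C = 3e8            # speed of light, m/s
BIN_S = 20e-12     # merged time-bin width, s (Section 3.2)

def previews_from_histograms(H):
    """H: (N1, N1, L) photon counting histograms, one per scanned pixel.
    Depth preview: d_l = c * t_l / 2 at the peak bin of each histogram.
    Reflectivity preview: average counts per pixel."""
    peak_bin = np.argmax(H, axis=2)
    depth = C * (peak_bin + 0.5) * BIN_S / 2.0
    reflectivity = H.mean(axis=2)
    return depth, reflectivity

H1 = np.random.poisson(2.0, size=(64, 64, 100))     # synthetic test data
depth_preview, refl_preview = previews_from_histograms(H1)
```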

The technique then moves to the next stage, where the number of pixels is increased by a factor of four to obtain significant coefficients at a higher resolution. We denote this stage as the $s$-th stage for legibility. A one-level wavelet transform is performed on the depth map obtained at the previous stage, $I_{s-1}^D$, to obtain its wavelet representation $W_{s-1}^D$. The significant regions can then be found from $W_{s-1}^D$ based on the wavelet trees and the depth map sparse representation. As described in Section 2.2, a set of patterns $\{P_s^i\}$ corresponding to the wavelet basis functions supported on the significant regions is generated and displayed on the DMD; their resolution is four times that of the previous stage, i.e. $N_s\times N_s=2N_{s-1}\times 2N_{s-1}$. As at the first stage, the 3D information is recorded in a set of photon counting histograms $\{H_s^i\}$ computed by the TCSPC module. For each time bin of the photon counting histograms, the technique uses the photon counts reaching that bin to compute the high-resolution significant wavelet coefficients based on the wavelet-trees-based sampling strategy described in Section 2.2. The high-resolution wavelet coefficients corresponding to the nonsignificant regions remain zero. Combining the computed high-resolution wavelet coefficients and the low-resolution coefficients previously sampled, a wavelet coefficient cube formed by $L$ wavelet coefficient matrices is obtained, as given in Fig. 3(d). The wavelet coefficient matrices are then converted to real space via an inverse wavelet transform to obtain an image cube, shown in Fig. 3(e). In the image cube, each transverse pixel $(x,y)$ ($x=1,2,\ldots,N_s$, $y=1,2,\ldots,N_s$) has a histogram recording the TOF of the photons reaching that pixel, shown in Fig. 3(f). The depth map $I_s^D$ and the reflectivity $I_s^R$ at the $s$-th stage can then be reconstructed in the same way as at the first stage.
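The per-bin inverse transform can be sketched with PyWavelets as follows; the (L, Ns, Ns) coefficient layout and the use of pywt's array/coefficient helpers are assumptions for illustration.

```python
import numpy as np
import pywt

def image_cube(coeff_cube, coeff_slices, wavelet="haar"):
    """coeff_cube: (L, Ns, Ns) wavelet-coefficient matrices, one per time
    bin, stored in pywt's packed array layout described by coeff_slices.
    Returns the image cube of Fig. 3(e): one transverse image per bin."""
    cube = np.empty_like(coeff_cube, dtype=float)
    for l in range(coeff_cube.shape[0]):
        coeffs = pywt.array_to_coeffs(coeff_cube[l], coeff_slices,
                                      output_format="wavedec2")
        cube[l] = pywt.waverec2(coeffs, wavelet)   # inverse transform
    return cube
```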

The technique sequentially acquires the scene with successively finer resolutions from the first stage to the last, and alternately uses the acquired wavelet representations of the intermediate depth maps to find the significant regions and generate the patterns to be displayed at the next stage. The acquisition continues until the expected resolution is reached. Combining the reflectivity and the depth map at the final resolution yields a high quality 3D image of the scene.

3. Experiments and results

3.1 Experimental setup

The schematic diagram of the experimental setup is shown in Fig. 2. A pulsed laser diode (PicoQuant LDH series, 830 nm center wavelength) with a 4 mW average illumination power, a 10 MHz repetition rate, and a 300 ps full width at half maximum is used as the illumination source. The DMD is a Vialux ALP 4.2 model with XGA resolution (i.e. 1024×768 pixels). While its modulation rate can reach up to 22.7 kHz, the illumination time of each pattern is set to 1 millisecond to ensure that enough photons are received. A standard Nikkor 50 mm photographic lens is used to project the patterns onto the scene. A photomultiplier tube (Hamamatsu photosensor module H7422P-50, photon counting type) with an active area 5 mm in diameter is used in conjunction with a 35 mm imaging lens, giving an 8.2° field of view, which matches that of the projection system. A narrow-band filter with a central wavelength of 830 nm and a full width at half maximum of 10 nm is mounted in front of the imaging lens to block room lights. The average count rate is 0.5 megacounts per second, and the corresponding average detected power is about 0.125 pW. A TCSPC module (PicoQuant PicoHarp 300) then correlates the back-scattered photons with the transmitted pulses to record each photon’s TOF.

3.2 3D imaging for scenes

In this experiment, we started the imaging with a 64×64 pixel preview and set the number of stages to 4, so the final resolution of the reflectivity and depth map was 512×512 pixels. The time bin of the TCSPC module was 4 ps, and five time bins were merged into one when rebuilding the histograms for computational efficiency. As the objects in the two scenes of our experiment were placed within a 30 cm range, the image cube consisted of one hundred 2D images, with a depth resolution of 3 mm. Another critical parameter is the compression ratio, defined as the number of projected patterns divided by the total number of pixels at the final resolution. For a more comprehensive analysis, we imaged each scene at compression ratios of 5%, 10%, and 20%.
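These numbers are mutually consistent: merging five 4 ps bins gives a 20 ps bin width, so

$$\Delta d=\frac{c\,\Delta t}{2}=\frac{3\times 10^{8}\ \mathrm{m/s}\times 20\ \mathrm{ps}}{2}=3\ \mathrm{mm},\qquad \frac{30\ \mathrm{cm}}{3\ \mathrm{mm}}=100\ \text{time bins}.$$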

First, the photon counting 3D imaging system was tested on a simple scene composed of cardboard cutouts of the letters “NJ”, “U”, and “ST”, together with a rubber cushion used as the background, as shown in Fig. 4(a). The objects were placed at distances of about 171 cm, 166 cm, and 162 cm respectively, and the background was located at a distance of ~173 cm. High quality reconstructions of both the reflectivity and the depth map of the scene were acquired at a resolution of 512×512 pixels and compression ratios of 5%, 10%, and 20%, shown in Fig. 4(b)-4(d) and Fig. 4(e)-4(g) respectively. The total times for acquisition, data transmission, and reconstruction at the three compression ratios were about 17.5 seconds, 31.1 seconds, and 57.9 seconds respectively, of which the reconstruction time is less than 1 second. There are slight distortions in Fig. 4(b) and 4(c) due to the omission of some edges in the reflectivity. As can be seen, the visible distortions are reduced and the reflectivity and depth maps become clearer as the compression ratio increases.

Fig. 4 Imaging with a simple scene. (a) is a photograph of the scene consisting of cardboard cutouts of the letters “NJ”, “U”, and “ST”. The reflectivity and the depth maps are both acquired at a resolution of 512×512 pixels and compression ratios of 5%, 10% and 20%, shown in (b)-(d) and (e)-(g) respectively.

We then tested our system on a natural scene containing real objects. As shown in Fig. 5(a), a magic cube and a porcelain cup placed on a black box were chosen as the targets, and a rubber cushion was used as the background. The scene was located at a distance of 150-180 cm. The reconstructed reflectivity at a resolution of 512×512 pixels and compression ratios of 5%, 10%, and 20% is presented in Fig. 5(b)-5(d), while the acquired depth maps are presented in Fig. 5(e)-5(g). The time consumed is almost the same as that for the simple scene. Since this scene is more complicated, there are more distortions in the reflectivity and depth maps. Nevertheless, both the reflectivity and the depth map can be acquired with high quality at a compression ratio of 5%, and the words on the porcelain cup can be seen clearly at a compression ratio of 20%.

Fig. 5 Imaging with a natural scene. (a) is a photograph of the scene consisting of a magic cube and a porcelain cup. The reflectivity and depth maps are both acquired at a resolution of 512×512 pixels and compression ratios of 5%, 10% and 20%, shown in (b)-(d) and (e)-(g) respectively.

3.3 Comparison

For comparison, 3D imaging with the CS-based methods proposed in [21] and [24] was performed. As mentioned above, the former obtains spatial resolution by compressed sensing and acquires range by temporal gating, while the latter collapses the 3D data cube to a modulated image and recovers the depth map via convex optimization. TVAL3 [35], one of the faster CS solvers available, was chosen to reconstruct the reflectivity and depth map. A set of zero-shifted randomly-permutated Walsh-Hadamard matrices [36] compatible with TVAL3 was chosen as the projected patterns. The results are given in Fig. 6, where Fig. 6(a)-6(c) and Fig. 6(d)-6(f) are the reflectivity and depth maps acquired by the method proposed in [21] at compression ratios of 5%, 10%, and 20%, and Fig. 6(g)-6(i) and Fig. 6(j)-6(l) are those acquired by the method proposed in [24]. The quality of the reflectivity recovered by the method of [21] is poor at such low compression ratios, while that recovered by the method of [24] has a fuzzy background. The depth maps recovered by both methods are satisfactory, except for distortions in the black part of the magic cube. The times consumed are much longer than those of our technique for this scene. The total times for acquisition, data transmission, and reconstruction at the three compression ratios with the method of [21] are about 3286.6 seconds, 2961.6 seconds, and 2287 seconds, which are infeasible in practice. The times consumed by the method of [24] are 53.8 seconds, 61.2 seconds, and 84 seconds respectively. It should be noted that the computational complexity of our technique depends on the number of time bins at whose corresponding depths targets are present. For long-range imaging scenarios where the objects are scattered over meters, the method of [24] will perform better in terms of time cost. A hybrid technique combining the advantages of wavelet-trees-based sampling and the idea of collapsing the 3D data cube to a modulated image would be a better choice for long-range imaging scenarios; owing to length limitations, this will be discussed in our future work.

Fig. 6 A natural scene imaged by the methods proposed in [21] and [24]. (a)-(c) and (d)-(f) are the reflectivity and depth maps acquired by the method proposed in [21] at a resolution of 512×512 pixels and compression ratios of 5%, 10%, and 20%. (g)-(i) and (j)-(l) are those acquired by the method proposed in [24].

3.4 3D imaging through a partially-transmissive occluder

The proposed technique has the ability to perform 3D imaging in the presence of partially-transmissive occluders, which is crucial for defense applications. Similar to the experiments in previous work [15, 21], a scene composed of cardboard cutouts of the letters “NJ”, “U”, and “ST” and a netting used as a partially-transmissive occluder was built, as illustrated in Fig. 7(a). The netting was located at a distance of ~155 cm and filled the field of view. The cardboard cutouts and the background were placed behind the netting, 160-180 cm from the projection lens. Taking the whole range as the range of interest, the retrieved reflectivity and depth map at 512×512 pixels and a compression ratio of 20% are presented in Fig. 7(b) and 7(c); as can be seen, the line of sight is blocked by the occluder. By performing an artificial gating on the TCSPC histograms to block the back-scattered photons from the netting, the reflectivity and depth map of the targets were successfully reconstructed at the same resolution and compression ratio, as shown in Fig. 7(d) and 7(e).
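A minimal sketch of such an artificial gate follows, assuming the histograms are stored as an array whose last axis indexes the L time bins; the range-window offset d0 and the gate limits are illustrative values chosen to match the scene geometry described above.

```python
import numpy as np

def gate_histograms(H, d0=1.50, bin_s=20e-12, d_min=1.60, d_max=1.80, c=3e8):
    """H: (..., L) photon counting histograms. Zero every time bin whose
    depth d = d0 + c * t / 2 lies outside [d_min, d_max] metres, which
    discards the netting's return at ~1.55 m while keeping the targets."""
    L = H.shape[-1]
    depths = d0 + c * (np.arange(L) + 0.5) * bin_s / 2.0
    gated = H.copy()
    gated[..., (depths < d_min) | (depths > d_max)] = 0
    return gated
```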

Fig. 7 3D imaging through a partially-transmissive occluder. (a) Illustration of the scene containing cardboard cutouts and a netting used as a partially-transmissive occluder. The reflectivity and depth map obtained without temporal gating are presented in (b) and (c), while (d) and (e) are the results reconstructed with temporal gating.

In addition, it is worth mentioning that the time-gating method works well only when the photons reflected back from the occluder can be easily separated from those reflected back from the targets. When the returns from the occluder and the targets partially overlap, time-gating can only mitigate the effects of the occluder, and more sophisticated methods should be used to estimate the multi-return signal, such as the method used in the raster scanning framework proposed in [37]. Developing a method that can deal with overlapping photon returns is part of our future work.

3.5 3D novelty imaging

For many applications, such as high frame-rate video, target localization, surveillance, and object tracking, it is important to identify changes in a scene at different times. Some methods [24, 38] use compressed sensing to exploit the sparsity of the relative changes of a scene with a moving object, reducing the number of measurements required to recover the image.

In contrast to conventional compressed sensing, which first measures the reference frame and the current frame and then recovers the novelty image from the difference of their measurement vectors, our technique uses a multiresolution framework to directly sense the regions with changes, based on wavelet-trees-based sampling and motion estimation in the wavelet domain. The motion estimation is performed by comparing the wavelet trees of the two frames to determine the regions with changes. The technique first takes a coarse low-resolution image, performs motion estimation, and then samples the significant regions with changes along the wavelet trees at a finer resolution. Iterating between wavelet-trees-based sampling and motion estimation, the multiresolution framework acquires novelty images at successively finer resolutions and alternately uses them to estimate the motion until the expected resolution is achieved, so the number of measurements can be significantly reduced, as sketched below. A similar idea was applied to video acquisition in our previous work [29].
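A minimal sketch of the change-detection step follows, assuming coarse previews of the reference and current frames are available as numpy arrays; thr_change is an assumed parameter, and the rule shown is a simplification of the wavelet-tree comparison.

```python
import numpy as np

def changed_regions(coarse_ref, coarse_cur, thr_change):
    """Boolean mask of coarse pixels whose change is significant; only
    their wavelet-tree descendants are re-sampled at the next stage."""
    return np.abs(coarse_cur - coarse_ref) > thr_change

def novelty_preview(coarse_ref, coarse_cur, thr_change):
    """Coarse difference image; negative values mark the former location
    of a moved object, as in Fig. 8(e)-(f)."""
    diff = (coarse_cur - coarse_ref).astype(float)
    diff[~changed_regions(coarse_ref, coarse_cur, thr_change)] = 0.0
    return diff
```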

Figure 8 gives an example of 3D novelty imaging using our technique. In the experiment, a scene consisting of cardboard cutouts of the letters “NJ”, “U”, and “ST” and a rubber cushion used as the background was built, in which the cardboard cutout “U” moved both away from the projector and from the left to the right of the scene, as shown in Fig. 8(a) and 8(b). Figure 8(c) and 8(d) give the reflectivity and the difference reflectivity respectively, while the depth map and the difference depth map are presented in Fig. 8(e) and 8(f). Both the reflectivity and the depth map are acquired at a compression ratio of 20%, while the difference images are acquired at only 5%, because the changes are much sparser than the scene.

Fig. 8 3D novelty imaging. (a) and (b) present photographs of the current scene and the reference scene, where the letter “U” has changed position (both transversely and longitudinally). (c) and (d) present the reflectivity and depth map of the current scene reconstructed at a compression ratio of 20%. (e) and (f) show the corresponding difference reflectivity and difference depth map obtained at a compression ratio of 5%, where negative values in the difference images denote the object’s former location.

4. Discussion

4.1 Noise consideration

We now discuss the influence of noise. There are three dominant sources of noise in a photon counting imaging system. Because of noise from various sources in the tube, the output of the PMT may contain pulses that are not related to the light input, referred to as dark counts. Since events caused by dark noise are uniformly distributed in time, the dark counts can be regarded as constant values added to the measurements. These values can be measured by blocking the detector and counting the events in the absence of light, and then subtracted from the measurement results to eliminate the influence of dark counts. Moreover, the dark count rate of the detector in our experiment is less than 250 counts per second, which is much lower than the signal count rate.
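A one-function sketch of this correction follows, assuming a dark histogram accumulated over several blocked acquisitions of the same duration as one measurement.

```python
import numpy as np

def subtract_dark(H, dark_hist, n_dark_sweeps):
    """H: measurement histograms; dark_hist: counts accumulated over
    n_dark_sweeps blocked acquisitions, each of the same duration as one
    measurement. Subtract the constant dark level, clipping at zero."""
    dark_per_sweep = dark_hist / float(n_dark_sweeps)
    return np.clip(H - dark_per_sweep, 0, None)
```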

Another noise source is ambient light. Although ambient light levels fluctuate, the proposed technique can cope with them owing to the nature of the wavelet coefficient sampling principle. As mentioned in Section 2, two patterns corresponding to the same wavelet coefficient are projected, and the coefficient is calculated from the difference of the two measurements. Since the pattern projection rate is up to 1 kHz, the ambient light can be regarded as unchanged between the two measurements, and its contribution is largely removed by the differencing.

The last noise source is shot noise, which describes the fluctuations in the number of detected photons due to photons arriving independently of each other. In general, the effect of shot noise on each pixel is random. On one hand, according to the masking effects in the human visual system (HVS), distortions in regions containing edges are masked and hardly noticeable [39]; therefore, shot noise has little influence on the edge regions, which correspond to the significant coefficients. On the other hand, since our technique relies on the strategy of searching for edges along the depth map sparse representation, only coefficients corresponding to edge regions that exceed the predetermined threshold are sampled. The pixels of the smooth regions are approximated by coefficients at low resolution, which mitigates the effects of shot noise in those regions. As the experimental results show, high quality reflectivity and depth maps can be acquired in the presence of these noise sources.

4.2 Adaptivity of the technique

According to the wavelet-trees-based sampling strategy, an image at a finer resolution consists of a coarse image and some details at the finer resolution. The coarse image can be sampled with a small number of measurements. Since coefficient values tend to propagate across resolutions along the wavelet trees, only the regions of details corresponding to coefficients significantly different from zero in the wavelet domain of the coarse image are sampled at the finer resolution, and these in turn are used to find the regions of details at the next finer resolution. These steps are repeated until the expected resolution is reached.

The idea behind our technique is to sample details at successively finer resolutions along the wavelet trees, and to recover the 3D image from the coarse image and the details. Therefore, even though the resolution of the final 3D image can be high, the number of measurements remains small owing to the adaptivity of the wavelet-trees-based sampling strategy. To illustrate this, a simulation on the reflectivity and depth map shown in Fig. 1 was carried out. For a quantitative comparison of the results, the peak signal-to-noise ratio (PSNR) is introduced, as presented in Eq. (3).

$$\mathrm{PSNR}=10\log_{10}\frac{255^{2}}{\mathrm{MSE}},\tag{3}$$

where MSE is the mean square error, presented in Eq. (4):

$$\mathrm{MSE}=\frac{1}{ab}\sum_{p=1}^{a}\sum_{q=1}^{b}\left[T_{0}(p,q)-T_{r}(p,q)\right]^{2}.\tag{4}$$

Here $T_{0}$ is the original image consisting of $a\times b$ pixels, and $T_{r}$ is the retrieved image.
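Eqs. (3) and (4) translate directly into code for 8-bit images:

```python
import numpy as np

def psnr(T0, Tr):
    """Eqs. (3)-(4): PSNR between the original image T0 and the
    retrieved image Tr, both 8-bit (peak value 255)."""
    mse = np.mean((T0.astype(float) - Tr.astype(float)) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)
```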

It should be noted that the acquisition time is the product of the dwell time, the total number of pixels dictated by the Nyquist criterion, and the compression ratio. The detected power per measurement is proportional to the dwell time. For a given dwell time long enough to record sufficient photons, the acquisition time is therefore determined by the compression ratio, which is another critical parameter in our experiment.
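As a rough check of this model against the reported timings, using the 1 ms dwell time of Section 3.1, the 512×512 final resolution, and a 5% compression ratio:

$$T_{\mathrm{acq}}\approx t_{\mathrm{dwell}}\times N^{2}\times CR=1\ \mathrm{ms}\times 512^{2}\times 5\%\approx 13.1\ \mathrm{s},$$

which is consistent with the measured total of about 17.5 seconds once data transmission and reconstruction are included.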

The curves given in Fig. 9(a) show that, for both the reflectivity and the depth map, the number of measurements required to reach given PSNR values (30 dB, 35 dB, and 40 dB in the simulation) does not increase significantly with the resolution. Put another way, the compression ratios decline as the resolution increases, as shown in Fig. 9(b).

Fig. 9 Curves describing (a) the number of measurements and (b) the compression ratios required to reach PSNRs of 30 dB, 35 dB, and 40 dB as the resolution increases.

In addition, being the criterion for estimating the significance of the coefficients, the threshold thr_coef determines whether the edges in the regions corresponding to the coefficients are sharp enough to be sampled. Therefore, thr_coef sets the target reconstruction quality of the 3D image, and the adaptive sampling technique is a quality-oriented sampling process that allows more control over the image quality.

5. Conclusion

In this work, we demonstrate a multiresolution photon counting 3D imaging technique that acquires a high-resolution 3D image from a coarse image and details at successively finer resolutions, sampled along the wavelet trees and their depth map sparse representations. The main advantage of our proposed system lies in acquiring high-resolution 3D images where raster scanning and 2D sensor arrays are intractable. The reflectivity and depth map of a scene at resolutions up to 512×512 pixels are obtained with practical acquisition times as low as 17.5 seconds, with both the image quality and the speed improved compared to the CS-based methods proposed in [21] and [24]. In addition, the proposed technique can image in the presence of partially-transmissive occluders and detect changes via novelty imaging. These results make the technique a good choice for high-resolution 3D imaging applications.

Benefiting from the flexibility of the single-pixel configuration, our technique can be extended to infrared wavebands by employing an infrared pulsed laser and switching to an InGaAs detector, which could increase the imaging distance owing to reduced atmospheric scattering [40]. Furthermore, a hybrid system combining the adaptive sampling strategy with a SPAD array has the potential to further increase the spatial resolution without significantly increasing the computational overhead.

Funding

National Natural Science Foundation of China (NSFC) (61271332); Fundamental Research Funds for the Central Universities (30920140112012); Seventh Six-talent Peak Project of Jiangsu Province (2014-DZXX-007); Science and Technology on Low-light-level Night Vision Laboratory (J20130501); Foundation of Key laboratory of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education (JYB201509).

References and links

1. W. Becker, A. Bergmann, M. A. Hink, K. König, K. Benndorf, and C. Biskup, “Fluorescence lifetime imaging by time-correlated single-photon counting,” Microsc. Res. Tech. 63(1), 58–66 (2004). [CrossRef]   [PubMed]  

2. S. Kumar, C. Dunsby, P. A. A. De Beule, D. M. Owen, U. Anand, P. M. P. Lanigan, R. K. P. Benninger, D. M. Davis, M. A. A. Neil, P. Anand, C. Benham, A. Naylor, and P. M. W. French, “Multifocal multiphoton excitation and time correlated single photon counting detection for 3-D fluorescence lifetime imaging,” Opt. Express 15(20), 12548–12561 (2007). [CrossRef]   [PubMed]  

3. W. C. Priedhorsky, R. C. Smith, and C. Ho, “Laser ranging and mapping with a photon-counting detector,” Appl. Opt. 35(3), 441–452 (1996). [CrossRef]   [PubMed]  

4. U. C. Herzfeld, B. W. Mcdonald, B. F. Wallin, T. A. Neumann, T. Markus, A. Brenner, and C. Field, “Algorithm for detection of ground and canopy cover in micropulse photon-counting lidar altimeter data in preparation for the icesat-2 mission,” IEEE Trans. Geosci. Remote Sens. 52(4), 2109–2125 (2014). [CrossRef]  

5. B. Schwarz, “Mapping the world in 3D,” Nat. Photonics 4(7), 429–430 (2010). [CrossRef]  

6. R. Lamb and G. Buller, “Single-pixel imaging using 3D scanning time-of-flight photon counting,” SPIE Newsroom (2010).

7. A. McCarthy, R. J. Collins, N. J. Krichel, V. Fernández, A. M. Wallace, and G. S. Buller, “Long-range time-of-flight scanning sensor based on high-speed time-correlated single-photon counting,” Appl. Opt. 48(32), 6241–6251 (2009). [CrossRef]   [PubMed]  

8. R. M. Marino and W. R. Davis Jr., “Real-time 3D ladar imaging,” Linc. Lab. J. 15, 23–35 (2005).

9. M. Richard and W. Davis, “Jigsaw: A foliage-penetrating 3D imaging laser radar system,” Linc. Lab. J. 15, 1 (2005).

10. G. Gariepy, N. Krstajić, R. Henderson, C. Li, R. R. Thomson, G. S. Buller, B. Heshmat, R. Raskar, J. Leach, and D. Faccio, “Single-photon sensitive light-in-flight imaging,” Nat. Commun. 6, 6021 (2015). [CrossRef]   [PubMed]  

11. G. Gariepy, F. Tonolini, R. Henderson, J. Leach, and D. Faccio, “Detection and tracking of moving objects hidden from view,” Nat. Photonics 10(1), 23–26 (2015). [CrossRef]  

12. Princeton Lightwave, “Falcon 128x32 Geiger-Mode Flash 3-D LIDAR Camera,” http://www.princetonlightwave.com/wp-content/uploads/2016/09/PLI-Falcon.pdf.

13. B. Sun, M. P. Edgar, R. Bowman, L. E. Vittert, S. Welsh, A. Bowman, and M. J. Padgett, “3D computational imaging with single-pixel detectors,” Science 340(6134), 844–847 (2013). [CrossRef]   [PubMed]  

14. A. Kirmani, D. Venkatraman, D. Shin, A. Colaço, F. N. C. Wong, J. H. Shapiro, and V. K. Goyal, “First-photon imaging,” Science 343(6166), 58–61 (2014). [CrossRef]   [PubMed]  

15. M. Sun, M. P. Edgar, G. M. Gibson, B. Sun, N. Radwell, R. Lamb, and M. J. Padgett, “Single-pixel three-dimensional imaging with time-based depth resolution,” Nat. Commun. 7, 12010 (2016). [CrossRef]  

16. M. Duarte, M. Davenport, D. Takhar, J. Laska, T. Sun, K. Kelly, and R. Baraniuk, “Single-pixel imaging via compressive sampling,” IEEE Signal Process. Mag. 25(2), 83–91 (2008). [CrossRef]  

17. M. Wakin, J. Laska, M. Duarte, D. Baron, S. Sarvotham, D. Takhar, K. Kelly, and R. Baraniuk, “An architecture for compressive imaging,” in Proceedings of IEEE Conference on Image Processing, (IEEE, 2006), pp. 1273–1276.

18. D. Donoho, “Compressed sensing,” IEEE Trans. Inf. Theory 52(4), 1289–1306 (2006). [CrossRef]  

19. E. J. Candes, J. Romberg, and T. Tao, “Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Inf. Theory 52(2), 489–509 (2006). [CrossRef]  

20. E. J. Candes and T. Tao, “Near-optimal signal recovery from random projections: Universal encoding strategies?” IEEE Trans. Inf. Theory 52(12), 5406–5425 (2006). [CrossRef]  

21. G. A. Howland, P. B. Dixon, and J. C. Howell, “Photon-counting compressive sensing laser radar for 3D imaging,” Appl. Opt. 50(31), 5917–5920 (2011). [CrossRef]   [PubMed]  

22. A. Kirmani, A. Colaço, F. N. C. Wong, and V. K. Goyal, “Exploiting sparsity in time-of-flight range acquisition using a single time-resolved sensor,” Opt. Express 19(22), 21485–21507 (2011). [CrossRef]   [PubMed]  

23. A. Colaço, A. Kirmani, G. A. Howland, J. C. Howell, and V. K. Goyal, “Compressive depth map acquisition using a single photon-counting detector: Parametric signal processing meets sparsity,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (IEEE, 2012), pp. 96–102. [CrossRef]  

24. G. A. Howland, D. J. Lum, M. R. Ware, and J. C. Howell, “Photon counting compressive depth mapping,” Opt. Express 21(20), 23822–23837 (2013). [CrossRef]   [PubMed]  

25. A. Averbuch, S. Dekel, and S. Deutsch, “Adaptive compressed image sensing using dictionaries,” SIAM J. Imaging Sci. 5(1), 57–89 (2012). [CrossRef]  

26. M. Assmann and M. Bayer, “Compressive adaptive computational ghost imaging,” Sci. Rep. 3, 1545 (2013). [CrossRef]   [PubMed]  

27. W. K. Yu, M. F. Li, X. R. Yao, X. F. Liu, L. A. Wu, and G. J. Zhai, “Adaptive compressive ghost imaging based on wavelet trees and sparse representation,” Opt. Express 22(6), 7133–7144 (2014). [CrossRef]   [PubMed]  

28. H. Dai, G. Gu, W. He, F. Liao, J. Zhuang, X. Liu, and Q. Chen, “Adaptive compressed sampling based on extended wavelet trees,” Appl. Opt. 53(29), 6619–6628 (2014). [CrossRef]   [PubMed]  

29. H. D. Dai, G. H. Gu, W. J. He, Q. Chen, and T. Y. Mao, “Adaptive video compressed sampling in the wavelet domain,” Opt. Laser Technol. 81, 90–99 (2016). [CrossRef]  

30. Y. Yan, H. Dai, X. Liu, W. He, Q. Chen, and G. Gu, “Colored adaptive compressed imaging with a single photodiode,” Appl. Opt. 55(14), 3711–3718 (2016). [CrossRef]   [PubMed]  

31. C. Christopoulos, A. Skodras, and T. Ebrahimi, “The JPEG2000 still image coding system: an overview,” IEEE Trans. Consum. Electron. 46(4), 1103–1127 (2000). [CrossRef]  

32. J. Shapiro, “Embedded image coding using zerotrees of wavelet coefficients,” IEEE Trans. Signal Process. 41(12), 3445–3462 (1993). [CrossRef]  

33. A. Said and W. Pearlman, “A new fast and efficient image codec based on set partitioning in hierarchical trees,” IEEE Trans. Circ. Syst. Video Tech. 6(3), 243–250 (1996). [CrossRef]  

34. S. Mallat, A Wavelet Tour of Signal Processing: The Sparse Way (Academic, 2008).

35. C. Li, W. Yin, and Y. Zhang, “Users guide for TVAL3: TV minimization by augmented lagrangian and alternating direction algorithms,” CAAM Report (2009).

36. K. G. Beauchamp, Applications of Walsh and Related Functions (Academic, 1984).

37. D. Shin, F. Xu, F. N. Wong, J. H. Shapiro, and V. K. Goyal, “Computational multi-depth single-photon imaging,” Opt. Express 24(3), 1873–1888 (2016). [CrossRef]   [PubMed]  

38. O. S. Magana-Loaiza, G. A. Howland, M. Malik, J. C. Howell, and R. W. Boyd, “Compressive object tracking using entangled photons,” Appl. Phys. Lett. 102(23), 231104 (2013). [CrossRef]  

39. J. F. Delaigle, C. Deveeschouwer, B. Macq, and I. Langendijk, “Human visual system features enabling watermarking,” in Proceedings of IEEE International Conference on Multimedia and Expo, (IEEE, 2002), pp. 489–492. [CrossRef]  

40. A. Bucholtz, “Rayleigh-scattering calculations for the terrestrial atmosphere,” Appl. Opt. 34(15), 2765–2773 (1995). [CrossRef]   [PubMed]  
