Range clusters based time-of-flight 3D imaging obstacle detection in manifold space

Qicong Wang; Dingxi Gong; Shuang Wang; Yunqi Lei

doi:10.1364/OE.22.008880

1. Introduction

Time-of-flight (TOF) sensor is a range imaging camera system through the fusion of high-speed infrared pulse and traditional video cameras [1]. Namely, it is capable of simultaneously producing the intensity images and the range information of objects in outdoor environments. Now, more and more attention has been paid to TOF imaging sensor in the field of machine vision [2–4]. However, when handling the TOF depth data, we always encounter some difficulties that are caused by the negative characteristics of the data themselves, such as their distance ambiguity, large noise and low resolution, etc [5]. These factors limit the applications of TOF imaging sensor in machine vision. For the distance ambiguity problem of TOF depth data, a method of combining the intensity images to eliminate the distance ambiguities is proposed while segmenting the depth image beforehand [6]. A multiple frequency method overlapping two different frequencies that are acquired at the same time is also proposed to solve the ambiguity problem [7].

For the processing of TOF depth images with noise and low resolution, the regular data patches are considered as basic data processing unit [8]. Compared with the pixel level detection, the edge based detection and the region based detection, object detection based on regular data patches is relative insensitivity to noise and it can preserve enough shape information of the objects. At the same time, the neighborhood structure between the pixels can show the local characteristics and the similarity better, and it can also low down the complexity of processing. The typical method to segment the data into patches is to sperate the original images into regular patches with the size of n × n, and set the moving step between patches as m(m < n) pixels. The detection result of this method is that all the pixels in the same patch are classified as the same category [9]. No matter how small n is (n cannot be set too small, or the method will tends to the pixel level processing), the final detection will show obvious block effect at the edges. That is, when the edge of the obstacle passes through a patch, the pixels belong to different categories on the two sides of the edge cannot be given different category labels and then they will be classified as the same category. Thus the detected object edges are not smooth and the detection precision will also be affected. Such an example is shown in Fig. 1, the depth image data are segmented into regular patches with the size of 8 × 8. In the original depth image on the left in Fig. 1, three rectangle patches in the black box are covering on the edges of the obstacle. The detection result using regular data patches is shown on the right in Fig. 1. As we can see that this detection result is obviously not satisfied - at the locations of the three patches, pixels belong to different categories on the two sides of the edge are classified as the same category. This disadvantage of regular data patches is from that it cannot guarantee the pixels in the same patch to have the same internal characteristic and the same categorical attribute, as such, using regular data patch as the processing unit may not be accurate enough. Moreover, the above method is still sensitive to noise, although not that sensitive. It is known that the main noise sources are photon scattering noise, electron noise, thermal noise, 1/f noise and quantization noise for depth images produced by TOF 3D non-scanning sensors [10, 11], and among them, the photon scattering noise following Poisson distribution accounts for most of all the noises [12]. The noises are unavoidable and they can affect obstacle detection greatly when using regular data patches. This is an important problem that should be addressed further.

Fig. 1 Obstacle detection result of depth images using regular data patches.

Download Full Size | PDF

This paper tries to find a new method to detect objects more precisely in TOF depth images with noise and low resolution. We firstly utilize intensity images to estimate noise deviation of range images and then a weighted local linear smoothing is used to project the TOF data into a new manifold surface, which can low down the influence of outlier and noise effectively. According to the 3D information of depth images, we classify the near points in the 3D space into an original small range cluster. This division method conforms to the cognition way of human [13], that is, the nearer the points in the 3D space are, the higher the probability they belong to the same category is. By the number histogram of the original range clusters we can compute the optimal normalization size of all the original range clusters. And then according to some certain criterions we normalize all the range clusters into regular rectangles with the size of s × s, which we call the optimal range clusters. The optimal range clusters are regarded as the data unit to process the TOF data and are applied to the supervised preserving projection manifold learning to realize obstacle detection.

2. Extraction of the non-ambiguity regions

According to the TOF imaging principles, the TOF sensor computes distance by measuring the phase difference between the transmitted signal and the reflected signal. Because of the periodic variation of the transmitted cosine wave modulation signals, the measured range information of the scene has ambiguities. The formula to compute the distance between the object and the sensor is:

D = R_{amb} \cdot \frac{φ}{2 \cdot π}

where,

R_{amb} = \frac{c}{2 \cdot f_{m}}

If f_m = 20MHz,

R_{amb} = \frac{3 \times 10^{8} m / s}{2 \times 20 \times 10^{6} / s} = 7.5 m

, this means that the effective non-ambiguity measurement distance is 7.5m. When range ambiguity occurs, the depth value changes suddenly at the connections of two contiguous cycles, which causes the false edge presentation of range images. To deal with this phenomena, we firstly detect the boundaries where ambiguity occurs in the TOF range images. Then by combining the corresponding intensity images that are also produced by the TOF sensors, we can quickly and effectively extract the non-ambiguity foreground regions in the first non-ambiguity cycle of the range images [8].

3. Noise estimation and elimination

3.1. Noise estimation of range images combining intensity images

For depth images produced by TOF 3D non-scanning sensors, the main noise sources are photon shot noise, electron noise, thermal noise, 1/f noise and quantization noise [10, 11]. The photon shot noise accounts for most of all the noise. Suppose A₁,..., A₃ are sampled signals, the noise are independent of each other and the standard deviation is $Δ A_{i} = \sqrt{A_{i}}$ . The phase difference Δϕ can be computed by the following formula:

Δ ϕ = \sqrt{\sum_{i = 1}^{3} {(\frac{\partial ϕ}{\partial A_{i}})}^{2} \cdot A_{i}}

If we choose some special phase values (0°, 45°, 90°, 135°, 180°,...), then the distance measurement error can be computed by [10]:

Δ L = \frac{L}{\sqrt{8}} \cdot \frac{\sqrt{B}}{2 \cdot A}

where B is offset, L is the non-ambiguity range value, and ΔL is the random error of the measured distance. We can see that the measurement error of TOF sensor is inversely proportional to the modulation amplitude A. Besides, the PSNR value in the intensity images is higher than that of the range images, so the noise in the range images can be estimated by combining the corresponding intensity information that are captured by a SwissRanger sensor. After wavelet transform, the actual data and the noise data of the image can show different dependences. Because of the random distribution of the noise signal, the dependence of noise is weak or even uncorrelated, while the actual data can show strong dependence. The noise standard deviations of depth images can be estimated via the dependences of the noised data after wavelet transform. According to the energy distribution characteristics of wavelet domain, the sub-band diagram of the three layer wavelet transform of image is shown as Fig. 2. In the figure, the LL sub-band represents low frequency information, while the sub-bands HL, LH and HH represent high frequency information. As the energy value of the coefficients of sub-band HH₁ is relatively small, so this sub-band is always considered as the sub-band that is most closely related to noise in images. For depth image processing, the coefficients of sub-band HH₁ of depth images can also be used to estimate the noise standard deviations.

Fig. 2 Three layer wavelet transform of images.

Download Full Size | PDF

3.2. Estimation of noise standard deviation

As analyzed above, the non-static noise standard deviation σ̂_l can be estimated by the formula:

{\hat{σ}}_{l} = C \cdot \frac{1}{A_{l}}

The formula shows that the noise deviation and the amplitude A_l are inversely proportional. In the formula, C is a constant which is related to camera. To compute C, we do wavelet decomposition to depth image first and divide the HH₁ sub-band and the intensity image into patches without overlapping, then sort these two groups of patches according to the mean variance of intensity patches in ascending order. For every HH₁ sub-band patch b_l of depth images, its noise variance can be estimated by:

{\hat{σ}}_{b_{l}}^{2} = \frac{1}{N} \sum_{k = 1}^{N} {(X_{l, k} - m_{l})}^{2}

where m_l is the mean value of the lth patch of the sub-band coefficients and X_l,k is the kth coefficient of the patch. Let A_{b_l} represents the mean intensity value of patch l. We choose regions with smallest variance of intensity image patches with a probability of P. Then the constant C can be estimated as:

C = \frac{1}{P \cdot M} \sum_{l = 1}^{P M} \frac{A_{b_{l}}}{{\hat{σ}}_{b_{l}}}

where, M is the number of depth data patches.

3.3. Weighted local linear smoothing

In order to reconstruct the nonlinear manifold of the noisy samples (i.e., TOF data patches) more accurately, when using supervised preserving projection, we use weighted local linear smoothing to remove the outliers and the noisy points in the samples firstly, which can effectively reduce the opportunity to get direct short nearest neighbor connection [14]. In this way, the topology structure of the nonlinear manifold can be preserved in the adjacent map that is constructed from the limited samples. In our method, the noise standard deviation estimation is used to construct the data weight matrix. Our aim is to re-project the high-dimensional space of the sample data only, and the low-dimensional estimation and the dimension reduction process are omitted in this stage, which also lows down the cost of the algorithm.

For every sample x_i, we use Euclidean distance to determine the k − 1 nearest neighbors of sample x_i, denote it as [x_i1, x_i2,..., x_ik]. The weight matrix is computed from the noise standard deviation estimation, and every weight vector w_ij is normalized so that $\sum_{j = 1}^{k} w_{i j} = 1$ , where

w_{i j} = {\begin{cases} \frac{Δ}{σ_{j}} exp (- γ ‖ x_{i} - x_{j} ‖), & x_{j} \in [x_{i 1}, x_{i 2}, \dots, x_{i k}] \\ 0, & else \end{cases}

in which x_j ∈ [x_i1, x_i2,..., x_ik] represents the k-nearest neighbor of x_i, and σ_j is the noise deviation of sample x_j. σ_j is used to estimate the effective degree of every nearest neighbor to sample x_i. The bigger σ_j is the less effect x_j has on x_i. Δ is the normalization coefficient to make sure that the sample and its k nearest neighbors satisfy

\sum_{j = 1}^{k} w_{i j} = 1

. As long as we have the approximate projection space of sample x_i, then we can project x_i into the space without dimension reduction.

4. Range clusters

In obstacle detection using depth images, the 3D information must be considered to divide the data into many significative data clusters. Inspired by the criterion based morphology segmentation methods [15,16], through computing the 3D spatial distances between pixels and by setting a serious of criterions, we can classify the 3D data into different small groups. Every group has its own coherent characteristic, and each of the small group is called a basic range cluster.

4.1. Realization of basic range clusters

Time-of-flight 3D non-scanning technology can get the 3D spatial coordinate (X, Y, Z) of every pixel in the scene and the coordinates represent the location information of the obstacles in 3D space. The nearer the pixels in the space are, the higher possibilities they belong to the same classification are, so we can use the 3D spatial distance between the data points to realize original classification by $d (i, j) = \sqrt{{(x_{i} - x_{j})}^{2} + {(y_{i} - y_{j})}^{2} + {(z_{i} - z_{j})}^{2}}$ . In this study, we use K-means clustering method to implement dynamic clustering of pixels and we combine 2D projection density map to choose the original clustering centers. Firstly, the 2D grids of a proper size are constructed, and the valid data (3D information of pixels without range ambiguity) is projected onto the 2D grids according to the following formula:

[X, 0, Z] = [X, Y, Z] [\begin{matrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{matrix}]

By counting the number of projected points in every grid as the projection density of the grid, which are denoted as: ρ₁, ρ₂, ..., ρ_M, then, we use K-means clustering algorithm to get basic range clusters. The constructed grids and the vertical view of depth image projection is shown as Fig. 3. The figure shows the vertical view of projection grids of two depth scenes. The two rows are the depth images and their corresponding vertical view of projection respectively. For the imaging pixels within the valid imaging range of the camera, after projected onto the grid, they will form a cone area. From the figure we can see, the projection densities of ground are even and small, and the densities of pedestrians and other obstacles are larger than the ambient values. The projection density map can provide good original clustering centers for k-means clustering. According to the basic range clusters realization algorithm above, and corresponding to the depth images in Fig. 1, the obtained basic range clusters are shown as Fig. 4, where K = 220.

Fig. 3 Vertical view of projection grids of depth images.

Download Full Size | PDF

Fig. 4 Basic range clusters (K = 220).

Download Full Size | PDF

4.2. Normalization of range clusters

Suppose x_i is a basic range cluster, s_i is the least size of the square patch which can cover x_i totally, which means that s_i × s_i can cover x_i totally. For a depth image without ambiguity, the size histogram of the least covering sizes corresponding to all the basic range clusters can be defined as:

hist (l_{k}) = m_{k}, l_{k} = 1, 2, \dots, L

where l_k represents the k − th size level, and L is the highest size level, which means that L × L is the least square region that can cover the biggest basic range cluster totally. m_k is the number of the clusters whose level are l_k. In Fig. 5, we show the pixel number in every basic range cluster and the least square covering region size of every basic range cluster when K = 220, corresponding to the left depth image in Fig. 4.

Fig. 5 Pixel number of basic range clusters and the least covering sizes, (a) pixel number in every basic range cluster, (b) the least square covering region size corresponding to every basic range cluster.

Download Full Size | PDF

From Fig. 5(a) we can see that, when K = 220, the depth image is divided into 220 range clusters. A large part of these clusters have about 100 pixels in them, which means that the mean size of the square patches that can cover all the basic range clusters is about 10. This can also be seen from Fig. 5(b). Fig. 6 is the corresponding size histogram of the least covering size of basic range clusters. The statistics shows that for this depth image, the size of nine is about to the most basic range clusters and most sizes are distributed between 6 and 12. During the process of normalizing the least covering sizes, the kurtosis and skewness need to be computed by the following formulas:

Kurt = \frac{1}{m - 1} \sum_{k = 1}^{L} m_{k} {(l_{k} - l^{*})}^{4} / σ^{4}, Skew = \frac{1}{m - 1} \sum_{k = 1}^{L} m_{k} {(l_{k} - l^{*})}^{3} / σ^{3}

where, σ is the standard deviation of the histogram, l^* is the mean value of all the least square sizes. According to the kurtosis and skewness of the size histogram, the size interval of the normalized optimal range clusters can be estimated by:

S_{\min} = floor (s^{~} - \frac{1 - Kurt}{1 - Skew} \cdot F (a, 0, 1)), S_{\max} = ceil (s^{~} + \frac{1 - Kurt}{1 - Skew} \cdot F (a, 0, 1))

From the histogram mode method, it is better if s^∼ is the mode threshold which is set at the middle of the two peak positions. But the histogram in Fig. 6 shows that the range clusters of some depth images may have no obvious bimodal, so we set s^∼ to be the average value of the size histogram. F(·) is the Gaussian distribution function. In order to estimate the optimal covering size effectively, we set it to a fixed value so that the normalized size depends mostly on the basic range clusters whose sizes are in the middle.

Fig. 6 Histogram of the least covering sizes according to the basic range clusters.

Download Full Size | PDF

4.2.1. Choosing criterions of the optimal size

Denote s^* as the optimal square size when the original partition number K equals to a certain value, and let $x_{i}^{*}$ represent the optimal square that covers the original range cluster x_i, the criterion to evaluate the optimal size depends mainly on those two parameters:

{\begin{array}{l} missing (x_{i}) = & | x_{i} - x_{i}^{*} |, n_{i} < n^{*} \\ added (x_{j}) = & | x_{j}^{*} - x_{j} |, n_{j} > n^{*} \end{array}

where n_i and n_j are the pixel numbers corresponding to the basic range clusters x_i, x_j. n^* is the pixel number in the square with the optimal size, n^* = s^* × s^*. missing(x_i) and added(x_i) represent the lost and the added information, respectively, during the normalization process of basic range cluster x_i. The optimal square region should make

\sum_{i = 1}^{L} missing (x_{i})

and

\sum_{i = 1}^{L} added (x_{i})

as little as possible, which means that the optimal size s^* can be chosen from S_min and S_max. This leads to the absolute Spatial Quality Measure (SQM) given by

SQM = \sum_{i = 1}^{k_{1}} p (x_{i}) \cdot missing (x_{i}) + \sum_{j = 1}^{k_{2}} p (x_{j}) \cdot added (x_{j})

where, k₁ represents the clusters which have information loss, and k₂ represent clusters which need information adding, k₁ + k₂ = K. p(x_i) and p(x_j) show the importance of the clusters. They are measured by the pixel proportions:

p (x_{i}) = \frac{n_{i}}{n}

,

p (x_{j}) = \frac{n_{j}}{n}

, where n_i and n_j are the pixel numbers of the basic range clusters x_i, x_j, and n is the pixel number of the whole depth image. Generally, a lower SQM value indicates a better result.

5. Obstacle detection based on supervised preserving projection

For TOF data, although the inherent underlying manifold structure is unknown and complex, we can regard the whole scene as an ambient space and the TOF data points are just embedded in or they are just adjacent to the underlying manifold. Graph embedding [17] aims to preserve the underlying manifold structure encoded by graph G. We use the Laplacian eigenmaps [18] to preserve the local neighborhood properties of the underlying manifold from the TOF data.

For all normalized range clusters, the similarities and classifications mainly depend on their nonlinear geometry future representation on the low-dimensional manifold. Dimension reduction algorithm based on supervised preserving projection can learn the important information automatically so that it can extract the characteristics of the data itself. Every range cluster is regarded as a point existing in the high-dimensional Euclidean space. The inherent characteristics of range clusters are extracted by preserving the geometry futures between any two range clusters.

For a given data set H = [(h₁, b₁), (h₂, b₂),..., (h_n, b_n)], which contains all normalized range clusters, where h_i ∈ R^T, i = 1,...,n, and b_i is the label of every depth image range cluster (obstacle or non-obstacle). E = [e₁, e₂,..., e_n] is the corresponding low-dimensional embedding of H, where e_i ∈ R^t, i = 1,...,n, t ≪ T.

5.1. Graph construction

Manifold learning can be divided into two parts: graph construction and graph embedding. K-nearest neighborhood algorithm is an effective method to construct graph, and we use biased distance [19] to reduce the data deviation caused by noise and other factors. The definition of biased distance is:

\tilde{dist} (i, j) = dist (i, j) f (i, j)

where dist(i, j) = ‖h_i − h_j‖² is the Euclidean distance,

f (i, j) = \frac{P (i, j) + ν}{{max}_{i j} P (i, j) - P (i, j) + ν}

, P(i, j) = τ|b_i − b_j|, ν is the noise and τ is the control parameter. Gaussian kennel function based on

\tilde{dist} (i, j)

is used to construct the weight matrix of the graph G:

W_{i, j} = {\begin{cases} exp (- \frac{d^{*} {(i, j)}^{2}}{2 σ^{2}}), & h_{i} \in N_{k} (h_{j}) \lor h_{j} \in N_{k} (h_{i}) \\ 0, & otherwise \end{cases}

where, h_i ∈ N_k(h_j) ∨ h_j ∈ N_k(h_i) represents that h_i and h_j are neighbors to each other.

5.2. Graph embedding

The weight neighborhood graph G encodes the underlying manifold structure of the high-dimensional TOF information. The aim of graph embedding is to preserve the underlying manifold structure shown in the embedding process. The Laplacian embedding model we used is:

min_{e} \sum_{i, j}^{n} {‖ e_{i} - e_{j} ‖}^{2} W_{i j} = E^{T} L E

where L is the graph Laplacian matrix of weight matrix W and L = D − W, D_ii = ∑_j W_ij, D is a diagonal matrix. The optimal low-dimensional embedding is obtained by minimizing the eigenvalue of the graph Laplacian matrix. We use supervised manifold learning to realize obstacle detection, and the supervised information is added in the low-dimensional embedding process. A supervised matrix Ω contains 3D information and the sample classification information, and it is defined as follows:

Ω_{i j} = {\begin{array}{l} exp (- \frac{{| c_{i} - c_{j} |}^{2}}{2 σ^{2}}), & h_{i} \in N_{k} (h_{j}) \lor h_{j} \in N_{k} (h_{i}) \\ 0, & otherwise \end{array}

where c_i represents the classification of sample h_i, and the graph embedding formula using supervised Laplacian eigenmaps algorithm is [20]:

min_{e} \sum_{i, j} {‖ e_{i} - e_{j} ‖}^{2} W_{i, j} + λ \sum_{i, j} {‖ e_{i} - e_{j} ‖}^{2} Ω_{i j}

5.3. Laplacian regularized least square regression

The feature mapping f can be obtained by linear transformation E = f (H) = A^TH, where E is the low-dimensional embedding. But the feature mapping matrix A may not exist, then least square regression is used to obtain A [21]:

min_{a} \sum_{i} {‖ e_{i} - a^{T} h_{i} ‖}^{2} + γ {‖ a ‖}^{2}

The solution of the above equation is a = (HH^T + γI)⁻¹He, then the mapping matrix A contains the solution vectors a₁, a₂, ..., a_r, where r is the embedding dimension. A Laplacian Regularized Least Square Regression has been proposed to solve semi-supervised clustering problems [22]. In the method, the manifold assumption of the labeled and unlabeled samples to build more discriminative regression models is used in supervised learning. The method can be expressed as:

min_{a} \sum_{i} {‖ e_{i} - a^{T} h_{i} ‖}^{2} + λ \sum_{i, j} {‖ a^{T} h_{i} - a^{T} h_{j} ‖}^{2} W_{i, j} + γ {‖ a ‖}^{2}

The solution is a = (HH^T + λHLH^T + γI)⁻¹He, and the feature mapping matrix A can be constructed from a₁, a₂, ..., a_r.

6. Experiment results

In our experiments, we use Swiss Ranger SR-3000, a 3-dimensional non-scanning imaging sensor, to capture 3D information of the outdoor scenes. The system uses time-of-flight principle to measure distance, with 0.6 μm CMOS/CCD imaging sensor, and the imaging resolution is 144×176. For sinusoidal modulation of the emitted optical signal with a frequency of 20 MHz, the non-ambiguity range equals to 7.5 m.

6.1. Time-of-flight data preprocessing

6.1.1. Depth ambiguity elimination

To make TOF range images to be used for obstacle detection, we use the depth ambiguity elimination approach as described in section 2 to get the unambiguous part of the TOF data firstly. From left to right, Fig. 7 shows the range images, the ambiguity boundary lines of the range images and the extraction non-ambiguity foreground of two scenes, respectively. From Fig. 7 we can see that the method to eliminate range ambiguities by locating the ambiguity boundary lines can extract the non-ambiguity regions of the range images effectively.

Fig. 7 Ambiguity boundary lines of range images and the extracted non-ambiguity foreground regions.

Download Full Size | PDF

6.1.2. Noise estimation and smoothing

We segment depth images without ambiguity into regular square patches. The size of the patch is 8 × 8 and P is set as 3%. Three layers Haar wavelet decompositions are used to estimate the noise standard deviation using the HH₁ sub-band coefficients. The indexes of the patches are sorted in increasing order from top to bottom and from left to right. Fig. 8 and Fig. 9 show respectively the noise standard deviation statistics before and after the weighted local linear smoothing of the depth image in Fig. 4(a) and 300 frames of depth images. The red curve represents noise deviations before smoothing and the blue curve represents after smoothing.

Fig. 8 Noise standard deviations of a depth image.

Download Full Size | PDF

Fig. 9 Noise standard deviations of 300 frames.

Download Full Size | PDF

In Fig. 8, we can see that the depth image is segmented into 265 patches, and the indexes of patches with large noise deviations are relatively small, which indicates that the regions with large noise deviations are on the top of the image where the distances are relatively farther.

In order to find the noise distribution characteristics of depth images, we do further statistics to 300 frames of depth images. We set the regions into three parts by the index: big, inside and small. Every part has 50 patches and the mean value of each part is computed accordingly. In Fig. 9, we can see that patches with small index have large noise standard deviations, so we can conclude that in depth images, regions with farther distances have relatively larger noise deviations than nearer regions. And at the regions around the ambiguity line, the noise deviations are the highest and unstable. Moreover, from the statistics we can see, except some breakpoints in the regions (regions around the ambiguity boundary line in depth images) with large original noise deviations, the noise standard deviations in the depth images are reduced obviously after the weighted local linear smoothing.

6.2. Obstacle detection

In the process of supervised preserving projection manifold learning, 800 obstacle and non-obstacle data patches with the size of 8 × 8 in different range scenes and different positions are chosen as the training samples of manifold learning. Each sample directly contains the 3D coordinate information (X, Y, Z) of each pixel captured by TOF sensors, where Z is the range information of each pixel, and the unit is millimeter. Class labels (obstacles or non obstacles) are also added in the training samples (i.e., training patches). So the training samples are composed in the form of [X, Y, Z, label]^T. During the testing process, all the 3D data are transformed into normalized range clusters. Then the processing unit is normalized range cluster, and each range cluster contains the 3D information of all the pixels in it. Supervised preserving projection is used to extract the manifold features of the 3D data, and the nearest neighborhood classifier is applied in the low-dimensional embedding space to realize obstacle detection for the test samples (i.e., test patches). Because the selection of the parameters, neighborhood size K and embedding dimension d, is still open problems, in graph construction stage, we set K nearest neighborhood with K = 30 empirically, and in the graph embedding stage, the embedding dimension is set to d = 50 empirically. Fig. 10 shows the detection results and the corresponding pseudocolor images. Note that if the class label of the test patch is obstacle, it is marked as red, and if the class label of the test patch is ground, it is marked as blue. In TOF depth images, the red region is obstacle, and the blue region is the ground.

Fig. 10 Comparison results between regular patches based algorithm and range clusters based algorithm.

Download Full Size | PDF

In Fig. 10, from top to bottom, the rows respectively represent depth images without distance ambiguity, obstacle detection results based on regular patches, range clusters based detection results and range clusters based detection results with noise smoothing. From Fig. 10 we can see, both regular patch algorithm and range cluster algorithm with noise smoothing can locate the obstacles correctly, but range clusters based method with noise smoothing describes the shapes of obstacles more accurately in details. At the same time we can see range cluster algorithm is sensitive to noise, but the noise estimation and smoothing make this approach robust and stable to process TOF data. Besides, because the obstacles are not absolutely perpendicular to the ground and the ground is not absolutely flat, there exist some error detections. From the comparison we can find that, in the aspect of describing the obstacle details, range clusters based method with noise smoothing is much better than the regular patches based method.

7. Conclusion

In order to detect obstacle more precisely using TOF 3D imaging sensors, we use weighted local linear smoothing with noise deviation estimation to preprocess the data to reduce the influence of noise and outliers. And we propose a basic range cluster method which is based on 2D projection grids and K-means algorithm for TOF 3D data. By the number histogram of the original range clusters and the size chosen condition we can determine the optimal normalization size of all the clusters. Then, according to some criterions we normalize all the basic range clusters into regular square patches with the same optimal size. Thereafter, the supervised preserving projection is introduced to realize dimension reduction and manifold futures extraction of the TOF 3D data to classify the normalized range clusters into obstacles or non-obstacles.

Experimental results show that, compared with regular patches based method, range clusters based method we proposed describes the shapes of obstacles more accurately in details. For TOF 3D data, the weighted local linear smoothing algorithm can effectively low down the noise standard deviations. Besides, the supervised preserving projection manifold learning can take advantage of the inherent topological relationship and the underlying manifold structure of the TOF 3D data.

Acknowledgments

This paper was supported by the National Natural Science Foundation of China (Grant No. 61001143), by the Key Project Funds from the Province Office of Science & Technology of Fujian in China (Grant No. 2012H6024) and by the Natural Science Foundation of Fujian Province (Grant No. 2012J01286).

References and links

1. G. J. Iddan and G. Yahav, “3D Imaging in the Studio and Elsewhere,” in Videometrics and Optical Methods for 3D Shape Measurements, Proc. SPIE4298:, 48–55 (2001).

2. R. Bostelman, T. Hong, and R. Madhavan, “Obstacle detection using a time-of-flight range camera for automated guided vehicle safety and navigation,” Integr. Comput-Aid. E. 12, 237–249 (2005).

3. A. Prusak, O. Melnychuk, H. Roth, I. Schiller, and R. Koch, “Pose estimation and map building with a time-of-flight-camera for robot navigation,” Int. J. Intell. Syst. Tech. Appl. 5, 355–364 (2008).

4. F. Yuan, A. Swadzba, R. Philippsen, O. Engin, M. Hanheide, and S. Wachsmuth, “Laser-based navigation enhanced with 3d time-of-flight data,” in Proceedings of IEEE International Conference on Robotics and Automation, 2844C-2850 (2009).

5. S. Y. Kim, J. H. Cho, A. Koschan, and M. A. Abidi, “Spatial and Temporal Enhancement of Depth Images Captured by a Time-of-flight Depth Sensor,” in Proceeding of 20th International Conference on Pattern Recognition, 2358–2361 (2010).

6. S. H. McClure, M. J. Cree, A. A. Dorrington, and A. D. Payne, “Resolving depth measurement ambiguity with commercially available range imaging cameras,” in Image Processing: Machine Vision Applications III, Proc. SPIE7538:, (75380K) 2010.

7. A. D. Payne, A. P. P. Jongenelen, A. A. Dorrington, M. J. Cree, and D. A. Carnegie, “Multiple frequency range imaging to remove measurement ambiguity,” in Proceedings of 9th Conference on Optical 3-D Measurement Techniques, 139–148 (2009).

8. Y. Jiang, Y. Liu, Y. Q. Lei, and Q. C. Wang, “Supervised preserving projection for learning scene information based on time-of-flight imaging sensor,” Appl. Opt. 22, 5279–5288 (2013). [CrossRef]

9. I. Tziakos, C. Theoharatos, N. A. Laskaris, and G. Economou, “Color image segmentation using Laplacian eigenmaps,” J. Electron. Imaging 18,023004 (2009). [CrossRef]

10. L. Jovanov, A. Pizurica, and W. Philips, “Fuzzy logic-based approach to wavelet denoising of 3D images produced by time-of-flight cameras,” Opt. Express 18, 651–676 (2010). [CrossRef]

11. P. Seitz, “Quantum-noise limited distance resolution of optical range imaging techniques,” IEEE Trans. Circuits Syst. I, Reg. Papers 55, 2368–2377 (2008). [CrossRef]

12. S. A. Gumundsson, H. Aans, and R. Larsen, “Environmental effects on measurement uncertainties of time-of-flight cameras,” in International Symposium on Signals, Circuits and Systems, 1–4 (2007).

13. Y. Y. Yao, “Perspectives of granular computing,” in Proceedings of IEEE International Conference on Granular Computing, 85–90 (2005).

14. Z. Zhang and H. Zha, “Local linear smoothing for nonlinear manifold learning,” CSE-03–003, Technical Report, Pennsylvania State University, 2003.

15. J. Serra, “A lattice approach to image segmentation,” J. Math. Imaging Vis. 24, 83–130 (2006). [CrossRef]

16. J. Serra, “Connective segmentation,” in Proceedings of the 15th IEEE International Conference on Image Processing, 2192–2195 (2008).

17. S. Yan, D. Xu, B. Zhang, and H. Zhang, “Graph embedding and extensions: a general framework for dimensionality reduction,” IEEE Trans. Pattern Anal. Mach. Intell. 29, 40–51 (2007). [CrossRef]

18. M. Belkin and P. Niyogi, “Laplacian eigenmaps for dimensionality reduction and data representation,” Neural Comput. 15, 1373–1396 (2003). [CrossRef]

19. V. N. Balasubramanian, J. Ye, and S. Panchanathan, “Biased manifold embedding: a framework for personin dependent head pose estimation,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 1–7 (2007).

20. C. BenAbdelkader, K. Daniilidis, P. Maragos, and N. Paragios, “Robust Head Pose Estimation Using Supervised Manifold Learning,” in Computer Vision-ECCV, Lect. Notes Comput. Sc.6316, 518–531 (2010). [CrossRef]

21. X. He, M. Ji, and H. Bao, “Graph embedding with constraints,” in Proceedings of International Joint Conference on Artificial Intelligence, 1065–1070 (2009).

22. M. Belkin, P. Niyogi, and V. Sindhwani, “Manifold regularization: A geometric framework for learning from labeled and unlabeled examples,” J. Mach. Learn. Res. 7, 2399–2434 (2006).

Range clusters based time-of-flight 3D imaging obstacle detection in manifold space

Abstract

1. Introduction

2. Extraction of the non-ambiguity regions

3. Noise estimation and elimination

3.1. Noise estimation of range images combining intensity images

3.2. Estimation of noise standard deviation

3.3. Weighted local linear smoothing

4. Range clusters

4.1. Realization of basic range clusters

4.2. Normalization of range clusters

4.2.1. Choosing criterions of the optimal size

5. Obstacle detection based on supervised preserving projection

5.1. Graph construction

5.2. Graph embedding

5.3. Laplacian regularized least square regression

6. Experiment results

6.1. Time-of-flight data preprocessing

6.1.1. Depth ambiguity elimination

6.1.2. Noise estimation and smoothing

6.2. Obstacle detection

7. Conclusion

Acknowledgments

References and links

Cited By

Figures (10)

Equations (21)

Optics Express