Robust approach to reconstructing transparent objects using a time-of-flight depth camera

Kyungmin Kim; Hyunjung Shim

doi:10.1364/OE.25.002666

1. Introduction

Reconstructing a three dimensional (3-D) object has been a long lasting research problem and has attracted considerable attention from many researchers and practitioners for used in various applications. For example, 3-D data and its processing algorithms have been widely adapted to 3-D film, 3-D printer/display, augmented/virtual reality, game, human-computer interaction, object recognition and image understanding. Recently, remarkable advances in depth sensing technology have lead to the commercial success of depth cameras. Depth cameras are available in the market at an affordable price and are capable of recording a depth video up to 30 fps. Although raw data from depth cameras still presents poor precision and low resolution, recent studies in depth denoising [1–3] and upsampling [4,5] algorithms have considerably improved the quality of depth videos. Consequently, the use of depth cameras is one of the promising means for tackling 3-D reconstruction problems.

Although depth denoising, deblurring, and resampling problems have been actively investigated in previous studies [1, 2, 4, 5], relatively little research has been conducted to address systematic depth errors when acquiring translucent [6, 7] or transparent [8] objects. Existing depth sensing techniques observe light reflection and analyze its value or pattern to obtain the depth value of a 3-D point by assuming an opaque scene. Because translucency involves light refraction and transmission, current depth cameras produce substantial depth errors when acquiring translucent objects. Unfortunately, translucent objects occupy a considerable portion of real objects. This highlights the importance of addressing the 3-D translucent object reconstruction problem.

Our goal is to recover the depth map of translucent objects using a time-of-flight (ToF) depth camera. The ToF depth sensing principle is commonly applied to developing commercial depth sensors. They include a PMD, SwissRanger, TriDiCam, Basler, and Kinect 2.0 [9]. The ToF depth camera emits an IR periodic signal and records its returned signal. For each 3-D point, this returned signal is modulated by the emitted signal in intensity and phase, where the phase modulation is the result of its 3-D information. To recover the phase modulation, we demodulate the returned signal by correlating it with the emitted signal. This correlation function is then evaluated at two or four phases. These resultant two or four values map to a unique depth value. In the case of translucent objects, the returned signal is a mixture of two signals: one from the background, the other from the foreground. Because the signal from the foreground is contaminated by the one from the background, its resultant values are no longer valid to represent the foreground depth.

This problem is akin to the multi-path interference in ToF imaging [10–16]. The multi-path interference is caused by inter-reflection from the surrounding environment. This problem is also caused by the mixture of multiple returned signals from different 3-D points. To address this issue, existing studies have employed one of the following: multiple depth maps that correspond to different modulation frequencies [10–12,14–16], coded exposures [17], or scene constraints such as Lambertain patches [13]. However, their approaches require to capture multiple depth maps under variable conditions. As a result, their approaches are not applicable for the depth map being already recorded.

Recently, Shim and Lee [7] developed a depth error model for translucent objects and used it to restore the depth distortion using a single depth map. Their model is meaningful to understand systematic depth distortions for translucent objects under a ToF depth sensing principle. However, their method is often error-prone because of strict assumptions, namely, a homogeneous thin layer object and the ignorable light refraction. Because of its lack of robustness, their method is not practical for general applications. These same researchers later employed a skewed stereo ToF depth camera for translucent object imaging [18]. They developed an iterative optimization framework to refine the depth of translucent objects using two distorted depth measures. However, both [7] and [18] require special capture scenarios because [7] uses the background depth map as a prior and [18] requires the controlled hardware setup.

In this study, we introduce a robust approach to recovering translucent objects using a single ToF camera and simple user marks as shown in Fig. 1. Our study is based on the depth error model developed by Shim and Lee [7], but extends the depth reconstruction framework by leveraging ground plane information and enforcing a piece-wise linear surface model. To avoid a curse of dimensionality in parameter estimation, [7] ignores the presence of noise, refraction components or irregularity of surface materials (e.g., non-uniform thickness of surface). These factors influence abnormal notches in the recovered depth map and are the major cause of crucial error peaks.

Fig. 1 Overview of the proposed algorithm.

Download Full Size | PDF

To alleviate these errors, we first utilize the ground plane to estimate the 3-D object. Because the object lies on the ground plane (unless it floats in the air), observing that the depth of the object is not very far from the depth of the object ground is possible. Moreover, we include the smoothness constraint into our formulation. More specifically, the object surface is treated as piece-wise linear. This is because even a non-planar surface can be approximated by multiple planar patches with a proper size of the local region. In addition, unlike the previous study [7], we do not require a background depth map as a prior. Instead, we use simple user marks that are rough scribbles to indicate the background and ground region. Based on the user marks, we extract the foreground region, ground, and background by assuming a planar background. Finally, a 3-D translucent object reconstruction problem is formulated by means of energy minimization using three cost terms; 1) a data term that evaluates the fitness of the depth error model, 2) a ground term that imposes the depth estimate close to the depth of the object ground, and 3) a linear surface term that compels the smooth surface. Because our additional cost terms are valid without the loss of generality, the proposed approach covers a wide range of object types and environmental conditions. In addition, our approach effectively eliminates error peaks and improves the accuracy of the estimated depth map.

The remainder of this paper is organized as follows. We introduce the background of depth error model in Section 2 and formulate the proposed approach through energy minimization in Section 3. Experimental results are then compared with those in [7] are presented in Section 4. Our extensive experimental analysis highlights the strength and robustness of our research.

2. Depth distortion of a translucent object in ToF imaging

By considering a thin layer object with a single background, the previous study in [7] developed a depth distortion model of translucent objects in ToF imaging. This model defines the recorded depth value as a function of its foreground depth d_f, background depth d_b and translucent parameter ξ, where d is $\sqrt{x^{2} + y^{2} + z^{2}}$ and denotes the distance of a 3-D point {x, y, z} from its origin of a camera optical axis. The four-bucket principle [11] employing a sinusoidal wave is used to construct this model and is defined as:

\begin{matrix} \dot{d} & = & f (d_{f}, d_{b}, ξ, \dot{I}, I_{b}) \\ = & \frac{c}{2} {tan}^{- 1} {\frac{A (Q_{3}^{b} - Q_{4}^{b}) + B (Q_{3}^{f} - Q_{4}^{f})}{A (Q_{1}^{b} - Q_{2}^{b}) + B (Q_{1}^{f} - Q_{2}^{f})}}, \end{matrix}

where

\begin{matrix} A = \frac{- (1 - ξ) L_{in} + \sqrt{{(1 - ξ)}^{2} L_{in}^{2} + 4 ξ^{2} I_{b} \dot{I} d_{f}^{4}}}{2 L_{in} d_{f}^{2}}, \\ B = \frac{(1 - ξ)}{d_{f}^{2}} . \end{matrix}

Table 1 summarizes the variables introduced in Eq. (1). We obtain L_in by using the calibration scheme described in [7]. When capturing the scene, depth cameras output a set of 3-D points (depth map) and their corresponding IR intensity values (IR image). Thus, ḋ and İ for each 3-D point are given by the raw measurements of the depth camera. $Q_{m}^{b}$ and $Q_{m}^{f}$ can be calculated from d_b and d_f respectively [9]. We use this model to evaluate the fitness of our estimate {d_f, ξ}. In other words:

[{\hat{d}}_{f}, \hat{ξ}] = arg min_{d_{f}, ξ} {‖ f (d_{f}, ξ, d_{b}) - \dot{d} ‖}^{2} .

Table 1. Summary of variables used in Eq. (1). Please see [7] for details.

View Table | View all tables in this article

3. Problem formulation through energy minimization

Our primary interest is to build a practical and reliable framework for reconstructing a 3-D translucent object. Although the depth distortion model described in Section 2 is feasible to account for the major cause of depth errors, this alone does not guarantee robust performance because of two strict assumptions: a homogeneous thin layer object and the ignorable light refraction. Unfortunately, real translucent objects often exhibit irregular surfaces (i.e., non-homogeneity) or contain refraction components. In addition, because of the nature of the active sensing principle, raw data are affected by substantial sensor noise. To address these concerns, we develop a robust approach to recovering a 3-D translucent object by formulating an energy minimization problem as illustrated in Fig. 1. Our objective function aggregates three cost terms to evaluate the 3-D point. The first term is adapted from [7] in order to enforce the fitness of the depth distortion model. This is equivalent to Eq. (2) except that we replace d with p, which indicates a {x, y, z} coordinate of a 3-D point. This is because other cost terms are associated with the coordinate instead of its distance. Table 2 lists new variables used in the following equations. The first energy term evaluates the fitness of the depth distortion model using:

E_{d} = \frac{1}{| T |} \sum_{p_{f} \in T} {‖ f (p_{f}, ξ, p_{b}) - \dot{p} ‖}^{2} .

Table 2. List of new variables for Eqs. (3)–(5).

View Table | View all tables in this article

We then introduce the second term, which penalizes the depth deviation from an object ground. Without loss of generality, we find that the depth distribution of a 3-D object is closely related to its ground position. To use the strong relationship, we use the distance between the estimated and ground depths as the second energy term. Letting p(z) be a z coordinate (depth) of p and g be the ground depth value, we develop the following energy term:

E_{g} = \frac{1}{| T |} \sum_{p_{f} \in T} {‖ p_{f} (z) - g ‖}^{2},

where the third term imposes the smoothness prior into our formulation. The common approach to applying the smoothness prior is to minimize the first order derivative of the object surface (i.e., minimizing the geometric variation). In our formulation, the second order derivative of the object surface is chosen as the third energy term and this is equivalent to enforcing a piece-wise linear surface model in the object geometry. In other words,

E_{s} = \frac{1}{| N |} \sum_{p_{f} \in N} {‖ h (p_{f}) ‖}^{2},

where h(·) represents a planar function with three parameters and N defines the local neighborhood. Notice that the implicit surface satisfies h(x, y, z) = ax + by + cz + d = 0. We apply the least squares method to compute the plane parameters {a, b, c, d} and use the deviation from the estimated plane as the energy term. This second order smoothness is suitable to represent the slanted surface and effective for curved surfaces with a reasonable piece size. Finally, we combine the aforementioned cost terms and construct the following energy equation:

[\hat{p_{f}}, \hat{ξ}] = arg min_{p_{f}, ξ} α E_{d} + β E_{g} + γ E_{s}, 0 \leq α, β, γ \leq 1, α + β + γ = 1 .

To solve Eq. (6), we use the Newton’s method. Although the convexity of energy equation is not guaranteed, we observe that the Newton’s method reaches an optimal solution. It is because the ground depth information provides a good initial value for our optimization problem. From the result of energy minimization, we simultaneously recover the 3-D translucent object and its translucent parameter. Throughout this study, we use $α = β = γ = \frac{1}{3}$ , which is empirically chosen. To derive this solution, we first must determine the following several variables used in each energy term: {T, g, N, p_b, ṗ}. Because ṗ is the raw measurement from the ToF depth camera, the remaining unknowns are {T, g, N, p_b}. We determine N to be an intersection of an N × N block centered on p_f and T. In this study, we set N = 5 for SR4K and N = 11 for Kinect 2.0.

To obtain p_b, Tg and g, we utilize a simple user interaction and the entire procedure is illustrated in Fig. 2. Assuming that the background and the ground region are planar surfaces, we ask the user to draw two scribbles to denote the background and ground plane attached to the object. Each of two scribbles is used to compute a plane parameter of the background and ground, respectively. Using these plane parameters, the background depth map (i.e., the depth map of the scene without the translucent object) is synthesized as shown in the third column of Fig. 2. We can then classify each pixel into one of three classes: background, translucent point, or ground. First, we determine that the pixel belongs to T if the difference between its recorded depth and synthesized background depth (the first and third columns in Fig. 2) is greater than a threshold (50 mm in our implementation). Second, the synthesized depth map provides p_b. Finally, we choose g as the smallest depth value along the intersection of the ground region and the boundary of T.

Fig. 2 Workflow for separating p_b, T and g using user scribbles.

Download Full Size | PDF

In practice, it should be considered that scribbles are not consistent for the same user over multiple trials or across multiple users. Fortunately, our priors are not solely dependent on user scribbles. Instead, we compute the ground plane and background plane based on scribbles and then derive a set of translucent points or the ground depth value from the ground and background information. In this way, user driven priors are not sensitive to variable scribbles because the user does not explicitly provide the ground depth value or a translucent point set. To confirm the robustness of user driven priors, we test the performance of ground and background plane estimation over different scribbles: a total of 45 scribbles which drawn by five users about three scenes (shown in Fig. 3) three times. For that, we compute the standard deviation of surface normal for the background and ground plane; their averages of the standard deviation are 1.48 degree and 4.35 degree, respectively. From this experiment, we observe that the plane estimation is reliable against variable scribbles.

Fig. 3 Qualitative comparison of recovered depth maps. From left to right: raw data, recovered depth map using E₁, E₂, E₃, E₄ and groundtruth. A snapshot of each experimental object is displayed at the top-left corner of each row. Note that E₁ indicates the algorithm proposed by [7], and E₄ is the proposed algorithm.

Download Full Size | PDF

4. Experimental results

Finally, experimental results are presented in the following subsections. In Section 4.1, we compare the results from the proposed algorithm to those of the previous study [7] and analyze the effect of each energy term on performance using MESA SR4K. Then, to show the adaptability of our framework to another sensor, we conduct an experiment using Kinect 2.0, which is the most popular off-the-shelf depth camera and can be purchased at an affordable price. Finally, based on extensive evaluations using various real-world objects illustrated in Figs. 3, 4, and 5, we justify the reliable performance of our proposed algorithm.

Fig. 4 Depth map reconstruction using Kinect 2.0. From left to right: raw data, recovered depth map using E₁, E₄ and groundtruth. A snapshot of each experimental object is displayed at the top-left corner of each row. For each depth map, we report the RMS error (mm) and each is computed after applying the median filter. Bold values indicate the best results for a given object.

Download Full Size | PDF

Fig. 5 Experimental results of various objects. From left to right: raw data, recovered depth map using E₁, and E₄. A snapshot of each experimental object is displayed at top-left corner of each row. A lamp cover is made of paper material, whereas a vinyl sheet is a flat, thin vinyl. From the recovered depth maps, we can observe that the proposed method recovers the original shape and reduces the number of undesirable surface notches.

Download Full Size | PDF

4.1. Analyzing the effect of energy term

As described in Section 3, we adapted three energy terms to our framework: the fitness of depth distortion model (E_d), ground depth prior (E_g), and planar regularization (E_s). To analyze the contribution of each energy term, we compared the accuracy of the 3-D reconstruction using four energy equations: E₁ = E_d, E₂ = E_d + E_g, E₃ = E_d + E_s, and E₄ = E_d + E_g + E_s. This comparison, enables us to understand the contribution of an additional prior(s) aggregating with E₁. For the evaluation, the following three experimental objects were chosen: a brown plastic box with a frontal pose (frontal drawer), the same box rotated along the vertical axis (slant drawer), and a white plastic cylinder (cylindrical basket).

We conducted both quantitative and qualitative evaluations using these three experimental objects. The results are presented in Tables 3 and 4, and in Fig. 3. When acquiring the raw data using MESA SR4K, we applied [19] as part of preprocessing in order to restore the dynamic range of the depth map. In this manner, we successfully removed holes and reduced the spatial-temporal noise from the raw data. For the quantitative evaluation, we obtained the groundtruth depth map as follows. For a planar object, we captured its depth map by attaching it to white matte paper and run RANSAC on the recorded depth map. The recorded depth map was then refined by means of plane parameters and this served as our groundtruth depth map. For the cylindrical basket, we filled it with opaque liquid and captured its depth map. Using the cylinder shape as a prior, we refined the recorded depth map and used it as our groundtruth depth map.

Table 3. Comparison of average depth reconstruction errors using three metrics. Three experimental objects shown in Fig. 3 are used to compute an average of the reconstruction error per metric: rel is a relative error, RMS (mm) is a root mean squared error, and log₁₀ is an absolute difference of log of depth maps. Bold values indicate the best result for a given error metric.

View Table | View all tables in this article

Table 4. Reconstruction errors of a depth map based on the choice of energy function. We use an RMS error as an error metric and mm as its unit. The denotation of “−” indicates that no post-processing is applied before errors are computed, whereas “Median” refers to median filtering used to eliminate outliers before errors are computed. Note that E₁ indicates the algorithm proposed by [7], and E₄ is the proposed algorithm. Bold values indicate the best two results for a given object.

View Table | View all tables in this article

Given both the groundtruth and recovered depth maps of the three experimental objects, we reported average reconstruction errors for the three common metrics listed in Table 3. Letting p(z) be the groundtruth depth and p(ẑ) be the reconstructed depth, the relative (rel) error is $\frac{1}{| T |} \sum_{p \in T} | p (z) - p (\hat{z}) | / p (z)$ , log₁₀ error is $\frac{1}{| T |} \sum_{p \in T} | \log_{10} (p (z)) - \log_{10} (p (\hat{z})) |$ , and the RMS is $\sqrt{\sum_{p \in T} {(p (z) - p (\hat{z}))}^{2} / | T |}$ . This assessment reveals that our method consistently performs better than described in the previous study [7], including other combinations of energy equations for all three metrics.

To analyze further the error characteristics based on object variations, we list reconstruction errors in Table 4. By comparing the groundtruth and recovered depth maps corresponding to four energy equations, the reconstruction errors were computed for each object and are summarized in the last four columns of Table 4. For the qualitative evaluation, we visualized the recovered depth maps shown in Fig. 3. Note that the results shown were generated after applying the median filter.

Our experiment yielded several interesting observations. First, based on the range of errors in the raw data (up to 500 mm), confirming that translucency introduces critical errors in depth maps was possible. Comparing the third and fourth columns in Table 4, we observed that the depth distortion model [7] was effective at reducing reconstruction errors. When including the additional prior(s) into the formulation, substantial gains were achieved and are reported in the last three columns in Table 4. In addition, we determined that simple post-processing (i.e., a median filter) can be useful at handling outliers.

In general, the proposed algorithm E₄ outperforms that in [7] both quantitatively and qualitatively. More importantly, our results consistently reported approximately 20 mm of reconstruction errors regardless the type of object. Note that the type of object greatly influenced the amount of errors in raw data and other energy equations. Between the third and sixth columns in Table 4, the errors for slant drawer are much greater than those for other objects. This is because the slant drawer was a case that did not obey the depth distortion model. In other words, its surface normal was not parallel to the incident IR light ray; as a result, the light refraction could no longer be ignored but significant according to Snell’s law. Note that Shim and Lee ignored the effect of light refraction when developing their depth distortion model. Consequently, this object represented one of the most challenging cases in the previous study [7]. Despite the theoretical limitation of the depth distortion model, our method achieved successful performance when boosted by additional prior terms.

Moreover, examining errors derived from different objects revealed interesting facts about each energy term. For the frontal drawer, we found that E₂ formulation was as effective as the proposed algorithm for the RMS errors. The same observation held in the 2-D visualization of recovered depth maps; the third and fifth columns in Fig. 3 are visually indistinguishable, even with groundtruth. In fact, this was anticipated because the frontal drawer represents a simple planar object with the same depth value. In this special case, the ground depth value was an optimal constraint. Therefore, it is reasonable that the formulation including the ground depth prior yielded the most accurate results in our experiments. The slanted drawer is a planar surface; thus, it should fit well with planar regularization. However, the significance of the improvement through planar regularization was not prominent in E₃ because the results from the depth distortion model were invalid. Instead, the ground depth prior served as a meaningful initial depth, and then the depth distortion model restored the erroneous depth at an affordable range, where planar regularization helped recover the shape. This explains why the proposed algorithm sustained reasonable accuracy whereas other algorithms do not. The cylindrical basket was chosen to represent a generalized object, which none of the three energy terms fits precisely. This object does not match the two layer configuration assumed in the depth distortion model. In addition, its surface presents variable depth such that the ground depth prior might introduce an improper bias into the optimization. Moreover, E_s was inaccurate because its surface was not planar. Although no prior was a strong cue for depth reconstruction, we found that aggregating three terms was effective at reinstating the original shape of the object, as seen in the third row of Fig. 3.

4.2. Evaluations using Kinect 2.0

In this section, we present experimental results using Kinect 2.0. While the first Kinect version used a structured light approach, the recent release Kinect 2.0 is based on the ToF principle [20]. Using reverse engineering, Blake et al. [21] observed that Kinect 2.0 uses a three-phase reconstruction approach. By contrast, we developed a depth distortion model based on four-phase depth reconstruction. Although modifying the depth distortion model according to different phase modulation schemes is possible, we decided to use the four-phase depth reconstruction instead. This is because the manufacture did not release a formulation on the manner in which the depth value is precisely computed from three phase delays. Fortunately, we empirically observed that the depth distortion model built upon the four-bucket principle still provides reasonable approximates for Kinect 2.0.

Fig. 4 shows the recovered depth maps with RMS error. To generate the groundtruth depth map, we used the same scheme described in the previous section. By comparing numerical errors, we confirmed that the proposed algorithm achieved a higher accuracy than did [7]. Moreover, the proposed algorithm was effective at pruning errors along the object boundary and restoring the original shape of the object. These experimental results are analogous to those described in Section 4.1 using MESA SR4K. Consequently, we empirically showed that our research can be extended to different ToF depth cameras.

We must note that, unlike MESA SR4K, Kinect 2.0 does not allow us to control the exposure time. This means that the preprocessing algorithm [19] could not be applied when acquiring raw data. Therefore, a substantial portion of the depth map remains as holes presenting invalid depth values. For both quantitative and qualitative evaluations, we simply discarded those yielding invalid pixels and compared the region retaining valid depth values.

5. Conclusion and discussion

This paper presented a practical solution for recovering a 3-D translucent object using a single depth map from a ToF depth camera and simple user scribbles. Because of the complex nature of light interactions within translucent media, the previous studies ignored the effect of refraction, irregularity of surface materials, and sensor noise. Therefore, the performance of existing techniques has often proven unreliable, producing notches such as error peaks. To alleviate these issues, we aggregated the depth distortion model with ground plane information and a smoothness constraint in order to reconstruct translucent objects. Our study showed that the proposed algorithm is not only effective at improving the accuracy of 3-D reconstruction, but is also feasible for practical applications when only a single depth map is available while no other conditions (e.g., a background or camera viewpoint) are controllable.

To assess performance, we conducted both quantitative and qualitative evaluations compared with those of Shim and Lee [7]. Extensive experiments reveal that the proposed algorithm is superior to previous algorithms at improving accuracy, restoring the original object shape and reducing error peaks. In addition, the proposed algorithm covers various materials such as plastic, translucent paper and vinyl and any smooth surfaces with various orientations. Moreover, by performing our experiments using two ToF cameras (i.e., MESA SR4K and Kinect 2.0), we confirm the adaptability of our framework to different ToF cameras.

One concern with our method is that it recovers a low resolution 3-D shape. This is because 3-D sensors cannot capture minute details of surfaces. Moreover, our method cannot restore a transparent object using this framework, as we do not consider light traversal (because of refraction). In the future, we plan to integrate the distortion caused by light refraction such as described in [8] to develop a comprehensive depth distortion model.

Funding

The MSIP (Ministry of Science, ICT and Future Planning), Korea, under the “ICT Consilience Creative Program” (IITP-R0346-16-1008) supervised by the IITP (Institute for Information & communications Technology Promotion); the Basic Science Research Program through the National Research Foundation of Korea(NRF) funded by the Ministry of Science, ICT and Future Planning (NRF-2016R1A2B4016236).

References and links

1. B. Huhle, T. Schairer, P. Jenke, and W. Strasser, “Robust non-local denoising of colored depth data,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition Workshop (IEEE, 2008), pp. 1–7.

2. L. Jovanov, A. Pižurica, and Wilfried Philips, “Fuzzy logic-based approach to wavelet denoising of 3D images produced by time-of-flight cameras,” Opt. Express , 18, 22651–22676, (2010). [CrossRef] [PubMed]

3. H. Schäfer, F. Lenzen, and C. Garbe, “Model based scattering correction in time-of-flight cameras,” Opt. Express , 22, 29835–29846, (2014). [CrossRef]

4. J. Park, H. Kim, Y.-W. Tai, M. S. Brown, and I. Kweon, “High quality depth map upsampling for 3D-TOF cameras,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2011), pp. 1623–1630.

5. S. Schuon, C. Theobalt, J. Davis, and S. Thrun, “High-quality scanning using time-of-flight depth superresolution,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition Workshop, (IEEE, 2008), pp. 1–7.

6. H. Shim and S. Lee, “Performance evaluation of time-of-flight and structured light depth sensors in radiometric/geometric variations,” Opt. Eng. 51(1), 94401–94414 (2012). [CrossRef]

7. H. Shim and S. Lee, “Recovering translucent object using a single time-of-flight depth camera,” IEEE Trans. Circuits Syst. Video Technol. 26(5), 841–854, May , (2016). [CrossRef]

8. K. Tanaka, Y. Mukaigawa, H. Kubo, Y. Matsushita, and Y. Yagi, “Recovering Transparent Shape from Time-of-Flight Distortion,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, (IEEE, 2016), pp. 4387–4395.

9. S. Foix, G. Alenya, and C. Torras, “Lock-in time-of-flight (ToF) cameras: a survey,” IEEE Sens. J. 11(9), 1917–1926, (2011). [CrossRef]

10. M. Gupta, S. K. Nayar, M. B. Hullin, and J. Martin, “Phasor Imaging: A Generalization of Correlation-Based Time-of-Flight Imaging,” ACM Trans. Graph. 34(5), 1–18, October , (2015). [CrossRef]

11. A. Bhandari, A. Kadambi, R. Whyte, C. Barsi, M. Feigin, A. Dorrington, and R. Raskar, “Resolving multipath interference in time-of-flight imaging via modulation frequency diversity and sparse regularization,” Opt. Lett. 39, 1705–1708, (2014). [CrossRef] [PubMed]

12. S. Fuchs, “Multipath Interference Compensation in Time-of-Flight Camera Images,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2010), pp. 3583–3586.

13. D. Jimenez, D. Pizarro, M. Mazo, and S. Palazuelos, “Modeling and correction of multipath interference in time of flight cameras,” Image Vision Comput. 32(1), 1–13, January , (2014). [CrossRef]

14. M. Feigin, A. Bhandari, S. Izadi, C. Rhemann, M. Schmidt, and R. Raskar, “Resolving Multipath Interference in Kinect: An Inverse Problem Approach,” IEEE Sens. 16(10), 3419–3427, (2016). [CrossRef]

15. D. Freedman, E. Krupka, Y. Smolin, I. Leichter, and M. Schmidt, “SRA: Fast removal of general multipath for tof sensors,” In Proceedings of Euroupean Conference in Computer Vision, (Springer, 2014), pp. 234–249.

16. M. Feigin, R. Whyte, A. Bhandari, A. Dorington, and R. Raskar, “Modeling ‘wiggling’ as a multi-path interference problem in AMCW ToF imaging,” Opt. Express , 23(15), 19213–19225, July , (2015). [CrossRef] [PubMed]

17. K. Achuta, W. Refael, B. Ayush, S. Lee, B. Christopher, D. Adrian, and R. Ramesh, “Coded time of flight cameras: Sparse deconvolution to address multipath interference and recover time profiles,” ACM Trans. Graph. 32(6), 167November , (2013).

18. S. Lee and H. Shim, “Skewed stereo time-of-flight camera for translucent object imaging,” Image and Vision Comput. 43, 27–38, November , (2015). [CrossRef]

19. H. Shim and S. Lee, “Hybrid exposure for depth imaging of a time-of-flight depth sensor,” Opt. Express , 22, 13393–13402, June , (2014). [CrossRef] [PubMed]

20. H. Sarbolandi, D. Lefloch, and A. Kolb, “Kinect Range Sensing: Structured-Light versus Time-of-Flight Kinect,” Comput Vis Image Und. , 139, 1–20, October , (2015). [CrossRef]

21. J. Blake, F. Echtler, and C. Kerl, “OpenKinect: Open source drivers for the kinect for windows v2 device,” https://github.com/OpenKinect/libfreenect2.

ḋ	Recorded depth value ( $\sqrt{x^{2} + y^{2} + z^{2}}$ )	İ	Recorded IR intensity value
d_f	Foreground depth value	I_b	IR intensity value of background
d_b	Background	Q_m	Correlation between the input and m^th modulation signal
ξ	Translucent parameter	L_in	Emitted IR intensity value

ṗ	Recorded 3-D point	p_f	Foreground 3-D point
g	Depth value of ground point	p_b	Background 3-D point
T	A set of translucent points	N	A set of local neighborhood
\|T\|	The number of points in T	\|N\|	The number of points in N

Method	Raw data	E₁	E₂	E₃	E₄
Rel	0.24	0.07	0.04	0.06	0.02
log₁₀	92.3 × 10⁻³	32.8 × 10⁻³	18.7 × 10⁻³	31.5 × 10⁻³	5.3 × 10⁻³
RMS	276.89	96.46	57.88	88.87	21.98

Object	Filtering	Raw data	E₁	E₂	E₃	E₄

Frontal drawer	-	142.94	80.60	20.92	37.38	21.21
Frontal drawer	Median	143.73	47.90	17.78	35.07	18.12

Slant drawer	-	475.37	182.68	124.59	166.40	71.46
Slant drawer	Median	476.69	165.52	100.79	153.65	22.75

Cylindrical basket	-	94.81	40.78	30.26	41.96	28.00
Cylindrical basket	Median	94.75	37.40	26.28	38.66	24.10

ḋ	Recorded depth value ( $\sqrt{x^{2} + y^{2} + z^{2}}$ )	İ	Recorded IR intensity value
d_f	Foreground depth value	I_b	IR intensity value of background
d_b	Background	Q_m	Correlation between the input and m^th modulation signal
ξ	Translucent parameter	L_in	Emitted IR intensity value

Robust approach to reconstructing transparent objects using a time-of-flight depth camera

Abstract

1. Introduction

2. Depth distortion of a translucent object in ToF imaging

3. Problem formulation through energy minimization

4. Experimental results

4.1. Analyzing the effect of energy term

4.2. Evaluations using Kinect 2.0

5. Conclusion and discussion

Funding

References and links

Cited By

Figures (5)

Tables (4)

Equations (7)

Optics Express