Robust depth estimation for multi-occlusion in light-field images

Wei Ai; Sen Xiang; Li Yu

doi:10.1364/OE.27.024793

1. Introduction

With 3D and VR (virtual reality) technology growing quickly, depth estimation becomes one of the most popular research issues for the last few years. Because light-field camera can provide fruitful information captured by multiple viewpoints from a single shot [1–4], various approaches [5–9] (e.g. epipolar plane image, focal stack, and angular patch) have been proposed to develop light field depth estimation algorithms. Wanner et al. [10] utilized structure tensor to obtain dominant directions on epipolar plane images. Tao et al. [11, 12] firstly combined the defocus cue and the correspondence cue, then the shadow cue was added to make depth map more precise. Jeon et al. [13] made use of the phase shift theorem in the Fourier domain to estimate the sub-pixel shifts of sub-aperture images. These algorithms fail on occlusions because angular pixels may be observed from different objects due to occlusions, and photo-consistency assumption no longer holds.

Recently, Wang et al. [14, 15] explicitly modeled occlusions by using edge orientation to separate the angular patch into two equal regions by a straight line. The method improved depth results for line single-occlusion situations, but a straight line is not enough to deal with multi-occlusion situations. Zhu et al. [16] used k-means to separate the patch for selecting unoccluded views, and thus improves the performance with single-occlusion situation. However, the algorithm depends on the existing K-means strategy clustering effect, and it can not handle this situation with more complex clusters. Williem et al. [17,18] proposed a novel angular entropy and an adaptive defocus response to estimate the depths, but some regions are still over-smoothing because angular entropy is too random for complex details. There are also some other algorithms handling occlusions [19–21] and Each method has its own advantages and disadvantages.

In this paper, we propose a novel algorithm to solve the multi-occlusion problem. Our depth estimation algorithm is developed from occlusion in different views. For occlusion in the central view, an adaptive connected-domain selection algorithm is proposed to accurately separate the spatial patch. By the light-field multi-occlusion model between the spatial domain and the angular domain, the divided spatial patch can be mapped to the angular patch and the consistency regions (unoccluded views) can be obtained more precisely. The first moment and the second moment cues are applied on the consistency region to estimate depth. For occlusion in other views, we present a subpatch idea to design an estimated algorithm, which can obtain more accurate initial depth than the method without handling it. In order to evaluate initial depth more precisely, we present a creative way for confidence analysis, which considers curvature factor for the first time. Finally, a Markov Random Field (MRF) energy model is built to regularize depth maps. As shown in Fig. 1, the proposed algorithm provides much sharper boundaries and more precise details. Experimental results demonstrate that compared with state-of-the-art algorithms on light field, our method can obtain more accurate depth. The main contributions of this paper is as follows:

We analyze the light transmission model and derive the corresponding relationship on different occlusion situations. Then we extend it to multi-occlusion so that it can handle more complex situations.
We take different occlusions into consideration and propose corresponding algorithms to obtain unoccluded views precisely. A novel confidence analysis is designed in energy model to regularize depth maps better.

Fig. 1 Comparison of results of state-of-the-art depth estimation algorithms on a multi-occlusion real scene.

Download Full Size | PDF

2. Light-field multi-occlusion model

In this section, we first analyze different occlusion cases before we introduce the proposed light-field multi-occlusion model. There is a consensus that edge pixels and edges are considered as candidate occlusion pixels and candidate occlusion boundaries respectively and many algorithms start from it. However, some points near edges are not occluded in the central view, but occluded in some other views as Fig. 2 shows. So different from previous work, we analyze different multi-occlusion situations from the perspective of light. It can also be extended to any shape occlusions and deal with different occlusion situations.

Fig. 2 Different occlusion situations. (a) The light model of two occlusion situations. Point x₁ is occluded in the central view u₃ and it is the edge pixel in the central view image. Near the point x₁, point x₂ is not occluded in the central view u₃ and it is not the edge pixel in the central view image. However it can be seen that point x₂ is occluded in other views (u₁). (b) Real images of two occlusion situations. The left column of Fig. 2(b) are the spatial patches of two points in the central view image. The right column of Fig. 2(b) are their angular patches when refocused to the true depth. There are some occluded views in the angular patch of green point although it is not occluded in the central view.

Download Full Size | PDF

One occluder analysis:

For analyzing multi-occlusion better, we firstly take one occluder as an example as shown in Fig. 3. A point (x₀, y₀, F) located on the focal plane at the depth of F, and an occluder at Z₁ plane has a straight edge, which contains the point (X₁, Y₁, Z₁). (X₀, Y₀, F) is projected to Z₁ plane at (X₀, Y₀, Z₁) and the occluder point (X₁, Y₁, Z₁) is projected to F plane at (x₁, y₁, F). Assume that the connection between two points is perpendicular to the edge, so the formed line is the distance between center point and the edge.

Fig. 3 The light-field single occluder model. The left of Fig. 3(a) shows the pinhole imaging model with an occluder where the center of the camera image is (u₀, v₀) and the right of Fig. 3(a) is the spatial patch centered at (x₀, y₀). The left of Fig. 3(b) shows the point (x₀, y₀, F) can be focused by the views within the edges of an occluder while the other views are blocked by the occluder. The right of Fig. 3(b) is the angular patch formed at (x₀, y₀).

Download Full Size | PDF

In Fig. 3(a), for the central view (u₀, v₀), the normal vector of the edge at Z₁ plane can be expressed by (X₁ − X₀, Y₁ −Y₀) and the actual distance D_gt is modulus of the normal vector. When the light at Z₁ plane is projected to f plane, these relationships can be obtained by projection principle.

\begin{array}{l} D_{spatial} = \frac{F}{Z 1} * D_{gt} \\ \underset{(x_{1} - x_{0}, y_{1} - y_{0})}{\to} / / \underset{(X_{1} - X_{0}, Y_{1} - Y_{0})}{\to} \end{array}

and through the Fig. 3(b), the main lens is refocused on the depth f. Similarly, by the reserved light model these can be obtained.

\begin{array}{l} D_{angular} = \frac{F}{F - Z 1} * D_{gt} \\ \underset{(u_{1} - u_{0}, v_{1} - v_{0})}{\to} / / \underset{(x_{1} - x_{0}, y_{1} - y_{0})}{\to} \end{array}

Then considering the distance, we firstly assume that (X₀, Y₀) is the edge pixel in central view. D_gt, D_spatial, D_angular are zero. From the analysis above, it can be seen that the vector (x₀ − x₁, y₀ − y₁) of the edge in spatial domain is consistent with the vector (x₀ − x₁, y₀ − y₁) of the boundary between occluded views and unoccluded views in the angular domain as shown in Fig. 4(a). We call it boundary-consistency. Secondly, we assume that (X₀, Y₀) is near the edge pixel in central view. D_gt is non-zero, so D_spatial is different from D_angular because of the distance as shown in the right of Fig. 4(b).

Fig. 4 (a) The patch situations on condition that (x₀, y₀) is the edge pixel and occluded in the central view. (b) The patch situations on condition that (x₀, y₀) is near the edge pixel and occluded in other views.

Download Full Size | PDF

Multi-occluders analysis:

Then we extend the model to multi-occlusion model as shown in Fig. 5. Take the case of two occluders, a point (X₀, Y₀, F) located on the focal plane at the depth of F. An occluder at Z₁ plane has a straight edge, which contains the point (X₁, Y₁, Z₁), and another occluder at Z₂ plane has a oblique edge with point (X₂, Y₂, Z₂). Let the point (X₀, Y₀) be the origin, any point (X, Y) to the origin can form a vector (X − X₀, Y − Y₀).

Fig. 5 The light-field multi-occlusion model. The left of Fig. 5(a) shows the pinhole imaging model with a multi-occluder where the center of the camera image is (u₀, v₀) and the right of Fig. 5(a) is the spatial patch centered at (x₀, y₀). The left of Fig. 5(b) shows the point (x₀, y₀, F) can be focused by the views within the edges of multi-occluder (shown by two green planes), while the other views are blocked by the occluder. The right of Fig. 5(b) is the angular patch formed at (x₀, y₀).

Download Full Size | PDF

And for the central view (u₀, v₀), the two edges can be expressed by

\begin{array}{l} \vec{e_{1}} = (X_{1} - X_{0}, Y_{1} - Y_{0}) \\ \vec{e_{2}} = (X_{2} - X_{0}, Y_{2} - Y_{0}) \end{array}

As shown in Fig. 5(a), if the point is not occluded in the spatial domain, its vector must be between the vectors of two edges. Assume that (X₁, Y₁, Z₁) is the upper boundary and (X₂, Y₂, Z₂) is the lower boundary, the vector of an unoccluded point $\vec{e_{up}} = (x_{up} - x_{0}, y_{up} - y_{0})$ , so we can infer that it should satisfy the following constraints.

\begin{array}{l} \vec{e_{up}} \times \vec{e_{1}} > 0 D 1_{spatial} = \frac{F}{Z_{1}} * D 1_{gt} \\ \vec{e_{up}} \times \vec{e_{2}} < 0 D 2_{spatial} = \frac{F}{Z_{2}} * D 2_{gt} \end{array}

Then considering Fig. 5(b) in the angular domain, the main lens is refocused on depth F. Similarly, an unoccluded view (u_uo, v_uo) can form the vector $\vec{e_{u v}} = (u_{u o} - u_{0}, v_{u o} - v_{0})$ , if the point (x₀, y₀, F) can be observed by the view, it also obeys the rule.

\begin{array}{l} \vec{e_{u v}} \times \vec{e_{1}} > 0 D 1_{angular} = \frac{F}{F - Z_{1}} * D 1_{gt} \\ \vec{e_{u v}} \times \vec{e_{2}} < 0 D 2_{angular} = \frac{F}{F - Z_{2}} * D 2_{gt} \end{array}

From the analysis above, when D1_gt, D2_gt are zero, Eq. (4) and Eq. (5) have the same constraints with one-to-one correspondence, so (u₀, v₀) and (x₀, y₀) share the same separation line. So for occlusion in central view, boundary-consistency exists in any occlusions. This helps obtaining the consistency region (unoccluded views) in the angular domain. when D1_gt, D2_gt are non-zero, the situation becomes very complex. Occlusions with varying depths have different offsets with the central point. Unfortunately, since depth estimation is unavailable, it is inapplicable to obtain the offset distance. So for occlusion in other views, an alternative approach is designed in our algorithm to obtain the consistency region approximately.

3. Initial depth estimation

The angular patch of each unoccluded pixel exhibits photo-consistency in all regions while the angular patch of each occluded pixel exhibits photo-consistency in part of regions. The key issue is selecting the consistency region (unoccluded views vs. occluded views) in the angular patch for each occluded pixel. In this section, we present how to select the consistency region in the angular domain, and introduce how to obtain an initial depth map by the consistency region based on different properties for the central view and the other views.

3.1. Anti-occlusion in the central view

3.1.1. Adaptive connected-domain selection

Edge detection is applied to the central view (pinhole) image to obtain an edge map. These edge pixels and edges are considered as candidate occlusion pixels and candidate occlusion boundaries respectively. For each edge pixel we extract an edge patch with pixel p being the center, and the patch is in the same size with the angular resolution of light field. Four-Connected Components Labeling algorithm [22,23] is applied on the edge patch to label the spatial patch and these pixels with the same label compose a region. So the patch is divided into several regions because of different labels.

In addition, Connected Components Labeling algorithm can not label these edge line pixels, so in order to label them, we design a method which fuses the color distance in Eq. (6) and the space distance in Eq. (7), which are shown as follows.

\bar{{Dc}_{i}} = \frac{1}{N_{i}} \sum | I_{e} - I_{i} |

{Ds}_{i} = \sqrt{{(x_{e} - x_{c_{i}})}^{2} + {(y_{e} - y_{c_{i}})}^{2}}

{Label}_{e} = \arg min_{i} \bar{{Dc}_{i}} * {Ds}_{i}

where I_e and I_i denote respectively the intensity of pixels at edge line and pixels labeled i. N_i is the number of pixels labeled i. x_e and x_{c_i} are respectively x-coordinate of pixels at edge line and the center pixels of the region labeled i. y_e and y_{c_i} have the similar meaning but are with y-coordinate.

According to the boundary-consistency mentioned in section 2, angular patches have the same labels with edge patches. The region including the center pixel p is selected into the consistency region Ω_p. Compared with state-of-the-art algorithms, our method can divide the patch into several regions adaptively and correctly for multi-occlusion. As an example, Fig. 6 shows the process result for a multi-occlusion point. Compared with other algorithms, the consistency region selected by our method is more accurate.

Fig. 6 Consistency region selection analysis. (a) A multi-occlusion point. (b) Spatial patch for the point. (c) Our initial label result. (d) Our relabeled result(white area is the consistency region). (e) Wang et al. [15]. (f) Zhu et al. [16]. (g) Ground truth.

Download Full Size | PDF

3.1.2. Depth estimation

We refocus the light field data to various depths, the light field 4D shearing is performed as follows [2].

L_{α} (x, y, u, v) = L (x + u (1 - \frac{1}{α}), y + v (1 - \frac{1}{α}), u, v)

where L is the input 4D light-field data and α is the refocused depth. L_α is the refocused light-field image at depth α. (x, y) is the spatial coordinate. (u, v) is the angular coordinate and the central image of the light-field data is defined as L(x, y, 0, 0).

In order to analyze regional pixel consistency, we calculate the first moment and the second moment of the region Ω_p. For pixel p, Ω_p is selected by adaptive connected-domain selection algorithm if p is an edge point, otherwise Ω_p is the whole patch. Specific calculation methods are as follows. The first moment of error is performed by

E (Δ_{α} (x, y)) = | \frac{1}{N_{i}} \sum_{u_{i}, v_{i}} L_{α} (x, y, u_{i}, v_{i}) - L_{α} (x, y, 0, 0) | i = Ω_{p}

where L_α is the refocused light-field image at depth α, (u_i, v_i) is the consistency region of pixel p and N_i is the number of pixels in the consistency region. Then the second moment is performed by

E (Δ_{α}^{2} (x, y)) = \frac{1}{N_{i} - 1} \sqrt{\sum_{u_{i}, v_{i}} {(L_{α} (x, y, u_{i}, v_{i}) - L_{α} (x, y, 0, 0))}^{2}}

Finally, the sum of first and second moment is the total cost of the point at α depth. The initial depth α is determined as

α (x, y) = \arg min_{α} (E (Δ_{α} (x, y)) + E (Δ_{α}^{2} (x, y)))

3.2. Anti-occlusion in other views

For pixels around the occlusion edge, previous work [15,16] just dilated edges with a rough size, so depth edges are blurry because of the uncertain size. In our method, we design a filter to determine these points. Since consistency region of these points occluded in other views are imprecise and they should have larger cost than other points. We calculate the mean and variance of all points’ cost in the scene, which are used as parameters of a filter to identify these points and mark them.

For the marked points, instead of measuring the cost in consistency region which is affected by occlusion in other views, we search a subregion to avoid influence of occlude in other views. In our method, the angular patch (9 × 9) is divided into 9 subpatches (3 × 3). The total cost C_α(p) of each subpatch p is performed by

C_{α} (p) = \frac{1}{N_{p}} \sqrt{\sum {(L_{α} (x, y, u_{p}, v_{p}) - L_{α} (x, y, 0, 0))}^{2}} + | (\frac{1}{N_{p}} \sum L_{α} (x, y, u_{p}, v_{p})) - L_{α} (x, y, 0, 0) |

where p is the index of subpatch. N_p is the number of pixels in subpatch. The minimum subpatch cost is selected as the new consistency region at α depth.

i = \arg min_{p} C_{α} (p)

The depth is estimated by

α (x, y) = \arg min_{α} C_{α} (i)

As an example, Fig. 7 shows the process results. The initial depth map in Fig. 7(b) is obtained by Eq. (12), and the areas occluded in other views (close to the edge) in the initial depth map obtain imprecise depth. In Fig. 7(c) these areas are detected by our method. And in Fig. 7(d), most of these areas are corrected to the accurate depth by our algorithm and sharper boundaries are well preserved.

Fig. 7 An example of botton scene. (a) color image. (b) Initial depth map. (c) Detection map for points occluded in other views (the white area). (d) refined depth map

Download Full Size | PDF

4. Depth regularization

Given the initial depth obtained in Section 3, we present in this section the approach of depth refinement with global regularization. More specifically, we incorporate curvature based confidence analysis into the data cost, and the depth evaluation is more accurate.

4.1. Curvature based confidence analysis

In order to analyze the confidence, we select two pixels as an example. The two pixels are marked in red and yellow color in the initial depth map in Fig. 8(a). Their Depth-Cost (D-C) curves with the horizontal axis and the vertical axis being total cost and different depths respectively show in Fig. 8(b). The D-C curve of the red point is very different from that of the yellow point. Especially, near the minimum value curve of the red point has a sharper change than that of the yellow point and the minimum value of the red curve is smaller.

Fig. 8 Confidence analysis. (a) Two points marked in initial depth map(red point and yellow point). (b) The D-C curves of two points: top one for the yellow point in Fig. 8(a) and bottom one for the red point in Fig. 8(a). Ground truth is 90 for all.

Download Full Size | PDF

For each pixel, if the consistency region is precisely selected, it will exhibit photo-consistency and when reaching the lowest cost, the total cost is sure to be very small and more focused at its minimal. Therefore, these pixels have a higher confidence with more accurate initial depth. For example, the depth of red point is more reliable than that of yellow point. Based on this finding, we propose a method to estimate confidence of initial depth as follows

C_{\min} (x, y) = \min (E (Δ_{α} (x, y)) + E (Δ_{α}^{2} (x, y)))

Then the sharper change can be measured by the curvature of curve at the lowest point, which can be expressed by

curvature : Cur (D, C) = (\frac{d^{2} C}{{(1 + d C^{2})}^{\frac{3}{2}}})

In order to improve the robustness of the algorithm and eliminate the effects of noise, we find multiple troughs of the curve except (D, C_min) point and sort them in ascending order, so the second smallest point (D_second, C_second) is obtained. Then the curvature of (D_second, C_second) is calculate by Eq. (17). At last, the confidence of pixel (x, y) is obtained by

Con (x, y) = k \frac{Cur (D, C_{\min})}{C_{\min}} * \frac{Cur (D, C_{\min})}{Cur (D_{second}, C_{second})} * \frac{C_{second}}{C_{\min}}

where k is a weight. In our experiment, we set 10 for all data. Figure 9 shows the result. Figure 9(c) shows the difference between ground truth and initial depth map where brighter pixels have larger difference. Figure 9(d) is the corresponding confidence map where darker pixels have larger confidence. The confidence map is consistent with the difference map mostly.

Fig. 9 An example of boxes scene. (a) color image. (b) Initial depth map. (c) Difference map. (d) Confidence map

Download Full Size | PDF

4.2. Final depth regularization

Finally, given the initial depth and the confidence cue, we refine the result with global regularization by using data cost and smoothness cost. More specifically, the initial depth map is regularized with a Markov Random Field (MRF) model and the problem is further casted to minimizing the energy as follows.

E = \sum_{p} E_{data} (p, α (p)) + λ \sum_{p, q} E_{smooth} (p, q, α (p), α (q))

where α is the final depth, p, q are neighboring pixels, and λ is a weight which we set to 5.

Based on the confidence, data cost is defined as a Gaussian function, as follows.

E_{data} (p, α (p)) = 1 - {exp}^{- \frac{{(α - α (p))}^{2}}{2 {(1 - con (p))}^{2}}}

where con(p) controls the sensitivity of the function which can be found in confidence analysis section.

The smooth energy cost controls the smoothness constraint between two neighboring pixels, which is defined as

\begin{array}{l} E_{smooth} (p, q, α (p), α (q)) = \\ \frac{| α (p) - α (q) |}{(\nabla I (p) - \nabla I (q)) + ω_{c} (I_{e} (p) - I_{e} (q)) + δ} \end{array}

where ∇I is the gradient of the central pinhole image, ω_c is a weight balances between the gradient and the edge, and δ is 0.1 to avoid infinite smoothness term. If two pixels are very similar, the gradient term and the edge term are very small to compose a region. In contrast, if two pixels are very different or there may be an occlusion, the gradient term and the edge term preserve sharp boundary. The minimization of E is solved by a standard graph cut algorithm [24] [25] [26].

5. Experimental results

In order to evaluate the performance of the proposed method, we test it on both synthetic and real light field datasets. The synthetic datasets are from public datasets provided by 4D Light Field Dataset [27], and it contains light field images and ground truth for comparing. Using the stratified, test and training images of the 4D Light Field Dataset, we compare all results with recent algorithms ([28], [29], [13], [30], [14,15], [16] and [17,18]) in various aspects. Details of the resulting disparity maps of these existing methods can be found in [27]. Then the real light field database is created by Stanford University [31]. In experiments, depths of all methods are set to [0,100] for fair comparison and the given disparity range on each image is provided by the synthetic dataset or [−1,1] in the real dataset. Our proposed algorithm is implemented in MATLAB and VS2015 on a PC with a 3.2GHz CPU.

5.1. Evaluation on synthetic datasets

Table 1 shows the averaged evaluation metrics in general, stratified and photorealistic performance evaluation on the synthetic images from 4D Light Field Dataset [27]. Lower scores indicate better performance for all metrics. It can be seen that our method outperforms these state-of-the-art algorithms in many aspects and shows a better comprehensive performance.

Table 1. General, Stratified and Photorealistic Performance Evaluation

View Table | View all tables in this article

In these metrics, Mean Squared Error (MSE) is a more reasonable metric. So for better evaluation, we show more details on each synthetic dataset compared with these algorithms in Table 2 where our algorithm reports the lowest scores among all algorithms and gives more reliable depth results. In addition, the proposed algorithm consumes less time than most of these methods.

Table 2. MSE Comparison on Each Synthetic Image

View Table | View all tables in this article

Figure 10 shows the detailed results of dino dataset in 4D Light Field Dataset. As shown in the figures, especially for the tiny hole highlighted with red boxes, only our method can obtain the accurate result in this complex occlusion case, while other algorithms produce blurs. As to the blue boxes and green boxes covering object edges, [28], [29] and [13] over-smooth and miss it. [30], [15] and [16] have some loss in line boundaries, and there are many blurs on boundaries of objects because of pixels occluded in other views. [18] obtains rough boundaries and misses complex occlusions, and our results can give sharper boundaries and better details.

Fig. 10 The comparison of dino details on 4D Light Field Dataset.

Download Full Size | PDF

Figure 11 shows the results for 4D Light Field Dataset [27]. As shown in the figures, for the net part in the first row, the proposed method not only provides good details but also avoids being over-sharp. For the region around the statue in the second row, the region is occluded in other view. [29] and [13] just smooth it without handling it explicitly. [28] and [30] are blurry in these parts. [15] and [16] adopt a coarse method, so there are many blurs on boundaries of statue, these depths are incorrect and [18] is better than other algorithms but still over-smooth some parts. Our algorithm solves the occlusion problem, so the boundary around the statue is clear and precise. For the last two rows, our results also show the better performances. In short, our algorithm outperforms these algorithms in terms of multi-occluder occasions, complex details and occlusions in other views. In summary, no matter in the objective or the subjective perspective, the proposed algorithm outperforms other algorithms.

Fig. 11 The comparison of synthetic datasets on 4D Light Field Dataset.

Download Full Size | PDF

5.2. Evaluation on real datasets

Figure 12 compares results on real scenes database created by Stanford University, and these images are captured by Lytro Illum [31]. It can be seen that our results still preserve details of the scene well and avoid being over-sharp, and these results are the same as that of synthetic datasets. Only our method well preserves the structure of the near plant and the net (first row), and the precise details of wooden net (third row). And only our method can capture the little objects. For example, the little plant on the bottom right corner can be well obtained in our method (second row). Moreover, for complex scenes (fourth row), our method also gives a more detailed depth map than other methods, and reproduces the thin structure of the branch without burrs, and keeps the details without expanding the boundaries (final row).

Fig. 12 The comparison of real datasets on Stanford Dataset [31].

Download Full Size | PDF

6. Conclusion

In this paper, a thoughtful depth estimation algorithm is proposed which is robust not only in single-occluder occasions but also in multi-occluder scenarios. we build a light-field model from the multi-occlusion situation and prove that boundaries between occluded views and unoccluded views in angular domain correspond to edges of the occluders in spatial domain. Based on the fact, an adaptive connected-domain selection algorithm is proposed to obtain more accurate consistency regions in angular domain for occlusion in the central view. Considering the occlusion in different views, we develop a subpatch approach for anti-occlusion in other views to keep sharper boundaries. A novel confidence analysis which considers the curvature factor is proposed to obtain more precise confidence value for better depth evaluation. The final depth estimation is optimized by using the MRF framework which fuse the confidence analysis and initial depth map. our algorithm outperforms other algorithms on synthetic datasets and real-world scenes, and can be used in a range of applications such as 3D reconstruction, VR and AR scene.

Funding

National Nature Science Foundation of China (61871437,61702384).

References

1. T. Georgiev, Z. Yu, and A. Lumsdaine, “Lytro camera technology: theory, algorithms, performance analysis,” Int. Soc. Opt. Eng. 8667, 1–10 (2013).

2. N. Ren, L. Marc, B. Mathieu, D. Gene, H. Mark, and H. Pat, “Light field photography with a hand-held plenoptic camera,” Comput. Sci. Tech. Rep. 2, 1–11 (2005).

3. H. Mark, “Focusing on everything,” IEEE Spectr. 49(5), 44–50 (2012). [CrossRef]

4. C. Hahne, A. Aggoun, S. Haxha, V. Velisavljevic, and J. C. J. Fernández, “Light field geometry of a standard plenoptic camera,” Opt. Express 22, 26659–26673 (2014). [CrossRef] [PubMed]

5. Z. Ma, Z. Cen, and X. Li, “Depth estimation algorithm for light field data by epipolar image analysis and region interpolation,” Appl. Opt. 56, 6603–6610 (2017). [CrossRef] [PubMed]

6. Y. Qin, X. Jin, Y. Chen, and Q. Dai, “Enhanced depth estimation for hand-held light field cameras,” in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, (IEEE, 2017), pp. 2032–2036.

7. C. Kim, H. Zimmer, Y. Pritch, S.-H. Alexander, and M. Gross, “Scene reconstruction from high spatio-angular resolution light fields,” Acm Trans. Graph. 32, 1–12 (2017). [CrossRef]

8. Z. Cai, X. Liu, X. Peng, Y. Yin, A. Li, J. Wu, and B. Z. Gao, “Structured light field 3d imaging,” Opt. Express 24, 20324–20334 (2016). [CrossRef] [PubMed]

9. T. Tao, Q. Chen, S. Feng, Y. Hu, and C. Zuo, “Active depth estimation from defocus using a camera array,” Appl. Opt. 57, 4960–4967 (2018). [CrossRef] [PubMed]

10. S. Wanner and B. Goldluecke, “Globally consistent depth labeling of 4d light fields,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, (IEEE, 2012), pp. 41–48.

11. M. W. Tao, S. Hadap, J. Malik, and R. Ramamoorthi, “Depth from combining defocus and correspondence using light-field cameras,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, (IEEE, 2013), pp. 673–680.

12. M. W. Tao, P. P. Srinivasan, J. Malik, and R. Ramamoorthi, “Depth from shading, defocus, and correspondence using light-field angular coherence,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, (IEEE, 2015), pp. 1940–1948.

13. H.-G. Jeon, J. Park, G. ChoE, J. Park, Y. Bok, Y.-W. Tai, and S. K. In, “Accurate depth map estimation from a lenslet light field camera,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, (IEEE, 2015), pp. 1547–1555.

14. T. C. Wang, A. A. Efros, and R. Ramamoorthi, “Occlusion-aware depth estimation using light-field cameras,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, (IEEE, 2015), pp. 3487–3495.

15. T. C. Wang, A. A. Efros, and R. Ramamoorthi, “Depth estimation with occlusion modeling using light-field cameras,” IEEE Trans. Pattern Anal. Mach. Intell. 38, 2170–2181 (2016). [CrossRef] [PubMed]

16. H. Zhu, Q. Wang, and J. Y. Yu, “Occlusion-model guided anti-occlusion depth estimation in light field,” IEEE J. Sel. Top. Signal Process. 11, 965–978 (2017). [CrossRef]

17. W. Williem and I. K. Park, “Robust light field depth estimation for noisy scene with occlusion,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, (IEEE, 2016), pp. 4396–4404.

18. W. Williem, I. K. Park, and K. M. Lee, “Robust light field depth estimation using occlusion-noise aware data costs,” IEEE Trans. Pattern Anal. Mach. Intell. 40, 2484–2497 (2018). [CrossRef]

19. T. Ryu, B. Lee, and S. Lee, “Mutual constraint using partial occlusion artifact removal for computational integral imaging reconstruction,” Appl. Opt. 54, 4147–4153 (2015). [CrossRef]

20. M. Ghaneizad, Z. Kavehvash, and H. Aghajan, “Human detection in occluded scenes through optically inspired multi-camera image fusion,” J. Opt. Soc. Am. A 34, 856–869 (2017). [CrossRef] [PubMed]

21. S. Xie, P. Wang, X. Sang, Z. Chen, N. Guo, B. Yan, K. Wang, and C. Yu, “Profile preferentially partial occlusion removal for three-dimensional integral imaging,” Opt. Express 24, 23519–23530 (2016). [CrossRef] [PubMed]

22. A. L. Dulmage and N. S. Mendelsohn, “Coverings of bipartite graphs,” Can. J. Math. 10, 516–534 (1958). [CrossRef]

23. A. Pothen and C.-J. Fan, “Computing the block triangular form of a sparse matrix,” ACM Trans. Math. Softw. 16, 303–324 (1990). [CrossRef]

24. Y. Boykov and V. Kolmogorov, “An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision,” IEEE Trans. Pattern Anal. Mach. Intell. 26, 1124–1137 (2004). [CrossRef]

25. Y. Boykov, O. Veksler, and R. Zabih, “Fast approximate energy minimization via graph cuts,” IEEE Trans. Pattern Anal. Mach. Intell. 23, 1222–1239 (2001). [CrossRef]

26. V. Kolmogorov and R. Zabin, “What energy functions can be minimized via graph cuts?” IEEE Trans. Pattern Anal. Mach. Intell. 26, 147–159 (2004). [CrossRef] [PubMed]

27. K. Honauer, O. Johannsen, D. Kondermann, and B. Goldluecke, “A dataset and evaluation methodology for depth estimation on 4d light fields,” in Proceedings of Asian Conference on Computer Vision, (Springer, 2016), pp. 19–34.

28. O. Johannsen, A. Sulc, and B. Goldluecke, “What sparse light field coding reveals about scene structure,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, (IEEE, 2016), pp. 3262–3270.

29. L. Si and Q. Wang, “Dense depth-map estimation and geometry inference from light fields via global optimization,” in Proceedings of Asian Conference on Computer Vision, (Springer, 2016), pp. 83–98.

30. S. Zhang, H. Sheng, C. Li, J. Zhang, and X. Zhang, “Robust depth estimation for light field via spinning parallelogram operator,” Comput. Vis. Image Underst. 145, 148–159 (2016). [CrossRef]

31. A. S. Raj, M. Lowney, and R. Shah, “Light-field database creation and depth estimation,” Tech. Rep., Department of Computer Science, Stanford University (2016).

Category	[28]	[29]	[13]	[30]	[15]	[16]	[18]	OUR

AVG_MSE	5.975	6.584	9.128	3.968	6.690	6.586	3.730	3.566
MED_MSE	3.932	3.940	7.963	3.309	2.803	3.737	2.667	1.831
Badpix0.07	24.324	14.299	16.193	8.466	17.579	15.403	8.211	14.435
Runtime(log10)	1.945	3.313	3.004	3.325	4.026	3.066	2.920	2.676
Discont.	45.338	29.290	38.210	22.807	38.263	33.346	21.750	31.292
Fine Fattening	58.858	25.193	41.421	29.224	42.090	40.368	27.031	33.154
Bright Stripes	27.745	66.153	52.889	33.906	41.314	23.194	17.739	5.119
Low Texture	5.291	37.386	35.213	10.584	14.151	8.641	2.160	0.037
MSE*Runtime	9.640	17.433	25.671	12.098	19.109	15.825	9.340	7.221

Dateset	Scene	MSE

		[28]	[29]	[13]	[30]	[15]	[16]	[18]	OUR

Stratified	Backgammon	9.56	5.65	13.01	4.59	21.59	20.51	6.07	5.64
	Dots	5.73	19.81	5.68	5.24	3.30	7.42	5.08	6.63
	Pyramids	0.027	0.072	0.273	0.043	0.098	0.070	0.048	0.076
	Stripes	2.67	15.62	17.45	6.96	8.13	4.13	3.56	1.37
Training	Boxes	8.72	12.29	17.43	9.11	9.85	7.49	8.42	6.26
	Cotton	2.25	1.87	9.17	1.31	1.07	3.34	1.51	1.01
	Dino	1.23	0.50	1.16	0.31	1.14	0.89	0.38	0.54
	Sideboard	2.85	1.82	5.07	1.02	2.30	1.32	0.88	1.58
Test	Bedroom	0.57	0.26	0.47	0.21	0.63	0.42	0.23	0.32
	Bicycle	8.52	5.88	11.73	5.57	7.67	6.55	5.14	4.83
	Herbs	24.70	13.02	21.34	11.23	22.20	24.85	11.67	12.46
	Origami	5.01	2.23	6.76	2.03	2.30	2.03	1.78	2.09

Average		5.975	8.240	9.128	3.968	6.690	6.586	3.730	3.566

Category	[28]	[29]	[13]	[30]	[15]	[16]	[18]	OUR

AVG_MSE	5.975	6.584	9.128	3.968	6.690	6.586	3.730	3.566
MED_MSE	3.932	3.940	7.963	3.309	2.803	3.737	2.667	1.831
Badpix0.07	24.324	14.299	16.193	8.466	17.579	15.403	8.211	14.435
Runtime(log10)	1.945	3.313	3.004	3.325	4.026	3.066	2.920	2.676
Discont.	45.338	29.290	38.210	22.807	38.263	33.346	21.750	31.292
Fine Fattening	58.858	25.193	41.421	29.224	42.090	40.368	27.031	33.154
Bright Stripes	27.745	66.153	52.889	33.906	41.314	23.194	17.739	5.119
Low Texture	5.291	37.386	35.213	10.584	14.151	8.641	2.160	0.037
MSE*Runtime	9.640	17.433	25.671	12.098	19.109	15.825	9.340	7.221

Dateset	Scene	MSE

		[28]	[29]	[13]	[30]	[15]	[16]	[18]	OUR

Stratified	Backgammon	9.56	5.65	13.01	4.59	21.59	20.51	6.07	5.64
	Dots	5.73	19.81	5.68	5.24	3.30	7.42	5.08	6.63
	Pyramids	0.027	0.072	0.273	0.043	0.098	0.070	0.048	0.076
	Stripes	2.67	15.62	17.45	6.96	8.13	4.13	3.56	1.37
Training	Boxes	8.72	12.29	17.43	9.11	9.85	7.49	8.42	6.26
	Cotton	2.25	1.87	9.17	1.31	1.07	3.34	1.51	1.01
	Dino	1.23	0.50	1.16	0.31	1.14	0.89	0.38	0.54
	Sideboard	2.85	1.82	5.07	1.02	2.30	1.32	0.88	1.58
Test	Bedroom	0.57	0.26	0.47	0.21	0.63	0.42	0.23	0.32
	Bicycle	8.52	5.88	11.73	5.57	7.67	6.55	5.14	4.83
	Herbs	24.70	13.02	21.34	11.23	22.20	24.85	11.67	12.46
	Origami	5.01	2.23	6.76	2.03	2.30	2.03	1.78	2.09

Average		5.975	8.240	9.128	3.968	6.690	6.586	3.730	3.566

Robust depth estimation for multi-occlusion in light-field images

Abstract

1. Introduction

2. Light-field multi-occlusion model

One occluder analysis:

Multi-occluders analysis:

3. Initial depth estimation

3.1. Anti-occlusion in the central view

3.1.1. Adaptive connected-domain selection

3.1.2. Depth estimation

3.2. Anti-occlusion in other views

4. Depth regularization

4.1. Curvature based confidence analysis

4.2. Final depth regularization

5. Experimental results

5.1. Evaluation on synthetic datasets

5.2. Evaluation on real datasets

6. Conclusion

Funding

References

Cited By

Figures (12)

Tables (2)

Equations (21)

Optics Express