
Stereo matching based on multi-scale fusion and multi-type support regions

Open Access

Abstract

Obtaining accurate disparity values in textureless and texture-free regions is a very challenging task. To solve this problem, we present a novel algorithm. First, we use the guided filter method to fuse the color cost volume and the gradient cost volume. Second, we use three types of image category information to merge the different scale disparity maps and obtain the primary disparity map. Third, during the disparity refinement procedure, we also utilize the three types of category information to define different support regions and assign different weights for pixels remaining to be refined. Extensive experiments show that the performance of our method is not inferior to many state-of-the-art methods on the Middlebury data set.

© 2019 Optical Society of America

1. INTRODUCTION

Stereo matching plays an important role in many applications, including 3D reconstruction, 3D scanning, and medical imaging. According to [1], stereo-matching algorithms can be broadly classified into local and global methods. Global methods skip the cost aggregation step; they use belief propagation [2] or graph cuts [3] to minimize an energy function and enforce disparity smoothness between neighboring pixels. Global methods achieve higher accuracy, but their execution time is long. Local methods use local features to determine each pixel’s disparity. The matching accuracy of local methods is lower than that of global methods, but they benefit from fast running times, so they have attracted a lot of attention. In recent years, semi-global matching (SGM) and data-driven (deep learning) methods have also been widely used to solve the stereo-matching problem. Compared with these, the local method is simpler in principle and implementation. Therefore, in our work, we focus on computing an accurate disparity map with a local stereo-matching strategy.

Most local stereo-matching methods consist of four steps: cost computation, cost aggregation, disparity computation, and disparity refinement. Many stereo-matching algorithms use intensity-based initial cost measurements, for instance, the sum of absolute differences (SAD) and the sum of squared differences (SSD), but these measures are sensitive to noise and radiometric distortions [4]. More robust measures, such as normalized cross correlation (NCC) [5], the gradient, the rank [6] and census [7] transforms, and convolutional neural networks (CNN) [8], have therefore been widely used. To obtain an even more robust cost, some methods combine different cost measures [9–11].

The cost-aggregation procedure is another significant step in stereo matching, because cost aggregation can suppress noise in the cost volume. Many aggregation methods have been proposed and evaluated [12]. Yoon and Kweon [13] proposed an adaptive support weight (ASW) method based on color and distance information at the cost aggregation step, in which the weight assigned to each pixel is variable. Its behavior at edges is much better than that of the SAD algorithm, and ASW achieved results comparable to those of global methods. After ASW, many edge-aware filters were introduced to smooth the cost volume. Yang [14] used improved bilateral filtering to filter the cost volume. Rhemann et al. [15] used a very fast edge-preserving filter (the guided filter [16]) to smooth the cost volume; it uses the local mean and variance to calculate the filtering weights, which introduces the structural information of the image. The matching result is more accurate, and there are fewer noise points than with the ASW algorithm. After Rhemann, many methods used the guided filter or its variants for cost aggregation [17–19]. Besides the support weight, some scholars pay more attention to the support region shape [20–22]. Veksler [20] presented a variable window method by choosing a useful range of window shapes/sizes for evaluation. Zhang et al. [21] constructed a cross-skeleton support region for every pixel. Shi et al. [22] calculated an adaptive support window for each segmentation region based on the color correlation between adjacent pixels. In order to consider more global information, segment tree methods were proposed [23,24]. Yang [23] proposed a non-local cost aggregation method, which extends the kernel size to the entire image; by computing a minimum spanning tree (MST) over the image graph, the non-local cost aggregation method performs extremely fast. Mei et al. [24] followed the non-local cost aggregation idea and enforced disparity consistency by using a segment tree instead of an MST. Zhang et al. [25] proposed a cross-scale cost aggregation (CSCA) stereo-matching method; compared with single-scale cost aggregation, this multiscale constraint strengthens the consistency of the inter-scale cost volumes and behaves well in textureless regions.

Disparity refinement is designed to improve the matching accuracy in occluded regions and low-texture regions. The left-right consistency check (LRC) and the left-right filling strategy [26] are commonly used to detect and refine outliers. Huang and Zhang [27] used belief aggregation for outlier detection and belief propagation for disparity filling. Mei et al. [10] detected and classified pixels into occlusions and mismatches and then used an iterative region voting strategy to interpolate these outliers. Bilateral filtering and weighted median filtering [28] were also employed for disparity refinement.

Though many methods have been proposed to improve the matching accuracy, the low-accuracy problem in texture-free regions has not been solved very well. To improve the matching accuracy in texture-free regions, we propose a local stereo-matching method based on multi-scale fusion and multi-type support regions. The main contributions of our method are as follows.

  • (1) We propose an adaptive proportion cost combination method, which uses the guided filter to combine the color and gradient costs adaptively without considering the original image. Compared with the fixed proportion combination methods [16–19], our combination method is more flexible, as it can retain more correct cost values in texture-free regions. Furthermore, our method simplifies the cost combination and the guided filter cost aggregation procedures [16–19] into one cost fusion procedure.
  • (2) Different from the cross-scale cost aggregation method (CSCA) [25], we prefer to merge the disparity maps under different scales rather than merge the costs under different scales. This procedure yields a primary disparity map with fewer incorrect pixels in texture-free regions.
  • (3) In the disparity refinement procedure, we propose a multi-type support region refinement method, which considers local and global information comprehensively. The four types of support regions are determined by the pixel category information. Our refinement method is more effective than the ordinary single left-right consistency check and filling refinement methods.

Figure 1 shows the flowchart of our algorithm. The red lines denote the cost computation step, the green lines denote the cost aggregation step, the blue lines denote the disparity computation step, and the black lines denote the disparity refinement step. The purple lines denote the proposed pixel category step.

Fig. 1. Overview of the proposed method.

The paper is organized as follows: Section 1 is the introduction. The proposed method is described in Section 2. Corresponding to the four steps of stereo matching, Section 2.A is the cost computation procedure, Section 2.B is the cost aggregation procedure, Section 2.C is the disparity computation procedure, and Section 2.D is the disparity refinement procedure. Experiments and comparison results are presented in Section 3. Finally, we come to the conclusion in Section 4.

2. PROPOSED METHOD

A. Cost Computation Using Color and Gradient Differences

The common cost measurements include the absolute difference (AD), the gradient amplitude, and non-parametric transforms such as rank and census. Each has its own strengths and weaknesses and suits different scenes, so matching cost combination methods are widely used to synthesize the advantages of different measures. Zhu and Li [9] proposed an improved gradient cost function that incorporates the gradient phase, because the gradient vector and phase have different invariance properties with respect to radiometric distortion. Mei et al. [10] proposed a new cost measure combining AD and census that achieves impressive performance. Zhan et al. [11] proposed a novel double-RGB gradient model using the guidance image. In our method, we use the color and gradient differences to compute the matching cost.

Given a pixel p = (x, y) in the left image with a disparity d, the corresponding pixel in the right image is p_d = (x − d, y). First, we compute the color and gradient cost values C_AD(p, d) and C_GD(p, d) as follows:

C_{AD}(p,d) = \min\left( \frac{1}{3} \sum_{i \in \{R,G,B\}} \left| I_i^{\mathrm{left}}(p) - I_i^{\mathrm{right}}(p_d) \right|,\ \tau_{AD} \right),  (1)
C_{GD}(p,d) = C_{GD}^{x}(p,d) + C_{GD}^{y}(p,d),  (2)
C_{GD}^{x}(p,d) = \min\left( \left| \nabla_x I_{\mathrm{gray}}^{\mathrm{left}}(p) - \nabla_x I_{\mathrm{gray}}^{\mathrm{right}}(p_d) \right|,\ \tau_{GD} \right),  (3)
C_{GD}^{y}(p,d) = \min\left( \left| \nabla_y I_{\mathrm{gray}}^{\mathrm{left}}(p) - \nabla_y I_{\mathrm{gray}}^{\mathrm{right}}(p_d) \right|,\ \tau_{GD} \right).  (4)

C_AD is the average intensity difference between p and p_d over the RGB channels, and C_GD is the gradient difference. τ_AD and τ_GD are truncation values [19,29], and ∇_x and ∇_y are the derivatives in the X and Y directions. I_gray^left and I_gray^right are the left and right gray images.
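The following NumPy sketch illustrates Eqs. (1)–(4) for a single disparity hypothesis d. It is a minimal illustration, not the paper's implementation: the function name, the border handling via np.roll, and the truncation values τ_AD = 7 and τ_GD = 2 are our own assumptions.

```python
import numpy as np

def color_gradient_costs(left_rgb, right_rgb, left_gray, right_gray, d,
                         tau_ad=7.0, tau_gd=2.0):
    """Truncated color cost C_AD and gradient cost C_GD (Eqs. (1)-(4))
    for one disparity hypothesis d, with the left image as reference."""
    left_rgb = left_rgb.astype(np.float64)
    left_gray = left_gray.astype(np.float64)
    # Shift the right image by d so that right[x - d] aligns with left[x].
    # (np.roll wraps at the border; a real implementation handles borders explicitly.)
    right_rgb_s = np.roll(right_rgb, d, axis=1).astype(np.float64)
    right_gray_s = np.roll(right_gray, d, axis=1).astype(np.float64)

    # Eq. (1): mean absolute RGB difference, truncated at tau_ad.
    c_ad = np.minimum(np.mean(np.abs(left_rgb - right_rgb_s), axis=2), tau_ad)

    # Eqs. (3) and (4): truncated X/Y gradient differences; Eq. (2): their sum.
    lx, ly = np.gradient(left_gray, axis=1), np.gradient(left_gray, axis=0)
    rx, ry = np.gradient(right_gray_s, axis=1), np.gradient(right_gray_s, axis=0)
    c_gd = (np.minimum(np.abs(lx - rx), tau_gd) +
            np.minimum(np.abs(ly - ry), tau_gd))
    return c_ad, c_gd
```

Stacking the slices returned for every d from 1 to the disparity level yields the two cost volumes used in the next step.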

B. Cost Aggregation by Cost Volume Fusion

After obtaining the color and gradient cost volumes, many cost computation methods merge these different cost volumes with a fixed proportion [15,19,29–31], as in Eq. (5):

C(p,d) = (1 - \alpha)\, C_{AD}(p,d) + \alpha\, C_{GD}(p,d).  (5)

The fixed proportion merging method is simple, but it is difficult to obtain the optimal merging parameter α. Different from these methods, we use the guided filter to merge the two cost volumes, as described below.
The color cost volume and the gradient cost volume have different characteristics: the color cost volume retains more edge information, as shown in the red rectangle regions in Fig. 2, while the gradient cost volume is more accurate in texture-free regions, as shown in the yellow rectangle regions in Fig. 2.

Fig. 2. Disparity maps under different cost volumes. (a), (c) Disparity maps under single color cost volume; (b), (d) disparity maps under single gradient cost volume.

Based on these characteristics, we assume that the aggregated cost volume is related to the color cost volume C_AD and the gradient cost volume C_GD as shown in Eqs. (6) and (7). Minimizing the energy function in Eq. (8) yields the guided filter parameters in Eqs. (9) and (10), from which the fused cost volume is computed:

C(p,d) = a_k\, C_{AD}(p,d) + b_k, \quad \forall p \in \Omega(x),  (6)
C(p,d) = C_{GD}(p,d) - n_p,  (7)
E(a_k, b_k) = \sum_{p \in \Omega(x)} \left( \left( a_k\, C_{AD}(p,d) + b_k - C_{GD}(p,d) \right)^2 + \varepsilon a_k^2 \right),  (8)
a_k = \frac{ \frac{1}{|w|} \sum_{p \in \Omega(x)} C_{AD}(p,d)\, C_{GD}(p,d) - \mu\, \overline{C_{GD}} }{ \sigma^2 + \varepsilon },  (9)
b_k = \overline{C_{GD}} - a_k\, \mu,  (10)

where a_k and b_k are linear parameters assumed to be constant in a local block Ω(x), n_p is an unwanted component such as noise, and ε is a parameter that penalizes large a_k. μ and σ² are the mean and variance of C_AD in Ω(x), |w| is the number of pixels in Ω(x), and C̄_GD is the mean of C_GD in Ω(x).
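A minimal NumPy sketch of this fusion for one disparity slice, following Eqs. (6)–(10). The box-filter implementation, the window radius, and ε below are illustrative choices of ours, not the paper's settings.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def fuse_cost_slice(c_ad, c_gd, radius=9, eps=1e-4):
    """Fuse one disparity slice of the color cost volume (used as guidance)
    with the gradient cost volume via the guided-filter relations (6)-(10)."""
    box = lambda img: uniform_filter(img, size=2 * radius + 1, mode='nearest')

    mu = box(c_ad)                          # mean of C_AD in each window
    mean_gd = box(c_gd)                     # mean of C_GD in each window
    var = box(c_ad * c_ad) - mu * mu        # variance of C_AD (sigma^2)
    cov = box(c_ad * c_gd) - mu * mean_gd   # covariance of C_AD and C_GD

    a = cov / (var + eps)                   # Eq. (9)
    b = mean_gd - a * mu                    # Eq. (10)

    # Average the linear coefficients over all windows covering each pixel,
    # then evaluate Eq. (6) to obtain the fused cost.
    return box(a) * c_ad + box(b)
```

Applying this to every disparity slice gives the fused cost volume C(p, d), which already plays the role of the aggregated cost.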

Figure 3 shows the disparity maps obtained with the fixed proportion cost merging method of [15] and with our cost fusion method. The disparity maps obtained from our fused cost volume are more accurate, especially in texture-free regions (the red rectangle areas).

Fig. 3. Disparity maps under different cost volume merging algorithms. (a), (c) Disparity maps under the fixed proportion merging method of [15]; (b), (d) disparity maps under our fused cost volume. All results are obtained under the guided filter cost aggregation method.

C. Primary Disparity Computation Based on Multi-Scale Disparity Merging

1. Multi-Scale Disparity Map Computation

CSCA [25] proved that cross-scale cost aggregation behaves well in textureless regions. Inspired by that, we use multi-scale images in our method. We use the original and half-scale images to compute the cost volumes and obtain the disparity map corresponding to each scale; we record the zoom parameter as scale. The block sizes [Ω(x) in Eq. (6)] of the cost volume fusion process under the two scales are r_1 × r_1 and r_2 × r_2. Figure 4(b) shows the disparity maps obtained under the half-scale image (the disparity map has been resized to the size of the original image). In the red rectangle regions of the Plastic image, there are fewer error disparities at the half scale than at the original scale. But, in the red rectangle regions of the Tsukuba image, fine structural information is preserved less well at the half scale than at the original scale. To obtain a primary disparity map that has fewer incorrect values in large texture-free regions while retaining fine structural information, we propose a disparity merging method based on the pixel category.

Fig. 4. Disparity maps under different scales. (a) Disparity maps obtained under the original scale, (b) disparity maps obtained under the half-scale, and (c) disparity maps obtained under our disparity merging method.

2. Multi-Scale Disparity Maps Merging Based on Pixel Category

In our merging method, we define three pixel categories: segmentation category, texture category, and left-right consistency (LRC) category. We use the mean-shift segmentation method [32] to obtain the segmentation category image [as shown in Fig. 5(b)]. Pixels in the same segmentation region have the same label.

Fig. 5. Pixel category images. (a) Original left image, (b) segmentation category image by mean shift, (c) texture category image, and (d) the LRC category image.

We use Eq. (11) on the segmentation image to produce the texture category image. In the texture category image, pixels with a value of 1 [white pixels in Fig. 5(c)] lie in the texture-free regions, and pixels with a value smaller than 1 lie in the texture regions;

T(p) = \frac{1}{|w|} \sum_{q \in \Omega(p)} t(q),  (11)
t(q) = \begin{cases} 1 & \text{if } S(p) = S(q) \\ 0 & \text{otherwise}, \end{cases}  (12)
where Ω(p) is a small block with radius r3 and p is the center pixel. |w| is the pixel number in the small block Ω(p). T is the texture category image. S is the labeled segmentation image by the mean-shift algorithm. t(q) distinguishes whether pixel q and pixel p are in the same segmentation region.
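The texture category image of Eqs. (11) and (12) can be computed directly from the mean-shift label image. In the sketch below, the function name and the radius value are placeholders of ours.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def texture_category(labels, r3=5):
    """T(p): fraction of pixels in the (2*r3+1)^2 window around p that share
    p's segment label (Eqs. (11)-(12)). T(p) == 1 marks texture-free pixels."""
    t = np.zeros(labels.shape, dtype=np.float64)
    for lab in np.unique(labels):
        mask = (labels == lab).astype(np.float64)
        # Local fraction of pixels carrying this label.
        frac = uniform_filter(mask, size=2 * r3 + 1, mode='nearest')
        t[labels == lab] = frac[labels == lab]
    return t
```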

Besides the previous two categories, we also use the LRC check to classify the pixels into stable and unstable points. Figure 5(d) is the LRC category image. Pixels with a value of 1 (white pixels in LRC category image) fail the LRC check, and pixels with a value of 0 (black pixels in LRC category image) pass the LRC check.
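A compact version of the LRC check is sketched below; the one-pixel tolerance is our assumption, not a value from the paper.

```python
import numpy as np

def lrc_fail_mask(disp_left, disp_right, tol=1):
    """True where a left-image pixel fails the left-right consistency check,
    i.e., its disparity disagrees (beyond tol) with the matched right-image disparity."""
    h, w = disp_left.shape
    xs = np.tile(np.arange(w), (h, 1))
    x_right = np.clip(xs - np.round(disp_left).astype(int), 0, w - 1)
    d_right = disp_right[np.arange(h)[:, None], x_right]
    return np.abs(disp_left - d_right) > tol
```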

In the end, for pixels whose texture category value equals 1 and that simultaneously fail the LRC check, we choose the half-scale disparity values as the primary disparity values; for all other pixels, we choose the original-scale disparity values. Figure 4(c) shows the primary disparity maps produced by our disparity map merging method. Compared with Figs. 4(a) and 4(b), our primary disparity maps retain fine structural information and have fewer incorrect values in texture-free regions.
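The merging rule itself then reduces to a masked selection between the two (upsampled) disparity maps, for example:

```python
import numpy as np

def merge_scales(disp_full, disp_half_up, texture_cat, lrc_fail):
    """Primary disparity map: take the upsampled half-scale disparity only where
    the pixel is texture-free (T == 1) AND fails the LRC check; keep the
    original-scale disparity everywhere else."""
    use_half = (texture_cat == 1) & lrc_fail
    return np.where(use_half, disp_half_up, disp_full)
```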

D. Disparity Refinement

1. Edge Disparity Optimization Based on Multi-Information Weights

The disparity map usually suffers from the edge foreground expansion problem. To solve this problem, we first apply the Canny edge detection method to the first filled disparity map to find the disparity edges [Fig. 6(b)]. Then we use a dilation operation to expand the edge lines into edge regions. In Fig. 6(d), the green lines are the ground-truth disparity edges, and the black regions are the disparity edge regions extracted by us; the green lines are surrounded by our disparity edge regions. Therefore, running cost aggregation once more on the detected disparity edge regions can yield more accurate edge disparity values.

Fig. 6. Edge regions detection. (a) First filled disparity map, (b) canny edge of left image, (c) ground-truth disparity map, and (d) the location relationship of the expanded edge region and the true edge.

In order to limit the time consumed, we do not run the second cost aggregation over the whole cost volume for the disparity edge pixels. As shown in Fig. 7, for an edge point (the blue pixel), the minimum and maximum valid disparity values in its support region (the red rectangle) on the first filled disparity map are 5 and 8, so we aggregate the cost volume only from disparity 5 to disparity 8 instead of over the whole range from disparity 1 to 25 (the disparity level).
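In code, this amounts to clipping the disparity search range per edge pixel before re-aggregation. The sketch below assumes a generic aggregation callback and only illustrates the range limitation, not the paper's exact aggregation.

```python
import numpy as np

def refine_edge_pixel(cost_volume, disp_fill1, y, x, radius, aggregate):
    """Re-estimate the disparity of edge pixel (y, x) by aggregating only over the
    disparity range observed in its support region on the first filled map.
    `aggregate(cost_volume, y, x, d)` must return the aggregated cost at disparity d."""
    win = disp_fill1[max(0, y - radius):y + radius + 1,
                     max(0, x - radius):x + radius + 1]
    d_lo, d_hi = int(win.min()), int(win.max())
    costs = [aggregate(cost_volume, y, x, d) for d in range(d_lo, d_hi + 1)]
    return d_lo + int(np.argmin(costs))   # winner-take-all over the limited range
```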

Fig. 7. Disparity limitation on edge region cost aggregation.

The cost aggregation can be regarded as a filtering operation [Eq. (13)]. The color weight in Eq. (14) and the spatial distance weight in Eq. (15) are commonly used to assign the filter weights. In our method, we introduce an additional disparity weight [Eq. (16)] into the weight assignment procedure, and the three image categories (obtained in Section 2.C.2) are also used in the weight calculation:

C'(p,d) = \sum_{q \in \Omega(p)} w(p,q)\, C(q,d),  (13)
\mathrm{Col}(p,q) = \exp\left( -\frac{ \sum_{c \in \{R,G,B\}} \left| I_c(q) - I_c(p) \right| }{ \lambda_1 } \right),  (14)
D(p,q) = \exp\left( -\frac{ |x_q - x_p| + |y_q - y_p| }{ r_7 } \right),  (15)
\mathrm{Dis}(p,q) = \exp\left( -\frac{ \mathrm{disp}(q) }{ \lambda_2 } \right).  (16)

C'(p,d) is the filtered cost, and C(q,d) is the cost computed under the original-scale image. w(p,q) is the filtering weight, p is the center pixel, and q is a neighborhood pixel of p. disp is the first filled disparity map, and Dis(p,q) is the disparity weight factor. D(p,q) and Col(p,q) are the spatial distance similarity and the color similarity between p and q. x and y are the pixel coordinates in the horizontal and vertical directions, and I is the image intensity. λ_1 is a parameter that adjusts the color similarity, and λ_2 is a parameter that adjusts the disparity weight. The size of the support region Ω(p) is r_4 × r_4. In Eq. (16), the larger disp(q) is, the smaller Dis(p,q) becomes, so the disparity factor restrains the edge foreground expansion problem. Different information is used to assign the weights in different cases. For edge pixels that do not lie in texture-free regions, the weight is calculated by Eq. (17):
w(p,q) = \begin{cases} \mathrm{Dis}(p,q)\, D(p,q)\, \mathrm{Col}(p,q) & \text{if } \overline{\mathrm{disp}}_{\mathrm{fill1}} \le \mathrm{numDisp}/2 \\ D(p,q)\, \mathrm{Col}(p,q) & \text{if } \overline{\mathrm{disp}}_{\mathrm{fill1}} > \mathrm{numDisp}/2 \\ \mathrm{Col}(p,q) & \text{if } S(q) = S(p) \text{ and } \mathrm{disp}_{\mathrm{fill1}}(q) \ge \mathrm{disp}_{\mathrm{fill1}}(p) \\ 0 & \text{if } \mathrm{LRC}(q) = 1, \end{cases}  (17)

where p is the pixel that remains to be refined, disp_fill1 is the first filled disparity map, and disp̄_fill1 is the average disparity value in the support region on the first filled disparity map. In our method, we use the parameter numDisp/2 to divide the image into foreground and background, where numDisp is the disparity level. The condition disp̄_fill1 ≤ numDisp/2 is set to restrain the foreground expansion phenomenon in background areas. S is the labeled segmented image, and S(q) = S(p) means that pixels q and p are in the same segmentation region. LRC is the LRC check image, and a value of 1 means that the pixel fails the LRC check.

For edge pixels that lie in texture-free regions, the weight is calculated by Eq. (18). The erroneous disparity values in texture-free regions are usually too low, and the parameter λ_3 assigns a low weight to pixels with such incorrectly low disparity values:

w(p,q) = \begin{cases} 0 & \text{if } \overline{\mathrm{disp}}_{\mathrm{fill1}} > \mathrm{numDisp}/2 \text{ and } \mathrm{disp}_{\mathrm{fill1}}(q) < \overline{\mathrm{disp}}_{\mathrm{fill1}} - \lambda_3 \\ \mathrm{Col}(p,q) & \text{otherwise.} \end{cases}  (18)
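The weight assignment for a single candidate pixel q around a non-texture-free edge pixel p can be sketched as below, following Eqs. (14)–(17). The λ values are placeholders, and the priority given to the LRC and same-segment cases over the foreground/background cases is our reading of Eq. (17), not something the paper states explicitly.

```python
import numpy as np

def edge_weight(p, q, img, disp_fill1, seg, lrc_fail, num_disp, disp_mean,
                lam1=10.0, lam2=30.0, r7=15.0):
    """Weight w(p, q) of Eq. (17) for an edge pixel p outside texture-free regions.
    p and q are (y, x) tuples; disp_mean is the mean first-filled disparity
    in p's support region."""
    col = np.exp(-np.sum(np.abs(img[q].astype(float) - img[p].astype(float))) / lam1)
    dist = np.exp(-(abs(q[0] - p[0]) + abs(q[1] - p[1])) / r7)
    dis = np.exp(-disp_fill1[q] / lam2)

    if lrc_fail[q]:                               # q failed the LRC check
        return 0.0
    if seg[q] == seg[p] and disp_fill1[q] >= disp_fill1[p]:
        return col                                # same segment: trust color only
    if disp_mean <= num_disp / 2:                 # background support region
        return dis * dist * col                   # disparity factor restrains foreground expansion
    return dist * col                             # foreground support region
```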
Figure 8(b) shows the edge-optimized result using color and spatial distance weights. Figure 8(c) shows the optimized results using our weight assignment method. In the red rectangle disparity edge areas, our disparities are more accurate.

Fig. 8. Disparity map under different weight computation algorithms. (a) Disparity map before edge optimization, (b) disparity maps under the color and distance weights and without the disparity limitation, and (c) disparity maps under our weight assignment method.

2. Disparity Map Filling Based on Multi-Type Support Regions

The ordinary left-right consistency check (LRC) is important for disparity refinement, but the LRC check and filling strategy has several disadvantages.

First, the LRC check is not always reliable: pixels that pass the LRC check may not have the correct disparity, and pixels that fail the LRC check may have the correct disparity.

Second, the ordinary LRC only pays attention to the horizontal direction and ignores other surrounding pixels’ disparity information.

Third, the perceptual range of the ordinary LRC is limited; it cannot collect enough global information to deal with the texture-free regions.

To overcome all these disadvantages, we define four types of support regions based on the three pixel categories defined in Section 2.C.2. Figure 9 shows the four types of support regions. The horizontal support region is the ordinary LRC check region; the eight scan lines support region and the rectangle support region pay more attention to the local information; the cross-support region pays more attention to the global information.

Fig. 9. Support region setting for disparity filling. (a) Horizontal support region, (b) rectangle support region, (c) eight scan lines support region, (d) cross-support region. Pixels with red edges form the support regions.

In the horizontal support region, we find the first pixel that has passed the LRC check on the left side and the first on the right side of the center pixel; we then take the minimum and maximum of these two pixels’ disparity values and record them as horizonmin and horizonmax.
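A sketch of this horizontal search (the function and variable names are ours):

```python
def horizontal_bounds(disp, lrc_fail, y, x):
    """Nearest LRC-valid disparities to the left and right of (y, x);
    returns (horizon_min, horizon_max), or (None, None) if no valid pixel exists."""
    w = disp.shape[1]
    left = next((disp[y, i] for i in range(x - 1, -1, -1) if not lrc_fail[y, i]), None)
    right = next((disp[y, i] for i in range(x + 1, w) if not lrc_fail[y, i]), None)
    vals = [v for v in (left, right) if v is not None]
    return (min(vals), max(vals)) if vals else (None, None)
```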

In the eight scan lines support region, we search along eight directions and find the first pixel in each direction that has passed the LRC check. Among these eight candidates, we then select the pixel most similar to the center pixel (which failed the LRC check) and record its disparity as eightmost. The similarity is calculated by Eq. (19) as follows

\mathrm{similar}(p,q) = \sum_{c \in \{R,G,B\}} \left| I_c(q) - I_c(p) \right|,  (19)
where I is the image intensity, p is the pixel that failed the LRC check, and q is one of the eight candidate pixels found along the eight directions.
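A sketch of the eight-direction search (the direction set and names are ours; the similarity is Eq. (19)):

```python
import numpy as np

# Four axis-aligned plus four diagonal scan-line directions.
DIRECTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1), (-1, -1), (-1, 1), (1, -1), (1, 1)]

def eight_most(img, disp, lrc_fail, y, x):
    """Walk along each direction to the first LRC-valid pixel, then return the
    disparity of the candidate whose color is closest to the center pixel (Eq. (19))."""
    h, w = disp.shape
    best_d, best_sim = None, np.inf
    for dy, dx in DIRECTIONS:
        cy, cx = y + dy, x + dx
        while 0 <= cy < h and 0 <= cx < w:
            if not lrc_fail[cy, cx]:
                sim = np.sum(np.abs(img[cy, cx].astype(float) - img[y, x].astype(float)))
                if sim < best_sim:
                    best_sim, best_d = sim, disp[cy, cx]
                break
            cy, cx = cy + dy, cx + dx
    return best_d   # 'eightmost' in the text
```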

In the rectangle support region, we compute the median disparity value for the pixels that lie in the same segmentation region as the center pixel and record it as rectmed. We also compute the median disparity value for the pixels that have passed the LRC check and simultaneously lie in the same segmentation region as the center pixel, and we record it as rectmed_lrc.

The half-arm length of the cross-support region is r7. In this support region, we compute the median disparity value for the pixels that lie in the same segmentation region as the center pixel and record it as crossmed. After that, we further compute the median disparity value and the largest disparity value for the pixels that have passed the LRC check and simultaneously lie in the same segmentation region as the center pixel. Then we record them as crossmed_lrc and crossmost.
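The rectangle- and cross-region statistics used by the two filling steps can be gathered with one generic helper; in this sketch, the region mask (rectangle or cross shaped) and the function names are our own.

```python
import numpy as np

def region_stats(disp, seg, lrc_fail, region_mask, y, x):
    """Median / LRC-median / largest disparity among the pixels of region_mask that
    share (y, x)'s segment label, e.g. (cross_med, cross_med_lrc, cross_most)."""
    same_seg = region_mask & (seg == seg[y, x])
    same_seg_lrc = same_seg & ~lrc_fail

    med = np.median(disp[same_seg]) if same_seg.any() else None
    med_lrc = np.median(disp[same_seg_lrc]) if same_seg_lrc.any() else None
    most = disp[same_seg_lrc].max() if same_seg_lrc.any() else None
    return med, med_lrc, most
```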

Our disparity refinement contains two disparity filling steps. The first filling step mainly pays attention to the local disparity information, so the horizontal support region, the eight scan lines support region, and the rectangle support region are used. The concrete procedure for the first disparity filling is shown in Table 1.

Table 1. First Disparity Filling Process

In Table 1, x_1 is one of the candidate pixels in the rectangle support region Ω(x). T(x) ≠ 1 means that pixel x does not lie in a texture-free region. S(x_1) = S(x) means that pixel x_1 lies in the same segmentation region as pixel x. LRC(x_1) ≠ 1 means that pixel x_1 has passed the LRC check. The num(·) operation counts the number of pixels satisfying the conditions in the brackets. If the number of pixels that lie in the same region as the center pixel and have also passed the LRC check is not larger than the threshold λ_4, then rectmed_lrc is less likely to be the correct disparity value; considering this, we set the operations in lines 9 to 13.

The second filling step mainly pays attention to the global disparity information, so the horizontal support region and the cross-support region are used. In order to deal with texture-free regions, we divide pixels into four categories: pixels that fail the LRC check and lie in a texture-free region, pixels that fail the LRC check and lie in a textured region, pixels that pass the LRC check and lie in a texture-free region, and pixels that pass the LRC check and lie in a textured region. For these four pixel categories, we use the different filling strategies listed in Table 2.

Table 2. Second Disparity Filling Process

In Table 2, the settings of numDisp/2 in line 5 and line 19 restrain low incorrect values in foreground areas and high incorrect values in background areas. In the cross-support region, if the number of pixels that lie in the same region as the center pixel and have also passed the LRC check is not larger than the threshold λ_4, the crossmed_lrc value is less likely to be the correct disparity value; considering this, we set the operations in lines 11 to 15. In line 24, for the pixels that pass the LRC check, the setting of λ_5 avoids unnecessary changes.

Figure 10 shows the result of each refinement step of our method. After the first and second filling steps [Figs. 10(b) and 10(d)], the filled disparity maps have fewer error disparities than the corresponding unfilled disparity maps in Figs. 10(a) and 10(c).

Fig. 10. Results of every procedure under the proposed method. (a) Primary disparity maps, (b) first filled disparity maps, (c) edge-optimized disparity maps, (d) second filled disparity maps.

3. EXPERIMENT AND ANALYSIS

A. Experimental Environment and Parameter Setting

In order to evaluate the performance of the proposed method, we carry out our experiments on an Intel i5-7300HQ 2.5 GHz CPU with 8 GB RAM, and the development tool is MATLAB 2016a. The parameters used in our experiments are listed in Table 3, where wid and hei are the image width and height, respectively. The scale of 0.5 means that we use two image scales: the original scale and the half scale. win is the basic block size in our method, and many other block sizes are based on it. τ_AD and τ_GD are the same as in [19,30]. Figure 11 shows the results of our method.

Table 3. Parameters Set in our Experiments

Fig. 11. Results of the proposed method: (a), (c), and (e) original images; (b), (d), and (f) the final results of our method (red pixels are error pixels). The original images from left to right are Adirondack, Midd1, and Dolls.

The complexity of the cost fusion procedure is similar to that of the CostFilter [15] method, because both use the guided filter for cost aggregation. The complexity of the disparity refinement is mainly determined by the disparity edge-optimization procedure, which performs a second cost aggregation on the disparity edge pixels. Table 4 shows the time consumption for the images in Fig. 11. In Table 4, the time for the small Midd1 image is close to the time for the large Adirondack image; this is because the edge-optimization procedure finds and optimizes more disparity edge pixels in the texture-free regions. The whole matching time for the Midd1 image is 258 s: the disparity edge-optimization procedure takes 151 s, and the cost fusion procedure takes 4 s.

Table 4. Computation Time in Seconds

B. Comparison Experiments

As a method based on the guided filter, our method can be regarded as an improvement on the CostFilter method [15], so we first compare our method with CostFilter. Table 5 lists the final error rates of our method and the CostFilter method on the four Middlebury 2.0 test images. Compared with the CostFilter method, our method obtains a lower error rate on three of the images, and our average error rate is also lower.

Table 5. Comparison Result between CostFilter and the Proposed Algorithms

Our method is a multi-scale method inspired by the CSCA method. But different from CSCA, instead of merging the costs under different scales, we merge the disparity maps under different scales. Figure 12 shows the comparison between CSCA and our method; none of the disparity maps in Fig. 12 have undergone disparity refinement. In the texture-free regions (the red rectangle regions), our disparity maps have more correct disparity values than those of CSCA. This shows that our multi-scale disparity map merging method is more suitable for texture-free regions than the CSCA method.

Fig. 12. Comparison results between CSCA and the proposed method. (a) Left images, (b) disparity maps under CSCA, and (c) disparity maps under the proposed method.

PatchMatch-based methods have attracted the attention of many scholars, and this kind of method currently achieves the best results among non-deep-learning methods. Therefore, we also compare our method with some PatchMatch-based methods and other classical methods: PatchMatch Belief Propagation (PMBP) [33], Speed-up PatchMatch Belief Propagation (SPM-BP) [29], Graph Cut based continuous stereo matching using Locally Shared Labels (GCLSL) [34], Segment Tree (ST) [24], Local Expansion (LocalExp) [35], PatchMatch (PM) [36], and PatchMatch-based Superpixel Cut (PMSC) [37]. Since most of these methods are designed for low-resolution images, we run the corresponding experiments on the Middlebury 2006 data set, which consists of 21 low-resolution image pairs. Figures 13 and 14 show part of the results.

Fig. 13. Comparison of disparity maps generated from different algorithms. Error pixels are marked in red. (a) Left image, (b) PMBP, (c) SPM-BP, (d) GCLSL, (e) PMSC, and (f) our proposed method. Best viewed with zoom-in on a digital display.

Fig. 14. Comparison of disparity maps generated from different algorithms. Error pixels are marked in red. (a) Left image, (b) PatchMatch, (c) PMF, (d) LocalExp, (e) our proposed. Best viewed with zoom-in on a digital display.

As shown in Figs. 13 and 14, the PM, PatchMatch Filter (PMF), PMBP, SPM-BP, and GCLSL methods do not deal well with texture-free regions: they produce many error disparities (the red points in these two figures) there. Table 6 shows the eight methods’ error rates on the 21 image pairs of Middlebury 2006 with a 1 pixel error threshold; error rates are evaluated on non-occluded regions. Our method reaches the best accuracy on 11 of the 21 image pairs, and its average error rate is also the lowest. The PMSC and LocalExp methods use the pixel-matching cost from Matching Cost with a Convolutional Neural Network (MC-CNN) [8], but their average error rates are still not superior to ours, especially for images containing large texture-free regions such as Midd1, Midd2, Monopoly, and Plastic in Table 6.

Table 6. Comparison Result under Middlebury 2006 Data Set on Non-occluded Regions

The Middlebury 2006 data sets contain low-resolution images, so we also evaluate our method on the Middlebury 2014 data set, which has high-resolution images. Many state-of-the-art methods are listed on the Middlebury website, including local, global, semi-global, and data-driven (deep learning) methods. Because our method is not data-driven, we compare it with the newest non-data-driven methods: Confidence Map based 3D cost aggregation with multiple Minimum Spanning Trees (3DMST-CM) [38], Coalesced Bidirectional Matching Volume Robust vision challenge (CBMV-ROB) [39], Dense and robust image registration by shift Adapted Weighted Aggregation (DAWA-F) [40], Fusing Adaptive Support Weights (FASW) [41], Improvement of Stereo Matching (ISM) [42], PieceWise Cost Aggregation Semi-Global Matching (PWCA-SGM) [43], Segment-based Disparity Refinement (SDR) [44], Adaptive Weighted bilateral filter Processing on Stereo Matching (SM-AWP) [45], Sparse Representation for Suitable and selective Stereo Matching (SMSSR) [46], Two-branch Convolutional Sparse Coding Stereo Matching (TCSCSM) [47], and Ref. [48]. All of these methods were proposed after 2018. The qualitative and quantitative comparison results are shown in Fig. 15 and Table 7. In our experiments, we used the 10 perfect images (Adirondack, Jade Plant, Motorcycle, Piano, Pipes, Playroom, PlaytableP, Recycle, Shelves, and Teddy) without any interfering factors. In Table 7, our method achieves the fourth-lowest error rate among all twelve methods, which indicates that our method is not inferior to the mainstream methods on large images.

Fig. 15. Comparison of disparity maps generated from different algorithms on Middlebury 2014. Error pixels are marked in red. From top to bottom are the results of 3DMST-CM [38], CBMV-ROB [39], DAWA-F [40], FASW [41], ISM [42], PWCA-SGM [43], SDR [44], SM-AWP [45], SMSSR [46], TCSCSM [47], Ref. [48], and our proposed method. Images from left to right are Adirondack, Jade Plant, Motorcycle, Piano, Pipes, Playroom, PlaytableP, Recycle, Shelves, and Teddy. Best viewed with zoom-in on a digital display.

Table 7. Average Error Rates of the 10 Images on Middlebury 2014 Data Sets in Non-occluded Regions

4. CONCLUSION

In this paper, we present a novel, accurate stereo-matching approach based on multi-scale fusion and multi-type support regions. We propose a fusion-based cost aggregation method and a multi-scale disparity map merging strategy, with which we can obtain as many correct disparity values as possible in texture-free regions of the primary disparity map. During the disparity refinement procedure, we define four types of support regions that consider both local and global information to refine the disparity map. Furthermore, we also define a new weight assignment strategy to refine the disparity values in edge regions. Evaluation shows that the proposed method obtains highly accurate disparity maps and is superior to many state-of-the-art methods on the Middlebury 2006 data set. There are certainly still some disadvantages to our method: as a type of local method, it does not obtain the highest score on the high-resolution Middlebury 2014 stereo images, and the remaining error disparities mostly lie in slanted regions with obvious depth discontinuities. In the future, we would like to solve these problems.

Acknowledgment

We thank all editors and reviewers for their work to improve this paper.

REFERENCES

1. D. Scharstein, R. Szeliski, and R. Zabih, “A taxonomy and evaluation of dense two-frame stereo correspondence algorithms,” Int. J. Comput. Vis. 47, 7–42 (2002). [CrossRef]  

2. J. Sun, N. N. Zheng, and H. Y. Shum, “Stereo matching using belief propagation,” IEEE Trans. Pattern Anal. Mach. Intell. 25, 787–800 (2003). [CrossRef]  

3. Y. Boykov, O. Veksler, and R. Zabih, “Fast approximate energy minimization via graph cuts,” IEEE Trans. Pattern Anal. Mach. Intell. 23, 1222–1239 (2001). [CrossRef]  

4. R. A. Hamzah, H. Ibrahim, and A. H. A. Hassan, “Stereo matching algorithm based on illumination control to improve the accuracy,” Image Anal. Stereol. 35, 39–52 (2016). [CrossRef]  

5. K. Briechle and U. D. Hanebeck, “Template matching using fast normalized cross correlation,” Proc. SPIE 4387, 95–102 (2001). [CrossRef]  

6. G. Zhao, Y. Du, and Y. Tang, “Adaptive rank transform for stereo matching,” in International Conference on Intelligent Robotics and Applications (2011), pp. 95–104.

7. R. Zabih and J. Woodfill, “Non-parametric local transforms for computing visual correspondence,” in European Conference on Computer Vision (1994), pp. 151–158.

8. J. Žbontar and Y. LeCun, “Stereo matching by training a convolutional neural network to compare image patches,” J. Mach. Learn. Res. 17, 2287–2318 (2016).

9. S. Zhu and Z. Li, “Local stereo matching using combined matching cost and adaptive cost aggregation,” TIIS 9, 224–241 (2015). [CrossRef]  

10. X. Mei, X. Sun, M. Zhou, S. Jiao, H. Wang, and X. Zhang, “On building an accurate stereo matching system on graphics hardware,” in IEEE International Conference on Computer Vision Workshops (IEEE, 2012), pp. 467–474.

11. Y. Zhan, Y. Gu, K. Huang, C. Zhang, and K. Hu, “Accurate image-guided stereo matching with efficient matching cost and disparity refinement,” IEEE Trans. Circuits Syst. Video Technol. 26, 1632–1645 (2015). [CrossRef]  

12. F. Tombari, S. Mattoccia, L. D. Stefano, and E. Addimanda, “Classification and evaluation of cost aggregation methods for stereo correspondence,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2008), pp. 1–8.

13. K. J. Yoon and I. S. Kweon, “Adaptive support-weight approach for correspondence search,” IEEE Trans. Pattern Anal. Mach. Intell. 28, 650–656 (2006). [CrossRef]  

14. Q. Yang, “Hardware-efficient bilateral filtering for stereo matching,” IEEE Trans. Pattern Anal. Mach. Intell. 36, 1026–1032 (2014). [CrossRef]  

15. C. Rhemann, A. Hosni, and M. Bleyer, “Fast cost-volume filtering for visual correspondence and beyond,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2011), pp. 3017–3024.

16. K. He, J. Sun, and X. Tang, “Guided image filtering,” in European Conference on Computer Vision (2010), pp. 1–14.

17. S. Zhu and L. Yan, “Local stereo matching algorithm with efficient matching cost and adaptive guided image filter,” Vis. Comput. 33, 1087–1102 (2017). [CrossRef]  

18. G. S. Hong and B. G. Kim, “A local stereo matching algorithm based on weighted guided image filtering for improving the generation of depth range image,” Displays 49, 80–87 (2017). [CrossRef]  

19. H. Ma, S. Zheng, C. Li, Y. Li, L. Gui, and R. Huang, “Cross-scale cost aggregation integrating intra-scale smoothness constraint with weighted least squares in stereo matching,” J. Opt. Soc. Am. A 34, 648–656 (2017). [CrossRef]  

20. O. Veksler, “Fast variable window for stereo correspondence using integral images,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2003), Vol. 1, pp. I-556–I-561.

21. K. Zhang, J. Lu, and G. Lafruit, “Cross-based local stereo matching using orthogonal integral images,” IEEE Trans. Circuits Syst. Video Technol. 19, 1073–1079 (2009). [CrossRef]  

22. H. Shi, H. Zhu, J. Wang, S. Y. Yu, and Z. F. Fu, “Segment-based adaptive window and multi-feature fusion for stereo matching,” J. Algorithms Comput. Technol. 10, 3–200 (2016). [CrossRef]  

23. Q. Yang, “A non-local cost aggregation method for stereo matching,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2012), pp. 1402–1409.

24. X. Mei, X. Sun, W. Dong, H. Wang, and X. Zhang, “Segment-tree based cost aggregation for stereo matching,” in Computer Vision and Pattern Recognition (IEEE, 2013), pp. 313–320.

25. K. Zhang, Y. Fang, D. Min, L. Sun, S. Yang, S. Yan, and Q. Tian, “Cross-scale cost aggregation for stereo matching,” in Computer Vision and Pattern Recognition (2014), pp. 1590–1597.

26. G. Egnal, M. Mintz, and R. P. Wildes, “A stereo confidence metric using single view imagery with comparison to five alternative approaches,” Image Vision Comput. 22, 943–957 (2004). [CrossRef]  

27. X. Huang and Y. J. Zhang, “An O(1) disparity refinement method for stereo matching,” Pattern Recogn. 55, 198–206 (2016). [CrossRef]  

28. Z. Ma, K. He, Y. Wei, J. Sun, and E. Wu, “Constant time weighted median filtering for stereo matching and beyond,” in IEEE International Conference on Computer Vision (2014), pp. 49–56.

29. Y. Li, D. Min, M. S. Brown, M. N. Do, and J. Lu, “SPM-BP: sped-up PatchMatch belief propagation for continuous MRFs,” in International Conference on Computer Vision (ICCV) (2015), pp. 4006–4014.

30. C. Lei and Y. H. Yang, “Optical flow estimation on coarse-to-fine region-trees using discrete optimization,” in IEEE International Conference on Computer Vision (ICCV) (2009), pp. 1562–1569.

31. T. Brox and J. Malik, “Large displacement optical flow: descriptor matching in variational motion estimation,” IEEE Trans. Pattern Anal. Mach. Intell. 33, 500–513 (2011). [CrossRef]  

32. D. Comaniciu and P. Meer, “Mean shift: a robust approach toward feature space analysis,” IEEE Trans. Pattern Anal. Mach. Intell. 24, 603–619 (2002). [CrossRef]  

33. F. Besse, C. Rother, A. Fitzgibbon, and J. Kautz, “PMBP: PatchMatch belief propagation for correspondence field estimation,” Int. J. Comput. Vis. 110, 2–13 (2014) [CrossRef]  

34. T. Taniai, Y. Matsushita, and T. Naemura, “Graph cut based continuous stereo matching using locally shared labels,” in Conference on Computer Vision and Pattern Recognition (2014), pp. 1613–1620.

35. T. Taniai, Y. Matsushita, Y. Sato, and T. Naemura, “Continuous 3D label stereo matching using local expansion moves,” IEEE Trans. Pattern Anal. Mach. Intell. 40, 2725–2739 (2018). [CrossRef]  

36. M. Bleyer, C. Rhemann, and C. Rother, “PatchMatch stereo–stereo matching with slanted support windows,” in British Machine Vision Conference (BMVA) (2011), pp. 1–11.

37. L. Li, S. Zhang, X. Yu, and L. Zhang, “PMSC: patchmatch-based superpixel cut for accurate stereo matching,” IEEE Trans. Circuits Syst. Video Technol. 28, 679–692 (2018). [CrossRef]  

38. Y. Xiao, D. Xu, G. Wang, X. Hu, Y. Zhang, X. Ji, and L. Zhang, “Confidence map based 3D cost aggregation with multiple minimum spanning trees for stereo matching,” in International Conference on Computer Analysis of Images and Patterns (CAIP) (submitted).

39. K. Batsos, C. Cai, and P. Mordohai, “CBMV: a coalesced bidirectional matching volume for disparity estimation,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018), pp. 2060–2069.

40. J. Navarro and A. Buades, “Dense and robust image registration by shift adapted weighted aggregation and variational completion,” Image and Vision Computing (submitted).

41. W. Wu, H. Zhu, S. Yu, and J. Shi, “Stereo matching with fusing adaptive support weights,” IEEE Access 7, 61960–61974 (2019). [CrossRef]  

42. R. Hamzah, A. Kadmin, M. Hamid, S. Fakhar, A. Ghani, and H. Ibrahim, “Improvement of stereo matching algorithm for 3D surface reconstruction,” Signal Process. Image Commun. 65, 165–172 (2018). [CrossRef]  

43. H. Li, Y. Sun, and L. Sun, “Edge-preserved disparity estimation with piecewise cost aggregation,” International Journal of Geo-Information (submitted).

44. T. Yan, Y. Gan, Z. Xia, and Q. Zhao, “Segment-based disparity refinement with occlusion handling for stereo matching,” IEEE Trans. Image Process. 28, 3885–3897 (2019). [CrossRef]  

45. S. Safwana Abd Razak, M. Othman, and A. Kadmin, “The effect of adaptive weighted bilateral filter on stereo matching algorithm,” Int. J. Eng. Adv. Technol. 8, C5839028319 (2019).

46. H. Li and C. Cheng, “Adaptive weighted matching cost based on sparse representation,” IEEE Transactions on Image Processing (submitted).

47. C. Cheng, H. Li, and L. Zhang, “A new stereo matching cost based on two-branch convolutional sparse coding and sparse representation,” IEEE Transactions on Image Processing (submitted).

48. S. Patil, T. Prakash, B. Comandur, and A. Kak, “A comparative evaluation of SGM variants for dense stereo matching,” IEEE Transactions on Pattern Analysis and Machine Intelligence (submitted).
