Abstract
Disparity estimation for binocular images is an important problem for many visual tasks such as 3D environment reconstruction, digital holography, virtual reality, and robot navigation. Conventional approaches rely on the brightness constancy assumption to establish spatial correspondences between a pair of images. However, in the presence of large illumination variation and severe noise contamination, they fail to generate accurate disparity maps. To obtain robust disparity estimates in these situations, we first propose a model, the color monogenic curvature phase, that describes local features of color images by embedding the monogenic curvature signal into the quaternion representation. We then propose a multiscale framework that estimates disparities by coupling the advantages of the color monogenic curvature phase and mutual information. Both indoor and outdoor images with large brightness variation are used in the experiments, and the results demonstrate that our approach performs well even under large illumination change and severe noise contamination.
© 2012 Optical Society of America
1. Introduction
Disparity estimation for binocular images is an important problem for many visual tasks such as 3D environment reconstruction, digital holography, virtual reality, and robot navigation. Typically, a matching cost is calculated at every pixel for all disparities under consideration. Conventional approaches usually assume constant intensities at matching image positions. Commonly used pixel-based matching costs are absolute differences, squared differences, and sampling-insensitive absolute differences [1]. Window-based matching costs include the sums of absolute or squared differences and normalized cross correlation [2]. However, in the presence of illumination change, the constant intensity constraint no longer holds, and the resulting disparity map contains many errors. Mutual information, as an alternative matching cost, has been used to compute visual correspondence because it can handle some brightness variations [3, 4]. In [5], Geiger et al. proposed a fast and efficient large-scale stereo matching approach. This work achieves state-of-the-art performance without the need for global optimization; however, it suffers under large illumination change.
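To make the limitation of intensity-based costs concrete, the following minimal Python sketch compares the window-based costs named above on a window and a brightened copy of it. The function names, the random test window, and the gain and offset values are illustrative assumptions, not taken from the cited works; the correlation shown is the zero-mean variant of normalized cross correlation.

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two equally sized windows."""
    return float(np.sum(np.abs(a - b)))

def ssd(a, b):
    """Sum of squared differences between two equally sized windows."""
    return float(np.sum((a - b) ** 2))

def ncc(a, b):
    """Zero-mean normalized cross correlation; 1 means a perfect linear match."""
    a0 = a - a.mean()
    b0 = b - b.mean()
    denom = np.sqrt(np.sum(a0 ** 2) * np.sum(b0 ** 2))
    return float(np.sum(a0 * b0) / denom)

# Hypothetical 7x7 window and an illumination-changed copy (gain and offset).
rng = np.random.default_rng(0)
window = rng.random((7, 7))
brighter = 1.8 * window + 0.2

print("SAD:", sad(window, brighter))   # non-zero although the content is identical
print("SSD:", ssd(window, brighter))   # non-zero as well
print("NCC:", ncc(window, brighter))   # close to 1 for a pure gain/offset change
```

The difference-based costs penalize the brightened copy even though it shows the same structure, which is the failure of the constant intensity constraint described above. The normalized correlation compensates only for a global gain and offset within the window.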
In contrast to intensity, phase information, an important image feature, has the advantage of being invariant to illumination change. Unlike gradient information, phase responds differently to lines and edges. It carries the most significant structural information, and the original image can be reconstructed from the phase information alone [6]. In [7, 8], the rotationally invariant monogenic phase model was proposed for gray images. Later, Demarcq et al. [9] generalized it to color images. Unfortunately, the monogenic phase cannot yield accurate results for highly curved lines and edges. In our previous work [10], we proposed the monogenic curvature phase to model curved lines and edges. Although it has been applied to compute visual correspondence with good performance [11], it can process only gray images and does not take multiscale information into consideration.
The main goal of this paper is to estimate robust disparities under large illumination change and severe noise contamination. To this end, we first propose a model, the color monogenic curvature phase, that describes the features of color images by embedding the monogenic curvature signal into the quaternion framework. We then present a multiscale method that computes the disparity map by coupling the advantages of mutual information and the color monogenic curvature phase. To illustrate the effectiveness of the proposed approach, we include both indoor and outdoor images with large illumination change in the experiments. The experimental results demonstrate that our approach performs well even under large brightness variation and severe noise corruption.
2. Color monogenic curvature phase
2.1. Monogenic curvature signal
Given a 2D gray image f(x, y), (x, y) ∈ ℝ², the monogenic curvature signal [10] is defined by three components, f1, f2 and f3.
The first component f1 is obtained from f by convolution, denoted by *, with the two parts of the Riesz kernel [7]. Applying the second order Hilbert transform [12] to f1 yields the other two components, f2 and f3, of the monogenic curvature signal. In the frequency domain, the second order Hilbert transform reads H2 = [cos2α sin2α]T, where α is the angular polar coordinate. The other two components of the monogenic curvature signal are respectively given by
$$ f_2 = \mathcal{F}^{-1}\{\cos(2\alpha)\,F_1\}, \qquad f_3 = \mathcal{F}^{-1}\{\sin(2\alpha)\,F_1\}, $$
where ℱ−1 refers to the inverse Fourier transform and F1 is the Fourier transform of f1.
Convolving the monogenic curvature signal with the Poisson kernel results in the monogenic curvature scale-space fmc(x, y, s), with s being the scale parameter. The monogenic curvature scale-space performs a split of identity: three independent local features, namely the amplitude, the main orientation and the monogenic curvature phase, can be obtained from it simultaneously. Their expressions involve atan2(·) ∈ (−π, π] and the vector u(x, y, s) = [f2(x, y, s) f3(x, y, s)]T.
2.2. Color monogenic curvature scale-space
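The color extension in this section reuses the per-channel construction of Section 2.1. As a minimal sketch of its frequency-domain step, the Python code below applies the multipliers cos2α and sin2α to the Fourier transform of a given f1. The function name, the use of a discrete FFT grid, and the choice to set the multipliers to zero at the zero frequency (where α is undefined) are assumptions made for illustration; the computation of f1 itself and the Poisson smoothing are not shown.

```python
import numpy as np

def second_order_hilbert(f1):
    """Return (f2, f3) as the inverse FFT of cos(2a)*F1 and sin(2a)*F1.

    f1 is a real 2D array; a is the polar angle of each frequency sample.
    """
    rows, cols = f1.shape
    v = np.fft.fftfreq(rows)[:, None]   # vertical frequency coordinate
    u = np.fft.fftfreq(cols)[None, :]   # horizontal frequency coordinate
    alpha = np.arctan2(v, u)            # polar angle, shape (rows, cols)

    # Assumption: the multipliers are set to zero at the origin.
    nonzero = (u != 0) | (v != 0)

    F1 = np.fft.fft2(f1)
    # The real part is taken because discretization can leave a tiny
    # imaginary residue (for example at the Nyquist frequency).
    f2 = np.real(np.fft.ifft2(np.cos(2 * alpha) * nonzero * F1))
    f3 = np.real(np.fft.ifft2(np.sin(2 * alpha) * nonzero * F1))
    return f2, f3
```

For a zero-mean wave running along the horizontal axis (α = 0 or π), the sketch returns f2 equal to f1 and f3 equal to zero; for a wave along the main diagonal (α = π/4), the roles are exchanged.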
The monogenic curvature scale-space was designed to describe gray images and does not incorporate color information. In [13, 14], quaternions were introduced to represent color images. A color image f(x, y) in RGB color space can be represented by encoding its three channels as a pure quaternion
$$ f(x, y) = f_r(x, y)\,i + f_g(x, y)\,j + f_b(x, y)\,k $$
where i, j and k are the three imaginary units and fr, fg and fb indicate the red, green and blue channels of the color image. We are thus inspired to extend the monogenic curvature scale-space to the color domain by embedding it into the quaternion framework. As in the color image representation, the corresponding components of the monogenic curvature scale-space of the three channels are encoded in a pure quaternion. Therefore, the color monogenic curvature scale-space fcmc can be constructed as
$$ f_{cmc} = f^{r}_{mc}\,i + f^{g}_{mc}\,j + f^{b}_{mc}\,k $$
where fnmc refers to the monogenic curvature scale-space of the nth color channel, n ∈ {r, g, b}.
The corresponding color monogenic curvature phase Φcmc is given by
Figure 1 illustrates the color monogenic curvature phase computed at the first scale. The top row contains three test images taken from [15], captured under different camera exposure and lighting conditions. The bottom row shows the corresponding color monogenic curvature phase images. The color monogenic curvature phase is visibly robust against large illumination variation.
3. Disparity estimation
To handle stereo analysis under large brightness variation and noise corruption, we propose a multiscale method that combines the advantages of mutual information and the color monogenic curvature phase. Figure 2 illustrates the structure of the proposed multiscale disparity estimation approach. Given a color image pair Il and Ir, the corresponding phase information Φl and Φr is extracted by applying the color monogenic curvature phase model. From Φl and Φr, two pyramids are constructed by down-sampling the original phase images. At each scale s, the disparity map is computed using the mutual information of the two phase images Φl,s and Φr,s as the matching cost. Starting from the coarsest scale, the estimated disparity map initializes the estimation at the next scale, and this continues up to the finest scale.
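The coarse-to-fine scheme described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the implementation used here: the pyramid is built by plain sub-sampling by a factor of two, disparities are doubled when passed to the next finer scale, and `estimate` is a hypothetical callable standing in for the per-scale mutual information based estimator.

```python
import numpy as np

def build_pyramid(image, levels):
    """Return [finest, ..., coarsest] by repeated sub-sampling by two."""
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(pyramid[-1][::2, ::2])
    return pyramid

def upsample_disparity(disp, shape):
    """Nearest-neighbour up-sampling to `shape`.

    Disparity values are doubled because the image width doubles.
    """
    up = np.repeat(np.repeat(disp, 2, axis=0), 2, axis=1)
    return 2 * up[:shape[0], :shape[1]]

def coarse_to_fine(phase_l, phase_r, levels, estimate):
    """Run `estimate(left, right, init)` from the coarsest to the finest scale."""
    pyr_l = build_pyramid(phase_l, levels)
    pyr_r = build_pyramid(phase_r, levels)
    # Assumption: the coarsest scale starts from a zero disparity map.
    disp = np.zeros(pyr_l[-1].shape[:2], dtype=int)
    for lvl in range(levels - 1, -1, -1):
        if lvl < levels - 1:
            disp = upsample_disparity(disp, pyr_l[lvl].shape[:2])
        disp = estimate(pyr_l[lvl], pyr_r[lvl], disp)
    return disp
```

A call such as `coarse_to_fine(phi_l, phi_r, 3, estimate)` returns a disparity map with the size of the finest phase image, provided that `estimate` returns a map of the same size as its inputs.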
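At a given scale s, the mutual information of the color monogenic curvature phase image pair Φl,s and Φr,s is defined as
$$ MI(\Phi_{l,s}, \Phi_{r,s}) = H(\Phi_{l,s}) + H(\Phi_{r,s}) - H(\Phi_{l,s}, \Phi_{r,s}) \qquad (13) $$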
where H(Φl,s) and H(Φr,s) are the Shannon entropies, given by
$$ H(\Phi) = E_{\Phi}[-\log P(\Phi)] = -\sum_{\phi_i \in \Omega_\phi} P(\phi_i) \log P(\phi_i), $$
where EΦ indicates the expected value function of Φ, P(Φ) is the probability of Φ, Ωϕ refers to the domain over which the random variable can range and ϕi is an event in this domain. H(Φl,s, Φr,s) indicates the joint entropy of Φl,s and Φr,s, represented as
$$ H(\Phi_{l,s}, \Phi_{r,s}) = E[-\log P(\Phi_{l,s}, \Phi_{r,s})], $$
where E refers to the expectation and P(Φl,s, Φr,s) is the joint distribution of Φl,s and Φr,s. Since Eq. (13) defines the mutual information for the whole phase image, we follow [16] and approximate the whole mutual information as the sum of pixel-wise mutual information terms, each evaluated at the disparity dp of the pixel p, and use it as a data cost.
Typically, disparity estimation can be obtained by minimizing the following energy expression
$$ E(d) = E_{data}(d) + E_{smooth}(d) $$
where Edata is a matching cost that serves as a similarity measure and Esmooth is the smoothness energy, which penalizes disparity differences. In this paper, we use the mutual information of the color monogenic curvature phase images as the matching cost. Based on the approximation above, the pixel-wise data energy Edata is formulated from the pixel-wise mutual information. We use a truncated quadratic function as the smoothness energy, which is defined as
$$ E_{smooth} = \sum_{p} \sum_{q \in \mathcal{N}(p)} V_{pq}(d_p, d_q), $$
where 𝒩(p) is the set of neighbouring pixels of the pixel p, and Vpq is a truncated quadratic function of the disparity difference dp − dq, with λ being a weighting parameter. The graph-cuts expansion algorithm proposed in [17] is employed to minimize the energy function for the dense disparity computation.
4. Experimental results
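As a concrete reference for the cost evaluated in these experiments, the whole-image mutual information of Eq. (13) and a truncated quadratic smoothness term can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used in this paper: probabilities are estimated from a joint histogram with a hypothetical bin count, the natural logarithm is used, the truncation value `trunc` and the 4-neighbourhood are illustrative choices, and each neighbouring pair is counted once. The pixel-wise approximation of [16] and the graph-cuts minimization of [17] are not shown.

```python
import numpy as np

def entropy(p):
    """Shannon entropy -sum(p * log p) over the non-zero probabilities."""
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def mutual_information(a, b, bins=16):
    """MI(a, b) = H(a) + H(b) - H(a, b), estimated from a joint histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p_ab = joint / joint.sum()      # joint distribution
    p_a = p_ab.sum(axis=1)          # marginal distribution of a
    p_b = p_ab.sum(axis=0)          # marginal distribution of b
    return entropy(p_a) + entropy(p_b) - entropy(p_ab.ravel())

def smoothness_energy(disp, lam=1.0, trunc=4.0):
    """Truncated quadratic penalty summed over 4-neighbour pairs.

    lam plays the role of the weighting parameter; trunc is a
    hypothetical truncation value.
    """
    d = disp.astype(float)
    dx = np.minimum((d[:, 1:] - d[:, :-1]) ** 2, trunc)   # horizontal pairs
    dy = np.minimum((d[1:, :] - d[:-1, :]) ** 2, trunc)   # vertical pairs
    return lam * float(dx.sum() + dy.sum())
```

With such an estimate, a signal shares far more information with a gain-and-offset copy of itself than with an unrelated signal, which is the property that makes mutual information attractive as a matching cost under brightness variation.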
In this section, we present experimental results to demonstrate the effectiveness of our proposed approach. First, we take two datasets, “baby1” and “lampshade1”, from the Middlebury stereo benchmark [15, 18] as test images. The former is rich in texture, whereas the latter contains little texture. All images are captured under three different real lighting sources and three different exposures, and the illumination does not change uniformly over the whole image. The images are rectified and radial distortion has been removed. Since the intensity-based approach of [5] has been shown to achieve state-of-the-art stereo matching performance, we compare our results against the disparities estimated by this approach and by the gray monogenic curvature phase based approach [11].
Figure 3 shows the disparities estimated for “baby1” by the three methods. The top row shows, from left to right, two views of “baby1” with a large brightness change and the ground truth disparity. The bottom row shows the results of the intensity-based approach, the gray monogenic curvature phase based approach and the proposed approach. For two images with large illumination variation, the intensity-based approach fails to generate good results because of its sensitivity to brightness change. The proposed method produces the best results, and the gray monogenic curvature phase based method performs slightly worse because it incorporates no color information and has no multiscale implementation. The “lampshade1” dataset contains images with less texture, which makes disparity estimation more difficult than for richly textured images. Figure 4 shows the corresponding results of the three methods. The top row contains two views of “lampshade1” with some brightness change and the ground truth disparity. The bottom row shows, from left to right, the disparities estimated by the intensity-based approach, the gray monogenic curvature phase based approach and the proposed approach. Because of the low texture, none of the three approaches generates a very good disparity map; however, our approach still performs the best among them.
To evaluate the performance of our approach quantitatively, we use different lighting combinations with the same camera exposure as input image pairs and compute the errors in unoccluded areas. Figure 5 shows the disparity errors in unoccluded areas with respect to different lighting combinations for “baby1” and “lampshade1”. The horizontal axis represents the combination of lighting conditions, e.g. “1/3” means that the left image is taken under lighting condition 1 and the right image under lighting condition 3. In this figure, “Color phase” indicates the proposed method, “PMI” refers to the gray monogenic curvature phase based approach [11] and “Intensity” represents the intensity-based approach [5]. For all three approaches, the errors grow with the difference between the lighting conditions; our approach nevertheless performs the best. To test the robustness of the proposed approach, we use noise contaminated image pairs with the same lighting condition and check the estimation errors. Figure 6 shows the disparity errors in unoccluded areas with respect to the signal to noise ratio for “baby1” and “lampshade1”. As the signal to noise ratio increases, the estimation errors decrease correspondingly, and our approach still outperforms the others.
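An evaluation of this kind can be sketched in Python as follows. The exact error measure and noise model are not specified above, so both are assumptions here: the error is taken as the fraction of unoccluded pixels whose absolute disparity error exceeds a threshold, and the noise is additive white Gaussian noise scaled to a target signal to noise ratio in decibels. The function and parameter names are hypothetical.

```python
import numpy as np

def bad_pixel_rate(disp, ground_truth, unoccluded, threshold=1.0):
    """Fraction of unoccluded pixels with absolute error above `threshold`.

    `unoccluded` is a boolean mask selecting the pixels to evaluate.
    """
    err = np.abs(disp.astype(float) - ground_truth.astype(float))
    bad = (err > threshold) & unoccluded
    return float(bad.sum()) / float(unoccluded.sum())

def add_noise(image, snr_db, rng):
    """Add white Gaussian noise so that signal power / noise power
    corresponds to `snr_db` decibels (an assumed definition of SNR)."""
    signal_power = np.mean(image.astype(float) ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return image + rng.normal(0.0, np.sqrt(noise_power), image.shape)
```

Sweeping `snr_db` over a range of values and recording `bad_pixel_rate` for each estimated disparity map would produce a curve of the same kind as the one described for Figure 6, under the assumptions stated above.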
So far, the experimental image pairs were captured indoors with ground truth available. To further investigate the performance of our approach outdoors, we use two different cameras arranged in a stereo vision system to capture outdoor images with strong lighting changes. Figures 7 and 8 show different views of outdoor images with large illumination variation and the disparity maps estimated by the intensity-based approach [5], the gray monogenic curvature phase [11] and the proposed method. The intensity-based approach cannot generate a good disparity map because of the large brightness change, the gray monogenic curvature phase performs much better than the intensity-based one, and our approach works the best.
5. Conclusions
This paper addresses the problem of estimating robust disparity maps under large illumination change and noise contamination. Conventional approaches are based on the brightness constancy assumption and fail to generate accurate disparities in this setting. To obtain robust disparity estimates, we first propose a model, the color monogenic curvature phase, by embedding the monogenic curvature signal into the quaternion representation; this generalizes the monogenic curvature phase to the color domain. We then propose a multiscale framework that estimates disparities by coupling the advantages of mutual information and the color monogenic curvature phase. We use both indoor and outdoor images with large illumination change in the experiments. The results demonstrate that our approach outperforms the intensity-based and monogenic curvature phase based approaches, and that it achieves good performance even under large brightness variation and noise corruption.
Acknowledgment
This work has been supported by the National Natural Science Foundation of China (61103071, 61105122, 61103072), the Natural Science Foundation of Shanghai, China (11ZR1440200), the Research Fund for the Doctoral Program of Higher Education of China (20110072120065) and the Key Basic Program of the Science and Technology Commission of Shanghai Municipality of China (10DJ1400300).
References and links
1. S. Birchfield and C. Tomasi, “A pixel dissimilarity measure that is insensitive to image sampling,” IEEE Trans. Pattern Anal. Mach. Intell. 20, 401–406 (1998).
2. H. Moravec, “Toward automatic visual obstacle avoidance,” in Proceedings of 5th International Joint Conference on Artificial Intelligence, (Morgan Kaufmann, 1977), pp. 584–590.
3. C. Fookes, M. Bennamoun, and A. Lamanna, “Improved stereo image matching using mutual information and hierarchical prior probabilities,” in Proceedings of 16th International Conference on Pattern Recognition, (IEEE, 2002), pp. 937–940.
4. I. Sarkar and M. Bansal, “A wavelet-based multiresolution approach to solve the stereo correspondence problem using mutual information,” IEEE Trans. Syst. Man. Cybern., B: Cybern. 37, 1009–1014 (2007).
5. A. Geiger, M. Roser, and R. Urtasun, “Efficient large-scale stereo matching,” in Proceedings of 10th Asian conference on Computer vision - Volume Part I, (Springer-Verlag, 2011), pp. 25–38.
6. A. V. Oppenheim, “The importance of phase in signals,” Proc. IEEE 69, 529–541 (1981).
7. M. Felsberg and G. Sommer, “The monogenic signal,” IEEE Trans. Signal Process. 49, 3136–3144 (2001).
8. M. Felsberg and G. Sommer, “The monogenic scale-space: a unifying approach to phase-based image processing in scale-space,” J. Math. Imaging Vision 21, 5–26 (2004).
9. G. Demarcq, L. Mascarilla, M. Berthier, and P. Courtellemont, “The color monogenic signal: application to color edge detection and color optical flow,” J. Math. Imaging Vision 40, 269–284 (2011).
10. D. Zang and G. Sommer, “Signal modeling for two-dimensional image structures,” J. Visual Commun. Image 18, 81–99 (2007).
11. D. Zang, J. Li, and D. Zhang, “Robust visual correspondence computation using monogenic curvature phase based mutual information,” Opt. Lett. 37, 10–12 (2012).
12. F. Brackx, B. D. Knock, and H. D. Schepper, “Generalized multidimensional Hilbert transforms in Clifford analysis,” Int. J. Math. Math. Sci. 2006, 98145 (2006).
13. S. J. Sangwine, “Fourier transforms of color images using quaternion or hypercomplex numbers,” Electron. Lett. 32, 1979–1980 (1996).
14. N. L. Bihan and S. J. Sangwine, “Quaternion principal component analysis of color images,” in Proceedings of IEEE International Conference on Image Processing, (IEEE, 2003), pp. 809–812.
15. http://vision.middlebury.edu/stereo/.
16. J. Kim, V. Kolmogorov, and R. Zabih, “Visual correspondence using energy minimization and mutual information,” in Proceedings of IEEE International Conference on Computer Vision, (IEEE, 2003), pp. 1033–1040.
17. Y. Boykov, O. Veksler, and R. Zabih, “Fast approximate energy minimization via graph cuts,” IEEE Trans. Pattern Anal. Mach. Intell. 23, 1222–1239 (2001).
18. D. Scharstein and C. Pal, “Learning conditional random fields for stereo,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, (IEEE, 2007), pp. 1–8.