Abstract
The multiview images captured by a toed-in camera array can reproduce a 3D scene vividly with appropriate positive, negative, and zero disparities. However, adjusting the depth of the scene according to the requirements of visual effects is a challenging task. In this paper, we propose a novel projection-based disparity control method to solve this problem. Using the relationship between the world coordinate system and the camera coordinate system, the zero disparity point in the reference view is projected into the other views. The disparities of the different views are then obtained from the matched corresponding points, and the views are shifted by the calculated disparities. The proposed method is easy to implement, and the depth of toed-in multiview images can be adjusted as required. Experimental results show that the proposed method is effective in comparison with the conventional method, and that the processed multiview images present desirable stereoscopic visual quality.
© 2014 Optical Society of America
1. Introduction
Three-Dimensional (3D) display has undergone rapid growth, and large numbers of multiview video products have entered the vast market [1–3]. Multiview autostereoscopic display provides stereo perception without the use of special glasses and offers great convenience to viewers [4]. Attracted by this convenience, autostereoscopic display technology has become more and more popular [5, 6].
When capturing multiview images for autostereoscopic display, camera array layouts are classified into two types: the parallel camera array and the toed-in camera array [7]. Multiview images obtained through a parallel camera array have only negative disparity, so the whole scene is perceived in front of the monitor. Because of this display effect, the original scene cannot be displayed vividly and naturally. In comparison, multiview images captured by a toed-in camera array have both positive and negative disparity, and this disparity range yields vivid 3D perception. However, multiview images captured directly by a toed-in array have a fixed depth range, and this defect needs to be compensated through special processing [8].
To obtain a better stereoscopic visual effect, the disparities of multiview images need to be controlled. Kwon et al. proposed a disparity control method that uses disparity information for a binocular stereoscopic camera [9]. They extract disparity information through central object matching between the left and right views, and calculate the disparity value of the same object in the two labeled views. Disparity control is realized by moving the two cameras via camera motors. However, the method does not consider multiview stereoscopic images, and the hardware devices are difficult to control. Deng et al. developed a parallel tri-view disparity control method and acquired parallax images with both positive and negative horizontal disparities [10]. However, this method is suitable only for parallel images and fixed-depth scenes. In [11], the Mean Shift algorithm was used to segment views into regions, the views were shifted according to the disparity of the object in the central region, and disparity control of parallel multiview images was realized. However, this method cannot be applied to multiview images captured by a toed-in camera array, because the disparities between different pairs of neighboring views are different.
In this paper, we propose a projection-based disparity control method for toed-in multiview images. Pixel projection based on camera parameters and geometric knowledge is adopted to find the corresponding pixel pair in different views. Then, the disparities of the different views are calculated from the positional relationship of the pixel pair. Finally, the multiview images are shifted according to these disparities. The stereo effect of the scene can be changed to meet various demands by adjusting the disparity of the pixel pair.
The rest of this paper is organized as follows. Section 2 introduces the background and related work. Section 3 describes our projection-based disparity control method in detail. Section 4 reports experimental results. Finally, Section 5 summarizes this paper.
2. Background and related work
2.1 Principle of binocular disparity
Humans have binocular vision because they perceive binocular disparity. Disparity is usually divided into zero disparity, positive disparity and negative disparity, as shown in Fig. 1. L stands for the distance between the two eyes, and Z denotes the depth between the object and the eyes. Objects with zero disparity are viewed right at the screen plane. Other parts of the scene with negative or positive disparity appear in front of or behind the screen, respectively. The human brain reconstructs real scenes and stereoscopic perception according to the disparity and color stimulation. Stereoscopic perception becomes more distinct if the disparity range is suitable. The multiview autostereoscopic display utilizes the principles of binocular disparity and motion disparity. Every two adjacent views can be regarded as binocular views, and the binocular views are selected automatically based on the viewer’s position. This explains the interactive selection of views in autostereoscopic display.
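The sign convention described above can be illustrated with a small sketch. The function name and the tolerance parameter are hypothetical and only serve to make the mapping from signed disparity to perceived position explicit.

```python
def perceived_position(disparity: float, tol: float = 1e-9) -> str:
    """Map a signed screen disparity to where the point appears to the viewer.

    Convention taken from the text: zero disparity lies on the screen plane,
    negative disparity appears in front of it, positive disparity behind it.
    `tol` is an illustrative tolerance for treating tiny values as zero.
    """
    if abs(disparity) <= tol:
        return "on the screen"
    if disparity < 0:
        return "in front of the screen"
    return "behind the screen"


if __name__ == "__main__":
    for d in (-3.0, 0.0, 2.5):
        print(d, "->", perceived_position(d))
```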
2.2 The transformation relationship between multiple views
Projection has been widely used in the transformation of geometric coordinates. The positional relationship between reference pixel and corresponding pixel in the image is determined by the geometric model of camera imaging. Autostereoscopic images are captured from different perspectives at pre-fixed geometric positions. As the intrinsic reference matrix and position of the cameras can be calibrated in advance, stereoscopic images and 3D shapes of objects can be obtained immediately [12, 13].
3D scenes can be reconstructed based on the relationship among the world coordinate system, camera coordinate system and computer coordinate system. The relationship between the world coordinate system and camera coordinate system is shown in Fig. 2.
The world coordinate system, which has three axes u, v and w, is generally used to represent the positions of objects in the real world. In the camera coordinate system, the camera optical center lies on the optical axis, and the two other axes x and y indicate the planar positions of objects. The coordinates calibrated for images displayed on the computer screen define the computer coordinate system, which generally takes the upper left corner of the image as the coordinate origin. With the given intrinsic and extrinsic parameters of the stereo camera, the image position in the computer coordinate system can be obtained [12].
The camera parameter is defined as:
$$P = A\,[R \mid t],$$
where the intrinsic matrix $A$ is a 3×3 upper triangular matrix, which can be written as:
$$A = \begin{bmatrix} f_u & \gamma & u_0 \\ 0 & f_v & v_0 \\ 0 & 0 & 1 \end{bmatrix},$$
where $f_u$ and $f_v$ are the focal lengths along the u and v axes, $\gamma$ is the skew parameter, and $(u_0, v_0)$ is the coordinate of the principal point.
The matrix $[R \mid t]$ is the extrinsic matrix and indicates the relationship between the world coordinate system and the camera coordinate system. The three-dimensional vector $t$ describes the position, the 3×3 orthogonal matrix $R$ describes the direction, and $R$ satisfies the following constraint:
$$R R^{T} = R^{T} R = I,$$
where the entries $r_{ij}$ of $R$ represent the rotation parameters. The above parameters and relationships can be used in the projection and view synthesis process.
2.3 Conventional disparity control based on virtual view rendering
Disparity control can be achieved through virtual view rendering by restricting the rendering distance between views. First, virtual views are rendered, and the rendering distance between adjacent virtual views is chosen from two quantities: $N$, the number of virtual views generated by DIBR [14], and $D$, the distance between the two original views used to generate the virtual views. Then, multiple adjacent intermediate virtual views are chosen to synthesize the stereoscopic composite image. The stereoscopic composite results of Ballet under different numbers of virtual views are shown in Fig. 3.
It can be seen from Fig. 3 that the stereo images tend to be clearer with larger $N$. This disparity control method is easy to operate and is suitable for both parallel and toed-in multiview images. However, the stereo perception is reduced if $N$ is beyond a certain range, and the stereoscopic image quality is degraded by the virtual view rendering process.
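To make the trade-off concrete, the following sketch shows one way the rendering distance could shrink as more virtual views are generated. It assumes that the virtual cameras are spaced evenly between the two original views; this spacing rule, the function name, and the parameter names are illustrative assumptions, not the rule used in the conventional method.

```python
def virtual_view_positions(baseline: float, num_virtual: int) -> list[float]:
    """Positions of evenly spaced virtual cameras between two original views.

    The original views are placed at 0 and `baseline`. Even spacing is an
    ASSUMPTION made for illustration: the step between neighbouring views is
    baseline / (num_virtual + 1), so more virtual views give a smaller
    rendering distance (and hence smaller disparity between adjacent views).
    """
    if num_virtual < 1:
        raise ValueError("num_virtual must be at least 1")
    step = baseline / (num_virtual + 1)
    return [step * k for k in range(1, num_virtual + 1)]


if __name__ == "__main__":
    for n in (1, 3, 7):
        positions = virtual_view_positions(8.0, n)
        print(n, "virtual views, step", positions[0], "positions", positions)
```

Under this assumption, a larger number of virtual views reduces the disparity between adjacent views, which matches the observation that the composite image becomes clearer while the stereo perception eventually weakens.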
In this paper, we propose a projection-based disparity control method that takes the above problems into consideration. First, corresponding pixel pairs are obtained by projection using the camera parameters. Second, the disparities of the different views are calculated from the positional relationship of the pixel pairs. Finally, the views are shifted according to the disparity values. With the proposed method, disparity control for toed-in multiview images is realized accurately, and the stereoscopic effect can be adjusted as required.
3. Proposed disparity control method
3.1 The overall process
The overall process of the proposed disparity control method can be separated into four steps, as shown in Fig. 4. First, to achieve a comfortable display effect, the zero-disparity point (ZDP) in the reference image is chosen manually according to different requirements. The selected ZDP serves as the base point of the projection used to find the corresponding points in the other views. During the projection step, 3D warping [15], which has previously been used in view synthesis, is applied to search for the matching points. In detail, using the multiview texture images and their corresponding depth images, the corresponding point of the ZDP is accurately located in each view through projection with 3D warping, which uses the camera parameters and the transformation from the world coordinate system to the camera coordinate system. With the matched pixel pairs formed by the ZDP and its corresponding points, the disparity values are calculated separately, and the images of the other views are shifted based on these values.
3.2 ZDP projection with 3D warping
Considering the relationship between the camera coordinate system and the world coordinate system, image pixels are first mapped into world space and then projected onto the multiview images through 3D warping. Following this principle, the ZDP is mapped to each multiview image plane. The coordinate of the object is written as the homogeneous vector $m = [x, y, 1]^{T}$ in the camera coordinate system and as $M = [u, v, w, 1]^{T}$ in the world coordinate system. Their relationship can be described as:
$$s\,m = P\,M,$$
where $s$ is a constant value. The above equation can be expanded as follows:
$$s \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = A\,[R \mid t] \begin{bmatrix} u \\ v \\ w \\ 1 \end{bmatrix}.$$
The process of coordinate transformation is as follows. First, the ZDP is transformed to the corresponding space point in the world coordinate system:
$$M_w = R_r^{-1}\left(z_r A_r^{-1} m_r - t_r\right).$$
Second, the space point in the world coordinate system is transformed to the corresponding point in the computer coordinate system of the $i$th view:
$$\tilde{m}_i = A_i\left(R_i M_w + t_i\right),$$
where the three-dimensional vectors $m_r = [x_r, y_r, 1]^{T}$ and $M_w = [u_w, v_w, w_w]^{T}$ denote the position of the ZDP in the reference image plane and in the world coordinate system, respectively, $A_r$, $R_r$ and $t_r$ are the camera parameters of the reference view, $z_r$ is the depth value of the ZDP, and $\tilde{m}_i = [\tilde{x}_i, \tilde{y}_i, \tilde{z}_i]^{T}$ indicates the coordinate of the corresponding point.
With these relationships, the 2D coordinate of the ZDP’s corresponding point is obtained through normalization, that is, $(x_i, y_i) = (\tilde{x}_i/\tilde{z}_i,\ \tilde{y}_i/\tilde{z}_i)$. Based on the projection with 3D warping, the ZDP can be matched to the corresponding point in each view. The principle of the ZDP projection is shown in Fig. 5.
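The two-step ZDP projection can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the extrinsics are assumed to map world points to camera points as `x_cam = R @ X + t`, the depth is assumed to be a metric z value along the optical axis (real depth maps usually need conversion first), and all function names and camera values are hypothetical.

```python
import numpy as np


def backproject(pixel_xy, depth, A, R, t):
    """Step 1: image pixel plus depth -> 3D point in world coordinates.

    Inverts depth * m = A (R M + t) under the ASSUMED world-to-camera
    convention x_cam = R @ X + t.
    """
    m = np.array([pixel_xy[0], pixel_xy[1], 1.0])
    cam_point = depth * np.linalg.solve(A, m)  # point in camera coordinates
    return R.T @ (cam_point - t)               # R is orthogonal: R^-1 == R^T


def project(world_point, A, R, t):
    """Step 2: 3D world point -> pixel (x, y) after homogeneous normalization."""
    h = A @ (R @ world_point + t)
    return h[:2] / h[2]


def warp_zdp(zdp_xy, depth, ref_cam, other_cam):
    """Locate the corresponding point of the ZDP in another view."""
    return project(backproject(zdp_xy, depth, *ref_cam), *other_cam)


if __name__ == "__main__":
    # Made-up camera parameters, for illustration only.
    A = np.array([[1000.0, 0.0, 512.0],
                  [0.0, 1000.0, 384.0],
                  [0.0, 0.0, 1.0]])
    ref_cam = (A, np.eye(3), np.zeros(3))
    theta = np.deg2rad(2.0)  # small toe-in rotation about the vertical axis
    R_i = np.array([[np.cos(theta), 0.0, np.sin(theta)],
                    [0.0, 1.0, 0.0],
                    [-np.sin(theta), 0.0, np.cos(theta)]])
    other_cam = (A, R_i, np.array([-0.1, 0.0, 0.0]))

    zdp = (512.0, 384.0)
    x_i, y_i = warp_zdp(zdp, 3.0, ref_cam, other_cam)
    print("corresponding point:", x_i, y_i)
    print("horizontal offset from ZDP:", x_i - zdp[0])
```

Warping the ZDP back into the reference camera itself should return the original pixel, which is a convenient sanity check on the chosen extrinsic convention.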
Different views of multiview images captured by a toed-in camera array have different camera parameters, so the parameters $A_i$, $R_i$ and $t_i$ of each view are used to ensure the computation accuracy of the position of the ZDP's corresponding point, where $i$ denotes the view number in the multiview images. Using 3D warping, the disparity $d_i$ between the horizontal coordinate $x_r$ of the ZDP in the reference view and the horizontal coordinate $x_i$ of the corresponding point in the $i$th view is calculated. As corresponding points may lie on either the right or the left side of the ZDP, the disparity can take a negative value. The disparity value can be expressed as:
$$d_i = x_i - x_r.$$
To demonstrate the effectiveness of the ZDP projection with 3D warping, a quantitative assessment experiment based on the Root Mean Squared Error (RMSE) is designed to test the matching accuracy. First, a location ground truth test set (including ten frames each of Ballet and Breakdancers) of the matching points of the ZDP is generated manually. Then, the locations of the matching points are obtained using the segmentation-based matching method [11] and the proposed 3D warping method, respectively. Finally, the following RMSE metric is used to compute the matching accuracy for every frame:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left\lVert p_i - g_i \right\rVert^{2}},$$
where $n$ denotes the number of views for each frame, $p_i$ denotes the location of the matched corresponding point in the $i$th view obtained by the segmentation-based method or the 3D warping method, and $g_i$ denotes the manually obtained ground truth location of the matching point. The results are presented in Table 1.
From the results, we can see that the matching points obtained by the segmentation-based method have a large error. As a result, the viewer cannot recognize any stereo sense and might feel annoyed by the generated composite images. With the proposed method, accurate matching points of the ZDP are found, and the stereo sense of the generated composite images is improved.
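The per-frame matching error can be computed as in the following sketch. It assumes that each location is a 2D pixel coordinate and that the error is the Euclidean distance between matched and ground truth points; the function name and the sample coordinates are hypothetical and are not taken from Table 1.

```python
import math


def matching_rmse(matched, ground_truth):
    """RMSE between matched points and ground truth points for one frame.

    `matched` and `ground_truth` are sequences of (x, y) pixel locations,
    one per view. Using the 2D Euclidean distance is an ASSUMPTION; a purely
    horizontal error would only use the x coordinates.
    """
    if len(matched) != len(ground_truth) or len(matched) == 0:
        raise ValueError("need the same non-zero number of points in both lists")
    squared_errors = [
        (mx - gx) ** 2 + (my - gy) ** 2
        for (mx, my), (gx, gy) in zip(matched, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    # Placeholder coordinates for a two-view frame, for illustration only.
    matched = [(103.0, 204.0), (250.0, 200.0)]
    truth = [(100.0, 200.0), (250.0, 200.0)]
    print("RMSE:", matching_rmse(matched, truth))
```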
3.3 Image shifting
Based on the computed disparity $d_i$ of each view, the multiview images are shifted individually, so that the ZDP and its corresponding points in these images are moved to the same place. Because the displacements between corresponding points of adjacent views are different, the distance by which each view needs to be shifted depends on $d_i$. The schematic diagram of image shifting is shown in Fig. 6.
Note that the horizontal distances between adjacent views are equal, whereas the horizontal displacements of the object in adjacent views are different. Each view is therefore shifted horizontally by its own distance $|d_i|$, in the direction that moves its corresponding point onto the position of the ZDP.
After the image shifting, image cropping is adopted. This ensures that all views have the same resolution.
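The shifting and cropping steps can be sketched with NumPy slicing. This is an illustrative sketch under assumptions: each disparity is taken as the horizontal coordinate of the corresponding point minus that of the ZDP, disparities are rounded to whole pixels, and the crop keeps the columns that remain valid in every view rather than a fixed display resolution. The function name is hypothetical.

```python
import numpy as np


def shift_and_crop(views, disparities):
    """Align all views on the ZDP and crop them to a common width.

    views:        list of H x W (or H x W x C) arrays with the same width.
    disparities:  per-view d_i = x_i - x_ref in pixels (ASSUMED sign convention).
    Shifting view i by -d_i moves its corresponding point onto the ZDP column;
    slicing implements the shift and the crop in one step.
    """
    d = [int(round(v)) for v in disparities]
    width = views[0].shape[1]
    # Output column k reads original column k + d_i, so it is valid in view i
    # only when 0 <= k + d_i < width.
    lo = max(0, max(-di for di in d))
    hi = min(width, min(width - di for di in d))
    if hi <= lo:
        raise ValueError("disparities are too large for the image width")
    return [v[:, lo + di:hi + di] for v, di in zip(views, d)]


if __name__ == "__main__":
    # The value 9 marks the ZDP (column 2) and its corresponding point (column 4).
    ref_view = np.array([[0, 1, 9, 3, 4, 5]])
    other_view = np.array([[0, 1, 2, 3, 9, 5]])
    aligned = shift_and_crop([ref_view, other_view], [0, 2])
    for a in aligned:
        print(a)
```

After the call, the marked point sits in the same column of every returned view, and all views have the same resolution.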
4. Experimental results
To demonstrate the performance of disparity control, multiview autostereoscopic display experiments are conducted. We evaluate the performance of the proposed projection-based disparity control method and compare it with the conventional disparity control method based on virtual view rendering in terms of the quality of the stereo sense. Two toed-in sequences, Ballet and Breakdancers [16], are selected for the experiments. The resolution of the two sequences is 1024 × 768. Depth maps are available for each view, and the intrinsic and extrinsic camera parameter matrices are provided. View 3, view 5, and the corresponding depth maps are used to generate eight views (I1, I2, …, I8) with the virtual viewpoint synthesis algorithm. Then, I1, I2, …, I8 are used to construct eight-view stereoscopic images of the toed-in camera array. We choose I4 as the reference view in disparity control. In addition, considering the display resolution, the images are cropped to 720 × 384 after disparity control. Since the scenes in the test sequences are complex, the ZDP is first selected at the central point and then changed freely, so that the different effects can be compared to obtain a suitable stereoscopic sense.
4.1 Disparity Control with Central Point
Figure 7 shows examples of the test sequences. The top line images are the frames of I1 for Ballet test sequence, and the bottom line images are the frames of I1 for Breakdancers test sequence.
Figure 8 and Fig. 9 show three frames of the composite images generated from the eight-viewpoint stereoscopic images of Ballet and Breakdancers, respectively. The red points indicate the locations of the ZDP, and the rectangles show the area surrounding the ZDP. The ZDP zone has a vague outline in the composite images without disparity control. Moreover, the stereoscopic visual perception of the composite images in Figs. 8(a) and 9(a) is unimpressive when the images are displayed on the autostereoscopic display device. In the stereoscopic composite images obtained with the proposed disparity control method, the ZDP and its surrounding area are clear, as shown in Figs. 8(b) and 9(b). The other objects in the images are blurred due to positive or negative disparities. When the composite images are displayed on a stereoscopic screen, the ZDP lies on the surface of the screen plane, and the other parts with negative or positive disparity are clearly perceived in front of or behind the screen.
4.2 Disparity control with different expected ZDP locations
The depth of the stereoscopic composite image can be adjusted through the ZDP location. The results of disparity control with different ZDP locations are shown in Figs. 10 and 11. In Fig. 10, positions on the female dancer and on the male coach, which are the objects of most interest, are chosen as the ZDP, respectively. If the position on the female dancer is chosen as the ZDP, the female dancer is viewed on the surface of the screen, and the areas surrounding her exhibit a comfortable stereo sense. If the position on the male coach is chosen as the ZDP, the male coach is viewed on the surface of the screen after disparity control. In Fig. 11, positions on the left spectator and on the hip-hop dancer are chosen as the ZDP, respectively. Viewers obtain different stereoscopic visual experiences with different ZDP positions. Meanwhile, the ZDP zones in the stereoscopic composite images in Figs. 10 and 11 are clear, and the other parts of the images show different degrees of blur. When the composite image is displayed on the multiview autostereoscopic display, the stereo effect changes with the depth of the ZDP. If the depth of an object is less than that of the ZDP, the object has negative disparity and appears to protrude from the screen. Conversely, an object whose depth is larger than that of the ZDP appears recessed behind the screen. With the proposed method, the disparity and depth of stereoscopic composite images are controlled freely by selecting different ZDP locations. The proposed disparity control method is effective, and the multiview images after disparity control have a more pronounced stereoscopic effect than the original multiview images.
4.3 Subjective evaluation
To intuitively evaluate the quality of the stereoscopic composite images, a mean opinion score (MOS) [17] test has been carried out. The subjective assessment is based on the following aspects: the existence of the stereo sense and depth information, the comfort of the stereo sense, and the naturalness of the stereo sense. The detailed criteria are presented in Table 2. The MOS results for the video sequences are divided into two groups: Ballet and Breakdancers. For each sequence, three kinds of stereoscopic composite images are compared: those obtained from the original multiview images without disparity control, those obtained with conventional disparity control based on virtual view rendering, and those obtained with the proposed disparity control method.
Figure 12 shows the average MOS results of the stereoscopic composite images over 10 frames per sequence, where a higher MOS means that the image has relatively better stereo visual quality. In Fig. 12, the horizontal axis shows the two sequences, Ballet and Breakdancers, and the vertical axis represents the MOS. The blue bar represents the results of the stereoscopic composite images obtained from the original multiview images without disparity control, the green bar the conventional disparity control based on virtual view rendering, and the brown bar the proposed disparity control method. It can be seen that the MOS of the composite images without disparity control is the smallest. The conventional disparity control method based on virtual view rendering has a higher score. This means that the stereo sense and depth information in its stereoscopic composite images are easier to recognize, and that these images have only a slight adverse influence on the comfort of the stereo sense. The proposed method has the highest score, which indicates that the stereo sense and depth information of its stereoscopic composite images are easy to recognize and that the stereo visual experience is comfortable.
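The averaging behind a per-sequence MOS bar can be sketched as follows. The sketch assumes that each frame receives one numeric rating per viewer and that the MOS of a method is the mean over viewers and then over frames; the function name and the ratings are placeholders and are not the data reported in Fig. 12.

```python
from statistics import mean


def mean_opinion_score(ratings_per_frame):
    """Average viewer ratings per frame, then average over the frames.

    ratings_per_frame: list of lists, one inner list of numeric viewer
    ratings per frame. The two-level averaging is an ASSUMPTION about how
    the scores are aggregated.
    """
    if not ratings_per_frame or any(len(r) == 0 for r in ratings_per_frame):
        raise ValueError("every frame needs at least one rating")
    return mean(mean(frame) for frame in ratings_per_frame)


if __name__ == "__main__":
    # Placeholder ratings for two frames and three viewers, illustration only.
    placeholder = {
        "without disparity control": [[2, 3, 2], [2, 2, 3]],
        "proposed method": [[4, 5, 4], [5, 4, 4]],
    }
    for method, ratings in placeholder.items():
        print(method, "->", mean_opinion_score(ratings))
```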
5. Conclusion
This paper proposes a novel disparity control method based on projection. First, using the relationship between the world coordinate system and the camera coordinate system, the ZDP in the reference image is mapped to the corresponding points in the other views based on the principle of projection. Then, the disparities of the different views are obtained from the ZDP and its corresponding points. Finally, image shifting is executed according to the disparities, and the shifted images are used to generate the stereoscopic composite images. We have evaluated the performance of the method on several multiview images. Experimental results show that, with the proposed method, the stereo sense and depth information of the stereoscopic composite images are easy to recognize and the stereo sense is comfortable.
Acknowledgments
The authors would like to thank the anonymous reviewers for their valuable comments and suggestions. This research was partially supported by the Natural Science Foundation of China (No. 61271324, 60932007, 61202266), International Science and Technology Cooperation Program (No. 2010DFA12780), and Natural Science Foundation of Tianjin (No. 12JCYBJC10400).
References and links
1. P. Ndjiki-Nya, M. Köppel, D. Doshkov, H. Lakshman, P. Merkle, K. Müller, and T. Wiegand, “Depth image-based rendering with advanced texture synthesis for 3-D video,” IEEE Trans. Multimed. 13(3), 453–465 (2011).
2. M. M. Hannuksela, D. Rusanovskyy, W. Su, L. Chen, R. Li, P. Aflaki, D. Lan, M. Joachimiak, H. Li, and M. Gabbouj, “Multiview-video-plus-depth coding based on the advanced video coding standard,” IEEE Trans. Image Process. 22(9), 3449–3458 (2013).
3. M. Solh and G. AlRegib, “Hierarchical hole-filling for depth-based view synthesis in FTV and 3D video,” IEEE J. Sel. Top. Signal Process. 6(5), 495–504 (2012).
4. Y.-C. Fan, Y.-T. Kung, and B.-L. Lin, “Three-dimensional auto-stereoscopic image recording, mapping and synthesis system for multiview 3D display,” IEEE Trans. Magn. 47(3), 683–686 (2011).
5. O. Eldes, K. Akşit, and H. Urey, “Multi-view autostereoscopic projection display using rotating screen,” Opt. Express 21(23), 29043–29054 (2013).
6. S. Li, J. Lei, C. Zhu, L. Yu, and C. Hou, “Pixel-based inter prediction in coded texture assisted depth coding,” IEEE Signal Process. Lett. 21(1), 74–78 (2014).
7. W. Kang and S. Lee, “Horizontal parallax distortion correction method in toed-in camera with wide-angle lens,” 3DTV Conference: The True Vision-Capture, Transmission and Display of 3D Video (2009), pp.1–4.
8. A. Woods, T. Docherty, and R. Koch, “Image distortions in stereoscopic video systems,” Proc. SPIE 1915, 36–48 (1993).
9. K. C. Kwon, Y. T. Lim, N. Kim, Y.-J. Song, and Y.-S. Choi, “Vergence control of binocular stereoscopic camera using disparity information,” J. Opt. Soc. Korea 13(3), 379–385 (2009).
10. H. Deng, Q.-H. Wang, D.-H. Li, W.-X. Zhao, Y.-H. Tao, and A.-H. Wang, “Disparity images acquired by parallel camera array with shift,” Acta. Photon. Sinica 38(11), 2985–2988 (2009).
11. J. Lei, H. Zhang, C. Hou, and L. Lin, “Segmentation-based adaptive vergence control for parallel multiview stereoscopic images,” Optik (Stuttg.) 124(15), 2097–2100 (2013).
12. Y. Mori, N. Fukushima, T. Yendo, T. Fujii, and M. Tanimoto, “View generation with 3D warping using depth information for FTV,” Signal Process. Image Commun. 24(1), 65–72 (2009).
13. Y. R. Huddart, J. D. Valera, N. J. Weston, T. C. Featherstone, and A. J. Moore, “Phase-stepped fringe projection by rotation about the camera’s perspective center,” Opt. Express 19(19), 18458–18469 (2011).
14. L. Wang, J. Lei, H. Zhang, K. Fan, and S. Bu, “A novel virtual view rendering approach based on DIBR,” The 7th International Conference on Computer Science & Education (Melbourne, Australia, 2012), pp. 759–762.
15. S. Yea and A. Vetro, “View synthesis prediction for multiview video coding,” Signal Process. Image Commun. 24(1), 89–100 (2009).
16. C. L. Zitnick, S. B. Kang, M. Uyttendaele, S. Winder, and R. Szeliski, “High-quality video view interpolation using a layered representation,” ACM Trans. Graph. 23(3), 600–608 (2004).
17. ITU-R Recommendation BT.500-11, “Methodology for the subjective assessment of the quality of television pictures,” International Telecommunication Union, Geneva, Switzerland (2002).