
StereoEditor: controllable stereoscopic display by content retargeting


Abstract

We report a stereoscopic editing method, named StereoEditor, to achieve controllable stereoscopic display. By taking content-aware, depth-adaptive and object-selective retargeting into account, the resolution, depth and objects of the stereoscopic input are simultaneously edited to create flexible stereoscopic output. The retargeting semantics, i.e., the retinal disparity limits, viewing distance, and objects' scaling factors, can be flexibly adjusted to achieve a particular stereoscopic display, controlled by the user's preference and the practical viewing conditions. Compared with previous methods, our method achieves a better tradeoff among shape preservation, depth control and object layout, leading to a promising viewing experience.

© 2017 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

With the increasing availability of 3D displays, from 3D movies and TV to the recently popular virtual reality (VR), more and more people prefer to view 3D content for its immersive and realistic visual experience [1]. It is therefore an urgent task to create and edit stereoscopic media in an intuitive and efficient way. However, despite their great success, 2D media editing methods cannot be directly applied to stereoscopic media because of the additional depth dimension of 3D display.

Recently, many efforts have been made in stereoscopic media editing, such as StereoPasting [2], 3D Copy&Paste [3], content-aware stereoscopic media retargeting [4], and disparity remapping [5]. Image retargeting is one of the most frequently used tools for image editing. However, beyond the requirements of content copy-and-paste, stereoscopic media retargeting poses its own challenges. On the one hand, a hard disparity/depth preservation constraint is often imposed on the left and right views, so that the original disparity is preserved after retargeting. However, the disparity of the retargeted stereoscopic images should be adaptively adjusted to match the depth range of the target display. On the other hand, similar to content composition [6], users should be able to interactively select objects, alter the layout, or improve the experience for personalized requirements, a capability the existing methods omit.

To address the above challenges, we propose StereoEditor, an efficient tool for content-aware, depth-adaptive and object-selective stereoscopic media retargeting. To emphasize the differences between the proposed StereoEditor and existing methods [4,5,7–11], we focus on: 1) content-aware retargeting, which uses shape distortion and edge distortion energies for important content preservation; 2) depth-adaptive retargeting, which introduces a depth control module to edit the disparity of each pixel and combines it into horizontal and vertical disparity consistency energy terms; 3) object-selective retargeting, which allows users to interactively select objects and assign the objects' scaling factors according to their preference; 4) an optimization framework integrating the constraints of warping error, depth control and object layout, which can be solved exactly.

Contributions. Focusing on editing the resolution, depth and objects of input stereoscopic images for controllable display, our StereoEditor makes the following contributions:

  • 1) We propose an optimization framework for stereoscopic editing that jointly takes content-aware, depth-adaptive and object-selective retargeting into account.
  • 2) We present retargeting semantics derived from the retinal disparity limits, viewing distance, and objects' scaling factors; users can flexibly adjust these parameters during stereoscopic editing.

The rest of the paper is organized as follows: we review related work in Section 2, detail our method in Section 3, present experimental results in Section 4, and conclude in Section 5.

2. Related work

2D media editing methods have been well studied in the past years. Image composition techniques are widely used for image/video editing based on the distribution of various composition descriptors [12], which can originate from aesthetics, resolution, layout, and other factors. As a special case of image/video composition, 2D media retargeting has received much attention in recent years. Among retargeting methods, cropping is the simplest way to compose images, leveraging composition criteria for content selection, while seam carving resizes an image by iteratively removing or inserting pixels. However, cropping and seam carving may produce significant information loss and/or structure distortion. In contrast, warping-based methods distribute the distortions unevenly between important and unimportant areas, producing visually plausible results. In this paper, we adopt a warping-based method for stereoscopic image/video editing.

Stereoscopic media editing is related to our work. The most direct application of stereoscopic editing is to fulfill specific tasks on stereoscopic images/videos. Tong et al. [2] proposed StereoPasting for depth-consistent stereoscopic composition, in which a source 2D image is interactively blended into a target stereoscopic image. Li et al. [13] proposed a method for recoloring stereoscopic images, which only requires a few user strokes on the left view and automatically transfers the corresponding strokes to the right view. Du et al. [14] presented a technique for perspective manipulation in a stereoscopic image pair. Sharma and Lall [15] proposed a stereoscopic cloning approach for producing composite 3D content based on available or estimated geometry and an object saliency prior. Luo et al. [16] proposed a patch-based synthesis framework for stereoscopic image editing that maintains the depth interpretation and provides controllability of the scene depth. Yan et al. [17] proposed a hybrid warping model for stereoscopic image stitching that combines projective and content-preserving warping.

For the general task of changing the resolution or depth of stereoscopic images/videos, many content retargeting and depth adaptation methods have been proposed. Basha et al. [18] presented a stereo seam carving method that iteratively removes paired seams while taking the visibility relations between pixels in the image pair into account. Lei et al. [19] proposed a pixel-fusion based stereo image retargeting method to adaptively retarget stereo images to flexible aspect ratios while preserving the depth. Our previous work [20] incorporated stereoscopic visual attention and binocular just-noticeable difference models for significance energy optimization. Chang et al. [4] proposed a content-aware stereoscopic image display adaptation method that simultaneously resizes the resolution and adapts the depth for comfortable display. Li et al. [7] presented a warping-based stereoscopic image retargeting approach that simultaneously preserves the shape of salient objects and the depth of the 3D scene. Lin et al. [8] proposed a floating-boundary volumetric warping and object-aware cropping method to resize stereoscopic videos by utilizing the information of volumetric objects. Niu et al. [21] proposed a stereo cropping method that automatically crops and scales an existing stereoscopic photo to a variety of displays while preserving its aesthetic value. Our previous work [9] jointly considers shape preservation, visual comfort preservation and depth perception preservation for quality-of-experience (QoE) optimization.

For another type of stereoscopic image/video editing via disparity/depth adaptation, Lang et al. [5] presented a set of disparity mapping operators for stereoscopic 3D, which uses disparity and saliency estimation to compute a deformation of the input views to meet the target disparities. Yan et al. [10] proposed a linear depth mapping method to adjust the depth range of stereo videos according to the viewing conditions. Park et al. [11] presented a method for adjusting the 3D depth of an object by utilizing a virtual fronto-parallel planar projection in the 3D space without the need for dense 3D reconstruction. Lei et al. [22] proposed a disparity shifting method that adjusts the location of zero disparity plane (ZDP) of a scene but maintains the overall disparity range. Lei et al. [23] proposed a depth sensation enhancement method to adjust the depth sensation via an energy optimization solution. Our previous work [24] optimized visual comfort to enhance the overall stereoscopic 3D experience.

In summary, although stereoscopic media editing has been widely researched, how to control stereoscopic media according to the user's preference and the practical viewing conditions still deserves investigation, especially for controllable stereoscopic display. For this purpose, we formulate the problem as content-aware, depth-adaptive and object-selective stereoscopic image retargeting, and design StereoEditor to control the stereoscopic display.

3. Proposed method

Figure 1 shows the pipeline of StereoEditor. Given a stereoscopic image pair as input, our method performs stereoscopic editing to produce a visually pleasing stereoscopic output suitable for the target stereoscopic display. The method takes user-specified controllable parameters for editing the resolution, depth and objects of the stereoscopic images, namely the retinal disparity limits, viewing distance, and objects' scaling factors. As a result, users can flexibly edit these parameters and obtain different retargeting results. The method consists of the following key steps.

Fig. 1 Pipeline of StereoEditor.

3.1 Preprocessing

In this step, we first estimate the disparity of the input stereoscopic image using the stereo matching algorithm [25]. Then, a significance map is generated from the stereoscopic image and the corresponding disparity map using the 3D saliency detection method [26]. A set of matched grids is generated from the left and right images. Let $V$ be the set of all matched grids in the input left and right images, $V = V^L \cup V^R$. Each grid has four vertices. If a grid vertex in the left image has no paired vertex in the right image (e.g., it is disoccluded in the right image), we extend the size of the adjacent grids in the right image to cover these disoccluded areas.
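To make the grid setup concrete, the sketch below builds a regular quad mesh over one view; the helper name and the 40-pixel cell size are our assumptions, not details given in the paper.

```python
import numpy as np

def make_grid(width, height, cell=40):
    """Regular quad mesh over an image; the matched grid sets V^L and V^R
    are such meshes placed on the left and right views. Cell size assumed."""
    xs = np.arange(0, width + 1, cell, dtype=float)
    ys = np.arange(0, height + 1, cell, dtype=float)
    gx, gy = np.meshgrid(xs, ys)
    return np.stack([gx, gy], axis=-1)  # (rows, cols, 2) array of (x, y) vertices
```

Each quad cell of this mesh is one grid $v_k$ with four vertices; its matched counterpart in the other view can be obtained, e.g., by shifting the vertices by the estimated disparity.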

3.2 Image warping error

Following the related works [4,7,9], we consider the distortions for image retargeting from two aspects: shape distortion and edge distortion. The first measures the shape deformation between each original grid and its deformed counterpart, while the second measures the bending of the grid edges. In the following, we introduce these terms in detail.

1) Shape distortion: Following our previous work [9], we attempt to ensure that each grid undergoes a uniform similarity transformation, producing small deformation for visually important content. We define the shape distortion energy for the k-th grid $v_k$ as:

$$E(v_k)=\sum_{i=1}^{4}\left\|\rho_k(v_k^i)-\hat{v}_k^i\right\|^2 \tag{1}$$
where $\hat{v}_k^i$ is the deformed version of the original grid vertex $v_k^i$. Since only scale and translation are involved in the proposed content-aware retargeting, we formulate the similarity transformation as:

$$\rho_k(v_k^i)=\begin{bmatrix}s_x & 0\\ 0 & s_y\end{bmatrix}\begin{bmatrix}x_k^i\\ y_k^i\end{bmatrix}+\begin{bmatrix}t_x\\ t_y\end{bmatrix} \tag{2}$$

Taking the eight coordinates of each grid's four vertices into account, the similarity transformation can be easily solved via linear least-squares on $A_kP_k=\hat{b}_k$; the solution for the k-th grid is:

$$P_k=(A_k^{T}A_k)^{-1}A_k^{T}\hat{b}_k \tag{3}$$
where
$$A_k=\begin{bmatrix}x_k^1 & 0 & 1 & 0\\ 0 & y_k^1 & 0 & 1\\ \vdots & \vdots & \vdots & \vdots\\ x_k^4 & 0 & 1 & 0\\ 0 & y_k^4 & 0 & 1\end{bmatrix},\qquad \hat{b}_k=\begin{bmatrix}\hat{x}_k^1 & \hat{y}_k^1 & \cdots & \hat{x}_k^4 & \hat{y}_k^4\end{bmatrix}^{T}$$
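As a concrete illustration, the following sketch solves this per-grid least-squares fit; the function name and array layout are hypothetical.

```python
import numpy as np

def grid_similarity(v, v_hat):
    """Fit P_k = (s_x, s_y, t_x, t_y) for one grid from its original
    vertices v and deformed vertices v_hat, both of shape (4, 2)."""
    A = np.zeros((8, 4))
    A[0::2, 0] = v[:, 0]   # x rows: s_x * x + t_x
    A[0::2, 2] = 1.0
    A[1::2, 1] = v[:, 1]   # y rows: s_y * y + t_y
    A[1::2, 3] = 1.0
    b_hat = v_hat.reshape(-1)                      # [x^1, y^1, ..., x^4, y^4]
    P, *_ = np.linalg.lstsq(A, b_hat, rcond=None)  # P = (A^T A)^{-1} A^T b
    return P
```

With this fit, the shape distortion of Eq. (1) is simply the squared residual $\|A_kP_k-\hat{b}_k\|^2$.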

The final shape distortion for the left and right images is computed as:

$$E_{\text{Shape}}=\sum_{v_k^L\in V^L}S^L(v_k^L)\,E(v_k^L)+\sum_{v_k^R\in V^R}S^R(v_k^R)\,E(v_k^R) \tag{4}$$
where $E(v_k^L)$ and $E(v_k^R)$ are the energies of the k-th grids in the left and right images, respectively, and $S^L(v_k^L)$ and $S^R(v_k^R)$ are the average significance values of all pixels in the corresponding grids of the left and right images, respectively.

2) Edge distortion: Besides using the above shape distortion energy to preserve the important content, it is necessary to reduce the deformation of the remaining unimportant grids, e.g., by reducing the bending of the grid edges. We define the bending energy for each grid edge as:

$$E(\hat{v}_i,\hat{v}_j)=\left\|(\hat{v}_i-\hat{v}_j)-s_e(v_i-v_j)\right\|^2 \tag{5}$$

Using the same principle as the similarity transformation calculation for the shape distortion above, the scaling factor $s_e$ is computed from the coordinates of the deformed edge vertices. We do not repeat the computation of $s_e$ here; the reader can refer to [4] for details. Finally, the total edge distortion energy for the left and right images is obtained by:

$$E_{\text{Edge}}=\sum_{\langle\hat{v}_i^L,\hat{v}_j^L\rangle\in V^L}E(\hat{v}_i^L,\hat{v}_j^L)+\sum_{\langle\hat{v}_i^R,\hat{v}_j^R\rangle\in V^R}E(\hat{v}_i^R,\hat{v}_j^R) \tag{6}$$

By incorporating the shape distortion $E_{\text{Shape}}$ and the edge distortion $E_{\text{Edge}}$, the final image warping energy is computed as:

$$E_{\text{IW}}=E_{\text{Shape}}+\lambda_{\text{Edge}}E_{\text{Edge}} \tag{7}$$
where $\lambda_{\text{Edge}}$ is the weight of the edge distortion term. As a basic module for image retargeting, we make no further change to the image warping energy term. With this term alone we can already handle content retargeting for stereoscopic images; however, it omits the important depth adaptation, as well as the user's preference in selecting objects and their scaling factors. Therefore, as important retargeting semantics for controllable editing, we design additional depth control and object layout energy terms to fulfill the editing function, which is the core difference between the proposed StereoEditor and the existing models [4,7].

3.3 Depth control

The aim of the depth control energy is to constrain the depth range of the stereoscopic image so as to provide proper visual comfort and depth perception. For controllable stereoscopic display, the depth range for a given display device should be flexibly adjusted according to the image content, viewing conditions and user's preference. Therefore, given a pre-defined target depth range, we adjust the disparity of each vertex pair in the left and right views to match the viewing configuration. If we know, or the user specifies, the target retinal disparity limits (e.g., the negative and positive retinal disparity limits $\eta_1$ and $\eta_2$), the target depth range for a particular viewing configuration is determined as:

$$\hat{Z}_{\max}=\frac{d_eL_D}{d_e-\eta_1L_D},\qquad \hat{Z}_{\min}=\frac{d_eL_D}{d_e-\eta_2L_D} \tag{8}$$
where $L_D$ denotes the viewing distance from the viewer to the display, and $d_e$ is the interocular distance between the observer's eyes. We set $d_e = 65$ mm and $L_D = 800$ mm in this paper. Of course, users can adjust the viewing distance according to their viewing conditions and preference.
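For orientation, a minimal numeric sketch of Eq. (8); treating $\eta_1$ and $\eta_2$ as angles in degrees and converting them to radians for the product $\eta L_D$ is our assumption, as the paper does not state the unit.

```python
import numpy as np

def target_depth_range(eta1=-1.0, eta2=1.0, d_e=65.0, L_D=800.0):
    """Target depth range of Eq. (8); d_e and L_D in mm (paper's settings),
    eta1/eta2 the negative/positive retinal disparity limits in degrees."""
    e1, e2 = np.deg2rad(eta1), np.deg2rad(eta2)
    z_hat_max = d_e * L_D / (d_e - e1 * L_D)  # limit from eta_1
    z_hat_min = d_e * L_D / (d_e - e2 * L_D)  # limit from eta_2
    return z_hat_min, z_hat_max

# With the paper's settings this gives roughly 1019 mm and 659 mm, i.e.,
# one depth limit behind the screen plane and one in front of it.
```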

The relationship between the perceived depth $Z_p$ and the parallax $D_p$ is given by [27]:

$$Z_p=\frac{d_eL_D}{d_e-D_p} \tag{9}$$
where the parallax $D_p$ is derived from the pixel-wise disparity $d_p$ based on the width $W$ and horizontal resolution $R$ of the display, defined as follows:

$$D_p=\frac{Wd_p}{R}=\frac{W(x_p^R-x_p^L)}{R} \tag{10}$$

From Eqs. (9) and (10), the relationship between depth and disparity is nonlinear; that is, depth is more sensitive to changes of disparity, especially at positive disparities [2]. Therefore, we adjust the depth of each vertex in the left view and re-transform it to disparity. The depth transformation is formulated as:

$$\hat{Z}_i=\frac{\hat{Z}_{\max}-\hat{Z}_{\min}}{Z_{\max}-Z_{\min}}(Z_i-Z_{\min})+\hat{Z}_{\min}=s_zZ_i+t_z \tag{11}$$
where $[Z_{\min}, Z_{\max}]$ is the original depth range, and $s_z$ and $t_z$ are the scale and translation factors for depth adjustment. The re-transformation from depth to disparity is then expressed as:

$$\hat{d}_i=\frac{d_eR}{W}\left(1-\frac{L_D}{\hat{Z}_i}\right) \tag{12}$$
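The two mappings combine into a short routine; a sketch under assumed display parameters (the paper does not report $W$ or $R$).

```python
import numpy as np

def remap_depth_to_disparity(Z, src_range, tgt_range,
                             d_e=65.0, L_D=800.0, W=1430.0, R=3840):
    """Linearly remap per-vertex depths Z (mm) from src_range to tgt_range
    (Eq. (11)), then convert back to pixel disparity (Eq. (12)).
    W (display width, mm) and R (horizontal resolution, px) are assumed."""
    z_min, z_max = src_range
    zt_min, zt_max = tgt_range
    s_z = (zt_max - zt_min) / (z_max - z_min)  # scale factor s_z
    t_z = zt_min - s_z * z_min                 # translation factor t_z
    Z_hat = s_z * np.asarray(Z, dtype=float) + t_z
    return d_e * R / W * (1.0 - L_D / Z_hat)   # target disparities d_hat
```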

To ensure that the adjusted disparities of all vertices are consistent with the estimated target ones, the x-coordinate difference of each vertex pair $(v_i^L, v_i^R)\in V$ should be adjusted to match the target disparity. Thus, we have the horizontal disparity consistency term:

$$E_{\text{Hor}}=\sum_{(\hat{v}_i^L,\hat{v}_i^R)\in V}\left((\hat{v}_i^R[x]-\hat{v}_i^L[x])-\hat{d}_i\right)^2 \tag{13}$$
where the notation $[x]$ represents the x component of a vertex.

Research has also found that even limited levels of vertical disparity cause noticeable viewing discomfort [28]. Therefore, to avoid such unwanted vertical disparity, we add another constraint to ensure horizontal alignment after warping:

$$E_{\text{Ver}}=\sum_{(\hat{v}_i^L,\hat{v}_i^R)\in V}\left(\hat{v}_i^R[y]-\hat{v}_i^L[y]\right)^2 \tag{14}$$
where the notation $[y]$ denotes the y component of a vertex.

The final depth control energy term is computed as:

$$E_{\text{DC}}=E_{\text{Hor}}+E_{\text{Ver}} \tag{15}$$
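With the matched vertex pairs stored as arrays, Eqs. (13)-(15) reduce to two sums of squares; a minimal sketch (array layout assumed):

```python
import numpy as np

def depth_control_energy(VL_hat, VR_hat, d_hat):
    """E_DC of Eq. (15). VL_hat, VR_hat: (N, 2) deformed left/right vertex
    positions; d_hat: (N,) target disparities from Eq. (12)."""
    e_hor = np.sum((VR_hat[:, 0] - VL_hat[:, 0] - d_hat) ** 2)  # Eq. (13)
    e_ver = np.sum((VR_hat[:, 1] - VL_hat[:, 1]) ** 2)          # Eq. (14)
    return e_hor + e_ver
```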

3.4 Object layout

As retargeting is a widely used operation for content composition, the purpose of the object layout term is to allow users to change the size and depth of the focused objects while keeping sufficient aesthetics. To this end, we consider three criteria that strongly influence the final object layout: aesthetic layout, size layout and depth layout. Different from the two terms above, which are optimized over the whole image, the object layout term operates locally. To optimize the three energies in a unified framework, the object layout term is naturally integrated into the total warping framework: the image warping error and depth control terms are computed over the whole images, the object layout term only over the selected objects, and all three terms are optimized simultaneously. Thus, if the selected objects have small energy (small deformation), the energy of the regions outside the objects will be comparatively large, making a deformation tradeoff over the whole image.

1) Object aesthetic layout: Inspired by computational aesthetics [29], it is natural to keep sufficient aesthetics after retargeting, or even to improve the image aesthetics via retargeting. In this paper, we incorporate two common aesthetics rules into the retargeting framework, defined as follows:

The rule-of-thirds: This rule divides an image into nine equally sized parts by splitting the horizontal and vertical axes into three equal segments. The dividing lines define four intersection points, shown as the red points in Fig. 2. When an object is close to one of these points, the perceived aesthetics is comparatively good.

Fig. 2 Illustration of aesthetic criterion.

Visual balance: An image is visually balanced if the center of all salient regions is close to the image center.

Since our goal is an aesthetics component consistent with the above two rules, we change the location layout of the objects. Let $(x_{NW}, x_{NE}, x_{SE}, x_{SW})$ denote the power points of the rule of thirds (red points in Fig. 2) and $x_{\text{center}}$ the image center (blue point in Fig. 2). Based on the centroids of the selected object regions, the error for the rule-of-thirds is computed as the distance to the nearest power point, and the error for the visual balance as the distance to the image center. Thus, the total object aesthetic layout error is defined as:

$$E_{\text{Aes}}=\underbrace{\sum_k\left\|m_k-x_{\text{near}}\right\|^2}_{\text{Thirds}}+\underbrace{\sum_k\left\|m_k-x_{\text{center}}\right\|^2}_{\text{Balance}} \tag{16}$$
where $m_k$ is the centroid of the k-th object, $x_{\text{near}}$ is the nearest power point, and $\|\cdot\|_2$ denotes the Euclidean (L2) norm. It should be emphasized that this term only adjusts the horizontal and vertical coordinates of the objects; depth is not involved in the optimization, even though depth has been shown to be an important cue for the aesthetic experience [30], because the separate object depth layout term fulfills the depth adjustment. As demonstrated later, with this energy term alone the size of the objects is decreased to adapt to the target resolution; we therefore add the same scaling factor, related to the image scaling ratio, to all grids within the objects in Scheme-3 of subsection 4.2.
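Evaluating Eq. (16) from the object centroids is straightforward; a sketch with the power points and image center placed as in Fig. 2 (helper name assumed):

```python
import numpy as np

def aesthetic_energy(centroids, width, height):
    """E_Aes of Eq. (16): squared distance of each object centroid m_k to
    its nearest rule-of-thirds power point, plus to the image center."""
    power = np.array([[width / 3, height / 3], [2 * width / 3, height / 3],
                      [width / 3, 2 * height / 3], [2 * width / 3, 2 * height / 3]])
    center = np.array([width / 2, height / 2])
    energy = 0.0
    for m in np.atleast_2d(np.asarray(centroids, dtype=float)):
        energy += np.min(np.sum((power - m) ** 2, axis=1))  # Thirds term
        energy += np.sum((m - center) ** 2)                 # Balance term
    return energy
```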

2) Object size layout: To maintain the original size of a selected important object, or to enlarge it, the observer should be allowed to specify a scaling factor. Assuming the scaling factor applies along the horizontal edges (we only retarget along the horizontal axis) and is the same in the left and right views, the object size layout energy is computed as:

$$E_{\text{Size}}=\sum_{\hat{v}_i^L\in R}\left|(\hat{v}_i^L[x]-\hat{v}_{i+1}^L[x])-s'_x(v_i^L[x]-v_{i+1}^L[x])\right|+\sum_{\hat{v}_i^R\in R}\left|(\hat{v}_i^R[x]-\hat{v}_{i+1}^R[x])-s'_x(v_i^R[x]-v_{i+1}^R[x])\right| \tag{17}$$
Here, $\hat{v}_{i+1}^L$ ($\hat{v}_{i+1}^R$) denotes the horizontally adjacent vertex of $\hat{v}_i^L$ ($\hat{v}_i^R$), and $R$ is the selected object region. The choice of $s'_x$ is related to the image scaling ratio $\gamma$ between the retargeted and original images (only horizontal scaling is considered in this paper). If $s'_x>\gamma$, the object is enlarged after retargeting; if $s'_x<\gamma$, it is shrunk. In particular, the original object size is preserved with $s'_x=\gamma$.
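In array form, one view's half of Eq. (17) is a sum of absolute deviations between the deformed horizontal edge spans and their uniformly scaled originals; a sketch (array layout assumed):

```python
import numpy as np

def size_energy_one_view(spans_hat, spans_orig, s_x_prime):
    """Half of E_Size (Eq. (17)): spans_hat / spans_orig are (N,) arrays of
    deformed / original horizontal edge lengths inside the object region;
    s_x_prime is the user-specified object scaling factor."""
    return np.sum(np.abs(np.asarray(spans_hat)
                         - s_x_prime * np.asarray(spans_orig)))
```

For example, with the paper's setting $\gamma = 0.6$, passing s_x_prime = 0.6 keeps the object at its original size, while s_x_prime = 0.9 enlarges it relative to the shrunken background.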

3) Object depth layout: Similarly, our method allows the observer to specify a scaling factor for the perceived depth. First, the observer uses a scaling factor $s'_z$ to obtain the target depth for a selected important object:

$$\hat{Z}_j=s'_zZ_j \tag{18}$$

Then, converting the depth to the target disparity using Eq. (12), we obtain the object depth layout energy term:

$$E_{\text{Depth}}=\sum_{(\hat{v}_j^L,\hat{v}_j^R)\in R}\left((\hat{v}_j^R[x]-\hat{v}_j^L[x])-\hat{d}_j\right)^2 \tag{19}$$

It should be emphasized that with a large $s'_z$, the disparity range of the resulting retargeted image is decreased due to the inverse relationship between depth and disparity.

4) Total energy: The final object layout related energy term is computed as:

$$E_{\text{OL}}=E_{\text{Aes}}+\lambda_{\text{Size}}E_{\text{Size}}+\lambda_{\text{Depth}}E_{\text{Depth}} \tag{20}$$
where $\lambda_{\text{Size}}$ and $\lambda_{\text{Depth}}$ are the weights for the size and depth layout terms, respectively.

As important properties for controllable stereoscopic display, our StereoEditor allows users to specify a window to define an object, and to specify the scaling factors $s'_x$ and $s'_z$. As shown in Fig. 3, with two toys specified as the focused objects, different scaling factors $s'_x$ enlarge the selected objects while squeezing the regions outside them (first row). Different scaling factors $s'_z$ change the depth perception of the selected objects (second row). These results further indicate the effectiveness and necessity of the object layout energy designed in the proposed StereoEditor.

Fig. 3 Examples with different scaling factor settings.

3.5 Optimization

By combining the above image warping error, depth control and object layout energy terms, the final optimization for the grid warping is formulated as:

$$\min\left(E_{\text{IW}}+\lambda_{\text{DC}}E_{\text{DC}}+\lambda_{\text{OL}}E_{\text{OL}}\right) \tag{21}$$
where $\lambda_{\text{DC}}$ and $\lambda_{\text{OL}}$ are the weights for the last two terms. The first term constrains the grid warping distortion, the second constrains the depth range between the left and right views, and the last controls the coordinates and depth of the selected objects. Minimizing the total energy corresponds to solving a linear system $AP=b$, from which the set of deformed vertices $\hat{V}$ is found.
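Since each quadratic term contributes rows that are linear in the unknown vertex coordinates, the minimization can be assembled as one sparse over-determined system and solved in the least-squares sense. A schematic sketch; assembling the weighted residual rows from the individual energy terms is application-specific and omitted here.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import lsqr

def solve_warp(rows, cols, vals, b, n_vars):
    """Solve the stacked linear system A P = b of Eq. (21) in the
    least-squares sense. rows/cols/vals hold the sqrt-weighted residual
    rows contributed by each energy term; n_vars = 2 x |V| (x and y of
    every grid vertex in both views)."""
    A = csr_matrix((vals, (rows, cols)), shape=(len(b), n_vars))
    return lsqr(A, np.asarray(b, dtype=float))[0]  # flattened vertices V_hat
```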

4. Experimental results

In this section, we demonstrate several stereoscopic image editing results produced by the proposed method. Four stereoscopic images captured by ourselves are tested. Two of them (marked #1 and #2) have a relatively large depth range, while the others (marked #3 and #4) have a small depth range. All scenes contain two independent objects for the purpose of object layout.

To objectively evaluate the performance of our method against state-of-the-art stereoscopic image retargeting methods, we select four methods for comparison: the QoE-guided warping-based retargeting approach (QoEWAR) [9] (our previous work), the single-layer warping-based retargeting approach (SLWAR) [4], the visual attention guided stereoscopic seam carving approach (VASSC) [20] (our previous work), and the geometrically consistent stereoscopic seam carving approach (GCSSC) [18]. The first two are warping-based retargeting approaches, and the latter two are seam carving based approaches. Compared with our previous QoEWAR approach, the main innovations of the proposed approach are that much richer controllable semantics, from objects and scaling factors, are considered, and that the left and right views are simultaneously optimized in a unified framework; for the other three approaches, the parameters are not controllable for editing. In the experiments, all images are shrunk by 40% of the original width.

In the experiments, we set the interocular distance $d_e = 65$ mm, the viewing distance $L_D = 800$ mm, and the disparity limits $\eta_1 = -1$ and $\eta_2 = 1$. The parameters $L_D$ and $(\eta_1, \eta_2)$ are adjustable, and slight changes above or below these values have corresponding effects on the editing results. For the weights of the energy terms, we set $\lambda_{\text{Edge}} = 4$, $\lambda_{\text{Size}} = 0.8$, $\lambda_{\text{Depth}} = 1.6$, $\lambda_{\text{DC}} = 2$ and $\lambda_{\text{OL}} = 5$ according to the tradeoff in overall perception. For the key parameters $s'_x$ and $s'_z$, we set different values for the test images to better indicate their influence. We select two main objects in each image and assign their $s'_x$ values; all $s'_x$ values are not lower than 0.6 (the image scaling ratio), indicating object size preservation. We use a large $s'_z$ for the input stereoscopic images with large depth ranges, and a small value for those with small depth ranges.

4.1 Qualitative comparison with other methods

Figure 4 shows the retargeting results on the two source stereoscopic images (#1 and #2) with a large depth range. Even though these test images provide strong depth perception, visual comfort is usually poor due to the accommodation-convergence conflict. From the figures we find that, due to the depth-preserving nature of the seam carving based retargeting methods (VASSC and GCSSC), the retargeted images still retain the original depth range, and shape deformation occurs at edges due to irregular seam removal (e.g., the glass wall in #2). The QoEWAR and SLWAR methods perform well in shape preservation. In contrast, our method achieves a better tradeoff among shape preservation, depth control and object layout, producing more visually pleasing stereoscopic results. For the results on the two source stereoscopic images (#3 and #4) with a small depth range, shown in Fig. 5, similar conclusions hold; our method obviously enhances the depth sensation for those image pairs while delivering a better visual experience.

Fig. 4 Results on testing images #1 and #2. From top to bottom for each testing image: the retargeted stereoscopic images (shown as red-cyan anaglyphs), and the disparity maps (shown in pseudo-color).

Fig. 5 Results on testing images #3 and #4. From top to bottom for each testing image: the retargeted stereoscopic images (shown as red-cyan anaglyphs), and the disparity maps (shown in pseudo-color).

4.2 Contribution of each component

In addition, to better demonstrate the impact of each energy term in the proposed retargeting framework, we design the following five schemes for comparison:

Scheme-1: Only using image warping error term.

Scheme-2: Using image warping error and depth control terms.

Scheme-3: Using image warping error, depth control and object aesthetic layout terms.

Scheme-4: Using image warping error, depth control and object size layout terms.

Scheme-5: Using image warping error, depth control and object depth layout terms.

The retargeting results for these comparative schemes are shown in Fig. 6. Several important phenomena can be observed: 1) comparing Scheme-1 and Scheme-2 (without and with the depth control term), since the original disparity range is near the target retinal disparity limits, the difference between the two results is not remarkable; 2) comparing Scheme-3 and Scheme-2 (with and without the object aesthetic layout term), the layout of the selected objects is changed; also, the size of the objects is decreased due to the rule-of-thirds and visual balance constraints; 3) comparing Scheme-4 and Scheme-2 (with and without the object size layout term), by assigning a large $s'_x$ value ($s'_x=0.9$), the size of the objects is obviously enlarged; 4) comparing Scheme-5 and Scheme-2 (with and without the object depth layout term), by assigning a large $s'_z$ value ($1/s'_z=0.5$), the depth range of the objects is correspondingly reduced. Overall, by cooperating all these energy terms, the proposed method provides a more natural visual perception for users.

Fig. 6 Results on testing image #2 for different schemes.

4.3 Influence of the objects' scaling factor $s'_x$

From the perspective of content adaptation, if a retargeted stereoscopic image is displayed on a small screen, its objects' scaling factor should be reduced correspondingly. In our optimization framework, the scaling factor can be specified according to the scaling ratio. In Fig. 7, we set three different objects' scaling factors for each of four scaling ratios (20%, 30%, 40% and 50%). We can see that, by adding the scaling factor, the relation among the objects is slightly changed to satisfy the content layout. Also, with a smaller image width, the relative depth distance between the objects is increased. Thus, when displaying the retargeted stereoscopic images on different displays (represented here by different scaling ratios), the scaling factor should be carefully adjusted to provide better content adaptation.

Fig. 7 The results of changing the objects' scaling factor for shrinking the image width by 20%, 30%, 40% and 50%.

4.4 Subjective evaluation results

We also performed a user study to assess our algorithm. Twenty graduate students (9 female) participated in the study. We conducted the subjective experiment on a Samsung UA65F9000 65-inch Ultra HD 3D-LED TV with 3D shutter glasses. We performed paired comparisons between the retargeted results obtained by our method and those of each comparative method: QoEWAR [9], SLWAR [4], VASSC [20], and GCSSC [18]. Four stereoscopic image pairs were randomly chosen for this test. Each participant was asked to choose the preferred retargeted result according to the overall visual experience, performing 16 comparisons in each iteration. The user study results are reported in Table 1. Our method receives higher votes for all testing images when compared to VASSC and GCSSC. Except for #1 and #3, our method is also better than QoEWAR, which further confirms the strong preference for our method over the others.

Table 1. The user's preference results of subjective paired comparisons.

5. Conclusions

In this paper, we presented StereoEditor, a content-aware, depth-adaptive and object-selective stereoscopic image retargeting method for controllable stereoscopic display. The most important technical innovation of StereoEditor is that the image warping error, depth control and object layout energy terms are considered in a single optimization framework for stereoscopic editing, allowing users to edit controllable parameters: the retinal disparity limits, viewing distance, and objects' scaling factors. Our results demonstrate the effectiveness and power of the model. In future work, we expect to extend this framework to enable new applications, such as stereoscopic image stitching and disparity manipulation.

Funding

Natural Science Foundation of China (NSFC) (grant 61622109); Zhejiang Natural Science Foundation of China (grant R18F010008); K. C. Wong Magna Fund in Ningbo University.

References and links

1. F. Gou, H. Chen, M. C. Li, S. L. Lee, and S. T. Wu, “Submillisecond-response liquid crystal for high-resolution virtual reality displays,” Opt. Express 25(7), 7984–7997 (2017). [CrossRef]   [PubMed]  

2. R. F. Tong, Y. Zhang, and K. L. Cheng, “StereoPasting: Interactive composition in stereoscopic images,” IEEE Trans. Vis. Comput. Graph. 19(8), 1375–1385 (2013). [CrossRef]   [PubMed]  

3. W. Y. Lo, J. van Baar, C. Knaus, M. Zwicker, and M. H. Gross, “Stereoscopic three-dimensional copy & paste,” ACM Trans. Graph. 29(6), 147 (2010). [CrossRef]  

4. C. H. Chang, C. K. Liang, and Y. Y. Chuang, “Content-aware display adaptation and interactive editing for stereoscopic images,” IEEE Trans. Multimed. 13(4), 589–601 (2011). [CrossRef]  

5. M. Lang, A. Hornung, O. Wang, S. Poulakos, A. Smolic, and M. Gross, “Nonlinear disparity mapping for stereoscopic 3D,” ACM Trans. Graph. 29(4), 75 (2010). [CrossRef]  

6. S. Bhattacharya, R. Sukthankar, and M. Shah, “A holistic approach to aesthetic enhancement of photographs,” ACM Trans. Multimed. Comput. 7S(1), 21 (2011).

7. B. Li, L. Y. Duan, C. W. Lin, T. Huang, and W. Gao, “Depth-preserving warping for stereo image retargeting,” IEEE Trans. Image Process. 24(9), 2811–2826 (2015). [CrossRef]   [PubMed]  

8. S. S. Lin, C. H. Lin, Y. H. Kuo, and T. Y. Lee, “Consistent volumetric warping using floating boundaries for stereoscopic video retargeting,” IEEE Trans. Circ. Syst. Video Tech. 26(5), 801–813 (2016). [CrossRef]  

9. F. Shao, W. Lin, W. Lin, Q. Jiang, and G. Jiang, “QoE-guided warping for stereoscopic image retargeting,” IEEE Trans. Image Process. 26(10), 4790–4805 (2017). [CrossRef]   [PubMed]  

10. T. Yan, R. W. H. Lau, Y. Xu, and L. Huang, “Depth mapping for stereoscopic videos,” Int. J. Comput. Vis. 102(1–3), 293–307 (2013). [CrossRef]  

11. H. Park, H. Lee, and S. Sull, “Efficient viewer-centric depth adjustment based on virtual fronto-parallel planar projection in stereo 3D images,” IEEE Trans. Multimed. 16(2), 326–336 (2014). [CrossRef]  

12. L. Liu, R. Chen, L. Wolf, and D. Cohen-Or, “Optimizing photo composition,” Comput. Graph. Forum 29(2), 469–478 (2010). [CrossRef]  

13. X. Li, H. Zhao, H. Huang, L. Xiao, Z. Hu, and J. Shao, “Stereoscopic image recoloring,” J. Electron. Imaging 25(5), 053031 (2016). [CrossRef]  

14. S. P. Du, S. M. Hu, and R. R. Martin, “Changing perspective in stereoscopic images,” IEEE Trans. Vis. Comput. Graph. 19(8), 1288–1297 (2013). [CrossRef]   [PubMed]  

15. M. Sharma and B. Lall, “Content-aware seamless stereoscopic 3D compositing,” in Proc. 2014 Indian Conference on Computer Vision Graphics and Image Processing (2014), pp. 72.

16. S. J. Luo, Y. T. Sun, I. C. Shen, B. Y. Chen, and Y. Y. Chuang, “Geometrically consistent stereoscopic image editing using patch-based synthesis,” IEEE Trans. Vis. Comput. Graph. 21(1), 56–67 (2015). [CrossRef]   [PubMed]  

17. W. Yan, C. Hou, J. Lei, Y. Fang, Z. Gu, and N. Ling, “Stereoscopic image stitching based on a hybrid warping model,” IEEE Trans. Circ. Syst. Video Tech. 27(9), 1934–1946 (2017). [CrossRef]  

18. T. D. Basha, Y. Moses, and S. Avidan, “Stereo seam carving: A geometrically consistent approach,” IEEE Trans. Pattern Anal. Mach. Intell. 35(10), 2513–2525 (2013). [CrossRef]   [PubMed]  

19. J. Lei, M. Wu, C. Zhang, F. Wu, N. Ling, and C. Hou, “Depth-preserving stereo image retargeting based on pixel fusion,” IEEE Trans. Multimed. 19(7), 1442–1453 (2017). [CrossRef]  

20. F. Shao, W. Lin, W. Lin, G. Jiang, M. Yu, and R. Fu, “Stereoscopic visual attention guided seam carving for stereoscopic image retargeting,” J. Disp. Technol. 12(1), 22–30 (2016). [CrossRef]  

21. Y. Niu, F. Liu, W. C. Feng, and H. Jin, “Aesthetics-based stereoscopic photo cropping for heterogeneous displays,” IEEE Trans. Multimed. 14(3), 783–796 (2012). [CrossRef]  

22. J. Lei, M. Wang, B. Wang, K. Fan, and C. Hou, “Projection-based disparity control for toed-in multiview images,” Opt. Express 22(9), 11192–11204 (2014). [CrossRef]   [PubMed]  

23. J. Lei, C. Zhang, Y. Fang, Z. Gu, N. Ling, and C. Hou, “Depth sensation enhancement for multiple virtual view rendering,” IEEE Trans. Multimed. 17(4), 457–469 (2015). [CrossRef]  

24. F. Shao, Q. Jiang, R. Fu, M. Yu, and G. Jiang, “Optimizing visual comfort for stereoscopic 3D display based on color-plus-depth signals,” Opt. Express 24(11), 11640–11653 (2016). [CrossRef]   [PubMed]  

25. D. Sun, S. Roth, and M. J. Black, “Secrets of optical flow estimation and their principles,” in Proc. IEEE International Conference on Computer Vision and Pattern Recognition (2010), 2432–2439. [CrossRef]  

26. Q. Jiang, F. Shao, G. Jiang, M. Yu, Z. Peng, and C. Yu, “A depth perception and visual comfort guided computational model for stereoscopic 3D visual saliency,” Signal Process. Image Commun. 38, 57–69 (2015). [CrossRef]  

27. R. Cormack and R. Fox, “The computation of disparity and depth in stereograms,” Percept. Psychophys. 38(4), 375–380 (1985). [CrossRef]   [PubMed]  

28. F. L. Kooi and A. Toet, “Visual comfort of binocular and 3D displays,” Displays 25(2), 99–108 (2004). [CrossRef]  

29. F. L. Zhang, M. Wang, and S. M. Hu, “Aesthetic image enhancement by dependence aware object re-composition,” IEEE Trans. Multimed. 15(7), 1480–1490 (2013). [CrossRef]  

30. M. Ross, “The 3-D aesthetic: Avatar and hyperhaptic visuality,” Screen 53(4), 381–397 (2014). [CrossRef]  
