
Multi-frame super-resolution algorithm for complex motion patterns


Abstract

Multi-frame super-resolution algorithms offer resolution enhancement for sequences of images with sampling-limited resolution. However, classical approaches have been constrained by the accuracy of motion estimation, while nonlocal approaches that use implicit motion estimation have attained only modest resolution improvement. In this paper, we propose a new multi-frame optical flow based super-resolution algorithm, which provides significant resolution enhancement for image sequences containing complex motion. The algorithm uses the standard camera image formation model and a variational super-resolution formulation with an anisotropic smoothness term adapting to local image structures. The key elements enabling super-resolution of complex motion patterns are the computation of two-way optical flow between the images and the use of two corresponding uncertainty measures that approximate the optical flow interpolation error. Using the developed algorithm, we are able to demonstrate super-resolution of images for which optical flow estimation experiences near breakdown due to the complexity of the motion patterns and the large magnitudes of the displacements. In comparison, we show that for these images some conventional super-resolution approaches fail, while others, including a nonlocal super-resolution technique, produce distortions and provide lower (by 1–1.8 dB) image quality enhancement compared to the proposed algorithm.

©2013 Optical Society of America

1. Introduction

Multi-frame super-resolution algorithms have been a focus of image processing research for several decades [1–12]. These algorithms seek to produce a high resolution image by combining sampling-limited low resolution images of the same scene collected over a short time interval. Classical approaches rely on accurate subpixel motion estimation between the low resolution images, which continues to be a difficult problem for arbitrary motion patterns. As a result, successful super-resolution performance was initially demonstrated only in cases of simple global motion, e.g., uniform translations or rotations, and then in more general cases where the motion was described using global affine motion models [1,2]. Lately, some progress has been achieved in cases containing relatively simple local motion fields with one or two objects moving through the scene in a straightforward manner, with small relative displacements accumulated through the whole sequence [3–8]. Modern optical flow (OF) algorithms are capable of estimating such motion fields with the required subpixel accuracy. However, these model scenes are rarely representative of real life situations. The more complex motion patterns present in real life image sequences cause OF algorithms to experience significant problems, invalidating their underlying models. Uncertainty of motion estimation can be modeled using a Bayesian approach, leading to significant improvement in super-resolution. However, the errors arising from estimation of the most complex scene movements still produce distortions in the final high-resolution images [9].

The focus of this paper is on super-resolving images containing complex motion patterns that present a significant challenge to motion estimation techniques, despite the recent progress of OF algorithms. The most prevalent modern approach to motion estimation is to pose the problem in a variational form with a global smoothness constraint [13–16,18]. Some recent OF algorithms can accurately compute motion fields containing irregularities, e.g., discontinuities, occlusions, and brightness constancy violations. However, estimation of large displacements remains difficult because the solution obtained from local optimization is biased toward the initialization, which is usually a zero motion field. The coarse-to-fine approach adopted by modern OF algorithms somewhat alleviates this problem by first computing an estimate at the coarser scales and then refining it at finer scales. However, these algorithms tend to bias the motion of finer features toward the motion of the larger scale structures. Thus, motion patterns in which small structures move differently from larger scale structures, and motion patterns in which the relative motion of small scale structures is larger than their own scale, represent the most difficult problem for modern OF algorithms [15]. One category of image sequences where such motion arises is human motion, because relatively small body parts such as hands or legs move extremely fast relative to their own size. The most recent attempt to resolve nonconforming motion of different scale structures has been the addition of local descriptors such as SIFT and HOG features to the variational OF formulation [14,15]. Unfortunately, the interpolation error of such algorithms does not improve dramatically compared to optical flow algorithms that do not use descriptors [19]. Also, rich features generally rely on large pixel counts, diminishing the applicability of these optical flow algorithms to the low resolution images used in super-resolution processing.

Within the context of super-resolution, failure to estimate local motion details leads to lack of resolution, spurious image distortions, and reduced dynamic range of the resulting image. Super-resolution techniques that use implicit motion estimation via block matching have been developed recently and are generally free of the motion induced image processing artifacts inherent to classical algorithms [10–12]. They are able to provide image resolution enhancement to real life video sequences. However, the demonstrated resolution enhancement factor of nonlocal methods has generally been modest. Additionally, nonlocal techniques experience block matching difficulties with large displacements, rotational motion, and blurred edges.

In this paper, we present a novel super-resolution algorithm that follows a variational approach and addresses the problems of motion estimation errors. A two-way optical flow computation between the reference image and other images in the sequence is employed to reduce errors due to occlusions. Other motion estimation errors are accounted for by implementation of a corresponding set of two-way weights that represent uncertainty measures designed to approximate the interpolation error of the OF algorithm. The presented super-resolution framework can be implemented with any optical flow algorithm; however, its performance will depend on the accuracy of that particular algorithm.

The remainder of this paper is organized as follows. In Section 2, we present the super-resolution problem in a classical variational setting including a novel approach to warping operator estimation, which together comprise the Double Uncertainty Double Estimation (DUDE) super-resolution algorithm. In Section 3, we demonstrate the results of the DUDE algorithm applied to human motion sequences and show that the proposed algorithm outperforms both classical and nonlocal techniques. Finally, Section 4 contains concluding remarks.

2. Variational super-resolution

2.1 Inversion problem for image formation

Super-resolution reconstruction is an inverse problem, the goal of which is to recreate a higher resolution image from the sequence of low resolution images. The inversion is performed with respect to the camera image formation process. This process can be modeled in system matrix notation as,

$$I_L^n = D B W_n I_H + e_n, \qquad n = 1 \dots N, \tag{1}$$
where $\{I_L^n\}_{n=1}^N$ is the set of low resolution images, $I_H$ is the reconstructed high resolution image, $D$ is a downsampling operator, $B$ is a blurring matrix, $W_n$ is the warping matrix that describes scene motion, and $e_n$ is a Gaussian noise vector. Here the blurring and decimation operators are assumed to be constant throughout the image collection process. Additionally, the blur kernel is assumed to be a constant Gaussian across the entire image.
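To make the operator chain of Eq. (1) concrete, the forward model can be sketched as the composition of a warp, a Gaussian blur, and a decimation. The following is a minimal sketch under the assumptions stated above (constant Gaussian blur, integer decimation); the function and parameter names are ours, not part of the original work.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def form_low_res(I_H, flow_x, flow_y, sigma=2.0, factor=4):
    """Apply W_n (warp along the flow field), B (Gaussian blur), and
    D (decimation) to the high-resolution image I_H, per Eq. (1)."""
    ys, xs = np.mgrid[0:I_H.shape[0], 0:I_H.shape[1]].astype(float)
    warped = map_coordinates(I_H, [ys + flow_y, xs + flow_x], order=1)  # W_n
    blurred = gaussian_filter(warped, sigma)                            # B
    return blurred[::factor, ::factor]                                  # D
```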

The common variational approach is to minimize the difference between the observed images $\{I_L^n\}_{n=1}^N$ and the warped, blurred, and decimated version of the estimated high resolution image $I_H$, while also enforcing a global smoothness constraint through the addition of a regularization term [1–9],

$$E(I_H) = \sum_{n=1}^{N} \left\| D B W_n I_H - I_L^n \right\| + \lambda\, \mathcal{R}(I_H), \tag{2}$$
where $\lambda$ is a regularization weight and $\mathcal{R}$ is the regularization term. The norm $\|\cdot\|$ in Eq. (2) is typically taken to be the L1 or L2 norm. Our experiments have shown that, when using the scheme presented in this paper, the L2 norm in the formulation of the data term in the functional (2) provides faster convergence and better image quality of the reconstructed image. The most frequently used regularization term penalizes some norm of the image gradient [1–4,7–9], but sharper edges and faster convergence can be achieved using an anisotropic smoothness term developed in [16,17] for use in optical flow, which adapts the smoothing direction to local image structures. This regularization term is given by,
$$\mathcal{R}(I_H) = \Psi\!\left( \left( \nu_1^t \nabla I_H \right)^2 \right) + \left( \nu_2^t \nabla I_H \right)^2, \tag{3}$$
where $\Psi(s^2) = 2\mu^2 \left( 1 + s^2/\mu^2 \right)^{0.5}$ is the Charbonnier function with a fixed contrast parameter $\mu$, and $\nu_{1,2}$ are the two orthonormal eigenvectors of the image structure tensor $J$. The image structure tensor is given by,
$$J = G_{\rho_1} * \left[ \nabla\!\left( G_{\rho_2} * I \right) \left( \nabla\!\left( G_{\rho_2} * I \right) \right)^t \right], \tag{4}$$
where $*$ is the convolution operator, $t$ denotes vector transposition, $G_\rho$ is a Gaussian kernel, and $\rho_{1,2}$ represent the neighborhood and image smoothing scales, respectively.
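For reference, the structure tensor and its per-pixel eigenvectors can be computed as sketched below. This is an illustrative reading of Eq. (4), with our own function names and default scales taken from the values later reported in Section 3.1.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def structure_tensor_directions(I, rho1=0.7, rho2=0.7):
    """Eigenvectors nu_1 (across structures) and nu_2 (along structures)
    of the structure tensor J in Eq. (4), computed per pixel."""
    I_s = gaussian_filter(I, rho2)          # G_rho2 * I
    Iy, Ix = np.gradient(I_s)               # gradient of the smoothed image
    # G_rho1 * (outer product of the gradient), assembled as 2x2 per pixel
    Jxx = gaussian_filter(Ix * Ix, rho1)
    Jxy = gaussian_filter(Ix * Iy, rho1)
    Jyy = gaussian_filter(Iy * Iy, rho1)
    J = np.stack([Jxx, Jxy, Jxy, Jyy], axis=-1).reshape(I.shape + (2, 2))
    w, v = np.linalg.eigh(J)                # eigenvalues in ascending order
    return v[..., :, 1], v[..., :, 0]       # nu_1 (dominant), nu_2
```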

The steepest descent iterations for the minimization problem (2) are given by [16,17],

$$I_H^{(k+1)} = I_H^{(k)} + \Delta t \left( \sum_{n=1}^{N} W_n^T B^T D^T \left( I_L^n - D B W_n I_H^{(k)} \right) + \lambda\, \mathrm{div}\!\left( \left[ \Psi'\!\left( \left( \nu_1^t \nabla I_H^{(k)} \right)^2 \right) \nu_1 \nu_1^t + \nu_2 \nu_2^t \right] \nabla I_H^{(k)} \right) \right), \tag{5}$$
where $T$ denotes the adjoint operator: for the decimation operator, $D^T$ corresponds to upsampling without interpolation, while for the blurring operator $B^T = B$ due to the symmetry of the Gaussian kernel. The warping operators $W_n^T$ and $W_n$ represent forward and backward motion compensation, which requires subpixel precision motion estimation between the frames. In practice, the warping operation can be performed by interpolating the image onto a grid distorted according to the estimated optical flow field.
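The two non-warping adjoints admit a particularly simple sketch, assuming the Gaussian blur and integer decimation stated earlier (names are ours):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def D_T(I_L, factor=4):
    """Adjoint of decimation: upsampling without interpolation,
    i.e., zero insertion on the high-resolution grid."""
    I_up = np.zeros((I_L.shape[0] * factor, I_L.shape[1] * factor))
    I_up[::factor, ::factor] = I_L
    return I_up

def B_T(I, sigma=2.0):
    """Adjoint of the blur: B^T = B for a symmetric Gaussian kernel."""
    return gaussian_filter(I, sigma)
```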

The accuracy of the warping operators defines the resolution gain attainable by super-resolution processing. Motion compensation is the weakest link of the SR algorithm presented by Eq. (5). The motion compensation error is most closely related to the interpolation error of optical flow [19], which can be used to illustrate the challenge of obtaining an accurate motion compensated image. To date, the interpolated images in the Middlebury database obtained by all of the state-of-the-art optical flow algorithms contain significant image artifacts spanning several pixels in the image regions where the most complex motion is present (for example, the “Backyard” and “Basketball” image pairs) [19]. In contrast to these interpolation results, super-resolution processing requires subpixel accuracy to be able to enhance the spatial resolution of low-resolution images. The motion compensation problem for super-resolution is further exacerbated compared to the optical flow interpolation problem, since it is necessary to compute optical flow not just between two frames that are relatively close to each other in time but between all the frames in the sequence. This results in accumulation of even more complex motion patterns and hence even larger errors. As a consequence, functional minimization based on Eq. (2) tends to converge to erroneous local minima without resolving image regions containing motion and, as will be shown in the next section, the iterations of Eq. (5) actually diverge in the Peak Signal-to-Noise Ratio (PSNR) sense.

A natural way to stabilize the SR iterations is to introduce a weighted norm formulation of Eq. (2),

$$E(I_H) = \sum_{n=1}^{N} \left( D B W_n I_H - I_L^n \right)^T U_n \left( D B W_n I_H - I_L^n \right) + \lambda\, \mathcal{R}(I_H), \tag{6}$$
where, in the most general sense, the weights $U_n$ represent confidence or, conversely, uncertainty of the estimation of the $DBW_n$ image formation operator. As pointed out above, the accuracy of the combined $DBW_n$ operator is generally limited by the accuracy of the warping operator $W_n$. Therefore, the weights should be determined primarily by our confidence in the estimation of this operator. Previously, the weights in Eq. (6) have not been defined for OF based warping involving a pixel-by-pixel approach. Instead, the weights have been used in region based approaches to approximate registration error in image sequences with motion limited to global shifts [5] and affine models [6,7].

To achieve the high fidelity warping necessary to super-resolve image sequences with more complex motion patterns, two modifications to Eqs. (5) and (6) are proposed. The first modification is a warping error reduction step that includes separate estimation of the forward and adjoint warping operators $W_n$ and $W_n^T$. Indeed, for relatively simple motion patterns, OF between the frames can be considered symmetric and, therefore, forward warping can be obtained from the backward warping simply by changing the sign of the underlying OF field $(u_x, u_y)$: $W_n(u_x, u_y) = W_n^T(-u_x, -u_y)$. This approximation has dominated super-resolution algorithms, regardless of the motion patterns under consideration [1–9]. For complex motion fields, however, this symmetry does not hold, mainly due to occlusions but also due to slight asymmetry of the estimation algorithms. Thus, it becomes necessary to directly compute estimates of both the forward and backward flow field. The computation of both forward and reverse OF estimates is referred to as double estimation; a simple check of the symmetry assumption is sketched below.
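The following sketch quantifies how far a pair of flow fields departs from the symmetry assumption. It is a standard forward-backward consistency check, offered here as an illustration of the asymmetry that double estimation addresses, not as a component of the paper's algorithm; all names are ours.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def flow_asymmetry(fwd_x, fwd_y, bwd_x, bwd_y):
    """Per-pixel magnitude of u_fwd(r) + u_bwd(r + u_fwd(r)); this is
    zero under perfect symmetry and large near occlusions."""
    ys, xs = np.mgrid[0:fwd_x.shape[0], 0:fwd_x.shape[1]].astype(float)
    coords = [ys + fwd_y, xs + fwd_x]           # positions r + u_fwd(r)
    bx = map_coordinates(bwd_x, coords, order=1)
    by = map_coordinates(bwd_y, coords, order=1)
    return np.hypot(fwd_x + bx, fwd_y + by)
```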

Generally, there are two ways to accomplish motion estimation in a sequence of low resolution images. The first is to compute OF between pairs of consecutive images, and the second is to compute OF between each image and a reference image. In the first approach, OF between frames distant in time is computed by simple vector addition of the motion fields estimated for consecutive image pairs inside that time interval. Ideally, each such OF calculation attains higher accuracy compared to the second approach because of the smaller displacements and simpler motion patterns. In practice, this advantage can be offset in the second approach by initializing each OF field with the motion field already estimated for the neighboring frame. On the other hand, the first approach guarantees that OF error accumulates far beyond the subpixel level and precludes any viable tracing of OF error propagation. In the proposed algorithm, the second approach, which allows for OF error control, is used.

The second modification is an assignment of two separate confidence weights $U_n^F$ and $U_n^B$ to each of the warping operators $W_n$ and $W_n^T$. The weights must act after each warping operation; otherwise the confidence is assigned to the coordinates from which the pixels are then shifted. As a result, the system of two separate confidence measures does not conform to the formulation of Eq. (6) and instead is introduced directly into Eq. (5). The proposed implementation of these two separate warping operator weights is based on a pixel-by-pixel approach using OF uncertainty measures and is referred to as double uncertainty. The final form of the super-resolution iterations in the DUDE algorithm is then,

$$I_H^{(k+1)} = I_H^{(k)} + \Delta t \left( \sum_{n=1}^{N} U_{n \to ref}\, W_{n \to ref}\, B\, D^T U_{ref \to n} \left( I_L^n - D B W_{ref \to n} I_H^{(k)} \right) + \lambda\, \mathrm{div}\!\left( \left[ \Psi'\!\left( \left( \nu_1^t \nabla I_H^{(k)} \right)^2 \right) \nu_1 \nu_1^t + \nu_2 \nu_2^t \right] \nabla I_H^{(k)} \right) \right), \tag{7}$$
where the indices $n \to ref$ and $ref \to n$ denote image warping toward the reference image and vice versa. The resulting minimization procedure puts heavier emphasis (larger weights) on the pixels with easily estimated motion consisting of short translations. Such pixel values are established after a relatively small number of iterations. The values of pixels with small weights, indicating potential motion compensation errors, are determined after a larger number of iterations. These necessary extra iterations set up mild instabilities along the image edges with larger weights, resulting in ripples along those edges. Artifacts created by this procedure can be removed using sparse filtering. The Block Matching and 3-D filtering (BM3D) algorithm is used due to its ability to preserve edge sharpness [20]. It decomposes the image into patches, which are grouped by block matching. Blocks with similar content, as defined by the L2 norm distance between their pixels, are arranged into 3-D arrays. The filtering procedure involves a 3-D transform, represented by a separable composition of a 2-D discrete cosine transform for every two-dimensional block and a 1-D Haar transform in the third array dimension spanning the group of blocks. The sparsity of the signal is enforced by 1-D transform-domain spectral shrinkage attenuating the noise. The inverse 3-D transform produces estimates for all the blocks, which are then placed back in their original positions.
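To fix ideas, one gradient step on the data term of Eq. (7) for a single frame can be sketched as follows. The anisotropic regularization term is omitted for brevity, the operator implementations follow the earlier sketches, and all names are our own rather than from the original implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def warp(I, flow_x, flow_y):
    ys, xs = np.mgrid[0:I.shape[0], 0:I.shape[1]].astype(float)
    return map_coordinates(I, [ys + flow_y, xs + flow_x], order=1)

def dude_data_step(I_H, I_L_n, flow_rn, flow_nr, U_rn, U_nr,
                   dt=0.1, sigma=2.0, factor=4):
    """One step on the data term of Eq. (7) for frame n. U_rn is the
    low-resolution confidence map for ref->n warping; U_nr is the
    high-resolution map for n->ref warping."""
    # D B W_{ref->n} I_H: simulate frame n from the current estimate
    pred = gaussian_filter(warp(I_H, *flow_rn), sigma)[::factor, ::factor]
    resid = U_rn * (I_L_n - pred)               # weighted by U_{ref->n}
    up = np.zeros_like(I_H)
    up[::factor, ::factor] = resid              # D^T: zero-fill upsampling
    back = warp(gaussian_filter(up, sigma), *flow_nr)  # B^T, then W_{n->ref}
    return I_H + dt * U_nr * back               # weighted by U_{n->ref}
```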

2.2 OF algorithms

The presented DUDE algorithm is universal with respect to the chosen OF estimation algorithm. However, the choice of OF algorithm defines the quality of the super-resolved image. In this paper, four algorithms from the top of the Middlebury database rankings with respect to interpolation error, namely primal-dual TVL1 OF (TVL1) [7], warping based OF (WB) [18], classical nonlocal OF (CNL) [13], and motion detail preserving OF (MDP) [14], are used to demonstrate SR processing. Because the purpose of this work is not an investigation of OF algorithms but the novel super-resolution framework that exploits them, these algorithms as well as their underlying models are described briefly; their full details can be found in [7,13,14,18].

Given a pair of consecutive images, the reference $I_1$ and the target $I_2$, the OF field $u = (u_x, u_y)$ between these two images is defined by the image intensity constancy assumption $I_1(r) = I_2(r + u)$. Most OF algorithms, including the four used in this paper, minimize some variation of a functional based on the constrained image constancy assumption,

$$E_{optfl}(u) = (1 - \gamma_1)\, \Psi_z\!\left( I_1(r) - I_2(r + u) \right) + \gamma_2\, \Psi_z\!\left( \nabla I_1(r) - \nabla I_2(r + u) \right) + \alpha\, \mathcal{R}_{optfl}(I_1, I_2, u), \tag{8}$$
where $\Psi_z$ is the Charbonnier penalty function with power $z = 0.5$ for the TVL1, WB, and MDP methods and $z = 0.45$ for the CNL method; $\gamma_{1,2}$ are the relative weights of the image intensity and image intensity gradient constancy terms, respectively; $\alpha$ is a regularization weight; and $\mathcal{R}_{optfl}$ is the regularization term. The image gradient constancy assumption is not used in the TVL1 and CNL algorithms ($\gamma_1 = 0$, $\gamma_2 = 0$); $\gamma_1 = 0$ and $\gamma_2$ is a free parameter in the WB algorithm; $\gamma_1 = \gamma_2$ and its value is iteratively estimated using a mean field approximation with a free temperature parameter in the MDP algorithm. The TVL1, MDP, and CNL algorithms consider linearized versions of the data term(s) in Eq. (8), while the WB algorithm uses nonlinearized intensity and intensity gradient constancy assumptions. The regularization term in all four algorithms is based on total variation (TV) flow smoothness,
$$\mathcal{R}_{optfl} = \omega\, \| \nabla u \|_{L1}, \tag{9}$$
however, the WB, CNL, and MDP algorithms employ an L1 norm approximation by the Charbonnier penalty function, while TVL1 is able to avoid this approximation through its primal-dual solver. The parameter $\omega = \exp(-|\nabla I_1|^{0.8})$ represents a structure adaptive map accounting for discontinuities along the motion boundaries in the MDP method, while $\omega = 1$ for the rest of the techniques. A more complex approach to OF discontinuities in the CNL algorithm adds nonlocal filtering to the regularization,
$$\mathcal{R}_{optfl}^{NL} = |u - \hat{u}|^2 + \sum_{r} \sum_{r' \in N_r} \exp\!\left[ -\frac{|r - r'|^2}{2\sigma_1^2} - \frac{\left( I(r) - I(r') \right)^2}{2\sigma_2^2} \right] \frac{o(r')}{o(r)}\, \left| \hat{u}(r) - \hat{u}(r') \right|, \tag{10}$$
where the first term couples the auxiliary flow $\hat{u}$ to the real flow $u$, and the second term performs median filtering over the pixel's neighborhood $N_r$ with weights defined according to the spatial distance scaled by $\sigma_1$, the intensity value difference scaled by $\sigma_2$, and the occlusion states $o(r)$ and $o(r')$ of the pixels $r$ and $r'$, respectively. All approaches use a coarse-to-fine warping technique to escape trapping in local minima and to increase tolerance to large displacements.
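The Charbonnier penalty that recurs throughout these models has a one-line form. The sketch below assumes the same expression given after Eq. (3), generalized to the power $z$ named above; it is an illustration with our own naming, not code from any of the cited implementations.

```python
import numpy as np

def charbonnier(s, mu=0.2, z=0.5):
    """Psi_z(s^2) = 2 mu^2 (1 + s^2/mu^2)^z; z = 0.5 approximates the
    L1 norm (TVL1, WB, MDP), while CNL uses z = 0.45."""
    return 2 * mu**2 * (1 + (s / mu) ** 2) ** z
```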

The WB algorithm was implemented according to the original recipe of [18], in which the nonlinear Euler-Lagrange equations corresponding to Eq. (8) are solved using two nested fixed point iteration loops resulting from two different linearization procedures and a fully nonlinear diffusion scheme discretization. Public implementations of the CNL, TVL1, and MDP algorithms were downloaded from the authors' websites. The WB, CNL, and TVL1 algorithms were modified to accept nonzero initialization of the flow fields, because OF estimation errors due to large displacements between the images in the sequence and the reference image can be reduced by initializing the OF field with the motion field already estimated for the neighboring frame (see Section 2.1). Alternatively, the MDP algorithm provides extended flow initialization using SIFT feature detection and matching [14,21].

2.3 Confidence measure of warping operator

Confidence weights $U$ in Eq. (7) can be approximated by a measure of OF interpolation error. The subtle difference between the two concepts lies in the warping procedure. In super-resolution, only the target image is used to perform the warping, while in the OF interpolation error definition, both the reference and the target images are used to perform the warping [19]. Use of a single image in super-resolution warping does not provide effective handling of occlusions, which means slight underestimation of the warping operator error by an OF interpolation error measure.

Two measures have been shown to describe OF interpolation error accurately [22]. The first measure is an uncertainty measure based on the difference of the reference image $I_1$ and the motion compensated version of the target image $I_2$,

$$\varphi = \left| I_1(r) - I_2(r + u) \right|. \tag{11}$$

The second measure is also an uncertainty measure and represents the final value of the functional $E_{optfl}$ after minimization [23]. The first measure is more precise in describing the highest error pixels. Indeed, it contains no explicit smoothing part, which in the functional based measure contributes strongly to the problematic pixel values by reducing the functional energy without necessarily reducing the interpolation error. However, for the same reason that it lacks the explicit smoothing part, the warping based measure is noisy. The noise in the uncertainty measure tends to propagate into noise in the super-resolved image. To avoid that, a Gaussian smoothing filter can be applied. To fit the definition in Eq. (7), the uncertainty measure $\varphi$ needs to be inverted into a confidence measure with a normalized dynamic range,

$$U = G_\rho * \exp\left( -\varphi / a \right), \tag{12}$$
where $a$ represents a scaling factor that depends on the dynamic range of the image.
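Put together, Eqs. (11) and (12) amount to only a few lines. The sketch below uses the scaling factor a = 6.75 reported in Section 3.1; the smoothing scale rho is our assumption, since the paper does not state the width of this particular Gaussian filter, and the names are ours.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def confidence_map(I_ref, I_tgt, flow_x, flow_y, a=6.75, rho=1.0):
    """phi of Eq. (11) followed by the inversion of Eq. (12):
    U = G_rho * exp(-phi / a)."""
    ys, xs = np.mgrid[0:I_ref.shape[0], 0:I_ref.shape[1]].astype(float)
    I_w = map_coordinates(I_tgt, [ys + flow_y, xs + flow_x], order=1)
    phi = np.abs(I_ref - I_w)                      # Eq. (11)
    return gaussian_filter(np.exp(-phi / a), rho)  # Eq. (12)
```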

3. Numerical experiments

3.1 Test image sequences and algorithm parameters

The low-resolution test images for super-resolution processing are generated using the human motion image sequences from the Middlebury database, namely the 8-frame “Walking” sequence (Fig. 1 and Media 1), the 8-frame “Basketball” sequence (Fig. 2 and Media 2), and the less challenging 10-frame “Foreman” sequence (Fig. 3 and Media 3). All images are in 8-bit gray scale format. The three sequences are blurred using a Gaussian kernel with standard deviation equal to 2 and then decimated by a factor of 4. The fourth frame of the “Walking” and “Basketball” sequences and the fifth frame of the “Foreman” sequence are chosen as reference frames. The baseline for the comparison is simple pixel replication and Lanczos interpolation of the low-resolution data. We compare the performance of the DUDE algorithm, the Nonlocal means super-resolution (NLS) algorithm [11], which is known for its ability to handle a wide variety of motion patterns, and a classical formulation of super-resolution based on OF (C-SR) [7], which is representative of a broad class of super-resolution algorithms.
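For reproducibility, the degradation recipe above reads directly as code. The following minimal sketch assumes only what is stated in the text (Gaussian blur with sigma = 2, decimation by 4); the names are ours.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def degrade(frame, sigma=2.0, factor=4):
    """Blur with a Gaussian kernel and decimate, per the recipe above."""
    return gaussian_filter(frame.astype(float), sigma)[::factor, ::factor]

# low_res = [degrade(f) for f in frames]   # `frames`: the 8-bit gray inputs
```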


Fig. 1 The “Walking” image sequence (Media 1). Image courtesy of http://vision.middlebury.edu/flow/.


Fig. 2 The “Basketball” image sequence (Media 2). Image courtesy of http://vision.middlebury.edu/flow/.


Fig. 3 The “Foreman” image sequence (Media 3).


The NLS algorithm was implemented according to [11]. The NLS similarity block size of 7 pixels was found to be optimal for all three sequences. The search area for the similar blocks was extended to the full image sequence in each case. The weight moderating parameter was set to σ = 5.9, which was the minimum value that allowed for population of reference image similarity groups with more than one member. A single iteration was found to be optimal in terms of the final image quality. The original version of the NLS algorithm allows a choice of deblurring method. In this paper, the state-of-the-art Deblurring BM3D (DEBBM3D) method is used with the assumption of a known blur kernel and a noise variance of 15/255 [24].

A classical formulation of super-resolution, C-SR, was implemented according to [7]. The basis of the algorithm is represented by Eq. (2) using the L1 and L2 norms. The regularization term was given as the L1 norm (TV approach) and the L2 norm of the flow gradient, respectively. Motion estimation was computed between neighboring image pairs, and the optical flow field between each frame and the reference frame was then computed by vector addition.

The parameters for the four OF algorithms were established by minimizing the interpolation error between the fourth (reference) and fifth frames of the low-resolution “Walking” image sequence. These parameters were kept constant for each of the examples. In the TVL1 algorithm, the smoothness parameter was set to $\alpha = 50$ and the coarse-to-fine rescaling factor was set to 0.9. In the CNL algorithm, the smoothness parameter was set to $\alpha = 3$ and the relative scales were set to $\sigma_1 = \sigma_2 = 7$. In the MDP algorithm, the smoothness parameter was set to $\alpha = 6$. In the WB algorithm, the smoothness parameter was set to $\alpha = 20$, the gradient constancy weight was set to $\gamma_2 = 3$, and the coarse-to-fine rescaling factor was set to 0.95. Cubic spline interpolation was employed to upsample the OF fields obtained from the low-resolution images for use in the super-resolution iterations described in Eq. (7). The warping operation was performed using bilinear interpolation.
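The flow upsampling step can be sketched as below. That the vector magnitudes are rescaled by the zoom factor when moving from low-resolution to high-resolution pixel units is our reading of the procedure, not a detail stated in the text; the names are ours.

```python
import numpy as np
from scipy.ndimage import zoom

def upsample_flow(u_x, u_y, factor=4):
    """Cubic-spline (order=3) interpolation of a low-resolution flow field
    onto the high-resolution grid, with magnitudes rescaled to
    high-resolution pixel units."""
    return (zoom(u_x, factor, order=3) * factor,
            zoom(u_y, factor, order=3) * factor)
```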

Parameters for the DUDE algorithm were fixed for all image sequences at the following values: step size $\Delta t = 0.1$, regularization weight $\lambda = 0.1$, contrast parameter $\mu = 0.2$, and image structure scales $\rho_1 = \rho_2 = 0.7$. The warping based OF measure with scaling factor $a = 6.75$ was employed. The BM3D filtering noise variance was set to 20.

3.2 Experimental results

The “Walking” sequence represents a moderately difficult scene for super-resolution, due to the occlusion of the doorway behind the walking man’s right hand, the occlusion of his left leg, and the motion of his left hand’s fingers, which does not conform to the motion of the left hand itself. The maximum pixel displacement from the reference frame to the most distant frame in time is approximately 6 pixels. The maximum displacement is accurately estimated by all of the OF algorithms used in this paper. Analysis of the warping errors that affect super-resolution processing is provided in Fig. 4 and Table 1. Figure 4 demonstrates four image warping test cases. Each subfigure consists of the gray scale image (center) obtained by warping frames using the estimated TVL1 OF, the corresponding color coded estimated TVL1 OF field (upper right), and the corresponding TVL1 OF interpolation error uncertainty measure $\varphi$ (lower right). Ideally, the warped images in Figs. 4(a) and 4(b) would reproduce the low-resolution reference frame 4, and the warped images in Figs. 4(c) and 4(d) would reproduce the low-resolution frame 8 of the “Walking” sequence. Table 1 provides OF uncertainty measure values per pixel for the TVL1 OF cases presented in Fig. 4, as well as for the rest of the OF algorithms. It is important to note that the presentation of the data in Table 1 is not aimed at comparison of OF algorithms; rather, its goal is to demonstrate broad trends of warping operator performance that led to the DUDE algorithm design.


Fig. 4 The “Walking” image sequence experiments with the TVL1 OF algorithm. Warped image (center) and the warping error measure relative to 8-bit gray scale levels (lower right), obtained using OF estimation (upper right) between (a) frames 4 and 5; (b) frames 4 and 8; (c) frames 8 and 4; (d) using the negative of the OF estimation between frames 4 and 8. OF is color coded such that hue indicates flow direction according to the presented scheme and saturation indicates flow magnitude.


Table 1. Warping error per pixel

Comparison of Figs. 4(a) and 4(b) and the values in the first and second columns of Table 1, which correspond to warping frames 5 and 8 toward the reference frame 4, shows a dramatic increase of interpolation error with increased displacement values and rising complexity of the motion pattern. The warped image in Fig. 4(b) suggests that OF estimation starts to break down in the image regions containing motion. The same situation can be observed in Fig. 4(c) for the image obtained with the inverse warping of the reference frame 4 toward frame 8. If the backward OF field is estimated explicitly, as suggested by the proposed DUDE algorithm in Eq. (7), then the interpolation errors of forward and backward warping are naturally similar (compare the motion compensation errors in the second and third columns of Table 1). If, instead of explicitly estimating the two-way OF, the negative forward OF field is used for backward warping of the reference frame 4 toward frame 8, then the interpolation error significantly increases. This can be observed by comparing the motion compensation errors in Figs. 4(c) and 4(d), as well as the third and fourth columns of Table 1. The penalty of failing to estimate the backward OF is a complete breakdown of the motion compensated image in Fig. 4(d). The reason for such a breakdown is the significant difference between the backward OF in Fig. 4(c) and the negative forward OF in Fig. 4(d).

The results of super-resolving the “Walking” sequence by a factor of four using the C-SR (L1 and L2) and NLS algorithms, as well as the DUDE algorithm employing the four different OF methods described above, are compared to the baseline interpolation results in Fig. 5. The super-resolution iterations of the DUDE algorithm (see Eq. (7)) converged to a constant PSNR value after approximately 100 steps for all image sequences. These PSNR values are reported in Table 2. To illustrate the importance of introducing separate forward and backward warping operators along with the corresponding confidence measures, the “Walking” sequence was also processed with the super-resolution algorithm without those features, as defined by the iterations of Eq. (5). The corresponding PSNR values are denoted as SR in Fig. 6. Similar to the diverging SR result, the super-resolution iterations of the L2 C-SR algorithm achieved a maximum PSNR value after 21 steps for the “Walking” sequence, 15 steps for the “Basketball” sequence, and 26 steps for the “Foreman” sequence, after which the value of PSNR decreased. The L1 C-SR algorithm exhibited similar behavior, but attained its maximum PSNR value after 116 steps for the “Walking” sequence, 120 steps for the “Basketball” sequence, and 195 steps for the “Foreman” sequence. The maximum PSNR values and their corresponding images are reported in these cases.


Fig. 5 Reference image of the “Walking” sequence with zoomed in region containing the man’s left hand: (a) original; (b) degraded; (c) Lanczos interpolated; and super-resolved with (d) L2 C-SR; (e) L1 C-SR; (f) NLS; (g) TVL1 OF DUDE; (h) CNL OF DUDE; (i) MDP OF DUDE; (j) WB OF DUDE.


Table 2. PSNR results for the three test sequences.


Fig. 6 Image PSNR dependence on the number of super-resolution iterations. $PSNR = 10 \log_{10}\!\left( 255^2\, p / \| I_H - I_{ORIG} \|_{L2}^2 \right)$ [dB], where $p$ is the number of pixels.
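As a worked example, the PSNR used throughout this section reads as follows in code. We assume the L2 norm in the caption is squared, as in the standard PSNR definition; the names are ours.

```python
import numpy as np

def psnr(I_H, I_orig):
    """PSNR per the Fig. 6 caption, in dB, for 8-bit dynamic range."""
    err = I_H.astype(float) - I_orig.astype(float)
    p = I_orig.size                          # number of pixels
    return 10 * np.log10(255.0**2 * p / np.sum(err**2))
```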


The L2 C-SR algorithm fails to resolve the “Walking” sequence, with smearing and numerical instabilities occurring across the motion lines (Fig. 5(d)). As a consequence, the PSNR of the super-resolved image, 27.26 dB, is only slightly higher than the 27.05 dB PSNR of the pixel replicated image (Fig. 5(b)) and lower than the 28.89 dB PSNR of the interpolated image (Fig. 5(c)). The L1 C-SR algorithm provides significantly improved resolution with respect to the L2 C-SR result and the interpolated image and attains a higher PSNR of 31.44 dB (Fig. 5(e)). The distortion observed behind the man’s right side can be attributed to poor handling of occlusions. The NLS algorithm produces a sharper image compared to the L1 C-SR image (Fig. 5(f)). The regions containing motion are resolved, albeit with some distortions of both of the man’s hands. The resolution improvement with respect to the interpolated image is reflected in the higher PSNR value of 31.60 dB. Despite the discussed substantial OF errors, the DUDE algorithm using any of the four different OF estimation methods reveals additional spatial content absent in the image obtained with the NLS algorithm and avoids distortions in the regions containing significant motion (Figs. 5(g)-5(j)).

In addition to overall enhanced spatial acuity, the man’s facial features are more pronounced, the hands attain the correct shapes, and the texture of the man’s pants is more evident. The edges of the desk, which experience only simple shifts due to camera position change, are sharper in the NLS image, where all edges are equally determined by nonlocal averaging. The ripples around the desk edges in the image obtained with the DUDE algorithm are due to the aforementioned numerical instability caused by the convergence rate difference. The highest PSNR values of 32.90 dB and 32.80 dB are achieved by the DUDE algorithm using WB OF (Fig. 5(j)) and MDP OF (Fig. 5(i)), respectively. Performance of the DUDE algorithm employing the two other OF methods, CNL and TVL1, is very similar: 32.45 dB and 32.44 dB (see Figs. 5(g) and 5(h)), respectively. Use of TVL1 OF provides the sharpest and most precise images of both of the man’s hands, but suffers from numerically unstable sharpening along the motion boundaries. One conclusion that can be derived by combining the data of Tables 1 and 2 is that, although average warping error provides a general approximation of OF performance in the super-resolution algorithm, it cannot predict final image quality, which depends on the intricacies of the motion compensation errors.

The “Basketball” sequence presents a more difficult problem for super-resolution compared to the previous example, mostly due to the large displacement of the ball in the scene. The maximum OF vector magnitude between the reference frame and the most distant frame is approximately 26 pixels. Intricate motion of the hands also adds complexity to the example. For the sake of clarity, and because of the previously demonstrated steady performance of all four OF methods, only the result of the DUDE algorithm using the WB OF algorithm is presented. For the “Basketball” sequence, the evidence of partial WB OF breakdown is its maximum OF vector magnitude estimate of only 11 pixels, which effectively precludes resolution of the ball movement in the last frame of the sequence.

The results of the “Basketball” sequence super-resolved by a factor of four are presented in Fig. 7, and the corresponding PSNR values are shown in Table 2. As before, the L2 C-SR algorithm is not able to capture the motion of the scene (Fig. 7(d)). The L1 C-SR algorithm result (Fig. 7(e)) represents an improved image compared to the L2 C-SR approach. Higher noise levels can be observed in the image regions experiencing larger displacements, such as the ball and the hands of both men. The NLS algorithm provides some resolution improvement over the interpolation, but does not add spatial acuity to the moving hands of the man facing the camera. The left hand of the man facing away from the camera is distorted as well (Fig. 7(f)). The image obtained with the DUDE algorithm contains a higher level of detail, including the facial expression of the man throwing the ball, the texture of the ball, and several resolved fingers on both men’s hands (Fig. 7(g)).


Fig. 7 Reference image of the “Basketball” sequence with zoomed in region containing man’s hands and the ball: (a) original; (b) degraded; (c) Lanczos interpolated; and super-resolved with (d) L2 C-SR; (e) L1 C-SR; (f) NLS; (g) DUDE.


The “Foreman” sequence is less challenging than the two previous cases, but it serves to put this work in the context of previously performed state-of-the-art research [10–12]. The main difficulty in this example is presented by the localized motion of the man’s mouth. There are no significant occlusions, and the magnitude of the maximum displacement in the sequence with respect to the reference frame is only approximately 3 pixels, which is successfully resolved by the WB OF.

The four times super-resolved images obtained from the “Foreman” sequence follow the previous trends and are demonstrated in Fig. 8. The corresponding PSNR values are shown in Table 2. Again, the L2 C-SR algorithm fails to capture the local motion of the man’s mouth (Fig. 8(d)), while the L1 C-SR algorithm improves image resolution with minor distortions around the center of the man’s mouth (Fig. 8(e)). The NLS algorithm generates sharper edges and a higher dynamic range compared to the interpolated image (Fig. 8(f)). Note that the PSNR value differs from that in the original work [12] due to the smaller number of frames used in processing, as well as the larger downsampling factor and stronger blur applied to the original data. The DUDE algorithm provides a super-resolved image with less noise and fewer distortions compared to the image obtained with the L1 C-SR algorithm (see Figs. 8(g) and 8(e)) and with more facial expression details compared to the image obtained with the NLS algorithm (see Figs. 8(f) and 8(g)). The NLS algorithm better resolves the edges which experience small global shifts through the image sequence. As before, the numerical instabilities of the DUDE algorithm caused by the convergence rate difference create slight rippling effects along those edges.


Fig. 8 Reference image of the “Foreman” sequence: (a) original; (b) degraded; (c) Lanczos interpolated; and super-resolved with (d) L2 C-SR; (e) L1 C-SR; (f) NLS; (g) DUDE.


In addition to the numerical experiments with publicly available images from accepted databases, data were collected with a high frame rate camera (Fig. 9 and Media 4). The purpose of this test image sequence was to demonstrate the performance of super-resolution algorithms for motion patterns that include large local rotation, nonrigid motion, and some occlusions. The image sequence in Media 4 consists of forty frames that are degraded according to the recipe from Section 3.1. Because of the L2 C-SR algorithm’s inability to resolve motion in all of the previous examples, only the results of the L1 C-SR, the NLS, and the DUDE algorithm using WB OF are compared. The parameters of the algorithms remain fixed at the previously discussed values. Each frame of the super-resolution sequence is obtained using twenty-one low-resolution frames.


Fig. 9 High frame rate sequence of images with human motion (Media 4).


The images presented in Fig. 10 and Media 5 demonstrate that the DUDE algorithm attains significantly higher image quality compared to that obtained with the interpolation, the NLS, and the L1 C-SR algorithms. The image quality difference is noticeable in overall image sharpness, as well as in details of the facial expression and the book cover. The L1 C-SR algorithm is able to improve the PSNR of some images by as much as ~0.7 dB compared to the interpolation result, while the DUDE algorithm improvement over interpolation reaches ~1.7 dB. Also, the image generated by the DUDE algorithm is free from the distortions and excessively flat regions present in the image generated by the NLS algorithm. The relatively low PSNR values achieved by the NLS method can be attributed to these distortions, arising from the difficulty of handling rotation, and to a slight shift accumulated by the image during processing, which was first observed by the authors of the original work [11]. The NLS PSNR calculation was adjusted to compensate for the shift on the pixel scale, but a remaining subpixel shift still affected the PSNR value.


Fig. 10 High frame rate sequence of images (a) degraded and then Lanczos interpolated; and super-resolved using (b) the NLS; (c) the L1 C-SR; (d) the DUDE (Media 5).


It is clear that the performance of any super-resolution algorithm depends strongly on the complexity of the motion present in the processed image sequence. Our tests indicate not only that the proposed super-resolution algorithm performs better than state-of-the-art super-resolution algorithms, but also that the performance gap widens with increasing complexity of the motion pattern. The computational complexity of the algorithm is essentially equal to that of the classical super-resolution approach, with double the computational time for motion estimation. The weights based on the motion compensation interpolation error are inexpensive to compute, since the motion compensated image is already required by the super-resolution algorithm. Applying the weights only involves two additional multiplications per pixel.

4. Summary

In summary, a novel multi-frame super-resolution algorithm enabling resolution enhancement of complex motion patterns is proposed. The presented algorithm employs an OF based variational formulation of the super-resolution problem, which previously has been unable to produce high-resolution images free of distortion for sequences containing localized, nonrigid, and large displacement motion. Because motion estimation errors are responsible for the major limitations of classical super-resolution, the described approach concentrates on OF interpolation error reduction and control. First, the variational formulation is augmented with warping based on two-way OF estimation, the value of which is demonstrated for a number of OF algorithms. Second, in the super-resolution iterations each warping operator is supplemented by confidence weights based on an uncertainty measure of OF interpolation error. The structure of the implemented weights does not conform to the weighted norm formulation used in previous work.

The proposed algorithm’s advantage in terms of image quality and spatial resolution is demonstrated with respect to existing state-of-the-art super-resolution algorithms using challenging human motion image sequences. Future work will involve investigation of alternative OF error measures and their extension to adaptive selection of the optimal number of frames used for super-resolution processing from long image sequences.

References and links

1. S. C. Park, M. K. Park, and M. G. Kang, “Super-resolution image reconstruction: a technical overview,” IEEE Signal Process. Mag. 20(3), 21–36 (2003).

2. R. C. Hardie and K. J. Barnard, “Fast super-resolution using an adaptive Wiener filter with robustness to local motion,” Opt. Express 20(19), 21053–21073 (2012).

3. S. P. Belekos, N. P. Galatsanos, and A. K. Katsaggelos, “Maximum a posteriori video super-resolution using a new multichannel image prior,” IEEE Trans. Image Process. 19(6), 1451–1464 (2010).

4. M. K. Park, M. G. Kang, and A. K. Katsaggelos, “Regularized high-resolution image reconstruction considering inaccurate motion information,” Opt. Eng. 46(11), 117004 (2007).

5. O. A. Omer and T. Tanaka, “Multiframe image and video super-resolution algorithm with inaccurate motion registration errors rejection,” Proc. SPIE 6822, 682222 (2008).

6. O. A. Omer and T. Tanaka, “Region-based weighted-norm with adaptive regularization for resolution enhancement,” Digit. Signal Process. 21(4), 508–516 (2011).

7. D. Mitzel, T. Pock, T. Schoenemann, and D. Cremers, “Video super resolution using duality based TV-L1 optical flow,” Lect. Notes Comput. Sci. 5748, 432–441 (2009).

8. M. Unger, T. Pock, M. Werlberger, and H. Bischof, “A convex approach for variational super-resolution,” Lect. Notes Comput. Sci. 6376, 313–322 (2010).

9. C. Liu and D. Sun, “A Bayesian approach to adaptive video super resolution,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2011), pp. 209–216.

10. A. Danielyan, A. Foi, V. Katkovnik, and K. Egiazarian, “Image and video super-resolution via spatially adaptive block-matching filtering,” in Proceedings of the International Workshop on Local and Non-Local Approximation in Image Processing (Lausanne, Switzerland, 2008).

11. M. Protter, M. Elad, H. Takeda, and P. Milanfar, “Generalizing the nonlocal-means to super-resolution reconstruction,” IEEE Trans. Image Process. 18(1), 36–51 (2009).

12. H. Takeda, P. Milanfar, M. Protter, and M. Elad, “Super-resolution without explicit subpixel motion estimation,” IEEE Trans. Image Process. 18(9), 1958–1975 (2009).

13. D. Sun, S. Roth, and M. J. Black, “Secrets of optical flow estimation and their principles,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2010), pp. 2432–2439.

14. L. Xu, J. Jia, and Y. Matsushita, “Motion detail preserving optical flow estimation,” IEEE Trans. Pattern Anal. Mach. Intell. 34(9), 1744–1757 (2012).

15. T. Brox and J. Malik, “Large displacement optical flow: descriptor matching in variational motion estimation,” IEEE Trans. Pattern Anal. Mach. Intell. 33(3), 500–513 (2011).

16. H. Zimmer, A. Bruhn, J. Weickert, L. Valgaerts, A. Salgado, B. Rosenhahn, and H.-P. Seidel, “Complementary optic flow,” Lect. Notes Comput. Sci. 5681, 207–220 (2009).

17. H. Zimmer, A. Bruhn, and J. Weickert, “Freehand HDR imaging of moving scenes with simultaneous resolution enhancement,” in Proceedings of Eurographics (Llandudno, UK, 2011).

18. T. Brox, A. Bruhn, N. Papenberg, and J. Weickert, “High accuracy optical flow estimation based on a theory for warping,” Lect. Notes Comput. Sci. 3024, 25–36 (2004).

19. S. Baker, D. Scharstein, J. P. Lewis, S. Roth, M. J. Black, and R. Szeliski, “A database and evaluation methodology for optical flow,” Int. J. Comput. Vis. 92(1), 1–31 (2011).

20. K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse 3D transform-domain collaborative filtering,” IEEE Trans. Image Process. 16(8), 2080–2095 (2007).

21. D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” Int. J. Comput. Vis. 60(2), 91–110 (2004).

22. A. V. Kanaev, “Confidence measures of optical flow estimation suitable for multi-frame super-resolution,” Proc. SPIE 8399, 839903 (2012).

23. A. Bruhn and J. Weickert, “A confidence measure for variational optic flow methods,” in Geometric Properties for Incomplete Data (Springer-Verlag, 2006).

24. K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image restoration by sparse 3D transform-domain collaborative filtering,” Proc. SPIE 6812, 681207 (2008).

Supplementary Material (5)

Media 1: AVI (280 KB)     
Media 2: AVI (115 KB)     
Media 3: AVI (348 KB)     
Media 4: AVI (388 KB)     
Media 5: AVI (4378 KB)     
