High-quality blind defocus deblurring of multispectral images with optics and gradient prior

Xiao-Xiang Wei; Lei Zhang; Hua Huang

doi:10.1364/OE.390158

1. Introduction

Multispectral images comprise of spatial maps of light intensity variation across a wide range of wavelengths. Compared to the traditional RGB imaging, the rich details of multispectral imaging with multiple channels exhibits its great potential in many applications [1], like agriculture, remote sensing, fluorescence microscopy, etc.

However, for some light-weight applications like unmanned aerial vehicle (UAV) and deep-space exploration, it is quite difficult to equip existing multispectral imaging systems, due to the complex and heavy configuration of lenses [2,3]. While the use of simple lenses can reduce the weight and cost of imaging systems, it has to suffer from quality degradation like defocus blur. This is because the focal lengths through simple lenses are different for the light with a series of wavelengths in its spectrum, such that the light rays cannot focus on the same imaging plane.

It should be noted that simple lenses do not mean just a single lens, but a concatenation of several single lenses in a simple manner. Figure 1 shows an illustration on the simple lenses imaging process, where four channels of the monochromatic light correspond to four wavelengths in the formation of multispectral images with 16 channels. Here, only the light rays represented by the green line focuses on the imaging plane exactly, while the other light rays generate the circle of confusion (CoC) with different radii on the plane. This results in different levels of blurriness after imaging, namely defocus blur. Hence, it is necessary to develop the high-quality deblurring method for defocus multispectral images.

Fig. 1. An illustration of the simple lenses imaging process that generates multispectral images with 16 channels and its RGB sample. The $9$-th image is at the well-focused channel, while images at the other channels have defocus blur. Note that the each channel is corresponding to a specific wavelength.

Download Full Size | PDF

Although the deblurring of grayscale or RGB images has been well studied and many decent methods have emerged in the last decade [4–16], they do not suit the restoration of every single channel individually for multispectral images. The reason is that spectral properties and inter-channel relationships are not fully explored to assist the generation of high-quality deblurring results. Recently, Chen and Shen [17] propose to utilize the inter-channel correlation characterized by the intensity similarity over all the channels of the entire spectrum for multispectral image deblurring. This method is able to deblur images with small blurriness. However, the images with large blurriness cannot be correctly restored, because of inaccurate blur kernel estimation and incorrect intensities from the unfit guiding images. Additionally, other tensor-based multispectral images deblurring methods like [18–20], cannot well handle the wavelength-dependent kernels, due to some disturbing artifacts when optimizing the sharp images.

Generally, the level of blurriness modeled by blur kernels is positively related to the distribution of CoCs over the channels, which inspires us to consider the blurriness and wavelengths together for more accurate blur kernel estimation. Besides, light with close wavelengths results in the local structural similarity in the spectral dimension when imaging the same scene, which suggests the restoration by using some prior in adjacent channels, instead of the intensity similarity over all the channels.

In this context, our goal is to remove the defocus blur caused by the chromatic aberration, and our contributions are the following two aspects:

• We model the defocus blur in the simple lenses imaging process and propose a more accurate blur kernel estimation for defocus multispectral images. This enables the better sharpness after deblurring.
• We propose a content-adaptive reference image selection to guide the correct intensity restoration, as well as the gradient-based prior in the optimization of maximum a posterior.

2. Related works

2.1 Natural image deblurring

Blind deblurring aims at restoring unknown sharp images (namely the latent images) from blurry images with the unknown blur kernel. According to the number of images used in the deblurring process, there are roughly two categories: single-image based and multi-image based deblurring.

In mathematical parlance, single-image based deblurring is an ill-posed problem. A typical solution is to adopt the maximum a posteriori (MAP) optimization to make this problem well-posed. The MAP optimization aims to find a set of parameter values that maximize the posterior probability if the prior probability distribution of the parameters is known. Early MAP methods apply the total variation (TV) regularization [8] to estimate the blur kernel and latent images iteratively. However, Levin et al. [11] have shown that most of these methods favor no-blur explanations. Some novel priors for natural images have also been proposed to generate sharp images, e.g., dark channels [10], local maximum gradient prior [16] and re-weighted graph total variation (RGTV) [14]. There are also some other deblurring methods based on the variational Bayesian (VB) [11,21] for estimating more accurate kernels. Moreover, deep learning based methods [22,23] have arisen with the help of the neural network. However, these single-image based deblurring methods are generally struggling for defocus multispectral images with heavy loss of details.

Alternatively, some multi-image based methods use multiple blurry images of the same scene to restore the latent sharp image. Zhang and Carin [24] combine the alignment and deblurring together, which does not require multiple images to exactly be aligned. The use of different regularization terms can improve the deblurring quality, e.g., sparse kernel regularization [25] and $L_{1}$ regularization [26]. Actually, multispectral image deblurring is totally different from the above ones, because it has to produce multiple sharp images instead of a single one after deblurring.

Defocus deblurring. Defocus blur is caused by the positional difference between the focal plane and the imaging plane, which is commonly characterized by an isotropic kernel, e.g., a disk-like kernel or a Gaussian kernel. Actually, many approaches are proposed to recover both defocus blur kernels and sharp images [15,27].

In addition to ordinary defocus blur, chromatic aberration (CA) is another kind of defocus degradation. CA is caused by the difference of focal lengths that depends on wavelengths, which needs to be corrected towards normal images. Under this circumstance, images of a few channels are nearly sharp, but the others suffer from blur of different levels. Kang [6] proposes an automatic method to correct CA, which fails to handle saturated pixels. Schuler et al. [12] utilize the hyper-Laplacian prior to assist non-blind correction of CA in the YUV space. Recently, more priors or models have been proposed to correct CA, e.g., the convex cross-channel prior [13], the geometric and visual priors [7] and the cross-channel information transfer model [5].

2.2 Multispectral image deblurring

Intuitively, the gradient-sparsity based priors that are learned from natural single images, can be directly applied to the multispectral image deblurring, e.g., TV regularization [28] and Gaussian mixture model [29]. These methods mainly apply gradient-sparsity constraints on images in the spatial domain with less attention to the spectral structure, which incurs the spectral uncorrelation in the restored sharp images. Recently, some methods consider both spatial continuity and spectral correlation to maximize the utilization of the spectral structure. Song et al. have proposed the spatial and spectral smoothness by penalizing large differences between adjacent pixels in these two domains [30]. The 3D tensor-based model is also used for multispectral image deblurring with the effectiveness as demonstrated in [18–20].

Defocus deblurring. To eliminate the defocus blur in multispectral images, a direct solution is to apply the forementioned single-image based defocus deblurring methods to every channel individually. However, the lack of inter-channel correlation makes most of existing deblurring methods unsatisfactory. The method of [17] is classic for deblurring defocus multispectral images. It tries to eliminate the blur based on the inter-channel correlation, which does produce sharp images for some channels by delivering candidate intensities from guiding images. However, this method relies on a simple brute-force strategy for kernel estimation, and guiding images over the entire spectrum. It cannot well deblur the images at the channels with large blurriness due to inaccurate kernel estimation and unfit guiding images. In this paper, we explore the physical formation of blur kernel and gradient prior based on the content-adaptive image selection for high-quality blind deblurring of multispectral images.

3. Methods

3.1 Degradation model

For brevity, we introduce our method based on the simple filter-based multispectral imaging system which consists of a monochrome camera and a filter wheel. Considering there is no blur in the spectral dimension for the filter-based imaging system, we consider the imaging process with 2D kernels in the band-by-band manner.

Let $\mathcal {B}\in \mathbb {R}^{M \times N \times K}$ represent the multispectral data captured by the filter-based imaging system. We adopt the following 2D model for each channel in $\mathcal {B}$:

(1)$$\mathbf{B}_{i}=\mathbf{H}_{i}\ast \mathbf{L}_{i}+\mathbf{\Theta}_{i}$$

where the symbol $\ast$ stands for the 2D convolution operator, $\mathbf {B}_{i},\mathbf {L}_{i}\in \mathbb {R}^{M\times N}$ are the blurry image and the latent sharp image at the $i$-th channel respectively. Besides, $\mathbf {H}_{i}$ is the defocus blur kernel, and $\mathbf {\Theta }_{i}$ represents the additive noise including both the random noise and structural line pattern noise in multispectral images. It should be noted the joint destriping and deblurring for multispectral images is a very challenging problem, whereupon we assume there is only the random noise in the input images by removing the structural line pattern noise with some sophisticated destriping methods like [31–33].

Remark. For the other multispectral imaging systems like line imaging cameras, the main difference with the filter based camera is that the dispersion element in these systems may introduce some blur in the spectral dimension, which reveals the desire for the 3D blur kernel. To handle this problem, we just concatenate different 2D blur kernels along the spectral dimension to form the 3D blur kernel. Though the degradation may be different from Eq. (1), the defocus blur is still modeled by our optics prior (see Sec. 3.2).

3.2 Defocus blur kernel estimation with optics prior

In mathematics, the blur kernel is typically modeled by a point spread function (PSF). Especially, the PSF of defocus blur can be cast as a Gaussian function $\mathbf {G}(\sigma )$ with $\sigma$ as its standard deviation. In physics, when the object distance and the lens configuration are fixed, the wavelength is the only factor that determines the focal length and the corresponding CoC on the imaging plane (see Fig. 2). It should be noted that the box in Fig. 2 is just an illustration of the simple lenses system, and the real multispectral imaging system has more complex configuration of single lenses in the box. But they follow the analogous optical principle as described in Fig. 2. Since this paper considers the defocus blur mainly caused by the refraction of lenses, we omit other irrelevant optical equipment for more concise description of our model. In fact, there is a linear relationship between the radius of the CoC and the standard deviation of a PSF [34]. This inspires more kernel priors based on the optical imaging model of the CoC, by which it purports to improve the accuracy of kernel estimation.

Fig. 2. The physical model of the CoC by the simple lenses imaging. The simplified rectangular box denotes the configuration of single lenses.

Download Full Size | PDF

3.2.1 Physical model

We assume the defocus image at the $i$-th channel corresponds to the imaging light with its focal length $f_i$ (see the blue dotted line in Fig. 2) and refractive index $n_i$. The famous lensmaker’s formula [35] for the single thin lens gives a relationship between the focal length $f_i$ and the refractive index $n_i$ as

(2)$$\frac{1}{f_i} = (n_i-1)\left( \frac{1}{C_{1}}-\frac{1}{C_{2}} \right)$$

where $C_{1}$ and $C_{2}$ denote the curvature radii of the lens surface closest to the object and the one closest to the imaging plane respectively.

Furthermore, the light ray generates a CoC in the imaging plane, and its radius $r_i$ has the following expression (see the proof in Appendices):

(3)$$r_{i}=\frac{dA}{f_{i}}\frac{|f_{i}-\tilde{f}|}{d-\tilde{f}}$$

where $d$ is the object distance, $A$ is the aperture radius of the lens, $\tilde {f}$ is the focal length of well-focused light (see the green dotted line in Fig. 2). All the three variables are assumed to be fixed in the imaging process.

Actually, we can model the relationship between the standard deviation $\sigma _i$ and the wavelength $\lambda _i$ according to the Cauchy’s dispersion formula [35]. Especially, we have the following quadratic polynomial to approximate the standard deviation as

(4)$$\hat{\sigma}_{i}(\lambda_{i}) = \frac{a}{\lambda^{2}_{i}} + \frac{b}{\lambda_{i}} + c=\mathbf{\Lambda}_{i}^\mathrm{T}\mathbf{P}$$

where $a$, $b$ and $c$ denote the coefficients of the polynomial (see the proof in Appendices), and the matrix forms of $\mathbf {\Lambda }_{i}$ and $\mathbf {P}$ are defined by

\mathbf{\Lambda}_{i} = \left[ \begin{matrix} \frac{1}{\lambda_{i}^{2}} & \frac{1}{\lambda_{i}} & 1 \end{matrix} \right]^\mathrm{T}, \quad \mathbf{P} = \left[ \begin{matrix} a & b & c \end{matrix} \right]^\mathrm{T},

which are $3\times 1$ matrices respectively.

Actually, with the known wavelength $\lambda _i$ in Eq. (4), we obtain the new prior knowledge on the Gaussian kernel, i.e., a sort of constraints characterized by a quadratic polynomial. Next, we show how to use this model to estimate a better defocus blur kernel.

3.2.2 Kernel estimation

The goal is to compute the blur kernel, i.e., the Gaussian PSF for each channel of multispectral images, such that the standard deviation satisfies the quadratic polynomial prior in Eq. (4). Due to the lack of the lens configuration (e.g., material and optics) in practice, it is impossible to directly solve the equation for the standard deviation. Alternatively, we attempt to infer a set of initial kernels based on only the image intensity, and then perform optimization to make them as close as possible to follow the polynomial prior. Next, we first demonstrate our optimization on the blur kernels with the abovementioned physical model.

MLS optimization. Suppose we already obtain a set of initial kernels which are not accurate enough to assist the restoration of the latent images. This low accuracy may be caused by the appearance difference in images at different channels, which suggests more room for improving the estimated standard deviations by utilizing our physical model of defocus blur.

Concretely, we employ the moving least squares (MLS) formulation to find the optimal $\hat {\sigma }_{i}$ in the sense of fitting Eq. (4). In MLS, the coefficient matrix $\mathbf {P}$ is not constant, but a function with respect to the wavelength. Thus, by replacing the constant coefficient $\mathbf {P}$ with $\mathbf {P}(\lambda _i)$ ($\mathbf {P}_i$ for short), the physical model in Eq. (4) can be represented as

(5)$$\hat{\sigma}_{i}(\lambda_{i}) = \frac{a(\lambda_i)}{\lambda^{2}_{i}} + \frac{b(\lambda_i)}{\lambda_{i}} + c(\lambda_i)=\mathbf{\Lambda}_{i}^\mathrm{T}\mathbf{P}_i$$

Given a wavelength $\lambda _{i}$, the optimal $\mathbf {P}_i$ is obtained by minimizing the following equation,

(6)$$\mathop{\arg\min}_{\mathbf{P}_i}\sum_{j=1}^{K}w(|i-j|)\|\mathbf{\Lambda}_{j}^\mathrm{T}\mathbf{P}_{i}-\sigma_{j}\|^{2}$$

where $\sigma _{j}$ is the value obtained in the kernel initialization (see Sec. 3.2), and the weight function $w(x)$ is defined by

w(x)=\exp{\left({-x^2}/{(2\theta)}\right)}

where $\theta$ is a coefficient to control the influence of the weight function. The larger values of $\theta$ lead to greater impact from adjacent channels. $\theta$ is set to be $10$ in the experiments of this paper.

Then, we have the closed-form solution for the matrix $\mathbf {P}_{i}$ as follows:

(7)$$\mathbf{P}_{i} = \mathbf{A}^{{-}1}(\lambda_{i})\mathbf{B}(\lambda_{i})\mathbf{\Sigma}^{\mathrm{T}}$$

where $\mathbf {A}(\lambda _{i})$ and $\mathbf {B}(\lambda _{i})$ are defined by

(8)$$\mathbf{A}(\lambda_{i})=\sum_{j=1}^{K}w(|i-j|)\mathbf{\Lambda}_{j}\mathbf{\Lambda}_{j}^{\mathrm{T}} $$

(9)$$\mathbf{B}(\lambda_{i})=\left[ \begin{matrix} w(|i-1|)\mathbf{\Lambda}_{1} & \cdots & w(|i-N|)\mathbf{\Lambda}_{K} \end{matrix} \right] $$

and $\Sigma =\left [\sigma _{1}, \sigma _{2}, \ldots , \sigma _{K}\right ]^{\mathrm {T}}$.

Finally, the optimized standard deviation $\hat {\sigma }_i$ in accordance with the physical model can be computed by Eq. (5) with $\mathbf {P}_{i}$, by which we can obtain the Gaussian kernels for all the channels. The pseudo code of our kernel estimation is described in Algorithm 1.

Initial kernel setting. We provide an effective method to compute a set of standard deviations as the initialization of the kernel. Here, the wavelength of the well-focused channel is usually known (e.g., the middle channel) in most imaging devices and multispectral image datasets, or it can be determined by using the normalized sparsity measure (NSM) [9]. Then, for the well-focused channel, its corresponding standard deviation $\sigma _s$ should be close to zero, whereupon we set $\sigma _s=0$ as the initial kernel setting for the image $\mathbf {B}_s$ corresponding to this well-focused channel.

For the other channels, we compute the relative kernel by blurring $\mathbf {B}_s$ to infer the standard deviations. Concretely, we are interested in the Gaussian kernel $\mathbf {G}(\sigma _{s\to i})$ that has its standard deviation to minimize the following intensity similarity function described by the mean squared error (MSE):

(10)$$\sigma_{s \to i} = \mathop{\arg\min}_{\sigma} \textrm{MSE} (\mathbf{B}_{s} \ast \mathbf{G}(\sigma), \mathbf{B}_{i})$$

where $\ast$ is the convolution operator. Here given two images $\mathbf {X}$ and $\mathbf {Y}$, the MSE is computed by

(11)$$\textrm{MSE}(\mathbf{X}, \mathbf{Y})=\frac{1}{M\times N}\sum_{i=1}^{M}\sum_{j=1}^{N}\lVert \mathbf{X}(i,j)-\mathbf{Y}(i,j) \rVert ^{2}$$

where $M$ and $N$ stand for the rows and columns of the images, and $\mathbf {X}(i,j)$ and $\mathbf {Y}(i,j)$ denote the intensity at the pixel position $(i,j)$. Because $\sigma _s$ is set to be zero by default, we use $\sigma _i$ to denote $\sigma _{s\to i}$ for simplicity in the sequel.

Due to the random and cumulative noise, solving $\sigma _{i}$ in the Fourier domain is prone to large errors. Besides, it is observed that the blurriness has the non-descending property between the current channel and the well-focused channel, i.e., the blurriness gets worse for the images corresponding to the channels far away from the well-focused channel. For example in Fig. 1, the $9$-th channel is the well-focused one, and the blurriness becomes worse from the $9$-th channel to the $1$-th channel or the $16$-th channel. Hence, we propose to use the kernel concatenation between adjacent channels to facilitate the kernel computation by Eq. (10).

In mathematics, the convolution of two Gaussian functions is still a Gaussian, with its variance being the sum of the original variances. Hence, we can use the concatenating form to compute $\sigma _i$ of the $i$-th channel, i.e.,

(12)$$\sigma_{i}=\sigma_{s\to i}=\\ \begin{cases} \sqrt{\sigma^{2}_{s\to s-1}+\cdots+\sigma^{2}_{i+1 \to i}} & i<s\\ \sqrt{\sigma^{2}_{s\to s+1}+\cdots+\sigma^{2}_{i-1 \to i}} & i>s\\ \end{cases}$$

which essentially gives the blur kernel with respect to the image of the well-focused channel. Therefore, the standard deviations of all the channels can be obtained if we have the set of standard deviations between adjacent channels, i.e., $\{\sigma _{s\to s-1},\ldots ,\sigma _{2 \to 1}\}$ and $\{\sigma _{s \to s+1},\ldots ,\sigma _{K-1 \to K}\}$, where $K$ is the number of channels.

The following task is to compute $\sigma _{i \to i+1}$ (or $\sigma _{i+1 \to i}$) between the two images at the adjacent channels. Here, we first reblur the image of lower blurriness with different Gaussian kernels, and then select the Gaussian kernel as $\sigma _{i \to i+1}$ (or $\sigma _{i+1 \to i}$) that corresponds to the minimum MSE between the reblurred image and the image with larger blurriness (see Algorithm 1).

To reduce the computational cost of searching, we set the interval to be $0.01$ in enumerating $\sigma _{i \to i+1}$ (or $\sigma _{i+1 \to i}$) from zero to a specified value (e.g., 5) in the searching process. This setting is sufficient to produce satisfactory results in the experiments of this paper. Thus, we obtain the initial standard deviation for the Gaussian kernel of each channel.

3.3 Content-adaptive deblurring based on gradient prior

For multispectral images of many channels, the inter-channel correlation is the key ingredient for image priors as pointed in [17]. Generally, this kind of correlation arises from imaging the same scene, such that images at different channels bear some resemblance related to the scene content. This is because the reflections of the same material to the light with similar wavelengths are similar. Furthermore, the resemblance decreases when the wavelength differences increase. Hence, we suggest the content-adaptive generation of an intermediate image for each channel to guide the restoration of the sharp image. Here, we refer to the generated intermediate image as the reference image.

3.3.1 Content-adaptive reference image selection

Concretely given a range size $W$, we model the reference image for the $i$-th channel as a weighted linear combination of known latent images as

(13)$$\mathbf{R}_{i}=\sum_{j=i-{W}}^{{i+W}}v(i,j)\delta(j)\mathbf{L}_{j}$$

where $\mathbf {R}_{i}$ denotes the reference image, $v(i,j)$ is the weight function and the mask function $\delta (j)$ restricts the latent image selection. For the unknown latent image $\mathbf {L}_{j}$, we have $\delta (j)=0$, otherwise $\delta (j)=1$. Here, the latent images are derived from the images corresponding to the well-focused channels, or deblurred images based on the estimated kernels as described in the next section.

The setting of the weight function $v(i,j)$ is important to create the high-quality reference images. A straightforward selection is to average all the latent images in the range, but the intensity change among different channels is not linear (see Fig. 1). For the purpose that the reference image can be adaptively changed with the content at the close channels, we combine the intensity similarity and channel distance into the definition of the weight function. Inspired by the steering kernel in [36], our weight function is defined by

(14)$$v(i,j)={\delta(j)\tilde{v}(i,j)}/{\sum_{j=i-W}^{i+W}\delta(j)\tilde{v}(i,j)}$$

where $\delta (\cdot )$ is the mask function as in Eq. (13), and

(15)$$\tilde{v}(i,j)=\exp{(-\frac{|i-j|^{2}\cdot|1-0.9^{\textrm{MSE}(B_{i},L_{j})}|^{2}}{2\beta^{2}})}$$

which has the indices $i$ and $j$ that satisfy $i-W\le j \le i+W$, and $\beta$ is a prescribed parameter set as 0.05 in the experiments of this paper.

It should be noted that the major difference in constructing the guiding images in [17] and our reference images lies in the selection of relevant channels. We will further explain the reason why we select neighboring channels to construct reference images in Sec. 3.3.4.

3.3.2 Deblurring with the gradient prior

Image deblurring is typically formulated as the following MAP framework [8,17], towards the optimal kernel and deblurred image that satisfy

(16)$$\mathop{\arg\min}_{\mathbf{H},\mathbf{L}}\{ \|\mathbf{L}\ast \mathbf{H} - \mathbf{B}\|_{2}^{2} + \mu\Psi(\mathbf{L})+\eta\Phi(\mathbf{H})\}$$

where $\mathbf {L}$, $\mathbf {B}$ and $\mathbf {H}$ have the same meanings with Eq. (1). The latent image prior $\Psi (\mathbf {L})$ and kernel prior $\Phi (\mathbf {H})$ with parameters $\mu$ and $\eta$ regularize the whole deblurring process to a well-posed problem.

With the blur kernel estimated by Algorithm 1, the term of kernel prior is defined as

(17)$$\Phi(\mathbf{H}_i)=\|\mathbf{H}_i-\mathbf{G}(\hat{\sigma}_i)\|_2^2$$

Besides, we exploit the prior from content-adaptive reference images. Here, we suggest the gradients from the reference images as the prior, rather than the intensity as used in [17]. The reason is that the intensity-based prior might be easily affected by minor inaccuracies of reference images, while the gradient-based prior can preserve more accurate intensities in the flat area. We will further explain the difference between these two priors in Sec. 3.3.1. Note that the intensity gradients in the spatial domain for each reference image are used in this paper, i.e., the intensity difference between adjacent pixels. Due to the usage of the 2D model, gradients in the spectral domain are not used here. Concretely, we have

(18)$$\Psi(\mathbf{L}_i)= \|\nabla \mathbf{L}_i- \nabla \mathbf{R}_i\|_{2}^{2}$$

where $\nabla (\cdot )$ is the gradient operator, and $\mathbf {R}_{i}$ and $\mathbf {L}_{i}$ denote the reference and the latent image respectively.

Then, we apply our kernel and gradient priors to the MAP framework of [17], and the deblurring formulation becomes

(19)$$\mathop{\arg\min}_{\mathbf{L},\mathbf{H}}\| \mathbf{L} \ast \mathbf{H} - \mathbf{B} \|_{2}^{2} + \mu \| \nabla \mathbf{L} - \nabla \mathbf{R} \|_{2}^{2} + \eta \| \mathbf{H}-\mathbf{G}(\hat{\sigma}) \|_{2}^{2}$$

where $\mu$ and $\eta$ is the parameters corresponding to the two priors. Note that we borrow the kernel energy item from [17], but the estimation of $\mathbf {G}(\hat {\sigma })$ is totally different. In this framework, multispectral images need to be deblurred according to a certain sequence, which ensures that sufficient known latent images can be involved in computing reference images within the range. More specifically, images are sorted by the channel distance to the well-focused channel in an ascending order, and then the images are to be deblurred one by one.

We employ an alternating iteration strategy as in [17] to solve Eq. (19), which can be divided into two sub-problems for the $i$-th channel, i.e.,

(20)$$\hat{\mathbf{H}}_i=\mathop{\arg\min}_{\mathbf{H}_i}\|\mathbf{L}_i\ast \mathbf{H}_i - \mathbf{B}_i \|_{2}^{2} + \eta \| \mathbf{H}_i-\mathbf{G}(\sigma_i) \|_{2}^{2}$$

and

(21)$$\hat{\mathbf{L}}_i=\mathop{\arg\min}_{\mathbf{L}_i}\|\mathbf{L}_i\ast \mathbf{H}_i - \mathbf{B}_i\|_{2}^{2} + \mu \| \nabla \mathbf{L}_i - \nabla \mathbf{R}_i \|_{2}^{2}$$

which computes the optimal kernel $\hat {\mathbf {H}}_i$ and deblurred image $\hat {\mathbf {L}}_i$ sequentially.

The sub-problem of Eq. (20) can be solved in the frequency domain as

(22)$$\hat{\mathbf{H}}^{\scriptscriptstyle{(n+1)}}=\mathcal{F}^{{-}1}\left(\frac{\overline{\mathcal{F}}(\hat{\mathbf{L}}^{\scriptscriptstyle{(n)}})\mathcal{F}(\mathbf{B})+\eta \mathcal{F}\left( \mathbf{G}(\sigma)\right)}{\overline{\mathcal{F}}(\hat{\mathbf{L}}^{\scriptscriptstyle{(n)}})\mathcal{F}(\hat{\mathbf{L}}^{\scriptscriptstyle{(n)}})+\eta}\right)$$

where $\mathcal {F}(\cdot )$ represents the 2D discrete Fourier transform (DFT), $\mathcal {F}^{-1}(\cdot )$ denotes the inverse of DFT, and $\overline {\mathcal {F}}(\cdot )$ denotes the conjugate of DFT. In Eq. (22), the $n+1$-th iteration of $\mathbf {H}_i$ can be computed based on the $n$-th iteration of $\mathbf {L}_i$.

Analogically, the solution of the sub-problem of Eq. (21) has the expression as

(23)$$\hat{\mathbf{L}}^{\scriptscriptstyle{(n+1)}}=\mathcal{F}^{{-}1}\left(\frac{\overline{\mathcal{F}}(\hat{\mathbf{H}}^{\scriptscriptstyle{(n)}})\mathcal{F}(\mathbf{B})+\mu \nabla_{\mathcal{F}} \mathcal{F}(\mathbf{R})}{\overline{\mathcal{F}}(\hat{\mathbf{H}}^{\scriptscriptstyle{(n)}})\mathcal{F}(\hat{\mathbf{H}}^{\scriptscriptstyle{(n)}})+\mu\nabla_{\mathcal{F}}}\right)$$

where $\nabla _{\mathcal {F}}$ is represented by

\nabla_{\mathcal{F}}=\overline{\mathcal{F}}(\nabla_{x})\mathcal{F}(\nabla_{x})+\overline{\mathcal{F}}(\nabla_{y})\mathcal{F}(\nabla_{y})

with $\nabla _{x}=\left [-1,1\right ]$ and $\nabla _{y}=\left [-1,1\right ]^{\mathrm {T}}$ as the gradient operator in the vertical and horizontal directions respectively.

3.3.3 Implementation

When solving Eq. (19) iteratively for the blur kernel and deblurred image, the lack of unknown latent images might reduce the quality of the reference image $\mathbf {R}$. Besides, the selection of the well-focused channel and the setting of the parameters $\mu$ and $\eta$ are also important to the performance of the whole deblurring process. Here, we present the implementation details towards the optimal results. The pseudo code corresponding to the whole process is described in Algorithm 2.

Selecting the well-focused channel. We suggest measuring the sharpness for each channel by using NSM and then selecting the well-focused channel. More Concretely, we select the channel with the smallest NSM value as the well-focused channel, because the ratio of the norm $L1/L2$ on the high frequencies of an image increases with blurriness (see [9] for details). Then, the well-focused channel of the target multispectral images can be easily determined.

Pre-processing. In order to ensure that the latent images $\{L_{i-W},L_{i-W+1},\ldots ,L_{i+W}\}$ within a local range of the current image $\mathbf {B}_{i}$ are available, we add a pre-processing ahead of deblurring. We utilize the alternating iteration solution for Eq. (19) to generate intermediate latent images with much fewer iterations than the deblurring process. At first, there is only $1$ latent image, i.e., the well-focused channel before pre-processing, while the latent images are all available, including $15$ intermediate latent images and $1$ real latent image after pre-processing. Although these intermediate latent images inevitably have some artifacts, they can provide part details as the initialization in the kernel estimation and deblurring process.

Parameters setting. There are two parameters in Eq. (19), i.e., $\mu$ and $\eta$. The sub-problem of Eq. (20) that contains $\eta$ is the same one with the expression of $P_{2}$ in [17], whereupon we utilize the generalized cross validation (GCV) for $\eta$.

Then, we estimate $\mu$ from the standard deviations of the Gaussian kernels. Generally, the larger standard deviation means a higher level of blurriness, which results in fewer details in the original images. The value of $\mu$ needs to be increased to make use of details in images at the adjacent channels. So we propose to set the parameter $\mu$ by the following empirical formula:

(24)$$\mu_{i}=\hat{\sigma}_{i}^{4}$$

where $\hat {\sigma }_{i}$ denotes the standard deviation of the Gaussian kernel at the $i$-th channel obtained by the MLS optimization in Sec. 3.2.2. The experiments in this paper demonstrate the effectiveness of this parameter setting to obtain high-quality deblurring results.

3.3.4 Analysis of gradient prior

We explain why we exploit the gradient-based prior instead of the intensity-based prior in [17] and demonstrate the advantages of our image prior as well as the reference image selection.

For multispectral images, the images at adjacent channels hold high correlations due to the local consistencies in the spectral dimension. Thus, the reference image constructed from neighboring channels is with high similarity with the latent image. By contrast, when a specific channel (namely the target channel) is deblurred, the channels far away from the target channel have little contribution to the guiding image (see Fig. 3 for examples). Moreover, some unfit information is introduced due to the selection of channels from the entire spectrum. Figure 4 shows an example that has the quality of guiding images significantly reduced due to the obvious artifacts, which is even worse than the simple average intensity of all the channels. Compared to guiding image construction in [17], we just select the most relevant channels to prevent abovementioned unfit information and retain most of the effective information, which gains the faithfulness in constructing the reference images in an efficient manner.

Fig. 3. The average weights of each channel in the entire spectrum to the target channels. These weights are all computed with ridge regression described in [17]. The three target channels in this figure are 2-nd channel (red), 12-th channel (orange) and 13-th channel (blue) respectively.

Download Full Size | PDF

Fig. 4. Comparison of the guiding image and the reference image corresponding to the $16$-th channel of the example in Fig. 1. (a) shows the ground truth image. (b) shows the guiding image obtained by [17]. (c) and (d) show the reference images obtained by using the average intensity and our method.

Download Full Size | PDF

With the usage of intensity-based prior, i.e., $||\mathbf {L}-\mathbf {R}||_{2}^2$, the intensities of deblurring result are forced to be close to the reference image everywhere in the whole spatial domain. However, the intensities of reference image differ from latent sharp image slightly due to minor differences between adjacent channels. These differences can easily lead to inaccurate results under the constraint of intensity-based prior. On the contrary, the gradient-based prior just impels the deblurred image to keep close with the reference image in the gradient domain, which means the data fidelity term maintains dominant roles in flat areas (i.e., the areas with minor gradient values). As a result, the inaccuracies introduced by the reference image are eliminated in flat areas. Figure 5 shows a 1D signal example, in which the reference image signal has slight differences with ground truth signal on both intensities and gradients. Here, we just replace the image prior in Eq. (21) by $\mu ||\mathbf {L}-\mathbf {R}||^2$ to form the intensity-based constraint as [17]. With the change of the weighting parameter $\lambda$, the restored signal by the intensity-based prior easily drifts off from the ground truth signal, which leads to unsatisfied result as well as the difficulties in parameter-tuning. By contrast, our gradient-based prior can reduce the errors even with large weighting parameter by imposing the constraint in the gradient domain.

Fig. 5. The deblurred results of a 1D signal by using gradient-based prior and intensity-based prior respectively with different parameters.

Download Full Size | PDF

4. Data analysis

We have implemented our kernel estimation and deblurring method for defocus multispectral images. In this section, we report the results by our method and some comparisons with other state-of-the-art multispectral image deblurring methods [17,19] as well as CA correction methods [5,13]. We choose three public multispectral image datasets for the tests: Columbia Vision Laboratory multispectral dataset (CAVE) [37], Interdisciplinary Computational Vision Lab multispectral dataset (ICVL) [38] and Deblurring with Inter-channel Correlation dataset (DIC) [17]. Here, we use the examples from CAVE and ICVL to construct the synthetic data, while DIC provides the real data of the defocus multispectral images.

All the experiments were done on a desktop computer with a $3.3$ GHz Intel Quad Core CPU and $8$GB RAM, and all the methods were implemented based on the Matlab programming. Four metrics are used for the quantitative evaluation, including the peak signal-to-noise ratio (PSNR), structure similarity (SSIM) [39,40], erreur relative globale adimensionnelle de synthese (ERGAS) [41] and spectral angle map (SAM) [42]. Generally, the larger PSNR/SSIM values or smaller ERGAS/SAM values indicate the better quality. We also made the statistics on the average computation time per channel to compare the computational cost of different deblurring methods. Next, we elaborate the details of our experiments.

To demonstrate the superiority of our method, we make the comparison with other multispectral image deblurring methods like [17,19]. The DIC method in [17] is a classic blind deblurring method for defocus multispectral images, which realizes both kernel estimation and image prior for deblurring, and is the most closely related to ours. The spectral-spatial total variation (SSTV) method in [19] is a 3D tensor-based non-blind deblurring method, which provides only the image prior with the known blur kernel. For those two methods, we use their default parameters in the papers, except for setting the same kernel size with ours, i.e., $31\times 31$. Next, we show the evaluation based on the synthetic data and the real data respectively.

4.1 Synthetic data

We synthesize the defocus multispectral images based on the examples from CAVE [37] and ICVL [38], where the wavelengths of the images of an example range from $400$nm to $700$nm with the interval of $10$nm, i.e., 31 channels in total for each example. To better simulate the defocus blur close to the real data, we apply the CoCs corresponding to different wavelengths. Firstly, the 16-th channel (at the wavelength of 550nm) is selected as the well-focused channel. Then, we get the diameters of CoCs for each channel by using the optics software ZEMAX (www.zemax.com), whereupon the standard deviations of Gaussian kernels for all the 31 channels can be calculated by applying the linear multiplication on the CoC diameters. Finally, the images at different channels are blurred by using these Gaussian kernels as well as adding $1\%$ random noise. Thus, we have the synthetic defocus multispectral images.

Since both the blur kernel and the sharp image at each channel are known as the ground truth, we can evaluate the accuracy of kernel estimation, the effectiveness of image priors, and the influence of noise and channel number for the methods of DIC [17], SSTV [19] and our method. Hence, we design a set of tests by combining different strategies of kernel estimation and image priors from the three methods. Additionally, it should be noted that we have 31 Gaussian kernels for multispectral images of each example and apply them to [17] and our method in the channel-by-channel manner. Besides, as mentioned in [19], we apply those kernels to SSTV by assuming that the overall PSF of the data cube is the linear combination of PSFs. Next, we discuss the details of the comparisons with respect to quantitative evaluation, noise and channel number.

Quantitative evaluation. Table 1 shows the 9 types of tests (#1$\sim$#9) and the statistics on the deblurring results based on different quantitative evaluation. For example, the #5 test is done by using the estimated kernel from our method but the image prior from the non-blind deblurring method of SSTV [19], while the #6 test is done by using the estimated kernel and image prior that are both from our method. Because the kernel used in the #5 and #6 tests is the same one, it indicates that our method realizes a better image prior than SSTV, which has the larger PSNR/SSIM values and smaller ERGAS/SAM values.

Table 1. The quantitative evaluation based on the synthetic data by the tests with different kernel estimation and image prior in the methods of DIC [17], SSTV [19], our method and the ground truth.

View Table | View all tables in this article

Generally, among the first six tests (#1$\sim$#6), our method (#6) performs better than other methods, which show the effectiveness of our kernel estimation and gradient prior. Moreover, we can infer from the statistics in #6 and #9 that the estimated kernels by our method are close to the ground truth kernel. Our gradient prior also shows its efficiency by the comparison among #7, #8 and #9. We will further show the detailed comparison in Sec. 5.

The noise influence. To evaluate how the noise affects the deblurring effects, we blur some examples from CAVE and ICVL, and add the handcrafted noise of different degrees, e.g., from $0.01\%$ to $10\%$. Then, we compare the results by our method with DIC [17] and SSTV [19]. Table 2 shows the statistics on the average PSNR values of the selected 16 channels with the largest blurriness. It can be seen that our method is more robust to the noise of small and moderate levels than the other two methods. But when the noise becomes very large, the deblurring results degrade severely for all the three methods (see Fig. 6 for example). This is because the quality of the reference image gets even worse to guide the correct deblurring of images at other channels.

Fig. 6. Deblurring results of three methods with the ground truth kernel on the sushi example from CAVE. These results are obtained by deblurring the synthetic data with difference levels of noise: (a) $0.01\%$, (b) $0.1\%$, (c) $1\%$, (d) $5\%$, and (e) $10\%$.

Download Full Size | PDF

Table 2. The influence of different level noise on the deblurring results. The average PSNR values are computed based on the most blurry 16 images from all the 31 channels.

View Table | View all tables in this article

The channel number influence. In fact, the number of channels affects the spectral quality of multispectral images. More concretely, the number of channels affects the wavelength difference between adjacent channels. Consequently, the local structural similarity of the multispectral images is also affected. We choose a subsequence of images from a CAVE example to form a new example of multispectral images with fewer channels. Then, we do the deblurring process by our method, DIC and SSTV respectively. Figure 7 shows the PSNR values of the deblurring results by setting different channel numbers. It can be seen that the quality of the restored images by all the three methods gets better rapidly when the number of channels increases, especially when the number is less than 10. Then, the increase of PSNR values slows down until reaching the maximum number, e.g., the 31 channels. Concretely, when the number is 10, the blur difference between adjacent channels is maximized where the two standard deviations of the 2D Gaussian PSFs are $2.51$ and $1.62$ in our synthetic data. If the difference of blur between adjacent channels is larger than this maximum value, the performance of the three methods gets worse rapidly. Overall, our method still generates the better deblurring results with higher PSNR values than the other two methods.

Fig. 7. The influence of the channel number on the deblurring results from CAVE dataset.

Download Full Size | PDF

4.2 Real data

The DIC dataset in [17] includes the real defocus multispectral images, which have the wavelength range from 400nm to 700nm with the interval of 20nm, i.e., 16 channels in total for each example. The well-focused channel corresponding to the blur-free image is the 9-th one. In this scenario, we design 6 types of tests in Table 3, where the kernels are estimated by DIC [17] and our method, and the image priors from DIC, SSTV [19] and our method respectively. Table 3 presents the 6 tests (#10$\sim$#15) and the statistics on the deblurring results based on different quantitative evaluations, in which our method is still superior to other methods because of the higher PSNR/SSIM values and lower ERGAS/SAM values.

Table 3. The quantitative evaluation based on the real data by the tests with different kernel estimation and image priors from the methods of DIC [17], SSTV [19] and our method.

View Table | View all tables in this article

5. Discussion

We discuss the details of the comparisons with respect to kernel estimation and image priors.

5.1 Accuracy of kernel estimation

Because SSTV [19] is a non-blind deblurring method, we compare the accuracy of kernel estimation by our method just with DIC [17] and the ground truth (only for the synthetic data). Here, we choose the same image prior in deblurring the synthetic data to examine the accuracy of kernel estimation. Therefore, the 15 tests in Table 1 and Table 3 can be classified into 6 groups: $\mbox {G}_1=\{{\#1, \#4, \#7}\}$, $\mbox {G}_2=\{{\#2, \#5, \#8}\}$, $\mbox {G}_3=\{{\#3, \#6, \#9}\}$, $\mbox {G}_4=\{{\#10, \#13}\}$, $\mbox {G}_5=\{{\#11, \#14}\}$ and $\mbox {G}_6=\{{\#12, \#15}\}$ , according to the use of different image priors. For the tests in $\mbox {G}_1$ and $\mbox {G}_4$ with the common image prior from DIC, it can be seen that the kernel estimated by our method (#4 and #13) enables much better deblurring quality than DIC (#1 and #10), and has the comparable results with the use of the ground truth kernel (#7). For the tests in $\mbox {G}_3$ and $\mbox {G}_6$ with the common image prior by our method, our kernel estimation (#6 and #15) also enables the deblurring effects much better than DIC (#3 and #12), and much close to the use of the ground truth kernel (#9). Note that our kernel estimation is not better than the DIC kernel estimation on the DIC dataset in all four metrics and the ICVL dataset in PSNR and ERGAS metrics for the tests in $\mbox {G}_2$ and $\mbox {G}_5$. This is because the SSTV method assumes that the overall PSF of the data cube is the linear combination of PSFs, which makes our kernel estimation inaccurate for these tests. However, our kernel estimation performs better than DIC kernel estimation in other tests, which still demonstrates the superiority of our kernel estimation in general.

Actually, DIC infers the kernel based on the statistics of the intensity deterioration with respect to the latent images. However, the large blurriness reduces the image correlations and thus results in intolerable error for estimated kernels. Contrarily, our method further exploits the physical model based on the relationship between wavelengths and blur kernel, which is usually able to obtain more accurate kernel estimation for better deblurring results than DIC, and very close to the ground truth. Figure 8 shows an real data example of the deblurring results at the $1$-st channel from the tests of #10 and #13. It can be seen that the result with our kernel has a better restoration on the appearance with respect to the ground truth image, which also brings a significant increase on the PSNR. We also make the statistics on the PSNR values for all the real data tests over all the 16 channels for DIC dataset (see Fig. 9). In summary, our method gains prominence on the deblurring results with the higher PSNR values.

Fig. 8. The comparison on the deblurring results for the $1$-st channel of an example by using our kernel and the one from DIC [17], but with the same image prior from DIC.

Download Full Size | PDF

Fig. 9. The comparison on the deblurring results in the 6 tests listed in Table 3. The PSNR values are from $12$ examples in the DIC dataset and averaged for each channel.

Download Full Size | PDF

It should be noted that the only difference in obtaining the above results is the kernel estimation, which indicates that our method gets more accurate kernels than DIC [17]. The reason is that DIC [17] adopts the intensity similarity to find the optimal kernel by deblurring the guiding image of the well-focused channel. However, the well-focused channel is usually located in the middle of all the channels (e.g., the $9$-th channel for real data), which might have large appearance difference between images at the well-focused channel and the others. For example in Fig. 1, the flower in the images at the well-focused channel and lower channel are clearly visible but completely invisible at the $16$-th channel, which exhibits obvious different image contents. This causes the less accuracy in kernel estimation, thus the quality degradation in the deblurred images, especially for images at the channels far away from the well-focused one. On the contrary, our kernel estimation is endowed with the prior from sophisticated physical model, and resolved in cooperation with the MLS optimization. This enables our method to produce more accurate kernels for deblurring the images at all the channels.

5.2 Effectiveness of image prior

We set the common kernels to evaluate the effectiveness of the image priors proposed in DIC [17], SSTV [19] and our method. The 15 tests in Table 1 and Table 3 can be classified into another 5 groups: $\mbox {G}_7=\{\mbox {\#1, \#2, \#3}\}$, $\mbox {G}_8=\{\mbox {\#4, \#5, \#6}\}$, $\mbox {G}_9=\{\mbox {\#7, \#8, \#9}\}$, $\mbox {G}_{10}=\{\mbox {\#10, \#11, \#12}\}$ and $\mbox {G}_{11}=\{\mbox {\#13, \#14, \#15}\}$, according to the use of different kernels. For the tests in $\mbox {G}_8$ and $\mbox {G}_{11}$ with the common kernel estimated by our method, it can be seen that our method (#6 and #15) generates better results than both DIC itself (#4 and #13) and SSTV (#5 and #14). This means our image prior is more effective towards the defocus deblur than DIC and SSTV with the same kernel. The results in the tests of other groups also verify the effectiveness of our method.

Actually, SSTV can not well handle the variation of kernels over a range of wavelengths, which reduces the quality of the deblurring results. However, some unfit information is suppressed because of the usage of local prior in their model. But taken together, they are still considered to be having negative effects on the deblurring results (see Fig. 10). While DIC combines the images through all the channels, it might induce the tangled artifacts from the channels far away with each other. On the contrary, our method considers the change of kernels caused by different wavelengths as well as the structure characterized by the gradients, so it significantly improves the quality of restored sharp images with the higher PSNR value as shown in Fig. 10. A more obvious example can be seen in Fig. 11 that shows a typical example for the artifact of the guiding image from a test in the group $\mbox {G}_{11}$, where the region in the red box of the original image is not easily recognizable, but in the guiding image it is noticeable. This is because the information in the images corresponding to the lower wavelengths are tangled into the guiding image.

Fig. 10. Deblurring result of three methods with the ground truth kernel on the beads example from CAVE. Unfit blur kernel and strong noise incur the obvious artifact in DIC [17] and SSTV [19], while our reference images have the spectral correlation to suppress the noise well.

Download Full Size | PDF

Fig. 11. Input image of the example Stamp_1 at the 16-th channel and its deblurring results with the same kernel by our method. (a) The ground truth image. (b)The input image. (c) The guiding image by DIC [17]. (d) The reference image by our method. (e) The deblurring result by DIC [17]. (f) The deblurring result by SSTV [19]. (g) Our deblurring result.

Download Full Size | PDF

Additionally, DIC usually comes at the considerably higher computational cost than our method. It takes about 6.05 seconds to perform kernel estimation and deblurring for the image of each channel on average, which is much slower than our method (about 1.85 seconds). Actually, due to the inaccuracy in kernel estimation, DIC needs more iterations in the MAP optimization, typically $20\sim 30$ times, to obtain the stable solution of a good restoration. As for SSTV, the combinational usage of kernels reduces the accuracy of PSF and slows down the convergence rate, whereupon it takes about $15\sim 20$ iterations towards a good restoration. On the contrary, our method only needs 10 iterations to obtain the desirable results.

5.3 Limitation

Although our method enables appealing results, it is not without limitations. We utilize the MSE to measure the blurriness of images at the adjacent channels in the stage of kernel estimation. But there are sometimes minor differences between these image appearances, which might cause the estimated kernel by our method still a little different from the ground truth. The gradient prior might have slight influence in deblurring the images with severe noise.

Additionally, our method deals with defocus blur images caused by the chromatic aberrations, which might not perform well on other types of blur, e.g., coma, transverse aberration, or distortion. There are two reasons. Firstly, it is impractical using our physical model to describe other types of blur that are caused by using more complex lenses, e.g., the achromatic lens with a concave lens. And the use of Gaussian PSFs to model these types of blur might lead to inaccurate kernel estimation. Secondly, the non-uniform blur in the spatial domain can also degrade the performance of our method. Actually, we have synthesized some non-uniform-blur examples by dividing images at each channel into several blocks and re-blurring these blocks with Gaussian PSFs of different standard deviations (randomly generated between 0 and 2). The results by our method show that the PSNR values of the deblurred images reduce over 0.5dB compared with the uniform-blur examples.

6. Conclusion

We have presented a novel method for deblurring defocus multispectral images. The use of physics and gradient prior enables our method with more accurate kernel estimation and high-quality restored images. The computational cost of our method is also lower than state-of-the-art deblurring methods.

As the future work, we plan to extend the physical model of blur kernel from the Gaussian PSF to a large functional space, and the gradient prior to 3D tensor based model. This would furthermore improve the deblurring quality in the case of some other blur types besides of defocus blur.

Appendices

A. Proof of Eq. (3)

For a single convex lens, the relationship among object distance, image distance and focal length can be written as

(25)$$1/d_i+1/{u_i} =1/{f_i}, 1/{\tilde{d}}+1/{\tilde{u}}=1/{\tilde{f}},$$

where $d_i$, $u_i$ and $f_i$ are the object distance, image distance and focal length for the $i$-th channel. And $\tilde {d}$, $\tilde {u}$ and $\tilde {f}$ represent the same meanings for the well-focused channel. Besides, the object distances for the two channels are the same. We denote the object distance by $d=d_i=\tilde {d}$. Next, we discuss the radius in the following two cases.

i) When $u_i>\tilde {u}$, as shown in Fig. 12(a), we derive the form of the radius of CoC based on Eq. (25) as

(26)$$r_i = A\frac{u_i - \tilde{u}}{u_i} = \frac{dA}{f_i}\frac{f_i-\tilde{f}}{d-\tilde{f}},$$

where $A$ is the aperture radius of the lens and $r_i$ denotes the radius of CoC in the imaging plane at i-th channel.

Fig. 12. The two cases for the radius of CoC

Download Full Size | PDF

ii) When $u_i<\tilde {u}$, as shown in Fig. 12(b) the $r_i$ can be derived as

(27)$$r_i = A\frac{\tilde{u}-u_i}{u_i} = \frac{dA}{f_i}\frac{\tilde{f}-f_i}{d-\tilde{f}}.$$

Finally, we get the form of the radius of CoC as

(28)$$r_i = \frac{dA}{f_i}\frac{|f_i-\tilde{f}|}{d-\tilde{f}}.$$

B. Proof of Eq. (4)

With the focal length $f_{s}$ for the well-focused channel and $r_{s}$ is its corresponding CoC radius, we have

(29)$${r_{i}}/{r_{s}}=\left({f_{s}}{|f_{i}-\tilde{f}|}\right)/{\left(f_{i}|f_{s}-\tilde{f}|\right)}$$

where the variables $d$ and $A$ in Eq. (3) can be eliminated by division. To further simplify this relationship, we borrow the famous lensmaker’s formula [35] for the thin lens, i.e.,

(30)$$\frac{1}{f_i} = (n_i-1)\left[ \frac{1}{C_{1}}-\frac{1}{C_{2}} \right]$$

where $C_{1}$ and $C_{2}$ denote the curvature radii of the lens surface closest to the object and the one farthest from object, and $n_i$ represents the refractive index at the $i$-th channel. Then, the ratio in Eq. (29) becomes

(31)$${r_{i}}/{r_{s}}={|\tilde{n}-n_{i}|}/{|\tilde{n}-n_{s}|}$$

where $\tilde {n}$ is the refractive index corresponding to the wavelength $\tilde {\lambda }$.

Furthermore, we keep the first two items of the Cauchy’s dispersion formula to represent the refractive index as

(32)$$n(\lambda)=\alpha+\frac{\beta}{\lambda^{2}}+\frac{\gamma}{\lambda^{4}}+\cdots$$

where $\alpha$, $\beta$ and $\gamma$ are the parameters that only depend on the lens material.

It has been verified that the first two items of the Cauchy’s equation are sufficient to represent the relationship between the refractive index and the wavelength in the range of visible spectrum. Thus, Eq. (31) can be simplified to be

(33)$${r_{i}}/{r_{s}}={|\frac{1}{\tilde{\lambda}^{2}}-\frac{1}{\lambda_{i}^{2}}|}/{|\frac{1}{\tilde{\lambda}^{2}}-\frac{1}{\lambda_{s}^{2}}|} $$

Additionally for standard deviation $\bar {{\sigma }}_{i}$ of the Gaussian kernel corresponding to the CoC with its radius $r_{i}$, there is a linear relationship between the two variables, i.e., ${{\bar \sigma }_i}=g\cdot r_{i}$ for a global constant $g$ [34]. Therefore, the relationship between ${{\bar \sigma }}_{i}$ and $\bar {{\sigma }}_{s}$ can be rewritten as

(34)$${\bar{\sigma}_{i}}/{\bar{\sigma}_{s}}={r_{i}}/{r_{s}} $$

Finally, we derive the form of the standard deviation $\sigma _{i}$ as

(35)$$\sigma_{i} \approx \sigma_{s \to i} = \sqrt{\bar{\sigma}_{i}^{2}-\bar{\sigma}_{s}^2} ={\bar\sigma_{s}} \sqrt{\frac{|\frac{1}{\tilde{\lambda}^{2}}-\frac{1}{\lambda_{i}^{2}}|^{2}}{|\frac{1}{\tilde{\lambda}^{2}}-\frac{1}{\lambda_{s}^{2}}|^{2}}-1} $$

where $\bar {\sigma }_{s}$, $\tilde {\lambda }$ and $\lambda _{s}$ are all fixed. Thus, $\sigma _{i}$ can be taken as a quadratic polynomial with respect to the wavelength.

Moreover, as the simple concatenation of single lens, simple lenses can be approximately regarded as a single lens. Therefore, it can also be taken as a quadratic polynomial with respect to the wavelength. Finally, we get the quadratic polynomial as

(36)$$\sigma_{i} \approx \frac{a}{\lambda_i^2}+\frac{b}{\lambda_i}+c.$$

Funding

National Natural Science Foundation of China (61922014).

Disclosures

The authors declare that there are no conflicts of interest related to this article.

References

1. J. M. Bioucas-Dias, A. Plaza, G. Camps-Valls, P. Scheunders, N. Nasrabadi, and J. Chanussot, “Hyperspectral remote sensing data analysis and future challenges,” IEEE Geosci. Remote Sens. Mag. 1(2), 6–36 (2013). [CrossRef]

2. L. Wang, Z. Xiong, D. Gao, G. Shi, and F. Wu, “Dual-camera design for coded aperture snapshot spectral imaging,” Appl. Opt. 54(4), 848–858 (2015). [CrossRef]

3. T. Takatani, T. Aoto, and Y. Mukaigawa, “One-shot hyperspectral imaging using faced reflectors,” in Proc. CVPR, (2017), pp. 2692–2700.

4. J. Bardsley, S. Jefferies, J. Nagy, and R. Plemmons, “A computational method for the restoration of images with an unknown, spatially-varying blur,” Opt. Express 14(5), 1767–1782 (2006). [CrossRef]

5. T. Sun, Y. Peng, and W. Heidrich, “Revisiting cross-channel information transfer for chromatic aberration correction,” in Proc. ICCV, (2017), pp. 3268–3276.

6. S. B. Kang, “Automatic removal of chromatic aberration from a single image,” in Proc. CVPR, (2007), pp. 1–8.

7. T. Yue, J. Suo, J. Wang, X. Cao, and Q. Dai, “Blind optical aberration correction by exploring geometric and visual priors,” in Proc. CVPR, (2015), pp. 1684–1692.

8. Q. Shan, J. Jia, and A. Agarwala, “High-quality motion deblurring from a single image,” ACM Trans. Graph. 27(3), 1–10 (2008). [CrossRef]

9. D. Krishnan, T. Tay, and R. Fergus, “Blind deconvolution using a normalized sparsity measure,” in Proc. CVPR, (2011), pp. 233–240.

10. J. Pan, D. Sun, H. Pfister, and M.-H. Yang, “Deblurring images via dark channel prior,” IEEE Trans. Pattern Anal. Mach. Intell. 40(10), 2315–2328 (2018). [CrossRef]

11. A. Levin, Y. Weiss, F. Durand, and W. T. Freeman, “Efficient marginal likelihood optimization in blind deconvolution,” in Proc. CVPR, 2657, 2664, (2011

12. C. J. Schuler, M. Hirsch, S. Harmeling, and B. Schölkopf, “Non-stationary correction of optical aberrations,” in Proc. ICCV, 659, 666, (2011

13. F. Heide, M. Rouf, M. B. Hullin, B. Labitzke, W. Heidrich, and A. Kolb, “High-quality computational imaging through simple lenses,” ACM Trans. Graph. 32(5), 1–14 (2013). [CrossRef]

14. Y. Bai, G. Cheung, X. Liu, and W. Gao, “Graph-based blind image deblurring from a single photograph,” IEEE Trans. Image Process. 28(3), 1404–1418 (2019). [CrossRef]

15. C.-C. Lee and W.-L. Hwang, “Sparse representation of a blur kernel for out-of-focus blind image restoration,” in Proc. ICIP, (2016), pp. 2698, 2702.

16. L. Chen, F. Fang, T. Wang, and G. Zhang, “Blind image deblurring with local maximum gradient prior,” in Proc. CVPR, (2019), pp. 1742, 1750.

17. S.-J. Chen and H.-L. Shen, “Multispectral image out-of-focus deblurring using interchannel correlation,” IEEE Trans. Image Process. 24(11), 4433–4445 (2015). [CrossRef]

18. I. Kopriva, “Tensor factorization for model-free space-variant blind deconvolution of the single-and multi-frame multi-spectral image,” Opt. Express 18(17), 17819–17833 (2010). [CrossRef]

19. H. Fang, C. Luo, Z. Gang, and X. Wang, “Hyperspectral image deconvolution with a spectral-spatial total variation regularization,” Can. J. Remote Sens. 43(4), 384–395 (2017). [CrossRef]

20. L. Geng, X. Nie, S. Niu, Y. Yin, and J. Lin, “Structural compact core tensor dictionary learning for multispec-tral remote sensing image deblurring,” in Proc. ICIP, (IEEE, 2018), pp.2865–2869.

21. L. Yang and H. Ji, “A variational em framework with adaptive edge selection for blind motion deblurring,” in Proc. CVPR, (2019), pp. 10167–10176.

22. L. Li, J. Pan, W.-S. Lai, C. Gao, N. Sang, and M.-H. Yang, “Learning a discriminative prior for blind image deblurring,” in Proc. CVPR, (2018), pp. 6616, 6625.

23. O. Kupyn, V. Budzan, M. Mykhailych, D. Mishkin, and J. Matas, “Deblurgan: Blind motion deblurring using conditional adversarial networks,” in Proc. CVPR, (2018), pp. 8183, 8192.

24. H. Zhang and L. Carin, “Multi-shot imaging: Joint alignment, deblurring, and resolution-enhancement,” in Proc. CVPR, (2014), pp. 2925–2932.

25. H. Zhang, D. Wipf, and Y. Zhang, “Multi-observation blind deconvolution with an adaptive sparse prior,” IEEE Trans. Pattern Anal. Mach. Intell. 36(8), 1628–1643 (2014). [CrossRef]

26. T.-C. Lin, L. Hou, H. Liu, Y. Li, and T.-K. Truong, “Reconstruction of single image from multiple blurry measured images,” IEEE Trans. Image Process. 27(6), 2762–2776 (2018). [CrossRef]

27. M. Masoudifar and H. R. Pourreza, “Analysis and design of coded apertures for defocus deblurring based on imaging system properties and optical features,” IET Image Process. 11(12), 1123–1134 (2017). [CrossRef]

28. X. L. Zhao, F. Wang, T. Z. Huang, M. K. Ng, and R. J. Plemmons, “Deblurring and sparse unmixing for hyperspectral images,” IEEE Trans. Geosci. Remote Sens. 51(7), 4045–4058 (2013). [CrossRef]

29. Z. Ping, P. Ni, and R. Wang, “Learning to diversify patch-based priors for remote sensing image restoration,” IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 8(11), 5225–5245 (2015). [CrossRef]

30. Y. Song, D. Brie, E.-H. Djermoune, and S. Henrot, “Regularization parameter estimation for non-negative hyperspectral image deconvolution,” IEEE Trans. Image Process. 25(11), 5316–5330 (2016). [CrossRef]

31. Y. Chang, L. Yan, and S. Zhong, “Transformed low-rank model for line pattern noise removal,” in Proc. ICCV, (2017), pp. 1726–1734.

32. X. Liu, H. Shen, Q. Yuan, X. Lu, and C. Zhou, “A universal destriping framework combining 1-d and 2-d variational optimization methods,” IEEE Trans. Geosci. Remote Sens. 56(2), 808–822 (2018). [CrossRef]

33. D. Danon, H. Averbuch-Elor, O. Fried, and D. Cohen-Or, “Unsupervised natural image patch learning,” Computational Visual Media 5(3), 229–237 (2019). [CrossRef]

34. G. Xu, Y. Quan, and H. Ji, “Estimating defocus blur via rank of local patches,” in Proc. ICCV, (2017), pp. 5371–5379.

35. E. Hecht, “Optics 4th edition,” Addison Wesley Longman Inc 1 (1998).

36. H. Takeda, S. Farsiu, and P. Milanfar, “Kernel regression for image processing and reconstruction,” IEEE Trans. Image Process. 16(2), 349–366 (2007). [CrossRef]

37. F. Yasuma, T. Mitsunaga, D. Iso, and S. K. Nayar, “Generalized assorted pixel camera: postcapture control of resolution, dynamic range, and spectrum,” IEEE Trans. Image Process. 19(9), 2241–2253 (2010). [CrossRef]

38. B. Arad and O. Ben-Shahar, “Sparse recovery of hyperspectral signal from natural rgb images,” in Proc. ECCV, (Springer, 2016), pp. 19–34.

39. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. Image Process. 13(4), 600–612 (2004). [CrossRef]

40. X. Wang, X. Liang, B. Yang, and F. W. Li, “No-reference synthetic image quality assessment with convolutional neural network and local image saliency,” Comp. Visual Media 5(2), 193–208 (2019). [CrossRef]

41. L. Wald, Data fusion: definitions and architectures: fusion of images of different spatial resolutions (Presses des MINES, 2002).

42. R. H. Yuhas, J. W. Boardman, and A. F. Goetz, Determination of semi-arid landscape endmembers and seasonal trends using convex geometry spectral unmixing techniques (1993).

No.	Kernel	Image prior	PSNR	SSIM	ERGAS	SAM	PSNR	SSIM	ERGAS	SAM
No.	Types of tests		Synthetic data based on CAVE [37]				Synthetic data based on ICVL [38]
#1	DIC [17]	DIC [17]	42.10	0.8069	60.78	0.1279	44.48	0.9435	22.79	0.1794
#2	DIC [17]	SSTV [19]	43.05	0.9177	34.02	0.0957	44.50	0.9439	20.29	0.1059
#3	DIC [17]	Ours	44.68	0.9391	28.2	0.0897	44.86	0.9513	20.70	0.1030
#4	Ours	DIC [17]	45.42	0.9640	22.69	0.0790	44.68	0.9114	31.05	0.2584
#5	Ours	SSTV [19]	45.36	0.9666	23.30	0.0755	44.22	0.9585	20.67	0.1043
#6	Ours	Ours	45.94	0.9673	22.02	0.0727	45.88	0.9631	19.63	0.0997
#7	Ground truth	DIC [17]	45.38	0.9245	31.68	0.0940	45.27	0.9625	18.64	0.1011
#8	Ground truth	SSTV [19]	45.65	0.9673	22.58	0.0726	45.40	0.9619	18.02	0.0955
#9	Ground truth	Ours	46.01	0.9696	22.02	0.0728	45.89	0.9634	17.98	0.0943

Methods	PSNR values (dB) with the noise of different levels(%)
	0.01%	0.1%	1%	5%	10%
	DIC [17]	36.9	36.8	32.7	21.4	15.9
SSTV [19]	30.1	30.1	28.2	18.1	16.0
Ours	39.5	39.3	32.9	21.7	16.1

No.	Kernel	Image prior	PSNR	SSIM	ERGAS	SAM
#10	DIC [17]	DIC [17]	35.72	0.9064	48.14	0.0005
#11	DIC [17]	SSTV [19]	31.71	0.6464	78.16	0.0018
#12	DIC [17]	Ours	35.81	0.9030	47.87	0.0005
#13	Ours	DIC [17]	36.01	0.9076	47.48	0.0004
#14	Ours	SSTV [19]	31.62	0.6358	78.57	0.0030
#15	Ours	Ours	36.26	0.9210	45.70	0.0003

No.	Kernel	Image prior	PSNR	SSIM	ERGAS	SAM	PSNR	SSIM	ERGAS	SAM
No.	Types of tests		Synthetic data based on CAVE [37]				Synthetic data based on ICVL [38]
#1	DIC [17]	DIC [17]	42.10	0.8069	60.78	0.1279	44.48	0.9435	22.79	0.1794
#2	DIC [17]	SSTV [19]	43.05	0.9177	34.02	0.0957	44.50	0.9439	20.29	0.1059
#3	DIC [17]	Ours	44.68	0.9391	28.2	0.0897	44.86	0.9513	20.70	0.1030
#4	Ours	DIC [17]	45.42	0.9640	22.69	0.0790	44.68	0.9114	31.05	0.2584
#5	Ours	SSTV [19]	45.36	0.9666	23.30	0.0755	44.22	0.9585	20.67	0.1043
#6	Ours	Ours	45.94	0.9673	22.02	0.0727	45.88	0.9631	19.63	0.0997
#7	Ground truth	DIC [17]	45.38	0.9245	31.68	0.0940	45.27	0.9625	18.64	0.1011
#8	Ground truth	SSTV [19]	45.65	0.9673	22.58	0.0726	45.40	0.9619	18.02	0.0955
#9	Ground truth	Ours	46.01	0.9696	22.02	0.0728	45.89	0.9634	17.98	0.0943

Methods	PSNR values (dB) with the noise of different levels(%)
	0.01%	0.1%	1%	5%	10%
	DIC [17]	36.9	36.8	32.7	21.4	15.9
SSTV [19]	30.1	30.1	28.2	18.1	16.0
Ours	39.5	39.3	32.9	21.7	16.1

High-quality blind defocus deblurring of multispectral images with optics and gradient prior

Abstract

1. Introduction

2. Related works

2.1 Natural image deblurring

2.2 Multispectral image deblurring

3. Methods

3.1 Degradation model

3.2 Defocus blur kernel estimation with optics prior

3.2.1 Physical model

3.2.2 Kernel estimation

3.3 Content-adaptive deblurring based on gradient prior

3.3.1 Content-adaptive reference image selection

3.3.2 Deblurring with the gradient prior

3.3.3 Implementation

3.3.4 Analysis of gradient prior

4. Data analysis

4.1 Synthetic data

4.2 Real data

5. Discussion

5.1 Accuracy of kernel estimation

5.2 Effectiveness of image prior

5.3 Limitation

6. Conclusion

Appendices

A. Proof of Eq. (3)

B. Proof of Eq. (4)

Funding

Disclosures

References

Cited By

Figures (12)

Tables (3)

Equations (39)

Optics Express