How to deal with color in super resolution reconstruction of images

Rui Gong; Yi Wang; Yilin Cai; Xiaopeng Shao

doi:10.1364/OE.25.011144

1. Introduction

High resolution (HR) images are often desired in applications since they can supply more detailed information. Other than reducing pixel size or increasing chip size at the aspect of hardware, super resolution (SR) reconstruction is a promising technology to achieve HR images from low resolution (LR) images without updating imaging devices, which is a resolution enhancement approach at the aspect of signal processing [1, 2]. Generally, SR reconstruction requires one frame or multiple frames of LR images [3–5].

Recently, learning-based algorithms became an open and widely investigated topic in this field, which stemmed from using a database of training images to create plausible high-frequency details in zoomed images [6]. Inspired by the compressed sensing theory, a sparse-coding-based SR algorithm was proposed [7, 8], which opened up a favorable situation of learning-based SR studies. Most researches in this field were focused on the improvements of the sparse-coding-based method, such as to perfect the process of sparse representation [9], or to make the sparse domain selection and regularization become adaptive [10]. These algorithms showed their advantage of combining the apriori knowledge from LR images since they could induce more high frequency information from training samples. However, their reconstruction performances depend on the size of the formed dictionaries (the amount of atoms), so that improving the amount of atoms may help to achieve better reconstruction effects, but would increase computational complexity. To resolve this problem from the aspect of optimizing dictionary training, more studies were concentrated on building more ascendant dictionaries in learning process, e.g. dual-dictionary learning [11], multi-scale dictionary learning [12], geometric dictionaries [13], and adaptive dictionary learning [14]. Currently, most learning-based algorithms cannot fulfil real time requirement, which is a limit in applications. Therefore, the employment of embedded platform may supply a solution from the aspect of hardware. Another approach to realize nearly real time SR reconstruction was based on deep learning such as convolutional neural networks, which could show good performance on time cost in the reconstruction process [15, 16]. These deep learning algorithms often cost several days or more in the training process and involve large quantity of image samples. In addition, methods of protecting edge information for the deep learning SR algorithms should be investigated.

These SR methods are designed to acquire more spatial information in the processing of resolution enhancement, to a great extent they are presented as for grayscale images (intensity images). While for a color image the treatment is simple, it is decomposing the color information into different dimensions such as in RGB (red, green, blue) space and in YCbCr system [17], and then merging the reconstructed grayscale images of individual dimensions to form an HR color image via a reversible operation. However, few studies explore the influences of color spaces on image reconstruction effects, even implementing the same SR reconstruction algorithm in different color spaces may produce diverse results.

For that matter, this study aimed to systematically investigate how the selection of color spaces take effect in the process of image SR reconstruction, in which various typical color spaces were tested and compared using a reconstruction algorithm based on classified dictionary learning. From the aspect of improving the dictionary training process, the proposed algorithm could improve the reconstruction quality without increasing computational complexity by sorting image features into several dictionaries. The evaluated spaces included RGB, YCbCr, YIQ, HSV (hue, saturation, value), HSI (hue, saturation, intensity), and CIELAB [17–21], which produced corresponding color coordinate systems. Moreover, the necessity of whether all the dimensions need to be reconstructed was discussed. In addition, considering the frequently-used indexes of comparing reconstruction results were based on the diversities of image digital inputs, such as peak signal to noise ratio (PSNR) [22] and structural similarity index (SSIM) [22, 23], which did not reflect perceptual differences between two images. Therefrom, the comparisons among different color spaces were carried out by some typical numerical measures involving both digital inputs’ deviations and perceptual differences. Finally the recommended color coordinate systems and how to process their dimensions were presented, and the conclusions could support beneficial strategies on how to deal with image color information for any given SR reconstruction algorithm in applications.

2. Methods and algorithms

2.1 Procedure and selection of color coordinate systems

The whole procedure for a general SR reconstruction algorithm is demonstrated in Fig. 1, as well as the process of dealing with image color information. For a digital color image, its digital inputs are usually described in the three dimensions of RGB color space [17], i.e. using the digital inputs of (d_R, d_G, d_B) coordinates for each pixel. This image can be split into three grayscale pictures of P₁, P₂, and P₃, in which each picture represents a separate and independent color coordinate. Usually, most color spaces hold a three-dimensional coordinate system. For example, in RGB color space, the grayscale pictures form red channel d_R, green channel d_G, and blue channel d_B can be regarded as P₁, P₂, and P₃ respectively. Certainly, in other color spaces, the settings of P₁, P₂, and P₃ are different accordingly. Afterwards, the processing of the three grayscale pictures of P₁, P₂, and P₃ can employ two strategies, the first one is handling these three pictures all by a learning-based SR reconstruction algorithm separately and combining the reconstructed P₁', P₂', and P₃' pictures to a new color image, while the second one is reconstructing merely P₁ to P₁' via SR reconstruction algorithm and deriving P₂' and P₃' by pixel interpolation algorithm. The operations of both decomposing one color image into three LR grayscale pictures and merging the processed HR grayscale pictures into one color image are reversible, which will not bring about any color transformation errors. In the course of dictionary learning, some HR image samples along with their downsampled LR versions are regarded as training samples and handled by a learning procedure with K-means singular value decomposition (K-SVD) method and principal component analysis (PCA) method [24]. Then, an HR image with more detail information can be derived from its LR version and the sparse representation coefficients, which are solved based on the pairs of HR dictionaries and LR dictionaries via the orthogonal matching pursuit (OMP) method [25].

Fig. 1 Procedure of SR reconstruction and dealing with color information in an image.

Download Full Size | PDF

As for the selection of color coordinates, in principle, color spaces can produce corresponding color coordinate systems, which can be distributed into three categories. The first category stems from describing color by additive color principle, such as the (d_R, d_G, d_B) coordinates in RGB color space, as well as the device-independent color specification way, i.e. the Commission Internationale de L'Eclairage (CIE) stipulated XYZ system [20, 21]. The second category of color coordinate systems comes from conventional television signal standards, and these color coordinate systems include the YCbCr of phase alteration line (PAL) and the YIQ proposed by National Television Standards Committee (NTSC), which employ one dimension to represent bright and dark (such as luminance) information and the other two dimensions to express chromatic information [26]. The third category is based on a theory that human eyes often perceive color in three dimensions that are considered as color appearance parameters, i.e. hue, brightness/lightness (or saturation), and colorfulness/chroma [19, 27]. These color systems not only include the HSV and the HSI in computer graphics [18], but also involve the CIELCh system generated from the CIE recommended CIELAB color space [20, 21]. Another color coordinate system based on CIELAB space is adopting one dimension L* to describe brightness/lightness and two dimensions a*, b* of chromatic information [21], which can be sorted into the second category such as YCbCr and YIQ.

For the color coordinate systems of the first category, the three grayscale pictures P₁, P₂, and P₃ are in equal positions which should be handled in the same way. While for color coordinate systems of the second and the third categories, they both have a dimension to express the bright and dark information of images, such as luminance and brightness/lightness, which can be set as the P₁ coordinate in Fig. 1, whereas P₂ and P₃ coordinates can represent their two dimensions of chromatic information. Since the bright and dark coordinate is usually considered to undertake more information than the two chromatic coordinates in digital color transmission, the strategies of spacial resolution enhancement for P₂ and P₃ can involve either the SR reconstruction algorithm or the pixel interpolation method. Table 1 lists the selected color spaces/systems in this study and their corresponding coordinates of P₁, P₂, and P₃. A matter needing attention is that the numerical ranges of the second to the fourth columns in Table 1 are widely different for these involved color spaces/systems, even negative values may appear. However, for image processing such as SR reconstruction and pixel interpolation, the input data should be positive and in a fixed interval such as 0~1 or 0~255. Therefore, these various numerical ranges should be preprocessed, since both constraining negative values to 0 and forcing values larger than 1 to be as 1 will cause terrible image distortions. Thus, for all the color coordinates in Table 1, they should be converted to a suitable numerical range, here, a normalization of 0.1~0.9 is carried out when importing P₁, P₂, and P₃ coordinates into the process of resolution enhancement, in order to ensure all the data are suitable for the disposes of SR reconstruction and the pixel interpolation. The determination of normalization range is based on two aspects of consideration, for one thing, boundary points (the minimum 0 and the maximum 1) in intercity images are usually hard to handle since only one side of the numeral information could be employed, so a normalization range narrower than 0~1 is implemented to avoid treating boundary points. For another, the reason for not choosing 0.2~0.8 or 0.3~0.7 settings is that, it is unnecessary to leave too wide range to boundary points, and too narrow ranges for SR reconstruction and pixel interpolation might be disadvantageous in some degree because they cause intensive data distribution. In addition, the two right columns of Table 1 give the denomination for the strategies on how to deal with color information in the process of resolution enhancement, which is also used in “Section 3”.

Table 1. Selection of color spaces with their corresponding color coordinates.

View Table | View all tables in this article

Thus, even for the same SR reconstruction method, the employment of various color spaces/systems would induce different image quality performances and visual effects. Moreover, even for the same selected color space/system, the strategies of whether dealing with P₁, P₂, and P₃ by SR method in parallel or handling P₂ and P₃ by pixel interpolation may cause different results, which should be evaluated by image numerical measures in detail.

2.2 SR reconstruction of classified dictionary learning

This study presents an SR reconstruction algorithm based on dictionary learning to increase the resolution of a grayscale image, which originates from the aspect of optimizing dictionary learning process. In the proposed procedure, all the features (atoms) can be reasonably sorted into several dictionaries. The computational complexity in the reconstruction step can be deduced, because the involved atoms to reconstruct one image patch become a fraction of all the atoms. Compared to other SR reconstruction algorithms, when using the same amount of atoms, the proposed SR algorithm would cost less time in the reconstruction process. The whole procedure of SR reconstruction is illustrated in Fig. 2. After partitioning the image samples into some HR patches and acquiring their corresponding LR patches by means of downsampling, the features can be extracted by K-SVD method and their dimensionalities could be reduced by PCA technique. Then these features from training samples are clustered into separate groups by K-means cluster algorithm [24, 28], which form several clusters of dictionary pairs, that is to say, each cluster contains pairs of HR dictionaries and LR dictionaries. Thus, each image patch will be reconstructed based on its most suitable dictionary, so that the high quality of reconstructed images can be ensured. As for the target LR grayscale picture, firstly it should be cut apart into a number of LR patches such as 50 × 50 pixels, since the information of different areas in one image may be various, so that disposing patches rather a whole image could make the matching between the trained dictionary pairs and LR patches more accurate. Afterward, for the extracted features of target LR patches, a suitable cluster of dictionary pairs with high similarity can be chosen according to the weighting information from the K-means cluster algorithm, and then all the LR patches of different clusters are all reconstructed by using trained HR and LR dictionary pairs along with the solved sparse representation coefficients [29].

Fig. 2 The work flow of SR reconstruction for grayscale images.

Download Full Size | PDF

The idea of image sparse representation is originated from the compressed sensing theory, it is stated that natural images can be sparsely represented by some dictionary matrixes [30]. Supposing the image is x ∈ Rⁿ, as shown in Eq. (1), it can be represented by the overcomplete dictionary D = [d₁, d₂, …, d_m] ∈ R^{n × m} (n < m), which is the linear combination of elements, i.e. the vector d_i (i = 1, 2, …, m). The α = [α₁, α₂, …, α_m]^T ∈ R^m is the matrix of sparse representation coefficients, which satisfies the inequation ||α₀|| << n. The symbol ||α₀|| means the number of nonzero elements.

x = D α, {‖ α ‖}_{0} < < n

In specifics, based on the theory of image sparse representation, the two key technological sections are elaborated in the following.

(i) Classified dictionary learning

Dictionary learning is to train a dictionary pair of sparse representation by the method of machine learning using the given samples, in which features of training samples are split into several clusters, and pairs of HR dictionaries and corresponding LR dictionaries are derived. The employment of classified dictionaries can use more features and reduce training time because of the introduction of similarity among features.

Each image sample is partitioned into p HR patches, denote the i th original HR patch is x_i (i = 1, 2, …, p), all the original HR samples are { x_i }. Use the bicubic method to reduce the resolution (to make the pixel of each side become the 1/3 of its original version) of { x_i }, and then enlarge them via the same method to obtain their corresponding LR patches { u_i }. As shown in Eq. (2), remove low frequency information from high frequency information to get HR features.

f^{i} = {x_{i}} - {u_{i}}

Afterwards, high frequency features are extracted by the two-dimensional filtering operator F shown in Eq. (3), including first-order and second-order derivatives.

F = [F_{1} F_{2} F_{3} F_{4}]

The four sub-operators of F are represented in Eq. (4), in which LoG means an operation of 5 × 5 Laplacian of Gaussian filtering.

\begin{array}{l} F_{1} = [1, -1] F_{2} {=F}_{1}^{T} \\ F_{3} =LoG F_{4} {=F}_{3}^{T} \end{array}

After the F operation, high frequency features of HR patch f_xⁱ and those of LR patch f_uⁱ are obtained, so that pair of extracted features can be expressed as { f_xⁱ, f_uⁱ }. To reduce time cost of dictionary learning, the dimensionality reduction is applied on the high frequency features of LR patch via the PCA algorithm.

The high frequency features can be classified into K clusters with K cluster centers using the K-means cluster algorithm, and its cluster center is C_k (k = 1, 2, …, K). The features in k th cluster can also be written as Eq. (5).

Z_{k} = [F x_{1}^{k}, F x_{2}^{k}, ..., F x_{p}^{k}]

The process of classified dictionary learning is expressed as Eq. (6), by which each cluster can be trained into a dictionary pair D^k, and α_i is its matrix of sparse representation coefficients, which contains the elements of α_i^k (k = 1, 2, …, K). The T_k is the parameter that controls the degree of sparse representation.

\min_{D^{k}, α_{i}} {‖ Z_{k} - F D^{k} α_{i} ‖}, s .t . {‖ α_{i}^{k} ‖}_{0} \leq T_{k}

This is an ill-posed problem, and the K-SVD method can be employed to solve Eq. (6) and to calculate D_h^k and D_l^k, finally the k th dictionary pair D^k = { D_h^k, D_l^k } is obtained.

(ii) Reconstruction algorithm

After exacting each dictionary pair from training samples by learning process, a reconstruction algorithm is adopted to exactly recover high frequency information for an input LR image.

Firstly, this LR image is partitioned into m LR patches with a certain size of n × n pixels, where n is a constant. Then, each LR patch is scaled up using the bicubic method (to make the pixels of each side become 3 times of its original version) to obtaining the i th patch y_i (i = 1, 2, …, m).

The feature extraction for patch y_i can be implemented by the filtering operator F shown in Eqs. (3) and (4), so its LR features { Fy_i } is obtained. Afterwards, the dimensionality reduction of PCA algorithm is applied on the { Fy_i }.

For the classified dictionaries, the dictionary pair is selected to calculate its HR version based on the similarity between each cluster center C_k (k = 1, 2, …, K) and this LR patch y_i. According to the K-means cluster algorithm, Eq. (7) expresses the function to find the most suitable dictionary pair D^k = { D_h^k, D_l^k }, in which g_k{ y_i, C_k } as the criterion of similarity is expressed as the membership grade of Euclidean distance between y_i and C_k.

D^{k} = \max (g_{k} {y_{i}, C_{k}})

According to the optical observation model, each LR patch y_i can be expressed as Eq. (8), where S, H, η denote the down sampling, blurring, and additive noise of the optical system respectively, and ${\hat{y}}_{i}$ represents the ideal high resolution patch.

y_{i} = SH {\hat{y}}_{i} + η

The key step is to calculate the unknown ideal HR patch ${\hat{y}}_{i}$ through the LR patch y_i and the selected dictionary D^k = { D_h^k, D_l^k }, the expressions of Eqs. (9) and (10) are used. Here, β_i^k is the sparse representation coefficient of y_i under the selected sub-dictionary D_l^k and β_i^k can be calculated by the OMP method. As shown in Eq. (10), the ideal HR patch ${\hat{y}}_{i}^{k}$ can be obtained through multiplying D_h^k by β_i^k.

β_{i}^{k} = \underset{β_{i}^{k}}{\arg \min} {‖ F y_{i} - FSH D_{l}^{k} β_{i}^{k} ‖}_{2}^{2}, s .t . ‖ β_{i}^{k} ‖ \leq T_{0}

{\hat{y}}_{i}^{k} = D_{h}^{k} β_{i}^{k}

Thus, repeat the above steps to the other LR patches and obtain their HR patches, and finally join the m HR patches together to a whole HR image.

To show the visual effects of the proposed SR reconstruction algorithm, a grayscale image of optical resolution test board RT-MIL-T4102 was employed as an original HR image, and its downsampled version was LR image. Hereby, the reconstructed HR version (multiplying power values is 3 × 3) from the above algorithm could be visually assessed and compared. Figure 3 supplies the visualization results, in which the spacial resolution parameter of line pairs can be calculated. From Fig. 3, it can be seen, compared to the LR version, this algorithm can improve the identifiable line pairs by 40 percent (from 1.41 lp/mm to 2.00 lp/mm).

Fig. 3 SR reconstruction results for a grayscale image of the optical resolution test board.

Download Full Size | PDF

3. Experiment and discussions

To analyze the impacts of various color coordinate systems and their strategies of dealing with color, a set of digital color images exhibited in Fig. 4 [all photos were taken by the authors] was adopted as test samples, and the image contents involved plants, buildings, portraits, still objects, and landscapes, etc. The selection of these test samples aimed to cover a certain range in color, shadow, and frequency to a feasible extent. The HR versions of test images were regarded as “ideal answers”, and then they were downsampled by averaging the information of the nearest 3 × 3 pixels to get their LR test images, which employed a similar process of tuzzy sampling of digital cameras. The resolution of HR images was 1800 × 1800 pixels, so that the resolution of their LR versions was 600 × 600 pixels. Other multiplying power values were also feasible in the SR reconstruction procedure, here we adopted the setting of 3 × 3. As the inputs of the process, the selected LR images were firstly split into three color coordinates of P₁, P₂, and P₃ according to the settings of Table 1, then the grayscale pictures from each coordinate could be reconstructed via the SR algorithms of classified dictionary learning described in “Section 2.2”. For the reconstruction in the CIELAB color space, the (d_R, d_G, d_B) coordinates of RGB color space would firstly be transformed to XYZ system according to the sRGB (standard red green blue color space) standard [31] for digital images, then the color coordinates in CIELAB space, i.e. (L*, a*, b*), and (L*, C_ab*, h_ab), were obtained [20, 21]. In this way, 14 HR versions could be collected correspondingly for each LR test image in total.

Fig. 4 The selected test images as inputs of the SR reconstruction procedure (the first row, T1~T6, the second row, T7~T12, the third row, T13~T18).

Download Full Size | PDF

Therefrom, to compare which color coordinate systems can perform more effectively for SR reconstruction, the numerical analysis was implemented by calculating the differences between the reconstructed images and the original HR test images, not only included common indexes of PSNR and SSIM, but also involved the CIE recommended color difference formula, i.e. CIEDE2000 (ΔE₀₀). The introduce of color difference is based on the consideration that, PSNR is mainly to show the diversities on image digital inputs (d_R, d_G, d_B), and SSIM is a parameter to present content structural similarity between two images based on digital inputs as well, while color difference formulae are more suitable to convert the diversities of digital inputs (d_R, d_G, d_B) into differences on human perceptual feelings. Therefore, the CIEDE2000 was employed, usually color difference values smaller than 3 ΔE₀₀ units could be considered as visually acceptable between two color pairs [32, 33]. The mean ΔE₀₀ value for all the pixels was employed as an index to compare the overall performances among various strategies, whereas the maximum ΔE₀₀ value was adopted to show the worst visual effects caused by the improper reconstruction. Table 2 lists results of these four indexes for all the 14 strategies, along with their time cost, in order to take the time efficiency and image quality synthetically. Here, the time cost was recorded by a computer (Intel Core CPU i5-4460 3.20GHz, RAM 8G) with MATLAB R2014b software. In Table 2, the listed data of the four indexes are the mean values calculated from the numerical results of all the 18 selected images for each given strategy.

Table 2. Comparison on numerical measures and time cost for the test images

View Table | View all tables in this article

From Table 2, the performances of the 14 strategies on selecting color coordinate systems and dealing P₂ and P₃ coordinates can be deduced. According to the definitions of these indexes, higher values of PSNR and SSIM show higher similarities between two images, while higher values of ΔE₀₀ represent poorer similarities. As for SSIM, all the values are 1.000 in the case of four significant digits, indicating that SSIM is not a suitable parameter to evaluate differences between the reconstructed version and the ideal version for the same image and it is not susceptible for two images with close resemblance, while it pays more attention on similarity of contents such image outlines. For the SSIM and PSNR results, they show accordant tendencies for these 14 strategies, but PSNR is an index that is often more sensitive to image contents to a certain extent, e.g. for different test images T1~T18, their PSNR ranges of the 14 SR versions locate in separate numerical regions. Whereas CIEDE2000 is an index mainly concerned on the perceptual differences between two images, and it is unconcerned on the contents of different test images. As shown by the PSNR and CIEDE2000 results, the better image reconstruction quality comes from the strategies of M-YIQ-1, M-YIQ-3, M-LAB-1, and M-LAB-3 (mean color difference CIEDE2000 values smaller than 2 ΔE₀₀ units, and PSNR values larger than 27). Their color coordinate systems have a communality, that is, they are all from the second category of color systems with one dimension to represent bright and dark information and the other two dimensions to express chromatic information. The medium performance is from M-RGB, M-XYZ, M-YCbCr-1, M-YCbCr-3, M-LCh-1, and M-LCh-3. While the strategies of M-HSV-1, M-HSV-3, M-HSI-1, and M-HSI-3 cannot achieve good SR reconstruction effects. In Fig. 5, all the 14 SR reconstructed versions and their original HR version for some representative area in image T3 are exhibited. By comparing to the “ideal answer”, the strategies of M-LCh-1, M-LCh-3, M-HSV-1, M-HSV-3, M-HSI-1, and M-HSI-3 would lead to some wrong interim colors in the stripes of petals.

Fig. 5 Visual comparisons for image T3 including the14 SR reconstructed versions and the original HR version.

Download Full Size | PDF

These above strategies with poor performances have a common ground, they are all based on the color coordinate systems from the third category that express color information in three dimensions of hue, brightness/lightness, and colorfulness/chroma. Figure 6 depicts another extreme example of the failures cause by these color coordinates systems. It can been that, the pixels in the color boundaries between different contents with disparate color information may suffer from a serious distortion, especially for the HSV and HSI systems. These results can also be verified by the extremely high values of maximum ΔE₀₀ in Table 2, since values larger than 10 ΔE₀₀ units could be regarded as abnormal reconstructed colors for the corresponding pixels. According to the visualization results and numerical calculation of maximum ΔE₀₀, it can be concluded that these coordinates are not suitable for SR reconstruction and pixel interpolation. One explanation on their poor results is that, the hue and saturation (similar dimension of colorfulness/chroma) information is discontinuous in color images, even in the area of the same content, hue and saturation would show disconnected points, which are not appropriate results neither for the pixel interpolation nor for the SR reconstruction algorithm. So treating the hue and saturation coordinates as grayscale images may cause improper median values far from the truth, and it is the reason for the emergence of the green pixels in the contiguous area between the white pixels and the red pixels, as shown in Fig. 6.

Fig. 6 Examples of image T16 to represent failures for color coordinate systems from the third category.

Download Full Size | PDF

As for the time cost, an obvious outcome is that strategies of treating P₂ and P₃ coordinates using SR process would cost about 3 times of second quantity compared with those of using pixel interpolation method, thus, from the aspect of productiveness, the M-YIQ-1 and M-LAB-1 can take away less time and achieve equivalent quality in comparison with the M-YIQ-3 and M-LAB-3, which are more functional and more efficient in applications. Moreover, to show the whole reconstruction process with real images using the recommended strategies of dealing with color, Fig. 7 shows the phased visual results of each step when reconstructing a color image by the M-LAB-1 strategy (Only part of image T14 with high frequency information is depicts, in order to give more details). It can indicate that, the L* coordinate carries more detailed information than a* and b* coordinates, thus there are no essentialities to employ the SR algorithm for the two chromatic dimensions of a* and b*.

Fig. 7 The phased visual results of each step when reconstructing part of color image T14 by the M-LAB-1 strategy.

Download Full Size | PDF

To sum up, the conclusions derived from the results includes three following points. Firstly, the color coordinate systems of CIELAB and YIQ are suitable mapping spaces for the resolution enhancement operations, including pixel interpolation and SR reconstruction. Secondly, take time cost into consideration, it is unnecessary to treat the coordinates of P₁, P₂, and P₃ by the SR reconstruction process simultaneously, since only the P₁ coordinate of bright and dark information (including luminance, and brightness/lightness) is the dimension that need to be reconstructed by the SR algorithm. Accordingly, for the RGB and XYZ color systems from the first category, their algorithms may cost more time, though they achieve acceptable image quality for SR reconstruction. Thus, the recommended strategies of dealing with color information in the SR reconstruction are implementing the SR algorithm for merely L* coordinate of CIELAB space or merely Y coordinate of YIQ system, while the other two coordinates should use the pixel interpolation to enhance resolution. The third point is that, the color coordinates systems from the third category with three dimensions of hue, brightness/lightness and colorfulness/chroma will cause severe color distortions when they are treated as coordinates in the process of resolution enhancement, especially for the HSV and HSI systems. The color coordinates systems from this category cannot provide suitable mapping spaces neither for SR reconstruction nor for pixel resolution.

4. Conclusions

To explore optimum strategies of dealing with color information in SR reconstruction, a method with classified dictionary learning was designed, then various color spaces/systems including RGB, YIQ, YCbCr, HSI, HSV, and CIELAB were involved. Moreover, whether all the three color dimensions need to be reconstructed was tested for those color spaces. Beside PSNR and SSIM, the color difference formula of CIEDE2000 was employed to reflect perceptual differences. Based on the comparisons of these numerical measures, the recommended strategies to obtain good image reconstruction quality are adopting merely L* coordinate in CIELAB space or merely Y coordinate of YIQ system, which indicate the color coordinate systems with one dimension of bright and dark information and the other two dimensions to express chromatic information have more advantages in the process of SR reconstruction. That is to say, only the coordinate of bright and dark information should be reconstructed by the method of classified dictionary learning. Though handling the three coordinates using the same way for (Y, Cb, Cr) and (L*, a*, b*) can also yield good effects, it will cost more computation time. Another significant matter is that, color spaces with three perceptual parameters of hue, brightness/lightness, and colorfulness/chroma are not suitable to supply coordinates in the resolution enhancement process, neither for pixel interpolation nor for SR reconstruction. Our further study will be focused on increasing the efficiency of the whole SR reconstruction procedure, and improving the dictionaries for some particular images with pertinency.

Funding

National Natural Science Foundation of China (NSFC) (61505156, 61575154); Fundamental Research Funds for the Central Universities (JB150512).

References and Links

1. P. Milanfar, Super-resolution Imaging (CRC, 2011).

2. S. C. Park, M. K. Park, and M. G. Kang, “Super-resolution image reconstruction: a technical overview,” IEEE Signal Process. Mag. 20(3), 21–36 (2003). [CrossRef]

3. D. Glasner, S. Bagon, and M. Irani, “Super-resolution from a single image,” in Proceedings of IEEE International Conference on Computer Vision (IEEE, 2009), pp. 349–356.

4. S. H. Rhee and M. G. Kang, “Discrete cosine transform based regularized high-resolution image reconstruction algorithm,” Opt. Eng. 38(8), 1348–1356 (1999). [CrossRef]

5. S. D. Babacan, R. Molina, and A. K. Katsaggelos, “Variational Bayesian Super Resolution,” IEEE Trans. Image Process. 20(4), 984–999 (2011). [CrossRef] [PubMed]

6. W. T. Freeman, T. R. Jones, and E. C. Pasztor, “Example-based super-resolution,” IEEE Comput. Graph. Appl. 22(2), 56–65 (2002). [CrossRef]

7. J. Yang, J. Wright, T. Huang, and Y. Ma, “Image super-resolution as sparse representation of raw image patches,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2008), pp. 1–8.

8. J. Yang, J. Wright, T. S. Huang, and Y. Ma, “Image super-resolution via sparse representation,” IEEE Trans. Image Process. 19(11), 2861–2873 (2010). [CrossRef] [PubMed]

9. R. Zeyde, M. Elad, and M. Protter, “On single image scale-up using sparse-representations,” on International Conference on Curves and Surfaces (IEEE, 2010), pp. 711–730.

10. W. Dong, L. Zhang, G. Shi, and X. Wu, “Image deblurring and super-resolution by adaptive sparse domain selection and adaptive regularization,” IEEE Trans. Image Process. 20(7), 1838–1857 (2011). [CrossRef] [PubMed]

11. J. Zhang, C. Zhao, R. Xiong, S. Ma, and D. Zhao, “Image super-resolution via dual- dictionary learning and sparse representation,” on IEEE International Symposium on Circuits and Systems (IEEE, 2012), pp. 1688–1691. [CrossRef]

12. K. Zhang, X. Gao, D. Tao, and X. Li, “Multi-scale dictionary for single image super-resolution,” IEEE Computer Vision Pattern Recognition 157(10), 1114–1121 (2012).

13. S. Yang, M. Wang, Y. Chen, and Y. Sun, “Single-Image Super-Resolution Reconstruction via Learned Geometric Dictionaries and Clustered Sparse Coding,” IEEE Trans. Image Process. 21(9), 4016–4028 (2012). [CrossRef] [PubMed]

14. Q. Liu, S. Wang, L. Ying, X. Peng, Y. Zhu, and D. Liang, “Adaptive dictionary learning in sparse gradient domain for image recovery,” IEEE Trans. Image Process. 22(12), 4652–4663 (2013). [CrossRef] [PubMed]

15. C. Dong, C. C. Loy, K. He, and X. Tang, “Image super-resolution using deep convolutional networks,” on IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE, 2015), pp. 295–307.

16. J. Kim, J. K. Lee, and K. M. Lee, “Accurate image super-resolution using very deep convolutional networks,” on IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2016), pp. 1646–1654. [CrossRef]

17. E. J. Giorgianni and T. E. Madden, Digital Color Management: Encoding Solutions, 2nd ed. (JohnWiley & Sons, 2008).

18. J. C. Russ, The Image Processing Handbook, 6th ed. (CRC, 2011).

19. R. W. G. Hunt, The Reproduction of Color, 6th ed. (John Wiley & Sons, 2004).

20. CIE 15.3, Colorimetry, 3rd ed. (Commission Internationale de L'Eclairage, Vienna, 2004).

21. G. Wyszecki and W. S. Stiles, Color Science: Concepts and Methods, Quantitative Data and Formulae, 2nd Edition (John Wiley and Sons, 2000).

22. Z. Wang and A. C. Bovik, Modern Image Quality Assessment (Morgan & Claypool, 2006).

23. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. Image Process. 13(4), 600–612 (2004). [CrossRef] [PubMed]

24. M. Aharon, M. Elad, and A. Bruckstein, “K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation,” IEEE Trans. Signal Process. 54(11), 4311–4322 (2006). [CrossRef]

25. Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad, “Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition,” in Proceedings of 27th Asilomar Conference on Signals, Systems and Computers (IEEE, 1993), pp. 40–44. [CrossRef]

26. Eric Dubois, The Structure and Properties of Color Spaces and the Representation of Color Images (Morgan & Claypool, 2010)

27. M. D. Fairchild, Color Appearance Models, 2nd ed. (John Wiley & Sons, 2005).

28. M. J. Gangeh, A. Ghodsi, and M. S. Kamel, “Kernelized supervised dictionary learning,” IEEE Trans. Signal Process. 61(19), 4753–4767 (2013). [CrossRef]

29. Y. Zhou, K. Liu, R. E. Carrillo, K. E. Barner, and F. Kiamilev, “Kernelbased sparse representation for gesture recognition,” Pattern Recognit. 46(12), 3208–3222 (2013). [CrossRef]

30. M. Elad, Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing (Springer, 2010).

31. IEC 61966–2-1, Multimedia systems and equipment– Colour measurement and management–Part 2–1: Colour management–Default RGB colour space–sRGB, Amendment 1 (IEC, Switzerland, 2003).

32. M. R. Luo, G. Cui, and B. Rigg, “The development of the CIE 2000 Colour-Difference formula: CIEDE2000,” Color Res. Appl. 26(5), 340–350 (2001). [CrossRef]

33. CIE 142, Improvement to industrial colour-difference evaluation (Commission Internationale de L'Eclairage, Vienna, 2001).

method	PSNR	SSIM	mean ΔE₀₀	max ΔE₀₀	Time / s
M-RGB	26.520	1.000	1.977	32.072	95.0
M-XYZ	26.263	1.000	2.348	46.667	96.4
M-YCbCr-1	26.809	1.000	2.124	30.236	32.5
M-YCbCr-3	27.263	1.000	2.271	30.236	94.5
M-YIQ-1	27.100	1.000	1.937	25.233	32.5
M-YIQ-3	27.110	1.000	1.895	25.233	96.1
M-HSV-1	25.713	1.000	2.639	93.416	32.7
M-HSV-3	25.245	1.000	2.734	93.416	95.6
M-HSI-1	25.795	1.000	2.671	92.260	32.8
M-HSI-3	26.091	1.000	2.505	92.260	95.5
M-LCh-1	26.610	1.000	2.166	48.322	34.0
M-LCh-3	26.232	1.000	2.184	48.322	97.0
M-LAB-1	27.042	1.000	1.946	18.320	34.2
M-LAB-3	27.368	1.000	1.918	19.902	97.2

method	PSNR	SSIM	mean ΔE₀₀	max ΔE₀₀	Time / s
M-RGB	26.520	1.000	1.977	32.072	95.0
M-XYZ	26.263	1.000	2.348	46.667	96.4
M-YCbCr-1	26.809	1.000	2.124	30.236	32.5
M-YCbCr-3	27.263	1.000	2.271	30.236	94.5
M-YIQ-1	27.100	1.000	1.937	25.233	32.5
M-YIQ-3	27.110	1.000	1.895	25.233	96.1
M-HSV-1	25.713	1.000	2.639	93.416	32.7
M-HSV-3	25.245	1.000	2.734	93.416	95.6
M-HSI-1	25.795	1.000	2.671	92.260	32.8
M-HSI-3	26.091	1.000	2.505	92.260	95.5
M-LCh-1	26.610	1.000	2.166	48.322	34.0
M-LCh-3	26.232	1.000	2.184	48.322	97.0
M-LAB-1	27.042	1.000	1.946	18.320	34.2
M-LAB-3	27.368	1.000	1.918	19.902	97.2

How to deal with color in super resolution reconstruction of images

Abstract

1. Introduction

2. Methods and algorithms

2.1 Procedure and selection of color coordinate systems

2.2 SR reconstruction of classified dictionary learning

(i) Classified dictionary learning

(ii) Reconstruction algorithm

3. Experiment and discussions

4. Conclusions

Funding

References and Links

Cited By

Figures (7)

Tables (2)

Equations (10)

Optics Express

color space/system	P₁ coordinate	P₂ coordinate	P₃ coordinate	SR for merely P₁	SR for P₁, P₂, and P₃
RGB	d _R	d _G	d _B	/	M-RGB
XYZ	X	Y	Z	/	M-XYZ
YCbCr	Y	Cb	Cr	M-YCbCr-1	M-YCbCr-3
YIQ	Y	I	Q	M-YIQ-1	M-YIQ-3
HSV	V	H	S	M-HSV-1	M-HSV-3
HSI	I	H	S	M-HSI-1	M-HSI-3
CIELCh	L*	C_ab*	h_ab	M-LCh-1	M-LCh-3
CIELAB	L*	a*	b*	M-LAB-1	M-LAB-3