Joint constraints of guided filtering based confidence and nonlocal sparse tensor for color polarization super-resolution imaging

Feng Huang; Yating Chen; Xuesong Wang; Shu Wang; Xianyu Wu

doi:10.1364/OE.507960

1. Introduction

Color polarization imaging integrates spectral and polarization information to capture multidimensional images with color (RGB-red-green-blue) and polarization components, thereby allowing the inference of object properties such as shape and surface roughness [1]. As shown in Fig. 1, the polarization information in different color channels exhibits similar object surface characterization while also encompassing complementary object surface characterization. In comparison to monochromatic polarization, multichromatic polarization can offer additional clues for target recognition or object surface characterization. Color polarization imaging has broad applications in various fields, including material classification [2], target recognition [3], and three-dimensional measurement of nontextured targets [4]. In addition to using the polarization properties of the target object, differences in polarization between the target object and background can be exploited for camouflage recognition [5], image defogging [6], underwater imaging [7], and other tasks.

Fig. 1. Visualization results of polarization information in the R, G, and B color channels. Multichromatic polarization information provides richer object surface characterization, compared to monochromatic polarization.

Download Full Size | PDF

The existing spectral polarization imaging techniques can be broadly categorized into two types: scanning spectropolarimeter and snapshot spectral polarization imaging systems. Scanning spectropolarimeter is unable to capture the complete spectral polarization data cube in a single exposure, utilizing mechanisms such as wheels [8], liquid crystal tunable filters [9], and acousto-optic tunable filters [10] to adjust wavelength and polarization angles. Snapshot spectral polarization imaging systems combine snapshot spectral imaging techniques with polarization detection modules, such as integrating compressed sensing technology with channeled imaging spectropolarimeter [11], and incorporating integral field spectrometry and polarization camera with division of the focal plane (DoFP) [12] etc. However, these snapshot spectropolarimeters have complex structures, making them challenging to generalize for practical applications. Currently, snapshot spectral polarization imaging technology that has been commercially deployed is primarily DoFP color polarization camera. It enables high-speed polarization imaging with multiple adjacent pixels to form a superpixel. However, a device like the Sony color polarization (complementary metal-oxide-semiconductor (CMOS) sensor (IMX250 MYR) commonly used in such cameras has fixed spectral bands, polarization states, and sensor parameters, thus reducing its flexibility for specific tasks and limiting its generalizability [13]. Additionally, the limited sensor size and light efficiency of micro-optical filters reduce the signal-to-noise ratio (SNR) [14]. To address these drawbacks, we propose and validate a snapshot color polarization imaging system called camera-array-based color polarized SR imaging system (CAPS). CAPS exploits non-redundant spatial motion information among apertures to reconstruct high-resolution (HR) color polarized images. The reasons for choosing camera array with multi-aperture instead of a division of amplitude or division of aperture system design are as follows: (1) Division of aperture achieves multi-channel imaging on the same sensor. Consequently, due to limitations in sensor size, the sub-lens size for each channel is small, leading to a lower diffraction limit that restricts the potential for SR. (2) Division of amplitude reduces the light throughput for each channel and is considerably more challenging to assemble compared to planar camera array with multi-aperture.

Color polarized image reconstruction has primarily been developed for mosaic images captured using DoFP color polarization cameras, and it can be categorized into interpolation-, regularization-, and deep-learning-based methods. Interpolation-based methods have low computational complexity. Morimatsu et al. [15] recently proposed a color polarization filter array for demosaicking based on edge-aware residual interpolation (EARI), achieving independent demosaicking of color and polarization modules. However, interpolation-based methods are often sensitive to noise, which can be reflected in the reconstructed color polarized images. This can potentially affect the accuracy of polarization analysis and lead to incorrect conclusions [16]. Performing denoising after interpolation is more challenging since interpolation changes the noise properties. Moreover, interpolation-based demosaicking relies heavily on the pixel layout of the micro-polarizer, which limits its applicability to image reconstruction in other color polarization imaging systems. Alternatively, regularization-based color polarization image reconstruction offers an inverse optimization solution to image degradation, thus increasing generalizability. Zhang et al. [17] proposed a nonlocal sparse tensor regularization method for restoring panchromatic polarized mosaic images. This method preserves the three-dimensional structure of polarization data and leverages the local sparsity and nonlocal self-similarity of the polarization cube to remove missing pixels and noise. However, it only is focused on polarization and does not involve color information. Lu et al. [18] proposed a color polarization demosaicking method (JCPD) that combines color and polarization sparse dictionaries for high-fidelity demosaicking. However, this method vectorizes color polarization data in lexicographical order, consequently destroying the three-dimensional structure of the color polarization data. In addition, the method only constrains the block sparsity, ignoring the correlation between nonlocal similar blocks [19] and thus limiting the denoising performance. Current regularization-based methods mainly rely on incorporating diverse regularization terms to constrain the solution of the cost function, but they cannot fully exploit the constraints of the data fidelity term. Furthermore, existing algorithms only constrain observed images with the same chromatic and polarimetric features when estimating the HR multichannel images in the data fidelity term, neglecting the complementary information between all images with all chromatic and polarimetric features. Consequently, this approach may result in suboptimal recovery of image details and oversmoothed edges. Recently, Wen et al. [20] introduced a pioneering color polarization demosaicking network (CPDnet) with a multitask cascaded structure consisting of color and polarization modules along with a customized loss function that incorporates constraints for intermediate feature layers and the final result. However, current deep-learning-based color polarization demosaicking methods are highly dependent on the training set, which can be distorted owing to chromatic and polarimetric fractures present in real-world scenes.

In recent years, the field of super-resolution (SR) using deep learning (DL) has witnessed rapid substantial advancements. From the early utilization of convolutional and generative adversarial networks to the more recent emergence of Transformer networks and diffusion models, DL-based SR techniques have consistently achieved significant breakthroughs [21–25]. However, it is essential to highlight that the majority of these methods predominantly rely on RGB color spaces for modeling, with limited exploration into other color spaces [26]. While DL-based single-image SR methods [21,23,24,27] independently enhance individual polarimetric direction images, they tend to overlook the inherent correlation and complementary spatial information among images with diverse polarization directions. Recently, D. Yu introduced a dedicated SR network model tailored for color polarization images. However, it's noteworthy that the low-resolution (LR) training images are generated via bicubic downsampling, a method less suited for real-world LR images [28]. Additionally, DL is susceptible to producing “hallucinated” details in HR predictions [29]. This susceptibility becomes particularly pronounced when restoring aliasing artifacts in LR images, as DL methods frequently misinterpret aliasing artifacts as genuine features of the original scene, leading to reduced result accuracy [30].

The contributions of this study can be summarized as follows. First, we introduce CAPS for high-spatial-resolution snapshot color polarization imaging, which exhibits universal applicability. Second, we propose a color polarization SR reconstruction algorithm to enhance imaging quality and reliability. Currently, research focused on SR within color spaces beyond RGB color is considerably limited. Because of variations in luminance, structure, and texture across images with different polarization directions, applying monochrome multi-image SR reconstruction [31–36] directly to color polarized images is infeasible. Current polarization image reconstruction algorithms [17,18] fail to effectively harness the complementarity among all observed images, as they solely employ the observed image with matching chromatic and polarimetric signatures as the HR image to be estimated in the data fidelity term. In this study, an analysis of the statistical characteristics of color polarization observed data is conducted, and for the first time, guided filter residuals are employed to calculate correlation coefficients among different channel features. This approach enables confidence-guided color polarization SR reconstruction. Previous methods primarily used guided filter residuals as denoising confidence measures. Whereas, the proposed confidence-guided SR reconstruction method enables comprehensive utilization of complementary information from all observed data whilst avoiding the introduction of false spectral and polarimetric signature. A joint optimization strategy for color and polarization information is proposed to ensure high reconstruction accuracy of the chromatic and polarimetric signatures. In addition, to simultaneously perform SR and denoising, the proposed algorithm incorporates the regularization constraint of a nonlocal sparse tensor. An alternating direction method of multipliers (ADMM) with adaptive parameter adjustments is also introduced for iterative optimization jointly constrained by the data fidelity and regularization. Overall, the proposed algorithm can simultaneously perform Bayer demosaicking, SR imaging, and denoising on color polarized images.

2. Background and system overview

2.1 Notation

We adopt the following notation in this paper. Tensors are denoted by bold Euler script letters (e.g., $\cal T$). Bold uppercase letters, bold lowercase letters, and lowercase italicized letters (e.g., T, t and $t$) denote matrices, vectors, and scalars, respectively. An N-dimensional real tensor is represented as ${\cal T} \in {\mathbb{R}}^{{I_1} \times {I_2}, \cdots \times {I_N}}$. The elements in tensor $\cal T$ are denoted as ${t_{{i_1}{i_2}, \ldots {i_N}}}$, where $1 \le {i_n} \le {I_n}$. The definitions of the F-norm and ${l_1}$-norm of a tensor are consistent with those presented in [37]. The unfolding matrix of $\cal T$ along the n-th dimension is expressed as ${{{\mathbf T}}_{(n)}} \in {\mathbb{R}}^{{I_n} \times ({{I_1} \times {I_2} \times \cdots {I_{n - 1}} \times {I_{n + 1}} \times \cdots \times {I_N}})}$. The n-mode product of tensor ${\cal T} \in {\mathbb{R}}^{{I_1} \times {I_2}, \cdots \times {I_N}}$ and matrix ${\textrm{B}} \in {\mathbb{R} ^{{J_n} \times {I_n}}}$ is denoted as ${\cal T} \times_n{\mathbf B}$, which results in an N-dimensional real tensor ${\cal A} \in {\mathbb{R}}^{{I_1} \times {I_2} \cdots \times {J_n} \cdots \times {I_N}}$ . The calculation of its elements can be found in [38].

2.2 Stokes vectors

Stokes vectors can describe the polarization properties. The phase information between the orthogonal polarization components is not readily available for natural light, that is, no circular polarization state is measured [39]. Therefore, for linear polarization, we consider only the first three parameters of the Stokes vector [40] expressed as

(1)$$\left\{ {\begin{array}{{c}} {{S_0} = {{({{I_0} + {I_{45}} + {I_{90}} + {I_{135}}} )} / 2}}\\ {{S_1} = {I_0} - {I_{90}}}\\ {{S_2} = {I_{45}} - {I_{135}}} \end{array}} \right.$$

where ${I_p}$ represents the light intensity filtered by the linear polarizer oriented in direction p for $p \in \left\{ {\begin{array}{{cccc}} {{0^ \circ }}&{{{45}^ \circ }}&{{{90}^ \circ }}&{{{135}^ \circ }} \end{array}} \right\}$. Direction p refers to the relative angle between the direction of polarization axis and the specified zero direction (positive direction of X axis) under the system coordinate. Based on the Stokes vector, the degree of linear polarization (DoLP) and angle of polarization (AoP) are formulated as

(2)$$\textrm{DoLP = }\frac{{\sqrt {S_1^2 + S_2^2} }}{{{S_0}}}$$

(3)$$\textrm{AoLP = }\frac{1}{2}\textrm{arctan}\left( {\frac{{{S_2}}}{{{S_1}}}} \right)$$

2.3 Super resolution imaging

Camera-array-based SR imaging overcomes the physical constraint of conventional optical system and detector, such as low dynamic range and high noise associated with small pixel size sensors, and the limited Field of View (FOV) and depth of field associated with large focal length lenses [41]. It surpasses the limitations of the spatial bandwidth product in conventional imaging, enabling wide-field HR imaging. The reconstructed image with camera array of which focal length is f is equivalent to the imaging result of the long focal length lens with focal length of $\sqrt N f$ [42]. This is advantageous for reducing $\sqrt N \times \sqrt N $ the optical track length of the system.

Camera-array-based SR imaging exploits non-redundant motion information among apertures for SR reconstruction. It disentangles the aliased artifacts in input LR images [43]. These aliasing artifacts emerge owing to the inadequate detector sampling frequency, which fails to capture high-frequency signals transmitted through optical components. Consequently, LR image high-frequency information is not lost albeit encoded within low-frequency signals. The reconstruction of multiple aliased LR images with sub-pixel offsets enables retrieval of high-frequency details [29,36,44–46]. Random errors in the optical axis parallelism during camera array alignment and assembly ensure that the sampling deviation between apertures has a subpixel magnitude [47]. Owing to the layout of the camera array, the offsets of all the apertures relative to the reference aperture are uniformly distributed on a disk, establishing a well-posed SR reconstruction problem of the camera array [36]. Hence, the camera array enables an exceptional SR imaging capacity for real-world applications.

The observation model of a camera array-based SR imaging system can be expressed as follows [47]

(4)$$\textrm{vec(}{\mathbf I}_L^n) = {{\mathbf D}_{\textrm{spatial}}}{{\mathbf W}_n}\textrm{vec(}{{\mathbf I}_H}) + {{\mathbf e}_n}({n = 1 \ldots N} )$$

where ${\mathbf I}_L^n \in {{{\mathbb R}}^{h \times w}}$ represents the image captured by the n-th camera, ${{\mathbf I}_H} \in {{{\mathbb R}}^{h\sqrt N \times w\sqrt N }}$ represents the HR image to be estimated, ${{\mathbf W}_n}$ is the warping matrix defined by the geometrical transformation related to the image captured by the n-th camera with the reference image, ${{\mathbf e}_n} \in {{{\mathbb R}}^{h \times w}}$ represents noise, N is the number of apertures, ${{\mathbf D}_{\textrm{spatial}}}$ is the bi-dimensional downsampling operator for the imaging focal plane arrays which captures blur owing to pixel integration, such as averaging or Gaussian smoothing, and vec represents vectorization.

Disregarding factors such as the registration accuracy and optical aberration, the theoretical SR magnification can be expressed as [41,48]

(5)$${r_{SR}} = \min \left( {\frac{{{f_{diffraction}}}}{{{f_{Nyquist}}}},\sqrt N } \right) = \min \left( {\frac{{2AP}}{{1.22\lambda f}},\sqrt N } \right)$$

where ${f_{diffraction}}$ is the angular frequency of the diffraction limit, ${f_{Nyquist}}$ is the Nyquist angular frequency, A and f are the aperture diameter and focal length of the lens, respectively, P is the pixel size of the sensor, and $\lambda $ is the wavelength. Compared with the raw observation image without demosaicking, the proposed four-aperture CAPS is equivalent to four-times SR reconstruction.

The effect FOV in camera-array-based SR imaging system is the same as that of a single aperture. In CAPS, the DoLP and AoP polarization information can only be obtained from the common overlapping region of all apertures. Therefore, the effect FOV for CAPS is determined by the common overlapping region of all apertures. As the camera array in CAPS is the planar distribution and grid arrangement, this ensures a compact structure, maximizing the overlapping region among apertures. Under the ideal conditions of the parallel optical axes among apertures and the pinhole camera model, for the proposed four-aperture CAPS, the size of the common overlapping region in the horizontal direction is given by,

(6)$${R_x} = \frac{{{N_x}Z}}{{{f_P}}} - B$$

where Z is the object depth, ${f_P} = f/P$ represents the pixel focal length, B is the baseline length of the camera array, and ${N_x}$ is the total number of pixels in the horizontal direction for single aperture. Therefore, the effect FOV in the horizontal direction is given by,

(7)$$FO{V_h} = 2\arctan \left( {\frac{{{R_x}}}{{2Z}}} \right) = 2\arctan \left( {\frac{{{N_x}}}{{2{f_P}}} - \frac{B}{{2Z}}} \right)$$

Similarly, the effect FOV in the vertical direction is given by,

(8)$$FO{V_v} = 2\arctan \left( {\frac{{{N_y}}}{{2{f_P}}} - \frac{B}{{2Z}}} \right)$$

where ${N_y}$ is the total number of pixels in the vertical direction for single aperture.

2.4 Observation model

The proposed CAPS is a four-aperture planar compound eye imaging system. Each sub-eye is equipped with an independent color camera, lens, and linear polarizer. In contrast to compound eye imaging systems based on micro-lens arrays [37], this optical design featuring independent sub-eyes, effectively eliminates interference between apertures. CAPS operates as a passive imaging system, relying on ambient light or the target's self-emission for imaging without the necessity for an active light source. Consequently, it exhibits minimal interference arising from coherent scattering. The observation model of the CAPS can be expressed as

(9)$$\textrm{vec(}{\mathbf I}_{L\_raw}^n) = {{\mathbf D}_{\textrm{spectral}}}{{\mathbf D}_{\textrm{spatial}}}{{\mathbf W}_n}{{\mathbf P}_n}\textrm{vec(}{{\mathbf I}_H}) + {{\mathbf{e}}_n}({n = 1 \ldots 4} )$$

where ${{\mathbf P}_n}$ represents the polarization filter matrix. In addition, ${{\mathbf D}_{\textrm{spectral}}}$ represents spectral domain downsampling, including spectral filtering. In detail, the RGB band is spatially downsampled by a color filter array, while only one of the RGB values is recorded at each pixel position, and four adjacent pixels form a superpixel. Moreover, ${\mathbf I}_{L\_raw}^n \in {{{\mathbb R}}^{h \times w}}$ represents the raw image captured by the n-th color camera, i.e., the mosaic image with an RGGB Bayer filter. A schematic of imaging using the proposed CAPS is shown in Fig. 2.

Fig. 2. Schematic of proposed CAPS. (a) Four-aperture imaging system with polarization directions of 0°, 45°, 90°, and 135°. There is non-redundant motion information among apertures. (b) Single-aperture structure. Each aperture corresponds to an independent color camera, lens, and linear polarizer. There are IFOV errors in the color filter array per aperture.

Download Full Size | PDF

Considering the two imaging conditions of the RGGB Bayer filter and directional polarizing filter, (9) can be rewritten as

(10)$$\textrm{vec(}{\mathbf I}_L^{m(c )}) = {{\mathbf D}_{\textrm{spatial}}}{{\mathbf W}_{m(c )}}{{\mathbf F}_c}\textrm{vec(}{{\mathbf I}_H}) + {{\mathbf{e}}_c}({c = 1 \ldots 12})$$

where ${{\mathbf F}_c}$ represents the joint filter matrix of color and polarization and c is the number of channels. The combination of different chromatic and polarimetric signatures results in 12 channels, (R, G, B) × (0°, 45°, 90°, 135°). In addition, ${{\mathbf W}_{m(c )}}$ denotes the warping matrix of the joint inter-aperture geometric transformation and instantaneous field of view (IFOV) in the color filter array. Although downsampling in ${{\mathbf D}_{\textrm{spatial}}}$ corresponds to pixel integration in a small region and downsampling in ${{\mathbf D}_{\textrm{spectral}}}$ corresponds to pixel decimation in a small region, the information provided by neighboring pixels is required to reconstruct the lost pixels in the color filter array. Therefore, downsampling in (9) is unified as ${{\mathbf D}_{\textrm{spatial}}}$. Finally, ${\mathbf I}_L^{m(c )} \in {{{\mathbb R}}^{h/2 \times w/2}}$ represents the observed image corresponding to channel c, and m(c) represents the index of the observed image corresponding to channel c.

Let $\textrm{vec(}{\mathbf I}_H^c) = {{\mathbf F}_c}\textrm{vec(}{{\mathbf I}_H})$ with ${\mathbf I}_H^c \in {{{\mathbb R}}^{2h \times 2w}}$ be the HR image of channel c to be estimated and consider sensing matrix ${{\mathbf M}_{m(c )}} = {{\mathbf D}_{\textrm{spatial}}}{{\mathbf W}_{m(c )}}$ with ${{\mathbf M}_{m(c )}} \in {{{\mathbb R}}^{hw/4 \times 4hw}}$. Then, (10) is equivalent to

(11)$$\textrm{vec(}{\mathbf I}_L^{m(c )}) = {{\mathbf M}_{m(c )}}\textrm{vec(}{\mathbf I}_H^c) + {{\mathbf{e}}_c}({c = 1 \ldots 12} )$$

The sampling frequency of the reconstructed HR image ${\mathbf I}_H^c$, is four times higher than that of observed image ${\mathbf I}_L^{m(c )}$. CAPS provides the solution to the inverse problem of (11).

3. Imaging methods

To leverage the correlation between chromatic and polarimetric signatures without mutual interference, we propose a unified optimization approach for chromaticity and polarization. Moreover, we employ guided filter residuals to compute correlation coefficients between different channel features, serving as a confidence metric for SR reconstruction. This approach ensures the absence of spurious spectral polarization information in the reconstruction results whilst fully harnessing complementary information from all observed images. The primary aim of the proposed approach is to strengthen the data fidelity constraint within the objective function. To concurrently accomplish both SR and denoising in the reconstruction process, we introduce a regularization term of non-local sparse tensors. To address the objective function with these joint constraints, ADMM optimization with adaptive parameter adjustments [49] is employed to balance the tradeoff between SR and denoising. A flowchart of the corresponding algorithm is shown in Fig. 3.

Fig. 3. Schematic of proposed reconstruction algorithm for CAPS. The algorithm combines a data fidelity constraint of guided filtering based confidence and regularization constraint of a nonlocal sparse tensor. LR and noisy color polarized images were acquired using a CAPS prototype. The four captured images were registered and rearranged to obtain the warping matrixes and 16 images with complementary information in 12 channels with different colors and polarization directions. The proposed confidence-aware guided filtering was used to calculate the correlation coefficients between the 12 channels, and the reconstruction result of SCSR was taken as the initial output. In confidence-guided SR reconstruction algorithm, the constraints from the 16 images were incorporated into image reconstruction per channel in the data fidelity term using the calculated correlation coefficients between the 12 channels. The regularization constraint of nonlocal sparse tensor was added to simultaneously achieve SR imaging and denoising.

Download Full Size | PDF

3.1 Color polarized image rearrangement

There are multiple sources of polarization in the real world, such as illumination, reflection, and birefringence, which cause the color of a scene to vary at different polarization angles [18]. The chromatic and polarimetric information can be reconstructed through a cascade structure, that is, the color image per polarization direction is first obtained through Bayer demosaicking. Then, SR reconstruction based on the monochromatic polarization camera array is used per channel in the RGB image. However, this leads to serious chromatic and polarimetric distortions. Because the cascade structure ignores the correlation between chromatic and polarimetric information, the constraint of the polarimetric signatures is weakened [50], and the reconstruction error of the previous stage is accumulated and amplified in the next stage.

To address this problem, a joint solution strategy is proposed. First, the captured RGGB Bayer filter images in four polarization directions are rearranged. Since each pixel in a superpixel has a changing IFOV, the two G components contain complementary spatial information. Therefore, the RGGB Bayer filter image per polarization direction is split into four images, and the four polarization directions are split into 16 images with 12 channels: 0°-R, 0°-G, 0°-G, 0°- B, 45°-R, 45°-G, 45°-G, 45°-B, 90°-R, 90°-G, 90°-G, 90°-B, 135°-R, 135°-G, 135°-G, and 135°-B. Taking 0°-R as the reference image, the warping matrices of the remaining 15 images are solved during registration.

3.2 Confidence-guided SR via guided filtering

To analyze the statistical characteristics of Bayer RGB polarized image sequences containing complementary aliased high-frequency components, a confidence-guided SR method based on guided filtering is proposed, as illustrated in Fig. 4. The data constraint limitations from existing algorithms are clearly illustrated in Fig. 4, where the HR image reconstruction for each channel is restricted solely to observed images with identical channel characteristics. In contrast, the proposed approach enables comprehensive utilization of all original observed images to reconstruct HR images in each channel without introducing spurious spectral and polarimetric signatures.

Fig. 4. Comparison of data term between the confidence-guided SR reconstruction algorithm utilizing guided filtering and the current polarization image restoration algorithms. (a) Constraint of data term from existing algorithms. The HR image reconstruction for each channel is restricted solely to observed images with identical channel characteristics. It does not fully exploit the complementary information from all observed images. (b) Constraint of data term from the presented method. Confidence-guided SR reconstruction is achieved by introducing guided filtering based Confidence. This ensures that when leveraging complementary information from the observed images in other channels, false textures are not introduced.

Download Full Size | PDF

Confidence measurement is required to achieve both low confidence for an exclusive texture, which ensures that the reconstructed image does not introduce false information, and high confidence for a consistent texture with complementary aliasing information, thereby fully leveraging the information complementarity between channels. For example, considering an indoor scene in a color polarization dataset [40], the simulated observation image is obtained through imaging according to (10). Figure 5 illustrates the regions of exclusive and consistent textures for aligned 0°-R and 90°-G images.

Fig. 5. Regions of exclusive and consistent textures for an indoor scene in a color polarization dataset [40]. Local regions with exclusive (red boxes) and consistent (blue boxes) textures are selected. Leaves are reflected on the specular surface of the 0°-R image but not on that of the 90°-G image. The consistent texture has complementary aliasing information. To better demonstrate the non-redundancy of the 0°-R and 90°-G images in the consistent texture region, a 9 × 9 window is selected, showing that the grayscale distributions of the two windows differ.

Download Full Size | PDF

Traditional methods rely on calculating normalized cross-correlation within clustered segmented regions [51] or local square windows as a confidence measurement. However, they fail to effectively distinguish between exclusive and consistent textures, as depicted in Fig. 6. Owing to non-redundant aliasing information and noise interference, the former tends to underestimate the consistent texture confidence level. The latter not only underestimates the consistent texture confidence level albeit also overestimates the exclusive texture confidence level. Exclusive textures appear as local nonlinear brightness differences, whereas consistent textures with complementary aliasing approximate local linear brightness differences. Guided filtering is a technique used for local linear transformation and is robust to noise [52]. Inspired by the method in [53], which uses a guided filtering residual as the confidence basis for denoising, we exploit guided filtering residual as the confidence basis of color polarization SR for integrating other channel images with different spectral components and polarization directions into the SR reconstruction of the target channel image. The guided filtering residual is as follows:

(12)$${r_k} = \mathop {\min }\limits_{{a_k},{b_k}} \sum\limits_{i \in {w_k}} {({{{({{a_k}{{\mathbf G}_i} + {b_k} - {{\mathbf A}_i}} )}^2} + \varepsilon a_k^2} )}$$

where ${\mathbf G}$ represents the guide image, ${\mathbf A}$ represents the target image, $\varepsilon $ is a smoothing parameter, ${{\mathbf w}_k}$ is a window centered at pixel k. ${a_k}$ and ${b_k}$ are linear coefficients.

Fig. 6. Confidence levels obtained by different measurement methods for 0°-R and 90°-G images in Fig. 5. From top to bottom, the rows show the pseudo-color confidence results of the normalized cross-correlation (NCC) of the clustered segmented regions, NCC of the local square window, and proposed guided filter. The (mean, variance) of the Gaussian functions for the three methods is (1, 0.2), (1, 0.2), and (0, 0.2), respectively. We consider 15 clusters, noting that the confidence level for more clusters is basically the same.

Download Full Size | PDF

The guided filtering residual indicates the likelihood that a pixel belongs to an exclusive texture. Hence, pixels with higher residuals have lower confidence, implying an inverse proportionality between the two variables. Moreover, the confidence levels are required to be within the range of 0 to 1. The corresponding relation is obtained by applying the following Gaussian function to ${r_k}$:

(13)$${L_k} = \exp ({ - {r_k}/\nu } )$$

where $\nu $ is the Gaussian variance.

Let confidence-aware guided filtering be denoted as CAGF. An image from one of the 12 channels is selected as the target image, and the images of the other 11 channels are used as guide images. Using (13), the correlation coefficient between the target image and each of the 11 guide images is calculated, obtaining the confidence level per channel for SR reconstruction of the corresponding guide image. By traversing all the images, the correlation coefficient between images from any two channels across the 12 channels can be calculated. Equation (14) represents the confidence level between the images from channels k and j, where the first parameter of CAGF is the target image, and the second parameter is the guide image. High values is often assigned to exclusive textures in the guide image not present in the target image. For exclusive textures, low values are required. Therefore, we take the smaller value from $\textrm{CAGF}({{\mathbf I}_L^k,{\mathbf I}_L^j} )$ and $\textrm{CAGF}({{\mathbf I}_L^j,{\mathbf I}_L^k} )$ as the confidence level between ${\mathbf I}_L^k$ and ${\mathbf I}_L^j$.

(14)$${\mathbf L}_{k,j}^L = {\mathbf L}_{j,k}^L = \min ({\textrm{CAGF}({{\mathbf I}_L^k,{\mathbf I}_L^j} ),\textrm{CAGF}({{\mathbf I}_L^j,{\mathbf I}_L^k} )} )\textrm{ }({k = 1 \ldots 12,j = 1 \ldots 12} )$$

where min represents the minimum value.

Considering only the data fidelity, the objective function for SR reconstruction of the image from channel c is given:

(15)$$\sum\limits_{m = 1}^{16} {\|{\textrm{vec}({{\mathbf I}_L^m} )- {{\mathbf M}_m}\textrm{vec}({{\mathbf I}_H^{c(m )}({{\mathbf I}_H^c} )} )} \|_2^2}$$

where c(m) is the channel index of the m-th observed image and ${\mathbf I}_H^{c(m )}({{\mathbf I}_H^c} )$ represents the relation between the images from channels c(m), ${\mathbf I}_H^{c(m )}$ and c, ${\mathbf I}_H^c$. Note that there are two images of the G component per polarization direction in $\{{{\mathbf I}_L^n} \}_{n = 1}^{16}$.

The solution for the HR image from channel c obtained by using the steepest descent method proceeds iteratively as follows:

(16)$$\textrm{vec(}{\mathbf I}_H^{c,i + 1}) = \textrm{vec(}{\mathbf I}_H^{c,i}) + \varDelta t\sum\limits_{m = 1}^{16} {\textrm{vec}\left( {{{\left[ {\frac{{\partial {\mathbf I}_H^{c(m ),i}}}{{\partial {\mathbf I}_H^{c,i}}}} \right]}^T}} \right) \odot ({{\mathbf M}_m^T({\textrm{vec}({{\mathbf I}_L^m} )- {{\mathbf M}_m}\textrm{vec}({{\mathbf I}_H^{c(m ),i}} )} )} )}$$

where ${\odot} $ denotes the Hadamard (elementwise) product and ${\left( {\frac{{\partial {\mathbf I}_H^{c(m ),i}}}{{\partial {\mathbf I}_H^{c,i}}}} \right)^T}$ can be regarded as the confidence level, ${\mathbf L}_{c(m ),c}^H$, between the images from channels c(m) and c. This term is obtained by computing the confidence level ${\mathbf L}_{m,m(c)}^L$ between the m-th and m(c)-th observed images using (14) and then resizing ${\mathbf L}_{m,m(c)}^L$ to match the size of the HR image.

Based on the proposed confidence-aware guided filtering, we observe that the confidence level between channels with different polarization directions is higher than that between channels with different colors in most cases, being consistent with the finding in [54]. Recently, Huang et al. [48] proposed a spectral-clustering-based SR reconstruction algorithm (SCSR) to realize multi-spectral SR based on camera array. However, the observed images used in the data term of this algorithm are those obtained after multiple resampling operations of the original observed images, thus limiting the edge sharpness of the final reconstructed image. But SCSR has good spectral and polarimetric signature fidelity and can restore aliasing information, so the results of SCSR are used for initialization to ensure and accelerate optimization convergence.

3.3 Nonlocal sparse tensor regularization

As the captured LR color polarized image is corrupted by noise, applying (16) directly results in reconstructed HR intensity images with notable noise, which is further amplified in the DoLP and AoP images. To ensure a high SNR for the reconstructed intensity as well as the DoLP and AoP images, we introduce nonlocal sparse tensor regularization into (15). Group sparsity or low-rank regularization of two-dimensional images is unsuitable for high-dimensional data. Moreover, rearranging high-dimensional data into a two-dimensional matrix destroys the positional relations between pixels in the spatial domain and weakens the corresponding relations between the spatial and other dimensions [55]. Compared with matrix, a tensor can represent high-dimensional data without information loss, Therefore, we use a tensor to describe the three-dimensional color polarization data.

To fully use the spatial and color polarized channel dimensions and capture the intrinsic structure of the color polarized three-dimensional data, the nonlocal sparse characteristics of a local region in the two-dimensional image are extended to the color polarized three-dimensional data. In addition, a nonlocal sparse tensor constraint is introduced. Nonlocal sparse tensor factorization can exploit the internal structure correlation in the spatial domain, inter-channel correlation, and nonlocal self-similarity. The four-dimensional tensor preserves the spatial channel structure and provides sparsity in the clusters that aggregate similar blocks [56], thus improving the SNR and fidelity of the chromatic and polarimetric signatures in the reconstruction results.

The HR color polarized images to be estimated can be represented as tensor ${\cal I} \in {\mathbb{R}}^{h \times w \times 12}$ after being stacked along the channel dimensions. $\cal I$ is then decomposed into several overlapping three-dimensional blocks, with the k-th block being denoted as ${{\cal R}^{(k)}} = {R_k}({{\cal I}})\in {{\mathbb{R}}^{{f_h} \times {f_w} \times 12}}$, where ${R_k}({\cdot} )$ represents block extraction. Using sparse tensor factorization, ${{\cal R}^{(k)}}$ can be decomposed as follows:

(17)$${{\cal R}^{(k)}}\textrm{ = }{{\cal C}^{(k)}}{\times _1}{\mathbf{\Phi}}_h^{(k)}{\times _2}{\mathbf{\Phi}}_w^{(k)}{\times _3}{\mathbf{\Phi}}_c^{(k)}$$

where ${{\cal C}^{(k)}} \in {{{\mathbb R}}^{h \times w \times 12}}$ is the tensor core, and ${\mathbf{\Phi}}_h^{(k )} \in {{{\mathbb R}}^{{f_h} \times {f_h}}}$, ${\mathbf{\Phi}}_w^{(k)} \in {{{\mathbb R}}^{{f_w} \times {f_w}}}$ and ${\mathbf{\Phi}}_c^{(k)} \in {{{\mathbb R}}^{12 \times 12}}$ are the height-, width-, and channel-mode dictionaries, respectively.

Because principal component analysis bases outperform conventional bases for sparse representation [57], these bases are used as atoms in dictionary learning. For the k-th block ${{\cal R}^{(k)}}$, the first P blocks most similar to it are selected within the window centered on ${{\cal R}^{(k)}}$, denoted as ${{\cal S}^{(k)}} \in {{{\mathbb R}}^{{f_h} \times {f_w} \times 12 \times P}}$. The blocks in ${{{\cal S}}^{(k)}}$ share the same dictionary. Therefore, to improve the accuracy of dictionary learning, singular value decomposition of the covariance matrix of ${S}_{(n )}^{(k )}$ is applied to calculate the height-, width-, and channel-mode dictionaries [17].

The local adaptive dictionaries of all the blocks form an overcomplete dictionary. Except for ${\mathbf{\Phi}}_h^{(k)}$, ${\mathbf{\Phi}}_w^{(k)}$ and ${\mathbf{\Phi}}_c^{(k)}$, the tensor core coefficients of ${{\cal R}^{(k)}}$ corresponding to the other dictionaries are zero. That is, local adaptive dictionaries essentially guarantee the sparsity of the tensor cores. The intrinsic structural correlation of the spatial domain and inter-channel correlation make the tensor core of ${{\cal R}^{(k)}}$ also sparse under dictionaries ${\mathbf{\Phi}}_h^{(k)}$, ${\mathbf{\Phi}}_w^{(k)}$ and ${\mathbf{\Phi}}_c^{(k)}$, Thus, we add the ${l_1}$-norm constraint of ${C^{(k)}}$ to restrict the solution of the tensor core. Estimating the tensor core from a degraded data cube corrupted by noise may fail despite adding the sparsity constraint of ${{\cal C}^{(k)}}$ [57]. Note that ${{\cal R}^{(k)}}$ has both local and nonlocal similarities. Therefore, by taking advantage of the group sparsity between similar blocks, we add nonlocal self-similarity to further constrain the solution of the tensor core, thus improving the robustness and insensitivity to noise in sparse coding learning. The applied nonlocal self-similarity selects similar blocks within a local window, because such blocks have a higher similarity than those in other regions, instead of clustering and grouping all blocks of the image to then determine similar blocks [38]. There are two main implementations of nonlocal self-similarity: similarity between image blocks and similarity between sparse cores corresponding to image blocks. This is because the similar blocks share the same dictionary. However, the second implementation calculates the sparse cores of similar blocks, making it impractical for the 3D tensor structure due to the large amount of calculation. Therefore, we adopt nonlocal self-similarity based on the image blocks and only calculate the tensor core of block ${{\cal R}^{(k)}}$ without needing to calculate the tensor cores of its similar blocks [32].

Finally, by combining the sparse tensor factorization of the blocks, ${l_1}$ -norm constraint of ${{\cal C}^{(k)}}$, and nonlocal self-similarity constraint, the camera-array-based color polarized SR reconstruction can be expressed as follows:

(18)$$\begin{array}{l} \mathop {\textrm{min}}\limits_{{\cal I}} \left\{ {\frac{1}{2}\sum\limits_{m = 1}^{16} {\|{\textrm{vec}({{\mathbf I}_L^m} )- {{\mathbf M}_m}\textrm{vec}({{\mathbf I}_H^{c(m)}} )} \|_2^2} } \right. + \sum\limits_{k = 1}^K {\left( {\frac{\mu}{2}\|{{\mathbf R^{(k)}} - {\mathbf C^{(k)}}{\times_1}{\mathbf{\Phi}}_h^{(k)}{\times_2}{\mathbf{\Phi}}_w^{(k)}{\times_3}{\mathbf{\Phi}}_c^{(k)}} \|_F^2} \right.} \\ \textrm{ } + {\eta _k}\|{{{\cal U}^{(k)}} - {{\cal C}^{(k)}}{ \times_1}{\mathbf{\Phi}}_h^{(k)}{\times_2}{\mathbf{\Phi}}_w^{(k)}{\times_3}{\mathbf{\Phi}}_{c}^{(k)}} \|_F^2 { { + {\lambda_k}{{\|{{{\cal C}^{(k)}}} \|}_1}} )} \}\textrm{ } \end{array}$$

where ${\mathbf I}_H^{c(m )}$ denotes the image of the c(m)-th slice in the channel dimension and ${{\cal U}^{(k)}} \in {{\mathbb{R}}^{{f_h} \times {f_w} \times 12}}$ is the weighted sum of similar blocks,

(19)$${{\cal U}^{(k)}} = {{\cal S}^{(k)}}{\times _4}{{\mathbf w}^{(k)}}$$

with ${{\mathbf w}^{(k)}} \in {{\mathbb{R}}^{1 \times P}}$ being a weight vector, whose values depend on the distance between ${{\cal R}^{(k)}}$ and similar blocks. We denote the operation for calculating ${{\cal U}^{(k)}}$ as ${U_k}({})$.

The transpose of the observation matrix ${{\mathbf M}_m}$ is constructed using forward warping as in [48]. Because the proposed algorithm adopts nonlocal sparse tensor regularization for denoising, the sampling kernel in the observation matrix directly uses an isotropic Gaussian kernel with a relatively low computational complexity.

4. Optimization

We introduce ADMM optimization [49] with adaptive parameter adjustments to solve (18). In ADMM, Lagrangian multiplier $\cal L$ is introduced to constrain the solution of $\cal I$ as follows:

(20)$$\begin{array}{l} \mathop {\textrm{min}}\limits_{{\cal I}} \left\{ {\frac{1}{2}\sum\limits_{m = 1}^{16} {\|{\textrm{vec}({{\mathbf I}_L^m})- {{\mathbf M}_m}\textrm{vec}({{\mathbf I}_H^{c(m)}} )} \|_2^2} } \right. + \sum\limits_{k = 1}^K {\left( {\frac{\mu}{2}\|{{R_k}({{{\cal I}} - {{\cal L}}})- {{{\cal C}}^{(k)}}{\times_1}{\mathbf{\Phi}}_h^{(k)}{\times_2}{\mathbf{\Phi}}_w^{(k)}{ \times_3}{\mathbf{\Phi}}_c^{(k)}} \|_F^2} \right.} \\ \textrm{ + }{\eta _k}\|{{U_k}({{{\cal I}} - {{\cal L}}})- {{{\cal C}}^{(k)}}{ \times_1}\mathbf{\Phi}_h^{(k )}{ \times_2}\mathbf{\Phi}_w^{(k )}{ \times_3}\mathbf{\Phi}_c^{(k )}} \|_F^2\textrm{ } + { {{\lambda_k}{{\|{{{{\cal C}}^{(k)}}} \|}_1}} )} \}\textrm{ } \end{array}$$

Hence, camera-array-based color polarized SR reconstruction is expressed as an iterative update process of the dictionaries (${\mathbf{\Phi}}_h^{(k)}$, and ${\mathbf{\Phi}}_c^{(k)}$), tensor core ${{\cal C}^{(k)}}$, HR color polarized tensor $\cal I$, and Lagrangian multiplier $\cal L$ as described below.

Step 1. Update the dictionaries (${\Phi }_h^{(k )}$, ${\Phi }_w^{(k )}$, and ${\Phi }_c^{(k )}$)

$\cal I - \cal L$ substitutes I in section 3.3 to solve the local adaptive dictionary of the block. For the k-th block, the dictionaries after i + 1 iterations are given by

(21)$${\mathbf{\Phi}}_h^{(k),i + 1},{\mathbf{\Phi}}_w^{(k),i + 1},{\mathbf{\Phi}}_c^{(k),i + 1} = PCAT({{R_k}({{{\cal I}_i} - {{\cal L}_i}})} )$$

where $PCAT({} )$ represents dictionary learning described in section III-C.

Step 2. Update tensor core ${C^{(k )}}$

For tensor core ${{\cal C}^{(k)}}$ and retaining its relevant components in (20), the minimization problem is formulated as follows:

(22)$$\begin{array}{l} \mathop {\min }\limits_{{c^{(k )}}} \textrm{ }\sum\limits_{k = 1}^K {\left( {\frac{\mu }{2}\|{{R_k}({\cal I - L} )- {{\cal C}^{(k)}}{ \times_1}{\mathbf{\Phi}}_h^{(k )}{ \times_2}{\mathbf{\Phi}}_w^{(k )}{ \times_3}{\mathbf{\Phi}}_c^{(k )}} \|_F^2} \right.} \\ \textrm{ } + {\eta _k}\|{{U_k}({{\cal I} - L} )- {{\cal C}^{(k)}}{ \times_1}{\mathbf{\Phi}}_h^{(k )}{ \times_2}{\mathbf{\Phi}}_w^{(k )}{ \times_3}{\mathbf{\Phi}}_c^{(k )}} \|_F^2 { + {\lambda_k}{{\|{{{\cal C}^{(k)}}} \|}_1}} )\end{array}$$

The soft threshold shrinkage algorithm [58] is used to solve (22), and the tensor core at iteration i + 1 is given by

(23)$${{\cal C}^{(k),i + 1}} = soft\left( {\frac{{2\frac{{\eta_k^i}}{{{\mu_i}}}{{\mathbf{\beta}}^{(k),i}} + {{\mathbf{\alpha}}^{(k),i}}}}{{1 + 2\frac{{\eta_k^i}}{{{\mu_i}}}}},\frac{{\frac{{\lambda_k^i}}{{{\mu_i}}}}}{{1 + 2\frac{{\eta_k^i}}{{{\mu_i}}}}}} \right)$$

Due to variations in image content across ${\lambda _k}$ different regions, such as flat or textured areas, and the varying local and non-local self-similarity in different textured regions, the sparsity levels of their tensor cores also vary. Therefore, an adaptive strategy is employed to determine ${\eta _k}$ and for local block ${{{\cal R}}^{(k)}}$, aiming to solve a more accurate tensor core ${{{\cal C}}^{(k)}}$, i.e., ${{\alpha}^{(k),i + 1}} = {R_k}({{{\cal I}_i} - {{\cal L}_i}} ){ \times _1}{({{\mathbf{\Phi}}_h^{(k),i}})^T}{ \times _2}{({{\mathbf{\Phi}}_w^{(k),i}} )^T}{ \times _3}{({{\mathbf{\Phi}}_c^{(k),i}} )^T}$, $\frac{{\lambda _k^i}}{{{\mu _i}}} = \frac{{{c_1}}}{{{{\|{{{\alpha }^{(k ),i + 1}}} \|}_1}}}$, $\frac{{\eta _k^i}}{{{\mu _i}}} = \frac{{{c_2}}}{{\|{{{\alpha }^{(k ),i + 1}} - {{\beta }^{(k ),i + 1}}} \|_2^2}}$ [17], and ${{\beta}^{(k),i + 1}} = {U_k}({{{\cal I}_i} - {{\cal L}_i}}){ \times _1}{({{\mathbf{\Phi}}_h^{(k),i}})^T}{ \times _2}{({{\mathbf{\Phi}}_w^{(k),i}} )^T}{ \times _3}{({{\mathbf{\Phi}}_c^{(k),i}} )^T}$.

Step 3. Update the HR color polarized images $\{{{\mathbf I}_H^c} \}_{c = 1}^{12}$

For HR color polarized tensor $\cal I$ and retaining its relevant components in (20), we obtain

(24)$$\begin{array}{l} E({{\mathbf I}_H^c} )\textrm{ = }\frac{1}{2}\sum\limits_{m = 1}^{16} {\|{\textrm{vec}({{\mathbf I}_L^m} )- {{\mathbf M}_m}\textrm{vec}({{\mathbf I}_H^{c(m )}({{\mathbf I}_H^c} )} )} \|_2^2} \\ \textrm{ }\textrm{ + }\frac{\mu }{2}\sum\limits_{k = 1}^K {\|{{R_k}({{{\cal I}} - {{\cal L}}} )- {{{\cal C}}^{(k)}}{ \times_1}{\mathbf{\Phi}}_h^{(k)}{ \times_2}{\mathbf{\Phi}}_w^{(k)}{ \times_3}{\mathbf{\Phi}}_c^{(k)}} \|_F^2} \end{array}$$

Because ${U_k}({{\cal I} - {\cal L}})$ is a function of ${\cal S}^{(k)}$ and not of ${{\cal R}^{(k)}}$, there is no quadratic penalty term for ${U_k}({{\cal I} - {\cal L}})$ in solution (24) of $\cal I$. The minimization problem of $E({{\cal I}_H^c})$ is a quadratic convex problem with closed solutions. However, as solving the inverse of a large matrix is time consuming, we adopt a numerical optimization algorithm to solve $\cal I$.

(25)$${{\cal G}} = \sum\limits_{k = 1}^K {\left[ {\sum\limits_{k = 1}^K {R_k^T({{{1}_{{f_h} \times {f_w} \times 12}}} )} } \right] \odot [{R_k^T({{R_k}({{\cal I} - {{\cal L}}})- {C^{(k)}}{ \times_1}{\Phi }_h^{(k )}{ \times_2}{\Phi }_w^{(k )}{ \times_3}{\Phi }_c^{(k)}} )} ]}$$

(26)$$\textrm{vec}\left( {\frac{{\partial E({{\mathbf I}_H^c} )}}{{\partial {\mathbf I}_H^c}}} \right) = \mu \textrm{vec}({{{\mathbf G}_c}} )- \sum\limits_{m = 1}^{16} {\textrm{vec}({{\mathbf L}_{c(m ),c}^H} )} \odot ({{\mathbf M}_m^T({\textrm{vec}({{\mathbf I}_L^m} )- {{\mathbf M}_m}\textrm{vec}({{\mathbf I}_H^{c(m ),i}} )} )} ){{G}_c}$$

where ${{1}_{{f_h} \times {f_w} \times 12}}$ represents a tensor of dimension ${f_h} \times {f_w} \times 12$ and elements 1 and $\mathbf G_c$ represents the c-th slice of $\cal G$ in the channel dimension. Minimizing the data term constraint yields a clear image rich in texture and details but susceptible to noise interference. On the other hand, minimizing the non-local sparse tensor regularization term constraint produces a smooth and clean image that retains most of the texture. To balance the data term and regularization term constraints, we introduce a weighting coefficient $\mu $. Upon observing actual captured images, it is found that noise is approximately globally uniformly distributed; hence, we set $\mu $ as a global coefficient. It grows in equal intervals as the number of iterations increases [45]. This is because as the number of iterations increases, the estimated HR color polarized images become cleaner, and the weight of the second term in (24) should be increased accordingly. Parameter $\mu $ at iteration i is given by

(27)$${\mu _i} = \min ({{\mu_0} + {\textrm{grow}}\_{\textrm{rate}} \cdot \textrm{fix}({i/{{\textrm{grow}}\_{\textrm{step}}}} ),1.0} )$$

where ${\mu _0}$ is the initial value, ${\textrm{grow}}\_{\textrm{rate}}$ is the growth step with equal intervals, and fix denotes the round-down function.

${\mathbf I}_H^c$ at iteration i + 1 is given by

(28)$${\mathbf I}_H^{c,i + 1} = {\mathbf I}_H^{c,i} - \Delta t\frac{{\partial E({{\mathbf I}_H^{c,i}} )}}{{\partial {\mathbf I}_H^{c,i}}}$$

Because the image from each channel is updated iteratively based on images from other channels, the images must be updated before proceeding to the next iteration in (28) instead of updating each image independently.

Step 4. Update Lagrangian multiplier ${{\cal L}}$

(29)$${{{\cal X}}^{i + 1}} = \sum\limits_{k = 1}^K {{{\left[ {\sum\limits_{k = 1}^K {R_k^T({{{1}_{{f_h} \times {f_w} \times 12}}})}} \right]}^{\textrm{ - }1}}} \odot [{R_k^T({{C^{(k),i + 1}}{ \times_1}{\mathbf{\Phi}}_h^{(k),i + 1}{ \times_2}{\mathbf{\Phi}}_w^{(k),i + 1}{ \times_3}{\mathbf{\Phi}}_c^{(k),i + 1}})}]$$

(30)$${{\cal L}^{i + 1}} = {{\cal L}^i} - ({{{\cal I}^{i + 1}} - {{{\cal X}}^{i + 1}}})$$

5. Experimental results

5.1 Experimental settings

We used the adaptive iterative algorithm of block matching presented in [59] to reduce the calculation load, that is, to calculate the iteration position of the next update block matching based on the reconstruction difference between two successive iterations. Simultaneously, the number of similar blocks was reduced by five when updating block matching. The initial number of similar blocks was set to 40, and the size of each block was 8 × 8 × 12. Parameter $\nu $ is set to 0.2, with a higher value indicating a greater inter-channel confidence level. Moreover, ${c_1}$ was set to 0.002, and ${c_2}$ was set to 0.02. For the ADMM, the maximum number of iterations was 25, ${\mu _0}$ was 0.5, ${\textrm{grow}}\_{\textrm{rate}}$ was 0.1, ${\textrm{grow}}\_{\textrm{step}}$ was 5, and $\Delta t$ was 1. The above parameter settings were used for images with grayscale values ranging from 0 to 1.

The proposed camera-array-based color polarized SR reconstruction is described in Algorithm1.

oe-32-2-2364-i001

5.2 Results on synthetic data

Existing color polarization datasets [15,20,40] were constructed by mainly capturing images using rotating polarizers. For dynamic scenes, images with different polarization directions exhibited local motion. Therefore, we chose static indoor scenes from the datasets in [15,40] to evaluate the effectiveness of our system and algorithm.

To substantiate the efficacy of the proposed SR method, a comparative analysis was conducted with state-of-the-art DL-based single-image SR methods: EDSR [21], HAN [23], SwinIR-Real-GAN [24], SwinIR-Real-PSNR [24], and HAT [27]. For the training datasets utilized in EDSR, HAN, and HAT, LR images were generated using “bicubic” downsampling from ground truth images. For the training datasets employed in SwinIR-Real-GAN and SwinIR-Real-PSNR, the same degradation model as BSRGAN [60] was utilized to synthesize low-quality images, which contributes to realizing real-world SR. It should be noted that SwinIR-Real-GAN and SwinIR-Real-PSNR differ in their loss functions, with the former utilizing adversarial loss while the latter employs mean squared error loss. Although HAT also consists of a real SR model trained on the same dataset, it did not provide a pre-trained model for 2x SR, which made it impossible to conduct comparative testing. Furthermore, to validate the superiority of the proposed system, we also compared the proposed CAPS and DoFP color polarization camera, both of which have 12 channels as described in section 2.4 and a total pixel count that is equivalent. For color polarization mosaic image reconstruction, we adopted the state-of-the-art CPDnet [20], EARI [15] and JCPD [18] based on deep learning, interpolation, and regularization, respectively, for comprehensive analysis and comparisons.

According to the CAPS observation model described in section 2.4, we applied geometric transformations (i.e., random offset, rotation, and scale transformation), spatial-domain downsampling, spectral-domain downsampling, and noise interference to the ground-truth color polarization images to generate degraded Bayer RGB polarized image sequences. In spatial-domain downsampling, a Gaussian weight function with a downsampling factor of 2 was employed. Spectral-domain downsampling was performed using the RGGB Bayer pattern. The noise interference was assumed to be white Gaussian noise. Because current DL-based SR methods predominantly operate within the RGB color space, the degraded Bayer RGB polarized image sequence, generated through CAPS simulation, was initially demosaicked to yield a color polarized image sequence. Subsequently, each sequence image was upsampled by a factor of 2. The superpixel layouts of the input color polarized mosaic images for CPDNet, EARI, and JCPD were the same as those used in their original studies. We added the same noise components to the simulated color polarized mosaic images as in CAPS. The DoLP and AoP images followed the color map shown in Fig. 7.

Fig. 7. Colormap for DoLP and AoP visualization. Axis labels R, G, and B correspond to the RGB channels for DoLP and AoP.

Download Full Size | PDF

Figure 8 and Fig. 9 displayed the reconstruction results on the datasets constructed by Morimatsu et al. [15] and Qiu et al. [40], where ${0^ \circ }$, ${90^ \circ }$, $\textrm{DoLP}$ and $\textrm{AoP}$ and indicate a color polarized images oriented in the 0° and ${90^ \circ }$ direction, DoLP image and AoP image, respectively. Upon close examination of Fig. 8 and Fig. 9, it was apparent that the reconstruction results of the proposed CAPS and algorithm were consistent with the ground-truth images, particularly regarding the DoLP and AoP images. While SwinIR-Real-GAN and SwinIR-Real-PSNR exhibited the capability to produce sharper results in polarized intensity images, it was noteworthy that DL-based SR tended to introduce severe spectral and polarization distortions because of the correlation neglect among different polarization directions. This phenomenon was evident in DoLP and AoP images. Furthermore, as those methods relied on guessed high-frequency details and failed to exploit complementary spatial information from the input image sequence, some fine details remained unrecovered or exhibited deformation, as exemplified by the stripes in the red box as shown in Fig. 8. The results for EDSR, HAN, and HAT were less favorable, primarily because their training degradation models differed from the testing degradation models, thus showcasing the poor generalization of DL-based SR methods. In the restoration results of simulated DoFP color polarization camera, corrupted by noise, the DoLP and AoP images provided by EARI and JCPD failed to show polarization information of the target object. As CPDNet relies heavily on the training set, it produced serious distortions of the chromatic and polarimetric signatures when applied to images from other datasets. The proposed CAPS and algorithm did not lose color and polarization information and reconstructed cleaner intensity images with finer details.

Fig. 8. Reconstruction results for synthetic observations from dataset in [15] using DL-based single-image SR methods (EDSR, HAN, SwinIR-Real-GAN, SwinIR-Real-PSNR, and HAT), color polarization demosaicking methods (CPDNet, EARI and JCPD), and the proposed method. For DL-based single-image SR methods simulation, data generated through CAPS simulation is first demosaicked to yield a color polarized image sequence and each sequence image is subsequently upsampled by a factor of 2. For CPDNet, EARI, and JCPD simulation, data captured by the color polarization camera was first simulated, subsequently, the three methods were applied to restore the full-resolution color polarization image.

Download Full Size | PDF

Fig. 9. Reconstruction results for synthetic observations from dataset in [40] using DL-based single-image SR methods (EDSR, HAN, SwinIR-Real-GAN, SwinIR-Real-PSNR, and HAT), color polarization demosaicking methods (CPDNet, EARI and JCPD), and the proposed method.

Download Full Size | PDF

Table 1 lists the peak SNR (PSNR), structural similarity (SSIM) and perceptual similarity (LPIPS) of the reconstruction results for the evaluated datasets. As LPIPS relies on networks trained on intensity images to calculate the perceptual similarity between two images, LPIPS was exclusively employed to assess the four polarization intensity images. The PSNR of the proposed CAPS and algorithm was the highest in most cases, and their SSIM and LPIPS was superior in all the cases, with the gain being more evident for the DoLP and AoP images.

Table 1. PSNR, SSIM and LPIPS of reconstruction results on color polarization datasets reported in [15] and [40]

View Table | View all tables in this article

Ablation experiments were conducted to validate the necessity of the main components of the proposed algorithm. The main components were 1) the joint solution of color and polarization, 2) data fidelity constraint of guided filtering based confidence, and 3) nonlocal sparse tensor regularization constraint. In the ablation experiments, except for the eliminated component, no other parts were changed. The first ablation experiment followed the framework presented in [15]. Specifically, the Bayer demosaicking algorithm in [15] was adopted to eliminate the IFOV error, and SR reconstruction of the monochromatic polarized images was performed independently per RGB channel, which employed the joint constraints of (2) and (3). Because the monochromatic polarized image sequence had four channels, parameters ${c_1}$ and ${c_2}$ were set to 0.001 and 0.01, respectively, after adjustment. The second ablation experiment (No CAGF) removed (2), and the data fidelity term solely employed the observed image with matching chromatic and polarimetric signatures as the HR image to be estimated. This was consistent with the framework in [17], except that four grayscale images with four channels were used in [17], whereas we input 16 grayscale images with 12 channels. The third ablation experiment (No NLSTR) removed (3), that is, the cost function was constrained only by the data fidelity term.

Fig. 10 showed the partial reconstruction results on a bus image from the dataset in [15] and an indoor scene from the dataset in [40] . The DoLP and AoP images showed that ICCP suffers from severe distortion of chromatic and polarimetric signatures. The edges and textures of No CAGF were oversmoothed. This was because the information of the reconstructed image only exploited the spatial channel multidimensional similarity of the color polarization data to restore lost information. However, its source was not guaranteed to contain the complementary information of the 16 images. No NLSTR was considerably corrupted by noise caused by overfitting of the data fidelity term owing to the lack of a regularization constraint. Table 2 lists the quantitative results of the ablation experiments. It can be observed from Table 2, that the metrics for polarization intensity, DoLP and AoP images in the proposed method achieve either the best or near-best results. This further emphasized that the joint constraints of the three primary components can better balance detail enhancement, noise removal, and distortion suppression of the chromatic and polarimetric signatures.

Table 2. PSNR, SSIM and LPIPS obtained from ablation experiments^a

View Table | View all tables in this article

Fig. 10. Ablation experiment results on bus image from dataset in [15] and indoor scene from dataset in [40]. ICCP used the cascade structure of color and polarization modules to replace the joint solution structure in the proposed algorithm. No CAGF removed guided filtering based confidence from the proposed algorithm. No NLSTR removed the nonlocal sparse tensor regularization constraint of the proposed algorithm.

Download Full Size | PDF

Although the PSNR values from the proposed method were not the highest on certain images in Table 1 and Table 2, these values continued to rank among the top. This was because PSNR primarily focused on pixel-level differences, tending to smooth results, rather than mammalian visual perception, which prioritized scene structures [61]. This characteristic proved beneficial for oversmoothed images and those subjected to demosaicking (where some pixels are copied from the ground-truth image). In contrast, the proposed method concentrated more on high-frequency detail and structures recovery, which explains why it excels in terms of SSIM and LPIPS.

We also compared the runtime of different algorithms. ICCP, No CAGF, No NLSTR, and our method were implemented in MATLAB (MathWorks, Natick, MA, USA) on an Intel I5-9500 processor (3.00 GHz) with 16 GB of memory. The DL-based SR algorithms utilized an NVIDIA A100 80GB PCIe GPU. To obtain clean polarization intensity images, DoLP, and AoP, we incorporate the regularization term constraint of non-local sparse tensor in the reconstruction algorithm. However, this constraint introduces a time-consuming element to the process. As shown in the Table 3, the runtime of our method is close to that of No CAGF. This reflects that the majority of our method is spent on this constraint. A primary reason for the long runtime of the regularization constraint of non-local sparse tensor is the individual computation of sparse tensor decomposition for local blocks, which can be shortened through parallel computation of multiple local blocks. Additionally, introducing data-driven regularization term constraints as a replacement for the regularization term constraints of the non-local sparse tensor holds the potential to significantly enhance runtime efficiency in the future.

Table 3. Runtime of different algorithms^a

View Table | View all tables in this article

5.3 Field test results

We use a four-aperture CAPS. To ensure non-redundant imaging among apertures and considering manufacturing errors, aberrations of the optical imaging system, and image registration errors, let ${{{f_{diffraction}}} / {{f_{Nyquist}}}} \ge 2$ [48]. According to (5), ${{{f_{diffraction}}} / {{f_{Nyquist}}}}$ is related to the focal length, F-number of the lens, and pixel size of the sensor. Based on the imaging requirements and to achieve distant target imaging, we selected Edmund 35 mm C visible-near infrared lens (fixed focal length of 35 mm and F/1.65), Hikvision MV-CA013-20GC color camera (PYTHON1300; with pixel size of $4.8\textrm{ }\mathrm{\mu m} \times 4.8\textrm{ }\mathrm{\mu m}$) and Thorlabs WP25M-VIS visible wire grid polarizer (with the usable wavelength range of 420 nm∼700 nm). At the central wavelength of $\lambda = 550\textrm{ }\textrm{nm}$, ${{{f_{diffraction}}} / {{f_{Nyquist}}}} \approx 8.7 \gg 2$.

In this section, applications of the proposed polarization imaging system are demonstrated, including target detection, and classification, reflection separation, and glare removal. In image acquisition, the offset between the multichannel images split from the Bayer filter image captured per aperture was not ideally one pixel. Hence, we adopted the method reported in [62] to correct the translational motion. The coarse-to-fine registration algorithm based on the hierarchical clustering of feature points (Crab-HFP) [48] were used to calculate the warping matrix between the apertures to correct their geometric transformations.

To ensure a fair comparison with DL-based SR methods and to avoid color sharpening introduced by camera Image Signal Processing (ISP), which could impact spectral fidelity assessments, the Bayer demosaicking algorithm [15] was employed to generate color images of the camera-captured Bayer data. In terms of the image dimensions captured by our built system, the HAT imposed an impractical working memory demand of 54 GB, which made it unsuitable for real-world testing. We adjusted the existing algorithms reported in [15] and [17] to the proposed system, that is, ICCP and No CAGF of the ablation experiments. In addition, we evaluated the images captured by the apertures to clearly distinguish the resolution improvement before and after reconstruction. Because of the IFOV of the Bayer filter and viewpoint changes among apertures, the DoLP and AoP were calculated after geometric correction of the images captured by the apertures, and took one of the two G channels. Additionally, to illustrate the spectral distortions associated with DL-based SR methods, we also exhibited images subjected to Bayer demosaicking followed by cubic interpolation. This comparison effectively validated whether spectral distortions emerge from Bayer demosaicking or the DL-based SR methods.

From Fig. 11–13, it can be observed that the reconstruction results of the real scene were similar to those of the synthetic data. While the intensity images reconstructed by SwinIR-Real-GAN and SwinIR-Real-PSNR appeared the clearest, they introduced erroneous high-frequency details in specific scenarios, as indicated by the red box shown in Fig. 12, which exhibited undesired cracks and distortions. As no ground-truth high-quality images exist, only a visual comparison is provided. Although no ground-truth images are available to validate the spectral and polarimetric signature fidelity of high-frequency information, the acquired LR images served as references for the spectral and polarimetric signature fidelity of low-frequency components. Comparing DoLP and AoP images captured by our built system and reconstructed by SwinIR-Real- GAN and SwinIR-Real-PSNR, SwinIR-Real-GAN and SwinIR-Real-PSNR revealed a substantial distortion in the overall spectral and polarimetric signature. This was particularly noticeable in the green boxes as shown in Fig. 11 and Fig. 12. Tric, EDSR, HAN, and HAT were notably impacted by noise interference and failed to recover certain fine details. The edges and textures of the intensity image reconstructed by No CAGF were oversmoothed and blurred. The intensity images reconstructed using ICCP and our method achieved outstanding quality, with our method producing sharper edges. Upon inspecting the DoLP and AoP images within the overall and local red boxes shown in Fig. 12, it became evident that among all methods, the spectral and polarization distortions in the results reconstructed by ICCP were the most severe. The intensity, DoLP, and AoP images captured by the apertures were corrupted by noise and exhibited obvious jagged artifacts on the edges. In contrast, the proposed method reconstructed reliable texture details without introducing spurious spectral and polarimetric signatures, while enhancing the SNR of the intensity, DoLP, and AoP images. Furthermore, the significant polarization amplitude differences among different targets improved applications including target detection and classification.

Fig. 11. Polarization imaging application in target detection and reflection separation. As highlighted in the red boxes, the car's front grille is not easily discernible in the intensity image, albeit becomes prominent in the DoLP and AoP images. This characteristic aids in target detection [3]. Observing the car's windows, it is evident that the radiance of reflected and transmitted light vary with different polarization directions, as depicted in the yellow boxes. This characteristic can be utilized for reflection and transmission separation.

Download Full Size | PDF

Fig. 12. Polarization imaging application in target classification. Objects fabricated from various materials, including miniature metal car models, architectural plastic models, road model with a sandpaper texture. It can be observed from the DoLP and AoP images that these materials exhibit distinct polarimetric features. This characteristic can be leveraged for distinguishing targets composed of different materials.

Download Full Size | PDF

Fig. 13. Polarization imaging application in glare removal from water surfaces. Observing intensity images at different polarization angles, it is clear that the 90° polarization direction exhibits glare on the lake's surface, while the 0° polarization direction does not, as depicted in the yellow boxes. This characteristic can be employed to effectively eliminate water surface glare [63].

Download Full Size | PDF

In real imaging, we obtained the Modulation Transfer Function (MTF) curve from sinusoidal Siemens stars. It can be seen from Fig. 14 and Fig. 15 that although the images reconstructed by SwinIR exhibit higher contrast (higher MTF values), the recovered black-and-white sinusoidal fringe deviate from the ground truth. In contrast, our method is capable of effectively decoding aliasing artifacts, restoring fine textures and details, and remaining faithful to the ground truth. The limit resolution of the reconstructed image is the frequency of the intersection between its MTF curve and the human eye contrast threshold (HECT). The HECT value is approximately equal to 0.03. The limiting frequency of a single aperture is the Nyquist frequency equal to 0.5 cycles/pixel. Therefore, the SR magnification is the limit resolution divided by 0.5, as shown in Table 4. Our method achieves a real SR magnification of 2.5860.

Table 4. Limit resolution and SR magnification of different algorithms^a

View Table | View all tables in this article

Fig. 14. Comparison of the reconstruction results of real sinusoidal Siemens stars to validate the SR capability. The first channel (0°-R) image is selected for presentation. Compared with other methods, our method can effectively decode aliasing artifacts, restore fine textures and details and be faithful to the ground truth.

Download Full Size | PDF

Fig. 15. MTF curves are calculated from sinusoidal Siemens stars images reconstructed using different algorithms. The limit resolution and SR magnification are obtained based on the MTF curves. Although SwinIR exhibits higher contrast (higher MTF), Observing Fig. 14, it can be noticed that the recovered black-and-white sinusoidal fringe deviate from the ground truth.

Download Full Size | PDF

6. Conclusion

We propose a novel snapshot color polarized imaging system, CAPS, that enables high-spatial-resolution imaging with general applicability. Currently, most SR algorithms are built upon the RGB color space, with limited research on other color spaces. Moreover, existing color polarization image restoration algorithms often fail to effectively harness the complementarity among all observed images, as they solely employ the observed image with matching chromatic and polarimetric signatures as the HR channel image to be estimated in the data fidelity term. The proposed confidence-guided SR reconstruction algorithm based on guided filtering fully uses the complementarity among all channels without introducing spurious information to reconstruct HR images for each channel. Furthermore, we integrate the constraints of guided filtering based confidence and a nonlocal sparse tensor simultaneously performs SR imaging and denoising, which are balanced by ADMM optimization with adaptive parameter adjustments. Experimental results on synthetic data demonstrate that the proposed system and algorithm outperform the DoFP color polarized camera and the color polarization demosaicking algorithms. Experimental results on real data captured by the built CAPS system demonstrate that the proposed algorithm outperform the state-of-the-art DL-based SR methods. Compared to the observed images before reconstruction, the proposed CAPS system achieves a real SR magnification of 2.586. In future work, we will explore real-time high-quality SR reconstruction and extend color polarized SR imaging to multispectral polarization imaging.

Funding

Natural Science Foundation of Fujian Province (2023J01130137); Fuzhou University (GXRC-18066).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data may be obtained from the authors upon reasonable request.

References

1. D. Miyazaki, T. Shigetomi, M. Baba, et al., “Surface normal estimation of black specular objects from multiview polarization images,” Opt. Eng. 56(4), 041303 (2016). [CrossRef]

2. F. Hu, Y. Cheng, L. Gui, et al., “Polarization-based material classification technique using passive millimeter-wave polarimetric imagery,” Appl. Opt. 55(31), 8690–8697 (2016). [CrossRef]

3. H. Mei, B. Dong, W. Dong, et al., “Glass segmentation using intensity and spectral polarization cues,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (2022), pp. 12622–12631.

4. Z. Cui, J. Gu, B. Shi, et al., “Polarimetric multi-view stereo,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (2017), pp. 1558–1567.

5. C. Åkerlind, T. Hallberg, J. Eriksson, et al., “Optical polarization: background and camouflage,” in Target and Background Signatures III (SPIE2017), pp. 22–35.

6. F. Huang, C. Ke, X. Wu, et al., “Polarization dehazing method based on spatial frequency division and fusion for a far-field and dense hazy image,” Appl. Opt. 60(30), 9319–9332 (2021). [CrossRef]

7. F. Liu, Y. Wei, P. Han, et al., “Polarization-based exploration for clear underwater vision in natural illumination,” Opt. Express 27(3), 3629–3641 (2019). [CrossRef]

8. Y. Andre, J.-M. Laherrere, T. Bret-Dibat, et al., “Instrumental concept and performances of the POLDER instrument,” in Remote Sensing and Reconstruction for Three-Dimensional Objects and Scenes (SPIE, 1995), pp. 79–90.

9. H. Kurosaki, H. Koshiishi, T. Suzuki, et al., “Development of tunable imaging spectro-polarimeter for remote sensing,” Adv. Space Res. 32(11), 2141–2146 (2003). [CrossRef]

10. F. Li, Y. Xu, and Y. Ma, “Design of hyper-spectral and full-polarization imager based on aotf and lcvr,” in International Symposium on Optoelectronic Technology and Application (SPIE, 2014), pp. 204–214.

11. W. Ren, C. Fu, D. Wu, et al., “Channeled compressive imaging spectropolarimeter,” Opt. Express 27(3), 2197–2211 (2019). [CrossRef]

12. T. Mu, S. Pacheco, Z. Chen, et al., “Snapshot linear-Stokes imaging spectropolarimeter using division-of-focal-plane polarimetry and integral field spectroscopy,” Sci. Rep. 7(1), 42115 (2017). [CrossRef]

13. S. Sattar, P.-J. Lapray, A. Foulonneau, et al., “Review of spectral and polarization imaging systems,” in Unconventional Optical Imaging II (SPIE, 2020), pp. 191–203.

14. S. Roussel, M. Boffety, and F. Goudail, “Polarimetric precision of micropolarizer grid-based camera in the presence of additive and Poisson shot noise,” Opt. Express 26(23), 29968–29982 (2018). [CrossRef]

15. M. Morimatsu, Y. Monno, M. Tanaka, et al., “Monochrome and color polarization demosaicking using edge-aware residual interpolation,” in Int. Conf. Image Process. (IEEE, 2020), pp. 2571–2575.

16. A. B. Tibbs, I. M. Daly, N. W. Roberts, et al., “Denoising imaging polarimetry by adapted BM3D method,” J. Opt. Soc. Am. A 35(4), 690–701 (2018). [CrossRef]

17. J. Zhang, J. Chen, H. Yu, et al., “Polarization image demosaicking via nonlocal sparse tensor factorization,” IEEE Trans. Geosci. Remote Sens. 60, 1–10 (2022). [CrossRef]

18. S. Wen, Y. Zheng, and F. Lu, “A sparse representation based joint demosaicing method for single-chip polarized color sensor,” IEEE Trans. on Image Process. 30, 4171–4182 (2021). [CrossRef]

19. Z. Zha, X. Yuan, B. Wen, et al., “A benchmark for sparse coding: When group sparsity meets rank minimization,” IEEE Trans. on Image Process. 29, 5094–5109 (2020). [CrossRef]

20. S. Wen, Y. Zheng, F. Lu, et al., “Convolutional demosaicing network for joint chromatic and polarimetric imagery,” Opt. Lett. 44(22), 5646–5649 (2019). [CrossRef]

21. B. Lim, S. Son, H. Kim, et al., “Enhanced deep residual networks for single image super-resolution,” in Conf. Comput. Vis. Pattern Recognit. Workshops (IEEE, 2017), pp. 136–144.

22. Y. Zhang, K. Li, K. Li, et al., “Image super-resolution using very deep residual channel attention networks,” in Eur. Conf. Comput. Vis. (2018), pp. 286–301.

23. B. Niu, W. Wen, W. Ren, et al., “Single image super-resolution via a holistic attention network,” in Computer Vision–ECCV 2020: 16th European Conference, Proceedings, Part XII 16 (Springer, 2020), pp. 191–207.

24. J. Liang, J. Cao, G. Sun, et al., “Swinir: Image restoration using swin transformer,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (2021), pp. 1833–1844.

25. C. Saharia, J. Ho, W. Chan, et al., “Image super-resolution via iterative refinement,” IEEE Trans. Pattern Anal. Mach. Intell. 45, 1–14 (2022). [CrossRef]

26. B. B. Moser, F. Raue, S. Frolov, et al., “Hitchhiker's Guide to Super-Resolution: Introduction and Recent Advances,” IEEE Trans. Pattern Anal. Mach. Intell. 45(8), 9862–9882 (2023). [CrossRef]

27. X. Chen, X. Wang, J. Zhou, et al., “Activating more pixels in image super-resolution transformer,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (2023), pp. 22367–22377.

28. D. Yu, Q. Li, Z. Zhang, et al., “Color Polarization Image Super-Resolution Reconstruction via a Cross-Branch Supervised Learning Strategy,” Optics and Lasers in Engineering 165, 107469 (2023). [CrossRef]

29. J. Lafenetre, N. L. Nguyen, G. Facciolo, et al., “Handheld burst super-resolution meets multi-exposure satellite imagery,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (2023), pp. 2055–2063.

30. Y. Zhang, T. Li, Y. Zhang, et al., “Computational Super-Resolution Imaging With a Sparse Rotational Camera Array,” IEEE Trans. Comput. Imaging 9, 425–434 (2023). [CrossRef]

31. A. Marquina and S. J. Osher, “Image super-resolution by TV-regularization and Bregman iteration,” J. Sci. Comput. 37(3), 367–382 (2008). [CrossRef]

32. J. Zhang, D. Zhao, and W. Gao, “Group-based sparse representation for image restoration,” IEEE Trans. on Image Process. 23(8), 3336–3351 (2014). [CrossRef]

33. Y. Peng, J. Suo, Q. Dai, et al., “Reweighted low-rank matrix recovery and its application in image restoration,” IEEE Trans. Cybern. 44(12), 2418–2430 (2014). [CrossRef]

34. Z. Zha, B. Wen, X. Yuan, et al., “Image restoration via reconciliation of group sparsity and low-rank models,” IEEE Trans. on Image Process. 30, 5223–5238 (2021). [CrossRef]

35. G. Bhat, M. Danelljan, L. Van Gool, et al., “Deep burst super-resolution,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (2021), pp. 9209–9218.

36. N. L. Nguyen, J. Anger, A. Davy, et al., “Self-supervised multi-image super-resolution for push-frame satellite images,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (2021), pp. 1121–1131.

37. L. Sun, F. Wu, T. Zhan, et al., “Weighted nonlocal low-rank tensor decomposition method for sparse unmixing of hyperspectral images,” IEEE J. Sel. Top. Appl. Earth Observations Remote Sensing 13, 1174–1188 (2020). [CrossRef]

38. R. Dian, L. Fang, and S. Li, “Hyperspectral image super-resolution via non-local sparse tensor factorization,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (2017), pp. 5344–5353.

39. V. Gruev, R. Perkins, and T. York, “CCD polarization imaging sensor with aluminum nanowire optical filters,” Opt. Express 18(18), 19087–19094 (2010). [CrossRef]

40. S. Qiu, Q. Fu, C. Wang, et al., “Polarization demosaicking for monochrome and color polarization focal plane arrays,” in Proc. Int. Symp. Vis., Modeling Vis. (2019), pp. 117–124.

41. G. Carles, G. Muyo, N. Bustin, et al., “Compact multi-aperture imaging with high angular resolution,” J. Opt. Soc. Am. A 32(3), 411–419 (2015). [CrossRef]

42. M. A. Preciado, G. Carles, and A. R. Harvey, “Video-rate computational super-resolution and integral imaging at longwave-infrared wavelengths,” OSA Continuum 1(1), 170–180 (2018). [CrossRef]

43. J. Downing, E. Findlay, G. Muyo, et al., “Multichanneled finite-conjugate imaging,” J. Opt. Soc. Am. A 29(6), 921–927 (2012). [CrossRef]

44. B. Wronski, I. Garcia-Dorado, M. Ernst, et al., “Handheld multi-frame super-resolution,” ACM Trans. Graph. 38(4), 1–18 (2019). [CrossRef]

45. B. Lecouat, J. Ponce, and J. Mairal, “Lucas-kanade reloaded: End-to-end super-resolution from raw image bursts,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (2021), pp. 2370–2379.

46. N. L. Nguyen, J. Anger, A. Davy, et al., “Self-supervised super-resolution for multi-exposure push-frame satellites,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (2022), pp. 1858–1868.

47. G. Carles, J. Downing, and A. R. Harvey, “Super-resolution imaging using a camera array,” Opt. Lett. 39(7), 1889–1892 (2014). [CrossRef]

48. F. Huang, Y. Chen, X. Wang, et al., “Spectral Clustering Super-Resolution Imaging Based on Multispectral Camera Array,” IEEE Trans. on Image Process. 32, 1257–1271 (2023). [CrossRef]

49. S. Boyd, N. Parikh, E. Chu, et al., “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Found. Trends Mach. Learn. 3(1), 1–122 (2010). [CrossRef]

50. X. Liu, X. Li, and S.-C. Chen, “Enhanced polarization demosaicking network via a precise angle of polarization loss calculation method,” Opt. Lett. 47(5), 1065–1069 (2022). [CrossRef]

51. A. Kanaev, M. Kutteruf, M. Yetzbacher, et al., “Imaging with multi-spectral mosaic-array cameras,” Appl. Opt. 54(31), F149–F157 (2015). [CrossRef]

52. K. He, J. Sun, and X. Tang, “Guided image filtering,” IEEE Trans. Pattern Anal. Mach. Intell. 35(6), 1397–1409 (2013). [CrossRef]

53. S. Liu, J. Chen, Y. Xun, et al., “A new polarization image demosaicking algorithm by exploiting inter-channel correlations with guided filtering,” IEEE Trans. on Image Process. 29, 7076–7089 (2020). [CrossRef]

54. J. Zhang, H. Luo, B. Hui, et al., “Image interpolation for division of focal plane polarimeters with intensity correlation,” Opt. Express 24(18), 20799–20807 (2016). [CrossRef]

55. Y. Qian, F. Xiong, S. Zeng, et al., “Matrix-vector nonnegative tensor factorization for blind unmixing of hyperspectral imagery,” IEEE Trans. Geosci. Remote Sensing 55(3), 1776–1792 (2017). [CrossRef]

56. W. Wan, W. Guo, H. Huang, et al., “Nonnegative and nonlocal sparse tensor factorization-based hyperspectral image super-resolution,” IEEE Trans. Geosci. Remote Sensing 58(12), 8384–8394 (2020). [CrossRef]

57. Z. Zha, X. Yuan, B. Wen, et al., “Group sparsity residual constraint with non-local priors for image restoration,” IEEE Trans. on Image Process. 29, 8960–8975 (2020). [CrossRef]

58. I. Daubechies, M. Defrise, and C. De Mol, “An iterative thresholding algorithm for linear inverse problems with a sparsity constraint,” Commun. Pur. Appl. Math. 57(11), 1413–1457 (2004). [CrossRef]

59. L. Bian, Y. Wang, and J. Zhang, “Generalized MSFA engineering with structural and adaptive nonlocal demosaicing,” IEEE Trans. on Image Process. 30, 7867–7877 (2021). [CrossRef]

60. K. Zhang, J. Liang, L. Van Gool, et al., “Designing a practical degradation model for deep blind image super-resolution,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (2021), pp. 4791–4800.

61. Z. Wang, A. C. Bovik, H. R. Sheikh, et al., “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. on Image Process. 13(4), 600–612 (2004). [CrossRef]

62. M. Guizar-Sicairos, S. T. Thurman, and J. R. Fienup, “Efficient subpixel image registration algorithms,” Opt. Lett. 33(2), 156–158 (2008). [CrossRef]

63. R. Avrahamy, B. Milgrom, M. Zohar, et al., “Improving object imaging with sea glinted background using polarization method: analysis and operator survey,” IEEE Trans. Geosci. Remote Sensing 57(11), 8764–8774 (2019). [CrossRef]

		EDSR	HAN	SwinIR1^a	SwinIR2	HAT	CPD-Net	EARI	JCPD	Ours
Dataset in [15]
PSNR (dB) ↑	I_0°	32.79	32.77	30.88	32.03	32.78	25.02	35.33	36.05	36.51
	I_90°	32.75	33.63	30.78	32.26	33.36	26.57	35.20	35.95	35.14
	DoLP	18.21	18.42	18.93	20.68	18.32	15.77	20.16	21.01	21.41
	AoP	9.095	9.375	10.06	10.76	9.239	7.754	9.940	9.944	11.30
SSIM ↑	I_0°	0.915	0.915	0.9267	0.941	0.915	0.714	0.937	0.946	0.964
	I_90°	0.916	0.924	0.929	0.945	0.922	0.713	0.940	0.949	0.962
	DoLP	0.302	0.310	0.419	0.516	0.299	0.255	0.400	0.386	0.516
	AoP	0.141	0.157	0.211	0.319	0.146	0.123	0.217	0.181	0.327
LPIPS ↓	I_0°	0.168	0.168	0.092	0.082	0.169	0.115	0.111	0.079	0.047
	I_45°	0.169	0.171	0.093	0.085	0.172	0.094	0.119	0.079	0.052
	I_90°	0.166	0.163	0.088	0.080	0.166	0.088	0.120	0.077	0.047
	I_135°	0.175	0.170	0.090	0.086	0.171	0.095	0.120	0.085	0.051
Dataset in [40]
PSNR (dB) ↑	I_0°	35.14	35.15	34.34	35.87	35.17	25.43	38.07	34.47	38.93
	I_90°	35.55	35.35	34.40	35.13	33.67	25.83	37.73	34.36	38.05
	DoLP	22.28	22.33	24.31	26.47	22.03	17.18	23.77	22.81	27.24
	AoP	13.09	13.07	14.43	15.33	12.77	9.397	13.86	13.43	14.85
SSIM ↑	I_0°	0.951	0.951	0.951	0.963	0.951	0.762	0.971	0.944	0.979
	I_90°	0.958	0.955	0.959	0.965	0.938	0.802	0.972	0.945	0.978
	DoLP	0.445	0.442	0.624	0.727	0.426	0.250	0.535	0.518	0.746
	AoP	0.255	0.255	0.458	0.571	0.250	0.109	0.349	0.250	0.595
LPIPS ↓	I_0°	0.151	0.151	0.058	0.061	0.151	0.131	0.089	0.099	0.045
	I_45°	0.123	0.128	0.048	0.055	0.131	0.096	0.085	0.089	0.043
	I_90°	0.134	0.133	0.047	0.056	0.135	0.090	0.088	0.088	0.043
	I_135°	0.147	0.146	0.053	0.056	0.148	0.095	0.094	0.100	0.044

		ICCP	No CAGF	No NLSTR	Ours
PSNR (dB) ↑	I_0°	36.87	34.30	33.415	36.75
	I_90°	35.84	32.25	32.00	35.23
	DoLP	20.35	22.24	21.11	22.38
	AoP	9.40	11.20	10.29	10.83
SSIM ↑	I_0°	0.958	0.940	0.906	0.964
	I_90°	0.957	0.925	0.890	0.959
	DoLP	0.525	0.579	0.502	0.585
	AoP	0.360	0.336	0.400	0.407
LPIPS ↓	I_0°	0.076	0.104	0.163	0.056
	I_45°	0.076	0.101	0.170	0.059
	I_90°	0.072	0.098	0.169	0.043
	I_135°	0.077	0.105	0.165	0.056

EDSR	HAN	SwinIR1	SwinIR2	HAT	CPDNet
0.85	1.48	1.75	1.25	2.00	0.58
EARI	JCPD	ICCP	No CAGF	No NLSTR	Ours
0.13	2.06	331.25	683.50	2.32	702.52

	DT	EDSR	HAN	SwinIR1	SwinIR2	ICCP	NoCAGF	Ours
LR	0.8889	0.9697	0.9697	1.9600	1.4550	0.8687	1.0300	1.2930
SRM	1.7778	1.9394	1.9394	3.9200	2.9100	1.7374	2.0600	2.5860

		EDSR	HAN	SwinIR1^a	SwinIR2	HAT	CPD-Net	EARI	JCPD	Ours
Dataset in [15]
PSNR (dB) ↑	I_0°	32.79	32.77	30.88	32.03	32.78	25.02	35.33	36.05	36.51
	I_90°	32.75	33.63	30.78	32.26	33.36	26.57	35.20	35.95	35.14
	DoLP	18.21	18.42	18.93	20.68	18.32	15.77	20.16	21.01	21.41
	AoP	9.095	9.375	10.06	10.76	9.239	7.754	9.940	9.944	11.30
SSIM ↑	I_0°	0.915	0.915	0.9267	0.941	0.915	0.714	0.937	0.946	0.964
	I_90°	0.916	0.924	0.929	0.945	0.922	0.713	0.940	0.949	0.962
	DoLP	0.302	0.310	0.419	0.516	0.299	0.255	0.400	0.386	0.516
	AoP	0.141	0.157	0.211	0.319	0.146	0.123	0.217	0.181	0.327
LPIPS ↓	I_0°	0.168	0.168	0.092	0.082	0.169	0.115	0.111	0.079	0.047
	I_45°	0.169	0.171	0.093	0.085	0.172	0.094	0.119	0.079	0.052
	I_90°	0.166	0.163	0.088	0.080	0.166	0.088	0.120	0.077	0.047
	I_135°	0.175	0.170	0.090	0.086	0.171	0.095	0.120	0.085	0.051
Dataset in [40]
PSNR (dB) ↑	I_0°	35.14	35.15	34.34	35.87	35.17	25.43	38.07	34.47	38.93
	I_90°	35.55	35.35	34.40	35.13	33.67	25.83	37.73	34.36	38.05
	DoLP	22.28	22.33	24.31	26.47	22.03	17.18	23.77	22.81	27.24
	AoP	13.09	13.07	14.43	15.33	12.77	9.397	13.86	13.43	14.85
SSIM ↑	I_0°	0.951	0.951	0.951	0.963	0.951	0.762	0.971	0.944	0.979
	I_90°	0.958	0.955	0.959	0.965	0.938	0.802	0.972	0.945	0.978
	DoLP	0.445	0.442	0.624	0.727	0.426	0.250	0.535	0.518	0.746
	AoP	0.255	0.255	0.458	0.571	0.250	0.109	0.349	0.250	0.595
LPIPS ↓	I_0°	0.151	0.151	0.058	0.061	0.151	0.131	0.089	0.099	0.045
	I_45°	0.123	0.128	0.048	0.055	0.131	0.096	0.085	0.089	0.043
	I_90°	0.134	0.133	0.047	0.056	0.135	0.090	0.088	0.088	0.043
	I_135°	0.147	0.146	0.053	0.056	0.148	0.095	0.094	0.100	0.044

Joint constraints of guided filtering based confidence and nonlocal sparse tensor for color polarization super-resolution imaging

Abstract

1. Introduction

2. Background and system overview

2.1 Notation

2.2 Stokes vectors

2.3 Super resolution imaging

2.4 Observation model

3. Imaging methods

3.1 Color polarized image rearrangement

3.2 Confidence-guided SR via guided filtering

3.3 Nonlocal sparse tensor regularization

4. Optimization

5. Experimental results

5.1 Experimental settings

5.2 Results on synthetic data

5.3 Field test results

6. Conclusion

Funding

Disclosures

Data availability

References

Data availability

Cited By

Figures (15)

Tables (4)

Equations (30)

Optics Express