Conditional generative adversarial network demosaicing strategy for division of focal plane polarimeters

Open Access

Abstract

Division of focal plane (DoFP), or integrated microgrid polarimeters, typically consist of a 2 × 2 mosaic of linear polarization filters overlaid upon a focal plane array sensor and obtain temporally synchronized polarized intensity measurements across a scene, similar in concept to a Bayer color filter array camera. However, the resulting estimated polarimetric images suffer a loss in resolution and can be plagued by aliasing due to the spatially-modulated microgrid measurement strategy. Demosaicing strategies have been proposed that attempt to minimize these effects, but result in some level of residual artifacts. In this work we propose a conditional generative adversarial network (cGAN) approach to the microgrid demosaicing problem. We evaluate the performance of our approach against full-resolution division-of-time polarimeter data as well as compare against both traditional and recent microgrid demosaicing methods. We apply these demosaicing strategies to data from both real and simulated visible microgrid imagery and provide objective criteria for evaluating their performance. We demonstrate that the proposed cGAN approach results in estimated Stokes imagery that is comparable to full-resolution ground truth imagery from both a quantitative and qualitative perspective.

© 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Imaging polarimeters have been used in remote sensing applications for natural clutter suppression and target detection and tracking applications [1]. Polarimetric measurements are often uncorrelated with measurements of the magnitude and spectral content of an electromagnetic (EM) signal and can provide additional information about an imaged scene; however, unlike spatial and spectral signatures that are relatively stable with viewing geometry, polarimetric signatures are a rather-complex function of target material and the geometry of the illumination source, target, and sensor [2].

The goal of polarimetric imaging is to obtain estimates of the polarized Stokes vector at each point in the scene. The Stokes vector is a time-averaged parameterization of the polarization properties of the EM field and cannot be directly measured with optical sensors. Instead, polarization-sensitive optics must be introduced into the optical path in order to modulate the incoming light. A linear polarizer can be used for this purpose when only the linear polarization states are desired, as is often the case in passive remote sensing applications since circular polarization is rarely observed [2]. The linear Stokes vector is defined as

$$\mathbf{S} = \begin{bmatrix} s_0\\ s_1\\ s_2 \end{bmatrix} = \begin{bmatrix} I_0+I_{90}\\ I_0-I_{90}\\ I_{45}-I_{135} \end{bmatrix},$$
where $I_\theta$ represents the intensity measurement collected with the linear polarizer oriented at angle $\theta$, $s_0$ represents the unpolarized intensity measurement and $s_1$ and $s_2$ represent the difference in the indicated cross-polarized intensity components [2]. Other common measures of polarization are the degree of linear polarization (DoLP) and angle of polarization (AoP), defined as
$$\textrm{DoLP} = \sqrt{\left(\frac{s_1}{s_0}\right)^2+\left(\frac{s_2}{s_0}\right)^2}$$
and
$$\textrm{AoP} = \frac{1}{2}\tan^{{-}1}\left(\frac{s_2}{s_1}\right).$$
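As a concrete illustration of Eqs. (1)–(3), the following minimal sketch (our own, assuming four co-registered full-resolution intensity images stored as NumPy arrays and a small guard against division by zero) computes the Stokes images, DoLP, and AoP:

```python
import numpy as np

def stokes_from_intensities(i0, i45, i90, i135, eps=1e-12):
    """Compute the linear Stokes images, DoLP, and AoP (Eqs. (1)-(3))."""
    s0 = i0 + i90                    # unpolarized (total) intensity
    s1 = i0 - i90                    # 0/90 degree difference
    s2 = i45 - i135                  # 45/135 degree difference
    dolp = np.sqrt(s1**2 + s2**2) / (s0 + eps)
    aop = 0.5 * np.arctan2(s2, s1)   # four-quadrant form, range [-pi/2, pi/2]
    return s0, s1, s2, dolp, aop
```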

A number of measurement strategies can be employed to obtain the polarized intensity measurements [1]. Linear division-of-time (DoT) polarimeters obtain the intensity measurements by rotating a linear polarizer in the optical path to the desired angular orientations and collecting corresponding images across time. The Stokes vector can then be directly computed from these intensity measurements. While this strategy can yield full-resolution polarimetric images, it is sensitive to any changes in the scene throughout the acquisition process (i.e., motion, illumination changes, etc.) and thus is only effective for imaging static scenes under relatively stable illumination conditions. Division of aperture (DoA) systems use beam splitters to direct the optical path through various polarization optics simultaneously and allow for full-resolution, time-synchronous polarized intensity measurements to be obtained. However, such devices require multiple focal plane arrays and complex optical alignment that make these polarimeters cost prohibitive and bulky while suffering from their own set of image registration artifacts.

Division of focal plane (DoFP), or integrated microgrid imaging polarimeters, typically consist of a $2\times 2$ mosaic of modulated linear polarizers overlaid upon the focal plane array sensor and obtain temporally-synchronized polarized intensity measurements across a scene, similar in concept to a Bayer color filter array (CFA) camera. The comparison of a CFA and DoFP sensor is shown in Fig. 1. While DoFP sensors effectively solve the time sensitivity of DoT systems, the trade-off is a loss in spatial resolution and introduction of spatial aliasing in the resulting Stokes images [3]. The loss in resolution in DoFP devices is easily seen in Fig. 2 where each polarizer orientation is only obtained for one fourth of the image. Thus, in order to maintain the full-resolution image size, some form of interpolation must be performed, which is referred to as the demosaicing problem.

Fig. 1. A comparison of CFA and DoFP polarimeter focal plane array sensors. The lines in the image on the right represent the physical polarizer wire grids, where the angle of polarized light that is transmitted is perpendicular to the wire grid orientation.

Fig. 2. The demosaicing process for DoFP polarimeters involves demodulation followed by interpolation.

The demosaicing process for a DoFP polarimeter image is shown in Fig. 2. When simple demosaicing approaches are employed, registration becomes an issue since the polarized intensity channels are each sampled at a slightly shifted field of view and hence can result in misregistration artifacts that appear as false edges in the polarization images [4]. More sophisticated approaches have been proposed that exploit inherent redundancy in the microgrid measurements and the high spatial correlation among the intensity channels [5,6]. While some methods perform better than others at preserving spatial detail, nearly all methods suffer from aliasing. This is particularly problematic for highly detailed or unresolved targets.
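To make the sampling pattern concrete, the sketch below demodulates a microgrid image into its four quarter-resolution intensity channels; the specific $2\times 2$ super-pixel layout used here ($0^\circ$ and $45^\circ$ on the top row, $135^\circ$ and $90^\circ$ on the bottom row) is an illustrative assumption, not necessarily that of any particular sensor. Each sparse channel must then be interpolated back to the full image size, which is where the strategies discussed below differ.

```python
import numpy as np

def demodulate_microgrid(mosaic):
    """Split a DoFP mosaic into four quarter-resolution intensity channels.
    Assumed (illustrative) 2x2 layout: [[0, 45], [135, 90]] degrees."""
    return {
        0:   mosaic[0::2, 0::2],
        45:  mosaic[0::2, 1::2],
        135: mosaic[1::2, 0::2],
        90:  mosaic[1::2, 1::2],
    }
```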

The simplest, and consequently fastest, demosaicing strategy is the nearest like-polarization neighbor (NLPN) method [4]. Here the Stokes vector is estimated at each pixel location by using the nearest polarized intensity measurement based upon Euclidean distance. The estimated Stokes images are prone to high amounts of false edge artifacts and aliasing. Bilinear and bicubic interpolation approaches are presented in [4,7] that provide considerable reduction in false edge artifacts, but tend to result in smoothed Stokes images that exhibit significant aliasing. A gradient-based interpolation method was presented by Gao et al. [8] that attempts to identify edges in a scene and performs bicubic interpolation along detected edges, whereas bilinear interpolation is performed in the smooth areas of the image, with the goal of improving computational efficiency. In [9], correlation among intensity measurements of different orientation is first used to detect edges. Interpolation is then performed along these directed edges to minimize error by preventing interpolation across boundary regions. Ahmed et al. [10] described a residual bilinear interpolation method where a cost function is minimized to obtain an optimal estimate of the Stokes vector using guided linear filtering. A frequency-domain filtering approach was presented by Tyo et al. [3] that demonstrated the conditions under which alias-free reconstruction of the Stokes images is possible. A locally-adaptive approach by Ratliff et al. was presented in [5] that is based upon the edge preserving concepts of bilateral filtering. Furthermore, after interpolation, the technique uses physics constraints to exploit inherent redundancy in the polarized intensity measurements. This redundancy results in the following relationships that can be enforced among the interpolated intensity measurements:

$$\begin{aligned} I_0 &= I_{45}+I_{135}-I_{90}, & I_{45} &= I_{0}+I_{90}-I_{135},\\ I_{90} &= I_{45}+I_{135}-I_{0}, & I_{135} &= I_{0}+I_{90}-I_{45}. \end{aligned}$$

This technique is able to eliminate instantaneous-field-of-view (IFOV) artifacts and reduce noise while preserving edges, particularly for infrared DoFP polarimeter imagery. However, slight zippering and aliasing artifacts can be present in the reconstructed Stokes imagery.
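As a simple illustration of how the redundancy of Eq. (4) can be exploited after interpolation, the sketch below re-estimates each channel from the other three and blends it with the interpolated value; the fixed 50/50 blend weight is our own illustrative choice and is not the weighting used in [5].

```python
def redundancy_blend(i0, i45, i90, i135, w=0.5):
    """Blend each interpolated intensity channel with its re-estimate
    from the other three channels via Eq. (4). w is illustrative."""
    i0_r   = i45 + i135 - i90
    i45_r  = i0  + i90  - i135
    i90_r  = i45 + i135 - i0
    i135_r = i0  + i90  - i45
    return (w * i0   + (1 - w) * i0_r,
            w * i45  + (1 - w) * i45_r,
            w * i90  + (1 - w) * i90_r,
            w * i135 + (1 - w) * i135_r)
```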

More recently, several deep learning approaches have been presented based upon convolutional neural networks (CNNs) [11–13]. In [11], Zhang et al. developed a CNN architecture that estimates each full-resolution polarized intensity image. The approach first uses bicubic interpolation to upsample each microgrid polarized intensity channel to full-resolution intensity images. These interpolated images are then input into the CNN, where the goal is to learn an end-to-end mapping that minimizes a loss function consisting of the error between the full-resolution ground truth polarized intensity images and the predicted Stokes images, along with corresponding gradient maps. Zeng et al. [13] proposed a CNN, known as ForkNet, that directly takes the microgrid image as input. The CNN then branches into three different networks where the $s_0$, DoLP, and AoP images are estimated directly, as opposed to estimating the polarized intensity images. The loss function in this case consists of minimizing error between ground truth and the estimated images in an L1-norm sense. A contrast consistency metric, as used in the structural similarity index (SSIM), is also incorporated into the loss function. This method was shown to outperform other state-of-the-art strategies, including the CNN method in [11], in terms of peak signal-to-noise ratio (PSNR). However, this technique does not estimate the individual $s_1$ and $s_2$ images or the polarized intensity images.

In this work, we present a deep learning approach to DoFP demosaicing based upon a recently proposed conditional generative adversarial network (cGAN) architecture used for general image-to-image translation problems [14]. We use full-resolution polarized intensity images collected with a DoT polarimeter as ground truth data that we use to simulate corresponding microgrid images. This allows us to train our cGAN to transform an input microgrid image into polarized intensity images that are representative of the corresponding full-resolution ground truth images. We include the polarimetric redundancy relationships from Eq. (4) that arise from the over-determined nature of the polarized intensity measurements to ensure the generated data result in physically-realizable and accurate polarized intensity measurements. We find that the demosaiced images resulting from our technique do not suffer from IFOV artifacts, show a reduction in noise, and demonstrate a significant reduction in aliasing over traditional demosaicing strategies. We use meaningful metrics to evaluate the results of our strategy against other relevant techniques and demonstrate that the proposed cGAN approach results in estimated Stokes imagery that is comparable to full-resolution ground truth imagery.

The remainder of this paper is organized as follows. In Section 2 we discuss relevant concepts regarding generative adversarial network (GAN) and CNN architectures that we employ. We then present the architecture of our cGAN-based demosaicing strategy and describe our training methodology in Section 3. Our cGAN-based approach, along with other relevant demosaicing strategies from the literature, are applied to both real and simulated visible microgrid data in Section 4. A detailed performance evaluation and corresponding discussion are then presented. Finally, conclusions and avenues of future research are presented in Section 5.

2. Generative adversarial networks

The GAN architecture was first introduced by Goodfellow et al. in [15]. The concept is to train two models simultaneously: a generative model $G$ that captures a data distribution, and a discriminative model $D$ that assigns a probability that a sample is from training data rather than synthetic data produced by the generator [15]. This approach can be viewed as a minimax two-player game, where the goal of $G$ is to maximize the probability of $D$ misclassifying generated data as true training data. The general architecture for a GAN is shown in Fig. 3.

Following the derivation in [15], in order to learn the distribution $p_g$ of $G$ over data $x$, a prior input noise variable distribution $p_z$ is defined. The mapping to the data space is then represented as $G(z;\alpha _g )$, where $G$ is a differentiable function represented by a multilayer perceptron with parameters $\alpha _g$. The multilayer perceptron for $D$ is represented analogously as $D(x;\alpha _d )$, which outputs a single probability indicating whether the data is real or generated. The goal for $D$ is to train the multilayer perceptron to maximize the probability of correctly separating training samples from generated samples. If $D(x)$ represents the probability that $x$ came from the training data rather than $p_g$, we can mathematically represent this two-player game as in [15] according to

$$\min_G \max_D V(D,G) = {\mathbb{E}}_{x \sim p_d(x)}\log(D(x))+{\mathbb{E}}_{z \sim p_z(z)}\log(1-D(G(z))).$$

Stated more simply, we want the discriminator to be best at distinguishing generated data from training data, and the generator to be best at deceiving the discriminator. For $G$, the only term that it can affect directly is $\log (1-D(G(z)))$. Therefore, minimizing Eq. (5) for $G$ is equivalent to minimizing $\log (1-D(G(z)))$. However, due to the vanishing gradient problem, maximizing $\log (D(G(z)))$ is preferred to minimizing $\log (1-D(G(z)))$ [15,16].
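A minimal sketch of these loss formulations (our own, assuming the discriminator outputs probabilities in $(0,1)$, e.g., after a sigmoid) is given below; the non-saturating form is the one preferred in practice.

```python
import numpy as np

def discriminator_loss(d_real, d_fake, eps=1e-12):
    # D maximizes log D(x) + log(1 - D(G(z))); equivalently minimize the negative
    return -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))

def generator_loss_minimax(d_fake, eps=1e-12):
    # Original objective for G: minimize log(1 - D(G(z)))
    return np.mean(np.log(1.0 - d_fake + eps))

def generator_loss_nonsaturating(d_fake, eps=1e-12):
    # Preferred form: maximize log D(G(z)), i.e., minimize -log D(G(z))
    return -np.mean(np.log(d_fake + eps))
```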

Fig. 3. Traditional GAN architecture for generating images from noise.

For $D$, the optimal discriminator for any given $G$ is

$$D_{G}(x) = \frac{p_d(x)}{p_d(x)+p_g(x)}.$$

It is easy to see from Eq. (6) that when $G$ perfectly recovers the training data distribution (i.e., $p_g=p_d$), $D(x)=1/2$.

The traditional GAN architecture was thus designed for the purpose of generating meaningful images of a given class (based upon the training data) from a random input vector. For the purposes of this work, we would instead like to train a GAN to take a meaningful input image (i.e., a microgrid image), rather than random noise, that the GAN can learn to map to a meaningful set of output images, which in our case are the full-resolution polarized intensity images. This can be achieved with modifications to the traditional GAN architecture. To do so, we make use of a conditional GAN coupled with a U-Net architecture for our generator and a PatchGAN architecture for our discriminator.

2.1 Conditional GANs

Unlike traditional GANs, cGANs learn to map a random noise vector and a known image $x$ to an output image $y$. Noise is typically applied to the input image as either a random noise vector or in the form of dropout [17], which is applied on several layers of the generator $G$ at both training and testing time. The objective function of a cGAN has the same form as the traditional GAN from Eq. (5) and can be expressed as [14]

$$L_{cGAN}(G,D)={\mathbb{E}}_{x,y}[\log D(x,y)]+{\mathbb{E}}_{x,z}[\log(1-D(x,G(x,z)))].$$

The goal of the discriminator $D$ remains the same as before, where $D$ is trying to maximize Eq. (7). However, the generator cost function is modified such that

$$G_{min}=\min_G\max_DL_{cGAN}(G,D)+L_{L1}(G),$$
where
$$L_{L1}(G)={\mathbb{E}}_{x,y,z}[\left\lVert{y-G(x,z)}\right\rVert_1]$$
is an additional cost that encourages the generated image to be as close to the ground truth image as possible in an L1-norm sense.

Thus, for a cGAN, generator training is conditioned based upon the set of input images $x$. The training data $y$ are the set of provided ground truth images. In our case $x$ is a given microgrid image and $y$ will be the corresponding desired full-resolution polarized intensity images.

2.2 Generator architecture

We use a U-Net for our generator architecture as opposed to a more traditional encoder-decoder network, both of which are illustrated in Fig. 4. The U-Net architecture incorporates skip connections that pass information from each encoder layer directly to the corresponding decoder layer [14,18]. We chose the U-Net structure since, for a given microgrid image, we have actual measurements for one fourth of each polarized intensity image. The skip connections thus allow these measurements to be passed from the input layer directly to the output layer and hence preserved in the final full-resolution estimates.

Fig. 4. Comparison of encoder-decoder network versus U-Net architectures.

2.3 Discriminator architecture

A PatchGAN architecture [14] differs from a traditional GAN in the discriminator. The traditional GAN discriminator takes a full-size image as input and outputs a score in the range of $[0,1]$ based upon its determination of whether the input image is real or synthetic. Conversely, for a PatchGAN, the discriminator maps $N\times N$ patches of the input image to a set of corresponding output probabilities. The discriminator output from all patches is then averaged to provide the discriminator score. This strategy has the effect of encouraging high frequency content in generated images that may otherwise be discouraged when mapping the entire image due to the nature of the generator loss function. The implementation of a PatchGAN is achieved by progressive downsampling through convolutional layers and the patch size $N$ is determined by the number of convolutional layers $K$ and their corresponding kernel sizes $M$ and strides $S$. $N$ can be calculated according to the following recursive relationship by tracing backwards from the output layer, i.e.,

$$N_k = S_k(N_{k-1}-1)+M_k,$$
where $k=1,2,\ldots ,K$, $N_0=1$ indicates the $1\times 1$ output probability of the discriminator, $N=N_K$ is the calculated patch size, and $S_k$ and $M_k$ represent the $k$th convolutional layer stride and kernel size, respectively. The patch size is thus fully determined by the network architecture and hence is independent of the input image size. We chose the PatchGAN architecture to ensure better enforcement of high frequency content and reduction of aliasing in the generated polarized intensity measurements.
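Equation (10) can be evaluated with a short backward recursion over the discriminator layers; the layer configuration in the example below is hypothetical and for illustration only, not the exact configuration of Table 1.

```python
def patchgan_patch_size(kernels, strides):
    """Trace the receptive field backwards from the 1x1 discriminator
    output using N_k = S_k * (N_{k-1} - 1) + M_k (Eq. (10))."""
    n = 1  # N_0: a single output probability
    for m, s in zip(reversed(kernels), reversed(strides)):
        n = s * (n - 1) + m
    return n

# Hypothetical example: four 4x4 convolutions with strides 2, 2, 1, 1
print(patchgan_patch_size(kernels=[4, 4, 4, 4], strides=[2, 2, 1, 1]))  # -> 34
```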

3. Demosaicing strategy

Figure 5 illustrates the general cGAN architecture for our demosaicing strategy. The architectures for our generator and discriminator are shown in Fig. 6, where each convolutional layer is followed by a LeakyReLU activation function and a batch normalization stage. The generator is based upon a U-Net architecture and the discriminator uses the PatchGAN architecture. We initially adopted the $256\times 256$ PatchGAN discriminator architecture proposed in the Pix2Pix image translation cGAN described in [14]. We experimented with the discriminator architecture to test different input image sizes and ultimately found that a $64\times 64$ input image size yielded the best performance. Table 1 provides the parameters we ultimately used for each layer of the generator and discriminator architectures.

Fig. 5. Proposed cGAN architecture for generating demosaiced intensity images from an input microgrid image.

Fig. 6. The general architectures used by our generator and discriminator. The number of layers depends upon the input image size. The red boxes represent convolutional layers, the green boxes represent activation functions, and the blue boxes represent batch normalization stages. The bold black arrows indicate skip connections where the microgrid polarized intensity measurements are passed directly to each output layer.

Table 1. Generator and discriminator architecture for an input image size of $64\times 64$.

We next modify the cost function of Eq. (8) to incorporate additional loss terms to enforce polarization physics constraints and accuracy of the estimated Stokes images. Thus, we first create a loss making use of the redundancy relationships of Eq. (4) [5]. This redundancy loss is defined as

$$L_R = R_{I_0} + R_{I_{45}} + R_{I_{90}} + R_{I_{135}},$$
where, given an image of size $M \times N$,
$$\begin{array}{l} R_{I_0}=\frac{1}{MN}\sum^M_{i=1}\sum^N_{j=1}|(I_{0_{i,j}}-I_{45_{i,j}}-I_{135_{i,j}}+I_{90_{i,j}})| \\ R_{I_{45}}=\frac{1}{MN}\sum^M_{i=1}\sum^N_{j=1}|(I_{45_{i,j}}-I_{0_{i,j}}-I_{90_{i,j}}+I_{135_{i,j}})| \\ R_{I_{90}}=\frac{1}{MN}\sum^M_{i=1}\sum^N_{j=1}|(I_{90_{i,j}}-I_{45_{i,j}}-I_{135_{i,j}}+I_{0_{i,j}})| \\ R_{I_{135}}=\frac{1}{MN}\sum^M_{i=1}\sum^N_{j=1}|(I_{135_{i,j}}-I_{0_{i,j}}-I_{90_{i,j}}+I_{45_{i,j}})|. \end{array}$$
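A compact sketch of this redundancy loss (our own, assuming the four generated intensity channels are equal-sized NumPy arrays):

```python
import numpy as np

def redundancy_loss(i0, i45, i90, i135):
    """Physics-based redundancy loss L_R of Eqs. (11)-(12): the mean
    absolute violation of the relationships in Eq. (4) per channel."""
    r0   = np.mean(np.abs(i0   - i45 - i135 + i90))
    r45  = np.mean(np.abs(i45  - i0  - i90  + i135))
    r90  = np.mean(np.abs(i90  - i45 - i135 + i0))
    r135 = np.mean(np.abs(i135 - i0  - i90  + i45))
    return r0 + r45 + r90 + r135
```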

Thus, a loss will be introduced when the generated polarized intensity measurements do not obey the physics constraints defined in Eq. (4). We next define a loss term that measures the error between each generated Stokes image and the corresponding full-resolution ground truth images, i.e.,

$$\begin{aligned}L_{s_0} &= {\mathbb{E}}_{x,y,z}[\left\lVert{y_{s_0}-G_{s_0}(x,z)}\right\rVert_1] \\ L_{s_1} &= {\mathbb{E}}_{x,y,z}[\left\lVert{y_{s_1}-G_{s_1}(x,z)}\right\rVert_1] \\ L_{s_2} &= {\mathbb{E}}_{x,y,z}[\left\lVert{y_{s_2}-G_{s_2}(x,z)}\right\rVert_1]. \end{aligned}$$

We then incorporate these terms into the loss function of Eq. (8) to obtain

$$G_{min}=\min_G\max_D\lambda_1L_{cGAN}(G,D)+\lambda_2L_{L1}(G)+ \lambda_3L_R + \lambda_4L_{s_0} + \lambda_5L_{s_1} + \lambda_6L_{s_2},$$
where $\lambda _i$ are weighting parameters used to control the relative importance of each loss term. For all results in this paper we chose $\lambda _i = [1, 100, 10, 100, 100, 100]$, where we found that increasing the relative weight of the ground truth error terms encourages generation of more accurate polarized intensity images.
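Assuming the individual terms have already been computed (for instance, the redundancy loss sketched above and L1 errors on the generated Stokes images per Eq. (13)), the weighted objective of Eq. (14) reduces to a simple weighted sum; the sketch below is illustrative only.

```python
import numpy as np

def stokes_l1_losses(gen, gt):
    """L1 losses between generated and ground-truth Stokes images (Eq. (13)).
    gen and gt are dicts of intensity channels keyed by angle {0,45,90,135}."""
    def stokes(ch):
        return (ch[0] + ch[90], ch[0] - ch[90], ch[45] - ch[135])
    return [np.mean(np.abs(y - g)) for y, g in zip(stokes(gt), stokes(gen))]

def generator_objective(l_cgan, l_l1, l_r, l_s,
                        lambdas=(1, 100, 10, 100, 100, 100)):
    """Weighted total generator loss of Eq. (14); l_s = [L_s0, L_s1, L_s2]."""
    return (lambdas[0] * l_cgan + lambdas[1] * l_l1 + lambdas[2] * l_r
            + lambdas[3] * l_s[0] + lambdas[4] * l_s[1] + lambdas[5] * l_s[2])
```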

3.1 Training methodology

Our goal is to train the generator such that when a microgrid image is input, it will generate the four corresponding full-resolution intensity images ($I_0, I_{45}, I_{90},$ and $I_{135}$) and thus achieve demosaicing. We train our cGAN using two different datasets. The first we refer to as the ForkNet dataset and was generously provided by the authors of [13]. This dataset consists of $120$ scenes collected with an 8-bit $960 \times 1280$ DoT polarimeter at orientation angles of $\theta =\{0,45,90,135\}$. $110$ of these full-resolution DoT images are used for training with the remaining $10$ used for testing purposes. The scenes were all collected indoors under similar illumination conditions and consist of various objects imaged at a single orientation at a close distance, with a fixed sensor geometry. The images are stored as 8-bit bitmaps and contain a noticeable amount of temporal noise.

We collected a second dataset using a 16-bit, $2048\times 2448$ FLIR Blackfly monochromatic, visible linear DoT polarimeter based upon the Sony IMX250 sensor, as well as a corresponding FLIR Blackfly visible microgrid polarimeter based upon the Sony IMX250MZR sensor. We refer to this as the FLIR dataset. Both cameras have identical Fujinon 12.5mm 2/3" C-mount lenses set to an aperture of f/8. The DoT system lens is augmented with a Tiffin 49CP 49mm polarizer. We experimentally determined that both polarimeters have similar transmission characteristics, but the DoT polarizer has worse extinction performance, resulting in the DoT system having a slightly lower extinction ratio than the DoFP system. We collected 24 DoT images (at polarizer orientation angles of $\theta =\{0,45,90,135\}$) of various static indoor and outdoor scenes that contain a variety of polarized objects and imaging geometries, where $17$ full-resolution DoT images are used for training and $7$ are used for testing purposes. For each DoT polarizer orientation, we collected 50 image frames and averaged them in time to reduce the effects of temporal noise. We additionally collected $15$ scenes with the microgrid sensor that correspond to the same ambient conditions and field-of-view as the DoT scenes. This provides additional testing data from a real microgrid sensor that will allow for comparisons to full-resolution DoT data of the same scene. In all cases, care was taken to set the camera integration times to utilize the full dynamic range of the sensor while avoiding significant regions of pixel saturation.

Figure 7 shows example training images from the ForkNet and FLIR datasets. For each training image we show the $s_0$ and corresponding DoLP image to illustrate representative scene structure and polarization content. For both datasets, the full-resolution DoT images are used as the ground truth training images $y$. These data were then used to simulate microgrid images of the configuration depicted in Fig. 8 and hence are used as the corresponding generator input training images $x$.
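A minimal sketch of this simulation step, again assuming the illustrative [[0, 45], [135, 90]] super-pixel layout rather than any particular sensor convention:

```python
import numpy as np

def simulate_microgrid(i0, i45, i90, i135):
    """Build a simulated DoFP mosaic by subsampling each full-resolution
    DoT intensity image according to an assumed 2x2 layout (Fig. 8)."""
    mosaic = np.empty_like(i0)
    mosaic[0::2, 0::2] = i0[0::2, 0::2]
    mosaic[0::2, 1::2] = i45[0::2, 1::2]
    mosaic[1::2, 0::2] = i135[1::2, 0::2]
    mosaic[1::2, 1::2] = i90[1::2, 1::2]
    return mosaic
```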

Fig. 7. Example training images from the ForkNet and FLIR datasets. For each training image, the $s_0$ and DoLP image are shown to illustrate representative scene structure and polarization content. Each $s_0$ image is statistically scaled to two standard deviations and the DoLP images are scaled to the range of $[0,0.4]$.

Fig. 8. Each full-resolution polarized intensity image is downsampled to simulate the desired microgrid image.

To improve the robustness of our cGAN approach, we divide a given $M \times N$ full-resolution intensity image into sub-images of size $64 \times 64$ that are non-overlapping for the ForkNet training dataset and overlapping by $16$ pixels for the FLIR training dataset. We perform sub-image creation for both training and testing purposes to allow our algorithm to accommodate images of arbitrary size. Moreover, application of the cGAN to each sub-image can result in border effects that are several pixels wide. The overlap allows for cropping of each generated sub-image to remove these border-affected pixels prior to stitching the full-resolution output image together. For the ForkNet training data this results in $32,983$ training sub-images, and for the FLIR dataset in $47,770$ training sub-images. For outdoor scenes in the FLIR dataset, regions of vegetation and cloud cover often exhibited small amounts of movement from frame to frame and thus were a source of error in the DoT ground truth data. Furthermore, any regions with significant amounts of saturated pixels were also identified. We then attempted to mask these problematic regions in the ground truth images and ignore any corresponding sub-images for training purposes.
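The tiling step can be sketched as below (our own illustrative helper; edge handling for image dimensions that are not a multiple of the stride is omitted, and overlap=0 reproduces the non-overlapping ForkNet tiling):

```python
def extract_tiles(image, tile=64, overlap=0):
    """Extract (possibly overlapping) tile x tile sub-images in raster order."""
    stride = tile - overlap
    tiles = []
    for r in range(0, image.shape[0] - tile + 1, stride):
        for c in range(0, image.shape[1] - tile + 1, stride):
            tiles.append(image[r:r + tile, c:c + tile])
    return tiles
```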

To gain better insight into the polarimetric diversity of each set of training data, each $64\times 64$ sub-image is first mapped to a single point within the unit circle, where the $x$-axis represents the median pixel value of $s_1/s_0$, and the $y$-axis represents the median pixel value of $s_2/s_0$. Each sub-image is then grouped into one of ten classes. Class 1 represents low-polarization sub-images with $(s_1/s_0)^2+(s_2/s_0)^2 \leq P^2_L$, where $P_L$ is the specified low-polarization threshold. Class 10 represents high-polarization sub-images with $(s_1/s_0)^2+(s_2/s_0)^2 \geq P^2_U$, where $P_U$ is a specified high-polarization threshold. The remaining $8$ classes are determined by dividing the central annulus of the unit circle into equal-area regions that correspond to different regions of AoP. The top row of Fig. 9 displays these sub-image point-mappings for the ForkNet and FLIR datasets with $P_L = 0.1$ and $P_U = 0.4$. For the center row, a third axis was added that represents the median $s_0$ intensity value in the scene. In the bottom row, the third axis instead represents the median local binary pattern (LBP) [19] of $s_0$ as a representation of sub-image texture. Selected sub-images from each of the ten classes are shown in Fig. 10 for the ForkNet dataset and in Fig. 11 for the FLIR dataset.
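A sketch of this class assignment for a single sub-image is given below, using $P_L = 0.1$ and $P_U = 0.4$; the exact boundaries of the eight annulus sectors (equal angular sectors in the normalized Stokes plane) are assumed here for illustration, and guards against division by zero in $s_0$ are omitted.

```python
import numpy as np

def polarization_class(sub_s0, sub_s1, sub_s2, p_lo=0.1, p_hi=0.4):
    """Assign a sub-image to one of ten polarization classes: Class 1 (low
    DoLP), Class 10 (high DoLP), or Classes 2-9 by annulus sector."""
    x = np.median(sub_s1 / sub_s0)   # median s1/s0
    y = np.median(sub_s2 / sub_s0)   # median s2/s0
    r2 = x**2 + y**2
    if r2 <= p_lo**2:
        return 1
    if r2 >= p_hi**2:
        return 10
    angle = np.arctan2(y, x)                         # in [-pi, pi]
    sector = int((angle + np.pi) / (2 * np.pi) * 8) % 8
    return 2 + sector                                # Classes 2-9
```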

Fig. 9. Classification of sub-image samples according to (top) median normalized Stokes value for the ForkNet (left column) and FLIR datasets (right column). A third axis was added that represents the sub-image median $s_0$ intensity value (center). The third axis instead represents the sub-image median $s_0$ LBP value (bottom).

Fig. 10. Example ForkNet DoLP sub-images from Class 1 at left (low polarization) to the far right column representing Class 10 (high polarization). Each image is scaled to the range of $[0,1]$. A breakdown of the number and percentage of training samples in each class is shown at left.

Fig. 11. Example FLIR DoLP sub-images from Class 1 at left (low polarization) to the far right column representing Class 10 (high polarization). Each image is scaled to the range of $[0,1]$. A breakdown of the number and percentage of training samples in each class is shown at left.

From these figures, we see that the ForkNet data contains very few sub-images with high polarization content (<0.05%), whereas over $90\%$ of the sub-images are classified as low-polarization. The FLIR dataset contains a significant number of high-polarization sub-images (nearly 2%), with only $75\%$ being classified as low-polarization. Hence, we observe that the FLIR dataset is more polarimetrically diverse. We also observe that the median intensity value of the sub-images is well-distributed for both datasets, with the highest-polarization sub-images from the FLIR dataset tending towards darker median intensity values. Finally, we see a greater diversity of textures represented by the FLIR dataset as compared with the ForkNet data in terms of median LBP.

4. Performance evaluation and discussion

In this section, we evaluate the quality of imagery generated by our cGAN demosaicing algorithm and compare the results to imagery obtained from several other microgrid demosaicing strategies. Care was taken to investigate a number of quantitative and qualitative performance metrics prior to our selection. We ultimately chose peak signal-to-noise ratio (PSNR), gradient magnitude similarity deviation (GMSD) [20], physics-based redundancy loss (PBRL) from Eq. (11), no-reference perception-based image quality evaluator (PIQUE) [21], image histogram comparisons, and subjective quality assessments of the output Stokes images. This range of evaluation metrics was selected due to the varied nature of the Stokes images, as we found that certain metrics are more appropriate depending upon the image being evaluated.

PSNR was selected due to its prominence in the literature for evaluating demosaicing strategies. While we do not believe it is the best metric for evaluating demosaicing performance, it allows us to assess our algorithm against results of other reported approaches [11,13]. GMSD provides a quantitative measure of image distortion that correlates well with human image perception [20]. Note that a lower GMSD score indicates higher performance. We use the PBRL as a measure of how well generated data adheres to physics-based polarization constraints. PIQUE is a no-reference, opinion-unaware image quality assessment method that attempts to quantify distortion without relying on ground truth data [21]. Both PBRL and PIQUE are useful for evaluating demosaicing performance for microgrid test images where no corresponding full-resolution ground truth image is available for comparison. It is important to note that we apply these metrics to each test image and report the average score across all images in a given testing set.

Our evaluation is performed for two different training scenarios based upon the ForkNet and FLIR datasets described in Section 3.1. In each scenario, we compare the results of our cGAN demosaicing strategy against the NLPN [4], locally-adaptive bilateral filtering [5], and ForkNet CNN [13] demosaicing approaches. Where possible, each technique is evaluated against the full-resolution DoT ground truth images. Figure 12 displays a ForkNet and FLIR $s_0$ test image with corresponding region of interest indicated in red that we use in the following evaluations. The ForkNet scene contains a spherical fruit and was selected for its relative simplicity. The FLIR scene was collected indoors and contains a number of high-detail objects with diverse polarization content and was selected due to its complexity relative to the ForkNet image.

Fig. 12. Example simulated microgrid test images from the (left) ForkNet and (right) FLIR datasets. The regions of interest indicated in red are $500\times 500$ sub-images used in our visual performance evaluations.

Before we describe each training scenario, it is worth noting that the authors of [13] provided us with source code for the ForkNet demosaicing strategy. This allowed us to train the ForkNet algorithm and our cGAN approach using the same training data and microgrid convention to provide a fair comparison of the methods. However, one inconsistency we noted in the ForkNet algorithm is in their computation of AoP, which is computed according to

$$AoP = \frac{1}{2}\tan^{{-}1}\left(\frac{s_2}{s_1}\right) + \frac{\pi}{4}. $$

Moreover, in their implementation, they use the $\tt {atan(s_2/s_1)}$ function as opposed to the $\tt {atan2(s_2,s_1)}$ function. As a result, the range of their AoP calculation is $[0,\frac {\pi }{2}]$, whereas our convention defined in Eq. (3) and based upon the $\tt {atan2()}$ function yields AoP values in the range of $[-\frac {\pi }{2},\frac {\pi }{2}]$. To facilitate a direct comparison of AoP imagery between the various demosaicing strategies, we updated the provided ForkNet source code to generate AoP imagery according to our convention.
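The two conventions can be contrasted directly; the helper names below are our own, and the point is simply that the atan-based form cannot distinguish $(s_1,s_2)$ from $(-s_1,-s_2)$ and additionally carries a $\pi/4$ offset.

```python
import numpy as np

def aop_atan2(s1, s2):
    """AoP per Eq. (3): four-quadrant form, range [-pi/2, pi/2]."""
    return 0.5 * np.arctan2(s2, s1)

def aop_forknet(s1, s2):
    """AoP as in the provided ForkNet code (Eq. (15)): range [0, pi/2]."""
    return 0.5 * np.arctan(s2 / s1) + np.pi / 4

# Example where the conventions disagree due to the lost quadrant information:
print(aop_atan2(-1.0, 1.0), aop_forknet(-1.0, 1.0))  # ~1.178 vs ~0.393
```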

4.1 Training scenario 1: ForkNet data

We first used the ForkNet dataset to train the ForkNet CNN and our cGAN approach using the same training and testing images described in [13]. We trained each algorithm for 300 iterations with a minibatch size of 128. The test image at left in Fig. 12 was demosaiced using each technique and the corresponding Stokes images were computed along with DoLP and AoP. In the case of ForkNet, the CNN directly outputs the $s_0$, DoLP, and AoP images (the polarized intensity images and $s_1$ and $s_2$ are not generated and hence are not shown). Table 2 displays the mean GMSD, PSNR, PIQUE, and PBRL scores for each estimated $960\times 1280$ polarization image across all ten ForkNet testing images. Example output polarization images for a $500\times 500$ region of interest of the test image in Fig. 12 are displayed in Fig. 13, where each row of images is scaled to the same dynamic range. The $s_0$, $s_1$, and $s_2$ images are statistically scaled within two standard deviations of the mean, the DoLP images are scaled to the range of $[0,0.25]$, and the AoP images are scaled to the range of $[-2,2]$. Figure 14 shows image histograms for the $s_0$, DoLP, and AoP images of Fig. 13. In all cases, the DoT image results are considered to be ground truth.

From Table 2, we see that the cGAN scores are best with the exception of the ForkNet GMSD AoP and PSNR $s_0$ scores, and the DoT PIQUE $s_0$ score. It is notable that the cGAN PBRL score is lower than that of the DoT data, indicating that the cGAN is generating data that obeys the imposed redundancy constraints and hence seemingly overcoming sources of error within the ground truth images. For the output images of Fig. 13, the cGAN images appear visually similar to the ground truth DoT images, whereas the ForkNet DoLP and AoP images show notable differences. For the ForkNet DoLP image, the magnitudes are higher on the left side of the fruit and the darker patch of DoLP is missing in the top-left region. The AoP image is not representative of ground truth and it is unclear why the GMSD AoP score is best for ForkNet, but we suspect it is due to the significant reduction in noise in the background regions of the test images. The NLPN and Adaptive results have a similar appearance to the DoT data; however, both are noisier, the NLPN method exhibits IFOV artifacts, and the Adaptive method shows zippering artifacts along edges. These effects are more apparent in the zoomed $100\times 100$ image regions shown in Fig. 15.

Fig. 13. Estimated polarization images for the $500\times 500$ region of interest of the ForkNet image shown in Fig. 12 when training is performed using the ForkNet dataset. The $s_0$, $s_1$, and $s_2$ images are statistically scaled within two standard deviations of the mean, the DoLP images are scaled to the range of $[0,0.25]$, and the AoP images are scaled to the range of $[-2,2]$.

Fig. 14. Image histograms for the $s_0$, DoLP, and AoP images from Fig. 13 resulting from each demosaicing strategy.

Fig. 15. Zoomed $100\times 100$ regions of the stem of the fruit from the images in Fig. 13. All images are scaled using the same parameters as in Fig. 13.

Table 2. Average evaluation scores for the ten ForkNet testing images when the ForkNet dataset is used for cGAN training.

For the histograms of Fig. 14, the $s_0$ results, much like the visual images, appear fairly equivalent across methods. For the DoLP histograms, we see similar DoLP distributions across techniques with the exception of ForkNet. Most notably we observe a peak in low polarization values as well as an increased distribution of polarization values across the range of [0.1,0.35]. For the AoP histograms, we see more variation across methods. The DoT image exhibits considerable noise, which is common for AoP imagery due to the nature of the computation, particularly for areas of low polarization. The NLPN AoP histogram also appears noisy, whereas the Adaptive and cGAN histograms show similar distributions with less noise. The ForkNet AoP histogram appears quite different from the other methods and indicates that the AoP values differ significantly from the DoT data.

4.2 Training scenario 2: FLIR data

We next trained the cGAN using the FLIR dataset described in Section 3.1 for 300 iterations with a minibatch size of 128. The test image at right in Fig. 12 was demosaiced using each technique and the corresponding polarimetric images were computed. Table 3 displays mean GMSD, PSNR, PIQUE, and PBRL results computed across the seven estimated $2048\times 2448$ FLIR dataset polarization test images. Output polarization images for the $500\times 500$ region of interest are displayed in Fig. 16, where each row of images is scaled to the same dynamic range. The $s_0$, $s_1$, and $s_2$ images are statistically scaled within two standard deviations of the mean, the DoLP images are scaled to the range of $[0,0.25]$, and the AoP images are scaled to the range of $[-2,2]$. Figure 17 shows image histograms for the $s_0$, DoLP, and AoP images resulting from each technique.

The results presented in Table 3 show that the cGAN scores are best with the exception of ForkNet PSNR $s_0$ and DoT PIQUE $s_0$. For the output images of Fig. 16, the cGAN images again appear visually similar to the ground truth images, but with some slight differences. The ForkNet DoLP image demonstrates some minor differences against ground truth while a significant difference is once again observed for the AoP image. The NLPN and Adaptive results show severe aliasing and edge-related artifacts that can be more clearly observed in the zoomed $100\times 100$ image regions of the lawnmower wheel hub shown in Fig. 18. Despite the differences in the ForkNet images, more high frequency details appear to be present as compared with the cGAN method. Moreover, the cGAN method appears to show lower polarization magnitudes in the DoLP image. The cGAN AoP image is comparable and is less noisy than the DoT AoP image. The reduction of aliasing in both the ForkNet and cGAN results is apparent.

Fig. 16. Estimated polarization images for the $500\times 500$ region of interest of the FLIR image shown in Fig. 12 when cGAN training is performed using the FLIR dataset. The $s_0$, $s_1$, and $s_2$ images are statistically scaled within two standard deviations of the mean, the DoLP images are scaled to the range of $[0,0.25]$, and the AoP images are scaled to the range of $[-2,2]$.

Fig. 17. Image histograms for the $s_0$, DoLP, and AoP images from Fig. 16 resulting from each demosaicing strategy.

Fig. 18. Zoomed $100\times 100$ regions of the lawnmower wheel hub from the images in Fig. 16. All images are scaled using the same parameters as in Fig. 16.

Table 3. Average evaluation scores for the seven FLIR testing images when the FLIR dataset is used for cGAN training.

For the histograms of Fig. 17, we again observe consistency among the $s_0$ results. In the case of the DoLP histograms, we see that the cGAN results are comparable to ground truth, whereas the ForkNet DoLP histogram again exhibits a peak in low polarization values. The Adaptive and NLPN results show similar shape to the ground truth DoLP histogram; however, more values are observed at higher polarization levels, which is most likely due to the strong false edge and aliasing artifacts observed across the image. For the AoP histograms, we observe that the closest distribution is from the cGAN method, while the NLPN and Adaptive AoP histograms appear similar but more uniformly distributed. The ForkNet AoP histogram again shows a significant deviation from the ground truth AoP histogram.

4.3 Testing on real microgrid data

As part of the FLIR dataset, test images were collected of the same scenes as the DoT data using a microgrid polarimeter. We made best efforts to collect co-registered data between the DoT and microgrid imagery; however, error does exist due to alignment imperfections in the sensor mounts, minor illumination changes, and differing sensor noise characteristics and integration times. As such, we could not perform a pixel-to-pixel comparison against the corresponding ground truth data. Thus, we only apply the PIQUE and the PBRL metrics for quantitative scoring and use image histograms and subjective visual evaluation for comparison against the ground truth data. Table 4 provides the mean PIQUE and PBRL scores computed across thirteen microgrid test images and Fig. 19 provides a region of interest comparison between the demosaicing methods along with the ground truth images for the same scene collected with the DoT sensor. Corresponding histograms for $s_0$, DoLP, and AoP are shown in Fig. 20.

Table 4 shows that the cGAN scores are best across the test set. Figure 19 displays the output polarization images for the same region of interest of the scene as shown in Fig. 16. The images in each case are similar to the results estimated from the corresponding simulated microgrid data, and hence it is encouraging to see that our simulated microgrid data is representative of real microgrid data. The zoomed image regions of the wheel hub are shown in Fig. 21. Here we observe that, although similar, the results show slight degradation of spatial detail and a decrease in polarization magnitude as compared to the simulated microgrid images. The histograms of Fig. 20 are similar to those shown in Fig. 17. Overall, from these results we can conclude that basing our training methodology upon DoT ground truth and simulated microgrid imagery is a sound approach.

Fig. 19. Estimated polarization images for the $500\times 500$ region of interest of the FLIR microgrid image corresponding to the scene shown in Fig. 12 when cGAN training is performed using the FLIR dataset. The $s_0$, $s_1$, and $s_2$ images are statistically scaled within two standard deviations of the mean, the DoLP images are scaled to the range of $[0,0.25]$, and the AoP images are scaled to the range of $[-2,2]$.

Fig. 20. Image histograms for the $s_0$, DoLP, and AoP images from Fig. 19 resulting from each demosaicing strategy.

Fig. 21. Zoomed $100\times 100$ regions of the lawnmower wheel hub from the images in Fig. 19. All images are scaled using the same parameters as in Fig. 19.

Table 4. Average evaluation scores for the thirteen FLIR microgrid testing images when the FLIR dataset is used for cGAN training.

4.4 Testing on ForkNet data with FLIR dataset training

Finally, we applied the cGAN algorithm after training with the FLIR dataset to the ForkNet test image set. Thus, the results remain the same for the other techniques as presented in Section 4.1, but are presented again for ease of comparison in Table 5 and Figs. 22, 23, and 24.

The cGAN results in Table 5 show scoring improvements for $s_1$, $s_2$ and AoP and a decrease in performance for $s_0$ and DoLP. As such, ForkNet now reports the best results across $s_0$ and DoLP. Examination of the cGAN images of Fig. 22 does show more deviation from ground truth versus the results obtained under ForkNet training in Section 4.1; however, the images are still of high quality. The zoomed images of Fig. 24 confirm this, where the cGAN results are comparatively smoother, but also show a reduction in noise. This is most likely due to the fact that the FLIR dataset is considerably less noisy than the ForkNet dataset. Moreover, the 16-bit nature of the FLIR dataset as opposed to the 8-bit format of the ForkNet data may influence this as well. The cGAN histograms of Fig. 23 exhibit some minor changes when compared with the histograms from the previous scenario, which are indicated by the decrease in PBRL score. This testing scenario demonstrates that selection of appropriate training data is an important step in optimizing both robustness and performance of the cGAN demosaicing algorithm.

Fig. 22. Estimated polarization images for the $500\times 500$ region of interest of the ForkNet image shown in Fig. 12 when cGAN training is performed using the FLIR dataset. The $s_0$, $s_1$, and $s_2$ images are statistically scaled within two standard deviations of the mean, the DoLP images are scaled to the range of $[0,0.25]$, and the AoP images are scaled to the range of $[-2,2]$.

Fig. 23. Image histograms for the $s_0$, DoLP, and AoP images from Fig. 22 resulting from each demosaicing strategy.

Fig. 24. Zoomed $100\times 100$ regions of the stem of the fruit from the images in Fig. 22. All images are scaled using the same parameters as in Fig. 22.

Table 5. Average evaluation scores for the ten ForkNet testing images when the FLIR dataset is used for cGAN training.

5. Conclusion

We presented an approach to demosaicing microgrid polarimeter images using the architecture of a cGAN. We based our generator on a U-Net architecture and our discriminator on a $64\times 64$ PatchGAN. We incorporate both physics-based constraints and Stokes image losses into the cost function to encourage physically realizable and accurate demosaiced imagery. We trained our cGAN approach using two different datasets: the ForkNet DoT dataset described in [13] as well as our own dataset collected with both FLIR DoT (for training and testing) and microgrid (for testing) polarimeters. We applied the cGAN approach to test images from each dataset and evaluated performance against DoT ground truth images as well as results obtained from both traditional approaches and the recent CNN-based ForkNet demosaicing strategy. In most cases we were able to demonstrate state-of-the-art performance based upon quantitative PSNR, GMSD, PIQUE, and PBRL scores. We also used visual comparisons of the estimated polarization images and corresponding histograms to perform subjective qualitative assessments. Both the cGAN and ForkNet approaches demonstrate the notable ability to overcome aliasing in the output Stokes images. While ForkNet showed good reconstruction of high-spatial-frequency content in demosaiced imagery, it often deviated from ground truth imagery, particularly for AoP. Our cGAN approach produced imagery that was quantitatively and qualitatively similar to corresponding ground truth imagery across a number of training scenarios.

In this work we attempted to classify $64\times 64$ sub-images of each training dataset according to median polarization content, $s_0$ intensity level, and texture. Our initial goal was to use this classification methodology to guide our training process to ensure sufficient polarimetric diversity as well as avoid bias towards a specific polarization state (e.g., vertical, low-polarization, etc.). While we learned a great deal about the training process for our cGAN, this effort proved to be far more complex than we initially expected. In the future, we plan to pursue this guided training methodology to gain better insight into the training process with the goal of reducing the number of required training images and improving the overall robustness and accuracy of the demosaicing results. Furthermore, we trained our cGAN to output the full-resolution polarized intensity images. We would also like to explore training our cGAN to instead directly generate the Stokes images and other polarimetric image products for applications where the intensity images are not required.

Acknowledgments

We would like to thank the authors of the ForkNet CNN demosaicing algorithm [13] for providing source code and the corresponding training and testing data. This made the comparison to our algorithm easier, more accurate, and more insightful. We would also like to thank Prof. J. Scott Tyo and his students at the University of New South Wales for discussions and feedback regarding this work.

Disclosures

The authors declare no conflicts of interest.

References

1. J. S. Tyo, D. L. Goldstein, D. B. Chenault, and J. A. Shaw, “Review of passive imaging polarimetry for remote sensing applications,” Appl. Opt. 45(22), 5453–5469 (2006). [CrossRef]  

2. J. R. Schott, Fundamentals of polarimetric remote sensing, vol. 81 (SPIE, 2009).

3. J. S. Tyo, C. F. LaCasse, and B. M. Ratliff, “Total elimination of sampling errors in polarization imagery obtained with integrated microgrid polarimeters,” Opt. Lett. 34(20), 3187–3189 (2009). [CrossRef]  

4. B. M. Ratliff, C. F. LaCasse, and J. S. Tyo, “Interpolation strategies for reducing ifov artifacts in microgrid polarimeter imagery,” Opt. Express 17(11), 9112–9125 (2009). [CrossRef]  

5. B. M. Ratliff, C. F. LaCasse, and J. S. Tyo, “Adaptive strategy for demosaicing microgrid polarimeter imagery,” in 2011 Aerospace Conference, (IEEE, 2011), pp. 1–9.

6. S. Mihoubi, P.-J. Lapray, and L. Bigué, “Survey of demosaicking methods for polarization filter array images,” Sensors 18(11), 3688 (2018). [CrossRef]  

7. S. Gao and V. Gruev, “Bilinear and bicubic interpolation methods for division of focal plane polarimeters,” Opt. Express 19(27), 26161–26173 (2011). [CrossRef]  

8. S. Gao and V. Gruev, “Gradient-based interpolation method for division-of-focal-plane polarimeters,” Opt. Express 21(1), 1137–1151 (2013). [CrossRef]  

9. J. Zhang, H. Luo, B. Hui, and Z. Chang, “Image interpolation for division of focal plane polarimeters with intensity correlation,” Opt. Express 24(18), 20799–20807 (2016). [CrossRef]  

10. A. Ahmed, X. Zhao, V. Gruev, J. Zhang, and A. Bermak, “Residual interpolation for division of focal plane polarization image sensors,” Opt. Express 25(9), 10651–10662 (2017). [CrossRef]  

11. J. Zhang, J. Shao, H. Luo, X. Zhang, B. Hui, Z. Chang, and R. Liang, “Learning a convolutional demosaicing network for microgrid polarimeter imagery,” Opt. Lett. 43(18), 4534–4537 (2018). [CrossRef]  

12. S. Wen, Y. Zheng, F. Lu, and Q. Zhao, “Convolutional demosaicing network for joint chromatic and polarimetric imagery,” Opt. Lett. 44(22), 5646–5649 (2019). [CrossRef]  

13. X. Zeng, Y. Luo, X. Zhao, and W. Ye, “An end-to-end fully-convolutional neural network for division of focal plane sensors to reconstruct S0, DoLP, and AoP,” Opt. Express 27(6), 8566–8577 (2019). [CrossRef]  

14. P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2017), pp. 1125–1134.

15. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in Neural Information Processing Systems, (2014), pp. 2672–2680.

16. I. Goodfellow, “Nips 2016 tutorial: Generative adversarial networks,” arXiv preprint arXiv:1701.00160 (2016).

17. “CS231n: Convolutional Neural Networks for Visual Recognition,” http://cs231n.stanford.edu/. Accessed: 2020-03-22.

18. O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical image computing and computer-assisted intervention, (Springer, 2015), pp. 234–241.

19. T. Ahonen, A. Hadid, and M. Pietikäinen, “Face recognition with local binary patterns,” in European Conference on Computer Vision, (Springer, 2004), pp. 469–481.

20. W. Xue, L. Zhang, X. Mou, and A. C. Bovik, “Gradient magnitude similarity deviation: A highly efficient perceptual image quality index,” IEEE Trans. on Image Process. 23(2), 684–695 (2014). [CrossRef]  

21. N. Venkatanath, D. Praneeth, M. C. Bh, S. S. Channappayya, and S. S. Medasani, “Blind image quality evaluation using perception based features,” in 2015 Twenty First National Conference on Communications (NCC), (IEEE, 2015), pp. 1–6.
