An end-to-end fully-convolutional neural network for division of focal plane sensors to reconstruct S₀, DoLP, and AoP

Xianglong Zeng; Yuan Luo; Xiaojing Zhao; Wenbin Ye

doi:10.1364/OE.27.008566

1. Introduction

Polarization, one of the fundamental properties of light, describes the vibration direction of the electric field. Polarization imaging can reveal information about object shape, surface roughness and texture while reducing the influence of reflected light. It has therefore been applied in material classification [1], military reconnaissance [2, 3], underwater imaging [4] and biomedical diagnosis [5].

There are several types of polarization imaging sensors, including division of time (DoT), division of amplitude (DoAM), division of aperture (DoAP) and division of focal plane (DoFP) polarimeters [6, 7]. Owing to advances in nanofabrication and the advantage of temporal synchronization, DoFP sensors have gained increasing attention and been widely adopted for real-time polarization imaging. As shown in Fig. 1, the key component of a DoFP sensor is the focal plane array (FPA), in which every four micro-polarizers, oriented at 0°, 45°, 90° and 135° respectively, form a 2-by-2 super-pixel. These super-pixels are tiled periodically across the focal plane, allowing the DoFP sensor to capture polarization images of all four orientations simultaneously.
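The sampling performed by the super-pixel grid can be sketched in a few lines of plain Python. This is a minimal illustration under an assumed layout: the `SUPER_PIXEL` table below (0° and 45° in the top row, 135° and 90° in the bottom row) is hypothetical, and the real arrangement is the one shown in Fig. 1. The helper names `make_mosaic` and `split_channels` are likewise illustrative. The same kind of down-sampling is what Section 3.1 uses to synthesize mosaic inputs from full-resolution orientation images.

```python
# Hypothetical 2x2 super-pixel layout: orientation (degrees) of the
# micro-polarizer at (row % 2, col % 2). The real layout is given in Fig. 1.
SUPER_PIXEL = ((0, 45),
               (135, 90))


def make_mosaic(images):
    """Simulate a raw DoFP frame from four full-resolution orientation images.

    `images` maps an orientation (0, 45, 90, 135) to an equal-size 2-D list.
    Each output pixel keeps only the value of the orientation whose
    micro-polarizer covers it, so 3 of every 4 values are discarded.
    """
    height, width = len(images[0]), len(images[0][0])
    mosaic = []
    for r in range(height):
        row = []
        for c in range(width):
            angle = SUPER_PIXEL[r % 2][c % 2]
            row.append(images[angle][r][c])
        mosaic.append(row)
    return mosaic


def split_channels(mosaic):
    """Recover the four quarter-resolution sub-images from a raw mosaic."""
    channels = {}
    for dr in (0, 1):
        for dc in (0, 1):
            angle = SUPER_PIXEL[dr][dc]
            channels[angle] = [row[dc::2] for row in mosaic[dr::2]]
    return channels
```

Each recovered channel holds only a quarter of the pixels of the full-resolution image, which is the factor-of-4 sub-sampling that demosaicing methods try to undo.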

Fig. 1 The schematic diagram of a typical DoFP polarimeter.

However, since each micro-polarizer in a super-pixel collects only the intensity of its own polarization orientation, the image of each orientation output by a DoFP sensor is sub-sampled by a factor of 4. This loss of spatial resolution in turn makes the subsequent calculation of polarization parameters inaccurate. Thus, in the past few years, many interpolation algorithms have been proposed to fill in the missing pixels, in other words, to perform demosaicing. Bilinear and bicubic interpolation [8] were the earliest methods applied to DoFP sensors. The former has low computational complexity but also low interpolation accuracy, while the latter achieves smaller interpolation error in high-contrast areas at a much higher computational cost. The gradient-based method [9] was then proposed to combine the two. In this method, the gradients of the low-resolution images of the four orientations are calculated and used to distinguish areas with edges from those without, and to classify the edges into four directions. Bicubic interpolation is then applied along the edge directions in areas with edges, and bilinear interpolation is applied elsewhere. A similar idea is adopted in the intensity correlation-based method [10] and the smoothness-based method [11]. Although these three methods adapt the interpolation to the geometric information of the image and improve interpolation accuracy, they essentially rely on hand-designed polynomial fitting formulas to estimate the missing pixels, which makes them ill-suited to images with complex structures and limits the improvement in image resolution they can achieve. The sparse representation-based method proposed in [12] is the first learning-based demosaicing method.
An adaptive sub-dictionary is learned from patches of different polarization orientations in order to obtain better coding coefficients. In addition, sparsity and non-local self-similarity priors are used as regularization terms to improve the interpolation results. This method achieves a higher peak signal-to-noise ratio (PSNR) for the reconstructed images than the previous methods.

Recently, with the rise of deep learning, a convolutional neural network (CNN) model for polarization demosaicing called PDCNN was proposed in [13]. The mosaic polarization image is first divided into four channels, interpolated with the bicubic method and then fed into a CNN that combines the U-Net architecture [14] with skip connections. The model has 13 trainable convolutional layers, one pre-defined, untrainable “Stokes block” that calculates the Stokes parameters and one “gradient block” that calculates the gradients of the output images. The mean square errors (MSE) between the output images, Stokes parameters and gradients and their ground truths are jointly used to form a custom loss function that guides training. PDCNN shows state-of-the-art PSNR performance, but it still has some shortcomings. Its deep structure leads to high computational complexity. The bicubic interpolation applied to the input images also introduces redundant information and extra computation. Moreover, since the “Stokes block” is pre-defined by the design formulas and the network outputs are in essence images of the four orientations, the formula calculation step is still needed to obtain the Stokes parameters, the degree of linear polarization (DoLP) and the angle of polarization (AoP), which inevitably introduces cumulative error. In this sense, PDCNN actually learns a deblurring method.

Furthermore, all the methods mentioned above focus only on reducing the interpolation error of the intensity images of the four polarization orientations (i.e., I₀, I₄₅, I₉₀ and I₁₃₅). However, in practical applications of DoFP sensors (e.g., polarization cameras), what researchers really care about are polarization properties such as the total intensity (i.e., S₀), DoLP and AoP. Therefore, rather than following the “interpolation - deblurring - parameters calculation” pipeline, in this paper we propose a truly end-to-end CNN model called Fork-Net, which accepts raw mosaic images as input and directly outputs S₀, DoLP and AoP. The network has a straightforward architecture that lets it learn the mapping between mosaic images and polarization properties directly, avoiding the error accumulation of the stepwise method and enabling a unified optimization scheme. In addition, a customized loss function is used to improve the visual quality of the output AoP images. Since our network has only four layers and few parameters, its computational cost is much lower than that of PDCNN. Finally, the proposed method achieves the highest PSNRs for S₀ and DoLP among several existing methods, and it produces high-quality AoP images as well.

The rest of the paper is organized as follows. The detailed architecture and the implementation of the proposed neural network are described in Section 2. The experimental results and evaluation are presented in Section 3. The final section gives the conclusion and a discussion of our work.

2. The proposed method

Network architecture and training method directly determine the function and performance of a deep learning based method. Thus, this section first introduces the architecture of the proposed Fork-Net and then describes how the network is trained.

2.1. Network architecture

Inspired by SRCNN [15], which has been successfully applied to single image super-resolution, we design a four-layer fully-convolutional neural network with a straightforward architecture (as shown in Fig. 2) to address the quality degradation of output images caused by the grid structure of DoFP polarimeters. The single-channel raw polarization image is fed directly into the network, without being split into four channels or pre-interpolated by other methods. This preserves the original pixel positions, makes it easier for the network to learn the correlation between pixels of different polarization orientations, and avoids introducing redundant information. The low-level features of the input image are extracted by the first two convolutional layers. After that, the network splits into three branches, each learning a non-linear mapping that transforms the low-level features into a different high-level feature space, since the network ultimately produces three outputs corresponding to three important polarization properties: S₀, DoLP and AoP. Note that the proposed network does not output the intensity images of the four polarization orientations, for two main reasons. First, from a practical perspective, the intensity images of different orientations are merely intermediate products rather than the ultimate optimization objectives. Second, the error is likely to accumulate further when the demosaiced intensity images are used to calculate the polarization properties with conventional formulas. The proposed network is therefore designed to output the three polarization properties directly. In summary, we use the proposed fully-convolutional network to replace the “interpolation - deblurring - parameters calculation” procedure and learn the non-linear mapping between mosaic images and polarization properties.
In this way, the model can be optimized with a unified optimization scheme and the error accumulation caused by the stepwise procedure can be avoided.

Fig. 2 The architecture diagram of proposed Fork-Net. The transparent blocks in the middle actually correspond to feature maps from different layers.

In addition, as shown in Fig. 2, the convolutional kernel size of the ith layer is denoted $f_{i} \times f_{i}$. In our final scheme, $f_{1} = 5$, $f_{2} = 3$, $f_{3} = 3$ and $f_{4} = 5$, and the numbers of filters in the four layers are 96, 48, 32 and 1, respectively. All convolution strides are set to 1. Because unpadded convolution would shrink the feature maps and produce outputs smaller than the input, the input images and the feature maps of every layer are zero-padded before convolution to keep the size fixed (i.e., every layer uses “same” convolution). Since the proposed network is relatively shallow and can be trained quickly, batch normalization is not adopted. The rectified linear unit (ReLU) [16] activation is used in every layer except the last. The S₀ and DoLP output layers use no activation function, while arctan is used as the activation function of the AoP output layer since AoP is an angle.
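To make the layer configuration concrete, the following plain-Python sketch counts the trainable parameters it implies. It assumes standard 2-D convolutions with one bias per filter, that layers 1 and 2 are shared, and that each of the three branches consists of a 32-filter layer 3 reading the 48 shared feature maps followed by a single-filter layer 4; the function names are hypothetical. Under these assumptions the default 5-3-3-5 configuration gives 87987 parameters, matching the count reported for Fork-Net in Section 3.2.

```python
def conv_params(kernel, c_in, c_out):
    """Trainable parameters of one 2-D conv layer: weights plus one bias per filter."""
    return kernel * kernel * c_in * c_out + c_out


def fork_net_params(kernels=(5, 3, 3, 5), filters=(96, 48, 32, 1), branches=3):
    """Count Fork-Net parameters under the layout assumed in the lead-in.

    Layers 1-2 are shared by all outputs; layers 3-4 are repeated once per
    branch (S0, DoLP, AoP). The input is the single-channel raw mosaic.
    """
    f1, f2, f3, f4 = kernels
    n1, n2, n3, n4 = filters
    shared = conv_params(f1, 1, n1) + conv_params(f2, n1, n2)
    per_branch = conv_params(f3, n2, n3) + conv_params(f4, n3, n4)
    return shared + branches * per_branch


def same_padding(kernel):
    """Zeros added per side so a stride-1 conv with an odd kernel keeps the size."""
    return (kernel - 1) // 2


if __name__ == "__main__":
    for ks in [(3, 3, 3, 3), (5, 3, 3, 5), (3, 5, 5, 3), (5, 5, 5, 5)]:
        print("-".join(map(str, ks)), fork_net_params(ks))
```

Under the same assumptions, the other kernel combinations compared in Section 3.2 give 84915 (3-3-3-3), 232371 (3-5-5-3) and 235443 (5-5-5-5) parameters. Because every layer uses stride 1 with `same_padding(k)` zeros on each side, each output map has the same size as the input mosaic.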

2.2. Training method

The loss function is vital for training a neural network, since it reflects the model's optimization objective. As mentioned above, the proposed network has three outputs, so our loss function contains customized terms corresponding to each output. The complete customized loss function is:

$$\mathrm{Loss} = \frac{1}{N}\sum_{n=1}^{N}\left[\frac{1}{W \cdot H}\left(\lambda_{1}\left\|\hat{Y}_{S_{0}}^{(n)} - Y_{S_{0}}^{(n)}\right\|_{1} + \lambda_{2}\left\|\hat{Y}_{DoLP}^{(n)} - Y_{DoLP}^{(n)}\right\|_{1} + \lambda_{3}\left\|\hat{Y}_{AoP}^{(n)} - Y_{AoP}^{(n)}\right\|_{1}\right) - \lambda_{4}\log C\right] \tag{1}$$

where $\|\cdot\|_{1}$ represents the $l_{1}$ norm; N is the number of input images; W and H are the width and height of the image; $\hat{Y}_{S_{0}}$, $\hat{Y}_{DoLP}$ and $\hat{Y}_{AoP}$ denote the S₀, DoLP and AoP predicted by the network, and $Y_{S_{0}}$, $Y_{DoLP}$ and $Y_{AoP}$ are their ground truths (labels), respectively; λ₁, λ₂, λ₃ and λ₄ are the four weight coefficients of the loss function, experimentally set to 0.1, 1, 0.05 and 0.02, respectively; and C is the contrast consistency metric used in the structural similarity index (SSIM) [17], defined as:

$$C = \frac{2\hat{\sigma}_{AoP}\,\sigma_{AoP} + (k \cdot MAX_{I})^{2}}{\hat{\sigma}_{AoP}^{2} + \sigma_{AoP}^{2} + (k \cdot MAX_{I})^{2}}$$

where $\hat{\sigma}_{AoP}$ and $\sigma_{AoP}$ represent the standard deviations of the output AoP and the corresponding ground truth, respectively (their squares are the variances); k is a constant commonly set to 0.03; and $MAX_{I}$ denotes the maximum possible pixel value of the images. C lies between 0 and 1 and describes how close the variances of the two images are.

Note that the mean absolute error (MAE), rather than MSE, is adopted in the loss function to measure the difference between the prediction and the ground truth. The reason is that, owing to the characteristics of AoP imaging, serious noise is inevitably present in AoP images, even in the ground truth, so more outliers appear in the AoP training images. Since these noisy AoP images are still used to train the network, the model must be highly robust to noise. However, the MSE loss (i.e., $l_{2}$ loss) is more sensitive to outliers because the error is squared. If the $l_{2}$ norm were used in the loss function, the error caused by outliers would be magnified, which in theory degrades both the stability of training and the overall performance of the model. In addition, the $l_{1}$ loss more readily converges to good minima during training and performs better on image restoration tasks evaluated by PSNR [18]. Thus, the MAE loss (i.e., $l_{1}$ loss) is chosen in our work, since it reflects the original error more directly and is more robust to noise and outliers. (A network trained with the $l_{2}$ loss is evaluated further in the experiment section.)
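The outlier argument can be checked on toy numbers. The plain-Python sketch below uses made-up values rather than data from our experiments, and its function names are hypothetical; it compares how much a single corrupted ground-truth pixel contributes to the $l_{1}$ and $l_{2}$ losses.

```python
def mae(pred, target):
    """Mean absolute error: per-pixel l1 loss."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def mse(pred, target):
    """Mean squared error: per-pixel l2 loss."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def largest_share(terms):
    """Fraction of a summed loss contributed by its largest single term."""
    return max(terms) / sum(terms)


# Toy flattened AoP patch (made-up values): the prediction is off by 0.1
# everywhere, and one ground-truth pixel is corrupted by noise.
target = [0.5] * 10
target[0] = 1.5
pred = [0.6] * 10

l1_terms = [abs(p - t) for p, t in zip(pred, target)]
l2_terms = [(p - t) ** 2 for p, t in zip(pred, target)]
print(round(largest_share(l1_terms), 3), round(largest_share(l2_terms), 3))
```

With nine pixels off by 0.1 and one label off by 0.9, the outlier accounts for half of the $l_{1}$ loss but 90% of the $l_{2}$ loss. Its per-pixel gradient under $l_{2}$ is also 9 times that of any other pixel, whereas under $l_{1}$ every pixel's gradient has the same magnitude.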

Another problem is that, since MAE measures only the average error over all pixels, using the plain $l_{1}$ loss alone tends to homogenize the grayscale values of the predicted AoP images, which decreases their contrast. Thus, $-\log C$ is used as a constraint term to correct the output AoP values during training. If the variance of the predicted AoP image differs greatly from that of the ground truth (in practice, it is lower), C is small and the constraint term produces a large loss; if their variances are close, C approaches 1 and the loss is small. Minimizing the constraint term therefore forces the network to produce AoP images with higher contrast.
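The loss in Eq. (1) can be sketched in plain Python on flattened images. This is an illustrative version, not the training code: it assumes C is computed once over each whole AoP image (SSIM normally uses local windows), and it sets MAX_I to 1.0, which is an assumption since the value used for AoP is not stated here. The function names are hypothetical.

```python
import math


def std(values):
    """Population standard deviation of a flattened image."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def contrast_term(pred_aop, true_aop, k=0.03, max_i=1.0):
    """SSIM contrast-consistency C computed once over whole flattened AoP images."""
    s_hat, s = std(pred_aop), std(true_aop)
    c = (k * max_i) ** 2
    return (2 * s_hat * s + c) / (s_hat ** 2 + s ** 2 + c)


def l1_norm(pred, target):
    """Sum of absolute differences, i.e. the l1 norm of the residual."""
    return sum(abs(p - t) for p, t in zip(pred, target))


def fork_net_loss(batch, lambdas=(0.1, 1.0, 0.05, 0.02), max_i=1.0):
    """Eq. (1) averaged over a batch of (prediction, ground truth) pairs.

    Each element of a pair is a dict holding flattened "S0", "DoLP" and
    "AoP" images of W * H pixels.
    """
    lam1, lam2, lam3, lam4 = lambdas
    total = 0.0
    for pred, truth in batch:
        n_pix = len(truth["S0"])  # W * H
        fidelity = (lam1 * l1_norm(pred["S0"], truth["S0"])
                    + lam2 * l1_norm(pred["DoLP"], truth["DoLP"])
                    + lam3 * l1_norm(pred["AoP"], truth["AoP"])) / n_pix
        c = contrast_term(pred["AoP"], truth["AoP"], max_i=max_i)
        total += fidelity - lam4 * math.log(c)  # -lambda4 * log C
    return total / len(batch)
```

For a prediction whose AoP is flat at 0.25 while the ground truth alternates between 0 and 0.5, C drops to about 0.014 and the constraint term adds roughly 0.085 to the loss, nearly seven times the weighted AoP $l_{1}$ term (0.0125). This is the pressure toward higher contrast described above.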

All the weights and biases in the proposed network are initialized with the MSRA initialization method [19]. The Adam optimizer [20], which has been successfully applied to many deep learning tasks, is used to optimize the network with an initial learning rate of 0.001. To suit the MAE term and improve the convergence of the loss function, the learning rate decays exponentially every epoch with a rate of 0.988. Training lasts for 300 epochs, and the model is saved whenever a lower loss is achieved at the end of an epoch. Finally, for data augmentation, the input images are randomly flipped horizontally and vertically and rotated by 0°, 90°, 180° or 270° during training.
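The learning-rate schedule and the checkpointing rule can be sketched as follows. This assumes the decay is applied as lr = 0.001 × 0.988^epoch with epochs counted from 0, and that "lower loss" means lower than the best loss seen so far; both are our reading of the text, and the function names are hypothetical.

```python
import math


def learning_rate(epoch, initial=1e-3, decay=0.988):
    """Learning rate after `epoch` per-epoch exponential decays."""
    return initial * decay ** epoch


def checkpoint_epochs(epoch_losses):
    """Epochs at which a save-on-improvement rule would write the model."""
    best = math.inf
    saved = []
    for epoch, loss in enumerate(epoch_losses):
        if loss < best:  # strictly lower than the best loss so far
            best = loss
            saved.append(epoch)
    return saved
```

Under this reading the learning rate after 300 decays is about 2.7 × 10⁻⁵, roughly 37 times smaller than the initial value.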

3. Experiment and evaluation

In this section, the experimental setup and data acquisition are introduced first. The performance of Fork-Net is then evaluated: the effectiveness of the proposed end-to-end network design is validated, the effect of different kernel sizes is examined, and finally the quantitative results of Fork-Net are compared with those of several other methods.

Fig. 3 Schematic diagram of the polarization image acquisition experimental device.

3.1. Data set

To train and evaluate the proposed neural network, we built a data set containing 120 groups of polarization images of 120 different objects. Each group consists of four high-resolution images of 1280 × 960 pixels taken at four different polarization orientations. As shown in Fig. 3, all images are taken by an 8-bit grayscale CCD camera fixed on an optical experiment platform, with a linear polarization filter in front of its lens. By rotating the polarization filter to four specific angles, i.e., 0°, 45°, 90° and 135°, images of the corresponding polarization orientations are collected. 100 groups are used as the training set, and the remaining 20 groups are split equally into validation and test sets. To obtain the input mosaic images, the pixels of the images of different polarization orientations are down-sampled and recombined into mosaic images according to the arrangement pattern shown in Fig. 1. The ground truths of the three outputs, i.e., S₀, DoLP and AoP, are calculated from the high-resolution images with the following formulas:

$$S_{0} = \frac{1}{2}\left(I_{0} + I_{45} + I_{90} + I_{135}\right)$$

$$\mathrm{DoLP} = \frac{\sqrt{(I_{0} - I_{90})^{2} + (I_{45} - I_{135})^{2}}}{S_{0}}$$

$$\mathrm{AoP} = \frac{1}{2}\arctan\left(\frac{I_{45} - I_{135}}{I_{0} - I_{90}}\right)$$

where $I_{x}$ represents the intensity image of orientation $x^{\circ}$, $x \in \{0, 45, 90, 135\}$. Since small images require less memory during computation and local information is sufficient for the reconstruction task, in the training and validation steps the mosaic images and the ground-truth arrays of S₀, DoLP and AoP are split into small patches of 40 × 40 pixels. In total, 76890 patches are used for training.
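For reference, the label computation can be sketched per pixel in plain Python. The formulas are exactly those above; the guards for zero denominators are our own additions, not part of the original procedure, and the function names are hypothetical. Note that the formula takes the arctan of a ratio, which restricts AoP to (-π/4, π/4); many implementations use the two-argument `math.atan2` instead to cover (-π/2, π/2], but this sketch follows the formula as written.

```python
import math


def stokes_targets(i0, i45, i90, i135):
    """Per-pixel S0, DoLP and AoP labels from the four orientation intensities."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)
    s1 = i0 - i90
    s2 = i45 - i135
    # Guard against division by zero in dark pixels (an added assumption).
    dolp = math.sqrt(s1 ** 2 + s2 ** 2) / s0 if s0 > 0 else 0.0
    if s1 != 0:
        aop = 0.5 * math.atan(s2 / s1)
    elif s2 != 0:
        aop = math.copysign(math.pi / 4, s2)  # equals 0.5 * atan2(s2, 0)
    else:
        aop = 0.0  # unpolarized pixel: AoP undefined, 0 is an arbitrary choice
    return s0, dolp, aop


def label_images(img0, img45, img90, img135):
    """Apply stokes_targets pixel by pixel to four equal-size 2-D lists."""
    s0_img, dolp_img, aop_img = [], [], []
    for rows in zip(img0, img45, img90, img135):
        triples = [stokes_targets(*px) for px in zip(*rows)]
        s0_img.append([t[0] for t in triples])
        dolp_img.append([t[1] for t in triples])
        aop_img.append([t[2] for t in triples])
    return s0_img, dolp_img, aop_img
```

As a sanity check, fully polarized light at 22.5° with total intensity 2 gives I₀ = I₄₅ = 1 + cos(π/4) and I₉₀ = I₁₃₅ = 1 − cos(π/4), for which the sketch returns S₀ = 2, DoLP = 1 and AoP = π/8.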

3.2. Evaluation

The main feature of the proposed method is its completely end-to-end design, i.e., it takes the single-channel raw mosaic image as input and directly produces the three polarization properties. To demonstrate the effectiveness of these two design choices, two networks (Net-A and Net-B) are built. Net-A accepts four-channel polarization images interpolated by the bicubic method as input and outputs deblurred images of the four polarization orientations. Net-B is a modified version of Net-A in which the only change is that the last two layers are replaced by the three-branch architecture of Fork-Net. Apart from that, the other configurations of Net-A and Net-B, e.g., the hyper-parameters and training methods, are the same as those of Fork-Net described in Section 2. In addition, to explain why the $l_{1}$ norm rather than the $l_{2}$ norm is used in our customized loss function, we replace the $l_{1}$ norm with the $l_{2}$ norm in the first three terms of Eq. (1) and train an extra network called Net-C. Note that because the reconstruction quality of AoP images is usually much lower than that of S₀ and DoLP, squaring the errors in the $l_{2}$ norm would increase the loss from the AoP term if the coefficients remained unchanged. This would drive the network to focus more on optimizing AoP and degrade the performance of the other two outputs. Thus, to make a fair comparison, the coefficients of the first three terms are retuned ($\lambda_{1} = 0.1$, $\lambda_{2} = 1$, $\lambda_{3} = 0.035$) so that the AoP performance of Net-C is close to that of Fork-Net. To evaluate and compare the performance of the different networks, the peak signal-to-noise ratio (PSNR), defined below, is adopted as the quantitative criterion:

$$\mathrm{PSNR} = 10 \cdot \log_{10}\left(\frac{MAX_{I}^{2}}{MSE}\right)$$

where MSE denotes the mean square error between the predicted image and its ground truth.
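A minimal PSNR helper in plain Python is sketched below, assuming flattened images and MAX_I = 255 by default (the 8-bit range of the camera described in Section 3.1). For S₀, DoLP or AoP arrays stored in other units, MAX_I must be set to match; the value used for those arrays is not stated here, so the default is an assumption. The function name is hypothetical.

```python
import math


def psnr(pred, target, max_i=255.0):
    """PSNR in dB between two flattened images of equal length."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_i ** 2 / mse)
```

Since PSNR is logarithmic in MSE, a gain of 0.9 dB on a given image corresponds to an MSE about 19% lower (10^(-0.09) ≈ 0.81) for the same MAX_I.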

Fig. 4 PSNR vs. training epoch curves of S₀, DoLP and AoP for networks with different architectures. (a) Curves of S₀. (b) Curves of DoLP. (c) Curves of AoP.

Table 1. PSNRs of Networks with Different Architectures and Loss Functions on Test Set

The curves in Fig. 4 show how the PSNRs of S₀, DoLP and AoP on the validation set vary with training epochs for the different networks. In addition, the best models are saved during training, and their final PSNRs on the test set are listed in Table 1. Both the curves and the table show that Net-A, which uses four-channel inputs and four-channel outputs, has the worst overall performance on the three properties. With the three-branch structure and direct output of the properties, Net-B achieves higher PSNRs than Net-A. The PSNRs improve further when the single-channel input is adopted. These results indicate that replacing the formula calculation with convolutional layers significantly reduces the reconstruction error, and that pre-interpolating the mosaic images has a negative impact on network training. Finally, Net-C performs well, but its training process is less stable and its final PSNRs on both S₀ and DoLP are lower than those of Fork-Net. Although the performance with the $l_{2}$ loss is not significantly lower than with the $l_{1}$ loss, we found during parameter tuning that models trained with the $l_{1}$ loss more readily produced better results, which suggests that a network with the $l_{2}$ loss is harder to train. In summary, both the proposed end-to-end architecture and the MAE loss benefit model performance.

Fig. 5 PSNR vs. training epoch curves of S₀, DoLP and AoP for networks with different kernel sizes. (a) Curves of S₀. (b) Curves of DoLP. (c) Curves of AoP.

Table 2. PSNRs on Test Set and Parameter Numbers of Fork-Nets with Different Kernel Sizes

We also examine the effect of different convolution kernel sizes on the network. Kernel sizes of 3 × 3 and 5 × 5 are the most commonly used in recent CNN models, so multiple networks are trained with combinations of these two sizes and their performance is evaluated. Apart from the kernel sizes, the other configurations are the same as described in Section 2. Figure 5 shows the PSNR vs. training epoch curves on the validation set for Fork-Nets with different kernel sizes. The number strings in the lower-left corner of the charts denote the kernel size combinations; for example, “5-3-3-5” means $f_{1} = 5$, $f_{2} = 3$, $f_{3} = 3$ and $f_{4} = 5$. The AoP chart fluctuates more, and the effect of kernel size on the PSNR of AoP is not clear, so the following discussion focuses mainly on S₀ and DoLP. In both the S₀ and DoLP charts, the “3-3-3-3” curves show the lowest PSNRs and the “5-5-5-5” curves the highest. The “3-5-5-3” and “5-3-3-5” curves lie in between, and their PSNRs for S₀ are almost identical. The final test-set PSNRs of these four networks and their numbers of trainable parameters are listed in Table 2. Note that more parameters mean higher computational complexity and longer runtime. Combining the curves and the table, the performance of the networks is correlated with the number of parameters to some extent, but the correlation is not linear. Changing the kernel sizes from “3-3-3-3” to “5-3-3-5” effectively improves the PSNRs while increasing the number of parameters only slightly. Changing them from “5-3-3-5” to “5-5-5-5” raises the number of parameters to nearly three times its original value.
However, the resulting PSNR improvement is relatively slight. On the other hand, we find that making the middle kernels (i.e., f₂ and f₃) larger than the first and last kernels (i.e., f₁ and f₄) is not a good choice: compared with “5-3-3-5”, “3-5-5-3” brings hardly any improvement even though it has many more parameters. We hypothesize that, compared with the feature transformation in the middle of the network, the feature extraction from the input images and the final reconstruction stage need larger kernels to gather information from larger neighborhoods and produce better results. Finally, trading off performance against computational complexity, we select “5-3-3-5” as the best kernel configuration. The other hyper-parameters are tuned in the same way, weighing performance against computational complexity to reach a compromise.

Furthermore, we validate the benefit of adding the variance constraint term to the loss function. Figure 6 shows AoP images of a circuit board as both grayscale images and heatmaps. The ground-truth grayscale image is very noisy, but the structure can still be clearly observed in the heatmap. For the model trained with the pure MAE loss, the output AoP image is smoother in grayscale, leading to low contrast in the heatmap. For the model trained with the variance constraint term, the contrast of the grayscale image is clearly improved and the heatmap quality is much higher. In addition, since the proposed Fork-Net is an end-to-end learning-based method, the noise in the original AoP images is partly suppressed during training, so the output AoP images contain less noise. However, this also slightly damages the surface structure information in some regions, as shown in the pink box.

Fig. 6 AoP images illustrating the improvement brought by the variance constraint term. The first row shows the grayscale images and the second row shows the AoP heatmaps. (a) AoP images output by Fork-Net trained with the common MAE loss function. (b) AoP images output by Fork-Net trained with the proposed customized loss function. (c) Original AoP images.

In addition, to help understand what the network has learned, the feature maps of each layer along the DoLP branch are extracted and shown in Fig. 7. Different layers extract features at different levels. For example, the first layer extracts more macroscopic features, such as the different directions of the wires and the contour of the chip, while the last layer extracts more microscopic details, such as subtle unevenness on the wires and dust on the chip, which are crucial for a high PSNR. Note that the features from the first two layers are shared by the three outputs (i.e., S₀, DoLP and AoP).

Fig. 7 Feature maps output by the layers along the DoLP branch. (Due to limited space, only four maps are shown as examples for each of the three middle layers.)

Next, the proposed Fork-Net is compared with several existing methods, including the bicubic method, the intensity correlation-based method [10] and PDCNN [13]. PDCNN and the proposed model are trained on the same training set (described in Section 3.1) for the same maximum number of epochs (i.e., 300). The other configurations of PDCNN follow the original paper [13]. The average PSNRs obtained by the different methods on the test set are listed in Table 3, which shows that the proposed network achieves the best performance. On average, the PSNRs of the three properties obtained by our method are 0.9 dB higher than those of PDCNN, which gives the second-best result. Moreover, to evaluate the visual quality of the images produced by the different methods, grayscale images of S₀ and DoLP and false-color heatmaps of AoP are shown in Fig. 8. The bicubic method produces the blurriest S₀ and DoLP images. The intensity correlation-based method reconstructs S₀ better, but it loses some edges in the DoLP image and severely damages the AoP image. As a deep learning based method, PDCNN performs much better than the first two methods, since it adopts a deep network structure and considers the correlation between input channels. In addition, the gradient terms in its loss function do produce sharper image edges. However, the details of its reconstructed AoP image are still unsatisfactory. The proposed Fork-Net outputs S₀ and DoLP images of impressive visual quality that are competitive with those of PDCNN. Most notably, our network yields AoP heatmaps with less noise while preserving object details; for example, the embossed numbers on the circuit board remain clearly visible.

On the other hand, even though our method achieves the highest AoP PSNR, the value is still low because of the serious noise in the original AoP images. In future work, we will therefore try to reduce the noise during data acquisition and train the network with cleaner AoP ground truths. We hope this will help the network output AoP images with less noise and more accurate details.

Table 3. The PSNRs for Different Methods on Test Set

Fig. 8 S₀ and DoLP grayscale images and AoP heatmaps obtained by different methods. Images in the pink boxes show magnified local regions.

Finally, the numbers of trainable parameters and the average runtimes of PDCNN and Fork-Net are compared. Here, runtime refers to the time from inputting a single mosaic image to outputting S₀, DoLP and AoP. Both networks run on the same server with an Intel Xeon E5-2640 CPU and an NVIDIA GTX1080TI graphics card. PDCNN has 568612 parameters and takes 466.21 ms per image on average, while Fork-Net has only 87987 parameters (about 15% of that of PDCNN) and an average runtime of 154.85 ms (about one-third of that of PDCNN). In other words, while achieving prominent performance, the proposed Fork-Net uses far fewer parameters and runs faster.

4. Conclusion

In this paper, we propose a four-layer fully-convolutional neural network for DoFP sensors to improve the quality of polarization imaging. It accepts raw mosaic images as input and outputs reconstructed S₀, DoLP and AoP images, directly learning the non-linear mapping between mosaic images and polarization properties. A customized loss function with a variance constraint term is also used to guide network training. Compared with existing methods, the network achieves prominent performance in both quantitative indicators and visual quality. Furthermore, its straightforward architecture and relatively few parameters give Fork-Net greater potential for hardware integration in future work.

Funding

National Natural Science Foundation of China (61601301); Fundamental Research Foundation of Shenzhen (JCYJ20170302151123005).

References

1. M. Sarkar, D. S. S. San Segundo Bello, C. van Hoof, and A. Theuwissen, “Integrated polarization analyzing CMOS image sensor for material classification,” IEEE Sensors J. 11, 1692–1703 (2011).

2. D. H. Goldstein, “Polarimetric characterization of federal standard paints,” in Polarization Analysis, Measurement, and Remote Sensing III, vol. 4133 (International Society for Optics and Photonics, 2000), pp. 112–124.

3. Y. Aron and Y. Gronau, “Polarization in the LWIR: a method to improve target aquisition,” in Infrared Technology and Applications XXXI, vol. 5783 (International Society for Optics and Photonics, 2005), pp. 653–662.

4. B. Huang, T. Liu, H. Hu, J. Han, and M. Yu, “Underwater image recovery considering polarization effects of objects,” Opt. Express 24, 9826–9838 (2016).

5. E. Salomatina-Motts, V. Neel, and A. Yaroslavskaya, “Multimodal polarization system for imaging skin cancer,” Opt. Spectrosc. 107, 884–890 (2009).

6. J. S. Tyo, D. L. Goldstein, D. B. Chenault, and J. A. Shaw, “Review of passive imaging polarimetry for remote sensing applications,” Appl. Opt. 45, 5453–5469 (2006).

7. R. Perkins and V. Gruev, “Signal-to-noise analysis of Stokes parameters in division of focal plane polarimeters,” Opt. Express 18, 25815–25824 (2010).

8. S. Gao and V. Gruev, “Bilinear and bicubic interpolation methods for division of focal plane polarimeters,” Opt. Express 19, 26161–26173 (2011).

9. S. Gao and V. Gruev, “Gradient based interpolation for division of focal plane polarization imaging sensors,” in 2012 IEEE International Symposium on Circuits and Systems (ISCAS), (IEEE, 2012), pp. 1855–1858.

10. J. Zhang, H. Luo, B. Hui, and Z. Chang, “Image interpolation for division of focal plane polarimeters with intensity correlation,” Opt. Express 24, 20799–20807 (2016).

11. J. Zhang, W. Ye, A. Ahmed, Z. Qiu, Y. Cao, and X. Zhao, “A novel smoothness-based interpolation algorithm for division of focal plane polarimeters,” in 2017 IEEE International Symposium on Circuits and Systems (ISCAS), (IEEE, 2017), pp. 1–4.

12. J. Zhang, H. Luo, R. Liang, A. Ahmed, X. Zhang, B. Hui, and Z. Chang, “Sparse representation-based demosaicing method for microgrid polarimeter imagery,” Opt. Lett. 43, 3265–3268 (2018).

13. J. Zhang, J. Shao, H. Luo, X. Zhang, B. Hui, Z. Chang, and R. Liang, “Learning a convolutional demosaicing network for microgrid polarimeter imagery,” Opt. Lett. 43, 4534–4537 (2018).

14. O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention (MICCAI), (Springer International Publishing, Cham, 2015), pp. 234–241.

15. C. Dong, C. C. Loy, K. He, and X. Tang, “Image super-resolution using deep convolutional networks,” IEEE Trans. Pattern Anal. Mach. Intell. 38, 295–307 (2016).

16. V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann machines,” in Proceedings of the 27th International Conference on Machine Learning (ICML-10), (2010), pp. 807–814.

17. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. Image Process. 13, 600–612 (2004).

18. H. Zhao, O. Gallo, I. Frosio, and J. Kautz, “Loss functions for neural networks for image processing,” arXiv preprint arXiv:1511.08861 (2015).

19. K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification,” in The IEEE International Conference on Computer Vision (ICCV), (2015), pp. 1026–1034.

20. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980 (2014).

	Net-A	Net-B	Net-C	Fork-Net
S₀	42.6149	42.6045	43.5816	43.7225
DoLP	33.9165	34.7217	34.8174	35.0061
AoP	9.7793	10.7844	11.0300	11.0450

Kernel Size	3-3-3-3	5-3-3-5	3-5-5-3	5-5-5-5
S₀	43.6440	43.7225	43.7235	43.7865
DoLP	34.8734	35.0061	34.9930	35.0385
AoP	10.9421	11.0450	10.9521	10.9611
Parameters	84915	87987	232371	235443
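The parameter counts in the kernel-size table can be sanity-checked against the usual cost of a convolution layer, which has k²·C_in·C_out weights for a k-by-k kernel plus C_out biases. Enlarging a kernel from 3-by-3 to 5-by-5 adds 16·C_in·C_out weights and leaves the biases unchanged, so each difference from the 3-3-3-3 baseline should be a multiple of 16, and the outer-layer and inner-layer increments should add up in the 5-5-5-5 case. The sketch below performs only that check on the reported numbers. It assumes, though the table does not state it, that channel widths are identical across the four configurations, and it does not reconstruct Fork-Net's actual channel widths. The 5-3-3-5 column matches the Fork-Net figures (87987 parameters) reported above; `extra_connections` is a hypothetical helper.

```python
# Parameter counts from the kernel-size table, keyed by per-layer kernel size.
PARAMS = {
    (3, 3, 3, 3): 84915,
    (5, 3, 3, 5): 87987,
    (3, 5, 5, 3): 232371,
    (5, 5, 5, 5): 235443,
}

# Going from a 3x3 to a 5x5 kernel adds 25 - 9 = 16 weights per (C_in, C_out) pair.
EXTRA_PER_CONNECTION = 5**2 - 3**2


def extra_connections(config, baseline=(3, 3, 3, 3)):
    """Return sum(C_in * C_out) over the layers whose kernel grew from 3 to 5.

    Assumes channel widths are the same in `config` and `baseline`.
    """
    delta = PARAMS[config] - PARAMS[baseline]
    if delta % EXTRA_PER_CONNECTION:
        raise ValueError(f"{config}: difference {delta} is not a multiple of 16")
    return delta // EXTRA_PER_CONNECTION


outer = extra_connections((5, 3, 3, 5))  # first and last layers enlarged
inner = extra_connections((3, 5, 5, 3))  # two middle layers enlarged
both = extra_connections((5, 5, 5, 5))   # all four layers enlarged
print(outer, inner, both, outer + inner == both)  # 192 9216 9408 True
```

Under that assumption, the two middle layers carry 48 times as many input-output channel pairs as the outer layers (9216 vs. 192). This is why enlarging only the outer kernels, as in the 5-3-3-5 configuration, costs 3072 extra parameters, while enlarging the middle kernels costs 147456.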

	Bicubic	Correlation-based	PDCNN	Fork-Net
S₀	38.0604	42.1540	42.9584	43.7225
DoLP	31.7751	29.8021	34.5301	35.0061
AoP	9.3744	7.6640	9.8273	11.0450

Optics Express