
SPNet: a size-variant progressive network for aero-optical thermal radiation effects correction


Abstract

When an aircraft flies at high speed, the airflow meets the optical cover and is compressed, resulting in aero-optical thermal radiation effects that degrade image quality. In this paper, based on the inherent characteristic that the degradation level of the thermal radiation bias field remains consistent regardless of image size, a size-variant progressive aero-optical thermal radiation effects correction network (SPNet) is proposed. First, SPNet uses two sub-networks to progressively correct the degraded image: the first and second sub-networks are responsible for learning coarse and accurate thermal radiation bias fields, respectively. Second, we introduce the multi-scale feature upsampling module (MFUM) to leverage the multi-scale information of the features and promote inter-channel information interaction. Third, we propose an adaptive feature fusion module (AFFM) to dynamically fuse features from different scales by assigning them different weights. Finally, a multi-head self-attention feature extraction module (MSFEM) is proposed to extract feature maps with global information. Experiments on both simulated and real degraded images demonstrate that our method outperforms state-of-the-art thermal radiation effects correction methods.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

When an aircraft flies at high speed in the atmosphere, the airflow is compressed upon reaching the optical cover of the detector, and the kinetic energy of the incoming airflow is converted into thermal energy. As a result, the temperature of the airflow surrounding the cover increases, causing the cover's surface to heat up. This convective heat transfer is called aerodynamic heating. Thermal radiation resulting from aerodynamic heating is a source of interference for infrared imaging detection systems. It not only affects the strength of the cover material but also hampers the effectiveness of the imaging detection system, and it can lead to sensor saturation in specific wavelength bands. Temperature variations and thermal radiation have a substantial impact on the performance of the infrared imaging and detection system, including image quality and target recognition [1,2].

Thermal radiation effects are linearly correlated with the temperature of the infrared detection window. Early research on correcting thermal radiation effects primarily focused on the selection of window materials, which can be divided into three categories: 1) single crystals, such as Si, Ge, sapphire, and quartz; 2) polycrystalline materials, such as ZnS, ZnSe, and MgO; 3) glasses, such as calcium aluminate, germanate glass, fluoride glass, and chalcogenide glasses [3]. While these materials can effectively reduce thermal radiation effects owing to their superior heat and radiation resistance, they cannot avoid thermal radiation at extremely high temperatures and cannot fully eliminate the thermal radiation effects.

Window cooling technologies have been proposed to reduce the thermal radiation effects of windows by designing a cooling system [4,5]. The cooling system controls the window temperature through film cooling, using supersonic gas or liquid-droplet coolants ejected over the surface of each window. In addition to cooling technologies, optical filtering and optimized design of the optical system are also effective ways to reduce thermal radiation effects [6]. Although these methods can effectively reduce thermal radiation effects, a better and more cost-effective approach is to directly correct the degraded image.

In the case where the optical cover is hemispherical and the imaging optical window directly faces the incoming flow, axial symmetry implies that the thickness and material of the cover are the same in every direction. Thus, the thermal radiation around the surface is evenly distributed along the circumferential angle, as shown in Fig. 1. In this paper, we mathematically formulate the degradation as:

$$Y = X + B + N$$
where Y is the degraded image, X is the clear image, B is the thermal radiation bias field, and N is system noise. We aim to obtain a clear image X from the degraded image Y.
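As a minimal illustration of this degradation model, the following sketch synthesizes a degraded image from a clear image, a smooth bias field, and Gaussian noise. The particular bias-field shape and noise level are hypothetical choices for demonstration, not the simulation protocol used in this paper.

```python
import numpy as np

def simulate_degradation(clear, bias_scale=0.4, noise_sigma=0.01, seed=0):
    """Synthesize Y = X + B + N for a clear image X in [0, 1].

    The bias field B is modeled here as a smooth radial ramp (a hypothetical
    stand-in for an aero-optical thermal radiation bias field), and N is
    zero-mean Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    h, w = clear.shape
    yy, xx = np.mgrid[0:h, 0:w]
    # Smooth, globally varying bias field peaking at one corner.
    r = np.sqrt((yy / h) ** 2 + (xx / w) ** 2)
    bias = bias_scale * (1.0 - r / r.max())
    noise = rng.normal(0.0, noise_sigma, size=clear.shape)
    degraded = np.clip(clear + bias + noise, 0.0, 1.0)
    return degraded, bias

# Example usage with a random "clear" image.
X = np.random.default_rng(1).uniform(0.0, 0.6, size=(256, 256))
Y, B = simulate_degradation(X)
```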


Fig. 1. The distribution of the temperature in the hemispherical window and the temperature field of the air, along with an illustration of the degradation model. (a) Temperature distribution of the window when the hemispherical window is heated for 4s. (b) Temperature field of the air when the aircraft flies for 4 seconds. (c) Temperature field of the air when the aircraft flies for 16 seconds. (d) Clear image. (e) Thermal radiation bias field. (f) Degraded image.


Image-processing-based correction of aero-optical thermal radiation effects can effectively reduce economic costs and better correct the thermal radiation effects. Optimization-based methods are widely used for thermal radiation effects correction [7,8,9]. They apply various constraints to the degradation model and utilize different natural image priors to regularize the solution. Cao and Tisse [7] locally fitted the derivatives of the correction model to the gradient components and used a modified bilateral filter for refinement. Hong [11] established a progressive thermal radiation effects correction model to estimate the thermal radiation bias field. Li and Xu [12] used a structure-prior-weighted L2-norm regularizer to constrain the bias field and an Lp-regularized term to accurately restore the latent image. These optimization-based methods are relatively time-consuming when dealing with large images, whereas aero-optical thermal radiation effects correction algorithms must meet speed and real-time requirements. Although Shi [13] employed a multi-scale strategy and a vector representation to effectively reduce the running time of the model, it still took several seconds to process an image, which does not meet the real-time requirements of practical applications.

Recently, deep learning [14,15] has been successfully applied to various computer vision tasks. In image restoration, deep learning has been applied to image defogging, image deraining, image deblurring, and other tasks [16]. More complex network architectures have been proposed, such as region-based convolutional neural networks [17], fully convolutional neural networks [18], and generative adversarial networks [19]. Deep learning has achieved significant performance gains in related image restoration tasks. However, few scholars have investigated the use of deep learning for thermal radiation effects correction; only Chang [20] proposed a deep convolutional neural network for aero-optical thermal radiation effects correction. This prompts us to propose a fast thermal radiation effects correction network based on deep learning.

When a degraded image of the original size is used directly as input to the network, the network can recover an image with more accurate details [21]. However, as the size of the image increases, the computational complexity of the network also grows. To address this problem, we thoroughly analyze the characteristics of the thermal radiation bias field and propose a size-variant progressive network. The degree of thermal radiation degradation is reflected in the intensity distribution of the thermal radiation bias field, and this intensity is related only to the pixel values and the percentage of the bias field $\rho $. Resizing an image does not change the distribution of pixel values of the original image. When we scale the degraded image up or down by a factor of t, both the area of the degraded image and the area of the thermal radiation bias field are multiplied by ${t^2}$. At this time:

$${\rho _1} = \frac{{{t^2} \times {S_B}}}{{{t^2} \times {S_Y}}} = \frac{{{S_B}}}{{{S_Y}}} = \rho $$
where ${S_B}$ and ${S_Y}$ are the areas of the thermal radiation bias field and the degraded image, respectively, and ${\rho _1}$ is the percentage of the bias field after scaling. That is, resizing the image does not change the percentage of the bias field within the image. In summary, resizing the image does not change the intensity of the thermal radiation bias field, as shown in Fig. 2. In other words, the degradation level of the thermal radiation bias field remains consistent regardless of image size.
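The following minimal check illustrates this property numerically: nearest-neighbor resizing a degraded image and its bias-field mask by the same factor leaves the area ratio $\rho$ unchanged (the mask and scale factor here are hypothetical).

```python
import numpy as np

def bias_field_percentage(bias_mask):
    """Fraction of image area covered by the bias field (rho = S_B / S_Y)."""
    return bias_mask.sum() / bias_mask.size

# Hypothetical binary mask marking where the bias field is significant.
mask = np.zeros((128, 128), dtype=float)
mask[:64, :96] = 1.0

t = 2  # scale factor
mask_scaled = np.kron(mask, np.ones((t, t)))  # nearest-neighbor upscaling

rho = bias_field_percentage(mask)
rho_scaled = bias_field_percentage(mask_scaled)
assert np.isclose(rho, rho_scaled)  # rho_1 = (t^2 * S_B) / (t^2 * S_Y) = rho
```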


Fig. 2. Degree of thermal radiation degradation versus image size. (a) Thermal radiation degraded image and its corresponding bias field. (b) Scale down images of (a). (c) Three-dimensional visualization of (a). (d) Three-dimensional visualization of (b).


In this work, based on the characteristic that the degradation level of the thermal radiation bias field remains consistent regardless of image size, we propose a size-variant progressive network (SPNet). The main contributions of this work are summarized below:

  • We propose a size-variant progressive network (SPNet), which obtains a corrected image by progressively estimating more accurate thermal radiation bias fields through two sub-networks. SPNet effectively reduces computational complexity and running time.
  • To minimize the loss of information when the feature map is upsampled between the two sub-networks, we propose an upsampling module called the multi-scale feature upsampling module (MFUM), which extracts features from larger receptive fields while preserving the multi-scale characteristics of the information.
  • An adaptive feature fusion module (AFFM) is proposed to dynamically fuse the useful features of the two sub-networks with different sizes. The module fuses features from different receptive fields while retaining their unique complementary features.
  • Considering the global smoothness of the thermal radiation bias field, we propose the multi-head self-attention feature extraction module (MSFEM) to extract global features. Multi-head self-attention is utilized to extract global features, while a gated feed-forward module controls the information flow and allows deeper propagation of useful information.

2. Related work

2.1 Attention mechanism

In the past few years, attention mechanisms [22] have been used in many computer vision tasks such as target detection [23] and image segmentation [24]. Attention mechanisms can be divided into global attention and local attention. Local attention focuses on a specific local area of the input image. SENet [25] presents a Squeeze-and-Excitation (SE) attention to enhance the representation ability of the model at the feature-channel level. CBAM [26] sequentially applies channel and spatial attention modules to efficiently guide the information flow within the network by learning which information to emphasize or suppress. In recent years, Transformers [27,28] have also been widely adopted in numerous computer vision tasks, such as image recognition [29], image super-resolution [30], and image restoration [31]. Although Transformer-based models are proficient at extracting global image representations, relying solely on image-level self-attention can lead to the loss of local fine-grained details. Therefore, effectively combining global information and local features is essential for high-quality correction of thermal radiation effects.

2.2 Feature upsampling

Upsampling refers to increasing the resolution of an image or feature. Existing upsampling strategies can be divided into two categories: 1) interpolation-based and 2) deep-learning-based. Interpolation-based strategies include nearest-neighbor interpolation, bilinear interpolation, and bicubic interpolation. Bilinear interpolation uses the intensity values of the four surrounding neighboring points to obtain the intensity value of the sampling point. Bicubic interpolation considers not only the intensity values of the neighboring points but also the rate of change between them, and is an improved version of bilinear interpolation. These local interpolation strategies may not accurately capture subtle changes, which leads to the loss of high-frequency information such as edges and textures [32,33].

Pixel-shuffle [34] increases the resolution of an image by rearranging pixels. Specifically, it reorganizes the pixels in a low-resolution image block into a higher-resolution image block to achieve upsampling on the image. By using pixel rearrangement, pixel-shuffle can increase the resolution of an image without introducing additional parameters. Recently, the upsampling of images using pixel-shuffle has been widely used in deep learning.
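As a brief illustration of the pixel-shuffle operation described above, the following sketch (using PyTorch's built-in PixelShuffle, with hypothetical tensor sizes) rearranges an r²·C-channel feature map into a C-channel map with r times the spatial resolution.

```python
import torch
import torch.nn as nn

# Pixel-shuffle with upscale factor r = 2: channels shrink by r^2,
# height and width grow by r, and no learnable parameters are introduced.
pixel_shuffle = nn.PixelShuffle(upscale_factor=2)

low_res = torch.randn(1, 16, 8, 8)      # (N, r^2 * C, H, W) with C = 4
high_res = pixel_shuffle(low_res)       # (N, C, r * H, r * W)
print(high_res.shape)                   # torch.Size([1, 4, 16, 16])
```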

3. Network

3.1 Overall network structure

In this section, we introduce the structure of SPNet, which is illustrated in Fig. 3. SPNet consists of two sub-networks: sub-network S1 and sub-network S2. Sub-network S1 is responsible for learning the coarse thermal radiation bias field and is designed for processing the small-size image. Sub-network S2, on the other hand, focuses on learning the accurate thermal radiation bias field and is designed for processing the original-size image. Sub-networks S1 and S2 are divided into stages denoted S1_i and S2_i (i = 1,2,3), respectively. Each stage is a UNet architecture with three encoder and decoder levels, where each encoder or decoder level consists of two standard resblocks.


Fig. 3. Architecture of the proposed SPNet for thermal radiation effects correction. SPNet consists of two sub-networks: sub-network S1 and sub-network S2, where S1 consists of S1_1,S1_2,S1_3, and S2 consists of S2_1,S2_2,S2_3.


Our SPNet takes advantage of the characteristic that the degradation level of the thermal radiation bias field remains consistent regardless of image size. We employ a residual learning strategy in which we predict a progressively more accurate thermal radiation bias field B and then subtract this bias field from the degraded image Y to obtain the output clear image X = Y - B. It is worth mentioning that we obtain an output at each stage of each sub-network. Taking the S1 sub-network as an example, given a degraded image Y of size H × W, we downsample it to obtain the input Y↓ of size H/2 × W/2. Each stage of the S1 sub-network produces an output that is subtracted from Y↓ to obtain the latent clear image of that stage. The same process is followed for the S2 sub-network. It can be observed from the bottom of Fig. 3 that as the stages progress deeper, the estimated thermal radiation bias field becomes closer to the ground-truth thermal radiation bias field, and the correction result becomes closer to the ground-truth clear image. We construct a smaller version of SPNet called SPNet-tiny for subsequent experimental comparisons, which contains only three stages: S1_1, S1_2, and S2_1. Additionally, we construct a single-size version of SPNet-tiny called SPNet-tiny-single to verify the effectiveness of the size-variant strategy; it has only one sub-network, S2, with three stages S2_1, S2_2, S2_3. All other conditions and configurations remain the same as in SPNet-tiny.
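The following PyTorch-style sketch illustrates the size-variant progressive residual strategy described above. The per-stage UNet is abstracted as a generic stage module, and the channel counts, the bilinear upsampling between sub-networks, and the feature hand-off are simplified assumptions; this is an illustration of the control flow, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StageUNet(nn.Module):
    """Placeholder for one stage (a small UNet in the paper); here a plain
    conv stack that predicts a residual thermal radiation bias field."""
    def __init__(self, channels=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 3, 3, padding=1),
        )

    def forward(self, x):
        return self.body(x)  # estimated bias field B at this stage

class SPNetSketch(nn.Module):
    def __init__(self, stages_s1=3, stages_s2=3):
        super().__init__()
        self.s1 = nn.ModuleList(StageUNet() for _ in range(stages_s1))
        self.s2 = nn.ModuleList(StageUNet() for _ in range(stages_s2))

    def forward(self, y):
        outputs = []
        # Sub-network S1 works on the half-size image.
        y_small = F.interpolate(y, scale_factor=0.5, mode="bilinear",
                                align_corners=False)
        x = y_small
        for stage in self.s1:
            bias = stage(x)
            x = y_small - bias          # latent clear image at this stage
            outputs.append(x)
        # Hand the S1 result to S2 at the original size (upsampled here with
        # bilinear interpolation; the paper uses the MFUM module instead).
        x = F.interpolate(x, scale_factor=2.0, mode="bilinear",
                          align_corners=False)
        for stage in self.s2:
            bias = stage(x)
            x = y - bias                # progressively refined clear image
            outputs.append(x)
        return outputs                  # one output per stage, as in training

y = torch.randn(1, 3, 256, 256)
preds = SPNetSketch()(y)
```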

3.2 Multi-scale feature upsampling module (MFUM)

SPNet directly upsamples the output of the S1 sub-network and uses it as the input to the S2 sub-network. To achieve more efficient progressive correction, it is important to preserve the features as much as possible, so a better upsampling strategy is a key factor in improving the network's performance. We design an upsampling module based on pixel-shuffle, called the multi-scale feature upsampling module (MFUM), as shown in Fig. 4.


Fig. 4. Architecture of multi-scale feature upsampling module(MFUM). SE denotes squeeze and excitation attention module.


Specifically, we extract the spatial information at different scales of the input features using a multi-branch strategy. Given an input feature $I \in {{\mathbb R}^{H \times W \times C}}$, we apply convolution kernels of different sizes on each branch in parallel, using group convolution [35], to obtain spatial information at different scales. The kernel sizes K and groups G of the group convolutions on the four branches are 3, 5, 7, 9 and 1, 4, 5, 10, respectively. Channel-shuffle [36] is introduced to enhance information interaction between the different groups. We then perform pixel-shuffle for upsampling in each branch, producing an upsampled result per branch:

$${\hat{F}_i} = GConv({K_i},{G_i})(I), \quad i = 0,1,2,3$$
where GConv denotes group convolution, and K and G denote the corresponding kernel size and number of groups, respectively. Next, we concatenate the upsampled information from each branch along the channel dimension to obtain the multi-scale upsampled feature:
$$\hat{F} = Cat([{\hat{F}_0},{\hat{F}_1},{\hat{F}_2},{\hat{F}_3}])$$

Finally, we employ an SE attention module [25] to obtain an attention map, and channel-shuffle is used to enhance the inter-channel information interaction to obtain the final upsampled features $F \in {{\mathbb R}^{2H \times 2W \times C}}$:

$$F = S(S(\hat{F}) \otimes (SE(\hat{F})))$$
where S denotes channel-shuffle and ${\otimes}$ denotes element-wise multiplication. The upsampled features obtained by the multi-scale feature upsampling module (MFUM) take into account both spatial and channel factors and also contain the multi-scale information of the image, ensuring that the upsampled features retain as much detail of the original image or feature as possible.
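A minimal PyTorch sketch of the MFUM data flow, under our reading of Eqs. (2)-(4): each branch applies a group convolution, a channel-shuffle, and a pixel-shuffle, the branch outputs are concatenated, and SE attention plus channel-shuffle produce the final upsampled feature. The channel counts, the placement of channel-shuffle inside each branch, and the SE reduction ratio are assumptions for illustration.

```python
import torch
import torch.nn as nn

def channel_shuffle(x, groups):
    """Interleave channels across groups (ShuffleNet-style)."""
    n, c, h, w = x.shape
    return x.view(n, groups, c // groups, h, w).transpose(1, 2).reshape(n, c, h, w)

class SEAttention(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.fc(x)  # per-channel attention map

class MFUM(nn.Module):
    """Four parallel branches with kernel sizes 3/5/7/9 and groups 1/4/5/10.
    Each branch: group conv -> channel shuffle -> pixel-shuffle (x2), giving
    C/4 channels; concatenating the branches restores C channels at 2H x 2W."""
    def __init__(self, channels):
        super().__init__()
        assert channels % 20 == 0  # divisible by every group count and by 4
        kernels, groups = (3, 5, 7, 9), (1, 4, 5, 10)
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, k, padding=k // 2, groups=g)
            for k, g in zip(kernels, groups))
        self.groups = groups
        self.shuffle = nn.PixelShuffle(2)
        self.se = SEAttention(channels)

    def forward(self, x):
        outs = []
        for conv, g in zip(self.branches, self.groups):
            f = channel_shuffle(conv(x), g) if g > 1 else conv(x)
            outs.append(self.shuffle(f))                 # C/4 channels, 2H x 2W
        f_hat = torch.cat(outs, dim=1)                   # back to C channels
        f = channel_shuffle(f_hat, 4) * self.se(f_hat)   # Eq. (4): S(F) * SE(F)
        return channel_shuffle(f, 4)

feat = torch.randn(1, 40, 32, 32)
up = MFUM(40)(feat)        # torch.Size([1, 40, 64, 64])
```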

3.3 Adaptive feature fusion module (AFFM)

Feature fusion commonly involves simple concatenation or summation operations. However, in both concatenation and summation the features are typically fused in a 1:1 ratio, which limits the expressive power of the features, as reported in [37]. Features produced by sub-network S1 contain more detailed information but less contextual information, whereas features produced by sub-network S2 have more contextual information but poorer details. Adaptively assigning different weights to multi-size features and aggregating them can fuse more useful features and improve the efficiency of feature fusion. Thus, we introduce a nonlinear module that fuses features coming from multiple sizes using a self-attention mechanism, which we call the adaptive feature fusion module (AFFM), as shown in Fig. 5.


Fig. 5. Architecture of adaptive feature fusion module (AFFM). Features L1 and L2 contain detailed information and contextual information respectively.


AFFM adaptively fuses and re-weights the output features of two sub-networks S1 and S2 to efficiently utilize the useful features at different sizes. First, the element-wise summation of the two features, L1 and L2, is calculated:

$$L = L1 + L2$$

After obtaining the fused feature representation L from the element-wise summation, a global average pooling (GAP) operation is performed on L. Next, we apply a 1 × 1 convolution that reduces the number of channels to obtain a compact feature representation $c \in {{\mathbb R}^{1 \times 1 \times c/4}}$. To further enhance the representation capability, a multi-layer perceptron (MLP) maps the features to a higher-dimensional feature space to obtain ${c_1}$:

$${c_1} = MLP({conv({GAP(L )} )} )$$

Finally, we use two parallel 1 × 1 convolutions followed by a softmax activation to obtain the corresponding attention activations $W1 \in {{\mathbb R}^{1 \times 1 \times c}}$ and $W2 \in {{\mathbb R}^{1 \times 1 \times c}}$:

$$W1,W2 = soft\max (conv({c_1}))$$

We then separately recalibrate the two features and sum them to get the adaptive fused feature map U:

$$U = L1 \otimes W1 + L2 \otimes W2$$
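The sketch below follows Eqs. (5)-(8) with SKNet-style branch selection: the two features are summed, squeezed by GAP and a channel-reducing 1 × 1 convolution, expanded by an MLP, and two parallel 1 × 1 convolutions produce per-channel weights that are normalized by a softmax across the two branches. The hidden sizes and the axis over which the softmax is taken are assumptions.

```python
import torch
import torch.nn as nn

class AFFM(nn.Module):
    """Adaptive fusion of a detail-rich feature L1 and a context-rich feature L2."""
    def __init__(self, channels, reduction=4, hidden=None):
        super().__init__()
        hidden = hidden or channels // 2
        self.squeeze = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                       # GAP
            nn.Conv2d(channels, channels // reduction, 1), # compact descriptor c
        )
        self.mlp = nn.Sequential(                          # map to a higher dim
            nn.Conv2d(channels // reduction, hidden, 1), nn.ReLU(inplace=True),
        )
        self.w1 = nn.Conv2d(hidden, channels, 1)           # branch-1 logits
        self.w2 = nn.Conv2d(hidden, channels, 1)           # branch-2 logits

    def forward(self, l1, l2):
        l = l1 + l2                                        # Eq. (5)
        c1 = self.mlp(self.squeeze(l))                     # Eq. (6)
        logits = torch.stack([self.w1(c1), self.w2(c1)], dim=0)
        w1, w2 = torch.softmax(logits, dim=0)              # Eq. (7), across branches
        return l1 * w1 + l2 * w2                           # Eq. (8)

l1 = torch.randn(1, 32, 64, 64)
l2 = torch.randn(1, 32, 64, 64)
fused = AFFM(32)(l1, l2)       # torch.Size([1, 32, 64, 64])
```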

3.4 Multi-head self-attention feature extraction module (MSFEM)

Thermal radiation bias fields have another important characteristic: they are globally smooth and locally non-uniform. We design the MSFEM based on multi-head self-attention (MSA) [29] to exploit the global smoothness of the thermal radiation bias field, enabling the network to learn features that carry more global information, as shown in Fig. 6. With such features, the network can better correct images degraded by thermal radiation effects.


Fig. 6. Architecture of (a) Multi-head self-attention feature extraction module (MSFEM). (b) Gated feed-forward module (GFM). (c) Multi-head self-attention module (MSA).


Firstly, the features are normalized using layer normalization, and MSA is used to calculate attention weights, introducing global information and capturing long-range dependencies. This helps to obtain feature representations that contain more useful information. Then, a gated feed-forward module (GFM) is used to control the information flow and allow deeper propagation of useful information. Given a feature $M \in {{\mathbb R}^{h \times w \times c}}$, MSFEM can be expressed as:

$$MSFEM(M) = ({MSA(LN(M)) + M} ) + GFM({LN({MSA(LN(M)) + M} )} )$$
where LN is layer normalization. The GFM contains two parallel paths. Given a feature $G \in {{\mathbb R}^{h \times w \times c}}$, we first use a 1 × 1 convolution to increase the channel dimension. Then, we apply a depth-wise convolution to encode information from spatially neighboring pixels. We implement gating through the element-wise product of the two parallel paths of this linear transformation, one of which is activated by the GELU non-linearity [38]. Finally, we reduce the channel dimension back to the original input dimension using a 1 × 1 convolution layer. GFM is formulated as:
$$GFM(G) = W_c^2(\phi (W_c^1{W_d}(G)) \otimes (W_c^1{W_d}(G)))$$
where $\phi $ denotes the GELU activation function, $W_c^1$ and $W_c^2$ denote the 1 × 1 convolutions used to raise and lower the channel dimension respectively, ${W_d}$ denotes the depth-wise convolution, and ${\otimes}$ denotes element-wise multiplication. MSFEM has advantages in understanding long-range dependencies and capturing global structural features, which allows it to extract features containing global information. This ability helps to estimate more accurate thermal radiation bias fields.
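A compact sketch of the MSFEM block, following Eq. (9) with a Restormer-style gated feed-forward (1 × 1 expansion, depth-wise convolution, two-path GELU gating, 1 × 1 projection). The use of PyTorch's nn.MultiheadAttention in place of the paper's own MSA design (Fig. 6(c)), the expansion factor, and the token layout are assumptions.

```python
import torch
import torch.nn as nn

class GFM(nn.Module):
    """Gated feed-forward module: expand, depth-wise conv, gate, project."""
    def __init__(self, channels, expansion=2):
        super().__init__()
        hidden = channels * expansion
        self.expand = nn.Conv2d(channels, hidden * 2, 1)           # W_c^1
        self.dwconv = nn.Conv2d(hidden * 2, hidden * 2, 3, padding=1,
                                groups=hidden * 2)                  # W_d
        self.project = nn.Conv2d(hidden, channels, 1)               # W_c^2
        self.act = nn.GELU()

    def forward(self, x):
        a, b = self.dwconv(self.expand(x)).chunk(2, dim=1)
        return self.project(self.act(a) * b)                        # gating

class MSFEM(nn.Module):
    def __init__(self, channels, heads=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(channels)
        self.norm2 = nn.LayerNorm(channels)
        self.msa = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.gfm = GFM(channels)

    def forward(self, x):
        n, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)               # (N, H*W, C)
        q = self.norm1(tokens)
        attn, _ = self.msa(q, q, q)                          # global self-attention
        y = attn + tokens                                    # MSA(LN(M)) + M
        y_img = y.transpose(1, 2).reshape(n, c, h, w)
        z = self.norm2(y).transpose(1, 2).reshape(n, c, h, w)
        return y_img + self.gfm(z)                           # + GFM(LN(...))

feat = torch.randn(1, 32, 16, 16)
out = MSFEM(32)(feat)          # torch.Size([1, 32, 16, 16])
```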

3.5 Loss function

During training, we guide each stage of each sub-network to produce a corrected image, as shown at the bottom of Fig. 3. Auxiliary convolution layers are attached to the outputs of $S_1^i(i = 1,2,3)$ to produce thermal radiation bias fields $B_1^i(i = 1,2,3)$, which are then subtracted from the input Y↓ of the S1 sub-network, whose size is H/2 × W/2. Thus, we obtain three outputs $X_1^i(i = 1,2,3)$ in the S1 sub-network and three outputs $X_2^i(i = 1,2,3)$ in the S2 sub-network, and calculate losses for each of them.

Content loss ${L_{cont}}$ and frequency reconstruction loss ${L_{freq}}$ are used to train the SPNet. Content loss ${L_{cont}}$ is defined as:

$${L_{cont}} = \sum\nolimits_{i = 1}^3 {\left( {\frac{1}{{{N_1}}}||X_1^i - GT \downarrow |{|_1} + \frac{1}{{{N_2}}}||X_2^i - GT|{|_1}} \right)}$$
where GT is the ground-truth clear image and GT↓ is its downsampled version, and $||\cdot|{|_1}$ denotes the ${L_1}$ norm. ${N_1}$ and ${N_2}$ are normalization factors, set to ${N_1} = H/2 \times W/2 \times 3$ and ${N_2} = H \times W \times 3$ here.

The frequency reconstruction loss recovers high-frequency details by minimizing the difference between the corrected image and the ground truth in the frequency domain. The frequency reconstruction loss ${L_{freq}}$ is defined as:

$${L_{freq}} = \sum\nolimits_{i = 1}^3 {\left( {\frac{1}{{{N_1}}}||F({X_1^i} )- F({GT \downarrow } )|{|_1} + \frac{1}{{{N_2}}}||F({X_2^i} )- F({GT} )|{|_1}} \right)}$$
where F is the Fourier transform. Finally, we obtain the total loss ${L_{total}} = {L_{cont}} + \lambda {L_{freq}}$, where $\lambda = 0.1$. The determination of $\lambda $ is explained in Section 4.4.5.
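A minimal sketch of the two loss terms in Eqs. (11) and (12), assuming the stage outputs are collected in two lists as in the forward-pass sketch above; normalization by N1 and N2 is realized here through mean reduction, and λ = 0.1 as in the paper.

```python
import torch
import torch.nn.functional as F

def spnet_loss(outputs_s1, outputs_s2, gt, gt_down, lam=0.1):
    """Content loss (L1) plus frequency reconstruction loss (L1 on FFT residuals).

    outputs_s1: list of half-size stage outputs, compared against gt_down.
    outputs_s2: list of full-size stage outputs, compared against gt.
    """
    cont, freq = 0.0, 0.0
    for x1 in outputs_s1:
        cont += F.l1_loss(x1, gt_down)                       # mean ~ (1/N1) * sum
        freq += (torch.fft.fft2(x1) - torch.fft.fft2(gt_down)).abs().mean()
    for x2 in outputs_s2:
        cont += F.l1_loss(x2, gt)
        freq += (torch.fft.fft2(x2) - torch.fft.fft2(gt)).abs().mean()
    return cont + lam * freq

gt = torch.rand(1, 3, 256, 256)
gt_down = F.interpolate(gt, scale_factor=0.5, mode="bilinear", align_corners=False)
outs_s1 = [torch.rand(1, 3, 128, 128) for _ in range(3)]
outs_s2 = [torch.rand(1, 3, 256, 256) for _ in range(3)]
loss = spnet_loss(outs_s1, outs_s2, gt, gt_down)
```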

4. Experiments

4.1 Experimental settings

To validate the effectiveness of our SPNet, we conduct experiments on both simulated and real degraded images. For simulated thermal radiation, we use the Infrared-Car dataset from Kaggle [39] and add thermal radiation bias fields of different sizes and orientations to simulate the thermal radiation effects that occur when the aircraft flies at different flight angles and speeds. The dataset consists of 12,000 degraded-sharp image pairs for the training set and 4,500 degraded-sharp image pairs for the testing set. Each image in the dataset has a size of 256 × 256. For real thermal radiation, we conduct experiments on the real degraded images in [7] and the degraded images from our window heating experiment. We train the model for 150 epochs with a batch size of 4, using the Adam optimizer with cosine annealing. We set the initial learning rate to 2 × 10−4 and gradually decrease it to 1 × 10−6. We select PSNR and SSIM as the quantitative evaluation metrics for the simulation experiments. Brenner, EOG, SMD2, spatial frequency (SF), and standard deviation (SD) [42] are used as the quantitative evaluation metrics for the real thermal radiation experiments. The computation times of all our proposed models are measured on a PC with an NVIDIA GeForce GTX 1080 Ti GPU.
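As a sketch of this training schedule, the following PyTorch snippet configures Adam with cosine annealing from 2 × 10−4 down to 1 × 10−6 over 150 epochs; the model and the training loop body are placeholders, and any other training details are assumptions.

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

model = torch.nn.Conv2d(3, 3, 3, padding=1)   # placeholder for SPNet
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)
scheduler = CosineAnnealingLR(optimizer, T_max=150, eta_min=1e-6)

for epoch in range(150):
    # ... one pass over the training set (batch size 4 in the paper) ...
    scheduler.step()   # cosine decay of the learning rate per epoch
```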

4.2 Experiment for simulated degradation of the thermal radiation effects

4.2.1 Simulated thermal radiation effects correction

We first compare the correction results of SPNet with representative optimization-based methods, namely Shi [10] and BFBSF [11], and the deep-learning-based method DMRN [20] on the simulated dataset. For quantitative analysis, we compare the average PSNR and SSIM over the 4500 test images. As shown in Table 1, our method achieves superior results compared to the traditional optimization-based methods Shi's method and BFBSF. When compared to the deep-learning-based method DMRN, our method achieves superior results in both PSNR and SSIM. Furthermore, even our lightweight variant, SPNet-tiny, which has fewer parameters and MACs than DMRN, outperforms DMRN.


Table 1. Average quantitative results of different methods on 4500 test IR images (best and second best are highlighted and underlined, respectively)

SPNet efficiently utilizes the output of the S1 sub-network as the input of the S2 sub-network, which saves a considerable amount of time. It takes only 0.11 s for SPNet to process an image, and SPNet-tiny takes only 0.05 s, while DMRN takes 0.29 s, and Shi's method and BFBSF take 4.18 s and 2.33 s, respectively. Considering the real-time requirements of practical applications, our method is superior to the other methods. The corrected results for four different degraded images are shown in Fig. 7. It can be observed that for images with more severe thermal radiation effects, the corrected results of the traditional methods (Shi's method and BFBSF) always contain a residual thermal radiation bias field (indicated by the red box in Fig. 7), resulting in overall low contrast. In contrast, the corrected images obtained by SPNet contain almost no residual thermal radiation. This indicates that our method is more effective at removing the thermal radiation bias field, resulting in higher-quality corrected images.


Fig. 7. Comparison of thermal radiation effects correction on the simulated degraded images (best are highlighted). Red box shows the residual thermal radiation effects.


4.2.2 Cross profile analysis

To further assess the effectiveness of the thermal radiation correction, we analyze the cross profile of the corrected results, as shown in Fig. 8. We take the corrected image in the second row of Fig. 7 as an example. From Fig. 8(b), it is evident that the correction result of SPNet (red curve) closely resembles the ground truth (black curve), indicating that the corrected image obtained by SPNet is closer to the ground-truth image. Compared with the green and blue curves, the smoother red curve indicates that the thermal radiation bias field has been satisfactorily removed [40].


Fig. 8. Cross profile analysis for thermal radiation effects correction. Horizontal axis: column number of the IR image. Vertical axis: intensity value of the IR image. (a) Cross profile of a certain row. (b) Zoomed-in view of (a).


4.2.3 Benefit for recognition

To validate the usefulness of the corrected images for subsequent tasks such as target detection and recognition, we apply the Google Vision API to the images before and after correction to perform scene recognition [41,42]. The recognition results are used to assess the effectiveness of the corrected images for subsequent tasks. As shown in Fig. 9, the presence of thermal radiation bias fields can lead to erroneous labels, such as the "Grey" label. However, when the corrected images obtained by SPNet are used, the recognized labels are nearly consistent with those of the original images, and the recognition accuracy remains almost unchanged. Our proposed method can also benefit infrared small target detection.


Fig. 9. Effectiveness of SPNet for the recognition (a) Ground truth image’s, (b) degraded image’s, (c) corrected image’s scene recognition.


4.3 Experiment on real degraded images with thermal radiation effects

To evaluate the correction of real degraded images, we conduct experiments using three real thermally degraded images from [7], as shown in Fig. 10. From left to right, the three images are a real outdoor infrared image, an infrared image captured during the heating process with the chamber temperature at 40°C, and an infrared image captured during the cooling process with the chamber temperature at 40°C.


Fig. 10. Comparison of thermal radiation effects correction on three real degraded images.


It can be seen that the traditional optimization-based methods can remove the thermal radiation bias field, but the overall contrast of the image is low and the edges of the image content are blurred. DMRN, on the other hand, leaves a slight thermal radiation residue in the corrected image. Our SPNet fully takes into account the global smoothness and local non-uniformity of thermal radiation, and its corrected images contain almost no thermal radiation residue.

The corrected results have high clarity and contrast, making the content edges more distinct. For quantitative analysis, we use the energy-of-gradient (EOG) function, the Brenner gradient function, the SMD2 (product of gray-scale variance) function, spatial frequency (SF), and standard deviation (SD) as evaluation metrics; higher values indicate a larger image gradient, clearer edges, richer texture details, more distinct image content, and therefore better correction results. As can be seen from Table 2, our method exceeds the other methods in the EOG, Brenner, SMD2, SF, and SD metrics.
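For reference, the sketch below computes common forms of these no-reference sharpness metrics with NumPy; the exact definitions used in the paper may differ in boundary handling or normalization, so these formulas should be read as standard variants rather than the authors' implementation.

```python
import numpy as np

def brenner(img):
    """Brenner gradient: squared difference between pixels two columns apart."""
    return np.sum((img[:, 2:] - img[:, :-2]) ** 2)

def eog(img):
    """Energy of gradient: sum of squared horizontal and vertical differences."""
    dx = img[:, 1:] - img[:, :-1]
    dy = img[1:, :] - img[:-1, :]
    return np.sum(dx[:-1, :] ** 2 + dy[:, :-1] ** 2)

def smd2(img):
    """Product of gray-scale variations in the two directions."""
    dx = np.abs(img[:-1, :-1] - img[:-1, 1:])
    dy = np.abs(img[:-1, :-1] - img[1:, :-1])
    return np.sum(dx * dy)

def spatial_frequency(img):
    """SF = sqrt(row frequency^2 + column frequency^2)."""
    rf = np.sqrt(np.mean((img[:, 1:] - img[:, :-1]) ** 2))
    cf = np.sqrt(np.mean((img[1:, :] - img[:-1, :]) ** 2))
    return np.sqrt(rf ** 2 + cf ** 2)

img = np.random.rand(256, 256).astype(np.float64)
print(brenner(img), eog(img), smd2(img), spatial_frequency(img), img.std())
```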


Table 2. Comparison of Brenner and EOG metrics for the first column image in Fig. 10

We also conduct window heating experiments. The window temperatures recorded in Fig. 11 are 599 K (top) and 578 K (bottom). In the degraded image, the contrast between the target and the background is low due to the thermal radiation effects, and the background changes from black to gray or even white. The corrected results of Shi's method and BFBSF improve the contrast, but some residual thermal radiation effects remain. DMRN produces better corrected results than the traditional methods, but they are still not satisfactory. The corrected result obtained with SPNet has a high contrast between the target and the background, and almost no thermal radiation effects remain in the background.


Fig. 11. Comparison of thermal radiation effects correction in window heating experiment.


As shown in Table 3, our method outperforms the other methods in the Brenner, EOG, SF, and SD metrics, which indicates that our corrected images have more distinct edges, higher contrast, and clearer targets.


Table 3. Comparison of Brenner and EOG metrics for the first row image in Fig. 11

4.4 Ablation study

To analyze the influence of each architectural component of our method, we perform extensive ablation studies in this section. For efficiency, we use SPNet-tiny for the ablation experiments. All ablation experiments are trained with SPNet-tiny as the base model for 50 epochs.

4.4.1 Effectiveness of MFUM

To validate the effectiveness of MFUM, we compare it with the traditional interpolation upsampling strategies and the deep-learning-based upsampling strategy (pixel-shuffle). As shown in Table 4, the deep-learning-based method is overall better than the interpolation-based methods. Our MFUM module inherits the advantages of pixel-shuffle and can obtain features with different receptive fields, while fully considering the channel factor and enhancing inter-channel information interaction. The results in Table 4 show that our method outperforms the traditional interpolation strategies and pixel-shuffle.


Table 4. Comparison of upsampling strategies

4.4.2 Effectiveness of AFFM

Our SPNet adaptively fuses the output features of the two sub-networks S1 and S2 using AFFM. In Table 5, we analyze the effectiveness of AFFM. It can be seen that AFFM achieves better results than direct summation and concatenation. This further demonstrates that assigning appropriate weights to the output features of the two sub-networks, rather than using a simple 1:1 ratio, is crucial for obtaining optimal results. Compared to concatenation, AFFM has approximately one-third of the number of parameters.


Table 5. Comparison of feature fusion strategies

4.4.3 Effectiveness of MSFEM

To evaluate the effectiveness of our multi-head self-attention feature extraction module (MSFEM), we compare it with the ResBlock [43] and the DAU [44], which contains channel attention and spatial attention. Both the ResBlock and the DAU can only extract local features and ignore global features, whereas MSFEM can extract global features. As can be seen in Table 6, the PSNR and SSIM values obtained using MSFEM are superior to those of the ResBlock and the DAU. This indicates that MSFEM has a greater advantage in feature extraction.


Table 6. Comparison of feature extraction modules

4.4.4 Effectiveness of size-variant strategy

We also demonstrate the advantages of the size-variant strategy compared to the single-size strategy. As shown in Table 7, we compare SPNet-tiny-single and SPNet-tiny under the same conditions, where SPNet-tiny-single is the version of SPNet-tiny without the size scaling strategy. SPNet-tiny requires about half the computation (MACs) of SPNet-tiny-single. The size-variant strategy thus greatly reduces the computational effort while achieving better corrected results than the single-size model.


Table 7. Single-size compared to size-variant

4.4.5 Determination of weighting coefficient in loss function

In this section, we describe how we determined the weighting coefficient $\lambda $ of the frequency reconstruction loss. Changing the value of the weighting coefficient magnifies or weakens the role of the frequency reconstruction loss in the network: large values strengthen the frequency reconstruction loss, whereas small values weaken it. We conduct experiments to determine $\lambda $. As shown in Table 8, the best results are achieved when $\lambda $ is set to 0.1.


Table 8. Determination of $\lambda $

5. Conclusions

In this paper, we propose a size-variant progressive network (SPNet) for aero-optical thermal radiation effects correction. SPNet utilizes two properties of thermal radiation bias fields: (1) the degradation level of the thermal radiation bias field remains consistent regardless of image size; (2) the thermal radiation bias field is globally smooth and locally non-uniform. SPNet can efficiently and accurately correct images degraded by thermal radiation effects and obtain clear, high-contrast corrected images for both simulated and real thermal radiation effects, which is helpful for subsequent tasks such as target detection and recognition. Unlike previous networks, our SPNet corrects thermal radiation progressively, which not only improves the correction efficiency of the network but also achieves better results than other methods in terms of real-time performance and computational cost.

Funding

Knowledge Innovation Program of Wuhan-Basic Research (2022010801010351); National Natural Science Foundation of China (62171329).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. L. Zhang and J. Fei, “Study on aero-optical effect of the imaging detection system of high speed flight vehicle,” Infrared and Laser Engineering 49(6), 20201016 (2020). [CrossRef]  

2. E. F. Cross, “Window heating effects on airborne infrared system calibration,” Infrared Technology XVIII. SPIE. 1762, 576–583 (1992). [CrossRef]  

3. N. Yang, Y. Yuan, C. Zhang, et al., “Accurate characterization of full-chain infrared multispectral imaging features under an aerodynamic thermal environment,” Opt. Express 31(16), 26643–26658 (2023). [CrossRef]  

4. W. Hui, S. Chen, W. Zhang, et al., “Evaluating imaging quality of optical dome affected by aero-optical transmission effect and aero-thermal radiation effect,” Opt. Express 28(5), 6172–6187 (2020). [CrossRef]  

5. S. Luo, H. Ding, S. Yi, et al., “Influence of optical aperture sizes on aero-optical effects induced by supersonic turbulent boundary layers,” Opt. Express 31(12), 19133–19145 (2023). [CrossRef]  

6. V. Magnin, M. Zegaoui, J. Harari, et al., “Design, optimization and fabrication of an optical mode filter for integrated optics,” Opt. Express 17(9), 7383–7391 (2009). [CrossRef]  

7. Y. Cao and C. L. Tisse, “Single-image-based solution for optics temperature-dependent nonuniformity correction in an uncooled long-wave infrared camera,” Opt. Lett. 39(3), 646–648 (2014). [CrossRef]  

8. L. Liu and T. Zhang, “Optics temperature-dependent nonuniformity correction via L0-regularized prior for airborne infrared imaging systems,” IEEE Photonics J. 8(5), 1–10 (2016).

9. L. Liu and T. Zhang, “Intensity non-uniformity correction of aerothermal images via Lp-regularized minimization,” J. Opt. Soc. Am. A 33(11), 2206–2212 (2016). [CrossRef]  

10. Y. Shi, H. Hong, X. Hua, et al., “Aero-optic thermal radiation effects correction with a low-frequency prior and a sparse constraint in the gradient domain,” J. Opt. Soc. Am. A 36(9), 1566–1572 (2019). [CrossRef]  

11. H. Hong, J. Liu, Y. Shi, et al., “Progressive nonuniformity correction for aero-optical thermal radiation images via bilateral filtering and bézier surface fitting,” IEEE Photonics J. 15(2), 1–11 (2023). [CrossRef]  

12. Z. Li, G. Xu, Z. Wang, et al., “A structure prior weighted hybrid ℓ2–ℓp variational model for single infrared image intensity nonuniformity correction,” Optik 229, 165867 (2021). [CrossRef]  

13. Y. Shi, J. Chen, Y. Zhang, et al., “Multi-scale thermal radiation effects correction via a fast surface fitting with Chebyshev polynomials,” Appl. Opt. 61(25), 7498–7507 (2022). [CrossRef]  

14. S. Dong, P. Wang, and K. Abbas, “A survey on deep learning and its applications,” Computer Science Review 40, 100379 (2021). [CrossRef]  

15. Z. Li, F. Liu, W. Yang, et al., “A survey of convolutional neural networks: analysis, applications, and prospects,” IEEE transactions on neural networks and learning systems 33(12), 6999–7019 (2021).

16. J. Whang, M. Delbracio, H. Talebi, et al., “Deblurring via stochastic refinement,” in Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition (IEEE, 2022), pp. 16272–16282.

17. R. Girshick, J. Donahue, T. Darrell, et al., “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition (IEEE, 2014), pp. 580–587.

18. J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition (IEEE, 2015), pp. 3431–3440.

19. I. Goodfellow, J. P. Abadie, M. Mirza, et al., “Generative adversarial networks,” Commun. ACM 63(11), 139–144 (2020). [CrossRef]  

20. Y. Chang, L. Yan, L. Liu, et al., “Infrared aerothermal nonuniform correction via deep Multiscale Residual Network,” IEEE Geosci. Remote Sensing Lett. 16(7), 1120–1124 (2019). [CrossRef]  

21. K. Zhang, W. Zuo, Y. Chen, et al., “Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising,” IEEE Trans. on Image Process. 26(7), 3142–3155 (2017). [CrossRef]  

22. Q. Wang, B. Wu, P. Zhu, et al., “ECA-Net: efficient channel attention for deep convolutional neural networks,” in Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition (IEEE, 2020), pp. 11534–11542.

23. L. Ma, F. Zhao, H. Hong, et al., “Complementary parts contrastive learning for fine-grained weakly supervised object co-localization,” IEEE Trans. Circuits Syst. Video Technol. 33(11), 6635–6648 (2023). [CrossRef]  

24. L. Ma, H. Hong, F. Meng, et al., “Deep progressive asymmetric quantization based on causal intervention for fine-grained image retrieval,” IEEE Transactions on Multimedia 1–13 (2023).

25. J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” in Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition (IEEE, 2018), pp. 7132–7141.

26. S. Woo, J. Park, J.Y. Lee, et al., “CBAM: Convolutional block attention module,” in Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition (IEEE, 2018), pp. 3–19.

27. A. Vaswani, N. Shazeer, N. Parmar, et al., “Attention is all you need,” Advances in neural information processing systems 30, (2017).

28. J. Devlin, M.W. Chang, K. Lee, et al., “Bert: Pre-training of deep bidirectional transformers for language understanding,” arXiv, arXiv:1810.04805 (2018). [CrossRef]  

29. H. Touvron, M. Cord, M. Douze, et al., “Training data-efficient image transformers & distillation through attention,” International conference on machine learning. PMLR, 10347–10357 (2021).

30. S.H. Park, Y.S. Moon, and N.I. Cho, “Perception-oriented single image super-resolution using optimal objective estimation,” in Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition (IEEE, 2023), pp. 1725–1735.

31. J. Liang, J. Cao, G. Sun, et al., “SwinIR: image restoration using swin transformer,” in Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition (IEEE, 2021), pp. 1833–1844.

32. J. Wang, K. Chen, R. Xu, et al., “CARAFE: Content-aware reassembly of features,” in Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition (IEEE, 2019), pp. 3007–3016.

33. X. Hu, H. Mu, X. Zhang, et al., “Meta-SR: A magnification arbitrary network for super-resolution,” in Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition (IEEE, 2019), pp. 1575–1584.

34. W. Shi, J. Caballero, F. Huszár, et al., “Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network,” in Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition (IEEE, 2016), pp. 1874–1883.

35. Y. Lee, J. Park, and C. O. Lee, “Two-level group convolution,” Neural Networks 154, 323–332 (2022). [CrossRef]  

36. X. Zhang, X. Zhou, M. Lin, et al., “ShuffleNet: An extremely efficient convolutional neural network for mobile devices,” in Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition (IEEE, 2018), pp. 6848–6856.

37. X. Li, W. Wang, X. Hu, et al., “Selective kernel networks,” in Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition (IEEE, 2019), pp. 510–519.

38. D. Hendrycks and K. Gimpel, “Gaussian error linear units (GELUs),” arXiv, arXiv:1606.08415 (2016). [CrossRef]  

39. Y. Zhou, Y. Shi, Y. Zhang, et al., “Intra-block pyramid cross-scale network for thermal radiation effect correction of uncooled infrared images,” J. Opt. Soc. Am. A 40(9), 1779–1788 (2023). [CrossRef]  

40. H. Hong, Z. Zuo, Y. Shi, et al., “Adaptive anisotropic pixel-by-pixel correction method for a space-variant degraded image,” J. Opt. Soc. Am. A 40(9), 1686–1697 (2023). [CrossRef]  

41. Y. Li, G. Liu, D. P. Bavirisetti, et al., “Infrared-visible image fusion method based on sparse and prior joint saliency detection and LatLRR-FPDE,” Digital Signal Processing 134, 103910 (2023). [CrossRef]  

42. H. Tang, G. Liu, L. Tang, et al., “MdedFusion: a multi-level detail enhancement decomposition method for infrared and visible image fusion,” Infrared Phys. Technol. 127, 104435 (2022). [CrossRef]  

43. K. He, X. Zhang, S. Ren, et al., “Deep residual learning for image recognition,” in Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition (IEEE, 2016), pp. 770–778.

44. J. Fu, J. Liu, H. Tian, et al., “Dual attention network for scene segmentation,” in Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition (IEEE, 2019), pp. 3146–3154.
