Optica Publishing Group

Why thermal images are blurry

Open Access

Abstract

The resolution of optical imaging is limited by diffraction as well as detector noise. However, thermal imaging exhibits an additional, unique phenomenon of ghosting, which results in blurry and low-texture images. Here, we provide a detailed view of thermal physics-driven texture and explain why it vanishes in thermal images capturing heat radiation. We show that spectral resolution in thermal imagery can help recover this texture, and we provide algorithms that recover texture close to the ground truth. We develop a simulator for complex 3D scenes and discuss the interplay of geometric textures and non-uniform temperatures that is common in real-world thermal imaging. We demonstrate the failure of traditional thermal imaging to recover the ground truth in multiple scenarios, while our thermal perception approach successfully recovers geometric textures. Finally, we put forth an experimentally feasible infrared Bayer-filter approach to achieve thermal perception in pitch darkness as vivid as optical imagery in broad daylight.

© 2024 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Thermal imaging has myriad applications, from surveillance and defense to advanced driver-assistance systems and autonomous navigation [1]. Originally developed as a night-vision tool for defense applications, thermal imaging has become a versatile consumer technology due to its ability to see through bad weather [2] and darkness [3], or to act as a thermography technique for biomedical applications [4]. According to Planck’s law, every object at finite temperature $T$, including human bodies, the ground, and buildings, emits infrared thermal radiation, scaled by the object’s emissivity $e$. Thermal radiation propagates and scatters off multiple objects and is omnipresent both day and night. This sensing modality works in the mid-wave and long-wave infrared spectrum and is complementary to visible-light optical imaging. Nevertheless, thermal images are known to be of low contrast, to lack detail, and to be blurry compared to optical images [5,6]. This blurriness is partly attributed to sensor noise, smaller pixel arrays, and the wavelength differences between infrared and visible radiation. However, the goal of this paper is to show that a fundamental mechanism is at play beyond the hardware differences between visible-light and thermal-infrared cameras.

Machine learning, especially for visible-light optical images, has led to revolutionary advances in machine perception [7]. In contrast, advances in the infrared thermal domain have been hindered by the above-mentioned blurry nature of thermal images. For example, thermal imagery is often used only as a subsidiary sensing approach to enhance RGB images under poor ambient illumination [3]. Recently, it was shown that embedding the physics of thermal radiation in machine learning algorithms can overcome the blurry nature of thermal images, leading to imagery in pitch darkness as vivid as broad daylight [8]. Thus, a thermal perception technique like HADAR (Heat-Assisted Detection and Ranging) [8] can compete with visible-light optical imaging as a stand-alone sensing modality in the future. However, open challenges remain in analyzing complex scenes with non-uniform temperature distributions. Furthermore, the robustness of the inverse estimation problem in multiple scenarios with resource constraints remains untested.

In this paper, our first goal is to explain the underlying mechanism that causes thermal images to lose texture and become blurry. This mechanism, called ‘ghosting’, is unique to infrared thermal imaging; it does not occur in visible-light optical imaging. Furthermore, the resulting limitations on texture and resolution are distinct from the well-known diffraction limit as well as the noise limit arising from detector non-idealities. We develop a thermal simulator specifically for three-dimensional real-world scenes with non-uniform temperature distributions. We show that thermal physics-driven perception algorithms can correctly recover the geometric textures of realistic scenes with non-uniform temperatures; unlike previous work [8], we do not restrict ourselves to scenes with uniform temperatures. Through extensive numerical simulations, we shed light on the thermal physics-driven definition of texture. This can have important applications for designing algorithms where physics is embedded into signal processing and machine learning approaches.

The necessary hardware modification to overcome ghosting is based on the spectral resolution of thermal images. Spectral resolution increases the complexity of thermal cameras but leads to higher texture recovery in the collected images. Here, we demonstrate the role of the number of spectral bands in recovering the ground truth of complex scenes using our thermal simulator. We show that the error in estimating the scene decreases as the number of spectral bands increases. Long-wave infrared hyperspectral imaging underlying the TeX vision theory [8] is experimentally challenging due to limitations of existing sensors. Here, TeX vision stands for the representation of heat radiation where the pixels in images are represented by three physical attributes: temperature ($T$), emissivity ($e$), and physics-driven texture ($X$). We demonstrate that we can achieve a TeX vision representation close to the ground truth with a Bayer-filter approach using only 4 filters. Similar to optical-frequency Bayer filters, we believe our design can have widespread industrial applications. Our results can lead to the design of optimal spectral modules for next-generation thermal imagers.

We show that thermal physics-driven perception has significant advantages in performance over existing state-of-the-art physics-agnostic approaches. Our paper uses Contrast-Limited Adaptive Histogram Equalization (CLAHE) [9] as the state-of-the-art image-processing baseline. Among the variety of digital image processing algorithms for contrast enhancement of thermal images [10], histogram equalization and its variants [9,11,12] are widely adopted in state-of-the-art thermal datasets such as the FLIR thermal dataset. Machine-learning-based approaches [13–16] also present a new research frontier in improving visual contrast by learning multi-scale thermal features. These image processing algorithms aim to recover the scattering signal by heuristically removing the strong contribution of direct emission. Our work shows that image processing fails to recover geometric textures when the direct emission is spatially non-uniform. We argue that this is the fundamental reason for the persistence of the ghosting effect in thermal imaging in spite of significant advances in camera hardware (low-noise cooled sensors, larger pixel arrays, etc.).

This paper is organized as follows. Section 2 presents general examples to explain the loss of geometric textures and the emergence of the ghosting effect. Section 3 briefly introduces TeX vision for completeness. Section 4 investigates TeX-SGD (semi-global decomposition) with non-uniform temperature and demonstrates texture recovery close to the ground truth. Section 5 presents the main results of this paper on the interplay of geometric textures and non-uniform temperature; it also demonstrates that TeX vision beats existing techniques in overcoming the ghosting effect. Sections 6 and 7 present the main results on the influence of spectral resolution and of a finite cutoff on the number of environmental objects in TeX vision.

2. Loss of geometric texture in thermal imaging

The mechanism of the ghosting effect can be revealed by analogy to a shining bulb [8]. On a glowing bulb, it is impossible to identify the surface features, spot the geometric texture, or read any written text. However, as soon as the bulb is turned off, radiation from other sources scatters off its surface, rendering the surface features visible to the human eye or a camera. The key point is that the intrinsic radiation from the object mixes with the extrinsic scattering from nearby objects, leading to a loss of information/texture. Following the theory of thermal radiation and scattering, the total thermal radiation, or heat signal, $S$, entering a thermal camera is given by [8],

$$S_{\alpha\nu} = e_{\alpha\nu}B_{\nu}(T_{\alpha}) +[1- e_{\alpha\nu}] X_{\alpha\nu},$$
with
$$X_{\alpha\nu} = \sum_{\beta\neq\alpha} V_{\alpha\beta}S_{\beta\nu},$$
where $B$ is the blackbody radiation given by Planck’s law, $V$ is the thermal lighting factor related to the object’s geometric surface normals, $\alpha$ is the object index, and $\nu$ is the wavenumber. Here, we assume a Lambertian emitter, but our equations can be generalized to non-Lambertian emitters as well. The crucial thermal variable carrying texture information is $X$, which we define as the physics-driven geometric texture. The lack of geometric texture, i.e., the lack of 3D surface-normal information, is the loss of information in the thermal variable $X$. This loss occurs because the direct emission term (the first term) often dominates in the infrared spectral range. Geometric texture in imagery is crucial for machine perception tasks such as ranging and semantic segmentation. Thus, a major goal of our work is to recover the texture $X$ using advanced thermal perception algorithms and custom sensors.
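As a concrete illustration, Eqs. (1) and (2) can be iterated to self-consistency for a toy scene. The sketch below uses assumed emissivities, temperatures, and thermal lighting factors for a hypothetical two-object scene; it is one way to evaluate the forward model numerically, not the simulator used in this paper.

```python
import numpy as np

h, c, kB = 6.626e-34, 2.998e8, 1.381e-23     # SI values of h, c, k_B

def planck(lam, T):
    """Blackbody spectral radiance B_lambda(T), Eq. (3), for wavelength lam in m."""
    return (2 * h * c**2 / lam**5) / np.expm1(h * c / (lam * kB * T))

# Hypothetical two-object scene: object 0 is the target, object 1 its environment
lam = np.linspace(8e-6, 14e-6, 50)           # LWIR band, 50 spectral samples
e = np.array([0.95, 0.80])                   # assumed (spectrally flat) emissivities
T = np.array([300.0, 285.0])                 # assumed temperatures in kelvin
V = np.array([[0.0, 1.0],                    # thermal lighting factors V_ab;
              [1.0, 0.0]])                   # each object only "sees" the other

# Iterate Eqs. (1)-(2) to self-consistency: S = e*B(T) + (1-e)*X with X = V @ S
S = e[:, None] * planck(lam, T[:, None])     # start from pure direct emission
for _ in range(20):
    X = V @ S                                # Eq. (2): scattering off the other object
    S = e[:, None] * planck(lam, T[:, None]) + (1 - e[:, None]) * X
```

The iteration converges quickly because the scattering operator is a contraction whenever $e<1$.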

For a shining bulb or natural objects with near-unity emissivity ($e\approx 1$), blackbody radiation dominates the total heat signal, that is, $S_{\alpha \nu } \approx B_{\nu }(T_{\alpha })$, according to Eq. (1). This explains why thermal imaging is widely utilized for measuring the temperature $T$ using primarily the intrinsic radiation from the object of interest. However, we emphasize that no information on the object’s geometric textures is revealed in this direct emission. As explained before, the geometric textures on a shining bulb can only be seen by turning the bulb off. However, the fundamental blackbody radiation from natural objects can never be turned off, leading to the ghosting effect. For a panchromatic thermal image, $S=\int S_\nu \, {\mathrm {d}}\nu$, separating the intrinsic and extrinsic signals is extremely challenging or impossible. The texture $X$ is irreversibly lost in the broadband image since there exist infinitely many solutions of $T$, $e_\nu$, and $X_\nu$ that lead to the same observed signal. We refer to this as the TeX degeneracy of panchromatic thermal imagery [8]. We note that post-processing algorithms cannot isolate the direct emission term, $e\cdot B$, from the total signal $S$, which causes the ghosting effect. This TeX degeneracy is the (novel) mechanism that causes the ghosting effect and is in stark contrast to the well-known factors of diffraction and sensor noise.
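The TeX degeneracy can be made explicit with a small numerical check. Neglecting the scattering term, two gray bodies with different temperatures and emissivities but the same $e\,T^4$ produce the same spectrally integrated signal, by the Stefan–Boltzmann law; the parameter values below are illustrative only.

```python
import numpy as np

h, c, kB = 6.626e-34, 2.998e8, 1.381e-23

def planck(lam, T):
    """Blackbody spectral radiance B_lambda(T), Eq. (3)."""
    return (2 * h * c**2 / lam**5) / np.expm1(h * c / (lam * kB * T))

def band_integral(f, lam):
    """Trapezoidal rule over the wavelength grid."""
    return np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(lam))

lam = np.linspace(1e-6, 2e-4, 100000)   # 1 um .. 200 um: essentially the full spectrum

# Two distinct (e, T) gray bodies chosen so that e*T^4 coincides: their
# panchromatic (spectrally integrated) signals are indistinguishable
S1 = band_integral(1.00 * planck(lam, 290.0), lam)
S2 = band_integral(0.90 * planck(lam, 290.0 / 0.90**0.25), lam)
```

Any number of such $(T, e)$ pairs collapse onto one broadband reading, which is why a panchromatic camera cannot invert Eq. (1).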

The insight of Eq. (1) is that geometry information does lurk inside the total heat signal as a weak contribution of order $1-e$, in a low-contrast background of order $e$. Note that we refer to the direct emission as the background; i.e., the direct emission signal that is widely used in thermography is in fact the background, as it carries no texture. Based on inverse computational algorithms and spectral resolution, Ref. [8] shows that artificial intelligence (AI) using TeX vision can recover geometric textures from the heat signal and see through pitch darkness as in broad daylight, making the long-standing dichotomy between day and night for human beings obsolete.

We note that Eq. (1) is a unified theory of optical imaging in daylight under solar illumination and thermal imaging at night. The differences between optical RGB imaging and long-wave infrared (LWIR) thermal imaging are twofold within the scope of this paper. Firstly, the spectral response range of the sensor lies in the visible range for optical imaging and in the LWIR for thermal imaging. Secondly, the scattering signal in optical imaging is mainly solar radiance, while the scattering signal in thermal imaging comes from room-temperature objects. There is a large temperature contrast between the sun ($T\approx 5500\,^{\circ}\mathrm{C}$) and the scene objects on earth ($T\approx 20\,^{\circ}\mathrm{C}$). According to Planck’s law, the blackbody radiation is given by

$$B_\lambda(T) = \frac{2hc^2}{\lambda^5}\frac{1}{e^{hc/(\lambda k_\mathrm{B}T)}-1},$$
where $\lambda =1/\nu$ is the wavelength, $h$ is Planck’s constant, $c$ is the speed of light, and $k_\mathrm {B}$ is Boltzmann’s constant. Solar radiance peaks in the visible-light range, while room-temperature objects’ radiance peaks in the LWIR, as shown in Fig. 1(a). For optical imaging, the scattering of solar illumination is well separated from the direct thermal radiation of scene objects, due to the high temperature contrast. This gives optical imaging vivid geometric textures encoded in the texture term $X$, as can be seen in the left column of Fig. 1(b). Here, optical imaging is set as grayscale for a fair comparison. For thermal imaging, scattering and direct emission are both from scene objects around room temperatures. The fact that a traditional thermal camera collects both weak scattering and strong direct emission leads to textureless thermal images and thus the ghosting effect. See the right column of Fig. 1(b).
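The spectral separation in Fig. 1(a) follows directly from Eq. (3). A short check of the peak wavelengths, with the sun approximated here as a 5773 K blackbody, confirms the Wien-displacement behavior:

```python
import numpy as np

h, c, kB = 6.626e-34, 2.998e8, 1.381e-23

def planck(lam, T):
    """Blackbody spectral radiance B_lambda(T), Eq. (3)."""
    return (2 * h * c**2 / lam**5) / np.expm1(h * c / (lam * kB * T))

lam = np.linspace(0.2e-6, 30e-6, 100000)         # 0.2 um .. 30 um
peak_sun = lam[np.argmax(planck(lam, 5773.0))]   # sun, roughly 5500 C
peak_room = lam[np.argmax(planck(lam, 293.0))]   # room-temperature object, ~20 C

print(peak_sun * 1e6, peak_room * 1e6)           # ~0.5 um (visible) vs ~9.9 um (LWIR)
```

The solar peak falls in the visible band while room-temperature emission peaks in the LWIR, which is why the two signals are well separated for optical imaging but overlap for thermal imaging.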

Fig. 1. The ghosting effect occurs when the geometric textures in the scattering signal are immersed in the strong, low-contrast direct emission. (a) Typical radiance signal (normalized) for optical and thermal imaging. In optical imaging, the scattering signal (red curve) from solar illumination is well separated from the direct emission (blue circles) of scene objects, due to the high temperature contrast between the sun and our living environment. However, in thermal imaging, direct emission from targets largely overlaps with the scattering signal from environmental objects in the spectral domain, leading to the ghosting effect. Shaded areas are typical spectral response ranges for optical/thermal cameras. (b) Monte Carlo path tracing simulation of typical thermal (optical) imaging examples without (with) textures illustrating the ghosting effect. Optical and thermal imaging are set to have the same spatial resolution and sensor noise. The snow, grass, and sand surfaces are modeled as uniform materials at uniform temperatures but with different geometric surface-normal textures. The street has a non-uniform temperature distribution.

We emphasize that visible light and LWIR radiation have different wavelengths, distinct sensor pixel sizes, and different noise performance. The resulting differences in spatial and signal resolution for the two modalities partially account for the different textures that can be captured by optical and thermal cameras. However, the first insight from Fig. 1 is that thermal imaging suffers from the ghosting effect and is textureless even if it has the same spatial and signal resolutions as optical imaging. The second insight from Fig. 1 is that the vivid textures of snow, grass, and sand in optical imaging come merely from the geometric surface normals, since their material and temperature have been set to be uniform. These insights imply the possibility of recovering vivid textures through the geometric surface normals by properly removing the direct emission from the total heat signal.

3. TeX vision overcomes the dichotomy between day and night

We emphasize that HADAR (heat-assisted detection and ranging) separates the scattering signal from the direct emission in thermal imaging using spectral resolution. To represent a hyperspectral imaging datacube, or heat cube, traditional methods commonly use principal component analysis (PCA) [17]. The PCA approach usually keeps the first 3 principal components to display in the RGB color space, while the remaining components are discarded, leading to information loss. In contrast, we note that the total information in the heat signal, Eq. (1), can be captured in three physical quantities, namely, temperature $T$, emissivity $e$, and texture $X$. TeX vision decomposes the heat signal into these physical quantities and displays them in the HSV color space, as illustrated in Fig. 2. In the RGB color space, each channel represents a light intensity. In stark contrast, each channel in the HSV color space has a physical meaning. The hue channel has a strong correlation with the everyday semantics of objects: blue usually indicates water or sky, green usually indicates grass or leaves, yellow usually indicates sand, etc. The value/brightness channel represents image textures of light and shadow. To reconstruct/mimic daylight RGB images from heat signals, TeX vision shows the material category $e$ in the hue channel and the texture $X$ in the value channel. For example, we assign ‘water’ a blue hue, ‘grass’ a green hue, ‘sand’ a yellow hue, and so on.

Fig. 2. Schematics of the TeX vision. TeX vision represents physics attributes of temperature $T$, material $e$, and texture $X$ in the HSV color space. Explicitly, color hue encodes the semantic material category, saturation encodes the temperature, and value/brightness encodes the texture. TeX vision enables AI to see through pitch darkness like broad daylight. The algorithm for TeX decomposition will be discussed in Sec. 4.

For TeX decomposition, we use a semantic library, ${\mathcal {M}} = \{e_{\nu }(m)\,|\,m = 1,2,\ldots,M\}$, that approximates all possible spectral emissivities in the scene. For each object, the emissivity can be approximately described by one of the curves in the library, i.e., $e_{\alpha \nu } = e_{\nu }(m_\alpha )$. This semantic library can either be calibrated on-site or estimated from the heat cube itself [8]. The adoption of a semantic library leads to the existence of a unique solution of the inverse TeX decomposition problem defined by Eq. (1). The parameters to be estimated in the inverse problem are $\{T_\alpha, m_\alpha, V_{\alpha \beta }\}$. The detailed decomposition algorithm is discussed in Sec. 4. Since TeX vision records all the physical quantities, we argue that TeX vision is a full representation of the hyperspectral heat signal that loses no information from any spectral band. Furthermore, we emphasize that the colors in TeX vision indicate material categories, $m_\alpha$, unlike the pseudo-coloring of traditional image processing. The accuracy of TeX decomposition depends on how accurately the semantic library depicts the scene. An ideal semantic library becomes a ground-truth material library with exact spectral emissivity profiles. In this paper, we use the exact material library to demonstrate the TeX vision theory.

When the TeX decomposition in Fig. 2 is ideal, the resulting TeX vision strikingly shows an image of a night scene as if it is seen in daylight, with both recovered textures and semantic information. Artificial intelligence with TeX-based machine vision is thus able to overcome the long-standing dichotomy between day and night for human beings.

Note that the texture term $X$ in optical imaging usually involves only the sky and the sun as light sources. In thermal imaging, every object in the environment like streets and buildings has its scattering contributions in texture $X$. To mimic daylight optical imaging, scattering contributions of environmental objects other than the sky need to be removed. This process is discussed in Sec. 4.

4. TeX-SGD: texture recovery close to the ground truth

For each color pixel, TeX vision visualizes the temperature $T$ as the saturation, the material category $e(m)$ as the hue, and the texture $X$ as the brightness. By manually splitting the scattering contribution from the direct emission contribution, and by manually controlling the temperature and material in the Monte Carlo path tracing simulation, we can determine the ground truth TeX vision of a scene. The ground truth TeX vision in Fig. 2 shows the recovered textures of night scenes, which are as vivid as when viewed in daylight. We now show how to recover the texture and generate a TeX vision close to the ground truth.

To tackle the iterative system of Eqs. (1) and (2) with a possibly infinite number of environmental objects, we approximate the panoramic environment by $k$ equivalent environmental objects whose spectral emissivities are also among the $M$ curves in the semantic library ${\mathcal {M}}$. The following reconstructed heat signal $\tilde {S}^k_{\alpha \nu }$ with only $k$ environmental objects is a good approximation of the original heat signal $S_{\alpha \nu }$,

$$\tilde{S}^k_{\alpha\nu} = e_\nu(m_\alpha)B_{\nu}(T_{\alpha}) +[1- e_\nu(m_\alpha)] \tilde{X}^k_{\alpha\nu},$$
with
$$\tilde{X}^k_{\alpha\nu} = V_{\alpha 1}\tilde{S}_{1\nu}+V_{\alpha 2}\tilde{S}_{2\nu}+\cdots+V_{\alpha k}\tilde{S}_{k\nu},$$
where environmental radiance $\tilde {S}_{1,\ldots,k;\nu }$ can be approximated from the captured images, see Sec. 7 for more details. The above approximation can be understood from the viewpoint of ray/path tracing. Path tracing of a real-world scene is asymptotically accurate and realistic, when the ray depth and meshing density of the environment increase.

TeX-SGD (semi-global decomposition) aims to extract the scene attributes, i.e., $T_\alpha$, $m_\alpha$, and $V_{\alpha ;1,\ldots,k}$, by minimizing the $l_2$-norm residue, $\delta ^k_\alpha \equiv ||\tilde {S}^k_{\alpha \nu } - S_{\alpha \nu }||$,

$$\{T^k_\alpha,m^k_\alpha,V^k_{\alpha;1,\ldots,k}\} = \mathrm{argmin}_{TmV}\delta^k_\alpha,$$
with additional smoothness constraints. As $k$ increases, $\{T^k_\alpha,m^k_\alpha,\tilde {X}^k_{\alpha \nu }\}$ are expected to approach $\{T_\alpha,m_\alpha,X_{\alpha \nu }\}$. In practice, a small $k$ is used for ease of computation, and hence the residue $\delta ^k_\alpha$ also contains considerable texture, as $\delta ^k_\alpha \propto \sum _{\beta \neq 1,2,\ldots,k}V_{\alpha \beta }S_{\beta \nu }$. Notably, if the $k$ equivalent environmental objects do not include the sky, the whole term $\tilde {X}^k_{\alpha \nu }$ needs to be removed from $X_{\alpha \nu }$ to mimic daylight optical imaging, as explained in Sec. 3; $\delta ^k_\alpha \propto ||\tilde {X}^k_{\alpha \nu } - X_{\alpha \nu } ||$ is then exactly the quantity needed to display textures. When the sky is included in the equivalent environmental objects, we first solve the inverse problem for $\{T^k_\alpha,m^k_\alpha,V^k_{\alpha ;1,\ldots,k}\}$ from the heat signal according to Eq. (6). Then we evaluate the iterative Eqs. (1) and (2) in the forward direction with $\{T^k_\alpha,m^k_\alpha \}$, keeping only the sky thermal lighting factor $V^k_{\alpha ;\mathrm {sky}}$, to get a distilled texture $\bar {X}_\alpha$. Finally, the distilled texture $\bar {X}_\alpha$ is fused with $\delta ^k_\alpha$ to get the final texture used in TeX vision.
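A minimal sketch of this decomposition for a single pixel is given below. It replaces the paper's nonlinear solver with a brute-force grid over $T$ and exploits the fact that, for fixed $(m, T)$, Eq. (4) is linear in the thermal lighting factors $V$. The three-material library, the environmental spectra, and all scene values are assumptions for illustration, not the HADAR/ECOSTRESS profiles used in the paper.

```python
import numpy as np

h, c, kB = 6.626e-34, 2.998e8, 1.381e-23

def planck(lam, T):
    """Blackbody spectral radiance B_lambda(T), Eq. (3)."""
    return (2 * h * c**2 / lam**5) / np.expm1(h * c / (lam * kB * T))

lam = np.linspace(8e-6, 14e-6, 50)
x = np.linspace(0.0, 1.0, 50)

# Toy semantic library of M = 3 spectral emissivity curves (assumed shapes)
library = np.array([0.95 - 0.05 * x, 0.70 + 0.15 * x, np.full(50, 0.85)])

# k = 2 equivalent environmental radiances, assumed known (cf. Sec. 7)
S_env = np.array([planck(lam, 260.0), planck(lam, 290.0)])

def tex_sgd_pixel(S_obs, T_grid=np.arange(270.0, 320.0, 0.5)):
    """Brute-force Eq. (6): argmin over (m, T, V) of ||S_model - S_obs||.
    For fixed (m, T) the model is linear in V, so V is found by least squares."""
    best = (np.inf, -1, np.nan, None)
    for m, e in enumerate(library):
        for T in T_grid:
            A = (1 - e)[:, None] * S_env.T        # (bands, k) scattering basis
            b = S_obs - e * planck(lam, T)        # data minus direct emission
            V, *_ = np.linalg.lstsq(A, b, rcond=None)
            r = np.linalg.norm(A @ V - b)
            if r < best[0]:
                best = (r, m, T, V)
    return best

# Synthesize one pixel with known ground truth, then invert it
e_true, T_true, V_true = library[1], 300.0, np.array([0.3, 0.7])
S_obs = e_true * planck(lam, T_true) + (1 - e_true) * (V_true @ S_env)
residue, m_hat, T_hat, V_hat = tex_sgd_pixel(S_obs)
```

On this noiseless toy pixel the minimum-residue material, temperature, and lighting factors coincide with the ground truth, mirroring the 'Sky' prediction shown in Fig. 3(d).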

Figure 3 illustrates the process of solving for $\{T^k_\alpha,m^k_\alpha,V^k_{\alpha ;1,\ldots,k}\}$ for a given sample sky pixel, with $k=2$. The sky pixel is marked in Fig. 3(a) as a white cross. Figure 3(b) shows sample spectral radiance curves for different materials. The subtle differences in the spectral radiance signal are crucial to distinguishing the materials. Figure 3(c) shows the ground truth material library we used for the TeX decomposition. The spectral emissivity profiles are from the HADAR database, which is partially based on the NASA JPL ECOSTRESS spectral library [8]. In practice, we define $\delta _m = \mathrm {min}_{TV}\delta ^k_\alpha$ for each pixel $\alpha$. Equation (6) therefore gives $m^k_\alpha = \mathrm {argmin}_{m}\delta _m$. Figure 3(d) shows $\delta _m$ for the sample sky pixel. The minimum residue correctly predicts it as ‘Sky’. The continuous parameters, temperature $T^k_\alpha$ and thermal lighting factors $V^k_{\alpha ;1,\ldots,k}$, can be readily solved by minimizing $\delta _\mathrm {sky}$ using nonlinear least-squares algorithms (MATLAB fmincon, interior-point algorithm, maximum of 1000 iterations). The above procedure is repeated for each pixel for a local decomposition. We then impose a smoothness penalty on the resulting $T^k_\alpha$ and $m^k_\alpha$, in addition to $\delta ^k_\alpha$, for a global decomposition. Explicitly, we set the global solution of $m$ to the 2-dimensional median filtering of the local $m$ in a 3-by-3 neighboring pixel array, and we impose a global penalty on temperature, $p\times |T-T^{\prime }|/\Delta$, where the penalty coefficient is $p = 0.01$. Here, $T^{\prime }$ is the 2-dimensional median filtering of the local $T$ in the 3-by-3 neighboring pixel array, and $\Delta$ is the standard deviation of the local $T$ in the 3-by-3 neighboring pixel array.
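The global-decomposition step can be sketched as follows, using standard image filters in place of the paper's implementation; the 32×32 label and temperature maps are synthetic stand-ins for the local TeX-SGD output.

```python
import numpy as np
from scipy.ndimage import median_filter, generic_filter

rng = np.random.default_rng(0)

# Hypothetical per-pixel (local) TeX-SGD outputs on a 32x32 image:
m_local = np.ones((32, 32), dtype=int)              # true material label is 1 ...
m_local[rng.random((32, 32)) < 0.05] = 2            # ... with ~5% isolated misfits
T_local = 300.0 + rng.normal(0.0, 0.5, (32, 32))    # noisy local temperatures (K)

# Global material solution: 2D median filter of local m in a 3x3 neighborhood
m_global = median_filter(m_local, size=3)

# Global temperature penalty p*|T - T'|/Delta with p = 0.01, where T' and Delta
# are the 3x3-neighborhood median and standard deviation of the local T
T_prime = median_filter(T_local, size=3)
Delta = generic_filter(T_local, np.std, size=3)
p = 0.01
T_penalty = p * np.abs(T_local - T_prime) / Delta
```

The median filter removes isolated misclassifications while preserving region boundaries, which is why it is preferred over a mean filter for the discrete label map.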

Fig. 3. Workflow of TeX-SGD. (a) The hyperspectral heat cube. (b) Sample spectral radiance curves for the pixels cross-marked in (a). (c) The material library $ {\mathcal {M}}$ used for TeX-SGD. This material library is the ground truth library that we used to simulate the scene. (d) The residues in fitting the radiance curve of the pixel marked ‘Sky’ with all possible materials in the library. The minimum residue gives a prediction of ‘Sky’, which is correct. In (a), B: Bark, F: Foliage, R: Rock, and W: Water.

Figure 4 shows the TeX vision process to reconstruct texture under sky illumination, mimicking daylight optical imaging. Note that for the estimated TeX vision, the inverse decomposition is based on TeX-SGD. For the ground truth TeX vision, the Monte Carlo path tracing simulation generates the ground truth decomposition.

Fig. 4. Workflow of texture distillation in TeX vision, to reconstruct the texture keeping only the sky illumination, mimicking daylight optical imaging.

Figure 5 shows the TeX vision generated by TeX-SGD, in comparison with the raw thermal vision and the ground truth TeX vision explained in Sec. 3. We emphasize that TeX-SGD had not previously been tested on the synthetic scenes of the HADAR database with ground truth. Figure 5 clearly shows that the TeX vision generated by TeX-SGD according to Eq. (6) recovers the geometric textures as well as the semantic information close to the ground truth. However, errors also exist due to the inaccurate approximation of the environmental radiance. See Sec. 7 for further analysis.

Fig. 5. TeX vision generated by TeX-SGD recovers geometric textures and disentangles temperature and emissivity close to the ground truth.

5. Interplay of geometric textures and temperature variation

For scenes with artificially designed uniform temperatures, texture recovery may be possible with traditional image processing. Furthermore, it has been unclear whether the TeX vision theory can recover textures for common scenes with non-uniform temperatures. Here, we study realistic scenes with non-uniform temperatures, in addition to geometric surface normals, and show the advantage of TeX vision with respect to traditional image processing. Differentiating Eq. (1), we have

$$\delta S = \delta T \cdot e\partial_{T}B + \delta e\cdot [B(T) - X] + \delta X\cdot(1-e).$$

Here, we have suppressed the subscripts for clarity. The overall signal variation in the image consists of 3 contributions: the material change $\delta e$, the temperature contrast $\delta T$, and the geometric texture $\delta V$ entering through $\delta X = \delta \vec {V}\cdot \vec {S} + \delta \vec {S}\cdot \vec {V}$. Equation (7) indicates that recovering the geometric textures requires separating these 3 contributions. We argue this is possible with spectral resolution but, in principle, impossible for traditional thermal imaging. This inseparability of geometric textures from temperature and material variations in panchromatic thermal imaging is the fundamental reason underlying the ghosting effect.
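Equation (7) can be verified numerically by comparing the linearized variation against a finite difference of Eq. (1); the single-band pixel values below are assumed for illustration.

```python
import numpy as np

h, c, kB = 6.626e-34, 2.998e8, 1.381e-23

def B(lam, T):
    """Blackbody spectral radiance B_lambda(T), Eq. (3)."""
    return (2 * h * c**2 / lam**5) / np.expm1(h * c / (lam * kB * T))

lam = 10e-6                                   # one LWIR wavelength
e, T, X = 0.8, 300.0, 0.6 * B(lam, 290.0)     # assumed baseline pixel values

def S(e, T, X):                               # Eq. (1) for a single band
    return e * B(lam, T) + (1 - e) * X

dT, de, dX = 0.01, 1e-4, 1e-4 * X             # small perturbations

# Exact (finite-difference) variation vs the linearized Eq. (7)
dS_exact = S(e + de, T + dT, X + dX) - S(e, T, X)
dBdT = (B(lam, T + 1e-3) - B(lam, T - 1e-3)) / 2e-3        # numerical dB/dT
dS_linear = dT * e * dBdT + de * (B(lam, T) - X) + dX * (1 - e)
```

The two agree to first order in the perturbations, confirming that $\delta T$, $\delta e$, and $\delta X$ all feed into a single measured $\delta S$ and cannot be disentangled from one broadband number.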

Figure 6 shows a newly designed scene with both temperature contrast and geometric textures within each object. The ground truth temperature in Fig. 6(a) and the optical imaging in Fig. 6(b) show the temperature contrast and vivid geometric textures, respectively. However, the geometric textures become invisible once immersed in the temperature contrast in raw thermal imaging; see Fig. 6(c). The state-of-the-art CLAHE algorithm can improve the visual contrast but fails to separate the geometric textures from the temperature contrast; see Fig. 6(d). Furthermore, the state-of-the-art principal component analysis approach also fails to separate the geometric textures from the temperature contrast; see Fig. 6(e). On the contrary, TeX vision recovers the geometric textures, enabling a night vision that approaches the texture contrast present in daylight optical imaging; see Fig. 6(f), and see Fig. 6(g) for the ground truth TeX vision. Note that Fig. 6(d) uses broadband thermal images as input, while Fig. 6(e) and Fig. 6(f) use spectrally resolved thermal images as input. The comparison between Fig. 6(d) and Fig. 6(f) shows the importance of spectral resolution in thermal sensing, while the comparison between Fig. 6(e) and Fig. 6(f) shows the advantage of our TeX vision approach over traditional algorithms. This result clearly demonstrates the advantage of TeX vision in overcoming the ghosting effect for realistic scenes with non-uniform temperatures.

Fig. 6. TeX vision recovers geometric textures from non-uniform temperature contrast, while traditional thermal imaging with image processing fails. The interplay of temperature contrast and geometric textures makes it difficult for thermal imaging to recover the geometric texture, and this interplay is the fundamental reason causing the ghosting effect. (a) Ground truth temperature; the temperature variation within an object is roughly within 2 degrees Celsius. (b) Optical imaging in daylight with geometric textures. (c) Raw thermal imaging at night. (d) State-of-the-art CLAHE thermal vision. (e) State-of-the-art principal component analysis. (f) TeX vision generated by our TeX-SGD with 50 spectral bands. (g) Ground truth TeX vision.

6. Influence of spectral resolution

Spectral resolution plays a vital role in solving the inverse problem of Eq. (6). In the absence of spectral resolution, thermal camouflage [8,18,19] leads to ambiguous solutions. How many spectral bands are needed for TeX vision generally depends on the specific scene and on how many materials we want to discern. Note that the real-world scene of the HADAR database is analyzed with 6 materials; the synthetic scenes of the HADAR database usually contain around 10 materials. Here, we study the error scaling of TeX-SGD with respect to the ground truth TeX vision for various spectral resolutions. The analysis is based on the scene shown in Fig. 6. The scene was rendered with 100 spectral bands equidistantly distributed within the LWIR (8–14 $\mu$m). Out of the 100 bands, we choose an equidistant subset to generate the TeX vision and derive the error of the predicted material and temperature. Figure 7 shows the error scaling as a function of the number of spectral bands. Our results confirm that the prediction errors of TeX-SGD decrease with increasing spectral resolution.
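The equidistant band subsets can be generated, for instance, as index sets over the 100 rendered bands; the band centers below simply span the stated 8–14 µm range, and the helper is an illustrative construction rather than the paper's exact selection code.

```python
import numpy as np

n_total = 100                                    # bands rendered over 8-14 um
band_centers = np.linspace(8.0, 14.0, n_total)   # band-center wavelengths in um

def equidistant_subset(n_bands):
    """Indices of n_bands bands spread evenly across the n_total rendered bands."""
    return np.round(np.linspace(0, n_total - 1, n_bands)).astype(int)

subset_4 = band_centers[equidistant_subset(4)]    # e.g. a 4-band Bayer-style choice
subset_54 = band_centers[equidistant_subset(54)]  # the 54-band case of Fig. 8
```

Because the grid spacing never drops below one index, the rounded indices are always distinct, so each subset has exactly the requested number of bands.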

 figure: Fig. 7.

Fig. 7. Prediction error of TeX-SGD decreases with increasing spectral resolution. (a) Cross entropy error for material classification. (b) Mean squared error of temperature estimation. (c) Mean squared error of texture estimation normalized by the number of bands. 5 equivalent environmental objects were used by K-means clustering. Note that the exact error values depend on the specific bands used in TeX decomposition. Errors statistically decrease when the number of spectral bands increases. Red stars and dots correspond to Fig.8b and Fig.8c, respectively

LWIR hyperspectral imaging is experimentally very challenging and expensive. The Telops Hyper-Cam, for example, takes around 10 seconds to record a frame ($\sim$100 spectral bands). Multi-spectral imaging with the Bayer-filter approach [20] using 4 filters can operate in snapshot mode. Snapshot capability is crucial for analyzing fast processes and avoiding spatial and spectral motion blur; it is therefore promising for real-time applications such as autonomous navigation. Figure 8 compares the TeX vision of TeX-SGD based on 4 spectral bands with the ground truth TeX vision and with the TeX vision based on 54 bands. The fact that 4-band TeX vision can recover geometric textures and even come close to the ground truth shows the possibility of implementing TeX vision with the Bayer-filter approach, which is much more experimentally feasible than LWIR hyperspectral imaging.

Fig. 8. TeX vision in the Bayer-filter approach shows the experimental feasibility of applying TeX vision in real-time applications.

7. Cutoff on the number of environmental objects

Here we analyze the dependence of TeX vision on the approximation of environmental radiance. According to Eqs. (4) and (5), the accuracy of the predicted TeX vision depends on the modeling of the environmental radiation. A panoramic image would ideally be needed to correctly characterize the environment. When the field of view is restricted, the captured image (i.e., the spectral data cube) is used to approximate significant environmental objects.

Figure 9 compares the TeX vision obtained with average downsampling versus K-means downsampling, for two equivalent environmental objects ($k=2$). For average downsampling, we split the spectral data cube (image height $\times$ image width $\times$ spectral bands) into upper and lower halves along the height direction, spatially average each sub data cube, and obtain two radiation spectra. The upper spectrum approximates the sky radiation, while the lower spectrum approximates the ground radiation. In K-means downsampling, we perform K-means clustering with $k=2$ on all image pixels and spatially average each cluster to obtain its spectrum. As can be seen, K-means downsampling for environment approximation yields TeX vision closer to the ground truth in Fig. 5. Explicitly, for the K-means approach, the peak signal-to-noise ratio (PSNR) is 14.36 and the structural similarity (SSIM) is 0.71; for the average approach, the PSNR is 6.52 and the SSIM is -0.25. This is reasonable, as K-means clustering is better than coarse meshing at extracting objects with irregular shapes.
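The two downsampling schemes can be sketched in a few lines of NumPy (a minimal illustration under our own naming; a hand-rolled Lloyd-iteration K-means with deterministic initialization stands in for the clustering step, and `psnr` is the accuracy metric quoted above):

```python
import numpy as np

def env_spectra_average(cube):
    """Average downsampling: split the (H, W, bands) spectral data cube into
    upper and lower halves along the height axis and spatially average each."""
    h, nb = cube.shape[0] // 2, cube.shape[-1]
    sky = cube[:h].reshape(-1, nb).mean(axis=0)     # approximates sky radiation
    ground = cube[h:].reshape(-1, nb).mean(axis=0)  # approximates ground radiation
    return np.stack([sky, ground])

def env_spectra_kmeans(cube, k=2, iters=20):
    """K-means downsampling: cluster all pixel spectra into k groups and
    spatially average each cluster (deterministic init by sorted brightness)."""
    pixels = cube.reshape(-1, cube.shape[-1])
    order = np.argsort(pixels.sum(axis=1))
    init = order[np.round(np.linspace(0, len(pixels) - 1, k)).astype(int)]
    centers = pixels[init].astype(float)
    for _ in range(iters):
        d2 = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(d2, axis=1)
        centers = np.stack([
            pixels[labels == i].mean(axis=0) if np.any(labels == i) else centers[i]
            for i in range(k)
        ])
    return centers

def psnr(pred, truth, data_range=1.0):
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((np.asarray(pred, float) - np.asarray(truth, float)) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)
```

Because the cluster assignment follows spectral similarity rather than a fixed spatial split, irregularly shaped environmental objects (a tree line crossing the horizon, say) end up in the correct cluster, which is consistent with the PSNR/SSIM gap reported above.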

Fig. 9. The influence of downsampling methods on the accuracy of TeX vision.

We used K-means downsampling for all experiments unless otherwise specified. Figure 10 further shows the error in predicted TeX vision for various preset numbers of equivalent environmental objects ($k$); $k$ is the input to K-means clustering. Due to its specific orientation, every object has its own set of environmental objects that dominate its scattering signal. It follows that more environmental objects (i.e., larger $k$) give a lower error in solving Eq. (6). However, a larger $k$ also introduces more unknown thermal lighting factors $V_k$, which in turn requires higher spectral resolution to solve the problem. This explains why, for a given spectral resolution, the errors plateau in Fig. 10.
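The trade-off between $k$ and spectral resolution can be made concrete with a rough per-pixel count of unknowns (our own back-of-the-envelope sketch, not a result from the paper; the helper names are ours):

```python
def continuous_unknowns(k):
    """Continuous unknowns per pixel in Eq. (4): the temperature T plus the
    k thermal lighting factors V_1..V_k (the material m is found by
    enumeration over the library, so it adds no continuous unknown)."""
    return 1 + k

def is_determined(n_bands, k):
    """Naive solvability check: one spectral equation per band, so the
    number of bands should at least match the number of unknowns."""
    return n_bands >= continuous_unknowns(k)

# Under this count, 4 Bayer-filter bands support at most k = 3
# equivalent environmental objects per pixel.
```

This naive count ignores regularization and the sharing of environmental spectra across pixels, so it should be read only as a lower bound on the spectral resolution needed as $k$ grows.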

Fig. 10. Error in predicted TeX vision as a function of the number of equivalent environmental objects. (a) Cross-entropy error for material classification. (b) Mean squared error of temperature estimation. (c) Mean squared error of texture estimation normalized by the number of bands. 54 spectral bands were used.

8. Discussion

TeX vision is based on long-wave infrared hyperspectral imaging. In addition to the spectral resolution discussed above, the image quality of TeX vision is also determined by the thermal sensor used. The pixel size and pixel pitch of the sensor determine the minimal length scale of resolvable textures. For a larger pixel size, the total heat signal collected by a pixel is more likely to be a mixture of signals from different materials at different temperatures, which makes TeX decomposition more difficult. According to the information analysis in Ref. [8], the noise-equivalent power and dynamic range of the sensor restrict its information capacity; a lower dynamic range and a higher noise-equivalent power lose more information and lead to poorer TeX vision.

9. Conclusion

We studied the interplay of geometric textures and non-uniform temperature contrast in thermal imaging. We have shown that traditional thermal imaging, lacking spectral resolution, cannot separate geometric textures from temperature variation, and we argue that this is the fundamental cause of the ghosting effect. This work verifies the TeX vision theory and demonstrates its advantage over traditional thermal imaging in overcoming the ghosting effect. Furthermore, we have demonstrated TeX vision with the Bayer-filter approach at low spectral resolution. This eases the experimental challenge of TeX vision, enabling real-time applications in, for example, autonomous navigation, robotics, wildlife monitoring, smart healthcare, geoscience, and defense.

Funding

Defense Advanced Research Projects Agency; Army Research Office (W911NF-21-1-0287).

Disclosures

The authors declare no conflicts of interest.

Data availability

Results presented in this paper are available in Ref. [8].

References

1. R. Gade and T. B. Moeslund, “Thermal cameras and applications: a survey,” Mach. Vis. Appl. 25(1), 245–262 (2014). [CrossRef]  

2. M. Krišto, M. Ivasic-Kos, and M. Pobar, “Thermal object detection in difficult weather conditions using yolo,” IEEE Access 8, 125459–125476 (2020). [CrossRef]  

3. A. González, Z. Fang, Y. Socarras, et al., “Pedestrian detection at day/night time with visible and fir cameras: A comparison,” Sensors 16(6), 820 (2016). [CrossRef]  

4. K. Tang, K. Dong, C. J. Nicolai, et al., “Millikelvin-resolved ambient thermography,” Sci. Adv. 6(50), eabd8688 (2020). [CrossRef]  

5. K. P. Gurton, A. J. Yuffa, and G. W. Videen, “Enhanced facial recognition for thermal imagery using polarimetric imaging,” Opt. Lett. 39(13), 3857–3859 (2014). [CrossRef]  

6. W. Treible, P. Saponaro, S. Sorensen, et al., “Cats: A color and thermal stereo benchmark,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017), pp. 134–142.

7. A. Geiger, P. Lenz, and R. Urtasun, “Are we ready for autonomous driving? The KITTI vision benchmark suite,” in 2012 IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2012), pp. 3354–3361.

8. F. Bao, X. Wang, S. H. Sureshbabu, et al., “Heat-assisted detection and ranging,” Nature 619(7971), 743–748 (2023). [CrossRef]  

9. Y. Khare and V. Pavithran, “Infrared image enhancement using convolution matrices,” in ICOL-2019, K. Singh, A. K. Gupta, S. Khare, N. Dixit, and K. Pant, eds. (Springer Singapore, 2021), pp. 337–340.

10. R. Soundrapandiyan, S. C. Satapathy, P. V. S. S. R. Chandra Mouli, et al., “A comprehensive survey on image enhancement techniques with special emphasis on infrared images,” Multimed. Tools Appl. 81(7), 9045–9077 (2022). [CrossRef]  

11. K. G. Dhal, A. Das, S. Ray, et al., “Histogram equalization variants as optimization problems: A review,” Arch. Computat. Methods Eng. 28(3), 1471–1496 (2021). [CrossRef]  

12. S. Li, W. Jin, L. Li, et al., “An improved contrast enhancement algorithm for infrared images based on adaptive double plateaus histogram equalization,” Infrared Phys. Technol. 90, 164–174 (2018). [CrossRef]  

13. F. Bouhlel, H. Mliki, R. Lagha, et al., “Tir-gan: Thermal images restoration using generative adversarial network,” in Intelligent Systems Design and Applications, A. Abraham, S. Pllana, G. Casalino, K. Ma, and A. Bajaj, eds. (Springer Nature Switzerland, 2023), pp. 428–437.

14. Z. Pang, G. Liu, G. Li, et al., “A two-stream deep neural network for infrared image enhancement,” in AOPC 2022: AI in Optics and Photonics, vol. 12563, C. Zuo, ed., International Society for Optics and Photonics (SPIE, 2023), p. 1256302.

15. X. Kuang, X. Sui, Y. Liu, et al., “Single infrared image enhancement using a deep convolutional neural network,” Neurocomputing 332, 119–128 (2019). [CrossRef]  

16. K. Lee, J. Lee, J. Lee, et al., “Brightness-based convolutional neural network for thermal image enhancement,” IEEE Access 5, 26867–26879 (2017). [CrossRef]  

17. Q. Du and J. E. Fowler, “Hyperspectral image compression using jpeg2000 and principal component analysis,” IEEE Geosci. Remote Sensing Lett. 4(2), 201–205 (2007). [CrossRef]  

18. M. Li, D. Liu, H. Cheng, et al., “Manipulating metals for adaptive thermal camouflage,” Sci. Adv. 6(22), eaba3494 (2020). [CrossRef]  

19. Y. Qu, Q. Li, L. Cai, et al., “Thermal camouflage based on the phase-changing material gst,” Light: Sci. Appl. 7(1), 26 (2018). [CrossRef]  

20. J. C. Briñez-de León, A. Restrepo-Martínez, and J. W. Branch-Bedoya, “Computational analysis of bayer colour filter arrays and demosaicking algorithms in digital photoelasticity,” Opt. Lasers Eng. 122, 195–208 (2019). [CrossRef]  


Figures (10)

Fig. 1.
Fig. 1. The ghosting effect occurs when the geometric textures in the scattering signal are immersed in the strong direct emission of low contrast. (a) Typical radiance signal (normalized) for optical and thermal imaging. In optical imaging, the scattering signal (red curve) from solar illumination is well separated from the direct emission (blue circles) of scene objects, due to the high temperature contrast of the sun and our living environment. However, in thermal imaging, direct emission from targets largely overlaps with the scattering signal from environmental objects in the spectral domain, leading to the ghosting effect. Shaded areas are typical spectral response ranges for optical/thermal cameras. (b) Monte Carlo path tracing simulation of typical thermal (optical) imaging examples without (with) textures illustrating the ghosting effect. Optical and thermal imaging are set to have the same spatial resolution and sensor noise. Snow, grass, and sand surface are modeled as uniform materials at uniform temperatures but with different geometric surface-normal textures. Street has a non-uniform temperature distribution.
Fig. 2.
Fig. 2. Schematics of the TeX vision. TeX vision represents physics attributes of temperature $T$, material $e$, and texture $X$ in the HSV color space. Explicitly, color hue encodes the semantic material category, saturation encodes the temperature, and value/brightness encodes the texture. TeX vision enables AI to see through pitch darkness like broad daylight. The algorithm for TeX decomposition will be discussed in Sec. 4.
Fig. 3.
Fig. 3. Workflow of TeX-SGD. (a) The hyperspectral heat cube. (b) Sample spectral radiance curves for the pixels cross-marked in (a). (c) The material library $ {\mathcal {M}}$ used for TeX-SGD. This material library is the ground truth library that we used to simulate the scene. (d) The residues in fitting the radiance curve of the pixel marked ‘Sky’ with all possible materials in the library. The minimum residue gives a prediction of ‘Sky’, which is correct. In (a), B: Bark, F: Foliage, R: Rock, and W: Water.
Fig. 4.
Fig. 4. Workflow of texture distillation in TeX vision, to reconstruct the texture keeping only the sky illumination, mimicking daylight optical imaging.
Fig. 5.
Fig. 5. TeX vision generated by TeX-SGD recovers geometric textures and disentangles temperature and emissivity close to the ground truth.

Equations (7)

$$S_{\alpha\nu} = e_{\alpha\nu}\,B_\nu(T_\alpha) + \left[1 - e_{\alpha\nu}\right] X_{\alpha\nu},\tag{1}$$

$$X_{\alpha\nu} = \sum_{\beta \neq \alpha} V_{\alpha\beta}\,S_{\beta\nu},\tag{2}$$

$$B_\lambda(T) = \frac{2hc^2}{\lambda^5}\,\frac{1}{e^{hc/(\lambda k_B T)} - 1},\tag{3}$$

$$\tilde{S}^{k}_{\alpha\nu} = e_\nu(m_\alpha)\,B_\nu(T_\alpha) + \left[1 - e_\nu(m_\alpha)\right]\tilde{X}^{k}_{\alpha\nu},\tag{4}$$

$$\tilde{X}^{k}_{\alpha\nu} = V_{\alpha 1}\tilde{S}_{1\nu} + V_{\alpha 2}\tilde{S}_{2\nu} + \cdots + V_{\alpha k}\tilde{S}_{k\nu},\tag{5}$$

$$\left\{T^{k}_{\alpha},\, m^{k}_{\alpha},\, V^{k}_{\alpha;1,\dots,k}\right\} = \operatorname*{arg\,min}_{T,\,m,\,V}\ \delta^{k}_{\alpha},\tag{6}$$

$$\delta S = \delta T\,e\,\partial_T B + \delta e\left[B(T) - X\right] + \delta X\,(1 - e).\tag{7}$$
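For reference, Eq. (1) and Planck's law in Eq. (3) can be evaluated directly. The sketch below is our own minimal implementation, not code from the paper (helper names are ours; SI units assumed):

```python
import numpy as np

H = 6.62607015e-34   # Planck constant, J s
C = 2.99792458e8     # speed of light, m / s
KB = 1.380649e-23    # Boltzmann constant, J / K

def planck_B(lam, T):
    """Spectral radiance B_lambda(T) of Eq. (3) in W / (m^2 sr m);
    lam is the wavelength in meters, T the temperature in kelvin."""
    return (2.0 * H * C**2 / lam**5) / np.expm1(H * C / (lam * KB * T))

def forward_radiance(e, T, X, lam):
    """Per-band heat signal of Eq. (1): direct emission scaled by the
    emissivity e plus the scattered environmental radiance X."""
    return e * planck_B(lam, T) + (1.0 - e) * X
```

A quick sanity check: at 300 K, `planck_B` peaks near 9.7 $\mu$m, squarely inside the 8-14 $\mu$m LWIR window, which is why room-temperature scenes are imaged in that band.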