Reliable image dehazing by NeRF

Zheyan Jin; Zhihai Xu; Huajun Feng; Qi Li; Yueting Chen

doi:10.1364/OE.514044

1. Introduction

Image dehazing is used to deal with the negative effects of medium scattering on imaging. Without suitable and excellent image dehazing algorithms, the down-stream applications such as image signal processor (ISP) and image recognition will fail. Image dehazing algorithms have been developing rapidly for more than a decade, but there are still several problems that need to be addressed.

Firstly, different scattering media and different lighting environments can cause fog to appear in different colors. During night time, various artificial color sources need to be considered when dehazing, and fog often appears blue and green when in water, while dust often appears as a yellow-brown color. These changing colors cannot be correctly perceived by a single dehaze model.

Compared to a uniform haze scene, existing dehaze models have poor processing capabilities for non-uniform haze [1]. Existing models are well-suited to handling uniform haze and uniform lighting conditions. In reality, the fog may be non-uniform, and fog may not be irradiated by uniform lights. Non-uniform fog and non-uniform lighting conditions result in non-uniform haze images, making most existing dehaze models fail.

Thirdly, it is difficult to obtain high-quality image dehazing datasets. Many low-level image problems (such as demosaic, deblurring, super-resolution, jitter removal, and denoising) are relatively easy to perform internal camera calibration and dataset acquisition because the degradation occurs inside the camera. Outdoor haze is difficult to generate and remove by equipment, and it is also difficult to maintain consistent lighting and scenery. There are few real shooting data for outdoor dehazing, and the production cost is high. The data generated using a fog machine is difficult to meet the real haze distribution. However, dehazing algorithms are essentially a mapping problem in the image domain [2]. Without a reliable and appropriate real shot training set, the mapping of the network will only be close to simulated data. There will be a problem of mapping domain shift, causing the results of processing real haze scenes to often rely on the priori of the simulated data set.

Finally, the robustness of image dehazing algorithms is poor. In addition to various color biases, there are also cases of insufficient dehazing and excessive dehazing. In many scenes, the concentration of haze will change, but the corresponding haze-free image is unique. Excellent algorithms should be adaptive to the concentration of haze, and changes in the concentration of haze should not affect the processing results. It is important to ensure that the no haze scene remains unchanged after processing.

To solve the above problem, we proposed a brand new haze model. We abandoned the simple uniform haze model and combined the optical scattering formula with the computer graphics rendering formula. The new model can effectively deal with non-uniform haze and the impact of non-uniform lighting on images. Since the light scattering model can handle all types of scattering scenarios, the new model can also be used in multiple fields such as ordinary dehazing, night-time dehazing, remote sensing dehazing, and underwater dehazing without any prior knowledge of each field.

Based on the above new model, we proposed a brand new defogging dataset generation pipeline based on three-dimensional reconstruction voxel deletion. We have proved through theoretical derivation and experimentation that the integral state of the haze scene formed based on the scattering model can be rewritten as a rendering formula. Based on the rendering formula and the correlation between the neural radiation field, we can establish a three-dimensional space based on the haze image, and calculate the haze color and haze intensity of each region in the space under the stable state of haze scattering. The translucent voxels in space are created by the haze mapping relationship in the image. After removing the semi-transparent voxels in the space and re-rendering, we can get the dehazed image, as shown in Fig. 1. We have proved the effectiveness of the entire dehazing pipeline through simulation data and real shooting data. Based on the brand new data generation pipeline, we can process many real shooting data that could not be obtained before, greatly improving the quality of existing datasets. Using these real shooting datasets as training sets can also quickly improve network performance.

In order to improve the robustness of the defogging algorithm and the potential of our generation pipeline, we propose a reliable data augmentation method, which controls the haze by nonlinearly adjusting the brightness of semi-transparent voxels. We can freely adjust the intensity of haze on real shooting data to obtain diverse data varying from no fog, light fog to heavy fog. The haze augmented by these data still retains the non-uniformity of haze and the non-uniformity of illumination. The texture details and illumination are very realistic, and the original image and enhanced image cannot be distinguished subjectively. Feeding these different concentrations of data into the defogging network can greatly improve the robustness of the network. We have made part of the dataset public as we show in Dataset 1 (Ref. [3]).

Our main technical contributions are as follows:

• For non-uniform haze and non-uniform lighting scenes, we proposed a new haze model with optical scattering, which is approximated by computer graphics rendering and represented by neural radiance fields (NeRF).
• We proposed a new dehazing dataset generation pipeline based on three dimensional reconstruction and voxel deletion to solve the lack of existing real shooting datasets.
• To solve the problem of insufficient robustness of existing algorithms, we proposed a data augmentation method based on voxel control, which creates data with different intensities of haze to improve the robustness of dehazing algorithms.

2. Related work

2.1 Dehaze algorithm

Image dehazing is a type of low-level computer vision image restoration. Tang $et al.$ used a random forest regressor to estimate haze [4]. They randomly sampled from multiple clean images and extracted various multi-scale features related to fog, and then synthesized fog maps. The experimental results once again proved the importance of the dark channel feature DCP [5], and proved that the integration of various features can more accurately estimate the degree of cloud and fog coverage.

The method of dehazing based on deep learning can be divided into two stages: the initial network training to get intermediate parameters and then substitute the atmospheric degradation model to calculate the final fog-free image. Later models tend to directly learn from foggy images to fog free. The mapping of the fog image (called end-to-end) omits the solving of intermediate parameters, thereby reducing the generation of errors. In 2016, Cai $et al.$ introduced an end-to-end CNN network called DehazeNet [6]. The input of the model is a contaminated foggy image, and the output is the transmittance map $t(x)$ of the entire image, and then $t(x)$ and estimated the global atmospheric light is substituted into the degradation model to calculate a clean defogging image. Ren $et al.$ proposed a multi-scale deep neural network to estimate the transmittance [7]. The limitation of these methods is that only the transmittance is estimated separately through the CNN framework, so that the errors are amplified by each other. Chen $et al.$ proposed a threshold fusion sub-network [8], which uses GAN to achieve image defogging, and solves the common problem of unreal ghosting. With the recent development of transformer network architecture, DehazeFormer [9] is the SOTA of publicly defogged data SOTS [10].

Compared with ordinary image dehazing, the scene conditions for image dehazing at night are more complicated, and the research started relatively late. Jing $et al.$ proposed the NDIM algorithm, which processes a color correction step after estimating the color characteristics of the incident light [11]. Yu $et al.$ distinguished atmospheric light, haze light, glow, and light sources of different colors, and proposed an NHRG algorithm based on special processing of glow and recognition of different light sources at night [12]. Ancuti $et al.$ proposed a multi-scale patched pyramid network for artificial light sources to fit the night haze environment [13]. It is believed that the local maximum intensity of each color channel of the night image is mainly contributed by the ambient lighting, and the priori of the maximum reflectance is proposed, and the MRP algorithm is designed [14]. Later, the team also proposed a new method of constructing foggy data at night called OSFD [15], based on scene geometry, and then two-dimensional simulation of light and object reflectivity. They use the new haze rendered image proposed new algorithm and a benchmark test method.

Image dehazing algorithms have also evolved into the field of video, which often use multiple frames of information between the temporal variation of the object and haze [16]. Zhang $et al.$ proposed a method based on a mechanical arm retention of motion trajectory to capture of an indoor video dehaze dataset REVIDE [17], and proposed a corresponding fog removal method. Xu $et al.$ proposed the MPA-Net [18], a multi-range time alignment network with physical priors, as a video dehaze removal method based on REVIDE dataset.

2.2 View render

Neural Radiance Fields (NeRF) [19] is an implicit MLP-based model that maps 5D vectors to 3D coordinates plus 2D viewing directions to opacity and color values, computed by fitting the model to a set of training views. NeRF++ [20] tries to solve the ambiguity problem of image reconstruction in NeRF and presents a novel spatial parameterization scheme. PixelNeRF [21] can be achieved with fewer images. NGP [22] uses hash coding and other acceleration methods to greatly speed up the computation of neural radiation fields.

In addition to the reduction in the number of images required and calculation overhead, there are variations of NeRF type algorithms for each area. BlockNeRF [23] focuses on street view generation, megaNeRF [24] focuses on large scale images, wildNeRF [25] focuses on image fusion in different exposure ranges, and darkNeRF [26] focuses on low illumination at night.

2.3 Dehaze dataset and rendering

The most famous image dehazing dataset is RESIDE [10]. It is divided into ITS indoor dataset, OTS outdoor dataset, HSTS mixed subjective test set, and SOTS comprehensive subjective test set. There is also a dehazing dataset synthesized by NYU2Depth [27]. Both of these datasets are simulated. The CVPR NTIRE workshop from 2018 to 2021 released a dataset for competition every year, such as O-HAZE [28] and I-HAZE [29] datasets, which are outdoor and indoor real shot datasets. Later, the outdoor real-shot non-uniform defogging image pair datasets DenseHaze [30] and NH-HAZE [1] were released. But the amount of these data is relatively small, the scene is relatively single, and there is still a gap between the number of training needs.

The defogging data generation is divided into two types: mask construction and physical prior rendering. The mask construction is generally based on the simple atmospheric transmission model, which overlays the haze on the fog-free image, such as the RICE dataset [31]. The other, such as OSFD [15], divides the scene in the middle of the image semantically and then performs lighting and texture re-rendering based on the physical model. However, these data sets are still based on two-dimensional images to generate haze. Therefore, there are problems such as not complying with the three-dimensional haze scattering laws, failing to consider complex lighting environments, and being unable to provide three-dimensional reconstruction capabilities. Moreover, the haze in these images is often artificially created or simulated, and its distribution is still different from the haze distribution in real climate environments.

3. Methodology

In the following section, we first explain the new haze scattering model and compare it with existing dehazing models. The relationship between the scattering formula, rendering formula, neural radiation field formula, and haze image is demonstrated. Then we introduce the connection between the three dimensional neural radiation field and the haze space, and explain the relationship between non-transparent voxels and haze. To solve the voxel threshold problem in the neural radiation field, we introduce a constraint condition of image change speed. In addition, we also discuss the application changes of various haze types caused by lighting factors. Finally, the main process of the entire data generation pipeline is introduced.

3.1 More reliable haze model

The image degradation caused by haze is mainly due to the scattering of particles on light. Existing haze formulas and most datasets assume that haze is depth-related. However, the lighting and haze concentration in actual scenes are not uniformly consistent. For various reasons, haze of different concentrations is illuminated by the light of different intensities and colors in different directions. Existing simple dehazing models are difficult to accurately measure complex haze scenes, and we need a new and more reliable haze model.

The traditional dehazing formula is as follows,

(1) $$I(x) = R(x)t(x)+L(x)(1-t(x)),$$

where $x$ is the position of the pixel, $I(x)$ is the signal received by the camera pixel, $R(x)$ is the signal emitted by the object itself, $L(x)$ is the atmospheric global illumination, and $t(x)$ is a transmission rate. The transmission rate formula is as follows,

(2)$$t(x)=e^{-\beta \cdot d(x)},$$

where $d(x)$ is the distance from the object to the camera, $\beta$ is the attenuation coefficient, and $e$ shows that the attenuation is exponentially linear.

The formula above introduces depth information, but it is incorrect to assume that the light scattering caused by haze only occurs in the depth direction. The dehazing result of this model is often very different from reality. Due to its complexity, light scattering should be considered in three-dimensional space. At the same time, the concentration of haze in space, the degree of illumination of haze, and color all need to be considered.

Returning to the most fundamental model of medium scattering, the impact of haze on light can be mainly divided into absorption, out-scattering, emission, and in-scattering.

Absorption refers to the intensity of light absorbed by haze particles, as shown in Eq. (3):

(3)$$d L(x, \omega) / d x={-}\sigma_{a} L(x, \omega),$$

where $x$ represents haze particles and $\omega$ represents the angle which light emerges from the haze particles. $\sigma _{a}(x)$ represents absorption coefficient. $L(x,\omega )$ represents the intensity of light that emerges from the haze particles at angle $\omega$, and $\sigma _a$ represents the absorption coefficient of the haze particles.

Radiative transfer equation (RTE) as Eq. (4) and Eq. (5). $-\sigma _{s} L(x, \omega )$ represents out-scattering. $\sigma _{s}(x)$ represents out-scattering coefficient. $\sigma _{a} L_{e}(x, \omega )$ represents emission and $f_{p}\left (x, \omega, \omega ^{\prime }\right )$ is a phase function. $\int _{s^{2}} f_{p}\left (x, \omega, \omega ^{\prime }\right ) L\left (x, \omega ^{\prime }\right ) d \omega ^{\prime }$ represents in-scattering.

(4)$$\begin{array}{r} d L(x, \omega) / d x={-}\sigma_{t} L(x, \omega)+\sigma_{a} L_{e}(x, \omega)\\ +\sigma_{s} \int_{s^{2}} f_{p}\left(x, \omega, \omega^{\prime}\right) L\left(x, \omega^{\prime}\right) d \omega^{\prime}, \end{array}$$

(5)$$\sigma_{t}(x)=\sigma_{a}(x)+\sigma_{s}(x).$$

Approximating the derivation of the above formula to the volume can be expressed as Eq. (6), we call it volume rendering equation (VRE) [32]. VRE is the integral form of RTE. $M$ is opaque surface. $L_{i}(x, \omega )$ is in scattering in Eq. (5).

(6)$$\begin{aligned} L(P, \omega) = &T(M) L(M, \omega)\\ &+ \int_{0}^{d} T(x)\left[\sigma_{a} \cdot L_{e}(x, \omega)+\sigma_{s} \cdot L_{i}(x, \omega)\right] dx, \end{aligned}$$

here the transmittance $T(x)$ is the net reduction factor from absorption to out-scattering, which is formulated as follows,

(7)$$T(x)=e^{-\int_{x}^{p} \sigma_{t}(s) d s}.$$

Therefore, the above RTE and VRE formulas are true formulas for light scattering in both real world and computer graphics (CG) renderings. If pursuing accurate image dehazing is necessary, the user must abandon the simple dehazing formula shown in Eq. (1) and Eq. (2) and use RTE or VRE instead. Existing image processing technology is difficult to deal with this three-dimensional lighting problem, but we can use computer graphics rendering to study the removal and generation of haze.

3.2 Multilevel scattering approximation and steady state

The above RTE and VRE formulas are very complex and require a lot of parameter acquisition and calculation overhead. It is difficult to calculate each scattering event completely relying on image information alone. Fortunately, we don’t actually need to calculate every step of scattering and radiation specifically. What we need to calculate is the integral result of scattering or radiation. The intensity of the integral is determined by the light, and the haze particles in the air determine how many levels to integrate. The three dimensional space we finally see is actually the result of the final integration. We need to reconstruct and defog the space, which is the steady state after multi-level scattering approximation.

We assume that the $n$-level scattered light is $G_{n}$, and the scattering change function of each level of multi-level scattered light is ${f}_{ms}$, the formula is as follows,

(8)$$G_{n+1}=G_{n} \cdot \boldsymbol{f}_{m s}.$$

Each level of scattered light will become the light source of the next level of scattering. From a macroscopic scale, the mutual scattering between light rays is completely independent. Since the scattering change function ${f}_{ms}$ is determined by haze particles and is independent of light intensity, we can do multi-level scattering approximation, and the formula is as follows,

(9)$${F}_{ms}=1+{f}_{ms}+{f}_{ms}^{2}+{f}_{ms}^{3}+\ldots=\frac{1}{1-{f}_{ms}}.$$

So the scattering steady state ${F}_{ms}$ will eventually converge to the degree linearly related to each level of scattering change function ${f}_{ms}$ [33].

The formula demonstrates that the final outcome of light scattering must enter a converging stable state. In the actual scenario, light is constantly emitted from the source, and haze particles constantly scatter among themselves. The camera captures the result of the continuous integration of the light rays. As long as the haze and the scene do not change, the actual image we see remains unchanged. It is intuitive that light scattered by haze converges to a steady state. Therefore, when we begin to calculate the amount of fog in the space, we do not need to calculate the complex VRE process of light scattering, but only the stable state after scattering approximation.

3.3 Voxel and volume fog

After analyzing the real haze scattering model and rendering the model and scattering steady state, we can use the three-dimensional reconstruction model to fit the haze scattering steady state. The formula is as follows,

(10)$$\boldsymbol{r}(t)=\boldsymbol{o}+t \boldsymbol{d},$$

where $\boldsymbol {o}$ represents the 3D position of the camera pixel, and $\boldsymbol {d}$ represents the three-dimensional vector of the direction of the light emitted by the camera. Then the final camera pixel color is expressed as,

(11)$$\boldsymbol{C}=(r, g, b)=\int_{t_{n}}^{t_{f}} T(t) \sigma(\boldsymbol{r}(t)) \boldsymbol{c}(\boldsymbol{r}(t), \boldsymbol{d}) d t,$$

(12)$$T(t)=\exp \left(-\int_{t_{n}}^{t} \sigma(\boldsymbol{r}(s)) d s\right),$$

where $T(t)$ represents the accumulated transparency of the ray. $\boldsymbol {C}$ is the color that finally enters the camera. Then the formula for the entire 3D space can be represented by Eq. (13). The conversion from image to 3D space is completed,

(13)$$F_{\text{ haze }}:[(x, y, z),(\theta, \psi) \rightarrow[(r, g, b), \alpha].$$

Since multilevel scattering approximation and steady state exists (Eq. (9)), multi-level scattering can be approximated as once, and the rendering equation (Eq. (6)) and the neural radiation field integral (Eq. (11)) express the same meaning. That is to say, in the stable state of scattering, we can use the three-dimensional reconstruction ability of the neural radiation field to fit a stable foggy space where light and particles do not change.

The voxel with low transparency in space is the stable haze area in the real scattering model. We can eliminate volume fog by removing some voxels. Then Eq. (13) can be rewritten as Eq. (14). We only need to remove the voxels in space that are less than the alpha threshold to achieve accurate non-prior image dehazing,

(14)$$F_{\text{ dehaze }}:[(x, y, z),(\theta, \psi)\rightarrow[(r, g, b), (\alpha>\Delta t)].$$

3.4 Dehazing threshold ($\alpha$)

Due to the accuracy of the sampling, not all voxels are sampled on objects or haze, and some voxels are often sampled on the boundary between haze and objects. As shown in Fig. 2, black represents objects, and light gray represents haze. In most of the light gray and black pixel units, the model inferences some intermediate gray-scale pixel units with different gray scales. These voxels with different gray levels are the inaccurate sampled voxels. However, for the entire voxel space, the boundary area between objects and haze is relatively small. Most sampling lies on the black objects or light haze areas.

Fig. 1. The relationship between voxels and haze. We reconstruct the voxels in the space through multiple sets of foggy images. The fog can be removed by removing the voxels. We prove the effectiveness and reliability of this method through formula derivation, simulation experiments, and real shooting experiments. Therefore, we can reliably and accurately remove haze in various situations without requiring a large amount of data training or complex prior knowledge.

Download Full Size | PDF

Fig. 2. The voxel threshold removal’s regularity. There are specific guidelines for voxel removal that are based on the haze and object distribution found in real life. Not all voxels are objects or haze due to the accuracy of sampling, some voxels are frequently sampled to the boundaries of haze and objects. As seen, there are some voxels of intermediate grey voxels among the mostly light grey and black voxels. However, for the entire voxel space, this portion of the boundary is quite small. There will be three phases when the threshold starts to change: the removal of a large amount of haze, the removal of a small amount of haze and a small amount of object details, and the removal of a large amount of object details. So long as we capture the threshold in the middle phases, we can successfully remove the haze and leaves a significant amount of object details.

Download Full Size | PDF

In order to effectively remove haze while retaining the texture details of objects, we propose a threshold determination method based on image change speed. When the threshold starts to change, there are three stages: removing a large amount of haze, removing the interval between haze and object, and removing a large amount of object details. When the threshold starts to increase at a certain speed, a large amount of haze is removed at the beginning, and the image changes quickly. At this time, the main thing to be removed is haze. Then the same threshold increases significantly, and the image change speed slows down, starting to process the boundary area between haze and objects. After reaching the optimal point, with the same threshold change, the image change speed once again becomes faster, and the object part begins to be removed. After sampling (as shown in the right area of Fig. 2), the square starts to be removed from light to dark, and it can be found that the light color and the final dark color are removed quickly, while the middle part is removed slowly. The slowest part is the optimal solution we need, and it is also the optimal solution for separating objects and haze after sampling.

We show how to determine the threshold in Algorithm 1, the initial setting of $a_{max}$ at 0.2 is an empirical index. We summarize this value by calculating the speed of image change under different situation of thresholds. At last computing image SSIM at equal intervals. Later in the experimental part, we will prove its effectiveness.

3.5 Global and non-global haze

According to the relationship between the light source and the haze, we divide the hazy scene into four categories:

• Uniform illumination & uniform haze (eg: daytime normal dehazing, remote sensing, underwater)
• Non-uniform illumination & uniform haze (eg: nighttime dehazing)
• Uniform illumination & non-uniform haze (eg: daytime smoke or fog machines)
• Non-uniform illumination & non-uniform haze (eg: nighttime smoke or fog machines)

To facilitate discussion, we refer to uniform illumination & uniform haze as global haze and the other types of haze as non-global haze. In a global haze scene, the light often passes through the haze once before illuminating the object, and then passes through the haze again before entering the camera. In a non-global haze scene, most of the light first illuminates the object and then enters the camera after passing through the haze. Compared with non-global haze scenes, the light in global haze scenes passes through the haze scene more times, and this conclusion will affect the subsequent algorithm.

Algorithm 1. Cycle threshold comparison.

View Table | View all tables in this article

Therefore, for global haze, we need to double the difference between the rendered quasi-defogging image $Render(i_{opt})$ and the hazy image $Render(0)$ to deal with the haze between the light source and the object. For global haze, we not only need to deal with the haze between the object and the camera, but also pay attention to the haze between the light source and the object. It should be noted that the concept of incident light passing through haze is a macroscopic concept, while the entry and exit of light between different voxels in the above rendering and scattering equations is a microscopic process.

We have proposed an automatic haze type identifier (Algorithm 2) to distinguish between uniform haze and non-uniform haze automatically. Since we can collect fog samples throughout the entire space, we can determine the type of haze based on the statistical analysis of the distribution of translucent point samples. Regardless of the size of the haze, only the distribution of translucent point samples is statistics, seeking the distribution rule. We have set the threshold value to 10 in the Algorithm 2, which is obtained from experiments.

Algorithm 2. Haze type identifier and adaptive residual.

View Table | View all tables in this article

3.6 Pipeline overview and algorithm structure

We propose a software and hardware integrated dehaze data generation pipeline, as shown in Fig. 3. The system consists of three modules: haze image capture, voxel reconstruction and removal, and post-processing. Both our camera array and the movement of a single camera can capture images of haze. Then, we use a three-dimensional voxel space to reconstruct the haze in the image. We determine the appropriate threshold by rendering the image’s change rate at different thresholds to best eliminate the haze voxels in the area. At the same time, we set up a voxel-based statistical haze type detection mechanism to process the global or non-global haze scene step-by-step. To compensate for the quality degradation of haze-free images caused by various haze reasons and information loss during rendering, we choose a reference image super-resolution to maintain texture details.

Fig. 3. Pipeline overview. We developed a hardware and software-integrated haze removal system. Haze Image Take, Rebuild and Remove Voxels, and Post Processing are the three modules that make up the system. Either our camera array or the movement of a single camera to capture images of the haze. Then, using a number of images, the hazy 3D voxel space is recreated. We also choose an appropriate threshold to eliminate the voxels that are thought to be hazy in this area. By using the image’s change information, we limit the threshold. To obtain the required haze-free image, we re-render the 3D space with the haze voxels removed. In order to enhance the quality of rendered haze-free images due to various haze causes and information loss during rendering, we select specific image post-processing techniques. Image post-processing including image super-resolution and dehazing optimization for uniform and non-uniform haze scenes. Because our image dahazing processing is more accurate and closely resembles the real haze scene, it produces processing results that are more precise, the algorithm is more reliable, and it does not need prior knowledge or data-driven training.

Download Full Size | PDF

To compensate for the information loss during the rendering process and dehaze process, we can choose super resolution with reference images to preserve texture details based on the degree of degradation. It is worth noting that, to ensure fairness in the experimental comparison, all data and image processing results shown in this article have not been used for high-pass filtering.

As our method can take into account different lighting and haze conditions, all data is taken from real-life scenes, so it can be closely related to real haze scenes. At the same time our data process does not require prior knowledge or data-driven training and can be used for a variety of scenarios. Also this dataset generation pipeline itself is a high precision dehazing algorithm that can be used in remote sensing and scientific research where dehazing accuracy is required.

4. Data preparation

4.1 Simulate data by rendering engine

Outdoor shooting datasets need to consider issues such as object movement, changes in light, haze fluctuations, and the power of the haze machine. Therefore, it is very difficult to obtain a real dehazing dataset that maintains all other outdoor conditions. We cannot directly obtain the outdoor shooting dataset required for the experiment. At the same time, existing datasets do not provide a precise quantitative measure of haze concentration. Therefore, existing image dehaze data often only consists of two types: with fog and without fog. The dataset does not contain any variations in fog concentration in the middle. Even if there are image data pairs with different fog concentration, the dehaze images with different fog concentration do not correspond to the same groundtruth. Secondly, existing dehaze datasets are rarely based on different perspectives of the same scene, and cannot be used for 3D reconstruction. Finally, existing datasets cannot simultaneously contain different light sources and different types of fog. Therefore, in order to verify the reliability of our data generation algorithm, we need a haze simulation tool. This tool should be able to generate various concentrations of fog, and create various uniform and non-uniform fog and lighting. The generated scene should also comply with the scattering formula, VRE and RTE.

Therefore, we propose a simulation haze data construction method based on Unreal Engine5 (UE5). We can collect uniform and non-uniform haze scene data respectively through UE5. Due to the nanite rendering feature of the engine, the rendered image accuracy is far higher than sub-pixel level, and it is also far higher than the accuracy required for reconstruction and dehazing.

In computer graphics, many indicators in haze rendering correspond to the field of image dehazing in computational photography. For example, atmospheric fog and exponential height fog correspond to remote sensing haze and normal haze. The density, attenuation, scattering color, scattering distribution, and reflectivity of haze correspond to optical thickness, attenuation coefficient, light source color, atmospheric PSF, and glow haze gradient. Compared with two-dimensional image haze in computational photography, the three-dimensional parameters of computer graphics haze are more complex, and many parameters are not corresponding, such as projected volume shadows and volume scattering intensity. So we only need to adjust some parameters in computer graphics to generate simulation datasets. There are mainly two types of fog in UE5, exponential height fog and atmospheric fog [34]. For remote sensing images, atmospheric fog is more suitable, which has a certain response to height. Only exponential height fog is needed for other scenes. A single exponential height fog corresponds to a uniform haze scene, and the exponential height fog of volumetric fog can handle non-uniform haze scenes.

We move the virtual camera in the 3D space of the rendering engine. After fixing the camera in a certain position, we switch the parameters to take pictures of images with and without fog. Images containing transparency layers will be directly saved in the project folder. Since it is a virtual camera, the obtained image positions are completely aligned and registration is eazy. For some scenes, the objects being photographed have moving parts and special effects. In order to maintain the consistency of the scene, these moving parts and special effects need to be removed. We can choose different scenes and views, as shown in Fig. 4. The view posture result is generated by colmap [35]. We finally produced 180 different unreal scenes 3600 pairs of dehazing data with a resolution of 3000$\times$1600 pixels.

Fig. 4. Simulate Data by Rendering Engine. We get the simulated array image through the rendering engine. Provide sufficient and reliable data for subsequent experiments. The rendering engine’s volumetric fog algorithm is mature and conforms to the scattering formula. Moving objects and lights in the rendering engine can be turned off, and the haze color is determined by the light source. The haze can be adjusted in intensity or even turned off.

Download Full Size | PDF

From the simulation dataset in Fig. 4, we can observe that the collected images have characteristics of directional haze, atmospheric light, and attenuation of the original image signal. We can even discover image features that were previously not noticed, such as the haze reducing the contrast of shadows, and the illuminated haze itself becoming a light source to illuminate areas that are difficult to be directly illuminated by light. The formulas in the previous section prove that our simulation data conforms to the three-dimensional scattering law. However, compared with simulated haze, real haze data is more complex. So we are introducing how to capture real data.

4.2 Array camera and fog maker

We use various devices and methods to capture different scenes, as shown in Fig. 5. For remote sensing images, we use a moving drone to fly back and forth for capturing, with the camera always vertically facing the ground. For underwater scenes, we wait for the water and objects to stabilize and use a moving action camera to capture underwater photos. We record videos with an iPhone on foggy days to obtain uneven haze visual effects at night. We use a 600-watt ultrasonic atomizer to create haze indoors. The floating haze will affect the algorithm, and a single camera cannot handle this scene, so we use a camera array. We place a changing timestamp in the middle of the haze to synchronize the capture time.

Fig. 5. Array Camera and Fog Maker. In order to collect the data for typical scenes, we use a camera array. The camera array mount is 3D printed in resin. We create haze indoor using a 600w ultrasonic fog machine.

Download Full Size | PDF

4.3 Foggy image registration

The reconstruction of the haze-filled space requires image registration using colmap. As the density of the haze increases, the features in the image become increasingly blurred, making it difficult to register the images. We tested this using UE5 simulation data, and found that when the volumetric haze concentration exceeds 0.2, the success rate of feature matching drops significantly, and some images cannot be matched at all, making it almost impossible to use colmap matching. Fortunately, during the long-term data collection process on hazy days, such situations are rare. Most of the data has a voxel concentration distribution between 0.01 and 0.1, and there are almost no cases of high haze concentration in real images that would prevent image registration. Where registration is not possible, the camera array method described earlier can still be used with the previously calibrated camera parameters.

It is important to note that there is no direct correspondence between the haze concentration in UE5 and voxel density. The haze concentration needs to be converted to an equivalent voxel density after considering the multi-level lighting conditions. The volumetric haze concentration mentioned in the previous text is a parameter used in UE5.

5. Experimental assessment

5.1 Appropriate dehazing threshold

As shown in Fig. 2, after sampling, we change the threshold, and the speed of voxel deletion is fast at first, then slow, and then fast. The pseudocode 1 for cycle threshold comparison is also given in the previous part. Here we prove its effectiveness through a simulation experiment. The non-uniform haze simulation dataset is still obtained by rendering with UE5. The experimental results are shown in Fig. 6. Different haze densities are represented by curves of different colors, and the density ranges from 0 to 0.2, corresponding to the haze density parameter in the engine UE5. The upper image shows the variation of interval SSIM and interval PSNR at different densities. The lower image shows the PSNR of the rendered dehaze image under different thresholds compared to the haze-free reference image at different densities. The best haze removal effect is where the PSNR is the highest compared to the haze-free reference image. Using interval SSIM to determine the threshold, the most suitable threshold can be found at any haze concentration. The highest point of interval SSIM in Fig. 6 always falls vertically on the corresponding concentration curve where the PSNR is the highest. Relatively speaking, interval PSNR can also achieve the same effect as interval SSIM when the haze density is large. However, it is easy to produce deviations when there is no haze concentration or when the haze concentration is small. We believe the bias is due to the loss caused by NeRF sampling outweighing the negative effects of haze. Therefore, in the following part, we will use interval SSIM to determine the threshold.

Fig. 6. Cycle threshold comparison experiment. The threshold method’s viability is shown by the experimental results. Different haze concentrations are represented by lines of varying colors from 0 to 0.2. Dehazed images at different thresholds are compared in intervals PSNR and SSIM. Images and haze-free images are compared in PSNR by groundtruth at various thresholds. Hollow circles represent incorrect predictions, and filled circles represent correct predictions.

Download Full Size | PDF

SSIM (Structural Similarity Index) is a metric for measuring the similarity between images. We can subjectively consider the interval SSIM as measuring the speed of haze removal under threshold changes. When the speed of haze removal is the slowest, interval SSIM is the highest, indicating the highest similarity between the images, which corresponds to the threshold that we need to find during the haze removal process.

Whether there is haze in the scene or not, the number of images often affects the quality of the reconstruction of the 3D neural radiation field. The precision of calculating the entire voxel (haze particles) is limited by the number of sampled images and the discrete sampling precision of the network. The more input images there are, the higher the network accuracy, and the higher the calculation cost.

As shown in Table 1, both reconstruction and defogging result get better with the increase in the number of images, and eventually tend to saturate. For reconstructing haze scenes, the reconstruction effect of 4 to 10 images rapidly improves, and from 10 to 20 images, the effect slightly improves, sometimes with minor disturbances. Overall, more images are better, and the growth rate gradually slows down, reaching a saturation state at about 20 images. For defogging after reconstructing the haze, the defogging effect of 4 to 10 images also rapidly increases, and the growth rate is more obvious than that of the haze removal reconstruction effect. The growth rate of 10 to 20 photos slows down, but is still more obvious than the haze reconstruction effect. It also gradually saturates after more than 20 images.

Table 1. Quantitative comparison of the effect of different number of reconstructed images on haze reconstruction and haze removal results.

View Table | View all tables in this article

Many NeRF algorithms are currently being improved for a small number of images, but these algorithms often require encoding. The premise for spatial encoding to have a huge advantage is that there is no haze in the space. Our method can be accelerated by encoding, but the experimental comparison for different image number reconstruction and defogging does not use spatial encoding, but with the relatively stable original NeRF. The ground truth for haze reconstruction and removal comes from a non-uniform haze dataset generated by the UE5.

5.2 Number of reconstructed images

Whether there is haze in the scene or not, the number of images often affects the quality of the reconstruction of the 3D neural radiation field. The precision of calculating the entire voxel (haze particles) is limited by the number of sampled images and the discrete sampling precision of the network. The more input images there are, the higher the network accuracy, and the higher the calculation cost.

As shown in Table 1, both reconstruction and defogging result get better with the increase in the number of images, and eventually tend to saturate. For reconstructing haze scenes, the reconstruction effect of 4 to 10 images rapidly improves, and from 10 to 20 images, the effect slightly improves, sometimes with minor disturbances. Overall, more images are better, and the growth rate gradually slows down, reaching a saturation state at about 20 images. For defogging after reconstructing the haze, the defogging effect of 4 to 10 images also rapidly increases, and the growth rate is more obvious than that of the haze removal reconstruction effect. The growth rate of 10 to 20 photos slows down, but is still more obvious than the haze reconstruction effect. It also gradually saturates after more than 20 images.

5.3 Comparison of different reconstruction algorithms

The improvement method of NeRF has developed rapidly, with the main direction being to recover more texture details and faster computational speeds. These improvements often rely on certain coding and graphics priors to enhance the efficiency of the network. These improvements mainly focus on the transparent space, which is the high alpha voxel region. Most scenarios are clean and free of fog, so low alpha voxel regions are not considered in existing methods. When there is no fog in the three-dimensional space, the voxels in the space are transparent and have no weight, and the encoding can ignore these voxels to improve computational efficiency. However, when an image space has fog, the voxels in the three-dimensional space cannot be simply encoded and omitted, as each voxel needs to be accurately recorded to achieve correct image dehaze.

We have tested the ability of different reconstruction methods to reconstruct and dehaze different scenes under different scenarios, with a quantitative comparison shown in Table 2. The scenes are divided into three types: nohaze, uniform haze, and non-uniform haze. Compared to other methods, the basic NeRF [19] is the most expensive in terms of calculate expenses, but it is also the most accurate and stable. The other methods like NGP [22] use different precision encoding(NGP16 or NGP32), data pre-loading(Pre), and Nvidia acceleration methods(CUDA).

Table 2. Quantitative comparison of reconstruction and dehazing performance of different NeRF methods on simulated datasets.

View Table | View all tables in this article

In terms of the ability to reconstruct haze scene, the difference between different algorithms is not significant, with the native NeRF providing a slight improvement. NGP [22] and CUDA [36] have a significant speed advantage, but the reconstruction quality improves slightly after encoding and sampling. Using high-precision encoding can also provide some improvement, such as NGP32 always being better than NGP16. Regardless of the type of uniform or non-uniform haze, the differences between different methods are not significant.

Unlike haze reconstruction, there is a significant gap between different algorithms in terms of removing haze, and this gap is scene-dependent. Due to the decrease in sampling and perception accuracy, improved encoding methods may decrease the perception ability of haze. For scenarios without fog, improved coding methods will not degrade the effectiveness. For scenarios with non-uniform fog, the degradation is moderate, and for scenarios with uniform fog, the degradation is the most severe when using improved coding methods.

For the scenario of nohaze dehaze, the image should be automatically set the threshold at 0, and the quality of the image should not be greatly affected. The NeRF algorithm using encoding and acceleration is generally better, as voxel-based encoding are usually more compressed and closer to the surface of non-transparent objects. Better encoding of voxel can also better prevent them from being incorrectly removed by the slightly deviated threshold. Similarly, the original NeRF algorithm will remove some incorrect pixels through insufficient encoding. Overall, there is a visible difference between different encoding improvements for nohaze scenes, but the change is not significant.

For non-uniform dehaze, encoding can also reduce dehaze ability. Once encoding begins, non-uniform haze will lose its original transparency, and its pixel transparency weights will be encoded, causing a slight change in distribution, which can lead to degeneration during the dehaze process. Only traditional NeRF can handle non-uniform haze robustly. However, it is worth noting that in order to meet computational requirements, the original NeRF method also uses a rough and then fine encoding method, which cannot achieve absolute accuracy, but does have better sampling accuracy than other methods. In experimental results, we found that NGP-type methods can also handle some non-uniform haze, and the results can be close to traditional NeRF. Since there is less fog in non-uniform haze scenes than uniform haze scenes, the side effects of encoding are not very significant.

For uniform haze scenes, since the haze concentration is high and the entire image is covered by fog, the dehazing results are generally greatly degraded compared to the reconstruction results. The degradation tendency is also related to the ability of the encoding, as encoding causes a deviation in the transparency of all pixels in the entire space. As shown in Fig. 1, even for uniform haze scenes, the voxel after fitting are not completely low-opacity pixels but rather pixels transparency space similar to noise, which is a problem caused by encoding. Encoding can improve the efficiency of network fitting, but it also reduces the accuracy of dehazing.

5.4 Synthetic image dehazing

Firstly, it needs to be disclaimed that since our method is very novel and differs greatly from all existing dehaze methods, it is not appropriate to compare it with existing datasets. Moreover, comparing performance alone is not fair as it is not feasible to establish a fair baseline. Our method can significantly outperform all existing methods in both subjective and objective metrics, but it requires a significant amount of image and computational resources. Therefore, the comparison presented here is mainly intended to showcase the advantages and creativity of our method compared to existing dehaze methods, and to demonstrate the effectiveness of our approach.

Qualitative comparison: Simulation data does not have the movement of real-world fog and camera imaging errors, so our algorithm performs very well here. Subjective results are shown in Fig. 7. For other algorithms, even if the scene depth information is collected or based on the latest and most advanced network structures, it is difficult to ensure the accuracy of color, and often there are serious color errors. All existing defog algorithms are essentially a domain mapping from one image region to another image region, and only our algorithm considers the basic scattering principle of fog. Therefore, our algorithm visually almost matches the ground truth. Our method is superior to all existing algorithms in maintaining color and restoring texture details, whether there is uniform fog or non-uniform fog. In non-uniform lighting environments such as night time haze scene, other methods are almost impossible to completely dehaze. Even though some algorithms can accurately estimate the depth of objects in images, it is difficult to estimate the haze in the image. Haze is not simply related to depth, but also depends on the concentration of haze, the distribution of haze, and the lighting conditions. In contrast, our algorithm can effectively suppress the fog and scattered light caused by the light source, and restore details.

Fig. 7. Hazy image dehazing result with ground truth. The comparison of dehazing outcomes between uniform and non-uniform simulation datasets is shown in the first and second rows. It is evident that our method produces results that are more similar to actual haze-free scenes, richer picture details, all of the haze removed from them, and have more consistent color. A zoom-in of some details can be seen in the small image below.

Download Full Size | PDF

Quantitative comparison: Our algorithm outperforms all existing algorithms in all scenes, whether it is uniform haze or non-uniform haze, or nohaze scenes that reflect the robustness of the algorithm. The specific data is shown in Table 3. Real-world scene data is based on indoor photography in the laboratory, and non-uniform mist is generated using fog machines and air conditioning. Uniform haze is achieved through waiting, while non-uniform haze scene data is generated using UE5. Existing dehaze methods often fail in complex lighting scenarios, and objective indicators even indicate better quality for images with haze. However, our method consistently outperforms most algorithms in all indicators, verifying its effectiveness and reliability in dehazing.

Table 3. Quantitatively compare different dehazing algorithms in different scenarios on simulation unreal data and indoor real data.

View Table | View all tables in this article

5.5 Real image dehazing

The biggest problem with real-world experiments compared to simulated images is that it is difficult to maintain the stationary state of the scene and the fog, especially outdoors. Additionally, real-world images lack true value images as reference. Compared to simulated images, our method has some degradation in performance on real-world images due to the movement of haze and imaging errors. It is worth noting that all experiments require at least 10 reconstructed images.

Due to our method being based on the original scattering equation, our results perform well in various application scenarios. As shown in Fig. 8, our method can better restore the true colors and features of remote sensing images, resulting in more accurate colors for roads, vegetation, glass, and white car paint surfaces, which are consistent with those commonly observed in life. Our method can also completely remove thin cloud in non-uniform haze images while preserving completely transparent cloud areas. In underwater images, our method can remove the turbidity in the water while maintaining the correct colors. We use CIE2000 [37] to evaluate the color difference. The smaller the value of CIE, the smaller the color difference between the dehazing result image and the original image, and the better the image dehazing algorithm maintains color.

Fig. 8. Real hazy image dehazing result without ground truth. Our algorithm can effectively maintain the true color of the image even in the presence of remote sensing, underwater, non-uniform haze and nohaze scene. Other algorithms always distort colors and remove too much haze. Compared with the simulated images, our method also has a certain degenerate in the real-shot images, mainly due to the floating problem of haze and the scarcity of the number of reconstructed images.

Download Full Size | PDF

Furthermore, we tested our method on scenarios without haze, and our method can automatically detect scenarios without haze and reduce the intensity of haze removal by using the cycle threshold comparison. We tested our method using the LLFF dataset [38] and found that almost no negative effects were observed after haze removal. We also tested our method on night haze scenarios, and our method suppressed the uneven directional haze and retained the original night colors, while slightly correcting the colors as shown in Fig. 9.

Fig. 9. Real Nighttime image dehaze. We compared the performance of different algorithms under nighttime real shot data, and only our method was able to completely handle the fog over the street lights and maintain normal colors. PSNR, SSIM, and CIE were obtained from the same algorithm’s simulation data under nighttime conditions by UE5.

Download Full Size | PDF

Some algorithms can obtain depth information from single images and thus identify where there is haze, but it is not clear how much haze needs to be removed and what the true color of the scene will be after haze removal. Deep learning-based algorithms, even the latest transformer-based dehazing model dehazeformer [9], can achieve slightly better results than algorithms that only know the depth location information. However, the methods based on deep learning are still limited by the data distribution, and the limitations of the data will affect the ability of the network. They are prone to color distortion and overdehazing. Even with specifically optimized networks and datasets, such as in the night vision dehazing field, existing algorithms still perform much worse than ours. In general, when faced with images that the network is not familiar with, other methods may suffer from color distortion and overdehazing issues. All comparisons highlight the importance of data mapping in the field of dehazing. Without suitable datasets, single image dehazing is difficult to effectively handle complex haze.

5.6 Real image haze augmentation

Our method allows for excellent control over the removal and enhancement of haze pixels, and it also performs well for controlling the enhancement of haze. We adjust the weights of haze voxels, the amount of haze in the image changes with the adjustments. Since haze voxels record information about lighting, the amount of lighting received by each haze pixel is also adjusted accordingly, as shown in Fig. 10. When the parameters change, each pixel in the image will change according to the distribution of light and haze in three-dimensional space. Subjectively, enhanced images created using our method are difficult to distinguish from real images, and the quality of the generated images is high. Whether it is night-time indicators, road lamps, or tree leaves, the contrast between the illuminated and shadowed sides is high without fog, and the image becomes brighter as the fog increases.

Fig. 10. Real Nighttime Hazy Augmentation. Our method can effectively control the removal and enhancement of haze. 1 represents normal haze , 1-2 represents haze enhancement, and 0 represents dehaze. Although nighttime images without haze scattering have texture information, they are often dark. Therefore, we choose the CLAHE method to brighten them for easier visualization.

Download Full Size | PDF

We note that our method can remove all types of haze, but it cannot remove scattering flare caused by the lens. Our data will still preserve the scattering glare after dehaze processing because the lens defects are different from fog, which does not change with perspective. Since the relationship between different concentrations of haze images and nohaze images is one-to-many, the haze augmentation method is a good method for enhancing real shot dehaze data, which can significantly increase the quantity of the training data.

5.7 Single image dehazing dataset

Existing methods for dehazing are mainly limited by the lack of reliable data sets, making it difficult to achieve reliable image domain mappings. Due to the high reliability and robustness of the method we propose, we treat the generated results as valuable real-world data sets. We feed our real-world data into the Restormer [39] network as the training pairs. The result is shown in Fig. 11. Our single image nigittime dehazing result is also very good, as learning a reliable nighttime haze mapping relationship, knowing that night fog is a scattering model, the contrast between bright and dark images without fog is strengthened. It can also effectively remove non-uniform lighted haze while ensuring the details in the dark areas.

Fig. 11. Nighttime single image dehaze result. The image is from the widely recognized subjective night vision testing set without groundtruth.

Download Full Size | PDF

Due to the aforementioned haze data augmentation ability, as well as the inherent angle interpolation rendering capability of the NeRF type method, our real shot dataset can be quickly expanded in both the haze concentration and the perspective dimensions, and these expansions contain the domain mapping information of the haze and the light. We conducted ablation experiments on these augmented capabilities and demonstrated the improvement in single image defog performance realized by our data augmentation in Table 4. "RenderEngine" is a dataset generated by UE5 for night time scenes. "Our pipeline" refers to the result of our proposed entire dehaze data generation pipeline. "View angle+" is a feature provided by NeRF that enables interpolation of view angles. "Haze alpha+" is a function that changes the concentration of haze in the scene, with the range changing as shown in Fig. 10. In our experiment, the angle increased by one, while the alpha of haze increased by 1.2 to 2.0.

Table 4. Results of different training data generation and augmentation methods ablation experiments on DehazeFormer.

View Table | View all tables in this article

6. Conclusion

Our work proposes new ideas for high performance, high confidence, and robust image dehazing, combining computer rendering, image dehazing, and 3D reconstruction. Exciting results have been achieved on both simulated and real-shot images. Our algorithm is very accurate and robust, it can be used in various fields, and does not require any prior knowledge and training process. Our method has various advantages but also comes with a cost. It requires a camera array or video, which is easily obtainable in remote sensing, autonomous driving, and scientific experiments. However, it is not feasible in personal consumer imaging, and it requires several orders of magnitude more computational resources than traditional feed forward networks. Despite these drawbacks, we believe that our method represents a step towards reliable and high-quality capture of haze in real-world environments. Our approach addresses three of the key challenges in existing dehaze techniques: the lack of real-world dataset, inaccuracy of domain mapping, and lack of robustness of algorithms.

Although the threshold discrimination method adopted in this article is simple and understandable, its effectiveness still has room for improvement. Future improvements in coding for translucent voxels may include techniques such as signed distance field (SDF) [40] to distinguish between clouds and object boundaries, improving the accuracy of cloud removal. At the same time, better image defog datasets are soon to be released. After applying this method, anyone with an array camera or simple video can quickly and cheaply obtain outdoor and indoor defog image datasets with minimal effort. Additionally, dehaze dataset simulation becomes simpler, and large amounts of 3D data like LLFF [38] can be transformed into dehaze datasets.

Funding

National Natural Science Foundation of China (No. 62275229); Civil Aerospace Pre-Research Project (No. D040104).

Acknowledgments

We thank Meijuan Bian from the facility platform of optical engineering of Zhejiang University for instrument support.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are available as Dataset 1 (Ref. [3]).

References

1. C. O. Ancuti, C. Ancuti, and R. Timofte, “Nh-haze: An image dehazing benchmark with non-homogeneous hazy and haze-free images,” in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), (2020).

2. Y. Shao, L. Li, W. Ren, et al., “Domain adaptation for image dehazing,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, (2020), pp. 2808–2817.

3. Z. Jin, “Reliable image dehazing by nerf,” github (2023) [retrieved 03 Jan 2024], https://github.com/madone7/nerf_dehaze,.

4. K. Tang, J. Yang, and J. Wang, “Investigating haze-relevant features in a learning framework for image dehazing,” in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition (CVPR), (2014).

5. K. He, J. Sun, and X. Tang, “Single image haze removal using dark channel prior,” IEEE Trans. Pattern Anal. Mach. Intell. 33(12), 2341–2353 (2011). [CrossRef]

6. B. Cai, X. Xu, K. Jia, et al., “Dehazenet: An end-to-end system for single image haze removal,” IEEE Trans. on Image Process. 25(11), 5187–5198 (2016). [CrossRef]

7. W. Ren, J. Pan, H. Zhang, et al., “Single image dehazing via multi-scale convolutional neural networks with holistic edges,” Int. J. Comput. Vis. 128(1), 240–259 (2020). [CrossRef]

8. D. Chen, M. He, Q. Fan, et al., “Gated context aggregation network for image dehazing and deraining,” in 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), (2019).

9. Y. Song, Z. He, H. Qian, et al., “Vision transformers for single image dehazing,” arXiv, arXiv:2204.03883 (2022). [CrossRef]

10. S. Vashishth, R. Joshi, S. S. Prayaga, et al., “Reside: Improving distantly-supervised neural relation extraction using side information,” in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, (2018).

11. Z. Jing, C. Yang, and Z. Wang, “Nighttime haze removal based on a new imaging model,” in 2014 IEEE International Conference on Image Processing (ICIP), (2014).

12. L. Yu, R. T. Tan, and M. S. Brown, “Nighttime haze removal with glow and multiple light colors,” in IEEE International Conference on Computer Vision, (2015).

13. C. Ancuti, C. O. Ancuti, C. D. Vleeschouwer, et al., “Night-time dehazing by fusion,” in IEEE International Conference on Image Processing, (2016).

14. Z. Jing, C. Yang, F. Shuai, et al., “Fast haze removal for nighttime image using maximum reflectance prior,” in IEEE Conference on Computer Vision and Pattern Recognition, (2017).

15. J. Zhang, Y. Cao, Z.-J. Zha, et al., “Nighttime dehazing with a synthetic benchmark,” in Proceedings of the 28th ACM International Conference on Multimedia, (Association for Computing Machinery, New York, NY, USA, 2020), MM ’20, p. 2355–2363.

16. B. Li, X. Peng, Z. Wang, et al., “End-to-end united video dehazing and detection,” (2017).

17. X. Zhang, H. Dong, J. Pan, et al., “Learning to restore hazy video: A new real-world dataset and a new method,” in Computer Vision and Pattern Recognition, (2021).

18. J. Xu, X. Hu, L. Zhu, et al., “Video dehazing via a multi-range temporal alignment network with physical prior,” arXiv, arXiv:2303.09757 (2023). [CrossRef]

19. B. Mildenhall, P. P. Srinivasan, M. Tancik, et al., “Nerf: representing scenes as neural radiance fields for view synthesis,” Commun. ACM 65(1), 99–106 (2022). [CrossRef]

20. K. Zhang, G. Riegler, N. Snavely, et al., “Nerf++: Analyzing and improving neural radiance fields,” arXiv, arXiv:2010.07492 (2020). [CrossRef]

21. A. Yu, V. Ye, M. Tancik, et al., “pixelnerf: Neural radiance fields from one or few images,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2021), pp. 4578–4587.

22. T. Müller, A. Evans, C. Schied, et al., “Instant neural graphics primitives with a multiresolution hash encoding,” arXiv, arXiv:2201.05989 (2022). [CrossRef]

23. M. Tancik, V. Casser, X. Yan, et al., “Block-nerf: Scalable large scene neural view synthesis,” arXiv, arXiv:2202.05263 (2022). [CrossRef]

24. H. Turki, D. Ramanan, and M. Satyanarayanan, “Mega-nerf: Scalable construction of large-scale nerfs for virtual fly-throughs,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2022), pp. 12922–12931.

25. R. Martin-Brualla, N. Radwan, M. S. Sajjadi, et al., “Nerf in the wild: Neural radiance fields for unconstrained photo collections,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2021), pp. 7210–7219.

26. B. Mildenhall, P. Hedman, R. Martin-Brualla, et al., “Nerf in the dark: High dynamic range view synthesis from noisy raw images,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2022), pp. 16190–16199.

27. P. Kohli, N. Silberman, D. Hoiem, et al., “Indoor segmentation and support inference from rgbd images,” in ECCV, (2012).

28. C. O. Ancuti, C. Ancuti, R. Timofte, et al., “O-haze: A dehazing benchmark with real hazy and haze-free outdoor images,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), (2018).

29. C. Ancuti, C. O. Ancuti, R. Timofte, et al., “I-haze: A dehazing benchmark with real hazy and haze-free indoor images,” in Advanced Concepts for Intelligent Vision Systems: 19th International Conference, ACIVS 2018, Poitiers, France, September 24–27, 2018, Proceedings 19, (Springer, 2018), pp. 620–631.

30. C. O. Ancuti, C. Ancuti, M. Sbert, et al., “Dense haze: A benchmark for image dehazing with dense-haze and haze-free images,” arXiv, arXiv:1904.02904 (2019).

31. C. Xie, A. Mousavian, Y. Xiang, et al., “Rice: Refining instance masks in cluttered environments with graph neural networks,” in Conference on Robot Learning, (PMLR, 2022), pp. 1655–1665.

32. J. Novák, I. Georgiev, J. Hanika, et al., “Monte carlo methods for physically based volume rendering,” in ACM SIGGRAPH 2018 Courses, (2018).

33. S. Hillaire, “A scalable and production ready sky and atmosphere rendering technique,” Computer Graphics Forum (2020).

34. EPIC, https://docs.unrealengine.com/4.27/en-us/building-worlds/fogeffects/, (2022).

35. J. L. Schönberger and J.-M. Frahm, “Structure-from-motion revisited,” in Conference on Computer Vision and Pattern Recognition (CVPR), (2016).

36. T. Müller, “Tiny CUDA neural network framework,” (2021). Https://github.com/nvlabs/tiny-cuda-nn.

37. M. R. Luo, G. Cui, and B. Rigg, “The development of the cie 2000 colour-difference formula: Ciede2000,” Color Res. & Appl. 26(5), 340–350 (2001). [CrossRef]

38. B. Mildenhall, P. P. Srinivasan, R. Ortiz-Cayon, et al., “Local light field fusion: Practical view synthesis with prescriptive sampling guidelines,” ACM Transactions on Graphics (TOG) (2019).

39. S. W. Zamir, A. Arora, S. Khan, et al., “Restormer: Efficient transformer for high-resolution image restoration,” (2021).

40. W.-T. Chen, W. Yifan, S.-Y. Kuo, et al., “Dehazenerf: Multiple image haze removal and 3d shape reconstruction using neural radiance fields,” arXivarXiv:2303.11364 (2023). [CrossRef]

Array image number	4	6	8	10	12	14	16	18	20
PSNR-reconstruction	16.58	17.32	19.49	20.83	21.14	21.23	21.10	21.06	21.32
SSIM-reconstruction	0.539	0.568	0.718	0.737	0.744	0.754	0.755	0.754	0.754
PSNR-dehaze	15.10	15.59	17.91	19.43	19.46	20.26	20.36	20.46	20.52
SSIM-dehaze	0.436	0.433	0.612	0.622	0.636	0.646	0.652	0.653	0.662

NeRF method	NeRF	NeRF+NGP16	NeRF+NGP32	NeRF+NGP+Pre	NeRF+CUDA
non-uniform(reconst.) PSNR	20.26(0%)	20.10(1.8%)	20.11(1.7%)	20.10(1.8%)	20.06(2.3%)
non-uniform(reconst.) SSIM	0.7862(0%)	0.7438(20%)	0.7450(19%)	0.7445(20%)	0.7392(22%)
non-uniform(dehaze) PSNR	20.98(0%)	17.31(52%)	17.35(52%)	17.56(48%)	17.16(55%)
non-uniform(dehaze) SSIM	0.7646(0%)	0.6634(43%)	0.6652(42%)	0.6159(63%)	0.6718(39%)
uniform(reconst.) PSNR	37.23(0%)	36.18(13%)	36.19(13%)	36.20(13%)	35.87(17%)
uniform(reconst.) SSIM	0.9220(0%)	0.9118(13%)	0.9121(13%)	0.9119(13%)	0.9007(27%)
uniform(dehaze) PSNR	20.74(0%)	19.32(18%)	19.87(10%)	19.18(20%)	14.90(95%)
uniform(dehaze) SSIM	0.7918(0%)	0.7679(11%)	0.7711(10%)	0.7673(12%)	0.6643(61%)
nohaze(reconst.) PSNR	39.89(0%)	39.59(4%)	39.58(4%)	39.60(3%)	39.49(5%)
nohaze(reconst.) SSIM	0.8771(100%)	0.9382(1%)	0.9375(2%)	0.9388(0%)	0.9387(0%)
nohaze(dehaze) PSNR	38.17(17%)	38.94(7%)	38.98(7%)	38.93(7%)	39.54(0%)
nohaze(dehaze) SSIM	0.9886(5%)	0.9492(370%)	0.9537(329%)	0.9507(356%)	0.9892(0%)

Dehazing method	Haze	depth+BCCR	STMRF	DehazeFormer	Ours
non-uniform real	19.18 / 0.4348	18.85 / 0.4412	15.74 / 0.4438	18.93 / 0.4509	26.02 / 0.7152
non-uniform unreal	13.19 / 0.6229	15.54 / 0.6326	16.10 / 0.6491	14.82 / 0.6482	20.98 / 0.7646
uniform real	15.28 / 0.5668	10.09 / 0.4653	13.79 / 0.5560	10.90 / 0.4966	15.83 / 0.5745
uniform unreal	14.36 / 0.6387	12.79 / 0.6500	20.84 / 0.7227	14.56 / 0.6564	20.74 / 0.7918
nohaze real	$\infty$ / 1	12.28 / 0.8415	14.64 / 0.8095	16.56 / 0.8521	25.43 / 0.8578
nohaze unreal	$\infty$ / 1	22.62 / 0.9625	21.77 / 0.8372	25.87 / 0.8660	40.17 / 0.9886

RenderEngine(train)	✓		✓	✓		✓	✓
Our pipeline(train)		✓	✓		✓	✓	✓
View angle+				✓	✓	✓	✓
Alpha haze+							✓
RenderEngine(test) PSNR	25.04	16.87	27.89	25.77	17.45	28.87	29.26
RenderEngine(test) SSIM	0.853	0.559	0.891	0.900	0.643	0.899	0.903
Our pipeline(test) PSNR	24.70	28.05	33.83	25.45	29.30	34.26	34.40
Our pipeline(test) SSIM	0.601	0.800	0.833	0.692	0.830	0.916	0.919

Array image number	4	6	8	10	12	14	16	18	20
PSNR-reconstruction	16.58	17.32	19.49	20.83	21.14	21.23	21.10	21.06	21.32
SSIM-reconstruction	0.539	0.568	0.718	0.737	0.744	0.754	0.755	0.754	0.754
PSNR-dehaze	15.10	15.59	17.91	19.43	19.46	20.26	20.36	20.46	20.52
SSIM-dehaze	0.436	0.433	0.612	0.622	0.636	0.646	0.652	0.653	0.662

Reliable image dehazing by NeRF

Abstract

1. Introduction

2. Related work

2.1 Dehaze algorithm

2.2 View render

2.3 Dehaze dataset and rendering

3. Methodology

3.1 More reliable haze model

3.2 Multilevel scattering approximation and steady state

3.3 Voxel and volume fog

3.4 Dehazing threshold ($\alpha$)

3.5 Global and non-global haze

3.6 Pipeline overview and algorithm structure

4. Data preparation

4.1 Simulate data by rendering engine

4.2 Array camera and fog maker

4.3 Foggy image registration

5. Experimental assessment

5.1 Appropriate dehazing threshold

5.2 Number of reconstructed images

5.3 Comparison of different reconstruction algorithms

5.4 Synthetic image dehazing

5.5 Real image dehazing

5.6 Real image haze augmentation

5.7 Single image dehazing dataset

6. Conclusion

Funding

Acknowledgments

Disclosures

Data availability

References

Supplementary Material (1)

Data availability

Cited By

Figures (11)

Tables (6)

Equations (14)

Optics Express