
Tuning-free and self-supervised image enhancement against ill exposure

Open Access

Abstract

Complex lighting conditions and the limited dynamic range of imaging devices result in captured images with ill exposure and information loss. Existing image enhancement methods based on histogram equalization, Retinex-inspired decomposition models, and deep learning suffer from manual tuning or poor generalization. In this work, we report an image enhancement method against ill exposure with self-supervised learning, enabling tuning-free correction. First, a dual illumination estimation network is constructed to estimate the illumination of under- and over-exposed areas, from which we obtain the corresponding intermediate corrected images. Second, given that these intermediate corrected images have different best-exposed areas, Mertens’ multi-exposure fusion strategy is utilized to fuse them into a well-exposed image. The correction-fusion scheme allows adaptive handling of various types of ill-exposed images. Finally, a self-supervised learning strategy is studied that learns global histogram adjustment for better generalization. Compared to training on paired datasets, we only need ill-exposed images. This is crucial in cases where paired data are inaccessible or less than perfect. Experiments show that our method can reveal more details with better visual perception than other state-of-the-art methods. Furthermore, the weighted average scores of the image naturalness metrics NIQE and BRISQUE, and the contrast metrics CEIQ and NSS, on five real-world image datasets are boosted by 7%, 15%, 4%, and 2%, respectively, over the recent exposure correction method.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Affected by the limited dynamic range of imaging devices and complex lighting conditions, ill exposure (e.g., under- and over-exposure) becomes one of the most common causes of image quality degradation. The images with ill-exposed areas have poor visual perception and aren’t conducive to high-level vision tasks such as segmentation, detection, and tracking [1]. Therefore, it’s necessary to perform image enhancement on the ill-exposed image.

Traditional image enhancement methods against ill exposure are mainly based on the histogram and the tone curve [2]. Histogram-based methods [3–5] improve contrast by stretching the dynamic range of the input image but may result in unnatural colors and over-exposure. Tone-curve-based methods [6,7] adjust the image by a known or estimated tone curve. However, they do not seem to work well on over-exposed images and may induce unnatural results [8]. In addition, the Retinex theory [9] is widely used in under-exposure correction, in other words, low-light image enhancement. One class of Retinex-inspired methods [10–12] is based on the image formation concept: any image is the point-wise product of the reflectance of the materials composing the acquired scene and the illumination of that scene. Therefore, removing the illumination component from the input image can discard illuminant-dependent features (e.g., possible color dominants due to the light) while retaining the structural appearance of the surfaces displayed in the picture [13]. Zhang et al. cast under- and over-exposure correction as trivial illumination estimation of the input image and the inverted input image, although the correction results are parameter-dependent [8].

With the booming development of deep learning on various vision tasks, learning-based methods have been applied to under- [14–16] and over-exposure [17] correction. Recent work by Afifi et al. [2] achieves simultaneous under- and over-exposure correction in a single network architecture. To address the color distortion and lack of correction consistency seen in Afifi’s work, Nsampi et al. [18] introduce a deep feature matching loss that enables the network to learn an exposure-invariant representation in the feature space. Both works require paired data for training, which is derived from the exposure correction dataset in [2]. This dataset is rendered from the MIT-Adobe FiveK dataset [19], which has 5,000 raw-RGB images and corresponding sRGB images rendered manually by five expert photographers. As the ground truth images, [2] adopts images that are manually retouched by an expert photographer (referred to as Expert C). However, these reference images may not have excellent contrast and exposure on every image block (e.g., Fig. 1(a)~(d)). The effectiveness of supervised learning methods depends on training data quality, which leads to these two methods performing poorly on images outside the dataset, see Fig. 1(e)~(h).

Fig. 1. The limited generalization of supervised learning methods. Top Row: different exposure intensity images (a)~(c) and their ground truth image (d) in Afifi’s dataset [2]. Bottom Row: visual comparison of supervised learning methods MSEC (Afifi et al.) [2] and ECCM (Nsampi et al.) [18] on real-world image dataset DICM [5] with the presented technique. MSEC and ECCM are trained on Afifi’s dataset.

In this work, we report a tuning-free and self-supervised image enhancement method against ill exposure. It offers adaptive correction for various types of ill exposure without training on paired datasets, boosting its practicability and generalization. The main contributions are summarized as follows:

  • We built a dual illumination estimation network to estimate the illumination in under- and over-exposed areas. Thus, we can get the corresponding intermediate corrected images for under- and over-exposure. It allows handling various kinds of ill-exposed scenes in a tuning-free manner.
  • We employ Mertens’ multi-exposure fusion strategy to fuse the intermediate corrected images, which have different best-exposed areas. It gives the final corrected image proper exposure with adequate details and vibrant colors.
  • We study a self-supervised learning strategy that learns to adjust the global histogram and enhances the contrast of images. Therefore, we only need ill-exposed images as training data, which is crucial in cases where paired data are inaccessible or less than perfect.
  • Extensive experiments validate that the proposed method has superior performance in processing real-world images and is conducive to high-level vision tasks. The weighted average scores of the image naturalness metrics NIQE and BRISQUE, and the contrast metrics CEIQ and NSS, on five commonly adopted datasets are boosted by 7%, 15%, 4%, and 2%, respectively, over the recent exposure correction method ECCM [18].

The rest of this article is organized as follows. The related works are covered in Sec. 2. The details of the reported method are presented in Sec. 3. The experiment results are shown in Sec. 4. In Sec. 5, we conclude this work.

2. Related works

2.1 Non-learning-based methods

Image enhancement, as an image processing problem, has been actively studied for a long time. Non-learning-based methods are mainly based on histogram equalization and the Retinex theory. Histogram-based methods improve contrast by expanding the dynamic range of images. Histogram equalization at the global scale is less effective when the contrast characteristics vary across the image. AHE [20] overcomes this drawback by generating the mapping for each pixel from the histogram in a surrounding window. CLAHE [21] adds a clip threshold to AHE: if a histogram bin exceeds the threshold, it is clipped and the excess is redistributed evenly over all bins. However, these simple redistribution operations may produce serious unrealistic effects in the enhanced image as they ignore image structure information [22].
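As a concrete illustration of the clipping-and-redistribution idea, the sketch below applies CLAHE with OpenCV. It is a minimal example assuming an 8-bit grayscale input; the clip limit, tile grid size, and file names are illustrative choices rather than settings taken from the referenced works.

```python
import cv2

# Minimal CLAHE sketch (assumed settings, not taken from [20,21]):
# the histogram of each 8x8 tile is clipped at clipLimit and the excess
# is redistributed before per-tile equalization.
img = cv2.imread("ill_exposed.png", cv2.IMREAD_GRAYSCALE)

clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
enhanced = clahe.apply(img)

cv2.imwrite("clahe_result.png", enhanced)
```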

The Retinex theory was developed by Land and McCann; it presents a computational model for estimating the so-called human color perception from digital images [13]. Over the last decades, several Retinex-inspired methods have been exploited for image enhancement. To extract structural information from low-contrast images, one class of Retinex-inspired methods based on the image formation concept decomposes the input image into a reflectance map and an illumination map. Various strategies are then employed to enhance the reflectance and illumination maps. NPE [12] proposes a bi-log transformation, which maps the illumination to balance details and naturalness. In [11], the illumination of each pixel is first estimated individually by finding the maximum value in its RGB channels. Then the initial illumination map is refined by imposing a structural prior. SRIE [10] proposes a weighted variational model to preserve the estimated reflectance map with more details. We argue that most of these methods target low-light images, and it is not reasonable to directly apply them to over-exposure correction. Another class of non-learning-based methods inspired by Retinex is the Milano-Retinex family [23], which does not exploit image decomposition. The Milano-Retinex algorithms take as input a color image and enhance it by processing channel-wise the intensity of each pixel based on the spatial distribution of surrounding intensities [13].

2.2 Learning-based methods

Learning-based methods have attracted more attention in recent years due to their better accuracy, robustness, and speed compared to traditional methods [24]. One class of methods does not rely on the Retinex-inspired decomposition model and instead establishes image-to-image mappings through CNNs. For example, LLNet [25] is an early learning-based image enhancement work. It presents a variant of the stacked sparse denoising autoencoder and performs end-to-end learning to brighten and denoise low-light images. MBLLEN [26] is a multi-branch low-light enhancement network. It extracts rich features up to different levels, applies enhancement via multiple subnets, and finally produces the output image via multi-branch fusion. This category also includes works such as [1,27,28]. Another class of methods combines the Retinex-inspired decomposition model with supervised learning. For example, Wei et al. [16] propose a deep Retinex decomposition network that learns to decompose the observed image into reflectance and illumination for low-light image enhancement. Similar works are RUAS [29], KinD [30], Progressive Retinex [31], etc. Overall, supervised learning-based image enhancement methods are booming. To our knowledge, Afifi et al. [2] are the first to implement both under- and over-exposure correction based on supervised learning in a coarse-to-fine network. This work corrects exposure errors by Laplacian pyramid decomposition and reconstruction.

However, a non-negligible issue is that training on paired datasets may lead the network model to overfit and thus limit its generalization [24]. To address this challenge, methods based on self-supervised learning have been presented. EnlightenGAN [15] is the pioneer in learning to enhance low-light images using unpaired low-light/normal-light data. It employs an attention-guided U-Net as the generator and uses a dual discriminator to guide global and local information. Zero-DCE [14] eliminates the dependence on paired or unpaired data through a set of well-designed and formulated non-reference loss functions. Zhang et al. [32] design a light-up module and a noise disentanglement module for unsupervised Retinex decomposition of low-light images and reflectance map denoising. These self-supervised efforts target low-light image enhancement. To the best of our knowledge, we are perhaps the first to realize tuning-free and self-supervised image enhancement against ill exposure, including both under- and over-exposure.

3. Method

Our goal is to bring out the hidden information of the ill-exposed image in a tuning-free manner. Meanwhile, we prefer the network performance to be independent of the training data. Specifically, we introduce the histogram equalization prior, which achieves self-supervised learning of global histogram adjustment and improves image contrast. No paired data or clean data are required for this purpose. Furthermore, the technique enables adaptive correction of ill-exposed areas and produces well-exposed images through the correction-fusion scheme. The overall framework of the presented technique is shown in Fig. 2. The detailed description is divided into three parts as follows.

Fig. 2. Overview of the presented image enhancement technique. UEC and OEC respectively denote under-exposure correction and over-exposure correction.

3.1 Histogram equalization prior

The histogram equalization prior is based on histogram equalization enhanced images. Classical histogram equalization adjusts the grayscale histogram toward a uniform distribution on a global scale, which increases the contrast of the image. As shown in Fig. 3, whether under- or over-exposed, the images are rich in detail, colorful, and visually pleasing after histogram equalization. Moreover, in the feature map space, the histogram equalized images have comparable texture information and a similar level of illumination as the ground truth image. However, histogram equalization may cause local over-exposure, for example, the bottles in (c) and (d). This does not happen in our mechanism; see the results in Sec. 4.

Fig. 3. Visual and feature maps comparison of ill-exposed images (a) and (b) with the corresponding histogram equalized images (c) and (d).

To further validate that histogram equalized images can provide details comparable to the ground truth images, 500 paired images with different exposure intensities are randomly selected from the SICE dataset [22]. This dataset includes 589 sequences from indoor and outdoor scenes, containing a total of 4,413 multi-exposure images. Sample image sequences are provided in Fig. 4(a). Then, we separately calculate the SSIM (Structural Similarity Index Measure) [33] between the ground truth image and both the input image and its histogram equalized version. SSIM is a metric that measures the degree of structural distortion. The closer the SSIM is to 1, the more similar the image is to the ground truth. The results are graphically depicted in Fig. 4(b). The SSIM of the histogram equalized images is remarkable (77% of the scores exceed 0.7). Besides, Fig. 5 displays histogram equalized images with different SSIM scores. When SSIM>0.7 (Fig. 5(d)∼(f)), the histogram equalized images have rich details, no particularly under- or over-exposed areas, and more natural colors.
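The comparison described above can be reproduced with a short script. The sketch below is a minimal version assuming 8-bit grayscale images loaded from placeholder file names (not actual SICE paths), using OpenCV for global histogram equalization and scikit-image for SSIM.

```python
import cv2
from skimage.metrics import structural_similarity as ssim

# Placeholder paths; in the experiment these would iterate over the 500
# SICE image pairs described in the text.
inp = cv2.imread("ill_exposed.png", cv2.IMREAD_GRAYSCALE)
gt = cv2.imread("ground_truth.png", cv2.IMREAD_GRAYSCALE)

he = cv2.equalizeHist(inp)  # global histogram equalization

ssim_input = ssim(inp, gt, data_range=255)  # input vs. ground truth
ssim_he = ssim(he, gt, data_range=255)      # equalized image vs. ground truth

print(f"SSIM(input, GT) = {ssim_input:.3f}, SSIM(HE, GT) = {ssim_he:.3f}")
```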

Fig. 4. (a) Sample image sequences from the SICE dataset [22] adopted to calculate the SSIM [33]. (b) SSIM pie charts of the ill-exposed and histogram equalized images with respect to the ground truth image. The closer the SSIM is to 1, the more similar the image is to the ground truth.

Fig. 5. Histogram equalized images with different SSIM scores. The top row, the middle row, and the bottom row are the input image, the histogram equalized image, and the ground truth image, respectively. Besides, (a), (b), (c), (d), (e), and (f) represent respectively SSIM$<$0.5, 0.5$\leq$SSIM$<$0.6, 0.6$\leq$SSIM$<$0.7, 0.7$\leq$SSIM$<$0.8, 0.8$\leq$SSIM$<$0.9, and SSIM$\geq$0.9.

In cases where paired training data are inaccessible or unsatisfactory, the above results motivate us to use the histogram equalized images to guide the training process of the network. In other words, we perform self-supervised learning of the global histogram adjustment to enhance image contrast. Taking inspiration from previous work [32], we adopt VGG feature maps to constrain the perceptual similarity between the histogram equalized image and the corrected image in Sec. 3.2. Unlike direct histogram equalization over the spatial domain, the feature map constraint enables the network to learn global histogram adjustment and detail augmentation more softly. It also makes the proposed method superior to, but not identical with, plain histogram equalization.

3.2 Exposure correction module

Given an input image with ill-exposed areas, the exposure correction module corrects these areas. Therefore, the under-exposure correction (UEC) sub-network and over-exposure correction (OEC) sub-network are designed to handle under- and over-exposed areas, respectively. Motivated by the observation that over-exposed areas appear under-exposed in the inverted input image, we simplify the OEC sub-network to an under-exposure correction followed by an inversion operation. More specifically, our solution to under-exposure correction is based on the Retinex-inspired decomposition model [9], which is defined as follows:

$$I = L \odot R,$$
where $I$ stands for the input image, $L$ denotes the illumination map and $R$ represents the reflectance map. In particular, $I$ is formed by the pixel-wise multiplication of $L$ and $R$. Furthermore, we assume that the reflectance map is consistent under any illumination condition as in previous work [32]. In other words, the reflectance map indicates the true properties of the target. Therefore, we take it as the final enhanced image, i.e., $R$ is the under-exposure corrected image.

Then, following the observation that over-exposed areas are under-exposed in the inverted input image, the inverted input image can be derived with this formula:

$$I_{inv} = 1 - I.$$

Next, we can perform the same Retinex-inspired decomposition operation on $I_{inv}$ as on $I$:

$$I_{inv} = L_{inv} \odot R_{inv},$$
where $L_{inv}$ and $R_{inv}$ denote the illumination and reflectance maps of $I_{inv}$, respectively. Moreover, $R_{inv}$ can be regarded as the under-exposure corrected image of $I_{inv}$. To acquire the over-exposure corrected image of $I$, an inversion operation is necessary:
$$R_{inv}^{'} = 1 - R_{inv},$$
where $R_{inv}^{'}$ is the desired over-exposure corrected image. A practical example of this process is presented in Fig. 6. It can be seen that the over-exposed pink flower and green leaves in the input image $I$ turn out to be black in the inverted input image $I_{inv}$, which indicates under-exposure. With the Retinex-inspired decomposition of $I_{inv}$, the resulting under-exposure corrected $R_{inv}$ is brighter overall and the final over-exposure corrected image $R_{inv}^{'}$ of $I$ is properly exposed.
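The reduction of over-exposure correction to inversion plus under-exposure correction, Eqs. (2)–(4), can be written in a few lines. The sketch below assumes a trained decomposition network decom_net (Sec. 3.2) that returns an illumination map and a reflectance map in [0, 1]; it is an illustrative outline rather than the authors' exact code.

```python
import torch

def over_exposure_correct(decom_net, img):
    """img: float tensor in [0, 1] of shape (B, 3, H, W)."""
    img_inv = 1.0 - img                # Eq. (2): invert the input
    _, refl_inv = decom_net(img_inv)   # Eq. (3): decompose the inverted image
    return 1.0 - refl_inv              # Eq. (4): invert back to get the OEC result
```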

Fig. 6. Example for converting over-exposure correction to under-exposure correction and inversion operation. UEC and OEC respectively denote under-exposure correction and over-exposure correction.

The last issue to consider is the design of the learning-based Retinex-inspired decomposition network. We follow the previous work [32]. The Retinex-inspired decomposition network DecomNet is shown in Fig. 7. DecomNet takes the under-exposed image and its maximum channel map (the maximum value of the R, G, and B channels) as input. After several convolution and concatenation layers, it outputs the illumination and reflectance maps in the range $[0, 1]$ via a sigmoid layer.
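A minimal PyTorch sketch of such a DecomNet-style network is given below. The input is the image concatenated with its channel-wise maximum, and the output is split into a one-channel illumination map and a three-channel reflectance map through a sigmoid; the depth and channel widths are illustrative assumptions, not the exact configuration of [32].

```python
import torch
import torch.nn as nn

class DecomNet(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        # 3 image channels + 1 max-channel map in, 4 channels out (L and R)
        self.body = nn.Sequential(
            nn.Conv2d(4, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 4, 3, padding=1),
        )

    def forward(self, img):
        max_channel = img.max(dim=1, keepdim=True)[0]        # max of R, G, B
        out = self.body(torch.cat([img, max_channel], dim=1))
        illum = torch.sigmoid(out[:, 0:1])                    # illumination map L
        refl = torch.sigmoid(out[:, 1:4])                     # reflectance map R
        return illum, refl
```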

 figure: Fig. 7.

Fig. 7. Learning-based Retinex-inspired decomposition network framework. The reflectance map derived from the network can be treated as the under-exposure corrected image.

Download Full Size | PDF

To optimize the parameters of the UEC and OEC sub-networks, we adopt the loss function $\mathcal {L}$ following the maximum a posteriori probability criterion [34]:

$$\mathcal{L} = \mathcal{L}_{recon} + \lambda_{1}\mathcal{L}_{ill} + \lambda_{2}\mathcal{L}_{ref},$$
where $\mathcal {L}_{recon}$ is the reconstruction loss, $\mathcal {L}_{ill}$ is the illumination loss, and $\mathcal {L}_{ref}$ is the reflectance loss. $\lambda _{1}$ and $\lambda _{2}$ are the weight parameters. To begin with, the reflectance and illumination maps derived from the DecomNet should be able to reconstruct the input image, then the reconstruction loss can be expressed as follows:
$$\mathcal{L}_{recon} = \Vert L \odot R - I\Vert_1 + \Vert L_{inv} \odot R_{inv} - I_{inv}\Vert_1.$$
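A minimal sketch of Eq. (6) is given below, assuming the one-channel illumination maps broadcast over the three reflectance channels and that the L1 norm is realized as a per-pixel mean absolute error:

```python
import torch

def recon_loss(L, R, I, L_inv, R_inv, I_inv):
    # || L*R - I ||_1 + || L_inv*R_inv - I_inv ||_1, averaged over pixels
    return (torch.mean(torch.abs(L * R - I)) +
            torch.mean(torch.abs(L_inv * R_inv - I_inv)))
```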

Second, since the reflectance map is supposed to retain more details and texture information, the illumination map should be smooth and preserve certain boundaries [32]. This is achieved as follows:

$$\mathcal{L}_{ill} = \Vert \frac{\nabla{L}}{max\left(\lvert \nabla{I} \rvert,\epsilon\right)} \Vert_1 + \Vert \frac{\nabla{L_{inv}}}{max\left(\lvert \nabla{I_{inv}} \rvert,\epsilon\right)} \Vert_1,$$
where $\lvert \cdot \rvert$ indicates the absolute value operation, $\epsilon$ represents a small positive constant to avoid zero denominators, and $\nabla$ denotes the gradient, including the vertical gradient $\nabla_{v}$ and the horizontal gradient $\nabla_{h}$.
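A minimal sketch of one term of Eq. (7) follows; the gradients are computed as simple finite differences in the vertical and horizontal directions, and broadcasting the one-channel illumination gradient over the three image channels is an implementation assumption.

```python
import torch

def smoothness_loss(L, I, eps=0.01):
    """L: illumination map (B, 1, H, W); I: image (B, 3, H, W)."""
    def grad(x, dim):
        return x - torch.roll(x, shifts=1, dims=dim)   # finite difference
    loss = 0.0
    for dim in (2, 3):                                  # vertical, then horizontal
        loss = loss + torch.mean(torch.abs(grad(L, dim)) /
                                 torch.clamp(torch.abs(grad(I, dim)), min=eps))
    return loss
```

The same term is evaluated for $L_{inv}$ and $I_{inv}$, and the two contributions are summed.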

Lastly, the final corrected image $\tilde {C}$, derived from the two intermediate sub-corrected images $R$ and $R_{inv}^{'}$, is expected to have rich details and texture information. This is constrained by the histogram equalized image $HE$. In other words, the reflectance loss $\mathcal {L}_{ref}$ is instantiated as the histogram equalization prior loss $\mathcal {L}_{hep}$, defined as follows:

$$\mathcal{L}_{hep} = \Vert F\left( \tilde{C} \right) - F\left( HE \right)\Vert_2^2,$$
where $F\left ( \cdot \right )$ refers to the feature extraction. During the training process, $\tilde {C}$ is taken as the simple average of the two sub-corrected images.
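A minimal sketch of Eq. (8) follows, assuming the feature extractor $F(\cdot)$ is an ImageNet-pretrained VGG-19 truncated at an intermediate layer; the specific cut-off layer is an illustrative assumption, and ImageNet normalization is omitted for brevity.

```python
import torch
import torch.nn.functional as F
import torchvision

# Frozen VGG-19 features as the extractor F(.)
vgg = torchvision.models.vgg19(weights="IMAGENET1K_V1").features[:16].eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def hep_loss(corrected, he_image):
    """corrected: average of the two sub-corrected images; he_image: histogram-equalized input."""
    return F.mse_loss(vgg(corrected), vgg(he_image))
```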

In summary, the loss function is obtained from Eqs. (6), (7), and (8):

$$\mathcal{L} = \mathcal{L}_{recon} + \lambda_{1}\mathcal{L}_{ill} + \lambda_{2}\mathcal{L}_{hep}.$$

In our experiments, $\lambda _{1} = 0.01$ and $\lambda _{2} = 1$.

3.3 Fusion computation module

To preserve interesting areas with vivid colors and rich details in the final corrected image, we adopt an effective fusion strategy guided by quality measures. The quality measures include contrast, saturation, and well-exposedness. For each pixel, the information from these different measures is combined by multiplication into a scalar weight map $W$ as follows:

$$W_{i j, k}=\left(C_{i j, k}\right)^{\omega_{C}} \times\left(S_{i j, k}\right)^{\omega_{S}} \times\left(E_{i j, k}\right)^{\omega_{E}},$$
where the subscript $i j, k$ refers to pixel $(i, j)$ in the $k$-th image, and $\omega _{C}$, $\omega _{S}$, and $\omega _{E}$ are the corresponding scalar weights. $C$ represents contrast, which is derived by applying a Laplace filter $F_{L}( \cdot )$ to the grayscale map of the image and taking the absolute value of the filter response:
$$C_{i j, k} = \lvert F_{L}\{Gray(I_{i j, k})\} \rvert,$$
where the Laplace filter $F_{L}$ is [0, 1, 0; 1, -4, 1; 0, 1, 0], and $\lvert \cdot \rvert$ represents the absolute value operator. $S$ denotes saturation, which is calculated as the standard deviation over the RGB channels at each pixel:
$$S_{i j, k} = \sqrt{\frac{\sum_{m \in\{R, G, B\}}(m_{i j, k}-\mu_{i j, k})^{2}}{3}},$$
where $\mu$ is the mean value over the RGB channels. $E$ measures, with a Gaussian curve, how close the intensity is to 0.5 (proper exposure) in each RGB channel individually; the per-channel responses are then multiplied:
$$E_{i j, k} = \prod_{n \in\{R, G, B\}} e^{-\frac{(n_{i j, k}-0.5)^{2}}{2 \sigma^{2}}},$$
where $\sigma =0.2$.
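The three quality measures and the weight map of Eqs. (10)–(13) can be computed per image as in the sketch below; the exponents default to 1, which is an illustrative choice rather than a value reported here.

```python
import cv2
import numpy as np

def weight_map(img, w_c=1.0, w_s=1.0, w_e=1.0, sigma=0.2):
    """img: float32 RGB image in [0, 1] of shape (H, W, 3)."""
    gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
    contrast = np.abs(cv2.Laplacian(gray, cv2.CV_32F))                             # Eq. (11)
    saturation = img.std(axis=2)                                                    # Eq. (12)
    exposedness = np.prod(np.exp(-((img - 0.5) ** 2) / (2 * sigma ** 2)), axis=2)   # Eq. (13)
    return (contrast ** w_c) * (saturation ** w_s) * (exposedness ** w_e)           # Eq. (10)
```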

Besides, we normalize the weight maps and perform Gaussian pyramid decomposition $\mathbf {G}\{ \cdot \}$ on them, and conduct Laplacian pyramid decomposition $\mathbf {L}\{ \cdot \}$ on the input image sequences as in [35]. This yields seamless, consistent results. That is, the Laplacian pyramid decomposition of the resulting image $\tilde {C}$ is obtained by:

$$\mathbf{L}\{\tilde{C}\}_{i j}^{l}=\sum_{k=1}^{N} \mathbf{G}\{\hat{W}\}_{i j, k}^{l}\mathbf{L}\{I\}_{i j, k}^{l},$$
where $l$ stands for the $l$-th level in Laplacian pyramid decomposition or Gaussian pyramid decomposition, $N$ is the number of input images and $\hat {W}$ is the normalization of $W$.

Finally, $\mathbf {L}\{\tilde {C}\}^{l}$ is collapsed to produce the final corrected image $\tilde {C}$. An overview is displayed in Fig. 8. The two sub-corrected images are corrected for the corresponding under- and over-exposed areas, but each is either too bright or too dark overall. With the fusion technique introduced, the final corrected image retains the colorful areas of the over-exposure corrected image and the detailed areas of the under-exposure corrected image. Moreover, it has proper exposure.
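Since the fusion step follows Mertens' exposure fusion [35], it can be reproduced with OpenCV's MergeMertens implementation, as in the sketch below; the two inputs are assumed to be the 8-bit UEC and OEC sub-corrected images saved under placeholder file names.

```python
import cv2

uec = cv2.imread("uec_result.png")   # under-exposure corrected image R
oec = cv2.imread("oec_result.png")   # over-exposure corrected image R_inv'

fusion = cv2.createMergeMertens(contrast_weight=1.0,
                                saturation_weight=1.0,
                                exposure_weight=1.0)
fused = fusion.process([uec, oec])   # float result, roughly in [0, 1]

cv2.imwrite("fused.png", (fused.clip(0, 1) * 255).astype("uint8"))
```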

Fig. 8. Fusion strategy. We adopt the classical exposure fusion technique in [35,36]. It fuses differently exposed images using the Laplacian pyramid decomposition of the images and the Gaussian pyramid decomposition of the weight maps. The weight map relates to the exposure, contrast, and saturation of the input image.

4. Experiment

In this section, we first provide specific details about the experiments. Then, we validate the necessity of each module and conduct the ablation study on the contribution of loss functions. Besides, we visualize the correction performance of the proposed method with a histogram exemplar. Further, the experiments are carried out to compare with other classical and state-of-the-art exposure correction methods. Moreover, we take our approach as a pre-processing algorithm for face detection to explore its usefulness for high-level vision tasks. Finally, we summarize the limitations of the proposed method.

4.1 Implementation details

The exposure correction module requires self-supervised training, and the training data consist only of ill-exposed images. The SICE dataset [22] is our first choice. It is a multi-exposure image dataset with 589 image sequences and 4,413 high-resolution images of different exposures. We randomly select 50 image sequences with a total of 550 images as training data. Besides, we train on Afifi’s exposure correction training dataset for a fair comparison with the methods [2] and [18] in Sec. 4.3. This dataset has a total of 17,675 images available for training, and 1,000 images are randomly selected for our experiments. All images are resized to $600\times 400$. During the training process, the Adam optimizer is utilized to perform the optimization with a weight decay of 0.0001. The initial learning rate is set to $10^{-4}$, which drops to $10^{-5}$ after 20 iterations and then to $10^{-6}$ after 40 iterations. The batch size is set to 16 and the patch size is 64 $\times$ 64. In our experiments, the network usually converges after 20 to 40 iterations.
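A minimal sketch of this optimization set-up in PyTorch is given below; `model` is a stand-in for the decomposition network of Sec. 3.2, and the scheduler milestones follow the reported drops of the learning rate after 20 and 40 iterations, with one scheduler step per iteration of the text.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(4, 4, 3, padding=1)   # stand-in for the DecomNet of Sec. 3.2

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-4)
# Learning rate drops to 1e-5 after 20 iterations and to 1e-6 after 40 iterations.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[20, 40], gamma=0.1)

for iteration in range(60):
    # ... forward/backward over a batch of 16 patches of size 64x64, minimizing Eq. (9) ...
    optimizer.step()
    scheduler.step()
```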


Table 1. NIQE and BRISQUE evaluation between eight exposure correction methods on five real-world image datasets: MEF (17 images) [3], LIME (10 images) [11], NPE (85 images) [12], VV (24 images) [41], and DICM (69 images) [5]. The best score is in bold and the second is underlined. Smaller NIQE and BRISQUE mean better image naturalness.

4.2 Validation

To validate the necessity of each module, we compare the information quantity and contrast scores of the images produced by them. Figure 9 shows the results with Entropy [37] and CEIQ [38] scores under different exposure types. Entropy (H) is a statistical measure of randomness that can be used to characterize the texture of the input image; the higher the Entropy, the more texture features the image has. CEIQ is a no-reference quality metric for assessing contrast-changed images; a larger CEIQ means higher contrast quality. For an overall under-exposed image (top row), the under-exposure corrected image is already fairly well rectified but slightly over-brightened. It works well with the over-exposure corrected image to produce a result with favorable visual perception and proper global exposure. Both the H and CEIQ scores reflect this. In the case of the over-exposed image (middle row), the OEC sub-network plays the crucial role. It restores significant details and provides a considerable improvement in contrast; the fusion module mainly adds the icing on the cake. In the non-uniformly exposed image (bottom row), under- or over-exposure correction alone does not give a satisfactory result: the images are either too bright or too dark. The fused image has high contrast (CEIQ=3.35) and proper exposure with rich details (H=7.76). In summary, the presented technique is reasonably designed with two sub-correction networks and a fusion module. It enables correction of all kinds of exposure errors and is robust across a variety of scenes.

Fig. 9. Comparison between different modules. The information and contrast of the corrected images are evaluated respectively by the Entropy (H) and CEIQ metric. Both the higher the better. The highest scores are bolded. From top to bottom are the corrections for under-exposure, over-exposure, and non-uniform exposure types.

Furthermore, we show in Fig. 10 the visual comparison and quantitative evaluation for different loss combinations. Image quality is measured by the NIQE [39] and BRISQUE [40] metrics. NIQE is based on a space-domain natural scene statistic model and is obtained by a simple distance metric between the model statistics and those of the distorted image. BRISQUE uses scene statistics of locally normalized luminance coefficients to quantify possible losses of "naturalness" in the image. Both NIQE and BRISQUE are no-reference image quality metrics; smaller values are better. Without the reconstruction loss $\mathcal {L}_{recon}$, the corrected images become dull and grayish. The illumination smoothness loss $\mathcal {L}_{ill}$ has little effect on the visual perception of the corrected images; however, except for the NIQE score in the bottom row, the remaining NIQE and BRISQUE scores are somewhat worse than with the full loss combination. What’s more, there is almost no exposure correction effect without the histogram equalization prior loss $\mathcal {L}_{hep}$. In contrast, the full loss combination presents favorable visual perception and evaluation scores in different scenes.

Fig. 10. Ablation study about the contribution of loss functions (reconstruction loss $\mathcal {L}_{recon}$, illumination smoothness loss $\mathcal {L}_{ill}$, and histogram equalization prior loss $\mathcal {L}_{hep}$). The image quality is measured with the no-reference metrics NIQE [39] and BRISQUE [40]. The smaller the NIQE and BRISQUE, the better the image quality. The best score is bolded.

Last, as an exemplar, we compare the grayscale histogram of the input image with that of the image corrected by the proposed method to assess the correction performance. The grayscale histogram is employed to simply judge whether the image has ill-exposed areas. The gray level from low to high indicates the image from dark to bright. We consider an image as having ill-exposed areas when a significant number of pixels fall in the low or high gray levels. Examples are the areas framed by the aqua-green and orange-red rectangles in the left histogram of Fig. 11. In the images, these correspond to the areas where the arrows are pointing. Inside the aqua-green box, the input image has a marked peak, which represents the existence of substantially darker areas in the image, i.e., the eaves. Inside the orange-red box, the input image has a relatively small percentage of pixels, which refers to slightly over-exposed areas, i.e., the sky. After correction by our approach, the histogram peak is centered and the whole histogram becomes flat, with few pixels remaining in either the darker or brighter areas. The corresponding changes in the image are uniform brightness, high contrast, and vivid colors. This proves that our method performs well with adaptive correction of ill-exposed areas.
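The histogram-based check described above can be expressed as a simple rule. The sketch below flags an image as ill-exposed when a large fraction of pixels falls into the darkest or brightest bins; the tail width and fraction thresholds are illustrative assumptions rather than values used in the paper.

```python
import cv2

def has_ill_exposed_areas(img_path, tail=25, frac=0.2):
    gray = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
    hist = cv2.calcHist([gray], [0], None, [256], [0, 256]).ravel()
    hist = hist / hist.sum()
    dark = hist[:tail].sum()      # pixel mass in the lowest gray levels
    bright = hist[-tail:].sum()   # pixel mass in the highest gray levels
    return dark > frac or bright > frac
```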

Fig. 11. An exemplar of the grayscale histogram comparison between the input image and the image corrected by the presented method. The rectangular box on the left histogram corresponds to the arrow on the right. The aqua-green and orange-red respectively represent the darker and brighter areas of the images.

4.3 Experimental results

In this subsection, we first qualitatively compare the proposed method with other advanced exposure correction methods including HE [21], HEP [32], MSEC [2], and ECCM [18]. Figure 12 shows the visual comparison between the selected methods and our method. Detailed views of the results for under- and over-exposure correction are also displayed. The red box and blue box refer to under-exposure correction and over-exposure correction, respectively. In particular, the red box in Row3 represents proper exposure, so we test whether these correction methods can preserve properly exposed areas. HE [21] improves the contrast but leads to over-exposure, such as the grass in Row1. HEP [32] only corrects under-exposed areas while showing poor visual performance in over-exposed areas. MSEC [2] fails to recover color and texture details in some areas when correcting under-exposed areas, as in Row2. Besides, it may produce distorted colors when correcting over-exposed areas, as in Row3. ECCM [18] is an improvement on MSEC [2]; it has the same problem as MSEC in correcting under-exposed areas, yet it does not produce distorted colors in correcting over-exposed areas. Our method can recover details and colors in the ill-exposed areas. What’s more, it preserves as much of the properly exposed areas as possible. In comparison, the presented approach performs better and achieves rich details, vivid colors, and proper exposure.

Fig. 12. Qualitative comparisons for non-uniform exposure correction (the red box represents under-exposure correction, and the blue box indicates over-exposure correction). In particular, the red box in Row3 represents the proper exposure.

Besides, we conduct a human subjective study to compare the performance of our method with the other methods: HE [21], HEP [32], MSEC [2], and ECCM [18]. We invite 10 participants to evaluate the results of enhancing 10 real-world images with the different methods. The 10 images come from five commonly adopted datasets: MEF (17 images) [3], LIME (10 images) [11], NPE (85 images) [12], VV (24 images) [41], and DICM (69 images) [5], with two images randomly selected from each dataset. The participants score the enhanced images from 1 to 5 (1 means the worst and 5 means the best), considering the contrast, artifacts, noise, details, and color of the enhanced results. For each method, we receive a total of one hundred scores. Figure 13 shows the 10 images used and the score distributions; our method obtains the best scores, indicating good subjective image quality.

Fig. 13. The 10 images we used and the score distribution of the user study. The x-axis denotes the score index (1$\sim$5, 5 represents the best), and the y-axis denotes the number of scores in each score index.

Then, to further demonstrate the generalizability of the proposed method, we quantitatively test it on the five real-world datasets mentioned above: MEF [3], LIME [11], NPE [12], VV [41], and DICM [5]. A single measure does not suffice to capture all image quality aspects, but we favor the contrast and naturalness of the enhanced image. First, we employ the no-reference metrics NIQE and BRISQUE to evaluate image naturalness, as these datasets are unpaired. Different from NIQE, BRISQUE exploits human opinion [13]. To further compare image contrast fairly, we adopt the CEIQ and NSS metrics. CEIQ [38] is based on the observation that a high-contrast image is often more similar to its histogram equalized version. NSS [43] is based on the principle of natural scene statistics. The compared methods are HE [21], Retinex-Net [16], ECCM [18], Zero-DCE [14], HEP [32], Tan et al. [42], and Advanced RetinexNet [44]. The methods of Tan et al. and Advanced RetinexNet have no publicly available code, so we only compare against the NIQE scores reported in those two publications. The results are shown in Table 1 and Table 2. Figure 14 shows the score distributions on the combination of the five datasets for the four measurement metrics. For NIQE and BRISQUE, our method may not achieve the best performance on every single dataset, but its average score ranks first. Also, as shown in Fig. 14, the score distributions of our method are more concentrated. HE gets the best score on CEIQ, which is related to CEIQ being based on histogram equalization. The poor NSS score of HEP may be due to its over-enhancement, as shown in the examples in Fig. 12. The remaining methods have similar scores on CEIQ and NSS. In summary, our method has good scores on all four evaluation metrics, with relatively concentrated distributions. This indicates the robustness of our method in enhancing image contrast and maintaining naturalness.

Fig. 14. Numerical distributions of the result on the five dataset combinations for the four no-reference measurement metrics. (a), (b), (c), (d), and (e) are methods HE [21], Retinex-Net [16], ECCM [18], Zero-DCE [14], and HEP [32], respectively.


Table 2. CEIQ and NSS evaluation between six exposure correction methods on five real-world image datasets: MEF (17 images) [3], LIME (10 images) [11], NPE (85 images) [12], VV (24 images) [41], and DICM (69 images) [5]. The best score is in bold and the second is underlined. Larger CEIQ and NSS mean better image contrast.

In the end, we conduct a face detection experiment on the DarkFace dataset [45] with the S3FD algorithm [46]. The results are given in Fig. 15, which compares the detection results of S3FD before and after applying the presented algorithm. The yellow rectangular box represents a detected face and the red rectangular box marks the detail display. The detection results on the first image indicate that our method enables the detection algorithm S3FD to correctly detect more faces. The detection results on the middle two images show that faces missed before our algorithm is applied are detected afterwards. Moreover, the last image suggests that our algorithm is better at avoiding detection errors. In conclusion, this demonstrates that our method is favorable for this high-level vision task.

Fig. 15. Verifying the effectiveness of our algorithm facilitating face detection in the DarkFace dataset [45]. The yellow rectangular box represents the detected face and the red rectangular box stands for the detail display.

4.4 Limitations

While our method produces satisfactory results for most of our tested images, it still has a few limitations. First, we compare the time taken by different learning-based methods to process a single image of size 600 $\times$ 400 $\times$ 3. The results are shown in Table 3. Zero-DCE achieves superior inference speed with fewer parameters; our model has a large parameter count compared to Zero-DCE. Second, for over-exposed images, the information in seriously over-exposed areas is lost. The enhanced result may show some wrong colors or recover the over-exposed areas with less texture, as shown in images (a) and (b) of Fig. 16. Finally, the colors of some enhanced results are distorted and unrealistic; see images (c) and (d) in Fig. 16. The possible reasons are twofold. On the one hand, the ideal assumption of taking the reflectance component as the correction result does not always hold, especially under various illumination properties, which can lead to unrealistic enhancement such as loss of details and distorted colors [24]. Similar problems also appear in other low-light image enhancement methods based on the Retinex-inspired decomposition model [16,47,48]. On the other hand, the self-supervised learning strategy without any clean data constraint is quite likely to make the model decomposition drift and produce unrealistic colors.

Fig. 16. Failed cases. Top: Input images. Bottom: Our results.


Table 3. Comparisons of runtime, FLOPs, parameters, and platform. Images of size 600$\times$400$\times$3 are selected for experiments. RT is the inference time in seconds per image. FLOPs is the number of floating point operations per image. Parameters is the number of trainable parameters per image. The model inference is performed on an NVIDIA GeForce RTX 2060. The best result is in bold.

5. Conclusion

In this work, we present a novel tuning-free and self-supervised image enhancement scheme against ill exposure. The proposed method can simultaneously correct both under- and over-exposed areas via dual illumination estimation. With the fusion strategy guided by quality measures, we achieve well-exposed images with adequate details and vivid colors. The correction-fusion manner enables the proposed method to adapt to various ill-exposed scenarios. Further, we experimentally verified the practicality of using histogram equalization enhanced images to guide training. No paired data or clean data are necessary for training. The proposed method learns global histogram adjustment and detail enhancement in a self-supervised manner, which improves its generalizability. Extensive experiments demonstrate that our approach performs favorably against other state-of-the-art methods. Moreover, our method can improve detection performance, suggesting that it can be applied as a pre-processing algorithm for high-level vision tasks.

Our future work is in three directions. First, we will explore lightweight network structures. Second, we are interested in employing semantic information and texture synthesis techniques to recover information in terribly exposed areas [8]. Finally, we intend to improve the results by considering the global tone curve and portrait features.

Funding

National Key Research and Development Program of China (2020YFB0505601); National Natural Science Foundation of China (61971045, 61991451, 61827901).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are available in Refs. [2,3,5,11,12,22,41].

References

1. F. Lv, B. Liu, and F. Lu, “Fast enhancement for non-uniform illumination images using light-weight CNNs,” in Proc. ACM Multimedia, (2020), pp. 1450–1458.

2. M. Afifi, K. G. Derpanis, B. Ommer, and M. S. Brown, “Learning multi-scale photo exposure correction,” in Proc. IEEE Conf. Comput. Vision Pattern Recognit., (2021), pp. 9157–9167.

3. C. Lee, C. Lee, Y.-Y. Lee, and C.-S. Kim, “Power-constrained contrast enhancement for emissive displays based on histogram equalization,” IEEE Trans. on Image Process. 21(1), 80–93 (2012). [CrossRef]  

4. T. Arici, S. Dikbas, and Y. Altunbasak, “A histogram modification framework and its application for image contrast enhancement,” IEEE Trans. on Image Process. 18(9), 1921–1935 (2009). [CrossRef]  

5. C. Lee, C. Lee, and C.-S. Kim, “Contrast enhancement based on layered difference representation of 2D histograms,” IEEE Trans. on Image Process. 22(12), 5372–5384 (2013). [CrossRef]  

6. L. Yuan and J. Sun, “Automatic exposure correction of consumer photographs,” in Proc. Eur. Conf. Comput. Vision, (2012), pp. 771–785.

7. E. Reinhard, M. Stark, P. Shirley, and J. Ferwerda, “Photographic tone reproduction for digital images,” in ACM Trans. Graph., (2002), pp. 267–276.

8. Q. Zhang, Y. Nie, and W.-S. Zheng, “Dual illumination estimation for robust exposure correction,” in Proc. Comput. Graph. Forum, Vol. 38 (2019), pp. 243–252.

9. E. H. Land, “The retinex theory of color vision,” Sci. Am 237(6), 108–128 (1977). [CrossRef]  

10. X. Fu, D. Zeng, Y. Huang, X.-P. Zhang, and X. Ding, “A weighted variational model for simultaneous reflectance and illumination estimation,” in Proc. IEEE Conf. Comput. Vision Pattern Recognit., (2016), pp. 2782–2790.

11. X. Guo, Y. Li, and H. Ling, “Lime: Low-light image enhancement via illumination map estimation,” IEEE Trans. on Image Process. 26(2), 982–993 (2017). [CrossRef]  

12. S. Wang, J. Zheng, H.-M. Hu, and B. Li, “Naturalness preserved enhancement algorithm for non-uniform illumination images,” IEEE Trans. on Image Process. 22(9), 3538–3548 (2013). [CrossRef]  

13. G. Simone, M. Lecca, G. Gianini, and A. Rizzi, “Survey of methods and evaluation of retinex-inspired image enhancers,” J. Electron. Imag. 31(06), 063055 (2022). [CrossRef]  

14. C. Guo, C. Li, J. Guo, C. C. Loy, J. Hou, S. Kwong, and R. Cong, “Zero-reference deep curve estimation for low-light image enhancement,” in Proc. IEEE Conf. Comput. Vision Pattern Recognit., (2020), pp. 1780–1789.

15. Y. Jiang, X. Gong, D. Liu, Y. Cheng, C. Fang, X. Shen, J. Yang, P. Zhou, and Z. Wang, “Enlightengan: Deep light enhancement without paired supervision,” IEEE Trans. on Image Process. 30, 2340–2349 (2021). [CrossRef]  

16. C. Wei, W. Wang, W. Yang, and J. Liu, “Deep retinex decomposition for low-light enhancement,” in Proc. Brit. Mach. Vision Conf, (2018).

17. M. Gharbi, J. Chen, J. T. Barron, S. W. Hasinoff, and F. Durand, “Deep bilateral learning for real-time image enhancement,” ACM Trans. Graph. 36(4), 1–12 (2017). [CrossRef]  

18. N. E. Nsampi, Z. Hu, and Q. Wang, “Learning exposure correction via consistency modeling,” in Proc. Brit. Mach. Vision Conf., (2021).

19. V. Bychkovsky, S. Paris, E. Chan, and F. Durand, “Learning photographic global tonal adjustment with a database of input/output image pairs,” in Proc. IEEE Conf. Comput. Vision Pattern Recognit., (2011), pp. 97–104.

20. S. M. Pizer, E. P. Amburn, J. D. Austin, R. Cromartie, A. Geselowitz, T. Greer, B. ter Haar Romeny, J. B. Zimmerman, and K. Zuiderveld, “Adaptive histogram equalization and its variations,” Comput. Vision, Graph. Image Process. 39(3), 355–368 (1987). [CrossRef]  

21. S. Pizer, R. Johnston, J. Ericksen, B. Yankaskas, and K. Muller, “Contrast-limited adaptive histogram equalization: speed and effectiveness,” in Proc. 1st Conf. Visualization Biomed. Comput., (1990), pp. 337–345.

22. J. Cai, S. Gu, and L. Zhang, “Learning a deep single image contrast enhancer from multi-exposure images,” IEEE Trans. on Image Process. 27(4), 2049–2062 (2018). [CrossRef]  

23. A. Rizzi and C. Bonanomi, “Milano retinex family,” J. Electron. Imag. 26(3), 031207 (2017). [CrossRef]  

24. C. Li, C. Guo, L.-H. Han, J. Jiang, M.-M. Cheng, J. Gu, and C. C. Loy, “Low-light image and video enhancement using deep learning: A survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, (2021) p. 1.

25. K. G. Lore, A. Akintayo, and S. Sarkar, “Llnet: A deep autoencoder approach to natural low-light image enhancement,” Pattern Recognit. 61, 650–662 (2017). [CrossRef]  

26. F. Lv, F. Lu, J. Wu, and C. Lim, “MBLLEN: Low-light image/video enhancement using CNNs,” British Machine Vision Conference , 220(1), 4 (2018).

27. C. Li, J. Guo, F. Porikli, and Y. Pang, “Lightennet: a convolutional neural network for weakly illuminated image enhancement,” Pattern Recognit. Lett. 104, 15–22 (2018). [CrossRef]  

28. X. Ren, M. Li, W.-H. Cheng, and J. Liu, “Joint enhancement and denoising method via sequential decomposition,” in IEEE Int. Symp. Circuits and Syst. (ISCAS), (IEEE, 2018), pp. 1–5.

29. R. Liu, L. Ma, J. Zhang, X. Fan, and Z. Luo, “Retinex-inspired unrolling with cooperative prior architecture search for low-light image enhancement,” in Proc. IEEE/CVF Conf. Comput. Vision and Pattern Recognit., (2021), pp. 10561–10570.

30. Y. Zhang, J. Zhang, and X. Guo, “Kindling the darkness: a practical low-light image enhancer,” in Proc. 27th ACM Int. Conf. Multimedia, (2019), pp. 1632–1640.

31. Y. Wang, Y. Cao, Z.-J. Zha, J. Zhang, Z. Xiong, W. Zhang, and F. Wu, “Progressive retinex: mutually reinforced illumination-noise perception network for low-light image enhancement,” in Proc. 27th ACM Int. Conf. Multimedia, (2019), pp. 2015–2023.

32. F. Zhang, Y. Shao, Y. Sun, K. Zhu, C. Gao, and N. Sang, “Unsupervised low-light image enhancement via histogram equalization prior,” arXiv, arXiv: 2112.01766 (2021). [CrossRef]  

33. Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. on Image Process. 13(4), 600–612 (2004). [CrossRef]  

34. Y. Zhang, X. Di, B. Zhang, and C. Wang, “Self-supervised image enhancement network: training with low light images only,” arXiv, arXiv: 2002.11300 (2020).

35. T. Mertens, J. Kautz, and F. Van Reeth, “Exposure fusion,” in Proc. Comput. Graph. Forum., (2007), pp. 382–390.

36. J. Hai, Y. Hao, F. Zou, F. Lin, and S. Han, “Advanced retinexnet: a fully convolutional network for low-light image enhancement,” Signal Process. Image Commun. p. 116916 (2022).

37. C. E. Shannon, “A mathematical theory of communication,” The Bell Syst. Tech. J. 27(3), 379–423 (1948). [CrossRef]  

38. J. Yan, J. Li, and X. Fu, “No-reference quality assessment of contrast-distorted images using contrast enhancement,” arXiv, arXiv:1904.08879 (2019). [CrossRef]  

39. A. Mittal, R. Soundararajan, and A. C. Bovik, “Making a “completely blind” image quality analyzer,” IEEE Signal Process. Lett. 20(3), 209–212 (2013). [CrossRef]  

40. A. Mittal, A. K. Moorthy, and A. C. Bovik, “No-reference image quality assessment in the spatial domain,” IEEE Trans. on Image Process. 21(12), 4695–4708 (2012). [CrossRef]  

41. V. Vonikakis, “Busting image enhancement and tone-mapping algorithms: a collection of the most challenging cases,” https://sites.google.com/site/vonikakis/datasets (Accessed: 1 September 2022).

42. M. Tan, J. Fan, G. Fan, and M. Gan, “Low-light image enhancement via multistage feature fusion network,” J. Electron. Imag. 31(06), 063050 (2022). [CrossRef]  

43. Y. Fang, K. Ma, Z. Wang, W. Lin, Z. Fang, and G. Zhai, “No-reference quality assessment of contrast-distorted images based on natural scene statistics,” IEEE Signal Process. Lett. 22(7), 838–842 (2014). [CrossRef]  

44. J. Hai, Y. Hao, F. Zou, F. Lin, and S. Han, “Advanced retinexnet: a fully convolutional network for low-light image enhancement,” Signal Process. Image Commun. 112, 116916 (2023). [CrossRef]  

45. W. Yang, Y. Yuan, W. Ren, et al., “Advancing image understanding in poor visibility environments: a collective benchmark study,” IEEE Trans. on Image Process. 29, 5737–5752 (2020). [CrossRef]  

46. J. Li, “A pytorch implementation of single shot scale-invariant face detector,” Github (2019) [retrieved 2022-12-10], https://github.com/yxlijun/S3FD.pytorch.

47. S. Park, S. Yu, M. Kim, K. Park, and J. Paik, “Dual autoencoder network for retinex-based low-light image enhancement,” IEEE Access 6, 22084–22093 (2018). [CrossRef]  

48. L. Shen, Z. Yue, F. Feng, Q. Chen, S. Liu, and J. Ma, “MSR-net: low-light image enhancement using deep convolutional network,” arXiv, arXiv:1711.02488 (2017). [CrossRef]  
