
Multi-focus image fusion algorithm based on random features embedding and ensemble learning

Open Access

Abstract

Multi-focus image fusion algorithms integrate complementary information from multiple source images to obtain an all-in-focus image. Most published methods create incorrect points in their decision maps, which have to be refined and polished with a post-processing procedure. To address these problems, we present, for the first time, a novel algorithm based on random features embedding (RFE) and ensemble learning that reduces the calculation workload and improves accuracy without post-processing. We utilize RFE to approximate a kernel function so that the Support Vector Machine (SVM) can be applied to large-scale data sets. With an ensemble learning scheme, we then eliminate the abnormal points in the decision map. By combining RFE and ensemble learning, we reduce the risk of over-fitting and boost the generalization ability. The theoretical analysis is consistent with the experimental results. With low computation cost, the proposed algorithm achieves visual quality on par with the state-of-the-art (SOTA).

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

In the field of digital photography, it is almost impossible to obtain an all-in-focus image due to the limited depth-of-field (DOF) of the lens. When light from a source point of an object within the DOF passes through the lens and reaches the camera sensor, it converges into a focal point with a small circle of confusion (CoC). For points outside the DOF, the corresponding CoC diameter grows larger than a threshold, and the image therefore appears blurred [1]. To obtain an all-in-focus image, multi-focus image fusion (MFIF) is widely used to fuse multiple images of the same scene captured with different DOFs.

In terms of fusion domain, MFIF methods can be broadly classified into transform domain-based and spatial domain-based methods. The transform domain-based methods generally comprise three stages, namely, domain transform, fusion, and reconstruction. Traditional transform domain-based methods [2,3] transform source images into a multi-scale feature domain with transform theories. Recently published transform domain-based methods [4,5] transform source images into a single-scale domain with signal representation theories. The spatial domain-based methods fuse images without domain transform. On the basis of the fusion unit, spatial domain-based methods can be roughly divided into three categories: pixel-based [6], block-based [7], and region-based [8]. In both transform domain-based and spatial domain-based methods, the focus detection and the fusion rule are designed manually, which cannot achieve a satisfactory fusion result [9] because it is impossible to take all factors into account.

With the renaissance of Artificial Neural Networks (ANNs), various Deep Learning (DL) based MFIF algorithms have been proposed to automatically and jointly detect focus and fuse images. Liu et al. [10] first introduced Convolutional Neural Networks (CNNs) into the MFIF field. Du and Gao [11] combined multi-scale CNNs with post-processing methods to propose an MSCNN-based fusion method. Tang et al. [12] proposed a pixel-wise fusion method, which recognizes whether a pixel is in focus through a learned CNN. Zhao et al. [13] first proposed an end-to-end CNN-based method. Xu et al. [14] proposed an end-to-end fully convolutional two-stream network. Zhai and Zhuang [15] measured the focus level of the source images by the Energy of Laplacian (EOL) followed by a denoising encoder-decoder. Li et al. [16] developed an MFIF framework in the surfacelet domain based on four dynamic threshold neural P systems (DTNP systems), where the DTNP systems control the fusion of the low- and high-frequency coefficients in the surfacelet domain. Peng et al. [17] utilized coupled neural P (CNP) systems to solve the MFIF problem and developed an MFIF framework in the non-subsampled contourlet transform (NSCT) domain. Panigrahy et al. [18] proposed a translation-invariant MFIF approach based on the à-trous wavelet transform; they utilized the fractal dimension to measure the approximation coefficients and Otsu's threshold to fuse the detail coefficients. They also [19] proposed an MFIF method based on a novel parameter-adaptive dual-channel pulse coupled neural network (PA-DCPCNN), which is derived from the DCPCNN with adaptively estimated parameters. Unfortunately, these DL-based methods produce many incorrect points in their decision maps, which have to be refined and polished by extensive post-processing procedures. For example, consistency verification is used in [10,15], morphological operations and watershed are used in [11], and small-region removal is used in [12].

The Support Vector Machine (SVM) was introduced into the MFIF field by Li et al. [20], who employed SVM instead of hand-designed focus detection measures, which allows the fused image to preserve more details of the source images. However, the fusion rule is manually designed and a post-processing procedure is used in their method. Ensemble learning was introduced into the MFIF field by Naji et al. [21]. They divide their network into five paths and use two paths to integrate the three deep features generated by three CNN paths, respectively. The ensemble learning scheme greatly improves the fusion result. However, as a deep neural network, it is a large model with a huge number of parameters. They create 1,000,000 macro-patches (patch size: 32 $\times$ 64) to train their deep network, which has 1,582,784 weights and 1,474 biases to be tuned [21].

This paper, for the first time, presents a novel algorithm based on random features embedding (RFE) and ensemble learning, which reduces the calculation workload and improves the accuracy without post-processing. We consider MFIF a classification task, in which feature extraction and the classifier are crucial. We use Laplacian operators and RFE to extract features. Specifically, we utilize the EOL and Modified-Laplacian (ML) operators to generate focus maps from the source images. After that, we utilize sliding windows to sequentially crop local region maps from these focus maps, and we then create the feature vectors for the classifier by mapping these local region maps through the RFE. We use SVM as a classifier to judge whether the corresponding image patch is clear or blurred, and we use Hard Voting (HV) to ensemble three trained SVM classifiers, which create the final decision map by judging the source image patches individually. Finally, we obtain a clear image with the final decision map and the source images. By combining RFE and SVM, the time complexity of the proposed algorithm is $O(D)$, where $D$ is the dimension of the random features. To be specific, compared with [21], which has more than 1.5 million computing units, we greatly reduce the calculation workload because $D$ is finally set to 1024 in the proposed algorithm. With the help of ensemble learning, we automatically eliminate the abnormal points in the decision map. Meanwhile, we reduce the risk of over-fitting and boost the generalization ability by combining SVM and ensemble learning. The theoretical analysis is consistent with the experimental results. With low computation cost, the proposed algorithm achieves visual quality on par with the state-of-the-art (SOTA) [22].

2. Materials and methods

The goal of MFIF is to generate a clear image that can provide more information for downstream vision tasks. The schematic diagram of the proposed algorithm is shown in Fig. 1, which consists of three stages: focus detection, committee forming & policy-making, and fusion. In the focus detection stage, we use the EOL and ML operators to extract focus features from the source images; for each image, two types of focus information maps are obtained. In the next stage, we utilize a sliding window and RFE to generate three feature vectors and then feed these vectors into three trained SVM classifiers, respectively. These classifiers form a committee to decide the fusion map. To be specific, each SVM plays the role of a commissar and judges which region of the source image corresponding to the feature vector is clearer. The output of each SVM is regarded as a ballot to select the sharpest one, and the winner takes all. After binarization, the committee produces the final decision map. In the final stage, a fused image is obtained from the source images.

Fig. 1. The flowchart of the proposed method. We first use the EOL and ML operators to obtain two types of focus maps. Then, we obtain three feature vectors from the focus maps and feed them into the committee to obtain the decision map. Finally, a clear image is obtained from the source images.

2.1 Focus detection

EOL is a focus measure for analyzing the high spatial frequencies associated with image border sharpness [23]. ML is a modified version of EOL. If the multi-focus source images are RGB color images, they first need to be converted into gray-scale images. Then, focus detection is performed on each gray image, and consequently, two corresponding focus information maps are obtained. Formally, let $I(x,y)$ be the gray-level intensity of pixel $(x,y)$, let $I_{GA}$ and $I_{GB}$ be the source gray images, let $ELM_{A}$ and $ELM_{B}$ be the focus information maps generated by EOL, and let $SLM_{A}$ and $SLM_{B}$ be the focus information maps generated by ML, respectively.

$$\begin{aligned} & ELM_{A} = \{E_L(x,y)|(x,y)\in I_{GA} \}, \\ & ELM_{B} = \{E_L(x,y)|(x,y)\in I_{GB} \}, \\ & SLM_{A} = \{M_L(x,y)|(x,y)\in I_{GA} \}, \\ & SLM_{B} = \{M_L(x,y)|(x,y)\in I_{GB} \}, \end{aligned}$$
where $E_L(x,y)$ is expressed as
$$\begin{aligned} E_L(x,y) = & -I(x-1,y-1)-4I(x-1,y)-I(x-1,y+1)\\ & -4I(x,y-1)+20I( x, y)-4I(x,y+1) \\ & -I(x+1,y-1)-4I(x+1,y)-I(x+1,y+1). \end{aligned}$$
and $M_L(x,y)$ is expressed as
$$\begin{aligned} M_L(x,y) = & |2I(x,y)-I(x-1,y)-I(x+1,y)|\\ & +|2I(x,y)-I(x,y-1)-I(x,y+1)|. \end{aligned}$$
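For clarity, both focus measures amount to 2-D convolutions of the gray image. The following is a minimal sketch under that reading, assuming the inputs are gray-scale NumPy arrays; the kernels follow Eqs. (2) and (3), and the function names are ours.

import numpy as np
from scipy.ndimage import convolve

def eol_map(img):
    """Energy-of-Laplacian focus map E_L(x, y) via the 3x3 kernel of Eq. (2)."""
    kernel = np.array([[-1, -4, -1],
                       [-4, 20, -4],
                       [-1, -4, -1]], dtype=np.float64)
    return convolve(img.astype(np.float64), kernel, mode='nearest')

def ml_map(img):
    """Modified-Laplacian focus map M_L(x, y) of Eq. (3)."""
    img = img.astype(np.float64)
    kx = np.array([[0, 0, 0], [-1, 2, -1], [0, 0, 0]], dtype=np.float64)
    ky = kx.T
    return np.abs(convolve(img, kx, mode='nearest')) + \
           np.abs(convolve(img, ky, mode='nearest'))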

2.2 Committee forming & policy-making

2.2.1 Base learner

SVM is a vector space based machine learning algorithm that aims to find a decision boundary between two classes that is maximally far from any sample in the training data [24]. Apparently, SVM breaks down when the input data are not linearly separable for classification or do not have a linear relationship for regression. The kernel trick is a prevalent method to solve this imperfection. Its basic idea is to map the source data into a higher-dimensional space so that the data become linearly separable. The kernel trick finds an equivalent kernel function to avoid expensive operations in the high-dimensional feature space. However, the kernel trick leads to efficiency issues for large-scale data sets. Consider a classification problem with a data set $\{(x_n,y_n)\}_{n=1}^{N}$, where $x_n\in X$ and $y_n \in Y$. Let $f(x)$ be the decision function that is optimal for some loss function.

$$f(x) = \sum_{n=1}^{N} \alpha_n k(x,x_n),$$
where $k( \cdot )$ is the kernel function and $\alpha _n$ denotes the weight parameter, which is induced by an $N\times N$ kernel matrix $K$.
$$\begin{aligned} K = \begin{bmatrix} k(x_1,x_1) & k(x_1,x_2) & \dots & k(x_1,x_N) \\ k(x_2,x_1) & k(x_2,x_2) & \dots & k(x_2,x_N) \\ \vdots & \vdots & \ddots & \vdots \\ k(x_N,x_1) & k(x_N,x_2) & \dots & k(x_N,x_N) \end{bmatrix}. \end{aligned}$$
This computation is time-intensive and memory-consuming, which makes it difficult to apply SVM to large-scale data sets. Fortunately, Rahimi and Recht [25] propose a different tack: approximating a continuous shift-invariant kernel $k:X \times X \to R$ by
$$k(x_i,x_j) \approx Z(x_i)^{T} Z(x_j)=s(x_i,x_j),$$
where $Z: X \to R^{D}$ is a feature embedding, so the decision function in Eq. (4) can be written as
$$f(x) = \sum_{n=1}^{N} \alpha_nk(x,x_n)\approx \sum_{n=1}^{N} \alpha_n s(x,x_n)=\beta ^{T}Z(x).$$
Provided that $s(x_i,x_j)$ is a good approximation of $k(x_i,x_j)$, we can solve the learning problem in $O(N)$ time. Based on the facts discussed above, we obtain the base learner SVM by optimizing the following objective function.
$$w=\underset{w}{argmin} \frac{1}{2}\Vert w \Vert ^{2} + \frac{C}{N} \sum_{i=1}^{N} max(0,1-y_i \langle w,\phi(x_i)\rangle),$$
where $\{(x_i,y_i)\}_{i=1}^{N}$ is the training set, $C$ is a hyper-parameter, and the decision function can be modified as:
$$f(x) = \langle w,\phi(x)\rangle,$$
where $\phi (x)$ is a kernel embedding based on the Fourier transform $P(w)$ of the kernel function $k(\cdot )$. To be specific, we use the radial basis function as the kernel function
$$k(x_i,x_j) = e^{-\frac{||x_i-x_j||_2^{2}}{2}}.$$
and the corresponding Fourier transform $P(w)$ is
$$P(w) = (2\pi)^{-\frac{D}{2}}e^{-\frac{||w||_2^{2}}{2}}.$$
Then, $\phi (x)$ can be represented as
$$\begin{aligned} \phi(x)=Z(x) = \sqrt{ \frac{2}{D}} \begin{bmatrix} sin(w_1^{T}x) \\ cos(w_1^{T}x) \\ \vdots \\ sin(w_{D/2}^{T}x) \\ cos(w_{D/2}^{T}x) \end{bmatrix}, w_i \xleftarrow[]{i.i.d} P(w), \end{aligned}$$
where $D$ is the dimension of the random features, which needs to be set experimentally, and i.i.d. denotes independent and identically distributed sampling [26].
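Concretely, the base learner can be sketched as the random features embedding of Eq. (12) followed by a linear SVM trained with the hinge loss of Eq. (8). The sketch below is an illustration under these assumptions; the class and variable names are ours, and scikit-learn's LinearSVC merely stands in for the SVM solver.

import numpy as np
from sklearn.svm import LinearSVC

class RandomFeaturesEmbedding:
    """Z(x) of Eq. (12): D/2 sine and D/2 cosine projections of x."""
    def __init__(self, input_dim, D=1024, seed=0):
        assert D % 2 == 0
        rng = np.random.default_rng(seed)
        # w_i drawn i.i.d. from P(w), the Gaussian Fourier transform of the RBF kernel (Eq. (11))
        self.W = rng.standard_normal((input_dim, D // 2))
        self.D = D

    def transform(self, X):
        proj = X @ self.W                    # (N, D/2) inner products w_i^T x
        Z = np.empty((X.shape[0], self.D))
        Z[:, 0::2] = np.sin(proj)
        Z[:, 1::2] = np.cos(proj)
        return np.sqrt(2.0 / self.D) * Z

# Usage sketch: X_train holds 512-dimensional vectors, y_train holds labels in {0, 1}.
# rfe = RandomFeaturesEmbedding(input_dim=512, D=1024)
# base_learner = LinearSVC(C=1.0, loss='hinge').fit(rfe.transform(X_train), y_train)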

2.2.2 Committee forming & policy-making

Ensemble learning integrates multiple base learners to expand the hypothesis space, reduce the risk of falling into a local optimum, and improve generalization ability [27]. Consequently, ensemble methods obtain better performance than any of the constituent learners alone [28]. We introduce three types of data sets in a cheap and simple manner to train three base learners, and adopt HV to ensemble these base learners.

Type 1:

Firstly, we employ a sliding window (window size: $16 \times 16$, step size: 2) to crop two local focus maps from the global focus maps $ELM_A$ and $ELM_B$, respectively. Then, we concatenate these two local focus maps to form a $32 \times 16$ map and reshape it into a 512-dimensional vector (denoted as $V_a$). Finally, we perform the RFE $Z$ on $V_a$ to obtain a feature vector (denoted as feature vector 1). If the source image $I_{GA}$ is sharper than $I_{GB}$, we label feature vector 1 as 1, otherwise as 0.

Type 2:

Firstly, we employ a sliding window (window size: $16 \times 16$, step size: 2) to crop two local focus maps from the global focus maps $SLM_A$ and $SLM_B$, respectively. Then, we concatenate these two local focus maps to form a $32 \times 16$ map and reshape it into a 512-dimensional vector (denoted as $V_b$). Finally, we perform the RFE $Z$ on $V_b$ to obtain a feature vector (denoted as feature vector 3). If the source image $I_{GA}$ is sharper than $I_{GB}$, we label feature vector 3 as 1, otherwise as 0.

Type 3:

Firstly, we perform an element-wise addition of $V_a$ and $V_b$ to obtain a vector (denoted as $V_c$). Then, we perform the RFE $Z$ on $V_c$ to obtain a feature vector (denoted as feature vector 2). If the source image $I_{GA}$ is sharper than $I_{GB}$, we label feature vector 2 as 1, otherwise as 0.
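A minimal sketch of this window-level procedure follows, assuming the four focus maps are NumPy arrays and reusing the embedding sketch above; the helper name is ours.

import numpy as np

def window_vectors(ELM_A, ELM_B, SLM_A, SLM_B, r, c, win=16):
    """Return (V_a, V_b, V_c) for the 16x16 window whose top-left corner is (r, c)."""
    ea, eb = ELM_A[r:r + win, c:c + win], ELM_B[r:r + win, c:c + win]
    sa, sb = SLM_A[r:r + win, c:c + win], SLM_B[r:r + win, c:c + win]
    V_a = np.concatenate([ea, eb], axis=0).reshape(-1)   # Type 1: 32x16 -> 512
    V_b = np.concatenate([sa, sb], axis=0).reshape(-1)   # Type 2: 32x16 -> 512
    V_c = V_a + V_b                                      # Type 3: element-wise add
    return V_a, V_b, V_c

# Per Eq. (14), base learner 1 sees Z(V_a), learner 2 sees Z(V_b), learner 3 sees Z(V_c).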

After focus detection, we use the above vector-generation procedure to sequentially obtain feature vectors 1, 2, and 3, which correspond to a region in the source image pair. Then, we feed these vectors into the trained base learners, which judge whether the element in the source images corresponding to the vector is in focus (1) or defocused (0). Finally, a policy-maker, consisting of an accumulator register and a conditional judgement, takes these learners' outputs as input to create the decision map. Formally, let $Y_{bl1}$, $Y_{bl2}$, and $Y_{bl3}$ denote the predictions of base learners 1, 2, and 3, respectively, and let $y$ be the policy-maker prediction.

$$\begin{aligned} & y = max(0,\lceil (Y_{bl1}+Y_{bl2}+Y_{bl3})-Thr \rceil),\\ \end{aligned}$$
where $Thr$ is a hyper-parameter,
$$\begin{aligned} & Y_{bl1} = \langle w_{bl1}, \phi(V_a) \rangle ,\\ & Y_{bl2} = \langle w_{bl2}, \phi(V_b) \rangle ,\\ & Y_{bl3} = \langle w_{bl3}, \phi(V_c) \rangle, \end{aligned}$$
where $w_{bl1}$, $w_{bl2}$, and $w_{bl3}$ are the normal vectors of the base learners' hyperplanes, respectively, and $\phi (\cdot )$ is the kernel embedding. If $y$ is positive, the pixels (initialized to 0) in the decision map corresponding to the vector are incremented by 1; otherwise, nothing is done. When all of the source image regions have been processed, we binarize the decision map to obtain the final decision map (denoted as $M_D$). To be specific, we first find the maximum value in the decision map. Then all of the pixels in the decision map are divided by this maximum value. Finally, we set a pixel to 1 if it is greater than 0.5; otherwise, we set it to 0.
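The following sketch illustrates the policy-maker of Eqs. (13)-(14) and the binarization step, assuming base_learners holds the three trained (embedding, SVM) pairs from the earlier sketches, each learner's 0/1 prediction is used as its ballot, and the decision map has already been accumulated over all window positions.

import numpy as np

def policy_maker(V_a, V_b, V_c, base_learners, Thr=1.9999):
    """Hard voting of Eq. (13); each learner's 0/1 prediction is one ballot."""
    ballots = sum(int(svm.predict(rfe.transform(v[None, :]))[0])
                  for v, (rfe, svm) in zip((V_a, V_b, V_c), base_learners))
    return max(0, int(np.ceil(ballots - Thr)))

def binarize(decision_map):
    """Divide by the maximum accumulator value and threshold at 0.5 to get M_D."""
    return (decision_map / decision_map.max() > 0.5).astype(np.uint8)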

2.3 Fusion

Finally, we obtain a clear image through:

$$F(x,y)=M_D(x,y)I_A(x,y) + (1-M_D(x,y))I_B(x,y).$$
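Equation (15) amounts to a per-pixel blend driven by the binary decision map. A one-function sketch, assuming $M_D$ already matches the image size and the source images are NumPy arrays:

import numpy as np

def fuse(I_A, I_B, M_D):
    """Eq. (15): take the pixel from I_A where M_D = 1 and from I_B where M_D = 0."""
    M = M_D.astype(np.float64)
    if I_A.ndim == 3:                        # color images: broadcast mask over channels
        M = M[..., None]
    return (M * I_A + (1.0 - M) * I_B).astype(I_A.dtype)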

3. Results

3.1 Experimental configuration

3.1.1 Data set

We use the test set of BSD300 [29] to generate the training set. For each image, we use Gaussian blurring (standard deviation: 2, kernel size: $7 \times 7$) to obtain five blurred versions. To be specific, we perform Gaussian blurring on the source image to obtain the first blurred version. Then, we obtain the second version from the first version, and so on. For each blurred-and-clearer image pair, we use the EOL and ML operators to generate four focus maps ($ELM_A$, $ELM_B$, $SLM_A$, and $SLM_B$). Finally, we use the vector-generation procedure described in Section 2.2.2 to obtain three training data sets, and each training data set includes 56,700 positive samples and 56,700 negative samples. We use the $Lytro$ data set [4] as the test set.
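A sketch of the iterative blurring described above, using OpenCV's GaussianBlur; the helper name is ours.

import cv2

def blurred_versions(gray_img, n_levels=5):
    """Return [original, blur1, ..., blur5]; each level re-blurs the previous one."""
    versions = [gray_img]
    for _ in range(n_levels):
        # 7x7 Gaussian kernel with standard deviation 2, as described in the text
        versions.append(cv2.GaussianBlur(versions[-1], (7, 7), 2))
    return versions

# Consecutive (clearer, blurred) pairs are then passed through the EOL/ML operators
# and the window-vector procedure of Section 2.2.2 to build the three training sets.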

3.1.2 Model parameter

The parameter configuration is summarized in Table 1. As improper selection of $D$ and $C$ may cause over-fitting or under-fitting, we use cross-validation to determine the proper values, and we empirically set $Thr$ to 1.9999. To effectively use the training data sets, we utilize 60% (680,400/1,134,000) of each set to train the base learner and hold out 40% (453,600/1,134,000) for testing. The verification results show that the classification accuracy of each base learner reaches around 99% (99.14% for base learner 1, 98.90% for base learner 2, 99.04% for base learner 3).
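A sketch of the hold-out evaluation and hyper-parameter search described above, using scikit-learn; the candidate grid for C is illustrative, not the paper's exact search range.

from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import LinearSVC

def fit_and_validate(Z, y):
    """Z: embedded feature matrix for one training set, y: 0/1 labels."""
    Z_tr, Z_te, y_tr, y_te = train_test_split(Z, y, train_size=0.6, random_state=0)
    search = GridSearchCV(LinearSVC(loss='hinge'), {'C': [0.1, 1.0, 10.0]}, cv=5)
    search.fit(Z_tr, y_tr)
    return search.best_estimator_, search.score(Z_te, y_te)   # hold-out accuracy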


Table 1. Parameters and precision of the base learners

3.1.3 Evaluation metrics

As shown in Table 2, we use five metrics to comprehensively and objectively assess the proposed algorithm. Specifically, entropy (EN) [30] and cross entropy (CE) [31] are information theory based. The gradient-based similarity measurement (Qabf) [32] and spatial frequency (SF) [33] are image feature based, and the structural similarity index measure (SSIM) [34] is image structural similarity based. For all metrics except CE, a larger value indicates better performance [10]. In addition, we compare the proposed algorithm with 10 other MFIF algorithms, including transform domain-based, spatial domain-based, and deep learning based methods. Specifically, the transform domain-based methods include GD [35], MGFF [36], MSVD [37], and SFMD [38]. The spatial domain-based methods include BGSC [39], BFMF [8], and IFM [40]. The deep learning based methods include ECNN [21], IFCNN [41], and SESF [22]. We obtain the source code from the related publications and configure the parameters with the default values.
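For reference, two of the metrics are straightforward to reproduce. The sketch below computes EN from the 8-bit histogram and SSIM with scikit-image, averaging the fused image's similarity to the two sources; this averaging is our assumption, and the exact protocols follow the cited references.

import numpy as np
from skimage.metrics import structural_similarity

def entropy(img):
    """Shannon entropy (EN) of an 8-bit gray-scale image."""
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def ssim_to_sources(fused, src_a, src_b):
    """Average structural similarity of the fused image to both gray-scale sources."""
    return 0.5 * (structural_similarity(fused, src_a, data_range=255) +
                  structural_similarity(fused, src_b, data_range=255))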

3.2 Results and comparisons

3.2.1 Qualitative performance

We illustrate the fusion results in Figs. 2–4. For each sample, the suffixes A and B denote the source image pair, and F denotes the fused image. We display an enlarged focus/defocus boundary region at the top right corner of each image, which demonstrates that the proposed algorithm is capable of generating clear objects and boundaries. To further compare the visual performance of the proposed method with the 10 other MFIF methods, we investigate the fusion result of the $Lytro-11$ image pair, which is fine-grained and contains more objects. As shown in Fig. 5, we transform all images into pseudo-color maps using the function $applyColorMap$ with the colormap $COLORMAP\_RAINBOW$ in OpenCV. We can clearly see that many methods produce pixel artifacts and color distortion. To be specific, as shown in the yellow/black rectangular regions and the enlarged versions at the top left/right corners, BFMF, BGSC, and MGFF show artifacts and distortion, while GD and MSVD show color distortion. In contrast, the proposed algorithm obtains a good fused result that not only reduces blurring but also boosts visual perception.
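The pseudo-color rendering mentioned above is a direct OpenCV call; a minimal reproduction (the helper name is ours):

import cv2

def pseudo_color(img_8bit):
    """Map an 8-bit image to the rainbow pseudo-color space used in Figs. 5-8."""
    return cv2.applyColorMap(img_8bit, cv2.COLORMAP_RAINBOW)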

Fig. 2. The source and fused images of the Lytro data set. Suffixes A, B: source image pair; F: fused image; red box: enlarged region within the focused/defocused boundary.

Fig. 3. The source and fused images of the Lytro data set. Suffixes A, B: source image pair; F: fused image; red box: enlarged region within the focused/defocused boundary.

Fig. 4. The source and fused images of the Lytro data set. Suffixes A, B: source image pair; F: fused image; red box: enlarged region within the focused/defocused boundary.

Fig. 5. Pseudo-color maps produced using the function $applyColorMap$ with the colormap $COLORMAP\_RAINBOW$ in OpenCV. (a) and (b) $Lytro-11$ source image pair; (c) to (m) fused images; the enlarged regions are shown in the yellow box (top left) and the black box (top right).


Table 2. Evaluation metrics and algorithms

Figures 6–8 show visual quality comparisons between ECNN, SESF, and the proposed method. For each image, subfigures (a) and (b) are the source image pair, and (c), (d), and (e) are the fusion images obtained by ECNN, SESF, and the proposed method, respectively. All images are transformed into pseudo-color maps. As shown in Fig. 6 and Fig. 7, the proposed method maintains more texture and color details of the source images than ECNN. As shown in Fig. 8, the proposed method is also more robust than ECNN. In the fusion result of ECNN (c), the shape of the windowsill in the background is distorted, while in the fusion result of the proposed method (e), all objects are clear.

Fig. 6. Pseudo-color maps of Lytro-01 and the fused images obtained with different fusion methods. (a) and (b) source image pair, (c) ECNN, (d) SESF, (e) proposed. It is obvious that the proposed method maintains more texture details of the source image than ECNN, as labeled by the black rectangle.

Fig. 7. Pseudo-color maps of Lytro-17 and the fused images obtained with different fusion methods. (a) and (b) source image pair, (c) ECNN, (d) SESF, (e) proposed. It is obvious that the proposed method maintains more color details of the source image than ECNN, as labeled by the black rectangle.

Fig. 8. Pseudo-color maps of Lytro-20 and the fused images obtained with different fusion methods. (a) and (b) source image pair, (c) ECNN, (d) SESF, (e) proposed. As shown in the black rectangle, the proposed method obtains a clear image, while in the fusion result of ECNN the shape of the windowsill in the background is distorted.

3.2.2 Quantitative evaluation

The quantitative results are summarized in Table 3; for each metric (row), we mark the worst three values in green. It is obvious that BFMF, BGSC, and MSVD are the worst three in EN, while GD, IFCNN, and MGFF are the worst three in CE. Besides, IFM is among the worst three in SF, and SFMD is among the worst three in Qabf. Among all of the methods (columns), we mark a method in red when it performs well in all metrics, i.e., ECNN, SESF, and the proposed algorithm are clearly superior to the others.


Table 3. Average of the quantitative evaluation metrics for different methods. The worst three values in each metric (row) are marked in green. The best three methods (columns), which do not have any metric marked in green, are marked in red.

Following [21], we calculate the number of weight and bias parameters of the proposed method, ECNN, and SESF. For both ECNN and SESF, the weights and biases are tuned during the training phase, and the fusion image needs to be calculated with these tuned parameters during the inference phase. The number of parameters is a key indicator of the computation cost. As shown in Table 4, the number of parameters of the proposed method is only about 1/30 of that of SESF. Furthermore, the proposed method does not need a post-processing procedure for refinement, while SESF requires extra calculation for post-processing. Meanwhile, the number of parameters of the proposed method is only about 1/500 of that of ECNN.


Table 4. Comparison of number of parameters between the proposed method and the others

3.2.3 Ablation study

We further explore how the proposed method automatically eliminates the abnormal points in the final decision map. Firstly, we take the $Lytro-03$ image pair as an input specimen to obtain a decision map. Then, we remove the policy-maker module and obtain three decision maps from the three base learners. To be specific, we directly binarize the output map of each base learner to create its decision map. We show the different decision maps in Fig. 9, where the abnormal points in the yellow (a), green (b), and red (c) boxes are eliminated in the decision map (d) by the proposed method. The advantage of the proposed method is that the abnormal points are eliminated without post-processing, whereas other methods have to correct the abnormal points in their decision maps. The higher visual quality of the proposed method demonstrates that our decision map is more accurate.

Fig. 9. The decision maps of the proposed method without and with ensemble learning for $Lytro-03$. The yellow/green/red rectangular regions show the outliers induced by base learners 1/2/3, respectively.

4. Conclusion and discussion

In this paper, we, for the first time, present a novel MFIF algorithm based on RFE and ensemble learning, which reduces the calculation workload and improves accuracy without post-processing. The time complexity of the proposed algorithm is linear in the dimension of the random feature vector. Compared with the DL-based ECNN [21] and SESF [22], which have millions of parameters to be tuned and millions of calculations to be executed, the proposed algorithm greatly reduces the calculation workload. Meanwhile, through the feature vector generation procedure, the training set contains adequate image variety and ensures a degree of focus diversity. This reduces the risk of over-fitting. With the help of ensemble learning, the proposed algorithm automatically eliminates the abnormal points in the decision map and boosts the generalization ability. The experimental fusion results of the proposed algorithm (without post-processing) are consistent with the theoretical analysis. With low computation cost, the proposed algorithm achieves visual quality on par with the SOTA [22] (which uses post-processing).

It is worth noting that the proposed algorithm has great potential for further improvement. We use a relatively simple ensemble tactic, and the training set is generated from just 200 images and includes three types of features. Empirically, ensemble models with significant diversity tend to yield better results [42]. By increasing the diversity of the data set and designing a more elaborate ensemble scheme, the fusion results of the proposed algorithm can certainly be enhanced, and more cutting-edge work can be conducted.

Funding

West Light Foundation of the Chinese Academy of Sciences (Y72Z510Y10); Instrument Developing Project of the Chinese Academy of Sciences (E028610101).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but can be obtained upon reasonable request.

References

1. V. Aslantas and D. Pham, “Depth from automatic defocusing,” Opt. Express 15(3), 1011–1023 (2007). [CrossRef]  

2. Q. Zhang and B. Guo, “Multifocus image fusion using the nonsubsampled contourlet transform,” Signal Processing 89(7), 1334–1346 (2009). [CrossRef]  

3. L. Cao, L. Jin, H. Tao, G. Li, Z. Zhuang, and Y. Zhang, “Multi-focus image fusion based on spatial frequency in discrete cosine transform domain,” IEEE Signal Process. Lett. 22(2), 220–224 (2014). [CrossRef]  

4. M. Nejati, S. Samavi, and S. Shirani, “Multi-focus image fusion using dictionary-based sparse representation,” Inf. Fusion 25, 72–84 (2015). [CrossRef]  

5. B. Zhang, X. Lu, H. Pei, Y. Liu, W. Zhou, and D. Jiao, “Multi-focus image fusion based on sparse decomposition and background detection,” Digit. Signal Process. 58, 50–63 (2016). [CrossRef]  

6. J. Duan, L. Chen, and C. P. Chen, “Multifocus image fusion using superpixel segmentation and superpixel-based mean filtering,” Appl. Opt. 55(36), 10352–10362 (2016). [CrossRef]  

7. X. Bai, Y. Zhang, F. Zhou, and B. Xue, “Quadtree-based multi-focus image fusion using a weighted focus-measure,” Inf. Fusion 22, 105–118 (2015). [CrossRef]  

8. Y. Zhang, X. Bai, and T. Wang, “Boundary finding based multi-focus image fusion through multi-scale morphological focus-measure,” Inf. Fusion 35, 81–101 (2017). [CrossRef]  

9. S. Li, X. Kang, L. Fang, J. Hu, and H. Yin, “Pixel-level image fusion: A survey of the state of the art,” Inf. Fusion 33, 100–112 (2017). [CrossRef]  

10. Y. Liu, X. Chen, H. Peng, and Z. Wang, “Multi-focus image fusion with a deep convolutional neural network,” Inf. Fusion 36, 191–207 (2017). [CrossRef]  

11. C. Du and S. Gao, “Image segmentation-based multi-focus image fusion through multi-scale convolutional neural network,” IEEE Access 5, 15750–15761 (2017). [CrossRef]  

12. H. Tang, B. Xiao, W. Li, and G. Wang, “Pixel convolutional neural network for multi-focus image fusion,” Inf. Sci. 433-434, 125–141 (2018). [CrossRef]  

13. W. Zhao, D. Wang, and H. Lu, “Multi-focus image fusion with a natural enhancement via a joint multi-level deeply supervised convolutional neural network,” IEEE Trans. Circuits Syst. Video Technol. 29(4), 1102–1115 (2018). [CrossRef]  

14. K. Xu, Z. Qin, G. Wang, H. Zhang, K. Huang, and S. Ye, “Multi-focus image fusion using fully convolutional two-stream network for visual sensors,” IEEE Trans. Circuits Syst. Video Technol. 12, 2253–2272 (2018). [CrossRef]  

15. H. Zhai and Y. Zhuang, “Multi-focus image fusion method using energy of laplacian and a deep neural network,” Appl. Opt. 59(6), 1684–1694 (2020). [CrossRef]  

16. L. A. Bo, P. A. Hong, B. Jw, and A. Xh, “Multi-focus image fusion based on dynamic threshold neural p systems and surfacelet transform,” Knowledge-Based Syst. 196(7), 105794 (2020). [CrossRef]  

17. P. A. Hong, L. A. Bo, Y. A. Qian, and B. Jw, “Multi-focus image fusion approach based on cnp systems in nsct domain,” Comput. Vis. Image Underst. 210(4), 103228 (2021). [CrossRef]  

18. C. Panigrahy, A. Seal, N. K. Mahato, O. Krejcar, and E. Herrera-Viedma, “Multi-focus image fusion using fractal dimension,” Appl. Opt. 59(19), 5642–5655 (2020). [CrossRef]  

19. C. Panigrahy, A. Seal, and N. K. Mahato, “Fractal dimension based parameter adaptive dual channel pcnn for multi-focus image fusion,” Opt. Lasers Eng. 133, 106141 (2020). [CrossRef]  

20. X. Li, L. Wang, J. Wang, and X. Zhang, “Multi-focus image fusion algorithm based on multilevel morphological component analysis and support vector machine,” IET Image Process. 11(10), 919–926 (2017). [CrossRef]  

21. M. Amin-Naji, A. Aghagolzadeh, and M. Ezoji, “Ensemble of cnn for multi-focus image fusion,” Inf. Fusion 51, 201–214 (2019). [CrossRef]  

22. B. Ma, Y. Zhu, X. Yin, X. Ban, H. Huang, and M. Mukeshimana, “Sesf-fuse: An unsupervised deep model for multi-focus image fusion,” Neural Comput. Appl. 33(11), 5793–5804 (2021). [CrossRef]  

23. W. Huang and Z. Jing, “Evaluation of focus measures in multi-focus image fusion,” Pattern Recognition Letters 28(4), 493–500 (2007). [CrossRef]  

24. C. Cortes and V. Vapnik, “Support vector machine,” Mach. Learn. 20(3), 273–297 (1995). [CrossRef]  

25. A. Rahimi and B. Recht, “Random features for large-scale kernel machines,” in NIPS, vol. 3 (Citeseer, 2007), p. 5.

26. D. J. Sutherland and J. Schneider, “On the error of random fourier features,” arXiv preprint arXiv:1506.02785 (2015).

27. H. Blockeel, “Hypothesis space,” Encycl. Mach. Learn. 1, 511–513 (2011). [CrossRef]  

28. L. Rokach, “Ensemble-based classifiers,” Artif. Intell. Rev. 33(1-2), 1–39 (2010). [CrossRef]  

29. D. Martin, C. Fowlkes, D. Tal, and J. Malik, “A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics,” in Proc. 8th Int’l Conf. Computer Vision, vol. 2 (2001), pp. 416–423.

30. J. W. Roberts, J. A. Van Aardt, and F. B. Ahmed, “Assessment of image fusion procedures using entropy, image quality, and multispectral classification,” J. Appl. Remote. Sens. 2(1), 023522 (2008). [CrossRef]  

31. D. Bulanon, T. Burks, and V. Alchanatis, “Image fusion of visible and thermal images for fruit detection,” Biosyst. engineering 103(1), 12–22 (2009). [CrossRef]  

32. C. Xydeas and V. Petrovic, “Objective image fusion performance measure,” Electron. Lett. 36(4), 308–309 (2000). [CrossRef]  

33. A. Eskicioglu and P. Fisher, “Image quality measures and their performance,” IEEE Trans. Commun. 43(12), 2959–2965 (1995). [CrossRef]  

34. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE transactions on image processing 13(4), 600–612 (2004). [CrossRef]  

35. S. Paul, I. S. Sevcenco, and P. Agathoklis, “Multi-exposure and multi-focus image fusion in gradient domain,” J. Circuits, Syst. Comput. 25(10), 1650123 (2016). [CrossRef]  

36. D. P. Bavirisetti, G. Xiao, J. Zhao, R. Dhuli, and G. Liu, “Multi-scale guided image and video fusion: A fast and efficient approach,” Circuits, Syst. Signal Process. 38(12), 5576–5605 (2019). [CrossRef]  

37. V. Naidu, “Image fusion technique using multi-resolution singular value decomposition,” Def. Sci. J. 61(5), 479 (2011). [CrossRef]  

38. H. Li, L. Li, and J. Zhang, “Multi-focus image fusion based on sparse feature matrix decomposition and morphological filtering,” Opt. Commun. 342, 1–11 (2015). [CrossRef]  

39. J. Tian, L. Chen, L. Ma, and W. Yu, “Multi-focus image fusion using a bilateral gradient-based sharpness criterion,” Opt. Commun. 284(1), 80–87 (2011). [CrossRef]  

40. S. Li, X. Kang, J. Hu, and B. Yang, “Image matting for fusion of multi-focus images in dynamic scenes,” Inf. Fusion 14(2), 147–162 (2013). [CrossRef]  

41. Y. Zhang, Y. Liu, P. Sun, H. Yan, X. Zhao, and L. Zhang, “Ifcnn: A general image fusion framework based on convolutional neural network,” Inf. Fusion 54, 99–118 (2020). [CrossRef]  

42. A. S. K. Sollich, “Learning with ensembles: How over-fitting can be useful,” in Proceedings of the 1995 Conference, vol. 8 (1996), p. 190.
