
Robust registration for ultra-field infrared and visible binocular images

Open Access

Abstract

Ultra-field infrared and visible image registration is a challenging task because of nonlinear imaging and multi-modal image features. In this paper, a robust registration method is proposed for ultra-field infrared and visible images. First, control points are extracted using phase congruency and optimized with a guidance map, which is constructed from significant structure information. Second, ROI pair matching is accomplished based on the epipolar curve. Its effect is equivalent to the search window commonly used in methods for the standard field of view, and it overcomes the content differences within the search window caused by nonlinear imaging and vision disparity. Third, a descriptor named the multiple phase congruency directional pattern (MPCDP) is established, composed of distribution information and a main direction. The phase congruency amplitudes are encoded as binary patterns and then represented as weighted histograms for the distribution information. Six pairs of ultra-field infrared and visible images are employed in registration experiments, and the results demonstrate that the proposed method is robust and accurate across five types of ultra-field scenes and two different camera relationships.

© 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

With the rapid development of sensor technology and imaging techniques, more and more complementary information is provided by multispectral imaging and ultra-field imaging, which are popular in remote sensing [1], military reconnaissance [2], computer vision [3,4], and so on. However, owing to different viewpoints, sensors and lenses, multi-modal images are inevitably misaligned, so they cannot be used directly for image fusion, change detection and image mosaicking [5]. Image registration, as a fundamental step, has been studied for image alignment; many studies focus on multiple spectral bands, but few of them are related to the ultra-field imaging technique. Among them, infrared-visible registration is a research hotspot, yet it remains a difficult task because the infrared and visible images exhibit different gray-scale characteristics, such as nonlinear intensity variations and local gray inversion [6].

Currently, the main multi-modal registration methods can be categorized into area based and feature based methods [7–9]. Area based methods accomplish registration by calculating the image similarity between the input images. Commonly used area based similarity functions include mutual information (MI) [10], cross-correlation (CC) [11], normalized cross-correlation (NCC) [12], and so on. This type of method is easy to automate and works well when the input images are highly similar. However, misregistration frequently occurs when the input images differ, especially in multi-modal image registration. In contrast to area based methods, feature based methods do not work directly on the original images. They extract salient image features and match them based on their similarity. In the feature extraction procedure, the image features usually include point features, edge features, region features, and so on. Among these, point features are the most popular, including the scale-invariant feature transform (SIFT) [13], the Harris algorithm, binary robust independent elementary features (BRIEF) [14], features from accelerated segment test (FAST) [15], and so on. In addition, Zhao proposed a line feature based registration method for multi-modal images, in which the descriptor is named the multi-modality robust line segment descriptor (MRLSD) [16]. Song proposed a retrofitted SIFT algorithm and Lissajous-curve trajectory based features for registration [17]. Lv combined line and point features to realize automatic registration of airborne LiDAR point cloud data and optical imagery depth maps [18]. There are also many region feature based descriptors, such as the local binary pattern (LBP) [19], local self-similarity (LSS) [20], and the histogram of oriented gradients (HOG) [21]. Among them, the region centers are usually treated as control points (CP). Teng proposed an improved local feature descriptor that can improve the performance of all SIFT-based applications [22]. Tang proposed an adaptable local-global feature for rail registration in the infrared and visible bands [23]. One current trend in feature based registration is the structure based method, and phase congruency is one of the effective theories for multi-modal image registration [24]. Liu employed structure features by constructing a maximally stable phase congruency descriptor for registration [25]. Many other studies have also shown that the feature based method is better suited to various situations, including illumination changes, gray intensity changes, geometric distortion, and so on.

From the above analysis, the feature based method may satisfy the requirements of infrared and visible image registration. However, on the one hand, many CP detection methods depend on the gradient, which varies nonlinearly between infrared and visible images. Moreover, the detected CP should be not only sufficient in quantity, but also have a high correspondence rate for the transformation model. On the other hand, most registration methods focus on image registration with a standard field of view (FOV). In contrast, the literature on ultra-field image registration is sparse, especially for ultra-field infrared and visible images. The reason is that conventional methods cannot be applied directly to ultra-field images without considering the nonlinear imaging. In detail, three difficulties existing in ultra-field infrared and visible images are analyzed as follows.

  • (1) The distortion grows from the center to the border of the ultra-field image, i.e., straight structure lines become curves. Hence, corner extraction may be insufficient when using corner and edge based algorithms, such as the Harris and shape context (SC) algorithms. Moreover, the intensity variation between infrared and visible images makes the extraction even more difficult.
  • (2) The FOV of the ultra-field image is larger than that of commonly used images, so its scene cannot always be filled with salient structures, and the sky background is inevitably captured. Thus, uniform extraction over the whole image is unreasonable; in other words, only the image parts with obvious structures can be used for CP extraction. Moreover, owing to the different bands, some structures may not remain consistently salient in both the infrared and visible images.
  • (3) Owing to nonlinear imaging and vision disparity, the image contents differ between the two adjacent windows of the same CP in the ultra-field infrared and visible images. Therefore, template matching, adopted in most current methods, is inapplicable for identifying correspondences in ultra-field images.

To overcome the above difficulties, we propose a novel registration method for ultra-field infrared-visible images, and our contributions mainly consist of three parts. First, phase congruency is utilized to extract enough CP in both the ultra-field infrared and visible images. However, because the visible image has richer detail, many obvious isolated points in the visible image have no corresponding points in the infrared image. Therefore, nonlinear diffusion filtering is adopted to preserve significant structures, and their gradient maps are then calculated to guide the quantity optimization of the CP. Thus, the CP extraction is robust in both infrared and visible images. Second, the ultra-field infrared and visible binocular cameras are calibrated in advance, with an RBF network introduced to optimize the calibration results. On the basis of the calibration results, the epipolar curve is proposed as the equivalent of a search window for ROI matching. Thus, the interference caused by different imaging mechanisms and vision disparity can be greatly reduced. Third, a significant structure based descriptor, named multiple phase congruency directional patterns (MPCDP), is proposed for identifying correct CP pairs. Specifically, the phase congruency amplitudes are quantized to binary patterns at multiple orientations. The distribution information of the MPCDP is represented by spatial histograms of the binary patterns, weighted by the nonzero mean amplitude at every orientation. Similarly, the main direction of the MPCDP is calculated by combining the weighted multiple orientations. The experiments are carried out on six pairs of ultra-field infrared and visible images, covering five different scenes and two different camera relationships. The results demonstrate that the proposed method achieves robust and accurate registration of ultra-field infrared and visible images.

The remainder of the paper is organized as follows. Section 2 presents the methodology of the proposed method, which mainly consists of CP extraction, ROI matching and MPCDP descriptor construction. The experimental results and analysis are presented in Section 3. Finally, the conclusions are given in Section 4.

2. Methodology

2.1 Robust ultra-field infrared and visible CP extraction

The significance and quantity of the detected candidate CP have a large influence on the performance and accuracy of image registration [26]. In multi-modal image registration, the issue of significance becomes difficult, but it is important. The first step of nonrigid registration is to select a set of candidate CP in the ultra-field infrared and visible images. Considering the different imaging mechanisms, structure based methods are more robust than other feature based methods for infrared and visible images. We therefore take the phase congruency theory into account: it postulates that perceptually significant features are situated at image locations where the Fourier components are maximally in phase [26]. Compared with commonly used gradient based methods, phase congruency is more robust against gray-scale intensity variation, especially in multi-modal images, as shown in Fig. 1. Additionally, the logarithmic Gabor filter is utilized to extend the phase congruency to multiple scales and orientations, which is defined as follows.

$$P({x,y} )= \frac{{\sum\limits_n {W({x,y} )\lfloor{{A_n}({x,y} )\Delta \Phi ({x,y} )- T} \rfloor } }}{{\sum\limits_n {{A_n}({x,y} )+ \varepsilon } }}$$
$$\Delta \Phi ({x,y} )= \cos ({{\phi_n}({x,y} )- \bar{\phi }({x,y} )} )- |{\sin ({{\phi_n}({x,y} )- \bar{\phi }({x,y} )} )} |$$
where $({x,y} )$ represents the coordinate of the point in the image, n is the scale of the filter, $W({x,y} )$ is the weighting factor based on the frequency spread, ${A_n}({x,y} )$ is the amplitude, ${\phi _n}({x,y} )$ is the phase at scale n, $\bar{\phi }({x,y} )$ is the weighted mean phase, T represents the noise threshold and $\varepsilon$ is a small constant to avoid division by zero. $\lfloor{} \rfloor$ denotes that the enclosed quantity equals itself when its value is positive, and zero otherwise.
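
As an illustrative sketch only (the authors' implementation, written in MATLAB, is not reproduced here), Eqs. (1)-(2) can be evaluated for one orientation as follows, assuming the per-scale log-Gabor amplitudes and phases have already been computed by a filter bank. The frequency-spread weight W(x, y) is taken as 1 for brevity, and the function name and default parameter values are hypothetical.

import numpy as np

def phase_congruency(amplitudes, phases, T=0.1, eps=1e-4):
    """Minimal sketch of Eqs. (1)-(2) for one orientation.

    amplitudes, phases: arrays of shape (n_scales, H, W) from a
    log-Gabor filter bank (filter bank construction not shown).
    T is the noise threshold, eps avoids division by zero.
    The frequency-spread weight W(x, y) is set to 1 here.
    """
    A = np.asarray(amplitudes, dtype=float)
    phi = np.asarray(phases, dtype=float)

    # Amplitude-weighted mean phase over scales.
    phi_bar = np.angle((A * np.exp(1j * phi)).sum(axis=0))

    dphi = np.cos(phi - phi_bar) - np.abs(np.sin(phi - phi_bar))   # Eq. (2)
    energy = np.maximum(A * dphi - T, 0.0).sum(axis=0)             # floor operator
    return energy / (A.sum(axis=0) + eps)                          # Eq. (1)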

Fig. 1. Comparison of the phase congruency and gradient in infrared and visible images

Another issue to be considered is the quantity of candidate CP; that is, only a sufficient number of CP can guarantee the accuracy of the transformation model. From the viewpoint of CP detection in a single image, significant points can be detected effectively, and they embody the structure information in the ultra-field infrared and visible images well. However, many CP gather in the interior structures of the scene in the visible image, such as trees and grassland. These details are not obvious, or even disappear, in the infrared image. Therefore, from the viewpoint of detecting CP pairs, we select more robust structures for CP matching, such as the edges of buildings, the sky-ground line and others, which are distinctly distinguishable in both the infrared and visible images. To preserve enough candidate CP pairs, we introduce nonlinear diffusion to filter the original image and utilize the gradient map of the filtered image to guide the CP selection.

Nonlinear diffusion filtering has been proven to preserve edges adaptively according to the image structures themselves. Motivated by this characteristic, we perform nonlinear diffusion filtering on the visible image to preserve robust structures and suppress relatively weak details and noise, as shown in Fig. 2(a). Thus, the structural consistency of the infrared and visible images can be improved.

$$\frac{{\partial I({x,y} )}}{{\partial t}} = div({c({x,y,t} )\cdot \nabla I} )$$
$$c({x,y,t} )= g({|{\nabla I({x,y,t} )} |} )$$
$$g = \left\{ {\begin{array}{cc} 1&{{{|{\nabla I} |}^2} \le 0}\\ {1 - \exp \left( { - \frac{{3.315}}{{{{({|{\nabla I} |/k} )}^3}}}} \right)}&{{{|{\nabla I} |}^2} > 0} \end{array}} \right.$$
where t is the scale parameter (increasing t leads to a simpler image representation), $div$ is the divergence operator, $\nabla$ represents the gradient operator, $c({x,y,t} )$ is the diffusion coefficient, and k is the diffusion contrast factor.
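
A minimal explicit-scheme sketch of the diffusion in Eqs. (3)-(5) is given below; the parameter values (k, step size, iteration count) are illustrative assumptions and not the ones used in the paper.

import numpy as np

def conductivity(grad_mag, k):
    """Eq. (5): edge-stopping function; equals 1 where the gradient vanishes."""
    g = np.ones_like(grad_mag)
    nz = grad_mag > 0
    g[nz] = 1.0 - np.exp(-3.315 / (grad_mag[nz] / k) ** 3)
    return g

def nonlinear_diffusion(img, k=0.02, step=0.15, n_iter=20):
    """Explicit finite-difference sketch of Eqs. (3)-(4); k, step and
    n_iter are illustrative, not the values used in the paper."""
    I = img.astype(float).copy()
    for _ in range(n_iter):
        # Differences towards the four neighbours.
        dN = np.roll(I, -1, axis=0) - I
        dS = np.roll(I,  1, axis=0) - I
        dE = np.roll(I, -1, axis=1) - I
        dW = np.roll(I,  1, axis=1) - I
        I += step * (conductivity(np.abs(dN), k) * dN +
                     conductivity(np.abs(dS), k) * dS +
                     conductivity(np.abs(dE), k) * dE +
                     conductivity(np.abs(dW), k) * dW)
    return I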

Fig. 2. Principle of the proposed CP extraction

The gradient map is calculated by applying the $[{ - 1,0,1} ]$ and ${[{ - 1,0,1} ]^T}$ operators to the filtered image. Then, the gradient map is updated by setting values below the threshold Th to zero. Finally, the candidate CP in the visible image are optimized by removing the points of low significance, i.e., those where the gradient is zero. As shown in Fig. 2(b), the edges and structures with high significance are extracted in the guidance map, and most of the CP in the visible image lie on strong edges or structures, such as buildings and vehicles. In contrast, the isolated CP in grassland and trees are removed effectively, as can be observed in Figs. 2(c)-(d). The effective candidate CP pairs thus occupy a larger proportion of all detected points, which leads to higher matching efficiency.
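
The guidance-map pruning described above can be sketched as follows. How the horizontal and vertical responses are combined is not specified in the text, so the gradient magnitude is assumed here, and the function name and threshold are illustrative.

import numpy as np
from scipy.ndimage import convolve

def prune_cp(cp_list, filtered_img, th):
    """Sketch of the guidance-map based CP optimization.

    cp_list: iterable of (row, col) candidate control points in the
    visible image; filtered_img: nonlinear-diffusion filtered image;
    th: gradient threshold Th.  Names and threshold are illustrative.
    """
    img = filtered_img.astype(float)
    gx = convolve(img, np.array([[-1.0, 0.0, 1.0]]))    # [-1, 0, 1]
    gy = convolve(img, np.array([[-1.0], [0.0], [1.0]]))  # [-1, 0, 1]^T
    guidance = np.hypot(gx, gy)          # assumed combination: magnitude
    guidance[guidance < th] = 0.0        # suppress weak structures
    return [(r, c) for (r, c) in cp_list if guidance[r, c] > 0]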

2.2 ROI pairs matching based on epipolar curve

Instead of relying on georeferencing information or a search window over the whole image, we propose to extract the ROI pairs using the binocular calibration results, which reflect the spatial relationship and intrinsic imaging law of the ultra-field infrared and visible cameras. Specifically, for an image pair, the homologous point of each pixel in one image is located on a curve in the other image, named the epipolar curve [27,28]; that is, the corresponding point in the slave image always lies near this curve. In other words, the CP near the epipolar curve can be treated as the ROI, and the correct CP will be selected from these ROI. The main steps for extracting ROI pairs are described as follows.

First, the ultra-field visible and infrared cameras are calibrated using a calibration model that optimizes the Bouguet binocular calibration model [29]. The calibration process mainly consists of four steps: (1) the corners are located at the subpixel level in the infrared and visible bands synchronously using a purpose-designed thermal radiation checkerboard; (2) single camera calibration is performed in the infrared and visible bands, respectively; (3) the calibrated parameters of the binocular cameras are combined to calculate the sum of the reprojection errors; (4) the binocular camera parameters are refined by utilizing the RBF network to fit the mapping relationship from the camera coordinate system to the sensor coordinate system. The imaging model of the ultra-field infrared and visible binocular cameras is illustrated in Fig. 3. The designed checkerboard and calibration results are shown in Fig. 4, and the positional relationship of the binocular cameras and the checkerboard is shown in Fig. 4(c). The designed checkerboard is composed of black and white aluminium alloy squares with Peltier elements installed behind them; thus, they can be heated and cooled, and robust corners are formed along their edges. Note that the working time is restricted to within 30 min to guarantee accuracy.

Fig. 3. Ultra-field infrared and visible binocular model

Fig. 4. Ultra-field infrared and visible binocular calibration results

Second, the ultra-field infrared image is taken as the master image because its CP are fewer but more stable, and the ultra-field visible image is taken as the slave image. According to the model in Fig. 3, a CP in the master image is inversely mapped into world space as a ray using the initial parameters of the ultra-field infrared camera.

$${p_l}\textrm{ = }\left\{ {\begin{array}{c} {{\rho_l} = \sqrt {x_l^2 + y_l^2} }\\ {{\omega_l} = \arctan \frac{{{y_l}}}{{{x_l}}}} \end{array}} \right.,{P_l}\textrm{ = }\left\{ {\begin{array}{c} R\\ {{\varphi_l} = proj_l^{ - 1}({{\rho_l},{\omega_l}} )}\\ {{\theta_l}} \end{array}} \right.,{P_{l - ray}}\textrm{ = }{\lambda _l}\left\{ {\begin{array}{c} {R\sin {\varphi_l}\sin {\theta_l}}\\ {R\sin {\varphi_l}\cos {\theta_l}}\\ {R\cos {\varphi_l}} \end{array}} \right.$$
where ${p_l}$ is a CP in the ultra-field infrared image, ${P_l}$ is the corresponding coordinate on the lens hemisphere, ${P_{l - ray}}$ is the ray in camera space, $proj_l^{ - 1}$ represents the inverse mapping rule of the infrared system, and ${\lambda _l}$ is the scale factor.

The point coordinates on the ray are transformed into the ultra-field visible camera coordinate system using the extrinsic relationship, and they are then projected into the ultra-field visible image using its initial parameters.

$${P_{r - ray}}\textrm{ = }{R_{rotation}} \cdot {\left[ {{\lambda_l}\left\{ {\begin{array}{c} {R\sin {\varphi_l}\sin {\theta_l}}\\ {R\sin {\varphi_l}\cos {\theta_l}}\\ {R\cos {\varphi_l}} \end{array}} \right.} \right]^T} + {T_{translation}},{p_{epipolar}} = pro{j_r}({{P_{r - ray}}} )$$
where ${R_{rotation}}$ and ${T_{translation}}$ are the rotation and translation from the calibration results, respectively, and $pro{j_r}$ is the mapping rule of the visible camera.
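
A schematic sketch of Eqs. (6)-(7) is given below. The calibrated forward and inverse mapping rules (refined by the RBF network) are passed in as callables and are not reproduced here; sampling the scale factor along the back-projected ray traces the epipolar curve in the visible image. All names are illustrative.

import numpy as np

def epipolar_curve(p_ir, proj_inv_ir, proj_vis, R, T, lambdas):
    """Sketch of Eqs. (6)-(7): trace the epipolar curve of an infrared
    CP in the visible image.

    proj_inv_ir maps an infrared pixel to a unit viewing direction and
    proj_vis maps a 3-D point in the visible camera frame to a pixel;
    both stand in for the calibrated (RBF-refined) mapping rules.
    R, T are the calibrated rotation and translation; lambdas samples
    the scale factor along the ray.
    """
    ray_dir = proj_inv_ir(p_ir)            # viewing direction in the IR camera frame
    curve = []
    for lam in lambdas:
        P_ir = lam * ray_dir                # point on the back-projected ray
        P_vis = R @ P_ir + T                # transform to the visible camera frame
        curve.append(proj_vis(P_vis))       # project into the visible image
    return np.array(curve)                  # sampled epipolar curve (pixel coordinates)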

As shown in Fig. 5, the yellow curve is the epipolar curve of a CP in the infrared image. The corresponding CP in the visible image lies near the epipolar curve, with a small deviation resulting from the calibration accuracy.

Fig. 5. Epipolar curve based on calibration results

Third, the Euclidean distances between the CP and the epipolar curve are calculated in the ultra-field visible image. The distance threshold is set to 3 pixels to compensate for the calibration deviation.
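
Given the sampled epipolar curve, the ROI selection of this step reduces to a nearest-distance test, sketched below with the 3-pixel threshold mentioned above; variable names are illustrative.

import numpy as np

def select_roi(cp_vis, curve, dist_th=3.0):
    """Keep visible-image CPs within dist_th pixels of the sampled
    epipolar curve.  cp_vis: (N, 2) candidate CP coordinates;
    curve: (M, 2) sampled curve points."""
    cp_vis = np.asarray(cp_vis, dtype=float)
    curve = np.asarray(curve, dtype=float)
    # Distance from every CP to its nearest sampled curve point.
    d = np.linalg.norm(cp_vis[:, None, :] - curve[None, :, :], axis=2).min(axis=1)
    return cp_vis[d <= dist_th]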

In summary, the ROI candidates are extracted using the epipolar curve of the ultra-field infrared and visible cameras. As a result, the proposed method can adapt to various scenes and reduce the number of search iterations. Thus, ROI pair matching can be accomplished with high efficiency for further identification and registration.

2.3 Multiple phase congruency directional patterns descriptor

The main contribution of this subsection is a novel local structure based descriptor, named MPCDP, for identifying the real CP pairs among the ROI candidate pairs. The proposed MPCDP is motivated by the GDP descriptor [30]. The GDP overcomes the sensitivity of the local binary pattern (LBP) to noise and illumination variation and obtains good results in facial expression recognition. The MPCDP descriptor is constructed in an adjacent window by quantizing its phase congruency amplitudes at different orientations. The adjacent window is defined according to the calibration results in the ultra-field visible image. The steps for establishing the MPCDP descriptor are as follows.

First, the MPCDP operator selects a 3×3 neighborhood around each point of the phase congruency map at each orientation. The phase congruency amplitudes are quantized with respect to the center point using a threshold t. Here, a neighboring point is marked as 1 if the absolute difference between its amplitude and that of the center point is less than t, and marked as 0 otherwise, as shown in Fig. 6.
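
The quantization step can be sketched as follows for a single phase congruency map; the neighbour ordering is an assumption for illustration, since only the thresholding rule is specified in the text.

import numpy as np

def mpcdp_code(pc_map, r, c, t):
    """Encode the 3x3 neighbourhood of (r, c) in one phase-congruency
    map as an 8-bit pattern.  A neighbour is set to 1 when its amplitude
    differs from the centre by less than t (t is a free parameter).
    Border pixels are assumed to be excluded by the caller."""
    centre = pc_map[r, c]
    # Assumed clockwise neighbour order starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for dr, dc in offsets:
        bit = int(abs(pc_map[r + dr, c + dc] - centre) < t)
        code = (code << 1) | bit
    return code   # integer in [0, 255]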

Fig. 6. The quantizing process of phase congruency

Second, the pattern of the center point is expressed as a binary code (10101010 in the example of Fig. 6), and the encoded map is obtained by applying the MPCDP operator. Then, the distribution information of the MPCDP is represented as a spatial histogram describing the characteristics of the adjacent window.

$${H_{MPCDP}}(i )= \sum\limits_{x = 1}^M {\sum\limits_{y = 1}^N {f({MPCDP({x,y} ),i} )} } ,\textrm{ }f({a,b} )= \left\{ {\begin{array}{cc} 1&{a = b}\\ 0&{otherwise} \end{array}} \right.$$

Third, the histograms of the different orientations are combined and weighted by the nonzero mean amplitudes of their phase congruency maps. Thus, the descriptor histogram is defined as follows:

$$MPCDP = \{{{\omega_1} \cdot PCD{P_1},{\omega_2} \cdot PCD{P_2}, \ldots {\omega_n} \cdot PCD{P_{nth - direction}}} \}$$
where ${\omega _1},{\omega _2}, \ldots {\omega _n}$ are L2-normalized weights of the nonzero mean values at different orientations.
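
A sketch of Eqs. (8)-(9) for one adjacent window is given below, assuming the per-orientation encoded maps and phase congruency maps are already available; the main-direction computation is omitted, and names are illustrative.

import numpy as np

def mpcdp_descriptor(code_maps, pc_maps):
    """Build the MPCDP histogram part for one window.

    code_maps: list of encoded windows (one per orientation, integer
    codes in [0, 255]); pc_maps: matching phase-congruency windows.
    Weights are the L2-normalized nonzero mean amplitudes."""
    hists, weights = [], []
    for codes, pc in zip(code_maps, pc_maps):
        h, _ = np.histogram(codes.ravel(), bins=256, range=(0, 256))  # Eq. (8)
        hists.append(h.astype(float))
        nz = pc[pc > 0]
        weights.append(nz.mean() if nz.size else 0.0)
    w = np.asarray(weights)
    w = w / (np.linalg.norm(w) + 1e-12)                                # L2 normalization
    return np.concatenate([wi * hi for wi, hi in zip(w, hists)])       # Eq. (9)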

Finally, the main direction of the descriptor is defined by introducing ${\omega _1},{\omega _2}, \ldots {\omega _n}$ into the orientation angles. Thus, the weighted histogram and the main direction together make up the MPCDP descriptor.

In summary, the MPCDP is built by utilizing both the amplitude and the orientation of the phase congruency. The amplitude information is converted to a binary code; therefore, the influence of contrast changes is eliminated and the MPCDP is more adaptable. After obtaining the MPCDP descriptors of the candidate CP and the ROI, their similarity is computed using the normalized correlation coefficient (NCC). If their NCC satisfies the similarity threshold, the CP pair is accepted as a real correspondence; otherwise, it is rejected.
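
The similarity check can be sketched as a standard zero-mean NCC between two descriptor vectors; the actual similarity threshold is not specified in the text, so the value used below is purely illustrative.

import numpy as np

def ncc(d1, d2):
    """Normalized correlation coefficient between two MPCDP descriptors."""
    d1 = d1 - d1.mean()
    d2 = d2 - d2.mean()
    return float((d1 * d2).sum() / (np.linalg.norm(d1) * np.linalg.norm(d2) + 1e-12))

# A CP pair is accepted when ncc(desc_ir, desc_vis) exceeds the similarity
# threshold (e.g., 0.8 here as a placeholder; the paper does not give the value).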

In summary, the flowchart of the proposed registration method is shown in Fig. 7. It mainly consists of the following six steps:

  • (1) The ultra-field infrared image is taken as the master image, since its significant structures are not as abundant as those of the ultra-field visible image.
  • (2) The CP in the ultra-field visible and infrared images are initially extracted by adopting the maximum moment of the phase congruency covariance. CP with weak significance are removed from the ultra-field visible image using the proposed guidance map.
  • (3) Taking the CP candidates in the ultra-field infrared image as reference points, the ROI are selected in the visible image based on the epipolar curve model. The adjacent window of each ROI in the visible image is defined according to the ultra-field binocular infrared-visible calibration results.
  • (4) The MPCDP descriptors of the CP and ROI are constructed based on their respective phase congruency maps. Each MPCDP descriptor is composed of a weighted histogram and a main direction.
  • (5) The similarity of two MPCDP descriptors is computed by employing the NCC. Matched pairs are kept as CP candidate pairs if their NCC satisfies the condition.
  • (6) Mismatched pairs are eliminated by using a global consistency check based on a global transformation [31].

Fig. 7. Flow of the proposed registration method

3. Experiment results and analysis

In this section, experiments are carried out on six ultra-field infrared and visible image pairs captured with our system. In the first four image pairs, the proportion of the scene within the whole field is varied to verify the flexibility of the proposed method under different conditions. To validate the universality of the proposed method, the binocular camera relationship is changed for the latter two image pairs. To evaluate the adaptability of the proposed method to illumination changes, the first and fourth image pairs were taken on a sunny morning, the second and third image pairs on a sunny afternoon, and the fifth and sixth image pairs on a lightly rainy day. The main phase congruency parameters are set as nscale = 4, norientation = 6, $\sigma = 0.55$ and ${\lambda _{\min }} = 3$ in the scale filter. The diffusivity speed of the nonlinear diffusion in the infrared image is set to 3. All experiments were implemented in MATLAB 2014b on a notebook PC with a 2.2 GHz Intel Core CPU and 4 GB memory.

3.1 Performance analysis of robust CP extraction

The performance of the proposed CP extraction method is evaluated by calculating the repeatability [32]. To further verify its superiority, the proposed method is compared with a phase congruency based method, the Harris corner detection method, and a variant of the proposed method in which the nonlinear diffusion filter is replaced by a Gaussian filter (denoted the Gaussian-replaced variant below). Following the definition of repeatability, the average repeatability under image rotation, scale change, illumination variation and viewpoint change is calculated over the six image pairs.
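
For reference, a sketch of the repeatability score in the sense of [32] is given below: the fraction of points that reappear within a small tolerance after the image change, normalized by the smaller detection count. The tolerance value and the assumption that both point sets are already mapped into a common frame are ours, not taken from the paper.

import numpy as np

def repeatability(pts_ref, pts_warp, eps=1.5):
    """pts_ref / pts_warp: (N, 2) detected points already mapped into a
    common frame; eps: matching tolerance in pixels (illustrative)."""
    pts_ref = np.asarray(pts_ref, dtype=float)
    pts_warp = np.asarray(pts_warp, dtype=float)
    d = np.linalg.norm(pts_ref[:, None, :] - pts_warp[None, :, :], axis=2)
    repeated = (d.min(axis=1) <= eps).sum()
    return repeated / min(len(pts_ref), len(pts_warp))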

The proposed method ranks roughly second among all tested methods, as shown in Fig. 8. However, in our view, relying on the repeatability results alone is not comprehensive, because the number of CP varies as the image morphology changes. For example, although the phase congruency based method always ranks first, image rotation causes an explosion in the number of CP, as shown in Fig. 9. The reason is that the Fourier components of the image change under rotation, so that more points satisfy the original threshold; in other words, there are too many invalid CP to guarantee robust detection. The Gaussian-replaced variant preserves part of the CP under image morphology changes without causing a CP explosion, but its performance is not as good as that of the proposed method, because nonlinear diffusion filtering extracts and preserves the significant structures and details better, and the guidance map based on nonlinear diffusion filtering is more distinguishable and robust than that based on the Gaussian filter. The performance of the Harris based method is inferior to that of the proposed method except for the fourth image pair. However, the Harris based method extracts only 24 CP in the fourth image pair; such a small number of CP results in poor registration accuracy. We can therefore conclude that the proposed method is more robust for extracting CP for ultra-field infrared and visible image registration, as shown in Fig. 10.

Fig. 8. Average repeatability of all the detection methods

Fig. 9. CP extraction results for rotation change

Fig. 10. Robust ultra-field infrared and visible CP extraction results

3.2 Performance analysis of ROI pairs matching

The ultra-field infrared and visible binocular cameras are calibrated to establish the correspondence between a CP in the infrared image and its epipolar curve in the visible image. The epipolar curve plays the role of the search window used in most current methods, so the calibration accuracy is important. Two groups of calibration results are listed in Table 1.

Table 1. Calibration results of infrared and visible ultra-field cameras

From the results above, it can be seen that the intrinsic parameters of the two cameras differ, and the difference in camera positions lies mainly along the X and Z directions, which causes the vision disparity. Thus, the imaging positions and characteristics of the same object differ between the two cameras owing to vision disparity, different imaging laws and wavebands. As shown in Fig. 11, the reprojection errors of the original calibration method are 0.48 pixel and 0.43 pixel, whereas those of the improved calibration method are 0.28 pixel and 0.26 pixel. It can be concluded that the improved method increases the calibration accuracy, which helps locate the epipolar curve more accurately in the binocular camera system.

Fig. 11. Reprojection errors comparison in ultra-field infrared and visible cameras

To further evaluate the performance of the proposed method for ROI pair matching, its error is defined as the Euclidean distance between the real CP coordinates and the calculated CP coordinates. The calculated CP coordinates are obtained by applying transformation models in the slave image. The transformation models selected for comparison are affine, projective, quadratic polynomial and cubic polynomial. In our method, the Euclidean distance is defined as the closest distance from the real CP coordinate to the epipolar curve. To make the comparison reliable, we establish a large set of CP pairs distributed on the edges or corners of the salient structures. Moreover, the training CP pairs are sampled uniformly from the CP pair set to prevent the transformation model from falling into local optima, and the test CP pairs are selected randomly to test the matching errors of the ROI pairs. In our experiments, the proportions of training CP and test CP within the CP set vary from 10% to 100%. The numbers of CP pairs selected for the experiment are 1269, 706, 1363, 1095, 915 and 928 in image pairs No. 1 to No. 6, respectively.

Low average matching errors imply that it is easier to locate the correct ROI with high accuracy. As shown in Fig. 12, the average matching errors of the proposed method are always the lowest; they are at least 50% lower than those of the comparison methods. In particular, large errors appear in the cubic polynomial result for the second image pair, because the scene occupies only one-fifth of the FOV in that pair, so there are not enough training CP to fit the model over the whole field. Moreover, the proposed method is more efficient, because the comparison methods still need to increase the number of training CP to optimize the transformation model. The standard deviations of the matching errors are shown in Fig. 13. Similarly, the proposed method obtains lower values than the comparison methods, which indicates that it is more robust for various application conditions and different camera relationships. A higher standard deviation means that the distance between the ROI and the correct CP is sometimes much larger than the average error. Thus, to keep the registration effective, the comparison methods need not only to increase the number of training CP but also to expand the search window size, which undoubtedly increases the computational cost. Therefore, it can be concluded that the proposed method is more efficient and robust in ROI pair matching.

Fig. 12. Average matching errors of image pairs No.1-6

Fig. 13. Standard deviations of matching errors of image pairs No.1-6

3.3 Performance analysis of MPCDP descriptor

To evaluate the performance of the MPCDP descriptor, comparison experiments are conducted in this section between the proposed MPCDP descriptor and four state-of-the-art descriptors: the gradient directional pattern (GDP) [30], the histogram of orientated phase congruency (HOPC) [5], LBP [19] and the local directional texture pattern (LDTP) [33]. They represent gradient based, phase congruency based, binary based and direction based features, respectively. All five descriptors, including MPCDP, are applied to correct CP pair identification tests. Note that the ROI pairs in all tests are those obtained in the previous section. The parameters of the comparison descriptors are set to the defaults in their respective papers, except that the window size is set to 20 pixels for all descriptors. Here we adopt the correct matching ratio (CMR) as the index, defined as follows.

$$CMR = \frac{{Nu{m_{correct}}}}{{Nu{m_{all}}}}$$
where $Nu{m_{correct}}$ is the number of correctly matched CP pairs, and $Nu{m_{all}}$ is the number of all candidate CP pairs.

As shown in Fig. 14, the CMR of all descriptors is calculated on the six ultra-field infrared and visible image pairs. The proposed MPCDP ranks first in all CMR tests. The CMR of LDTP is approximately equal to, but slightly lower than, ours. The CMR of the HOPC descriptor remains around 0.7; although HOPC only ranks third, its results indicate that it is stable under various conditions. Compared with the top three descriptors, the performance of the GDP and LBP descriptors is relatively unstable; in particular, their CMR falls below 50% on the second image pair.

Fig. 14. Five descriptors CMR of image pairs No.1-6

From the results above, we can conclude that phase congruency and directional characteristics are better suited to multi-modal feature matching, because the top three descriptors all rely on both, or at least one, of them. Note that the slightly worse result of HOPC arises because there are not enough salient structure features in the local window. This conclusion is therefore credible and accords exactly with the analysis in Section 2.1. In contrast, it is hard to obtain good results relying only on gradient or binary based methods, because nonlinear intensity variations interfere with the structure similarity calculation in the original infrared and visible images.

All the results and analysis above demonstrate that the proposed MPCDP descriptor achieves good performance in correct CP identification in ultra-field infrared and visible images. Its superior performance can be attributed to two aspects. On the one hand, the relative relationships of the phase congruency amplitudes are converted into a weighted histogram of binary codes. On the other hand, multiple directions of the phase congruency map are combined, and the L2-normalized weights contribute to the robustness of the method. Thus, the MPCDP descriptor can not only distinguish the significant structures accurately, but also remain robust against nonlinear intensity variations.

3.4 Registration results

To verify the overall performance of the proposed registration method, six pairs of ultra-field infrared and visible images are employed in the final registration experiments. The proportions of the scenes occupying the whole field are approximately 85%, 25%, 50%, 75%, 50% and 50%, respectively. To better distinguish the registration details, the outlines of the infrared images are extracted and superimposed on the visible images in yellow [5,9]. This choice is made because there are fewer outlines in the infrared images; otherwise, the yellow outlines of the visible images would cover up the infrared details, so this presentation is more appropriate for human observation and understanding. As shown in Fig. 15, the registration result of every image pair consists of three images: the CP correspondence image, the fusion image based on infrared outlines, and a locally enlarged part of the fusion image. Note that fifty CP pairs are selected randomly for display in each CP correspondence image for easier observation. It can be observed that the CP correspondences are all correct, and the structures and details of the ultra-field infrared and visible images are accurately aligned. Moreover, all the regions are well registered, whether at the edges or the center of the images, which indicates that the proposed method can eliminate the interference of image distortion and vision disparity. Meanwhile, the proposed method performs robustly in various scenes and under different camera relationships.

Fig. 15. Final registration results of six image pairs

In summary, the proposed method combines the binocular camera model and the phase congruency based feature to accomplish accurate registration. Hence, the accuracy will decrease when there is little structure information, because the construction of the phase congruency based feature is then disturbed. More effective feature extraction with less dependence on detail will be studied in the future.

4. Conclusions

In this paper, a robust registration method is presented for ultra-field infrared and visible images. The method combines computations on the images with the physical model of the binocular cameras to overcome the vision disparity and image distortion caused by nonlinear imaging. The CP extraction is made robust by utilizing phase congruency theory and employing the significant structures as guidance in both the ultra-field infrared and visible images. Then, ROI pairs are selected efficiently based on the epipolar curve, which is derived from the improved calibration results. The MPCDP descriptor is proposed to identify the correct CP pairs for registration. The experiments cover five types of scenes occupying different proportions of the whole FOV and two different camera relationships. The results for repeatability, matching errors and CMR prove that the proposed method is robust and accurate for ultra-field infrared and visible image registration. In future work, the proposed registration method will be studied for higher efficiency.

Funding

National Natural Science Foundation of China (61702347, 61801507); Army Engineering University Foundation Frontier Innovation Project (XK201933XQ15).

Disclosures

The authors declare no conflicts of interest.

References

1. W. M. Zhang, J. Zhao, M. Chen, Y. M. Chen, K. Yan, L. Y. Li, J. B. Qi, X. Y. Wang, J. H. Luo, and Q. Chu, “Registration of optical imagery and LiDAR data using an inherent geometrical constraint,” Opt. Express 23(6), 7694–7702 (2015). [CrossRef]  

2. Q. H. Yu, D. M. Wu, F. C. Chen, and S. L. Sun, “Design of a wide-field target detection and tracking system using the segmented planar imaging detector for electro-optical reconnaissance,” Chin. Opt. Lett. 16(7), 071101 (2018). [CrossRef]  

3. S. G. Li, “Binocular Spherical Stereo,” IEEE Trans. Intell. Transport. Syst. 9(4), 589–600 (2008). [CrossRef]  

4. X. H. Wang, K. H. Wu, and S. Z. Wang, “Research on Panoramic Image Registration Approach based on Spherical Model,” IJSIP 6(6), 297–308 (2013). [CrossRef]  

5. Y. X. Ye, J. Shan, L. Bruzzone, and L. Shen, “Robust Registration of Multimodal Remote Sensing Images Based on Structural Similarity,” IEEE Trans. Geosci. Electron. 55(5), 2941–2958 (2017). [CrossRef]  

6. H. D. Liang, C. L. Liu, B. He, T. Nie, G. L. Bi, and C. Su, “A Binary Method of Multisensor Image Registration Based on Angle Traversal,” Infrared Phys. Technol. 95, 189–198 (2018). [CrossRef]  

7. Y. Bentoutou, N. Taleb, K. Kpalma, and J. Ronsin, “An Automatic Image Registration for Applications in Remote Sensing,” IEEE Trans. Geosci. Electron. 43(9), 2127–2137 (2005). [CrossRef]  

8. Y. J. Zuo, J. H. Liu, M. Y. Yang, X. Wang, and M. C. Sun, “Algorithm for unmanned aerial vehicle aerial different-source image matching,” Opt. Eng. 55(12), 123111 (2016). [CrossRef]  

9. J. W. Fan, Y. Wu, M. Li, W. K. Liang, and Y. C. Cao, “SAR and Optical Image Registration Using Nonlinear Diffusion and Phase Congruency Structural Descriptor,” IEEE Trans. Geosci. Electron. 56(9), 5368–5379 (2018). [CrossRef]  

10. F. Maes, A. Collignon, D. Vandermeulen, G. Marchal, and P. Suetens, “Multimodality image registration by maximization of mutual information,” IEEE Trans. Med. Imaging 16(2), 187–198 (1997). [CrossRef]  

11. S. K. S. Fan, Y. C. Chuang, and J. R. Wu, “A New Cross-Correlation Based Image Registration Method,” Appl. Mech. Mater. 58-60(60), 1979–1984 (2011). [CrossRef]  

12. Y. Hel-Or, H. Hel-Or, and E. David, “Fast template matching in non-linear tone-mapped images,” in Proc. IEEE Int. Conf. Comput. Vis., (2011), pp. 1355–1362.

13. D. Lowe, “Distinctive image features from scale-invariant keypoints,” Int. J. Comput. Vis. 60(2), 91–110 (2004). [CrossRef]  

14. M. Calonder, V. Lepetit, C. Strecha, and P. Fua, “BRIEF: Binary Robust Independent Elementary Features,” In Proceedings of the European Conference on Computer Vision, (ECCV), (2010), pp. 1–4.

15. E. Rosten and T. Drummond, “Machine learning for high-speed corner detection,” In Proceedings of the European Conference on Computer Vision (ECCV), (2006), pp. 1–2.

16. C. Y. Zhao, H. C. Zhao, J. F. Lv, S. J. Sun, and B. Li, “Multimodal image matching based on Multimodality Robust Line Segment Descriptor,” Neurocomputing 177(6), 290–303 (2016). [CrossRef]  

17. Z. L. Song, S. Li, and Thomas F. George, “Remote sensing image registration approach based on a retrofitted SIFT algorithm and Lissajous-curve trajectories,” Opt. Express 18(2), 513–522 (2010). [CrossRef]  

18. F. Lv and K. Ren, “Automatic registration of airborne LiDAR point cloud data and optical imagery depth map based on line and points features,” Infrared Phys. Technol. 71, 457–463 (2015). [CrossRef]  

19. T. Ojala, M. Pietikainen, and T. Maenpaa, “Multiresolution gray-scale and rotation invariant texture classification with local binary patterns,” IEEE Trans. Pattern Anal. Machine Intell. 24(7), 971–987 (2002). [CrossRef]  

20. E. Shechtman and M. Irani, “Matching local self-similarities across images and videos,” in Proc. IEEE Comput. Vis. Pattern Recognit. (2007), pp. 1–8.

21. M. I. Patel, V. K. Thakar, and S. K. Shah, “Image Registration of Satellite Images with Varying Illumination Level Using HOG Descriptor Based SURF,” Procedia Comput. Sci. 93, 382–388 (2016). [CrossRef]  

22. S. W. Teng, M. T. Hossain, and G. J. Lu, “Multimodal image registration technique based on improved local feature descriptors,” J. Electron. Imaging 24(1), 013013 (2015). [CrossRef]  

23. C. Q. Tang, G. Y. Tian, X. T. Chen, J. B. Wu, K. J. Li, and H. Y. Meng, “Infrared and Visible Images Registration with Adaptable Local-Global Feature Integration for Rail Inspection,” Infrared Phys. Technol. 87, 31–39 (2017). [CrossRef]  

24. P. Kovesi, “Image features from phase congruency,” Videre: Journal of Computer Vision Research 1(3), 1–26 (1999).

25. X. Z. Liu, Y. F. Ai, J. L. Zhang, and Z. P. Wang, “A Novel Affine and Contrast Invariant Descriptor for Infrared and Visible Image Registration,” Remote Sens. 10(4), 658 (2018). [CrossRef]  

26. A. Wong and D. A. Clausi, “ARRSI: Automatic Registration of Remote-Sensing Images,” IEEE Trans. Geosci. Electron. 45(5), 1483–1493 (2007). [CrossRef]  

27. S. G. Li, “Real-Time Spherical Stereo,” in 18th International Conference on Pattern Recognition (ICPR’06), (2006), pp. 1–4.

28. J. Moreau, S. Ambellouis, and Y. Ruichek, “3D reconstruction of urban environments based on fisheye stereovision,” in 8th International Conference on Signal Image Technology and Internet Based System, (2012), pp. 36–41.

29. J. Bouguet, “Camera calibration toolbox for matlab,” (2014). http://www.vision.caltech.edu/bouguetj/calib_doc/index.html, (visited January 2020).

30. F. Ahmed, “Gradient directional pattern: a robust feature descriptor for facial expression recognition,” Electron. Lett. 48(19), 1203 (2012). [CrossRef]  

31. L. Yu, D. R. Zhang, and E. J. Holden, “A fast and fully automatic registration approach based on point features for multi-source remote-sensing images,” Comput. Geosci. 34(7), 838–848 (2008). [CrossRef]  

32. C. Schmid, R. Mohr, and C. Bauckhage, “Evaluation of interest point detectors,” Int. J. Comput. Vis. 37(2), 151–172 (2000). [CrossRef]  

33. A. R. Rivera, J. R. Castillo, and O. Chae, “Local directional texture pattern image descriptor,” Pattern Recognit. Lett. 51, 94–100 (2015). [CrossRef]  
