
Achieving widely distributed feature matches using flattened-affine-SIFT algorithm for fisheye images

Open Access

Abstract

Performing correction first is the most common method of addressing feature matching issues for fisheye images, but correction often results in significant loss of scene details or stretching of images, leaving peripheral regions without matches. In this paper, we propose a novel approach, named flattened-affine-SIFT, to find widely distributed feature matches between stereo fisheye images. Firstly, we establish a new imaging model that integrates a scalable model and a hemisphere model. Utilizing the extensibility of the imaging model, we design a flattened array model to reduce the distortion of fisheye images. Additionally, the affine transformation is performed on the flattened simulation images, which are computed using the differential expansion and the optimal rigid transformation. Then features are extracted and matched from the simulated images. Experiments on indoor and outdoor fisheye images show that the proposed algorithm can find a large number of reliable feature matches. Moreover, these matches tend to be dispersed over the entire effective image, including peripheral regions with dramatic distortion.

© 2024 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

The fisheye camera is a powerful omni-directional perception sensor that is inexpensive, compact, and robust, and it has seen significant use in robotics applications. Due to the wider field-of-view (FOV) compared to pinhole cameras, fisheye cameras can pack more information into the same sensor area [1,2]. In applications that demand 3D information, employing fisheye cameras offers substantial advantages, particularly for robot navigation and autonomous driving [3–5]. The wide FOV allows for simultaneous visualization and observation of objects from multiple directions.

Feature matching is a prerequisite for the aforementioned applications [6,7]. To capitalize on the wide FOV, several methods have been proposed to address the feature matching problem for fisheye images. The most significant challenge in this process is dealing with the dramatic distortion inherent in fisheye images. As shown in Fig. 1 (a), the steps and columns in the fisheye images are severely distorted.

Fig. 1. Original fisheye image (a) and two types of corrected image: perspective-rectified images (b) and equirectangular image (c).

Generally, the distortion is removed by performing perspective projection rectification on fisheye images. This process enables the use of feature matching algorithms designed for perspective images. Fiala et al. [8] transformed omnidirectional fisheye images into cube maps and calculated scale invariant feature transform (SIFT) features [9] on each cube surface. Zhang et al. [10] rectified fisheye images using a scanning line projection strategy and extracted speeded up robust features (SURF) [11] from the rectified perspective images. Lin et al. [3] designed a navigation system equipped with fisheye cameras for an unmanned aerial vehicle. They only used part of the fisheye images: the horizontal area was rectified to a perspective image for subsequent processing, while most of the vertical FOV was abandoned. Miiller et al. [12] proposed an omnidirectional 3D mapping system. To address the distortion, the central region of each fisheye image was rectified to two perspective images, and the peripheral scenes were also discarded. Perspective rectification can effectively remove the nonlinear distortion, but it also eliminates the main advantage of such cameras: the wide FOV. Furthermore, objects close to the edge of the valid area are highly stretched, while objects close to the center are highly compressed, as shown in the blue and red rectangular regions in Fig. 1 (b).

Other rectification methods, such as the equirectangular projection, retain the fisheye camera’s wide FOV. These methods can alleviate the distortion in fisheye images to a certain extent. In order to generate a 3D surround view, Lo et al. [13] stitched a pair of fisheye images captured by back-to-back fisheye cameras with a 195° FOV angle. Their method involved rectifying the overlapping FOV into equirectangular images and extracting features to achieve the stitched image. In [14], fisheye images were deformed using an equirectangular projection to simplify the stereo correspondence search. Template stereo matching was used to calculate corresponding points in the deformed images. Based on the traditional oriented FAST and rotated BRIEF (ORB) feature matching algorithm [15], Zhao et al. [16] proposed the spherical ORB algorithm by constructing a new binary feature for equirectangular images in spherical space. To enhance the accuracy of feature descriptors for fisheye images, Pourian et al. [17] proposed a new framework for detecting and matching features that employed multi-level geometric perception. However, the use of equirectangular projection introduced new distortions that degraded the performance of the descriptors. Based on a spherical projection model, equirectangular projection methods unfold the spherical image into a rectangular image, which can cause severe stretching deformation, especially around the poles, as shown in the green rectangular regions in Fig. 1 (c), resulting in secondary distortion and a large number of mismatches.

The P-ASIFT algorithm [18] is modified from the Affine-SIFT (ASIFT) algorithm [19] and is designed for fisheye images. Experiments have shown that P-ASIFT achieves good matching results. It adopts a hemispherical division model and a 1-level division method to divide the fisheye images into a series of patches. Then the affine transformation is performed on the patches. However, the number of patches obtained after the 1-level division is already as many as 40, and each patch needs to be transformed and have features extracted several times. The good performance of P-ASIFT thus comes at the cost of high computational complexity; although the computation can be parallelized, this in turn presupposes a high-end hardware configuration. In this paper, we also improve the ASIFT algorithm for fisheye images, but unlike P-ASIFT, the proposed flattened-affine-SIFT (abbreviated as FA-SIFT) algorithm obtains a wider distribution of matches with fewer affine transformations.

More specifically, we first construct an imaging model that combines a hemi-icosahedron model with a hemisphere model. We then extend the imaging model into a flattened array model composed of flattened elements. For each flattened element, we construct a flattened simulation element. Subsequently, we convert the fisheye image into a series of flattened simulation images using the optimal rigid transformation. This process greatly reduces the nonlinear distortion in the fisheye image. After that, we apply affine transformations to the flattened simulation images to obtain a series of simulated images. Finally, we implement the SIFT algorithm on the simulated images to extract and match features. In this way, we address the feature matching issue while fully leveraging the fisheye camera’s wide FOV, without loss of scene details or stretching of images.

The remainder of this paper is structured as follows. Section 2 is dedicated to describing the flattened-affine-SIFT algorithm proposed for fisheye images. Section 3 presents the comparison experiments and their results. Section 4 concludes this paper.

2. Flattened-affine-SIFT algorithm for fisheye images

The ASIFT algorithm is fully affine invariant to changes in viewpoint. Its distinctive innovation is to carry out affine transformations on the input images and then extract features from a series of transformed images. The ASIFT algorithm has the advantages of good robustness as well as a considerable number of matches. However, it is strongly tailored to perspective images that follow the linear pinhole model. To take advantage of the benefits of ASIFT for fisheye images, we enhance the ASIFT algorithm and introduce a new flattened-affine-SIFT approach. A crucial first step in this method is to construct a new imaging model.

2.1 Establishment of the scalable integrated imaging model

The aim of developing a scalable integrated imaging model is to weaken the nonlinear distortion in fisheye images. Unlike perspective correction and equirectangular correction, the premise of this method is that there is no loss of scene information or stretching of images. The sphere model is a common imaging model for fisheye cameras [20–22]. To establish the scalable integrated imaging model as the fundamental imaging model, we initially embed a regular icosahedron within a unit sphere, as depicted in Fig. 2 (a). The advantage of the regular icosahedron is that it has a more manageable data structure: it comprises multiple flat faces and can be extended. Furthermore, when the tridimensional points of the scene are linearly projected onto the icosahedron, the image formed on the faces is theoretically undistorted. The fisheye camera used in our experiments has a FOV angle of 180°, so the scalable integrated imaging model actually comprises a hemisphere model and half of a regular icosahedron, named hemi-icosahedron, as illustrated in Fig. 2 (b). As shown in Fig. 2 (c), the camera coordinate system is defined as $O - XYZ$.

Fig. 2. The scalable integrated imaging model.

As shown in Fig. 2(c), based on the scalable integrated imaging model, when a tridimensional point of the scene M is imaged on the image plane, it undergoes two steps: first, M is linearly projected onto $m^{\prime}$ and ${m_\Delta }$. $m^{\prime}$ is the intersection of $\overrightarrow {MO} $ and the hemispherical surface. ${m_\Delta }$ is the intersection of $\overrightarrow {MO} $ and the hemi-icosahedron. The axis $\overrightarrow {OZ} $ and the vector $\overrightarrow {OM} $ form a polar angle $\rho (\rho \in [0,\pi /2]).$ $\sigma (\sigma \in [0,2\pi ])$ denotes the azimuth angle between the axis $\overrightarrow {OX} $ and the projection vector of $\overrightarrow {OM} $ on the XOY plane. Then $m^{\prime}$ and ${m_\Delta }$ are nonlinearly projected onto m according to the projection model. This process defines the mapping between tridimensional points of the scene and image points, resulting in a circular effective region on the fisheye image.
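As a minimal numerical sketch of this first, linear projection step (the function and variable names below are ours, and we assume the hemisphere faces the scene along the $+Z$ axis), the hemispherical point $m^{\prime}$ and the angles $\rho$ and $\sigma$ can be computed as follows:

```python
import numpy as np

def project_to_hemisphere(M):
    """Linearly project a 3D scene point M onto the unit hemisphere centered at O.

    Returns the hemispherical point m', the polar angle rho between OZ and OM,
    and the azimuth angle sigma of the projection of OM on the XOY plane.
    Assumes the optical axis is +Z and the point lies in front of the camera.
    """
    M = np.asarray(M, dtype=float)
    m_prime = M / np.linalg.norm(M)                  # intersection of the ray with the unit sphere
    rho = np.arccos(np.clip(m_prime[2], -1.0, 1.0))  # polar angle, in [0, pi/2] for a 180 deg FOV
    sigma = np.arctan2(m_prime[1], m_prime[0]) % (2 * np.pi)  # azimuth angle in [0, 2*pi)
    return m_prime, rho, sigma

# Example: a point one meter in front of the camera and slightly to the right
m_prime, rho, sigma = project_to_hemisphere([0.2, 0.0, 1.0])
```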

To project as large a scene as possible onto a limited plane, the fisheye camera follows a non-similar imaging rule. The commonly used projection models of fisheye lenses include stereographic projection, equidistant projection, equisolid projection, orthographic projection, and polynomial approximation projection [23–26]. The fisheye camera used in this experiment follows the equisolid projection model, which defines the relationship between the distance ${d_{ima}}$ from the center ${o_I}$ to $m$ in the fisheye image and the polar angle $\rho $. Its mathematical form is formulated in (1):

$${d_{ima}}(\rho ) = 2f\sin (\frac{\rho }{2}),$$
where f denotes the focal length of the fisheye camera used in the experiments.
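For illustration, formula (1) maps the polar angle directly to an image-plane radius; the focal length below is a hypothetical value, not the calibrated one used in our experiments:

```python
import numpy as np

def equisolid_radius(rho, f):
    """Distance d_ima(rho) = 2 * f * sin(rho / 2) from the image center o_I, formula (1)."""
    return 2.0 * f * np.sin(rho / 2.0)

f = 350.0  # focal length in pixels (hypothetical value for illustration only)
print(equisolid_radius(np.pi / 2, f))  # radius of the effective image circle for a 180 deg FOV
```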

2.2 Correspondence between geodesic face and hemispherical surface

Based on the imaging model, a pixel in the fisheye image corresponds to a point on the hemispherical surface and a point on the geodesic face. By utilizing the method in [27], the correlation between $m^{\prime} = ({X_{m^{\prime}}},{Y_{m^{\prime}}},{Z_{m^{\prime}}})$ on the hemispherical surface and $m = ({x_m},{y_m})$ on the fisheye image can be determined. The next step is to calculate the correlation between $m^{\prime}$ and ${m_\Delta } = ({X_\Delta },{Y_\Delta },{Z_\Delta })$ on the geodesic face. The key to achieving this step is to determine which geodesic face ${m_\Delta }$ is located on, and then further localize the exact point on the geodesic face. Suppose that the three vertices of the geodesic face ${\nu _1}{\nu _2}{\nu _3}$ where ${m_\Delta }$ is located are ${\nu _1} = ({X_1},{Y_1},{Z_1})$, ${\nu _2} = ({X_2},{Y_2},{Z_2})$, and ${\nu _3} = ({X_3},{Y_3},{Z_3})$. The Plücker coordinates ${\Omega _{{\nu _1}{\nu _2}}}$ of the directed line $\overrightarrow {{\nu _1}{\nu _2}} $ can be obtained by formula (2). In the same way, the Plücker coordinates ${\Omega _{{\nu _1}{\nu _3}}}$ and ${\Omega _{{\nu _2}{\nu _3}}}$ of the directed lines $\overrightarrow {{\nu _1}{\nu _3}} $ and $\overrightarrow {{\nu _2}{\nu _3}} $ can also be calculated.

$${\Omega _{{\nu _1}{\nu _2}}} = [{X_1}{Y_2} - {X_2}{Y_1},{X_1}{Z_2} - {X_2}{Z_1},{X_1} - {X_2},{Y_1}{Z_2} - {Y_2}{Z_1},{Z_1} - {Z_2},{Y_2} - {Y_1}]. $$

The origin of the camera coordinate system is $O = (0,0,0)$. The Plücker coordinates ${\Omega _{Om^{\prime}}}$ of the directed line $\overrightarrow {Om^{\prime}} $ are $[0,0, - {X_{m^{\prime}}},0, - {Z_{m^{\prime}}}, - {Y_{m^{\prime}}}]$. We apply the side operator to $\overrightarrow {Om^{\prime}} $ and the directed line $\overrightarrow {{\nu _1}{\nu _2}} $ to obtain the relation $\widehat \Phi (\overrightarrow {Om^{\prime}} ,\overrightarrow {{\nu _1}{\nu _2}} )$, which is expressed in formula (3). Similarly, the relations $\widehat \Phi (\overrightarrow {Om^{\prime}} ,\overrightarrow {{\nu _1}{\nu _3}} )$ and $\widehat \Phi (\overrightarrow {Om^{\prime}} ,\overrightarrow {{\nu _2}{\nu _3}} )$ can also be calculated.

$$\widehat \Phi (\overrightarrow {Om^{\prime}} ,\overrightarrow {{\nu _1}{\nu _2}} ) = {X_{m^{\prime}}}({Y_2}{Z_1} - {Y_1}{Z_2}) + {Y_{m^{\prime}}}({X_2}{Z_1} - {X_1}{Z_2}) + {Z_{m^{\prime}}}({X_2}{Y_1} - {X_1}{Y_2}). $$

Utilizing the relations $\widehat \Phi (\overrightarrow {Om^{\prime}} ,\overrightarrow {{\nu _1}{\nu _2}} )$, $\widehat \Phi (\overrightarrow {Om^{\prime}} ,\overrightarrow {{\nu _1}{\nu _3}} )$, and $\widehat \Phi (\overrightarrow {Om^{\prime}} ,\overrightarrow {{\nu _2}{\nu _3}} )$, we can determine whether ${m_\Delta }$ lies on the geodesic face ${\nu _1}{\nu _2}{\nu _3}$. To obtain the one-to-one mapping between the image plane, the hemispherical surface, and the geodesic face, it is essential to obtain the correspondence between the points on the hemispherical surface and the points on the geodesic face. To address this issue, we propose a positioning formula to determine the exact projection points on the geodesic face. Taking the geodesic face ${\nu _1}{\nu _2}{\nu _3}$ as an example, we can compute its normalized normal vector $\overrightarrow {{n_\Delta }} = ({i_\Delta },{j_\Delta },{k_\Delta })$. Utilizing the point $m^{\prime} = ({X_{m^{\prime}}},{Y_{m^{\prime}}},{Z_{m^{\prime}}})$ on the hemispherical surface, the normalized normal vector $\overrightarrow {{n_\Delta }} = ({i_\Delta },{j_\Delta },{k_\Delta })$, and one of the vertices of ${\nu _1}{\nu _2}{\nu _3}$, the coordinates $({X_\Delta },{Y_\Delta },{Z_\Delta })$ of ${m_\Delta }$ can be calculated by positioning formula (4):

$$\left\{ \begin{array}{l} {X_\Delta } = \frac{{{X_{m^{\prime}}}({i_\Delta } \cdot {X_1} + {j_\Delta } \cdot {Y_1} + {k_\Delta } \cdot {Z_1})}}{{{i_\Delta } \cdot {X_{m^{\prime}}} + {j_\Delta } \cdot {Y_{m^{\prime}}} + {k_\Delta } \cdot {Z_{m^{\prime}}}}}\\ {Y_\Delta } = \frac{{{Y_{m^{\prime}}}({i_\Delta } \cdot {X_1} + {j_\Delta } \cdot {Y_1} + {k_\Delta } \cdot {Z_1})}}{{{i_\Delta } \cdot {X_{m^{\prime}}} + {j_\Delta } \cdot {Y_{m^{\prime}}} + {k_\Delta } \cdot {Z_{m^{\prime}}}}}\\ {Z_\Delta } = \frac{{{Z_{m^{\prime}}}({i_\Delta } \cdot {X_1} + {j_\Delta } \cdot {Y_1} + {k_\Delta } \cdot {Z_1})}}{{{i_\Delta } \cdot {X_{m^{\prime}}} + {j_\Delta } \cdot {Y_{m^{\prime}}} + {k_\Delta } \cdot {Z_{m^{\prime}}}}} \end{array} \right.. $$

By utilizing the correlation between $m^{\prime} = ({X_{m^{\prime}}},{Y_{m^{\prime}}},{Z_{m^{\prime}}})$ and $m = ({x_m},{y_m})$, along with formula (4), the one-to-one correlation between points on the image plane, geodesic face, and hemispherical surface can be obtained.
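The face test and the positioning formula can be sketched as follows (our own function names). The helper below uses the triple-product form of the side operator, which agrees with formula (3) up to an overall sign convention, so the same-sign test over the three edges is unaffected:

```python
import numpy as np

def side_operator(m_prime, v_a, v_b):
    """Side relation between the ray O->m' and the directed edge v_a->v_b
    (triple-product form; equivalent to formula (3) up to a global sign)."""
    return float(np.dot(m_prime, np.cross(v_a, v_b)))

def locate_on_face(m_prime, v1, v2, v3):
    """Return m_delta from formula (4) if the ray O->m' crosses triangle v1 v2 v3, else None."""
    m_prime, v1, v2, v3 = (np.asarray(p, dtype=float) for p in (m_prime, v1, v2, v3))
    s1 = side_operator(m_prime, v1, v2)
    s2 = side_operator(m_prime, v2, v3)
    s3 = side_operator(m_prime, v3, v1)
    same_sign = (s1 >= 0 and s2 >= 0 and s3 >= 0) or (s1 <= 0 and s2 <= 0 and s3 <= 0)
    if not same_sign:
        return None                                 # the ray misses this geodesic face
    n = np.cross(v2 - v1, v3 - v1)
    n = n / np.linalg.norm(n)                       # normalized face normal (i, j, k)
    scale = np.dot(n, v1) / np.dot(n, m_prime)      # common factor of formula (4)
    return scale * m_prime                          # (X_delta, Y_delta, Z_delta)
```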

2.3 Flattened simulation images used for affine simulation

As shown in Fig. 3, the regular icosahedron can be expanded into five connected parallelograms. The geodesic faces 1, 2, 3, …, 20 correspond to triangles $1^{\prime}, 2^{\prime}, 3^{\prime}, \ldots, 20^{\prime}$ in the parallelograms. The expansion was performed by dividing the top five sides as well as the bottom five sides of the regular icosahedron. Taking the top five sides a, b, c, d, and e as an example, each side forms a pair of jagged edges $(a^{\prime},a^{\prime\prime})$, $(b^{\prime},b^{\prime\prime})$, $(c^{\prime},c^{\prime\prime})$, $(d^{\prime},d^{\prime\prime})$, and $(e^{\prime},e^{\prime\prime})$. The vertex ${O_p}$ of the regular icosahedron forms five points ${O_{p1}}$, ${O_{p2}}$, ${O_{p3}}$, ${O_{p4}}$, and ${O_{p5}}$ on the parallelograms, where ${O_{p1}}$ is used as the origin to establish the coordinate system.

Fig. 3. The regular icosahedron is expanded into five connected parallelograms.

Since the fisheye camera employed in the experiment has a FOV angle of 180°, it suffices to expand only the geodesic faces on the hemi-icosahedron to form a flattened array model, as shown in Fig. 4. Therefore, the gray solid part does not need to be considered. Each cell in the flattened array model, for instance $1^{\prime}$, $2^{\prime}$, or $3^{\prime}$, is referred to as an array element.

Fig. 4. The hemi-icosahedron is expanded into a flattened array model.

Utilizing the one-to-one correlation between points on the image plane, geodesic face, and hemispherical surface, the 3D coordinates of points on each geodesic face can be computed. These 3D coordinates are then subjected to principal component analysis (PCA) for dimension reduction. Subsequently, the pixels on the fisheye image are projected onto the flattened array model and array element to obtain a flattened array image and a series of element images, as illustrated in Fig. 5.
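A possible sketch of this PCA flattening step is given below (the function name is ours). Because the points of a geodesic face are coplanar, the third principal component carries essentially zero variance and is dropped:

```python
import numpy as np

def flatten_face_points(points_3d):
    """Reduce the 3D points of one geodesic face (N x 3 array) to planar coordinates with PCA."""
    P = np.asarray(points_3d, dtype=float)
    centered = P - P.mean(axis=0)
    # principal axes from the SVD of the centered point cloud
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    coords_2d = centered @ Vt[:2].T                        # keep the two in-plane components
    return np.hstack([coords_2d, np.zeros((len(P), 1))])   # points of the form (x~, y~, 0)
```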

Fig. 5. The diagram of a flattened array image and its element images.

As shown in Fig. 5, the flattened array model can significantly reduce distortions in the fisheye image, such as the edges of the computer and bookshelf enclosed within the green and blue ellipses. In particular, the effect is more pronounced in the peripheral regions, such as the edge of the yellow book enclosed within the red ellipse. Moreover, unlike perspective rectification and equirectangular projection methods, our expansion method does not result in any obvious loss of scene details or stretching of images. In this case, the array images can be approximated as perspective images. This feature makes the application of affine transformation reasonable and feasible. Therefore, drawing inspiration from the ASIFT algorithm framework, this paper aims to perform simulated affine transformation on the array images, followed by feature extraction on the resulting simulated images.

Before performing the affine transformation, there is another issue that needs to be addressed. During the process of flattening the fisheye image, jagged edges are formed, such as $(a^{\prime},a^{\prime\prime})$, $(b^{\prime},b^{\prime\prime})$, $(c^{\prime},c^{\prime\prime})$, $(d^{\prime},d^{\prime\prime})$, and $(e^{\prime},e^{\prime\prime})$ in Fig. 4. These jagged edges separate pixels that would otherwise be neighbors of each other in the fisheye image. Regardless of whether features are extracted from the entire flattened array image or from individual element images, features near the jagged edges cannot be accurately extracted and characterized due to a lack of local information.

To mitigate the impact of jagged edges on feature extraction and description, we propose a differential expansion method to obtain flattened simulation images according to the specific positions of array elements in the flattened array model. As shown in Fig. 6, taking the geodesic face 1 as an example, there are three geodesic faces (2, 5, 17) connected to geodesic face 1. We construct the flattened simulation element with the assistance of the adjacent geodesic faces (2, 5, 17) of geodesic face 1, ensuring that there is sufficient local information around the features near the three edges of geodesic face 1 for more accurate extraction and description of features. We implement this step by utilizing the optimal rigid transformation method. The specific process is as follows.

Fig. 6. The diagram of the flattened simulation element.

After performing the PCA dimensionality reduction steps on the points in geodesic faces, let us suppose that the three vertices of geodesic face 1 are respectively ${\alpha _1} = {(\widetilde x{1_\alpha },\widetilde y{1_\alpha },0)^T}$, ${\alpha _2} = {(\widetilde x{2_\alpha },\widetilde y{2_\alpha },0)^T}$, ${O_p} = {(\widetilde x{3_\alpha },\widetilde y{3_\alpha },0)^T}$. The flattened element of ${\alpha _1}{\alpha _2}{O_p}$ is ${\beta _1}{\beta _2}{O_{p1}}$, whose vertices are ${\beta _1} = {(\widetilde x{1_\beta },\widetilde y{1_\beta },0)^T}$, ${\beta _2} = {(\widetilde x{2_\beta },\widetilde y{2_\beta },0)^T}$ and ${O_{p1}} = {(\widetilde x{3_\beta },\widetilde y{3_\beta },0)^T}$. In order to transform the projected points in ${\alpha _1}{\alpha _2}{O_p}$ to the flattened element ${\beta _1}{\beta _2}{O_{p1}}$, we first compute the optimal rigid transformation matrix, which proceeds according to the following steps. First, ${\alpha _1}{\alpha _2}{O_p}$ and ${\beta _1}{\beta _2}{O_{p1}}$ can be expressed as formula (5) and (6).

$$Tr{i_\alpha } = [{{\alpha_1}^{}{\alpha_2}^{}{O_p}} ], $$
$$Tr{i_\beta } = [{{\beta_1}^{}{\beta_2}^{}{O_{p1}}} ]. $$

The centroid coordinates $ce{n_\alpha }$ of ${\alpha _1}{\alpha _2}{O_p}$ and $ce{n_\beta }$ of ${\beta _1}{\beta _2}{O_{p1}}$ can be calculated using formulas (7) and (8).

$$ce{n_\alpha } = {\left[ {\frac{1}{3}\sum\limits_{i = 1}^3 {\widetilde x{i_\alpha }} ,\frac{1}{3}\sum\limits_{i = 1}^3 {\widetilde y{i_\alpha }} ,0} \right]^T}, $$
$$ce{n_\beta } = {\left[ {\frac{1}{3}\sum\limits_{i = 1}^3 {\widetilde x{i_\beta }} ,\frac{1}{3}\sum\limits_{i = 1}^3 {\widetilde y{i_\beta }} ,0} \right]^T}. $$

To find the optimal rotation matrix, we first recenter the two point sets ${\alpha _1}{\alpha _2}{O_p}$ and ${\beta _1}{\beta _2}{O_{p1}}$ so that their centroids are at the origin, and then use formula (9) to compute the resulting correlation matrix ${H_{ \to o}}$:

$${H_{ \to o}} = (Tr{i_\alpha } - ce{n_\alpha }) \times {(Tr{i_\beta } - ce{n_\beta })^T}. $$

After performing singular value decomposition on ${H_{ \to o}}$ using formula (10), the optimal rotation matrix ${R_{ET}}$ can be obtained using formula (11). Then, the optimal translation matrix ${T_{ET}}$ can be calculated using formula (12), which involves the optimal rotation matrix ${R_{ET}}$ and the centroid coordinates $ce{n_\alpha }$ and $ce{n_\beta }$.

$${H_{ \to o}} = {U_{ET}} \times {S_{ET}} \times {V_{ET}}^T, $$
$${R_{ET}} = {V_{ET}} \times {U_{ET}}^T, $$
$${T_{ET}} ={-} {R_{ET}} \times ce{n_\alpha } + ce{n_\beta }. $$

After performing the PCA dimensionality reduction, the coordinates of all points ${\alpha _1}$, ${\alpha _2}$, ${\alpha _3}$, …, ${\alpha _n}$, ${O_p}\;(n \in N\ast )$ in ${\alpha _1}{\alpha _2}{O_p}$ can form a matrix $\alpha U$:

$$\alpha U = [{{\alpha_1}^{}{\alpha_2}^{}{\alpha_3}^{}{\alpha_4}^{}{{\ldots }^{}}{\alpha_n}^{}{O_p}} ]. $$

Then performing an optimal rigid transformation on ${\alpha _1}{\alpha _2}{O_p}$ yields the flattened element ${\beta _1}{\beta _2}{O_{p1}}$, and the set of coordinates of all points in ${\beta _1}{\beta _2}{O_{p1}}$ can be determined by formula (14).

$$\beta U = {R_{ET}} \times \alpha U + [{T_1},{T_2},{T_3},\ldots ,{T_n},{T_{n + 1}}], $$
where ${T_1}$, ${T_2}$, ${T_3}$, …, ${T_n}$, ${T_{n + 1}}\;(n \in N\ast )$ are equal to ${T_{ET}}$.
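Formulas (7)-(14) follow the standard Kabsch procedure for the optimal rigid transformation. A compact sketch is given below; the matrices store one point per column as in formulas (5), (6), and (13), and the reflection guard is a common numerical safeguard that the paper does not spell out:

```python
import numpy as np

def optimal_rigid_transform(tri_alpha, tri_beta):
    """Rotation R_ET and translation T_ET mapping the reduced face tri_alpha onto its
    flattened element tri_beta (both 3 x 3, one vertex per column), formulas (7)-(12)."""
    cen_a = tri_alpha.mean(axis=1, keepdims=True)      # centroid cen_alpha, formula (7)
    cen_b = tri_beta.mean(axis=1, keepdims=True)       # centroid cen_beta,  formula (8)
    H = (tri_alpha - cen_a) @ (tri_beta - cen_b).T     # recentered matrix H_->o, formula (9)
    U, S, Vt = np.linalg.svd(H)                        # formula (10)
    R = Vt.T @ U.T                                     # formula (11)
    if np.linalg.det(R) < 0:                           # reflection guard (not in the paper)
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    T = -R @ cen_a + cen_b                             # formula (12)
    return R, T

def apply_rigid_transform(R, T, alpha_U):
    """Map every projected point of the face (3 x n matrix alpha_U, formula (13)) into the
    flattened element: beta_U = R_ET * alpha_U + T_ET, formula (14)."""
    return R @ alpha_U + T
```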

Similar to geodesic face 1, the geodesic faces 2, 5, and 17 in Fig. 6 undergo dimensionality reduction and optimal rigid transformations, and subsequently the flattened simulation element for geodesic face 1 can be established. The same method can be utilized to establish the corresponding flattened simulation elements for the other geodesic faces. Based on the flattened simulation elements, a series of flattened simulation images can be obtained using the correlation between the image plane and the geodesic faces. Some examples of flattened simulation images are shown in Fig. 7. It can be observed that the distortion in the original fisheye image is substantially weakened in the flattened simulation images (enclosed in the purple triangular area). The edges of the yellow book, for example, which were severely curved in the original fisheye image, tend to straighten out in the flattened simulation images. In addition, the element images (enclosed in the orange triangular area) are surrounded by neighborhood scene information in the flattened simulation images. This information can be utilized to extract and describe features for each element image, which effectively eliminates the negative impact of jagged edges on feature extraction and description.

Fig. 7. The flattened simulation images and element images. The first row shows the original stereo fisheye images, and the second row shows the flattened simulation images.

The distortion in the fisheye image is greatly reduced, and the flattened simulation image can be approximately regarded as a perspective image. This property allows us to perform affine transformation on the flattened simulation image. In this process, a series of affine matrices are calculated by varying the latitude angle $\vartheta (\vartheta \in [0,\pi /2])$ and longitude angle $\varpi (\varpi \in [0,\pi ])$. Then, the affine matrices are used to transform the flattened simulation image, thus simulating several images of the same scene from different viewpoints. First, the flattened simulation image is rotated by the longitude angle $\varpi$ and then tilted with the absolute tilt value t, where $t = |{1/\cos \vartheta } |$. With the sampling interval $\Delta t = \sqrt 2 $, t takes the following values: $1,\; \sqrt 2 ,\ldots ,\; 4\sqrt 2$, and the corresponding $\vartheta$ is $0^\circ ,45^\circ ,\ldots ,79.8^\circ $. The sampling interval of the longitude angle $\varpi$ is $\Delta \varpi = 72^\circ{/}t$. For each absolute tilt value t, the longitude angle $\varpi$ takes the values $0^\circ ,72^\circ{/}t,144^\circ{/}t,\ldots ,180^\circ $. These values determine the affine transformation matrix ${A_{\textrm{affine}}}$ via formula (15).

$${A_{\textrm{affine}}} = {H_\gamma }{R_i}(\upsilon ){T_t}{R_{ii}}(\varpi ) = \gamma \left[ {\begin{array}{cc} {\cos \upsilon }&{ - \sin \upsilon }\\ {\sin \upsilon }&{\cos \upsilon } \end{array}} \right]\left[ {\begin{array}{cc} t&0\\ 0&1 \end{array}} \right]\left[ {\begin{array}{cc} {\cos \varpi }&{ - \sin \varpi }\\ {\sin \varpi }&{\cos \varpi } \end{array}} \right], $$
where $\gamma$ is a scaling parameter and $\upsilon (\upsilon \in [0,2\pi ])$ represents the rotation angle of the camera around its optical axis.
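A sketch of this sampling and of formula (15) is given below. For brevity the in-plane rotation $\upsilon$ is fixed to zero (it is handled by SIFT's own rotation invariance), and the exact endpoint handling of the longitude sweep is an assumption borrowed from the usual ASIFT convention rather than taken from the paper:

```python
import numpy as np

def affine_simulation_matrices(gamma=1.0):
    """Enumerate 2x2 affine matrices A = gamma * T_t * R(varpi), formula (15), over the
    tilt values t = 1, sqrt(2), ..., 4*sqrt(2) and longitude steps of 72 deg / t."""
    mats = []
    for k in range(6):                                   # theta = 0, 45, ..., 79.8 deg
        t = np.sqrt(2.0) ** k                            # absolute tilt |1 / cos(theta)|
        if k == 0:
            longitudes = [0.0]                           # no tilt: a single view suffices
        else:
            longitudes = np.arange(0.0, np.pi, np.deg2rad(72.0) / t)
        for w in longitudes:
            R_w = np.array([[np.cos(w), -np.sin(w)],
                            [np.sin(w),  np.cos(w)]])    # longitude rotation R_ii(varpi)
            T_t = np.array([[t, 0.0],
                            [0.0, 1.0]])                 # tilt matrix T_t
            mats.append(gamma * T_t @ R_w)
    return mats

# Each 2x2 matrix can be padded to 2x3 and applied to a flattened simulation image
# (e.g. with cv2.warpAffine) to obtain one simulated view.
print(len(affine_simulation_matrices()))
```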

Then the flattened simulation image can be transformed into several simulated images. The SIFT algorithm is employed on the simulated image to extract feature points, generate descriptors and match the feature points. Finally, all the matched pairs are amalgamated into one set, and the duplicate pairs are eliminated.

3. Experiment and analysis

To test the performance of the FA-SIFT method, we employ the experimental stereo images in [18], which were acquired indoors by the NM33 fisheye camera. They are shown in Fig. 8. There are three sets and twelve pairs of fisheye images. In order to verify that the proposed method has complete invariance to camera viewpoint changes, the image pairs also contain complex factors that interfere with matching: rotation, scaling, translation, and different camera-axis orientation settings between the stereo fisheye images. Due to the nonlinear imaging property, the relative motion between the fisheye cameras produces a more drastic change in the scene content, which undoubtedly brings more challenges to the subsequent stereo matching.

Fig. 8. Stereo images captured from indoor scenes. (a), (b), and (c) are stereo images of set 1, 2, and 3. The stereo images (the first to the fourth rows) are acquired by the cameras at relative motion with rotation, scaling, translation, and axis orientation change, respectively.

Since FA-SIFT is designed for fisheye images with ASIFT as its base algorithm, one point that needs to be verified is whether FA-SIFT improves the matching performance on fisheye images compared to ASIFT. SIFT is one of the most commonly used and robust feature matching algorithms. In addition, the local D-nets algorithm [28] is one of the most recent feature matching algorithms designed for fisheye images. Therefore, we compare FA-SIFT with SIFT, ASIFT, and local D-nets. For all four algorithms, the RANSAC algorithm is employed to eliminate mismatched points.
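The paper does not state which geometric model the RANSAC step fits. Purely as an illustration, a fundamental-matrix-based filter over putative correspondences could look like the following sketch (the helper name is ours, and the epipolar model is our assumption, not necessarily the constraint used here):

```python
import cv2
import numpy as np

def ransac_filter(pts_ref, pts_tgt, thresh=1.0):
    """Keep only putative matches consistent with a RANSAC-estimated fundamental matrix.

    pts_ref and pts_tgt are N x 2 arrays of corresponding feature coordinates.
    The epipolar model is only an illustrative choice of geometric constraint."""
    pts_ref = np.float32(pts_ref)
    pts_tgt = np.float32(pts_tgt)
    F, mask = cv2.findFundamentalMat(pts_ref, pts_tgt, cv2.FM_RANSAC, thresh, 0.99)
    if F is None:
        return pts_ref[:0], pts_tgt[:0]       # RANSAC failed: nothing to keep
    inliers = mask.ravel().astype(bool)
    return pts_ref[inliers], pts_tgt[inliers]
```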

The total matching correspondences (TPC) of the four algorithms are listed in Table 1. In Table 1, Ri, Si, Ti, and Ai (i = 1, 2, and 3) represent the indoor experimental image pairs with rotation, scaling, translation, and camera-axis orientation change in set i, respectively. In addition, two metrics, the percentage of bad matches (PBM) and the root mean square error (RMSE), are used to compare the matching accuracy of the four methods, as shown in Table 1. The formulas for calculating the PBM value ${\Theta _{\textrm{PBM}}}$ and the RMSE value ${\Omega _{\textrm{RMSE}}}$ are as follows:

$${\Theta _{\textrm{PBM}}} = \frac{{{N_T} - {N_C}}}{{{N_T}}} \times 100\%, $$
where ${N_T}$ and ${N_C}$ represent the number of TPC and the number of correct matching correspondences (CPC), respectively.
$${\Omega _{\textrm{RMSE}}} = \sqrt {\frac{1}{{{N_T}}}\sum\limits_{i = 1}^{{N_T}} {{{({I_R}(i) - {I_T}(i))}^2}} } , $$
where ${I_R}$ and ${I_T}$ represent the pixel intensity values of the features in the reference and target images. The last row of Table 1 presents the average of the data in each column for each indicator. For each image pair, the best result for each index among the four methods is shown in bold.
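Formulas (16) and (17) translate directly into code; the sketch below uses our own names and assumes the intensities are sampled at the matched feature locations:

```python
import numpy as np

def pbm(n_total, n_correct):
    """Percentage of bad matches, formula (16): (N_T - N_C) / N_T * 100."""
    return (n_total - n_correct) / n_total * 100.0

def rmse(intens_ref, intens_tgt):
    """Root mean square error over matched features, formula (17), where the inputs are
    the pixel intensities at the feature locations in the reference and target images."""
    intens_ref = np.asarray(intens_ref, dtype=float)
    intens_tgt = np.asarray(intens_tgt, dtype=float)
    return np.sqrt(np.mean((intens_ref - intens_tgt) ** 2))
```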

Table 1. PBM Values and RMSE Values Obtained by the Four Algorithms on Indoor Images

Table 1 reveals that SIFT manages to produce only a limited number of matches. The explanation behind this is that SIFT is solely based on the perspective imaging model. Although ASIFT is also designed for perspective images, it manages to produce a larger number of matches. ASIFT executes affine transformation on the original image and extracts features on the simulated image. This approach enhances the overall robustness and stability of the algorithm, enabling ASIFT to handle distortions to a certain extent. The local D-nets algorithm is custom-made for fisheye images based on the nonlinear imaging model and is able to produce more matches on fisheye images. However, local D-nets lacks sufficient stability. The number of matches it produced on certain image pairs was only marginally higher than those of SIFT.

Compared to the three algorithms mentioned above, it is evident that FA-SIFT can achieve a high number of matches, up to thousands, making it highly valuable for applications such as 3D reconstruction and scene understanding. Based on the flattened array model, flattened simulation images are constructed and used, instead of the entire input image, as the basis for the affine transformation. In this way, FA-SIFT retains the robust and stable characteristics of ASIFT, which is reflected by the PBM and RMSE values: compared to SIFT and local D-nets, ASIFT and FA-SIFT exhibit lower PBM and RMSE values, indicating higher matching accuracy. Additionally, an imaging model suitable for fisheye images, following the equisolid projection of the fisheye camera, is designed to enhance the performance of the matching algorithm. Therefore, FA-SIFT can obtain a considerable number of reliable feature matches.

Taking the first set of indoor images as an example, Fig. 9 shows the matching results of the four algorithms more intuitively. In order to show the matching relationships more clearly, we show one out of every five matches in Fig. 9 and subsequent figures with matching lines. It can be observed from Fig. 9 that the matches of ASIFT are mainly concentrated in regions with small scene deformation differences between the image pair, and SIFT suffers from the same problem, as shown in Fig. 9 (a) and (b). Both SIFT and ASIFT are designed based on the perspective imaging model, so they can only cope with the mild distortions in the central region. In contrast, local D-nets obtains more evenly distributed matches, as shown in Fig. 9 (c). Additionally, local D-nets manages to produce extensive correspondences for most experimental stereo images. Nevertheless, this algorithm is not stable, and its matching accuracy is unsatisfactory. As observed in Fig. 9 (d), FA-SIFT exhibits a tendency to distribute feature matches across the entire disk of the fisheye images. By utilizing the scalable integrated imaging model, the fisheye image was flattened, significantly reducing distortion. This enabled FA-SIFT to obtain feature matches with a broader distribution, encompassing both the mildly distorted central region and the severely distorted peripheral region.

Fig. 9. Matching results of the four methods for the first set of indoor images. The matching results of SIFT, ASIFT, local D-nets, and FA-SIFT algorithms are shown in (a)-(d). The features are marked with “.” and the correspondences are connected by colored matching lines.

The experimental images shown above were collected indoors. We acknowledge that compared to indoor scenes, outdoor scenes are typically more complex, involving stronger light interference and a higher risk of encountering large areas of weak texture (e.g., sky) and repetitive texture (e.g., floor tiles). To further validate the effectiveness and stability of the proposed algorithm, three sets and twelve pairs of outdoor images were captured, as shown in Fig. 10. The outdoor experiments were conducted based on the same experimental setup as described earlier.

Fig. 10. Stereo images captured from outdoor scenes. (a), (b), and (c) are stereo images of set 1, 2, and 3. The stereo images (the first to the fourth rows) are acquired by the cameras at relative motion with rotation, scaling, translation, and axis orientation change, respectively.

Taking the first set of outdoor experimental images as an example, Fig. 11 shows the matching results of the four methods. It can be observed that the performance of the four methods for outdoor images does not appear as good as for indoor images, due to the increased complexity in outdoor scenes. In textureless areas like the sky area, all methods fail because there are essentially no features in such areas. In textured regions, even regions filled with repetitive textures, FA-SIFT still yields abundant matches for both the center and peripheral regions of the fisheye images, such as the school buildings in the leftmost and rightmost regions of the images. This further demonstrates the robustness and stability of FA-SIFT.

Fig. 11. Matching results of the four methods for the first set of outdoor images. (a)-(d) demonstrate the matching results of the SIFT, ASIFT, local D-nets, and FA-SIFT algorithms.

Based on the outdoor experimental images, FA-SIFT is also compared with SIFT, ASIFT, and local D-nets. Their matching results are presented in Table 2. In Table 2, Roi, Soi, Toi, and Aoi (i = 1, 2, and 3) represent the outdoor experimental image pairs with rotation, scaling, translation, and camera-axis orientation change in set i, respectively. It can be observed that for outdoor images, SIFT and local D-nets do not perform as well on most of the images as they do on indoor images. The complex imaging environment outdoors confirms that these two methods are not robust enough.

Table 2. The Matching Results of Four Algorithms for Outdoor Images

ASIFT and FA-SIFT still exhibited stable and robust matching performance for outdoor images. Their robust matching performance was attributed to the affine transformation step. As can be seen from the average values of the indicators in Table 1 and Table 2, the PBM and RMSE values of FA-SIFT are slightly higher than those of ASIFT. This is due to the fact that, compared to ASIFT, FA-SIFT obtains a large number of matches in the peripheral regions in addition to the central region. The distortion of the peripheral regions is severe, but FA-SIFT has been effective in dealing with it to a large extent. In terms of both the number of matches and matching accuracy, FA-SIFT can still obtain excellent and stable matching results for each pair of outdoor images, and it still exhibits clear performance advantages over the other three methods. This is because FA-SIFT not only inherits the basic idea of ASIFT but also designs the entire transformation strategy and feature extraction method based on the scalable integrated imaging model tailored for fisheye cameras.

FA-SIFT first flattens the hemi-icosahedron into a flattened array model based on the scalable integrated imaging model. Then, the entire fisheye image generates 15 flattened simulation images. That is to say, FA-SIFT only needs 15 rounds of affine transformation, whereas P-ASIFT with icosahedron division requires 20 rounds of affine transformation. Obviously, compared to P-ASIFT, FA-SIFT can greatly reduce the computational complexity. To demonstrate the good performance of FA-SIFT in terms of the number of matches, matching accuracy, and uniformity of matches, the following comparison experiments with P-ASIFT using icosahedron division were performed.

Taking the first and second sets of indoor images as examples, the matching results of P-ASIFT and FA-SIFT are shown in Fig. 12. It can be observed from Fig. 12 that the matches of P-ASIFT are mostly concentrated in regions that are close to the center or where the relative deformation between images is small. Compared with P-ASIFT, FA-SIFT obtains matches that are more widely distributed. Whether in the peripheral region with large distortion or in regions with large relative deformation, FA-SIFT tends to spread the matches over the entire disk. This is thanks to the scalable integrated imaging model proposed in this paper. Moreover, based on the imaging model, the hemi-icosahedron is expanded into a flattened array model, which can reduce the nonlinear distortion of the fisheye image without obvious stretching deformation or loss of scene. This enables FA-SIFT to effectively overcome the interference of nonlinear distortion.

Fig. 12. Comparison results between P-ASIFT and FA-SIFT on the first (a) and second (b) set of indoor images. The first column in (a) and (b) represents the results of P-ASIFT, while the second column in (a) and (b) represents the results of FA-SIFT.

The matching results in Fig. 13 are more intuitive, and the same conclusion can be drawn. It is obvious that only a small number of matches are obtained in the peripheral areas by P-ASIFT. Differently, the matches of FA-SIFT are more widely distributed and more uniform, and a considerable number of features can be extracted even in the peripheral areas. This contrast is more striking in the blue ellipse area.

Fig. 13. Comparison results of feature distribution between P-ASIFT (a) and FA-SIFT (b).

In addition to the indicators TPC, PBM, and RMSE, which are used to compare P-ASIFT and FA-SIFT in terms of the number of feature matches and matching accuracy, the indicators ${d_m}$ and ${\Gamma _m}$ are also used to evaluate the matches in the peripheral regions as well as the distribution of the matches. ${d_m}$ is the average distance from each feature to the center of the image, and it is computed by the following formula:

$${d_m} = \frac{1}{{{N_T}}}\sum\limits_{i = 1}^{{N_T}} {\sqrt {{{({C_{ex}}(i) - {C_{ox}})}^2} + {{({C_{ey}}(i) - {C_{oy}})}^2}} } , $$
where $({C_{ex}}(i),{C_{ey}}(i))$ represents the coordinates of each feature, and $({C_{ox}},{C_{oy}})$ represents the coordinates of the image center, which can be obtained by parameter calibration. A larger value of ${d_m}$ indicates that more feature matches are present in the peripheral regions of a fisheye image.

${\Gamma _m}$ is the average distance between each feature and the centroid of all feature matching sets. Larger values of ${\Gamma _m}$ indicate that the feature matches are more spread out over the entire fisheye image disk region. The formula for ${\Gamma _m}$ is as follows:

$${\Gamma _m} = \frac{1}{{{N_T}}}\sum\limits_{i = 1}^{{N_T}} {\sqrt {{{({C_{ex}}(i) - {{\bar{C}}_{cx}})}^2} + {{({C_{ey}}(i) - {{\bar{C}}_{cy}})}^2}} } , $$
where $({\bar{C}_{cx}},{\bar{C}_{cy}})$ represents the coordinates of the centroid of all features, given by:
$${\bar{C}_{cx}} = \frac{1}{{{N_T}}}\sum\limits_{i = 1}^{{N_T}} {{C_{ex}}(i)} $$
$${\bar{C}_{cy}} = \frac{1}{{{N_T}}}\sum\limits_{i = 1}^{{N_T}} {{C_{ey}}(i)} $$
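Formulas (18)-(21) can be evaluated together, as in the following sketch (the function name is ours; the image center is assumed to come from calibration):

```python
import numpy as np

def distribution_metrics(features, center):
    """Compute d_m (formula (18)) and Gamma_m (formula (19)) for an N x 2 array of
    matched feature coordinates and the calibrated image center (C_ox, C_oy)."""
    features = np.asarray(features, dtype=float)
    center = np.asarray(center, dtype=float)
    d_m = np.mean(np.linalg.norm(features - center, axis=1))        # mean distance to the image center
    centroid = features.mean(axis=0)                                # (C_cx, C_cy), formulas (20)-(21)
    gamma_m = np.mean(np.linalg.norm(features - centroid, axis=1))  # mean distance to the centroid
    return d_m, gamma_m
```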

The comparison results of P-ASIFT and FA-SIFT for indoor images are shown in Table 3. It can be observed that, for the indicators TPC, PBM, and RMSE, as well as their average values, the performance of P-ASIFT and FA-SIFT does not differ much. Both methods can obtain a considerable number of matches with high accuracy. This is due to the fact that both methods apply the idea of affine transformations to local regions, which ensures the accuracy and the number of matches.

Table 3. Comparative Quantization Results of P-ASIFT and FA-SIFT for Indoor Images

However, for indicators ${d_m}$ and ${\Gamma _m}$, the two methods obtain significantly different results. The ${d_m}$ and ${\Gamma _m}$ values of FA-SIFT are larger than those of P-ASIFT on all indoor image pairs. This indicates that although the two methods are not much different in the number of matches, the matches of P-ASIFT are more concentrated in the central region. FA-SIFT can obtain more widely distributed matches. Compared with P-ASIFT, which directly performs the affine transformation on local distorted regions of fisheye image, FA-SIFT establishes a flattened array model to reduce the distortion and then performs the affine transformation on flattened simulation images. This effectively overcomes the interference of nonlinear distortion.

Figure 14 demonstrates the matching results of P-ASIFT and FA-SIFT on the first and second set of outdoor fisheye images. It is evident from Fig. 14 that FA-SIFT can obtain feature matches with a broader distribution range. In the peripheral regions of the fisheye image, such as the teaching building and car scenes on the left and right sides of the fisheye images, and the human legs and red box scenes below the fisheye images, P-ASIFT often fails. FA-SIFT can handle the peripheral regions by utilizing the flattened array model and the scalable integrated imaging model.

Fig. 14. Comparison results between P-ASIFT and FA-SIFT on outdoor images. In (a) and (b), the first column shows the matching results of P-ASIFT, and the second column shows the matching results of FA-SIFT.

The same conclusion can be drawn from Table 4, where the ${d_m}$ and ${\Gamma _m}$ values computed for the FA-SIFT method are significantly higher than those of P-ASIFT for each outdoor image pair. Thus, it can be inferred that the matches obtained by FA-SIFT are more dispersed. The outdoor scenes are more complex: there are illumination variations and many repeated textures in the peripheral regions of the fisheye images, such as the teaching buildings and floor tiles. In other words, FA-SIFT can still obtain widely distributed matches in the peripheral regions even when severe distortion and outdoor disturbances coexist, which shows that its stability is excellent.

Table 4. Comparative Quantization Results of P-ASIFT and FA-SIFT for Outdoor Images

Although both P-ASIFT and FA-SIFT can obtain thousands of matches with good matching accuracy, FA-SIFT is significantly better than P-ASIFT in terms of the number of matches. This is because outdoors, some scenes in the peripheral regions are far away from the cameras, and the relative motion of the cameras does not cause obvious scene differences, so FA-SIFT is able to obtain more matches in the peripheral regions.

4. Conclusion

In this paper, we propose a new approach, named FA-SIFT, to find reliable feature matches with a wide distribution between fisheye images. The FA-SIFT approach yields matches with good performance even in peripheral regions with drastic distortion. Its main novelties are two-fold: first, we establish a scalable integrated imaging model and a flattened array model, which are utilized to reduce the distortion of the fisheye images without loss of scene details or stretching of images. This step ensures the widely distributed characteristics of feature matches. Second, we design the flattened simulation elements to construct the flattened simulation images for affine transformation, which can guarantee superiority in the number as well as robustness of the feature matches. We believe that for highly distorted fisheye images, our matching strategy may be a good solution to taking full advantage of the wide FOV of fisheye cameras.

In the future, we will adapt the FA-SIFT method for stereo fisheye cameras with a FOV wider than 180° or even up to 360°, such as the popular Ricoh Theta and GoPro cameras that can capture 360° panoramic views. The establishment of new imaging and projection models with panoramic characteristics is necessarily the first step in realizing this future work.

Funding

National Natural Science Foundation of China (62203332).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. M. Roxas and T. Oishi, “Variational fisheye stereo,” IEEE Robot. Autom. Lett. 5(2), 1303–1310 (2020). [CrossRef]  

2. X. Yang, X. Gao, D. Song, et al., “ASI aurora search: an attempt of intelligent image processing for circular fisheye lens,” Opt. Express 26(7), 7985–8000 (2018). [CrossRef]  

3. Y. Lin, F. Gao, S. Qin, et al., “Autonomous aerial navigation using monocular visual-inertial fusion,” J. Field Robot. 35(1), 23–51 (2018). [CrossRef]  

4. A. R. Sekkat, Y. Dupuis, V. R. Kumar, et al., “SynWoodScape: synthetic surround-view fisheye camera dataset for autonomous driving,” J. Field Robot. 7(3), 8502–8509 (2022). [CrossRef]  

5. V. R. Kumar, C. Eising, S. K. Witt, et al., “Surround-view fisheye camera perception for automated driving: overview, survey & challenges,” IEEE Trans. Intell. Transp. Syst. 24(4), 3638–3659 (2023). [CrossRef]  

6. J. Sun, Z. Yang, and S. Li, “3D measurement,” Opt. Express 31(11), 18379–18398 (2023). [CrossRef]  

7. L. Kou, K. Yang, L. Luo, et al., “Binocular stereo matching of real scenes based on a convolutional neural network and computer graphics,” Opt. Express 29(17), 26876–26893 (2021). [CrossRef]  

8. M. Fiala and G. Roth, “Automatic alignment and graph map building of panoramas,” in IEEE International Workshop on Haptic Audio-Visual Environments and their Applications, (2005).

9. D. G. Lowe, “Object recognition from local scale-invariant features,” in IEEE International Conference on Computer Vision, (1999).

10. J. Zhang, X. Yin, T. Luan, et al., “An improved vehicle panoramic image generation algorithm,” Multimed. Tools Appl. 78(19), 27663–27682 (2019). [CrossRef]  

11. H. Bay, T. Tuytelaars, and L. V. Gool, “Speeded up robust features,” in European Conference on Computer Vision, (2006).

12. M. G. Miiller, F. Steidle, M. J. Schuster, et al., “Robust visual-inertial state estimation with multiple odometries and efficient mapping on an MAV with ultra-wide FOV stereo vision,” in IEEE International Conference on Intelligent Robots and Systems, (2018).

13. I. Lo, K. Shih, and H. H. Chen, “Image stitching for dual fisheye cameras,” in IEEE International Conference on Image Processing, (2018).

14. A. Ohashi, Y. Tanaka, G. Masuyama, et al., “Fisheye stereo camera using equirectangular images,” in International Conference on Research and Education in Mechatronics, (2016).

15. E. Rublee, V. Rabaud, K. Konolige, et al., “ORB: An efficient alternative to SIFT or SURF,” in IEEE International Conference on Computer Vision, (2011).

16. Q. Zhao, W. Feng, J. Wan, et al., “SPHORB: a fast and robust binary feature on the sphere,” Int. J. Comput. Vis. 113(2), 143–159 (2015). [CrossRef]  

17. N. Pourian and O. Nestares, “An end-to-end framework to high performance geometry-aware multi-scale keypoint detection and matching in fisheye images,” in IEEE International Conference on Image Processing, (2019).

18. Y. Zhang, H. Li, W. Zhang, et al., “Establishing a large amount of point correspondences using patch-based affine-scale invariant feature transform for fisheye images,” J. Electron. Imaging 30(4), 043022 (2021). [CrossRef]  

19. J. M. Morel and G. Yu, “ASIFT: A new framework for fully affine invariant image comparison,” SIAM J. Imaging Sci. 2(2), 438–469 (2009). [CrossRef]  

20. D. Kang, H. Jang, J. Lee, et al., “Uniform subdivision of omnidirectional camera space for efficient spherical stereo matching,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2022). [CrossRef]  

21. M. Á. Flores, D. Valiente, L. Gil, et al., “Efficient probability-oriented feature matching using wide field-of-view imaging,” Eng. Appl. Artif. Intell. 107, 104539 (2022). [CrossRef]  

22. H. Cho, J. Jeong, and K.-J. Yoon, “EOMVS: event-based omnidirectional multi-view stereo,” IEEE Robot. Autom. Lett. 6(4), 6709–6716 (2021). [CrossRef]  

23. W. Hou, M. Ding, X. Qin, et al., “Digital deformation model for fisheye image rectification,” Opt. Express 20(20), 22252–22261 (2012). [CrossRef]  

24. J. Kannala and S. S. Brandt, “A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses,” IEEE Trans. Pattern Anal. Mach. Intell. 28(8), 1335–1340 (2006). [CrossRef]  

25. S. Xie, D. Wang, and Y. Liu, “OmniVidar: omnidirectional depth estimation from multi-fisheye images,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2023).

26. L. Gallagher, G. Sistu, J. Horgan, et al., “A System for Dense Monocular Mapping with a Fisheye Camera,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, (2023). [CrossRef]  

27. Y. Zhang, H. Li, C. Zhang, et al., “Dense stereo fish-eye images using a modified hemispherical ASW algorithm,” J. Opt. Soc. Am. A 38(4), 476–487 (2021). [CrossRef]  

28. Y. Zhang, H. Zhang, and W. Zhang, “Feature matching based on curve descriptor and local D-nets for fish-eye images,” J. Opt. Soc. Am. A 37(5), 787–796 (2020). [CrossRef]  
