See farther and more: a master-slave UAVs based synthetic optical aperture imaging system with wide and dynamic baseline

Yijie Zhang; Yijie Zhang; Pei An; Pei An; Zhilong Li; Zhilong Li; Qiong Liu; Qiong Liu; You Yang; You Yang

doi:10.1364/OE.520677

1. Introduction

Designing an optical system with both the wide field of view (FoV) and high imaging resolution is still a challenging task. It has the various applications, such as medical microscopy [1], astronomical observations [2], and mechanical structural analysis [3]. The optical system with the single camera cannot promise the wide FoV and high resolution at the same time, due to the limitation of focal length and depth of view (DoF) [4]. To address this problem, the traditional way is designing multi-camera system (or say, camera array) with the short or fixed baselines, for multi-view images are able to bring more information of the light field [5–8].

We focus on generating the aerial image with the wide FoV and high resolution. Different from the conventional ground-to-ground or ground-to-air image acquisition, air-to-ground image collection requires the reasonable utilization of optical systems. Reasonable utilization of optical systems [9] can improve efficiency and reduce human and material resources. Under this background, the camera array can be formed by the swarm UAVs [10]. However, a large number of UAVs is requires by this scheme to ensure the large area of FoV overlap with the neighbored UAVs. Therefore, a better solution is to reduce the number of UAVs in this optical system. In this case, to achieve the wide FoV, the overlap ratio of UAV observation area should be cut down. Also, in the real flying circumstance of swarm UAVs, the distance between different UAVs changes dynamically throughout the flight. Thus, the optical system with the few UAVs is actually the camera array with the wide and dynamic baseline. As the FoV overlap between the neighbored cameras is small, how to stitch the multi-view aerial images is a crucial problem.

We further analysis the difficulty of stitching the multi-view aerial images with small FoV overlaps. The traditional approach typically stitches multiple images to obtain as high resolution panoramic image [11], and the accuracy of the image stitching relies on large overlapping areas. Large overlapping areas provide sufficient feature points for subsequent matching stage and ensure the registration accuracy, meanwhile, it also reduces the utilization efficiency of the images, making it necessary to increase the number of cameras to capture the more visual information of the target scene. Moreover, when capturing the identical object with cameras from different viewpoints, the optical centers of these cameras can hardly be aligned perfectly in the air. This leads to noticeable differences in the appearance of captured objects [12]. For instance, when capturing the same car, a camera positioned to the left may capture a substantial portion of the car’s left side, while a camera to the right may only capture a limited portion of the left side. The variation in camera viewpoint leading to differences in the shapes of captured objects, poses challenges for image registration. In many practical applications, the shape differences of objects captured from different angles are unavoidable, and employing traditional image registration methods can lead to unacceptable error accumulation [13].

In order to generate the image with wide FoV and high resolution using few UAVs, we propose a novel synthetic optical aperture imaging system with wide and dynamic baseline based on master-slave UAVs. This system consists of one master UAV and multiple slave UAVs. Master and slave UAVs can provide the image at the different scales with the global and local FoV, respectively. FoV of master UAV is designed to cover FoVs of all slave UAVs, and improve the efficiency of image acquisition. In the proposed system, multi-view image stitching is still a problem because of the small overlap between slave UAVs and the scale gap between salve and master UAVs. To address these problems, we propose a coarse-to-fine stitching method. Consequently, we further exploit an adaptive video synthesis algorithm to merge individual frames into a smoothness video. Experimental results and application validation demonstrate the efficiency of the master-slave UAV based system and the coarse-to-fine stitching method in synthesizing the high resolution panoramic videos. Therefore, we believe the proposed system is able to make contributions to future emergency rescue operations and other extensive visual applications. The source code is shown in Code 1 (Ref. [14]).

The remainder of this paper is organized as follows. In Sec.2, we briefly introduce the high resolution image system of camera array and the wide baseline image stitching. Following that, the proposed master-slave UAVs architecture and a coarse-to-fine stitching method are presented by details in Sec.3. In Sec.4, we discuss the experimental setup, results, application-based validation. Finally, we summarize our work in Sec.5.

2. Related works

In the traditional optical system, in order to capture an image with the wide FoV, it is necessary to reduce the system’s focal length. It leads a reduction in image resolution. Traditional monocular imaging system cannot capture a both high resolution and wide field of view image [15]. The innovations of sensors [16–19] in optical systems is an effective method for addressing difficulties. From wide-angle lenses to fish-eye lenses, sensor upgrades enable lenses to expand their FoV. Typically, a more convenient approach is to use a camera array. Researchers often design camera arrays imaging systems [13,20] to enlarge the scene content in images. These devices are typically bulky and have a limited shooting angle, making them unsuitable for aerial operations with dynamic baselines. Kopf et al. [21] attempted to use a high resolution Canon camera to capture images with optimal exposure at different positions. They then matched and stitched hundreds of images to obtain a wide Fov image with a resolution of one billion pixels. However, this approach require an excessively long processing time and its application scope is limited. Brady et al. [5] constructed the first gigapixel camera AWARE2. This camera includes a spherical objective lens capturing the global scene, while 98 micro-optical cameras capturing the local scene. This camera system is limited in usage of air applications by its considerable electrical power cost and a large physical volume. Yuan et al. [22] proposed a robust image matching and alignment algorithm that can generate gigapixel video, and it optimize the traditional array camera calibration requirement of having significant overlap regions between cameras. Firas proposed a new multi camera system with circular arrangement and designed a calibration model based on color chessboard [13]. In this model, the requirement of overlap is also discussed and optimized.

It poses significant challenges in image registration when the overlap between UAVs with wide baseline is small. Meng et al. [23] utilized two UAVs with the wide baseline to collect images. They corrected the perspective distortion by utilizing UAV attitude and GPS location data from the onboard flight controller. However, they use datasets such as lawns, playgrounds, and other objects close to a plane, without disparity for the same object in different images, thus their approach lacks universality. To address the misalignment caused by the parallax of objects in the video, Lai et al. [24] introduced a fast push broom interpolation layer and proposed a novel push broom stitching network, which learns a dense flow field to smoothly align the multiple input videos for spatial interpolation. Existing methods are limited to addressing images with parallax issues at the same scale. However, in the case of different scale drone images, the problems related to object disparity become more prominent. Therefore, this paper proposes a two-stage image registration method to achieve visually appealing wide-field image results. This approach is beneficial for downstream tasks, and it will be introduced in the third section.

3. Proposed synthetic optical aperture imaging system

3.1 Overview

In this paper, we design a master-slave UAVs based synthetic optical aperture imaging system with wide and dynamic baseline to generate the aerial image with both the large FoV and the high resolution. To the best of our knowledge, it is the first time to exploit the master-slave UAV architecture for the aerial imaging system. The master-slave UAVs based synthetic optical aperture imaging system is introduced in Sec. 3.2. In order to align the high resolution image from master-slave UAVs, we present an coarse-to-fine image stitching method in Sec. 3.3. This step aligns the images captured by the slave UAVs to the coordinate system of the master UAV’s captured images. Also, to eliminate the ghosting in the aligned images captured by the slave UAVs, an effective image blending method is cited in Sec. 3.4. Finally, to generate the smooth video from the high resolution image sequences, a video stabilization method is also provided for this imaging system in Sec. 3.5.

3.2 Master-slave UAVs based synthetic optical aperture imaging system with wide and dynamic baseline

In this paper, to detect the wide ground area with the high imaging resolution, we design a novel master-slave UAVs based synthetic optical aperture imaging system with wide and dynamic baseline. As showed in Fig. 1(a), the proposed imaging system consists of one master UAV and the multiple slave UAVs. It is regarded as the camera array with the wide and dynamic baseline. Each UAV is equipped with the optical camera with the same intrinsic parameters. In this imaging system, master UAV provides the global image with the large FoV. FoV of the master UAV covers the FoVs of all slave UAVs. In the world coordinate system, the position of master UAV is marked as $(X_M,Y_M,Z_M)^T$. Slave UAVs provide the local high-resolution image with the small FoVs. Specifically, slave UAVs can be regarded as a $N\times N$ camera array. Position of the slave UAV in the $i$-th rows and $j$-th columns is marked as $(X_S(i,j),Y_S(i,j),Z_S(i,j))^T$ where $i\in [1,N]$, $j\in [1,N]$. After introducing the master and slave UAVs, we strictly define master-slave UAVs based synthetic optical aperture imaging system with wide and dynamic baseline in the following:

Fig. 1. Synthetic optical aperture imaging system with wide and dynamic baseline for UAV. (a) The proposed master-slave UAV based synthetic optical aperture imaging system. For the clear understanding, both the side and 3D views of imaging system are provided. (b) Swarm UAVs based imaging system. (c) Single UAV based imaging system. (d) The proposed system in the real application.

Download Full Size | PDF

Definition 3.1 (Master-slave UAVs based synthetic optical aperture with wide and dynamic baseline)

An imaging system can be regarded as a master-slave UAVs based synthetic optical aperture imaging system with wide and dynamic baseline by following five conditions: (i) it consists of $1$ master UAV and $N^2$ slave UAVs; (ii) During the flight, the relative motion between master UAV and all slave UAVs is constant. (iii) During the flight, slave UAVs are uniformly in a $N\times N$ square array parallel to the ground, and FoVs of the neighbored slave UAVs have small or even no overlap; (iv) During the flight, FoV of master UAV covers the FoVs of all slave UAVs; (v) During the flight, the master UAV is exactly above the center of slave UAVs.

More in-depth explanations of Definition 3.1 are provided. At first, with the usage of Beidou Navigation Satellite System (BDS) and flight control algorithm embedded in the commercial UAV, condition (ii) is easy to be satisfied. Second, benefiting from BDS, from the conditions (iii) to (v), the position constraint of master and slave UAVs can be determined, the master and slave UAVs fly in unison according to the definition.

(1)$$UAV X_M = \frac{1}{N^2} \sum_{i=1}^N \sum_{j=1}^N X_S(i,j), \,\, Y_M = \frac{1}{N^2} \sum_{i=1}^N \sum_{j=1}^N Y_S(i,j)$$

Condition (iii) implies that $\vert X_S(i,j)-X_S(i-1,j) \vert = \vert Y_S(i,j)-Y_S(i,j-1) \vert$ and $Z_S(i,j)$ is constant during the flight. Combining the conditions (iii) and (iv), a weak position constraint is obtained as:

(2)$$\vert X_S(i,j)-X_S(i-1,j) \vert \leq \frac{1}{N} \vert X_S(1,j)-X_S(N,j) \vert$$

(3)$$Z_M \geq 2\cdot Z_S(i,j)$$

After discussing the definition of the proposed master-slave UAVs based synthetic optical aperture imaging system with wide and dynamic baseline, we further analyze the advantage of this imaging system. Compared with swarm UAVs based imaging system in Fig. 1(b), the proposed system does not requires too many UAVs. In swarm UAVs based system, to prevent the failure of image stitching method, FoV overlap of the neighbored UAVs should be large enough to match the high quality 2D-2D correspondences. On the contrary, the proposed system does not require the large FoV overlap of the neighbored slave UAVs, because the image stitching is operated on the slave and master UAVs. Thus, the proposed imaging system requires UAVs less than swarm UAVs. Compared with the single UAV based imaging system in Fig. 1(c), the proposed method can generate the image with the higher resolution, for it make uses of the imaging resolution of slave UAVs.

In the actual applications, due to the flight control error of UAVs, conditions (ii)-(v) cannot be satisfied strictly. Thus, images captured by the master UAV and slave UAVs cannot be stitched directly. To solve this problem, we design a coarse-to-fine image stitching method for master-slave UAVs. It will be introduced in Sec 3.3.

3.3 Coarse-to-fine image stitching method for master-slave UAVs

To generate the aerial image with the large FoV and high resolution, we propose a coarse-to-fine image stitching method for master-slave UAVs. Traditional image stitching succeeds with a single registration due to uniform scale and absence of object disparity. However, it is not applicable to images with gap of resolution scale and having significant object disparities. Therefore, we propose a two-stage image registration method. The procedure is shown in Fig. 2(a). To overcome the scale difference of master and slave images, the proposed method consists of two stage (i.e. in a coarse-to-fine manner). The technique details are discussed in the following.

Fig. 2. Overview of the proposed image processing approach for master-slave UAVs based synthetic optical aperture imaging system with wide and dynamic baseline. (a) Flowchart of the proposed method. It consists of image stitching and video stabilization methods. Terms extract. and homo. denote extraction and homography. (b) Images captured by master-slave UAVs (one master and four slave UAVs). (c) Stitched image in the two slave case. (d) Stitched image in the four slave case.

Download Full Size | PDF

Inputs and output of the proposed stitching method are illustrated at first. It takes one $W\times H\times 3$ image $\mathbf {M}_\text {1}$ (captured from the master UAV) and the multiple $W\times H\times 3$ images $\{\mathbf {S}_\text {i}\}_{\text {i}=1}^{\text {N}^2}$ (captured from $N^2$ slave UAVs) as inputs, in which $W$ and $H$ are the width and height of the image. $\mathbf {M}_\text {1}$ and $\mathbf {S}_\text {i}$ are shown in Fig. 2(b). It is found that FoV of $\mathbf {M}_\text {1}$ covers FoV of $\mathbf {S}_\text {i}$ $(\text {i}=1,\ldots,\text {N}^2)$, and FoVs of $\mathbf {S}_\text {i}$ and $\mathbf {S}_\text {j}$ has the few overlap, as shown in Fig. 2(b). The aim of the proposed method is to generate a super-resolution $W\times H\times 3$ image $\mathbf {R}$ from $\mathbf {M}_\text {1}$ and $\mathbf {S}_\text {i}$ $(\text {i}=1,\ldots,\text {N}^2)$. $\mathbf {R}$ has the same FoV of $\mathbf {M}_\text {1}$, while $\mathbf {R}$ has the resolution $N\times$ higher than $\mathbf {M}_\text {1}$. Visualization of $\mathbf {R}$ is shown in Fig. 2(d). In the actual application, $N$ is set as $2$ for both the system cost and image stitching efficiency.

High-altitude UAV images are often affected by fog, especially $\mathbf {M}_\text {1}$ captured by the Master UAV. To highlight the details of the images and facilitate subsequent cross-scale feature extraction and feature matching, we employ a 4*4 local histogram to dehaze UAV images, resulting in images with prominent features.

After that, the procedure of the proposed coarse-to-fine stitching method is discussed. Initially Feature extract. I is exploiting the conventional techniques to independently detect 2D-2D keypoint correspondence from $\mathbf {S}_\text {i}$ and $\mathbf {M}_\text {1}$. With these coarse 2D-2D correspondences, $\mathbf {S}_\text {i}$ is coarsely warped onto the master image $\mathbf {M}_\text {1}$. It generates the coarsely warped detail images $\mathbf {S}_\text {i,coarse}$. Then, with the usage of pre-trained neural networks in Feature extract. II, the fine 2D-2D correspondences are matched from $\mathbf {S}_\text {i,coarse}$ and $\mathbf {M}_\text {1}$. It can generate the fine-grained image $\mathbf {S}_\text {i,fine}$. To address the issue of object disparity in the overlapping regions of the neighbored $\mathbf {S}_\text {i,fine}$ and $\mathbf {S}_\text {j,fine}$, we employ the maximum flow and minimum cut to calculate stitching seam, which can generate a high-quality stitching result.

To alleviate the gap of resolution scale of $\mathbf {M}_\text {1}$ and $\mathbf {S}_\text {i}$, the first step is the coarse warp between $\mathbf {M}_\text {1}$ and $\mathbf {S}_\text {i}$. It aims to generate a $W\times H\times 3$ image $\mathbf {S}_\text {i,coarse}$ from $\mathbf {S}_\text {i}$ where $\mathbf {S}_\text {i,coarse}$ has the nearly same FoV of $\mathbf {M}_\text {1}$. Technique details of coarse warping is represented in Fig. 3(a). Using the SIFT [25] method, both $\mathbf {S}_\text {i}$ and $\mathbf {M}_\text {1}$ will detect a sufficient number of feature points $p_i=(x_i,y_i)$ and $P_k=(X_k,Y_k)$. Then, the Euclidean distance is calculated from $p_i=(x_i,y_i)$ in $\mathbf {S}_\text {i}$ to the corresponding $P_k=(X_k,Y_k)$ in $\mathbf {M}_\text {1}$, and the two feature points in which the Euclidean distance is the smallest are selected. We use the brute-force matching by computing the Euclidean distance between $p_i=(x_i,y_i)$ and $P_k=(X_k,Y_k)$:

(4)$$D(P_k,p_i)=\sqrt{\left(X_k-x_i\right)^2+\left(Y_k-y_i\right)^2}$$

Fig. 3. Overview of the the two-stage image warping strategy. (a) Technique details of coarse warping. (b) Technique details of fine warping. (c) Inputs of coarse warping($\mathbf {M}_\text {1}$ and $\mathbf {S}_\text {1}$ captured by Slave-1). (d) Result of pre-warp and the corresponding part in $\mathbf {M}_\text {1}$. (e) Result of post-warp.

Download Full Size | PDF

Furthermore, the ratio of the Euclidean distances of the two selected feature points is calculated. If this ratio falls below a threshold T (typically ranging between 0.4 and 0.6), the pair of matches is considered acceptable; otherwise, it is rejected. Based on this, we obtain matching pairs of feature point $p_j=(x_j,y_j)$ and $P_j=(X_j,Y_j)$ between $\mathbf {S}_\text {1}$ and $\mathbf {M}_\text {1}$. Next, randomly selecting four pairs of feature point matches, we use the RANSAC [26] (Random Sample Consensus) algorithm to estimate the spatial optimal homography matrix $\text {H}_{\text {i},\text {coarse}}^{\text {S}}$ for warping $\mathbf {S}_\text {i}$ to $\mathbf {M}_\text {1}$:

(5)$$\arg\min_{\text{H}_{\text{i},\text{coarse}}^S}\sum_{j=1}^n\begin{Vmatrix}P_j-\text{H}_{\text{i},\text{coarse}}^S\cdot P_j\end{Vmatrix}We$$

where $P_j=(x_j,y_j,1)$ and $P_j=(X_j,Y_j,1)$. For N iterations, we repeat the process and identify the iteration with the highest number of inliers. In cases where the inlier ratio is particularly low, such as challenging scenarios, we often increase the value of N substantially (e.g., to 500), aiming to ensure that at least one subset of the four matched features does not include any outliers. Using the $\mathbf {M}_\text {1}$ as a reference, coarsely warp the scene content of the $\mathbf {S}_\text {i}$ into the scale of $\mathbf {M}_\text {1}$ to obtain $\mathbf {S}_\text {i,coarse}$. Apply the same method to coarsely warp the other three detail images.

To compute the accurate homography matrix, the second step is the fine warp between $\mathbf {M}_\text {1}$ and $\mathbf {S}_\text {i,coarse}$. It aims to generate a fine warped $W\times H\times 3$ image $\mathbf {S}_\text {i,fine}$. Due to the different viewpoints of the master and slave drones, the features of the same object in $\mathbf {S}_\text {i}$ and $\mathbf {M}_\text {1}$ are not entirely identical. Traditional feature detection and matching methods can only achieve a coarse registration. Technique details of fine warping is represented in Fig. 3(b). By utilizing the self-supervised training of the deep neural network architecture SuperPoint [27], fine 3D feature points are extracted from $\mathbf {S}_\text {i,coarse}$ and corresponding parts of $\mathbf {M}_\text {1}$. Next, the Graph Neural Network SuperGlue [28] method is employed to achieve feature point matching and outlier exclusion. Finally, based on the refined feature point matching pairs, recalculate the spatial fine homography matrix $\text {H}_{\text {i},\text {fine}}^\text {S}$, rewarp the geometric position of $\mathbf {S}_\text {i,coarse}$ to obtain $\mathbf {S}_\text {i,fine}$. Apply the same method to finely warp the other three coarsely warpped images.

3.4 Image blending method for master-slave UAVs

After the two-stage image registration, the images captured by slave UAVs have been prealigned. Due to the presence of parallax, ghosting always occur in overlapping area between adjacent $\{\mathbf {S}_\text {i}\}_{\text {i}=1}^4$. In order to blend a global image with better visual perception quality, we propose a seam cut method based on histogram matching. After the coarse-to-fine image warping, we obtain $W\times H\times 3$ images $\{\mathbf {S}_\text {i,fine}\}_{\text {i}=1}^4$. However, due to the inconsistency in the sensitivity elements of the UAV camera, the $\{\mathbf {S}_\text {i,fine}\}_{\text {i}=1}^4$ often exhibit varying brightness levels. Therefore, we utilize the overlapping region between $\{\mathbf {S}_\text {i,fine}\}_{\text {i}=1}^4$ to perform histogram matching, aligning the brightness of $\{\mathbf {S}_\text {i,fine}\}_{\text {i}=1}^4$.

Seam cutting is an effective way to address this problem. It means to seek an optimal seam that is comfortable to human perception. Thus, we conduct max-flow min-cut [29] to find an optimal seam and copy each warped image to the corresponding side of the seam.

Approach of max-flow min-cut is represented in Fig. 4. The optimal seam line is formed by connecting all the links with the minimum energy. In a case of two adjacent images $\mathbf {S}_\text {i, fine}$ and $\mathbf {S}_\text {j, fine}$, $\mathrm {L=}\{0,1\}$ is a label set, "0" represents $\mathbf {S}_\text {i, fine}$, "1" represents $\mathbf {S}_\text {j, fine}$. In optical systems, the energy function serves as an effective means to represent the distribution of energy [30]. The energy of each edge $E(l)$ in the graph is defined as:

(6)$$E(l)=\sum_{p\in\Omega}E_d(l_p)+\sum_{(p,q)\in N}E_s(l_p,l_q)$$

Where $E_d$ represents the data term, $E_s$ represents the smoothness term, $\Omega$ denotes the overlap region, $p$, $q$ are two adjacent pixels. $l_p$ and $l_p$ represent the labels corresponding to $p$ and $q$ in $\Omega$, $N$ represent four connected neighborhoods.

Fig. 4. Representation of seam cutting. We set each pixel as node, and the thickness of the line between adjacent pixel as the edge weight. The red line represents the calculated stitching seam using the max-flow min-cut algorithm.

Download Full Size | PDF

The data term $E_d$ is designed to ensure that all pixels on the seam line fall within the overlap region. Assuming a pixel is outside the overlap region, the value of the data term is infinity, making it impossible to satisfy the condition for minimizing the energy function.

(7)$$E_d(p,l_p)=\begin{cases}0, & ifp\in\Omega\\+\infty, & else\end{cases}$$

In contrast, the smoothness term $E_s$ is intended to determine the shape of the seam line:

(8)$$E_s(l_p,l_q)=|l(p)-l(q)|\cdot[\|I_{l(p)}(p)-I_{l(q)}(p)\|_2+\|I_{l(p)}(q)-I_{l(q)}(q)\|_2]$$

Finally, after calculating the stitching seams between all adjacent detail images, the super-resolution image $\mathbf {R}$ can be obtained.

3.5 Video stabilization method for master-slave UAVs

The proposed coarse-to-fine image stitching methods are based on videos captured by Master-slave UAVs. However, sudden changes in the scene lead to significant variations in the inter-frame homography transformation of the same video. In Sec 3.3, our method only considers spatial transformations, which does not consider the temporal transformations in video stitching. To keep the smooth transition in neighboring frames, a naive idea is to track feature points in neighboring frames, however, it is time-consuming to track feature points in video frames. In our paper, we have computed the spatial fine homography transformation $\text {H}_{\text {i},\text {t}}^{\text {S}}(\text {i}=1,\ldots,N,\text {t}=1,\ldots,T)$ in Sec 3.3, we also need to compute a temporal global homography transformation $\text {H}_{\text {i},\text {t}}^{\text {T}}(\text {i}=1,\ldots,N,\text {t}=1,\ldots,T)$ to make video result smoothly transiting from frame 0 to frame $t$. The essence of smooth transition is that the difference in homography matrix frame 0 to frame $t$ is not significant. In order to ensure that the difference in the homography matrix from frame 1 to frame $t$ is not significant compared to frame 0, we use $\{\text {H}_{\text {i},\text {0}}^{\text {S}}\}_{\text {i}=1}^4$ as the reference. $\text {H}_{\text {i},\text {t}}^{\text {T}}$ is as follows:

(9)$$\text{H}_{\text{i},\text{t}}^{\text{T}}=(\text{H}_{\text{i},0}^{\text{S}})^{a}\cdot(\text{H}_{\text{i},\text{t}}^{\text{S}})^{1-a}$$

where $a$ is the smoothing coefficient. It typically falls within the range of 0 to 1, commonly set at 0.3, and it can be adjusted based on specific circumstances. Video stabilization result is presented in the supplement material.

4. Experiments and discussions

4.1 Experimental environment and data

To verify the effectiveness of the master-slave based synthetic optical aperture imaging system with wide and dynamic baseline, we not only collected the UAV images from different regions with different flight altitudes, but also compared the experimental results with the algorithms from open source. Figure 5 shows the master-slave architecture we built. In order to ensure that communication between drones at different heights does not have any impact, we use different kinds of drones as the master UAV and slave UAV. We use MMC M11 as master UAV and DJI M300RTK as slave UAVS, each UAV can communicate with its fight controller for over 8KM. All the images were divided into three groups according to where they were collected. The parameters of sequences are shown in Table 1. Specific scenes are shown in Fig. 5(c), Fig. 5(d) and Fig. 5(e). Scene one is an example of a parkade. In this scene, there are bridges to test registration accuracy and moving vehicles to test video stability. Scene two and scene three are examples of a square and a park, respectively. In these two scenes, there are roads and buildings to test registration accuracy. The experimental computer environment is Windows 11 operating system, GPU: GeForce RTX 2080 Ti, CPU: Intel Core i7-12700, RAM: 16 GB.

Fig. 5. The master-slave architecture we used in (a) parkade, (b) square and park, each UAV is equipped with a flight controller to receive collected videos. Groups of experimental UAV images captured in (c) parkade, (d) square, (e) park.

Download Full Size | PDF

Table 1. Parameters of sequences captured from the master-slave based synthetic optical aperture imaging system

View Table | View all tables in this article

Experiments are divided into subjective and objective experiments to verify the performance of our method compared with other image stitching experiments. In experiments, for a better presentation of the image stitching performance, we uniformly employ the stitching seams calculated from the overlapping regions after two-stage registration, along with the histogram registration and multiband fusion for image stitching. In objective experiments, we incorporate two crucial evaluation metrics for the quantitative assessment and comparison of our proposed algorithm’s stitching results with those of existing methods. The first metric we employ is PSNR (Peak Signal-to-Noise Ratio) index, which assesses the similarity between two images by evaluating the errors at corresponding pixel points. The higher score indicates the more favorable outcome. PSNR is based on MSE (Mean Square Error). For two images denoted as $I$ and $K$, MSE is defined as:

(10)$$MSE=\frac1{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}[I(i,j)-K(i,j)]^2$$

PSNR is defined as:

(11)$$PSNR=10\cdot\log_{10}\left(\frac{MAX_{I}^{2}}{MSE}\right)=20\cdot\log_{10}\left(\frac{MAX_{I}}{\sqrt{MSE}}\right)$$

The second metric we employ is SSIM (structural similarity) index, which measure similarity between two images by considering the brightness and contrast related to the structure of objects. For a good result, the SSIM score should be close to 1. For two images denoted as $I$ and $K$, SSIM is defined as:

(12)$$l(I,K) =\frac{2\mu_{I}\mu_{K}+C_{1}}{\mu_{I}^{2}+\mu_{K}^{2}+C_{1}}, \,\, c(I,K) =\frac{2\sigma_{I}\sigma_{K}+C_{2}}{\sigma_{I}^{2}+\sigma_{K}^{2}+C_{2}}, \,\, s(I,K) =\frac{\sigma_{IK}+C_{2}/2}{\sigma_{I}\sigma_{K}+C_{2}/2}$$

where $\sigma$ and $\mu$ presents the average and variance of the image, $\sigma _{IK}$ represents the covariance of the image $I$ and $K$. $C_1=(k_1L)^2$ and $C_2=(k_2L)^2$ are constants used to maintain stability, $L$ represents the dynamic range of pixels, $k_1$ is typically set to 0.001, $k_2$ is typically set to 0.003. The final SSIM is obtained by combining these three metrics:

(13)$$SSIM(I,K)=l(I,K)c(I,K)s(I,K)=\frac{2\mu_I\mu_K+C_1}{\mu_I^2+\mu_K^2+C1}\cdot\frac{2\sigma_{IK}+C_2}{\sigma_I^2+\sigma_K^2+C_2}$$

In the ablation experiment, root mean square error (RMSE) is used to compare the performance of single-stage and two-stage registration. RMSE is defined as:

(14)$$\text{RMSE}=\sqrt{\frac{1}{N}\sum_{i=1}^N||p_j-H\cdot P_j||^2}$$

where N is the total number of matching points, $p_j$ and $P_j$ is a pair of feature point correspondence, $H$ is the homography matrix that warps the detailed image to the global image.

4.2 Performance on image stitching

4.2.1 Qualitative evaluation

This experiment investigates the image stitching performance through qualitative evaluation. The stitching performance of the cross-scale images we collecting is compared with SIFT, GLUE, APAP [31] and our method. The existing aerial imaging systems [23,32,33] usually use traditional feature detection and matching methods such as SIFT and RANSAC. Therefore, we don’t compare with the aerial image stitching techniques in existing aerial imaging systems, we compare our method with traditional image registration, which is more universal. The misalignment of objects in the overlapping regions of stitching results can be utilized as a qualitative indicator for assessing the stitching performance. For a better show of the stitching performance, We uniformly employ the stitching seams calculated from the overlapping regions after two-stage image stitching, along with the histogram registration for image stitching.

Figure 6 shows the stitching results of different methods on three scenes. Scene 1 describes a stitching case in a parkade, where numerous objects are present to depict the performance of image stitching, such as straight tracks and vehicles. In Scene 1, Frame 1(a) and Frame 2(a) depict the registration results using the SIFT method, where there is significant misalignment of objects in the overlapping region, such as the misalignment of roads and the incomplete construction of buildings. Frame 1(b) and Frame 2(b) illustrate the registration results using the GLUE method, showing a reduced degree of misalignment, although some slight misalignment still exists. Since both methods adopt a global homography matrix, the linear structures of objects in non-overlapping regions remain correct. Due to the sparse and uneven distribution of feature points, in Frame 1(c) and Frame 2(c), the APAP method results in a significant misalignment in the overlapping region. The shapes of objects in the non-overlapping region are also severely distorted, such as severe deformation of buildings, making objects indistinguishable. In contrast, the stitching results generated by our method exhibit minimal misalignment of objects in the overlapping region, and the shapes of objects in the non-overlapping region are maintained with a good structural integrity, such as straight roads, the shapes of buildings are maintained very well. Scene 2 and Scene 3 depict stitching cases of a park and a square, respectively. Similarly, the SIFT method results in severe misalignment in the overlapping region, such as misaligned roads and paths in the forest. The GLUE method alleviates some of the misalignment, but there is still some remaining misalignment. The APAP method not only introduces misalignment but also distorts the shapes of objects. Our method minimizes misalignment of objects in the overlapping region while preserving the structural integrity of object shapes. The stitching results demonstrate the effectiveness of our method.

Fig. 6. Our image stitching method compared to other image stitching methods. In each scene, we use two frames with a significant time gap to compare, Images from left to right is the stitching result of (a) SIFT, (b) GLUE, (c) APAP, (d) our method. The intersection of the two detail images are highlighted to show the precision of registration.

Download Full Size | PDF

4.2.2 Quantitative evaluation

This experiment investigates the image stitching performance through quantitative evaluation. The proposed two-stage image stitching method is compared with the common approaches, such as APAP, Sift and Glue. We downsample the global image obtained from two-stage image stitching to match the same resolution as the input global image, and quantitatively compare the downsampled images with the input images by calculating PSNR and SSIM scores.

Table 2 shows the PSNR and SSIM of different methods between stitching images and inputting panorama. We can see that our method consistently achieves higher PSNR and SSIM scores in each dataset compared to other methods, indicating that our method can generate better registration performance. Compared to traditional methods, our two-stage registration approach enables more precise registration of images with wide baseline and better preserves the integrity of object structures in the images. All of these results demonstrate that the proposed two-stage image stitching is more suitable for wide baseline, cross-scale image stitching, and the registration performance surpasses the state-of-the-art schemes.

Table 2. Comparison of PSNR and SSIM scores of different methods in three data.

View Table | View all tables in this article

4.3 Performance on image blending

This experiment investigates the performance with respect to image with respect to image blending. We compare our image blending method with two popular image blending methods: average blending [34] and linear blending [35]. For fair model comparison, all image blending methods utilize detailed images after the two-stage image stitching.

As shown in Fig. 7, we present the comparative results of two sets of image blending. The average blending method often results in visible seams due to color inconsistency. While linear blending mitigates visible seams to some extent, both blending methods are unable to eliminate misalignments or ghosting. Through the application of the seam cutting step, we identify a seam that divides the image into two sides. In Fig. 7, these two cases involve copying each warped image to the corresponding side of the seam, avoiding ghosting but retaining visible seams. Our implementation of image blending generates visually pleasing results, highlighting the effectiveness of our image blending strategy.

Fig. 7. Our image blending method compared to other image blending methods. In each scene, from left to right is the blending result of (a) average blending, (b) linear blending, (c) seam cutting, (d) our method. To make a more distinct comparison, specific areas have been magnified and positioned beneath the stitched results.

Download Full Size | PDF

4.4 Ablation experiment

4.4.1 Performance on image defogging

In order to address the issue of foggy effects on images captured by drones at high altitudes, we conducted comparative experiments to choose a dehazing method and parameters suitable for real scene. Specifically, in Fig. 8, we compared the dark channel dehazing [36] method with different degrees of haze removal, histogram equalization, and local histogram equalization with different sub-block sizes. Figure 8(b) and Fig. 8(c) depict the results of applying the dark channel dehazing method with different degrees of haze removal. In Fig. 8(b), with a dehazing level of 0.7, the texture details of the white buildings vanish completely. In Fig. 8(c), the texture details of the white buildings are preserved, but the image still appears blurry due to the influence of haze. In Fig. 8(d), while histogram equalization results in an improvement in reducing the haziness caused by fog, there is a significant color distortion. For example, the color of the trees changes from green to brown. Figure 8(e) displays the result of applying local histogram equalization with sub-blocks set to $10\times 10$ pixels, the result still exhibits some degree of blurriness. Our method sets the local sub-blocks to 4*4 pixels, as shown in Fig. 8(e), the result exhibits clear details, and there is minimal color distortion in the objects within the image. Compared with other methods, our methods can highlight object details and reduce color distortion.

Fig. 8. Qualitative comparison of the defog results. (a) is the input image, the remaining images are respectively the result of (b) dark channel dehazing, fog removal intensity set to 0.7, (c) dark channel dehazing, fog removal intensity set to 0.3, (d) histogram equalization, (e) local histogram equalization, sub-block set to $10\times 10$, (f) local histogram equalization, sub-block set to 4*4.

Download Full Size | PDF

4.4.2 Performance on two-stage image stitching

This experiment validates the superiority of two-stage image stitching over single-stage image stitching. We employed RMSE as the comparative metric To compare the performance of our method with SIFT+RANSAC, Superpoint+Superglue. Table 3 shows the comparison of the performance of single-stage and two-stage image stitching. While SIFT method shows the minimum RMSE values in some frames, it exhibits significantly high RMSE values in others. This implies that SIFT registration is not consistently accurate, leading to substantial changes between frames. On the other hand, the GLUE method maintains stable RMSE values, but compared to it, our method achieves consistently minimal RMSE values on each frame while maintaining stability. Therefore, experimental results demonstrate the effectiveness of our two-stage image stitching method.

Table 3. Comparison of the performance of single-stage and two-stage image stitching.

View Table | View all tables in this article

4.5 Failure cases and discussion

The experimental results indicate that the proposed two-stage cross-scale image stitching method can achieve accurate alignment of images captured by slave drones while preserving the structural integrity of objects in the images. Additionally, it is capable of generating stable and smooth video streams. However, our method also has some shortcomings.

In real-time processing, for stitching a panoramic image, the single stage SIFT+RANSAC registration takes 9.0853 seconds, the single stage Superpoint+Superglue registration takes 6.9659 seconds, and our method takes 12.0898 seconds. In terms of time consumption, it is not enough to complete real-time video stitching.

In environmental conditions, our method is not suitable for situations where there are multiple planes in detailed images. This typically occurs when objects in the images have significant differences in height, and the same object captured by the master and slave drones does not belong to the same plane due to different viewpoints. In Fig. 9, Fig. 9(a) and Fig. 9(b) represent the warping of feature points in two frames with a very small time interval for detailed images. It is evident that the feature points in the images are clustered on two planes. Some feature points are distributed on the ground, while another set of feature points is distributed on white buildings. In Fig. 9(a), with the coarse homography matrix, the ground is projected neatly, while the white building have a significant projection error. In Fig. 9(b), the white building is projected neatly, while the ground have a significant projection error. Figure 9(c) and Fig. 9(d) are detailed images warped by these two different homography matrices respectively, and their shapes differ significantly, leading to a considerable warping error.

Fig. 9. Failure cases caused by multiple planes. (a) and (b) show projection error of feature points, (c) and (d) show result after two-stage image stitching.

Download Full Size | PDF

In future work, we will focus on addressing disparities, multiple planes caused by differences in drone altitude and trying to shorten the time as much as possible to achieve real-time stitching. We can use end-to-end neural networks, which can improve registration efficiency through learning and greatly shorten runtime. Our aim is to utilize the information from the disparity to enhance plane distinction and warping and obtain the improved performance in image stitching.

5. Conclusions

To achieve the image acquisition with the wide FoV and high resolution, we present a master-slave UAVs based synthetic optical aperture imaging system with wide and dynamic baseline in this paper. It contains one master UAV, multiple slave UAVs, and can be regarded as a camera array with the wide and dynamic baseline. After that, we present a coarse-to-fine stitching method for the proposed imaging system. Real data experiments demonstrate that our method achieves more accurate alignment state-of-the-art methods.

Funding

Major Project of Fundamental Research on Frontier Leading Technology of Jiangsu Province (BK20222006); National Key Research and Development Program of China (2020YFB2103501).

Acknowledgments

The authors thank Dr. Wenhao Xu for his support on providing swam UAVs and getting real data in Zhaoqing City, Canton.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. K. C. Zhou, M. Harfouche, and C. L. Cooke, “Parallelized computational 3d video microscopy of freely moving organisms at multiple gigapixels per second,” Nat. Photonics 17(5), 442–450 (2023). [CrossRef]

2. N. M. Law, O. Fors, and J. Ratzloff, “Evryscope science: exploring the potential of all-sky gigapixel-scale telescopes,” Publ. Astron. Soc. Pac. 127(949), 234–249 (2015). [CrossRef]

3. L. Sun, C. Tang, M. Xu, et al., “Dic measurement for large-scale structures based on adaptive warping image stitching,” Appl. Opt. 61(22), G28–G37 (2022). [CrossRef]

4. O. Cossairt and S. Nayar, “Spectral focal sweep: Extended depth of field from chromatic aberrations,” in 2010 IEEE International Conference on Computational Photography (ICCP), (IEEE, 2010), pp. 1–8.

5. D. J. Brady, M. E. Gehm, and R. A. Stack, “Multiscale gigapixel photography,” Nature 486(7403), 386–389 (2012). [CrossRef]

6. X. Lin, J. Wu, G. Zheng, et al., “Camera array based light field microscopy,” Biomed. Opt. Express 6(9), 3179–3189 (2015). [CrossRef]

7. J. Lee, B. Kim, K. Kim, et al., “Rich360: optimized spherical representation from structured panoramic camera arrays,” ACM Trans. Graph. 35(4), 1–11 (2016). [CrossRef]

8. K. Wu, H. Zhang, and Y. Chen, “All-silicon microdisplay using efficient hot-carrier electroluminescence in standard 0.18μm cmos technology,” IEEE Electron Device Lett. 42(4), 541–544 (2021). [CrossRef]

9. A. R. Prado, A. G. Leal-Junior, and C. Marques, “Polymethyl methacrylate (pmma) recycling for the production of optical fiber sensor systems,” Opt. Express 25(24), 30051–30060 (2017). [CrossRef]

10. L. He, X. Li, X. He, et al., “VSP-based warping for stitching many uav images,” IEEE Transactions on Geoscience and Remote Sensing (2023).

11. P. J. Burt and E. H. Adelson, “A multiresolution spline with application to image mosaics,” ACM Trans. Graph. 2(4), 217–236 (1983). [CrossRef]

12. J. Ma, X. Jiang, and A. Fan, “Image matching from handcrafted to deep features: A survey,” Int. J. Comput. Vis. 129(1), 23–79 (2021). [CrossRef]

13. F. Abedi, Y. Yang, and Q. Liu, “Group geometric calibration and rectification for circular multi-camera imaging system,” Opt. Express 26(23), 30596–30613 (2018). [CrossRef]

14. P. An, “Code of master-slave-uavs,” figshare (2024) [retrieved 29 Feb 2024], https://doi.org/10.6084/m9.figshare.25309855.

15. A. W. Lohmann, “Scaling laws for lens systems,” Appl. Opt. 28(23), 4996–4998 (1989). [CrossRef]

16. X. Liu, R. Singh, G. Li, et al., “Waveflex biosensor-using novel tri-tapered in tapered four-core fiber with multimode fiber coupling for detection of aflatoxin b1,” Journal of Lightwave Technology (2023).

17. A. G. Leal-Junior, A. Frizera, and C. Marques, “Polymer optical fiber strain gauge for human-robot interaction forces assessment on an active knee orthosis,” Opt. Fiber Technol. 41, 205–211 (2018). [CrossRef]

18. W. Zhang, R. Singh, and Z. Wang, “Humanoid shaped optical fiber plasmon biosensor functionalized with graphene oxide/multi-walled carbon nanotubes for histamine detection,” Opt. Express 31(7), 11788–11803 (2023). [CrossRef]

19. X. Liu, R. Singh, and M. Li, “Plasmonic sensor based on offset-splicing and waist-expanded taper using multicore fiber for detection of aflatoxins b1 in critical sectors,” Opt. Express 31(3), 4783–4802 (2023). [CrossRef]

20. P. An, Q. Liu, F. Abedi, et al., “Novel calibration method for camera array in spherical arrangement,” Signal Process. Image Commun. 80, 115682 (2020). [CrossRef]

21. J. Kopf, M. Uyttendaele, O. Deussen, et al., “Capturing and viewing gigapixel images,” ACM Trans. Graph. 26(3), 93 (2007). [CrossRef]

22. X. Yuan, L. Fang, Q. Dai, et al., “Multiscale gigapixel video: A cross resolution image matching and warping approach,” in 2017 IEEE International Conference on Computational Photography (ICCP), (IEEE, 2017), pp. 1–9.

23. X. Meng, W. Wang, and B. Leong, “Skystitch: A cooperative multi-uav-based real-time video surveillance system with stitching,” in Proceedings of the 23rd ACM International Conference on Multimedia, (2015), pp. 261–270.

24. W.-S. Lai, O. Gallo, J. Gu, et al., “Video stitching for linear camera arrays,” arXivarXiv:1907.13622 (2019). [CrossRef]

25. D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” Int. J. Comput. Vis. 60(2), 91–110 (2004). [CrossRef]

26. M. A. Fischler and R. C. Bolles, “Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography,” Commun. ACM 24(6), 381–395 (1981). [CrossRef]

27. D. DeTone, T. Malisiewicz, and A. Rabinovich, “Superpoint: Self-supervised interest point detection and description,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, (2018), pp. 224–236.

28. P.-E. Sarlin, D. DeTone, T. Malisiewicz, et al., “Superglue: Learning feature matching with graph neural networks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2020), pp. 4938–4947.

29. Y. Boykov and V. Kolmogorov, “An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision,” IEEE Trans. Pattern Anal. Machine Intell. 26(9), 1124–1137 (2004). [CrossRef]

30. K. Xu, “Silicon electro-optic micro-modulator fabricated in standard cmos technology as components for all silicon monolithic integrated optoelectronic systems,” J. Micromech. Microeng. 31(5), 054001 (2021). [CrossRef]

31. J. Zaragoza, T.-J. Chin, M. S. Brown, et al., “As-projective-as-possible image stitching with moving dlt,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2013), pp. 2339–2346.

32. Y. Yuan, F. Fang, and G. Zhang, “Superpixel-based seamless image stitching for uav images,” IEEE Trans. Geosci. Remote Sensing 59(2), 1565–1576 (2021). [CrossRef]

33. D. Guo, J. Chen, L. Luo, et al., “Uav image stitching using shape-preserving warp combined with global alignment,” IEEE Geosci. Remote Sensing Lett. 19, 1–5 (2022). [CrossRef]

34. Y. Li, Y. Wang, W. Huang, et al., “Automatic image stitching using sift,” in 2008 International Conference on Audio, Language and Image Processing, (IEEE, 2008), pp. 568–571.

35. J. Gao, S. J. Kim, and M. S. Brown, “Constructing image panoramas using dual-homography warping,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (IEEE, 2011), pp. 49–56.

36. K. He, J. Sun, and X. Tang, “Single image haze removal using dark channel prior,” IEEE Trans. Pattern Anal. Mach. Intell. 33(12), 2341–2353 (2011). [CrossRef]

Sequence	Location	Image Size (pixel)	Master UAV Height (m)	Slave UAV Height (m)
parkade	Wuhan, Hubei	$1920 \times 1080$	100	50
square	Zhaoqing, Canton	$1920 \times 1080$	300	150
park	Zhaoqing, Canton	$1920 \times 1080$	300	150

Parkade	PSNR	1	2	3	4	5	6	7	Average
	SIFT	21.0092	21.073	21.0167	21.0097	21.9568	21.0082	21.0715	21.0269
	GlUE	21.2072	21.1894	21.1902	21.1482	21.1593	21.2885	21.3298	21.2161
	APAP	20.9993	20.9615	20.9204	20.8971	20.8640	20.9384	20.9475	20.9326
	Ours	21.4918	21.4557	21.4298	21.3783	21.3852	21.5165	21.4676	21.4435
	SSIM	1	2	3	4	5	6	7	Average
	SIFT	0.6643	0.6707	0.6648	0.6590	0.6518	0.6544	0.6570	0.6603
	Glue	0.6715	0.6561	0.6733	0.6592	0.6701	0.6630	0.6655	0.6655
	Apap	0.6494	0.6480	0.6456	0.6388	0.6404	0.6365	0.6335	0.6417
	Ours	0.6754	0.6898	0.6756	0.6805	0.6800	0.6751	0.6755	0.6788
Square	PSNR	1	2	3	4	5	6	7	Average
	Sift	17.5751	17.6765	17.6603	17.7167	17.7548	17.6803	17.7796	17.6919
	Glue	17.7750	17.7723	17.8151	17.7794	17.8354	17.7785	17.8268	17.7975
	Apap	17.1629	17.4063	17.3754	17.3925	17.4555	17.3565	17.4177	17.3667
	Ours	18.4112	17.8712	17.8874	17.8503	17.8885	17.8385	17.8543	17.9431
	SSIM	1	2	3	4	5	6	7	Average
	Sift	0.5996	0.6023	0.6032	0.6143	0.6112	0.6061	0.6121	0.6070
	Glue	0.6190	0.6171	0.6142	0.6191	0.6198	0.6231	0.6162	0.6184
	Apap	0.5641	0.5738	0.5758	0.5766	0.5794	0.5788	0.5700	0.5741
	Ours	0.6585	0.6500	0.6422	0.6508	0.6465	0.6393	0.6400	0.6468
Park	PSNR	1	2	3	4	5	6	7	Average
	Sift	16.9652	16.8094	17.3118	16.8594	17.2300	16.6963	16.9641	16.9766
	Glue	17.1812	17.1968	17.0388	17.0016	17.0734	16.9707	17.2549	17.1025
	Apap	16.9186	16.5882	17.0288	16.7432	17.0815	16.5520	16.7482	16.8086
	Ours	18.6622	17.2346	17.5151	17.1036	17.447	18.4113	17.3253	17.6714
	SSIM	1	2	3	4	5	6	7	Average
	Sift	0.4442	0.4393	0.4471	0.4405	0.4367	0.4218	0.4195	0.4356
	Glue	0.4583	0.4502	0.4406	0.4466	0.4339	0.4406	0.4374	0.4439
	Apap	0.4351	0.4274	0.4292	0.4251	0.4331	0.4152	0.4153	0.4258
	Ours	0.5038	0.4895	0.4948	0.4817	0.4829	0.4873	0.4645	0.4864

Methods	Parkade				Square				Park
Methods	frame1	frame2	frame3	average	frame1	frame2	frame3	average	frame1	frame2	frame3	average
SIFT	10.6654	1.8457	8.6977	7.0696	24.8739	4.1758	4.3021	11.1173	5.6343	27.6874	5.4976	12.9393
GLUE	6.1411	6.2857	5.9388	6.1218	4.1484	4.1205	4.0485	4.1058	9.3015	8.9362	9.3428	9.1935
Our	5.8832	5.3076	5.4241	5.5383	4.1137	4.1058	4.1065	4.1087	5.8624	5.5113	5.4347	5.6028

Sequence	Location	Image Size (pixel)	Master UAV Height (m)	Slave UAV Height (m)
parkade	Wuhan, Hubei	$1920 \times 1080$	100	50
square	Zhaoqing, Canton	$1920 \times 1080$	300	150
park	Zhaoqing, Canton	$1920 \times 1080$	300	150

Parkade	PSNR	1	2	3	4	5	6	7	Average
	SIFT	21.0092	21.073	21.0167	21.0097	21.9568	21.0082	21.0715	21.0269
	GlUE	21.2072	21.1894	21.1902	21.1482	21.1593	21.2885	21.3298	21.2161
	APAP	20.9993	20.9615	20.9204	20.8971	20.8640	20.9384	20.9475	20.9326
	Ours	21.4918	21.4557	21.4298	21.3783	21.3852	21.5165	21.4676	21.4435
	SSIM	1	2	3	4	5	6	7	Average
	SIFT	0.6643	0.6707	0.6648	0.6590	0.6518	0.6544	0.6570	0.6603
	Glue	0.6715	0.6561	0.6733	0.6592	0.6701	0.6630	0.6655	0.6655
	Apap	0.6494	0.6480	0.6456	0.6388	0.6404	0.6365	0.6335	0.6417
	Ours	0.6754	0.6898	0.6756	0.6805	0.6800	0.6751	0.6755	0.6788
Square	PSNR	1	2	3	4	5	6	7	Average
	Sift	17.5751	17.6765	17.6603	17.7167	17.7548	17.6803	17.7796	17.6919
	Glue	17.7750	17.7723	17.8151	17.7794	17.8354	17.7785	17.8268	17.7975
	Apap	17.1629	17.4063	17.3754	17.3925	17.4555	17.3565	17.4177	17.3667
	Ours	18.4112	17.8712	17.8874	17.8503	17.8885	17.8385	17.8543	17.9431
	SSIM	1	2	3	4	5	6	7	Average
	Sift	0.5996	0.6023	0.6032	0.6143	0.6112	0.6061	0.6121	0.6070
	Glue	0.6190	0.6171	0.6142	0.6191	0.6198	0.6231	0.6162	0.6184
	Apap	0.5641	0.5738	0.5758	0.5766	0.5794	0.5788	0.5700	0.5741
	Ours	0.6585	0.6500	0.6422	0.6508	0.6465	0.6393	0.6400	0.6468
Park	PSNR	1	2	3	4	5	6	7	Average
	Sift	16.9652	16.8094	17.3118	16.8594	17.2300	16.6963	16.9641	16.9766
	Glue	17.1812	17.1968	17.0388	17.0016	17.0734	16.9707	17.2549	17.1025
	Apap	16.9186	16.5882	17.0288	16.7432	17.0815	16.5520	16.7482	16.8086
	Ours	18.6622	17.2346	17.5151	17.1036	17.447	18.4113	17.3253	17.6714
	SSIM	1	2	3	4	5	6	7	Average
	Sift	0.4442	0.4393	0.4471	0.4405	0.4367	0.4218	0.4195	0.4356
	Glue	0.4583	0.4502	0.4406	0.4466	0.4339	0.4406	0.4374	0.4439
	Apap	0.4351	0.4274	0.4292	0.4251	0.4331	0.4152	0.4153	0.4258
	Ours	0.5038	0.4895	0.4948	0.4817	0.4829	0.4873	0.4645	0.4864

See farther and more: a master-slave UAVs based synthetic optical aperture imaging system with wide and dynamic baseline

Abstract

1. Introduction

2. Related works

3. Proposed synthetic optical aperture imaging system

3.1 Overview

3.2 Master-slave UAVs based synthetic optical aperture imaging system with wide and dynamic baseline

3.3 Coarse-to-fine image stitching method for master-slave UAVs

3.4 Image blending method for master-slave UAVs

3.5 Video stabilization method for master-slave UAVs

4. Experiments and discussions

4.1 Experimental environment and data

4.2 Performance on image stitching

4.2.1 Qualitative evaluation

4.2.2 Quantitative evaluation

4.3 Performance on image blending

4.4 Ablation experiment

4.4.1 Performance on image defogging

4.4.2 Performance on two-stage image stitching

4.5 Failure cases and discussion

5. Conclusions

Funding

Acknowledgments

Disclosures

Data availability

References

Supplementary Material (1)

Data availability

Cited By

Figures (9)

Tables (3)

Equations (14)

Optics Express