ASI aurora search: an attempt of intelligent image processing for circular fisheye lens

Xi Yang; Xinbo Gao; Bin Song; Nannan Wang; Dong Yang

doi:10.1364/OE.26.007985

1. Introduction

In our real life, ordinary lens are often used to capture natural images, e.g., wonderful sceneries, delicious food and various portraits. Hence, there are extensive research on ordinary lens for numerous tasks, e.g., image classification, object segmentation and visual reconstruction. Especially, the amazing development of deep neural networks supplies strong boost for the performance improvement in normal images processing.

In practice, there are still a lot of images captured with anamorphic lens, especially the fisheye lens. Because of the ability of capturing broader scenes, fisheye lens are commonly used in video surveillance, teleconference system, wide-angle imaging, motion recording with GoPro, etc. In particular, in the field of natural scientific research, circular fisheye lens play an important role of recording natural phenomena, such as cloud in the sky.

However, the captured images are distributed non-uniformly with spherical deformation, thus increasing the difficulty of automatic processing. Generally, there are two approaches dealing with this problem. One approach is transforming the deformed images with circular vision into normal one with rectangular vision (see Fig. 1 as an example), and then applying state-of-the-art models to conduct image processing. Nevertheless, the interpolation errors (marked with red bounding box in Fig. 1) introduced in the transforming procedure greatly affects the image quality. The other approach is applying simple image processing methods out of fashion, which obviously limits the processing performance. Therefore, how to directly exploit the state-of-the-art models for circular images without transforming and thus achieve intelligent processing is necessary and of significant importance.

Fig. 1 One example of the transformed results. (a) Deformed image with circular vision. (b) Normal image with rectangular vision. The latitude-longitude projection and cylinder compression are applied in the transformation from deformed image to normal image.

Download Full Size | PDF

In this paper, we focus on one kind of images captured with circular fisheye lens, i.e., the aurora images. Aurora is a natural light occurred in high latitude regions, it is caused by the collision of solar wind composed of energetic charged particles with atoms in earth’s magnetosphere. As one of the most important phenomenon reflecting the activities in solar-terrestrial space, numbers facilities are utilized to capture it [1,2]. Because of the advantages of all-weather observation and high resolution, images captured with all-sky-imagers (ASI) [3] installed at Yellow River Station (YRS) are chosen for our study. From the first winner observation in 2003, the number of ASI aurora images has broken 30 million, which absolutely can be treated as a “big data”. To conduct effective processing and understanding from so large a data set, the first and most important step is image search, which aims to select interesting images or choose specific ones based on the research requirement.

Traditional ASI aurora image search are performed with visual observation from researchers themselves. Such manual way inevitably increases the human consumption and is easily contaminated with subjective errors due to visual fatigue. To solve this problem, our previous work proposed a polar embedding (PE) model [4] which exploited hand-crafted features to represent ASI aurora image, thus achieving automatic search without the human participation. However, the search performance is not satisfying due to two reasons. On one hand, PE model directly applies the bag-of-words (BoW) model [5] which is commonly used for normal images for ASI aurora images with spherical deformation. The lack of deep analysis on imaging principle for circular fisheye lens results in undesirable search results. On the other hand, the hand-crafted features of PE model are simple and low in discriminability. The lack of self-training process and visual perception limits the search performance.

To this end, we aim to combine the characteristics of circular fisheye lens with the advanced deep learning methods. First, the structure of ASI in YRS is explored to build the deformed coordinates for aurora image. Together with its magnetic characteristics and the shapes of auroral structures, a deformed region division (DRD) scheme is presented. Then, we select one of the state-of-the-art deep learning methods, called Mask region convolutional neural network (Mask R-CNN) [6], to perform self-training and visual perception. Because Mask R-CNN is designed for normal images, we refine it by using our DRD scheme to replace its region proposals network (RPN). Thus, deformed regions generated from DRD are able to encompass auroral structures with minimum areas, and CNN features can be extracted in multi-scale. Subsequently, we cluster all deformed regions with CNN features to form a visual vocabulary consisting of various visual words (cluster centers), and thus each aurora image composing of deformed regions can be represented as a “bag” of visual words. Afterwards, indexing table is constructed based on each visual word linking the corresponding quantified deformed regions. Finally, given a query image, by measuring the similarity scores between it and all images in the data set, the ranking result is exported.

The main contributions of this paper are summarized as follows. 1) We exploit the advanced CNN model for images captured with circular fisheye lens. We put forward a general idea of how to analyze images captured with distorted lens, especially the circular fisheye lens, with the state-of-the-art deep learning methods, and thus achieves intelligent processing with favorable performance. 2) The DRD is proposed to revise the RPN in the Mask R-CNN model. By considering the imaging principle of circular fisheye lens, the RPN for region proposal detection is revised as DRD, which effectively outputs the locations and shapes of regions with saliency information, thus promoting the search performance. 3) The proposed ASI aurora search model greatly facilitates the physics research. Our model is able to search related images from the big ASI aurora data set, which liberates researchers from burdensome visual observation and is helpful for their further study of solar-terrestrial space.

2. Related work

2.1 Normal image search based on the BoW model

Normal image search is conducted on the framework of bag-of-words (BoW) with hand-crafted features. The hand-crafted features are designed by researchers without any training process, and the BoW model regards each image as a “bag” of hand-crafted features. Generally, there are four modules in the BoW-based normal image search, i.e., feature extraction, feature quantization, indexing construction and post processing.

Feature Extraction. This module extracts hand-crafted features from the whole image, and the extraction is based on pixel points of interest, or called “keypoints”. In practice, keypoints are first detected with different detectors, such as Hessian affine [7]. Then, each keypoint is represented with hand-crafted feature, such as scale-invariant feature transform (SIFT) [8] or local binary patterns (LBP) [9]. In some cases, dense meshing is applied to supply sufficient keypoints (e.g., rectangular meshing or polar meshing) and spatial information is embedded with region division schemes (e.g., spatial pyramid matching (SPM) [10]).

Feature Quantization. This module aims to organize the extracted features in an orderly fashion. Firstly, compact representation is achieved using clustering algorithms, i.e., grouping a set of features in such a way that features in the same group. Commonly used clustering algorithms include hierarchical k-means (HKM) [11] and approximate k-means (AKM) [12]). Therefore, the features are clustered to generate a visual vocabulary composing of k visual words (cluster centers), and thus each feature can be quantized to its nearest visual word (hard quantization) or as a combination of several visual words (soft quantization). More effective models are proposed to improve the compactness of visual words, such as the vector of locally aggregated descriptors [13] which adopts the principle component analysis [14] to conduct dimensionality reduction.

Indexing Construction. This module aims to save the organized features into the memory. To decrease memory consumption, the inverted file [5] is applied. In practice, the information of all quantized features are saved according to the visual vocabulary, i.e., each visual word control one line to link the corresponding keypoints with their information of image ID, term frequency (TF) and other clues. Thus, given a query image, only images in the data sets containing the same visual words need to be visited. The improved methods promote search performance by adding more clues or refining the visual vocabulary and indexing structures, e.g., the Hamming embedding (HE) [15], the joint-index [16], the multi-index [17] and the collaborative-index [18].

Post Processing. Given a query image, the features are extracted and quantized in the same manner, and similarity scores are computed based on the construed indexing table. To filter out false positives and export the most similar images, post processing is performed to re-rank the candidate search result with spatial verification and query expansion. The former checks the geometric consistency of matched features. The latter regards the initial top-ranked outputs as new queries to introduce extra information, and thus achieves a high recall.

2.2 Normal image search based on the CNN model

Although normal image search based on BoW model successfully applies in various data sets, its hand-crafted features limit the search performance due to the lack of self-training process and visual perception.

Recently, the deep learning methods, especially the CNN models [19,20], have attracted increasing attention in various computer tasks. By cascading several convolutional layers, pooling layers and fully-connected layers, a “deep” neural network is built to simulate human cognitive for extracting semantic features and realizing intelligent identification, classification, segmentation, etc.

For the task of normal image search [21], the CNN models such as AlexNet [22], VGG [23], ResNet [24] and ResNeXt [25], are adopted and semantic features (features carrying high level concepts such as “tree” and “house”) are outputted from single or multiple layers. To achieve finer and more precise processing for specific objects in the whole image, schemes to determine region proposals are added into the backbone of CNN models. The straightforward scheme is the region division schemes (e.g., SPM [10]), typical methods comprise the multi-scale orderless pooling (MOP) [26] which pools regional CNN features extracted from regions determined by SPM together, and the probabilistic analysis (PA) [27] which integrates multi-scale regional CNN features with the global CNN feature.

Afterwards, the region proposals determination schemes, e.g., the selective search (SS) [28], are used to select informative regions into the CNN framework, thus resulting in the famous regions with CNN (R-CNN) model [29]. However, determining region proposals outside the backbone may lead to high memory cost and repetitive computation. To solve this problem, a series of improved methods are presented to embed the region proposals determination into the CNN framework. One of the most important methods is the Faster R-CNN [30] which proposes a region proposal network (RPN) to realize synchronous region detection and feature extraction. Other state-of-the-art methods comprise you only look once (YOLO) [31], single shot detector (SSD) [32], and YOLO2 [33]. In addition, how to achieve multi-level feature from different layers with multi-resolution is an important problem. One approach is the maximum activation of convolutions (MAC) model [34] which encodes the maximum local response of the last convolutional filters. The other approach is the feature pyramid network (FPN) [35] which integrates feature maps from adjacent layers without losing their resolutions. Recently, the Mask R-CNN [6] introducing a pixel-level segmentation branch into the Faster R-CNN framework with FPN is presented and achieves amazing performance.

The above CNN based normal image search methods have achieved compelling performance. However, they actually focus on images without spherical aberration. When faces to other images captured with distorted lens, especially the circular images captured with fisheye lens, the search performance is unsatisfying. To solve this problem, a spherical CNN model [36] is proposed to detect local patterns with 3D rotation over the sphere. With the introduction of three-dimensional manifold SO(3), the spherical CNN changes the definition of cross-correlation by replacing filter translations with rotations. Also, a generalized fast Fourier transform (FFT) algorithm is applied to improve its efficiency.

The spherical CNN model is mainly designed for 3D model recognition, lacking specific analysis of imaging principle of circular fisheye lens and magnetic characteristics of aurora may affect its performance. Hence, this paper aims to establish a bridge between the advanced CNN models with the circular fisheye lens by analyzing the characteristics of ASI aurora.

3. Characteristics of ASI aurora

3.1 Structure of the ASI

In the winter of 2003, the YRS of China was built at Ny-Alesund, Spitsbergen Archipelago of Norway [Fig. 2(a)]. The geographic coordinates of YRS is 78.9°N, 11.9°E with corrected magnetic latitude 76.2°, and this special location creates favorable conditions for perennial optical aurora observations in the dayside on Earth. To conduct wide-angle aurora imaging, YRS [Fig. 2(b)] sets up an observation hut [Fig. 2(c)] with three ASI systems (under monochrome wavelengths 427.8 nm (purple), 557.7 nm (green) and 630.0 nm (red), respectively) inside.

Fig. 2 Related images of YRS. (a) Location of YRS. (b) Outdoor scene of YRS. (c) Observation hut of ASI systems in YRS.

Download Full Size | PDF

As illustrated in Fig. 3, each ASI is mainly composed of three parts, i.e., optical lens part, charge coupled device (CCD) part, record and control part. In the optical lens part, Nikon’s ASI-2 lens group is applied which comprises circular fisheye lens, relay lens, filter lens and focusing lens. By inserting different narrow bandpass filters, aurora under different wavelengths can be captured. In the CCD part, Hamamatsu’s C4880-21-24A 14 bit digital camera is chosen with 512 × 512 resolution, and is regulated by a CCD control cabinet. In the record and control part, a computer is linked to control the operation of ASI and record aurora images in real time. There are two kinds of work modes, i.e., normal mode and battle mode. In the normal mode, the exposure time for every frame of image is 7s and the shooting interval is 10s. In the battle mode, joint observation can be performed with the EISCAT Svalbard radar (ESR) for analyzing the kinetic features of aurora in polar gap and auroral hotspot afternoon. To this end, the exposure time and shooting interval are reduced to 3s and 6s, respectively. Here, to synchronize ASIs under the purple, green and red wavelength, a public global position system (GPS) receiver is shared to obtain the time signals. Figure 4 shows examples of aurora images captured with triple-wavelength ASIs, each image is in grayscale with size 512 × 512.

Fig. 3 The structure of ASI composing of three parts.

Download Full Size | PDF

Fig. 4 Examples of aurora images captured with triple-wavelength ASIs.

Download Full Size | PDF

3.2 Imaging principle of the ASI aurora

Aurora particles fall along the magnetic force line and are imaged mainly by the circular fisheye lens based on the information of zenith angle $θ$ and azimuth angle $ϕ$ . For the convenience of polar research, we must project the aurora image into the geomagnetic coordinates. Figure 5 represents the geometrical relationship for ASI aurora observation, where Fig. 5(a) is an ASI aurora image under polar coordinate system and Fig. 5(b) shows the corresponding geometrical relationship of YRS and the sky above.

Fig. 5 Geometrical relationship for ASI aurora observation. (a) ASI aurora image under polar coordinate system. (b) Geometrical relationship of YRS and the sky above.

Download Full Size | PDF

It can be seen that $O$ represents the center of the image (the origin) relating to the observation zenith, and the polar axis directs to magnetic north (M. N.) which exhibits an offset angle 28.9° from the horizontal line. $R$ is the radius of the FOV (180° in our ASI with circular fisheye lens). $Y$ is the location of YRS, $T (r, ϕ)$ is the location of target (the auroral structure), where $r$ is the distance from $O$ and $ϕ$ (azimuth angle) is the offset angle from the polar axis. Here, $r$ is able to reflect the information of zenith angle $θ$ , i.e.,

θ = 90 ° \times r / R,

and

r = R

(

T

lies to the periphery) means that

θ

is 90°. Thus, the solid angle of sky for each pixel is the same in ASI aurora image, but the geocentric angle

β

it refers is different. Their relationship can be derived as

β = π - α^{'} = π - (α^{″} + α) = π - (π - θ + α) = θ - α,

where

α

is the angle between the vision line (

Y T

) and the normal direction of sky (

O T

). According to sine theorem, we can obtain the following equation

\frac{R_{E}}{\sin α} = \frac{R_{E} + h}{\sin (π - θ)} = \frac{R_{E} + h}{\sin (θ)},

where

R_{E}

is the radius of earth (set to 6370km),

h

is the height from YRS to the sky above (e.g., 150km for the green length). By substituting (3) to (2), the geocentric angle

β

can be expressed as

β = θ - α = θ - \sin^{- 1} (\sin α) = θ - \sin^{- 1} [\frac{R_{E}}{R_{E} + h} \sin (θ)] .

While the distance from each pixel to the zenith in physical space $d$ (in radian) can be represented as

d = (R_{E} + h) β .

In practice, along with the increase of $θ$ , the change of the corresponding $β$ becomes larger and larger, and thus the resolution (distance between adjacent pixels) becomes larger and larger. Specifically, when $θ$ changes from 0° to 90°, the resolution changes from 1 km/pixel to 40 km/pixel, indicating the description on aurora is more and more rough that detailed information may be easily ignored.

Therefore, we can conclude that the circular fisheye lens results in non-uniform resolutions in ASI aurora images. This spherical distortion must be taken into consideration for achieving reasonable region division.

3.3 Analysis of the ASI aurora image

As ASI aurora images are distributed with spherical distortion, i.e., the peripheral regions with more deformation than the central regions, the conventional rectangular gridding inevitably introduces serious mistakes. Thus, a new gridding scheme conforming to circular fisheye lens is in urgent need.

As illustrated in Fig. 6, the deformation lines (marked with dashed orange lines) are determined based on the imaging principle of ASI, and are located perpendicular to the magnetic meridian (red dashed line points to M. N.). In practice, physicists always regard the auroral structures (e.g., the “arc” (highlighted with blue solid boxes) or “hotspot” (highlighted with yellow solid box)) as the search objects. An image owns more auroral structures similar to the query image can be exported as search result. By carefully observing the distribution of these auroral structures, we find that they are more likely to locate along the deformation lines.

Fig. 6 Analysis of ASI aurora image.

Download Full Size | PDF

Consequently, a straightforward idea for gridding scheme is to follow the distribution of deformation lines. Furthermore, by introducing the advanced CNN model and BoW framework, ASI aurora images search can be realized.

4. ASI aurora image search

As shown in Fig. 7, our ASI aurora image search consists of two modules, i.e., offline indexing and online search. On the stage of offline indexing, we use the advanced Mask R-CNN model [6] with FPN scheme [35]. Notably, the RPN in Mask R-CNN is trained for normal images and not suitable for ASI aurora images. To this end, a deformed region division (DRD) scheme conforms to the imaging principle of ASI is proposed to replace the RPN. After extracted the CNN features from the deformed regions, the BoW framework is applied with a visual vocabulary generated via AKM clustering [12]. By quantifying each deformed region to the corresponding visual word, an offline indexing can be constructed. On the stage of online search, similar steps are conducted for the query image to obtain quantified CNN features for deformed regions. By measuring the distances between each deformed regions together with the term frequency-inverse document frequency (TF-IDF), similarity scores of images in the indexing table can be calculated with the highest one exported as the top rank. The following contents focus on our key innovations including the DRD layer for feature extraction and BoW-based indexing and querying.

Fig. 7 The diagram of the proposed method.

Download Full Size | PDF

4.1 DRD layer for feature extraction

Based on the characteristics of ASI aurora, our DRD is mainly composed of circular keypoints determination, deformed region detection and multi-scale feature extraction.

Circular Keypoints Determination. To select deformed regions encompassing auroral structures, the first step is to determine the centers of these regions, or called “keypoints” in image search. Different from the traditional rectangular gridding, we follow the imaging principle of circular fisheye lens and determine keypoints based on the deformation lines. In practice, both radial deformation lines and equatorial deformation lines are taken into consideration and their intersections are regarded as our circular keypoints (marked with red dots in Fig. 6). Thus, our circular keypoints not only conform to the location of auroral structures, but able to reflect the geomagnetic distribution which would facilitate further physical research as well.

Deformed Region Detection. Centered at the circular keypoints, candidate bounding boxes are generated in multi-scale with different length-width ratios. By carefully observing the characteristics of ASI aurora, the directions of our candidate bounding boxes are not horizontal as conventional methods, but along the deformation lines perpendicular to magnetic meridian (see Fig. 6). This special design makes them capable of enclosing auroral structures with minimum area. Then, region priors are chosen based on statistical analysis from training data. The training data is processed by polar experts with auroral structures labeled with ground-truth bounding boxes. By changing the cluster number, K-means clustering results in different detection precision. Here, the precision is measured by the score of intersection over union (IoU), i.e.,

IoU = \frac{\cap (A_{g}, A_{p})}{\cup (A_{g}, A_{p})},

where

\cap (A_{g}, A_{p})

and

\cup (A_{g}, A_{p})

are the overlap area and union area between ground-truth bounding box and the predicted bounding box, respectively. Figure 8(a) shows example image of how to calculate IoU score, while Fig. 8(b) presents the average IoU scores for different K. It can be seen that along with the increase of K, the average IoU score first improves rapidly and then keep a stable value when arrive at 6. Considering that high value of K definitely leads to heavy computational complexity, we finally set K to 6 for balancing the precision and complexity, and the corresponding 6 region priors are presented in Fig. 8(c). Afterwards, initialized with these priors, region detection is conducted to obtain promising deformed regions based on the objectness of each bounding boxes.

Fig. 8 Deformed region detection. (a) IoU score calculation. (b) Changes of average IoU score with the increase of clusters number. (c) Our 6 region priors.

Download Full Size | PDF

Multi-Scale Feature Extraction. After obtained the deformed regions, feature extraction is performed with FPN to achieve multi-scale representation. Remarkably, our DRD layer is embedded into the Mask R-CNN framework, and thus repetitive computation for regions in one image is avoided. In practice, the location of each deformed region is projected to the corresponding feature maps which are actually the outputs of convolutional layers with a RoIAlign processing. Then, feature maps are linked to FPN and max-pooling (uses the maximum value from each of a cluster of neurons at the prior layer) is conducted for each pyramid layer to generate a 256-dimensional vector $P_{i} (i = 2, \dots, 5)$ . For simplicity, we directly cascade them to form a CNN feature $f = {P_{i}} = {P_{2}, P_{3}, P_{4}, P_{5}}$ . Thanks to the fusion of different layers with FPN and multi-scale shapes of deformed regions, the CNN feature we extracted covers multi-level sematic information.

4.2 BoW-based indexing and querying

After obtained all deformed regions with corresponding CNN features, BoW is applied to accomplish image representation for ASI data set with D images. Generally, each image is regarded as a “bag” of $l_{d}$ deformed regions detected from our DRD layer. $l_{d}$ changes with the number of auroral structures in the dth image.

On the stage of offline indexing, all CNN features from the ASI data set are gathered and clustered via AKM scheme [12]. With each cluster center treated as a visual “word”, a visual vocabulary composing of $K_{w}$ (set to 20K in this paper) visual words is constructed, and each deformed region can be quantified to its nearest cluster center. Hereafter, the indexing table $W = {W_{1}, \dots, W_{K_{w}}}$ can be built where each entry $W_{i}$ relates to one visual word. Here, $W_{i}$ consists of all deformed regions which are quantified to the ith visual word, in which the information of its image ID and CNN feature are saved.

On the stage of online querying, we follow the same steps for extracting CNN features of all deformed regions in the query image. Then, feature matching is conducted with the following function

m (f_{q}, f_{j}) = {\begin{array}{l} \exp (\frac{- e^{2}}{σ^{2}}), & i f q (f_{q}) = q (f_{j}), e < T \\ 0, & o t h e r w i s e \end{array},

where

f_{q}

and

f_{j}

are the CNN feature of specific deformed region from query image and indexed image, respectively.

e = E u c l i d e a n (f_{q}, f_{j})

is the Euclidean distance between

f_{q}

and

f_{j}

,

T

is an empirical threshold, and

\exp (- e^{2} / σ^{2})

is the weight measuring the matching degree with a parameter

σ

. The smaller the Euclidean distance, the higher value of the matching function.

After scanning all deformed regions, similarity score between the query image $Q$ and a data set image $I$ can be computed, i.e.,

S S (Q, I) = \frac{\sum_{f_{q} \in Q, f_{j} \in I} m (f_{q}, f_{j}) \cdot i d f^{2}}{\sqrt{{‖ I ‖}_{2}}} .

Noting that the TF-IDF scheme in BoW model is considered where $i d f = {D / D}_{v}$ is the IDF represented as the radio of $D$ (total number of images in the data set) over $D_{v}$ (the number of images containing specific visual word). ${‖ I ‖}_{2} = {(\sum_{i = 1}^{K_{w}} t_{i}^{2})}^{1 / 2}$ is the $l_{2}$ -norm of the ith visual word with the corresponding TF represented as $t_{i}$ . The higher the similarity score, the topper the rank of the image with the highest value refers to the most similar image.

5. Experiments and discussion

To prove the effectiveness of the proposed method, we compare the state-of-the-art image search methods including the BoW (baseline) [5], PE [4], MAC [34], MOP [26], PA [27], R-CNN [29], Faster R-CNN [30], Mask R-CNN [6] and spherical CNN [36]. All experiments are conducted on a computer with two Titan X GPUs and 128 GB of RAM.

The implementation settings of all comparison methods follow the corresponding papers, and the details are given in Table 1. The “type” column indicates whether the method is SIFT-based or CNN-based, the “architecture” column supplies the used network form with the number of layers. The fourth to the sixth columns represent the number of parameters, the region analysis scheme and the multi-scale scheme each method exploits, respectively. The “meshing scheme” column indicates whether the method considers the image deformation (polar, spherical or circular) or just treats test data as normal images (rectangular). In practice, all backbones of CNNs are trained with the stochastic gradient descent (SGD) scheme while the branch networks (such as RPN, FPN and DRD) are optimized with the back propagation (BP) scheme. In the CNN training procedure, the hyper-parameters are configured as common settings, i.e., momentum is set to 0.9, weights in each layer are initialized as Gaussian distribution with mean value as 0 and standard deviation as 0.01, weight decay is set to 0.0005, learning rate is set to 0.001 at the beginning and reduced to 0.0001 when there is small error rates gap between adjacent iterations. Additionally, to avoid interpolation errors, the inputs images for all methods are the original circular vision ones without any rectangular transformation.

Table 1. Implementation details of all comparison methods.

View Table | View all tables in this article

To analyze the change of search performance with the increase of size for data sets, we build five typical ASI aurora image data sets by filtering out uninformative data, i.e., ASI8K (the query image data set with 10 categories where each one has 800 images), ASI14K (the first overwintering observation data set captured from 19 days during December 2003 to January 2004), ASI100K (the one month data set of January 2005), ASI500K (the one year data set of 2006) and ASI1M (the large-scale data set in recent years to evaluate the scalability of image search methods). Noting that to focus on the contribution of our method for aurora content representation and search, images under bad weather, with heavy clouds and light pollution or contain little texture information are removed.

The search performance is measured by accuracy and efficiency which are evaluated by the mean average precision (mAP) and average query time, respectively. Additionally, sample results on aurora vortex search are presented to give a visualized comparison on search performance.

5.1 Quantitative comparison with the state-of-the-art methods

The quantitative comparison results of accuracy and efficiency are shown in Table 2, where the second to the sixth columns illustrate mAPs (%) using different methods for ASI aurora data sets with increasing sizes, and the last two columns present average query times (s) using different methods for ASI1M data set. Accordingly, we can summarize several conclusions as follows.

Table 2. Comparison of mAPs (%) and average query times (s) using different methods.

View Table | View all tables in this article

Accuracy. 1) The BoW and PE model apply the traditional hand-crafted features, lacking the self-training process leads to poor performance on accuracy. Specially, the PE surpasses the BoW because of the consideration of polar meshing scheme. 2) Compared with hand-crafted feature based methods, the CNN based methods show obvious advantages on mAP, demonstrating the strong representational capacity of CNN feature. 3) Among the CNN based methods, the MAC cascades all levels of convolutional layers from the global image. Although it uses multi-level information, the single scale processing results in inferior performance. 4) In contrast, the MOP and PA improve the mAP by introducing the CNN features extracted from regional patches in an image. 5) By combining the multi-level and multi-scale CNN feature together with the FPN scheme based on the R-CNN framework, the Faster R-CNN promotes the improvement of search accuracy. Furthermore, applying the recent advanced Mask R-CNN framework achieves even higher mAP. 6) Benefit from three-dimensional manifold and new definition of cross-correlation, the spherical CNN gets comparable performance with the Mask R-CNN. However, the spherical CNN applies the global network architecture which lacks the regional analysis layer, and thus weakens the power for region representation. Also, no consideration of multi-scale structure such as FPN and characteristics of ASI aurora limit its further improvement. 7) Remarkably, our method gets the highest mAP and keeps this tendency in all data sets with increasing sizes, demonstrating its effectiveness and stability. Especially, for the large-scale ASI1M data set, we achieve 67.66% mAP. This value is far ahead of other methods and thus proves the feasibility for practical application of assisting polar physicist with even bigger ASI aurora data.

Efficiency. On the stage of online search, there are mainly two processing modules composing the query time, i.e., feature extraction and ranking computation (separately presented in Table 2). It can be seen that the MOP consumes the most time in feature extraction because of its complex region division. On the contrary, the Faster R-CNN, Mask R-CNN and the proposed method save a lot of time, revealing that the R-CNN framework with region proposal network performs favorable in efficiency. For the ranking computation, MOP, PA, R-CNN and MAC perform poorly due to their linear scanning querying scheme. Thanks to the introduction of BoW-indexing scheme and the compact CNN feature, our method is the most time saving method.

5.2 Sample results-aurora vortex search

Due to the property of linking closely with the magnetic reconnection evens, the “vortex” is one of the most important auroral structures polar physicists curious about. Figure 9 illustrates example searching results for aurora vortex using different methods. Since the top 15 images of all comparison methods are true positives, only images ranking from 16 to 23 are shown. Note that the vortex structures in true positives are marked with orange solid boxes while the false positives are highlighted with red solid boxes.

Fig. 9 Sample results of aurora vortex search using different methods.

Download Full Size | PDF

It can be seen that the sample results on aurora vortex search broadly conform to the conclusion in quantitative accuracy evaluation. Compared with other methods recommending false positives, the search results of our method in top 23 methods are all true positives. This remarkable result further demonstrates the effectiveness of our method on ASI aurora search.

5.3 Discussion

Both the quantitative evaluation and sample illustration prove that our method greatly promotes the search accuracy. This improvement is benefits from two aspects, i.e., the design of the DRD layer and the introduction of BoW-based indexing. On one hand, the proposed DRD layer is designed based on the characteristics of ASI aurora, which ensures the determined region proposals accord with the principle of circular fisheye lens, thus strengthening the representative ability of CNN feature. On the other hand, the BoW-based indexing achieves compact and organized feature quantization, and thereby suppresses redundant feature computation which may hinder the online search. Notably, our method also achieves preferable efficiency compared with the state-of-the-art methods. This superiority mostly ascribes to the BoW-based querying which avoids the repetitive computation.

6. Conclusion

This paper is an attempt of intelligent image processing for circular fisheye lens, and we focus on one special application, i.e., ASI aurora image search. Our overall idea is to apply the advanced natural image processing model, i.e., CNN, with consideration of the imaging principle of circular fisheye lens. Based on this idea, we first make an in-depth analysis on the characteristics of ASI aurora, e.g., the structure of the ASI, the imaging principle of ASI aurora and the features of ASI aurora image. Subsequently, the state-of-the-art CNN model, Mask R-CNN is exploited as our framework. To accord with the circular fisheye lens, a DRD scheme is proposed to replace the traditional RPN in the original Mask R-CNN. Together with the FPN scheme, multi-scale regional CNN features can be extracted. Afterwards, to achieve compact and orderly organization, the BoW model is applied to describe an aurora image as a “bag” of deformed regions represented with CNN features. On the stage of offline indexing, a visual vocabulary is generated by AKM clustering while each deformed region is quantified to its nearest center (visual word) with the corresponding CNN feature being saved. On the stage of online querying, similarity scores are computed and the higher the value, the topper the rank of the image. We conduct experiments on both quantitative evaluation and sample illustration, and the results demonstrate that our method not only promotes the search accuracy but improves the search efficiency as well.

In the future, we will transfer our method to other images captured with anamorphic lens on various tasks, e.g., image classification, object recognition and event detection. We believe that the fusion of artificial intelligence with optics theory will create surprise innovations on both fields.

Funding

National Natural Science Foundation of China (61602355, 61432014, 61772402 and 61671339); National Key Research and Development Program of China (2016QY01W0200); China Post-Doctoral Science Foundation (2016M590926); Shaanxi Province Natural Science Foundation (2017JQ6007 and 2017JM6085); Shaanxi Province Post-Doctoral Science Foundation.

References and links

1. F. Sigernes, J. M. Holmes, M. Dyrland, D. A. Lorentzen, T. Svenøe, K. Heia, T. Aso, S. Chernouss, and C. S. Deehr, “Sensitivity calibration of digital colour cameras for auroral imaging,” Opt. Express 16(20), 15623–15632 (2008). [CrossRef] [PubMed]

2. C. Goenka, J. Semeter, J. Noto, J. Baumgardner, J. Riccobono, M. Migliozzi, H. Dahlgren, R. Marshall, S. Kapali, M. Hirsch, D. Hampton, and H. Akbari, “LiCHI - Liquid Crystal Hyperspectral Imager for simultaneous multispectral imaging in aeronomy,” Opt. Express 23(14), 17772–17782 (2015). [CrossRef] [PubMed]

3. S. B. Mende, R. H. Eather, and E. K. Aamodt, “Instrument for the monochromatic observation of all sky auroral images,” Appl. Opt. 16(6), 1691–1700 (1977). [CrossRef] [PubMed]

4. X. Yang, X. Gao, and Q. Tian, “Polar embedding for aurora image retrieval,” IEEE Trans. Image Process. 24(11), 3332–3344 (2015). [CrossRef] [PubMed]

5. J. Sivic and A. Zisserman, “Video google: A text retrieval approach to object matching in videos,” in Proc. IEEE Int. Conf. Comput. Vis. (2003), pp. 1470–1477.

6. K. He, G. Gkioxari, P. Doll’ar, and R. Girshick, “Mask R-CNN,” in Proc. IEEE Int. Conf. Comput. Vis. (2017), pp. 2980–2988.

7. K. Mikolajczyk and C. Schmid, “Scale & affine invariant interest point detectors,” Int. J. Comput. Vis. 60(1), 63–86 (2004). [CrossRef]

8. L. Juan and O. Gwun, “A comparison of sift, pca-sift and surf,” Int. J. Image Process. 3(4), 143–152 (2009).

9. T. Ojala, M. Pietikainen, and T. Maenpaa, “Multiresolution gray-scale and rotation invariant texture classification with local binary patterns,” IEEE Trans. Pattern Anal. Mach. Intell. 24(7), 971–987 (2002). [CrossRef]

10. S. Lazebnik, C. Schmid, and J. Ponce, “Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (2006), pp. 2169–2178. [CrossRef]

11. D. Nister and H. Stewenius, “Scalable recognition with a vocabulary tree,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (2006), 2161–2168.

12. J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman, ” Object retrieval with large vocabularies and fast spatial matching,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (2007), pp. 1–8. [CrossRef]

13. H. Jegou, M. Douze, C. Schmid, and P. Pérez, “Aggregating local descriptors into a compact image representation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (2010), pp. 3304–3311. [CrossRef]

14. S. Wold, K. Esbensen, and P. Geladi, “Principal component analysis,” Chemom. Intell. Lab. Syst. 2(1–3), 37–52 (1987). [CrossRef]

15. H. Jegou, M. Douze, and C. Schmid, “Hamming embedding and weak geometric consistency for large scale image search,” in Proc. Eur. Conf. Comput. Vis. (2008), pp. 304–317. [CrossRef]

16. Y. Xia, K. He, F. Wen, and J. Sun, “Joint inverted indexing,” in Proc. IEEE Int. Conf. Comput. Vis. (2013), pp. 3416–3423.

17. L. Zheng, S. Wang, W. Zhou, and Q. Tian, “Bayes merging of multiple vocabularies for scalable image retrieval,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (2014), pp. 1963–1970. [CrossRef]

18. W. Zhou, H. Li, J. Sun, and Q. Tian, “Collaborative index embedding for image retrieval,” IEEE Trans. Patt. Anal. Mach. Intell. 99, 1 (2017). [CrossRef]

19. L. Zheng, Y. Yang, and Q. Tian, “SIFT meets CNN: A decade survey of instance retrieval,” IEEE Trans. Patt. Anal. Mach. Intell. 99, 1 (2017). [CrossRef]

20. T. Cohen, M. Geiger, and M. Welling, “Convolutional networks for spherical signals,” arXiv preprint arXiv:1709.04893 (2017)

21. M. Wu, T. Leng, L. de Sisternes, D. L. Rubin, and Q. Chen, “Automated segmentation of optic disc in SD-OCT images and cup-to-disc ratios quantification by patch searching-based neural canal opening detection,” Opt. Express 23(24), 31216–31229 (2015). [CrossRef] [PubMed]

22. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Adv. Neural Inform. Process. Syst. (2012), pp. 1097–1105.

23. K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in Proc. Int. Conf. Learn. Represent. (2015).

24. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (2016), pp. 770–778.

25. S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He, “Aggregated residual transformations for deep neural networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (2017), pp. 5987–5995. [CrossRef]

26. Y. Gong, L. Wang, R. Guo, and S. Lazebnik, “Multi-scale orderless pooling of deep convolutional activation features,” in Proc. Eur. Conf. Comput. Vis. (2014), pp. 392–407. [CrossRef]

27. L. Zheng, S. Wang, J. Wang, and Q. Tian, “Accurate image search with multi-scale contextual evidences,” Int. J. Comput. Vis. 120(1), 1–13 (2016). [CrossRef]

28. J. R. Uijlings, K. E. Van De Sande, T. Gevers, and A. W. Smeulders, “Selective search for object recognition,” Int. J. Comput. Vis. 104(2), 154–171 (2013). [CrossRef]

29. R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (2014), pp. 580–587. [CrossRef]

30. S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: Towards real-time object detection with region proposal networks,” IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2017). [CrossRef] [PubMed]

31. J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (2016), pp. 779–788. [CrossRef]

32. W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Y. Fu, and A. C. Berg, “SSD: Single shot multibox detector,” in Proc. Eur. Conf. Comput. Vis. (2016), pp. 21–37.

33. J. Redmon and A. Farhadi, “YOLO9000: better, faster, stronger,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (2017), 6517–6525.

34. G. Tolias, R. Sicre, and H. Jegou, “Particular object retrieval with integral maxpooling of CNN activations,” in Proc. Int. Conf. Learn. Represent. (2016), pp. 1–12.

35. T. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature pyramid networks for object detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (2017), pp. 936–944.

36. T. S. Cohen, M. Geiger, J. Koehler, and M. Welling, “Spherical CNNs,” in Proc. Int. Conf. Learn. Represent. (2018).

Methods	Type	Architecture (Network-layers)	Parameters (M)	Region analysis	Multi-scale	Meshing scheme
BoW (baseline)	SIFT	\	<0.001	SIFT 8]	\	Rectangular
PE	SIFT	\	<0.001	SIFT [8]&LBP [9]	\	Polar
MAC	CNN	AlexNet-8 [22]	~50	\	\	Rectangular
MOP	CNN	AlexNet-8 [22]	~50	SPM [10]	\	Rectangular
PA	CNN	AlexNet-8 [22]	~50	SPM [10]	\	Rectangular
R-CNN	CNN	AlexNet-8 [22]	~50	SS [28]	\	Rectangular
Faster R-CNN	CNN	VGG-16 [23]	~138	RPN [30]	FPN [35]	Rectangular
Mask R-CNN	CNN	ResNeXt-101 [25]	~0.45	RPN [30]	FPN [35]	Rectangular
Spherical CNN	CNN	ResNet-50 [23]	~1.40	\	\	Spherical
Our	CNN	ResNeXt-101 [25]	~0.45	DRD	FPN [ 35 ]	Circular

Methods	ASI8K	ASI14K	ASI100K	ASI500K	ASI1M	Time (feature extraction)	Time (ranking computation)
BoW (baseline)	49.01	47.65	46.65	42.32	39.09	1.21	0.77
PE	62.88	61.09	60.35	58.31	57.99	0.29	0.53
MAC	65.58	64.02	61.58	59.12	58.87	0.34	0.88
MOP	67.12	66.49	65.12	64.58	63.22	1.21	1.11
PA	68.00	67.15	66.87	65.41	64.98	0.91	0.85
R-CNN	66.54	65.42	64.85	63.52	62.12	0.38	0.87
Faster R-CNN	67.89	66.87	66.01	65.45	64.98	0.29	0.33
Mask R-CNN	69.01	68.47	67.95	66.24	64.12	0.31	0.31
Spherical CNN	70.24	68.54	67.65	65.98	64.25	0.56	0.54
Our	71.42	70.88	69.87	68.85	67.66	0.30	0.21

Methods	Type	Architecture (Network-layers)	Parameters (M)	Region analysis	Multi-scale	Meshing scheme
BoW (baseline)	SIFT	\	<0.001	SIFT 8]	\	Rectangular
PE	SIFT	\	<0.001	SIFT [8]&LBP [9]	\	Polar
MAC	CNN	AlexNet-8 [22]	~50	\	\	Rectangular
MOP	CNN	AlexNet-8 [22]	~50	SPM [10]	\	Rectangular
PA	CNN	AlexNet-8 [22]	~50	SPM [10]	\	Rectangular
R-CNN	CNN	AlexNet-8 [22]	~50	SS [28]	\	Rectangular
Faster R-CNN	CNN	VGG-16 [23]	~138	RPN [30]	FPN [35]	Rectangular
Mask R-CNN	CNN	ResNeXt-101 [25]	~0.45	RPN [30]	FPN [35]	Rectangular
Spherical CNN	CNN	ResNet-50 [23]	~1.40	\	\	Spherical
Our	CNN	ResNeXt-101 [25]	~0.45	DRD	FPN [ 35 ]	Circular

Methods	ASI8K	ASI14K	ASI100K	ASI500K	ASI1M	Time (feature extraction)	Time (ranking computation)
BoW (baseline)	49.01	47.65	46.65	42.32	39.09	1.21	0.77
PE	62.88	61.09	60.35	58.31	57.99	0.29	0.53
MAC	65.58	64.02	61.58	59.12	58.87	0.34	0.88
MOP	67.12	66.49	65.12	64.58	63.22	1.21	1.11
PA	68.00	67.15	66.87	65.41	64.98	0.91	0.85
R-CNN	66.54	65.42	64.85	63.52	62.12	0.38	0.87
Faster R-CNN	67.89	66.87	66.01	65.45	64.98	0.29	0.33
Mask R-CNN	69.01	68.47	67.95	66.24	64.12	0.31	0.31
Spherical CNN	70.24	68.54	67.65	65.98	64.25	0.56	0.54
Our	71.42	70.88	69.87	68.85	67.66	0.30	0.21

ASI aurora search: an attempt of intelligent image processing for circular fisheye lens

Abstract

1. Introduction

2. Related work

2.1 Normal image search based on the BoW model

2.2 Normal image search based on the CNN model

3. Characteristics of ASI aurora

3.1 Structure of the ASI

3.2 Imaging principle of the ASI aurora

3.3 Analysis of the ASI aurora image

4. ASI aurora image search

4.1 DRD layer for feature extraction

4.2 BoW-based indexing and querying

5. Experiments and discussion

5.1 Quantitative comparison with the state-of-the-art methods

5.2 Sample results-aurora vortex search

5.3 Discussion

6. Conclusion

Funding

References and links

Cited By

Figures (9)

Tables (2)

Equations (8)

Optics Express