
Robust optical axis control of monocular active gazing based on pan-tilt mirrors for high dynamic targets

Open Access

Abstract

Real-time and stability performance are both crucial for an active vision system (AVS) gazing at high dynamic targets (HDTs). This study focused on the robust optical axis control mechanism of a monocular AVS based on pan-tilt mirrors. We proposed an adaptive self-window to keep the HDTs within the region of interest. Minimum-envelope-ellipse and unscented-Kalman-filter methods were proposed to compensate for and predict the angle of the optical axis when the HDTs were blocked. The static and dynamic compensation error rates were less than 1.46% and 2.71%, respectively, and the prediction error rate was less than 13.88%, improving the gazing stability while ensuring real-time performance.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

As one of the critical technologies of non-intrusive dynamic analysis systems, an active vision system (AVS) dynamically gazes at targets by controlling its optical parameters and the camera's viewpoint, so as to obtain real-time images centered on the targets [1]. When a target is subjected to transient impact during high dynamic movement, an AVS can capture its trajectory and visual information. AVS is particularly important for coping with occlusions, a limited field of view, and the limited resolution of the camera [2,3]. Visual attention and active control of the camera viewpoint help focus computational resources on the relevant elements of the scene [4].

High dynamic targets (HDTs) in the real world exhibit highly nonlinear trajectories and experience occlusion from the external environment. Owing to its inherently large inertia, it is difficult for a traditional AVS to change its gaze direction promptly, which constitutes a significant technical bottleneck [5]. A practical AVS for HDTs must consider real-time and stability performance simultaneously.

In a previous study, by controlling the angles of pan-tilt mirrors, an optical high-speed gaze controller comprising a static high-speed camera and a pupil-shift optical design achieved millisecond pan-tilt performance and could rapidly redirect its gaze, thanks to the small inertia of the scanning motion of the pan-tilt mirrors [6]. Moreover, high-speed gazing algorithms, including parallel processing architectures, the self-window [7,8], and template matching, decreased the computational complexity and improved real-time performance [9–11]. Pan-tilt-mirror-based AVS (PTM-AVS) overcomes the insufficient high-speed dynamic response of stacked-stage-based AVS and the low resolution of multi-camera-based AVS when observing movement details [12]. However, for HDTs subjected to transient impact in the real world, it is difficult to obtain real-time images centered on the HDTs, and gazing may even fail entirely. This is the bottleneck of high-speed active vision. Two main aspects affect the gazing stability of PTM-AVS:

  • 1. Highly nonlinear trajectory of HDTs: when gazing at HDTs, the acceleration changes dramatically, leading to unpredictable changes in the trajectory. Owing to time-delay and inertia, the PTM-AVS cannot control the optical axis so as to maintain the HDTs within the self-window region, resulting in gazing failure [13]. The PTM-AVS usually resets the viewing angle or adopts a larger-scale fixed self-window to recover the target [9]. Nevertheless, these methods deteriorate the real-time performance.
  • 2. Occlusion from the external environment: HDTs are partially or even completely occluded, causing the optical axis to deviate from the actual position and possibly causing the HDTs to be lost [14]. Traditional compensation and prediction methods calculate the Jacobian matrix of the nonlinear trajectory [15], and the linear approximation obtained through a first-order Taylor expansion struggles to guarantee both real-time performance and accuracy.

Therefore, to solve the above problems when gazing at HDTs, this study focuses on the robust optical axis control mechanism of a monocular PTM-AVS. In detail, an adaptive self-window was proposed to keep the HDTs within the region of interest (ROI). Minimum-envelope-ellipse and unscented-Kalman-filter methods were proposed to compensate for and predict the angle of the optical axis when the HDTs were blocked, thereby improving the gazing stability for HDTs while ensuring real-time performance.

This paper is organized as follows: Section 2 reviews the development and drawbacks of AVS, analyzes trajectory prediction methods for HDTs, and then introduces the principle of the PTM-AVS. Section 3 establishes the adaptive self-window mechanism. Section 4 presents the minimum-envelope-ellipse (MEE)-based optical axis compensation method under partial occlusion. Section 5 proposes the unscented-Kalman-filter (UKF)-based optical axis prediction method under complete occlusion. Section 6 analyzes the results for frame rate and self-window scale and verifies the effectiveness of the proposed mechanism on the stability of the PTM-AVS. Section 7 concludes the paper.

2. Related works and motivation

2.1 Related works

To realize an AVS, there are mainly three mechanisms, as follows:

  • 1. Stacked-stage-based AVS. The camera is directly attached to stacked mechanical stages, which drive the camera to adjust the viewing angle. Because of its broad field of view (FOV) and insensitivity to optical conditions, mechanical AVS is widely applied in robot vision and surveillance cameras [16]. However, owing to the inherently large inertia of the camera and stacked stages, the response time of stacked-stage-based mechanical AVS is generally greater than 25 ms, and the tracking cutoff frequency is less than 50 Hz [17]. It is therefore difficult to match the real-time performance of a high-speed camera's capture rate and the efficiency of image processing algorithms [18–20].
  • 2. Multi-camera-based AVS. Multiple static high-speed cameras cover a large area and capture the target's kinematic data from different viewing angles; the computer calibrates the images and then generates a three-dimensional virtual image. This approach is usually applied in TV broadcasting [11]. It can be highly effective but is also costly: multiple cameras are required, and they must communicate with each other, which is computationally expensive. Owing to the limitations of computing resources, this approach cannot achieve a broad FOV and high resolution simultaneously [21,22]. In actual application scenarios, to observe the details of moving targets more clearly, local resolution is improved by sacrificing the overall viewing angle.
  • 3. Pan-tilt-mirror-based AVS. Previous research [6] proposed a high-speed AVS, named the Saccade Mirror, which includes a static camera, a pupil-shift optical design, and a set of pan-tilt mirrors. The system achieved millisecond auto pan-tilt performance with visual feedback controlling the angles of the pan-tilt mirrors. The camera is fixed while the optical path is quickly adjusted by panning and tilting the mirrors, thereby changing the viewing angle [5,23]. Because the only rotating parts are two small mirrors, the inertia of this subsystem is considerably reduced. With the help of high-speed visual servo hardware, efficient image processing algorithms, parallel computing, and the self-window [5,6], this approach decreases the computational complexity of image processing while improving real-time performance; the response time can be less than 3 ms [24–26]. Table 1 compares the three schemes in terms of real-time performance and environmental adaptability.

Table 1. Comparison of the three schemes of AVS.

When HDTs are occluded, it is necessary to predict the true trajectory of the target and then drive the pan-tilt mirrors to the proper angles. The Kalman filter is an optimal linear state estimator for high-speed linear trajectories, but its predictions for nonlinear trajectories are not ideal. The extended Kalman filter (EKF) obtains a linear approximation by discarding the high-order terms of a first-order Taylor expansion; the EKF-based predictions for a highly nonlinear trajectory are therefore inaccurate, and evaluating the Jacobian matrix consumes considerable computational resources [27]. In addition, when the EKF reaches a stationary state, it loses the ability to track a steep trajectory, and filter divergence occurs if the error propagation function cannot be approximated by a linear function [28]. The UKF utilizes the unscented transformation, which applies the Kalman filter, derived under a linear assumption, to a nonlinear system by approximating the probability density distribution of the nonlinear function rather than the nonlinear function itself [29]. Through sampling, a new Gaussian distribution is used to approximate the distribution after the nonlinear transformation, so the UKF achieves higher estimation accuracy than the EKF.
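
To make this contrast concrete, the following minimal Python sketch (ours, not from the paper; the nonlinearity and parameters are illustrative) propagates a Gaussian through a nonlinear map and compares an EKF-style linearized mean and a UKF-style sigma-point mean against a Monte Carlo reference:

```python
import numpy as np

# Illustrative sketch: push N(mu, sigma^2) through a nonlinear map f and
# compare an EKF-style first-order mean with a UKF-style unscented mean.
rng = np.random.default_rng(0)
f = lambda x: np.sin(x) + 0.5 * x**2              # example nonlinearity
mu, sigma = 1.0, 0.8

mean_ekf = f(mu)                                  # first-order Taylor: f(mu)

n, alpha, kappa = 1, 1.0, 0.0                     # sigma-point parameters
lam = alpha**2 * (n + kappa) - n                  # scale factor (= 0 here)
spread = np.sqrt((n + lam) * sigma**2)
pts = np.array([mu, mu + spread, mu - spread])    # 2n+1 = 3 sigma points
wm = np.array([lam / (n + lam), 0.5 / (n + lam), 0.5 / (n + lam)])
mean_ukf = wm @ f(pts)                            # unscented mean estimate

mean_mc = f(rng.normal(mu, sigma, 200_000)).mean()  # Monte Carlo reference
print(mean_ekf, mean_ukf, mean_mc)  # the unscented mean lies closer to MC
```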

2.2 Research motivation

PTM-AVS consists of the following three modules, as shown in Fig. 1:

  • 1. High-speed image processing: a computer and a high-speed camera. The camera captures images of the target, while the computer runs image processing algorithms such as feature extraction, self-window, and trajectory prediction, and then generates the control commands to drive the mirrors.
  • 2. Pupil shift: several customized optical lenses form the pupil-shift system, which transfers the camera pupil into the proximity of the two mirrors. The pupil shift provides an extended angle of view even with small mirrors.
  • 3. Optical path adjustment: two mirrors for the X and Y axes; the command signals for the mirrors are computed by the computer and sent to the motor drivers. When the mirrors rotate, the optical gazing direction changes so that the view of the target can be centered.

Fig. 1. Schematic diagram of optical axis control of monocular active gazing based on pan-tilt mirrors for HDTs. It consists of three modules: high-speed image processing, pupil shift, and optical path adjustment.

In the PTM-AVS, the camera itself is static, and an external optical path adjustment subsystem is used for optical gaze control, giving the system reduced inertia and outstanding high-speed performance. An arbitrary gaze direction within a ±30° optical stroke for both pan and tilt can be achieved in less than 3.5 ms. Once the target is detected in the captured image, the features of the target image are extracted. The control goal of the PTM-AVS is to make the centroid of the target image (TIC) coincide with the center of the captured image (CIC). Once the target moves, the TIC deviates from the CIC; the system calculates the distance between the two, adjusts the mirror angles, and thereby changes the optical axis so that the TIC is located at the CIC again in the new field of view. However, when gazing at HDTs, owing to their rapid acceleration changes and the system response time-delay, the PTM-AVS cannot keep the HDTs within the fixed self-window region, resulting in the loss of the target's kinematic information [9]. To recover the target, methods such as resetting the viewing angle to the default value or adopting a larger-scale fixed self-window are used, but these methods deteriorate the real-time performance.
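
For illustration, one cycle of this feedback loop might look like the sketch below; the PID gains and the linear voltage-to-angle map standing in for $Lp(\cdot)$ are hypothetical placeholders, not the calibrated values of the actual hardware:

```python
import numpy as np

# Hedged sketch of one gaze-servo cycle: the bias between TIC and CIC drives
# an incremental PID (the feedback function phi of Eq. (10)), whose output
# voltage is mapped to mirror-angle increments (Lp of Eq. (11)).
def gaze_step(tic, cic, state, kp=0.6, ki=0.05, kd=0.1, rad_per_volt=0.002):
    bias = np.asarray(tic, float) - np.asarray(cic, float)  # pixels, (x, y)
    state["i"] = state["i"] + bias                          # integral term
    d = bias - state["e"]                                   # derivative term
    state["e"] = bias
    delta_v = kp * bias + ki * state["i"] + kd * d          # control voltage
    return rad_per_volt * delta_v        # angle increments for the X/Y mirrors

state = {"e": np.zeros(2), "i": np.zeros(2)}
print(gaze_step(tic=(352, 205), cic=(320, 240), state=state))
```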

Meanwhile, because of occlusion from the external environment, targets are occluded partially or completely, the recognized centroid deviates from the actual position, and the HDTs may even be lost [30,31].

3. Adaptive self-window

Self-window is an efficient feature extraction mechanism for high-speed image processing [9,10]. It is a local fixed ROI in the original image, as shown in Fig. 2(a). By processing only the pixels in this region instead of the entire original image, the computational complexity of image processing is significantly decreased and efficiency is improved, as the sketch below illustrates. A fixed self-window is suitable for gazing at targets with high speed but low acceleration. Because the acceleration of HDTs changes rapidly, the response speed of the mirrors cannot maintain the HDTs within the fixed self-window; the HDTs jump out of the region, leading to gazing failure. Although extending the fixed self-window region could improve the success rate of gazing, it would significantly reduce the real-time performance of the image processing.
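
As a concrete illustration of this saving (a sketch with assumed frame and window sizes), cropping a view of the ROI means downstream feature extraction touches only the window's pixels:

```python
import numpy as np

# Sketch of the self-window idea: feature extraction runs on the ROI only,
# so its cost scales with the window area rather than the full frame area.
frame = np.zeros((480, 640), dtype=np.uint8)   # full captured image
x0, y0, w, h = 240, 180, 160, 120              # self-window (Scale = 4 per axis)
roi = frame[y0:y0 + h, x0:x0 + w]              # numpy view, no copy
print(roi.size / frame.size)                   # 0.0625: 16x fewer pixels
```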

Fig. 2. Self-window mechanism for image processing. (a) Fixed self-window; (b) proposed adaptive self-window.

The adaptive self-window was first proposed in [13]. This study deepens its theory and gives a method for solving the adaptive rate. The target's acceleration component is extracted; once the target jumps out of the region, the self-window expands temporarily at an appropriate rate, as shown in Fig. 2(b). As a result, the target can always be kept within the self-window.

To begin with, we establish the image coordinate system $x_{\mathrm I}Oy_{\mathrm I}$ and the self-window coordinate system $x_{\mathrm S}Oy_{\mathrm S}$. This study focuses on the high-speed movement (up-down, left-right, and roll) of the target in the focal plane perpendicular to the galvanometer. An image region is expressed as $(x_0, y_0, w, h)$, where $(x_0, y_0)$ denotes the coordinate of its upper-left corner and $(w, h)$ its width and height. Define the image vector ${\boldsymbol M}_{\mathrm{Im}} = (0, 0, w_{\mathrm I}, h_{\mathrm I})^{\mathrm T}$ and its size vector ${\boldsymbol S}_{\mathrm{Im}} = (w_{\mathrm I}, h_{\mathrm I})^{\mathrm T}$. The self-window vector in $x_{\mathrm I}Oy_{\mathrm I}$ is expressed as ${\boldsymbol M}_{\mathrm S} = (x_{\mathrm{S0}}, y_{\mathrm{S0}}, w_{\mathrm S}, h_{\mathrm S})^{\mathrm T}$ with size vector ${\boldsymbol S}_{\mathrm S} = (w_{\mathrm S}, h_{\mathrm S})^{\mathrm T}$.

Define the scale vector from the image to the self-window as ${\boldsymbol{Scale}} = {\boldsymbol S}_{\mathrm{Im}} ./ {\boldsymbol S}_{\mathrm S}$, where $./$ denotes the element-wise division operator. For a given image, the larger $|{\boldsymbol{Scale}}|$, the smaller the self-window region. The PTM-AVS captures real-time images centered on the HDTs; meanwhile, the target moves near the center of the self-window, so the self-window region ${\boldsymbol M}_{\mathrm S}$ can be expressed as Eqs. (1)–(2):

$$[x_{\mathrm{S0}},\, y_{\mathrm{S0}}]^{\mathrm T} = 0.5\,{\boldsymbol S}_{\mathrm{Im}} .\times (1 - 1./{\boldsymbol{Scale}})$$
$${\boldsymbol S}_{\mathrm S} = (w_{\mathrm S}, h_{\mathrm S})^{\mathrm T} = {\boldsymbol S}_{\mathrm{Im}} ./ {\boldsymbol{Scale}},$$
where $.\times$ represents the element-wise multiplication operator.

The bias redundancy (Br) of the self-window is defined as the maximum allowable bias of the TIC when the recognized region of the HDTs reaches the boundary of the self-window without occlusion. ${\boldsymbol{Br}} = (Br_x, Br_y)^{\mathrm T}$ can be expressed as Eq. (3):

$${\boldsymbol{Br}} = 0.5\left( {\boldsymbol S}_{\mathrm S} - [\max(|{\boldsymbol D}_x|),\, \max(|{\boldsymbol D}_y|)]^{\mathrm T} \right),$$
where $\max(|{\boldsymbol D}_x|)$ and $\max(|{\boldsymbol D}_y|)$ denote the maximum extent of the recognized region in the x and y directions of $x_{\mathrm S}Oy_{\mathrm S}$.

Define the fixed self-window ${\boldsymbol D}_{\mathrm S}$, the target's maximum recognized region ${\boldsymbol D}_{\mathrm{T0}}$, and the recognized region at time k, ${\boldsymbol D}_{\mathrm T}(k)$. As shown in Fig. 3(a), in the fixed self-window, $\forall {\boldsymbol D}_{\mathrm T}(k) \subset {\boldsymbol R}^2$, if ${\boldsymbol D}_{\mathrm S} \cap {\boldsymbol D}_{\mathrm T}(k) = {\boldsymbol D}_{\mathrm{T0}}$, the target can be stably gazed at. Otherwise, if ${\boldsymbol D}_{\mathrm S} \cap {\boldsymbol D}_{\mathrm T}(k) \subset {\boldsymbol D}_{\mathrm{T0}}$, the recognized region of the target within the self-window is reduced, resulting in the recognition error ${\boldsymbol B}' - {\boldsymbol B}''$, as shown in Fig. 3(b). The adaptive self-window proposed in this paper expands the self-window region according to the target's kinematic information; gazing stability for the HDT is improved by increasing the bias redundancy, as shown in Fig. 3(c). In $x_{\mathrm S}Oy_{\mathrm S}$, define the CIC ${\boldsymbol L}_{\mathrm{RD}}$ and the TIC ${\boldsymbol L}_{\mathrm{RT}}(k)$ at time k as Eqs. (4)–(5):

$${\boldsymbol L}_{\mathrm{RD}} = (x_{\mathrm{RD}}, y_{\mathrm{RD}})^{\mathrm T} = (w_{\mathrm S}/2,\, h_{\mathrm S}/2)^{\mathrm T}$$
$${\boldsymbol L}_{\mathrm{RT}}(k) = (x_{\mathrm{RT}}(k), y_{\mathrm{RT}}(k))^{\mathrm T}$$
and the acceleration of the target at time k as Eq. (6):
$${\boldsymbol a}_{\mathrm{RT}}(k) = {\boldsymbol L}_{\mathrm{RT}}(k) - 2{\boldsymbol L}_{\mathrm{RT}}(k-1) + {\boldsymbol L}_{\mathrm{RT}}(k-2)$$

Notably, the acceleration acquired from the captured images fluctuates at high frequency, so we adopt a weighted recursive average filter for de-noising. Let the weight vector be ${\boldsymbol\lambda}(k) = (\lambda_i \,|\, 1 \le i \le n)^{\mathrm T}$ with $\sum \lambda_i = 1$, and ${\boldsymbol A}_{\mathrm{RT}}(k) = ({\boldsymbol a}_{\mathrm{RT}}(i) \,|\, k - n < i \le k)^{\mathrm T}$, where $n$ denotes the recursion order. The de-noised acceleration ${\boldsymbol a}_n(k)$ at time k is expressed as Eq. (7):

$${\boldsymbol a}_n(k) = {\boldsymbol\lambda}(k)^{\mathrm T} {\boldsymbol A}_{\mathrm{RT}}(k)$$
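
A minimal sketch of Eqs. (6)–(7) follows; the weight values are illustrative (any $\lambda_i$ summing to 1 would do):

```python
from collections import deque
import numpy as np

# Sketch of Eqs. (6)-(7): finite-difference acceleration of the TIC and
# weighted recursive average de-noising over the last n samples.
class AccelerationEstimator:
    def __init__(self, weights=(0.5, 0.3, 0.2)):   # lambda(k), newest first
        self.w = np.asarray(weights)
        self.cents = deque(maxlen=3)               # L_RT(k), L_RT(k-1), L_RT(k-2)
        self.accels = deque(maxlen=len(weights))

    def update(self, l_rt):
        self.cents.appendleft(np.asarray(l_rt, float))
        if len(self.cents) == 3:                   # Eq. (6)
            c = self.cents
            self.accels.appendleft(c[0] - 2.0 * c[1] + c[2])
        if len(self.accels) == self.accels.maxlen: # Eq. (7): weighted average
            return self.w @ np.stack(list(self.accels))
        return np.zeros(2)                         # not enough history yet
```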

Fig. 3. Bias redundancy (Br) of the self-window. (a) Bias redundancy of the target in the fixed self-window; (b) recognition error of the TIC when the target exceeds the bias redundancy in the fixed self-window; (c) bias redundancy of the proposed adaptive self-window after expanding the region.

Fig. 4. Schematic of the intersection of the target and the occluder. After the target is blocked, $D_{\mathrm B}$ disappears and the PTM-AVS can only recognize $D_{\mathrm E}$.

Define the adaptive ratio vector of the self-window ${\boldsymbol\gamma}(k) = (\gamma_x(k), \gamma_y(k))^{\mathrm T}$. Taking the fixed self-window center as the origin, the region is expanded according to the adaptive ratio, and the expanded self-window is expressed as Eqs. (8)–(9):

$$[x_{\mathrm{S0}},\, y_{\mathrm{S0}}]^{\mathrm T} = 0.5\,{\boldsymbol S}_{\mathrm{Im}} .\times (1 - 1./{\boldsymbol{Scale}}) - 0.5\,{\boldsymbol a}_n(k) .\times {\boldsymbol\gamma}(k)^{\mathrm T}$$
$${\boldsymbol S}_{\mathrm S} = ({\boldsymbol S}_{\mathrm{Im}} ./ {\boldsymbol{Scale}})^{\mathrm T} + {\boldsymbol a}_n(k) .\times {\boldsymbol\gamma}(k)^{\mathrm T}$$

By calculating the bias between ${\boldsymbol L}_{\mathrm{RD}}$ and ${\boldsymbol L}_{\mathrm{RT}}(k)$, the control voltage signals of the motor-driven mirrors are generated to adjust the viewing angle. Define the control voltage vector ${\boldsymbol{\Delta V}}(k)$ and the angle increment vector ${\boldsymbol\Delta}\phi(k)$ as Eqs. (10)–(11):

$${\boldsymbol{\Delta V}}(k) = \varphi({\boldsymbol L}_{\mathrm{RD}}, {\boldsymbol L}_{\mathrm{RT}}(k))$$
$${\boldsymbol\Delta}\phi(k) = Lp({\boldsymbol{\Delta V}}(k)),$$
where $\varphi(\cdot)$ is the feedback control function and $Lp(\cdot)$ is the mapping function between the voltage signal and the angle increment. Suppose the target moves in the focal plane perpendicular to the galvanometer. Because targets are captured at a high frame rate, the viewing-angle increment ${\boldsymbol\Delta}\phi(k)$ between adjacent frames is small enough that $l\sin({\boldsymbol\Delta}\phi(k)) \approx l{\boldsymbol\Delta}\phi(k)$. Define the bias redundancy ${\boldsymbol{Br}}(k) = (Br_x(k), Br_y(k))^{\mathrm T}$ in the adaptive self-window. To maintain the target within the self-window, conditions (12)–(13) must be met:
$$|{\boldsymbol L}_{\mathrm{RT}}(k) - {\boldsymbol L}_{\mathrm{RT}}(k-1)| - l{\boldsymbol\Delta}\phi \le {\boldsymbol{Br}}(k)$$
$${\boldsymbol\gamma}^{\mathrm T} = \begin{cases} (0, 0) & {\boldsymbol D}_{\mathrm S} \cap {\boldsymbol D}_{\mathrm T}(k) = {\boldsymbol D}_{\mathrm T} \\ 2{\boldsymbol{Br}}(k) ./ {\boldsymbol a}_n(k) & \text{else,} \end{cases}$$
where l is the vertical distance between the mirror and the target's focal plane. By calibrating l and the target image area $s_{\mathrm T}$ in advance, $l(k)$ can be obtained during high dynamic motion by computing the area $s_{\mathrm T}(k)$ of the target image at the current moment.
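
Putting Eqs. (1)–(2), (8)–(9), and (13) together, a minimal sketch of the adaptive self-window update might look as follows (the numerical inputs are illustrative; zero acceleration needs no special handling here because expansion is triggered only when the target jumps out, i.e. when ${\boldsymbol a}_n(k) \ne 0$):

```python
import numpy as np

# Sketch of the adaptive self-window of Eqs. (1)-(2), (8)-(9), and (13).
# s_im: full image size; scale: per-axis Scale vector; a_n: de-noised
# acceleration a_n(k); br: current bias redundancy Br(k).
def adaptive_self_window(s_im, scale, a_n, br, target_inside):
    s_im, scale, a_n, br = (np.asarray(v, float) for v in (s_im, scale, a_n, br))
    # Eq. (13): no expansion while the recognized region stays inside.
    gamma = np.zeros(2) if target_inside else 2.0 * br / a_n
    # Eqs. (8)-(9): shift the origin and enlarge the size by a_n .* gamma.
    origin = 0.5 * s_im * (1.0 - 1.0 / scale) - 0.5 * a_n * gamma
    size = s_im / scale + a_n * gamma
    return origin, size                      # (x_S0, y_S0) and (w_S, h_S)

# e.g. a 640x480 image, Scale = 4 per axis, target jumping out in y
print(adaptive_self_window((640, 480), (4, 4), a_n=(2.0, 14.0),
                           br=(6.0, 6.0), target_inside=False))
```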

4. Minimum-envelope-ellipse-based optical axis compensation

When HDTs are blocked, the PTM-AVS can only recognize partial images, so the optical axis deviates from the correct position. We assume that an ellipse with major axis $l$ and minor axis $w$ can minimally envelop the image region of the target, and we propose the minimum-envelope-ellipse (MEE)-based optical axis compensation method. Define the occlusion region ${\boldsymbol D}_{\mathrm O}$ and the minimum envelope region of the HDTs ${\boldsymbol D}_{\mathrm T}$. When the target is partially occluded, ${\boldsymbol D}_{\mathrm T}$ can be divided into two sub-regions, as shown in Fig. 4:

  • • Occluded region ${\boldsymbol D}_{\mathrm B}$: $\forall (x,y) \in {\boldsymbol D}_{\mathrm O} \cap {\boldsymbol D}_{\mathrm T}$
  • • Recognized region ${\boldsymbol D}_{\mathrm E}$: $\forall (x,y) \in (\neg{\boldsymbol D}_{\mathrm O}) \cap {\boldsymbol D}_{\mathrm T}$

For all $(x,y) \in {\boldsymbol D}_{\mathrm E}$, the boundary curve of the minimum envelope ellipse ${\boldsymbol D}_{\mathrm E}(k)$ at time k is defined as Eq. (14):
$${\boldsymbol D}_{\mathrm E}(k) = ({\boldsymbol L}_{\mathrm{RT}}(k), l(k), w(k), \theta(k))^{\mathrm T},$$
where the TIC ${\boldsymbol L}_{\mathrm{RT}}(k)$, major axis $l(k)$, minor axis $w(k)$, and angle $\theta(k)$ are calculated following [32].

The initial minimum envelope ellipse is ${\boldsymbol D}_{\mathrm E}^0 = ({\boldsymbol L}_{\mathrm{RT}}^0, l_0, w_0, \theta_0)^{\mathrm T}$. When the target is occluded, its recognized region is reduced, and the recognized minimum envelope ellipse becomes ${\boldsymbol D}_{\mathrm E}(k) = ({\boldsymbol L}_{\mathrm{RT}}(k), l(k), w(k), \theta(k))^{\mathrm T}$, as shown in Fig. 5. The components $\Delta l(k)$, $\Delta w(k)$, and $\Delta\theta(k)$ are calculated as Eqs. (15)–(17), respectively:

$$\Delta l(k) = l_0 - l(k)\cos(\theta(k) - \theta_0) - w(k)\sin(\theta(k) - \theta_0)$$
$$\Delta w(k) = w_0 - l(k)\sin(\theta(k) - \theta_0) - w(k)\cos(\theta(k) - \theta_0)$$
$$\Delta\theta(k) = \theta_0 - \theta(k)$$

Fig. 5. Schematic diagram of the optical axis compensation mechanism. The region enclosed by solid lines denotes the recognized target image, and the region enclosed by dotted lines denotes the compensated target image obtained with the proposed method.

Moreover, the compensation direction is related to the relative position of the target and the occluder. It is expressed through the increment of the control voltage ${\boldsymbol{\Delta V}}(k)$ and the region increment $\Delta{\boldsymbol D}_{\mathrm E}(k) = {\boldsymbol D}_{\mathrm E}(k) - {\boldsymbol D}_{\mathrm E}(k-1)$. The compensated TIC ${\boldsymbol L}_{\mathrm{RT}}^{\mathrm m}(k)$, major axis $l_{\mathrm m}(k)$, minor axis $w_{\mathrm m}(k)$, and angle $\theta_{\mathrm m}(k)$ are calculated as Eqs. (18)–(22):

$${\boldsymbol\xi}(k) = \frac{\Delta{\boldsymbol V}(k)}{|\Delta{\boldsymbol V}(k)|} .\times \frac{\Delta{\boldsymbol D}_{\mathrm E}(k)}{|\Delta{\boldsymbol D}_{\mathrm E}(k)|}$$
$${\boldsymbol L}_{\mathrm{RT}}^{\mathrm m}(k) = {\boldsymbol L}_{\mathrm{RT}}(k) + 0.5\,{\boldsymbol\xi}(k) .\times [\Delta l(k),\, \Delta w(k)]^{\mathrm T}$$
$$l_{\mathrm m}(k) = l(k) + \Delta l(k)$$
$$w_{\mathrm m}(k) = w(k) + \Delta w(k)$$
$$\theta_{\mathrm m}(k) = \theta(k) + \Delta\theta(k)$$
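
A compact sketch of Eqs. (15)–(22) is given below; the ellipse of the visible region (e.g. fitted with OpenCV's cv2.fitEllipse on the contour of $D_{\mathrm E}$, in the spirit of [32]) is assumed to be computed upstream, and the sign vector ${\boldsymbol\xi}(k)$ of Eq. (18) is passed in precomputed:

```python
import numpy as np

# Sketch of the MEE compensation of Eqs. (15)-(22). (l0, w0, theta0) is the
# calibrated pre-occlusion ellipse; (l, w, theta) the ellipse fitted to the
# visible region; xi the compensation-direction signs from Eq. (18).
def mee_compensate(l_rt, l, w, theta, l0, w0, theta0, xi):
    dth = theta - theta0
    dl = l0 - l * np.cos(dth) - w * np.sin(dth)        # Eq. (15)
    dw = w0 - l * np.sin(dth) - w * np.cos(dth)        # Eq. (16)
    dtheta = theta0 - theta                            # Eq. (17)
    # Eq. (19): shift the centroid by half the lost extent, signed by xi.
    l_rt_m = np.asarray(l_rt, float) + 0.5 * np.asarray(xi) * np.array([dl, dw])
    return l_rt_m, l + dl, w + dw, theta + dtheta      # Eqs. (19)-(22)
```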

5. Unscented Kalman filter-based optical axis prediction

The EKF and the UKF have both been used for nonlinear tracking estimation [33]. In this study, the UKF is adopted to predict the nonlinear trajectory of HDTs and control the optical axis under complete occlusion. Consider the highly nonlinear trajectory of the HDTs, with the state difference equation and observation model with additive noise described in Eqs. (23)–(24):

$${\boldsymbol x}_k = f({\boldsymbol x}_{k-1}) + {\boldsymbol w}_k$$
$${\boldsymbol z}_k = h({\boldsymbol x}_k) + {\boldsymbol v}_k,$$
where ${\boldsymbol x}_k = (x_{\mathrm{RT}}(k), y_{\mathrm{RT}}(k), vx_{\mathrm{RT}}(k), vy_{\mathrm{RT}}(k))^{\mathrm T} \in {\mathbb R}^4$, ${\boldsymbol z}_k = (x_{\mathrm{RT}}(k), y_{\mathrm{RT}}(k))^{\mathrm T} \in {\mathbb R}^2$, ${\boldsymbol w}_k \sim {\mathbb N}(0,{\boldsymbol Q})$ with ${\boldsymbol Q} \in {\mathbb R}^{4\times 4}$, and ${\boldsymbol v}_k \sim {\mathbb N}(0,{\boldsymbol R})$ with ${\boldsymbol R} \in {\mathbb R}^{2\times 2}$.

The initial state ${\boldsymbol x}_0$ is a random vector with mean ${\boldsymbol\mu}_0 = \mathrm{E}[{\boldsymbol x}_0]$ and covariance ${\boldsymbol\Sigma}_0 = \mathrm{E}[({\boldsymbol x}_0 - {\boldsymbol\mu}_0)({\boldsymbol x}_0 - {\boldsymbol\mu}_0)^{\mathrm T}]$. For ${\boldsymbol x}_{k-1}$, 9 sigma points are selected (in general $2n+1$ points, where $n$ is the dimension of the state space; here $n = 4$) with weights $\{({\boldsymbol x}_{k-1}^i, w^i) \,|\, i = 0, \dots, 2n\}$, and the rules for selecting the sigma points are given by Eqs. (25)–(32) [34]:

$$\sum_i w^i = 1$$
$${\boldsymbol\mu}_{k-1} = \sum_i {\boldsymbol x}_{k-1}^i w^i$$
$${\boldsymbol\Sigma}_{k-1} = \sum_i w^i ({\boldsymbol x}_{k-1}^i - {\boldsymbol\mu}_{k-1})({\boldsymbol x}_{k-1}^i - {\boldsymbol\mu}_{k-1})^{\mathrm T}$$
$${\boldsymbol x}_{k-1}^0 = {\boldsymbol\mu}_{k-1}$$
$${\boldsymbol x}_{k-1}^i = {\boldsymbol\mu}_{k-1} \pm \left( \sqrt{(4+\lambda)\,{\boldsymbol\Sigma}_{k-1}} \right)_i$$
$$w_{\mathrm m}^0 = \lambda/(\lambda + 4)$$
$$w_{\mathrm c}^0 = w_{\mathrm m}^0 + (1 - \alpha^2 + \beta)$$
$$w_{\mathrm m}^i = w_{\mathrm c}^i = 1/(2(4+\lambda)),$$
where $\lambda = \alpha^2(n + \kappa) - n$ denotes the scale factor, $w_{\mathrm m}^i$ is used to calculate the mean, and $w_{\mathrm c}^i$ is used to calculate the variance; $\alpha \in (0,1]$, $\kappa \ge 0$, and $\beta = 2$. Let the state estimate at time k−1 be ${\mathbb N}({\boldsymbol\mu}_{k-1}, {\boldsymbol\Sigma}_{k-1})$; the sigma points obtained from this estimate are given by Eq. (33) [35]:
$${\boldsymbol\chi}_{k-1} = \left\{ {\boldsymbol\mu}_{k-1},\ {\boldsymbol\mu}_{k-1} + \left( \sqrt{(4+\lambda)\,{\boldsymbol\Sigma}_{k-1}} \right)_i,\ {\boldsymbol\mu}_{k-1} - \left( \sqrt{(4+\lambda)\,{\boldsymbol\Sigma}_{k-1}} \right)_i \right\}$$

The prior estimate of the current state is calculated as Eqs. (34)–(35):

$${\boldsymbol\mu}_k^- = \sum_i w_{\mathrm m}^i f({\boldsymbol\chi}_{k-1}^i)$$
$${\boldsymbol\Sigma}_k^- = \sum_i w_{\mathrm c}^i \left( f({\boldsymbol\chi}_{k-1}^i) - {\boldsymbol\mu}_k^- \right)\left( f({\boldsymbol\chi}_{k-1}^i) - {\boldsymbol\mu}_k^- \right)^{\mathrm T} + {\boldsymbol Q}$$

The sigma points obtained from the prior estimate are given by Eq. (36):

$${\boldsymbol\chi}_k^- = \left\{ {\boldsymbol\mu}_k^-,\ {\boldsymbol\mu}_k^- + \left( \sqrt{(4+\lambda)\,{\boldsymbol\Sigma}_k^-} \right)_i,\ {\boldsymbol\mu}_k^- - \left( \sqrt{(4+\lambda)\,{\boldsymbol\Sigma}_k^-} \right)_i \right\}$$
and the posterior estimate follows Eqs. (37)–(39) [36]:
$$\hat{{\boldsymbol z}}_k = \sum_i w_{\mathrm m}^i h({\boldsymbol\chi}_k^{-i})$$
$${\boldsymbol S}_k = \sum_i w_{\mathrm c}^i \left( h({\boldsymbol\chi}_k^{-i}) - \hat{{\boldsymbol z}}_k \right)\left( h({\boldsymbol\chi}_k^{-i}) - \hat{{\boldsymbol z}}_k \right)^{\mathrm T} + {\boldsymbol R}$$
$${\boldsymbol\Sigma}_k^{-x,z} = \sum_i w_{\mathrm c}^i \left( {\boldsymbol\chi}_k^{-i} - {\boldsymbol\mu}_k^- \right)\left( h({\boldsymbol\chi}_k^{-i}) - \hat{{\boldsymbol z}}_k \right)^{\mathrm T}$$
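
For reference, the sketch below is our own minimal numpy implementation of one UKF cycle following Eqs. (25)–(39), with $\alpha = 1$ for simplicity; $f$ and $h$ are the models of Eqs. (23)–(24), and the final gain step completes the posterior update:

```python
import numpy as np

# Minimal sketch of one UKF predict/update cycle per Eqs. (25)-(39).
# mu, P: previous posterior; z: new measurement; f, h: models of
# Eqs. (23)-(24); Q, R: process and measurement noise covariances.
def ukf_step(mu, P, z, f, h, Q, R, alpha=1.0, beta=2.0, kappa=0.0):
    n = mu.size
    lam = alpha**2 * (n + kappa) - n
    wm = np.full(2 * n + 1, 0.5 / (n + lam))
    wc = wm.copy()
    wm[0] = lam / (n + lam)                         # Eq. (30)
    wc[0] = wm[0] + 1.0 - alpha**2 + beta           # Eq. (31)

    def sigma(m, C):                                # Eqs. (28)-(29), (33), (36)
        S = np.linalg.cholesky((n + lam) * C)       # matrix square root
        return np.column_stack([m, m[:, None] + S, m[:, None] - S])

    X = np.apply_along_axis(f, 0, sigma(mu, P))     # propagate sigma points
    mu_p = X @ wm                                   # Eq. (34): prior mean
    dX = X - mu_p[:, None]
    P_p = dX * wc @ dX.T + Q                        # Eq. (35): prior covariance

    Xp = sigma(mu_p, P_p)                           # Eq. (36): redrawn points
    Z = np.apply_along_axis(h, 0, Xp)
    z_hat = Z @ wm                                  # Eq. (37)
    dZ = Z - z_hat[:, None]
    S_k = dZ * wc @ dZ.T + R                        # Eq. (38)
    C_xz = (Xp - mu_p[:, None]) * wc @ dZ.T         # Eq. (39)
    K = C_xz @ np.linalg.inv(S_k)                   # Kalman gain
    return mu_p + K @ (z - z_hat), P_p - K @ S_k @ K.T
```

One natural wiring, consistent with Sections 4–5 (our reading, not spelled out in the text), is to feed the compensated TIC as the measurement $z$ while the target is partially visible, and to drive the mirrors from the prior mean alone under complete occlusion.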

6. Experimental analysis

In this section, a collision process is utilized as a case of high dynamic trajectory to study the PTM-AVS's gazing of HDTs. The controlled variable method is adopted in the experimental analysis. Unless otherwise specified, the default values of the variables are summarized in Table 2. The root mean square error (RMSE) is taken as the robustness evaluation index of the PTM-AVS.
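
For clarity, the RMSE here is the usual root-mean-square gaze error between the TIC and the CIC over an N-frame trial; written in the paper's notation (the explicit formula is our addition):

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{k=1}^{N} \left\| {\boldsymbol L}_{\mathrm{RT}}(k) - {\boldsymbol L}_{\mathrm{RD}} \right\|_2^2}$$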

Table 2. Default values of the variables.

The experimental device of the PTM-AVS is shown in Fig. 6. The real-time performance of the PTM-AVS is made compatible with the 500 fps image refresh rate through the following measures:

  • 1. Organizational structure of parallel computing: because image reading and writing take considerable time, the code is divided into three main parts: 1) image capture and format conversion based on the high-speed camera (≥2.532 ms); 2) feature recognition, TIC calculation, adaptive self-window, incremental PID feedback control, MEE-based optical axis compensation, and UKF-based optical axis prediction (≤0.293 ms); 3) writing of the target gazing image (≤0.984 ms).
  • 2. The Basler acA-640-750uc camera resolution is 640×480 pixels, and the frame rate was set to 500 fps by adjusting the exposure time to 1900 µs.

Fig. 6. PTM-AVS experimental device. (a) Galvanometer driver boards; (b) GSI galvanometer scanners; (c) Basler acA-640-750uc monocular camera; (d) Target; (e) Pupil-shift optics; (f) High-speed interface D/A board; (g) DELL Workstation 7810 for image processing; (h) Oscilloscope; (i) Signal generator; (j) Fill lights.

6.1 Frame rate and self-window scale

As the frame rate increases from 50 to 500 fps, the robustness of gazing varies as shown in Fig. 7(a): the RMSE gradually decreases with increasing frame rate. Compared with a traditional camera with a low frame rate of 25–30 fps, a high-speed camera for target capture improves the gazing stability of the PTM-AVS.

Fig. 7. Stability curves of target gazing. (a) RMSE at different frame rates; (b) RMSE and time-consumption at different self-window scales.

As the self-window scale increases from 1.5 to 4.5, the RMSE and time-consumption behave as shown in Fig. 7(b). With increasing scale, the self-window region gradually shrinks, and the RMSE and time-consumption drop quickly at first and then gradually stabilize. Fitting the RMSE and time-consumption data shows that both fitted curves are related to the reciprocal of Scale²; this can be explained by the operating mechanism of the self-window. Scale² determines the area of the self-window: the larger Scale², the smaller the area, the higher the efficiency of the image processing algorithm, and the lower the time-consumption. Increasing the self-window scale can therefore effectively improve robustness and real-time performance. However, the curves gradually flatten after the scale exceeds 4.0, because the absolute reduction of the self-window region gradually diminishes.
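
This reciprocal relationship can be checked by fitting $t(\mathrm{Scale}) = a/\mathrm{Scale}^2 + b$ to the measurements; the sketch below uses made-up sample points, not the measured data of Fig. 7(b):

```python
import numpy as np
from scipy.optimize import curve_fit

# Hedged sketch: fit the reciprocal-of-area model suggested above.
# The sample points are illustrative placeholders, not measured data.
model = lambda s, a, b: a / s**2 + b
scales = np.array([1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5])
t_ms = np.array([2.90, 1.80, 1.30, 1.05, 0.90, 0.82, 0.78])  # made-up timings
(a, b), _ = curve_fit(model, scales, t_ms)
print(f"t(Scale) ~ {a:.2f}/Scale^2 + {b:.2f} ms")
```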

6.2 Adaptive self-window and fixed self-window

Table 3 lists the maximum and minimum displacement of each collision process for the adaptive and fixed self-windows. The average bias redundancy refers to the average difference between the maximum and minimum displacement over 5 collisions. Compared with the fixed self-window, the adaptive self-window proposed in this paper improves the bias redundancy by 22.19% at the expense of 14.23% more time-consumption.

Table 3. Comparison between the proposed adaptive self-window and the fixed self-window.

Figure 8 shows the recognized target in the self-window. When the target is static or moving uniformly, it can be kept at the center of the fixed self-window. However, when the target moves dynamically, for example hitting the ground and bouncing, the acceleration changes sharply. Owing to the response time-delay, the optical axis cannot be adjusted in time: in the fixed self-window, the target almost jumps out of the region, whereas in the adaptive self-window the region in the Y direction is adjusted based on the acceleration. By expanding the bias redundancy, the high dynamic target is always maintained within the self-window, thereby improving the robustness of target gazing.

Fig. 8. Feature images of targets with adaptive and fixed self-windows. (a) Static or uniformly moving target with a fixed self-window; (b) high dynamic target jumping out of the fixed self-window region; (c) region adjusted according to the acceleration in the adaptive self-window in the vertical direction; (d) region adjusted in the adaptive self-window in the horizontal direction (see Visualization 1).

6.3 Optical axis compensation under partial occlusion

Figure 9 shows the static optical axis compensation results for occluded targets under different occlusion rates. Panels (a)–(e) show the recognized features as the occlusion rate increases, and (f)–(j) show the corresponding compensation results. The target contour drawn after compensation fits the boundary of the target's unoccluded region well. Figure 10(a) shows the recognized bias and its compensation curve against the occlusion rate in the x and y directions. As the occlusion rate increases, the recognized bias of the target gradually increases, while the compensated target remains centered in the image; the maximum compensation error rates are 1.46% and 1.09%, as shown in Table 4.

Fig. 9. Static optical axis compensation results for occluded targets. (a)–(e) Feature recognition results of the target under partial occlusion; (f)–(j) compensation results of occluded targets (see Visualization 2).


Table 4. Compensation results of the optical axis for static and dynamic targets.

The dynamic compensation results for occluded targets in the x and y directions are shown in Fig. 10(b). Compared with static compensation, the RMSE and maximum compensation error rate of dynamic compensation increase, mainly for two reasons: 1) the acceleration changes during the nonlinear motion, and 2) the state transitions from no occlusion to partial occlusion according to a conditional threshold. Owing to target detection error, there is no sharp boundary between no occlusion and partial occlusion, so a jump phenomenon occurs during the state transition. Optimizing the threshold can reduce, but not eliminate, this phenomenon. Nevertheless, the dynamic compensation error remains acceptable.

Fig. 10. Bias curves of the centroid under partial occlusion and compensation curves of the optical axis. (a) Static bias curve against the occlusion rate; (b) dynamic bias curve.

6.4 Optical axis prediction under complete occlusion

The UKF is used to predict the angle of the optical axis so as to gaze at HDTs with highly nonlinear trajectories under complete occlusion. The prediction results of the linear Kalman filter (LKF) and the UKF are compared in Fig. 11(a).

Fig. 11. Target prediction results based on the UKF and LKF under complete occlusion. (a) Local enlarged view of the prediction curves; (b) prediction error curves.

At the extreme values of the curve under no occlusion, the target hits the ground and bounces; during this time the target undergoes elastic deformation. If the target is obscured, its elastic deformation cannot be observed in real time, so both the UKF and the LKF show prediction errors. However, compared with the LKF, the UKF calculates the probability density distribution of the nonlinear falling trajectory based on the unscented transformation, so the trajectory is estimated more accurately and the maximum prediction error rate is smaller. The prediction error curves of the UKF and the LKF are shown in Fig. 11(b): the absolute maximum prediction error and the prediction error rate of the UKF are smaller than those of the LKF. The maximum prediction error rate of the LKF reaches 13.88%, whereas that of the UKF is only 4.48%, as shown in Table 5. Therefore, for HDTs, the prediction accuracy of the UKF is better than that of the LKF. In terms of time-consumption, the UKF is 37.51% slower than the LKF because of the unscented transformation (Table 5), but its time-consumption is still less than 0.3 ms, which is acceptable.


Table 5. Target prediction results based on UKF and LKF.

Figure 12 shows the prediction results based on the LKF and the UKF. Apart from the above conclusions about the target's centroid, the region of the LKF-predicted target is larger than the region under no occlusion or partial occlusion, whereas for the UKF-based prediction this region enlargement is not apparent; this further confirms that the UKF-based prediction results are better than those of the LKF.

Fig. 12. Prediction results based on the LKF and UKF. (a) LKF; (b) UKF (see Visualization 3).

6.5 Active visual gazing of irregular-shaped targets

To verify the generalization ability of the proposed algorithms, we performed active visual gazing on irregular-shaped HDTs and present the corresponding experimental results below.

Figure 13 shows the active visual gazing effects on irregular-shaped HDTs. In Fig. 13(a), a posture identification line is added to the target image. Besides conclusions similar to those in Section 6.2, we find that the posture of the target keeps changing because it rolls over time. According to Eqs. (12) and (13), when the irregular-shaped target partly jumps out of the self-window, the larger the extent of the target image region in the x and y directions, the larger the self-window size. In Fig. 13(b), while the target is free-falling, it can be effectively gazed at with the UKF during occlusion and smoothly re-gazed after it reappears. In Fig. 13(c), as the occlusion rate increases, the recognized bias of the target's centroid gradually increases, while the compensated centroid always remains centered in the image.

Fig. 13. Active visual gazing effects on irregular-shaped HDTs. (a) Feature images of irregular-shaped targets with adaptive and fixed self-windows; (b) prediction results of trajectory and pose for irregular-shaped targets based on the UKF; (c) centroid compensation effect for irregular-shaped occluded targets.

7. Discussion and conclusion

A rigid body in 3D motion has 6 degrees of freedom (DOF): 3 translational DOF (front-back, left-right, and up-down) and 3 rotational DOF (pitch, yaw, and roll). Of these, up-down, left-right, and roll have been verified successfully in this study; we now discuss prospective schemes for the remaining DOF:

Front-back: our team is focusing on high-speed 3D active vision. For fast focusing and zooming, we are studying high-speed liquid lenses. Compared with the traditional approach of changing the distance between solid lenses, high-speed focusing and zooming can be achieved by directly adjusting the curvature of a liquid lens. This will be integrated with the proposed gazing system to achieve a genuinely complete 3D-motion active vision.

Pitch and yaw: the contour of the image is visually distorted when the target pitches and yaws, which leads to inaccurate centroid recognition. A practical solution is to set mark points on the target surface with silicon-carrying reflective material; the pitch and yaw angles of the target can then be calculated by identifying the mark points and used to correct the centroid.

To solve the problem of visual gazing failure for HDTs, this study focused on a robust optical axis control mechanism for monocular active gazing based on the PTM-AVS. We proposed an adaptive self-window based on kinematic information and derived the adaptive rate ${\boldsymbol\gamma}^{\mathrm T}$, which maintains the HDTs within the self-window region. Based on the assumption that the recognized target lies within an elliptical region, optical axis compensation under partial occlusion was proposed, and UKF-based optical axis prediction was proposed for highly nonlinear trajectories under complete occlusion.

Experiments analyzed the effects of frame rate, self-window scale, the adaptive self-window, static and dynamic optical axis compensation under partial occlusion, and UKF-based optical axis prediction under complete occlusion on the stability of the PTM-AVS. The results confirmed that the proposed adaptive self-window improved the bias redundancy by 22.19% at the expense of 14.23% more time-consumption. The static and dynamic maximum compensation error rates of the optical axis under partial occlusion were 1.46% and 2.71%, respectively. The maximum prediction error rate of the UKF-based optical axis prediction was 4.48%, better than the 13.88% of the LKF.

The proposed visual gazing method helps an AVS improve gazing stability while ensuring real-time performance, and it can be applied widely in machine vision, image processing, and other vision-related areas.

Funding

China Postdoctoral Science Foundation (2021M690744); Special Fund for the Action of Guangdong Academy of Sciences to Build a National First-class Research Institution (2021GDASYL-20210102006, 2021GDASYL-20210103068, 2021GDASYL-20210103070); Guangdong Provincial Basic and Applied Basic Research Fund-Natural Science Foundation Project (General Project) (2021A1515012596).

Disclosures

The authors declare no conflicts of interest.

Data Availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. Y. Wang and X. Wang, “On-line three-dimensional coordinate measurement of dynamic binocular stereo vision based on rotating camera in large FOV,” Opt. Express 29(4), 4986–5005 (2021). [CrossRef]  

2. Y. Liu, P. Sun, and A. Namiki, “Target Tracking of Moving and Rotating Object by High-Speed Monocular Active Vision,” IEEE Sens. J. 20(12), 6727–6744 (2020). [CrossRef]  

3. M. Hržica, R. Cupec, and I. Petrović, “Active vision for 3D indoor scene reconstruction using a 3D camera on a pan-tilt mechanism,” Adv. Robot. 35(3-4), 153–167 (2021). [CrossRef]  

4. A. Bhakta, C. Hollitt, W. N. Browne, and M. Frean, “Utility function generated saccade strategies for robot active vision: a probabilistic approach,” Auton. Robots 43(4), 947–966 (2019). [CrossRef]  

5. K. Okumura, H. Oku, and M. Ishikawa, “High-speed gaze controller for millisecond-order pan/tilt camera,” in Proceedings of IEEE International Conference on Robotics and Automation (IEEE, 2011), pp. 6186–6191.

6. K. Okumura, K. Yokoyama, H. Oku, and M. Ishikawa, “1 ms Auto Pan-Tilt - Video shooting technology for objects in motion based on Saccade Mirror with background subtraction,” Adv. Robot. 29(7), 457–468 (2015). [CrossRef]  

7. Y. Mikawa, T. Sueishi, Y. Watanabe, and M. Ishikawa, “Projection Mapping System To A Widely Dynamic Sphere With Circumferential Markers,” in Proceedings of IEEE International Conference on Multimedia and Expo (IEEE, 2020), pp. 1–6.

8. H. Ullah, O. Zia, J. H. Kim, and K. Han, “Automatic 360 Mono-Stereo Panorama Generation Using a Cost-Effective Multi-Camera System,” Sensors 20(11), 3097 (2020). [CrossRef]  

9. I. Ishii and M. Ishikawa, “Self windowing for high-speed vision,” Syst. Comp. Jpn. 32(10), 51–58 (2001). [CrossRef]  

10. Y. Nakabo, M. Ishikawa, H. Toyoda, and S. Mizuno, “1 ms column parallel vision system and its application to high-speed target tracking,” in Proceedings of IEEE International Conference on Robotics and Automation (IEEE, 2000), pp. 650–655.

11. Y. Ariki, S. Kubota, and M. Kumano, “Automatic production system of soccer sports video by digital camera work based on situation recognition,” in Proceedings of IEEE International Symposium on Multimedia (IEEE, 2006), pp. 851–860.

12. O. F. Ince and J. S. Kim, “Tima slam: Tracking independently and mapping altogether for an uncalibrated multi-camera system,” Sensors 21(2), C1 (2021). [CrossRef]  

13. R. Cao, L. Wang, and J. Fu, “Adaptive self-window-based optical information acquisition method for high dynamic target,” in Frontiers in Optics / Laser Science, B. Lee, C. Mazzali, K. Corwin, and R. Jason Jones, (eds)., OSA Technical Digest (Optical Society of America, 2020), paper FW5A.4. [CrossRef]  

14. H. Li, K. Han, X. Wang, S. He, Q. Wu, and Z. Xu, “A compact and lightweight two-dimensional gimbal for inter-satellite laser communication applications,” Opt. Express 27(17), 24060 (2019). [CrossRef]  

15. X. Wang, Z. Xu, X. Gou, and L. Trajkovic, “Tracking a maneuvering target by multiple sensors using extended kalman filter with nested probabilistic-numerical linguistic information,” IEEE Trans. Fuzzy Syst. 28(2), 346–360 (2020). [CrossRef]  

16. R. Wang, R. Huang, and J. Yang, “Facilitating PTZ camera auto-calibration to be noise resilient with two images,” IEEE Access 7, 155612–155624 (2019). [CrossRef]  

17. H. Yong, J. Huang, W. Xiang, X. Hua, and L. Zhang, “Panoramic Background Image Generation for PTZ Cameras,” IEEE Trans. Image Process. 28(7), 3162–3176 (2019). [CrossRef]  

18. A. Namiki, K. Shimada, Y. Kin, and I. Ishii, “Development of an active high-speed 3-D vision system,” Sensors 19(7), 1572 (2019). [CrossRef]  

19. M. Ishikawa and T. Komuro, “Digital vision chips and high-speed vision systems,” in Proceedings of IEEE Symposium on VLSI Circuits, Digest of Technical Papers. (IEEE, 2001), pp. 1–4.

20. F. Willomitzer, C.-K. Yeh, V. Gupta, W. Spies, F. Schiffers, A. Katsaggelos, M. Walton, and O. Cossairt, “Hand-guided qualitative deflectometry with a mobile device,” Opt. Express 28(7), 9027–9038 (2020). [CrossRef]  

21. T. Parr, M. B. Mirza, H. Cagnan, and K. J. Friston, “Dynamic Causal Modelling of Active Vision,” J. Neurosci. 39(32), 6265–6275 (2019). [CrossRef]  

22. D. K. Das, M. Laha, S. Majumder, and D. Ray, “Stable and Consistent Object Tracking: An Active Vision Approach,” in Proceedings of International Conference on Advanced Computational and Communication Paradigms (Springer, 2018), pp. 299–308.

23. J. Sakakibara, J. Kita, and N. Osato, “Note: High-speed optical tracking of a flying insect,” Rev. Sci. Instrum. 83(3), 036103 (2012). [CrossRef]  

24. M. Kawakita, K. Iizuka, R. Iwama, K. Takizawa, H. Kikuchi, and F. Sato, “Gain-modulated Axi-Vision Camera (high speed high-accuracy depth-mapping camera),” Opt. Express 12(22), 5336–5344 (2004). [CrossRef]  

25. T. Sueishi, M. Ishii, and M. Ishikawa, “Tracking background-oriented schlieren for observing shock oscillations of transonic flying objects,” Appl. Opt. 56(13), 3789–3798 (2017). [CrossRef]  

26. M. Nitta, T. Sueishi, and M. Ishikawa, “Tracking projection mosaicing by synchronized high-speed optical axis control,” in Proceedings of ACM Symposium on Virtual Reality Software and Technology (ACM, 2018), pp. 1–5.

27. J. Cashbaugh and C. Kitts, “Vision-Based Object Tracking Using an Optimally Positioned Cluster of Mobile Tracking Stations,” IEEE Syst. J. 12(2), 1423–1434 (2018). [CrossRef]  

28. G. Huang, D. Wu, J. Luo, Y. Huang, and Y. Shen, “Retrieving the optical transmission matrix of a multimode fiber using the extended Kalman filter,” Opt. Express 28(7), 9487–9500 (2020). [CrossRef]  

29. S. G. Larew and D. J. Love, “Adaptive Beam Tracking with the Unscented Kalman Filter for Millimeter Wave Communication,” IEEE Signal Process. Lett. 26(11), 1658–1662 (2019). [CrossRef]  

30. C. Sun, D. Wang, and H. Lu, “Occlusion-Aware Fragment-Based Tracking with Spatial-Temporal Consistency,” IEEE Trans. Image Process. 25(8), 3814–3825 (2016). [CrossRef]  

31. M. Guan, C. Wen, M. Shan, C. L. Ng, and Y. Zou, “Real-time event-triggered object tracking in the presence of model drift and occlusion,” IEEE Trans. Ind. Electron. 66(3), 2054–2065 (2019). [CrossRef]  

32. G. R. Bradski, “Computer Vision Face Tracking For Use in a Perceptual User Interface,” in Proceedings of IEEE Workshop on Applications of Computer Vision (IEEE, 1998), pp. 1–15.

33. P. Sudheesh and M. Jayakumar, “Non linear tracking using unscented kalman filter,” Adv. Intell. Syst. Comput. 678, 38–46 (2018). [CrossRef]  

34. X. Wang, S. Fang, X. Zhu, K. Kou, Y. Liu, and M. Jiao, “Phase unwrapping based on adaptive image in-painting of fringe patterns in measuring gear tooth flanks by laser interferometry,” Opt. Express 28(12), 17881–17897 (2020). [CrossRef]  

35. R. Mohammadi Asl, Y. Shabbouei Hagh, S. Simani, and H. Handroos, “Adaptive square-root unscented Kalman filter: An experimental study of hydraulic actuator state estimation,” Mech. Syst. Signal Process. 132, 670–691 (2019). [CrossRef]  

36. D. Li, J. Zhou, and Y. Liu, “Recurrent-neural-network-based unscented Kalman filter for estimating and compensating the random drift of MEMS gyroscopes in real time,” Mech. Syst. Signal Process. 147, 107057 (2021). [CrossRef]  

Supplementary Material (3)

Visualization 1: adaptive self-window and fixed self-windows
Visualization 2: optical axis compensation results for occluded targets
Visualization 3: prediction results based on LKF and UKF
