Approach for accurate calibration of RGB-D cameras using spheres


Abstract

RGB-D cameras (or color-depth cameras) play key roles in many vision applications. A typical RGB-D camera has only rough intrinsic and extrinsic calibrations that cannot provide the accuracy required in many vision applications. In this paper, we propose a novel and accurate sphere-based calibration framework for estimating the intrinsic and extrinsic parameters of a color-depth sensor pair. Additionally, a method of depth error correction is suggested, and the principle of error correction is analyzed in detail. In our method, the feature extraction module automatically and reliably detects the center and edges of the sphere projection while excluding noisy data and outliers, and the projection of the sphere center on the RGB and depth images is used to obtain a closed-form solution for the initial parameters. Finally, all the parameters are accurately estimated within a framework of nonlinear global minimization. Compared to other state-of-the-art methods, our calibration method is easy to use and provides higher calibration accuracy. Detailed experimental analysis is performed to support our conclusions.

© 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

RGB-D cameras have been widely used in many vision applications, such as 3D environment reconstruction [1,2], simultaneous localization and mapping [3,4], and object recognition and tracking [5,6]. A general RGB-D camera consists of two parts: an RGB camera and a depth camera that are rigidly coupled together (e.g., Microsoft Kinect, ASUS Xtion, and Intel RealSense). Usually, RGB cameras provide color information, while depth cameras based on structured light or time-of-flight technology provide depth information. The combination of the two cameras can provide an immediate reconstruction of the surrounding 3D environment. RGB-D cameras are usually calibrated at the factory, and the calibration parameters are stored in the internal memory. However, the accuracy of the factory calibration cannot meet the high-precision requirements of robot vision tasks and may only be suitable for playing virtual games. In order to improve the accuracy of RGB-D cameras for other applications, an appropriate method for their accurate calibration is urgently needed. For RGB-D cameras, the intrinsic parameters (focal length, principal point, and lens distortion) of the color and depth cameras and the extrinsic parameters (relative position and orientation) between the cameras must be calibrated. In addition, the deviation and uncertainty of the depth measurement increase with the measurement distance, and the depth measurement error and uncertainty vary across different pixel positions [7–9]. Therefore, the proposed calibration framework for RGB-D cameras is designed to determine the intrinsic and extrinsic parameters and to correct depth measurement errors.

Most early work [10–12] related to RGB-D camera calibration used standard RGB camera calibration techniques. To better detect the corners of a checkerboard with an IR camera, it is necessary to block the infrared projector and provide an external light source (e.g., an incandescent lamp) to illuminate the calibration plate. Jung et al. [13] used a large wooden board with dozens of rings as the calibration target, and manually matched the corresponding ring centers in the RGB and depth images. Teichman et al. [14] proposed a calibration method based on the SLAM framework. Their method is reliable only for close-range measurements (within 2 m), and parameter optimization may take several hours. Di Cicco et al. [15] used regression methods to solve for the calibration parameters on a sufficiently large plane; the calibration of the extrinsic parameters was seldom considered. Herrera et al. [16] proposed a calibration method for optimal color and depth camera pairs based on the hypothesis principle. The algorithm considered both color and depth characteristics and improved the calibration accuracy of the camera pair as a whole. Raposo et al. [17] proposed methods that improved the work of Herrera et al. Their method required less than 1/6 of the original input data and ran in 1/30 of the time while achieving similar calibration accuracy. However, this method requires expert users to play an active role in selecting the corresponding features in the depth image. Due to the noisy nature of the depth image, the corners and edges of the checkerboard are not clearly defined and cannot be selected accurately. The possible errors in manual feature selection are propagated through the calibration algorithm, resulting in a decrease in the overall calibration accuracy. Basso et al. [9] used a polynomial to model the depth measurement error; the polynomial was required to be known in advance. Staranowicz et al. [18] proposed using a spherical object to calibrate RGB-D cameras but did not take depth measurement errors into account. Liu et al. [19] proposed a method of calibrating the camera using the latitude and longitude circles of a grid spherical target (GST) and the intersections of two circles. This method can calibrate RGB cameras but may not be suitable for RGB-D cameras. Deetjen et al. [20] proposed an automatic calibration method for multi-camera structured light projector systems that can be used in multi-viewpoint 3D curved surface reconstruction based on structured light. An et al. [21] proposed a geometric calibration method to estimate the extrinsic parameters of laser and RGB-D camera systems by using the constraint relationships between corresponding 3D-2D and 3D-3D points. Wang et al. [22] proposed an algorithm to remove noise and radial lens distortion by aligning the captured pattern with the ideal pattern.

In this paper, a novel and accurate RGB-D camera calibration method based on a sphere is proposed. The projection information (ellipse center and quadratic curve) of the sphere on the RGB and depth images is first extracted, and outliers are automatically excluded. A closed-form solution for the initial parameters is then computed from the projection information, and a key-point-based depth measurement error correction strategy is applied. Finally, all the parameters are optimized within a global optimization framework. Unlike a checkerboard, a sphere can be observed over 360 degrees in all directions, which allows the calibration to be extended naturally to multiple RGB-D cameras. In particular, when two RGB-D cameras are positioned face-to-face, a single checkerboard-based calibration method may not be applicable, while a sphere-based calibration method may give satisfactory results.

2. Mathematical analysis

2.1 General model of RGB-D camera

As shown in Fig. 1, our setup consists of an RGB-D camera and a sphere. Let a general 3D point in the depth camera coordinate system $\left \{ D \right \}$ be ${}^{D}\mathbf {x}={{\left [ {{x}^{D}},{{y}^{D}},{{z}^{D}} \right ]}^{T}}$, and its projection point on the RGB coordinate system $\left \{ R \right \}$ image plane be ${}^R\mathbf {p} = {\left [ {{u^R},{v^R}} \right ]^T}$. The image coordinates are obtained in the following three-step process.

Fig. 1. The imaging model of the RGB-D camera, where $({}^{D}{{\mathbf {o}}_{e}},{}^{R}{{\mathbf {o}}_{e}})$ are the ellipse centers, $({}^{D}{{\mathbf {o}}_{s}},{}^{R}{{\mathbf {o}}_{s}})$ the projections of the sphere center, $\left \{ R \right \}$ the RGB coordinate system, $\left \{ D \right \}$ the depth camera coordinate system, ${}_{D}^{R}\mathbf {T}=\left ( {}_{D}^{R}\mathbf {R},{}_{D}^{R}\mathbf {t} \right )$ the rotation and translation between the two coordinate systems, ${}^{D}\mathbf {x}$ a 3D point in the depth camera coordinate system, ${}^{R}\mathbf {C},{}^{D}\mathbf {C}$ the projected ellipses of the sphere on the two image planes, ${}^{R}\mathbf {p},{}^{D}\mathbf {p}$ the projected points of the sphere on the two image planes, and ${{{}^{D}\mathbf {O}}_{s}}$ and ${{r}_{s}}$ the center and radius of the sphere, respectively.

The 3D points ${}^{D}\mathbf {x}$ in the $\left \{ D \right \}$ coordinate system are first transformed into the $\left \{ R \right \}$ coordinate system via

$${}^R\mathbf{x} = {}_D^R\mathbf{T}{}^D\mathbf{x}.$$
The 3D point ${}^{R}\mathbf {x}={{\left [ {{x}^{R}},{{y}^{R}},{{z}^{R}} \right ]}^{T}}$ in the $\left \{ R \right \}$ coordinate system is then projected onto the RGB image plane using the camera intrinsic model proposed by Heikkilä [23]. The model can be described as a pinhole model [24] with radial and tangential distortion correction. ${}^{R}\mathbf {x}$ is then normalized to ${}^{R}{{\mathbf {x}}_{n}}={{\left [ {{x}_{n}},{{y}_{n}} \right ]}^{T}}={{\left [ {{x}^{R}}/{{z}^{R}},\,{{y}^{R}}/{{z}^{R}} \right ]}^{T}}$. The distortion correction is represented as
$${}^{R}{{\mathbf{x}}_{t}}=\left[ \begin{array}{l} 2{{p}_{1}}{{x}_{n}}{{y}_{n}}+{{p}_{2}}\left( {{r}^{2}}+2x_{n}^{2} \right) \\ {{p}_{1}}\left( {{r}^{2}}+2y_{n}^{2} \right)+2{{p}_{2}}{{x}_{n}}{{y}_{n}} \\ \end{array} \right],$$
$${}^{R}{{\mathbf{x}}_{r}}=\left( 1+{{k}_{1}}{{r}^{2}}+{{k}_{2}}{{r}^{4}}+{{k}_{5}}{{r}^{6}} \right){}^{R}{{\mathbf{x}}_{n}}+{}^{R}{{\mathbf{x}}_{t}},$$
where ${{r}^{2}}=x_{n}^{2}+y_{n}^{2}$ and $\mathbf {d}{{\mathbf {k}}_{R}}={{\left [ {{k}_{1}},{{k}_{2}},{{p}_{1}},{{p}_{2}},{{k}_{5}} \right ]}^{T}}$ is the vector of distortion coefficients. The image coordinates are finally obtained as
$${}^{R}{{s}^{R}}\mathbf{p}={}^{R}\mathbf{K}{}^{R}{{\mathbf{x}}_{r}},{{\textrm{ }}^{R}}\mathbf{K}=\left[ \begin{matrix} {}^{R}{{f}_{x}} & {}^{R}s & {}^{R}{{u}_{0}} \\ 0 & {}^{R}{{f}_{y}} & {}^{R}{{v}_{0}} \\ 0 & 0 & 1 \\ \end{matrix} \right],$$
where ${}^{R}s$ is an unknown scale factor, $\left ( {}^{R}{{f}_{x}},\textrm { }{}^{R}{{f}_{y}} \right )$ the focal lengths and $\left ( {}^{R}{{u}_{0}},\textrm { }{}^{R}{{v}_{0}} \right )$ the principal point.
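
For concreteness, the projection chain of Eqs. (1)–(4) can be written as a minimal NumPy sketch; the function and variable names are ours, and the parameter values are assumed to be supplied by the user.

```python
import numpy as np

def project_to_rgb(x_D, R, t, K_R, dk_R):
    """Project a 3D point from the depth frame {D} onto the RGB image plane, Eqs. (1)-(4).
    dk_R = [k1, k2, p1, p2, k5] as in the text."""
    k1, k2, p1, p2, k5 = dk_R
    x_R = R @ x_D + t                                # Eq. (1): {D} -> {R}
    xn, yn = x_R[0] / x_R[2], x_R[1] / x_R[2]        # normalization
    r2 = xn**2 + yn**2
    # Eq. (2): tangential distortion
    xt = 2*p1*xn*yn + p2*(r2 + 2*xn**2)
    yt = p1*(r2 + 2*yn**2) + 2*p2*xn*yn
    # Eq. (3): radial distortion
    radial = 1 + k1*r2 + k2*r2**2 + k5*r2**3
    xr = np.array([radial*xn + xt, radial*yn + yt, 1.0])
    # Eq. (4): intrinsic mapping, then division by the scale factor
    p = K_R @ xr
    return p[:2] / p[2]
```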

It is worth noting that the intrinsic model of the depth camera is similar to that of the color camera. However, the depth camera distortion model differs from that of the color camera. The distortion in the latter is defined by the forward projection model (world to image), while the distortion in the former is expressed by the backward projection model (image to world). Previous research [23,25] has shown that the geometric distortion model of the depth camera can be obtained by switching the roles of ${{\mathbf {x}}_{n}}$ and ${{\mathbf {x}}_{r}}$ in Eq. (2) and Eq. (3).

The depth information obtained by the RGB-D camera is stored in the depth image. Each pixel in the depth image contains the pixel position and distance information along the $z$ direction denoted as ${}^{D}\mathbf {p}={{\left [ {{u}^{D}},{{v}^{D}},{{z}^{D}} \right ]}^{T}}$. Its unique corresponding 3D point coordinate ${}^{D}\mathbf {x}$ can also be retrieved via

$${}^{D}\tilde{\mathbf{x}}=\left[ \begin{matrix} {}^{D}{{\mathbf{K}}^{-1}} & {{\mathbf{0}}_{3\times 1}} \\ {{\mathbf{0}}_{1\times 3}} & 1 \\ \end{matrix} \right]{}^{D}\tilde{\mathbf{p}},{{\textrm{ }}^{D}}\mathbf{K}=\left[ \begin{matrix} {}^{D}{{f}_{x}} & {}^{D}s & {}^{D}{{u}_{0}} \\ 0 & {}^{D}{{f}_{y}} & {}^{D}{{v}_{0}} \\ 0 & 0 & 1 \\ \end{matrix} \right],$$
where the tilde marks the vector as a homogeneous coordinate and $^{D}\tilde {\mathbf {p}}={{\left [ {{u}^{D}}{{z}^{D}},{{v}^{D}}{{z}^{D}},{{z}^{D}},1 \right ]}^{T}}$.
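
The corresponding back-projection of Eq. (5) is a one-liner; the sketch below uses our own naming and omits the depth-camera distortion for brevity.

```python
import numpy as np

def backproject_depth(u, v, z, K_D):
    """Recover the 3D point in {D} from a depth pixel (u, v, z) via Eq. (5)."""
    p_tilde = np.array([u * z, v * z, z])
    return np.linalg.inv(K_D) @ p_tilde     # = [x_D, y_D, z_D]
```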

2.2 Definition of the spherical projection quadratic curve

Let the center of the sphere be ${}^{D}{{\mathbf {O}}_{s}}$ and its radius be ${{r}_{s}}$. The sphere can be algebraically represented by a quadric surface [24] as

$${}^{D}\mathbf{Q}=\left[ \begin{matrix} {{\mathbf{I}}_{3}} & -{}^{D}{{\mathbf{O}}_{s}} \\ -{}^{D}\mathbf{O}_{s}^{T} & {}^{D}\mathbf{O}_{s}^{T}{}^{D}{{\mathbf{O}}_{s}}-r_{s}^{2} \\ \end{matrix} \right],$$
where $\mathbf {I}$ is the identity matrix.

Generally, the projection of a spherical surface on a camera image plane is an ellipse. The quadratic curve matrix of the sphere projection on the RGB image plane, predicted from quantities expressed in the depth coordinate system, is denoted as ${}^{R}{{\mathbf {C}}^{*}}$. The conversion relationship is

$${}^{R}{{\mathbf{C}}^{*}}={}_{D}^{R}\mathbf{M}{}^{D}{{\mathbf{Q}}^{*}}{}_{D}^{R}{{\mathbf{M}}^{T}},$$
where ${}_{D}^{R}\mathbf {M}={}^{R}\mathbf {K}{}_{D}^{R}\mathbf {T}$ and ${}^{D}{{\mathbf {Q}}^{*}}$ is the dual quadric of ${}^{D}\mathbf {Q}$ [24]. The elliptic quadratic curve matrix detected directly from the RGB image plane is ${}^{R}\mathbf {C}$. The quadratic curves ${}^{R}\mathbf {C}$ and ${}^{R}{{\mathbf {C}}^{*}}$ are important constraints in the global optimization phase of the calibration framework, which is crucial for improving the overall calibration accuracy of the RGB-D camera.
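
Equations (6)–(7) can be evaluated directly. The sketch below assumes the dual quadric is taken as the inverse of $\mathbf{Q}$ (valid up to scale for a non-degenerate sphere) and normalizes the resulting conics for readability; the naming is ours.

```python
import numpy as np

def sphere_dual_conic(O_s, r_s, K_R, R, t):
    """Project a sphere (center O_s, radius r_s, in {D}) onto the RGB image plane,
    Eqs. (6)-(7). Returns the dual conic C* and the corresponding point conic C."""
    # Eq. (6): quadric matrix of the sphere
    Q = np.block([[np.eye(3),           -O_s.reshape(3, 1)],
                  [-O_s.reshape(1, 3),  np.array([[O_s @ O_s - r_s**2]])]])
    Q_star = np.linalg.inv(Q)                    # dual quadric (up to scale)
    M = K_R @ np.hstack([R, t.reshape(3, 1)])    # 3x4 projection matrix M = K [R | t]
    C_star = M @ Q_star @ M.T                    # Eq. (7): dual conic on the image
    C = np.linalg.inv(C_star)                    # point conic (up to scale)
    return C_star / C_star[2, 2], C / C[2, 2]
```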

2.3 Convert ellipse center to projection sphere center

Usually, the centers $({}^{D}{{\mathbf {o}}_{e}},{}^{R}{{\mathbf {o}}_{e}})$ of the two ellipses do not coincide with the projections $({}^{D}{{\mathbf {o}}_{s}},{}^{R}{{\mathbf {o}}_{s}})$ of the sphere center, as shown in Fig. 1. The center of the ellipse on the depth image is very close to the projection of the sphere center, with an error of less than 1 pixel [26]. Therefore, in our work, the projection of the sphere center on the depth image is approximated by the ellipse center, and the closed-form expression provided by [24] is used to convert the ellipse center on the RGB image plane to the projection of the sphere center:

$${}^{D}{{\mathbf{o}}_{s}}\approx {}^{D}{{\mathbf{o}}_{e}},$$
$${}^{R}{{\mathbf{o}}_{s}}={}^{R}{{\mathbf{o}}_{e}}(1-\gamma )+{{\left[ {}^{R}{{u}_{0}},{}^{R}{{v}_{0}} \right]}^{T}}\gamma,$$
where $\gamma ={{\left ( {{r}_{s}}/{{{}^{D}\mathbf {O}}_{s}}(3) \right )}^{2}}$ and ${{{}^{D}\mathbf {O}}_{s}}(3)$ is the third element of the sphere center vector ${{{}^{D}\mathbf {O}}_{s}}$, and the sphere radius ${{r}_{s}}$ is known.
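
A direct implementation of Eqs. (8)–(9), assuming the sphere-center depth ${}^{D}\mathbf{O}_s(3)$ comes from the depth-image sphere fit:

```python
import numpy as np

def ellipse_to_sphere_center(o_e_R, K_R, O_s_D, r_s):
    """Convert the detected ellipse center on the RGB image to the projection of the
    sphere center, Eq. (9). On the depth image the two are simply identified, Eq. (8)."""
    u0, v0 = K_R[0, 2], K_R[1, 2]
    gamma = (r_s / O_s_D[2])**2            # (r_s / z-coordinate of the sphere center)^2
    return o_e_R * (1 - gamma) + np.array([u0, v0]) * gamma
```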

3. Calibration approach

3.1 Overview

Multiple RGB-depth image pairs are input into the calibration algorithm shown in Fig. 2 to accurately calibrate the RGB-D camera. In this section, the image feature extraction methods are introduced in detail, and the mathematical principles of the key components of the calibration framework (initial parameter estimation, depth error correction, global optimization) are explained.

Fig. 2. Framework of the sphere-based RGB-D camera calibration method. The dotted line indicates that the two line segments do not intersect.

3.2 Feature-extraction process

In the feature extraction phase, ellipse detection on the RGB image is carried out automatically at high speed without the need to manually select spherical areas. Based on the RANSAC method [27], outliers on the sphere point cloud in the $\left \{ D \right \}$ coordinate system are excluded to enhance the robustness of the detection system.

For the RGB images, a high-efficiency arc-support line segment ellipse detector [28] is used. The detector simplifies the complex expressions of the curves in the RGB images while retaining the general properties of convexity and polarity of the elliptic curves. By counting pixels, arc-support line segments that potentially belong to a common ellipse are iteratively and robustly connected to form an arc-support group. The group with the highest significance is selected to fit an ellipse, and local and global initial ellipse sets are generated from all the valid paired arc-support groups. The superposition principle of fast ellipse fitting is used to improve the fitting speed. Convex ellipse candidates are then obtained through hierarchical clustering of the 5D parameter space of the initial ellipse sets.
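
The detector of Ref. [28] is purpose-built; as a rough stand-in only, the sketch below fits an ellipse with plain OpenCV edge detection and cv2.fitEllipse. It should not be taken as the authors' detector, and the thresholds are illustrative.

```python
import cv2
import numpy as np

def detect_sphere_ellipse(rgb):
    """Crude substitute for the arc-support detector of Ref. [28]: fit an ellipse to
    the longest contour found in the edge map. Thresholds are illustrative only."""
    gray = cv2.cvtColor(rgb, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 80, 160)
    contours, _ = cv2.findContours(edges, cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE)
    best, best_len = None, 0
    for c in contours:
        if len(c) >= 50 and len(c) > best_len:      # need enough points for a stable fit
            best, best_len = c, len(c)
    if best is None:
        return None
    (cx, cy), (MA, ma), angle = cv2.fitEllipse(best)
    return np.array([cx, cy]), (MA / 2, ma / 2), angle   # center, semi-axes, orientation
```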

Because of the highly irregular depth values at the sphere edges in the depth image, it is not feasible to use the same ellipse detection method as in the RGB image to detect the ellipse center in the depth image. In this study, we first use a bounding box to determine the motion area of the elliptical pixels in the depth image, and upper and lower depth thresholds to extract the sphere projection area. The preselected depth camera intrinsic parameters $^{D}\mathbf {K}$ are then used to project each sphere pixel $^{D}\tilde {\mathbf {p}}$ into 3D space. The RANSAC algorithm is used to fit the sphere while excluding outliers. Finally, the center of the fitted sphere is re-projected onto the depth image.
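
A minimal sketch of such a RANSAC sphere fit on the depth point cloud, assuming the sphere radius $r_s$ is known (as in the paper) and using an illustrative inlier threshold; the function names are ours.

```python
import numpy as np

def fit_sphere_lsq(pts):
    """Algebraic least-squares sphere fit: ||p - c||^2 = r^2 is linear in (2c, r^2 - |c|^2)."""
    A = np.hstack([2 * pts, np.ones((len(pts), 1))])
    b = np.sum(pts**2, axis=1)
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    center = sol[:3]
    radius = np.sqrt(sol[3] + center @ center)
    return center, radius

def ransac_sphere(pts, r_s, n_iter=200, thresh=0.01, rng=np.random.default_rng(0)):
    """RANSAC sphere fit with known radius r_s; thresh is an illustrative inlier
    distance (same units as pts, e.g., meters)."""
    best_inliers = np.zeros(len(pts), dtype=bool)
    for _ in range(n_iter):
        sample = pts[rng.choice(len(pts), 4, replace=False)]   # minimal 4-point sample
        center, _ = fit_sphere_lsq(sample)
        d = np.abs(np.linalg.norm(pts - center, axis=1) - r_s)
        inliers = d < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < 4:          # fallback so the final fit is always defined
        best_inliers[:] = True
    return fit_sphere_lsq(pts[best_inliers]), best_inliers
```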

3.3 Initial calibration

In the initial calibration phase, the initial values of the RGB-D camera parameters are estimated. The method in [29], which has mature implementations in OpenCV and MATLAB, is used to estimate the initial parameters of the RGB camera. The initial values of the intrinsic parameter ${}^{D}\mathbf {K}$ of the depth camera and the extrinsic parameter ${}_{D}^{R}\mathbf {T}$ between the two cameras are solved as follows:

Equations (8) and (9) are first used to convert the obtained ellipse center $\left ( {}^{D}{{\mathbf {o}}_{{{e}_{i}}}},{}^{R}{{\mathbf {o}}_{{{e}_{i}}}} \right )$ to the sphere projection center $\left ( {}^{D}{{\mathbf {o}}_{{{s}_{i}}}},{}^{R}{{\mathbf {o}}_{{{s}_{i}}}} \right )$ that is used as the input of the initial parameter estimation. According to Eq. (4) and Eq. (5) in Section 2.1, the following relations are satisfied:

$${}^{D}{{\tilde{\mathbf{O}}}_{{{s}_{i}}}}=\left[ \begin{matrix} {}^{D}{{\mathbf{K}}^{-1}} & {{\mathbf{0}}_{3\times 1}} \\ {{\mathbf{0}}_{1\times 3}} & 1 \\ \end{matrix} \right]{}^{D}{{\tilde{\mathbf{o}}}_{{{s}_{i}}}},$$
$${}^{R}\lambda {}^{R}{{\tilde{\mathbf{o}}}_{{{s}_{i}}}}={}^{R}\mathbf{K}{}_{D}^{R}\mathbf{T}{}^{D}{{\tilde{\mathbf{O}}}_{{{s}_{i}}}}.$$
Multiplying both sides of Eq. (11) by matrix ${}^{R}{{\mathbf {K}}^{-1}}$, we obtain
$${}^{R}\lambda {}^{R}{{\mathbf{K}}^{-1}}{}^{R}{{\tilde{\mathbf{o}}}_{{{s}_{i}}}}={}_{D}^{R}\mathbf{T}{}^{D}{{\tilde{\mathbf{O}}}_{{{s}_{i}}}}.$$

Multiplying both sides of Eq. (12) by the skew-symmetric matrix associated with ${}^{R}{{\mathbf {K}}^{-1}}{}^{R}{{\tilde {\mathbf {o}}}_{{{s}_{i}}}}$, the left side of Eq. (12) becomes $\left ( {}^{R}{{\mathbf {K}}^{-1}}{}^{R}{{{\tilde {\mathbf {o}}}}_{{{s}_{i}}}} \right )\times \left ( {}^{R}{{\mathbf {K}}^{-1}}{}^{R}{{{\tilde {\mathbf {o}}}}_{{{s}_{i}}}} \right )=\mathbf {0}$, which eliminates the effect of the unknown scale factor, and the right side of Eq. (12) becomes $\left ( {}^{R}{{\mathbf {K}}^{-1}}{}^{R}{{{\tilde {\mathbf {o}}}}_{{{s}_{i}}}} \right )\times {}_{D}^{R}\mathbf {T}{}^{D}{{\tilde {\mathbf {O}}}_{{{s}_{i}}}}={{\left ( {}^{R}{{\mathbf {K}}^{-1}}{}^{R}{{{\tilde {\mathbf {o}}}}_{{{s}_{i}}}} \right )}_{\times }}{}_{D}^{R}\mathbf {H}{}^{D}{{\tilde {\mathbf {o}}}_{{{s}_{i}}}}$. We thus obtain the constraint

$${{\left( {}^{R}{{\mathbf{K}}^{-1}}{}^{R}{{{\tilde{\mathbf{o}}}}_{{{s}_{i}}}} \right)}_{\times }}{}_{D}^{R}\mathbf{H}{}^{D}{{\tilde{\mathbf{o}}}_{{{s}_{i}}}}={{\mathbf{0}}_{3\times 1}},$$
where ${}_{D}^{R}\mathbf {H}={}_{D}^{R}\mathbf {T}\left [ \begin {matrix} {}^{D}{{\mathbf {K}}^{-1}} & {{\mathbf {0}}_{3\times 1}} \\ {{\mathbf {0}}_{1\times 3}} & 1 \\ \end {matrix} \right ]$.

In Eq. (13), the unknown ${}_{D}^{R}\mathbf {H}$ is a matrix with 3 rows and 4 columns. Converting this matrix equation into a standard linear system facilitates its solution.

We define the linear mapping from matrix space to vector space as $\sigma :\textrm { }{{\mathbb {R}}^{m\times n}}\to {{\mathbb {R}}^{mn}}$, which stacks the rows of the matrix, in order, into a single column vector, i.e.,

$$\sigma ({{{{[}^{R}}{{\mathbf{K}}^{-1}}{}^{R}{{\tilde{\mathbf{o}}}_{{{s}_{i}}}}]}_{\times }}{}_{D}^{R}\mathbf{H}{}^{D}{{\tilde{\mathbf{o}}}_{{{s}_{i}}}})=[{{{{[}^{R}}{{\mathbf{K}}^{-1}}{}^{R}{{\tilde{\mathbf{o}}}_{{{s}_{i}}}}]}_{\times }}\otimes {}^{D}{{\tilde{\mathbf{o}}}_{{{s}_{i}}}}]\sigma ({}_{D}^{R}\mathbf{H}),$$
Then,
$$[{{{{[}^{R}}{{\mathbf{K}}^{-1}}{}^{R}{{\tilde{\mathbf{o}}}_{{{s}_{i}}}}]}_{\times }}\otimes {}^{D}{{\tilde{\mathbf{o}}}_{{{s}_{i}}}}]\sigma ({}_{D}^{R}\mathbf{H})={{\mathbf{0}}_{3\times 1}},$$
where $\otimes$ represents the Kronecker product.

Finally, Eq. (15) is re-expressed as a system of two linearly independent homogeneous equations

$$\mathbf{b}_{i}^{T}\mathbf{h}={{\mathbf{0}}_{2\times 1}},$$
where
$$\mathbf{b}_{i}^{T}={{{{[}^{R}}{{\mathbf{K}}^{-1}}{}^{R}{{\tilde{\mathbf{o}}}_{{{s}_{i}}}}]}_{\times }}\otimes {}^{D}{{\tilde{\mathbf{o}}}_{{{s}_{i}}}}=\left[ \begin{matrix} {{\mathbf{0}}_{1\times 4}} & -{}^{D}\tilde{\mathbf{o}}_{{{s}_{i}}}^{T} & \psi (2){}^{D}\tilde{\mathbf{o}}_{{{s}_{i}}}^{T} \\ {}^{D}\tilde{\mathbf{o}}_{{{s}_{i}}}^{T} & {{\mathbf{0}}_{1\times 4}} & -\psi (1){}^{D}\tilde{\mathbf{o}}_{{{s}_{i}}}^{T} \\ \end{matrix} \right],$$
$$\mathbf{h}={{\left[ {{H}_{11}},{{H}_{12}}\textrm{ },\ldots,{{H}_{34}} \right]}^{T}}\in {{\mathbb{R}}^{12}},$$
in which $\psi (n)$ is the nth (n=1,2) component of the vector $^{R}{{\mathbf {K}}^{-1}}{}^{R}{{\tilde {\mathbf {o}}}_{{{s}_{i}}}}$.
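
The two constraint rows of Eq. (17) can be assembled with a Kronecker product exactly as in Eq. (14); a short sketch with our own naming follows.

```python
import numpy as np

def skew(v):
    """Skew-symmetric (cross-product) matrix of a 3-vector."""
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])

def constraint_rows(o_R, o_D_tilde, K_R):
    """Two linearly independent rows of Eq. (17) for one sphere-center correspondence.
    o_R: pixel center on the RGB image (2,); o_D_tilde: [u*z, v*z, z, 1] from the depth image."""
    psi = np.linalg.inv(K_R) @ np.append(o_R, 1.0)
    B_i = np.kron(skew(psi), o_D_tilde.reshape(1, 4))   # 3 x 12; only two rows are independent
    return B_i[:2]
```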

According to Eq. (18), the vector $\mathbf {h}$ contains 12 elements. Dividing all the elements in $\mathbf {h}$ by ${{H}_{34}}$, the number of homogeneous equations required to solve all the variables in $\mathbf {h}$ is 11. It can be seen from the derived $\mathbf {b}_{i}^{T}$ that each pair of $({}^{D}{{\tilde {\mathbf {o}}}_{{{s}_{i}}}},{}^{R}{{\tilde {\mathbf {o}}}_{{{s}_{i}}}})$ can provide two linearly independent constraint equations, which means that at least $N\left ( N\ge 6 \right )$ RGB-Depth image pairs are required. The linear system can finally be obtained as

$$\mathbf{Bh}={{\mathbf{0}}_{2N\times 1}},$$
where $\mathbf {B}$ is a $2N\times 12$ matrix.

The singular value decomposition (SVD) of ${{\mathbf {B}}^{T}}\mathbf {B}$ can be used to obtain the unique non-zero solution of the homogeneous Eq. (19), under the condition $\left \| \mathbf {h} \right \|=1$, as the eigenvector of ${{\mathbf {B}}^{T}}\mathbf {B}$ corresponding to its smallest eigenvalue (equivalently, the right singular vector of $\mathbf {B}$ associated with its smallest singular value).

Once ${}_{D}^{R}\mathbf {H}$ is estimated, ${}^{D}\mathbf {K}$ and ${}_{D}^{R}\mathbf {T}$ can be recovered using QR decomposition (function qr in MATLAB). It is worth noting that ${}_{D}^{R}\mathbf {H}$ is estimated within the framework of the RANSAC algorithm, which can effectively exclude outliers (blurred frames). Specifically, all $\left ( {}^{D}{{\mathbf {o}}_{{{s}_{i}}}},{}^{R}{{\mathbf {o}}_{{{s}_{i}}}} \right )$ pairs and the initial $^{R}\mathbf {K}$ are input into the RANSAC algorithm, which randomly selects a set of six correspondences in each iteration to estimate the calibration parameters and then uses all data points in the consensus-voting step. A sphere-center reprojection error threshold is set (e.g., 5 pixels); when the reprojection error of the sphere center exceeds this threshold, the correspondence is regarded as an outlier. The initial parameter estimate therefore ensures that the sphere-center reprojection error of all retained image frames is within the threshold.
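
A sketch of the linear solution of Eq. (19) and the QR-based recovery of ${}^{D}\mathbf{K}$ and ${}_{D}^{R}\mathbf{T}$ from ${}_{D}^{R}\mathbf{H}=[\mathbf{R}\,{}^{D}\mathbf{K}^{-1}\;|\;\mathbf{t}]$; the sign and scale handling shown here is a simplification, not the authors' exact code.

```python
import numpy as np

def solve_H(B):
    """Solve B h = 0, Eq. (19): the right singular vector of B with the smallest
    singular value (equivalently, the smallest eigenvector of B^T B)."""
    _, _, Vt = np.linalg.svd(B)
    return Vt[-1].reshape(3, 4)

def decompose_H(H):
    """Recover K_D and T = [R | t] from H = [R K_D^{-1} | t], fixing scale so K_D[2,2] = 1."""
    Q, U = np.linalg.qr(H[:, :3])          # Q orthogonal, U upper triangular = (scaled) K_D^{-1}
    S = np.diag(np.sign(np.diag(U)))       # force a positive diagonal on the triangular factor
    R, U = Q @ S, S @ U
    K_D = np.linalg.inv(U)
    scale = K_D[2, 2]                      # H (and hence K_D) is known only up to scale
    return K_D / scale, R, H[:, 3] * scale
```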

3.4 Depth error correction

In this study, to improve the depth measurement accuracy, the sphere observed in the RGB coordinate system is used to provide constraints on the depth measurement. Assuming that the center of the sphere coincides with the origin of the world coordinate system, the great circle of the sphere is defined as the maximum section circle. The proposed depth error correction method consists of three parts: the selection of key points on the maximum section circle and the RGB image plane, the estimation of the 3D positions of the key points in the RGB coordinate system, and the depth error correction strategy.

The key points are first selected on the maximum section circle. Starting from a starting point on the circle, we divide the circle into 16 segments of equal arc length in the counterclockwise direction, e.g., $\left ({{{\overset {\scriptscriptstyle \frown }{L}}}_{12}}={{{\overset {\scriptscriptstyle \frown }{L}}}_{25}} \right )$. The central angle $\alpha$ subtended by each segment is the same, and 16 key points are generated on the edge of the section circle (red dots). The angle between the axis ${{X}_{w}}$ and the line connecting the nth (n = 1, 2, …, 16) edge key point to the center of the sphere is ${{\theta }_{n}}=n\alpha$. We can easily calculate the 3D coordinates ${{\mathbf {P}}_{nw}}$ of each key point on the edge of the maximum section circle by using three quantities: the sphere center ${{O}_{s}}$, the radius ${{r}_{s}}$, and the corresponding angle ${{\theta }_{n}}$. Without loss of generality, we assume the maximum section circle lies in the plane $Z = 0$ of the world coordinate system. The key points enclosed by the edge are obtained as linear combinations of the edge points, e.g., $x\_{{P}_{20}}=x\_{{P}_{2}},y\_{{P}_{20}}=y\_{{P}_{17}}$, as shown in Fig. 3(a). Finally, the 3D key points ${{\mathbf {P}}_{w}}$ are obtained.
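
A sketch of the edge key-point construction on the maximum section circle; the interior key points, which the text forms as linear combinations of edge coordinates, are omitted here.

```python
import numpy as np

def section_circle_keypoints(r_s, n_seg=16):
    """Edge key points on the maximum section circle (world frame: sphere center at the
    origin, circle in the Z = 0 plane), as defined in Section 3.4."""
    alpha = 2 * np.pi / n_seg
    theta = alpha * np.arange(1, n_seg + 1)            # theta_n = n * alpha
    edge = np.stack([r_s * np.cos(theta),
                     r_s * np.sin(theta),
                     np.zeros(n_seg)], axis=1)          # 16 x 3 edge key points
    return edge
```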

Fig. 3. Key points and their projection process. (a) Three-dimensional key points on the maximum section circle. (b) Key points on the ellipse of the RGB image plane. (c) Projection of key points in RGB coordinate system.

The projection points on the RGB image of the key points on the circular edge of the maximum section circle are also approximately uniformly distributed. Therefore, the elliptic quadratic curve detected on the RGB image is divided into 16 equal parts to generate uniformly distributed edge key points ${{\mathbf {p}}_{nr}}$. The other key points on the ellipse plane are obtained by linearly combining the edge points, namely, $x\_{{p}_{20}}=x\_{{p}_{2}},y\_{{p}_{20}}=y\_{{p}_{17}}$. Finally, we obtain the 2D key points ${{\mathbf {p}}_{r}}$ that correspond one-to-one with the 3D key points, as shown in Fig. 3(b).

The coordinates of the key points ${{\mathbf {P}}_{w}}$ in the RGB coordinate system can now be estimated. As shown in Fig. 3(c) and Fig. 4(b), given the 3D key points $\mathbf {P}_{w}$, the corresponding 2D key points $\mathbf {p}_{r}$, the RGB camera matrix ${}^{R}\mathbf {K}$, and the distortion coefficients $\mathbf {d}{{\mathbf {k}}_{R}}$, the transformation matrix ${}_{W}^{R}\mathbf {T}$ between the world coordinate system and the RGB camera coordinate system is estimated using the Levenberg-Marquardt iterative optimization method (function solvePnP in OpenCV [30]). The 3D coordinates of the 41 key points on the maximum section circle of the sphere in the RGB camera coordinate system are thus obtained and recorded as ${{\mathbf {P}}_{R}}$.
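
A short sketch of this step using OpenCV's solvePnP, which the text cites; the array shapes and float conversions are assumptions on our part.

```python
import cv2
import numpy as np

def keypoints_in_rgb_frame(P_w, p_r, K_R, dk_R):
    """Estimate {W}->{R} with solvePnP (iterative LM, as in the text) and express the
    3D key points in the RGB camera frame. P_w: Nx3, p_r: Nx2 matched key points."""
    ok, rvec, tvec = cv2.solvePnP(P_w.astype(np.float64), p_r.astype(np.float64),
                                  K_R.astype(np.float64),
                                  np.asarray(dk_R, dtype=np.float64),
                                  flags=cv2.SOLVEPNP_ITERATIVE)
    R, _ = cv2.Rodrigues(rvec)
    P_R = (R @ P_w.T + tvec.reshape(3, 1)).T      # key points expressed in {R}
    return P_R, R, tvec
```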

Fig. 4. Depth error correction framework. (a) Key points (purple points) of the maximum section circle are projected onto the plane ${{\pi }_{d}}$ along the line of sight. (b) Principle flow chart of the correction framework. (c) The process of obtaining a calibrated sphere center using corrected key points.

The error correction strategy is shown in Fig. 4(b). First, 41 key points in the coordinate system $\left \{ R \right \}$ are transformed into the coordinate system $\left \{ D \right \}$ via ${}_{D}^{R}{{\mathbf {T}}^{-1}}$ and denoted as ${{\mathbf {P}}_{D}}$. Using the intrinsic parameters ${}^{D}\mathbf {K}$ of the depth sensor and the distortion coefficient $\mathbf {d}{{\mathbf {k}}_{D}}$, the spherical area in the depth image is then back-projected to the 3D space, and the spherical center ${}^{D}\mathbf {O}_{s}^{*}$ is estimated. Using the spherical center ${}^{D}\mathbf {O}_{s}^{*}$ and the space vector $\overrightarrow {{{\mathbf {O}}_{D}}{}^{D}\mathbf {O}_{s}^{*}}$, the space plane ${{\pi }_{d}}$ can be uniquely determined. ${{\mathbf {P}}_{D}}$ is projected onto the space plane ${{\pi }_{d}}$ along the line of sight of the depth sensor to form 41 new key points ${{\mathbf {P}}_{ND}}$, as shown in Fig. 4(a).

By connecting each new key point ${{\mathbf {P}}_{ND}}$ to the origin ${{\mathbf {O}}_{D}}$ of the $\left \{ D \right \}$ coordinate system, the direction vector of each straight line can be obtained. These direction vectors, together with the sphere radius as a distance constraint, are used to obtain 41 spherical points, denoted as ${{\mathbf {P}}_{S}}$ and shown as the blue points in Fig. 4(c). The linear least squares method is used to fit the 41 spherical points to obtain the sphere center ${}^{D}\mathbf {O}_{s\_new}^{*}$. We believe that the newly obtained sphere center has a more accurate depth than the sphere center obtained by directly fitting the original point cloud. By applying the correction strategy described above to all the input RGB-depth image pairs, the measurement error of the depth sensor can be corrected.
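
A hedged sketch of the first two steps of this strategy (transforming the key points into $\{D\}$ and sliding them along the line of sight onto ${\pi}_{d}$); the subsequent radius-constrained refit of the sphere center is not reproduced here.

```python
import numpy as np

def project_keypoints_to_plane(P_R, T_RD, O_raw):
    """Bring the RGB-frame key points into {D} and slide each one along the depth
    sensor's line of sight onto the plane pi_d defined by the raw sphere center O_raw
    and the direction O_D -> O_raw (Fig. 4(a)). P_R: Nx3, T_RD: 4x4 transform {D}->{R}."""
    R, t = T_RD[:3, :3], T_RD[:3, 3]
    P_D = (R.T @ (P_R - t).T).T                 # {R} -> {D} (R is orthonormal)
    n = O_raw / np.linalg.norm(O_raw)           # normal of pi_d, along O_D -> O_raw
    scale = (n @ O_raw) / (P_D @ n)             # ray-plane intersection factor per point
    return P_D * scale[:, None]                 # new key points P_ND on pi_d
```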

Accurate extrinsic parameters between the two sensors are required for the depth error correction process. Unfortunately, the extrinsic parameters obtained in the initial stage are only rough estimates. To solve this problem, a global method for simultaneously calibrating intrinsic and extrinsic parameters and depth measurement errors is proposed.

3.5 Global nonlinear optimization

The nonlinear global optimization consists of two important parts: minimizing the spherical center reprojection errors and minimizing the errors between the elliptic quadratic curves ${}^{R}{{\mathbf {C}}^{*}}$ and ${}^{R}\mathbf {C}$. To improve the accuracy of the overall calibration, Eq. (9) is used to convert the ellipse center ${}^{R}{{\tilde {\mathbf {o}}}_{e}}$ to the spherical projection center ${}^{R}{{\tilde {\mathbf {o}}}_{s}}$.

Using the pinhole camera model, the projection of the spherical center (in the $\left \{ D \right \}$ coordinate system) on the RGB image satisfies

$${}^{R}\lambda {}^{R}{{\tilde{\mathbf{o}}}_{s}}^{*}={}^{R}\mathbf{K}\mathbf{d}{{\mathbf{k}}_{R}}\left\{ {}_{D}^{R}\mathbf{T}\mathbf{d}{{\mathbf{k}}_{D}}\left\{ {}^{D}\mathbf{O}_{s\_new}^{*} \right\} \right\}.$$

It is assumed that ${}^{R}{{\tilde {\mathbf {o}}}_{s}}^{*}$ follows a Gaussian distribution with the ground truth position as ${}^{R}{{\tilde {\mathbf {o}}}_{s}}$ and covariance $\mathbf {\Phi }$, that is,

$${}^{R}{{\tilde{\mathbf{o}}}_{s}}^{*}\sim\mathcal{N}\left( {}^{R}{{{\tilde{\mathbf{o}}}}_{s}},\mathbf{\Phi } \right).$$

Considering the spherical center reprojection error as a constraint, the log-likelihood function can be written as

$$Loss1=-\frac{1}{2M}\sum_{i=1}^{M}{\epsilon _{i}^{T}}\mathbf{\Phi }_{i}^{-1}{{\epsilon }_{i}},$$
where, in order to give each reprojection error the same importance in the calibration process, $\mathbf {\Phi }_{i}$ is defined as the identity matrix. The reprojection error is given by
$${{\epsilon }_{i}}= {}^{R}{{{\tilde{\mathbf{o}}}}_{{{s}_{i}}}}^{*}-{}^{R}{{{\tilde{\mathbf{o}}}}_{{{s}_{i}}}} = {}^{R}\mathbf{K}{}_{D}^{R}\mathbf{T}{}^{D}\mathbf{O}_{s\_ne{{w}_{i}}}^{*}-{}^{R}{{{\tilde{\mathbf{o}}}}_{{{s}_{i}}}} .$$

Another important constraint for the global optimization is the spherical projection quadratic curve itself. In our work, the log-likelihood of the spherical projection quadratic curve is given by

$$Loss2=-\frac{1}{2M}\sum_{i=1}^{M}{\frac{1}{{{W}_{i}}}}\left\| {}^{R}{{\mathbf{C}}_{i}}-{}^{R}{{\mathbf{C}}_{i}}^{*} \right\|_{F}^{2},$$
where $\|\textrm { }\cdot \textrm { }{{\|}_{F}}$ represents the Frobenius norm, and ${}^{R}\mathbf {C}_{i}^{*}={}_{D}^{R}\mathbf {M}\,{}^{D}\mathbf {Q}_{i}^{*}\,{}_{D}^{R}{{\mathbf {M}}^{T}}$ is a restatement of Eq. (7). In order to reduce the influence of increasing measurement distance on the measurement accuracy of the Kinect sensor, we draw on earlier research [7,10] and let ${{W}_{i}}={{\left ( \sigma _{z}^{i} \right )}^{2}}$, which is a quadratic function of the average distance to the sphere measured by the depth camera. The two loss functions are finally combined to obtain the maximum log-likelihood
$$\underset{{}^{R}\mathbf{K},{}^{D}\mathbf{K},{}_{D}^{R}\mathbf{T},\mathbf{d}{{\mathbf{k}}_{R}},\mathbf{d}{{\mathbf{k}}_{D}}}{\mathop{\max }}\,{{\rho }_{1}}Loss1+{{\rho }_{2}}Loss2.$$

In the above, ${{\rho }_{i}}\left ( i=1,2 \right )$ are weighting parameters introduced so that the two contributions have similar magnitudes. In addition, the objective function is a combination of the 2-norm and the Frobenius norm, which is a convex function [31]. The local minimum of a convex function is also its global minimum, so the global minimum can always be obtained.

Figure 5 shows the workflow of the nonlinear global optimization, where the inputs are the depth map points, the initial calibration parameters, and ${}^{R}\mathbf {C}$. In the global optimization process, ${}^{D}\mathbf {O}_{s\_new}^{*}$ is recalculated at each iteration as ${}^{D}\mathbf {K}$ and $\mathbf {d}{{\mathbf {k}}_{D}}$ are iteratively estimated. Specifically, ${}^{D}\mathbf {O}_{s\_new}^{*}$ is obtained by using the depth error correction algorithm described in Section 3.4. The global optimization process adopts the Levenberg-Marquardt method to solve the nonlinear least squares problem. The whole process continues until the stopping criterion is met (e.g., the change in the residual between successive iterations falls below a set tolerance).
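
A schematic sketch of how the combined objective of Eq. (25) could be minimized with a Levenberg-Marquardt solver. The `frames` interface (corrected_center, predicted_conic, o_s_rgb, C_rgb, W) and the `unpack` helper are hypothetical placeholders introduced for illustration, not the authors' code.

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(params, unpack, frames, rho1, rho2):
    """Residual vector whose squared norm corresponds to -(rho1*Loss1 + rho2*Loss2).
    `unpack` maps the flat parameter vector back to (K_R, K_D, T, dk_R, dk_D)."""
    K_R, K_D, T, dk_R, dk_D = unpack(params)
    res = []
    for f in frames:
        O_new = f.corrected_center(K_D, dk_D, T)        # recomputed each iteration (Sec. 3.4)
        proj = K_R @ (T[:3, :3] @ O_new + T[:3, 3])
        eps = proj[:2] / proj[2] - f.o_s_rgb            # sphere-center reprojection error, Eq. (23)
        res.append(np.sqrt(rho1) * eps)
        C_star = f.predicted_conic(K_R, T)              # Eq. (7)
        res.append(np.sqrt(rho2 / f.W) * (f.C_rgb - C_star).ravel())
    return np.concatenate(res)

# result = least_squares(residuals, x0, method="lm", args=(unpack, frames, rho1, rho2))
```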

Fig. 5. Workflow diagram for the nonlinear global optimization phase.

4. Experimental evaluation and results

In order to verify the effectiveness of our proposed calibration method, in this section we compare the calibration results with those from the state-of-the-art sphere-based method proposed by Staranowicz et al. [26] and the state-of-the-art checkerboard-based method proposed by Herrera et al. [16]. Compared with the sphere-based method, our method can not only calibrate the intrinsic and extrinsic parameters of the RGB-D camera simultaneously but can also correct the depth measurement error. Compared with the checkerboard-based method, our method does not require an additional external high-definition camera to observe the calibration target and does not require manual feature selection. We also show that our calibration method provides higher calibration accuracy for RGB-D cameras based on structured light technology than these methods.

A Kinect RGB-D camera was selected for our experiments because it is currently the most widely used RGB-D camera based on structured light technology. The accuracy of all the camera parameters was verified experimentally and compared with the values obtained using other advanced methods. In addition, we also performed RGB-D visual odometry using the Kinect camera. The experimental results show that the motion trajectory estimated with the corrected parameters is closer to the actual trajectory than the trajectory estimated with the original parameters, and that the 3D environment reconstruction benefits greatly from our calibration data.

4.1 Experimental implementation process

The Kinect camera was fixed during the calibration to obtain sufficient experimental data. Two high-precision laser rangefinders were located to the left and right of the stand and aligned as closely as possible with the Kinect camera, as shown in Fig. 6. The measurement values of the laser rangefinders were used as reference values for the distance measurement. During the calibration process, only the Kinect camera and the sphere were required, without any other auxiliary hardware or software. In this study, a sphere was moved in front of the camera (1 to 3 meters) for a period of time (14 s at 30 frames/s), and approximately 400 pairs of RGB-depth images were obtained for the Kinect calibration.

Fig. 6. Configuration of the RGB-D camera calibration experiment.

4.2 Depth and color information registration and sphere center reprojection error

The calibration parameters were first qualitatively compared with those from Staranowicz’s sphere-based method and Herrera’s checkerboard-based method. The calibrated Kinect camera was used to obtain the depth and color information and registration of cuboid boxes and spheres. Specifically, the calibrated ${}^{D}\mathbf {K}$ was used to back-project the pixels of the depth image to 3D space, the 3D points were transformed via ${}_{D}^{R}\mathbf {T}$ to the $\left \{ R \right \}$ coordinate system, and ${}^{R}\mathbf {K}$ was then used to project the 3D points onto the RGB image plane for color assignment. Figure 7 shows the registration performance of the two sensors after the same Kinect camera was calibrated using the different methods. It can be intuitively observed that our proposed calibration method performed better than their methods, both of which were run using the original code provided by the authors.
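
A minimal sketch of this registration step, ignoring lens distortion and invalid (zero) depth pixels for brevity; depth_scale is an assumed unit conversion.

```python
import numpy as np

def register_depth_to_rgb(depth, K_D, K_R, R, t, depth_scale=0.001):
    """Back-project every depth pixel with K_D, move it into {R} with (R, t), and
    project it with K_R, giving an RGB pixel coordinate per depth pixel (Fig. 7)."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    z = depth.astype(np.float64) * depth_scale
    pts = np.stack([u * z, v * z, z], axis=-1).reshape(-1, 3)
    X_D = pts @ np.linalg.inv(K_D).T                 # 3D points in {D}, Eq. (5)
    X_R = X_D @ R.T + t                              # 3D points in {R}
    uv = X_R @ K_R.T
    uv = uv[:, :2] / np.clip(uv[:, 2:3], 1e-9, None) # RGB pixel coordinates
    return uv.reshape(h, w, 2)
```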

Fig. 7. Registration results of the two sensors. (a) Default configuration parameters. (b) Herrera’s method. (c) Staranowicz’s method. (d) Our method.

A rigorous quantitative analysis is then used to compare our method with Staranowicz’s sphere-based method, where the pixel reprojection error is used as an important reference index. Figures 8(a) and 8(b) show the reprojection errors of the sphere center in the $X$ and $Y$ directions of the image, respectively. Figure 8(c) shows the sphere center reprojection error distribution. From a comparison of the experimental results, we can clearly see that the sphere center reprojection error obtained by our proposed calibration method is much smaller than that of their method, mainly because the RGB camera provided important constraints during the global optimization stage. Table 1 details the means and variances of the pixel reprojection errors.

Fig. 8. Reprojection errors of the sphere center in the X and Y directions of the RGB image plane and their distribution. (a) Staranowicz’s method. (b) Our method. (c) Sphere center reprojection error distribution.

Table 1. Mean and Variance of Spherical Center Reprojection Error

4.3 Depth correction

In order to verify the validity of the conversion between the world and RGB camera coordinate systems and the rationality of the defined key points, we calculated the error between the true projections of the 3D key points and the defined 2D key points. The specific process is described by Eqs. (26)–(28). Figure 9(a) shows an exemplary projection result, and Fig. 9(b) the distribution of the mean projection error of the key points on each RGB image. In Fig. 9(b), the fluctuation of the mean reprojection error may be caused by the large viewing angle between the sphere and the optical axis of the camera, and the mean reprojection error is no more than 0.05 pixels. Specifically, the average reprojection error of the key points over all RGB images is 0.0032 pixels, and the variance is 0.0221. This accuracy is acceptable, and the variance indicates that the system is very stable.

$${}^{R}\mathbf{X}={}_{W}^{R}\mathbf{R}{}^{W}\mathbf{X}+{}_{W}^{R}\mathbf{t},$$
$${}^{R}\mathbf{p}={}^{R}\mathbf{K}\mathbf{d}{{\mathbf{k}}_{R}}\left\{ {}^{R}\mathbf{X} \right\},$$
$$error=\frac{1}{K}\sum_{k=1}^{K}{\left\| {}^{R}{{\mathbf{p}}_{k}}-{{\mathbf{p}}_{rk}} \right\|}\textrm{ }(K=41),$$
where ${}^{R}{{\mathbf {p}}_{k}}$ is the kth key point projected on the RGB image, and ${{\mathbf {p}}_{rk}}$ is the corresponding kth 2D key point.

Fig. 9. Key-point reprojection results. (a) First row: 2D key points (blue points); second row: true projections of the 3D key points (yellow circles). (b) The distribution of the mean projection error of the key points on each RGB image.

In order to verify the effect of the depth error correction, the Kinect camera equipped with the laser rangefinders was fixed at 1 m, 2 m, and 3 m in front of a flat wall. The planar point clouds obtained from each calibration result were observed from the front, side, and top. For the quantitative evaluation, a flatness error was defined by fitting each point cloud to a plane using the RANSAC-based method and then calculating the distance from all the inliers to the fitted plane. The flatness error is described by

$$erro{{r}_{\textrm{planarity}}}=\sqrt{\frac{1}{S}\sum_{i=1}^{S}{{{\left( \frac{A{{x}_{i}}+B{{y}_{i}}+C{{z}_{i}}+D}{\sqrt{{{A}^{2}}+{{B}^{2}}+{{C}^{2}}}} \right)}^{2}}}},$$
where A, B, C, and D are the fitted plane parameters, S the number of points in the point cloud, and $\left ( {{x}_{i}},{{y}_{i}},{{z}_{i}} \right )$ the point coordinates on the point cloud.
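
A small sketch that evaluates Eq. (29), interpreted as the RMS point-to-plane distance of the RANSAC inliers, given the fitted plane parameters:

```python
import numpy as np

def planarity_error(points, plane):
    """Eq. (29): RMS distance of the inlier points (S x 3) to the plane (A, B, C, D)."""
    A, B, C, D = plane
    d = (points @ np.array([A, B, C]) + D) / np.sqrt(A**2 + B**2 + C**2)
    return np.sqrt(np.mean(d**2))
```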

Figure 10(a) shows the qualitative results of the depth error correction. It can be clearly seen that the checkerboard-based method [16] has large measurement errors at the edges and corners of the planar point cloud and shows a trend of backward extension. The sphere-based method [26] shows an overall convexity in the area below the point cloud plane and a backward extension at the sides and corners. Our proposed method shows good depth measurement results, and the planar point cloud exhibits proper planar characteristics. The red arrows indicate locations where, after calibrating the RGB-D camera with Staranowicz’s sphere-based and Herrera’s checkerboard-based methods, respectively, the depth measurement still has problems. The green arrows indicate that our method performs well at the same locations.

Fig. 10. Depth error correction results. (a) Point clouds of the plane observed from different directions (front, side, top) at different distances (1 m, 2 m, 3 m) after calibrating the RGB-D camera with different methods (Herrera, Staranowicz, ours). (b) Planarity error at different measurement distances.

The result of the quantitative evaluation based on the flatness error is shown in Fig. 10(b). Our proposed method is far superior to the other two methods. When the measurement distance is less than 2.2 meters, Herrera’s checkerboard-based calibration method is better than Staranowicz’s sphere-based method; when the measurement distance is greater than 2.2 meters, its performance is not as good as that of the latter. Table 2 shows the flatness errors of each method at different distances.

Table 2. Planarity Error at Different Measurement Distances

4.4 Visual odometry use case

To further validate our method, we ran the RGB-D visual odometry system DVO-SLAM [32] on a physical robot. The system directly registers consecutive RGB-D frames by minimizing the photometric error to quickly and accurately estimate the camera motion. This process depends strongly on the intrinsic and extrinsic parameters of the RGB-D camera.

In the experiments, we fixed the RGB-D camera (Kinect v1) to a mobile robot and let the robot follow a known trajectory. The DVO-SLAM system estimated the trajectory of the Kinect camera based on the raw and corrected data. Figure 11(a) shows the camera trajectories estimated with the different camera parameters as well as the actual trajectory. Figures 11(b) and 11(c) show the reconstructed environmental point clouds. It can be seen from Fig. 11(a) that the camera trajectory estimated with the corrected parameters is closer to the actual trajectory, but there are still deviations. The main reason is that the robot could only obtain sequential frames with few visual features when turning, which led to drift accumulation. When the original parameters were used, the camera trajectory estimated by the DVO system shows a divergent trend, which is mainly due to scale drift. Table 3 shows the maximum, minimum, and root mean square error (RMSE) of the estimated motion trajectory drift. From Figs. 11(b) and 11(c), we also find that the quality of the point cloud reconstructed during the robot's movement is greatly improved with the corrected parameters.

Fig. 11. RGB-D visual odometry experiment results: (a) Top view of the trajectory. The starting position is at the circle in the figure. (b) Point cloud obtained with the original parameters. (c) Point cloud obtained with the corrected parameters.

Table 3. Error Comparison of Estimated Motion Trajectory Drift (in Meters)

5. Conclusions

In this paper, we proposed a robust and accurate RGB-D camera calibration method based on spheres. A fast and accurate feature extraction method is first used to extract the projection information of the sphere on the RGB and depth images and to automatically exclude outliers. A closed-form solution for the initial parameters is then obtained using the projection information, and the depth measurement error is corrected by defining key points on the maximum section circle of the sphere. Finally, all the initial parameters are integrated into the nonlinear global optimization framework to complete the accurate calibration of all parameters. Compared with the checkerboard-based method, the proposed calibration method does not require much manual intervention and is more robust. Compared with other sphere-based calibration methods, our method can not only calibrate the intrinsic and extrinsic parameters of the camera but can also correct the depth error. Experimental results show that our method can provide higher calibration accuracy than other methods.

Funding

Science and Technology Plan of Liaoning Province (2019JH1/10100005); National Key Research and Development Program of China (2017YFB1301100).

Acknowledgments

The authors are grateful for the language expression suggestions provided by the OSA language editing team. We also thank the anonymous reviewers for their valuable comments and suggestions.

Disclosures

The authors declare no conflicts of interest.

References

1. H. Yanagihara, T. Kakue, Y. Yamamoto, T. Shimobaba, and T. Ito, “Real-time three-dimensional video reconstruction of real scenes with deep depth using electro-holographic display system,” Opt. Express 27(11), 15662–15678 (2019). [CrossRef]  

2. H. Nguyen, Z. Wang, P. Jones, and B. Zhao, “3d shape, deformation, and vibration measurements using infrared kinect sensors and digital image correlation,” Appl. Opt. 56(32), 9030–9037 (2017). [CrossRef]  

3. F. Endres, J. Hess, J. Sturm, D. Cremers, and W. Burgard, “3-d mapping with an rgb-d camera,” IEEE Trans. Robot. 30(1), 177–187 (2014). [CrossRef]  

4. M. Labbe and F. Michaud, “Online global loop closure detection for large-scale multi-session graph-based slam,” in Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, (IEEE, 2014), pp. 2661–2666.

5. J. Tang, S. Miller, A. Singh, and P. Abbeel, “A textured object recognition pipeline for color and depth image data,” in Proceedings of IEEE International Conference on Robotics and Automation, (IEEE, 2012), pp. 3467–3474.

6. C. Choi and H. I. Christensen, “Rgb-d object tracking: A particle filter approach on gpu,” in Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, (IEEE, 2013), pp. 1084–1091.

7. K. Khoshelham and S. Oude Elberink, “Accuracy and resolution of kinect depth data for indoor mapping applications,” Sensors 12(2), 1437–1454 (2012). [CrossRef]  

8. A. Canessa, M. Chessa, A. Gibaldi, S. P. Sabatini, and F. Solari, “Calibrated depth and color cameras for accurate 3d interaction in a stereoscopic augmented reality environment,” J. Vis. Commun. Image R. 25(1), 227–237 (2014). Visual Understanding and Applications with RGB-D Cameras. [CrossRef]  

9. F. Basso, E. Menegatti, and A. Pretto, “Robust intrinsic and extrinsic calibration of rgb-d cameras,” IEEE Trans. Robot. 34(5), 1315–1332 (2018). [CrossRef]  

10. J. Smisek, M. Jancosek, and T. Pajdla, “3d with kinect,” in Proceedings of IEEE International Conference on Computer Vision Workshops (ICCV Workshops), (IEEE, 2011), pp. 1154–1160.

11. N. Burrus, “Kinect calibration,” (2011), http://nicolas.burrus.name/index.php/Research/KinectCalibration.

12. P. Mihelich, “Ros openni-launch package for intrinsic and extrinsic kinect calibration,” (2011), http://www.ros.org/wiki/openni_launch/Tutorials/.

13. J. Jung, Y. Jeong, J. Park, H. Ha, J. D. Kim, and I. Kweon, “A novel 2.5d pattern for extrinsic calibration of tof and camera fusion system,” in Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, (IEEE, 2011), pp. 3290–3296.

14. A. Teichman, S. Miller, and S. Thrun, “Unsupervised intrinsic calibration of depth sensors via slam,” in Robotics: Science and Systems, vol. 248 (2013), p. 3.

15. M. D. Cicco, L. Iocchi, and G. Grisetti, “Non-parametric calibration for depth sensors,” Robot. Auton. Syst. 74, 309–317 (2015). Intelligent Autonomous Systems (IAS-13). [CrossRef]  

16. D. Herrera C., J. Kannala, and J. Heikkilä, “Joint depth and color camera calibration with distortion correction,” IEEE Trans. Pattern Anal. Machine Intell. 34(10), 2058–2064 (2012). [CrossRef]  

17. C. Raposo, J. P. Barreto, and U. Nunes, “Fast and accurate calibration of a kinect sensor,” in Proceedings of International Conference on 3D Vision - 3DV, (IEEE, 2013), pp. 342–349.

18. A. Staranowicz, G. R. Brown, F. Morbidi, and G. L. Mariottini, “Easy-to-use and accurate calibration of rgb-d cameras from spheres,” in Pacific-Rim Symposium on Image and Video Technology, (Springer, 2013), pp. 265–278.

19. Z. Liu, Q. Wu, S. Wu, and X. Pan, “Flexible and accurate camera calibration using grid spherical images,” Opt. Express 25(13), 15269–15285 (2017). [CrossRef]  

20. M. E. Deetjen and D. Lentink, “Automated calibration of multi-camera-projector structured light systems for volumetric high-speed 3d surface reconstructions,” Opt. Express 26(25), 33278–33304 (2018). [CrossRef]  

21. P. An, T. Ma, K. Yu, B. Fang, J. Zhang, W. Fu, and J. Ma, “Geometric calibration for lidar-camera system fusing 3d-2d and 3d-3d point correspondences,” Opt. Express 28(2), 2122–2141 (2020). [CrossRef]  

22. Z. Wang, “Removal of noise and radial lens distortion during calibration of computer vision systems,” Opt. Express 23(9), 11341–11356 (2015). [CrossRef]  

23. J. Heikkila, “Geometric camera calibration using circular control points,” IEEE Trans. Pattern Anal. Machine Intell. 22(10), 1066–1077 (2000). [CrossRef]  

24. R. Hartley and A. Zisserman, Multiple view geometry in computer vision (Cambridge university press, 2003).

25. C. B. Duane, “Close-range camera calibration,” Photogramm. Eng 37, 855–866 (1971).

26. A. N. Staranowicz, G. R. Brown, F. Morbidi, and G.-L. Mariottini, “Practical and accurate calibration of rgb-d cameras using spheres,” Comput. Vis. Image Und. 137, 102–114 (2015). [CrossRef]  

27. M. A. Fischler and R. C. Bolles, Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography (Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1987), pp. 726–740.

28. C. Lu, S. Xia, M. Shao, and Y. Fu, “Arc-support line segments revisited: An efficient high-quality ellipse detection,” IEEE Trans. on Image Process. 29, 768–781 (2020). [CrossRef]  

29. Z. Zhang, “A flexible new technique for camera calibration,” IEEE Trans. Pattern Anal. Machine Intell. 22(11), 1330–1334 (2000). [CrossRef]  

30. A. Kaehler and G. Bradski, Learning OpenCV2nd Edition (O’Reilly Media, Inc., 2014).

31. S. Boyd and L. Vandenberghe, Convex Optimization (Cambridge University Press, USA, 2004).

32. C. Kerl, J. Sturm, and D. Cremers, “Robust odometry estimation for rgb-d cameras,” in Proceedings of IEEE International Conference on Robotics and Automation, (IEEE, 2013), pp. 3748–3754.
