Unsupervised monocular visual odometry via combining instance and RGB
information

Min Yue; Guangyuan Fu; Ming Wu; Hongyang Gu; Erliang Yao

doi:10.1364/AO.452378

Applied Optics
Vol. 61,
Issue 13,
pp. 3793-3803
(2022)
•https://doi.org/10.1364/AO.452378

Unsupervised monocular visual odometry via combining instance and RGB information

Min Yue, Guangyuan Fu, Ming Wu, Hongyang Gu, and Erliang Yao

Not Accessible

Your library or personal account may give you access

Get PDF
Email
Share
Get Citation
Copy Citation Text
Min Yue, Guangyuan Fu, Ming Wu, Hongyang Gu, and Erliang Yao, "Unsupervised monocular visual odometry via combining instance and RGB information," Appl. Opt. 61, 3793-3803 (2022)

Export Citation
- BibTex
- Endnote (RIS)
- HTML
- Plain Text
Citation alert
Save article

Check for updates

Abstract

Unsupervised deep learning methods have made significant progress in monocular visual odometry (VO) tasks. However, due to the complexity of the real-world scene, learning the camera ego-motion from the RGB information of monocular images in an unsupervised way is still challenging. Existing methods mainly learn motion from the original RGB information, lacking higher-level input from scene understanding. Hence, this paper proposes an unsupervised monocular VO framework that combines the instance and RGB information, named combined information based (CI-VO). The proposed method includes two stages. First is obtaining the instance maps of the monocular images, without finetuning on the VO dataset. Then we obtain the combined information from the two types of information, which is input into the proposed combined information based pose estimation network, named CI-PoseNet, to estimate the relative pose of the camera. To make better use of the two types of information, we propose a fusion feature extraction network to extract the fused features from the combined information. Experiments on the KITTI odometry and KITTI raw dataset show that the proposed method has good performance in the camera pose estimation task, which exceeds the existing mainstream methods.

Full Article | PDF Article

More Like This

MAFFNet: real-time multi-level attention feature fusion network with RGB-D semantic segmentation for autonomous driving

Tongfei Lv, Yu Zhang, Lin Luo, and Xiaorong Gao
Appl. Opt. 61(9) 2219-2229 (2022)

Joint estimation of depth and motion from a monocular endoscopy image sequence using a multi-loss rebalancing network

Shiyuan Liu, Jingfan Fan, Dengpan Song, Tianyu Fu, Yucong Lin, Deqiang Xiao, Hong Song, Yongtian Wang, and Jian Yang
Biomed. Opt. Express 13(5) 2707-2727 (2022)

Real-time pose estimation for an underwater object combined with deep learning and prior information

Xianwei Ge, Shukai Chi, Wei Jia, and Ke Jiang
Appl. Opt. 61(24) 7108-7118 (2022)

Previous Article Next Article

Supplementary Material (2)

Name	Description
Dataset 1	CI-VO results of sequence 09 in the KITTI dataset.
Dataset 2	CI-VO results of sequence 10 in the KITTI dataset.

Data availability

Data underlying the results presented in this paper are available in Ref. [36], Dataset 1, Ref. [40], and Dataset 2, Ref. [41].

36. A. Geiger, P. Lenz, C. Stiller, and R. Urtasun, “Vision meets robotics: the KITTI dataset,” International Journal of Robotics Research, (2013), http://www.cvlibs.net/datasets/kitti/.

40. M. Yue, G. Fu, H. Gu, and E. Yao, “CI-VO results of sequence 09 in KITTI dataset,” figshare (2022), https://doi.org/10.6084/m9.figshare.19387430.

41. M. Yue, G. Fu, H. Gu, and E. Yao, “CI-VO results of sequence 10 in KITTI dataset,” figshare (2022), https://doi.org/10.6084/m9.figshare.19387433.

Cited By

You do not have subscription access to this journal. Cited by links are available to subscribers only. You may subscribe either as an Optica member, or as an authorized user of your institution.

Contact your librarian or system administrator
or
Login to access Optica Member Subscription

Figures (9)

You do not have subscription access to this journal. Figure files are available to subscribers only. You may subscribe either as an Optica member, or as an authorized user of your institution.

Contact your librarian or system administrator
or
Login to access Optica Member Subscription

Tables (5)

You do not have subscription access to this journal. Article tables are available to subscribers only. You may subscribe either as an Optica member, or as an authorized user of your institution.

Contact your librarian or system administrator
or
Login to access Optica Member Subscription

Equations (6)

You do not have subscription access to this journal. Equations are available to subscribers only. You may subscribe either as an Optica member, or as an authorized user of your institution.

Contact your librarian or system administrator
or
Login to access Optica Member Subscription

			Sequence 09			Sequence 10			Avg
Method	Dataset	Resolution	$t_{e r r}$	$r_{e r r}$	ATE	$t_{e r r}$	$r_{e r r}$	ATE	$t_{e r r}$	$r_{e r r}$	ATE
Depth-VO-Feat [42]	$608 \times 160$	K(S)	11.89	3.60	52.12	12.82	3.41	24.70	12.36	3.51	38.41
UndeepVO [43]	$416 \times 128$	K(S)	7.01	3.61	—	10.63	4.65	—	8.82	4.10	—
SfMLearner [21]	$416 \times 128$	K(M)	19.15	6.82	77.79	40.40	17.69	67.34	29.78	12.26	72.57
GeoNet [44]	$416 \times 128$	K(M)	28.72	9.80	158.45	23.90	9.00	43.04	26.31	9.40	100.75
DeepMatchVO [45]	$416 \times 128$	K(M)	9.91	3.80	27.08	12.18	5.90	24.44	11.05	4.85	25.76
Monodepth2^* [22]	$416 \times 128$	K(M)	17.77	3.93	79.01	14.48	5.93	23.15	15.96	4.99	51.08
SC-SfMLearner [23]	$416 \times 128$	K(M)	11.20	3.35	—	10.10	4.96	—	10.60	4.16	—
NeuralBundler [46]	$416 \times 128$	K(M)	8.10	2.81	—	12.90	3.17	—	10.50	2.99	—
TrajNet [47]	$416 \times 128$	K(M)	7.40	2.21	—	10.28	2.82	—	8.84	2.52	—
SC-Depth^* [24]	$416 \times 128$	K(M)	8.43	2.04	32.08	10.96	3.14	17.11	9.70	2.59	24.60
CM-VO [25]	—	K(M)	9.69	3.37	—	10.01	4.87	—	9.85	4.12	—
CM-VO-opt [25]	—	K(M)	8.71	3.02	—	9.33	4.03	—	9.02	3.53	—
CI-VO	$416 \times 128$	K(M)	7.20	1.87	16.84	8.70	2.55	16.01	7.95	2.21	16.43

	2011-09-26-0036			2011-09-26-0117			2011-09-29-0071			2011-10-03-0047
Method	$t_{e r r}$	$r_{e r r}$	ATE	$t_{e r r}$	$r_{e r r}$	ATE	$t_{e r r}$	$r_{e r r}$	ATE	$t_{e r r}$	$r_{e r r}$	ATE
Monodepth2^* [22]	16.46	4.86	16.50	7.82	7.37	5.10	12.44	10.44	3.74	11.37	3.61	12.27
SC-Depth^* [24]	17.65	5.63	19.31	7.30	6.91	5.15	14.78	13.60	5.75	16.37	4.38	21.52
CI-VO	13.96	4.76	16.02	4.37	3.32	3.10	8.45	5.86	3.15	11.03	2.62	14.10

	Sequence 09			Sequence 10			Avg
Method	$t_{e r r}$	$r_{e r r}$	ATE	$t_{e r r}$	$r_{e r r}$	ATE	$t_{e r r}$	$r_{e r r}$	ATE
lCI-VO-Semantic (-)	8.37	2.41	29.43	9.06	2.81	16.70	8.72	2.61	23.07
CI-VO-Semantic	8.90	2.40	36.82	12.78	4.43	19.68	10.84	3.42	28.25
CI-VO	7.20	1.87	16.84	8.70	2.55	16.01	7.95	2.21	16.43

	Sequence 09			Sequence 10			Avg
Method	$t_{e r r}$	$r_{e r r}$	ATE	$t_{e r r}$	$r_{e r r}$	ATE	$t_{e r r}$	$r_{e r r}$	ATE
CI-VO (without instance or attention)	8.43	2.04	32.08	10.96	3.14	17.11	9.70	2.59	24.60
CI-VO (without instance)	7.05	1.90	18.24	8.78	2.67	16.75	7.92	2.86	17.50
CI-VO (with separate feature extractor)	9.23	2.46	35.23	13.32	5.40	21.94	11.28	3.93	28.59
CI-VO (full)	7.20	1.87	16.84	8.70	2.55	16.01	7.95	2.21	16.43

Method	MACs (G)	Params (M)	Inference Time (ms)
CI-VO (without instance or attention)	2.12	12.50	3.57
CI-VO (without instance)	2.12	12.79	4.04
CI-VO (full)	2.37	12.81	4.32

Abstract

Supplementary Material (2)

Data availability

Cited By

Figures (9)

Tables (5)

Equations (6)

Applied Optics