ULD-Net: 3D unsupervised learning by dense similarity learning with equivariant-crop

Yu Tian; Yu Tian; Da Song; Da Song; Mengna Yang; Mengna Yang; Jie Liu; Jie Liu; Jie Liu; Guohua Geng; Guohua Geng; Mingquan Zhou; Mingquan Zhou; Kang Li; Kang Li; Xin Cao; Xin Cao; Xin Cao

doi:10.1364/JOSAA.473657

Journal of the Optical Society of America A
Vol. 39,
Issue 12,
pp. 2343-2353
(2022)
•https://doi.org/10.1364/JOSAA.473657

ULD-Net: 3D unsupervised learning by dense similarity learning with equivariant-crop

Yu Tian, Da Song, Mengna Yang, Jie Liu, Guohua Geng, Mingquan Zhou, Kang Li, and Xin Cao

Not Accessible

Your library or personal account may give you access

Get PDF
Email
Share
Get Citation
Copy Citation Text
Yu Tian, Da Song, Mengna Yang, Jie Liu, Guohua Geng, Mingquan Zhou, Kang Li, and Xin Cao, "ULD-Net: 3D unsupervised learning by dense similarity learning with equivariant-crop," J. Opt. Soc. Am. A 39, 2343-2353 (2022)

Export Citation
- BibTex
- Endnote (RIS)
- HTML
- Plain Text
Citation alert
Save article

Check for updates

Related Topics
Table of Contents Category
- Image Processing and Image Analysis
Optics & Photonics Topics
?

The topics in this list come from the Optics and Photonics Topics applied to this article.

About this Article
History
- Original Manuscript: August 18, 2022
- Revised Manuscript: October 25, 2022
- Manuscript Accepted: October 27, 2022
- Published: November 29, 2022

Abstract

Although many recent deep learning methods have achieved good performance in point cloud analysis, most of them are built upon the heavy cost of manual labeling. Unsupervised representation learning methods have attracted increasing attention due to their high label efficiency. How to learn more useful representations from unlabeled 3D point clouds is still a challenging problem. Addressing this problem, we propose a novel unsupervised learning approach for point cloud analysis, named ULD-Net, consisting of an equivariant-crop (equiv-crop) module to achieve dense similarity learning. We propose dense similarity learning that maximizes consistency across two randomly transformed global–local views at both the instance level and the point level. To build feature correspondence between global and local views, an equiv-crop is proposed to transform features from the global scope to the local scope. Unlike previous methods that require complicated designs, such as negative pairs and momentum encoders, our ULD-Net benefits from the simple Siamese network that relies solely on stop-gradient operation preventing the network from collapsing. We also utilize the feature separability constraint for more representative embeddings. Experimental results show that our ULD-Net achieves the best results of context-based unsupervised methods and comparable performances to supervised models in shape classification and segmentation tasks. On the linear support vector machine classification benchmark, our ULD-Net surpasses the best context-based method spatiotemporal self-supervised representation learning (STRL) by 1.1% overall accuracy. On tasks with fine-tuning, our ULD-Net outperforms STRL under fully supervised and semisupervised settings, in particular, 0.1% accuracy gain on the ModelNet40 classification benchmark, and 0.6% medium intersection of union gain on the ShapeNet part segmentation benchmark.

Full Article | PDF Article

Previous Article Next Article

Data availability

Data underlying the results presented in this paper are available in Refs. [31–33].

31. Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao, “3D ShapeNets: A Deep Representation for Volumetric Shapes,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015), pp. 1912–1920.

33. I. Armeni, O. Sener, A. R. Zamir, H. Jiang, I. Brilakis, M. Fischer, and S. Savarese, “3D semantic parsing of large-scale indoor spaces,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016), pp. 1534–1543.

Cited By

You do not have subscription access to this journal. Cited by links are available to subscribers only. You may subscribe either as an Optica member, or as an authorized user of your institution.

Contact your librarian or system administrator
or
Login to access Optica Member Subscription

Figures (8)

You do not have subscription access to this journal. Figure files are available to subscribers only. You may subscribe either as an Optica member, or as an authorized user of your institution.

Contact your librarian or system administrator
or
Login to access Optica Member Subscription

Tables (7)

You do not have subscription access to this journal. Article tables are available to subscribers only. You may subscribe either as an Optica member, or as an authorized user of your institution.

Contact your librarian or system administrator
or
Login to access Optica Member Subscription

Equations (14)

You do not have subscription access to this journal. Equations are available to subscribers only. You may subscribe either as an Optica member, or as an authorized user of your institution.

Contact your librarian or system administrator
or
Login to access Optica Member Subscription

Pretraining Dataset	Method	OA
ShapeNet	FoldingNet [4]	88.4
	Du et al. [16]	89.6
	Jigsaw3D [11]	90.6
	Rotation 3D [12]	90.8
	STRL [10]	90.9
	Ours	91.9
ModelNet40	FoldingNet [4]	84.4
	Jigsaw3D [11]	87.8
	MAP-VAE [5]	90.2
	OcCo [8]	89.2
	Ours	92.0

Method	Sup.	OA
PointNet [19]	✓	89.2
RGCNN [23]	✓	90.5
PointNet $+ +$ [20]	✓	90.7
KD-Net [24]	✓	91.8
PointCNN [21]	✓	92.2
DGCNN [22]	✓	92.2
Point Cloud Transformer [26]	✓	93.2
PointTransformer [25]	✓	93.7
Jigsaw 3D [11]	✗	92.4
Info 3D [15]	✗	93.0
OcCo [8]	✗	93.0
FoldingNet [4]	✗	93.1
STRL [10]	✗	93.1
Ours	✗	93.4

Method	1%	5%	10%	20%
DGCNN [22]	58.4	80.7	85.2	88.1
STRL [10]	60.5	82.7	86.5	89.7
Ours	60.6	82.5	86.8	89.8

	Supervised Method			Unsupervised Method
Shapes	DGCNN [22]	RSCNN [36]	PCT [26]	LGAN [6]	Method in [16]	Jigsaw 3D [11]	OcCo [8]	STRL [10]	Ours
Ins.	85.2	86.2	86.4	57.0	82.3	85.3	85.5	85.1	85.7
Aero	84.0	83.5	85.0	54.1	82.1	84.1	84.4	83.7	84.7
Bag	83.4	84.8	82.4	48.7	74.5	84.0	77.5	80.3	82.8
Cap	86.7	88.8	89.0	62.6	83.6	85.8	83.4	87.6	83.8
Car	77.8	79.6	81.2	43.2	74.9	77.0	77.9	77.7	78.3
Chair	90.6	91.2	91.9	68.4	87.9	90.9	91.0	90.9	90.9
Earphone	74.7	81.1	71.5	58.3	72.4	80.0	75.2	78.0	77.0
Guitar	91.2	91.6	91.3	74.3	89.9	91.5	91.6	91.4	91.3
Knife	87.5	88.4	88.1	68.4	85.4	87.0	88.2	87.7	88.2
Lamp	82.8	86.0	86.3	53.4	79.1	83.2	83.5	83.7	83.8
Laptop	95.7	96.0	95.8	82.6	95.2	95.8	96.1	96.1	95.6
Motor	66.3	73.7	64.6	18.6	67.3	71.6	65.5	66.7	68.6
Mug	94.9	94.1	95.8	75.1	93.3	94.0	94.4	95.0	94.3
Pistol	81.1	83.4	83.6	54.7	81.0	82.6	79.6	81.2	80.6
Rocket	63.5	60.5	62.2	37.2	58.2	60.0	58.0	58.2	61.9
Skateboard	74.5	77.7	77.6	46.7	74.0	77.9	76.2	75.3	75.1
Table	82.6	83.6	83.7	66.4	79.2	81.8	82.8	82.1	83.4

Method	Sup.	OA	mIoU
PointNet [19]	✓	78.6	47.6
PointNet $+ +$ [20]	✓	81.0	54.5
PointCNN [21]	✓	88.1	65.4
DGCNN [22]	✓	84.1	56.1
Jigsaw [11]	✗	84.4	56.6
OcCo [8]	✗	85.1	58.5
Ours	✗	85.5	59.2

Model	$C$	Rot.	Trans.	Scal.	Jit.	OA
$A_{1}$	✓	✓	✓	✓	✓	92.0
$B_{1}$	✗	✗	✗	✗	✗	88.0
$C_{1}$	✗	✓	✓	✓	✓	89.6
$D_{1}$	✓	✗	✓	✓	✓	91.0
$E_{1}$	✓	✓	✗	✓	✓	91.0
$F_{1}$	✓	✓	✓	✗	✓	91.0
$G_{1}$	✓	✓	✓	✓	✗	91.0

Model	$L_{i n s t a n c e}$	$L_{p o i n t}$	$L_{s e p a r a b i l i t y}$	OA
$A_{2}$	✓	✗	✗	91.3
$B_{2}$	✓	✓	✗	91.7
$C_{2}$	✓	✗	✓	91.6
$D_{2}$	✓	✓	✓	92.0

Abstract

Data availability

Cited By

Figures (8)

Tables (7)

Equations (14)

Journal of the Optical Society of America A