Abstract
Lensless computational imaging, a technique that combines optically modulated measurements with task-specific algorithms, has recently benefited from the application of artificial neural networks. Conventionally, lensless imaging techniques rely on prior knowledge to deal with the ill-posed nature of unstructured measurements, which requires costly supervised approaches. To address this issue, we present a self-supervised learning method that learns semantic representations of the modulated scenes from implicitly provided priors. A contrastive loss function is designed to train a target extractor (for measurements) from a source extractor (for structured natural scenes), transferring cross-modal priors in the latent space. The effectiveness of the new extractor was validated by classifying mask-modulated scenes on unseen datasets, where it achieved accuracy comparable to that of the source modality (the contrastive language-image pre-trained [CLIP] network). The proposed multimodal representation learning method avoids costly data annotation, adapts better to unseen data, and is usable in a variety of downstream vision tasks with unconventional imaging settings.
© 2024 Optica Publishing Group
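To illustrate the kind of objective the abstract describes, the PyTorch sketch below shows one plausible form of a cross-modal contrastive loss that aligns a trainable measurement encoder with a frozen source encoder (e.g., CLIP's image encoder). This is a minimal sketch under stated assumptions, not the authors' implementation: the encoder names, batch construction, and temperature value are illustrative, since the abstract does not specify the exact loss.

```python
# Minimal sketch (not the paper's exact method) of an InfoNCE-style
# contrastive loss that transfers priors from a frozen source extractor
# (natural scenes) to a trainable target extractor (lensless measurements).
import torch
import torch.nn.functional as F

def cross_modal_contrastive_loss(target_encoder, source_encoder,
                                 measurements, scenes, temperature=0.07):
    """Pull each measurement embedding toward the embedding of its own
    scene (positive pair) and away from the other scenes in the batch."""
    z_t = F.normalize(target_encoder(measurements), dim=-1)  # (B, D)
    with torch.no_grad():                                    # source stays frozen
        z_s = F.normalize(source_encoder(scenes), dim=-1)    # (B, D)
    logits = z_t @ z_s.T / temperature                       # (B, B) cosine similarities
    labels = torch.arange(z_t.size(0), device=z_t.device)    # positives on the diagonal
    return F.cross_entropy(logits, labels)
```

Because the source encoder is frozen and only paired (measurement, scene) batches are needed, no class labels are required, which is consistent with the self-supervised setup described above.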