Abstract
As the foundation of virtual content generation, cameras are crucial for augmented reality (AR) applications, yet their integration with transparent displays has remained a challenge. Prior efforts to develop see-through cameras have struggled to achieve high resolution and seamless integration with AR displays. In this work, we present LightguideCam, a compact and flexible see-through camera based on an AR lightguide. To address the overlapping artifacts in the measurement, we present a compressive sensing algorithm based on an equivalent imaging model that minimizes computational cost and calibration complexity. We validate our design using a commercial AR lightguide and demonstrate a field of view of 23.1° and an angular resolution of 0.1° in the prototype. Our LightguideCam has great potential as a plug-and-play extensional imaging component in AR head-mounted displays, with promising applications for eye-gaze tracking, eye-position perspective photography, and improved human–computer interaction devices, such as full-screen mobile phones.
© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement
Augmented reality (AR) technology offers an immersive experience blending real-world and computer-generated content, which promises to revolutionize areas as diverse as education, entertainment, and medical services. As one of the most representative AR devices, near-eye display components based on transparent media have received widespread attention [1–7]. Cameras, which serve as a base for creating virtual content, are also key components of AR devices [8–12]. However, conventional cameras have an opaque appearance that makes it challenging to integrate them with transparent displays. A compact and flexible see-through camera represents a promising solution for AR applications.
Recently, two types of see-through cameras have been proposed. In one approach, semi-transparent photodetectors have been created by stacking a graphene light-sensing layer on high-transmission substrates [13]. In the other approach, opaque photodetectors and peripheral circuits have been hidden using beam-splitting elements [14] or a light modulator [15]. To lighten the burden on optical elements, computational imaging schemes have been employed in see-through cameras based on luminescent concentrator films [16], a window with roughened edges [17], and volume holographic optical elements [18]. However, these systems tend to rely on highly customized optical elements. Meanwhile, imaging resolution is limited by low-density arrayed photodetectors or computation-intensive reconstruction requirements. Both drawbacks hinder their integration and applicability.
In this Letter, we present LightguideCam, a high-resolution see-through computational camera that can be integrated with AR displays. As demonstrated in Fig. 1, light from the object partly passes through the lightguide and forms images on the retina, while the rest of the light is guided to form a blurred image on the sensor plane. An equivalent model-based algorithm is proposed to achieve high-resolution reconstruction with minimal computational complexity compared with fully calibration-based methods. The design of LightguideCam is simple yet effective and can be integrated as an extension component with AR display devices, offering great potential for AR applications such as eye-gaze tracking, eye-position perspective photography, and beyond.
Typically used in AR head-mounted devices, lightguides project images from a display onto the retina while allowing external light to pass through. The lightguide demonstrated in the LightguideCam consists of a partially reflective mirrors array (PRMA), i.e., a series of transparent plates with parallel bevels. Light emitted by the display is collimated by a lens and then undergoes total internal reflection at the lightguide–air interface. A portion of the light is reflected to the human eye by the bevels, and the rest continues to travel, resulting in virtual images at infinity while maintaining the view of the scene behind the lightguide. Improved display quality is achieved by designing the PRMA structure, including using multiple reflective mirrors for eyebox expansion and adjusting bevel reflectivity with coatings to ensure uniform display intensity at different positions [5].
As illustrated in Fig. 2(a), we reverse the conventional projection light path by replacing the display with an image sensor. Light from an object point at a finite distance partly passes through the lightguide, and the rest of the light forms multiple points on the sensor plane. Figure 2(b) presents the experimentally captured spatially varying point spread functions (PSFs) of the LightguideCam. Here, $p_x$, $p_y$, and $p_z$ are orthogonal axes in the world coordinate system. Computational reconstruction is employed to eliminate the artifacts caused by propagation in the lightguide and to acquire clear images.
The PSFs of the LightguideCam exhibit strong shift variance and large spatial extents, which present significant challenges for deconvolution methods due to high computational costs and impractical calibration demands [19–22]. To address these limitations, we propose an equivalent forward imaging model of PRMA lightguides and a corresponding model-based reconstruction algorithm.
The equivalent forward imaging model is depicted in Fig. 3. The artifacts in measurement are decomposed into images of the object from different viewpoints (refer to Supplement 1 for a detailed derivation). A shifted camera array is equivalently created, with displacements parallel and perpendicular to the sensor plane. Angle $\theta$ between the shifted array and the $p_z$ axis is given by
where $l$ represents the distance between two partially reflective mirrors, $t$ and $n$ are the thickness and the refractive index of the lightguide, respectively, and $\phi$ is the angle between the lightguide–air interface and the bevel. The final image is obtained by summing the measurements of the equivalent cameras. To achieve improved accuracy, the PSF variance along the $p_x$ axis caused by secondary reflections in the lightguide is further modeled with a $p_x$-dependent camera weight $w_i$. Supplement 1 contains detailed descriptions of the calibration method. To summarize, the measurement $b$ of the LightguideCam is a weighted summation of the sub-images $a_i$ given by the operators $\mathcal{A}_i$. We number the equivalent cameras $1, 2, \ldots, N$ in ascending order of $p_x$ and mark their corresponding measurements as $a_1, a_2, \ldots, a_N$. The relation in the camera coordinates $(q_x, q_y)$ is given by
$$b(q_x, q_y) = \sum_{i=1}^{N} w_i\, a_i(q_x, q_y). \tag{2}$$
By representing images as vectors and operators as matrices, which are indicated with the original letters in bold, Eq. (2) takes the concise form
$$\boldsymbol{b} = \boldsymbol{\Sigma A x}, \tag{3}$$
where $\boldsymbol{x}$ is the object, $\boldsymbol{A}$ stacks the sub-image operators $\mathcal{A}_i$, and $\boldsymbol{\Sigma}$ performs the weighted summation.
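The weighted-summation forward model described above can be sketched numerically. The following is a minimal, self-contained illustration rather than the calibrated prototype model: each equivalent camera contributes a laterally shifted copy of the scene, and the measurement is their weighted sum. The shift amounts and weights are hypothetical, and the axial ($p_z$) displacements are omitted for brevity.

```python
import numpy as np

def forward_model(x, shifts, weights):
    """Weighted summation of shifted sub-images, mimicking the
    equivalent camera array of the PRMA lightguide: each equivalent
    camera i contributes a shifted copy of the scene x with weight w_i."""
    b = np.zeros_like(x, dtype=float)
    for (dy, dx), w in zip(shifts, weights):
        b += w * np.roll(x, shift=(dy, dx), axis=(0, 1))
    return b

# A point source reveals the model's multi-spot PSF.
x = np.zeros((64, 64))
x[32, 32] = 1.0
shifts = [(0, -8), (0, 0), (0, 8)]   # hypothetical pixel displacements along q_x
weights = [0.25, 0.5, 0.25]          # hypothetical camera weights w_i
b = forward_model(x, shifts, weights)
```

Applying the model to a point source reproduces the qualitative behavior in Fig. 2(b): a single object point maps to multiple weighted spots on the sensor plane.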
The equivalent model describes the PSF variance over the entire space while requiring calibration only along the $p_x$ axis, which is highlighted by the red dotted box in Fig. 2(b). As demonstrated in Figs. 4(a) and 4(b), the high similarity between the experimentally measured and simulated PSFs along three directions is quantitatively confirmed, showcasing the effectiveness of the equivalent model in describing the spatially varying artifacts caused by the lightguide. The PSFs at different depths $d$ exhibit low cross correlation, as indicated in Fig. 4(c). Therefore, scenes at different depths can be reconstructed separately, indicating the LightguideCam's capability for depth resolution.
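The cross-correlation criterion for depth discrimination can be illustrated with a toy check. Here `ncc_peak` computes the peak of the normalized circular cross correlation between two PSFs; the two-impulse PSFs below are hypothetical stand-ins for the measured lightguide PSFs, with an impulse separation that shrinks with depth, and are not the prototype's data.

```python
import numpy as np

def ncc_peak(p1, p2):
    """Peak of the normalized circular cross correlation between two
    PSFs; a value near 1 means the two depths are hard to separate."""
    a = (p1 - p1.mean()) / (p1.std() * p1.size)
    b = (p2 - p2.mean()) / p2.std()
    corr = np.fft.ifft2(np.fft.fft2(a) * np.conj(np.fft.fft2(b))).real
    return corr.max()

# Toy two-impulse PSFs whose separation shrinks with depth.
p_near = np.zeros((32, 32)); p_near[16, 10] = p_near[16, 22] = 1.0
p_far = np.zeros((32, 32)); p_far[16, 13] = p_far[16, 19] = 1.0
same = ncc_peak(p_near, p_near)   # identical depth: peak correlation of 1
cross = ncc_peak(p_near, p_far)   # distinct depths: noticeably lower peak
```

Low cross correlation between the PSFs of two depths is what allows the reconstruction to assign scene content to the correct depth.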
It should be noted that when the scene is located at an infinite distance, the displacements between equivalent cameras become negligible compared with the imaging distance, i.e., $\delta _x, \delta _z \ll d$. Under such circumstances, the scene forms a clear image on the sensor plane. However, computational reconstruction techniques must be employed to achieve accurate imaging of the scene when the displacement effect cannot be ignored.
We construct an optimization problem based on the equivalent imaging model. Following compressive sensing theory, we pursue $l_1$-norm minimization of images after a sparsifying transform to achieve robust reconstruction. A commonly used prior is that natural images are sparse in the gradient domain, which is also known as total variation regularization. The optimization problem is
$$\min_{\boldsymbol{x}} \; \frac{1}{2} \left\| \boldsymbol{\Sigma A x} - \boldsymbol{b} \right\|_2^2 + \lambda \left\| \boldsymbol{\Psi x} \right\|_1. \tag{4}$$
Equation (4) can be efficiently solved with the fast iterative shrinkage-thresholding algorithm (FISTA). The pseudo-code for the iterative reconstruction is presented in Algorithm 1, where we perform gradient projection for denoising [23].
In Algorithm 1 and Algorithm 2, $\lambda$ is the regularization parameter, $\boldsymbol {\Psi }$ is the gradient operation, $\boldsymbol {(\cdot )^T}$ represents the transpose, and $\nabla f(\boldsymbol {y_k})=\boldsymbol {A^T \Sigma ^T}(\boldsymbol {\Sigma A y_k -b})$. The operators $\mathcal {P}_P$ and $\mathcal {P}_C$ are projection operators that are defined on the sets $[-1,1]$ and $[0,1]$, respectively. Specifically, these operators are represented by the pixel-wise functions $\mathcal {P}_P(\boldsymbol {x})=\max \left [ \min (\boldsymbol {x},1),-1\right ]$ and $\mathcal {P}_C(\boldsymbol {x})=\max \left [ \min (\boldsymbol {x},1),0\right ]$.
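As a concrete reference point, below is a minimal FISTA sketch. To keep it short, it applies the $l_1$ penalty directly to the image (soft thresholding) rather than to its gradient, so it is a simplified stand-in for the TV-regularized Algorithm 1, which additionally requires an inner gradient-projection denoising loop [23]; the clipping step plays the role of the projection $\mathcal{P}_C$ onto $[0,1]$. The operator arguments and parameter values are illustrative assumptions.

```python
import numpy as np

def fista(A_op, AT_op, b, lam, L, n_iter=100):
    """FISTA for min_x 0.5||Ax - b||^2 + lam*||x||_1, x in [0, 1].

    A_op / AT_op: forward operator and its adjoint (e.g. the weighted
    summation of shifted sub-images and its transpose).
    L: Lipschitz constant of the gradient of the data term.
    """
    x = np.zeros_like(b, dtype=float)
    y, t = x.copy(), 1.0
    for _ in range(n_iter):
        grad = AT_op(A_op(y) - b)          # gradient of 0.5||Ay - b||^2
        z = y - grad / L                   # gradient step
        x_new = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
        x_new = np.clip(x_new, 0.0, 1.0)   # projection onto [0, 1]
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + (t - 1.0) / t_new * (x_new - x)  # momentum update
        x, t = x_new, t_new
    return x

# Sanity check with the identity operator: the solution is the
# soft-thresholded measurement, clipped to [0, 1].
b_demo = np.array([[0.5, 0.05], [1.5, -0.3]])
x_rec = fista(lambda v: v, lambda v: v, b_demo, lam=0.1, L=1.0, n_iter=20)
```

For the real system, `A_op` would implement $\boldsymbol{\Sigma A}$ and `AT_op` its transpose, matching the gradient $\boldsymbol{A^T \Sigma^T}(\boldsymbol{\Sigma A y_k - b})$ given above.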
Our method is verified using a non-customized PRMA lightguide in commercial AR glasses (LLVision L-PAT35), as demonstrated in Fig. 5(a). A Sony IMX264 CMOS sensor with $2056 \times 2463$ pixels is positioned parallel to the lens principal plane. The sensor plane can be adjusted for the focusing distance by moving it along the normal direction. The 2D scene is positioned $d = 50$ cm away from the lightguide. The calibration of this prototype covers a field of view of 23.1$^{\circ}$, ensuring high-quality reconstruction within that region. As illustrated in Fig. 5(b), the model-based algorithm reduces the overlapping artifacts in the measurement. The reconstruction achieves an angular resolution of 0.1$^{\circ}$, approaching the diffraction limit corresponding to the numerical aperture of the lightguide. Supplement 1 contains additional information on the experiment and performance analysis.
We also test the depth resolution of LightguideCam with 3D scenes, as illustrated in Fig. 6. The front and back objects are positioned at distances of 50 cm and 60 cm from the lightguide, respectively. Digital refocusing from a single-shot measurement is performed computationally by applying different depth parameters in the reconstruction. The defocused object presents overlapping artifacts. The depth resolution increases with the maximum distance between the partially reflective mirrors, i.e., the baseline distance. A larger baseline produces low-correlated PSF patterns over a larger depth range, which suggests the capacity for single-shot compressive 3D imaging and extended depth-of-field (EDoF) imaging with the proposed LightguideCam.
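The role of the baseline distance can be made concrete with a back-of-the-envelope disparity calculation under a simple pinhole approximation. All parameter values below (baseline, focal length, pixel pitch) are hypothetical illustrations, not the prototype's calibrated values: the disparity between the outermost equivalent cameras scales inversely with depth, so a larger baseline separates nearby depths more strongly.

```python
def disparity_px(d_mm, baseline_mm=20.0, focal_mm=8.0, pitch_mm=0.00345):
    """Disparity in pixels between the two outermost equivalent cameras
    for an object at depth d_mm, under a pinhole approximation.
    All default parameter values are illustrative assumptions."""
    return baseline_mm * focal_mm / (d_mm * pitch_mm)

# Objects at 50 cm and 60 cm, as in the experiment, produce clearly
# different disparities, which is what enables digital refocusing.
d_front, d_back = 500.0, 600.0
delta = disparity_px(d_front) - disparity_px(d_back)
```

Doubling the baseline doubles this disparity difference, consistent with the observation that depth resolution grows with the maximum mirror separation.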
In conclusion, we have demonstrated a novel design for an integrated, compact, and flexible see-through computational camera based on a PRMA AR lightguide. The reconstruction algorithm leverages the equivalent imaging model to circumvent the high computational cost and calibration burden of fully calibration-based methods, enabling high-resolution reconstruction. Tests with 3D scenes reveal the potential of single-shot 3D imaging and EDoF imaging with the prototype. This scheme is valid for AR devices with diverse structures, highlighting potential applications as diverse as smart glasses, mobile phone screens, automotive electronics, and functional decorations. The integrated optical path also facilitates active illumination, which can help improve the detection signal-to-noise ratio in dark environments. Moreover, it highlights the potential of computational imaging strategies for creating new-form cameras.
Funding
National Natural Science Foundation of China (62235009); National Key Research and Development Program of China (2021YFB2802000).
Acknowledgments
The authors thank Mr. Fei Wu at LLVision for providing the AR Lightguide.
Disclosures
The authors declare no conflicts of interest.
Data availability
Data underlying the results presented in this paper are available in Ref. [24].
Supplemental document
See Supplement 1 for supporting content.
REFERENCES
1. Y. Shi, C. Wan, C. Dai, S. Wan, Y. Liu, C. Zhang, and Z. Li, Optica 9, 670 (2022).
2. G.-Y. Lee, J.-Y. Hong, S. Hwang, S. Moon, H. Kang, S. Jeon, H. Kim, J.-H. Jeong, and B. Lee, Nat. Commun. 9, 4562 (2018).
3. X. Zhang, X. Li, H. Zhou, Q. Wei, G. Geng, J. Li, X. Li, Y. Wang, and L. Huang, Adv. Funct. Mater. 32, 2209460 (2022).
4. C. Chang, K. Bang, G. Wetzstein, B. Lee, and L. Gao, Optica 7, 1563 (2020).
5. D. Cheng, Q. Wang, Y. Liu, H. Chen, D. Ni, X. Wang, C. Yao, Q. Hou, W. Hou, G. Luo, and Y. Wang, Light: Advanced Manufacturing 2, 336 (2021).
6. J. Xiong, E.-L. Hsiang, Z. He, T. Zhan, and S.-T. Wu, Light: Sci. Appl. 10, 216 (2021).
7. Y. Li, S. Chen, H. Liang, X. Ren, L. Luo, Y. Ling, S. Liu, Y. Su, and S.-T. Wu, PhotoniX 3, 29 (2022).
8. P.-H. C. Chen, K. Gadepalli, R. MacDonald, Y. Liu, S. Kadowaki, K. Nagpal, T. Kohlberger, J. Dean, G. S. Corrado, J. D. Hipp, C. H. Mermel, and M. C. Stumpe, Nat. Med. 25, 1453 (2019).
9. L. Muñoz-Saavedra, L. Miró-Amarante, and M. Domínguez-Morales, Appl. Sci. 10, 322 (2020).
10. C. Ebner, P. Mohr, T. Langlotz, Y. Peng, D. Schmalstieg, G. Wetzstein, and D. Kalkofen, IEEE Trans. Visual. Comput. Graphics 29, 2816 (2023).
11. Z. Lv, J. Liu, J. Xiao, and Y. Kuang, Opt. Express 26, 32802 (2018).
12. J. Zhao, B. Chrysler, and R. K. Kostuk, Opt. Eng. 60, 085101 (2021).
13. M.-B. Lien, C.-H. Liu, I. Y. Chun, S. Ravishankar, H. Nien, M. Zhou, J. A. Fessler, Z. Zhong, and T. B. Norris, Nat. Photonics 14, 143 (2020).
14. A. R. Travis, T. A. Large, N. Emerton, and S. N. Bathiche, Proc. IEEE 101, 45 (2013).
15. J.-H. Song, J. van de Groep, S. J. Kim, and M. L. Brongersma, Nat. Nanotechnol. 16, 1224 (2021).
16. A. Koppelhuber and O. Bimber, Opt. Express 21, 4796 (2013).
17. G. Kim and R. Menon, Opt. Express 26, 22826 (2018).
18. X. Chen, N. Tagami, H. Konno, T. Nakamura, S. Takeyama, X. Pan, and M. Yamaguchi, Opt. Express 30, 25006 (2022).
19. K. Yanny, K. Monakhova, R. W. Shuai, and L. Waller, Optica 9, 96 (2022).
20. J. Wu, H. Zhang, W. Zhang, G. Jin, L. Cao, and G. Barbastathis, Light: Sci. Appl. 9, 53 (2020).
21. Y. Xue, Q. Yang, G. Hu, K. Guo, and L. Tian, Optica 9, 1009 (2022).
22. J. Alido, J. Greene, Y. Xue, G. Hu, Y. Li, K. J. Monk, B. T. DeBenedicts, I. G. Davison, and L. Tian, "Robust single-shot 3D fluorescence imaging in scattering media with a simulator-trained neural network," arXiv, arXiv:2303.12573 (2023).
23. A. Beck and M. Teboulle, IEEE Trans. on Image Process. 18, 2419 (2009).
24. Y. Ma, Y. Gao, J. Wu, and L. Cao, "See-through camera based on an AR lightguide," GitHub, 2023, https://github.com/THUHoloLab/LightguideCam.