Abstract
As the foundation of virtual content generation, cameras are crucial for augmented reality (AR) applications, yet their integration with transparent displays has remained a challenge. Prior efforts to develop see-through cameras have struggled to achieve high resolution and seamless integration with AR displays. In this work, we present LightguideCam, a compact and flexible see-through camera based on an AR lightguide. To address the overlapping artifacts in the measurement, we present a compressive sensing algorithm based on an equivalent imaging model that minimizes computational cost and calibration complexity. We validate our design using a commercial AR lightguide and demonstrate a field of view of 23.1° and an angular resolution of 0.1° in the prototype. Our LightguideCam has great potential as a plug-and-play extensional imaging component in AR head-mounted displays, with promising applications for eye-gaze tracking, eye-position perspective photography, and improved human–computer interaction devices, such as full-screen mobile phones.
© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement
Augmented reality (AR) technology offers an immersive experience blending real-world and computer-generated content, which promises to revolutionize areas as diverse as education, entertainment, and medical services. As one of the most representative AR devices, near-eye display components based on transparent media have received widespread attention [1–7]. Cameras, which serve as a base for creating virtual content, are also key components of AR devices [8–12]. However, conventional cameras have an opaque appearance that makes it challenging to integrate them with transparent displays. A compact and flexible see-through camera represents a promising solution for AR applications.
Recently, two types of see-through cameras have been proposed. In one approach, semi-transparent photodetectors have been created by stacking a graphene light-sensing layer on high-transmission substrates [13]. In the other approach, opaque photodetectors and peripheral circuits have been hidden using beam-splitting elements [14] or a light modulator [15]. To lighten the burden on optical elements, computational imaging schemes have been employed in see-through cameras based on luminescent concentrator films [16], a window with roughened edges [17], and volume holographic optical elements [18]. However, these systems tend to rely on highly customized optical elements. Meanwhile, imaging resolution is limited by low-density arrayed photodetectors or computation-intensive reconstruction requirements. Both drawbacks hinder their integration and applicability.
In this Letter, we present LightguideCam, a high-resolution see-through computational camera that can be integrated with AR displays. As demonstrated in Fig. 1, light from the object partly passes through the lightguide and forms images on the retina, while the rest of the light is guided to form a blurred image on the sensor plane. An equivalent model-based algorithm is proposed to achieve high-resolution reconstruction with minimal computational complexity compared with fully calibration-based methods. The design of LightguideCam is simple yet effective and can be integrated as an extension component with AR display devices, offering great potential for AR applications such as eye-gaze tracking, eye-position perspective photography, and beyond.
Typically used in AR head-mounted devices, lightguides project images from a display onto the retina while allowing external light to pass through. The lightguide demonstrated in the LightguideCam consists of a partially reflective mirrors array (PRMA), i.e., a series of transparent plates with parallel bevels. Light emitted by the display is collimated by a lens and then undergoes total internal reflection at the lightguide–air interface. A portion of the light is reflected to the human eye by the bevels, and the rest continues to travel, resulting in virtual images at infinity while maintaining the view of the scene behind the lightguide. Improved display quality is achieved by designing the PRMA structure, including using multiple reflective mirrors for eyebox expansion and adjusting bevel reflectivity with coatings to ensure uniform display intensity at different positions [5].
As illustrated in Fig. 2(a), we reverse the conventional projection light path by replacing the display with an image sensor. Light from an object point at a finite distance partly passes through the lightguide, and the rest of the light forms multiple points on the sensor plane. Figure 2(b) presents the experimentally captured spatially varying point spread functions (PSFs) of the LightguideCam. Here, $p_x$, $p_y$, and $p_z$ are orthogonal axes in the world coordinate system. Computational reconstruction is employed to eliminate the artifacts caused by propagation in the lightguide and to acquire clear images.
The PSFs of the LightguideCam exhibit strong shift variance and large spatial extents, which present significant challenges for deconvolution methods due to high computational costs and impractical calibration demands [19–22]. To address these limitations, we propose an equivalent forward imaging model of PRMA lightguides and a corresponding model-based reconstruction algorithm.
The equivalent forward imaging model is depicted in Fig. 3. The artifacts in measurement are decomposed into images of the object from different viewpoints (refer to Supplement 1 for a detailed derivation). A shifted camera array is equivalently created, with displacements parallel and perpendicular to the sensor plane. Angle $\theta$ between the shifted array and the $p_z$ axis is given by
where $l$ represents the distance between two partially reflective mirrors, $t$ and $n$ are the thickness and the refractive index of the lightguide, respectively, and $\phi$ is the angle between the lightguide–air interface and the bevel. The final image is obtained by summing the measurements of the equivalent cameras. To achieve improved accuracy, the PSF variance along the $p_x$ axis caused by secondary reflections in the lightguide is further modeled with a $p_x$-dependent camera weight $w_i$. Supplement 1 contains detailed descriptions of the calibration method. To summarize, the measurement $b$ of the LightguideCam is a weighted summation of the sub-images $a_i$ given by the operators $\mathcal{A}_i$. We number the equivalent cameras $1, 2, \ldots, N$ in ascending order of $p_x$ and mark their corresponding measurements as $a_1, a_2, \ldots, a_N$. The relation in the camera coordinates $(q_x, q_y)$ is given by
$$b(q_x, q_y) = \sum_{i=1}^{N} w_i\, a_i(q_x, q_y). \tag{2}$$
By representing images as vectors and operators as matrices, which are indicated with the original letters in bold, Eq. (2) takes the concise form
$$\boldsymbol{b} = \boldsymbol{\Sigma A x}, \tag{3}$$
where $\boldsymbol{x}$ is the object, $\boldsymbol{A}$ stacks the sub-image operators $\mathcal{A}_i$, and $\boldsymbol{\Sigma}$ performs the weighted summation.
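The weighted-summation forward model described above can be sketched numerically. The following is a minimal, self-contained illustration rather than the calibrated prototype model: each equivalent camera contributes a laterally shifted copy of the scene, and the measurement is their weighted sum. The shift amounts and weights are hypothetical, and the axial ($p_z$) displacements are omitted for brevity.

```python
import numpy as np

def forward_model(x, shifts, weights):
    """Weighted summation of shifted sub-images, mimicking the
    equivalent camera array of the PRMA lightguide: each equivalent
    camera i contributes a shifted copy of the scene x with weight w_i."""
    b = np.zeros_like(x, dtype=float)
    for (dy, dx), w in zip(shifts, weights):
        b += w * np.roll(x, shift=(dy, dx), axis=(0, 1))
    return b

# A point source reveals the model's multi-spot PSF.
x = np.zeros((64, 64))
x[32, 32] = 1.0
shifts = [(0, -8), (0, 0), (0, 8)]   # hypothetical pixel displacements along q_x
weights = [0.25, 0.5, 0.25]          # hypothetical camera weights w_i
b = forward_model(x, shifts, weights)
```

Applying the model to a point source reproduces the qualitative behavior in Fig. 2(b): a single object point maps to multiple weighted spots on the sensor plane.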
The equivalent model describes the PSF variance over the entire space while requiring calibration only along the $p_x$ axis, which is highlighted by the red dotted box in Fig. 2(b). As demonstrated in Figs. 4(a) and 4(b), the high similarity between the experimentally measured and simulated PSFs along three directions is quantitatively confirmed, showcasing the effectiveness of the equivalent model in describing the spatially varying artifacts caused by the lightguide. The PSFs at different depths $d$ exhibit low cross correlation, as indicated in Fig. 4(c). Therefore, scenes at different depths can be reconstructed separately, indicating the LightguideCam's capability for depth resolution.
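The cross-correlation criterion for depth discrimination can be illustrated with a toy check. Here `ncc_peak` computes the peak of the normalized circular cross correlation between two PSFs; the two-impulse PSFs below are hypothetical stand-ins for the measured lightguide PSFs, with an impulse separation that shrinks with depth, and are not the prototype's data.

```python
import numpy as np

def ncc_peak(p1, p2):
    """Peak of the normalized circular cross correlation between two
    PSFs; a value near 1 means the two depths are hard to separate."""
    a = (p1 - p1.mean()) / (p1.std() * p1.size)
    b = (p2 - p2.mean()) / p2.std()
    corr = np.fft.ifft2(np.fft.fft2(a) * np.conj(np.fft.fft2(b))).real
    return corr.max()

# Toy two-impulse PSFs whose separation shrinks with depth.
p_near = np.zeros((32, 32)); p_near[16, 10] = p_near[16, 22] = 1.0
p_far = np.zeros((32, 32)); p_far[16, 13] = p_far[16, 19] = 1.0
same = ncc_peak(p_near, p_near)   # identical depth: peak correlation of 1
cross = ncc_peak(p_near, p_far)   # distinct depths: noticeably lower peak
```

Low cross correlation between the PSFs of two depths is what allows the reconstruction to assign scene content to the correct depth.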
It should be noted that when the scene is located at an infinite distance, the displacements between equivalent cameras become negligible compared with the imaging distance, i.e., $\delta _x, \delta _z \ll d$. Under such circumstances, the scene forms a clear image on the sensor plane. However, computational reconstruction techniques must be employed to achieve accurate imaging of the scene when the displacement effect cannot be ignored.
We construct an optimization problem based on the equivalent imaging model. Following compressive sensing theory, we pursue $l_1$-norm minimization of images after a sparsifying transform to achieve robust reconstruction. A commonly used prior is that natural images are sparse in the gradient domain, which is also known as total variation regularization. The optimization problem is
$$\min_{\boldsymbol{x}} \; \frac{1}{2} \left\| \boldsymbol{\Sigma A x} - \boldsymbol{b} \right\|_2^2 + \lambda \left\| \boldsymbol{\Psi x} \right\|_1. \tag{4}$$
Equation (4) can be efficiently solved with the fast iterative shrinkage-thresholding algorithm (FISTA). The pseudo-code for the iterative reconstruction is presented in Algorithm 1, where we perform gradient projection for denoising [23].
In Algorithm 1 and Algorithm 2, $\lambda$ is the regularization parameter, $\boldsymbol {\Psi }$ is the gradient operation, $\boldsymbol {(\cdot )^T}$ represents the transpose, and $\nabla f(\boldsymbol {y_k})=\boldsymbol {A^T \Sigma ^T}(\boldsymbol {\Sigma A y_k -b})$. The operators $\mathcal {P}_P$ and $\mathcal {P}_C$ are projection operators that are defined on the sets $[-1,1]$ and $[0,1]$, respectively. Specifically, these operators are represented by the pixel-wise functions $\mathcal {P}_P(\boldsymbol {x})=\max \left [ \min (\boldsymbol {x},1),-1\right ]$ and $\mathcal {P}_C(\boldsymbol {x})=\max \left [ \min (\boldsymbol {x},1),0\right ]$.
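As a concrete reference point, below is a minimal FISTA sketch. To keep it short, it applies the $l_1$ penalty directly to the image (soft thresholding) rather than to its gradient, so it is a simplified stand-in for the TV-regularized Algorithm 1, which additionally requires an inner gradient-projection denoising loop [23]; the clipping step plays the role of the projection $\mathcal{P}_C$ onto $[0,1]$. The operator arguments and parameter values are illustrative assumptions.

```python
import numpy as np

def fista(A_op, AT_op, b, lam, L, n_iter=100):
    """FISTA for min_x 0.5||Ax - b||^2 + lam*||x||_1, x in [0, 1].

    A_op / AT_op: forward operator and its adjoint (e.g. the weighted
    summation of shifted sub-images and its transpose).
    L: Lipschitz constant of the gradient of the data term.
    """
    x = np.zeros_like(b, dtype=float)
    y, t = x.copy(), 1.0
    for _ in range(n_iter):
        grad = AT_op(A_op(y) - b)          # gradient of 0.5||Ay - b||^2
        z = y - grad / L                   # gradient step
        x_new = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
        x_new = np.clip(x_new, 0.0, 1.0)   # projection onto [0, 1]
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + (t - 1.0) / t_new * (x_new - x)  # momentum update
        x, t = x_new, t_new
    return x

# Sanity check with the identity operator: the solution is the
# soft-thresholded measurement, clipped to [0, 1].
b_demo = np.array([[0.5, 0.05], [1.5, -0.3]])
x_rec = fista(lambda v: v, lambda v: v, b_demo, lam=0.1, L=1.0, n_iter=20)
```

For the real system, `A_op` would implement $\boldsymbol{\Sigma A}$ and `AT_op` its transpose, matching the gradient $\boldsymbol{A^T \Sigma^T}(\boldsymbol{\Sigma A y_k - b})$ given above.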
Our method is verified using a non-customized PRMA lightguide in commercial AR glasses (LLVision L-PAT35), as demonstrated in Fig. 5(a). A Sony IMX264 CMOS sensor with $2056 \times 2463$ pixels is positioned parallel to the lens principal plane. The sensor plane can be adjusted for the focusing distance by moving it along the normal direction. The 2D scene is positioned $d = 50$ cm away from the lightguide. The calibration of this prototype covers a field of view of 23.1$^{\circ}$, ensuring high-quality reconstruction within that region. As illustrated in Fig. 5(b), the model-based algorithm reduces the overlapping artifacts in the measurement. The reconstruction achieves an angular resolution of 0.1$^{\circ}$, approaching the diffraction limit corresponding to the numerical aperture of the lightguide. Supplement 1 contains additional information on the experiment and performance analysis.
We also test the depth resolution of LightguideCam with 3D scenes, as illustrated in Fig. 6. The front and back objects are positioned at distances of 50 cm and 60 cm from the lightguide, respectively. Digital refocusing from a single-shot measurement is performed computationally by applying different depth parameters in the reconstruction. The defocused object presents overlapping artifacts. The depth resolution increases with the maximum distance between the partially reflective mirrors, i.e., the baseline distance. A larger baseline produces low-correlated PSF patterns over a larger depth range, which suggests the capacity for single-shot compressive 3D imaging and extended depth-of-field (EDoF) imaging with the proposed LightguideCam.
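The role of the baseline distance can be made concrete with a back-of-the-envelope disparity calculation under a simple pinhole approximation. All parameter values below (baseline, focal length, pixel pitch) are hypothetical illustrations, not the prototype's calibrated values: the disparity between the outermost equivalent cameras scales inversely with depth, so a larger baseline separates nearby depths more strongly.

```python
def disparity_px(d_mm, baseline_mm=20.0, focal_mm=8.0, pitch_mm=0.00345):
    """Disparity in pixels between the two outermost equivalent cameras
    for an object at depth d_mm, under a pinhole approximation.
    All default parameter values are illustrative assumptions."""
    return baseline_mm * focal_mm / (d_mm * pitch_mm)

# Objects at 50 cm and 60 cm, as in the experiment, produce clearly
# different disparities, which is what enables digital refocusing.
d_front, d_back = 500.0, 600.0
delta = disparity_px(d_front) - disparity_px(d_back)
```

Doubling the baseline doubles this disparity difference, consistent with the observation that depth resolution grows with the maximum mirror separation.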
In conclusion, we have demonstrated a novel design for an integrated, compact, and flexible see-through computational camera based on a PRMA AR lightguide. The reconstruction algorithm leverages the equivalent imaging model to circumvent the high computational cost and calibration burden of fully calibration-based methods, enabling high-resolution reconstruction. Tests with 3D scenes reveal the potential of single-shot 3D imaging and EDoF imaging with the prototype. This scheme is valid for AR devices with diverse structures, highlighting potential applications as diverse as smart glasses, mobile phone screens, automotive electronics, and functional decorations. The integrated optical path also facilitates active illumination, which can help improve the detection signal-to-noise ratio in dark environments. Moreover, it highlights the potential of computational imaging strategies for creating new-form cameras.
Funding
National Natural Science Foundation of China (62235009); National Key Research and Development Program of China (2021YFB2802000).
Acknowledgments
The authors thank Mr. Fei Wu at LLVision for providing the AR Lightguide.
Disclosures
The authors declare no conflicts of interest.
Data availability
Data underlying the results presented in this paper are available in Ref. [24].
Supplemental document
See Supplement 1 for supporting content.
REFERENCES
1. Y. Shi, C. Wan, C. Dai, S. Wan, Y. Liu, C. Zhang, and Z. Li, Optica 9, 670 (2022).
2. G.-Y. Lee, J.-Y. Hong, S. Hwang, S. Moon, H. Kang, S. Jeon, H. Kim, J.-H. Jeong, and B. Lee, Nat. Commun. 9, 4562 (2018).
3. X. Zhang, X. Li, H. Zhou, Q. Wei, G. Geng, J. Li, X. Li, Y. Wang, and L. Huang, Adv. Funct. Mater. 32, 2209460 (2022).
4. C. Chang, K. Bang, G. Wetzstein, B. Lee, and L. Gao, Optica 7, 1563 (2020).
5. D. Cheng, Q. Wang, Y. Liu, H. Chen, D. Ni, X. Wang, C. Yao, Q. Hou, W. Hou, G. Luo, and Y. Wang, Light: Advanced Manufacturing 2, 336 (2021).
6. J. Xiong, E.-L. Hsiang, Z. He, T. Zhan, and S.-T. Wu, Light: Sci. Appl. 10, 216 (2021).
7. Y. Li, S. Chen, H. Liang, X. Ren, L. Luo, Y. Ling, S. Liu, Y. Su, and S.-T. Wu, PhotoniX 3, 29 (2022).
8. P.-H. C. Chen, K. Gadepalli, R. MacDonald, Y. Liu, S. Kadowaki, K. Nagpal, T. Kohlberger, J. Dean, G. S. Corrado, J. D. Hipp, C. H. Mermel, and M. C. Stumpe, Nat. Med. 25, 1453 (2019).
9. L. Muñoz-Saavedra, L. Miró-Amarante, and M. Domínguez-Morales, Appl. Sci. 10, 322 (2020).
10. C. Ebner, P. Mohr, T. Langlotz, Y. Peng, D. Schmalstieg, G. Wetzstein, and D. Kalkofen, IEEE Trans. Visual. Comput. Graphics 29, 2816 (2023).
11. Z. Lv, J. Liu, J. Xiao, and Y. Kuang, Opt. Express 26, 32802 (2018).
12. J. Zhao, B. Chrysler, and R. K. Kostuk, Opt. Eng. 60, 085101 (2021).
13. M.-B. Lien, C.-H. Liu, I. Y. Chun, S. Ravishankar, H. Nien, M. Zhou, J. A. Fessler, Z. Zhong, and T. B. Norris, Nat. Photonics 14, 143 (2020).
14. A. R. Travis, T. A. Large, N. Emerton, and S. N. Bathiche, Proc. IEEE 101, 45 (2013).
15. J.-H. Song, J. van de Groep, S. J. Kim, and M. L. Brongersma, Nat. Nanotechnol. 16, 1224 (2021).
16. A. Koppelhuber and O. Bimber, Opt. Express 21, 4796 (2013).
17. G. Kim and R. Menon, Opt. Express 26, 22826 (2018).
18. X. Chen, N. Tagami, H. Konno, T. Nakamura, S. Takeyama, X. Pan, and M. Yamaguchi, Opt. Express 30, 25006 (2022).
19. K. Yanny, K. Monakhova, R. W. Shuai, and L. Waller, Optica 9, 96 (2022).
20. J. Wu, H. Zhang, W. Zhang, G. Jin, L. Cao, and G. Barbastathis, Light: Sci. Appl. 9, 53 (2020).
21. Y. Xue, Q. Yang, G. Hu, K. Guo, and L. Tian, Optica 9, 1009 (2022).
22. J. Alido, J. Greene, Y. Xue, G. Hu, Y. Li, K. J. Monk, B. T. DeBenedicts, I. G. Davison, and L. Tian, "Robust single-shot 3D fluorescence imaging in scattering media with a simulator-trained neural network," arXiv, arXiv:2303.12573 (2023).
23. A. Beck and M. Teboulle, IEEE Trans. on Image Process. 18, 2419 (2009).
24. Y. Ma, Y. Gao, J. Wu, and L. Cao, "See-through camera based on an AR lightguide," GitHub, 2023, https://github.com/THUHoloLab/LightguideCam.