
Tomographic waveguide-based augmented reality display

Open Access

Abstract

A tomographic waveguide-based augmented reality display technique is proposed for near-eye three-dimensional (3D) display with accurate depth reconstruction. A pair of tunable lenses with complementary focal powers is utilized to project tomographic virtual 3D images while maintaining the correct perception of the real scene. This approach reconstructs virtual 3D images with physical depth cues, thereby addressing the vergence-accommodation conflict inherent in waveguide augmented reality systems. A prototype has been constructed and optical experiments have been conducted, demonstrating the system’s capability to deliver high-quality 3D scenes in a waveguide-based augmented reality display.

© 2024 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Augmented reality (AR) is widely recognized as the next-generation display technology with immense potential, and it has already secured a notable market share. Among the AR combiners, such as the beam splitter (BS) [1], birdbath systems [2], and diffractive optical elements (DOE) [3], the waveguide has the closest appearance to ordinary glasses and can provide a user-friendly wearing experience [4]. However, because waveguides can only transmit plane waves, the images provided by traditional waveguide AR systems are typically located at a single depth, causing serious vergence-accommodation conflict (VAC) [5].

Efforts have been made to address the VAC with a variety of methods, including Maxwellian displays [6–9], integral imaging augmented reality displays [10–12], holographic displays [13–15], and multifocal displays [16–18]. Maxwellian displays project images directly onto the retina, bypassing the eye’s accommodation process. Integral imaging displays approximately reconstruct the light field, offering discrete depth cues. Holographic displays rigorously reconstruct the original light field, providing complete depth cues. These methods address the VAC issue to some extent, but they are not applicable to waveguide-based AR systems.

Over the past decades, research into the multifocal display approach has yielded intriguing findings, with a key component being the use of varifocal devices, often in the form of tunable lenses [19–23]. Back in 2008, a seminal paper laid out the foundational structure of multifocal displays, employing tunable lenses as their centerpiece [19]. Generating multiple depth planes imposes high demands on the refresh rate of the devices. In recent years, with the development of display devices, the refresh rate and depth range of multifocal displays have gradually improved [21,22]. However, most of these approaches have targeted virtual reality applications [20], with AR receiving comparatively less attention. The AR solutions that rely solely on traditional optical schemes such as the BS are structurally complex and perform poorly in terms of portability and comfort [16,21,24,25].

To address the aforementioned challenges, we present a tomographic waveguide-based augmented reality display (TWARD) that provides a large depth of field and solves the VAC problem. The key component is the focus adjustment and compensation subsystem (FACS), formed by two focus-tunable lenses located on both sides of the waveguide. The FACS, synchronized at high speed with the image source, constructs multiple depth layers. As a result, our prototype achieves a high-performance AR display with a large depth of field (from 0.25 m to infinity), a high spatial resolution of 2048 × 1600, a frame rate of 25 frames per second, and a large exit pupil (16 mm clear aperture).

2. Methods

TWARD forms an approximately continuous depth effect from several sufficiently dense depth layers. Because the human eye’s resolution for nearby depths is limited to about 0.3 D (diopters) [26,27], when the depth layer density exceeds the eye’s resolution capacity, discrete depth layers are perceived as continuous depth. The structural schematic of TWARD is shown in Fig. 1. A high-refresh-rate liquid crystal on silicon (LCoS) panel, coupled with a laser light source and relay optics, serves as the image source and provides all depth layer images within a single frame. A commercial optical waveguide is employed as the combiner, offering advantages such as a wide field of view, light weight, high brightness, and pupil expansion. Two focus-tunable lenses are positioned on either side of the waveguide: the one closer to the eye modulates the depth position of the virtual images, and the other compensates for the impact of the modulating lens on light from the real scene. The optical path of TWARD is as follows. The laser light source illuminates the LCoS. The images from the LCoS, indicated in pink in Fig. 1, are modulated into plane waves by the in-coupling lens and coupled into the waveguide. Finally, the output images are modulated to the specified depth by the adjustment tunable lens. Meanwhile, a compensation tunable lens placed on the other side of the waveguide counteracts the effect of the adjustment tunable lens on the real-scene light.
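As a rough illustration of this layer-density requirement, the following Python sketch (ours, not part of the paper; the 0.3 D threshold and the 0.25 m near limit are taken from the text) counts the uniformly diopter-spaced planes needed to cover the prototype's depth range.

import math

# Illustrative sketch: how many uniformly spaced depth planes (in diopters)
# keep adjacent planes within the eye's ~0.3 D depth resolution over the
# 0.25 m-to-infinity range of the prototype.
near_diopter = 1.0 / 0.25      # 0.25 m   -> 4.0 D
far_diopter = 0.0              # infinity -> 0.0 D
eye_resolution_d = 0.3         # depth resolution of the eye quoted in the text

n_gaps = math.ceil((near_diopter - far_diopter) / eye_resolution_d)
n_layers = n_gaps + 1
spacing_d = (near_diopter - far_diopter) / n_gaps

planes_d = [far_diopter + i * spacing_d for i in range(n_layers)]
planes_m = [float("inf") if d == 0 else 1.0 / d for d in planes_d]
print(n_layers, round(spacing_d, 3))   # 15 layers at ~0.286 D spacing

For comparison, the prototype reports ten depth planes over the same range, i.e., roughly 0.44 D apart if spaced uniformly, trading some layer density for frame rate.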

Fig. 1. Structural schematic diagram of the tomographic waveguide-based augmented reality display.

In summary, the whole system consists of three parts: the FACS, the pupil-expanding waveguide, and the high-frequency image display (HFID). With high-precision synchronization between the FACS and the HFID, a large depth-of-field augmented reality effect is achieved.

2.1 Focus adjustment and compensation

The function of the FACS is to modulate the depth position of the virtual images while avoiding any impact on the real scene. The FACS is made up of two tunable lenses located on opposite sides of the waveguide, namely the adjustment tunable lens (ATL) and the compensation tunable lens (CTL), as shown in Fig. 2.

Fig. 2. Schematic diagram of the focus adjustment and compensation principle. The blue rays are from the real scene, while the orange rays are emitted from the out-coupling grating of the waveguide and adjusted into divergent light by the ATL. (a) When the virtual image “T” is formed on the $z_1$ plane by the ATL, the effect of the ATL on real-scene light is counteracted by the CTL. (b) When the focal length of the ATL is adjusted to form the virtual image “H” on the $z_2$ plane, the focal length of the CTL is adjusted simultaneously.

In Fig. 2, the real-scene light rays are represented by the blue lines, and the virtual-image light rays are represented by the orange lines. The virtual-image light passes through the ATL only, while the real-scene light passes through both the CTL and the ATL. The ATL modulates the plane waves emitted from the waveguide into divergent spherical waves. The reverse extension of the divergent spherical waves converges at the target depth. The CTL maintains the opposite focal power to the ATL to counteract the effect of the ATL on real-scene light. By changing the focal power of the ATL, virtual images can be projected to different depths. As shown in Fig. 2(a), at moment $t_1$, the focal length of the ATL is set to $f_{t_1}$ to project the virtual image “T” onto the $z_1$ depth plane. Meanwhile, the focal length of the CTL is set to $f_{t_1}^{\prime}$ to counteract the effect of the ATL on the real scene. At moment $t_2$, the virtual image “H” is projected onto the $z_2$ depth plane, and the focal length of the CTL is simultaneously adjusted to $f_{t_2}^{\prime}$. When the depth planes $z_1$ to $z_n$ are dense enough, continuous 3D scenes are formed.

According to the ideal optical system combination formula, the focal length of FACS can be calculated as follows:

$$\frac{1}{f^{\prime}} = \frac{1}{f_1^{\prime}} + \frac{1}{f_2^{\prime}} - \frac{d}{f_1^{\prime} f_2^{\prime}}$$
where $f^{\prime}$ is the image-side focal length of the FACS, $f_1^{\prime}$ and $f_2^{\prime}$ are the image-side focal lengths of the CTL and ATL, respectively, and $d$ is the distance between the image-side principal plane of the CTL and the object-side principal plane of the ATL.

The focal power of a tunable lens is directly determined by the applied current, and the human eye’s depth perception is also commonly expressed in focal power, so it is more convenient to work in terms of focal power in most cases, as shown in Eq. (2).

$$\varphi = {\varphi _1} + {\varphi _2} - d{\varphi _1}{\varphi _2}$$
where $\varphi$ is the focal power of the FACS, and $\varphi_1$ and $\varphi_2$ are the focal powers of the CTL and ATL, respectively. By adapting $\varphi$ to the user's level of myopia or hyperopia, it is possible to correct their visual acuity. If the user's eyes are healthy, neither nearsighted nor farsighted, $\varphi$ is set to 0, and the focal power of the CTL, $\varphi_1$, changes with $\varphi_2$ as follows:
$${\varphi _1} = \frac{{{\varphi _2}}}{{d{\varphi _2} - 1}}$$
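A minimal numerical sketch of Eqs. (2) and (3) (the lens separation d below is an illustrative value, not one given in the paper): for a chosen ATL power φ2 it returns the CTL power φ1 that drives the combined power of the FACS to zero for real-scene light.

def combined_power(phi1, phi2, d):
    """Eq. (2): combined focal power of CTL and ATL separated by d (metres)."""
    return phi1 + phi2 - d * phi1 * phi2

def compensation_power(phi2, d):
    """Eq. (3): CTL power that keeps the combined power at zero."""
    return phi2 / (d * phi2 - 1.0)

d = 0.01                           # assumed 10 mm separation across the waveguide
for phi2 in (-4.0, -2.0, -1.0):    # ATL powers placing virtual images at 0.25 m, 0.5 m, 1 m
    phi1 = compensation_power(phi2, d)
    print(phi2, round(phi1, 4), round(combined_power(phi1, phi2, d), 12))
# The combined power is 0 in each case, so the real scene stays unaffected
# while the virtual image depth follows the ATL power.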

When the focal power of the CTL is improperly set, blurring may occur during high-frequency refresh. A rigorous geometric optical analysis of the causes of this blurring, together with potential solutions, is given in Section 1 of Supplement 1. In Section 2 of Supplement 1, we analyze the impact of ATL focal power changes on the field of view of the virtual images. The analysis demonstrates that while the depth of the virtual images changes with the ATL focal power, the field of view of TWARD remains constant.

2.2 Depth plane images generation

By manipulating the focus adjustment and compensation elements, TWARD can precisely project virtual images at defined depths. If the depth layer images are sufficiently dense and the refresh rates of all components are high enough to exceed the discernible threshold of the human eye, continuous depth for three-dimensional display can be achieved.

To reconstruct an entire 3D scene, the target is separated into layer images by diopter, and all the layer images are then displayed within a short time to generate the 3D scene, as shown in Fig. 3. Specifically, as shown in Fig. 3(a), the depth map and the intensity map of a virtual scene are obtained by ray tracing. The intensity map is then stratified into tomographic images according to the diopter values given by the depth map. The tomographic images are played back sequentially over a period of time, where “t1” to “tn” denote the durations corresponding to each depth-layer image. Together, all the tomographic images form one frame of the 3D scene. Figures 3(b) and (c) show the processing of a specific virtual scene, a 3D airplane model: the target airplane is separated into layer images and then reconstructed by TWARD. A sketch of this stratification step is given below.
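In the sketch below, the function name, the uniform diopter bins, and the NumPy usage are our own illustrative choices; the paper does not specify the implementation. Each pixel of the intensity map is assigned to a tomographic layer according to its diopter value in the depth map.

import numpy as np

def stratify(intensity, depth_m, n_layers, near_m=0.25):
    """Split an intensity map into tomographic layer images binned by diopter.
    intensity: (H, W) grayscale array; depth_m: (H, W) depth in metres (inf = background)."""
    depth_d = np.where(np.isinf(depth_m), 0.0, 1.0 / depth_m)   # metres -> diopters
    edges = np.linspace(0.0, 1.0 / near_m, n_layers + 1)        # uniform diopter bins
    idx = np.clip(np.digitize(depth_d, edges) - 1, 0, n_layers - 1)
    # One image per layer; the layers are displayed sequentially within one frame (t1 ... tn).
    return [np.where(idx == i, intensity, 0) for i in range(n_layers)]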

Fig. 3. The generation and display methods of depth layer images. (a) The generation method. (b) The target virtual light field. (c) The reconstruction of the target light field.

2.3 Matching and controlling method

To achieve continuous-depth tomographic display, it is imperative to employ high-refresh-rate devices and to control them with high precision. Each frame of a color depth-layer image is composed of red, green, and blue components, so a high-refresh-rate tricolor laser is required as the light source. The relationship between the refresh rate of the laser $F_{laser}$, the number of depth layers $N_{depth}$, and the frame rate of images perceived by the human eye $FPS_{eyes}$ is given by the following equation:

$$F_{laser} = N_{depth} \times FPS_{eyes} \times 3$$

The matching of the laser, the LCoS, and the focal power of the ATL is depicted in Fig. 4. Each depth layer comprises three images, representing the red, green, and blue components. Depth-layer images are inserted only on one edge of the tunable-lens drive signal, either the rising or the falling edge, with no images inserted on the other edge. This halves the frame rate at which the human eye perceives the images, but it also mitigates the impact of focal power errors between the rising and falling edges, resulting in more stable depth perception of the output virtual scene. A worked refresh-rate budget is given below.
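Plugging the prototype's figures (ten depth layers at 25 frames per second; Section 3) into Eq. (4) gives a quick feasibility check; reading the single-edge scheme as leaving half of the FLCoS subframe slots unused is our own back-of-the-envelope interpretation.

# Worked check of Eq. (4) using the prototype figures from Section 3.
N_depth, FPS_eyes = 10, 25
F_needed = N_depth * FPS_eyes * 3   # Eq. (4): 750 RGB subframes per second
F_flcos = 1696                      # binary subframes per second of the FLCoS
F_usable = F_flcos / 2              # only one edge of the triangular drive is used
print(F_needed, F_usable)           # 750 <= 848.0, so the refresh-rate budget closes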

Fig. 4. The matching mode of the laser, FLCoS, and the tunable lens.

3. Results

The TWARD prototype system is set up as shown in Fig. 5. The light emitted by the laser light source (LLS) is directed onto the ferroelectric liquid crystal on silicon (FLCoS) panel via a polarizing beam splitter (PBS). The resulting FLCoS images are coupled into the waveguide through a coupling lens and are modulated to specific depths by the ATL. Meanwhile, the real scene is transmitted through both the ATL and the CTL. A 12 V battery supplies power to the whole system. A QXGA FLCoS is used as the high-refresh-rate image display; it is a reflective liquid crystal device with a spatial resolution of 2048 × 1600 pixels on a 17.843 mm × 13.645 mm area. The response time of the QXGA is 0.5893 ms, so it can provide 1696 frames per second of single-color binary images. The Fisba ReadyBeam laser source is employed as the illumination source, supporting high-frequency switching of its RGB channels at up to 1 MHz.
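The 1696 frames per second quoted above follows directly from the stated response time, since each binary subframe occupies roughly one response period:
$$\frac{1}{0.5893\ \textrm{ms}} \approx 1697\ \textrm{Hz},$$
consistent with the quoted 1696 single-color binary frames per second.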

Fig. 5. Physical diagram of the prototype system.

For the FACS, two Optotune EL-16-40 focus-tunable lenses are used as the ATL and CTL, respectively. Each tunable lens can be shaped from a flat zero state into a concave or convex lens, with a focal tuning range of −10 to +10 diopters. As a result, the maximum depth range of the virtual scene is 10 cm to infinity. The EL-16-40 supports a maximum focal modulation frequency of 1000 Hz and has a response time of 5 ms, far exceeding the requirements of a display system running at 25 Hz, 30 Hz, or 60 Hz, making it suitable for the focus adjustment tasks of TWARD.

The TDDW40PLUS waveguide from Greatar is used as the combiner to provide the augmented reality display; it has two surface relief gratings serving as the in-coupling and out-coupling gratings. The waveguide provides a large field of view of 50 degrees and a large eyebox of 23 mm × 20 mm thanks to its pupil replication function. However, because the liquid lens aperture is 16 mm, the system exit pupil is a circular aperture 16 mm in diameter.

To achieve high-accuracy synchronous control of the laser source, the image display, and the two tunable lenses, we use an Arduino board to generate the synchronization signals. As shown in Fig. 6(a), the Arduino board generates two triangular wave signals to control the ATL and CTL, and a trigger signal to control the FLCoS development board. In the prototype system, the images are pre-stored in the memory of the FLCoS development board. Once the trigger signal is received, the laser source is switched between its RGB channels and the corresponding images are synchronously displayed on the FLCoS. The temporal relationship between the signals output by the Arduino is shown in Fig. 6(b). In the analog control mode of the EL-16-40 tunable lens, a voltage of 2 V represents zero focal power, and the 0–2 V and 2–4 V ranges correspond to convex and concave states, respectively. A sketch of this voltage mapping is given below.
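The mapping from a target ATL or CTL focal power to the EL-16-40 analog control voltage can be sketched as follows; the end points (2 V for zero power, 0–2 V convex, 2–4 V concave, ±10 D tuning range) are taken from the text, while the linearity of the mapping is our assumption.

def el1640_control_voltage(power_d):
    """Map a target focal power (diopters) to an analog control voltage.
    Assumes a linear mapping: 2 V -> 0 D, 0-2 V convex (+), 2-4 V concave (-),
    with +/-10 D end points. Linearity is an assumption, not a datasheet fact."""
    if not -10.0 <= power_d <= 10.0:
        raise ValueError("EL-16-40 tuning range is -10 to +10 diopters")
    return 2.0 - power_d / 5.0      # +10 D -> 0 V, 0 D -> 2 V, -10 D -> 4 V

for p in (4.0, 0.0, -4.0):          # e.g. a virtual image at 0.25 m needs -4 D on the ATL
    print(p, el1640_control_voltage(p))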

Fig. 6. The generation of control signals. (a) Synchronous controlling of all components through hardware. (b) The temporal relationships among signals.

The optical experimental results are depicted in Fig. 7. A phone camera with an aperture similar to that of the human eye is used to capture the results. With high-precision time synchronization among the FLCoS, the LLS, and the FACS, the best display performance of TWARD is ten depth planes at 25 frames per second, with a depth range from 0.25 m to infinity. The experiment is divided into two steps. First, all the depth planes of the virtual airplane are projected about 5 m in front of the camera, as shown in Figs. 7(a) and 7(b). When the camera is focused at 5 m, both the virtual image of the airplane and the distant real objects are sharp, as shown in Fig. 7(a); when the camera is focused at 0.25 m, the virtual image is blurred, as shown in Fig. 7(b). Then, all the depth planes of the virtual airplane are placed at around 0.25 m, as shown in Figs. 7(c) and 7(d). When the camera is again focused at 5 m, the virtual image becomes blurred while the distant real objects remain sharp, as shown in Fig. 7(c); when the camera is focused at 0.25 m, the virtual image becomes sharp, as shown in Fig. 7(d). These experiments validate the depth-position adjustment capability of TWARD. In Section 3 of Supplement 1, we provide a clear demonstration of the defocus blur effect in two additional optical experiments, serving as supplementary evidence of the system's 3D display capability.

Fig. 7. Optical experimental results. All depth layers of the virtual airplane are positioned at about 5 m, with results captured by the camera focused at 5 m (a) and 0.25 m (b). All depth layers of the virtual airplane are positioned at about 0.25 m, with results captured by the camera focused at 5 m (c) and 0.25 m (d).

4. Discussion

Despite the use of a QXGA FLCoS in the prototype, the image quality is still not optimal. Three contributing factors have been identified. The first is our image processing algorithm: we employ an image dithering algorithm to simulate grayscale using a single binary image, which reduces the resolution of the displayed images. Using temporal accumulation to generate grayscale could enhance spatial resolution, but at the expense of refresh rate. The second is the Seidel aberration introduced by the liquid lenses, which occurs to a certain extent when a lens is mounted vertically, owing to gravity. Opting for liquid lenses with a smaller focal power modulation range, such as 5 diopters, can reduce chromatic aberration. Lastly, the lack of uniformity of the laser light source affects the brightness uniformity of the images.

The liquid lenses used in the prototype exhibit a transmittance of approximately 94% in the visible spectrum. Consequently, the overall transmittance of the system is reduced by only 11.64% by the ATL and CTL combined.
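The quoted 11.64% follows directly from stacking the two lens transmittances:
$$1 - T_{\textrm{CTL}} \times T_{\textrm{ATL}} = 1 - 0.94 \times 0.94 = 0.1164.$$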

5. Conclusion

An innovative tomographic waveguide-based augmented reality system has been proposed to provide true full-color 3D display. The approach enables rapid adjustment of the depth position of virtual images by controlling the focal powers of the CTL and ATL located on both sides of the waveguide, addressing the VAC issue in waveguide-based augmented reality systems. A prototype has been constructed, and optical experiments have been conducted. The results demonstrate that the system can achieve continuous-depth scene display with up to ten depth layers at 25 frames per second, with a virtual scene depth range spanning from 0.25 m to infinity. However, the existing components result in an overall thickness of approximately 2 cm for the glasses, reducing the portability of this solution. Additionally, the current FLCoS refresh rate is only 1696 Hz, which limits the number of depth layers and lowers the frame rate perceived by the human eye. Our future work will use liquid crystal devices to achieve focus adjustment and compensation, further improving the compactness of the method and user comfort. We will also adopt higher-refresh-rate display devices, such as digital micromirror devices, to increase the number of depth layers, reduce flicker, and enhance image quality.

Funding

National Key Research and Development Program of China (2021YFB2802100); National Natural Science Foundation of China (62035003).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

Supplemental document

See Supplement 1 for supporting content.

References

1. H. Hua and S. Liu, “Dual-sensor foveated imaging system,” Appl. Opt. 47(3), 317–327 (2008). [CrossRef]  

2. Y. Itoh, T. Kaminokado, and K. Akşit, “Beaming Displays,” IEEE Trans. Visual. Comput. Graphics 27(5), 2659–2668 (2021). [CrossRef]  

3. C. Jang, K. Bang, S. Moon, et al., “Retinal 3D: augmented reality near-eye display via pupil-tracked light field projection on retina,” ACM Trans. Graph. 36(6), 1–13 (2017). [CrossRef]  

4. Y. Ding, Q. Yang, Y. Li, et al., “Waveguide-based augmented reality displays: perspectives and challenges,” eLight 3(1), 24 (2023). [CrossRef]  

5. D. M. Hoffman, A. R. Girshick, K. Akeley, et al., “Vergence-accommodation conflicts hinder visual performance and cause visual fatigue,” J. Vis. 8(3), 33 (2008). [CrossRef]  

6. X. Shi, J. Liu, Z. Zhang, et al., “Extending eyebox with tunable viewpoints for see-through near-eye display,” Opt. Express 29(8), 11613–11626 (2021). [CrossRef]  

7. X. Zhang, Y. Pang, T. Chen, et al., “Holographic super multi-view Maxwellian near-eye display with eyebox expansion,” Opt. Lett. 47(10), 2530–2533 (2022). [CrossRef]  

8. S. Zhang, Z. Zhang, and J. Liu, “Adjustable and continuous eyebox replication for a holographic Maxwellian near-eye display,” Opt. Lett. 47(3), 445–448 (2022). [CrossRef]  

9. Z. Wang, K. Tu, Y. Pang, et al., “Lensless phase-only holographic retinal projection display based on the error diffusion algorithm,” Opt. Express 30(26), 46450–46459 (2022). [CrossRef]  

10. H. Huang and H. Hua, “High-performance integral-imaging-based light field augmented reality display using freeform optics,” Opt. Express 26(13), 17578–17590 (2018). [CrossRef]  

11. J. Wang, H. Suenaga, H. Liao, et al., “Real-time computer-generated integral imaging and 3D image calibration for augmented reality surgical navigation,” Computerized Medical Imaging and Graphics 40, 147–159 (2015). [CrossRef]  

12. H. Deng, Q.-H. Wang, Z.-L. Xiong, et al., “Magnified augmented reality 3D display based on integral imaging,” Optik 127(10), 4250–4253 (2016). [CrossRef]  

13. Z. He, X. Sui, G. Jin, et al., “Progress in virtual reality and augmented reality based on holographic display,” Appl. Opt. 58(5), A74–A81 (2019). [CrossRef]  

14. A. Maimone, A. Georgiou, and J. S. Kollin, “Holographic near-eye displays for virtual and augmented reality,” ACM Trans. Graph. 36(4), 1–16 (2017). [CrossRef]  

15. J.-H. Park and B. Lee, “Holographic techniques for augmented reality and virtual reality near-eye displays,” Light: Advanced Manufacturing 3(1), 1 (2022). [CrossRef]  

16. G. D. Love, D. M. Hoffman, P. J. Hands, et al., “High-speed switchable lens enables the development of a volumetric stereoscopic display,” Opt. Express 17(18), 15716–15725 (2009). [CrossRef]  

17. R. Narain, R. A. Albert, A. Bulbul, et al., “Optimal presentation of imagery with focus cues on multi-plane displays,” ACM Trans. Graph. 34(4), 1–12 (2015). [CrossRef]  

18. C. K. Lee, S. Moon, S. Lee, et al., “Compact three-dimensional head-mounted display system with Savart plate,” Opt. Express 24(17), 19531–19544 (2016). [CrossRef]  

19. S. Liu, D. Cheng, and H. Hua, “An optical see-through head mounted display with addressable focal planes,” in 7th IEEE/ACM International Symposium on Mixed and Augmented Reality (2008), pp. 33–42.

20. S. Lee, Y. Jo, D. Yoo, et al., “Tomographic near-eye displays,” Nat. Commun. 10(1), 2497 (2019). [CrossRef]  

21. K. Rathinavel, H. Wang, A. Blate, et al., “An Extended Depth-at-Field Volumetric Near-Eye Augmented Reality Display,” IEEE Trans. Visual. Comput. Graphics 24(11), 2857–2866 (2018). [CrossRef]  

22. C. Ebner, S. Mori, P. Mohr, et al., “Video See-Through Mixed Reality with Focus Cues,” IEEE Trans. Visual. Comput. Graphics 28(5), 2256–2266 (2022). [CrossRef]  

23. J.-H. R. Chang, B. V. K. V. Kumar, and A. C. Sankaranarayanan, “Towards multifocal displays with dense focal stacks,” ACM Trans. Graph. 37(6), 1–13 (2018). [CrossRef]  

24. D. Dunn, C. Tippets, K. Torell, et al., “Wide Field Of View Varifocal Near-Eye Display Using See-Through Deformable Membrane Mirrors,” IEEE Trans. Visual. Comput. Graphics 23(4), 1322–1331 (2017). [CrossRef]  

25. K. Akşit, W. Lopes, J. Kim, et al., “Near-eye varifocal augmented reality display using see-through screens,” ACM Trans. Graph. 36(6), 1–13 (2017). [CrossRef]  

26. F. W. Campbell, “The Depth of Field of the Human Eye,” Opt. Acta 4(4), 157–164 (1957). [CrossRef]  

27. S. Marcos, E. Moreno, and R. Navarro, “The depth-of-field of the human eye from objective and subjective measurements,” Vision Res. 39(12), 2039–2049 (1999). [CrossRef]  

Supplementary Material

Supplement 1: Supplemental Document 1.
