High dynamic range head mounted display based on dual-layer spatial modulation

Miaomiao Xu; Hong Hua

doi:10.1364/OE.25.023320

1. Introduction

A head-mounted display (HMD) is one of the key enabling technologies for virtual reality (VR) and augmented reality (AR) systems and has been developed for a wide range of applications. For instance, a lightweight optical see-through HMD (OST-HMD) enables optical superposition of two-dimensional (2D) or three-dimensional (3D) digital information onto a user’s direct view of the physical world and maintains see-through vision to the real world. It is viewed as a transformative technology in the digital age, enabling new ways of accessing digital information essential to our daily life. In recent years, significant advancements have been made toward the development of high-performance HMD products and several HMD products are commercially deployed.

Despite this tremendous progress, one key limitation of state-of-the-art HMDs is their low dynamic range (LDR). The dynamic range (DR) of a real scene, which has infinite and continuous luminance levels, is commonly defined as the ratio between the brightest and the darkest luminance in the scene, or as the base-10 or base-2 logarithm of that ratio. A display, however, typically renders a finite number of discrete luminance levels, and its DR is commonly characterized by the maximum command level (CL) that the display can produce or by the bit depth (BD) of each pixel (or of each channel in the case of color displays). Most state-of-the-art color displays, including HMDs, are only capable of rendering images with an 8-bit depth per color channel, or at most 256 discrete intensity levels. Such a low bit depth falls far short of accurately rendering the broad dynamic range of real-world scenes, which can span as many as 14 orders of magnitude with infinitely fine luminance levels. Meanwhile, the perceivable luminance variation range of the human visual system is above 5 orders of magnitude without adaptation [1]. For immersive VR applications, LDR HMDs cannot render scenes with large contrast variations, which may result in a loss of fine structural details, image fidelity, and sense of immersion. For optical see-through AR applications, the virtual images displayed by LDR HMDs may appear washed out, with highly compromised spatial details, when merged with a real scene whose dynamic range is likely wider by several orders of magnitude. The most common method of displaying a high dynamic range (HDR) image on conventional LDR displays is to adopt a tone-mapping technique that compresses the HDR image to fit the dynamic range of the LDR device while maintaining the image integrity. Although tone mapping makes HDR images accessible through conventional displays of nominal dynamic range, it comes at the cost of reduced image contrast, which is limited by the device dynamic range, and it does not prevent the displayed images from being washed out in an AR display. Therefore, developing hardware solutions for HDR-HMD technologies becomes critical, especially for AR applications.

Several efforts have been made toward developing hardware solutions to HDR displays for direct-view type desktop applications. Perhaps the most straightforward method toward HDR displays is to increase the maximally displayable luminance level and increase the addressable bit-depth for each of the color channels of the display pixels. However, it requires high-amplitude, high-resolution drive circuits as well as light sources of high luminance, which is challenging to implement at affordable cost [2]. An alternative method is to combine two or more layers of spatial light modulators (SLM) to simultaneously control the pixel output values. Seetzen et al. proposed an HDR display method for direct-view desktop displays based on a dual-layer spatial light modulating scheme [3]. Different from conventional liquid-crystal displays (LCD) using a uniform backlight, they utilized a projector to provide spatially modulated light source for a transmissive LCD to achieve dual-layer modulation and achieve a 16-bit dynamic range with two 8-bit SLMs. They also demonstrated an alternative implementation of the dual-layer modulation scheme, in which an LED array, driven by spatially-varying electrical signals, is used to replace the projector unit and provides a spatially-varying light source to an LCD [3]. More recently, Wetzstein et al [4,5] and Hirsch et al [6] demonstrated the use of multi-layer multiplicative modulation and compressive light field factorization method for HDR displays.

In principle, the aforementioned multi-layer modulation scheme developed for direct-view desktop displays can be adapted to the design of an HDR-HMD system by directly stacking two or more miniature SLMs along with a backlight source and an eyepiece. In practice, however, directly stacking multiple layers of SLMs suffers from several critical problems, which make it practically infeasible for HDR-HMD systems. In this paper, we present a systematic study of the design, implementation, calibration, and image rendering of an HDR-HMD system that adapts the dual-layer spatial modulation method. The rest of the paper is organized as follows. We discuss the key challenges in adapting the dual-layer modulation scheme to HDR-HMD designs in Section 2, describe the overall optical system design and prototype implementation in Section 3, present the geometric calibration and rendering process in Section 4, discuss the radiance calibration and image rendering in Section 5, and demonstrate the experimental results of HDR image rendering and display in Section 6.

2. Multi-layer spatial modulation method for HDR-HMD displays

Figure 1 shows the schematic layout of two-dimensional (2D) HDR displays based on a multi-layer spatial light modulating scheme. A viewer sees a 2D image located at one of the SLM layers (e.g. the front layer SLM1), from each pixel of which a cone of light is seen, while the other layers (e.g. the back layer SLM2) provide spatially-varying light modulation to enhance the dynamic range. Without loss of generality, hereafter the front layer is referred to as the display layer and all the back layers adjacent to the light source are referred to as the modulation layer. It is worth noting that the physical order of the display and modulation layers may be altered in an actual implementation. When the layers of SLMs are placed with negligible separations, the maximum command level that can be rendered by such a multi-layer architecture can be as high as $CL_1 \times CL_2 \times \cdots \times CL_N$, where $CL_1$, $CL_2$, …, $CL_N$ are the maximum command levels of the N SLM layers, respectively. It is worth pointing out that the number of distinct command levels rendered by a multi-layer modulation architecture depends on the rendering algorithm and is typically smaller than the maximum CL given by the product. Finally, a multi-layer modulation architecture gains the DR extension at a potential cost of compromised light efficiency, considering the light loss through multiple SLMs. Therefore, light efficiency is a critical consideration when selecting the type and number of SLMs for the layers.

Fig. 1 Schematic layout of the multi-layer modulating scheme of a 2D HDR display in two different cases: (a) two SLM layers overlay perfectly as they have the same spatial resolution, and the spatial frequency contents of the image are distributed equally between the two layers; (b) two SLM layers are separated by a gap as they have different spatial resolutions, and the spatial frequency contents of the image are distributed unevenly between the two layers.

The range and accuracy of the DR modulation depend strongly on the spacing between the SLM layers. When the two SLMs of an HDR display provide the same spatial resolution, as illustrated in Fig. 1(a), it is highly desirable to overlay the two SLM layers perfectly without any axial gap, which offers pixel-by-pixel DR modulation and yields the maximum range and accuracy of DR enhancement. In this case, each pixel on the display layer is modulated only by its corresponding pixel on the modulation layer. When the spatial resolutions of the two SLMs do not match, as illustrated in Fig. 1(b), the higher-resolution SLM typically serves as the display layer for displaying contents of high spatial frequencies, while the lower-resolution SLM provides low-frequency, spatially-varying light modulation, similar to the prototype by Seetzen et al., which utilizes a low-resolution LED array for spatially-varying modulation [3]. In this case, a small gap between the modulation layer and the display layer is desired so that the pixel structure of the low-resolution modulation layer is not perceivable and does not degrade the overall image quality. With a non-negligible gap, the cone of light seen from a pixel on the display layer projects onto a finite area on the modulation layer, and the size of this area depends strongly on the separation of the two SLM layers and the numerical aperture of the HDR display toward the viewer. When the projected area is larger than the pixel size of the modulation layer, the perceived illuminance of each pixel on the display layer is simultaneously modulated by multiple pixels on the modulation layer. Consequently, the maximal range and accuracy of dynamic range enhancement are compromised in comparison to the pixel-by-pixel modulation approach. Furthermore, the low-resolution pixel structure of the modulation layer may degrade the overall quality of the dual-layer modulated image. For instance, it may cast blurry but visible halo or shadow effects near the sharp edges of a displayed image.

Directly stacking two transmissive, miniature LCDs along with a backlight source may be considered as the most straightforward and compact adaption of the dual-layer modulation schemes for an HDR-HMD system, which would result in a hardware configuration similar to the light field stereoscope approach by Wetzstein et al [4]. Unlike the light field stereoscope approach where the SLM layers are spaced apart with a necessary gap for rendering the light field apparently emitted by an object in space by the directional light rays defined by pairs of pixels, the gap between the SLM layers for HDR rendering is subject to the considerations discussed above. In practice, however, the direct stacking method of multiple layers of SLMs suffers from several critical problems, which make it practically infeasible for HDR-HMD systems. Firstly, transmissive LCDs tend to have low dynamic range and low transmittance. The stacked dual-layer modulation would lead to very low light efficiency and limited dynamic range enhancement. Secondly, transmissive LCDs tend to have low fill factors and the microdisplays utilized in HMDs typically have pixels as small as a few microns, much smaller than the pixel size of about 100~500 microns for direct-view displays. As a result, the light transmitting through a two-layer LCD stack will inevitably suffer from severe diffraction effects and yield poor image resolution following an eyepiece magnification. Most importantly, due to the physical construction of typical LCD panels, the modulation layers of liquid crystal will be inevitably separated by a gap as large as a few millimeters, depending on the physical thickness of the cover glasses. For direct-view type desktop displays, a gap of a few millimeters between the two SLMs would not have much influence for dynamic range modulation. 
In an HMD system, where the microdisplays are optically magnified by a large factor, even a gap as small as 1 millimeter in the SLM stack will be greatly elongated by the HMD eyepiece magnification, resulting in a large separation in the viewing space. The resulting separation in the visual space depends on the physical gap between the SLM layers and the focal length of the eyepiece. For example, assuming a focal length of 20 mm for a typical HMD eyepiece and a typical viewing distance of 2 meters in the visual space, a 0.1 mm gap in the SLM stack will lead to an axial separation as large as 0.6 meters. Unlike a tensor display, where a gap between adjacent SLM layers is required for light field rendering, a large gap between adjacent SLM layers makes accurate dynamic range modulation practically impossible.
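The order of magnitude of the example above can be checked with the thin-lens magnifier relation. The following is a minimal sketch, not the authors' calculation: it assumes an ideal thin eyepiece, measures distances from the lens, and assumes that the second SLM layer sits 0.1 mm closer to the eyepiece than the first. All function and variable names are illustrative.

```python
def object_distance_for_image_mm(image_distance_mm, focal_length_mm):
    """Object distance (inside the focal length) that places the virtual image
    at the requested distance, from the thin-lens relation 1/s_obj = 1/f + 1/s_img."""
    return 1.0 / (1.0 / focal_length_mm + 1.0 / image_distance_mm)


def virtual_image_distance_mm(object_distance_mm, focal_length_mm):
    """Virtual image distance for an object inside the focal length:
    1/s_img = 1/s_obj - 1/f."""
    return 1.0 / (1.0 / object_distance_mm - 1.0 / focal_length_mm)


focal_length_mm = 20.0        # eyepiece focal length quoted in the text
viewing_distance_mm = 2000.0  # virtual image distance quoted in the text
gap_mm = 0.1                  # physical gap in the SLM stack

# Layer whose virtual image lies at the 2 m viewing distance.
first_layer_mm = object_distance_for_image_mm(viewing_distance_mm, focal_length_mm)

# Assumption: the other layer is 0.1 mm closer to the eyepiece.
second_image_mm = virtual_image_distance_mm(first_layer_mm - gap_mm, focal_length_mm)

separation_m = (viewing_distance_mm - second_image_mm) / 1000.0
print(f"axial separation in visual space: {separation_m:.2f} m")
```

Under these assumptions the separation comes out between 0.6 m and 0.7 m, the same order as the figure quoted in the text. Placing the second layer farther from the eyepiece instead gives an even larger separation.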

To illustrate how the quality of HDR enhancement is influenced by the physical gap between the SLMs, we simulated the reconstructed image performance as a function of the SLM separation. The grayscale map of an HDR target image is shown in Fig. 2(a), which is a sinusoidal fringe pattern with decreasing spatial frequency in the horizontal direction and damped image contrast in the vertical direction. The target image has a spatial resolution of 1280 by 960 pixels with a 6.35 μm pixel pitch, which corresponds to the resolution of the hardware used in Section 3. The maximum luminance and command level (CL) of the target HDR image are assumed to be 220 $\mathrm{cd/m^{2}}$ and 65535 (i.e. 16-bit depth), respectively. The front SLM is assumed to be the display layer, while the back SLM is assumed to be the modulation layer, and its distance from the front layer is varied to simulate the effects of the physical gap. A numerical aperture of 0.176 in the microdisplay space is assumed for viewing the HDR display. Figures 2(b).1 and 2(b).2 show the grayscale maps of the HDR images reconstructed by the two SLM layers with a gap of 0.01 mm and 2 mm, respectively, between the two SLMs. The image intensity profiles along the dashed line in Fig. 2(a) are plotted underneath each image. Compared with the original intensity profile in Fig. 2(a), the image contrast degradation is more noticeable in Fig. 2(b).2, with the larger gap, than in Fig. 2(b).1, with the smaller gap. To simulate the HDR image reconstruction, the original HDR image was decomposed into two 8-bit images for the two SLMs using the same rendering algorithm described in Section 5. With a given gap between the two SLMs, the projected image of the back layer at the location of the display layer is blurred, which can be simulated by the aberration-free incoherent point spread function (PSF) of the defocused layer [7,8]:

Fig. 2 Image reconstruction results of a dual-layer HDR-HMD and reconstructed image performance evaluation with the physical separation of the two SLMs set at 0.01mm and 2mm, respectively. (a) tone-mapped HDR target image, along with the intensity distribution along the dashed line; (b) the tone-mapped reconstructed HDR image; (c) binary noticeable difference map, where white areas denote the pixels with errors beyond the JND threshold; (d) PSF cross section at the display layer.

$$
\begin{aligned}
PSF(r, \Delta z) &= \left| 2 \int_{0}^{1} J_{0}\!\left(\frac{2\pi}{\lambda}\, NA \cdot r \cdot \rho\right) \exp\!\left(i\, \frac{2\pi}{\lambda}\, \sigma \rho^{2}\right) \rho \, d\rho \right|^{2}, \\
\text{with}\quad \sigma &= 2\, \Delta z \, \sin^{2}\!\left(\frac{\alpha}{2}\right)
\end{aligned}
\tag{1}
$$

where $\Delta z$ is the defocus distance, i.e. the separation between the back layer and the display layer, $J_{0}$ is the zero-order Bessel function of the first kind, $NA$ is the numerical aperture of the HDR image generator, $\alpha$ is the half angle of the emitting cone corresponding to $NA$, $r$ is the radial distance from the ray bundle center, $\lambda$ is the wavelength, and $\rho$ is the normalized integration variable over the exit pupil. The equivalent image of the defocused modulation layer at the location of the display layer can be computed by convolving the modulation-layer image with the PSF. The display layer content can then be obtained by dividing the original target image by the convolved image as a compensation.
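The defocus PSF above can be evaluated numerically with a simple quadrature. The sketch below is illustrative rather than the authors' simulation code: it uses only the Python standard library, evaluates $J_0$ through its integral representation, and integrates over the pupil with a midpoint rule. The default wavelength of 0.55 μm (the green representative wavelength quoted in Section 3), the sample counts, and all function names are assumptions.

```python
import cmath
import math


def bessel_j0(x, samples=128):
    """Zero-order Bessel function of the first kind, from
    J0(x) = (1/pi) * integral_0^pi cos(x * sin(theta)) d(theta), midpoint rule.
    Adequate for moderate arguments (roughly |x| < 2 * samples)."""
    step = math.pi / samples
    total = sum(math.cos(x * math.sin((k + 0.5) * step)) for k in range(samples))
    return total / samples


def defocus_psf(r_um, dz_um, na=0.176, wavelength_um=0.55, samples=400):
    """Aberration-free incoherent defocus PSF at radius r_um for defocus dz_um."""
    k = 2.0 * math.pi / wavelength_um
    alpha = math.asin(na)                            # half angle of the cone
    sigma = 2.0 * dz_um * math.sin(alpha / 2.0) ** 2
    d_rho = 1.0 / samples
    amplitude = 0j
    for n in range(samples):
        rho = (n + 0.5) * d_rho                      # normalized pupil radius
        amplitude += (bessel_j0(k * na * r_um * rho)
                      * cmath.exp(1j * k * sigma * rho ** 2)
                      * rho * d_rho)
    return abs(2.0 * amplitude) ** 2


# On-axis value for a 0.01 mm (10 um) gap, relative to the in-focus peak of 1.
print(defocus_psf(0.0, 10.0))
```

With this normalization the in-focus on-axis value is 1, and the on-axis value under defocus follows the closed form $\left(\sin(a/2)/(a/2)\right)^2$ with $a = 2\pi\sigma/\lambda$, which gives a quick sanity check of the quadrature.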

We further computed the just noticeable difference (JND), which denotes the threshold intensity error, for every command level of a 16-bit image, that is just distinguishable by human visual perception, based on Barten’s contrast sensitivity function (CSF) model used in the DICOM standard [9]. We then compared the reconstructed HDR image against the original target image and computed a binary noticeable difference map, where a white pixel denotes that the intensity difference between the reconstructed and original HDR images is above the corresponding JND value and a dark pixel indicates that the difference is below the threshold. Figures 2(c).1 and 2(c).2 show the binary noticeable difference maps corresponding to the 0.01 mm and 2 mm gaps, respectively. The noticeable areas were 0% and 67.95% of the whole image for SLM gaps of 0.01 mm and 2 mm, respectively. A significant increase in the noticeable errors is observed as the SLM gap increases, especially in the regions with high contrast and high spatial frequencies. Figures 2(d).1 and 2(d).2 further plot the PSFs of a back-layer point on the display layer for gaps of 0.01 mm and 2 mm, respectively, which characterize the intensity distribution of the light cone on the display layer. The light cone from a pixel on the modulation layer is projected onto a circular region covering 1 and 12769 pixels for the gaps of 0.01 mm and 2 mm, respectively, which shows a dramatically larger illuminated area and a changed intensity distribution.
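The pixel counts quoted above are consistent with a purely geometric estimate of the blur footprint. The sketch below is an assumption about how such a count can be obtained, not a description of the authors' procedure: it takes the blur-circle diameter as $2\,\Delta z \tan\alpha$ with $\alpha = \arcsin(NA)$, rounds it up to whole pixels, and counts the pixels in the bounding square. The function name is illustrative.

```python
import math


def blur_footprint_pixels(gap_mm, na=0.176, pixel_pitch_um=6.35):
    """Width (in pixels) and pixel count of the square bounding the geometric
    blur circle that a point on one layer casts on the other layer."""
    half_angle = math.asin(na)
    diameter_um = 2.0 * gap_mm * 1000.0 * math.tan(half_angle)
    width_px = max(1, math.ceil(diameter_um / pixel_pitch_um))
    return width_px, width_px ** 2


for gap_mm in (0.01, 2.0):
    width_px, count = blur_footprint_pixels(gap_mm)
    print(f"gap {gap_mm} mm: {width_px} px wide, {count} px covered")
```

Under these assumptions the footprint is 1 pixel for the 0.01 mm gap and 113 × 113 = 12769 pixels for the 2 mm gap.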

3. System design and prototype

Based on the analysis in Section 2, reducing the gap between the modulation and display layers becomes a critical consideration in the design of an HDR-HMD using a dual-layer modulation method. We proposed a new HDR-HMD system design in which ferroelectric liquid crystal on silicon (F-LCoS) microdisplays were used as the SLMs and an optical relay system was designed to optically minimize the physical separation between the SLMs. Compared to the drawbacks of low fill factor and low light efficiency of transmissive LCDs, reflective type F-LCoS microdisplays offer high pixel resolution with high fill factor as well as large contrast ratio and optical efficiency, which help to minimize diffraction artifacts and enhance dynamic range of the system [10]. The relay optics enables the system to optically overlay the SLMs with a minimal gap as small as a few microns and achieve pixel-by-pixel contrast modulation.

Figure 3 shows the schematic layout of a monocular HDR-HMD design, consisting of an HDR image generator and viewing optics. The HDR image generator is composed of two F-LCoS microdisplays by Miyota (FL1401) [11] and a custom relay system. The two F-LCoS devices measure 0.4” diagonally, with a pixel resolution of 1280 by 960 and a 6.35 μm pixel pitch. The LCoS1 has a built-in RGB LED source serving as the system illumination and a built-in wire grid film (WGF) serving as the output polarizer, while the built-in LED source and WGF polarizer were removed from the LCoS2. The LCoS1 unit offers a maximum luminance of 220 $\mathrm{cd/m^{2}}$. A relay system with a 1:1 magnification and a polarizing beam splitter (PBS) were inserted between LCoS1 and LCoS2. The p-polarized light output by the LCoS1 transmits through the relay system and the PBS, converges to form an intermediate image of LCoS1 coinciding with the LCoS2, and provides the per-pixel modulated illumination source for the LCoS2. With its built-in LED and WGF cover removed, the LCoS2 modulates the incoming light and changes its polarization state based on the properties of LCoS microdisplays. The s-polarized light modulated and reflected by the LCoS2 is then reflected by the PBS and propagates toward the viewing optics [12]. Given that both LCoS displays have an 8-bit depth for each color channel, the HDR image generator is anticipated to achieve a combined 16-bit modulation with an enhanced dynamic range beyond 60000:1.

Fig. 3 Schematic layout of a monocular HDR-HMD based on dual-layer modulation scheme.

One of the most critical aspects in designing an HDR-HMD system based on the scheme in Fig. 3 is the design of the relay system, because the final image quality depends strongly on how well the optical resolution, contrast, and pixel geometry of the LCoS1 are retained by the relay system. Owing to the reflective nature of LCoS displays, it is highly desirable to achieve a double telecentric relay system that offers uniform illumination modulation, uniform optical magnification, and uniform light efficiency across the field of view (FOV). A double telecentric relay also offers some tolerance to small axial displacements between the images of the two SLMs. Based on the requirement of telecentricity and the incidence angle limit of ±10° of the LCoS displays, an f/4 relay system was designed entirely with off-the-shelf lenses. The representative wavelengths were set to 0.47, 0.55, and 0.61 μm according to the dominant wavelengths of the RGB LED sources and were weighted as 1:3:1 based on the relative luminance response of the human visual system. Figure 4 shows the final optimization result of the HDR image generator, with the optical layout, polychromatic modulation transfer function (MTF), and f-tangent distortion shown in Figs. 4(a)-4(c), respectively. The MTF values are all above 0.25 at the cut-off frequency of 78.7 cycles/mm over the entire FOV. The system distortion is well corrected, with a magnitude below 1.55% over the field. The residual distortion can be corrected by image processing, which will be discussed in Section 4. The diameter of the relay tube is only 25 mm and the total system length is 123 mm.
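Two of the numbers in this paragraph can be cross-checked from the pixel pitch and the f-number. The sketch below assumes that the quoted cut-off frequency is the Nyquist frequency of the 6.35 μm pixel grid and that, in a telecentric relay, the largest incidence angle on the LCoS is the half angle of the f/4 cone. Both are interpretive assumptions rather than statements from the text.

```python
import math

pixel_pitch_mm = 6.35e-3
f_number = 4.0

# Nyquist frequency of the pixel grid: one cycle spans two pixels.
nyquist_cycles_per_mm = 1.0 / (2.0 * pixel_pitch_mm)

# Half angle of the f/4 light cone, with the chief ray parallel to the axis.
cone_half_angle_deg = math.degrees(math.atan(1.0 / (2.0 * f_number)))

print(f"Nyquist frequency: {nyquist_cycles_per_mm:.1f} cycles/mm")
print(f"f/{f_number:g} cone half angle: {cone_half_angle_deg:.2f} deg")
```

Under these assumptions the Nyquist frequency is about 78.7 cycles/mm and the cone half angle is about 7.1°, which lies inside the ±10° incidence limit.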

Fig. 4 Optical design of an HDR image generator: (a) optical layout; (b) polychromatic modulation transfer function; and (c) distortion.

Figure 5 shows a bench prototype of the HDR-HMD system. The two LCoS panels were mounted onto two optical mount platforms with $\pm 6°$ tip-tilt adjustments. The mechanical housing of the relay system was 3D printed, and a commercial eyepiece was utilized as the viewing optics for magnifying the HDR image. A grayscale camera with a focal length of 16 mm was placed at the exit pupil of the eyepiece, in place of an eye, for image capturing.

Fig. 5 Bench prototype of an HDR-HMD system based on the optical design in Fig. 4.

4. Geometrical calibration and rendering method

One of the most critical aspects in developing the prototype shown in Fig. 5 is to overlay the virtual images of the two SLMs with a substantially zero gap in between in order to achieve pixel-by-pixel high-accuracy modulation of dynamic range, high-resolution image, and high tolerance to the eye position within the exit pupil. Even small axial or lateral displacements can cause visible artifacts or resolution degradation. In practice, it is very challenging, if feasible at all, to achieve pixel-level alignment through pure mechanical adjustments of the LCoS panels, each of which has 6 degrees of freedom with translations and rotations. Moreover, the different optical paths of the two SLMs with respect to the shared eyepiece will introduce different optical distortions, which impose additional challenges for achieving perfect alignment through mechanical means.

To address these alignment challenges and achieve per-pixel modulation, we developed a camera-based calibration process. Figure 6 shows a simplified projection process, where L1 and L2 represent the virtual images of the LCoS1 and LCoS2, respectively, viewed through the HMD optics, and the calibration camera located at O defines the global reference frame. A camera with a focal length of 12 mm and an angular resolution of 0.44 arc minutes/pixel was placed at the exit pupil of the HMD, and the lens distortion and intrinsic projection matrix of the camera were pre-calibrated using a well-established camera calibration method [13]. All captured images were digitally pre-warped using these calibrated parameters before applying display calibration. Through two separate calibration steps, we adopted our existing camera-based HMD calibration method [14] to obtain the inverse homographic mapping transformation matrices, $T_{cam \leftarrow L1}$ and $T_{cam \leftarrow L2}$, for planes L1 and L2, respectively, which map a pixel on a given SLM onto a corresponding pixel of the calibration camera coordinate system [15]. Each of the two matrices consists of an extrinsic transformation matrix, characterizing the position and orientation of a given image plane (L1 or L2) with respect to the camera reference, and an intrinsic projection matrix, characterizing the imaging properties of the corresponding HMD optical path. The calibration steps also yield the optical distortion coefficients of the images of the SLMs. The first three orders of radial distortion and two orders of tangential distortion coefficients were calibrated using the camera-based HMD calibration described in [13,14].

Fig. 6 Illustration of the projection process of points displayed in the virtual images of the SLMs onto the camera image plane (black vectors and coordinates are denoted in the camera global coordinate OXYZ).

Based on these calibrated parameters, a computational model is established for each of the SLM optical paths, through which a pixel displayed on one of the SLMs can be properly aligned with the corresponding pixel on the other SLM during the HDR image rendering process, so that alignment errors are corrected digitally. Consider an image point P on L2, and denote its corresponding location mapped onto L1 as point Q. The computational model describing the mapping relationship between these two points can be expressed as:

$$
\begin{bmatrix} x_{L1} \\ y_{L1} \\ 0 \\ 1 \end{bmatrix}
= T_{cam \leftarrow L1}^{-1} \cdot \alpha t \cdot T_{cam \leftarrow L2}
\begin{bmatrix} x_{L2} \\ y_{L2} \\ 0 \\ 1 \end{bmatrix},
\quad \text{where}\quad t = \frac{A^{2} + B^{2} + C^{2}}{A l + B n + C p}
\tag{2}
$$

where $(x_{L1}, y_{L1})$ and $(x_{L2}, y_{L2})$ are the pixel coordinates of the points Q and P on the image planes L1 and L2, respectively, $t$ is the parameter characterizing the ray vector that maps the pair of corresponding points P and Q, which is calculated from the normal vector $(A, B, C)$ of the projection plane L1 and the reference point $(l, n, p)$ on L2 in the camera global coordinate system, and $\alpha$ is a normalization factor that keeps the fourth component of the homogeneous coordinates equal to 1. With the projection transformation relation in Eq. (2) and an appropriate interpolation method, the pixel-by-pixel mapping between the images on LCoS1 and LCoS2 can be established.
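The geometric core of this mapping is a ray-plane intersection: a point on L2 is expressed in the camera frame, the ray from the camera origin through that point is scaled by $t$ so that it lands on plane L1, and the result is converted back to L1 coordinates. The sketch below illustrates only that step. It replaces the calibrated matrices with a hypothetical rigid pose for each plane (an origin and two orthonormal in-plane axes), ignores the intrinsic projection and distortion terms, and interprets $(A, B, C)$ as the vector from the camera origin to the foot of the perpendicular on L1, which is the reading under which the expression for $t$ is a ray-plane intersection. All class and function names are illustrative.

```python
from dataclasses import dataclass


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


@dataclass
class ImagePlane:
    """Hypothetical rigid pose of a virtual image plane in the camera frame."""
    origin: tuple   # plane origin in camera coordinates
    u: tuple        # unit vector along the plane x axis
    v: tuple        # unit vector along the plane y axis

    def to_camera(self, x, y):
        return tuple(o + x * a + y * b for o, a, b in zip(self.origin, self.u, self.v))

    def from_camera(self, point):
        offset = tuple(p - o for p, o in zip(point, self.origin))
        return dot(offset, self.u), dot(offset, self.v)

    def foot_normal(self):
        """Vector from the camera origin to the closest point of the plane."""
        n_hat = cross(self.u, self.v)
        distance = dot(self.origin, n_hat)
        return tuple(distance * c for c in n_hat)


def map_l2_to_l1(plane_l1, plane_l2, x_l2, y_l2):
    l, n, p = plane_l2.to_camera(x_l2, y_l2)       # point P in the camera frame
    A, B, C = plane_l1.foot_normal()
    t = (A * A + B * B + C * C) / (A * l + B * n + C * p)
    q_cam = (t * l, t * n, t * p)                  # point Q on plane L1
    return plane_l1.from_camera(q_cam)


l1 = ImagePlane((0.0, 0.0, 1000.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
l2 = ImagePlane((0.0, 0.0, 1500.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
print(map_l2_to_l1(l1, l2, 30.0, -15.0))
```

In this toy configuration the two planes are parallel and 1000 and 1500 units from the camera, so a point on L2 maps to L1 scaled by 2/3 about the optical axis.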

To digitally correct the misalignment between the two SLMs, the image to be displayed on each SLM should be rendered individually by accounting for the corresponding homographic mapping matrix and distortion characteristics. Figure 7 illustrates the geometric rendering procedures for the two SLMs. As shown in Fig. 7(a), geometrically rendering the image for LCoS1 starts with pre-warping the pixel locations of an original image by applying the computational model expressed in Eq. (2), which enables pixel-by-pixel alignment of the LCoS1 against the LCoS2 in the camera reference frame without accounting for optical distortions. The pre-warped image is further warped by applying the distortion correction. Geometrically rendering the LCoS2 image, also shown in Fig. 7(a), is simpler than for LCoS1: it mainly requires an image flip operation, for the image parity change due to the odd number of reflections in the LCoS2 optical path, and a pre-warp step for optical distortion correction.

Fig. 7 Geometric calibration procedure and alignment performance of a dual-layer HDR display. (a) Geometric rendering procedure to create pre-warped images for LCoS1 and LCoS2 using a repetitive dot pattern as an input image; (b) error analysis after the alignment procedure, where asterisks and circular dots stand for the sampled coordinates on each LCoS, and the arrow directions and magnitudes stand for the directions and relative magnitudes of the residual alignment errors; (c.1) and (c.2) photographs captured through the HDR system before and after applying the geometric pre-warping rendering, respectively.

To evaluate the residual alignment errors, we created a grid target sampling the entire field of view at 19 × 14 evenly distributed positions. The original image was rendered using the procedures described above and shown in Fig. 7(a), and the pre-warped images were displayed through the corresponding SLMs. To quantify the alignment error, we measured the magnitude and direction of the alignment errors at the 19 × 14 sampled points, and Fig. 7(b) plots the error map. The average residual alignment error is less than 0.5 pixel after the whole alignment procedure. As a visual validation, two photographs captured by the calibration camera through the HMD system, one before applying the geometric pre-warping rendering and one after applying the correction, are shown in Figs. 7(c.1) and 7(c.2), respectively. The post-correction photograph clearly demonstrates the high alignment accuracy of the calibration procedure.

5. Radiance calibration and HDR image rendering

The second key aspect in developing the dual-layer HDR prototype shown in Fig. 5 is to calibrate the radiance response of the system so that radiance values stored in an HDR image can be correctly rendered as command levels and displayed through the two SLMs. The radiance response of the system, which establishes the relationship between the SLM command level and the resulting scene luminance, depends not only on the typically non-linear tone response curves of each SLM, but also on the inherent inhomogeneity of the optical system, such as system vignetting, non-uniform illumination, or pupil mismatch. Appropriate radiance correction should be implemented to obtain a desired level of radiance uniformity across the field of view.

Achieving the aforementioned radiance calibration across the field of view requires the calibration of a set of field-dependent tone response curves for each color channel in each SLM light path. Although in theory such calibration can be carried out by repeatedly measuring, with a spectroradiometer, the spectral luminance response of each color stimulus patch displayed at a given field position, such a field-by-field calibration for each of the color channels and each SLM is practically impossible, not only because of the large number of measurements but also because of the practical challenges of aligning the display field position with the field of the spectroradiometer. To overcome this challenge, we developed a three-step calibration process. The first step is to obtain the tone response curves of the central field for each SLM, the second step is to obtain a field-dependent uniformity correction map for the HDR display, and the third step is to obtain field-dependent tone response curves.

The calibration started with obtaining the tone response curves for the central field of each SLM path. A broad-band spectroradiometer was placed at the center of the exit pupil of the HMD system to measure the spectral radiance of a small color patch displayed at the center of the SLM under calibration. By adopting a calibration procedure similar to the one in [3], the tone response curves of each color channel of each SLM path were obtained separately based on the spectral radiance measurements obtained for different color calibration patches. As an example, the normalized tone response curves of the central field of LCoS1 after piecewise cubic polynomial fitting are shown in Fig. 8(a).

Fig. 8 The results of radiance response calibration and compensation. (a) red, green and blue channel response curves for LCoS1, (b) field normalization map obtained from radiance calibration, (c) tone response curves of LCoS1 green channel at the central field and four corner fields after radiance correction.

The second calibration step aims to render an apparently uniform image through the HDR display after applying a correction to a uniform grayscale image. A radiance-calibrated camera was placed at the exit pupil of the HMD optics to capture the modulated images of the SLMs while images with equal grayscales (e.g. a white background) were shown on them. A relative radiance response map of the HMD was obtained from the captured image by correcting for the camera radiance response. From this map we calculated a field normalization map $f (i, j)$, defined as the ratio of the luminance at each pixel $(i, j)$ to the maximum luminance over the entire field, which is used for field uniformity correction. Figure 8(b) plots the field normalization map obtained through the calibration.
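The definition of the field normalization map can be illustrated with a minimal sketch. The 3x3 radiance values and the function name `field_normalization_map` are hypothetical and chosen only for illustration; the sketch assumes the camera response has already been corrected, so the input holds relative luminance values.

```python
def field_normalization_map(luminance):
    """Return f(i, j) = L(i, j) / max(L) for a 2D list of relative luminances."""
    peak = max(max(row) for row in luminance)
    if peak <= 0:
        raise ValueError("luminance map must contain a positive value")
    return [[value / peak for value in row] for row in luminance]


# Hypothetical relative radiance map: brighter center, dimmer corners.
captured = [
    [0.60, 0.75, 0.60],
    [0.75, 0.80, 0.75],
    [0.60, 0.75, 0.60],
]
f_map = field_normalization_map(captured)
f_min = min(min(row) for row in f_map)  # dimmest field position
```

By construction the brightest field position has $f = 1$, and every other position has a value between 0 and 1.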

To achieve the field uniformity correction, the final calibration step is to obtain the field-dependent tone response curves for a given field of each SLM by truncating the tone response curves of the central field at the maximum command levels determined by the field normalization map, $f (i, j)$, and then scaling the remaining part of the curves. Figure 8(c) plots an example of the tone response curves of the LCoS1 green channel corresponding to the central field and the four corner fields. Note that truncating the response curves reduces the contrast ratio and the maximum command level to some extent. Thus, the tradeoff between field radiance uniformity and image contrast ratio should be taken into account.
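One possible reading of the truncate-and-scale step is sketched below. It is an interpretation rather than the exact procedure used for the prototype: it assumes that a pixel with normalization value $f(i, j)$ is limited to the peak output of the dimmest field position, $f_{min}$, so its central-field curve is cut where the normalized response reaches $f_{min} / f(i, j)$ and the remainder is rescaled to $[0, 1]$. The gamma-2.2 curve and all function names are hypothetical.

```python
from bisect import bisect_right


def truncation_level(central_curve, f_ij, f_min):
    """Largest command level whose central-field response stays within f_min / f_ij."""
    cap = f_min / f_ij  # fraction of this pixel's range that the dimmest field can match
    return max(bisect_right(central_curve, cap) - 1, 0)


def field_tone_curve(central_curve, f_ij, f_min):
    """Truncate the central curve at the cap, then rescale what is left to [0, 1]."""
    top = truncation_level(central_curve, f_ij, f_min)
    kept = central_curve[: top + 1]
    peak = kept[-1]
    return [v / peak for v in kept] if peak > 0 else kept


# Hypothetical central-field curve (assumed monotonic): 8-bit gamma 2.2.
GAMMA = 2.2
central_curve = [(c / 255) ** GAMMA for c in range(256)]

f_min = 0.75  # hypothetical dimmest value of the field normalization map
center_top = truncation_level(central_curve, f_ij=1.0, f_min=f_min)
corner_top = truncation_level(central_curve, f_ij=0.75, f_min=f_min)
center_curve = field_tone_curve(central_curve, f_ij=1.0, f_min=f_min)
```

Under these assumptions the dimmest field keeps all 256 command levels while the brightest field loses its upper levels, which illustrates the reduction in contrast ratio and maximum command level noted above.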

Following the above steps of radiance calibration, an HDR image storing the absolute luminance values of an HDR scene needs to be converted to two images with radiance-calibrated command levels for LCoS1 and LCoS2, respectively. The rendering algorithm began with equally splitting the input HDR image into two low-dynamic range modulation images by taking the square root of the input image. The luminance values of each modulation image were then converted to the command levels of corresponding pixels based on the corresponding field-dependent tone response curves obtained via the radiance calibration process.
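The rendering steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the display peak luminance `l_max`, the gamma-2.2 tone curve, and the sample luminance values are hypothetical, and a single tone curve stands in for the per-pixel, per-SLM field-dependent curves obtained from calibration.

```python
import math
from bisect import bisect_left


def split_hdr(luminance, l_max):
    """Normalize by the display peak and take the square root, so that the
    product of the two modulation images reproduces the normalized input."""
    return [
        [math.sqrt(min(max(v / l_max, 0.0), 1.0)) for v in row]
        for row in luminance
    ]


def to_command_level(target, tone_curve):
    """Smallest command level whose normalized response reaches the target."""
    return min(bisect_left(tone_curve, target), len(tone_curve) - 1)


# Hypothetical monotonic tone response curve for one SLM channel.
GAMMA = 2.2
tone_curve = [(c / 255) ** GAMMA for c in range(256)]

# Hypothetical absolute luminance values of a 2x2 HDR image.
hdr = [[0.0, 100.0], [2500.0, 10000.0]]
modulation = split_hdr(hdr, l_max=10000.0)
commands = [[to_command_level(m, tone_curve) for m in row] for row in modulation]
```

In an actual pipeline the lookup would be repeated for LCoS1 and LCoS2, each with its own field-dependent curve at pixel $(i, j)$.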

6. Experimental results

To demonstrate the performance of the prototype, we generated an HDR image from a sequence of raw LDR images of an HDR scene captured with different exposures, as shown in Figs. 9(a)-9(k). The captured scene mainly consists of two USAF 1951 resolution targets and a desk lamp. The resolution target on the left side of the picture was printed on white paper, while the resolution target on the right side was printed on a transparency and mounted on a transparent plastic sheet. The lamp was placed right behind the targets, partially blocked by the left target while providing backlight to the transparent target on the right. The room lighting dimly illuminated the targets. Overall, the scene provides a wide range of luminance as well as spatial details. It can be clearly seen from the pictures that the features of the light bulb and the top-left corner of the transparent target were only captured with short exposures, while the front targets in the shadow area were only captured with long exposures. Radiance maps of the raw LDR images were then analyzed based on their radiance distribution and relative exposures. Finally, a single HDR image, which encodes extended luminance values rather than discrete command levels, was synthesized from the multi-exposure radiance maps. To display the synthesized HDR image via LDR devices, the HDR image was tone-mapped into the 8-bit image shown in Fig. 9(l). Though the tone-mapped image shows a significantly lower image contrast than the HDR image, it preserves all the spatial details.
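The text does not specify how the multi-exposure radiance maps were merged. As a generic illustration only, the sketch below uses a common weighted-average scheme: it assumes linear raw pixel values normalized to $[0, 1]$, divides each value by its exposure time, and down-weights nearly black or nearly saturated pixels. The weighting function, thresholds, and sample data are hypothetical.

```python
def hat_weight(value, lo=0.05, hi=0.95):
    """Trust mid-range pixel values; ignore nearly black or saturated ones."""
    if value <= lo or value >= hi:
        return 0.0
    mid = 0.5 * (lo + hi)
    return 1.0 - abs(value - mid) / (mid - lo)


def merge_exposures(images, exposure_times):
    """Weighted average of per-exposure radiance estimates (value / time)."""
    height, width = len(images[0]), len(images[0][0])
    shortest = min(range(len(exposure_times)), key=exposure_times.__getitem__)
    merged = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            num = den = 0.0
            for image, t in zip(images, exposure_times):
                w = hat_weight(image[i][j])
                num += w * image[i][j] / t
                den += w
            if den > 0:
                merged[i][j] = num / den
            else:
                # No trustworthy sample: fall back to the shortest exposure.
                merged[i][j] = images[shortest][i][j] / exposure_times[shortest]
    return merged


# Hypothetical scene radiances and exposure times (seconds).
scene = [[1.0, 40.0]]
times = [0.01, 0.1, 0.5]
# Simulate linear captures that clip at 1.0.
shots = [[[min(v * t, 1.0) for v in row] for row in scene] for t in times]
hdr = merge_exposures(shots, times)
```

In this toy example the dim pixel is recovered from the longer exposures and the bright pixel from the shortest one, mirroring how the light bulb and the shadowed targets were each captured well only at some exposure settings.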

Fig. 9 HDR image generation: (a)-(k) raw LDR images of an HDR scene captured with different camera exposure settings with the exposure time shown on the top-left corner of each image, (l) tone-mapped HDR image synthesized from the raw LDR images.

The synthesized HDR image was then re-rendered into two LDR modulation images, one for each SLM, by applying the radiance correction and rendering algorithm described in Section 5. The modulation images were then pre-warped by applying the geometric rendering algorithms in Section 4 to digitally correct misalignment and optical distortions. These geometrically corrected images were then displayed via their corresponding SLMs. To record the HDR image seen through the HDR-HMD, a camera of regular dynamic range was placed at the center of the exit pupil and captured a series of images at four different exposures, shown in Fig. 10(a), to synthesize the dynamic range of the human visual system. Different parts of the scene were clearly captured at the different exposure settings, which demonstrates that the HDR-HMD system was able to successfully render and display HDR contents. As a comparison, we created an LDR-HMD setup by disabling one of the SLMs and allowing only single-layer modulation. The 8-bit tone-mapped image shown in Fig. 9(l) and one of the raw LDR images captured at a short exposure were each displayed through the LDR-HMD configuration. Using the same camera settings as in Fig. 10(a), we captured a series of images for each case, and the results are shown in Figs. 10(b) and 10(c), respectively. For each set of images shown in Figs. 10(a)-10(c), following the same procedure as used for obtaining Fig. 9(l), we synthesized an image that merged the images from the four exposure settings and applied the same tone-mapping technique to the synthesized image. The tone-mapped results for Figs. 10(a)-10(c) are shown in Figs. 10(d)-10(f), respectively. It is clear that the re-rendered HDR image displayed by the HDR-HMD preserved a similar amount of detail and dynamic range as the one shown in Fig. 9(l), while, as expected, both the tone-mapped image and the LDR image displayed by the LDR-HMD failed to preserve the details and dynamic range.

Fig. 10 Performance comparison of an HDR-HMD and an LDR-HMD. (a)-(c) images captured by a camera at different exposures through an HDR-HMD using an HDR image source, an LDR-HMD using a tone-mapped LDR image source, and an LDR-HMD using an LDR image under moderate exposure. (d)-(f) images synthesized and tone-mapped from the raw images in (a), (b) and (c), respectively.

7. Summary

In this paper, we presented the design and implementation of a high dynamic range head mounted display system using a dual-layer spatial modulation technique. By using two LCoS microdisplays as the spatial light modulators and optically overlaying their modulation planes through a relay system, we demonstrated the ability to create an HDR-HMD with high modulation accuracy. The paper further described the geometrical and radiance calibration and rendering methods, which are two of the key enabling technical aspects of the proposed HDR-HMD architecture, and experimentally demonstrated the performance of the HDR system in comparison to an LDR-HMD. Future work includes further analysis of the image performance and system tolerances, increasing the luminance of the backlight, and improving the optics and electronics.

Disclaimer

Dr. Hong Hua has a disclosed financial interest in Magic Leap Inc. The terms of this arrangement have been properly disclosed to The University of Arizona and reviewed by the Institutional Review Committee in accordance with its conflict of interest policies.

Funding

National Science Foundation (14-22653).

References and links

1. D. Gerwin, H. Seetzen, G. Ward, W. Heidrich, and L. Whitehead, “3.2: High dynamic range projection systems,” SID Symposium Digest of Technical Papers 38(1), 4–7 (2007).

2. B. Hoefflinger, High-Dynamic-Range (HDR) Vision (Springer 2007).

3. H. Seetzen, W. Heidrich, W. Stuerzlinger, G. Ward, L. Whitehead, M. Trentacoste, A. Ghosh, and A. Vorozcovs, “High dynamic range display systems,” ACM Trans. Graph. 23(3), 760–768 (2004).

4. G. Wetzstein, D. Lanman, M. Hirsch, and R. Raskar, “Tensor displays: compressive light field synthesis using multilayer displays with directional backlighting,” ACM Trans. Graph. 31(4), 80 (2012).

5. G. Wetzstein, D. Lanman, W. Heidrich, and R. Raskar, “Layered 3D: tomographic image synthesis for attenuation-based light field and high dynamic range displays,” ACM Trans. Graph. 30(4), 95 (2011).

6. M. Hirsch, G. Wetzstein, and R. Raskar, “A compressive light field projection system,” ACM Trans. Graph. 33(4), 58 (2014).

7. S. H. Lu and H. Hua, “Imaging properties of extended depth of field microscopy through single-shot focus scanning,” Opt. Express 23(8), 10714–10731 (2015).

8. J. B. Sibarita, “Deconvolution microscopy,” Adv. Biochem. Eng. Biotechnol. 95, 201–243 (2005).

9. P. Nema, “Digital imaging and communications in medicine (DICOM) Part 14: Grayscale standard display function” National Electrical Manufacturers Association, Rosslyn, VA (2000).

10. R. Hainich and O. Bimber, Displays: Fundamentals and Applications (CRC, 2016).

11. CITIZEN FINEDEVICE Co, LTD., “QuadVGA −1280×960, 0.40″ diagonal, single chip FLCoS display,” https://www.miyotadca.com/mdca_product/quadvga.

12. M. Xu and H. Hua, “46‐1: Dual‐layer High Dynamic Range Head Mounted Display,” SID Symposium Digest of Technical Papers 48(1), 668–671 (2017).

13. Z. Zhang, “Flexible camera calibration by viewing a plane from unknown orientations,” in The Proceedings of the Seventh IEEE International Conference on Computer Vision (IEEE, 1999) 1, pp. 666–673.

14. S. Lee and H. Hua, “A robust camera-based method for optical distortion calibration of head-mounted displays,” J. Disp. Technol. 11(10), 845–853 (2015).

15. O. Faugeras, Three-dimensional Computer Vision: A Geometric Viewpoint (MIT, 1993).

Optics Express