Abstract
We propose a near-eye light field display that reconstructs a light field at high synthesis speed by exploiting multi-layer light field display technology and human visual features. The resolution distribution of the reconstructed light field is set to match human visual acuity, which decreases with increasing visual eccentricity. We compress the light field information by using different sampling rates in different visual eccentricity regions, and we propose a new optimization method for the compressed light field that dramatically reduces the amount of computation. The results demonstrate that the acceleration of the proposed scheme is substantial and grows as the spatial resolution increases. The synthesis scheme is verified, and its key aspects are analyzed, by simulation and an experimental prototype.
© 2017 Optical Society of America
1. Introduction
Virtual reality (VR) displays block the viewer's normal sight of the real world and present a computer-rendered virtual scene. VR has promising applications in commercial, architectural, medical, and educational domains, such as video games, architecture visualization, and model visualization.
Since the first head-mounted display (HMD) was developed in 1968, numerous VR schemes have emerged. These display technologies can be roughly classified into two types, based on binocular parallax or on light fields. For binocular-parallax-based displays, the representative devices are Google Glass and Oculus Rift, which present a three-dimensional scene by displaying different views to the left and right eyes. Nevertheless, since only two views are presented, the reconstructed virtual scene suffers from the vergence-accommodation conflict [1]. In recent years, near-eye light field display technologies have been developed that increase the amount of presented information to eliminate the vergence-accommodation conflict and enhance immersion. Nvidia introduced a multi-view near-eye light field display concept that employs a high-resolution OLED layer and a microlens array to reconstruct a light field [2]. Although it achieves an approximate monocular focusing effect, a tradeoff between spatial and angular resolution is imposed by the limited light field information a single OLED layer can provide, which leads to low resolution. Based on fiber scanning, Schowengerdt et al. proposed a near-eye light field display that uses a fast scanning fiber to present a multi-view scene [3]. Here the amount of information is limited by the characteristics of the fiber, resulting in a tradeoff between refresh rate and resolution. In addition, a near-eye light field display strategy that uses multi-layer LCDs as light modulators to reconstruct the light field has been exploited [4]. This strategy factorizes the original light field data into patterns displayed on the LCDs, making it possible to provide a large amount of information with a simple device. It has very high information utilization and presents a high-resolution light field supporting accommodation.
However, the light field factorization of this method is computationally complex and time-consuming, and the resulting latency degrades performance and harms users' sense of immersion [5, 6].
In order to realize near-eye 3D display with a good sense of immersion, it is necessary to quickly reconstruct a 3D scene with a large amount of information and accommodation effect by a simple hardware device. Inspired by the multi-layer light field display technology, we propose a near eye light field display design with super-fast synthesis speed based on human visual features.
Human visual acuity is not uniform across the visual field. The highest acuity is confined to the fovea, which is responsible for sharp vision, while the large peripheral area delivers low-resolution information about the surroundings [7]. In this paper, we propose to reconstruct a multi-resolution light field whose resolution distribution is consistent with the distribution of human visual acuity. The proposed system provides an immersive, high-resolution light field supporting retinal blur at high reconstruction speed. A human-vision-based algorithm is used to synthesize the light field. Experimental results show that the acceleration of the proposed scheme is evident and grows as the spatial resolution increases. The key aspects of our design are verified by simulation and a prototype device.
2. The display principle
In this section, the principle of the proposed near-eye light field display is discussed. The first part introduces the original light field rendering method. The second part explains the human-vision-based reconstruction algorithm, showing that the light field data can be compressed by setting its resolution distribution identical to human visual acuity, and that the reconstruction can be accelerated by synthesizing a multi-resolution light field.
2.1 The light field rendering
The general steps of multi-layer display technology are to render or capture the original light field data, factorize the light field into 2D patterns, and display these patterns on multi-layer LCD equipment.
Figure 1 shows the arrangement of viewpoints. The origin of the coordinate system is located at the center of the eyeball, and the scene lies in front of the observer along the Z axis. A reference plane is arranged for rendering the original light field: the position of a light ray is described by its intersection with the reference plane and by the viewpoint. As Takaki et al. mentioned [8], at least two light rays should fall into the eye to achieve accommodation. On that account, several viewpoints over the eye are arranged as shown in Fig. 1. To simulate the distribution of viewpoints in different viewing directions, each viewpoint is approximated as a point distributed on the eyeball, as shown in Fig. 1(b). The viewpoints are distributed at equal angular intervals in the X and Y directions, respectively. The position of each viewpoint () is described by Eqs. (1)-(3).
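Eqs. (1)-(3) are not reproduced here; under the assumption that each viewpoint lies on the eyeball sphere along a direction tilted from the visual axis by integer multiples of the angular intervals, the arrangement might be sketched as follows (function and parameter names are ours):

```python
import numpy as np

def viewpoint_positions(R, n, d_theta_x, d_theta_y):
    """Place (2n+1)^2 viewpoints on an eyeball sphere of radius R at equal
    angular intervals about the visual (Z) axis. Hypothetical
    parameterization: Eqs. (1)-(3) are assumed to put each viewpoint along
    the direction tilted by (i*d_theta_x, j*d_theta_y) from the axis."""
    points = []
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            # Direction tilted from the visual axis (Z) by the two angles.
            d = np.array([np.tan(i * d_theta_x), np.tan(j * d_theta_y), 1.0])
            points.append(R * d / np.linalg.norm(d))  # project onto the sphere
    return np.array(points)

vp = viewpoint_positions(R=12.0, n=2, d_theta_x=0.01, d_theta_y=0.01)
```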
Here, R represents the radius of the eyeball, and are the indices of viewpoints, () and () are the angular intervals between each viewpoint and the direction of the visual axis, and the subscripts x and y denote the directions along the X and Y axes. The target light field data is composed of the perceived images obtained at these viewpoints, where () is the pixel index of each perceived image.

2.2 The factorization of the light field
In this paper, the display system consists of a backlight, dual-layer LCDs, and a lens, wherein the lens images the LCDs and the backlight as distant, magnified virtual LCDs and backlight, respectively (Fig. 2). The imaging relationship can be described as
where f is the focal length of the lens, and d and are the object distance and the image distance, respectively. The components of a conventional dual-layer display are dual-layer LCDs and a backlight [9]. The front LCD and the rear LCD modulate the light rays emitted by the backlight to create a discrete light field , where () and () correspond to the pixel indices of the front LCD and the rear LCD, respectively. The matrices and correspond to the display patterns of the front and rear LCDs. The light field can be expressed as the outer product of the matrices and , as described by Eq. (5).
The light field is imaged by the lens to form a magnified light field , which can be written as
where and are the imaged patterns of and , respectively. The synthesis can be cast as seeking factorization patterns of that minimize the weighted Euclidean distance to , as described by Eq. (7). The update rule of the factorization is given by [10]
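Since Eqs. (7)-(9) are not reproduced here, the sketch below implements a standard weighted rank-1 multiplicative update of the kind given in [10]; variable names are ours, and the front- and rear-LCD patterns are represented as vectors of a rank-1 factorization:

```python
import numpy as np

def factorize_rank1(L, W, iters=50, eps=1e-9):
    """Weighted rank-1 factorization L ~ f g^T minimizing the weighted
    Euclidean distance ||W o (L - f g^T)||^2 via the multiplicative update
    rule of weighted NMF [10] (o denotes the Hadamard product).
    f and g play the roles of the front- and rear-LCD patterns."""
    rng = np.random.default_rng(0)
    m, n = L.shape
    f = rng.random(m)          # initial pixel values: random in (0, 1)
    g = rng.random(n)
    WL = W * L                 # Hadamard product, computed once
    for _ in range(iters):
        f *= (WL @ g) / ((W * np.outer(f, g)) @ g + eps)
        g *= (WL.T @ f) / ((W * np.outer(f, g)).T @ f + eps)
    return f, g
```

For display, the returned patterns would still need to be rescaled into the LCDs' [0, 1] transmittance range.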
where W is a weight tensor assigning a weight to every pixel value in the system, and the symbol represents the Hadamard product. The initial pixel values of the dual-layer LCDs are random values between zero and one. Figure 3 shows the geometric relationship between , , and the imaged reference plane. As shown in Fig. 3(b), the values of and are optimized by the light ray passing through them. The corresponding pixels of light ray l can be calculated by geometric mapping. For example, the position () of the intersection A on is given by Eq. (10), where () is the coordinate of the point on the reference plane, and the direction and position of a light ray are determined by () and (). The position of intersection C on can be calculated in the same way. By Eqs. (7)-(9) and the results of Eqs. (10) and (11), the light field can be factorized into matrices. In this paper, we conduct a rank-1 light field factorization. As Eqs. (8) and (9) show, high resolution means a mass of information to be updated, resulting in large time consumption. The reconstruction process is accelerated by combining the aforementioned algorithm with human visual features.

2.3 Human visual features
It is a universal experience that we perceive sharp detail in the viewing direction while the surroundings are less distinct. In fact, the human eye captures only part of the scene at any moment and changes viewing direction to acquire the whole. Visual acuity represents the ability of visual resolution, which declines with increasing visual eccentricity [11]. Visual acuity can be expressed as the ability to resolve letters of different sizes at different visual eccentricities. By measuring reading speed for sequences of letters of distinct sizes in central vision and at different peripheral eccentricities, Chung et al. showed that reading speed remains invariant with visual eccentricity as long as the print is appropriately scaled in size [12]. That work formulates the angular resolution at maximum reading speed as a function of visual eccentricity
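A sketch of such a linear acuity model, together with the stepwise polyline simplification used later for synthesis, is given below; the fovea constant, slope, and breakpoints are placeholders, not values from [12]:

```python
import numpy as np

def acuity_linear(e, omega0=1.0, k=0.44):
    """Angular resolution grows linearly with eccentricity e (degrees):
    omega(e) = omega0 * (1 + k * e). omega0 and k are placeholder
    constants, not the values of Eq. (12)."""
    return omega0 * (1.0 + k * np.asarray(e, dtype=float))

def acuity_polyline(e, edges=(5, 15, 30, 60)):
    """Piecewise-constant (staircase) approximation of the linear model:
    within each sub-region the resolution is held at the linear model's
    value at the region's inner edge, so the approximation never demands
    less detail than the linear model allows (cf. the polyline of Fig. 4)."""
    e = np.asarray(e, dtype=float)
    inner = np.zeros_like(e)
    for edge in edges:
        inner = np.where(e >= edge, edge, inner)  # inner boundary of region
    return acuity_linear(inner)
```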
where is the angular resolution at visual eccentricity , and is the angular resolution at the fovea. Equation (12) is depicted by the red line in Fig. 4, which shows the angular resolution declining linearly with increasing visual eccentricity. In this paper, the red line is replaced with a polyline (the green line in Fig. 4) to reduce the computational complexity of the synthesis.

2.4 The reconstruction algorithm based on human visual features
Unlike a light field with uniform resolution, a multi-resolution light field does not have the same number of pixels per row in each view. Therefore, owing to the multi-resolution property, the original light field must be resampled, and the sampled information of each view is collected to form a matrix.
Figure 5 shows the sampling process for each view. As shown in Fig. 5(a), for every view, the entire visual field is first divided into N sub-regions according to the discretized and the corresponding visual eccentricity . Second, for each sub-region, the sampling unit size is set equal to the spatial resolution derived from the corresponding . The sampled information in each sub-region is recorded in a column-wise manner. Lastly, the sub-regions' information is arrayed from outside to inside to form a new matrix (the compressed image), where is the pixel index of matrix S. The numbers of rows and columns of S are equal, and the extra pixel values of S are set to zero. For every view, is the matrix that records the values of the sampling points. All the of each view are gathered as the compressed light field . The process of converting to is defined as light field compression.
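The per-view compression described above might be sketched as follows; the concentric square-ring geometry, sampling steps, and function names here are illustrative assumptions rather than the paper's exact scheme:

```python
import numpy as np

def compress_view(img, steps=(4, 2, 1)):
    """Foveated sampling of one view: the image is split into concentric
    square sub-regions around the center, and region i (outermost first)
    is sampled every steps[i] pixels. Samples are gathered column-wise per
    region, then the regions are concatenated from outside to inside, as
    in Fig. 5. Returns the sample values and their original indices."""
    h, w = img.shape
    cy, cx = h // 2, w // 2
    yy, xx = np.mgrid[0:h, 0:w]
    # Chebyshev distance to the center defines the square sub-region rings.
    ecc = np.maximum(np.abs(yy - cy), np.abs(xx - cx))
    edges = np.linspace(ecc.max(), 0, len(steps) + 1).astype(int)
    values, indices = [], []
    for i, step in enumerate(steps):
        ring = (ecc <= edges[i]) & (ecc > edges[i + 1])
        if i == len(steps) - 1:
            ring = ecc <= edges[i]        # innermost region keeps the center
        keep = ring & (yy % step == 0) & (xx % step == 0)
        cols, rows = np.nonzero(keep.T)   # transpose -> column-wise order
        values.append(img[rows, cols])
        indices.append(np.stack([rows, cols], axis=1))
    return np.concatenate(values), np.concatenate(indices)
```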
The whole compressed light field is , which can be factorized into the compressed front LCD and rear LCD , where and correspond to the pixel indices of and .
We construct a map image of the reference plane to record the original indices () of the sampling points, which allows us to leverage the GPU for fast sampling. The construction process of is the same as that of ; by contrast, the information recorded in is the original index () of each sampling point instead of its value. The map images of and are formed in the same way as that of the reference plane. Furthermore, we construct a map image of to record the indices of the sampling points on , formed in the same way as . The map images and are utilized in the reconstruction process.
As mentioned in Section 2.2, in order to factorize with Eqs. (7)-(9), the indices of the sampling light rays' corresponding pixels on the compressed front LCD and rear LCD are required. However, this correspondence cannot be directly derived from Eqs. (10) and (11). Figure 6 explains the process of obtaining the correspondence.
For each view, first, the index () of sampling point B in the original light field is obtained from the map of the reference plane (Fig. 6(a)), and the light ray is positioned by B and the viewpoint. Second, the intersections of light ray B with and , corresponding to and , are calculated by Eqs. (10) and (11) (Fig. 6(b)). Third, the index () of A on is obtained from map (Fig. 6(c)), and the index () of C on is obtained in the same way from the map of . In this way, the sampling points on and corresponding to can be ascertained. Combining this method with Eqs. (7)-(9), can be quickly factorized into and .
The factorized patterns cannot be displayed until and are converted to the multi-resolution forms and , respectively. This conversion is defined as decompression and is shown in Fig. 7. The pixel values of and stored in and can be retrieved quickly through the map images and , respectively. In summary, the whole algorithm flow for factorizing into the multi-resolution patterns and is depicted in Fig. 8.
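Decompression thus reduces to a single gather through a precomputed index map, which is what makes the GPU implementation fast. A minimal numpy analogue for one sub-region (names and layout are ours, not the paper's):

```python
import numpy as np

def build_map(h, w, step):
    """Map image for a single sub-region sampled every `step` pixels:
    each output pixel stores the position, in the compressed (column-wise)
    sample vector, of the sample covering its step x step block."""
    yy, xx = np.mgrid[0:h, 0:w]
    sy, sx = yy // step, xx // step          # which sample block we fall in
    n_rows = -(-h // step)                   # samples per column (ceil div)
    return sx * n_rows + sy                  # column-wise flattened index

def decompress(values, map_img):
    """Decompression is a single gather through the map image."""
    return values[map_img]
```

On the GPU the same gather is one texture fetch per output pixel, with the map stored as a texture.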
As discussed above, and record the indices of pixels on the original light field or the LCDs and on the compressed light field, respectively. The information is recorded in image format to take full advantage of efficient GPU implementations. However, since a recorded index may be larger than 255, the map images use the portable network graphics format (.png) to ensure accuracy. The relationship between the RGBA values of the map images and the recorded pixel indices is described by Eqs. (13)-(16).
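Eqs. (13)-(16) are not reproduced here; one plausible byte-level packing consistent with the description (two 8-bit channels per index coordinate, an assumption of ours) is:

```python
def pack_index(u, v):
    """Pack a pixel index (u, v) into RGBA bytes. The exact channel
    assignment of Eqs. (13)-(16) is not reproduced in the text; this
    assumes the low/high bytes of u go to R/G and those of v to B/A,
    which supports indices up to 65535 per coordinate."""
    return (u % 256, u // 256, v % 256, v // 256)

def unpack_index(r, g, b, a):
    """Inverse of pack_index: recover (u, v) from the RGBA bytes."""
    return (r + 256 * g, b + 256 * a)
```

Any lossless four-channel 8-bit format would do; PNG is chosen because lossy compression would corrupt the indices.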
Here R, G, B, and A correspond to the red, green, blue, and alpha values of a map-image pixel, and () is the index of the recorded pixel. Equations (13)-(16) make the map image capable of recording an ultra-high-resolution image.

3. Experiment
In this section, the details of constructing the near-eye light field display based on visual features are described, and the acceleration of our method is compared with that of the conventional dual-layer LCD light field display.
3.1 Hardware and software
We built a prototype (Fig. 9) to validate the proposed approach. The prototype includes dual LCDs with a pixel pitch of 51 µm, a uniform backlight, and a lens. The focal length of the lens is 54.25 mm and its diameter is 40 mm. A light field with a resolution of 608×760 and 5×5 views is set as the target. The reconstructed light field is observed by a CCD with an FOV of .
According to the optical imaging formula, the two physical LCDs are located 48.051 mm and 53.355 mm behind the lens, and the virtual LCDs are imaged at 370.00 mm and 2000.00 mm, respectively. The lens is placed 21.8 mm away from the CCD. The original light field rendering, light field factorization, and display are written in GLSL and C++. The algorithms run on an Intel Core i3 PC with an Intel HD Graphics 4400 GPU. A model composed of 4 cubes marked a, b, c, and d is used to demonstrate accommodation and multi-resolution. Their relative positions are shown in Fig. 10; the distances from cubes a, b, c, and d to the observer are 370 mm, 770 mm, 1185 mm, and 2000 mm, respectively.
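The LCD placement follows the Gaussian lens formula of Eq. (4). A sketch, assuming the common convention in which a negative image distance denotes a virtual image on the object side (the prototype's quoted distances may be measured from a different reference point, so the values here are only illustrative):

```python
def image_distance(f, d_o):
    """Gaussian lens equation 1/d_o + 1/d_i = 1/f, distances in mm.
    For an object inside the focal length (d_o < f) the result is
    negative, i.e. a magnified virtual image, as used by the prototype
    to push the LCD images out to arm's length and beyond."""
    return 1.0 / (1.0 / f - 1.0 / d_o)
```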
The light field is synthesized by the proposed reconstruction algorithm. We took 4 sets of images of the synthesized light field, with the observer gazing at each of the 4 cubes in turn and keeping the corresponding cube within the depth of field.
3.2 Optical distortion correction
Generally speaking, optical systems exhibit distortion that degrades display fidelity. The general form of the distortion is expressed as a combination of radial basis functions and tangential components [13], which can be modeled as a nonlinear polynomial. A pre-distortion pattern generated by computational distortion correction can compensate for the distortion. In this paper, a polynomial geometric mapping between the optical object, the ideal image, and the distorted image is established, so that with pre-distortion applied, the input images displayed on the dual LCDs are observed as the ideal images.
The distortion correction proceeds as follows. First, a calibration image composed of regularly arranged grid points is created. This calibration image is then displayed on the front LCD, with the rear LCD set to white, to obtain a distorted image. Another distorted image is acquired by exchanging the displayed images of the front and rear LCDs. The mapping between the optical object and the distorted image is set up as
Here, (, ) and (, ) are the coordinates of a grid point and the corresponding distorted point, respectively, m and n are the powers of the polynomials, and and are the coefficients of the corresponding polynomials, which can be calculated by the least-squares method.
Since the distortion near the optical axis is minimal, the central region of the distorted image can be taken as a linearly magnified copy of the same region of the calibration image. The ideal image of the object is therefore a linearly magnified image, and the magnification is calculated from the positions of the central grid points in the calibration and distorted images. An ideal magnified input image is acquired by applying this magnification to the input image. Afterwards, Eqs. (17) and (18) are applied to the ideal input image to obtain the pre-distortion image, as described by Eqs. (19) and (20).
Here, (, ) and (, ) are the coordinates of the pre-distortion image and the ideal input image, respectively. The distortion of the dual-layer LCD system is compensated by applying this approach to both LCD layers.
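The polynomial fit of Eqs. (17)-(20) can be sketched as an ordinary least-squares problem; the full bivariate polynomial basis below is an assumed form, since the equations themselves are not reproduced here:

```python
import numpy as np

def poly_basis(points, degree=3):
    """Design matrix of bivariate monomials x^m * y^n with m + n <= degree."""
    x, y = points[:, 0], points[:, 1]
    return np.stack([x**m * y**n
                     for m in range(degree + 1)
                     for n in range(degree + 1 - m)], axis=1)

def fit_distortion(grid, distorted, degree=3):
    """Fit the mapping from ideal grid points to their distorted
    observations, one least-squares problem per output coordinate,
    as in the least-squares solution of Eqs. (17) and (18)."""
    A = poly_basis(grid, degree)
    cx, *_ = np.linalg.lstsq(A, distorted[:, 0], rcond=None)
    cy, *_ = np.linalg.lstsq(A, distorted[:, 1], rcond=None)
    return cx, cy

def apply_mapping(points, cx, cy, degree=3):
    """Evaluate the fitted polynomial mapping at arbitrary points."""
    A = poly_basis(points, degree)
    return np.stack([A @ cx, A @ cy], axis=1)
```

The same fitted mapping, applied to the ideal input image, yields the pre-distortion image of Eqs. (19) and (20).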
4. Results and analysis
4.1 Accommodation
Depth discrimination is a fundamental issue for 3D display technology, and accommodation elicited by retinal blur can improve depth discrimination performance. The experimental results, shown in Fig. 11, demonstrate the ability of the proposed approach to achieve retinal blur. In Fig. 11(a), a set of images was taken by altering the CCD's focal depth from 370 mm to 2000 mm. Each object is in reasonable focus when the CCD is focused at its intended focal depth, and out of focus when the CCD is focused elsewhere.
4.2 The perceived images
The perceived sharpness of the real world varies with the observer's focal depth and viewing direction. Figure 11(b) shows perceived images of cubes a and d recorded within the depth of field but from different viewing directions. Although each object is within the depth of field, the sharpness of the perceived scene changes as the viewing direction varies. Consequently, sharp images are perceived only when the objects are both in reasonable focus and within the fovea. This property is consistent with human visual features.
Figure 12 shows the perceived results when the camera is focused on the gazed cube. The object within the fovea is perceived sharply, whereas the peripheral parts are perceived at low resolution. Figure 13 shows the maps when the viewer gazes at cube a. Figure 14 shows example multi-resolution layer patterns, which the proposed method reconstructs 3.47 times faster than the conventional algorithm. Combining Fig. 12(a) with the division of the light field reflected by map S2 in Fig. 13, it is clear that the sharpness of the reconstructed light field declines with increasing visual eccentricity. Comparing Fig. 12(a) with Fig. 12(d) suggests that the gazed cube d appears less sharp than cube a, because the image quality of multi-layer LCDs is influenced by diffraction effects [4, 14, 15].
4.3 Acceleration
Time consumption is an important issue for VR. To estimate the computational acceleration of this method, two sets of comparison experiments were conducted. To analyze the acceleration at different spatial resolutions, a set of experiments was conducted by altering the spatial resolution with a constant number of viewpoints. Ten light fields with 2×2 viewpoints and different spatial resolutions were reconstructed by the proposed algorithm and by the conventional dual-layer reconstruction algorithm. The spatial resolutions were set to different integer multiples of 276×276. The runtime was measured over 5 iterations, and the recorded time is the runtime of one iteration.
The results are plotted in Figs. 15 and 16. Figure 15 plots the runtime of the two algorithms at different spatial resolutions. Figure 16 plots the ratio of the conventional algorithm's runtime to that of the proposed method, which reflects the acceleration directly. In summary, as the spatial resolution increases, the time consumption of the conventional algorithm grows much more rapidly than that of the proposed algorithm, so a higher resolution results in a greater acceleration.
To examine the influence of angular and spatial resolution on acceleration, another set of experiments was conducted by altering both while keeping the total amount of light field information constant. Table 1 shows the runtime taken by the two methods for one iteration of each light field reconstruction. The proposed method provides an obvious acceleration, and a more prominent acceleration is gained at higher spatial resolution. In conclusion, the acceleration of the proposed scheme is evident and increases rapidly with the spatial resolution.
As discussed previously, the proposed display system provides a sharp image in the fovea and a gradually blurred scene in the peripheral area. When watching a scene, a viewer changes viewing direction to bring the region of interest into foveal vision. This watching behavior comprises fixations and saccades [16]. Saccades are fast, discontinuous eye movements, and fixations are the intervals between saccades. The typical duration of a fixation is 200-300 ms [17]. In this paper, the reduced synthesis time is much shorter than that, indicating that the entire scene can be updated within a fixation whenever the viewing direction changes, achieving a high-resolution near-eye light field display without perceptible latency.
5. Conclusion
VR display, as a next-generation display technology, is gaining prevalence. For a near-eye light field display, updating the displayed images in accordance with the rotation of the eye or head is necessary, because it is hard to keep the head or eyes motionless. Moreover, high resolution with low or no latency is crucial for immersion and has not yet been adequately addressed. In this paper, we introduced a new near-eye light field display that reconstructs, at super-fast synthesis speed, a multi-resolution light field whose resolution distribution is consistent with human visual features for the corresponding viewing direction. A human-vision-based reconstruction algorithm is introduced to reconstruct a high-resolution light field supporting accommodation. The prototype and experimental results indicate that the reconstruction of the proposed system is markedly faster than that of the conventional reconstruction system. With this method and an eye-tracking device, a real-time multi-layer near-eye light field display with higher image quality becomes feasible.
Acknowledgments
This work is supported by the National Natural Science Foundation of China (61575175), the National Key Research and Development Program of China (2016YFB1001502), and the National Basic Research Program of China (2013CB328802).
References and links
1. E. Peli, “Visual and optometric issues with head-mounted displays,” in IS & T/OSA Optics & Imaging in the Information Age, (The Society for Imaging Science and Technology, 1996), pp. 364–369.
2. D. Lanman and D. Luebke, “Near-eye light field displays,” ACM Trans. Graph. 32(6), 1–10 (2013). [CrossRef]
3. B. T. Schowengerdt, H. G. Hoffman, C. M. Lee, C. D. Melville, and E. J. Seibel, “57.1: Near-to-Eye Display using Scanning Fiber Display Engine,” SID Symposium Digest Tech. Papers 41(1), 848–851 (2010). [CrossRef]
4. F. C. Huang, K. Chen, and G. Wetzstein, “The light field stereoscope: immersive computer graphics via factored near-eye light field displays with focus cues,” ACM Trans. Graph. 34(4), 60 (2015).
5. R. B. Welch, T. T. Blackmon, A. Liu, B. A. Mellers, and L. W. Stark, “The effects of pictorial realism, delay of visual feedback, and observer interactivity on the subjective sense of presence,” Presence (Camb. Mass.) 5(3), 263–273 (1996). [CrossRef]
6. S. Uno and M. Slater, “The sensitivity of presence to collision response,” in Virtual Reality Annual International Symposium (IEEE, 1997), pp. 95–103. [CrossRef]
7. A. H. Chan and A. J. Courtney, “Foveal acuity, peripheral acuity and search performance: A review,” Int. J. Industrial Ergonomics 18(2), 113–119 (1996). [CrossRef]
8. Y. Takaki, “High-density directional display for generating natural three-dimensional images,” Proc. IEEE 94(3), 654–663 (2006). [CrossRef]
9. D. Lanman, M. Hirsch, Y. Kim, and R. Raskar, “Content-adaptive parallax barriers: optimizing dual-layer 3D displays using low-rank light field factorization,” ACM Trans. Graph. 29(6), 1–10 (2010). [CrossRef]
10. N. Ho, P. Van Dooren, and V. Blondel, “Weighted nonnegative matrix factorization and face feature extraction,” Image Vis. Comput. 2007, 1–17 (2007).
11. R. Rosén, Peripheral Vision: Adaptive Optics and Psychophysics (KTH Royal Institute of Technology, 2013).
12. S. T. Chung, J. S. Mansfield, and G. E. Legge, “Psychophysics of reading. XVIII. The effect of print size on reading speed in normal peripheral vision,” Vision Res. 38(19), 2949–2962 (1998). [CrossRef] [PubMed]
13. Z. Zhang, “A flexible new technique for camera calibration,” IEEE Trans. Pattern Anal. Mach. Intell. 22(11), 1330–1334 (2000). [CrossRef]
14. A. Maimone, G. Wetzstein, M. Hirsch, D. Lanman, R. Raskar, and H. Fuchs, “Focus 3D: Compressive accommodation display,” ACM Trans. Graph. 32(5), 153 (2013). [CrossRef]
15. A. Maimone and H. Fuchs, “Computational augmented reality eyeglasses,” in Mixed and Augmented Reality (ISMAR) (IEEE, 2013), pp. 29–38.
16. S. Pannasch, J. R. Helmert, K. Roth, A. K. Herbold, and H. Walter, “Visual fixation durations and saccade amplitudes: Shifting relationship in a variety of conditions,” J. Eye Mov. Res. 2(2), 1–19 (2008).
17. K. Rayner, “Eye movements in reading and information processing: 20 years of research,” Psychol. Bull. 124(3), 372–422 (1998). [CrossRef] [PubMed]