Abstract
Integral-imaging-based (InI-based) light-field near-eye display (LF-NED) is an effective way to relieve the vergence-accommodation conflict (VAC) in applications of virtual reality (VR) and augmented reality (AR). Lenslet arrays are often used as spatial light modulators (SLMs) in such systems. However, the conflict between refocusing on a virtual object point from the light-field image (LF image) and focusing on the image plane of the lenslets degrades the viewing effect, so the light field (LF) cannot be accurately restored. In this study, we introduce matrix optics and build a generally applicable parameterized model of a lenslet-array-based LF-NED, based on which the imaging process is derived and the performance of the system is analyzed. A lenslet-array-based LF-NED optical model is embodied in LightTools to verify the theoretical model. The simulation results are consistent with the proposed model and the conclusions drawn from it. Thus, the model can be used as the theoretical basis for evaluating the primary performance of an InI-based LF-NED system.
© 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement
1. Introduction
As one of the core technologies of virtual reality (VR) and augmented reality (AR), stereo vision can restore the depth information of virtual scenes, so that users can gain a more efficient and intuitive understanding of virtual information. In the field of near-eye displays (NEDs), conventional binocular stereo vision relies on binocular parallax to stimulate the depth perception of the human eye, but these types of NEDs only have one virtual image plane [1–3], which leads to the well-known vergence-accommodation conflict (VAC) problem, in which the eyes perceive different depths through binocular vergence, while the focus of the lenses is restricted to a single depth [4]. The solution to this problem is to simulate the light entering the eye in a natural state, that is, to create a light field (LF) near the position of the eyes, so that the lens of the eye can respond to the light entering the eyes and focus on the corresponding depth of the virtual object. Integral-imaging-based (InI-based) LF displays are one of the main methods used to generate light fields [5]. Such systems use a lenslet array [6–8] or a pinhole array [9,10] parallel to the display as a spatial light modulator (SLM). By setting up plane coordinate systems on the SLM plane and the display plane, we can parameterize any ray in space into four coordinates [11].
In order to maximize the use of the limited resolution of displays for recovering depth information, a proportion of the image resolution of each depth plane of InI-based LF displays must be sacrificed. Moreover, for the sake of thinness, a lenslet array used in such systems usually has a relatively small focal length compared with a typical head-mounted display (HMD), which means that the magnification of lenslets is relatively large, resulting in larger virtual pixels and a lower angular pixel density observed by the user. Fortunately, with the continuing development of micro light-emitting diode ($\mu$LED) display technologies, some commercial grade $\mu$LED displays with high pixel density have been launched, such as the 14,000 PPI $\mu$LED display by Mojo Vision [12] and the 10,000 PPI $\mu$LED display by JBD [13]. These will produce clearer visual effects and provide a good prospect for high-resolution LF-NED.
As the display screens become increasingly precise, an analysis of the imaging rule of actual LF display systems becomes more meaningful. To analyze the formation of LF, several studies have been conducted on LF rendering [11,14–16] and plenoptic cameras [17–19]. In the field of LF-NED, Huang and Hua [20] constructed a point spread function (PSF) of the retinal image, and systematically analyzed the visual effects of LF displays and the visual responses of the human eyes by combining the human vision system with an ideal model of LF display. Qin et al. [21] analyzed the visual effect based on an image formation model that incorporates all factors affecting the image formation into an LF-NED, including diffraction, aberration, defocusing and pixel size. Yuan et al. [22] built a full-chain performance characterization model of an InI-based 3D display system, and evaluated the key parameters of the restored voxels. Wave optics is often used in modeling and evaluating an InI system, and it provides relatively rigorous results [22–24]. However, these studies mainly focus on analyzing simulation results to summarize the changing rules, or require complex calculations to illustrate the imaging effects; they do not provide a relatively simple parametric LF imaging model in analytical form. Thus, it is still difficult to predict the performance of an InI-based LF-NED system directly by calculation, without simulations or experiments.
Therefore, this study focuses on the parametric analysis of InI-based LF-NED. Our work includes: 1) building an ideal parameterized model of InI-based LF-NED; 2) analyzing the refocusing process by means of matrix optics based on the parameterized model; 3) analyzing the image quality and the error influence of an InI-based LF-NED; and 4) conducting simulations of an InI-based LF-NED to verify the results of the analysis. Compared with previous studies, our work focuses on the primary properties and the general rules of an InI-based imaging system. Thus, we provide a parametric tool for evaluating the basic indicators a system can achieve from a few simple parameters, such as the focal length and the aperture of the lenslet unit, to design the primary scheme of an actual InI display system before optimization considering diffraction and aberration.
2. Matrix optics representation of the LF display
A typical InI-based LF display system includes a display plane (DP), a spatial light modulator (SLM), an eyebox plane (EBP), a central depth plane (CDP), and a refocusing plane (RFP), as shown in Fig. 1. We use a Cartesian reference system in which the optical axis lies along the z direction, and the y direction is perpendicular to it in the meridian plane. The light emitted from a display screen at the DP is modulated by the SLM and distributed in the eyebox. The eyebox is a spatial range that limits the range of movement of the pupil, allowing the pupil to experience the right visual effect as it moves within it. The EBP is the plane perpendicular to the optical axis at the intersection of the chief rays emitted from the center of each elemental image of the LF image. In the depth priority form, the EBP represents the performance of the eyebox, in which a pupil can have the largest range of movement. In the resolution priority form, the EBP acts as a viewing window through which the correct reconstructed scene can be observed. Refocusing in such a system is a process during which rays containing the same information from different elemental images and passing through different SLM units form a single virtual point that can be perceived by the eye. By extending the chief rays of the SLM units in reverse, rays that represent the same virtual object point intersect at a point on the RFP to achieve refocusing. As each SLM unit also serves as an imaging system, the DP will also have a conjugate plane about the SLM, namely the CDP. Both the CDP and the RFP can be in front of or behind the SLM, to form virtual or real images. The InI system can be embodied in depth priority form, in which the CDP and the RFP are on the same side of the SLM as the DP, or resolution priority form, where the CDP and the RFP are on different sides of the SLM from the DP [25]. The human eyes tend to focus on the RFP when observing the virtual scene.
Thus, it is obvious that the virtual scene has the clearest refocusing effect when the RFP and the CDP coincide; however, if the CDP and the RFP do not coincide, the reconstructed virtual point on the RFP will no longer be an ideal point, but an accumulation of defocusing spots. The difference between the RFP and the CDP will lead to a reduction in imaging quality as human eyes focus on the RFP when viewing virtual information. In addition, in most studies of InI-based LF displays, researchers have used pinhole array models to simplify lenslet arrays, that is, to consider only the chief rays while ignoring the distribution of non-chief rays [26,27], or to assume that the DP is located within the depth of field (DOF) of the lenslets [28]. These studies also ignore the difference between the CDP and RFP. Therefore, in this section, a perfect lenslet array model is used to quantify the propagation and imaging processes of the rays.
2.1 Representation of the InI system
This work mainly focuses on studying the most basic characteristics of the LF display system based on lenslet arrays, and the paraxial optical model is the basic model of imaging optics. Moreover, for the human eye, the display quality of the central field of view usually occupies the most important position. To facilitate this analysis, we make the following assumptions and approximations for a lenslet array used as an SLM when building a theoretical model:
1. The lenslet array is composed of lenslets with rotational symmetry surfaces, each of which has the same optical parameters.
2. Each lenslet unit is approximately equivalent to an ideal paraxial system composed of an exit pupil and two principal planes, in which the distance between the exit pupil and the rear principal plane is $d_{exp}$;
3. The system is in homogeneous surroundings regarding the refractive index, so that the height and angle of any rays remain the same when propagating between principal planes.
4. The distance between the two principal planes (usually millimeter scale) is ignored; that is, we refer to the two principal planes as one single plane. According to the property of the principal planes, the only cost of this approximation is that the image space will have an axial position deviation equal to the distance between the two principal planes, which is negligible compared with meter-scale scenes.
Based on these assumptions, all aberrations in the system are ignored, which enables us to focus on the relationship between refocusing and imaging. Meanwhile, the object-image relationship of the system is still consistent with the actual situation, despite the approximations in aberration, so the model reflects the theoretical situation of an LF display.
To improve the calculation efficiency, we use matrix optics to build an InI-based LF-NED model. Matrix optics is a geometric optics analysis method which is in accordance with Gaussian optics [15,29]. To represent the components of the optical path in terms of a matrix, we specify the positive and negative signs of the system: positive is to the right and up in terms of directions, and angles are counted clockwise from the optical axis to the ray. In addition, in the analysis, the principal plane of the lenslet array is at position $z=0$. Then, the state of a ray at a given plane perpendicular to the optical axis can be described with only two coordinates, one spatial and one angular. On this basis, a ray of light passing through a point at height $y$ with an angle $\theta$ to the optical axis, a thin lens with a focal length of $f'$, and a propagation distance of $L$ in a homogeneous-refractive-index space can be represented as:
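As a minimal numerical sketch (not the paper's code), the thin-lens and free-space transfer matrices under these conventions can be composed to verify that they reproduce the Gaussian imaging condition; all parameter values below are illustrative:

```python
import numpy as np

def propagate(L):
    """Free-space propagation over distance L: y' = y + L*theta."""
    return np.array([[1.0, L], [0.0, 1.0]])

def thin_lens(f):
    """Ideal thin lens of focal length f: theta' = theta - y/f."""
    return np.array([[1.0, 0.0], [-1.0 / f, 1.0]])

# A ray is the column vector [height y, angle theta].
f, s = 5.0, 10.0                     # focal length and object distance (mm)
s_img = 1.0 / (1.0 / f - 1.0 / s)    # Gaussian image distance: 10 mm

# Object plane -> lens -> image plane.
M = propagate(s_img) @ thin_lens(f) @ propagate(s)

# At an image plane, the ray height is independent of the launch angle,
# i.e. the B element (row 0, col 1) of the system matrix vanishes.
print(np.isclose(M[0, 1], 0.0))   # True
print(M[0, 0])                    # transverse magnification, here -1.0
```

The same composition rule, with decenter terms added, underlies the full system matrices derived below.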
In an InI system, no matter which form it takes, each ray emitted from the DP passes through the SLM with an object distance of $d$ and a focal length of $f'$, propagates to the exit pupil distance of $z_{EPL}$, reaches the EBP, and then propagates in reverse a certain distance $z$. The process can be expressed as
Because each lenslet unit in the SLM is not coaxial to the optical axis of the system, but has periodic eccentricity, it is necessary to introduce an eccentricity matrix $\mathbf {R}_{decenter}$ and a regression matrix $\mathbf {R}_{regress}$ to describe this. The eccentricity of an optical element cannot be completely defined by a two-dimensional square matrix; therefore, the lens matrix needs to be extended from a 2$\times$2 matrix to a 3$\times$3 matrix. Here, the eccentricity matrix and the regression matrix are, respectively, represented as
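A common way to build such 3$\times$3 augmented matrices (a sketch under our own conventions, not necessarily the paper's exact form) is to act on the homogeneous ray vector $[y, \theta, 1]^T$, with the third column carrying the lenslet decenter:

```python
import numpy as np

def decenter(h):
    """Shift into the local frame of a lenslet decentered by h."""
    return np.array([[1.0, 0.0, -h],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])

def regress(h):
    """Shift back into the global frame after the lenslet acts."""
    return np.array([[1.0, 0.0, h],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])

def thin_lens3(f):
    """Augmented 3x3 thin-lens matrix acting on [y, theta, 1]."""
    return np.array([[1.0, 0.0, 0.0],
                     [-1.0 / f, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])

f, h = 5.0, 3.0                       # focal length, lenslet decenter (mm)
M = regress(h) @ thin_lens3(f) @ decenter(h)

# Sanity check: a ray hitting the decentered lenslet at its own center
# (y = h) must pass through undeviated, whatever its incoming angle.
ray = np.array([h, 0.02, 1.0])        # [height, angle, homogeneous 1]
out = M @ ray
print(out)                            # height h, angle unchanged
```

The decenter/regress pair lets every lenslet reuse one on-axis lens matrix while the system is expressed in a single global coordinate frame.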
2.2 Representation of LF rendering and refocusing
Based on the principles of InI, to recover LF information, the content displayed on the DP must be an LF image. Generally, the rendering of an LF image and the refocusing of an LF display are a pair of approximate reciprocal processes. They are approximate because the limited modulation power of the SLM makes it almost impossible to render the LF image exactly as the light is distributed in actual situations. When we focus on rendering an LF image, we only need to focus on a specific ray for each SLM unit, and ensure that it conforms to the light distribution of the natural state as much as possible. For these rays, we need to trace the processes by which they are emitted from virtual object points and modulated to obtain their coordinates on the DP. Ray tracing is an accurate way to calculate an LF image and can even be used to correct distortions of the lenslets of the SLM. However, because of the tremendous amount of spatial ray data, it requires high-performance computers to realize real-time generation of an LF image by ray tracing. Alternatively, a rendering method based on projection matrix transformation can effectively reduce the computational cost of LF image rendering, owing to the linear mapping of object point coordinates from 3D space onto a 2D plane. Here, we can set up a camera array according to the relevant rules of computer graphics and build a projection matrix model corresponding to the parameters of the SLM. The pinhole array model is the simplest form of SLM, as it does not refract any rays. It shields most rays and only allows rays that conform to the actual LF to enter the eyebox. Therefore, in the case of a pinhole array, the rendering process of an LF image displayed on the DP is the reverse process of refocusing. In the rendering model, the center of each camera model used for rendering is located at the center of the corresponding pinhole, and the visual axes intersect at the center of the EBP [8].
In this way, the virtual object point, the corresponding pixel on the DP, and the center of the exit pupil are collinear. Thus, as shown in Fig. 2, in the pinhole array model, the corresponding pixel on the DP is located at point A, the intersection of the DP with line $\textrm {AO}_{\textrm {exp}}$, which connects the virtual object point and the center of the exit pupil. In a lenslet-based SLM, however, the chief rays passing through the lenslets are obviously refracted by the lenslet before passing through the center of the exit pupil. As a result, the corresponding pixel on the display is at B, instead of A. Using the projection matrix model to calculate the coordinates of point B would be a very complex process, so we use $\textrm {B}'$, the intersection of the DP with line CO, which connects the principal point O and the virtual object point, to replace B.
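For the pinhole case, the reciprocity of rendering and refocusing reduces to this collinearity, which can be checked with a short numerical sketch (all parameter values below are illustrative, not the paper's):

```python
# Pinhole-array sketch: principal plane at z = 0, DP at z = -d,
# virtual object point behind the DP (negative z).
d, pitch = 5.0, 1.0                   # DP distance and pinhole pitch (mm)
y_v, z_v = 2.0, -300.0                # virtual object point (mm)

def rendered_pixel(n):
    """Pixel height on the DP: the point, pinhole n and pixel are collinear."""
    y_pin = n * pitch
    t = (-d - z_v) / (0.0 - z_v)      # fractional position of the DP on the ray
    return y_v + (y_pin - y_v) * t

def retrace(n, z):
    """Height at depth z of the ray through pixel n and pinhole n."""
    y_pin, y_pix = n * pitch, rendered_pixel(n)
    return y_pin + (y_pin - y_pix) / d * z

# Rays retraced through two different pinholes refocus at the original point.
h0 = retrace(0, z_v)
h1 = retrace(3, z_v)
print(h0, h1)   # both recover y_v (up to floating point)
```

With ideal continuous pixel coordinates the retraced rays intersect exactly at the virtual point; Section 3.2 examines what happens when the pixel coordinates are quantized.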
The error caused by the approximation can be expressed as the degree to which pixel B is close to $\textrm {B}'$ in an elemental image. According to Fig. 2, the coordinates of point B and $\textrm {B}'$ have the following geometric relationship:
In summary, we can represent the rendering process by a matrix transformation. The ray emitted from the virtual object point at distance $z_0$ in front of the principal plane of the lenslet can be expressed as
This result reflects the refocusing state of the rays. It can be seen from the results that the coefficients of $y$, $ND_{pitch}$ and $r$ represent the effects of the virtual object point position, the lenslet position, and the position of the ray in the exit pupil of the lenslet on the light vector, respectively. The refocusing error of the system can be analyzed by comparing the rays on the RFP with those from the virtual object. When the rays refocus on the RFP, the height of each ray is independent of the position of the lenslets, that is, $k_{12}=0$ when $z=z_{RFP}$. Thus, we can derive
Substituting $z_{RFP}$ into $k_{11}$, we obtain $k_{11}=1$. This indicates that the image height on the RFP does not change from that of the original virtual object point, but that the depth of the refocused image point deviates from the depth of the original virtual object point to some extent. Because the values of $f'$ and $d$ are usually close, and $d_{exp}$ is small compared to $z_0$, this depth deviation is very small. When the CDP and the RFP coincide, the deviation decreases to zero. These equations are applicable to all types of InI systems and can be simplified by changing the parameters. For example, if $r=0$ and $d=f'$, the lenslet array model can be simplified to a pinhole array model. In this way, we can analyze the system using the pinhole imaging principle, which is very convenient for roughly determining the initial parameters of an InI-based LF display system. In addition, for $z=z_{EPL}$, the light distribution at the EBP can be obtained. In particular, when $z_0=-d$, we can directly calculate the propagation of any pixel from the DP. Under these conditions, Eq. (19) can be expressed as
Note that although the focal plane is farther from the SLM than the DP in Fig. 2 and Fig. 3, thereby embodying a depth-priority form, the calculations above also apply to a resolution-priority form; the only differences are that $d$ is larger than $f'$ and $z_0$ is positive.
3. Imaging analysis of the LF display
Several variables contribute to the visual effects of the system. To better illustrate the state of the rays in a system, we used a high-PPI micro-display as the image source and designed a lenslet array with typical parameters as the SLM. The parameters of the system are shown in Table 1. In this section, we analyze a depth-priority InI display system, but it should be noted that since the calculation is based on the formula derived in the previous section, this analysis method is also applicable to resolution-priority systems.
3.1 Resolution analysis
As the LF image is composed of several elemental images, the information of the same virtual object point appears in multiple positions on the DP. Therefore, the number of times the same information appears on the DP reflects the information redundancy of the virtual object point. Clearly, this information redundancy varies with the location of the virtual object points, because the pixels of the virtual object points at different distances appear at different intervals on the display. For convenience, we assume that the pixels of the display screen used are evenly distributed and the elemental images are closely spaced, although actual display screens may have more complex structures. We use the effective pixel ratio $\eta$ to represent the resolution loss level of the LF display. Note that although only the 1-D situation is considered in this section, the conclusion is also applicable to a 2-D plane by applying the same calculation idea to 2-D coordinates.
The effective pixel ratio is calculated as follows. First, the size of an elemental image, $D_{unit}$, is calculated. This value determines the size of the EBP, whose shape is similar to that of each elemental image. According to Eq. (21), the rays emitted from the same edge of adjacent elemental images intersect at the EBP. That is, the following is satisfied at the EBP:
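These geometric relations can be sketched numerically. The similar-triangle forms below are our own pinhole-model reading of the EBP condition and of the interval at which one virtual point repeats on the DP; the pitch, DP distance, eye relief, and object depth are illustrative values, not the paper's:

```python
pitch = 1.0     # SLM unit pitch (mm)
d     = 5.0     # DP-to-SLM distance (mm)
z_epl = 15.0    # eye relief of the EBP (mm), an assumed value
z0    = -173.0  # virtual object depth (mm)

# Rays from the same edge of adjacent elemental images meet at the EBP,
# which by similar triangles fixes the elemental-image size:
D_unit = pitch * (z_epl + d) / z_epl          # 1.333... mm

# Lines from one virtual point through adjacent pinholes hit the DP
# at this interval (the spacing of the repeated information):
D_info = pitch * (z0 + d) / z0                # ~0.971 mm

print(round(D_unit, 3), round(D_info, 3))
```

The ratio of these two quantities governs how many times the same information is repeated within one elemental image, and hence the resolution loss that $\eta$ quantifies.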
Note that in the rendering process, $d_{exp}$ is set to 0. Further, according to the rendering method, the pixel interval $D_{info}$ of a single virtual object point on the DP should satisfy

Based on the above equations, the effective pixel ratio can be calculated as

3.2 Refocusing depth accuracy
In the 4-D LF, we can define a ray using a line between the exit pupil center of a lenslet and the pixel center on the DP. Theoretically, any ray can be defined in this way. However, as the pixels and lenslets are distributed discretely in their respective planes, these lines cannot fully describe an arbitrary ray distribution in real situations. Thus, using these lines to define the refocusing depth inevitably decreases the depth accuracy of the refocused scene. From the microscopic perspective, when the information in a 3-D scene is rendered to a 2-D image, the information represented in each full pixel range is the same as that at the center of the pixel (regardless of the RGB sub-pixel structure). Additionally, the gray value of each pixel is determined by the intensity value of the ray emitted by its corresponding virtual object point. Therefore, to achieve perfect refocusing, the virtual images of the pixels representing the same information formed by the corresponding lenslets must completely overlap. Ideally, the rays emitted by the virtual object point can still maintain the original gray value after rendering and refocusing. However, because of resolution degradation [30], a single virtual point may not be rendered at an exact position in each elemental image. Specifically, if the virtual object points in a virtual scene are separated from each other, they can be rendered to every single elemental image. However, due to the limitation of sampling accuracy, the coordinates of the pixels in the elemental images are not always the same as in the ideal case. Thus, in the process of refocusing, the rays represented by the coordinates of these pixels and the coordinates of the corresponding SLM units cannot perfectly intersect at the original virtual object point, which may cause the focusing state of the human eye to deviate slightly from the ideal situation.
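The depth deviation caused by discrete pixel coordinates can be illustrated with a short pinhole-model sketch; the pitch, pixel size, and depths below are assumed illustrative values (the pixel size roughly corresponds to a 5430 PPI display):

```python
# Snap each elemental image's ideal pixel position to the pixel grid,
# then re-intersect the retraced rays to find the actual refocus depth.
pitch, d, pix = 1.0, 5.0, 0.00468     # mm
y_v, z_v = 0.1, -300.0                # intended virtual object point (mm)

def ideal_pixel(n):
    """Exact (continuous) pixel height on the DP for pinhole n."""
    y_pin = n * pitch
    return y_v + (y_pin - y_v) * (-d - z_v) / (0.0 - z_v)

def quantized_pixel(n):
    """Snap the ideal pixel height to the discrete pixel grid."""
    return round(ideal_pixel(n) / pix) * pix

# Retraced rays: lines through the quantized pixel and the pinhole center,
# y(z) = y_pin + (y_pin - q)/d * z.  Intersect the rays of pinholes 0 and 1.
q0, q1 = quantized_pixel(0), quantized_pixel(1)
s0, s1 = (0.0 - q0) / d, (pitch - q1) / d      # slopes (y per unit z)
z_refocus = pitch / (s0 - s1)                  # depth where the rays meet

print(round(z_refocus, 1))   # refocuses near, but not exactly at, z_v
```

Even sub-5-µm quantization of the pixel coordinates shifts the reconstructed depth by several millimeters at a few hundred millimeters of object distance, which is the deviation of the eye's focus state described above.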
In addition, when the scene is more complex, some virtual object points in the virtual space may not be rendered to each elemental image, but may be replaced by other points next to these virtual points, because their locations are more accurate. This may result in insufficient sampling for a virtual object point, which will also cause the deviation of the focus state of the human eye [31].
Therefore, based on the geometric relation depicted in Fig. 5, the ideal refocusing condition of a virtual object point is:
3.3 Refocusing definition
As the lenslet aperture is smaller than the pupil of the human eye, the beam diameter is limited by the lenslet aperture. Evidently, to achieve the refocusing effect, the rays from the virtual object point should pass through at least two lenslets before entering the pupil. Here, we assume that the number of lenslet footprints within the pupil range is a positive integer, and we use the chief rays of the lenslets to analyze the refocusing situation.
The smallest unit of an image is the pixel, and it has a certain size. When the human eye observes a pixel through a lenslet unit or an ideal pinhole unit in the SLM, it actually observes the virtual image of the pixel in the image space of the lenslet or the ideal pinhole, which is called the virtual pixel here. For a lenslet, the virtual pixel is located on the CDP, but refocusing makes the human eye focus on the RFP, not necessarily the CDP. Therefore, the result observed by the human eye for a single pixel is actually the defocusing spot of the virtual pixel, as shown in Fig. 7. For an SLM in the form of a pinhole array, the virtual pixel defocusing spot at the RFP can be regarded as the distribution range of all the chief rays within the field of view of the pixel at that position, and its size is:
The angle (in radians) subtended by the spot at the human eye can be calculated as:

Meanwhile, for a single imaging channel in the SLM, an ideal virtual image point at the CDP is reflected as a defocus-blurred spot at the RFP, the diameter of which is expressed as $D_{spot\_lens}$, and the angle subtended by the light spot at the eyes is expressed as $U_{spot\_lens}$. Therefore, the defocusing spot formed by a single virtual pixel at the RFP can be regarded as the result of the convolution of the defocusing spot formed by the ideal object point within the range of Eq. (27). If the spot boundary is defined as where the brightness drops to zero, the angle subtended at the eye by the defocusing spot formed by a single virtual pixel is:
As different systems have different optical properties in practice, it is difficult to describe $U_{spot\_lens}$ in parametric analytical form. Nevertheless, we can substitute the ideal aberration-free optical system model into the equation to obtain the most essential variation law of refocusing clarity, and an actual system follows a similar variation trend. In an aberration-free imaging system, the angle subtended by the light spot at the eyes can be expressed as:
For a better description, the definition can be denoted by angular pixel density (represented as pixels per degree, or PPD): Substituting Eq. (29) into it, the angular pixel density image can be obtained, as shown in Fig. 8(c). It can be seen that the change rule of the sharpness of a single pixel is similar to that of an ideal object point, and the sharpness reaches its maximum value at $z_{RFP}=z_{CDP}$. For different $d$, when $d$ approaches the focal length, the peak position of the angular pixel density rapidly moves to the right, resulting in a decrease of the sharpness at near positions and an improvement of the sharpness far away. It is important to note that the spot edge here is defined as the position where the brightness drops to zero. However, if the definition of the spot edge is changed, for example, to the position with half the peak brightness, the angular pixel density will increase accordingly. Likewise, when there is aberration in the system, the clarity changes to a certain extent, but the overall trend remains. Therefore, in an actual system, the display clarity does not always conform to the curve in Fig. 8. However, because the curve reflects the change in clarity and its distribution trend under ideal conditions, we can still adjust the parameters of the system according to the clarity reflected by the curve. The specific situation will be better reflected by simulation.
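The qualitative behavior above can be reproduced with a simple geometric-defocus sketch. The aperture, DP distance, eye relief, and depths below are assumed values, and the blur model (the two spot widths adding at the brightness-to-zero boundary) follows the convolution-support idea loosely rather than the paper's exact equations:

```python
import math

D_lens  = 1.0           # lenslet aperture (mm)
d, pix  = 4.97, 0.00468 # DP distance and pixel size (mm), illustrative
z_cdp   = -300.0        # central depth plane (mm)
z_epl   = 15.0          # assumed eye relief (mm)

def ppd(z_rfp):
    """Angular pixel density when the eye focuses at z_rfp."""
    # Geometric defocus spot of an ideal CDP point, by similar triangles:
    spot_lens = D_lens * abs(z_rfp - z_cdp) / abs(z_cdp)
    # Chief-ray footprint of one finite pixel at the RFP:
    spot_pix = pix * abs(z_rfp) / d
    # Brightness-to-zero boundary: the two widths add (convolution support).
    angle = (spot_lens + spot_pix) / (z_epl - z_rfp)   # radians
    return math.radians(1.0) / angle                   # pixels per degree

depths = [-200.0, -300.0, -400.0]
values = [ppd(z) for z in depths]
best = depths[values.index(max(values))]
print(best)   # sharpest when the RFP coincides with the CDP: -300.0
```

The sketch reproduces the key feature of Fig. 8: the PPD peaks at $z_{RFP}=z_{CDP}$ and falls off on either side as the lenslet defocus spot grows.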
3.4 Effect of assembly errors on refocusing
The camera model used for the LF image rendering must be consistent with the parameters of the actual lenslet array model. However, there are errors between the position of the DP in the imaging system and the theoretical model. These errors can be decomposed into rotation and translation along the x-, y-, and z-axes. Among them, the translation along the x- and y-axes will cause the translation of the refocused image and eyebox in the x-y plane without affecting the clarity. The translation along the z-axis will cause the refocused result of the system to be inconsistent with the expectation. Meanwhile, as the values are usually small, rotation along the x- and y-axes can be approximated by translation along the z-axis at a local position; the rotation along the z-axis can be approximated as a translation along the x- or y-axes at a local position. Hence, the translation on the z-axis has the greatest impact on the refocusing effect. Therefore, this subsection focuses on the determination of the translation error of the z-axis and its influence.
Considering the above conclusion, if the DP is shifted by $\Delta d$ along the direction of the optical axis and other parameters in the system remain unchanged, the matrix $\mathbf {T}_1$ will become
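Even before writing out the perturbed matrix, the sensitivity of the system to this axial shift $\Delta d$ can be estimated from the Gaussian equation alone. The sketch below uses the 5 mm lenslet focal length of the simulations in Section 4 and illustrative DP distances:

```python
# How a small axial placement error of the DP shifts the CDP
# (depth-priority form: DP inside the focal length, virtual image).
f = 5.0   # lenslet focal length (mm)

def cdp(d):
    """CDP position for a DP at distance d: 1/z' = 1/f - 1/d (negative)."""
    return 1.0 / (1.0 / f - 1.0 / d)

z_a = cdp(4.86)          # about -173.6 mm
z_b = cdp(4.88)          # about -203.3 mm
print(round(z_a, 1), round(z_b, 1), round(z_b - z_a, 1))
```

A 20 µm change in $d$ moves the CDP by roughly 30 mm at this depth, which is why the z-axis translation dominates the assembly-error budget.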
4. Simulation verification
In the calculation of some of the system parameters, such as resolution and RFP position, by using the formulas derived above, only the chief rays of each field of view of each imaging unit are used, so the calculated results are not affected by aberration. However, for parameters such as refocusing definition, which require the consideration of full-aperture rays, there is a significant difference between an actual system with aberration and the ideal model. In this case, the calculation results based on the ideal model can only provide a reference for the design of the actual system. Two experiments were conducted to compare the display effect between the ideal system and the actual system.
As it is difficult in practice to precisely control every optical element at the sub-mm level, we conducted simulations to illustrate the viewing effect of the InI-based LF display. The simulation was conducted using LightTools [32], as shown in Fig. 9. The SLM used here is a lenslet array with a unit pitch of 1 mm. Each lenslet is a plano-convex lens with a focal distance of 5 mm, and the lenslets are combined into one single element. The material of the element is E48R, with a refractive index of 1.53. The simulated microdisplay is a square Lambertian planar surface light source with a diagonal size of 1 inch. It emits light with a wavelength of 555 nm into space, and its divergence angle is set to cover the range of the eyebox. An LF image with a resolution of $3840\times 3840$ (i.e., a pixel density of 5430 PPI) and the same size as the light source is used as a mask for the light source to simulate the content displayed on the display screen. The mask is located at the DP position, and the LF image it displays is rendered in Unity using the method described in Section 2. To simulate the viewing effect under different CDPs, the distance between the mask and the SLM principal plane varies from 4.86 mm to 5.00 mm, and the corresponding CDP position moves from -173 mm to -$\infty$. In order to simulate the visual effect of human eyes, an ideal thin lens with adjustable focal length is used as the lens of a human eye, and its aperture is set to 8 mm according to the general size of the pupil of a human eye. Behind the lens is a planar receiver used for simulating the retina of the human eye. The distance between the receiver and the lens is fixed at 35 mm. By adjusting the focal length of the ideal lens based on the Gaussian equation, it forms a conjugate relation with different RFPs in the system. By ray tracing, we can obtain retinal images on these planar receivers to represent the visual effects of the human eye.
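The eye-lens focusing step can be sketched as follows. The 35 mm receiver distance follows the setup above, while the eye-relief value `z_eye` is an assumed placeholder (the exact eye position is not specified here):

```python
# Ideal-thin-lens focal length that conjugates the 35 mm "retina"
# distance with a given RFP, via the Gaussian equation.
retina = 35.0     # lens-to-receiver distance (mm)
z_eye  = 15.0     # assumed distance from SLM principal plane to eye (mm)

def eye_focal(z_rfp):
    """Focal length that focuses the receiver on the RFP (z_rfp < 0)."""
    obj = z_eye - z_rfp                       # eye-to-RFP distance
    return 1.0 / (1.0 / retina + 1.0 / obj)   # 1/f = 1/s' + 1/s

f_near = eye_focal(-173.0)    # nearest RFP in the simulation
f_far  = eye_focal(-1245.0)   # a distant RFP
print(round(f_near, 2), round(f_far, 2))
```

Sweeping the RFP from 173 mm to infinity only requires the eye lens focal length to move over a few millimeters, mirroring the accommodation range of a real eye.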
In the first simulation, an object of sufficiently small size was placed at different distances and rendered on LF images separately, so that the object in each elemental image occupies only one single effective pixel, that is, the size of the object is the smallest detail that could be presented by the LF display system. In order to reflect the situation mentioned above that perfect refocusing cannot be achieved due to the discrete coordinates on the DP and the SLM, we enabled only one row of the elemental images to ensure that refocusing only occurs in the horizontal direction. The refocusing image on the retina can be regarded as a collection of retinal images of different elemental images. Therefore, the size of the retinal image in the vertical and horizontal directions reflects the size of the light spot formed by a single pixel and that formed by the refocusing of multiple pixels, respectively. Figure 10 shows the retinal images under different CDP and RFP conditions. The retinal image has the minimum size when the CDP and the RFP coincide, and when they do not coincide, the retinal image will deviate from the original shape of the pixel and gradually show the shape of the pupil of the lenslet unit. However, as mentioned above, even in the case that the CDP and the RFP coincide, the limited size of the pixels as well as the rendering errors can lead to a deviation between the RFP and the ideal refocusing plane. This makes the observed refocused image points larger and less sharp than those formed by individual pixels.
Table 2 illustrates the comparison between the calculated and simulated values of the angular pixel density of the retinal spots. The simulated values include the single-pixel angular density and the refocused-pixel angular density, which are calculated based on the sizes of the corresponding retinal image simulation results in the vertical and horizontal directions, respectively. Due to the blurry boundary of the spots and the influence of aberration, the shape of the spots is not regular. Therefore, we define the boundary of the retinal spots as where the brightness drops to 10% of the maximum. It can be seen that the simulated clarity values are close to the calculated values, with a deviation between them due to the influence of aberration. Some of the simulated PPD values are even higher than the corresponding calculated values. This is because aberrations, especially spherical aberration, may extend the depth of field of a single lenslet. Meanwhile, the angular pixel density of a single-pixel spot is about 1 to 3 times that of a refocused-pixel spot. This phenomenon is caused by the finite number of ideal RFP positions, as mentioned in Section 3.2, and by the rendering error of a single virtual object point. This was also discussed in Qin et al.'s study [21], and can be alleviated by rendering the virtual scene at the sub-pixel level [33].
The first simulation showed that the angular pixel density is reduced after refocusing; nevertheless, virtual objects at the RFPs still appear sharpest when observed. We illustrate this with a second simulation, in which the LF image contains seven patterns: six digit patterns and one circle pattern. To add detail for clearer recognition, periodic black horizontal stripes were superimposed on each pattern. The objects are evenly spaced in reciprocal distance (i.e., in diopters), at 173 mm, 203 mm, 245 mm, 307 mm, 412 mm, 620 mm, and 1245 mm from the principal plane of the SLM. The CDP is swept across these patterns by adjusting the focal length of the ideal lens. A bias pattern is also placed 10 m in front of the principal plane of the SLM as the background, representing objects at approximately infinite distance; its contrast is set lower so that the patterns can be distinguished from the background. According to their positions, the patterns were scaled so that they appear the same size on the retina, with the same texture detail, as shown in Fig. 9(b). The retinal images for different RFPs and CDPs are shown in Fig. 11. The elemental images reveal that when the eye focuses on an RFP, the virtual object on it appears sharper than objects off the RFP, while the objects off the RFP appear blurry. This resembles natural human vision and verifies that the system provides correct focus cues. Furthermore, when the CDP and RFP coincide, the object on the RFP is the sharpest, despite the approximations introduced in the structure and rendering. Nevertheless, even the sharpest images of objects on the CDP are not as sharp as expected, owing to the misalignment of the accumulated retinal images of the pixels.
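The even diopter spacing of the depth planes listed above can be verified directly from the quoted distances; this short check is illustrative only and assumes nothing beyond those numbers.

```python
# Depth planes from the second simulation (mm from the SLM principal plane)
depths_mm = [173, 203, 245, 307, 412, 620, 1245]

# Convert to diopters (1 / distance in meters)
diopters = [1000.0 / d for d in depths_mm]

# Spacing between adjacent planes in diopter space:
# the steps are all roughly 0.81-0.85 D, i.e., approximately uniform
steps = [a - b for a, b in zip(diopters, diopters[1:])]
```

The near-constant diopter step matches the statement that the objects are evenly arranged in reciprocal distance rather than in metric distance.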
Although, according to Section 3.2, errors generated in the rendering process cause slight changes in the focus cues, at a macroscopic level the focus cues reflected in the simulation results are consistent with the actual visual effect.
5. Discussion
The calculations in this paper are based on rays in the meridional plane. However, owing to the rotational symmetry of the system, they can be extended to rays in 3-D space without loss of generality. Specifically, to expand the calculation to 3-D space, the matrices of a ray, a lens, and a free-space translation should be
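As a sketch of that extension (the exact ordering of components depends on the paper's 2-D convention, so the form below is an assumption based on standard paraxial matrix optics), a meridional ray $(y,\theta)$ generalizes to a four-component vector, and the thin-lens and translation matrices become $4\times 4$ block forms:

$$
\mathbf{r}=\begin{pmatrix} x \\ y \\ \theta_x \\ \theta_y \end{pmatrix},\qquad
\mathbf{L}(f)=\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ -1/f & 0 & 1 & 0 \\ 0 & -1/f & 0 & 1 \end{pmatrix},\qquad
\mathbf{T}(d)=\begin{pmatrix} 1 & 0 & d & 0 \\ 0 & 1 & 0 & d \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.
$$

Because $\mathbf{L}$ and $\mathbf{T}$ act on the $(x,\theta_x)$ and $(y,\theta_y)$ pairs independently and identically, the meridional-plane results carry over to 3-D unchanged.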
Note that the equations in this paper are derived for an ideal, aberration-free lens model, whereas the aberrations in an actual system will cause deviations between the actual effect and the theoretical calculation. For example, field curvature and spherical aberration may not only cause blur but also shift or curve the CDP axially, while distortion can shift the RFP axially. Nevertheless, a paraxial optical system is the idealized limit of a practical optical system, and the trends revealed in its analysis reflect the fundamental characteristics of a practical system. Therefore, the conclusions derived in this paper can provide a theoretical reference for the design of a practical LF display system.
6. Conclusion
We described a parameterized model of the structure, imaging process, and rendering and refocusing process of an InI-based LF-NED and analyzed its performance by introducing matrix optics. The calculation results can be expressed as a combination of independent monomials related to the pixel position, the lenslet position, and the position of the ray on each lenslet. We further described the calculation of the 2-D resolution and its influencing factors. Based on the rendering process and the physical parameters of the display, we provided the distribution of the RFPs that satisfy the perfect refocus condition. The definition at the perfect RFPs can also be derived by calculating the rays at the edge of the spot formed by each virtual pixel. To quantify the effect of assembly error on refocusing, we introduced an error term and calculated the refocusing result, which showed that the only effect of assembly error is a linear scaling of the refocus depth. Two simulations were conducted to verify the theoretical calculations; they showed that the refocusing results of an ideal InI-based LF-NED are consistent with the theory. Although the calculations in this paper were performed in the meridional plane, the representation and calculation can be easily extended to 3-D space. Our study provides a parametric calculation tool for designing the key parameters and evaluating the primary display effects of an InI-based LF-NED system. In future work, aberrations will be incorporated into the model to predict and evaluate the performance of actual InI-based LF-NEDs more accurately.
Funding
Key Technologies Research and Development Program (2017YFA0701200); National Natural Science Foundation of China (61822502).
Acknowledgments
We would like to thank Synopsys for providing the educational license of CODE V and LightTools.
Disclosures
The authors declare no conflicts of interest.
References
1. D. Cheng, Y. Wang, H. Hua, and M. M. Talha, “Design of an optical see-through head-mounted display with a low f-number and large field of view using a freeform prism,” Appl. Opt. 48(14), 2655–2668 (2009). [CrossRef]
2. Y. Amitai, “Extremely compact high-performance HMDs based on substrate-guided optical element,” SID Symp. Dig. Tech. Pap. 35(1), 310–313 (2004). [CrossRef]
3. J. Han, J. Liu, X. Yao, and Y. Wang, “Portable waveguide display system with a large field of view by integrating freeform elements and volume holograms,” Opt. Express 23(3), 3534–3549 (2015). [CrossRef]
4. L. Marran and C. Schor, “Multiaccommodative stimuli in vr systems: problems & solutions,” Hum. Factors 39(3), 382–388 (1997). [CrossRef]
5. H. Hua, “Enabling focus cues in head-mounted displays,” Proc. IEEE 105(5), 805–824 (2017). [CrossRef]
6. D. Lanman and D. Luebke, “Near-eye light field displays,” ACM Trans. Graph. 32(6), 1–10 (2013). [CrossRef]
7. H. Hua and B. Javidi, “A 3d integral imaging optical see-through head-mounted display,” Opt. Express 22(11), 13484–13491 (2014). [CrossRef]
8. C. Yao, D. Cheng, T. Yang, and Y. Wang, “Design of an optical see-through light-field near-eye display using a discrete lenslet array,” Opt. Express 26(14), 18292–18301 (2018). [CrossRef]
9. K. Aksit, J. Kautz, and D. Luebke, “Slim near-eye display using pinhole aperture arrays,” Appl. Opt. 54(11), 3422 (2015). [CrossRef]
10. W. Song, Y. Wang, D. Cheng, and Y. Liu, “Light field head-mounted display with correct focus cue using micro structure array,” Chin. Opt. Lett. 12, 060010 (2014). [CrossRef]
11. M. Levoy and P. Hanrahan, “Light field rendering,” in Proceedings of the 23rd annual conference on Computer graphics and interactive techniques - SIGGRAPH ’96, (1996), pp. 31–42.
13. https://www.jb-display.com.
14. A. Isaksen, L. McMillan, and S. J. Gortler, “Dynamically reparameterized light fields,” in Proceedings of the 27th annual conference on Computer graphics and interactive techniques - SIGGRAPH ’00, (2000), pp. 297–306.
15. C.-K. Liang, Y.-C. Shih, and H. H. Chen, “Light field analysis for modeling image formation,” IEEE Trans. on Image Process. 20(2), 446–460 (2011). [CrossRef]
16. R. Matsubara, Z. Y. Alpaslan, and H. S. El-Ghoroury, “Light field display simulation for light field quality assessment,” in Proceedings of SPIE - Stereoscopic Displays and Applications XXVI, vol. 9391 (2015), p. 93910G.
17. D. G. Dansereau, O. Pizarro, and S. B. Williams, “Decoding, calibration and rectification for lenselet-based plenoptic cameras,” in 2013 IEEE Conference on Computer Vision and Pattern Recognition, (2013), pp. 1027–1034.
18. T. Iwane, “Light field display and 3D image reconstruction,” in Three-Dimensional Imaging, Visualization, and Display 2016, vol. 9867 (2016), p. 98670S.
19. X. Jin, L. Liu, and Q. Dai, “Approximation and blind reconstruction of volumetric light field,” Opt. Express 26(13), 16836–16852 (2018). [CrossRef]
20. H. Huang and H. Hua, “Systematic characterization and optimization of 3D light field displays,” Opt. Express 25(16), 18508–18525 (2017). [CrossRef]
21. Z. Qin, P. Y. Chou, J. Y. Wu, Y. T. Chen, C. T. Huang, N. Balram, and Y. P. Huang, “Image formation modeling and analysis of near-eye light field displays,” J. Soc. Inf. Disp. 27(4), 238–250 (2019). [CrossRef]
22. Y. Yuan, X. Wang, Y. Yang, H. Yuan, C. Zhang, and Z. Zhao, “Full-chain modeling and performance analysis of integral imaging three-dimensional display system,” J. Eur. Opt. Soc.-Rapid Publ. 16(1), 12 (2020). [CrossRef]
23. C.-G. Luo, Q.-H. Wang, H. Deng, X.-X. Gong, L. Li, and F.-N. Wang, “Depth calculation method of integral imaging based on gaussian beam distribution model,” J. Disp. Technol. 8(2), 112–116 (2012). [CrossRef]
24. Z. Zhao, J. Liu, L. Xu, Z. Zhang, and N. Zhao, “Wave-optics and spatial frequency analyses of integral imaging three-dimensional display systems,” J. Opt. Soc. Am. A 37(10), 1603–1613 (2020). [CrossRef]
25. M. Cho, M. Daneshpanah, I. Moon, and B. Javidi, “Three-dimensional optical sensing and visualization using integral imaging,” in Proceedings of the IEEE, vol. 99 (2011), pp. 556–575.
26. S.-H. Hong, J.-S. Jang, and B. Javidi, “Three-dimensional volumetric object reconstruction using computational integral imaging,” Opt. Express 12(3), 483–491 (2004). [CrossRef]
27. A. Schwarz, J. Wang, A. Shemer, Z. Zalevsky, and B. Javidi, “Lensless three-dimensional integral imaging using variable and time multiplexed pinhole array,” Opt. Lett. 40(8), 1814–1817 (2015). [CrossRef]
28. X. Shen and B. Javidi, “Large depth of focus dynamic micro integral imaging for optical see-through augmented reality display using a focus-tunable lens,” Appl. Opt. 57(7), B184–B189 (2018). [CrossRef]
29. M. Martínez-Corral and B. Javidi, “Fundamentals of 3D imaging and displays: a tutorial on integral imaging, light-field, and plenoptic systems,” Adv. Opt. Photonics 10(3), 512–566 (2018). [CrossRef]
30. H. Hoshino, F. Okano, H. Isono, and I. Yuyama, “Analysis of resolution limitation of integral photography,” J. Opt. Soc. Am. A 15(8), 2059–2065 (1998). [CrossRef]
31. H. Huang and H. Hua, “Effects of ray position sampling on the visual responses of 3D light field displays,” Opt. Express 27(7), 9343–9360 (2019). [CrossRef]
32. https://www.synopsys.com/optical-solutions/lighttools.html.
33. Z. Qin, P.-Y. Chou, J.-Y. Wu, C.-T. Huang, and Y.-P. Huang, “Resolution-enhanced light field displays by recombining subpixels across elemental images,” Opt. Lett. 44(10), 2438–2441 (2019). [CrossRef]