Abstract
Integral-imaging-based (InI-based) light-field near-eye display (LF-NED) is an effective way to relieve the vergence-accommodation conflict (VAC) in applications of virtual reality (VR) and augmented reality (AR). Lenslet arrays are often used as spatial light modulators (SLMs) in such systems. However, the conflict between refocusing on a virtual object point from the light-field image (LF image) and focusing on the image plane of the lenslets degrades the viewing effect, so the light field (LF) cannot be accurately restored. In this study, we introduce matrix optics and build a generally applicable parameterized model of a lenslet-array-based LF-NED, based on which the imaging process is derived and the performance of the system is analyzed. A lenslet-array-based LF-NED optical model is embodied in LightTools to verify the theoretical model. The simulation results are consistent with the proposed model and the conclusions drawn from it. Thus, the model can be used as the theoretical basis for evaluating the primary performance of an InI-based LF-NED system.
© 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement
1. Introduction
As one of the core technologies of virtual reality (VR) and augmented reality (AR), stereo vision can restore the depth information of virtual scenes, so that users can gain a more efficient and intuitive understanding of virtual information. In the field of near-eye displays (NEDs), conventional binocular stereo vision relies on binocular parallax to stimulate the depth perception of the human eye, but these types of NEDs only have one virtual image plane [1–3], which leads to the well-known vergence-accommodation conflict (VAC) problem, in which the eyes perceive different depths through binocular vergence, while the focus of the lenses is restricted to a single depth [4]. The solution to this problem is to simulate the light entering the eye in a natural state, that is, to create a light field (LF) near the position of the eyes, so that the lens of the eye can respond to the light entering the eyes and focus on the corresponding depth of the virtual object. Integral-imaging-based (InI-based) LF displays are one of the main methods used to generate light fields [5]. Such systems use a lenslet array [6–8] or a pinhole array [9,10] parallel to the display as a spatial light modulator (SLM). By setting up plane coordinate systems on the SLM plane and the display plane, we can parameterize any ray in space into four coordinates [11].
In order to maximize the use of the limited resolution of displays for recovering depth information, a proportion of the image resolution of each depth plane of InI-based LF displays must be sacrificed. Moreover, for the sake of thinness, a lenslet array used in such systems usually has a relatively small focal length compared with a typical head-mounted display (HMD), which means that the magnification of lenslets is relatively large, resulting in larger virtual pixels and a lower angular pixel density observed by the user. Fortunately, with the continuing development of micro light-emitting diode ($\mu$LED) display technologies, some commercial grade $\mu$LED displays with high pixel density have been launched, such as the 14,000 PPI $\mu$LED display by Mojo Vision [12] and the 10,000 PPI $\mu$LED display by JBD [13]. These will produce clearer visual effects and provide a good prospect for high-resolution LF-NED.
As the display screens become increasingly precise, an analysis of the imaging rule of actual LF display systems becomes more meaningful. To analyze the formation of LF, several studies have been conducted on LF rendering [11,14–16] and plenoptic cameras [17–19]. In the field of LF-NED, Huang and Hua [20] constructed a point spread function (PSF) of the retinal image, and systematically analyzed the visual effects of LF displays and the visual responses of the human eyes by combining the human vision system with an ideal model of LF display. Qin et al. [21] analyzed the visual effect based on an image formation model that incorporates all factors affecting the image formation into an LF-NED, including diffraction, aberration, defocusing and pixel size. Yuan et al. [22] built a full-chain performance characterization model of an InI-based 3D display system, and evaluated the key parameters of the restored voxels. Wave optics is often used in modeling and evaluating an InI system, and it provides relatively rigorous results [22–24]. However, these studies mainly focus on analyzing simulation results to summarize the changing rules, or require complex calculations to illustrate the imaging effects; they do not provide a relatively simple parametric LF imaging model in analytical form. Thus, it is still difficult to predict the performance of an InI-based LF-NED system directly by calculation, without simulations or experiments.
Therefore, this study focuses on the parametric analysis of InI-based LF-NED. Our work includes: 1) building an ideal parameterized model of InI-based LF-NED; 2) analyzing the refocusing process by means of matrix optics based on the parameterized model; 3) analyzing the image quality and the error influence of an InI-based LF-NED; and 4) conducting simulations of an InI-based LF-NED to verify the results of the analysis. Compared with previous studies, our work focuses on the primary properties and the general rules of an InI-based imaging system. Thus, we provide a parametric tool for evaluating the basic indicators a system can achieve from a few simple parameters, such as the focal length and the aperture of the lenslet unit, to design the primary scheme of an actual InI display system before optimization considering diffraction and aberration.
2. Matrix optics representation of the LF display
A typical InI-based LF display system includes a display plane (DP), a spatial light modulator (SLM), an eyebox plane (EBP), a central depth plane (CDP), and a refocusing plane (RFP), as shown in Fig. 1. We use a Cartesian reference system in which the optical axis lies along the z direction, and the y direction is perpendicular to it in the meridian plane. The light emitted from a display screen at the DP is modulated by the SLM and distributed in the eyebox. The eyebox is a spatial range that limits the range of movement of the pupil, allowing the pupil to experience the right visual effect as it moves within it. The EBP is the plane perpendicular to the optical axis at the intersection of the chief rays emitted from the center of each elemental image of the LF image. In the depth priority form, the EBP represents the performance of the eyebox, in which a pupil can have the largest range of movement. In the resolution priority form, the EBP acts as a viewing window through which the correct reconstructed scene can be observed. Refocusing in such a system is a process during which rays containing the same information from different elemental images and passing through different SLM units form a single virtual point that can be perceived by the eye. By extending the chief rays of the SLM units in reverse, rays that represent the same virtual object point intersect at a point on the RFP to achieve refocusing. As each SLM unit also serves as an imaging system, the DP will also have a conjugate plane about the SLM, namely the CDP. Both the CDP and the RFP can be in front of or behind the SLM, to form virtual or real images. The InI system can be embodied in depth priority form, in which the CDP and the RFP are on the same side of the SLM as the DP, or resolution priority form, where the CDP and the RFP are on different sides of the SLM from the DP [25]. The human eyes tend to focus on the RFP when observing the virtual scene.
Thus, it is obvious that the virtual scene has the clearest refocusing effect when the RFP and the CDP coincide; however, if the CDP and the RFP do not coincide, the reconstructed virtual point on the RFP will no longer be an ideal point, but an accumulation of defocusing spots. The difference between the RFP and the CDP will lead to a reduction in imaging quality as human eyes focus on the RFP when viewing virtual information. In addition, in most studies of InI-based LF displays, researchers have used pinhole array models to simplify lenslet arrays, that is, to consider only the chief rays while ignoring the distribution of non-chief rays [26,27], or to assume that the DP is located within the depth of field (DOF) of the lenslets [28]. These studies also ignore the difference between the CDP and RFP. Therefore, in this section, a perfect lenslet array model is used to quantify the propagation and imaging processes of the rays.
2.1 Representation of the InI system
This work mainly focuses on studying the most basic characteristics of the LF display system based on lenslet arrays, and the paraxial optical model is the basic model of imaging optics. Moreover, for the human eye, the display quality of the central field of view usually occupies the most important position. To facilitate this analysis, we make the following assumptions and approximations for a lenslet array used as an SLM when building a theoretical model:
1. The lenslet array is composed of lenslets with rotational symmetry surfaces, each of which has the same optical parameters.
2. Each lenslet unit is approximately equivalent to an ideal paraxial system composed of an exit pupil and two principal planes, in which the distance between the exit pupil and the rear principal plane is $d_{exp}$;
3. The system is in homogeneous surroundings regarding the refractive index, so that the height and angle of any rays remain the same when propagating between principal planes.
4. The distance between the two principal planes (usually millimeter scale) is ignored; that is, we refer to the two principal planes as one single plane. According to the property of the principal planes, the only cost of this approximation is that the image space will have an axial position deviation equal to the distance between the two principal planes, which is negligible compared with meter-scale scenes.
Based on these assumptions, all aberrations in the system are ignored, which enables us to focus on the relationship between refocusing and imaging. Meanwhile, the object-image relationship of the system is still consistent with the actual situation, despite the approximations in aberration, so the model reflects the theoretical situation of an LF display.
To improve the calculation efficiency, we use matrix optics to build an InI-based LF-NED model. Matrix optics is a geometric optics analysis method which is in accordance with Gaussian optics [15,29]. To represent the components of the optical path in terms of a matrix, we specify the positive and negative signs of the system: positive is to the right and up in terms of directions, and angles are counted clockwise from the optical axis to the ray. In addition, in the analysis, the principal plane of the lenslet array is at position $z=0$. Then, the state of a ray at a given plane perpendicular to the optical axis can be described with only two coordinates, one spatial and one angular. On this basis, a ray of light passing through a point at height $y$ with an angle $\theta$ to the optical axis, a thin lens with a focal length of $f'$, and a propagation distance of $L$ in a homogeneous-refractive-index space can be represented as:
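As a minimal numerical sketch (not the paper's code), the thin-lens and free-space transfer matrices under these conventions can be composed to verify that they reproduce the Gaussian imaging condition; all parameter values below are illustrative:

```python
import numpy as np

def propagate(L):
    """Free-space propagation over distance L: y' = y + L*theta."""
    return np.array([[1.0, L], [0.0, 1.0]])

def thin_lens(f):
    """Ideal thin lens of focal length f: theta' = theta - y/f."""
    return np.array([[1.0, 0.0], [-1.0 / f, 1.0]])

# A ray is the column vector [height y, angle theta].
f, s = 5.0, 10.0                     # focal length and object distance (mm)
s_img = 1.0 / (1.0 / f - 1.0 / s)    # Gaussian image distance: 10 mm

# Object plane -> lens -> image plane.
M = propagate(s_img) @ thin_lens(f) @ propagate(s)

# At an image plane, the ray height is independent of the launch angle,
# i.e. the B element (row 0, col 1) of the system matrix vanishes.
print(np.isclose(M[0, 1], 0.0))   # True
print(M[0, 0])                    # transverse magnification, here -1.0
```

The same composition rule, with decenter terms added, underlies the full system matrices derived below.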
In an InI system, no matter which form it takes, each ray emitted from the DP passes through the SLM with an object distance of $d$ and a focal length of $f'$, propagates to the exit pupil distance of $z_{EPL}$, reaches the EBP, and then propagates in reverse a certain distance $z$. The process can be expressed as
Because each lenslet unit in the SLM is not coaxial to the optical axis of the system, but has periodic eccentricity, it is necessary to introduce an eccentricity matrix $\mathbf {R}_{decenter}$ and a regression matrix $\mathbf {R}_{regress}$ to describe this. The eccentricity of an optical element cannot be completely defined by a two-dimensional square matrix; therefore, the lens matrix needs to be extended from a 2$\times$2 matrix to a 3$\times$3 matrix. Here, the eccentricity matrix and the regression matrix are, respectively, represented as
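A common way to build such 3$\times$3 augmented matrices (a sketch under our own conventions, not necessarily the paper's exact form) is to act on the homogeneous ray vector $[y, \theta, 1]^T$, with the third column carrying the lenslet decenter:

```python
import numpy as np

def decenter(h):
    """Shift into the local frame of a lenslet decentered by h."""
    return np.array([[1.0, 0.0, -h],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])

def regress(h):
    """Shift back into the global frame after the lenslet acts."""
    return np.array([[1.0, 0.0, h],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])

def thin_lens3(f):
    """Augmented 3x3 thin-lens matrix acting on [y, theta, 1]."""
    return np.array([[1.0, 0.0, 0.0],
                     [-1.0 / f, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])

f, h = 5.0, 3.0                       # focal length, lenslet decenter (mm)
M = regress(h) @ thin_lens3(f) @ decenter(h)

# Sanity check: a ray hitting the decentered lenslet at its own center
# (y = h) must pass through undeviated, whatever its incoming angle.
ray = np.array([h, 0.02, 1.0])        # [height, angle, homogeneous 1]
out = M @ ray
print(out)                            # height h, angle unchanged
```

The decenter/regress pair lets every lenslet reuse one on-axis lens matrix while the system is expressed in a single global coordinate frame.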
2.2 Representation of LF rendering and refocusing
Based on the principles of InI, to recover LF information, the content displayed on the DP must be an LF image. Generally, the rendering of an LF image and the refocusing of an LF display are a pair of approximate reciprocal processes. They are approximate because the limited modulation power of the SLM makes it almost impossible to render the LF image exactly as the light is distributed in actual situations. When we focus on rendering an LF image, we only need to focus on a specific ray for each SLM unit, and ensure that it conforms to the light distribution of the natural state as much as possible. For these rays, we need to trace the processes by which they are emitted from virtual object points and modulated to obtain their coordinates on the DP. Ray tracing is an accurate way to calculate an LF image and can even be used to correct distortions of the lenslets of the SLM. However, because of the tremendous amount of spatial ray data, it requires high-performance computers to realize real-time generation of an LF image by ray tracing. Alternatively, a rendering method based on projection matrix transformation can effectively reduce the computational cost of LF image rendering, owing to the linear mapping of object point coordinates from 3D space onto a 2D plane. Here, we can set up a camera array according to the relevant rules of computer graphics and build a projection matrix model corresponding to the parameters of the SLM. The pinhole array model is the simplest form of SLM, as it does not refract any rays. It shields most rays and only allows rays that conform to the actual LF to enter the eyebox. Therefore, in the case of a pinhole array, the rendering process of an LF image displayed on the DP is the reverse process of refocusing. In the rendering model, the center of each camera model used for rendering is located at the center of the corresponding pinhole, and the visual axes intersect at the center of the EBP [8].
In this way, the virtual object point, the corresponding pixel on the DP, and the center of the exit pupil are collinear. Thus, as shown in Fig. 2, in the pinhole array model, the corresponding pixel on the DP is located at point A, the intersection of the DP with line $\textrm {AO}_{\textrm {exp}}$, which connects the virtual object point and the center of the exit pupil. In a lenslet-based SLM, however, the chief rays passing through the lenslets are obviously refracted by the lenslet before passing through the center of the exit pupil. As a result, the corresponding pixel on the display is at B, instead of A. Using the projection matrix model to calculate the coordinates of point B would be a very complex process, so we use $\textrm {B}'$, the intersection of the DP with line CO, which connects the principal point O and the virtual object point, to replace B.
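For the pinhole case, the reciprocity of rendering and refocusing reduces to this collinearity, which can be checked with a short numerical sketch (all parameter values below are illustrative, not the paper's):

```python
# Pinhole-array sketch: principal plane at z = 0, DP at z = -d,
# virtual object point behind the DP (negative z).
d, pitch = 5.0, 1.0                   # DP distance and pinhole pitch (mm)
y_v, z_v = 2.0, -300.0                # virtual object point (mm)

def rendered_pixel(n):
    """Pixel height on the DP: the point, pinhole n and pixel are collinear."""
    y_pin = n * pitch
    t = (-d - z_v) / (0.0 - z_v)      # fractional position of the DP on the ray
    return y_v + (y_pin - y_v) * t

def retrace(n, z):
    """Height at depth z of the ray through pixel n and pinhole n."""
    y_pin, y_pix = n * pitch, rendered_pixel(n)
    return y_pin + (y_pin - y_pix) / d * z

# Rays retraced through two different pinholes refocus at the original point.
h0 = retrace(0, z_v)
h1 = retrace(3, z_v)
print(h0, h1)   # both recover y_v (up to floating point)
```

With ideal continuous pixel coordinates the retraced rays intersect exactly at the virtual point; Section 3.2 examines what happens when the pixel coordinates are quantized.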
The error caused by the approximation can be expressed as the degree to which pixel B is close to $\textrm {B}'$ in an elemental image. According to Fig. 2, the coordinates of point B and $\textrm {B}'$ have the following geometric relationship:
In summary, we can represent the rendering process by a matrix transformation. The ray emitted from the virtual object point at distance $z_0$ in front of the principal plane of the lenslet can be expressed as
This result reflects the refocusing state of the rays. It can be seen from the results that the coefficients of $y$, $ND_{pitch}$ and $r$ represent the effects of the virtual object point position, the lenslet position, and the position of the ray in the exit pupil of the lenslet on the light vector, respectively. The refocusing error of the system can be analyzed by comparing the rays on the RFP with those from the virtual object. When the rays refocus on the RFP, the height of each ray is independent of the position of the lenslets, that is, $k_{12}=0$ when $z=z_{RFP}$. Thus, we can derive
Substituting $z_{RFP}$ into $k_{11}$, we obtain $k_{11}=1$. This indicates that the image height on the RFP does not change from that of the original virtual object point, but that the depth of the refocused image point deviates from the depth of the original virtual object point to some extent. Because the values of $f'$ and $d$ are usually close, and $d_{exp}$ is small compared to $z_0$, this depth deviation is very small. When the CDP and the RFP coincide, the deviation decreases to zero. These equations are applicable to all types of InI systems and can be simplified by changing the parameters. For example, if $r=0$ and $d=f'$, the lenslet array model can be simplified to a pinhole array model. In this way, we can analyze the system using the pinhole imaging principle, which is very convenient for roughly determining the initial parameters of an InI-based LF display system. In addition, for $z=z_{EPL}$, the light distribution at the EBP can be obtained. In particular, when $z_0=-d$, we can directly calculate the propagation of any pixel from the DP. Under these conditions, Eq. (19) can be expressed as
Note that although the focal plane is farther from the SLM than the DP in Fig. 2 and Fig. 3, thereby embodying a depth-priority form, the calculations above also apply to a resolution-priority form; the only differences are that $d$ is larger than $f'$ and $z_0$ is positive.
3. Imaging analysis of the LF display
Several variables contribute to the visual effects of the system. To better illustrate the state of the rays in a system, we used a high-PPI micro-display as the image source and designed a lenslet array with typical parameters as the SLM. The parameters of the system are shown in Table 1. In this section, we analyze a depth-priority InI display system, but it should be noted that since the calculation is based on the formula derived in the previous section, this analysis method is also applicable to resolution-priority systems.
3.1 Resolution analysis
As the LF image is composed of several elemental images, the information of the same virtual object point appears in multiple positions on the DP. Therefore, the number of times the same information appears on the DP reflects the information redundancy of the virtual object point. Clearly, this information redundancy varies with the location of the virtual object points, because the pixels of the virtual object points at different distances appear at different intervals on the display. For convenience, we assume that the pixels of the display screen used are evenly distributed and the elemental images are closely spaced, although actual display screens may have more complex structures. We use the effective pixel ratio $\eta$ to represent the resolution loss level of the LF display. Note that although only the 1-D situation is considered in this section, the conclusion is also applicable to a 2-D plane by applying the same calculation idea to 2-D coordinates.
The effective pixel ratio is calculated as follows. First, the size of an elemental image, $D_{unit}$, is calculated. This value determines the size of the EBP, whose shape is similar to that of each elemental image. According to Eq. (21), the rays emitted from the same edge of adjacent elemental images intersect at the EBP. That is, the following is satisfied at the EBP:
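These geometric relations can be sketched numerically. The similar-triangle forms below are our own pinhole-model reading of the EBP condition and of the interval at which one virtual point repeats on the DP; the pitch, DP distance, eye relief, and object depth are illustrative values, not the paper's:

```python
pitch = 1.0     # SLM unit pitch (mm)
d     = 5.0     # DP-to-SLM distance (mm)
z_epl = 15.0    # eye relief of the EBP (mm), an assumed value
z0    = -173.0  # virtual object depth (mm)

# Rays from the same edge of adjacent elemental images meet at the EBP,
# which by similar triangles fixes the elemental-image size:
D_unit = pitch * (z_epl + d) / z_epl          # 1.333... mm

# Lines from one virtual point through adjacent pinholes hit the DP
# at this interval (the spacing of the repeated information):
D_info = pitch * (z0 + d) / z0                # ~0.971 mm

print(round(D_unit, 3), round(D_info, 3))
```

The ratio of these two quantities governs how many times the same information is repeated within one elemental image, and hence the resolution loss that $\eta$ quantifies.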
Note that in the rendering process, $d_{exp}$ is set to 0. Further, according to the rendering method, the pixel interval $D_{info}$ of a single virtual object point on the DP should satisfy

Based on the above equations, the effective pixel ratio can be calculated as

3.2 Refocusing depth accuracy
In the 4-D LF, we can define a ray using a line between the exit pupil center of a lenslet and the pixel center on the DP. Theoretically, any ray can be defined in this way. However, as the pixels and lenslets are distributed discretely in their respective planes, these lines cannot fully describe an arbitrary ray distribution in real situations. Thus, using these lines to define the refocusing depth inevitably decreases the depth accuracy of the refocused scene. From the microscopic perspective, when the information in a 3-D scene is rendered to a 2-D image, the information represented in each full pixel range is the same as that at the center of the pixel (regardless of the RGB sub-pixel structure). Additionally, the gray value of each pixel is determined by the intensity value of the ray emitted by its corresponding virtual object point. Therefore, to achieve perfect refocusing, the virtual images of the pixels representing the same information formed by the corresponding lenslets must completely overlap. Ideally, the rays emitted by the virtual object point can still maintain the original gray value after rendering and refocusing. However, because of resolution degradation [30], a single virtual point may not be rendered at an exact position in each elemental image. Specifically, if the virtual object points in a virtual scene are separated from each other, they can be rendered to every single elemental image. However, due to the limitation of sampling accuracy, the coordinates of the pixels in the elemental images are not always the same as in the ideal case. Thus, in the process of refocusing, the rays represented by the coordinates of these pixels and the coordinates of the corresponding SLM units cannot perfectly intersect at the original virtual object point, which may cause the focusing state of the human eye to deviate slightly from the ideal situation.
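The depth deviation caused by discrete pixel coordinates can be illustrated with a short pinhole-model sketch; the pitch, pixel size, and depths below are assumed illustrative values (the pixel size roughly corresponds to a 5430 PPI display):

```python
# Snap each elemental image's ideal pixel position to the pixel grid,
# then re-intersect the retraced rays to find the actual refocus depth.
pitch, d, pix = 1.0, 5.0, 0.00468     # mm
y_v, z_v = 0.1, -300.0                # intended virtual object point (mm)

def ideal_pixel(n):
    """Exact (continuous) pixel height on the DP for pinhole n."""
    y_pin = n * pitch
    return y_v + (y_pin - y_v) * (-d - z_v) / (0.0 - z_v)

def quantized_pixel(n):
    """Snap the ideal pixel height to the discrete pixel grid."""
    return round(ideal_pixel(n) / pix) * pix

# Retraced rays: lines through the quantized pixel and the pinhole center,
# y(z) = y_pin + (y_pin - q)/d * z.  Intersect the rays of pinholes 0 and 1.
q0, q1 = quantized_pixel(0), quantized_pixel(1)
s0, s1 = (0.0 - q0) / d, (pitch - q1) / d      # slopes (y per unit z)
z_refocus = pitch / (s0 - s1)                  # depth where the rays meet

print(round(z_refocus, 1))   # refocuses near, but not exactly at, z_v
```

Even sub-5-µm quantization of the pixel coordinates shifts the reconstructed depth by several millimeters at a few hundred millimeters of object distance, which is the deviation of the eye's focus state described above.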
In addition, when the scene is more complex, some virtual object points in the virtual space may not be rendered to each elemental image, but may be replaced by other points next to these virtual points, because their locations are more accurate. This may result in insufficient sampling for a virtual object point, which will also cause the deviation of the focus state of the human eye [31].
Therefore, based on the geometric relation depicted in Fig. 5, the ideal refocusing condition of a virtual object point is:
3.3 Refocusing definition
As the lenslet aperture is smaller than the pupil of the human eye, the beam diameter is limited by the lenslet aperture. Evidently, to achieve the refocusing effect, the rays from the virtual object point should pass through at least two lenslets before entering the pupil. Here, we assume that the number of lenslet footprints within the pupil range is a positive integer, and we use the chief rays of the lenslets to analyze the refocusing situation.
The smallest unit of an image is the pixel, and it has a certain size. When the human eye observes a pixel through a lenslet unit or an ideal pinhole unit in the SLM, it actually observes the virtual image of the pixel in the image space of the lenslet or the ideal pinhole, which is called the virtual pixel here. For a lenslet, the virtual pixel is located on the CDP, but refocusing makes the human eye focus on the RFP, not necessarily the CDP. Therefore, the result observed by the human eye for a single pixel is actually the defocusing spot of the virtual pixel, as shown in Fig. 7. For an SLM in the form of a pinhole array, the virtual pixel defocusing spot at the RFP can be regarded as the distribution range of all the chief rays within the field of view of the pixel at that position, and its size is:
The angle (in radians) subtended by the spot at the human eye can be calculated as:

Meanwhile, for a single imaging channel in the SLM, an ideal virtual image point at the CDP is reflected as a defocus-blurred spot at the RFP, the diameter of which is expressed as $D_{spot\_lens}$, and the angle subtended by the light spot at the eyes is expressed as $U_{spot\_lens}$. Therefore, the defocusing spot formed by a single virtual pixel at the RFP can be regarded as the result of the convolution of the defocusing spot formed by the ideal object point within the range of Eq. (27). If the spot boundary is defined as where the brightness drops to zero, the angle subtended at the eye by the defocusing spot formed by a single virtual pixel is:
As different systems have different optical properties in practice, it is difficult to describe $U_{spot\_lens}$ in parametric analytical form. Nevertheless, we can substitute the ideal aberration-free optical system model into the equation to obtain the most essential variation law of refocusing clarity, and an actual system follows a similar variation trend. In an aberration-free imaging system, the angle subtended by the light spot at the eyes can be expressed as:
For a better description, the definition can be denoted by angular pixel density (represented as pixels per degree, or PPD): Substituting Eq. (29) into it, the angular pixel density image can be obtained, as shown in Fig. 8(c). It can be seen that the change rule of the sharpness of a single pixel is similar to that of an ideal object point, and the sharpness reaches its maximum value at $z_{RFP}=z_{CDP}$. For different $d$, when $d$ approaches the focal length, the peak position of the angular pixel density rapidly moves to the right, resulting in a decrease of the sharpness at near positions and an improvement of the sharpness far away. It is important to note that the spot edge here is defined as the position where the brightness drops to zero. However, if the definition of the spot edge is changed, for example, to the position with half the peak brightness, the angular pixel density will increase accordingly. Likewise, when there is aberration in the system, the clarity changes to a certain extent, but the overall trend remains. Therefore, in an actual system, the display clarity does not always conform to the curve in Fig. 8. However, because the curve reflects the change in clarity and its distribution trend under ideal conditions, we can still adjust the parameters of the system according to the clarity reflected by the curve. The specific situation will be better reflected by simulation.
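The qualitative behavior above can be reproduced with a simple geometric-defocus sketch. The aperture, DP distance, eye relief, and depths below are assumed values, and the blur model (the two spot widths adding at the brightness-to-zero boundary) follows the convolution-support idea loosely rather than the paper's exact equations:

```python
import math

D_lens  = 1.0           # lenslet aperture (mm)
d, pix  = 4.97, 0.00468 # DP distance and pixel size (mm), illustrative
z_cdp   = -300.0        # central depth plane (mm)
z_epl   = 15.0          # assumed eye relief (mm)

def ppd(z_rfp):
    """Angular pixel density when the eye focuses at z_rfp."""
    # Geometric defocus spot of an ideal CDP point, by similar triangles:
    spot_lens = D_lens * abs(z_rfp - z_cdp) / abs(z_cdp)
    # Chief-ray footprint of one finite pixel at the RFP:
    spot_pix = pix * abs(z_rfp) / d
    # Brightness-to-zero boundary: the two widths add (convolution support).
    angle = (spot_lens + spot_pix) / (z_epl - z_rfp)   # radians
    return math.radians(1.0) / angle                   # pixels per degree

depths = [-200.0, -300.0, -400.0]
values = [ppd(z) for z in depths]
best = depths[values.index(max(values))]
print(best)   # sharpest when the RFP coincides with the CDP: -300.0
```

The sketch reproduces the key feature of Fig. 8: the PPD peaks at $z_{RFP}=z_{CDP}$ and falls off on either side as the lenslet defocus spot grows.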
3.4 Effect of assembly errors on refocusing
The camera model used for the LF image rendering must be consistent with the parameters of the actual lenslet array model. However, there are errors between the position of the DP in the imaging system and the theoretical model. These errors can be decomposed into rotation and translation along the x-, y-, and z-axes. Among them, the translation along the x- and y-axes will cause the translation of the refocused image and eyebox in the x-y plane without affecting the clarity. The translation along the z-axis will cause the refocused result of the system to be inconsistent with the expectation. Meanwhile, as the values are usually small, rotation along the x- and y-axes can be approximated by translation along the z-axis at a local position; the rotation along the z-axis can be approximated as a translation along the x- or y-axes at a local position. Hence, the translation on the z-axis has the greatest impact on the refocusing effect. Therefore, this subsection focuses on the determination of the translation error of the z-axis and its influence.
Considering the above conclusion, if the DP is shifted by $\Delta d$ along the direction of the optical axis and other parameters in the system remain unchanged, the matrix $\mathbf {T}_1$ will become
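Even before writing out the perturbed matrix, the sensitivity of the system to this axial shift $\Delta d$ can be estimated from the Gaussian equation alone. The sketch below uses the 5 mm lenslet focal length of the simulations in Section 4 and illustrative DP distances:

```python
# How a small axial placement error of the DP shifts the CDP
# (depth-priority form: DP inside the focal length, virtual image).
f = 5.0   # lenslet focal length (mm)

def cdp(d):
    """CDP position for a DP at distance d: 1/z' = 1/f - 1/d (negative)."""
    return 1.0 / (1.0 / f - 1.0 / d)

z_a = cdp(4.86)          # about -173.6 mm
z_b = cdp(4.88)          # about -203.3 mm
print(round(z_a, 1), round(z_b, 1), round(z_b - z_a, 1))
```

A 20 µm change in $d$ moves the CDP by roughly 30 mm at this depth, which is why the z-axis translation dominates the assembly-error budget.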
4. Simulation verification
In the calculation of some of the system parameters, such as resolution and RFP position, by using the formulas derived above, only the chief rays of each field of view of each imaging unit are used, so the calculated results are not affected by aberration. However, for parameters such as refocusing definition, which require the consideration of full-aperture rays, there is a significant difference between an actual system with aberration and the ideal model. In this case, the calculation results based on the ideal model can only provide a reference for the design of the actual system. Two experiments were conducted to compare the display effect between the ideal system and the actual system.
As it is difficult in practice to precisely control every optical element at the sub-mm level, we conducted simulations to illustrate the viewing effect of the InI-based LF display. The simulation was conducted using LightTools [32], as shown in Fig. 9. The SLM used here is a lenslet array with a unit pitch of 1 mm. Each lenslet is a plano-convex lens with a focal distance of 5 mm, and the lenslets are combined into one single element. The material of the element is E48R, with a refractive index of 1.53. The simulated microdisplay is a square Lambertian planar surface light source with a diagonal size of 1 inch. It emits light with a wavelength of 555 nm into space, and its divergence angle is set to cover the range of the eyebox. An LF image with a resolution of $3840\times 3840$ (i.e., a pixel density of 5430 PPI) and the same size as the light source is used as a mask for the light source to simulate the content displayed on the display screen. The mask is located at the DP position, and the LF image it displays is rendered in Unity using the method described in Section 2. To simulate the viewing effect under different CDPs, the distance between the mask and the SLM principal plane varies from 4.86 mm to 5.00 mm, and the corresponding CDP position moves from -173 mm to -$\infty$. In order to simulate the visual effect of human eyes, an ideal thin lens with adjustable focal length is used as the lens of a human eye, and its aperture is set to 8 mm according to the general size of the pupil of a human eye. Behind the lens is a planar receiver used for simulating the retina of the human eye. The distance between the receiver and the lens is fixed at 35 mm. By adjusting the focal length of the ideal lens based on the Gaussian equation, it forms a conjugate relation with different RFPs in the system. By ray tracing, we can obtain retinal images on these planar receivers to represent the visual effects of the human eye.
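The eye-lens focusing step can be sketched as follows. The 35 mm receiver distance follows the setup above, while the eye-relief value `z_eye` is an assumed placeholder (the exact eye position is not specified here):

```python
# Ideal-thin-lens focal length that conjugates the 35 mm "retina"
# distance with a given RFP, via the Gaussian equation.
retina = 35.0     # lens-to-receiver distance (mm)
z_eye  = 15.0     # assumed distance from SLM principal plane to eye (mm)

def eye_focal(z_rfp):
    """Focal length that focuses the receiver on the RFP (z_rfp < 0)."""
    obj = z_eye - z_rfp                       # eye-to-RFP distance
    return 1.0 / (1.0 / retina + 1.0 / obj)   # 1/f = 1/s' + 1/s

f_near = eye_focal(-173.0)    # nearest RFP in the simulation
f_far  = eye_focal(-1245.0)   # a distant RFP
print(round(f_near, 2), round(f_far, 2))
```

Sweeping the RFP from 173 mm to infinity only requires the eye lens focal length to move over a few millimeters, mirroring the accommodation range of a real eye.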
In the first simulation, an object of sufficiently small size was placed at different distances and rendered on LF images separately, so that the object in each elemental image occupies only one single effective pixel, that is, the size of the object is the smallest detail that could be presented by the LF display system. In order to reflect the situation mentioned above that perfect refocusing cannot be achieved due to the discrete coordinates on the DP and the SLM, we enabled only one row of the elemental images to ensure that refocusing only occurs in the horizontal direction. The refocusing image on the retina can be regarded as a collection of retinal images of different elemental images. Therefore, the size of the retinal image in the vertical and horizontal directions reflects the size of the light spot formed by a single pixel and that formed by the refocusing of multiple pixels, respectively. Figure 10 shows the retinal images under different CDP and RFP conditions. The retinal image has the minimum size when the CDP and the RFP coincide, and when they do not coincide, the retinal image will deviate from the original shape of the pixel and gradually show the shape of the pupil of the lenslet unit. However, as mentioned above, even in the case that the CDP and the RFP coincide, the limited size of the pixels as well as the rendering errors can lead to a deviation between the RFP and the ideal refocusing plane. This makes the observed refocused image points larger and less sharp than those formed by individual pixels.
Table 2 illustrates the comparison between the calculated and simulated values of the angular pixel density of the retinal spots. The simulated values include the single-pixel angular density and the refocused-pixel angular density, which are calculated based on the sizes of the corresponding retinal image simulation results in the vertical and horizontal directions, respectively. Due to the blurry boundary of the spots and the influence of aberration, the shape of the spots is not regular. Therefore, we define the boundary of the retinal spots as where the brightness drops to 10% of the maximum. It can be seen that the simulated clarity values are close to the calculated values, with a deviation between them due to the influence of aberration. Some of the simulated PPD values are even higher than the corresponding calculated values. This is because aberrations, especially spherical aberration, may extend the depth of field of a single lenslet. Meanwhile, the angular pixel density of a single-pixel spot is about 1 to 3 times that of a refocused-pixel spot. This phenomenon is caused by the finite number of ideal RFP positions, as mentioned in Section 3.2, and by the rendering error of a single virtual object point. This was also discussed in Qin et al.'s study [21], and can be alleviated by rendering the virtual scene at the sub-pixel level [33].
The first simulation showed that the angular pixel density is reduced after refocusing; nevertheless, virtual objects at the RFPs still appear sharpest when observed. We illustrate this with a second simulation, in which the LF image contains seven patterns: six digit patterns and one circle pattern. To add detail for clearer recognition, periodic black horizontal stripes were superimposed on each pattern. The objects are evenly spaced in reciprocal distance (i.e., in diopters), at 173 mm, 203 mm, 245 mm, 307 mm, 412 mm, 620 mm, and 1245 mm from the principal plane of the SLM. The CDP is swept across these patterns by adjusting the focal length of the ideal lens. A bias pattern is also placed 10 m in front of the principal plane of the SLM as the background, representing objects at approximately infinite distance; its contrast is set lower so that the patterns can be distinguished from the background. According to their positions, the patterns were scaled so that they appear the same size on the retina, with the same texture detail, as shown in Fig. 9(b). The retinal images for different RFPs and CDPs are shown in Fig. 11. The elemental images reveal that when the eye focuses on an RFP, the virtual object on it appears sharper than objects off the RFP, while the objects off the RFP appear blurry. This resembles natural human vision and verifies that the system provides correct focus cues. Furthermore, when the CDP and RFP coincide, the object on the RFP is the sharpest, despite the approximations introduced in the structure and rendering. Nevertheless, even the sharpest images of objects on the CDP are not as sharp as expected, owing to the misalignment of the accumulated retinal images of the pixels.
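The even diopter spacing of the depth planes listed above can be verified directly from the quoted distances; this short check is illustrative only and assumes nothing beyond those numbers.

```python
# Depth planes from the second simulation (mm from the SLM principal plane)
depths_mm = [173, 203, 245, 307, 412, 620, 1245]

# Convert to diopters (1 / distance in meters)
diopters = [1000.0 / d for d in depths_mm]

# Spacing between adjacent planes in diopter space:
# the steps are all roughly 0.81-0.85 D, i.e., approximately uniform
steps = [a - b for a, b in zip(diopters, diopters[1:])]
```

The near-constant diopter step matches the statement that the objects are evenly arranged in reciprocal distance rather than in metric distance.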
Although, according to Section 3.2, errors generated in the rendering process cause slight changes in the focus cues, at a macroscopic level the focus cues reflected in the simulation results are consistent with the actual visual effect.
5. Discussion
The calculations in this paper are based on rays in the meridional plane. However, owing to the rotational symmetry of the system, they can be extended to rays in 3-D space without loss of generality. Specifically, to expand the calculation to 3-D space, the matrices of a ray, a lens, and a free-space translation should be
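As a sketch of that extension (the exact ordering of components depends on the paper's 2-D convention, so the form below is an assumption based on standard paraxial matrix optics), a meridional ray $(y,\theta)$ generalizes to a four-component vector, and the thin-lens and translation matrices become $4\times 4$ block forms:

$$
\mathbf{r}=\begin{pmatrix} x \\ y \\ \theta_x \\ \theta_y \end{pmatrix},\qquad
\mathbf{L}(f)=\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ -1/f & 0 & 1 & 0 \\ 0 & -1/f & 0 & 1 \end{pmatrix},\qquad
\mathbf{T}(d)=\begin{pmatrix} 1 & 0 & d & 0 \\ 0 & 1 & 0 & d \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.
$$

Because $\mathbf{L}$ and $\mathbf{T}$ act on the $(x,\theta_x)$ and $(y,\theta_y)$ pairs independently and identically, the meridional-plane results carry over to 3-D unchanged.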
Note that the equations in this paper are derived for an ideal, aberration-free lens model, whereas the aberrations in an actual system will cause deviations between the actual effect and the theoretical calculation. For example, field curvature and spherical aberration may not only cause blur but also shift or curve the CDP axially, while distortion can shift the RFP axially. Nevertheless, a paraxial optical system is the idealized limit of a practical optical system, and the trends revealed in its analysis reflect the fundamental characteristics of a practical system. Therefore, the conclusions derived in this paper can provide a theoretical reference for the design of a practical LF display system.
6. Conclusion
We described a parameterized model of the structure, imaging process, and rendering and refocusing process of an InI-based LF-NED and analyzed its performance by introducing matrix optics. The calculation results can be expressed as a combination of independent monomials related to the pixel position, the lenslet position, and the position of the ray on each lenslet. We further described the calculation of the 2-D resolution and its influencing factors. Based on the rendering process and the physical parameters of the display, we provided the distribution of the RFPs that satisfy the perfect refocus condition. The definition at the perfect RFPs can also be derived by calculating the rays at the edge of the spot formed by each virtual pixel. To quantify the effect of assembly error on refocusing, we introduced an error term and calculated the refocusing result, which showed that the only effect of assembly error is a linear scaling of the refocus depth. Two simulations were conducted to verify the theoretical calculations; they showed that the refocusing results of an ideal InI-based LF-NED are consistent with the theory. Although the calculations in this paper were performed in the meridional plane, the representation and calculation can be easily extended to 3-D space. Our study provides a parametric calculation tool for designing the key parameters and evaluating the primary display effects of an InI-based LF-NED system. In future work, aberrations will be incorporated into the model to predict and evaluate the performance of actual InI-based LF-NEDs more accurately.
Funding
Key Technologies Research and Development Program (2017YFA0701200); National Natural Science Foundation of China (61822502).
Acknowledgments
We would like to thank Synopsys for providing the educational license of CODE V and LightTools.
Disclosures
The authors declare no conflicts of interest.
References
1. D. Cheng, Y. Wang, H. Hua, and M. M. Talha, “Design of an optical see-through head-mounted display with a low f-number and large field of view using a freeform prism,” Appl. Opt. 48(14), 2655–2668 (2009). [CrossRef]
2. Y. Amitai, “Extremely compact high-performance HMDs based on substrate-guided optical element,” SID Symp. Dig. Tech. Pap. 35(1), 310–313 (2004). [CrossRef]
3. J. Han, J. Liu, X. Yao, and Y. Wang, “Portable waveguide display system with a large field of view by integrating freeform elements and volume holograms,” Opt. Express 23(3), 3534–3549 (2015). [CrossRef]
4. L. Marran and C. Schor, “Multiaccommodative stimuli in vr systems: problems & solutions,” Hum. Factors 39(3), 382–388 (1997). [CrossRef]
5. H. Hua, “Enabling focus cues in head-mounted displays,” Proc. IEEE 105(5), 805–824 (2017). [CrossRef]
6. D. Lanman and D. Luebke, “Near-eye light field displays,” ACM Trans. Graph. 32(6), 1–10 (2013). [CrossRef]
7. H. Hua and B. Javidi, “A 3d integral imaging optical see-through head-mounted display,” Opt. Express 22(11), 13484–13491 (2014). [CrossRef]
8. C. Yao, D. Cheng, T. Yang, and Y. Wang, “Design of an optical see-through light-field near-eye display using a discrete lenslet array,” Opt. Express 26(14), 18292–18301 (2018). [CrossRef]
9. K. Aksit, J. Kautz, and D. Luebke, “Slim near-eye display using pinhole aperture arrays,” Appl. Opt. 54(11), 3422 (2015). [CrossRef]
10. W. Song, Y. Wang, D. Cheng, and Y. Liu, “Light field head-mounted display with correct focus cue using micro structure array,” Chin. Opt. Lett. 12, 060010 (2014). [CrossRef]
11. M. Levoy and P. Hanrahan, “Light field rendering,” in Proceedings of the 23rd annual conference on Computer graphics and interactive techniques - SIGGRAPH ’96, (1996), pp. 31–42.
13. https://www.jb-display.com.
14. A. Isaksen, L. McMillan, and S. J. Gortler, “Dynamically reparameterized light fields,” in Proceedings of the 27th annual conference on Computer graphics and interactive techniques - SIGGRAPH ’00, (2000), pp. 297–306.
15. C.-K. Liang, Y.-C. Shih, and H. H. Chen, “Light field analysis for modeling image formation,” IEEE Trans. on Image Process. 20(2), 446–460 (2011). [CrossRef]
16. R. Matsubara, Z. Y. Alpaslan, and H. S. El-Ghoroury, “Light field display simulation for light field quality assessment,” in Proceedings of SPIE - Stereoscopic Displays and Applications XXVI, vol. 9391 (2015), p. 93910G.
17. D. G. Dansereau, O. Pizarro, and S. B. Williams, “Decoding, calibration and rectification for lenselet-based plenoptic cameras,” in 2013 IEEE Conference on Computer Vision and Pattern Recognition, (2013), pp. 1027–1034.
18. T. Iwane, “Light field display and 3D image reconstruction,” in Three-Dimensional Imaging, Visualization, and Display 2016, vol. 9867 (2016), p. 98670S.
19. X. Jin, L. Liu, and Q. Dai, “Approximation and blind reconstruction of volumetric light field,” Opt. Express 26(13), 16836–16852 (2018). [CrossRef]
20. H. Huang and H. Hua, “Systematic characterization and optimization of 3D light field displays,” Opt. Express 25(16), 18508–18525 (2017). [CrossRef]
21. Z. Qin, P. Y. Chou, J. Y. Wu, Y. T. Chen, C. T. Huang, N. Balram, and Y. P. Huang, “Image formation modeling and analysis of near-eye light field displays,” J. Soc. Inf. Disp. 27(4), 238–250 (2019). [CrossRef]
22. Y. Yuan, X. Wang, Y. Yang, H. Yuan, C. Zhang, and Z. Zhao, “Full-chain modeling and performance analysis of integral imaging three-dimensional display system,” J. Eur. Opt. Soc.-Rapid Publ. 16(1), 12 (2020). [CrossRef]
23. C.-G. Luo, Q.-H. Wang, H. Deng, X.-X. Gong, L. Li, and F.-N. Wang, “Depth calculation method of integral imaging based on gaussian beam distribution model,” J. Disp. Technol. 8(2), 112–116 (2012). [CrossRef]
24. Z. Zhao, J. Liu, L. Xu, Z. Zhang, and N. Zhao, “Wave-optics and spatial frequency analyses of integral imaging three-dimensional display systems,” J. Opt. Soc. Am. A 37(10), 1603–1613 (2020). [CrossRef]
25. M. Cho, M. Daneshpanah, I. Moon, and B. Javidi, “Three-dimensional optical sensing and visualization using integral imaging,” in Proceedings of the IEEE, vol. 99 (2011), pp. 556–575.
26. S.-H. Hong, J.-S. Jang, and B. Javidi, “Three-dimensional volumetric object reconstruction using computational integral imaging,” Opt. Express 12(3), 483–491 (2004). [CrossRef]
27. A. Schwarz, J. Wang, A. Shemer, Z. Zalevsky, and B. Javidi, “Lensless three-dimensional integral imaging using variable and time multiplexed pinhole array,” Opt. Lett. 40(8), 1814–1817 (2015). [CrossRef]
28. X. Shen and B. Javidi, “Large depth of focus dynamic micro integral imaging for optical see-through augmented reality display using a focus-tunable lens,” Appl. Opt. 57(7), B184–B189 (2018). [CrossRef]
29. M. Martínez-Corral and B. Javidi, “Fundamentals of 3D imaging and displays: a tutorial on integral imaging, light-field, and plenoptic systems,” Adv. Opt. Photonics 10(3), 512–566 (2018). [CrossRef]
30. H. Hoshino, F. Okano, H. Isono, and I. Yuyama, “Analysis of resolution limitation of integral photography,” J. Opt. Soc. Am. A 15(8), 2059–2065 (1998). [CrossRef]
31. H. Huang and H. Hua, “Effects of ray position sampling on the visual responses of 3D light field displays,” Opt. Express 27(7), 9343–9360 (2019). [CrossRef]
32. https://www.synopsys.com/optical-solutions/lighttools.html.
33. Z. Qin, P.-Y. Chou, J.-Y. Wu, C.-T. Huang, and Y.-P. Huang, “Resolution-enhanced light field displays by recombining subpixels across elemental images,” Opt. Lett. 44(10), 2438–2441 (2019). [CrossRef]