Implementation of the real–virtual 3D scene-fused full-parallax holographic stereogram


Abstract

This work focuses on the generation of three-dimensional (3D)-scene information as well as the fusion of real and virtual 3D scene information for the full-parallax holographic stereogram based on the effective perspective images’ segmentation and mosaicking (EPISM) method. The improved depth-image-based rendering (DIBR) method was used to generate the virtual viewpoint images of the real 3D scene, and the regularization and densification processing models of the degraded light field were established; as a result, the real sampling-light field was reconstructed. Combined with the computer-rendered virtual 3D scene information, a “real + virtual” light-field fusion method based on a pixel-affine-projection was proposed to realize the fusion of the real and virtual 3D scene. The fusion information was then processed by the EPISM encoding and was then holographically printed. The optical experiment results showed that the full-parallax holographic stereogram with the real–virtual scene-fused 3D scenes could be correctly printed and reconstructed, which validated the effectiveness of our proposed method.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

The holographic stereogram (HS) combines traditional optical holography with the binocular-parallax principle: a series of two-dimensional perspective images carrying the parallax information of a real or virtual 3D scene is recorded into holographic elements (hogels) according to the principle of interference, and the 3D scene is then reconstructed according to the principle of diffraction, so that the human eyes perceive a realistic stereoscopic display effect. Limited by the human visual system, HSs approximate the complete light-field (LF) by using discrete LF sampling information, which reduces the production cost and complexity and improves the flexibility of holographic recording. Therefore, the HS has potential applications in many fields, such as the military, medical, and artistic fields [1–4].

In 1967, R. V. Pole applied a fly-eye lens array to record different perspective images of a small 3D scene and used them directly for holographic recording, completing the earliest HS experiment; however, only the virtual stereo images of the real 3D scene could be reconstructed [5]. In 1969, DeBitetto used cameras placed at equal intervals in the horizontal direction to record the horizontal-parallax perspective-image sequence of a real 3D scene, and printed a high-resolution horizontal-parallax-only (HPO) HS. However, because only the horizontal parallax information was recorded, human eyes needed to stay close to the HS to perceive the correct stereoscopic display [6]. Subsequently, M. C. King et al. used a computer and an automatic plotter to generate a sequence of perspective images on the arc surface surrounding a virtual 3D scene in 1970. The HPO HS that recorded the virtual scene information was then printed out through the two-step transfer-exposure method, and the reconstructed stereo scene could be displayed protruding from the holographic plate [7]. In 1991, Halle et al. [8] used an infinity camera to capture real or virtual 3D-scene information, and then pre-distorted the captured perspective images. After a single-step printing, the HPO HS was obtained, and the observer could perceive the undistorted stereoscopic display in an arbitrary plane. In 1992, M. Yamaguchi et al. applied digital image processing technology and ray-tracing principles to calculate a virtual 3D scene’s perspective-image sequence to be exposed, and printed the full-parallax HS of a virtual 3D scene; however, the calculation process was complicated, and the occlusion relationship between object points needed to be considered [9]. The direct-writing digital-holographic (DWDH) method, which can be traced back to as early as 2002, was proposed by H. Bjelkhagen et al. [2,10]; its essential idea was to establish the pixel-mapping relationship between the SLM plane and the film-projection plane by simulating the two-step transfer process in the computer, and then calculate the perspective images to be exposed to realize single-step printing. Recently, our group proposed a single-step full-parallax HS printing method named the effective perspective images’ segmentation and mosaicking (EPISM) method [11], which could obtain a reconstructed scene protruding from the holographic plate. By simulating the cone-shaped radial observation effect of the human eyes, this method divided and reorganized the sampled images to obtain the effective-perspective images that needed to be exposed, which effectively reduced the overall printing time and the number of sampled images while obtaining a higher reconstruction resolution.

The 3D-scene information required for fabricating a full-parallax HS is usually described in terms of the LF formalism. For virtual 3D scenes, an ideal virtual camera is usually used to render the 3D scene directly to get a dense and regularly sampled LF, which can be used for printing after computer encoding [10–12]. In order to solve the pseudoscopic stereoscopic display that arises when the sampled images are directly used for printing, Halle et al. designed a virtual pseudoscopic camera to render the 3D scene and applied it to print a HS, which could reconstruct the real stereoscopic image [13]. For real 3D scenes, the scene information needs to be collected by optical devices, and the acquisition mode can be divided into camera-array acquisition [14] (sampling the LF with high resolution, but with a large system volume and high cost), timing acquisition [15] (low hardware cost but high time cost), and spatial-multiplexing acquisition [16,17] (in which the spatial and angular resolutions of the sampled LF constrain each other). However, due to internal or external factors in the acquisition system, the original scene information cannot meet the precise requirements of the HS in terms of the number and interval of the sampled perspective images, so it is necessary to reconstruct the original sampled LF. View-synthesis technology is an effective method to reconstruct dense LFs. In 2007, Katz et al. used geometric-image interpolation to synthesize all the perspective images required by a Fourier hologram for the first time, with the limitation that only the middle viewpoint between two known viewpoints could be synthesized [18]. Among the many view-synthesis technologies, DIBR technology can draw the virtual viewpoint image at any position according to a reference viewpoint image and the corresponding depth information, and it has the advantages of fast rendering speed, strong realism, and low complexity. In 2020, Fachada et al. applied DIBR technology to synthesize 768 virtual sampled-viewpoint images from four RGBD images and printed out a high-resolution HPO HS [19]. Some scholars have applied compressive-sensing technology to reconstruct the LF and achieved good holographic reconstructions [20–22]. In 2011, Rivenson et al. used compressive-sensing theory to generate multiple-view projections, which greatly reduced the number of actual projections [23]. In 2016, Erdem et al. used the sparse representation of the LF in the shearlet domain to reconstruct a dense LF from a highly under-sampled LF and obtained satisfactory holographic reconstructions [24]. However, compressive-sensing technology is extraordinarily time-consuming when reconstructing complex LFs. In 2021, Liang et al. used a deep neural network to generate a realistic-color 3D hologram from a single RGBD image in real time, but the performance of this method depended on the large-scale dataset used for training, so its robustness was not strong enough [25].

Real and virtual 3D scenes can be recorded into the same HS, which reconstructs a stereoscopic-perception effect similar to augmented reality, so that the human eyes can directly observe the fused real–virtual 3D scene. However, during the implementation, it is necessary to consider the inconsistency of the sampling parameters as well as the relative spatial and occlusion relationships between the different 3D scenes. To the best of our knowledge, no relevant research has been reported. The main work of this paper is to apply DIBR technology to reconstruct the degraded sampled LF of a real 3D scene and to realize the effective fusion of real and virtual 3D-scene information. Finally, we fabricated a full-parallax HS recording a real–virtual fused 3D scene in combination with the EPISM method.

2. Analysis on reconstruction quality of EPISM

2.1 Brief idea of EPISM

The EPISM method is based upon simulating the conventional two-step transfer-exposure process with a computer, and is in accordance with the principle of ray tracing and the cone-shaped radiation-observation effect of the human eye’s line-of-sight. This method is used to segment and mosaic the segments of sampled-perspective images within a specific field-of-view (FOV) range, after which the effectively synthetic perspective images are exposed into hogels. Thus, a one-step HS printing process can be realized. At the same time, due to the high resolution of the effectively synthetic perspective images, the HS has a good holographic-reconstruction effect.

Figure 1 shows the schematic diagram of EPISM method in a one-dimensional case for the convenience of description. First, according to the requirements of the EPISM method for parameters such as sampling interval, sampling number and sampling distance, the LF acquisition system is applied to sample the LF of a target 3D scene, and the original perspective-image sequence including the full-parallax information of the 3D scene is obtained, as shown in Fig. 1 (a). Second, the effective image segments of the sampled-perspective images captured by the cameras are segmented and mosaicked to obtain the effectively synthetic perspective images, as shown in Fig. 1 (b). The focus plane of the camera to the target 3D scene is set as the reference plane, and the position is the same as the LCD screen in the printing system. Each hogel in the hologram corresponds to an effectively synthetic perspective image, and the position of the holographic plane is set according to the geometric relationship between the reconstructed scene and the hologram. The virtual hologram consists of several virtual hogels, and each virtual hogel corresponds to a sampled-perspective image. The sampling plane coincides with the virtual hologram plane, and the size of the virtual hogel is the same as the sampling interval. In order to match the size of hogel, the sampling interval in the sampling plane is scaled as a whole. $\overline {{M_i}{N_i}}$ represents the sampled-perspective image taken by the camera ${C_i}$, and $\overline {{O_1}{O_2}} $ represents the effectively synthetic perspective image corresponding to hogel’. According to the principle of ray tracing and the cone-shaped observation effect of the human eye’s line-of-sight, only a limited image segment $\overline {{Q_1}{Q_2}} $ in $\overline {{M_i}{N_i}}$ can be observed by the center point O of hogel’, which is defined as the part intercepted by the connecting lines between $C_i^u$ and $C_i^d$ on both sides of camera ${C_i}$ and point O. In the 2D case, it is a square-segment region. Similarly, the sampled-perspective image corresponding to the sampling viewpoint ${C_{i - 1}}$ adjacent to ${C_i}$ is $\overline {{M_{i - 1}}{N_{i - 1}}}$, and only the effective-perspective image segment $\overline {{P_1}{P_2}} $ in $\overline {{M_{i - 1}}{N_{i - 1}}}$ can be recorded to hogel’. By analogy, the effective-perspective image segments of all the sampled-perspective images within the FOV $\theta $ of point O are segmented and mosaicked to obtain the effectively synthetic perspective image $\overline {{O_1}{O_2}} $ corresponding to hogel’. Finally, $\overline {{O_1}{O_2}} $ is exposed and recorded into hogel’, and the rest of the hogels on the holographic plate are exposed in turn according to the above method until the whole HS is printed. The detailed principles as well as their implementation were introduced in our previously published work [11].
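To make the segmentation-and-mosaicking step concrete, the following Python sketch implements the one-dimensional geometry described above under simplifying assumptions (regular sampling, camera spacing equal to the hogel size, the contributing cameras already restricted to the FOV of the hogel, and each camera imaging a width W of the reference plane onto N pixels). The function and variable names are illustrative and are not taken from the authors' implementation.

```python
import numpy as np

# 1D sketch of EPISM segmentation and mosaicking under simplifying assumptions:
# regular sampling, camera spacing d equal to the hogel size, cameras already
# restricted to the FOV of the hogel, and each camera imaging a width W of the
# reference plane onto N pixels. Names are illustrative, not the authors' code.

def epism_synthetic_view_1d(images, cam_x, x_o, L1, L2, W, N):
    """Mosaic the effective segments of the sampled images seen from hogel centre x_o."""
    scale = L2 / (L1 + L2)              # camera-plane offsets projected onto the reference plane
    d = cam_x[1] - cam_x[0]             # sampling interval (assumed regular)
    segments = []
    for img, xc in zip(images, cam_x):
        # reference-plane interval cut out by the lines from x_o through the camera edges
        left = x_o + (xc - d / 2 - x_o) * scale
        right = x_o + (xc + d / 2 - x_o) * scale
        # convert that interval to pixel indices of this camera's image of the reference plane
        u0 = int(round((left - (xc - W / 2)) / W * N))
        u1 = int(round((right - (xc - W / 2)) / W * N))
        u0, u1 = max(u0, 0), min(u1, N)
        if u1 > u0:
            segments.append(img[u0:u1])
    # adjacent segments abut on the reference plane, so concatenation mosaics them
    return np.concatenate(segments) if segments else np.zeros(0)

# toy usage: five cameras at 4 mm spacing, 1000-pixel image strips
cam_x = np.arange(-2, 3) * 4.0     # mm
images = [np.full(1000, i, dtype=float) for i in range(5)]
synthetic = epism_synthetic_view_1d(images, cam_x, x_o=0.0, L1=138.0, L2=138.0, W=100.0, N=1000)
```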

Fig. 1. The basic principles of EPISM method. (a) sampling of the original perspective images, (b) segmentation and mosaicking of the effective perspective image segments.

2.2 Effect of LF sampling on the reconstruction quality of EPISM

According to the 4D LF theory $L({s,t,u,v} )$ [26], the pupil plane of the camera lens can be regarded as the st plane, which records the angular-dimension distribution of the LF; here $({s,t} )$ indicates that $s \times t$ perspective images are captured. The plane of the camera sensor is regarded as the uv plane, which records the spatial-dimension distribution of the LF and indicates that the resolution of each perspective image is $u \times v$. Therefore, the whole 4D LF can be defined by a 2D sampled perspective-image sequence. The EPISM HS reconstructs a stereoscopic 3D scene from a series of discrete 2D effectively synthetic perspective images, and the quality of the sampled LF directly determines the reconstruction quality. If the sampled-perspective images are directly used for holographic printing, the reconstructed LF will exhibit a pseudoscopic artifact in which the depth information is inverted, and the huge number of sampled images will bring a high time cost. The EPISM method serves as a LF-encoding algorithm whose core idea is to generate depth-inverted, high-resolution pseudo-perspective images from fewer sampled-perspective images, so that the HS can reconstruct an orthoscopic 3D scene with the correct depth relationship and thus restore the original LF information both truly and accurately. In the sampled-perspective images, each pixel corresponds to a ray in the LF, so we can analyze the effect of degradation in the LF sampling on the EPISM LF-encoding algorithm by studying the propagation of light.
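As a minimal illustration of this 4D parameterization, the sampled LF can be stored as a single array whose first two axes index the viewpoints and whose last axes index the pixels of each view; the sizes below are toy values, not the experimental ones.

```python
import numpy as np

# Toy container for the sampled 4D light field L(s,t,u,v): axes (s,t) index the
# sampled viewpoints (angular dimension), axes (u,v) index the pixels of each
# perspective image (spatial dimension). Sizes are toy values.
s, t, u, v = 7, 7, 256, 256
L = np.zeros((s, t, u, v, 3), dtype=np.uint8)    # RGB perspective images

central_view = L[3, 3]         # one sampled perspective image, shape (u, v, 3)
epipolar_slice = L[:, 3, 128]  # one image row tracked across all s horizontal viewpoints
```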

First, in the process of LF acquisition, the influence of irregular-sampling problems, such as the deviation of the actual-sampling viewpoint from the ideal-viewpoint position or the absence of the ideal-viewpoint image, on the reconstruction effect of the EPISM method is studied. Figure 2 shows the error-analysis diagram of the EPISM method for irregular sampling in the 1D case. According to the principles of the EPISM method, among the sampled-perspective images captured by sampling cameras ${C_{i - 1}}$, ${C_i}$ and ${C_{i + 1}}$, the effective-perspective image segments that can be exposed in hogel’ are the parts intercepted by connecting lines between point O and each camera’s boundary. The ideal sampling interval of the viewpoints is d; the actual-sampling camera ${C_i}$ is located at the ideal-viewpoint position, the actual-sampling camera ${C_{i - 1}}$ deviates $\mathrm{\Delta }d$ from the ideal-sampling camera ${C^{\prime}_{i - 1}}$ in the direction close to the camera ${C_i}$, and the actual-sampling camera ${C_{i + 1}}$ deviates $\mathrm{\Delta }d$ from the ideal-sampling camera ${C^{\prime}_{i + 1}}$ in the direction far away from the camera ${C_i}$. It can be seen from Fig. 2 that there are errors in the segmentation and mosaicking process of the effective-perspective image segments corresponding to each actual-sampling camera. That is, the overlapping region $\overline {{Q_1}{P_2}} $ is generated between $\overline {{Q_1}{Q_2}} $ and $\overline {{P_1}{P_2}} $, and a blank-pixel region $\overline {{Q_2}{R_1}} $ is generated between $\overline {{Q_1}{Q_2}} $ and $\overline {{R_1}{R_2}} $. According to the time sequence of the algorithm processing, the effective-perspective image segments corresponding to the actual-sampling cameras ${C_{i - 1}}$, ${C_i}$, and ${C_{i + 1}}$ are $\overline {{P_1}{Q_1}} $, $\overline {{Q_1}{Q_2}} $, and $\overline {{R_1}{R_2}} $, respectively. In order to quantitatively study the reconstruction error caused by irregular sampling, the distances from the reference plane to the camera plane and the hologram plane are represented by ${L_1}$ and ${L_2}$, respectively. Then, the mosaicking error $\mathrm{\Delta }e$ of the effective-perspective image segment is given as:

$$\mathrm{\Delta }e = \frac{{{L_2}}}{{{L_1} + {L_2}}}\mathrm{\Delta }d.$$

It can be concluded that the reconstruction error $\mathrm{\Delta }e$ is positively correlated with the offset $\mathrm{\Delta }d$. The larger the value of $\mathrm{\Delta }e$, the more serious the distortion of the reconstructed LF. When $0 < \mathrm{\Delta }d < d$, the irregular-sampling type can be regarded as the case in which the viewpoint deviates from the ideal viewpoint. When $\mathrm{\Delta }d > d$, the irregular-sampling type can be regarded as the case in which the ideal viewpoint is missing; the solution is introduced in detail in the following section.
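A small numerical check of Eq. (1) is given below; the distances and offsets are assumed toy values rather than the experimental ones, and the snippet also illustrates how the two irregular-sampling cases are distinguished.

```python
# Numerical check of Eq. (1). L1, L2, d and the offsets are assumed toy values.
L1, L2 = 138.0, 138.0      # mm, reference plane to camera plane / hologram plane
d = 4.0                    # mm, ideal sampling interval

def mosaic_error(delta_d):
    return L2 / (L1 + L2) * delta_d      # Eq. (1)

for delta_d in (0.5, 2.0, 6.0):
    case = "viewpoint deviation" if delta_d < d else "missing ideal viewpoint"
    print(f"offset {delta_d} mm -> error {mosaic_error(delta_d):.2f} mm ({case})")
```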

Fig. 2. Analysis of the EPISM reconstruction error caused by irregular sampling.

Second, the influence of sparse LF sampling on EPISM is studied. The acquisition and reconstruction of the LF in an HS are not a complete representation of the real LF; rather, the discrete LF is used to approximate the real and complete LF according to the limited resolution of the human eye. When the observer moves along the angular dimension, if the parallax between the discretely reconstructed effectively synthetic perspective images is greater than the limit of resolution of the human visual system, the human eye will observe a reconstructed scene with discontinuous motion parallax, namely the view-flapping effect, which greatly reduces the 3D-perception effect. Figure 3 shows the analysis diagram of the view-flapping effect. The observer is located at point V, which is ${L_\textrm{v}}$ away from the hologram plane, to observe the reconstructed 3D scene at the reference plane. It is assumed that there are two adjacent hogels, $\textrm{hoge}{\textrm{l}_n}$ and $\textrm{hoge}{\textrm{l}_{n + 1}}$, in the hologram, whose center points are point ${O_n}$ and point ${O_{n + 1}}$, respectively. The distance between those two points is equal to the hogel’s size $\mathrm{\Delta }H$. In the reconstructed 3D scene, there is a point A that deviates from the imaging plane by a distance of ${L_\textrm{A}}$, and the lines connecting point A with point ${O_n}$ and point ${O_{n + 1}}$ intersect the imaging plane at ${A_n}$ and ${A_{n + 1}}$, respectively, with a distance between the two points of $\delta $. When the observer perceives point A in the reconstructed 3D scene, it actually observes the adjacent reconstructed points ${A_n}$ and ${A_{n + 1}}$ on the imaging plane; that is, light $\overline {V{A_n}} $ and light $\overline {V{A_{n + 1}}} $ enter the pupil and are equivalent to light $\overline {VA} $ through the binocular-parallax effect. If the angle between the light rays is ${\theta _\textrm{A}} \le {\theta _{\min }}$, there is no difference between the reconstructed points ${A_n}$ and ${A_{n + 1}}$, meaning that the view-flapping effect can be ignored, where ${\theta _{\min }}$ is the limiting angular resolution of the human eye in a 3D display system, and ${\theta _{\min }} = 0.3\textrm{ mrad}$ [27]. If ${\theta _\textrm{A}} > {\theta _{\min }}$, when the line-of-sight moves from $\overline {V{A_n}} $ to $\overline {V{A_{n + 1}}} $, the observer will obviously perceive the discontinuous disparity between the reconstructed points ${A_n}$ and ${A_{n + 1}}$, resulting in the view-flapping effect. $\delta $ is used to represent the view-flapping distance, which is given as:

$$\delta = \frac{{\mathrm{\Delta }H \times {L_\textrm{A}}}}{{{L_2} + {L_\textrm{A}}}}.$$

${\theta _\textrm{A}}$ is used to denote the angular resolution of the HS. Generally, the hogel’s size $\mathrm{\Delta }H$ is much smaller than the distance ${L_\textrm{v}}$ from the hologram to the observer. Therefore, according to the paraxial approximation, we can obtain

$${\theta _\textrm{A}} \approx \frac{\delta }{{{L_\textrm{v}} - {L_2}}} = \frac{{\mathrm{\Delta }H \times {L_\textrm{A}}}}{{({{L_2} + {L_\textrm{A}}} )({{L_\textrm{v}} - {L_2}} )}}.$$
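The snippet below evaluates Eqs. (2) and (3) for an assumed viewing geometry; only the threshold ${\theta _{\min }} = 0.3\textrm{ mrad}$ is taken from the text, and the remaining numbers are illustrative.

```python
# Numerical check of Eqs. (2)-(3). Only theta_min = 0.3 mrad comes from the text;
# the geometry values are illustrative assumptions.
dH, L2, L_A, L_v = 4e-3, 0.138, 0.02, 0.5     # hogel size and distances, m
theta_min = 0.3e-3                            # limiting angular resolution, rad

delta = dH * L_A / (L2 + L_A)                 # Eq. (2): view-flapping distance
theta_A = delta / (L_v - L2)                  # Eq. (3): angular resolution of the HS
print(f"delta = {delta*1e3:.3f} mm, theta_A = {theta_A*1e3:.3f} mrad, "
      f"flapping perceptible: {theta_A > theta_min}")
```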

Fig. 3. Analysis on the view-flapping effect under sparse sampling.

This paper mainly studies the influence of the arrangement of the sampling viewpoints on the holographic reconstruction effect. The size of the sampling interval d is determined by the hogel’s size $\mathrm{\Delta }H$, and there is a positive correlation between $\mathrm{\Delta }H$ and d [11]. Therefore, for a reconstructed point A in the 3D scene where ${L_2}$, ${L_\textrm{v}}$, and ${L_\textrm{A}}$ are kept unchanged, it can be concluded that the larger the value of d, the larger the value of $\mathrm{\Delta }H$ in the HS. In turn, the angular resolution ${\theta _\textrm{A}}$ becomes larger, and the view-flapping effect becomes more serious.

In order to verify the influence of the LF sampling parameters on the reconstruction effect of the EPISM method, the virtual 3D model police car provided by Microsoft was used for simulation. As shown in Fig. 4, using 3D Studio MAX software, the center depth plane of 3D scene was set as the reference plane, and ${L_1}$, ${L_2}$, and the sampling range were kept unchanged. By changing the sampling parameters of the virtual-camera array to collect the LF of the 3D scene, different full-parallax sampled-perspective image sequences under different sampling parameters were obtained. The corresponding sampling parameters and sampling types of each group are shown in Table 1.

Fig. 4. Configuration of the sampling.

Table 1. The sampling parameters corresponding to each group of effectively synthetic perspective images

Finally, the effectively synthetic perspective images encoded by the EPISM method are shown in Fig. 5(a)–(d). In Fig. 5(b), due to the missing viewpoints on line 65, corresponding blank-pixel segments appeared in the effectively synthetic perspective image. In Fig. 5(c), pixel aliasing occurred in the effectively synthetic perspective image because the viewpoints on line 65 were deviated by 2 mm to the right. In Fig. 5(d), when the sampling range remained unchanged, the sparser sampling led to a larger sampling interval, which increased the difference between adjacent sampled-perspective images. In the process of segmentation and mosaicking, the difference between adjacent effective-perspective image segments became more obvious; according to Eq. (3), the view-flapping effect of the reconstructed LF will be aggravated. In contrast, the effectively synthetic perspective image shown in Fig. 5(a) was encoded from the ideal sampling parameters, and no such problems appeared. However, in the process of real-LF acquisition, the ideal viewpoint-sampling parameters often cannot be met, so it is necessary to use a virtual-viewpoint rendering method to calculate the regular and dense ideal sampling viewpoints.

Fig. 5. The effectively synthetic perspective images encoded by the EPISM method under different sampling parameters: (a) when the sampling viewpoints are dense and regular, (b) when the sampling viewpoints are partly missing, (c) when the sampling viewpoints are partly deviated, and (d) when the sampling viewpoints are sparse.

3. Generation and fusion of the real–virtual LF

The acquisition of virtual 3D-scene information is idealized, whereas the acquisition of real 3D-scene information is limited by the accuracy and effective travel range of the acquisition system, which leads to sparse and irregular characteristics of the actually captured viewpoint images. As a result, the quality of the reconstructed “real + virtual” 3D scene of the HS is reduced. In this work, DIBR technology was used to reconstruct the real 3D-scene information.

3.1 Design of the real-LF sampling system

For the research content of this work, the LF acquisition system was only responsible for capturing sparse and discrete viewpoint images, so we chose the timing-acquisition system to capture the original sampling LF of the real 3D scene.

According to the principles of the EPISM method, the camera plane and the 3D scene should maintain a parallel relationship, and the distance from the camera plane to the center depth of the target 3D scene and the field angle of the sampling camera should satisfy a certain constraint relationship. As shown in Fig. 6, the real-LF sampling system based on the constraint relationship was composed of a sampling camera, a two-dimensional linear-displacement platform, and a control computer. A MER2-502-79U3C CMOS industrial camera was fixed on the KSA300@Zolix-type electronically-controlled two-dimensional linear-displacement platform. The resolution of the sensor was 2448×2048 pixels, and it was equipped with a M1224-MPW2 fixed-focus lens, whose focal length was 12 mm and FOV was 39.8°. The vertical distance between the camera and the real scene was set to 13.8 cm. The two-dimensional linear-displacement platform was driven by the MC600@Zolix motion controller to achieve linear movement in two degrees of freedom, horizontal and vertical. The control computer was connected to the CMOS industrial camera through a USB 3.0 port and was responsible for triggering the camera to capture and store images. Thus, the entire real-LF sampling system could complete the horizontal- and vertical-parallax information collection of the real 3D scene.

Fig. 6. Configuration of the real-LF sampling system.

3.2 Regularization and densification processing of the sampled real-LF

For the convenience of description, Fig. 7 shows only the reconstruction diagram of the sampled real-LF under horizontal parallax. The sampling position corresponding to the actual-sampled viewpoint camera ${C_i}(i = 1,2,\ldots ,n)$ was ${x_i}$, where n was the number of actual-sampled viewpoints, the perspective image captured by the camera at each actual-sampled viewpoint was ${I_i}$, the distance between two adjacent actual-sampled viewpoints was defined as ${l_i}(i = 1,2,\ldots ,n - 1)$, and ${L_{\textrm{ini}}}$ was used to represent the initial-sampled LF defined by the actual-sampled viewpoint image sequence. In order to obtain an ideal reconstructed 3D scene without distortion, the sampled viewpoints should meet the requirements of the EPISM method, and such sampled viewpoints are called ideal-sampled viewpoints. The sampling position corresponding to the ideal-sampled viewpoint camera ${C^{\prime}_i}(i = 1,2,\ldots ,n)$ was ${x^{\prime}_i}$, the captured ideal-sampled viewpoint image was ${I^{\prime}_i}$, the distance between every two adjacent ideal-sampled viewpoints was a constant $l^{\prime}$, and ${L_{\textrm{ide}}}$ was used to represent the ideal-sampled LF defined by the ideal-sampled viewpoint image sequence. Due to the irregularity of the viewpoints in the initial-sampled LF, ${x_i} = {x^{\prime}_i}$ and ${l_i} \equiv l^{\prime}$ could not be guaranteed. Under this condition, if the initial-sampled LF ${L_{\textrm{ini}}}$ was directly processed by the EPISM method, it would cause distortion of the reconstructed LF. Therefore, in this paper, the initial-sampled LF ${L_{\textrm{ini}}}$ was converted into the ideal-sampled LF ${L_{\textrm{ide}}}$ by our improved DIBR method to complete the regularization of the sampled real-LF.

Fig. 7. Reconstruction of the sampled real-LF.

Due to the time and space cost of real-LF acquisition and the complexity of the system, the captured viewpoint-image sequence was usually sparse and subsampled. If ${I_i}$ was directly processed by the EPISM method, the angular resolution of the HS would be low, and a serious view-flapping effect would be produced, which would greatly reduce the LF reconstruction effect. Therefore, the improved DIBR method was needed to generate the virtual-sampled LF ${L_{\textrm{vir}}}$ based on the regularized ideal-sampled LF ${L_{\textrm{ide}}}$ to further complete the densification processing of the sampled real-LF. The sampling position corresponding to the virtual-sampled viewpoint camera ${C^{\prime\prime}_j}({j = 1,2,\ldots ,m} )$ was ${x^{\prime\prime}_j}$, where m was the number of rendered virtual-sampled viewpoint images ${I^{\prime\prime}_j}$. ${L_{\textrm{fin}}}$ was used to represent the reconstructed sampled LF composed of the ideal-sampled viewpoint image sequence and the virtual-sampled viewpoint image sequence, which can be expressed as

$${L_{\textrm{fin}}} = \{{{{I^{\prime}}_1},{{I^{\prime}}_2},\ldots ,{{I^{\prime}}_n}} \}\cup \{{{{I^{\prime\prime}}_1},{{I^{\prime\prime}}_2},\ldots ,{{I^{\prime\prime}}_m}} \}.$$

The core idea of DIBR technology is 3D-warping [28], which establishes the corresponding geometric relationship between the pixel point in the 2D image and the 3D object point. The first step is to project the pixel points in the reference viewpoint image into 3D Euclidean space according to the depth information as well as internal and external parameters of the sampling camera to form a 3D point cloud composed of spatial-object points. In the second step, according to the camera parameters at the virtual viewpoint, the space points are reprojected to the image plane of the virtual viewpoint to complete the whole 3D-warping process. The complete 3D-warping equation can be expressed as

$$\left[ {\begin{array}{c} {{\xi_\textrm{v}}}\\ {{\eta_\textrm{v}}}\\ 1 \end{array}} \right] = Z_\textrm{v}^{ - 1}{{\boldsymbol K}_\textrm{v}}\left[ {{{\boldsymbol R}_\textrm{v}}{\boldsymbol R}_\textrm{r}^{ - 1}({\boldsymbol K}_\textrm{r}^{ - 1}{Z_\textrm{r}}\left[ {\begin{array}{c} {{\xi_\textrm{r}}}\\ {{\eta_\textrm{r}}}\\ 1 \end{array}} \right] - {{\boldsymbol T}_\textrm{r}}) + {{\boldsymbol T}_\textrm{v}}} \right].$$

According to the 3D-warping equation, the DIBR method can render the virtual image at any viewpoint in space from the reference viewpoint according to the camera’s internal and external parameters, which are obtained by camera calibration. We used the PatchMatch stereo algorithm [29] to obtain the depth information; it mainly includes three steps, namely random initialization, iterative propagation, and post-processing, and it has the advantages of high matching accuracy and strong generalization ability.
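A minimal sketch of the 3D-warping step of Eq. (5) is given below. The intrinsic matrix, poses, and depth are made-up toy values; in practice they would come from the camera calibration and the PatchMatch depth estimation mentioned above.

```python
import numpy as np

# Sketch of the 3D-warping of Eq. (5): back-project a pixel with its depth to a 3D
# point, then re-project it into the virtual view. K, R, T would come from camera
# calibration; the values below are toy assumptions.

def warp_pixel(xi_r, eta_r, Z_r, K_r, R_r, T_r, K_v, R_v, T_v):
    p_r = np.array([xi_r, eta_r, 1.0])
    X_cam_r = np.linalg.inv(K_r) @ p_r * Z_r            # pixel -> reference camera frame
    X_world = np.linalg.inv(R_r) @ (X_cam_r - T_r)      # -> world frame
    X_cam_v = R_v @ X_world + T_v                       # -> virtual camera frame
    p_v = K_v @ X_cam_v
    return p_v[:2] / p_v[2]                             # divide by Z_v, as in Eq. (5)

# toy pinhole camera: 1000x1000 image, 1200 px focal length, 4 mm lateral baseline
K = np.array([[1200.0, 0.0, 500.0], [0.0, 1200.0, 500.0], [0.0, 0.0, 1.0]])
R = np.eye(3)
print(warp_pixel(500, 500, Z_r=0.138, K_r=K, R_r=R, T_r=np.zeros(3),
                 K_v=K, R_v=R, T_v=np.array([0.004, 0.0, 0.0])))
```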

Although the DIBR method has the advantages of simple implementation, fast rendering speed, and wide application range, there are many problems in the traditional DIBR method that seriously affect the quality of the generated image, which mainly include holes, cracks, artifacts, and overlaps [30]. This work proposes an improved DIBR method based on bidirectional viewpoints fusion to solve the problems of artifacts, overlaps, cracks, and holes in the conventional DIBR method one by one. The overall improvement strategy is shown in Fig. 8. The method of morphology expansion was used to solve the problem of the artifacts caused by the sudden change of the edge position of the object in the depth image. The Z-buffer algorithm [31], also known as the depth-buffer algorithm, was used to solve the overlap problem. Aiming at the problem of cracks, this paper treated small cracks as pepper-like noise signals and used a median-filter algorithm that had a significant effect on filtering salt and pepper noise. In addition, considering that the smoothing effect of the median filter on the image will cause the image details to be blurred, the adaptive-median filter algorithm was selected to filter only the crack area. To solve the hole problem, this paper first used bidirectional viewpoints fusion method to repair larger holes, and then used Criminisi algorithm [32] to fill the remaining smaller holes.
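The clean-up stage can be sketched as follows. This is not the authors' implementation: OpenCV's median filter and Telea inpainting stand in for the adaptive median filter and the Criminisi algorithm, the Z-buffer overlap test is assumed to have been applied during warping, and the input is assumed to be an 8-bit BGR warped view with 8-bit single-channel hole mask and depth map.

```python
import cv2
import numpy as np

# Post-warping clean-up sketch with stated substitutions (medianBlur for the
# adaptive median filter, Telea inpainting for Criminisi); Z-buffer handling is
# assumed to have been done during warping.

def clean_warped_view(warped, hole_mask, depth):
    # artifacts: dilate depth edges so foreground/background transitions are
    # treated as unreliable and re-filled together with the holes
    edges = cv2.Canny(depth, 50, 150)
    unreliable = cv2.dilate(edges, np.ones((3, 3), np.uint8), iterations=1)
    fill_mask = cv2.bitwise_or(hole_mask, unreliable)

    # cracks: median-filter the image, but copy the result back only where the
    # holes are thin and isolated (crack-like)
    median = cv2.medianBlur(warped, 3)
    opened = cv2.morphologyEx(hole_mask, cv2.MORPH_OPEN, np.ones((2, 2), np.uint8))
    cracks = cv2.subtract(hole_mask, opened)
    out = np.where(cracks[..., None] > 0, median, warped)

    # remaining holes: inpaint (stand-in for bidirectional fusion + Criminisi)
    remaining = cv2.subtract(fill_mask, cracks)
    return cv2.inpaint(out, remaining, 3, cv2.INPAINT_TELEA)
```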

Fig. 8. The overall strategy block diagram of the improved DIBR method.

3.2.1 Regular processing of the sampled real-LF

In order to complete the regularization of the initial-sampled real-LF and solve the problems of the actual-sampled viewpoints deviating from the ideal-sampled viewpoints and the ideal-sampled viewpoints being missing, the generation model of the ideal-sampled LF is constructed as shown in Fig. 9.

Fig. 9. The generation of the regularized sampled real-LF.

For the ideal-sampled viewpoint image to be drawn, if only one side of the horizontal or vertical direction had an adjacent reference viewpoint image, this kind of ideal-sampled viewpoint image was called an edge-type viewpoint image. If there were two adjacent reference viewpoint images in the horizontal or vertical direction, this kind of viewpoint image was called an inner-type viewpoint image. As shown in Fig. 10, the inner-type ideal-sampled viewpoint image was generated by the improved DIBR method. First, 3D-warping was performed on the reference viewpoints on both sides of the image in the horizontal or vertical direction, and then high-quality inner-type ideal-sampled viewpoint images could be drawn using weighted fusion and image restoration. When drawing the edge-type ideal-sampled viewpoint image, 3D-warping mapping was performed on the adjacent reference viewpoint image in the horizontal or vertical direction, and then image inpainting was performed.

Fig. 10. Different types of ideal-sampled viewpoint image rendering processes.

The pre-processing included morphological expansion and the Z-buffer algorithm, while the post-processing included the adaptive-median filter and Criminisi algorithm processing for eliminating the hole problem. This work mainly focused on the rendering process between the reference image and the rendered image. For ease of description, it will be assumed that the pre-processing and post-processing processes have been implemented in the rendering process and will not be repeated in the discussion.

First, the rendering of inner-type ideal-sampled viewpoint images in the horizontal direction will be discussed. The subscripts “L” and “R” are used to indicate the left and right sides, respectively. ${I_\textrm{L}}({\xi ,\eta } )$ and ${I_\textrm{R}}({\xi ,\eta } )$ represented the actual-sampled viewpoint images on the left and right sides, respectively, $I^{\prime}(\xi ,\eta )$ represented the ideal-sampled viewpoint image, and $(\xi ,\eta )$ represented the position index of the pixel point. ${Z_\textrm{L}}$ and ${Z_\textrm{R}}$ corresponded to the pixel-depth values of ${I_\textrm{L}}({\xi ,\eta } )$ and ${I_\textrm{R}}({\xi ,\eta } )$, respectively. According to Eq. (6), the left actual-sampled viewpoint was warped horizontally to the right to obtain the initial ideal-sampled viewpoint image ${I^{\prime}_\textrm{L}}(\xi ,\eta )$

$${I^{\prime}_\textrm{L}}({\xi ,\eta } )= {Z^{\prime - 1}}{\boldsymbol K^{\prime}}[{{\boldsymbol R^{\prime}}{{\boldsymbol R}_\textrm{L}}^{ - 1}({{{\boldsymbol K}_\textrm{L}}^{ - 1}{Z_\textrm{L}}{I_\textrm{L}}({\xi ,\eta } )- {{\boldsymbol T}_\textrm{L}}} )+ {\boldsymbol T^{\prime}}} ].$$

In the same way, the initial ideal-sampled viewpoint image ${I^{\prime}_\textrm{R}}(\xi ,\eta )$, which was in the same position as ${I^{\prime}_\textrm{L}}(\xi ,\eta )$, was rendered horizontally to the left from the actual-sampled viewpoint image ${I_\textrm{R}}({\xi ,\eta } )$ on the right. $Z^{\prime}$ represented the pixel-depth value of the ideal-sampled viewpoint image $I^{\prime}(\xi ,\eta )$. ${\boldsymbol K^{\prime}}$, ${\boldsymbol R^{\prime}}$, and ${\boldsymbol T^{\prime}}$ represented the internal parameters, rotation parameters, and translation parameters of the ideal-sampling camera, respectively. In the timing acquisition system, the internal parameters at different viewpoints were completely consistent, and the optical axes of the cameras met the condition of being in the same direction, perpendicular to the optical-center baseline and parallel to each other, in order to simplify the coordinate-system transformation. In the model, assuming that the camera coordinate system of the left reference viewpoint coincided with the world coordinate system, the external-camera parameters of the left reference viewpoint could be replaced by a normalized unit matrix. In summary, we can get:

$$\left\{ {\begin{array}{c} {{Z_\textrm{L}} = {Z_\textrm{R}} = Z^{\prime},}\\ {{{\boldsymbol K}_\textrm{L}} = {{\boldsymbol K}_\textrm{R}} = {\boldsymbol K^{\prime}}\textrm{,}}\\ {{{\boldsymbol R}_\textrm{L}} = {{\boldsymbol R}_\textrm{R}} = {\boldsymbol R^{\prime}}\textrm{,}}\\ {{{\boldsymbol T}_\textrm{L}} = {\boldsymbol 0}_{3 \times 1}^\textrm{T}.} \end{array}} \right.$$

Therefore, Eq. (6) can be simplified to:

$${I^{\prime}_\textrm{L}}({\xi ,\eta } )= {I_\textrm{L}}({\xi ,\eta } )+ {Z_\textrm{L}}^{ - 1}{{\boldsymbol K}_\textrm{L}}{\boldsymbol T^{\prime}}\textrm{.}$$

The relationship between the disparity D and the depth Z can be expressed as $Z = f \cdot b/D$, where f is the focal length of the camera, and b is the baseline distance between the optical centers of cameras. Therefore, Eq. (8) can be further obtained:

$${I^{\prime}_\textrm{L}}({\xi ,\eta } )= {I_\textrm{L}}({\xi + \tau {D_{\textrm{LR}}},\eta } ).$$

Similarly, ${I^{\prime}_\textrm{R}}(\xi ,\eta )$ can be expressed as:

$${I^{\prime}_\textrm{R}}({\xi ,\eta } )= {I_\textrm{R}}({\xi - ({1 - \tau } ){D_{\textrm{LR}}},\eta } ).$$
where ${D_{\textrm{LR}}}$ was the horizontal disparity from the left-reference viewpoint to the right-reference viewpoint, and $\tau $ was the fusion-weighting factor, which can be expressed as:
$$\tau = \frac{{|{{\boldsymbol t^{\prime}} - {{\boldsymbol t}_\textrm{L}}} |}}{{|{{\boldsymbol t^{\prime}} - {{\boldsymbol t}_\textrm{L}}} |+ |{{\boldsymbol t^{\prime}} - {{\boldsymbol t}_\textrm{R}}} |}}.$$
where ${\boldsymbol t^{\prime}}$, ${{\boldsymbol t}_\textrm{L}}$, and ${{\boldsymbol t}_\textrm{R}}$ represented the camera translation vectors at the ideal-sampled viewpoint, the left reference viewpoint, and the right reference viewpoint, respectively. In order to eliminate the large-area hole problem caused by viewpoint rendering, ${I^{\prime}_\textrm{L}}(\xi ,\eta )$ and ${I^{\prime}_\textrm{R}}(\xi ,\eta )$ were fused to generate the final inner-type ideal-sampled viewpoint image $I^{\prime}(\xi ,\eta )$:
$$I^{\prime}({\xi ,\eta } )= \tau {I^{\prime}_\textrm{L}}({\xi ,\eta } )+ ({1 - \tau } ){I^{\prime}_\textrm{R}}({\xi ,\eta } ).$$
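A simplified sketch of this bidirectional rendering (Eqs. (9)–(12)) is shown below. It forward-warps each reference view with the shared disparity map and blends the results with the weight $\tau$; the integer pixel shifts, the omitted Z-buffer test for overlaps, and the chosen disparity sign convention are simplifications of ours rather than part of the original method.

```python
import numpy as np

# Simplified bidirectional rendering of an inner-type ideal viewpoint, Eqs. (9)-(12).
# Forward (scatter) warping with integer pixel shifts; the Z-buffer overlap test is
# omitted and the shift sign depends on the disparity convention.

def warp_horizontal(image, disparity, sign, weight):
    """Shift each pixel of `image` horizontally by sign * weight * disparity."""
    h, w = disparity.shape
    out = np.zeros_like(image)
    hole = np.ones((h, w), dtype=bool)               # pixels that received no data
    ys, xs = np.indices((h, w))
    xt = np.clip(np.round(xs + sign * weight * disparity).astype(int), 0, w - 1)
    out[ys, xt] = image[ys, xs]
    hole[ys, xt] = False
    return out, hole

def render_inner_view(I_L, I_R, D_LR, tau):
    I_from_L, hole_L = warp_horizontal(I_L, D_LR, +1, tau)          # Eq. (9)
    I_from_R, hole_R = warp_horizontal(I_R, D_LR, -1, 1.0 - tau)    # Eq. (10)
    fused = tau * I_from_L.astype(float) + (1.0 - tau) * I_from_R.astype(float)  # Eq. (12)
    fused[hole_L & ~hole_R] = I_from_R[hole_L & ~hole_R]   # fill one-sided holes
    fused[~hole_L & hole_R] = I_from_L[~hole_L & hole_R]
    return fused, hole_L & hole_R               # remaining holes go to inpainting
```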

Similarly, the inner-type ideal-sampled viewpoint image in the vertical direction could be warped in the vertical direction according to the actual reference viewpoint images ${I_\textrm{A}}({\xi ,\eta } )$ and ${I_\textrm{B}}({\xi ,\eta } )$ on the upper and lower sides and the corresponding vertical disparity ${D_{\textrm{AB}}}$, shown as ${I^{\prime}_{2,4}}$ in Fig. 9.

For the edge-type ideal-sampled viewpoint image, it was not on the connecting line of any two horizontally- or vertically-adjacent actual reference-viewpoint images, so it could not be generated only by the parallax warping between images, but by the unilateral adjacent actual reference viewpoint image and its depth mapping:

$$I^{\prime}({\xi ,\eta } )= {I_{\textrm{L/R/A/B}}}({\xi ,\eta } )+ {Z_{\textrm{L/R/A/B}}}^{ - 1}{{\boldsymbol K}_{\textrm{L/R/A/B}}}{\boldsymbol T^{\prime}}\textrm{.}$$
where the subscripts “A” and “B” indicate the upper and lower sides, respectively. In summary, drawing the inner-type and edge-type ideal-sampled viewpoint images by our improved DIBR method could complete the regularization of the initial-sampled real-LF, and the obtained ideal-sampled viewpoint image set was denoted as ${L_{\textrm{ide}}} = \{{{{I^{\prime}}_{1,1}},{{I^{\prime}}_{1,2}},\ldots ,{{I^{\prime}}_{n,n}}} \}$, which could be used in the EPISM method to effectively reduce the distortion of the reconstructed 3D scene.

3.2.2 Dense processing of the sampled real-LF

In order to solve the problems of low angular resolution and the view-flapping effect of the reconstructed LF under sparse-sampling conditions in the HS, the regularized sampled LF should be extended to a dense-sampled LF by using the improved DIBR method. Generating a 5×5 dense-viewpoints array from a 3×3 sparse-viewpoints array is taken as an example [33], as shown in Fig. 11. In each 2×2 ideal-sampled viewpoint image sub-array, the virtual-sampled viewpoint images were sequentially generated in the order of horizontal-direction mapping, vertical-direction mapping, and diagonal-direction mapping. The generation of the virtual-sampled viewpoint images in the horizontal and vertical directions was the same as that of the inner-type ideal-sampled viewpoint images. That is, the middle virtual-sampled viewpoint image was drawn according to the ideal-sampled viewpoint images on its two adjacent sides as well as the horizontal- or vertical-disparity relationship, and then the final virtual-sampled viewpoint image was obtained by fusion. The drawing process of the virtual-sampled viewpoint image in the horizontal direction was:

$${I^{\prime\prime}_\textrm{L}}({\xi ,\eta } )= {I^{\prime}_\textrm{L}}({\xi + \tau^{\prime}{{D^{\prime}}_{\textrm{AH}}},\eta } ),$$
$${I^{\prime\prime}_\textrm{R}}({\xi ,\eta } )= {I^{\prime}_\textrm{R}}({\xi - ({1 - \tau^{\prime}} ){{D^{\prime}}_{\textrm{AH}}},\eta } ),$$
$$I^{\prime\prime}({\xi ,\eta } )= \tau ^{\prime}{I^{\prime\prime}_\textrm{L}}({\xi ,\eta } )+ ({1 - \tau^{\prime}} ){I^{\prime\prime}_\textrm{R}}({\xi ,\eta } ).$$
where $\tau ^{\prime}$ was the weighting factor, which represented the ratio of the horizontal distance from the middle virtual-sampled viewpoint to the left ideal-sampled viewpoint to the distance between the left and right ideal-sampled viewpoints. ${D^{\prime}_{\textrm{AH}}}$ represented the horizontal disparity between the top-left and top-right ideal-sampled viewpoint images. In the same way, the virtual-sampled viewpoint image in the vertical direction could be drawn according to the adjacent ideal-sampled viewpoints and their vertical disparity.

Fig. 11. The generation of the dense sampled real-LF.

The virtual-sampled viewpoint image in the diagonal direction was not on the connecting line of any two horizontal- or vertical-adjacent actual reference-viewpoint images, so it needed to be mapped twice in the horizontal and vertical directions. In each ideal-sampled viewpoint array, first, the middle virtual-sampled viewpoint image ${I^{\prime\prime}_{\textrm{AM}}}({\xi ,\eta } )$ was drawn from the upper-left and upper-right ideal-sampled viewpoints, and then the vertical-disparity map ${D^{\prime\prime}_{\textrm{AMV}}}({\xi ,\eta } )$ corresponding to ${I^{\prime\prime}_{\textrm{AM}}}({\xi ,\eta } )$ was drawn. The process was expressed as

$${D^{\prime\prime}_{\textrm{LAMV}}}({\xi ,\eta } )= {D^{\prime}_{\textrm{LAV}}}({\xi + \tau^{\prime}{{D^{\prime}}_{\textrm{AH}}},\eta } ),$$
$${D^{\prime\prime}_{\textrm{RAMV}}}({\xi ,\eta } )= {D^{\prime}_{\textrm{RAV}}}({\xi - ({1 - \tau^{\prime}} ){{D^{\prime}}_{\textrm{AH}}},\eta } ),$$
$${D^{\prime\prime}_{\textrm{AMV}}}({\xi ,\eta } )= \tau ^{\prime}{D^{\prime\prime}_{\textrm{LAMV}}}({\xi ,\eta } )+ ({1 - \tau^{\prime}} ){D^{\prime\prime}_{\textrm{RAMV}}}({\xi ,\eta } ).$$
where ${D^{\prime}_{\textrm{LAV}}}({\xi ,\eta } )$ and ${D^{\prime}_{\textrm{RAV}}}({\xi ,\eta } )$ represented the vertical-disparity map corresponding to the upper-left and upper-right ideal-sampled viewpoint images, respectively, and ${D^{\prime\prime}_{\textrm{LAMV}}}({\xi ,\eta } )$ and ${D^{\prime\prime}_{\textrm{RAMV}}}({\xi ,\eta } )$ represented the vertical-disparity map of the middle virtual-sampled viewpoint generated by ${D^{\prime}_{\textrm{LAV}}}({\xi ,\eta } )$ and ${D^{\prime}_{\textrm{RAV}}}({\xi ,\eta } )$, respectively. Finally, according to the intermediate virtual-sampled viewpoint image ${I^{\prime\prime}_{\textrm{AM}}}({\xi ,\eta } )$ and its corresponding vertical-disparity map ${D^{\prime\prime}_{\textrm{AMV}}}({\xi ,\eta } )$, the initial virtual-sampled viewpoint image ${I^{\prime\prime}_{\textrm{M1}}}({\xi ,\eta } )$ in the diagonal direction of the 2×2 ideal-viewpoint sub-array could be drawn vertically downward.
$${I^{\prime\prime}_{\textrm{M1}}}({\xi ,\eta } )= {I^{\prime\prime}_{\textrm{AM}}}({\xi ,\eta + \omega {{D^{\prime\prime}}_{\textrm{AMV}}}} ).$$
where the weighting factor $\omega $ represented the ratio of the vertical distance from the diagonal virtual viewpoint to the upper-middle virtual viewpoint in the 2×2 sub-array to the distance between the upper- and lower-middle virtual viewpoints. In the same way, the middle virtual-sampled viewpoint image ${I^{\prime\prime}_{\textrm{BM}}}({\xi ,\eta } )$ and its corresponding vertical-disparity map ${D^{\prime\prime}_{\textrm{BMV}}}({\xi ,\eta } )$ could be drawn according to the ideal-sampled viewpoint images at the bottom-left and bottom-right of the sub-array. At this time, the initial virtual-sampled viewpoint image ${I^{\prime\prime}_{\textrm{M2}}}({\xi ,\eta } )$ in the diagonal direction of the 2×2 ideal-sampled viewpoint sub-array could be drawn vertically upwards:
$${I^{\prime\prime}_{\textrm{M2}}}({\xi ,\eta } )= {I^{\prime\prime}_{\textrm{BM}}}({\xi ,\eta - ({1 - \omega } ){{D^{\prime\prime}_{\textrm{BMV}}}}} ).$$

Finally, the initial virtual-sampled viewpoint images ${I^{\prime\prime}_{\textrm{M1}}}({\xi ,\eta } )$ and ${I^{\prime\prime}_{\textrm{M2}}}({\xi ,\eta } )$ were weighted and fused to get the final virtual-sampled viewpoint image ${I^{\prime\prime}_m}({\xi ,\eta } )$:

$${I^{\prime\prime}_m}({\xi ,\eta } )= \omega {I^{\prime\prime}_{\textrm{M1}}}({\xi ,\eta } )+ ({1 - \omega } ){I^{\prime\prime}_{\textrm{M2}}}({\xi ,\eta } ).$$
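The diagonal step (Eqs. (20)–(22)) can be sketched as follows, assuming the upper- and lower-middle virtual views and their vertical-disparity maps have already been rendered with the horizontal scheme of Eqs. (14)–(19); the shift signs follow one possible disparity convention and the pixel shifts are rounded to integers.

```python
import numpy as np

# Diagonal virtual viewpoint of one 2x2 sub-array, Eqs. (20)-(22). I_AM, I_BM and
# their vertical-disparity maps D_AMV, D_BMV are assumed to have been rendered with
# the horizontal scheme of Eqs. (14)-(19); signs follow one disparity convention.

def warp_vertical(image, disparity, sign, weight):
    """Shift each pixel of `image` vertically by sign * weight * disparity."""
    h, w = disparity.shape
    out = np.zeros_like(image)
    ys, xs = np.indices((h, w))
    yt = np.clip(np.round(ys + sign * weight * disparity).astype(int), 0, h - 1)
    out[yt, xs] = image[ys, xs]
    return out

def render_diagonal_view(I_AM, D_AMV, I_BM, D_BMV, omega):
    I_M1 = warp_vertical(I_AM, D_AMV, +1, omega)          # Eq. (20): warp downward
    I_M2 = warp_vertical(I_BM, D_BMV, -1, 1.0 - omega)    # Eq. (21): warp upward
    return omega * I_M1 + (1.0 - omega) * I_M2            # Eq. (22): weighted fusion
```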

In summary, the virtual viewpoint image was drawn by the improved DIBR method, the dense processing of the regularly-sampled LF ${L_{\textrm{ide}}}$ was completed, and the obtained virtual-sampled viewpoint images set was denoted as ${L_{\textrm{vir}}} = \{{{{I^{\prime\prime}}_{1,1}},{{I^{\prime\prime}}_{1,2}},\ldots ,{{I^{\prime\prime}}_{m,m}}} \}$. Finally, the reconstruction of the irregular- and sparse-degraded sampled real-LF in the angular dimension was realized, and the reconstructed sampled real-LF ${L_{\textrm{fin}}}$ that was drawn could be expressed as

$${L_{\textrm{fin}}} = {L_{\textrm{ide}}} \cup {L_{\textrm{vir}}} = \{{{{I^{\prime}}_{1,1}},{{I^{\prime}}_{1,2}},\ldots ,{{I^{\prime}}_{n,n}}} \}\cup \{{{{I^{\prime\prime}}_{1,1}},{{I^{\prime\prime}}_{1,2}},\ldots ,{{I^{\prime\prime}}_{m,m}}} \}.$$

The initial-sampled LF of the real 3D scene had an angular resolution of 7×7. After regularization and densification processing, the reconstructed sampled LF of the real 3D scene had an angular resolution of 69×69. The intervals between the horizontal and vertical viewpoints were 4 mm; therefore, the sampling range of the camera was a square area with a side length of 27.2 cm. The spatial resolution of the reconstructed sampled LF ${L_{\textrm{fin}}}$ depended on the CMOS camera’s sensor, which had a resolution of 2448×2448 pixels. In order to match the parameters of the EPISM method and improve the rendering efficiency, the resolution of the LF images was down-sampled to 1000×1000 pixels. It may be assumed that the coordinate of the center point ${O_\textrm{R}}$ of the square sampling area in the real-world coordinate system was $({0,0,0} )$, and the center of the camera’s focus plane, that is, the depth center of the real 3D scene, had the coordinate $({0,0,13.8\textrm{ cm}} )$.
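A quick consistency check of the quoted sampling geometry (values taken from the text):

```python
# Consistency check of the quoted sampling geometry.
n_views, pitch_mm = 69, 4.0
side_cm = (n_views - 1) * pitch_mm / 10.0   # (69 - 1) x 4 mm = 27.2 cm square sampling area
print(side_cm)
```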

3.3 Generation of the virtual scene’s LF

Regular and dense virtual 3D-scene information was obtained in 3ds Max by directly using virtual cameras to render the 3D model along a preset movement trajectory in the sampling plane. Compared with real-LF acquisition, virtual-LF acquisition has the advantages of low cost and easy implementation. The LF sampling diagram of the virtual 3D scene is presented in Fig. 12. The virtual 3D model police car was taken as the object, which had a length of 2.9 cm, a height of 1.7 cm, and a depth of 2.3 cm, and it was rotated counterclockwise in the plane by 25° to better show the stereoscopic-display effect.

Fig. 12. Configuration of the virtual-LF sampling.

In order to achieve an effective fusion of the real and virtual LFs, the rendering parameters of the virtual camera were set to keep the spatial resolution, angular resolution, FOV, and sampling interval of the virtual LF consistent with the parameters of ${L_{\textrm{fin}}}$ of the real 3D scene. A set of 2D perspective images containing the full-parallax information of the virtual 3D scene was rendered. The FOV of the virtual camera was 39.8°, the resolution of each sampled image was 1000×1000 pixels, and the number of sampling viewpoints was 69×69. The intervals between the horizontal and vertical viewpoints were 4 mm, and a square sampling area with a side length of 27.2 cm was also obtained. It was assumed that the center point ${O_\textrm{V}}$ of the square sampling area of the virtual scene had the coordinate $({0,0,0} )$ in the virtual-world coordinate system. In 3ds Max, the virtual LF could be changed by adjusting the size of the 3D model and its positional relationship with the camera’s sampling area.

3.4 Fusion of virtual and real LFs

When the origin ${O_\textrm{V}}$ of the virtual-world coordinate system coincided with the origin ${O_\textrm{R}}$ of the real-world coordinate system, it was equivalent to the virtual-sampled viewpoints completely coinciding with the real-sampled viewpoints. In this case, the two groups of sampled-viewpoint image sequences only needed to be superimposed and fused according to the spatial-geometric occlusion relationship between the virtual and real 3D scenes to complete the fusion. Without loss of generality, Fig. 13 shows the “real + virtual” LF-fusion diagram only for the horizontal-parallax acquisition mode.

Fig. 13. The diagram of “real + virtual” LFs fusion.

As shown in Fig. 14, the relationship between the light projection and the pixel mapping is illustrated for the case in which the timing-acquisition system sampled the real 3D scene. Generally, the 3D scene was considered to conform to the Lambertian hypothesis; that is, the light emitted or reflected from the scene surface was isotropic in space. The optical axes of the cameras ${C^{\prime}_{1\_\textrm{R}}},{C^{\prime}_{2\_\textrm{R}}},\ldots ,{C^{\prime}_{i\_\textrm{R}}}$ (i is the number of sampled viewpoints, and R represents the real 3D scene) remained parallel to each other and perpendicular to the lens plane and the sensor plane. By capturing the 2D projection of the light emitted or reflected from the real scene’s surface, we were able to obtain the sampled-viewpoint image sequence ${I^{\prime}_{1\_\textrm{R}}},{I^{\prime}_{2\_\textrm{R}}},\ldots ,{I^{\prime}_{i\_\textrm{R}}}$, which contained the parallax relationship of the real 3D scene. Among them, the pixels formed by the projection of the light emitted or reflected by the three object points $\{{E,F,G} \}$ on the scene surface in the images ${I^{\prime}_{1\_\textrm{R}}},{I^{\prime}_{2\_\textrm{R}}},\ldots ,{I^{\prime}_{i\_\textrm{R}}}$ of each viewing angle were $\{{{e_1},{f_1},{g_1}} \},\{{{e_2},{f_2},{g_2}} \},\ldots ,\{{{e_i},{f_i},{g_i}} \}$, respectively. Since the projection angles of the light emitted by the same object point were different in each viewpoint image, the coordinates of the formed pixel point in the pixel coordinate system of each viewpoint image also changed accordingly. For example, the position of the pixel ${f_1}$ in ${I^{\prime}_{1\_\textrm{R}}}$ was lower than that of the pixel ${f_2}$ in ${I^{\prime}_{2\_\textrm{R}}}$, so the LF information of the real 3D scene in space could be recorded.

Fig. 14. The light-affine projection relation in real-LF sampling.

The relationship between light projection and pixel mapping of the virtual-LF acquisition system is illustrated in Fig. 15. The rendering parameters and moving trajectory of the virtual camera were completely consistent with those of the real camera, and the sampled-viewpoint image sequence ${I^{\prime}_{1\_\textrm{V}}},{I^{\prime}_{2\_\textrm{V}}},\ldots ,{I^{\prime}_{i\_\textrm{V}}}$ (V represented the virtual 3D scene) could be obtained, which had the same image resolution as ${I^{\prime}_{i\_\textrm{R}}}$. The occlusion relationship between the virtual and real 3D scenes is also shown in Fig. 15. Within the angle range that could be captured by the camera lens, the light emitted or reflected by two points F and G on the surface of the real 3D scene was blocked by the virtual 3D scene in the same space. The lens center of the virtual camera ${C^{\prime}_{1\_\textrm{V}}}$ was point ${O_1}$. According to the ray-tracing principle, the lines connecting points F and G with point ${O_1}$ intersected the virtual 3D scene’s surface at points ${F_1}$ and ${G_1}$, respectively. The light emitted or reflected by points ${F_1}$ and ${G_1}$ within a certain angle converged through the lens of camera ${C^{\prime}_{1\_\textrm{V}}}$ and was recorded in pixels ${f_{11}}$ and ${g_{11}}$ in the viewpoint image ${I^{\prime}_{1\_\textrm{V}}}$, respectively. Since the essence of camera imaging is the 2D projection of the target light set in space, the projection-point position of light with the same direction on the sensor plane was the same, and the difference lay in the intensity and frequency of the light. Since the angle between the lights ${F_1}{O_1}$ and $F{O_1}$ and the optical axis of the camera ${C^{\prime}_{1\_\textrm{V}}}$ was ${\theta _{{f_1}}}$, the projection point ${f_{11}}$ in viewpoint ${I^{\prime}_{1\_\textrm{V}}}$ and the projection point ${f_1}$ in viewpoint ${I^{\prime}_{1\_\textrm{R}}}$ had the same pixel-position index. Likewise, since the angle between the lights ${G_1}{O_1}$ and $G{O_1}$ and the optical axis of the camera ${C^{\prime}_{1\_\textrm{V}}}$ was ${\theta _{{g_1}}}$, the projection point ${g_{11}}$ in viewpoint ${I^{\prime}_{1\_\textrm{V}}}$ and the projection point ${g_1}$ in viewpoint ${I^{\prime}_{1\_\textrm{R}}}$ had the same pixel-position index as well. By analogy, the pixel-position index of $\{{{f_{22}},{g_{22}}} \}$ in image ${I^{\prime}_{2\_\textrm{V}}}$ was the same as that of $\{{{f_2},{g_2}} \}$ in image ${I^{\prime}_{2\_\textrm{R}}}$, and the pixel-position index of $\{{{f_{ii}},{g_{ii}}} \}$ in image ${I^{\prime}_{i\_\textrm{V}}}$ was the same as that of $\{{{f_i},{g_i}} \}$ in image ${I^{\prime}_{i\_\textrm{R}}}$. Different from the real-LF acquisition, the pixels in the sensor of the virtual camera that did not receive any light signal remained blank.

Fig. 15. The light-affine projection relation in virtual-LF sampling.

The “real + virtual” LF-fusion principle of the HS is shown in Fig. 16. Since the acquisition parameters of the real-LF and the virtual-LF were exactly the same, merging the real-LF described in Fig. 14 and the virtual-LF described in Fig. 15 into the same LF only required, at each pixel-index position and in the order of the image sequence, letting the pixel with the smaller depth cover the pixel with the larger depth according to the depth relationship between the scenes. Pixels that were not blocked by the foreground object were displayed normally, and finally the fusion of the sampled LFs was realized. According to the assumed scene-depth relationship, the virtual scene blocked part of the real scene, so the pixels ${f_{11}},{f_{22}},\ldots ,{f_{ii}}$ in the virtual-LF covered the pixels ${f_1},{f_2},\ldots ,{f_i}$ in the real-LF, respectively. Likewise, the pixels ${g_{11}},{g_{22}},\ldots ,{g_{ii}}$ in the virtual-LF covered the pixels ${g_1},{g_2},\ldots ,{g_i}$ in the real-LF, respectively. For the object point E, which was not occluded in the real 3D scene, its corresponding pixels were displayed normally in the fused viewpoint-image sequence. In the same way, by covering all the occluded pixels in the real-LF with the virtual-LF in turn, a set of perspective-image sequences ${I^{\prime}_{1\_\textrm{F}}},{I^{\prime}_{2\_\textrm{F}}},\ldots ,{I^{\prime}_{i\_\textrm{F}}}$ including the depth relationship and disparity relationship of the virtual–real fused LF was obtained.
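Since the real and virtual sequences share identical sampling parameters, the fusion reduces to a per-pixel depth test applied viewpoint by viewpoint. The sketch below assumes per-view depth maps are available (rendered for the virtual scene, estimated for the real one); the array names are illustrative.

```python
import numpy as np

# Per-pixel "real + virtual" fusion: at every pixel index, a non-blank virtual pixel
# whose depth is smaller than the real one covers the real pixel. Depth maps are
# assumed available; array names are illustrative.

def fuse_views(I_real, Z_real, I_virtual, Z_virtual, blank_value=0):
    occupied = np.any(I_virtual != blank_value, axis=-1)   # virtual pixel carries light
    in_front = Z_virtual < Z_real                          # virtual scene occludes the real one
    mask = occupied & in_front
    fused = I_real.copy()
    fused[mask] = I_virtual[mask]
    return fused

# applied viewpoint by viewpoint over the whole sampled sequence:
# fused_seq = [fuse_views(Ir, Zr, Iv, Zv) for (Ir, Zr), (Iv, Zv) in zip(real_seq, virtual_seq)]
```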

Fig. 16. The diagram of LFs fusion.
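
A minimal per-viewpoint fusion sketch is given below. It assumes that per-pixel depth maps of both the real-scene and the virtual-scene viewpoint images are available (for example from the DIBR stage); the array contents are placeholders, and the actual fusion in this work is driven by the known scene-depth relationship rather than this exact data layout.

```python
import numpy as np

def fuse_viewpoint(real_img, real_depth, virt_img, virt_depth, blank=0):
    """Per-pixel fusion of one real and one virtual sampled viewpoint image.
    A virtual pixel overwrites the real one only where the virtual camera
    actually received light (non-blank pixel) and the virtual surface is
    closer (smaller depth), reproducing the occlusion of points such as F and G."""
    virt_valid = np.any(virt_img != blank, axis=-1)   # blank pixels received no light
    occludes = virt_valid & (virt_depth < real_depth)
    fused = real_img.copy()
    fused[occludes] = virt_img[occludes]
    return fused

# Hypothetical example arrays (shapes and values for illustration only).
H = W = 4
real_img   = np.full((H, W, 3), 200, dtype=np.uint8)   # e.g. chessboard pixels
real_depth = np.full((H, W), 0.30)                     # real scene farther away
virt_img   = np.zeros((H, W, 3), dtype=np.uint8)       # mostly blank sensor
virt_img[1:3, 1:3] = 80                                # pixels hit by the virtual scene
virt_depth = np.full((H, W), 0.20)                     # virtual scene in front

fused = fuse_viewpoint(real_img, real_depth, virt_img, virt_depth)
# The whole sampled LF is fused by applying this to every viewpoint pair I'_i_R / I'_i_V.
```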

4. Experiments and discussions

Figure 17 shows the structure of the full-parallax HS printing system. The laser source was a 400 mW, 639 nm single-longitudinal-mode linearly polarized solid-state laser (model CNIMSL-FN-639@CNI). A mechatronic shutter (model SSH-C2B@Sigma Koki) modulated the laser output to control the exposure time of the hogels. The modulated laser was divided into the object beam and the reference beam after passing through a $\lambda \textrm{/2}$ wave plate and a polarizing beam splitter, and the power ratio between the two beams could be adjusted by rotating the $\lambda \textrm{/2}$ wave plate in front of the beam splitter. The object beam then passed through another $\lambda \textrm{/2}$ wave plate behind the beam splitter, which kept the polarization directions of the object beam and the reference beam consistent. After being expanded by a 40× objective lens, the object beam illuminated the effective area of the LCD panel. The LCD panel (model VVX09F035M20 @Panasonic) had a size of 8.9 inches, a resolution of 1920×1200 pixels, and a pixel pitch of 0.1 mm; its backlight module and polarizer were removed except for the diffuser film, which was placed close to the front of the LCD panel to diffuse the object beam. The object beam then projected the effectively synthetic perspective image of 1000×1000 pixels loaded on the LCD panel onto the silver-halide plate, which was 13.8 cm away from the LCD panel. The required exposure density of the silver-halide plate was $\mathrm{\Delta }E = 1250\textrm{ }\mathrm{\mu}\textrm{J/c}{\textrm{m}^\textrm{2}}$ at 639 nm. A 4 mm×4 mm square aperture was placed close to the front and the back of the silver-halide dry plate to form hogels of the same shape and size and to block unwanted light from other directions. The power of the reference beam was further controlled by rotating the attenuator located between the two reflectors, and the high-spatial-frequency noise in the reference beam was filtered out by a spatial filter composed of a 40× objective lens and a $15\textrm{ }\mathrm{\mu}\textrm{m}$ pinhole. Finally, a collimating lens with a focal length of 150 mm, with the pinhole located at its focal point, collimated the reference beam into a planar wave. The reference beam and the object beam were incident on the silver-halide dry plate at an angle of 20°. The silver-halide dry plate was mounted on a KSA300 @Zolix two-dimensional linear-displacement platform driven by an MC600 @Zolix programmable controller, and a computer synchronously controlled the mechatronic shutter, the LCD panel, and the displacement platform to expose the hogels one by one until the entire HS was printed.

Fig. 17. The diagram of the HS printing system structure.
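
The hogel-by-hogel exposure sequence described above can be summarized by the following control-loop sketch. The device objects (display, shutter, stage) and their method names are hypothetical placeholders standing in for the LCD panel, the SSH-C2B shutter, and the KSA300/MC600 stage; they are not the vendors’ actual APIs.

```python
import time

HOGEL_PITCH_MM = 4.0   # hogel size, equal to the stage step between exposures
N = 20                 # 20 x 20 hogels
T_EXP = 4.0            # exposure time per hogel, s
T_STA = 16.0           # settling time before each exposure, s

def print_stereogram(display, shutter, stage, images):
    """Expose the hogels one by one; images[row][col] is the 1000x1000-pixel
    effectively synthetic perspective image assigned to that hogel."""
    for row in range(N):
        for col in range(N):
            stage.move_to(col * HOGEL_PITCH_MM, row * HOGEL_PITCH_MM)
            display.show(images[row][col])  # load the image onto the LCD panel
            time.sleep(T_STA)               # wait for platform vibrations to die out
            shutter.open()                  # object and reference beams interfere
            time.sleep(T_EXP)
            shutter.close()
```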

The center coordinate of the police car model was set to $({1.5\textrm{ cm},1.9\textrm{ cm},13.8\textrm{ cm}} )$ for virtual-LF sampling. The perspective-image sequence sampled from the virtual 3D scene under this condition was first fused with that sampled from the real 3D scene, and the fused sequence was then processed by the EPISM algorithm to obtain 20×20 LF-fused effectively synthetic perspective images of 1000×1000 pixels each. Finally, these images were loaded into the printing system to obtain a full-parallax HS of the “real + virtual” fused LF. The hogel size was 4 mm×4 mm, and the effective interference area was 8 cm×8 cm. The exposure time of each hogel can be expressed as ${T_{exp }} = \mathrm{\Delta }E/({{P_{\textrm{obj}}} + {P_{\textrm{ref}}}} )$. The power density of the object beam in the experiment was ${P_{\textrm{obj}}} = 12\textrm{ }\mathrm{\mu}\textrm{W/c}{\textrm{m}^2}$, and that of the reference beam was ${P_{\textrm{ref}}} = 300\textrm{ }\mathrm{\mu}\textrm{W/c}{\textrm{m}^2}$, so the exposure time was set to ${T_{exp }} = 4\textrm{ s}$. To let the system vibrations damp out, the displacement platform was kept static for ${T_{\textrm{sta}}} = 16\textrm{ s}$ before each hogel was exposed. Ignoring the time of each displacement, the total printing time was ${T_{\textrm{all}}} \approx ({{T_{exp }} + {T_{\textrm{sta}}}} )\times 20 \times 20 = 8000\textrm{ s}$.
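
As a quick sanity check, the snippet below reproduces the exposure arithmetic above; $P_{\textrm{obj}}$ and $P_{\textrm{ref}}$ are treated as power densities, which is what makes ${T_{exp }} = \mathrm{\Delta }E/({P_{\textrm{obj}}} + {P_{\textrm{ref}}})$ dimensionally consistent with the stated exposure density.

```python
delta_E = 1250.0   # required exposure of the silver-halide plate, uJ/cm^2
P_obj   = 12.0     # object-beam power density, uW/cm^2
P_ref   = 300.0    # reference-beam power density, uW/cm^2

T_exp = delta_E / (P_obj + P_ref)    # ~4.0 s per hogel (set to 4 s in the experiment)
T_sta = 16.0                         # settling time before each exposure, s
T_all = (4.0 + T_sta) * 20 * 20      # 8000 s for the 20 x 20 hogels, ignoring stage travel
print(f"T_exp = {T_exp:.2f} s, T_all = {T_all:.0f} s")
```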

Since the power ratio between the object beam and the reference beam was 1:25, the diffraction efficiency was relatively low. To obtain a better holographic display effect, the HS was reconstructed with the laser. The reconstruction system is shown in Fig. 18. Using the same laser source as the printing system, the output laser beam was first expanded by a 40× objective lens and then passed through a collimating lens with a focal length of 150 mm to form a planar wave. The silver-halide dry plate was illuminated by the conjugate beam of the original reference beam, and a real stereoscopic image was reconstructed in front of the plate. A Canon EOS 5D Mark III digital camera with a 100 mm fixed-focus lens was used to capture the reconstructed LF at about 52.5 cm in front of the HS.

Fig. 18. The diagram of the LF-reconstruction system for the HS.

The photographs of the reconstruction of the HS with the real–virtual fused-LF from different perspectives are shown in Fig. 19. It can be seen that the reconstructed scenes are consistent with the original ones, and the occlusion between the virtual 3D scene and the real 3D scene conforms to the preset spatial-position relationship; that is, the police car deviates from the center of the real 3D scene and is displayed at the bottom right. At the same time, the HS presents correct full-parallax information and is free of the view-flipping effect. According to the printing principle of the EPISM method, the viewing-zone range is determined by the FOV of the sampled perspective images and the printing of the hogels, and is expected to be 39.8° in this experiment. However, when the observer was close to the edge of the visible area (i.e., ${\pm} {19.9^ \circ }$), the main part of the real 3D scene (the police figure with the shield) was incomplete in the reconstructed scene. Furthermore, since the main part of the real 3D scene is 4.9 cm long and 3.3 cm high, this phenomenon is more obvious in the vertical direction than in the horizontal direction. To capture photographs in which the real 3D scene is reconstructed completely and the fusion effect is well displayed, the viewing-zone range shown in Fig. 19 is less than 39.8°, and the horizontal viewing range is slightly larger than the vertical one. In addition, the edge of the chessboard is brighter than the other parts of the reconstructed scene, which is due to the illumination used during the real-LF acquisition and shows that the holographic reconstruction can reproduce the gloss of the scene.

Fig. 19. The photographs of reconstruction from different perspectives.
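
As a rough consistency check on the stated 39.8° viewing zone, one may assume that the zone is bounded by the effective LCD aperture (1000 pixels × 0.1 mm = 10 cm) seen from a hogel at the 13.8 cm printing distance; this geometric assumption is ours, introduced only for illustration, and is not a statement of the EPISM derivation.

```python
import math

lcd_width_cm = 1000 * 0.01   # 1000 pixels at 0.1 mm pitch -> 10 cm effective aperture
L2_cm = 13.8                 # LCD-to-plate distance during printing
view_zone = 2 * math.degrees(math.atan(0.5 * lcd_width_cm / L2_cm))
print(f"estimated viewing zone = {view_zone:.1f} deg")   # ~ 39.8 deg
```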

The camera’s focal plane was 13.8 cm away from the camera-sampling plane during acquisition; according to the principle of the EPISM method, the distance between the imaging plane of the reconstructed scene and the HS was therefore also 13.8 cm. Figure 20 shows that the reconstructed scene stands out of the HS. The position relationship between the rulers and the HS is shown in Fig. 20(a): ruler 1 was placed on the left side of the HS, parallel to its front surface, and ruler 2 was placed on the right side of the HS, 13.8 cm in front of it. When the digital camera focused on ruler 2, a clear reconstructed “real + virtual” fused scene could be observed, as shown in Fig. 20(b). When the digital camera focused on ruler 1, the holographic dry plate and the hogel grid could be observed clearly while the reconstructed scene was blurred, as shown in Fig. 20(c). This proves that the reconstructed image of the fused-LF was a real image located out of the holographic plate.

Fig. 20. The photographs in different focus depths: (a) the position relationship between the holographic plate and rulers, (b) focus on ruler 2, and (c) focus on ruler 1.

To further explore the influence of the spatial-position relationship between the virtual and the real 3D scene on the reconstructed fused scene, the sampling parameters of the real 3D scene were kept unchanged while the distance between the virtual 3D scene and the sampling plane was adjusted to change the depth relationship between the two scenes. According to the principle of the EPISM method, the reconstruction depth is determined by the distance ${L_2}$ between the reference plane and the HS. Therefore, the shape, size, and rotation attitude of the virtual 3D scene were kept fixed and only its distance to the sampling plane was adjusted, which was equivalent to changing the distance $L_2^{vir}$ from the HS plane to the virtual reference plane. When $L_2^{vir} = 15.3\textrm{ cm}$ and 15.8 cm, the reconstruction-depth offset between the virtual and real 3D scenes became $\mathrm{\Delta }L_2^{vir} = 1.5\textrm{ cm}$ and 2.0 cm, respectively. In the printing system shown in Fig. 17, the physical distance between the LCD panel and the HS was fixed at 13.8 cm, which only guaranteed the correct exposure of the synthetic effective perspective images of the real 3D scene. To realize the holographic printing of the real + virtual 3D scene at different reconstruction depths, it was therefore necessary to scale the synthetic effective perspective images of the virtual 3D scene, as shown in Fig. 21.

Fig. 21. The diagram of synthetic effective perspective image scaling.

Suppose the synthetic effective perspective image of the virtual 3D scene generated by the EPISM method is $M \times M\textrm{ pixels}$; it should then be scaled to 1000×1000 pixels before being loaded onto the LCD panel, where $M = 1000 \times L_2^{vir}/13.8$. In this experiment, when $L_2^{vir}$ was set to 15.3 cm and 15.8 cm, we had $M = 1109\textrm{ pixels}$ and $1145\textrm{ pixels}$, respectively. The 1109×1109-pixel and 1145×1145-pixel synthetic effective perspective image sequences of the virtual 3D scene were scaled to 1000×1000 pixels, fused with the synthetic effective perspective image sequence of the real 3D scene, and then loaded onto the LCD panel for printing. Finally, the digital camera was focused at different depths to photograph the reconstructed scene. As shown in Fig. 22, ruler 1 was placed 13.8 cm in front of the holographic plate, and ruler 2 was placed $\mathrm{\Delta }L_2^{vir}$ away from ruler 1. The HS was illuminated by the planar conjugate beam of the original reference beam to reconstruct the fused-LF.
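
The scaling relation and the depth offsets quoted above can be reproduced with the short sketch below; the function name is introduced only for illustration.

```python
LCD_PIX = 1000    # side length of the image area loaded onto the LCD, in pixels
L2_REAL = 13.8    # cm, fixed LCD-to-plate distance (reference plane of the real scene)

def virtual_source_size(L2_vir_cm):
    """Side length M (pixels) of the virtual-scene synthetic effective perspective
    image before it is scaled down to 1000x1000 pixels: M = 1000 * L2_vir / 13.8."""
    return round(LCD_PIX * L2_vir_cm / L2_REAL)

for L2_vir in (15.3, 15.8):
    print(f"L2_vir = {L2_vir} cm -> M = {virtual_source_size(L2_vir)} px, "
          f"depth offset = {L2_vir - L2_REAL:.1f} cm")
# L2_vir = 15.3 cm -> M = 1109 px, depth offset = 1.5 cm
# L2_vir = 15.8 cm -> M = 1145 px, depth offset = 2.0 cm
```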

Fig. 22. The spatial position relation of the HS and rulers.

The photographs at different focus depths are shown in Fig. 23. When $\mathrm{\Delta }L_2^{vir} = 1.5\textrm{ cm}$, both ruler 1 and the main body of the real 3D scene are clear simultaneously in Fig. 23(a), while both ruler 2 and the virtual 3D scene are clear simultaneously in Fig. 23(b). Similarly, when $\mathrm{\Delta }L_2^{vir} = 2.0\textrm{ cm}$, both ruler 1 and the main body of the real 3D scene are clear simultaneously in Fig. 23(c), while both ruler 2 and the virtual 3D scene are clear simultaneously in Fig. 23(d). Comparing Fig. 23(b) with Fig. 23(d) shows that the defocus blur became more serious as the depth offset between the virtual and real 3D scenes increased. Nevertheless, the stereoscopic display of the real–virtual fused scene can be observed clearly and correctly.

Fig. 23. Fusion representation when the reproduction depths of the real and virtual scenes are inconsistent.

Comparing the reconstruction effect in Fig. 23(b) with that in Fig. 23(c) shows that the reconstruction quality of the virtual 3D scene decreased as the reconstruction depth increased, which reveals the curvature distortion caused by the inherent limitations of the HS. The fundamental reason is that the HS approximates the complete phase information with discrete sampled perspective images. As shown in Fig. 24, the reconstructed scene is a joint expression of the synthetic effective perspective images in multiple hogels, and the imaging plane is located on the reference plane. The wavefront of any point on the imaging plane can be represented by an ideal spherical wave, whereas the wavefront of an image point ${\textrm{A}_\textrm{r}}$ that deviates from the imaging plane is only approximated by the discrete wavefronts emitted by a series of hogels, including $\textrm{hoge}{\textrm{l}_n}$. There is therefore a curvature difference at $\textrm{hoge}{\textrm{l}_n}$ between the reconstructed wavefront, which corresponds to the point A on the imaging plane, and the ideal spherical wave of the point ${A_\textrm{r}}$; this is the curvature distortion effect. The magnitude of the curvature deviation is positively correlated with ${L_\textrm{A}}$ and ${L_\textrm{2}}$. The clearest imaging plane should be located on the reference plane of the 3D scene. For the holographic printing system in this experiment, the reference plane of the fused 3D scene is located on the depth-center plane of the real 3D scene, that is, ${L_\textrm{2}} = 13.8\textrm{ cm}$. When the virtual 3D scene is moved away from this plane, the curvature distortion gradually increases, resulting in a decrease in the quality of the reconstructed scene. An effective way to mitigate this effect is to increase the angular resolution of the sampled LF so that the reconstruction approaches the true wavefront of the 3D scene; our improved DIBR method can therefore be employed to generate denser sampled viewpoint images, which reduces the influence of curvature distortion on the reconstructed scene.

Fig. 24. The diagram of curvature distortion effect.

5. Conclusion

In this work, we built a reconstruction model of the degraded LF based on DIBR technology, which included the regularization and densification processing of the sampled viewpoints and reduced the distortion and view-flipping effect of the reconstructed scene. Additionally, an LF-fusion method for the HS based on pixel-affine projection was proposed, which completed the pixel-level fusion of the reconstructed real 3D scene and virtual 3D scene information. Finally, we printed the real–virtual scene-fused full-parallax holographic stereogram with the correct spatial-geometric relationship, and changed the spatial relationship of the fused 3D reconstruction by adjusting the sampling parameters of the virtual 3D scene.

Funding

National Key Research and Development Program of China (2017YFB1104500); National Natural Science Foundation of China (61775240); Foundation for the Author of National Excellent Doctoral Dissertation of the People’s Republic of China (201432).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.
