
Real-time three-dimensional video reconstruction of real scenes with deep depth using electro-holographic display system

Open Access

Abstract

Herein, we demonstrate a real-time, three-dimensional (3D) video-reconstruction system for real-world scenes with deep depth using electro-holography. We calculated computer-generated holograms using 3D information obtained with an RGB-D camera. We successfully reconstructed, in real time, a 3D video of a person moving in real-world space and confirmed that the proposed system operates at approximately 14 frames per second. In addition, we successfully reconstructed a full-color 3D video of the person. Furthermore, we varied the number of persons moving in the real-world space and evaluated the proposed system’s performance by varying the distance between the RGB-D camera and the person(s).

© 2019 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Three-dimensional (3D) display is a promising technique for realizing surgical-support systems in the medical field and next-generation 3D television systems. In particular, a next-generation 3D television system capable of bidirectional interactive communication is expected to become possible if 3D information of objects in real space can be displayed as 3D images in real time. 3D displays can be classified into three types: stereoscopic displays (based on geometric optics) [1,2], light-field displays (based on geometric optics) [3–5], and holographic displays (based on wave optics) [6,7]. Because stereoscopic and light-field displays can record and reconstruct only the intensity of light, the phase of the light is lost during 3D image reconstruction, and the quality of the 3D images may deteriorate. In contrast, because a holographic display records both the intensity and the phase of light as a hologram, it can accurately reconstruct the phase of light. Therefore, holographic displays can reconstruct high-quality 3D images with deep depth. Among holographic displays, we focus on electro-holography [8,9], which can reconstruct moving pictures by displaying holograms on a spatial light modulator (SLM). Because holograms displayed on the SLM can be treated as digital images, computer-generated holograms (CGHs) calculated by numerical simulation of light on a computer are mainly used in electro-holography. To realize a 3D display using electro-holography, considerable research has been conducted on 3D information acquisition in real space [10–13], CGH calculation [14–26], and 3D image reconstruction [27–30]. Although 3D image reconstruction using 3D information of real 3D objects has been reported [12–14,22,25], these studies did not perform the processing from acquiring 3D information to reconstructing 3D images continuously in real time. To realize real-time reconstruction of real scenes using electro-holography, a set of processes from acquiring 3D information to reconstructing 3D images must be performed continuously.

Real-time electro-holographic reconstruction of real scenes using a light-field technique [10,11] to capture 3D information of actual objects has been reported [31]. Light-field cameras can acquire 3D information of actual objects as light fields. In [31], a micro-lens array comprising many elemental lenses was used as a light-field camera. Because the light-field technique can easily realize occlusion culling, the occlusion of 3D images can be correctly reconstructed when the eye position is changed. However, with a light-field technique, the image quality of the acquired 3D objects deteriorates markedly if the distance between the 3D objects and the micro-lens array is long [32]. Although this problem can be solved using multiple ray-sampling planes [19,24] that correspond to the positions of the light-field camera [33], the images must be captured multiple times while varying the depth to acquire the 3D information of the objects clearly. In other words, 3D information of objects with deep depth cannot be clearly acquired at once using the light-field technique. Because acquiring 3D information with deep depth in this way takes a long time, it is difficult to capture dynamic scenes such as the movement of a person. This is a severe problem for the realization of an electro-holography-based next-generation 3D television system.

In this paper, we propose and demonstrate a real-time electro-holography system that overcomes these problems without employing the light-field technique. Figure 1 shows the schematic diagram of the proposed system. Here, we focus on the movement of a person and reconstruct the corresponding 3D images. In the input part, we acquire the 3D information of objects in real space as point cloud data having color and position information using an RGB-D camera [12–14,22,25]. In [12–14,22,25], the processing from 3D-object acquisition using an RGB-D camera to 3D-image reconstruction was not performed in real time. An RGB-D camera can acquire 3D information of objects with deep depth at once. Further, because the angle of view of the RGB-D camera depends on the camera lens, it can capture real scenes within the desired range. Therefore, an RGB-D camera is well suited to capturing 3D objects in real scenes. However, the acquired data may contain a huge amount of background information. Because the background is unnecessary for reconstructing the movement of a person, we extract the 3D information of the person by removing the background using a background subtraction technique [34]. Removing the background decreases the number of object points, so the CGH calculation can be performed at high speed. We then calculate CGHs using the 3D information of the person obtained by background subtraction. Because CGH calculation has a huge computational complexity, it must be accelerated to achieve real-time processing; we therefore parallelize the CGH calculation on a graphics processing unit (GPU). In the output part, the CGHs calculated in the calculation part are displayed on the SLM, and the 3D images are reconstructed by illuminating the SLM with the reconstruction light. Additionally, we evaluated the performance of the real-time system by changing the distance between the RGB-D camera and the person as well as the number of persons involved.


Fig. 1 Schematic diagram of the proposed system. Preprocessing is performed continuously from the input part to the output part.


2. Methods

2.1 Acquiring 3D information of 3D objects

In this section, we describe how the 3D information of real 3D objects is acquired using an RGB-D camera. Because an RGB-D camera is equipped with not only a color camera but also a depth sensor, it can simultaneously acquire color and depth images of real 3D objects. In this study, we used Kinect for Windows v2 (Kinect v2) from Microsoft Corporation [35] as the RGB-D camera. The depth sensor of Kinect v2 acquires depth information using the time-of-flight (ToF) method [36]. The resolutions of the color and depth images were 1,920 × 1,080 pixels and 512 × 424 pixels, respectively. Using the Kinect v2 software development kit, we downsampled the color images to the resolution of the depth images and obtained the color and position information of the real 3D objects from the color and depth images. Point cloud data having color and position information can then be generated from the captured images. The color information indicates the luminance values of blue, green, and red, and the position information indicates the coordinates of each point composing the point cloud. Because an RGB-D camera can acquire 3D information of objects with deep depth at once, it can accurately capture the movement of 3D objects along the depth direction. Figure 2 shows an example of a color image, a depth image of 3D objects, and a point cloud image generated from the color and position information. The depth image has 256 gradations, with near points rendered black and far points rendered white. Here, the point cloud data generated from the color and position information of the 3D objects were output as a PCD file and visualized using the Point Cloud Library 1.8.0 [37]. We calculated point-cloud-based CGHs using the acquired point cloud data. A layer-based algorithm that treats an object as multiple layered images can also be used to calculate CGHs from color and depth images [12–14,22,25,38]. However, dedicated hardware has been developed for point-cloud-based CGH calculation [39,40]. Because we plan to realize a real-time holographic reconstruction system for real scenes using such dedicated hardware in future work, we adopted point-cloud-based CGHs and acquired the 3D information of the 3D objects as point cloud data in this paper.
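To make the mapping from depth pixels to point-cloud coordinates concrete, the following minimal NumPy sketch back-projects a depth image through a pinhole camera model. It is only an illustration: the intrinsic parameters fx, fy, cx, and cy are placeholder values, not calibrated Kinect v2 parameters, and the actual system uses the coordinate-mapping functions of the Kinect v2 SDK rather than this code.

```python
import numpy as np

def depth_to_point_cloud(depth_m, color_bgr, fx=365.0, fy=365.0, cx=256.0, cy=212.0):
    """Back-project a depth image (in metres) into an N x 6 array [x, y, z, B, G, R].

    Minimal pinhole-camera sketch; fx, fy, cx, cy are placeholder intrinsics,
    not calibrated Kinect v2 values (the real system uses the Kinect v2 SDK).
    """
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinates (h x w grids)
    z = depth_m
    x = (u - cx) * z / fx                            # back-projection to camera coordinates
    y = (v - cy) * z / fy
    valid = z > 0                                    # discard pixels with no depth reading
    xyz = np.stack([x[valid], y[valid], z[valid]], axis=1)
    bgr = color_bgr[valid].astype(np.float64)        # color already resampled to the depth grid
    return np.hstack([xyz, bgr])

# Synthetic example at the Kinect v2 depth resolution (512 x 424 = 217,088 points):
depth = np.full((424, 512), 2.5, dtype=np.float32)   # a flat surface 2.5 m from the camera
color = np.zeros((424, 512, 3), dtype=np.uint8)
print(depth_to_point_cloud(depth, color).shape)      # (217088, 6)
```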


Fig. 2 Examples of a captured image using an RGB-D camera. (a) Color image acquired using a color camera. (b) Depth image acquired using a depth sensor. (c) Point cloud data generated from color and position information.


2.2 Background subtraction

The point cloud data of the 3D objects acquired using an RGB-D camera contain a huge amount of background information that does not change between frames. Because the CGH calculation is performed for every frame, as described in the subsequent sections, processing this static background every frame is inefficient. In addition, the background is unnecessary when reconstructing the movement of a person. Therefore, we used background subtraction to eliminate the background information that does not change between frames and to extract the moving objects that do change.

The background subtraction method detects foreground objects by comparing a background image that does not include the objects with a foreground image that does. Because the pixels of the foreground image that do not contain objects have the same values as the corresponding pixels of the background image, the portions that changed between the two images can be extracted as the objects by pixel-by-pixel subtraction.

In this study, we performed background subtraction by comparing the color and position information of the point cloud data of the background, acquired in advance, with those of the point cloud data of the 3D objects. Table 1 defines the vector variables used for background subtraction. The superscript $n$ denotes the index of each point composing the point cloud. The vectors $\mathbf{C}$ and $\mathbf{P}$ denote the color and position information, respectively. The subscripts "b," "in," "out," and "diff" indicate the background, the 3D objects, the moving object with changes between frames, and the difference between the background and the 3D objects, respectively. $\mathbf{C}$ has three components, $B$, $G$, and $R$, which indicate the luminance values of blue, green, and red. $\mathbf{P}$ has three components, $x$, $y$, and $z$, which indicate the $x$, $y$, and $z$ coordinates. The vector $\mathbf{T}$ denotes the threshold values for background subtraction, where the subscripts "c" and "p" indicate color and position, respectively; $\mathbf{T}_{\mathrm{c}}$ has components $B_T$, $G_T$, and $R_T$, and $\mathbf{T}_{\mathrm{p}}$ has components $x_T$, $y_T$, and $z_T$. $\mathbf{D}$ represents the relative deviation of the position information of the 3D objects and has components $x_D$, $y_D$, and $z_D$.


Table 1. Variables used for background subtraction.

Herein, we present the procedure of the background subtraction. First, $\mathbf{C}_{\mathrm{b}}^{n}$ and $\mathbf{P}_{\mathrm{b}}^{n}$, which correspond to the point cloud data of the background shown in Fig. 3(a), are acquired in advance. Second, $\mathbf{C}_{\mathrm{in}}^{n}$ and $\mathbf{P}_{\mathrm{in}}^{n}$, which correspond to the point cloud data of the 3D objects shown in Fig. 3(b), are acquired. Third, $\mathbf{C}_{\mathrm{diff}}^{n}$ and $\mathbf{P}_{\mathrm{diff}}^{n}$ are obtained using the following equations:

$$\mathbf{C}_{\mathrm{diff}}^{n}=\begin{bmatrix}B_{\mathrm{diff}}^{n}\\ G_{\mathrm{diff}}^{n}\\ R_{\mathrm{diff}}^{n}\end{bmatrix}=\begin{bmatrix}|B_{\mathrm{in}}^{n}-B_{\mathrm{b}}^{n}|\\ |G_{\mathrm{in}}^{n}-G_{\mathrm{b}}^{n}|\\ |R_{\mathrm{in}}^{n}-R_{\mathrm{b}}^{n}|\end{bmatrix},\tag{1}$$

$$\mathbf{P}_{\mathrm{diff}}^{n}=\begin{bmatrix}x_{\mathrm{diff}}^{n}\\ y_{\mathrm{diff}}^{n}\\ z_{\mathrm{diff}}^{n}\end{bmatrix}=\begin{bmatrix}|x_{\mathrm{in}}^{n}-x_{\mathrm{b}}^{n}|\\ |y_{\mathrm{in}}^{n}-y_{\mathrm{b}}^{n}|\\ |z_{\mathrm{in}}^{n}-z_{\mathrm{b}}^{n}|\end{bmatrix}.\tag{2}$$

$\mathbf{D}^{n}$ can be obtained as follows:

$$\mathbf{D}^{n}=\begin{bmatrix}x_{D}^{n}\\ y_{D}^{n}\\ z_{D}^{n}\end{bmatrix}=\begin{bmatrix}|x_{\mathrm{diff}}^{n}/x_{\mathrm{in}}^{n}|\\ |y_{\mathrm{diff}}^{n}/y_{\mathrm{in}}^{n}|\\ |z_{\mathrm{diff}}^{n}/z_{\mathrm{in}}^{n}|\end{bmatrix}.\tag{3}$$

Using Eqs. (1) and (3), we can check whether the threshold conditions are satisfied. The threshold conditions can be expressed as

$$(B_{\mathrm{diff}}^{n}>B_{T})\wedge(G_{\mathrm{diff}}^{n}>G_{T})\wedge(R_{\mathrm{diff}}^{n}>R_{T})\wedge(x_{D}^{n}>x_{T})\wedge(y_{D}^{n}>y_{T})\wedge(z_{D}^{n}>z_{T}).$$

Then, $\mathbf{C}_{\mathrm{out}}^{n}$ and $\mathbf{P}_{\mathrm{out}}^{n}$ are obtained as follows:

$$\mathbf{C}_{\mathrm{out}}^{n}=\begin{cases}\mathbf{C}_{\mathrm{in}}^{n}, & \text{if the threshold conditions are satisfied}\\ \mathbf{0}, & \text{otherwise},\end{cases}\tag{4}$$

$$\mathbf{P}_{\mathrm{out}}^{n}=\begin{cases}\mathbf{P}_{\mathrm{in}}^{n}, & \text{if the threshold conditions are satisfied}\\ \mathbf{0}, & \text{otherwise}.\end{cases}\tag{5}$$
Figure 4 shows the pseudo code of the algorithm for background subtraction. Equations (4) and (5) imply that the color and position information of background points is set to 0. Finally, we removed all points with $\mathbf{C}_{\mathrm{out}}^{n}=\mathbf{0}$ and $\mathbf{P}_{\mathrm{out}}^{n}=\mathbf{0}$ and obtained the point cloud data of the moving object. Here, the removed points also included outliers, i.e., points whose color and position information could not be obtained correctly and were therefore set to $\mathbf{C}_{\mathrm{in}}^{n}=\mathbf{0}$ and $\mathbf{P}_{\mathrm{in}}^{n}=\mathbf{0}$. Because the outliers are not needed for the CGH calculation, we removed them together with the background information before the CGH calculation. Figure 3(c) shows the point cloud data of the moving object after background subtraction. Although a little background information remained owing to noise from external disturbances and errors of the RGB-D camera, most of the background information was effectively removed.
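As a concrete illustration of Eqs. (1)–(5) and the pseudocode in Fig. 4, the following minimal NumPy sketch applies the per-point thresholds to point clouds stored as N × 3 arrays. It is an assumption-laden sketch rather than the authors' implementation: points with the same index $n$ are assumed to come from the same depth pixel in the background and input clouds, the six conditions are combined by logical AND as written above, and the default thresholds are the values given in Section 3.1.

```python
import numpy as np

def background_subtraction(C_in, P_in, C_b, P_b,
                           T_c=(3.0, 3.0, 3.0), T_p=(0.01, 0.01, 0.01)):
    """Per-point background subtraction following Eqs. (1)-(5).

    C_in, C_b : (N, 3) blue/green/red values of the input frame and the background.
    P_in, P_b : (N, 3) x/y/z coordinates of the input frame and the background.
    """
    C_diff = np.abs(C_in - C_b)                                    # Eq. (1)
    P_diff = np.abs(P_in - P_b)                                    # Eq. (2)
    with np.errstate(divide="ignore", invalid="ignore"):
        D = np.abs(P_diff / P_in)                                  # Eq. (3): relative deviation
    keep = np.all(C_diff > T_c, axis=1) & np.all(D > T_p, axis=1)  # threshold conditions
    C_out = np.where(keep[:, None], C_in, 0.0)                     # Eq. (4)
    P_out = np.where(keep[:, None], P_in, 0.0)                     # Eq. (5)
    # Remove zeroed points (background and outliers) before the CGH calculation.
    mask = np.any(P_out != 0.0, axis=1)
    return C_out[mask], P_out[mask]
```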


Fig. 3 Point cloud data used for background subtraction. (a) Background, (b) 3D objects, and (c) Moving object obtained using background subtraction.



Fig. 4 Pseudo code of the algorithm for background subtraction.


2.3 Computer-generated hologram (CGH)

In electro-holography, information about 3D objects is recorded as CGHs by calculating the propagation and interference of light on a computer. We calculated point-cloud-based CGHs using the 3D information of the moving object obtained after background subtraction. Figure 5 shows the schematic diagram for CGH calculation. By regarding the point cloud as an aggregation of point-light sources, the complex amplitude formed on the CGH plane, $U(x_a, y_a)$, can be expressed with the Fresnel approximation, under the condition that $z_j \gg (x_a - x_j)$ and $z_j \gg (y_a - y_j)$, as follows:

$$U(x_a,y_a)=\sum_{j=1}^{L}A_j\exp\!\left[i\frac{2\pi}{\lambda}\,\frac{(x_a-x_j)^{2}+(y_a-y_j)^{2}}{2z_j}\right],\tag{6}$$

where $(x_a, y_a)$ are the coordinates on the CGH plane, $(x_j, y_j, z_j)$ are the coordinates of the $j$-th point-light source, $L$ is the number of points composing the point cloud, $A_j$ is the luminance value of the $j$-th point-light source, $i$ is the imaginary unit, and $\lambda$ is the wavelength of the light. The argument of $U(x_a, y_a)$, $\phi(x_a, y_a)$, can be expressed as

$$\phi(x_a,y_a)=\arg\left[U(x_a,y_a)\right].\tag{7}$$

Using the argument obtained from Eq. (7), we calculated kinoforms, which are a type of phase-modulation CGH.
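To make Eqs. (6) and (7) concrete, the following NumPy sketch accumulates the Fresnel-approximated contribution of each point-light source on the hologram plane and then takes the argument to obtain the kinoform. The pixel count, pixel pitch, and green wavelength defaults mirror the values given in Section 3.1, but the code itself is only an illustrative CPU implementation, not the authors' GPU kernel.

```python
import numpy as np

def kinoform_cgh(points_xyz, amplitudes, nx=1920, ny=1080,
                 pitch=8.0e-6, wavelength=532e-9):
    """Compute a kinoform from a point cloud according to Eqs. (6) and (7).

    points_xyz : (L, 3) coordinates (x_j, y_j, z_j) of the point-light sources [m].
    amplitudes : (L,) luminance values A_j.
    Returns the phase distribution phi(x_a, y_a) on the hologram plane.
    """
    xa = (np.arange(nx) - nx / 2) * pitch            # hologram-plane coordinates
    ya = (np.arange(ny) - ny / 2) * pitch
    XA, YA = np.meshgrid(xa, ya)
    U = np.zeros((ny, nx), dtype=np.complex128)
    k = 2.0 * np.pi / wavelength
    for (xj, yj, zj), Aj in zip(points_xyz, amplitudes):   # O(L * Nx * Ny) accumulation
        r2 = (XA - xj) ** 2 + (YA - yj) ** 2
        U += Aj * np.exp(1j * k * r2 / (2.0 * zj))         # Eq. (6), Fresnel approximation
    return np.angle(U)                                     # Eq. (7): kinoform phase
```

For instance, with the arrays named as in the earlier background-subtraction sketch, kinoform_cgh(P_out, C_out[:, 1]) would use the green-channel luminance values as $A_j$; these names are, again, only illustrative.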


Fig. 5 Schematic diagram for CGH calculation. CGH calculation can be performed by considering the recorded objects as point-light sources.


The computational complexity of the CGH calculation is $O(LN_xN_y)$ when the resolution of the CGH is $N_x \times N_y$ pixels. Because it is difficult to calculate CGHs in real time with only a central processing unit (CPU), we used a GPU in this paper. A GPU has far more cores than a CPU and can speed up the CGH calculation by parallelizing the task [41,42]. Figure 6 shows the pseudo code of the algorithm for CGH calculation.
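Because every hologram pixel in Eq. (6) can be evaluated independently, the accumulation maps naturally onto GPU threads. As a rough Python illustration of this parallelization (the paper's actual implementation uses hand-written CUDA 8.0 kernels, not this code), the NumPy arrays above can be replaced by CuPy arrays so that the per-pixel arithmetic runs on the GPU:

```python
# Hedged sketch: the same accumulation as kinoform_cgh(), but on the GPU with CuPy,
# used here only as a stand-in for the paper's CUDA kernels.
import numpy as np
import cupy as cp

def kinoform_cgh_gpu(points_xyz, amplitudes, nx=1920, ny=1080,
                     pitch=8.0e-6, wavelength=532e-9):
    xa = (cp.arange(nx) - nx / 2) * pitch
    ya = (cp.arange(ny) - ny / 2) * pitch
    XA, YA = cp.meshgrid(xa, ya)                  # each hologram pixel is independent,
    U = cp.zeros((ny, nx), dtype=cp.complex128)   # so the per-pixel work maps onto GPU threads
    k = 2.0 * np.pi / wavelength
    for (xj, yj, zj), Aj in zip(points_xyz, amplitudes):
        r2 = (XA - xj) ** 2 + (YA - yj) ** 2
        U += Aj * cp.exp(1j * k * r2 / (2.0 * zj))
    return cp.asnumpy(cp.angle(U))                # copy the kinoform back to the host
```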


Fig. 6 Pseudo code of the algorithm for CGH calculation.


3. Experiment

3.1 Process flowchart and optical setup

Figure 7 shows the flowchart illustrating the procedure of the constructed system. In this paper, we used Kinect v2 as the RGB-D camera to acquire 3D information. The ranges of the color information and the depth information were 0 to 255 and 0.5 m to 4.5 m, respectively. The horizontal and vertical angles of view were 70° and 60°, respectively. Figure 8 shows a schematic of the geometry used to capture a color image and a depth image with the RGB-D camera. A standing person served as the moving object. To capture the whole body of the person, we positioned the RGB-D camera at a height of 0.85 m from the floor, as shown in Fig. 8(a). Further, we set the movement range of the person to between 2.5 m and 3.5 m from the RGB-D camera. The 3D information of the background and the 3D objects in real space was acquired with the RGB-D camera as point cloud data having color and position information. The number of object points composing the acquired point cloud is determined by the resolution of the depth sensor (512 × 424), i.e., 217,088 points. To prevent delays in the real-time processing, the constructed system stores the point cloud data in memory instead of outputting them as a PCD file.


Fig. 7 Flowchart illustrating the procedure of the constructed system. Acquiring 3D information of background is performed in advance.


Fig. 8 (a) Schematic of the geometry for capturing a color image and a depth image using an RGB-D camera. (b) Actual photographing situation.

Background subtraction was performed using the acquired 3D information of the background and the 3D objects, and the point cloud of the person was extracted. We set the threshold values of $\mathbf{T}_{\mathrm{c}}$ and $\mathbf{T}_{\mathrm{p}}$ for the background subtraction as $B_T = G_T = R_T = 3$ and $x_T = y_T = z_T = 0.01$, respectively. After background subtraction, the CGH calculation was performed based on Eqs. (6) and (7). The calculated CGHs were displayed on an SLM to reconstruct the 3D images.

Figure 9 shows the optical setup used to reconstruct the full-color 3D images. We used a blue laser (450 nm; LB), a green laser (532 nm; LG), and a red laser (650 nm; LR) as light sources. The light emitted from the lasers was aligned using mirrors (M1, M2, and M3) and dichroic mirrors (DMB1 and DMR1). The aligned light was collimated by a microscope objective (OL) and a collimator lens (CL). The collimated light was introduced into the SLMs (SB, SG, and SR) using a half mirror (H), dichroic mirrors (DMB2 and DMR2), and a mirror (M4). Phase-modulation-type SLMs (Holoeye Photonics AG, "PLUTO") with 1,920 × 1,080 pixels and a pixel pitch of 8.0 μm × 8.0 μm were used to display the CGHs. The phase-modulation gradation and maximum refresh rate of the SLMs were 256 levels and 60 Hz, respectively, so we quantized the calculated CGHs to 256 gradations to match the SLMs. We used Microsoft Windows 10 Enterprise as the operating system, with a CPU (Intel Core i7-7700K, 4.20 GHz) and a GPU (NVIDIA GeForce GTX 1080 Ti) for processing the acquired 3D information and calculating the CGHs. Microsoft Visual Studio Enterprise 2015 [43] and Compute Unified Device Architecture (CUDA) 8.0 [44] were used as the integrated development environments for the PC and GPU, respectively. We used OpenGL 4.6.0 [45] to display the CGHs. The reconstruction light from each SLM was recombined by M4, DMB2, and DMR2 and introduced into relay lenses (RL1 and RL2) by a mirror (M5). Because an aperture (AP) was positioned at the co-focal plane of RL1 and RL2, the 0th-order diffraction light was removed from the reconstructed light by the AP. The 3D images were then observed through a field lens (F).
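The quantization of the kinoform to the SLM's 256 gradations mentioned above can be sketched as follows; the sketch assumes that the 256 gray levels span a full 2π of phase modulation, which is an assumption of this illustration rather than a measured property of the PLUTO panels.

```python
import numpy as np

def quantize_phase(phi, levels=256):
    """Map a phase distribution to 8-bit gray levels for display on the SLM.

    Assumes the SLM's 256 gray levels span a full 2*pi of phase modulation.
    """
    phi_wrapped = np.mod(phi, 2.0 * np.pi)                  # shift phase into [0, 2*pi)
    gray = np.floor(phi_wrapped / (2.0 * np.pi) * levels)   # quantize to 256 steps
    return np.clip(gray, 0, levels - 1).astype(np.uint8)
```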

Fig. 9 Optical setup for reconstructing full-color 3D images using electro-holography. Herein, AP represents the aperture and CL represents the collimator lens. DMB1, DMB2, DMR1, and DMR2 represent dichroic mirrors. F represents the field lens. H represents the half mirror. LB, LG, and LR represent the blue, green, and red lasers, respectively. M1, M2, M3, M4, and M5 represent mirrors. OL represents the objective lens. RL1 and RL2 represent relay lenses. SB, SG, and SR represent the SLMs for blue, green, and red reconstruction, respectively.

Figure 10 shows the display system used to demonstrate the behavior of the proposed system. We constructed the display system with three monitors so that the real scene and the reconstructed 3D images can be observed in real time. The CGHs were calculated on a PC using the 3D information obtained with the RGB-D camera. The reconstructed 3D images and the real scene were captured using digital video cameras. Although the real scene could be displayed on the monitor directly from the RGB-D camera, we used a separate digital video camera to avoid delaying the real-time processing.


Fig. 10 Display system used to demonstrate the behavior of the proposed system. The upper-left monitor displays the real scene captured at the input part. The bottom monitor displays the CGH generated at the calculation part. The upper-right monitor displays the 3D image reconstructed at the output part. Each image was captured by each digital video camera simultaneously (see Visualization 1).


3.2 Results

First, we used only the green laser and one SLM to reconstruct monochromatic 3D images. Table 2 shows the measured time per frame for each process during monochromatic reconstruction. The processing times in Table 2 are averages over 500 frames. Because the number of object points of the person after background subtraction differed from frame to frame, we averaged the number of object points over 500 frames and obtained 13,518 points. Since the processing time per frame was 73 ms, real-time operation at about 14 frames per second (fps) was demonstrated.


Table 2. Measurement time per frame for each process during monochromatic reconstruction.

Figure 11 shows the results of the monochromatic reconstruction. The pictures in the left column show the real scenes captured with a digital video camera. The CGH calculation was performed using the point cloud data shown in the central column. The pictures in the right column show the reconstructed 3D images. Although residual background that was not removed by the background subtraction was reconstructed as noise, the movement of the person was correctly reconstructed. Moreover, the facial expression of the person and the pattern of the clothes can be easily observed.

Fig. 11 Results of monochromatic reconstruction. (a), (d), (g), and (j) represent 3D objects of real scenes. (b), (e), (h), and (k) represent point cloud data after background subtraction and outlier removal. (c), (f), (i), and (l) represent the reconstructed 3D images (see Visualization 2).

Further, we verified the effect of background subtraction. We made a person stand 2.5 m away from the RGB-D camera and photographed the person. Table 3 shows the time taken to perform the CGH calculation with and without background subtraction. Because the number of object points used for the CGH calculation decreases when background subtraction is applied, the CGH calculation with background subtraction can be performed faster than that without it. Figure 12 shows the 3D images reconstructed with and without background subtraction. Without background subtraction, the contrast of the person was reduced because most of the object points were occupied by the background, and this large number of background points generated noise, as shown in Fig. 12(b). In contrast, with background subtraction, the 3D image of the person was clearly reconstructed. A previous study proposed a downsampling method using a voxel grid filter to reduce the number of object points [46]. The reconstructed 3D image shown in Fig. 12(c) was obtained by extracting 16,754 points from 180,326 points using a voxel grid filter. Although the noise was reduced, the person was not clearly reconstructed because the background still contained far more object points than the person.
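For reference, the voxel-grid downsampling mentioned above can be sketched as follows: points are bucketed into cubic voxels and each occupied voxel is replaced by the centroid of its points. This is a generic formulation given only to illustrate the idea; the leaf size of 0.02 m is an arbitrary example, and the exact filter parameters of [46] are not reproduced here.

```python
import numpy as np

def voxel_grid_filter(points_xyz, colors, leaf=0.02):
    """Downsample a point cloud by averaging all points inside each cubic voxel.

    leaf : voxel edge length in metres (0.02 m is an arbitrary illustrative value).
    """
    keys = np.floor(points_xyz / leaf).astype(np.int64)     # voxel index of each point
    _, inverse, counts = np.unique(keys, axis=0, return_inverse=True, return_counts=True)
    inverse = inverse.ravel()
    xyz_sum = np.zeros((counts.size, 3))
    rgb_sum = np.zeros((counts.size, 3))
    np.add.at(xyz_sum, inverse, points_xyz)                 # accumulate per occupied voxel
    np.add.at(rgb_sum, inverse, colors)
    return xyz_sum / counts[:, None], rgb_sum / counts[:, None]   # voxel centroids
```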


Table 3. Time taken to perform CGH calculation by using and by not using background subtraction.


Fig. 12 (a) Reconstructed 3D images using background subtraction. (b) Reconstructed 3D images not using background subtraction. (c) Reconstructed 3D images using a voxel grid filter.


We expanded the monochromatic reconstruction system to a full-color reconstruction system by using blue and red light in addition to green light, in order to reconstruct more realistic 3D images that are closer to the actual objects. We captured real scenes with the RGB-D camera under the same conditions as in Fig. 8 and reconstructed full-color 3D images using the optical setup shown in Fig. 9.

Table 4 shows the measured time per frame for each process during full-color reconstruction. Since the processing time per frame was 199 ms, the full-color reconstruction system operated at about 5 fps. The CGH calculation took about three times longer than in the monochromatic reconstruction system shown in Table 2 because three CGHs must be calculated per frame in full-color reconstruction. Because the CGH calculation occupies most of the processing time and a refresh rate of 5 fps might be insufficient to demonstrate real-time performance, the CGH calculation must be accelerated further. One method for speeding up the CGH calculation is to perform parallel processing with a clustered system of dedicated hardware [18,39] in which each unit calculates the CGH for one color. Another method is to adopt Fresnel diffraction based on a fast Fourier transform (FFT) [12–14,22,25]; FFT-based Fresnel diffraction can dramatically reduce the computational complexity of the CGH calculation compared with the case of no acceleration algorithm.
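As a rough sketch of the FFT-based alternative, the single-FFT Fresnel diffraction of one complex layer of the scene can be written as follows; a layer-based CGH would propagate each depth layer this way and superpose the results on the hologram plane. This is a textbook single-FFT formulation given only for illustration (note that the output sampling pitch becomes λz/(Np)), not the specific method of [12–14,22,25].

```python
import numpy as np

def fresnel_fft_layer(u0, z, pitch=8.0e-6, wavelength=532e-9):
    """Single-FFT Fresnel propagation of one complex layer u0 over distance z [m].

    Note: the output sampling pitch of this formulation is wavelength * z / (N * pitch),
    so a layer-based CGH must handle the rescaling; this sketch ignores that bookkeeping.
    """
    ny, nx = u0.shape
    x = (np.arange(nx) - nx / 2) * pitch
    y = (np.arange(ny) - ny / 2) * pitch
    X, Y = np.meshgrid(x, y)
    k = 2.0 * np.pi / wavelength
    chirp_in = np.exp(1j * k * (X**2 + Y**2) / (2.0 * z))          # quadratic phase on the layer
    spectrum = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(u0 * chirp_in)))
    fx = (np.arange(nx) - nx / 2) / (nx * pitch)                   # output-plane frequencies
    fy = (np.arange(ny) - ny / 2) / (ny * pitch)
    FX, FY = np.meshgrid(fx, fy)
    chirp_out = np.exp(1j * k * z) / (1j * wavelength * z) \
                * np.exp(1j * np.pi * wavelength * z * (FX**2 + FY**2))
    return chirp_out * spectrum
```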


Table 4. Measurement time per frame for each process during full-color reconstruction.

Figure 13 shows the results of full-color reconstruction. The pictures in the left and right columns show the real scenes and the reconstructed full-color 3D images, respectively, and those in the central column show the point cloud data after background subtraction. As shown in Fig. 13, the texture of the person, such as the clothes and skin, was faithfully reconstructed.

Fig. 13 Results of full-color reconstruction. (a), (d), (g), and (j) represent 3D objects of real scenes. (b), (e), (h), and (k) represent point cloud data after background subtraction. (c), (f), (i), and (l) represent the reconstructed 3D images (see Visualization 3).

4. Discussion

The total number of object points varies from frame to frame because of the movement of the person. In particular, because an RGB-D camera captures images based on perspective projection, the number of acquired object points varies with the distance between the person and the RGB-D camera; hence, the processing time per frame is expected to vary with this distance. Furthermore, because we extract the people by background subtraction to speed up the CGH calculation, the number of acquired object points also changes with the number of people, so the processing time per frame is expected to change accordingly. Therefore, we evaluated the relationship between the time taken for the CGH calculation and the total number of object points by varying the distance at which people were placed and the number of people involved. Figure 14 shows the schematic of this evaluation. Under the first condition, P1, one person stood 2.5 m away from the RGB-D camera. Under the second condition, P2, one person stood 3.5 m away from the RGB-D camera. Under the third condition, P3, two people stood 2.5 m and 3.5 m away from the RGB-D camera, respectively. Table 5 shows the measured CGH calculation time under each condition. Comparing P1 and P2 in Table 5, when the distance between the person and the RGB-D camera increased, the CGH calculation time was shortened because the total number of object points of the person decreased. Conversely, the CGH calculation time was lengthened when the number of people increased, because the number of object points composing the point cloud of the people increased. As a method of overcoming this problem, the inter-frame subtraction proposed by Kim et al. [47] has been found to be effective: people do not always move even when their number increases, so more efficient processing can be performed by extracting only those who are moving. Figure 15 shows the reconstruction results under each condition. From the results shown in Figs. 15(a)-(d), highly accurate 3D information could be acquired with the RGB-D camera even when the distance between the person and the camera changed, and the 3D images were reconstructed with comparable quality.


Fig. 14 Schematic for evaluating the relationship. In P1, we made a person stand 2.5 m away from an RGB-D camera. In P2, we made a person stand 3.5 m away from an RGB-D camera. In P3, we made people stand 2.5 m and 3.5 m away from an RGB-D camera, respectively.



Table 5. Measurement time taken for CGH calculation under each condition.


Fig. 15 Reconstruction results under each condition. (a) Point cloud data after background subtraction and (b) reconstructed 3D image when a person was placed 2.5 m away from an RGB-D camera. (c) Point cloud data after background subtraction and (d) reconstructed 3D image when a person was placed 3.5 m away from an RGB-D camera. (e) Point cloud data after background subtraction and (f) reconstructed 3D image when people were placed 2.5 m and 3.5 m away from an RGB-D camera, respectively.


5. Conclusion

In this paper, we examined real-time 3D video reconstruction of real scenes for realizing an electro-holography-based 3D display. We succeeded in real-time 3D video reconstruction of real scenes using electro-holography by continuously processing from the input part to the output part with an RGB-D camera and a GPU. We confirmed that the monochromatic reconstruction system can faithfully reconstruct the movement of a person and operate at about 14 fps when the number of object points was about 13,518 and the resolution of the CGHs was 1,920 × 1,080 pixels. Further, we constructed a full-color reconstruction system and evaluated the case in which the distance between the person and the RGB-D camera changed. We confirmed that the full-color reconstruction system can faithfully reconstruct the appearance of a person in real space and operate at about 5 fps. In addition, we confirmed that the number of object points of the person decreased when the distance between the RGB-D camera and the person increased. In future work, we will aim to further speed up the CGH calculation in order to reconstruct more realistic 3D video images in both monochromatic and full-color reconstruction.

Funding

Kenjiro Takayanagi Foundation and the Institute for Global Prominent Research, Chiba University.

References

1. P. V. Johnson, J. A. Parnell, J. Kim, C. D. Saunter, G. D. Love, and M. S. Banks, “Dynamic lens and monovision 3D displays to improve viewer comfort,” Opt. Express 24(11), 11808–11827 (2016). [CrossRef]   [PubMed]  

2. S. Lee, J. Park, J. Heo, B. Kang, D. Kang, H. Hwang, J. Lee, Y. Choi, K. Choi, and D. Nam, “Autostereoscopic 3D display using directional subpixel rendering,” Opt. Express 26(16), 20233 (2018). [CrossRef]   [PubMed]  

3. G. Lippmann, “Epreuves reversibles. Photographies integrals,” C. R. Acad. Sci. 146, 446–451 (1908).

4. T. Naemura, T. Yoshida, and H. Harashima, “3-D computer graphics based on integral photography,” Opt. Express 8(4), 255–262 (2001). [CrossRef]   [PubMed]  

5. R. Yang, X. Huang, S. Li, and C. Jaynes, “Toward the light field display: autostereoscopic rendering via a cluster of projectors,” IEEE Trans. Vis. Comput. Graph. 14(1), 84–96 (2008). [CrossRef]   [PubMed]  

6. X. Li, C. P. Chen, Y. Li, P. Zhou, X. Jiang, N. Rong, S. Liu, G. He, J. Lu, and Y. Su, “High-efficiency video-rate holographic display using quantum dot doped liquid crystal,” J. Disp. Technol. 12(4), 362–367 (2016). [CrossRef]  

7. Z. Zhang, C. P. Chen, Y. Li, B. Yu, L. Zhou, and Y. Wu, “Angular multiplexing of holographic display using tunable multi-stage gratings,” Mol. Cryst. Liq. Cryst. (Phila. Pa.) 657(1), 102–106 (2017). [CrossRef]  

8. P. S. Hilaire, S. A. Benton, M. Lucente, M. L. Jepsen, J. Kollin, H. Yoshikawa, and J. Underkoffler, “Electronic display system for computational holography,” Proc. SPIE 1212, 174–182 (1990). [CrossRef]  

9. N. Masuda, T. Ito, T. Tanaka, A. Shiraki, and T. Sugie, “Computer generated holography using a graphics processing unit,” Opt. Express 14(2), 603–608 (2006). [CrossRef]   [PubMed]  

10. T. Mishina, M. Okui, and F. Okano, “Calculation of holograms from elemental images captured by integral photography,” Appl. Opt. 45(17), 4026–4036 (2006). [CrossRef]   [PubMed]  

11. Y. Endo, K. Wakunami, T. Shimobaba, T. Kakue, D. Arai, Y. Ichihashi, K. Yamamoto, and T. Ito, “Computer-generated hologram calculation for real scenes using a commercial portable plenoptic camera,” Opt. Commun. 356, 468–471 (2015). [CrossRef]  

12. Y. Zhao, K. C. Kwon, M. U. Erdenebat, M. S. Islam, S. H. Jeon, and N. Kim, “Quality enhancement and GPU acceleration for a full-color holographic system using a relocated point cloud gridding method,” Appl. Opt. 57(15), 4253–4262 (2018). [CrossRef]   [PubMed]  

13. E. Y. Chang, J. Choi, S. Lee, S. Kwon, J. Yoo, M. Park, and J. Kim, “360-degree color hologram generation for real 3D objects,” Appl. Opt. 57(1), A91–A100 (2018). [CrossRef]   [PubMed]  

14. H. Kang, C. Ahn, S. Lee, and S. Lee, “Computer-generated 3D holograms of depth-annotated images,” Proc. SPIE 5742, 234–241 (2005). [CrossRef]  

15. K. Matsushima and S. Nakahara, “Extremely high-definition full-parallax computer-generated hologram created by the polygon-based method,” Appl. Opt. 48(34), H54–H63 (2009). [CrossRef]   [PubMed]  

16. R. H. Y. Chen and T. D. Wilkinson, “Computer generated hologram from point cloud using graphics processor,” Appl. Opt. 48(36), 6841–6850 (2009). [CrossRef]   [PubMed]  

17. T. Shimobaba, N. Masuda, and T. Ito, “Simple and fast calculation algorithm for computer-generated hologram with wavefront recording plane,” Opt. Lett. 34(20), 3133–3135 (2009). [CrossRef]   [PubMed]  

18. Y. Ichihashi, H. Nakayama, T. Ito, N. Masuda, T. Shimobaba, A. Shiraki, and T. Sugie, “HORN-6 special-purpose clustered computing system for electroholography,” Opt. Express 17(16), 13895–13903 (2009). [CrossRef]   [PubMed]  

19. K. Wakunami and M. Yamaguchi, “Calculation for computer generated hologram using ray-sampling plane,” Opt. Express 19(10), 9086–9101 (2011). [CrossRef]   [PubMed]  

20. Y. Pan, Y. Wang, J. Liu, X. Li, and J. Jia, “Fast polygon-based method for calculating computer-generated holograms in three-dimensional display,” Appl. Opt. 52(1), A290–A299 (2013). [CrossRef]   [PubMed]  

21. T. Shimobaba and T. Ito, “Fast generation of computer-generated holograms using wavelet shrinkage,” Opt. Express 25(1), 77–87 (2017). [CrossRef]   [PubMed]  

22. Y. Zhao, Y. Piao, S. Park, K. Lee, and N. Kim, “Fast calculation method for full-color computer-generated hologram of real objects captured by depth camera,” Electronic Imaging 2018(4), 250–251 (2018).

23. S. Yamada, T. Kakue, T. Shimobaba, and T. Ito, “Interactive holographic display based on finger gestures,” Sci. Rep. 8(1), 2010 (2018). [CrossRef]   [PubMed]  

24. S. Igarashi, T. Nakamura, K. Matsushima, and M. Yamaguchi, “Efficient tiled calculation of over-10-gigapixel holograms using ray-wavefront conversion,” Opt. Express 26(8), 10773–10786 (2018). [CrossRef]   [PubMed]  

25. Y. Zhao, C. Shi, K. Kwon, Y. Piao, M. Piao, and N. Kim, “Fast calculation method of computer-generated hologram using a depth camera with point cloud gridding,” Opt. Commun. 411, 166–169 (2018). [CrossRef]  

26. T. Kakue, Y. Wagatsuma, S. Yamada, T. Nishitsuji, Y. Endo, Y. Nagahama, R. Hirayama, T. Shimobaba, and T. Ito, “Review of real-time reconstruction techniques for aerial-projection holographic displays,” Opt. Eng. 57(06), 1 (2018). [CrossRef]  

27. J. Hahn, H. Kim, Y. Lim, G. Park, and B. Lee, “Wide viewing angle dynamic holographic stereogram with a curved array of spatial light modulators,” Opt. Express 16(16), 12372–12386 (2008). [CrossRef]   [PubMed]  

28. M. Makowski, M. Sypek, I. Ducin, A. Fajst, A. Siemion, J. Suszek, and A. Kolodziejczyk, “Experimental evaluation of a full-color compact lensless holographic display,” Opt. Express 17(23), 20840–20846 (2009). [CrossRef]   [PubMed]  

29. K. Yamamoto, Y. Ichihashi, T. Senoh, R. Oi, and T. Kurita, “3D objects enlargement technique using an optical system and multiple SLMs for electronic holography,” Opt. Express 20(19), 21137–21144 (2012). [CrossRef]   [PubMed]  

30. G. Xue, J. Liu, X. Li, J. Jia, Z. Zhang, B. Hu, and Y. Wang, “Multiplexing encoding method for full-color dynamic 3D holographic display,” Opt. Express 22(15), 18473–18482 (2014). [CrossRef]   [PubMed]  

31. Y. Ichihashi, R. Oi, T. Senoh, K. Yamamoto, and T. Kurita, “Real-time capture and reconstruction system with multiple GPUs for a 3D live scene by a generation from 4K IP images to 8K holograms,” Opt. Express 20(19), 21645–21655 (2012). [CrossRef]   [PubMed]  

32. M. Yamaguchi, “Light-field and holographic three-dimensional displays [Invited],” J. Opt. Soc. Am. A 33(12), 2348–2364 (2016). [CrossRef]   [PubMed]  

33. K. Wakunami, H. Yamashita, and M. Yamaguchi, “Occlusion culling for computer generated hologram based on ray-wavefront conversion,” Opt. Express 21(19), 21811–21822 (2013). [CrossRef]   [PubMed]  

34. M. Heikklä and M. Pietikäinen, “A texture-based method for modeling the background and detecting moving objects,” IEEE Trans. Pattern Anal. Mach. Intell. 28(4), 657–662 (2006). [CrossRef]   [PubMed]  

35. Microsoft Corporation, https://www.microsoft.com.

36. H. Sarbolandi, D. Lefloch, and A. Kolb, “Kinect range sensing: structured-light versus time-of-flight kinect,” Comput. Vis. Image Underst. 139, 1–20 (2015). [CrossRef]  

37. Point Cloud Library, http://www.pointclouds.org/.

38. Y. Zhao, L. Cao, H. Zhang, D. Kong, and G. Jin, “Accurate calculation of computer-generated holograms using angular-spectrum layer-oriented method,” Opt. Express 23(20), 25440–25449 (2015). [CrossRef]   [PubMed]  

39. T. Sugie, T. Akamatsu, T. Nishitsuji, R. Hirayama, N. Masuda, H. Nakayama, Y. Ichihashi, A. Shiraki, M. Oikawa, N. Takada, Y. Endo, T. Kakue, T. Shimobaba, and T. Ito, “High-performance parallel computing for next-generation holographic imaging,” Nature Electron. 1(4), 254–259 (2018). [CrossRef]  

40. Y. Yamamoto, H. Nakayama, N. Takada, T. Nishitsuji, T. Sugie, T. Kakue, T. Shimobaba, and T. Ito, “Large-scale electroholography by HORN-8 from a point-cloud model with 400,000 points,” Opt. Express 26(26), 34259–34265 (2018). [CrossRef]   [PubMed]  

41. H. Nakayama, N. Takada, Y. Ichihashi, S. Awazu, T. Shimobaba, N. Masuda, and T. Ito, “Real-time color electroholography using multiple graphics processing units and multiple high-definition liquid-crystal display panels,” Appl. Opt. 49(31), 5993–5996 (2010). [CrossRef]  

42. H. Sato, T. Kakue, Y. Ichihashi, Y. Endo, K. Wakunami, R. Oi, K. Yamamoto, H. Nakayama, T. Shimobaba, and T. Ito, “Real-time colour hologram generation based on ray-sampling plane with multi-GPU acceleration,” Sci. Rep. 8(1), 1500 (2018). [CrossRef]   [PubMed]  

43. Visual Studio, https://visualstudio.microsoft.com.

44. CUDA, https://developer.nvidia.com/cuda-zone.

45. OpenGl, https://www.opengl.org/.

46. S. Hasegawa, H. Yanagihara, Y. Yamamoto, T. Kakue, T. Shimobaba, and T. Ito, “Electroholography of real scenes by RGB-D camera and the downsampling method,” OSA Continuum 2(5), 1629–1638 (2019). [CrossRef]  

47. S. C. Kim and E. S. Kim, “Fast one-step calculation of holographic videos of three-dimensional scenes by combined use of baseline and depth-compensating principal fringe patterns,” Opt. Express 22(19), 22513–22527 (2014). [CrossRef]   [PubMed]  

Supplementary Material (3)

Visualization 1: Demonstration of the display system (see Fig. 10).
Visualization 2: Monochromatic reconstruction results (see Fig. 11).
Visualization 3: Full-color reconstruction results (see Fig. 13).
