
Real-time three-dimensional video reconstruction of real scenes with deep depth using electro-holographic display system

Open Access

Abstract

Herein, we demonstrate a real-time, three-dimensional (3D) video-reconstruction system for real-world scenes with deep depth using electro-holography. We calculated computer-generated holograms using 3D information obtained with an RGB-D camera. We successfully reconstructed, in real time, a 3D video of a person moving in real-world space and confirmed that the proposed system operates at approximately 14 frames per second. In addition, we successfully reconstructed a full-color 3D video of the person. Furthermore, we varied the number of persons moving in the real-world space and evaluated the proposed system’s performance by varying the distance between the RGB-D camera and the person(s).

© 2019 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Three-dimensional (3D) display is a promising technique for realizing surgical-support systems in the medical field and next-generation 3D television systems. In particular, a next-generation 3D television system capable of bidirectional interactive communication is expected to become possible if 3D information of objects in real space can be displayed as 3D images in real time. 3D displays can be classified into three types: stereoscopic displays (based on geometric optics) [1,2], light-field displays (based on geometric optics) [3–5], and holographic displays (based on wave optics) [6,7]. Because stereoscopic and light-field displays can record and reconstruct only the intensity of light, the phase of the light is lost during 3D image reconstruction, and the quality of the 3D images may deteriorate. In contrast, because a holographic display records both the intensity and the phase of light as a hologram, it can accurately reconstruct the phase of light. Therefore, holographic displays can reconstruct high-quality 3D images with deep depth. Among holographic displays, we focus on electro-holography [8,9], which can reconstruct moving pictures by displaying holograms on a spatial light modulator (SLM). Because holograms displayed on the SLM can be treated as digital images, computer-generated holograms (CGHs) calculated by numerical simulation of light on a computer are mainly used in electro-holography. To realize a 3D display using electro-holography, considerable research has been conducted on 3D information acquisition in real space [10–13], CGH calculation [14–26], and 3D image reconstruction [27–30]. Although 3D image reconstruction using 3D information of real 3D objects has been reported [12–14,22,25], these studies did not perform the processing from acquiring 3D information to reconstructing 3D images continuously in real time. To realize real-time reconstruction of real scenes using electro-holography, a set of processes from acquiring 3D information to reconstructing 3D images must be performed continuously.

Real-time electro-holographic reconstruction of real scenes using a light-field technique [10,11] to capture 3D information of actual objects has been reported [31]. Light-field cameras can acquire 3D information of actual objects as light fields. In [31], a micro-lens array comprising many elemental lenses was used as a light-field camera. Because the light-field technique can easily realize occlusion culling, the occlusion of 3D images can be correctly reconstructed when the eye position is changed. However, with a light-field technique, the image quality of the acquired 3D objects deteriorates markedly if the distance between the 3D objects and the micro-lens array is long [32]. Although this problem can be solved using multiple ray-sampling planes [19,24] that correspond to the positions of the light-field camera [33], the images must be captured multiple times while varying the depth to acquire the 3D information of the objects clearly. In other words, 3D information of objects with deep depth cannot be clearly acquired at once using the light-field technique. Because acquiring 3D information with deep depth in this way takes a long time, it is difficult to capture dynamic scenes such as the movement of a person. This is a severe problem for the realization of an electro-holography-based next-generation 3D television system.

In this paper, we propose and demonstrate a real-time electro-holography system that overcomes these problems without employing the light-field technique. Figure 1 shows the schematic diagram of the proposed system. Here, we focus on the movement of a person and reconstruct the corresponding 3D images. In the input part, we acquire the 3D information of objects in real space as point cloud data having color and position information using an RGB-D camera [12–14,22,25]. In [12–14,22,25], the processing from 3D-object acquisition using an RGB-D camera to 3D-image reconstruction was not performed in real time. An RGB-D camera can acquire 3D information of objects with deep depth at once. Further, because the angle of view of the RGB-D camera depends on the camera lens, it can capture real scenes within the desired range. Therefore, an RGB-D camera is well suited to capturing 3D objects in real scenes. However, the acquired data may contain a huge amount of background information. Because the background is unnecessary for reconstructing the movement of a person, we extract the 3D information of the person by removing the background using a background subtraction technique [34]. Removing the background decreases the number of object points, so the CGH calculation can be performed at high speed. We then calculate CGHs using the 3D information of the person obtained by background subtraction. Because CGH calculation has a huge computational complexity, it must be accelerated to achieve real-time processing; we therefore parallelize the CGH calculation on a graphics processing unit (GPU). In the output part, the CGHs calculated in the calculation part are displayed on the SLM, and the 3D images are reconstructed by illuminating the SLM with the reconstruction light. Additionally, we evaluated the performance of the real-time system by changing the distance between the RGB-D camera and the person as well as the number of persons involved.


Fig. 1 Schematic diagram of the proposed system. Preprocessing is performed continuously from the input part to the output part.


2. Methods

2.1 Acquiring 3D information of 3D objects

In this section, we describe how the 3D information of real 3D objects is acquired using an RGB-D camera. Because an RGB-D camera is equipped with not only a color camera but also a depth sensor, it can simultaneously acquire color and depth images of real 3D objects. In this study, we used Kinect for Windows v2 (Kinect v2) from Microsoft Corporation [35] as the RGB-D camera. The depth sensor of Kinect v2 acquires depth information using the time-of-flight (ToF) method [36]. The resolutions of the color and depth images were 1,920 × 1,080 pixels and 512 × 424 pixels, respectively. Using the Kinect v2 software development kit, we downsampled the color images to the resolution of the depth images and obtained the color and position information of the real 3D objects from the color and depth images. Point cloud data having color and position information can then be generated from the captured images. The color information indicates the luminance values of blue, green, and red, and the position information indicates the coordinates of each point composing the point cloud. Because an RGB-D camera can acquire 3D information of objects with deep depth at once, it can accurately capture the movement of 3D objects along the depth direction. Figure 2 shows an example of a color image, a depth image of 3D objects, and a point cloud image generated from the color and position information. The depth image has 256 gradations, with near points rendered black and far points rendered white. Here, the point cloud data generated from the color and position information of the 3D objects were output as a PCD file and visualized using the Point Cloud Library 1.8.0 [37]. We calculated point-cloud-based CGHs using the acquired point cloud data. A layer-based algorithm that treats an object as multiple layered images can also be used to calculate CGHs from color and depth images [12–14,22,25,38]. However, dedicated hardware has been developed for point-cloud-based CGH calculation [39,40]. Because we plan to realize a real-time holographic reconstruction system for real scenes using such dedicated hardware in future work, we adopted point-cloud-based CGHs and acquired the 3D information of the 3D objects as point cloud data in this paper.
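To make the mapping from depth pixels to point-cloud coordinates concrete, the following minimal NumPy sketch back-projects a depth image through a pinhole camera model. It is only an illustration: the intrinsic parameters fx, fy, cx, and cy are placeholder values, not calibrated Kinect v2 parameters, and the actual system uses the coordinate-mapping functions of the Kinect v2 SDK rather than this code.

```python
import numpy as np

def depth_to_point_cloud(depth_m, color_bgr, fx=365.0, fy=365.0, cx=256.0, cy=212.0):
    """Back-project a depth image (in metres) into an N x 6 array [x, y, z, B, G, R].

    Minimal pinhole-camera sketch; fx, fy, cx, cy are placeholder intrinsics,
    not calibrated Kinect v2 values (the real system uses the Kinect v2 SDK).
    """
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinates (h x w grids)
    z = depth_m
    x = (u - cx) * z / fx                            # back-projection to camera coordinates
    y = (v - cy) * z / fy
    valid = z > 0                                    # discard pixels with no depth reading
    xyz = np.stack([x[valid], y[valid], z[valid]], axis=1)
    bgr = color_bgr[valid].astype(np.float64)        # color already resampled to the depth grid
    return np.hstack([xyz, bgr])

# Synthetic example at the Kinect v2 depth resolution (512 x 424 = 217,088 points):
depth = np.full((424, 512), 2.5, dtype=np.float32)   # a flat surface 2.5 m from the camera
color = np.zeros((424, 512, 3), dtype=np.uint8)
print(depth_to_point_cloud(depth, color).shape)      # (217088, 6)
```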


Fig. 2 Examples of a captured image using an RGB-D camera. (a) Color image acquired using a color camera. (b) Depth image acquired using a depth sensor. (c) Point cloud data generated from color and position information.


2.2 Background subtraction

The point cloud data of the 3D objects acquired using an RGB-D camera contain a huge amount of background information that does not change between frames. Because the CGH calculation is performed for every frame, as described in the subsequent sections, processing this static background every frame is inefficient. In addition, the background is unnecessary when reconstructing the movement of a person. Therefore, we used background subtraction to eliminate the background information that does not change between frames and to extract the moving objects that do change.

The background subtraction method detects foreground objects by comparing a background image that does not include the objects with a foreground image that does. Because the pixels of the foreground image that do not contain objects have the same values as the corresponding pixels of the background image, the portions that changed between the two images can be extracted as the objects by pixel-by-pixel subtraction.

In this study, we performed background subtraction by comparing the color and position information of the point cloud data of the background, acquired in advance, with those of the point cloud data of the 3D objects. Table 1 defines the vector variables used for background subtraction. The superscript $n$ denotes the index of each point composing the point cloud. The vectors $\mathbf{C}$ and $\mathbf{P}$ denote the color and position information, respectively. The subscripts "b," "in," "out," and "diff" indicate the background, the 3D objects, the moving object with changes between frames, and the difference between the background and the 3D objects, respectively. $\mathbf{C}$ has three components, $B$, $G$, and $R$, which indicate the luminance values of blue, green, and red. $\mathbf{P}$ has three components, $x$, $y$, and $z$, which indicate the $x$, $y$, and $z$ coordinates. The vector $\mathbf{T}$ denotes the threshold values for background subtraction, where the subscripts "c" and "p" indicate color and position, respectively; $\mathbf{T}_{\mathrm{c}}$ has components $B_T$, $G_T$, and $R_T$, and $\mathbf{T}_{\mathrm{p}}$ has components $x_T$, $y_T$, and $z_T$. $\mathbf{D}$ represents the relative deviation of the position information of the 3D objects and has components $x_D$, $y_D$, and $z_D$.


Table 1. Variables used for background subtraction.

Herein, we present the procedure of the background subtraction. First, $\mathbf{C}_{\mathrm{b}}^{n}$ and $\mathbf{P}_{\mathrm{b}}^{n}$, which correspond to the point cloud data of the background shown in Fig. 3(a), are acquired in advance. Second, $\mathbf{C}_{\mathrm{in}}^{n}$ and $\mathbf{P}_{\mathrm{in}}^{n}$, which correspond to the point cloud data of the 3D objects shown in Fig. 3(b), are acquired. Third, $\mathbf{C}_{\mathrm{diff}}^{n}$ and $\mathbf{P}_{\mathrm{diff}}^{n}$ are obtained using the following equations:

$$\mathbf{C}_{\mathrm{diff}}^{n}=\begin{bmatrix}B_{\mathrm{diff}}^{n}\\ G_{\mathrm{diff}}^{n}\\ R_{\mathrm{diff}}^{n}\end{bmatrix}=\begin{bmatrix}|B_{\mathrm{in}}^{n}-B_{\mathrm{b}}^{n}|\\ |G_{\mathrm{in}}^{n}-G_{\mathrm{b}}^{n}|\\ |R_{\mathrm{in}}^{n}-R_{\mathrm{b}}^{n}|\end{bmatrix},\tag{1}$$

$$\mathbf{P}_{\mathrm{diff}}^{n}=\begin{bmatrix}x_{\mathrm{diff}}^{n}\\ y_{\mathrm{diff}}^{n}\\ z_{\mathrm{diff}}^{n}\end{bmatrix}=\begin{bmatrix}|x_{\mathrm{in}}^{n}-x_{\mathrm{b}}^{n}|\\ |y_{\mathrm{in}}^{n}-y_{\mathrm{b}}^{n}|\\ |z_{\mathrm{in}}^{n}-z_{\mathrm{b}}^{n}|\end{bmatrix}.\tag{2}$$

$\mathbf{D}^{n}$ can be obtained as follows:

$$\mathbf{D}^{n}=\begin{bmatrix}x_{D}^{n}\\ y_{D}^{n}\\ z_{D}^{n}\end{bmatrix}=\begin{bmatrix}|x_{\mathrm{diff}}^{n}/x_{\mathrm{in}}^{n}|\\ |y_{\mathrm{diff}}^{n}/y_{\mathrm{in}}^{n}|\\ |z_{\mathrm{diff}}^{n}/z_{\mathrm{in}}^{n}|\end{bmatrix}.\tag{3}$$

Using Eqs. (1) and (3), we can check whether the threshold conditions are satisfied. The threshold conditions can be expressed as

$$(B_{\mathrm{diff}}^{n}>B_{T})\wedge(G_{\mathrm{diff}}^{n}>G_{T})\wedge(R_{\mathrm{diff}}^{n}>R_{T})\wedge(x_{D}^{n}>x_{T})\wedge(y_{D}^{n}>y_{T})\wedge(z_{D}^{n}>z_{T}).$$

Then, $\mathbf{C}_{\mathrm{out}}^{n}$ and $\mathbf{P}_{\mathrm{out}}^{n}$ are obtained as follows:

$$\mathbf{C}_{\mathrm{out}}^{n}=\begin{cases}\mathbf{C}_{\mathrm{in}}^{n}, & \text{if the threshold conditions are satisfied}\\ \mathbf{0}, & \text{otherwise},\end{cases}\tag{4}$$

$$\mathbf{P}_{\mathrm{out}}^{n}=\begin{cases}\mathbf{P}_{\mathrm{in}}^{n}, & \text{if the threshold conditions are satisfied}\\ \mathbf{0}, & \text{otherwise}.\end{cases}\tag{5}$$
Figure 4 shows the pseudo code of the algorithm for background subtraction. Equations (4) and (5) imply that the color and position information of background points is set to 0. Finally, we removed all points with $\mathbf{C}_{\mathrm{out}}^{n}=\mathbf{0}$ and $\mathbf{P}_{\mathrm{out}}^{n}=\mathbf{0}$ and obtained the point cloud data of the moving object. Here, the removed points also included outliers, i.e., points whose color and position information could not be obtained correctly and were therefore set to $\mathbf{C}_{\mathrm{in}}^{n}=\mathbf{0}$ and $\mathbf{P}_{\mathrm{in}}^{n}=\mathbf{0}$. Because the outliers are not needed for the CGH calculation, we removed them together with the background information before the CGH calculation. Figure 3(c) shows the point cloud data of the moving object after background subtraction. Although a little background information remained owing to noise from external disturbances and errors of the RGB-D camera, most of the background information was effectively removed.
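As a concrete illustration of Eqs. (1)–(5) and the pseudocode in Fig. 4, the following minimal NumPy sketch applies the per-point thresholds to point clouds stored as N × 3 arrays. It is an assumption-laden sketch rather than the authors' implementation: points with the same index $n$ are assumed to come from the same depth pixel in the background and input clouds, the six conditions are combined by logical AND as written above, and the default thresholds are the values given in Section 3.1.

```python
import numpy as np

def background_subtraction(C_in, P_in, C_b, P_b,
                           T_c=(3.0, 3.0, 3.0), T_p=(0.01, 0.01, 0.01)):
    """Per-point background subtraction following Eqs. (1)-(5).

    C_in, C_b : (N, 3) blue/green/red values of the input frame and the background.
    P_in, P_b : (N, 3) x/y/z coordinates of the input frame and the background.
    """
    C_diff = np.abs(C_in - C_b)                                    # Eq. (1)
    P_diff = np.abs(P_in - P_b)                                    # Eq. (2)
    with np.errstate(divide="ignore", invalid="ignore"):
        D = np.abs(P_diff / P_in)                                  # Eq. (3): relative deviation
    keep = np.all(C_diff > T_c, axis=1) & np.all(D > T_p, axis=1)  # threshold conditions
    C_out = np.where(keep[:, None], C_in, 0.0)                     # Eq. (4)
    P_out = np.where(keep[:, None], P_in, 0.0)                     # Eq. (5)
    # Remove zeroed points (background and outliers) before the CGH calculation.
    mask = np.any(P_out != 0.0, axis=1)
    return C_out[mask], P_out[mask]
```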


Fig. 3 Point cloud data used for background subtraction. (a) Background, (b) 3D objects, and (c) Moving object obtained using background subtraction.



Fig. 4 Pseudo code of the algorithm for background subtraction.


2.3 Computer-generated hologram (CGH)

In electro-holography, information about 3D objects is recorded as CGHs by calculating the propagation and interference of light on a computer. We calculated point-cloud-based CGHs using the 3D information of the moving object obtained after background subtraction. Figure 5 shows the schematic diagram for CGH calculation. By regarding the point cloud as an aggregation of point-light sources, the complex amplitude formed on the CGH plane, $U(x_a, y_a)$, can be expressed with the Fresnel approximation, under the condition that $z_j \gg (x_a - x_j)$ and $z_j \gg (y_a - y_j)$, as follows:

$$U(x_a,y_a)=\sum_{j=1}^{L}A_j\exp\!\left[i\frac{2\pi}{\lambda}\,\frac{(x_a-x_j)^{2}+(y_a-y_j)^{2}}{2z_j}\right],\tag{6}$$

where $(x_a, y_a)$ are the coordinates on the CGH plane, $(x_j, y_j, z_j)$ are the coordinates of the $j$-th point-light source, $L$ is the number of points composing the point cloud, $A_j$ is the luminance value of the $j$-th point-light source, $i$ is the imaginary unit, and $\lambda$ is the wavelength of the light. The argument of $U(x_a, y_a)$, $\phi(x_a, y_a)$, can be expressed as

$$\phi(x_a,y_a)=\arg\left[U(x_a,y_a)\right].\tag{7}$$

Using the argument obtained from Eq. (7), we calculated kinoforms, which are a type of phase-modulation CGH.
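To make Eqs. (6) and (7) concrete, the following NumPy sketch accumulates the Fresnel-approximated contribution of each point-light source on the hologram plane and then takes the argument to obtain the kinoform. The pixel count, pixel pitch, and green wavelength defaults mirror the values given in Section 3.1, but the code itself is only an illustrative CPU implementation, not the authors' GPU kernel.

```python
import numpy as np

def kinoform_cgh(points_xyz, amplitudes, nx=1920, ny=1080,
                 pitch=8.0e-6, wavelength=532e-9):
    """Compute a kinoform from a point cloud according to Eqs. (6) and (7).

    points_xyz : (L, 3) coordinates (x_j, y_j, z_j) of the point-light sources [m].
    amplitudes : (L,) luminance values A_j.
    Returns the phase distribution phi(x_a, y_a) on the hologram plane.
    """
    xa = (np.arange(nx) - nx / 2) * pitch            # hologram-plane coordinates
    ya = (np.arange(ny) - ny / 2) * pitch
    XA, YA = np.meshgrid(xa, ya)
    U = np.zeros((ny, nx), dtype=np.complex128)
    k = 2.0 * np.pi / wavelength
    for (xj, yj, zj), Aj in zip(points_xyz, amplitudes):   # O(L * Nx * Ny) accumulation
        r2 = (XA - xj) ** 2 + (YA - yj) ** 2
        U += Aj * np.exp(1j * k * r2 / (2.0 * zj))         # Eq. (6), Fresnel approximation
    return np.angle(U)                                     # Eq. (7): kinoform phase
```

For instance, with the arrays named as in the earlier background-subtraction sketch, kinoform_cgh(P_out, C_out[:, 1]) would use the green-channel luminance values as $A_j$; these names are, again, only illustrative.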


Fig. 5 Schematic diagram for CGH calculation. CGH calculation can be performed by considering the recorded objects as point-light sources.


The computational complexity of the CGH calculation is $O(LN_xN_y)$ when the resolution of the CGH is $N_x \times N_y$ pixels. Because it is difficult to calculate CGHs in real time with only a central processing unit (CPU), we used a GPU in this paper. A GPU has far more cores than a CPU and can speed up the CGH calculation by parallelizing the task [41,42]. Figure 6 shows the pseudo code of the algorithm for CGH calculation.
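Because every hologram pixel in Eq. (6) can be evaluated independently, the accumulation maps naturally onto GPU threads. As a rough Python illustration of this parallelization (the paper's actual implementation uses hand-written CUDA 8.0 kernels, not this code), the NumPy arrays above can be replaced by CuPy arrays so that the per-pixel arithmetic runs on the GPU:

```python
# Hedged sketch: the same accumulation as kinoform_cgh(), but on the GPU with CuPy,
# used here only as a stand-in for the paper's CUDA kernels.
import numpy as np
import cupy as cp

def kinoform_cgh_gpu(points_xyz, amplitudes, nx=1920, ny=1080,
                     pitch=8.0e-6, wavelength=532e-9):
    xa = (cp.arange(nx) - nx / 2) * pitch
    ya = (cp.arange(ny) - ny / 2) * pitch
    XA, YA = cp.meshgrid(xa, ya)                  # each hologram pixel is independent,
    U = cp.zeros((ny, nx), dtype=cp.complex128)   # so the per-pixel work maps onto GPU threads
    k = 2.0 * np.pi / wavelength
    for (xj, yj, zj), Aj in zip(points_xyz, amplitudes):
        r2 = (XA - xj) ** 2 + (YA - yj) ** 2
        U += Aj * cp.exp(1j * k * r2 / (2.0 * zj))
    return cp.asnumpy(cp.angle(U))                # copy the kinoform back to the host
```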


Fig. 6 Pseudo code of the algorithm for CGH calculation.


3. Experiment

3.1 Process flowchart and optical setup

Figure 7 shows the flowchart illustrating the procedure of the constructed system. In this paper, we used Kinect v2 as the RGB-D camera to acquire 3D information. The ranges of the color information and the depth information were 0 to 255 and 0.5 m to 4.5 m, respectively. The horizontal and vertical angles of view were 70° and 60°, respectively. Figure 8 shows a schematic of the geometry used to capture a color image and a depth image with the RGB-D camera. A standing person served as the moving object. To capture the whole body of the person, we positioned the RGB-D camera at a height of 0.85 m from the floor, as shown in Fig. 8(a). Further, we set the movement range of the person to between 2.5 m and 3.5 m from the RGB-D camera. The 3D information of the background and the 3D objects in real space was acquired with the RGB-D camera as point cloud data having color and position information. The number of object points composing the acquired point cloud is determined by the resolution of the depth sensor (512 × 424), i.e., 217,088 points. To prevent delays in the real-time processing, the constructed system stores the point cloud data in memory instead of outputting them as a PCD file.


Fig. 7 Flowchart illustrating the procedure of the constructed system. Acquiring 3D information of background is performed in advance.


Fig. 8 (a) Schematic of the geometry for capturing a color image and a depth image using an RGB-D camera. (b) Actual photographing situation.

Background subtraction was performed using the acquired 3D information of the background and the 3D objects, and the point cloud of the person was extracted. We set the threshold values of $\mathbf{T}_{\mathrm{c}}$ and $\mathbf{T}_{\mathrm{p}}$ for the background subtraction as $B_T = G_T = R_T = 3$ and $x_T = y_T = z_T = 0.01$, respectively. After background subtraction, the CGH calculation was performed based on Eqs. (6) and (7). The calculated CGHs were displayed on an SLM to reconstruct the 3D images.

Figure 9 shows the optical setup used to reconstruct the full-color 3D images. We used a blue laser (450 nm; LB), a green laser (532 nm; LG), and a red laser (650 nm; LR) as light sources. The light emitted from the lasers was aligned using mirrors (M1, M2, and M3) and dichroic mirrors (DMB1 and DMR1). The aligned light was collimated by a microscope objective (OL) and a collimator lens (CL). The collimated light was introduced into the SLMs (SB, SG, and SR) using a half mirror (H), dichroic mirrors (DMB2 and DMR2), and a mirror (M4). Phase-modulation-type SLMs (Holoeye Photonics AG, "PLUTO") with 1,920 × 1,080 pixels and a pixel pitch of 8.0 μm × 8.0 μm were used to display the CGHs. The phase-modulation gradation and maximum refresh rate of the SLMs were 256 levels and 60 Hz, respectively, so we quantized the calculated CGHs to 256 gradations to match the SLMs. We used Microsoft Windows 10 Enterprise as the operating system, with a CPU (Intel Core i7-7700K, 4.20 GHz) and a GPU (NVIDIA GeForce GTX 1080 Ti) for processing the acquired 3D information and calculating the CGHs. Microsoft Visual Studio Enterprise 2015 [43] and Compute Unified Device Architecture (CUDA) 8.0 [44] were used as the integrated development environments for the PC and GPU, respectively. We used OpenGL 4.6.0 [45] to display the CGHs. The reconstruction light from each SLM was recombined by M4, DMB2, and DMR2 and introduced into relay lenses (RL1 and RL2) by a mirror (M5). Because an aperture (AP) was positioned at the co-focal plane of RL1 and RL2, the 0th-order diffraction light was removed from the reconstructed light by the AP. The 3D images were then observed through a field lens (F).
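The quantization of the kinoform to the SLM's 256 gradations mentioned above can be sketched as follows; the sketch assumes that the 256 gray levels span a full 2π of phase modulation, which is an assumption of this illustration rather than a measured property of the PLUTO panels.

```python
import numpy as np

def quantize_phase(phi, levels=256):
    """Map a phase distribution to 8-bit gray levels for display on the SLM.

    Assumes the SLM's 256 gray levels span a full 2*pi of phase modulation.
    """
    phi_wrapped = np.mod(phi, 2.0 * np.pi)                  # shift phase into [0, 2*pi)
    gray = np.floor(phi_wrapped / (2.0 * np.pi) * levels)   # quantize to 256 steps
    return np.clip(gray, 0, levels - 1).astype(np.uint8)
```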

Fig. 9 Optical setup for reconstructing full-color 3D images using electro-holography. Herein, AP represents the aperture and CL represents the collimator lens. DMB1, DMB2, DMR1, and DMR2 represent dichroic mirrors. F represents the field lens. H represents the half mirror. LB, LG, and LR represent the blue, green, and red lasers, respectively. M1, M2, M3, M4, and M5 represent mirrors. OL represents the objective lens. RL1 and RL2 represent relay lenses. SB, SG, and SR represent the SLMs for blue, green, and red reconstruction, respectively.

Figure 10 shows the display system used to demonstrate the behavior of the proposed system. We constructed the display system with three monitors so that the real scene and the reconstructed 3D images can be observed in real time. The CGHs were calculated on a PC using the 3D information obtained with the RGB-D camera. The reconstructed 3D images and the real scene were captured using digital video cameras. Although the real scene could be displayed on the monitor directly from the RGB-D camera, we used a separate digital video camera to avoid delaying the real-time processing.


Fig. 10 Display system used to demonstrate the behavior of the proposed system. The upper-left monitor displays the real scene captured at the input part. The bottom monitor displays the CGH generated at the calculation part. The upper-right monitor displays the 3D image reconstructed at the output part. Each image was captured by each digital video camera simultaneously (see Visualization 1).


3.2 Results

First, we used only the green laser and one SLM to reconstruct monochromatic 3D images. Table 2 shows the measured time per frame for each process during monochromatic reconstruction. The processing times in Table 2 are averages over 500 frames. Because the number of object points of the person after background subtraction differed from frame to frame, we averaged the number of object points over 500 frames and obtained 13,518 points. Since the processing time per frame was 73 ms, real-time operation at about 14 frames per second (fps) was demonstrated.


Table 2. Measurement time per frame for each process during monochromatic reconstruction.

Figure 11 shows the results of the monochromatic reconstruction. The pictures in the left column show the real scenes captured with a digital video camera. The CGH calculation was performed using the point cloud data shown in the central column. The pictures in the right column show the reconstructed 3D images. Although residual background that was not removed by the background subtraction was reconstructed as noise, the movement of the person was correctly reconstructed. Moreover, the facial expression of the person and the pattern of the clothes can be easily observed.

Fig. 11 Results of monochromatic reconstruction. (a), (d), (g), and (j) represent 3D objects of real scenes. (b), (e), (h), and (k) represent point cloud data after background subtraction and outlier removal. (c), (f), (i), and (l) represent the reconstructed 3D images (see Visualization 2).

Further, we verified the effect of background subtraction. We made a person stand 2.5 m away from the RGB-D camera and photographed the person. Table 3 shows the time taken to perform the CGH calculation with and without background subtraction. Because the number of object points used for the CGH calculation decreases when background subtraction is applied, the CGH calculation with background subtraction can be performed faster than that without it. Figure 12 shows the 3D images reconstructed with and without background subtraction. Without background subtraction, the contrast of the person was reduced because most of the object points were occupied by the background, and this large number of background points generated noise, as shown in Fig. 12(b). In contrast, with background subtraction, the 3D image of the person was clearly reconstructed. A previous study proposed a downsampling method using a voxel grid filter to reduce the number of object points [46]. The reconstructed 3D image shown in Fig. 12(c) was obtained by extracting 16,754 points from 180,326 points using a voxel grid filter. Although the noise was reduced, the person was not clearly reconstructed because the background still contained far more object points than the person.
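For reference, the voxel-grid downsampling mentioned above can be sketched as follows: points are bucketed into cubic voxels and each occupied voxel is replaced by the centroid of its points. This is a generic formulation given only to illustrate the idea; the leaf size of 0.02 m is an arbitrary example, and the exact filter parameters of [46] are not reproduced here.

```python
import numpy as np

def voxel_grid_filter(points_xyz, colors, leaf=0.02):
    """Downsample a point cloud by averaging all points inside each cubic voxel.

    leaf : voxel edge length in metres (0.02 m is an arbitrary illustrative value).
    """
    keys = np.floor(points_xyz / leaf).astype(np.int64)     # voxel index of each point
    _, inverse, counts = np.unique(keys, axis=0, return_inverse=True, return_counts=True)
    inverse = inverse.ravel()
    xyz_sum = np.zeros((counts.size, 3))
    rgb_sum = np.zeros((counts.size, 3))
    np.add.at(xyz_sum, inverse, points_xyz)                 # accumulate per occupied voxel
    np.add.at(rgb_sum, inverse, colors)
    return xyz_sum / counts[:, None], rgb_sum / counts[:, None]   # voxel centroids
```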


Table 3. Time taken to perform CGH calculation by using and by not using background subtraction.


Fig. 12 (a) Reconstructed 3D images using background subtraction. (b) Reconstructed 3D images not using background subtraction. (c) Reconstructed 3D images using a voxel grid filter.


We expanded the monochromatic reconstruction system to a full-color reconstruction system by using blue and red light in addition to green light, in order to reconstruct more realistic 3D images that are closer to the actual objects. We captured real scenes with the RGB-D camera under the same conditions as in Fig. 8 and reconstructed full-color 3D images using the optical setup shown in Fig. 9.

Table 4 shows the measured time per frame for each process during full-color reconstruction. Since the processing time per frame was 199 ms, the full-color reconstruction system operated at about 5 fps. The CGH calculation took about three times longer than in the monochromatic reconstruction system shown in Table 2 because three CGHs must be calculated per frame in full-color reconstruction. Because the CGH calculation occupies most of the processing time and a refresh rate of 5 fps might be insufficient to demonstrate real-time performance, the CGH calculation must be accelerated further. One method for speeding up the CGH calculation is to perform parallel processing with a clustered system of dedicated hardware [18,39] in which each unit calculates the CGH for one color. Another method is to adopt Fresnel diffraction based on a fast Fourier transform (FFT) [12–14,22,25]; FFT-based Fresnel diffraction can dramatically reduce the computational complexity of the CGH calculation compared with the case of no acceleration algorithm.
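As a rough sketch of the FFT-based alternative, the single-FFT Fresnel diffraction of one complex layer of the scene can be written as follows; a layer-based CGH would propagate each depth layer this way and superpose the results on the hologram plane. This is a textbook single-FFT formulation given only for illustration (note that the output sampling pitch becomes λz/(Np)), not the specific method of [12–14,22,25].

```python
import numpy as np

def fresnel_fft_layer(u0, z, pitch=8.0e-6, wavelength=532e-9):
    """Single-FFT Fresnel propagation of one complex layer u0 over distance z [m].

    Note: the output sampling pitch of this formulation is wavelength * z / (N * pitch),
    so a layer-based CGH must handle the rescaling; this sketch ignores that bookkeeping.
    """
    ny, nx = u0.shape
    x = (np.arange(nx) - nx / 2) * pitch
    y = (np.arange(ny) - ny / 2) * pitch
    X, Y = np.meshgrid(x, y)
    k = 2.0 * np.pi / wavelength
    chirp_in = np.exp(1j * k * (X**2 + Y**2) / (2.0 * z))          # quadratic phase on the layer
    spectrum = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(u0 * chirp_in)))
    fx = (np.arange(nx) - nx / 2) / (nx * pitch)                   # output-plane frequencies
    fy = (np.arange(ny) - ny / 2) / (ny * pitch)
    FX, FY = np.meshgrid(fx, fy)
    chirp_out = np.exp(1j * k * z) / (1j * wavelength * z) \
                * np.exp(1j * np.pi * wavelength * z * (FX**2 + FY**2))
    return chirp_out * spectrum
```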


Table 4. Measurement time per frame for each process during full-color reconstruction.

Figure 13 shows the results of full-color reconstruction. The pictures in the left and right columns show the real scenes and the reconstructed full-color 3D images, respectively, and those in the central column show the point cloud data after background subtraction. As shown in Fig. 13, the texture of the person, such as the clothes and skin, was faithfully reconstructed.

Fig. 13 Results of full-color reconstruction. (a), (d), (g), and (j) represent 3D objects of real scenes. (b), (e), (h), and (k) represent point cloud data after background subtraction. (c), (f), (i), and (l) represent the reconstructed 3D images (see Visualization 3).

4. Discussion

The total number of object points varies from frame to frame because of the movement of the person. In particular, because an RGB-D camera captures images based on perspective projection, the number of acquired object points varies with the distance between the person and the RGB-D camera; hence, the processing time per frame is expected to vary with this distance. Furthermore, because we extract the people by background subtraction to speed up the CGH calculation, the number of acquired object points also changes with the number of people, so the processing time per frame is expected to change accordingly. Therefore, we evaluated the relationship between the time taken for the CGH calculation and the total number of object points by varying the distance at which people were placed and the number of people involved. Figure 14 shows the schematic of this evaluation. Under the first condition, P1, one person stood 2.5 m away from the RGB-D camera. Under the second condition, P2, one person stood 3.5 m away from the RGB-D camera. Under the third condition, P3, two people stood 2.5 m and 3.5 m away from the RGB-D camera, respectively. Table 5 shows the measured CGH calculation time under each condition. Comparing P1 and P2 in Table 5, when the distance between the person and the RGB-D camera increased, the CGH calculation time was shortened because the total number of object points of the person decreased. Conversely, the CGH calculation time was lengthened when the number of people increased, because the number of object points composing the point cloud of the people increased. As a method of overcoming this problem, the inter-frame subtraction proposed by Kim et al. [47] has been found to be effective: people do not always move even when their number increases, so more efficient processing can be performed by extracting only those who are moving. Figure 15 shows the reconstruction results under each condition. From the results shown in Figs. 15(a)-(d), highly accurate 3D information could be acquired with the RGB-D camera even when the distance between the person and the camera changed, and the 3D images were reconstructed with comparable quality.


Fig. 14 Schematic for evaluating the relationship. In P1, we made a person stand 2.5 m away from an RGB-D camera. In P2, we made a person stand 3.5 m away from an RGB-D camera. In P3, we made people stand 2.5 m and 3.5 m away from an RGB-D camera, respectively.



Table 5. Measurement time taken for CGH calculation under each condition.


Fig. 15 Reconstruction results under each condition. (a) Point cloud data after background subtraction and (b) reconstructed 3D image when a person was placed 2.5 m away from an RGB-D camera. (c) Point cloud data after background subtraction and (d) reconstructed 3D image when a person was placed 3.5 m away from an RGB-D camera. (e) Point cloud data after background subtraction and (f) reconstructed 3D image when people were placed 2.5 m and 3.5 m away from an RGB-D camera, respectively.


5. Conclusion

In this paper, we examined real-time 3D video reconstruction of real scenes for realizing an electro-holography-based 3D display. We succeeded in real-time 3D video reconstruction of real scenes using electro-holography by continuously processing from the input part to the output part with an RGB-D camera and a GPU. We confirmed that the monochromatic reconstruction system can faithfully reconstruct the movement of a person and operate at about 14 fps when the number of object points was about 13,518 and the resolution of the CGHs was 1,920 × 1,080 pixels. Further, we constructed a full-color reconstruction system and evaluated the case in which the distance between the person and the RGB-D camera changed. We confirmed that the full-color reconstruction system can faithfully reconstruct the appearance of a person in real space and operate at about 5 fps. In addition, we confirmed that the number of object points of the person decreased when the distance between the RGB-D camera and the person increased. In future work, we will aim to further speed up the CGH calculation in order to reconstruct more realistic 3D video images in both monochromatic and full-color reconstruction.

Funding

Kenjiro Takayanagi Foundation and the Institute for Global Prominent Research, Chiba University.

References

1. P. V. Johnson, J. A. Parnell, J. Kim, C. D. Saunter, G. D. Love, and M. S. Banks, “Dynamic lens and monovision 3D displays to improve viewer comfort,” Opt. Express 24(11), 11808–11827 (2016). [CrossRef]   [PubMed]  

2. S. Lee, J. Park, J. Heo, B. Kang, D. Kang, H. Hwang, J. Lee, Y. Choi, K. Choi, and D. Nam, “Autostereoscopic 3D display using directional subpixel rendering,” Opt. Express 26(16), 20233 (2018). [CrossRef]   [PubMed]  

3. G. Lippmann, “Epreuves reversibles. Photographies integrals,” C. R. Acad. Sci. 146, 446–451 (1908).

4. T. Naemura, T. Yoshida, and H. Harashima, “3-D computer graphics based on integral photography,” Opt. Express 8(4), 255–262 (2001). [CrossRef]   [PubMed]  

5. R. Yang, X. Huang, S. Li, and C. Jaynes, “Toward the light field display: autostereoscopic rendering via a cluster of projectors,” IEEE Trans. Vis. Comput. Graph. 14(1), 84–96 (2008). [CrossRef]   [PubMed]  

6. X. Li, C. P. Chen, Y. Li, P. Zhou, X. Jiang, N. Rong, S. Liu, G. He, J. Lu, and Y. Su, “High-efficiency video-rate holographic display using quantum dot doped liquid crystal,” J. Disp. Technol. 12(4), 362–367 (2016). [CrossRef]  

7. Z. Zhang, C. P. Chen, Y. Li, B. Yu, L. Zhou, and Y. Wu, “Angular multiplexing of holographic display using tunable multi-stage gratings,” Mol. Cryst. Liq. Cryst. (Phila. Pa.) 657(1), 102–106 (2017). [CrossRef]  

8. P. S. Hilaire, S. A. Benton, M. Lucente, M. L. Jepsen, J. Kollin, H. Yoshikawa, and J. Underkoffler, “Electronic display system for computational holography,” Proc. SPIE 1212, 174–182 (1990). [CrossRef]  

9. N. Masuda, T. Ito, T. Tanaka, A. Shiraki, and T. Sugie, “Computer generated holography using a graphics processing unit,” Opt. Express 14(2), 603–608 (2006). [CrossRef]   [PubMed]  

10. T. Mishina, M. Okui, and F. Okano, “Calculation of holograms from elemental images captured by integral photography,” Appl. Opt. 45(17), 4026–4036 (2006). [CrossRef]   [PubMed]  

11. Y. Endo, K. Wakunami, T. Shimobaba, T. Kakue, D. Arai, Y. Ichihashi, K. Yamamoto, and T. Ito, “Computer-generated hologram calculation for real scenes using a commercial portable plenoptic camera,” Opt. Commun. 356, 468–471 (2015). [CrossRef]  

12. Y. Zhao, K. C. Kwon, M. U. Erdenebat, M. S. Islam, S. H. Jeon, and N. Kim, “Quality enhancement and GPU acceleration for a full-color holographic system using a relocated point cloud gridding method,” Appl. Opt. 57(15), 4253–4262 (2018). [CrossRef]   [PubMed]  

13. E. Y. Chang, J. Choi, S. Lee, S. Kwon, J. Yoo, M. Park, and J. Kim, “360-degree color hologram generation for real 3D objects,” Appl. Opt. 57(1), A91–A100 (2018). [CrossRef]   [PubMed]  

14. H. Kang, C. Ahn, S. Lee, and S. Lee, “Computer-generated 3D holograms of depth-annotated images,” Proc. SPIE 5742, 234–241 (2005). [CrossRef]  

15. K. Matsushima and S. Nakahara, “Extremely high-definition full-parallax computer-generated hologram created by the polygon-based method,” Appl. Opt. 48(34), H54–H63 (2009). [CrossRef]   [PubMed]  

16. R. H. Y. Chen and T. D. Wilkinson, “Computer generated hologram from point cloud using graphics processor,” Appl. Opt. 48(36), 6841–6850 (2009). [CrossRef]   [PubMed]  

17. T. Shimobaba, N. Masuda, and T. Ito, “Simple and fast calculation algorithm for computer-generated hologram with wavefront recording plane,” Opt. Lett. 34(20), 3133–3135 (2009). [CrossRef]   [PubMed]  

18. Y. Ichihashi, H. Nakayama, T. Ito, N. Masuda, T. Shimobaba, A. Shiraki, and T. Sugie, “HORN-6 special-purpose clustered computing system for electroholography,” Opt. Express 17(16), 13895–13903 (2009). [CrossRef]   [PubMed]  

19. K. Wakunami and M. Yamaguchi, “Calculation for computer generated hologram using ray-sampling plane,” Opt. Express 19(10), 9086–9101 (2011). [CrossRef]   [PubMed]  

20. Y. Pan, Y. Wang, J. Liu, X. Li, and J. Jia, “Fast polygon-based method for calculating computer-generated holograms in three-dimensional display,” Appl. Opt. 52(1), A290–A299 (2013). [CrossRef]   [PubMed]  

21. T. Shimobaba and T. Ito, “Fast generation of computer-generated holograms using wavelet shrinkage,” Opt. Express 25(1), 77–87 (2017). [CrossRef]   [PubMed]  

22. Y. Zhao, Y. Piao, S. Park, K. Lee, and N. Kim, “Fast calculation method for full-color computer-generated hologram of real objects captured by depth camera,” Electronic Imaging 2018(4), 250–251 (2018).

23. S. Yamada, T. Kakue, T. Shimobaba, and T. Ito, “Interactive holographic display based on finger gestures,” Sci. Rep. 8(1), 2010 (2018). [CrossRef]   [PubMed]  

24. S. Igarashi, T. Nakamura, K. Matsushima, and M. Yamaguchi, “Efficient tiled calculation of over-10-gigapixel holograms using ray-wavefront conversion,” Opt. Express 26(8), 10773–10786 (2018). [CrossRef]   [PubMed]  

25. Y. Zhao, C. Shi, K. Kwon, Y. Piao, M. Piao, and N. Kim, “Fast calculation method of computer-generated hologram using a depth camera with point cloud gridding,” Opt. Commun. 411, 166–169 (2018). [CrossRef]  

26. T. Kakue, Y. Wagatsuma, S. Yamada, T. Nishitsuji, Y. Endo, Y. Nagahama, R. Hirayama, T. Shimobaba, and T. Ito, “Review of real-time reconstruction techniques for aerial-projection holographic displays,” Opt. Eng. 57(06), 1 (2018). [CrossRef]  

27. J. Hahn, H. Kim, Y. Lim, G. Park, and B. Lee, “Wide viewing angle dynamic holographic stereogram with a curved array of spatial light modulators,” Opt. Express 16(16), 12372–12386 (2008). [CrossRef]   [PubMed]  

28. M. Makowski, M. Sypek, I. Ducin, A. Fajst, A. Siemion, J. Suszek, and A. Kolodziejczyk, “Experimental evaluation of a full-color compact lensless holographic display,” Opt. Express 17(23), 20840–20846 (2009). [CrossRef]   [PubMed]  

29. K. Yamamoto, Y. Ichihashi, T. Senoh, R. Oi, and T. Kurita, “3D objects enlargement technique using an optical system and multiple SLMs for electronic holography,” Opt. Express 20(19), 21137–21144 (2012). [CrossRef]   [PubMed]  

30. G. Xue, J. Liu, X. Li, J. Jia, Z. Zhang, B. Hu, and Y. Wang, “Multiplexing encoding method for full-color dynamic 3D holographic display,” Opt. Express 22(15), 18473–18482 (2014). [CrossRef]   [PubMed]  

31. Y. Ichihashi, R. Oi, T. Senoh, K. Yamamoto, and T. Kurita, “Real-time capture and reconstruction system with multiple GPUs for a 3D live scene by a generation from 4K IP images to 8K holograms,” Opt. Express 20(19), 21645–21655 (2012). [CrossRef]   [PubMed]  

32. M. Yamaguchi, “Light-field and holographic three-dimensional displays [Invited],” J. Opt. Soc. Am. A 33(12), 2348–2364 (2016). [CrossRef]   [PubMed]  

33. K. Wakunami, H. Yamashita, and M. Yamaguchi, “Occlusion culling for computer generated hologram based on ray-wavefront conversion,” Opt. Express 21(19), 21811–21822 (2013). [CrossRef]   [PubMed]  

34. M. Heikklä and M. Pietikäinen, “A texture-based method for modeling the background and detecting moving objects,” IEEE Trans. Pattern Anal. Mach. Intell. 28(4), 657–662 (2006). [CrossRef]   [PubMed]  

35. Microsoft Corporation, https://www.microsoft.com.

36. H. Sarbolandi, D. Lefloch, and A. Kolb, “Kinect range sensing: structured-light versus time-of-flight kinect,” Comput. Vis. Image Underst. 139, 1–20 (2015). [CrossRef]  

37. Point Cloud Library, http://www.pointclouds.org/.

38. Y. Zhao, L. Cao, H. Zhang, D. Kong, and G. Jin, “Accurate calculation of computer-generated holograms using angular-spectrum layer-oriented method,” Opt. Express 23(20), 25440–25449 (2015). [CrossRef]   [PubMed]  

39. T. Sugie, T. Akamatsu, T. Nishitsuji, R. Hirayama, N. Masuda, H. Nakayama, Y. Ichihashi, A. Shiraki, M. Oikawa, N. Takada, Y. Endo, T. Kakue, T. Shimobaba, and T. Ito, “High-performance parallel computing for next-generation holographic imaging,” Nature Electron. 1(4), 254–259 (2018). [CrossRef]  

40. Y. Yamamoto, H. Nakayama, N. Takada, T. Nishitsuji, T. Sugie, T. Kakue, T. Shimobaba, and T. Ito, “Large-scale electroholography by HORN-8 from a point-cloud model with 400,000 points,” Opt. Express 26(26), 34259–34265 (2018). [CrossRef]   [PubMed]  

41. H. Nakayama, N. Takada, Y. Ichihashi, S. Awazu, T. Shimobaba, N. Masuda, and T. Ito, “Real-time color electroholography using multiple graphics processing units and multiple high-definition liquid-crystal display panels,” Appl. Opt. 49(31), 5993–5996 (2010). [CrossRef]  

42. H. Sato, T. Kakue, Y. Ichihashi, Y. Endo, K. Wakunami, R. Oi, K. Yamamoto, H. Nakayama, T. Shimobaba, and T. Ito, “Real-time colour hologram generation based on ray-sampling plane with multi-GPU acceleration,” Sci. Rep. 8(1), 1500 (2018). [CrossRef]   [PubMed]  

43. Visual Studio, https://visualstudio.microsoft.com.

44. CUDA, https://developer.nvidia.com/cuda-zone.

45. OpenGl, https://www.opengl.org/.

46. S. Hasegawa, H. Yanagihara, Y. Yamamoto, T. Kakue, T. Shimobaba, and T. Ito, “Electroholography of real scenes by RGB-D camera and the downsampling method,” OSA Continuum 2(5), 1629–1638 (2019). [CrossRef]  

47. S. C. Kim and E. S. Kim, “Fast one-step calculation of holographic videos of three-dimensional scenes by combined use of baseline and depth-compensating principal fringe patterns,” Opt. Express 22(19), 22513–22527 (2014). [CrossRef]   [PubMed]  

Supplementary Material (3)

Visualization 1: Demonstration of the display system (see Fig. 10).
Visualization 2: Monochromatic reconstruction results (see Fig. 11).
Visualization 3: Full-color reconstruction results (see Fig. 13).
