Cell-based hardware architecture for full-parallel generation algorithm of digital holograms

Young-Ho Seo; Hyun-Jun Choi; Ji-Sang Yoo; Dong-Wook Kim

doi:10.1364/OE.19.008750

1. Introduction

Holograms have been regarded by most imaging researchers as the ultimate goal of 3-dimensional (3D) image reconstruction, because they reproduce exactly the same image as the original object in free space. Thus, many researchers have been working on them since their invention by Gabor in 1948.

Electronic holograms have been researched since the 1960s. Computational holography is one form of electronic holography, in which the interference pattern (fringe pattern) is calculated numerically to acquire a hologram. The hologram is uploaded to a spatial light modulator (SLM), which is illuminated by the reference light to reconstruct the image [1–3]. The numerically calculated hologram is termed a computer-generated hologram (CGH). The inputs for a CGH, the depth information and the light intensity of each object point, are in digital form. In addition, the output, the resulting hologram, consists of pixels.

The inherent and critical problem of the CGH method is the enormous amount of calculation. To make a hologram with a resolution of M×N [pixel²] for an object of P×Q [pixel²], the fringe pattern contributed by each light source must be calculated at every hologram pixel and accumulated over all the light sources, which amounts to M×N×P×Q calculations. Thus, the two major issues for CGH are how to simplify the calculation equation and how to increase the calculation speed with minimal loss of reconstructed image quality [4,5]. The research group of [4], who first started CGH research, tried to speed up the generation by including only the horizontal parallax (HPO) and by using a look-up table (LUT) method and a parallel supercomputer. They could generate one hologram frame per second, with 10,000 object light sources and a hologram resolution of 6M [pixel²]. In [5], the CGH equation was approximated with the Taylor series expansion of the square root. This work deserves attention because it has been the basis of recent CGH research, even though it did not contribute much to the speed-up itself.

Hardware implementations based on FPGAs [6–9] and on graphics processing units (GPUs) [10–14] were developed subsequently, because implementing the approximated equation in software could not reach the desired speed-up. Reference [6] modified the equation of [5] into a recursive equation that calculates a row of a hologram for one light source and implemented it with an FPGA. This research group continued to upgrade their implementations and developed the HORN-5 system, a printed circuit board (PCB) with four Xilinx FPGAs that calculates Fresnel-transform CGHs [7]. It arranges as many calculation cells as there are pixels in a column of a hologram and takes 0.0679 sec/hologram with a 166 MHz clock. In addition, this group proposed a special-purpose computer system dedicated to CGH calculation, called HORN-6 [8]. In [9], a CGH processor with a 100% pipelined structure was proposed, which included the structures of the basic cell for Fresnel transforms, the kernel, and the processor.

Among the GPU implementations, Reference [12] proposed an LUT-based algorithm and implemented it on an Nvidia GPU. It takes 0.3 sec to calculate a hologram frame of 1,024×768 [pixel²] with 1,000 object light sources. Reference [13] used a 3-dimensional mesh model to calculate the CGH and implemented it on a GPU. Reference [14] used an AMD GPU with optimized GPU programming, which resulted in about 0.03 sec/hologram at HD resolution with 1,000 light sources, a number too small to yield reasonable image quality.

If we consider an SLM with pixel pitch of even 10μm for image reconstruction, a hologram with HD resolution can make the image of only about 2×1 [cm²] size. Therefore, a much higher calculation speed is required to service a real-time hologram video of moderate image size. We have proposed a high-performance CGH processor, by re-arranging the CGH equation and adopting a pipelining scheme in [9]. In this paper we modify the CGH equation used in previous works including [9] to maximize parallel computation. With the resulting equation, the hardware is implemented to speed up the computation. In addition, we propose a new hardware structure to accomplish this modified equation.
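The 2×1 [cm²] figure quoted in this paragraph can be checked with simple arithmetic. The check below assumes that HD resolution means 1,920×1,080 pixels (the value used in Section 4) and that the image size is bounded by the SLM aperture; both are assumptions of this note.

```latex
1920 \times 10\,\mu\mathrm{m} = 19.2\,\mathrm{mm} \approx 2\,\mathrm{cm}, \qquad
1080 \times 10\,\mu\mathrm{m} = 10.8\,\mathrm{mm} \approx 1\,\mathrm{cm}
```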

In Section 2, we show a conceptual explanation of CGH and the recursive equation for high-speed computation. It is modified for maximum parallel computation. The corresponding hardware structure is proposed in Section 3. Then, the hardware is implemented with FPGA from Altera in Section 4. The paper is concluded in Section 5.

2. Computer-generated hologram (CGH)

This section briefly explains the CGH equations proposed for high speed or hardware implementation so far.

2.1. The basic CGH equation

CGH generation calculates interference between two light waves, the object wave, and the reference wave. This paper focuses on the phase hologram, which generates a hologram with the phases of the object wave components. The proof of this method can be found in [3].

The fundamental equation for the phase hologram used in CGH research is given by Eq. (1).

I_{α j} = A_{j} cos (k \sqrt{{(p x_{α} - p x_{j})}^{2} + {(p y_{α} - p y_{j})}^{2} + z_{j}^{2}} + Φ_{α} + Φ_{j})

where I_αj is the intensity contribution to the hologram pixel (x_α, y_α) from the object pixel (x_j, y_j, z_j), A_j is the light intensity of the object pixel, k = 2π/λ is the wave number, λ is the wavelength, p is the pixel pitch (here, the pixel pitches of the object plane and the hologram are assumed to be the same), and Φ_α and Φ_j are the initial phases of the object wave and the reference wave, respectively. A hologram pixel is completely calculated by summing the contributions from all the object pixels. However, this pixel-by-pixel scheme has a drawback: moving from one object pixel to the next changes the z-axis value and/or the x and y values at every step, which increases the complexity of the calculation. Therefore, it is preferable in CGH calculation to compute the contributions of one object pixel to all the hologram pixels and then move to the next object pixel. This scheme minimizes changes of the z-axis value.
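The accumulation described above can be sketched in a few lines of Python. This is a minimal illustration of Eq. (1) with the object point in the outer loop, not the authors' implementation; the function name `hologram_direct`, the tuple layout of `points`, and the wavelength in the usage note are assumptions made for the example.

```python
import math

def hologram_direct(points, width, height, pitch, wavelength, phi_alpha=0.0):
    """Accumulate Eq. (1) over all object points for every hologram pixel.

    points: iterable of (x_j, y_j, z_j, A_j, phi_j); x_j and y_j are pixel
    indices, z_j is a distance in the same unit as pitch and wavelength.
    """
    k = 2.0 * math.pi / wavelength  # wave number
    holo = [[0.0] * width for _ in range(height)]
    # Outer loop over object points, so z_j changes as rarely as possible.
    for (xj, yj, zj, aj, phij) in points:
        for ya in range(height):
            dy = pitch * (ya - yj)
            for xa in range(width):
                dx = pitch * (xa - xj)
                r = math.sqrt(dx * dx + dy * dy + zj * zj)
                holo[ya][xa] += aj * math.cos(k * r + phi_alpha + phij)
    return holo
```

For example, `hologram_direct([(0, 0, 0.5, 1.0, 0.0)], 4, 4, 10e-6, 633e-9)` returns a 4×4 list of accumulated values for a single point source; the 633 nm wavelength is only a placeholder. The four nested loops make the M×N×P×Q cost discussed in the introduction explicit.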

2.2. Recursive equation for CGH calculation

Examining Eq. (1), one can recognize that along a row of the hologram, for a given light source, only the x-axis value changes. Thus, it is more efficient to consider a row of a hologram together, as shown in Fig. 1 [6–9]. If z_j ≫ p|x_α – x_j| and z_j ≫ p|y_α – y_j|, Eq. (1) can be approximated by Eq. (2) by expanding the square root of Eq. (1) in a series [6].

I_{α j} ≅ A_{j} cos [2 π {\frac{z_{j}}{λ} + \frac{p^{2}}{2 λ z_{j}} (x_{α j}^{2} + y_{α j}^{2})} + Φ_{α} + Φ_{j}] = A_{j} cos [θ (x_{α j}, y_{α j}, z_{j}) + Φ_{α} + Φ_{j}]

where x_αj = x_α – x_j and y_αj = y_α – y_j. In Eq. (2), the only quantity (phase information) that changes along a row of hologram pixels is θ(x_αj, y_αj, z_j) [6]. The phase θ_H at (x_α, y_α) on the hologram plane that is formed by the light source at (x_j, y_j) in the virtual object is expressed as Eq. (3), and the separated phases θ_Z and θ_XY are defined in Eqs. (4) and (5), respectively.

θ_{H} (x_{α j}, y_{α j}, z_{j}) = 2 π [\frac{z_{j}}{λ} + \frac{p^{2}}{2 λ z_{j}} (x_{α j}^{2} + y_{α j}^{2})] = 2 π (θ_{Z} + θ_{XY})

θ_{Z} (z_{j}) = \frac{z_{j}}{λ}

θ_{XY} (x_{α j}, y_{α j}, z_{j}) = \frac{p^{2}}{2 λ z_{j}} (x_{α j}^{2} + y_{α j}^{2})
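To get a feel for how close the approximation of Eqs. (3)–(5) is to the exact phase of Eq. (1), the two can be compared numerically. The sketch below is illustrative only; the function names and the numeric values in the usage note are assumptions, and the initial phases are left out.

```python
import math

def phase_exact(xaj, yaj, zj, pitch, wavelength):
    """Phase of Eq. (1) without the initial phases, in radians."""
    r = math.sqrt((pitch * xaj) ** 2 + (pitch * yaj) ** 2 + zj ** 2)
    return 2.0 * math.pi / wavelength * r

def theta_z(zj, wavelength):
    """Eq. (4): depth-dependent part of the phase, in cycles."""
    return zj / wavelength

def theta_xy(xaj, yaj, zj, pitch, wavelength):
    """Eq. (5): position-dependent part of the phase, in cycles."""
    return pitch ** 2 / (2.0 * wavelength * zj) * (xaj ** 2 + yaj ** 2)

def phase_fresnel(xaj, yaj, zj, pitch, wavelength):
    """Eq. (3): approximated phase 2*pi*(theta_Z + theta_XY), in radians."""
    return 2.0 * math.pi * (theta_z(zj, wavelength)
                            + theta_xy(xaj, yaj, zj, pitch, wavelength))
```

With a 10 μm pitch, a 633 nm wavelength, z_j = 1 m, and offsets of 100 and 50 pixels (all placeholder values), the two phases differ by a few microradians, which is what the condition z_j ≫ p|x_α – x_j| suggests. The gap grows as the offsets approach z_j.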

Fig. 1 Calculating a row of hologram components by one object light source.

Now, let us consider θ_XY of the hologram pixel at x = p(x_α + d) in the same row. It is given by Eq. (6).

\begin{array}{rcl} θ_{XY} (x_{α j} + d, y_{α j}, z_{j}) & = & \frac{p^{2}}{2 λ z_{j}} (x_{α j}^{2} + y_{α j}^{2}) + \frac{p^{2}}{2 λ z_{j}} (2 d x_{α j} + d^{2}) \\ & = & θ_{XY} (x_{α j}, y_{α j}, z_{j}) + \frac{p^{2}}{2 λ z_{j}} (2 d x_{α j} + d^{2}) \end{array}

This means that once θ_XY (x_αj, y_αj, z_j) is calculated, θ_XY of the pixel at x > px_α in the same row can be obtained by adding the rightmost term. Of course, θ_Z’s of all the pixels in the row are the same. Therefore, once the θ_XY for the leftmost pixel in a row and θ_Z are calculated, all the other pixels in the row can be calculated recursively with Eq. (6), as Eq. (7) and (8).

I_{α j, d = 0} = A_{j} cos [2 π {θ_{Z} + θ_{XY, d = 0}} + Φ_{α} + Φ_{j}]

I_{α j, d \geq 1} = A_{j} cos [2 π {θ_{Z} + θ_{XY, d = 0} + Γ_{d}} + Φ_{α} + Φ_{j}]

θ_{XY, d = 0} (x = 0) = \frac{p^{2}}{2 λ z_{j}} (x_{α j}^{2} + y_{α j}^{2})

Γ_{d} = Γ_{d - 1} + Γ_{1} + (d - 1) Δ, (d \geq 1)

Γ_{1} (x_{α j}, z_{j}) = \frac{p^{2}}{2 λ z_{j}} (2 x_{α j} + 1)

Δ = \frac{p^{2}}{λ z_{j}}
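The recursion of Eqs. (10)–(12) can be sketched as a short loop. The function name and argument order below are assumptions made for illustration; the loop starts from Γ_0 = 0, which is the value that makes Eq. (10) consistent at d = 1.

```python
def gamma_recursive(xaj, zj, pitch, wavelength, n):
    """Return [Gamma_1, ..., Gamma_n] using the recursion of Eq. (10)."""
    c = pitch ** 2 / (2.0 * wavelength * zj)
    gamma_1 = c * (2 * xaj + 1)   # Eq. (11)
    delta = 2.0 * c               # Eq. (12)
    gammas = []
    gamma_prev = 0.0              # Gamma_0 = 0: no offset from the first pixel
    for d in range(1, n + 1):
        # Each step needs the previous result, so the loop is inherently serial.
        gamma_d = gamma_prev + gamma_1 + (d - 1) * delta
        gammas.append(gamma_d)
        gamma_prev = gamma_d
    return gammas
```

Each returned value agrees with the rightmost term of Eq. (6), (p²/2λz_j)(2d·x_αj + d²), up to floating-point rounding, while using only additions inside the loop.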

3. Parallel CGH calculation and its hardware architecture

This section proposes a modified CGH equation that can be performed fully in parallel when it is implemented in hardware. Then, its hardware design is proposed with its internal and external pipelining scheme. In addition, this section includes some precision approximation with the minimal quality degradation.

3.1. CGH equation in full parallel

Equation (10) can be expanded as d increases, as shown in Eq. (13).

\begin{array}{lcl} Γ_{1} (x_{α j}, z_{j}) & = & \frac{p^{2}}{2 λ z_{j}} (2 x_{α j} + 1) = 1 Γ_{1} + 0 Δ \\ Γ_{2} (x_{α j}, z_{j}) & = & \frac{p^{2}}{2 λ z_{j}} (4 x_{α j} + 4) = 2 Γ_{1} + 1 Δ \\ Γ_{3} (x_{α j}, z_{j}) & = & \frac{p^{2}}{2 λ z_{j}} (6 x_{α j} + 9) = 3 Γ_{1} + 3 Δ \\ Γ_{4} (x_{α j}, z_{j}) & = & \frac{p^{2}}{2 λ z_{j}} (8 x_{α j} + 16) = 4 Γ_{1} + 6 Δ \\ & \dots & \end{array}

These values form a sequence whose successive differences, Γ_1 + (d – 1)Δ, are in arithmetic progression. The general term of this sequence is given by Eq. (14).

Γ_{d} (x_{α j}, z_{j}) = \frac{p^{2}}{2 λ z_{j}} (2 d x_{α j} + d^{2}) = d [Γ_{1} + \frac{1}{2} (d - 1) Δ] (d \geq 1)
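The closed form of Eq. (14) can be checked by unrolling the recursion of Eq. (10) from Γ_0 = 0. The short derivation below is a reader's check written with the symbols of Eqs. (10)–(12); it is not quoted from the original derivation.

```latex
\begin{aligned}
Γ_{d} &= \sum_{k=1}^{d} [Γ_{1} + (k - 1) Δ] = d Γ_{1} + \frac{d (d - 1)}{2} Δ = d [Γ_{1} + \frac{1}{2} (d - 1) Δ] \\
      &= \frac{p^{2}}{2 λ z_{j}} [d (2 x_{α j} + 1) + d (d - 1)] = \frac{p^{2}}{2 λ z_{j}} (2 d x_{α j} + d^{2})
\end{aligned}
```

The last expression is the rightmost term of Eq. (6), so the closed form and the recursion describe the same phase offset.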

Now, let us compare this equation to Eq. (10). With Eq. (10), the pixel values in a row of a hologram affected by a light source must be calculated serially, one by one, because of its recursive property, although different rows can be processed in parallel if hardware is provided. With Eq. (14), the rows remain independent, as with Eq. (10). In addition, once θ_XY at x = 0 and θ_XY at x = p (d = 1) are calculated, all the other pixel values in the row can be calculated in parallel, if hardware is provided for each pixel in the row.

This method gives a great deal of flexibility in the parallel computation of digital holograms. If sufficient hardware resources are provided, two cycles of calculation are sufficient for all the hologram components of one light source. Of course, Eq. (10) can also be computed in parallel by dividing a row into several sub-rows or, in the extreme case, into individual pixels. However, the first pixel in each sub-row must then be calculated by Eq. (5), which requires much more hardware and a more complicated data input scheme than Eq. (10) or (14).
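The practical difference from the recursion is that every pixel offset d can be evaluated without knowing its neighbor. The sketch below illustrates this with Eq. (14); the function names are assumptions, and the list comprehension stands in for hardware cells that would each evaluate one d at the same time.

```python
def gamma_closed(d, gamma_1, delta):
    """Eq. (14): phase offset of the d-th pixel, with no dependence on d - 1."""
    return d * (gamma_1 + 0.5 * (d - 1) * delta)

def row_phases(theta_h0, gamma_1, delta, width):
    """Phases theta_H (in cycles) for one hologram row and one light source.

    Every entry for d >= 1 depends only on (theta_h0, gamma_1, delta, d),
    so the entries could be evaluated in any order or all at once.
    """
    return [theta_h0] + [theta_h0 + gamma_closed(d, gamma_1, delta)
                         for d in range(1, width)]
```

For instance, `row_phases(1.0, 0.5, 0.25, 4)` gives `[1.0, 1.5, 2.25, 3.25]`, the same values the recursion of Eq. (10) produces step by step.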

3.2. Hardware architecture of computational cells

To design hardware for Eqs. (7) and (8) including Eq. (14), we separate Eq. (8) into two components, the initial-parameter calculator i_init(x_αj, y_αj, z_j) and the update-phase calculator i_update(i_init(), d), which are given by Eqs. (15) and (16), respectively. Figure 2 shows their hardware architecture.

Fig. 2 Architecture of CGH cells: (a) initial-parameter calculator, (b) update-phase calculator.

i_{init} (x_{α j}, y_{α j}, z_{j}) = (θ_{H, d = 0} = θ_{Z} + θ_{XY, d = 0}, Γ_{1}, Δ)

i_{update} (i_{init} (), d) = (θ_{H, d \geq 1} = θ_{Z} + θ_{XY, d = 0} + Γ_{d})

The outputs of the initial-parameter calculator in Fig. 2(a) are the phase of the first pixel θ_H,d=0 = θ_XY,d=0 + θ_Z, Γ₁, and Δ, among which Γ₁ and Δ are used in the update-phase calculator. In this cell, θ_Z and Δ/2 are taken from a pre-generated look-up table, LUT1, with z_j as the address. Note that z_j is used only once in a row and the wavelength λ is fixed. The update-phase calculator in Fig. 2(b) also uses a look-up table, LUT2, for the cosine function. Section 3.4 explains this in more detail. Note that the update-phase calculator does not have the feedback loop that resides in the corresponding cell of [7] or [9], which greatly increases the flexibility in pipelining. This will be explained in the next section.

Fig. 3 The pipelined architecture of the update-phase calculator.

Only one initial-parameter calculator cell is required to calculate a row of hologram components for a light source, while as many update-phase calculator cells as desired can be included. If only one update-phase cell is included, the pixels are calculated one after another, as with the recursive operation of Eq. (10). If M-1 update-phase cells are included (M is the number of pixels in a hologram row), which is the fastest case, all the pixels except the leftmost one are calculated in parallel.
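The division of work between the two cell types, and the effect of the number of update-phase cells, can be modeled in software as follows. This is a behavioral sketch only: the names `i_init` and `i_update` follow Eqs. (15) and (16), while `row_intensity`, the `passes` counter, the zero initial phases, and the direct cosine call (instead of LUT2) are simplifications assumed for the example.

```python
import math

def i_init(xaj, yaj, zj, pitch, wavelength):
    """Initial-parameter cell, Eq. (15): returns (theta_H at d = 0, Gamma_1, Delta)."""
    c = pitch ** 2 / (2.0 * wavelength * zj)
    theta_h0 = zj / wavelength + c * (xaj ** 2 + yaj ** 2)
    return theta_h0, c * (2 * xaj + 1), 2.0 * c

def i_update(init, d):
    """Update-phase cell, Eq. (16): theta_H of the d-th pixel (d >= 1)."""
    theta_h0, gamma_1, delta = init
    return theta_h0 + d * (gamma_1 + 0.5 * (d - 1) * delta)

def row_intensity(init, amplitude, width, num_update_cells):
    """Compute one row and count how many passes the update cells need.

    A pass models all update cells working at once on consecutive pixels.
    """
    row = [amplitude * math.cos(2.0 * math.pi * init[0])]  # leftmost pixel
    passes = 0
    d = 1
    while d < width:
        batch = range(d, min(d + num_update_cells, width))
        row.extend(amplitude * math.cos(2.0 * math.pi * i_update(init, k))
                   for k in batch)
        d += num_update_cells
        passes += 1
    return row, passes
```

With a row of 8 pixels, one update cell needs 7 passes and 7 update cells need a single pass, while both produce the same row. This mirrors the trade-off between the amount of hardware and the computation speed described above.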

3.3. Pipelining

Many pipelining schemes are possible (hologram by hologram, light source by light source, row by row, pixel by pixel, internal operator by operator, etc.), and more than one is usually implemented, depending on the hardware provided. Only the operator-level scheme is explained here, because the higher-level pipelining schemes depend on the number of update-phase cells included more than on the internal operator level.

Figure 3 shows the pipelining scheme for the update-phase cell only, because pipelining the initial-parameter cell may be meaningless unless sufficient cells are included. As can be seen in the figure, we used a counter for (d – 1)/2 as well as for d; the former is easily obtained from d by decrementing and shifting. The cell has six pipeline stages, and Table 1 shows their time scheduling. Thus, from the sixth clock cycle after Δ is obtained, one pixel value is output in each clock cycle. The critical delay that determines the clock period is that of one multiplier.
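The six-stage schedule can be mimicked with a small register-level simulation. The sketch below follows the register contents listed in Table 1 (R0 to R5); the function name, the use of `None` for stages that are not yet filled, and the direct cosine call in place of LUT2 are assumptions of this example rather than details of the actual circuit.

```python
import math

def simulate_update_pipeline(theta_h0, gamma_1, delta, amplitude, cycles):
    """Clock the six registers R0..R5 and return {cycle: R5} for valid outputs."""
    r = [None] * 6          # R0..R5; None marks a stage that is not filled yet
    outputs = {}
    for c in range(1, cycles + 1):
        prev = list(r)      # all registers latch from the previous cycle's values
        r[0] = 0.5 * (c - 1) * delta                               # ((c-1)/2) * Delta
        r[1] = None if prev[0] is None else gamma_1 + prev[0]      # adder
        r[2] = None if prev[1] is None else (c - 2) * prev[1]      # multiply by d = c - 2
        r[3] = None if prev[2] is None else theta_h0 + prev[2]     # theta_H,d
        r[4] = None if prev[3] is None else math.cos(2.0 * math.pi * prev[3])
        r[5] = None if prev[4] is None else amplitude * prev[4]    # A_j * cos(...)
        if r[5] is not None:
            outputs[c] = r[5]
    return outputs
```

Running the simulation for ten cycles yields outputs at cycles 6 to 10, one per cycle, and the value at cycle c corresponds to the pixel with offset d = c – 5, in line with the last column of Table 1.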

Table 1. Pipeline Stage Scheduling

Now, let us consider the extension of the cells to perform a parallel computation. The extension to only two cells is explained for simplicity, because the extension to more than two cells is conceptually the same. Figure 4 depicts the two possible schemes. The one in Fig. 4(a) shares the cosine function and a multiplier. In this structure, the outputs must be taken sequentially through the MUX. Conversely, the one in Fig. 4(b) can calculate two pixel values separately. The MUX at the end of this structure outputs the two results sequentially, as in Fig. 4(a). If sequential output is unnecessary, the MUX can be removed. Consequently, the structure of Fig. 4(a) saves some hardware resources at the expense of some flexibility of parallel computation compared to the structure of Fig. 4(b).

Fig. 4 Extension of update-phase calculator for pixel-based parallelization; (a) extendable structure with sequential outputs, (b) extendable structure with sequential or parallel outputs.

3.4. Precision approximation for cosine function

Fixed-point computation is preferred over floating-point computation in hardware implementation, because it uses fewer resources and can be computed more quickly. Note that a fixed-point number system is itself an approximation. For example, a number expressed as an 8-bit integer is approximated by an integer between 0 and 255. Therefore, it is quite usual that intermediate results can be properly approximated without losing much precision in the final result. This can reduce the amount of hardware resources used.

This section deals with the approximation of the cosine function used in Figs. 2, 3, and 4. The methodology was a fixed-point simulation in which a given number of bits is assigned to the result of the cosine function. Fig. 5 shows the simulation results for each number of assigned bits (the numbers on the horizontal axis), both for the hologram in Fig. 5(a) and for the reconstructed object in Fig. 5(b). We estimated both the peak signal-to-noise ratio (PSNR) of Eq. (17) and the normalized correlation (NC) of Eq. (18) for each case, where X and Y are the horizontal and vertical resolutions, and I and I′ are an original and a reconstructed pixel, respectively. In both cases, the results saturate to those obtained without approximation when more than 28 bits are assigned.

Fig. 5 The experimental results of approximation for the cosine function; (a) hologram (b) reconstructed object.

PSNR (dB) = 10 {log}_{10} \frac{255^{2}}{\frac{1}{XY} Σ_{x, y} {(I_{x, y} - I_{x, y}^{'})}^{2}}

NC = \frac{Σ_{j = 1}^{XY} I_{j} I_{j}^{'}}{Σ_{j = 1}^{XY} I_{j}^{2}}
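The two quality measures of Eqs. (17) and (18) can be written as short functions. This is a plain-Python sketch for images stored as lists of rows; the function names and the peak value of 255 for 8-bit pixels are assumptions of the example.

```python
import math

def psnr(original, reconstructed, peak=255.0):
    """Eq. (17): peak signal-to-noise ratio in dB between two equal-size images."""
    sq_err = 0.0
    count = 0
    for row_o, row_r in zip(original, reconstructed):
        for a, b in zip(row_o, row_r):
            sq_err += (a - b) ** 2
            count += 1
    mse = sq_err / count
    # Identical images have zero error; report infinity instead of dividing by zero.
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)

def normalized_correlation(original, reconstructed):
    """Eq. (18): sum of I * I' divided by the sum of I squared."""
    num = sum(a * b for ro, rr in zip(original, reconstructed) for a, b in zip(ro, rr))
    den = sum(a * a for ro in original for a in ro)
    return num / den
```

An NC of 1 means the reconstructed image carries the same energy along the original as the original itself, while PSNR grows without bound as the pixel-wise error goes to zero.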

However, in real applications, the quality of the reconstructed image is more important than that of the hologram. In addition, the results of subjective tests or visual inspection may be quite different from those of PSNR or NC measurement, especially for holograms. Fig. 6 shows some examples of the reconstructed image for various cosine approximations with the Rabbit test image. In the figure, assigning 0 bits means that cosθ_H is always 1, assigning 1 bit makes cosθ_H either 1 or −1, and so on. One can easily recognize from the figures that the image created by assigning 1 bit does not differ much in image quality from that created by assigning 30 bits. From this experiment, we concluded that 3 bits are sufficient for the cosine function. We implement LUT2 in Figs. 3 and 4 with 3 bits.
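One way to picture the bit assignment is a cosine whose output is restricted to 2^b levels. The 0-bit and 1-bit cases below follow the description in the text; for two or more bits the text does not spell out the quantizer, so the uniform rounding used here is purely a hypothetical choice, as is the function name.

```python
import math

def quantized_cos(theta_h, bits):
    """cos(2*pi*theta_H) restricted to 2**bits output levels (hypothetical scheme)."""
    value = math.cos(2.0 * math.pi * theta_h)
    if bits == 0:
        return 1.0                              # a single level: always 1
    if bits == 1:
        return 1.0 if value >= 0.0 else -1.0    # two levels: sign only
    levels = 2 ** bits
    step = 2.0 / (levels - 1)                   # uniform spacing over [-1, 1]
    return -1.0 + step * round((value + 1.0) / step)
```

With 3 bits the output takes one of eight values spaced 2/7 apart, so the quantization error of a single cosine stays within 1/7 under this assumed scheme.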

Fig. 6 The object reconstruction results for the approximations of cosine function by assigning; (a) 0 bit, (b) 1 bit, (c) 15 bits, (d) 30 bits.

3.5. CGH processor

The initial-parameter calculator cell of Fig. 2(a) and the update-phase calculator of Fig. 4(a) or (b) constitute a CGH kernel that performs the actual CGH calculation. Of course, the number of cells in the CGH kernel is pre-determined according to the parallel computation scheme. The kernel is in turn a component of the CGH processor, which includes input/output interfaces, memory, and its controller, DMA. In this paper, the architectures of the CGH kernel and CGH processor of [9] are used.

4. Hardware implementation and experiments

The architecture proposed in the previous section was implemented in VHDL for an FPGA environment from Altera. Quartus II 10.0 and ModelSim 6.5e were used for VHDL design and simulation, respectively.

Table 2 compares the hardware resources in each cell of the proposed method to those of [9], which has a similar hardware composition. As shown in this table, the proposed scheme uses fewer hardware resources, multipliers, adders, and LUTs, even though it uses one MUX per cell. It also has high flexibility in parallel computation.

Table 2. Hardware Resource of CGH Cell

As explained previously, our scheme can calculate the influences from one light source on all the hologram pixels in parallel, if sufficient hardware is provided. However, the hardware for all the hologram pixels might be too large to realize. Thus, we estimated the calculation capability as the amount of hardware increases, as shown in Fig. 7. The horizontal and vertical axes indicate the amount of hardware corresponding to the number of hologram rows and the number of hologram frames per second, respectively. Three hologram resolutions were considered: 1,920×1,080 [pixel²], 1,408×1,050 [pixel²], and 1,280×1,024 [pixel²] (1920, 1408, and 1280 in the figure, respectively). In addition, two clock frequencies, 166 MHz and 294 MHz, were included, of which 294 MHz is the maximum stable frequency. As can be seen in the figure, the calculation speed has the properties of Eq. (19), as expected.

Fig. 7 Calculation speed according to the amount of hardware.

Calculation speed \propto \frac{(clock speed) \times (amount of implemented hardware)}{(number of object points) \times (hologram resolution)}

Table 3 shows the performance under some implementation conditions. The number of object light points was the same but we considered two hologram resolutions, 1,920×1,080 [pixel²] and 1,408×1,050 [pixel²]. Two cases were examined for examples of the amount of hardware: the number of cells corresponding to a row of holograms and four rows of holograms. Here, the maximum clock frequency that operated stably was 294 [MHz]. 27.22 frames/sec of holograms with HD resolution could be generated with this clock frequency. As explained above, the speed is proportional to the clock frequency and inversely proportional to the hologram resolution and the number of object points, as shown in the other cases of the table.
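The proportionality of Eq. (19) can be turned into a rough estimate by assuming that every cell finishes one hologram pixel for one object point in each clock cycle and that pipeline fill time is negligible. Both the constant of proportionality (taken as 1) and the function name are assumptions of this sketch, not values stated in the paper.

```python
def seconds_per_hologram(width, height, object_points, cells, clock_hz):
    """Idealized time per CGH: one pixel-point contribution per cell per clock."""
    return width * height * object_points / (cells * clock_hz)

def holograms_per_second(width, height, object_points, cells, clock_hz):
    """Reciprocal of the idealized time per CGH."""
    return 1.0 / seconds_per_hologram(width, height, object_points, cells, clock_hz)
```

Under these assumptions, 1,920 cells at 294 MHz with 10,000 object points give about 27.22 holograms/sec at 1,920×1,080, and 5,632 cells at 166 MHz give about 0.0158 sec per hologram at 1,408×1,050, in line with the corresponding entries of Table 3.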

Table 3. Performances for Various Implementation Conditions

Figure 8 shows two examples of the reconstructed objects, Ballet and Hyun-Jin. Ballet is a test multi-view video sequence from MPEG with a depth map resolution of 200×200 [pixel²]. Hyun-Jin is an image that we made, with a depth map resolution of 177×144 [pixel²]. The hologram resolution was 1,280×1,024 [pixel²] for both images. We used a depth camera from Mesa Imaging to capture the depth information for the test image of Hyun-Jin. For each test image, the figure shows its depth map in Fig. 8(a) and 8(d), the image reconstructed by simulation from the CGH generated with the original equation, Eq. (1), in Fig. 8(b) and 8(e), and the result reconstructed in the optical system (such as that of [9]) from the CGH generated by the proposed hardware in Fig. 8(c) and 8(f). The resolution and pixel pitch for the software-based CGH and reconstruction are 1,024×1,024 and 10.4μm, respectively. In the optical system, the resolution and pixel pitch of the spatial light modulator (SLM) are 1,280×1,024 and 13.62μm, respectively.

Fig. 8 Examples of reconstructed images (upper ones for Ballet and the lower ones for Hyun-Jin); (a) and (d) depth maps, (b) and (e) reconstructed results by software, (c) and (f) (Media 1) and (Media 2) reconstructed results by optical apparatus for the CGH generated by the proposed hardware.

5. Conclusion

In this paper, we proposed a CGH generation equation that can maximize the parallel computation by modifying the previous one. In addition, we proposed the hardware design of CGH cells to calculate the initial parameters and to update the phase for other pixels, although the architecture of the kernel and processor from one of our previous papers was used.

The experimental results, after implementing the proposed scheme in hardware in various conditions and amounts of hardware, verified that the calculation speed is proportional to the amount of hardware and clock frequency and inversely proportional to the resolution of the object image and that of the hologram. This can maximize the flexibility of CGH calculation.

The purpose of this paper is to maximize the parallel computation for CGH generation by the proposed hardware architecture. It can complete the computations for all the hologram pixels for one light source in a few clock cycles if sufficient hardware is incorporated. Also, it has the property of a trade-off between the amount of hardware and the computation speed all the way from pixel by pixel serial computation to the fully parallel computation. Therefore, the proposed scheme can be properly and efficiently used by applications according to the requirements for the amount of hardware and the computation speed.

Acknowledgments

This work was supported by the IT R&D program of KEIT. [KI002058, Signal Processing Elements and their SoC Developments to Realize the Integrated Service System for Interactive Digital Holograms]

References and links

1. S. Benton and V. M. Bove Jr., Holographic Imaging (Wiley, 2008).

2. J. K. Chung and M. H. Tsai, Three-Dimensional Holographic Imaging (Wiley, 2002).

3. P. Hariharan, Basics of Holography (Cambridge University Press, 2002).

4. M. Lucente, “Interactive computation of holograms using a look-up table,” J. Electron. Imaging 2, 28–34 (1993).

5. H. Yoshikawa, S. Iwase, and T. Oneda, “Fast computation of Fresnel holograms employing differences,” Proc. SPIE 3956, 48–55 (2000).

6. T. Shimobaba and T. Ito, “An efficient computational method suitable for hardware of computer-generated hologram with phase computation by addition,” Comput. Phys. Commun. 138, 44–52 (2001).

7. T. Ito, N. Masuda, K. Yoshimura, A. Shiraki, T. Shimobaba, and T. Sugie, “Special-purpose computer HORN-5 for a real-time electroholography,” Opt. Express 13, 1923–1932 (2005).

8. Y. Ichihashi, H. Nakayama, T. Ito, N. Masuda, T. Shimobaba, A. Shiraki, and T. Sugie, “HORN-6 special-purpose clustered computing system for electroholography,” Opt. Express 17, 13895–13903 (2009).

9. Y.-H. Seo, H.-J. Choi, J.-S. Yoo, and D.-W. Kim, “An architecture of a high-speed digital hologram generator based on FPGA,” J. Syst. Archit. 56, 27–37 (2009).

10. N. Masuda, T. Ito, T. Tanaka, A. Shiraki, and T. Sugie, “Computer generated holography using a graphics processing unit,” Opt. Express 14, 603–608 (2006).

11. L. Ahrenberg, P. Benzie, M. Magnor, and J. Watson, “Computer generated holography using parallel commodity graphics hardware,” Opt. Express 14, 7636–7641 (2006).

12. Y. Pan, X. Xu, S. Solanki, X. Liang, R. Bin, A. Tanjung, C. Tan, and T.-C. Chong, “Fast CGH computation using S-LUT on GPU,” Opt. Express 17, 18543–18555 (2009).

13. Y.-Z. Liu, J.-W. Dong, Y.-Y. Pu, B.-C. Chen, H.-X. He, and H.-Z. Wang, “High-speed full analytical holographic computations for true-life scenes,” Opt. Express 18, 3345–3351 (2010).

14. T. Shimobaba, T. Ito, N. Masuda, Y. Ichihashi, and N. Takada, “Fast calculation of computer-generated-hologram on AMD HD5000 series GPU and OpenCL,” Opt. Express 18, 9955–9960 (2010).

15. J. W. Goodman, Introduction to Fourier Optics, 3rd ed. (Roberts and Company, 2005).

Table 1. Pipeline Stage Scheduling
Cycle	Stage 1 (R0)	Stage 2 (R1)	Stage 3 (R2)	Stage 4 (R3)	Stage 5 (R4)	Stage 6 (R5)
1	0Δ
2	(1/2)Δ	Γ_1 + 0Δ
3	1Δ	Γ_1 + (1/2)Δ	Γ_1
4	(3/2)Δ	Γ_1 + 1Δ	Γ_2	θ_{H,1}
5	2Δ	Γ_1 + (3/2)Δ	Γ_3	θ_{H,2}	cos(2πθ_{H,1})
6	(5/2)Δ	Γ_1 + 2Δ	Γ_4	θ_{H,3}	cos(2πθ_{H,2})	A_j cos(2πθ_{H,1})
7	3Δ	Γ_1 + (5/2)Δ	Γ_5	θ_{H,4}	cos(2πθ_{H,3})	A_j cos(2πθ_{H,2})
8	(7/2)Δ	Γ_1 + 3Δ	Γ_6	θ_{H,5}	cos(2πθ_{H,4})	A_j cos(2πθ_{H,3})
⋮	⋮	⋮	⋮	⋮	⋮	⋮
n	((n – 1)/2)Δ	Γ_1 + ((n – 2)/2)Δ	Γ_{n–2}	θ_{H,n–3}	cos(2πθ_{H,n–4})	A_j cos(2πθ_{H,n–5})

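Table 2. Hardware Resource of CGH Cell
Resource	Initial-parameter calculator, [9]	Initial-parameter calculator, proposed	Optimized update-phase calculator, [9]	Optimized update-phase calculator, proposed
Multiplier	2	2	3	2
Adder	3	3	3	2
LUT1	1	1	-	-
LUT2	1	-	-	1
LUT3	-	-	1	-
Register	4	4	8	6
MUX	-	-	1	-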

Table 3. Performances for Various Implementation Conditions
Number of object points	10,000	10,000	10,000
Hologram resolution [pixel²]	1,920×1,080	1,920×1,080	1,408×1,050
Frequency [MHz]	294	294	166
Time [sec]/CGH	0.036	0.0092	0.0158
CGHs/sec	27.22	91.8	62.2
Included number of cells	1,920	7,680	5,632

Optics Express