
Parallel multi-view polygon rasterization for 3D light field display

Open Access

Abstract

Three-dimensional (3D) light field displays require image data sampled from a large number of regularly spaced camera viewpoints to produce a 3D image. Generally, it is inefficient to generate these images sequentially, because a large number of rendering operations are repeated across different viewpoints. Current 3D image generation based on traditional single-viewpoint computer graphics techniques is therefore not well suited to generating images for light field displays. A highly parallel multi-view polygon rasterization (PMR) algorithm for 3D multi-view image generation is presented. Based on the coherence of the triangle rasterization calculation among different viewpoints, the related rasterization algorithms, including primitive setup, plane functions, and barycentric coordinate interpolation in screen space, are derived. To verify the proposed algorithm, a hierarchical software rendering pipeline on the GPU is designed and implemented. Several groups of images of 3D objects are used to verify the performance of the PMR method, and correct 3D light field images are achieved in real time.

© 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Many efforts have been devoted to the research and implementation of three-dimensional (3D) technologies, including capturing, processing, and display of 3D images [1–5]. Broad applications have been discussed, such as entertainment, manufacturing, and biomedical applications. Because the content of a 3D light field image is difficult to obtain from a real scene, and camera acquisition easily introduces errors, computer graphics methods are often used to generate the light field image.

In computer graphics, the graphics pipeline is a conceptual model that describes the steps a graphics system performs to render a 3D scene to a 2D screen [6–10]. Based on the programmability of graphics pipelines, several generation methods for multi-view display have been presented [11–20]. Different from single-viewpoint generation, multi-view generation has a high information density, and more specialized generation methods designed for that purpose need to be explored. However, the current generation of 3D display content relies on traditional single-viewpoint computer graphics techniques, which are not sufficiently well suited to the task of synthesizing images for 3D light field displays. The rasterization stage is non-programmable, and the structure of the hardware pipeline is not easily customized. Instead, with programmable GPUs, unconventional rendering tasks such as multi-view rasterization can be implemented with a software renderer [21].

Here, a parallel polygon rasterization method is presented for multi-view light field image generation, which can be finished in one pass. Owing to the added view dimension, a new primitive for parallelized polygon rasterization is constructed. The positional relationship of vertices in screen space across all viewpoints is obtained, and the vertex positions in different viewpoints can be computed directly from a reference viewpoint. The plane function and barycentric coordinate interpolation method are derived to compute the color of pixels. A software rendering pipeline with the GPU is designed and implemented. The proposed method is implemented and verified in several 3D scenes. Finally, the performance of our rendering pipeline is analyzed and compared with other multi-view image generation methods.

2. Related work

Many different methods for computer-generated multi-view images have been presented. With the programmable graphics pipeline, some methods were implemented, such as multiple ray cluster rendering [11] and multiple orthographic frustum combining [12]. Based on the path of light propagation and the optical structure of the display, the generation methods of viewpoint vector rendering [13,14] and backward ray tracing [15,16] were presented. Some other methods are based on virtual viewpoint generation and image processing, such as multiple-reference-view depth-image-based rendering (MDIBR) [17]. The multiple viewpoint rendering (MVR) [18] method considers the image set as a single spatio-perspective volume, and the coordinate position transformation of vertices between viewpoints is explored.

Efficient rasterization methods and architectures for real-time rendering have also been developed [21]. Pineda presented a parallel algorithm for the rasterization of polygons that is particularly well suited to 3D Z-buffered graphics implementations [22]. The algorithm represents each edge of a polygon by a linear edge function with a value greater than zero on one side of the edge and less than zero on the opposite side. Based on this algorithm, a polygon tiling algorithm was presented in which recursive subdivision of image space is driven by coverage masks that classify a convex polygon as inside, outside, or intersecting cells in an image hierarchy [23]. The resulting hierarchical polygon tiling algorithm performs subdivision and visibility computations very rapidly while only visiting cells in the image hierarchy that are crossed by visible edges in the output image. This approach is widely used in GPU hardware rendering [24], but is no longer applicable to multi-view rendering.

As GPU performance becomes increasingly appealing, applications of all sorts are being recast to make use of GPUs. Although the current rendering pipeline is highly programmable, there are long-standing limitations in the hardware rasterization pipeline that restrict the set of algorithms that can benefit from hardware acceleration. Therefore, researchers have presented several effective software rendering pipelines, such as FreePipe [25], CUDARaster [26], Piko [27], and cuRE [28]. cuRE is a real-time graphics pipeline implemented entirely in software on a modern GPU and features a fully concurrent, multi-stage, streaming design with dynamic load balancing, capable of operating efficiently within bounded memory. The power of a software approach lies in the ability to tailor the graphics pipeline to any given application. To validate our algorithm, cuRE is used as the software rasterization framework to implement our method. Also, the dynamic parallelism technique provides flexibility for our pipeline design.

3. Parallel multi-view polygon rasterization method

The typical structure of the light field display system is shown in Fig. 1(a). The synthesized image on the LCD is obtained from computer-generated multi-view images, and the light rays from the synthesized image are integrated by the lenticular lens array and diffuser so that they reconstruct a 3D image of the virtual object [3]. According to the path of the rays in the optical system, the multi-view image information in the synthesized image can be captured by multiple cameras. As shown in Fig. 1(b), the capture cameras are arranged in a regularly spaced linear array located at the camera plane. The optical axis directions of the cameras are identical, and all the cameras focus on the same plane at a certain depth with a shear transform.

Fig. 1. (a) The 3D images are constructed by a multi-view light field display system; (b) the cameras with sheared perspective projection are placed in the horizontal direction for 3D information capture.

In traditional single-view rasterization rendering, only one viewpoint image can be generated at a time, so multiple rendering passes are required to generate multi-view images. Each viewpoint image is generated by reading in the model, transforming it into world space, repeatedly transforming the scene geometry based on the position of a moving camera, and performing shading calculations. Although some operations such as view-independent lighting can be performed once for the entire image sequence, most operations are executed once per view and are not shared among viewpoints.

The PMR method draws 3D models by inherently computing multiple viewpoints of the scene at once. The primitive shown in Fig. 2(b) is called a multi-view triangle array (MTA), which is similar to a triangular prism formed by 6 vertices. PMR reduces the cost of producing the multi-view images by performing as many calculations as possible just once per sequence rather than once per viewpoint. Accordingly, the result of rasterization is a 3D volume, where the X/Y-axes represent the indices of horizontal and vertical pixels, and the Z-axis represents the viewpoint number (0∼N). By considering the image sequence as a whole, PMR reduces the cost of scene synthesis. In the algorithm, multiplications are transformed into accumulation operations as much as possible. For example, in the case of n viewpoints and m pixels covered per triangle, 38mn multiplications are required in the traditional method, while only 30m+8mn multiplications are needed in our method. Compared with traditional methods, it has higher parallelism and lower time complexity.
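As a concrete illustration with assumed numbers, for n = 96 viewpoints and m = 100 covered pixels per triangle, the traditional approach would need 38 × 100 × 96 = 364,800 multiplications, whereas 30 × 100 + 8 × 100 × 96 = 79,800 suffice here, roughly a 4.6-fold reduction, and the gap widens as n grows.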

Fig. 2. (a) 3D mesh model; (b) primitive MTA setup by viewpoint re-projection; (c) the coverage of voxels is calculated by the plane functions; (d) the properties of voxels are obtained based on barycentric coordinate interpolation; (e) the rasterization results of the MTA are stored in a 3D volume; (f) multi-view images are stored in a 3D volume.

Rasterization is the process in which each individual primitive is broken down into discrete elements called fragments, based on the sample coverage of the primitive. In the rasterization stage, each triangle is processed through the following steps, as shown in Fig. 2. First, the primitive MTA is set up based on viewpoint re-projection. Second, the voxels covered by the MTA are found based on plane functions. Third, each voxel's properties are obtained based on barycentric coordinate interpolation. The details of these steps are as follows.

3.1 MTA setup by viewpoint re-projection

There is a fixed spatial relationship among vertex positions in different viewpoints, and the screen coordinates of a vertex in one viewpoint can be transformed directly into those of any other viewpoint by view re-projection [29]. As shown in Fig. 3(a), vertices in the reference and destination viewpoints correspond one to one, and the basic mapping rule conforms to the following Eq. (1), where n is the sequence number of the viewpoint, N is the total number of viewpoints, T is the horizontal distance between the reference and the destination viewpoint, and Z0 is the depth of the common focus plane.

$$\left( {{x_0} + \frac{{{t_n}({{Z_0} - {z_0}} )}}{{{z_0}}},{y_0},{z_0}} \right)$$
$${t_n} = \frac{{n \cdot T}}{N}$$

Fig. 3. Viewpoint re-projection: (a) correspondence between pixels of a reference viewpoint and a destination viewpoint; (b) the disparity in the screen coordinate system; (c) the vertices of an arbitrary viewpoint view_n can be obtained by linear interpolation between view_0 and view_N.

Most computer graphics scene databases consist of polygons instead of just points. Polygons are graphics primitives that describe patches of surfaces. Most computer graphics algorithms and hardware are optimized for triangle drawing, sometimes even more so than for line drawing. As shown in Fig. 3(b), a triangle is defined by 3 non-collinear vertices, and the vertices of the triangle in the destination viewpoint can be obtained by view re-projection.

In the rendering process with multiple viewpoints, the left-most viewpoint is marked as view_0, and the right-most viewpoint is marked as view_N. The screen coordinates of the vertices (An, Bn, Cn) in the n-th viewpoint view_n can be obtained by linear interpolation between view_0 (A0, B0, C0) and view_N (AN, BN, CN). View_0 is regarded as the reference viewpoint, and the positions of the vertices in view_n are expressed as Eq. (3),

$$\begin{array}{l} {A_n}:({{x_{A0}} + \Delta A,{y_{A0}},{z_{A0}}} )= ({{x_{A0}} + {{{t_n}({{Z_0} - {z_{A0}}} )} / {{z_{A0}}}},{y_{A0}},{z_{A0}}} ),\\ {B_n}:({{x_{B0}} + \Delta B,{y_{B0}},{z_{B0}}} )= ({{x_{B0}} + {{{t_n}({{Z_0} - {z_{B0}}} )} / {{z_{B0}}}},{y_{B0}},{z_{B0}}} ),\\ {C_n}:({{x_{C0}} + \Delta C,{y_{C0}},{z_{C0}}} )= ({{x_{C0}} + {{{t_n}({{Z_0} - {z_{C0}}} )} / {{z_{C0}}}},{y_{C0}},{z_{C0}}} ). \end{array}$$
The coordinates of these vertices in different viewpoints change only in the x-axis (horizontal) direction [18]. The shape of the MTA is similar to an irregular triangular prism, as shown in Fig. 3(c).
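For illustration, a minimal CUDA-style sketch of this re-projection is given below, following Eqs. (1)–(3); the Vertex structure and the helper names (viewOffset, reproject, setupMTA) are our own and are not taken from the paper's implementation.

```cuda
// Sketch of viewpoint re-projection per Eqs. (1)-(3); names are illustrative.
struct Vertex { float x, y, z; };

// tn: horizontal offset of viewpoint n, tn = n * T / N (Eq. (2)).
__host__ __device__ inline float viewOffset(int n, int N, float T) {
    return (float)n * T / (float)N;
}

// Re-projected screen position of a reference-view vertex for viewpoint n
// (Eq. (1)); Z0 is the depth of the common focus plane, y and z are unchanged.
__host__ __device__ inline Vertex reproject(Vertex v0, float tn, float Z0) {
    Vertex vn = v0;
    vn.x = v0.x + tn * (Z0 - v0.z) / v0.z;
    return vn;
}

// An MTA is fully described by the three reference-view vertices (A0, B0, C0)
// and their re-projections in the right-most viewpoint (AN, BN, CN).
struct MTA { Vertex A0, B0, C0, AN, BN, CN; };

__host__ __device__ inline MTA setupMTA(Vertex A0, Vertex B0, Vertex C0,
                                        int N, float T, float Z0) {
    float tN = viewOffset(N, N, T);   // offset of view_N, i.e. tN = T
    MTA m = { A0, B0, C0,
              reproject(A0, tN, Z0),
              reproject(B0, tN, Z0),
              reproject(C0, tN, Z0) };
    return m;
}
```

Any intermediate viewpoint n then follows from the same reproject() call with tn = nT/N, so only the two extreme viewpoints need to be stored per primitive.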

3.2 Plane function

The edge function is a linear function which can be used to classify points on a 2D plane, subdivided by a line, into three regions: the points to the “left” of the line, the points to the “right” of the line, and the points on the line [22]. In single-view rasterization, the edge equation is a common method to calculate pixel coverage. As shown in Fig. 4(a), when the pixel q is on the same side of the three edges, the results of the 3 edge equations are all positive, and the pixel q is covered by the triangle ΔAnBnCn. In multi-view rasterization, because of the added viewpoint dimension, the edge function is extended to a plane function. The 3 directed planes of an MTA can be represented by 3 plane functions as shown in Fig. 4(b). Each plane function is negative for points to the “left” of the plane, positive for points to the “right”, and zero for points on the plane. The four points on the same side face (such as A0, B0, BN, AN) are coplanar because the points of different viewpoints are only offset in the horizontal direction. For example, the normals of the 3 planes A0B0BNAN, B0C0CNBN, and C0A0ANCN can be expressed separately as

$$\begin{array}{l} {{\bf n}_{\bf 1}}{\bf = }{{\bf A}_{\bf 0}}{{\bf A}_{\bf N}} \times {{\bf A}_{\bf 0}}{{\bf B}_{\bf 0}},\\ {{\bf n}_{\bf 2}}{\bf = }{{\bf B}_{\bf 0}}{{\bf B}_{\bf N}} \times {{\bf B}_{\bf 0}}{{\bf C}_{\bf 0}},\\ {{\bf n}_{\bf 3}}{\bf = }{{\bf C}_{\bf 0}}{{\bf C}_{\bf N}} \times {{\bf C}_{\bf 0}}{{\bf A}_{\bf 0}}. \end{array}$$
The 3 planes are then described by the plane functions P:
$$\begin{array}{l} {P_1} = {{\bf n}_{\bf 1}} \cdot ({x - {x_{A0}},y - {y_{A0}},z - {z_{A0}}} ),\\ {P_2} = {{\bf n}_{\bf 2}} \cdot ({x - {x_{B0}},y - {y_{B0}},z - {z_{B0}}} ),\\ {P_3} = {{\bf n}_{\bf 3}} \cdot ({x - {x_{C0}},y - {y_{C0}},z - {z_{C0}}} ). \end{array}$$

 figure: Fig. 4.

Fig. 4. (a) A triangle ΔAnBnCn with its three edge functions AnBn, BnCn, and CnAn. For points inside the triangle ΔAnBnCn, all edge functions are positive. (b) The MTA is defined by three lateral and two bottom surfaces. For points inside the MTA, all plane functions are positive.

Download Full Size | PDF

If the voxel q (x, y, z) lies on the positive side of all three planes, i.e., it satisfies the inequalities of Eq. (6), the voxel is covered by the MTA.

$$\begin{array}{l} {P_1}(x,y,z) \ge 0,\\ {P_2}(x,y,z) \ge 0,\\ {P_3}(x,y,z) \ge 0,\\ 0 \le z \le N. \end{array}$$

Like the edge function, the plane function is easy to update incrementally. For example, the plane function of a neighboring voxel is described as:

$$P({x + a,y + b,z + c} )= P({x,y,z} )+ a{n_x} + b{n_y} + c{n_z}.$$
This property is well suited for a multi-view rasterizer, which steps through a polygon, generating one or a group of fragments at each step.
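The following sketch illustrates Eqs. (4)–(7) in the same style, reusing the Vertex type from the previous sketch; the Plane structure and helper names are assumptions made for illustration.

```cuda
// Sketch of the plane-function coverage test of Eqs. (4)-(7).
struct Plane { float nx, ny, nz, d; };   // P(x,y,z) = nx*x + ny*y + nz*z + d

__host__ __device__ inline Plane makePlane(Vertex p, Vertex q, Vertex r) {
    // Normal n = (q - p) x (r - p); the plane passes through p (Eqs. (4)-(5)).
    float ux = q.x - p.x, uy = q.y - p.y, uz = q.z - p.z;
    float vx = r.x - p.x, vy = r.y - p.y, vz = r.z - p.z;
    Plane pl;
    pl.nx = uy * vz - uz * vy;
    pl.ny = uz * vx - ux * vz;
    pl.nz = ux * vy - uy * vx;
    pl.d  = -(pl.nx * p.x + pl.ny * p.y + pl.nz * p.z);
    return pl;
}

__host__ __device__ inline float evalPlane(Plane pl, float x, float y, float z) {
    return pl.nx * x + pl.ny * y + pl.nz * z + pl.d;
}

// A voxel (x, y, z) is covered when all three plane functions are
// non-negative and 0 <= z <= N (Eq. (6)).
__host__ __device__ inline bool covered(Plane p1, Plane p2, Plane p3,
                                        float x, float y, float z, int N) {
    return evalPlane(p1, x, y, z) >= 0.f &&
           evalPlane(p2, x, y, z) >= 0.f &&
           evalPlane(p3, x, y, z) >= 0.f &&
           z >= 0.f && z <= (float)N;
}

// Incremental update of Eq. (7): stepping by (a, b, c) needs only three
// multiply-adds instead of re-evaluating the full plane function.
__host__ __device__ inline float stepPlane(float P, Plane pl,
                                           float a, float b, float c) {
    return P + a * pl.nx + b * pl.ny + c * pl.nz;
}
```

Following the vertex order of Eq. (4), P1, P2, and P3 would be built as makePlane(A0, AN, B0), makePlane(B0, BN, C0), and makePlane(C0, CN, A0).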

3.3 Barycentric coordinate interpolation

If voxels are covered by the MTA, the attribute values of these voxels need to be determined. In 3D models, the most common vertex attributes are color, normal, and texture coordinates. The attribute values of covered voxels can be obtained by interpolating the 6 vertices of the MTA. For the vertex interpolation of the MTA, linear interpolation is first performed along the Z (viewpoint) direction to obtain the triangle areas, and then barycentric coordinate interpolation within the triangle is performed. When the view ID is a constant, the plane function degenerates to an edge function. As shown in Fig. 4(b), for the v-th viewpoint, the edge function is expressed as

$$P({x + a,y + b,z + v} )= P({x,y,z} )+ a{n_x} + b{n_y} + v{n_z}, $$
and the area of the triangle ΔA0B0C0 is expressed as
$${S_0} = |{{{\bf A}_{\bf 0}}{{\bf B}_{\bf 0}} \times {{\bf B}_{\bf 0}}{{\bf C}_{\bf 0}}} |,$$
$$\begin{array}{l} {S_v} = |{{{\bf A}_{\bf v}}{{\bf B}_{\bf v}} \times {{\bf B}_{\bf v}}{{\bf C}_{\bf v}}} |= |{({{{\bf B}_{\bf v}} - {{\bf A}_{\bf v}}} )\times ({{{\bf C}_{\bf v}} - {{\bf B}_{\bf v}}} )} |\\ = |{({\Delta C{\bf - }\Delta B} )({0\textrm{,}{{\bf A}_{\bf 0}}{{\bf B}_{\bf 0}}_y\textrm{,}{{\bf A}_{\bf 0}}{{\bf B}_{\bf 0}}_z} )+ ({\Delta B{\bf - }\Delta A} )({\textrm{0}\textrm{,}{{\bf B}_{\bf 0}}{{\bf C}_{\bf 0}}_y\textrm{,}{{\bf B}_{\bf 0}}{{\bf C}_{{\bf 0}z}}} )} |+ {S_0} \end{array}. $$

The attribute values of any point inside the triangle are expressed as a weighted average of the 3 vertices. The weights (W1, W2, W3) can be obtained by the barycentric coordinate method. Barycentric coordinates express the position of any point located on the triangle with three scalars [30]. This point may lie anywhere inside the triangle, on any of its three edges, or at any one of its three vertices. The barycentric coordinates can be obtained from the results of the edge functions and the area of the triangle, as expressed in Eq. (11).

$$\begin{array}{l} {W_1} = \frac{{{P_1}(v )}}{{{S_v}}},\\ {W_2} = \frac{{{P_2}(v )}}{{{S_v}}},\\ {W_3} = \frac{{{P_3}(v )}}{{{S_v}}}. \end{array}$$
The 3 weights have the following relationships:
$${W_1} + {W_2} + {W_3} = 1.$$
So, the attribute values of the point (x, y, v) inside the MTA are expressed as:
$$Param(x,y,v) = Param_1 \cdot W_1 + Param_2 \cdot W_2 + Param_3 \cdot W_3,$$
where Param1, Param2, and Param3 denote the attribute values of the three vertices.
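A minimal sketch of this interpolation is shown below; the Attr structure is an assumed placeholder for any vertex attribute, and the pairing of weights with vertices depends on the edge ordering used, so the ordering here is purely illustrative.

```cuda
// Sketch of the per-viewpoint barycentric interpolation of Eqs. (9)-(13).
// P1v, P2v, P3v are the plane-function values of the voxel in viewpoint v
// (the degenerate edge functions of Eq. (8)); Sv is the triangle area in
// that viewpoint (Eq. (10)).
struct Attr { float r, g, b; };           // e.g. vertex color

__host__ __device__ inline Attr interpolate(Attr a1, Attr a2, Attr a3,
                                            float P1v, float P2v, float P3v,
                                            float Sv) {
    // Barycentric weights (Eq. (11)); W1 + W2 + W3 = 1 (Eq. (12)).
    float W1 = P1v / Sv, W2 = P2v / Sv, W3 = P3v / Sv;
    Attr out;
    out.r = a1.r * W1 + a2.r * W2 + a3.r * W3;   // weighted average (Eq. (13))
    out.g = a1.g * W1 + a2.g * W2 + a3.g * W3;
    out.b = a1.b * W1 + a2.b * W2 + a3.b * W3;
    return out;
}
```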

4. Implementation of PMR with GPU

In this section, a hierarchical software rasterization rendering pipeline is designed. The structure of the pipeline is illustrated in Fig. 5. Our pipeline consists of four stages: primitive setup, tile rasterizer, fine rasterizer, and pixel encoding. Each stage is executed as a separate CUDA kernel launch with the dynamic parallelism technique [31]. The whole pipeline adopts a hierarchical structure to accelerate the calculation: if the pixels in a bounding box are judged not to be covered, the next stage is not launched [32]. Primitive setup performs culling, clipping, snapping, and calculation of plane functions. The tile rasterizer performs axis-aligned bounding box (AABB) tests at different levels, and tiles outside the AABB are abandoned. The fine rasterizer processes each frame buffer tile, computing exact coverage for the overlapping triangles and executing the shader. Finally, the multi-view images are encoded into the synthesized image. The resolution of our display device is 7680×4320, with 96 viewpoints. The resolution of each viewpoint is 1024×512, which is divided into 256×128×24 tiles. Each tile includes 4×4×4 voxels.
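The following skeleton sketches how such stage chaining could look with CUDA dynamic parallelism (compiled with -rdc=true on sm_35 or newer); the kernel names, launch sizes, and survival flags are placeholders rather than the paper's actual interfaces.

```cuda
// Skeleton of the hierarchical launch-if-covered structure; illustrative only.
__global__ void fineRaster() { /* per-voxel coverage, shading, depth test */ }

__global__ void tileRaster() {
    // AABB tests and per-tile plane-function tests would run here.
    bool anyTileCovered = true;                 // placeholder result
    if (threadIdx.x == 0 && anyTileCovered) {
        // The next stage is launched only when covered tiles remain.
        fineRaster<<<1, 64>>>();                // e.g. 4x4x4 voxels per tile
    }
}

__global__ void primitiveSetup() {
    // Culling, clipping, snapping, and plane-function setup would run here.
    bool survives = true;                       // placeholder result
    if (survives) {
        tileRaster<<<1, 1>>>();                 // one child launch per primitive
    }
}
```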

Fig. 5. A high-level diagram of our software rasterization pipeline.

4.1 Primitive setup

In this stage, the differentials, plane functions, and other data for the triangle are computed. These data are saved in global memory for interpolation of the various shading data in later stages. Each CUDA thread is given the task of processing one triangle created by a previously executed vertex shader. After reading the vertex positions, vertex clipping and culling are performed. If any of the vertices are outside the near or far clip plane, the triangle is processed by the clipper [33].

Multiple culling tests are performed for each triangle. Because multiple view frustums exist for the different viewpoints, the results of the view culling tests differ among viewpoints. To balance efficiency and correctness, a conservative culling is performed using a single frustum body containing the union of all the view frustums, as shown in Fig. 6(a). Only primitives that are entirely or partially inside this frustum need to be rendered. One way to speed up the rendering process is to compare the bounding box of each object to the view frustum. If the bounding box is outside the frustum, the geometry it encloses can be omitted from rendering. If instead the bounding box is inside or intersects the frustum, the contents of that bounding box may be visible and must be sent through the rendering pipeline. Furthermore, if the signed area of the triangle is negative or zero and backface culling is enabled, the triangle is discarded. Because the signed area varies linearly with the viewpoint, the backface culling results can be obtained by linear interpolation, so only the values of view_0 and view_N need to be judged, as shown in Fig. 6(b).
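A minimal sketch of this two-endpoint backface test is given below, reusing the Vertex type from the earlier sketches; the signed-area helper and the culling convention (counter-clockwise front faces) are assumptions for illustration.

```cuda
// Signed 2D area of the projected triangle; positive for counter-clockwise order.
__host__ __device__ inline float signedArea2D(Vertex a, Vertex b, Vertex c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// Since the signed area varies linearly with the viewpoint index, the MTA is
// back-facing for every viewpoint only if both endpoint areas are <= 0.
__host__ __device__ inline bool backfacing(Vertex A0, Vertex B0, Vertex C0,
                                           Vertex AN, Vertex BN, Vertex CN) {
    return signedArea2D(A0, B0, C0) <= 0.f && signedArea2D(AN, BN, CN) <= 0.f;
}
```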

Fig. 6. (a) A spatial set consisting of a set of frustums of multiple viewpoints; (b) The AABB of MTA for culling and clipping.

Fig. 7. The operations performed in the tile rasterizer: (a) the parameters of the MTA are fetched from the input queue in global memory; (b) the AABB tests are performed; (c) the plane functions are calculated.

Fig. 8. The operations performed in the fine rasterizer. Each CUDA thread block performs the rasterization of a tile with the same (x, y) index.

4.2 Tile rasterization

According to the result of clipping and culling in the primitive setup stage, the tile rasterization stage is launched, as shown in Fig. 7. Triangles are fetched from the input queue in global memory, and each thread is given the task of processing one triangle. Each CUDA thread block has one thread, and the number of blocks is equal to the number of triangles. The AABB test is performed to discard tiles that are not covered by the MTA. The plane function calculations are performed on all the 3D tiles through several iterations, and the tiles covered by the MTA are obtained. A 3D bitmask records the results, with each bit representing whether a tile is covered. Finally, according to the result of the AABB test, the fine rasterizer is launched, and the number of threads is equal to the number of covered tiles. The triangle information and the bitmask are passed to the fine-raster kernel as input.
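The tile-level coverage pass might be sketched as follows, reusing the Plane type and covered() helper from the earlier sketch; the tile-center test and the bitmask layout are simplifications (a conservative implementation would test the tile corner farthest along each plane normal).

```cuda
#define TILE_SIZE 4   // 4x4x4 voxels per tile

// One thread walks the tiles inside the MTA's AABB and sets one bit per
// covered tile in a per-triangle bitmask.
__device__ inline void markCoveredTiles(Plane p1, Plane p2, Plane p3,
                                        int txMin, int txMax,
                                        int tyMin, int tyMax,
                                        int tzMin, int tzMax,
                                        int tilesX, int tilesY, int N,
                                        unsigned int* bitmask) {
    for (int tz = tzMin; tz <= tzMax; ++tz)
        for (int ty = tyMin; ty <= tyMax; ++ty)
            for (int tx = txMin; tx <= txMax; ++tx) {
                // Test the tile center against the three plane functions.
                float cx = (tx + 0.5f) * TILE_SIZE;
                float cy = (ty + 0.5f) * TILE_SIZE;
                float cz = (tz + 0.5f) * TILE_SIZE;
                if (covered(p1, p2, p3, cx, cy, cz, N)) {
                    int id = (tz * tilesY + ty) * tilesX + tx;
                    bitmask[id >> 5] |= 1u << (id & 31);   // set the tile's bit
                }
            }
}
```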

4.3 Fine rasterization

Each thread is given the task of processing one tile, and the voxel coverage is computed through 4×4×4 iterations, as shown in Fig. 8. The coordinates of the current tile are calculated from the thread index together with the input triangle parameters. The surviving threads continue to execute the shader. First, the plane functions are evaluated and the attributes are interpolated with barycentric coordinates. Then the depth is calculated, and the fragment is discarded if the depth test fails. Finally, shading and other calculations are performed. After the color of each voxel is obtained, it is saved to the frame buffer in global memory. To avoid memory conflicts, atomic operations should be used in the saving process. To reduce the influence of atomic operations on parallel efficiency, we change the order of view calculation between different triangles, so that the color values at the same address are not modified at the same time.
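As one possible way to combine the depth test and the color write in a single atomic operation, the sketch below packs depth and color into one 64-bit word; this packing is our illustrative assumption (reusing the Attr type from the earlier sketch), not the paper's implementation, which instead reorders the view calculations between triangles.

```cuda
// Pack an Attr color into 24 bits of an unsigned int.
__device__ inline unsigned int packColor(Attr c) {
    return ((unsigned int)(c.r * 255.f) << 16) |
           ((unsigned int)(c.g * 255.f) <<  8) |
            (unsigned int)(c.b * 255.f);
}

// Keep the nearest fragment: the depth occupies the high 32 bits, so a single
// 64-bit atomicMin performs the depth test and the color write together
// (64-bit atomicMin requires compute capability 3.5+).
__device__ inline void writeFragment(unsigned long long* depthColor, size_t idx,
                                     unsigned int depthBits, Attr c) {
    unsigned long long packed =
        ((unsigned long long)depthBits << 32) | packColor(c);
    atomicMin(&depthColor[idx], packed);
}
```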

4.4 Pixel encoding stage

According to the parameters of the display device, the synthesized image can be obtained by pixel encoding from the multi-view images [34,35]. Each sub-pixel of the synthesized image is taken from a different viewpoint image, as shown in Fig. 9. The view number v of each sub-pixel (i, j, k) is calculated as follows:

$$v(i,j,k) = \left( {{{({\textrm{3}j\textrm{ + }k\textrm{ - 3}i\textrm{tan}\alpha } )} / L}\textrm{ - }floor\left( {\frac{{({\textrm{3}j\textrm{ + }k\textrm{ - 3}i\textrm{tan}\alpha } )}}{L}} \right)} \right) \cdot N, $$
$$L\textrm{ = }{{{p_u}} / {{p_h}\cos \alpha }}, $$
where pu is the lens pitch, ph is the horizontal sub-pixel pitch, α is the slant angle of the lenticular lens array, and N is the total number of viewpoints. floor(x) maps a real number x to the largest integer not greater than x. Each thread is given the task of processing one pixel, which is regarded as post-processing of the rendering pipeline. The computations of different threads are similar and non-interfering, so all the threads can be assigned in one thread block.
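A sketch of this encoding kernel is given below; the buffer layouts, the nearest-neighbor sampling of the lower-resolution viewpoint images, and the kernel signature are assumptions made for illustration.

```cuda
#include <math.h>

// One thread maps one pixel (i, j) of the synthesized image, and its three
// sub-pixels k, to viewpoint indices according to Eqs. (14)-(15).
__global__ void encodePixels(unsigned char* synth, const unsigned char* views,
                             int width, int height,        // synthesized image
                             int viewW, int viewH, int N,  // viewpoint images
                             float pu, float ph, float alpha) {
    int j = blockIdx.x * blockDim.x + threadIdx.x;   // column of synthesized image
    int i = blockIdx.y * blockDim.y + threadIdx.y;   // row of synthesized image
    if (i >= height || j >= width) return;

    float L = pu / (ph * cosf(alpha));               // Eq. (15)
    int vi = i * viewH / height;                     // nearest-neighbor sample of
    int vj = j * viewW / width;                      // the viewpoint image

    for (int k = 0; k < 3; ++k) {                    // R, G, B sub-pixels
        float t = (3.f * j + k - 3.f * i * tanf(alpha)) / L;
        int v = (int)((t - floorf(t)) * N);          // view number, Eq. (14)
        size_t dst = ((size_t)i * width + j) * 3 + k;
        size_t src = (((size_t)v * viewH + vi) * viewW + vj) * 3 + k;
        synth[dst] = views[src];
    }
}
```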

Fig. 9. Subpixel-viewpoint arrangement of PMR 3D display and the corresponding parameters.

5. Results

5.1 Experimental configuration

Here, a dense-viewpoint horizontal-parallax light field display system shown in Fig. 10 is used, and its numerical parameters are summarized in Table 1. In the experiments, a computer with an Intel Core i9-9900K CPU @ 3.60 GHz and an NVIDIA GeForce RTX 2080 GPU is used. The algorithm is implemented with OpenGL and CUDA 10.2.

Fig. 10. The photograph of multi-view light field display system.

Table 1. Related parameters of multi-view light field display system

5.2 Generation results

The generated results are as follows. The multi-view images and the left-most and right-most viewpoint images are shown in Fig. 11. After pixel encoding, the synthesized image is generated as shown in Fig. 12. The reconstructed 3D images displayed from different perspectives are shown in Fig. 13. Our algorithm can generate images of different viewpoints correctly at the same time.

Fig. 11. The images are generated by the center camera and the synthesized image.

Fig. 12. The synthesized image is displayed on LCD.

Fig. 13. Different perspectives of the reconstructed 3D scene on light field display (Visualization 1).

5.3 Performance analysis

The influence of resolution and viewpoint number on the efficiency of each stage is tested, as shown in Fig. 14. The primitive setup and tile rasterizer are affected by the number of triangles. The fine rasterizer is greatly affected by both the number of triangles and the resolution. The pixel encoding stage is related only to the resolution. These results are consistent with expectations.

Fig. 14. Time consumption of each stage in our pipeline.

Our method is compared with previous real-time generation methods, cuRE and MDIBR. The software renderer cuRE, based on traditional single-view rasterization, is implemented with CUDA and shares the same framework as our method. The renderer of the MDIBR method [17] is implemented with OpenGL. As shown in Fig. 15, when the total resolution is fixed, the calculation speed of our method is not significantly affected by the change of the viewpoint number. Therefore, our method is suitable for multi-view image generation. Although our method is implemented only in software, it is the fastest when the number of viewpoints is greater than 10. For parallel computing with CUDA, optimization operations are used to ensure that our program scales to a large number of viewpoints. These optimizations become an overhead when the number of viewpoints is small, which causes lower efficiency when the number of viewpoints is less than 10.

Fig. 15. Generating performances (fps) of different rendering methods.

6. Conclusion

In summary, a highly parallel triangle rasterization algorithm for 3D light field image generation is presented. Because of the added viewpoint dimension, the computation algorithms of primitive setup, fragment coverage, and interpolation are re-derived. To verify the algorithm, a hierarchical GPU software renderer is designed. The experiments show that the rasterizer produces correct rasterization results and that real-time generation can be realized. The frame rate remains above 60 fps at 7680×4320 resolution with millions of vertices. In theory, compared with traditional rasterization, the proposed algorithm exhibits obvious advantages in time complexity. If our rasterization algorithm were implemented in hardware, its advantages would be even more obvious.

Funding

National Natural Science Foundation of China (61705014, 61905017); Fundamental Research Funds for the Central Universities (2019PTB-018, 2019RC13); National Key Research and Development Program of China (2017YFB1002900); BUPT Excellent Ph.D. Students Foundation (CX2020224).

Disclosures

The authors declare no conflicts of interest. This work is original and has not been published elsewhere.

References

1. M. Martínez-Corral and B. Javidi, “Fundamentals of 3D imaging and displays: a tutorial on integral imaging, light-field, and plenoptic systems,” Adv. Opt. Photonics 10(3), 512 (2018). [CrossRef]  

2. G. Lv, Q. Wang, W. Zhao, and J. Wang, “3D display based on parallax barrier with multiview zones,” Appl. Opt. 53(7), 1339–1342 (2014). [CrossRef]  

3. J. Wen, X. Yan, X. Jiang, Z. Yan, F. Fan, P. Li, Z. Chen, and S. Chen, “Integral imaging based light field display with holographic diffusor: principles, potentials and restrictions,” Opt. Express 27(20), 27441 (2019). [CrossRef]  

4. X. Sang, X. Gao, X. Yu, S. Xing, Y. Li, and Y. Wu, “Interactive floating full-parallax digital three-dimensional light-field display based on wavefront recomposing,” Opt. Express 26(7), 8883 (2018). [CrossRef]  

5. X. Yu, X. Sang, X. Gao, D. Chen, B. Liu, L. Liu, C. Gao, and P. Wang, “Dynamic three-dimensional light-field display with large viewing angle based on compound lenticular lens array and multi-projectors,” Opt. Express 27(11), 16024 (2019). [CrossRef]  

6. S. Marschner, “The Graphics Pipeline,” in Fundamentals of Computer Graphics (A K Peters/CRC Press, 2019), pp. 159–182.

7. R. L. Cook, L. Carpenter, and E. Catmull, “The reyes image rendering architecture,” in Proceedings of the 14th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH 1987 (1987), pp. 95–102.

8. T. Aila and S. Laine, “Understanding the efficiency of ray traversal on GPUs,” in Proceedings of the HPG 2009: Conference on High-Performance Graphics 2009 (2009), pp. 145–150.

9. M. Eldridge, H. Igehy, and P. Hanrahan, “Pomegranate: A fully scalable graphics architecture,” in Proceedings of the ACM SIGGRAPH Conference on Computer Graphics (2000), pp. 443–454.

10. D. B. Kirk and W. M. W. Hwu, Programming Massively Parallel Processors: A Hands-on Approach: 3rd Edition (2016).

11. S. Jiao, X. Wang, M. Zhou, W. Li, T. Hong, D. Nam, J.-H. Lee, E. Wu, H. Wang, and J.-Y. Kim, “Multiple ray cluster rendering for interactive integral imaging system,” Opt. Express 21(8), 10070 (2013). [CrossRef]  

12. S. L. Li, Q. H. Wang, Z. L. Xiong, H. Deng, and C. C. Ji, “Multiple orthographic frustum combining for real-time computer-generated integral imaging system,” J. Disp. Technol. 10(8), 704–709 (2014). [CrossRef]  

13. K. S. Park, S. W. Min, and Y. Cho, “Viewpoint vector rendering for efficient elemental image generation,” IEICE Trans. Inf. Syst. E90-D(1), 233–241 (2007). [CrossRef]  

14. Z. Yan, X. Yan, X. Jiang, and L. Ai, “Computational integral imaging reconstruction of perspective and orthographic view images by common patches analysis,” Opt. Express 25(18), 21887 (2017). [CrossRef]  

15. S. Xing, X. Sang, X. Yu, C. Duo, B. Pang, X. Gao, S. Yang, Y. Guan, B. Yan, J. Yuan, and K. Wang, “High-efficient computer-generated integral imaging based on the backward ray-tracing technique and optical reconstruction,” Opt. Express 25(1), 330 (2017). [CrossRef]  

16. J. Wen, X. Yan, X. Jiang, Z. Yan, Y. Wang, and J. Wang, “Nonlinear mapping method for the generation of an elemental image array in a photorealistic pseudoscopic free 3D display,” Appl. Opt. 57(22), 6375 (2018). [CrossRef]  

17. Y. Guan, X. Sang, S. Xing, Y. Li, and B. Yan, “Real-Time Rendering Method of Depth-Image-Based Multiple Reference Views for Integral Imaging Display,” IEEE Access 7, 170545–170552 (2019). [CrossRef]  

18. M. Halle, “Multiple viewpoint rendering,” in Proceedings of the 25th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH 1998 (1998), pp. 243–254.

19. H. Liao, T. Dohi, and K. Nomura, “Autostereoscopic 3D display with long visualization depth using referential viewing area-based integral photography,” IEEE Trans. Vis. Comput. Graph. 17(11), 1690–1701 (2011). [CrossRef]  

20. B. Pang, X. Sang, S. Xing, X. Yu, D. Chen, B. Yan, K. Wang, C. Yu, B. Liu, C. Cui, Y. Guan, W. Xaing, and L. Ge, “High-efficient rendering of the multi-view image for the three-dimensional display based on the backward ray-tracing technique,” Opt. Commun. 405, 306–311 (2017). [CrossRef]  

21. J. McCormack and R. McNamara, “Tiled polygon traversal using half-plane edge functions,” in Proceedings of the SIGGRAPH/Eurographics Workshop on Graphics Hardware (2000), pp. 15–21.

22. J. Pineda, “A parallel algorithm for polygon rasterization,” in Proceedings of the 15th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH 1988 (1988), pp. 17–20.

23. D. Crisu, S. Vassiliadis, S. D. Cotofana, and P. Liuha, “3D graphics tile-based systolic scan-conversion,” in Conference Record - Asilomar Conference on Signals, Systems and Computers (2004), Vol. 1, pp. 517–521.

24. L. Seiler, D. Carmean, E. Sprangle, T. Forsyth, M. Abrash, P. Dubey, S. Junkins, A. Lake, J. Sugerman, R. Cavin, R. Espasa, E. Grochowski, T. Juan, and P. Hanrahan, “Larrabee: A many-core x86 architecture for visual computing,” ACM Trans. Graph. 27(3), 1–15 (2008). [CrossRef]  

25. F. Liu, M. C. Huang, X. H. Liu, and E. H. Wu, “FreePipe: A programmable parallel rendering architecture for efficient multi-fragment effects,” in Proceedings of I3D 2010: The 2010 ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games (2010), pp. 75–82.

26. S. Laine and T. Karras, “High-performance software rasterization on GPUs,” in Proceedings - HPG 2011: ACM SIGGRAPH Symposium on High Performance Graphics (2011), pp. 79–88.

27. A. Patney, S. Tzeng, K. A. Seitz, and J. D. Owens, “Piko: A framework for authoring programmable graphics pipelines,” ACM Trans. Graph. 34(4), 1–13 (2015). [CrossRef]  

28. M. Kenzel, B. Kerbl, D. Schmalstieg, and M. Steinberger, “A high-performance software graphics pipeline architecture for the GPU,” ACM Trans. Graph. 37(4), 1–15 (2018). [CrossRef]  

29. W. R. Mark and G. Bishop, “Efficient reconstruction techniques for post-rendering 3D image warping,” University of North Carolina at Chapel Hill (1998).

30. P. B. Laval, “Mathematics for Computer Graphics - Barycentric Coordinates,” pp. 1–9, (2003).

31. J. Gómez-Luna and I. El Hajj, “CUDA dynamic parallelism,” in Programming Massively Parallel Processors: A Hands-on Approach: Third Edition (2017), pp. 275–304.

32. N. Greene, “Hierarchical polygon tiling with coverage masks,” in Proceedings of the ACM SIGGRAPH Conference on Computer Graphics (1996), pp. 65–74.

33. S. Jung, M. Hong, and M. H. Choi, “Free-form deformation axis aligned bounding box,” in Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (LNCS, 2009), Vol. 5576, pp. 804–813.

34. X. Yu, X. Sang, D. Chen, P. Wang, X. Gao, T. Zhao, B. Yan, C. Yu, D. Xu, and W. Dou, “Autostereoscopic three-dimensional display with high dense views and the narrow structure pitch,” Chin. Opt. Lett. 12(6), 060008 (2014).

35. X. Yan, J. Wen, Z. Yan, T. Zhang, and X. Jiang, “Post-calibration compensation method for integral imaging system with macrolens array,” Opt. Express 27(4), 4834 (2019). [CrossRef]  

Supplementary Material (1)

Visualization 1: Different perspectives of the reconstructed 3D scene on light field display.

