Image edge smoothing method for light-field displays based on joint design of optical structure and elemental images

Open Access

Abstract

Image visual quality is of fundamental importance for three-dimensional (3D) light-field displays. The pixels of a light-field display are enlarged after the imaging of the light-field system, increasing the graininess of the image, which leads to a severe decline in the image edge smoothness as well as image quality. In this paper, a joint optimization method is proposed to minimize the “sawtooth edge” phenomenon of reconstructed images in light-field display systems. In the joint optimization scheme, neural networks are used to simultaneously optimize the point spread functions of the optical components and elemental images, and the optical components are designed based on the results. The simulations and experimental data show that a less grainy 3D image is achievable through the proposed joint edge smoothing method.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Light-field display is a promising approach to implementing glasses-free three-dimensional (3D) displays in the near future, and numerous studies have been dedicated to improving its performance [1–10]. Image quality is essential to a display system because viewers instinctively judge the performance of a display by it. However, the image quality achieved by current light-field display technology is far below the ideal level. In our previous work [11], an integral imaging-based light-field display system with a lens array, a liquid crystal display (LCD), and a directional diffuser was realized. Part of its imaging process is shown in Fig. 1. A single pixel on the LCD becomes larger than its original size after transmission through the lens, owing to the aberrations of the lens array. Amplified pixels can cause jagged and grainy lines on the imaging plane, reducing display quality and affecting the viewing experience.

Fig. 1. Pixel granules enlarging due to the lens imaging process.

In the 3D light-field image reconstruction field, increasing the visual resolution is often used to improve image quality [12,13]. Researchers in the two-dimensional (2D) image processing field have also made tremendous efforts to improve image quality. Methods such as image sharpness enhancement [14–16], optimization of high-gaze-probability areas [17–19], and image edge smoothing [20–23] have been proposed and proven effective. The human visual system is more sensitive to the structural information in an image, such as contours [24]. Therefore, enhancing the quality of image edges is extremely cost-effective compared with focusing on other aspects.

End-to-end optimization methods based on modern deep learning have been applied to light-field systems [25,26], and achievements have been made in specific application scenarios such as extending the dynamic range of the captured image [27]; however, these efforts address the light-field capture procedure. Deep learning has also been applied to optical optimization within the imaging process: the structural parameters and point spread function distribution of optical components can be optimized through deep learning, providing researchers with new ideas. Based on deep learning algorithms, researchers have realized the inverse design of optical thin films [28], depth-of-field extension with cascaded neural networks and engineered point spread functions [29], and the design of nanophotonic metasurfaces [30]. However, only one light path is considered in the aforementioned optimization schemes, which is not applicable to multi-viewpoint light-field display. Therefore, a joint optimization method is presented here.

In this paper, an optimization method that jointly optimizes the optical structure and the elemental image array to smooth image edges for multi-viewpoint light-field reconstruction is proposed. All factors that affect the optical system are considered as a whole, from the optimization of the physical lens structure to the re-coding of the displayed 2D images. Each step of the light-field reconstruction is modeled separately and added to the back-propagation loop of a convolutional neural network (CNN) for better overall performance. In the optical experiment presented, a full-parallax 3D scene with smoother contours is achieved within a 90-degree viewing angle.

2. Principle

The process of smoothing image edges using the joint neural-network-based design is shown in Fig. 2. It includes three stages: the joint input process, the preprocessing process, and the output reconstruction process. In the joint input process, the elemental image array (EIA) and the lens are packaged together and input to the preprocessing convolutional neural network (PCNN). The EIA is an image array containing multiple viewpoints, obtained by capturing, processing, and synthesizing the 3D scene with a virtual camera array; digital capture is used here because the pinhole model yields perspective images without optical distortion. Because each lens unit in the lens array is independent and equivalent to the others, only a single lens unit is selected for discussion in the subsequent lens analysis and optimization. In the preprocessing process, a convolutional neural network based on an autoencoder is used, containing downsampling, skip connections, and upsampling. In the reconstruction process, the pre-processed elemental image array (PEIA) output from the network is loaded onto a liquid crystal display (LCD) panel, and a realistic edge-smoothed light-field image is restored by the optimally designed lens array.
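
As an illustration of how a multi-view capture can be folded into an elemental image array, the following Python sketch interleaves a V × V grid of pinhole-camera views into an EIA. The mapping, array shapes, and function name are illustrative assumptions for a generic integral-imaging layout, not necessarily the exact virtual-camera pipeline used in this work.

```python
import numpy as np

def synthesize_eia(view_images):
    """Interleave a V x V grid of viewpoint images (shape (V, V, H, W))
    into an elemental image array in which every lens unit covers a
    V x V pixel block holding one pixel from each viewpoint.
    Generic integral-imaging mapping; the authors' pipeline may differ."""
    V, _, H, W = view_images.shape
    eia = np.zeros((H * V, W * V), dtype=view_images.dtype)
    for vy in range(V):
        for vx in range(V):
            # pixel (h, w) of view (vy, vx) lands at (h*V + vy, w*V + vx)
            eia[vy::V, vx::V] = view_images[vy, vx]
    return eia

# e.g. 143 x 143 views would yield one 143 x 143-pixel block per lens unit
```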

Fig. 2. Process of using a co-design method to smooth the image edges.

2.1 Optimizing optical elements

The details of the lens information loading process during the joint input are shown in Fig. 3. In the analysis of the lens, the point spread function (PSF) is used, which describes the response of the imaging system to a point source or point object. For an optical system, the PSF is the light-field energy distribution of the output image when the input object is a point source, and it changes with the position on the object surface. To determine the PSF distribution of the lens, the object surface covered by the lens is divided into 10 × 10 sub-regions, and the PSF of each sub-region is represented by the PSF at its central position, as shown in the schematic diagram in Fig. 3(a). The PSF can be approximated by a Gaussian distribution, which is further discretized into a Gaussian kernel convenient for the convolution operation. The PSFs of the sub-regions can thus be converted into a Gaussian kernel array (GKA) consisting of many Gaussian kernels. In this way, a relatively complex optical device can be effectively represented by a set of Gaussian kernel arrays and used as input to the subsequent convolutional neural network, as shown in Fig. 3(b).
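
A minimal sketch of how the per-region PSFs can be collapsed into a Gaussian kernel array, assuming the root-mean-square spot radius of each sub-region (in pixels) is already known from ray tracing. The kernel size, the 10 × 10 grid, and the example radii below are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(rms_radius_px, size=7):
    """Normalized 2D Gaussian kernel whose width matches the RMS spot
    radius (in pixels) of one sub-region's PSF, using g(r) = exp(-a*r^2)
    with a = 1 / (2 * eps^2)."""
    alpha = 1.0 / (2.0 * rms_radius_px ** 2)
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-alpha * (xx ** 2 + yy ** 2))
    return k / k.sum()

def build_gka(rms_map, size=7):
    """rms_map: 10 x 10 array of RMS spot radii, one per sub-region.
    Returns the Gaussian kernel array (GKA), shape (10, 10, size, size)."""
    return np.stack([[gaussian_kernel(r, size) for r in row] for row in rms_map])

# Example: a lens whose spot grows toward the edge of the field (made-up radii)
grid = np.stack(np.meshgrid(np.linspace(-1, 1, 10), np.linspace(-1, 1, 10)))
rms_map = 0.8 + 1.2 * np.linalg.norm(grid, axis=0) / np.sqrt(2)
gka = build_gka(rms_map)    # shape (10, 10, 7, 7)
```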

Fig. 3. Schematic of the lens information loading process.

2.2 Modeling the overall imaging process

Figure 4 shows the iterative optimization process of the co-designed CNN. First, the EIA and GKA are packaged together and input to the network, which outputs the pre-processed elemental image array (PEIA) and the pre-processed Gaussian kernel array (PGKA). The PEIA is then sequentially operated on by nearest neighbor interpolation (NNI) and convolution with the PGKA, which simulates the optical reconstruction process in a real light-field display: the NNI simulates the pixel magnification through the lens, and the convolution with the Gaussian kernels represents the passage of the elemental image through the lens. The convolution yields the displayed elemental image array (DEIA); finally, the display result is compared with the higher-resolution elemental image array (HEIA) for the similarity calculation. The calculation result is then used as the loss function for the network back-propagation process, and the network is iteratively optimized. In Fig. 4, the local details of the EIA, DEIA, and HEIA are each enlarged. Images in the EIA have a strongly pixelated style and jagged lines. The HEIA is the iterative target, loaded with high-resolution information, and provides a better visual experience than the original EIA. The DEIA, produced by the neural network and the convolution operation, effectively alleviates the shortcomings of the EIA.
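
The forward model described above can be written as a short differentiable TensorFlow sketch: nearest-neighbour upsampling for the pixel magnification, convolution with a Gaussian kernel for the lens, and an SSIM-based similarity term for the loss. The function names, the magnification factor, and the use of a single kernel (rather than one kernel per sub-region) are simplifying assumptions.

```python
import tensorflow as tf

def forward_model(peia, kernel, mag=3):
    """Differentiable sketch of the optical reconstruction step.
    peia:   (B, H, W, 1) pre-processed elemental images (float32).
    kernel: (k, k, 1, 1) one Gaussian kernel of the PGKA."""
    new_size = tf.shape(peia)[1:3] * mag
    up = tf.image.resize(peia, new_size, method="nearest")        # NNI pixel magnification
    deia = tf.nn.conv2d(up, kernel, strides=1, padding="SAME")    # lens blur
    return deia

def loss_fn(deia, heia):
    """SSIM-based similarity loss against the high-resolution target
    (HEIA at the magnified resolution)."""
    return 1.0 - tf.reduce_mean(tf.image.ssim(deia, heia, max_val=1.0))
```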

Fig. 4. Iterative optimization process of the co-designed CNN.

2.3 Learned-based optimization algorithm

Figure 5 shows the schematic diagram of the autoencoder employed in the preprocessing CNN. The EI, the GKA, and the convolution of the EI with the GKA are concatenated along the channel dimension as the input to the autoencoder network, with the EI characterizing the multi-view image source and the GKA characterizing the parameters of the desired lens structure. Concatenating only the EI and GKA is not chosen, because it would be difficult for the convolutional neural network to judge which parameters represent the lens and which represent the image. Feeding the convolution of the EI and GKA into the network therefore strengthens the correlation between them and helps the network identify the physical process of the image passing through the lens, so that the convolution result can be used to predict the display result. The autoencoder consists of an encoder and a decoder: the encoder extracts features, and the decoder outputs the PEI for the subsequent loss calculation. In the encoder, five convolutional layers are used; each convolutional layer halves the feature size and doubles the number of feature channels. In the decoder, deconvolution operations gradually increase the resolution of the features so that the resulting PEI is the same size as the input EI.
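
A hedged Keras sketch of the encoder-decoder described here: five strided convolutions that halve the spatial size and double the feature count (32 to 512), deconvolutions with skip connections in the decoder, and a PEI output of the same size as the input. The input channel count, kernel sizes, activations, and output head are assumptions; only the five-level, 32–512 feature progression follows the text.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_preprocessing_cnn(size=128, in_ch=3):
    """Encoder-decoder sketch of the preprocessing CNN. The input is
    assumed to be the channel-wise concatenation of the EI, the GKA
    (resized to the EI grid), and their convolution."""
    x_in = layers.Input((size, size, in_ch))
    x, skips = x_in, []
    for ch in (32, 64, 128, 256, 512):       # encoder: halve size, double features
        x = layers.Conv2D(ch, 3, strides=2, padding="same", activation="relu")(x)
        x = layers.BatchNormalization()(x)
        skips.append(x)
    skips = skips[:-1]                       # deepest features are x itself
    for ch, s in zip((256, 128, 64, 32), reversed(skips)):   # decoder: deconvolutions
        x = layers.Conv2DTranspose(ch, 3, strides=2, padding="same", activation="relu")(x)
        x = layers.Concatenate()([x, s])     # skip connection
    x = layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu")(x)
    pei = layers.Conv2D(1, 3, padding="same", activation="sigmoid")(x)   # same size as input
    return Model(x_in, pei)
```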

Fig. 5. Structure of the co-designed CNN.

In this study, the encoder of the proposed CNN is trained on architectural scenes. Many parallax images of streets and buildings were obtained from different scene models. More than 30,000 EIs with a resolution of 143 × 143 were collected as a dataset, and randomly cropped 128 × 128 patches were used for training. A large dataset and regularization were used to avoid overfitting. In addition, techniques such as skip connections and batch normalization were used to improve the convergence of the network. The network converges well after 80,000 training iterations, as shown in Fig. 6. Each iteration takes 0.0675 seconds, and the total training time is 5400 seconds. The saved model parameters occupy 10.8 MB of memory. The network was programmed in TensorFlow and run on an NVIDIA RTX 2080 GPU. The network structure contains five convolutional layers with 32, 64, 128, 256, and 512 features, respectively.
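
A minimal training-loop sketch under the details reported here (128 × 128 patches, roughly 80,000 iterations, TensorFlow). It reuses the hypothetical `build_preprocessing_cnn`, `forward_model`, and `loss_fn` sketches above; the optimizer, learning rate, and data pipeline are assumptions not stated in the text.

```python
import tensorflow as tf

# `dataset` is assumed to yield matched (128x128 EI patch, high-resolution
# target) pairs, and `kernel` one Gaussian kernel from the joint input.
model = build_preprocessing_cnn(size=128, in_ch=1)   # joint-input packaging omitted for brevity
optimizer = tf.keras.optimizers.Adam(1e-4)

@tf.function
def train_step(ei, heia, kernel):
    with tf.GradientTape() as tape:
        pei = model(ei, training=True)
        deia = forward_model(pei, kernel)    # simulated display through the lens
        loss = loss_fn(deia, heia)           # SSIM similarity against the HR target
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

# for step, (ei, heia) in enumerate(dataset.take(80_000)):   # iteration budget in the text
#     loss = train_step(ei, heia, kernel)
```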

Fig. 6. Convergence curve of network training.

2.4 Simulation of the reconstructed images with smooth edges

To prove the effectiveness of the pre-processing method, the display process of a 3D street scene was simulated. Figure 7(a) shows the simulated displayed images from the original EIA, and Fig. 7(b) shows the simulated displayed images with pre-processing. Due to the magnification of the lens unit and the low resolution of the EI, the image quality in Fig. 7(a) is significantly degraded, and the jagged pattern at the edge of the image is very perceptible. In Fig. 7(b), however, the image quality is noticeably improved by the introduction of the pre-processing CNN. The SSIM values of the simulated images without and with pre-processing are shown at the top of the respective images in Fig. 7. Comparing the two simulation results, the SSIM value of the enhanced image increases from 0.8743 to 0.9598. Usually, an SSIM value greater than 0.95 indicates that the distortion loss in image quality is within an acceptable range [24], which demonstrates that the proposed method effectively mitigates the jagged pattern of the image edges.
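
For reference, the SSIM comparison reported here can be reproduced with a standard implementation; the function below is a generic sketch (using scikit-image) and assumes grayscale images normalized to [0, 1].

```python
from skimage.metrics import structural_similarity

def compare_quality(simulated, reference):
    """SSIM between a simulated display image and its reference,
    as used for the comparison reported in Fig. 7."""
    return structural_similarity(simulated, reference, data_range=1.0)

# e.g. ssim_plain = compare_quality(sim_without_preprocessing, reference)
#      ssim_joint = compare_quality(sim_with_preprocessing, reference)
# The paper reports an increase from 0.8743 to 0.9598 for the street scene.
```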

Fig. 7. Comparison results from the street model simulation: (a) Original simulation results; (b) simulation results with pre-processing.

3. Experiments

To confirm the effectiveness of the proposed method, we performed the corresponding optical experiments. A part of the PGKA obtained by the joint design is shown in Fig. 8(a). Due to the rotational symmetry of the lens unit, only 25 of the Gaussian kernels were used. In general, the diffuse spot is calculated by the ray-tracing method and is not exactly equivalent to a Gaussian distribution; however, for a system in which the geometric aberration is much larger than the diffraction effect (a large-aberration system), the diffuse spot is mainly determined by the geometric aberration [31]. The results indicate that the diffuse spot and the point spread function are well correlated. In addition, the light intensity distribution of the diffuse spot can be approximated by a Gaussian distribution, whose root mean square radius can be calculated as shown in Eq. (1).

$$\mathrm{PSF} = g(r) = e^{-\alpha r^{2}}, \qquad \varepsilon = \sqrt{\frac{1}{2\alpha}} \tag{1}$$
where $\alpha$ is the width parameter of the Gaussian distribution (the reciprocal of twice its variance), and $\varepsilon$ is the root mean square radius of the diffuse spot. Therefore, the lens is designed and fabricated according to the requirements on the root mean square radius of the diffuse spot under the different fields of view of the lens. Figure 8(b) shows a schematic diagram of the surface shape of the lens structure, Fig. 8(c) shows the distribution of the spot diagram for different fields of view, and Fig. 8(d) shows a real shot image of the lens array.
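
As a brief consistency check on Eq. (1) (an added identification step, assuming $\varepsilon$ is read as the Gaussian width parameter of the spot), matching the exponent of $g(r)$ against the standard Gaussian form gives

$$e^{-\alpha r^{2}} = e^{-r^{2}/(2\varepsilon^{2})} \;\Rightarrow\; \alpha = \frac{1}{2\varepsilon^{2}}, \qquad \varepsilon = \sqrt{\frac{1}{2\alpha}},$$

which is the conversion assumed in the Gaussian-kernel sketch of Section 2.1.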

Fig. 8. (a) Part of the PGKA array; (b) schematic diagram of the surface shape of the lens structure; (c) spot diagram distribution of the lens; (d) real shot image of the lens array.

Table 1 shows the specific parameters of the compound lens structure. The whole system consists of two lenses and a diaphragm, and their corresponding numbers are marked in Fig. 8(b). Structure No. 1 is a lens with a front-surface radius of curvature of -18.14 mm, a rear-surface radius of curvature of 12.02 mm, and a thickness of 0.84 mm. Structure No. 2 is the diaphragm, with a semi-diameter of 2.1 mm; it is 0.84 mm away from structure No. 1 and 1.76 mm away from structure No. 3. Structure No. 3 is a lens with a front-surface radius of curvature of 3.66 mm, a rear-surface radius of curvature of 7.76 mm, and a thickness of 0.89 mm. A radius-of-curvature error within ±0.2 mm is acceptable, and the lens array consists of 53 × 30 lens units.

Table 1. Parameters of lens structure

In the optical experiment, a desktop light-field display system composed of a directional diffuser, a lens array, and an LCD was demonstrated. The lens array is located 6.977 mm above the LCD, and the directional diffuser is located 180.00 mm above the lens array. The LCD has a screen size of 32 inches and a resolution of 7680 × 4320, and 53 × 30 lens units form the lens array. The distance between the centers of adjacent lens units is 13 mm, and each elemental image consists of 143 × 143 pixels in a matrix format, providing a 143 × 143 viewpoint perspective.
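
The quoted geometry can be cross-checked with a few lines of arithmetic; the figures below are rounded, illustrative values derived only from the numbers stated above (the panel's vendor specification is the authority).

```python
# Sanity check of the display geometry quoted in the text.
lcd_res = (7680, 4320)                               # 8K LCD
lens_pitch_mm = 13.0                                 # center-to-center lens spacing
pixels_per_lens = 143                                # EI resolution per axis
pixel_pitch_mm = lens_pitch_mm / pixels_per_lens     # ~0.091 mm
screen_width_mm = lcd_res[0] * pixel_pitch_mm        # ~698 mm
diagonal_inch = screen_width_mm * (1 + (9 / 16) ** 2) ** 0.5 / 25.4   # ~31.5", i.e. a 32" panel
viewpoints = pixels_per_lens ** 2                    # 143 x 143 = 20,449
lens_array = (53, 30)
pixels_covered = (lens_array[0] * pixels_per_lens,   # (7579, 4290),
                  lens_array[1] * pixels_per_lens)   # within the 7680 x 4320 panel
print(round(pixel_pitch_mm, 4), round(diagonal_inch, 1), viewpoints, pixels_covered)
```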

Figure 9 shows the 3D building scene displayed from different viewing directions. Figure 9(a) shows the 3D image without pre-processing network optimization in the top row and the 3D image with pre-processing network optimization in the bottom row. Figure 9(b) shows the details of the image. By comparing the details, it can be concluded that the enhanced 3D image in the bottom row presents smoother edges. In addition, Fig. 9(c) presents more detailed information regarding the 3D building scene over the 90-degree viewing range. By introducing the pre-processing method into the light-field display, a clear, full-parallax 3D image with a 30-cm display depth and 20,449 viewpoints can be observed.

Fig. 9. Experimental comparison of the street model.

4. Conclusions

End-to-end optimization methods work well in complex systems, such as light-field systems with many components. Unfortunately, existing end-to-end light-field optimization attempts do not address the multi-viewpoint issues of 3D displays. Thus, a co-design method for smoothing the edges of images displayed on light-field displays is presented in this study. The core concept of the proposed method is the joint optimization of the lens parameters and the multi-viewpoint information sources through an optimization algorithm, followed by the reverse design of the lens array based on the optimization results. The experimental results demonstrate the successful realization of an edge-optimized large-viewing-angle light-field display.

Funding

National Key Research and Development Program of China (2021YFB3600504); National Natural Science Foundation of China (62075016, 62175015).

Disclosures

The authors declare no conflicts of interest. This work is original and has not been published elsewhere.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. N. Okaichi, M. Miura, J. Arai, M. Kawakita, and T. Mishina, “Integral 3D display using multiple LCD panels and multi-image combining optical system,” Opt. Express 25(3), 2805–2817 (2017). [CrossRef]  

2. H. Watanabe, M. Kawakita, N. Okaichi, H. Sasaki, M. Kano, and T. Mishina, “High-resolution spatial image display with multiple UHD projectors,” Proc. SPIE 10666, 106660Y (2018). [CrossRef]  

3. B. Liu, X. Sang, X. Yu, X. Gao, L. Li, C. Gao, P. Wang, Y. Le, and J. Du, “Time-multiplexed light field display with 120-degree wide viewing angle,” Opt. Express 27(24), 35728–35739 (2019). [CrossRef]  

4. B. Liu, X. Sang, X. Yu, X. Ye, X. Gao, L. Liu, C. Gao, P. Wang, X. Xie, and B. Yan, “Analysis and removal of crosstalk in a time-multiplexed light-field display,” Opt. Express 29(5), 7435–7452 (2021). [CrossRef]  

5. L. Liu, X. Sang, X. Yu, X. Gao, Y. Wang, X. Pei, X. Xie, B. Fu, H. Dong, and B. Yan, “3D light-field display with an increased viewing angle and optimized viewpoint distribution based on a ladder compound lenticular lens unit,” Opt. Express 29(21), 34035–34050 (2021). [CrossRef]  

6. X. Yu, H. Li, X. Sang, X. Su, X. Gao, B. Liu, D. Chen, Y. Wang, and B. Yan, “Aberration correction based on a pre-correction convolutional neural network for light-field displays,” Opt. Express 29(7), 11009–11020 (2021). [CrossRef]  

7. X. Su, X. Yu, D. Chen, H. Li, X. Gao, X. Sang, X. Pei, X. Xie, Y. Wang, and B. Yan, “Regional selection-based pre-correction of lens aberrations for light-field displays,” Opt. Commun. 505, 127510 (2022). [CrossRef]  

8. X. Yan, Z. Yan, T. Jing, P. Zhang, M. Lin, P. Li, and X. Jiang, “Enhancement of effective viewable information in integral imaging display systems with holographic diffuser: Quantitative characterization, analysis, and validation,” Opt. Laser Technol. 161, 109101 (2023). [CrossRef]  

9. Z. Yan, X. Yan, X. Jiang, C. Wang, Y. Liu, X. Wang, Z. Su, and T. Jing, “Calibration of the lens’ axial position error for macrolens array based integral imaging display system,” Opt. Lasers Eng. 142, 106585 (2021). [CrossRef]

10. Z. Yan, X. Yan, Y. Huang, X. Jiang, Z. Yan, Y. Liu, Y. Mao, Q. Qu, and P. Li, “Characteristics of the holographic diffuser in integral imaging display systems: A quantitative beam analysis approach,” Opt. Lasers Eng. 139, 106484 (2021). [CrossRef]

11. X. Sang, X. Gao, X. Yu, S. Xing, Y. Li, and Y. Wu, “Interactive floating full-parallax digital three dimensional light-field display based on wavefront recomposing,” Opt. Express 26(7), 8883–8889 (2018). [CrossRef]  

12. Z. Yan, X. Yan, X. Jiang, H. Gao, and J. Wen, “Integral imaging based light field display with enhanced viewing resolution using holographic diffuser,” Opt. Commun. 402, 437–441 (2017). [CrossRef]  

13. L. Yang, X. Sang, X. Yu, B. Yan, K. Wang, and C. Yu, “Viewing-angle and viewing-resolution enhanced integral imaging based on time multiplexed lens stitching,” Opt. Express 27(11), 15679–15692 (2019). [CrossRef]  

14. Y. Li, C. Ma, T. Zhang, J. Li, Z. Ge, Y. Li, and S. Serikawa, “Underwater Image High Definition Display Using the Multilayer Perceptron and Color Feature-Based SRCNN,” IEEE Access 7, 83721–83728 (2019). [CrossRef]  

15. K. Song, G. Liu, Q. Wang, Z. Wen, L. Lyu, Y. Du, L. Sha, and C. Fang, “Quantification of lake clarity in China using Landsat OLI imagery data,” Remote Sens. Environ. 243, 111800 (2020). [CrossRef]  

16. G. Gao, H. Lai, Y. Liu, L. Wang, and Z. Jia, “Sandstorm image enhancement based on YUV space,” Optik 226, 165659 (2021). [CrossRef]  

17. S. Wang, K. Gu, K. Zeng, Z. Wang, and W. Lin, “Perceptual screen content image quality assessment and compression,” in International Conference on Image Processing, 1434–1438 (IEEE, 2015).

18. T. Tariq, J. L. Gonzalez Bello, and M. Kim, “A HVS-Inspired Attention to Improve Loss Metrics for CNN-Based Perception-Oriented Super-Resolution,” in IEEE/CVF International Conference on Computer Vision Workshop, 3904–3912 (2019).

19. X. Meng, R. Du, and A. Varshney, “Eye-dominance-guided foveated rendering,” IEEE Trans. Vis. Comput. Graph. 26(5), 1972–1980 (2020). [CrossRef]

20. L. Zhao, H. Bai, J. Liang, A. Wang, B. Zeng, and Y. Zhao, “Local activity-driven structural-preserving filtering for noise removal and image smoothing,” Signal Process. 157, 62–72 (2019). [CrossRef]  

21. H. Xu and D. Ge, “A novel image edge smoothing method based on convolutional neural network,” Int. J. Adv. Robot. Syst. 17(3), 172988142092167 (2020). [CrossRef]  

22. H. Saito, T. Ito, K. Omachi, A. Inugami, M. Yamaguchi, M. Tsushima, Y. Mariya, and I. Kashiwakura, “Effectiveness of the smoothing filter in pediatric 99mTc-dimercaptosuccinic acid renal scintigraphy,” Radiol. Phys. Technol. 13(1), 104–110 (2020). [CrossRef]

23. C. Wang, X. Chen, S. Min, J. Wang, and Z. J. Zha, “Structure-Guided Deep Video Inpainting,” IEEE Trans. Circuits Syst. Video Technol. 31(8), 2953–2965 (2021). [CrossRef]  

24. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. Image Process. 13(4), 600–612 (2004). [CrossRef]  

25. V. Sitzmann, S. Diamond, Y. Peng, X. Dun, S. Boyd, W. Heidrich, F. Heide, and G. Wetzstein, “End-to-end Optimization of Optics and Image Processing for Achromatic Extended Depth of Field and Super-resolution Imaging,” ACM Trans. Graph. 37(4), 1–13 (2018). [CrossRef]  

26. G. Wetzstein, H. Ikoma, C. Metzler, and Y. Peng, “Deep Optics: Learning Cameras and Optical Computing Systems,” in 54th Asilomar Conference on Signals, Systems, and Computers, 1313–1315 (2020).

27. C. A. Metzler, H. Ikoma, Y. Peng, and G. Wetzstein, “Deep Optics for Single-Shot High-Dynamic-Range Imaging,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition, 1372–1382 (2020).

28. A. Jiang and O. Yoshie, “A Reinforcement Learning Method for Optical Thin-Film Design,” IEICE Trans. Electron. 105(2), 95–101 (2022). [CrossRef]  

29. X. Yang, L. Huang, Y. Luo, Y. Wu, H. Wang, Y. Rivenson, and A. Ozcan, “Deep-learning-based virtual refocusing of images using an engineered point-spread function,” ACS Photonics 8(7), 2174–2182 (2021). [CrossRef]  

30. A. Mall, A. Patil, D. Tamboli, A. Sethi, and A. Kumar, “Fast design of plasmonic metasurfaces enabled by deep learning,” J. Phys. D: Appl. Phys. 53(49), 49LT01 (2020). [CrossRef]  

31. F. Song, X. Chen, and C. Liu, Introduction to Modern Optical System Design (Science Press, 2019), Chap. 3.
