
Enhancing efficiency of complex field encoding for amplitude-only spatial light modulator based on a neural network


Abstract

The widespread adoption of artificial neural networks for hologram synthesis can be attributed to their ability to improve image quality and reduce computational costs. In this study, we propose an alternative use of artificial neural networks: improving the optical efficiency of complex field encoding. The neural encoding significantly enhances the efficiency of amplitude-only spatial light modulators (SLMs), yielding a 2.4-fold optical efficiency enhancement with negligible image quality degradation compared to the Burch encoding method. Notably, the experimental results demonstrate that the neural encoding method achieves even higher image quality, providing an approximately 2.5 dB enhancement in peak signal-to-noise ratio. The neural encoding method thus offers promise in mitigating a fundamental challenge of conventional amplitude-only holograms, namely their low optical efficiency.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Holographic display, often regarded as the future of display technology, has made significant strides in recent years. Since holographic displays can overcome the vergence-accommodation conflict, a limitation of traditional displays, they can be utilized in diverse applications where depth expression is important [1]. A complex field is generally required to reconstruct a hologram, but its direct reconstruction is a challenging task due to the absence of a complex spatial light modulator (SLM). Instead of using a complex SLM, holograms are encoded as amplitude-only or phase-only fields and then reconstructed by the corresponding devices: amplitude-only SLMs or phase-only SLMs. Although synthesis methods, such as the position-dependent phase offset method [2] and the random phase method [3], can be commonly used in both types of devices, encoding methods differ substantially.

In phase-only SLMs, the double phase encoding method [4,5], the superpixel method [6,7], and the bleaching method [8] are representative methods for reconstructing a complex field. The double phase encoding method [4,5] decomposes each complex value into two phase-only values and encodes them as a rapidly alternating phase pattern. In the superpixel method [6,7], several pixels are combined to represent one complex value, and unwanted components are filtered out to reconstruct the complex field.

In amplitude-only SLMs, the Burch encoding method [9,10] and the superpixel method [11] are typically adopted to reconstruct holograms. The Burch encoding method [9] and its variant [12] utilize a grating phase to lift the degeneracy between the conjugate and signal fields, and project the complex field onto an amplitude-only field with an amplitude offset. In the superpixel method [11], multiple amplitude pixels are combined and filtered as in phase-only SLMs.

However, due to their limited diffraction efficiency, the optical efficiency of amplitude-only holograms is low, which makes amplitude-only SLMs challenging to adopt in portable holographic devices. Here, we enhance the optical efficiency of amplitude-only holograms by using a neural-network-based encoding method. The neural encoder achieves 2.4 times the optical efficiency of the Burch encoding method [9] while maintaining nearly the same image quality. To achieve this, we added an optical efficiency term to the cost function alongside the mean squared error. Moreover, to reduce training instability, the cost function was smoothly varied during training. Experimental reconstruction confirms the improved optical efficiency and image quality. We expect that the neural encoder can relieve one of the most significant drawbacks of amplitude-only holograms and offer insights for developing better encoding methods for phase-only holograms.

2. Results

In amplitude-only SLMs, the Burch encoding method [9] is widely adopted. It is defined as

$$B(U)={\rm Real}\{U e^{i \vec{k} \cdot \vec{x} } \}- {\rm min} \{ {\rm Real} \{ U e^{i \vec{k} \cdot \vec{x} } \} \},$$
where $U$ refers to the complex field, $e^{i \vec {k} \cdot \vec {x} }$ is the off-axis grating phase term, ${\rm Real}$ extracts the real part of a complex value, and ${\rm min}$ returns the minimum value. Since only real values of the field can be displayed on amplitude-only SLMs, the conjugate field $U^*$ is also generated. To successfully reconstruct the target complex field, the signal and conjugate fields are separated by the grating phase, and an optical filter is used to block the DC background and the conjugate field (Fig. 1(a)). However, most of the energy is concentrated in the DC field, and half of the residual energy leaks into the conjugate field (Fig. 1(b)). The optical energy in the signal field of the Burch encoding method is around 1.8% of the total energy (Fig. 1(c)), which is 1/15 of that of the double-phase encoding method.
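As a concrete illustration, the Burch encoding above can be sketched in a few lines of NumPy. The grating frequencies `kx` and `ky` are illustrative choices, not values from the paper.

```python
import numpy as np

def burch_encode(U, kx=0.25, ky=0.0):
    """Burch-encode a complex field U into a non-negative amplitude pattern.

    kx, ky: off-axis grating frequencies in cycles per pixel
    (illustrative values, not taken from the paper).
    """
    H, W = U.shape
    y, x = np.mgrid[0:H, 0:W]
    grating = np.exp(1j * 2 * np.pi * (kx * x + ky * y))
    real_part = np.real(U * grating)
    # Subtract the minimum so the pattern is displayable on an amplitude-only SLM
    return real_part - real_part.min()
```

The subtracted offset is what produces the strong DC peak visible in Fig. 1(b).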


Fig. 1. Schematics of neural encoding method. a, Conjugate field and DC field are blocked by optical filtering and complex field can be successfully reconstructed. b, Fourier domain image of the Burch encoding method. c, Relative energy distribution along x axis of Fourier domain. d, Network structure of neural encoder. The network contains concatenating skip connections and adding skip connections. e, Burch loss coefficient and reconstruction loss coefficient are varied during the training.


To improve optical efficiency of amplitude-only SLMs, we present a neural-network-based encoding method (Fig. 1(d)). The structure of the neural network is based on a residual network [13]. The network has 64 channels for each convolution layer and 10 residual blocks. Complex field is fed to the network and the network returns encoded image. Although the network structure is simple, training the network has several challenges due to the absence of conventional ground truth data. Since the encoded amplitudes with enhanced efficiency would be different from the Burch encoded amplitudes, we should train the network by reconstructing the complex field using the output of the network.
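The adding skip connections of the residual blocks can be sketched as follows. For brevity, the 64-channel convolutions are reduced to plain matrix multiplications, so this is a structural sketch rather than the authors' implementation.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    """One residual block with an adding skip connection:
    the input is added back to the transformed signal,
    as in the residual network of Fig. 1(d)."""
    return x + W2 @ relu(W1 @ x)

def residual_stack(x, weights):
    """Chain several residual blocks, mirroring the 10-block stack."""
    for W1, W2 in weights:
        x = residual_block(x, W1, W2)
    return x
```

With zero weights each block reduces to the identity, which is the property that makes deep residual stacks easy to optimize.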

However, the necessity of optical filtering for complex field conversion obstructs the adoption of a reconstruction loss, especially in amplitude-only holograms that rely on the grating phase. Optical filtering suppresses the DC signal, and under a randomly initialized network most of the network's output is blocked, causing extremely slow weight updates. To overcome this issue, we gradually shift between two ground truths during training (Fig. 1(e)). Initially, the Burch encoded image is used as the ground truth, and then the reconstructed target takes over.

The gradual shift is controlled by adjusting the ratio between the Burch encoded target and the reconstructed target. A decay factor is applied to the ratio at the end of each epoch, halving the ratio every 7 epochs from an initial value of 1. After adopting this smooth transition, no training failures, identified by the cost function ceasing to decrease, are observed. In summary, the total cost function can be written as

$$\begin{aligned}\mathcal{L} =& (1-\gamma) \frac{1}{N} \sum_{x, y} \Biggl[|{{\rm Fr}_d \{ f(U)(x,y) \}}|^2 - I_{\rm target}(x,y)\Biggr]^2\\ &+ \gamma \frac{1}{N} \sum_{x, y} \Biggl[f(U)(x,y) - B(x,y)\Biggr]^2\\ &- \alpha \frac{1}{N} \sum_{x, y} |{{\rm Fr}_d \{ f(U)(x,y) \}}|^2. \end{aligned}$$

Here, ${\rm Fr}_d$ refers to the Fresnel propagation function for a distance $d$ with optical filtering, $N$ is the number of pixels, $f$ is a functional form of the neural network, $U$ is the single-plane computer-generated hologram, $I_{\rm target}$ is the target image, and $B$ is the solution of the Burch encoding method. $\gamma$ is the smooth transition factor and $\alpha$ is a user-selected coefficient for efficiency, set to 0.1 in this study. The first term of the cost function is the reconstruction loss, which ensures that the encoded hologram can reconstruct the target image while preserving depth information. The second term is the Burch loss, which keeps the network-encoded hologram close to the result of the Burch encoding method. The third term is the efficiency loss, which aims to maximize the optical efficiency of the reconstructed hologram.
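Assuming the filtered reconstruction intensity $|{\rm Fr}_d\{f(U)\}|^2$ has already been computed, the three-term cost function can be sketched as:

```python
import numpy as np

def total_loss(encoded, recon_intensity, target, burch, gamma, alpha=0.1):
    """Three-term cost: reconstruction loss, Burch loss, and efficiency loss.

    encoded:         network output f(U)
    recon_intensity: |Fr_d{f(U)}|^2 after optical filtering (precomputed)
    target:          target intensity image I_target
    burch:           Burch-encoded amplitude B
    """
    recon_loss = np.mean((recon_intensity - target) ** 2)
    burch_loss = np.mean((encoded - burch) ** 2)
    # Negative sign: minimizing this term maximizes signal-field energy
    eff_loss = -np.mean(recon_intensity)
    return (1 - gamma) * recon_loss + gamma * burch_loss + alpha * eff_loss
```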

To train the network, the DIV2K training dataset [14], which consists of 800 high-resolution images, is used, and training is repeated for 200 epochs. Since the dataset includes only two-dimensional (2D) images, the complex field is generated by propagating a 2D image using the angular spectrum method [15]. The propagation distance is randomly selected from 8 distances, since 8 depth layers can cover most real-world scenes considering that the depth of field of the human eye is around 0.3 diopters [16].
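A minimal angular spectrum propagator is sketched below; the band-limited variant of Ref. [15] additionally clips high spatial frequencies, which is omitted here for brevity.

```python
import numpy as np

def angular_spectrum(U, d, wavelength, pitch):
    """Propagate a sampled field U by distance d (all lengths in meters)."""
    H, W = U.shape
    fx = np.fft.fftfreq(W, pitch)  # spatial frequencies in cycles/m
    fy = np.fft.fftfreq(H, pitch)
    FX, FY = np.meshgrid(fx, fy)
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    kz = 2 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    # Transfer function; evanescent components (arg <= 0) are dropped
    Hf = np.exp(1j * kz * d) * (arg > 0)
    return np.fft.ifft2(np.fft.fft2(U) * Hf)
```

For a propagating field the transfer function is unit-modulus, so total energy is conserved by the propagation.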

Although the trained network does not have batch normalization layers, they are present at the beginning of training. Without batch normalization layers, the network often fails to converge due to ill-conditioned weights originating from random weight initialization [17]. However, we considered channel-wise normalization inappropriate for encoding, because the complex field should conserve the balance of its colors as well as its distribution. Therefore, all batch normalization layers, which sit between the convolutional layers and the activation layers, are fused into the convolutional layers at the end of the first epoch.
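Fusing a batch normalization layer into the preceding convolution amounts to rescaling the weights and shifting the bias. The sketch below shows the algebra for a layer reduced to a matrix multiplication; the same per-output-channel scaling applies to convolution kernels.

```python
import numpy as np

def fuse_bn(W, b, gamma, beta, mean, var, eps=1e-5):
    """Fold BN parameters (gamma, beta, running mean/var) into weights W
    of shape (out_ch, ...) and bias b of shape (out_ch,)."""
    scale = gamma / np.sqrt(var + eps)
    # Broadcast the per-channel scale over the remaining weight dimensions
    W_fused = W * scale.reshape(-1, *([1] * (W.ndim - 1)))
    b_fused = (b - mean) * scale + beta
    return W_fused, b_fused
```

After fusion, the fused layer produces exactly the same output as convolution followed by batch normalization, so inference behavior is unchanged.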

Figure 2 presents the numerical reconstruction results of the Burch encoding method and the neural encoding method. Although the image quality metrics are slightly lower for the neural encoding method, its optical efficiency is more than twice that of the Burch encoding method (Fig. 2(a),(b)). The Fourier-transformed images imply that the neural encoding method adaptively changes the background offset of the encoded hologram (Fig. 2(d)), while the Burch encoding method has a fixed background offset intensity (Fig. 2(c)).


Fig. 2. Numerical reconstruction results. a, Numerically reconstructed image of Burch encoding method. PSNR and SSIM compared to the target image are marked at the top right corner. “EFF” refers to the optical efficiency of each reconstructed hologram. The small image next to the large image is an enlarged image of the large image. b, Numerically reconstructed image of neural encoding method. Fourier domain image of the Burch encoding method (c) and neural encoding method (d).


For a broader comparison, we benchmarked the two encoding methods on the DIV2K validation dataset [14]. To evaluate image quality, PSNR and SSIM are adopted, and the optical efficiency relative to the full white incident light is also calculated (Fig. 3). Although the PSNR (SSIM) slightly decreases by 0.88 dB (0.008), the optical efficiency increases from 1.76% ($\pm$0.65%) to 4.21% ($\pm$1.36%).
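The quality and efficiency metrics used in the benchmark can be computed directly; a minimal sketch, assuming intensities normalized to a peak of 1:

```python
import numpy as np

def psnr(img, ref, peak=1.0):
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((img - ref) ** 2)
    return 10.0 * np.log10(peak**2 / mse)

def optical_efficiency(signal_intensity, incident_intensity):
    """Fraction of incident energy reaching the filtered signal field."""
    return np.sum(signal_intensity) / np.sum(incident_intensity)
```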


Fig. 3. Benchmark results of numerically reconstructed images depending on the encoding methods. “Burch” represents the Burch encoding method and “ENet” represents the neural encoding method. “EFF” represents optical efficiency compared to the full white incident light and the plot represents the optical efficiency multiplied by 10 (right $y$-axis). The error bar represents the standard deviation (SD) between the holograms with different target images.


The optical efficiency of the neural encoding method can be further enhanced at the expense of image quality by changing the user-selected coefficient $\alpha$. Table 1 presents optical efficiencies and image quality metrics for different $\alpha$ values. The condition $\alpha = 0.1$ is chosen in this study because it results in only a slight decrease in image quality metrics compared to lower $\alpha$ values while providing a significant enhancement in optical efficiency.


Table 1. Efficiencies and image quality metrics depending on conditions. “Burch” refers to the Burch encoding method and others refer to the neural encoding methods with corresponding $\alpha$ values. The metrics are evaluated on the DIV2K validation dataset. As $\alpha$ increases, the optical efficiency increases and the image quality metrics decrease. The values in parentheses are SDs between the holograms with different target images.

Although the primary purpose of adopting the smooth transition was to avoid training failures, it also improves the consistency of the successfully trained networks. To measure this consistency, we trained multiple networks under the same conditions and calculated the SDs of the average PSNRs evaluated on the validation dataset. Without the smooth transition, the SD of PSNRs across 4 networks was 0.33; it was 0.26 for a decay period of 4 and 0.05 for a decay period of 7.
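The decay schedule described above, halving the transition coefficient every 7 epochs from an initial value of 1, can be sketched as:

```python
def gamma_schedule(epoch, period=7, gamma0=1.0):
    """Smooth-transition coefficient: starts at gamma0 and halves every
    `period` epochs, shifting the ground truth from the Burch target
    to the reconstructed target."""
    return gamma0 * 0.5 ** (epoch / period)
```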

To experimentally confirm the enhanced efficiency, we conducted optical reconstruction (Fig. 4). The experimental setup comprises collimated lasers with 638, 515, and 450 nm wavelengths. The three lasers are coupled into a single-mode fiber (Wikistar VENUS-RGB-TTL), and the fiber output is collimated using an achromatic lens. The amplitude-only SLM has a 6.3 $\mu$m pixel pitch with 1920$\times$1080 resolution (May Inc. IRIS-F55). To evaluate image quality, we synthesized the hologram to be reconstructed 1.71 mm in front of the SLM; different distances do not change the tendency of the experimental results. The hologram is filtered by a custom 4-$f$ imaging system and captured by a color camera (FLIR BFS-U3-200S6C-C).


Fig. 4. Experimental reconstruction. a, Optically reconstructed image of the Burch encoding method. b, Optically reconstructed image of the neural encoding method. The small image next to the large image is an enlarged image of the large image. PSNR and SSIM compared to the target image are marked at the top right corner. The intensity normalized to the intensity of the Burch encoding method is marked as relative intensity. Bright spots near the leg of the squirrel correspond to the reflected light from optical components.


A noteworthy observation in the optical reconstruction is that the neural encoding method achieves superior image quality metrics compared to the Burch encoding method. Reflected light from optical components generates background noise as well as bright spots in the reconstructed image, and some of these artifacts cannot be masked by optical filtering. Owing to its greater luminosity, the neural encoding method is less susceptible to the noise originating from the reflected light.

3. Discussions

Recently, numerous studies have proposed hologram synthesis methods utilizing artificial neural networks to enhance image quality and reduce computational costs [5,13,18]. However, enhancing optical efficiency with an artificial neural network had not been studied before. We showed that optical efficiency can be enhanced when a neural network is adopted in the encoding process. Moreover, it is experimentally shown that the image quality of low-efficiency holograms, such as amplitude-only holograms, can also be improved by enhancing optical efficiency.

It should be noted that amplitude-only holograms with the neural encoding method still have lower optical efficiency than phase-only holograms [4,19]. However, amplitude-only SLMs have unique properties that phase-only SLMs cannot replace. For example, amplitude-only SLMs are commonly employed in consumer electronics due to their low manufacturing cost, and they offer easy calibration by direct imaging of the SLMs [20]. Especially in high-speed applications, amplitude-only SLMs, such as digital micromirror devices, are required, and adopting the proposed method instead of the Burch encoding method [21] may enhance optical efficiency. In this regard, we anticipate that the proposed method can benefit applications that rely on amplitude-only SLMs. Furthermore, we expect that the optical efficiency of phase-only holograms can be enhanced by adopting a neural encoding method.

Funding

Gachon University (GCU-2023-00960001); National Research Foundation of Korea (RS-2023-00245184).

Disclosures

The authors declare that there are no conflicts of interest related to this article.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. Y. Pan, J. Liu, X. Li, et al., “A review of dynamic holographic three-dimensional display: algorithms, devices, and systems,” IEEE Trans. Ind. Inf. 12(4), 1599–1610 (2016). [CrossRef]  

2. A. Maimone, A. Georgiou, and J. S. Kollin, “Holographic near-eye displays for virtual and augmented reality,” ACM Trans. Graph. 36(4), 1–16 (2017). [CrossRef]  

3. Y. Zhao, L. Cao, H. Zhang, et al., “Accurate calculation of computer-generated holograms using angular-spectrum layer-oriented method,” Opt. Express 23(20), 25440–25449 (2015). [CrossRef]  

4. C.-K. Hsueh and A. A. Sawchuk, “Computer-generated double-phase holograms,” Appl. Opt. 17(24), 3874–3883 (1978). [CrossRef]  

5. L. Shi, B. Li, C. Kim, et al., “Towards real-time photorealistic 3d holography with deep neural networks,” Nature 591(7849), 234–239 (2021). [CrossRef]  

6. V. Arrizón, “Complex modulation with a twisted-nematic liquid-crystal spatial light modulator: double-pixel approach,” Opt. Lett. 28(15), 1359–1361 (2003). [CrossRef]  

7. V. Bagnoud and J. D. Zuegel, “Independent phase and amplitude control of a laser beam by use of a single-phase-only spatial light modulator,” Opt. Lett. 29(3), 295–297 (2004). [CrossRef]  

8. X. Li, J. Liu, J. Jia, et al., “3d dynamic holographic display by modulating complex amplitude experimentally,” Opt. Express 21(18), 20577–20587 (2013). [CrossRef]  

9. J. Burch, “A computer algorithm for the synthesis of spatial frequency filters,” Proc. IEEE 55(4), 599–601 (1967). [CrossRef]  

10. V. Arrizón, G. Méndez, and D. Sánchez-de La-Llave, “Accurate encoding of arbitrary complex fields with amplitude-only liquid crystal spatial light modulators,” Opt. Express 13(20), 7913–7927 (2005). [CrossRef]  

11. S. A. Goorden, J. Bertolotti, and A. P. Mosk, “Superpixel-based spatial amplitude and phase modulation using a digital micromirror device,” Opt. Express 22(15), 17999–18009 (2014). [CrossRef]  

12. J. An, B. Shin, C.-K. Lee, et al., “7-2: High-contrast encoding method for amplitude-only computer generated hologram,” in SID Symposium Digest of Technical Papers, vol. 49 (Wiley Online Library, 2018), pp. 64–67.

13. D. Yang, W. Seo, H. Yu, et al., “Diffraction-engineered holography: Beyond the depth representation limit of holographic displays,” Nat. Commun. 13(1), 6012 (2022). [CrossRef]  

14. E. Agustsson and R. Timofte, “Ntire 2017 challenge on single image super-resolution: Dataset and study,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, (2017).

15. K. Matsushima and T. Shimobaba, “Band-limited angular spectrum method for numerical simulation of free-space propagation in far and near fields,” Opt. Express 17(22), 19662–19673 (2009). [CrossRef]  

16. F. W. Campbell, “The depth of field of the human eye,” Opt. Acta 4(4), 157–164 (1957). [CrossRef]  

17. N. Bjorck, C. P. Gomes, B. Selman, et al., “Understanding batch normalization,” Advances in Neural Information Processing Systems 31, 1 (2018).

18. Y. Peng, S. Choi, N. Padmanaban, et al., “Neural holography with camera-in-the-loop training,” ACM Trans. Graph. 39(6), 1–14 (2020). [CrossRef]  

19. M. H. Maleki and A. J. Devaney, “Phase-retrieval and intensity-only reconstruction algorithms for optical diffraction tomography,” J. Opt. Soc. Am. A 10(5), 1086–1092 (1993). [CrossRef]  

20. K.-H. F. Chiang, S.-T. Wu, and S.-H. Chen, “Fringing field effect of the liquid-crystal-on-silicon devices,” Jpn. J. Appl. Phys. 41(Part 1, No. 7A), 4577–4585 (2002). [CrossRef]  

21. M. Zhou, Y. Chen, and J. Wu, “Optimization algorithm for amplitude-only hologram based on digital micromirror device,” in Third Optics Frontier Conference (OFS 2023), vol. 12711 (SPIE, 2023), pp. 214–221.
