
Diffraction model-driven neural network trained using hybrid domain loss for real-time and high-quality computer-generated holography


Abstract

Learning-based computer-generated holography (CGH) has demonstrated great potential in enabling real-time, high-quality holographic displays. However, most existing learning-based algorithms still struggle to produce high-quality holograms, due to the difficulty of convolutional neural networks (CNNs) in learning cross-domain tasks. Here, we present a diffraction model-driven neural network (Res-Holo) using hybrid domain loss for phase-only hologram (POH) generation. Res-Holo uses the weights of the pretrained ResNet34 as the initialization in the encoder stage of the initial phase prediction network to extract more generic features and also to help prevent overfitting. In addition, a frequency domain loss is added to further constrain the information to which the spatial domain loss is insensitive. The peak signal-to-noise ratio (PSNR) of the reconstructed image is improved by 6.05 dB using the hybrid domain loss compared to using the spatial domain loss alone. Simulation results show that the proposed Res-Holo can generate high-fidelity 2 K resolution POHs with an average PSNR of 32.88 dB at 0.014 seconds/frame on the DIV2K validation set. Both monochrome and full-color optical experiments show that the proposed method can effectively improve the quality of reproduced images and suppress image artifacts.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Holographic displays offer highly advanced capabilities for virtual and augmented reality (VR/AR) applications because of the rich visual cues they can provide, making them a promising technology for the next generation of VR and AR optical devices [1–3]. CGH uses 2D or 3D digital images as input and employs computer algorithms to transform these digital images into the complex optical wavefront in the hologram plane.

In CGH, the complex amplitude information usually needs to be encoded as a phase-only or amplitude-only hologram according to the modulation mode of the spatial light modulator (SLM). The conversion of light field complex amplitude information into POHs is an important research topic in CGH because of the high diffraction efficiency of phase modulation and the absence of conjugate images in the reconstruction [4].

The computation of high-quality POHs requires efficient computational methods. Traditional CGH algorithms are mainly divided into iterative and non-iterative methods. Typical iterative algorithms include the Gerchberg-Saxton (GS) algorithm [5,6], stochastic gradient descent (SGD) [7–9], and non-convex optimization methods [10,11]. The GS algorithm usually needs dozens of iterations to generate a high-quality hologram. The SGD uses a complex-valued loss function to optimize both the real and imaginary parts of the reconstructed image, and the reconstruction quality is relatively satisfactory. If an iterative algorithm is used to implement a refreshable dynamic holographic display, the POHs need to be calculated in advance. Non-iterative algorithms include dual-phase amplitude coding (DPAC) [12,13], error diffusion [14], and other hologram encoding methods [15–17]. While non-iterative algorithms are faster than iterative algorithms, the reconstructed images are more susceptible to artifacts. Until now, one of the biggest challenges in generating POHs has been to reduce the algorithm runtime while maintaining high image quality.

Recently, learning-based CGH has emerged as a promising candidate due to its ability to rapidly generate high-quality POHs. In data-driven CGH, a large-scale dataset of high-quality holograms needs to be prepared, and the neural network fits the inverse problem by learning the coding methods of traditional algorithms. Sinha et al. used SLMs to create input phase objects for neural networks and demonstrated that deep neural networks can solve end-to-end inverse problems in computational imaging [18]. Shi et al. synthesized photorealistic 3D holograms in real time from a single RGB-depth image and produced a large-scale CGH dataset (MIT-CGH-4K) [19]. Kavaklı et al. suggested learning a single complex-valued point spread function that optimizes the propagation of the POH to the target plane [20]. These CGH methods require the generation of labeled POHs in advance, can only be adapted to specific diffraction conditions, and require a long calculation time.

In contrast, model-driven CGH does not require labeled POHs because the physical diffraction model is incorporated into the neural network. Wu et al. proposed an autoencoder-based neural network (Holoencoder), which automatically learns potential encoding methods for POHs in an unsupervised manner [21]. Liu et al. proposed 4K-DMDNet, which strengthens the constraints in the frequency domain of the reconstructed image and combines the residual method with sub-pixel convolution in its network structure [22]. Peng et al. trained a calibrated wave propagation model, HoloNet, which is capable of generating high-quality holograms with 1080P resolution in real time [8]. Wang et al. used mixed low-frequency noise to train the holograms, which greatly alleviated the artifact problem of the reconstructed images [23]. Yu et al. proposed an optimized dual-resolution CNN and explored the effect of different loss functions on holograms with fewer speckles in the optically reconstructed images [24]. Shui et al. presented Self-holo, which applies random reconstruction to one layer of the 3D object, making the network training independent of the number of object layers [25]. Dong et al. introduced a Fourier-inspired neural module that extracts diffusive information between the image and the hologram [26]. Zhong et al. developed a complex-valued convolutional neural network for generating POHs with great efficiency [27].

Although learning-based CGH algorithms have recently made significant breakthroughs, they usually rely on a spatial domain loss, which makes it difficult to optimize the frequency domain of the reconstructed image because each pixel is equally important for a given frequency. If the ability of a CNN to learn low-frequency functions is improved, the imaging quality can also be improved [28]. Meanwhile, the use of a single dataset does not fully exploit the mapping capability of the convolution operator.

In this paper, we present a diffraction model-driven neural network trained using hybrid domain loss, which greatly improves the quality of reconstructed images. Specifically, in the encoder stage of the initial phase prediction network (IPPN), we use the weights of the pretrained ResNet34, which has already learned more generic features, as the initialization; this also helps prevent overfitting [29]. In addition, through the training strategy of hybrid domain loss, the frequency domain loss complements the information not learned by the spatial domain loss. The simulation results are consistent with the optical results: the contrast of the reconstructed images is greatly improved and artifacts are considerably mitigated.

In Section 2 we introduce the hybrid domain loss and the detailed network structure. In Sections 3 and 4, we perform numerical and optical reconstructions of the holograms and verify the validity of Res-Holo.

2. Principle and method

2.1 Mathematical model of unsupervised CGH

In a holographic display, the complex-valued wavefield $u_{src}$ generated by a coherent source is incident on the phase-only SLM, which delays the phase of the wavefield. The wavefield then propagates in free space to the target plane. The angular spectrum method (ASM) can obtain high reconstruction resolution independent of the distance from the target plane to the hologram plane. The ASM is expressed as:

$${\psi _p}(x,y;z) = {f_{ASM}}\{ \phi (x,y),z\} = IFFT\{{FFT\{ {u_{src}}(x,y) \cdot \exp [\textrm{j}\phi (x,y)]\} \times H({f_x},{f_y};z)} \}$$
where $\psi_p(x,y;z)$ denotes the complex-valued wavefield at diffraction distance $z$, ${f_{ASM}}$ denotes the ASM, $\phi (x,y)$ is the POH, $z$ is the distance from the target plane to the SLM plane, ${u_{src}}$ denotes the wavefield generated by the coherent source, FFT and IFFT are the fast Fourier transform and inverse fast Fourier transform operators, respectively, and $H(f_x, f_y; z)$ is the spatial frequency transfer function (SFTF).

When the wavefront propagates over a long distance with the ASM, lattice-like artifacts normally appear in the reconstructed images. In fact, this artifact is caused by under-sampling. According to the Nyquist sampling theorem, the sampling frequency of the SLM should be at least two times the maximum spatial frequency. Therefore, to sample adequately, we use the band-limited ASM, which limits the maximum spatial frequency of the SFTF. The SFTF is expressed by [30]:

$$H(f_x,f_y) = \begin{cases} \exp\!\left[\textrm{j}\dfrac{2\pi}{\lambda}z\sqrt{1 - (\lambda f_x)^2 - (\lambda f_y)^2}\,\right] & \textrm{if } \sqrt{f_x^2 + f_y^2} < \dfrac{1}{\lambda},\\[4pt] 0 & \textrm{otherwise,} \end{cases}$$
where fx and fy are the spatial frequencies, $\lambda$ is the wavelength, and z is the diffraction distance. In the case of a short propagation distance, we can use an SLM with a smaller pixel pitch or expand the hologram region by zero padding. Based on the diffraction model-driven network, the target image is input to the CNN and the corresponding POH is output, which is then diffracted by the ASM to obtain the digitally reconstructed image. The process can be formulated as:
$$\hat{I} = {f_{ASM}}({{\phi_{holo}}} )= {f_{ASM}}({{f_{CNN}}(I )} )$$
where $\hat{I}$ is the reconstructed image, ${\phi _{holo}}$ denotes the POH, ${f_{CNN}}$ denotes the CNN, and $I$ is the target amplitude.
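For clarity, a minimal PyTorch sketch of the band-limited ASM propagation in Eqs. (1) and (2) is given below. The function name and interface are illustrative assumptions rather than the authors' released code; the band limit follows the $\sqrt{f_x^2 + f_y^2} < 1/\lambda$ condition of Eq. (2).

```python
import torch

def asm_propagate(u_in, z, wavelength, pitch):
    """Band-limited ASM: propagate a complex field u_in (..., H, W) over distance z."""
    H, W = u_in.shape[-2:]
    fx = torch.fft.fftfreq(W, d=pitch, device=u_in.device)   # spatial frequencies (1/m)
    fy = torch.fft.fftfreq(H, d=pitch, device=u_in.device)
    FY, FX = torch.meshgrid(fy, fx, indexing="ij")

    # Transfer function H(fx, fy; z); frequencies outside the band limit are set to zero
    arg = 1.0 - (wavelength * FX) ** 2 - (wavelength * FY) ** 2
    inside_band = arg > 0                                     # sqrt(fx^2 + fy^2) < 1/lambda
    phase = 2 * torch.pi / wavelength * z * torch.sqrt(arg.clamp(min=0.0))
    H_tf = torch.where(inside_band, torch.exp(1j * phase),
                       torch.zeros_like(phase, dtype=torch.complex64))

    return torch.fft.ifft2(torch.fft.fft2(u_in) * H_tf)
```

Following Eq. (1), the POH would enter this sketch as `u_in = u_src * torch.exp(1j * phi)`.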

2.2 Pipeline of Res-Holo

The pipeline of Res-Holo is illustrated in Fig. 1. A color image is composed of red (R), green (G), and blue (B) channels. Each channel of the RGB image is converted into an amplitude image, which is then used as the input. In iterative algorithms, adding a random phase to the target amplitude image changes the light field distribution and thereby improves the quality of the reconstructed image. The operation of adding a random phase can also be applied to deep learning algorithms. Given the input, the initial phase prediction network (IPPN) attaches a phase to the target amplitude. The complex amplitude in the target plane is forward propagated over distance z to the SLM plane. The wavefield in the SLM plane is decomposed into amplitude and phase components. These two parts are concatenated and fed into the phase-encoder network (PEN) to obtain the POH.

Fig. 1. The pipeline of Res-Holo. IPPN: initial phase prediction network, PEN: phase encoder network. Because the amplitude and phase of the wavefield are tightly coupled, the complex-valued wavefield in the hologram plane is decomposed into amplitude and phase and fed into PEN.

The POH is backward propagated over distance z to obtain the reconstructed image. During training, the loss between the reconstructed image and the target image is back-propagated to update the weights of the whole network; minimizing this loss yields the optimal POH. The complete model-driven CGH can be formulated as:

$$\hat{I} = f_{ASM}^{z - b}({{\phi_{holo}}} )= f_{ASM}^{z - b}({{f_{holo}}({f_{ASM}^{z - f}({I{e^{i{f_{init}}(I)}}} )} )} )$$
$${\hat{\phi }_{holo}} = \textrm{arg}\textrm{min}(\mathrm{{\cal L}}(I,\hat{I}))$$
where $f_{holo}$ denotes the PEN, $f_{init}$ denotes the IPPN, $f_{ASM}^{z - f}$ denotes forward propagation of the target complex amplitude by the ASM over distance z to the SLM plane, $f_{ASM}^{z - b}$ denotes backward propagation of the POH over distance z to the target plane, and ${\hat{\phi }_{holo}}$ denotes the optimized POH.
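The forward pass of Eqs. (4) and (5) can be sketched as follows. This is a schematic outline only: `ippn` and `pen` stand for the two sub-networks described in Section 2.3, `asm_propagate` is the hedged sketch from Section 2.1, and using a negative distance for the backward step is an assumption about the sign convention.

```python
import torch

def res_holo_forward(target_amp, ippn, pen, z, wavelength, pitch):
    """Target amplitude -> IPPN initial phase -> forward ASM -> PEN -> POH -> backward ASM."""
    # 1. IPPN predicts an initial phase for the target amplitude
    init_phase = ippn(target_amp)                                  # (B, 1, H, W), radians
    target_field = target_amp * torch.exp(1j * init_phase)

    # 2. Forward propagation over distance z to the SLM plane
    slm_field = asm_propagate(target_field, z, wavelength, pitch)

    # 3. PEN encodes the concatenated amplitude and phase into a POH in [-pi, pi]
    amp, phase = slm_field.abs(), slm_field.angle()
    poh = pen(torch.cat([amp, phase], dim=1))

    # 4. Backward propagation of the POH to the target plane for the loss in Eq. (5)
    recon_field = asm_propagate(torch.exp(1j * poh), -z, wavelength, pitch)
    return poh, recon_field.abs()
```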

2.3 Res-Holo network structure

Both the IPPN and the PEN use a similar U-Net architecture [31]. Both networks consist of four downsampling stages and the corresponding upsampling stages, with a maximum channel count of 512 for the IPPN and 128 for the PEN, respectively. In the encoder stage of the IPPN, the weights of the pretrained ResNet34 [32] are used as the initialization. During training, its parameters are fine-tuned to adapt it to our task. Figure 2(a) shows the detailed schematic of the IPPN. Using the pretrained model, which has already learned some generic features, can improve the generalization capability. We use skip connections to fuse the information learned from the residual blocks with the information after upsampling. The PEN does not require the large-parameter ResNet34: the reconstruction quality is not significantly improved, while the POH generation time increases. Therefore, in the PEN, we use a U-Net with the same structure as the IPPN but fewer parameters. In the PEN, the downsampling block is essentially the same as the upsampling block. Figure 2(b) illustrates the detailed structure of the upsampling block. For the upsampling block, we use transposed convolution to scale the feature map to high resolution with trainable weights, allowing more complex features to be learned. Finally, the output layer is a HardTanh layer that limits the phase values to the range $[ - \pi ,\pi ]$.

Fig. 2. U-Net neural network architecture of Res-Holo. (a) Initial phase prediction network. The encoder part of the network uses the pretrained ResNet34. (b) Upsampling block. Downsample is the feature map passed from the corresponding residual block; Upsample is the output of the upsampling block.
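A minimal sketch of such an encoder is shown below, assuming a torchvision ResNet34 whose stages serve as the skip-connection sources; replacing the three-channel stem with a single-channel convolution is our illustrative way of handling the amplitude input and is not specified in the paper. The decoder is omitted for brevity.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet34

class IPPNEncoder(nn.Module):
    """ResNet34-initialized encoder of the IPPN (decoder omitted)."""
    def __init__(self):
        super().__init__()
        backbone = resnet34(weights="IMAGENET1K_V1")   # pretrained initialization
        # Single-channel amplitude input: swap the 3-channel stem convolution
        self.stem = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False),
            backbone.bn1, backbone.relu)
        self.pool = backbone.maxpool
        self.enc1 = backbone.layer1    #  64 channels
        self.enc2 = backbone.layer2    # 128 channels
        self.enc3 = backbone.layer3    # 256 channels
        self.enc4 = backbone.layer4    # 512 channels (maximum channel count of the IPPN)

    def forward(self, x):
        s0 = self.stem(x)
        s1 = self.enc1(self.pool(s0))
        s2 = self.enc2(s1)
        s3 = self.enc3(s2)
        s4 = self.enc4(s3)
        return s0, s1, s2, s3, s4      # feature maps reused by the decoder via skip connections
```

On older torchvision versions the pretrained weights would be requested with `resnet34(pretrained=True)` instead.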

2.4 Optimized hybrid domain loss

The design of the loss function has a crucial impact on the POH generated by deep learning. The loss function is central to the optimization of the training process. If the loss function can accurately measure the difference between the reconstructed image and the target image, the mapping ability of the convolution operator will be more accurate and efficient. Researchers have explored this area with some positive results; for example, Peng et al. [8] used mean square error (MSE) loss and perceptual loss, and Wu et al. [21] used the negative Pearson correlation coefficient (NPCC) and perceptual loss.

However, some problems arise from using a spatial domain loss alone: the reconstructed image is vulnerable to artifacts, has lower contrast, and shows unclear contours. Constraining the frequency domain of the reconstructed image can help solve these problems. Therefore, we use a combination of NPCC and perceptual loss [33] as the spatial domain loss and an adjusted focal frequency loss (FFL) [34] as the frequency domain loss.

During our research, when the frequency difference between the target image and the reconstructed image was directly used as a loss, the reconstruction quality was not improved, because directly constraining the complete frequency information of an image is too redundant and gives the neural network little incentive to learn the frequency components that are difficult to synthesize. In the training process, the FFL only needs to constrain the information not learned by the spatial loss. We therefore adjusted the FFL (FFL_adj): we define the spectrum of the difference between the target image and the reconstructed image as $\vec{r}({u,v} )$, and define the spectrum of the zero-valued image (zeros(M, N)) as ${\vec{r}_{zeros({M,N} )}}({u,v} )$. Figure 3 shows the frequency components to which the spatial domain loss is insensitive and the frequency distances to be optimized. For example, in Fig. 3, ${\vec{r}_1}({{u_1},{v_1}} )= {a_1} + {b_1}\textrm{j}$ is the spatial frequency value at the spectrum point (u1, v1), and ${\vec{r}_2}({u_2},{v_2}) = {a_2} + {b_2}\textrm{j}$ is another spatial frequency value at the spectrum point (u2, v2). Constraining the real and imaginary parts of the insensitive frequencies means optimizing the amplitude ($|{{{\vec{r}}_1}} |,|{{{\vec{r}}_2}} |$) and phase (${\theta _1},{\theta _2}$) of the reconstructed image. ${d_1}({{{\vec{r}}_1},{{\vec{r}}_{zeros({M,N} )}}} )$, ${d_2}({{{\vec{r}}_2},{{\vec{r}}_{zeros({M,N} )}}} )$, and the other frequency distances to be optimized converge toward the zero-valued spectrum, which may simplify the optimization path of the neural network. The FFL_adj is expressed by:

$$\vec{r}({u,v} )= FFT({|{\hat{I}({M,N} )- I({M,N} )} |} )$$
$${\vec{r}_{zeros({M,N} )}}({u,v} )= FFT({zeros({M,N} )} )$$
$$d({\vec{r},{{\vec{r}}_{zeros({M,N} )}}} )= \|{\vec{r}({u,v} )- {{\vec{r}}_{zeros({M,N} )}}({u,v} )} \|_2^2$$
$$w(u,v) = \sqrt {d({\vec{r},{{\vec{r}}_{zeros({M,N} )}}} )}$$
$$FFL\_adj = \frac{1}{MN}\sum\limits_{u = 0}^{M - 1} \sum\limits_{v = 0}^{N - 1} w(u,v)\, d\big(\vec{r},\vec{r}_{zeros(M,N)}\big)$$
where (M, N) denotes the size of the target image I in the spatial domain. Eq. (8) represents the frequency distance, and w(u, v) is the dynamic spectral weight matrix, which is determined dynamically by the non-uniform distribution of the frequency distance loss during each training period [34].
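A minimal PyTorch sketch of Eqs. (6)–(10) is shown below; keeping the dynamic weight w out of the gradient follows the original FFL formulation [34], and the helper name is an assumption rather than the authors' code.

```python
import torch

def ffl_adj(recon_amp, target_amp):
    """Adjusted focal frequency loss, Eqs. (6)-(10)."""
    residual = (recon_amp - target_amp).abs()        # |I_hat - I|, Eq. (6)
    r = torch.fft.fft2(residual)                     # spectrum of the residual
    r_zeros = torch.zeros_like(r)                    # FFT of zeros(M, N), Eq. (7)

    d = (r - r_zeros).abs() ** 2                     # per-frequency distance, Eq. (8)
    w = torch.sqrt(d).detach()                       # dynamic spectral weight, Eq. (9)
    return (w * d).mean()                            # average over M*N frequencies, Eq. (10)
```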

Fig. 3. Spectrum of the difference between reconstructed image and target image (frequency components that are not sensitive to spatial domain loss) and the frequency distances to be optimized.

It is worth noting that as the training period increases, the reconstructed image becomes more similar to the target image and the unlearned frequency components become more apparent. The number of training epochs was set to 20, and the weight of FFL_adj was gradually increased with the training epoch. The hybrid domain loss is expressed by:

$${\mathrm{{\cal L}}_{total}}(I,\hat{I}) = {\mathrm{{\cal L}}_{NPCC}}(I,\hat{I}) + {\mathrm{{\cal L}}_{percep}}(I,\hat{I}) + FFL\_adj\big({t_e} \cdot |{\hat{I} - I}|,\, 0\big), \quad 0 \le {t_e} \le 19$$
where ${\mathrm{{\cal L}}_{total}}$ denotes the total loss value, ${\mathrm{{\cal L}}_{NPCC}}$ denotes the negative Pearson coefficient, ${\mathrm{{\cal L}}_{percep}}$ denotes the perceptual loss, FFL_adj denotes the adjusted focal frequency loss, and ${t_e}$ denotes the training epoch.
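Combining the terms of Eq. (11) can be sketched as follows, reusing the hypothetical `ffl_adj` above; `perceptual_loss` stands for a VGG-feature perceptual loss [33] supplied by the caller, and the epoch-scaled residual implements the gradual increase of the frequency term.

```python
import torch

def npcc_loss(recon, target):
    """Negative Pearson correlation coefficient between reconstruction and target."""
    r = recon - recon.mean(dim=(-2, -1), keepdim=True)
    t = target - target.mean(dim=(-2, -1), keepdim=True)
    corr = (r * t).sum(dim=(-2, -1)) / (
        r.square().sum(dim=(-2, -1)).sqrt() * t.square().sum(dim=(-2, -1)).sqrt() + 1e-8)
    return -corr.mean()

def hybrid_loss(recon, target, epoch, perceptual_loss):
    """Hybrid domain loss of Eq. (11): spatial terms plus epoch-weighted FFL_adj."""
    spatial = npcc_loss(recon, target) + perceptual_loss(recon, target)
    scaled_residual = epoch * (recon - target).abs()      # t_e * |I_hat - I|, 0 <= t_e <= 19
    frequency = ffl_adj(scaled_residual, torch.zeros_like(scaled_residual))
    return spatial + frequency
```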

3. Experiments

Res-Holo is implemented with Python 3.7 and PyTorch 1.12.1. The super-resolution dataset DIV2K [35] was used to train Res-Holo, with 800 images as the training set and 100 images as the validation set. The data augmentation of the training set included horizontal flipping, vertical flipping, and rotation. Res-Holo uses the Adam optimizer with a learning rate of 0.001. The training period was set to 20 epochs. The pixel pitch of the POH is 8 µm and the diffraction distance z is set to 0.30 m. The wavelengths of the red, green, and blue lasers are 670 nm, 532 nm, and 473 nm, respectively. The GPU used in the experiments is an NVIDIA RTX 3090 with 24 GB of memory and CUDA version 11.2.

3.1 Simulation results

The comparison of numerical reconstructions of POHs calculated with different methods is presented in Fig. 4. We compared Res-Holo with DPAC [12], SGD [8], and HoloNet [8], using PSNR and the structural similarity index (SSIM) as evaluation metrics. These CGH methods represent typical non-iterative, iterative, and learning-based algorithms, respectively. The number of iterations of the SGD algorithm is set to 500. The RGB images are preprocessed to occupy a region of 1600 × 880 pixels and padded with zeros to 1920 × 1072 pixels. The zoomed detail images in Fig. 4 are the result of 3× magnification.
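The zero-padding step mentioned above can be sketched as below, assuming the 1600 × 880 active region is centered in the 1920 × 1072 hologram grid; the helper name is hypothetical.

```python
import torch.nn.functional as F

def pad_to_hologram_grid(img, out_h=1072, out_w=1920):
    """Center a (B, C, 880, 1600) amplitude image in a zero-filled (out_h, out_w) frame."""
    h, w = img.shape[-2:]
    pad_w, pad_h = out_w - w, out_h - h
    # F.pad padding order for the last two dims: (left, right, top, bottom)
    return F.pad(img, (pad_w // 2, pad_w - pad_w // 2,
                       pad_h // 2, pad_h - pad_h // 2))
```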

Fig. 4. Comparison of numerically reconstructed images in green channel. (a) and (b) come from DIV2K validation set.

The simulation results show that DPAC has high computational efficiency, but the reconstructed images have distinct chessboard-like and grid-like noise, which is strongly related to the chessboard encoding method. The SGD algorithm can produce high-quality POHs but cannot meet the demand for real-time display. Also, the simulation results are consistent with the experimental results: the reconstructed images are vulnerable to random noise. The reconstructed images of HoloNet are smooth, but there are significant streak-like noises. Compared to using the pretrained ResNet34, the reconstructed images generated with the ResNet34 without pretrained weights have significant raster-like noise at the top, and the average PSNR value is 1.47 dB lower. In our research, the use of ResNet34 trained on large-scale datasets may help the convolution operator to learn low-frequency functions and improve the generalization capability. Figure 5 shows the simulation and experimental reconstruction results of a grayscale image generated by the proposed pretrained Res-Holo and by Res-Holo without the pretrained ResNet34 in the green channel. The zoomed detail images in Fig. 5 are the results of 3× magnification. As can be seen in Fig. 5, the grayscale values of the detailed square area are more uniform when using the pretrained Res-Holo. The proposed pretrained Res-Holo obtains the best score considering both the PSNR and SSIM evaluation metrics.

Fig. 5. The comparison of simulation results (a) and experimental results (b) of the grayscale image generated by the proposed pretrained Res-Holo and by Res-Holo without pretraining, in the green channel.

Figure 6 shows the average inference time and PSNR for the above CGHs, using 100 DIV2K images as the validation set. Res-Holo has a higher PSNR than the iterative SGD algorithm, and its computation time is approximately three orders of magnitude shorter. For the same level of inference time, the imaging quality of Res-Holo is 2.44 dB higher than that of HoloNet. This means that Res-Holo balances computation time and reconstruction quality well. The proposed Res-Holo generates high-quality 2 K POHs in 0.014 s with an average PSNR of 32.88 dB and SSIM of 0.95.

Fig. 6. Running time of POH generation vs. reconstruction quality in the green channel. PSNR values for all computing methods are the average of 100 test images in DIV2K.

3.2 Effectiveness of hybrid domain loss

Table 1 shows the quantitative results under different loss function conditions. In our research, when using NPCC alone, the loss value dropped rapidly and reached stability, but the network was easily over-fitted. Compared to training with NPCC and perceptual loss, training with the original FFL [34] and NPCC does not improve the quality but rather slightly decreases it, because constraining the complete frequency components of an image may be too redundant for the CNN. Compared to using the spatial domain loss alone, when the FFL_adj is used and the network is trained with the hybrid domain loss, the PSNR and SSIM are improved to 32.88 dB and 0.95, respectively.


Table 1. Comparison of spatial domain loss and hybrid domain loss

To illustrate the effectiveness of the hybrid domain loss, a comparison experiment was conducted using three different combinations of loss functions, with the other experimental conditions fixed. Figure 7 shows the PSNR and SSIM values of the reconstructed images for different loss function conditions. In Fig. 7(c), the ground truth is shown in the upper right corner and the corresponding reconstructed image in the lower left corner. To show the effect of the hybrid domain loss, the detailed areas have been enlarged 4 times. The combination of NPCC and perceptual loss as the spatial loss shows noticeably blurred contours in the red enlarged area and lower contrast in the yellow enlarged area. In Fig. 7(c) and Fig. 7(d), the simulation and experimental results obtained using only the spatial domain loss have obvious artifacts, which look relatively unnatural. The hybrid domain loss combining NPCC [21] and FFL_adj solves the above problems very well. The addition of perceptual loss [33] constrains the reconstructed image against the real image in terms of high-level features such as edges and textures.

Fig. 7. The comparison of spatial domain loss and hybrid domain loss for reconstructed image quality in green channel. (a) and (b) come from DIV2K validation set. Perceptual means perceptual loss, and FFL_adj is the adjusted focal frequency loss. (c) shows the ground truth in the upper right corner and the reconstructed image in the lower left corner. (d) is the optical reconstruction results.

Figure 8 shows the absolute value of the difference between the target image and the reconstructed image at different magnifications under the spatial domain loss and the hybrid domain loss. These difference images can be expressed by the following equation:

$${\Delta _k} = \delta \cdot |{{{\hat{I}}_k} - {I_k}} |$$
where Δk is the absolute value of the difference between the target image and the reconstructed image, δ is the magnification factor, and k denotes the index of the input image.
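As a minimal sketch of Eq. (12), the difference maps can be produced as below; scaling amplitudes in [0, 1] by 255 and clipping to the 8-bit gray range are our display assumptions, not details from the paper.

```python
import torch

def difference_image(recon_amp, target_amp, delta=1.0):
    """Magnified absolute difference of Eq. (12), mapped to 8-bit gray values for display."""
    diff = delta * (recon_amp - target_amp).abs() * 255.0   # amplitudes assumed in [0, 1]
    return diff.clamp(0, 255).to(torch.uint8)
```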

Fig. 8. The absolute value of the difference between the target image and the reconstructed image at different magnifications. $\delta$ is the magnification factor. The spatial loss represents the combination of NPCC + perceptual loss. The hybrid loss represents the combination of NPCC + perceptual loss + FFL_adj.

In Fig. 8, the color bar denotes gray values ranging from 0 to 255. Areas in Fig. 8 with higher gray values correspond to lower image quality. Because the CNN learns low-frequency functions poorly and each pixel is equally important for a given frequency, it is difficult to make the convolution operator synthesize low-frequency components using the spatial domain loss alone. We use the FFL_adj to constrain only the information that cannot be learned by the spatial domain loss. The results show that the proposed training strategy of using hybrid domain loss effectively solves these problems. As the training epochs increase, we multiply the difference image by a magnification factor to increase the weight of the hard-to-synthesize frequency components.

3.3 Generalization capability

To test the generalization ability of Res-Holo, we randomly selected three test images from the DIV2K validation set and the Big Buck Bunny video. The laser wavelengths and the reconstruction distance are described at the beginning of Section 3. Figure 9 shows the color reconstructed images with their POHs. The zoomed-in detail images in Fig. 9 are the result of 2× magnification.

Fig. 9. Numerical reconstructions from color holograms. The color POH is merged by R, G, and B POH. (a) comes from www.bigbuckbunny.org (© 2008, Blender Foundation) under the Creative Commons Attribution 3.0 license (https://creativecommons.org/licenses/by/3.0/); (b) and (c) come from DIV2K validation set.

In Fig. 9, the lion's beard and the grille of the car are very clear, and the colors of the reconstructed images are very similar to the originals. The hologram shows a clear outline of the reconstructed image and records abundant information about it. All these results show that Res-Holo has good generalization capability.

4. Optical reconstruction and discussion

To further verify the effectiveness of the algorithm, we built a color holographic display system based on the time-division multiplexing method, as shown in Fig. 10. The three laser beams first pass through attenuators and polarizers to adjust the intensity and polarization direction of the light; they are then expanded and collimated to illuminate the SLM. The collimating lenses are doublet lenses that suppress chromatic aberrations. The light waves modulated and reflected by the SLM are transmitted to the 4-f system and pass through the beam splitter. The focal length of the Fourier lenses in the 4-f system is 200 mm. A camera is used to capture the reconstructed image. The wavelengths of the green, red, and blue lasers are 532 nm, 670 nm, and 473 nm, respectively. The phase-only SLM is produced by Holoeye Photonics AG; its resolution is 1920 × 1080 pixels and the pixel pitch is 8 µm.

Fig. 10. Experimental system for optoelectronic holographic reconstruction. P1, P2, and P3 are polarizers; A1, A2, and A3 are attenuators; M is a mirror; DM1 and DM2 are dichroic mirrors; MO is a microscope objective; CL is a collimating lens and BS is a beam splitter.

The optical reconstruction results are shown in Fig. 11, and the details in the reconstructed images are the result of 3× magnification. We compare the reconstruction results of Res-Holo with those of DPAC and SGD. The optical experimental results are consistent with the simulation results.

Fig. 11. Comparison of optically reconstructed results in the green channel.

The optically reconstructed images of DPAC have distinct artifacts and are uneven, as shown in the reconstructed images of the wings of the parrot and the butterfly. The SGD shows severe random noise due to the unconstrained phase; as a result, the branch-like skeleton of the butterfly in the magnified region is completely drowned by the noise. The proposed Res-Holo significantly reduces artifacts in the reconstructed image and improves uniformity for better viewing.

We use the time-division multiplexing method to realize a full-color holographic display. Figure 12 shows the results of the full-color reconstruction; the details in the reconstructed images are the result of 3× magnification. The drawbacks of the above CGH methods in the monochrome experiments naturally carry over to the color experiments. DPAC causes distinct artifacts in the eye region of the parrot and uneven colors in the wing of the butterfly. The SGD algorithm can improve image quality as the number of iterations increases, but the iteration process takes much time. When large monochrome areas are encountered, such as the belly of the parrot, the RGB color channels cannot accurately synthesize a certain color due to the unconstrained phase. Res-Holo has a more regular phase distribution and already high reconstruction quality, so the color distribution is more even and closer to the real image.

Fig. 12. Comparison of optical full-color reconstructed results.

5. Conclusion

In this paper, the proposed Res-Holo is used to rapidly generate high-quality 2 K resolution POHs. Compared to existing learning-based CGHs, we use a novel hybrid domain training strategy in which the adjusted FFL is added to further constrain the information to which the spatial loss is insensitive. In the encoder stage of the IPPN, we use the weights of the pretrained ResNet34 as the initialization, which extracts more generic features and has a significant impact on the synthesis of high-quality optical reconstruction results. In simulation experiments, the reconstruction quality is significantly improved: the proposed Res-Holo generates high-quality 2 K POHs in 0.014 s with an average PSNR of 32.88 dB and SSIM of 0.95. In optical experiments, Res-Holo improves the uniformity and reduces artifacts in the reconstructed image. In addition, the training strategy using hybrid domain loss can easily be applied to other learning-based CGHs. In the future, we intend to design a more learning-friendly unsupervised CGH algorithm that constrains not only the amplitude of the reconstructed image but also the phase information, and to extend Res-Holo to 3D scenes.

Funding

National Key Research and Development Program of China (2021YFB2802200); National Natural Science Foundation of China (61875115, 62005154); Natural Science Foundation of Shanghai (20ZR1420500); Key Laboratory of Advanced Display and System Application, Chinese Ministry of Education (P201610).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. A. Maimone, A. Georgiou, and J. S. Kollin, “Holographic near-eye displays for virtual and augmented reality,” ACM Trans. Graph. 36(4), 1–16 (2017). [CrossRef]  

2. C. Chang, K. Bang, G. Wetzstein, B. Lee, and L. Gao, “Toward the next-generation VR/AR optics: a review of holographic near-eye displays from a human-centric perspective,” Optica 7(11), 1563–1578 (2020). [CrossRef]  

3. G. Situ, “Deep holography,” Light: Adv. Manuf. 3(2), 1 (2022). [CrossRef]  

4. D. Pi, J. Liu, and Y. Wang, “Review of computer-generated hologram algorithms for color dynamic holographic three-dimensional display,” Light Sci. Appl. 11(1), 231 (2022). [CrossRef]  

5. H. Zheng, C. Zhou, X. Shui, and Y. Yu, “Computer-generated full-color phase-only hologram using a multiplane iterative algorithm with dynamic compensation,” Appl. Opt. 61(5), B262–B270 (2022). [CrossRef]  

6. R. W. Gerchberg and W. O. Saxton, “A practical algorithm for the determination of phase from image and diffraction plane pictures,” Optik 35(2), 237–246 (1972).

7. X. Xia, F. Yang, W. Wang, X. Shui, F. Guan, H. Zheng, Y. Yu, and Y. Peng, “Investigating learning-empowered hologram generation for holographic displays with ill-tuned hardware,” Opt. Lett. 48(6), 1478–1481 (2023). [CrossRef]  

8. Y. Peng, S. Choi, N. Padmanaban, and G. Wetzstein, “Neural holography with camera-in-the-loop training,” ACM Trans. Graph. 39(6), 1–14 (2020). [CrossRef]  

9. Z. Wang, T. Chen, Q. Chen, K. Tu, Q. Feng, G. Lv, A. Wang, and H. Ming, “Reducing crosstalk of a multi-plane holographic display by the time-multiplexing stochastic gradient descent,” Opt. Express 31(5), 7413–7424 (2023). [CrossRef]  

10. P. Chakravarthula, Y. Peng, J. Kollin, H. Fuchs, and F. Heide, “Wirtinger holography for near-eye displays,” ACM Trans. Graph. 38(6), 1–13 (2019). [CrossRef]  

11. J. Zhang, N. Pégard, J. Zhong, H. Adesnik, and L. Waller, “3D computer-generated holography by non-convex optimization,” Optica 4(10), 1306–1313 (2017). [CrossRef]  

12. X. Sui, Z. He, G. Jin, D. Chu, and L. Cao, “Band-limited double-phase method for enhancing image sharpness in complex modulated computer-generated holograms,” Opt. Express 29(2), 2597–2612 (2021). [CrossRef]  

13. Y. Qi, C. Chang, and J. Xia, “Speckleless holographic display by complex modulation based on double-phase method,” Opt. Express 24(26), 30368–30378 (2016). [CrossRef]  

14. P. Tsang and T.-C. Poon, “Novel method for converting digital Fresnel hologram to phase-only hologram based on bidirectional error diffusion,” Opt. Express 21(20), 23680–23686 (2013). [CrossRef]  

15. D. Pi, J. Liu, J. Wang, Y. Sun, Y. Yang, W. Zhao, and Y. Wang, “Optimized computer-generated hologram for enhancing depth cue based on complex amplitude modulation,” Opt. Lett. 47(17), 4379–4382 (2022). [CrossRef]  

16. D. Pi, J. Liu, and S. Yu, “Speckleless color dynamic three-dimensional holographic display based on complex amplitude modulation,” Appl. Opt. 60(25), 7844–7848 (2021). [CrossRef]  

17. X. Li, J. Liu, J. Jia, Y. Pan, and Y. Wang, “3D dynamic holographic display by modulating complex amplitude experimentally,” Opt. Express 21(18), 20577–20587 (2013). [CrossRef]  

18. A. Sinha, J. Lee, S. Li, and G. Barbastathis, “Lensless computational imaging through deep learning,” Optica 4(9), 1117–1125 (2017). [CrossRef]  

19. L. Shi, B. Li, C. Kim, P. Kellnhofer, and W. Matusik, “Towards real-time photorealistic 3D holography with deep neural networks,” Nature 591(7849), 234–239 (2021). [CrossRef]  

20. K. Kavaklı, H. Urey, and K. Akşit, “Learned holographic light transport: invited,” Appl. Opt. 61(5), B50–B55 (2022). [CrossRef]  

21. J. Wu, K. Liu, X. Sui, and L. Cao, “High-speed computer-generated holography using an autoencoder-based deep neural network,” Opt. Lett. 46(12), 2908–2911 (2021). [CrossRef]  

22. K. Liu, J. Wu, Z. He, and L. Cao, “4K-DMDNet: diffraction model-driven network for 4 K computer-generated holography,” Opto-Electron. Adv. 6(1), 220135 (2023). [CrossRef]  

23. X. Wang, X. Liu, T. Jing, P. Li, X. Jiang, Q. Liu, and X. Yan, “Phase-only hologram generated by a convolutional neural network trained using low-frequency mixed noise,” Opt. Express 30(20), 35189–35201 (2022). [CrossRef]  

24. T. Yu, S. Zhang, W. Chen, J. Liu, X. Zhang, and Z. Tian, “Phase dual-resolution networks for a computer-generated hologram,” Opt. Express 30(2), 2378–2389 (2022). [CrossRef]  

25. X. Shui, H. Zheng, X. Xia, F. Yang, W. Wang, and Y. Yu, “Diffraction model-informed neural network for unsupervised layer-based computer-generated holography,” Opt. Express 30(25), 44814–44826 (2022). [CrossRef]  

26. Z. Dong, C. Xu, Y. Ling, Y. Li, and Y. Su, “Fourier-inspired neural module for real-time and high-fidelity computer-generated holography,” Opt. Lett. 48(3), 759–762 (2023). [CrossRef]  

27. C. Zhong, X. Sang, B. Yan, H. Li, D. Chen, X. Qin, S. Chen, and X. Ye, “Real-time high-quality computer-generated hologram using complex-valued convolutional neural network,” in IEEE Transactions on Visualization and Computer Graphics. [CrossRef]  

28. N. Rahaman, A. Baratin, D. Arpit, F. Draxler, M. Lin, F. Hamprecht, Y. Bengio, and A. Courville, “On the spectral bias of neural networks,” in Proceedings of the 36th International Conference on Machine Learning, PMLR 97:5301–5310 (2019).

29. V. Iglovikov and A. Shvets, “TernausNet: U-Net with VGG11 encoder pre-trained on ImageNet for image segmentation,” arXiv, arXiv:1801.05746 (2018). [CrossRef]  

30. K. Matsushima and T. Shimobaba, “Band-limited angular spectrum method for numerical simulation of free-space propagation in far and near fields,” Opt. Express 17(22), 19662–19673 (2009). [CrossRef]  

31. O. Ronneberger, P. Fischer, and T. Brox, “U-Net: convolutional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention (MICCAI) (2015), pp. 234–241.

32. K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016), pp. 770–778.

33. J. Johnson, A. Alahi, and F. F. Li, “Perceptual losses for real-time style transfer and super-resolution,” in European conference on computer vision (ECCV) (2016), pp. 694–711.

34. L. Jiang, B. Dai, W. Wu, and C. Loy, “Focal frequency loss for image reconstruction and synthesis,” 2021 IEEE/CVF International Conference on Computer Vision (ICCV), (2021), pp. 13899–13909.

35. E. Agustsson and R. Timofte, “Ntire 2017 challenge on single image super-resolution: dataset and study,” in Proceedings of the IEEE conference on computer vision and pattern recognition, (IEEE, 2017), pp. 126–135.
