
DCPNet: a dual-channel parallel deep neural network for high quality computer-generated holography


Abstract

Recent studies have demonstrated that learning-based computer-generated holography (CGH) has great potential for real-time, high-quality holographic displays. However, most existing algorithms treat the complex-valued wave field as a two-channel spatial-domain image to facilitate mapping onto real-valued kernels, which does not fully respect the computational characteristics of complex amplitude. To address this issue, we propose a dual-channel parallel neural network (DCPNet) for generating phase-only holograms (POHs), taking inspiration from the double phase amplitude encoding method. Instead of encoding the complex-valued wave field in the SLM plane as a two-channel image, we encode it into two real-valued phase elements. The two learned sub-POHs are then sampled by complementary 2D binary gratings to synthesize the desired POH. Simulation and optical experiments are carried out to verify the feasibility and effectiveness of the proposed method. The simulation results indicate that the DCPNet is capable of generating high-fidelity 2K POHs in 36 ms. The optical experiments reveal that the DCPNet has an excellent ability to preserve fine details, suppress speckle noise, and improve uniformity in the reconstructed images.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Holographic display is widely recognized as the most promising 3D display technology for the next generation of augmented reality (AR) and virtual reality (VR) devices [1–3], since it can accurately record both the intensity and depth information of real objects. However, a significant challenge lies in the fact that most existing spatial light modulators (SLMs) are unable to simultaneously modulate the amplitude and phase of optical waves. To tackle this problem, various methods have been proposed, including the use of amplitude-only holograms and phase-only holograms (POHs). Among these, POHs have emerged as the dominant encoding method for CGHs due to their high diffraction efficiency and their ability to reconstruct holographic images without interference from conjugate images [4,5]. Nevertheless, generating high-quality POHs in real time remains a major challenge.

Existing POH optimization methods can be categorized into iterative and non-iterative algorithms. Iterative algorithms, such as Gerchberg-Saxton (GS) [6], Wirtinger holography [7], stochastic gradient descent (SGD) [8,9], and gradient-descent optimization [10], typically require a large number of iterations to achieve high-quality reconstruction, which is time-consuming and ineffective for dynamic holographic display. Common non-iterative algorithms include double phase amplitude encoding (DPAC) [11–13], error diffusion [14,15], and the one-step method [16]. These algorithms compute faster than iterative ones because they require only a single computation pass, but their reconstructed image quality is susceptible to speckle noise and artifacts.

Recently, learning-based methods have gained significant attention in the field of optics due to their effectiveness and pervasiveness, and they have gradually been utilized to optimize CGHs and address the issue of high computational cost. Learning-based CGH methods are usually classified into two categories: data-driven and model-driven. Data-driven CGH normally relies on a supervised training strategy, where massive pre-computed accurate holograms are used as labeled datasets. This dramatically increases the computational burden and makes the approach impractical for large-scale images [17–19]. In addition, the performance is limited by the quality of the pre-computed holograms.

Conversely, model-driven CGH can achieve end-to-end unsupervised training by incorporating physical diffraction models into the network, making it more flexible for different display conditions. For instance, Wu et al. [20] proposed HoloEncoder, which employs a UNet network to encode the given input and generate optimized POHs in an unsupervised manner; however, its imaging quality needs improvement. To enhance the imaging quality, HoloNet [8] used two convolutional neural networks (CNNs) to predict the initial and final phases, resulting in high-quality reconstructions. In [21], Yu et al. presented a phase dual-resolution network to explore the capability of CNNs for generating CGHs and investigated the effects of different combinations of loss functions on speckle noise in the reconstructions. To further suppress speckle noise, Sun et al. [22] utilized a UNet-based CNN to predict POHs, in which the Pearson correlation coefficient (PCC) and target-weighted standard deviation (TWSD) are employed as loss functions. Dong et al. [23] introduced Fourier-inspired neural modules, which can be easily integrated into various existing learning-based CGH frameworks to achieve high-fidelity POHs. Zheng et al. [24] adopted a hybrid-domain loss training strategy to improve the imaging quality, in which a frequency-domain loss complements the unlearned spatial information.

Although learning-based CGH algorithms have achieved significant breakthroughs, generating promising CGHs at high speed, the inherent encoding approach of these methods makes it challenging to further improve reconstruction quality. The main reason is that these methods utilize real-valued kernels in the spatial domain instead of performing complex-valued convolutions. To map the complex-valued wave field in the SLM plane onto real-valued kernels, they treat it as a two-channel spatial-domain image whose amplitude and phase are concatenated along the channel dimension. However, this encoding scheme is not compatible with the computational characteristics of complex amplitude.

There are several feasible solutions to this issue. One is to directly use complex-valued convolutional neural networks (CCNNs) to learn the complex-valued wave field in the SLM plane and thereby acquire optimal POHs; recent research has indicated that CCNNs may have a richer representational capacity [25] and can be used to generate high-quality POHs [26]. Another feasible solution is complex amplitude modulation (CAM). The most common CAM methods are the DPAC method [11–13] and the hologram bleaching method [27,28]. To achieve the desired holograms, the DPAC method encodes the target complex amplitude into two real-valued phase elements, which are then sampled with complementary 2D binary gratings and fused into a double-phase hologram (DPH).

Inspired by the DPAC method, we propose a novel dual-channel parallel neural network (DCPNet) to achieve both high-quality and high-speed CGH. Different from previous learning-based CGH networks, our DCPNet adopts two parallel real-valued phase optimization networks for complex amplitude encoding. In the encoding stage, the first CNN, referred to as the complex amplitude generator, learns the target complex amplitude, which is then numerically propagated to the SLM plane. The complex-valued wave field in the SLM plane is encoded into two real-valued phase elements, and two parallel CNNs, known as the phase optimizers, learn from these phase elements to produce two sub-POHs. Complementary 2D binary gratings are applied to sample the two sub-POHs and synthesize them into an optimized POH. In the decoding stage, the optimized POH is backward propagated to obtain the reconstructed amplitude, which is directly compared with the input initial amplitude for unsupervised training. Simulation and optical experiments demonstrate that the proposed DCPNet can suppress speckle noise, improve imaging uniformity, and preserve fine details in the reconstructed images.

In Section 2, we introduce the mathematical model and network architecture of the DCPNet. In Sections 3 and 4, we perform simulations and optical reconstructions of holograms to verify the effectiveness of the proposed DCPNet.

2. Method

Different from previous methods, we draw inspiration from the DPAC method and encode the complex-valued wave field in the SLM plane into two real-valued phase elements. These phase elements can be directly mapped onto the real-valued kernels of the CNN to learn the two sub-POHs. Similar to the DPAC method, we utilize complementary 2D binary gratings (checkerboard patterns) to sample the two sub-POHs and synthesize a single POH, which is then back-propagated over a certain distance to obtain the reconstructed amplitude. Figure 1 illustrates the framework of our method, which includes three CNNs: one complex amplitude generator and two phase optimizers that predict the POH.

Fig. 1. The principle of the DCPNet algorithm. CNN1: complex amplitude generator; CNN2 and CNN3: phase optimizers.

2.1 Principle of DCPNet algorithm

The complex-valued wave field in the SLM can be expressed as follows:

$$U_{\textrm{SLM}}(x,y) = A_{\textrm{SLM}}(x,y)\exp[\mathrm{j}\varphi_{\textrm{SLM}}(x,y)] = f_{\textrm{ASM}}[f_{\textrm{CNN1}}(I), z]$$
where $A_{\textrm{SLM}}(x,y)$ and $\varphi_{\textrm{SLM}}(x,y)$ are the amplitude and phase of the SLM field $U_{\textrm{SLM}}(x,y)$, respectively, $I$ is the input initial amplitude, and $f_{\textrm{CNN1}}$ denotes the approximation function of the complex amplitude generator. $f_{\textrm{ASM}}[\cdot, z]$ denotes forward propagation by the angular spectrum method (ASM), which can be expressed as:
$$f_{\textrm{ASM}}[f_{\textrm{CNN1}}(I), z] = \mathcal{F}^{-1}\{\mathcal{F}[f_{\textrm{CNN1}}(I)]\, H(f_x, f_y)\}$$
where $\mathcal{F}$ and $\mathcal{F}^{-1}$ represent the Fourier transform and the inverse Fourier transform, respectively. $H(f_x, f_y)$ represents the transfer function of the ASM and can be expressed as:
$$H(f_x, f_y) = \begin{cases} \exp\!\left[\mathrm{j}\dfrac{2\pi}{\lambda}z\sqrt{1-(\lambda f_x)^2-(\lambda f_y)^2}\right] & \textrm{if } \sqrt{f_x^2 + f_y^2} < \dfrac{1}{\lambda} \\[1ex] 0 & \textrm{otherwise} \end{cases}$$

Here ${f_x}$ and ${f_y}$ are the spatial frequency coordinates, $\lambda $ is the wavelength, and $z$ is the diffraction distance. When $z$ is positive, it represents ASM forward propagation, and vice versa. To map the complex-valued wave field in the SLM onto the real-valued kernel of the CNN, previous learning-based CGH methods have typically treated the complex-valued wave field as a two-channel spatial domain image:

$$U_{\textrm{SLM}}(x,y) = \textrm{cat}[A_{\textrm{SLM}}(x,y), \varphi_{\textrm{SLM}}(x,y), c]$$
where $\textrm{cat}[\cdot, c]$ indicates that the amplitude and phase in the SLM plane are concatenated along the channel dimension. This encoding can be interpreted as treating the amplitude and phase as two arbitrary color channels of an RGB image. However, such an approach is essentially an image processing operation and is incompatible with the computational characteristics of complex amplitude. To address this issue, we introduce the DCPNet method.
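As a concrete reference for the propagation model above, a minimal PyTorch sketch of the ASM of Eqs. (2) and (3), including the evanescent-wave cutoff, might look as follows; `propagate_asm` is a hypothetical helper name, not code from the paper:

```python
import torch

def propagate_asm(u, z, wavelength, dx, dy):
    """Angular spectrum propagation of a complex field u (shape H x W)
    over distance z (positive = forward), per Eqs. (2)-(3)."""
    H_pix, W_pix = u.shape[-2:]
    fx = torch.fft.fftfreq(W_pix, d=dx, device=u.device)
    fy = torch.fft.fftfreq(H_pix, d=dy, device=u.device)
    FY, FX = torch.meshgrid(fy, fx, indexing="ij")
    # Eq. (3): transfer function, zeroed where sqrt(fx^2 + fy^2) >= 1/lambda.
    arg = 1.0 - (wavelength * FX) ** 2 - (wavelength * FY) ** 2
    kz = torch.sqrt(torch.clamp(arg, min=0.0))
    H_tf = torch.exp(1j * (2 * torch.pi / wavelength) * z * kz)
    H_tf = H_tf * (arg > 0).to(H_tf.dtype)
    return torch.fft.ifft2(torch.fft.fft2(u) * H_tf)

# e.g. forward propagation to the SLM plane (532 nm, 8 um pixels, 250 mm):
# u_slm = propagate_asm(u_target, 0.25, 532e-9, 8e-6, 8e-6)
```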

As demonstrated in [29], the complex amplitude can be represented as the sum of two pure phase functions with constant amplitude. Hence, the complex-valued wave field in the SLM can be re-expressed as:

$${U_{\textrm{SLM}}}({x,y} )= B\textrm{exp} [{\textrm{j} {\theta_1}({x,y} )} ]+ B\textrm{exp} [{\textrm{j} {\theta_2}({x,y} )} ]$$
where $B = A_{\max}/2$ is a constant, and $A_{\max}$ is the maximum value of $A_{\textrm{SLM}}(x,y)$. $\theta_1(x,y)$ and $\theta_2(x,y)$ are two phase elements that can be calculated from the amplitude and phase of the SLM wave field as follows [11–13]:
$${\theta _1}({x,y} )= {\varphi _{\textrm{SLM}}}({x,y} )+ {\cos ^{ - 1}}[{{A_{\textrm{SLM} }}({x,y} )/{A_{\max }}} ]$$
$${\theta _2}({x,y} )= {\varphi _{\textrm{SLM} }}({x,y} )- {\cos ^{ - 1}}[{{A_{\textrm{SLM} }}({x,y} )/{A_{\max }}} ]$$

Evidently, ${\theta _1}$ and ${\theta _2}$ are real-valued phase elements, and they can be directly mapped onto the real-valued kernel to learn two sub-POHs. Using the same sampling method as the DPAC, we employ complementary checkerboard patterns ${M_1}({x,y} )$ and ${M_2}({x,y} )$ to sample the two sub-POHs and then synthesize them into the predicted POH. The pixelated checkerboard pattern is represented by:

$${M_1}({n\Delta x,m\Delta y} )= {\cos ^2}\left[ {\frac{{({n + m} )\pi }}{2}} \right]$$
$${M_2}({n\Delta x,m\Delta y} )= {\cos ^2}\left[ {\frac{{({n + m + 1} )\pi }}{2}} \right]$$
where the ${\cos ^2}[{\cdot} ]$ operator is used to generate the desired checkerboard pattern. $n$ and $m$ are the indexes of pixels, and $\Delta x$ and $\Delta y$ are the pixel intervals of the pattern. It is obvious that ${M_1}({x,y} )$ and ${M_2}({x,y} )$ are complementary. Consequently, the predicted POH is expressed as:
$$\hat{\phi}(x,y) = M_1(x,y)\, f_{\textrm{CNN2}}[\theta_1(x,y)] + M_2(x,y)\, f_{\textrm{CNN3}}[\theta_2(x,y)]$$
where $\hat{\phi}(x,y)$ is the predicted POH, and $f_{\textrm{CNN2}}$ and $f_{\textrm{CNN3}}$ are the approximation functions of the two phase optimizers. Note that the two phase optimizers share the same network architecture; after training, however, they have different weight parameters, which implies that $f_{\textrm{CNN2}}$ is different from $f_{\textrm{CNN3}}$. The predicted POH is back-propagated over distance $z$ to obtain the reconstructed amplitude:
$$\hat{I} = {f_{\textrm{ASM} }}\{{\textrm{exp} [{\textrm{j} \hat{\phi }({x,y} )} ], - z} \}$$
where ${f_{\textrm{ASM} }}\{{\cdot , - z} \}$ is the back propagation ASM, and $\hat{I}$ is the reconstructed amplitude. The loss function is utilized to calculate the loss value between the input initial amplitude and reconstructed amplitude. Given an appropriate number of training epochs, the optimal predicted POH is achieved by minimizing the loss value:
$${\hat{\phi }_{\textrm{optimal}}}({x,y} )= \arg \min [{\mathrm{{\cal L}}({I,\hat{I}} )} ]$$
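To make the encoding path concrete, the sketch below strings Eqs. (6)-(10) together in PyTorch; `phase_opt1` and `phase_opt2` stand for the two trained phase optimizers and, like the simplified shape handling, are illustrative assumptions rather than the paper's code:

```python
import torch

def encode_double_phase(u_slm, phase_opt1, phase_opt2):
    # Eqs. (6)-(7): split the SLM field into two real-valued phase elements.
    A, phi = u_slm.abs(), u_slm.angle()
    delta = torch.acos(torch.clamp(A / A.max(), 0.0, 1.0))
    theta1, theta2 = phi + delta, phi - delta
    # Eqs. (8)-(9): complementary checkerboard masks (cos^2 reduces to a parity test).
    H, W = u_slm.shape[-2:]
    n = torch.arange(W, device=u_slm.device)
    m = torch.arange(H, device=u_slm.device).unsqueeze(1)
    M1 = ((n + m) % 2 == 0).float()
    M2 = 1.0 - M1
    # Eq. (10): sample the two learned sub-POHs and fuse them into one POH.
    return M1 * phase_opt1(theta1) + M2 * phase_opt2(theta2)

# Eq. (11): back-propagate the fused POH to get the reconstructed amplitude:
# recon = propagate_asm(torch.exp(1j * poh), -z, wavelength, dx, dy).abs()
```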

2.2 DCPNet network structure

The complex amplitude generator and the two identical phase optimizers are implemented using a similar UNet architecture [30]. All three networks have four downsampling stages and four corresponding upsampling stages. Each downsampling stage consists of one downsampling block and one residual block, and each upsampling stage likewise consists of one upsampling block and one residual block. Skip connections fuse the feature maps learned in the downsampling stages with the outputs of the upsampling stages. The complex amplitude generator has one input channel (amplitude) and two output channels (amplitude and phase), while the phase optimizer has one input channel (phase) and one output channel (phase). The phase output layers of both the complex amplitude generator and the phase optimizers use a tanh function to limit the phase values to the range [-π, π].

The residual blocks are ResNetV2 units [31] and are not further described in this paper. Figure 2 illustrates the detailed structure of the downsampling block and the upsampling block. For downsampling and encoding the feature maps, we use a two-dimensional convolutional layer in the downsampling block. In the upsampling block, we avoid the conventional transposed convolution because of the checkerboard artifacts it causes [32], which stem from uneven overlap in the convolution process. Instead, we adopt the sub-pixel convolution method, which consists of a convolutional layer with a stride of one that expands the channel count four-fold, followed by a pixel shuffle layer. The pixel shuffle layer rearranges data from the channel dimension into 2D spatial blocks for upsampling, as sketched below.
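A sketch of such an upsampling block follows; the 3 × 3 kernel size is an assumption for illustration, not a detail stated in the paper:

```python
import torch.nn as nn

class SubPixelUpsample(nn.Module):
    """Sub-pixel convolution upsampling: a stride-1 convolution expands the
    channel count four-fold, then PixelShuffle rearranges channel data into
    2 x 2 spatial blocks, avoiding transposed-convolution checkerboard artifacts."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, 4 * out_channels,
                              kernel_size=3, stride=1, padding=1)
        self.shuffle = nn.PixelShuffle(2)  # (N, 4C, H, W) -> (N, C, 2H, 2W)

    def forward(self, x):
        return self.shuffle(self.conv(x))
```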

Fig. 2. The detailed structure of the downsampling block and corresponding upsampling block.

2.3 Design of loss function

We use a combination of the negative Pearson correlation coefficient (NPCC), perceptual loss [33], and a total variation (TV) regularizer as the loss function to train the DCPNet. The total loss function is as follows:

$$\mathcal{L}_{\textrm{total}}(\hat{I},I) = \alpha\,\mathcal{L}_{\textrm{NPCC}}(\hat{I},I) + \beta\sum_l \big\|\psi_l(\hat{I}) - \psi_l(I)\big\|_2 + \gamma\,\big\|\nabla\hat{\phi}\big\|_1$$
$$\mathcal{L}_{\textrm{NPCC}}(\hat{I},I) = -\,\frac{\sum_{i=1}^{n}(\hat{I}_i - \bar{\hat{I}})(I_i - \bar{I})}{\left[\sum_{i=1}^{n}(\hat{I}_i - \bar{\hat{I}})^2 \sum_{i=1}^{n}(I_i - \bar{I})^2\right]^{1/2}}$$

For clarity, in Eqs. (13) and (14) the explicit dependence of all functions on $x$ and $y$ has been omitted. $\hat{I}$ is the reconstructed amplitude, and $I$ is the input initial amplitude, i.e., the ground truth. $\psi_l$ is the output of the $l$-th layer of the pre-trained VGG-19 network, and $\hat{\phi}$ is the predicted POH. $\alpha$ and $\beta$ are weighting coefficients, and $\gamma$ is the corresponding penalty coefficient. We employ the NPCC and perceptual loss to improve the quality of the reconstructed images, bringing them closer to a realistic representation with enhanced perceptual appeal, while the TV regularizer constrains the phase variation of the hologram, resulting in a smoother reconstructed phase distribution. During training, $l$, $\alpha$, $\beta$, and $\gamma$ are set to 5, 0.5, 0.5, and 0.001, respectively.
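A compact sketch of the loss of Eqs. (13) and (14) is given below; `vgg_features` stands for a feature extractor built on the pre-trained VGG-19 (layer 5) and, like the anisotropic form of the TV term, is an assumption for illustration:

```python
import torch

def npcc(recon, target):
    # Eq. (14): negative Pearson correlation coefficient.
    r, t = recon - recon.mean(), target - target.mean()
    return -(r * t).sum() / torch.sqrt((r ** 2).sum() * (t ** 2).sum())

def tv_l1(phase):
    # ||grad phi||_1 on the predicted POH (anisotropic finite differences).
    return ((phase[..., :, 1:] - phase[..., :, :-1]).abs().sum()
            + (phase[..., 1:, :] - phase[..., :-1, :]).abs().sum())

def total_loss(recon, target, poh, vgg_features,
               alpha=0.5, beta=0.5, gamma=0.001):
    # Eq. (13): weighted sum of NPCC, perceptual, and TV terms.
    perceptual = (vgg_features(recon) - vgg_features(target)).norm(p=2)
    return alpha * npcc(recon, target) + beta * perceptual + gamma * tv_l1(poh)
```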

3. Simulation experiments

The DCPNet algorithm is implemented in Python 3.8 with PyTorch 2.0.0. The public super-resolution dataset DIV2K [34], which provides 800 training images and 100 validation images, was used to train the DCPNet. We trained the network with the Adam optimizer and an initial learning rate of 0.001. The training period was set to 100 epochs, although the loss value tended to stabilize after about 35 epochs. The laser wavelengths are 638 nm, 532 nm, and 430 nm, the SLM pixel interval is 8 µm, and the diffraction distance is 250 mm. All simulation experiments run on an Intel Xeon Gold 6230R CPU (2.10 GHz) with 128 GB RAM and an NVIDIA RTX 3090 GPU with 24 GB of memory.
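Under these settings the unsupervised training loop is conceptually simple; the sketch below assumes a `dcpnet` module bundling CNN1-CNN3 with the checkerboard fusion, a `train_loader` yielding DIV2K amplitudes, and the `propagate_asm`, `total_loss`, and `vgg_features` helpers sketched in Section 2 (all hypothetical names):

```python
import torch

optimizer = torch.optim.Adam(dcpnet.parameters(), lr=1e-3)
for epoch in range(100):                # loss roughly stabilizes after ~35 epochs
    for amp in train_loader:            # target amplitudes, e.g. 1920 x 1072
        amp = amp.cuda()
        poh = dcpnet(amp)               # predicted POH, Eq. (10)
        recon = propagate_asm(torch.exp(1j * poh),
                              -0.25, 532e-9, 8e-6, 8e-6).abs()  # Eq. (11)
        loss = total_loss(recon, amp, poh, vgg_features)        # Eq. (13)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```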

3.1 Simulation results

Figure 3 compares numerical reconstructions of POHs generated with various algorithms. We compare the reconstructed images of the DCPNet with those of the GS, DPAC, and HoloNet methods, which represent iterative, non-iterative, and learning-based methods, respectively, using the peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) as evaluation indexes. The test images were taken from the DIV2K validation dataset and resized to 1920 × 1072 pixels. The iteration number of the GS algorithm is set to 100. The zoomed-in views of the reconstructed images in Fig. 3 are magnified by a factor of 4.

Fig. 3. Comparison of numerically reconstructed images of POHs generated by several algorithms. (a) and (b) are from the DIV2K validation dataset.

The simulation results show that the GS algorithm can obtain high-quality reconstructed images with 100 iterations, but it is time-consuming; moreover, its optical reconstructions are susceptible to speckle noise contamination due to the randomness and uncertainty of the initial phase. As a result, it fails to meet the requirements of real-time holographic display. The DPAC method exhibits the highest computational efficiency, but its reconstructed images suffer from significant artifacts and checkerboard-like noise, which can be attributed to the checkerboard-pattern sampling. The HoloNet method yields smoother images, but distinct streak-like noise is observed in its reconstructions. With the proposed method, there are no noticeable artifacts or noise in the reconstructed images. Furthermore, the proposed method outperforms the DPAC and HoloNet methods in terms of PSNR and SSIM on the above test images, indicating superior imaging quality. It should be noted that although the proposed method also employs the checkerboard-pattern sampling strategy, continuous optimization during the training period can significantly suppress the checkerboard-like noise by searching for the optimal phase distribution.

The quantitative relationship between the runtime for generating POHs with several algorithms and the resulting imaging quality is illustrated in Fig. 4. The 100 images from the DIV2K validation dataset are used as test images for calculating the average runtime and average PSNR. Although the DPAC has the fastest computation time among all approaches, it achieves only medium imaging quality. For the iterative GS algorithm, the reconstruction quality improves with the number of iterations, but this is time-consuming; even after about 10 seconds, the GS does not reach quality comparable to that of the DCPNet. At the same level of runtime, the reconstruction quality of the DCPNet is about 2.24 dB higher than that of the HoloNet. The quantitative results highlight that the DCPNet achieves both a fast runtime of about 28 frames per second for 1080p images and a high image quality of about 31.31 dB PSNR, indicating that it can effectively balance computation time and imaging quality, making it applicable to real-time holographic displays. Note that all comparison experiments were performed under the same conditions (dataset, epoch number, diffraction model, device configuration, and so on).

Fig. 4. Quantitative relationship between the runtime for generating POHs and the imaging quality.

We also carried out simulation experiments to verify the applicability of the proposed method to 2D binary images. The target USAF-1951 chart with 1024 × 1024 pixels, zero-padded to 1920 × 1072 pixels, is used as the test image, i.e., the ground truth. The pre-processed test image is shown in Fig. 5(a). With the experimental conditions fixed, the reconstructed images of the compared algorithms are presented in Fig. 5(b)-(e), corresponding to the DPAC, GS, HoloNet, and DCPNet algorithms, respectively. Figure 5(f) shows the intensity distribution of pixels along the red solid line.

Fig. 5. The simulation results for the target image USAF-1951. (a) The test image; (b)-(e) the reconstructed intensity distributions of DPAC, GS, HoloNet, and our proposed method, respectively; (f) the intensity distribution of pixels along the red solid line in the reconstructed image.

It can be observed from Fig. 5(b) that the image reconstructed with the DPAC method contains significant speckle noise, which is caused by the spatial shifting noise introduced by the checkerboard-pattern encoding [35,36]. Although the GS algorithm improves the imaging quality compared to the DPAC method, its reconstructed intensity distribution also suffers from speckle noise, as depicted in Fig. 5(c); the main reason is that the GS algorithm adopts a random initial phase distribution, which leads to destructive interference between adjacent pixels. As shown in Fig. 5(d), HoloNet provides the worst imaging quality among the compared algorithms, and its reconstructed intensity distribution is seriously disturbed by artifact-like noise. Compared to the above three algorithms, the proposed DCPNet is better suited to binary images: it provides better imaging quality and efficiently suppresses both speckle noise and artifact-like noise, as shown in Fig. 5(e). The intensity distributions of the pixels along the same red solid line in the different reconstructed images are plotted in Fig. 5(f). As can be seen, compared to the DPAC and HoloNet, the DCPNet can easily distinguish narrower lines, which suggests that it has a higher spatial resolution in the reconstruction and a better ability to preserve fine details. The GS algorithm can also distinguish narrower lines with 100 iterations, but its running time is approximately three orders of magnitude longer than that of the DCPNet.

In order to quantitatively evaluate the reconstruction results of binary images, speckle contrast is introduced to assess the influence of speckle noise on the image quality [37]. The value of the speckle contrast C is given by:

$$C(g )= \frac{{{\sigma _g}}}{{{\mu _g}}}$$
where ${\sigma _g}$ represents the standard deviation of the reconstructed image and ${\mu _g}$ is its mean value. A smaller C value means less speckle noise and higher image quality. We evaluate the signal region marked by the yellow solid box and the background region marked by the blue solid box in Fig. 5 using this index; the results are shown in Table 1. As can be seen from Table 1, the GS algorithm exhibits severe speckle noise in both the signal and background regions, which can be attributed to its use of a random phase. Since the intensity distribution reconstructed with HoloNet is disturbed by artifact-like noise, it also shows significant noise in both regions. Compared to the DPAC, our proposed method has comparable signal-region noise but less background noise, indicating that its reconstruction achieves higher contrast and better visual quality, which is also supported by the optical experiment results in Section 4.

Table 1. Measurement of the results in Fig. 5 using the speckle contrast evaluation index.
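For reference, the speckle contrast of Eq. (15) takes only a few lines to evaluate; the box coordinates below are hypothetical placeholders for the marked regions in Fig. 5:

```python
import torch

def speckle_contrast(region):
    # Eq. (15): C = sigma / mu over a selected region; smaller C = less speckle.
    return (region.std() / region.mean()).item()

# C_signal = speckle_contrast(recon[y0:y1, x0:x1])      # yellow box in Fig. 5
# C_background = speckle_contrast(recon[y2:y3, x2:x3])  # blue box in Fig. 5
```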

3.2 Ablation study

Most existing learning-based CGH algorithms compose the target complex amplitude from the phase learned by an initial phase generator (IPG) and the input initial amplitude, and they encode the complex-valued wave field in the SLM plane into a two-channel spatial-domain image to facilitate mapping onto real-valued kernels (single-channel mapping). In our proposed method, we employ the complex amplitude generator (CAG) to directly generate the target complex amplitude, while encoding the complex-valued wave field in the SLM plane into two real-valued phase elements to facilitate mapping onto real-valued kernels (dual-channel mapping). We conducted ablation studies to test the influence of the two target complex amplitude generation methods and the two mapping methods on the performance of our network.

We utilized all 100 images from the DIV2K validation dataset as test images to calculate the average PSNR and average SSIM. The results of the ablation study are presented in Table 2. The table shows that the average PSNR and SSIM of the images reconstructed by our proposed DCPNet are 31.31 dB and 0.86, which are 2.16 dB and 0.03 higher than those obtained with the CAG and single-channel mapping. The average PSNR and SSIM of the images reconstructed using the IPG with dual-channel mapping are 27.89 dB and 0.79, which are 1.18 dB and 0.02 higher than those using the IPG with single-channel mapping. These results indicate that dual-channel mapping consistently improves reconstruction quality and that the complex amplitude generator, which directly generates the target complex amplitude, is more suitable for our network.

Table 2. Ablation studies on the target complex amplitude and mapping method.

3.3 Generalization capability

To verify the generalization capability of the proposed DCPNet algorithm, we randomly selected test images from the DIV2K validation dataset and the Big Buck Bunny video. We carried out reconstruction experiments for both grayscale and color holograms, and the reconstructed results are shown in Fig. 6.

Fig. 6. Numerical reconstructions from POHs. The color POH is combined from the R, G, and B channel POHs. (a) comes from www.bigbuckbunny.org (© 2008, Blender Foundation) under the Creative Commons Attribution 3.0 license (https://creativecommons.org/licenses/by/3.0/); (b) comes from the DIV2K validation dataset [34].

As seen in the middle column of Fig. 6, both the grayscale and color holograms are smooth. Because the TV regularizer constrains phase variations during training, the influence of speckle noise on the optically reconstructed images is suppressed to a certain extent. Additionally, both the grayscale and color holograms contain clear outlines of the test images and record much of their information. The reconstruction results for the grayscale images are shown in the top portion of Fig. 6, and the detailed zoomed-in views of the two sets of images are magnified 4×. The reconstructed images are extremely close to the target test images in outline, brightness, and contrast. Moreover, the details of both test images are well reconstructed, such as the eyes of the big buck bunny and the grille of the car.

In practice, the DCPNet algorithm can be generalized to full-color holographic displays. As described in Section 2.1, the wavelength in the proposed DCPNet is a hyperparameter, and when calculating holograms, the R, G, and B channel images are treated uniformly. In our simulation experiments, the DCPNet algorithm successfully generates high-fidelity POHs for all three color channels, which are then synthesized into a color hologram, as shown in the bottom portion of Fig. 6. The reconstructed images of the color holograms are strongly similar to the target color images and achieve excellent PSNR and SSIM scores. Notably, the eyes of the big buck bunny and the grille of the car are also well reconstructed, which implies that the DCPNet algorithm has a good capacity to preserve details in color holographic displays.

4. Optical experiments

In this section, we perform optical experiments to further prove the feasibility of the proposed method. Our holographic display system is depicted in Fig. 7. A laser beam with a wavelength of 532 nm is expanded and collimated to illuminate a reflection-type phase-only SLM (Holoeye Pluto, 1920 × 1080 resolution, 8 µm pixel interval). The POHs generated by all algorithms, with a resolution of 1920 × 1072, are zero-padded to 1920 × 1080 and then loaded onto the SLM. The light waves modulated and reflected by the SLM are transmitted to a 4f system comprising two Fourier lenses and a filter, which eliminates the zero-order beam introduced by the SLM and the higher-order diffractions of the holographic reconstruction. The holographic reconstruction images are captured by a CCD (Lumenera INFINITY 4-11C).

Fig. 7. Optical setup of the holographic display system. LP: linear polarizer; BS: beam splitter; SLM: reflection-type phase-only spatial light modulator; FL1 and FL2: Fourier transform lenses.

The reconstructed images of the holographic display are shown in Fig. 8, where we compare the DPAC, HoloNet, and DCPNet algorithms. The zoomed-in views of the reconstructed images in Fig. 8 are magnified 3×. It is important to acknowledge that factors such as CCD current noise, ambient light, the sub-100% fill factor of the SLM, and practical manual manipulation can negatively influence the experimental results. Nevertheless, despite these potential influences, the simulation and optical experiment results are largely consistent.

Fig. 8. The optical reconstruction results using DPAC (first column), HoloNet (second column), and our proposed method (last column). “Castle” and “Butterfly” are from the DIV2K validation dataset. The binary pattern is a USAF-1951 test chart.

As can be seen from Fig. 8, the optical reconstructions with the DPAC algorithm show distinct speckle noise and unevenness, particularly noticeable in the butterfly wings and the castle chimney. Both HoloNet and the proposed DCPNet effectively suppress speckle noise, and the intensity distributions of their reconstructions are smooth. However, HoloNet loses several details and feature information, resulting in blurred reconstructed images; the zoomed-in view of the reconstructed castle shows that its outline and details are indistinguishable. In contrast, the proposed DCPNet exhibits sharper outlines and more details with less blurring than the DPAC and HoloNet methods. These results provide evidence for the effectiveness of our proposed method in suppressing speckle noise and enhancing reconstruction quality. To further validate the effectiveness of the DCPNet, we also performed optical experiments with binary patterns. As can be seen from the last two rows of Fig. 8, the reconstructed images of the proposed DCPNet have higher visual quality and better contrast. Compared to the other two methods, our method can distinguish the narrower lines shown in the zoomed-in views, which suggests higher spatial resolution and a greater capability to retain minute details in the reconstruction. All the experimental results illustrate the effectiveness and feasibility of the proposed DCPNet method.

5. Conclusion

In this paper, we propose a novel learning-based CGH method that adopts two parallel real-valued phase optimization networks to encode the complex-valued wave field in the SLM plane, instead of simply concatenating the amplitude and phase along the channel dimension. This complex amplitude encoding follows the computational characteristics of complex amplitude, rather than treating the wave field solely as an image, as in traditional learning-based CGH algorithms. We have described the principle and mathematical model of the DCPNet in detail. Both grayscale and binary images were used in the simulations, and the experimental results demonstrate that the DCPNet generates high-fidelity 2K-resolution POHs in 36 ms with an average PSNR of 31.31 dB and SSIM of 0.86, proving that our method is effective in improving imaging quality and enhancing spatial resolution. In the optical experiments, the proposed method effectively suppresses speckle noise and improves uniformity; especially for binary images, the reconstructed images of the DCPNet preserve richer details than those of the other two methods. These results show that the DCPNet is sound and efficient, and that it provides a new way to design desired holograms.

Funding

Basic research (JCKY2021602B021); National Natural Science Foundation of China (61975014, 62035003, U22A2079); Beijing Municipal Science & Technology Commission, Administrative Commission of Zhongguancun Science Park (Z211100004821012).

Acknowledgments

We wish to thank Professor Juan Liu from Beijing Engineering Research Center of Mixed Reality and Advanced Display for providing optical devices and Master Jie Wang for providing experiment guidance.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. A. Maimone, A. Georgiou, and J. S. Kollin, “Holographic near-eye displays for virtual and augmented reality,” ACM Trans. Graph. 36(4), 1–16 (2017).

2. C. Chang, K. Bang, G. Wetzstein, et al., “Toward the next-generation VR/AR optics: a review of holographic near-eye displays from a human-centric perspective,” Optica 7(11), 1563–1578 (2020).

3. G. Situ, “Deep holography,” Light: Adv. Manuf. 3(2), 1–300 (2022).

4. Y. Z. Liu, J. W. Dong, Y. Y. Pu, et al., “High-speed full analytical holographic computations for true-life scenes,” Opt. Express 18(4), 3345–3351 (2010).

5. H. Dammann and K. Görtler, “High-efficiency in-line multiple imaging by means of multiple phase holograms,” Opt. Commun. 3(5), 312–315 (1971).

6. R. W. Gerchberg and W. O. Saxton, “A practical algorithm for the determination of phase from image and diffraction plane pictures,” Optik 35(2), 237–246 (1972).

7. P. Chakravarthula, Y. Peng, J. Kollin, et al., “Wirtinger holography for near-eye displays,” ACM Trans. Graph. 38(6), 1–13 (2019).

8. Y. Peng, S. Choi, N. Padmanaban, et al., “Neural holography with camera-in-the-loop training,” ACM Trans. Graph. 39(6), 1–14 (2020).

9. Z. Wang, T. Chen, Q. Chen, et al., “Reducing crosstalk of a multi-plane holographic display by the time-multiplexing stochastic gradient descent,” Opt. Express 31(5), 7413–7424 (2023).

10. S. So, J. Kim, T. Badloe, et al., “Multicolor and 3D holography generated by inverse-designed single-cell metasurfaces,” Adv. Mater. 35(17), 2208520 (2023).

11. O. Mendoza-Yero, G. Mínguez-Vega, and J. Lancis, “Encoding complex fields by using a phase-only optical element,” Opt. Lett. 39(7), 1740–1743 (2014).

12. O. Mendoza-Yero, M. Carbonell-Leal, G. Mínguez-Vega, et al., “Generation of multifocal irradiance patterns by using complex Fresnel holograms,” Opt. Lett. 43(5), 1167–1170 (2018).

13. Y. Qi, C. Chang, and J. Xia, “Speckleless holographic display by complex modulation based on double-phase method,” Opt. Express 24(26), 30368–30378 (2016).

14. P. W. M. Tsang and T. C. Poon, “Novel method for converting digital Fresnel hologram to phase-only hologram based on bidirectional error diffusion,” Opt. Express 21(20), 23680–23686 (2013).

15. P. W. M. Tsang, A. S. M. Jiao, and T. C. Poon, “Fast conversion of digital Fresnel hologram to phase-only hologram based on localized error diffusion and redistribution,” Opt. Express 22(5), 5060–5066 (2014).

16. P. W. M. Tsang, Y. T. Chow, and T. C. Poon, “Generation of edge-preserved noise-added phase-only hologram,” Chin. Opt. Lett. 14(10), 100901 (2016).

17. A. Khan, Z. Zhijiang, Y. Yu, et al., “GAN-Holo: generative adversarial networks-based generated holography using deep learning,” Complexity 2021, 1–7 (2021).

18. J. Lee, J. Jeong, J. Cho, et al., “Deep neural network for multi-depth hologram generation and its training strategy,” Opt. Express 28(18), 27137–27154 (2020).

19. L. Shi, B. Li, C. Kim, et al., “Towards real-time photorealistic 3D holography with deep neural networks,” Nature 591(7849), 234–239 (2021).

20. J. Wu, K. Liu, X. Sui, et al., “High-speed computer-generated holography using an autoencoder-based deep neural network,” Opt. Lett. 46(12), 2908–2911 (2021).

21. T. Yu, S. Zhang, W. Chen, et al., “Phase dual-resolution networks for a computer-generated hologram,” Opt. Express 30(2), 2378–2389 (2022).

22. X. Sun, X. Mu, C. Xu, et al., “Dual-task convolutional neural network based on the combination of the U-Net and a diffraction propagation model for phase hologram design with suppressed speckle noise,” Opt. Express 30(2), 2646–2658 (2022).

23. Z. Dong, C. Xu, Y. Ling, et al., “Fourier-inspired neural module for real-time and high-fidelity computer-generated holography,” Opt. Lett. 48(3), 759–762 (2023).

24. H. Zheng, J. Peng, Z. Wang, et al., “Diffraction model-driven neural network trained using hybrid domain loss for real-time and high-quality computer-generated holography,” Opt. Express 31(12), 19931–19944 (2023).

25. C. Trabelsi, O. Bilaniuk, Y. Zhang, et al., “Deep complex networks,” arXiv:1705.09792v4 (2018).

26. C. Zhong, X. Sang, B. Yan, et al., “Real-time high-quality computer-generated hologram using complex-valued convolutional neural network,” IEEE Transactions on Visualization and Computer Graphics.

27. D. Pi, J. Liu, and S. Yu, “Speckleless color dynamic three-dimensional holographic display based on complex amplitude modulation,” Appl. Opt. 60(25), 7844–7848 (2021).

28. D. Pi, J. Wang, J. Liu, et al., “Color dynamic holographic display based on complex amplitude modulation with bandwidth constraint strategy,” Opt. Lett. 47(17), 4379–4382 (2022).

29. C. K. Hsueh and A. A. Sawchuk, “Computer-generated double-phase holograms,” Appl. Opt. 17(24), 3874–3883 (1978).

30. O. Ronneberger, P. Fischer, and T. Brox, “U-Net: convolutional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention (MICCAI) (2015), pp. 234–241.

31. K. He, X. Zhang, S. Ren, et al., “Identity mappings in deep residual networks,” in European Conference on Computer Vision (ECCV) (2016), pp. 630–645.

32. V. Dumoulin, J. Shlens, and M. Kudlur, “A learned representation for artistic style,” arXiv:1610.07629v5 (2017).

33. J. Johnson, A. Alahi, and F. F. Li, “Perceptual losses for real-time style transfer and super-resolution,” in European Conference on Computer Vision (ECCV) (2016), pp. 694–711.

34. E. Agustsson and R. Timofte, “NTIRE 2017 challenge on single image super-resolution: dataset and study,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2017), pp. 126–135.

35. X. Sui, Z. He, G. Jin, et al., “Band-limited double-phase method for enhancing image sharpness in complex modulated computer-generated holograms,” Opt. Express 29(2), 2597–2612 (2021).

36. X. Sui, Z. He, G. Jin, et al., “Spectral-envelope modulated double-phase method for computer-generated holography,” Opt. Express 30(17), 30552–30563 (2022).

37. J. G. Manni and J. W. Goodman, “Versatile method for achieving 1% speckle contrast in large-venue laser projection displays using a stationary multimode optical fiber,” Opt. Express 20(10), 11288–11315 (2012).
