
Deep compression network for enhancing numerical reconstruction quality of full-complex holograms

Open Access

Abstract

The field of digital holography has developed significantly in recent decades; however, the commercialization of digital holograms is still hindered by their large data sizes. Owing to the complex, interferometric signal characteristics of digital holograms, traditional codecs cannot provide satisfactory coding efficiency. Furthermore, in a typical coding scenario, the hologram is encoded and then decoded, leading to a numerical reconstruction via a light wave propagation model. While previous research has mainly focused on the quality of the decoded hologram, it is the numerical reconstruction that is visible to the viewer, and thus its quality must also be taken into consideration when designing a coding solution. In this study, the coding performances of the existing compression standards JPEG2000 and HEVC-Intra are evaluated on a set of digital holograms, and the limitations of these standards are analyzed. Subsequently, we propose a deep learning-based compression network for full-complex holograms that demonstrates superior coding performance compared to the latest standard codecs, VVC and JPEG-XL, in addition to JPEG2000 and HEVC. The proposed network incorporates not only the quality of the decoded hologram but also the quality of the numerical reconstruction as distortion costs for network training. The experimental results validate that the proposed network provides superior objective coding efficiency and better visual quality than the existing methods.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Holography has been regarded as the ultimate 3D representation technology because it can reproduce the light field waves of real objects, thereby avoiding several problems of other three-dimensional displays that rely on human factors such as binocular parallax and motion parallax, including the vergence-accommodation conflict (VAC) [1]. A hologram is a medium used to record the interference patterns between an object beam and a reference beam. In analog holography, it is made of a special material, such as silver halide; however, film-type holograms are known to have many practical limitations. For example, the size of the virtual object is usually limited, and sensor or speckle noise may be present. In addition, the hologram cannot be reused once the patterns are written. To overcome these problems of the analog hologram, the electro-hologram (also known as a holographic display) uses a special type of display panel, called a spatial light modulator (SLM), which modulates the wavefront electrically [2]. For decades, many studies have been actively conducted on such digital holograms, and they are largely divided into digital hologram generation (or rendering) and compression. Recently, the field of digital hologram generation has been mainly concerned with computer-generated holograms (CGHs) [3]. In contrast with optically captured holograms (OCHs), which are acquired from real objects in a well-organized experimental set-up, CGHs can be obtained using a computational method that simulates virtual objects [4].

In the rendering process, virtual objects reconstructed (diffracted) from a hologram can be output through holographic displays [5]. It is then important to secure an appropriate viewing angle (or field of view, FOV) for the rendered objects. As the viewing angle is inversely proportional to the size of a unit pixel (or pixel pitch) of a holographic display, a very small pixel pitch is required for a wide viewing angle, resulting in a hologram with a large data size. In addition to ensuring the viewing angle, providing a visually pleasing experience to the viewer requires high resolution, high bit-per-pixel (bpp) precision, and a deep depth range, which further increases the size of hologram data significantly, adding to the burden of storage and transmission. Thus, it is essential to develop an efficient compression method for digital holograms.

For decades, many studies have been performed on hologram compression with different approaches. First, compressive sensing-based methods using a sparse matrix representation were proposed in [6,7], which claim better quality of the reconstructed image. Next, scalar and vector quantization methods for digital hologram compression were studied in [8]. Novel transform schemes for hologram coding have also been proposed. Specifically, Blinder et al. proposed fully arbitrary wavelet decomposition styles and directional wavelet transforms that consider local geometries to improve the coding performance for off-axis holography [9], and they also proposed unitary transforms with time-frequency warping for modeling deep holographic scenes [10]. Wavelet compression of the amplitude/phase and real/imaginary parts of the Fourier spectrum of filtered off-axis digital holograms is compared in [11]. The authors in [12] proposed a hybrid HEVC-Wavelet compression algorithm, which adopts an asymmetric prediction scheme to overcome the uncorrelated form of HEVC compression error. In [13], the authors showed that the coding performance could be improved by speckle noise reduction for both CGHs and experimental holograms. For binary hologram compression, a lossless coding technique using a context-based Bayesian tree model was presented in [14]. As for other approaches, the vector lifting scheme [15], wave atom transforms [16], and matching pursuit-based view-dependent scalable coding [17] have been proposed. Moreover, several methods were specifically proposed to achieve better compression ratios for phase holograms, including alternative representation formats [18] and phase-difference-based compression [19].

Several studies have also been conducted to compress holograms using existing standard codecs, and comprehensive benchmark results using JPEG [20], JPEG2000 [21], H.264/AVC-Intra [22], and H.265/HEVC-Intra [23] are presented in [24–26]. Note that several hologram representation formats exist, such as the interferogram, real and imaginary, amplitude and phase, and phase-only hologram (POH). In [24], it was found that coding efficiency is highly dependent on the specific type of hologram format. Concretely, it has been revealed that the amplitude component is relatively easy to encode, thereby yielding the best coding performance. Next, real and imaginary components exhibit modest coding performance very similar to each other, while the phase-only format is the most difficult to encode owing to the increased randomness of fringe patterns. In [25], the authors provided an experimental analysis of the performance of the existing codecs, taking into account several holographic images reconstructed from different perspectives and distances. They claim that there are no significant fluctuations in image quality for a fixed compression level.

Among the standard codecs, HEVC-Intra, AVC/H.264-Intra, JPEG2000, and JPEG show better coding efficiency in that order, but none of them provides satisfactory coding gain. This is mainly because standard codecs are optimized for real photographs rather than for holograms, whose interferometric nature yields significantly different signal characteristics. Specifically, in a photograph, low-frequency components occupy most of the scene energy and vertical/horizontal structures dominate, whereas holograms contain a significant number of high-frequency components with random structural directions. Thus, when the conventional transforms used in standard codecs, such as the discrete cosine/sine transforms (DCT/DST) or the wavelet transform, are applied to holograms, energy compaction does not work well, and the coding efficiency is reduced.

Recently, various studies applying deep learning to digital holography have been conducted, as summarized in [27]. Although various tasks, including fast numerical reconstruction, hologram image enhancement (e.g., noise removal), depth prediction, and autofocusing, have been explored, few deep learning-based hologram compression studies have been performed. In this work, to overcome the limitations of existing coding solutions, we propose a deep compression network for full-complex holograms, comprising a pair of real and imaginary images, that provides superior coding performance and better perceptual visual quality. The contributions of our work are summarized as follows:

  • We propose a deep network for efficiently encoding full-complex holograms, which is trainable in an end-to-end manner and can be optimized on a rate-distortion criterion. Specifically, the actual rates are estimated in the training phase using the entropy of the latent space, which is assumed to follow a Gaussian distribution. For the distortion term, errors in both the encoded hologram plane and the rendered numerical reconstruction plane are taken into account. It is worth mentioning that most previous coding methods mainly considered the reconstruction quality of the decoded holograms; for those methods, the corresponding numerical reconstruction may not be well restored even in a low-compression scenario, because the relationships between the distortion in the hologram and the distortion perceived on the holographic image are still not well known [28]. To reflect the distortion in the numerical reconstruction that is actually displayed to the viewer, we designed the object distortion term based on a theoretical propagation model of light wavefronts. As a result, the distortion loss for training the network is a weighted sum of the two distortions in the hologram and object planes.
  • For performance evaluation and comparison, extensive experiments were conducted. To this end, we defined two evaluation domains: the hologram plane, where compression is conducted, and the numerical reconstruction (hereafter referred to as NR) plane, where the rendered objects are displayed. Experimental results show that the proposed network significantly outperforms the latest video coding standards HEVC/VVC by 23.70%/23.04% and 23.67%/29.59% in the hologram and NR planes, respectively, in terms of BD-rate gain for the (P)SNR metric.
The rest of the paper is organized as follows. We introduce a set of full-complex holograms provided by JPEG-Pleno in Section 2, and the compression pipeline and evaluation framework are described in Section 3. Next, the benchmark coding performance of the existing standard codecs is analyzed in Section 4. Thereafter, the proposed deep compression network is detailed in Section 5, and experimental results and their discussions are provided in Section 6. Finally, concluding remarks are given in Section 7.

2. Full-complex hologram image database

The international standardization of hologram image compression technology, JPEG Pleno, is currently underway; the specific scope and technology validation procedures are described in detail in [29,30]. The JPEG standardization committee has collected databases from several institutions, which cover a wide range of use cases and characteristics, to validate the development of the standard [31]. It is noteworthy that these databases are heterogeneous in many aspects. First, there are different types of holograms, such as CGHs versus OCHs or holograms of shallow and deep scene depths, each type exhibiting different spatial-frequency behavior and statistics. Moreover, for representing complex wavefronts, different formats can be used, such as polar form (amplitude and phase), Cartesian form (real and imaginary), or phase-only representations. Note that the JPEG Pleno database comprises full-complex holograms in real/imaginary format. Additionally, these holograms are either colored or grayscale and have a dynamic range spanning from binary to floating-point precision.

In this study, we selected three holograms from the JPEG Pleno database as a test set to evaluate and compare the performance of the proposed network and anchor codecs, as summarized in Table 1. Specifically, the selected holograms are color CGHs with floating-point data produced by B-com. Here, Dices16K and Piano16K have a resolution of 16384 x 16384, whereas that of DeepDices2K is 2048 x 2048. The holograms and their corresponding NR images at the three recommended reconstruction distances are shown in Fig. 1.


Fig. 1. Three test holograms and their corresponding numerical reconstructions (NRs) at three representative reconstruction distances. The first and second columns show the real and imaginary parts (cropped and magnified 400 times), respectively, of DeepDices2K, Dices16K, and Piano16K, in that order. The third to fifth columns show the corresponding NRs, where the reconstruction distances (in mm) from left to right are $\{86.7, 166, 246\}$ for DeepDices2K, $\{6.57, 10, 13.1\}$ for Dices16K, and $\{6.8, 10, 12.5\}$ for Piano16K.


Table 1. Details of the tested full-complex holograms.

The signal characteristics of holograms are very different from those of real photographs [32]. To confirm this, we analyzed histograms of natural images and of the test holograms. To this end, three natural images with different scene characteristics were selected from the Kodak image dataset [33]. As shown in Fig. 2, the histograms of the natural images differ substantially among images. In contrast, those of the test holograms look very similar to each other, which means that the holograms have globally similar statistics regardless of what their NRs look like. This can be verified by the Pearson correlation coefficient (PCC) defined in Eq. (1), where $x_i, y_i$ and $\mu _x, \mu _y$ are the values and means of the x- and y-variables, respectively.

$$r_{pcc} =\frac{\sum(x_i-\mu_x)(y_i-\mu_y)}{\sqrt{\sum(x_i-\mu_x)^2\sum(y_i-\mu_y)^2}}$$

The PCC values range from 0.94 to 0.99 for any pair of the six histograms (i.e., three real and three imaginary holograms), and from 0.36 to 0.79 for the natural images, as presented in Table 2. To further analyze the intra-correlation (within the same hologram image) and inter-correlation (across heterogeneous hologram images) of holograms, we calculated the self and cross mean normalized correlation (MNC) indices using the template matching method. Concretely, a 32 x 32 patch from the central area of the hologram was cropped as a template. The index was then calculated as follows.

$$r_{mnc} =\frac{\sum_{r,c}(I(r,c)-\bar{I}_{u,v})(t(r-u,c-v)-\mu_t)}{\sqrt{\sum_{r,c}(I(r,c)-\bar{I}_{u,v})^2\sum_{r,c}(t(r-u,c-v)-\mu_t)^2}},$$
where $I(r,c)$ and $t$ are a hologram and a template, respectively, with indices $r\in \{0,1,\ldots,height-1\}$, $c\in \{0,1,\ldots,width-1\}$, and $u,v\in \{0,1,\ldots,31\}$. Further, $\mu _t$ and $\bar {I}_{u,v}$ are the mean values of the template and of the hologram area under the template, respectively. According to the results in Table 2, the holograms have much smaller self-MNC values, ranging from 0.024 to 0.030, than the natural images. This means that holograms generally have low intra-correlation and a large proportion of high-frequency components and randomness. Moreover, unlike the surface plot of self-MNC for the natural image (Fig. 3(a)), the hologram exhibits correlation only at the same location as the template, whereas self-MNCs are almost zero in other regions (Fig. 3(b)). Next, in terms of cross-MNC, we calculated the index between real and imaginary parts and between low (2K) and high (16K) resolutions, and found that these values are also much smaller than those of the natural images. This implies a low inter-correlation for any pair of heterogeneous holograms.
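For illustration, the MNC index above can be computed with a few lines of NumPy. The sketch below is a minimal, unoptimized version (not the implementation used for Table 2), and the random array merely stands in for a real or imaginary hologram channel; the PCC of Eq. (1) corresponds to np.corrcoef.

```python
# Minimal sketch of the MNC index via template matching (NumPy only).
import numpy as np

def mnc_map(image, template):
    """Mean normalized correlation of `template` at every (u, v) position of `image`."""
    th, tw = template.shape
    t_zero = template - template.mean()
    t_norm = np.sqrt((t_zero ** 2).sum())
    out = np.zeros((image.shape[0] - th + 1, image.shape[1] - tw + 1))
    for u in range(out.shape[0]):
        for v in range(out.shape[1]):
            win = image[u:u + th, v:v + tw]
            w_zero = win - win.mean()          # subtract the mean under the template
            denom = np.sqrt((w_zero ** 2).sum()) * t_norm
            out[u, v] = (w_zero * t_zero).sum() / denom if denom > 0 else 0.0
    return out

holo = np.random.randn(256, 256)                    # stand-in for a hologram channel
cy, cx = holo.shape[0] // 2, holo.shape[1] // 2
template = holo[cy - 16:cy + 16, cx - 16:cx + 16]   # 32 x 32 central template
self_mnc = mnc_map(holo, template)
# For holograms, the map peaks only at the template's own location (cf. Fig. 3(b)).
```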


Fig. 2. (a) Histograms of the natural images; (b) histograms of the full-complex holograms.


Fig. 3. Surface plots of mean normalized correlation for (a) natural image (Kodim17), and (b) full-complex hologram (the real-valued image of DeepDices2K).


Table 2. Statistical analysis of the full-complex holograms and natural images.

3. Evaluation framework

JPEG Pleno released the common test conditions (CTC) document [34], which describes the test pipeline for hologram coding and objective/subjective visual quality assessment, as shown in Fig. 4. Overall, compression can be conducted in two different ways. In the first, the hologram itself is encoded, and the decoded hologram is then rendered to the object plane using the Numerical Reconstruction Software for Holography (NRSH) [35]. In the second, optional coding method, the input hologram is first propagated to the object plane at a certain reconstruction distance, and the encoding is then performed. The latter method seeks to exploit the better spatial correlation in the object plane, as in a general image. However, if the compressed NR is backpropagated to the hologram plane and then re-rendered to the NR using different reconstruction distances, undesirable visual artifacts can occur, especially for holograms whose focus changes with depth [36]. Therefore, we adopted compression in the hologram plane for the experiments in this work. Next, because the standard codecs only operate on non-negative real values with integer precision, a preprocessor splits the full-complex holograms into real and imaginary parts and maps the floating-point data to a 16-bit integer representation using a uniform mid-rise quantizer (MRQ) before encoding. The NRSH supports the NR of holograms from several institutions, such as b-com, Interfere, EmergImg, and WUT, and it supports three different propagation methods depending on the type of input hologram, namely the angular spectrum method (ASM), Fresnel, and Fraunhofer propagation. The three test holograms selected in this work use the ASM.
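For illustration, the preprocessing step that maps a floating-point hologram channel to 16-bit integers can be sketched as follows; the use of the per-channel minimum/maximum as the clipping range is an assumption here, and the CTC preprocessor may differ in detail.

```python
# Minimal sketch of uniform mid-rise quantization (MRQ) to 16-bit integers.
import numpy as np

def midrise_quantize(x, bits=16):
    lo, hi = float(x.min()), float(x.max())    # assumed clipping range
    levels = 2 ** bits
    step = (hi - lo) / levels
    q = np.clip(np.floor((x - lo) / step), 0, levels - 1).astype(np.uint16)
    return q, lo, step

def midrise_dequantize(q, lo, step):
    # Mid-rise: reconstruction points sit at the centers of the quantization cells.
    return lo + (q.astype(np.float64) + 0.5) * step

real_part = np.random.randn(2048, 2048).astype(np.float32)  # stand-in for a real channel
q, lo, step = midrise_quantize(real_part)
rec = midrise_dequantize(q, lo, step)
print(np.abs(real_part - rec).max() <= step / 2 + 1e-9)     # error bounded by step/2
```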

For an objective performance evaluation, various image quality metrics are used to assess the reconstruction quality. For the decoded hologram, SNR and SSIM [37] are used, whereas PSNR, SSIM, and visual information fidelity in the pixel domain (VIFP) [38] are used for the propagated NR. Here, SNR and PSNR are based on pixel-wise errors, whereas SSIM and VIFP are perceptual metrics that consider factors of human visual perception. A subjective image quality assessment and related discussions are detailed in Section 6.4.


Fig. 4. Test pipeline for hologram compression method and visual quality assessment [34].


4. Performance analysis of legacy codecs for full-complex holograms

In Section 2, it was shown that holograms exhibit low intra- and inter-correlations. Given that legacy codecs were designed to exploit the intra/inter correlation of source content, it is important to evaluate their coding performance on digital holographic data. To this end, we conducted encoding experiments with HEVC-Intra and JPEG2000 on selected full-complex holograms from the JPEG Pleno database; the details are discussed in this section.

We used HM16.20 and Kakadu 8.0.5 [40] as the reference software for HEVC-Intra and JPEG2000, respectively. To analyze various aspects, seven holograms with different generation methods, resolutions, and numbers of channels were selected and classified into five groups—G1: Astronaut (OCH-grayscale, 2588x2588); G2: DeepChess (CGH-grayscale, 2048x16384); G3: DeepCornellBox16K (CGH-grayscale, 16384x16384); G4: DeepDices2K (CGH-color, 2048x2048); and G5: Biplane16K / Dices16K / Piano16K (CGH-color, 16384x16384)—as summarized in Table 3. Following the two pipelines in Fig. 4, encoding was performed in both the hologram and object planes, and both planes were evaluated using the aforementioned quality metrics. Following the recommendation of [34], the target encoding bitrates were set to {0.1, 0.25, 0.5, 1, 2, 4} bpp for monochrome holograms and {0.3, 0.75, 1.5, 3, 6, 12} bpp for color holograms. For example, Fig. 5 shows the four NRs rendered from the original hologram and from three decoded holograms with different bitrates and visual qualities. The RD-curves of the encoding results are shown in Figs. 6–10, where the x-axis represents bpp ($=\frac {\text {Total}\,{\# }\,\text {of bits}}{\text {Total} \,{\# }\,\text {of pixels}}$) and the y-axis represents the visual quality metric values. Each figure has four RD-curves—Holo-HM and Holo-J2K mean that the encoding is performed on the real/imaginary holograms themselves using HM or JPEG2000, whereas Obj-HM and Obj-J2K mean that the encoding is performed on the rendered NR.

Figure 6 shows the results of encoding the optically captured grayscale hologram, Astronaut, in the hologram and object planes, respectively. Note that the quality metric value in the object plane is the average result over the NRs rendered at the three reconstruction distances. Specifically, the encoding results in the hologram plane show that Obj-HM and Holo-J2K have the best and worst coding performance for SNR, respectively (Fig. 6(a)). When SSIM is used as the quality metric, Holo-J2K and Obj-J2K show the best and worst coding performance (Fig. 6(b)). Next, the encoding results in the object plane are shown in Figs. 6(c)–6(d). It is noteworthy that the visual quality of the NR is considered more important than the quality of the hologram itself, because the NR is presented directly to the viewer. For the PSNR and SSIM metrics, Obj-HM and Holo-HM showed the best coding gains, whereas Holo-J2K and Obj-J2K showed the worst performance, respectively, in that order.


Fig. 5. Numerical reconstructions of Astronaut [39] rendered from the original hologram (a) and from three decoded holograms with different bitrates (b–d) using HM16.20, where the two values in parentheses are PSNR and SSIM, respectively.


Fig. 6. Encoding experiments of HEVC-Intra and JPEG2000 for G1.


Fig. 7. Encoding experiments of HEVC-Intra and JPEG2000 for G2.


Fig. 8. Encoding experiments of HEVC-Intra and JPEG2000 for G3.


Fig. 9. Encoding experiments of HEVC-Intra and JPEG2000 for G4.


Fig. 10. Encoding experiments of HEVC-Intra and JPEG2000 for G5.


Table 3. The seven selected full-complex holograms for the encoding experiments of HEVC-Intra and JPEG2000.

The experiments on G2$\sim$G5 can be analyzed in a similar way, and the results are shown in Table 4. The main discussion points, which are largely consistent with the results of previous studies [39,41], are summarized as follows.

  • In terms of compression domain, object-plane coding generally provides better compression efficiency than hologram-plane coding (Figs. 6–10).
  • HM-Intra, with both predictive and transform coding tools, shows consistently better coding gain than JPEG2000, which relies on the transform coding tool alone (Figs. 6–10 and Table 4).
  • In terms of the visual quality metric used, Obj-HM generally gives the best performance for (P)SNR. However, the best coding scenario is not consistent for the SSIM metric (Table 4).


Table 4. Summary of the best and worst compression scenarios (A–B, where A is the domain in which coding is performed and B is the standard codec used) in both the hologram and object planes.

As shown in Figs. 6–10, most of the RD-curves, with the exception of the SSIM-related ones, tend to increase linearly without saturation as bpp increases. This differs from natural-image compression, where the quality value gradually saturates once the bpp increases beyond a certain point. This is mainly because the existing coding tools of standard codecs do not work well owing to the highly random nature of holograms, and thus the quantization parameter (QP) becomes the dominant factor in coding efficiency.

To overcome these limitations, we propose a deep compression network that can be tailored to the inherent characteristics of holograms via a data-driven approach.

For the experiments in Section 6, we conducted hologram-plane coding rather than object-plane coding for all test codecs, including our network, following the guideline provided by the JPEG Pleno CTC [34].

5. Proposed deep compression network

The architecture of the proposed deep network for compressing full-complex holograms is shown in Fig. 11. The network is trained to minimize the following rate-distortion (R-D) cost function:

$$L=\lambda_1 \cdot R+\lambda_2 \cdot D_{holo}+\lambda_3 \cdot D_{obj}$$
where $R$ is the estimated bit amount after encoding, $D_{holo}$ is the distortion between the original hologram and the decoded hologram, and $D_{obj}$ is the distortion between the three pairs of NRs rendered from the original and decoded holograms, respectively. Here, $\lambda _1$, $\lambda _2$, and $\lambda _3$ are weighting parameters that balance the bit rate and distortions during R-D optimization and are determined experimentally. Overall, the deep network comprises two pairs of analysis (transform) and synthesis (inverse-transform) networks, $T_a$ / $T_s$ and $H_a$ / $H_s$.
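For illustration, the overall objective of Eq. (3) can be sketched as follows; the weights shown are those used for fine-tuning in Section 6.2.2 ($\lambda_2=1$, $\lambda_3=0.001$, $w_i=1$), while $\lambda_1$ and the inputs are placeholders produced by the networks described below.

```python
# Minimal sketch of the R-D cost of Eq. (3) in TensorFlow.
import tensorflow as tf

def rd_loss(rate_bits, x, x_hat, nr_pairs, lam1=1.0, lam2=1.0, lam3=0.001):
    """rate_bits: estimated bits of the latents (Eq. (4)); x/x_hat: original/decoded
    hologram patches; nr_pairs: (NR from x, NR from x_hat) at three distances."""
    d_holo = tf.reduce_sum(tf.square(x - x_hat))                               # Eq. (13)
    d_obj = tf.add_n([tf.reduce_mean(tf.square(a - b)) for a, b in nr_pairs])  # Eq. (15), w_i = 1
    return lam1 * rate_bits + lam2 * d_holo + lam3 * d_obj
```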

5.1 Base encoding and decoding networks ($T_a$ and $T_s$)

Initially, a pair of real and imaginary hologram patch images $x$ is fed into the analysis network $T_a$, which transforms it into the latent representation $y$. Next, $y$ is quantized to $\hat {y}$ by rounding ($Q$). In the training phase, instead of calculating the actual amount of bits, we estimate $R$ in Eq. (3) via the entropy $H$, assuming that the latent space $y$ follows a Gaussian distribution model. Specifically, suppose that the entropy model of $\hat {y}$ is $q_{\hat {y}}$ and the actual marginal distribution is $p_{\hat {y}}$. Because $p_{\hat {y}}$ is unknown, we use the approximated cross-entropy between the two distributions to estimate the bit consumption.

$$\begin{aligned} R & = H(p,q) = E_{\hat{y}\sim p}\left[-\log_2 q_{\hat{y}}(\hat{y})\right] \\ & ={-}\sum p(\hat{y})\log_2 q_{\hat{y}}(\hat{y}) \approx{-}\sum \log_2 q_{\hat{y}}(\hat{y}) \end{aligned}$$

During the training phase, the goal is to learn the network parameters such that $q_{\hat{y}}$ is as close to $p_{\hat{y}}$ as possible, which can be measured by the KL-divergence defined as follows (ideally zero):

$$\begin{aligned} D_{KL}(p||q) & = E_{p}\left[\log_2\frac{p(\hat{y})}{q(\hat{y})}\right] \\ & = \sum p(\hat{y})\cdot\left\lbrace \log_2 p(\hat{y}) - \log_2 q(\hat{y}) \right\rbrace \\ & = \sum p(\hat{y})\log_2 p(\hat{y})-\sum p(\hat{y})\log_2 q(\hat{y}) \\ & ={-} H\left(p \right) + H\left(p, q \right)\\ \Leftrightarrow R= & H(p,q)=H(p)+D_{KL}(p||q) \end{aligned}$$

Therefore, in addition to the above-mentioned purpose, reducing the bit rate $R$ during training also drives the network’s weights to converge in a direction in which the actual entropy $H(p)$ itself becomes smaller.
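For illustration, the rate term can be estimated during training as the cross-entropy of the noise-relaxed latents under a Gaussian entropy model; in the sketch below, the probability mass of a unit-width bin (a common discretization, assumed here) is evaluated via the Gaussian CDF.

```python
# Minimal sketch of the rate estimate of Eq. (4) under a Gaussian entropy model.
import math
import tensorflow as tf

def gauss_cdf(x, mu, sigma):
    return 0.5 * (1.0 + tf.math.erf((x - mu) / (sigma * tf.sqrt(2.0))))

def estimated_bits(y, mu, sigma):
    # Training-time relaxation of rounding: y_tilde = y + U(-0.5, 0.5) [43].
    y_tilde = y + tf.random.uniform(tf.shape(y), -0.5, 0.5)
    # Probability mass of the unit-width bin around y_tilde under N(mu, sigma^2).
    p = gauss_cdf(y_tilde + 0.5, mu, sigma) - gauss_cdf(y_tilde - 0.5, mu, sigma)
    return tf.reduce_sum(-tf.math.log(tf.maximum(p, 1e-9)) / math.log(2.0))
```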


Fig. 11. Architecture of the proposed deep compression network for full-complex hologram.


$T_a$ comprises iterative connections of convolutional neural network (CNN) layers and generalized divisive normalization (GDN). For a given input hologram, the CNN extracts low-level features, such as edges and textures, near the input layer, and progressively higher-level structural features as the depth increases toward the bottleneck layer ($y$). The GDN is known to be suitable for density modeling of images [42]. Each CNN layer in Fig. 11 is denoted by $N$ or $M$ (number of channels) x $W$ (kernel width) x $H$ (kernel height) x $F$ (up-scaling $\uparrow$ or down-scaling $\downarrow$ factor). Because the output of $T_a$ is quantized to $\hat {y}$, resulting in discontinuities, gradient-based weight updates in backpropagation are not possible. To solve this problem in training, we follow the method in [43], which relaxes the quantization process. Specifically, the authors showed that the quantized $\hat {y}$ can be approximated by adding uniform noise as $\hat {y}\approx y+u(-0.5,0.5)$, which is denoted hereinafter as $\tilde {y}$.
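A minimal sketch of $T_a$ is shown below; the kernel sizes, strides, and filter counts are placeholders following [43] rather than the Table 7 values, and the GDN layer is taken from the tensorflow-compression package.

```python
# Sketch of the analysis transform T_a: strided convolutions interleaved with GDN [42].
import tensorflow as tf
import tensorflow_compression as tfc

def analysis_transform(n=192, m=192):
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(n, 5, strides=2, padding="same"), tfc.GDN(),
        tf.keras.layers.Conv2D(n, 5, strides=2, padding="same"), tfc.GDN(),
        tf.keras.layers.Conv2D(n, 5, strides=2, padding="same"), tfc.GDN(),
        tf.keras.layers.Conv2D(m, 5, strides=2, padding="same"),   # bottleneck: latent y
    ])

x = tf.random.normal([8, 256, 256, 1])                     # mini-batch of hologram patches
y = analysis_transform()(x)                                # latent representation
y_tilde = y + tf.random.uniform(tf.shape(y), -0.5, 0.5)    # relaxed quantization [43]
```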

5.2 Networks for hyper-parameters ($H_a$ and $H_s$)

One of the key factors for improving the coding efficiency of the proposed network is to accurately predict the two parameters of the Gaussian model, the mean ($\mu$) and standard deviation ($\sigma$), of each element $\hat {y}_i$ in the latent space. To this end, we adopted a context-adaptive entropy model that uses two different types of contexts. For the first context, we exploit the assumption that neighboring latent representations $\hat {y}_i$ are strongly correlated with each other. For example, for the current $\hat {y}_i$ to be encoded, we set a volumetric window of size 4 x 4 x M (the number of channels of $y$) as $c1$, as shown in Fig. 12. As pixels within the local window are likely to have similar values and the $\hat {y}_i$ are encoded/decoded in raster-scan order, $c1$ is an informative context for predicting the value of $\hat {y}_i$. Secondly, for context $c2$, we extend the base network composed of $T_a/T_s$ with a hyper-network $H_a/H_s$, as shown in Fig. 11, which works to further increase the accuracy of the model parameter predictions. Here, the encoder (or analysis) network $H_a$ takes $\hat {y}$ as input and then transforms and quantizes it to $\hat {z}$, another latent space modeled as a simple zero-mean Gaussian distribution for which only a scale parameter needs to be estimated. Next, the decoder (or synthesis) network $H_s$ outputs the 4 x 4 x M context $c2$, which is then concatenated with the 4 x 4 x M context $c1$. Note that, because $c1$ is utilized in an autoregressive way during the encoding and decoding process, there is no need to encode this context information. Meanwhile, $\hat {z}$, from which $c2$ is derived, must be transmitted to the decoder as side information to avoid an encoder-decoder mismatch, consuming overhead bits. The concatenated context goes into the model parameter estimator ($E$ in Fig. 11), which outputs the two scalar values $\mu$ and $\sigma$. Finally, the latent space can be effectively decorrelated as $\frac {\hat {y}-\mu }{\sigma }$, which lowers the entropy of the $\hat {y}$. As a result, the amount of bits required for encoding is reduced.

To sum up, the rate cost $R$ in Eq. (3) for end-to-end training can be calculated from the entropy values of the two latent spaces, as summarized below.

$$R \approx E_{x\sim p_{x}}E_{\tilde{y}, \tilde{z}\sim q}\left[-\log p_{\tilde{y}|\hat{z}}(\tilde{y}|\hat{z})-\log p_{\tilde{z}}(\tilde{z}) \right],$$
$$\text{where} \quad y=T_{a}(x; \phi_{T}), \quad \hat{y}=\mathrm{round}(y),$$
$$z=H_{a}(\hat{y};\phi_{H}), \quad \hat{z}=\mathrm{round}(z),$$
$$c1=\hat{y}\ \text{in window}, \quad c2=H_{s}(\hat{z};\theta_H),$$
$$(\mu_{i},\sigma_{i})=E(c1, c2), \quad \hat{x}=T_{s}(\hat{y}; \theta_{T})$$
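For illustration, the per-position parameter estimation can be sketched as follows; the estimator $E$ is reduced to a tiny placeholder network here, and the exponential used to keep $\sigma$ positive is an implementation assumption.

```python
# Sketch of the two-context model parameter estimation (Eqs. (6)-(10)).
import tensorflow as tf

M = 192                                           # number of latent channels (placeholder)
estimator = tf.keras.Sequential([                 # stand-in for the estimator E in Fig. 11
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(2),                     # outputs (mu, log_sigma)
])

def estimate_params(c1_window, c2_window):
    """c1_window: causal 4 x 4 x M window of already-decoded y_hat; c2_window: H_s output."""
    ctx = tf.concat([c1_window, c2_window], axis=-1)      # concatenated 4 x 4 x 2M context
    mu, log_sigma = tf.unstack(estimator(ctx[tf.newaxis])[0])
    return mu, tf.exp(log_sigma)

mu, sigma = estimate_params(tf.random.normal([4, 4, M]), tf.random.normal([4, 4, M]))
# The current latent is then decorrelated as (y_hat_i - mu) / sigma before entropy coding.
```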


Fig. 12. Demonstration of the two contexts $c_1$ and $c_2$ used in the NRQN.


5.3 Distortion costs for the decoded hologram ($D_{holo}$) and rendered NR ($D_{obj}$)

As for the distortion term $D_{holo}$ in Eq. (3), minimizing it can be interpreted as maximizing the log-likelihood of the inference, as expressed below.

$$D_{holo} = E_{x\sim p_x}\left[-\log_2 p_{x|\hat{y}}(x|\hat{y})\right]$$

As the training holograms are assumed to be independent and identically distributed (i.i.d.), $-\log _2 p_{x|\hat {y}}(x|\hat {y})=-\sum _{i}\log _2 p_{x_i|\hat {y}_i}(x_i|\hat {y}_i)$ holds. Additionally, when the conditional probability is modeled as a Gaussian distribution, the negative log-likelihood is equivalent, up to constants, to the mean squared error (MSE) between the input and decoded holograms, as verified in Eq. (12) (assuming a standard deviation $\sigma _i=1$).

$$\begin{aligned} p(x_i|\mu_i,\sigma_i) & =\frac{1}{\sqrt{2\pi}\sigma_{i}}\exp\left( -\frac{(x_i-\mu_i)^2}{2\sigma_{i}^2}\right) \\ -\log_2 p(x_i|\mu_i,\sigma_i) & ={-}\log_2\frac{1}{\sqrt{2\pi}\sigma_{i}}+\frac{(x_i-\mu_i)^2}{2\sigma_{i}^2}\log_2 e\\ -\log_2 p(x_i|\mu_i) & \propto\frac{(x_i-\mu_i)^2}{2} \end{aligned}$$

Therefore, $D_{holo}$ is calculated using the following equation.

$$D_{holo}=\sum_{i}(x_i-\hat{x}_i)^2$$

As previously mentioned, the visual quality of the NR is far more important than that of the decoded hologram. Hence, we account for the distortion of the NR in the training cost function of Eq. (3) via $D_{obj}$. Specifically, three reconstruction distances ($d_1$, $d_2$, and $d_3$) are selected for each training hologram, as recommended in [34]. Here, we use the ASM propagation model ($P$) to render the corresponding NRs, defined as follows.

$$\begin{aligned} d(x,y) & =P\left\lbrace s\right\rbrace (x,y)\\ & =F^{{-}1}\left\lbrace F\left\lbrace s\right\rbrace (f_x,f_y) e^{j2\pi z\sqrt{\lambda^{{-}2}-f_x^2-f_y^2}}\right\rbrace (x,y) \end{aligned},$$
where $s$ and $d$ are the light waves in the hologram and reconstruction planes, respectively; $F$ and $F^{-1}$ are the forward and inverse Fourier transforms, respectively; $(f_x,f_y)$ are the frequency-domain coordinates; $\lambda$ is the wavelength of the light; and $z$ is the user-defined reconstruction distance. At a given $z$, the distortion $D_{obj_z}$ is the mean squared error (MSE) measured between the two NRs rendered from the original and decoded holograms, respectively. Finally, $D_{obj}$ is a weighted sum of the three distortions.
$$D_{obj}=w_1\cdot D_{obj_1}+w_2\cdot D_{obj_2}+w_3\cdot D_{obj_3}$$
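For illustration, Eqs. (14) and (15) can be sketched with NumPy FFTs as follows (the actual training pipeline uses the NRSH reference implementation [35]); the wavelength, pixel pitch, and the suppression of evanescent components are placeholders and assumptions.

```python
# Sketch of ASM propagation (Eq. (14)) and the NR distortion (Eq. (15)).
import numpy as np

def asm_propagate(s, z, wavelength, pitch):
    """Propagate the complex field s over distance z via the angular spectrum method."""
    h, w = s.shape
    fx = np.fft.fftfreq(w, d=pitch)
    fy = np.fft.fftfreq(h, d=pitch)
    FX, FY = np.meshgrid(fx, fy)
    arg = wavelength ** -2 - FX ** 2 - FY ** 2
    kernel = np.exp(1j * 2 * np.pi * z * np.sqrt(np.maximum(arg, 0.0)))  # evanescent terms dropped
    return np.fft.ifft2(np.fft.fft2(s) * kernel)

def d_obj(holo, holo_hat, distances, wavelength=532e-9, pitch=1e-6, weights=(1, 1, 1)):
    total = 0.0
    for w_i, z in zip(weights, distances):
        nr = np.abs(asm_propagate(holo, z, wavelength, pitch))           # NR amplitude
        nr_hat = np.abs(asm_propagate(holo_hat, z, wavelength, pitch))
        total += w_i * np.mean((nr - nr_hat) ** 2)                       # MSE per distance
    return total
```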

Hereinafter, the proposed deep network for full-complex hologram encoding is referred to as the numerical reconstruction quality-optimized deep compression network (NRQN).

6. Experimental results and analysis

6.1 Training holograms using data augmentation

To train the proposed NRQN, we selected ten training holograms from the JPEG Pleno dataset, as listed in Table 5. These are color CGHs with various attributes, such as different scene objects, resolutions, and pixel pitches. Figure 13 shows cropped training hologram patches. Note that the signal characteristics appear globally similar to each other but differ locally owing to the different generation parameters.


Fig. 13. Examples of the cropped patch images (256x256) of seven training holograms.


Table 5. Training holograms selected from the JPEG Pleno dataset (B-com), which are color CGHs. The last column gives the number of training patches generated by the data augmentation process, where x 2 denotes the real and imaginary parts of each hologram.

In the training phase, data augmentation was performed to obtain a sufficient amount of training data and to improve generalization. To this end, two types of geometric transformation were applied. Specifically, rotations with five uniformly distributed angles from 0° to 180° were applied to the training holograms, and the holograms were also flipped vertically and horizontally. Then, from the original and transformed holograms, non-overlapping patches of size 256x256 were extracted. In this way, a total of 92,000 hologram patches were obtained, the details of which are presented in Table 5.
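For illustration, the augmentation can be sketched as follows; whether the 0° rotation is counted among the five angles and the interpolation used for rotation are assumptions here.

```python
# Sketch of the augmentation pipeline: rotations, flips, non-overlapping 256x256 patches.
import numpy as np
from scipy.ndimage import rotate

def augment(holo, patch=256):
    variants = [holo, np.flipud(holo), np.fliplr(holo)]
    variants += [rotate(holo, angle, reshape=False, order=1)
                 for angle in np.linspace(0, 180, 5, endpoint=False)[1:]]  # 0 deg = original
    patches = []
    for v in variants:
        h, w = v.shape[:2]
        for r in range(0, h - patch + 1, patch):       # non-overlapping grid
            for c in range(0, w - patch + 1, patch):
                patches.append(v[r:r + patch, c:c + patch])
    return patches
```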

6.2 Experimental results

In this section, we provide experimental results and in-depth discussions for two training scenarios. The first subsection presents the results of training the NRQN using the hologram distortion ($D_{holo}$) only (i.e., $\lambda _3=0$ in Eq. (3)); the purpose is to confirm whether the proposed NRQN can provide better coding efficiency than the existing standard codecs. In the second subsection, the NRQN pretrained on $D_{holo}$ is additionally fine-tuned with the object distortion ($D_{obj}$) in combination with $D_{holo}$ (i.e., $\lambda _2\neq 0$ and $\lambda _3\neq 0$ in Eq. (3)). The results show that the NRQN can provide improved objective/subjective NR quality, which is important in real applications.

As evaluation metrics, following the common test conditions of JPEG Pleno [34], SNR and SSIM are used in the hologram domain, whereas PSNR, SSIM, and VIFP are used in the NR domain. VIFP is a faster implementation of the visual information fidelity (VIF) that performs multiscale analysis in the spatial domain instead of the wavelet domain originally used in VIF. SSIM and VIFP range from 0 (lowest quality) to 1 (highest quality).

For color holograms, the three channels are compressed independently, and the quality metrics are computed per channel. The arithmetic mean is then calculated as $Q=(Q_R+Q_G+Q_B)/3$, where $Q_R$, $Q_G$, and $Q_B$ are the quality indices for the red, green, and blue components, respectively.

The object-domain tests were evaluated in accordance with the guidelines set forth by the CTC [34]: each test hologram was assessed using a combination of three reconstruction distances and three distinct viewports, as shown in Table 6. The results at the three reconstruction distances were then averaged to provide an overall assessment of the object-domain tests.


Table 6. NR rendering summary of the test holograms.

6.2.1 NRQN only trained by $D_{holo}$

In the training stage, $\lambda _1$ in Eq. (3) acts as a trade-off parameter between the reconstruction quality and the bit consumption after encoding. To cover a sufficient bpp range, we used a total of six $\lambda$ configurations, meaning that six NRQNs were trained independently. Networks optimized with smaller values of $\lambda$ target high-bit-rate encoding, while networks trained with larger $\lambda$ values target low-bit-rate encoding.

The NRQN was implemented using the TensorFlow framework. Training used mini-batches comprising eight 256$\times$256 hologram patches and the ADAM optimizer. The hyperparameters, namely the number of CNN filters ($N$ and $M$) and the total number of iterations, are summarized in Table 7. The specific $\lambda$ values ($\lambda _1\sim \lambda _6$) were chosen differently for each test hologram to fit a similar bpp range. We also adopted a learning-rate decay scheme that halves the learning rate every 50,000 iterations over the last 200,000 iterations.
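For illustration, the stated decay scheme can be sketched as follows (the base learning rate and the exact iteration at which each halving occurs are assumptions):

```python
# Sketch of the learning-rate decay: halve every 50,000 iterations over the last 200,000.
def learning_rate(step, total_steps, base_lr=1e-4):
    decay_start = total_steps - 200_000
    if step < decay_start:
        return base_lr
    return base_lr * 0.5 ** ((step - decay_start) // 50_000)  # halves at each 50k boundary
```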


Table 7. Summary of the hyperparameters used in the NRQN. $N$ and $M$ are the numbers of convolutional filters, and the $\lambda$ values decrease from $\lambda _1$ to $\lambda _6$.

During the training phase, the distortions $D_{holo}$ and $D_{obj}$ were calculated on 256x256 patches per iteration. Specifically, each mini-batch comprises eight hologram patches, four corresponding to the real part and four to the imaginary part. These patches are fed to the network, which produces reconstructed patches of the same dimensions; $D_{holo}$ is calculated between the input and reconstructed patches. Next, original and reconstructed NRs are generated from the input and reconstructed patches, respectively, at three reconstruction distances via the NRSH software, and the average $D_{obj}$ is calculated between them.

In the test phase, an actual entropy encoding/decoding engine was used instead of estimating the bit consumption with the hypothesized entropy model. Specifically, we used the publicly available arithmetic encoder and decoder (AE/AD in Fig. 11) from Project Nayuki [44]. Of the three test holograms, Dices16K and Piano16K have an ultra-high resolution of 16384 x 16384 and a very large bit depth of 16 bits per pixel, almost twice the 8–10 bits typically used in general video. In this case, the Python implementation cannot properly process an entire image at once owing to the very large computational complexity. Thus, we adopted a tile-based encoding scheme for these two 16K test holograms, as sketched below. Specifically, the width ($W$) and height ($H$) were each divided by a down-scale factor of 4 to generate 16 tile images of size 4096 x 4096. All tiles were independently encoded using the NRQN, and the decoded tiles were then combined and rendered to the NR. We confirmed that the tile-based encoding results are nearly identical to the full-frame encoding results, except for a slight performance drop in the high-bpp range.
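In the sketch below, encode/decode stand for the NRQN codec and are placeholders.

```python
# Sketch of tile-based coding for the 16K holograms: a 4 x 4 grid of 4096 x 4096 tiles.
import numpy as np

def code_in_tiles(holo, encode, decode, grid=4):
    h, w = holo.shape
    th, tw = h // grid, w // grid                    # 16384 / 4 = 4096
    out = np.empty_like(holo)
    for i in range(grid):
        for j in range(grid):
            tile = holo[i * th:(i + 1) * th, j * tw:(j + 1) * tw]
            out[i * th:(i + 1) * th, j * tw:(j + 1) * tw] = decode(encode(tile))
    return out                                       # reassembled before rendering the NR
```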

Table 8 summarizes the experimental results, presenting the BD-rate gain (-) and loss (+) of the proposed NRQN with respect to JPEG2000, JPEG-XL [45], HEVC, and VVC [46]. The experiments for the compared codecs were conducted using their respective reference software. Specifically, we used HM16.22 with the 'monochrome_16' profile in the intra-coding configuration for HEVC, and VTM15.0 with the 'main_16_444' profile and a 4:0:0 color space option for VVC. The RD-curves for the three test holograms, DeepDices2K, Dices16K, and Piano16K, are shown in Figs. 14–16. In summary, the NRQN outperforms the existing standards by significant margins; only for DeepDices2K in the hologram plane under the SNR metric do the NRQN and VVC show comparable coding performance. For the hologram-domain evaluation, the average BD-rate gains of the NRQN against JPEG2000/JPEG-XL/HEVC/VVC are -44.52%/-55.71%/-19.76%/N/A and -32.98%/-41.33%/-12.69%/-22.68% for the SNR and SSIM metrics, respectively. For the object-domain evaluation, the BD-rate gains are even greater: -43.99%/-54.47%/-20.50%/-26.28%, -58.03%/-70.88%/-28.67%/-22.54%, and -47.40%/-58.31%/-27.35%/-19.60% for PSNR, SSIM, and VIFP, respectively.
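For reference, the BD-rate figures reported in this section follow the standard Bjøntegaard computation, which can be sketched as follows (a minimal implementation; the reported numbers were not produced with this exact script):

```python
# Sketch of the Bjontegaard delta rate: cubic fit of log-rate vs. quality, integrated
# over the shared quality interval; negative values mean bit savings over the anchor.
import numpy as np

def bd_rate(rates_anchor, q_anchor, rates_test, q_test):
    pa = np.polyfit(q_anchor, np.log(rates_anchor), 3)
    pt = np.polyfit(q_test, np.log(rates_test), 3)
    lo = max(min(q_anchor), min(q_test))
    hi = min(max(q_anchor), max(q_test))             # overlapping quality interval
    int_a = np.polyval(np.polyint(pa), hi) - np.polyval(np.polyint(pa), lo)
    int_t = np.polyval(np.polyint(pt), hi) - np.polyval(np.polyint(pt), lo)
    avg_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_diff) - 1) * 100.0

# e.g., bd_rate([0.3, 0.75, 1.5, 3], [28, 31, 34, 37], [0.25, 0.6, 1.2, 2.4], [28, 31, 34, 37])
```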


Fig. 14. RD-curves for DeepDices2K in the hologram plane (a: SNR, b: SSIM) and in the NR plane (c: PSNR, d: VIFP).


Fig. 15. RD-curves for Dices16K in the hologram plane (a: SNR, b: SSIM) and in the NR plane (c: PSNR, d: VIFP).


Fig. 16. RD-curves for Piano16K in the hologram plane (a: SNR, b: SSIM) and in the NR plane (c: PSNR, d: VIFP).


Table 8. Summary of the BD-rate results for the encoding experiments, where the plus sign means BD-rate loss and the minus sign means BD-rate gain of the proposed NRQN, compared to the standardized codecs. (The N/A indicates that the BD-rate cannot be calculated due to the crossing of the two RD-curves.)

VVC is reported to provide significant BD-rate gains of 30$\sim$40% over HEVC for 2K natural images/videos [46]. However, in our experiments, measured by the SNR metric in the hologram domain, the performance of VVC against HEVC shows very different tendencies. First, for DeepDices2K, the BD-rate gain of VVC is very small (-4.33%). Second, in the case of Piano16K, VVC demonstrated coding gain in the low-bpp range but exhibited coding loss in the high-bpp range. Intriguingly, for Dices16K, VVC exhibits slight coding losses (+2.50%) over the entire bpp range. One possible reason for the reduced performance of VVC, particularly in relation to hologram resolution, may be the increased number of in-loop filters in VVC: in addition to the deblocking filter and sample-adaptive offset (SAO) filter of HEVC, VVC introduced the adaptive loop filter (ALF). A recent study [47] highlighted that disabling the deblocking filter or SAO can yield coding gain, rather than loss, when encoding phase-only holograms with HEVC. This effect may become more prominent as the image resolution increases, owing to the differing signal characteristics of 16K and 2K holograms. To the best of our knowledge, this is the first investigation of encoding a 16-bit, 16K-resolution image using VVC; further in-depth study is therefore required to thoroughly understand the underlying reasons behind the observed results.

Next, the recent JPEG XL shows even worse coding performance than JPEG 2000. The design of JPEG XL is based on the XYB color space, which aims to facilitate perceptually uniform quantization that accounts for the human visual system [45]. While this approach proves effective for natural images, it falls short in effectively encoding holograms, resulting in unsatisfactory coding performance.

It is worth mentioning that VIFP in the object domain and SSIM in the hologram domain are more appropriate for the quality evaluation of compressed digital holograms, as they are known to have the highest correlations with human quality scores [48,49]. By these metrics, the NRQN clearly provides better perceptual NR image quality.

6.2.2 Fine-tuned NRQN by $D_{obj}$

The NRQN pretrained with $D_{holo}$ was further fine-tuned using a combination of $D_{holo}$ and $D_{obj}$ to improve the quality of the NR. In Fig. 11, the NRSH module was implemented based on Eq. (14), following the reference software provided by JPEG Pleno [35]. The weights $\lambda _2$ and $\lambda _3$ in Eq. (3) were set to 1 and 0.001, respectively, and all three weights ($w_1$, $w_2$, $w_3$) in Eq. (15) were set to 1. Fine-tuning ran for 500,000 iterations.

For a fair comparison, we trained the pretrained NRQN for 500,000 more iterations in two ways: the anchor network was further trained using $D_{holo}$ only, whereas the test network was trained using both $D_{holo}$ and $D_{obj}$. As can be observed in Table 9, which summarizes the BD-rate results, the test network largely improved the coding performance compared to the anchor network in all evaluation metrics. In summary, the test network provided average BD-rate gains of -4.13% and -4.67% for SNR and SSIM in the hologram domain, respectively, and -3.79%, -3.93%, and -2.71% for PSNR, SSIM, and VIFP in the NR domain, respectively. Although the coding performance of the NRQN in Table 8 had already been empirically maximized through extensive training-parameter optimization, the coding efficiency was further improved by directly incorporating the NR quality-related distortion into the network cost function. On the other hand, further training the network using $D_{holo}$ alone did not yield an additional coding gain.


Table 9. Summary of BD-rate results for the fine-tuned NRQN, trained for 500,000 more iterations from the pretrained NRQN of Table 8 using both $D_{holo}$ and $D_{obj}$, whereas the anchor network was fine-tuned using $D_{holo}$ only.

Finally, the BD-rate gains of the fine-tuned NRQN against the standardized codecs are summarized in Table 10, showing even higher coding gains compared to the results in Table 8.


Table 10. Summary of the final average BD-rate results for the fine-tuned NRQN with $D_{holo}$ and $D_{obj}$, compared to the standardized codecs.

6.3 Computational complexity analysis

In this subsection, we analyze the computational complexity of the five test codecs. The average processing times for encoding and decoding a single 2K or 16K hologram image are presented in Table 11. Among the codecs, JPEG 2000 shows the fastest processing times, followed by JPEG XL. In contrast, VVC requires approximately 38 hours to encode a single 16K hologram image, whereas the NRQN encodes a 16K hologram in approximately 8 minutes. In terms of decoding, VVC takes 48.5 seconds, while the NRQN requires 23 hours. It is important to note that the NRQN, unlike the compared standardized codecs, is implemented in Python, and its speed could be improved by porting it to faster C code.


Table 11. Computational complexity comparison: encoding and decoding times in seconds.

6.4 Visual quality analysis

A subjective quality evaluation of holographic data is challenging owing to the lack of appropriate visualization devices. Although a few devices deliver complete 3D holographic information, their viewing angle is generally limited, and the required resolution of digital holograms can easily exceed that of the best 2D displays. Therefore, discussion of a suitable subjective evaluation method is ongoing, and JPEG Pleno uses a pseudo-data sequence of multiple reconstructed images at different distances and viewports on a conventional 2D display.

In this section, we compare the subjective visual quality of NRs rendered from the decoded holograms compressed using JPEG2000, JPEG XL, HEVC, and the fine-tuned NRQN, respectively; the observed visual quality of VVC was found to be similar to that of HEVC. Figure 17 shows the NRs, and a summary is presented in Table 12. First, the results for DeepDices2K indicate that the proposed NRQN provides the best visual quality, with sharp object boundaries and pixel brightness closest to the original NR. The NR of HEVC is blurry owing to reduced brightness, and this visual deterioration is more severe in the NR of JPEG2000. Next, for Dices16K, the NRQN shows similarly good subjective NR quality with approximately 33% less bit consumption compared to HEVC, while JPEG2000 shows much lower visual quality. The results for Piano16K are presented in the last row: the NRQN provides the best objective coding performance and subjective image quality while consuming fewer bits. For example, the NR of the NRQN gives PSNR gains of 2.7 dB, 6.1 dB, and 8.2 dB over HEVC, JPEG2000, and JPEG XL, respectively. The NRQN also shows the subjective quality most similar to the original, whereas the NRs of HEVC and JPEG2000 suffer from reduced brightness and speckle noise near the object boundaries. In the case of JPEG XL, color artifacts were consistently observed; this can be attributed to its design based on the XYB color space, which allocates fewer bits to the blue channel.


Fig. 17. Subjective quality comparison. (a) The first row shows the original NRs from the uncompressed holograms; (b) the second to fourth rows show the NRs of DeepDices2K, Dices16K, and Piano16K, where columns 1-4 show the results of JPEG XL, JPEG2000, HEVC, and the fine-tuned NRQN, respectively.


Table 12. Summary of subjective quality comparison between the test codecs.

7. Conclusion

In this paper, a deep learning-based compression network for full-complex holograms was proposed. The coding performances of existing standard codecs were first evaluated through large-scale encoding experiments using HEVC-Intra and JPEG2000 on various real/imaginary high-resolution hologram images provided by JPEG Pleno, and the limitations of these standard coding solutions were analyzed. The proposed NRQN is an end-to-end trainable network optimized on a rate-distortion cost whose distortion term incorporates both the rendering distortion observed in the NR and the coding distortion in the decoded hologram. As a result, the NRQN outperformed HEVC/VVC by 23.70%/23.04% and 23.67%/29.59% in the hologram and NR planes, respectively, as measured by the (P)SNR metric.

Future research will focus on developing a better perceptual NR quality metric that has a higher correlation with human perception. It is expected that this will lead to even better coding performance when integrated into a coding engine.

Funding

This work was supported by the research fund of Hanyang University (HY-2021-2599).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. P. Hariharan, Basics of Holography (Cambridge University Press, 2002).

2. P.-A. Blanche, “Holography, and the future of 3D display,” Light: Adv. Manuf. 2, 446–459 (2021). [CrossRef]  

3. D. Pi, J. Liu, and Y. Wang, “Review of computer-generated hologram algorithms for color dynamic holographic three-dimensional display,” Light: Sci. Appl. 11(1), 231 (2022). [CrossRef]  

4. J. Geng, “Three-dimensional display technologies,” Adv. Opt. Photonics 5(4), 456–535 (2013). [CrossRef]  

5. H.-G. Choo, T. Kozacki, W. Zaperty, M. Chlipala, Y. Lim, and J. Kim, “Fourier digital holography of real scenes for 360°tabletop holographic displays,” Appl. Opt. 58(34), G96–G103 (2019). [CrossRef]  

6. P. Memmolo, M. Paturzo, A. Pelagotti, A. Finizio, P. Ferraro, and B. Javidi, “New high compression method for digital hologram recorded in microscope configuration,” SPIE Modeling Aspects in Optical Metrology III 8083, 297–303 (2011). [CrossRef]  

7. Y. Rivenson, A. Stern, and B. Javidi, “Overview of compressive sensing techniques applied in holography,” Appl. Opt. 52(1), A423–A432 (2013). [CrossRef]  

8. P. A. Cheremkhin and E. A. Kurbatova, “Numerical comparison of scalar and vector methods of digital hologram compression,” Holography, Diffractive Optics, and Applications VII 10022, 455–464 (2016).

9. D. Blinder, T. Bruylants, A. Munteanu, and P. Schelkens, “JPEG 2000-based compression of fringe patterns for digital holographic microscopy,” Opt. Eng. 53(12), 123102 (2014). [CrossRef]  

10. D. Blinder, C. Schretter, H. Ottevaere, A. Munteanu, and P. Schelkens, “Unitary transforms using time-frequency warping for digital holograms of deep scenes,” IEEE Trans. Comput. Imaging 4(2), 206–218 (2018). [CrossRef]  

11. P. A. Cheremkhin and E. A. Kurbatova, “Wavelet compression of off-axis digital holograms using real/imaginary and amplitude/phase parts,” Sci. Rep. 9(1), 7561 (2019). [CrossRef]  

12. V. Hajihashemi, H. E. Najafabadi, A. A. Gharahbagh, H. Leung, M. Yousefan, and J. M. R. Tavares, “A novel high-efficiency holography image compression method, based on HEVC, Wavelet, and nearest-neighbor interpolation,” Multimed. Tools Appl. 80(21-23), 31953–31966 (2021). [CrossRef]  

13. M. V. Bernardo, E. Fonseca, A. M. G Pinheiro, P. T. Fiadeiro, and M. Pereira, “Efficient coding of experimental holograms using speckle denoising,” Signal Processing: Image Communication 96, 116306 (2021). [CrossRef]  

14. R. K. Muhamad, T. Birnbaum, D. Blinder, C. Schretter, and P. Schelkens, “Binary hologram compression using context based Bayesian tree models with adaptive spatial segmentation,” Opt. Express 30(14), 25597–25611 (2022). [CrossRef]  

15. Y. Xing, M. Kaaniche, B. Pesquet-Popescu, and F. Dufaux, “Vector lifting scheme for phase-shifting holographic data compression,” Opt. Eng. 53(11), 112312 (2014). [CrossRef]  

16. T. Birnbaum, A. Ahar, D. Blinder, C. Schretter, T. Kozacki, and P. Schelkens, “Wave atoms for digital hologram compression,” Appl. Opt. 58(22), 6193–6203 (2019). [CrossRef]  

17. A. E. Rhammad, P. Gioia, A. Gilles, M. Cagnazzo, and B. Pesquet-Popescu, “View-dependent compression of digital hologram based on matching pursuit,” Optics, Photonics, and Digital Technologies for Imaging Applications V 10679, 133–146 (2018). [CrossRef]  

18. A. V. Zea, A. L. V. Amado, M. Tebaldi, and R. Torroba, “Alternative representation for optimized phase compression in holographic data,” OSA Continuum 2(3), 572–581 (2019). [CrossRef]  

19. H. Gu and G. Jin, “Phase-difference-based compression of phase-only holograms for holographic three-dimensional display,” Opt. Express 26(26), 33592–33603 (2018). [CrossRef]  

20. G. K. Wallace, “The JPEG still picture compression standard,” IEEE Trans. Consumer Electron. 38(1), xviii–xxxiv (1992). [CrossRef]  

21. D. S. Taubman, M. W. Marcellin, and M. Rabbani, “JPEG2000: Image compression fundamentals, standards and practice,” J. Electron. Imaging 11(2), 286–287 (2002). [CrossRef]  

22. T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, “Overview of the H. 264/AVC video coding standard,” IEEE Trans. Circuits Syst. Video Technol. 13(7), 560–576 (2003). [CrossRef]  

23. G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, “Overview of the high efficiency video coding (HEVC) standard,” IEEE Trans. Circuits Syst. Video Technol. 22(12), 1649–1668 (2012). [CrossRef]  

24. J. P. Peixeiro, C. Brites, J. Ascenso, and F. Pereira, “Holographic data coding: Benchmarking and extending HEVC with adapted transforms,” IEEE Trans. Multimedia 20, 282–297 (2017). [CrossRef]  

25. R. Corda and C. Perra, “Hologram Domain Data Compression: Performance of Standard Codecs and Image Quality Assessment at Different Distances and Perspectives,” IEEE Trans. Broadcast. 66(2), 292–309 (2019). [CrossRef]  

26. P. Schelkens, A. Ahar, A. Gilles, R. K. Muhamad, T. J. Naughton, C. Perra, A. Pinheiro, P. Stȩpień, and M. Kujawińska, “Compression strategies for digital holograms in biomedical and multimedia applications,” Light: Adv. Manuf. 3, 601–621 (2022). [CrossRef]  

27. T. Zeng, Y. Zhu, and E. Y. Lam, “Deep learning for digital holography: a review,” Opt. Express 29(24), 40572–40593 (2021). [CrossRef]  

28. D. Blinder, A. Ahar, S. Bettens, T. Birnbaum, A. Symeonidou, H. Ottevaere, C. Schretter, and P. Schelkens, “Signal processing challenges for digital holographic video display systems,” Signal Processing: Image Communication 70, 114–130 (2019). [CrossRef]  

29. R. K. Muhamad, T. Birnbaum, A. Gilles, S. Mahmoudpour, K. J. Oh, M. Pereira, C. Perra, A. Pinheiro, and P. Schelkens, “JPEG Pleno holography: scope and technology validation procedures,” Appl. Opt. 60(3), 641–651 (2021). [CrossRef]  

30. J. Prazeres, A. Gilles, R. K. Muhammad, T. Birnbaum, P. Schelkens, and A. M. Pinheiro, “Quality evaluation of the JPEG Pleno Holography Call for Proposals response,” 14th International Conference on Quality of Multimedia Experience (QoMEX), 1–6 (2022).

31. “JPEG Pleno Database,” http://plenodb.jpeg.org/ (2023).

32. M. Tausif, E. Khan, M. Pereira, and A. Pinheiro, “Comprehensive Statistical analysis of Holograms in context of coding,” IEEE Conference on Interdisciplinary Approaches in Technology and Management for Social Innovation (IATMSI)1–5 (2022).

33. “Kodak Image Dataset,” http://www.cs.albany.edu/xypan/research/snr/Kodak.html (2010).

34. R. K. Muhamad, A. Ahar, T. Birnbaum, A. Gilles, S. Mahmoudpour, and P. Schelkens, “JPEG Pleno Holography Common Test Conditions 8.0,” 89th JPEG Meeting, WG1N89046 (2022).

35. A. Gilles and P. Gioia, “Numerical Reconstruction Software for Holography (NRSH) 8.0,” 88th JPEG Meeting, WG1N88042 (2022).

36. P. Schelkens, T. Ebrahimi, A. Gilles, P. Gioia, K. J. Oh, F. Pereira, C. Perra, and A. M. Pinheiro, “JPEG Pleno: Providing representation interoperability for holographic applications and devices,” ETRI Journal 41(1), 93–108 (2019). [CrossRef]  

37. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. on Image Process. 13(4), 600–612 (2004). [CrossRef]  

38. H. R. Sheikh and A. C. Bovik, “Image information and visual quality,” IEEE Trans. on Image Process. 15(2), 430–444 (2006). [CrossRef]  

39. M. V. Bernardo, P. Fernandes, A. Arrifano, M. Antonini, E. Fonseca, P. T. Fiadeiro, A. M. Pinheiro, and M. Pereira, “Holographic representation: Hologram plane vs. object plane,” Signal Processing: Image Communication 68, 193–206 (2018). [CrossRef]  

40. “Kakadu,” https://kakadusoftware.com/ (2005).

41. M. V. Bernardo and A. M. G. Pinheiro and M. Pereira, “Benchmarking coding standards for digital holography represented on the object plane,” Optics, Photonics, and Digital Technologies for Imaging Applications V 10679, 123–132 (2018). [CrossRef]  

42. J. Ballé, V. Laparra, and E. P. Simoncelli, “Density modeling of images using a generalized normalization transformation,” arXiv, arXiv:1511.06281 (2015). [CrossRef]  

43. J. Ballé, V. Laparra, and E. P. Simoncelli, “End-to-end optimized image compression,” arXiv, arXiv:1611.01704 (2016). [CrossRef]  

44. Nayuki, “Nayuki project arithmetic coder,” GitHub (2018), https://github.com/nayuki/Reference-arithmetic-coding/.

45. J. Alakuijala, J. Sneyers, L. Versari, and J. Wassenberg, “JPEG White Paper: JPEG XL Image Coding System,” ISO/IEC JTC 1/SC 29/WG1 N100400 (2023).

46. B. Bross, Y. K. Wang, Y. Ye, S. Liu, J. Chen, G. J. Sullivan, and J. R. Ohm, “Overview of the versatile video coding (VVC) standard and its applications,” IEEE Trans. Circuits Syst. Video Technol. 31(10), 3736–3764 (2021). [CrossRef]  

47. K. J. Oh, H. Ban, S. Choi, H. Ko, and H. Y. Kim, “HEVC extension for phase hologram compression,” Opt. Express 31(6), 9146–9164 (2023). [CrossRef]  

48. H. Amirpour, A. M. G. Pinheiro, E. Fonseca, M. Ghanbari, and M. Pereira, “Quality evaluation of holographic images coded with standard codecs,” IEEE Trans. Multimedia 24, 3256–3264 (2021). [CrossRef]  

49. H. Ko and H. Y. Kim, “Deep learning-based compression for phase-only hologram,” IEEE Access 9, 79735–79751 (2021). [CrossRef]  

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

Figures (17)

Fig. 1. Three test holograms and their corresponding numerical reconstructions (NRs) at three representative reconstruction distances. The first and second columns show the real and imaginary parts (cropped and magnified 400 times), respectively, of DeepDices2K, Dices16K, and Piano16K, in that order. The third to fifth columns show the corresponding NRs, where the reconstruction distances (in $mm$) from left to right are $\{86.7, 166, 246\}$ for DeepDices2K, $\{6.57, 10, 13.1\}$ for Dices16K, and $\{6.8, 10, 12.5\}$ for Piano16K.
Fig. 2. (a) Histograms of the natural images; (b) histograms of the full-complex holograms.
Fig. 3. Surface plots of mean normalized correlation for (a) a natural image (Kodim17) and (b) a full-complex hologram (the real-valued image of DeepDices2K).
Fig. 4. Test pipeline for the hologram compression method and visual quality assessment [34].
Fig. 5. Numerical reconstructions of Astronaut [39] rendered from the original hologram (a) and from three decoded holograms at different bitrates (b–d) using HM16.20, where the two values in parentheses are PSNR and SSIM, respectively.
Fig. 6. Encoding experiments of HEVC-Intra and JPEG2000 for G1.
Fig. 7. Encoding experiments of HEVC-Intra and JPEG2000 for G2.
Fig. 8. Encoding experiments of HEVC-Intra and JPEG2000 for G3.
Fig. 9. Encoding experiments of HEVC-Intra and JPEG2000 for G4.
Fig. 10. Encoding experiments of HEVC-Intra and JPEG2000 for G5.
Fig. 11. Architecture of the proposed deep compression network for full-complex holograms.
Fig. 12. Demonstration of the two contexts $c_1$ and $c_2$ used in the NRQN.
Fig. 13. Examples of the cropped patch images (256×256) of the seven training holograms.
Fig. 14. RD curves for DeepDices2K in the hologram plane (a: SNR, b: SSIM) and in the NR plane (c: PSNR, d: VIFP).
Fig. 15. RD curves for Dices16K in the hologram plane (a: SNR, b: SSIM) and in the NR plane (c: PSNR, d: VIFP).
Fig. 16. RD curves for Piano16K in the hologram plane (a: SNR, b: SSIM) and in the NR plane (c: PSNR, d: VIFP).
Fig. 17. Subjective quality comparison. (a) The first row shows the original NRs from the uncompressed holograms; (b) the second to fourth rows show the NRs of DeepDices2K, Dices16K, and Piano16K, where columns 1–4 show the results of JPEG XL, JPEG2000, HEVC, and the fine-tuned NRQN, respectively.

Tables (12)

Table 1. Details of the tested full-complex holograms.
Table 2. Statistical analysis of the full-complex holograms and natural images.
Table 3. The seven full-complex holograms selected for the encoding experiments of HEVC-Intra and JPEG2000.
Table 4. Summary of the best and worst compression scenarios (A–B; A denotes the domain where coding is performed, and B the standard codec used) in both the hologram and object planes.
Table 5. Training holograms selected from the JPEG Pleno dataset (b-com), which are CGH and color holograms. The last column gives the number of training patches generated by the data augmentation process, where ×2 accounts for the real and imaginary parts of each hologram.
Table 6. NR rendering summary of the test holograms.
Table 7. Summary of the hyperparameters used in the NRQN. $N$ and $M$ are the numbers of convolutional filters, and the $\lambda$ values decrease from $\lambda_1$ to $\lambda_6$.
Table 8. Summary of the BD-rate results for the encoding experiments, where a plus sign indicates a BD-rate loss and a minus sign a BD-rate gain of the proposed NRQN relative to the standardized codecs. (N/A indicates that the BD-rate cannot be calculated because the two RD curves cross.)
Table 9. Summary of the BD-rate results for the fine-tuned NRQN, trained for 500,000 additional iterations from the pretrained NRQN of Table 8 using both $D_{holo}$ and $D_{obj}$, whereas the anchor network was fine-tuned using $D_{holo}$ only.
Table 10. Summary of the final average BD-rate results for the fine-tuned NRQN with $D_{holo}$ and $D_{obj}$, compared to the standardized codecs.
Table 11. Computational complexity comparison: encoding and decoding times in seconds.
Table 12. Summary of the subjective quality comparison between the test codecs.

Equations (15)

Equations on this page are rendered with MathJax.

$$r_{pcc} = \frac{\sum_i (x_i - \mu_x)(y_i - \mu_y)}{\sqrt{\sum_i (x_i - \mu_x)^2}\,\sqrt{\sum_i (y_i - \mu_y)^2}}$$
$$r_{mnc} = \frac{\sum_{r,c} \left( I(r,c) - \bar{I}_{u,v} \right)\left( t(r-u, c-v) - \mu_t \right)}{\sqrt{\sum_{r,c} \left( I(r,c) - \bar{I}_{u,v} \right)^2 \sum_{r,c} \left( t(r-u, c-v) - \mu_t \right)^2}},$$
$$L = \lambda_1 R + \lambda_2 D_{holo} + \lambda_3 D_{obj}$$
$$R = H(p, q) = \mathbb{E}_{\hat{y} \sim p}\left[ -\log_2 q_{\hat{y}}(\hat{y}) \right] = -\sum p(\hat{y}) \log_2 q_{\hat{y}}(\hat{y}) \approx -\log_2 q_{\hat{y}}(\hat{y})$$
$$D_{KL}(p\,||\,q) = \mathbb{E}_p\left[ \log_2 \frac{p(\hat{y})}{q(\hat{y})} \right] = \sum p(\hat{y}) \left\{ \log_2 p(\hat{y}) - \log_2 q(\hat{y}) \right\} = \sum p(\hat{y}) \log_2 p(\hat{y}) - \sum p(\hat{y}) \log_2 q(\hat{y}) = -H(p) + H(p, q), \quad \therefore\ R = H(p, q) = H(p) + D_{KL}(p\,||\,q)$$
$$R \approx \mathbb{E}_{x \sim p_x} \mathbb{E}_{\tilde{y}, \tilde{z} \sim q} \left[ -\log p_{\tilde{y}|\hat{z}}(\tilde{y}\,|\,\hat{z}) - \log p_{\tilde{z}}(\tilde{z}) \right],$$
$$\mathrm{where}\ y = T_a(x; \phi_T), \quad \hat{y} = \mathrm{round}(y),$$
$$z = H_a(\hat{y}; \phi_H), \quad \hat{z} = \mathrm{round}(z),$$
$$c_1 = \hat{y}_{in\,window}, \quad c_2 = H_s(\hat{z}; \theta_H),$$
$$(\mu_i, \sigma_i) = E(c_1, c_2), \quad \hat{x} = T_s(\hat{y}; \theta_T)$$
$$D_{holo} = \mathbb{E}_{x \sim p_x}\left[ -\log_2 p_{x|\hat{y}}(x\,|\,\hat{y}) \right]$$
$$p(x_i\,|\,\mu_i, \sigma_i) = \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left( -\frac{(x_i - \mu_i)^2}{2\sigma_i^2} \right), \quad -\log_2 p(x_i\,|\,\mu_i, \sigma_i) = -\log_2 \frac{1}{\sqrt{2\pi}\,\sigma_i} + \frac{(x_i - \mu_i)^2}{2\sigma_i^2}, \quad -\log_2 p(x_i\,|\,\mu_i) \propto \frac{(x_i - \mu_i)^2}{2}$$
$$D_{holo} = \sum_i (x_i - \hat{x}_i)^2$$
$$d(x, y) = \mathcal{P}\{s\}(x, y) = \mathcal{F}^{-1}\left\{ \mathcal{F}\{s\}(f_x, f_y)\, e^{j 2\pi z \sqrt{\lambda^{-2} - f_x^2 - f_y^2}} \right\}(x, y),$$
$$D_{obj} = w_1 D_{obj1} + w_2 D_{obj2} + w_3 D_{obj3}$$
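
For readers who wish to reproduce the numerical reconstruction step, the following is a minimal Python/NumPy sketch of the angular-spectrum propagation operator $\mathcal{P}$ defined above. The function name, grid handling, and parameter values are illustrative assumptions, not the NRSH implementation [35] used in the paper.

```python
import numpy as np

def angular_spectrum_propagate(s, z, wavelength, pitch):
    """Propagate a complex field s by distance z (all lengths in meters).

    Sketch of d(x,y) = F^{-1}{ F{s}(fx,fy) * exp(j*2*pi*z*sqrt(1/lambda^2 - fx^2 - fy^2)) }.
    Evanescent components (negative argument under the square root) are set to zero.
    """
    ny, nx = s.shape
    fx = np.fft.fftfreq(nx, d=pitch)            # spatial frequencies along x (cycles/m)
    fy = np.fft.fftfreq(ny, d=pitch)            # spatial frequencies along y (cycles/m)
    FX, FY = np.meshgrid(fx, fy)
    arg = 1.0 / wavelength**2 - FX**2 - FY**2   # argument of the square root
    kernel = np.where(arg > 0,
                      np.exp(1j * 2 * np.pi * z * np.sqrt(np.maximum(arg, 0.0))),
                      0)                        # suppress evanescent waves
    return np.fft.ifft2(np.fft.fft2(s) * kernel)

# Hypothetical usage: reconstruct a decoded full-complex hologram at one distance.
# holo = real_part + 1j * imag_part
# nr = angular_spectrum_propagate(holo, z=86.7e-3, wavelength=532e-9, pitch=4e-6)
# intensity = np.abs(nr) ** 2                   # NR image shown to the viewer
```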
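Similarly, the combined training objective $L = \lambda_1 R + \lambda_2 D_{holo} + \lambda_3 D_{obj}$ can be sketched as follows, reusing the propagation function above. This is only a plausible reading of the equations: the per-distance weighting of $D_{obj}$ over three reconstruction distances, the amplitude-difference distortion, and the `rate_bits` placeholder for the learned entropy-model estimate are all assumptions, not the NRQN training code.

```python
def rd_loss(x, x_hat, rate_bits, distances, weights, lam=(1.0, 1.0, 0.1),
            wavelength=532e-9, pitch=4e-6):
    """Sketch of L = lam1*R + lam2*D_holo + lam3*D_obj.

    x, x_hat  : original and decoded complex hologram fields
    rate_bits : code-length estimate R from the entropy model (placeholder)
    distances : three representative reconstruction distances z1..z3
    weights   : (w1, w2, w3) weighting the three NR-plane distortion terms
    """
    d_holo = np.sum(np.abs(x - x_hat) ** 2)     # D_holo: squared error, hologram plane
    d_obj = 0.0
    for w, z in zip(weights, distances):        # D_obj: weighted NR-plane distortions
        nr_ref = angular_spectrum_propagate(x, z, wavelength, pitch)
        nr_dec = angular_spectrum_propagate(x_hat, z, wavelength, pitch)
        d_obj += w * np.sum((np.abs(nr_ref) - np.abs(nr_dec)) ** 2)
    lam1, lam2, lam3 = lam
    return lam1 * rate_bits + lam2 * d_holo + lam3 * d_obj
```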