Optica Publishing Group

Neural compression for hologram images and videos

Open Access

Abstract

Holographic near-eye displays can deliver high-quality three-dimensional (3D) imagery with focus cues. However, the content resolution required to simultaneously support a wide field of view and a sufficiently large eyebox is enormous. The consequent data storage and streaming overheads pose a significant challenge for practical virtual and augmented reality (VR/AR) applications. We present a deep-learning-based method for efficiently compressing complex-valued hologram images and videos, and demonstrate superior performance over conventional image and video codecs.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

More Like This
Phase-only hologram video compression using a deep neural network for up-scaling and restoration

Woosuk Kim, Jin-Kyum Kim, Byung-Seo Park, Kwan-Jung Oh, and Young-Ho Seo
Appl. Opt. 61(36) 10644-10657 (2022)

Dynamic-range compression scheme for digital hologram using a deep neural network

Tomoyoshi Shimobaba, David Blinder, Michal Makowski, Peter Schelkens, Yota Yamamoto, Ikuo Hoshi, Takashi Nishitsuji, Yutaka Endo, Takashi Kakue, and Tomoyoshi Ito
Opt. Lett. 44(12) 3038-3041 (2019)

Deep-learning-based computer-generated hologram from a stereo image pair

Chenliang Chang, Di Wang, Dongchen Zhu, Jiamao Li, Jun Xia, and Xiaolin Zhang
Opt. Lett. 47(6) 1482-1485 (2022)

Supplementary Material (1)

Supplement 1: Supplemental Document

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

Cited By

Optica participates in Crossref's Cited-By Linking service. Citing articles from Optica Publishing Group journals and other participating publishers are listed here.



Figures (3)

Fig. 1.
Fig. 1. High-fidelity hologram compression (HiFiHC) pipeline for hologram image and video compression. For image compression, the encoder $E$ encodes one latent code for the hologram’s real and imaginary components. The latent code is quantized by $Q$, entropy coded with side information generated through $P$, decoded by $G$, and assessed by the discriminator $discrim$. For video compression, $E$ takes an H.265 compressed frame with its associated residual and encodes a latent code only for reconstructing the residual. The reconstructed residual is added back to the H.265 frame.
Fig. 2.
Fig. 2. Comparison of HiFiHC, high-efficiency image file format (HEIC), and better portable graphics (BPG) performance on hologram images. Readers are encouraged to zoom in and examine details. The second and third rows in each label mark the peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) for the hologram amplitude (first number) and the refocused DoF image (second number). Source images: PartyTug 6:00AM (left) by Ian Hubert, and Mansion (right) from Kim et al. [21].
Fig. 3.
Fig. 3. Comparison of HiFiHC and H.265 [at lower constant rate factor (CRF)] performance on hologram videos. Readers are encouraged to zoom in and examine details. In each inset, the top right-hand and bottom left-hand numbers mark the PSNR and SSIM for the refocused DoF image. The second row in the frame label marks the frame type and the bits per pixel (bpp) of the HiFiHC latent code. Source images: Big Buck Bunny (top) by Blender Foundation, and Horns (bottom) from Mildenhall et al. [27]. The H.265 (lower CRF) results use CRFs of 15 and 18 for Big Buck Bunny and Horns, respectively, both of which yield a similar number of additional bits per pixel compared with HiFiHC.
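The image-compression path described in the Fig. 1 caption (encode, quantize, entropy code, decode, discriminate) can be sketched end to end. Everything below is an illustrative assumption rather than the authors' network: the "encoder" is plain block averaging, the quantizer is hard rounding with an arbitrary step, and the entropy coder, side-information network $P$, and discriminator are omitted.

```python
import numpy as np

def encode(x, factor=8):
    # Toy stand-in for the learned encoder E in Fig. 1: stack the hologram's
    # real and imaginary parts as channels, then downsample by block averaging.
    chans = np.stack([x.real, x.imag])  # shape (2, H, W)
    c, h, w = chans.shape
    return chans.reshape(c, h // factor, factor, w // factor, factor).mean(axis=(2, 4))

def quantize(y, step=0.05):
    # Q in Fig. 1: hard rounding onto a uniform grid. Entropy coding with
    # side information from P is omitted in this sketch.
    return np.round(y / step) * step

def decode(y_hat, factor=8):
    # Toy stand-in for the decoder G: nearest-neighbour upsampling back to
    # a complex-valued field.
    up = y_hat.repeat(factor, axis=1).repeat(factor, axis=2)
    return up[0] + 1j * up[1]

rng = np.random.default_rng(0)
holo = rng.standard_normal((64, 64)) + 1j * rng.standard_normal((64, 64))
rec = decode(quantize(encode(holo)))
```

A HiFiHC-style codec would replace these stubs with learned convolutional networks trained under the rate-distortion-adversarial losses listed under Equations below, but the data flow (complex field in, compact latent code, complex field out) is the same.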

Equations (6)


$$\begin{aligned} \mathcal{L}_{E,G} &= w_r r(y) + w_{holo} ||x - x'||_1 + w_{fs}d_{fs}(x,x') \\ &\quad - w_D \log(discrim(x', y)). \end{aligned}$$
$$\mathcal{L}_{D} ={-}\log(1-discrim(x', y)) - \log(discrim(x,y)),$$
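The two losses above can be written out directly. The sketch below assumes its inputs (the rate estimate $r(y)$, the focal-stack distortion $d_{fs}$, and the discriminator outputs) are precomputed scalars, and its default weights are placeholders rather than the paper's values.

```python
import numpy as np

def generator_loss(rate, x, x_rec, d_fs, disc_fake,
                   w_r=1.0, w_holo=1.0, w_fs=1.0, w_d=0.01):
    # Encoder/decoder loss: weighted sum of the estimated bitrate r(y), the
    # L1 hologram distortion ||x - x'||_1, a focal-stack distortion d_fs,
    # and a non-saturating adversarial term.
    l1 = np.abs(x - x_rec).sum()
    return w_r * rate + w_holo * l1 + w_fs * d_fs - w_d * np.log(disc_fake)

def discriminator_loss(disc_fake, disc_real):
    # Standard GAN discriminator loss; discrim(.) outputs a probability in
    # (0, 1) that its input is an uncompressed hologram.
    return -np.log(1.0 - disc_fake) - np.log(disc_real)
```

Note the discriminator is conditioned on the latent code $y$ in the equations; here that conditioning is folded into the precomputed `disc_fake` / `disc_real` scalars.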
$$\begin{aligned} \mathcal{L}_{\Delta (E,G)} &= w_{\Delta r} r(\Delta y) + w_{\Delta {holo}} ||\Delta x - \Delta x'||_1 \\ &\quad + w_{\Delta fs}d_{\Delta fs}(\Delta x+x_{265},\Delta x'+x_{265}) \\ &\quad - w_{\Delta D} \log(discrim(\Delta x'+x_{265}, \Delta y)), \end{aligned}$$
$$\mathcal{L}_{\Delta D} ={-}\log(1-discrim(\Delta x'+x_{265}, \Delta y)) - \log(discrim(\Delta x+x_{265},\Delta y)).$$
$$\Delta x_\textrm{P} = x_\textrm{P} - (x_{265 \_\textrm{P}} + \text{warp}(\Delta x'_\textrm{I}, M_{I \to P})),$$
$$\Delta x_\textrm{B} = x_\textrm{B} - (x_{265 \_\textrm{B}} + \text{warp}(\Delta x'_\textrm{I}, M_{I \to B}) + \text{warp}(\overline{\Delta x_\textrm{P}}, M_{P \to B})),$$
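The two residual-prediction equations above can be sketched with a toy warp. The flow maps ($M_{I \to P}$, etc.) are assumed here to be given integer per-pixel offsets and the warp is nearest-neighbour; a real implementation would use sub-pixel interpolation.

```python
import numpy as np

def warp(field, flow):
    # Backward-warp a 2-D field by an integer per-pixel flow. flow has shape
    # (2, H, W) holding (dy, dx) source offsets for each output pixel.
    h, w = field.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(ys + flow[0], 0, h - 1).astype(int)
    src_x = np.clip(xs + flow[1], 0, w - 1).astype(int)
    return field[src_y, src_x]

def p_frame_residual(x_p, x265_p, dx_i_rec, flow_i_to_p):
    # What remains of the P frame after adding the H.265-decoded frame and
    # the motion-compensated reconstructed I-frame residual.
    return x_p - (x265_p + warp(dx_i_rec, flow_i_to_p))

def b_frame_residual(x_b, x265_b, dx_i_rec, dx_p_rec,
                     flow_i_to_b, flow_p_to_b):
    # The B frame borrows motion-compensated residuals from both the I and
    # P frames before its own residual is computed.
    return x_b - (x265_b + warp(dx_i_rec, flow_i_to_b)
                  + warp(dx_p_rec, flow_p_to_b))
```

Only these (smaller) residuals need to be encoded into latent codes, which is what makes the video path cheaper than compressing every frame as an independent image.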