Optica Publishing Group

Neural compression for hologram images and videos

Open Access

Abstract

Holographic near-eye displays can deliver high-quality three-dimensional (3D) imagery with focus cues. However, the content resolution required to simultaneously support a wide field of view and a sufficiently large eyebox is enormous. The consequent data storage and streaming overheads pose a significant challenge for practical virtual and augmented reality (VR/AR) applications. We present a deep-learning-based method for efficiently compressing complex-valued hologram images and videos, and demonstrate superior performance over conventional image and video codecs.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

More Like This
Phase-only hologram video compression using a deep neural network for up-scaling and restoration

Woosuk Kim, Jin-Kyum Kim, Byung-Seo Park, Kwan-Jung Oh, and Young-Ho Seo
Appl. Opt. 61(36) 10644-10657 (2022)

Dynamic-range compression scheme for digital hologram using a deep neural network

Tomoyoshi Shimobaba, David Blinder, Michal Makowski, Peter Schelkens, Yota Yamamoto, Ikuo Hoshi, Takashi Nishitsuji, Yutaka Endo, Takashi Kakue, and Tomoyoshi Ito
Opt. Lett. 44(12) 3038-3041 (2019)

Deep-learning-based computer-generated hologram from a stereo image pair

Chenliang Chang, Di Wang, Dongchen Zhu, Jiamao Li, Jun Xia, and Xiaolin Zhang
Opt. Lett. 47(6) 1482-1485 (2022)

Supplementary Material (1)

Supplement 1: Supplemental Document

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

Cited By

Optica participates in Crossref's Cited-By Linking service. Citing articles from Optica Publishing Group journals and other participating publishers are listed here.



Figures (3)

Fig. 1.
Fig. 1. High-fidelity hologram compression (HiFiHC) pipeline for hologram image and video compression. For image compression, the encoder $E$ encodes one latent code for the hologram’s real and imaginary components. The latent code is quantized by $Q$, entropy coded with side information generated through $P$, decoded by $G$, and assessed by the discriminator $discrim$. For video compression, $E$ takes an H.265 compressed frame with its associated residual and encodes a latent code only for reconstructing the residual. The reconstructed residual is added back to the H.265 frame.
Fig. 2.
Fig. 2. Comparison of HiFiHC, high-efficiency image file format (HEIC), and better portable graphics (BPG) performance on hologram images. Readers are encouraged to zoom in and examine details. The second and third rows in each label mark the peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) for the hologram amplitude (first number) and the refocused DoF image (second number). Source images: PartyTug 6:00AM (left) by Ian Hubert, and Mansion (right) from Kim et al. [21].
Fig. 3.
Fig. 3. Comparison of HiFiHC and H.265 [at lower constant rate factor (CRF)] performance on hologram videos. Readers are encouraged to zoom in and examine details. In each inset, the top right-hand and bottom left-hand numbers mark the PSNR and SSIM for the refocused DoF image. The second row in the frame label marks the frame type and the bits per pixel (bpp) of the HiFiHC latent code. Source images: Big Buck Bunny (top) by Blender Foundation, and Horns (bottom) from Mildenhall et al. [27]. The H.265 (lower CRF) results use CRFs of 15 and 18 for Big Buck Bunny and Horns, respectively, both of which yield a similar number of additional bits per pixel compared with HiFiHC.
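The image-compression path described in the Fig. 1 caption (encode, quantize, entropy code, decode, discriminate) can be sketched end to end. Everything below is an illustrative assumption rather than the authors' network: the "encoder" is plain block averaging, the quantizer is hard rounding with an arbitrary step, and the entropy coder, side-information network $P$, and discriminator are omitted.

```python
import numpy as np

def encode(x, factor=8):
    # Toy stand-in for the learned encoder E in Fig. 1: stack the hologram's
    # real and imaginary parts as channels, then downsample by block averaging.
    chans = np.stack([x.real, x.imag])  # shape (2, H, W)
    c, h, w = chans.shape
    return chans.reshape(c, h // factor, factor, w // factor, factor).mean(axis=(2, 4))

def quantize(y, step=0.05):
    # Q in Fig. 1: hard rounding onto a uniform grid. Entropy coding with
    # side information from P is omitted in this sketch.
    return np.round(y / step) * step

def decode(y_hat, factor=8):
    # Toy stand-in for the decoder G: nearest-neighbour upsampling back to
    # a complex-valued field.
    up = y_hat.repeat(factor, axis=1).repeat(factor, axis=2)
    return up[0] + 1j * up[1]

rng = np.random.default_rng(0)
holo = rng.standard_normal((64, 64)) + 1j * rng.standard_normal((64, 64))
rec = decode(quantize(encode(holo)))
```

A HiFiHC-style codec would replace these stubs with learned convolutional networks trained under the rate-distortion-adversarial losses listed under Equations below, but the data flow (complex field in, compact latent code, complex field out) is the same.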

Equations (6)


$$\begin{aligned} \mathcal{L}_{E,G} &= w_r r(y) + w_{holo} ||x - x'||_1 + w_{fs}d_{fs}(x,x') \\ &\quad - w_D \log(discrim(x', y)). \end{aligned}$$
$$\mathcal{L}_{D} ={-}\log(1-discrim(x', y)) - \log(discrim(x,y)),$$
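The two losses above can be written out directly. The sketch below assumes its inputs (the rate estimate $r(y)$, the focal-stack distortion $d_{fs}$, and the discriminator outputs) are precomputed scalars, and its default weights are placeholders rather than the paper's values.

```python
import numpy as np

def generator_loss(rate, x, x_rec, d_fs, disc_fake,
                   w_r=1.0, w_holo=1.0, w_fs=1.0, w_d=0.01):
    # Encoder/decoder loss: weighted sum of the estimated bitrate r(y), the
    # L1 hologram distortion ||x - x'||_1, a focal-stack distortion d_fs,
    # and a non-saturating adversarial term.
    l1 = np.abs(x - x_rec).sum()
    return w_r * rate + w_holo * l1 + w_fs * d_fs - w_d * np.log(disc_fake)

def discriminator_loss(disc_fake, disc_real):
    # Standard GAN discriminator loss; discrim(.) outputs a probability in
    # (0, 1) that its input is an uncompressed hologram.
    return -np.log(1.0 - disc_fake) - np.log(disc_real)
```

Note the discriminator is conditioned on the latent code $y$ in the equations; here that conditioning is folded into the precomputed `disc_fake` / `disc_real` scalars.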
$$\begin{aligned} \mathcal{L}_{\Delta (E,G)} &= w_{\Delta r} r(\Delta y) + w_{\Delta {holo}} ||\Delta x - \Delta x'||_1 \\ &\quad + w_{\Delta fs}d_{\Delta fs}(\Delta x+x_{265},\Delta x'+x_{265}) \\ &\quad - w_{\Delta D} \log(discrim(\Delta x'+x_{265}, \Delta y)), \end{aligned}$$
$$\mathcal{L}_{\Delta D} ={-}\log(1-discrim(\Delta x'+x_{265}, \Delta y)) - \log(discrim(\Delta x+x_{265},\Delta y)).$$
$$\Delta x_\textrm{P} = x_\textrm{P} - (x_{265 \_\textrm{P}} + \text{warp}(\Delta x'_\textrm{I}, M_{I \to P})),$$
$$\Delta x_\textrm{B} = x_\textrm{B} - (x_{265 \_\textrm{B}} + \text{warp}(\Delta x'_\textrm{I}, M_{I \to B}) + \text{warp}(\overline{\Delta x_\textrm{P}}, M_{P \to B})),$$
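The two residual-prediction equations above can be sketched with a toy warp. The flow maps ($M_{I \to P}$, etc.) are assumed here to be given integer per-pixel offsets and the warp is nearest-neighbour; a real implementation would use sub-pixel interpolation.

```python
import numpy as np

def warp(field, flow):
    # Backward-warp a 2-D field by an integer per-pixel flow. flow has shape
    # (2, H, W) holding (dy, dx) source offsets for each output pixel.
    h, w = field.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(ys + flow[0], 0, h - 1).astype(int)
    src_x = np.clip(xs + flow[1], 0, w - 1).astype(int)
    return field[src_y, src_x]

def p_frame_residual(x_p, x265_p, dx_i_rec, flow_i_to_p):
    # What remains of the P frame after adding the H.265-decoded frame and
    # the motion-compensated reconstructed I-frame residual.
    return x_p - (x265_p + warp(dx_i_rec, flow_i_to_p))

def b_frame_residual(x_b, x265_b, dx_i_rec, dx_p_rec,
                     flow_i_to_b, flow_p_to_b):
    # The B frame borrows motion-compensated residuals from both the I and
    # P frames before its own residual is computed.
    return x_b - (x265_b + warp(dx_i_rec, flow_i_to_b)
                  + warp(dx_p_rec, flow_p_to_b))
```

Only these (smaller) residuals need to be encoded into latent codes, which is what makes the video path cheaper than compressing every frame as an independent image.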