
Block-wise focal stack image representation for end-to-end applications

Open Access

Abstract

In optical imaging systems, the depth of field (DoF) is generally constricted due to the nature of optical lenses. The limited DoF produces partially focused images of the scene. Focal stack images (FoSIs) are a sequence of images focused at successive depths of a scene. FoSIs are capable of extending the DoF of optical systems and provide practical solutions for computational photography, macroscopic and microscopic imaging, and interactive and immersive media. However, the high volume of data remains one of the biggest obstacles to the development of end-to-end applications. To address this challenge, we propose a block-wise Gaussian-based representation model for FoSIs and utilize this model to solve the problems of coding, reconstruction and rendering for end-to-end applications. Experimental results demonstrate the high efficiency of the proposed representation model and the superior performance of the proposed schemes.

© 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Focal stack imaging is an emerging imaging technology for computational photography and for macroscopic and microscopic imaging. As the essential ingredient of this technology, focal stack images (FoSIs) are a set of images focused at multiple depths of a scene. FoSIs record rich volumetric information of a scene, which enhances the capability of scene understanding. Consequently, FoSIs play a pivotal role in life sciences [1–3], 3D medical imagery [4,5], focus-tunable displays [6–8], light field reconstruction [9,10], distance measurement [11–13], immersive media [14,15] and other related communities.

FoSIs are generally captured by changing the focal length or varying the focal plane of an optical system, such as a camera or microscope. Optical systems are typically subject to a fundamental trade-off between depth of field (DoF) and signal-to-noise ratio (SNR): the aperture of the lens must be fully opened to obtain a high-SNR output image, which results in a shallow DoF [16]. Examples of FoSI acquisition by optical systems are shown in Fig. 1. Figure 1(a) shows a prototype microscope with a liquid lens, whose objective has tunable focus. FoSIs of a specimen are captured on the detector plane by changing the focal length, which allows samples to be observed at any depth with high precision. Consequently, the DoF of the optical system can be extended, and the extended depth of field (EDoF) is suitable for observing cells, structures and tissues [17]. Similarly, Fig. 1(b) shows an ordinary digital camera. The light from the scene converges on varying focal planes and generates a series of 2D images with shallow DoF. By extending the DoF, a comprehensive description of the scene with high SNR is achieved.

Fig. 1. DoF limitation of optical systems and extended depth of field (EDoF). (a) Microscope with tunable lens. (b) Ordinary digital camera with varying focal planes.

However, FoSIs are highly redundant in the $z$-dimension. The aforementioned applications of FoSIs are realized at the cost of dense depth sampling, which yields huge volumes of data. Typically, in existing works [10,18–21] FoSIs are used locally after acquisition without quality loss, where transmission is not considered or storage is not a challenge. However, when remote application or transmission of FoSIs is required [22,23], compression of these images becomes a new paradigm for both image/video coding and the aforementioned fields. In form, FoSIs can easily be organized into a video sequence. Different from ordinary video sequences, however, the changes among FoSIs come from the propagation of focus rather than the motion of cameras or objects, so they exhibit a special form of visual redundancy. Compressing this type of image with existing encoders is challenging, because existing encoders are originally designed for ordinary images and videos; there is a wide gap between them and FoSIs. For better compression performance, encoders should additionally take this special form of visual redundancy into account.

In existing works, there are two mainstream schemes for the compression of FoSIs: image-based schemes and video-based schemes. In image-based schemes, FoSIs are directly compressed via JPEG-like image compression tools in 3D medical imagery applications [24]. Each image of the focal stack is treated as an independent image, and inter-frame redundancies are not considered. In [25], 1D signals of frequency components are obtained by a 3D-DCT and a 3D zigzag scan and then compressed via Huffman coding; due to the sequential organization of the signals, both spatial and inter-frame redundancies can hardly be eliminated. ZPEG employs differential pulse-code modulation to decorrelate neighboring FoSIs after the DCT [26]; because it takes the whole image as input, the redundancies of co-located blocks cannot be exploited. In video-based schemes, the state-of-the-art video encoder, high efficiency video coding (HEVC), has recently been applied to eliminate inter-frame redundancies of FoSIs [27,28]. These schemes obtain better compression performance than image-based schemes. However, the motion-based inter prediction employed in video-based schemes cannot fully exploit the redundancies induced by the change of focus; the special type of redundancy in FoSIs accounts for this incompatibility.

In this paper, we propose a new Gaussian-based representation model to describe this special type of redundancy in FoSIs. In the representation, the sharpest block of the focal stack at each position is used to represent the blocks at the same position in the remaining images (co-located blocks). Based on this representation model, we develop a coding scheme that fully exploits the redundancies of FoSIs. First, the FoSIs are split into blocks of fixed size. Each block is measured by the proposed spatial-weighted sharpness metric, and the sharpest block is selected as the basis of its co-located blocks. Subsequently, all bases are stitched into an all-in-focus image according to their original positions, and the obtained all-in-focus image is compressed via image/video coding tools. Finally, the compressed bases are convolved with a 2D Gaussian function to approximate their co-located blocks, and the Gaussian parameters are compressed via lossless coding. Moreover, a reconstruction scheme is presented to reconstruct the complete focal stack from a limited number of images at the user-side, and a frame interpolation rendering scheme is proposed to improve the smoothness of FoSI displays. To the best of our knowledge, this is the first work to compress, reconstruct and render FoSIs by using their representation model, which builds a bridge between the source-side and the user-side. The proposed scheme is verified on 10 public test sequences, and the experimental results show that state-of-the-art performance can be obtained. Our contributions can be summarized as follows:

  • A block-wise spatial-weighted sharpness metric is proposed to assess the focus level of each FoSI block. The boundaries of a block may contain objects at depths different from the central region. To weaken the impact of the boundary region on sharpness assessment, we reduce the weight of the boundary regions and keep the weight of the central region. The assessment results are consistent with human visual perception.
  • By analyzing the point spread function of optical systems, a new block-wise Gaussian representation model is proposed to describe the characteristics of focus changes in FoSIs. In this representation, the focus level of each FoSI block is assessed by the proposed sharpness metric. The sharpest block serves as the basis to approximate the remaining blocks via Gaussian convolution, with the Gaussian parameters estimated by solving an optimization problem. Consequently, FoSIs can be efficiently represented by Gaussian parameters and the corresponding bases.
  • Based on the proposed representation model, we propose a series of efficient coding, reconstruction and rendering schemes for end-to-end applications. A coding scheme at the source-side aims at exploiting the inter-frame redundancies of FoSIs. A reconstruction scheme is presented to reconstruct the complete focal stack from a limited number of images. A frame interpolation rendering scheme improves the smoothness of FoSI displays at the user-side. These schemes facilitate various visual effects in end-to-end applications.

The remainder of this paper is organized as follows. Sec. 2 describes the principle of the optical system for capturing FoSIs. In Sec. 3, we present all details of the proposed Gaussian-based representation of FoSIs. Representation-based schemes for coding, reconstruction and rendering are presented in Sec. 4. Experiments and discussions are given in Sec. 5. Finally, we conclude our work in Sec. 6.

2. Optical analysis of FoSIs

Existing FoSI acquisition approaches can be divided into two fundamental categories: time-sequential capture by conventional optical systems, such as a digital camera or microscope, and refocusing after a single exposure by plenoptic systems, such as a camera array or plenoptic camera (a.k.a. light field camera). To simplify the optical analysis, an ordinary camera is chosen; its optical structure is depicted in Fig. 2. As shown in Fig. 2, an ordinary camera is composed of a lens and a photo detector. Light rays emitted from objects in the scene propagate through the lens. They converge to a point, or give rise to a circle of confusion, at the detector as the detector plane shifts along the optical axis. These distributions at the detector ultimately produce focused or defocused regions, respectively, following the rule of the point spread function (PSF) [29].

Fig. 2. Schematic diagram of the optical system for capturing FoSIs.

The PSF is the main brick that builds up the captured images. Typically, the PSF of a diffraction-limited optical system is defined as the response to a monochromatic incoherent point source [30]. The term diffraction-limited means that there is no degradation of the optical system other than unavoidable diffraction. The theoretical PSF is ideally a diffraction pattern referred to as the Airy pattern, shown in Fig. 3. The pattern consists of a series of concentric rings around a bright central Airy disc. The Airy pattern can be described by the following equation:

$$h_a(x,y)=\left(\frac{2J_{1}\left(\pi D \sqrt{x^{2}+y^{2}}/{\lambda f} \right)}{\pi D \sqrt{x^{2}+y^{2}}/{\lambda f}}\right)^{2},$$
where $J_{1}(\cdot )$ is a Bessel function of the first kind and first order; $\lambda$, $D$ and $f$ are the wavelength of light, the aperture diameter, and the focal length of the lens, respectively. The central disc contains about 85% of the energy of the pattern. For practical purposes, it can be approximated by a Gaussian fit, shown in Fig. 3(c). The Gaussian profile is expressed as follows:
$$h_{g}(x, y) = \frac{1}{2\pi{\sigma}^2}\exp\left(-\frac{x^2+y^2}{2{\sigma}^2}\right),$$

Fig. 3. Ideal PSF of optical systems is an Airy pattern. (a) Airy pattern. (b) 3D visualization of the Airy pattern. (c) Gaussian approximation of the Airy disc.
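To make Eqs. (1) and (2) concrete, the following minimal Python sketch evaluates the Airy pattern and a peak-normalized Gaussian fit along a radial cut. The aperture, focal length and wavelength values are illustrative assumptions, as is the commonly quoted fit $\sigma \approx 0.42\lambda f/D$; none of these numbers come from this work.

```python
# Minimal sketch of the Airy-pattern PSF (Eq. 1) and its Gaussian
# approximation (Eq. 2). D, f and lam are illustrative assumptions.
import numpy as np
from scipy.special import j1  # Bessel function of the first kind, order 1

D, f, lam = 5e-3, 50e-3, 550e-9     # aperture (m), focal length (m), wavelength (m)

def airy_psf(x, y):
    rho = np.pi * D * np.sqrt(x**2 + y**2) / (lam * f)
    rho = np.where(rho == 0.0, 1e-12, rho)   # 2*J1(rho)/rho -> 1 as rho -> 0
    return (2.0 * j1(rho) / rho) ** 2

def gaussian_psf(x, y, sigma):
    return np.exp(-(x**2 + y**2) / (2.0 * sigma**2)) / (2.0 * np.pi * sigma**2)

x = np.linspace(-20e-6, 20e-6, 401)          # 40 um radial cut on the detector
sigma = 0.42 * lam * f / D                   # commonly used Gaussian fit to the Airy disc
airy = airy_psf(x, 0.0)
gauss = gaussian_psf(x, 0.0, sigma)
# Peak-normalize both profiles before comparing their shapes.
airy_n, gauss_n = airy / airy.max(), gauss / gauss.max()
```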

The Airy-disc PSF is valid only near the optical axis when the detector plane coincides with the focal plane. In this case, the effects of diffraction on the optical system far exceed all other factors [30]. However, in a practical system for capturing FoSIs, as shown in Fig. 2, the detector plane shifts along the optical axis: light from the scene is not always near the optical axis, and the detector plane does not always coincide with the focal plane. In this case, geometrical optics plays a more vital role in the defocused optical system than diffraction optics. When diffraction effects can be neglected, the geometrical PSF is usually modeled by a pillbox function or a 2D Gaussian function. The pillbox function is described as follows:

$$h_p(x,y) =\left\{\begin{array}{ll} 0 & \sqrt{x^2+y^2} > r\\ \frac{1}{\pi r^2} & \sqrt{x^2+y^2} \leq r \end{array}\right.$$
where $r$ is the blur radius, which is related to the shift of the detector plane. The 2D Gaussian function is given by Eq. (2), which has the same form as the Gaussian approximation of the Airy disc.

However, the geometrical PSF model is invalid when the detector plane is near the focal plane, which corresponds to small amounts of defocus [30]. A practical defocused optical system for capturing FoSIs should take into account both the diffraction PSF and the geometrical PSF. Therefore, the PSF of optical systems for capturing FoSIs can be approximately modeled by a Gauss-like function as follows:

$$h(x,y) =\left\{\begin{array}{ll} h_a(x,y) \approx h_g(x,y) & \textrm{if near the optical axis and focal plane}\\ h_g(x,y) & \textrm{otherwise} \end{array}\right. \Rightarrow \ h_g(x,y),$$

The uniform Gaussian function is a practical and intuitive model. It avoids a quantitative definition of "near the optical axis" and of the degree of coincidence between the detector plane and the focal plane.

3. Characteristics and representation of FoSIs

3.1 Block-wise characteristics of FoSIs

Regions of FoSIs are sharp when they are in focus and blurred when they are out of focus. The block-wise characteristics of FoSIs are shown in Fig. 4. Figure 4(a) shows co-located blocks of FoSIs; for example, blocks $A$, $B$ and $C$ belong to different sets of co-located blocks. Figure 4(b) shows close-ups of co-located blocks. Visual redundancies can be found between adjacent co-located blocks, and these redundancies are induced by the change of the focus region rather than the motion of the scene. The curves in Fig. 4(c) depict the sharpness changes inside co-located blocks, measured by the proposed sharpness metric. The peak of a curve denotes the sharpest block among the co-located blocks. The sharpness score of a region indicates its focus level. Thus, Fig. 4(c) shows that the focus levels of co-located blocks vary with their indexes in the focal stack sequence. Moreover, the curves tend to be smooth and have only one peak, which demonstrates the strong correlation among co-located blocks. Modeling this correlation and exploiting the redundancies of FoSIs are presented in the subsections below.

Fig. 4. Characteristics of FoSIs. (a) Co-located blocks of FoSIs. (b) Visual redundancies of co-located blocks. (c) Smooth changes of sharpness among co-located blocks.

3.2 Representation model of FoSIs

Based on the theoretical optical analysis and the practical characteristics of FoSIs, we propose a new Gaussian-based representation model of FoSIs, shown in Fig. 5. In this model, FoSIs are split into blocks of fixed size. The focus level of each block is measured by the proposed spatial-weighted sharpness metric, and the sharpest block is selected as the basis and used to represent its co-located blocks. The co-located blocks are approximated by convolving their basis with the Gaussian PSF.

Fig. 5. Flowchart of the proposed representation model of FoSIs.

3.2.1 Block-wise spatial-weighted sharpness metric

Various pixel-wise sharpness metrics are used to assess the focus levels of images [31,32]. However, these metrics are not suitable for block-based compression of focal stack images. The boundaries of a block may contain objects at different depths, depending on the fineness of the block partition. Thus, assessing the sharpness of a block should take into account the relative importance of different regions in the spatial domain.

To reduce the impact of the boundary region on sharpness assessment, we reduce the importance of the boundary regions and keep the importance of the central region. A template of spatial weights with a size of $64\times 64$ is shown in Fig. 6.

Fig. 6. Example of the spatial weight distribution.

In this paper, we propose a block-wise spatial-weighted sharpness metric to assess the focus level of co-located blocks, shown in Fig. 7. First, we compute the gradient magnitudes of co-located blocks as the product of the gradients in the horizontal and vertical directions. The gradient of a position in one direction is the absolute deviation between the current and the adjacent position. For the $i$-th block of the co-located blocks, the gradient $G_i(x,y)$ at position $f_i(x,y)$ is computed as follows:

$$G_i(x,y) = \left|f_i(x,y)-f_i(x+1,y)\right|\cdot\left|f_i(x,y)-f_i(x,y+1)\right|,$$
where $f_i(x+1,y)$ and $f_i(x,y+1)$ are the positions adjacent to the current position $f_i(x,y)$. A $64\times 64$ spatial-weight template is expressed as Eq. (6).
$$W(x,y) =\left\{\begin{array}{ll} 1 & \left|x\right| \leq 23, \left|y\right| \leq 23, x,y\in \mathbb{Z} \\ 1-(\left|x\right|-23)/10 & \left|x\right| \geq \left|y\right|,\left|x\right| \geq 24, x,y\in \mathbb{Z}\\ 1-(\left|y\right|-23)/10 & \left|x\right| < \left|y\right|,\left|y\right| \geq 24, x,y\in \mathbb{Z} \end{array}\right.$$

Fig. 7. Flowchart of the proposed spatial-weighted sharpness metric.

Finally, the gradients of all positions in a block are weighted by the template as follows:

$$S_i = \sum_{x=1}^M \sum_{y=1}^N G_i(x,y) \cdot W(x,y),$$
where $W(x,y)$ is the spatial weight defined in Eq. (6); the weighted sum $S_i$ is the sharpness score of the block. For a focal stack of $z$ images, we calculate the sharpness scores of all $z$ co-located blocks and select the block with the highest score as the basis of the co-located blocks:
$$i = \mathop {\arg \max}_{1\leq i\leq z}(S_i)$$

The example in Fig. 4(c) depicts the changes of sharpness scores inside co-located blocks. The peak of a curve represents the basis of the co-located blocks. We can see that the part of a curve on either side of the peak changes, overall, monotonically. This characteristic can further be used in sparse reconstruction or frame interpolation rendering of FoSIs.
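A minimal Python sketch of the metric is given below, covering Eqs. (5)–(8) for $64\times 64$ blocks. Indexing the weight template with centered coordinates spanning $-32..31$ is our reading of the coordinate convention in Eq. (6).

```python
# Minimal sketch of the block-wise spatial-weighted sharpness metric
# (Eqs. 5-8). Centered template coordinates are an assumption.
import numpy as np

def weight_template(size=64):
    half = size // 2
    c = np.arange(-half, half)                  # centered coordinates
    X, Y = np.meshgrid(c, c, indexing='ij')
    m = np.maximum(np.abs(X), np.abs(Y))        # Chebyshev distance from the center
    W = np.ones((size, size))
    ring = m >= 24                              # boundary ring of Eq. (6)
    W[ring] = 1.0 - (m[ring] - 23) / 10.0       # linear falloff toward the border
    return W

def sharpness(block, W):
    # Eq. (5): product of absolute horizontal and vertical differences;
    # the last row/column have no forward neighbor and are dropped.
    gx = np.abs(block[:-1, :-1] - block[1:, :-1])
    gy = np.abs(block[:-1, :-1] - block[:-1, 1:])
    return float(np.sum(gx * gy * W[:-1, :-1]))  # weighted sum of Eq. (7)

def select_basis(colocated_blocks):              # Eq. (8)
    W = weight_template(colocated_blocks[0].shape[0])
    return int(np.argmax([sharpness(b, W) for b in colocated_blocks]))
```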

3.2.2 Gaussian approximation

The focus level of each block is measured by the spatial-weighted sharpness metric, and the sharpest block is selected as the basis to represent its co-located blocks. Typically, a blurred image of an object can be approximated as the convolution of a sharp image of the same object with a 2D Gaussian function [33,34]:

$$g(x,y) = f(x,y) \otimes h(x,y,\sigma) + n,$$
where $g(x,y)$, $f(x,y)$ and $n$ are the blurred image, the sharp image and noise, respectively; $(x, y)$ is a position in the image; $\otimes$ denotes the convolution operation; and $h(x,y,\sigma )$ is the 2D Gaussian function defined as follows:
$$h(x,y,\sigma) = \frac{1}{2\pi{\sigma}^2}\exp\left(-\frac{x^2+y^2}{2{\sigma}^2}\right),$$

where $\sigma$ is the standard deviation of the Gaussian function; it indicates the blur level of the blurred image relative to the sharp image.

Generalizing to the block level, it is reasonable to approximate the co-located blocks by the sharpest block. The key is to find the optimal Gaussian parameter of each co-located block, which can be calculated by solving a convex optimization problem as follows:

$$D(\sigma) = \frac{1}{M \times N} \sum_{x=1}^M \sum_{y=1}^N \left(B_i(x,y) - \hat{B}(x,y) \otimes h(x,y;\sigma)\right)^2,$$
$$\sigma_o =\mathop {\arg \min}_{\sigma} D(\sigma),$$
where $B_i(x,y)$ is the $i$-th block of the co-located blocks and $\hat {B}(x,y)$ stands for their basis; $h(x,y;\sigma )$ denotes the 2D Gaussian function with standard deviation $\sigma$ (also called the Gaussian parameter in this paper), and $\hat {B}(x,y) \otimes h(x,y;\sigma )$ represents the prediction of $B_i(x,y)$ from the basis $\hat {B}(x,y)$; $D(\sigma )$ is the distortion of the prediction, measured by the mean squared error (MSE). The optimal $\sigma _o$ for the $i$-th block $B_i(x,y)$ is obtained by minimizing the distortion. Since the distortion $D(\sigma )$ is convex in the prediction $\hat {B}(x,y) \otimes h(x,y;\sigma )$, a local optimum of $D(\sigma )$ is the global optimum. The example in Fig. 8(a) shows the relation between $D(\sigma )$ and $\sigma$; this relation helps narrow the range of the optimal solution.
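A minimal sketch of this one-dimensional search follows; scipy.ndimage.gaussian_filter stands in for the 2D Gaussian convolution, and the upper bound on $\sigma$ is an illustrative assumption.

```python
# Minimal sketch of Eqs. (11)-(12): find the Gaussian parameter that
# best blurs the basis into the target co-located block.
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.optimize import minimize_scalar

def distortion(sigma, basis, target):
    pred = gaussian_filter(basis, sigma)      # prediction: basis convolved with h(.;sigma)
    return np.mean((target - pred) ** 2)      # MSE of Eq. (11)

def optimal_sigma(basis, target, sigma_max=10.0):
    # D(sigma) has a single minimum, so a bounded 1-D search suffices.
    res = minimize_scalar(distortion, bounds=(1e-3, sigma_max),
                          args=(basis, target), method='bounded')
    return res.x
```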

Fig. 8. Visualization of distributions. (a) The relation between $D(\sigma )$ and the Gaussian parameter $\sigma$. (b) The distribution of the optimal $\sigma _o$ over all blocks of the FoSIs. (c) The difference of the optimal $\sigma _o$ between two adjacent co-located blocks.

For FoSIs sampled at regular depth intervals, the example in Fig. 8(b) illustrates the distribution of the optimal Gaussian parameters of block-wise FoSIs. The $Z$-axis represents the value of the optimal Gaussian parameter $\sigma _o$ of each block, with the value for the basis set to zero. The $X$-axis indicates the index of each block among its co-located blocks. The $Y$-axis is the block partition of the FoSIs, reorganized into sequential blocks by a zigzag scan. It reveals that the optimal Gaussian parameter changes monotonically on each side of the zeros of the $Z$-$X$ function. This monotonicity helps analyze the relationship of the optimal $\sigma _o$ between two adjacent blocks.

For all co-located blocks of FoSIs, we calculate the difference of the optimal Gaussian parameter $\sigma _o$ between two adjacent blocks, shown in Fig. 8(c). We find that most of the differences are distributed in a narrow range near zero. Thus, the optimal Gaussian parameter of one block can be used to predict that of adjacent blocks. The prediction between two adjacent blocks is formulated as follows:

$$\sigma^{adj} = \sigma_o + \delta \quad \delta \in \mathcal{R}, \ \mathcal{R}=\{\delta_0,\delta_1,\ldots,\delta_m\},$$
where $\sigma _o$ is the optimal Gaussian parameter of the current block; $\sigma ^{adj}$ is the prediction of the optimal Gaussian parameter of the adjacent block; and $\delta$ is an offset term drawn from a narrow range $\mathcal {R}$ with $m$ candidates. Therefore, solving the optimization problem for the adjacent block can be converted into finding the range and determining the optimal candidate within it. The conversion is shown as follows:
$$\sigma_o^{adj}= \mathop {\arg \min} D(\sigma^{adj}) \Rightarrow \ \delta_o = \mathop {\arg \min}_{\delta \in \mathcal{R}} D(\sigma_o + \delta), \ \mathcal{R}=\{\delta_0,\delta_1,\ldots,\delta_m\} \Rightarrow \ \mathcal{R}, \delta,$$
where $\sigma _o^{adj}$ is the optimal Gaussian parameter of the adjacent block; the offset $\delta$ lies within the range $\mathcal {R}=\{\delta _0,\delta _1,\ldots ,\delta _m\}$, and the optimal offset $\delta _o$ is obtained by minimizing the distortion $D(\sigma _o + \delta )$. Thus, the key is to find the range $\mathcal {R}$ and the exact location of the offset $\delta$ within it. By analyzing the optimal Gaussian parameters of numerous adjacent blocks for different scenes, the range $\mathcal {R}$ can be obtained statistically. The significance of $\mathcal {R}$ is that it localizes the solution to a narrow range of Gaussian parameters; solving for the offset locally greatly reduces the computational cost compared with solving the global optimization problem defined in Eq. (14).
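This conversion can be sketched as a small grid search over the offset candidates, reusing distortion() from the previous sketch. The step size and half-range below are illustrative; the paper determines $\mathcal {R}$ statistically (e.g. $[-0.5,0.5]$ for the synthetic scenes in Sec. 5.2.2).

```python
# Minimal sketch of Eq. (14): predict the adjacent block's parameter
# from the current sigma_o plus a small offset searched over R.
import numpy as np

def sigma_adjacent(sigma_o, basis, target, half_range=0.5, step=0.1):
    offsets = np.arange(-half_range, half_range + step / 2, step)   # candidate set R
    costs = [distortion(max(sigma_o + d, 1e-3), basis, target)      # clamp sigma > 0
             for d in offsets]
    return max(sigma_o + float(offsets[int(np.argmin(costs))]), 1e-3)
```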

4. Representation based applications

The representation of FoSIs decomposes the redundant data into bases and Gaussian parameters at the source-side, greatly reducing the data volume needed to describe this special format of images. Since FoSIs are mainly used at the user-side, compression, reconstruction and rendering are essential. Based on the proposed representation model of FoSIs, we develop a series of new schemes for coding, reconstruction and rendering applications.

4.1 Coding

To recover FoSIs with high quality and limited bandwidth at the user-side, the bases and Gaussian parameters of the proposed representation model should be compressed before transmission. A Gaussian-representation-based coding scheme is proposed to fully exploit the redundancies of FoSIs, shown in Fig. 9. First, all bases are selected from the co-located blocks by the spatial-weighted sharpness metric and stitched into an all-in-focus image according to their original positions. The generated all-in-focus image contains all the bases of the FoSIs and is compressed via an HEVC encoder; the inter-basis redundancies are eliminated by the intra-frame prediction of the encoder. Then, the compressed bases are used to predict their co-located blocks by Gaussian approximation. The floating-point Gaussian parameters are magnified tenfold and converted to integers, and the integer parameters are rearranged into a 2D matrix. Third, the matrix is compressed via lossless coding: each position of the matrix is predicted from its upper, left and upper-left positions, and the prediction error is converted to binary form and written into the bitstream via entropy coding. Finally, the bitstream of Gaussian parameters is merged with the previously generated bitstream of bases.
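A minimal sketch of the Gaussian-parameter path is given below: scale by ten, round to integers, then predict each entry from its neighbors. Combining the three neighbors with a median is our assumption (the paper does not specify the combination), and the entropy coder itself is omitted.

```python
# Minimal sketch of the lossless path for Gaussian parameters.
import numpy as np

def parameter_residuals(sigma_map):
    q = np.rint(np.asarray(sigma_map) * 10).astype(np.int32)  # tenfold, float -> int
    pred = np.zeros_like(q)
    pred[1:, 1:] = np.median(                                 # up, left, upper-left
        np.stack([q[:-1, 1:], q[1:, :-1], q[:-1, :-1]]), axis=0
    ).astype(np.int32)
    pred[0, 1:] = q[0, :-1]          # first row: only the left neighbor exists
    pred[1:, 0] = q[:-1, 0]          # first column: only the upper neighbor exists
    return q - pred                  # residuals passed to entropy coding
```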

Fig. 9. Flowchart of the proposed coding scheme.

4.2 Reconstruction

To reconstruct the complete focal stack from a limited number of images, a reconstruction scheme is presented. In this scheme, only half of the FoSIs at the source-side serve as input to reconstruct the complete stack at the user-side. First, the images with odd indexes in the FoSI sequence are measured by the spatial-weighted sharpness metric, and the bases of the odd images are stitched into an all-in-focus image. Then, the obtained all-in-focus image is compressed via intra prediction. The complete FoSIs, including odd and even images, are approximated from the compressed all-in-focus image. The Gaussian parameters of the complete stack are compressed at the source-side and reconstructed at the user-side.
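A minimal sketch of the user-side loop follows; it assumes the decoded all-in-focus image has already been assembled from the basis blocks, and every frame of the complete stack is approximated by blurring those blocks with the transmitted per-block Gaussian parameters.

```python
# Minimal sketch of user-side reconstruction of the complete stack.
import numpy as np
from scipy.ndimage import gaussian_filter

def reconstruct_stack(all_in_focus, sigma_maps, block=64):
    # all_in_focus: 2-D array holding the decoded bases at their positions
    # sigma_maps: one 2-D array per frame, one sigma per block
    H, W = all_in_focus.shape
    frames = []
    for sig in sigma_maps:
        frame = np.empty_like(all_in_focus)
        for bi in range(H // block):
            for bj in range(W // block):
                ys, xs = bi * block, bj * block
                frame[ys:ys + block, xs:xs + block] = gaussian_filter(
                    all_in_focus[ys:ys + block, xs:xs + block], float(sig[bi, bj]))
        frames.append(frame)
    return frames
```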

4.3 Rendering

A Gaussian-representation-based frame interpolation rendering scheme is proposed to improve the smoothness of FoSI displays: new images with intermediate focus levels can be generated at the user-side. First, the bases and Gaussian parameters of the co-located blocks are decoded after transmission; the Gaussian parameter of a co-located block represents its focus level. Then, a new Gaussian parameter is obtained by interpolating the Gaussian parameters of adjacent co-located blocks, and a block with the new focus level is synthesized by convolving the basis with the interpolated Gaussian parameter. Finally, the new images are interpolated and placed in the FoSI sequence, as shown in Fig. 10.
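A minimal sketch of the per-block interpolation is shown below; averaging the two neighboring Gaussian parameters with equal weights is the simplest instance of the interpolation described above.

```python
# Minimal sketch of frame interpolation rendering for one block.
from scipy.ndimage import gaussian_filter

def interpolate_block(basis, sigma_prev, sigma_next):
    sigma_new = 0.5 * (sigma_prev + sigma_next)   # intermediate focus level
    return gaussian_filter(basis, sigma_new)      # new block between the two frames
```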

Fig. 10. Flowchart of the proposed frame interpolation rendering scheme.

5. Experiments and discussions

5.1 Experiment settings

In the proposed FoSI coding scheme, the basis of the co-located blocks is selected by the proposed spatial-weighted sharpness metric. All bases are then stitched into an all-in-focus image according to their original positions. The obtained all-in-focus image is compressed via the HEVC Test Model HM 16.20 with the All Intra configuration [35]. The compressed all-in-focus image is used to predict the complete FoSIs via the Gaussian-based representation, and the obtained Gaussian parameters are compressed by lossless coding. Therefore, the bitstreams of the coded all-in-focus image and the corresponding Gaussian parameters constitute the total bits the FoSIs consume. On the other hand, FoSIs can be considered a pseudo video sequence, and HM 16.20 is employed to compress the sequence with the Low Delay P Equal Quantization Parameter (LDPEQP) configuration. Quantization is the process of mapping continuous signal values to a smaller set of discrete values, and it is the main lossy process of image compression. Quantization in an encoder is controlled by the quantization parameter (QP); as the QP increases, both the bitrate requirement and the compression quality decrease. HM 16.20 with the LDPEQP configuration is set as the benchmark coding scheme in our experiments. In addition, we compare the proposed coding scheme with our previous FoSI coding scheme, Gaussian guided inter prediction (GGIP), also with the LDPEQP configuration [36].

In this experiment, we evaluate the performance of the proposed coding scheme on 7 synthetic scenarios and 3 realistic scenarios used as test sequences. The QPs are set to 22, 27, 32 and 37 for synthetic scenarios, and to 32, 37, 42 and 47 for realistic scenarios. The synthetic scenarios are generated from a light field dataset [37] captured by a Lytro Illum camera. Specifically, the function $LFFiltShiftSum$ of Light Field Toolbox v0.4 [38] is used to refocus the sub-aperture images of a light field onto 41 focal planes. The focal planes are parameterized by $slope$, which determines the depth we focus on. All FoSIs of the synthetic scenarios are well aligned, since the sub-aperture images are shifted to the center view. The realistic scenarios are from the Focus-Path dataset [39], a benchmark dataset of natural images in digital pathology. These FoSIs are generated by scanning slide samples at different focal planes using a Huron TissueScope LE1.2, a laser confocal microscope. Due to slight magnification changes, image registration using a homography transform is required for the realistic scenarios. Both synthetic and realistic scenarios are converted into the YUV420 color space with 8-bit precision. Detailed information on all scenarios is listed in Table 1.

Table 1. Detailed information about test sequences

5.2 Experimental parameter discussions

Two experimental parameters must be determined: the size of the co-located blocks and the range term $\mathcal {R}$ for solving the optimization problem. These parameters are highly related to the efficiency of the proposed schemes. Two experiments are conducted to determine them.

5.2.1 Determining the block size

The coding scheme based on the proposed representation is block-wise, so the block size affects the rate-distortion (RD) performance, which is a trade-off between compression quality and bit consumption. From the perspective of a video codec, compression quality benefits from a finer partition pattern (smaller block size); however, it consumes more bitrate due to more parameters. There is thus a trade-off between block size and RD performance.

The experiments for determining the block size are performed on synthetic and realistic scenarios. Specifically, we include I02 and I05 for the synthetic scenarios, and I09 and I10 for the realistic scenarios. Three block sizes are tested for each type of scenario in our coding scheme: $32\times 32$, $64\times 64$ and $128\times 128$ for synthetic scenarios, and $64\times 64$, $128\times 128$ and $256\times 256$ for realistic scenarios.

The results are presented in Fig. 11. As can be seen, a block size of $64\times 64$ generally yields better coding performance than the other block sizes in all involved synthetic scenarios. More specifically, $32\times 32$ is slightly inferior to $64\times 64$ at higher bitrates, and $128\times 128$ is close to, or even better than, $64\times 64$ at lower bitrates for I02 and I05. For realistic scenarios, the behavior differs from that of the synthetic scenarios. A block size of $128\times 128$ has almost the same coding performance as $256\times 256$, and both are better than $64\times 64$ for I09. For I10, $128\times 128$ achieves the best performance at lower bitrates and intermediate performance among the three sizes at higher bitrates. In general, overall gains are obtained when the block size is fixed to $64\times 64$ for synthetic scenarios and $128\times 128$ for realistic scenarios. These block sizes are adopted in all the following experiments.

Fig. 11. Comparison of different block sizes.

5.2.2 Locating the range

The experiments are performed on synthetic and realistic scenarios; specifically, we include I05 for the synthetic scenarios and I09 for the realistic scenarios. On the one hand, the bases contained in the compressed all-in-focus image are used to predict their corresponding co-located blocks; on the other hand, uncompressed bases are used. In this way, the optimal Gaussian parameters of each block are obtained in both the compressed and uncompressed cases.

We calculate the difference of the optimal Gaussian parameters between two adjacent blocks and statistically analyze the range of this difference, listed in Table 2. It can be seen that more than $90\%$ of the differences are distributed in the range $[-0.4,0.4]$ for I05, and at least $95\%$ are in $[-0.6,0.6]$. For I09, more than $94\%$ of the differences fall in the range $[-1.8,1.8]$, and overall $98\%$ are within $[-2.0,2.0]$. Moreover, we find no significant differences in the proportions between the compressed and uncompressed cases when the ranges are wider than $[-0.5,0.5]$ for I05 and $[-1.8,1.8]$ for I09. This makes it possible to predict the rough range for the compressed case before compressing the bases. Therefore, the ranges $[-0.5,0.5]$ and $[-1.8,1.8]$ are set as the local ranges in the following experiments for synthetic and realistic scenarios, respectively.

Table 2. Range distribution of the difference of optimal Gaussian parameters between two adjacent blocks

5.3 Overall performance

5.3.1 Coding performance

The performances of the benchmark, GGIP and the proposed scheme are evaluated by RD performance and new-view synthesis performance. The RD performance of the Y-component is measured by Bjontegaard-Delta bitrate (BDBR) and Bjontegaard-Delta PSNR (BDPSNR) [40], while the synthesis performance is assessed by linear view synthesis [41]. New views of $11\times 11$ are synthesized using the compressed FoSIs. Due to the lack of ground truth for the realistic scenarios, we only synthesize new views for the synthetic scenarios. The results of these schemes are listed in Table 3; the abbreviations used there are explained in Table 4.

Table 3. RD and rendering performance of all coding schemes

Table 4. Explanations of the abbreviations used in Table 3

It can be seen from Table 3 that both GGIP and the proposed coding scheme obtain gains over the benchmark coding scheme on all test sequences, and the proposed coding scheme performs better overall than GGIP in terms of both RD performance and synthesis performance. For synthetic scenarios, the proposed coding scheme achieves 65.31% average bitrate saving and 3.31 dB average PSNR gain in RD performance, and 71.85% average bitrate saving and 0.26 dB average PSNR gain in synthesis performance. Specifically, 70.26% bitrate saving and 3.98 dB PSNR gain are provided for I03 in RD performance; as much as 83.01% bitrate saving for I07 and 0.37 dB PSNR gain for I05 are achieved in synthesis performance. For realistic scenarios, the proposed coding scheme provides 48.86% bitrate saving and 1.87 dB PSNR gain on average; in particular, I09 achieves up to 55.42% bitrate saving and 2.50 dB PSNR gain.

The RD performance of all schemes for test sequences I03 and I09 is shown in Fig. 12. These RD curves reveal the relationship between rate (bitrate consumption) and distortion (compression quality) for all coding schemes. If we draw a horizontal line in Fig. 12(a), its intersections with the curves show the rate consumption of all schemes at the same compression quality: the benchmark, GGIP and proposed coding schemes consume approximately 63000, 42000 and 20000 bytes, respectively. For test sequence I03, with 41 frames in the YUV420 storage format, the original uncompressed size is $624 \times 432 \times 1.5 \times 41 = 16578732$ bytes. Therefore, the compression ratios of the benchmark, GGIP and proposed schemes are $16578732 / 63000 = 263.15$, $16578732 / 42000 = 394.73$ and $16578732 / 20000 = 828.94$, respectively. Similarly, the original size of I09 is $1024 \times 1024 \times 1.5 \times 9 = 14155776$ bytes, and the compressed size of the proposed scheme at QP 42 is $17263 + 468 = 17731$ bytes, as shown in Table 3. Therefore, the compression ratio of the proposed scheme for test sequence I09 is $14155776 / 17731 = 798.36$. The proposed coding scheme achieves significant compression ratios for both synthetic and realistic test sequences.

Fig. 12. RD performance of all schemes.

Moreover, we assess the subjective quality of new-view synthesis for test sequences I05 and I06, shown in Fig. 13. The first column from the left is the ground truth of the central views obtained from the light field dataset [37]. The remaining columns show the central views synthesized from the FoSIs compressed by the benchmark, GGIP and the proposed coding scheme, respectively. It can be seen that GGIP obtains a slight PSNR increment at a lower bitrate cost than the benchmark; both have similar subjective quality with compression artifacts. The proposed coding scheme achieves a significant PSNR increment with less bitrate consumption than the benchmark, and its subjective quality is closer to the ground truth than that of the other schemes.

Fig. 13. Subjective quality of new views synthesized from compressed FoSIs (Y-component).

The considerable bitrate savings and PSNR increments illustrate that the proposed coding scheme has solid RD performance, and the synthesis performance is confirmed by the significant improvement in subjective quality. We conclude that the proposed coding scheme can effectively eliminate the redundancy of FoSIs.

5.3.2 Reconstruction performance

In this experiment, the proposed FoSI reconstruction scheme is performed on test sequences I02 and I09, with the odd half of the frames as input. Since there is no object motion in the scenes, a weighted average of adjacent co-located blocks is an acceptable method to interpolate new frames; we therefore adopt the weighted-average method as the comparison scheme. The weights for two adjacent co-located blocks range from 0 to 1 with a stride of 0.1, and the weights of the adjacent blocks sum to 1. The input frames are compressed by HM 16.20 with three configurations, namely Low Delay P (LDP), Low Delay B (LDB) and Random Access (RA), with QPs set to 32, 37, 42 and 47. These comparison schemes are named HEVC-LDP, HEVC-LDB and HEVC-RA, respectively.

The results of all schemes are shown in Fig. 14. The proposed reconstruction scheme performs better overall than the comparison scheme in all configurations and for all scenarios. Specifically, it achieves significant PSNR increments at all bitrates for I02. For the realistic sequence I09, it obtains better performance at lower bitrates, while its performance is close to the comparison scheme at higher bitrates. This is because the proposed scheme uses only intra prediction, while the comparison scheme applies inter prediction; if a test sequence does not contain many frames, the advantage of the proposed scheme is reduced at higher bitrates.

Fig. 14. Comparison of reconstruction performance.

5.3.3 Frame interpolation rendering performance

To improve the smoothness of FoSI displays, new images with intermediate focus levels are generated by frame interpolation rendering. In this experiment, the new Gaussian parameters are interpolated by averaging the Gaussian parameters of adjacent co-located blocks; new images with varying focus levels can then be rendered by convolving the bases with the corresponding new Gaussian parameters. An image-averaging scheme is adopted for comparison, in which the average of two adjacent images is the interpolation result. Images with a focus level halfway between adjacent images are set as the target images. We compare the two frame interpolation rendering schemes at the decoder side; the FoSIs are compressed by HM 16.20 with the LDPEQP configuration, with QPs set to 22, 27, 32 and 37. Both schemes are implemented on all 10 test sequences.

We randomly select one block of test sequence I05 as an example; the rendering results of block (5,4) are shown in Fig. 15. Figures 15(a) and (b) illustrate the rendering results of the proposed interpolation scheme and the comparison scheme, respectively. The proposed scheme achieves more accurate focus levels than the comparison scheme: most of its interpolation results coincide with or are near the targets, while more results of the comparison scheme deviate from the targets. Moreover, we analyze the interpolation errors of the two rendering schemes for all test sequences; the errors of one test sequence are computed by averaging over all images of the focal stack. The error analysis is shown in Fig. 16. The error varies with the QP, and the proposed scheme achieves lower errors overall than the comparison scheme. Specifically, the proposed scheme obtains significant error reductions for I01–I07 and I10, while its interpolation errors are close to those of the comparison scheme for I08 and I09.

Fig. 15. Comparison of frame interpolation rendering schemes.

Fig. 16. Error comparison of frame interpolation rendering schemes.

6. Conclusions

FoSIs play a vital role in extending the DoF, in computational photography, macroscopic and microscopic imaging, virtual reality and other immersive visual applications. In this paper, an efficient Gaussian-based FoSI representation is proposed to describe the focus changes of FoSIs. Based on this representation model, we proposed a series of schemes for the coding, reconstruction and rendering of FoSIs. The coding scheme aims at exploiting the inter-frame redundancies induced by the change of focus. The reconstruction scheme reconstructs the complete focal stack from a limited number of images. The rendering scheme improves the smoothness of FoSI displays by frame interpolation. Experimental results demonstrate the high efficiency of the proposed representation model and the superior performance of the proposed schemes. We believe this work will expand the application scope of optical imaging systems limited by DoF and contribute to the visual community, especially end-to-end applications.

Funding

National Natural Science Foundation of China (91848107, 61971203, 62071266); National Key Research and Development Program of China (2017YFC0806202); Wuhan Municipal Science and Technology Bureau (2019020701011422, 2020020601012222).

Disclosures

The authors declare no conflicts of interest.

References

1. K. He, X. Wang, Z. Wang, H. J. Yi, N. F. Scherer, A. K. Katsaggelos, and O. Cossairt, “Snapshot multifocal light field microscopy,” Opt. Express 28(8), 12108–12120 (2020). [CrossRef]  

2. M. S. Sigdel, M. Sigdel, S. Dinç, I. Dinç, M. L. Pusey, and R. S. Aygün, “FocusALL: focal stacking of microscopic images using modified Harris corner response measure,” IEEE-ACM Trans. Comput. Biol. Bioinform. 13(2), 326–340 (2016). [CrossRef]  

3. X. Zhang, F. Zeng, Y. Li, and Y. Qiao, “Improvement in focusing accuracy of DNA sequencing microscope with multi-position laser differential confocal autofocus method,” Opt. Express 26(2), 887–896 (2018). [CrossRef]  

4. L. Ma, X. Zhang, Z. Xu, A. Spath, Z. Xing, T. Sun, and R. Tai, “Three-dimensional focal stack imaging in scanning transmission X-ray microscopy with an improved reconstruction algorithm,” Opt. Express 27(5), 7787–7802 (2019). [CrossRef]  

5. W. Ono, H. Shionozaki, T. Ijiri, K. Kohiyama, and H. Tanaka, “Shape and texture reconstruction for insects by using X-ray CT and focus stack imaging,” in SIGGRAPH Asia 2018 Posters, (Association for Computing Machinery, New York, NY, USA, 2018), SA ’18, pp. 1–2.

6. H. Li, J. Peng, F. Pan, Y. Wu, Y. Zhang, and X. Xie, “Focal stack camera in all-in-focus imaging via an electrically tunable liquid crystal lens doped with multi-walled carbon nanotubes,” Opt. Express 26(10), 12441–12454 (2018). [CrossRef]  

7. J. R. Chang, B. V. K. V. Kumar, and A. C. Sankaranarayanan, “Towards multifocal displays with dense focal stacks,” ACM Trans. Graph. 37(6), 1–13 (2019). [CrossRef]  

8. C. Chang, W. Cui, and L. Gao, “Holographic multiplane near-eye display based on amplitude-only wavefront modulation,” Opt. Express 27(21), 30960–30970 (2019). [CrossRef]  

9. X. Yin, G. Wang, W. Li, and Q. Liao, “Iteratively reconstructing 4D light fields from focal stacks,” Appl. Opt. 55(30), 8457–8463 (2016). [CrossRef]  

10. C. Liu, J. Qiu, and M. Jiang, “Light field reconstruction from projection modeling of focal stack,” Opt. Express 25(10), 11377–11388 (2017). [CrossRef]  

11. E. Alexander, Q. Guo, S. Koppal, S. Gortler, and T. Zickler, “Focal Flow: measuring distance and velocity with defocus and differential motion,” in Proc. Eur. Conf. Comput. Vis. (ECCV), (Springer, 2016), pp. 667–682.

12. Y. Chen, X. Jin, and Q. Dai, “Distance measurement based on light field geometry and ray tracing,” Opt. Express 25(1), 59–76 (2017). [CrossRef]  

13. C. Hahne, A. Aggoun, V. Velisavljevic, S. Fiebig, and M. Pesch, “Refocusing distance of a standard plenoptic camera,” Opt. Express 24(19), 21521 (2016). [CrossRef]  

14. T. Zhan, Y. Lee, and S. Wu, “High-resolution additive light field near-eye display by switchable pancharatnam-berry phase lenses,” Opt. Express 26(4), 4863–4872 (2018). [CrossRef]  

15. C. Stefani, A. Lacy-Hulbert, and T. L. Skillman, “Confocalvr: Immersive visualization for confocal microscopy,” J. Mol. Biol. 430(21), 4028–4035 (2018). [CrossRef]  

16. S. Kuthirummal, H. Nagahara, C. Zhou, and S. K. Nayar, “Flexible depth of field photography,” IEEE Trans. Pattern Anal. Mach. Intell. 33(1), 58–71 (2011). [CrossRef]  

17. S. Liu and H. Hua, “Extended depth-of-field microscopic imaging with a variable focus microscope objective,” Opt. Express 19(1), 353–362 (2011). [CrossRef]  

18. N. Li, B. Sun, and J. Yu, “A weighted sparse coding framework for saliency detection,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2015), pp. 5216–5223.

19. Y. Kobayashi, K. Takahashi, and T. Fujii, “From focal stacks to tensor display: a method for light field visualization without multi-view images,” in Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP), (2017), pp. 2007–2011.

20. L. Ma, X. Zhang, Z. Xu, A. Späth, Z. Xing, T. Sun, and R. Tai, “Three-dimensional focal stack imaging in scanning transmission X-ray microscopy with an improved reconstruction algorithm,” Opt. Express 27(5), 7787–7802 (2019). [CrossRef]  

21. T. Broad and M. Grierson, “Light field completion using focal stack propagation,” in ACM SIGGRAPH 2016 Posters, (Association for Computing Machinery, New York, NY, USA, 2016), SIGGRAPH ’16, pp. 1–2.

22. D. Xiao, L. Wang, T. Xiang, and Y. Wang, “Multi-focus image fusion and robust encryption algorithm based on compressive sensing,” Opt. Laser Technol. 91, 212–225 (2017). [CrossRef]  

23. S. N. Kumar, A. L. Fred, and P. S. Varghese, “Compression of CT images using contextual vector quantization with simulated annealing for telemedicine application,” J. Med. Syst. 42(11), 218 (2018). [CrossRef]  

24. B. Kim, K. H. Lee, K. J. Kim, T. Richter, H. Kang, S. Y. Kim, Y. H. Kim, and J. Seo, “Jpeg2000 3D compression vs. 2D compression: an assessment of artifact amount and computing time in compressing thin-section abdomen CT images,” Med. Phys. 36(3), 835–844 (2009). [CrossRef]  

25. T. Sakamoto, K. Kodama, and T. Hamamoto, “A novel scheme for 4-D light-field compression based on 3-D representation by multi-focus images,” in Proc. IEEE Int. Conf. Image Process. (ICIP), (2012), pp. 2901–2904.

26. S. Khire, L. A. D. Cooper, Y. Park, A. B. Carter, N. Jayant, and J. H. Saltz, “ZPEG: a hybrid DPCM-DCT based approach for compression of Z-stack images,” in Conf. Proc. IEEE Eng. Med. Biol. Soc., (2012), pp. 5424–5427.

27. M. D. Zarella and J. Jakubowski, “Video compression to support the expansion of whole-slide imaging into cytology,” J. Med. Imag. 6(04), 1 (2019). [CrossRef]  

28. V. V. Duong, T. N. Canh, T. N. Huu, and B. Jeon, “Focal stack based light field coding for refocusing applications,” in Proc. IEEE Int. Symp. Broadband Multimedia Syst. Broadcast. (BMSB), (2019), pp. 1–4.

29. K. G. Chan and M. Liebling, “A point-spread-function-aware filtered backprojection algorithm for focal-plane-scanning optical projection tomography,” in Proc. IEEE 13th Int. Symp. Biomed. Imag. (ISBI), (2016), pp. 253–256.

30. P. Arroyo, “Modeling and applications of the focus cue in conventional digital cameras,” Ph.D. thesis, University of Rovira i Virgili (2013).

31. S. Jiao, P. Tsang, T. Poon, J. Liu, W. Zou, and X. Li, “Enhanced autofocusing in optical scanning holography based on hologram decomposition,” IEEE Trans. Ind. Informat. 13(5), 2455–2463 (2017). [CrossRef]  

32. Y. Zhang, H. Wang, Y. Wu, M. Tamamitsu, and A. Ozcan, “Edge sparsity criterion for robust holographic autofocusing,” Opt. Lett. 42(19), 3824–3827 (2017). [CrossRef]  

33. Y. Chen, J. Guan, and W. Cham, “Robust multi-focus image fusion using edge model and multi-matting,” IEEE Trans. Image Process. 27(3), 1526–1541 (2018). [CrossRef]  

34. C. Zhang, G. Hou, Z. Zhang, Z. Sun, and T. Tan, “Efficient auto-refocusing for light field camera,” Pattern Recognit. 81, 176–189 (2018). [CrossRef]  

35. K. McCann, C. Rosewarne, B. Bross, M. Naccari, K. Sharman, and G. Sullivan, “High efficiency video coding (HEVC) test model 16 (HM 16) encoder description, document JCTVC-R1002,” JCTVC, Sapporo, Japan (2014).

36. K. Wu, Q. Liu, Y. Yin, and Y. Yang, “Gaussian guided inter prediction for focal stack images compression,” in Proc. 2020 Data Compression Conference (DCC), (2020), pp. 63–72.

37. M. Rerábek and T. Ebrahimi, “New light field image dataset,” in Proc. 8th Int. Conf. Quality Multimedia Exper. (QoMEX), (2016).

38. D. G. Dansereau, O. Pizarro, and S. B. Williams, “Decoding, calibration and rectification for lenselet-based plenoptic cameras,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog. (CVPR), (2013), pp. 1027–1034.

39. M. S. Hosseini, Y. Zhang, and K. N. Plataniotis, “Encoding visual sensitivity by maxpol convolution filters for image sharpness assessment,” IEEE Trans. Image Process. 28(9), 4510–4525 (2019). [CrossRef]  

40. G. Bjøntegaard, “Calculation of average PSNR differences between RD-curves (VCEG-M33),” in VCEG Meeting (ITU-T SG16 Q. 6), (2001), pp. 2–4.

41. A. Levin and F. Durand, “Linear view synthesis using a dimensionality gap light field prior,” in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, (2010), pp. 1831–1838.
