
Information theoretic performance evaluation of 3D integral imaging

Open Access

Abstract

Integral imaging (InIm) has proved useful for three-dimensional (3D) object sensing, visualization, and classification of partially occluded objects. This paper presents an information-theoretic approach for simulating and evaluating the integral imaging capture and reconstruction process. We utilize mutual information (MI) as a metric for evaluating the fidelity of the reconstructed 3D scene. We also consider passive depth estimation using mutual information. We apply this formulation to optimal pitch estimation for integral imaging capture and reconstruction in order to maximize the longitudinal resolution. The effect of partial occlusion on integral imaging 3D reconstruction is evaluated using mutual information. Computer simulation tests and experiments are presented.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Three-dimensional integral imaging (3D InIm) is a prominent imaging technique that works by capturing angular information about the 3D scene [1–5]. 3D InIm records multiple 2D elemental images of a scene from a diversity of perspectives [6–16]. This can be achieved using a single imaging sensor with a lenslet array, a camera array, or a single camera mounted on a moving platform. The 3D scene can be reconstructed by integrating the 2D elemental images either optically or computationally. Computational reconstruction back propagates the optical rays through a virtual pinhole. Reconstruction can be achieved for any depth within the limits of the depth of field of the captured elemental images.

Mutual information (MI) is a statistical measure of the non-linear similarity between two data sources. The development of mutual information is largely credited to the work of Shannon [17]. Since then, mutual information has been used in several fields, including statistics, complexity analysis, and communication theory [18]. The first use of mutual information as a measure of pixel correspondence was in 1995, when Viola employed mutual information to register multi-modal medical images, such as MR and CAT images [19,20]. Subsequently, it was shown that mutual information performed at a level similar to manually assisted methods [21]. Since then, mutual information has been used for image registration [22,23], as a stereo correspondence measure [24], and as an image similarity metric [25,26]. Mutual information has also been utilized for multi-plane tracking [27] and image fusion [28]. In [29], a simple one-dimensional classification problem was presented that demonstrated a correlation between classification accuracy and normalized mutual information.

In this paper, we use mutual information (MI) as a metric to evaluate the fidelity of the 3D InIm reconstruction with and without the presence of partial occlusion. The main idea is to use mutual information as a similarity measure between the true scene and the 3D reconstructed scene in the presence of occlusion. This allows us to simulate and assess the theoretical performance of integral imaging in degraded environments. We demonstrate the utility of this formulation by considering two examples. First, we use this formulation to determine the depth of an object under partial occlusion. Second, we estimate the optimal pitch size of the InIm setup that maximizes the longitudinal resolution. Our experimental results agree with the theoretical analysis. The scope of this manuscript is limited to establishing a mutual information-theoretic framework for integral imaging in degraded environments. A rigorous study of its various applications is not considered here.

2. Methodology

2.1 Integral imaging

Integral imaging (InIm) is a passive 3D imaging approach that integrates the diverse perspectives of the captured 2D elemental images to obtain information about the light field generated by the scene. This can be accomplished using a camera array or a single camera mounted on a translation stage [6–16]. 3D reconstruction of the scene is achieved through the backpropagation of rays through a virtual pinhole. The reconstruction depth can be anywhere within the limits of the depth of field of the captured elemental images. InIm uses parallax to record both angular and intensity information of a 3D scene, which helps mitigate the effects of partial occlusion. The 3D reconstructed scenes also have a better signal-to-noise ratio (SNR), since InIm is optimal in the maximum likelihood sense for read-noise dominant images [25].

In our simulations, a synthetic aperture integral imaging (SAII) system is used for 3D InIm [30]. The pickup stage of synthetic aperture InIm is shown in Fig. 1(a). Once the 2D elemental images are captured, 3D scene reconstruction can be performed computationally, as illustrated in Fig. 1(b).

Fig. 1. (a) Synthetic aperture integral imaging setup pickup stage. (b) Reconstruction stage of integral imaging.

3D reconstruction is accomplished by backpropagation of the captured elemental images through a virtual pinhole array to the desired depth. In the 3D reconstruction process, the reconstructed 3D image intensity Iz(x, y) is computed as [30]:

$${I_z} = \frac{1}{{O(x,y)}}\sum\limits_{m = 0}^{M - 1} {} \sum\limits_{n = 0}^{N - 1} {\left[ {{I_{mn}}\left( {x - \frac{{m \times {L_x} \times {p_x}}}{{{c_x} \times \frac{z}{f}}},y - \frac{{n \times {L_y} \times {p_y}}}{{{c_y} \times \frac{z}{f}}}} \right) + \varepsilon } \right]}, $$
where (x, y) are the pixel indices and O(x, y) is the number of overlapping pixels at (x, y). Imn is a 2D elemental image, with (m, n) its index and (M, N) the total number of elemental images in the horizontal and vertical directions. (Lx, Ly) is the camera resolution, (cx, cy) is the sensor size, and (px, py) is the pitch between adjacent camera positions. f is the focal length, z is the reconstruction depth, and $\varepsilon$ is the additive camera noise. When the scene is 3D reconstructed at the true depth of the object, the variation of the rays is minimal, assuming that the rays come from an object with approximately uniform intensity [31].
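The reconstruction above can be implemented by shifting each elemental image by the amount prescribed for depth z and averaging over the overlapping pixels. The following is a minimal sketch of this computation, assuming the elemental images are stored as a 4D NumPy array; the helper name and the use of first-order interpolation (rather than the cubic Hermite interpolation adopted later in the paper) are our choices, and the additive noise term is omitted.

```python
import numpy as np
from scipy.ndimage import shift as subpixel_shift

def reconstruct_depth(elemental, Lx, Ly, px, py, cx, cy, f, z):
    """Computational InIm reconstruction at depth z.
    elemental: array of shape (M, N, H, W) holding the 2D elemental images."""
    M, N, H, W = elemental.shape
    recon = np.zeros((H, W))
    overlap = np.zeros((H, W))
    for m in range(M):
        for n in range(N):
            # Pixel shift of elemental image (m, n) for reconstruction depth z
            sx = m * Lx * px / (cx * z / f)
            sy = n * Ly * py / (cy * z / f)
            shifted = subpixel_shift(elemental[m, n].astype(float), (sy, sx),
                                     order=1, cval=np.nan)
            mask = ~np.isnan(shifted)
            recon[mask] += shifted[mask]
            overlap[mask] += 1
    # Divide by the number of overlapping pixels O(x, y)
    return recon / np.maximum(overlap, 1)
```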

2.2 Ising model

A common way to simulate natural images is through the use of Markov random fields [32]. Because of the way Markov random fields (MRF) express the relationships between nodes, they are a natural representation of physical space and can thus represent several physical phenomena [32]. A commonly used special case of the homogeneous MRF is the Ising model. The Ising model [33] is a very useful model that has been applied in many areas, such as thermodynamics, physics, and computer science. Its ability to represent the spatial correlation properties of natural images renders it useful in various computer vision tasks. It has been widely used for image denoising [34] and image segmentation [35,36]. As such, we use this model to generate images that serve as a proxy for natural scenes and objects.

A Markov random field (MRF) is an undirected graphical model that explicitly expresses conditional independence relationships between nodes. While an MRF provides conditional dependencies, a Gibbs random field provides a representation of a set of random variables and their relationships. From the Hammersley-Clifford theorem [37], we have the Gibbs representation for the probability distribution of an MRF. Given any MRF, all joint probability distributions that satisfy its conditional independencies can be written in terms of clique potentials over the maximal cliques of the corresponding Gibbs field [32].

A Gibbs field represents a set of random variables and their relationships by assigning a potential function $\Phi $ to each clique, while an MRF only specifies conditional independencies [37]. With this, the joint probability of the random variables can be written as a function of the cliques in the graph. The joint probability of any set of random variables represented by a Gibbs field can be written as a product of clique potentials [32]:

$$P(x) = \frac{1}{Z}\prod\nolimits_{{c_i} \in C} {{\Phi _i}({c_i})}. $$

Here ${\Phi _i}({c_i})$ is the ith clique potential, which is a function of only the clique members. Clique potentials of the form ${\Phi _i}({c_i}) = \exp ({f({{c_i}} )} )$ are often used, since this form maximizes the entropy, with $f({c_i})$ representing the energy function over the values of ci. This form has been well studied in the literature and is known as the canonical probability distribution or Boltzmann distribution [38]. If the ${f_i}$ are quadratic, the corresponding Gibbs field is known as a Gaussian Gibbs field. The energy acts as an indicator of the relationships within the clique: a higher-energy configuration has a lower probability of realization than a lower-energy one.

If the clique potential is independent of the position of the clique in the lattice, we have a homogeneous MRF. A commonly used special case of the homogeneous MRF is the Ising model [37]. A generalized Ising model has the following form for its clique potential [37]:

$$P(\mathbf {X} = x) = \frac{1}{Z}\exp \left( { - \frac{{U(x)}}{T}} \right)\textrm{, where }U(x) = \sum\nolimits_{c \in C} {{V_c}(x)}$$
$${V_c} = \left\{ \begin{array}{l} + {\beta_c}\textrm{, if the pixel values of }\mathbf {X}\mathrm{\ at\ the\ sites\ in ^{\prime}c^{\prime}\ are\ the\ same}\\ - {\beta_c}\textrm{, otherwise} \end{array} \right.$$

Here ${\beta _c}$ is a control parameter that determines the spatial correlation of the image. ${\beta _c}$ is taken to be 1 for one-site cliques, a constant $\beta $ for two-site cliques, and 0 for the rest. With this formulation, $\beta $ can be used to obtain images of varying spatial correlations.

We use the Metropolis sampling algorithm [37,39] to generate 3-bit images of sizes 128 by 128 and 512 by 512. The temperature parameter T is set to 3, the number of iterations to 4000, and images are generated for $\beta $ varying from -2 to -0.3. Sample images have been shown in Fig. 2.
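For reference, a minimal sketch of Metropolis sampling for this generalized Ising model is given below. The gray-level range, T, $\beta $, and iteration count follow the text; treating one iteration as a full random sweep of the lattice, using only the four-neighbor two-site cliques, and omitting the constant single-site terms are our assumptions, and the unvectorized loops are slow for large lattices.

```python
import numpy as np

def local_energy(img, i, j, val, beta):
    """Sum of two-site clique potentials touching pixel (i, j) if it held value val."""
    H, W = img.shape
    u = 0.0
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < H and 0 <= nj < W:
            u += beta if img[ni, nj] == val else -beta
    return u

def ising_image(size=128, beta=-0.8, T=3.0, iters=4000, levels=8, seed=0):
    rng = np.random.default_rng(seed)
    img = rng.integers(0, levels, size=(size, size))
    for _ in range(iters):
        # One "iteration" here is taken as one full random sweep of the lattice (our choice)
        for _ in range(size * size):
            i, j = rng.integers(0, size, 2)
            new = rng.integers(0, levels)
            dU = local_energy(img, i, j, new, beta) - local_energy(img, i, j, img[i, j], beta)
            # Metropolis acceptance rule for P(x) proportional to exp(-U(x)/T)
            if dU <= 0 or rng.random() < np.exp(-dU / T):
                img[i, j] = new
    return img
```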

Fig. 2. Simulated Ising images of size 128 by 128. The temperature parameter T is set to 3 and the number of iterations is set to 4000. $\beta $ values for the images are (a) -2, (b) -0.83, (c) -0.79, (d) -0.77, (e) -0.7, (f) -0.4.

We use Moran’s I method [40] to quantify the spatial correlation of simulated Ising images. It computes spatial correlation as:

$$I = \frac{N}{W}\frac{{\sum\nolimits_i {\sum\nolimits_j {{w_{ij}}({x_i} - \overline x )({x_j} - \overline x )} } }}{{\sum\nolimits_i {{{({x_i} - \overline x )}^2}} }}$$

Here N is the number of spatial units indexed by i and j, x is the variable of interest, and wij is a matrix of spatial weights with zeros on the diagonal. We choose wij as a 5 by 5 unit matrix with zero diagonal elements. The spatial correlation of the simulated 512 by 512 Ising images as a function of $\beta $ is shown in Fig. 3.
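A compact sketch of this computation for an image is shown below, interpreting the "5 by 5 unit matrix with zero diagonal" as a weight of 1 for every pixel pair within a 5 by 5 window (center excluded); the border handling and the approximate total weight W are our simplifications.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def morans_i(img, window=5):
    """Moran's I spatial correlation of an image with a square neighborhood weighting."""
    dev = img.astype(float) - img.mean()
    N = dev.size
    # Sum of neighbor deviations within the window, excluding the pixel itself
    neigh_sum = uniform_filter(dev, size=window, mode='constant') * window**2 - dev
    W = N * (window**2 - 1)          # total weight, ignoring border truncation
    return (N / W) * (dev * neigh_sum).sum() / (dev**2).sum()
```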

Fig. 3. Spatial correlation of 512 by 512 Ising images as computed with the Moran’s I method versus $\beta $ for temperature parameter T = 3 and number of iterations = 4000.

2.3 Mutual information

Given events ${s_1},\ldots ,{s_n}$ occurring with probabilities $p({s_1}),\ldots ,p({s_n}),$ the average uncertainty associated with each event is defined by the Shannon entropy as [18]:

$$H(s) ={-} \sum\limits_{i = 1}^n {p({s_i}){{\log }_2}p({s_i})}$$

Let X and Y be two random variables corresponding to the input and output. The entropies corresponding to X and Y are denoted H(X) and H(Y). The mutual information can now be defined as [18]:

$$I(X;Y) = H(X) - {H_y}(X) = H(Y) - {H_x}(Y) = H(X) + H(Y) - H(X,Y)$$

In terms of the probability density function of the pixel values, the mutual information can be written as [41]:

$$I(X;Y) = \sum\nolimits_{{g_1} \in I} {\sum\nolimits_{{g_2} \in I} {{f_{xy}}({g_1},{g_2})\log \frac{{{f_{xy}}({g_1},{g_2})}}{{{f_x}({g_1}){f_y}({g_2})}}} }$$
where I is the set of pixel intensity values available in the image. Three-bit simulated Ising images have eight available intensity values, and thus $I = \{ 0,1,2,3,4,5,6,7\} $.
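This pixel-wise mutual information can be estimated directly from the joint histogram of the two quantized images, as in the short sketch below; the use of base-2 logarithms and the helper name are our choices, and the normalization of the MI values (as plotted later in the paper) is not shown.

```python
import numpy as np

def mutual_information(img_x, img_y, levels=8):
    """Pixel-wise mutual information between two images quantized to `levels` gray values."""
    joint, _, _ = np.histogram2d(img_x.ravel(), img_y.ravel(),
                                 bins=levels, range=[[0, levels], [0, levels]])
    fxy = joint / joint.sum()                    # joint pmf f_xy(g1, g2)
    fx = fxy.sum(axis=1, keepdims=True)          # marginal f_x(g1)
    fy = fxy.sum(axis=0, keepdims=True)          # marginal f_y(g2)
    nz = fxy > 0                                 # skip zero-probability bins
    return float((fxy[nz] * np.log2(fxy[nz] / (fx @ fy)[nz])).sum())
```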

Pixel-to-pixel correspondence fails to capture the spatial information that exists in an image. We consider spatial entropy defined as the entropy of the probability distribution of class label configurations on the neighbors of the pixel [42].

Considering only one adjacency neighborhood, as is valid under the assumption of the Markov random field, we can write the total entropy as [41]:

$$H({X_s}) ={-} \sum\nolimits_{{g_1} \in I} {\ldots \sum\nolimits_{{g_8} \in I} {\sum\nolimits_{g \in I} {{f_x}({g_1}\textrm{,}\ldots \textrm{,}{g_8},g)\log \frac{{{f_x}({g_1}\textrm{,}\ldots \textrm{,}{g_8},g)}}{{{f_x}({g_1}\textrm{,}\ldots \textrm{,}{g_8})}}} } }$$

Here, g is the intensity of the central pixel and g1, …, g8 are the intensities of its neighbors. For three-bit Ising images, the number of discrete intensity values available is eight, $I = \{ 0,1,2,3,4,5,6,7\} $. Thus, the number of possible realizations of $({X_{{s_1}}}\textrm{, }\ldots \textrm{, }{X_{{s_8}}},{X_s})$ is ${8^9}$. For a high-definition image with $I = \{ 0,\ldots ,255\} $ the number of possible realizations becomes ${256^9}$. This requires an immense computational cost. Apart from this, the main concern is the availability of sufficient data points: each configuration needs at least 20-30 sample points for a proper representation of the probability density function.

The Gibbs random field formulation allows us to reduce the space of possible values. From the Hammersley-Clifford theorem [37] it follows that the conditional probabilities of a site’s gray level with respect to its neighbors are proportional to the exponential of the sum of the potentials of its associated cliques. The particular case of the Ising model has been discussed in the previous section. Thus, different neighborhood configurations that produce the same potential $U(x)$ can be grouped together as a single state $\alpha $; the random variable $\alpha $ represents neighborhood intensities (g1 to g8) with the same U(x). The entropy of an image can now be computed as [41]:

$$H({X_s}) ={-} \sum\nolimits_{g \in I} {\sum\nolimits_\alpha {{f_x}(\alpha ,g)\log \frac{{{f_x}(\alpha ,g)}}{{{f_x}(\alpha )}}} }$$

For three-bit Ising images with only one adjacency neighborhood, $I = \{ 0,1,2,3,4,5,6,7\} $ and $\alpha $ takes nine possible values. Thus, the total number of combinations of the pair $(\alpha ,g)$ is 72. The mutual information between two images is now given as [41]:

$$I(X;Y) = \sum\nolimits_{{g_x} \in I} {\sum\nolimits_{{g_y} \in I} {\sum\nolimits_{{\alpha _x}} {\sum\nolimits_{{\alpha _y}} {{f_{xy}}({{g_x},{\alpha_x},{g_y},{\alpha_y}} )\log \frac{{{f_{xy}}({{g_x},{\alpha_x},{g_y},{\alpha_y}} ){f_x}({\alpha _x}){f_y}({\alpha _y})}}{{{f_{xy}}({{\alpha_x},{\alpha_y}} ){f_x}({\alpha _x},{g_x}){f_y}({\alpha _y},{g_y})}}} } } }$$

This formulation of mutual information has been used henceforward.
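The sketch below illustrates one way this spatial mutual information can be computed, under the assumption that for the Ising model the state α of a pixel reduces to the number of its eight neighbors sharing its gray level (nine possible values, consistent with the 72 (α, g) combinations quoted above); the helper names, the border handling, and the base-2 logarithm are our choices, and this is not necessarily the implementation used by the authors.

```python
import numpy as np

def alpha_map(img):
    """Per-pixel gray level g and the count of the 8 neighbors equal to it (borders excluded)."""
    H, W = img.shape
    c = img[1:-1, 1:-1]
    count = np.zeros_like(c, dtype=int)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di == 0 and dj == 0:
                continue
            count += (img[1 + di:H - 1 + di, 1 + dj:W - 1 + dj] == c)
    return c, count

def spatial_mutual_information(img_x, img_y, levels=8):
    gx, ax = alpha_map(img_x)
    gy, ay = alpha_map(img_y)
    sample = np.stack([gx.ravel(), ax.ravel(), gy.ravel(), ay.ravel()], axis=1)
    # Joint pmf over (g_x, alpha_x, g_y, alpha_y)
    fxy4, _ = np.histogramdd(sample, bins=(levels, 9, levels, 9),
                             range=[(0, levels), (0, 9), (0, levels), (0, 9)])
    fxy4 /= fxy4.sum()
    f_ax = fxy4.sum(axis=(0, 2, 3))      # f_x(alpha_x)
    f_ay = fxy4.sum(axis=(0, 1, 2))      # f_y(alpha_y)
    f_axay = fxy4.sum(axis=(0, 2))       # f_xy(alpha_x, alpha_y)
    f_gxax = fxy4.sum(axis=(2, 3))       # f_x(g_x, alpha_x)
    f_gyay = fxy4.sum(axis=(0, 1))       # f_y(g_y, alpha_y)
    mi = 0.0
    for igx in range(levels):
        for iax in range(9):
            for igy in range(levels):
                for iay in range(9):
                    p = fxy4[igx, iax, igy, iay]
                    if p > 0:
                        mi += p * np.log2(p * f_ax[iax] * f_ay[iay] /
                                          (f_axay[iax, iay] * f_gxax[igx, iax] * f_gyay[igy, iay]))
    return mi
```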

3. Experimental details

3.1 Interpolation artifacts

Mutual information is sensitive to the changes in pixel intensities introduced by interpolation, which give rise to interpolation errors. Interpolation artifacts arising while computing mutual information were studied in detail in [43,44]. The authors describe the cause and effect of interpolation-induced artifacts in mutual information-based image registration for two commonly utilized interpolation methods, namely linear interpolation and partial volume interpolation. They suggest that the improved registration accuracy observed for scale-corrected magnetic resonance (MR) images may be partially accounted for by the inequality of grid distances that results from scale correction.

3D InIm reconstruction involves sub-pixel shifts which are achieved using interpolation. To understand the adverse effects of interpolation, a simple experiment using linear, nearest-neighbor, cubic, cubic Hermite polynomial, and cubic spline interpolation is performed. 2D Ising images were shifted in one direction by a sub-pixel distance and then shifted back to their original positions. Mutual information was then computed between these images and their original counterparts. The plot of mutual information as a function of various sub-pixel shifts is shown in Fig. 4(a). It can be observed that cubic Hermite polynomial-based interpolation has a small error with a smooth morphology. As such, cubic Hermite polynomial-based interpolation has been used henceforth. These errors are also a function of the spatial correlation of the scene. As can be expected, objects having a high degree of spatial correlation have a low interpolation error compared with objects having less spatial correlation. The effects of spatial correlation on cubic Hermite polynomial-based interpolation are shown in Fig. 4(b) and (c).
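A minimal sketch of this shift-and-shift-back test is given below, using SciPy's PCHIP interpolant as the cubic Hermite polynomial interpolation and the pixel-wise mutual_information helper sketched in Section 2.3; the row-wise shifting and the rounding of the interpolated values back to integer gray levels are our assumptions.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

def shift_rows(img, dx):
    """Shift every row of img by dx pixels using PCHIP (cubic Hermite) interpolation."""
    H, W = img.shape
    cols = np.arange(W)
    out = np.empty((H, W), dtype=float)
    for r in range(H):
        out[r] = PchipInterpolator(cols, img[r].astype(float), extrapolate=True)(cols - dx)
    return out

def shift_and_back_mi(img, dx, levels=8):
    """MI between an image and its copy shifted by dx pixels and then shifted back."""
    restored = shift_rows(shift_rows(img.astype(float), dx), -dx)
    restored = np.clip(np.rint(restored), 0, levels - 1).astype(int)
    return mutual_information(img, restored, levels)   # helper from the Section 2.3 sketch
```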

Fig. 4. (a) Normalized mutual information as a function of sub-pixel shifts for various interpolation methods. (b) Normalized mutual information as a function of sub-pixel shifts using cubic Hermite polynomial based interpolation for different values of spatial correlation. (c) Normalized mutual information for a 0.5 sub-pixel shift using cubic Hermite polynomial based interpolation as a function of the spatial correlation coefficient.

Apart from choosing a good interpolation method, interpolation errors can be avoided in certain InIm setups by eliminating the need for interpolation altogether. By appropriately choosing the 3D reconstruction distance in integral imaging, non-integer shifts of the 2D elemental images can be avoided. For the one-dimensional case, the modified reconstruction depth can be computed as:

$${z_{\textrm{modified}}} = \frac{{{L_x} \times {p_x} \times f}}{{{c_x}}} \times \left[ {\frac{{{c_x} \times z}}{{{L_x} \times {p_x} \times f}}} \right]$$
where all the variables have the same meaning as mentioned previously, and [.] represents the round-off operator. The extension of this method to the two-dimensional case may not be straightforward for every integral imaging setup. However, if the integral imaging setup is symmetrical (same InIm parameters for both axes), the modified value of the reconstruction depth is the same for the one-dimensional and the two-dimensional cases. In our experiments the modified reconstruction depth has been used wherever applicable. All our experiments were performed on a symmetrical integral imaging setup, allowing us to use this method with ease.
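As a direct transcription of the equation above (a sketch only, with no claim beyond the formula as written), the modified depth simply snaps z to the nearest multiple of Lx·px·f/cx:

```python
def modified_depth(z, Lx, px, f, cx):
    """Modified reconstruction depth per the equation above; [.] is the round-off operator."""
    k = Lx * px * f / cx
    return k * round(z / k)
```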

3.2 Integral imaging system parameters

The simulated integral imaging setup discussed here uses nine cameras in a 3 by 3 configuration, as shown in Fig. 1. The pitch between cameras is 100 mm in both the x and y directions. We first use simulated 2D Ising images as objects. The object plane is at 3000 mm. The field of view of the central camera at the plane of the object is 4000 mm by 4000 mm, of which the object covers 2000 mm by 2000 mm. The 2D elemental images have $1024 \times 1024$ pixels. Unless specified otherwise, these parameters are held constant. A sample 2D central elemental image is shown in Fig. 5(a). Its 3D reconstruction at 2000 mm is shown in Fig. 5(b), and at the true depth of 3000 mm in Fig. 5(c).

Fig. 5. (a) Sample 2D central elemental image. (b) 3D reconstruction at 2000 mm. (c) 3D reconstruction at the true depth of 3000 mm.

The experimental InIm setup uses nine cameras in a 3 by 3 configuration. The horizontal and vertical camera pitches were both set to 100 mm. Objects were placed approximately 4 m from the SAII setup and were recorded using a visible sCMOS sensor (Hamamatsu C11440-42U). The focal length of each camera lens is 50 mm and the diameter is 40 mm giving an F-number of 1.25. The sensor size is 2048 by 2048 pixels and pixel size is 6.5 by 6.5 micrometers. Eight different lab posters were used as objects for the experimental SAII setup. Sample objects have been shown in Fig. 6.

Fig. 6. Sample objects used in the experimental InIm setup. A total of eight distinct posters were used as objects. Four among them have been shown here.

3.3 Computational complexity

Assuming a patch of size $n \times m$, the time complexity of 3D reconstruction is $O((n + {k_1})(m + {k_2}))$. Here, ${k_1}\textrm{, and }{k_2}$ are some constant factors dependent on the total parallax of the InIm system. This time complexity can thus be assumed to be $O(nm)$. Mutual information computation has the complexity of $O(nm) + O({b_1}) + O({b_2}) + O(b_1^2b_2^2)$ which can be simplified as $O(nm) + O(b_1^2b_2^2)$. Here, ${b_1}\textrm{, and }{b_2}$ are the bin sizes for possible pixel values, and possible clique potentials. For the three-bit images and the Ising model considered in this paper, b1 is 8 and b2 is 9. Thus, the total time complexity for computing mutual information for one 3D reconstructed slice is $O(nm + b_1^2b_2^2)$. Computation of the average MI curve requires multiple 3D reconstructions requiring $O(l(nm + b_1^2b_2^2))$ time. Here, l is the number of data points or the number of 3D reconstructions used to generate the average MI curve.

4. Results

A simulated 2D Ising image of size 512 by 512 pixels was used as the object for imaging (see Fig. 5). The integral imaging parameters are as discussed previously. 3D reconstruction was performed for depths ranging from 2000 mm to 4000 mm, with 3000 mm being the true depth of the object. Normalized mutual information was computed between the 3D reconstructed image and the 2D central elemental image. Ising images of varying spatial correlation coefficients were used as objects. Normalized mutual information as a function of reconstruction depth for different values of the correlation coefficient is plotted in Fig. 7(a). As can be seen from the plots, normalized mutual information attains its maximum value when the 3D reconstruction depth equals the true object depth. Without any environmental degradation, this maximum value is 1. A smaller spatial correlation of the object produces a faster decrease of mutual information as we move away from the true reconstruction depth of the object. Several depth estimation methods have been studied in the literature, including methods specifically designed for integral imaging. Although depth estimation is out of scope for this paper, preliminary results suggest the possibility of using the presented mutual information method for depth estimation.
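A usage sketch of this procedure, built from the helpers sketched earlier, is shown below; the variables `elemental`, `Lx`, `Ly`, `px`, `py`, `cx`, `cy`, and `f` are assumed to describe the simulated setup of Section 3.2, and the normalization of the MI curve is omitted.

```python
import numpy as np

depths = np.arange(2000, 4001, 50)                     # candidate reconstruction depths (mm)
central = elemental[1, 1].astype(int)                  # central camera of the 3 x 3 array
mi_curve = []
for z in depths:
    recon = reconstruct_depth(elemental, Lx, Ly, px, py, cx, cy, f, z)
    recon = np.clip(np.rint(recon), 0, 7).astype(int)  # back to 3-bit gray levels
    mi_curve.append(spatial_mutual_information(recon, central))
z_hat = depths[int(np.argmax(mi_curve))]               # depth at the MI peak
```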

Fig. 7. Simulated results for the images in Fig. 5 and Fig. 2. (a) Normalized mutual information vs. 3D reconstruction depth plots. Mutual information is computed between the reconstructed slice of the 3D image at various depths of the scene and the central 2D elemental image. (b) Normalized mutual information vs. 3D reconstruction depth plots. Mutual information is computed between adjacent planes of the 3D reconstructed images. m is the spatial correlation.

An alternative is to compute the mutual information between two adjacent 3D reconstructed images separated by a distance of Δz. We take the average of the mutual information computed between the reconstructed images at depths (z and z-Δz) and (z and z+Δz). The average normalized mutual information as a function of reconstruction depth is shown in Fig. 7(b). In our simulations, 3D images were reconstructed at 50 mm depth steps. The depth separation between adjacent images (Δz) plays an important part. If the depth separation is narrow, the mutual information between adjacent reconstructed images will be very high and approximately constant. Alternatively, if the depth separation is large, the resulting average mutual information curve will not have a pronounced peak. In Fig. 7(b), the rate of change of the average mutual information is more invariant to the spatial correlation of the object than in Fig. 7(a). A short sketch of this averaging is given below.
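The sketch assumes `slices` is a list of quantized reconstructed slices at uniformly spaced depths (50 mm apart in the simulations) and reuses the spatial_mutual_information helper sketched in Section 2.3.

```python
def adjacent_plane_mi(slices, idx):
    """Average MI between the slice at index idx and its two neighbors (z - dz and z + dz)."""
    mi_prev = spatial_mutual_information(slices[idx], slices[idx - 1])
    mi_next = spatial_mutual_information(slices[idx], slices[idx + 1])
    return 0.5 * (mi_prev + mi_next)
```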

Figure 8 shows the results of the optical experimental implementation of InIm for the objects shown in Fig. 6. Eight distinct posters were used as objects, placed at approximately 4 m depth. Figure 8(a) shows the normalized mutual information curves between the reconstructed planes for each of the eight objects and their corresponding central elemental images. Figure 8(b) shows the average of the normalized mutual information between one slice of the 3D reconstructed scene and the elemental image over all eight objects. Figure 8(c) shows the normalized mutual information computed within three adjacent 3D reconstructed image planes separated by 50 mm for each of the eight objects (similar to Fig. 7(b)). Figure 8(d) shows the average of the normalized mutual information in Fig. 8(c) over all eight objects.

Fig. 8. Optical experimental results using the InIm setup. (a) Normalized mutual information curves between the central elemental image and one slice of 3D reconstructed image at various depths of the scene for each of the eight objects. (b) The average of normalized mutual information in Fig. 8(a) for all the eight objects. (c) Normalized mutual information curves for each of the eight objects computed within three adjacent 3D reconstructed image planes. (d) The average of normalized mutual information in Fig. 8(c) for all the eight objects.

3D imaging in partially occluded environments is a common situation where integral imaging has been applied. Partial occlusion is simulated using binary Ising images. The Ising model allows control of the fill factor and the spatial correlation of the occlusion. Occlusion masks with a constant spatial correlation coefficient and varying fill factors are shown in Fig. 9.

Fig. 9. Occlusion masks with a spatial correlation coefficient of 0.61 with varying fill factors. Fill factors in percentage are (a) 11.13, (b) 23.13, (c) 39.16, (d) 48.84, (e) 61.94, (f) 75.91, (g) 87.68.

Partial occlusion was introduced by placing the occlusion mask at 1500 mm from the camera plane. To demonstrate the effect of occlusion, a fixed mask with a spatial correlation coefficient of 0.61 and a fill factor of 11.13 percent was used as the partially occluding object (e.g. Fig. 9(a)). The resulting plots in Fig. 10 are similar to those in Fig. 7, albeit with the occlusion of Fig. 9 present. For Fig. 10(a), mutual information was computed between one slice of the 3D reconstructed image at various depths and the 2D central elemental image. The elemental image contains the image of the partially occluded object. As can be seen from the plots, the presence of occlusion reduces the maximum value of the normalized mutual information. There is also a secondary peak at the depth of the occlusion. Preliminary results suggest that the presented method can be used for depth estimation even for partially occluded objects.

Fig. 10. The effect of partial occlusion in the scene on normalized mutual information vs. 3D InIm reconstruction depth. Object (Fig. 2 and Fig. 5) is present at the depth of 3000 mm and partial occlusion (Fig. 9) is present at depth of 1500 mm. (a) Mutual information is computed between the 3D reconstructed image in the presence of occlusion and 2D central elemental image. (b) Mutual information is computed within three adjacent 3D reconstructed image planes in the presence of occlusion.

Figure 10(b) presents the mutual information computed between adjacent 3D reconstructed image planes. From Fig. 10(b), it can be seen that this method is able to provide the correct depth information for the object. However, unlike the first method in Fig. 10(a), it fails to provide information about the location of the occlusion. The first method in Fig. 10(a) thus outperforms the second method in Fig. 10(b) in the presence of occlusion, and only the first method is considered henceforth.

The fill factor, or severity, of the occlusion can be modified as shown in Fig. 9. To study the effects of the fill factor, the occlusion mask with a spatial correlation of 0.61 was placed at 1500 mm. An Ising image with a spatial correlation of 0.6 was used as the object, placed at 3000 mm. Plots of mutual information as a function of reconstruction depth are shown in Fig. 11. Here, the mutual information is computed between the 3D reconstructed images at different depths and the 2D central elemental image. As can be seen from Fig. 11, the object mutual information peak decreases as the fill factor of the occlusion increases. Correspondingly, the mutual information peak at the plane of the occlusion rises as the fill factor increases. These simulations suggest that the presented method can be used for depth estimation even for severely occluded objects.

Fig. 11. The effect of occlusion fill factor on mutual information. (a) Mutual information as a function of reconstruction depth for various fill factors. Mutual information is computed between one slice of the 3D reconstructed images at different depths and 2D central elemental image. (b) The same plots as (a) displayed separately for various values of fill factors. ff is fill factor for occlusion shown in Fig. 9.

In the laboratory experiments, leaves are used as the occluding object. The occlusion is placed at approximately 2700 mm from the InIm camera plane, while the objects are placed at approximately 4000 mm. A sample object with partial occlusion is shown in Fig. 12. Figure 12(a) shows the image of the object partially occluded by leaves, and Fig. 12(b) shows the central elemental image of the InIm.

Fig. 12. (a) Photo of a sample scene, and object partially occluded by leaves. (b) Central elemental image of InIm.

Figure 13 shows the results for the experimental InIm. Figure 13(a) shows the mutual information curves for each of the eight objects. The mutual information is computed between the reconstructed slice of the 3D image at various depths of the scene and the 2D central elemental image. Figure 13(b) shows the average mutual information curve for all eight objects in (a).

Fig. 13. Experimental results for the effect of occlusion (Fig. 12) on mutual information. (a) Normalized mutual information curve for each of the eight objects. Mutual information is computed between one slice of the 3D reconstructed image at different depths and the 2D central elemental image. (b) Average of normalized mutual information in (a) for all the eight objects in the scene of Fig. 12.

The performance of the integral imaging system depends on various system parameters, such as the camera pitch size, the number of cameras, and the placement of the cameras. An important parameter is the camera pitch size, which is discussed here. The field of view of the InIm setup is the intersection of the fields of view of all the cameras. Thus, increasing the pitch size reduces the available field of view of the InIm system. Figure 14(a) shows the effect of pitch size on the mutual information. Figure 14(b) shows the full width at half maximum (FWHM) of the plots in Fig. 14(a) as a function of camera pitch size. As can be seen, the FWHM decreases with an increase in pitch size. A lower FWHM signifies a higher longitudinal resolution capacity. Thus, a larger pitch size leads to a higher longitudinal resolution. However, this effect plateaus upon further increase in pitch size. The optimum pitch size can be found by taking the knee point of the FWHM curve. The effect of pitch size in the presence of occlusion is the same as without it, and as such, no further discussion has been provided.
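A small sketch of the FWHM measurement used here is shown below; taking the curve minimum as the baseline and reporting a coarse, sample-spacing-limited width (rather than interpolating the half-maximum crossings) are our simplifications.

```python
import numpy as np

def fwhm(depths, mi_curve):
    """Width of the MI-vs-depth curve at half of its peak-to-baseline height."""
    mi = np.asarray(mi_curve, dtype=float)
    half = mi.min() + 0.5 * (mi.max() - mi.min())
    above = np.where(mi >= half)[0]
    return float(depths[above[-1]] - depths[above[0]]) if above.size else 0.0
```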

Fig. 14. (a) Mutual information as a function of reconstruction depth for varying camera pitch sizes. Mutual information is computed between one slice of the 3D reconstructed image at different depths and the 2D central elemental image. (b) Full width at half maximum (FWHM) of plots in (a) as a function of pitch size. ‘ps’ is the pitch size of the camera.

5. Conclusions

We have investigated the use of mutual information for integral imaging and 3D reconstruction, including in the presence of partial occlusions in the scene. Preliminary numerical and optical experimental results of the information theoretic approach are presented. In the numerical analysis, objects were simulated using the Ising model, allowing us to control their spatial correlations. In the experiments, an InIm setup was used to verify the simulated results. Mutual information analysis as a function of reconstruction depth shows a prominent peak at the true depth of the object in the 3D scene. This suggests the possibility of using the presented information theoretic method for depth estimation in integral imaging. Partial occlusion, an important degradation in 3D imaging applications, was then added, again using the Ising model in the simulations and leaves in the optical experiments. Our investigation shows that the true depth of the object can be evaluated even in the presence of severe occlusion. Lastly, the effect of changing the camera pitch size was discussed. It shows a plateauing effect with increasing pitch size, suggesting the presence of an optimal pitch size.

This manuscript presented a mutual information theoretic formulation for integral imaging. However, a rigorous study of its applications was not considered here. In the future, we plan to study in detail the utility of this formulation for optimal parameter estimation. We postulate that parameters such as the number of cameras, the pixel size, and the pitch size can be optimized for different imaging environments and degradations. We also plan to study in detail the possibility of using this formulation for object depth estimation under various environmental degradations, such as partial occlusion, fog, brownout, and low illumination conditions. We plan to compare our approach for object depth localization with other focus measures [45].

Funding

Office of Naval Research (N00014-20-1-2690, N000142212349, N000142212375); Air Force Office of Scientific Research (FA9550-21-1-0333).

Acknowledgments

T. O’Connor acknowledges support via the GAANN fellowship. G. Krishnan acknowledges the support via GE fellowship for excellence.

Disclosures

The authors declare no conflict of interest.

Data availability

Data underlying the results are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. S. Komatsu, A. Markman, A. Mahalanobis, K. Chen, and B. Javidi, “Three-dimensional integral imaging and object detection using long-wave infrared imaging,” Appl. Opt. 56(9), D120–D126 (2017). [CrossRef]  

2. A. Markman and B. Javidi, “Learning in the dark: 3D integral imaging object recognition in very low illumination conditions using convolutional neural networks,” OSA Continuum 1(2), 373–383 (2018). [CrossRef]  

3. D. Aloni, A. Stern, and B. Javidi, “Three-dimensional photon counting integral imaging reconstruction using penalized maximum likelihood expectation maximization,” Opt. Express 19(20), 19681–19687 (2011). [CrossRef]  

4. X. Shen, A. Carnicer, and B. Javidi, “Three-dimensional polarimetric integral imaging under low illumination conditions,” Opt. Lett. 44(13), 3230–3233 (2019). [CrossRef]  

5. B. Tavakoli, B. Javidi, and E. Watson, “Three dimensional visualization by photon counting computational integral imaging,” Opt. Express 16(7), 4426–4436 (2008). [CrossRef]  

6. G. Lippmann, “Epreuves reversibles donnant la sensation du relief,” J. Phys. 7, 821–825 (1908). [CrossRef]  

7. N. Davies, M. McCormick, and L. Yang, “Three-dimensional imaging systems: a new development,” Appl. Opt. 27(21), 4520–4528 (1988). [CrossRef]  

8. H. Arimoto and B. Javidi, “Integral Three-dimensional Imaging with digital reconstruction,” Opt. Lett. 26(3), 157–159 (2001). [CrossRef]  

9. F. Okano, H. Hoshino, J. Arai, and I. Yuyama, “Real-time pickup method for a three-dimensional image based on integral photography,” Appl. Opt. 36(7), 1598–1603 (1997). [CrossRef]  

10. M. Martinez-Corral, A. Dorado, J. C. Barreiro, G. Saavedra, and B. Javidi, “Recent advances in the capture and display of macroscopic and microscopic 3D scenes by integral imaging,” Proc. IEEE 105(5), 825–836 (2017). [CrossRef]  

11. A. Stern and B. Javidi, “Three-dimensional image sensing and reconstruction with time-division multiplexed computational integral imaging,” Appl. Opt. 42(35), 7036–7042 (2003). [CrossRef]  

12. E. H. Adelson and J. R. Bergen, “The plenoptic function and the elements of early vision,” Computational Models of Visual Processing 1, 3–20 (1991).

13. J. Liu, D. Claus, T. Xu, T. Keßner, A. Herkommer, and W. Osten, “Light field endoscopy and its parametric description,” Opt. Lett. 42(9), 1804–1807 (2017). [CrossRef]  

14. G. Scrofani, J. Sola-Pikabea, A. Llavador, E. Sanchez-Ortiga, J.C. Barreiro, G. Saavedra, J. Garcia-Sucerquia, and M. Martinez-Corral, “FIMic: design for ultimate 3D-integral microscopy of in-vivo biological samples,” Biomed. Opt. Express 9(1), 335–346 (2018). [CrossRef]  

15. J. Arai, E. Nakasu, T. Yamashita, H. Hiura, M. Miura, T. Nakamura, and R. Funatsu, “Progress overview of capturing method for integral 3-D imaging displays,” Proc. IEEE 105(5), 837–849 (2017). [CrossRef]  

16. M. Yamaguchi, “Full-parallax holographic light-field 3-D displays and interactive 3-D touch,” Proc. IEEE 105(5), 947–959 (2017). [CrossRef]  

17. C. Shannon, “A mathematical theory of communication,” BSTJ 27(3), 379–423 (1948). [CrossRef]  

18. T. M. Cover and J. A. Thomas, Elements of information theory, (John Wiley & Sons, 1991). [CrossRef]  

19. A. Collignon, F. Maes, D. Delaere, D. Vandermeulen, P. Suetens, and G. Marchal, “Automated multi-modality image registration based on information theory,” Information processing in medical imaging 263–274 (1995).

20. P. A. Viola and W. M. Wells III, “Alignment by maximization of mutual information,” Int. J. Comput. Vis. 24(2), 137–154 (1997). [CrossRef]  

21. J. West, J. Fitzpatrick, M. Wang, B. Dawant, C. Maurer, R. Kessler, R. Maciunas, R. Barillot, D. Lemoine, A. Collignon, F. Maes, P. Suetens, D. Vandermeulen, P. Van Den Elsen, S. Napel, T. Sumanaweera, B. Harkness, P. Hemler, D. Hill, D. Hawkes, C. Studholme, J. Maintz, M. Viergever, G. Malandein, X. Pennec, M. Noz, G. Maguire, M. Pollac, C. Pellizzari, R. Robb, D. Hanson, and R. Woods, “Comparison and evaluation of retrospective intermodality brain image registration techniques,” J. Comput. Assist. Tomogr. 21(4), 554–568 (1997). [CrossRef]  

22. F. Maes, A. Collignon, D. Vandermeulen, G. Marchal, and P. Suetens, “Multimodality image registration by maximization of mutual information,” IEEE Trans. Med. Imaging 16(2), 187–198 (1997). [CrossRef]  

23. M. Corsini, M. Dellepiane, F. Ponchio, and R. Scopigno, “Image-to-geometry registration: a mutual information method exploiting illumination-related geometric properties,” Pacific graphics 28(7), 1 (2009). [CrossRef]  

24. G. Egnal, “Mutual information as a stereo correspondence measure,” UPenn technical report no. MS-CIS-00-20, (2000).

25. D. B. Russakoff, C. Tomasi, T. Rohlfing, and C. R. Maurer, “Image similarity using mutual information of regions,” European conference on computer vision ECCV, 596–607 (2004).

26. J. Blazek and B. Zitova, “Image difference visualization based on mutual information,” WDS’10 Proceedings of contributed papers, 37–41 (2010).

27. B. Delabarre and E. Marchand, “Camera localization using mutual information-based multiplane tracking,” IEEE/RSJ Int. conference on intelligent robots and systems, (2013).

28. C. You, Y. Liu, B. Zhao, and S. Yang, “An objective quality metric for image fusion based on mutual information and multi-scale structural similarity,” JSW 9(4), 1050–1054 (2014). [CrossRef]  

29. S. R. Narravula, M. M. Hayat, and B. Javidi, “Information theoretic approach for accessing image fidelity in photon-counting arrays,” Opt. Express 18(3), 2449–2466 (2010). [CrossRef]  

30. J. S. Jang and B. Javidi, “Three-dimensional synthetic aperture integral imaging,” Opt. Lett. 27(13), 1144–1146 (2002). [CrossRef]  

31. M. Daneshpanah and B. Javidi, “Profilometry and optical slicing by passive three dimensional imaging,” Opt. Lett. 34(7), 1105–1107 (2009). [CrossRef]  

32. D. Bagnell, “Gibbs fields and Markov random fields,” https://www.cs.cmu.edu/∼16831-f14/notes/F11/16831_lecture07_bneuman.pdf.

33. M. Niss, “History of the lenz-Ising model 1950-1965: From irrelevance to relevance,” Arch. Hist. Exact Sci. 63(3), 243–287 (2009). [CrossRef]  

34. G. Chen, Q. Chen, Y. Chen, and X. Zhu, “Historical document image denoising using Ising model,” IEEE Int. Conf. on Dependable, Automatic and Secure Computing (DASC), 457–461 (2020).

35. F. W. Bentrem, “A Q-Ising model application for linear-time image segmentation,” Cent. Eur. J. Phys. 8, 689–698 (2010). [CrossRef]  

36. P. Qin and J. Zhao, “A polynomial time algorithm for image segmentation using Ising models,” Seventh International Conference on Natural Computations, 932–935 (2011).

37. S. Geman and D. Geman, “Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images,” IEEE Trans. Pattern Anal. Mach. Intell. PAMI-6(6), 721–741 (1984). [CrossRef]  

38. L. D. Landau and E. M. Lifshitz, Statistical Physics, Course of Theoretical Physics Vol. 5, (Pergamon Press, Oxford, 1976).

39. B. Walsh, “Markov chain Monte Carlo and Gibbs sampling,” http://www.stat.columbia.edu/∼liam/teaching/neurostat-spr11/papers/mcmc/mcmc-gibbs-intro.pdf

40. P. A. P. Moran, “Notes on continuous stochastic phenomenon,” Biometrika 37(1), 1 (1950). [CrossRef]  

41. E. Volden, G. Giraudon, and M. Berthod, “Information in markov random fields and image redundancy,” Selected papers from the 4th Canadian workshop on information theory and applications II, 250–268 (1996).

42. H. Maitre, “Entropie, information et image – Partie 2,” Technical report 94 D 006, École nationale supérieure des télécommunications, (1994).

43. J. P. W. Pluim, J. B. Maintz, and M. A. Viergever, “Mutual information matching and interpolation artifacts,” Proc. SPIE Medical Imaging: Image Processing 3661, (1999).

44. J. P. W. Pluim, J. B. Maintz, and M. A. Viergever, “Interpolation artefacts in mutual information-based image registration,” Comput. Vis. Image Underst. 77(2), 211–232 (2000). [CrossRef]  

45. S. Pertuz, D. Puig, and M. A. Garcia, “Analysis of focus measure operators for shape-from-focus,” Pattern Recognition 46(5), 1415–1432 (2013). [CrossRef]  




