
High-resolution lightfield photography using two masks

Open Access

Abstract

A major theme of computational photography is the acquisition of the lightfield, which opens up new imaging capabilities, such as focusing after image capture. However, to capture the lightfield, one normally has to sacrifice significant spatial resolution compared with normal imaging for a fixed sensor size. In this work, we present a new design for lightfield acquisition, which allows a higher resolution lightfield to be captured using two attenuation masks. They are positioned at the aperture stop and in the optical path respectively, so that the four-dimensional (4D) lightfield spectrum is encoded and sampled by a two-dimensional (2D) camera sensor in a single snapshot. Then, during post-processing, by exploiting the coherence embedded in a lightfield, we can retrieve the desired 4D lightfield at a higher resolution using inverse imaging. The performance of our proposed method is demonstrated with simulations based on actual lightfield datasets.

© 2012 Optical Society of America

1. Introduction

Advances in computational imaging suggest that we can capture more information than a single two-dimensional (2D) projection of a three-dimensional (3D) scene. Although the picture acquired in this manner may not be visually pleasing, via computational methods in post-processing it can yield data that could not be obtained with traditional methods [1–5]. In this paper, we focus on a camera design for computational photography that allows us to capture the “lightfield”. This is a term commonly used in the computer graphics literature [6], but it is not a “field” in the wave optics sense [7]; instead, it is a collection of light rays in geometric optics, which takes into account not only the geometrical position of the rays but also their directions.

Generally, the radiance along all the rays in a region of 3D space is mathematically characterized by a five-dimensional (5D) plenoptic function [8], i.e., three coordinates for the position and two angles for the direction. In free space, as the radiance does not change along a line unless it is occluded, such a 5D representation may be reduced to four-dimensional (4D), which is called the “lightfield” [6] or “lumigraph” [9]. With a lightfield, we can reconstruct, or render, various observations of the scene. For example, we can manipulate viewpoints and perform refocusing via ray-tracing techniques.
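As a concrete illustration of such rendering (our own sketch, not the implementation used in this paper), the following snippet performs synthetic refocusing by shift-and-add over a discretized two-plane lightfield; the (U, V, S, T) array layout and the `refocus` helper are illustrative assumptions.

```python
import numpy as np

def refocus(lightfield, slope):
    """Render a refocused photograph from a discretized two-plane lightfield.

    lightfield: array of shape (U, V, S, T), i.e. a grid of U x V views,
                each an S x T image (an illustrative layout, not a fixed standard).
    slope:      per-view shift in pixels per unit of (u, v); choosing the slope
                selects the scene depth that ends up in focus.
    """
    U, V, S, T = lightfield.shape
    out = np.zeros((S, T))
    for u in range(U):
        for v in range(V):
            du = int(round((u - U // 2) * slope))
            dv = int(round((v - V // 2) * slope))
            # shift each view to align the chosen depth, then average over all views
            out += np.roll(lightfield[u, v], shift=(du, dv), axis=(0, 1))
    return out / (U * V)

# Example: a random stand-in lightfield of 5 x 5 views of 64 x 64 pixels
lf = np.random.rand(5, 5, 64, 64)
photo = refocus(lf, slope=1.5)
```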

There are two main approaches to capturing lightfields. The first is to sample each individual light ray directly. An early example is integral photography [10], which gathers multiple images from different perspectives by placing an array of microlenses directly before the sensor. This is optically similar to a camera array system [11]. More recently, Adelson and Wang [12], and Ng et al. [13], developed what they call plenoptic cameras. In the latter, an additional main lens is placed in front of the microlens array. Since the microlenses are located at the focal plane of this additional lens, the converging rays are separated and finally recorded by the sensor behind the microlens array. A second approach is to acquire the data in the Fourier domain. Veeraraghavan et al. developed dappled photography [14], where an attenuation mask is added to a regular camera. Its working principle will be discussed in more detail in Sections 2.1 and 2.2. After that, Agrawal et al. extended this design to the problem of capturing useful subsets of a time-varying 4D lightfield in a single snapshot [15]. This “reinterpretable” imaging system adopts a time-varying mask in the pupil plane and a static mask placed near the sensor, providing a variable resolution tradeoff among the spatial, angular and temporal dimensions.

Nevertheless, a common issue for different lightfield camera systems is that spatial resolution is traded for angular information (for both angular and temporal information in [15]), because the limited sensor elements have to be allocated to all these dimensions [16, 17]. For instance, to acquire a lightfield of 144 views on a sensor of size 3072 × 1536, a twelvefold reduction in each spatial dimension means that the maximum resolution achievable is only 256 × 128. There have been attempts to overcome this tradeoff, but they come at the expense of other aspects. For example, the camera array system [11] can capture the 4D radiance information with a high resolution (i.e., the full sensor size of each camera) for each perspective, but the system is also known for its large size, which eventually limits its practical use. Alternatively, in a method known as programmable aperture photography [18], many image captures are needed to attain the required angular resolution. This results in a long acquisition time, which is not desirable in many practical applications. In [19], Lumsdaine and Georgiev describe a new design of a plenoptic camera, called the focused plenoptic camera, where the microlens array is positioned before or behind the focal plane of the main lens. This modification samples the lightfield in a way that allows for a higher spatial resolution. However, at the same time, the angular resolution is decreased. In addition, the low angular resolution introduces unwanted aliasing artifacts.

In this paper, we present a camera system that collects the 4D lightfield within a single exposure. With two attenuating masks separately placed at the aperture plane and the optical path of the camera, we can encode the lightfield spectrum in the Fourier domain, and then selectively sub-sample it. We show that this economical and easily adjustable design can overcome various limitations found in other lightfield acquisition systems.

2. A lightfield camera with two masks

2.1. Lightfield mapping via mask-based multiplexing

We explain the mapping of a lightfield with mask-based multiplexing. In geometrical optics, we describe light propagation in terms of rays, which together form a lightfield [6]. We describe the light rays by their intersections with two parallel planes as shown in Fig. 1, i.e., a first coordinate pair u = {u,v} (at the u-plane) and a second coordinate pair s = {s,t} (at the s-plane) [6]. The lightfield is then ℓ(u,v,s,t), which we abbreviate as ℓ(u,s) in the rest of this paper.

Fig. 1 The two-plane parametrization of a 4D lightfield.

Using this two-plane parametrization, we can analyze a conventional camera fitted with a mask between the u-plane and the s-plane. We depict such a camera in Fig. 2. The u-plane is taken to be at the aperture, while the s-plane is at the sensor. They are separated by a distance d, while the mask is placed at a distance z in front of the sensor, where z ≤ d. Let m(u,s) be the attenuation on a lightfield produced by the mask. The lightfield measured behind the mask is then ℓo(u,s), given by

$$\ell_o(\mathbf{u},\mathbf{s}) = \ell(\mathbf{u},\mathbf{s})\, m(\mathbf{u},\mathbf{s}). \tag{1}$$
If we can capture ℓo(u,s), we can retrieve ℓ(u,s) since m(u,s) is known.

Fig. 2 Schematic diagram of a regular camera, with an attenuation mask placed inside it.

In fact, m(u,s) is completely determined by the 2D pattern c(x,y) printed on the mask when the distances z and d are known. We denote the mask plane as the x-plane, with x = {x,y}. With reference to Fig. 2, because ΔABC and ΔADE are similar triangles, we have

$$\frac{BC}{DE} = \frac{AB}{AD} \;\Longrightarrow\; \frac{x-u}{s-u} = \frac{d-z}{d}. \tag{2}$$
Based on Eq. (2), we have x = (1 − z/d)s + (z/d)u. But since u = {u,v} and s = {s,t},
$$\mathbf{x} = \left(1-\frac{z}{d}\right)\mathbf{s} + \frac{z}{d}\,\mathbf{u}. \tag{3}$$
Thus, m(u,s) can be expressed as
$$m(\mathbf{u},\mathbf{s}) = c\!\left[\left(1-\frac{z}{d}\right)\mathbf{s} + \frac{z}{d}\,\mathbf{u}\right]. \tag{4}$$
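As a minimal numerical sketch of Eqs. (3) and (4) (ours, with an assumed cosine pattern and illustrative distances), the attenuation seen by a ray is found by mapping (u, s) to the point where the ray crosses the mask:

```python
import numpy as np

def mask_attenuation(u, s, z, d, c):
    """Attenuation m(u, s) experienced by the ray through u (aperture plane)
    and s (sensor plane), for a mask with pattern c placed a distance z in
    front of the sensor; see Eqs. (3)-(4). `c` is any callable pattern."""
    x = (1 - z / d) * s + (z / d) * u    # intersection with the mask plane, Eq. (3)
    return c(x)                          # m(u, s) = c(x), Eq. (4)

# Example with an assumed cosine pattern, kept nonnegative
c = lambda x: 0.5 * (1 + np.cos(2 * np.pi * 4.0 * x))
m = mask_attenuation(u=0.1, s=-0.2, z=0.02, d=0.05, c=c)
```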

In reality, however, we seldom directly capture the lightfield ℓo(u,s). Instead, it is instructive to consider the “lightfield-frequency” domain, which is the 4D Fourier transform applied to the lightfield in Eq. (1). Using fu and fs to denote the lightfield-frequency variables, we have

$$\mathcal{L}_o(\mathbf{f}_u,\mathbf{f}_s) = \mathcal{L}(\mathbf{f}_u,\mathbf{f}_s) * M(\mathbf{f}_u,\mathbf{f}_s), \tag{5}$$
where ℒo(fu, fs), ℒ(fu, fs) and M(fu, fs) are the respective Fourier transforms of ℓo(u,s), ℓ(u,s) and m(u,s), and * denotes the 4D convolution operation. Furthermore, we can express M(fu, fs) as
$$M(\mathbf{f}_u,\mathbf{f}_s) = \iint c\!\left[\left(1-\frac{z}{d}\right)\mathbf{s} + \frac{z}{d}\,\mathbf{u}\right] \exp\!\left[-j2\pi(\mathbf{f}_u\!\cdot\!\mathbf{u} + \mathbf{f}_s\!\cdot\!\mathbf{s})\right] d\mathbf{u}\, d\mathbf{s} = \int \left\{ \int c\!\left[\left(1-\frac{z}{d}\right)\mathbf{s} + \frac{z}{d}\,\mathbf{u}\right] \exp(-j2\pi \mathbf{f}_s\!\cdot\!\mathbf{s})\, d\mathbf{s} \right\} \exp(-j2\pi \mathbf{f}_u\!\cdot\!\mathbf{u})\, d\mathbf{u}. \tag{6}$$
Clearly, the positioning of the mask (i.e., the value of z) affects the lightfield ℓo(u,s). This effect is explained in further detail as follows.
  1. Generally, the mask is between the aperture and the sensor, so 0 < z < d. According to Eq. (6), the inner integration computes the Fourier transform over the dimension of s with some shift and scaling, i.e. [20],
    $$M(\mathbf{f}_u,\mathbf{f}_s) = \frac{d}{d-z} \int C\!\left(\frac{d}{d-z}\mathbf{f}_s\right) \exp\!\left[j2\pi\left(\frac{z}{d-z}\mathbf{f}_s\right)\!\cdot\!\mathbf{u}\right] \exp(-j2\pi \mathbf{f}_u\!\cdot\!\mathbf{u})\, d\mathbf{u} = \frac{d}{d-z}\, C\!\left(\frac{d}{d-z}\mathbf{f}_s\right) \delta\!\left(\mathbf{f}_u - \frac{z}{d-z}\mathbf{f}_s\right), \tag{7}$$
    where C(·) represents the 2D Fourier transform of c(·). This means that the modulation caused by the mask in the lightfield-frequency domain happens along an inclined 2D plane, where fu − [z/(d − z)] fs = 0 (a short numerical sketch of this case follows the list). Its inclination angle α, if we plot fs versus fu, is given by
    $$\alpha = \arctan\frac{z}{d-z}. \tag{8}$$
  2. Alternatively, the mask can be placed exactly at the aperture, where z = d. All the rays with the same location in the u-plane are then attenuated equally by the mask. Substituting z = d into Eq. (6), we obtain
    $$M(\mathbf{f}_u,\mathbf{f}_s) = C(\mathbf{f}_u)\,\delta(\mathbf{f}_s). \tag{9}$$
    Thus, in the lightfield-frequency domain, the corresponding convolution only affects the lightfield spectrum along the fu axis (where fs = 0). This observation is critical to our design, as we will explain next.
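The following is a short numerical sketch (ours, with illustrative distances and an assumed cosine mask) of case 1 above: it evaluates the inclination angle in Eq. (8) and builds a toy 2D (one angular, one spatial dimension) lightfield whose masked spectrum shows replicas displaced along that inclined direction.

```python
import numpy as np

d, z = 50e-3, 10e-3                 # illustrative aperture-sensor and mask-sensor distances (m)
alpha = np.arctan(z / (d - z))      # Eq. (8): inclination of the modulation plane
print(f"alpha = {np.degrees(alpha):.1f} deg")   # about 14 deg for these numbers

# Toy 2D lightfield l(u, s): a single slanted feature (one scene depth)
N = 256
u = np.linspace(-0.5, 0.5, N).reshape(-1, 1)    # angular coordinate
s = np.linspace(-0.5, 0.5, N).reshape(1, -1)    # spatial coordinate
l = np.exp(-(s - 0.2 * u) ** 2 / 0.005)

# Cosine mask of Eq. (4); its spectrum is a symmetric pair of impulses
f0 = 40.0
m = 1 + np.cos(2 * np.pi * f0 * ((1 - z / d) * s + (z / d) * u))
L_masked = np.abs(np.fft.fftshift(np.fft.fft2(l * m)))

# The replicas of the spectrum of l appear at displacements whose f_u : f_s
# ratio is z/(d - z), i.e. along the plane tilted by alpha.
```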

2.2. Lightfield capture and image reconstruction

The sensor at the s-plane cannot capture the full 4D lightfield ℓo(u,s) as given in Eq. (1). Instead, all rays with the same (s,t) but different (u,v) are collected (i.e., integrated together) by the same photodetector. In the lightfield-frequency domain, this means the sensor only obtains data at fu = 0, or along the fs axis.

Ref. [14], however, provides a strategy to capture the 4D lightfield using a normal sensor, which we briefly review here. This forms the basis of our computational photography architecture, which makes use of two masks. Assume that c(x) is the sum of a series of cosine waves of equal amplitude; C(fx) is then an impulse train with even symmetry, which causes modulation along a slanted plane. Specifically, Eq. (5) suggests that ℒo(fu, fs) contains replications of ℒ(fu, fs) along a slanted plane at the angle α given by Eq. (8). This is shown in Fig. 3. For ease of explanation, we depict the lightfield spectrum as consisting of several sections along the fu axis, each of which is called an angular spectral slice. By adjusting α and the distance between consecutive replications of the lightfield spectrum along the slanted plane, we can position all the sections along the fs axis. Therefore, the 2D slice of data collected by the sensor still contains all the information about the 4D lightfield.

Fig. 3 The modulation in the lightfield-frequency domain.

The tradeoff with this mode of capture is that the slice in Fig. 3 needs to be much longer than what would be needed for conventional photography; therefore, many more samples are needed to achieve the same 2D resolution for one reconstructed picture. Put another way, assume the overall number of pixels is q. Then, to resolve n different views, we can only assign q/n of the pixels to sample each angular spectral slice, compared with using all q pixels for a single picture in conventional photography. This ultimately reduces the spatial resolution by a factor of n. Our lightfield camera design seeks to ameliorate this problem: when each angular spectral slice can carry more information than merely one perspective or view, fewer replicas of the lightfield spectrum are needed. This effectively shortens the sensor slice, and as a result a higher resolution lightfield can be obtained with a fixed sensor size.

2.3. Lightfield capture with a double-mask design

We propose a lightfield camera as shown in Fig. 4. We assume that the lightfield spectrum is bandlimited, i.e., ℒ(fu, fs) = 0 for |fu| ≥ Bu/2 or |fs| ≥ Bs/2. This is reasonable because the optics imposes a cutoff in the optical transfer function along the fs axis. As for fu, Ref. [21] shows that the corresponding bandwidth is essentially determined by the depth range of a scene.

Fig. 4 Our proposed lightfield camera, with two attenuation masks respectively placed at the aperture stop and the optical path in the camera.

We analyze the working principle of this camera by considering the operations in the lightfield-frequency domain, as shown in Fig. 5. After passing through the first attenuation mask located at the aperture stop, the incoming bandlimited lightfield is convolved with the mask spectrum along the fu axis. If the mask frequency response is a series of impulses, the lightfield spectrum is replicated along the fu axis, causing the angular spectral slices to overlap one another. This constitutes the encoding of the lightfield spectrum. Because of the second mask, the encoded lightfield spectrum is then replicated along a slanted line. By adjusting the position of this mask, we can place the desired angular spectral slices along the fs axis. Thereafter, we perform the lightfield reconstruction from the 2D slice of data collected by the sensor in the fashion described in Section 2.2.

Fig. 5 The corresponding lightfield-frequency domain operations in our double-mask lightfield camera. (The asterisk pattern in the figure denotes the convolution.)

The analysis in the lightfield-frequency domain provides an intuitive understanding of our design. However, for the purposes of mask design and lightfield retrieval, we need to model the acquisition process explicitly. This is expressed as

$$i(\mathbf{s}) = \int \ell(\mathbf{u},\mathbf{s})\, m_1(\mathbf{u},\mathbf{s})\, m_2(\mathbf{u},\mathbf{s})\, d\mathbf{u} = \int \ell(\mathbf{u},\mathbf{s})\, c_1(\mathbf{u})\, c_2\!\left[\left(1-\frac{z}{d}\right)\mathbf{s} + \frac{z}{d}\,\mathbf{u}\right] d\mathbf{u}, \tag{10}$$
where i(s) is the 2D picture recorded by the sensor, and m1(u,s) and m2(u,s) are the respective attenuations provided by the mask at the aperture stop (with pattern c1(x)) and the mask in the camera’s optical path (with pattern c2(x)) shown in Fig. 4. The formulas for the masks are given by Eq. (4).
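A toy numerical version of Eq. (10) (our own sketch, in a 2D lightfield setting with assumed mask patterns c1 and c2 and illustrative distances) integrates the doubly masked lightfield over u to form the sensor picture:

```python
import numpy as np

# Discretize a toy 2D lightfield l(u, s) and integrate over u, as in Eq. (10)
N = 256
u = np.linspace(-0.5, 0.5, N).reshape(-1, 1)
s = np.linspace(-0.5, 0.5, N).reshape(1, -1)
l = np.exp(-(s - 0.2 * u) ** 2 / 0.005)                   # toy lightfield

d, z = 50e-3, 10e-3                                        # illustrative distances
c1 = lambda u: 0.5 * (1 + np.cos(2 * np.pi * 6.0 * u))     # assumed aperture-mask pattern
c2 = lambda x: 0.5 * (1 + np.cos(2 * np.pi * 40.0 * x))    # assumed optical-path mask pattern

x = (1 - z / d) * s + (z / d) * u                          # Eq. (3): mask-plane coordinate
i_s = np.trapz(l * c1(u) * c2(x), u.ravel(), axis=0)       # the 1D "picture" at the sensor
```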

As indicated in Fig. 5, our design is based on a series of operations in the lightfield-frequency domain. Thus, it is natural to convert the integral in Eq. (10) into a form expressed under the Fourier bases. After discretizing Eq. (10) and converting it into matrix form, we have

$$\mathbf{i} = F^{-1} M_2 M_1 F \boldsymbol{\ell} = F^{-1} M F \boldsymbol{\ell} = A \boldsymbol{\ell}, \tag{11}$$
where F and F−1 are the matrices consisting of the Fourier basis and its inverse, M1 and M2 consist of the coefficients of the Fourier transforms of c1(x) and c2(x), respectively, and A = F−1MF is the projection matrix. Therefore, the image formation of our lightfield camera can be treated as a linear integration process in the context of geometrical optics, as indicated in [22, 23]. More specifically, it is a measuring procedure in the lightfield-frequency domain through a measurement matrix M = M2M1.

We note that the discretized lightfield ℓ is arranged into a 2D matrix of size n × m, with n as the resolution in the u dimension and m as the resolution in the s dimension. Assume M1 and M2 are of size p × n and k × p, respectively. Then M is a k × n matrix, which means that we take k measurements of the coefficients decomposed over n Fourier bases. The size of the final captured picture i is k × m, meaning we need a sensor with km pixels. We can compare this with the design in [14], which forbids overlap between the replicated spectra. Consequently, the matrix M in their case is diagonal (k = n), so to achieve a lightfield with the same resolution, the dappled photography system needs nm pixels. In our design, however, the measurement matrix is the product of two matrices M2 and M1. This provides us with the means to control the two dimensions of M separately. Hence, if we can achieve a measurement matrix M with k < n, fewer pixels will be used to sample the signal. In other words, we can acquire a higher spatial resolution lightfield using the same number of pixels. As discussed next, we can indeed realize a measurement matrix with k < n in our design.
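To make the dimension bookkeeping concrete, here is a sketch (ours; the sizes, the random coefficients and the simple row-selecting M2 are illustrative stand-ins, not the designed masks of Section 2.4) that forms M = M2M1 and compares the pixel budgets:

```python
import numpy as np
from scipy.linalg import toeplitz

n, m = 10, 256        # angular and spatial resolution of the target 2D lightfield
p, k = 10, 6          # rows of M1 and number of measurements actually kept (k < n)

rng = np.random.default_rng(0)
a = rng.normal(size=n)               # a_0 .. a_{n-1}; symmetry gives a_{-i} = a_i
M1 = toeplitz(c=a[:p], r=a)          # p x n Toeplitz block with a_0 on the diagonal, Eq. (13)

select = np.linspace(0, p - 1, k).round().astype(int)
M2 = np.eye(p)[select]               # k x p row selector (one simple choice for M2)

M = M2 @ M1                          # k x n measurement matrix
print(M.shape)                       # (6, 10)

# Pixel budget: our camera needs k*m sensor pixels, dappled photography needs n*m
print("proposed:", k * m, "pixels;  dappled photography:", n * m, "pixels")
```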

2.4. Design of the two masks

In this section, we describe the pattern design of these two attenuation masks. For clarity, only the case of a 2D lightfield is presented here, but the conclusions can be easily extended to a 4D lightfield.

The first row of Fig. 5 shows the desired frequency response of the first mask, which is a symmetric impulse train. The interval between consecutive impulses is equal to the sampling interval of the lightfield spectrum along the fu axis. Thus, the corresponding physical mask pattern is the sum of multiple cosine waves with given amplitudes, which in turn determines M1 completely. Specifically, assume the first mask has the following frequency response:

$$C_1(f_u) = \sum_{i=-(n-1)}^{n-1} a_i\, \delta(f_u - i\,\Delta f_u), \tag{12}$$
where n is the expected resolution along the fu axis, ai is the amplitude of the i-th impulse and Δfu is the sampling interval of the lightfield spectrum along the fu axis, which is equal to Bu/n with Bu as the bandwidth in the fu dimension. Because the first mask is convolved with the lightfield spectrum in the lightfield-frequency domain, by converting the convolution into a matrix multiplication, we have M1 equal to
$$\begin{bmatrix} a_0 & a_1 & \cdots & a_{n-1} \\ a_{-1} & a_0 & \cdots & a_{n-2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{-(n-1)} & a_{-(n-2)} & \cdots & a_0 \end{bmatrix}_{p\times n}. \tag{13}$$

Thus, we have constructed a matrix M1 with a Toeplitz-structured block inside it. Because of the second mask, only k rows of M1 are eventually selected, so the other rows are marked with ellipses for simplicity. Note that we can recover the original sparse signal with high probability from the limited observations measured by a well-designed Toeplitz-structured matrix [24, 25]. To satisfy the conditions for such a design, several methods have been recommended. As suggested in [24], we generate M1 with entries ai, i = 0,...,n − 1, drawn independently from a Gaussian distribution with zero mean. Since the impulse train is symmetric about a0 (i.e., a−i = ai), the values of ai for i = −(n − 1),...,−1 then follow. Eventually, we obtain the physical pattern of the first mask from its frequency response in Eq. (12).
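A brief sketch (ours, with illustrative values for n and Bu) of turning the impulse train of Eq. (12) into a printable pattern, including the nonnegativity adjustment mentioned in Section 3:

```python
import numpy as np

n = 10                          # target angular resolution
Bu = 1.0                        # assumed angular bandwidth of the lightfield
dfu = Bu / n                    # impulse spacing Delta f_u in Eq. (12)

rng = np.random.default_rng(1)
a = rng.normal(size=n)          # a_0 .. a_{n-1} ~ N(0, 1); a_{-i} = a_i by symmetry

# Inverse transform of the symmetric impulse train: a_0 + 2 * sum_i a_i cos(2 pi i dfu x)
x = np.linspace(-5.0, 5.0, 4096)
c1 = a[0] + 2 * sum(a[i] * np.cos(2 * np.pi * i * dfu * x) for i in range(1, n))

# A physical mask cannot be negative, so shift and scale to a [0, 1] transmittance
c1 = (c1 - c1.min()) / (c1.max() - c1.min())
```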

As for the second mask placed in the optical path, the second row of Fig. 5 shows a heuristic example. That is, the frequency response of the second mask is a series of even-symmetric impulses with equal amplitudes. The number of impulses depends on how many measurements are required for reconstruction. To avoid aliasing between adjacent spectral replicas, the interval of this impulse train is equal to the lightfield bandwidth in the fs dimension, i.e., Bs. Specifically, the frequency response of the second mask is given by

$$C_2(f_x) = \sum_{i=-(k-1)/2}^{(k-1)/2} \delta(f_x - i\,B_s), \tag{14}$$
where k is the number of measurements. The corresponding mask pattern c2(x) is obtained by computing the inverse Fourier transform of Eq. (14), which is again the sum of a series of cosine waves. Its matrix form M2 depends on which measurements are to be collected for further reconstruction; in effect, M2 selects k rows of M1 according to the specific design.
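A corresponding sketch for the second mask (ours; it assumes an odd k so that the impulse indices in Eq. (14) are integers, and an illustrative value of Bs):

```python
import numpy as np

k = 7                           # number of measurements (odd in this sketch)
Bs = 4.0                        # assumed spatial bandwidth, i.e. the impulse spacing

# Inverse transform of k equal-amplitude impulses: 1 + 2 * sum_i cos(2 pi i Bs x)
x = np.linspace(-2.0, 2.0, 4096)
c2 = 1 + 2 * sum(np.cos(2 * np.pi * i * Bs * x) for i in range(1, (k - 1) // 2 + 1))

c2 = (c2 - c2.min()) / (c2.max() - c2.min())   # nonnegative, printable transmittance
```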

2.5. Lightfield reconstruction

After constructing the two masks, we can then establish the projection matrix A in Eq. (11). Next we consider the reconstruction of the target 4D lightfield based on the captured 2D picture i and the projection matrix A. We adopt two different approaches to solve such an inverse problem. The first is to find its least-norm solution, i.e.,

$$\boldsymbol{\ell} = A^{\dagger}\mathbf{i} = A^{T}(AA^{T})^{-1}\mathbf{i}, \tag{15}$$
where A† denotes the pseudoinverse of A. While this is simple and fast, the solution is often not sufficiently accurate due to the lack of prior information about the lightfield. To improve the reconstruction accuracy, we make use of prior knowledge about a lightfield and impose regularization in the reconstruction process. One possibility is a sparse regularizer, namely a 2D total variation (TV) penalty on the u dimension of the lightfield to reflect the inherent correlations. We also use a 2D TV norm regularization on the s dimension of the lightfield to preserve the edges and suppress the noise [26–28]. Thus, we reconstruct the lightfield by the optimization given by
$$\hat{\boldsymbol{\ell}} = \arg\min_{\boldsymbol{\ell}} \left\{ \frac{1}{2}\|A\boldsymbol{\ell} - \mathbf{i}\|_2^2 + \lambda \sum_{\mathbf{u}} \|i_{\mathbf{u}}\|_{\mathrm{TV}} + \mu \sum_{\mathbf{s}} \|i_{\mathbf{s}}\|_{\mathrm{TV}} \right\}, \tag{16}$$
where λ and μ are the regularization parameters, iu is the 2D image corresponding to the lightfield ℓ(u, s) at a fixed u, and is refers to the lightfield ℓ(u, s) at a fixed s.

This optimization can be solved via a nonlinear conjugate gradient method combined with backtracking line search, as adopted in [29].
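A minimal reconstruction sketch (ours, on a toy problem): the least-norm route of Eq. (15), and a regularized route in the spirit of Eq. (16) in which a smoothed 1D TV penalty and scipy's general-purpose CG optimizer stand in for the 2D TV terms and the tailored nonlinear conjugate gradient solver of [29]:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
A = rng.normal(size=(60, 100))        # toy projection matrix (k*m x n*m in the real system)
ell_true = np.convolve(rng.normal(size=100), np.ones(7) / 7, mode="same")  # smooth ground truth
i_meas = A @ ell_true                 # simulated captured (vectorized) picture

# Route 1: least-norm solution of the underdetermined system, Eq. (15)
ell_ln = A.T @ np.linalg.solve(A @ A.T, i_meas)

# Route 2: regularized estimate with a smoothed TV penalty (a 1D stand-in for Eq. (16))
def smoothed_tv(v, eps=1e-3):
    d = np.diff(v)
    return np.sum(np.sqrt(d * d + eps))

def objective(v, lam=0.05):
    r = A @ v - i_meas
    return 0.5 * r @ r + lam * smoothed_tv(v)

res = minimize(objective, ell_ln, method="CG")   # warm-start from the least-norm solution
ell_tv = res.x
```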

3. Experimental results

A direct way to verify the ability to achieve a high-resolution lightfield is to use a fixed number of pixels to retrieve a lightfield with a higher spatial resolution. Alternatively, one can aim at obtaining a lightfield of a fixed resolution with fewer pixels, which is the approach we take here. The following experiments are based on actual lightfield datasets from the Stanford lightfield archive [30]. For computational considerations, we choose 100 views on a 10 × 10 grid and resize the images to 128 × 256 pixels.

Figure 6 shows the corresponding mask patterns adopted in the experiments. According to Eq. (12) in Section 2.4, the required frequency response of the mask at the aperture stop is an even-symmetric impulse train of size 19 × 19 (where n = 10 × 10 in our experiments). The amplitudes of these impulses are drawn independently from a Gaussian distribution with zero mean. The physical pattern shown in Fig. 6(a) is the one we use here. Since the mask at the aperture stop is responsible for encoding the lightfield spectrum, we keep this mask unchanged throughout our experiments.

Fig. 6 (a) The pattern of the first mask; (b)–(e) are the pattern parts of the second mask, respectively in cases of using full, 64%, 36% and 16% sensor size.

For the mask placed in the optical path, its frequency response depends on the specific requirement of the number of measurements. For example, for the case of using the full sensor size (i.e., 1280 × 2560), it is a 10 × 10 impulse train with equal amplitudes based on Eq. (14). Similarly, we have 8 × 8 for the case of using 64% sensor size (i.e., 1024 × 2048), 6 × 6 for the case of using 36% sensor size (i.e., 768 × 1536) and 4 × 4 for the case of using 16% sensor size (i.e., 512 × 1024). Figures 6(b)–6(e) show the corresponding pattern parts in these different cases. Notice that since we cannot have negative values in the mask, we need to increase the DC component so that the values in these masks are nonnegative.

Next, we show the performance of our camera when using different sensor sizes. That is, we aim to retrieve the original lightfield, at the same spatial resolution, from signals captured with different physical sensor sizes. Figure 7 shows the pictures captured by the proposed lightfield camera with different numbers of pixels. Figure 8 shows the corresponding reconstructed images at one selected viewpoint. For the sake of comparison, we use both the least-norm method in Eq. (15) and our proposed algorithm in Eq. (16) for lightfield reconstruction. In the case of using the full sensor, both methods can yield perfect reconstructions matching the ground truth. With a mild reduction in sensor size, the recoveries still provide good detail comparable with the ground truth, as shown in the case of using 64% sensor size. With further reduction, however, the reconstruction becomes more difficult, although the reconstructed images are still satisfactory with 36% and 16% of the pixels. Furthermore, in comparison with the reconstructions by the least-norm method (the left column in Fig. 8), we can see that our method preserves more details and provides better artifact control (e.g., the ringing artifacts around the beans). Nevertheless, we also observe that with a significant reduction in sensor size, some of the details in the images are lost and the images become blurry.

Fig. 8 The reconstructed images at one selected viewpoint by using the least-norm method (left column) and the proposed method in Section 2.5 (right column): (a) ground truth, (b) and (c) full size, (d) and (e) 64% sensor size, (f) and (g) 36% sensor size, (h) and (i) 16% sensor size.

Finally, we show that a higher resolution lightfield can be acquired with our proposed system than with conventional lightfield cameras using the same sensor size. Figure 9 shows the case of using 36% sensor size (i.e., 768 × 1536). If we use conventional lightfield cameras such as the ones in [13, 14], the maximum spatial resolution that can be achieved is 76 × 153. From the results shown in Fig. 9, we can see that with our proposed camera the lightfield can be recovered at a higher spatial resolution. Such a resolution enhancement becomes more prominent in the case of using 16% sensor size (i.e., 512 × 1024). In this case, the best quality that can be achieved with the conventional method is 51 × 102. By adopting the proposed camera, however, we can still reconstruct many details of the scene from the captured data. See Fig. 10 for details.
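The quoted numbers follow directly from the pixel budgets; the short sketch below (ours) reproduces the sensor sizes and the two resolution figures for each case:

```python
views = (10, 10)                      # n x n angular resolution of the target lightfield
target = (128, 256)                   # desired spatial resolution
full_sensor = (views[0] * target[0], views[1] * target[1])   # 1280 x 2560 pixels

for frac, k_grid in [(1.00, (10, 10)), (0.64, (8, 8)), (0.36, (6, 6)), (0.16, (4, 4))]:
    scale = frac ** 0.5
    sensor = (round(full_sensor[0] * scale), round(full_sensor[1] * scale))
    conventional = (sensor[0] // views[0], sensor[1] // views[1])  # one pixel group per view
    proposed = (sensor[0] // k_grid[0], sensor[1] // k_grid[1])    # only k x k measurements
    print(f"{int(frac*100)}% sensor {sensor}: conventional {conventional}, proposed {proposed}")
```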

Fig. 9 Reconstructions when using 36% sensor size: (a) ground truth; (b) the best quality that can be achieved by using the traditional lightfield cameras; (c) our reconstruction with the least-norm method; (d) our reconstruction with the proposed iterative method.

Fig. 10 Reconstructions when using 16% sensor size: (a) ground truth; (b) the best quality that can be achieved by using the traditional lightfield cameras; (c) our reconstruction with the least-norm method; (d) our reconstruction with the proposed iterative method.

4. Conclusions

We have presented a system that captures a 4D lightfield with two attenuation masks. Taking advantage of the correlations inherent in the lightfield, we developed a post-processing algorithm to reconstruct the lightfield from the 2D data captured by the sensor. The experimental results show that fewer pixels are needed to achieve the same resolution as a conventional lightfield camera.

Acknowledgments

This work was supported in part by the University Research Committee of the University of Hong Kong under Project 10208648.

References and links

1. E. Y. Lam, “Computational photography: Advances and challenges,” in Tribute to Joseph W. Goodman, H. J. Caulfield and H. H. Arsenault, eds., Proc. SPIE 8122, 81220O (2011).

2. W. T. Cathey and E. R. Dowski, “New paradigm for imaging systems,” Appl. Opt. 41, 6080–6092 (2002).

3. J. Mait, R. Athale, and J. van der Gracht, “Evolutionary paths in imaging and recent trends,” Opt. Express 11, 2093–2101 (2003).

4. W.-S. Chan, E. Y. Lam, M. K. Ng, and G. Y. Mak, “Super-resolution reconstruction in a computational compound-eye imaging system,” Multidim. Syst. Sign. Process. 18, 83–101 (2007).

5. T. Mirani, D. Rajan, M. P. Christensen, S. C. Douglas, and S. L. Wood, “Computational imaging systems: Joint design and end-to-end optimality,” Appl. Opt. 47, B86–B103 (2008).

6. M. Levoy and P. Hanrahan, “Light field rendering,” in Proceedings of ACM SIGGRAPH (1996), pp. 31–42.

7. J. W. Goodman, Introduction to Fourier Optics, 3rd ed. (Roberts and Company Publishers, 2004).

8. E. H. Adelson and J. R. Bergen, “The plenoptic function and the elements of early vision,” in Computational Models of Visual Processing, M. S. Landy and J. A. Movshon, eds. (MIT Press, 1991), pp. 3–20.

9. S. J. Gortler, R. Grzeszczuk, R. Szeliski, and M. F. Cohen, “The lumigraph,” in Proceedings of ACM SIGGRAPH (1996), pp. 43–54.

10. G. Lippmann, “Épreuves réversibles donnant la sensation du relief,” J. Phys. Théor. Appl. 7, 821–825 (1908).

11. B. Wilburn, N. Joshi, V. Vaish, E.-V. Talvala, E. Antunez, A. Barth, A. Adams, M. Horowitz, and M. Levoy, “High performance imaging using large camera arrays,” in Proceedings of ACM SIGGRAPH (2005), pp. 765–776.

12. E. H. Adelson and J. Y. Wang, “Single lens stereo with a plenoptic camera,” IEEE Trans. Pattern Anal. Mach. Intell. 14, 99–106 (1992).

13. R. Ng, M. Levoy, M. Brédif, G. Duval, M. Horowitz, and P. Hanrahan, “Light field photography with a hand-held plenoptic camera,” Stanford Tech. Report CTSR (2005), pp. 1–11.

14. A. Veeraraghavan, R. Raskar, A. Agrawal, A. Mohan, and J. Tumblin, “Dappled photography: mask enhanced cameras for heterodyned light fields and coded aperture refocusing,” in Proceedings of ACM SIGGRAPH 26 (2007).

15. A. Agrawal, A. Veeraraghavan, and R. Raskar, “Reinterpretable imager: Towards variable post-capture space, angle and time resolution in photography,” Comput. Graph. Forum 29, 763–772 (2010).

16. T. Georgeiv, K. C. Zheng, B. Curless, D. Salesin, S. Nayar, and C. Intwala, “Spatio-angular resolution tradeoff in integral photography,” in Proceedings of Eurographics Symposium on Rendering (2006), pp. 263–272.

17. Z. Xu and E. Y. Lam, “Light field superresolution reconstruction in computational photography,” in Signal Recovery and Synthesis (Optical Society of America, 2011), p. SMB3.

18. C.-K. Liang, T.-H. Lin, B.-Y. Wong, C. Liu, and H. H. Chen, “Programmable aperture photography: multiplexed light field acquisition,” in Proceedings of ACM SIGGRAPH 27 (2008), pp. 1–10.

19. A. Lumsdaine and T. Georgiev, “The focused plenoptic camera,” in Proceedings of IEEE International Conference on Computational Photography (IEEE, 2009), pp. 1–8.

20. R. N. Bracewell, The Fourier Transform and Its Applications, 3rd ed. (McGraw-Hill, 1999).

21. J.-X. Chai, X. Tong, S.-C. Chan, and H.-Y. Shum, “Plenoptic sampling,” in Proceedings of ACM SIGGRAPH 27 (2000), pp. 307–318.

22. A. Levin, W. T. Freeman, and F. Durand, “Understanding camera trade-offs through a Bayesian analysis of light field projections,” in Proceedings of the 10th European Conference on Computer Vision (2008), pp. 88–101.

23. Z. Xu and E. Y. Lam, “A spatial projection analysis of light field capture,” in Frontiers in Optics (Optical Society of America, 2010), p. FWH2.

24. W. U. Bajwa, J. D. Haupt, G. M. Raz, S. J. Wright, and R. D. Nowak, “Toeplitz-structured compressed sensing matrices,” in Proceedings of IEEE/SP 14th Workshop on Statistical Signal Processing (IEEE, 2007), pp. 294–298.

25. W. Yin, S. Morgan, J. Yang, and Y. Zhang, “Practical compressive sensing with Toeplitz and circulant matrices,” in Visual Communications and Image Processing, Proc. SPIE 7744, 77440K (2010).

26. L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Physica D 60, 259–268 (1992).

27. E. Y. Lam, X. Zhang, H. Vo, T.-C. Poon, and G. Indebetouw, “Three-dimensional microscopy and sectional image reconstruction using optical scanning holography,” Appl. Opt. 48, H113–H119 (2009).

28. X. Zhang and E. Y. Lam, “Edge-preserving sectional image reconstruction in optical scanning holography,” J. Opt. Soc. Am. A 27, 1630–1637 (2010).

29. Z. Xu and E. Y. Lam, “Image reconstruction using spectroscopic and hyperspectral information for compressive terahertz imaging,” J. Opt. Soc. Am. A 27, 1638–1646 (2010).

30. “The (new) Stanford light field archive,” http://lightfield.stanford.edu/lfs.html.
