## Abstract

Mask-based lensless imagers have broad application prospects owing to their ultra-thin form factor. However, the visual quality of the restored images is poor because the system is ill-conditioned. In this work, we propose a deep analytic network that unrolls the traditional optimization process into an end-to-end network. Our network combines analytic updates with a deep denoiser prior to progressively improve lensless image quality over a few iterations. The convergence is proven mathematically and verified in the results. In addition, our method applies generally to non-blind restoration. We detail the solution for the general inverse problem and conduct five groups of deblurring experiments as examples. Both sets of experimental results demonstrate that our method outperforms existing state-of-the-art methods.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

## 1. Introduction

Computational imaging has developed rapidly in recent years. Unlike traditional optical imaging based on refractive lenses, a computational imaging system does not directly realize the point-to-point mapping from the object to the image; instead, it encodes the object and then restores the image from the encoded pattern. Although this complicates the imaging process, the weight and volume of the imaging system are greatly reduced. In the past few years, various miniaturized lensless imaging systems have been proposed. Asif et al. [1,2] proposed the FlatCam lensless camera, which contains only an amplitude mask in front of the CMOS sensor. It not only has an ultra-thin body but also supports a variety of functions such as 3D imaging [3] and 3D fluorescence microscopy [4]. FlatCam has also been used in face recognition [5] and privacy protection [6]. Based on a similar principle, DeWeert et al. [7] used a spatial light modulator to implement a programmable mask. Their experiments showed that multiple masks give better results than a single mask.

In addition to the above amplitude masks, phase masks are widely used in lensless imaging [8]. Boominathan et al. proposed a phase-encoded version of FlatCam called PhlatCam [9], which contains only a phase mask. Compared with amplitude encoding, phase encoding can better regulate the PSF and thus yields better imaging results. Other phase devices commonly used for lensless imaging include Fresnel zone apertures [10,11], phase gratings [12], diffractive lenses [13,14], and diffusers [15]. The first three devices are controllable phase masks, while a diffuser is a pseudo-random phase mask. Cao et al. [16] used a Fresnel zone aperture for wave-front coding and achieved imaging under incoherent illumination. Gill et al. [17] designed a Planar Fourier Capture Array that consists of an array of angle-sensitive pixels. Every pixel is composed of a photodiode under two metal gratings and reports one component of a spatial two-dimensional Fourier transform of the local light field. Gill and Stork [18,19] designed odd-symmetry spiral phase gratings and used them for lensless imaging. Imaging based on the principle of diffraction has gained popularity in recent years. Peng et al. [20] conducted multiple diffractive experiments with configurations such as a pure phase plate, stacked phase plates, and a phase plate combined with a lens. This work provided an alternative way to build light-efficient and thin optics for white-light imaging. To address the chromatic aberration of diffraction, Peng et al. [21] designed an optimized diffractive lens that makes the PSF of each band the same. By jointly optimizing the parameters of the optical device and the reconstruction algorithm, Sitzmann et al. [22] proposed an end-to-end snapshot super-resolution imaging scheme. A diffuser is a random phase mask with an optical memory effect. Antipa et al. [23] proposed DiffuserCam, a lensless imaging system based on a diffuser. Similar to FlatCam, DiffuserCam also supports microscopic imaging [24]. After adding a spectral filter array in front of the sensor, DiffuserCam can perform spectral imaging [25]. Kim et al. [26] proposed a “see-through” lensless camera in which the sensor is placed on the side of the glass. In this system, the glass plays the role of a random phase encoder, and the object is encoded after being refracted.

Due to the ill-conditioned nature of the inverse problem, image restoration (IR) is a key step in lensless imaging. A common approach is to establish the cost function by maximizing the posterior probability (MAP) $P({x|y} )$:

$$\hat x = \mathop {\arg \max }\limits_x \; \log P({y|x} )+ \log P(x ), \tag{1}$$

where *y* and *x* represent the measurement and the latent image, respectively. $\log P({y|x} )$ denotes the log-likelihood of the measurement and $\log P(x )$ represents the prior information of the latent image. Traditional model-based methods use optimization to output the reconstructed images after multiple iterations [1,9]. Generally, the quality of the output images from these methods is not satisfactory because the latent images do not fully conform to hand-picked image priors such as the Hyper-Laplacian prior [27] or the Total Variation prior [28]. Fortunately, deep learning provides a new path for high-quality lensless image reconstruction [29,30]. Further, recent studies have shown that deep convolutional neural networks (DCNN) not only have a strong nonlinear fitting ability but can also be used as a natural image prior [31]. Therefore, unrolled networks, which originate from unrolled optimization, were proposed to combine traditional optimization with deep convolutional neural networks and exploit the advantages of both. Unrolled optimization decouples decoding from denoising and transfers the burden of solving the inverse problem to the denoising algorithm, for which many successful solutions already exist. Through half-quadratic splitting (HQS), the original optimization equation is changed into the following form:
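As a sketch only, a generic HQS splitting of the MAP objective in Eq. (1) can be written as follows. The auxiliary variable $z$ and the penalty parameter $\mu$ match the symbols used in later sections, but the $1/2$ scaling and the alternating form are assumptions based on the standard HQS formulation, not a transcription of the original equation:

$$\{ {\hat x,\hat z} \} = \mathop {\arg \min }\limits_{x,z} \; - \log P({y|x} )- \log P(z )+ \frac{\mu }{2}{||{x - z} ||_2}^2,$$

$${x_{k + 1}} = \mathop {\arg \min }\limits_x \; - \log P({y|x} )+ \frac{{{\mu _k}}}{2}{||{x - {z_k}} ||_2}^2,\qquad {z_{k + 1}} = \mathop {\arg \min }\limits_z \; \frac{{{\mu _k}}}{2}{||{{x_{k + 1}} - z} ||_2}^2 - \log P(z ).$$

In this form, the $x$ step is a data-fidelity (decoding) problem and the $z$ step is a denoising problem, which is the decoupling described above.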

In this paper, we propose a deep denoiser prior based deep analytic network (DPDAN) for FlatCam lensless imaging. The forward imaging link of the lensless system is shown in Fig. 1(a), and the backward image restoration is shown in Fig. 1(b). Our entire end-to-end network can be unfolded into a few blocks, each of which contains an analytic update function followed by a DCNN. The DCNN in our network serves as a denoiser in the form of an implicit prior rather than performing the whole inverse operation, which makes the network function easier to learn. In particular, we solve the convex sub-problem with its analytic solution. Comparative experiments show that the results of analytic updates are better than those of gradient descent. Furthermore, starting from the strong convexity of the sub-problem, we prove mathematically that the iterations in our deep analytic network are convergent (see Section 3.3). The convergence is also supported by reconstruction results on real captured data. To the best of our knowledge, this is the first time that the combination of an analytic solution and a DCNN has been analyzed in detail. Benefiting from fast analytic updates and the powerful expressive capability of the DCNN, our DPDAN achieves the best FlatCam restoration results reported in the literature to date.

In fact, our method is a general non-blind restoration method. We discuss the solution for general inverse problems (see Section 3.2) and conduct five groups of non-blind deblurring experiments as examples. These experiments demonstrate that our method achieves state-of-the-art results and is superior to similar methods in terms of both visual quality and objective evaluation.

## 2. Imaging model

In this paper, we propose a restoration method for FlatCam lensless imaging based on an unrolled network. FlatCam is an ultrathin lensless imaging system (see Fig. 1) that contains only an amplitude mask placed at a submillimeter distance in front of the sensor [1–3]. In principle, FlatCam can be approximated as a coded aperture system. The light from each point in the scene contributes an enlarged version of the mask pattern to the sensor. To reduce the number of parameters of the system transfer matrix, FlatCam uses a rank-1 mask pattern that is usually generated from a pseudo-random sequence such as a maximum length sequence [1] or an almost perfect sequence [42]. By encoding the rows and columns separately, the imaging model can be written as:

$$y = {\Phi _L}x{\Phi _R}^T + n,$$

where *y* is the sensor measurement, ${\Phi _L},\textrm{ }{\Phi _R}$ are the system transfer matrices, and *x* and *n* represent the latent image and the additive noise, respectively. By projecting a set of designed horizontal and vertical stripes on the screen, we can calibrate the system transfer matrices [1]. After obtaining the system transfer matrices, we can establish the cost function according to the priors and then use an optimization algorithm to restore the latent image from the measurement.
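The separable model $y = {\Phi _L}x{\Phi _R}^T + n$ can be simulated in a few lines. The sketch below is illustrative only: the function name `flatcam_forward`, the random transfer matrices, and the image sizes are assumptions and do not correspond to the calibrated matrices of the real device.

```python
import numpy as np

def flatcam_forward(x, phi_l, phi_r, noise_std=0.0, rng=None):
    """Simulate one channel of the separable model y = Phi_L x Phi_R^T + n."""
    rng = np.random.default_rng() if rng is None else rng
    y = phi_l @ x @ phi_r.T
    if noise_std > 0:
        # i.i.d. Gaussian noise, matching the noise model assumed in Section 3.1
        y = y + noise_std * rng.standard_normal(y.shape)
    return y

# Toy example with random (not calibrated) transfer matrices.
rng = np.random.default_rng(0)
x = rng.random((16, 16))                 # latent image
phi_l = rng.standard_normal((64, 16))    # acts on image rows
phi_r = rng.standard_normal((64, 16))    # acts on image columns
y = flatcam_forward(x, phi_l, phi_r, noise_std=0.01, rng=rng)
```

Because the model is separable, only two small matrices are stored instead of one transfer matrix of size (64·64) × (16·16), which is the parameter reduction that the rank-1 mask provides.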

## 3. Solutions

Whether in low-level image processing or in computational imaging, the goal is to recover latent clear images from sensor measurements. Unfortunately, direct inversion is hardly possible due to the ill-conditioned nature of the system. A common solution is to minimize a cost function, which can be regarded as prior engineering. The first term in Eq. (1) is derived from the prior on the noise, and the second term is derived from the prior on the image. In this paper, we adopt a DCNN denoiser prior in the restoration method. The following subsections detail our solutions for FlatCam reconstruction and for general inverse problems.

#### 3.1 FlatCam reconstruction

Through variable separation, the cost function is written as Eq. (2). The additive noise is generally assumed to be independent and identically distributed Gaussian noise, so the log-likelihood reduces to a squared Frobenius norm. Therefore, the optimization equation can be rewritten as follows:

Before reconstruction, we perform singular value decomposition (SVD) on the transfer matrices obtained by calibration, $[{{u_L},{s_L},{v_L}^T} ]= {\textrm{SVD}} ({\Phi _L}),\textrm{ }[{{u_R},{s_R},{v_R}^T} ]= {\textrm{SVD}} ({\Phi _R})$. This step is used for the subsequent analytic updates. As soon as our network is fed a sensor measurement, an initial estimation is performed. In the FlatCam reconstruction, Tikhonov regularization [43] is used for the initial estimation:

The above equation can be easily solved. ${z_0}$ is initialized to ${x_0}$. After that, the initial estimate and the measurement are sent into the iteration. In each iteration, we first apply an analytic function and then use a DCNN denoiser to improve image quality. By differentiating with respect to *x* and setting the gradient equal to zero, the following formula can be obtained:

The analytic solution is obtained by replacing ${\Phi _L}$ and ${\Phi _R}$ with their SVD decompositions,

Here, ./ represents the elementwise division of matrices, ${\sigma _L} = diag({{s_L}^2} )$, and ${\sigma _R} = diag({{s_R}^2} )$, where $diag$ takes the diagonal elements of a matrix. After the analytic update, a DCNN denoises the intermediate image in the form of an implicit prior. The structure of the DCNN is illustrated in Fig. 2(b). It is a four-scale U-Net [44] that learns residuals. In each scale of the encoder, there is a feature extraction block that sandwiches a res-block [45] between two convolutional layers. The stride of the second convolutional layer is two in order to down-sample the feature maps. The numbers of channels on the four scales are $({32,64,128,256} )$, and the bottleneck of the DCNN denoiser is a convolutional layer with $512$ channels. In each scale of the decoder, there is an up-sampling layer, a convolutional layer that reduces the number of channels, and a feature extraction block identical to that of the encoder. Finally, a three-channel convolutional layer restores the image to the original color space. The kernel size of all convolutional layers is $3 \times 3$, except for the channel-reduction layers of the decoder, which use $1 \times 1$ kernels.
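The analytic update described at the start of this subsection can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the authors' TensorFlow implementation: the sub-problem is taken to be $\frac{1}{2}{||{y - {\Phi _L}x{\Phi _R}^T} ||_F}^2 + \frac{\mu }{2}{||{x - z} ||_F}^2$, the transfer matrices are assumed to have at least as many rows as columns, and the function names are hypothetical. Under the same scaling, the Tikhonov initial estimate is the special case $z = 0$, $\mu = {\lambda _0}$.

```python
import numpy as np

def precompute_svd(phi):
    """Reduced SVD phi = u @ diag(s) @ v.T, computed once before reconstruction."""
    u, s, vt = np.linalg.svd(phi, full_matrices=False)
    return u, s, vt.T

def analytic_update(y, z, mu, svd_l, svd_r):
    """Closed-form minimizer of 0.5*||y - Phi_L x Phi_R^T||_F^2 + 0.5*mu*||x - z||_F^2."""
    u_l, s_l, v_l = svd_l
    u_r, s_r, v_r = svd_r
    # Data term expressed in the singular-vector bases: S_L u_L^T y u_R S_R
    data = (s_l[:, None] * (u_l.T @ y @ u_r)) * s_r[None, :]
    prior = mu * (v_l.T @ z @ v_r)
    # Elementwise denominator sigma_L sigma_R^T + mu (the "./" in the text)
    denom = np.outer(s_l ** 2, s_r ** 2) + mu
    return v_l @ ((data + prior) / denom) @ v_r.T

def tikhonov_init(y, lam, svd_l, svd_r):
    """Tikhonov estimate as the special case z = 0, mu = lam (assumed scaling)."""
    zeros = np.zeros((svd_l[2].shape[0], svd_r[2].shape[0]))
    return analytic_update(y, zeros, lam, svd_l, svd_r)

rng = np.random.default_rng(0)
phi_l = rng.standard_normal((40, 12))
phi_r = rng.standard_normal((36, 10))
y = rng.standard_normal((40, 36))
svd_l, svd_r = precompute_svd(phi_l), precompute_svd(phi_r)
x0 = tikhonov_init(y, 0.0005, svd_l, svd_r)
x1 = analytic_update(y, x0, 0.3, svd_l, svd_r)
```

Only matrix products and one elementwise division are needed per update, so no matrix inverse is formed at reconstruction time.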

To show the superiority of the analytic solution in the unrolled framework, we also solve the sub-problem with gradient descent while keeping the rest of each iteration unchanged. With gradient descent, the update of ${x_k}$ is as follows:
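As a hedged sketch, a single gradient descent step on the same sub-problem would take the form below. The step size $\delta $ is a symbol introduced here for illustration, and the $1/2$ scaling of the data term is an assumption:

$${x_{k + 1}} = {x_k} - \delta [{{\Phi _L}^T({{\Phi _L}{x_k}{\Phi _R}^T - y} ){\Phi _R} + {\mu _k}({{x_k} - {z_k}} )} ].$$

Unlike the analytic update, this step only moves toward the minimizer of the sub-problem, so its result depends on $\delta $ and on the number of steps taken.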

#### 3.2 General inverse problem

In addition to FlatCam reconstruction, our end-to-end network is also applicable to general inverse problems. In this section, we discuss our solution for general inverse problems in detail.

Under the Gaussian noise model, the optimization equation for the general inverse problem after the separation of variables is as follows:

Obviously, matrix inversion is a major obstacle on the path to an analytic solution because of its computational complexity. Inspired by FlatCam restoration, we use the singular value decomposition of $A$ to reduce this complexity, because the inverses of a diagonal matrix and of a unitary matrix are easy to obtain. With SVD, the analytic solution of the first convex equation can be rewritten in the following form:

*I* is the identity matrix. In practice, we perform the SVD in advance and store the three resulting matrices. Therefore, the analytic update can be calculated quickly and efficiently. For the second equation in Eq. (9), we use the DCNN denoiser to solve it, as in FlatCam reconstruction.
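The SVD-based update for a general degradation matrix $A$ can be sketched as follows. The sketch assumes the sub-problem $\frac{1}{2}{||{y - Ax} ||_2}^2 + \frac{\mu }{2}{||{x - z} ||_2}^2$, whose minimizer is ${({{A^T}A + \mu I} )^{ - 1}}({{A^T}y + \mu z} )$; the function name is hypothetical and the scaling is an assumption consistent with the expression for $\psi $ given later in this section.

```python
import numpy as np

def general_analytic_update(y, z, mu, u, s, vt):
    """Minimizer of 0.5*||y - A x||^2 + 0.5*mu*||x - z||^2 with A = u @ S @ vt.

    u, s, vt come from np.linalg.svd(A, full_matrices=True) and are computed once.
    """
    n, r = vt.shape[0], s.shape[0]
    y2 = y.reshape(u.shape[0], -1)
    z2 = z.reshape(n, -1)
    # S^T u^T y: only the first r rows are non-zero
    data = np.zeros_like(z2, dtype=float)
    data[:r] = s[:, None] * (u.T @ y2)[:r]
    # Diagonal of S^T S + mu*I, so the "inverse" is an elementwise division
    denom = np.full(n, float(mu))
    denom[:r] += s ** 2
    x = vt.T @ ((data + mu * (vt @ z2)) / denom[:, None])
    return x.reshape(z.shape)

rng = np.random.default_rng(0)
a = rng.standard_normal((8, 12))
u, s, vt = np.linalg.svd(a, full_matrices=True)   # done once, then stored
y = rng.standard_normal((8, 3))
z = rng.standard_normal((12, 3))
x = general_analytic_update(y, z, 0.7, u, s, vt)
```

The only operations inside the iteration are products with the stored orthogonal factors and a division by a vector, which is why the update stays cheap even though the closed form contains a matrix inverse.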

From a macro perspective, the proposed deep denoiser prior based deep analytic network is a stacked denoising network interspersed with several matrix operations. By analyzing the gradient, we prove that the analytic update operations do not introduce singularities or gradient explosion during backward propagation. By vectorizing the matrix, Eq. (11) can be written in the following form:

$$\begin{aligned} vec({{x_{k + 1}}} )&= [{I \otimes ({{v_A}\psi {v_A}^T} )} ]vec({{z_k}} )\\ &= ({I \otimes {v_A}} )({I \otimes \psi } )({I \otimes {v_A}^T} )vec({{z_k}} ), \end{aligned} \tag{12}$$

where $\psi $ is a diagonal matrix equal to ${\mu _k}{({{s_A}^T{s_A} + {\mu_k}I} )^{ - 1}}$. In writing Eq. (11) in the form of Eq. (12), we omitted the term related to *y* because it does not affect the derivative of $x$ with respect to $z$. Therefore, the Jacobian matrix of *x* with respect to $z$ is $I \otimes ({{v_A}\psi {v_A}^T} )$. Since the latter factor is a real symmetric matrix, the Jacobian matrix is also real symmetric. According to $rank({P \otimes Q} )= rank(P )rank(Q )$, $({I \otimes {v_A}} )$ must be full rank. According to the property of the Kronecker product that ${({P \otimes Q} )^{ - 1}} = {P^{ - 1}} \otimes {Q^{ - 1}}$, we can obtain

$${({I \otimes {v_A}} )^{ - 1}} = {I^{ - 1}} \otimes {v_A}^{ - 1} = I \otimes {v_A}^T = {({I \otimes {v_A}} )^T}.$$

Thus, $({I \otimes {v_A}} )$ is an orthogonal matrix and is therefore norm preserving. Similarly, $({I \otimes {v_A}^T} )$ is also an orthogonal matrix. Combining this with the properties of the eigenvalue decomposition, we know that the second row of Eq. (12) is the eigenvalue decomposition of the Jacobian matrix. Because ${\mu _k} > 0$, all diagonal elements of $\psi $ lie in $({0,1} ]$. In summary, the Jacobian matrix is a full-rank matrix whose eigenvalues are all greater than 0 and no larger than 1. Consequently, the analytic update step introduces neither singularity nor gradient explosion when the network parameters are updated by back propagation. As we have shown in [42], the analytic updates of the FlatCam model $y = {\Phi _L}x{\Phi _R}^T$ likewise introduce neither singularity nor gradient explosion. The above analysis lays a theoretical foundation for successfully training our end-to-end network.
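The spectral claim above can be checked numerically. The following sketch builds the Jacobian $I \otimes ({{v_A}\psi {v_A}^T} )$ for a small random matrix; the sizes, the random $A$, and the value of $\mu $ are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, cols, mu = 6, 3, 0.5
a = rng.standard_normal((n, n))            # stand-in for the degradation matrix A
_, s, vt = np.linalg.svd(a)

psi = mu / (s ** 2 + mu)                   # diagonal of mu * (s^T s + mu I)^-1
block = vt.T @ np.diag(psi) @ vt           # v_A psi v_A^T
jacobian = np.kron(np.eye(cols), block)    # d vec(x) / d vec(z)

q = np.kron(np.eye(cols), vt.T)            # I (x) v_A, expected to be orthogonal
eigvals = np.linalg.eigvalsh(jacobian)     # valid because the Jacobian is symmetric
print(eigvals.min(), eigvals.max())
```

Since every eigenvalue is positive and at most 1, back-propagating through the analytic update can shrink a gradient but cannot amplify it.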

To demonstrate the reconstruction ability of our method for general inverse problems, we take deblurring as an example and conduct multiple groups of non-blind deblurring experiments. In deblurring, the system degradation matrix $A$ is a huge sparse matrix, so it is not wise to solve the equation in matrix form. The equation is usually written in the form of a convolution:

The results are shown in Section 5.2.
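For a convolutional degradation, the analytic update can be evaluated with FFTs instead of an explicit matrix. The sketch below assumes circular boundary conditions and the same $\frac{1}{2}$, $\frac{\mu }{2}$ scaling as before; both are assumptions, and the helper names are hypothetical.

```python
import numpy as np

def psf_to_otf(kernel, shape):
    """Zero-pad a blur kernel to `shape` and centre it at the origin before the FFT."""
    padded = np.zeros(shape)
    kh, kw = kernel.shape
    padded[:kh, :kw] = kernel
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.fft.fft2(padded)

def deblur_analytic_update(y, z, mu, otf):
    """Minimizer of 0.5*||y - k (*) x||^2 + 0.5*mu*||x - z||^2 for circular convolution."""
    numerator = np.conj(otf) * np.fft.fft2(y) + mu * np.fft.fft2(z)
    return np.real(np.fft.ifft2(numerator / (np.abs(otf) ** 2 + mu)))

rng = np.random.default_rng(0)
kernel = rng.random((5, 5))
kernel /= kernel.sum()
otf = psf_to_otf(kernel, (32, 32))
x_true = rng.random((32, 32))
y = np.real(np.fft.ifft2(otf * np.fft.fft2(x_true)))   # noiseless blurred image
z = rng.random((32, 32))
x = deblur_analytic_update(y, z, 0.05, otf)
```

Here the Fourier transform plays the role that the SVD plays in the matrix form: it diagonalizes the degradation operator, so the inverse again reduces to an elementwise division.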

#### 3.3 Convergence analysis

In this section, we focus on the convergence of our network. In our network, the total cost function is as follows:

The total cost function is assumed to be lower bounded and coercive. Coercivity means that $\varphi ({x,z} )\to \infty $ whenever $||{({x,z} )} ||\to \infty $, which ensures that the sequence $({{x_k},{z_k}} )$ is bounded. When dealing with general inverse problems, ${\Phi _L} = A,\textrm{ }{\Phi _R} = I$. Under the following assumption, our method has a subsequence that converges to a stationary point $({{x^ \ast },{z^ \ast }} )$ of the cost function,

First of all, we prove that $\varphi $ is a strongly convex function of $x$ with convexity parameter $\mu $. According to the properties of the matrix trace, $tr({m{m^T}} )\ge 0$ and $tr({PQ} )= tr({QP} )$,

Therefore, the following inner product inequality is satisfied:

The above formula proves strong convexity. Under strong convexity, we obtain $\varphi ({x,z} )> \varphi ({{x^ \ast },z} )+ \frac{\mu }{2}{||{x - {x^ \ast }} ||_2}^2,\textrm{ }\forall x \ne {x^ \ast }$. Therefore, the analytic update satisfies the descent property:

where ${c_1}: = {{{\mu _{k + 1}}} / 2} > 0$. By assumption, the denoising step also satisfies the descent property:

Thus, each iteration of our network is a descent step,

Since $\varphi $ is monotonically decreasing, coercive, and lower bounded, there is a convergent subsequence ${({{x_{{k_t}}},{z_{{k_t}}}} )_{t = 0,1, \cdots }} \to ({{x^ \ast },{z^ \ast }} )$. By summing Eq. (20) and Eq. (21) along $k$, we get $\sum\limits_k {{{||{{x_k} - {x_{k + 1}}} ||}_2}^2} < \infty$ and $\sum\limits_k {{{||{{{\tilde{\nabla }}_z}\varphi ({{x_{k + 1}},{z_k}} )} ||}_2}^2} < \infty$. Therefore,

From Eq. (23), we can also get $\mathop {\lim }\limits_{k \to \infty } {||{{z_{k - 1}} - {z_k}} ||_2}^2 = 0$. Since ${\nabla _z}\varphi ({x,z} )$ is continuous in $x$, $\mathop {\lim }\limits_{k \to \infty } {||{{{\tilde{\nabla }}_z}\varphi ({{x_k},{z_k}} )} ||_2}^2 = 0$. The analytic solution guarantees ${\nabla _x}\varphi ({{x_{k + 1}},{z_k}} )= 0$. Similarly, ${\nabla _x}\varphi ({x,z} )$ is continuous in $x$, and therefore $\mathop {\lim }\limits_{k \to \infty } {\nabla _x}\varphi ({{x_k},{z_k}} )= 0$. In summary, the iterations of our method converge, and the experimental results confirm this convergence.
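The descent property of the analytic update can be illustrated numerically for the $x$ sub-problem alone. In this sketch the prior term in $z$ is dropped because it is constant in $x$, and the quadratic form $\frac{1}{2}{||{y - Ax} ||_2}^2 + \frac{\mu }{2}{||{x - z} ||_2}^2$, the random data, and the function name are illustrative assumptions.

```python
import numpy as np

def x_subproblem_cost(x, y, z, a, mu):
    """x-dependent part of the cost: 0.5*||y - A x||^2 + 0.5*mu*||x - z||^2."""
    return 0.5 * np.sum((y - a @ x) ** 2) + 0.5 * mu * np.sum((x - z) ** 2)

rng = np.random.default_rng(1)
a = rng.standard_normal((20, 10))
y = rng.standard_normal(20)
z = rng.standard_normal(10)
mu = 0.4

# Analytic minimizer of the sub-problem for fixed z
x_star = np.linalg.solve(a.T @ a + mu * np.eye(10), a.T @ y + mu * z)

# Strong convexity: cost(x) - cost(x_star) >= (mu / 2) * ||x - x_star||^2 for any x
gaps = []
for _ in range(100):
    x = rng.standard_normal(10)
    decrease = x_subproblem_cost(x, y, z, a, mu) - x_subproblem_cost(x_star, y, z, a, mu)
    gaps.append(decrease - 0.5 * mu * np.sum((x - x_star) ** 2))
print(min(gaps) >= -1e-9)
```

Taking $x = {x_k}$ and ${x^ \ast } = {x_{k + 1}}$ in this inequality gives a decrease of at least $\frac{\mu }{2}{||{{x_k} - {x_{k + 1}}} ||_2}^2$ per analytic update, which is the quantity that is summed over $k$ in the argument above.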

## 4. Implementation details

For FlatCam reconstruction, we built an experimental device (see Fig. 3) and captured a dataset to train our end-to-end network. This dataset contains 10000 data pairs, each consisting of a $2048 \times 2048 \times 3$ measurement and a $512 \times 512 \times 3$ ground truth. In our network, the number of iterations is $k = 6$. The parameter ${\lambda _0}$ used for the initial estimation is equal to $0.0005$. The penalty parameter ${\mu _k}$ is fixed at $\eta \cdot {2^{k - 1}}$. In our experiments, $\eta $ is chosen in the range $[{0.1,\textrm{ }1} ]$.
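The penalty schedule ${\mu _k} = \eta \cdot {2^{k - 1}}$ doubles at every unrolled iteration. A minimal sketch, with a hypothetical function name and with $k$ assumed to run from 1 to 6:

```python
def penalty_schedule(eta, num_iters=6):
    """Return [mu_1, ..., mu_K] with mu_k = eta * 2**(k - 1)."""
    return [eta * 2 ** (k - 1) for k in range(1, num_iters + 1)]

print(penalty_schedule(1.0))
```

With $\eta = 1$ the six penalties are 1, 2, 4, 8, 16, and 32, so later iterations tie $x$ increasingly tightly to the denoised estimate $z$.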

For the deblurring experiments, we cropped $78156$ patches $({128 \times 128} )$ from the DIV2K dataset [47] and the BSD400 dataset [48]. The dataset is then synthesized by convolving the clear images with a blur kernel and adding noise with a standard deviation of ${\sigma _n}$. Two motion kernels [49] and a Gaussian blur kernel are used. In the deblurring network, the initial estimate is ${x_0} = {A^T}y$, and ${\mu _k} = \eta \cdot {2^{k - 1}}$. When ${\sigma _n} = 2.0$ or ${\sigma _n} = 2.55$, $\eta $ is set to $0.01$. When ${\sigma _n} = 7.65$, $\eta = 0.02$.

In this paper, the reconstruction tasks are implemented in TensorFlow, and the loss function is the mean squared error between the network output and the ground truth. During training, the learning rate is initialized to $0.0001$ and then halved every 20 epochs for a total of 100 epochs. The ADAM optimizer is used to update the network parameters, and the gradients are clipped by norm with a coefficient of $6$. All networks are run on a Linux server with 64 GB of memory and four NVIDIA GTX 1080Ti cards.
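The training schedule described above can be sketched independently of any framework. The function names are hypothetical, and clipping each gradient tensor by its own norm (rather than by a global norm over all tensors) is an assumption, since the text does not say which variant is used.

```python
import numpy as np

def learning_rate(epoch, base_lr=1e-4, decay_every=20):
    """Learning rate that starts at base_lr and is halved every `decay_every` epochs."""
    return base_lr * 0.5 ** (epoch // decay_every)

def clip_by_norm(grad, clip_norm=6.0):
    """Rescale a gradient tensor so that its L2 norm does not exceed clip_norm."""
    norm = np.linalg.norm(grad)
    return grad if norm <= clip_norm else grad * (clip_norm / norm)

rates = [learning_rate(epoch) for epoch in range(100)]   # 100 training epochs
clipped = clip_by_norm(np.full(4, 10.0))                 # original norm is 20
```

Over 100 epochs this schedule takes five distinct values, ending at one sixteenth of the initial learning rate.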

## 5. Results

This section presents our restoration results in detail. The first part covers FlatCam imaging, and the second part covers non-blind deblurring. PSNR and SSIM are used for quantitative evaluation.
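For reference, PSNR can be computed as in the following sketch. The 8-bit data range of 255 is an assumption about how the images are stored, and SSIM is omitted because it requires a longer windowed implementation.

```python
import numpy as np

def psnr(reference, estimate, data_range=255.0):
    """Peak signal-to-noise ratio in dB between two images of the same shape."""
    diff = reference.astype(np.float64) - estimate.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(data_range ** 2 / mse)

reference = np.zeros((8, 8), dtype=np.uint8)
estimate = np.ones((8, 8), dtype=np.uint8)
value = psnr(reference, estimate)   # MSE is 1, so this equals 20*log10(255)
```

Casting to floating point before subtracting avoids the wrap-around that unsigned 8-bit arithmetic would otherwise produce.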

#### 5.1 FlatCam imaging

To demonstrate the benefits of our method, we compare our results against those of existing methods, including LAsNet [42], Tikhonov regularization [43], FISTA reconstruction with $L1$ regularization [50], and FlatNet [51]. Several test images are from ImageNet [52], and most test images are from the validation set of DIV2K. The results on the ImageNet images are shown in Fig. 4. Our results are clearly better than those of the other methods. The enlarged blue blocks show that our results are sharper and contain less noise. The contents of the yellow boxes indicate that our method restores details better. Our method also scores higher in the objective evaluation. Looking at the whole image, the color distortion of our reconstructed images is much smaller than that of FlatNet. Compared to FlatNet, our method has fewer parameters and FLOPs but more matrix operations. The total parameters, total FLOPs, and run times are summarized in Table 1.

Our network also performs well on the DIV2K validation set. The 100 original images in the DIV2K validation set are scaled to $512 \times 512$ and then measured by our device (Fig. 3). Compared with LAsNet, our network recovers finer structures and clearer textures (see Fig. 5), which shows the strong expressive ability of our DCNN in the form of an implicit prior. To demonstrate the effectiveness of analytic updates, we use gradient descent (denoted as Our-GD) for comparison. The results of the analytic updates have sharper edges (see Fig. 5). To show the robustness of our method, we perform experiments in which the forward model is inaccurate, as shown in Fig. 6. Our-E5 denotes random erasure of 5% of the transfer matrix data, and Our-S5 simulates the stripes shaking randomly by less than 5 pixels in one direction during calibration. The PSNR drops by about 0.4 dB, and the overall restoration quality remains acceptable. The quantitative evaluations of the 100 images are summarized in Table 2. The PSNR and SSIM values are plotted in Fig. 7. The curves show that our method outperforms the existing methods on almost every image.

As analyzed in Section 3.3, the iterations of our method converge, and our experimental results verify this. Two images from the DIV2K validation set are chosen to demonstrate the convergence of our method for FlatCam reconstruction, as shown in Fig. 8 and Fig. 9. In each figure, the first row compares the three restoration results, and the second and third rows show the images produced by our network at different stages. The image quality visibly improves as the iterations progress. Figure 8 focuses on the convergence of *x* and Fig. 9 focuses on $z$.

#### 5.2 Non-blind deblurring

In this paper, we conduct five groups of non-blind deblurring experiments. Ten images are used for deblurring tests, as shown in Fig. 10. We compare our method with the classical model-based method EPLL [53] and the denoising-based methods IRCNN [37] and DPDNN [39]. DPDNN is similar to our network, except that it uses gradient descent to solve Eq. (14) while we use analytic solutions. Table 3 summarizes the average PSNR values of the 10 images restored by the five methods. Our-GD denotes our network with the analytic update replaced by gradient descent. The first row of Table 3 lists the five blur scenarios. The detailed PSNR values of each image are shown in Fig. 11 in the form of curves. Our method is clearly ahead of the compared methods in objective evaluation, which indirectly suggests that analytic updates converge faster than gradient descent under the deep unfolding framework. Three groups of images visually show the deblurring results for different scenarios (see Figs. 12–14). The green boxes in Fig. 12 show that our result retains more details, such as the lines on the hat and the hairs. Figure 13 demonstrates that our result recovers the most object information in the region close to the background. The enlarged blocks in Fig. 14 illustrate that our result has higher sharpness and contrast than Our-GD. The results of our method are convergent, as shown in panel (e) of Figs. 12–14.

## 6. Conclusion

In this paper, we unrolled the traditional optimization process into a deep analytic network for FlatCam image reconstruction, which is more effective than gradient descent under the unrolled framework. Experiments on real captured data demonstrate that the image quality of our results is superior to that of existing results in terms of visual perception and quantitative evaluation. Starting from the strong convexity of the sub-problem, we prove mathematically that the iterations in our network are convergent, which is also supported by the reconstructed results. In addition, our method is a general non-blind restoration method. We give the solution to the general inverse problem and prove that the analytic update inserted between the DCNN denoising operations introduces neither singularity nor gradient explosion during backward propagation. Five groups of deblurring experiments show the effectiveness of our method for general inverse problems.

## Funding

ZJU-Sunny Photonics Innovation Center (#2019-04); Science and Technology on Optical Radiation Laboratory (61424080211); National Natural Science Foundation of China (61975175).

## Acknowledgments

We thank Bian Meijuan from the facility platform of optical engineering of Zhejiang University for instrument support.

## Disclosures

The authors declare no conflicts of interest.

## Data availability

Data underlying the results presented in this paper are available in Ref. [39], Ref. [47], Ref. [48], and Ref. [52].

## References

**1. **M. S. Asif, A. Ayremlou, A. Sankaranarayanan, A. Veeraraghavan, and R. G. Baraniuk, “Flatcam: Thin, lensless cameras using coded aperture and computation,” IEEE Trans. Comput. Imaging. **3**(3), 384–397 (2017). [CrossRef]

**2. **V. Boominathan, J. K. Adams, M. S. Asif, R. G. Baraniuk, and A. Veeraraghavan, “Lensless Imaging: A computational renaissance,” IEEE Signal Process. Mag. **33**(5), 23–35 (2016). [CrossRef]

**3. **M. S. Asif, “Lensless 3D imaging using mask-based cameras,” in Proceeding of IEEE International Conference on Acoustics, Speech and Signal Processing (IEEE, 2018), pp. 6498–6502.

**4. **J. K. Adams, V. Boominathan, B. W. Avants, D. G. Vercosa, F. Ye, R. G. Baraniuk, J. T. Robinson, and A. Veeraraghavan, “Single-frame 3D fluorescence microscopy with ultraminiature lensless FlatScope,” Sci. Adv. **3**(12), e1701548 (2017). [CrossRef]

**5. **J. Tan, L. Niu, J. K. Adams, V. Boominathan, J. T. Robinson, R. G. Baraniuk, and A. Veeraraghavan, “Face detection and verification using lensless cameras,” IEEE Trans. Comput. Imaging. **5**(2), 180–194 (2019). [CrossRef]

**6. **T. N. Canh and H. Nagahara, “Deep compressive sensing for visual privacy protection in FlatCam imaging,” in Proceedings of the IEEE International Conference on Computer Vision (IEEE, 2019), pp. 3978–3986.

**7. **M. J. DeWeert and B. P. Farm, “Lensless coded-aperture imaging with separable Doubly-Toeplitz masks,” Opt. Eng. **54**(2), 023102 (2015). [CrossRef]

**8. **W. Chi and N. George, “Optical imaging with phase-coded aperture,” Opt. Express **19**(5), 4294–4300 (2011). [CrossRef]

**9. **V. Boominathan, J. K. Adams, J. T. Robinson, and A. Veeraraghavan, “PhlatCam: Designed phase-mask based thin lensless camera,” IEEE Trans. Pattern Anal. Mach. Intell. **42**(7), 1618–1629 (2020). [CrossRef]

**10. **K. Tajima, T. Shimano, Y. Nakamura, M. Sao, and T. Hoshizawa, “Lensless light-field imaging with multi-phased fresnel zone aperture,” in IEEE International Conference on Computational Photography (IEEE, 2017), pp. 1–7.

**11. **T. Shimano, Y. Nakamura, K. Tajima, M. Sao, and T. Hoshizawa, “Lensless light-field imaging with Fresnel zone aperture: quasi-coherent coding,” Appl. Opt. **57**(11), 2841–2850 (2018). [CrossRef]

**12. **P. R. Gill, “Odd-symmetry phase gratings produce optical nulls uniquely insensitive to wavelength and depth,” Opt. Lett. **38**(12), 2074–2076 (2013). [CrossRef]

**13. **N. Mohammad, M. Meem, B. Shen, P. Wang, and R. Menon, “Broadband imaging with one planar diffractive lens,” Sci. Rep. **8**(1), 2799 (2018). [CrossRef]

**14. **P. Wang, N. Mohammad, and R. Menon, “Chromatic-aberration-corrected diffractive lenses for ultra-broadband focusing,” Sci. Rep. **6**(1), 1–7 (2016). [CrossRef]

**15. **S. Li, M. Deng, J. Lee, A. Sinha, and G. Barbastathiset, “Imaging through glass diffusers using densely connected convolutional networks,” Optica **5**(7), 803–813 (2018). [CrossRef]

**16. **J. Wu, H. Zhang, W. Zhang, G. Jin, L. Cao, and G. Barbastathis, “Single-shot lensless imaging with fresnel zone aperture and incoherent illumination,” Light: Sci. Appl. **9**(1), 1–11 (2020). [CrossRef]

**17. **P. R. Gill, C. Lee, D. G. Lee, A. Wang, and A. Molnar, “A microscale camera using direct Fourier-domain scene capture,” Opt. Lett **36**(15), 2949–2951 (2011). [CrossRef]

**18. **D. G. Stork and P. R. Gill, “Optical, mathematical, and computational foundations of lensless ultra-miniature diffractive imagers and sensors,” Intern. J. Adv. Syst. Measure. **7**(3), 4 (2014).

**19. **P. R. Gill and D. G. Stork, “Lensless ultra-miniature imagers using odd-symmetry spiral phase gratings,” in * Computational Optical Sensing and Imaging* (Optical Society of America, 2013) paper CW4C-3.

**20. **Y. Peng, Q. Fu, H. Amata, S. Su, F. Heide, and W. Heidrich, “Computational imaging using lightweight diffractive-refractive optics,” Opt. Express **23**(24), 31393–31407 (2015). [CrossRef]

**21. **Y. Peng, Q. Fu, F. Heide, and W. Heidrich, “The diffractive achromat full spectrum computational imaging with diffractive optics,” In * Virtual Reality meets Physical Reality: Modelling and Simulating Virtual Humans and Environments* (SIGGRAPH ASIA, 2016), pp. 1–2.

**22. **V. Sitzmann, S. Diamond, Y. Peng, X. Dun, S. Boyd, W. Heidrich, F. Heide, and G. Wetzstein, “End-to-end optimization of optics and image processing for achromatic extended depth of field and super-resolution imaging,” ACM Trans. Graph **37**(4), 1–13 (2018). [CrossRef]

**23. **N. Antipa, G. Kuo, R. Heckel, B. Mildenhall, E. Bostan, R. Ng, and L. Waller, “DiffuserCam: lensless single-exposure 3D imaging,” Optica **5**(1), 1–9 (2018). [CrossRef]

**24. **G. Kuo, N. Antipa, R. Ng, and L. Waller, “3D Fluorescence microscopy with diffusercam,” in * Computational Optical Sensing and Imaging* (Optical Society of America, 2018) paper CM3E-3.

**25. **K. Monakhova, K. Yanny, N. Aggarwal, and L. Waller, “Spectral DiffuserCam: Lensless snapshot hyperspectral imaging with a spectral filter array,” Optica **7**(10), 1298–1307 (2020). [CrossRef]

**26. **G. Kim and R. Menon, “Computational imaging enables a “see-through” lens-less camera,” Opt. Express **26**(18), 22826–22836 (2018). [CrossRef]

**27. **H. Zhou, Y. Chen, H. Feng, G. Lv, Z. Xu, and Q. Li, “Rotated rectangular aperture imaging through multi-frame blind deconvolution with Hyper-Laplacian priors,” Opt. Express **29**(8), 12145–12159 (2021). [CrossRef]

**28. **L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Phys. D **60**(1-4), 259–268 (1992). [CrossRef]

**29. **J. Wu, L. Cao, and G. Barbastathis, “DNN-FZA camera: a deep learning approach toward broadband FZA lensless imaging,” Opt. Letters **46**(1), 130–133 (2021). [CrossRef]

**30. **S. S. Khan, V. R. Adarsh, V. Boominathan, J. Tan, A. Veeraraghavan, and K. Mitra, “Towards photorealistic reconstruction of highly multiplexed lensless images,” in Proceedings of the IEEE International Conference on Computer Vision (IEEE, 2019), pp. 7860–7869.

**31. **D. Ulyanov, A. Vedaldi, and V. Lempitsky, “Deep image prior,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9446–9454.

**32. **S. H. Chan, X. Wang, and O. A. Elgendy, “Plug-and-play ADMM for image restoration: Fixed-point convergence and applications,” IEEE Trans. Comput. Imaging **3**(1), 84–98 (2017). [CrossRef]

**33. **A. M. Teodoro, J. M. Bioucas-Dias, and M. A. Figueiredo, “Image restoration and reconstruction using variable splitting and class-adapted image priors,” in International Conference on Image Processing (IEEE, 2016), pp. 3518–3522.

**34. **Y. Sun, S. Xu, Y. Li, L. Tian, B. Wohlberg, and U. S. Kamilov, “Regularized Fourier ptychography using an online plug-and-play algorithm,” in ICASSP International Conference on Acoustics, Speech and Signal Processing (IEEE, 2019), pp. 7665–7669.

**35. **T. Tirer and R. Giryes, “Image restoration by iterative denoising and backward projections,” IEEE Trans. Image Proc. **28**(3), 1220–1234 (2019). [CrossRef]

**36. **S. Bigdeli, D. Honzátko, S. Süsstrunk, and L. A. Dunbar, “Image restoration using plug-and-play CNN MAP denoisers,” arXiv:1912.09299 (2019).

**37. **K. Zhang, W. Zuo, S. Gu, and L. Zhang, “Learning deep CNN denoiser prior for image restoration,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2017), pp. 3929–3938. [CrossRef]

**38. **K. Zhang, Y. Li, W. Zuo, L. Zhang, L. V. Gool, and R. Timofte, “Plug-and-play image restoration with deep denoiser prior,” arXiv:2008.13751 (2020).

**39. **W. Dong, P. Wang, W. Yin, G. Shi, F. Wu, and X. Lu, “Denoising prior driven deep neural network for image restoration,” IEEE Trans. Pattern Anal. Mach. Intell. **41**(10), 2305–2318 (2019). [CrossRef]

**40. **K. Monakhova, J. Yurtsever, G. Kuo, N. Antipa, K. Yanny, and L. Waller, “Learned reconstructions for practical mask-based lensless imaging,” Opt. Express **27**(20), 28075–28090 (2019). [CrossRef]

**41. **D. S. Jeon, S. H. Baek, S. Yi, Q. Fu, X. Dun, W. Heidrich, and M. H. Kim, “Compact snapshot hyperspectral imaging with diffracted rotation,” ACM Trans. Graph. **38**(4), 1–13 (2019). [CrossRef]

**42. **H. Zhou, H. Feng, Z. Hu, Z. Xu, Q. Li, and Y. Chen, “Lensless cameras using a mask based on almost perfect sequence through deep learning,” Opt. Express **28**(20), 30248–30262 (2020). [CrossRef]

**43. **A. N. Tikhonov, “Solution of incorrectly formulated problems and the regularization method,” Soviet Math. **4**, 1035–1038 (1963).

**44. **O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention (2015), pp. 234–241.

**45. **K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2016), pp. 770–778.

**46. **H. Wang, T. Zhang, M. Yu, J. Sun, W. Ye, C. Wang, and S. Zhang, “Stacking networks dynamically for image restoration based on the Plug-and-Play framework,” in European Conference on Computer Vision (2020), pp. 446–462.

**47. **E. Agustsson and R. Timofte, “NTIRE 2017 challenge on single image super-resolution: Dataset and study,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (2017), pp. 126–135. [CrossRef]

**48. **D. Martin, C. Fowlkes, D. Tal, and J. Malik, “A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics,” in Proceedings Eighth IEEE International Conference on Computer Vision (ICCV, 2001), pp. 416–423. [CrossRef]

**49. **A. Levin, Y. Weiss, F. Durand, and W. T. Freeman, “Understanding and evaluating blind deconvolution algorithms,” in Conference on Computer Vision and Pattern Recognition (IEEE, 2009), pp. 1964–1971.

**50. **A. Beck and M. Teboulle, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems,” SIAM J. Imaging Sci. **2**(1), 183–202 (2009). [CrossRef]

**51. **S. S. Khan, V. Sundar, V. Boominathan, A. Veeraraghavan, and K. Mitra, “FlatNet: Towards photorealistic scene reconstruction from lensless measurements,” IEEE Trans. Pattern Anal. Mach. Intell. (to be published).

**52. **O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, and F. Li, “ImageNet large scale visual recognition challenge,” Int. J. Comput. Vis. **115**(3), 211–252 (2015). [CrossRef]

**53. **D. Zoran and Y. Weiss, “From learning models of natural image patches to whole image restoration,” in International Conference on Computer Vision (IEEE, 2011), pp. 479–486.