
Deep denoiser prior based deep analytic network for lensless image restoration


Abstract

Mask-based lensless imagers have broad application prospects due to their ultra-thin form factor. However, the visual quality of the restored images is poor because of the ill-conditioned nature of the system. In this work, we propose a deep analytic network that imitates the traditional optimization process as an end-to-end network. Our network combines analytic updates with a deep denoiser prior to progressively improve lensless image quality over a few iterations. The convergence is proven mathematically and verified in the results. In addition, our method is universal for non-blind restoration. We detail the solution for the general inverse problem and conduct five groups of deblurring experiments as examples. Both sets of experimental results demonstrate that our method achieves superior performance against existing state-of-the-art methods.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Computational imaging has developed rapidly in recent years. Unlike traditional optical imaging based on refractive lenses, a computational imaging system does not directly realize a point-to-point mapping from the object to the image; instead, it encodes the object and then restores the image from the encoded pattern. Although this complicates the imaging process, the weight and volume of the imaging system are greatly reduced. In the past few years, various miniaturized lensless imaging systems have been proposed. Asif et al. [1,2] proposed the FlatCam, a lensless camera that contains only an amplitude mask in front of the CMOS sensor. It not only has an eye-catching ultra-thin body, but also supports a variety of functions such as 3D imaging [3] and 3D fluorescence microscopy [4]. The FlatCam has also been used for face recognition [5] and privacy protection [6]. Based on a similar principle, DeWeert et al. [7] used a spatial light modulator to implement a programmable mask. Their experiments showed that the result of multiple masks is better than that of a single mask.

In addition to the above amplitude masks, phase masks are even more widely used in lensless imaging [8]. Boominathan et al. proposed a phase-encoded version of FlatCam called the PhlatCam [9], which contains only a phase mask. Compared with amplitude encoding, phase encoding can better regulate the PSF and thus yields better imaging results. Other phase devices commonly used for lensless imaging include Fresnel zone apertures [10,11], phase gratings [12], diffractive lenses [13,14], and diffusers [15]. The first three are controllable phase masks, while a diffuser is a pseudo-random phase mask. Cao et al. [16] used a Fresnel zone aperture for wave-front coding and achieved imaging under incoherent illumination. Gill et al. [17] designed a Planar Fourier Capture Array that consists of an array of angle-sensitive pixels. Each pixel is composed of a photodiode under two metal gratings and reports one component of a spatial two-dimensional Fourier transform of the local light field. Gill and Stork [18,19] designed odd-symmetry spiral phase gratings and used them for lensless imaging. Imaging based on the principle of diffraction has gained popularity in recent years. Peng et al. [20] conducted multiple diffractive experiments, such as a pure phase plate, stacked phase plates, and a phase plate combined with a lens. This work provided an alternative route to light-efficient, thin optics for white-light imaging. To address the chromatic aberration of diffraction, Peng et al. [21] designed an optimized diffractive lens that makes the PSF of each band the same. By jointly optimizing the parameters of the optical device and the reconstruction algorithm, Sitzmann et al. [22] proposed an end-to-end snapshot super-resolution imaging scheme. A diffuser is a random phase mask with an optical memory effect. Antipa et al. [23] proposed the DiffuserCam, a lensless imaging system based on a diffuser. Similar to the FlatCam, the DiffuserCam also supports microscopic imaging [24]. After adding a spectral filter array in front of the sensor, the DiffuserCam can perform spectral imaging [25]. Kim et al. [26] proposed a "see-through" lensless camera in which the sensor is placed at the side of the glass. In this system, the side of the glass plays the role of a random phase encoder, and the object is encoded after being refracted.

Due to the ill-conditioned nature of the inverse problem, image restoration (IR) is a key step in lensless imaging. A common approach is to establish the cost function by maximizing the a posteriori probability (MAP) $P({x|y} )$:

$$\hat{x} = \mathop {\arg \min }\limits_x \textrm{ } - \log P({y|x} )- \log P(x )$$
where y and x represent the measurement and the latent image, respectively. $\log P({y|x} )$ denotes the log-likelihood of the measurement and $\log P(x )$ represents the prior information of the latent image. Traditional model-based methods use the principles of optimization to output reconstructed images after multiple iterations [1,9]. Generally, the quality of the output images from these methods is not satisfactory because the latent images do not fully conform to hand-picked image priors such as the Hyper-Laplacian prior [27], the Total Variation prior [28], etc. Fortunately, deep learning provides a new path to high-quality lensless image reconstruction [29,30]. Further, recent studies have shown that deep convolutional neural networks (DCNN) not only have a strong nonlinear fitting ability, but also can be used as a natural image prior [31]. Therefore, unrolled networks, originating from unrolled optimization, were proposed to combine traditional optimization with deep convolutional neural networks and exploit the advantages of both. Unrolled optimization decouples decoding and denoising and transfers the burden of solving the inverse problem to the denoising algorithm, for which many successful solutions already exist. Through half-quadratic splitting (HQS), the original optimization equation is changed into the following form:
$$\begin{aligned} {{\hat{x}}_k} &= {\mathop {\arg \min }\limits_x \textrm{ } - \log P({y|x} )+ {\mu _k}{{||{x - {{\hat{z}}_{k - 1}}} ||}_2}^2}\\ {{\hat{z}}_k} &= {\mathop {\arg \min }\limits_z \textrm{ }{\mu _k}{{||{{{\hat{x}}_k} - z} ||}_2}^2 - \log P(z )} \end{aligned}$$
where $\mu $ is the penalty parameter. The former equation is usually convex, while the latter can be treated as a denoising problem, which avoids the explicit expression of priors. This type of method has long been used in various image reconstruction tasks such as single image super resolution [32], deblurring [33], and Fourier ptychographic microscopy reconstruction [34]. In addition to the traditional denoisers based on various assumed priors, DCNN denoisers are also widely used in image restoration [35,36] under the framework of unrolled optimization and lead to state-of-the-art results [37,38]. Dong et al. proposed an end-to-end unrolled network for low-level image reconstruction [39]. This network shows that even with only a few iterations under an end-to-end architecture, good results can still be achieved. In terms of lensless computational imaging, Le-ADMM-U Net was proposed to restore images of the DiffuserCam [40]. Jeon et al. [41] proposed an end-to-end network to complete hyperspectral image reconstruction. Although the DCNN denoiser prior has been used for computational imaging, there is still room for analysis and improvement. On the one hand, the related works [39,41] mainly combine the DCNN with gradient descent. Gradient descent usually requires many iterations because each step is inexact, while the number of iterations is limited by the time and space costs of the DCNN denoiser. Given that the first convex problem in Eq. (2) is usually easy to solve, we can use the analytic solution directly. When the number of iterations is the same, our method is more effective than gradient descent under the unrolled framework. On the other hand, the combination of the analytic solution and the DCNN has not been analyzed in detail, and there is still room for proving convergence.

In this paper, we propose a deep denoiser prior based deep analytic network (DPDAN) for FlatCam lensless imaging. The forward imaging chain of the lensless system is shown in Fig. 1(a), and the backward image restoration is shown in Fig. 1(b). Our end-to-end network can be unfolded into a few blocks, each of which contains an analytic update function and a subsequent DCNN. The DCNN in our network serves as a denoiser in the form of an implicit prior rather than performing the whole inverse operation, which makes the network's task easier to learn. In particular, we solve the convex sub-problem in the form of its analytic solution. Comparative experiments show that the results of analytic updates are better than those of gradient descent. Furthermore, starting from the strong convexity of the sub-problem, we prove mathematically that the iterations in our deep analytic network are convergent (see Section 3.3), and the convergence is supported by reconstruction results from real captures. To the best of our knowledge, this is the first time that the combination of the analytic solution and the DCNN has been analyzed in detail. Benefiting from fast analytic updates and the powerful expression capability of the DCNN, our DPDAN achieves the best FlatCam restoration results in the published literature to date.


Fig. 1. Overview of the FlatCam imaging. (a) System equipment and forward imaging process. After the object is encoded by the mask, a diffuse pattern is formed on the sensor. (b) Reconstruction pipeline of our method. The left is the pre-processing operation, including system calibration to obtain the transfer matrix and singular value decomposition of the transfer matrix to obtain the analytic solution. In the middle is the main body of our network. After a few iterations, the network outputs the reconstruction result as shown on the right.


In fact, our method is a universal non-blind restoration method. We discuss the solutions to general inverse problems (see Section 3.2) and conduct five groups of non-blind deblurring experiments as examples. These experiments demonstrate that our method achieves state-of-the-art results and is superior to similar methods in terms of visual effects and objective evaluations.

2. Imaging model

In this paper, we propose a restoration method for FlatCam lensless imaging based on an unrolled network. FlatCam is an ultrathin lensless imaging system (see Fig. 1) that contains only an amplitude mask placed at a submillimeter distance in front of the sensor [1]. In principle, FlatCam can be approximated as a coded aperture system. The light from each point in the scene contributes an enlarged version of the mask pattern to the sensor. In order to reduce the number of parameters in the system transfer matrix, FlatCam uses a rank-1 mask pattern that is usually generated from a pseudo-random sequence such as a maximum length sequence [1] or an almost perfect sequence [42]. By encoding the rows and columns separately, the imaging model can be written as:

$$y = {\Phi _L}x{\Phi _R}^T + n$$
where y is the sensor measurement, ${\Phi _L},\textrm{ }{\Phi _R}$ are the system transfer matrices, and x and n represent the latent image and the additive noise, respectively. By projecting a set of designed horizontal and vertical stripes on the screen, we can calibrate the system transfer matrices [1]. After obtaining the transfer matrices, we can establish the cost function according to the priors and then use an optimization algorithm to restore the latent image from the measurement.
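To make the separable forward model concrete, the following is a minimal NumPy sketch of Eq. (3) for a single color channel; the matrix sizes, normalization, and noise level are illustrative assumptions rather than the calibrated values used in our experiments.

```python
import numpy as np

def flatcam_forward(x, phi_L, phi_R, sigma_n=0.01):
    """Simulate a single-channel sensor measurement y = Phi_L x Phi_R^T + n."""
    y = phi_L @ x @ phi_R.T                          # separable row/column encoding
    return y + sigma_n * np.random.randn(*y.shape)   # additive Gaussian noise

# Toy dimensions mirroring Section 4 (2048x2048 measurement, 512x512 scene).
phi_L = np.random.randn(2048, 512) / np.sqrt(512)    # stand-ins for the calibrated matrices
phi_R = np.random.randn(2048, 512) / np.sqrt(512)
x = np.random.rand(512, 512)
y = flatcam_forward(x, phi_L, phi_R)
```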

3. Solutions

Whether in low-level image processing or computational imaging, our goal is to recover latent clear images from sensor measurements. Unfortunately, we can hardly perform the inverse restoration directly due to the ill-conditioned nature of the system. A common solution is to minimize the value of a cost function, which can be regarded as prior engineering. The former term in Eq. (1) is derived from the prior on the noise, and the latter is derived from the prior on the image. In this paper, we adopt the DCNN denoiser prior in the restoration method. The following details the specific solutions of our method for FlatCam reconstruction and for general inverse problems.

3.1 FlatCam reconstruction

Through variable splitting, the cost function is written as Eq. (2). The additive noise is generally assumed to be independent and identically distributed Gaussian noise, so the log-likelihood term becomes a Frobenius norm. Therefore, the optimization equation can be rewritten as follows:

$$\begin{aligned} {x_k} &= {\mathop {\arg \min }\limits_x \textrm{ }{{||{y - {\Phi _L}x{\Phi _R}^T} ||}_2}^2 + {\mu _k}{{||{x - {z_{k - 1}}} ||}_2}^2}\\ {z_k} &= {\mathop {\arg \min }\limits_z \textrm{ }\frac{{{\mu _k}}}{2}{{||{{x_k} - z} ||}_2}^2 + {\lambda _k}\phi (z )} \end{aligned}$$
where ${\lambda _k}$ is the regularization coefficient and $\phi (z )$ is the regularization term related to the prior of the latent image. The restored image can be obtained by solving the above two equations alternately. In this paper, this alternating solution process is mimicked as an end-to-end network under the unrolled framework. With the powerful expression ability of the DCNN, high-quality restored images can be output after a few iterations. Figure 2 shows the diagram of our end-to-end network. The detailed reconstruction process is described below.


Fig. 2. Diagram of our end-to-end DPDAN. (a) The architecture of the proposed denoiser prior based deep analytic network that is composed of initial estimate and a few iterations. Each iteration first applies an analytic function and then uses a DCNN denoiser to improve image quality. (b) The structure of the DCNN denoiser used in our network.


Before reconstruction, we perform singular value decomposition (SVD) on the transfer matrices obtained by calibration, $[{{u_L},{s_L},{v_L}^T} ]= {\textrm{SVD}} ({\Phi _L}),\textrm{ }[{{u_R},{s_R},{v_R}^T} ]= {\textrm{SVD}} ({\Phi _R})$. This step is used for the subsequent analytic updates. As soon as our network is fed a sensor measurement, an initial estimate is computed. In the FlatCam reconstruction, Tikhonov regularization [43] is used for the initial estimate:

$${x_0} = \mathop {\arg \min }\limits_x \textrm{ }{||{y - {\Phi _L}x{\Phi _R}^T} ||_2}^2 + {\lambda _0}{||x ||_2}^2$$

The above equation can be solved easily, and ${z_0}$ is initialized to ${x_0}$. After that, the initial estimate and the measurement are sent into the iterations. In each iteration, we first apply an analytic update and then use a DCNN denoiser to improve image quality. By differentiating with respect to x and setting the gradient to zero, the following formula can be obtained:

$${\Phi _L}^T{\Phi _L}{x_k}{\Phi _R}^T{\Phi _R} + {\mu _k}{x_k} = {\Phi _L}^Ty{\Phi _R} + {\mu _k}{z_{k - 1}}$$

The analytic solution is achieved by replacing ${\Phi _L}$ and ${\Phi _R}$ with their SVD decompositions,

$${x_k} = {v_L}[{({{s_L}^T{u_L}^Ty{u_R}{s_R} + {\mu_k}{v_L}^T{z_{k - 1}}{v_R}} )./({{\sigma_L}{\sigma_R}^T + {\mu_k}{{11}^T}} )} ]{v_R}^T$$

Here, ./ represents the elementwise division of matrices, ${\sigma _L} = diag({{s_L}^2} )$, ${\sigma _R} = diag({{s_R}^2} )$, and $diag$ means taking the diagonal elements of a matrix. After the analytic update, a DCNN denoiser refines the intermediate image in the form of an implicit prior. The structure of the DCNN is illustrated in Fig. 2(b). As we can see, it is a four-scale U-Net [44] that learns residuals. In each scale of the encoder, there is a feature extraction block that sandwiches a res-block [45] between two convolutional layers. The stride of the second convolutional layer is two to down-sample the feature maps. The numbers of channels on the four scales are $({32,64,128,256} )$, and the bottleneck of the DCNN denoiser is a convolutional layer with $512$ channels. In each scale of the decoder, there is an up-sampling layer, a convolutional layer for reducing the number of channels, and a feature extraction block identical to that of the encoder. Finally, a three-channel convolutional layer restores the image to the original color space. The kernel size of all convolutional layers is $3 \times 3$ except for the channel-reduction layer of the decoder, which is $1 \times 1$.
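A hedged NumPy sketch of the analytic x-update in Eq. (7) is given below, assuming the thin SVDs of the calibrated transfer matrices have been precomputed as described above; the function and variable names are illustrative. Note that the Tikhonov initial estimate of Eq. (5) can be obtained from the same routine by setting the previous estimate to zero and replacing ${\mu _k}$ with ${\lambda _0}$.

```python
import numpy as np

def precompute_svd(phi):
    """Thin SVD of a calibrated transfer matrix; run once before reconstruction."""
    u, s, vt = np.linalg.svd(phi, full_matrices=False)
    return u, s, vt.T

def analytic_update(y, z_prev, svd_L, svd_R, mu):
    """Closed-form minimizer of ||y - Phi_L x Phi_R^T||^2 + mu ||x - z_prev||^2, cf. Eq. (7)."""
    uL, sL, vL = svd_L
    uR, sR, vR = svd_R
    numer = np.diag(sL) @ (uL.T @ y @ uR) @ np.diag(sR) + mu * (vL.T @ z_prev @ vR)
    denom = np.outer(sL ** 2, sR ** 2) + mu      # sigma_L sigma_R^T + mu * 11^T
    return vL @ (numer / denom) @ vR.T           # element-wise division, then back-projection
```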

To show the superiority of the analytic solution in the unrolled framework, we replace it with gradient descent for the x sub-problem and keep the rest unchanged in each iteration. With gradient descent, the update of ${x_k}$ is as follows:

$${x_k} = {x_{k - 1}} - {\varepsilon _k} \cdot ({{\Phi _L}^T{\Phi _L}{x_{k - 1}}{\Phi _R}^T{\Phi _R} - {\Phi _L}^Ty{\Phi _R} + {\mu_k}{x_{k - 1}} - {\mu_k}{z_{k - 1}}} )$$
where ${\varepsilon _k}$ represents the step size and is a learnable variable initialized to $0.1$.
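For comparison, a sketch of the gradient-descent update of Eq. (8) on the same sub-problem is given below; in the network the step size is learned, whereas here it is fixed for illustration.

```python
def gd_update(y, x_prev, z_prev, phi_L, phi_R, mu, eps=0.1):
    """One gradient-descent step on ||y - Phi_L x Phi_R^T||^2 + mu ||x - z_prev||^2."""
    grad = phi_L.T @ (phi_L @ x_prev @ phi_R.T) @ phi_R - phi_L.T @ y @ phi_R \
           + mu * (x_prev - z_prev)
    return x_prev - eps * grad                   # descend along the (half-)gradient
```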

3.2 General inverse problem

In addition to FlatCam reconstruction, our end-to-end network is also applicable to general inverse problems. In this section, we discuss our solution to the general inverse problem in detail.

Under the Gaussian noise model, the optimization equation for the general inverse problem is as follows after variable splitting:

$$\begin{aligned} {x_k} &= {\mathop {\arg \min }\limits_x \textrm{ }{{||{y - Ax} ||}_2}^2 + {\mu _k}{{||{x - {z_{k - 1}}} ||}_2}^2}\\ {z_k} &= {\mathop {\arg \min }\limits_z \textrm{ }\frac{{{\mu _k}}}{2}{{||{{x_k} - z} ||}_2}^2 + {\lambda _k}\phi (z )} \end{aligned}$$
where $A$ is the degradation matrix. The analytic solution of the first convex equation is
$${x_k} = {({{A^T}A + {\mu_k}I} )^{ - 1}}({{A^T}y + {\mu_k}{z_{k - 1}}} )$$

Obviously, matrix inversion is a major obstacle on the path to the analytic solution because of its computational complexity. Inspired by the FlatCam restoration, the singular value decomposition of $A$ is used to reduce the computational complexity of matrix inversion, because the inverses of a diagonal matrix and a unitary matrix are easy to obtain. With the SVD, the analytic solution of the first convex equation can be rewritten in the following form:

$${x_k} = {v_A}{({{s_A}^T{s_A} + {\mu_k}I} )^{ - 1}}{v_A}^T({{v_A}{s_A}^T{u_A}^Ty + {\mu_k}{z_{k - 1}}} )$$
where $[{{u_A},{s_A},{v_A}^T} ]= {\textrm{SVD}} (A)$, and I is the identity matrix. In practice, we perform the SVD in advance and store these three matrices, so the analytic updates can be calculated quickly and efficiently. For the second equation in Eq. (9), we use the DCNN denoiser, the same as in FlatCam reconstruction.
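As a minimal sketch, the SVD-based update of Eq. (11) can be written as follows in NumPy, assuming the degradation matrix is square (or tall with full column rank) so that ${v_A}$ is orthogonal, and that images are vectorized; names are illustrative.

```python
import numpy as np

def analytic_update_general(y, z_prev, uA, sA, vA, mu):
    """Closed-form minimizer of ||y - A x||^2 + mu ||x - z_prev||^2, with A = uA diag(sA) vA^T."""
    rhs = vA @ (sA * (uA.T @ y)) + mu * z_prev       # A^T y + mu z_{k-1}
    return vA @ ((vA.T @ rhs) / (sA ** 2 + mu))      # apply (A^T A + mu I)^{-1} in the SVD basis
```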

From a macro perspective, the proposed deep denoiser prior based deep analytic network is a stacked denoising network interspersed with several matrix operations. By analyzing the gradient, we prove that the analytic update operations do not introduce singularities or gradient explosions during backward propagation. By vectorizing the matrix, Eq. (11) can be written in the following form:

$$\begin{aligned} {\textrm{vec}} ({{x_k}} )&= I \otimes {v_A}[{I \otimes \psi ({({I \otimes {v_A}^T} ){\textrm{vec}} ({{z_{k - 1}}} )} )} ]\\ &= ({I \otimes {v_A}} )({I \otimes \psi } )({I \otimes {v_A}^T} ){\textrm{vec}} ({{z_{k - 1}}} )\\ &= I \otimes ({{v_A}\psi {v_A}^T} ){\textrm{vec}} ({{z_{k - 1}}} )\end{aligned}$$
where ${\otimes} $ is the Kronecker product, ${\textrm{vec}} (m )$ represents vectorization in terms of the columns of the matrix m, and $\psi $ is a diagonal matrix equal to ${\mu _k}{({{s_A}^T{s_A} + {\mu_k}I} )^{ - 1}}$. In writing Eq. (11) in the form of Eq. (12), we omitted the term related to y because it does not affect the derivative of $x$ with respect to $z$. Therefore, the Jacobian matrix of x with respect to $z$ is $I \otimes ({{v_A}\psi {v_A}^T} )$. Since ${v_A}\psi {v_A}^T$ is a real symmetric matrix, the Jacobian matrix is also real symmetric. According to $rank({P \otimes Q} )= rank(P )rank(Q )$, $({I \otimes {v_A}} )$ must be full rank. According to the Kronecker product property ${({P \otimes Q} )^{ - 1}} = {P^{ - 1}} \otimes {Q^{ - 1}}$, we can obtain
$${({I \otimes {v_A}} )^{ - 1}} = {I^{ - 1}} \otimes {v_A}^{ - 1} = {I^T} \otimes {v_A}^T = {({I \otimes {v_A}} )^T}$$

Thus, $({I \otimes {v_A}} )$ is an orthogonal matrix and is norm preserving. Similarly, $({I \otimes {v_A}^T} )$ is also an orthogonal matrix. Combining this with the eigenvalue decomposition properties of matrices, we see that the second row of Eq. (12) is the eigenvalue decomposition of the Jacobian matrix, and all elements in $\psi $ lie in $({0,1} )$. In summary, the Jacobian matrix is a full-rank matrix whose eigenvalues are all between 0 and 1. Therefore, the analytic update step does not introduce singularities or gradient explosions when the network parameters are updated by back propagation. We have shown in [42] that the analytic updates of the FlatCam model $y = {\Phi _L}x{\Phi _R}^T$ also do not introduce singularities or gradient explosions. The above analysis lays a theoretical foundation for our end-to-end network to complete the training process successfully.
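The statement above can also be checked numerically; the small sketch below (with an arbitrary random degradation matrix and penalty value, both assumptions for illustration) verifies that the eigenvalues of the Jacobian factor ${v_A}\psi {v_A}^T$ lie strictly inside $(0,1)$.

```python
import numpy as np

A = np.random.randn(64, 64)                 # arbitrary degradation matrix for the check
uA, sA, vAt = np.linalg.svd(A)
mu = 0.5                                    # any positive penalty
psi = mu / (sA ** 2 + mu)                   # diagonal entries of psi in Eq. (12)
J = vAt.T @ np.diag(psi) @ vAt              # the Jacobian factor v_A psi v_A^T
eig = np.linalg.eigvalsh(J)                 # real eigenvalues of a symmetric matrix
assert 0 < eig.min() and eig.max() < 1      # no singularities, no gradient explosion
```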

To demonstrate the reconstruction ability of our method for general inverse problems, we take deblurring as an example and conduct multiple groups of non-blind deblurring experiments. In deblurring, the system degradation matrix $A$ is a huge sparse matrix, so it is unwise to solve the equation in matrix form. Instead, the equation is usually written in the form of a convolution:

$${x_k} = \mathop {\arg \min }\limits_x ||{y - psf \ast x} ||_2^2 + {\mu _k}||{x - {z_{k - 1}}} ||_2^2$$
where ${\ast} $ denotes the two-dimensional convolution operation and $psf$ is the point spread function of the system. The analytic solution is shown below:
$${x_k} = {{{\cal F}}^{ - 1}}\left\{ {\frac{{{\mu_k}{{\cal F}}({{z_{k - 1}}} )+ \overline {{{\cal F}}({psf} )} {{\cal F}}(y )}}{{\overline {{{\cal F}}({psf} )} {{\cal F}}({psf} )+ {\mu_k}}}} \right\}$$
where ${{\cal F}}$ and ${{{\cal F}}^{ - 1}}$ represent the fast Fourier transform (FFT) and the inverse FFT, respectively, and $\overline m $ represents the complex conjugate of a matrix m. The results are shown in Section 5.2.
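A hedged NumPy sketch of the FFT-based update in Eq. (15) is shown below; it assumes circular boundary conditions and a PSF registered to the origin (e.g., via np.fft.ifftshift) before being padded to the image size.

```python
import numpy as np

def fft_update(y, z_prev, psf, mu):
    """Closed-form minimizer of ||y - psf * x||^2 + mu ||x - z_prev||^2 (circular convolution)."""
    H = np.fft.fft2(psf, s=y.shape)                         # transfer function of the blur kernel
    numer = mu * np.fft.fft2(z_prev) + np.conj(H) * np.fft.fft2(y)
    denom = np.conj(H) * H + mu                             # |H|^2 + mu, real and strictly positive
    return np.real(np.fft.ifft2(numer / denom))
```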

3.3 Convergence analysis

In this section, we focus on the convergence of our network, whose total cost function is as follows:

$$\varphi ({x,z} )= \frac{1}{2}{||{y - {\Phi _L}x{\Phi _R}^T} ||_2}^2 + \frac{\mu }{2}{||{x - z} ||_2}^2 + \lambda \phi (z )$$

The total cost function is lower bounded and coercive by default. Coercivity means that $\varphi ({x,z} )\to \infty $ whenever $||{({x,z} )} ||\to \infty $, which ensures that the sequence $({{x_k},{z_k}} )$ is bounded. When dealing with general inverse problems, ${\Phi _L} = A,\textrm{ }{\Phi _R} = I$. Under the following assumption, our method has a subsequence that converges to a stationary point $({{x^ \ast },{z^ \ast }} )$ of the cost function,

$$\varphi ({x,z} )- \varphi ({x,D(x )} )\ge {c_2}{||{{{\tilde{\nabla }}_z}\varphi ({x,z} )} ||_2}^2$$
where $D({\cdot} )$ is the DCNN denoiser, ${c_2}$ is a positive constant, and ${\tilde{\nabla }_z}\varphi ({x,z} )$ is a continuous limiting sub-gradient of $\varphi $. This assumption is a reasonable and widely used condition [39,46]. The detailed proof is as follows.

First, we prove that $\varphi $ is a strongly convex function of $x$ with convexity parameter $\mu $. According to the properties of the matrix trace, $tr({m{m^T}} )\ge 0$ and $tr({PQ} )= tr({QP} )$,

$$\begin{aligned} &\left\langle { {{\Phi _L}^T{\Phi _L}({{x_1} - {x_2}} ){\Phi _R}^T{\Phi _R},\textrm{ }{x_1} - {x_2}} \rangle } \right.\\ &= tr({{\Phi _R}^T{\Phi _R}{{({{x_1} - {x_2}} )}^T}{\Phi _L}^T{\Phi _L}({{x_1} - {x_2}} )} )\\ &= tr({{{[{{\Phi _L}({{x_1} - {x_2}} ){\Phi _R}^T} ]}^T}[{{\Phi _L}({{x_1} - {x_2}} ){\Phi _R}^T} ]} )\ge 0 \end{aligned}$$

Therefore, the following inner product inequality holds:

$$\left\langle {{\nabla_x}\varphi ({{x_1},z} )- {\nabla_x}\varphi ({{x_2},z} ),\textrm{ }{x_1} - {x_2}} \right\rangle \ge \mu {||{{x_1} - {x_2}} ||_2}^2$$

The above formula proves strong convexity. By strong convexity, we obtain $\varphi ({x,z} )> \varphi ({{x^ \ast },z} )+ \frac{\mu }{2}{||{x - {x^ \ast }} ||_2}^2,\textrm{ }\forall x \ne {x^ \ast }$. Therefore, the analytic update satisfies the descent property:

$$\varphi ({{x_k},{z_k}} )- \varphi ({{x_{k + 1}},{z_k}} )> {c_1}{||{{x_k} - {x_{k + 1}}} ||_2}^2$$
where ${c_1}: = {{{\mu _{k + 1}}} / 2} > 0$. By assumption, the denoising step also satisfies the descent property:
$$\varphi ({{x_{k + 1}},{z_k}} )- \varphi ({{x_{k + 1}},{z_{k + 1}}} )\ge {c_2}{||{{{\tilde{\nabla }}_z}\varphi ({{x_{k + 1}},{z_k}} )} ||_2}^2$$

Thus, each iteration in our network is a descent step,

$$\varphi ({{x_k},{z_k}} )- \varphi ({{x_{k + 1}},{z_{k + 1}}} )> {c_1}{||{{x_k} - {x_{k + 1}}} ||_2}^2 + {c_2}{||{{{\tilde{\nabla }}_z}\varphi ({{x_{k + 1}},{z_k}} )} ||_2}^2$$

Since $\varphi $ is monotonically decreasing, coercive, and lower bounded, there is a convergent subsequence ${({{x_{{k_t}}},{z_{{k_t}}}} )_{t = 0,1, \cdots }} \to ({{x^ \ast },{z^ \ast }} )$. By summing Eq. (20) and Eq. (21) over $k$, we get $\sum\limits_k {{{||{{x_k} - {x_{k + 1}}} ||}_2}^2} < \infty$ and $\sum\limits_k {{{||{{{\tilde{\nabla }}_z}\varphi ({{x_{k + 1}},{z_k}} )} ||}_2}^2} < \infty$. Therefore,

$$\mathop {\lim }\limits_{k \to \infty } {||{{x_k} - {x_{k + 1}}} ||_2}^2 = 0$$
$$\mathop {\lim }\limits_{k \to \infty } {||{{{\tilde{\nabla }}_z}\varphi ({{x_{k + 1}},{z_k}} )} ||_2}^2 = 0$$

From Eq. (23), we can also get $\mathop {\lim }\limits_{k \to \infty } {||{{z_{k - 1}} - {z_k}} ||_2}^2 = 0$. Since ${\nabla _z}\varphi ({x,z} )$ is continuous in $x$, $\mathop {\lim }\limits_{k \to \infty } {||{{{\tilde{\nabla }}_z}\varphi ({{x_k},{z_k}} )} ||_2}^2 = 0$. The property of the analytic solution guarantees ${\nabla _x}\varphi ({{x_{k + 1}},{z_k}} )= 0$. Similarly, since ${\nabla _x}\varphi ({x,z} )$ is continuous in $x$, $\mathop {\lim }\limits_{k \to \infty } {\nabla _x}\varphi ({{x_k},{z_k}} )= 0$. To sum up, our method is convergent, and the results of real experiments confirm the convergence.
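In practice, the behaviour described by Eqs. (23) and (24) can be monitored directly from the intermediate outputs of the unrolled network; a minimal sketch is to record the squared distances between successive iterates, which should decay toward zero over the few iterations.

```python
import numpy as np

def successive_gaps(iterates):
    """Given the saved iterates [x_0, x_1, ..., x_K], return the list of ||x_k - x_{k+1}||_2^2."""
    return [float(np.sum((a - b) ** 2)) for a, b in zip(iterates[:-1], iterates[1:])]
```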

4. Implementation details

For FlatCam reconstruction, we built an experimental device (see Fig. 3) and captured a dataset to train our end-to-end network. This dataset contains 10000 data pairs, and each data pair consists of a $2048 \times 2048 \times 3$ measurement and a $512 \times 512 \times 3$ ground truth. In our network, the number of iterations is $k = 6$. The ${\lambda _0}$ used for the initial estimate is $0.0005$. The penalty parameter ${\mu _k}$ is fixed at $\eta \cdot {2^{k - 1}}$. In our experiments, $\eta $ is chosen in $[{0.1,\textrm{ }1} ]$.


Fig. 3. FlatCam experimental device and system parameters.


For the deblurring experiments, we cropped $78156$ patches $({128 \times 128} )$ from the DIV2K dataset [47] and the BSD400 dataset [48]. The dataset is then synthesized by convolving the clear images with blur kernels and adding noise with a standard deviation of ${\sigma _n}$. Two motion kernels [49] and a Gaussian blur kernel are used. In the deblurring network, the initial estimate is ${x_0} = {A^T}y$, and ${\mu _k} = \eta \cdot {2^{k - 1}}$. When ${\sigma _n} = 2.0$ or ${\sigma _n} = 2.55$, $\eta $ is set to $0.01$. When ${\sigma _n} = 7.65$, $\eta = 0.02$.
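As a rough sketch of the data synthesis described above (with illustrative function names), each training pair can be generated by convolving a clean patch with a blur kernel and adding Gaussian noise of standard deviation ${\sigma _n}$ on the 0-255 intensity scale:

```python
import numpy as np
from scipy.signal import fftconvolve

def synthesize_pair(clean_patch, kernel, sigma_n=2.55):
    """clean_patch: (128, 128) array in [0, 255]; kernel: blur PSF normalized to sum to one."""
    blurred = fftconvolve(clean_patch, kernel, mode='same')
    noisy = blurred + sigma_n * np.random.randn(*blurred.shape)
    return np.clip(noisy, 0.0, 255.0), clean_patch   # (network input, ground truth)
```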

In this paper, the reconstruction tasks are implemented in TensorFlow, and the loss function is the mean squared error between the network output and the ground truth. During training, the learning rate is initialized to $0.0001$ and then halved every 20 epochs for a total of 100 epochs. The ADAM optimizer is used to update the network parameters, and the gradients are clipped by norm with coefficient $6$. All networks are run on a Linux server with 64GB of memory and four NVIDIA GTX 1080Ti cards.
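A rough TensorFlow/Keras sketch of this training configuration is shown below; the network construction itself is omitted, and the exact API calls are an assumption of how the described settings map onto Keras rather than our actual training script.

```python
import tensorflow as tf

def lr_schedule(epoch, lr):
    # Start at 1e-4 and halve every 20 epochs, for 100 epochs in total.
    return 1e-4 * (0.5 ** (epoch // 20))

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4, clipnorm=6.0)  # gradients clipped by norm 6
loss_fn = tf.keras.losses.MeanSquaredError()                            # MSE against the ground truth
callbacks = [tf.keras.callbacks.LearningRateScheduler(lr_schedule)]
# model.compile(optimizer=optimizer, loss=loss_fn)
# model.fit(train_dataset, epochs=100, callbacks=callbacks)
```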

5. Results

This section presents our restoration results in detail. The first part reports the FlatCam results, and the second part reports the non-blind deblurring results. PSNR and SSIM are used for quantitative evaluation.

5.1 FlatCam imaging

To demonstrate the beneficial effects of our method, we compare our network results against those of existing methods, including LAsNet [42], Tikhonov regularization [43], FISTA reconstruction with $L1$ regularization [50], and FlatNet [51]. Several test images are from ImageNet [52], and most test images are from the DIV2K valid set. The results on ImageNet images are shown in Fig. 4. It is evident that our results are significantly better than those of the other methods. The enlarged blocks show that our results are clearer and contain less noise. The contents in the yellow boxes indicate that our method restores details better. Our method also scores higher on objective evaluation. Viewing the whole image, the color distortion of our reconstructed images is much smaller than that of FlatNet. Compared to FlatNet, our method has fewer parameters and FLOPs, but more matrix operations. The total parameters, total FLOPs, and run times are summarized in Table 1.


Fig. 4. Reconstructed images using various methods. The quantitative evaluation values are at the bottom of each image. (a) The ground truth images for reference. (b) Results of Tikhonov regularization. (c) Results of FISTA reconstruction. (d) Results of the FlatNet. (e) Our results.


Table 1. Parameters and FLOPs of the FlatNet and our method.

Our network also performs well on the DIV2K valid set. The 100 original images in the DIV2K valid set are scaled to $512 \times 512$ and then measured by our device (Fig. 3). Compared with LAsNet, our network recovers finer structures and clearer textures (see Fig. 5), which shows the strong expressive ability of our DCNN in the form of an implicit prior. To demonstrate the effectiveness of the analytic updates, we use gradient descent (denoted as Our-GD) as a contrast. The results of analytic updating have sharper edges (see Fig. 5). To show the robustness of our method, we perform experiments with an inaccurate forward model, as shown in Fig. 6. Our-E5 represents random erasure of 5% of the transfer matrix data, and Our-S5 simulates random shifts of the calibration stripes by less than 5 pixels in one direction. The PSNR drops by about 0.4dB, and the overall restoration results remain acceptable. The quantitative evaluations of the 100 images are summarized in Table 2. The PSNR and SSIM values are plotted in Fig. 7. The curves show that our method outperforms the existing methods on almost every image.


Fig. 5. Results of images that come from the DIV2K valid set. (a) The ground truth images for reference. (b) Results of FISTA reconstruction. (c) Results of LAsNet. (d) Results of our method with gradient descent. (e) Our results.



Fig. 6. Comparison of results under inaccurate forward model. The PSNR values are at the bottom of each result.



Fig. 7. The detailed quantitative evaluation of 100 images in DIV2K valid set. Four methods are compared. (a) The PSNR values of the 100 results of DIV2K valid set. (b) The SSIM values of the 100 reconstructed results.


Table 2. The average values of PSNR and SSIM of DIV2K valid set.

As analyzed in Section 3.3, our method is convergent, and our experimental results verify the convergence. Two images from the DIV2K valid set are chosen to demonstrate the convergence property of our method for FlatCam reconstruction, as shown in Fig. 8 and Fig. 9. In each figure, the first row is the comparison of the three restoration results, and the second and third rows show images of our network at different stages. It can be seen intuitively that the image quality gradually improves as the iterations progress. Figure 8 focuses on the convergence of x and Fig. 9 focuses on $z$.


Fig. 8. Convergence of restored results of the “0829.PNG” of DIV2K valid set. The PSNR values are at the bottom of each result. (a) The ground truth image for reference. (b) Result of FISTA. (c) Result of LAsNet. (d) Our result. (e)-(k) Visual results of our network at different stages. (l) Convergence curve of our restored results.



Fig. 9. Convergence of restored results of the “0803.PNG” of DIV2K valid set. The PSNR values are at the bottom of each result. (a) The ground truth image for reference. (b) Result of FISTA. (c) Result of LAsNet. (d) Our result. (e)-(k) Visual results of our network at different stages. (l) Convergence curve of our restored results.


5.2 Non-blind deblurring

In this paper, we conduct five groups of non-blind deblurring experiments, and 10 images are used for the deblurring test, as shown in Fig. 10. We compare our method with the classical model-based method EPLL [53] and the denoising-based methods IRCNN [37] and DPDNN [39]. DPDNN is similar to our network except that it uses gradient descent to solve Eq. (14) while we use the analytic solution. Table 3 summarizes the average PSNR values of the 10 restored images for the five methods. Our-GD means replacing the analytic update with gradient descent. The first line of Table 3 lists the five groups of blur experiment scenarios. The detailed PSNR values of each image are shown in Fig. 11 in the form of curves. Our method is significantly ahead of the comparison methods in objective evaluation, which also indicates that the analytic update converges faster than gradient descent under the deep unfolding framework. Three groups of images visually show the deblurring effect in different scenarios (see Figs. 12–14). The green boxes in Fig. 12 show that our result retains more details such as the lines on the hat and the hairs. Figure 13 demonstrates that our result recovers the most object information in the region close to the background. Compared to Our-GD, the enlarged blocks in Fig. 14 illustrate that our result has higher sharpness and contrast. The results of our method are convergent, as shown in panel (e) of Figs. 12–14.


Fig. 10. The 10 images used for the deblurring test.



Fig. 11. The PSNR (in dB) values of 10 images in five groups of deblurring experiments. (a) Gaussian kernel with standard deviation 1.6, ${\sigma _n} = 2$. (b) $19 \times 19$ motion kernel, ${\sigma _n} = 2.55$. (c) $19 \times 19$ motion kernel, ${\sigma _n} = 7.65$. (d) $17 \times 17$ motion kernel, ${\sigma _n} = 2.55$. (e) $17 \times 17$ motion kernel, ${\sigma _n} = 7.65$.



Fig. 12. Deblurring results of Lena256 image under the scenario: $19 \times 19$ motion kernel, ${\sigma _n} = 2.55$. The SSIM values are at the bottom of each result. (a) Ground truth image. PSF is in the upper right corner. (b) Result of EPLL. (c) Result of IRCNN. (d) Our result. (e) Convergence curve of our method.



Fig. 13. Deblurring results of Boats image under the scenario: $17 \times 17$ motion kernel, ${\sigma _n} = 2.55$. The SSIM values are at the bottom of each result. (a) Ground truth image. PSF is in the lower left corner. (b) Result of EPLL. (c) Result of IRCNN. (d) Our result. (e) Convergence curve of our method.



Fig. 14. Deblurring results of C. Man image under the scenario: $17 \times 17$ motion kernel, ${\sigma _n} = 7.65$. The SSIM values are at the bottom of each result. (a) Ground truth image. PSF is in the upper left corner. (b) Result of IRCNN. (c) Result of Our-GD. (d) Our result. (e) Convergence curve of our method.


Table 3. The average PSNR values of the 10 deblurred images by the test methods.

6. Conclusion

In this paper, we mimicked the traditional optimization process as a deep analytic network for FlatCam image reconstruction, which is more effective than gradient descent under the unrolled framework. The real-capture experiments demonstrate that the image quality of our results is superior to that of existing results in terms of visual perception and quantitative evaluation. Starting from the strong convexity of the sub-problem, we prove mathematically that the iterations in our network are convergent, which is also supported by the reconstructed results. In addition, our method is a universal non-blind restoration method. We give the solution to the general inverse problem and prove that the analytic updates inserted between the DCNN denoising operations do not introduce singularities or gradient explosions during backward propagation. Five groups of deblurring experiments show the effectiveness of our method for general inverse problems.

Funding

ZJU-Sunny Photonics Innovation Center (#2019-04); Science and Technology on Optical Radiation Laboratory (61424080211); National Natural Science Foundation of China (61975175).

Acknowledgments

We thank Bian Meijuan from the facility platform of optical engineering of Zhejiang University for instrument support.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are available in Ref. [39], Ref. [47], Ref. [48], and Ref. [52].

References

1. M. S. Asif, A. Ayremlou, A. Sankaranarayanan, A. Veeraraghavan, and R. G. Baraniuk, “Flatcam: Thin, lensless cameras using coded aperture and computation,” IEEE Trans. Comput. Imaging. 3(3), 384–397 (2017). [CrossRef]  

2. V. Boominathan, J. K. Adams, M. S. Asif, R. G. Baraniuk, and A. Veeraraghavan, “Lensless Imaging: A computational renaissance,” IEEE Signal Process. Mag. 33(5), 23–35 (2016). [CrossRef]  

3. M. S. Asif, “Lensless 3D imaging using mask-based cameras,” in Proceeding of IEEE International Conference on Acoustics, Speech and Signal Processing (IEEE, 2018), pp. 6498–6502.

4. J. K. Adams, V. Boominathan, B. W. Avants, D. G. Vercosa, F. Ye, R. G. Baraniuk, J. T. Robinson, and A. Veeraraghavan, “Single-frame 3D fluorescence microscopy with ultraminiature lensless FlatScope,” Sci. Adv. 3(12), e1701548 (2017). [CrossRef]  

5. J. Tan, L. Niu, J. K. Adams, V. Boominathan, J. T. Robinson, R. G. Baraniuk, and A. Veeraraghavan, “Face detection and verification using lensless cameras,” IEEE Trans. Comput. Imaging. 5(2), 180–194 (2019). [CrossRef]  

6. T. N. Canh and H. Nagahara, “Deep compressive sensing for visual privacy protection in FlatCam imaging,” in Proceedings of the IEEE International Conference on Computer Vision (IEEE, 2019), pp. 3978–3986.

7. M. J. DeWeert and B. P. Farm, “Lensless coded-aperture imaging with separable Doubly-Toeplitz masks,” Opt. Eng. 54(2), 023102 (2015). [CrossRef]  

8. W. Chi and N. George, “Optical imaging with phase-coded aperture,” Opt. Express 19(5), 4294–4300 (2011). [CrossRef]  

9. V. Boominathan, J. K. Adams, J. T. Robinson, and A. Veeraraghavan, “PhlatCam: Designed phase-mask based thin lensless camera,” IEEE Trans. Pattern Anal. Mach. Intell. 42(7), 1618–1629 (2020). [CrossRef]  

10. K. Tajima, T. Shimano, Y. Nakamura, M. Sao, and T. Hoshizawa, “Lensless light-field imaging with multi-phased fresnel zone aperture,” in IEEE International Conference on Computational Photography (IEEE, 2017), pp. 1–7.

11. T. Shimano, Y. Nakamura, K. Tajima, M. Sao, and T. Hoshizawa, “Lensless light-field imaging with Fresnel zone aperture: quasi-coherent coding,” Appl. Opt. 57(11), 2841–2850 (2018). [CrossRef]  

12. P. R. Gill, “Odd-symmetry phase gratings produce optical nulls uniquely insensitive to wavelength and depth,” Opt. Lett. 38(12), 2074–2076 (2013). [CrossRef]  

13. N. Mohammad, M. Meem, B. Shen, P. Wang, and R. Menon, “Broadband imaging with one planar diffractive lens,” Sci. Rep. 8(1), 2799 (2018). [CrossRef]  

14. P. Wang, N. Mohammad, and R. Menon, “Chromatic-aberration-corrected diffractive lenses for ultra-broadband focusing,” Sci. Rep. 6(1), 1–7 (2016). [CrossRef]  

15. S. Li, M. Deng, J. Lee, A. Sinha, and G. Barbastathiset, “Imaging through glass diffusers using densely connected convolutional networks,” Optica 5(7), 803–813 (2018). [CrossRef]  

16. J. Wu, H. Zhang, W. Zhang, G. Jin, L. Cao, and G. Barbastathis, “Single-shot lensless imaging with fresnel zone aperture and incoherent illumination,” Light: Sci. Appl. 9(1), 1–11 (2020). [CrossRef]  

17. P. R. Gill, C. Lee, D. G. Lee, A. Wang, and A. Molnar, “A microscale camera using direct Fourier-domain scene capture,” Opt. Lett 36(15), 2949–2951 (2011). [CrossRef]  

18. D. G. Stork and P. R. Gill, “Optical, mathematical, and computational foundations of lensless ultra-miniature diffractive imagers and sensors,” Intern. J. Adv. Syst. Measure. 7(3), 4 (2014).

19. P. R. Gill and D. G. Stork, “Lensless ultra-miniature imagers using odd-symmetry spiral phase gratings,” in Computational Optical Sensing and Imaging (Optical Society of America, 2013) paper CW4C-3.

20. Y. Peng, Q. Fu, H. Amata, S. Su, F. Heide, and W. Heidrich, “Computational imaging using lightweight diffractive-refractive optics,” Opt. Express 23(24), 31393–31407 (2015). [CrossRef]  

21. Y. Peng, Q. Fu, F. Heide, and W. Heidrich, “The diffractive achromat full spectrum computational imaging with diffractive optics,” In Virtual Reality meets Physical Reality: Modelling and Simulating Virtual Humans and Environments (SIGGRAPH ASIA, 2016), pp. 1–2.

22. V. Sitzmann, S. Diamond, Y. Peng, X. Dun, S. Boyd, W. Heidrich, F. Heide, and G. Wetzstein, “End-to-end optimization of optics and image processing for achromatic extended depth of field and super-resolution imaging,” ACM Trans. Graph 37(4), 1–13 (2018). [CrossRef]  

23. N. Antipa, G. Kuo, R. Heckel, B. Mildenhall, E. Bostan, R. Ng, and L. Waller, “DiffuserCam: lensless single-exposure 3D imaging,” Optica 5(1), 1–9 (2018). [CrossRef]  

24. G. Kuo, N. Antipa, R. Ng, and L. Waller, “3D Fluorescence microscopy with diffusercam,” in Computational Optical Sensing and Imaging (Optical Society of America, 2018) paper CM3E-3.

25. K. Monakhova, K. Yanny, N. Aggarwal, and L. Waller, “Spectral DiffuserCam: Lensless snapshot hyperspectral imaging with a spectral filter array,” Optica 7(10), 1298–1307 (2020). [CrossRef]  

26. G. Kim and R. Menon, “Computational imaging enables a “see-through” lens-less camera,” Opt. Express 26(18), 22826–22836 (2018). [CrossRef]  

27. H. Zhou, Y. Chen, H. Feng, G. Lv, Z. Xu, and Q. Li, “Rotated rectangular aperture imaging through multi-frame blind deconvolution with Hyper-Laplacian priors,” Opt. Express 29(8), 12145–12159 (2021). [CrossRef]  

28. L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Phys. D 60(1-4), 259–268 (1992). [CrossRef]  

29. J. Wu, L. Cao, and G. Barbastathis, “DNN-FZA camera: a deep learning approach toward broadband FZA lensless imaging,” Opt. Letters 46(1), 130–133 (2021). [CrossRef]  

30. S. S. Khan, V. R. Adarsh, V. Boominathan, J. Tan, A. Veeraraghavan, and K. Mitra, “Towards photorealistic reconstruction of highly multiplexed lensless images,” in Proceedings of the IEEE International Conference on Computer Vision (IEEE, 2019), pp. 7860–7869.

31. D. Ulyanov, A. Vedaldi, and V. Lempitsky, “Deep image prior,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9446–9454.

32. S. H. Chan, X. Wang, and O. A. Elgendy, “Plug-and-play ADMM for image restoration: Fixed-point convergence and applications,” IEEE Trans. Comput. Imaging. 3(1), 84–98 (2017). [CrossRef]  

33. A. M. Teodoro, J. M. Bioucas-Dias, and M. A. Figueiredo, “Image restoration and reconstruction using variable splitting and class-adapted image priors,” in the International Conference on Image Processing (IEEE, 2016), pp. 3518–3522.

34. Y. Sun, S. Xu, Y. Li, L. Tian, B. Wohlberg, and U. S. Kamilov, “Regularized fourier ptychography using an online plug-and-play algorithm,” in ICASSP International Conference on Acoustics, Speech and Signal Processing (IEEE, 2019), pp. 7665–7669.

35. T. Tirer and R. Giryes, “Image restoration by iterative denoising and backward projections,” IEEE Trans. Image Proc. 28(3), 1220–1234 (2019). [CrossRef]  

36. S. Bigdeli, D. Honzátko, S. Süsstrunk, and L. A. Dunbar, “Image restoration using plug-and-play cnn map denoisers,” arXiv:1912.09299 (2019).

37. K. Zhang, W. Zuo, S. Gu, and L. Zhang, “Learning deep CNN denoiser prior for image restoration,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2017), pp. 3929–3938. [CrossRef]  

38. K. Zhang, Y. Li, W. Zuo, L. Zhang, L. V. Gool, and R. Timofte, “Plug-and-play image restoration with deep denoiser prior,” arXiv:2008.13751 (2020).

39. W. Dong, P. Wang, W. Yin, G. Shi, F. Wu, and X. Lu, “Denoising prior driven deep neural network for image restoration,” IEEE Trans. Pattern Anal. Mach. Intell. 41(10), 2305–2318 (2019). [CrossRef]  

40. K. Monakhova, J. Yurtsever, G. Kuo, N. Antipa, K. Yanny, and L. Waller, “Learned reconstructions for practical mask-based lensless imaging,” Opt. Express 27(20), 28075–28090 (2019). [CrossRef]  

41. D. S. Jeon, S. H. Baek, S. Yi, Q. Fu, X. Dun, W. Heidrich, and M. H. Kim, “Compact snapshot hyperspectral imaging with diffracted rotation,” ACM Trans. Graph 38(4), 1–13 (2019). [CrossRef]  

42. H. Zhou, H. Feng, Z. Hu, Z. Xu, Q. Li, and Y. Chen, “Lensless cameras using a mask based on almost perfect sequence through deep learning,” Opt. Express 28(20), 30248–30262 (2020). [CrossRef]  

43. A. N. Tikhonov, “Solution of incorrectly formulated problems and the regularization method,” Soviet Math. 4, 1035–1038 (1963).

44. O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-assisted Intervention (2015), pp. 234–241.

45. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the Conference on Computer Vision and Pattern Recognition (IEEE, 2016), pp. 770–778.

46. H. Wang, T. Zhang, M. Yu, J. Sun, W. Ye, C. Wang, and S. Zhang, “Stacking networks dynamically for image restoration based on the Plug-and-Play framework,” in European Conference on Computer Vision (2020), pp. 446–462.

47. E. Agustsson and R. Timofte, “NTIRE 2017 challenge on single image super-resolution: Dataset and study,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (2017), pp. 126–135. [CrossRef]  

48. D. Martin, C. Fowlkes, D. Tal, and J. Malik, “A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics,” in Proceedings Eighth IEEE International Conference on Computer Vision (ICCV, 2001), pp. 416–423. [CrossRef]  

49. A. Levin, Y. Weiss, F. Durand, and W. T. Freeman, “Understanding and evaluating blind deconvolution algorithms,” inConference on Computer Vision and Pattern Recognition (IEEE, 2009), pp. 1964–1971.

50. A. Beck and M. Teboulle, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems,” SIAM J. Imaging Sci 2(1), 183–202 (2009). [CrossRef]  

51. S. S. Khan, V. Sundar, V. Boominathan, A. Veeraraghavan, and K. Mitra, “FlatNet: Towards photorealistic scene reconstruction from lensless measurements,” IEEE Trans. on Pattern Analysis and Machine Intelligence (to be published).

52. O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, and F. Li, “Imagenet large scale visual recognition challenge,” Int J Comput Vis 115(3), 211–252 (2015). [CrossRef]  

53. D. Zoran and Y. Weiss, “From learning models of natural image patches to whole image restoration,” in International Conference on Computer Vision (IEEE, 2011), pp. 479–486.
