Autofocusing of Fresnel zone aperture lensless imaging for QR code recognition

Open Access

Abstract

Fresnel zone aperture (FZA) lensless imaging encodes the incident light into a hologram-like pattern, so that the scene image can be numerically focused over a long imaging range by the back propagation method. However, the target distance is uncertain, and an inaccurate distance causes blurs and artifacts in the reconstructed images. This brings difficulties to target recognition applications such as quick response (QR) code scanning. We propose an autofocusing method for FZA lensless imaging. By incorporating image sharpness metrics into the back propagation reconstruction process, the method can acquire the desired focusing distance and reconstruct noise-free, high-contrast images. By combining the Tamura of the gradient (TOG) metric and the nuclear norm of the gradient (NoG), the relative error of the estimated object distance is only 0.95% in the experiment. The proposed reconstruction method significantly improves the mean recognition rate of QR codes from 4.06% to 90.00%. It paves the way for designing intelligent integrated sensors.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Lens-based imaging systems have been developed for hundreds of years and are now widely used in scenarios such as photography, microscopy and telescopy. As imaging systems become more portable and miniaturized, e.g., mobile phone cameras and endoscopes, bulky lenses become the bottleneck restricting further size reduction. Mask-based lensless imaging adopts masks instead of lenses to encode the incident light, so that the imaging system can be assembled in a compact structure. Pinhole imaging is the simplest form of mask-based lensless imaging, but its low light throughput limits its applications. The uniformly redundant array (URA) [1] and modified URA (MURA) [2] extend the pinhole to a multi-hole mask to enhance the light throughput. To improve the imaging robustness, separable masks have been proposed together with a scalable calibration procedure [3-5]. To further speed up the reconstruction, learning-based reconstruction algorithms have gradually replaced iterative reconstruction algorithms [6-11].

Though learning-based methods allow for faster reconstruction, they usually require a large amount of training data and cannot always guarantee satisfactory reconstruction quality across scenarios. Taking advantage of the inference power of deep learning, mask-based lensless imaging has been extended to classification and recognition tasks, such as image inference [12], text recognition [13], hand gesture recognition [14] and facial recognition [15]. In these tasks, the target is located at a fixed distance. In practice, however, the target distance is uncertain, and an inaccurate distance causes blurs and artifacts in the reconstructed images. Therefore, an accurate focusing distance is important for decent image quality and a high recognition rate.

Fresnel zone aperture (FZA) imaging has attracted research interest as a typical form of mask-based lensless imaging. It encodes the scene as a hologram-like pattern, so that the original image can be reconstructed with digital holographic imaging algorithms. FZA imaging provides calibration-free quasi-coherent coding [16-18]. The power of compressive sensing further enables single-shot lensless imaging [19,20]. As in holographic reconstruction, an exact focusing distance is required for reconstruction. However, the object distance is usually inaccurate or unknown, which hinders high-quality imaging and high-accuracy recognition.

Because of the similarity between digital holography and FZA imaging, we suggest that the autofocusing methods used in holography can be readily transplanted to FZA imaging. In digital holography, numerical autofocusing can be performed with the back propagation method: the recorded hologram is back-propagated with a set of candidate distances to generate a Z-space stack. By evaluating the sharpness of each image with certain metrics, the focusing distance corresponding to the optimal sharpness value can be obtained [21-23]. Many metrics have been investigated [24-26], such as the gradient (GRA), Laplacian gradient (LAP), Sobel gradient, weighted power spectrum (SPEC), squared gradient summation (SG), variance (VAR), sum of modulus of gray difference (SMD) and Tamura of the gradient (TOG). Learning-based methods have also been applied to realize fast autofocusing in holographic imaging [27,28].

In this work, we propose an autofocusing method for single-shot FZA lensless imaging. The back propagation (BP) method is used to generate a series of images in Z-space, and image sharpness metrics are then applied to each image to estimate the focusing distance. Performance analysis shows that the TOG and the nuclear norm of the gradient (NoG) outperform LAP, VAR, SMD and GRA for this task. To obtain a stable metric, a weighted combination of TOG and NoG, called W-T-N, is proposed for FZA lensless imaging. The measured image and the estimated distance are then fed into the alternating direction method of multipliers (ADMM) algorithm to further improve the image quality and recognition accuracy. In a decoding test of 10080 QR code images, the proposed method significantly improves the mean recognition rate from 4.06% to 90.00%.

2. Methodology

2.1 Principle of FZA imaging

In a mask-based lensless imaging framework, each point source from the object casts a unique shadow of the mask onto the sensor. The shadow shifts and scales according to the position of the light source, which enables mask-based lensless imaging to record depth information [29-32]. The pipeline of the proposed autofocusing imaging is shown in Fig. 1. The FZA camera is composed of an image sensor and an attached Fresnel zone plate. The distance from the FZA to the sensor is d, and the object is placed at a distance z from the FZA camera. The object surface diffuses the light and can be considered as a set of point sources. Each point source in the scene casts an FZA shadow on the sensor plane, so the process is similar to the recording of a point-source hologram. The shadow center lies at the intersection of the sensor plane with the extension of the line connecting the object point and the center of the mask. The shadow is expanded from the FZA by the magnification factor $1 + {d / z}$. We define the FZA parameter r1 as the radius of the innermost zone, so the FZA parameter of the shadow is ${r^{\prime}_1} = ({1 + {d / z}} ){r_1}$. The image formed on the sensor is a superposition of shifted and scaled FZA shadows. Since the recorded image is similar to an in-line hologram, the back propagation method can be used to refocus the recorded image to an arbitrary distance. By incorporating image sharpness metrics into the back propagation reconstruction process, the object distance $\hat{z}$ can be estimated. Finally, a high-quality image can be reconstructed by the ADMM algorithm with the estimated object distance.


Fig. 1. Overview of the autofocusing FZA lensless imaging. The target image is captured by a compact FZA camera. The measured image is processed by back propagation to generate a series of Z-space images. After the image sharpness evaluation, the estimated focusing distance $\hat{z}$ is fed into the ADMM algorithm to obtain a high-quality reconstruction.


According to the continuity of the transmittance distribution, zone plates can be divided into the Gabor zone plate (GZP) and the Fresnel zone plate (FZP). The GZP has a continuously varying transmittance, whose transmittance function is

$$T({x,y} )= \frac{1}{2} + \frac{1}{2}\cos \left( {\pi \frac{{{x^2} + {y^2}}}{{r_1^2}}} \right), $$
where x, y are the spatial coordinates. However, the GZP is difficult to fabricate because of its sinusoidally varying transmittance. The FZP is a binary approximation that is generally used in practice. Each zone plate shadow can be considered as a point-source hologram that encodes the intensity and the location of the point source, and all these elementary holograms synthesize the measurement. The reconstruction can then be performed by the BP method. The recording process can be represented as the convolution of the ideal image and the zone plate shadow, which can be formulated as
$$I({x,y} )= O({x,y} )\ast T({x,y} )+ e({x,y} ), $$
where * denotes the convolution; O(x, y) is the image to be restored on the sensor plane; e(x, y) is a random term that includes photodetector noise, crosstalk, quantization noise and diffraction artifacts. By decomposing the cosine term of $T({x,y} )$ into $[{h({x,y} )+ {h^\ast }({x,y} )} ]/2$, where $h({x,y} )= \textrm{exp}[{i({\pi /r_1^2} )({{x^2} + {y^2}} )} ]$ and h* is the conjugate of h, Eq. (2) becomes
$$I({x,y} )= C + \frac{1}{2}\textrm{Re}[{U({x,y} )} ]+ e({x,y} ), $$
$$U({x,y} )= {\mathrm{{\cal F}}^{ - 1}}\{{H({u,v} )\circ \mathrm{{\cal F}}\{{O({x,y} )} \}} \}, $$
where $C = \; O({x,y} )\ast ({1/2} )$ is a constant; ${\cal F}$ and ${\cal F}^{-1}$ are the Fourier transform operator and inverse Fourier transform operator, respectively, and H indicates the operator that multiplies by the transfer function $H({u,v} )= i\textrm{exp}[{ - i\pi r_1^2({{u^2} + {v^2}} )} ]$. The u and v are coordinates in the Fourier domain. Ignoring the scaling factor, constant bias and noise, the measured image I can be expressed as a function only related to U, which is the forward transform model:
$$I = \; U + {U^ \ast }. $$
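As a concrete illustration of Eqs. (2)-(5), the following minimal NumPy sketch builds the transfer function H(u, v) and simulates a measurement from a discretized object. The function names, grid handling and noise model are illustrative assumptions of this sketch, not taken from the authors' released materials (Ref. [40]).

```python
import numpy as np

def fza_transfer_function(shape, pixel_size, r1):
    """Transfer function H(u, v) = i * exp[-i*pi*r1^2*(u^2 + v^2)] on the sensor grid."""
    ny, nx = shape
    u = np.fft.fftfreq(nx, d=pixel_size)          # spatial frequencies along x
    v = np.fft.fftfreq(ny, d=pixel_size)          # spatial frequencies along y
    uu, vv = np.meshgrid(u, v)
    return 1j * np.exp(-1j * np.pi * r1**2 * (uu**2 + vv**2))

def fza_measurement(obj, pixel_size, r1, noise_std=0.0):
    """Simulate I = C + 0.5*Re[U] + e with U = F^-1{H . F{O}} (Eqs. (3)-(5))."""
    H = fza_transfer_function(obj.shape, pixel_size, r1)
    U = np.fft.ifft2(H * np.fft.fft2(obj))
    C = 0.5 * obj.mean()                          # constant bias; its exact value is immaterial here
    e = noise_std * np.random.randn(*obj.shape)   # lumped noise term e(x, y)
    return C + 0.5 * np.real(U) + e
```

Here r1 should be understood as the on-sensor FZA parameter of the shadow, r1' = (1 + d/z) r1, introduced in Section 2.1.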

Solving O with a known I and forward transform model is a typical inverse problem. The BP method can be formulated as

$$\hat{O} = {\mathrm{{\cal F}}^{ - 1}}\{{{H^ \ast } \circ \mathrm{{\cal F}}\{I \}} \}= O + {\mathrm{{\cal F}}^{ - 1}}\{{{{({{H^ \ast }} )}^2} \circ \mathrm{{\cal F}}\{I \}} \}. $$

The first term in Eq. (6) is the original image, and the second term is the twin image. The twin image is an inherent problem in holography. For in-line holography, the twin image is a defocused image superimposed on the original image. It inevitably arises in BP reconstruction and degrades the image quality. However, the intensity of the twin image diffuses outward as the distance z increases, so it has less influence on the sharpness evaluation of the image. Moreover, the BP method provides fast computation, so it can be used to rapidly generate the Z-space stack for evaluating the refocusing distance.
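The BP reconstruction of Eq. (6) amounts to one FFT, a multiplication by the conjugate transfer function and an inverse FFT. A minimal sketch, reusing the hypothetical fza_transfer_function helper from the previous sketch:

```python
def back_propagate(meas, pixel_size, r1):
    """Back propagation, Eq. (6): O_hat = F^-1{ H* . F{I} }; the twin image remains."""
    H = fza_transfer_function(meas.shape, pixel_size, r1)
    spectrum = np.fft.fft2(meas - meas.mean())    # subtract the constant bias C
    return np.real(np.fft.ifft2(np.conj(H) * spectrum))
```

Taking the real part keeps the refocused intensity image that is fed to the sharpness evaluation below.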

2.2 Image sharpness metrics

Because FZA imaging provides no reference image, full-reference methods cannot support autofocusing; no-reference image quality assessment is the natural choice [33]. Image sharpness can be extracted from basic global statistics in the spatial and frequency domains. By evaluating the sharpness of the image at different focusing distances, the estimated object distance can be obtained for subsequent processing. To quantify image sharpness, a set of sharpness quantification functions have been investigated [34]. Five popular sharpness quantification functions are listed as follows:

$$\textrm{LAP}(k )= \mathop \sum \limits_{x = 1}^{{N_x}} \mathop \sum \limits_{y = 1}^{{N_y}} {[{\Delta {f_k}({x,y;{z_k}} )} ]^2}, $$
$$\textrm{GRA}(k )= \mathop \sum \limits_{x = 1}^{{N_x}} \mathop \sum \limits_{y = 1}^{{N_y}} \sqrt {{{({\nabla {f_x}} )}^2} + {{({\nabla {f_y}} )}^2}} , $$
$$\textrm{VAR}(k )= \frac{1}{{{N_x}{N_y}}}\mathop \sum \limits_{x = 1}^{{N_x}} \mathop \sum \limits_{y = 1}^{{N_y}} {({f - \bar{f}} )^2}, $$
$$\textrm{SMD}(k )= \frac{1}{{{N_x}{N_y}}}\mathop \sum \limits_{x = 1}^{{N_x}} \mathop \sum \limits_{y = 1}^{{N_y}} [{|{f({x,y} )- f({x + 1,y} )} |+ |{f({x,y} )- f({x,y + 1} )} |} ], $$
$$\textrm{TOG}(k )= \sqrt {{{{N_x}{N_y}\textrm{std}[{|{\nabla f} |} ]} / {\mathop \sum \limits_{x = 1}^{{N_x}} \mathop \sum \limits_{y = 1}^{{N_y}} \sqrt {{{({\nabla {f_x}} )}^2} + {{({\nabla {f_y}} )}^2}} }}}, $$
where std[·] denotes the standard deviation of an image and f is the image to be evaluated. Nx and Ny are the numbers of pixels along the x and y directions, ∇ is the image gradient, and Δ is the Laplacian operator. For the above sharpness quantification functions, the position of the peak value indicates the focusing distance. However, these traditional functions treat all image pixels equally and simply average over them; they do not exploit the common features among pixels. The nuclear norm of the gradient (NoG) effectively indicates the rank of the matrix, which reflects the similarity among pixels in the image [35]. The NoG can be formulated as
$$\textrm{NoG} = F[{{{({\nabla {f_x}} )}^2} + {{({\nabla {f_y}} )}^2}} ], $$
where F denotes the Frobenius norm. Because natural images have sparse gradients, the NoG conforms better to the real situation.
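For reference, minimal NumPy sketches of the two metrics adopted later, TOG of Eq. (11) and NoG of Eq. (12), are given below. Note that Eq. (12) is written with F, defined above as the Frobenius norm, while the metric's name and Ref. [35] point to the nuclear norm (the sum of singular values); the sketch uses the nuclear-norm reading, and np.linalg.norm(g2, 'fro') can be substituted for the other reading.

```python
import numpy as np

def tog(img):
    """Tamura of the gradient, Eq. (11): sqrt( std(|grad f|) / mean(|grad f|) )."""
    gy, gx = np.gradient(img.astype(float))
    grad_mag = np.sqrt(gx**2 + gy**2)
    return np.sqrt(grad_mag.std() / (grad_mag.mean() + 1e-12))

def nog(img):
    """Nuclear norm of the gradient, Eq. (12), applied to the squared-gradient map."""
    gy, gx = np.gradient(img.astype(float))
    g2 = gx**2 + gy**2
    return np.linalg.norm(g2, ord='nuc')          # sum of singular values
```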

The autofocusing process can be divided into two steps: focusing distance estimation and iterative image reconstruction. To locate the focusing distance z, a series of images at different z are processed. Here the BP reconstruction is used to process the measured image and obtain initial reconstructed images at different distances. Because the expanded FZA mask shadow has ${r^{\prime}_1} = \; ({1 + d/z} ){r_1}$, the transfer function of back propagation H can be rewritten as

$$H = \; i\textrm{exp}[{ - i\pi {{({1 + d/z} )}^2}r_1^2({{u^2} + {v^2}} )} ], $$
where d is constant for a specific experimental device, so H depends only on z. A set of zi are substituted into Eq. (13) to generate a set of Hi, which are then substituted into Eq. (6) to generate a Z-space stack of images. These Z-space images are evaluated by the sharpness metrics, and the peak position indicates the suitable focusing distance $z^{\prime}.$
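A minimal sketch of this focus scan is given below; it reuses the hypothetical back_propagate and metric helpers from the previous sketches, and the commented example values follow the simulation settings of Section 3.1.

```python
import numpy as np

def autofocus(meas, pixel_size, r1, d, z_candidates, metric):
    """Return the candidate z whose BP reconstruction maximizes the sharpness metric."""
    scores = []
    for z in z_candidates:
        r1_eff = (1.0 + d / z) * r1               # magnified FZA parameter r1', Eq. (13)
        rec = back_propagate(meas, pixel_size, r1_eff)
        scores.append(metric(rec))
    scores = np.asarray(scores)
    return z_candidates[int(np.argmax(scores))], scores

# Example usage (units in meters, following Section 3.1):
# z_grid = np.arange(100e-3, 400e-3 + 1e-6, 5e-3)   # 100 mm to 400 mm, 5 mm steps
# z_hat, curve = autofocus(meas, 0.01e-3, 0.3e-3, 2.5e-3, z_grid, tog)
```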

2.3 Compressive reconstruction for twin image elimination

Once the accurate focusing distance $z^{\prime}$ is obtained, the compressive sensing method can be used to suppress the twin image and generate a high-quality image. Exploiting the difference in gradient sparsity between the twin image and the original image [19,36], total variation (TV) regularization is used in the reconstruction. The isotropic TV of a discrete digital image is defined as

$$\textrm{TV}(\mathbf{x} )= \sum\limits_{m,n} {\sqrt {{{({{\mathbf{x}_{m + 1,n}} - {\mathbf{x}_{m,n}}} )}^2} + {{({{\mathbf{x}_{m,n + 1}} - {\mathbf{x}_{m,n}}} )}^2}} }, $$
where m and n represent the pixel indices of the two-dimensional discrete image. To facilitate the calculation, the anisotropic total variation based on the l1 norm can be adopted:
$$\textrm{T}{\textrm{V}_{\textrm{aniso}}}(\mathbf{x} )= \sum\limits_{m,n} {[{|{{\mathbf{x}_{m + 1,n}} - {\mathbf{x}_{m,n}}} |+ |{{\mathbf{x}_{m,n + 1}} - {\mathbf{x}_{m,n}}} |} ]}, $$

The calculation of TV can then be represented by the difference operator D:

$$\textrm{T}{\textrm{V}_{\textrm{aniso}}}(\mathbf{x} )= {\|{\mathbf{Dx}} \|_1}. $$

D can be split into the difference operator Dh along the horizontal direction and Dv along the vertical direction:

$$\mathbf{Dx} = {[{{\mathbf{D}_h},{\mathbf{D}_\textrm{v}}} ]^T}\mathbf{x}. $$
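Before moving on, a minimal sketch of these difference operators and the anisotropic TV of Eqs. (15)-(17) is given below, implemented as forward differences under periodic boundary conditions (a boundary-handling assumption of this sketch; the paper does not specify it).

```python
import numpy as np

def diff_h(x):
    """Horizontal forward difference D_h x (periodic boundary assumed)."""
    return np.roll(x, -1, axis=1) - x

def diff_v(x):
    """Vertical forward difference D_v x (periodic boundary assumed)."""
    return np.roll(x, -1, axis=0) - x

def tv_aniso(x):
    """Anisotropic TV, Eq. (16): ||Dx||_1 = sum of |D_h x| + |D_v x|."""
    return np.abs(diff_h(x)).sum() + np.abs(diff_v(x)).sum()
```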

Twin-image-free reconstruction can be achieved by minimizing the objective function:

$$\hat{\mathbf{x}} = \; \arg \mathop {\min }\limits_\mathbf{x} \left\{ {\frac{1}{2}\|{\mathbf{D}({\mathbf{y} - {\mathbf{F}^{\mathbf{- 1}}}\mathbf{HFx}} )} \|_2^2 + \tau {{\|{\mathbf{Dx}} \|}_1}} \right\}, $$
where y is the recorded image; H is a diagonal matrix whose nonzero entries are the discrete values of H; F and F-1 represent the Fourier transform matrix and its inverse; ${\|{\cdot} \|_p}$ denotes the ${\ell _p}$ norm. The first term is the fidelity term and the second term is the regularization term. The difference operation D in the fidelity term avoids the influence of the constant bias on the reconstruction error. The regularization parameter τ controls the relative weight of the two terms.

To solve the optimization problem, ADMM is adopted to decompose the complicated problem into several subproblems that are easy to solve. First, Eq. (18) is rewritten as an equivalent constrained optimization problem:

$$\begin{array}{l} \hat{\mathbf{x}} = \arg \mathop {\min }\limits_\mathbf{x} \left\{ {\frac{1}{2}\|{\mathbf{c} - \mathbf{b}} \|_2^2 + \tau {{\|\mathbf{w} \|}_1}} \right\},\\ \textrm{s}\textrm{.t}\textrm{. }\mathbf{D}{\mathbf{F}^{\mathbf{- 1}}}\mathbf{HF}\mathbf{x} - \mathbf{c} = 0,\textrm{ }\mathbf{Dx} - \mathbf{w} = 0. \end{array}$$
where b = Dy; c and w are auxiliary variables that proxy DF-1HFx and Dx, respectively. The augmented Lagrangian function of this problem is defined as
$$\begin{array}{l} \mathrm{{\cal L}}({\mathbf{x},\mathbf{w},\mathbf{c},\mathbf{u},\mathbf{v}} )= \frac{1}{2}\|{\mathbf{c} - \mathbf{b}} \|_2^2 + \tau {\|\mathbf{w} \|_1} + \\ \frac{\mu }{2}\left\|{\mathbf{c} - \mathbf{D}{\mathbf{F}^{ - 1}}\mathbf{HFx} + \frac{1}{\mu }\mathbf{u}} \right\|_2^2 + \frac{\eta }{2}\left\|{\mathbf{w} - \mathbf{Dx} + \frac{1}{\eta }\mathbf{v}} \right\|_2^2. \end{array}$$
where u and v are Lagrangian multipliers, and ${\mu}$ and ${\eta}$ are the penalty parameters. Starting with w0, c0, u0 and v0, the ADMM solves the following three subproblems sequentially in each iteration:
$$\left\{ \begin{array}{l} {\;\mathbf{x}^{k + 1}} = \arg \mathop {\min }\limits_\mathbf{x} \mathrm{{\cal L}}({{\mathbf{x}^k},{\mathbf{w}^{k + 1}},{\mathbf{c}^{k + 1}},{\mathbf{u}^k},{\mathbf{v}^k}} )\\ {\mathbf{w}^{k + 1}} = \arg \mathop {\min }\limits_\mathbf{w} \mathrm{{\cal L}}({{\mathbf{x}^k},{\mathbf{w}^k},{\mathbf{c}^{k + 1}},{\mathbf{u}^k},{\mathbf{v}^k}} )\\ {\;\mathbf{c}^{k + 1}} = \arg \mathop {\min }\limits_\mathbf{c} \mathrm{{\cal L}}({{\mathbf{x}^k},{\mathbf{w}^k},{\mathbf{c}^k},{\mathbf{u}^k},{\mathbf{v}^k}} )\end{array} \right.. $$

Then we update the Lagrange multipliers:

$$\left\{ \begin{array}{l} {\mathbf{u}^{k + 1}} = {\mathbf{u}^k} + \beta \mu ({{\mathbf{c}^{k + 1}} - \mathbf{D}{\mathbf{F}^{ - 1}}\mathbf{HF}{\mathbf{x}^{k + 1}}} )\\ {\mathbf{v}^{k + 1}} = {\mathbf{v}^k} + \beta \eta ({{\mathbf{w}^{k + 1}} - \mathbf{D}{\mathbf{x}^{k + 1}}} )\end{array} \right., $$
where $\beta \in \left( {0,{{\left( {\sqrt 5 + 1} \right)} / 2}} \right)$ is an appropriately chosen step length, and this range of β guarantees convergence under some technical assumptions [37]. All the subproblems in Eq. (21) have closed-form solutions. The iteration stops when a convergence condition is met, such as reaching the maximum iteration number or the objective function falling below a set threshold. The steps are summarized in Algorithm 1.

Algorithm 1. ADMM iterations for twin-image-free FZA reconstruction (pseudocode figure).
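A compact sketch of these ADMM updates is given below. All operators (the FZA forward model and the finite differences) are applied as Fourier-domain multipliers with periodic boundaries, so each subproblem has a closed-form solution; the real part of the transfer function is used so that, for real images, applying it equals taking Re[F^-1{H F x}] as in Eq. (3). The parameter values, boundary handling and update order are assumptions of this sketch, not necessarily the authors' settings (see Ref. [40]).

```python
import numpy as np

def soft_threshold(x, t):
    """Elementwise soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def admm_fza(meas, pixel_size, r1_eff, tau=2e-3, mu=1.0, eta=1.0, beta=1.0, n_iter=50):
    """TV-regularized twin-image suppression, Eqs. (18)-(22), solved with ADMM."""
    ny, nx = meas.shape
    F, Fi = np.fft.fft2, np.fft.ifft2
    # Real part of the FZA transfer function (reuses the earlier hypothetical helper).
    Hs = np.real(fza_transfer_function(meas.shape, pixel_size, r1_eff))
    # Fourier multipliers of the forward-difference operators D_h and D_v.
    Dh = np.exp(2j * np.pi * np.fft.fftfreq(nx))[None, :] - 1.0
    Dv = np.exp(2j * np.pi * np.fft.fftfreq(ny))[:, None] - 1.0
    A = lambda x: np.real(Fi(Hs * F(x)))                            # F^-1 H F x (real part)
    D = lambda x: (np.real(Fi(Dh * F(x))), np.real(Fi(Dv * F(x))))  # (D_h x, D_v x)

    y = meas - meas.mean()                                          # remove the constant bias C
    bh, bv = D(y)                                                   # b = D y
    x = np.zeros_like(y)
    uh, uv, vh, vv = (np.zeros_like(y) for _ in range(4))

    denom = (mu * Hs**2 + eta) * (np.abs(Dh)**2 + np.abs(Dv)**2) + 1e-12

    for _ in range(n_iter):
        dah, dav = D(A(x))                                          # D F^-1 H F x
        dxh, dxv = D(x)                                             # D x
        # c-update: minimize (1/2)||c - b||^2 + (mu/2)||c - DAx + u/mu||^2
        ch = (bh + mu * dah - uh) / (1.0 + mu)
        cv = (bv + mu * dav - uv) / (1.0 + mu)
        # w-update: soft-thresholding, the prox of tau*||w||_1
        wh = soft_threshold(dxh - vh / eta, tau / eta)
        wv = soft_threshold(dxv - vv / eta, tau / eta)
        # x-update: quadratic subproblem solved exactly in the Fourier domain
        rhs = (mu * Hs * (np.conj(Dh) * F(ch + uh / mu) + np.conj(Dv) * F(cv + uv / mu))
               + eta * (np.conj(Dh) * F(wh + vh / eta) + np.conj(Dv) * F(wv + vv / eta)))
        x = np.real(Fi(rhs / denom))
        # multiplier updates, Eq. (22)
        dah, dav = D(A(x))
        dxh, dxv = D(x)
        uh, uv = uh + beta * mu * (ch - dah), uv + beta * mu * (cv - dav)
        vh, vv = vh + beta * eta * (wh - dxh), vv + beta * eta * (wv - dxv)
    return x
```

With the focusing distance z' estimated as in Section 2.2, r1_eff = (1 + d/z') r1 is passed in, and the output is the twin-image-suppressed reconstruction.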

3. Simulation performance analysis

In this section, three focusing distances of 150 mm, 250 mm and 350 mm are chosen to evaluate the performance of the autofocusing algorithms. To compare the accuracy of each sharpness metric, two groups of 30 images, one group of natural images and the other of QR code images, are processed. The errors and the mean relative errors of each metric are analyzed, and a comprehensive evaluation index based on the sharpness metrics is provided.

3.1 Image autofocusing with sharpness valuation

We make a linear division of z and then work out the magnified $r^{\prime}$ used in H for each case. The step length of the Z-space is 5 mm, and the range of object distances is from 100 mm to 400 mm. The simulation parameters are as follows: the FZA parameter r1 is 0.3 mm, the distance d between the FZA mask and the image sensor is 2.5 mm, and the pixel size is 0.01 mm. The imaging model is used to generate images that simulate those captured by the FZA camera. After obtaining the measurement images, the BP method is used to reconstruct the original images, as shown in Fig. 2(a). The BP reconstructed images are then evaluated with the sharpness metrics. Because different metrics have different ranges, their values are normalized for clear comparison. The resulting sharpness evaluation curves are shown in Fig. 2(b), where six metrics are adopted. Most of the curves have a peak around the object distance, and the maximum value corresponds to the expected focusing distance. It can be seen that the closer the object is, the narrower the peak. The reason is that, at a shorter object distance, the image is larger and contains more details. Moreover, the change rate of ${r^{\prime}_1}$ of the FZA mask shadow increases with decreasing distance, which means the sharpness metrics are more sensitive at short object distances. According to the results, metrics such as LAP, VAR, TOG and NoG are competent for autofocusing. It is also noted that the focusing distances obtained from the peak values of the GRA and SMD curves have large errors, so GRA and SMD cannot generally work correctly in the FZA imaging case.


Fig. 2. Image autofocusing with sharpness evaluation. (a) The original, measured, and BP reconstructed images at object distances of 150 mm, 250 mm and 350 mm. The object images are the ground truth under the pinhole camera model. The measurements are the images captured by the FZA camera. The BP method can reconstruct the original image but with twin-image noise. (b) The sharpness evaluation curves of different metrics. The correct focusing distances are indicated by the red vertical dotted lines. Zooming in around the object distance shows that the peaks of the LAP, VAR, TOG and NoG curves are close to the correct focusing distance.


3.2 Metrics comparison

The LAP, VAR, TOG and NoG metrics, which have effective autofocusing capability, are selected for further comparison. Two groups of images are used for the test. One is the CSIQ image quality database, which has 30 natural images [38]; the other group includes 30 QR code images. The simulation parameters used in this part are the same as in Section 3.1. The range of object distances is between 150 mm and 350 mm, and the distance interval for evaluation is 5 mm. The autofocusing distances corresponding to each metric are calculated for every image at each distance separately. The estimated distance error and mean relative error are shown in Fig. 3. The label on the left is the estimated distance error, whose vertical axis ranges from -300 mm to 200 mm. The label on the right is the mean relative error, whose vertical axis ranges from 0 to 0.4 for natural images and from 0 to 2 for QR code images. The scatter plot presents the off-center state of each distance, while the relative error bar reflects the average degree of deviation; the joint observation of the two visualizations provides a comprehensive quantitative evaluation. The larger the spread between the upper and lower limits of the scatter plot, the more unstable the metric, and the higher the corresponding bar. The scatter plots of LAP and VAR have a more dispersed distribution and a higher mean relative error. In contrast, the TOG and NoG metrics perform better for both groups: over the whole imaging distance they show a relatively concentrated scatter distribution and a low relative error, so their stability and accuracy are well guaranteed. In a horizontal comparison, the QR code image set has a more dispersed distribution and a higher mean relative error than the natural images. The reason is that a QR code image has a certain periodic arrangement similar to a grating; the image of the original object is reproduced at multiples of the Talbot distance, which affects the judgement of the focusing distance.


Fig. 3. The estimated distance error scatter plots and the mean relative error histograms of the metrics. The label on the left is the estimated distance error, whose vertical axis ranges from -300 mm to 200 mm. The label on the right is the mean relative error, whose vertical axis ranges from 0 to 0.4 for natural images and from 0 to 2 for QR code images. (a1)-(d1) For the CSIQ image quality database group, the mean relative errors are all below 0.1, which indicates that all metrics give a good autofocusing estimation. (a2)-(d2) For the QR code group, the scatters are more dispersed than for the natural images. The relative errors of the LAP and VAR metrics are very high, some above 0.5, which indicates they are not suitable for estimating the focusing distance of QR codes. The TOG and NoG metrics show better autofocusing performance.


The TOG and NoG metrics have a relative error of no more than 0.2, so they can be used as criterion metrics. The TOG metric mainly reflects the dispersion of image information: its calculation is equivalent to the coefficient of variation in statistics, so it captures the degree of dispersion of the data, i.e., the differences among the pixel values of the reconstructed image. The NoG metric sums the gradient over the entire image, extracting the edges and the rate-of-change information and paying more attention to the variation between adjacent points. At the same time, the NoG is the nuclear norm of the image gradient matrix, and its value reflects the amount of information contained in the matrix, the significance of the image edge information and the rate of change. Therefore, both metrics perform well in the autofocusing task. We incorporate both the pixel-dispersion and gradient-change information of the image into our evaluation index to improve the universality and robustness of the evaluation method.

Our analysis shows that the focus-distance estimates of TOG and NoG are strongly linearly correlated, so we use linear regression to calculate the weights of the two metrics and obtain a more comprehensive evaluation metric, W-T-N. We obtain the autofocusing distance values of 10080 object images of 480 QR codes distributed at different distances from 150 mm to 350 mm; each metric yields a corresponding vector of focus values. Least-squares linear regression then gives the weights of the two metrics for the W-T-N metric: the weight of TOG is 0.3752 and the weight of NoG is 0.6134.
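One possible sketch of this fitting step is shown below; it regresses the per-image distance estimates of the two metrics onto the ground-truth distances with np.linalg.lstsq and then combines the normalized metric curves with the resulting weights. The paper does not spell out whether the regression targets are distances or metric values, so this reading, together with the function names, is an assumption; only the reported weights 0.3752 and 0.6134 come from the authors.

```python
import numpy as np

def fit_wtn_weights(z_tog, z_nog, z_true):
    """Least-squares weights (w_tog, w_nog) so that w_tog*z_tog + w_nog*z_nog fits z_true."""
    A = np.column_stack([z_tog, z_nog])           # one row per test image
    weights, *_ = np.linalg.lstsq(A, z_true, rcond=None)
    return weights

def wtn_curve(tog_curve, nog_curve, w_tog=0.3752, w_nog=0.6134):
    """Weighted W-T-N evaluation curve from normalized TOG and NoG curves over the z scan."""
    normalize = lambda c: (c - c.min()) / (c.max() - c.min() + 1e-12)
    return w_tog * normalize(tog_curve) + w_nog * normalize(nog_curve)
```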

3.3 Image reconstruction and recognition rate test

After estimating the accurate focusing distance, the ADMM can be used to reconstruct the twin-image-free object image. Three focusing distances of 150 mm, 250 mm and 350 mm are chosen for reconstruction. The ground truth of the object image, the BP reconstruction and the 1st, 20th and 50th iterative results of the ADMM are shown in Fig. 4. With the focusing distance estimated by the W-T-N metric, a clear QR code image is preserved while the twin image is suppressed during reconstruction. Using the QR code scanning function of WeChat, 20 iterations are enough to recognize the QR code, and 50 iterations generate a high-quality reconstruction.


Fig. 4. The ADMM reconstruction of the QR code at focusing distances of 150 mm, 250 mm and 350 mm for different iteration numbers, together with the object image at each focusing distance. The 1st iteration preliminarily recovers the image but leaves much noise around it. The 20th iteration is enough to decode the QR code, but the contrast is not good enough. The 50th iteration produces a low-noise, high-contrast image.


To quantitatively evaluate the improvement of the recognition rate by the ADMM, we evaluate the recognition rate on 16 kinds of QR codes. Each kind contains 30 images, for a total of 480 images, and these object images are distributed at 21 distances from 150 mm to 350 mm. The decoder program comes from the open-source project ZXing [39]. The results are shown in Table 1. A total of 10080 images are processed in this simulation, and the recognition rates of the 16 kinds of QR codes with the BP reconstruction and the ADMM reconstruction are shown in Fig. 5. With the ADMM reconstruction, all the detected QR codes have a recognition rate of more than 50%, and most are close to 95%. With the BP reconstruction, most of the detected QR codes have a recognition rate of less than 10%, some even close to 0%. The overall recognition rate of all samples is 4.06% for BP and 90.00% for ADMM, which shows the robustness, stability and enhancement of our method. The results verify the accuracy of the distance estimation and the effectiveness of the denoising of the proposed method.
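Such a recognition-rate test can be reproduced with any QR decoder. The sketch below uses the pyzbar wrapper of the ZBar decoder as a convenient stand-in for the ZXing library [39] used by the authors; the substitution and the helper name are assumptions of this sketch.

```python
import numpy as np
from PIL import Image
from pyzbar.pyzbar import decode    # ZBar-based decoder, standing in for ZXing [39]

def recognition_rate(reconstructions, expected_texts):
    """Fraction of reconstructed QR images whose decoded payload matches the expected text."""
    hits = 0
    for img, expected in zip(reconstructions, expected_texts):
        # Scale the float reconstruction to an 8-bit grayscale image for the decoder.
        lo, hi = float(img.min()), float(img.max())
        u8 = np.clip(255.0 * (img - lo) / (hi - lo + 1e-12), 0, 255).astype(np.uint8)
        results = decode(Image.fromarray(u8))
        if any(r.data.decode("utf-8", errors="ignore") == expected for r in results):
            hits += 1
    return hits / max(len(expected_texts), 1)
```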


Fig. 5. The recognition rates of 16 kinds of QR codes. Each detected QR code has 630 sample images, comprising 30 samples at each of 21 distances distributed from 150 mm to 350 mm. With the ADMM reconstruction, all the detected QR codes have a recognition rate of more than 50%, and most are close to 95%. With the BP reconstruction, most of the detected QR codes have a recognition rate of less than 10%, some even close to 0%.



Table 1. The recognition rate of QR code decoding

4. Experiment

Because the NoG and TOG metrics are sensitive to background noise, which leads to an oscillating autofocusing curve, spatial filtering is utilized to eliminate the background noise and obtain a more accurate object distance. According to the previous analysis, the weight of TOG is set to 0.3752 and the weight of NoG to 0.6134 in the W-T-N metric used in the experiment.

The experimental setup is shown in Fig. 6. The screen used in the experiment is 24 inches (16:10), and the target displayed on the screen is a square QR code pattern with a side length of about 18 cm. The camera is a DFM 37UX178-ML from The Imaging Source, and the image sensor has a pixel size of 2.4 µm. The FZA parameter is 0.3 mm, and the distance between the mask and the image sensor is 2.5 mm.


Fig. 6. Experimental setup. (a) The devices used in the experiment: the FZA camera, display screen, guide rail, two supports and a reference plate. (b) The front of the FZA camera. (c) Detail of the FZA mask plate, whose innermost zone radius (the FZA parameter) is 0.3 mm.


The experimental setup includes a reference plate fixed on the guide rail. The distance between the reference plate and the display screen is 75.5 mm. The FZA camera is installed on the guide rail with a movable support so that the sliding displacement can be measured. In the experiment, the sliding displacement from the mask to the reference plate is 132 mm, so the distance z between the camera and the target is 210 mm, obtained by adding 2.5 mm, 75.5 mm and 132 mm. The QR code displayed on the screen as the object image is shown in Fig. 7(a). The measurement is obtained by capturing a single image with the assembled FZA camera, as shown in Fig. 7(b). A series of refocused images reconstructed by back propagation are shown in Fig. 7(c). After scanning the object distance, the sharpness evaluation can be obtained, and the autofocusing curve of the W-T-N metric is plotted in Fig. 7(d). The normalized W-T-N curve peaks at 212 mm, corresponding to a relative error of only 0.95%, which is very close to the ground truth. The ADMM algorithm is then used to reconstruct a noise-free, high-contrast image, which can be rapidly recognized by the QR code decoder.
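For orientation, the end-to-end processing of such a measurement could look like the sketch below, chaining the hypothetical helpers from the earlier sketches with the experimental parameters quoted above (pixel size 2.4 µm, r1 = 0.3 mm, d = 2.5 mm) and assuming meas holds the captured measurement as a float array; the W-T-N combination follows the reading described in Section 3.2.

```python
import numpy as np

# End-to-end sketch: autofocus the measured image, then run the ADMM reconstruction.
pixel_size, r1, d = 2.4e-6, 0.3e-3, 2.5e-3        # experimental parameters (meters)
z_grid = np.arange(150e-3, 350e-3 + 1e-6, 5e-3)   # scanned object distances

_, tog_curve = autofocus(meas, pixel_size, r1, d, z_grid, tog)
_, nog_curve = autofocus(meas, pixel_size, r1, d, z_grid, nog)
wtn = wtn_curve(tog_curve, nog_curve)             # weighted W-T-N evaluation curve
z_hat = z_grid[int(np.argmax(wtn))]               # estimated focusing distance

recon = admm_fza(meas, pixel_size, (1.0 + d / z_hat) * r1, n_iter=50)
```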


Fig. 7. The experimental process and results. (a) The HOLOLAB QR code used as the object image, displayed on the screen. (b) The measured image captured by the FZA camera. (c) A series of back propagation images. (d) The autofocusing curve of the W-T-N metric; the peak of the curve corresponds to a focusing distance of 212 mm. (e) The ADMM reconstruction, which clearly restores the original image.


5. Conclusion

In summary, we have proposed an autofocusing method for the FZA lensless imaging system to improve the performance of target recognition. Image sharpness metrics are investigated to obtain the focusing distance, and the ADMM algorithm is adopted to reconstruct a noise-free image. Various autofocusing algorithms are evaluated at three different focusing distances, and four metrics, namely LAP, VAR, TOG and NoG, are chosen for testing. The different sharpness evaluation metrics are then compared through the estimated distance error scatter plots and the mean relative error histograms, both of which show that TOG and NoG perform better. To obtain a stable metric, a weighted combination of TOG and NoG, called the W-T-N metric, is proposed for FZA lensless imaging. The simulation and experimental results indicate that the method can extract the distance accurately and effectively improve the recognition rate of QR codes. In future work, the focusing distance could be extracted directly from the measured image by a designed neural network, and a tailored evaluation metric, instead of the sharpness evaluation, could be explored for FZA lensless imaging. The proposed autofocusing method is expected to be applied in miniature image sensors with compact structure and low cost. Such sensors can be organized into a distributed sensor network to serve seamless payment, identification and authentication, mobile phones, autonomous cars, etc.

Funding

National Natural Science Foundation of China (62235009); National Postdoctoral Program for Innovative Talents (BX20220180).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are available in Ref. [40].

References

1. E. E. Fenimore and T. M. Cannon, “Coded aperture imaging with uniformly redundant arrays,” Appl. Opt. 17(3), 337–347 (1978). [CrossRef]  

2. S. R. Gottesman and E. E. Fenimore, “New family of binary arrays for coded aperture imaging,” Appl. Opt. 28(20), 4344–4352 (1989). [CrossRef]  

3. M. S. Asif, A. Ayremlou, A. Veeraraghavan, R. Baraniuk, and A. Sankaranarayanan, “FlatCam: replacing lenses with masks and computation,” in 2015 IEEE International Conference on Computer Vision Workshop (ICCVW), (2015), 663–666.

4. M. S. Asif, A. Ayremlou, A. Sankaranarayanan, A. Veeraraghavan, and R. G. Baraniuk, “FlatCam: thin, lensless cameras using coded aperture and computation,” IEEE Trans. Comput. Imaging 3(3), 384–397 (2017). [CrossRef]  

5. M. J. DeWeert and B. P. Farm, “Lensless coded-aperture imaging with separable Doubly-Toeplitz masks,” (SPIE, 2015).

6. K. Monakhova, J. Yurtsever, G. Kuo, N. Antipa, K. Yanny, and L. Waller, “Learned reconstructions for practical mask-based lensless imaging,” Opt. Express 27(20), 28075–28090 (2019). [CrossRef]  

7. R. Horisaki, Y. Okamoto, and J. Tanida, “Deeply coded aperture for lensless imaging,” Opt. Lett. 45(11), 3131–3134 (2020). [CrossRef]  

8. S. S. Khan, V. Sundar, V. Boominathan, A. Veeraraghavan, and K. Mitra, “FlatNet: towards photorealistic scene reconstruction from lensless measurements,” IEEE Transactions on Pattern Analysis and Machine Intelligence 1, 1 (2020). [CrossRef]  

9. H. Zhou, H. Feng, Z. Hu, Z. Xu, Q. Li, and Y. Chen, “Lensless cameras using a mask based on almost perfect sequence through deep learning,” Opt. Express 28(20), 30248–30262 (2020). [CrossRef]  

10. J. Wu, L. Cao, and G. Barbastathis, “DNN-FZA camera: a deep learning approach toward broadband FZA lensless imaging,” Opt. Lett. 46(1), 130–133 (2021). [CrossRef]  

11. X. Pan, X. Chen, S. Takeyama, and M. Yamaguchi, “Image reconstruction with transformer for mask-based lensless imaging,” Opt. Lett. 47(7), 1843–1846 (2022). [CrossRef]  

12. X. Pan, T. Nakamura, X. Chen, and M. Yamaguchi, “Lensless inference camera: incoherent object recognition through a thin mask with LBP map generation,” Opt. Express 29(7), 9758–9771 (2021). [CrossRef]  

13. Y. Zhang, Z. Wu, P. Lin, Y. Wu, L. Wei, Z. Huang, and J. Huangfu, “Text detection and recognition based on a lensless imaging system,” Appl. Opt. 61(14), 4177–4186 (2022). [CrossRef]  

14. Y. Zhang, Z. Wu, P. Lin, Y. Pan, Y. Wu, L. Zhang, and J. Huangfu, “Hand gestures recognition in videos taken with a lensless camera,” Opt. Express 30(22), 39520–39533 (2022). [CrossRef]  

15. M.-H. Wu, Y.-T. C. Lee, and C.-H. Tien, “Lensless facial recognition with encrypted optics and a neural network computation,” Appl. Opt. 61(26), 7595–7601 (2022). [CrossRef]  

16. K. Tajima, T. Shimano, Y. Nakamura, M. Sao, and T. Hoshizawa, “Lensless light-field imaging with multi-phased Fresnel zone aperture,” in 2017 IEEE International Conference on Computational Photography (ICCP), (IEEE, 2017), 1–7.

17. T. Shimano, Y. Nakamura, K. Tajima, M. Sao, and T. Hoshizawa, “Lensless light-field imaging with Fresnel zone aperture: quasi-coherent coding,” Appl. Opt. 57(11), 2841–2850 (2018). [CrossRef]  

18. T. Nakamura, T. Watanabe, S. Igarashi, X. Chen, K. Tajima, K. Yamaguchi, T. Shimano, and M. Yamaguchi, “Superresolved image reconstruction in FZA lensless camera by color-channel synthesis,” Opt. Express 28(26), 39137–39155 (2020). [CrossRef]  

19. J. Wu, H. Zhang, W. Zhang, G. Jin, L. Cao, and G. Barbastathis, “Single-shot lensless imaging with Fresnel zone aperture and incoherent illumination,” Light: Sci. Appl. 9(1), 53 (2020). [CrossRef]  

20. Y. Ma, J. Wu, S. Chen, and L. Cao, “Explicit-restriction convolutional framework for lensless imaging,” Opt. Express 30(9), 15266–15278 (2022). [CrossRef]  

21. J. R. Fienup, “Synthetic-aperture radar autofocus by maximizing sharpness,” Opt. Lett. 25(4), 221–223 (2000). [CrossRef]  

22. V. A. a. D. T. Pham, “Depth from automatic defocusing,” Opt. Express 15(3), 1011–1023 (2007). [CrossRef]  

23. A. Erteza, “Sharpness index and its application to focus control,” Appl. Opt. 15(4), 877–881 (1976). [CrossRef]  

24. P. Memmolo, C. Distante, M. Paturzo, A. Finizio, P. Ferraro, and B. Javidi, “Automatic focusing in digital holography and its application to stretched holograms,” Opt. Lett. 36(10), 1945–1947 (2011). [CrossRef]  

25. P. Langehanenberg, B. Kemper, D. Dirksen, and G. von Bally, “Autofocusing in digital holographic phase contrast microscopy on pure phase objects for live cell imaging,” Appl. Opt. 47(19), D176–D182 (2008). [CrossRef]  

26. M. Liebling and M. Unser, “Autofocus for digital Fresnel holograms by use of a Fresnelet-sparsity criterion,” J. Opt. Soc. Am. A 21(12), 2424–2430 (2004). [CrossRef]  

27. Z. Ren, Z. Xu, and E. Y. Lam, “Learning-based nonparametric autofocusing for digital holography,” Optica 5(4), 337–344 (2018). [CrossRef]  

28. Y. Wu, Y. Rivenson, Y. Zhang, Z. Wei, H. Günaydin, X. Lin, and A. Ozcan, “Extended depth-of-field in holographic imaging using deep-learning-based autofocusing and phase recovery,” Optica 5(6), 704–710 (2018). [CrossRef]  

29. N. Antipa, G. Kuo, R. Heckel, B. Mildenhall, E. Bostan, R. Ng, and L. Waller, “DiffuserCam: lensless single-exposure 3D imaging,” Optica 5(1), 1–9 (2018). [CrossRef]  

30. Y. Zheng, Y. Hua, A. C. Sankaranarayanan, and M. S. Asif, “A Simple Framework for 3D Lensless Imaging with Programmable Masks,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, (2021), 2603–2612.

31. F. Tian and W. Yang, “Learned lensless 3D camera,” Opt. Express 30(19), 34479–34496 (2022). [CrossRef]  

32. Y. Zheng and M. S. Asif, “Joint image and depth estimation with mask-based lensless cameras,” IEEE Trans. Comput. Imaging 6, 1167–1178 (2020). [CrossRef]  

33. S. Koho, E. Fazeli, J. E. Eriksson, and P. E. Hänninen, “Image quality ranking method for microscopy,” Sci. Rep. 6(1), 28962 (2016). [CrossRef]  

34. M. Dellepiane and R. Scopigno, “Global refinement of image-to-geometry registration for color projection,” in Digital Heritage International Congress, (2013), 39.

35. C. Guo, F. Zhang, X. Liu, Q. Li, S. Zheng, J. Tan, Z. Liu, and W. Wang, “Lensfree auto-focusing imaging using nuclear norm of gradient,” Opt. Lasers Eng. 156, 1 (2022). [CrossRef]  

36. W. Zhang, L. Cao, D. J. Brady, H. Zhang, J. Cang, H. Zhang, and G. Jin, “Twin-image-free holography: A compressive sensing approach,” Phys. Rev. Lett. 121(9), 093902 (2018). [CrossRef]  

37. R. Glowinski, Numerical Methods for Nonlinear Variational Problems (Springer, 1984).

38. E. C. Larson and D. M. Chandler, “Most apparent distortion: full-reference image quality assessment and the role of strategy,” J. Electron. Imaging 19(1), 011006 (2010). [CrossRef]  

39. ZXing, “Multi-format 1D/2D barcode image processing library with clients for Android, Java,” (2012), http://code.google.com/p/zxing/.

40. THUHoloLab, “FZA autofocusing imaging,” (2023), https://github.com/THUHoloLab/FZA_autofocusing_imaging.
