Optica Publishing Group

Optimization of light fields in ghost imaging using dictionary learning

Open Access

Abstract

Ghost imaging (GI) is a novel imaging technique based on the second-order correlation of light fields. Due to the limited number of samplings in practice, traditional GI methods often reconstruct objects with unsatisfactory quality. To improve the imaging results, many reconstruction methods have been developed, yet the reconstruction quality is still fundamentally restricted by the modulated light fields. In this paper, we propose to improve the imaging quality of GI by optimizing the light fields, which is realized via matrix optimization for a learned dictionary incorporating the sparsity prior of objects. A closed-form solution of the sampling matrix, which enables successive sampling, is derived. Through simulation and experimental results, it is shown that the proposed scheme leads to better imaging quality than the state-of-the-art optimization methods for light fields, especially at a low sampling rate.

© 2019 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

As a novel technique for optical imaging, ghost imaging (GI) was initially implemented with quantum-entangled photons two decades ago [1,2]. In recent years, owing to its realization with thermal light and other new sources [3–8], GI has gained new attention and developed applications in various imaging areas, such as remote sensing [9,10], imaging through scattering media [11,12], spectral imaging [13,14], photon-limited imaging [15,16] and X-ray imaging [17,18].

Different from conventional imaging techniques that are based on the first-order correlation of light fields, GI extracts information of an object by calculating the second-order correlation between the light fields of the reference and the object arms [4,5]. Theoretically, calculation of the second-order correlation requires an infinite number of samplings of the light fields at both arms. In practice, however, the number of samplings is always finite, which often leads to reconstructed images with degraded signal-to-noise ratio (SNR) [19–21]. To address this issue, much effort has been made in designing more effective reconstruction methods for GI. On the one hand, approaches improving the second-order correlation have been proposed, which can increase the SNR of reconstructed images with theoretical guarantees [21–24]. On the other hand, by exploiting sparsity of the objects’ images in transform bases (e.g., wavelets [25]), methods built upon the compressed sensing (CS) theory [26–28] have also been developed [29,30]. In general, the CS-based methods outperform those relying on the second-order correlation, especially for imaging smooth objects [31].

While improving the reconstruction methods has greatly promoted the practical applications of GI, there has been increasing evidence that the reconstruction quality of GI may be fundamentally restricted by the sampling efficiency [32,33], i.e., how well information of objects is acquired in the samplings. To enable a satisfactory reconstruction from a limited number of samplings in GI, a natural way is to enhance the sampling efficiency. In fact, this can be realized by optimizing the light fields of GI; see [32–35] and the references therein. Considering an orthogonal sparsifying basis, Xu et al. [35] optimized the sampling matrix so that its product with the basis, the so-called equivalent sampling matrix, has minimum mutual coherence, which results in a marked refinement of the imaging quality. Though orthogonal bases are widely applicable for sparse representation of natural images, for images from a specific category, dictionary learning [36,37] usually produces much sparser representation coefficients, suggesting room for further improvement of the reconstruction quality. Motivated by this, in this paper we propose to optimize the light fields of GI for a sparsifying basis obtained via dictionary learning. By minimizing the mutual coherence of the equivalent sampling matrix, the proposed scheme enhances the sampling efficiency and thus achieves an improved reconstruction quality. In comparison with the state-of-the-art optimization methods for light fields in GI, the superiority of our scheme is confirmed via both simulations and experiments. The main advantages of the proposed scheme are summarized as follows:

  • Inspired by previous research in CS [38,39], we formulate the problem of minimizing the mutual coherence of the equivalent sampling matrix in GI as a Frobenius-norm minimization problem, which yields a closed-form solution that depends on the sparsifying basis only. To the best of our knowledge, the suggested solution of the light fields is the first closed-form result in the GI optimization field.
  • The proposed scheme enables successive sampling. In GI, successive sampling means that when more samplings are available (or needed), one can simply append new rows to the currently optimized sampling matrix to form a new one, without the need to perform additional optimization over the entire matrix. Such a feature brings great convenience to the practical applications of GI and was not addressed in previous works.
It is worth mentioning that matrix optimization based on dictionary learning has also been studied in the CS literature; see, e.g., [38–40]. However, the optimizations in [38,39] were carried out over sampling matrices of fixed sizes, which does not allow successive sampling. Although Duarte’s method [40] dealt with the matrix optimization problem for alterable sampling sizes, it is not compatible with GI either, because of its demanding quantization-accuracy requirements. Moreover, those methods all fail to cope with the non-negative nature of sampling matrices in GI.

2. The proposed scheme

The detection process in GI can be approximately formulated as [30]

$${\mathbf{y}} = {\mathbf{\Phi x}} + {\mathbf{n}},$$
where ${\mathbf {y}} \in {\mathcal {R}}^M$ stands for the signal measured by the detector in the object arm, ${\mathbf {\Phi }} \in {\mathcal {R}}^{M \times N}$ is the sampling matrix consisting of the light-field intensity distribution recorded by the detector in the reference arm, ${\mathbf {x}} \in {\mathcal {R}}^N$ signifies the object’s information to be retrieved, and ${\mathbf {n}}$ denotes the detection noise. Let $\ {{\mathbf {\Psi }} \in \mathcal {R}^{N \times K}}$ be the sparsifying basis obtained via dictionary learning, in which ${\mathbf {x}}$ can be sparsely represented as $\mathbf {x} = \mathbf {\Psi z}$, where $\ {\mathbf {z} \in \mathcal {R}^{K}}$ is the sparse coefficient vector. Also, consider the equivalent sampling matrix ${\mathbf {D}} := {\mathbf {\Phi \Psi }}$. Then, (1) can be rewritten as
$${\mathbf{y}} = {\mathbf{D z}} + {\mathbf{n}}.$$
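As a concrete illustration, the detection model of Eq. (1) can be sketched in a few lines of NumPy; the matrix sizes and the random light fields below are placeholders for illustration only, not the optimized patterns derived later:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 300, 28 * 28                    # number of samplings, number of image pixels

# Non-negative light-field intensities recorded in the reference arm (placeholder)
Phi = rng.random((M, N))

x = rng.random(N)                      # flattened object information
n = 0.01 * rng.standard_normal(M)      # detection noise
y = Phi @ x + n                        # bucket-detector signals, y = Phi x + n
```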
Evidence from the CS theory has revealed that a matrix $\mathbf {D}$ that well preserves information of the sparse vector $\mathbf {z}$ guarantees a faithful reconstruction [26–28]. As a powerful measure of information preservation, the mutual coherence $\mu (\mathbf {D})$ characterizes how incoherent each pair of columns in $\mathbf {D}$ is [26,41], namely,
$$\mu \left( {\mathbf{D}} \right) = \mathop {\max} _{1 \leq i < j \leq K} \frac{\left| \left\langle \mathbf{d}_i, \mathbf{d}_j \right\rangle \right|} {\left\| \mathbf{d}_i \right\|_2 \left\| \mathbf{d}_j \right\|_2}$$
with ${\mathbf {d}}_i$ being the $i$-th column of ${\mathbf {D}}$. For its simplicity and ease of computation, the mutual coherence $\mu \left ( {\mathbf {D}} \right )$ has been widely used to describe the performance guarantees of CS reconstruction algorithms. For example, exact recovery of sparse signals via orthogonal matching pursuit (OMP) [42] is ensured by $\mu \left ( {\mathbf {D}} \right ) < \frac {1}{{2k - 1}}$ [43], where $k$ is the sparsity level of input signals. In this work, with the aim of enhancing the sampling efficiency, we employ $\mu (\mathbf {D})$ as the objective function to be minimized in our optimization scheme. In particular, our proposed scheme consists of the following two main steps:
  • Firstly, an over-complete dictionary $\mathbf {\Psi }$ is learned from a collection of images, under the constraint that its first column has identical entries $N^{-1/2}$, while each of the other columns has entries summing to zero. Specifically, given ${\mathbf {X}} = [ {\mathbf {x}}^{(1)},{\mathbf {x}}^{(2)}, \ldots ,{\mathbf {x}}^{(\ {L})} ] \in {\mathcal {R}^{N \times L}}$, in which each column is the reshaped vector of a training image, the sparsifying dictionary ${\mathbf \Psi } \in \mathcal {R}^{N \times K}$ is learned by solving the following problem:
    $$\begin{array}{cl}\min_{\mathbf \Psi, Z} & \left\| {{\mathbf{X}} - {\mathbf{\Psi Z}}} \right\|_F^2 \\ \textrm{subject to} &{\mathbf{\Psi}}_{11} = \cdots = {\mathbf{\Psi}}_{N1} = N^{{-}1/2}, \\ & \|{\mathbf{z}}_i\|_0 \le T_0,~i = 1, \ldots, L, \end{array}$$
    where $\|\cdot \|_F$ and $\|\cdot \|_0$ are the Frobenius- and $\ell _0$-norm, respectively, ${\mathbf {Z}} = [ {\mathbf {z}}_1,{\mathbf {z}}_2, \ldots ,{\mathbf {z}}_L ] \in \mathcal {R}^{K \times L}$ represents the sparse coefficient matrix of the training images, and $T_0$ denotes the predetermined sparsity level of the vectors ${\mathbf {z}}_i$. In this work, we employ $K$-SVD as a representative method to perform the dictionary learning task, which results in simultaneous sparse representation of the input images in the learned dictionary $\mathbf {\Psi }$. Readers are referred to [37] for more details of the $K$-SVD method.
  • Secondly, the sampling matrix $\mathbf {\Phi }$ is optimized by minimizing the mutual coherence of the equivalent sampling matrix $\mathbf {D}$. Put formally,
    $$\mathop {\min }_{\mathbf{\Phi }} \mu \left( \mathbf{D} \right) ~~\textrm{subject to}~~{{\mathbf{\Phi }}_{ij}} \ge 0~\textrm{and}~{\mathbf{D}} = \mathbf{\Phi\Psi}.$$
    The non-negative constraint $\mathbf {\Phi }_{ij} \geq 0$ is imposed due to the fact that the intensity of light fields is always non-negative.
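To make the objective in (5) concrete, the mutual coherence of Eq. (3) can be computed in a few lines of NumPy; this is an illustrative sketch, and the test matrix below is arbitrary:

```python
import numpy as np

def mutual_coherence(D):
    """Maximum absolute normalized inner product over distinct column pairs, Eq. (3)."""
    Dn = D / np.linalg.norm(D, axis=0, keepdims=True)   # l2-normalize each column
    G = np.abs(Dn.T @ Dn)                               # |<d_i, d_j>| for all pairs
    np.fill_diagonal(G, 0.0)                            # exclude the diagonal (i = j)
    return G.max()

# An orthonormal basis has zero mutual coherence
print(mutual_coherence(np.eye(4)))   # 0.0
```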
We now proceed to solve the optimization problem in (5). Without loss of generality, assume that matrix $\mathbf {D}$ has $\ell _2$-normalized columns, that is, $\|\mathbf {d}_i\|_2 = 1$ for $i = 1, \ldots , K$. Then,
$$\mu(\mathbf{D}) = \max_{1 \leq i < j \leq K} {| \langle \mathbf{d}_i, \mathbf{d}_j \rangle |}.$$
To minimize $\mu (\mathbf {D})$, it suffices to suppress the off-diagonal entries of the Gram matrix $\mathbf {D}^{\top } \mathbf {D}$, each of which measures the coherence between two different columns of $\mathbf {D}$ (i.e., $(\mathbf {D}^{\top } \mathbf {D})_{ij} = \langle \mathbf {d}_i, \mathbf {d}_j \rangle$ for $i \neq j$). In particular, we would like the Gram matrix to be as close to the identity matrix as possible, namely, $\mathbf {\Psi }^{\top } \mathbf {\Phi }^{\top } \mathbf {\Phi \Psi } \approx \mathbf {I}.$ Since replacing the identity matrix with ${\mathbf {\Psi }}^{\top }{\mathbf {\Psi }}$ yields a sampling matrix robust to the sparse representation error of images [44], we propose to optimize $\mathbf {\Phi }$ via
$$\mathop {\min }_{\mathbf{\Phi }} {\bigg \|} {{{\mathbf{\Psi }}^{\top}}{{\mathbf{\Phi }}^{\top}}{\mathbf{\Phi \Psi }} - {{\mathbf{\Psi }}^{\top}}{\mathbf{\Psi }}} {\bigg\|}_F^2.$$
Multiplying both terms inside the Frobenius norm by $\mathbf {\Psi }$ on the left and by ${\mathbf {\Psi }}^{\top }$ on the right, one has
$$\mathop {\min }_{\mathbf{\Phi }} {\bigg \|} {{\mathbf{\Psi }} {{\mathbf{\Psi }}^{\top}} {{\mathbf{\Phi }}^{\top}}{\mathbf{\Phi \Psi }} {{\mathbf{\Psi }}^{\top}} - {\mathbf{\Psi }} {{\mathbf{\Psi }}^{\top}}{\mathbf{\Psi }} {{\mathbf{\Psi }}^{\top}}} {\bigg \|}_F^2.$$
After substituting ${\mathbf {\Psi }}{{\mathbf {\Psi }}^{\top }}$ with its eigenvalue decomposition ${\mathbf {V}}{\mathbf {\Lambda }}{{\mathbf {V}}^{\top }}$, and also denoting ${\mathbf {W}} := {\mathbf {\Lambda }}{{\mathbf {V}}^{\top }}{{\mathbf {\Phi }}^{\top }}$, (8) can be rewritten as
$$\min_{\mathbf{W}} {\bigg \|} {{\mathbf{VW}}{{\mathbf{W}}^{\top}}{{\mathbf{V}}^{\top}} - {\mathbf{V}}{\mathbf \Lambda}^2 {{\mathbf{V}}^{\top}}} {\bigg \|}_F^2,$$
or equivalently,
$$\mathop {\textrm{min}}_{\mathbf{W}} \Bigg\| {{{\mathbf{\Lambda }}^2} - \sum_{i = 1}^M {{{\mathbf{w}}_i}{\mathbf{w}}_i^{\top}} } \Bigg\|_F^2~\textrm{where}~{\mathbf{W}} = \left[ {{{\mathbf{w}}_1}, \ldots ,{{\mathbf{w}}_M}} \right].$$
Denoting ${\mathbf {\Lambda }} = \left [ {{{\mathbf {r}}_1}, \ldots ,{{\mathbf {r}}_N}} \right ]$, (10) further becomes
$$\mathop {\textrm{min}}_{\mathbf{W}} \Bigg\| {\sum_{j = 1}^N {{{\mathbf{r}}_j}{\mathbf{r}}_j^{\top}} - \sum_{i = 1}^M {{{\mathbf{w}}_i}{\mathbf{w}}_i^{\top}} } \Bigg\|_F^2.$$
Clearly, problem (11) has the solution $\widehat {{\mathbf {W}}} = {{\mathbf {\Lambda }}_1},$ where ${\mathbf {\Lambda }}_1$ is the matrix consisting of the first $M$ columns of $\mathbf {\Lambda }$, obtained by setting ${{\mathbf {w}}_k} = {{\mathbf {r}}_k}$, $k = 1, \ldots , M$. Recalling that ${\mathbf {W}} := {\mathbf {\Lambda }}{{\mathbf {V}}^{\top }}{{\mathbf {\Phi }}^{\top }}$, the optimized sampling matrix $\mathbf {\Phi }$ can be simply calculated as
$${\widehat {\mathbf{\Phi }} = {\widehat {\mathbf{W}}}^{\top}{\left( {{{\mathbf{\Lambda }}^{ - 1}}} \right)^{\top}}{{\mathbf{V}}^{\top}} = {\mathbf{\Lambda }}_1^{\top}{\left( {{{\mathbf{\Lambda }}^{ - 1}}} \right)^{\top}}{{\mathbf{V}}^{\top}} = \left[ {\begin{array}{cc} {{{\mathbf{I}}_{M \times M}}} & {\mathbf{0}} \end{array}} \right]\left[ {\begin{array}{c} {{\mathbf{V}}_1^{\top}}\\ {{\mathbf{V}}_2^{\top}} \end{array}} \right] = {\mathbf{V}}_1^{\top}},$$
where matrix ${\mathbf {V}}_1$ consists of the first $M$ columns of $\mathbf {V}$. When more samplings become available, it suffices to update $\widehat {\mathbf {\Phi }}$ by appending further rows of $\mathbf {V}^{\top }$ to the previous matrix, thereby enabling successive sampling. As aforementioned, this feature of successive sampling is of vital importance to the practical applications of GI.
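A sketch of the closed-form solution in (12) using NumPy's eigendecomposition; here $\mathbf{\Psi}$ is a random stand-in for a learned dictionary, and since `numpy.linalg.eigh` returns eigenvalues in ascending order, the eigenpairs are reordered so that the first $M$ columns of $\mathbf{V}$ keep the dominant part of $\mathbf{\Lambda}^2$:

```python
import numpy as np

rng = np.random.default_rng(1)
N, K, M = 64, 128, 16

Psi = rng.standard_normal((N, K))       # stand-in for the learned dictionary

# Psi Psi^T = V Lambda V^T; order eigenpairs by decreasing eigenvalue
evals, V = np.linalg.eigh(Psi @ Psi.T)
order = np.argsort(evals)[::-1]
V = V[:, order]

Phi_hat = V[:, :M].T                    # closed-form optimum, Eq. (12)

# Successive sampling: a larger matrix simply appends further rows of V^T
Phi_more = V[:, :M + 8].T
assert np.allclose(Phi_more[:M], Phi_hat)
```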

Furthermore, since the intensity of light fields is always non-negative, additional treatment is needed to ensure that the elements of the sampling matrix are non-negative (NN). To this end, we propose an NN lifting, which adds a constant matrix to the optimized matrix $\widehat {\mathbf {\Phi }}$ in (12) as

$$\widehat{\mathbf{D}} = {\bigg(} {\widehat{\mathbf{\Phi }} + c{\mathbf{1}}_{M \times N}} {\bigg)}{\mathbf{\Psi }},$$
where $\mathbf {1}_{M \times N}$ is an $M$-by-$N$ matrix with entries being ones and
$$c : = \begin{cases} - \min_{i,j } \widehat{\mathbf{\Phi}}_{ij} & \textrm{if}~\min_{i,j } \widehat{\mathbf{\Phi}}_{ij} < 0, \\ 0 & \textrm{if}~\min_{i,j } \widehat{\mathbf{\Phi}}_{ij} \geq 0. \end{cases}$$
As aforementioned, the first column of $\mathbf {\Psi }$ has identical entries and other columns have entries summing to zero. Thus,
$$\widehat{\mathbf{D}} =\widehat{\mathbf{\Phi }} {\mathbf{\Psi}} + c N^{{-}1/2} {\bigg[} {{\mathbf{1}}_{M \times 1},\underbrace{{\mathbf{0}}, \ldots ,{\mathbf{0}}}_{M \times (K - 1)}} {\bigg]}.$$
It can be noticed that after the NN lifting, $\widehat {\mathbf {D}}$ and $\widehat {\mathbf {\Phi }} {\mathbf {\Psi }}$ differ only in the first column. Nevertheless, the mutual coherence $\mu (\widehat {\mathbf {D}})$ is not much affected by the NN lifting, as confirmed by our extensive empirical tests.
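The NN lifting of Eqs. (13)–(15) can be sketched as follows; the dictionary here is a random matrix forced to satisfy the column constraints from the dictionary-learning step (first column constant, the rest zero-sum):

```python
import numpy as np

def nn_lift(Phi):
    """Add the constant c of Eq. (14) so that all entries become non-negative."""
    c = max(0.0, -Phi.min())
    return Phi + c

rng = np.random.default_rng(2)
M, N, K = 16, 64, 96
Phi_hat = rng.standard_normal((M, N))

Psi = rng.standard_normal((N, K))
Psi[:, 0] = N ** -0.5                     # first column: identical entries
Psi[:, 1:] -= Psi[:, 1:].mean(axis=0)     # remaining columns sum to zero

D_lift = nn_lift(Phi_hat) @ Psi
assert nn_lift(Phi_hat).min() >= 0
# As in Eq. (15), the lifting only alters the first column of the product
assert np.allclose(D_lift[:, 1:], (Phi_hat @ Psi)[:, 1:])
```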

3. Results

3.1 Simulations

To evaluate the effectiveness of the proposed optimization scheme, both simulations and experiments are performed. MNIST handwritten digits of size $28 \times 28$ pixels [45] are chosen to be the imaging objects, and the dictionary $\mathbf {\Psi }$ is learned based on 20,000 digits randomly selected from the training set.

Moreover, the optimized sampling matrix $\widehat {\mathbf {\Phi }}$ is obtained from (12), followed by the NN lifting. A subset of atoms in the learned dictionary $\mathbf {\Psi }$ and the optimized light-field intensity distributions $\widehat {\mathbf {\Phi }}$ are shown in Fig. 1(a) and Fig. 1(b), respectively. For comparative purposes, our simulation includes four other methods: 1) the Gaussian method, 2) Duarte’s method [40], 3) Xu’s method [35] and 4) the normalized GI (NGI) method [24]. Table 1 gives a brief summary of the methods under test. In the Gaussian method, the sampling matrices are random Gaussian matrices, whose entries are drawn independently from the standard Gaussian distribution ($\mathbf {\Phi }_{ij} \sim \mathcal {N}(0,1)$). To meet the NN constraint $\mathbf {\Phi }_{ij} \geq 0$ of GI, the matrices $\mathbf {\Phi }$ generated by the Gaussian and Duarte’s methods are also subjected to the NN lifting. For the Gaussian, Duarte’s and our proposed methods, the images are retrieved in two steps. Firstly, the sparse coefficient vector $\hat {\mathbf {z}}$ of the image under the learned dictionary $\mathbf {\Psi }$ is obtained by solving the $\ell _0$-minimization problem:

$$\hat{\mathbf{z}} = \underset{\mathbf{z}}{\arg \min} \hspace{0.5mm} \|\widehat{\mathbf{D}} \mathbf{z} - \mathbf{y} \|_2^2~~~\textrm{subject~to}~~\|\mathbf{z}\|_0 \leq T_0$$
via the OMP algorithm [42,43,46]. Secondly, the object’s image is reconstructed as
$$\widehat{\mathbf{x}}={\mathbf{\Psi}}{\widehat{{\mathbf{z}}}}.$$
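The two-step retrieval above can be sketched with a compact, self-contained greedy OMP for the $\ell_0$ problem (16), followed by Eq. (17); this is an illustrative implementation, not the exact solver used in the experiments:

```python
import numpy as np

def omp(D, y, T0):
    """Greedy orthogonal matching pursuit for Eq. (16)."""
    K = D.shape[1]
    support, resid = [], y.astype(float)
    for _ in range(T0):
        j = int(np.argmax(np.abs(D.T @ resid)))      # atom most correlated with residual
        support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        resid = y - D[:, support] @ coef             # project y onto the chosen atoms
    z = np.zeros(K)
    z[support] = coef
    return z

rng = np.random.default_rng(3)
D = rng.standard_normal((64, 128))
z_true = np.zeros(128)
z_true[[5, 40, 77]] = [1.0, -2.0, 0.5]
y = D @ z_true                                       # noiseless samplings

z_hat = omp(D, y, T0=3)                              # step 1: sparse coefficients
```

The object is then retrieved as $\widehat{\mathbf{x}} = \mathbf{\Psi}\widehat{\mathbf{z}}$ (step 2), exactly as in Eq. (17).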
For Xu’s method, the discrete cosine transform (DCT) basis is chosen as the orthogonal basis, and the images in methods 3) and 4) are reconstructed via the approaches proposed in their corresponding references.

Fig. 1. (Left) A subset of atoms in the learned dictionary $\mathbf {\Psi }$; (right) a subset of the optimized light-field intensity distributions.

Table 1. A summary of test methods

In our simulation, we first adopt matrices with double-precision entries in MATLAB. Figure 2 shows the simulation results of the different methods. In Fig. 2, the reconstructed images at different sampling ratios (SR’s) (i.e., SR $= 0.15$, $0.38$, and $0.64$) are displayed, where the SR is computed by dividing the number of samplings by the number of image pixels. Figures 2(b) and 2(c) depict the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM) index of the reconstructed images as functions of the SR, respectively. Given the reference image $X$ and the reconstructed image $Y$, the PSNR and SSIM are defined as follows,

$$\begin{aligned}\textrm{MSE}\left( {X,Y} \right) = \frac{1}{{mn}}\sum_{i=1}^m\sum_{j=1}^n {{{\left[ {X\left( {i,j} \right) - Y\left( {i,j} \right)} \right]}^2}}, \end{aligned}$$
$$\begin{aligned}\textrm{PSNR}\left( {X,Y} \right) = 10 \hspace{0.4mm} {\log _{10}}\left[ {\frac{{{{B}^2}}}{{\textrm{MSE}\left( {X,Y} \right)}}} \right], \end{aligned}$$
$$\begin{aligned}\textrm{SSIM}\left( {X,Y} \right) = \frac{{\left( {2{\mu _X}{\mu _Y} + {c_1}} \right)\left( {2{\sigma _{XY}} + {c_2}} \right)}}{{\left( {\mu _X^2 + \mu _Y^2 + {c_1}} \right)\left( {\sigma _X^2 + \sigma _Y^2 + {c_2}} \right)}}, \end{aligned}$$
where the pixel size of the image is $m \times n$, $B$ denotes the dynamic range of image pixels, which takes the value $255$ in this paper, $(\mu _X, \sigma _X)$ and $(\mu _Y, \sigma _Y)$ are the means and standard deviations of $X$ and $Y$, respectively, $\sigma _{XY}$ is the covariance of $X$ and $Y$, $c_1 = (0.01B)^2$ and $c_2 = (0.03B)^2$. These two metrics measure the difference and the similarity between the reconstructed images and the original ones, respectively. For each SR under test, the PSNR and SSIM are averaged over 500 reconstructed digit images to plot the curves, which allows an empirical comparison of the reconstruction quality of the different methods.
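The two metrics of Eqs. (18)–(20) can be sketched directly; note that this is the single-window form of SSIM given above, not the sliding-window variant common in image-processing libraries:

```python
import numpy as np

def psnr(X, Y, B=255.0):
    """Peak signal-to-noise ratio, Eqs. (18)-(19)."""
    mse = np.mean((X - Y) ** 2)
    return 10.0 * np.log10(B ** 2 / mse)

def ssim(X, Y, B=255.0):
    """Global (single-window) structural similarity, Eq. (20)."""
    c1, c2 = (0.01 * B) ** 2, (0.03 * B) ** 2
    mx, my = X.mean(), Y.mean()
    vx, vy = X.var(), Y.var()                 # sigma_X^2, sigma_Y^2
    cov = ((X - mx) * (Y - my)).mean()        # sigma_XY
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```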

Fig. 2. Simulation results with sampling matrix of high accuracy. Figure 2(a) shows the images reconstructed via different methods under different SRs. Figures 2(b) and 2(c) illustrate the PSNR and SSIM of the reconstructed images via different methods as functions of the SR, respectively.

From Fig. 2, it can be observed that the reconstruction quality of the proposed scheme is uniformly better than that of the other methods under test. In particular, it achieves a $2$dB to $4$dB gain in PSNR and up to $10$% higher SSIM over the Gaussian method, owing to the optimized light fields. Compared to Xu’s method [35], the Gaussian method has a notable advantage in the low SR region, and the gap gradually closes as the SR approaches one. The performance gap is mainly attributed to the utilization of dictionary learning, which better incorporates the sparsity prior of images. Among all test methods, the PSNR and SSIM of the NGI method lie at the lowest level. This is mainly due to the image noise, which often afflicts the correlation-based reconstruction methods of GI, especially when the SR is low. It can also be observed from Fig. 2 that Duarte’s method [40] performs comparably with the proposed method in the low SR region, but deteriorates dramatically as the SR increases. Such a phenomenon seems unreasonable at first glance, but can be interpreted from the condition-number perspective. To be specific, the sampling matrix ${\mathbf \Phi }$ of Duarte’s method has a larger condition number as the SR increases (see detailed explanations in Footnote 7 of [40]). Thus, when ${\mathbf \Phi }$ multiplies the representation error ${\mathbf {e}} = {\mathbf {x - \Psi z}}$, it can significantly amplify this error and eventually degrade the reconstruction quality. Indeed, in this case one would need to reconstruct the sparse vector $\mathbf {z}$ from the samplings ${\mathbf {y}} = {\mathbf {\Phi x}} + {\mathbf {n}} = {\mathbf {\Phi }} {\mathbf {\Psi z}} + {\mathbf \Phi e} + {\mathbf {n}} = {\mathbf {D z}} + ({\mathbf \Phi e} + {\mathbf {n}})$, which can be difficult since the largely amplified error ${\mathbf \Phi e}$ essentially becomes part of the noise for the reconstruction.

In practice, detectors measure the intensity signals with quantization, which means that $\mathbf {\Phi }$ is actually a quantized sampling matrix. Thus, we also simulate the case where the sampling matrices are quantized to $8$ bits of precision and plot the results in Fig. 3. The quantization was done in the following way. Let ${\widetilde {\mathbf {\Phi }}}$ denote the sampling matrix after the NN lifting. Then, each entry of the quantized sampling matrix was calculated as

$${\mathbf{\Phi }}_{ij}^\textrm{Q} = \textrm{round} \left( \frac{\widetilde {\mathbf{\Phi }}_{ij}}{ {\max}_{i,j} {\widetilde {\mathbf{\Phi }}}_{ij}} \times 255 \right),$$
where the function ${\textrm {round}}( \cdot )$ returns the nearest integer. Similarly, Figs. 3(b) and 3(c) show the PSNR and SSIM curves averaged over 500 reconstruction trials, respectively. We observe that the overall behavior is similar to that of Figs. 2(b) and 2(c), except that both the PSNR and SSIM curves of Duarte’s method [40] fluctuate at the lowest level over the whole SR region. Accordingly, Duarte’s method [40] also fails to retrieve the images in Fig. 3. This is mainly because Duarte’s method [40] is demanding in quantization accuracy. When large quantization errors are introduced, sparse coefficients of the test images may not be correctly calculated by the reconstruction algorithm. The PSNR and SSIM curves of the NGI method lie at a low level similar to that of Duarte’s method, and the images are only vaguely reconstructed, as shown in Fig. 3. We also observe that the performance of Xu’s method [35] becomes worse in the quantized case, although the images can still be retrieved. Overall, our proposed method performs the best in both the high-accuracy and the quantized scenarios.
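The quantization step of Eq. (21), sketched in NumPy on a placeholder non-negative matrix:

```python
import numpy as np

def quantize_8bit(Phi_tilde):
    """Map the NN-lifted sampling matrix to 8-bit integer levels, Eq. (21)."""
    return np.round(Phi_tilde / Phi_tilde.max() * 255.0)

rng = np.random.default_rng(4)
Phi_tilde = rng.random((16, 64))          # non-negative, as after the NN lifting
Phi_q = quantize_8bit(Phi_tilde)
assert Phi_q.min() >= 0.0 and Phi_q.max() == 255.0
```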

Fig. 3. Simulation results with $8$-bit quantized sampling matrix. Figure 3(a) shows the images reconstructed via different methods under different SRs. Figures 3(b) and 3(c) show the PSNR and SSIM of the reconstructed images via different methods as functions of the SR, respectively.

3.2 Experimental results

We also experimentally compare the proposed method with the Gaussian method, Duarte’s method [40] and the NGI method [24]. The schematic diagram of the experimental setup is shown in Fig. 4. The light-field patterns are first displayed on the digital micro-mirror device (DMD) after being preloaded via the computer. Next, light from a light-emitting diode (LED) source is modulated by a Kohler illumination system to be evenly incident on the DMD. The light reflected by the DMD is then projected onto the imaging object by a lens system. Finally, all the light reflected from the object is collected by the lens and measured by the detector. In our experiments, the light-field patterns are displayed at a rate of 10Hz to avoid frame dropping of the detector, so that the sampling procedure lasts for about one minute. After the samplings, the subsequent image retrieval steps for each method are the same as those in the simulation test. The reconstruction was carried out on an industrial computer with $32$GB RAM and an Intel(R) Core(TM) i7-$2600$ CPU @$3.4$GHz, and the time consumed by matrix optimization and reconstruction for the different methods is given in Table 2. The reconstruction time for each method is given as a time interval, since it varies with the sampling rate.

Fig. 4. Schematic diagram of experimental setup. Light-field patterns are generated by DMD and projected onto the object, afterwards collected by the detector.

Table 2. Running time of test methods

The comparison of images reconstructed by the different methods is shown in Fig. 5, where the ground truth is obtained by pixel-wise detection and serves as the reference image. By comparing the reconstructed images with the ground truth, again, the PSNR and SSIM are calculated and plotted in Figs. 5(b) and 5(c), respectively. Overall, the experimental results demonstrate that the reconstruction quality of the proposed optimization scheme is superior to that of the other methods under test, which matches our simulation results well.

Fig. 5. Experimental results. Figure 5(a) shows the images reconstructed via different methods under different SRs. Figures 5(b) and 5(c) show the PSNR and SSIM of the reconstructed images via different methods as functions of the SR, respectively.

3.3 Discussions

We would like to point out some interesting points that arise from the simulation and experimental results.

  • i) Superiority: The superiority of the proposed method is mainly owing to two factors: i) optimization of the sampling matrix and ii) dictionary learning. Indeed, the proposed optimization scheme outperforms the Gaussian method, even though they share the same sparsifying basis obtained by dictionary learning. This is because our method essentially performs a “global” optimization of light fields that incorporates the image statistics captured in the dictionary learning process, thereby enhancing the sampling efficiency. Besides, the improvement of the proposed method over Xu’s method can be attributed to the use of both dictionary learning and our optimized sampling matrix.
  • ii) PSNR and SSIM: The PSNR and SSIM curves of the proposed method tend to flatten after the SR reaches a critical value. This in turn implies that the information of the imaging object acquired at this very SR already suffices to produce a satisfactory reconstruction. The critical SR can thus be utilized to evaluate the capability of information acquisition and also allows the comparison of different approaches.
  • iii) Recovery error: While the proposed closed-form solution can lead to the recovery of images with high accuracy, there still exist some unavoidable recovery errors (see, e.g., Fig. 2(a) and Fig. 3(a)). These errors are essentially due to the representation error in dictionary learning. To be specific, when training the sparsifying basis via dictionary learning in Eq. (4), the representation error term $||{\mathbf {X} - \mathbf {\Psi Z}}||_F^2$ does not usually reach zero. Thus, even if all the sparse coefficients of the images (under the obtained sparsifying dictionary) are correctly computed, when using those coefficients to retrieve the images, there may still be some tiny differences between the reconstructed images and the original ones.
  • iv) Dictionary learning: We would like to mention a practical limitation of the proposed scheme in handling images of large size, due to the use of dictionary learning. Specifically, while dictionary learning in our scheme can bring in some performance gain, it is usually demanding in storage and computational cost. Thus the patch size used in dictionary learning should not be large, which, however, limits the image size that we can handle. Nevertheless, efficient dictionary learning methods dealing with images of larger scales have recently been proposed [47–49], in which the handled image size can go beyond $64 \times 64$ pixels. To demonstrate the effectiveness of the proposed method for images of larger size, we carry out simulations over the LFWcrop database [50], which consists of more than $13,000$ images of $64 \times 64$ pixels. The dictionary is trained offline using the algorithm in [48] with $12,000$ images, which takes about $20$ hours on our industrial computer. The results of the proposed method and the Gaussian method, both of which involve dictionary learning, are shown in Fig. 6 for comparison.
  • v) Explicit dictionary: We mention that if one wishes to deal with images of even larger size, such that existing dictionary learning methods fail to handle them or cannot learn the images offline, then the proposed light-field optimization scheme can still be applied by using explicit dictionaries (e.g., Cropped Wavelets [48]) to incorporate the sparsity prior of images.
  • vi) Alternative to NN lifting: Recall that we have employed an NN lifting to cope with the non-negative constraint $\mathbf {\Phi }_{ij} \geq 0$ in our scheme. This operation, however, could severely degrade the mutual incoherence property of the equivalent sampling matrix in the worst case. Although this situation rarely happens, as confirmed by our empirical tests, it remains possible in theory and can be potentially risky for the image reconstruction. To address this issue, we introduce an alternative operation called NN subtraction to deal with the non-negativeness. Specifically, after obtaining the optimized matrix $\widehat {\mathbf {\Phi }}$, whose elements are not necessarily non-negative, we can decompose it into the difference of two matrices,
    $$\widehat{\mathbf{\Phi}} = \widehat{\mathbf{\Phi}}_+{-} \widehat{\mathbf{\Phi}}_-,$$
    both of which have non-negative entries. For example,
    $$\underbrace{\left[ \begin{array}{ccc} 1 & -2 & 3 \\ -4 & 5 & -6 \end{array} \right]}_{\widehat{\mathbf{\Phi}}} = \underbrace{\left[ \begin{array}{ccc} 1 & 0 & 3 \\ 0 & 5 & 0 \end{array} \right]}_{\widehat{\mathbf{\Phi}}_+} - \underbrace{\left[ \begin{array}{ccc} 0 & 2 & 0 \\ 4 & 0 & 6 \end{array} \right]}_{\widehat{\mathbf{\Phi}}_-}.$$
    We then use matrices $\widehat {\mathbf {\Phi }}_+$ and $\widehat {\mathbf {\Phi }}_-$ to physically acquire the samplings of input signal $\mathbf {x}$ as
    $$\left[ \begin{array}{c} {\mathbf{y}}_+ \\ \hline {\mathbf{y}}_- \end{array} \right] = \left[ \begin{array}{ccccc} & & \widehat{\mathbf{\Phi}}_+ & & \\ \hline & & \widehat{\mathbf{\Phi}}_- & & \end{array} \right] \left[ \begin{array}{c} \\ \mathbf{x} \\ \\ \end{array} \right].$$
    Based on ${\mathbf {y}}_+$ and ${\mathbf {y}}_-$, we finally obtain the desired samplings as
    $${\mathbf{y}} = {\mathbf{y}}_+{-} {\mathbf{y}}_-{=} ( \widehat{\mathbf{\Phi}}_+{-} \widehat{\mathbf{\Phi}}_- ) {\mathbf{x}} \overset{(20)}{=} \mathbf{\Phi} \mathbf{x},$$
    from which we can recover $\mathbf {x}$. We would like to highlight that the NN subtraction operation has a theoretical advantage over NN lifting: it addresses the non-negativity issue of the sampling matrix $\widehat {\mathbf {\Phi }}$ without degrading its mutual coherence, since we finally recover $\mathbf {x}$ directly from ${\mathbf {y}} = \widehat {\mathbf {\Phi }} {\mathbf {x}}$. This is no doubt beneficial to the image reconstruction. On the other hand, it should be noted that this benefit comes at the cost of acquiring twice as many samplings (i.e., $[ {\mathbf {y}}_+^T~ {\mathbf {y}}_-^T ]^T \in \mathcal {R}^{2 M}$). Considering i) that the cost of acquiring more samplings could be expensive in practice and ii) that the two operations generally lead to similar performance (see Fig. 7 for examples), the NN lifting operation can still be a favorable choice.
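As a minimal NumPy sketch of the NN subtraction operation above (the helper names `nn_subtraction_split` and `mutual_coherence` are ours, not from the paper), one can split the optimized matrix into its positive and negative parts, acquire the two sets of samplings separately, and subtract them to recover the samplings of the original matrix:

```python
import numpy as np

def nn_subtraction_split(phi):
    """Split phi into non-negative matrices with phi = phi_plus - phi_minus."""
    phi_plus = np.maximum(phi, 0.0)    # positive entries, zeros elsewhere
    phi_minus = np.maximum(-phi, 0.0)  # magnitudes of the negative entries
    return phi_plus, phi_minus

def mutual_coherence(d):
    """Largest absolute normalized inner product between distinct columns."""
    d = d / np.linalg.norm(d, axis=0, keepdims=True)
    gram = np.abs(d.T @ d)
    np.fill_diagonal(gram, 0.0)
    return gram.max()

# Toy matrix from the example above, with an arbitrary non-negative signal x.
phi = np.array([[1., -2., 3.],
                [-4., 5., -6.]])
phi_plus, phi_minus = nn_subtraction_split(phi)
x = np.array([0.5, 1.0, 2.0])

y_plus = phi_plus @ x    # physically realizable (non-negative patterns)
y_minus = phi_minus @ x
y = y_plus - y_minus     # identical to sampling with phi itself

# Because y equals phi @ x exactly, reconstruction operates on phi,
# whose mutual coherence is untouched by the decomposition.
assert np.allclose(y, phi @ x)
```

The price of this exact equivalence, as noted above, is that both $\widehat {\mathbf {\Phi }}_+$ and $\widehat {\mathbf {\Phi }}_-$ must be physically projected, doubling the number of samplings.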

Fig. 6. Simulation results of LFWcrop face images. Figure 6(a) shows the reconstructed images via different methods under different SR. Figures 6(b) and 6(c) show the PSNR and SSIM of the reconstructed images via different methods as a function of SR, respectively.

Fig. 7. Comparison of reconstruction results via NN lifting and NN subtraction.

4. Conclusion

In this paper, an optimization scheme of light fields has been proposed to improve the imaging quality of GI. The key idea is to minimize the mutual coherence of the equivalent sampling matrix in order to enhance the sampling efficiency. A closed-form solution of the sampling matrix has been derived, which enables successive sampling. Simulation and experimental results have shown that the proposed scheme is very effective in improving the reconstruction quality of images, compared to the state-of-the-art methods for light-field optimization. One important point we would like to mention is that while the proposed scheme has been empirically shown to image specific targets with higher quality, deriving analytical guarantees on the recovery accuracy would require further effort, and our future work will be directed towards this avenue.

Funding

National Key Research and Development Program of China (2017YFB0503303); National Natural Science Foundation of China (61971146, 61802069, 71803026, 11627811).

Acknowledgement

The authors would like to thank the editor and anonymous reviewers for their constructive comments and suggestions. In particular, we would like to thank one of the reviewers, who shared her/his insight on how to address the non-negativity issue of sampling matrices.

References

1. T. B. Pittman, Y. H. Shih, D. V. Strekalov, and A. V. Sergienko, “Optical imaging by means of 2-photon quantum entanglement,” Phys. Rev. A: At., Mol., Opt. Phys. 52(5), R3429–R3432 (1995). [CrossRef]  

2. D. Strekalov, A. Sergienko, D. Klyshko, and Y. Shih, “Observation of two-photon “ghost” interference and diffraction,” Phys. Rev. Lett. 74(18), 3600–3603 (1995). [CrossRef]  

3. R. S. Bennink, S. J. Bentley, and R. W. Boyd, ““two-photon” coincidence imaging with a classical source,” Phys. Rev. Lett. 89(11), 113601 (2002). [CrossRef]  

4. A. Gatti, E. Brambilla, M. Bache, and L. A. Lugiato, “Ghost imaging with thermal light: comparing entanglement and classicalcorrelation,” Phys. Rev. Lett. 93(9), 093602 (2004). [CrossRef]  

5. J. Cheng and S. Han, “Incoherent coincidence imaging and its applicability in x-ray diffraction,” Phys. Rev. Lett. 92(9), 093903 (2004). [CrossRef]  

6. D. Zhang, Y.-H. Zhai, L.-A. Wu, and X.-H. Chen, “Correlated two-photon imaging with true thermal light,” Opt. Lett. 30(18), 2354–2356 (2005). [CrossRef]  

7. R. I. Khakimov, B. Henson, D. Shin, S. Hodgman, R. Dall, K. Baldwin, and A. Truscott, “Ghost imaging with atoms,” Nature 540(7631), 100–103 (2016). [CrossRef]  

8. S. Li, F. Cropp, K. Kabra, T. Lane, G. Wetzstein, P. Musumeci, and D. Ratner, “Electron ghost imaging,” Phys. Rev. Lett. 121(11), 114801 (2018). [CrossRef]  

9. M. Malik, O. S. Magaña-Loaiza, and R. W. Boyd, “Quantum-secured imaging,” Appl. Phys. Lett. 101(24), 241103 (2012). [CrossRef]  

10. C. Zhao, W. Gong, M. Chen, E. Li, H. Wang, W. Xu, and S. Han, “Ghost imaging lidar via sparsity constraints,” Appl. Phys. Lett. 101(14), 141123 (2012). [CrossRef]  

11. W. Gong and S. Han, “Correlated imaging in scattering media,” Opt. Lett. 36(3), 394–396 (2011). [CrossRef]  

12. M. Bina, D. Magatti, M. Molteni, A. Gatti, L. A. Lugiato, and F. Ferri, “Backscattering differential ghost imaging in turbid media,” Phys. Rev. Lett. 110(8), 083901 (2013). [CrossRef]  

13. Y. Wang, J. Suo, J. Fan, and Q. Dai, “Hyperspectral computational ghost imaging via temporal multiplexing,” IEEE Photonics Technol. Lett. 28(3), 288–291 (2016). [CrossRef]  

14. Z. Liu, S. Tan, J. Wu, E. Li, X. Shen, and S. Han, “Spectral camera based on ghost imaging via sparsity constraints,” Sci. Rep. 6(1), 25718 (2016). [CrossRef]  

15. P. A. Morris, R. S. Aspden, J. E. Bell, R. W. Boyd, and M. J. Padgett, “Imaging with a small number of photons,” Nat. Commun. 6(1), 5913 (2015). [CrossRef]  

16. X. Liu, J. Shi, X. Wu, and G. Zeng, “Fast first-photon ghost imaging,” Sci. Rep. 8(1), 5012 (2018). [CrossRef]  

17. D. Pelliccia, A. Rack, M. Scheel, V. Cantelli, and D. M. Paganin, “Experimental x-ray ghost imaging,” Phys. Rev. Lett. 117(11), 113902 (2016). [CrossRef]  

18. H. Yu, R. Lu, S. Han, H. Xie, G. Du, T. Xiao, and D. Zhu, “Fourier-transform ghost imaging with hard x rays,” Phys. Rev. Lett. 117(11), 113901 (2016). [CrossRef]  

19. X. Shen, Y. Bai, T. Qin, and S. Han, “Experimental investigation of quality of lensless ghost imaging with pseudo-thermal light,” Chin. Phys. Lett. 25(11), 3968–3971 (2008). [CrossRef]  

20. B. I. Erkmen and J. H. Shapiro, “Signal-to-noise ratio of gaussian-state ghost imaging,” Phys. Rev. A 79(2), 023833 (2009). [CrossRef]  

21. F. Ferri, D. Magatti, L. Lugiato, and A. Gatti, “Differential ghost imaging,” Phys. Rev. Lett. 104(25), 253603 (2010). [CrossRef]  

22. W. Gong and S. Han, “A method to improve the visibility of ghost images obtained by thermal light,” Phys. Lett. A 374(8), 1005–1008 (2010). [CrossRef]  

23. G. Brida, M. Genovese, and I. R. Berchera, “Experimental realization of sub-shot-noise quantum imaging,” Nat. Photonics 4(4), 227–230 (2010). [CrossRef]  

24. B. Sun, S. S. Welsh, M. P. Edgar, J. H. Shapiro, and M. J. Padgett, “Normalized ghost imaging,” Opt. Express 20(15), 16892–16901 (2012). [CrossRef]  

25. M. Antonini, M. Barlaud, P. Mathieu, and I. Daubechies, “Image coding using wavelet transform,” IEEE Trans. Image Process. 1(2), 205–220 (1992). [CrossRef]  

26. D. L. Donoho, “Compressed sensing,” IEEE Trans. Inf. Theory 52(4), 1289–1306 (2006). [CrossRef]  

27. E. J. Candès and T. Tao, “Decoding by linear programming,” IEEE Trans. Inf. Theory 51(12), 4203–4215 (2005). [CrossRef]  

28. E. J. Candès and T. Tao, “Near-optimal signal recovery from random projections: Universal encoding strategies?” IEEE Trans. Inf. Theory 52(12), 5406–5425 (2006). [CrossRef]  

29. O. Katz, Y. Bromberg, and Y. Silberberg, “Compressive ghost imaging,” Appl. Phys. Lett. 95(13), 131110 (2009). [CrossRef]  

30. S. Han, H. Yu, X. Shen, H. Liu, W. Gong, and Z. Liu, “A review of ghost imaging via sparsity constraints,” Appl. Sci. 8(8), 1379 (2018). [CrossRef]  

31. W. Gong, Z. Bo, E. Li, and S. Han, “Experimental investigation of the quality of ghost imaging via sparsity constraints,” Appl. Opt. 52(15), 3510–3515 (2013). [CrossRef]  

32. M. Chen, E. Li, and S. Han, “Application of multi-correlation-scale measurement matrices in ghost imaging via sparsity constraints,” Appl. Opt. 53(13), 2924–2928 (2014). [CrossRef]  

33. S. M. Khamoushi, Y. Nosrati, and S. H. Tavassoli, “Sinusoidal ghost imaging,” Opt. Lett. 40(15), 3452–3455 (2015). [CrossRef]  

34. E. Li, M. Chen, W. Gong, H. Yu, and S. Han, “Mutual information of ghost imaging systems,” Acta Opt. Sin. 33, 1211003 (2013). [CrossRef]  

35. X. Xu, E. Li, X. Shen, and S. Han, “Optimization of speckle patterns in ghost imaging via sparse constraints by mutual coherence minimization,” Chin. Opt. Lett. 13(7), 071101 (2015). [CrossRef]  

36. B. A. Olshausen and D. J. Field, “Natural image statistics and efficient coding,” Network: Comput. Neural Syst. 7(2), 333–339 (1996). [CrossRef]  

37. M. Aharon, M. Elad, and A. Bruckstein, “K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation,” IEEE Trans. Signal Process. 54(11), 4311–4322 (2006). [CrossRef]  

38. M. Elad, “Optimized projections for compressed sensing,” IEEE Trans. Signal Process. 55(12), 5695–5702 (2007). [CrossRef]  

39. V. Abolghasemi, S. Ferdowsi, and S. Sanei, “A gradient-based alternating minimization approach for optimization of the measurement matrix in compressive sensing,” Signal Process. 92(4), 999–1009 (2012). [CrossRef]  

40. J. M. Duarte-Carvajalino and G. Sapiro, “Learning to sense sparse signals: Simultaneous sensing matrix and sparsifying dictionary optimization,” IEEE Trans. Image Process. 18(7), 1395–1408 (2009). [CrossRef]  

41. D. L. Donoho and M. Elad, “Optimally sparse representation in general (nonorthogonal) dictionaries via $\ell _1$ minimization,” Proc. Natl. Acad. Sci. 100(5), 2197–2202 (2003). [CrossRef]  

42. Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad, “Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition,” in Proceedings of 27th Asilomar Conference on Signals, Systems and Computers, (IEEE, 1993), pp. 40–44.

43. J. A. Tropp, “Greed is good: Algorithmic results for sparse approximation,” IEEE Trans. Inf. Theory 50(10), 2231–2242 (2004). [CrossRef]  

44. N. Cleju, “Optimized projections for compressed sensing via rank-constrained nearest correlation matrix,” Appl. Comput. Harmon. Analysis 36(3), 495–507 (2014). [CrossRef]  

45. L. Deng, “The mnist database of handwritten digit images for machine learning research [best of the web],” IEEE Signal Process. Mag. 29(6), 141–142 (2012). [CrossRef]  

46. J. Wang, “Support recovery with orthogonal matching pursuit in the presence of noise,” IEEE Trans. Signal Process. 63(21), 5868–5877 (2015). [CrossRef]  

47. L. Le Magoarou and R. Gribonval, “Chasing butterflies: In search of efficient dictionaries,” in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), (IEEE, 2015), pp. 3287–3291.

48. J. Sulam, B. Ophir, M. Zibulevsky, and M. Elad, “Trainlets: Dictionary learning in high dimensions,” IEEE Trans. Signal Process. 64(12), 3180–3193 (2016). [CrossRef]  

49. C. F. Dantas, M. N. Da Costa, and R. da Rocha Lopes, “Learning dictionaries as a sum of kronecker products,” IEEE Signal Process. Lett. 24(5), 559–563 (2017). [CrossRef]  

50. C. Sanderson and B. C. Lovell, “Multi-region probabilistic histograms for robust and scalable identity inference,” in International Conference on Biometrics, (Springer, 2009), pp. 199–208.

