Optica Publishing Group

Compressive X-ray tomosynthesis using model-driven deep learning

Open Access

Abstract

Compressive X-ray tomosynthesis uses a few two-dimensional projection measurements modulated by coding masks to reconstruct a three-dimensional object that can be sparsely represented on a predefined basis. However, the coding mask optimization and object reconstruction require significant computing resources. In addition, existing methods fall short of exploiting the synergy between the encoding and reconstruction stages to approach the global optimum. This paper proposes a model-driven deep learning (MDL) approach to significantly improve the computational efficiency and accuracy of tomosynthesis reconstruction. A unified framework is developed to jointly optimize the coding masks and the neural network parameters, which effectively increases the degrees of optimization freedom. It is shown that the computational efficiency of coding mask optimization and image reconstruction can be improved by more than one order of magnitude. Furthermore, the quality of the reconstruction results is significantly improved.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Traditional X-ray computed tomography (CT) relies on a complete set of measurements taken around the object, which inevitably leads to long inspection times and harmful radiation exposure [1–5]. In medical imaging, however, a large amount of radiation exposure may significantly affect the health of patients, including damage to body cells and nerves [6]. The goal of reducing X-ray radiation has heightened the need to optimize the hardware settings of CT systems. In this sense, X-ray tomosynthesis uses limited projection angles to greatly reduce the exposure time and radiation dose [7,8]. However, the incomplete measurements lead to an ill-posed problem that challenges the accuracy of image reconstruction [9,10].

To overcome this limitation, coding masks were introduced to modulate the structured illuminations, leading to compressive tomosynthesis approaches that are able to maintain image reconstruction quality while reducing the radiation dose [11]. The coding masks can be implemented by 3D-printed tungsten masks [12–14]. Figure 1 shows the sketch of a typical compressive tomosynthesis system. A set of coding masks is inserted between the X-ray source array and the object under inspection to block/unblock the X-rays. The detector receives the X-rays attenuated by the three-dimensional (3D) object, and generates the two-dimensional (2D) compressive measurements that can be used to reconstruct the object with numerical algorithms. Kaganovsky et al. extended these concepts and introduced coded projections for medical CT scanners [15]. However, both of these approaches used random coding patterns, which are suboptimal for down-sampling. To improve the reconstruction performance, several coding mask optimization methods were developed based on the uniform sensing criteria [16], the restricted isometry property (RIP) in compressive sensing (CS) theory [17,18], the minimum information loss [19], and the sparse principal component analysis [20].

Fig. 1. Sketch of compressive X-ray tomosynthesis.

All the methods mentioned above, however, used gradient-based algorithms that require a large number of iterations to obtain good design results, which poses a big challenge to computational efficiency. More importantly, separating the coding mask optimization and image reconstruction leads to sub-optimal solutions. Deep learning may be used to solve these problems because of its ability to improve computational efficiency and to learn from data. Recently, a set of explainable and predictable deep learning approaches was proposed to efficiently solve ill-posed inverse problems by unfolding the iterative optimization process to construct the network structure [21–25]. Initially, fully connected networks (FCN) were widely used in these studies, which may lead to shortcomings such as over-fitting, low computational efficiency, and large storage consumption, particularly for tomosynthesis [26,27]. Subsequently, convolutional neural networks (CNN) were used to share the weighting coefficients, thus greatly reducing the number of parameters and avoiding the over-fitting problem [28–33]. Inspired by this idea, this work proposes a model-driven deep learning (MDL) approach based on a CNN structure to improve the computational efficiency and reconstruction performance of tomosynthesis.

To the best of our knowledge, this is the first approach that focuses on MDL to address compressive tomosynthesis image reconstruction. In this paper, the iterative hard thresholding (IHT) algorithm is selected as the launching point to construct the architecture of the MDL network. The iterative process of the IHT algorithm is unfolded and reformulated as a neural network, which is regarded as an encoder that transfers the compressive tomosynthesis measurements to the sparse representation coefficients of the object. On the other hand, the pixels on the coding masks have a one-to-one correspondence with the detector pixels, so they can be regarded as a trainable point-to-point binary network layer, namely a coding layer with weights of 0 or 1.

In addition, this paper develops an unsupervised end-to-end training strategy for the proposed MDL network, which avoids the demanding task of labelling a large number of training samples. To achieve this goal, we design a fully connected decoder to reconstruct the object images from the output of the encoder (i.e., the sparse representation coefficients of the object). In fact, the fully connected decoder is constituted by the sparse representation bases of the object. For a well-trained neural network, the output of the decoder should be very similar to the ground truth of the training objects. Based on this principle, the loss function used to train the MDL network is set as the difference between the ground truth of the training object samples and the output of the decoder. The proposed method jointly optimizes the coding masks and the MDL network parameters in a unified framework to effectively increase the degrees of optimization freedom. When the training process is completed, the optimized coding masks can be used to modulate the structured X-ray illuminations, and the neural network can be used to reconstruct the object images in a fast manner. Furthermore, the MDL can provide a good initial guess of the sparse coefficients, which can be combined with several iterations of traditional algorithms to refine the reconstruction performance.

The remainder of this paper is organized as follows. The mathematical model of the X-ray tomosynthesis problem is described in Section 2. The IHT algorithm and its recursive network structure are presented in Section 3. The proposed MDL approach is presented in Section 4. Simulations and analysis are provided in Section 5. Section 6 concludes the paper.

2. Mathematical model of X-ray tomosynthesis

The monochromatic X-ray transmission imaging model is given by the Beer-Lambert law: $I=I_0 \cdot \exp \left [-\int _{0}^{\infty }\mu (r) \cdot dr \right ]$, where $I_0$ is the original intensity of an X-ray incident to a 3D object, $I$ is the intensity received by the detector, and $\mu (r)$ is the linear attenuation coefficient of the object at location $r$. Assume $\vec {s}$ represents the location of the X-ray source, and $\hat {\omega }$ represents the direction of illumination. Then, the Beer-Lambert law can be transformed into the form: $y(\vec {s},\hat {\omega })=-\ln (I/I_0)=\int _{0}^{\infty }f(\vec {s}+r\hat {\omega })dr$, where $f$ is the linear attenuation coefficient map of the object. This continuous-to-continuous imaging model is termed the X-ray transform of the object. Suppose the tomosynthesis system sequentially turns on $P$ X-ray sources above the object to take measurements, and the incident direction varies with the source location. Next, we derive the discrete form of the imaging model, since both the object and detector are pixelated in the numerical algorithms.
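As a numerical illustration (not part of the paper's implementation), the discrete form of this line integral can be sketched in a few lines of Python; the sample spacing and attenuation values below are hypothetical:

```python
import numpy as np

# Hedged sketch: the Beer-Lambert projection y = -ln(I / I0) reduces to a
# discrete sum of attenuation samples along one ray through a voxelized object.
def ray_projection(mu_along_ray, dr):
    """Discrete path integral of attenuation coefficients sampled along one ray."""
    return float(np.sum(mu_along_ray) * dr)

I0 = 1.0
mu = np.array([0.2, 0.5, 0.3])      # attenuation samples mu(r) along the ray
y = ray_projection(mu, dr=0.1)      # projection value y = 0.1
I = I0 * np.exp(-y)                 # intensity received by the detector
```

Inverting the detected intensity via $-\ln(I/I_0)$ recovers the same projection value, which is the linearization used throughout the discrete model.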

First, consider the imaging process with a single X-ray source, for example the $p$th source point. The measurement on the detector is formulated as $\boldsymbol{y}_p= \mathbf {H}_p \boldsymbol{f}$, in which $\boldsymbol{f}\in \mathbb {R}^N$ ($N=N_x\cdot N_y\cdot N_z$) is the vectorized representation of the linear attenuation coefficient of 3D object, where $N_x$, $N_y$ and $N_z$ are the dimensions along the $x$, $y$ and $z$ axes. $\mathbf {H}_p\in \mathbb {R}^{M\times N}$ is the system matrix with $M=M_x\cdot M_y$, where $M_x$ and $M_y$ are the dimensions of the detector along the $x$ and $y$ axes. The $(m,n)$th element in $\mathbf {H}_p$ represents the volumetric portion in the $n$th voxel of $\boldsymbol{f}$ irradiated by the X-ray associated with the $m$th detector pixel.

Consider next the placement of a block/unblock coding mask in front of the X-ray source to modulate structured illumination and reduce radiation dose, where the pixels on the coding mask have one-to-one correspondence to the pixels on the detector. Mathematically, the coding mask corresponding to the $p$th X-ray source is denoted as a binary matrix $\mathbf {T}_p\in \mathbb {R}^{M_x\times M_y}$, where the zero-valued and one-valued elements respectively indicate the opaque and transparent pixels that completely block or unblock the incident X-rays. The transparent coding mask elements select the rows of $\mathbf {H}_p$ associated with the unblocked X-rays. For the coding mask $\mathbf {T}_p$, we define a code matrix $\mathbf {C}_p\in \mathbb {R}^{M \times M}$ , which places the elements of $\mathbf {T}_p$ into the diagonal entries of $\mathbf {C}_p$. Concatenating the imaging models of all X-ray sources, the measurements are given by

$$\boldsymbol{y} = \mathbf{C} \mathbf{H} \boldsymbol{f} = \mathbf{C} \mathbf{H} \boldsymbol{\Psi}\boldsymbol{\theta},$$
where $\boldsymbol{y} = [\boldsymbol{y}_1^{\top },\boldsymbol{y}_2^{\top },\ldots ,\boldsymbol{y}_P^{\top }]^{\top }$, $\mathbf {C} =\textrm {diag}\left (\mathbf {C}_1,\mathbf {C}_2,\ldots ,\mathbf {C}_P\right )$, and $\mathbf {H} = [\mathbf {H}_1^{\top },\mathbf {H}_2^{\top },\ldots ,\mathbf {H}_P^{\top }]^{\top }$. Assume $\boldsymbol{f}$ can be sparsely represented on a basis $\boldsymbol{\Psi }$, i.e., $\boldsymbol{f}= \boldsymbol{\Psi }\boldsymbol{\theta }$, where $\boldsymbol{\theta }$ is referred to as the sparse coefficient vector of the object, which only includes a small number of significant coefficients while other coefficients are close to zero [34]. The sensing matrix of the X-ray tomosynthesis system is defined as $\mathbf {A}=\mathbf {C} \mathbf {H} \boldsymbol{\Psi }$. The object $\boldsymbol{f}$ can be recovered from the compressive measurements by solving the following $l_0$-norm minimization problem [35]:
$$\hat{\boldsymbol{\theta}}=\arg\min _{\boldsymbol{\theta}}\|\boldsymbol{\theta} \|_0 ,~~~\mathrm{s.t.}~\ \boldsymbol{y}=\mathbf{A}\boldsymbol{\theta}~ \textrm{and} ~\boldsymbol{f}= \boldsymbol{\Psi}\boldsymbol{\theta}.$$
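The coded forward model of Eq. (1) and the sensing matrix $\mathbf{A}$ can be illustrated with a toy Python sketch; all dimensions, system matrices and masks below are random stand-ins, not the paper's configuration:

```python
import numpy as np

# Toy sketch of the coded forward model y = C H Psi theta (Eq. (1)).
rng = np.random.default_rng(0)
M, N, P = 4, 6, 2                       # detector pixels per view, voxels, sources

H = [rng.random((M, N)) for _ in range(P)]           # system matrices H_p
T = [rng.integers(0, 2, size=M) for _ in range(P)]   # binary mask pixels of T_p
C = [np.diag(t) for t in T]                          # diagonal code matrices C_p

Psi = np.linalg.qr(rng.standard_normal((N, N)))[0]   # orthonormal sparse basis
theta = np.zeros(N)
theta[[1, 4]] = [1.0, -0.5]                          # S = 2 significant coefficients
f = Psi @ theta                                      # object f = Psi theta

# Concatenate the coded measurements of all P sources, and form A = C H Psi
y = np.concatenate([C[p] @ H[p] @ f for p in range(P)])
A = np.vstack([C[p] @ H[p] @ Psi for p in range(P)])
```

By construction, $\boldsymbol{y} = \mathbf{A}\boldsymbol{\theta}$, and the zero rows of each $\mathbf{C}_p$ discard the measurements of the blocked X-rays.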

Numerous algorithms have been proposed to solve the problem in Eq. (2). However, most of the existing algorithms require a large number of iterations, and thus are computationally intensive. In addition, the very high dimensionality of the object data and sensing matrix dramatically increases the computational burden. On the other hand, coding masks induce a loss of measurement information, resulting in degradation of reconstruction performance. In the following, we derive the MDL approach based on the concept of the IHT algorithm. It will be shown that the proposed MDL method can effectively improve the computational efficiency and reconstruction performance.

3. IHT algorithm and its recursive network form

IHT is a widely used greedy algorithm for solving the $l_0$-norm minimization problem. However, the IHT algorithm sometimes converges slowly, and the hard thresholding operations may induce oscillation and sub-optimality of the solutions. The IHT algorithm is chosen to derive the MDL framework because it is well suited for unfolding into a deep learning architecture while retaining the sparsity condition of the reconstructed signal [25].

Suppose there are at most $S$ significant sparse coefficients in $\boldsymbol{\theta }$ , then Eq. (2) can be rewritten as the following $S$-sparse reconstruction problem

$$\hat{\boldsymbol{\theta}}=\arg\min _{\boldsymbol{\theta}}\frac{1}{2}\|\boldsymbol{y}- \mathbf{A} \boldsymbol{\theta} \|_2^{2},\ \quad \mathrm{s.t.} \ \| \boldsymbol{\theta} \|_0\leq S.$$

The IHT algorithm attempts to minimize Eq. (3) using the projected gradient iterations. Let $\boldsymbol{\theta }^{(n)}$ represent the estimate of $\boldsymbol{\theta }$ in the $n$th iteration, then the variable $\boldsymbol{\theta }$ can be updated as:

$$\boldsymbol{\theta}^{(n+1)}=\mathbf{\Omega}_S[\boldsymbol{\theta}^{(n)}- \mathbf{A}^{\top}(\mathbf{A} \boldsymbol{\theta}^{(n)}-\boldsymbol{y})],$$
where $\mathbf {\Omega }_{S}[\cdot ]$ is a hard-thresholding operator to preserve the largest $S$ elements in the vector and set others to zero. Initializing $\boldsymbol{\theta }^{(0)}=\boldsymbol{0}$ will guarantee the convergence of the algorithm [25]. After some mathematical manipulations, Eq. (4) can be simplified into the following form:
$$\boldsymbol{\theta}^{(n+1)}=\sigma_{\beta} [\mathbf{W} \boldsymbol{\theta}^{(n)}+\mathbf{Z} \boldsymbol{y}], \quad \boldsymbol{\theta}^{(0)}=\boldsymbol{0},$$
where $\mathbf {W}=\mathbf {I}-\mathbf {A}^\top \mathbf {A}$, $\mathbf {Z}=\mathbf {A}^\top$, and $\sigma _{\beta }$ is an element-wise thresholding function with a predefined threshold value $\beta$, that is $\sigma _{\beta }(x)= x$ if $| x | > \beta$, otherwise $\sigma _{\beta }(x)= 0$.
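A minimal Python sketch of the IHT iteration in Eq. (4) follows, with $\mathbf{\Omega}_S$ implemented as the projection that keeps the $S$ largest-magnitude entries; the test matrix is a random stand-in, scaled so that $\|\mathbf{A}\|_2 < 1$, which is a sufficient condition for monotone descent of the residual:

```python
import numpy as np

# Hedged sketch of IHT (Eq. (4)): gradient step followed by the projection
# Omega_S that keeps the S largest-magnitude entries and zeroes the rest.
def iht(A, y, S, n_iter=100):
    theta = np.zeros(A.shape[1])                 # theta^(0) = 0
    for _ in range(n_iter):
        g = theta - A.T @ (A @ theta - y)        # projected-gradient step
        keep = np.argsort(np.abs(g))[-S:]        # support of the S largest entries
        theta = np.zeros_like(g)
        theta[keep] = g[keep]                    # Omega_S[g]
    return theta

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 60))
A /= 1.01 * np.linalg.norm(A, 2)                 # enforce ||A||_2 < 1
truth = np.zeros(60)
truth[[3, 17, 42]] = [1.0, -2.0, 0.5]            # S = 3 sparse ground truth
y = A @ truth
theta_hat = iht(A, y, S=3)
```

The output is guaranteed to have at most $S$ nonzero entries, and under the norm condition above the data-fidelity residual never increases across iterations.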

The iteration process in Eq. (5) can be depicted by the flowchart in Fig. 2(a). In other words, the loop of the IHT algorithm can be represented by a recursive network.

Fig. 2. (a) The flowchart of the IHT algorithm, which can be interpreted as a recursive network. (b) Unfolded fully connected network derived from the IHT algorithm.

4. MDL method for compressive X-ray tomosynthesis

4.1 Architecture of MDL network

According to Fig. 2(a), the IHT algorithm can be unfolded into an FCN, which comprises a series of iterations, each corresponding to one network layer. Assume that the IHT algorithm is terminated after $K$ iterations; then the unfolded FCN will include $K$ layers as shown in Fig. 2(b), where $\mathbf {W}$ and $\mathbf {Z}$ are referred to as the fully connected weight matrix and the linear transformation matrix, respectively. In general, there are two requirements for deep learning models: efficient computation and good prediction performance. An FCN has too many independent weight coefficients, which increases the computational complexity, memory consumption, and risk of over-fitting. In contrast, a CNN can easily solve these problems using the weight sharing mechanism. Thus, we will transform the FCN in Fig. 2(b) into a CNN, which is termed MDL since it is derived from the physical imaging model of the X-ray tomosynthesis system.

Consider the more general case in which $P$ X-ray sources are deployed to interrogate the object. According to Eq. (1), the sensing matrix can be expanded as

$$\mathbf{A}=\mathbf{CH} \boldsymbol{\Psi} ={\bigg[} \mathbf{C}_1 \mathbf{H}_1 \boldsymbol{\Psi}, \mathbf{C}_2 \mathbf{H}_2 \boldsymbol{\Psi}, \ldots, \mathbf{C}_P \mathbf{H}_P \boldsymbol{\Psi} {\bigg]}^{\top}.$$

Thus, the fully connected weight matrix in Fig. 2(b) can be calculated as

$$\mathbf{W}=\mathbf{I}-\mathbf{A}^\top \mathbf{A} =\mathbf{I}- \sum_{p=1}^{P} \boldsymbol{\Psi}^{\top} \mathbf{H}_p^{\top} \mathbf{C}_p^{\top} \mathbf{C}_p \mathbf{H}_p \boldsymbol{\Psi} =\frac{1}{P} \sum_{p=1}^{P} \left[\mathbf{I}-P (\boldsymbol{\Psi}^{\top} \mathbf{H}_p^{\top} \mathbf{C}_p^{\top} \mathbf{C}_p \mathbf{H}_p \boldsymbol{\Psi}) \right].$$

In Fig. 2(b), the linear transformation of $\boldsymbol{y}$ can be calculated as

$$\mathbf{Z} \boldsymbol{y} =\mathbf{A^{\top}} \boldsymbol{y} =\frac{1}{P} \sum_{p=1}^{P} \boldsymbol{\Psi}^{\top} \mathbf{H}_p^{\top} \mathbf{C}_p^{\top} \boldsymbol{y}_p.$$

Some cases of replacing fully connected layers with convolution kernels have been studied in previous works [36]. However, these works just reshaped the weights of the fully connected layers into the filters of convolution layers, and thus fell short of reducing the number of parameters in the network. Inspired by the concept of CNN, this paper replaces each fully connected layer by one convolution kernel with a small dimensionality, thereby greatly reducing the computational complexity and storage requirements.

The block diagram of the proposed MDL network is illustrated in Fig. 3(a). As mentioned above, the vectorized representation of the projection measurement associated with the $p$th source is denoted as $\boldsymbol{y}_p \in \mathbb {R}^{M \times 1}$ ($M = M_x \times M_y$). Let $\mathbf {Y}_p\in \mathbb {R}^{M_x \times M_y}$ be the 2D projection measurement on the detector, which is the matrix representation of the vector $\boldsymbol{y}_p$. As shown in Fig. 3, we can stack all the $\mathbf {Y}_p$ ($p=1,2,\ldots ,P$) layer by layer to form a measurement cube denoted by $\widetilde {\mathscr {Y}}\in \mathbb {R}^{M_x \times M_y \times P}$, which is set as the input of the MDL network. The forward CNN is regarded as an encoder that transfers $\widetilde {\mathscr {Y}}$ into the sparse coefficients $\boldsymbol{\theta }_K$. Using a 2D convolution kernel $\mathbf {\Gamma }_p \in \mathbb {R}^{Q_x \times Q_y}$ to replace the linear transformation operator in Eq. (8), we have

$$\mathbf{Z} \boldsymbol{y} =\frac{1}{P} \sum_{p=1}^{P} \boldsymbol{\Psi}^{\top} \mathbf{H}_p^{\top} \mathbf{C}_p^{\top} \boldsymbol{y}_p \approx \mathscr{T} \left\{ \frac{1}{P} \sum_{p=1}^{P} \mathbf{\Gamma}_p \otimes \mathbf{Y}_p \right\} \triangleq \mathscr{T} \left\{ \frac{1}{P} \widetilde{\mathbf{\Gamma}} \otimes \widetilde{\mathscr{Y}} \right\},$$
where $\mathscr {T}$ is an unfolding operator that raster-scans a matrix into its vectorized representation, $\otimes$ is the convolution operation, and $\widetilde {\mathbf {\Gamma }} \in \mathbb {R}^{Q_x \times Q_y \times P}$ ($Q_x \ll M_x$, $Q_y \ll M_y$) is a 3D convolution kernel that is stacked from all of the $\mathbf {\Gamma }_p$. In other words, $\widetilde {\mathbf {\Gamma }}$ consists of $P$ filter bands, each of which is equal to one $\mathbf {\Gamma }_p$. On the right-hand side of Eq. (9), the convolution between two 3D cubes is defined as the summation of the 2D convolutions between all corresponding slices of the two cubes. With padding, the size of the output image is kept the same as that of the original. Thus, the result of $\widetilde {\mathbf {\Gamma }} \otimes \widetilde {\mathscr {Y}}$ is a 2D slice with dimensions $M_x \times M_y$.

Fig. 3. (a) The coding mask layer that maps the 3D object $\boldsymbol{f}$ to the input measurement data $\widetilde {\mathscr {Y}}$, (b) the forward CNN, also referred to as the encoder, used to learn the sparse coefficient vector of the 3D object from the input measurement data, and (c) the decoder that maps the sparse coefficient vector to the reconstructed object.

Similarly, the weight matrix $\mathbf {W}$ in Eq. (7) can be replaced by a 2D convolution kernel $\boldsymbol{\Phi } \in \mathbb {R}^{Q_x \times Q_y }$ such that:

$$\mathbf{W} \boldsymbol{\theta} =\frac{1}{P} \sum_{p=1}^{P} \left[\mathbf{I}-P\left(\boldsymbol{\Psi}^{\top} \mathbf{H}_p^{\top} \mathbf{C}_p^{\top} \cdot \mathbf{C}_p \mathbf{H}_p \boldsymbol{\Psi}\right) \right] \boldsymbol{\theta} \approx \mathscr{T} \left\{\boldsymbol{\Phi} \otimes \boldsymbol{\Theta} \right\},$$
where $\boldsymbol{\theta }$ is the sparse coefficient vector described in Eq. (1), and $\boldsymbol{\Theta }$ is the matrix form of $\boldsymbol{\theta }$. Assuming the number of columns of the sparse basis equals the number of detector elements, the dimension of $\boldsymbol{\Theta }$ is $M_x \times M_y$. Let $\boldsymbol{\Theta }_j$ be the output of the $j$th layer; then the mathematical model of the $j$th layer can be formulated as
$$\boldsymbol{\Theta}_j=\sigma_\beta[\boldsymbol{\Phi}_{j}\otimes \boldsymbol{\Theta}_{j-1} + \widetilde{\mathbf{\Gamma}}_{j} \otimes \widetilde{\mathscr{Y}}], \quad j=1,2,\ldots,K.$$

It is noted that the variable $\boldsymbol{\Theta }$ is initialized as $\boldsymbol{\Theta }_0=\boldsymbol{0}$, so the convolution kernel $\boldsymbol{\Phi }_1$ vanishes.
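The layer recursion of Eq. (11) can be sketched as follows, with illustrative dimensions; the cube convolution of Eq. (9) is implemented here as the $1/P$-weighted sum of per-band 2D 'same'-mode convolutions (an assumption consistent with Eq. (9)), and $\boldsymbol{\Phi}_1$ is set to zero as noted above:

```python
import numpy as np

# Hedged sketch of one MDL encoder layer, Eq. (11):
# Theta_j = sigma_beta[Phi_j (*) Theta_{j-1} + Gamma_j (*) Y_cube].
def conv2d_same(img, ker):
    """Zero-padded 2D convolution whose output matches img in size (odd kernel)."""
    u = ker.shape[0]
    k = (u - 1) // 2
    pad = np.pad(img, k)
    out = np.zeros_like(img, dtype=float)
    for r in range(img.shape[0]):
        for c in range(img.shape[1]):
            out[r, c] = np.sum(ker[::-1, ::-1] * pad[r:r + u, c:c + u])
    return out

def encoder_layer(Theta_prev, Y_cube, Phi, Gamma_cube, beta):
    P = Y_cube.shape[2]
    z = conv2d_same(Theta_prev, Phi)
    z += sum(conv2d_same(Y_cube[:, :, p], Gamma_cube[:, :, p]) for p in range(P)) / P
    return np.where(np.abs(z) > beta, z, 0.0)     # element-wise sigma_beta

rng = np.random.default_rng(0)
Y = rng.standard_normal((8, 8, 3))                # P = 3 stacked 2D measurements
Theta1 = encoder_layer(np.zeros((8, 8)), Y,       # Theta_0 = 0, so Phi_1 = 0
                       Phi=np.zeros((3, 3)),
                       Gamma_cube=rng.standard_normal((3, 3, 3)), beta=0.5)
```

Every entry of the layer output is either zero or exceeds the threshold $\beta$ in magnitude, which is the sparsifying effect of $\sigma_\beta$.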

Next, an integrated description of the MDL framework is provided. The entire MDL framework consists of three sub-parts, namely the coding mask layer, the encoder, and the decoder. The coding mask layer transfers the object attenuation coefficients to the compressive measurements, and the coding masks are used to modulate the structured illumination and to reduce the exposure dose. The encoder is the feedforward CNN, which transfers the compressive measurements to the sparse coefficients. The decoder is a fully connected layer, which converts the sparse coefficients into the reconstructed object. The joint optimization over the three sub-parts effectively increases the degrees of optimization freedom, helping to approach the optimal solution.

4.2 Initialization of MDL network parameters

This section provides a systematic initialization approach for the parameters of the MDL network. As described in Section 4.1, the substitution of FCN layers by convolution kernels will inevitably introduce errors. Thus, the MDL parameters are initialized by a least squares method that minimizes the difference between the outputs obtained by the fully connected weight matrices and those obtained by the corresponding convolution kernels. Using this method, we can find the initial convolution kernels that best match the fully connected weight matrices.

Consider the convolution between two matrices $\mathbf {A} \in \mathbb {R}^{u \times u}$ and $\mathbf {D} \in \mathbb {R}^{v \times v}$. Assume $u$ and $v$ are odd numbers, and the $(i,j)$th elements of $\mathbf {A}$ and $\mathbf {D}$ are denoted as $a_{ij}$ and $d_{ij}$, respectively. We can convert $\mathbf {A} \otimes \mathbf {D}$ into the matrix multiplication $\mathbf {D}_a \cdot \boldsymbol{a}$, where $\boldsymbol{a}$ is the raster-scanned vector of $\mathbf {A}$, and $\mathbf {D}_a$ is the equivalent transformation matrix that represents the convolution effect. Mathematically, we first truncate a $u \times u$ sub-matrix from $\mathbf {D}$, where the center of the sub-matrix is located at $(p,q)$. The coordinates $(p,q)$ are given by $p = \lfloor \frac {i-1}{v}\rfloor + 1$ and $q = \textrm {mod}(i-1,v) + 1$. Then, the sub-matrix is raster-scanned into a $u^2 \times 1$ vector, and this vector is the $i$th row of $\mathbf {D}_a$.
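This construction of the equivalent transformation matrix can be sketched in Python (using 0-based indexing rather than the 1-based indices above, and a zero-padded 'same'-size convolution, which are our assumptions about the intended boundary handling):

```python
import numpy as np

# Hedged sketch of the equivalence A (*) D = D_a . a: row i of D_a is the flipped,
# zero-padded u x u neighborhood of D centered at output pixel i, so that
# D_a @ vec(A) equals the raster-scanned 'same'-size convolution of A with D.
def conv_as_matrix(D, u):
    v = D.shape[0]
    k = (u - 1) // 2
    Dpad = np.pad(D, k)
    rows = []
    for r in range(v):
        for c in range(v):
            patch = Dpad[r:r + u, c:c + u]
            rows.append(patch[::-1, ::-1].ravel())   # flip: convolution, not correlation
    return np.array(rows)                            # shape (v*v, u*u)

rng = np.random.default_rng(0)
D = rng.standard_normal((5, 5))
A = rng.standard_normal((3, 3))
y_vec = conv_as_matrix(D, 3) @ A.ravel()             # vec(A (*) D), length v*v
```

Because the convolution is linear in the kernel entries, this matrix form is what turns the kernel initialization into an ordinary least squares problem.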

First, consider the initialization of $\widetilde {\mathbf {\Gamma }}$. According to Eq. (9), we have

$$\mathbf{Z} \boldsymbol{y} \approx \mathscr{T} \left\{ \frac{1}{P} \widetilde{\mathbf{\Gamma}} \otimes \widetilde{\mathscr{Y}} \right\} = \mathbf{H}_y \cdot \boldsymbol{\tau},$$
where $\boldsymbol{\tau } \in \mathbb {R}^{Q\times 1}$ ($Q=Q_x\times Q_y \times P$) is the raster-scanned vector of the 3D convolution kernel $\widetilde {\mathbf {\Gamma }}$, $\mathscr {T}\{ \cdot \}$ is the unfolding operator as defined in Eq. (9), and $\mathbf {H}_y$ is the equivalent transformation matrix that can be generated using the aforementioned method. Then, the vector $\boldsymbol{\tau }$ can be estimated by the least squares method
$$\hat{\boldsymbol{\tau}}=(\mathbf{H}^{\top}_y \mathbf{H}_y)^{{-}1} \mathbf{H}^{\top}_y \mathbf{Z}\boldsymbol{y}.$$

Then, the initial convolution kernel $\widetilde {\mathbf {\Gamma }}$ in all layers can be obtained by reshaping $\hat {\boldsymbol{\tau }}$.
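The least squares fit of Eq. (13) reduces to a standard linear solve. In the sketch below, the transformation matrix and target vector are random stand-ins for $\mathbf{H}_y$ and $\mathbf{Z}\boldsymbol{y}$ (the true ones come from the tomosynthesis model), and the kernel size is illustrative:

```python
import numpy as np

# Hedged sketch of Eq. (13): fit the kernel vector tau in the least squares sense.
rng = np.random.default_rng(2)
Hy = rng.standard_normal((64, 9))        # stand-in equivalent transformation matrix
Zy = rng.standard_normal(64)             # stand-in fully connected output Z y

# Numerically stable alternative to forming (Hy^T Hy)^{-1} Hy^T Zy explicitly
tau_hat, *_ = np.linalg.lstsq(Hy, Zy, rcond=None)
Gamma0 = tau_hat.reshape(3, 3)           # reshape tau_hat into the initial 2D kernel
```

At the least squares solution the residual is orthogonal to the columns of $\mathbf{H}_y$, i.e., the normal equations $\mathbf{H}_y^{\top}\mathbf{H}_y \hat{\boldsymbol{\tau}} = \mathbf{H}_y^{\top}\mathbf{Z}\boldsymbol{y}$ hold.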

The initialization of $\boldsymbol{\Phi }$ is implemented in a similar way. According to Eq. (10), we have

$$\mathbf{W} \boldsymbol{\theta} \approx \mathscr{T} \left\{ \boldsymbol{\Phi} \otimes \mathbf{\Theta} \right\} = \mathbf{H}_{\theta} \cdot \boldsymbol{\varphi},$$
where $\boldsymbol{\varphi }$ is the raster-scanned vector of $\boldsymbol{\Phi }$, and $\mathbf {H}_{\theta }$ is the equivalent transformation matrix. The convolution kernels $\boldsymbol{\Phi }$ in different layers are estimated successively. For the $k$th layer, the matrix $\mathbf {W}$ can be calculated by Eq. (7). Assuming we have initialized the convolution kernels $\mathbf {\Gamma }_1, \mathbf {\Gamma }_2, \ldots , \mathbf {\Gamma }_{k-1}$, the matrix $\mathbf {\Theta }_{k-1}$ can be calculated according to the flowchart in Fig. 3, and $\boldsymbol{\varphi }_k$ can be estimated as:
$$\boldsymbol{\varphi}_k=(\mathbf{H}^{\top}_{\theta} \mathbf{H}_{\theta})^{{-}1} \mathbf{H}^{\top}_{\theta} \mathbf{W} \boldsymbol{\theta}_{k-1}.$$

Then, the initial convolution kernel $\boldsymbol{\Phi }_k$ can be obtained by reshaping $\boldsymbol{\varphi }_k$.

4.3 Unsupervised end-to-end training of MDL

This section develops an unsupervised end-to-end training method with two merits. First, most supervised training methods require a time-consuming labelling process. To train the inverse tomosynthesis network, we would need to reconstruct the sparse coefficient vectors $\boldsymbol{\theta }_K$ for all training samples in advance. In principle, these sparse coefficients can be obtained using traditional reconstruction algorithms, which are computationally intensive and expensive in storage. The proposed unsupervised strategy averts the time-consuming labelling process by designing an auto-decoder for the feedforward MDL network.

Another merit of the proposed training method is that it can jointly optimize the coding masks and the neural network parameters. According to CS theory, the optimization of the coding masks plays an important role in improving the condition of the sensing matrix and the reconstruction performance of the tomosynthesis system [17]. Most existing compressive tomosynthesis methods use pre-defined sparse bases, including the discrete cosine transform (DCT) basis and wavelet bases. In addition, current methods treat the coding mask optimization and object reconstruction as two separate and independent steps. Typically, the coding masks are optimized beforehand according to the condition of the sensing matrix. Then, the coding masks are fixed, and iterative numerical algorithms are used to reconstruct the object. However, such methods fall short of jointly optimizing the coding masks, sparse basis and reconstruction process. In order to increase the degrees of optimization freedom, this paper proposes an end-to-end framework to jointly optimize the coding masks and the network parameters. The basic idea is to include the coding masks and sparse basis as two special layers in the MDL network, which are then trained together with the convolution kernels.

4.3.1 Cost function of unsupervised training

As shown in Fig. 3(a), each band of the measurement cube $\widetilde {\mathscr {Y}}$ can be calculated from the object vector $\boldsymbol{f}$ using Eq. (1). This is equivalent to adding a “coding mask layer” at the front of the MDL network, which is composed of the system matrices $\mathbf {H}_p$ and the code matrices $\mathbf {C}_p$. The input of the “coding mask layer” is the object vector $\boldsymbol{f}$. $\mathbf {H}_p$ is fixed, since it is determined by the sensing geometry of the tomosynthesis system. On the other hand, $\mathbf {C}_p$ is regarded as a variable matrix that can be optimized during the training process.

As mentioned above, $\mathbf {C}_p$ is an $M \times M$ diagonal matrix with binary elements. Let $\mathbf {C}_{pi}$ represent the $i$th diagonal element. In order to convert the training problem into a continuous one, we relax $\mathbf {C}_{pi}$ to a grey-scaled parameter $\alpha _{pi}$:

$$\mathbf{C}_{pi} = \dfrac{1+\cos\alpha_{pi}}{2},~~~\alpha_{pi}\in(-\infty,+\infty).$$

In the training process, $\alpha _{pi}$ is optimized using the gradient-based algorithm. After training, the binary coding masks are calculated based on Eq. (16) followed by a hard thresholding operation.
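A minimal sketch of the relaxation of Eq. (16) and the post-training binarization follows; the hard threshold of 0.5 is our assumption, since the paper does not specify the threshold value:

```python
import numpy as np

# Hedged sketch of Eq. (16): relax the binary code C_pi to a grey-scale value
# parameterized by alpha_pi, then binarize after training.
def relax(alpha):
    return (1.0 + np.cos(alpha)) / 2.0            # grey-scale code C_pi in [0, 1]

def binarize(alpha):
    return (relax(alpha) >= 0.5).astype(int)      # assumed hard threshold at 0.5

alpha = np.array([np.pi, 0.0])                    # fully blocked / fully transparent
grey = relax(alpha)                               # -> [0.0, 1.0]
codes = binarize(alpha)                           # -> [0, 1]
```

Because $\cos$ is smooth and unbounded in $\alpha_{pi}$, gradient-based updates remain well defined everywhere, while the range of $\mathbf{C}_{pi}$ stays in $[0, 1]$.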

Next, consider the training of the sparse basis. As shown in Fig. 3, the output of the feedforward CNN is the sparse coefficient matrix $\mathbf {\Theta }_K$ of the object, which can be vectorized as $\boldsymbol{\theta }_K$. If the network is well trained, the output $\boldsymbol{\theta }_K$ should be very close to the true value of the sparse coefficients. Then, the product of the sparse basis $\boldsymbol{\Psi }$ and $\boldsymbol{\theta }_K$ should approximate the ground truth of the object data. Based on this concept, an auto-decoder is inserted as shown in Fig. 3(b) to transform the sparse coefficients back to the object, which can be expressed as $\boldsymbol{\hat {f}}=\boldsymbol{\Psi } \boldsymbol{\theta }_K$. The decoder is regarded as a fully connected layer, whose weight matrix is equal to $\boldsymbol{\Psi }$.

Before the training process, we need to collect the training data set $\mathbf {F}=[\boldsymbol{f}^1,\boldsymbol{f}^2,\ldots ,\boldsymbol{f}^i,\ldots ]$, where $\boldsymbol{f}^i$ represents the vectorized representation of the $i$th training object. Feeding $\boldsymbol{f}^i$ into the network, the output from the decoder is the reconstructed object vector $\boldsymbol{\hat {f}}^i=T_D{\bigg \{} T_F{\bigg \{} T_C\{\boldsymbol{f}^i\} {\bigg \}}{\bigg \}}$, where $T_C\{ \cdot \}$, $T_F\{\cdot \}$ and $T_D\{ \cdot \}$ represent the operations of the coding mask layer, the feedforward CNN and the decoder, respectively. The goal of network training is to find the optimal coding masks, sparse basis and convolution kernels that minimize the distance between the input training samples $\boldsymbol{f}^i$ and their corresponding reconstruction results $\boldsymbol{\hat {f}}^i$. In order to accelerate the training process, we use a batch training method with batch size $B$. Then, the cost function of unsupervised training is formulated as

$$E=\frac{1}{2B} \sum_{j=1}^{B}{\bigg\|}\boldsymbol{f}^j-\frac{1}{\varepsilon}\boldsymbol{\hat{f}}^j{\bigg\|}_2^2=\frac{1}{2B} \sum_{j=1}^{B}{\bigg\|}\boldsymbol{f}^j-\frac{1}{\varepsilon}\boldsymbol{\Psi}\boldsymbol{\theta}_K^j{\bigg\|}_2^2,$$
where $\boldsymbol{\theta }_K^j$ is the output of the encoder for the $j$th training sample, and $\varepsilon$ is a normalization factor. In this paper, $\varepsilon$ is selected by a line search method to minimize the initial value of the cost function, thus accelerating the convergence of the training process. Then, the training problem can be formulated as
$${\bigg\{}\hat{\alpha}_{pi}, \hat{\mathbf{\Gamma}}_p, \hat{\boldsymbol{\Phi}}, \hat{\boldsymbol{\Psi}} {\bigg\}} = \arg \min_{\alpha_{pi}, \mathbf{\Gamma}_p, \boldsymbol{\Phi}, \boldsymbol{\Psi}} E.$$

It is noted that the proposed unsupervised training strategy can automatically learn from the training set $\mathbf {F}$ without any labelling.

4.3.2 Penalty terms

Besides the cost function described in Eq. (17), we introduce two penalty terms that constrain the solution space to satisfy some characteristics of the coding masks. The parameter transformation in Eq. (16) introduces grey-scaled coding masks, which are discretized to binary ones after training. In order to reduce the quantization error, a quadratic penalty term is included in the cost function:

$$R_Q = \sum_{p=1}^{P}\sum_{i=1}^{M}\left[1-(2\mathbf{C}_{pi}-1)^2\right],$$
where $\mathbf {C}_{pi}$ is described in Eq. (16). The penalty $R_Q$ approaches 0 only when each $\mathbf {C}_{pi}$ approaches 0 or 1, so $R_Q$ constrains the coding masks to be close to binary masks. In addition, another penalty term is incorporated to constrain the coding masks to a fixed transmittance. Assuming the desired transmittance is $\mu$, the transmittance penalty can be formulated as
$$R_T = \sum_{p=1}^{P}\left(\sum_{i=1}^{M} \mathbf{C}_{pi} - \mu M\right)^2,$$
where $M$ is the total number of elements on a coding mask.
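The two penalties of Eqs. (19) and (20) are straightforward to evaluate; a minimal NumPy sketch (function names are ours):

```python
import numpy as np

def quantization_penalty(C):
    """R_Q of Eq. (19): zero only when every mask element is 0 or 1,
    pushing the grey-scale mask toward a binary one.
    C: array of shape (P, M) with elements in [0, 1]."""
    return np.sum(1.0 - (2.0 * C - 1.0) ** 2)

def transmittance_penalty(C, mu):
    """R_T of Eq. (20): penalizes masks whose total transmittance
    deviates from the target mu * M."""
    M = C.shape[1]
    return np.sum((C.sum(axis=1) - mu * M) ** 2)

binary_masks = np.array([[0.0, 1.0, 1.0, 0.0],   # two toy masks, M = 4
                         [1.0, 0.0, 1.0, 0.0]])
# Already binary with 50% transmittance, so both penalties vanish.
```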

Taking into account the penalty terms, the cost function is extended to:

$$E' = E + \gamma_Q R_Q + \gamma_T R_T,$$
where $E$ is described in Eq. (17), and $\gamma _Q$ and $\gamma _T$ are the penalty weight coefficients. The training problem can then be rewritten as follows:
$$\left\{\hat{\alpha}_{pi}, \hat{\mathbf{\Gamma}}_p, \hat{\boldsymbol{\Phi}}, \hat{\boldsymbol{\Psi}} \right\} = \arg \min_{\alpha_{pi}, \mathbf{\Gamma}_p, \boldsymbol{\Phi}, \boldsymbol{\Psi}} E' = \arg \min_{\alpha_{pi}, \mathbf{\Gamma}_p, \boldsymbol{\Phi}, \boldsymbol{\Psi}} \left(E + \gamma_Q R_Q + \gamma_T R_T \right).$$

4.3.3 Back-propagation algorithm

This paper applies the back-propagation algorithm to solve the training problem in Eq. (22). At the beginning, the coding masks are initialized as Bernoulli random patterns with 50% transmittance. The parameter $\alpha _{pi}$ in Eq. (16) is initialized as $4\pi /5$ and $\pi /5$ for the blocked and unblocked coding mask pixels, respectively. The initial sparse basis $\boldsymbol{\Psi }$ is defined as the Kronecker product of a 2D wavelet transform basis in the $x$-$y$ plane and a 1D DCT basis along the $z$ axis [37]. The gradient of the cost function provides the direction in which to update the optimization variables. Let $\omega$ represent any optimization variable among $\alpha _{pi}$, $\mathbf {\Gamma }$, $\boldsymbol{\Phi }$, and $\boldsymbol{\Psi }$. Then, the variable $\omega$ can be iteratively updated as follows:

$$\omega = \omega - \textrm{step}_{\omega} \cdot \nabla E'|_\omega,$$
where $\nabla E'|_\omega$ is the gradient of the cost function with respect to $\omega$, and $\textrm{step}_{\omega }$ is the step size. After the training process, $\boldsymbol{\hat {\Gamma }}$ and $\boldsymbol{\hat {\Phi }}$ are used as the convolution kernels of the feedforward CNN. The optimized coding masks are used to modulate the structured illuminations, and $\boldsymbol{\hat {\Psi }}'=(1/\varepsilon )\boldsymbol{\hat {\Psi }}$ is used as the optimal basis to sparsely represent the object.
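The mask parametrization of Eq. (16), its initialization, and the update of Eq. (23) can be sketched as follows. This is a minimal NumPy illustration (function names and the zero toy gradient are ours; in the actual training the gradient comes from back-propagation through the network):

```python
import numpy as np

def mask_from_alpha(alpha):
    """Eq. (16): map unconstrained angles to mask values in [0, 1]."""
    return (1.0 + np.cos(alpha)) / 2.0

def init_alpha(P, M, seed=0):
    """Bernoulli initialization with 50% transmittance: blocked pixels
    get alpha = 4*pi/5 and unblocked pixels get alpha = pi/5."""
    rng = np.random.default_rng(seed)
    blocked = rng.random((P, M)) < 0.5
    return np.where(blocked, 4 * np.pi / 5, np.pi / 5)

def gradient_step(omega, grad, step):
    """Eq. (23): one gradient-descent step on an optimization variable."""
    return omega - step * grad

alpha = init_alpha(P=5, M=64 * 64)
C = mask_from_alpha(alpha)
# Blocked pixels start near 0 and unblocked pixels near 1, but C stays
# differentiable in alpha, so the masks can be trained by back-propagation:
alpha = gradient_step(alpha, grad=np.zeros_like(alpha), step=5.0)
```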

5. Simulation and analysis

This section provides simulations to verify the advantages of the proposed MDL approach. In Section 5.1, the training and testing data sets are generated by the Shepp-Logan phantom tools in MATLAB. In Section 5.2, the MDL method is compared to a traditional gradient-based reconstruction algorithm. In Section 5.3, the MDL is evaluated on a real Walnut data set. It is noted that this paper focuses on the typical tomosynthesis system with cone-beam X-ray illumination. Nevertheless, the principles and methodologies can be generalized to other sensing geometries.

5.1 Training and testing of MDL

The following simulations are based on the compressive X-ray tomosynthesis configuration in Fig. 1, with 5 cone-beam X-ray sources ($P=5$) distributed uniformly over a $3 \times 3$ array. A coding mask of dimension $M_x \times M_y = 64 \times 64$ is placed under each X-ray source, where the pitch of the coding elements is adjusted to obtain a one-to-one correspondence with the detector elements. The system matrices $\mathbf {H}_p$ and the projection measurements $\boldsymbol{y}_p$ for all X-ray sources are obtained by the ASTRA Tomography Toolbox (“All Scale Tomographic Reconstruction Antwerp”) [38]. The dimension of all convolution kernels in the $x$-$y$ plane is $7\times 7$. The penalty weight coefficients in Eqs. (19) and (20) are set to $1 \times 10^{-4}$. In this paper, the threshold value of the function $\sigma _\beta$ is chosen empirically based on simulations, and the threshold value in Eq. (11) is set to 0.1.

Training data play an important role in deep learning. In these simulations, we generate 36 Shepp-Logan phantom objects with dimension $N_x \times N_y \times N_z = 32 \times 32 \times 4$, where the locations and sizes of the elliptical features differ between phantoms. These samples are divided into four categories according to the number of internal ellipses. As shown in Fig. 4, we randomly select one sample from each category as testing data, and the remaining samples are used as training data. The batch size in the training process is $B=10$, and the MDL network is trained for 15 epochs. During training, each of the training samples in a batch is used to update the neural network parameters 20 times based on Eq. (23). The step size to update the sparse basis and convolution kernels is 0.001, and the step size to update the coding masks is 5.

Fig. 4. The (a) training data set and (b) testing data set for the MDL model.

Figure 5 shows the reconstruction error of the MDL with different numbers of layers. It is observed that the four-layer MDL model leads to better reconstruction results. In the testing stage, we first use the optimized coding masks to calculate the 2D tomosynthesis measurements of the objects. The measurement data are then used as the input of the feedforward CNN, which outputs the sparse coefficient vector $\hat {\boldsymbol{\theta }}_K$. Finally, the object is reconstructed as $\hat {\boldsymbol{\Psi }} \hat {\boldsymbol{\theta }}_K$, where $\hat {\boldsymbol{\Psi }}$ is the optimized sparse basis. Figure 6 illustrates the reconstructed objects in the testing data set. From left to right, it shows the four layers of the object along the $z$-axis. The peak signal-to-noise ratio (PSNR) is used to evaluate the quality of the reconstructed images. Given the original image $\mathbf {I}\in \mathbb {R}^{N \times N}$ and its reconstruction $\mathbf {R} \in \mathbb {R}^{N \times N}$, the PSNR is calculated as $\textrm {PSNR}=10\log _{10}(I_{max}^2/\textrm {MSE})$, where $I_{max}$ is the maximum possible pixel value of $\mathbf {I}$, and MSE is the mean square error of the reconstructed image, defined as $\textrm {MSE} = \frac {1}{N^2}\sum _{i=1}^{N}\sum _{j=1}^{N}[\mathbf {I}(i,j)-\mathbf {R}(i,j)]^2$. With the original image as the reference, the PSNRs of the reconstructed objects are shown in Fig. 6. Figure 7 shows the coding masks before and after optimization, as well as their difference patterns.
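The PSNR metric used above can be computed directly from its definition; a minimal NumPy sketch (the function name and toy images are ours):

```python
import numpy as np

def psnr(I, R):
    """Peak signal-to-noise ratio of reconstruction R against reference I,
    with I_max taken as the maximum pixel value of the reference."""
    mse = np.mean((I.astype(float) - R.astype(float)) ** 2)
    return 10.0 * np.log10(I.max() ** 2 / mse)

I = np.array([[0.0, 1.0], [1.0, 0.0]])
R = np.array([[0.1, 1.0], [1.0, 0.0]])
# MSE = 0.01 / 4 = 0.0025, so PSNR = 10 * log10(1 / 0.0025) ≈ 26.02 dB
```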

Fig. 5. The resulting training error corresponding to different numbers of convolution layers.

Fig. 6. The reconstructed images of the testing objects obtained by the proposed MDL model.

Fig. 7. The optimization results of coding masks using the MDL method: (a) the initial coding masks, (b) the optimized coding masks, and (c) the difference patterns between them.

5.2 Comparison between MDL and traditional reconstruction approach

This section compares the proposed MDL approach to the traditional CS reconstruction approach. In the traditional method, the coding masks are first optimized based on the RIP in CS theory. The optimized coding masks are then used to modulate the X-ray illuminations, and the compressive measurements are obtained on the detector [16,17]. The gradient projection for sparse reconstruction (GPSR) algorithm is used to reconstruct the object from the compressive measurements, aided by the sparsity prior. GPSR is selected because it has been verified to achieve superior or comparable reconstruction performance among multiple CS algorithms [39].

Figure 8 shows the reconstructed objects in the testing data set using the traditional CS method, with 3000 iterations of the GPSR algorithm. Comparing Figs. 6 and 8, MDL achieves at least a 7 dB improvement over the traditional CS method. This is because the trained coding masks and sparse basis characterize the testing data well, and the output of MDL can be regarded as the result of a large number of iterations of a traditional CS algorithm. Figure 9 shows the coding masks optimized by the traditional method. Table 1 compares the runtimes of coding mask optimization by the proposed MDL method and the traditional CS method; the proposed MDL method achieves a speedup of more than 14-fold. In addition, Table 1 compares the average reconstruction runtimes of the MDL and gradient-based approaches over the whole testing data set, where the proposed MDL method achieves a speedup of more than 45-fold.

Fig. 8. The reconstructed images of the testing objects obtained by the traditional CS algorithm.

Fig. 9. The optimization results of coding masks using the traditional method: (a) the initial coding masks, (b) the optimized coding masks, and (c) the difference patterns between them.

Table 1. Runtimes of optimizing coding masks by MDL (K = 4) and gradient-based approaches, and average runtimes of reconstruction by MDL and gradient-based approaches over the whole testing data set.

5.3 Simulation using real objects

In this subsection, the proposed method is tested on a real Walnut phantom data set obtained by an experimental tomography system, a Rigaku CT Lab GX 130 with a 130 kV X-ray generator. A total of 240 different Walnut phantoms are generated, each reconstructed from 459 projection measurements. We randomly select 95% of the Walnut phantoms for training, and the rest are used for testing. To expand the data set, each Walnut object is uniformly divided into 55 sub-objects along the $z$-axis, each including 4 slices. After re-sampling the data cubes, each sub-object has dimension $64 \times 64 \times 4$. To speed up the training, the proposed method was implemented in Python using PyTorch. The batch size is 50 in the training stage, and the MDL model is trained for 200 epochs. The step size to update the sparse basis and convolution kernels is 0.0001, and the step size for the coding mask is 1. The rest of the parameters are consistent with the previous simulations. Note that we conducted the experiment multiple times with shuffled data, and the results remained consistent. For complex tomosynthesis reconstruction problems, MDL may not provide the final solution directly, but it provides a good initial guess. In the following experiments, the MDL is therefore used to obtain approximate reconstructions of the sub-objects, and its output is used as the initial guess for 20 iterations of the GPSR algorithm, which refine the solution and improve the robustness and accuracy of the reconstruction results.
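The warm-start refinement can be illustrated with a simplified sparse solver. The NumPy sketch below substitutes an IHT-style projected-gradient step for the actual GPSR algorithm used in the paper, and the names and toy problem are illustrative only:

```python
import numpy as np

def refine(theta0, A, y, S, n_iter=20):
    """Refine a warm start theta0 by a few projected-gradient iterations
    on ||y - A @ theta||^2 with a hard S-sparsity constraint
    (an IHT-style stand-in for the GPSR refinement stage)."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / largest singular value^2
    theta = theta0.copy()
    for _ in range(n_iter):
        theta = theta - step * A.T @ (A @ theta - y)
        small = np.argsort(np.abs(theta))[:-S]   # all but the S largest
        theta[small] = 0.0
    return theta

# Toy problem: the warm start is close to a 2-sparse ground truth,
# mimicking an approximate MDL reconstruction.
theta_true = np.zeros(10)
theta_true[[2, 7]] = [1.0, -0.8]
A = np.eye(10)
y = A @ theta_true
theta0 = theta_true + 0.05 * np.ones(10)   # imperfect warm start
theta = refine(theta0, A, y, S=2)
```

Because the warm start already lies near the solution, a handful of iterations suffices, which is the efficiency argument made in the text.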

As a comparison, the traditional CS algorithm with 3000 GPSR iterations is also used to reconstruct the sub-objects under the initial settings. Several representative reconstruction results are shown in Fig. 10. The proposed MDL approach achieves better reconstruction performance, since MDL provides good initial images for the subsequent iterations. Table 2 compares the average runtimes and average PSNRs over the whole testing data set. Figure 11 shows the coding masks optimized by the MDL method. The proposed MDL method achieves a speedup of more than 284-fold and an average improvement of 9 dB over the traditional CS method.

Fig. 10. Comparison of the reconstruction results on two different 3D Walnut phantoms by the traditional CS approach and the proposed MDL model.

Fig. 11. The optimization results of coding masks by the proposed MDL model: (a) the initial coding masks, (b) the optimized coding masks, and (c) the difference patterns between them.

Table 2. Average runtimes and PSNRs of MDL (K = 4) and gradient-based approaches over the whole testing data set.

6. Conclusion

A novel MDL method was developed for the multiple-shot X-ray tomosynthesis system to effectively improve the reconstruction quality and speed. The IHT algorithm and the forward imaging model were used to build the architecture of the MDL model. The MDL model is an end-to-end framework, so it avoids the labeling task. In addition, the network structure and parameters of the current MDL method are derived and initialized from the imaging model, so each parameter of the network has a corresponding physical meaning. Since the structure and initialization of the network are model-driven, it is more reliable than random or empirical initialization. Beyond that, this paper developed a unified framework to jointly optimize the coding masks and the neural network parameters. In future work, we will generalize the proposed MDL to other algorithms, as well as to fan-beam X-ray systems.

Funding

Fundamental Research Funds for the Central Universities (2018CX01025); National Science Foundation (CIF 1717578).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. A. C. Kak and M. Slaney, Principles of Computerized Tomographic Imaging (Society for Industrial and Applied Mathematics, 2011).

2. J. T. Dobbins III and D. J. Godfrey, “Digital X-ray tomosynthesis: current state of the art and clinical potential,” Phys. Med. Biol. 48(19), R65–R106 (2003). [CrossRef]  

3. F. Elarnaut, J. P. O. Evans, D. Downes, A. J. Dicken, S. X. Godber, and K. D. Rogers, “Sporadic absorption tomography using a conical shell X-ray beam,” Opt. Express 25(26), 33029–33042 (2017). [CrossRef]  

4. H. Gao, L. Zhang, Z. Chen, Y. Xing, H. Xue, and J. Cheng, “Straight-line-trajectory-based X-ray tomographic imaging for security inspections: system design, image reconstruction and preliminary results,” IEEE Trans. Nucl. Sci. 60(5), 3955–3968 (2013). [CrossRef]  

5. T. M. Buzug, Computed Tomography: from Photon Statistics to Modern Cone-Beam CT (Springer, 2008).

6. E. Ron, “Cancer risks from medical radiation,” Health Phys. 85(1), 47–59 (2003). [CrossRef]  

7. I. Reiser and S. Glick, Tomosynthesis Imaging (Taylor and Francis, 2014).

8. H. Kudo, T. Suzuki, and E. A. Rashed, “Image reconstruction for sparse-view CT and interior CT-introduction to compressed sensing and differentiated backprojection,” Quant. Imag. Med. Surg. 3(3), 147–161 (2013). [CrossRef]  

9. A. A. Samarskii and P. N. Vabishchevich, Numerical Methods for Solving Inverse Problems of Mathematical Physics (Walter de Gruyter, 2007).

10. F. Natterer, The Mathematics of Computerized Tomography (New York, 1986).

11. K. Choi and D. J. Brady, “Coded aperture computed tomography,” Proc. SPIE 7468, 74680B (2009). [CrossRef]  

12. A. A. M. Muñoz, A. Vella, M. J. F. Healy, D. W. Lane, I. Jupp, and D. Lockley, “3D-printed coded apertures for X-ray backscatter radiography,” Proc. SPIE 10393, 103930F (2017). [CrossRef]  

13. A. Holmgren, “Coding strategies for X-ray tomography,” Dissertation, Duke University (2016).

14. Wolfmet, “Precision tungsten collimators from Wolfmet,” https://static.mimaterials.com/SLMflyerweb.pdf.

15. Y. Kaganovsky, D. Li, A. Holmgren, H. Jeon, K. MacCabe, D. Politte, J. O’Sullivan, L. Carin, and D. J. Brady, “Compressed sampling strategies for tomography,” J. Opt. Soc. Am. A 31(7), 1369–1394 (2014). [CrossRef]  

16. A. P. Cuadros, C. Peitsch, H. Arguello, and G. R. Arce, “Coded aperture optimization for compressive X-ray tomosynthesis,” Opt. Express 23(25), 32788–32802 (2015). [CrossRef]  

17. A. P. Cuadros and G. R. Arce, “Coded aperture optimization in compressive X-ray tomography: a gradient descent approach,” Opt. Express 25(20), 23833–23849 (2017). [CrossRef]  

18. X. Ma, Q. Zhao, A. P. Cuadros, T. Mao, and G. R. Arce, “Source and coding mask joint optimization for compressive X-ray tomosynthesis,” Opt. Express 27(5), 6640–6659 (2019). [CrossRef]  

19. T. Mao, A. P. Cuadros, X. Ma, W. He, Q. Chen, and G. R. Arce, “Fast optimization of coded apertures in X-ray computed tomography,” Opt. Express 26(19), 24461–24478 (2018). [CrossRef]  

20. T. Mao, A. P. Cuadros, X. Ma, W. He, Q. Chen, and G. R. Arce, “Coded aperture optimization in X-ray tomography via sparse principal component analysis,” IEEE Trans. Comput. Imaging 6, 73–86 (2020). [CrossRef]  

21. Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature 521(7553), 436–444 (2015). [CrossRef]  

22. Z. Wang, Q. Ling, and T. S. Huang, “Learning deep l0 encoders,” in Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI Press, 2016), pp. 2194–2200.

23. K. Gregor and Y. LeCun, “Learning fast approximations of sparse coding,” in Proceedings of the 27th International Conference on Machine Learning, (ICML, 2010), pp. 399–406.

24. P. Sprechmann, A. M. Bronstein, and G. Sapiro, “Learning efficient sparse and low rank models,” IEEE Trans. Pattern Anal. 37(9), 1821–1833 (2015). [CrossRef]  

25. B. Xin, Y. Wang, W. Gao, and D. Wipf, “Maximal sparsity with deep networks,” arXiv preprint arXiv:1605.01636 (2016).

26. S. Sundaravelpandian, S. Johan, and G. Philipp, “Deep-learning neural-network architectures and methods: Using component-based models in building-design energy prediction,” Adv. Eng. Inform. 38, 81–90 (2018). [CrossRef]  

27. Y. Cheng, D. Wang, P. Zhou, and T. Zhang, “A survey of model compression and acceleration for deep neural networks,” arXiv preprint arXiv:1710.09282 (2017).

28. S. Amari, The Handbook of Brain Theory and Neural Networks (MIT Press, 2003).

29. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” Commun. ACM 60(6), 84–90 (2017). [CrossRef]  

30. O. Abdel-Hamid, A. Mohamed, H. Jiang, L. Deng, G. Penn, and D. Yu, “Convolutional neural networks for speech recognition,” IEEE-ACM Trans. Audio, Speech, and Language Processing 22(10), 1533–1545 (2014). [CrossRef]  

31. Y. Yang, J. Sun, H. Li, and Z. Xu, “Deep ADMM-Net for compressive sensing MRI,” in Proceedings of the 30th international conference on neural information processing systems, (2016), pp. 10–18.

32. H. Wang, Q. Xie, Q. Zhao, and D. Meng, “A model-driven deep neural network for single image rain removal,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2020), pp. 3103–3112.

33. Z. Wei and X. Chen, “Physics-inspired convolutional neural network for solving full-wave inverse scattering problems,” IEEE Trans. Antennas Propag. 67(9), 6138–6148 (2019). [CrossRef]  

34. E. Candès and M. B. Wakin, “An introduction to compressive sampling,” IEEE Signal Proc. Mag. 25(2), 21–30 (2008). [CrossRef]  

35. J. Zhang, C. Zhao, D. Zhao, and W. Gao, “Image compressive sensing recovery using adaptively learned sparsifying basis via L0 minimization,” Signal Processing 103, 114–126 (2014). [CrossRef]  

36. J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proceedings of the IEEE conference on computer vision and pattern recognition, (IEEE, 2015), pp. 3431–3440.

37. G. R. Arce, D. J. Brady, L. Carin, H. Arguello, and D. S. Kittle, “Compressive coded aperture spectral imaging: an introduction,” IEEE Signal Proc. Mag. 31(1), 105–115 (2014). [CrossRef]  

38. W. Xu, F. Xu, M. Jones, B. Keszthelyi, J. Sedat, D. Agard, and K. Mueller, “High-performance iterative electron tomography reconstruction with long-object compensation using graphics processing units,” J. Structural Biol. 171(2), 142–153 (2010). [CrossRef]  

39. Z. Wang, X. Ma, R. Chen, G. R. Arce, L. Dong, H. J. Stock, and Y. Wei, “Comparison of different lithographic source optimization methods based on compressive sensing,” Proc. SPIE 11327, 1132716 (2020). [CrossRef]  


