Abstract
Among the many super-resolution techniques for microscopy, single-molecule localization microscopy methods are widely used. This technique raises the difficult question of precisely localizing fluorophores from a blurred, under-resolved, and noisy acquisition. In this work, we focus on the grid-based approach in the context of a high density of fluorophores, formalized by an ℓ2 least-squares term and a sparsity term modeled by the ℓ0 pseudo-norm. We consider both the constrained and the penalized formulations. Based on recent results, we reformulate the ℓ0 pseudo-norm as a convex minimization problem by introducing an auxiliary variable. An exact biconvex reformulation of the ℓ2 − ℓ0 constrained and penalized problems is proposed, together with a minimization algorithm. The resulting algorithms, named CoBic (Constrained Biconvex) and PeBic (Penalized Biconvex), are applied to the problem of single-molecule localization microscopy, and we compare the results with other recently proposed methods.
© 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement
1. Introduction
Single-molecule localization microscopy (SMLM) is an acquisition method that makes it possible to obtain images with a resolution beyond the diffraction limit. Ernst Abbe first derived the diffraction equation in 1873, which gives the resolution limit of a microscope in the lateral plane. The limit is given as
where $\lambda$ is the wavelength of the light and $NA$ is the numerical aperture of the optical system [1]. This limit is around 200 nm in the lateral plane of the microscope, which is not sufficient when observing small biological structures such as proteins and viruses. An electron microscope has a resolution down to 0.2 nm, but the samples must be pretreated and fixed, so in-vivo imaging (imaging of living structures) is impossible. Among the many super-resolution techniques for microscopy (STED, SIM, etc., see [2]), SMLM is widely used. A resolution of 20 nm was reported when SMLM was first introduced [3–5], and it can thus be used to observe fine structures.

SMLM uses photoactivatable fluorophores. This allows one to activate a sparse subsample of the fluorophores instead of illuminating all of them at the same time, as in standard fluorescence microscopy. The probability that two fluorophores lie within the same Point Spread Function (PSF) is low when only a few fluorophores are activated (low-density images), and precise localization of each is therefore possible. However, obtaining the image of the whole sample requires acquiring a large number of low-density images, which takes time, and the temporal resolution of the image may be inadequate. High-density acquisitions reduce the total acquisition time and increase temporal resolution. However, the fluorophores may be too close to each other, and thus efficient localization algorithms are needed.
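As a quick numerical illustration of the Abbe limit above (the wavelength and numerical aperture below are illustrative choices, not values from the paper):

```python
def abbe_limit(wavelength_nm: float, na: float) -> float:
    """Abbe lateral resolution limit d = lambda / (2 * NA), in nm."""
    return wavelength_nm / (2.0 * na)

# Illustrative values: a green-emitting fluorophore observed through a
# high-NA oil-immersion objective.
print(f"{abbe_limit(560.0, 1.4):.0f} nm")  # ~200 nm
```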
The ISBI 2013 [6] and 2016 [7] SMLM challenges addressed this problem and compared the localization performed by different algorithms. There are many approaches to solve the problem. Some algorithms localize spots in the observed image and fit specific shapes (templates, PSF) to these spots [8]. Other algorithms compare the spot to a pre-defined Gaussian function [9], and other, more sophisticated algorithms test the likeness between each spot and a sum of different Gaussians [10]. Another approach consists of modeling the localization problem as an inverse problem with a sparsity term. These methods can be grouped into two categories: grid-based methods and grid-less methods. The grid-based methods require a fine grid to obtain sufficient precision, but at the cost of computational time. Grid-less algorithms do not have this problem, but their observation model is non-linear, whereas with a grid the observation model is linear. Examples of the inverse-problem approach include [11,12].
In this article, we focus on grid-based inverse problem approaches. We work with this formulation mainly because of its versatility: inverse problems with a sparsity term have many applications beyond SMLM, such as geophysics, source separation, variable selection, and many more.
2. Direct and inverse problem
2.1 The image formation model
Let $d\in \mathbb {R}^{M \times M}$ be an image acquisition. We suppose that only a small number of fluorophores have contributed to this image. We want to localize the fluorophores on a finer grid $x\in \mathbb {R}^{ML \times ML}$, where $L$ is the refinement factor. The goal is to reconstruct $x$ from $d$. To do so, we need to model the acquisition process. The fluorophores are observed through an optical system, and thus we observe diffraction discs instead of the exact positions of the fluorophores. This is modeled by a convolution with the Point Spread Function (PSF) of the microscope. We suppose the PSF to be a Gaussian kernel:
where $I$ is a normalization factor. Furthermore, a sensor captures the observation at a lower resolution than the fine grid. We model this as an operator that sums pixel groups of $L\times L$. The result is an observation of size $M \times M$. Finally, this observation is affected by noise $\eta$, which is assumed to be a mix of Poisson noise and Gaussian noise. We simplify the noise assumptions and consider only Gaussian noise. The model can be written as
where $A\in \mathbb {R}^{M^{2}\times (ML)^{2}}$ is the matrix that performs the convolution and downsampling.
2.2 Formulation of the inverse problem
With the assumption of Gaussian noise we can write the recovering of $x$ as
The $\ell _2$ norm is defined in Notations at the end of section 2. However, $A$ is a matrix with more columns than rows, and thus the problem is underdetermined and ill-posed. A priori knowledge of $x$ is needed to correctly localize the fluorophores. The first hypothesis is that only a few fluorophores are excited and emit light; the solution should therefore be sparse. We can use the $\ell _0$ pseudo-norm to enforce sparsity. By abuse of terminology, it will be referred to as the $\ell _0$ norm. It is defined as

The second hypothesis is that the fluorophores emit light, and we wish to reconstruct the intensity, which is positive. We therefore add that the solution should be non-negative. Thus we can search for a solution $\hat {x}$ as

2.3 State-of-the-art of sparse optimization
The two problems (2) and (3) are neither continuous nor convex, and they are known to be NP-hard due to the combinatorial nature of the $\ell _0$ norm. Due to numerous applications in domains such as geophysics, variable selection, and machine learning (see, among others, [13,14]), they have been widely studied. We present below some methods to solve minimization problems with an $\ell _0$ norm.
Greedy algorithms Greedy algorithms are widely used in sparse optimization. These algorithms add one component to the signal $x$ at each iteration; the criterion for choosing a component depends on the algorithm. One of the simplest and least costly greedy algorithms, the Matching Pursuit (MP) algorithm [15], adds, at each iteration $s$, the component that most reduces the residual $R$, defined as $R=d-Ax^{(s)}$. More sophisticated algorithms have been developed: Greedy sparse simplex [16] or Single Best Replacement (SBR) [17] can, in contrast to MP, both add and remove components.
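A minimal sketch of the MP iteration described above (an illustrative implementation, not the code of [15]):

```python
import numpy as np

def matching_pursuit(A, d, n_iter):
    """Minimal Matching Pursuit sketch: at each iteration, pick the column
    (atom) of A most correlated with the residual R and update x along it."""
    x = np.zeros(A.shape[1])
    R = d.astype(float).copy()
    norms = np.linalg.norm(A, axis=0)
    for _ in range(n_iter):
        corr = A.T @ R / norms            # normalized correlations with R
        i = int(np.argmax(np.abs(corr)))  # best atom
        coef = corr[i] / norms[i]         # least-squares step along column i
        x[i] += coef
        R = R - coef * A[:, i]            # update the residual
    return x

print(matching_pursuit(np.eye(3), np.array([3.0, 0.0, 1.0]), 2))  # [3. 0. 1.]
```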
Relaxations The non-convexity is due to the $\ell _0$ norm. The convex $\ell _1$ norm can replace it, as it is known to promote sparsity; the $\ell _2-\ell _0$ problem then becomes an $\ell _2-\ell _1$ problem. This approach is called a convex relaxation since a convex term replaces the non-convex term. The solution to the original problem is, however, not always the same as that of the convex relaxed problem: the two coincide only under restrictive conditions on the matrix $A$ [18], which are not satisfied here. Non-smooth, non-convex, but continuous relaxations were introduced primarily to avoid the difference between $\|x\|_0$ and $\|x\|_1$ when $x$ contains large values. Among them we find, to mention some, the NonNegative Garrote, the Log-Sum penalty, Capped-$\ell _1$, and the Continuous Exact $\ell _0$ (CEL0) penalty, which is an exact relaxation of the problem (3). The relaxed problems are still non-convex, and convergence of the algorithms to a global minimum is not ensured. A unified view of the non-convex continuous relaxations for the penalized problem is given in [19].
Mathematical program with equilibrium constraint A more recent way of solving a sparse optimization problem is to introduce auxiliary variables that mimic the behavior of the $\ell _0$ norm. This requires a constraint between the primary variable and the auxiliary one; the problem hence becomes a mathematical program with equilibrium constraint. Among these approaches, we find Mixed Integer reformulations [20] (designed for small-dimension problems) and Boolean relaxation [21]. Articles such as [22–25] use reformulations of the $\ell _0$ norm similar to ours.
Contribution: The aim of this paper is to present two new techniques for SMLM. We start in section 3 by introducing a reformulation of the $\ell _0$ norm. We rewrite the norm as a convex minimization problem by introducing an auxiliary variable. Thus, we can reformulate (2) and (3) as a mathematical program with equilibrium constraint (MPEC). Based on the reformulation of the problem, we define a Lagrangian cost function $G_\rho$. The function $G_\rho : \mathbb {R}^{N}\times \mathbb {R}^{N}\rightarrow \mathbb {R}$ is biconvex. In section 4, we propose a Constrained Biconvex (CoBic) algorithm and a Penalized Biconvex algorithm (PeBic) to minimize the new objective function. The algorithms are based on already existing and well-known algorithms, and thus easy to implement. In section 5, we test the algorithms on the problem of SMLM and compare them to the state-of-the-art methods Deep-STORM [26] and IRL1-CEL0 [27], as well as the well-known IHT algorithms [28]. The comparison done on the real dataset confirms the efficiency of CoBic and PeBic. Section 7 gives the mathematical justification of the algorithm.
Note that the reformulation of the $\ell _0$ norm was presented in [25], and slight variations have been previously presented in [22–25]. Our work differs from theirs, as [22] and [23] did not work on the minimization problems (2) and (3), while [24,25] study the minimization of the sum of a Lipschitz-continuous data term and a sparsity constraint. In this paper, the data term is the squared norm, which is not Lipschitz continuous.
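The auxiliary-variable reformulation just mentioned (recalled in section 3, in the form $\|x\|_0 = \min \{\|u\|_1 : -\textbf {1}\leq u \leq \textbf {1},\, <x,u>=\|x\|_1\}$, following [25]) can be illustrated numerically; the helper names below are illustrative:

```python
import numpy as np

def l0_via_auxiliary(x):
    """Numerical check of the auxiliary-variable reformulation:
    ||x||_0 = min { ||u||_1 : -1 <= u <= 1, <x,u> = ||x||_1 }.
    The coupling constraint forces u_i = sign(x_i) wherever x_i != 0, and
    u_i = 0 elsewhere minimizes ||u||_1, so the minimum counts the nonzeros."""
    u = np.sign(x)                              # the minimizing auxiliary variable
    assert np.isclose(x @ u, np.abs(x).sum())   # constraint <x,u> = ||x||_1
    return np.abs(u).sum()

x = np.array([0.0, 2.5, -0.3, 0.0, 1.0])
print(l0_via_auxiliary(x), np.count_nonzero(x))  # both equal 3
```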
Notations:
- • The $\ell _2$ norm of a vector $x$ is denoted $\|\cdot \|_2$ and defined as $\sqrt {\sum _i x_i^{2}}$.
- • To simplify the notation we will set $\|\cdot \|=\|\cdot \|_2$.
- • The $\ell _1$ norm of a vector $x$ is denoted $\|\cdot \|_1$ and defined as $\sum _i |x_i|$.
- • $A$ is a matrix in $\mathbb {R}^{P \times N}$, $P<N$. In this paper $P=M\times M$ and $N=ML\times ML$.
- • $A^{T}$ is the transposed matrix of $A$.
- • If not stated otherwise, the vector $x\in \mathbb {R}^{N}$.
- • The observed signal $d\in \mathbb {R}^{P}$.
- • For a matrix $A \in \mathbb{R}^{P \times N}$, the singular value decomposition (SVD) of $A$ is noted $A=U_A\Sigma (A)V_A^{*}$.
- • $-\textbf {1}\leq u \leq \textbf {1}$ is a component-wise notation, i.e, $\forall \, \, i, \, -1\leq u_i\leq 1$.
- • $|x|\in \mathbb {R}^{N}$ is a vector containing the absolute value of each component of the vector $x$.
3. Exact reformulation
In this section, we focus on a reformulation of the $\ell _0$ norm. [25] first introduced this formulation, rewriting the $\ell _0$ norm as a convex minimization problem by adding an auxiliary variable. We can write the $\ell _0$ norm of any $x \in \mathbb {R}^{N}$ as
We define the Lagrangian cost function $G_\rho (x,u): \mathbb {R}^{N}\times \mathbb {R}^{N} \rightarrow \mathbb {R}$ as
In this paper, we are interested in exact reformulation methods. This means that any minimizer of (9) must also be a minimizer of (8). We show in this paper that for $\rho >\sigma (A)\|d\|$, with $\sigma (A)$ the largest singular value of $A$, $G_\rho (x,u)$ is exact (see Section 7).
4. A minimization algorithm
The minimization algorithm is presented in this section. We refer to the constrained biconvex algorithm by CoBic and to the penalized biconvex algorithm by PeBic.
The main body of the algorithm depends on two particularities of $G_\rho$: $G_\rho$ is convex when $\rho =0$, and the non-convexity of $G_\rho$ is due to the coupling term $<x,u>$. These two properties inspire the idea of an algorithm for minimizing $G_\rho (x,u)$. The minimization is initialized with a small $\rho ^{(0)}$ and with $(x^{(0)},u^{(0)})$ as zero vectors. We minimize $G_{\rho ^{(0)}}(x^{(0)},u^{(0)})$ and denote the result $(x^{(1)},u^{(1)})$. The penalty parameter $\rho$ increases at each iteration. For a given iteration $p$, we minimize $G_{\rho ^{(p)}}(x^{(p)},u^{(p)})$, with $(x^{(p)},u^{(p)})$ the solution of $\arg \min G_{\rho ^{(p-1)}}(x^{(p-1)},u^{(p-1)})$. This scheme will hopefully give a proper initialization for the final minimization, which is when $\rho >\sigma (A)\|d\|$. The second attractive property of the functional $G_\rho$ is its biconvexity; minimization by blocks is therefore suitable. With this in mind, and following [25], we propose the following algorithm.
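The outer loop just described can be sketched as follows. This is a structural sketch only, not the paper's exact Algorithm 1: `minimize_G` stands for the inner biconvex (alternating) minimization, and the geometric growth factor is an illustrative choice.

```python
import numpy as np

def outer_loop(A, d, minimize_G, rho0=1.0, growth=2.0):
    """Homotopy-style outer loop: rho grows geometrically, each stage
    warm-starts from the previous minimizer, and a final stage is run once
    rho exceeds the exactness threshold sigma(A) * ||d||_2."""
    rho_max = np.linalg.norm(A, 2) * np.linalg.norm(d)  # sigma(A) * ||d||_2
    x = np.zeros(A.shape[1])
    u = np.zeros(A.shape[1])
    rho = rho0
    while rho <= rho_max:
        x, u = minimize_G(x, u, rho)      # warm start for the next stage
        rho *= growth
    return minimize_G(x, u, rho)          # final stage: rho > sigma(A)||d||_2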
The Proximal Alternating Minimization algorithm (PAM) [29] minimizes (10). PAM ensures convergence to a critical point, and thus Algorithm 1 converges to a critical point.
The PAM minimizes functions of the form
In our case, we have $f(x)=\frac {1}{2}\|Ax-d\|^{2}+\rho \|x\|_1+ \iota _{\cdot \geq 0}(x)$, $g(u)=I(u)$ and $Q(x,u)=-\rho <x,u>$. PAM has the following outline.
4.1 The minimization with respect to $x$
The minimization with respect to $x$ using PAM is
4.2 The minimization with respect to $u$
We study how to find a solution to the convex minimization problem
First, we work with the constrained biconvex formulation of $G_\rho$, CoBic; $I(u)$ is thus defined by (6). The minimization problem (17) can be written as
5. Numerical experiments
In this section, we perform numerical experiments with CoBic and PeBic and compare them to other grid-based localization algorithms. They are compared on two datasets from the ISBI 2013 SMLM challenge [6]. A more recent challenge was launched in 2016; we decided to use the 2013 challenge as its data is denser (220 fluorophores per acquisition in 2013 compared to 12 in the 2016 challenge). The algorithms are implemented in MATLAB 2015 and run on a Linux computer with an Intel Core i7-3920XM CPU, except Deep-STORM, which was run on a Linux computer with an Intel Xeon E5-2687W v3 CPU. The code for PeBic and CoBic, as well as an example, can be found at: https://github.com/abechens/CoBic-and-PeBic-SMLM/
CoBic and PeBic are compared with four algorithms: IRL1-CEL0 [27], Deep-STORM [26], and the standard Iterative Hard Thresholding (IHT) [28] applied to the constrained and penalized forms. IRL1-CEL0 is an algorithm that uses IRL1 [33] to minimize an exact relaxation [11] of the penalized formulation (3). Deep-STORM is an algorithm that uses deep learning to localize the fluorophores; we use the public code of Deep-STORM [34]. Note that a relaxation in which the $\ell _0$ norm in (3) is replaced by the $\ell _1$ norm is possible; however, this relaxation is not exact, and a full comparison is not carried out. Table 3 in the Appendix shows the Jaccard index for this reconstruction. All the algorithms, except Deep-STORM, are based on the constrained or penalized problem. IRL1-CEL0, penalized IHT, and PeBic have a trade-off parameter $\lambda$ to choose. It is not possible to select this parameter beforehand, and substantial trial and error is needed to obtain a satisfying result. CoBic and the constrained IHT algorithm use the sparsity parameter $k$, which ensures that the algorithm does not reconstruct more than $k$ non-zero elements in the solution. The parameter $k$ may be easier to use, principally for two reasons: the user may have an idea of the upper bound on the number of excited fluorophores in each acquisition, and if the algorithm reconstructs too many fluorophores, the user can easily reduce $k$, whereas it is unclear by how much $\lambda$ should be changed to obtain the wished-for result. Deep-STORM is a deep-learning algorithm: to train it, the user needs to create proper simulated images that represent the dataset. This requires an idea of the density, knowledge of the PSF, as well as an estimation of the noise level.
5.1 Results of the simulated dataset
The first dataset contains simulated acquisitions, which makes it possible to quantitatively evaluate the reconstruction, while the second contains real acquisitions. Figure 1 (a), (b) and (c) show three of the 361 acquisitions of the simulated dataset. We apply the localization algorithms to each image of the acquisition dataset; the sum of the localization results over the 361 acquisitions yields the super-resolution image. We use the Jaccard index in order to evaluate the reconstructions numerically. The Jaccard index evaluates the localization of the reconstructed fluorophores (see [6]), and is defined as the ratio between the number of correctly reconstructed (CR) fluorophores and the sum of the CR, false negative (FN), and false positive (FP) fluorophores. A perfect reconstruction yields an index of 1, and the lower the index, the poorer the reconstruction. Furthermore, the Jaccard index includes an error tolerance in its calculation.
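A simplified sketch of such a tolerance-based Jaccard evaluation is given below. It uses greedy nearest-neighbor matching, whereas the challenge software uses an optimal assignment; all names and details are illustrative.

```python
import numpy as np

def jaccard_index(est, gt, tol):
    """Jaccard index with an error tolerance: a reconstructed point counts as
    correctly reconstructed (CR) if it lies within `tol` (nm) of a not-yet-
    matched ground-truth point."""
    est = np.asarray(est, dtype=float)
    gt = np.asarray(gt, dtype=float)
    unmatched = list(range(len(gt)))      # ground-truth points not yet matched
    cr = 0
    for p in est:
        if not unmatched:
            break
        dists = [np.linalg.norm(p - gt[j]) for j in unmatched]
        k = int(np.argmin(dists))
        if dists[k] <= tol:
            cr += 1
            unmatched.pop(k)
    fp = len(est) - cr                    # false positives
    fn = len(gt) - cr                     # false negatives
    return cr / (cr + fp + fn)
```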
The ISBI simulated dataset represents eight tubes of 30 nm diameter. The acquisitions are captured with a $64\times 64$ pixel sensor where each pixel is of size $100 \times 100$ nm$^{2}$. The Point Spread Function (PSF) is modeled by a Gaussian function whose Full Width at Half Maximum (FWHM) is 258.21 nm. In total, there are 81 049 fluorophores over the 361 images.
We localize the fluorophores on a fine grid of $256\times 256$ pixels, where each pixel is of size $25 \times 25$ nm$^{2}$. Mathematically, this is equivalent to reconstructing $x\in \mathbb {R}^{ML\times ML}$ from an acquisition $d\in \mathbb {R}^{M\times M}$, where $M=64$ and $L=4$. Note that $L$ could be larger, but this introduces more local minima, and the results might be worse; see Table 4 in the Appendix for a small comparison. The center of the pixel is used to estimate the precise position of the fluorophore.
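Under the Gaussian-noise simplification of section 2.1, the forward model $d = Ax + \eta$ with these dimensions can be sketched as follows; the circular FFT convolution, the function name, and the parameter choices are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def forward_model(x, L, fwhm_px, noise_std=0.0, rng=None):
    """Sketch of the acquisition model d = A x + eta: Gaussian-PSF
    convolution on the fine grid, then summation of L x L pixel blocks."""
    # FWHM -> standard deviation of the Gaussian PSF (in fine-grid pixels).
    sigma = fwhm_px / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    n = x.shape[0]                        # fine grid is n x n, n = M * L
    r = np.arange(n) - n // 2
    g = np.exp(-(r[:, None] ** 2 + r[None, :] ** 2) / (2.0 * sigma ** 2))
    g /= g.sum()                          # normalized PSF (flux-preserving)
    # Circular convolution via the FFT (a boundary-handling simplification).
    blurred = np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(np.fft.ifftshift(g))))
    M = n // L
    d = blurred.reshape(M, L, M, L).sum(axis=(1, 3))   # L x L binning
    if noise_std > 0:                     # Gaussian-noise-only assumption
        rng = rng if rng is not None else np.random.default_rng(0)
        d = d + rng.normal(0.0, noise_std, d.shape)
    return d
```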
We set $k$ equal to the average number of fluorophores per acquisition, which is 220, known from the ground truth. In order to observe the reconstruction, we normalize the image after summing all the reconstructions. Thus the brightest points indicate a strong intensity, and dark spots indicate a low intensity.
We set $\rho ^{(0)}=1$ for PeBic and CoBic. Note that a smaller $\rho$ could be chosen, but this implies longer computational time without improving the results.
CoBic reconstructs 99 non-zero pixels on average. Even though 220 fluorophores emit light on average, several of these may be within the same $25 \times 25$ nm$^{2}$ square, which, in the grid case, counts as only one pixel. This may be one reason why our algorithm reconstructs only 99 non-zero pixels. In addition, it may be that the proposed algorithm converges to a critical point and not a global minimum.
First, we adjust the parameters of the other algorithms such that they reconstruct on average 99 non-zero pixels, in order to compare the Jaccard index properly. We initialize the IHT and IRL1-CEL0 algorithms by applying the transpose of the operator $A$ to the acquisition, as this gives the best result.
The ground truth and the sum of the 361 acquisitions can be observed in Fig. 1(d) and Fig. 1(e), and the results of the reconstructions are presented in Fig. 2. Compared to the other algorithms, the two IHT algorithms do not manage to distinguish between two tubes when they are close (see the red case in Fig. 2). CoBic and PeBic perform quite well, but not as well as the state-of-the-art IRL1-CEL0 and Deep-STORM. Visually, Deep-STORM seems to reconstruct the fluorophores best.
The Jaccard index is shown in Table 1 for reconstructions of, on average, 90, 99, and 142 non-zero pixels. The last case is optimized for IRL1-CEL0. The case of 90 non-zero pixels demonstrates the performance of the algorithms with a chosen sparsity level that is optimal neither for IRL1-CEL0 nor for CoBic and PeBic. We observe the low Jaccard index of the constrained IHT algorithm compared to CoBic. At a tolerance of 150 nm and above, CoBic performs best with 99 non-zero pixels, but IRL1-CEL0 performs best overall. The Deep-STORM algorithm reconstructs images with an average of 44264 non-zero pixels. Due to this high number of non-zero pixels, the calculation of the Jaccard index is too demanding, but the index would be close to 0. Most of the non-zero pixels have a low intensity, with higher intensities on the tubulins, which is why we observe a good reconstruction in Fig. 2. We could fix a threshold and set to zero all the pixels with an intensity below it; however, this would not be fair to the other methods, as the same operation could be performed on them. Further note that CoBic cannot reconstruct more than 99 non-zero pixels with the given initialization, hence there is no value for the case of 142 non-zero pixels in Table 1.
Table 2 shows the average computational time for one image acquisition from the simulated dataset. Deep-STORM is fast and outperforms the other methods in speed. The other algorithms have not been optimized with respect to speed and could possibly be accelerated by parallel and GPU computing.
However, the calibration time, i.e., the time to find the best parameters, is something that cannot be measured by a computer. The advantage of IHT (penalized and constrained), IRL1-CEL0, PeBic and CoBic is that we can fine-tune the parameters by testing them on a few images. This is not possible with Deep-STORM, as each change of parameters needs a different training set, which must be simulated before the deep neural network is retrained. The total training time is around 2 hours, in contrast to a maximum of 15 minutes to test the parameters of another algorithm on 7-10 images. Furthermore, the constrained IHT and CoBic have $k$ as their main parameter, which is quite easy to choose and to adjust after testing. The $\lambda$ of the penalized formulations is trickier to tune, as it is not possible to know by how much to change it to obtain the wished-for result.
5.2 Results of the real dataset
We compare the algorithms on a high-density dataset of tubulins provided by the 2013 ISBI SMLM challenge. The dataset contains 500 acquisitions of $128\times 128$ pixels, each pixel of size $100 \times 100$ nm$^{2}$. The FWHM has been estimated to be 351.8 nm in [35] by averaging many fitted PSFs of observed single molecules in the given dataset. We localize the fluorophores on a $512\times 512$ pixel grid, where each pixel is of size $25 \times 25$ nm$^{2}$.
We do not have any prior knowledge of the solution, and after trial and error, we set $k=140$ for CoBic. We choose the parameters of the other algorithms such that their reconstructions look best visually. Figure 3 presents the reconstructions. The results are coherent with those obtained on the simulated dataset: the IHT algorithms do not reconstruct as well as the other algorithms, and the penalized version is worse than the constrained version. The reconstructions obtained by the other algorithms are equivalent, with the Deep-STORM algorithm slightly better.
6. Conclusion
We have presented a reformulation of the $\ell _2-\ell _0$ constrained and penalized problems. We propose CoBic and PeBic, two algorithms to minimize these problems, and apply them to SMLM. The algorithms are easy to implement, as each step can be decomposed into well-studied problems. We compare the algorithms to the well-known IHT algorithm in constrained and penalized form, as well as to the state-of-the-art relaxation CEL0. Furthermore, we compare them to a deep-learning method. CoBic and PeBic outperform the IHT algorithms visually and numerically. Compared to the IRL1-CEL0 and Deep-STORM methods, our methods are less precise. However, the parameter of CoBic may be more natural for a biologist to choose, as $k$ represents the maximum number of non-zero pixels, which can be translated to the number of distinguishable fluorophores.
Even though CoBic reconstructs with high precision on the simulated dataset, we can observe that the constraint is not saturated. This may indicate that the algorithm has converged to a critical point which is not the global minimum. As a perspective, we are interested in how to avoid these unsatisfactory critical points, which may be saddle points. Furthermore, it seems interesting to investigate the reformulation of the $\ell _0$-norm further and to introduce it with other data-fitting terms. The acquisitions from single-molecule localization microscopy have low photon counts, and Poisson noise is dominant. The CEL0 penalization cannot be used with a Poisson data-fitting term, but this may be possible with the proposed reformulation.
7. Mathematical justification of the algorithm
In this section we present the theoretical foundation of our work. Theorems 1 and 2 show that minimizing (9) is equivalent, in terms of minimizers, to minimizing (8), given that $\rho$ is large enough.
This theorem differs from [24, Corollary 3.2], as their $\rho$ may be arbitrarily large, in contrast to this work, where $\rho$ can be calculated precisely. Furthermore, they work with a slightly different reformulation of the $\ell _0$-norm and not explicitly with the problem (2), since they assume their loss function to be locally Lipschitzian.
First, some important notations that have not been previously stated.
Theorem 1 (Constrained form). Assume that $\rho >\sigma (A)\|d\|_2$, and $A$ is full rank. Let $G_\rho$ and $G$ be defined respectively in (9) and (8) with $I(u)$ defined by (6). We have:
- 1. If $(x_\rho ,u_\rho )$ is a local (respectively global) minimizer of $G_\rho$, then $(x_\rho ,u_\rho )$ is a local (respectively global) minimizer of $G$.
- 2. If $(\hat {x},\hat {u})$ is a global minimizer of $G$, then $(\hat {x},\hat {u})$ is a global minimizer of $G_\rho$.
Theorem 2 (Penalized form). Assume that $\rho >\sigma (A)\|d\|_2$, and $A$ is full rank. Let $G_\rho$ and $G$ be defined respectively in (9) and (8) with $I(u)$ defined in (7). We have:
- 1. If $(x_\rho ,u_\rho )$ is a local (respectively global) minimizer of $G_\rho$, then we can construct $(x_\rho ,\tilde {u}_\rho )$ which is a local (respectively global) minimizer of $G$.
- 2. If $(\hat {x},\hat {u})$ is a global minimizer of $G$, then $(\hat {x},\hat {u})$ is a global minimizer of $G_\rho$.
Some lemmas are needed to prove Theorem 1. We start with presenting and proving the lemmas, and the proofs of the theorems are presented at the end.
Lemma 1. Let $B\in \mathbb {R}^{N\times l}$ be a semi-orthogonal matrix, that is, a non-square matrix composed of orthonormal columns. Then, $B^{T}B$ is the identity matrix in $\mathbb {R}^{l\times l}$.
Lemma 2. Let $A\in \mathbb {R}^{P\times N}$, and let $a_i$ denote the $i$th column of $A$. Define $\omega \subseteq \{1,\dots ,N\}$ to be a set of indices. Let the restriction of $A$ to the columns indexed by the elements of $\omega$ be denoted $A_\omega =(a_{\omega [1]},\dots ,a_{\omega [\#\omega ]})\in \mathbb {R}^{P\times \#\omega }$. Then $\|A_\omega \|\leq \|A\|$.
Proof. $A_\omega$ can be written as the product of the matrix $A$ and a matrix $B$. We define the vector $e_i\in \mathbb {R}^{N}$ as the unit vector which has zeros everywhere except in the $i$-th place. The matrix $B \in \mathbb {R}^{N \times \# \omega }$ is constructed from the vectors $e_i,\, i \in \omega$, and is thus a semi-orthogonal matrix. The spectral norm of $B$ is 1, as $B^{T}B$ is the identity matrix (from Lemma 1). We bound $\|A_\omega \|$ as
□
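Lemma 2 can be sanity-checked numerically on a random example (illustrative values):

```python
import numpy as np

# Restricting A to a subset of its columns can only decrease (or keep)
# the spectral norm, as stated in Lemma 2.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 12))
omega = [1, 4, 7, 10]                     # arbitrary subset of column indices
A_omega = A[:, omega]
assert np.linalg.norm(A_omega, 2) <= np.linalg.norm(A, 2) + 1e-12
```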
Lemma 3. [Pshenichnyi-Rockafellar lemma][36, Theorem 2.9.1] Assume $g$ is a proper lower semi-continuous convex function. Let $C$ be a convex set, such that $int (C) \cap dom (g)\neq \emptyset$. Then
Lemma 4. Given the minimization problem
where $A\in \mathbb {R}^{P \times N}$ is a full rank matrix and $w$ a non-negative vector. $|x|$ is a vector which contains the absolute value of each component of $x$. Let $\hat {x}$ be a solution of problem (19). Then $\|A\hat {x}-d\|_2$ is bounded independently of $w$, and $\|A\hat {x}-d\|_2\leq \|d\|_2$.

Proof. Let $\hat {x}$ be the solution of $$ \arg \min _{x} \frac{1}{2}\|A x-d\|^{2}+<w,|x|>,$$ then we have $\forall x\in \mathbb {R}^{N}$
In particular, by choosing $x=0$ we have $$\frac{1}{2}\|A\hat{x}-d\|^{2}+<w,|\hat{x}|>\leq \frac{1}{2}\|d\|^{2}.$$ Since $w$ is a non-negative vector, the term $<w,|\hat {x}|>$ is always non-negative; therefore we have $$\frac{1}{2}\|A\hat{x}-d\|^{2}\leq \frac{1}{2}\|d\|^{2},$$ and so $$\|A\hat{x}-d\|_2\leq \|d\|_2.$$□
Lemma 5. Let $f(x)=\frac {1}{2}\|Ax-d\|_2^{2}+<w,|x|>+\iota _{\cdot \geq 0}(x)$, $A$ be a full rank matrix and $w$ is a non-negative vector. We have the following result: If $w_i> \sigma (A)\|d\|_2$ then the optimal solution of the following optimization problem:
is achieved with $\hat {x}_i=0$.

Proof. We start by proving that $\sigma (A)\|d\|_2 \geq \left |\left (A^{T}(A\hat {x}-d)\right )_{i}\right |$. Remark that Lemma 4 is valid for problem (23), from which we have
From the Pshenichnyi-Rockafellar lemma, a necessary and sufficient condition for $\hat {x}$ to be a minimizer of $f$ on $C$ is that
In our case, $C$ is $\mathbb {R}^{N}_+$ and $f(x)=\frac {1}{2}\|Ax-d\|^{2}+<w,|x|>$. We have that $\partial f(x)=\partial (\frac {1}{2}\|Ax-d\|^{2}) + \partial (<w,|x|>)$, as $f(x)$ is a sum of two convex functions whose domains have a non-empty intersection (see [37, Corollary 16.38]). The optimality condition is therefore
where□
Lemma 6. Let $(x_\rho ,u_\rho )$ be a local minimizer of $G_\rho$ defined in (9), with $I$ in the constrained form, that is, defined as in (6). Let $G_{x_{\rho }}(u)= \frac {1}{2}\|Ax_{\rho }-d\|^{2}+ I(u) +\rho (\|x_{\rho }\|_1-<x_{\rho },u>)$. We denote by $O$ the indices of the $k$ largest values of $\{|(x_{\rho })_i|, i=1\cdots N\}$, and define $Q\triangleq \{i|(x_{\rho })_i>0\}$ and $S\triangleq \{j|(x_{\rho })_j<0\}$. Moreover, define $D\triangleq O \cap Q$, $L\triangleq O\cap S$ and $W\triangleq \{1,2\cdots ,N\}\backslash \{D\cup L\}$. If $\#(D\cup L)=k$, that is, $\|x_\rho \|_0\geq k$, then the minimum of $G_{x_\rho }(u)$ is reached with $u_{\rho }$ such that
Proof. The minimization of $G_{x_\rho }(u)$ can be viewed as the problem of minimizing $-<x_{\rho },u> + \iota _{-1\leq \cdot \leq 1 }(u) + \iota _{\|\cdot \|_1\leq k}(u)$ by using the definition of $I(u)$. The result is then straightforward.
□
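As an illustrative sketch (not the paper's code), the minimizer described in Lemma 6 can be computed as follows when $\|x_\rho \|_0\geq k$; the function name is an assumption:

```python
import numpy as np

def u_update_constrained(x, k):
    """Sketch of the closed-form u of Lemma 6: u_i = sign(x_i) on the k
    largest-magnitude entries of x, and u_i = 0 elsewhere."""
    u = np.zeros_like(x)
    idx = np.argsort(-np.abs(x))[:k]      # indices of the k largest |x_i|
    u[idx] = np.sign(x[idx])
    return u

print(u_update_constrained(np.array([3.0, -1.0, 0.0, 2.0]), 2))  # [1. 0. 0. 1.]
```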
Lemma 7. Let $\rho >\sigma (A)\|d\|_2$. Let $(x_{\rho },u_{\rho })$ be a local or global minimizer of $G_\rho (x,u):=\frac {1}{2}\|Ax-d\|^{2}+ I(u) +\rho (\|x\|_1-<x,u>)$ with $I(u)$ defined as in (6) or (7). Let $\omega = \{i\in \{1,\dots ,N\}; (u_{\rho })_i=0\}$. Then $(x_{\rho })_i=0\, \forall i \in \omega$.
Proof. Let $J$ denote the set of indices: $J=\{1,\dots , N\} \backslash \omega$. If $(x_{\rho },u_{\rho })$ is a local or global minimizer of $G_\rho$ then $\forall (x,u)\in \mathcal {N}((x_{\rho },u_{\rho }),\gamma )$, where $\mathcal {N}((x_{\rho },u_{\rho }),\gamma )$ denotes a neighborhood of $(x_{\rho },u_{\rho })$ of size $\gamma$, we have
□
Lemma 8. If $\rho >\sigma (A)\|d\|_2$, let $(x_\rho ,u_\rho )$ be a local or global minimizer of
Proof. From Lemma 6, we have that $(u_{\rho })_i(x_{\rho })_i=|(x_{\rho })_i| \forall \,\, i \in J$, and $(u_{\rho })_i=0 \, \forall i \in \omega$. It suffices to prove $(x_{\rho })_i=0\, \forall i \in \omega$. For that we use Lemma 7, and conclude that $(x_{\rho })_\omega =0$.
□
With the above lemmas, we can prove Theorem 1.
Proof. We start by proving the first part of the theorem. Let $(x_\rho ,u_\rho )$ be a local minimizer of $G_\rho$, with $I(u)$ in the constrained form, that is, defined as in (6). Let $\mathcal {S}= \{(x,u); \|x\|_1=<x,u>\}$. If $\rho >\sigma (A)\|d\|_2$ then, from Lemma 8,
Furthermore, from the definition of a minimizer, we have
Now we prove part 2 of Theorem 1.
Let $(\hat {x},\hat {u})$ be a global minimizer of $G$. We necessarily have $\|\hat {x}\|_1=<\hat {x},\hat {u}>$. First, we show that
This can be shown by contradiction. Assume the opposite, and denote by $(x_\rho ,u_\rho )$ a global minimizer of $G_\rho$. We then have
Lemma 8 shows that $\|x_\rho \|_1=<x_\rho ,u_\rho >$, so $G_\rho (x_\rho ,u_\rho )=G(x_\rho ,u_\rho )$ and we have
We have therefore shown that $G_\rho (\hat {x},\hat {u})\leq \min G_\rho (x,u)$, and we have
$(\hat {x},\hat {u})$ is thus a global minimizer of $G_\rho$.
□
Lemma 9. Let $(x_\rho ,u_\rho )$ be a local minimizer of $G_\rho$ defined in (9), with $I$ in the penalized form defined as in (7). Let $G_{x_{\rho }}(u)= \frac {1}{2}\|Ax_{\rho }-d\|^{2}+ I(u) +\rho (\|x_{\rho }\|_1-<x_{\rho },u>)$. The minimum of $G_{x_\rho }(u)$ is reached at a $u_{\rho }$ such that
Proof. We first prove the necessary condition by writing the optimality conditions of $G_{x_\rho }(u)$.
We now prove the converse. Write $(x_{\rho })_i=\frac {\beta }{\rho }$ for some $\beta \in \mathbb {R}$. From the optimality conditions (31), we then have
This finishes the proof.
□
Lemma 10. Let $(x_\rho ,u_\rho )$ be a local or a global minimizer of $G_\rho$ for the penalized form ($I(u)$ defined by (7)). If $\rho >\sigma (A)\|d\|_2$ then, for all $i$ such that $(u_{\rho })_i=0$, we have $(x_{\rho })_i =0$.
Proof. From Lemma 9, we have that $(u_{\rho })_i=0$ iff $(x_{\rho })_i\in (-\frac {\lambda }{\rho },\frac {\lambda }{\rho })$. Denoting by $\omega$ the set of indices where $(u_{\rho })_i=0$, we can apply Lemma 7 and conclude that $(x_{\rho })_{\omega }=0$.
□
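Lemma 9 and Lemma 10 suggest a simple coordinatewise picture for the penalized $u$-update. Under the assumption (the form (7) is not reproduced in this excerpt) that the penalized $I(u)$ acts like $\lambda \|u\|_1$ plus the box constraint $u\in [-1,1]^N$, minimizing $\lambda |u_i|-\rho (x_\rho )_i u_i$ gives $u_i=\mathrm{sign}((x_\rho )_i)$ when $\rho |(x_\rho )_i|>\lambda$ and $u_i=0$ when $\rho |(x_\rho )_i|<\lambda$, consistent with Lemma 10 and with Remarks 1 and 2 below. A hedged sketch with a grid check:

```python
import numpy as np

def u_penalized(x, lam, rho):
    """Coordinatewise minimizer of lam*|u_i| - rho*x_i*u_i over u_i in [-1, 1].
    Assumes the penalized I(u) behaves like lam*||u||_1 on the box
    (the exact form (7) is not reproduced in this excerpt)."""
    return np.where(rho * np.abs(x) > lam, np.sign(x), 0.0)

lam, rho = 1.0, 2.0
rng = np.random.default_rng(1)
x = rng.normal(size=6)
u_star = u_penalized(x, lam, rho)

# Grid check: the closed form attains the coordinatewise minimum.
grid = np.linspace(-1.0, 1.0, 2001)
ok = True
for i in range(x.size):
    vals = lam * np.abs(grid) - rho * x[i] * grid
    attained = lam * abs(u_star[i]) - rho * x[i] * u_star[i]
    ok &= (attained <= vals.min() + 1e-9)
```

The degenerate case $\rho |(x_\rho )_i|=\lambda$, where any $|u_i|\in [0,1]$ with the sign of $(x_\rho )_i$ is optimal, is exactly the situation described in Remark 1.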
Remark 1. If $\rho >\sigma (A)\|d\|_2$, note that the cost function $G_\rho$ at a minimizer $(x_{\rho },u_{\rho })$ is constant on the set where $|(x_{\rho })_i|=\frac {\lambda }{\rho }$ and $|(u_{\rho })_i|\in [0, 1]$.
Remark 2. In the case of the penalized form, the minimizers ($x_{\rho },u_{\rho }$) of $G_\rho$ with $\rho >\sigma (A)\|d\|_2$ may be such that $<x_{\rho },u_{\rho }> \neq \|x_{\rho }\|_1$. This may only happen if $|(x_{\rho })_i|=\frac {\lambda }{\rho }$.
Remark 3. Assume $\rho >\sigma (A)\|d\|_2$. From Remark 1, given a minimizer $(x_{\rho },u_{\rho })$ of $G_\rho$, we can construct a minimizer $(x_{\rho },\tilde {u}_\rho )$ of $G_\rho$ such that $\|x_{\rho }\|_1=<x_{\rho },\tilde {u}_\rho >$. To do so, denote by $Z$ the set of indices such that $0<|(u_{\rho })_i|<1$. If $Z$ is non-empty, we have $<x_\rho ,u_\rho >\neq \|x_\rho \|_1$ and, from Remark 2, $|(x_{\rho })_i|=\frac {\lambda }{\rho }$ for all $i \in Z$. Take $(\tilde {u}_{\rho })_i = sign((x_{\rho })_i)$ for all $i \in Z$ and $(\tilde {u}_{\rho })_i = (u_{\rho })_i$ for all $i \notin Z$; then $<x_\rho ,\tilde {u}_\rho >=\|x_\rho \|_1$. Furthermore, $(x_\rho ,\tilde {u}_\rho )$ is a minimizer of $G_\rho$ by Remark 1 and the fact that $G_\rho (x_\rho ,u)$ is convex with respect to $u$.
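The construction in Remark 3 is mechanical and can be sketched in a few lines. The example vectors below are hypothetical; in the paper's setting the indices in $Z$ satisfy $|(x_\rho )_i|=\frac {\lambda }{\rho }$, and outside $Z$ the coordinates already satisfy $u_i x_i = |x_i|$.

```python
import numpy as np

def repair_u(x, u):
    """Remark 3: replace each u_i with 0 < |u_i| < 1 by sign(x_i),
    so that <x, u~> = ||x||_1, leaving the other coordinates untouched."""
    Z = (np.abs(u) > 0.0) & (np.abs(u) < 1.0)
    u_tilde = u.copy()
    u_tilde[Z] = np.sign(x[Z])
    return u_tilde

# Hypothetical example: outside Z, u already satisfies u_i * x_i = |x_i|.
x = np.array([2.0, -3.0, 0.5, 0.0])
u = np.array([1.0, -1.0, 0.4, 0.0])
u_tilde = repair_u(x, u)
# <x, u_tilde> now equals ||x||_1 = 5.5
```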
With Lemma 10 and the above remarks, we can prove Theorem 2.
Proof. We start by proving the first part of the theorem. Let $(x_\rho ,u_\rho )$ be a local or global minimizer of $G_\rho$, with $I(u)$ in the penalized form, that is, defined as in (7). Let $\mathcal {S}$ denote the set where $\|x\|_1=<x,u>$. If $\rho >\sigma (A)\|d\|_2$ then, from Remark 3, we can construct $(x_\rho ,\tilde {u}_\rho )$ such that
Furthermore, from the definition of a minimizer, we have
The second part of Theorem 2 can be proved as in the proof of Theorem 1.
□
Lemma 11. [25, Lemma 1] For any $x\in \mathbb {R}^{N}$
Proof. We consider first the problem
□
Proposition 1. The solution $u^{(s+1)}$ of
Proof. A closed-form expression can be found by computing the subgradient of problem (42) with respect to $u$. The subdifferential of the box constraint $\iota _{-1\leq \cdot \leq 1}$ is $\{0\}$ if $|u_i|< 1$, $[0,\infty )$ if $u_i=1$ and $(-\infty ,0]$ if $u_i=-1$. We obtain the following optimality conditions:
□
Funding
Ministère de l'Enseignement supérieur, de la Recherche et de l'Innovation; Agence Nationale de la Recherche (ANR-19-P3IA-0002).
Disclosures
The authors declare no conflicts of interest.
References
1. A. Lipson, S. G. Lipson, and H. Lipson, Optical Physics, 4th ed (Cambridge University Press, 2010).
2. A. Mishin and K. Lukyanov, “Live-cell super-resolution fluorescence microscopy,” Biochemistry (Moscow) 84(S1), 19–31 (2019). [CrossRef]
3. S. T. Hess, T. P. Girirajan, and M. D. Mason, “Ultra-high resolution imaging by fluorescence photoactivation localization microscopy,” Biophys. J. 91(11), 4258–4272 (2006). [CrossRef]
4. E. Betzig, G. H. Patterson, R. Sougrat, O. W. Lindwasser, S. Olenych, J. S. Bonifacino, M. W. Davidson, J. Lippincott-Schwartz, and H. F. Hess, “Imaging intracellular fluorescent proteins at nanometer resolution,” Science 313(5793), 1642–1645 (2006). [CrossRef]
5. M. J. Rust, M. Bates, and X. Zhuang, “Sub-diffraction-limit imaging by stochastic optical reconstruction microscopy (storm),” Nat. Methods 3(10), 793–796 (2006). [CrossRef]
6. D. Sage, H. Kirshner, T. Pengo, N. Stuurman, J. Min, S. Manley, and M. Unser, “Quantitative evaluation of software packages for single-molecule localization microscopy,” Nat. Methods 12(8), 717–724 (2015). [CrossRef]
7. D. Sage, T.-A. Pham, H. Babcock, T. Lukes, T. Pengo, J. Chao, R. Velmurugan, A. Herbert, A. Agrawal, S. Colabrese, A. Wheeler, A. Archetti, B. Rieger, R. Ober, G. M. Hagen, J.-B. Sibarita, J. Ries, R. Henriques, M. Unser, and S. Holden, “Super-resolution fight club: A broad assessment of 2d & 3d single-molecule localization microscopy software,” Nat. Methods 16(5), 387–395 (2019). [CrossRef]
8. T. Takeshima, T. Takahashi, J. Yamashita, Y. Okada, and S. Watanabe, “A multi-emitter fitting algorithm for potential live cell super-resolution imaging over a wide range of molecular densities,” J. Microsc. 271(3), 266–281 (2018). [CrossRef]
9. R. Henriques, M. Lelek, E. F. Fornasiero, F. Valtorta, C. Zimmer, and M. M. Mhlanga, “Quickpalm: 3d real-time photoactivation nanoscopy image processing in imagej,” Nat. Methods 7(5), 339–340 (2010). [CrossRef]
10. H. Babcock, Y. M. Sigal, and X. Zhuang, “A high-density 3d localization algorithm for stochastic optical reconstruction microscopy,” Opt. Nanoscopy 1(1), 6 (2012). [CrossRef]
11. E. Soubies, L. Blanc-Féraud, and G. Aubert, “A Continuous Exact ℓ0 Penalty (CEL0) for Least Squares Regularized Problem,” SIAM J. Imaging Sci. 8(3), 1607–1639 (2015). [CrossRef]
12. J. Min, C. Vonesch, H. Kirshner, L. Carlini, N. Olivier, S. Holden, S. Manley, J. C. Ye, and M. Unser, “Falcon: fast and unbiased reconstruction of high-density super-resolution microscopy data,” Sci. Rep. 4(1), 4577 (2015). [CrossRef]
13. J.-J. Cao, Y.-F. Wang, and C.-C. Yang, “Seismic data restoration based on compressive sensing using regularization and zero-norm sparse optimization,” Chin. J. Geophys. 55(2), 239–251 (2012). [CrossRef]
14. F. Nie, H. Huang, X. Cai, and C. H. Ding, “Efficient and robust feature selection via joint l2, 1-norms minimization,” in Advances in neural information processing systems, (2010), pp. 1813–1821.
15. S. G. Mallat and Z. Zhang, “Matching pursuits with time-frequency dictionaries,” IEEE Trans. Signal Process. 41(12), 3397–3415 (1993). [CrossRef]
16. A. Beck and Y. C. Eldar, “Sparsity constrained nonlinear optimization: Optimality conditions and algorithms,” SIAM J. Optim. 23(3), 1480–1509 (2013). [CrossRef]
17. C. Soussen, J. Idier, D. Brie, and J. Duan, “From bernoulli–gaussian deconvolution to sparse signal restoration,” IEEE Trans. Signal Process. 59(10), 4572–4584 (2011). [CrossRef]
18. E. J. Candes, J. Romberg, and T. Tao, “Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Inf. Theory 52(2), 489–509 (2006). [CrossRef]
19. E. Soubies, L. Blanc-Féraud, and G. Aubert, “A unified view of exact continuous penalties for ℓ2 − ℓ0 minimization,” SIAM J. Optim. 27(3), 2034–2060 (2017). [CrossRef]
20. S. Bourguignon, J. Ninin, H. Carfantan, and M. Mongeau, “Exact sparse approximation problems via mixed-integer programming: Formulations and computational performance,” IEEE Trans. Signal Process. 64(6), 1405–1419 (2016). [CrossRef]
21. M. Pilanci, M. J. Wainwright, and L. El Ghaoui, “Sparse learning via boolean relaxations,” Math. Program. 151(1), 63–87 (2015). [CrossRef]
22. S. Bi, X. Liu, and S. Pan, “Exact penalty decomposition method for zero-norm minimization based on mpec formulation,” SIAM J. Sci. Comput. 36(4), A1451–A1477 (2014). [CrossRef]
23. A. d’Aspremont, “A semidefinite representation for some minimum cardinality problems,” in 42nd IEEE International Conference on Decision and Control (IEEE Cat. No. 03CH37475), vol. 5 (IEEE, 2003), pp. 4985–4990.
24. Y. Liu, S. Bi, and S. Pan, “Equivalent lipschitz surrogates for zero-norm and rank optimization problems,” J. Glob. Optim. 72(4), 679–704 (2018). [CrossRef]
25. G. Yuan and B. Ghanem, “Sparsity constrained minimization via mathematical programming with equilibrium constraints,” arXiv:1608.04430 (2016).
26. E. Nehme, L. E. Weiss, T. Michaeli, and Y. Shechtman, “Deep-STORM: super-resolution single-molecule microscopy by deep learning,” Optica 5(4), 458–464 (2018). [CrossRef]
27. S. Gazagnes, E. Soubies, and L. Blanc-Féraud, “High density molecule localization for super-resolution microscopy using CEL0 based sparse approximation,” in 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017) (2017), pp. 28–31.
28. P. L. Combettes and V. R. Wajs, “Signal recovery by proximal forward-backward splitting,” Multiscale Model. Simul. 4(4), 1168–1200 (2005). [CrossRef]
29. H. Attouch, J. Bolte, P. Redont, and A. Soubeyran, “Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka-Łojasiewicz inequality,” Math. Oper. Res. 35(2), 438–457 (2010). [CrossRef]
30. S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” FNT in Machine Learning 3(1), 1–122 (2010). [CrossRef]
31. A. Beck and M. Teboulle, “A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems,” SIAM J. Imaging Sci. 2(1), 183–202 (2009). [CrossRef]
32. S. M. Stefanov, “Convex quadratic minimization subject to a linear constraint and box constraints,” Appl. Math. Res. eXpress 2004(1), 17–42 (2004). [CrossRef]
33. P. Ochs, A. Dosovitskiy, T. Brox, and T. Pock, “On iteratively reweighted algorithms for nonsmooth nonconvex optimization in computer vision,” SIAM J. Imaging Sci. 8(1), 331–372 (2015). [CrossRef]
34. Y. Shechtman, “Software nano-bio-optics lab,” https://nanobiooptics.net.technion.ac.il/software/ (2018). [Online; accessed 2018-10-09].
35. M. Chahid, “Echantillonnage compressif appliqué à la microscopie de fluorescence et à la microscopie de super résolution,” Ph.D. thesis, Bordeaux (2014).
36. C. Zalinescu, Convex Analysis in General Vector Spaces (World scientific, 2002).
37. H. H. Bauschke and P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, vol. 408 (Springer, 2011).