## Abstract

Single-molecule localization microscopy has become a prominent approach to study structural and dynamic arrangements of nanometric objects well beyond the diffraction limit. To maximize localization precision, high numerical aperture objectives must be used; however, this inherently and strongly limits the depth-of-field (DoF) of the microscope images. In this work, we present a framework inspired by “optical co-design” to optimize and benchmark phase masks which, when placed in the exit pupil of the microscope objective, can extend the DoF in the realistic context of single fluorescent molecule detection. Using the Cramér-Rao bound (CRB) on localization accuracy as a criterion, we optimize annular binary phase masks for various DoF ranges, compare them to Incoherently Partitioned Pupil masks, and show that they significantly extend the DoF of single-molecule localization microscopes. In particular, we propose several designs, including a simple, easy-to-realize two-ring binary mask. Moreover, we demonstrate that a simple maximum likelihood-based localization algorithm can reach the localization accuracy predicted by the CRB. Since the framework developed in this paper is based on an explicit and general information-theoretic criterion, it can be used as an engineering tool to optimize and compare, on a quantitative basis, any type of DoF-enhancing phase mask in high-resolution microscopy.

© 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

## 1. Introduction

Over the last decades, a variety of super-resolution fluorescence microscopy techniques have made it possible to obtain images with resolutions beyond the diffraction limit [1]. Among these techniques, single-molecule localization microscopy consists of detecting single emitters [2–5], which, when optically isolated, can easily be super-localized, i.e., localized with nanometric precision [6]. On the one hand, taking advantage of such localization precision, single-molecule/particle tracking reveals the dynamic behavior of molecules in complex environments, including live biological cells [7] and structured materials [8–10]. On the other hand, the ultrastructure of densely labelled entities (e.g., biological specimens [11–14] or nanomaterials [15–17]) can be revealed by controlling the emission properties of the emitters.

Because molecular diffusion and molecular assemblies are generally not confined to the two dimensions of the microscope imaging plane, several approaches have been designed to extend the super-localization concept to the third dimension [18,19]. 3D single-molecule localization has proved very efficient within the depth-of-field (DoF) of the microscope, even in thick samples [20,21]. However, these techniques require sophisticated devices, calibration and processing, and they often produce point spread functions (PSFs) with broadened transverse shapes, which may penalize 2D localization accuracy.

Yet, for some applications, extending the DoF without superlocalizing molecules along the axial direction can be useful, either to keep instrumental and processing complexity minimal or when photon numbers are low, for example when imaging at high speed. To this end, several approaches have been proposed that extend the DoF by making the PSF of the microscope invariant along the imaging axis, thereby generating volumetric images formed of 2D projections of the 3D imaging volume [22,23]. This concept has also been used in two-photon excitation fluorescence microscopy for fast volumetric imaging of brain function [24].

In this work, we propose an alternative methodology to optimize phase masks designed to increase the DoF of localization microscopes. Our approach is inspired by “optical co-design” [25–30], which takes into account not only the image formation model and the properties of the optical system to design the phase mask, but also the localization method, so as to maximize the quality of the final information delivered by the system. Building on this, we propose a rigorous framework to optimize and benchmark phase masks aimed at optimizing 2D localization accuracy within a prescribed DoF range. The potential of this framework is illustrated by optimizing and comparing the performance of annular binary phase masks [31] and Incoherently Partitioned Pupil masks [22].

More precisely, this optimization framework is based on the Cramér-Rao bound (CRB), which represents the fundamental limit of single-molecule localization accuracy [32]. The CRB has already been used in the literature to evaluate the 2D and 3D localization capabilities of localization microscopy strategies and to compare their performance [33–35]. However, to the best of our knowledge, it has never been used to design optimized DoF-enhancing phase masks. In order to efficiently localize the particle from the images in practice, we also propose a localization algorithm based on maximum likelihood (ML) and adapted to the optical characteristics of the optimal masks. We show that, contrary to the ML-based methods used in standard focused localization microscopy [33], this algorithm requires splitting the DoF range into a sufficient number of sub-ranges to reach the CRB.

Our approach differs sharply from the works in [31] and [36], which address DoF extension in classical imaging. In these publications, the optimization criterion is not a localization performance expressed by a CRB or a Fisher information, but the image quality obtained after deconvolution with an averaged Wiener filter, expressed in terms of the mean square error (MSE) between the ideal and deconvolved images. We show in the present paper that, since the CRB and MSE-based criteria are different, they lead to different optimal masks with significantly different localization performance.

This article is organized as follows. Section 2 describes the imaging and noise models, introduces the Fisher information matrix and the CRB to calculate the fundamental limit of single-molecule localization accuracy, and describes annular binary phase masks aimed at improving the localization accuracy of defocused imaging systems. In Section 3, we present a co-design approach for optimizing DoF-extending phase masks and, using the CRB as a criterion, we optimize annular binary phase masks for various DoF ranges. In Section 4, we design an ML-based localization algorithm and demonstrate that it reaches the CRB at the price of a moderate increase in computational complexity compared to the case where the particle is in focus. In Section 5, we use the developed framework to benchmark the performance of two different types of optimized DoF-extending phase masks, with particular focus on a simple, easy-to-realize binary mask composed of only two rings. Conclusions and perspectives are drawn in Section 6.

## 2. Single-molecule localization microscopy and DoF extension

Our goal is to improve the DoF of single-molecule fluorescence microscopes by using phase masks and adapted image processing algorithms. In this section, we first define the signal and noise models considered in this article, then the single-molecule localization accuracy criterion we have chosen. We then describe the type of phase masks we shall use for DoF extension and illustrate their capacity to make localization accuracy nearly invariant to defocus.

#### 2.1 Signal and noise models

Since the emitter is unresolved by the microscope, we observe in the image plane the point spread function (PSF) of the microscope objective, centered on the geometric image position of the emitter. The irradiance is proportional to

$$f^{\psi ,\boldsymbol {\theta }_0}(x,y)=f^{\psi }(x-M\theta _{0x},\,y-M\theta _{0y}),\qquad (1)$$

where $M$ is the lateral magnification of the imaging setup and $\boldsymbol {\theta }_0=(\theta _{0x},\theta _{0y}) ^{\textrm {t}}$ is the position of the emitter in the object plane. The superscript t denotes transposition. The function $f^{\psi }(x,y)$ represents the 2D spatial distribution of the PSF and is normalized so that $\iint f^{\psi }(x,y)\mathop {}\!\mathrm {d} x\mathop {}\!\mathrm {d} y=1$. It is proportional to the squared modulus of the Fourier transform of the normalized complex pupil function, defined as $\exp [i\Phi (\xi ,\eta )]$ when $\xi ^{2}+\eta ^{2}<1$ and $0$ otherwise, where, for a pure defocus wavefront error, the phase function $\Phi (\xi ,\eta )$ is expressed by

$$\Phi (\xi ,\eta )=\frac{2\pi }{\lambda }\,\psi \,(\xi ^{2}+\eta ^{2}).\qquad (2)$$

The digital image delivered by the sensor is a version of Eq. (1) that has been sampled, filtered by the finite pixel size and corrupted by noise. Let us assume that this image has a width of $2P+1$ pixels, with $P\in \mathbb {N}^{+}$. We call $s_{ij}$ the number of photo-electrons observed at pixel $(i,j)\in \mathbb {Z}^{2}$, with $|i|\leq P$ and $|j|\leq P$. Assuming that the measurement noise is additive, Gaussian, spatially white, of zero mean and variance $\sigma ^{2}_n$, the number $s_{ij}$ is a Gaussian random variable of probability density function

For the sake of simplicity, our model does not take into account signal-dependent photon noise, which is Poisson distributed and not additive. Also, we use Fourier optics to model image formation with high-aperture microscope objectives and nanometric targets (see [37,41] for a more accurate electromagnetic model). These simplifications make it easier to demonstrate the DoF-enhancing potential of binary phase masks; the results obtained in this paper can, however, be generalized without difficulty to more thorough imaging and noise models.
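As a rough illustration of the Fourier-optics model above, the following pure-Python sketch evaluates the normalized PSF $f^{\psi}$ of a clear circular pupil with pure defocus. It assumes a scalar model, our reading of Eq. (2) as a pupil phase $2\pi\psi(\xi^2+\eta^2)$ with $\psi$ in waves, and a radial image coordinate $r$ in units of $\lambda/\mathrm{NA}$. Because the pupil is rotationally symmetric, the 2D Fourier transform reduces to a Hankel transform. The helpers `bessel_j0` and `defocused_psf` are hypothetical names introduced for this sketch only.

```python
import cmath
import math


def bessel_j0(x: float, n: int = 64) -> float:
    """Bessel J0 via (1/pi) * int_0^pi cos(x sin t) dt (midpoint rule).

    The integrand is smooth and pi-periodic, so the midpoint rule converges
    very fast for the moderate arguments used here.
    """
    h = math.pi / n
    return sum(math.cos(x * math.sin((k + 0.5) * h)) for k in range(n)) * h / math.pi


def defocused_psf(r: float, psi: float, n_rho: int = 200) -> float:
    """Normalized PSF of a clear circular pupil with defocus psi (in waves).

    Amplitude: A(r) = 2*pi * int_0^1 exp(i*2*pi*psi*rho**2) J0(2*pi*r*rho) rho d rho,
    with r in units of lambda/NA. By Parseval, int |A|^2 d^2r equals the pupil
    area pi, so |A|^2 / pi integrates to one over the image plane.
    """
    h = 1.0 / n_rho
    amp = 0j
    for k in range(n_rho):
        rho = (k + 0.5) * h  # midpoint of the k-th radial cell
        amp += cmath.exp(2j * math.pi * psi * rho ** 2) * bessel_j0(2 * math.pi * r * rho) * rho
    amp *= 2 * math.pi * h
    return abs(amp) ** 2 / math.pi


# On-axis intensity relative to focus, for a few defocus values (in waves).
for psi in (0.0, 0.25, 0.5, 1.0):
    print(psi, defocused_psf(0.0, psi) / defocused_psf(0.0, 0.0))
```

The on-axis ratio reproduces the textbook pure-defocus result $[\sin(\pi\psi)/(\pi\psi)]^2$: about 0.81 at the quarter-wave Rayleigh limit and zero at $\psi=1\lambda$, consistent with the fading of the central lobe with defocus. In practice, a numerical library (e.g., a 2D FFT of the sampled pupil) would replace these explicit loops.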

#### 2.2 Fundamental limit of localization accuracy

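Locating a molecule consists of estimating the particle coordinates $\boldsymbol {\theta }_0$ with high accuracy from the measured data $s_{ij}$. According to [33], the fundamental limit of localization accuracy can be obtained from the Fisher information matrix, which indicates how strongly the likelihood of the observed data is affected by changes in the values of the parameters of interest. It is defined by

$$[\mathcal {I}(\boldsymbol {\theta }_0)]_{uv}=-\mathbb {E}\left[\frac{\partial ^{2}\ln p(\{s_{ij}\}\mid \boldsymbol {\theta }_0)}{\partial \theta _{0u}\,\partial \theta _{0v}}\right],\qquad u,v\in \{x,y\},$$

where $p(\{s_{ij}\}\mid \boldsymbol {\theta }_0)$ is the joint probability density of the pixel values.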

The diagonal elements of the inverse of the Fisher matrix are the Cramér-Rao bounds (CRB) of the estimates of $\theta _{0x}$ and $\theta _{0y}$. They are lower bounds on the error variance achievable by any unbiased estimator of these parameters, and thus represent the intrinsic difficulty of the estimation problem, independently of the (unbiased) method used to solve it. We equivalently consider here the square root of the CRB, denoted RCRB, which has the dimension of a distance, as the limit of localization accuracy.

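Due to the circular symmetry of the PSF $f^{\psi ,\boldsymbol {\theta }_0}(x,y)$ for any value of $\psi$, the off-diagonal terms of the Fisher information matrix $\mathcal {I}(\boldsymbol {\theta }_0)$ are zero, and the RCRB is the square root of the inverse of the corresponding diagonal element. Thus, the RCRB along the $x$ and $y$ axes have the following expressions:

$$\operatorname {\textrm {RCRB}}_x=\left([\mathcal {I}(\boldsymbol {\theta }_0)]_{xx}\right)^{-1/2},\qquad (7)$$

$$\operatorname {\textrm {RCRB}}_y=\left([\mathcal {I}(\boldsymbol {\theta }_0)]_{yy}\right)^{-1/2}.\qquad (8)$$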

As an illustration, Fig. 2(a) shows the variation of the PSF profile along the $x$ axis as a function of the defocus parameter $\psi$ for the example microscope configuration defined in Table 1. We express $\psi$ in units of the central wavelength $\lambda$ of the collected light. The larger the defocus parameter, the more the PSF spreads out and the fainter its central lobe becomes. Figure 2(c) shows the variation of the normalized RCRB of this optical system as a function of defocus (blue dotted line). The RCRB increases very slowly up to $\psi \simeq \pm \lambda /4$, which corresponds to the Rayleigh criterion, and then much more sharply. At $\psi = \pm 1\lambda$, the RCRB is more than ten times larger than at $\psi =0$, where, for example, it is equal to 0.04 pixel (2.7 nm) when $N_0=500$ photons and $\sigma _n^{2}=6$ photons$^{2}$/pixel. The case with a phase mask, shown in Fig. 2(b), is discussed in the next section.
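For the additive white Gaussian noise model of Section 2.1, the Fisher information takes a simple form: $[\mathcal{I}]_{xx}=\sigma_n^{-2}\sum_{ij}(\partial\mu_{ij}/\partial\theta_{0x})^2$, where $\mu_{ij}$ is the expected number of photo-electrons in pixel $(i,j)$. The sketch below computes $\operatorname{RCRB}_x$ numerically under two labeled assumptions: the PSF is sampled at pixel centres rather than integrated over each pixel, and derivatives are taken by central finite differences. The helper names are hypothetical, and the Gaussian PSF is only a stand-in used to check the code against a closed form; any normalized PSF callable (with or without mask, at any defocus) can be passed instead. A normalized RCRB as in Fig. 2(c) would then be the ratio of such a value to the in-focus, mask-free one.

```python
import math
from typing import Callable, Tuple

# Normalized PSF density f(x, y) in object-space units (integrates to 1).
Psf = Callable[[float, float], float]


def rcrb_x(psf: Psf, n0: float, sigma2: float, pixel: float, half_width: int,
           theta: Tuple[float, float] = (0.0, 0.0), h: float = 1e-4) -> float:
    """Square root of the CRB on theta_x for additive white Gaussian noise.

    Simplifying assumption: the expected count in pixel (i, j) is
    mu_ij = n0 * pixel**2 * f(i*pixel - theta_x, j*pixel - theta_y),
    i.e. the PSF is sampled at pixel centres instead of integrated over
    each pixel. For Gaussian noise of variance sigma2, the Fisher
    information on theta_x is I_xx = sum_ij (d mu_ij / d theta_x)**2 / sigma2.
    """
    tx, ty = theta
    info = 0.0
    for i in range(-half_width, half_width + 1):  # (2P+1) x (2P+1) pixel image
        for j in range(-half_width, half_width + 1):
            x, y = i * pixel - tx, j * pixel - ty
            # Central finite difference of mu_ij with respect to theta_x.
            d_mu = n0 * pixel ** 2 * (psf(x - h, y) - psf(x + h, y)) / (2 * h)
            info += d_mu ** 2 / sigma2
    return 1.0 / math.sqrt(info)


def gaussian_psf(s: float) -> Psf:
    """Isotropic Gaussian stand-in for the PSF (not the defocused Airy pattern)."""
    return lambda x, y: math.exp(-(x * x + y * y) / (2 * s * s)) / (2 * math.pi * s * s)


# Well-sampled Gaussian: RCRB_x should approach sqrt(8*pi) * s**3 * sigma / (n0 * pixel).
print(rcrb_x(gaussian_psf(1.0), n0=500, sigma2=6, pixel=0.25, half_width=24))
```

Because the noise variance is fixed per pixel in this model, the result depends on the pixel size, which is why the pixel pitch of Table 1 matters when quoting the RCRB in nanometres.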

#### 2.3 DoF extension using annular binary phase masks

Our goal is to minimize the variation of localization accuracy with defocus by placing an optimized phase mask in the exit pupil of the optical system and by adapting the localization algorithm.

There exist many types of phase masks that may improve the DoF [42–47]. In this article, we consider annular binary phase masks, since they are easy to manufacture and have proven efficient in classical imaging applications [36,48–51]. These masks consist of concentric rings defined by their normalized outer radii, and successive rings alternately implement a phase modulation of 0 or $\pi$ radians at a nominal wavelength $\lambda$. For instance, Fig. 3 shows a four-ring annular binary phase mask defined by the parameter vector $\boldsymbol {\rho }=(\rho _1, \rho _2, \rho _3) ^{\textrm {t}}$, with $0\leq \rho _1\leq \rho _2\leq \rho _3\leq 1$, where $\rho _n$ is the outer radius of the $n$-th ring (the fourth ring extends to the pupil edge). This mask defines a binary phase function that we denote by $\Phi _{\textrm {mask}}(\xi ,\eta ,\boldsymbol {\rho })$. If such a mask is placed in the exit pupil of a defocused optical system, the phase function in the exit pupil, as defined in Eq. (2), becomes

$$\Phi (\xi ,\eta )=\frac{2\pi }{\lambda }\,\psi \,(\xi ^{2}+\eta ^{2})+\Phi _{\textrm {mask}}(\xi ,\eta ,\boldsymbol {\rho }).\qquad (9)$$
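A minimal sketch of the binary phase function $\Phi_{\textrm{mask}}$ and of the total pupil phase (defocus plus mask), written as a function of the normalized pupil radius $\rho=\sqrt{\xi^2+\eta^2}$ since the mask is rotationally symmetric. It assumes that the central disc has phase 0 (only the relative phase between rings matters) and that $\psi$ is expressed in waves; the function names are hypothetical.

```python
import bisect
import math
from typing import Sequence


def binary_mask_phase(rho: float, radii: Sequence[float]) -> float:
    """Phase (rad) of an annular binary mask at normalized pupil radius rho.

    radii = (rho_1, ..., rho_{N-1}) are the sorted outer radii of the first
    N-1 rings; the last ring extends to rho = 1. Assumption: the central disc
    has phase 0 and successive rings alternate between 0 and pi.
    """
    ring_index = bisect.bisect_left(radii, rho)  # 0 for the central disc
    return math.pi if ring_index % 2 else 0.0


def pupil_phase(rho: float, psi: float, radii: Sequence[float]) -> float:
    """Total pupil phase: defocus term (psi in waves) plus the binary mask."""
    return 2 * math.pi * psi * rho ** 2 + binary_mask_phase(rho, radii)


# Two-ring mask with rho_1 = 0.59: inner disc at 0 rad, outer ring at pi rad.
print(binary_mask_phase(0.3, [0.59]), binary_mask_phase(0.7, [0.59]))
```

Using `bisect_left` makes each outer radius $\rho_n$ belong to ring $n$, so ring boundaries follow the "outer radius" convention of the text.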

Figure 2(b) shows the variation of the PSF profile as a function of the defocus parameter $\psi$ when a two-ring mask defined by $\rho _1=0.59$ is placed in the exit pupil (the reason for choosing this value of $\rho _1$ will become apparent in the next section). Compared with the aberration-free PSF of Fig. 2(a), the mask significantly alters how the PSF evolves through defocus. The minimum spread of the PSF is not obtained at perfect focus (i.e., $\psi =0$) but at $\psi \simeq \pm 0.7\lambda$. Conversely, at $\psi =0$, the PSF profile is not concentrated but split into three main lobes. Figures 2(a) and 2(b) also show that, whether or not the mask is present, the PSF is identical for $\psi$ and $-\psi$ (that is, on either side of the focus). This would not be the case for most other types of DoF-enhancing phase masks, such as the cubic mask [42]; it is an interesting property of binary phase masks with $\pi$-phase modulation [31].
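The $\psi \leftrightarrow -\psi$ symmetry can be checked numerically. For a rotationally symmetric pupil whose mask factor is real (phase 0 or $\pi$, i.e., $\pm 1$), changing $\psi$ into $-\psi$ conjugates the image-plane amplitude, so the intensity is unchanged; with any other phase step, this no longer holds. The stand-alone sketch below uses the same scalar Hankel-transform model ($r$ in units of $\lambda/\mathrm{NA}$, $\psi$ in waves) for a two-ring mask; the `step` parameter (phase of the outer ring) and the function names are hypothetical, introduced only for this comparison.

```python
import cmath
import math


def bessel_j0(x: float, n: int = 64) -> float:
    """Bessel J0 via (1/pi) * int_0^pi cos(x sin t) dt (midpoint rule)."""
    h = math.pi / n
    return sum(math.cos(x * math.sin((k + 0.5) * h)) for k in range(n)) * h / math.pi


def masked_psf(r: float, psi: float, rho1: float, step: float, n_rho: int = 200) -> float:
    """Unit-energy PSF for a two-ring mask (phase 0 for rho <= rho1, `step` beyond)
    combined with defocus psi (in waves); r is in units of lambda/NA."""
    h = 1.0 / n_rho
    amp = 0j
    for k in range(n_rho):
        rho = (k + 0.5) * h
        phase = 2 * math.pi * psi * rho ** 2 + (step if rho > rho1 else 0.0)
        amp += cmath.exp(1j * phase) * bessel_j0(2 * math.pi * r * rho) * rho
    amp *= 2 * math.pi * h
    return abs(amp) ** 2 / math.pi


for step in (math.pi, math.pi / 2):
    print(step, masked_psf(0.0, 0.5, 0.59, step), masked_psf(0.0, -0.5, 0.59, step))
```

With `step=math.pi` the two on-axis values agree to rounding error, whereas with `step=math.pi / 2` they differ by more than an order of magnitude at $\psi=\pm 0.5\lambda$.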

Figure 2(c) shows the variation of the normalized RCRB as a function of $\psi$ with this mask placed in the exit pupil (red solid line). The normalized RCRB for a given scenario is the RCRB in this scenario divided by the RCRB obtained without mask and in focus (i.e., $\psi =0$). Compared with the curve obtained without mask (blue dotted line), the mask yields much lower values of the RCRB away from focus and a much flatter curve over the defocus range: with the mask, the maximal value of the RCRB over the defocus range is three times smaller than without it. Of course, this value is also three times larger than the value obtained without mask and in focus: the price to pay for extending the DoF is a degraded localization accuracy in focus. It is interesting to note that the RCRB remains low at $\psi =0$, even though the PSF is then spread into three main lobes (see Fig. 2(b)). This means that such a PSF, albeit not concentrated in a single lobe, still contains enough information to ensure accurate localization.

Our goal will be to determine the binary phase mask parameters that optimize the RCRB for different numbers of rings and different defocus ranges, and to design algorithms that reach, in practice, a localization accuracy equal to the RCRB.

## 3. Binary phase mask optimization for localization applications

In this section, we define an optimization criterion for binary phase masks aimed at DoF extension. This criterion is highly non-convex, and we describe a method to perform its optimization. Then, we apply this method to optimize annular binary phase masks composed of two to five rings for various defocus ranges. In each case, we evaluate the obtained trade-off between localization accuracy and DoF extension.
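To make the minimax idea concrete (minimize, over the ring radii, the worst-case cost over the defocus range $[0,\psi_{\textrm{max}}]$, as in Eq. (10)), here is an exhaustive grid-search sketch. It is our illustration, not necessarily the optimization method used in this work; the `cost` callable is a placeholder for an RCRB computation $\operatorname{RCRB}(\psi,\boldsymbol{\rho})$, and all names are hypothetical. Enumerating radii with `itertools.combinations` over a sorted grid enforces $\rho_1<\rho_2<\dots$ automatically, and the brute force is insensitive to the non-convexity of the criterion.

```python
import itertools
from typing import Callable, Optional, Sequence, Tuple

# cost(psi, radii) -> e.g. normalized RCRB at defocus psi for mask radii.
Cost = Callable[[float, Tuple[float, ...]], float]


def minimax_radii(cost: Cost, n_radii: int, psi_max: float,
                  radius_grid: Sequence[float],
                  n_psi: int = 21) -> Tuple[Optional[Tuple[float, ...]], float]:
    """Exhaustive minimax search over ordered ring radii.

    Returns the radii minimizing max_{psi in [0, psi_max]} cost(psi, radii),
    with the defocus range sampled at n_psi points, and that worst-case value.
    """
    psis = [psi_max * k / (n_psi - 1) for k in range(n_psi)]
    best: Optional[Tuple[float, ...]] = None
    best_val = float("inf")
    for radii in itertools.combinations(sorted(radius_grid), n_radii):
        worst = max(cost(psi, radii) for psi in psis)  # worst case over defocus
        if worst < best_val:
            best, best_val = radii, worst
    return best, best_val


# Toy cost with a known minimax solution (rho_1 = 0.5), standing in for the RCRB.
print(minimax_radii(lambda psi, r: (psi - r[0]) ** 2, 1, 1.0, [k / 20 for k in range(21)]))
```

The number of combinations grows quickly with the number of rings and the grid resolution, so for four or five rings a coarse-to-fine grid or a global optimizer would be more practical.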

#### 3.1 Optimization method

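Using Eqs. (7) and (8), we can calculate the RCRB of an imaging system equipped with a phase mask for any value of the defocus parameter $\psi$. Since $\operatorname {\textrm {RCRB}}=\operatorname {\textrm {RCRB}}_x=\operatorname {\textrm {RCRB}}_y$, and since the PSF is identical for $\psi$ and $-\psi$ with binary phase masks with $\pi$-phase modulation, a reasonable criterion for phase mask optimization is to minimize the RCRB for the worst value of $\psi$ in $[0,\psi _{\textrm {max}}]$, that is, to define the optimal mask parameters as

$$\boldsymbol {\rho }_{\textrm {opt}}=\arg \min _{\boldsymbol {\rho }}\;\max _{\psi \in [0,\psi _{\textrm {max}}]}\operatorname {\textrm {RCRB}}(\psi ,\boldsymbol {\rho }),\qquad (10)$$

where $\operatorname {\textrm {RCRB}}(\psi ,\boldsymbol {\rho })$ denotes the RCRB obtained at defocus $\psi$ with mask parameters $\boldsymbol {\rho }$.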

#### 3.2 Performance and limits of optimized binary phase masks

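By applying the optimization method described in the previous section, we have optimized multi-ring binary phase masks for various defocus ranges. Figure 5(a) shows the maximal value of the normalized RCRB over the defocus range obtained with the optimal mask parameters $\boldsymbol {\rho }_{\textrm {opt}}$, defined as

$$\operatorname {\textrm {RCRB}}_{\textrm {max}}=\max _{\psi \in [0,\psi _{\textrm {max}}]}\operatorname {\textrm {RCRB}}(\psi ,\boldsymbol {\rho }_{\textrm {opt}}).\qquad (11)$$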

If we now consider the orange curve, which corresponds to $\psi _{\textrm {max}}=1.5 \lambda$, we first see that it lies above the blue one, since the problem to solve is more difficult. Moreover, for this defocus range, using three rings instead of two brings some improvement in localization accuracy. Overall, the wider the defocus range, the larger the number of rings needed to reach the optimal localization accuracy. For the largest defocus range considered, $\psi _{\textrm {max}}=3\lambda$, four rings are necessary and sufficient to reach the minimal RCRB. Figure 5(b) shows the profiles of the optimized binary phase masks obtained for this defocus range with two to five rings. The optimal five-ring mask is quite similar to the four-ring one, which is consistent with their similar localization performance.

The main conclusion of these results is that optimal binary masks significantly improve localization accuracy for all the considered defocus ranges, and that a limited number of rings is sufficient to obtain optimal performance. These conclusions are similar to those obtained in the case of DoF enhancement of classical imaging systems, where the optimization criterion is the quality of the deconvolved image [31]. However, as shown in Appendix A, the optimal masks are different.

In order to gain deeper insight into these results, let us analyze how the optimal masks modify the PSF of the optical system and how these modifications extend the DoF. Figure 6(a) shows the variation of the PSF profiles as a function of the defocus parameter for four configurations: without mask and with masks optimized for $\psi _{\textrm {max}} = \{1\lambda ,1.5\lambda ,3\lambda \}$, respectively. The use of a binary phase mask significantly reduces the fading of the PSF over the defocus range. Figure 6(b) shows the variation of the normalized RCRB as a function of $\psi$ for the same four imaging systems. Comparing Figs. 6(a) and 6(b) is insightful. For all defocus ranges, the local minima of the RCRB in Fig. 6(b) correspond to two different types of PSF profiles in Fig. 6(a) that are appropriate for localization. The first type is characterized by a strong central lobe. It occurs, for example, at $\psi \simeq 0.7\lambda$ for $\psi _{\textrm {max}} = 1\lambda$ and at $\psi =0$ for $\psi _{\textrm {max}} = 1.5\lambda$. The second type is characterized by strong secondary lobes. It occurs, for example, at $\psi \simeq 0$ for $\psi _{\textrm {max}} = 1\lambda$ and at $\psi \simeq 0.7\lambda$ for $\psi _{\textrm {max}} = 1.5\lambda$. The PSF profiles corresponding to the transition between these two types yield higher values of the RCRB. For example, for $\psi _{\textrm {max}} = 1.5\lambda$, the worst localization accuracy is obtained for $\psi \simeq 0.4\lambda$, which corresponds in Fig. 6(a) to a low-contrast PSF profile with no distinct central or secondary lobes.

## 4. Localization algorithm

The phase masks have been optimized using the RCRB criterion defined in Eq. (11), which is a lower bound on the localization standard deviation of any unbiased estimator. It is thus a “potential” performance level, and actual estimators able to reach this performance in practice still have to be specified. For well-focused images, it has been shown that, at sufficient SNR, ML algorithms are able to reach the CRB [33]. In our case, however, the problem is more involved, since it depends on an additional parameter, the defocus parameter $\psi$, which is unknown and will not be retrieved. The parameter $\psi$ can thus be considered as a nuisance parameter for our localization problem. Without any *a priori* information on the actual value of $\psi$, one has to maximize, with respect to $\boldsymbol {\theta }$ and $\psi$, the log-likelihood defined as

#### 4.1 Estimation based on a global kernel

When the defocus parameter $\psi$ is known, the ML estimate of the position $\boldsymbol {\theta }$ can be written as a correlation product:

Figure 7(a) compares the empirical normalized standard deviation of the estimator defined in Eq. (16) with the normalized RCRB as a function of the defocus parameter $\psi$. The empirical normalized standard deviation for a given scenario is the empirical standard deviation of the estimator in this scenario divided by the RCRB obtained without mask and in focus (i.e., $\psi =0$). The simulated optical system uses a two-ring binary phase mask optimized for the DoF range $\psi _{\textrm {max}}=1\lambda$, and the standard deviation of the estimator is obtained from Monte Carlo simulations with 4000 realizations. The empirical normalized standard deviation reaches the normalized RCRB only for $\psi >0.4\lambda$. Below this value, the standard deviation of the estimator is much larger than the RCRB (the values fall outside the plotted range of Fig. 7(a)).
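The Monte Carlo comparison above can be sketched in a simplified, hypothetical 1D setting: pixel size 1, true position 0, a Gaussian stand-in PSF sampled at pixel centres, additive white Gaussian noise, and a kernel equal to the PSF (the well-matched, in-focus-like case). With Gaussian noise and a translation-invariant kernel energy, maximizing the correlation $\sum_i s_i r_i^{\theta}$ over a fine grid of candidate positions is then the ML estimate. The function names and parameter values are assumptions of this sketch, not the simulation settings of Fig. 7.

```python
import math
import random


def gaussian_profile(x: float, s: float) -> float:
    """1D Gaussian stand-in for the PSF, normalized to unit area."""
    return math.exp(-x * x / (2 * s * s)) / (math.sqrt(2 * math.pi) * s)


def mc_std_vs_rcrb(n0=500.0, sigma2=6.0, s=1.5, half_width=10,
                   step=0.005, span=0.25, trials=400, seed=0):
    """Empirical std of a correlation (matched-kernel) estimator vs the 1D RCRB."""
    pixels = range(-half_width, half_width + 1)
    # Candidate positions theta and the corresponding kernel values r_i^theta.
    cands = [-span + k * step for k in range(int(round(2 * span / step)) + 1)]
    kernels = [[gaussian_profile(i - t, s) for i in pixels] for t in cands]
    mean = [n0 * gaussian_profile(i, s) for i in pixels]
    rng = random.Random(seed)
    sigma = math.sqrt(sigma2)
    estimates = []
    for _ in range(trials):
        data = [m + rng.gauss(0.0, sigma) for m in mean]
        scores = [sum(d * r for d, r in zip(data, ker)) for ker in kernels]
        estimates.append(cands[max(range(len(cands)), key=scores.__getitem__)])
    avg = sum(estimates) / trials
    emp_std = math.sqrt(sum((e - avg) ** 2 for e in estimates) / (trials - 1))
    # Fisher information I = sum_i (d mu_i / d theta)^2 / sigma2 (analytic derivative).
    info = sum((n0 * i / s ** 2 * gaussian_profile(i, s)) ** 2 for i in pixels) / sigma2
    return emp_std, 1.0 / math.sqrt(info)


print(mc_std_vs_rcrb())
```

Replacing the matched kernel with a profile that does not resemble the data, as happens at $\psi\simeq 0$ with a kernel close to the $\psi\simeq 0.7\lambda$ PSF, is the kind of mismatch that makes the empirical standard deviation depart from the RCRB in Fig. 7(a).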

The reason for this failure is the following. Figure 6 shows that the shape of the PSF strongly varies with defocus. For the defocus range $\psi _{\textrm {max}} = 1 \lambda$, two regimes can be distinguished: one where the PSF is spread out with strong secondary lobes (see Fig. 7(c)), and one where it is concentrated around a dominant main lobe (see Fig. 7(d)). The optimal defocus-invariant kernel $r_{ij}^{\boldsymbol {\theta }}$, shown in Fig. 7(b), is clearly much more similar to the PSF at $\psi \simeq 0.7\lambda$ than to the PSF at $\psi =0$. This explains why its accuracy is good for larger values of the defocus parameter but fails dramatically for smaller ones. In conclusion, a single kernel, even an optimal one, cannot encompass the PSF variability over the defocus range $[0,1\lambda ]$. It is therefore necessary to split the defocus range into sub-ranges over which the PSF shape is more nearly invariant, and to apply the estimator defined in Eq. (16) on each of them. Note that these sub-ranges are not meant to estimate the defocus parameter; they only serve to improve the estimation of the 2D coordinates of the observed single molecule.

#### 4.2 Estimation based on multiple kernels

Suppose that the DoF range $[0,\psi _{\textrm {max}}]$ is split into $M$ distinct sub-ranges. Using the method described in the previous section, we can define $M$ optimal defocus-invariant kernels, one for each sub-range, denoted $r_{ij}^{m,\boldsymbol {\theta }}$ with $m\in \{1,\dots ,M\}$. We then jointly estimate the particle coordinates $\hat {\boldsymbol {\theta }}$ and the defocus sub-range $\hat {m}$ in the ML sense:

Of course, when the defocus range increases, the PSF variability within this range also increases (see Fig. 6(a)), so that a larger number of sub-ranges is needed to encompass this variability. For example, for $\psi _{\textrm {max}}=1.5 \lambda$, we have used five sub-ranges. Figure 9(a) shows the normalized standard deviation of this estimator, estimated with Monte Carlo simulations, which matches the normalized RCRB. Figure 9(b) shows the same quantities for $\psi _{\textrm {max}}=2 \lambda$; in this case, six sub-ranges were needed to match the normalized RCRB.

In summary, contrary to well-focused systems, the RCRB of DoF-enhanced systems cannot be reached with an estimator consisting of a single correlation kernel, because even with the optimal masks the PSF varies within the defocus range. DoF extension thus comes at a price in terms of computational complexity. We have proposed a method based on the subdivision of the defocus range into a sufficient number of sub-ranges, each associated with a single correlation kernel, and we have shown that this method reaches the RCRB. Its computational complexity is simply proportional to the number $M$ of necessary sub-ranges, which increases with the width of the defocus range: for $M$ sub-ranges, localization requires $M$ correlations instead of a single one in the standard case of focused localization.
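A minimal sketch of the multi-kernel strategy, again in a hypothetical 1D setting. Since the joint criterion is not reproduced in this text, we assume white Gaussian noise, a known $N_0$, and that each kernel is the normalized mean profile associated with one sub-range; under these assumptions the log-likelihood of $(\theta, m)$ is, up to a positive factor and constants, $\sum_i s_i r_i^{m,\theta} - (N_0/2)\sum_i (r_i^{m,\theta})^2$. The second term only matters when kernels have different energies. The outer loop makes the complexity statement concrete: one correlation per kernel, i.e., $M$ correlations. Kernel shapes and names are illustrative.

```python
import math
import random
from typing import Callable, Sequence, Tuple

Kernel = Callable[[float], float]  # normalized 1D mean profile for one sub-range


def gauss(x: float, s: float) -> float:
    return math.exp(-x * x / (2 * s * s)) / (math.sqrt(2 * math.pi) * s)


def multi_kernel_estimate(data: Sequence[float], kernels: Sequence[Kernel], n0: float,
                          cands: Sequence[float], pixels: Sequence[int]) -> Tuple[float, int]:
    """Joint ML estimate (theta_hat, m_hat) over candidate positions and kernels."""
    best = None  # (score, theta, m)
    for m, kernel in enumerate(kernels):  # one correlation per sub-range
        for t in cands:
            r = [kernel(i - t) for i in pixels]
            # Gaussian log-likelihood up to constants, scaled by sigma^2 / n0.
            score = sum(d * ri for d, ri in zip(data, r)) - 0.5 * n0 * sum(ri * ri for ri in r)
            if best is None or score > best[0]:
                best = (score, t, m)
    return best[1], best[2]


def single_lobe(x: float) -> float:
    return gauss(x, 1.2)


def two_lobe(x: float) -> float:
    return 0.5 * (gauss(x - 2.0, 1.0) + gauss(x + 2.0, 1.0))


rng = random.Random(1)
pixels = list(range(-10, 11))
cands = [-1 + 0.01 * k for k in range(201)]
data = [1000 * two_lobe(i - 0.3) + rng.gauss(0.0, 1.0) for i in pixels]
print(multi_kernel_estimate(data, [single_lobe, two_lobe], 1000, cands, pixels))
```

As in the text, the selected index $\hat m$ only indicates which kernel best explains the data; it is a by-product used to improve the position estimate, not a defocus measurement.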

## 5. Comparison with another previously proposed DoF-extending mask

The mask optimization approach presented in this article is based on a general and objective localization criterion, the RCRB, which allows any type of DoF-extending strategy to be compared on a quantitative basis. To illustrate this potential, we compare in this section annular binary masks with the masks introduced in [22]. These masks consist of a series of concentric annular sub-apertures introducing delays much larger than the coherence length of the detected light. Hence, the light beams emerging from the different sub-apertures are mutually incoherent, and the PSF of the mask is simply the incoherent sum of the PSFs produced by each sub-aperture. In the following, these masks are denoted Incoherently Partitioned Pupil (IPP) masks. Their parameters are the widths of the rings.
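Because the sub-apertures of an IPP mask are mutually incoherent, its PSF is the sum of the sub-aperture intensities rather than the squared modulus of the summed amplitudes. The sketch below illustrates this in the same scalar, rotationally symmetric model as before ($r$ in units of $\lambda/\mathrm{NA}$, $\psi$ in waves), with ring edges standing for the IPP parameters (ring widths); all function names are hypothetical.

```python
import cmath
import math
from typing import Sequence


def bessel_j0(x: float, n: int = 64) -> float:
    """Bessel J0 via (1/pi) * int_0^pi cos(x sin t) dt (midpoint rule, moderate x)."""
    h = math.pi / n
    return sum(math.cos(x * math.sin((k + 0.5) * h)) for k in range(n)) * h / math.pi


def annulus_amplitude(r: float, a: float, b: float, psi: float, n_rho: int = 100) -> complex:
    """Hankel-transform amplitude of the sub-aperture a < rho <= b with defocus psi."""
    h = (b - a) / n_rho
    amp = 0j
    for k in range(n_rho):
        rho = a + (k + 0.5) * h
        amp += cmath.exp(2j * math.pi * psi * rho ** 2) * bessel_j0(2 * math.pi * r * rho) * rho
    return 2 * math.pi * h * amp


def ipp_psf(r: float, edges: Sequence[float], psi: float) -> float:
    """Incoherent sum of sub-aperture intensities; edges = (0, rho_1, ..., 1).

    Dividing by pi (the pupil area) normalizes the total energy to one,
    as for the coherent PSF of the full pupil.
    """
    return sum(abs(annulus_amplitude(r, a, b, psi)) ** 2
               for a, b in zip(edges, edges[1:])) / math.pi


# In focus, on axis: full coherent pupil vs two incoherent halves of equal area.
print(ipp_psf(0.0, (0.0, 1.0), 0.0), ipp_psf(0.0, (0.0, math.sqrt(0.5), 1.0), 0.0))
```

For two sub-apertures of equal area in focus, the on-axis value drops from $\pi$ (full pupil) to $\pi/2$: incoherent partitioning trades peak intensity for narrower annuli, each of which is less sensitive to defocus.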

To compare these two strategies, we optimize the masks, that is, their ring widths, with the minimax criterion defined in Eq. (10) for different values of the defocus range $\psi _{\textrm {max}}$. Figure 10 shows the normalized value of $\operatorname {\textrm {RCRB}}_{\textrm {max}}$ obtained with different imaging strategies for discrete values of $\psi _{\textrm {max}}$ ranging from 0 to $1\lambda$ in steps of $0.1\lambda$. The black line represents the normalized $\operatorname {\textrm {RCRB}}_{\textrm {max}}$ obtained with a localization microscope without mask, limited only by diffraction; it serves as a baseline for comparison. The dotted blue curve is obtained with the optimized annular binary phase mask and the dash-dotted red one with the optimized IPP mask. Note that the optimal mask may differ for each value of $\psi _{\textrm {max}}$. Interestingly, the phase masks do not improve performance for small defocus ranges $\psi _{\textrm {max}}<0.4\lambda$, whatever the strategy, since the three curves are superimposed: the required DoF is then too small for masks to improve localization performance. When the DoF range becomes larger, on the other hand, phase masks yield a significant improvement, which is larger with annular binary phase masks than with IPP masks for every value of $\psi _{\textrm {max}}$ in this regime. For instance, for $\psi _{\textrm {max}}=1\lambda$, the optimal annular binary phase mask yields an $\operatorname {\textrm {RCRB}}_{\textrm {max}}$ three times smaller than that obtained without mask, whereas the optimal IPP mask reduces $\operatorname {\textrm {RCRB}}_{\textrm {max}}$ only by a factor of two.

This result illustrates how the framework proposed in this article allows any type of DoF-enhancing strategy to be compared. Of course, localization accuracy may not be the only criterion for choosing a mask in a given application; manufacturability, for instance, is also important. In that regard, annular binary phase masks may be easier to manufacture with photolithographic techniques, since they require only one shallow etching level, whereas IPP masks require deep steps between successive rings.

## 6. Conclusion

We have investigated the problem of DoF extension in single-molecule localization microscopy. We have shown that placing an optimized phase mask in the exit pupil of the microscope and using an adapted processing algorithm significantly improves localization performance within the required defocus range. We have proposed several binary mask designs to enhance the DoF, including an easy-to-manufacture two-ring solution. Of course, DoF extension comes at a price: the in-focus localization accuracy is always lower than that of a well-focused system without mask.

A strong asset of the framework developed in this article is that it is based on an explicit and general information-theoretic criterion. It thus makes it possible to compare any type of DoF-extending mask on a realistic and quantitative basis. We have illustrated this potential by comparing optimized annular binary phase masks with another type of optimized phase mask proposed in the literature.

The present work is based on a simple imaging model; in this sense, the orders of magnitude given in this article can be considered as upper limits on the DoF improvement that can be obtained in practice with phase masks. This method thus lays the foundation on which more sophisticated, application-dependent strategies can be built. In particular, the imaging model could be improved in several ways, in terms of PSF modeling, optical aberrations or noise model. Another interesting perspective is to apply this framework to more general types of masks, such as continuous pure-phase masks or masks combining amplitude and phase modulation, and to compare them on a realistic basis.

## A. Comparison between CRB and MSE-based optimization criteria

In this Appendix, we compare the annular binary DoF-enhancing phase masks that are optimal for classical imaging systems with those that are optimal for single-molecule localization microscopy.

Figure 11 shows the optimal masks for a DoF extension of $\psi _{\textrm {max}} = 1 \lambda$ in classical imaging [31] (MSE optimization, Fig. 11(a)) and in localization microscopy (CRB optimization, Fig. 11(b)). These masks are very different. The question is whether they yield different localization performance.

To answer this question, Figs. 12(a) and 12(b) show the $x$-axis profile of the PSF of a single-molecule microscope as a function of defocus when using these two masks. These profiles are quite different. The PSF obtained with the mask optimized with the MSE criterion (Fig. 12(a)) varies smoothly with defocus, since the quality of the deconvolved image has to be constant over the required DoF range. In contrast, the PSF obtained with the mask optimized with the CRB criterion (Fig. 12(b)) varies more sharply. This is because, as explained in Section 3.2, the three-lobe profile around $\psi = 0$ and the “blob-like” profile around $\psi =0.7\lambda$ are both appropriate for localization (they yield low values of the RCRB).

Finally, Fig. 12(c) shows the localization performance of the two masks of Fig. 11, expressed in terms of normalized RCRB, as a function of the defocus $\psi$. Their localization performance differs significantly. We conclude that the DoF-enhancing masks that are optimal for classical imaging [31,36] are not optimal for localization microscopy.

## B. Closed-form expression of the variance $\operatorname {\mbox {var}}_\psi [\boldsymbol {\hat {\theta }}]$

In this Appendix, we show that it is possible, with some approximations, to get a closed-form expression of the variance $\operatorname {\mbox {var}}_\psi [\boldsymbol {\hat {\theta }}]$ of the estimator in Eq. (16) for a given value of $\psi$. For the sake of simplicity, we model the single-molecule localization problem as a one-dimensional problem. However, the method is also valid for 2D localization. To facilitate the mathematical developments, we assume that the observed signal $s(x)$ is continuous and modeled as

$$
s(x) = N_0\, f^{\psi,\theta_0}(x) + n(x), \tag{21}
$$

where $\theta _0$ is the position of the emitter in the object plane, $N_0$ is the total number of photo-electrons expected in the whole 1D image, $n(x)$ is a zero-mean Gaussian white noise of power spectral density $S_{nn}(\nu )=q$ and $f^{\psi ,\theta _0}(x)=f^{\psi }(x-\theta _0)$ is proportional to the 1D spatial distribution of irradiance over the sensor for a given defocus parameter $\psi$. We want to estimate the coordinate $\theta _0$ of the emitter in the object plane by maximizing

$$
\hat{\theta} = \arg\max_{\theta} \int s(x)\, r(x-\theta)\,\mathrm{d}x, \tag{22}
$$

where $r(x)$ is a defocus-invariant kernel. To quantify the performance of the estimator defined in Eq. (22), we can calculate its bias and variance. Using Eq. (21), the estimator has the following expression:

$$
\hat{\theta} = \arg\max_{\theta} \left[ N_0\, \Omega(\theta) + n'(\theta) \right],
$$

where $\Omega (\theta )=\int f^{\psi ,\theta _0}(x)r(x-\theta )\mathop {}\!\mathrm {d} x$ is a correlation product and $n'(\theta )=\int n(x)r(x-\theta )\mathop {}\!\mathrm {d} x$ is the filtered noise. Let us consider the second-order Taylor expansion of $\Omega (\theta )$ when $\theta$ is close to the true value $\theta _0$ (i.e., when the variance is low).

## C. Invariant kernel based on variance minimization

In this Appendix, we determine the optimal defocus-invariant kernel $r(x)$ such that the variance $\operatorname {\mbox {var}}_\psi [\boldsymbol {\hat {\theta }}]$ of the estimator defined in Eq. (22) is minimal. Considering a discrete set of values $\psi _k$, with $k\in [1,K]$, the optimal defocus-invariant kernel can be defined as the one that maximizes

$$
\sum_{k=1}^{K}\left(\operatorname{var}_{\psi_k}[\hat{\theta}]\right)^{-1}.
$$

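By setting its functional derivative with respect to $r$ to zero, it is easily shown that the Fourier transform $\tilde{r}(\nu)$ of the optimal defocus-invariant kernel is a linear combination of the functions $\tilde {f}^{\psi _k}(\nu )$:

$$
\tilde{r}(\nu) = \sum_{k=1}^{K} \alpha_k\, \tilde{f}^{\psi_k}(\nu).
$$
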
We can conclude that the optimal defocus-invariant kernel that maximizes $\sum _k(\operatorname {\mbox {var}}_{\psi _k}[\hat {\theta }])^{-1}$ is a linear combination of the functions $f^{\psi _k}(x)$ whose coefficients are the components of the eigenvector associated with the largest eigenvalue of the matrix $\boldsymbol {\underline {W}}$.
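The kernel construction of this Appendix can be sketched numerically. The sketch relies on the small-error approximation of Appendix B. For symmetric $f^{\psi}$ and $r$, that approximation makes the inverse variance for defocus $\psi_k$ proportional to $\langle f^{\psi_k\prime}, r'\rangle^2/\langle r', r'\rangle$. Under this reading, $\boldsymbol{\underline{W}}$ reduces, up to a constant factor, to the Gram matrix $W_{kl}=\langle f^{\psi_k\prime}, f^{\psi_l\prime}\rangle$ of PSF derivatives. This form of $\boldsymbol{\underline{W}}$ is our assumption, not a quote of the paper's definition. The unit-area Gaussian profiles are hypothetical stand-ins for $f^{\psi_k}$. The estimator of Eq. (22) is approximated here by a simple grid search over $\theta$.

```python
import numpy as np

# Sampled sensor coordinate (arbitrary units); symmetric grid containing x = 0
x = np.linspace(-8.0, 8.0, 1601)
dx = x[1] - x[0]


def toy_profile(u, sigma):
    """Hypothetical stand-in for f^psi: unit-area Gaussian of width sigma."""
    return np.exp(-0.5 * (u / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def derivative_gram(profiles):
    """Return W[k, l] = <f_k', f_l'> (assumed form of W) and the derivatives f_k'."""
    derivs = np.array([np.gradient(f, dx) for f in profiles])
    return derivs @ derivs.T * dx, derivs


def optimal_kernel(profiles):
    """r = sum_k alpha_k f_k, with alpha the eigenvector of W's largest eigenvalue."""
    W, _ = derivative_gram(profiles)
    eigvals, eigvecs = np.linalg.eigh(W)  # eigenvalues sorted in ascending order
    alpha = eigvecs[:, -1]
    if alpha.sum() < 0:  # eigenvector sign is arbitrary; keep a positive kernel
        alpha = -alpha
    return alpha @ np.array(profiles), eigvals[-1]


def summed_inverse_variance(r, profiles):
    """sum_k <f_k', r'>^2 / <r', r'>, proportional to sum_k 1/var_psi_k (small errors)."""
    _, derivs = derivative_gram(profiles)
    dr = np.gradient(r, dx)
    return float(np.sum((derivs @ dr * dx) ** 2) / (np.sum(dr * dr) * dx))


def estimate_position(s, r, thetas):
    """Grid search for the theta maximizing sum_x s(x) r(x - theta) dx (cf. Eq. (22))."""
    scores = [np.sum(s * np.interp(x - t, x, r, left=0.0, right=0.0)) * dx for t in thetas]
    return thetas[int(np.argmax(scores))]


# Usage with K = 3 hypothetical defocus values (widths standing in for f^{psi_k})
sigmas = [1.0, 1.5, 2.5]
profiles = [toy_profile(x, s) for s in sigmas]
r, lam = optimal_kernel(profiles)
for f, s in zip(profiles, sigmas):
    # A single f^{psi_k} used as kernel never beats the eigenvector solution
    print(f"kernel = f(sigma={s}): {summed_inverse_variance(f, profiles):.4f} <= {lam:.4f}")

rng = np.random.default_rng(0)
theta0, n0 = 0.3, 500.0
signal = n0 * toy_profile(x - theta0, 1.5) + rng.normal(0.0, 0.5, x.size)
thetas = np.linspace(-1.0, 1.0, 201)
print("estimated position:", estimate_position(signal, r, thetas))
```

With this assumed $\boldsymbol{\underline{W}}$, the summed criterion evaluated at the eigenvector kernel equals the largest eigenvalue. This is the Rayleigh-quotient argument behind the eigenvector condition stated above.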

## Funding

Agence Nationale de la Recherche (ANR-15-CE16-0004-03, ANR-18-CE09-0019-02); Fondation ARC pour la Recherche sur le Cancer; IdEx Bordeaux (ANR-10-IDEX-03-02); French research group GdR ISIS of the CNRS.

## Acknowledgements

This work was supported by grants from *Agence Nationale de la Recherche* (ANR-15-CE16-0004-03, ANR-18-CE09-0019-02), the *Fondation ARC pour la recherche sur le cancer* (A. L.) and IdEx Bordeaux (ANR-10-IDEX-03-02). This work has also received the support of the French research group GdR ISIS of the CNRS through the *Projet de Recherche Exploratoire* MASK.

## Disclosures

The authors declare no conflicts of interest.

## References

**1. **B. Huang, M. Bates, and X. Zhuang, “Super-resolution fluorescence microscopy,” Annu. Rev. Biochem. **78**(1), 993–1016 (2009). [CrossRef]

**2. **E. Betzig, G. H. Patterson, R. Sougrat, O. W. Lindwasser, S. Olenych, J. S. Bonifacino, M. W. Davidson, J. Lippincott-Schwartz, and H. F. Hess, “Imaging intracellular fluorescent proteins at nanometer resolution,” Science **313**(5793), 1642–1645 (2006). [CrossRef]

**3. **S. T. Hess, T. P. Girirajan, and M. D. Mason, “Ultra-high resolution imaging by fluorescence photoactivation localization microscopy,” Biophys. J. **91**(11), 4258–4272 (2006). [CrossRef]

**4. **M. J. Rust, M. Bates, and X. Zhuang, “Sub-diffraction-limit imaging by stochastic optical reconstruction microscopy (STORM),” Nat. Methods **3**(10), 793–796 (2006). [CrossRef]

**5. **A. Sharonov and R. M. Hochstrasser, “Wide-field subdiffraction imaging by accumulated binding of diffusing probes,” Proc. Natl. Acad. Sci. **103**(50), 18911–18916 (2006). [CrossRef]

**6. **N. Bobroff, “Position measurement with a resolution and noise-limited instrument,” Rev. Sci. Instrum. **57**(6), 1152–1157 (1986). [CrossRef]

**7. **L. Cognet, C. Leduc, and B. Lounis, “Advances in live-cell single-particle tracking and dynamic super-resolution imaging,” Curr. Opin. Chem. Biol. **20**, 78–85 (2014). [CrossRef]

**8. **J. Kirstein, B. Platschek, C. Jung, R. Brown, T. Bein, and C. Bräuchle, “Exploration of nanostructured channel systems with single-molecule probes,” Nat. Mater. **6**(4), 303–310 (2007). [CrossRef]

**9. **D. Lasne, A. Maali, Y. Amarouchene, L. Cognet, B. Lounis, and H. Kellay, “Velocity profiles of water flowing past solid glass surfaces using fluorescent nanoparticles and molecules as velocity probes,” Phys. Rev. Lett. **100**(21), 214502 (2008). [CrossRef]

**10. **A. G. Godin, J. A. Varela, Z. Gao, N. Danné, J. P. Dupuis, B. Lounis, L. Groc, and L. Cognet, “Single-nanotube tracking reveals the nanoscale organization of the extracellular space in the live brain,” Nat. Nanotechnol. **12**(3), 238–243 (2017). [CrossRef]

**11. **P. Kanchanawong, G. Shtengel, A. M. Pasapera, E. B. Ramko, M. W. Davidson, H. F. Hess, and C. M. Waterman, “Nanoscale architecture of integrin-based cell adhesions,” Nature **468**(7323), 580–584 (2010). [CrossRef]

**12. **A. Dani, B. Huang, J. Bergan, C. Dulac, and X. Zhuang, “Superresolution imaging of chemical synapses in the brain,” Neuron **68**(5), 843–856 (2010). [CrossRef]

**13. **G. Giannone, E. Hosy, F. Levet, A. Constals, K. Schulze, A. I. Sobolevsky, M. P. Rosconi, E. Gouaux, R. Tampé, D. Choquet, and L. Cognet, “Dynamic superresolution imaging of endogenous proteins on living cells at ultra-high density,” Biophys. J. **99**(4), 1303–1310 (2010). [CrossRef]

**14. **A. Löschberger, S. van de Linde, M.-C. Dabauvalle, B. Rieger, M. Heilemann, G. Krohne, and M. Sauer, “Super-resolution imaging visualizes the eightfold symmetry of gp210 proteins around the nuclear pore complex and resolves the central channel with nanometer resolution,” J. Cell Sci. **125**(3), 570–575 (2012). [CrossRef]

**15. **L. Cognet, D. A. Tsyboulski, and R. B. Weisman, “Subdiffraction far-field imaging of luminescent single-walled carbon nanotubes,” Nano Lett. **8**(2), 749–753 (2008). [CrossRef]

**16. **J. Feng, H. Deschout, S. Caneva, S. Hofmann, I. Lončarić, P. Lazić, and A. Radenovic, “Imaging of optically active defects with nanometer resolution,” Nano Lett. **18**(3), 1739–1744 (2018). [CrossRef]

**17. **N. Danné, M. Kim, A. G. Godin, H. Kwon, Z. Gao, X. Wu, N. F. Hartmann, S. K. Doorn, B. Lounis, Y. Wang, and L. Cognet, “Ultrashort carbon nanotubes that fluoresce brightly in the near-infrared,” ACS Nano **12**(6), 6059–6065 (2018). [CrossRef]

**18. **S. R. P. Pavani and R. Piestun, “Three dimensional tracking of fluorescent microparticles using a photon-limited double-helix response system,” Opt. Express **16**(26), 22048–22057 (2008). [CrossRef]

**19. **B. Hajj, M. El Beheiry, I. Izeddin, X. Darzacq, and M. Dahan, “Accessing the third dimension in localization-based super-resolution microscopy,” Phys. Chem. Chem. Phys. **16**(31), 16340–16348 (2014). [CrossRef]

**20. **P. Bon, J. Linarès-Loyez, M. Feyeux, K. Alessandri, B. Lounis, P. Nassoy, and L. Cognet, “Self-interference 3D super-resolution microscopy for deep tissue investigations,” Nat. Methods **15**(6), 449–454 (2018). [CrossRef]

**21. **F. Xu, D. Ma, K. P. MacPherson, S. Liu, Y. Bu, Y. Wang, Y. Tang, C. Bi, T. Kwok, A. A. Chubykin, P. Yin, S. Calve, G. E. Landreth, and F. Huang, “Three-dimensional nanoscopy of whole cells and tissues with in situ point spread function retrieval,” Nat. Methods **17**(5), 531–540 (2020). [CrossRef]

**22. **S. Abrahamsson, S. Usawa, and M. Gustafsson, “A new approach to extended focus for high-speed high-resolution biological microscopy,” in Three-Dimensional and Multidimensional Microscopy: Image Acquisition and Processing XIII, vol. 6090, J.-A. Conchello, C. J. Cogswell, and T. Wilson, eds., International Society for Optics and Photonics (SPIE, 2006), pp. 128–135.

**23. **R. N. Zahreddine and C. J. Cogswell, “Total variation regularized deconvolution for extended depth of field microscopy,” Appl. Opt. **54**(9), 2244–2254 (2015). [CrossRef]

**24. **R. Lu, W. Sun, Y. Liang, A. Kerlin, J. Bierfeld, J. D. Seelig, D. E. Wilson, B. Scholl, B. Mohar, M. Tanimoto, M. Koyama, D. Fitzpatrick, M. B. Orger, and N. Ji, “Video-rate volumetric functional imaging of the brain at synaptic resolution,” Nat. Neurosci. **20**(4), 620–628 (2017). [CrossRef]

**25. **W. T. Cathey and E. R. Dowski, “New paradigm for imaging systems,” Appl. Opt. **41**(29), 6080–6092 (2002). [CrossRef]

**26. **M. D. Robinson and D. G. Stork, “Joint design of lens systems and digital image processing,” in International Optical Design (Optical Society of America, 2006), paper WB4.

**27. **A. R. Harvey, T. Vettenburg, M. Demenikov, B. Lucotte, G. Muyo, A. Wood, N. Bustin, A. Singh, and E. Findlay, “Digital image processing as an integral component of optical design,” in Novel Optical Systems Design and Optimization XI, vol. 7061, R. J. Koshel, G. G. Gregory, J. D. Moore Jr., and D. H. Krevor, eds., International Society for Optics and Photonics (SPIE, 2008), pp. 32–42.

**28. **F. Diaz, F. Goudail, B. Loiseaux, and J.-P. Huignard, “Increase in depth of field taking into account deconvolution by optimization of pupil mask,” Opt. Lett. **34**(19), 2970–2972 (2009). [CrossRef]

**29. **V. Sitzmann, S. Diamond, Y. Peng, X. Dun, S. Boyd, W. Heidrich, F. Heide, and G. Wetzstein, “End-to-end optimization of optics and image processing for achromatic extended depth of field and super-resolution imaging,” ACM Trans. Graph. **37**(4), 1–13 (2018). [CrossRef]

**30. **S. Elmalem, R. Giryes, and E. Marom, “Learned phase coded aperture for the benefit of depth of field extension,” Opt. Express **26**(12), 15316–15331 (2018). [CrossRef]

**31. **R. Falcón, F. Goudail, C. Kulcsár, and H. Sauer, “Performance limits of binary annular phase masks codesigned for depth-of-field extension,” Opt. Eng. **56**(2), 065104 (2017). [CrossRef]

**32. **S. M. Kay, *Fundamentals of Statistical Signal Processing: Estimation Theory*, Prentice Hall Signal Processing Series (Prentice-Hall PTR, 1993).

**33. **R. J. Ober, S. Ram, and E. S. Ward, “Localization accuracy in single-molecule microscopy,” Biophys. J. **86**(2), 1185–1200 (2004). [CrossRef]

**34. **A. von Diezmann, Y. Shechtman, and W. E. Moerner, “Three-dimensional localization of single molecules for super-resolution imaging and single-particle tracking,” Chem. Rev. **117**(11), 7244–7275 (2017). [CrossRef]

**35. **M. Badieirostami, M. D. Lew, M. A. Thompson, and W. E. Moerner, “Three-dimensional localization precision of the double-helix point spread function versus astigmatism and biplane,” Appl. Phys. Lett. **97**(16), 161103 (2010). [CrossRef]

**36. **A. Fontbonne, H. Sauer, C. Kulcsár, A.-L. Coutrot, and F. Goudail, “Experimental validation of hybrid optical–digital imaging system for extended depth-of-field based on co-optimized binary phase masks,” Opt. Eng. **58**(11), 1–12 (2019). [CrossRef]

**37. **F. Aguet, “Super-resolution fluorescence microscopy based on physical models,” Ph.D. thesis, École Polytechnique Fédérale de Lausanne (2009).

**38. **M. Born and E. Wolf, *Principles of Optics: Electromagnetic Theory of Propagation, Interference and Diffraction of Light* (Elsevier, 2013).

**39. **H. H. Hopkins and C. R. Burch, “The frequency response of a defocused optical system,” Proc. R. Soc. Lond. A **231**(1184), 91–103 (1955). [CrossRef]

**40. **L. Tao and C. Nicholson, “The three-dimensional point spread functions of a microscope objective in image and object space,” J. Microsc. **178**(3), 267–271 (1995). [CrossRef]

**41. **C. J. R. Sheppard and P. Török, “An electromagnetic theory of imaging in fluorescence microscopy, and imaging in polarization fluorescence microscopy,” Bioimaging **5**(4), 205–218 (1997).

**42. **E. R. Dowski and W. T. Cathey, “Extended depth of field through wave-front coding,” Appl. Opt. **34**(11), 1859–1866 (1995). [CrossRef]

**43. **S. S. Sherif, W. T. Cathey, and E. R. Dowski, “Phase plate to extend the depth of field of incoherent hybrid imaging systems,” Appl. Opt. **43**(13), 2709–2721 (2004). [CrossRef]

**44. **A. Sauceda and J. Ojeda-Castañeda, “High focal depth with fractional-power wave fronts,” Opt. Lett. **29**(6), 560–562 (2004). [CrossRef]

**45. **Q. Yang, L. Liu, and J. Sun, “Optimized phase pupil masks for extended depth of field,” Opt. Commun. **272**(1), 56–66 (2007). [CrossRef]

**46. **N. Caron and Y. Sheng, “Polynomial phase masks for extending the depth of field of a microscope,” Appl. Opt. **47**(22), E39–E43 (2008). [CrossRef]

**47. **F. Zhou, G. Li, H. Zhang, and D. Wang, “Rational phase mask to extend the depth of field in optical-digital hybrid imaging systems,” Opt. Lett. **34**(3), 380–382 (2009). [CrossRef]

**48. **Z. Zalevsky, A. Shemer, A. Zlotnik, E. B. Eliezer, and E. Marom, “All-optical axial super resolving imaging using a low-frequency binary-phase mask,” Opt. Express **14**(7), 2631–2643 (2006). [CrossRef]

**49. **B. Milgrom, N. Konforti, M. A. Golub, and E. Marom, “Pupil coding masks for imaging polychromatic scenes with high resolution and extended depth of field,” Opt. Express **18**(15), 15569–15584 (2010). [CrossRef]

**50. **F. Diaz, M.-S. L. Lee, X. Rejeaunier, G. Lehoucq, F. Goudail, B. Loiseaux, S. Bansropun, J. Rollin, E. Debes, and P. Mils, “Real-time increase in depth of field of an uncooled thermal camera using several phase-mask technologies,” Opt. Lett. **36**(3), 418–420 (2011). [CrossRef]

**51. **M.-A. Burcklen, F. Diaz, F. Leprêtre, J. Rollin, A. Delboulbé, M.-S. L. Lee, B. Loiseaux, A. Koudoli, S. Denel, P. Millet, F. Duhem, F. Lemonnier, H. Sauer, and F. Goudail, “Experimental demonstration of extended depth-of-field f/1.2 visible High Definition camera with jointly optimized phase mask and real-time digital processing,” J. Eur. Opt. Soc. Rapid Publ. **10**, 15046 (2015). [CrossRef]

**52. **J. Kennedy and R. Eberhart, “Particle swarm optimization,” in Proceedings of ICNN’95 - International Conference on Neural Networks, vol. 4 (1995), pp. 1942–1948.