Statistical properties of amplitude and decay parameter estimators for fluorescence lifetime imaging

Jeongtae Kim; Jiyeong Seok

doi:10.1364/OE.21.006061

1. Introduction

Fluorescence lifetime imaging microscopy (FLIM) provides useful information about the status of fluorophores and their molecular environments, which can be used to resolve physiological parameters in cells [1, 2]. Moreover, FLIM can be extended to monitor fluorescence resonance energy transfer that can be used to study the dynamics of proteins within the cells [3]. Since FLIM is based on the decay properties of fluorophores, an accurate estimation of fluorescence lifetime becomes important. The fluorescence lifetime is usually estimated either in the time domain, using a time correlated single photon counting (TCSPC) technique or in the frequency domain, using an intensity-modulated light source and phase-shifted emission light [1]. Compared with methods used for the frequency domain, the TCSPC-based method has advantages such as robustness to scattering, higher precision, and the ability to measure short lifetimes [4].

To estimate decay time constants in time domain, one usually fits an exponential decay model to the measured photon counts by maximizing an objective function that measures the goodness of the fit to the data. Since the detected photon count follows a Poisson distribution, one recommended measure for the goodness of fit is the Poisson likelihood function, which leads to the maximum likelihood estimator (MLE) that has many desirable properties such as asymptotic unbiasedness, asymptotic consistency, and asymptotic efficiency [5]. Nevertheless, many previous investigations did not use the Poisson likelihood function. Instead, these investigations used the sum of square errors, Pearson’s χ² and Neyman’s χ², that originate from approximations of the Poisson likelihood function to the Gaussian likelihood function [1,6–9]. We suspect that this is because χ²-based measures (including the sum of square errors, which is χ² for data having equal variances) have been frequently used to measure the goodness of fit in data analysis [10], and because it may look natural to measure the goodness of fit by a function that includes the sum of square errors between the model and data, as in the case with χ²-based measures. However, since χ²-based measures assume that each data value is a Gaussian random variable, estimation using such measures is based on an incorrect probability model. For the case of high photon counts, it is known that the Gaussian likelihood function might be an alternative to the Poisson likelihood function, because the Poisson probability distribution can be approximated by the Gaussian probability distribution [11]. However, for the case of low photon counts, one should not use the estimators that are based on the Gaussian likelihood function. Moreover, even in case of high photon counts, one should not use χ²-based estimators because these estimators are either biased or inefficient, as demonstrated herein. Nevertheless, the least squares estimator (LSE) was used in software packages for FLIM [6], and χ²-based estimators were used in FLIM imaging [7–9] until recently, in addition to the many χ²-based investigations that were reported earlier [12].

We think that the plurality of incorrect estimators is due to the lack of an analysis of statistical properties of each estimator. Although there are some investigations that compare the performance of different estimators [11,12], the comparisons are based on only numerical simulations and experiments, and do not use analysis. In addition, the usefulness of these comparisons is limited, because most studies are based on estimating the parameters of a single exponential decay. Recently, an investigation on estimating parameters of a double exponential model for measuring protein interacting fractions was reported [13]. Except for empirical comparisons, to our knowledge, there exists only one study that analyzed the performance of the estimators [14]. However, this analysis was not very useful since it assumed an infinite number of photons and was also used for the estimation of parameters of a a single decay. Moreover, since the behaviors of MLE and Pearson’s χ²-based estimator for the asymptotic case of infinite photon counts are identical in their analysis [14], it may mislead readers, who may think that equal performance can be achieved either using the MLE or Pearson’s χ²-based estimator. In fact, the Pearson’s χ²-based estimator should not be used since, as demonstrated herein, this estimator is biased for a finite number of photon counts.

In this investigation, we approximate the mean and variance of MLE, χ², Pearson’s χ², and Neyman’s χ² methods by linearizing the estimators. Such analysis was used effectively in some previous works to analyze the properties of nonlinear estimators [15, 16]. Our analysis shows that the MLE is both unbiased and efficient, while the variance of the LSE is larger than that of the MLE, although the LSE is unbiased. In addition, we show that the frequently used Pearson’s χ² and Neyman’s χ²-based estimation yield a biased estimator. We use simulations to validate our analysis.

2. Theory

2.1. Problem formulation

We model a photon count y(t_i), that is measured at time t_i from a mixture of fluorophores, as a Poisson random variable [1]:

y (t_{i}) = Poisson {λ (t_{i}; θ)}, i = 0, \dots ., N - 1,

where, the mean of the Poisson random variable, λ (t_i; θ), is defined as follows:

λ (t_{i}; θ) = g (t_{i}; θ) * h (t_{i}),

where, * is the convolution operator, h(t) is the impulse response function (IRF) of a measurement system, and g(t; θ) is an exponential decay model that is defined as follows:

g (t; θ) = \sum_{j = 0}^{M - 1} A_{j} e^{- (\frac{t}{τ_{j}})},

where, the 2M × 1 long vector of unknown parameters, θ = [τ₀, τ₁,...,τ_M₋₁, A₀, A₁,..., A_M−1], consists of the decay constants τ_i and the amplitudes A_i of all fluorophores [1]. To simplify the notations, we denote λ (t_i; θ) by λ_i(θ) and y(t_i) by y_i. Because FLIM should be based on accurate decay properties, the parameter vector θ should be accurately estimated from the noisy measurement Y = [y₀, y₁,..., y_N₋₁]. To achieve this goal, the estimation is usually carried out by determining the minimizer of an objective function, in the following way:

\hat{θ} = \underset{θ > 0}{argmin} Φ (θ; Y) .

Since it is desired that the mean of the estimator be the same as the true parameter (i.e., unbiased) and that the variance of the estimator be as small as possible, one should design the objective function based on the underlying probabilistic nature of the measured photon counts. One of the most useful estimators is the MLE, which is defined by the maximizer of the likelihood function of Y. The MLE for the measurement that is modeled in Eq. (1) is determined by the minimizer of the negative Poisson log-likelihood function, defined as follows:

- L (θ; Y) = \sum_{i = 0}^{N - 1} λ_{i} (θ) - y_{i} log λ_{i} (θ) + log y_{i}! .

If one assumes that the measured data is acquired under additive Gaussian noises with equal variances, then the LSE would be the MLE for such measured data. The objective function for the LSE is defined as follows:

Φ_{L S} (θ; Y) = \frac{1}{2} \sum_{i = 0}^{N - 1} {(y_{i} - λ_{i} (θ))}^{2} .

Since each measurement y_i has different variance, one may attempt to estimate the vector θ by considering different variances for different y_i. Because the mean and the variance of the Poisson random variable y_i are the same as λ_i(θ), assuming each measurement as a Gaussian random variable with equal mean and variance yields the well known Pearson’s χ² objective function defined, as follows [11]:

Φ_{P} (θ; Y) = \frac{1}{2} \sum_{i = 0}^{N - 1} \frac{{(y_{i} - λ_{i} (θ))}^{2}}{λ_{i} (θ)} .

Another alternative of the Pearson’s χ² is the modified Neyman’s χ², which is obtained by substituting the variance in the denominator with max{y_i, 1}, as follows:

Φ_{N} (θ; Y) = \frac{1}{2} \sum_{i = 0}^{N - 1} \frac{{(y_{i} - λ_{i} (θ))}^{2}}{max {y_{i}, 1}} .

The denominator of the original Neyman’s χ² is y_i, which is based on the intuition that E[y_i] is λ_i(θ). However, since the original Neyman’s χ² function is not defined when y_i is zero, we study the modified Neyman’s χ²-based method as suggested previously [11]. We note that if all y_i are non-zero, the modified Neyman’s χ² is the same as the original [11].

2.2. Mean and variance approximation

If the mean and the variance of all estimators (that are based on the corresponding objective functions described in the previous section) are known, it would be easy to compare their performance. One would prefer an unbiased estimator that has small variance. However, since the estimator is defined implicitly by the minimizer of some objective function, it is not possible to derive exact analytical expressions for the mean and variance [16]. To theoretically analyze the performance of the estimators, we approximate the mean and variance of each estimator by linearizing the gradient of the objective function around the true parameter as reported previously [15, 16].

We consider an estimator that is defined by the minimizer of an analytic objective function. The minimizer is determined by zeroing the gradient of the objective function, as follows [16]:

\nabla_{θ} Φ (\hat{θ}; Y) = {\nabla_{θ} Φ (θ; Y) |}_{θ = \hat{θ}} = 0,

where, the ∇_θ operator is defined by the column gradient using

\frac{\partial}{\partial θ_{j}}

[16]. To approximate its mean and variance, we linearize the column gradient ∇_θΦ(θ̂, Y) by the first-order Taylor series expansion that is performed around the true parameter θ, as follows [16]:

\nabla_{θ} Φ (\hat{θ}; Y) \approx \nabla_{θ} Φ (θ; Y) + [\nabla_{θ} {[\nabla_{θ} Φ (θ; Y)]}^{T}] (\hat{θ} - θ) .

Note that the approximation should be useful when the objective function is locally quadratic in a region around θ, and when θ̂ belongs to the local region. For such cases, the Hessian matrix ∇_θ[∇_θΦ(θ; Y)] is a positive definite matrix. Since ∇_θΦ(θ̂; Y) = 0, the estimator θ̂ can be approximated as follows [16]:

\hat{θ} \approx θ - {[\nabla_{θ} {[\nabla_{θ} Φ (θ; Y)]}^{T}]}^{- 1} \nabla_{θ} Φ (θ, Y) .

Therefore, the bias may be approximated as follows [16]:

E [\hat{θ}] - θ \approx - E [{[\nabla_{θ} {[\nabla_{θ} Φ (θ; Y)]}^{T}]}^{- 1} \nabla_{θ} Φ (θ; Y)] \approx - H^{- 1} E [\nabla_{θ} Φ (θ; Y)],

where, H is the expectation of Hessian matrix at θ and is defined as follows:

H = E [\nabla_{θ} {[\nabla_{θ} Φ (θ; Y)]}^{T}] .

In addition, the covariance of the estimator may be approximated as follows [16]:

Cov [\hat{θ}] \approx H^{- 1} E [\nabla_{θ} Φ (θ; Y) \nabla_{θ} Φ {(θ; Y)}^{T}] H^{- 1}

Note that the approximation would be more accurate if estimated parameter values are closer to true parameter values, because in such a case, the objective function can be better approximated by a locally quadratic function around the true parameter (i.e., high signal to noise scenario).

2.3. Cramer-Rao bound

For any unbiased estimator, the Cramer-Rao bound is the lowest achievable bound for the variance of the estimator [5, 17]. The variance of the j-th component of any unbiased estimator θ̂_j is bounded by J_jj, as follows [17]:

Var [{\hat{θ}}_{j}] \geq J_{j j},

where, the matrix J is the inverse of the Fisher information matrix F, which is defined as follows:

F_{j k} ≜ E [\frac{\partial log p (Y; θ)}{\partial θ_{j}} \frac{\partial log p (Y; θ)}{\partial θ_{k}}] = - E [\frac{\partial^{2} log p (Y; θ)}{\partial θ_{j} \partial θ_{k}}] .

We compute the Cramer-Rao bound for the estimation of θ from Y (defined in Eq. (1)). By using the negative log-likelihood function defined in Eq. (5), it is straightforward to show that the (j, k)-th element of the Fisher information matrix F can be defined as follows:

F_{j k} = \sum_{i = 0}^{N - 1} \frac{1}{λ_{i} (θ)} \frac{\partial λ_{i} (θ)}{\partial θ_{j}} \frac{\partial λ_{i} (θ)}{\partial θ_{k}} .

Note that it is possible to simplify the Fisher information matrix, using the relation E[y_i] = Var[y_i] = λ_i(θ).

2.4. Maximum likelihood estimation

The MLE that use the negative Poisson log-likelihood function, defined in Eq. (5), is unbiased in our approximation since

E [\frac{\partial L (θ; Y))}{\partial θ_{j}}] = E [\sum_{i = 0}^{N - 1} (\frac{y_{i}}{λ_{i} (θ)} - 1) \frac{\partial λ_{i} (θ)}{\partial θ_{j}}] = 0 .

For covariance approximation, it is sufficient to note that [17, p. 80]

- E [[\nabla_{θ} L (θ, Y)] {[\nabla_{θ} L (θ, Y)]}^{T}] = E [\nabla_{θ} {[\nabla_{θ} L (θ, Y)]}^{T}] .

Since the approximated covariance matrix is the same as the inverse of the Fisher information matrix, we conclude that, in our approximation, the MLE is both unbiased and efficient. This result is expected, since the MLE is asymptotically unbiased and asymptotically efficient [5].

2.5. Least squares estimation

The least squares estimation is also unbiased in our approximation, because

- E [\frac{\partial Φ_{L S} (θ; Y)}{\partial θ_{j}}] = E [\sum_{i} (y_{i} - λ_{i} (θ)) \frac{\partial λ_{i} (θ)}{\partial θ_{j}}] = 0 .

However, the covariance approximation of the LSE reveals that it has larger variance than MLE. Let H = E [∇_θ[∇_θΦ_LS(θ; Y)]^T] and K = E[∇_θΦ_LS(θ; Y)∇_θΦ_LS(θ; Y)^T]. Then the (j, k)-th elements of H and K matrices are defined as follows:

H_{j k} = \sum_{i = 0}^{N - 1} \frac{\partial λ_{i} (θ)}{\partial θ_{j}} \frac{\partial λ_{i} (θ)}{\partial θ_{k}},

K_{j k} = \sum_{i = 0}^{N - 1} λ_{i} (θ) \frac{\partial λ_{i} (θ)}{\partial θ_{j}} \frac{\partial λ_{i} (θ)}{\partial θ_{k}} .

The approximated covariance function of the LSE, θ̂_LS, satisfies the following inequality due to the matrix Cauchy-Schwarz inequality [18]:

Cov {{\hat{θ}}_{L S}} \approx H^{- 1} K H^{- 1} \geq F^{- 1},

where, A ≥ B is understood in the sense that A – B matrix is positive semidefinite. The proof of this inequality is included in Appendix A. Therefore, we conclude that the variance of the LSE is larger than the MLE, although the estimator is unbiased. This is because the LSE is based on an incorrect probability model.

2.6. Pearson’s χ²

The approximated mean of the Pearson’s χ²-based estimator shows that the estimator is biased because, from what follows, the negative expected gradient value at true parameter is not zero:

\begin{array}{l} - E [\frac{\partial Φ_{p} (θ; Y)}{\partial θ_{j}}] & = & - E [\frac{1}{2} \sum_{i = 0}^{N - 1} \frac{- 2 (y_{i} - λ_{i} (θ)) λ_{i} (θ) - {(y_{i} - λ_{i} (θ))}^{2}}{λ_{i} {(θ)}^{2}} \frac{\partial λ_{i} (θ)}{\partial θ_{j}}] \\ = & \frac{1}{2} \sum_{i = 0}^{N - 1} \frac{1}{λ_{i} (θ)} \frac{\partial λ_{i} (θ)}{\partial θ_{j}} . \end{array}

If, as assumed in our approximation, the expectation of Hessian matrix at θ is invertible, the non-zero gradient implies the biasedness of the estimator in our approximations. We also compute the expectation of the Hessian matrix at θ, H^P, as follows:

H_{j k}^{P} = \sum_{i = 0}^{N - 1} [\frac{1}{λ_{i} (θ)} + \frac{1}{λ_{i} {(θ)}^{2}}] \frac{\partial λ_{i} (θ)}{\partial θ_{j}} \frac{\partial λ_{i} (θ)}{\partial θ_{k}} - \frac{1}{2} \frac{1}{λ_{i} (θ)} \frac{\partial^{2} λ_{i} (θ)}{\partial θ_{j} \partial θ_{k}} .

One should consider the Pearson’s χ²-based estimator to be a biased estimator, even for the case of high photon counts. We note that it was previously shown that Pearson’s χ²-based estimator is asymptotically unbiased and efficient in estimating a single decay time constant [19]. Although this statement is correct, readers should not be misled, because this case describes the case of infinite amplitude. For the estimation of a single decay constant, the bias in estimating the decay constant decreases in proportion to the reciprocal value of the amplitude (see Appendix B). Since the rate of bias decrease is only sub-linear, one should expect the presence of bias even for the case of high photon counts. For the estimation of a single decay constant, the Pearson’s χ²-based estimator will show positive bias for the case of high photon counts, because the expectation of the negative of the gradient, defined in Eq. (24), has a positive value, and the approximated second order derivative be a positive constant. This is related to the fact that Pearson’s χ²-based estimation of the mean of a Poisson random variable overestimates the mean [11], since an overestimated mean would imply slower decay than the true one. However, in estimating multiple parameters simultaneously, the bias can be either negative or positive, depending on the value of the Hessian matrix. Because unbiased estimation of decay parameters is important, we believe that one should avoid using Pearson’s χ² for FLIM imaging. Regarding the variance approximation, although we were able to derive a formula, we have not included it here because the expression is lengthy and includes even fourth order statistics. Instead, we include the empirical variances and discuss the performance in comparison with other estimators in the following section.

2.7. Neyman’s χ²

The approximated mean of the modified Neyman’s χ²-based estimator shows that, similar to the Pearson’s χ²-based estimator, this estimator is biased as well. The expectation of the negative of gradient at true θ is calculated as follows:

\begin{array}{l} - E [\frac{\partial Φ_{N} (θ; Y)}{\partial θ_{j}}] & = & E [\sum_{i = 0}^{N - 1} (\frac{y_{i} - λ_{i} (θ)}{max {y_{i}, 1}}) \frac{\partial λ_{i} (θ)}{\partial θ_{j}}] \approx \sum_{i = 0}^{N - 1} (1 - E [\frac{λ_{i} (θ)}{max {y_{i}, 1}}]) \frac{\partial λ_{i} (θ)}{\partial θ_{j}} \\ \approx & - \sum_{i = 0}^{N - 1} [\frac{1}{λ_{i} (θ)} + λ_{i} (θ) e^{- λ_{i} (θ)}] \frac{\partial λ_{i} (θ)}{\partial θ_{j}}, \end{array}

where, we used the approximation that was developed previously, as follows [20]:

E [\frac{1}{max {y_{i}, 1}}] \approx e^{- λ_{i} (θ)} + \frac{1}{λ_{i} (θ) - 1} .

For a final approximation formula for the bias, we approximate the expectation of Hessian matrix H^N as follows:

\begin{array}{l} H_{j k}^{N} & = & E [\sum_{i = 0}^{N - 1} \frac{1}{max {y_{i}, 1}} \frac{\partial λ_{i} (θ)}{\partial θ_{j}} \frac{\partial λ_{i} (θ)}{\partial θ_{k}} - (\frac{y_{i} - λ_{i} (θ)}{max {y_{i}, 1}}) \frac{\partial^{2} λ_{i} (θ)}{\partial θ_{j} \partial θ_{k}}], \\ \approx & \sum_{i = 0}^{N - 1} [e^{- λ_{i} (θ)} + \frac{1}{λ_{i} (θ)}] \frac{\partial λ_{i} (θ)}{\partial θ_{j}} \frac{\partial λ_{i} (θ)}{\partial θ_{k}} + [\frac{1}{λ_{i} (θ)} + λ_{i} (θ) e^{- λ_{i} (θ)}] \frac{\partial^{2} λ_{i} (θ)}{\partial θ_{j} \partial θ_{k}} . \end{array}

In Eq. (26) and Eq. (28), we approximated 1/(λ_i(θ) − 1) by 1/λ_i(θ) to avoid the numerical instability that arises when λ_i(θ) = 1. An interesting observation is that the expectation of the negative of gradient, defined in Eq. (26), attains a negative value, whereas that of Pearson’s χ² attains a positive value. Again, this is related to the fact that Neyman’s χ²-based mean estimation underestimates the mean [11]. Therefore, an estimation of a single time constant using Neymans’s χ² and Pearson’s χ²-based estimation methods result in underestimation and overestimation, respectively. This is consistent with experimental results that were reported previously [12]. Regarding the variance approximation, we do not include the approximation formula, as it also includes fourth order statistics and does not provide useful information. Instead, in the subsequent section, we include the numerical simulation results.

2.8. Numerical optimization

To determine the MLE and the χ²-based estimations, it is required to solve the minimization problem defined in Eq. (4) with the objective functions defined in Eq. (5) through Eq. (8). Note that the minimization problem is a constrained optimization problem with non-negativity constraints since both decay and amplitude parameters have positive values. We solve the constrained minimization problem using the trust region reflective method [21–23] implemented in the Optimization toolbox in MATLAB (Mathworks, USA), which is based on the quadratic approximation of an objective function. The Hessian matrix H^M of the negative log-likelihood function, which is the objective function for the MLE, is defined as follows:

H_{j k}^{M} = \sum_{i = 0}^{N - 1} (1 - \frac{y_{i}}{λ_{i} (θ)}) \frac{\partial^{2} λ_{i} (θ)}{\partial θ_{j} \partial θ_{k}} + \frac{y_{i}}{λ_{i} {(θ)}^{2}} \frac{\partial λ_{i} (θ)}{\partial θ_{j}} \frac{\partial λ_{i} (θ)}{\partial θ_{k}} .

For the quadratic approximation of the negative log-likelihood function, we ignore the first term in Eq. (29) since inclusion of the second order derivative term can be destabilizing if the model fits badly [10, p. 688]. The Hessian matrices for the χ²-based estimators are also computed using only first order derivative terms for the same reason.

The approximations for the statistical properties of each estimator developed in the preceding sections are based on the assumption that each estimator is determined by zeroing the gradient of corresponding objective function around true parameter. In practice, since objective functions for the MLE and the χ²-based estimators are non-convex functions, numerical optimization may find only a local minimum depending on initial parameter values. Because it is very challenging to analyze the behaviors of local minima theoretically, we study the effects of local minima in decay and amplitude parameter estimation using numerical simulations in the following section.

3. Simulation results

To validate the analysis that was presented in the preceding section, we conducted simulation studies in order to estimate the two decay constants and their associated amplitudes in an exponential decay model that is defined as follows:

λ_{i} (θ) = (A_{1} e^{\frac{t_{i}}{τ_{1}}} + A_{2} e^{\frac{t_{i}}{τ_{2}}}) * h (t_{i}) .

In Eq. (30), we used the IRF of a truncated Gaussian function with σ = 1 and 5 nonzero points. Note that the Gaussian function was often used to model an IRF [2, 6]. We generated 1024 Poisson random variables y_i whose mean is λ_i(θ) with uniformly spaced t_i from t₀ = 0 to t_N₋₁ = 12.788ns. We set true values as τ₁ = 1ns, τ₂ = 4ns. To simulate a different number of photon counts, we changed A₁ from 23 to 125 while we set A₂ = A₁ − 10. By using the generated photon counts (i.e., the Poisson random variables), we estimated the parameters using the MLE, LSE, and Pearson’s χ² and Neyman’s χ²-based estimators. We set initial parameter values by two times the true values. We repeated the estimation 5000 times, using different Poisson random variable realizations.

Figure 1 shows the sample mean of each estimator for [τ₁, τ₂, A₁, A₂] from the 5000 estimations with the approximated mean values that were obtained using the formula developed in the preceding section for the Pearson’s and Neyman’s χ²-based estimators. Notably, because there is no bias in our approximation, the approximated means of the MLE and the LSE are the same as true ones. As shown in Fig. 1, the sample means of the MLE and the LSE are very close to the true parameters for the entire range of photon counts. However, the Pearson’s χ²-based estimator shows positive biases in estimating τ₁ and τ₂. There also exists a small positive bias in estimating A₁ and a small negative bias in estimating A₂. For higher photon counts, the approximated mean formula is in better agreement with the sample mean. The approximated mean values are included only in the range for which the approximated Hessian is a positive definite matrix. Compared with the Pearson’s χ²-based estimator, the Neyman’s χ²-based estimator exhibits more complicated behaviors. The Neyman’s χ²-based estimator shows positive biases in estimating τ₁ and τ₂ when photon counts are very low. However, in the region wherein photon counts are relatively high, the biases are negative. The approximated mean formula agrees with the sample mean only in the regime of high photon counts. As a matter of fact, the approximated Hessian matrix of the Neyman’s χ² is positive definite only for sufficiently high photon counts, indicating that the Neyman’s χ²-based estimator might be more nonlinear than the Pearson’s χ²-based estimator.

Fig. 1 Mean of the estimated parameters for different photon counts: (a) τ₁; (b) τ₂; (c) A₁; (d) A₂;

Download Full Size | PDF

In Fig. 2, we present the sample variance of each estimator with the Cramer-Rao bound and approximated variances for the LSE. As shown in Fig. 2, the sample variances of the MLE and the LSE agree with the Cramer-Rao bound and the approximated variance formula very well for the entire range of photon counts. We note that, according to the proof presented in Appendix A, the approximated variance of the LSE is always greater than the Cramer-Rao bound. The variances of the Pearson’s χ²-based estimator are larger than those of the MLE for low photon counts, although the variances become close to the Cramer-Rao bound as the number of photons increases. Note that even for high photon counts, the Pearson’s χ²-based estimator is biased in estimating τ₁ and τ₂. For Neyman’s χ²-based estimator, the variances are greater than those of the MLE for relatively low photon counts. Although the variances for estimating τ₁ and τ₂ approach the Cramer-Rao bound as photon counts increase, variances in estimated amplitudes do not approach the Cramer-Rao bound even in the case of very high photon counts, as is shown in Fig. 2(c) and Fig. 2(d). We believe that the Neyman’s χ²-based estimator should not be used because it is more biased than the Pearson’s χ²-based method, in addition to showing larger variances. Note that the Neyman’s χ²-based method was a frequently used method in FLIM imaging [7, 12]. We believe that one should choose the MLE for FLIM imaging, not only for low photon counts but also for high photon counts. Our analysis and simulations show biases in Pearson’s and Neyman’s χ²-based methods even for the case of more than 45000 photon counts. In addition, since the LSE shows larger variances than the MLE, there is no reason to favor the LSE over to the MLE.

Fig. 2 Variance in the estimation of parameters for different photon counts: (a) τ₁; (b) τ₂; (c) A₁; (d) A₂;

Download Full Size | PDF

As mentioned in the preceding section, numerical optimization may determine only a local minimizer since the objective functions for the MLE and the χ²-based estimators are non-convex functions. To study the effects of local minimizers in estimating decay and amplitude parameters, we conducted another simulation study. We generated 1024 Poisson random variables y_i whose mean is λ_i(θ) with true parameter values as τ₁ = 1ns, τ₂ = 4ns, A₁ = 72, and A₂ = 62 (number of average total photon counts is 24852). Then, we estimated the decay and the amplitude parameters using the MLE and the χ²-based methods while increasing initial parameter values from the true values to five times the true values. We repeated these estimations 5000 times with different Poisson random variable realizations. Figure 3 shows the sample mean of each estimator from 5000 realizations for different initial parameters. To our surprise, all the four estimation methods are very robust to the change of initial parameters. As shown in Fig. 3, the sample mean values of estimated parameters using the four methods are almost the same for the entire range of initial parameters. Similar results are found in the sample variance of each estimator which is shown in Fig. 4. One can see that the sample variances of the MLE and the χ²-based estimations do not change much for the entire range of initial parameters. In addition, the sample variance of the MLE and the LSE agree with the Cramer-Rao bound and the variance approximation formula for the LSE, respectively. These simulation results suggest that the objective functions of the four estimation methods might be locally convex functions for the cases of high photon counts.

Fig. 3 Mean of the estimated parameters for different initial parameter values (high photon counts): (a) τ₁ (true value=1); (b) τ₂ (true value=4); (c) A₁ (true value=72); (d) A₂ (true value=62);

Download Full Size | PDF

Fig. 4 Variance in the estimation of parameters for different initial parameter values (high photon counts): (a) τ₁; (b) τ₂; (c) A₁; (d) A₂;

Download Full Size | PDF

It is expected that the correct estimation of decay and amplitude parameters becomes more difficult if two decay parameters are more similar and/or the number of photons is smaller. For such cases, the effect of local minimizers in estimating the parameters might be more severe than the cases of high photon counts. To study the statistical properties of each estimator when the number of photon counts is small and two decay parameters are similar, we changed the true parameters values in λ_i(θ) to τ₁ = 1ns, τ₂ = 2ns, A₁ = 28 and A₂ = 18 (number of average total photon counts is 5138). Then, we conducted the same simulation study as the previous one. Figure 5 shows the sample mean values of estimated decay and amplitude parameters for different initial parameters. One can see that the sample mean values of the MLE exhibit small biases. In addition, the sample mean values are different for different initial parameter values. We suspect that this dependency on initial parameter is due to the effects of local minimizers. Compared with the MLE, the Pearson’s χ²-based estimation was more robust to the change of initial parameters. However, the Pearson’s χ²-based method shows larger biases than the MLE as shown in Fig. 5(a) through Fig. 5(d). The Neyman’s χ² and the LSE exhibit very large bias in estimating τ₂ for some initial parameters as shown in Fig. 5(b). Although the amounts of the biases of the MLE are different depending on initial parameters, the sample mean values of the MLE are the closest ones to the true parameter values for the entire range of initial parameters.

Fig. 5 Mean of the estimated parameters for different initial parameter values (low photon counts and similar decay parameters): (a) τ₁ (true value=1); (b) τ₂ (true value=2); (c) A₁ (true value=28); (d) A₂ (true value=18);

Download Full Size | PDF

Figure 6 shows the sample variances of each estimator from the 5000 realizations. We do not include the Cramer-Rao bound and the approximated variances for the LSE in Fig. 6 since neither the MLE is unbiased nor the approximations developed in the preceding section are valid if the number of photon counts is small. As shown in Fig. 6(a), Fig. 6(c) and Fig. 6(d), the sample variances of the Pearsons and Neymans χ² were smaller than those of the MLE for τ₁, A₁ and A₂. However, in estimating τ₂, Pearson’s and Neyman’s χ² showed larger variances than the MLE. In addition, the variances of the LSE were always larger than the MLE. Note that although the Pearson’s χ²-based estimator shows smaller variances than the MLE in estimating some parameters, it has larger biases than the MLE. Since unbiased estimation of decay and amplitude parameters is important, we believe that one should choose the MLE for the cases of low photon counts with similar decay parameters. Note that a previous investigation also reported that the MLE should be used for the cases of low photon counts based on empirical comparisons [12].

Fig. 6 Variance in the estimation of parameters for different initial parameter values (low photon counts and similar decay parameters): (a) τ₁; (b) τ₂; (c) A₁; (d) A₂;

Download Full Size | PDF

4. Conclusion

We used gradient linearization around true parameters to analyze the statistical properties of amplitude and decay constant estimators for FLIM. In the analysis, the MLE that was based on the Poisson likelihood function exhibited both unbiasedness and efficiency. The LSE was unbiased, but it showed a larger variance than the MLE. Since the Pearson’s and Neyman’s χ²-based estimators showed biases, it can be concluded that they are not suitable for even high photon counts. We believe that the analysis presented in this work can also be used to predict the performance of the estimators for FLIM.

Appendix A: Inequality for the variance approximations of MLE and LSE

Matrix Cauchy-Schwarz inequality is defined as follows [18]:

B^{T} B - [B^{T} A] {[A^{T} A]}^{- 1} [A^{T} B] \geq 0,

where, a matrix C ≥ 0 means that the matrix C is a positive semidefinite matrix. We define N × 2M matrices A and B as follows:

B_{i j} = \frac{1}{\sqrt{λ_{i} (θ)}} \frac{\partial λ_{i} (θ)}{\partial θ_{j}},

A_{i j} = \sqrt{λ_{i} (θ)} \frac{\partial λ_{i} (θ)}{\partial θ_{j}}, i = 0, \dots, N - 1, j = 0, \dots, 2 M - 1 .

Note that the Fisher information matrix is defined as follows:

F = B^{T} B .

In addition, since the H matrix, defined in Eq. (21) is A^TB, and the K matrix, defined in Eq. (22), is A^TA, the covariance approximation of the LSE satisfies the following inequality:

Cov {{\hat{θ}}_{L S}} \approx {[A^{T} B]}^{- 1} [A^{T} A] {[B^{T} A]}^{- 1} \geq F^{- 1} \approx Cov {{\hat{θ}}_{M L E}},

where θ̂_MLE denotes MLE.

Appendix B: Mean approximation for the estimation of a single time constant using the Pearson’s χ²

For the estimation of a single decay constant from a single fluorophore, the mean intensity is modeled as follows:

λ_{i} (θ) = A e^{- \frac{t_{i}}{τ}} * i (t_{i}) .

Then, the first order derivative is defined as follows:

\frac{\partial λ_{i} (θ)}{\partial τ} = \frac{t_{i}}{τ^{2}} λ_{i} (θ) .

Note that the first term in Eq. (25) dominates when A is very large, since the first term is proportional to A but others are not. Therefore, we approximate the second derivative of the objective function as follows:

h^{p} \approx \sum_{i = 0}^{N - 1} \frac{t_{i}^{2}}{τ^{4}} λ_{i} (θ) .

Therefore, the bias is approximated as follows:

E [\hat{τ}] - τ \approx \frac{1}{h^{p}} \frac{1}{2} \sum_{i = 0}^{N - 1} \frac{1}{λ_{i} (θ)} \frac{\partial λ_{i} (θ)}{\partial τ} = \frac{1}{2 A} \frac{\sum_{i = 0}^{N - 1} \frac{t_{i}}{τ^{2}}}{\sum_{i = 0}^{N - 1} \frac{t_{i}^{2}}{τ^{4}} e^{- \frac{t_{i}}{τ}} * i (t_{i})} .

Note that the bias in estimating τ approaches zero as A approaches infinity. However, note that the convergence rate is rather slow, since the bias is proportional to the reciprocal value of the amplitude A.

Acknowledgment

This work was supported by the Korean Research Foundation Grant funded by the Korean Government (KRF-2008-331-D00419) and Ewha Global Top 5 Grant 2011 of Ewha Womans University.

References and links

1. J. R. Lakowicz, Principles of Fluorescence Spectroscopy (Kluwer Academic/Plenum, 1999) [CrossRef] .

2. H. E. Grecco, P. Roda-Navarro, and P. J. Verveer, “Global analysis of time correlated single photon counting FRET-FLIM data,” Opt. Express 17, 6493–6508 (2009) [CrossRef] [PubMed] .

3. S. Kumar, C. Dunsby, P. A. A. D. Beule, D. M. Owen, U. Anand, P. M. P. Lanigan, R. K. P. Benninger, D. M. Davis, M. A. A. Neil, P. Anand, C. Benham, A. Naylor, and P. M. W. French, “Multifocal multiphoton excitation and time correlated single photon counting detection for 3-D fluorescence lifetime imaging,” Opt. Express 15, 12548–12561 (2007) [CrossRef] [PubMed] .

4. B. B. Collier and M. J. McShane, “Dynamic windowing algorithm for the fast and accurate determination of luminescence lifetimes,” Anal. Chem. 84, 4725–4731 (2012) [CrossRef] [PubMed] .

5. H. Cramer, Mathematical Methods of Statistics (Princeton University, 1999).

6. S. Laptenok, K. M. Mullen, J. W. Borst, I. H. M. van Stokkum, V. V. Apanasovich, and A. J. W. G. Visser, “Fluorescence lifetime imaging microscopy (FLIM) data analysis with TIMP,” J. Stat. Softw. 18, 1–20 (2007).

7. S. Pelet, M. J. R. Previte, L. H. Laiho, and P. T. C. So, “A fast global fitting algorithm for fluorescence lifetime imaging microscopy based on image segmentation.” Biophys. J. 87, 2807–17 (2004) [CrossRef] [PubMed] .

8. N. Boens, W. Qin, N. Basarić, J. Hofkens, M. Ameloot, J. Pouget, J.-P. Lefèvre, B. Valeur, E. Gratton, M. vandeVen, N. D. Silva, Y. Engelborghs, K. Willaert, A. Sillen, G. Rumbles, D. Phillips, A. J. W. G. Visser, A. van Hoek, J. R. Lakowicz, H. Malak, I. Gryczynski, A. G. Szabo, D. T. Krajcarski, N. Tamai, and A. Miura, “Fluorescence lifetime standards for time and frequency domain fluorescence spectroscopy,” Anal. Chem. 79, 2137–2149 (2007) [CrossRef] [PubMed] .

9. P. J. Steinbach, R. Ionescu, and C. R. Matthews, “Analysis of kinetics using a hybrid maximum-entropy/nonlinear-least-squares method: application to protein folding,” Biophys. J. 82, 2244–2255 (2002) [CrossRef] [PubMed] .

10. W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in C++ - The Art of Scientific Computing (Cambridge University, 2002).

11. T. Hauschild and M. Jentschel, “Comparison of maximum likelihood estimation and chi-square statistics applied to counting experiments,” Nucl. Instrum. Meth. A 457, 384–401 (2001) [CrossRef] .

12. M. Maus, M. Cotlet, J. Hofkens, T. Gensch, F. C. De Schryver, J. Schaffer, and C. A. M. Seidel, “An experimental comparison of the maximum likelihood estimation and nonlinear least-squares fluorescence lifetime analysis of single molecules,” Anal. Chem. 73, 2078–2086 (2001) [CrossRef] [PubMed] .

13. K. A. Walther, B. Papke, M. B. Sinn, K. Michel, and A. Kinkhabwala, “Precise measurement of protein interacting fractions with fluorescence lifetime imaging microscopy,” Mol. BioSyst. 7, 322–336 (2011) [CrossRef] [PubMed] .

14. J. Tellinghuisen and C. W. Wilkerson, “Bias and precision in the estimation of exponential decay parameters from sparse data,” Anal. Chem. 65, 1240–1246 (1993) [CrossRef] .

15. J. Fessler, “Mean and variance of implicitly defined biased estimators (such as penalized maximum likelihood): applications to tomography,” IEEE Trans. Image Process. 5, 493 –506 (1996) [CrossRef] [PubMed] .

16. J. Kim and J. Fessler, “Intensity-based image registration using robust correlation coefficients,” IEEE Trans. Med. Imag. 23, 1430 –1444 (2004) [CrossRef] .

17. H. Van Trees, Detection, Estimation, and Modulation Theory, Part 1 (John Wiley & Sons, 2001).

18. T. G., “A matrix extension of the cauchy-schwarz inequality,” Econ. Lett. 63, 1–3 (1999) [CrossRef] .

19. P. Hall and B. Selinger, “Better estimates of exponential decay parameters,” J. Phys. Chem. 85, 2941–2946 (1981) [CrossRef] .

20. K. J. Mighell, “Parameter estimation in astronomy with poisson-distributed data. I. the $χ_{γ}^{2}$ statistic,” The Astrophys. J. 518, 380 (1999) [CrossRef] .

21. J. Moré and D. Sorensen, “Computing a trust region step,” SIAM J. Sci. Comput. 4, 553–572 (1983) [CrossRef] .

22. M. A. Branch, T. F. Coleman, and Y. Li, “A subspace, interior, and conjugate gradient method for large-scale bound-constrained minimization problems,” SIAM J. Sci. Comput. 21, 1–23 (1999) [CrossRef] .

23. R. Byrd, R. Schnabel, and G. Shultz, “Approximate solution of the trust region problem by minimization over two-dimensional subspaces,” Math. Program. 40, 247–263 (1988) [CrossRef] .

Statistical properties of amplitude and decay parameter estimators for fluorescence lifetime imaging

Abstract

1. Introduction