Optimization of linear signal processing in photon counting lidar using Poisson thinning

Matthew Hayman; Robert A. Stillwell; Scott M. Spuler

doi:10.1364/OL.396498

Photon counting detectors are frequently employed in atmospheric lidar due to the small backscattered signals from atmospheric targets. In systems with low power transmitters, photon counting is essential to capture useful atmospheric observations. Shot noise, a Poisson random process, is typically the dominant source of random noise in photon counting systems. In order to overcome shot noise, observations are typically accumulated over many laser pulses. The profiles are then smoothed—generally in time and range—to further suppress noise. If the smoothing operation includes decimating the lidar resolution, this is equivalent to binning. The limitation of smoothing is that it risks biasing the data by blurring out statistically significant structures in the profile. This means that the practice of integrating profiles until random noise is satisfactorily suppressed likely generates other, unquantified errors in a lidar product. This is particularly problematic for nonlinear retrievals such as water vapor estimates from differential absorption lidar (DIAL) and particle extinction from both high spectral resolution lidar (HSRL) and Raman lidar. In general, most atmospheric lidar systems aim to reduce noise while accurately capturing the resolvable variations in the atmosphere. Ideally the biases imposed by integration or smoothing are balanced with the more quantifiable random errors. Inevitably, the choices we make in signal processing impact the accuracy and precision of the retrieved data products—whether accounted for or not. This presents a particular challenge to generalizing observations across platforms even when the processing is identical, because different lidars are likely to have different noise and bias characteristics.

The use of various low pass filters for noise suppression is well established in the lidar community. Leblanc et al. [1] provides a thorough description of various digital filter designs for lidar signal processing, including the resulting resolution of the profile after filtering. The work considers the factors related to noise suppression in ozone DIAL and various low pass filter designs, but without a priori knowledge of the backscattered signal and noise power spectrum, one cannot definitively know the best filter for the scene. Godin et al. [2] considered several different filter algorithms that were actively employed for ozone DIAL processing. Those algorithms were evaluated using synthetic ozone DIAL data. Because the data was synthetic, allowing knowledge of the true signals, they were able to include errors caused by smearing in the algorithm evaluations. The use of synthetic data is helpful for evaluating the impact of smearing in a general sense and perhaps adequate for analyzing some forms of data with reasonably well known structures, such as stratospheric ozone data. However, this is not true of troposphere observations that can be highly variable, so the error contributions of different smoothing algorithms are difficult to generalize. In the troposphere and for most observations in general, the ideal filter will inevitably depend on the particular scene under observation, which cannot be known a priori.

The challenge in optimizing signal processing to a scene is having an objective measure of errors from both random noise and bias (smearing). With such a metric, we can trade and dynamically optimize the filter design to produce the best balance between these two sources of error. One method for this is known as hold-out cross validation [3], in which a statistically independent validation data set is used to evaluate the quality of fit. In the case of photon counting, a validation data set with the same signal and statistical properties as the original data is easily obtained through a technique called “Poisson thinning.” Poisson thinning makes two statistically independent copies of the original photon counting profile, where each has approximately half the original photon counts. Here, one profile is processed using smoothing kernels to reduce random noise, while the second is used to evaluate the effect of noise suppression and smearing through cross validation. The smoothing kernel that produces the best cross-validation result is the best kernel (of the evaluated set) for the scene.

Poisson thinning [4] can generally be employed for estimating signal processing tuning parameters. It can be used to estimate the best polynomial order in a fit, as well as regularization terms in more sophisticated processing algorithms such as Poisson total variation (PTV) [5]. In this work, we show that Poisson thinning allows us to better customize linear smoothing to each scene without any need for human intervention. The added computational expense is dictated by the number of filters we wish to consider, where the computation time is roughly equal to the number of filters times the time to evaluate a single filter.

Poisson thinning relies on the assumption that the observed photon counts $s$ is a Poisson random process described by

(1)$${P_\alpha}(s\!) = \frac{{{e^{- \alpha}}{\alpha ^s}}}{{s\!!}},$$

where ${P_\alpha}(s\!)$ is the probability distribution function (PDF) of photon counts $s\!$, $\alpha$ is the mean number of photons in the time interval defined by the multi-channel scalar bin and represents the underlying signal we wish to recover.

Poisson thinning is a process where we effectively flip a coin for each counted photon in an observation $s$ and place it one of two independent profiles based on the outcome of that coin flip. If the coin flips heads, the photon increases the count of set $f$, and, if tails, the photon is placed in set $g$. Because we make no distinction between photons, we can effectively determine the number of photons in $f$ by generating a binomial random number such that $f \sim {\rm Binomial}(p,s\!)$, with heads probability $p$ and number of trials $s\!$. It therefore stands to reason any photons that were not sorted into $f$ must be in the other thinned profile $g$, so

(2) $$g = s - f.$$

If the mean number of photons in $s$ is $\alpha$, then the mean number of photons in $f$ is $p\alpha$ and $(1 - p)\alpha$ in $g$.

This produces $f$ and $g$, which are themselves statistically independent Poisson random variables. This is perhaps most easily seen by computing the joint PDF of $s$ and $f$:

(3)$${P_\alpha}(s\!,f) = {P_\alpha}(s\!)P(f|s\!),$$

where ${P_\alpha}(s\!,f)$ is the joint PDF, and $P(f|s)$ is the PDF of thinned counts $f$ given original counts $s\!$. The joint PDF then can be written as

(4)$${P_\alpha}(s\!,f) = {e^{- \alpha}}\frac{{{\alpha ^s}}}{{s\!!}}\left({\begin{array}{*{20}{c}}s\\f\end{array}} \right){p^f}{(1 - p)^g}.$$

When evaluating $s\!$, choose $f$, substituting $\alpha = p\alpha + (1 - p)\alpha$ and simplifying results in

(5)$${P_\alpha}(s\!,f) = \frac{{{e^{- p\alpha}}{{(p\alpha)}^f}}}{{f!}}\frac{{{e^{- (1 - p)\alpha}}{{[{(1 - p)\alpha} ]}^g}}}{{g!}}.$$

Thus, the joint PDF is fully described by two independent Poisson random variables. The thinned profiles are Poisson distributed and statistically independent, but their mean signals are scaled versions of the original $\alpha$. In order to optimize filter kernels, we require $p = 1/2$ so that the mean of the thinned signals are identical. In the last example, we will require three independent sets and use $p = 1/3$.

Poisson thinning is simple to implement, and the process is fast. For example, the following two lines of python code will provide two thinned profiles $f$ and $g$ from the original photon count profile $s\!$: f = numpy.random.binomial(s, 0.5, size = s.shape) g = s - f

The filtering operation will be limited here to linear convolution, so, in the simplest case, the filtered signal will be described as

(6)$$\tilde{\textbf{f}}(\textbf{a}) = \textbf{f}\,*\,\textbf{h}(\textbf{a}),$$

where $\tilde{\textbf{f}}$ is the filtered version of $\textbf{f}$, and $\textbf{h}(\textbf{a})$ is the filter kernel or impulse response with tuning parameters $\textbf{a}$.

One limitation of directly applying Eq. (6) is that the expected or known structure, such as background variation and geometric overlap, may be the limiting factors in the size of the smoothing kernel. Thus, more random noise can often be suppressed if we remove the known structure prior to smoothing, then add it back in to obtain the signal estimate. So, we can more effectively estimate the signal using

(7)$$\tilde{\textbf{f}}(\textbf{a}) = {y^{- 1}}\left[{y(\textbf{f})*\textbf{h}(\textbf{a})} \right],$$

where $y$ is the function used to remove the expected structure from the profile, and ${y^{- 1}}$ is its inverse.

In the examples included here, only the background is removed so that

(8)$$y(\textbf{f}) = \textbf{f} - \textbf{b}.$$

For a given photon counting scene, we wish to optimize our selection of the tuning parameters $\textbf{a}$ to minimize error in the filtered profile. This error is a combination of random noise and bias. Because the noise in $\textbf{g}$ is statistically independent of the noise in $\textbf{f}$, residual random noise is penalized in the validation process. But, the signal that generates $\textbf{f}$ and $\textbf{g}$ [$\alpha$ in Eq. (5)] is common to the two independent observations, so a bias will also be penalized.

In the case presented here, we seek to find the most likely estimate of the signal $\tilde{\textbf{f}}(\textbf{a})$ for the observed validation data $\textbf{g}$. In order to evaluate this, we seek to minimize the negative log-likelihood of a Poisson random number (which therefore maximizes its likelihood). The loss function we wish to optimize is given by

(9)$${\cal E}(\textbf{a}) = \sum \left[{\tilde{\textbf{f}}(\textbf{a}) - \textbf{g}\ln \tilde{\textbf{f}}(\textbf{a})} \right].$$

Here, lower values of ${\cal E}(\textbf{a})$ indicate better validation. The value of $\textbf{a}$ that minimizes ${\cal E}(\textbf{a})$ is the optimal filter of the evaluated set.

Given the ability to generate $\textbf{f}$ and $\textbf{g}$ from photon counting data $\textbf{s}$ and the ability to evaluate a filtered signal $\tilde{\textbf{f}}(\textbf{a})$ against $\textbf{g}$ with the negative log-likelihood, the procedure for finding the optimal smoothing kernel is relatively straightforward. We evaluate each filter parameterization $\textbf{a}$ in a set of considered options $\textbf{A}$ and use the result that generates the lowest value of ${\cal E}$. This algorithm is outlined in Algorithm 1.

Algorithm 1. Filter Optimization

View Table

Fig. 1. Simulated lidar data where the under smoothed (no smoothing) photon counts are blue diamonds, the optimally filtered fit data (46 m width) are green squares, an over filtered case is orange circles (81 m width), and the true (noise and bias free) profile is the black dotted line. The absolute difference between each of these profiles and the true profile is shown on the right, where eleven point smoothing has been applied to simplify interpretation.

Download Full Size | PDF

We start by demonstrating this algorithm on a synthetic profile to show that it provides an accurate estimate of the optimal filter. A profile signal is generated consisting of exponential decaying molecular scattering in range. The boundary layer (below 3 km) consists of aerosol scattering with a backscatter ratio of five (the aerosols scatter four times more light than the molecules). In addition, there is an enhancement in backscatter at the top of the boundary layer, which is common due to hygroscopic aerosol growth. This profile is similar to a MicroPulse DIAL (MPD) [6–8] observed profile that will be considered later. In Fig. 1, the true, noise-free profile is shown as the black dotted line. The unsmoothed (or undersmoothed) photon counts are shown as blue diamonds. This profile is subsequently smoothed using a variety of Gaussian filters. The optimally filtered counts are shown as green squares, and an over filtered case is shown as orange circles. On the right panel, the error between each filtered profile compared to the true signal is shown with eleven point range smoothing to average the statistical noise. The results shown in the right panel of Fig. 1 show that over smoothing gives the lowest errors in regions of low variability where it can suppress random errors without imposing significant bias, but over smoothing has the highest errors in dynamic regions such as the cloud at 5 km and the top of the boundary layer. Conversely, the under smoothed data is noisy in the less dynamic regions but does not smear out cloud structures. The optimal filter is a compromise between these two cases with less statistical noise than the under smoothed case and less smearing than the over smoothed case. In effect, its error is bounded by the other two cases. While this should be quite intuitive, the problem has always rested with knowing which smoothing kernel achieves this optimal compromise. This can be accomplished using Algorithm 1. Figure 2 shows the negative log-likelihood for each evaluated filter. The optimal result (that filter that produces the lowest negative log-likelihood) corresponds to a filter width of 46 m. The black dashed line in Fig. 2 is the negative log-likelihood of the true signal, which demonstrates that the optimal Gaussian filter still does not recover the true signal. The over smoothed case uses a filter width of 81 m.

Fig. 2. Negative log-likelihood (with offset removed) of different Gaussian smoothing kernel widths. The black dashed line indicates the negative log-likelihood of the true (noise free) signal similarly evaluated against the validation data.

Download Full Size | PDF

Figure 2 shows that Algorithm 1 provides an effective approach for minimizing the error in the profile, with no prior knowledge of the true signal. Also note in Fig. 2 that over smoothing (filters to the right of optimal) can result in errors that are comparable to, or exceed, random errors. This problem is further compounded because biases caused by over smoothing are generally not well captured in uncertainty estimates.

If we attempted a similar validation without Poisson thinning, i.e., using the same photon count profile for both smoothing and validation, the negative log-likelihood would increase monotonically because the random noise in the smoothed signal is correlated with its original noisy signal. Thus, Poisson thinning is the key component to enable optimized signal processing. It allows us to tailor the smoothing kernels for each scene and even evaluate across different kernel types without relying on synthetic data or making assumptions about the atmospheric structure.

We now consider a data set from the National Center for Atmospheric Research (NCAR) MPD HSRL channels [7–9]. In this case, the HSRL channel was designed to provide Rayleigh–Doppler correction for temperature DIAL retrievals below 6 km, so the presence of high clouds in the daytime makes this a challenging scene to process. For brevity, here, we only consider the combined channel, which is analogous to an elastic backscatter lidar system. The photon count profiles have a base resolution of 10 s. The background subtracted counts, with the fixed smoothing kernel used in Ref. [7] ($1\; \min \times\, 37.5\; {\rm m}$) are shown in the top panel of Fig. 3.

Fig. 3. Combined channel photon counts after using a fixed smoothing kernel ($1\; \min \times\, 37.5\; {\rm m}$) (top) and after applying optimal smoothing (middle). The optimal time filter widths are shown with respect to range and time in the right and bottom plots, respectively. The filter widths of the fixed kernels are shown with the black dashed lines.

Download Full Size | PDF

We independently optimize the smoothing in both time and range using Gaussian kernels, where their reported width is the standard deviation of the Gaussian. Note that this means some kernels have widths less than the grid resolution but still have nonzero values in adjacent grid points. The result of this optimized smoothing is shown in the middle panel of Fig. 3, where the optimal kernel sizes are plotted along the bottom (for range smoothing) and right (for time smoothing). The dotted black line shows the kernel size used in the top, fixed kernel plot.

There is considerable variability in optimal kernel size depending on the particular features in the scene. At altitudes near 3 km between 13:55 and 14:50 small liquid water clouds are present (which have large gradients and high SNR), forcing the selection of small smoothing kernels of 1–6 s (nearly delta functions). Those same water clouds force limited range smoothing between 13:55 and 14:50 except where small gaps appear in the clouds. Meanwhile, clear air tends toward the selection of large kernels (e.g., above liquid clouds at 3 km, below cirrus at 7 km, and gaps in liquid clouds). This substantial variability (sometimes larger and sometimes smaller than the fixed kernels) illustrates that a generalized fixed kernel will probably fail to accurately represent tropospheric lidar data in a general sense. Note that because the color scales are logarithmic, pixels where the background subtracted counts are less than zero appear white.

We note that the optimally smoothed profile does better than the fixed profile at higher altitudes (more noise is suppressed in the clear air between cirrus and liquid clouds as well as in the regions containing cirrus). In addition, careful inspection shows that gaps in the liquid water cloud structure are better preserved in the optimally smoothed profile.

In order to objectively evaluate the profiles, we thinned the original photon counts into three profiles: fit, validation, and test data. The test set provides a statistically independent set of photon counts to score the different signal processing approaches. This is needed because the validation data set is already used to select the smoothing kernels, so the optimally smoothed data is not independent of the validation data. This evaluation uses the negative log-likelihood from Eq. (9), where $\textbf{g}$ is now the independent test data. For the scene in Fig. 3, the optimally filtered data has a negative log-likelihood 100,000 lower than the unfiltered counts and 800,000 lower than the fixed filtered counts. The lower negative log-likelihood of the optimally filtered data objectively demonstrates that optimized filter selection provides the best estimate of the mean photon counts. The optimized smoothing will inherently adjust to the scene being processed, providing the best balance between smearing and random noise. Furthermore, it requires no human intervention and is dynamically adaptable to any scene. While the performance difference with fixed smoothing methods may vary, this optimization approach should always produce a better test result regardless of the scene under consideration.

One drawback to the method presented here is that the presence of clouds in a relatively localized region can impact smoothing across all corresponding times and ranges. This can be partially mitigated by masking high SNR regions then replacing them after smoothing. There are also more advanced algorithms, not considered here, such as PTV [5] and block-matching three-dimensional (3D) filtering (BM3D) [10] that adjust averaging in localized regions based on correlated structure and therefore do not suffer from this drawback. But, these algorithms are typically more difficult to implement and more computationally expensive than the method employed here.

Poisson thinning is a simple technique for generating statistically independent profiles from photon counting data. It enables optimal tuning of signal processing routines to optimally balance suppression of random noise and smearing of resolvable atmospheric structure. In is work, we have shown how Poisson thinning can be used to optimize smoothing kernels for a particular photon counting scene and demonstrated that it provides a better estimate of the mean count rate than fixed smoothing approaches. The code and data used to generate the plots in this work are available at [11], including functions for implementing Poisson thinning in python.

Funding

National Science Foundation (Cooperative Agreement No. 1852977).

Acknowledgment

The authors thank Dr. Willem Marais for helpful discussions and comments in compiling this manuscript.

Disclosures

The authors declare no conflicts of interest.

REFERENCES

1. T. Leblanc, R. J. Sica, J. A. E. van Gijsel, S. Godin-Beekmann, A. Haefele, T. Trickl, G. Payen, and F. Gabarrot, Atmos. Meas. Tech. 9, 4051 (2016). [CrossRef]

2. S. Godin, A. I. Carswell, D. P. Donovan, H. Claude, W. Steinbrecht, I. S. McDermid, T. J. McGee, M. R. Gross, H. Nakane, D. P. J. Swart, H. B. Bergwerff, O. Uchino, P. von der Gathen, and R. Neuber, Appl. Opt. 38, 6225 (1999). [CrossRef]

3. T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, 1st ed. (Springer, 2001).

4. A. K. Oh, Z. T. Harmany, and R. M. Willett, in IEEE International Conference on Image Processing (2013), pp. 484–488.

5. W. J. Marais, R. E. Holz, Y. H. Hu, R. E. Kuehn, E. E. Eloranta, and R. M. Willett, Appl. Opt. 55, 8316 (2016). [CrossRef]

6. S. M. Spuler, K. S. Repasky, B. Morley, D. Moen, M. Hayman, and A. R. Nehrir, Atmos. Meas. Tech. 8, 1073 (2015). [CrossRef]

7. M. Hayman and S. Spuler, Opt. Express 25, A1096 (2017). [CrossRef]

8. R. A. Stillwell, S. M. Spuler, M. Hayman, K. S. Repasky, and C. E. Bunn, Opt. Express 28, 71 (2020). [CrossRef]

9. NCAR/EOL Remote Sensing Facility, “NCAR MPD data. Version 1.0,” 2020, https://doi.org/10.26023/MX0D-Z722-M406.

10. L. Azzari and A. Foi, IEEE Signal Process. Lett. 23, 1086 (2016). [CrossRef]

11. https://ncar.github.io/PoissonLinearProcessing/.

Optimization of linear signal processing in photon counting lidar using Poisson thinning

Abstract

Funding

Acknowledgment

Disclosures

REFERENCES

Cited By

Figures (3)

Tables (1)

Equations (9)

Optics Letters