Abstract
A probability model for a 3-layer radiative transfer model (foreground layer, cloud layer, background layer, and an external source at the end of line of sight) has been developed. The 3-layer model is fundamentally important as the primary physical model in passive infrared remote sensing. The probability model is described by the Johnson family of distributions that are used as a fit for theoretically computed moments of the radiative transfer model. From the Johnson family we use the SU distribution that can address a wide range of skewness and kurtosis values (in addition to addressing the first two moments, mean and variance). In the limit, SU can also describe lognormal and normal distributions. With the probability model one can evaluate the potential for detecting a target (vapor cloud layer), the probability of observing thermal contrast, and evaluate performance (receiver operating characteristics curves) in clutter-noise limited scenarios. This is (to our knowledge) the first probability model for the 3-layer remote sensing geometry that treats all parameters as random variables and includes higher-order statistics.
©2012 Optical Society of America
Corrections
Avishai Ben-David and Charles E. Davidson, "Probability theory for 3-layer remote sensing radiative transfer model: errata," Opt. Express 21, 11852-11852 (2013)https://opg.optica.org/oe/abstract.cfm?uri=oe-21-10-11852
1. Introduction
The radiative transfer model with simple 3-layer geometry has been the primary physical model for long wave infrared (LWIR) or “thermal infrared” passive remote sensing for many years [1–7]. This model has been used for detecting vapor and aerosol targets in the atmosphere in conjunction with detection algorithms such as matched filters and anomaly detectors. A probability density function (pdf) model for the radiative transfer geometry is of great interest and importance for predicting the stochastic nature of the measured signals, evaluating performance, and for developing detection algorithms.
The traditional three-layer remote sensing model is based on the deterministic radiative transfer equation [8,9] where radiative transfer parameters are non-fluctuating. In 1991, Stephens et al. [10] gave a review of efforts to include stochastic components into the radiative transfer equation, and proposed a method for determining pdfs for the optical properties of stratiform clouds. Stephens’s model was based on the two-stream solution [8,9] to the radiative transfer equation for a single layer, where the optical depth and single scattering albedo were treated as random variables with empirically measured pdfs (the asymmetry parameter was fixed). Using transformation of variables and numerical techniques, Stephens obtained output distributions for the albedo, diffuse transmission, and absorption of the cloud. Other work [11,12] has incorporated fluctuations by characterizing a few radiative transfer parameters by a mean and variance (second-order statistics) while holding others fixed; these papers do not address the underlying probability distribution of the predicted spectral radiance. The goal of our work is to predict the pdf of spectral radiance not only due to a fluctuating vapor cloud, but also due to fluctuations in the atmosphere in front and behind the cloud, where all parameters are stochastic and we employ higher order statistics.
In the literature there is a seemingly related area of research regarding radiative transfer through random or stochastic media [13–15], sometimes referred to as stochastic radiative transfer. However, as Byrne states on page 386 of [14], “The common term ‘stochastic radiative transfer’ is, strictly, a misnomer. The situation we discuss is the problem of finding the average (emphasis added) radiative properties of an ‘ensemble’ (a generally infinite set) of problems, by means other than actually solving each problem and averaging results”. Thus, so-called stochastic radiative transfer does not provide information regarding the distribution of spectral radiance. In our paper when we use the term stochastic radiative transfer or stochastic model, we do not intend to refer to ensemble averages, but to complete probability distribution information.
In this work we develop a stochastic 3-layer radiative transfer model that consists of a foreground layer between the sensor and the target, a non-scattering target layer (e.g., a vapor cloud), a background layer at the far end of the target, and an external radiation source (e.g., a mountain, building, etc.) at the end of the line of sight (LOS). The external radiation source can also be taken as atmospheric radiation, in which case it can be thought of as a fourth layer. Each layer is characterized by a transmittance (which attenuates the radiation from previous layers) and serves as a source of radiation (due to its intrinsic temperature). Our main objective is to derive a pdf for the at-sensor spectral radiance (i.e., the spectral radiance—hereafter just radiance—input to a hypothetical sensor) due to fluctuations in transmittance and the radiation produced by the layers. These fluctuations contribute to variance in sensor measurements that is typically referred to as “clutter” (variation from the scene that potentially interferes with the task of detecting a particular signal of interest). Technological advancements have led to very low sensor noise, and typical remote sensing applications are limited by clutter noise. Thus, since it is of primary importance to understand the stochastic nature of the physical environment, and less important to model sensor noise, we neglect modeling sensor characteristics in this paper. At-sensor radiance statistics, when combined with specific sensor models and detection algorithms, can produce accurate end-to-end performance predictions. Predictions derived from the at-sensor radiance only (as in this paper) may be interpreted as upper bounds on observed performance from an actual sensor.
We focus mainly on modeling each of the three layers as homogeneously composed layers at thermal equilibrium (the target layer is always modeled this way), in which case the layer radiance is due to self emission, and the stochastic nature of the radiance is due to fluctuations in temperature and density (which in turn causes the layer’s transmittance and emissivity to be stochastic). This type of model is likely applicable to near-horizon LOS (e.g., LOS1 in Fig. 1 ), but not up- or down-looking LOS (e.g., LOS2 in Fig. 1) since the vertical stratification of the atmosphere makes it less likely for a layer to be homogeneously composed (local thermal equilibrium is always assumed). Therefore, we also allow the layer radiance to be taken with an assumed pdf intended to capture all sources of fluctuations in a layer without assuming homogeneity. We will show that a lognormal distribution is a reasonable distribution to assume for the radiance from a layer (as well as for the transmission). In fact, normal and lognormal distributions play a central role in our model due to the central limit theorem and the role these distributions play in sampling from additive and multiplicative processes [16–21].
To derive the pdf of the at-sensor radiance, we allow any or all of the radiative transfer model inputs to become random variables (RVs), assume a particular distribution and population parameters for the variates in each layer (transmission and temperature), and assume that the layers are independently distributed. Then, the pdf of the at-sensor radiance is determined by finding population parameters for a distribution that matches the first four moments (mean, variance, skewness, and kurtosis, denoted E, V, S, and K, respectively) computed analytically through the 3-layer radiative transfer model equation. A pdf that matches all moments is the correct (true) analytical distribution, however, in practice one does not have access to reliable higher moments, and furthermore, fitting routines that include higher moments are extremely complex. In our statistical model we make use of the first 4 moments, only; in detection theory the sought-after information (the presence of the target) resides in the tail of the distribution [22–24], hence the importance to describe a pdf with arbitrary skewness and kurtosis values that may be used as an indicator for detection [25].
There are several systems of distributions designed to exactly match any combination of values for the first four moments, including Pearson distributions, Burr distributions, and systems based on logistic and Laplace distributions, respectively [18,26]. In our work we chose the Johnson system [18,26–28] for simplicity (it includes only four distributions), convenience (only the Johnson SU distribution is typically needed for our model), and the central role the lognormal distribution plays in the system. The ability of the Johnson SU distribution to model the at-sensor radiance will be established by comparing to stochastic simulations. The SU distribution is also useful to model intermediate terms in the model (such as the radiance incident on the back of the cloud due to the source and background layers), although sometimes less well.
Our statistical model is a “forward model”. We do not address the general question of how to choose and determine the statistical properties of the radiative transfer variates (transmissions and temperatures) for a given physical scenario, nor do we address the “inverse problem” of estimating statistical properties of the radiative transfer variates from radiance measurements. These are both challenging problems deserving further research.
Our statistical model is for a single wavelength (i.e., the radiance is a scalar) hence, we develop a univariate pdf model. We reserve the extension from univariate to multivariate model (where radiance is specified at multiple wavelengths—and thus is a vector quantity—as is required for multi- or hyperspectral applications) for a future publication. Our model can be extended from 3 layers to n layers (though computing analytical moments for n layers is very cumbersome).
The remainder of the paper is organized as follows. In section 2 we present the radiative transfer model. In section 3 we present the details of the pdf model. In section 4 we detail the simulations that are used to demonstrate and investigate our statistical model. In the simulations we use turbulence theory to set the population parameters for the layer temperatures. In section 5 we show results of simulations and we discuss the performance of our model. We conclude with a summary. We include four appendices to aid the reader with different technical aspects of our presentation. Definitions and properties of the lognormal and Johnson SU distributions (which play a key role in the stochastic model) appear in Appendices A and B, respectively. Moments and properties of the expectation operator appear in Appendices C and D.
2. Radiative transfer model
We use a standard layer model for passive LWIR remote sensing, using the following notation (for simplicity we omit the wavelength dependence of all variables in the model). Let Lf, Lb, Ls denote radiances originating from the foreground, background, and source layers, respectively; let tf, tc, tb denote transmissions of the foreground, cloud (target), and background layers, and let B(Tc) denote the Plank blackbody radiance at the cloud temperature, Tc. Cloud transmission may also be written in terms of optical depth, , as , and due to local thermal equilibrium and Kirchoff’s law, the emissivity of the (non-scattering) cloud layer is .
Using terminology from detection theory [29]—where H0 denotes the null hypothesis (target cloud is absent) and H1 is the alternative hypothesis (target is present)—the total at-sensor radiance given by the model in the presence of the target is denoted M|H1 or M(1) and is given by
The total at-sensor radiance in the absence of the cloud, denoted M|H0 or M(0), can be obtained from Eq. (1) by setting tc to 1, resulting in
Equations (1-2) are good for any LOS. However, for LOS1 or for when foreground and background layers are well-approximated as homogeneous layers at thermal equilibrium, the foreground and background layer radiances may be written as
The terms and are emissivities for the foreground and background layers, respectively (in the LWIR most of the attenuation is due to molecular absorption, and therefore scattering may oftentimes be neglected). In many cases one assumes . Figure 1 diagrams the radiative transfer model and physical parameters for each layer for both horizontal and slant-path lines of sight (LOS1 and LOS2, respectively).
It is also convenient to define at-sensor radiance terms corresponding to each layer, i.e., layer-radiances after attenuation by subsequent layers, as follows: , , , , , and . Parenthetical superscript notation is necessary for Mb and Ms to differentiate between H0 and H1 since the presence of the cloud affects them (whereas Mf is unaffected by presence of the cloud and Mc only exists under H1, and thus does not need additional notation). Using these quantites, the total at sensor radiances in Eqs. (1-2) are just the sum of the appropriate individual at-sensor radiance terms: and , respectively.
The information about the vapor cloud’s presence, ΔM, can be seen by subtracting Eq. (2) from Eq. (1) to obtain
where ΔT, known as thermal contrast [1,4], is the difference between the input radiance on the back of the cloud, Lin, and the blackbody radiance of the cloud. If the thermal contrast is negative, , the cloud is observed in emission: the cloud is “warmer” than the input radiance and adds photons to the LOS (). If the thermal contrast is positive, , the cloud is observed in absorption: the cloud is “colder” than the input radiance and removes photons from the LOS (). A non-zero value of ΔT is required for vapor detection, and thus is the “engine” behind thermal infrared remote sensing. It is sometimes convenient to present the thermal contrast in temperature units (Kelvin) aswhere is the inverse of the blackbody function.It is essential to note that Eqs. (1-4) describe the traditional deterministic model (henceforth called the physical model): they predict the value of the at-sensor radiance given constant (non-fluctuating) values of the physical parameters. The goal of this paper is to extend the physical model by allowing any (or all) of the model inputs to be RVs; the result is that model outputs (M(0), M(1), and ΔM, as well as intermediate quantities such as ΔT, Lin, etc.) are themselves RVs and characterized by appropriate pdfs. The details of how this is done are presented in the next section, however there are immediate implications that can be introduced here. For example, in the physical model, means no information about the cloud reaches the sensor. However, in the stochastic model, even when , the cloud may still be detectable based on fluctuations around the mean. Additionally, it is possible for the cloud to be observed in emission and in absorption at the “same time”, i.e., at a given moment there will be a non-zero probability for the cloud to be in absorption and simultaneously a non-zero probability for the cloud to be in emission. Furthermore, the polarity (sign) of is affected by the foreground layer (that is stochastic in our model), thus, a cloud that is in a given radiation mode, say emission, for which the probability for ΔT to be less than zero is one—i.e., —may appear in the measurements in the reverse mode with . Only with a probability model can one study these phenomena, which we demonstrate and discuss further in Section 5.
3. Probability model
In this section we describe how initial distributions for the input RVs are chosen (in subsection 3.1), and detail how the pdfs of the output RVs are determined (subsection 3.2). Input RVs are assumed independent (this includes RVs from different layers, e.g., Tf and Tc, as well as RVs within a single layer, e.g., tc and Tc). In general this choice is reasonable, however, in adiabatic conditions air temperature and density are not independent, causing correlation between temperature and transmission (proportional to density). This dependence could, in principle, be handled using the ideal gas law.
H0 and H1 radiances along a fixed LOS are (necessarily) measured at different times, and therefore we assume that all the population parameters in the radiative transfer model remain unchanged (stationary) during the measurement period and that the sampling is independent. For example, we assume that there is no drift in the population parameters or distribution of the foreground temperature, hence, the mean and variance (and all higher moments) for are fixed. In a recent paper [30] it was shown that for a short period ~3-5 s temperature drift is small and the stationary assumption for temperature is reasonable.
3.1 Distribution of input terms
3.1.1 Temperature – Planck blackbody function
The Planck blackbody functionis given by
where, for a given wavelength in microns and radiance units W/cm2/sr/μm, and ; for λ in wavenumbers (cm−1) and radiance units W/cm2/sr/cm−1, and .Assume that the temperature, T, is a normally distributed random variable (a commonly made assumption, supported by the central limit theorem) with mean and variance , i.e., . Deriving the exact pdf of B(T) via transformation of variables is straight-forward, however exact moments of the distribution (required in our model) are not computable analytically. Instead, an extremely good approximation is that B(T) is lognormally distributed, (for which all moments are easily computed), which is shown as follows.
Typical terrestrial values of μT are on the order of 300 K with variances at least an order of magnitude smaller. Under these conditions, 1/T is approximately linear (e.g., the points are close to equidistant from the median and therefore the mean is close to ). Therefore, a first-order Taylor expansion of around , giving , leads to the excellent approximation . It follows (Appendix A) that .
Were it not for the “-1” term in the denominator of (6), the inverse and multiplicative properties of the lognormal distribution (see Appendix A) could be used to show that . Although it is tempting to simply ignore the “minus one” term—on the grounds that typically in the LWIR—it creates a bias (shift) in the distribution that while small, is significant compared to the variance of the distribution. A better method is to approximate the denominator as a standard 2-parameter lognormal distribution. That is, let be the denominator of (6). The goal is to find population parameters μx and for a lognormal distribution that matches the mean and variance of x without neglecting the minus one. The mean and variance of x are computed (see Appendix A) as and , respectively, and population parameters are computed with Eq. (A2) from Appendix A. Finally, the inverse and multiplicative properties of the lognormal distribution can now be used to show that the blackbody radiance is lognormally distributed, , where and . The fit is so good that B(T) may essentially be considered an “input term” to the model.
3.1.2 Transmission
We choose to represent the transmission of a layer in the model as a lognormally-distributed variate, . Partially this is due to the convenience of working with lognormal distributions. More importantly, however, is a physical argument based on the statistics of multiplicative processes.
Consider the transmission of monochromatic radiation through a general layer composed of j sub-layers. The total transmission will be a product of the sub-layer transmissions, . Each sub-layer transmission is a non-negative random variable, . By the central limit theorem for multiplicative processes (see property (x) of Appendix A) t approaches a lognormal distribution, . Note that values for μt and for distributions generated in this manner will naturally enforce the constraint that the probability for must be zero, i.e., . Note also that due to the multiplicative property of the lognormal distribution, if the transmissions of the sub-layers are themselves lognormally distributed, , then with and .
The above central-limit argument is likely very appropriate for foreground and background layer transmissions, since these tend to be due to the ambient atmosphere along a path that may be long (e.g., a background layer for LOS to space). The transmission of the cloud is qualitatively different than that of the foreground or background layers in the sense that the cloud is localized and (in our remote sensing applications) it is usually due to an external source (e.g., a man-made cloud). An additional justification for the cloud transmission tc to be lognormally distributed is based on the assumption of Gaussian statistics for concentration that is commonly used in plume dispersion models [31, Ch. 18]. Since the optical depth is proportional to concentration, this implies , where μτc and are the mean and variance of the normally-distributed optical depth.
As already mentioned, a real (i.e., physical) transmission is always bounded . An arbitrary lognormal distribution will always be positive but may not have a non-zero probability for . Because the user may choose population parameters for the layer transmissions where , we actually represent all transmissions in the model as truncated lognormal distributions that enforce an upper bound at 1 (if the probability that is negligible—e.g., where η is a factor large enough, for example , to ensure a negligible probability for the normally-distributed optical depth to be negative—then truncation will have no effect). Moments of the truncated lognormal distribution are given in Appendix A, property (ix).
3.1.3 Radiance
We assume that the ambient atmospheric radiance is lognormally distributed (e.g., foreground and background layers for a slant-path LOS—LOS2 in Fig. 1—where the layers are modeled generically without assuming homogeneity). This choice is partially motivated by the fact that radiance values must be non-negative (the lognormal distribution naturally enforces this constraint) and that the lognormal distribution is easy to work with. More importantly, however, is that there is a physical argument to be made that is based on radiative transfer theory.
The formal solution of the classical radiative transfer differential equation [32] [Ch. 11, Eq. (3)] for an atmosphere in local thermodynamic equilibrium is given [32] [Ch. 11, Eq. (4)] as an integral for the multiplication of a blackbody function along a slant path with a slant path transmission to an observer. Approximating the integral as a sum shows that the radiance is given by a summation of t × B(T) terms. We already showed that the blackbody and transmission functions are both lognormally distributed. With the addition property of lognormals (see property (vi) in Appendix A), the result is lognormally distributed radiance.
If the source radiance, Ls, is an atmospheric radiance, it is reasonable to assume that it is lognormally distributed, as the above argument would apply. For source radiance that is due to emission from an opaque object, we assume a lognormal distribution also. If the object is a blackbody, then this is justifiable based on section 3.1.1. For graybody radiation, where , choosing a lognormal distribution for the radiance implies either (i) εs is constant and property (iii), Apendix A, establishes , or (ii) and property (v), Appendix A, establishes . In the latter case, it must be that the population parameters of εs are such that Pr(εs>1) = 0 for the quantity to be strictly physical.
3.2. Distribution of output terms using Johnson SU
The primary output terms of the pdf model are the total at-sensor radiances M(0) and M(1), as well as the differential radiance ΔM; these are the terms that a sensor has access to directly (i.e., can measure). The thermal contrast, ΔT, is also fundamentally important due to the key role it plays in enabling LWIR detection. We model these terms with the Johnson SU distribution. In this section we explain the process and the reasons why the SU distribution is successful as a single distribution to describe these terms. We also discuss the use of the Johnson SU distribution to describe intermediate terms in the model—terms of secondary importance such as at-sensor radiance terms for the individual layers, Mf, Mc, Mb, and Ms—which can be more challenging for the SU distribution to represent.
The previous section showed that it is reasonable to model all of the input RVs in Eq. (1) as lognormal distributions. While physically justifiable, this is also very convenient since a product of lognormal distributions is (analytically) lognormally distributed, and a sum of lognormal distributions is (approximately) lognormally distributed. Ignoring the presence of emission terms such as (e.g., assuming tc was constant or that εc was lognormally distributed) would lead to a lognormal distribution for the total at-sensor radiance (where correlation between terms in the sum—caused by tf, tc, and tb appearing in multiple places—can be handled via property (viii) in Appendix A). However, the problem is that the emissivity when does not follow the standard 2-parameter lognormal distribution, and is instead described by a lognormal distribution that has been reversed (reflected about zero) and shifted. While the moments of this distribution are easily computed in terms of the moments of t (e.g., ), it is difficult to determine the distribution of the product , for example. Equivalently, expanding terms such as to become shows that a difference of (correlated) lognormally-distributed variates enters into the model. Theoretically, the standard 2-parameter lognormal distribution is a poor candidate to approximate a difference of lognormals since there is no guarantee that values would be positive. This fact, combined with the fact that transmissions in our model may actually be truncated lognormal distributions (for which the nice multiplicative and additive properties no longer apply), motivated us to use the Johnson system to fit the moments of the model outputs. A secondary advantage to this is that one may relax the need to assume a specific distribution for the model inputs, and instead may simply specify the first four moments of the input variates.
The Johnson system contains three 4-parameter distributions, SL, SU, and SB, which, along with the normal distribution, allows fitting any combination of values for the set of moments, {E, V, S, K}. All distributions in the system may be considered transformations of a normal variate. Typically distribution systems are characterized by the Pearsonian coefficients and , where each distribution in the system corresponds to a region, curve, or point in the β1, β 2 plane. The SL distribution is a lognormal distribution with the added flexibility of scale and location parameters (and thus the distribution may be shifted and the skewness may become negative); despite being defined as a four-parameter distribution only three of the parameters are identifiable. Both the Johnson SL and the standard two-parameter lognormal distributions are described by the same curve in the β1, β 2 plane (see Fig. 2 ). Any point below the lognormal curve is in the SU region, which is an unbounded (and unimodal) distribution. Any point above the lognormal curve is in the SB region, a bounded distribution that may be unimodal or bimodal. The normal distribution is a point in Fig. 2 and is a limiting form of the SL (and SU and SB) distributions; the SL is a limiting form of the SU (and SB). The fact that the lognormal distribution appears as a special case of each distribution in the Johnson system is an important property given that (as stated above) the model predicts lognormally distributed radiance under certain conditions.
The SU distribution is particularly important because: (i) we observed that most of our simulations produced (β1, β 2) values that—if not on the lognormal curve—were below it, and thus in the SU region; (ii) analytical moments are relatively easy to derive and there are good procedures for method-of-moments estimation (details are in Appendix B); (iii) the SL and the LN distributions are limiting cases of the SU, and thus the SU distribution can represent these distributions with an error that is as small as desired—this means that the model outputs may always be considered to be SU thereby avoiding the complexity of having to switch to or from an SL representation.
There are times when the moments of model outputs are above the lognormal curve in the SB region (generally caused when the truncated lognormal distribution is necessary for transmission variates). However, dealing with the SB distribution is difficult due to the complexity of the moments; estimation must be performed via quantiles or even frequencymoments [28]. Therefore, even when the distribution to be fit is in the SB region, we use an SU approximation (we project the point in the SB region to the closest point on the lognormal curve and then use the SU—see details in Appendix B, property (ii)—generally the Euclidean distance to the lognormal curve is small, and the SU fit is reasonably good). Therefore, the SU distribution is the only member of the Johnson system that we need to consider.
The general procedure for determining the pdf of model outputs is to analytically compute the first four moments (E, V, S, and K) for each model output that we desire to fit with a pdf, say, M(1) under LOS2. E, V, S, and K are a function of the raw moments of M(1), which are in turn a function of the population parameters of the model inputs. The raw moments are found by, e.g., expanding the function (where n is 1 through 4; obtaining 5 terms for , 15 terms for , 35 terms for , and 70 terms for ) and taking the expectation of each expression, using properties of the expectation operator given in Appendix D. Each expression contains terms that are a product of up to six RVs (in this example) appearing to powers up to 4. Because of independence, the expectation operator distributes, e.g., . E, V, S, and K, are then computed as a function of the raw moments using Appendix C. This procedure, though tedious, is nevertheless straight-forward and may be applied to all of the model outputs. Given the analytical moments, then the estimation procedure in Appendix B is used to determine the four population parameters (a, b, c, d) of the SU distribution, e.g., . If the moments are located in the SU region (or on the lognormal curve), then the corresponding moments of the SU fit (see Eq. (B4) in Appendix B) will match exactly. If the distribution to be fit is in the SB region, then the skewness and kurtosis will be slightly off, and only the first two moments will be fitted exactly.
In the remainder of this section, we highlight several individual model outputs, how well they are modeled with the SU distribution, and conditions for them to be lognormally distributed.
3.2.1 Homogeneous layer radiance and at-sensor radiances
As previously mentioned, the possibility of truncated lognormal distributions for the transmission terms and the presence of emission terms are the primary reasons for model outputs to deviate from the standard 2-parameter lognormal distribution. Therefore, terms such as Mc—or Lf and Lb using the homogeneity assumption (Eq. (3))—are potentially the most problematic since both transmission and emission terms appear. Our observations indicate that terms such as Mc are the most likely to be poorly fit by the SU distribution, i.e., it is possible for them to be in the SB region. However, even when the component at-sensor radiances are poorly fit, the total at-sensor radiance tends to be remarkably well fit by the SU, and usually M(1) is even better fit than M(0). A reason for this may stem from the fact that the SB region (see Fig. 2) is characterized by smaller kurtosis values for a given skewness than the SU (or alternatively larger skewness for the same kurtosis). M(0) and M(1) are a convolution (sum) of the component at-sensor radiances where convolution is a smoothing and broadening operation. We theorize that the convolution of terms helps increase the value of the kurtosis relative to the skewness, making it more likely for total at-sensor radiance terms to be in the SU region. M(1) is a convolution of more terms, and thus would be more likely to be in the SU region than M(0). Regardless of the reason, M(0) and M(1)—the primary outputs of the model—are particularly well fit by the SU distribution.
The ability to approximate at-sensor radiances as lognormals is attractive as it eliminates the necessity of using a fitting routine for the SU due to the elegance of lognormal distributions that are easily manipulated. Component at-sensor radiances such as Ms (or Mb under LOS2 when the homogeneity assumption is not used) will be lognormally distributed as long as the transmissions are not significantly truncated. On the other hand, Mc will only be lognormal if the transmissions are not truncated and if may be well approximated by a lognormal distribution (which occurs when tc is nearly symmetric), in which case is a product of three lognormal variates. If all of the component at-sensor radiances are lognormally distributed, then the total at-sensor radiance will also be (approximately) lognormally distributed (based on the additive property of correlated lognormals in Appendix A—where the correlation is caused by the transmissions appearing in multiple terms).
We note that a further simplification for the total at-sensor radiance (or any other quantity) that is lognormally distributed is the normal distribution. The normal is the limiting form of the lognormal as . Thus, if the skewness and kurtosis of the lognormal distribution are not too far from 0 and 3, respectively, then a normal distribution may be used instead. For example, at wavenumber of 1000 cm−1, and , the blackbody lognormal function (see Section 3.1.1) is given with and , for which the skewness and kurtosis (Appendix A) are 0.24 and 3.1, respectively. Hence, can be well-approximated by a normal distribution .
3.2.2 Differential radiance
The differential radiance, , is particularly well-suited to a Johnson SU distribution. We can theoretically prove that a difference of independent SU variates, , always remains in the SU region. Although this does not establish that a difference of SU variates is analytically an SU distribution, it does say that an SU fit will always be able to exactly match the first 4 moments. Additionally, even when one or both of M(0) and M(1) are in the SB region (and therefore outside of the SU), it is possible for ΔM to be in the SU region. Thus, the Johnson SU is very appropriate for fitting ΔM.
In the physical model, the differential radiance given by Eq. (4) is the information in the measured radiance that is due to the cloud, i.e., it captures only those photons that interact with the cloud. In the stochastic model the interpretation of ΔM is not as straight-forward: is a difference of RVs that are assumed to be (independently) sampled at different times, and fluctuations in the input RVs will cause a differential radiance between terms that do not involve the cloud. Thus, in the stochastic model ΔM captures both “informative” cloud-photons, as well as “uninformative” photons that are due to clutter fluctuations. For example, consider the foreground radiance, Lf, which is distributed identically and independently under H0 and H1 conditions. The mean , but the variance is the sum of the variances of Lf |H0 and Lf |H1; therefore, the differential radiance will include some photons that originated in the foreground layer and never interacted with the cloud. Note that correlations between H0 and H1 quantities may be included in this analysis, but here we assume independent temporal sampling.
When M(0) and M(1) analytically follow the lognormal distribution, the differential radiance ΔM is a difference of independent lognormals. We can show analytically that a difference between two independent lognormal distributions is always in the SU region (below the lognormal line in the β1,β2 plane), and in fact, the fit is so fantastic it is tempting to think that the SU distribution might even be the analytical pdf for the difference of lognormals.
In the stochastic model, we use binary hypothesis testing [19] with respect to the radiance distributions M(0) and M(1) to determine the detection potential of the remote sensing scenario that is due to scene noise (e.g., the likelihood that at-sensor radiance may be attributed to the presence of a cloud along with the corresponding probability of false alarm). We are aware of the fact that detecting a presence of a cloud with only one wavelength is problematic as it is susceptible to drift, atmospheric disturbances, and interferences (using multiple wavelengths, these effects may be mitigated due to the wavelength dependence of the different events).
3.2.3 Thermal contrast
Determining the distribution of the thermal contrast in radiance units—defined in (4)—is qualitatively similar to determining the distribution of differential radiance, and raises no additional challenges. However, determining the distribution of ΔT in units of brightness temperature (Eq. (5)) is difficult because of the appearance of the inverse Planck blackbody function, B−1(·), that appears in the expression . If Lin is a lognormally distributed variate, then the steps in showing that B(T) is approximately lognormal when T is normal can be reversed to show that is approximately normal, where
However, in general Lin will be described (or approximated) by an SU distribution. Transformation of variables applied to Eq. (7) is too complicated when the input distribution is the Johnson SU (scaling and translation relationships for the SU distribution exist—see Appendix B—but more complicated transformations generally do not lead to densities that can be integrated); computing moments of Tin analytically from the moments of Lin is also intractable due to the nonlinearity of the transform. However, moments can be computed easily when the inverse blackbody function is linearized (approximated) with a first order Taylor expansion. Therefore, we will describe two complementary methods to determine the pdf of ΔT: (i) approximating Lin as a lognormal distribution and using the exact functional form of B−1; or (ii) propagating exact moments of Lin using an approximated form of B−1.
In the first method, , and may be approximated very well as a 2-parameter lognormal distribution with population parameters {μy,σy} found from fitting and using Eq. (A2). Then, by definition. The previously-used first order Taylor approximation for the reciprocal of a normal variate (Section 3.1.1) establishes to be approximately . Tc is also normally distributed, and thus the distribution of ΔT is approximated by . These steps are simply the reverse of the steps in Section 3.1.1, and are appropriate whenever the SU distribution for Lin is not “too far” from the lognormal line. However, as the kurtosis and skewness of the SU gets further from the lognormal, the output distribution for Tin becomes skewed and the normal approximation (which is necessarily symmetric) will begin to provide a poor fit.
In the second method, we approximate Eq. (7) with a first order Taylor expansion around the mean of L. Let L0 denote E(L), then
The result is that now has a linear relationship with random variate L, , with constant gain α1 and offset α0 given by
respectively, whereThe moments of and may now be easily computed using Appendices C and D, and fit with an SU via property (ii) of Appendix B.
Note that if Lin analytically follows an SU distribution, , then via property (iii) in Appendix B, where the absolute value and signum functions are not needed since . Subsequently, moments of can be found from Eqs. (B3) and (B4) instead of using Appendices C and D. Additionally, if Tc is a constant, then neither Appendices C and D nor the SU fitting are needed since Appendix B property (iii) establishes .
The first order Taylor expansion is appropriate when the variance of Lin is small enough such that the inverse blackbody function is approximately linear over the domain of Lin. Which method is better will be determined by the relative error in approximating Lin as a lognormal distribution compared to the error in assuming the blackbody function is linear. In all simulations shown in Section 5, the blackbody function is close to linear, and so the second method works well. Thus, in the remainder of the paper we only focus on the second method.
4. Simulations
In this section we describe the details of simulations for the detection of triethyl phosphate (TEP) vapor, including how population parameters were chosen. All simulations are run at 1049.3 cm−1 (which corresponds to peak absorption for TEP) and radiance values are reported in units of . With given population parameters for the input variates to the model, we sample values from the appropriate random number generator (106 samples of each variate, creating 106 sets of values). For example, layer transmissions are sampled from (possibly truncated) lognormal distributions, temperatures are sampled from normal random number generators, and layer radiances (e.g., Ls) are sampled from the lognormal distribution. Samples for the intermediate model terms and the primary outputs are computed via applying the model equations to each set of random values, creating 106 values for each output term. We ensure that all RVs for H0 are sampled independently (i.e., sampled separately) from H1, thus we have an additive signal model where H0 and H1 are uncorrelated (in field measurements H0 and H1 are measured at different times, and therefore are assumed independent). From the sampled RV data we compute histograms (sampled pdfs) that we compare with the analytical pdfs predicted by our model. Predicted pdfs are solely a function of the population parameters for the input variates.
We present two types of simulations. The first is a physical scenario of a sensor situated 1 m above the ground in a horizontal LOS geometry that is viewing a vapor cloud located at a horizontal distance of 1 km. At the end of the LOS (10km away from the sensor), there is a topographical target (a mountain). The second type of simulation is for a hypothetical scenario with a challenging set of parameters for the RVs (e.g., a combination of transmissions, temperatures and radiance values) in order to test our model’s accuracy and to explore its limitations. The parameters chosen for the hypothetical scenario can be associated with a physical scenario, but this is not our intention. For both types of simulations we chose horizontal LOS geometry, a geometry that is more challenging than a slant path due to the added complexity of the foreground and background emission terms (discussed in Section 3.2). Simulation results for slant path geometry (not shown) were very good. Table 1 summarizes population parameters as well as means and variances for all simulations.
4.1 Physical simulation
The 3-layer model geometry in the physical simulation is similar to a previously used scenario [4]: a LOS near the horizon (1.7 degree up-look angle), foreground pathlength of 1 km at temperature of 288 K; a cloud at temperature of 288 K (i.e., at equilibrium with the ambient atmosphere); background layer of 9 km path at the cloud temperature; a mountain (that serves as an external blackbody source) with emissivity of 0.85 (typical of many clay soils in the IR) at temperature of 278 K (i.e., 10 K colder than the ambient air temperature, a scenario that can occur in the morning hours). Note that a 1.7 degree up-look angle is shallow enough that horizontal distances and slant-path distances are virtually identical. We use previous stochastic simulations [6] as guidelines for setting population parameters for transmission and temperature functions in this work.
The standard deviation of the transmission, , for IR wavelengths and horizontal LOS (elevation angle < 5°) is between to for which the mean transmission values are between 0.7 to 0.97 (see Figs. 4 and 5 in Reference [6]). Note that the transmission of the atmosphere is highly wavelength dependent. At wavelength 993 , a relatively clear atmospheric transmission, the ground level extinction coefficient is 0.07 ; for 1053, due to ozone absorption, the extinction coefficient is 0.12 ; for 853 , due to water vapor absorption, the ground level extinction coefficient is 0.3 .
In our “physical” simulation we choose the foreground transmission for horizontal path of 1 km to be lognormally distributed with and for which the population parameters can be computed from with Eq. (A2) to be and . Thus, the foreground transmission is and the foreground optical depth is . The background layer is 9 times longer in pathlength than the foreground layer; population parameters are derived from the relation . Thus, and with and . Using property (i) of Appendix A, the mean and standard deviation of the background transmission is and . Note that this produces a variablilty for the background transmission that is a little larger (factor of 2.5) than observed in [6]. Truncation is not required for any of the transmission variates in the physical simulation.
The variance of the atmospheric temperature is computed with a turbulence theory [6]. The variance of temperature due to turbulence between two points separated by distance Δx is , where is the temperature structure constant. is proportional to the refractive index structure constant, (Eq. (2) in [6] where a typo needs correction: the left-hand side should have been and the right-hand side should have been ). Both and are a function of elevation. The Tatarski model (see [33] Eq. (15).4.35) gives the height dependence of the refractive index structure constant as where (supported by experiments for h up to at least 100m). We use a value of which is characteristic of moderately strong turbulence; the corresponding value of the temperature structure constant (for the 1976 US standard atmosphere [34] with surface temperature of 288K and 46% relative humidity) is . Using the Tatarski height dependence and assuming (for convenience) that wind speed and sampling rate (and hence ), the variance of temperature over a slant-path extending from height above the ground to height is estimated by . For LOS near the horizon at 1.7° elevation angle, the variance of the foreground layer (1km path from to ) is 0.180 K2 and for the background layer (9km path from to ) is 0.0448 K2. The variance of the cloud temperature is simply set to 0.088 K2. Temperatures in the model are normally distributed, and as explained in 3.1.1, corresponding distributions for the blackbody radiance are very close to lognormal distributions; Table 1 gives population parameters and moments for these quantities.
The radiance from the external source, Ls, is modeled as a lognormal distribution with and . This choice is arbitrary but is consistent with graybody radiation with and distributed according to a lognormal distribution with and ; population parameters and moments are given in Table 1.
For the vapor cloud population parameters we used field data from an experiment where a TEP vapor “cloud” was confined to a closed chamber [3] of size . In the experiment, concentration-pathlength values inferred from LWIR remote sensing measurements when 1 cm3 of TEP was vaporized in the chamber were fit with a normal distribution. We extract the mean and standard deviation of the TEP concentration-pathlength (mg/m2 and mg/m2, respectively; see [3], Table 1, ) and convert the values to optical depth (, where is the absorption coefficient at 1049.3 cm−1). Thus, the mean optical depth is with variance , which means that the TEP cloud’s transmission in the physical simulation is distributed as where and . Note that in our model the dimension of the cloud is immaterial, only the optical depth is important.
4.2 Additional simulations
For the second type of simulations (intended to challenge our model) we increase the variance of the temperature of the three layers () by ten-fold, , , and . The source radiance is altered by increasing the variance of Ts by a factor of 100, from . The variances of all layer transmissions are set to the large value of 0.01 (standard deviation of 0.1 transmission unit). For “Simulation 1”, the mean transmission value for all layers is set to 0.95 (i.e., optically thin layers), which results in a lognormal distribution with ~29% of values greater than 1 (and thus the effects of truncation are significant). For “Simulation 2”, the mean transmission values for all three layers are set to 0.1, such that each layer is closer to being optically thick. Although less than 0.1% of transmission values are greater than 1 for Simulation 2 (and thus truncation is not significant), the transmission pdfs are highly skewed (they are sharply peaked with a mode close to zero and a long tail) which challenges the model. Population parameters and corresponding means and variances for the various distributions for Simulations 1 and 2 are summarized in Table 1. Note that the mean and variance for the transmission variates are reported prior to truncation. The actual mean and variance of the truncated lognormal distributions in Simulations 1 and 2 are , , and , , respectively.
5. Results and discussion
We compare the sampled data histograms of the radiative transfer quantities (Eqs. (1-5)) to those predicted with the Johnson SU distribution. To assess how close the two are, we compute a symmetric Kullback-Leibler (KL) distance [35], . The KL distances are noted in the figure captions. The smaller the distance the better the agreement. The computed value of the KL will depend on how the histograms are computed (e.g., the bin spacing used). To get an idea of the significance of the KL value, we computed the 99th percentile for KL distances between a histogram computed from 106 samples drawn from N(0,1) and the theoretical density; the result of 0.0013 means that KL values must be greater than this to establish that the two densities are not identical. In Figs. 3 -4 we show the comparison for the physical simulations and in Figs. 5 -8 for the two hypothetical scenarios. In Table 2 we summarize the Johnson SU population parameters, , for the primary model outputs M(0), M(1), and , as well as for . The pdf of is computed using the second method outlined in section 3.2.3, using Eq. (8).
In the first panel of Fig. 3 we show (top row) the layer emission radiances , , and the cloud layer emission radiance . In the middle row we show the at-sensor radiances for the external source and background layers , and the cloud at-sensor radiance ; in the bottom row we show at-sensor radiance and the cloud blackbody radiance, . The layer radiances and at-sensor radiances are modeled with Johnson SU statistics and the blackbody is modeled with lognormal statistics. Figure 3 shows that the approximation of a blackbody with a lognormal pdf is excellent. Other blackbodies (foreground and background) in Table 1 (all simulations) also produced practically an exact match between sampled data and a lognormal.
The agreement between our model and the data for the emission terms and the at-sensor radiances is excellent (KL distances for all RVs in the physical simulation are less than 0.002).
In Fig. 4 (top left panel) we show the pdf for the total at-sensor radiance under H0 and H1. In the top right panel we show the pdf for the differential radiance (Eq. (4)). The pdfs are in execellent agreement with the data (KL distances for M(0), M(1), and are less than 0.002). In the bottom left panel we show the pdf for the thermal contrast (in Kelvin) where . The agreement with the data is excellent, KL distance of 0.0016. The negative indicates that the cloud is always in emission mode, , and the probability for is zero. Nevertheless, the pdf for shows that the cumulative probability for is . Thus, due to the fluctuations in the foreground layer, the cloud that is “physically” in emission mode will appear in the measurements with , about 13% of the time, and the cloud may be erroneously interpreted as a cloud in “absorption” mode. This also has implications towards averaging spectra in order to increase the signal to noise ratio: fluctuations in the polarity of means that naively averaging instances of measured by a sensor will decrease noise but will also decrease the mean (signal). In the bottom right panel we show the performance of detection, with the standard receiver operation characteristics (ROC) curves (the probability of detection, PD, as a function of the probability of false alarm, PFA). The ROC curve can be used to assess the theoretical performance of detecting the cloud due to scene fluctuations.
In Figs. 5-8 we show the two hypothetical scenarios of Table 1 with challenging parameters. We first discuss the SU fit for several of the intermediate terms in the model to illuminate the limitations of the SU distribution, and then we show that in spite of these limitations, the SU is an excellent fit for the important quantities that determine performance, namely, M(1), M(0),, and . In Fig. 5 (simulation 1) the KL distances for Lf, Lb, Lc, , , and Mc are large (~1.3), which indicates poor agreement with the SU model. The histograms for these parameters all have a large probability for radiance values near zero, and a discontinuity at zero that reflects the fact that radiance values may not be negative (the shape of the histograms is caused by the severe truncation necessary to model the transmission terms on the interval 0 to 1; many transmission values are close to 1, meaning that the emission values for each layer are near zero). The SU distribution, however, is a continuous distribution defined on the interval to with no restriction regarding non-negativity. Thus, the SU fits for these terms are poor, and include (nonphysical) negative radiances. At-sensor radiances for the source layer, on the other hand, are reasonably well-fit by the SU distribution (KL distances for and are 0.005 and 0.008, respectively). The source layer is not modeled with its own transmission term, and the effect of attenuating the source radiance through truncated lognormal transmission terms (where a majority of values are close to 1) is small.
Despite the fact that intermediate model terms for simulation 1 pose problems for the Johnson SU fit, the primary model outputs, M(1), M(0), and —as well as —are very well fit by the SU distribution, with KL distances around 0.0013. These quantities, as well as the ROC curve for simulation 1, are shown in Fig. 6 . The ROC curve shows the challenging detection scenario posed by this simulation; despite the large thermal contrast and the same vapor optical depth as the physical simulation, it is only modestly above the 45 degree line (equal probability of false alarm and detection—a “coin-flip” detection scenario). Thus, scene fluctuations are large compared to the difference in average radiance between H1 and H0. In simulation 1 (Fig. 6) the cloud is in emission mode nearly always (the probability of is negligible, around ), but nevertheless the cloud appears in the measurements as an “absorbing” cloud, ,with large (37%) probability, .
In Fig. 7 (simulation 2), layer radiances Lf, Lb, and Lc show a reasonable SU fit (KL values between 0.03 and 0.06). In each case, the long tail to the left of the SU peak causes a small probability of non-physical negative values (~0.02%). At-sensor radiances and are also reasonable (KL values around 0.75) but have a higher probability of negative radiances (3%). Fits for , , and are poor with large KL distances (greater than 2) and significant probabilities of negative radiances (13%, 27%, and 14%, respectively). The amount of truncation for transmission variates is small (<0.1%), thus we do not think the main reason for the poor fits. Regardless of the cause, the SU distribution is just not appropriate to describe the shape (rise and fall) of the histograms. The shape of the SU distribution is controlled by the skewness and kurtosis of the distribution; as shown in Fig. 2 the SU distribution only covers a subset of the possible values. The histograms in Fig. 7 are all in the SB region: Lf, Lb, Lc, , and are all relatively close to the lognormal line (~2 units or less) such that the SU approximation is still reasonable; , , and are significantly far from the lognormal line (10-100 units away) such that the SU approximations are poor. Evidently, certain types of very sharply peaked distributions that rise very quick onone side with a long tail on the other (as in Fig. 7) are difficult for the SU to represent. Despite the poor fit for some of the intermediate model outputs, Fig. 8 shows excellent agreement between the model and the histograms for M(1), M(0), and . In fact, these histograms are actually in the SU region and KL values are small (≤0.002). For the fit is good, though not perfect (KL of 0.008). The discrepancy between the model and the histogram for is not due to the validity of the linearization of the inverse blackbody function described in Section 3.2.3, but is due instead to the fact that the Lin radiance is actually in the SB region. The distance to the lognormal line is small, however, and the SU approximation for is reasonable.
Figure 8 also shows the detection performance (ROC curve) for Simulation 2, which is even poorer than for Simulation 2. The larger optical depth of the three layers in Simulation 2 causes more emission from the cloud layer, but also more attenuation of cloud photons due to the foreground layer. Additionally, the thermal contrast is reduced (compare the magnitude of in Figs. 6 and 8) due to the fact that fewer photons from the mountain reach the cloud. In fact, in Simulation 2 the cloud will exhibit both radiation modes: 83% of the time in emission (, computed from the histogram), and 17% in absorption (in Simulation 1 the cloud was always in emission). In the measurements, however, the cloud in Simulation 2 will appear in emission mode with probability of 47% (). Thus, inferences made from regarding (that are valid in the physical model) can be misleading in practice, where variation in the radiative transfer terms causes a mismatch between and .
5. Summary
A probability model for a 3-layer radiative transfer model (foreground layer, cloud layer, background layer, and an external source at the end of LOS) has been developed. The 3-layer model is fundamentally important as the primary physical model in passive LWIR remote sensing, but it has not been useful for predicting performance due to clutter (scene variations).
This is (to our knowledge) the first probability model for a 3-layer radiative transfer model where all quantities are allowed to be stochastic and higher order statistics are included. The model provides statistical characterization of at-sensor spectral radiance due to scene fluctuations, that is, it can predict the detection potential in clutter-noise limited scenarios. Note that the probability model could be combined with sensor models and detection algorithms to predict the performance of a particular sensor.
The model is based on computing analytical moments (mean, variance, skewness, and kurtosis) of the output variates based on the moments of the input variates. Our choices for the input statistics are made based on physical theory, and we expect that they can accurately model real stochastic quantities. However, any distribution that has four computable moments can be chosen. Moments for input variates could also be measured empirically.
We assumed Gaussian (normal) statistics for temperature fluctuations, , which results in the excellent approximation that the Planck blackbody radiance is lognormally distributed, . Layer radiances and transmission variates are modeled as lognormal variates, both of which are justifiable based on physical arguments. The lognormal distribution has the nice property that it describes non-negative random variables (it is defined on the domain 0 to ∞), and thus is naturally applicable to describe radiance values. Transmission values are non-negative but also less than or equal to 1; therefore it is sometimes necessary (depending on population parameters) to represent transmission variates by truncated lognormal distributions. Other choices are possible (for example, the beta distribution is naturally bounded between 0 and 1 and can describe a wide range of shapes), however, the lognormal distribution is convenient to work with since products and sums of lognormal distributions are themselves lognormally distributed (or approximately so).
In special cases the model predicts that the total at-sensor radiance is lognormally distributed, however the lognormal distribution is not flexible enough to represent the general case. Therefore, we use the Johnson system of distributions to fit the moments of the output variates. There are other systems of curves that may be used for distribution fitting; we chose the Johnson system due to central role the logarithmic distribution plays in the system and the small number of distributions that appear in the system. The Johnson SU distribution is a particularly useful and flexible distribution that is appropriate for representing the at-sensor radiance under H0 and H1 hypotheses, M(0) and M(1), respectively, and the differential radiance, . These are the primary quantities that a sensor would be able to measure. The thermal contrast, ΔT (see Eqs. (4) and (5)), is also typically well-fit by the SU distribution. The thermal contrast is of interest as the principle physical quantity that enables LWIR vapor detection. Other intermediate output terms of the model may be well fit with the SU distribution; at times they would be better modeled with the Johnson SB distribution, but method-of-moments estimation is intractable with the SB, which precludes its use in our model. This is a disadvantage to the Johnson system, but in practice it is of little impact since the pimary model outputs are fit so well by the SU distribution, even when the intermediate terms would require the SB to be “optimally” fit.
In the traditional (non-stochastic) physical model, gives only information about the vapor cloud. In the stochastic model, due to scene fluctuations, ΔM includes photons that never interacted with the cloud.
Detection potential may be quantified by ROC curves derived from binary hypothesis testing on the M(0) and M(1) distributions. Fluctuations (in the temperature of the cloud and in the input radiance on the back of the cloud, Lin) can also cause the cloud to be in emission and in absorption at the “same time”, i.e., at a given moment there is a non-zero probability for the cloud to be in absorption and a non-zero probability for the cloud to be in emission. This has potential impacts towards processing spectral time-series data: it may be dangerous to attempt to increase the signal-to-noise ratio (SNR) via averaging spectra, since it would be possible for an emission and absorption spectrum to cancel. It is also possible that due to the fluctuations in the foreground layer, the polarity of will be reversed, hence, a cloud that is in emission (i.e., the cloud output radiance into the foreground layer is greater than the incident radiance on its back side) may appear in the measurements with . Only with a probability model for radiative transfer one can study these types of phenomena.
In the probability model we implicitly assume that the population parameters for the three layers (and the external source) are stationary (i.e., are fixed) during the measurement period. Our probability model is a univariate model and thus can only address the radiative transfer model at a single wavelength. This is the first step in developing a multivariate model for hyperspectral remote sensing where the covariance between wavelengths plays an important role. The challenge in extending our univariate model to a multivariate model is mainly in extending Johnson SU to multivariate statistics where the covariance needs to be computed and estimated from Eq. (1) that is written for vector quantities. This research may lead to new detection algorithms tailored to exploit the Johnson SU statistics. Our pdf model is a “forward model” where with given population parameters for the radiative transfer model we predict performance. The challenge of estimating the population parameters form sampled data (i.e., an “inverse model”) is great and remains open.
Appendix A. The lognormal distribution
Let x be a normal distribution, , with density
By transformation of variables, follows the standard two-parameter lognormal distribution, , with density
The population parameters are the mean and variance of the associated normal distribution. We will now present some useful properties of the lognormal distribution.
- (i) Statistical properties (moments)
The raw nth moments of the lognormal distribution are given by:
from which the following moments may be derived (using Appendix C) to be
The median of the lognormal distribution is simply , and the mode is given by . Defining Pearsonian coefficients and , the equation for the lognormal curve in the β1, β 2 plane (see Fig. 2) is given by
- (ii) Estimation using the method of moments
The population parameters for a lognormal distribution that matches the mean and variance of a random variable x, can be found by solving the simultaneous equations to be
Note that when attempting to fit a distribution to sampled data, other methods are usually preferred to the method of moments (maximum likelihood, for example) due to higher efficiencies [28]. Hybrid techniques may also be used, for example, replacing the mean by the median in the method of moments gives
However, for fitting theoretical distributions, where the moments and are known exactly, there are no concerns in using the method of moments.
In the properties below, let be independent lognormal variates (i.e., ) and a positive constant. We have analytically the following properties:
- (iii) Scaling:
- (iv) Inverse:
- (v) Multiplication:
- (vi) Division:
The following two properties are approximately true, where in property (viii) the independence assumption is dropped:
- (vii) Addition (approximation with Fenton-Wilkinson [19], ):
For two lognormal variables it was stated [20,21] that Fenton-Wilkinson moment-matching method breaks down if for either y1 or y2. In terms of signal to noise ratio, defined as , this restriction can be shown (by solving ) to give the condition that a minimum is required for the Fenton-Wilkinson approximation to hold. This property may be combined with (iii) and (v) to show that a linear combination of lognormal variates (with positive coefficients) is lognormally distributed, as long as the SNR restriction is met.
- (viii) Addition of correlated lognormals
For two lognormals and that are correlated with correlation coefficient
we follow the spirit of Fenton-Wilkinson and use the method of moments—property (ii)—to find such that and are matched exactly. A sum of three correlated lognormals with correlations , , and is easily computed in the same manner:
and
- (ix) Moments of truncated lognormal
For a lognormal variate that is truncated to be within (e.g., when y is a transmission that may not be greater than 1) the nth raw moment may be written in terms of the untruncated moments:
where is the complementary error function. If the associated normal parameters are such that the probability Pr(y>1) is negligible (i.e., no truncation is required), the ratio
and the expression for untruncated raw moments is recovered.
- (x) Central limit theorem for multiplicative processes
Consider a product of independent non-negative variates, . Taking the logarithm of this expression leads to a sum of (real-valued) independent variates, . If, by the central limit theorem , then the relationship immediately implies that .
Appendix B. The Johnson SU distribution
The SU distribution is based on the following transformation of a normally distributed variate :
where the parameters a and are location and scale parameters, respectively, and c and are shape parameters. The density for is given by
The SU distribution can describe the lognormal and normal distributions as limiting cases: when , and when . The SU distribution is symmetric (skewness is zero) when . Some useful properties of the SU distribution are presented below.
- (i) Statistical properties (moments)
The nth moments around arbitrary location are given by
where and . Raw moments may be found from this equation by setting , central moments are found by setting . Using Appendix C, the following moments may be derived:
Note that references [28, p42, Eq. (2).62] and [18, Eq. (6).74] both have errors in their expressions for moments of the SU distribution. In the β1, β 2 plane, the SU distribution defines an unbounded region “below” the lognormal curve (see Fig. 2).
- (ii) Estimation using method of moments
We follow the iterative numerical procedure suggested by Elderton and Johnson [36] for finding the parameters that match the desired and coordinates of the distribution to be fit (in our model these are computed analytically; for fitting sampled data alternative methods such as quantiles [37] may be preferred). For the SU distribution the sign of the skewness is determined by the sign of c ( means ), but because the fitting routine operates on β1 (thereby losing the sign of the skewness), the computed value of c will always be non-positive. Thus, c should be multiplied by if the skewness to be fit was negative. Given values for c and d, it is then straightforward to determine the value of b that matches the variance, and then the value of a that matches the mean. This is the same procedure used in [38].
If the distribution to be fit is in the SU region or on the lognormal curve, then the first four moments will be matched exactly by this fitting procedure. If the distribution to be fit lies in the SB region (above the lognormal curve in Fig. 2) then the skewness and kurtosis will not match exactly. To ensure that the “best” possible SU approximation is returned by the iterative procedure, we first find the point on the lognormal curve with the smallest Euclidean distance in the β1,β2 plane to the SB point to be fit. This can be found using Newton’s method [39] to minimize the distance where β1 and β2 are given by Eq. (A1) and S2 and K are the coordinates of the point in the SB region. D is only a function of σ; the solution value σmin is plugged back into Eq. (A1) to find the coordinates on the lognormal curve closest to the SB point. Then, applying Elderton and Johnson’s routine to this point finds population parameters {a,b,c,d} corresponding to the optimal SU approximation (and is numerically equivalent the best possible SL approximation). This procedure is necessary to return the optimal SU fit because the Elderton and Johnson procedure approaches the solution while preserving the correct value of β2. Thus, it would return an SU approximation that matched the mean, variance, and kurtosis, but not the skewness. The optimal solution accepts some error in the kurtosis in order to reduce the error in the skewness to achieve minimum total error.
Note that there is a non-uniqueness in approximating a point on the lognormal curve with the SU distribution. Since the lognormal is a limiting form of the SU, the SU can get indistinguishably close to the lognormal distribution without actually reaching it; for a given tolerance there will be a circular arc in the SU region centered on the lognormal point in the β1,β2 plane defining particular SU distributions that are equidistant to the lognormal point (and therefore have the same value for the total error and are thus “equally good” at approximating the lognormal point). In practice, however, for a fixed tolerance, using the iterative fitting routine will always produce the same set of SU population parameters since the point is approached from a specific direction (along a line of constant β2).
- (iii) Translation and Scaling:
Let k and λ be constants and . Then, it can be shown that . The absolute value and signum (sign) functions are needed because by convention, b and d are always nonnegative. In (B1), since sinh(·) is an odd function and x is a zero-mean symmetric distribution, the following transformations are all equivalent, . Therefore there is an invariance in the Johnson SU distribution to simultaneously flipping the sign of the c and b parameters, and no loss in generality in always taking b to be positive.
Appendix C. [E, V, S, K] as a function of moments
Let z be a random variable whose raw moments exist for n =1, 2, 3, 4. Then the mean, variance, skewness, and kurtosis may be written as a function of the raw moments as
respectively. These expressions may be derived using the properties of the expectation operator, which are summarized in Appendix D.
Appendix D. Properties of the expectation operator
The definition of the expectation operator is
where px(x) is the pdf describing the random variable x, and the integral is computed over the entire domain of x. It follows directly from the definition that where k and λ are constants. It can also be established, regardless of independence between variates x and y, that . Additionally, if x and y are independent—such that the joint probability density —then .
For convenience we give the moments of and of (with x and y independent), which are frequently used in deriving moments of the model outputs. The moments of (for example, and ) are given by
For emissivity terms that are of the form , (D1) may be used with . The moments of (with x and y independent) are given by
References and links
1. M. L. Polak, J. L. Hall, and K. C. Herr, “Passive Fourier-transform infrared spectroscopy of chemical plumes: an algorithm for quantitative interpretation and real-time background removal,” Appl. Opt. 34(24), 5406–5412 (1995). [CrossRef] [PubMed]
2. D. F. Flanigan, “Prediction of the limits of detection of hazardous vapors by passive infrared with the use of modtran,” Appl. Opt. 35(30), 6090–6098 (1996). [CrossRef] [PubMed]
3. A. Ben-David and H. Ren, “Detection, identification, and estimation of biological aerosols and vapors with a Fourier-transform infrared spectrometer,” Appl. Opt. 42(24), 4887–4900 (2003). [CrossRef] [PubMed]
4. A. Ben-David, C. E. Davidson, and J. F. Embury, “Radiative transfer model for aerosols at infrared wavelengths for passive remote sensing applications: revisited,” Appl. Opt. 47(31), 5924–5937 (2008). [CrossRef] [PubMed]
5. D. Manolakis, “Signal processing algorithms for hyperspectral remote sensing of chemical plumes”, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 1857 – 1860, Digital Object Identifier: 10.1109/ICASSP.2008.4517995 (2008). [CrossRef]
6. A. Ifarraguerri and A. Ben-David, “Impact of atmospheric boundary layer turbulent temperature fluctuations on remote detection of vapors by passive infrared spectroscopy,” Opt. Express 16(22), 17366–17382 (2008). [CrossRef] [PubMed]
7. A. Ben-David, “Remote detection of biological aerosols at a distance of 3 km with a passive Fourier transform infrared (FTIR) sensor,” Opt. Express 11(5), 418–429 (2003). [CrossRef] [PubMed]
8. H. C. van de Hulst, Multiple Light Scattering Tables, Formulas and Application (Academic, 1980).
9. K. N. Liou, An Introduction to Atmospheric Radiation, 2nd ed. (Academic, 2002).
10. G. L. Stephens, P. M. Gabriel, and S.-C. Tsay, “Statistical radiative transport in one-dimensional media and its application to the terrestrial atmosphere,” Transp. Theory Stat. Phys. 20(2-3), 139–175 (1991). [CrossRef]
11. J. Kerekes and J. Baum, “Full spectrum spectral imaging system analytical model,” IEEE Trans. Geosci. Rem. Sens. 43(3), 571–580 (2005). [CrossRef]
12. C. Quang, F. Dalaudier, A. Roblin, V. Rialland, and P. Chervet, “Statistical model for atmospheric limb radiance structure: application to airborne infrared surveillance systems,” Proc. SPIE 7108, 710805, 710805-9 (2008). [CrossRef]
13. A. Ishimaru, Wave Propagation and Scattering in Random Media (Academic, 1978).
14. N. Byrne, “3D Radiative transfer in stochastic media,” in 3D Radiative Transfer in Cloudy Atmospheres, A. Marshak and A. B. Davis, eds. (Springer, 2005).
15. D. E. Lane-Veron and C. J. Somerville, “Stochastic theory of radiative transfer through generalized cloud fields,” J. Geophys. Res. 109(D18), D18113 (2004). [CrossRef]
16. J. Aitchison and J. A. C. Brown, The Lognormal Distribution (Cambridge University Press, 1957).
17. E. Limpert, W. A. Stahel, and M. Abbt, “Lognormal distributions across the sciences: keys and clues,” Bioscience 51(5), 341–352 (2001). [CrossRef]
18. N. L. Johnson, S. Kotz, and N. Balakrishnan, Continuous univariate distributions, 2nd ed. (Wiley, 1994), Vol. 1.
19. L. F. Fenton, “The sum of lognormal probability distributions in scatter transmission systems,” IRE Trans. Commun. Syst. CS-8, 57–67 (1960).
20. G. L. Stuber, Principles of Mobile Communications (Kluwer Academic Publishers, 1996).
21. N. B. Mehta, J. Wu, A. F. Molisch, and J. Zhang, “Approximating a sum of random variables with a lognormal,” IEEE Trans. Wirel. Comm. 6(7), 2690–2699 (2007). [CrossRef]
22. D. Manolakis, D. Marden, and G. A. Shaw, “Hyperspectral image processing for automatic target detection applications,” Lincoln Lab. J. 14, 79–116 (2003).
23. A. Schaum, “Adapting to change: the CFAR problem in advanced hyperspectral detection,” in Proceedings of the 36th Applied Imagery Pattern Recognition Workshop (IEEE Computer Society Washington, DC, 2007), pp. 15–21.
24. D. Manolakis, R. Lockwood, T. Cooley, and J. Jacobson, “Is there a best hyperspectral detection algorithm?” Proc. SPIE 7334, 733402, 733402-16 (2009). [CrossRef]
25. H. Ren, “Anomaly detection in hyperspectral imagery: second and higher-order statistics-based algorithms,” in Recent Advances in Hyperspectral Signal And Image Processing, C–I Chang, ed. (Transworld Research Network, 2006), Chap. 12.
26. A. Stuart and K. Ord, Kendall’s Advanced Theory of Statistics: Distribution Theory, 6th ed. (Oxford University, 2007), Vol. 1.
27. N. L. Johnson, “Systems of frequency curves generated by methods of translation,” Biometrika 36(1-2), 149–176 (1949). [CrossRef] [PubMed]
28. J. K. Ord, Families of Frequency Distributions (Griffin, 1972), Chap. 2.
29. S. M. Kay, Fundamentals of statistical signal processing: detection theory (Prentice Hall PTR, 1998), Vol. II.
30. A. Ben-David, S. K. Holland, G. Laufer, and J. D. Baker, “Measurements of atmospheric brightness temperature fluctuations and their implications on passive remote sensing,” Opt. Express 13(22), 8781–8800 (2005). [CrossRef] [PubMed]
31. J. H. Seinfeld and S. N. Pandis, Atmospheric Chemistry and Physics (Wiley, 1998).
32. S. Chandrasekhar, Radiative Transfer (Dover, 1960).
33. N. S. Kopeika, A System Engineering Approach to Imaging (SPIE Optical Engineering Press, 1998), Chap. 15.
34. A. Berk, G. P. Anderson, P. K. Acharya, L. S. Bernstein, L. Muratov, J. Lee, M. J. Fox, S. M. Adler-Golden, J. H. Chetwynd, M. L. Hoke, R. B. Lockwood, T. W. Cooley, and J. A. Gardner, “MODTRAN5: A Reformulated Atmospheric Band Model with Auxiliary Species and Practical Multiple Scattering Options,” Proc. SPIE 5655, 88–95 (2005). [CrossRef]
35. R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed. (Wiley, 2001).
36. W. P. Elderton and N. L. Johnson, Systems of Frequency Curves (Cambridge University Press, 1969).
37. R. E. Wheeler, “Quantile estimators of Johnson curve parameters,” Biometrika 67(3), 725–728 (1980). [CrossRef]
38. I. D. Hill, R. Hill, and R. L. Holder, “Algorithm AS99: fitting Johnson curves by moments,” J. R. Stat. Soc., Ser. C, Appl. Stat. 25(2), 180–189 (1976).
39. F. S. Acton, Numerical Methods That Work (The Mathematical Association of America, 1990).