Improving the retrieval of water inherent optical properties in noisy hyperspectral data through statistical modeling

David B. Gillis; Jeffrey H. Bowles; Wesley J. Moses

doi:10.1364/OE.21.021306

1. Introduction

Historically, most ocean color analysis has centered on estimating chlorophyll-a (chl-a) concentration (as a proxy for algal biomass) from satellite data. While determining algal biomass is a key aspect of monitoring the ecological status of a water body, several scientific and military applications require analysis that goes beyond estimating chl-a concentration, including the retrieval of other optically active components in the water and the characteristics of the substratum. For example, determining the sediment concentration in the surface layer helps assess the availability and transmission of light into deeper layers, which is a crucial parameter in deriving energy budgets for primary production studies; characterizing the bottom type and bathymetry is important for many purposes, such as maritime navigation, underwater geologic studies, global climate studies, habitat management, seafood safety control, and coastal erosion control.

Numerous empirically and analytically driven algorithms have been developed for retrieving in-water and under-water characteristics from satellite data, with varying degrees of accuracy. Unlike open-ocean waters, coastal, estuarine, and inland waters are often optically complex due to the relative abundance of optically active components other than phytoplankton, and are conventionally categorized as Case II waters [1]. The optical properties of Case II waters are not directly correlated with the concentration of phytoplankton alone. Therefore, spectral algorithms for retrieving a particular biophysical parameter in such waters are sensitive to variations in other parameters. For example, blue-green algorithms for estimating chlorophyll-a (chl-a) concentration do not perform well in Case II waters due to absorption by colored dissolved organic matter (CDOM) in the blue spectral region (e.g., [2, 3]). Algorithms relying on reflectances in the red and near-infrared (NIR) regions have been preferred for retrieving chl-a concentration in Case II waters because of the diminished effects of absorption by CDOM and scattering by Suspended Particulate Matter (SPM) at red and NIR wavelengths (e.g., [4, 5]). Nevertheless, algorithms based on band ratios in the red and NIR wavelengths still assume uniform absorption and scattering by non-algal components at the wavelengths of interest. This assumption breaks down at low chl-a and high SPM concentrations, causing significant errors in the estimated chl-a concentration (e.g., [6]). Modified NIR-red algorithms have been developed to handle the non-uniform scattering by SPM in the red and NIR regions [7, 8]. Although such algorithms have been shown to yield accurate results for highly turbid waters in specific geographic regions, their applicability to waters from other regions with different biophysical characteristics has yet to be demonstrated.

Comprehensive analysis of coastal, estuarine, and inland ecosystems requires the retrieval of multiple in-water and under-water characteristics. Algorithms based on a Look-Up-Table (LUT) approach (e.g., [9]) can retrieve multiple parameters simultaneously and, when properly designed, are less prone to the adverse effects of a few parameters dominating the rest. The Naval Research Laboratory (NRL) in Washington, D.C. has developed a LUT-based Coastal Water Spectral Toolkit (CWST) for retrieving Inherent Optical Properties (IOPs) of water, constituent concentrations, sediment type, bottom type, bottom reflectance, and bottom depth from remotely sensed data. The CWST essentially employs a spectrum-matching approach based on the spectral distances between the input spectrum and modeled spectra in a large database. The accuracy and reliability of this approach depend significantly on the metric used to calculate the spectral distances. In this paper, we compare the Euclidean and Mahalanobis distances as the metric for choosing the best-matching spectrum, and demonstrate the benefits of using the Mahalanobis distance instead of the Euclidean distance.

2. Experimental description and methodology

2.1 Look-Up-Table (LUT) based IOP retrieval

In order to implement a LUT-based model, one must generate in advance a large number of modeled remote sensing reflectance ( $R_{r s}$ ) spectra, covering a wide range of bio-optical parameters, to ensure that each image spectrum can be matched to a modeled spectrum within acceptable limits of accuracy. The CWST contains a very large database of modeled spectra, generated using a wide variety of IOP parameters, bottom reflectance, depths, sediment types, and phase functions. The forward model used for generating the spectra is Ecolight, which is a simplified version of the radiative transfer model Hydrolight [10, 11]. The full set of modeled data, currently comprising approximately 20 million unique $R_{r s}$ spectra, is stored in a database for warehousing.

To run the IOP retrieval for a given image or set of individual spectra, the database is first reduced by limiting the ranges of the various input parameters to those appropriate for the input spectra (for example, bottom types that are known not to exist in the region from which the input spectra were collected are omitted). This reduction in modeled spectra both increases the speed of the analysis and reduces the chance of incorrect matches caused by combinations of input parameters that are not realistic for the region. Each input spectrum is then compared to each of the modeled spectra in the (reduced) database, and the spectral distance (according to some metric) is calculated. The modeled spectrum closest to the input spectrum is considered the best match, and the (known) parameters corresponding to this spectrum are assigned to the input spectrum.
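A minimal sketch of the matching step just described, assuming the database is held in memory as a NumPy array with one modeled spectrum per row and a parallel list of parameter records. The names `reduce_database` and `lut_retrieve`, the filter predicate, and the toy parameter fields are hypothetical illustrations, not the CWST interface.

```python
import numpy as np

def reduce_database(spectra, params, keep):
    """Keep only modeled spectra whose parameter record passes `keep`.

    spectra: (m, n) array of modeled Rrs spectra, one per row.
    params:  list of m parameter records (dicts) aligned with the rows.
    keep:    hypothetical regional filter, e.g. drop bottom types that
             do not occur where the input spectra were collected.
    """
    mask = np.array([bool(keep(p)) for p in params])
    return spectra[mask], [p for p, m in zip(params, mask) if m]

def sq_euclidean(x, Y):
    """Squared L2 distance from spectrum x (n,) to every row of Y (m, n)."""
    diff = Y - x
    return np.einsum("ij,ij->i", diff, diff)

def lut_retrieve(x, spectra, params, metric=sq_euclidean):
    """Assign to x the parameters of the closest modeled spectrum."""
    distances = metric(x, spectra)
    return params[int(np.argmin(distances))]

# Toy LUT: 5 random 4-band spectra with made-up parameter records.
rng = np.random.default_rng(0)
spectra = rng.random((5, 4))
params = [{"chl": c, "bottom": b}
          for c, b in zip([1, 2, 3, 4, 5], ["sand", "mud", "sand", "coral", "mud"])]

spec_r, par_r = reduce_database(spectra, params, lambda p: p["bottom"] != "coral")
result = lut_retrieve(spectra[2], spec_r, par_r)
print(result)  # {'chl': 3, 'bottom': 'sand'}
```

Passing a different `metric` function (for example, a Mahalanobis distance) changes only the distance calculation; the reduction and the minimum search stay the same.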

2.2 Sensor noise modeling

In very general terms, measuring a hyperspectral spectrum with a CCD array converts incoming light (photons) into ‘digital numbers’; sensor calibration then turns the digital numbers into radiance values. The total measured signal is the sum of the incoming light and any sensor-generated noise. The sensor-generated terms include dark noise, read noise, and digitization noise; they are independent of the incoming signal strength and can be accurately characterized for a given sensor in the laboratory. The incoming light is essentially a count of photons per unit time; this count follows a Poisson distribution, whose variance equals the mean number of photons received (so its standard deviation is the square root of the mean). This variation is usually referred to as photon (or ‘shot’) noise. When the incoming light is sufficiently strong, the photon noise dominates the sensor noise and the data are said to be ‘shot-noise limited’.
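To illustrate the shot-noise-limited regime, the sketch below simulates detector counts as Poisson photon noise plus a single Gaussian term standing in for all signal-independent sensor noise. This lumping and the noise level are assumptions for illustration, not the HICO noise model of [12, 13].

```python
import numpy as np

def simulate_counts(mean_photons, read_noise_sd, n_samples, rng):
    """Photon (shot) noise plus signal-independent sensor noise.

    Simplification: dark, read, and digitization noise are lumped into one
    zero-mean Gaussian term with standard deviation read_noise_sd.
    """
    shot = rng.poisson(mean_photons, size=n_samples).astype(float)  # Poisson: var = mean
    sensor = rng.normal(0.0, read_noise_sd, size=n_samples)
    return shot + sensor

rng = np.random.default_rng(1)
read_sd = 10.0                                   # made-up sensor noise level (counts)
for mean in (100.0, 10_000.0):
    counts = simulate_counts(mean, read_sd, 200_000, rng)
    predicted_var = mean + read_sd**2            # independent variances add
    shot_fraction = mean / predicted_var         # share of variance from shot noise
    print(f"mean={mean:>8.0f}  var/predicted={counts.var() / predicted_var:.3f}  "
          f"shot share={shot_fraction:.3f}")
```

At the higher photon count, the shot-noise share of the total variance approaches 1, which is what ‘shot-noise limited’ means.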

In this paper, we simulate real-world noisy measurements using the sensor model for NRL’s Hyperspectral Imager for the Coastal Ocean (HICO), which has been operating continuously aboard the International Space Station since October 2009. The sensor uses a 512 × 512 CCD array with a spectral range of 350–1080 nm and a spectral channel width of 5.73 nm; the ground sampling distance is approximately 90 m at nadir. Detailed characterizations of the HICO sensor noise are available in [12], and a more detailed description of the noise modeling algorithm than is given here can be found in [13].

Only simulated data are used in the paper. Ideally, one would prefer to use actually measured field data. However, there is limited ability to collect a sufficient number of in situ measurements over a wide range of biophysical conditions to make statistically meaningful inferences. Through simulations it is possible to generate large data sets over any number of conditions at intervals that are narrow enough to permit investigation of even small biophysical changes in water and still provide a reasonable estimate of how the algorithm will perform with field data.

To generate noisy samples, we use the following three-step procedure. First, starting with a given $R_{r s}$ spectrum $ρ$, an at-sensor radiance spectrum $L$ is generated using a forward version of the Tafkaa atmospheric correction algorithm [14, 15]. Next, a ‘noisy’ radiance spectrum $L_{n} = L + η$ is generated by adding a zero-mean Gaussian noise spectrum $η$; the noise variance in each band is determined by the noise model described in [13], and the noise is independent from one wavelength to the next. Finally, Tafkaa is used to atmospherically correct the noisy radiance and produce a noisy $R_{r s}$ spectrum $ρ_{n}$.
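The three-step procedure can be sketched with a toy per-band affine atmosphere in place of Tafkaa. The `gain` and `path` arrays, the noise levels, and the spectrum below are made up; they only show how noise added in radiance space reaches $R_{rs}$ space.

```python
import numpy as np

# Hypothetical per-band affine atmosphere: L = gain * rho + path.
# These stand in for the forward and inverse Tafkaa models; they are NOT Tafkaa.
def forward_atmosphere(rho, gain, path):
    return gain * rho + path

def correct_atmosphere(radiance, gain, path):
    return (radiance - path) / gain

def noisy_reflectance(rho, gain, path, sigma_radiance, n_samples, rng):
    """Step 1: rho -> L.  Step 2: L_n = L + eta.  Step 3: L_n -> rho_n."""
    L = forward_atmosphere(rho, gain, path)
    # Zero-mean Gaussian noise, independent from band to band,
    # with a band-dependent standard deviation sigma_radiance[i].
    eta = rng.normal(0.0, 1.0, size=(n_samples, rho.size)) * sigma_radiance
    return correct_atmosphere(L + eta, gain, path)

n_bands = 68
rho = np.linspace(0.004, 0.001, n_bands)        # made-up Rrs spectrum (sr^-1)
gain = np.linspace(30.0, 10.0, n_bands)         # made-up radiance per unit Rrs
path = np.linspace(5.0, 1.0, n_bands)           # made-up path radiance
sigma_L = 0.002 * np.sqrt(gain * rho + path)    # made-up, signal-dependent noise
rng = np.random.default_rng(2)
rho_n = noisy_reflectance(rho, gain, path, sigma_L, 1000, rng)
print(rho_n.shape)  # (1000, 68)
```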

Note that, by construction, the noisy radiance spectra follow a Gaussian distribution with a mean equal to the noise-free radiance spectrum $L$ and a covariance matrix $Σ_{R}$; since the noise is uncorrelated among wavelengths, this covariance matrix is diagonal. Since the atmospheric correction algorithm is, to a very good approximation, an affine mapping (that is, $ρ_{n} = A L_{n} + b$ for some matrix $A$ and vector $b$), the noisy remote sensing reflectance $ρ_{n}$ is also Gaussian, with a mean equal to the noise-free reflectance spectrum $ρ$ and a covariance matrix $Σ = A Σ_{R} A^{t}$. If we further assume that the matrix $A$ is diagonal (that is, no inelastic terms that couple different wavelengths are included in the atmospheric correction), then the reflectance covariance matrix $Σ$ is also diagonal.
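A quick numerical check of the covariance argument, using made-up matrices: it compares the predicted $Σ = A Σ_{R} A^{t}$ with the sample covariance of simulated $ρ_{n}$, once for a diagonal $A$ and once for a hypothetical $A$ with small off-diagonal (band-coupling) terms.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5                                               # a few bands keep it readable
sigma_R = rng.uniform(0.5, 2.0, n)                  # made-up radiance noise std per band
Sigma_R = np.diag(sigma_R**2)                       # uncorrelated across bands
b = np.full(n, 0.3)                                 # made-up offset vector

A_diag = np.diag(rng.uniform(0.05, 0.2, n))         # no coupling between bands
A_full = A_diag + 0.01 * rng.standard_normal((n, n))  # hypothetical band coupling

results = []
for A in (A_diag, A_full):
    Sigma = A @ Sigma_R @ A.T                       # predicted covariance of rho_n
    # Zero-mean radiance noise: the noise-free L only shifts the mean.
    L_n = rng.multivariate_normal(np.zeros(n), Sigma_R, size=200_000)
    rho_n = L_n @ A.T + b                           # rho_n = A L_n + b, row-wise
    emp = np.cov(rho_n, rowvar=False)
    scale = np.sqrt(np.outer(np.diag(Sigma), np.diag(Sigma)))
    rel_err = np.max(np.abs(emp - Sigma) / scale)   # normalized estimation error
    off_diag = np.max(np.abs(Sigma - np.diag(np.diag(Sigma))))
    results.append((rel_err, off_diag))
    print(f"max normalized error {rel_err:.4f}, largest off-diagonal {off_diag:.2e}")
```

Only the diagonal $A$ gives an exactly diagonal $Σ$; the coupled version spreads the radiance noise across bands.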

2.3 Distance metrics

To run the CWST IOP retrieval, a metric must be defined for calculating the distance between the input and database spectra. The most traditional choice is the standard least-squares (or $L_{2}$) distance, used here in its squared form,

$$d(x, y) = \sum_{i=1}^{n} (x_{i} - y_{i})^{2} = (x - y)^{t} \cdot (x - y) \tag{1}$$

where $x$ and $y$ are the spectra to be compared, $x_{1}, x_{2}, \dots, x_{n}$ are the values at the various wavelengths, and $n$ is the total number of wavelengths (or bands) of the spectra.

An alternative metric is the statistically based Mahalanobis distance (MD) [16], which measures the distance from a vector $x$ to a given (multivariate) distribution (or set of points) $Y$. Formally, if the mean and covariance of $Y$ are $μ$ and $Σ$, respectively, then the Mahalanobis distance $d_{M}(x)$ of $x$ to $Y$ is given by

$$d_{M}(x) = (x - μ)^{t} \cdot Σ^{-1} \cdot (x - μ) \tag{2}$$

As with the $L_{2}$ distance, this is the squared form of the distance; omitting the square root does not change which spectrum is closest.

For two vectors $x$ and $y$, the MD can be generalized as

$$d_{M}(x, y) = (x - y)^{t} \cdot Σ^{-1} \cdot (x - y) \tag{3}$$

If the covariance matrix $Σ$ is diagonal, with diagonal entries $σ_{1}^{2}, \dots, σ_{n}^{2}$, then the MD formula simplifies to

$$d_{M}(x, y) = \sum_{i=1}^{n} \frac{1}{σ_{i}^{2}} (x_{i} - y_{i})^{2} \tag{4}$$

In this case, the Mahalanobis distance is simply a band-weighted version of the $L_{2}$ distance, with weights given by the inverse variance of each band; in particular, ‘noisy’ bands (those with high variance) are weighted less than less-noisy bands.

3. Results

3.1 IOP retrievals

The general setup of our experiment consists of generating noisy spectra from a given $R_{r s}$ input, as described in Sec. 2.2, running the LUT-based IOP inversion (Sec. 2.1) with both the $L_{2}$ and Mahalanobis distances, and comparing the results.

To generate the noisy data, we began with a set of 52 optically deep $R_{r s}$ spectra generated by Ecolight over various levels of chl-a, CDOM, and SPM. All other parameters (phase functions, sediment type, wind speed, etc.) were held fixed for all generated spectra; each spectrum was generated on a grid of 68 HICO wavelengths within the range 405–789 nm. Each of the 52 noise-free spectra was then input into the noise model (Sec. 2.2), and 1000 noisy $R_{r s}$ samples were generated for each input; an example is shown in Fig. 1. In all cases, the atmospheric conditions were held constant; a complete description of the atmospheric parameters may be found in [13]. We also calculated the associated (sample) covariance matrix for each input for use in the Mahalanobis distance calculations.

Fig. 1 1000 Noisy variations (left) and covariance matrix (right). The latter is scaled by 10⁻⁸.
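The sample covariance used in the Mahalanobis calculations can be estimated from the noisy realizations with `np.cov`. The sketch below assumes independent Gaussian band noise with made-up levels, mirroring the setup in Sec. 2.2 rather than reproducing the HICO noise model.

```python
import numpy as np

rng = np.random.default_rng(5)
n_bands, n_samples = 68, 1000
rho = np.linspace(0.004, 0.001, n_bands)           # made-up noise-free Rrs spectrum
true_sd = np.linspace(2e-4, 5e-5, n_bands)         # made-up per-band noise std
noisy = rho + rng.standard_normal((n_samples, n_bands)) * true_sd

# Sample covariance: each row is one noisy realization, each column one band.
Sigma_hat = np.cov(noisy, rowvar=False)
var_hat = np.diag(Sigma_hat)

print(Sigma_hat.shape)                                     # (68, 68)
print(np.max(np.abs(var_hat / true_sd**2 - 1)))           # worst relative variance error
# With finite samples the off-diagonal entries are small but not exactly zero,
# even though the simulated noise is uncorrelated across bands.
print(np.max(np.abs(Sigma_hat - np.diag(var_hat))) / np.max(var_hat))
```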


Next, we extracted a subset of the existing $R_{r s}$ library spectra from the full database. To keep the comparison as simple as possible, we followed the model used to generate the noisy data and only used optically deep spectra with varying amounts of chl-a, CDOM, and SPM; the sediment type and scattering phase functions for both SPM and chl-a were also varied; all other parameters (wind speed, solar zenith angle, etc.) were fixed to the same values as for the noisy input. An approximate overview of the various ranges and discretization values used in the LUT for CWST is given in Table 1; 261,281 modeled spectra were used in the comparison. The database spectra were originally generated on a wavelength grid of 350 - 790 nm at 5 nm resolution; in order to compare the spectra in the database with the noise-modeled data, the former were resampled onto the HICO grid using cubic-spline interpolation.
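A sketch of the resampling step, assuming natural end conditions for the cubic spline (the text does not say which variant was used) and an approximate HICO-like grid of 68 bands starting at 405 nm with 5.73 nm spacing. The spline is written out in NumPy so the example has no other dependencies; `scipy.interpolate.CubicSpline(x, y, bc_type="natural")` would be a library alternative.

```python
import numpy as np

def natural_cubic_spline(x, y, x_new):
    """Evaluate the natural cubic spline through (x, y) at x_new.

    x must be strictly increasing; x_new should lie within [x[0], x[-1]].
    Natural boundary conditions (zero second derivative at both ends) are an
    assumption; other end conditions give slightly different edge values.
    """
    x, y, x_new = (np.asarray(a, dtype=float) for a in (x, y, x_new))
    n = x.size - 1
    h = np.diff(x)
    # Tridiagonal system for the second derivatives M_0..M_n.
    A = np.zeros((n + 1, n + 1))
    rhs = np.zeros(n + 1)
    A[0, 0] = A[n, n] = 1.0                      # M_0 = M_n = 0
    for i in range(1, n):
        A[i, i - 1] = h[i - 1]
        A[i, i] = 2.0 * (h[i - 1] + h[i])
        A[i, i + 1] = h[i]
        rhs[i] = 6.0 * ((y[i + 1] - y[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1])
    M = np.linalg.solve(A, rhs)
    # Interval index k with x[k] <= t <= x[k+1] for each new point t.
    k = np.clip(np.searchsorted(x, x_new) - 1, 0, n - 1)
    hk, a, b = h[k], x[k + 1] - x_new, x_new - x[k]
    return ((M[k] * a**3 + M[k + 1] * b**3) / (6.0 * hk)
            + (y[k] / hk - M[k] * hk / 6.0) * a
            + (y[k + 1] / hk - M[k + 1] * hk / 6.0) * b)

# Database grid: 350-790 nm at 5 nm (from the text).
db_wl = np.arange(350.0, 790.0 + 1e-9, 5.0)
# Assumed HICO-like grid: 68 bands from 405 nm at 5.73 nm spacing (~405-789 nm).
hico_wl = 405.0 + 5.73 * np.arange(68)

db_spectrum = 0.004 * np.exp(-((db_wl - 550.0) / 120.0) ** 2)   # made-up Rrs
resampled = natural_cubic_spline(db_wl, db_spectrum, hico_wl)
print(resampled.shape)  # (68,)
```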

Table 1. Overview of the range / values for the parameters used in the database search. Some values have been approximated for brevity.


The next step was to run the CWST IOP inversion for each of the 52,000 noisy spectra, once under each metric. To run the inversion, we performed a brute-force search for each input spectrum, calculating the distance between the input and every database spectrum and finding the minimum. The parameters corresponding to the database spectrum with the minimum spectral distance from the input spectrum were then assigned to the input spectrum, and the retrieved parameters were compared to the original, noise-free parameters. No attempt was made to optimize the search, and all calculations were done in the most naïve way possible. Due to the additional matrix-vector product, the Mahalanobis distance search took approximately twice as long; however, on a modern desktop, both versions run quickly (on average, the $L_{2}$ search took approximately 0.15 seconds per input spectrum, while the MD search took approximately 0.30 seconds).
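A vectorized version of the brute-force search, assuming a diagonal covariance so that the Mahalanobis distance reduces to Eq. (4). It expands $\sum_i w_i (x_i - y_i)^2$ into three matrix products so that all input/database pairs are scored at once. The function name and batching are illustrative, not the authors' implementation, and the expansion can lose a little precision when spectra are nearly identical.

```python
import numpy as np

def brute_force_match(inputs, database, var=None):
    """Index of the closest database spectrum for each input spectrum.

    inputs:   (k, n) array of noisy spectra.
    database: (m, n) array of modeled spectra on the same wavelength grid.
    var:      None for the squared L2 distance (Eq. 1), or an (n,) array of
              per-band variances for the diagonal Mahalanobis distance (Eq. 4).
    """
    n_bands = database.shape[1]
    w = np.ones(n_bands) if var is None else 1.0 / np.asarray(var, dtype=float)
    # sum_i w_i (x_i - y_i)^2 = sum w x^2 - 2 sum w x y + sum w y^2, for all pairs.
    x_term = (inputs**2) @ w                  # (k,)
    y_term = (database**2) @ w                # (m,)
    cross = (inputs * w) @ database.T         # (k, m)
    d = x_term[:, None] - 2.0 * cross + y_term[None, :]
    return np.argmin(d, axis=1)

rng = np.random.default_rng(7)
database = rng.random((1000, 68))               # made-up modeled spectra
var = rng.uniform(0.5, 2.0, 68) * 1e-4          # made-up per-band noise variances
truth = np.array([10, 500, 999])
inputs = database[truth] + rng.standard_normal((3, 68)) * np.sqrt(var)
print(brute_force_match(inputs, database))        # L2 matches
print(brute_force_match(inputs, database, var))   # MD matches
```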

The main results are summarized in columns 4 and 5 of Table 2, which give a simple count of how many of each run of 1000 noisy spectra were retrieved exactly, that is, the retrieved values of all parameters were the same as those of the noise-free input. As can be seen from the results, the Mahalanobis distance outperforms the standard $L_{2}$ distance for each of the 52 input spectra. The improvement (column 6, the relative increase in the number of correct retrievals) ranged from 2 to 50%, with an average of approximately 20%. Somewhat surprisingly, whenever the retrieved concentrations of chl-a, CDOM (expressed in terms of the absorption coefficient at 440 nm), and SPM were correct, the other three free parameters (sediment type and both phase functions) were also correct. It follows that the ‘number correct’ column is the same whether we compare only chl-a, CDOM and SPM or all six parameters.
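The improvement column can be reproduced from the two count columns. Assuming it is the relative increase in the number of correct retrievals (the values below are consistent with that reading), a few rows of Table 2 give:

```python
# (No. correct with L2, No. correct with MD) for a few rows of Table 2.
rows = [(518, 688), (255, 383), (505, 515), (441, 652)]

def improvement_pct(n_l2, n_md):
    """Relative increase in the number of exactly correct retrievals, in percent."""
    return 100.0 * (n_md - n_l2) / n_l2

print([round(improvement_pct(a, b), 1) for a, b in rows])  # [32.8, 50.2, 2.0, 47.8]
```

These include the smallest (2.0%) and largest (50.2%) improvements in the table.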

Table 2. IOP characterizations and experimental results for the 52 noise-free spectra. Cols. 1–3 are the input IOPs; Cols. 4–6 describe the retrieval results (out of 1000 noisy realizations); Cols. 7–12 are the average retrieved IOP errors (see text for definition). Note that the number of correct retrievals is strongly affected by the discretization of the various parameters in the lookup table (Table 1); as a result, a direct comparison among various levels is difficult.


The average relative errors for the retrieved chl-a, CDOM and SPM values are shown in columns 7–12 of Table 2. The reported values are of the form

$$100 \times \frac{\text{true value} - \text{avg. retrieved value}}{\text{true value}}.$$

The reported relative error is the average error over all 1000 samples; negative errors mean that the average retrieved value was higher than the actual input. In general, the average of the MD-retrieved values was slightly closer to the true value, although in several instances the $L_{2}$ values were better.
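A small worked example of the error definition, using a hypothetical set of retrievals on the 0.5 g m⁻³ SPM grid of Table 1. It also illustrates the discretization bias discussed next: even with 85% exact hits, a few one-step errors shift the average.

```python
import numpy as np

def avg_relative_error_pct(true_value, retrieved):
    """Average relative error in percent: 100 * (true - mean(retrieved)) / true.

    Negative values mean the retrievals are, on average, too high.
    """
    return 100.0 * (true_value - np.mean(retrieved)) / true_value

# Hypothetical SPM retrievals for a true value of 2.0 g m^-3 on a LUT grid with
# 0.5 g m^-3 steps: 850 exact hits and 150 retrievals one step too high.
retrieved = [2.0] * 850 + [2.5] * 150
print(round(avg_relative_error_pct(2.0, retrieved), 2))  # -3.75
```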

In each case, it is important to note that the average value will be biased by the discrete nature of the LUT: when constructing the database, a ‘step size’ must be chosen for each parameter. The correct step size to use is itself a research topic [17]; it is also important to remember that, since the retrieval is done for all parameters simultaneously, the discretization grid of any one parameter affects the retrieval of the others as well. A more complete analysis would include an examination of the distribution of each parameter in each input ‘bin’, as shown in Fig. 2. In general, the overall pattern was similar for each parameter: the majority of the retrieved values exactly matched the true value, and the remaining values were within one or two steps on either side (with occasional outliers).

Fig. 2 Distribution of the retrieved chl-a (left), CDOM (center) and SPM (right) values for the $L_{2}$ (blue) and Mahalanobis (red) distances. The true values (indicated by arrow) are 3.0 mg m⁻³, 0.3 m⁻¹, and 2.0 g m⁻³, respectively.


3.2 Theoretical justification

To understand why the Mahalanobis distance outperforms $L_{2}$, we examined how the noisy data and database spectra are distributed under these two metrics.

From Sec. 2.2, it is reasonable to expect that the generated noisy spectra $X = \{x_{1}, x_{2}, \dots, x_{1000}\}$ follow, to a very good approximation, a multivariate Gaussian distribution with mean $μ = ρ$ equal to the noise-free input, and diagonal covariance matrix $Σ$.

To test this, we calculated the distribution of the distances $d(x_{i}, ρ)$ between the noise-free spectrum $ρ$ and the noisy spectra $x_{i}$ for both the Mahalanobis and $L_{2}$ metrics, and compared the experimental results with the expected theoretical distributions.

If the data are Gaussian, then it can be shown [18] that the MD reduces to a sum of $n$ squared independent standard normal variables and therefore has a chi-squared distribution with $n$ degrees of freedom, where $n$ is the number of wavelengths.

Similarly, under the same assumptions, the $L_{2}$ distance of Eq. (1) can be rewritten as

$$d(x, ρ) = \sum_{i=1}^{n} σ_{i}^{2} \left( \frac{(x_{i} - ρ_{i})^{2}}{σ_{i}^{2}} \right)$$

which is a weighted sum of chi-squared variables, each with one degree of freedom. By the Lyapunov version of the central limit theorem [19], this is approximately Gaussian, with mean

$$μ_{d} = σ_{1}^{2} + \dots + σ_{n}^{2}$$

and variance

$$σ_{d}^{2} = 2 (σ_{1}^{4} + \dots + σ_{n}^{4}).$$
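The two theoretical distributions can be checked by simulation. The sketch below draws Gaussian noise with made-up, band-dependent standard deviations and compares the sample mean and variance of each distance with the values predicted above.

```python
import numpy as np

rng = np.random.default_rng(6)
n_bands, n_samples = 68, 20_000
rho = np.linspace(0.004, 0.001, n_bands)          # made-up noise-free spectrum
sigma = np.linspace(2e-4, 5e-5, n_bands)          # made-up per-band noise std
x = rho + rng.standard_normal((n_samples, n_bands)) * sigma

diff = x - rho
d_l2 = np.sum(diff**2, axis=1)                    # Eq. (1)
d_md = np.sum(diff**2 / sigma**2, axis=1)         # Eq. (4) with the true variances

# MD: chi-squared with n degrees of freedom, so mean n and variance 2n.
md_ratios = (d_md.mean() / n_bands, d_md.var() / (2 * n_bands))
# L2: approximately normal with mean sum(sigma^2) and variance 2 * sum(sigma^4).
l2_ratios = (d_l2.mean() / np.sum(sigma**2), d_l2.var() / (2 * np.sum(sigma**4)))
print(md_ratios, l2_ratios)   # all four ratios should be close to 1
```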

An example of the actual and fitted distributions for one of the input spectra is shown in Fig. 3. Each of the other 51 noise-free spectra leads to fits that are qualitatively similar to the example shown; although not conclusive, this suggests that modeling the noisy data as multivariate Gaussian is reasonable.

Fig. 3 Distribution of $L_{2}$ (left) and Mahalanobis (right) distances from the first noise-free spectrum to its 1000 noisy realizations. The red bars are the actual data; the lines represent the fitted normal and chi-squared distributions, respectively.


An alternative, more geometric view of the two distances can be obtained by examining how the data are distributed in the spectral space. In general, the data have a larger ‘spread’ along noisier bands, as shown in Fig. 4; as a result, the set of noisy data is closer to an ellipsoid than to a sphere. From Eq. (4), a fixed value of the Mahalanobis distance defines an ellipsoid whose semi-axes are proportional to the band standard deviations $σ_{i}$; in contrast, a fixed $L_{2}$ distance defines a sphere that extends equally along each band, as shown in Fig. 4. Intuitively, the MD models the true spread of the noisy data within the spectral space much better.

Fig. 4 Two-band scattergram of the noisy (left) and database (right) spectra. The red curve represents the 3-sigma range of the $L_{2}$ distance; the black dotted curve is the 3-sigma range for MD. The bands shown are 422 and 571 nm.


With this in mind, it is useful to examine how many of the database spectra lie within the same ‘range’ of the noise-free spectrum as the noisy spectra. As shown above, the distances from the noise-free spectrum to the noisy spectra can be modeled with a chi-squared (MD) or normal ($L_{2}$) distribution; from this, we can estimate the range within which a given percentage of the noisy data should fall. For example, under the normal distribution, approximately 84% of the data are less than one standard deviation above the mean, and about 99% are less than $μ + 2.33σ$. The corresponding ranges for a chi-squared distribution with a given number of degrees of freedom can be found in tables or computed with software implementations of its cumulative distribution function. In Table 3, we show the number of noisy and database spectra that lie within the 84% and 99% ranges of the noisy data; an example is shown in Fig. 4 (spectral space) and Fig. 5 (distance). In each case, the number of ‘incorrect’ database spectra within the noise range drops significantly under MD; intuitively, this implies that the noisy data are ‘closer’ to the true noise-free spectrum under MD than under $L_{2}$, and that a noisy spectrum is therefore more likely to be matched to the correct database spectrum.
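The range calculation can be sketched with Python's standard `statistics.NormalDist`. For the chi-squared ranges it swaps in the Wilson-Hilferty approximation instead of tables or an exact cumulative distribution function; that substitution and the helper names are assumptions for illustration.

```python
import math
from statistics import NormalDist

import numpy as np

def normal_range(mean, sd, z):
    """Upper edge of the L2 'noise range': mean + z standard deviations."""
    return mean + z * sd

def chi2_quantile_wh(p, k):
    """Approximate chi-squared quantile (k degrees of freedom), Wilson-Hilferty."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * k)
    return k * (1.0 - c + z * math.sqrt(c)) ** 3

def count_within(distances, upper):
    """Number of spectra whose distance to the noise-free spectrum is <= upper."""
    return int(np.count_nonzero(np.asarray(distances) <= upper))

# Fractions of a normal distribution below mean + 1 sd and mean + 2.33 sd.
print(round(NormalDist().cdf(1.0), 4), round(NormalDist().cdf(2.33), 4))  # 0.8413 0.9901

# Matching MD ranges for 68 bands, using the same probabilities.
k = 68
for z in (1.0, 2.33):
    p = NormalDist().cdf(z)
    print(z, round(chi2_quantile_wh(p, k), 1))
```

In use, `count_within` would be applied to the distances from the noise-free spectrum to each noisy spectrum and to each database spectrum, once per metric.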

Table 3. Distribution of the noisy and database spectra within a given range of the noise-free input spectrum.


Fig. 5 Distribution of the $L_{2}$ (left) and Mahalanobis (right) distances in the noise range. The dotted line represents the (fitted) distribution of the noisy data; the bars represent the number of database spectra at that distance. Note that the number of database spectra within the noise range is significantly smaller for the Mahalanobis distance.


4. Summary and future work

Noise, including sensor and shot noise, is inescapable in real-world data, and has been shown previously [13] to strongly affect the accuracy of retrieved in-water constituents. In this work we have shown that statistical modeling, including the use of the Mahalanobis distance to compare spectra, can partially offset this effect and lead to improved retrievals.

Realistic sensor models were used to simulate noisy data; in particular, extensive simulations were required to derive the (full) covariance matrix needed to use the Mahalanobis distance. In order to reduce the computational complexity, a more efficient method of estimating the (diagonal) covariance is needed; we have recently begun an examination of this problem and expect to present these results in the near future. We also note that an accurate description of the noise levels and how they affect retrievals may be used to help guide future sensor design.

References and links

1. A. Morel and L. Prieur, “Analysis of variations in ocean color,” Limnol. Oceanogr. 22(4), 709–722 (1977).

2. R. P. Bukata, J. H. Jerome, K. Y. Kondratyev, and D. V. Pozdnyakov, Optical Properties and Remote Sensing of Inland and Coastal Waters (CRC Press, 1995).

3. K. L. Carder, R. G. Steward, G. R. Harvey, and P. B. Ortner, “Marine humic and fulvic acids: their effects on remote sensing of ocean chlorophyll,” Limnol. Oceanogr. 34(1), 68–81 (1989).

4. A. Gitelson, “The peak near 700 nm on radiance spectra of algae and water - relationships of its magnitude and position with chlorophyll concentration,” Int. J. Remote Sens. 13(17), 3367–3373 (1992).

5. G. Dall’Olmo and A. A. Gitelson, “Effect of bio-optical parameter variability on the remote estimation of chlorophyll-a concentration in turbid productive waters: experimental results,” Appl. Opt. 44(3), 412–422 (2005).

6. Y. Z. Yacobi, W. J. Moses, S. Kaganovsky, B. Sulimani, B. C. Leavitt, and A. A. Gitelson, “NIR-red reflectance-based algorithms for chlorophyll-a estimation in mesotrophic inland and coastal waters: Lake Kinneret case study,” Water Res. 45(7), 2428–2436 (2011).

7. C. Le, Y. Li, Y. Zha, D. Sun, C. Huang, and H. Lu, “A four-band semi-analytical model for estimating chlorophyll a in highly turbid lakes: the case of Taihu Lake, China,” Remote Sens. Environ. 113(6), 1175–1182 (2009).

8. W. Yang, B. Matsushita, J. Chen, T. Fukushima, and R. Ma, “An enhanced three-band index for estimating chlorophyll-a in turbid case-II waters: case studies of Lake Kasumigaura, Japan, and Lake Dianchi, China,” IEEE Geosci. Remote Sens. Lett. 7(4), 655–659 (2010).

9. C. D. Mobley, L. K. Sundman, C. O. Davis, J. H. Bowles, T. V. Downes, R. A. Leathers, M. J. Montes, W. P. Bissett, D. D. R. Kohler, R. P. Reid, E. M. Louchard, and A. Gleason, “Interpretation of hyperspectral remote-sensing imagery by spectrum matching and look-up tables,” Appl. Opt. 44(17), 3576–3592 (2005).

10. C. D. Mobley, “A numerical model for the computation of radiance distributions in natural waters with wind roughened surfaces,” Limnol. Oceanogr. 34(8), 1473–1483 (1989).

11. C. D. Mobley and L. K. Sundman, Hydrolight 5 Ecolight 5 technical documentation, 1st Ed., (Sequoia Scientific Inc., 2008).

12. R. L. Lucke, M. Corson, N. R. McGlothlin, S. D. Butcher, D. L. Wood, D. R. Korwan, R. R. Li, W. A. Snyder, C. O. Davis, and D. T. Chen, “Hyperspectral Imager for the Coastal Ocean: instrument description and first images,” Appl. Opt. 50(11), 1501–1516 (2011).

13. W. J. Moses, J. H. Bowles, R. L. Lucke, and M. R. Corson, “Impact of signal-to-noise ratio in a hyperspectral sensor on the accuracy of biophysical parameter estimation in case II waters,” Opt. Express 20(4), 4309–4330 (2012).

14. B. C. Gao, M. J. Montes, Z. Ahmad, and C. O. Davis, “Atmospheric correction algorithm for hyperspectral remote sensing of ocean color from space,” Appl. Opt. 39(6), 887–896 (2000).

15. M. J. Montes, B. C. Gao, and C. O. Davis, “A new algorithm for atmospheric correction of hyperspectral remote sensing data,” Proc. SPIE, Geo-Spatial Image and Data Exploitation II, W. E. Roper (ed.), 4383: 23–30 (2001).

16. P. C. Mahalanobis, “On the generalized distance in statistics,” Proc. Natl. Inst. Sci. India 2(1), 49–55 (1936).

17. J. Hedley, C. Roelfsema, and S. Phinn, “Efficient radiative transfer model inversion for remote sensing applications,” Remote Sens. Environ. 113(11), 2527–2532 (2009).

18. K. V. Mardia, J. T. Kent, and J. M. Bibby, Multivariate Analysis (Academic Press, 2003).

19. P. Billingsley, Probability and Measure, Third Ed. (John Wiley & Sons, 1995).

Parameter	Range	% of database
chl-a (mg m⁻³)	0, 0.05, 0.1, …, 0.95	10.1
	1, 1.1, …, 2	7.5
	2, 3, 4, 5, …, 9	36.5
	10, 11, …, 19	27.8
	20, 25, …, 60	18.1
CDOM (m⁻¹)	0, 0.01, …, 0.19	13.4
	0.2, 0.3, …, 0.9	43.3
	1.0, 1.1, …, 1.9	31.3
	2, 2.5, …, 4	12.0
SPM (g m⁻³)	0, 0.1, …, 0.9	12.3
	1, 1.5, …, 9.5	53.3
	10, 10.5, …, 20	34.3
Sediment	None	4.7
	Yellow clay	2.7
	Red clay	52.1
	Calcareous sand	22.5
	Bukata	3.6
	Brown Earth	14.4
Phase Function	bb = 0.004	6.9
	bb = 0.008	5.6
	bb = 0.012	17.8
	bb = 0.015	10.5
	bb = 0.02	18.6
	bb = 0.016	18.6
	bb = 0.024	19.6

Chl-a (mg m⁻³)	CDOM (m⁻¹)	SPM (g m⁻³)	No. correct L₂	No. correct MD	Imp. (%)	Chl-a error L₂ (%)	Chl-a error MD (%)	CDOM error L₂ (%)	CDOM error MD (%)	SPM error L₂ (%)	SPM error MD (%)
3	0.3	2	518	688	32.8	5.0	2.3	3.3	3.3	11.5	7.0
3	0.3	6	782	866	10.7	0.0	−1.0	0.0	0.0	3.3	1.2
3	0.3	10	857	889	3.7	−1.7	−1.3	0.0	0.0	0.5	0.1
3	0.3	16	889	913	2.7	−1.7	−0.7	0.0	0.0	0.2	0.0
3	1	2	255	383	50.2	−1.3	1.3	6.0	2.0	−2.5	9.0
3	1	6	439	615	40.1	−2.7	−1.7	7.0	3.0	9.7	6.7
3	1	10	447	530	18.6	−3.0	−2.7	3.0	3.0	6.2	3.7
3	1	16	650	733	12.8	−2.3	−1.3	−3.0	−1.0	2.4	0.7
3	1.6	2	315	408	29.5	−3.0	0.0	8.8	5.6	−34.5	−19.5
3	1.6	6	483	610	26.3	−1.3	−0.7	5.0	0.0	1.2	4.8
3	1.6	10	428	527	23.1	−6.3	−4.7	4.4	2.5	3.7	2.0
3	1.6	16	514	583	13.4	−3.7	−2.3	−1.9	−1.3	2.6	1.4
10	0.3	2	441	652	47.8	6.0	3.1	0.0	0.0	11.5	6.0
10	0.3	6	504	602	19.4	2.3	1.0	6.7	3.3	4.2	1.8
10	0.3	10	576	621	7.8	0.0	−0.1	−3.3	0.0	1.3	0.3
10	0.3	16	671	700	4.3	−0.4	−0.1	0.0	0.0	0.2	0.1
10	1	2	318	410	28.9	3.8	3.6	2.0	0.0	16.0	18.5
10	1	6	322	466	44.7	1.8	0.9	4.0	1.0	7.7	4.5
10	1	10	223	307	37.7	0.4	1.1	0.0	0.0	7.6	4.8
10	1	16	455	487	7.0	0.0	0.1	−2.0	0.0	1.4	0.4
10	1.6	2	223	293	31.4	3.3	3.0	5.0	0.0	−14.0	11.0
10	1.6	6	304	394	29.6	3.8	2.8	2.5	−0.6	4.2	6.2
10	1.6	10	207	301	45.4	0.4	0.6	2.5	−0.6	6.1	4.4
10	1.6	16	399	410	2.8	0.0	0.0	−1.3	−0.6	1.2	0.6
18	0.3	2	505	609	20.6	2.9	1.2	−10.0	−3.3	10.0	4.5
18	0.3	6	366	428	16.9	1.8	0.9	−6.7	−3.3	3.2	1.0
18	0.3	10	505	515	2.0	0.1	0.2	0.0	0.0	0.1	0.0
18	0.3	16	546	561	2.7	0.0	0.2	0.0	0.0	0.2	0.0
18	1	2	347	442	27.4	5.8	4.5	−5.0	−4.0	17.5	16.5
18	1	6	244	285	16.8	4.3	3.8	−6.0	−5.0	9.0	7.2
18	1	10	231	268	16.0	1.9	1.4	−2.0	−1.0	2.2	0.9
18	1	16	347	372	7.2	−0.1	−0.2	−3.0	−1.0	2.1	0.6
18	1.6	2	245	301	22.9	6.4	4.9	−3.7	−3.1	9.5	17.0
18	1.6	6	250	298	19.2	4.3	3.5	−3.7	−3.1	10.5	8.5
18	1.6	10	218	266	22.0	2.4	2.7	−3.1	−1.9	5.2	3.7
18	1.6	16	262	283	8.0	0.6	0.7	−2.5	−1.9	3.6	2.4
45	0.3	2	820	943	15.0	0.8	0.4	−6.7	−3.3	8.5	2.0
45	0.3	6	737	860	16.7	0.6	0.5	−6.7	−3.3	4.2	1.0
45	0.3	10	856	928	8.4	−0.1	−0.1	−3.3	0.0	1.5	0.3
45	0.3	16	920	975	6.0	−0.1	0.0	−3.3	0.0	0.6	0.1
45	1	2	631	775	22.8	1.6	1.6	−4.0	−3.0	14.0	11.0
45	1	6	604	712	17.9	1.6	1.6	−5.0	−3.0	7.5	5.2
45	1	10	607	694	14.3	0.8	1.0	−3.0	−1.0	2.4	1.1
45	1	16	792	862	8.8	−0.1	0.0	−1.0	0.0	0.7	0.2
45	1.6	2	504	577	14.5	3.5	3.0	−3.1	−2.5	17.5	16.0
45	1.6	6	592	680	14.9	2.1	1.8	−3.1	−2.5	9.7	7.8
45	1.6	10	574	618	7.7	1.4	1.2	−2.5	−1.9	4.4	3.5
45	1.6	16	669	711	6.3	0.8	0.8	−1.9	−0.6	2.3	1.3
0.5	0.2	1	448	649	44.9	−14.0	−4.0	−5.0	−5.0	10.0	9.0
0.5	0.2	2	695	817	17.6	−20.0	−10.0	0.0	0.0	10.0	7.0
0.5	0.5	1	198	287	44.9	−26.0	−4.0	−2.0	−6.0	−31.0	−26.0
0.5	0.5	2	522	657	25.9	−30.0	−20.0	2.0	−2.0	2.0	5.0

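Chl-a (mg m⁻³)	CDOM (m⁻¹)	SPM (g m⁻³)	Noisy in range (1 sigma, L₂)	Noisy in range (1 sigma, MD)	Noisy in range (2.33 sigma, L₂)	Noisy in range (2.33 sigma, MD)	Database in range (1 sigma, L₂)	Database in range (1 sigma, MD)	Database in range (2.33 sigma, L₂)	Database in range (2.33 sigma, MD)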
3	0.3	2	866	846	970	988	244	102	378	149
3	0.3	6	858	838	970	982	65	34	99	47
3	0.3	10	848	848	974	979	66	28	118	37
3	0.3	16	849	840	977	988	67	25	104	39
3	1	2	847	842	975	988	1044	434	1439	554
3	1	6	859	838	971	981	373	158	519	201
3	1	10	874	842	978	991	330	150	469	202
3	1	16	851	852	977	980	201	78	289	121
3	1.6	2	857	834	979	988	1218	536	1661	682
3	1.6	6	855	841	978	986	437	191	633	253
3	1.6	10	852	836	977	987	296	121	396	147
3	1.6	16	852	832	979	986	192	82	275	100
10	0.3	2	851	832	977	986	155	83	238	111
10	0.3	6	850	853	976	985	121	61	170	86
10	0.3	10	848	840	977	987	126	59	193	88
10	0.3	16	845	840	976	987	112	55	169	86
10	1	2	857	842	971	980	823	369	1143	494
10	1	6	850	838	972	986	490	204	716	261
10	1	10	853	854	977	984	427	205	615	250
10	1	16	853	834	977	987	318	140	444	189
10	1.6	2	848	841	978	984	1089	556	1399	700
10	1.6	6	846	849	981	986	612	291	806	346
10	1.6	10	847	840	970	987	449	222	590	269
10	1.6	16	853	842	972	988	342	165	445	200
18	0.3	2	853	837	973	987	106	48	155	66
18	0.3	6	845	848	976	986	88	55	123	70
18	0.3	10	849	852	973	982	92	30	150	47
18	0.3	16	846	841	973	987	87	54	121	68
18	1	2	850	847	976	979	689	251	989	316
18	1	6	841	839	979	979	399	151	551	214
18	1	10	846	843	976	986	316	129	461	194
18	1	16	849	841	976	983	243	126	369	160
18	1.6	2	869	842	972	985	950	471	1212	568
18	1.6	6	860	850	974	982	532	242	706	301
18	1.6	10	857	841	975	984	413	184	536	233
18	1.6	16	861	837	969	987	313	137	418	179
45	0.3	2	855	856	974	978	51	27	71	35
45	0.3	6	856	841	971	982	57	33	91	42
45	0.3	10	847	829	974	985	71	38	91	50
45	0.3	16	836	852	976	984	70	42	107	52
45	1	2	839	849	980	988	186	77	290	110
45	1	6	849	844	974	980	188	79	274	104
45	1	10	854	841	971	987	171	84	243	106
45	1	16	854	841	972	984	168	74	231	93
45	1.6	2	856	840	967	987	411	145	613	189
45	1.6	6	851	826	973	993	272	119	389	151
45	1.6	10	853	849	978	991	201	99	276	127
45	1.6	16	866	836	972	983	175	80	230	104
0.5	0.2	1	844	859	980	981	508	259	699	336
0.5	0.2	2	848	839	972	985	355	169	520	227
0.5	0.5	1	846	842	974	985	2316	958	3214	1222
0.5	0.5	2	855	860	971	988	1544	624	2145	783
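In this table the noisy-spectra counts stay near 850 (1 sigma) and 975 (2.33 sigma) for every parameter set, while the database counts vary widely and are consistently lower for MD than for L₂. For reference, a standard normal variable lies below 1σ about 84% of the time and below 2.33σ about 99% of the time. The sketch below illustrates how such counts could arise. It assumes two things not stated in this section: that MD is a Mahalanobis-type distance with a diagonal (band-independent) noise covariance, and that "in range" means within mean + k·std of the distances from the noise-free spectrum to its noisy replicates. The helper names and the toy two-band data are hypothetical.

```python
import math
import statistics
from typing import Callable, Sequence

Spectrum = Sequence[float]


def l2_distance(x: Spectrum, y: Spectrum) -> float:
    """Euclidean (L2) distance between two spectra sampled on the same bands."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def mahalanobis_diag(x: Spectrum, y: Spectrum, noise_var: Sequence[float]) -> float:
    """Mahalanobis distance for independent per-band noise (diagonal covariance):
    each squared band difference is divided by that band's noise variance."""
    return math.sqrt(sum((a - b) ** 2 / v for a, b, v in zip(x, y, noise_var)))


def count_in_range(
    reference: Spectrum,
    noisy: Sequence[Spectrum],
    database: Sequence[Spectrum],
    dist: Callable[[Spectrum, Spectrum], float],
    k: float,
) -> tuple[int, int]:
    """Hypothetical 'in range' rule: threshold = mean + k * std of the distances
    from the noise-free reference to its noisy replicates.
    Returns (noisy replicates in range, database spectra in range)."""
    d_noisy = [dist(reference, s) for s in noisy]
    threshold = statistics.mean(d_noisy) + k * statistics.stdev(d_noisy)
    n_noisy = sum(d <= threshold for d in d_noisy)
    n_db = sum(dist(reference, s) <= threshold for s in database)
    return n_noisy, n_db


# Toy two-band example: band 0 is noisy (variance 1.0), band 1 is quiet (variance 0.25).
reference = [0.0, 0.0]
noise_var = [1.0, 0.25]
noisy = [[1.0, 0.0], [-1.0, 0.0], [0.0, 0.5], [0.0, -0.5]]  # one-sigma excursions
database = [[0.0, 1.0], [0.9, 0.0], [1.5, 0.0]]


def md(x: Spectrum, y: Spectrum) -> float:
    """Mahalanobis-type distance bound to this example's noise variances."""
    return mahalanobis_diag(x, y, noise_var)


print("L2:", count_in_range(reference, noisy, database, l2_distance, k=1.0))  # L2: (4, 2)
print("MD:", count_in_range(reference, noisy, database, md, k=1.0))           # MD: (4, 1)
```

The database spectrum [0.0, 1.0] differs from the reference only in the quiet band. L₂ treats that difference like one in the noisy band and counts it as in range, whereas the variance-weighted distance scales it up and rejects it. This is the kind of behavior that would lower the database counts under MD.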

Parameter	Range	% of database
chl-a	0, 0.05, 0.1, …, 0.95	10.1
	1, 1.1, …, 2	7.5
	2, 3, 4, 5, …, 9	36.5
	10, 11, …, 19	27.8
	20, 25, …, 60	18.1
CDOM	0, 0.01, …, 0.19	13.4
	0.2, 0.3, …, 0.9	43.3
	1.0, 1.1, …, 1.9	31.3
	2, 2.5, …, 4	12.0
SPM	0, 0.1, …, 0.9	12.3
	1, 1.5, …, 9.5	53.3
	10, 10.5, …, 20	34.3
Sediment	None	4.7
	Yellow clay	2.7
	Red clay	52.1
	Calcareous sand	22.5
	Bukata	3.6
	Brown Earth	14.4
Phase Function	bb = 0.004	6.9
	bb = 0.008	5.6
	bb = 0.012	17.8
	bb = 0.015	10.5
	bb = 0.02	18.6
	bb = 0.016	18.6
	bb = 0.024	19.6
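Each database entry combines one value of each parameter listed above (chl-a, CDOM, SPM, sediment type, phase function) with its modeled spectrum. In its simplest nearest-match form, a LUT retrieval returns the parameters of the entry whose spectrum is closest to the measured spectrum under the chosen metric (L₂ or MD in the tables). The sketch below assumes that nearest-match form. `LutEntry`, `retrieve`, and the toy reflectance values are hypothetical and are not the authors' code.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class LutEntry:
    """One hypothetical database record: model inputs plus the modeled spectrum."""
    chl_a: float
    cdom: float
    spm: float
    sediment: str
    phase_bb: float  # the 'bb' label of the phase function in the table
    spectrum: tuple[float, ...]


def l2_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Euclidean distance between two spectra on the same bands."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def retrieve(
    measured: Sequence[float],
    lut: Sequence[LutEntry],
    dist: Callable[[Sequence[float], Sequence[float]], float] = l2_distance,
) -> LutEntry:
    """Return the LUT entry whose spectrum is closest to the measurement."""
    if not lut:
        raise ValueError("empty look-up table")
    return min(lut, key=lambda entry: dist(measured, entry.spectrum))


# Toy three-band LUT; the reflectance values are made up for illustration.
lut = [
    LutEntry(3.0, 0.3, 2.0, "Red clay", 0.012, (0.010, 0.012, 0.008)),
    LutEntry(10.0, 1.0, 6.0, "Calcareous sand", 0.02, (0.006, 0.009, 0.011)),
]
best = retrieve((0.0095, 0.0118, 0.0083), lut)
print(best.chl_a, best.cdom, best.spm, best.sediment)  # 3.0 0.3 2.0 Red clay
```

Passing a different `dist` (for example, a variance-weighted distance) changes only the matching criterion and leaves the database untouched. That is what makes a side-by-side comparison of L₂ and MD on the same LUT possible.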


Optics Express