Optica Publishing Group

Robustness of analyses of imaging data


Abstract

Successful classifications of reflectance and vibrational data are to a large extent dependent upon robustness of input data. In this study, a well-known geostatistical approach, variogram analysis, was described and its robustness was assessed through comprehensive evaluation of 3,200 variogram settings. High-resolution hyperspectral imaging data were acquired from greenhouse maize plants, and the robustness (radiometric repeatability) of three variogram parameters (nugget, sill, and range) was examined when generated from imaging data collected from two different sets of plants and with imaging data collected on seven different days in two years. Robustness of variogram parameters was compared with average reflectance values in six spectral bands, three standard vegetation indices (NDVI, SI, and PRI), and PCA scores from principal component analysis.

©2011 Optical Society of America

1. Introduction

There is widespread and growing use of imaging technology as part of quality control systems in agriculture, food processing, pharmaceutical industries, and medical research. In many of these applications, the objective is to detect or quantify subtle differences in otherwise homogeneous target objects. For instance, an image-based system may be used to: 1) detect tumors in chicken carcasses [1], 2) quantify surface residues and contaminants [2–6], 3) evaluate cereal kernel traits [7–9], or 4) assess purity or quality of pharmaceutical products [10–12]. A common denominator in these high-resolution imaging applications is that the pixels are much smaller than the target objects, so that many (in some cases >1,000) reflectance or vibrational spectra are acquired from each target object, one from each pixel in a two-dimensional grid (x- and y-coordinates). Furthermore, the classification purpose is commonly not to identify classes within an image but rather to classify and quantify similarity/dissimilarity among images. Consequently, many different images are typically acquired on different days, and despite efforts to calibrate and adjust for subtle variations in environmental conditions, such variation may lead to significant between-day variation in reflectance data and/or low radiometric repeatability [13]. Low radiometric repeatability of input data decreases the likelihood of successfully distinguishing treatment classes and of conducting independent validation [9, 13–16]. Robustness of input data is of particular concern when imaging data are acquired as time series from organic (living) objects, because the reflectance signal is not static but a “moving baseline”: the difference in imaging data between treatment groups varies over time due to complex interactions among growth of the target object (e.g., a plant), environmental conditions, and imposed stressor(s).

In this study, “robustness” (radiometric repeatability) is defined as the similarity of reflectance data (405-907 nm) over time and across hybrids when acquired from multiple maize plants (Zea mays L.) grown under the same experimental conditions. The robustness of three variogram parameters (nugget, sill, and range) was examined in response to combinations of variogram settings as part of identifying the most robust variogram analysis for high-resolution analysis of reflectance data. Robustness of variogram parameters was compared with average reflectance values in six spectral bands (R532, R570, R693, R706, R759, and R786 nm) and three standard vegetation indices [Normalized Difference Vegetation Index, NDVI = (R750 - R705) / (R750 + R705), Stress index, SI = (R693 / R759), and Photochemical reflectance index, PRI = (R531 – R570) / (R531 + R570)]. Because variogram analysis places considerable demand on computer processing, the effect of input data reduction on variogram parameter estimates was also evaluated.

2. Experimental details

2.1. Hyperspectral imaging data

Reflectance data were acquired from two maize hybrids (Hybrid 1: Triumph 1416, 2004 and Hybrid 2: Pioneer 3223, 2006), which were planted in October 2007 (data set 1) and February 2008 (data set 2). For both data sets, individual maize plants were grown in 11-L plastic pots under an optimum water regime, and one hyperspectral image was acquired from the middle portion of the 7th or 8th leaf of each plant. Imaging data from data set 1 were acquired on three different days within a seven-day period (N = 18 hyperspectral images), and imaging data from data set 2 were acquired on four different days within a 10-day period (N = 22 hyperspectral images). The image acquisition system consisted of a push-broom line-scanning hyperspectral camera with 640 sensors in a linear array (PIKA II, www.resonon.com) and a wavelength range from 405 to 907 nm. Imaging data were acquired inside the greenhouse used for rearing the maize plants, and each imaged leaf was held in a horizontal position by rubber bands (Fig. 1). Sunlight was the sole light source. Dark calibration was conducted at the beginning of the data acquisition, and white calibration (using white Teflon) was conducted immediately before each image acquisition to account for subtle changes in light conditions. The experimental unit for hyperspectral imaging was a 6 cm long and 2.5 cm wide (15 cm²) mid portion of a maize leaf, and the original spatial resolution of each hyperspectral image was 640 × 250 (160,000) hyperspectral profiles or pixels. However, PC-ENVI 4.7 (www.ittvis.com) was used to conduct 4 × 4 spatial averaging so that each hyperspectral image was reduced to 10,080 (160 × 63) pixels. After spatial averaging, the spatial resolution was equivalent to 6.7 hyperspectral profiles (pixels) per mm². All 40 hyperspectral image files were imported into PC-SAS 9.2 (Cary, NC) for data processing and statistical analyses.
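The 4 × 4 spatial averaging step and the resulting pixel counts can be illustrated with a minimal Python sketch. This is not the PC-ENVI routine used in the study. The function name `block_average` is hypothetical, and the treatment of the incomplete last block (averaging whatever pixels remain, so that 250 columns yield 63) is an assumption made only because it reproduces the reported 160 × 63 grid.

```python
import math


def block_average(band, size=4):
    """Average a 2-D list of reflectance values over size x size blocks.

    Assumption: incomplete edge blocks are averaged over the pixels that
    remain (how PC-ENVI treats edges is not stated in the text).
    """
    n_rows, n_cols = len(band), len(band[0])
    averaged = []
    for r0 in range(0, n_rows, size):
        out_row = []
        for c0 in range(0, n_cols, size):
            block = [band[r][c]
                     for r in range(r0, min(r0 + size, n_rows))
                     for c in range(c0, min(c0 + size, n_cols))]
            out_row.append(sum(block) / len(block))
        averaged.append(out_row)
    return averaged


# Grid dimensions reported in the text: 640 x 250 pixels before averaging.
rows_after = math.ceil(640 / 4)             # 160
cols_after = math.ceil(250 / 4)             # 63
pixels_after = rows_after * cols_after      # 10,080
# Imaged area: 6 cm x 2.5 cm = 60 mm x 25 mm = 1,500 mm2.
pixels_per_mm2 = pixels_after / (60 * 25)   # about 6.7
```

Under these assumptions the sketch returns the grid size and pixel density quoted above.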

Fig. 1. Reflectance data acquisition from individual maize leaves.

2.2. Variogram settings

Spatial structure analysis based on geostatistics is widely accepted as being “BLUE” (best linear unbiased estimate) [18, 19], and it is considered one of the most powerful and robust approaches to spatial data analysis. For a more detailed description of variogram analysis, readers are referred to [17, 18]. The spectral band at 706 nm was chosen for analysis of the influence of lag distance and number of lag distance intervals, as it had been shown in a previous study to provide a good indication of both abiotic (sill values) and biotic (nugget and range values) stress [20]. Reflectance data at 706 nm from the 40 hyperspectral images were used to conduct semi-variogram or co-variogram analyses, and each variogram approach was evaluated using 20 different variogram settings, that is, all combinations of: 1) lag distance (1-5 pixels in each interval) and 2) number of lag intervals (5-20 in 5-interval increments). Two regression models are commonly fitted to variogram data:

$$F(v) = a + b\left(1 - e^{-3D/c}\right) \qquad (1)$$

is an asymptotic (exponential) curve fit [18], in which “a” denotes the nugget, “b” the sill, “c” the range, and “D” the lag distance.

$$F(v) = a + b\left(\frac{3D}{2c} - \frac{D^{3}}{2c^{3}}\right) \qquad (2)$$

is a spherical curve fit [18], in which “a” denotes the nugget, “b” the sill, “c” the range, and “D” the lag distance.
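The two curve fits can be written as short Python functions, given here as an illustrative sketch and not as the PC-SAS code used in the study. The function and argument names are hypothetical. Holding the spherical curve at a + b once the lag distance exceeds the range is the standard convention for that model and is an assumption here, as the text gives only the rising branch.

```python
import math


def exponential_model(d, nugget, sill, rng):
    """Asymptotic fit: F = a + b * (1 - exp(-3 * D / c))."""
    return nugget + sill * (1.0 - math.exp(-3.0 * d / rng))


def spherical_model(d, nugget, sill, rng):
    """Spherical fit: F = a + b * (3D / (2c) - D^3 / (2c^3)) for D <= c.

    Assumption: for D > c the curve is held constant at a + b.
    """
    if d >= rng:
        return nugget + sill
    ratio = d / rng
    return nugget + sill * (1.5 * ratio - 0.5 * ratio ** 3)
```

At D = c the spherical curve equals a + b exactly, whereas the exponential curve equals a + b(1 − e⁻³), or about a + 0.95b. The factor 3 in the exponent therefore makes “c” the lag distance at which roughly 95% of “b” has been reached.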

For each combination of lag distance and number of lag intervals, Eq. (1) or Eq. (2) was fitted to the variogram data. Thus, with data from 40 hyperspectral images, a total of 3,200 variogram analyses were conducted (40 hyperspectral images × semi- or co-variogram × 20 variogram settings × Eq. (1) or Eq. (2)). Variogram parameter estimates were discarded if the regression fit failed to converge or the predicted range value exceeded 63. This threshold of 63 was chosen as it represented the width of the imaging data cube, and a range value higher than 63 meant that the sill was reached at a lag distance longer than the width of the data set. From this comprehensive variogram analysis, the combination with the highest level of robustness was identified, and analysis of variance (PROC MIXED) was used to compare average variogram parameters over time and between the two maize hybrids.
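As a hedged illustration of the workflow described above, the following Python sketch enumerates the variogram settings, computes an empirical semi-variance curve for one lag distance and number of lag intervals, and applies the discard rule. It does not reproduce the PC-SAS procedure used in the study. All function names are hypothetical, only semi-variance (not co-variance) is sketched, and the binning rule (bin k covers lag distances from k × lag up to, but not including, (k + 1) × lag) is one plausible convention assumed for illustration.

```python
import math
from itertools import product


def empirical_semivariogram(grid, lag, n_lags):
    """Return (mean distance, semi-variance, pair count) per non-empty lag bin.

    Semi-variance of a bin = sum of squared differences / (2 * pair count).
    Assumption: bin k holds pairs with k*lag <= distance < (k+1)*lag.
    """
    cells = [(r, c, v) for r, row in enumerate(grid) for c, v in enumerate(row)]
    sq_sums = [0.0] * n_lags
    dist_sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i in range(len(cells)):
        r1, c1, v1 = cells[i]
        for j in range(i + 1, len(cells)):
            r2, c2, v2 = cells[j]
            d = math.hypot(r1 - r2, c1 - c2)
            k = int(d // lag)
            if k < n_lags:
                sq_sums[k] += (v1 - v2) ** 2
                dist_sums[k] += d
                counts[k] += 1
    return [(dist_sums[k] / counts[k], sq_sums[k] / (2 * counts[k]), counts[k])
            for k in range(n_lags) if counts[k] > 0]


def keep_fit(converged, range_estimate, max_range=63):
    """Discard rule from the text: drop non-converged fits and ranges above 63."""
    return converged and range_estimate <= max_range


# Settings grid described in the text.
variogram_types = ["semi-variogram", "co-variogram"]
lag_distances = range(1, 6)              # 1-5 pixels per interval
lag_interval_counts = range(5, 21, 5)    # 5, 10, 15, 20 intervals
models = ["exponential", "spherical"]    # Eq. (1) and Eq. (2)
settings = list(product(variogram_types, lag_distances,
                        lag_interval_counts, models))
n_images = 40
total_analyses = n_images * len(settings)  # 40 x 80 = 3,200
```

The all-pairs loop is written for clarity on small grids. With 10,080 pixels per image, a practical implementation would need a faster pairing strategy.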

2.3. Use of indices as stress indicators

For comparison with the robustness of variogram parameters, three commonly used vegetation indices [14] were calculated: normalized difference vegetation index (NDVI) = [(R750 - R705) / (R750 + R705)], Stress index (SI) = (R693 / R759), and Photochemical reflectance index (PRI) = [(R531 – R570) / (R531 + R570)]. Average reflectance in the six spectral bands used to calculate the vegetation indices was also determined (R531, R570, R693, R705, R759, and R750), and analysis of variance (PROC MIXED) was used to examine effects of time and hybrid on average reflectance. As a second comparison, principal component analysis (PCA) was applied to all combinations of 160 spectral bands and hyperspectral images to examine the robustness of PCA scores for each spectral band. Analysis of variance (PROC MIXED) was used to examine effects of year, day of sampling within year, and hybrid on PCA scores from all 160 spectral bands.
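The three vegetation indices can be expressed as a short Python sketch. The function names are hypothetical, and the reflectance values used in the example are invented for illustration, not measurements from this study. PRI is written in its standard normalized-difference form, with the sum of the two bands in the denominator.

```python
def ndvi(r750, r705):
    """Normalized difference vegetation index: (R750 - R705) / (R750 + R705)."""
    return (r750 - r705) / (r750 + r705)


def stress_index(r693, r759):
    """Stress index: R693 / R759."""
    return r693 / r759


def pri(r531, r570):
    """Photochemical reflectance index: (R531 - R570) / (R531 + R570)."""
    return (r531 - r570) / (r531 + r570)


# Hypothetical average reflectance values for one leaf image.
example = {"R531": 0.08, "R570": 0.12, "R693": 0.10,
           "R705": 0.30, "R750": 0.50, "R759": 0.50}
indices = {
    "NDVI": ndvi(example["R750"], example["R705"]),
    "SI": stress_index(example["R693"], example["R759"]),
    "PRI": pri(example["R531"], example["R570"]),
}
```

Because each index is a ratio of bands, a multiplicative change in illumination that affects both bands equally cancels out, which is one reason such indices are often expected to be more repeatable than single-band reflectance.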

2.4. Effect of input data reduction on variogram parameters

The amount of input data was reduced stepwise in increments of 500 data points, from about 10,000 to 1,000 data points per hyperspectral image (19 data set sizes). Thus, with 40 hyperspectral images, a total of 760 variogram analyses were conducted with the combination of variogram settings that was identified as being the most robust (spherical regression to semi-variance data with lag distance = 2 and 15 lag distance intervals). The relationship between input data size and consistency of each of the three semi-variogram parameter estimates was characterized by dividing each variogram parameter estimate by the estimate obtained from the full input data set (N = 10,080).
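The data reduction scheme can be sketched in Python as follows. The text does not state how the reduced data sets were drawn, so the random sampling without replacement used here is an assumption, and all function names are hypothetical.

```python
import random


def subset_sizes(largest=10000, smallest=1000, step=500):
    """Data set sizes from 10,000 down to 1,000 in steps of 500."""
    return list(range(largest, smallest - 1, -step))


def random_subset(pixels, size, seed=0):
    """Assumption: reduced data sets are random samples without replacement."""
    return random.Random(seed).sample(pixels, size)


def relative_estimate(estimate_subset, estimate_full):
    """Parameter estimate from a reduced data set divided by the full-data estimate."""
    return estimate_subset / estimate_full


sizes = subset_sizes()          # 19 data set sizes
n_analyses = 40 * len(sizes)    # 40 images x 19 sizes = 760
```

A relative estimate close to 1 indicates that the reduced data set gives nearly the same parameter value as the full data set of 10,080 pixels.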

3. Results and discussion

Despite encouraging results from previously published studies involving variogram-based analyses of reflectance data from different agricultural applications [2, 20, 21], there is a need for better insight into the robustness of important user-defined variogram options, in terms of: 1) which variogram analysis to use (i.e., semi-variance or co-variance), 2) which regression fit to use (i.e., exponential or spherical), 3) how to select the most appropriate lag distance interval, 4) how many lag distance intervals to include in the regression fit, and 5) whether it is possible to reduce the size of input data sets without adversely affecting the robustness of variogram parameter estimates.

3.1. Robustness of variogram settings

A given spatial structure characterized by variogram analysis is highly dependent on which settings are being used. The comprehensive evaluation of 3,200 variogram analyses from 40 hyperspectral images revealed that about 42% (1,329) of the analyses produced range value estimates that exceeded the width of the hyperspectral image file (63 pixels), so these estimates were discarded. Only one combination of variogram data × variogram settings × regression fit produced a consistent output for all 40 imaging data sets: spherical regression to semi-variance data with lag distance = 2 and 15 lag distance intervals. This combination was therefore selected as the most robust and used for further analyses. Figure 2 shows the derived variogram parameter estimates across maize hybrids and with reflectance data collected on seven different days in two separate years (data sets). As indicated by standard error bars, nugget estimates (Fig. 2a) were quite consistent, except for data collected March 28 2008, in which one nugget estimate was about 10 times the general average (0.0068) and one was negative (−0.0002). Sill estimates (Fig. 2b) were very consistent within each combination of hybrid and date of data collection, but there was considerable variation in averages among the seven days of data collection. Average range values (Fig. 2c) were highly consistent across days of data collection and were also consistent within each combination of hybrid and date of data collection.
Based on statistical analyses of the three effects (maize hybrid, year, and nested effect of day within year, Table 1): 1) sill values were significantly different between the two years, and they varied over time within years, 2) nugget values did not respond significantly to the examined effects, which was attributed to extreme estimates from data collected March 28 2008, and 3) range values did not respond significantly to the examined effects and were therefore considered the most robust variogram parameter.

Fig. 2. Average variogram parameters [nugget (a), sill (b), and range (c)] obtained from the most robust variogram analysis (spherical regression to semi-variance data with lag distance = 2 and 15 lag distance intervals) for all combinations of date of data collection and maize hybrid.

Table 1. Robustness of variogram parameters and vegetation indices

3.2. Robustness of spectral bands and vegetation indices

Regarding robustness of average reflectance in six spectral bands (Fig. 3), several trends were observed: 1) a distinct difference between hybrids in 2007 but little difference in 2008 [R532 (Fig. 3a), R570 (Fig. 3b)], 2) a distinct between-year difference in average reflectance values [R693 (Fig. 3c), R706 (Fig. 3d)], 3) sampling dates with considerable “outlier” characteristics (overall low robustness) [R759 (Fig. 3e)], and 4) temporal trends within each year (low within-year robustness) [R786 (Fig. 3f)]. Thus, average reflectance in these spectral bands showed low robustness, but the cause of low robustness varied among the spectral bands. Statistical analyses of the three effects (maize hybrid, year, and nested effect of day within year, Table 2) showed that: 1) all spectral bands were significantly different between years, 2) none of the spectral bands varied significantly between maize hybrids, and 3) there was a significant effect of time within years for two of the spectral bands (759 and 786 nm). Vegetation indices varied in robustness: NDVI estimates (Fig. 4a) were slightly higher in 2007 than in 2008, especially for hybrid 2; SI estimates (Fig. 4b) were generally higher in 2008 than in 2007; and PRI estimates (Fig. 4c) were negative in 2007 and positive in 2008. Statistical analyses of the vegetation indices revealed that they all showed similar responses to the examined effects, with a significant difference between years and a significant effect of days within years, but no significant difference between hybrids (Fig. 4, Table 1).

Fig. 3. Average reflectance values in six spectral bands [532 nm (a), 570 nm (b), 693 nm (c), 706 nm (d), 759 nm (e), 786 nm (f)] for all combinations of date of data collection and maize hybrid.

Table 2. Robustness of six spectral bands

Fig. 4. Average vegetation indices [Normalized Difference Vegetation Index (a), NDVI = (R750 - R705) / (R750 + R705), Stress index (b), SI = (R693 / R759), and Photochemical reflectance index (c), PRI = (R531 – R570) / (R531 + R570)] for all combinations of date of data collection and maize hybrid.

3.3. Robustness of PCA scores

Similar to average reflectance values in individual spectral bands and vegetation indices, PCA scores from the first three principal components (PCA1, PCA2, and PCA3) varied significantly in most of the 160 spectral bands in response to one or several of the treatment effects (Fig. 5). However, regarding PCA1 scores, there were six spectral bands between 715 and 730 nm (Fig. 5a) in which robustness was high (no significant treatment effects). Similarly, PCA2 scores (Fig. 5b) and PCA3 scores (Fig. 5c) showed a high level of robustness in the ranges from 550 to 567 nm and from 816 to 832 nm, respectively. The high level of robustness of PCA1 scores from spectral bands near 700 nm is noteworthy, because a strong response to a wide range of plant stressors is most likely to occur in this part of the spectrum [22]. However, the low level of robustness of PCA scores in most of the examined spectral bands underscores the importance of careful a priori selection of spectral bands when conducting principal component analysis, as spectral bands with low robustness may adversely affect the classification accuracy. Furthermore, the robustness of PCA scores of spectral bands varied among the first three principal components, so it seems important to select the appropriate combination of spectral bands and principal components for PCA-based classification to be successful.

Fig. 5. Principal component analyses (PCA) were conducted individually with all 40 hyperspectral images, and PCA scores for the first three components [PCA1 (a), PCA2 (b), and PCA3 (c)] were calculated for each of the 160 spectral bands. These PCA scores were subsequently analyzed for treatment effects [year (difference between the two data sets), days within year, or hybrids]. Vertical lines represent parts of the examined spectrum with no significant treatment effect (P>0.05). Horizontal dotted lines denote spectral ranges with high level of robustness (all three treatment effects being non-significant).

3.4. Effect of data reduction on nugget, range, and sill parameters as stress detectors

After having selected the most robust variogram analysis and determined that the range value was the most robust variogram parameter, the importance of the size of input data sets was examined. With 40 variogram analyses for each input data set size, it was demonstrated that average range values remained very stable, and there appeared to be negligible loss of accuracy when only 40% of the input data were included (Fig. 6). A 60% reduction in the size of the input data set represents an 84% reduction in computer processing (8 million lag distance pairs instead of 50 million), because the number of point pairs used in the variance calculations increases approximately with the square of the number of data points.
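The processing saving quoted above can be checked with a few lines of Python, assuming that every unordered pair of data points is a candidate lag distance pair, so that n points give n(n − 1)/2 pairs. The function name `n_pairs` is hypothetical.

```python
def n_pairs(n_points):
    """Number of unordered point pairs among n_points data points."""
    return n_points * (n_points - 1) // 2


full_points = 10080
reduced_points = full_points * 40 // 100   # 40% of the input data = 4,032 points

full_pairs = n_pairs(full_points)          # about 50 million pairs
reduced_pairs = n_pairs(reduced_points)    # about 8 million pairs
saving = 1 - reduced_pairs / full_pairs    # fraction of pairs avoided
```

Because the pair count is close to n²/2, retaining 40% of the points leaves about 0.4² = 16% of the pairs, which corresponds to the 84% reduction.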

Fig. 6. Average range values at 706 nm obtained from the most robust variogram analysis (spherical regression to semi-variance data with lag distance = 2 and 15 lag distance intervals) and based on input data sets ranging from 1,000 – 10,000 data points.

3.5. Concluding remarks

Successful application of reflectance- and vibrational-based analyses is to a large extent dependent upon the ability to extract information from imaging data with low within-class variation, as that increases the likelihood of detecting between-class differences. It was demonstrated that reflectance in all six spectral bands showed significant between-year variation and that the three vegetation indices showed significant between-year variation and also significant variation over time within each year. PCA scores from the first three principal components (PCA1, PCA2, and PCA3) were found to be robust only within restricted spectral ranges, which demonstrates that a priori selection of spectral bands is critical for successful performance of PCA-based classification. It was also shown that range values generated from one particular variogram analysis (spherical regression to semi-variance data with lag distance = 2 and 15 lag distance intervals) were very consistent among all combinations of date of data collection and hybrid, and they showed no significant difference between years. Furthermore, it was demonstrated that the size of input data could be reduced by 60% without adversely affecting the robustness of range values.

References and Links

1. S. G. Kong, Y. R. Chen, I. Kim, and M. S. Kim, “Analysis of hyperspectral fluorescence images for poultry skin tumor inspection,” Appl. Opt. 43(4), 824–833 (2004).

2. C. Nansen, N. Abidi, A. J. Sidumo, and A. H. Gharalari, “Using spatial structure analysis of hyperspectral imaging data and Fourier transformed infrared analysis to determine bioactivity of surface pesticide treatment,” Remote Sens. 2(4), 908–925 (2010).

3. P. M. Mehl, Y. Chen, M. S. Kim, and D. E. Chan, “Development of hyperspectral imaging technique for the detection of apple surface defects and contaminations,” J. Food Eng. 61(1), 67–81 (2004).

4. A. M. Vargas, M. S. Kim, T. Yang, A. M. Lefcourt, Y.-R. Chen, Y. Luo, Y. Song, and R. Buchanan, “Detection of fecal contamination on cantaloupes using hyperspectral fluorescence imagery,” J. Food Sci. 70(8), 471–476 (2005).

5. A. M. Lefcourt and M. S. Kim, “Technique for normalizing intensity histograms of images when the approximate size of the target is known: Detection of feces on apples using fluorescence imaging,” Comp. Elec. Agricult. 50(2), 135–147 (2006).

6. B. Park, K. C. Lawrence, R. W. Windham, and D. P. Smith, “Performance of hyperspectral imaging system for poultry surface fecal contaminant detection,” J. Food Eng. 75(3), 340–348 (2006).

7. S. R. Delwiche, “Protein content of single kernels of wheat by near-infrared reflectance spectroscopy,” J. Cereal Sci. 27(3), 241–254 (1998).

8. R. Cogdill, C. Hurburgh, and G. Rippke, “Single-kernel maize analysis by near-infrared hyperspectral imaging,” Trans. ASAE 47, 311–320 (2004).

9. C. Nansen, M. Kolomiets, and X. Gao, “Considerations regarding the use of hyperspectral imaging data in classifications of food products, exemplified by analysis of maize kernels,” J. Agric. Food Chem. 56(9), 2933–2938 (2008).

10. H. Ma and C. A. Anderson, “Optimisation of magnification levels for near infrared chemical imaging of blending of pharmaceutical powders,” J. Near Infrared Spectrosc. 15(2), 137–151 (2007).

11. J. M. Amigo and C. Ravn, “Direct quantification and distribution assessment of major and minor components in pharmaceutical tablets by NIR-chemical imaging,” Eur. J. Pharm. Sci. 37(2), 76–82 (2009).

12. C. Ravn, E. Skibsted, and R. Bro, “Near-infrared chemical imaging (NIR-CI) on pharmaceutical solid dosage forms-comparing common calibration approaches,” J. Pharm. Biomed. Anal. 48(3), 554–561 (2008).

13. K. Peleg, G. L. Anderson, and C. Yang, “Repeatability of hyperspectral imaging systems – quantification and improvement,” Int. J. Remote Sens. 26(1), 115–139 (2005).

14. M. Baghzouz, D. A. Devitt, and R. L. Morris, “Evaluating temporal variability in the spectral reflectance response of annual ryegrass to changes in nitrogen applications and leaching fractions,” Int. J. Remote Sens. 27(19), 4137–4157 (2006).

15. D. I. Givens and E. R. Deaville, “The current and future role of near infrared reflectance spectroscopy in animal nutrition,” Aust. J. Agric. Res. 50(7), 1131–1145 (1999).

16. A. Masoni, L. Ercoli, and M. Mariotti, “Spectral properties of leaves deficient in iron, sulfur, magnesium, and manganese,” Agron. J. 88(6), 937–943 (1996).

17. M. Armstrong, Basics of semi-variogram Analysis (Springer, 1998).

18. E. H. Isaaks and R. M. Srivastava, Applied Geostatistics (Oxford University Press, New York, 1989).

19. S. A. Krajewski and B. L. Gibbs, “Understanding contouring. A practical guide to spatial estimation using a computer and semi-variogram interpretation,” Gibbs Associates (2001).

20. C. Nansen, A. J. Sidumo, and S. Capareda, “Variogram analysis of hyperspectral data to characterize the impact of biotic and abiotic stress of maize plants and to estimate biofuel potential,” Appl. Spectrosc. 64(6), 627–636 (2010).

21. C. Nansen, T. Macedo, R. Swanson, and D. K. Weaver, “Use of spatial structure analysis of hyperspectral data cubes for detection of insect-induced stress in wheat plants,” Int. J. Remote Sens. 30(10), 2447–2464 (2009).

22. G. A. Carter and A. K. Knapp, “Leaf optical properties in higher plants: linking spectral characteristics to stress and chlorophyll concentration,” Am. J. Bot. 88(4), 677–684 (2001).
