Optical classification of inland waters based on an improved Fuzzy C-Means method

Shun Bi; Yunmei Li; Jie Xu; Ge Liu; Kaishan Song; Meng Mu; Heng Lyu; Song Miao; Jiafeng Xu

doi:10.1364/OE.27.034838

1. Introduction

The Case 1 and Case 2 scheme is commonly used to classify global water types for bio-optical modeling. It was first defined qualitatively by Morel and Prieur [1], depending on the predominance of phytoplankton and other covarying compounds. The idealized concept of Case 1 water promoted the success of the first generation of bio-optical models and, to some extent, advanced the birth of ocean color satellites [2]. However, the development and refinement of satellites and sensors has led the ocean color science community to pay more attention to Case 2 waters, which have more complex bio-optical properties [3–5]. It is generally accepted that no universal model parameters are applicable to all water types [6,7]. Accordingly, Case 2 waters have been further divided into more detailed types based on the trophic gradient [8], region-based separation (i.e., using a regional algorithm for one certain study area) [9], or AOPs (apparent optical properties). AOPs summarize the optical properties of water near or above the water surface and can be derived indirectly from satellite records given an effective atmospheric correction [10]. AOPs are usually considered a good carrier of information for water classification [2,11,12], in the form of the Forel-Ule index [13], optical spectrum [12,14], or diffuse attenuation coefficient (K_d) [15,16]. Based on the spectral shape of K_d(λ), Jerlov classified waters into three oceanic types and five coastal types, which provides a convenient scheme for describing water clarity [17]. A classic example of AOP differences is shown in Fig. 1, where four inland lakes in China present considerable variations in both the magnitude and spectral shape of R_rs (remote sensing reflectance, one of the AOPs). In summary, AOP-based water classification has long been popular in both Case 1 and Case 2 waters, for example in selecting the best model for water in a given bio-optical state [9,14,18–20] or in flag-mask processing for satellite maps [21,22].

Fig. 1. Pictures from different water color types: (a) Lake Erhai, (b) Lake Dianchi, (c) Lake Hongze and (d) Lake Taihu; the inset plots indicate the magnitude and spectral shape of remote sensing reflectance (R_rs, unit: sr⁻¹) with considerable variations in different kinds of lakes.

The AOP-based classification strategies are generally divided into hard clustering and fuzzy clustering (or soft classification). Hard schemes include hierarchical clustering [23], HCM (Hard C-Means) [7], and ISODATA (Iterative Self-Organizing Data Analysis Technique) [24]; fuzzy schemes include FCM (Fuzzy C-Means) [14], FPCM (Fuzzy and Possibilistic C-Means) [25], and UPFC (Unsupervised Possibilistic Fuzzy C-Means) [26]. Hard clustering usually results in uneven or discontinuous maps, violating the first law of geography that near things (waters) are more related than distant things (waters) [27]. Conversely, fuzzy clustering uses “membership degrees” to express the ambiguity between different water types [28]. The membership degree is usually treated as a “weight” in algorithm blending schemes, which helps to remove the sharp boundaries in maps produced by hard classification schemes [12,29]. Because AOPs and their variants carry comprehensive information about the water, they are usually used as the input of fuzzy clustering schemes. These inputs vary and include raw AOP [12,14], normalized AOP [29,30], spectral slope [18], spectral depth [6], spectral angle [19], spectral ratio [20], B-spline functions [30], and CIE-based hue angle [13].

These applications have provided useful insight into optical water clustering. However, there have been few investigations into the determination of the fuzzifier (or ambiguity) of fuzzy clustering schemes [12,14,19]. Specifically, the fuzzifier parameter m of FCM has rarely been discussed in optical water clustering. Moore, et al. [11] tested the effects of different m values in FCM and determined an appropriate value by assessing clustering quality indices. However, they did not give a detailed scheme for determining m, which makes it hard to select an appropriate fuzzifier when facing new data sets. Optical water data sets are usually high dimensional, with multispectral or hyperspectral bands from satellite sensors designed for water color. It has been reported that the default fuzzifier m = 2 in FCM is not suitable for high dimensional data sets and tends to make cluster results more indistinct [31,32]. Meanwhile, algal blooms with vegetation-like spectra often occur in eutrophic inland lakes and should be represented in the optical clustering data set [33]. However, Xue, et al. [7] indicated that FCM with the default fuzzifier did not perform better than HCM when determining the appropriate optical clusters in the large lakes of the Yangtze and Huai River Basin. Furthermore, Spyrakos, et al. [30] reported that an HCM scheme could obtain a distinct cluster result covering the bloom spectrum. Therefore, the high dimensionality of spectral data, the inappropriate default fuzzifier, and the need for FCM (rather than HCM) motivate us to look for a compromise that both retains the membership degrees provided by fuzzy clustering and ensures that seeming outliers (such as bloom spectra) are strictly separated.

The main objectives of this study are 1) to improve the FCM method to make optical clustering more effective and applicable for inland waters, 2) to analyze the differences between the optical properties of distinct clusters, and 3) to assess the clustering results by applying them to atmospherically corrected images of various kinds of inland waters.

2. Data and methods

2.1 Data collection

A dataset of 1280 in situ hyperspectral R_rs spectra from widely distributed inland waters was used in the clustering analysis (Table 1). The dataset consisted of data from sixteen inland lakes, reservoirs, and rivers with co-measured concentrations of Chlorophyll-a (C_Chla) and total suspended matter (C_TSM), together with Secchi disk depth (SDD) and the absorption of phytoplankton, non-phytoplankton, and colored dissolved organic matter (namely a_ph, a_NAP and a_CDOM) within 350-800 nm.

Table 1. Names, locations, elevation, and in situ numbers of the sampled inland waters including lakes (in green), reservoirs (in violet), and rivers (in yellow).

Specifically, the in situ R_rs was measured with a FieldSpec spectroradiometer (Analytical Spectral Devices, Inc., Boulder, CO, USA) according to the above-water method (see more details in [34]). The hyperspectral R_rs spectra were re-sampled to 15 OLCI bands based on the spectral response function (https://sentinel.esa.int), excluding the water vapor absorption bands (namely 761, 764 and 768 nm) and the adjacency-contaminated band (1020 nm, reported in [35]). This operation served as a dimension reduction that avoids, to some extent, the noise in hyperspectral observations [30]. In situ water samples were collected from the water surface using Niskin bottles and frozen at −20 °C for laboratory analysis. The values of Chla, TSM, ISM, and OSM were obtained using the methods of previous studies [36,37]. The a_ph and a_NAP values were obtained using the quantitative filter technique described by Yunlin Zhang, et al. [38]. The a_CDOM values were measured with a Shimadzu UV-2250 over 240-880 nm as described by Mitchell, et al. [39]. All measurements were visually examined to avoid obvious errors.
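The band re-sampling step described above can be sketched as a response-weighted average of the hyperspectral spectrum. The Gaussian response below (`gaussian_srf`, with a hypothetical center and width) is only a stand-in assumption, as is the synthetic spectrum; in practice the tabulated OLCI spectral response functions would be used instead.

```python
import numpy as np

def gaussian_srf(wavelengths, center, fwhm):
    """Hypothetical Gaussian response; a stand-in for a tabulated sensor SRF."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return np.exp(-0.5 * ((wavelengths - center) / sigma) ** 2)

def resample_to_band(rrs, srf):
    """Response-weighted mean of a spectrum sampled on a uniform wavelength grid."""
    return float(np.sum(rrs * srf) / np.sum(srf))

wavelengths = np.arange(350.0, 801.0, 1.0)        # 1 nm grid over 350-800 nm
rrs = 0.01 + 0.00001 * (wavelengths - 350.0)      # synthetic, linearly increasing spectrum
srf_560 = gaussian_srf(wavelengths, center=560.0, fwhm=10.0)
band_560 = resample_to_band(rrs, srf_560)         # band-equivalent reflectance
```

For a spectrum that is linear across a symmetric response, the band value equals the spectrum at the band center, which gives a quick sanity check on the weighting.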

The samples were gathered during multiple cruises, covering a wide range of biogeochemical and optical variability observed during different seasonal conditions from 2006 to 2018. The investigated inland waters range from turbid river-communicating waters such as Lake Hongze [40], Lake Dongting [41], and Reservoir Three Gorges [42], to the shallow and phytoplankton-dominated waters such as Lake Chaohu [43], Lake Taihu [44], and Lake Dianchi [33], and to the deep and low-reflectance waters such as Lake Erhai [45] and Lake Qiandao [46]. Fortunately, previous case studies focusing on investigated waters have provided plentiful prior knowledge to validate the cluster results quantitatively [33,40,42,44–46].

2.2 Algorithms

2.2.1 Clustering algorithms based on fuzzy logic

Hard clustering algorithms, such as K-means, assume that each object belongs to only one cluster. In practice, an object often belongs to more than one cluster because clusters overlap. To address this, the membership degree of an object to each cluster is quantified between 0 and 1. For a given object, a value close to 1 represents a strong association with the cluster, and a value close to 0 a weak one. FCM, proposed by Bezdek [47], is a well-known clustering algorithm whose main constraint is that the membership degrees of one object across all clusters sum to 1. For a data set, the membership matrix and cluster centroids are obtained by minimizing the total inertia criterion [47]. As extensions of the basic FCM algorithm, many other fuzzy logic clustering algorithms, such as FPCM [25] and UPFC [26], were proposed to overcome the vulnerability of FCM to noise and outliers. These algorithms are assembled in the R package “ppclust”, which was used in this study [48].

Here we introduce the principle of FCM; for FPCM and UPFC, variants of FCM designed for specific data sets, readers are referred to previous studies [25,26,47]. The spectra were first normalized by dividing by their integrals, since the composition of inland waters varies greatly and changes the spectral shape of the reflectance, whereas the magnitude is dominated by water clarity [18,29,30]. Then, the cluster number (K) needs to be defined. Because FCM is unsupervised, its starting centers are selected randomly in each run. We therefore varied K from 2 to 20 and repeated the clustering 1000 times, each time on a random bootstrap subsample of 1150 spectra (approximately 90% of the whole data), to avoid giving undue weight to individual spectra. For each run, the Partition Entropy (PE), Partition Coefficient (PC), Modified Partition Coefficient (MPC) and Fuzzy Silhouette Index (SIL.F), all available in the “fclust” R package [49] for assessing cluster performance, were computed and compared to find the optimal cluster number K. The best cluster number in this study is 7, which was selected in ∼89% of all bootstrapping results (see more details in the open source code at https://github.com/bishun945). FCM aims to ascertain the most characteristic optical water types and to calculate the membership degrees, which is achieved by minimizing the objective function defined as follows:

(1)$$\min J(U,V) = \sum\limits_{j = 1}^N {\sum\limits_{i = 1}^K {u_{ij}^m||{{x_j} - {v_i}} ||_A^2} }$$

(2)$$\textrm{with }\sum\limits_{i = 1}^K {{u_{ij}} = 1} \textrm{ and }0 \le \sum\limits_{j = 1}^N {{u_{ij}} \le N} $$

where N is the total number of observations in the data set and K is the cluster number; U = [u_ij]_{K×N} is the membership matrix, in which u_ij is the membership degree of the j-th observation x_j to cluster i; ||x_j − v_i|| is the feature distance between x_j and the cluster centroid v_i. The matrix norm A is the rule of similarity measurement; squared Euclidean distance was chosen for this study. The weighting exponent m (1 < m < ∞) is usually called the fuzzifier.
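A minimal sketch of Eqs. (1) and (2) may help make the iteration concrete. It uses the standard alternating updates of centroids and memberships with squared Euclidean distance; the function names, the random initialization, and the stopping tolerance are assumptions for illustration and do not reproduce the “ppclust” implementation used in this study.

```python
import numpy as np

def normalize_by_integral(rrs, wavelengths):
    """Divide each spectrum (row) by its trapezoidal integral over wavelength."""
    widths = np.diff(wavelengths)
    area = np.sum(0.5 * (rrs[:, 1:] + rrs[:, :-1]) * widths, axis=1)
    return rrs / area[:, None]

def memberships(X, V, m):
    """Membership matrix U (K x N) for data X (N x p) and centroids V (K x p)."""
    d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)  # squared Euclidean, K x N
    d2 = np.maximum(d2, 1e-30)                               # guard against zero distance
    ratio = d2 / d2.min(axis=0, keepdims=True)               # >= 1, avoids overflow
    w = ratio ** (-1.0 / (m - 1.0))
    return w / w.sum(axis=0, keepdims=True)                  # each column sums to 1

def fcm(X, K, m=2.0, n_iter=300, tol=1e-8, seed=0):
    """Alternate centroid and membership updates until U stops changing."""
    rng = np.random.default_rng(seed)
    U = rng.random((K, X.shape[0]))
    U /= U.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        V = (W @ X) / W.sum(axis=1, keepdims=True)           # weighted centroids
        U_new = memberships(X, V, m)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, V

# Synthetic demo: two well-separated groups of 3-band "spectra".
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, (20, 3)), rng.normal(10.0, 0.5, (20, 3))])
U, V = fcm(X, K=2, m=1.36)
labels = U.argmax(axis=0)
```

Dividing the distances by the per-observation minimum before raising them to the power −1/(m − 1) does not change the memberships but keeps the computation stable when m is close to 1.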

2.2.2 Determination of fuzzifier parameter m

The fuzziness of FCM is controlled by the fuzzifier parameter m and the matrix norm A, as shown by Dembele [31]. Increasing m strengthens the fuzziness of FCM [50]. Conversely, when m is close to one, FCM approaches HCM [51]. Parameter m therefore deserves particular attention once the norm metric for the feature distance is chosen. Dembele and Kastner also demonstrated the importance of optimizing m by introducing a mathematical method for determining the appropriate fuzzifier when clustering DNA microarray data sets [31,32].

Many researchers have used m = 2 based on empirical studies, since it allows easy computation of the membership degree matrix [51,52]. However, previous studies have revealed that m = 2 may not be an appropriate fuzzifier for general data sets [11,31,51,52]. Note that m may strongly affect how fuzzily an object belongs to its center, and thereby affects subsequent applications based on the membership degree [53]. In this section, an improved FCM method, denoted as FCM-m, is proposed for water spectral data with a slight modification of the method by Dembele and Kastner [32].

The first step of this process is to define the upper bound value for m (m_ub). For a given data set, there is an m_ub above which the membership degrees resulting from FCM are all equal to 1/K. The method assumes that the coefficient of variation (cv) of the set of distances between the spectra at m = m_ub is close to 0.03p, where p is the data dimension, as follows:

(3)$$cv\{ {Y_m}\} = \frac{{std({Y_m})}}{{mean({Y_m})}} \approx 0.03p$$

(4)$$\textrm{with }{Y_m} = \{ {[||{{x_i} - {x_k}} ||_A^2]^{\frac{1}{{m - 1}}}};k \ne i = 1,2,\ldots ,N\}$$

where the dissimilarity matrix Y_m depends only on the initial data set and the fuzzifier m, and is completely independent of the FCM results. Then, we minimize |cv{Y_m} − 0.03p| over a variable fuzzifier m, ranging from 1.1 to 10 with a step of 0.1, to determine m_ub. If the resulting m_ub equals the upper limit of 10, the limit is extended to 20 and the search is repeated until convergence. After that, the fuzzifier used in this study (m_used) is obtained as m_used = 1 + m_ub/10. The flowchart of FCM-m is shown in Fig. 2, and the compiled R functions, including the implementation of FCM-m and its application to images, are available at https://github.com/bishun945.
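The search for m_ub described above can be sketched as a simple grid search. This is an illustrative reading of the procedure, not the authors' R code: it assumes that Y_m is built from pairwise squared Euclidean distances between spectra, uses the population standard deviation, and adds a hypothetical cap (`max_upper`) so the extension of the upper limit always terminates.

```python
import numpy as np

def pairwise_sq_distances(X):
    """Squared Euclidean distances between all distinct pairs of rows (i < k)."""
    sq = (X ** 2).sum(axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return d2[np.triu_indices(X.shape[0], k=1)]

def cv_ym(d2, m):
    """Coefficient of variation of Y_m = d2 ** (1 / (m - 1))."""
    y = (d2 / d2.max()) ** (1.0 / (m - 1.0))   # rescaling leaves cv unchanged
    return y.std() / y.mean()

def find_m_ub(X, step=0.1, upper=10.0, max_upper=50.0):
    """Grid value of m whose cv{Y_m} is closest to 0.03 * p."""
    d2 = pairwise_sq_distances(X)
    target = 0.03 * X.shape[1]
    while True:
        grid = np.arange(1.1, upper + step / 2.0, step)
        gaps = np.array([abs(cv_ym(d2, m) - target) for m in grid])
        m_ub = float(grid[gaps.argmin()])
        # Accept the result unless it sits on the current upper limit.
        if m_ub < upper - step / 2.0 or upper >= max_upper:
            return round(m_ub, 1)
        upper += 10.0                           # extend the search range and repeat

def m_used_from(m_ub):
    """Fuzzifier actually passed to FCM."""
    return 1.0 + m_ub / 10.0

# Synthetic demo with 60 random 5-band observations.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
m_ub = find_m_ub(X)
m_used = m_used_from(m_ub)
```

Because cv{Y_m} decreases monotonically as m grows, the minimizer of the gap is unique up to the grid resolution; m_ub = 3.6, for example, maps to m_used = 1.36.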

Fig. 2. The flowchart of FCM-m. The raw R_rs spectra are initially normalized by dividing by their integrals to capture the spectral shape; then, the fuzzifier parameter (m_used) is optimized by calculating the upper bound m value; finally, water optical clusters and membership degrees are generated by the Fuzzy C-Means method with normalized R_rs and m_used as input.

2.2.3 Assessment of the quality of cluster results

The SIL.F was chosen to evaluate the goodness of fuzzy clustering results. Compared to the original silhouette criterion, widely used in crisp (hard) data clustering [7,29], SIL.F was designed to better detect regions of higher data density when clusters overlap [54]. For each spectrum j, the silhouette width s_j is defined as follows. Let a_pj be the average dissimilarity between j and all other spectra of the cluster p to which j belongs. For every other cluster C, let d(j,C) be the average dissimilarity of j to all spectra of C. The minimum of d(j,C) over these clusters is b_pj, i.e., min_C[d(j,C)], and can be seen as the dissimilarity between j and its “neighbor” cluster, namely the nearest one to which it does not belong. Finally, s_j is calculated as in Eq. (5). In Eq. (6), u_pj and u_qj are the first and second-largest elements of the j-th column of the membership degree matrix, respectively, and α ≥ 0 is a weighting coefficient (default: 1) [54]. The SIL.F value lies between −1 and 1. A silhouette width below zero indicates that the corresponding spectrum is poorly classified, and a value near 1 that it is well classified [7,32,54].

(5)$${s_j} = \frac{{{b_{pj}} - {a_{pj}}}}{{\max \{ {a_{pj}},{b_{pj}}\} }}$$

(6)$$SIL.F = \frac{{\sum\nolimits_{j = 1}^N {{s_j}{{({u_{pj}} - {u_{qj}})}^\alpha }} }}{{\sum\nolimits_{j = 1}^N {{{({u_{pj}} - {u_{qj}})}^\alpha }} }}$$
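Eqs. (5) and (6) can be sketched directly in code. The version below assumes Euclidean distance as the dissimilarity and assigns each spectrum to the cluster of its highest membership; it is a minimal illustration rather than the “fclust” implementation, and it leaves s_j at zero for degenerate cases such as singleton clusters.

```python
import numpy as np

def fuzzy_silhouette(X, U, alpha=1.0):
    """SIL.F for data X (N x p) and membership matrix U (K x N), with K >= 2."""
    N, K = X.shape[0], U.shape[0]
    labels = U.argmax(axis=0)                       # crisp assignment per spectrum
    sq = (X ** 2).sum(axis=1)
    D = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0))
    s = np.zeros(N)
    for j in range(N):
        same = labels == labels[j]
        same[j] = False                             # exclude the spectrum itself
        others = [D[j, labels == c].mean() for c in range(K)
                  if c != labels[j] and np.any(labels == c)]
        if not same.any() or not others:
            continue                                # degenerate case: keep s_j = 0
        a, b = D[j, same].mean(), min(others)       # a_pj and b_pj of Eq. (5)
        if max(a, b) > 0:
            s[j] = (b - a) / max(a, b)
    top = np.sort(U, axis=0)                        # ascending along clusters
    weight = (top[-1] - top[-2]) ** alpha           # (u_pj - u_qj)^alpha of Eq. (6)
    return float((weight * s).sum() / weight.sum())

# Synthetic demo: two compact groups with fairly crisp memberships.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 0.5, (15, 3)), rng.normal(10.0, 0.5, (15, 3))])
U = np.vstack([np.r_[np.full(15, 0.9), np.full(15, 0.1)],
               np.r_[np.full(15, 0.1), np.full(15, 0.9)]])
sil_f = fuzzy_silhouette(X, U)
```

Spectra whose two largest memberships are nearly equal receive a weight close to zero, so ambiguous spectra in overlap regions contribute little to the index.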

3. Results

3.1 Cluster results and properties of optical water types

We examined the distribution of membership degrees via jitter plots for distinct m values, i.e., 1.1, 1.36, 2, 3.6 and 5.6, shown in Fig. 3. When m approaches 1 (Fig. 3(a)), FCM gives less fuzzy (i.e., harder) membership results, similar to those obtained by HCM. In contrast, the membership degrees of the three algorithms, i.e., FCM, FPCM, and UPFC, with m = 2 were relatively low, especially for UPFC, which failed to extract any clustering structure; this indicates that the algorithms with m = 2 failed to associate any spectrum tightly with any cluster. However, FCM, FPCM, and UPFC with m = 1.36 obtain appropriate membership distributions in which the majority of the spectra are strongly associated with one given cluster. This is in line with our demand for fuzzy clustering of water spectra, which seeks a good compromise between the need to assign most spectra to a given cluster and the need to discriminate spectra that classify poorly [7,11,12,32]. As m continues to increase, the membership degree for each cluster becomes equal to 1/K [47], as in Fig. 3(g) where m = 5.6. According to statistical results (not shown here), the membership degree at m = 3.6 has already reached the upper limit of the data set. Lastly, considering stability and simplicity, FCM was chosen to be improved for fuzzy clustering of the water spectra data set.

Fig. 3. The distribution of membership degree (between 0 and 1) from fuzzy clustering algorithms (FCM, FPCM, and UPFC) influenced by the distinct fuzzifier parameter m; the data set includes 1280 spectra of inland waters with 15 OLCI bands; the jitter plots indicate the membership distribution of each cluster sorted by the sum of all samplings; the first row (a-c) and the first column (a, d, and g) are the membership degree distribution of FCM with increasing fuzzifier m (i.e., 1.1, 1.36, 2, 3.6, 5.6); (e-f) and (h-i) are the results of FPCM and UPFC with fuzzifier m = 1.36 and m = 2, respectively.

Given the optimized m = 1.36, seven clusters were obtained using FCM-m with a mean fuzzy silhouette width of 0.5135, indicating a well-performing clustering structure (Table 3). The distributions of a_CDOM(440), C_Chla, C_TSM, and SDD for each cluster were also analyzed, as shown in Fig. 4(i)–4(l). The clusters are well separated and compact, especially for C_Chla and C_TSM. However, the distribution of a_CDOM(440) is similar in most clusters, while the variation of SDD in Cluster 3 is much wider than in the others. Specifically, Cluster 6 has the strongest variability and presents vegetation-like spectra (Fig. 4(g)) with extremely high C_Chla and C_TSM, reaching 1000 mg/m³ and 100 mg/L, respectively. Meanwhile, the high absorption of CDOM at 440 nm (from 1.0 to 3.0 m⁻¹) results in low reflectance of Cluster 6. The C_Chla and C_TSM of Cluster 1, which has a weak reflectance peak at 709 nm, rank next after those of Cluster 6. Similar to Cluster 1, the optical clarity of Cluster 2, indicated by SDD, is higher due to smaller C_TSM, around 32 mg/L, and higher C_Chla, reaching 100 mg/m³. Thus, Cluster 1 and 2 are considered productive waters. The spectra of Cluster 2 show more obvious peaks and higher reflectance in the near-infrared (NIR) region, making it the optical neighbor of Cluster 6. Cluster 3 and 4 belong to relatively clean inland water types, showing low-reflectance spectra (Fig. 4(d) and 4(e)), lower C_Chla (near 10 mg/m³), and less absorption of CDOM at 440 nm. Of the two, Cluster 3 is much cleaner, since its C_TSM is below 10 mg/L whereas that of Cluster 4 is above 10 mg/L. Finally, Cluster 5 and 7 are deemed to be turbid waters dominated by sediments, with a high concentration of TSM reaching 100 mg/L but low C_Chla (around 10 mg/m³), with Cluster 7 being cleaner. A brief description and example waters of each cluster are provided in Table 2. The detailed spectral values of the cluster centers at OLCI bands are given in Data File 1.

Fig. 4. (a) Combination of the mean raw remote sensing reflectance (R_rs, unit: sr⁻¹) for seven clusters in inland waters with the highest membership which is defined by FCM (m = 1.36) with area-normalized R_rs spectra as input; (b-h) raw R_rs associated with seven optical water types; bold black lines denote the mean R_rs for each cluster; gray lines denote members of each cluster; among them, (g) has a wider y-axis range as its cluster is dominated by floating algae or very high concentrations of biomass; (i-l) boxplots for CDOM absorption coefficient at 440 nm (m⁻¹), Chla concentration (mg/m³), TSM concentration (mg/L) and SDD (m) across the seven optical water types with log10 scale in y-axis.

Table 2. Dominant characteristics of clusters in inland waters.

3.2 Satellite image application in case studies

Maps of water optical clusters and membership degrees could be obtained by FCM-m with corrected image spectral reflectance as input. In this section, three representative lakes, namely Lake Taihu, Lake Hongze, and Lake Erhai, were selected for testing FCM-m, since they present high variability of water color which has been extensively reported in previous studies [35,40,44,55–57]. The selected OLCI images were atmospherically corrected prior to FCM-m by utilizing ACbTC (Atmospheric Correction based on Turbidity Classification) method designed for complex inland waters (see more details about imagery process in our previous study [35]). The ACbTC method achieved full-band average values of the mean absolute percentage error (MAPE) = 29.55%, providing the relatively accurate water spectral magnitude and shape as input for FCM-m [35].
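One plausible way to produce such maps, assuming the cluster centers from the in situ training set are held fixed, is to normalize each pixel spectrum and evaluate the FCM membership formula against those centers. The sketch below follows that assumption; the function names and the tiny synthetic image are hypothetical, and masking of land, cloud, or invalid pixels is omitted.

```python
import numpy as np

def area_normalize(spectra, wavelengths):
    """Divide each spectrum (last axis) by its trapezoidal integral."""
    widths = np.diff(wavelengths)
    area = np.sum(0.5 * (spectra[..., 1:] + spectra[..., :-1]) * widths, axis=-1)
    return spectra / area[..., None]

def classify_image(rrs_img, centers, wavelengths, m=1.36):
    """Return membership maps (K x H x W) and the dominant-cluster map (H x W)."""
    H, W, p = rrs_img.shape
    X = area_normalize(rrs_img.reshape(-1, p), wavelengths)       # pixels as rows
    d2 = ((X[None, :, :] - centers[:, None, :]) ** 2).sum(axis=2)
    d2 = np.maximum(d2, 1e-30)                                    # avoid division by zero
    w = (d2 / d2.min(axis=0, keepdims=True)) ** (-1.0 / (m - 1.0))
    U = w / w.sum(axis=0, keepdims=True)                          # memberships per pixel
    return U.reshape(-1, H, W), U.argmax(axis=0).reshape(H, W)

# Synthetic 2 x 2 image with 3 bands and two spectral shapes at different magnitudes.
wavelengths = np.array([400.0, 500.0, 600.0])
shape_a, shape_b = np.array([1.0, 2.0, 3.0]), np.array([3.0, 2.0, 1.0])
centers = area_normalize(np.stack([shape_a, shape_b]), wavelengths)
img = np.array([[0.01 * shape_a, 0.02 * shape_b],
                [0.50 * shape_b, 3.00 * shape_a]])
U_maps, dominant = classify_image(img, centers, wavelengths)
```

Because the spectra are area-normalized first, pixels with the same spectral shape fall into the same cluster regardless of their reflectance magnitude.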

3.2.1 Lake Taihu

The true color map of Lake Taihu on July 24, 2017 captured great spatial heterogeneity of water color (Fig. 5(a)). Visually, the algal blooms, occurring in the western bay, were dark green. However, the center of Lake Taihu is darker while the southwest and south are lighter, deemed as more turbid.

Fig. 5. (a) True color map for an atmospherically corrected OLCI image using 2% linear stretching over Lake Taihu, a shallow and phytoplankton-dominated inland lake, on July 24, 2017; (b) the dominant cluster defined by the highest membership degree; (c-i) membership maps for seven clusters; purple areas denote zero or low membership; green areas denote medium membership; yellow areas denote high membership; red lines in (a) and (b) are transects for subsequent analysis.

These visual observations agree with the results of FCM-m, shown in Fig. 5(b). Specifically, Cluster 1 and 7 are mainly distributed in the northern bay, namely Meiliang Bay, characterized as a turbid and productive area. The clustering result of FCM-m reveals more spatial detail in the complexly distributed western area. The water clusters are arranged like a ladder from the near-shore regions to the lake center, i.e., from Cluster 6 to Cluster 1 and 2, and then to Cluster 3 and 4. Moreover, the color difference at the two northwestern estuaries can also be observed. Cluster 1 and 7, termed turbid waters, dominate the southern off-shore areas. Collectively, the membership maps display a more “strict” clustering result, which guarantees that in-cluster members belong firmly to one given cluster and strictly rejects objects (namely pixels with spectral information) outside the cluster. An appropriate example, shown in Fig. 5(h), is that the membership degrees of Cluster 6 in Lake Taihu could be regarded as the result of HCM to some extent; that is, the membership follows a bipolar distribution of 0 and 1. However, the edges of some clusters show medium membership, around 0.5, as in Fig. 5(f) and 5(i). These results are what we desire from the fuzzy logic clustering method, since there is inevitably a fuzzy state belonging to two (or more) clusters where one cluster transitions to another. In conclusion, the clustering result of FCM-m for the Lake Taihu OLCI image is reasonable based on our prior knowledge of Lake Taihu [44], and FCM-m achieves a compromise between over-soft clustering and HCM.

3.2.2 Lake Hongze

The second case, Lake Hongze, shows the typical spatial pattern of water color described by Cao et al. (2017), in which both the estuary and lake center are turbid [40]. Correspondingly, these regions are divided into Cluster 5 and 7, as shown in Fig. 6(b). However, the productive turbid waters, characterized as Cluster 1 and 2, were distributed in western areas which contain macrophytes and higher concentrations of phytoplankton and CDOM than other parts of Lake Hongze [56]. Relatively clean waters, Cluster 3 and 4, could be found in calm bays where re-suspension of sediments is weak due to the topographic and environmental conditions [40]. Additionally, some pixels at two rivers in the south were classified as Cluster 6, but we do not consider them pure water signals, as narrow rivers are vulnerable to stray light from adjacent land or vegetation [35].

Fig. 6. (a) True color map for an atmospherically corrected OLCI image using 2% linear stretching over Lake Hongze, a shallow and turbid inland lake, on May 18, 2017; (b) the dominant cluster defined by the highest membership degree; (c-i) membership maps for seven clusters; purple areas denote zero or low membership; green areas denote medium membership; yellow areas denote high membership.

3.2.3 Lake Erhai

Lake Erhai is considered a clean inland lake with low reflectance [45], appearing black in the true color map shown in Fig. 7(a). The lake is mainly divided into Cluster 3 and 4, located in the central and southern parts (Fig. 7(b)). Meanwhile, Cluster 5 and 7 are located in the northern part of Lake Erhai. This could be explained by the high wind speed in the north, resulting in sediment resuspension with higher C_TSM [57]. In contrast, Cluster 7 in the south is mainly caused by runoff from the west of Lake Erhai, which is consistent with the observation of the river plume in our previous study [55]. Lake Erhai is not a highly productive lake, as there is no apparent high-value distribution in the membership maps of Cluster 1, 2, and 6 (Fig. 7(c), 7(d) and 7(h)). The water classified as Cluster 6 around the lakeshore is likely caused by submerged plants on shallow terrain, although the adjacency effect cannot be excluded [55].

Fig. 7. (a) True color map for an atmospherically corrected OLCI image using 2% linear stretching over Lake Erhai, a deep and low-reflectance inland lake, on April 19, 2017; (b) the dominant cluster defined by the highest membership degree; (c-i) membership maps for seven clusters; purple areas denote zero or low membership; green areas denote medium membership; yellow areas denote high membership.

4. Discussion

4.1 Is fuzzifier m equal to 2 in FCM appropriate for water optical data sets?

Although few studies have discussed the fuzzifier parameter m in water optical clustering, Moore, et al. [53] studied the influence of different m values of FCM on clustering results based on 159 coastal spectra, and revealed that the cluster quality begins to decline at m equal to 2 or more. Moore’s finding is consistent with our results that the three candidate fuzzy clustering methods presented softer membership distributions with m ≥ 2, as shown in Fig. 3. Is it therefore reasonable to default to m = 2 in fuzzy clustering methods for water optical classification? According to our research, the m value should be adjusted to the data set, for the following reasons. First, we recognize that m = 2 simplifies the computation in the FCM algorithm and can also give consistent fuzzy results for low-dimensional data sets, as reported by previous studies [31,32,47,51,52]. However, in some complex applications of FCM, such as high-dimensional DNA or spectral data [32,53], the result with m = 2 is too soft, so that an object assigns more membership to clusters to which it does not belong. This is not conducive to applications of FCM such as the blending of retrieval algorithms [12,19,20,29], even though the clustering itself may not change. Second, when spectrally vegetation-like waters are mixed into the training data set, their spectra cannot be separated by FCM with m = 2. Algal blooms are very common in eutrophic lakes, with reports even of large-scale blooms [19,22,33], and we believe that, as an optical water type of the inland aquatic system, they should be included in the clustering results. However, bloom spectra could not be found in previous studies using fuzzy clustering (or soft classification) methods, which we attribute to the inappropriate m value. In contrast, the HCM-based clustering method proposed by Spyrakos, et al. [30] could obtain these bloom spectra. Finally, we assessed the proposed FCM-m on several published data sets [58,59] and AERONET-OC data, as shown in Table 3. The results indicate that the value of m_ub should depend only on the data set itself, and is then transformed to m_used for use in FCM. We also found, as a general conclusion, that the appropriate m_used was closer to 1 when abnormal samples (such as bloom waters) were incorporated into the data set and when the dimension (number of bands) increased.

Table 3. Parameters used for the FCM algorithm. N, p, and K denote the number of samplings, dimension (or bands), and the optimized cluster number of each data set, respectively; each data set was clustered using both normalized (by dividing with the spectrum integral) and non-normalized spectra; FCM cluster results include upper bound m (m_ub), final used m (m_used), and the mean of SIL.F with better performance in bold font; besides the data set of inland waters used in this study, the same data set excluding the spectrally vegetation-like type (this study 2) and five additional data sets were included.

The failure of any single algorithm for OWC retrieval has inspired the idea of water optical classification. However, existing classification strategies seem to pay more attention to the accuracy of the final blended algorithm [7,12,14,20]. The membership degrees along one transect at Lake Taihu (Fig. 5), obtained by FCM with m = 1.36 and m = 2, are shown in Fig. 8. It is evident that the result of FCM with m = 2 (the default setting of previous studies) is unreasonable, since it weakens the membership of in-cluster members and enlarges the membership of other clusters. Moreover, FCM with m = 2 delays the gradual transition from one cluster to another, as shown in Fig. 8(b), where the membership of Cluster 6 (the bloom water type) does not approach zero until about pixel 25. Notably, transitional water areas are expected to share similar membership degrees at the intersections from one cluster to another. The deviation accumulated from clusters with the second-highest membership degree (often called neighbors) should be taken into account, since they contribute greatly to the final algorithm blending, especially for pixels near water blooms in eutrophic lakes.
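A two-cluster calculation illustrates how strongly m controls this softening. It uses the standard FCM membership update for fixed centroids, with d_ij denoting the feature distance ||x_j − v_i||; the distance ratio of 1/4 is an arbitrary value chosen for illustration and is not taken from the data.

$$u_{1j} = \frac{1}{1 + \left( d_{1j}^2 / d_{2j}^2 \right)^{\frac{1}{m - 1}}}, \qquad \frac{d_{1j}^2}{d_{2j}^2} = \frac{1}{4}$$

$$m = 2:\; u_{1j} = \frac{1}{1 + 0.25} = 0.80, \qquad m = 1.36:\; u_{1j} = \frac{1}{1 + 0.25^{1/0.36}} \approx \frac{1}{1 + 0.021} \approx 0.98$$

For the same pixel, the neighboring cluster therefore keeps a membership of 0.20 with m = 2 but only about 0.02 with m = 1.36.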

Fig. 8. The membership degree of seven clusters of one transect at Lake Taihu shown in Fig. 5 by FCM with (a) fuzzifier parameter m = 1.36 and (b) m = 2.

4.2 Can FCM-m improve the Chla estimation by blending algorithms?

Two widely used C_Chla estimation algorithms, namely the Band-Ratio (BR) and Three-Band algorithm (TBA), were selected to test whether blending algorithms based on FCM-m improve the retrieval result. Specifically, we used the equations parameterized by Gilerson, et al. [60] for coastal and inland water systems. The optimal algorithm for each optical water cluster was then selected by evaluating its C_Chla performance, as elaborated in previous studies, and the algorithms were blended based on the membership matrix from FCM-m with the fuzzifier parameter m equal to 1.36 and 2 (see Moore, et al. [12] and references therein).
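The blending step can be sketched as a membership-weighted average of the estimates from each cluster's optimal algorithm. The sketch assumes a linear weighting and uses made-up estimates in place of the BR and TBA outputs; the function name, the cluster-to-algorithm assignment, and the numbers are all hypothetical.

```python
import numpy as np

def blend_estimates(U, estimates, best_algorithm):
    """Membership-weighted blend of the optimal algorithm of each cluster.

    U: K x N membership matrix.
    estimates: dict mapping an algorithm name to a length-N array of estimates.
    best_algorithm: length-K list naming the optimal algorithm of each cluster.
    """
    per_cluster = np.stack([estimates[name] for name in best_algorithm])  # K x N
    return (U * per_cluster).sum(axis=0) / U.sum(axis=0)

# Two clusters, two pixels: the first pixel lies fully in cluster 1,
# the second is split evenly between the two clusters.
U = np.array([[1.0, 0.5],
              [0.0, 0.5]])
estimates = {"BR": np.array([10.0, 20.0]),     # placeholder values
             "TBA": np.array([30.0, 40.0])}    # placeholder values
blended = blend_estimates(U, estimates, best_algorithm=["BR", "TBA"])
```

With stricter memberships, a pixel that clearly belongs to one cluster takes almost all of its estimate from that cluster's optimal algorithm, which limits the error imported from poorly performing water types.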

Here we selected MRPE (Median Relative Percent Error), bias, and MAE (Median Absolute Error) as error metrics for assessing the performance of the two algorithms. As recommended by Seegers, et al. [3], all Chla concentrations were log-transformed before the error metrics were calculated. As shown in Table 4, TBA performed better in productive waters (Clusters 1 and 2), while BR obtained lower MRPE, bias, and MAE in relatively clean waters (Clusters 3 and 4). In turbid waters (Clusters 5 and 7) the two algorithms differ considerably, with TBA performing much better than BR in Cluster 5. However, for Cluster 6 waters with extremely high Chla concentration, both TBA and BR show higher error values than in the other clusters (MRPE > 20%). This is mainly due to the lack of a C_Chla estimation algorithm suited to Cluster 6. Because the blending strategy inevitably carries error from the non-optimal algorithms into the final result, the aim is to reduce this avoidable error. In other words, attention must be paid to the effect of large errors in some water types (such as Cluster 6) on the final blended results. The blended results with m = 1.36 obtained more acceptable error metrics than those with m = 2, both for the individual clusters and for all clusters combined, mainly because FCM-m reduces the error contributed by poorly performing types through stricter membership assignment. As reported previously [60], the way yellow-matter absorption affects models that use different bands is important for accurately estimating Chla concentrations. The properties of productive, clean, and turbid waters vary widely, which makes it reasonable to apply a specific algorithm to each water type so that its model assumptions are satisfied. Thus, rather than proposing a universal C_Chla algorithm at present, it is more useful to improve the accuracy for particular water types (such as Clusters 5 and 7, with low C_Chla but high turbidity, or Cluster 6, with extremely high C_Chla).

Table 4. Median Relative Percent Error (MRPE), bias and Median Absolute Error (MAE) for Band-Ratio (BR), Three-Band algorithm (TBA), and blended C_Chla concentration based on results of FCM-m with fuzzifier parameter m equal to 1.36 and 2, assessed by the in situ measured value within the range from 1 to 200 mg/m³; the bold values denote optimal algorithm for that optical water cluster.


4.3 Comparison of different water optical clustering results

In this study, only R_rs spectra from inland waters were selected for FCM training, rather than pooling open ocean, coastal, and inland waters. We consider this reasonable, in line with Spyrakos, et al. [30], who subdivided Case 2 waters into inland and coastal systems. The water optical clusters provided in this study complement the global optical water types. However, they do not include the "blue-water" type reported in previous studies [30,61], because this type of water was not found in the inland systems we investigated, which may be an inadequacy of our data set. Further refinement of the method is therefore necessary to justify the classification scheme when more data are considered. The FCM-m method should nevertheless be capable of distinguishing blue-water spectra by controlling the fuzzifier parameter, which could be examined in future studies. We believe that FCM-m is an adaptive algorithm, since it depends on the input data themselves, and we are glad to make the FCM-m code public so that the community can improve it further.

Comparing our clustering results with those reported by Moore, et al. [12] (Fig. 9), several similar clusters were found, although they came from different training samples. Meanwhile, no low-reflecting water type (such as TM2 in Fig. 9) was found in the clustering results of this study, mainly because the input of FCM was first normalized. Eleveld, et al. [21] suggested that normalized spectra perform better as input for deep, low-reflecting clear lakes, whereas non-normalized spectra perform better for high-reflecting lakes with a high sediment load. Normalizing or not normalizing the input samples shifts the emphasis of the cluster results, as reported by Jackson, et al. [14] and Vantrepotte, et al. [29], so the clusters obtained in this study may neglect some low-reflecting inland waters to some extent. However, because phytoplankton dominates in clean water (but not in blue-water) [7], the spectral shapes of these waters are still represented in our results, such as the water types clustered as Clusters 3 and 4 in Lake Erhai and Lake Qiandao.

Fig. 9. The mean raw remote sensing reflectance (R_rs, unit: sr⁻¹) of seven clusters in inland waters based on FCM-m and data set in this study (namely line C1-7), compared with the cluster results in the study of Moore, et al. [12] denoted as gray dashed lines (namely line TM1-7).


4.4 Advantages, expansibility and limitations of FCM-m

We consider the advantages of this study to be as follows. First, the FCM method is used for water optical clustering; compared with HCM, it provides membership degrees and yields a more reasonable spatial separation of natural water types. Second, the resulting water optical clusters include hypereutrophic waters (i.e., the spectrally vegetation-like type), which is more in line with the actual situation of inland systems. Third, the fuzzifier parameter of FCM is optimized, which preserves a high membership degree for the cluster to which a sample belongs. In future applications, this study could support more reasonable flag masking in satellite data pre-processing based on the membership degree. The membership degree can also be transformed into algorithm weights in a blending framework, which here improved the estimation performance of the parameterized algorithms.

To examine the expansibility and stability of FCM-m for other optical sensors, we re-sampled the hyperspectral R_rs spectra to six other sensors and used them as the input of FCM with the fuzzifier m in a predefined range (1.1, 1.3, 1.5, 1.7, 2.0, 2.5, 3.0, 4.0, and 5.0). The six sensors are OLI (Operational Land Imager onboard Landsat-8, with 5 available bands), VIIRS (Visible Infrared Imaging Radiometer Suite, Suomi NPP, 7), GOCI (Geostationary Ocean Color Imager, COMS, 8), MSI (Multispectral Instrument, Sentinel-2, 9), MODIS (Aqua/Terra, 13), and MERIS (Medium Resolution Imaging Spectrometer, ENVISAT, 13). Note that the cluster number was fixed at 7, the same as for the OLCI result, which lets us focus on the effect of the data set dimension (i.e., the number of bands) and the fuzzifier parameter m on the FCM results (Fig. 10). The optimized m_used increases as the number of bands decreases, which is consistent with the result of Yu, et al. [51]. For a better clustering structure (or membership distribution), a reduced m value is therefore recommended for remote sensing spectra with high dimensions (many bands), as for DNA microarray data [32].

Fig. 10. The collected hyperspectral R_rs spectra of inland waters (n=1280) were re-sampled to 7 popular sensors, namely OLI (on Landsat-8 with 5 available bands), VIIRS (7), GOCI (8), MSI (9), MODIS (13), MERIS (13), and OLCI (15), and then used as the input of FCM with fuzzifier m in a predefined range (1.1, 1.3, 1.5, 1.7, 2.0, 2.5, 3.0, 4.0, and 5.0). (a) The relationship between the optimal fuzzifier m (m_used), determined by FCM-m, and the number of sensor bands. (b) The trend of SIL.F values, which represent the goodness of the FCM results, within the predefined range of m. The circles of different colors mark the best m position for each sensor.


However, this study still has some limitations. Given that water optical clustering is a fit-for-purpose scheme, the m_used optimized by FCM-m may not be the best value (a finer step in section 2.2.2 could yield more accurate values). Furthermore, SIL.F evaluates only the clustering structure [54] and ignores the actual benefit of the membership degree, so quantitative criteria are still needed to judge whether the parameters are reasonable. Nevertheless, after fine-tuning the m_used parameter on different data sets (not shown here), we found that the distribution of membership degrees did not change greatly, which indicates that using FCM-m to calculate the m value is practical at present. In addition, water optical clustering is very sensitive to the quality of the input data [12,14,62]. First, in situ spectra should strictly follow the same measurement method, so that the clustering results are attributable to differences in water properties rather than to systematic errors from methods or devices. Second, when the method is applied to satellite images, the uncertainty of atmospheric correction can also cause clustering to fail, as reported in previous studies [12,21]. It may be necessary to establish a fuzzy clustering framework based on Rayleigh-corrected R_rs. This is feasible because aerosol scattering reflectance could be added to the in situ spectra through radiative transfer models or established lookup tables [63].
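To make concrete why SIL.F reflects clustering structure only, the sketch below computes a fuzzy silhouette in the sense of [54]: each sample's crisp silhouette width is weighted by the difference between its two largest membership degrees, raised to a power alpha. This is an illustrative NumPy version under stated assumptions (Euclidean distance, alpha = 1, at least two non-empty clusters, at least one sample with unequal top memberships), not the implementation used in the paper; the function name and toy data are hypothetical.

```python
import numpy as np

def fuzzy_silhouette(X, U, alpha=1.0):
    """Fuzzy silhouette: crisp silhouette widths weighted by
    (largest membership - second largest membership) ** alpha."""
    X = np.asarray(X, dtype=float)
    U = np.asarray(U, dtype=float)
    n, k = U.shape
    labels = U.argmax(axis=1)  # hard assignment used for the crisp widths
    # Pairwise Euclidean distances between samples.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

    s = np.zeros(n)
    for i in range(n):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own < 2:
            continue  # singleton cluster: width left at 0 by convention
        a = D[i, own].sum() / (n_own - 1)  # mean distance within own cluster
        b = min(D[i, labels == c].mean()   # nearest other cluster
                for c in range(k)
                if c != labels[i] and np.any(labels == c))
        s[i] = (b - a) / max(a, b)

    top_two = np.sort(U, axis=1)[:, -2:]   # columns: second largest, largest
    w = (top_two[:, 1] - top_two[:, 0]) ** alpha
    return float((w * s).sum() / w.sum())

# Toy data: two well-separated groups with fairly crisp memberships.
X = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
U = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
fs = fuzzy_silhouette(X, U)
print(round(fs, 3))
```

The index depends only on geometry and on how decisive the memberships are, so it says nothing about whether those memberships improve a downstream retrieval, which is the limitation noted above.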

5. Conclusion

At present, fuzzy clustering is widely used in water optical classification with the default fuzzifier parameter. However, this preset value might not be suitable for all types of water bodies. Therefore, we proposed an improved FCM method, FCM-m, for water optical clustering. The new method optimizes the fuzzifier parameter m based on an upper bound of m (m_ub) that depends on the input data set. Using a widely distributed in situ data set from Chinese inland waters, we generated seven representative water optical clusters with FCM-m. The method was also evaluated on atmospherically corrected image scenes ranging from low-reflecting to sediment- or phytoplankton-dominated waters. Our results can be considered an extension of the current global optical water types for inland waters, since we took into account the spectrally vegetation-like water type. By testing three additional oceanic and coastal data sets, we found that a reduced m is recommended for fuzzy clustering and that the optimized m gradually approaches 1 as the number of bands increases. Further, thanks to the stricter membership assignment of FCM-m, algorithm blending based on FCM-m performs better than blending based on the original FCM method. Finally, we believe that FCM-m is an adaptive algorithm; its R code is available at https://github.com/bishun945, and it needs to be tested on more public data sets.

Funding

National Key R&D Program of China (2017YFB0503902); National Natural Science Foundation of China (41671340, 41701412, 41701423); Major Science and Technology Program for Water Pollution Control and Treatment (2017ZX07302-003); Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX18_1205).

Acknowledgments

We greatly appreciate Prof. Dick Brus, Dr. Thomas Jackson, and Dr. Doulaye Dembele for providing constructive suggestions about the fuzzy clustering method. We would also like to thank the three anonymous reviewers for their careful reading of the text and for many useful suggestions.

References

1. A. Morel and L. Prieur, “Analysis of variations in ocean color 1,” Limnol. Oceanogr. 22(4), 709–722 (1977). [CrossRef]

2. C. D. Mobley, “Optical modeling of ocean waters: Is the case 1-case 2 classification still useful?” Oceanogr 17(2), 60–67 (2004). [CrossRef]

3. B. N. Seegers, R. P. Stumpf, B. A. Schaeffer, K. A. Loftin, and P. J. Werdell, “Performance metrics for the assessment of satellite data products: an ocean color case study,” Opt. Express 26(6), 7404–7422 (2018). [CrossRef]

4. J. Lin, H. Lyu, S. Miao, Y. Pan, Z. Wu, Y. Li, and Q. Wang, “A two-step approach to mapping particulate organic carbon (POC) in inland water using OLCI images,” Ecological Indicators 90, 502–512 (2018). [CrossRef]

5. K. Xue, R. Ma, H. Duan, M. Shen, E. Boss, and Z. Cao, “Inversion of inherent optical properties in optically complex waters using sentinel-3A/OLCI images: A case study using China's three largest freshwater lakes,” Remote Sens. Environ. 225, 328–346 (2019). [CrossRef]

6. D. Sun, Y. Li, Q. Wang, C. Le, C. Huang, and K. Shi, “Development of optical criteria to discriminate various types of highly turbid lake waters,” Hydrobiologia 669(1), 83–104 (2011). [CrossRef]

7. K. Xue, R. Ma, D. Wang, and M. Shen, “Optical Classification of the Remote Sensing Reflectance and Its Application in Deriving the Specific Phytoplankton Absorption in Optically Complex Lakes,” Remote Sens. 11(2), 184 (2019). [CrossRef]

8. Y. Zhang, Y. Zhou, K. Shi, B. Qin, X. Yao, and Y. Zhang, “Optical properties and composition changes in chromophoric dissolved organic matter along trophic gradients: Implications for monitoring and assessing lake eutrophication,” Water Res. 131, 255–263 (2018). [CrossRef]

9. C. Hu, Z. Lee, and B. Franz, “Chlorophyll a algorithms for oligotrophic oceans: A novel approach based on three-band reflectance difference,” J. Geophys. Res.: Oceans 117(C1), 117 (2012). [CrossRef]

10. G. Zheng and P. M. DiGiacomo, “Uncertainties and applications of satellite-derived coastal water quality products,” Prog. Oceanogr. 159, 45–72 (2017). [CrossRef]

11. T. S. Moore, J. W. Campbell, and M. D. Dowell, “A class-based approach to characterizing and mapping the uncertainty of the MODIS ocean chlorophyll product,” Remote Sens. Environ. 113(11), 2424–2430 (2009). [CrossRef]

12. T. S. Moore, M. D. Dowell, S. Bradt, and A. R. Verdu, “An optical water type framework for selecting and blending retrievals from bio-optical algorithms in lakes and coastal waters,” Remote Sens. Environ. 143, 97–111 (2014). [CrossRef]

13. J. Pitarch, H. J. van der Woerd, R. J. Brewin, and O. Zielinski, “Optical properties of Forel-Ule water types deduced from 15 years of global satellite ocean color observations,” Remote Sens. Environ. 231, 111249 (2019). [CrossRef]

14. T. Jackson, S. Sathyendranath, and F. Mélin, “An improved optical classification scheme for the Ocean Colour Essential Climate Variable and its applications,” Remote Sens. Environ. 203, 152–161 (2017). [CrossRef]

15. N. G. Jerlov, “Classification of sea water in terms of quanta irradiance,” ICES J. Mar. Sci. 37(3), 281–287 (1977). [CrossRef]

16. N. G. Jerlov and F. F. Koczy, Photographic measurements of daylight in deep water (Elanders boktr., 1951).

17. M. G. Solonenko and C. D. Mobley, “Inherent optical properties of Jerlov water types,” Appl. Opt. 54(17), 5392–5401 (2015). [CrossRef]

18. C. Le, Y. Li, Y. Zha, D. Sun, C. Huang, and H. Zhang, “Remote estimation of chlorophyll a in optically complex waters based on optical classification,” Remote Sens. Environ. 115(2), 725–737 (2011). [CrossRef]

19. F. Zhang, J. Li, Q. Shen, B. Zhang, C. Wu, Y. Wu, G. Wang, S. Wang, and Z. Lu, “Algorithms and Schemes for Chlorophyll a Estimation by Remote Sensing and Optical Classification for Turbid Lake Taihu, China,” IEEE J. Sel. Top. Appl. Earth Observations Remote Sensing 8(1), 350–364 (2015). [CrossRef]

20. M. E. Smith, L. R. Lain, and S. Bernard, “An optimized Chlorophyll a switching algorithm for MERIS and OLCI in phytoplankton-dominated waters,” Remote Sens. Environ. 215, 217–227 (2018). [CrossRef]

21. M. Eleveld, A. Ruescas, A. Hommersom, T. Moore, S. Peters, and C. Brockmann, “An optical classification tool for global lake waters,” Remote Sens. 9(5), 420 (2017). [CrossRef]

22. M. W. Matthews and D. Odermatt, “Improved algorithm for routine monitoring of cyanobacteria and eutrophication in inland and near-coastal waters,” Remote Sens. Environ. 156, 374–382 (2015). [CrossRef]

23. K. Shi, Y. Li, Y. Zhang, L. Li, H. Lv, and K. Song, “Classification of inland waters based on bio-optical properties,” IEEE J. Sel. Top. Appl. Earth Observations Remote Sensing 7(2), 543–561 (2014). [CrossRef]

24. F. Mélin and V. Vantrepotte, “How optically diverse is the coastal ocean?” Remote Sens. Environ. 160, 235–251 (2015). [CrossRef]

25. N. R. Pal, K. Pal, and J. C. Bezdek, “A mixed c-means clustering model,” in Proceedings of 6th International Fuzzy Systems Conference, (IEEE, 1997), 11–21.

26. X. Wu, B. Wu, J. Sun, and H. Fu, “Unsupervised possibilistic fuzzy clustering,” J. Comput. Sci. 7, 1075–1080 (2010).

27. W. R. Tobler, “A computer movie simulating urban growth in the Detroit region,” Economic Geography 46, 234–240 (1970). [CrossRef]

28. L. A. Zadeh, “Fuzzy sets,” Information and Control 8(3), 338–353 (1965). [CrossRef]

29. V. Vantrepotte, H. Loisel, D. Dessailly, and X. Mériaux, “Optical classification of contrasted coastal waters,” Remote Sens. Environ. 123, 306–323 (2012). [CrossRef]

30. E. Spyrakos, R. O’Donnell, P. D. Hunter, C. Miller, M. Scott, S. G. Simis, C. Neil, C. C. Barbosa, C. E. Binding, and S. Bradt, “Optical types of inland and coastal waters,” Limnol. Oceanogr. 63(2), 846–870 (2018). [CrossRef]

31. D. Dembele, “Multi-objective optimization for clustering 3-way gene expression data,” Adv. Data Anal. Classif. 2(3), 211–225 (2008). [CrossRef]

32. D. Dembele and P. Kastner, “Fuzzy C-means method for clustering microarray data,” Bioinformatics 19(8), 973–980 (2003). [CrossRef]

33. M. Mu, C. Wu, Y. Li, H. Lyu, S. Fang, X. Yan, G. Liu, Z. Zheng, C. Du, and S. Bi, “Long-term observation of cyanobacteria blooms using multi-source satellite images: a case study on a cloudy and rainy lake,” Environ. Sci. Pollut. Res. 26(11), 11012–11028 (2019). [CrossRef]

34. J. L. Mueller, C. Davis, R. Arnone, R. Frouin, K. Carder, Z. Lee, R. Steward, S. Hooker, C. D. Mobley, and S. McLean, “Above-water radiance and remote sensing reflectance measurements and analysis protocols,” Ocean Optics protocols for satellite ocean color sensor validation Revision 2, 98–107 (2000).

35. S. Bi, Y. Li, Q. Wang, H. Lyu, G. Liu, Z. Zheng, C. Du, M. Mu, J. Xu, S. Lei, and S. Miao, “Inland Water Atmospheric Correction Based on Turbidity Classification Using OLCI and SLSTR Synergistic Observations,” Remote Sens. 10(7), 1002 (2018). [CrossRef]

36. K. Shi, Y. Li, L. Li, and H. Lu, “Absorption characteristics of optically complex inland waters: Implications for water optical classification,” J. Geophys. Res. Biogeosci. 118(2), 860–874 (2013). [CrossRef]

37. Z. Zheng, J. Ren, Y. Li, C. Huang, G. Liu, C. Du, and H. Lyu, “Remote sensing of diffuse attenuation coefficient patterns from Landsat 8 OLI imagery of turbid inland waters: a case study of Dongting Lake,” Sci. Total Environ. 573, 39–54 (2016). [CrossRef]

38. Y. Zhang, E. Zhang, and M. Liu, “Spectral absorption properties of chromophoric dissolved organic matter and particulate matter in Yunnan Plateau lakes,” J. Lake Sci. 21(2), 255–263 (2009). [CrossRef]

39. B. G. Mitchell, M. Kahru, J. Wieland, M. Stramska, and J. Mueller, “Determination of spectral absorption coefficients of particles, dissolved material and phytoplankton for discrete water samples,” Ocean optics protocols for satellite ocean color sensor validation Revision 3, 231 (2002).

40. Z. Cao, H. Duan, L. Feng, R. Ma, and K. Xue, “Climate-and human-induced changes in suspended particulate matter over Lake Hongze on short and long timescales,” Remote Sens. Environ. 192, 98–113 (2017). [CrossRef]

41. Z. Zheng, Y. Li, Y. Guo, Y. Xu, G. Liu, and C. Du, “Landsat-based long-term monitoring of total suspended matter concentration pattern change in the wet season for Dongting Lake, China,” Remote Sens. 7(10), 13975–13999 (2015). [CrossRef]

42. X. Hou, L. Feng, H. Duan, X. Chen, D. Sun, and K. Shi, “Fifteen-year monitoring of the turbidity dynamics in large lakes and reservoirs in the middle and lower basin of the Yangtze River, China,” Remote Sens. Environ. 190, 107–121 (2017). [CrossRef]

43. K. Xue, Y. Zhang, R. Ma, and H. Duan, “An approach to correct the effects of phytoplankton vertical nonuniform distribution on remote sensing reflectance of cyanobacterial bloom waters,” Limnol. Oceanogr.: Methods 15(3), 302–319 (2017). [CrossRef]

44. K. Shi, Y. Zhang, Y. Zhou, X. Liu, G. Zhu, B. Qin, and G. Gao, “Long-term MODIS observations of cyanobacterial dynamics in Lake Taihu: Responses to nutrient enrichment and meteorological factors,” Sci. Rep. 7(1), 40326 (2017). [CrossRef]

45. B. Matsushita, W. Yang, G. Yu, Y. Oyama, K. Yoshimura, and T. Fukushima, “A hybrid algorithm for estimating the chlorophyll-a concentration across different trophic states in Asian inland waters,” ISPRS J. Photogramm. and Remote Sensing 102, 28–37 (2015). [CrossRef]

46. Y. Zhang, K. Shi, Y. Zhang, M. J. Moreno-Madriñán, G. Zhu, Y. Zhou, and X. Yao, “Long-term change of total suspended matter in a deep-valley reservoir with HJ-1A/B: implications for reservoir management,” Environ. Sci. Pollut. Res. 26(3), 3041–3054 (2019). [CrossRef]

47. J. C. Bezdek, Pattern recognition with fuzzy objective function algorithms (Springer Science & Business Media, 2013).

48. R Development Core Team, R: Probabilistic and Possibilistic Cluster Analysis, 2019.

49. R Development Core Team, R: Fuzzy Clustering, 2019.

50. W. Wang and Y. Zhang, “On fuzzy cluster validity indices,” Fuzzy Sets and Systems 158(19), 2095–2117 (2007). [CrossRef]

51. J. Yu, Q. Cheng, and H. Huang, “Analysis of the weighting exponent in the FCM,” IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 34(1), 634–639 (2004). [CrossRef]

52. M. R. Rezaee, B. P. Lelieveldt, and J. H. Reiber, “A new cluster validity index for the fuzzy c-mean,” Pattern Recognit. Lett. 19(3-4), 237–246 (1998). [CrossRef]

53. T. S. Moore, J. W. Campbell, and H. Feng, “A fuzzy logic classification scheme for selecting and blending satellite ocean color algorithms,” IEEE T. Geosci. Remote Sensing 39(8), 1764–1776 (2001). [CrossRef]

54. R. J. Campello and E. R. Hruschka, “A fuzzy extension of the silhouette width criterion for cluster analysis,” Fuzzy Sets and Systems 157(21), 2858–2875 (2006). [CrossRef]

55. S. Bi, Y. Li, H. Lu, L. Zhu, M. Mu, S. Lei, S. Wen, and X. Ding, “Estimation of chlorophyll-a concentration in Lake Erhai based on OLCI data,” J. Lake Sci. 30(3), 701–712 (2018). [CrossRef]

56. Z. Cao, H. Duan, M. Shen, R. Ma, K. Xue, D. Liu, and Q. Xiao, “Using VIIRS/NPP and MODIS/Aqua data to provide a continuous record of suspended particulate matter in a highly turbid inland lake,” Int. J. Appl. Earth Obs. Geoinf. 64, 256–265 (2018). [CrossRef]

57. X. Han, L. Feng, X. Chen, and H. Yesou, “MERIS observations of chlorophyll-a dynamics in Erhai Lake between 2003 and 2009,” Int. J. Remote Sensing 35(24), 8309–8322 (2014). [CrossRef]

58. B. Nechad, K. Ruddick, T. Schroeder, K. Oubelkheir, D. Blondeau-Patissier, N. Cherukuru, V. Brando, A. Dekker, L. Clementson, and A. C. Banks, “CoastColour Round Robin data sets: a database to evaluate the performance of algorithms for the retrieval of water quality parameters in coastal waters,” Earth Syst. Sci. Data 7(2), 319–348 (2015). [CrossRef]

59. A. Valente, S. Sathyendranath, V. Brotas, S. Groom, M. Grant, M. Taberner, D. Antoine, R. Arnone, W. M. Balch, and K. Barker, “A compilation of global bio-optical in situ data for ocean-colour satellite applications,” Earth Syst. Sci. Data 8(1), 235–252 (2016). [CrossRef]

60. A. A. Gilerson, A. A. Gitelson, J. Zhou, D. Gurlin, W. Moses, I. Ioannou, and S. A. Ahmed, “Algorithms for remote estimation of chlorophyll-a in coastal and inland waters using red and near infrared bands,” Opt. Express 18(23), 24109–24125 (2010). [CrossRef]

61. M. Hieronymi, D. Müller, and R. Doerffer, “The OLCI Neural Network Swarm (ONNS): A bio-geo-optical algorithm for open ocean and coastal waters,” Front. Mar. Sci. 4, 140 (2017). [CrossRef]

62. J. Wei, Z. Lee, and S. Shang, “A system to measure the data quality of spectral remote-sensing reflectance of aquatic environments,” J. Geophys. Res.: Oceans 121, 8189–8207 (2016).

63. H. Liu, S. Hu, Q. Zhou, Q. Li, and G. Wu, “Revisiting effectiveness of turbidity index for the switching scheme of NIR-SWIR combined ocean color atmospheric correction algorithm,” Int. J. Appl. Earth Obs. Geoinfor. 76, 1–9 (2019). [CrossRef]

Data set	N	p	K	m_ub (normalized)	m_used (normalized)	SIL.F (normalized)	m_ub (non-normalized)	m_used (non-normalized)	SIL.F (non-normalized)
This study	1280	15	7	3.6	1.36	0.5146	4.4	1.44	0.3996
This study 2	1270	15	6	4.1	1.41	0.4994	4.3	1.43	0.3989
Valente 2019*	1205	8	5	6.8	1.68	0.6309	7.6	1.76	0.6167
Nechad 2015**	336	9	4	6.3	1.63	0.5474	9.4	1.94	0.7564
AeronetOC 1***	1000	7	4	6.3	1.63	0.4766	9.5	1.95	0.5600
AeronetOC 2***	2000	7	4	6.3	1.63	0.4837	9.6	1.96	0.5637
AeronetOC 3***	3000	7	4	6.2	1.62	0.4854	9.5	1.95	0.5568
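In every row of the table above, the reported m_used equals 1 + m_ub/10. This is an observation read off the tabulated values, offered as a quick consistency check; it is not a restatement of the procedure in section 2.2.2, and the function name below is hypothetical.

```python
import math

# (m_ub, m_used) pairs copied from the table above,
# normalized columns first, then non-normalized columns.
pairs = [
    (3.6, 1.36), (4.1, 1.41), (6.8, 1.68), (6.3, 1.63),
    (6.3, 1.63), (6.3, 1.63), (6.2, 1.62),
    (4.4, 1.44), (4.3, 1.43), (7.6, 1.76), (9.4, 1.94),
    (9.5, 1.95), (9.6, 1.96), (9.5, 1.95),
]

def m_from_upper_bound(m_ub):
    """Relation observed in the table: m_used = 1 + m_ub / 10."""
    return 1.0 + m_ub / 10.0

# Collect any row where the observed relation does not hold.
mismatches = [
    (m_ub, m_used)
    for m_ub, m_used in pairs
    if not math.isclose(m_from_upper_bound(m_ub), m_used, abs_tol=1e-9)
]
print("rows that break the relation:", mismatches)
```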

Cluster		1	2	3	4	5	6	7	Total
MRPE* (%)	BR**	−7.014	−8.762	−1.210	−3.537	−25.01	34.041	−5.165	−4.118
	TBA***	−5.207	−3.077	−24.18	−5.200	1.997	24.956	−1.749	−4.139
	Blended m=1.36	−4.688	−3.124	−1.682	−3.484	1.982	25.047	−1.742	−2.491
	Blended m=2	−5.380	−4.386	−6.729	−3.717	−3.081	29.584	−3.061	−3.280
bias* (log µg/L)	BR**	−0.108	−0.167	0.010	−0.045	−0.796	0.282	−0.069	−0.058
	TBA***	−0.079	−0.062	−0.265	−0.061	0.054	0.196	−0.017	−0.060
	Blended m=1.36	−0.074	−0.065	−0.018	−0.045	0.054	0.196	−0.020	−0.036
	Blended m=2	−0.085	−0.089	−0.072	−0.046	−0.089	0.252	−0.037	−0.055
MAE* (log µg/L)	BR**	0.229	0.177	0.153	0.118	0.796	0.295	0.166	0.176
	TBA***	0.166	0.096	0.273	0.129	0.158	0.236	0.159	0.147
	Blended m=1.36	0.165	0.096	0.117	0.118	0.158	0.236	0.160	0.141
	Blended m=2	0.174	0.104	0.176	0.119	0.309	0.273	0.155	0.143

Optics Express

Cluster	Dominant characteristics	Examples
1	Productive turbid waters with high C_TSM and rich CDOM	Lake Chaohu
2	Highly productive waters, dominated by the phytoplankton	Lake Dianchi
3	Relatively clean inland waters with low CDOM concentration, C_Chla and C_TSM	Lake Qiandao
4	Relatively clean inland waters, optically neighboring to Cluster 3 but with higher C_TSM	Lake Erhai
5	Turbid waters, dominated by sediments	Lake Dongting
6	Spectrally vegetation-like water type with extremely high C_Chla and C_TSM, mainly in hypereutrophic waters	Lake Taihu
7	Turbid waters, optically neighboring to Cluster 5 but with lower C_TSM	Lake Hongze