A method to extract cyanobacteria blooms from satellite imagery with no requirements for any prior atmospheric correction or cloud-masking

Haiqiu Liu; Haiqiu Liu; Hangzhou Li; Hengkui Ren; Jinxiu Hu

doi:10.1364/OE.438838

1. Introduction

Lake Chaohu, one of the five largest freshwater inland lakes in China, is suffering ecological damage as a result of anthropogenic activities. Harmful cyanobacteria blooms [1] have occurred more frequently in recent years, covering a growing area of up to 1/6 of the entire lake [2–5], and the consequences of most concern to the public are discolored lake water and foul smells.

These incidents have prompted numerous studies on how to monitor cyanobacteria blooms in Lake Chaohu accurately and in a timely manner. Compared with in situ investigations, satellite imagery covers a wider swath and provides more timely detection [6–7], and it has become popular for monitoring cyanobacteria blooms in inland lakes [8–9]. An effective indicator is essential for detecting cyanobacteria blooms from satellite imagery [10]. The chlorophyll contained in cyanobacteria blooms absorbs strongly in the red band and reflects only about 5% of red light, while the cellular structure of cyanobacteria blooms causes over 40% NIR (near-infrared) reflection. The result is a rapid increase in reflectance from about 5% in the red band to about 40% in the NIR band, generally known as the “red edge” [11–12] (as shown in Fig. 4 (a) and Fig. 1 (b)). This characteristic forms the basis of current indices for distinguishing cyanobacteria blooms from water. NDVI [13–14] (normalized difference vegetation index) separates floating algae from water by using the red edge. He et al. [15] highlighted the cyanobacteria blooms in Lake Dianchi by using NDVI with a threshold of NDVI>-0.1; similarly, Zhang et al. [16] applied NDVI to Lake Chaohu images and identified pixels with NDVI>0.12 as cyanobacteria blooms. EVI [17] (enhanced vegetation index) uses the red edge and introduces a gain factor and pixel-independent coefficients to strengthen the vegetation signal. Hu et al. [10] applied EVI to detect sargassum in the central West Atlantic. Water strongly absorbs light in the red, NIR and SWIR (short-wave infrared) bands, and appears opaque or “black” in the SWIR band even in turbid environments [18]. FAI [19] (floating algae index) utilizes these water characteristics in the red-NIR-SWIR bands together with the red edge to detect floating algae in turbid environments. Zhang et al. [20] used FAI to monitor cyanobacteria blooms in Erhai Lake.
Compared with NDVI and EVI, FAI is less sensitive to atmospheric effects and viewing geometry, and can “see” through thin clouds. However, it does not perform well in distinguishing floating algae from thick clouds [11,19]. Instead of using the SWIR band, which some satellites lack, VB-FAH (virtual-baseline floating macroalgae height) adopts the green and red bands as the baseline to measure the height of NIR reflectance. Xing et al. [21] proposed VB-FAH and applied it to extract macroalgal blooms in the Yellow Sea; the results suggest that VB-FAH is comparable to FAI even in the absence of a SWIR band. NDCI (normalized difference chlorophyll index) retrieves dinoflagellate blooms in turbid coastal waters by using the difference between the reflectance at 708 nm and 665 nm [22]. Caballero Isabel et al. adopted NDCI to monitor the cyanobacteria blooms in the Guadiana River from Sentinel-2A/B MSI imagery [22–25]. Although interpretation suggests NDCI is widely applicable to coastal waters, it requires spectral reflectance at 708 nm, a band that many satellite sensors do not provide.
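The red edge described above can be illustrated with a minimal NDVI sketch. It uses the standard definition (NIR − red)/(NIR + red) and the approximate bloom reflectances quoted in the text (5% red, 40% NIR); the water pixel is a hypothetical value chosen only for contrast, and the function name is ours.

```python
def ndvi(red: float, nir: float) -> float:
    """Normalized difference vegetation index from red and NIR reflectance."""
    return (nir - red) / (nir + red)

# Bloom reflectances taken from the red-edge description in the text.
bloom = ndvi(red=0.05, nir=0.40)
# Hypothetical water pixel (assumption): NIR reflectance below red reflectance.
water = ndvi(red=0.04, nir=0.02)

print(round(bloom, 3), round(water, 3))
# Compare against the NDVI > 0.12 threshold reported for Lake Chaohu in [16].
print(bloom > 0.12, water > 0.12)
```

The large positive value for the bloom pixel and the negative value for the water pixel are what make a simple threshold workable on clear-sky, atmospherically corrected data.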

Fig. 1. (a) Lake Chaohu and the coverages of several satellite images used in this study. (b) True color Sentinel-2A MSI images acquired on Jun.11, 2019 (c) Aerial images of Lake Chaohu acquired on Aug. 5, 2020.


However, challenges remain. First, atmospheric correction is necessary for the above indicators to extract cyanobacteria blooms, as they are all based on atmospherically corrected data, which can be compared with in situ information and thus yield values that can be used directly as proxies for floating algae. But there is no universal atmospheric correction method applicable to all sensors, and it is still challenging to select an appropriate atmospheric correction method without in situ investigations [26]. In addition, clouds are common in optical satellite images [27]. In our previous work, we downloaded the Sentinel-2A/B MSI images of Lake Chaohu from May to November 2020, during which cyanobacteria blooms erupted, and found that cloud-covered images accounted for more than 70%, while clear-sky images accounted for less than 30%. Unfortunately, some cloud pixels may be incorrectly identified as cyanobacteria blooms by some indicators (e.g., FAI and NDVI), so it is necessary to mask the clouds in images in advance. If an indicator could distinguish cyanobacteria blooms from water and clouds without prior atmospheric correction, it would reduce the workload of selecting a proper atmospheric correction method and masking cloud pixels, thereby decreasing the errors introduced by inaccurate atmospheric correction and cloud masking.

TCT (tasseled cap transformation) is a linear preprocessing transformation that reduces the data to four dimensions. Its transformation coefficients were initially chosen by finding the best fit to soil brightness, crop greenness and soil wetness, so the first three components describe the brightness, greenness and wetness of terrestrial green vegetation, and the last component is generally regarded as useless noise [28]. Most importantly, TCT allows vegetation information to be extracted directly from satellite imagery digital numbers (DN) without any prior atmospheric correction [29]. Zhang et al. selected two components (the greenness and wetness components) from TCT to establish an indicator and successfully detected the green macroalgae blooms in the Yellow Sea [30]; however, a cloud mask is required, as the indicator misclassifies some cloud pixels as green macroalgae. Based on statistics over a large number of pixels, we have found that the last component of TCT, although regarded as useless noise, can separate cyanobacteria blooms from clouds and water like the other components, and that it places both clouds and water in value ranges higher than that of cyanobacteria blooms (more information in Section 2.3.1), which is critical for distinguishing cyanobacteria blooms from water and clouds by a single threshold. These findings motivate us to find an indicator from the TCT components that distinguishes cyanobacteria blooms from clouds and water by a single threshold, without requiring any prior cloud masking or atmospheric correction.

2. Materials and methods

2.1 Study area

Lake Chaohu [31°25′-31°42′N, 117°17′-117°50′E, as shown in Fig. 1], located in Anhui Province, is one of the five largest inland lakes of China, with a length of 54.5 km, a maximum width of 21 km, an area of 776 km² and an average water depth of 3.0 m. It is a shallow eutrophic lake. Cyanobacteria blooms appear in the northwest of Lake Chaohu in April and gradually spread to the southeast as the temperature rises [31].

2.2 Satellite imagery and data set construction

2.2.1 Introducing satellite images

Sentinel-2A/B imagery is used to establish a cyanobacteria bloom indicator, and other sensors, such as Landsat-8 OLI, Chinese GF-1 WFV (wide field view) and HJ-1 A/B, are then adopted to examine the established indicator. The Sentinel-2 mission comprises two satellites, Sentinel-2A and Sentinel-2B, both of which carry an MSI (multispectral imager) covering 13 spectral bands with a swath of 290 km [7]. Landsat-8 OLI is an imager with 9 bands [32]. The Chinese GF-1 satellite carries four WFV sensors that cover 4 bands [33–34]. HJ-1 has two satellites, HJ-1A and HJ-1B, both equipped with two multispectral cameras that have 4 bands, as shown in Table 1.

Table 1. Parameters of different satellite imagery. (B, G, R and NIR are blue, green, red and near infrared bands, respectively.)


2.2.2 Constructing training and test sets from images over time

Clouds are common in optical satellite images of Lake Chaohu, and they usually generate bright areas, dark shadows and cloud edge pixels in the images. Clouds move constantly, causing misalignment between different color bands at the cloud edge, as shown in Fig. 2 (a). As a result, a cloud edge captured only by the blue and green bands shows their synthesized blue-green color, while a cloud edge captured only by the red and green bands shows their synthesized yellow color, as shown in Fig. 2 (b)-(c). Unfortunately, like some cloud shadow pixels, some edge pixels were misclassified as cyanobacteria blooms by some of the candidates derived from TCT (more information in Section 2.3.3).

Fig. 2. (a) Misalignment between different color bands at cloud edges caused by clouds’ moving. (b)-(c) Two examples to show yellow and blue-green cloud edges.


A large number of pixels were collected from Lake Chaohu Sentinel-2A/B MSI images acquired in different seasons from 2019 to 2021. Since few cyanobacteria blooms appear in Lake Chaohu in winter, all the selected images were acquired in spring, summer and autumn. Two images per season were selected, so a total of 18 images from the recent three years were downloaded from the Sentinels Scientific Data Hub, and the images were divided into training and test sets. Each set contained nine images from different seasons and different years. Water, cloud, cloud shadow and cloud edge pixels were easily identified by visual observation, while cyanobacteria bloom pixels were determined as follows: three typical algorithms, NDVI, FAI and VB-FAH, were adopted to retrieve the cyanobacteria blooms with thresholds of 0 for NDVI, -1.148 for FAI and 0.038 for VB-FAH (the NDVI threshold of 0 has been confirmed to be suitable for Lake Chaohu Sentinel-2A/B images in our previous work [31]). Since cyanobacteria blooms appear red in false color images, the bright red pixels classified as cyanobacteria blooms by all three algorithms were extracted as cyanobacteria bloom samples. Table 2 shows the statistics of all the pixels; over 200,000 pixels are contained in the training and test sets. Figure 3 shows the locations of the pixels in the two sets. Since the western half of Lake Chaohu is much more polluted than its eastern half, cyanobacteria blooms mainly erupt in the western half and rarely occur in the eastern half, so the samples are mainly concentrated in the western half of the lake.
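The consensus rule used to label bloom samples can be sketched as a logical AND over the three index tests. This is only an illustration under assumptions: we assume a pixel counts as bloom when its index value is above the quoted threshold, the function name is hypothetical, and the additional visual "bright red" check described in the text is not modeled.

```python
# Thresholds quoted in the text for Lake Chaohu Sentinel-2A/B images.
NDVI_T, FAI_T, VBFAH_T = 0.0, -1.148, 0.038

def is_bloom_sample(ndvi: float, fai: float, vbfah: float) -> bool:
    """Keep a pixel only if all three indices exceed their thresholds.

    Assumption: "classified as bloom" means the index is above its threshold.
    """
    return ndvi > NDVI_T and fai > FAI_T and vbfah > VBFAH_T

# Hypothetical index values, for illustration only.
print(is_bloom_sample(ndvi=0.30, fai=0.02, vbfah=0.05))  # all three agree
print(is_bloom_sample(ndvi=0.30, fai=0.02, vbfah=0.01))  # VB-FAH disagrees
```

Requiring agreement among all three algorithms trades sample count for label purity, which suits a training set that later serves as the reference.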

Fig. 3. Distribution maps of samples in training and test sets


Fig. 4. (a) Spectral reflectance characteristics of green vegetation. (b) Values of four TCT components performed on cyanobacteria bloom, water and cloud pixels in training set (Circle markers represent the average values and error bars denote the standard deviation).


Table 2. Statistical results of training and test sets (unit: pixels; the normal and bold figures denote the numbers of pixels used in the training and test sets, respectively. The figures marked with *, ’ and ^ denote pixels from images acquired in 2019, 2020 and 2021, respectively).


2.3 Establishing an indicator to distinguish cyanobacteria blooms from both water body, clouds, cloud shadows and cloud edges by a single threshold

2.3.1 Performing TCT on Sentinel-2A/B MSI images

TCT was proposed to convert Landsat bands into channels of known characteristics: soil brightness, vegetation greenness, and wetness. TCT has since been applied to other sensors, such as QuickBird, MODIS and the Chinese HJ-1 satellite [30]. It is a linear transformation performed on images to extract terrestrial vegetation information. Since the central wavelengths of the blue, green, red and NIR bands of Sentinel-2A/B MSI are similar to those of the Landsat-8 OLI sensor (as shown in Table 1), the TCT coefficients of the Landsat-8 OLI sensor [35] are used to perform TCT on Sentinel-2A/B MSI images, as shown in Eq. (1).

(1)$$\left[ {\begin{array}{c} {\textrm{TCB}}\\ {\textrm{TCG}}\\ {\textrm{TCW}}\\ {\textrm{TCN}} \end{array}} \right]\textrm{ = }\left[ {\begin{array}{cccc} {0.3521}&{0.3899}&{0.3825}&{0.6985}\\ {\textrm{ - }0.3301}&{\textrm{ - }0.3455}&{\textrm{ - }0.4508}&{0.6970}\\ {0.2651}&{0.2361}&{0.1296}&{0.0590}\\ {0.1010}&{\textrm{ - }0.0517}&{0.1964}&{\textrm{ - }0.1239} \end{array}} \right] \ast \left[ {\begin{array}{c} {D{N_b}}\\ {D{N_g}}\\ {D{N_r}}\\ {D{N_n}} \end{array}} \right]$$

where DN_b, DN_g, DN_r and DN_n denote the DN values of the blue, green, red and NIR bands, respectively, and TCB, TCG, TCW and TCN are the four components derived from TCT, representing brightness, greenness, wetness and noise [28], respectively. All the weights in the first row are positive, so TCB represents a weighted sum of the DN values in all four bands. A cloud pixel yields the greatest TCB, as its reflectivity in the four bands is higher than that of water and cyanobacteria [36]. The first three weights in the second row are negative while the last one is positive, hence TCG highlights the difference between the NIR and visible bands. Similarly, TCN emphasizes the difference between the NIR and red bands, as in the last row the third weight has the greatest positive value while the fourth has the most negative value. Since the reflectance of cyanobacteria rises sharply from the red to the NIR band (the red edge, as shown in Fig. 4 (a)), we hypothesize that TCN, although generally considered useless noise, may also help to distinguish cyanobacteria from other substances. Water usually has higher reflectance in the visible bands than in the NIR band; because the first three weights in the third row are greater than the fourth, TCW stresses the information of water.
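Eq. (1) is a plain matrix-vector product, so it can be sketched per pixel in a few lines. The coefficients below are copied from Eq. (1); the function name and the DN values are hypothetical and serve only to show how each row responds to a change in the NIR band. For full images the same product would normally be vectorized with an array library.

```python
# Rows: TCB, TCG, TCW, TCN. Columns: blue, green, red, NIR (from Eq. (1)).
TCT_MATRIX = [
    [0.3521, 0.3899, 0.3825, 0.6985],
    [-0.3301, -0.3455, -0.4508, 0.6970],
    [0.2651, 0.2361, 0.1296, 0.0590],
    [0.1010, -0.0517, 0.1964, -0.1239],
]

def tasseled_cap(dn):
    """Return (TCB, TCG, TCW, TCN) for one pixel's (blue, green, red, NIR) DN values."""
    return tuple(sum(w * v for w, v in zip(row, dn)) for row in TCT_MATRIX)

# Hypothetical DN values (assumption): a spectrally flat pixel and the same
# pixel with its NIR value doubled, mimicking a red-edge rise.
flat = tasseled_cap((1000, 1000, 1000, 1000))
steep = tasseled_cap((1000, 1000, 1000, 2000))

for name, a, b in zip(("TCB", "TCG", "TCW", "TCN"), flat, steep):
    print(f"{name}: flat={a:.1f} steep={b:.1f} change={b - a:+.1f}")
```

Raising only the NIR value increases TCG by the NIR weight of the second row and lowers TCN by the NIR weight of the fourth row, which matches the sign pattern discussed above.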

The cyanobacteria bloom, water and cloud pixels in the training set were used to investigate TCT’s ability in cyanobacteria blooms extraction. TCT was performed on each pixel to obtain its four components of TCB, TCG, TCW and TCN. Figure 4 (b) shows the average value of each kind of samples (circle marker) and their standard deviation (error bar). If a component sets cyanobacteria and non-cyanobacteria pixels in separated value ranges, it allows to distinguish cyanobacteria pixels from others by setting thresholds, and a larger difference between its cyanobacteria and non-cyanobacteria pixels may produce a more accurate separation. In this respect, TCB and TCG components may have better performance on distinguishing cyanobacteria blooms from clouds, while in terms of separating cyanobacteria blooms from water body, TCG and TCW components may produce more accurate results. In addition, a cyanobacteria bloom pixel sets its TCB values between that of cloud and water, indicating two different thresholds will be required for TCB to separate cyanobacteria blooms from cloud and water, while a single threshold is enough for TCG and TCN to extract cyanobacteria blooms from others. Each component has its advantages and disadvantages. We try to enhance the advantages through their linear combination. This study aims to find the best linear combination of the four components to separate cyanobacteria blooms from water body and clouds (including clouds, cloud shadows and cloud edges) by a single threshold.

2.3.2 Selecting candidates allowing to separate cyanobacteria blooms from water body

The four components (TCB, TCG, TCW and TCN) are combined linearly, each with a coefficient of -1, 1 or 0, giving 81 linear combinations in total (3⁴=81). The 81 combinations are applied to the cyanobacteria bloom and water pixels in the training set to obtain their values. FAI is a typical indicator and performs well in extracting cyanobacteria blooms from clear-sky images. For comparison, all pixels are first atmospherically corrected using the Sen2Cor tool, a processor designed for Sentinel-2 products that provides atmospheric correction, and FAI is then computed from the surface reflectance using Eq. (2), where R_NIR, R_red and R_SWIR denote the reflectance in the NIR, red and SWIR bands, respectively, and λ_NIR, λ_red and λ_SWIR represent the center wavelengths of the NIR, red and SWIR bands, respectively. For each combination, its values over all the pixels are fitted linearly to the FAI values. Finally, the coefficient of determination R², root mean square error (RMSE) and median percent difference (MPD) [37–38] are adopted to evaluate the deviation between each combination’s fitting results and the FAI values. R², RMSE and MPD are calculated using the following equations:

(2)$$\textrm{FAI} = ({{R_{\textrm{NIR}}} - {R_{\textrm{red}}}} )- ({{R_{\textrm{SWIR}}} - {R_{\textrm{red}}}} )\times {{({{\lambda_{\textrm{NIR}}} - {\lambda_{\textrm{red}}}} )} / {({{\lambda_{\textrm{SWIR}}} - {\lambda_{\textrm{red}}}} )}}$$

(3)$${\{{\textrm{RMSE}_k}\}_{81}} = {\left\{ {\sqrt {\frac{1}{N}\sum\nolimits_{i = 1}^N {{{({x_i^k - x_i^{\textrm{FAI}}})}^2}} } } \right\}_{81}},\quad k = 1,2, \cdots ,81$$

(4)$${\{{\textrm{MP}{\textrm{D}_k}} \}_{81}} = {\{{median({{{\{{100\%\cdot |{{{({x_i^k - x_i^{\textrm{FAI}}} )} / {x_i^{\textrm{FAI}}}}} |} \}}_N}} )} \}_{81}},\quad k = 1,2, \cdots ,81$$

(5)$${\{{{\textrm{R}^\textrm{2}}_k} \}_{81}} = {\left\{ {{{\sum\limits_{i = 1}^N {{{({x_i^k - \overline {{x^{\textrm{FAI}}}} } )}^2}} } / {\sum\limits_{i = 1}^N {{{({x_i^{\textrm{FAI}} - \overline {{x^{\textrm{FAI}}}} } )}^2}} }}} \right\}_{81}},\quad k = 1,2, \cdots ,81,\quad \overline {{x^{\textrm{FAI}}}} = {{\sum\limits_{i = 1}^N {x_i^{\textrm{FAI}}} } / N}$$

where N is the total number of pixels in the study region, $x_{i}^{\mathrm{FAI}}$ denotes the FAI value of the ith pixel, and $x_{i}^{k}$ represents the fitting result of the kth combination on the ith pixel. R²_k, RMSE_k and MPD_k represent the R², RMSE and MPD values between the fitting results of the kth combination and the FAI values, respectively, and {R²_k}₈₁, {RMSE_k}₈₁ and {MPD_k}₈₁ are the collections of R², RMSE and MPD values of the 81 combinations, respectively.
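The evaluation in Eqs. (2)-(5) can be sketched for a single combination as follows. This is a minimal illustration, not the authors' code: the ordinary least-squares fit is our reading of "fitted linearly", the function names are ours, the sample values are hypothetical, and the wavelengths in the FAI call (665, 865 and 1610 nm) are nominal Sentinel-2 MSI band centers that we assume for the example because the text does not state which bands were used.

```python
from statistics import mean, median

def fai(r_red, r_nir, r_swir, lam_red, lam_nir, lam_swir):
    """Floating algae index, Eq. (2)."""
    return (r_nir - r_red) - (r_swir - r_red) * (lam_nir - lam_red) / (lam_swir - lam_red)

def linear_fit(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx

def fit_metrics(combo_values, fai_values):
    """Return (R2, RMSE, MPD) between the linear fit of one combination and FAI."""
    slope, intercept = linear_fit(combo_values, fai_values)
    fitted = [slope * c + intercept for c in combo_values]
    fai_mean = mean(fai_values)
    rmse = (sum((f - t) ** 2 for f, t in zip(fitted, fai_values)) / len(fai_values)) ** 0.5
    # MPD divides by the FAI value, so it is undefined for pixels with FAI == 0.
    mpd = median(100 * abs((f - t) / t) for f, t in zip(fitted, fai_values))
    r2 = sum((f - fai_mean) ** 2 for f in fitted) / sum((t - fai_mean) ** 2 for t in fai_values)
    return r2, rmse, mpd

# Hypothetical combination values and FAI values for five pixels.
combo = [-300.0, -120.0, 40.0, 260.0, 410.0]
fai_vals = [0.011, 0.019, 0.031, 0.042, 0.050]
print(fit_metrics(combo, fai_vals))
print(fai(0.05, 0.40, 0.05, 665.0, 865.0, 1610.0))
```

Running `fit_metrics` once per combination and collecting the three values gives the sets {R²_k}₈₁, {RMSE_k}₈₁ and {MPD_k}₈₁ described above.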

As a result, over 60% of the combinations produce an R² of at least 0.9, showing that they are well correlated with FAI. Among them, 38 combinations perform relatively better, with R²≥0.95, RMSE<0.007 and MPD<0.4%. After removing the dual combinations, the remaining 19 combinations are selected as the candidates that can separate cyanobacteria blooms from water. Table 3 shows the 19 selected combinations and their statistical results.
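The search space of 81 combinations can be enumerated directly. In the sketch below we assume that "dual combinations" means pairs that differ only by an overall sign (which would explain 38 combinations reducing to 19); that reading is ours and is not stated explicitly in the text. The helper name is hypothetical.

```python
from itertools import product

COMPONENTS = ("TCB", "TCG", "TCW", "TCN")

# Every coefficient tuple with entries in {-1, 0, 1}: 3**4 = 81 combinations.
combos = list(product((-1, 0, 1), repeat=len(COMPONENTS)))

def canonical(coeffs):
    """Map a combination and its negation to one representative.

    Assumption: a "dual" of a combination is the same combination with all
    signs flipped, which gives the same linear fit quality against FAI.
    """
    negated = tuple(-c for c in coeffs)
    return max(coeffs, negated)

# Drop the all-zero tuple, then merge sign-flipped pairs.
unique_up_to_sign = {canonical(c) for c in combos if any(c)}

print(len(combos), len(unique_up_to_sign))
print((0, 1, -1, 1) in unique_up_to_sign)  # TCG - TCW + TCN
```

Under this assumption the 80 non-zero combinations collapse to 40 sign-independent ones, and any subset that passes the R², RMSE and MPD criteria is halved in the same way.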

Table 3. Basic regression statistics for the nineteen best candidates


2.3.3 Further selecting candidates allowing to separate cyanobacteria blooms from clouds, cloud shadows and cloud edges

Nineteen candidates were selected in Section 2.3.2 to distinguish cyanobacteria blooms from water body, here we further examine their performance on distinguishing cyanobacteria blooms from clouds. The nineteen candidates are performed on the training set to compute their values on cyanobacteria blooms, cloud, cloud shadow and cloud edge pixels. Results are shown in Fig. 5 (a) and Table 4. The blue, gray, purple, green and red bars represent candidates’ value ranges on cyanobacteria blooms, cloud, cloud shadow, blue-green cloud edge and yellow cloud edge pixels, respectively. Figure 5 (b)-(f) show some details in Fig. 5 (a). It can be seen that only the eight candidates labeled 6^#, 10^#, 11^#, 15^#, 16^#, 17^#, 18^# and 19^# define the cyanobacteria blooms and cloud pixels in completely separated ranges, and among them, only 15^#, 16^#, 17^# candidates allow to distinguish cyanobacteria blooms from cloud shadows and cloud edges, while the others make partly overlapping value ranges between cyanobacteria blooms and cloud shadows or edges.

Fig. 5. (a) Nineteen candidates’ value ranges on cyanobacteria bloom, cloud, cloud shadow, blue-green cloud edge and yellow cloud edge samples. Details of (b) 6^# (c) 10^# and 11^# (d) 15^# (e) 16^# and 17^# (f) 18^# and 19^# candidates (The upper and lower boundaries of each bar denote its maximal and minimal values, respectively).


Table 4. Comparison of candidates’ values between cyanobacteria bloom and non-cyanobacteria pixels


Table 5. Numbers of cloud edge pixels misclassified as cyanobacteria blooms by 15^#, 16^# and 17^# candidates (unit: pixels)


Table 4 shows the minimal values of cyanobacteria bloom pixels and the maximal values of non-cyanobacteria pixels in Fig. 5. Min_cyb denotes the minimal values of cyanobacteria blooms pixels, Max_cld, Max_shd, Max_yeled and Max_bged represent the maximal values of cloud, cloud shadows, yellow cloud edges and blue-green cloud edges, respectively. If a candidate makes Min_cyb larger than all non-cyanobacteria pixels’ top values, it allows to separate cyanobacteria blooms from others by a single threshold. Conversely, if a candidate makes Min_cyb smaller than Max_cld, Max_shd, Max_yeled or Max_bged, some non-cyanobacteria pixels will be misclassified as cyanobacteria blooms, even if the threshold is set to Min_cyb. Hence only 15^#, 16^# and 17^# candidates allow to distinguish cyanobacteria blooms from others by a single threshold.

The 17^# candidate has two advantages over the 15^# and 16^# candidates. (1) For all three candidates, the yellow cloud edge maximum Max_yeled is the highest among all non-cyanobacteria pixels. Hence their thresholds can be set between the yellow cloud edges’ maximal value Max_yeled and the cyanobacteria blooms’ minimal value Min_cyb, that is, (606 640) for 15^#, (407 514) for 16^# and (175 330) for 17^#, as shown in Table 4, leaving redundancies of 34 for 15^#, 107 for 16^# and 155 for 17^#. The 17^# candidate thus provides a larger redundancy for its threshold than the 15^# and 16^# candidates do. Since other pixels may fall outside the value ranges calculated from the training set, a larger redundancy may produce a more reliable separation than a smaller one. (2) Clouds usually cover more pixels than cloud shadows and edges. If a candidate sets clouds to the values that differ most from those of cyanobacterial blooms, it will give the lowest cloud misclassification. For the 17^# candidate, the cloud maximum Max_cld is the non-cyanobacteria value farthest from the cyanobacterial bloom minimum Min_cyb (Min_cyb > Max_yeled > Max_bged > Max_shd > Max_cld). Furthermore, the differences between Max_cld and Min_cyb are 338 for 15^# (640-302 = 338), 493 for 16^# (514-21 = 493) and 893 for 17^# (330-(-563) = 893), as shown in Table 4. So the 17^# candidate sets the cloud values farthest from those of cyanobacterial blooms and provides a wider gap between cyanobacterial blooms and clouds than the 15^# and 16^# candidates do, consequently ensuring the lowest misclassification of cloud pixels.
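The single-threshold condition and the two margins discussed above can be checked with a few lines of arithmetic. The values are the ones quoted in the text from Table 4; the Max_shd and Max_bged entries are not quoted in the text, so the sketch checks only the yellow cloud edge and cloud maxima. The variable and function names are ours.

```python
# (Max_yeled, Min_cyb, Max_cld) for each candidate, as quoted from Table 4.
table4 = {
    "15#": (606, 640, 302),
    "16#": (407, 514, 21),
    "17#": (175, 330, -563),
}

def single_threshold_ok(min_cyb, other_maxima):
    """True if every non-bloom maximum lies below the bloom minimum."""
    return min_cyb > max(other_maxima)

# Room left for the threshold between yellow cloud edges and blooms.
margins = {k: min_cyb - max_yeled for k, (max_yeled, min_cyb, _) in table4.items()}
# Distance between the cloud maximum and the bloom minimum.
cloud_gaps = {k: min_cyb - max_cld for k, (_, min_cyb, max_cld) in table4.items()}

for name, (max_yeled, min_cyb, max_cld) in table4.items():
    ok = single_threshold_ok(min_cyb, (max_yeled, max_cld))
    print(name, ok, margins[name], cloud_gaps[name])
```

Both margins are largest for the 17^# candidate, which is the basis for the two advantages listed above.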

The test set is used to examine the selected eight candidates’ performance, which contains nine Lake Chaohu Sentinel-2A/B MSI images acquired in different seasons in the past three years (as shown in Table 2). For 15^#, 16^# and 17^# candidates, their thresholds should be set between the minimum of cyanobacteria pixels and the maximum of non-cyanobacteria pixels, that is, (606 640) for 15^#, (407 514) for 16^# and (175 330) for 17^#. Here, we set the threshold to the average value: 623 for 15^#, 460.5 for 16^# and 252.5 for 17^#. For 6^#, 10^#, 11^#, 18^# and 19^# candidates, their cyanobacteria pixel values overlap with some non-cyanobacteria pixel values, thus their thresholds are set to be slightly lower than the minimum values of cyanobacteria pixels, to ensure that as many cyanobacteria pixels as possible are identified and as few non-cyanobacteria pixels as possible are misidentified. So, the thresholds are set to be -1235 for 6^#, -1939 for 10^#, -3701 for 11^#, -565 for 18^# and -894 for 19^#. Those pixels with values larger than the thresholds are identified as cyanobacteria blooms. The following ratios are used to measure the performances:

(6)$${\alpha _{cyb}}\,{=}\,{{{n_{cyb}}} / {{N_{cyb}}}},\;{\beta _{cld}}\,{=}\,{{{n_{cld}}} / {{N_{cld}}}},\;{\beta _{shd}} = {{{n_{shd}}} / {{N_{shd}}}},\;{\beta _{bged}} \,{=}\,{{{n_{bged}}} / {{N_{bged}}}},\;{\beta _{yeled}}\,{=}\,{{{n_{yeled}}} / {{N_{yeled}}}}$$

where N_cyb, N_cld, N_shd, N_bged and N_yeled denote the numbers of cyanobacteria, cloud, cloud shadow, blue-green cloud edge and yellow cloud edge pixels in the test set, respectively. n_cyb represents the number of cyanobacteria pixels that are correctly identified as cyanobacteria blooms, n_cld, n_shd, n_bged and n_yeled are the numbers of cloud, cloud shadow, blue-green cloud edge and yellow cloud edge pixels that are misclassified as cyanobacteria blooms, respectively.
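The ratios in Eq. (6) all have the same form: the fraction of pixels of one class whose candidate value exceeds the threshold. A minimal sketch follows, using the midpoint threshold of the 17^# candidate from the text; the per-class sample values and the function name are hypothetical.

```python
def positive_ratio(values, threshold):
    """Fraction of pixels whose candidate value is larger than the threshold."""
    return sum(v > threshold for v in values) / len(values)

# Midpoint of the range (175 330) for the 17# candidate, as used in the text.
threshold_17 = (175 + 330) / 2

# Hypothetical candidate values for a handful of labelled test pixels.
samples = {
    "cyb": [400, 350, 900, 260],
    "cld": [-600, -700, 100],
    "yeled": [150, 300, 90, 20],
}

# True-positive ratio for blooms, false-positive ratios for the other classes.
alpha_cyb = positive_ratio(samples["cyb"], threshold_17)
beta = {k: positive_ratio(v, threshold_17) for k, v in samples.items() if k != "cyb"}

print(threshold_17, alpha_cyb, beta)
```

For bloom pixels the ratio is a true-positive rate (higher is better), while for every other class it is a false-positive rate (lower is better).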

Results are shown in Fig. 6. 6^#, 10^#, 11^#, 18^# and 19^# candidates seriously misclassify cloud shadow pixels, as shown in Fig. 6 (b), by contrast, 15^#, 16^# and 17^# candidates provide 0.00% misclassified shadow, and relatively lower false-positive ratios of blue-green cloud edge pixels, as shown in Fig. 6 (d). Compared with 15^# and 16^# candidates, 17^# candidate makes lower false-positive ratios of cloud and cloud edge pixels, as shown in Fig. 6 (c)-(e), and it also gives more accurate identification of cyanobacteria pixels, as shown in Fig. 6 (a). The results are consistent with the above theoretical analysis based on the training set. In general, 17^# candidate performs better than other candidates.

Fig. 6. Comparisons of true-positive and false-positive ratios of eight candidates. (a) Cyanobacteria pixels true-positive ratios. False-positive ratios of (b) cloud shadow, (c) cloud, (d) blue-green cloud edge and (e) yellow cloud edge pixels.


We select three images to visually display the results, as shown in Fig. 7. Three areas are picked which contain cyanobacteria blooms, clouds, cloud shadows and cloud edges, and more importantly, all the substances appear in a small enough area so that they can be magnified to a significant size for easy observation. Fig. 7 (a) - (c) show the false color images to highlight the cyanobacteria blooms in red, the areas enclosed by yellow rectangles contain cyanobacteria bloom, cloud, shadow and edge pixels, and are magnified in Fig. 7 (a1) - (c1). Fig. 7 (a2) - (c2) show the true color images after being Gaussian stretched to highlight their cloud shadows pixels. The blue pixels in Fig. 7 (d) - (Δ) denote the detected cyanobacteria blooms. It can be seen that the cyanobacteria blooms in Fig. 7(a1) - (c1) are basically detected by all the eight candidates, while some cloud shadows are incorrectly identified as cyanobacteria blooms by 6^#, 10^#, 11^#, 18^# and 19^# candidates (yellow dotted circles or ellipses in Fig. 7 (d) - (f), (j) - (n), (r) - (v), (z) - (Δ)). By contrast, 15^#, 16^# and 17^# candidates could distinguish the cyanobacteria blooms from cloud shadow pixels, but they misclassify some cloud edge pixels as cyanobacteria blooms (red circles in Fig. 7 (g)-(i), (o)-(q), (w)-(y), and their detail views (g1) -(i1), (o1) - (q1), (w1) - (y1)).

Fig. 7. Cyanobacteria blooms detected by eight candidates. False color images of Lake Chaohu Sentinel-2A/B MSI acquired on (a) May. 11, 2020, (b) May. 26, 2020 and (c) Oct. 8, 2020. (a1) - (c1) false color and (a2) - (c2) true color images of Areas 1 in Fig. 7 (a), Areas 2 in Fig. 7 (b) and Areas 3 in Fig. 7 (c), respectively. Results of Area 1 detected by candidates of (d) 6^#; (e) 10^#; (f) 11^#; (g) 15^#; (h) 16^#; (i) 17^#; (j) 18^#; (k) 19^#, (g1) - (i1) details of Fig. 7 (g)-(i). Results of Area 2 detected by candidates of (l) 6^#; (m) 10^#; (n) 11^#; (o) 15^#; (p) 16^#; (q) 17^#; (r) 18^#; (s) 19^#, (o1) - (q1) details of Fig. 7 (o)-(q). Results of Area 3 detected by candidates of (t) 6^#; (u) 10^#; (v) 11^#; (w) 15^#; (x) 16^#; (y) 17^#; (z) 18^#; (Δ) 19^#, (w1) - (y1) details of Fig. 7 (w)-(y) (Yellow circles and ellipses are the misclassified cloud shadows. Red circles are the misclassified cloud edges).


In addition, we select three areas to compare the results of the 15^#, 16^# and 17^# candidates, as shown in Fig. 8. The three areas contain blue-green and yellow cloud edge pixels, and they are small enough to be magnified to a significant size for easy observation. Some yellow cloud edge pixels are misclassified as cyanobacteria blooms, while blue-green cloud edge pixels are not. In the three areas, the numbers of misclassified yellow cloud edge pixels are 246, 118 and 447 for the 15^# candidate (Fig. 8 (d)-(f)), 97, 45 and 197 for the 16^# candidate (Fig. 8 (g)-(i)), and 29, 17 and 130 for the 17^# candidate (Fig. 8 (j)-(l)). On average, the 17^# candidate misclassifies only 21.70% as many pixels as the 15^# candidate and 51.92% as many as the 16^# candidate, as shown in Table 5. The 17^# candidate thus provides the lowest false-positive rate for cloud edge pixels.
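The two percentages quoted above follow from the per-area counts. The short check below sums the misclassified pixels over the three areas for each candidate and takes the ratio of totals, which equals the ratio of the per-area averages; the variable names are ours.

```python
# Misclassified yellow cloud edge pixels in Areas 1-3, as quoted in the text.
counts = {
    "15#": (246, 118, 447),
    "16#": (97, 45, 197),
    "17#": (29, 17, 130),
}

totals = {name: sum(per_area) for name, per_area in counts.items()}

# Pixels misclassified by 17# as a percentage of those misclassified by 15# and 16#.
ratio_vs_15 = 100 * totals["17#"] / totals["15#"]
ratio_vs_16 = 100 * totals["17#"] / totals["16#"]

print(totals)
print(f"{ratio_vs_15:.2f}% {ratio_vs_16:.2f}%")
```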

Fig. 8. Three areas extracted from Lake Chaohu Sentinel-2A/B MSI images and their results of cyanobacteria blooms detected by 15^#, 16^# and 17^# candidates. (a) Area 1 from the image acquired on May. 11, 2020; (b) Area 2 from the image acquired on Sept. 29, 2019; (c) Area 3 from the image acquired on Sept. 28, 2020. Results detected by 15^# candidate in (d) Area 1, (e) Area 2 and (f) Area 3. Results detected by 16^# candidate in (g) Area 1, (h) Area 2 and (i) Area 3. Results detected by 17^# candidate in (j) Area 1, (k) Area 2 and (l) Area 3 (The figures in (d)-(l) mean the numbers of pixels classified as cyanobacteria blooms).


Obviously, 17^# candidate allows to separate cyanobacteria pixels from water, cloud and cloud shadow pixels by a single threshold, and it provides the lowest false positive rate of cloud edge pixels. So, it is selected as the optimal combination of the components from TCT. Therefore, the indicator that allows to distinguish cyanobacteria blooms from water, clouds, cloud shadows and most cloud edges (ICW3C) is defined as:

(7) $$ICW3C = TCG - TCW + TCN$$

According to Table 4, a threshold value range TH_ICW3C for using ICW3C to extract Lake Chaohu cyanobacteria blooms from Sentinel-2A/B MSI imagery is given as Eq. (8).

(8)$$T{H_{\textrm{ICW3C}}} = ({175\;\;330} )$$
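Because Eq. (1) and Eq. (7) are both linear, ICW3C can be collapsed into a single weighted sum of the four DN values. The sketch below derives those weights from the TCG, TCW and TCN rows of Eq. (1) and applies a threshold inside the range of Eq. (8). The default of 252.5 is the midpoint of that range; the function names and the two DN tuples are hypothetical and are not meant to be representative of real Sentinel-2 pixels.

```python
# TCG, TCW and TCN rows of Eq. (1); columns are blue, green, red, NIR.
TCG = (-0.3301, -0.3455, -0.4508, 0.6970)
TCW = (0.2651, 0.2361, 0.1296, 0.0590)
TCN = (0.1010, -0.0517, 0.1964, -0.1239)

# ICW3C = TCG - TCW + TCN (Eq. (7)), expressed as per-band weights.
ICW3C_WEIGHTS = tuple(g - w + n for g, w, n in zip(TCG, TCW, TCN))

def icw3c(dn):
    """ICW3C value for one pixel's (blue, green, red, NIR) DN values."""
    return sum(w * v for w, v in zip(ICW3C_WEIGHTS, dn))

def is_bloom(dn, threshold=252.5):
    """Classify a pixel as bloom when ICW3C exceeds the threshold."""
    return icw3c(dn) > threshold

print([round(w, 4) for w in ICW3C_WEIGHTS])
# Hypothetical DN tuples (assumption): one with a strong NIR rise, one without.
print(is_bloom((800, 900, 700, 3000)), is_bloom((1200, 1100, 900, 600)))
```

The combined weights are negative for the three visible bands and positive for NIR, so ICW3C grows with the red-edge rise that characterizes blooms.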

3. Results and discussions

This section is divided into three parts. Firstly, ICW3C’s sensitivities to different choices within its threshold value range and different viewing angles are discussed. Following this, ICW3C is compared with current indicator of FAI to examine its performance over time. Since ICW3C is established based on Sentinel-2A/B MSI imagery, experiments performed on other sensors (i.e., Chinese GF1-WFV) are conducted in the last part to investigate its extension.

3.1 Sensitivities of ICW3C

3.1.1 Sensitivity of ICW3C to threshold

Experiments are conducted on the test set to demonstrate how results vary with the choice of threshold. According to Eq. (8), the threshold of ICW3C TH_ICW3C is set in the range of (175 330) with a step of 0.5. We first perform ICW3C on all the pixels in the test set, and then α_cyb, β_cld, β_shd, β_bged and β_yeled are used to measure the ratios of true and false positives (their equations are shown in Section 2.3.3). Results are shown in Fig. 9. As the threshold decreases from 330 to 175, the true-positive ratio of cyanobacteria improves slightly by 0.002%, while the false-positive ratios of clouds and yellow cloud edges significantly increase by 141.7% and 188.9%, respectively, as shown in Table 6. Threshold reduction mainly causes more misclassifications of cloud and yellow cloud edges. Even so, the false-positive ratio of clouds still remains very low values (<0.03%). Although yellow cloud edge tops in both absolute false-positive ratio and relative deviation, it usually covers far fewer pixels than other non-cyanobacterial substances. Thereby, the changes of threshold within its value range will not lead to serious increase in misclassification.
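The threshold sweep described above can be sketched as a loop over thresholds from 330 down to 175 in steps of 0.5, recording the per-class positive ratio at each step. The per-class ICW3C values below are hypothetical and the function name is ours; with real test-set values the same loop would produce the curves of Fig. 9.

```python
def sweep_thresholds(values_by_class, start=330.0, stop=175.0, step=0.5):
    """Return [(threshold, {class: ratio above threshold}), ...] from start down to stop."""
    n_steps = round((start - stop) / step)
    results = []
    for i in range(n_steps + 1):
        t = start - i * step
        ratios = {name: sum(v > t for v in vals) / len(vals)
                  for name, vals in values_by_class.items()}
        results.append((t, ratios))
    return results

# Hypothetical ICW3C values per class, for illustration only.
demo = {
    "cyb": [331, 400, 600],
    "cld": [-500, 180, -200],
    "yeled": [170, 200, 320],
}

curve = sweep_thresholds(demo)
print(len(curve), curve[0], curve[-1])
```

Each ratio can only stay the same or increase as the threshold decreases, which is why lowering the threshold raises both the true-positive and the false-positive ratios.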

Fig. 9. Variations of true and false positive ratios with threshold decreasing from 330 to 175.

Table 6. Statistical results of true- and false-positive ratios as the threshold decreases from 330 to 175 (‘$\nearrow $’ denotes an increase from the value at a threshold of 330, on the left, to the value at a threshold of 175, on the right)

Cyanobacteria blooms appear red in false color satellite images and can be identified by visual inspection [39–40], so false color images are a useful aid in choosing a specific threshold value. When there is no time to choose the threshold visually, the midpoint of TH_ICW3C, 252.5, is a good choice because it leaves a margin on both the cyanobacteria and the non-cyanobacteria side.

It should be noted, however, that the threshold range given by Eq. (8) applies only to Sentinel-2A/B MSI images. Images from different sensors differ even when they capture the same object, which leads to deviations between their ICW3C values and hence between the thresholds for extracting cyanobacteria blooms. The ICW3C threshold therefore varies with the satellite sensor, which explains why the thresholds for Sentinel-2A/B MSI, GF1-WFV, Landsat8-OLI and HJ1A/B are different from each other (Section 3.3). If ICW3C is extended to another sensor, its threshold could be determined as follows: (1) Collect samples containing cyanobacteria bloom, water, cloud, cloud shadow and cloud edge pixels, and calculate their ICW3C values. The threshold can be roughly set to the midpoint between the minimum ICW3C value of the cyanobacteria bloom samples and the maximum ICW3C value of the other samples. (2) Fine-tune the threshold with the help of false color images, in which cyanobacteria blooms are highlighted in red.
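The first step of this procedure can be sketched in Python as follows. The function name `midpoint_threshold` and the dictionary layout are hypothetical. The example input uses the per-class extremes tabulated for candidate 17^# (maximum ICW3C of cloud, cloud shadow and cloud edge samples, minimum ICW3C of cyanobacteria samples); water samples would be added in the same way.

```python
# Sketch only: function name and data layout are illustrative assumptions.
from typing import Dict, Iterable


def midpoint_threshold(bloom: Iterable[float],
                       others: Dict[str, Iterable[float]]) -> float:
    """Midpoint between the lowest bloom value and the highest non-bloom value."""
    min_bloom = min(bloom)
    max_other = max(max(values) for values in others.values())
    if max_other >= min_bloom:
        raise ValueError("classes overlap; a single threshold cannot separate them")
    return (min_bloom + max_other) / 2


# Each class is reduced to its extreme ICW3C value here; a real sample
# set would hold many pixels per class.
bloom_samples = [330.0]
other_samples = {
    "cloud": [-563.0],
    "cloud_shadow": [-200.0],
    "yellow_cloud_edge": [175.0],
    "blue_green_cloud_edge": [59.0],
}
threshold = midpoint_threshold(bloom_samples, other_samples)
```

With these inputs the sketch returns the midpoint of the Sentinel-2A/B MSI range; the visual fine-tuning in step (2) is not covered.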

3.1.2 Sensitivity of ICW3C to viewing angle

The Lake Chaohu Sentinel-2A/B MSI image acquired on Jul. 15, 2021 is used to observe how the performance of ICW3C varies with viewing geometry. Two lines of pixels spanning a large extent in the zonal and meridional directions were selected (the yellow lines in Fig. 10 (a)). ICW3C was computed from the DN values of both lines (blue curves in Fig. 10 (b)-(c)). For comparison, both lines of pixels were first atmospherically corrected with the Sen2Cor tool, and FAI and VB-FAH were then computed from the surface reflectance (gray and purple curves in Fig. 10 (b)-(c), respectively). Results are shown in Fig. 10. The coefficient of determination R² is used to evaluate the correlation between ICW3C and FAI/VB-FAH. The R² between ICW3C and VB-FAH is slightly higher than that between ICW3C and FAI in both directions; nevertheless, all R² values exceed 0.95, indicating that ICW3C is strongly correlated with FAI and VB-FAH in both directions. This suggests that, like FAI and VB-FAH, ICW3C is stable under variations in viewing geometry.
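For readers who want to reproduce this kind of comparison, a minimal Python sketch of the coefficient of determination for a straight-line fit between two index profiles is given below. The profile values are synthetic and the function name `r_squared` is hypothetical. The fitting model behind Fig. 10 (d) is assumed here to be a simple linear fit, for which R² equals the squared Pearson correlation.

```python
# Sketch only: synthetic data, illustrative function name.
from typing import Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """R² of the least-squares line y = a*x + b (squared Pearson correlation)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("R² is undefined for a constant series")
    return (sxy * sxy) / (sxx * syy)


# Synthetic profiles along one line of pixels (not data from the paper).
icw3c_profile = [-300.0, -120.0, 80.0, 310.0, 520.0, 640.0]
fai_profile = [-0.020, -0.011, 0.001, 0.012, 0.024, 0.029]
r2 = r_squared(icw3c_profile, fai_profile)
```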

Fig. 10. Comparisons between ICW3C and FAI/VB-FAH in meridional and zonal directions. (a) Lake Chaohu Sentinel-2A/B MSI image acquired on Jul.15, 2021. Comparisons between ICW3C and FAI/VB-FAH in (b) meridional direction and (c) zonal direction. (d) Fitting results between ICW3C and FAI/VB-FAH.

3.2 Comparison between ICW3C and FAI over time

The test set contains over 100,000 pixels from 9 images acquired in different months over the past three years (Table 2) and is used to compare ICW3C and FAI over time. ICW3C was computed from the DN values of the pixels. All pixels were then atmospherically corrected with the Sen2Cor tool to obtain surface reflectance, from which FAI values were calculated. The thresholds were set to 252.5 for ICW3C and -1.148 for FAI, and pixels with values above the threshold were classified as cyanobacteria bloom. α_cyb, β_wat, β_cld, β_shd, β_bged and β_yeled were used to measure the true- and false-positive ratios (see Eq. (6); β_wat=n_wat/N_wat, where N_wat is the number of water pixels in the test set and n_wat is the number of water pixels misclassified as cyanobacteria bloom). Figure 11 (a) shows their values calculated from all pixels. FAI identifies all cyanobacteria bloom pixels in the test set, slightly better than ICW3C (99.99%), and both provide 0% false-positive ratios for water and cloud shadow pixels. ICW3C misclassifies 0.02% of cloud pixels and 1.55% of yellow cloud edge pixels. By contrast, FAI incorrectly identifies 19.18% of cloud pixels, 13.74% of yellow cloud edge pixels and 19.34% of blue-green cloud edge pixels as cyanobacteria blooms (Table 7), which is a fatal flaw for images heavily covered by thick clouds (FAI can distinguish cyanobacteria from thin clouds [11]). Figure 11 (b) compares the true-positive ratios of ICW3C and FAI over time, and the false-positive ratios of cloud pixels are shown in Fig. 11 (c). Although the recognition ratio of ICW3C for cyanobacteria bloom pixels varies from 99.94% to 100% and its misclassification ratio for cloud pixels varies from 0% to 0.08%, it provides performance comparable to FAI over time; the disparity between the two lies mainly in the misclassification of cloud and cloud edge pixels.
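The ratios used here follow the pattern of β_wat=n_wat/N_wat stated above: the number of pixels of a class that exceed the threshold divided by the number of pixels of that class. A minimal Python sketch under that reading is shown below; the function name `positive_ratio`, the class keys and the sample ICW3C values are hypothetical and are not data from the test set.

```python
# Sketch only: names and sample values are illustrative assumptions.
from typing import Dict, Sequence


def positive_ratio(values: Sequence[float], threshold: float) -> float:
    """Fraction of pixels whose index value exceeds the threshold (n / N)."""
    if not values:
        raise ValueError("class has no pixels")
    return sum(1 for v in values if v > threshold) / len(values)


# Hypothetical ICW3C values grouped by the true class of each pixel.
samples: Dict[str, Sequence[float]] = {
    "cyb": [410.0, 520.0, 333.0, 690.0],      # cyanobacteria bloom
    "wat": [-900.0, -750.0, -820.0],          # water
    "cld": [-563.0, -1200.0, 260.0, -640.0],  # cloud
}
threshold = 252.5

# True-positive ratio for the bloom class, false-positive ratios for the rest.
alpha_cyb = positive_ratio(samples["cyb"], threshold)
beta = {name: positive_ratio(vals, threshold)
        for name, vals in samples.items() if name != "cyb"}
```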

Fig. 11. Comparisons between ICW3C and FAI from May 2019 to September 2021. (a) Overall comparisons of cyanobacteria pixels true-positive ratio and non-cyanobacteria pixels false-positive ratios throughout the period. (b) Comparisons of cyanobacteria pixels true-positive ratios by month. (c) Comparisons of cloud pixels false-positive ratios by month.

Table 7. Comparisons between FAI and ICW3C (α_cyb is cyanobacteria pixels true-positive ratio, β_cld, β_yeled and β_bged are the false-positive ratios of cloud, yellow cloud edge and blue-green cloud edge pixels (as shown in Eq. (6)), respectively. R1-R5 are the five images in Fig. 12(a)-(e). n_cyb and n_cld are the numbers of cyanobacteria and cloud pixels identified as cyanobacteria blooms, respectively).

To demonstrate the performance of the two algorithms, two clear-sky regions containing cyanobacteria bloom pixels and three cloud-covered regions without cyanobacteria pixels were cropped from the images used to construct the test set; it should be emphasized that the test-set pixels are not included in these image regions. The former were used to compare the extraction of cyanobacteria bloom pixels by ICW3C and FAI, and the latter to examine their false positives on non-cyanobacteria pixels. In each image region, ICW3C was computed directly from the DN values with the threshold set to 252.5. For comparison, each image region was first atmospherically corrected with the Sen2Cor tool released by ESA [41], FAI was then computed from the surface reflectance, and pixels with FAI values above -1.148 were classified as cyanobacteria blooms.

In the clear-sky image regions with cyanobacteria blooms (Fig. 12 (a)-(b)), both algorithms successfully retrieve the bright red pixels of cyanobacteria blooms (Fig. 12 (a1) - (a2) and (b1) - (b2)); their differences lie at the edges of the blooms (Fig. 12 (a3) - (b3)), where the pixels are dark red. These dark red pixels are probably low-concentration cyanobacteria. FAI classifies more dark red pixels as blooms, whereas ICW3C is more cautious with them, as shown in Fig. 12 (a4) - (a5) and Fig. 12 (b4) - (b5). As a result, ICW3C extracts 5.81% fewer cyanobacteria pixels than FAI over the two regions combined (Table 7).

Fig. 12. Regions from Lake Chaohu Sentinel-2A/B MSI images and results of cyanobacteria blooms detected by FAI and ICW3C. False color images acquired on (a) May. 11, 2020 and (b) Sept. 23, 2021. Results of Fig. 12 (a) by using (a1) FAI and (a2) ICW3C. (a3) Difference between Fig. 12 (a1) and (a2). Results of Fig. 12 (b) by using (b1) FAI and (b2) ICW3C. (b3) Difference between Fig. 12 (b1) and (b2). (a4) - (a5) Details in Fig. 12 (a1) - (a2). (b4) - (b5) Details in Fig. 12 (b1) - (b2). True color images acquired on (c) Oct. 8, 2020, (d) May. 11, 2020 and (e) Jul.15, 2021. Results of Fig. 12 (c) by using (c1) FAI and (c2) ICW3C. Results of Fig. 12 (d) by using (d1) FAI and (d2) ICW3C. Results of Fig. 12 (e) by using (e1) FAI and (e2) ICW3C. (f) Northeast Lake Chaohu false color images acquired on Oct.8, 2020. Results of Fig. 12 (f) by using (f1) FAI and (f2) ICW3C (The figures are the numbers of pixels detected as cyanobacteria blooms).

In the cloud-covered image regions without cyanobacteria blooms (Fig. 12 (c)-(e)), FAI misclassifies 569, 1638 and 3874 cloud and cloud edge pixels as cyanobacteria blooms, whereas ICW3C misclassifies only 7, 3 and 0, as shown in Fig. 12 (c1) - (e2). In total, FAI misclassifies over 608 times as many cloud and cloud edge pixels as ICW3C (Table 7). Figure 12 (f) shows the cloud-covered image of northeast Lake Chaohu acquired on Oct. 8, 2020. Although both ICW3C and FAI extract the cyanobacteria blooms, FAI misclassifies large numbers of cloud and cloud edge pixels as cyanobacteria, as shown in Fig. 12 (f1) - (f2).
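The two summary figures in Table 7 can be checked from the pixel counts, assuming that the relative deviation is taken with respect to the FAI total:

$$\frac{48495 - 45679}{48495} \approx 5.81\%,\qquad \frac{569 + 1638 + 3874}{7 + 3 + 0} = \frac{6081}{10} = 608.1$$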

3.3 ICW3C’s adaptability on other sensors

Since ICW3C is established on Sentinel-2A/B MSI imagery, experiments on other sensors, namely GF1-WFV, Landsat8-OLI and HJ1A/B, are conducted to test its adaptability. ICW3C was applied to three clear-sky images: a GF1-WFV image acquired on Sept. 4, 2018, a Landsat8-OLI image acquired on Oct. 3, 2018 and an HJ1A/B image acquired on Jul. 31, 2018. The ICW3C thresholds are 252.5, 500 and 40 for the GF1-WFV, Landsat8-OLI and HJ1A/B sensors, respectively. For comparison, FAI was used to detect cyanobacteria blooms in the Landsat8-OLI image with a threshold of 0.025, and VB-FAH was applied to the other two images with a threshold of 0.038 because the GF1-WFV and HJ1A/B sensors lack SWIR bands. Fig. 13 (a), (b) and (c) show the false color Lake Chaohu images from the GF1-WFV, Landsat8-OLI and HJ1A/B sensors, respectively, in which cyanobacteria blooms are highlighted in red. For the three images, the extracted cyanobacteria bloom areas are 168.57 km², 157.91 km² and 30.59 km² for FAI or VB-FAH and 163.25 km², 153.16 km² and 29.26 km² for ICW3C (pink pixels in Fig. 13 (d)-(i)), giving relative deviations of 3.16%, 3.01% and 4.35% (Table 8). As with the Sentinel-2A/B MSI sensor, the cyanobacteria areas detected by ICW3C on GF1-WFV, Landsat8-OLI and HJ1A/B images are basically consistent with those of VB-FAH/FAI, indicating that ICW3C is also applicable to these sensors.
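The relative deviations quoted above are consistent with dividing the area difference by the FAI/VB-FAH area. A short Python sketch of that calculation follows; the function name `relative_deviation` is hypothetical, and the choice of the FAI/VB-FAH area as the reference is an assumption inferred from the numbers.

```python
# Sketch only: the reference (denominator) choice is an assumption.
def relative_deviation(reference_km2: float, icw3c_km2: float) -> float:
    """Relative deviation of the ICW3C area from the reference area, in percent."""
    if reference_km2 <= 0:
        raise ValueError("reference area must be positive")
    return abs(reference_km2 - icw3c_km2) / reference_km2 * 100.0


# Areas (km²) reported for the three sensors: (FAI or VB-FAH, ICW3C).
areas = {
    "GF1-WFV": (168.57, 163.25),
    "Landsat8-OLI": (157.91, 153.16),
    "HJ1A/B": (30.59, 29.26),
}
deviations = {sensor: round(relative_deviation(ref, icw), 2)
              for sensor, (ref, icw) in areas.items()}
```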

Fig. 13. Results of cyanobacteria blooms in Lake Chaohu extracted from different sensors. False color images from (a) GF1-WFV acquired on Sept.4, 2018, (b) Landsat8-OLI acquired on Oct.3, 2018 and (c) HJ1A/B acquired on Jul.31, 2018. Results of cyanobacteria blooms in Fig. 13 (a) detected by (d) VB-FAH and (g) ICW3C. Results of cyanobacteria blooms in Fig. 13 (b) detected by (e) FAI and (h) ICW3C. Results of cyanobacteria bloom in Fig. 13 (c) detected by (f) VB-FAH and (i) ICW3C. The pink pixels in Fig. 13 (d)-(i) denote the detected cyanobacteria blooms.

Table 8. Statistical results of cyanobacteria bloom areas detected from different sensors

4. Conclusions

Cloud-covered images account for over 70% of Lake Chaohu imagery, and traditional indicators often mistake some cloud pixels for cyanobacteria blooms, so cloud-masking is needed before blooms are extracted from satellite images. Atmospheric correction is another challenge, because general atmospheric correction methods are lacking and the accuracy of atmospheric correction is difficult to evaluate without in situ measurements. TCT, however, extracts vegetation properties directly from satellite imagery DN values without any prior atmospheric correction, which offers a route to extracting cyanobacteria blooms independently of atmospheric correction. This study therefore takes advantage of TCT to establish an indicator that distinguishes cyanobacteria blooms from clouds and lake water directly from image DN values, without any atmospheric correction or cloud-masking. Experiments are performed on satellite images to test the established indicator, and the conclusions are as follows:

(1) Comparisons between the established ICW3C and the traditional indicator FAI are first performed on the test set, which contains samples from May 2019 to September 2021. The results show that 1) FAI retrieves cyanobacteria bloom samples with an accuracy of 100%, slightly better than the 99.99% of ICW3C, and both provide 0% false-positive ratios for water and cloud shadow pixels. 2) Their disparity lies mainly in the misclassification of cloud and cloud edge pixels. ICW3C misclassifies 0.02% of cloud pixels and 1.55% of yellow cloud edge pixels as cyanobacteria blooms, whereas FAI incorrectly identifies 19.18% of cloud, 13.74% of yellow cloud edge and 19.34% of blue-green cloud edge pixels as cyanobacteria blooms. 3) Results by month from May 2019 to September 2021 show that the true-positive ratio of ICW3C for cyanobacteria bloom samples varies between 99.94% and 100%, and its false-positive ratio for cloud samples varies between 0% and 0.08%, indicating that ICW3C provides comparable performance over time.
(2) Comparisons between ICW3C and FAI are then performed on five image regions that contain no training or test set pixels. The results show that 1) In clear-sky image regions with cyanobacteria blooms, ICW3C extracts 5.81% fewer pixels than FAI, and difference maps show that the differences lie mainly at the edges of cyanobacteria blooms, where the biomass is lower than at the centers. ICW3C is more cautious with low-biomass cyanobacteria, as both its training and test sets come from the high-biomass centers of cyanobacteria bloom regions. 2) In cloud-covered image regions without cyanobacteria blooms, FAI misclassifies over 608 times as many cloud and cloud edge pixels as ICW3C, showing that ICW3C significantly reduces the misclassification of clouds.
(3) A threshold value range of (175 330) is given for applying ICW3C to Sentinel-2A/B MSI images. The sensitivity of ICW3C to the choice of threshold is analyzed, and the results show that reducing the threshold from 330 to 175 causes a slight improvement in the recognition ratio of cyanobacteria pixels (a relative increase of 0.002%) and marked relative increases in the misclassified cloud pixels (141.7%) and yellow cloud edge pixels (188.9%). Even so, the false-positive ratio of cloud pixels remains very low (<0.03%). Although the false-positive ratio of yellow cloud edges is the highest in both absolute value (3.106%) and relative increase (188.9%), cloud edges usually cover far fewer pixels than other non-cyanobacterial classes. Changing the threshold within its value range therefore does not lead to a serious increase in misclassification.
(4) The sensitivity of ICW3C to viewing geometry is analyzed, and the results suggest that ICW3C is strongly correlated with FAI and VB-FAH in both zonal and meridional directions (all R²>0.95), indicating that, like FAI and VB-FAH, ICW3C is stable under variations in viewing geometry.
(5) ICW3C is extended to the GF1-WFV, Landsat8-OLI and HJ1A/B sensors, and results on clear-sky images show that the relative deviations of cyanobacteria bloom areas between ICW3C and FAI/VB-FAH are 3.16% for GF1-WFV, 3.01% for Landsat8-OLI and 4.35% for HJ1A/B. ICW3C provides performance comparable to FAI/VB-FAH, indicating that it is also applicable to these sensors.

In summary, the established indicator ICW3C makes it possible to extract cyanobacteria blooms in Lake Chaohu directly from satellite imagery DN values without any prior atmospheric correction or cloud-masking. ICW3C places cyanobacteria bloom pixels in a value range completely separated from those of water, cloud, cloud shadow and the majority of cloud edge pixels, so that blooms can be distinguished from the other classes in Sentinel-2A/B MSI images by setting the threshold within (175 330). ICW3C is confirmed to be applicable to the Sentinel-2A/B MSI, GF1-WFV, Landsat8-OLI and HJ1A/B sensors; however, further research is needed to test whether it is suitable for other inland lakes or seas, which will be our focus.

Funding

National Natural Science Foundation of China (61805001); Natural Science Foundation of Anhui Province (1808085QF218); Independent Innovation Research Fund of Anhui Key Laboratory of Smart Agricultural Technology and Equipment (APKLSATE2019X007); Graduate Innovation Fund of Anhui Agricultural University, China (2021yjs-51).

Acknowledgments

We appreciate the research paper of Dr. Hailong Zhang, Nanjing University of Information Science and Technology, which inspired this study.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. O. N. Kimambo, H. Chikoore, J. R. Gumbo, and T. A. M. Msagati, “Retrospective analysis of Chlorophyll-a and its correlation with climate and hydrological variations in Mindu Dam, Morogoro Tanzania,” Heliyon 5, e02834 (2019).

2. L. Feng, “Key issues in detecting lacustrine cyanobacteria bloom using satellite remote sensing,” J. Lake Sci. 33(3), 647–652 (2021).

3. “Chaohu Cyanobacteria Bloom Burst Out, Covering one-sixth of the Waters,” Available online: https://www.sohu.com/a/245618821_688917 (2018).

4. Z. Wen, J. Zhou, and K. Yue, “Mode of Ecological Civilization Construction Based on Water Environment—Case Study of Hefei City and Lake Chaohu Basin,” Chinese Journal of Engineering Science 21(5), 113–119 (2019).

5. J. Gao, Y. Gao, and Z. Zhang, “Theory and application of aquatic ecoregion delineation in lake-basin,” Progress in Geography 38(8), 1159–1170 (2019).

6. J. Shin, K. Kim, Y. B. Son, and J. H. Ryu, “Synergistic effect of multi-sensor Data on the detection of Margalefidinium polykrikoides in the South Sea of Korea,” Remote Sens. 11(1), 36 (2019).

7. H. Liu, H. Ma, Q. Tang, and D. Wang, “Analysis on how error transfer coefficient varies with frequencies in satellite jitter detected from parallax observation images taken by adjacent CCDs: a case for three adjacent CCDs,” Optics Communications 503, 127422 (2022).

8. T. Kyle, V. Sagan, and J. Sloan, “Deep learning-based water quality estimation and anomaly detection using Landsat-8/Sentinel-2 virtual constellation and cloud computing,” GIScience & Remote Sensing 57(4), 510–525 (2020).

9. Q. Xing, L. Wu, L. Tian, T. Cui, L. Li, F. Kong, X. Gao, and M. Wu, “Remote sensing of early-stage green tide in the Yellow Sea for floating-macroalgae collecting campaign,” Marine Pollution Bulletin 133(8), 150–156 (2018).

10. M. Wang and C. Hu, “Mapping and quantifying Sargassum, distribution and coverage in the Central West Atlantic using MODIS observations,” Remote Sensing of Environment 183, 350–367 (2016).

11. C. Hu, “A novel ocean color index to detect floating algae in the global oceans,” Remote Sensing of Environment 113(10), 2118–2129 (2009).

12. https://en.jinzhao.wiki/wiki/Red_edge

13. W. Li, H. Shi, Y. Zhang, Z. Niu, T. Wang, M. Ding, and K. Cai, “Cyanobacteria Blooms Monitoring in Taihu Lake Based on the Sentinel-2A Satellite of European Space Agency,” Environmental Monitoring in China 34(4), 169–176 (2018).

14. Y. Xiao, J. Zhang, and T. Cui, “High-precision extraction of nearshore green tides using satellite remote sensing data of the Yellow Sea, China,” International Journal of Remote Sensing 38(6), 1626–1641 (2017).

15. Y. He, Q. Xiong, X. Luo, T. Li, and L. Yu, “Study on spatio-temporal changes of water bloom in Dianchi Lake based on NDVI,” Ecology and Environmental Sciences 28(3), 555–563 (2019).

16. D. Zhang, X. Yin, B. She, Y. Ding, D. Liang, L. Huang, J. Zhao, and Y. Gao, “Using multi-source satellite imagery data to monitor of cyanobacteria bloom in Lake Chaohu,” Infrared and Laser Engineering 48(7), 303–314 (2019).

17. M. Shahzaman, W. Zhu, M. Bilal, H. A. Habtemicheal, F. Mustafa, I. Arshad, S. Ishfaq, and R. Iqbal, “Remote Sensing Indices for Spatial Monitoring of Agricultural Drought in South Asian Countries,” Remote Sens. 13(11), 2059 (2021).

18. M. Wang and W. Shi, “Estimation of ocean contribution at the MODIS near-infrared wavelengths along the east coast of the U.S.: Two case studies,” Geophysical Research Letters 32(13), L13606 (2005).

19. J. Zhang, L. Chen, and X. Chen, “Monitoring cyanobacteria blooms bloom based on remote sensing in Lake Erhai by FAI,” J. Lake Sci. 28(4), 718–725 (2016).

20. X. Tang, M. Sheng, and H. Duan, “Temporal and spatial distribution of algal blooms in Lake Chaohu, 2000-2015,” J. Lake Sci. 29(2), 276–284 (2017).

21. Q. Xing and C. Hu, “Mapping Macroalgal Blooms in the Yellow Sea and East China Sea Using HJ-1 and Landsat Data: Application of a Virtual Baseline Reflectance Height Technique,” Remote Sensing of Environment 178, 113–126 (2016).

22. I. Caballero, R. Fernández, O. M. Escalante, L. Maman, and G. Navarro, “New capabilities of Sentinel-2A/B satellites combined with in situ data for monitoring small harmful algal blooms in complex coastal waters,” Sci Rep. 10(1), 8743 (2020).

23. A. German, V. Andreo, C. Tauro, C. M. Scavuzzo, and A. Rerral, “A novel method based on time series satellite data analysis to detect algal blooms,” Ecological Informatics 59, 101131 (2020).

24. S. Mishra, R. P. Stumpf, B. A. Schaeffer, P. J. Werdell, K. A. Loftin, and A. Meredith, “Measurement of Cyanobacterial Bloom Magnitude using Satellite Remote Sensing,” Sci Rep 9(1), 18310 (2019).

25. A. A. Molkov, S. V. Fedorov, V. V. Pelevin, and E. N. Korchemkina, “Regional Models for High-Resolution Retrieval of Chlorophyll a and TSM Concentrations in the Gorky Reservoir by Sentinel-2 Imagery,” Remote Sens. 11(10), 1215 (2019).

26. P. R. Renosh, D. Doxaran, L. D. Keukelaere, and J. I. Gossn, “Evaluation of Atmospheric Correction Algorithms for Sentinel-2-MSI and Sentinel-3-OLCI in Highly Turbid Estuarine Waters,” Remote Sens. 12(8), 1285 (2020).

27. P. Singh and N. Komodakis, “Cloud-GAN: Cloud Removal for Sentinel-2 Imagery Using a Cyclic Consistent Generative Adversarial Network,” IGARSS, Valencia, Spain, hal-01832797 (2018).

28. R. Kauth and G. Thomas, “The Tasselled Cap – A Graphic Description of the Spectral-Temporal Development of Agricultural Crops as Seen by LANDSAT,” Machine processing of remotely sensed data (1976).

29. H. A. Muhammad, L. Zhang, T. Shuai, and Q. Tong, “Derivation of a tasselled cap transformation based on Landsat 8 at-satellite reflectance,” Remote Sens Letters 5(5), 423–431 (2014).

30. H. Zhang, Z. Qiu, E. Devred, D. Sun, S. Wang, Y. He, and Y. Yu, “A simple and effective method for monitoring floating green macroalgae blooms: a case study in the Yellow Sea,” Optics Express 27(4), 4528–4548 (2019).

31. H. Liu, H. Ren, X. Niu, and P. Xia, “Extraction of cyanobacteria bloom in Lake Chaohu based on Sentinel-2 remote sensing images,” Ecology and Environmental Sciences 30(1), 146–155 (2021).

32. D. Sun, Y. Chen, S. Wang, H. Zhang, Z. Qiu, Z. Mao, and Y. He, “Using Landsat 8 OLI data to differentiate Sargassum and Ulva prolifera blooms in the South Yellow Sea,” Int J APPL Earth OBS. 98, 102302 (2021).

33. Q. Xing, R. Guo, L. Wu, D. An, M. Cong, S. Qin, and X. Li, “High-Resolution Satellite Observations of a New Hazard of Golden Tides Caused by Floating Sargassum in Winter in the Yellow Sea,” IEEE Geosci. Remote Sensing Lett 14(10), 1815–1819 (2017).

34. N. Chen, S. Wang, X. Zhang, and S. Yang, “A risk assessment method for remote sensing of cyanobacterial blooms in inland waters,” Sci. Total Environ. 740, 140012 (2020).

35. B. Li, C. Di, and X. Yan, “Study of derivation of tasseled cap transformation for Landsat-8 OLI images,” Science of Surveying and Mapping 41(4), 102 (2016).

36. K. Griffin, H. Burke, and M. Dan, “Cloud cover detection algorithm for EO-1 Hyperion imagery,” IEEE International Geoscience & Remote Sensing Symposium (2003).

37. M. Wozniak, K. M. Bradtke, M. Darecki, and A. Kresel, “Empirical Model for Phycocyanin Concentration Estimation as an Indicator of Cyanobacterial Bloom in the Optically Complex Coastal Waters of the Baltic Sea,” Remote Sens. 8(3), 212 (2016).

38. H. Liu, H. Ma, Z. Jiang, and D. Yan, “Jitter detection based on parallax observations and attitude data for Chinese Heavenly Palace-1 satellite,” Opt. Express 27(2), 1099–1123 (2019).

39. G. Xie, M. Li, W. Lu, W. Zhou, L. Yu, F. Li, and S. Yang, “Spectral features, remote sensing identification and break-out meteorological conditions of algal bloom in Lake Dianchi,” J. Lake Sci. 22(3), 327–336 (2010).

40. C. Chen, B. Zhou, Y. Tian, Z. Chen, and F. Gao, “Application of Environmental Satellite HJ 1A/B-CCD Data for Cyanobacteria Dynamic Monitoring in Lake Chaohu,” Environmental Monitoring in China 30(1), 200–204 (2012).

41. A. Ansper and K. Alikas, “Retrieval of Chlorophyll a from Sentinel-2 MSI Data for the European Union Water Framework Directive Reporting Purposes,” Remote Sens. 11(1), 64 (2019).

Sensor	Band central wavelength (nm)					Resolution (m)	Revisit cycle (days)
	B	G	R	NIR	SWIR		
Sentinel-2A/B MSI	492	559	665	842	1610	10, 20(SWIR)	5
Landsat-8 OLI	482	561	654	864	1609	30	16
GF-1 WFV	485	560	660	830	-	16	4
HJ-1A/B	475	560	660	830	-	30	2

Sample	Spring			Summer			Autumn			Total
Sample	Apr.	May.		Jul.		Aug.	Sept.	Oct.	Nov.	Total
Cyanobacteria bloom	1956*	11898’	1598*	4093*	3626*	7463’	4804^{^}	5592’	860’	33350/46411
			3464’	2276^{^}	5663’		3229*
			3464’	2276^{^}	5119^{^}		18120^{^}
Cloud	744^{^}	6172’	5140*	5672*	1025’	1652’	4235*	3692’	-	30393/17097
Cloud	744^{^}	15453^{^}	2261’	1444^{^}	1025’	1652’	4235*	3692’	-	30393/17097
Water	2095*	3193’	1763*	2175*	2922*	5058’	3679^{^}	4876^{^}	8448*	37631/31053
	1456^{^}	2646^{^}	2570’	3531^{^}	4864’		5192*		6806’
	1456^{^}	2646^{^}	2570’	3531^{^}	1947^{^}		5463^{^}		6806’
Shadow	-	1236’	1399’	1159*	607’	697’	1174*	2992’	-	3928/6172
Shadow	-	1236’	1399’	836^{^}	607’	697’	1174*	2992’	-	3928/6172
Yellow edge	-	423’	425’	-	929*	205’	520*	729’	-	1898/1674
Yellow edge	-	423’	425’	-	341^{^}	205’	520*	729’	-	1898/1674
Blue-green edge	-	703’	380’	-	326*	220’	-	1089’	-	1654/1469
Blue-green edge	-	703’	380’	-	405^{^}	220’	-	1089’	-	1654/1469

No.	Candidate	R²	RMSE (×10⁻⁴)	MPD	No.	Candidate	R²	RMSE (×10⁻⁴)	MPD
1^#	-TCB-TCG	0.98	37	0.20%	10^#	-TCB + TCG-TCW-TCN	0.98	44	0.20%
2^#	-TCB-TCG + TCN	0.98	34	0.19%	11^#	-TCB + TCG-TCN	0.96	63	0.32%
3^#	-TCB-TCG + TCW-TCN	0.98	35	0.20%	12^#	TCW + TCN	0.98	36	0.15%
4^#	-TCB-TCG + TCW	0.99	24	0.13%	13^#	-TCW	0.97	46	0.20%
5^#	-TCB-TCG + TCW + TCN	0.99	26	0.14%	14^#	-TCN	0.97	48	0.26%
6^#	-TCB-TCW-TCN	0.95	63	0.26%	15^#	TCG-TCW-TCN	0.99	23	0.10%
7^#	-TCB + TCN	0.95	68	0.39%	16^#	TCG-TCW	0.99	28	0.14%
8^#	-TCB + TCW	0.97	48	0.27%	17^#	TCG-TCW + TCN	0.95	68	0.35%
9^#	-TCB + TCW + TCN	0.98	43	0.23%	18^#	TCG-TCN	0.99	20	0.11%
					19^#	TCG	0.99	28	0.15%

	6^#	10^#	11^#	15^#	16^#	17^#	18^#	19^#
Max_cld	-1397	-3051	-4633	302	21	-563	-1287	-13
Max_shd	-720	-1169	-1257	-351	-288	-200	-495	-424
Max_yeled	-488	-1542	-2323	606	407	175	-811	-788
Max_bged	-914	-1535	-1814	166	101	59	-412	-456
Min_cyb	-1234	-1938	-3700	640	514	330	-564	-893

Optics Express

	α_cyb	β_cld	β_yeled	β_shd	β_bged
Variation	99.991% $\nearrow$ 99.993%	0.012% $\nearrow$ 0.029%	1.075% $\nearrow$ 3.106%	0% $\nearrow$ 0%	0% $\nearrow$ 0%
Relative deviation	0.002%	141.7%	188.9%	0%	0%

	Comparison on test set					Comparison on image regions
	α_cyb	β_cld	β_yeled	β_bged	n_cyb R1 (pixels)	n_cyb R2 (pixels)	n_cyb Sum (pixels)	n_cld R3 (pixels)	n_cld R4 (pixels)	n_cld R5 (pixels)	n_cld Sum (pixels)
FAI	100%	19.18%	13.74%	19.34%	37433	11062	48495	569	1638	3874	6081
ICW3C	99.99%	0.02%	1.55%	0%	35104	10575	45679	7	3	0	10
					Relative deviation		5.81%	Ratio			608.1

Time	Sensor	FAI/VB-FAH area (km²)	ICW3C area (km²)	Relative deviation
2018/09/04	GF1-WFV	168.57	163.25	3.16%
2018/10/03	Landsat8-OLI	157.91	153.16	3.01%
2018/07/31	HJ-1A/B	30.59	29.26	4.35%