Approach for estimating the vertical distribution of the diffuse attenuation coefficient in the South China Sea

Xianqing Zhang; Cai Li; Yuanning Zheng; Cong Liu; Wen Zhou; Zhantang Xu; Zeming Yang; Yuezhong Yang; Wenxi Cao

doi:10.1364/OE.503850

1. Introduction

The vertical distribution of the diffuse attenuation coefficient $K({z,\lambda } )$ (see Table 1 for the main abbreviations used in this study) is critical for studies of bio-optics, the microbial carbon pump, phytoplankton photosynthesis, and the heat budget of the oceanic mixed layer [1]. It is also a key indicator in applications involving the distribution of the underwater light field [2], underwater photovoltaic (PV) power generation [3], and the warning of harmful algal blooms [4–6]. As a key apparent optical property (AOP), $K({z,\lambda } )$ is predominantly influenced by the inherent optical properties (IOPs) and minimally impacted by the geometry of the underwater light field [7–11].

Table 1. The main abbreviations used in this study

The volume scattering function (VSF, $\beta ({\psi ,z,\lambda } )$) and the absorption coefficient $a({z,\lambda } )$ are the fundamental IOPs. Coupled with the boundary conditions, radiometric variables such as radiance $L({z,\lambda } )$ and irradiance $E({z,\lambda } )$, and AOPs such as the remote sensing reflectance ${R_{rs}}(\lambda )$ and the diffuse attenuation coefficient $K({z,\lambda } )$, can be simulated based on the radiative transfer equation (RTE) [12]. In the past decades, analytical and semianalytical algorithms have been developed for the estimation of $K({z,\lambda } )$ from IOPs [10,13,14].

Analytical algorithms use a numerical simulation of the RTE, without any approximations, to simulate $K({z,\lambda } )$ based on IOPs. An RTE numerical simulation model that provides precise solutions for estimating AOPs (such as $K({z,\lambda } )$) from IOPs such as $a({z,\lambda } )$, $c({z,\lambda } )$, $\beta ({\psi ,z,\lambda } )$, etc. [5] was developed by Mobley [10], who then built the commercial software Hydrolight on this model [10]. The model and software are widely used to simulate optical properties [13,15].

In contrast to the analytical algorithm, semianalytical algorithms rely on a series of approximations to the RTE together with simulations by the Hydrolight software. Many such algorithms for estimating $K({z,\lambda } )$ from IOPs have been established and used in marine optical research [13,14,16,17].

Besides the analytical and semianalytical algorithms, several empirical algorithms for estimating $K(\lambda )$ from ${R_{rs}}(\lambda )$ or $Chla$ have been built and applied [18–21]. These empirical algorithms can estimate $K(\lambda )$ with high spatial coverage and temporal resolution, but they predominantly focus on $K(\lambda )$ at the surface.

The South China Sea (SCS) is the largest tropical marginal sea in the western Pacific Ocean. With its extensive area and volume, it plays an important role in regulating the regional climate and carbon cycle, so it is essential to study the vertical distribution of $K({z,\lambda } )$ there [22,23]. In the past decades, Wang et al. [24] and Zhao et al. [25] have studied the relationships of ${K_d}(\lambda )$ with $Chla$ and ${R_{rs}}(\lambda )$ in the SCS, whereas studies of the vertical distribution of $K({z,\lambda } )$ remain limited [6,26].

The machine learning algorithm (MLA), characterized by its high efficiency, strong robustness, and convenience, is an effective solution for addressing the nonlinear relationships between various parameters and is widely used in the field of ocean color remote sensing [27–30]. In this study, using the MLA, a novel method for calculating the vertical distribution of $K({z,\lambda } )$ in the SCS was developed based on in situ IOPs.

This novel approach is based on $\beta ({\psi ,z,650} )$ ($\psi = 20^\circ $, $50^\circ $, $71^\circ $, $90^\circ $, $126^\circ $, $140^\circ $, and $168^\circ $), $a({z,650} )$, and the profile depths z from three cruises in the SCS, alongside various matching boundary conditions. Using three MLAs (categorical boosting (CatBoost), light gradient boosting machine (LightGBM), and random forest (RF)), machine learning models (MLMs) were developed for estimating the vertical distribution of the diffuse attenuation coefficients in the SCS: the diffuse attenuation coefficient for the downwelling irradiance ${K_d}({z,650} )$, the diffuse attenuation coefficient for the upwelling radiance ${K_{Lu}}({z,650} )$, and the diffuse attenuation coefficient for the upwelling irradiance ${K_u}({z,650} )$. The MLMs were evaluated by three accuracy evaluation indicators (${R^2}$, RMSE, and MAPE), compared with the results calculated by Hydrolight (taken as the ground truth in this study), and applied to study the vertical and spatial distribution of $K({z,\lambda } )$ in the SCS.

2. Methods and data

2.1 Study area

Located in the western Pacific Ocean, the SCS is the largest tropical marginal sea with a surface area of 3.5 million square kilometers and a maximum depth of more than 5 thousand meters [31]. Due to its extensive area and volume and unique geographical location, the SCS is a crucial regulatory agent in terms of regional climate and carbon cycle [22]. Numerous intricate and dynamic processes take place in the SCS, including coastal upwelling, periodic typhoons, freshwater from the Pearl River estuary, and some subtropical bays influenced by human activity [6,32], leading to optically complex waters in the SCS.

Datasets were obtained from three cruises in the SCS: September 30 to October 24, 2012 (cruise 2012), August 9-10, 2013 (cruise 2013), and June 30 to July 15, 2015 (cruise 2015). These cruise datasets contained 166 sets of match-up vertical profiles of the total minus water absorption coefficient ${a_{nw}}({z,\lambda } )$, the total minus water attenuation coefficient ${c_{nw}}({z,\lambda } )$, and $\beta ({\psi ,z,650} )$ ($\psi = 20^\circ $, $50^\circ $, $71^\circ $, $90^\circ $, $126^\circ $, $140^\circ $, and $168^\circ $) for 26 stations, as shown in Fig. 1, along with the solar zenith angle, which was computed from the time, latitude, and longitude of each station. Additionally, various meteorological parameters (e.g., wind speed and relative humidity), which were either collected simultaneously during sampling or obtained from reanalysis datasets, were also essential in this study.

Fig. 1. Locations of datasets obtained from cruise 2012, cruise 2013, and cruise 2015 (the red line denotes a section near the Pearl River mouth).

2.2 Data collection and processing

2.2.1 Absorption and attenuation coefficients

The vertical profiles of the absorption coefficient and attenuation coefficient were obtained via ac-9/ac-s (the Wetlabs [now Seabird, in Bellevue, WA, USA] underwater absorption and attenuation meter). The ac-9 (412 nm, 440 nm, 488 nm, 510 nm, 532 nm, 555 nm, 650 nm, 676 nm, 715 nm) was used during cruise 2012, while the ac-s (82 wavebands between 401.6 nm and 744.1 nm) was used during cruises 2013 and 2015. After processing, the data were interpolated to a vertical resolution of 1 m for better usability. Notably, the ac-9/ac-s measure the total minus water absorption coefficient ${a_{nw}}({z,\lambda } )$ rather than the total absorption coefficient $a({z,\lambda } )$ that was used in the MLMs. The seawater absorption coefficient [33] was therefore added to the processed ac-9/ac-s measurements when building the MLMs.
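The two processing steps described above (interpolation to a 1 m grid and conversion from ${a_{nw}}$ to $a$) can be sketched as follows. This is a minimal illustration only: it assumes linear interpolation with NumPy, which the text does not specify, and it uses a placeholder pure-water absorption value at 650 nm that may differ from the value taken from [33]. The function name and the sample profile are hypothetical.

```python
import numpy as np

# Placeholder pure-water absorption at 650 nm in m^-1 (assumption; the study uses [33])
A_W_650 = 0.340

def total_absorption_profile(depth_m, a_nw, a_w=A_W_650, step_m=1.0):
    """Interpolate a_nw(z) onto a regular depth grid and add water absorption."""
    depth_m = np.asarray(depth_m, dtype=float)
    a_nw = np.asarray(a_nw, dtype=float)
    order = np.argsort(depth_m)  # np.interp requires increasing depths
    # Regular grid covering only the sampled depth range (no extrapolation)
    z_grid = np.arange(np.ceil(depth_m.min()), np.floor(depth_m.max()) + step_m, step_m)
    a_nw_grid = np.interp(z_grid, depth_m[order], a_nw[order])
    return z_grid, a_nw_grid + a_w  # a = a_nw + a_w

# Hypothetical a_nw(z, 650) samples at irregular depths
z, a = total_absorption_profile([0.6, 1.4, 2.7, 4.1], [0.10, 0.12, 0.11, 0.09])
```

Restricting the grid to the sampled depth range avoids extrapolating beyond the measured profile.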

2.2.2 Volume scattering function

The vertical profiles of the volume scattering function (VSF, $\beta ({\psi ,z,\lambda } )$) were collected by a Volume Scattering and Attenuation Meter (VSAM, developed by the South China Sea Institute of Oceanology, Chinese Academy of Sciences, in Guangzhou, Guangdong, China). The VSAM was used to measure vertical profiles of $\beta ({\psi ,z,\lambda } )$ at seven angles ($20^\circ $, $50^\circ $, $71^\circ $, $90^\circ $, $126^\circ $, $140^\circ $, and $168^\circ $) at 650 nm. The VSAM data were also interpolated to a vertical resolution of 1 m, consistent with the ac-9/ac-s measurements.

2.2.3 Boundary conditions

Based on the RTE, boundary conditions (including the solar zenith angle, wind speed, relative humidity, sea level pressure, visibility, total cloud cover, and precipitable water) are essential inputs for calculating $K({z,\lambda } )$ with HL6.0, and they are also used for constructing the MLMs with boundary conditions. The solar zenith angle was calculated from the sampling time, latitude, and longitude. The wind speed, relative humidity, and sea level pressure were measured by the automatic weather station on the research ship. The visibility was determined from the Global Atmospheric Reanalysis (CRA-40) dataset (found at [34]), while the total cloud cover and precipitable water were obtained from the National Centers for Environmental Prediction (NCEP)/National Center for Atmospheric Research (NCAR) Reanalysis I dataset (found at [35]).
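As an illustration of how a solar zenith angle can be derived from the sampling time, latitude, and longitude, the sketch below uses a simple textbook approximation (Cooper's declination formula, with no equation-of-time or refraction correction). The text does not state which solar position algorithm was used, so this is an assumption, and the function name and station coordinates are hypothetical.

```python
import math
from datetime import datetime

def solar_zenith_deg(utc_time, lat_deg, lon_deg):
    """Approximate solar zenith angle in degrees; utc_time is a naive datetime in UTC."""
    day_of_year = utc_time.timetuple().tm_yday
    # Cooper's approximation for the solar declination
    decl = math.radians(23.45) * math.sin(2.0 * math.pi * (284 + day_of_year) / 365.0)
    # Local solar time approximated from UTC and longitude (east positive)
    solar_hour = (utc_time.hour + utc_time.minute / 60.0
                  + utc_time.second / 3600.0 + lon_deg / 15.0)
    hour_angle = math.radians(15.0 * (solar_hour - 12.0))
    lat = math.radians(lat_deg)
    cos_zen = (math.sin(lat) * math.sin(decl)
               + math.cos(lat) * math.cos(decl) * math.cos(hour_angle))
    # Clamp against rounding before taking the arccosine
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_zen))))

# Hypothetical station in the northern SCS at 04:00 UTC (close to local noon)
theta_s = solar_zenith_deg(datetime(2015, 7, 1, 4, 0, 0), 21.0, 114.0)
```

For operational use, a full solar position algorithm would normally replace this approximation.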

2.3 Methods

In this study, ${K_d}({z,650} )$, ${K_{Lu}}({z,650} )$, and ${K_u}({z,650} )$ were first calculated by HL6.0 in the Measured IOPs model, based on the water IOPs ($\beta ({\psi ,z,650} )$, ${a_{nw}}({z,\lambda } )$, and ${c_{nw}}({z,\lambda } )$), the chlorophyll-a concentration $Chla(z )$, and the boundary conditions. Then, taking the HL6.0 results as the foundation and combining them with $\beta ({\psi ,z,650} )$, $a({z,650} )$, the profile depths z, and the boundary conditions, three MLMs for estimating ${K_d}({z,650} )$, ${K_{Lu}}({z,650} )$, and ${K_u}({z,650} )$ were developed and evaluated by employing optimized decision tree (DT) algorithms. Finally, the best MLMs for estimating ${K_d}({z,650} )$, ${K_{Lu}}({z,650} )$, and ${K_u}({z,650} )$ in the SCS were selected based on the results of three accuracy evaluation indicators and comparison with HL6.0 simulations. The flowchart for this study is shown in Fig. 2.

Fig. 2. Flowchart for ${K_d}({z,650} )$, ${K_{Lu}}({z,650} )$, and ${K_u}({z,650} )$ simulations.

2.3.1 Hydrolight

As shown in the flowchart in Fig. 2, ${K_d}({z,650} )$, ${K_{Lu}}({z,650} )$, and ${K_u}({z,650} )$ must be calculated by HL6.0 before constructing the MLMs.

In the HL6.0 Measured IOPs model, the boundary conditions, $Chla(z )$, and water IOPs, such as ${a_{nw}}({z,\lambda } )$, ${c_{nw}}({z,\lambda } )$, and the particulate backscattering coefficient ${b_{bp}}({z,\lambda } )$, are essential inputs. Here, $Chla(z )$ is computed via the ${a_{LH}}$ method based on ${a_{nw}}({z,\lambda } )$ [6,31,36], and ${b_{bp}}({z,\lambda } )$ is derived from $\beta ({\psi ,z,\lambda } )$. Because the operating wavelength of the VSAM is 650 nm, ${a_{nw}}({z,\lambda } )$, ${c_{nw}}({z,\lambda } )$, and the particulate backscattering ratio ${B_p}$ (${B_p} = {b_{bp}}/{b_p}$) were introduced to calculate ${b_{bp}}({z,\lambda } )$ from $\beta ({\psi ,z,650} )$ measured by the VSAM. A flowchart of this calculation is shown in Fig. 3. First, the backscattering coefficient ${b_b}({z,650} )$ was calculated from $\beta ({126^\circ ,z,650} )$ multiplied by the conversion factor ${\chi _{b\theta }}$, where ${\chi _{b\theta }}$ is 1.403 [37]. Then, ${b_{bp}}({z,650} )$ was obtained from ${b_b}({z,650} )$ by subtracting the backscattering coefficient of seawater ${b_{bw}}({650} )$. Finally, ${b_{bp}}({z,\lambda } )$ was computed under the assumption that ${B_p}$ is spectrally constant [38]. The value of ${B_p}({z,650} )$ was obtained by dividing ${b_{bp}}({z,650} )$ by ${b_p}({z,650} )$, and ${b_p}({z,650} )$ was derived from ${a_{nw}}({z,650} )$ and ${c_{nw}}({z,650} )$, which were measured by ac-9/ac-s.
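The chain of steps from the single-angle VSF to ${b_{bp}}({z,\lambda } )$ can be sketched as below. Two points are assumptions rather than statements from the text: the code uses the common convention ${b_b} = 2\pi {\chi _{b\theta }}\beta ({126^\circ })$ (the text only says that $\beta$ is multiplied by the factor), and it takes ${b_p} = {c_{nw}} - {a_{nw}}$, which presumes that dissolved matter does not scatter. The function name and all numerical inputs are hypothetical.

```python
import math

CHI_126 = 1.403  # conversion factor at 126 degrees, as given in the text [37]

def particulate_backscattering(beta_126, a_nw_650, c_nw_650, b_bw_650, b_p_lambda):
    """Return (B_p, b_bp at the target wavelength) from 650 nm measurements."""
    # Assumed convention: b_b = 2 * pi * chi * beta(126 deg)
    b_b_650 = 2.0 * math.pi * CHI_126 * beta_126
    # Remove the seawater contribution to get particulate backscattering at 650 nm
    b_bp_650 = b_b_650 - b_bw_650
    # Assumed: particulate scattering is attenuation minus absorption (ac-9/ac-s)
    b_p_650 = c_nw_650 - a_nw_650
    # Backscattering ratio, treated as spectrally constant [38]
    B_p = b_bp_650 / b_p_650
    return B_p, B_p * b_p_lambda

# Hypothetical values in m^-1 (beta in m^-1 sr^-1)
B_p, b_bp_target = particulate_backscattering(
    beta_126=0.001, a_nw_650=0.10, c_nw_650=0.50, b_bw_650=0.0005, b_p_lambda=0.60
)
```

Because ${B_p}$ is held constant across wavelengths, the spectral shape of ${b_{bp}}$ in this sketch follows that of ${b_p}$ exactly.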

Fig. 3. Flowchart for calculating ${b_{bp}}({z,\lambda } )$ based on measurements of the VSAM.

Additionally, based on the RADTRAN-X and Coombes normalized sky radiance model, the skylight distribution, a critical factor for light field geometry in HL6.0 simulations, was calculated with the given solar and atmospheric conditions.

2.3.2 Machine learning methods

Among numerous MLAs, the DT algorithm is one of the most widely adopted. It constructs a hierarchical “tree” by partitioning a dataset into smaller subsets, providing efficient classification or regression predictions at low computational cost; however, the DT algorithm is inherently prone to overfitting [39]. To overcome this limitation, ensemble learning algorithms are often employed to enhance the performance of the DT algorithm [40]. In this study, three ensemble learning algorithms based on DTs, namely, CatBoost, LightGBM, and RF, were selected to estimate ${K_d}({z,650} )$, ${K_{Lu}}({z,650} )$, and ${K_u}({z,650} )$ based on IOPs.

2.3.2.1 CatBoost

The ensemble learning algorithms predominantly used in many applications are boosting and bagging algorithms [41]. CatBoost is an advanced implementation of the boosting algorithm that adopts binary decision trees as base predictors [42]. CatBoost introduces ordered boosting to improve on traditional gradient estimation, which addresses the prediction shift issue, reduces overfitting, and increases accuracy and robustness. In comparison with other boosting algorithms (such as LightGBM), the CatBoost algorithm reduces the requirement for extensive hyperparameter tuning, resulting in a greatly reduced model runtime.

2.3.2.2 LightGBM

LightGBM is also an implementation of the boosting algorithm [43]. It adopts a leafwise tree growth strategy, which selectively splits the leaf with the maximum gain among the current leaves to reduce computational costs and increase efficiency. It also introduces gradient-based one-side sampling (GOSS) and exclusive feature bundling (EFB) to increase the training speed. As a result, LightGBM is an ensemble learning algorithm with high efficiency and significantly reduced computation time.

2.3.2.3 RF

RF is an extension and enhancement of the bagging algorithm that combines the bagging sampling approach (random bootstrapping) with random selection of features. The random selection of the training dataset and features in RF increases the diversity of the weak learners, thus enhancing the generalization performance [44]. To improve prediction accuracy and curtail overfitting, RF produces its prediction by averaging the outcomes from all decision trees.

2.4 Modeling approach

Based on the HL6.0 simulation results, the flowchart for estimating ${K_d}({z,650} )$, ${K_{Lu}}({z,650} )$, and ${K_u}({z,650} )$ via CatBoost, LightGBM, and RF is shown in Fig. 2, including dataset construction, model training, and model evaluation.

First, the datasets were constructed by matching $\beta ({\psi ,z,650} )$ at seven angles ($20^\circ $, $50^\circ $, $71^\circ $, $90^\circ $, $126^\circ $, $140^\circ $, and $168^\circ $), $a({z,650} )$, the profile depths z, and the boundary conditions, along with ${K_d}({z,650} )$, ${K_{Lu}}({z,650} )$, and ${K_u}({z,650} )$ calculated by HL6.0. $K({z,\lambda } )$ is a key AOP, which is strongly determined by the water IOPs and minimally influenced by the geometry of the penetrating light field. Based on this feature, two input feature selection schemes were used for building the $K({z,650} )$ MLMs: one includes the IOPs ($\beta ({\psi ,z,650} )$ and $a({z,650} )$) and the profile depths z, and the other includes the boundary conditions (the solar zenith angle, wind speed, relative humidity, sea level pressure, visibility, total cloud cover, and precipitable water) in addition to the IOPs and the profile depths z. A total of 166 pairs of sample data were collected from 26 stations to build the MLMs. Then, the datasets were randomly split into a training set and a testing set: 80% for the training set (132 pairs of sample data) and 20% for the testing set (34 pairs of sample data). The training set was used for model training, while the testing set was used to evaluate the performance of the trained models.
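The two feature schemes and the random 80/20 split can be sketched with the standard library alone. The column names, the seed, and the helper function are hypothetical. Rounding the test-set size up is an assumption; it is the choice that reproduces the 132/34 split reported above for 166 samples.

```python
import math
import random

# Scheme 1: seven VSF angles, absorption at 650 nm, and depth (hypothetical names)
IOP_FEATURES = [f"beta_{angle}" for angle in (20, 50, 71, 90, 126, 140, 168)]
IOP_FEATURES += ["a_650", "depth"]

# Scheme 2 adds the seven boundary conditions to scheme 1
BOUNDARY_FEATURES = [
    "solar_zenith", "wind_speed", "relative_humidity", "sea_level_pressure",
    "visibility", "total_cloud_cover", "precipitable_water",
]
ALL_FEATURES = IOP_FEATURES + BOUNDARY_FEATURES

def split_indices(n_samples, test_fraction=0.2, seed=0):
    """Randomly split sample indices; the test-set size is rounded up."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(test_fraction * n_samples)
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = split_indices(166)
```

The same index lists can be reused for both feature schemes so that the models with and without boundary conditions are tested on identical samples.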

Second, the CatBoost, LightGBM, and RF models were trained on the training set. Hyperparameter tuning is a critical aspect of model training. To optimize the performance of the models, a grid-search strategy, which is particularly suitable for small datasets, was employed for hyperparameter tuning in this study. For the RF model, two hyperparameters were tuned: the number of trees in the forest (n_estimators) and the maximum number of features considered in the random subset (max_features). For LightGBM, relatively more hyperparameters need to be tuned, including the learning rate (learning_rate), the maximum tree depth (max_depth), etc. All the tuned hyperparameters of the LightGBM models are shown in Table 2. In comparison, hyperparameter tuning for CatBoost is relatively straightforward: good results and a shorter overall tuning time could be obtained while tuning only the number of trees (n_estimators).
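A grid search over the two RF hyperparameters named above might look like the following sketch. It assumes scikit-learn, which the text does not name as the implementation, and it runs on synthetic stand-in data. The candidate values, the 5-fold cross-validation, and the RMSE scoring are illustrative assumptions, not the settings used in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the 132 training samples with 9 input features
rng = np.random.default_rng(0)
X_train = rng.random((132, 9))
y_train = 0.4 + 0.2 * X_train[:, 7] + 0.01 * rng.standard_normal(132)

# Hypothetical candidate values for the two tuned RF hyperparameters
param_grid = {
    "n_estimators": [50, 100],
    "max_features": [3, 6, 9],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=5,                                   # assumed number of folds
    scoring="neg_root_mean_squared_error",  # select by cross-validated RMSE
)
search.fit(X_train, y_train)
best_params = search.best_params_
```

The same pattern extends to the boosting models by swapping the estimator and the parameter grid.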

Table 2. The hyperparameters of LightGBM

Finally, the performances of the MLMs were assessed based on the testing set. Furthermore, a comparison was conducted between ${K_d}({z,650} )$, ${K_{Lu}}({z,650} )$, and ${K_u}({z,650} )$ calculated by HL6.0 and estimated by CatBoost, LightGBM, and RF models, and the optimal model was selected.

2.5 Accuracy evaluation indicators

The results were evaluated using the following statistical indicators: the coefficient of determination (${R^2}$), root mean square error (RMSE), and mean absolute percentage error (MAPE). These metrics are calculated as follows:

$${R^2} = 1 - \frac{\sum\nolimits_{i = 1}^n {({y_i} - {\hat{y}_i})^2}}{\sum\nolimits_{i = 1}^n {({y_i} - \bar{y})^2}}, \tag{1}$$

$$\textrm{RMSE} = \sqrt {\frac{\sum\nolimits_{i = 1}^n {({y_i} - {\hat{y}_i})^2}}{n}} , \tag{2}$$

$$\textrm{MAPE} = \frac{100\% }{n}\sum\limits_{i = 1}^n \left| {\frac{{y_i} - {\hat{y}_i}}{{y_i}}} \right|, \tag{3}$$

where ${y_i}$ denotes the value calculated by HL6.0, ${\hat{y}_i}$ denotes the corresponding value estimated by an MLM, $\bar{y}$ is the mean of the values calculated by HL6.0, and n is the number of data pairs.
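For reference, the three indicators translate directly into code. The sketch below is a plain-Python transcription of Eqs. (1)-(3); the function names and sample values are hypothetical and are not taken from the study's data.

```python
import math

def r_squared(y_true, y_pred):
    """Coefficient of determination, Eq. (1)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean square error, Eq. (2), in the units of y."""
    n = len(y_true)
    return math.sqrt(sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / n)

def mape(y_true, y_pred):
    """Mean absolute percentage error, Eq. (3), in percent; requires nonzero y_true."""
    n = len(y_true)
    return 100.0 * sum(abs((y - y_hat) / y) for y, y_hat in zip(y_true, y_pred)) / n

# Hypothetical HL6.0 values and MLM estimates in m^-1
hl_values = [0.40, 0.50, 0.60]
mlm_values = [0.42, 0.50, 0.57]
scores = (r_squared(hl_values, mlm_values), rmse(hl_values, mlm_values),
          mape(hl_values, mlm_values))
```

MAPE is well behaved here because $K({z,650} )$ stays well above zero in the reported ranges.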

3. Results

3.1 Results of CatBoost/LightGBM/RF calculated without boundary conditions

In this section, based on the characteristic that $K({z,\lambda } )$ is strongly determined by the water IOPs and minimally influenced by the geometry of the penetrating light field, we established CatBoost, LightGBM, and RF models to predict ${K_d}({z,650} )$, ${K_{Lu}}({z,650} )$, and ${K_u}({z,650} )$ using in situ measurements of $\beta ({\psi ,z,650} )$ ($\psi = 20^\circ $, $50^\circ $, $71^\circ $, $90^\circ $, $126^\circ $, $140^\circ $, and $168^\circ $), $a({z,650} )$, the profile depths z, and simulations of ${K_d}({z,650} )$, ${K_{Lu}}({z,650} )$, and ${K_u}({z,650} )$ obtained from HL6.0. Figure 4 illustrates the correlations between the HL6.0 simulations of ${K_d}({z,650} )$, ${K_{Lu}}({z,650} )$, and ${K_u}({z,650} )$ and the estimations by CatBoost, LightGBM, and RF in the testing set, together with three accuracy evaluation indicators (${R^2}$, RMSE, and MAPE). Good agreement was observed between ${K_d}({z,650} )$, ${K_{Lu}}({z,650} )$, and ${K_u}({z,650} )$ calculated by HL6.0 and those estimated via CatBoost, LightGBM, and RF, with ${R^2} > 0.88$, RMSE $\le 0.028$ ${\textrm{m}^{ - 1}}$, and MAPE $< 6\%$. The ranges of ${K_d}({z,650} )$, ${K_{Lu}}({z,650} )$, and ${K_u}({z,650} )$ were 0.35-0.6 ${\textrm{m}^{ - 1}}$, 0.25-0.7 ${\textrm{m}^{ - 1}}$, and 0.25-0.6 ${\textrm{m}^{ - 1}}$, respectively. ${K_d}({z,650} )$ obtained from HL6.0 and the three MLMs was distributed close to the 1:1 line. For ${K_{Lu}}({z,650} )$ and ${K_u}({z,650} )$, in contrast, slight differences remained between the simulations by HL6.0 and the estimations by the three MLMs: the MLMs tended to slightly overestimate ${K_{Lu}}({z,650} )$ and ${K_u}({z,650} )$ within the range of 0.25-0.4 ${\textrm{m}^{ - 1}}$ and to underestimate them within the range of 0.4-0.55 ${\textrm{m}^{ - 1}}$.

Fig. 4. Accuracy evaluation results of ${K_d}({z,650} )$, ${K_{Lu}}({z,650} )$, and ${K_u}({z,650} )$ estimated by CatBoost/LightGBM/RF (testing set; the red lines denote 1:1 lines).

Comparing the three accuracy evaluation indicators across the three MLMs, the CatBoost models exhibited the highest accuracy in estimating ${K_d}({z,650} )$, ${K_{Lu}}({z,650} )$, and ${K_u}({z,650} )$, with ${R^2}$, RMSE, and MAPE values of 0.955, 0.010 ${\textrm{m}^{ - 1}}$, and 1.7% for ${K_d}({z,650} )$; 0.950, 0.021 ${\textrm{m}^{ - 1}}$, and 4.3% for ${K_{Lu}}({z,650} )$; and 0.920, 0.021 ${\textrm{m}^{ - 1}}$, and 3.6% for ${K_u}({z,650} )$. Among the three diffuse attenuation coefficients, the MLMs employed to estimate ${K_d}({z,650} )$ demonstrated the best performance.

To compare the accuracy of the CatBoost, LightGBM, and RF results more directly, taking the HL6.0 simulations as the ground truth, the vertical distributions of ${K_d}({z,650} )$, ${K_{Lu}}({z,650} )$, and ${K_u}({z,650} )$ estimated by the three MLMs at a section near the Pearl River mouth (including stations E11, E12, E13, and E15, as shown in Fig. 1) were compared with the HL6.0 simulations, as shown in Fig. 5. Both the values and contours of ${K_d}({z,650} )$, ${K_{Lu}}({z,650} )$, and ${K_u}({z,650} )$ were remarkably similar between the estimations by the CatBoost, LightGBM, and RF models and the HL6.0 simulations, exhibiting a consistent vertical distribution. Nevertheless, differences were observed at stations E11, E13, and E15, where the estimations of CatBoost, LightGBM, and RF showed varying degrees of overestimation or underestimation relative to the HL6.0 simulations. For ${K_d}({z,650} )$, all three MLMs overestimated at 4-7 m underwater at station E11. For ${K_{Lu}}({z,650} )$, the three MLMs yielded lower estimations than HL6.0 at 4-8 m underwater at station E15, and the RF model slightly underestimated at 4 m underwater at station E13. For ${K_u}({z,650} )$, the LightGBM and RF models underestimated to different degrees at 6-8 m at station E15; furthermore, the LightGBM model slightly underestimated at 9-10 m underwater at station E11, and the RF model underestimated at 4 m underwater at station E13 relative to the HL6.0 simulations. The vertical distribution of ${K_d}({z,650} )$ at the selected section varied gently within the range of 0.43-0.53 ${\textrm{m}^{ - 1}}$, whereas ${K_{Lu}}({z,650} )$ and ${K_u}({z,650} )$ varied more broadly, spanning 0.3-0.48 ${\textrm{m}^{ - 1}}$ and 0.3-0.5 ${\textrm{m}^{ - 1}}$, respectively. At station E13, ${K_d}({z,650} )$, ${K_{Lu}}({z,650} )$, and ${K_u}({z,650} )$ all reached a maximum at a depth of 4 m, while at station E11, they exhibited a minimum at a depth of 10 m.

Fig. 5. Comparison of ${K_d}({z,650} )$ (a), ${K_{Lu}}({z,650} )$ (b), and ${K_u}({z,650} )$ (c) obtained from HL6.0 and CatBoost/LightGBM/RF at a section near the Pearl River mouth (the red boxes represent overestimations, and the yellow boxes represent underestimations).

To summarize, among the three MLMs, the outcomes generated by the CatBoost models exhibited the closest agreement with the HL6.0 simulations for ${K_d}({z,650} )$, ${K_{Lu}}({z,650} )$, and ${K_u}({z,650} )$. Both the three accuracy evaluation indicators (${R^2}$, RMSE, and MAPE) and the comparison between the HL6.0 simulations and the estimations obtained through the CatBoost, LightGBM, and RF models consistently indicated that the CatBoost models performed best in estimating ${K_d}({z,650} )$, ${K_{Lu}}({z,650} )$, and ${K_u}({z,650} )$.

3.2 Results of CatBoost/LightGBM/RF with boundary conditions

As a key AOP, $K({z,\lambda } )$ is still influenced by the geometry of the underwater light field, despite being strongly determined by the water IOPs. Based on the definition of an AOP, in addition to $\beta ({\psi ,z,650} )$ ($\psi = 20^\circ $, $50^\circ $, $71^\circ $, $90^\circ $, $126^\circ $, $140^\circ $, and $168^\circ $), $a({z,650} )$, and the profile depths z, the boundary conditions, including sea level pressure, precipitable water, relative humidity, wind speed, total cloud cover, visibility, and solar zenith angle, were also important input features and were incorporated into the construction of the MLMs. The results derived from the MLMs with boundary conditions and calculated by HL6.0 are shown in Figs. 6 and 7.

Fig. 6. Accuracy evaluation results of ${K_d}({z,650} )$, ${K_{Lu}}({z,650} )$, and ${K_u}({z,650} )$ estimated by CatBoost/LightGBM/RF with boundary conditions (testing set; the red lines denote 1:1 lines).

Fig. 7. Comparison of ${K_d}({z,650} )$ (a), ${K_{Lu}}({z,650} )$ (b), and ${K_u}({z,650} )$ (c) obtained from HL6.0 and CatBoost/LightGBM/RF with boundary conditions at a section near the Pearl River mouth (the red boxes represent overestimations, and the yellow boxes represent underestimations).

As shown in Fig. 6, the results of ${K_d}({z,650} )$, ${K_{Lu}}({z,650} )$, and ${K_u}({z,650} )$ obtained from HL6.0 and from the CatBoost, LightGBM, and RF models with boundary conditions demonstrated a stronger agreement, with ${R^2} > 0.91$, RMSE $\le 0.021$ ${\textrm{m}^{ - 1}}$, and MAPE $\le 4.7\%$. Notably, ${K_d}({z,650} )$ estimated from the three MLMs closely resembled the HL6.0 simulations over the range of 0.35-0.6 ${\textrm{m}^{ - 1}}$, as evidenced by ${R^2} > 0.99$, RMSE $< 0.005$ ${\textrm{m}^{ - 1}}$, and MAPE $< 1\%$. Consistent with the findings presented in Section 3.1, the results of ${K_{Lu}}({z,650} )$ and ${K_u}({z,650} )$ obtained from the three MLMs and HL6.0 simulations were within the ranges of 0.25-0.7 ${\textrm{m}^{ - 1}}$ and 0.25-0.6 ${\textrm{m}^{ - 1}}$, respectively. The outcomes generated by the three MLMs tended to be higher than the HL6.0 simulations in the range of 0.25-0.4 ${\textrm{m}^{ - 1}}$ and lower in the range of 0.4-0.55 ${\textrm{m}^{ - 1}}$.

Moreover, a comparison of the three MLMs based on ${R^2}$, RMSE, and MAPE showed that the CatBoost models again performed best, with ${R^2}$, RMSE, and MAPE values of 0.998, 0.002 ${\textrm{m}^{ - 1}}$, and 0.4% for ${K_d}({z,650} )$; 0.955, 0.020 ${\textrm{m}^{ - 1}}$, and 4.0% for ${K_{Lu}}({z,650} )$; and 0.955, 0.016 ${\textrm{m}^{ - 1}}$, and 2.9% for ${K_u}({z,650} )$.

To further assess the accuracy of the estimations obtained from the CatBoost, LightGBM, and RF models, Fig. 7 shows the vertical distributions of ${K_d}({z,650} )$, ${K_{Lu}}({z,650} )$, and ${K_u}({z,650} )$ at the section near the Pearl River mouth (as shown in Fig. 1) obtained from the three MLMs with boundary conditions and from HL6.0. As shown in Fig. 7, the values and contours of ${K_d}({z,650} )$, ${K_{Lu}}({z,650} )$, and ${K_u}({z,650} )$ estimated by the CatBoost, LightGBM, and RF models at the selected section were comparable to the HL6.0 simulations. In particular, the estimations of ${K_d}({z,650} )$ from the three MLMs agreed closely with the HL6.0 simulations, aside from the LightGBM model at 5 m underwater at station E11, which gave a slightly lower value. For ${K_{Lu}}({z,650} )$, the estimations of the CatBoost and LightGBM models corresponded well with the HL6.0 simulations, whereas the RF model yielded a higher value at 4-7 m underwater at station E15. For ${K_u}({z,650} )$, the CatBoost model aligned closely with the HL6.0 simulations, but the LightGBM and RF models consistently underestimated at 5-7 m underwater at station E15.

In conclusion, the vertical distributions of ${K_d}({z,650} )$, ${K_{Lu}}({z,650} )$, and ${K_u}({z,650} )$ estimated by the CatBoost, LightGBM, and RF models with boundary conditions were more accurate than those from the three MLMs without boundary conditions, which indicated the importance of boundary conditions in estimating $K({z,\lambda } )$. Considering both the three accuracy evaluation indicators (${R^2}$, RMSE, and MAPE) and the comparison of the estimations provided by the CatBoost, LightGBM, and RF models with the HL6.0 simulations, it was evident that the CatBoost models with boundary conditions continued to perform best when estimating ${K_d}({z,650} )$, ${K_{Lu}}({z,650} )$, and ${K_u}({z,650} )$.

In summary, the strong accuracy evaluation results (${R^2} > 0.88$, RMSE $< 0.03$ ${\textrm{m}^{ - 1}}$, and MAPE $< 6\%$) and the satisfactory consistency between the MLM estimations and the HL6.0 simulations indicate that the method proposed in this study is an efficient and accurate approach for estimating $K({z,650} )$ in the SCS, and that it is helpful for studying the underwater light field distribution, assessing water quality, and validating remote sensing data products.

4. Discussion

4.1 Comparative analysis of $K({z,\lambda } )$ estimates with and without boundary conditions

A comparative analysis was conducted for the estimated results of ${K_d}({z,650} )$, ${K_{Lu}}({z,650} )$, and ${K_u}({z,650} )$ using CatBoost models, which were identified as the most accurate in estimating ${K_d}({z,650} )$, ${K_{Lu}}({z,650} )$, and ${K_u}({z,650} )$ without and with boundary conditions, as shown in Table 3 and Fig. 8. The results presented in Table 3 highlighted the increased accuracy achieved by the CatBoost models when incorporating boundary conditions for estimating ${K_d}({z,650} )$, ${K_{Lu}}({z,650} )$, and ${K_u}({z,650} )$, with enhancements in terms of ${R^2}$, RMSE, and MAPE. Moreover, the comparison of three accuracy evaluations between the CatBoost models without and with boundary conditions further demonstrated the significance of boundary conditions in estimating ${K_d}({z,650} )$, ${K_{Lu}}({z,650} )$ and ${K_u}({z,650} )$.
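The size of the improvement can be made concrete with the CatBoost scores quoted in Sections 3.1 and 3.2. The short sketch below computes the relative RMSE reduction for each coefficient from those quoted values; the dictionary layout and function name are hypothetical, and Table 3 itself is not reproduced here.

```python
# (R2, RMSE in m^-1, MAPE in %) for the CatBoost models, as quoted in Sections 3.1 and 3.2
without_bc = {"Kd": (0.955, 0.010, 1.7), "KLu": (0.950, 0.021, 4.3), "Ku": (0.920, 0.021, 3.6)}
with_bc = {"Kd": (0.998, 0.002, 0.4), "KLu": (0.955, 0.020, 4.0), "Ku": (0.955, 0.016, 2.9)}

def relative_reduction(before, after):
    """Percentage by which an error metric drops from `before` to `after`."""
    return 100.0 * (before - after) / before

# Relative RMSE reduction obtained by adding boundary conditions
rmse_reduction = {
    name: relative_reduction(without_bc[name][1], with_bc[name][1])
    for name in without_bc
}
```

With these numbers, the RMSE reduction is largest for ${K_d}({z,650} )$ and smallest for ${K_{Lu}}({z,650} )$.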

Table 3. Accuracy evaluation results of CatBoost models without and with boundary conditions

To further assess the significance of boundary conditions in estimating ${K_d}({z,650} )$, ${K_{Lu}}({z,650} )$, and ${K_u}({z,650} )$, Fig. 8 compares the spatial distributions of ${K_d}({z,650} )$, ${K_{Lu}}({z,650} )$, and ${K_u}({z,650} )$ at 5 m underwater obtained from HL6.0 and from the CatBoost models without and with boundary conditions. On the whole, as illustrated in Fig. 8, both the magnitudes and contours of ${K_d}({5,650} )$, ${K_{Lu}}({5,650} )$, and ${K_u}({5,650} )$ were notably similar across the estimations generated by the CatBoost models without and with boundary conditions and the HL6.0 simulations. The spatial distributions were therefore consistent, except for some distinct overestimations or underestimations at stations E01, E63, and E73 in 2012, station S43 in 2013, and station S60 in 2015.

Fig. 8. Comparison of $K({z,\lambda } )$ at 5 m depth (${K_d}({5,650} )$, ${K_{Lu}}({5,650} )$, and ${K_u}({5,650} )$) obtained from HL6.0 and CatBoost models without and with boundary conditions (the red boxes represent overestimations, and the yellow boxes represent underestimations).

Specifically, for ${K_d}({5,650} )$, the estimates from the CatBoost model with boundary conditions closely matched the HL6.0 simulations, whereas the estimates from the CatBoost model without boundary conditions were lower than the HL6.0 simulations at station S60 in 2015 and higher at station E63 in 2012. For ${K_{Lu}}({5,650} )$, the CatBoost models both without and with boundary conditions slightly overestimated the HL6.0 simulations at station E01 in 2012. For ${K_u}({5,650} )$, the CatBoost models both without and with boundary conditions yielded higher estimates than the HL6.0 simulations at station E01 in 2012 and lower estimates at station E73 in 2012. In addition, the CatBoost model without boundary conditions slightly overestimated ${K_u}({5,650} )$ at station S43 in 2013.

Overall, at 5 m underwater, the estimations of ${K_d}({5,650} )$, ${K_{Lu}}({5,650} )$ and ${K_u}({5,650} )$ derived from CatBoost models with boundary conditions were closer to the HL6.0 simulations than the corresponding estimations obtained from CatBoost models without boundary conditions, resulting in a higher accuracy in estimating ${K_d}({5,650} )$, ${K_{Lu}}({5,650} )$ and ${K_u}({5,650} )$. Therefore, both the enhancements of three accuracy evaluation indicators and the closer estimations to the HL6.0 simulations collectively demonstrated the significance of boundary conditions in accurately estimating ${K_d}({z,650} )$, ${K_{Lu}}({z,650} )$ and ${K_u}({z,650} )$.
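As a worked example of the improvement in the accuracy indicators, the sketch below computes the relative RMSE reduction obtained by adding boundary conditions, using the values reported in Table 3. The dictionary layout and the helper name are illustrative choices, not part of the study.

```python
# (without, with) boundary conditions, copied from Table 3.
# RMSE is in m^-1 and MAPE is in percent.
table3 = {
    "K_d":  {"R2": (0.955, 0.998), "RMSE": (0.010, 0.002), "MAPE": (1.7, 0.4)},
    "K_Lu": {"R2": (0.950, 0.955), "RMSE": (0.021, 0.020), "MAPE": (4.3, 4.0)},
    "K_u":  {"R2": (0.920, 0.955), "RMSE": (0.021, 0.016), "MAPE": (3.6, 2.9)},
}


def rmse_reduction_percent(without_bc, with_bc):
    """Relative RMSE reduction (percent) from adding boundary conditions."""
    return 100.0 * (without_bc - with_bc) / without_bc


reductions = {
    name: round(rmse_reduction_percent(*metrics["RMSE"]), 1)
    for name, metrics in table3.items()
}
```

With the Table 3 values, the RMSE reduction is about 80% for ${K_d}({z,650} )$, about 5% for ${K_{Lu}}({z,650} )$, and about 24% for ${K_u}({z,650} )$, so the benefit of boundary conditions is concentrated in the downwelling irradiance coefficient.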

Additionally, Fig. 9 compares the ${K_d}({z,650} )$, ${K_{Lu}}({z,650} )$, and ${K_u}({z,650} )$ estimates derived from the CatBoost models with and without boundary conditions, together with the three accuracy evaluation indicators. Strong consistency was observed, with ${R^2}$ values of 0.992, 0.994, and 0.997, RMSE values of 0.004 ${\textrm{m}^{ - 1}}$, 0.007 ${\textrm{m}^{ - 1}}$, and 0.005 ${\textrm{m}^{ - 1}}$, and MAPE values of 0.4%, 0.7%, and 0.8%, respectively. These results indicate that MLMs without boundary conditions can also be used to estimate $K({z,\mathrm{\lambda }} )$, since AOPs are predominantly influenced by IOPs. It is worth noting, however, that these MLMs may only be applicable to waters strongly affected by chlorophyll. In addition, even though the boundary conditions are not model inputs, the solar zenith angle was checked to lie within a valid range before the MLMs were constructed.
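The difference between the two model variants lies in their input features. The sketch below shows one way the input row for a single depth could be assembled, assuming the inputs named in the Conclusions (the VSF at seven angles, $a({z,650} )$, and depth) with the boundary conditions appended optionally. The class, function, and field names are hypothetical; only the solar zenith angle is named in this section, so any other boundary inputs are represented by a generic placeholder list.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence

# Scattering angles (degrees) of the measured VSF, as listed in the Conclusions.
VSF_ANGLES_DEG = (20, 50, 71, 90, 126, 140, 168)


@dataclass
class BoundaryConditions:
    """Hypothetical container for the boundary-condition inputs."""
    solar_zenith_deg: float
    other: List[float] = field(default_factory=list)  # placeholder for further inputs


def feature_row(vsf_650: Sequence[float], a_650: float, depth_m: float,
                boundary: Optional[BoundaryConditions] = None) -> List[float]:
    """Return one model input row for a single depth of a single profile."""
    if len(vsf_650) != len(VSF_ANGLES_DEG):
        raise ValueError("expected one VSF value per scattering angle")
    row = list(vsf_650) + [a_650, depth_m]
    if boundary is not None:
        # Variant with boundary conditions: append them after the IOPs and depth.
        row += [boundary.solar_zenith_deg] + list(boundary.other)
    return row


# Made-up numbers, for illustration only.
vsf = [2.1e-1, 1.5e-2, 6.0e-3, 3.5e-3, 2.4e-3, 2.6e-3, 3.0e-3]
row_without = feature_row(vsf, a_650=0.36, depth_m=5.0)
row_with = feature_row(vsf, a_650=0.36, depth_m=5.0,
                       boundary=BoundaryConditions(solar_zenith_deg=30.0))
```

Keeping the IOP and depth columns identical in both variants makes the two sets of estimates directly comparable, as in Fig. 9.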

Fig. 9. Comparison of (a) ${K_d}({z,650} )$, (b) ${K_{Lu}}({z,650} )$, and (c) ${K_u}({z,650} )$ estimated by CatBoost with and without boundary conditions (the red lines denote the 1:1 lines).

4.2 Spatial and vertical distribution characteristics of $K({z,650} )$

In the section near the Pearl River mouth, as shown in Fig. 5 and Fig. 7, ${K_d}({z,650} )$, ${K_{Lu}}({z,650} )$ and ${K_u}({z,650} )$ exhibited an overall decline in magnitude with increasing depth, a pattern also reported by Siegel and Dickey in 1987 [45]. It has been suggested that such a reduction may be caused by sunlight-induced fluorescence [46,47] and/or limitations of the irradiance or radiance sensors [48,49]. The ${K_d}({z,650} )$, ${K_{Lu}}({z,650} )$, and ${K_u}({z,650} )$ values used in this study were derived from HL6.0 simulations based on IOPs rather than calculated from in situ measurements obtained with irradiance or radiance sensors. Hence, the decreasing trend in ${K_d}({z,650} )$, ${K_{Lu}}({z,650} )$ and ${K_u}({z,650} )$ was not caused by sensor limitations and could potentially be caused by sunlight-induced fluorescence. Such fluorescence acts as an orange–red light source within the water column, so irradiance in these wavebands declines more slowly with increasing depth, which reduces the magnitude of the diffuse attenuation coefficient with increasing depth.
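The effect of an internal light source on the diffuse attenuation coefficient can be illustrated with a toy calculation. The sketch below is not the HL6.0 computation: it assumes a two-term irradiance profile in which directly transmitted 650 nm light decays with one coefficient and a fluorescence-like source decays with the smaller attenuation coefficient of its excitation light. All numerical values and names are hypothetical. The layer-averaged coefficient is computed as $K = -\ln ({E({z_2})/E({z_1})} )/({z_2} - {z_1})$.

```python
import math


def layer_k(irradiance, depths):
    """Layer-averaged K between adjacent depths: -ln(E2 / E1) / (z2 - z1)."""
    return [
        -math.log(irradiance[i + 1] / irradiance[i]) / (depths[i + 1] - depths[i])
        for i in range(len(depths) - 1)
    ]


# Hypothetical values chosen only for illustration (not from the study).
K_RED = 0.40   # attenuation of directly transmitted 650 nm light, m^-1
K_EXC = 0.08   # attenuation of the light that excites fluorescence, m^-1
E0 = 1.0       # surface irradiance at 650 nm, arbitrary units
F0 = 0.02      # strength of the fluorescence-like source, arbitrary units

depths = [float(z) for z in range(0, 31, 5)]  # 0, 5, ..., 30 m

# Irradiance without and with the internal source term.
transmitted = [E0 * math.exp(-K_RED * z) for z in depths]
with_source = [e + F0 * math.exp(-K_EXC * z) for e, z in zip(transmitted, depths)]

k_without = layer_k(transmitted, depths)   # constant, equal to K_RED
k_with = layer_k(with_source, depths)      # decreases with depth toward K_EXC
```

Without the source term the coefficient is constant with depth. With it, the source contributes a growing share of the total irradiance at depth, so the computed coefficient falls from near `K_RED` toward `K_EXC`, which is the qualitative behavior described above.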

At the same depth, ${K_d}({z,650} )$, ${K_{Lu}}({z,650} )$ and ${K_u}({z,650} )$ in the section near the Pearl River mouth generally decreased with increasing offshore distance. Station E15 was an exception, with lower $K({z,\lambda } )$ values than station E13. According to the meteorological parameters, a significant precipitation event occurred within the investigation region from September 29 to October 2, 2012 (as shown in Fig. 10). The anomalous distribution of $K({z,\lambda } )$ at station E15 may be explained by this precipitation event and its lagged impact on water quality (station E15 was sampled on October 7, 2012, whereas stations E13, E12, and E11 were sampled on October 9, 2012) [50]. The decreasing trend with increasing offshore distance at stations E13, E12, and E11 could be attributed to the impact of human activities, continental input, and freshwater from the Pearl River estuary. These factors increase the complexity of the water constituents and degrade water quality, leading to faster light attenuation at the nearshore stations.

Fig. 10. The distribution of precipitation between September 29 and October 2, 2012 (the black circles denote the stations E11, E12, E13, and E15).

Across the entire investigated area of the SCS, the spatial distributions of ${K_d}({z,650} )$, ${K_{Lu}}({z,650} )$ and ${K_u}({z,650} )$ at a depth of 5 m, as shown in Fig. 8, showed a consistent gradual decrease with increasing offshore distance, except at four stations in 2013, which may also be attributed to temporal and spatial variations in water quality.

5. Conclusions

$K({z,\lambda } )$ is a key AOP and is sensitive to $\beta ({\psi ,z,\lambda } )$. In this study, MLMs for estimating the vertical distributions of ${K_d}({z,650} )$, ${K_{Lu}}({z,650} )$, and ${K_u}({z,650} )$ were built, both without and with boundary conditions, using three MLAs (CatBoost, LightGBM, and RF). The models were based on the profile measurements of $\beta ({\psi ,z,650} )$ ($\psi = 20^\circ $, $50^\circ $, $71^\circ $, $90^\circ $, $126^\circ $, $140^\circ $, and $168^\circ $) and $a({z,650} )$, the profile depths z, and the ${K_d}({z,650} )$, ${K_{Lu}}({z,650} )$, and ${K_u}({z,650} )$ simulations from HL6.0. The performance of the MLMs was evaluated using three accuracy evaluation indicators (${R^2}$, RMSE, and MAPE), and the estimates obtained through CatBoost, LightGBM, and RF were compared with the corresponding values calculated by HL6.0. For the MLMs without boundary conditions, the CatBoost models performed best, with ${R^2}$ values of 0.955, 0.950, and 0.920, RMSE values of 0.010 ${\textrm{m}^{ - 1}}$, 0.021 ${\textrm{m}^{ - 1}}$, and 0.021 ${\textrm{m}^{ - 1}}$, and MAPE values of 1.7%, 4.3%, and 3.6% for ${K_d}({z,650} )$, ${K_{Lu}}({z,650} )$, and ${K_u}({z,650} )$, respectively, and their estimates were closely aligned with the HL6.0 values. For the MLMs with boundary conditions, the results produced by the CatBoost, LightGBM, and RF models on the testing set were more consistent with the HL6.0 simulations, with ${R^2}$ > 0.91, RMSE $\le $ 0.021 ${\textrm{m}^{ - 1}}$, and MAPE $\le $ 4.7%.
Specifically, the CatBoost models showed the most favorable agreement with the HL6.0 simulations, with ${R^2}$, RMSE, and MAPE values of 0.998, 0.002 ${\textrm{m}^{ - 1}}$, and 0.4% for ${K_d}({z,650} )$; 0.955, 0.020 ${\textrm{m}^{ - 1}}$, and 4.0% for ${K_{Lu}}({z,650} )$; and 0.955, 0.016 ${\textrm{m}^{ - 1}}$, and 2.9% for ${K_u}({z,650} )$. The vertical distributions of ${K_d}({z,650} )$, ${K_{Lu}}({z,650} )$, and ${K_u}({z,650} )$ estimated by the CatBoost, LightGBM, and RF models were closely comparable, in both values and contours, to those calculated by HL6.0. Notably, the CatBoost results were virtually indistinguishable from the HL6.0 simulations.

A comparison of the accuracy evaluation indicators and spatial distributions of the CatBoost models without and with boundary conditions showed that including boundary conditions increases the accuracy of the ${K_d}({z,650} )$, ${K_{Lu}}({z,650} )$, and ${K_u}({z,650} )$ estimates: the accuracy evaluation indicators (${R^2}$, RMSE, and MAPE) improve and the estimates lie closer to the HL6.0 simulations. This underscores the importance of considering boundary conditions for reliable estimation. Furthermore, satisfactory consistency was found between the ${K_d}({z,650} )$, ${K_{Lu}}({z,650} )$, and ${K_u}({z,650} )$ estimates of the CatBoost models with and without boundary conditions, with ${R^2}$ values of 0.992, 0.994, and 0.997, RMSE values of 0.004 ${\textrm{m}^{ - 1}}$, 0.007 ${\textrm{m}^{ - 1}}$, and 0.005 ${\textrm{m}^{ - 1}}$, and MAPE values of 0.4%, 0.7%, and 0.8%, respectively. Thus, MLMs without boundary conditions appear suitable for predicting $K({z,\lambda } )$ in waters that are strongly affected by chlorophyll. In addition, the solar zenith angle should be confirmed to lie within the valid range before these MLMs are used to estimate $K({z,\mathrm{\lambda }} )$.

${K_d}({z,650} )$, ${K_{Lu}}({z,650} )$, and ${K_u}({z,650} )$ exhibited a decreasing trend with increasing depth in the selected section near the Pearl River mouth, which could be attributed to the presence of sunlight-induced fluorescence. In terms of the spatial distribution across the investigated area for ${K_d}({z,650} )$, ${K_{Lu}}({z,650} )$, and ${K_u}({z,650} )$, there was a general decreasing trend with increasing offshore distance, with the exception of station E15, which may be attributed to the effects of a rainfall event.

In summary, this study proposes a new method for estimating the vertical distribution of $K({z,\mathrm{\lambda }} )$, which could provide more accurate information for studies of the underwater light field distribution, water quality assessment, and the validation of remote sensing data products. Given the measurement limitation of the VSAM, which is used to measure $\beta ({\psi ,z,\lambda } )$, the MLMs constructed in this study are currently restricted to estimating $K({z,\mathrm{\lambda }} )$ at 650 nm. As the measurement wavelengths of the VSAM are expanded in the future, the MLMs could be applied to estimating $K({z,\mathrm{\lambda }} )$ at multiple wavelengths, facilitating studies of the Secchi disk depth, euphotic zone depth, chlorophyll-a concentration, and renewable energy development.

Funding

National Natural Science Foundation of China (41976181, 41976170, 41976172); Science and Technology Planning Project of Guangzhou Nansha District Guangzhou City China (2022ZD001); Science Technology Fundamental Resources Investigation Program (2022FY100601); Scientific and Technological Planning Project of Guangzhou City (201707020023).

Acknowledgments

We thank our colleagues at the Optics Laboratory, South China Sea Institute of Oceanology, Chinese Academy of Sciences, whose invaluable contributions facilitated the acquisition of the field data employed in this study. We also thank the ODV software (https://odv.awi.de) for the convenience in drawing Fig. 5, Fig. 7, and Fig. 8.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. M. R. Lewis, M.-E. Carr, G. C. Feldman, W. Esaias, and C. McClain, “Influence of penetrating solar radiation on the heat budget of the equatorial Pacific Ocean,” Nature 347(6293), 543–545 (1990).

2. Z. P. Lee, S. L. Shang, K. P. Du, and J. W. Wei, “Resolving the long-standing puzzles about the observed Secchi depth relationships,” Limnol. Oceanogr. 63(6), 2321–2336 (2018).

3. P. K. Enaganti, P. K. Dwivedi, A. K. Srivastava, and S. Goel, “Study of solar irradiance and performance analysis of submerged monocrystalline and polycrystalline solar cells,” Prog. Photovoltaics 28(7), 725–735 (2020).

4. A. Castillo-Ramirez, E. Santamaria-del-Angel, A. Gonzalez-Silvera, R. Frouin, M. T. Sebastia-Frasquet, J. Tan, J. Lopez-Calderon, L. Sanchez-Velasco, and L. Enriquez-Paredes, “A new algorithm to estimate diffuse attenuation coefficient from Secchi disk depth,” J. Mar. Sci. Eng. 8(8), 558 (2020).

5. Z. P. Lee, M. Darecki, K. L. Carder, C. O. Davis, D. Stramski, and W. J. Rhea, “Diffuse attenuation coefficient of downwelling irradiance: An evaluation of remote sensing methods,” J. Geophys. Res. 110(C2), 9 (2005).

6. X. Zhang, C. Li, W. Zhou, Y. Zheng, W. Cao, C. Liu, Z. Xu, Y. Yang, Z. Yang, and F. Chen, “Study of the profile distribution of the diffuse attenuation coefficient and Secchi disk depth in the Northwestern South China Sea,” Remote Sens. 15(6), 1533 (2023).

7. N. G. Jerlov, “Marine optics,” (Elsevier, 1976).

8. J. T. Kirk, “Light and photosynthesis in aquatic ecosystems,” (Cambridge University Press, 1994).

9. C. Mobley, “The oceanic optics book,” (International Ocean Colour Coordinating Group (IOCCG), 2022).

10. C. D. Mobley, “Light and water: radiative transfer in natural waters,” (Academic Press, 1994).

11. R. W. Preisendorfer, “Hydrologic optics,” (National Oceanic and Atmospheric Administration, 1976).

12. J. E. Ivey, “Closure between apparent and inherent optical properties of the ocean with applications to the determination of spectral bottom reflectance,” USF Tampa Graduate Theses and Dissertations (2009).

13. Z. P. Lee, K. P. Du, and R. Arnone, “A model for the diffuse attenuation coefficient of downwelling irradiance,” J. Geophys. Res. 110, 10 (2005).

14. J. T. O. Kirk, “Monte Carlo study of the nature of the underwater light field in, and the relationship between optical properties of, turbid yellow waters,” Mar. Freshwater Res. 32(4), 517–532 (1981).

15. C. D. Mobley, L. K. Sundman, and E. Boss, “Phase function effects on oceanic light fields,” Appl. Opt. 41(6), 1035–1050 (2002).

16. H. R. Gordon, “Can the Lambert-Beer law be applied to the diffuse coefficient of ocean water?” Limnol. Oceanogr. 34(8), 1389–1409 (1989).

17. M. H. Wang, S. Son, and L. W. Harding, “Retrieval of diffuse attenuation coefficient in the Chesapeake Bay and turbid ocean regions for satellite ocean color applications,” J. Geophys. Res. 114(C10), 15 (2009).

18. J. L. Mueller, “SeaWiFS algorithm for the diffuse attenuation coefficient, K (490), using water-leaving radiances at 490 and 555 nm,” SeaWiFS Postlaunch Calibration and Validation Analyses, part 3, 24–27 (2000).

19. A. Morel and S. Maritorena, “Bio-optical properties of oceanic waters: A reappraisal,” J. Geophys. Res. 106(C4), 7163–7180 (2001).

20. P. J. Werdell and S. W. Bailey, “An improved in-situ bio-optical data set for ocean color algorithm development and satellite data product validation,” Remote Sensing of Environment 98(1), 122–140 (2005).

21. A. Morel, Y. Huot, B. Gentili, P. J. Werdell, S. B. Hooker, and B. A. Franz, “Examining the consistency of products derived from various ocean color sensors in open ocean (Case 1) waters in the perspective of a multi-sensor approach,” Remote Sensing of Environment 111(1), 69–88 (2007).

22. B. Z. Chen, L. Wang, S. Q. Song, B. Q. Huang, J. Sun, and H. B. Liu, “Comparisons of picophytoplankton abundance, size, and fluorescence between summer and winter in northern South China Sea,” Cont. Shelf Res. 31(14), 1527–1540 (2011).

23. W. Zheng, W. Zhou, W. Cao, Y. Liu, G. Wang, L. Deng, C. Li, Y. Zhang, and K. Zeng, “Vertical Variability of Total and Size-Partitioned Phytoplankton Carbon in the South China Sea,” Remote Sens. 13(5), 993 (2021).

24. G. F. Wang, W. X. Cao, D. T. Yang, and D. Z. Xu, “Variation in downwelling diffuse attenuation coefficient in the northern South China Sea,” Chin. J. Ocean. Limnol. 26(3), 323–333 (2008).

25. W. Zhao, W. Cao, S. Hu, and G. Wang, “Comparison of diffuse attenuation coefficient of downwelling irradiance products derived from MODIS-Aqua in the South China Sea,” Optics and Precision Engineering 26, 14–24 (2018).

26. Q. Zhang, C. Chen, and P. Shi, “Characteristics of K_d(490) around Nansha Islands in South China Sea,” Journal of Tropical Oceanography 22, 9–16 (2003).

27. B. H. Li, K. Liu, M. Wang, Y. F. Wang, Q. He, L. M. Zhuang, and W. H. Zhu, “High-spatiotemporal-resolution dynamic water monitoring using LightGBM model and Sentinel-2 MSI data,” Int. J. Appl. Earth Obs. Geoinf. 118, 103278 (2023).

28. H. Su, X. Yang, W. F. Lu, and X. H. Yan, “Estimating subsurface thermohaline structure of the global ocean using surface remote sensing observations,” Remote Sens. 11(13), 1598 (2019).

29. H. Su, X. M. Lu, Z. Q. Chen, H. S. Zhang, W. F. Lu, and W. T. Wu, “Estimating coastal chlorophyll-a concentration from time-series OLCI data based on machine learning,” Remote Sens. 13(4), 576 (2021).

30. D. A. Maciel, C. C. F. Barbosa, E. Novo, R. Flores, and F. N. Begliomini, “Water clarity in Brazilian water assessed using Sentinel-2 and machine learning methods,” ISPRS-J. Photogramm. Remote Sens. 182, 134–152 (2021).

31. L. Deng, W. Zhou, W. Cao, W. Zheng, G. Wang, Z. Xu, C. Li, Y. Yang, S. Hu, and W. Zhao, “Retrieving phytoplankton size class from the absorption coefficient and chlorophyll a concentration based on support vector machine,” Remote Sens. 11(9), 1054 (2019).

32. H. J. Xue, F. Chai, N. Pettigrew, D. Y. Xu, M. Shi, and J. P. Xu, “Kuroshio intrusion and the circulation in the South China Sea,” J. Geophys. Res. 109(C2), 14 (2004).

33. R. M. Pope and E. S. Fry, “Absorption spectrum (380-700 nm) of pure water. II. Integrating cavity measurements,” Appl. Opt. 36(33), 8710–8723 (1997).

34. China Meteorological Administration, “Global Atmospheric Reanalysis (CRA-40) dataset,” (1979–2018). https://data.cma.cn/

35. Physical Sciences Laboratory, “National Centers for Environmental Prediction (NCEP)/National Center for Atmospheric Research (NCAR) Reanalysis I dataset,” (1984). https://psl.noaa.gov/data/gridded/data.ncep.reanalysis

36. C. S. Roesler and A. H. Barnard, “Optical proxy for phytoplankton biomass in the absence of photophysiology: Rethinking the absorption line height,” Methods in Oceanography 7, 79–94 (2013).

37. C. Liu, C. Li, W. Zhao, F. Chen, Z. M. Yang, X. Q. Zhang, Y. Zhang, W. Zhou, W. X. Cao, L. H. Yu, and H. L. Xing, “Instrument for in situ synchronous measurement of the multi-angle volume scattering function and attenuation coefficient,” Opt. Express 31(1), 248–264 (2023).

38. A. Tonizzo, M. Twardowski, S. McLean, K. Voss, M. Lewis, and C. Trees, “Closure and uncertainty assessment for ocean color reflectance using measured volume scattering functions and reflective tube absorption coefficients with novel correction for scattering,” Appl. Opt. 56(1), 130–146 (2017).

39. D. Ignatov and A. Ignatov, “Decision stream: cultivating deep decision trees,” in 29th Annual IEEE International Conference on Tools with Artificial Intelligence (ICTAI), Proceedings-International Conference on Tools With Artificial Intelligence (IEEE, 2017), 905–912.

40. Y. H. Qian, H. Xu, J. Y. Liang, B. Liu, and J. T. Wang, “Fusing Monotonic Decision Trees,” IEEE Trans. Knowl. Data Eng. 27(10), 2717–2728 (2015).

41. M. Sheykhmousa, M. Mahdianpari, H. Ghanbari, F. Mohammadimanesh, P. Ghamisi, and S. Homayouni, “Support vector machine versus random forest for remote sensing image classification: a meta-analysis and systematic review,” IEEE J. Sel. Top. Appl. Earth Observations Remote Sensing 13, 6308–6325 (2020).

42. A. V. Dorogush, V. Ershov, and A. Gulin, “CatBoost: gradient boosting with categorical features support,” (2018).

43. G. L. Ke, Q. Meng, T. Finley, T. F. Wang, W. Chen, W. D. Ma, Q. W. Ye, and T. Y. Liu, “LightGBM: A Highly Efficient Gradient Boosting Decision Tree,” in 31st Annual Conference on Neural Information Processing Systems (NIPS), Advances in Neural Information Processing Systems (Neural Information Processing Systems (Nips), 2017).

44. G. Ngo, R. Beard, and R. Chandra, “Evolutionary bagging for ensemble learning,” Neurocomputing 510, 1–14 (2022).

45. D. A. Siegel and T. D. Dickey, “Observations of the vertical structure of the diffuse attenuation coefficient spectrum,” Deep-Sea Res., Part A 34(4), 547–563 (1987).

46. H. R. Gordon, “Diffuse reflectance of the ocean: the theory of its augmentation by chlorophyll a fluorescence at 685 nm,” Appl. Opt. 18(8), 1161–1166 (1979).

47. B. J. Topliss, “Optical measurements in the Sargasso Sea: solar stimulated chlorophyll fluorescence,” Oceanol. Acta 8, 263–270 (1985).

48. J. E. Tyler, “Natural water as a monochromator,” Limnol. Oceanogr. 4(1), 102–105 (1959).

49. D. A. Siegel and T. D. Dickey, “Variability of net longwave radiation over the eastern North Pacific Ocean,” J. Geophys. Res. 91(C6), 7657–7666 (1986).

50. E. Armijos, A. Crave, J. C. Espinoza, N. Filizola, R. Espinoza-Villar, I. Ayes, P. Fonseca, P. Fraizy, O. Gutierrez, P. Vauchel, B. Camenen, J. M. Martiinez, A. Dos Santos, W. Santini, G. Cochonneau, and J. L. Guyot, “Rainfall control on Amazon sediment flux: synthesis from 20 years of monitoring,” Environ. Res. Commun. 2, 051008 (2020).

Abbreviations	Definitions	Units
$E$	Irradiance	$\mathrm{W\,m^{-2}\,nm^{-1}}$
$K$	Diffuse attenuation coefficient	$\mathrm{m^{-1}}$
$K_{d}$	Diffuse attenuation coefficient for downwelling irradiance	$\mathrm{m^{-1}}$
$K_{u}$	Diffuse attenuation coefficient for upwelling irradiance	$\mathrm{m^{-1}}$
$L$	Radiance	$\mathrm{W\,m^{-2}\,nm^{-1}\,sr^{-1}}$
$K_{Lu}$	Diffuse attenuation coefficient for upwelling radiance	$\mathrm{m^{-1}}$
$R_{rs}$	Remote sensing reflectance	$\mathrm{sr^{-1}}$
$a$	Absorption coefficient	$\mathrm{m^{-1}}$
$c$	Attenuation coefficient	$\mathrm{m^{-1}}$
$a_{nw}$	Total minus water absorption coefficient	$\mathrm{m^{-1}}$
$c_{nw}$	Total minus water attenuation coefficient	$\mathrm{m^{-1}}$
$\beta$	Volume scattering function	$\mathrm{m^{-1}\,sr^{-1}}$
$b_{p}$	Particulate scattering coefficient	$\mathrm{m^{-1}}$
$b_{bp}$	Particulate backscattering coefficient	$\mathrm{m^{-1}}$
$b_{bw}$	Water backscattering coefficient	$\mathrm{m^{-1}}$
$B_{p}$	Particulate backscattering ratio	dimensionless
$Chla$	Chlorophyll a concentration	$\mathrm{mg\,m^{-3}}$
$z$	Vertical depth	m
$\lambda$	Wavelength	nm
$\psi$	Volume scattering angle	°

Hyperparameter	Meaning
learning_rate	Learning rate
max_depth	The maximum depth of the trees
num_leaves	The number of leaves in one tree
min_data_in_leaf	The minimum number of data points in one leaf
min_sum_hessian_in_leaf	The minimum sum of the Hessian in one leaf
feature_fraction	The fraction of features randomly selected
bagging_fraction	The fraction of the training data randomly sampled
reg_alpha	L1 regularization
reg_lambda	L2 regularization

Optics Express

Parameters	Accuracy evaluation indicators	Without boundary conditions	With boundary conditions
$K_{d}(z, 650)$	$R^{2}$	0.955	0.998
$K_{d}(z, 650)$	RMSE ($\mathrm{m^{-1}}$)	0.010	0.002
$K_{d}(z, 650)$	MAPE	1.7%	0.4%
$K_{Lu}(z, 650)$	$R^{2}$	0.950	0.955
$K_{Lu}(z, 650)$	RMSE ($\mathrm{m^{-1}}$)	0.021	0.020
$K_{Lu}(z, 650)$	MAPE	4.3%	4.0%
$K_{u}(z, 650)$	$R^{2}$	0.920	0.955
$K_{u}(z, 650)$	RMSE ($\mathrm{m^{-1}}$)	0.021	0.016
$K_{u}(z, 650)$	MAPE	2.9%	2.9%