Classification of water contamination developed by 2-D Gabor wavelet analysis and support vector machine based on fluorescence spectroscopy

P. Huang; T. Mao; Q. Yu; Y. Cao; J. Yu; G. Zhang; D. Hou

doi:10.1364/OE.27.005461

1. Introduction

Incidents of drinking water contamination in recent years have adversely affected communities and their residents [1]. A timely response to such pollution incidents requires an immediate classification and identification of pollutants so as to prevent further damage. Pollutant sources can be identified by classifying organic pollutants. A successful classification should help prevent water pollution incidents that threaten residents’ health and social stability. Therefore, the rapid detection of drinking water contamination through the identification and classification of pollutants is crucial.

The current detection methods for water pollutants in distribution networks are mainly based on traditional water quality indicators [2]. For example, Conde [3] developed regression models based on artificial neural networks (ANNs) and relevance vector machines (RVMs) from normal water quality parameters and built a discriminant classifier to distinguish abnormal water from normal water online. However, obtaining these water quality parameters is hampered by long analysis times, low sensitivity, high reagent consumption, and heavy waste production [4]. These problems hinder high-frequency online water quality detection. Compared with conventional detection methods, spectroscopic water quality analysis can work directly on the sample without extraction or separation steps [5]. Ultraviolet-visible (UV-Vis) spectroscopy is commonly adopted in research. Skala [6] explored the application of UV-Vis diffuse reflectance spectroscopy to the in vivo diagnosis of epithelial precancers and cancers and compared the classification accuracy of a physical model (Monte Carlo inverse model) with that of a principal component analysis (PCA)-based approach. G. Langergraber [7, 8] continuously monitored water quality through ultraviolet spectroscopy and detected anomalies from three-dimensional (3D) spectra and the historical record of ultraviolet spectra, with satisfactory results. D.B. Hou [9] combined PCA with the chi-square distribution to detect anomalies in distribution-system water quality. However, the detection limits of UV-Vis spectroscopy for some organic compounds are not low enough to meet the detection standards of some test scenarios [10]. 3D fluorescence spectra contain abundant information and offer lower detection limits for organic compounds than UV-Vis spectra do [11].
They are characterized by three parameters, namely, excitation wavelength, emission wavelength, and fluorescence intensity, which are generally arranged as an excitation-emission matrix (EEM). 3D fluorescence spectra also provide complete spectral information and exhibit high selectivity, sensitivity, and reproducibility [12]. Thus, 3D fluorescence spectroscopy is widely used to detect water contamination incidents and complements conventional water quality indices and UV-Vis spectroscopy.

Researchers have comprehensively explored qualitative discrimination, quantitative analysis, and classification and identification based on 3D fluorescence spectra [13–15]. For example, Yulia Shutova [15] collected water samples from various treatment stages for 3D fluorescence analysis and used parallel factor analysis (PARAFAC) to extract five fluorescent components characterizing the organic matter in water. The results showed that changes in the intensity ratios of the fluorescent components are related to changes in concentration and composition caused by the decomposition of organic matter. Liyang Yang [16] analyzed samples from the drinking water treatment process by PARAFAC modeling and obtained fluorescent components characterizing dissolved organic matter for qualitative and quantitative studies. These studies have shown that EEMs combined with PARAFAC (EEM-PARAFAC) benefit the optimization of the drinking water treatment process and help ensure water quality. Peiris RH [17] used membrane-based 3D fluorescence spectroscopy together with PCA to monitor ultrafiltration and nanofiltration processes with biological filtration pretreatment, with a view to the severe membrane fouling that occurs during drinking water membrane treatment. Baker [18] studied the 3D fluorescence spectra of river water samples from 10 sampling sites on six rivers, some of which receive effluent from sewage treatment plants. The study showed that the ratio of tryptophan-like to fulvic-acid-like fluorescence intensity can be used to distinguish the sources of different river waters. Pavelescu G [19] used 3D fluorescence spectroscopy to analyze water samples from 12 different wells in the surrounding area, employing PARAFAC to identify differences in the spectral characteristics of water from different sources; the approach also proved useful for distinguishing water samples from different areas.
PARAFAC and PCA are commonly used methods for extracting features from 3D fluorescence spectra. However, both have limitations. PARAFAC requires the number of factors to be known before modeling, which limits its application to online detection. PCA is a two-way method that unfolds the three-way fluorescence data into two dimensions, which causes information loss. For substances whose characteristic fluorescence peaks are closely positioned or overlapping, the subtle features of each spectrum must be captured, which is challenging for PCA.

To extract detailed features with an unsupervised method and to resolve spectral peaks that are closely positioned or overlapping, this study introduces two-dimensional (2D) Gabor wavelet analysis. 2D Gabor wavelet analysis is an effective feature extraction method for image data in the field of image analysis [20]. A 3D fluorescence spectrum has a data structure similar to that of a grayscale image, with fluorescence intensity playing the role of gray level over the excitation-emission plane. Because different organic pollutants produce different spectral textures, 2D Gabor wavelets are well suited to this study. Wang Chunyan [21] applied 2D Gabor wavelet features from image recognition to the concentration-synchronous 3D fluorescence spectra of different oils and verified the feasibility and effectiveness of image recognition algorithms for spectral image analysis.

In real-time water quality monitoring systems, homologous chemical substances (such as volatile phenols and their derivatives) may be encountered. Fluorescence spectroscopy then faces difficulties such as similar spectra and overlapping characteristic peaks. Therefore, the present study investigates the use of 2D Gabor wavelets as a feature extraction method for fluorescence spectra. The method is employed to discriminate five organic water pollutants (phenol, hydroquinone, resorcinol, rhodamine B, and salicylic acid) in water quality monitoring based on 3D fluorescence spectra. 2D Gabor wavelets are used to extract features, which are then further summarized by block statistics. Different pollutants are then classified and identified with a support vector machine (SVM) multi-class model. The discrimination results of the proposed model are compared with those obtained using the commonly used PCA method.

2. Method

2.1. Organic solutions

Five common organic pollutants (phenol, hydroquinone, resorcinol, rhodamine B, and salicylic acid) at various concentrations are employed to simulate water pollution incidents. Phenol, hydroquinone, and resorcinol are routinely monitored in water quality detection [22]. Because their chemical functional groups are similar, the characteristic peaks of their 3D fluorescence spectra are extremely close and tend to overlap. Rhodamine B is a synthetic fluorescent dye. Salicylic acid is a fat-soluble organic acid whose production generates large amounts of wastewater.

Exactly 0.01 g each of phenol, hydroquinone, resorcinol, rhodamine B, and salicylic acid was weighed on an electronic analytical balance. A 0.01 g/L stock solution of each compound was then prepared by dissolving it in 1 L of laboratory drinking water; the stock solutions were kept away from light and stored at low temperature. Next, 100 mL of each stock solution was transferred to a 1 L volumetric flask and diluted to a 100 μg/L working solution. Aliquots of 10, 20, 30, 40, 50, 100, 150, and 200 mL of the working solution were then made up to 500 mL with laboratory drinking water, giving concentrations of 2, 4, 6, 8, 10, 20, 30, and 40 μg/L for each compound. Each solution was scanned three times as a parallel test. The experiment was repeated on three days, so that the training solutions were prepared with drinking water collected on different days. As a result, the training dataset contains 72 samples for each compound (8 concentrations × 3 scans × 3 days), or 360 samples in total. Similarly, the testing solutions were diluted to 3, 5, 7, 9, 15, 25, 35, and 45 μg/L for each compound with drinking water from another day and were scanned three times each. With 24 samples for each compound, the testing dataset contains 120 fluorescence samples.
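The dilution series follows C1·V1 = C2·V2. The sketch below starts from the 100 μg/L working solution stated above, reproduces the training concentrations, and also computes the aliquots that the testing concentrations would require; those test aliquots are not given in the text and are inferred here only for illustration. The function names are hypothetical. It also checks the sample counts.

```python
# Dilution bookkeeping for Section 2.1, based on C1 * V1 = C2 * V2.
WORKING_UG_PER_L = 100.0   # diluted working solution, as stated in the text (ug/L)
FLASK_ML = 500.0           # volume of each final volumetric flask (mL)

def final_concentration(aliquot_ml, stock=WORKING_UG_PER_L, flask_ml=FLASK_ML):
    """Concentration (ug/L) after making aliquot_ml of stock up to flask_ml."""
    return stock * aliquot_ml / flask_ml

def required_aliquot_ml(target_ug_per_l, stock=WORKING_UG_PER_L, flask_ml=FLASK_ML):
    """Aliquot (mL) of stock needed to reach target_ug_per_l (inferred, not stated in the text)."""
    return target_ug_per_l * flask_ml / stock

train_aliquots_ml = [10, 20, 30, 40, 50, 100, 150, 200]
train_levels = [final_concentration(v) for v in train_aliquots_ml]
test_levels = [3, 5, 7, 9, 15, 25, 35, 45]

# Sample counts per compound: concentration levels x replicate scans x days.
N_COMPOUNDS, N_SCANS = 5, 3
train_per_compound = len(train_levels) * N_SCANS * 3   # three training days
test_per_compound = len(test_levels) * N_SCANS * 1     # one testing day

print(train_levels)                                   # [2.0, 4.0, 6.0, 8.0, 10.0, 20.0, 30.0, 40.0]
print([required_aliquot_ml(c) for c in test_levels])  # [15.0, 25.0, 35.0, 45.0, 75.0, 125.0, 175.0, 225.0]
print(train_per_compound * N_COMPOUNDS, test_per_compound * N_COMPOUNDS)  # 360 120
```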

2.2. Fluorescence

Each sample was scanned with a Hitachi F-4600 fluorescence spectrometer to obtain its 3D fluorescence spectrum. Excitation and emission wavelengths both ranged from 200 to 700 nm with a sampling interval of 5 nm. The slit width was 10 nm, the PMT voltage was 700 V, the scanning speed was 12 000 nm/min, and samples were measured in a 1 cm × 1 cm × 4.5 cm quartz fluorescence cuvette. Each sample yielded a 101 × 101 EEM describing the fluorescence characteristics of the organic matter it contained. Data collected over three consecutive days were used as the training dataset, and data collected on a further day were used as the testing dataset.
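The acquisition grid explains the 101 × 101 EEM size: 200 to 700 nm in 5 nm steps gives (700 − 200)/5 + 1 = 101 wavelengths on each axis. A small sketch, assuming excitation along rows and emission along columns; `index_of` is a hypothetical helper that maps a wavelength to its grid index (for example, for the rhodamine B peak at 550/575 nm).

```python
# Excitation/emission axes of one EEM, as described in Section 2.2.
START_NM, STOP_NM, STEP_NM = 200, 700, 5

wavelengths = list(range(START_NM, STOP_NM + 1, STEP_NM))  # includes 700 nm

def index_of(wavelength_nm):
    """Row (excitation) or column (emission) index of a wavelength on the 5 nm grid."""
    if (wavelength_nm - START_NM) % STEP_NM or not START_NM <= wavelength_nm <= STOP_NM:
        raise ValueError(f"{wavelength_nm} nm is not on the acquisition grid")
    return (wavelength_nm - START_NM) // STEP_NM

# An EEM is then a len(wavelengths) x len(wavelengths) array.
print(len(wavelengths), index_of(550), index_of(575))  # 101 70 75
```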

2.3. Principal component analysis

PCA was carried out in MATLAB (R2015b). Rayleigh and Raman scatter regions were removed because scatter does not conform to the linear assumptions of the model [23,24]. The training and testing datasets were mean-centered and scaled to unit variance to remove bias toward compounds and spectral regions with high variability. PCA is an unsupervised feature extraction method for two-way (matrix) data [25]. Each EEM was therefore unfolded into a one-dimensional vector along the excitation direction, giving the sample-by-variable matrix that PCA requires.
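A minimal pure-Python sketch of this PCA step. It assumes row-wise unfolding (one emission spectrum per excitation wavelength) and that the test set is scaled with the training-set means and standard deviations, which the text does not specify. Power iteration with deflation stands in for MATLAB's `pca` only to keep the example dependency-free; the toy 3 × 3 "EEMs" are illustrative values, not measured data.

```python
import math
import random

def unfold(eem):
    """Flatten an EEM row by row (one emission spectrum per excitation wavelength)."""
    return [value for row in eem for value in row]

def autoscale_fit(X):
    """Column means and sample standard deviations, estimated on the training set only."""
    n = len(X)
    columns = list(zip(*X))
    means = [sum(col) / n for col in columns]
    stds = [math.sqrt(sum((x - m) ** 2 for x in col) / (n - 1)) or 1.0  # constant column: leave unscaled
            for col, m in zip(columns, means)]
    return means, stds

def autoscale_apply(X, means, stds):
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in X]

def pca_loadings(X, n_components, iters=300, seed=0):
    """Leading loading vectors of a centred matrix X via power iteration on X^T X with deflation."""
    rng = random.Random(seed)
    p = len(X[0])
    loadings = []
    for _ in range(n_components):
        v = [rng.gauss(0.0, 1.0) for _ in range(p)]
        for _ in range(iters):
            scores = [sum(a * b for a, b in zip(row, v)) for row in X]             # X v
            v = [sum(s * row[j] for s, row in zip(scores, X)) for j in range(p)]  # X^T (X v)
            for u in loadings:                                                    # remove earlier components
                d = sum(a * b for a, b in zip(v, u))
                v = [a - d * b for a, b in zip(v, u)]
            norm = math.sqrt(sum(a * a for a in v))
            v = [a / norm for a in v]
        loadings.append(v)
    return loadings

def project(X, loadings):
    """PCA scores: projection of each scaled row onto the loading vectors."""
    return [[sum(a * b for a, b in zip(row, u)) for u in loadings] for row in X]

# Toy usage: four 3 x 3 "EEMs" from two hypothetical compounds.
train_eems = [
    [[0, 1, 0], [1, 5, 1], [0, 1, 0]],
    [[0, 2, 0], [2, 9, 2], [0, 2, 0]],
    [[1, 0, 0], [0, 3, 4], [0, 0, 1]],
    [[2, 0, 0], [0, 4, 7], [0, 0, 2]],
]
X_train = [unfold(e) for e in train_eems]
means, stds = autoscale_fit(X_train)
Z_train = autoscale_apply(X_train, means, stds)
loadings = pca_loadings(Z_train, n_components=2)
train_scores = project(Z_train, loadings)
```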

2.4. 2-D Gabor wavelet

The Gabor function was first proposed by D. Gabor in the 1940s, and J. Daugman extended it to a 2D Gabor filter model for computer vision [20]. The 2D Gabor wavelets form a family of complex functions generated from a Gabor function by scaling and rotation. Because they closely resemble the filter responses of simple cells in the visual cortex, 2D Gabor wavelets have good joint spatial-frequency localization and multi-resolution properties. Their ability to capture local variations in images can be applied to extract features from the 3D fluorescence spectra of organic pollutants in drinking water. In addition, Gabor wavelets are insensitive to changes in illumination, so they can reduce the disturbance caused by variations in the light source intensity of the fluorescence spectrometer [21]. The Gabor function attains optimal joint localization in the spatial and frequency domains. In the spatial domain, the 2D Gabor function is a complex plane wave modulated by a Gaussian envelope, expressed as follows [20],

ϕ_{μ, ν}(z) = \frac{‖k_{μ, ν}‖^{2}}{σ^{2}} \exp\left(-\frac{‖k_{μ, ν}‖^{2} ‖z‖^{2}}{2σ^{2}}\right) \left[\exp(i k_{μ, ν} \cdot z) - \exp\left(-\frac{σ^{2}}{2}\right)\right]

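where μ and ν denote the orientation and scale of the Gabor filter, z = (x, y) is the spatial position, and $k_{μ, ν}$ is the wave vector of the plane wave, $k_{μ, ν} = k_{ν} e^{iφ_{μ}}$, with $k_{ν} = k_{max}/f^{ν}$ and $φ_{μ} = πμ/8$. Here $k_{max} = π/2$ is the maximum frequency, f is the spacing factor between adjacent scales, and σ sets the width of the Gaussian envelope relative to the wavelength.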
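As a short worked check on the expression above, the subtracted term exp(−σ²/2) removes the filter's DC response. With the standard 2D Gaussian integral ∫ exp(−a‖z‖²/2 + i k·z) dz = (2π/a) exp(−‖k‖²/(2a)) and a = ‖k_{μ,ν}‖²/σ², the two terms cancel exactly:

\int_{\mathbb{R}^{2}} ϕ_{μ, ν}(z) \, dz = \frac{‖k_{μ, ν}‖^{2}}{σ^{2}} \cdot \frac{2πσ^{2}}{‖k_{μ, ν}‖^{2}} \left[\exp\left(-\frac{σ^{2}}{2}\right) - \exp\left(-\frac{σ^{2}}{2}\right)\right] = 0

so, in the continuous and untruncated limit, a constant background offset in the spectrum produces no filter response. The text does not state f; assuming f = √2, a common choice in the Gabor face-recognition literature, the five scales would be

k_{ν} = \frac{π/2}{(\sqrt{2})^{ν}}: \quad k_{0} = \frac{π}{2}, \; k_{1} = \frac{π}{2\sqrt{2}}, \; k_{2} = \frac{π}{4}, \; k_{3} = \frac{π}{4\sqrt{2}}, \; k_{4} = \frac{π}{8}

which correspond to carrier wavelengths 2π/k_ν from 4 to 16 grid points, that is, 20 to 80 nm on the 5 nm excitation-emission grid.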

The Gabor transform of the 3D fluorescence spectrum was obtained by convolving the spectrum image F(z) with Gabor filters $ϕ_{μ, ν}(z)$ of different scales and orientations. The Gabor filter bank used in this study contained five scales ν = {0, 1, 2, 3, 4} and eight orientations μ = {0, 1, 2, 3, 4, 5, 6, 7}, as shown in Fig. 1; the orientation indices μ correspond to the angles {0, π/8, 2π/8, 3π/8, 4π/8, 5π/8, 6π/8, 7π/8}. Combining all scales and orientations yields 40 sub-band outputs, of which the magnitude (amplitude) is retained. Figures 2 and 3 present the filtering results for the spectrum data using Gabor filters at scale 4 and orientations 0–7. Let $R_{μ, ν}(z)$ denote the output of the filter with orientation μ and scale ν, and let $m_{μ, ν}$ denote the column vector formed by concatenating the columns of $R_{μ, ν}(z)$. The outputs of all scales and orientations are then stacked into the column vector $χ = {[m_{0, 0}^{T}, m_{0, 1}^{T}, \dots, m_{7, 4}^{T}]}^{T}$, which serves as the feature vector of the 3D fluorescence spectrum. However, the dimensionality of this feature vector is as high as 408040 (101 × 101 × 40), which imposes a heavy computational load on an online water quality identification system.
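A minimal pure-Python sketch of the filter bank and the magnitude sub-bands, written to mirror the equation above and the ordering of χ. Assumptions not stated in the text: f = √2 and σ = 2π (common choices in the Gabor face-recognition literature), a kernel truncated to half-width h, zero padding at the borders, excitation along rows, and μ as the outer loop of the concatenation. A practical implementation on the 101 × 101 EEM would use FFT-based convolution and larger kernels; the tiny sizes here only keep the example fast.

```python
import cmath
import math

K_MAX = math.pi / 2      # maximum frequency, from the text
F = math.sqrt(2)         # assumption: spacing factor between scales (not given in the text)
SIGMA = 2 * math.pi      # assumption: Gaussian width parameter (not given in the text)
N_ORIENT, N_SCALE = 8, 5

def gabor_kernel(mu, nu, half_size):
    """Complex Gabor wavelet phi_{mu,nu} sampled on a (2h+1) x (2h+1) grid (rows = y, cols = x)."""
    k = K_MAX / F ** nu
    angle = math.pi * mu / 8
    kx, ky = k * math.cos(angle), k * math.sin(angle)
    dc = math.exp(-SIGMA ** 2 / 2)   # DC-compensation term; exactly zero-mean only when untruncated
    kernel = []
    for y in range(-half_size, half_size + 1):
        row = []
        for x in range(-half_size, half_size + 1):
            envelope = (k * k / SIGMA ** 2) * math.exp(-k * k * (x * x + y * y) / (2 * SIGMA ** 2))
            row.append(envelope * (cmath.exp(1j * (kx * x + ky * y)) - dc))
        kernel.append(row)
    return kernel

def magnitude_response(image, kernel):
    """|F * phi| by direct 2D convolution with zero padding; output has the input's size."""
    h = len(kernel) // 2
    n_rows, n_cols = len(image), len(image[0])
    out = []
    for i in range(n_rows):
        out_row = []
        for j in range(n_cols):
            acc = 0j
            for a in range(-h, h + 1):
                for b in range(-h, h + 1):
                    ii, jj = i - a, j - b
                    if 0 <= ii < n_rows and 0 <= jj < n_cols:
                        acc += image[ii][jj] * kernel[a + h][b + h]
            out_row.append(abs(acc))
        out.append(out_row)
    return out

def gabor_features(image, half_size=3):
    """chi: column-stacked magnitude sub-bands, mu as outer and nu as inner index (assumed order)."""
    chi = []
    for mu in range(N_ORIENT):
        for nu in range(N_SCALE):
            r = magnitude_response(image, gabor_kernel(mu, nu, half_size))
            chi.extend(r[i][j] for j in range(len(r[0])) for i in range(len(r)))  # column by column
    return chi

# Toy usage on a 12 x 12 "EEM" with a single peak; a real 101 x 101 EEM gives 408040 features.
toy = [[1.0 if (i, j) == (6, 6) else 0.0 for j in range(12)] for i in range(12)]
chi = gabor_features(toy)
print(len(chi))  # 5760 = 12 * 12 * 40
```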

Fig. 1 Gabor filter bank.

Fig. 2 Gabor feature coefficient maps of a rhodamine B drinking water solution for Gabor filters of scale 4 and orientations 0–3.

Fig. 3 Gabor feature coefficient maps of a rhodamine B drinking water solution for Gabor filters of scale 4 and orientations 4–7.

The feature dimension obtained directly from the Gabor filters is so high that it complicates identification. This study therefore reduced the number of features, without compromising their descriptive power, by applying block statistics as a secondary description of the Gabor coefficients [26]. Each Gabor sub-band image was divided into sub-blocks of equal size, as shown in Figs. 4(a) and 4(b), and each sub-block was described by the mean and standard deviation of its elements. The procedure is as follows.

Fig. 4 (a) Gabor feature block. (b) Block statistics of Gabor feature.

A Gabor-transformed sub-band image $R_{μ, ν}(z)$ of size L_w × L_h was divided into K contiguous, non-overlapping sub-blocks, each containing L_w × L_h/K elements. The i-th sub-block is denoted $R_{μ, ν}^{i} (z)$, and the mean $m_{μ, ν}^{i}$ and the standard deviation $σ_{μ, ν}^{i}$ of its elements serve as the local feature description of that sub-block:

m_{μ, ν}^{i} = \frac{K}{L_{w} \times L_{h}} \sum_{z} R_{μ, ν}^{i} (z)

σ_{μ, ν}^{i} = {\left(\frac{K}{L_{w} \times L_{h}} \sum_{z} {\left(R_{μ, ν}^{i} (z) - m_{μ, ν}^{i}\right)}^{2}\right)}^{\frac{1}{2}}

The whole sub-band $R_{μ, ν}(z)$ can therefore be described by the 2K-dimensional vector

[m_{μ, ν}^{1} σ_{μ, ν}^{1} m_{μ, ν}^{2} σ_{μ, ν}^{2} \dots m_{μ, ν}^{K} σ_{μ, ν}^{K}]

Finally, the block statistics of all 40 sub-bands are concatenated into a single column vector

{[m_{0, 0}^{1} σ_{0, 0}^{1} \dots m_{0, 0}^{K} σ_{0, 0}^{K} \dots m_{7, 4}^{1} σ_{7, 4}^{1} \dots m_{7, 4}^{K} σ_{7, 4}^{K}]}^{T}

Because this vector is the Gabor feature description of one 3D fluorescence spectrum, the number K of sub-blocks per sub-band must be chosen to balance compression of the feature count against the accuracy of the description. With 40 sub-bands and two statistics per sub-block, the vector has 80K elements.
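A minimal sketch of the block statistics, assuming blocks are visited row-major and that, because 101 is not divisible by n > 1, blocks may differ in size by at most one row or column (the text does not say how this case is handled). The helper names are hypothetical. The feature count 2K × 40 gives 320 features for K = 4, matching the figure reported in Section 3.5.

```python
import math

def split_points(length, n):
    """(start, stop) bounds of n contiguous segments whose sizes differ by at most one."""
    base, extra = divmod(length, n)
    bounds, start = [], 0
    for s in range(n):
        stop = start + base + (1 if s < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds

def block_statistics(sub_band, n):
    """[m^1, s^1, ..., m^K, s^K] for the K = n * n blocks of one magnitude sub-band (row-major)."""
    feats = []
    for r0, r1 in split_points(len(sub_band), n):
        for c0, c1 in split_points(len(sub_band[0]), n):
            vals = [sub_band[i][j] for i in range(r0, r1) for j in range(c0, c1)]
            mean = sum(vals) / len(vals)
            std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))  # divides by block size, as in the equation
            feats.extend([mean, std])
    return feats

def feature_length(K, n_subbands=40):
    """Length of the concatenated vector: 2 statistics x K blocks x 40 sub-bands."""
    return 2 * K * n_subbands

sub = [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
print(block_statistics(sub, 2))                                 # [1.0, 0.0, 2.0, 0.0, 3.0, 0.0, 4.0, 0.0]
print([feature_length(n * n) for n in (1, 2, 4, 8)])            # [80, 320, 1280, 5120]
print([stop - start for start, stop in split_points(101, 4)])   # [26, 25, 25, 25]
```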

2.5. SVM multi-classification model

Cortes and Vapnik proposed the SVM, a machine learning method grounded in statistical learning theory [27]. Its key idea is to construct the optimal separating hyperplane by minimizing structural risk and maximizing the classification margin, which improves the generalization ability of the learning machine and helps it cope with high-dimensional and nonlinear problems while avoiding local minima.
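The max-margin idea can be sketched with a linear soft-margin SVM trained by the Pegasos stochastic subgradient method and combined one-vs-rest into a multi-class classifier. This is a swapped-in technique for illustration: the text does not name its solver or its multi-class scheme (one-vs-one would be an equally valid reading of "multiple SVM classifiers"), and the regularization λ and epoch count are arbitrary assumptions. A constant feature is appended so that the bias is learned, and, as a simplification, regularized, with the weights.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=500, seed=0):
    """Binary linear soft-margin SVM (labels +1/-1) trained with the Pegasos subgradient method.

    A constant 1.0 feature is appended so the last weight acts as the bias; for simplicity it is
    regularized along with the other weights.
    """
    rng = random.Random(seed)
    data = [(list(row) + [1.0], label) for row, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)                                 # Pegasos step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]              # shrink: gradient of (lam/2)||w||^2
            if margin < 1.0:                                      # hinge loss active: move toward y * x
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w

def train_one_vs_rest(X, labels, **kwargs):
    """One binary SVM per class (class vs. all others); an assumed multi-class scheme."""
    return {c: train_linear_svm(X, [1 if lab == c else -1 for lab in labels], **kwargs)
            for c in sorted(set(labels))}

def predict(models, x):
    """Class whose SVM gives the largest decision value w . [x, 1]."""
    xb = list(x) + [1.0]
    return max(models, key=lambda c: sum(wi * xi for wi, xi in zip(models[c], xb)))

# Toy usage: three well-separated 2D clusters standing in for three compounds.
X = [[0.0, 0.0], [0.5, 0.2], [0.2, 0.4],
     [5.0, 0.0], [5.3, 0.4], [4.8, -0.3],
     [0.0, 5.0], [0.3, 5.2], [-0.2, 4.7]]
labels = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
models = train_one_vs_rest(X, labels)
print([predict(models, x) for x in ([0.0, 0.0], [5.0, 0.0], [0.0, 5.0])])
```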

The idea of classifying organic pollutants from 3D fluorescence spectra with multiple SVM classifiers is to train the classifiers on the features extracted by PCA or 2D Gabor wavelets together with the labels of the training data. The training dataset was split into training and validation parts: the training fluorescence dataset was divided into five consecutive folds [28], one fold was used as the validation set, and the remaining four folds formed the training set. After the test samples were fed to the classifier for label prediction, the number of support vectors, the number of misclassified samples, and the classification accuracy were obtained for each kernel function. The classification accuracy peaked with a linear kernel, which suggests that the problem is linearly separable in the extracted feature space. The recognition result for each test sample was then obtained from the predicted label.
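The five-consecutive-fold validation and the accuracy definition can be sketched as follows. `fit` is a hypothetical callable that returns a predictor, and the demo plugs in a nearest-centroid rule purely as a stand-in for the SVM. One design caveat: consecutive folds keep every class in the training split only if samples are not sorted by compound, so the demo interleaves the classes. The accuracy helper reproduces the 94.4% in Section 3.3, since four misclassified samples out of 72 give 68/72.

```python
def consecutive_folds(n_samples, n_folds=5):
    """Split indices 0..n-1 into n_folds contiguous folds whose sizes differ by at most one."""
    base, extra = divmod(n_samples, n_folds)
    folds, start = [], 0
    for f in range(n_folds):
        stop = start + base + (1 if f < extra else 0)
        folds.append(list(range(start, stop)))
        start = stop
    return folds

def accuracy(predicted, actual):
    """Correctly classified samples divided by the total number of samples."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

def cross_validate(X, y, fit, n_folds=5):
    """Mean validation accuracy; fit(X_train, y_train) is a hypothetical trainer returning predict(x)."""
    scores = []
    for val_idx in consecutive_folds(len(X), n_folds):
        held_out = set(val_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores.append(accuracy([predict(X[i]) for i in val_idx], [y[i] for i in val_idx]))
    return sum(scores) / len(scores)

def fit_nearest_centroid(X, y):
    """Stand-in classifier for the demo (not an SVM): predicts the class with the nearest mean."""
    centroids = {}
    for c in set(y):
        rows = [x for x, lab in zip(X, y) if lab == c]
        centroids[c] = [sum(col) / len(rows) for col in zip(*rows)]
    return lambda x: min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

# Demo with interleaved classes, so every training split contains both of them.
X = [[10.0 * (i % 2) + 0.01 * i] for i in range(10)]
y = ["a" if i % 2 == 0 else "b" for i in range(10)]
print(cross_validate(X, y, fit_nearest_centroid))                # 1.0
print(round(accuracy(["x"] * 68 + ["y"] * 4, ["x"] * 72), 3))    # 0.944 (4 errors out of 72)
```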

3. Results and discussion

3.1. Pre-processing

Take a 10 μg/L rhodamine B solution as an example. Spectra acquired with the Hitachi F-4600 fluorescence spectrometer contain Rayleigh and Raman scattering. Rayleigh scattering was removed and the excised region was filled by interpolation based on Delaunay triangulation. Raman scattering was eliminated by subtracting the spectrum of purified water. Comparing the spectra before and after scatter removal shows that, once scattering is removed, the characteristic information of the contaminant is more apparent and its characteristic peaks are reflected more intuitively (Figs. 5(a) and 5(b)).
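A simplified sketch of this pre-processing: blank subtraction for the Raman band, zeroing of the region below the excitation wavelength, masking of the first- and second-order Rayleigh bands (Em ≈ Ex and Em ≈ 2Ex), and refilling of the masked band. The paper interpolates with Delaunay triangulation over the whole EEM (for example, SciPy's `griddata` with `method='linear'` triangulates the known points); this sketch swaps in plain linear interpolation along each emission spectrum, and the ±10 nm band half-width is an assumption.

```python
def in_rayleigh_band(ex, em, half_width):
    """True inside the first-order (Em ~ Ex) or second-order (Em ~ 2 Ex) Rayleigh band."""
    return abs(em - ex) <= half_width or abs(em - 2 * ex) <= half_width

def interpolate_row(values, masked):
    """Linearly refill masked entries from the nearest unmasked neighbours in the same row."""
    known = [j for j, m in enumerate(masked) if not m]
    out = list(values)
    for j, m in enumerate(masked):
        if not m:
            continue
        left = max((k for k in known if k < j), default=None)
        right = min((k for k in known if k > j), default=None)
        if left is None or right is None:
            out[j] = 0.0   # band touches the edge of the grid: assume no fluorescence there
        else:
            w = (j - left) / (right - left)
            out[j] = (1 - w) * values[left] + w * values[right]
    return out

def remove_scatter(eem, blank, ex_nm, em_nm, half_width_nm=10.0):
    """Blank-subtract, zero the Em < Ex region, then refill the Rayleigh bands row by row."""
    cleaned = []
    for i, ex in enumerate(ex_nm):
        row = [s - b for s, b in zip(eem[i], blank[i])]                             # removes the Raman band
        row = [0.0 if em < ex - half_width_nm else v for v, em in zip(row, em_nm)]  # no emission below Ex
        masked = [in_rayleigh_band(ex, em, half_width_nm) for em in em_nm]
        cleaned.append(interpolate_row(row, masked))
    return cleaned

# Toy usage: one excitation row at 300 nm with a Rayleigh spike at Em = 300 nm.
cleaned = remove_scatter([[5.0, 6.0, 100.0, 7.0, 8.0, 9.0]], [[1.0] * 6],
                         [300], [280, 290, 300, 310, 320, 330])
print(cleaned)  # [[0.0, 1.75, 3.5, 5.25, 7.0, 8.0]]
```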

Fig. 5 (a) Spectral image of rhodamine B before pretreatment. (b) Spectral image of rhodamine B after pretreatment.

3.2. Experiments

In this study, five different organic pollutants were used to evaluate discrimination. Their fluorescence spectra are shown in Fig. 6. For rhodamine B and salicylic acid, the characteristic peaks are easily identified at 550 nm/575 nm (Ex/Em) and 295 nm/405 nm (Ex/Em), respectively. The characteristic peaks of phenol, hydroquinone, and resorcinol, by contrast, lie close to one another. In terms of their basic chemical properties, phenol, hydroquinone, and resorcinol have similar chemical structures. In particular, hydroquinone and resorcinol share the same molecular formula and differ only in the positions of their hydroxyl (−OH) groups, as shown in Table 1 and Fig. 6. This structural similarity may partly explain the similarity of their fluorescence spectra. In this study, we aimed to extract the distinctive features of the 3D fluorescence spectrum of each contaminant. PCA and 2D Gabor wavelets were adopted to extract detailed information about the five organic contaminants.

Table 1. Basic chemical properties of the five organic contaminants

Fig. 6 Spectrum image of phenol, Rhodamine B, hydroquinone, resorcinol, and salicylic acid.

The Gabor wavelet describes image texture through the responses of filters at multiple frequencies and orientations at different positions of the spectral image. The Gabor filter bank used in this study contained five scales ν = {0, 1, 2, 3, 4} and eight orientations μ = {0, 1, 2, 3, 4, 5, 6, 7}. For each sample of phenol, rhodamine B, hydroquinone, resorcinol, and salicylic acid, 40 sub-bands (magnitude information) were obtained from the filter bank. For dimension reduction, each sub-band image was summarized by block statistics, with the block number set to 16 following previous research in the facial recognition field. The fluorescence data collected over three days (360 samples) were used as training data to build the classification model. The data from the last day (120 samples), whose concentrations differ from those of the training data, were used as testing data. Classification accuracy was defined as the ratio of the number of correctly classified samples to the total number of testing samples. To compare how efficiently the two methods extract features from different organic solutions, we conducted three experiments.

Rhodamine B, salicylic acid, and phenol samples, whose fluorescence characteristics are clearly different, formed Group 1. Phenol, hydroquinone, and resorcinol samples, whose fluorescence spectra differ only slightly, formed Group 2. Samples of all five organic solutions (rhodamine B, salicylic acid, phenol, hydroquinone, and resorcinol) formed Group 3, to further test the classification procedure. For Group 1, the three-day solutions of rhodamine B, salicylic acid, and phenol (216 samples) were used as the training dataset and the fourth day's solutions (72 samples) as the testing dataset. Group 2 comprised 216 samples of phenol, hydroquinone, and resorcinol for training and 72 samples for testing. Group 3 comprised 360 samples of all five contaminants for training and 120 samples for testing. For each group, the features extracted from the training dataset by each of the two methods were used to train the SVM classifier. The kernel functions considered were the linear, polynomial, RBF, and sigmoid kernels. The multiple SVM classifiers were obtained by training SVM models with the different kernel functions and searching for optimal parameters with a grid search.
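For reference, the four kernels named above can be written in the common LIBSVM-style parameterization (γ, coef0, degree), together with a grid search skeleton. The exponential grids for C and γ follow the ranges suggested in the LIBSVM practical guide and are assumptions, not values from the text; `score` is a hypothetical callback that would return the cross-validated accuracy for one parameter pair (for the linear kernel, only C would be searched).

```python
import itertools
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def linear_kernel(u, v):
    return dot(u, v)

def polynomial_kernel(u, v, gamma=1.0, coef0=0.0, degree=3):
    return (gamma * dot(u, v) + coef0) ** degree

def rbf_kernel(u, v, gamma=1.0):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def sigmoid_kernel(u, v, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * dot(u, v) + coef0)

# Assumed exponential grids (LIBSVM practical-guide ranges), not values from the text.
C_GRID = [2.0 ** p for p in range(-5, 16, 2)]        # 2^-5, 2^-3, ..., 2^15
GAMMA_GRID = [2.0 ** p for p in range(-15, 4, 2)]    # 2^-15, 2^-13, ..., 2^3

def grid_search(score):
    """Return (best score, C, gamma); score(C, gamma) is a hypothetical cross-validation callback."""
    return max((score(c, g), c, g) for c, g in itertools.product(C_GRID, GAMMA_GRID))

# Demo with a made-up score that peaks at C = 2 and gamma = 0.5.
best = grid_search(lambda c, g: -abs(math.log2(c) - 1) - abs(math.log2(g) + 1))
print(best[1:])   # (2.0, 0.5)
```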

3.3. Classification results for Group 1

Tables 2, 3, and 4 show the classification results for the three groups based on PCA and 2D Gabor wavelets with SVM. For Group 1, the two feature extraction methods gave similar results (Table 2): the accuracy reached 94.4% for the classification of phenol, rhodamine B, and salicylic acid. For PCA (Fig. 7), the reconstructed features of phenol, rhodamine B, and salicylic acid showed visible differences. For 2D Gabor wavelets, the features extracted with filters of scale 4 and orientations 0–7 also differed visibly, as shown in Figs. 8, 9, and 10. Figure 9 shows the Gabor feature images of rhodamine B for filters of scale 4 and orientations 0–7; in every orientation, the characteristic peak (550 nm/575 nm) is captured by the Gabor filters. Figures 8 and 10 show the feature images of salicylic acid and phenol, respectively. The Gabor features of salicylic acid differed visibly from those of phenol mainly for orientations 0–3, indicating that the fluorescence characteristics of salicylic acid and phenol are partly similar and partly different. The misclassified samples in this group (four rhodamine B samples) were low-concentration samples at 1 and 3 μg/L, whose fluorescence characteristics were too weak to distinguish.

Table 2. Classification results for samples of rhodamine B, salicylic acid, and phenol in Group 1.

Table 3. Classification results for samples of phenol, hydroquinone and resorcinol in Group 2.

Table 4. Classification results for samples of phenol, hydroquinone, resorcinol, rhodamine B, and salicylic acid in Group 3.

Table 5. Classification accuracy of Gabor and PCA for Groups 1, 2, and 3.

Fig. 7 Reconstructed spectra obtained through PCA.

Fig. 8 Gabor features from filters of scale 4 and orientations 0–7 for salicylic acid at 10 μg/L.

Fig. 9 Gabor features from filters of scale 4 and orientations 0–7 for rhodamine B at 10 μg/L.

Fig. 10 Gabor features from filters of scale 4 and orientations 0–7 for phenol at 10 μg/L.

3.4. Classification results for Groups 2 and 3

Table 3 shows that PCA and 2D Gabor wavelets gave different results for the classification of phenol, hydroquinone, and resorcinol. When PCA was used for feature extraction, the classification accuracy for the hydroquinone and resorcinol test samples declined, because the PCA-reconstructed features of the two compounds were too close to be separated correctly (Fig. 7). With 2D Gabor wavelets, the classification accuracy reached 100%, which indicates that the texture information of the fluorescence spectra was effectively extracted. Figures 11 and 12 show the Gabor feature images of hydroquinone and resorcinol for all orientations. Comparing the two, the images for orientations 0, 2, 3, and 5 differ visibly between the two compounds. This information was captured by the 2D Gabor wavelets and played an important role in the classification task.

Fig. 11 Gabor features from filters of scale 4 and orientations 0–7 for hydroquinone at 10 μg/L.

Fig. 12 Gabor features from filters of scale 4 and orientations 0–7 for resorcinol at 10 μg/L.

To further examine the method, we classified all five organic solution samples (Table 4). The classification accuracy was 89.1% with PCA and 96.7% with 2D Gabor wavelets. The 2D Gabor wavelets performed well in classifying hydroquinone and resorcinol, whose fluorescence spectra have close characteristic peaks. PCA unfolds the 3D data into 2D data along one direction before reducing the dimensionality of the features, which discards spectral detail. By contrast, the 2D Gabor wavelets describe the texture information of the spectra explicitly in the form of Gabor coefficients. As the results for the testing datasets of Groups 1, 2, and 3 show (Table 5), the PCA method performed poorly in discriminating between hydroquinone and resorcinol, whereas the Gabor method correctly classified all hydroquinone and resorcinol samples. Therefore, feature extraction based on 2D Gabor wavelets combined with multiple SVM classifiers achieved the best classification performance for substances with close or overlapping characteristic peaks. The method can thus effectively identify known pollutant types in drinking water and provide reliable information for the analysis of pollutants in drinking water distribution networks.

3.5. Discussion of the block number K

As mentioned above, raw Gabor features are bulky and slow down the detection system. Block statistics were therefore applied, with the block number set to 16 following a previous study on 2D Gabor wavelets in face recognition. In this section, we examine how the choice of block number affects the reliability and feasibility of the water contaminant classification procedure.

After Gabor feature extraction, each sample has 40 Gabor feature matrices, one for each filter of a given scale and orientation. As before, the 360 samples collected over three days were used as training data to obtain the classification model, and the 120 samples of different concentrations from the last day were used as testing data. Classification accuracy was defined as the ratio of the number of correctly classified samples to the total number of test samples. Before classification, block statistics were used for a secondary characterization of each sample's Gabor feature matrices. As explained in Section 2.4, the block number K is set to n² (n = 1, 2, 3, ...) so that all sub-blocks have the same size. We tested K = 1, 4, 16, and 64; for K = 1, the statistics are computed directly over each whole Gabor feature matrix. The block descriptors of all Gabor matrices were concatenated into a feature vector that serves as the representation of the pollutant's 3D fluorescence spectrum. Table 6 gives the recognition accuracy on the test samples and the main program runtime for different numbers of blocks.

Table 6. Classification accuracy for different block numbers K

As shown in Table 6, classification accuracy improved clearly when the number of sub-blocks was increased from 1 to 4 (320 features). This result shows that block statistics give a more accurate feature description that remains compact enough for online pollutant identification. The highest recognition rate was obtained at K = 64, but the program runtime then reached 0.48 s, which is too slow for online detection. In this study, K = 16 was chosen to balance accuracy and computational efficiency. The main program runtime is about 0.1 s for the PCA-based method and 0.008 s for the Gabor-based method with K = 16, so the classification method based on 2D Gabor wavelets performs better in both accuracy and efficiency.

3.6. Discussion of Gabor features

As defined above, the Gabor filter bank used in this study contained five scales ν = {0, 1, 2, 3, 4} and eight orientations μ = {0, 1, 2, 3, 4, 5, 6, 7}, and the 40 sub-band vectors were concatenated to form the final Gabor feature. All 40 sub-band vectors were used in the experiment after block statistics reduced the feature dimensionality. However, Figs. 11 and 12 show that the largest differences occurred in orientations 2, 3, and 5; orientations 0 and 6 varied slightly between the two compounds, whereas the features in orientations 1, 4, and 7 were rather similar. Filters of different orientations and scales therefore contribute differently to the feature representation and the classification results. The following discussion examines the contributions of all filters.

Table 7 lists the accuracies obtained with the Gabor filters of each scale 0–4. The best discrimination was obtained at scales 2–3, possibly because larger scales capture texture features over a wider region. Table 8 lists the accuracies obtained with the Gabor filters of each orientation 0–7. High accuracy was obtained for orientations 0–5, which may indicate that these filters contribute more to the classification task. The Gabor feature images of salicylic acid and rhodamine B were visibly distinct, indicating that features from filters of every orientation contribute. Figures 11 and 12 show that the feature images for orientations 0, 2, 3, and 5 exhibit dissimilar textures, which is consistent with the high accuracy for these orientations in Table 8 and is meaningful for discriminating hydroquinone from resorcinol. Fair performance was observed for orientations 1 and 4, possibly owing to contributions from the other organic contaminants, whose feature images differ greatly in those orientations; for example, the Gabor feature images of phenol, salicylic acid, and rhodamine B differ considerably in orientations 1 and 4. Meanwhile, Gabor filters of orientations 6 and 7 may contribute less to the method. This analysis supports the experimental results in Sections 3.3 and 3.4. In this study, all Gabor filters were used to generate the classification features, which worked well. Future work could study the effect of individual Gabor filters on different organic compounds to improve the efficiency of the method.
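The per-scale and per-orientation comparisons in Tables 7 and 8 amount to keeping only some sub-bands of the block-statistics vector before classification. A hypothetical helper for that selection, assuming μ is the outer index and ν the inner one in the concatenation (as in χ) and 2K statistics per sub-band; the placeholder vector stands in for real features.

```python
N_SCALES, N_ORIENTATIONS = 5, 8

def subband_slot(mu, nu):
    """Position of sub-band (mu, nu) in the concatenation, assuming mu outer and nu inner."""
    return mu * N_SCALES + nu

def select_features(chi, K, orientations=range(N_ORIENTATIONS), scales=range(N_SCALES)):
    """Keep the 2K block statistics of each chosen (orientation, scale) sub-band, in original order."""
    width = 2 * K
    slots = sorted(subband_slot(mu, nu) for mu in orientations for nu in scales)
    return [value for slot in slots for value in chi[slot * width:(slot + 1) * width]]

chi = list(range(40 * 2 * 16))                       # placeholder vector for K = 16 (1280 features)
only_scale_2 = select_features(chi, 16, scales=[2])
only_orient_0235 = select_features(chi, 16, orientations=[0, 2, 3, 5])
print(len(only_scale_2), len(only_orient_0235))      # 256 640
```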

Table 7. Classification accuracy of the Gabor filters at scales 0–4

Table 8. Classification accuracy of the Gabor filters at orientations 0–7

4. Conclusion

Real-time water quality monitoring based on 3D fluorescence spectra has proven to be a promising alternative. A few studies have used spectral data to detect water quality and to discriminate water sources. However, compounds with similar chemical structures and comparable spectral characteristics, such as closely spaced or overlapping peaks, can lead to missed detections or false alarms. These issues compromise the reliability of detection systems. In this study, we proposed a novel procedure based on 2D Gabor wavelets and multiple SVM classifiers and applied it to the classification of organic contaminants from fluorescence spectra. 2D Gabor wavelets were employed for feature extraction, yielding 40 sub-feature images for each sample. Block statistics were then applied to reduce dimensionality and to improve classification efficiency and computational speed in practice. A simulated contamination experiment was conducted to obtain spectra of five organic pollutants at various concentrations. Multiple SVM classifiers were used to train the classification model on a training dataset collected over three days. The model was then evaluated on a testing dataset acquired in a single day. To demonstrate the effect of 2D Gabor wavelets on feature extraction from fluorescence data, we performed three groups of experiments and compared the classification accuracy with that obtained using PCA. For pollutants whose characteristic peaks were closely positioned or overlapped, such as phenol, hydroquinone, and resorcinol, feature extraction based on 2D Gabor wavelets was the most effective at extracting detailed texture. Moreover, the features extracted from the 3D fluorescence spectra with this method carried abundant descriptive information, which resulted in high classification accuracy.
Given these advantages, the proposed procedure could be widely applied to water quality detection from fluorescence spectral data and could help ensure water safety.
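The conclusion refers to multiple SVM classifiers combined into one multi-class decision. One common way to do this is one-vs-one voting, sketched below under stated assumptions. A linear soft-margin SVM, trained by full-batch subgradient descent on the regularized hinge loss, stands in for whatever SVM solver and kernel the authors used. The names `train_linear_svm`, `train_ovo` and `predict_ovo`, and the hyperparameters `lam`, `lr` and `epochs`, are all hypothetical.

```python
import numpy as np
from itertools import combinations

def train_linear_svm(X, y, lam=1e-3, lr=0.05, epochs=1000):
    """Binary linear SVM, labels in {-1, +1}, minimising
    lam/2 * ||w||^2 + mean(max(0, 1 - y * (X @ w + b))) by subgradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    n = len(y)
    for _ in range(epochs):
        active = y * (X @ w + b) < 1              # samples inside or beyond the margin
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def train_ovo(X, labels, **kw):
    """One-vs-one scheme: one binary SVM for every pair of classes."""
    models = {}
    for a, c in combinations(np.unique(labels), 2):
        mask = (labels == a) | (labels == c)
        y = np.where(labels[mask] == a, 1.0, -1.0)
        models[(a, c)] = train_linear_svm(X[mask], y, **kw)
    return models

def predict_ovo(models, X):
    """Each pairwise classifier votes; the class with the most votes wins."""
    classes = sorted({k for pair in models for k in pair})
    index = {c: i for i, c in enumerate(classes)}
    votes = np.zeros((X.shape[0], len(classes)), dtype=int)
    for (a, c), (w, b) in models.items():
        winner = np.where(X @ w + b >= 0, index[a], index[c])
        votes[np.arange(len(X)), winner] += 1
    return np.array(classes)[votes.argmax(axis=1)]
```

With one feature vector per spectrum in `X_train`, the model would be trained with `models = train_ovo(X_train, y_train)` and applied with `y_pred = predict_ovo(models, X_test)`. For C classes the scheme trains C(C − 1)/2 binary classifiers, for example 10 for the five pollutants in Group 3.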

Funding

National Natural Science Foundation of China (61573313, 61803333, U1509208); the National Key R&D Program of China (2017YFC1403801); Key Technology Research and Development Program of Zhejiang Province (2015C03G2010034); Zhejiang Provincial Natural Science Foundation of China (LQ18F030001); Fundamental Research Funds for the Central Universities (2017FZA5011).

References

1. L. Sheng-ca, “Statistics of environmental events in China during the period from July to August in 2015,” J. Saf. Environ. pp. 390–394 (2015).

2. H. Che, S. Liu, and K. Smith, “Performance evaluation for a contamination detection method using multiple water quality sensors in an early warning system,” Water 7, 1422–1436 (2015).

3. E. F. Conde, “Environmental sensor anomaly detection using learning machines,” Master’s thesis, Utah State University (2011).

4. J. Dahlén, S. Karlsson, M. Bäckström, J. Hagberg, and H. Pettersson, “Determination of nitrate and other water quality parameters in groundwater from UV/Vis spectra employing partial least squares regression,” Chemosphere 40, 71–77 (2000).

5. W. Bourgeois, J. E. Burgess, and R. M. Stuetz, “On-line monitoring of wastewater quality: a review,” J. Chem. Technol. & Biotechnol. 76, 337–348 (2001).

6. M. C. Skala, G. M. Palmer, K. M. Vrotsos, A. Gendron-Fitzpatrick, and N. Ramanujam, “Comparison of a physical model and principal component analysis for the diagnosis of epithelial neoplasias in vivo using diffuse reflectance spectroscopy,” Opt. Express 15, 7863–7875 (2007).

7. G. Langergraber, J. Gupta, A. Pressl, F. Hofstaedter, W. Lettl, A. Weingartner, and N. Fleischmann, “On-line monitoring for control of a pilot-scale sequencing batch reactor using a submersible UV/Vis spectrometer,” Water Sci. Technol. 50, 73–80 (2004).

8. G. Langergraber, J. v. d. Broeke, W. Lettl, and A. Weingartner, “Real-time detection of possible harmful events using UV/Vis spectrometry,” Spectrosc. Eur. 18, 19–22 (2006).

9. D. Hou, J. Zhang, Z. Yang, S. Liu, P. Huang, and G. Zhang, “Distribution water quality anomaly detection from UV optical sensor monitoring data by integrating principal component analysis with chi-square distribution,” Opt. Express 23, 17487–17510 (2015).

10. D. Hou, S. Liu, J. Zhang, F. Chen, P. Huang, and G. Zhang, “Online monitoring of water-quality anomaly in water distribution systems based on probabilistic principal component analysis by UV-Vis absorption spectroscopy,” Spectroscopy 2014, 1–9 (2014).

11. J. Yu, X. Zhang, D. Hou, F. Chen, T. Mao, P. Huang, and G. Zhang, “Detection of water contamination events using fluorescence spectroscopy and alternating trilinear decomposition algorithm,” Spectroscopy 2017, 1–9 (2017).

12. M. V. Bui, M. M. Rahman, N. Nakazawa, E. Okazaki, and S. Nakauchi, “Visualize the quality of frozen fish using fluorescence imaging aided with excitation-emission matrix,” Opt. Express 26, 22954–22964 (2018).

13. K. M. G. Mostofa, T. Yoshioka, E. Konohira, E. Tanoue, K. Hayakawa, and M. Takahashi, “Three-dimensional fluorescence as a tool for investigating the dynamics of dissolved organic matter in the Lake Biwa watershed,” Limnology 6, 101–115 (2005).

14. M. Bosco, M. Callao, and M. Larrechi, “Simultaneous analysis of the photocatalytic degradation of polycyclic aromatic hydrocarbons using three-dimensional excitation-emission matrix fluorescence and parallel factor analysis,” Anal. Chimica Acta 576, 184–191 (2006).

15. Y. Shutova, A. Baker, J. Bridgeman, and R. K. Henderson, “Spectroscopic characterisation of dissolved organic matter changes in drinking water treatment: From PARAFAC analysis to online monitoring wavelengths,” Water Res. 54, 159–169 (2014).

16. L. Yang, J. Hur, and W. Zhuang, “Occurrence and behaviors of fluorescence EEM-PARAFAC components in drinking water and wastewater treatment systems and their applications: a review,” Environ. Sci. Pollut. Res. 22, 6500–6510 (2015).

17. R. H. Peiris, C. Hallé, H. Budman, C. Moresoli, S. Peldszus, P. M. Huck, and R. L. Legge, “Identifying fouling events in a membrane-based drinking water treatment process using principal component analysis of fluorescence excitation-emission matrices,” Water Res. 44, 185–194 (2010).

18. A. Baker, “Fluorescence excitation-emission matrix characterization of some sewage-impacted rivers,” Environ. Sci. & Technol. 35, 948–953 (2001).

19. G. Pavelescu, L. Ghervase, C. Ioja, S. Dontu, and R. Spiridon, “Spectral fingerprints of groundwater organic matter in rural areas,” Romanian Reports Phys. 65, 1105–1113 (2013).

20. J. G. Daugman, “Complete discrete 2-D Gabor transforms by neural networks for image analysis and compression,” IEEE Trans. Acoust. Speech Signal Process. 36, 1169–1179 (1988).

21. C. Wang, X. Shi, W. Li, L. Wang, J. Zhang, C. Yang, and Z. Wang, “Oil species identification technique developed by Gabor wavelet analysis and support vector machine based on concentration-synchronous-matrix-fluorescence spectroscopy,” Mar. Pollut. Bull. 104, 322–328 (2016).

22. F. Solsona, “Guidelines for drinking water quality standards in developing countries,” World Health Organization (2002).

23. M. Bahram, R. Bro, C. A. Stedmon, and A. Afkhami, “Handling of Rayleigh and Raman scatter for PARAFAC modeling of fluorescence data using interpolation,” J. Chemom. 20, 99–105 (2006).

24. R. G. Zepp, W. M. Sheldon, and M. A. Moran, “Dissolved organic fluorophores in southeastern US coastal waters: correction method for eliminating Rayleigh and Raman scattering peaks in excitation-emission matrices,” Mar. Chem. 89, 15–36 (2004).

25. J. Bridgeman, M. Bieroza, and A. Baker, “The application of fluorescence spectroscopy to organic matter characterisation in drinking water treatment,” Rev. Environ. Sci. Bio/Technology 10, 277 (2011).

26. L. Fei, Y. Xueyi, L. Bin, Y. Peng, and Z. Zhenquan, “Block statistics based Gabor feature representation and face recognition,” Pattern Recognit. Artif. Intell. 19, 585–590 (2006).

27. C. Cortes and V. Vapnik, “Support-vector networks,” Mach. Learn. 20, 273–297 (1995).

28. M. A. Little, G. Varoquaux, S. Saeb, L. Lonini, A. Jayaraman, D. C. Mohr, and K. P. Kording, “Using and understanding cross-validation strategies. Perspectives on Saeb et al.,” GigaScience 6, 1–6 (2017).

Pollutant	Peak Ex/Em (nm)	Molecular formula	Number of benzene rings
Phenol	270–280/305–315	C₆H₅OH	1
Hydroquinone	275–285/300–310	C₆H₆O₂	1
Resorcinol	275–280/300–310	C₆H₆O₂	1
Rhodamine B	545–555/570–580	C₂₈H₃₁ClN₂O₃	4
Salicylic acid	290–300/400–410	C₇H₆O₃	1
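The conclusion identifies phenol, hydroquinone and resorcinol as the compounds whose characteristic peaks overlap. This can be checked directly from the windows in the table above. The short script below is an illustrative check, not part of the authors' method. It intersects the excitation and emission ranges of every pair of pollutants and reports the pairs that overlap in both dimensions. The dictionary `PEAKS` and the helper names are hypothetical.

```python
from itertools import combinations

# Excitation and emission windows (nm), transcribed from the table above
PEAKS = {
    "phenol": ((270, 280), (305, 315)),
    "hydroquinone": ((275, 285), (300, 310)),
    "resorcinol": ((275, 280), (300, 310)),
    "rhodamine B": ((545, 555), (570, 580)),
    "salicylic acid": ((290, 300), (400, 410)),
}

def overlap(a, b):
    """Length (nm) of the intersection of two intervals; 0 if they are disjoint."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

def overlapping_pairs(peaks):
    """Pairs whose excitation AND emission windows both overlap."""
    pairs = []
    for p, q in combinations(sorted(peaks), 2):
        ex = overlap(peaks[p][0], peaks[q][0])
        em = overlap(peaks[p][1], peaks[q][1])
        if ex > 0 and em > 0:
            pairs.append((p, q, ex, em))
    return pairs

for pair in overlapping_pairs(PEAKS):
    print(pair)
```

Only the three pairs formed by phenol, hydroquinone and resorcinol are reported, each overlapping by at least 5 nm in both excitation and emission. Rhodamine B and salicylic acid are separated from every other compound.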

Number of blocks K	Feature dimension	Accuracy (%)	Runtime (s)
64	5120	97.5	0.48
16	1280	96.7	0.008
4	320	94.5	0.003
1	80	86.1	0.002
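The feature dimensions in the table above are consistent with computing two statistics per block for each of the 40 sub-band images. The product 40 × K × 2 gives 5120, 1280, 320 and 80 for K = 64, 16, 4 and 1. The sketch below makes two assumptions inferred from these counts rather than stated in this section: the two statistics are the block mean and standard deviation, and the K blocks form a b × b grid. `block_statistics` is a hypothetical helper name.

```python
import numpy as np

def block_statistics(responses, blocks_per_side):
    """Block mean and standard deviation for every Gabor sub-band image.

    responses: array (n_scales, n_orients, H, W) of Gabor magnitude images.
    blocks_per_side: b, giving K = b * b blocks per sub-band image.
    Returns a vector of length n_scales * n_orients * K * 2, ordered as
    [scale][orientation][block][mean, std].
    """
    n_s, n_o = responses.shape[:2]
    feats = []
    for nu in range(n_s):
        for mu in range(n_o):
            # array_split tolerates image sizes that are not multiples of b
            for band in np.array_split(responses[nu, mu], blocks_per_side, axis=0):
                for block in np.array_split(band, blocks_per_side, axis=1):
                    feats.extend((block.mean(), block.std()))
    return np.asarray(feats)
```

Passing `blocks_per_side` = 8, 4, 2 or 1 reproduces the four rows of the table (K = 64, 16, 4, 1). Fewer blocks shorten the vector and speed up classification, at the cost of spatial detail, which matches the accuracy trend reported above.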

Optics Express

True label	Gabor+SVM predicted: phenol	Gabor+SVM predicted: salicylic acid	Gabor+SVM predicted: rhodamine B	PCA+SVM predicted: phenol	PCA+SVM predicted: salicylic acid	PCA+SVM predicted: rhodamine B
Phenol	24	0	0	24	0	0
Salicylic acid	0	24	0	0	24	0
Rhodamine B	0	4	20	0	4	20
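The confusion matrix above covers phenol, salicylic acid and rhodamine B. Gabor+SVM and PCA+SVM give identical counts for these compounds, so one computation covers both methods. Overall accuracy is the trace divided by the total number of test spectra, 68/72. The four rhodamine B spectra predicted as salicylic acid lower rhodamine B recall to 20/24 and salicylic acid precision to 24/28. The variable names below are illustrative.

```python
import numpy as np

# Rows: true class; columns: predicted class, both ordered as
# (phenol, salicylic acid, rhodamine B). Counts transcribed from the table above.
cm = np.array([[24, 0, 0],
               [0, 24, 0],
               [0, 4, 20]])

accuracy = np.trace(cm) / cm.sum()           # correct predictions / all test spectra
recall = np.diag(cm) / cm.sum(axis=1)        # per true class: 1, 1 and 20/24
precision = np.diag(cm) / cm.sum(axis=0)     # per predicted class: 1, 24/28 and 1

print(f"accuracy = {accuracy:.1%}")          # accuracy = 94.4%
```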

Experiment	Gabor+SVM accuracy (%)	PCA+SVM accuracy (%)
Group 1: phenol, rhodamine B, and salicylic acid	94.4	94.4
Group 2: phenol, hydroquinone, and resorcinol	100	61
Group 3: phenol, hydroquinone, resorcinol, rhodamine B, and salicylic acid	96.7	89.1
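The PCA+SVM column in the table above serves as the baseline. For this baseline, each EEM is flattened into a row vector and projected onto its leading principal components before classification. A minimal NumPy sketch of PCA via the singular value decomposition is shown below. The function names and the `n_components` argument are assumptions, since the number of retained components is not stated in this section.

```python
import numpy as np

def fit_pca(X, n_components):
    """PCA via SVD of the mean-centred data matrix (rows = samples)."""
    mean = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    explained = s**2 / np.sum(s**2)          # explained-variance ratio, descending
    return mean, vt[:n_components], explained[:n_components]

def transform_pca(X, mean, components):
    """Scores of the samples on the retained principal components."""
    return (X - mean) @ components.T
```

With `X` holding one flattened EEM per row (for example `eem.ravel()`), the baseline would fit PCA on the training spectra only. It would then apply `transform_pca` to both the training and test spectra before passing the scores to the same SVM classifier.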