Dilated convolutional neural networks for fiber Bragg grating signal demodulation

Abstract

In quasi-distributed fiber Bragg grating (FBG) sensor networks, challenges are known to arise when signals are highly overlapped and thus hard to separate, giving rise to substantial error in signal demodulation. We propose a multi-peak detection deep learning model based on a dilated convolutional neural network (CNN) that overcomes this problem, achieving extremely low error in signal demodulation even for highly overlapped signals. We show that our FBG demodulation scheme enhances the network multiplexing capability, detection accuracy and detection time of the FBG sensor network, achieving a root-mean-square (RMS) error in peak wavelength determination of < 0.05 pm, with a demodulation time of 15 ms for two signals. Our demodulation scheme is also robust against noise, achieving an RMS error of < 0.47 pm even with a signal-to-noise ratio as low as 15 dB. A comparison on our high-performance computer with existing signal demodulation methods shows the superiority in RMS error of our dilated CNN implementation. Our findings pave the way to faster and more accurate signal demodulation methods, and testify to the substantial promise of neural network algorithms in signal demodulation problems.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

In the last few decades, optical fiber sensor technology has attracted much attention in areas including structural health monitoring, civil engineering, biochemistry and medical devices [1,2], due to advantages like compactness and immunity to electromagnetic interference. Fiber Bragg grating (FBG) sensors, in particular, offer a controllable reflected signal, high multiplexing capability and excellent linearity [3–5]. Arrays of FBG sensors can be inscribed on a single optical fiber to form a quasi-distributed FBG sensor channel, which can be used to monitor real-time conditions in multiple physical locations simultaneously. For even more flexibility, multiple FBG sensor channels can be used, forming an FBG sensor network. In these networks, more than one FBG is typically assigned to the same wavelength range to maximize the use of the broadband light source’s limited bandwidth. This gives rise to challenges in peak detection, especially when FBGs assigned to the same wavelength range produce nearly overlapped signals, since the proximity of the peaks distorts the location of each individual peak in the total signal. This leads to very poor accuracies for highly overlapped peaks, despite high accuracies when the peaks are moderately overlapped or far apart [6]. Therefore, a demodulation method with high accuracy even for highly overlapped signals is essential to ensure the functionality of FBG sensor networks under all conditions.

The working mechanism for FBG sensors is the linear relationship between Bragg wavelength shift and the grating period, which varies in response to changes in physical conditions that include temperature, strain, pressure and vibration [7]. Bragg wavelength detection is therefore key to demodulating signals from quasi-distributed FBG sensor networks. This signal demodulation problem is relatively trivial when each FBG is assigned its unique wavelength range within the bandwidth of the broadband source, ensuring that only one signal exists within any particular wavelength range. Doing so, however, limits the wavelength multiplexing capability of the system. To maximize the multiplexing capability for a given broadband source, more than one FBG must be assigned to each wavelength range, necessitating solutions for accurate and efficient demodulation of multiple overlapped signals.

Conventional peak detection (CPD) is the technique traditionally used in FBG signal demodulation. In CPD, however, any overlapped signal leads to a significant deterioration in detection accuracy due to crosstalk [8]. To improve on this, demodulation techniques based on evolutionary algorithms like particle swarm optimizers (PSO) [9–11], genetic algorithms (GA) [12] and differential evolution algorithms (DE) [13] were proposed to improve multiplexing capability. These algorithms reduced the signal demodulation error for overlapped signals. However, a longer demodulation time is required when the number of FBG sensors is increased.

Various machine learning algorithms have been shown to improve both demodulation accuracy and time for overlapped signals. These algorithms include the extreme learning machine (ELM) [14,15] and least squares support vector regression (LS-SVR) [8]. These machine learning techniques have achieved an average root-mean-square (RMS) error in peak wavelength detection of 1 pm. However, each signal demodulation operation for just two overlapped signals takes about half a second, which is too long for real-time monitoring applications. Recently, a recurrent neural network was studied for FBG signal demodulation, achieving a record-low RMS error in peak wavelength detection of 0.674 pm. However, the errors grew by more than an order of magnitude when the signals were highly overlapped [6]. The work highlighted the promise of neural network algorithms in FBG signal demodulation, but also revealed the need to address the large errors that arise in the event of significant signal overlap.

In this paper, we implement a dilated convolutional neural network (CNN) algorithm that successfully achieves signal demodulation with RMS error of < 0.05 pm, even for cases when the signals are highly overlapped. Our demodulation scheme has a demodulation time of 15 ms for the overlapped signals from two FBG sensor channels, making it feasible for real-time monitoring of physical conditions. Our demodulation scheme is also robust against noise, achieving an RMS error of < 0.47 pm even with a signal-to-noise ratio as low as 15 dB. Furthermore, we show that our scheme can be robustly scaled up to a larger number of channels, achieving RMSEs of 0.10 pm (testing time 13.9 ms) and 0.12 pm (testing time 18.7 ms) for a four- and eight-channel system respectively. Our findings pave the way to even faster and more accurate signal demodulation schemes, and testify to the substantial promise of neural network algorithms in signal demodulation problems.

2. Quasi-distributed sensor network principles

The schematic diagram of the proposed quasi-distributed FBG sensor network system is shown in Fig. 1(a). The system comprises n channels of individual FBG sensor arrays, with each channel containing m FBGs. In our convention, $\textrm{FB}{\textrm{G}_{ji}}$ denotes the FBG at the ${i^{\textrm{th}}}$ position of the ${j^{\textrm{th}}}$ channel, where $1 \le i \le m$ and $1 \le j \le n$. We group the ${i^{\textrm{th}}}$ FBGs across all channels (“Gp i” in Fig. 1(a)), and assign them to the ${i^{\textrm{th}}}$ wavelength segment of the broadband light source. For instance, the wavelength range for Gp 1 could be 1549 nm to 1551 nm and that for Gp 2 could be 1551 nm to 1553 nm, etc. The optical spectrum analyzer (OSA) collects the reflected spectra and then passes the data to the dilated CNN model for further analysis and peak detection. Different degrees of overlap in the FBG signals allocated to the same wavelength range can be expected, as illustrated in Fig. 1(b) for the case of 2 signals. When the signals are highly overlapped, a single peak might be formed instead of two individual peaks as illustrated in Fig. 1(b), resulting in challenges in accurately determining the peak wavelength of each individual signal, which is the task of the signal demodulation unit (labelled “CNN algorithm” in Fig. 1(a)).


Fig. 1. Extremely low error in peak wavelength detection (root-mean-square (RMS) error < 0.05 pm) can be achieved when using a dilated convolutional neural network (CNN) to demodulate two overlapped FBG signals in a quasi-distributed sensor network. a) Schematic of a distributed sensor network. A broadband light source is emitted into the splitter, which splits the light pulse into multiple branches. FBGs belonging to the same group are assigned to the same wavelength range within the bandwidth of the broadband light source. The reflection measured at the optical spectrum analyzer (OSA) from each group of FBGs contains multiple signals with different degrees of overlap, illustrated in b) for the case of two overlapping signals. In previous signal demodulation schemes, the demodulation error typically increases when the signal overlap is high (peak separation Δλ < ±0.06 nm). c) Our signal demodulation method using a dilated CNN achieves extremely low RMS error even when the signals are highly overlapped, with a root-mean-square error in measured peak wavelength that remains consistently < 0.05 pm.


We denote the reflectivity of $\textrm{FB}{\textrm{G}_{ji}}$ as ${R_{ji}}$, and we consider random white Gaussian noise $N(\lambda )$ in the system. The total reflectivity ${R_{\textrm{tot}}}$ of the FBG sensing system, measured by the OSA, is given by

$${R_{\textrm{tot}}}(\lambda ) = \sum\nolimits_{i = 1}^m {\sum\nolimits_{j = 1}^n {{R_{ji}}(\lambda ,{\lambda _{\textrm{B, }ji}}) + N(\lambda )} } .$$
Equation (1) shows the relationship between the total reflectivity ${R_{\textrm{tot}}}(\lambda )$ and the Bragg wavelengths ${\lambda _{\textrm{B, }ji}}$. The ultimate goal in FBG signal demodulation is to retrieve the precise Bragg wavelengths ${\lambda _{\textrm{B, }ji}}$ from the total reflectivity. Previous numerical models to retrieve these wavelengths focus on the inverse of the reflected spectrum matrix ${R_{ji}}^{ - 1}$ [6,14]. Instead, we implement a dilated CNN to perform the signal demodulation, achieving an extremely low RMS error and a short testing time. Furthermore, by treating the signal demodulation problem as an image analysis problem, our algorithm can be applied to any wavelength range, not just the wavelength range in which we perform the model training.

Figure 1(c) shows that we achieve an RMS error of < 0.05 pm, even for cases when the signals are highly overlapped. Our demodulation scheme has a demodulation time of 15 ms for the overlapped signals from two FBG sensor channels, making it feasible for real-time monitoring of physical conditions.

3. Simulation setup

The reflectivity of $\textrm{FB}{\textrm{G}_{ji}}$ can be approximated by a Gaussian distribution [16] and expressed as

$${R_{ji}}(\lambda ,{\lambda _{\textrm{B, }ji}}) = {I_{\textrm{peak}}}\exp \left[ { - 4\ln 2{{\left( {\frac{{\lambda - {\lambda_{\textrm{B, }ji}}}}{{\varDelta {\lambda_{\textrm{B, }ji}}}}} \right)}^2}} \right],$$
where ${I_{\textrm{peak}}}$ and $\varDelta {\lambda _{\textrm{B, }ji}}$ are the peak reflectivity and full width at half maximum (FWHM) of $\textrm{FB}{\textrm{G}_{ji}}$, respectively. The sensor network can have multiple channels as shown in Fig. 1(a); we focus on two channels in this work. The sensor network data for model training is generated through the theoretical model of Eqs. (1) and (2), with ${I_{\textrm{peak}}}$ of 0.8 and 0.7 for the first and second channel, respectively. We also set $\varDelta {\lambda _{\textrm{B,}1i}}$ = 0.2 nm and $\varDelta {\lambda _{\textrm{B,}2i}}$ = 0.3 nm for all i.
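Below is a minimal sketch (not the authors' code) of how training spectra could be generated from Eqs. (1) and (2) for this two-channel configuration; the wavelength window, grid spacing and helper names are illustrative assumptions.

```python
import numpy as np

def fbg_reflectivity(lam, lam_b, i_peak, fwhm):
    """Gaussian FBG reflection spectrum, Eq. (2)."""
    return i_peak * np.exp(-4 * np.log(2) * ((lam - lam_b) / fwhm) ** 2)

# 2 nm window sampled with 2000 points (approximately 1 pm spacing)
lam = np.linspace(1550.0, 1552.0, 2000)

# Channel parameters from the text: I_peak = 0.8 / 0.7, FWHM = 0.2 / 0.3 nm
i_peak = [0.8, 0.7]
fwhm = [0.2, 0.3]

def make_sample(lam_b1, lam_b2):
    """Noiseless total reflectivity (Eq. (1)) and the two target spectra."""
    r1 = fbg_reflectivity(lam, lam_b1, i_peak[0], fwhm[0])
    r2 = fbg_reflectivity(lam, lam_b2, i_peak[1], fwhm[1])
    return r1 + r2, (r1, r2)

# Example: a highly overlapped pair with a peak separation of 0.02 nm
total, targets = make_sample(1551.00, 1551.02)
```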

The simulation data without noise was fed into the dilated CNN as training data. It is not practical to use experimental data for model training, as deep learning requires massive amounts of training data to produce high-accuracy models; 70,000 training data sets were used for the proposed model (more details of the dilated CNN are discussed in Section 5).

In an actual scenario, noise is unavoidable and will increase the likelihood of inaccurate measurements. To test the robustness of our signal demodulation method, white Gaussian noise with different signal-to-noise ratios (SNR) was added to the original spectra to generate testing data. The SNR is given by

$$SN{R_{\textrm{dB}}} = 10\log \frac{{{P_{\textrm{signal}}}}}{{{P_{\textrm{noise}}}}},$$
where ${P_{\textrm{signal}}}$ is set to 5 mW, a typical broadband source output power. The noise masks the actual location of the peak, making the Bragg wavelength challenging to determine with high precision, as one may see from the inset in Fig. 2(a). The same effect occurs, albeit to a smaller extent, at the higher SNR of 35 dB shown in Fig. 2(b). Besides performing peak detection using the noisy data directly, filters can be added to smooth the data before the peak detection process. In Fig. 2(c) we present the RMS error for a range of SNRs, comparing the errors when the noisy data is used directly and when the noisy data is first smoothed with a Gaussian-smoothing filter. It is noteworthy that the RMS error is < 0.47 pm, even with an SNR as small as 15 dB.
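The following sketch (an assumption of one plausible implementation, not the authors' code) illustrates how white Gaussian noise at a target SNR, as in Eq. (3), can be added to a spectrum and then smoothed with a Gaussian filter before peak detection; the mapping between source power and reflectivity units and the filter width are illustrative choices.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def add_noise(spectrum, snr_db, p_signal=5e-3):
    """Add zero-mean white Gaussian noise whose power satisfies Eq. (3)."""
    p_noise = p_signal / 10 ** (snr_db / 10)          # noise power for the target SNR
    return spectrum + np.random.normal(0.0, np.sqrt(p_noise), size=spectrum.shape)

lam = np.linspace(1550.0, 1552.0, 2000)               # wavelength grid (nm)
clean = 0.8 * np.exp(-4 * np.log(2) * ((lam - 1551.0) / 0.2) ** 2)  # single FBG peak
noisy = add_noise(clean, snr_db=15)                   # SNR of 15 dB, as in Fig. 2(a)
smoothed = gaussian_filter1d(noisy, sigma=5)          # Gaussian-smoothing filter
```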


Fig. 2. Impact of noise on signal demodulation accuracies, showing the effectiveness of a Gaussian-smoothing filter to achieve extremely low RMS errors even in the presence of substantial noise. a) shows the spectrum of a signal with a signal-to-noise ratio (SNR) of 15 dB. b) shows the spectrum of a signal with an SNR of 35 dB. c) shows that the RMS errors increase considerably when the spectra SNR decreases if we directly demodulate the noisy signals with our dilated CNN. When a Gaussian-smoothing filter is applied, the RMS errors drop substantially, achieving an RMS error of < 0.47 pm even when the SNR is 15 dB. These results show the robustness of our dilated CNN method in the presence of noise, and also the effectiveness of a Gaussian-smoothing filter even under relatively high SNR conditions.


4. Results

The RMS error (RMSE), which we use to evaluate the effectiveness of our signal demodulation method, is defined as

$$\textrm{RMSE} = \sqrt{\frac{\sum\nolimits_{i = 1}^m \sum\nolimits_{j = 1}^n {({\lambda _{\textrm{B, }ji}} - {y_{ji}})^2}}{n \times m}},$$
where ${y_{ji}}$ is the predicted Bragg wavelength for $\textrm{FB}{\textrm{G}_{ji}}$, n is the number of channels (equivalently, the number of FBGs in Gp i), and m is the number of FBGs in each channel.
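As a small illustration (not the authors' code), the metric of Eq. (4) can be computed as follows, with the true and predicted Bragg wavelengths arranged as arrays indexed by channel j and grating position i.

```python
import numpy as np

def rmse(lambda_true, lambda_pred):
    """Root-mean-square error over all n x m gratings, Eq. (4)."""
    diff = np.asarray(lambda_true) - np.asarray(lambda_pred)
    return np.sqrt(np.mean(diff ** 2))

# Example: two channels, one group; wavelengths in nm
print(rmse([[1551.000], [1551.020]],
           [[1551.00001], [1551.01997]]))   # ~2.2e-5 nm, i.e. ~0.022 pm
```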

We compare the RMSE of the dilated CNN demodulation method against that of other schemes, including the standard CNN, long short-term memory (LSTM) and hybrid CNN-LSTM, using the same training and testing data sets. As shown in Fig. 3(a), the dilated CNN converges after 75 epochs with a root-mean-square validation error (RMSVE) of 0.16 pm, whereas the LSTM took twice the number of epochs, converging only after 150 epochs with an RMSVE of 0.21 pm. The hybrid CNN-LSTM and standard CNN converge with RMSVEs of 0.24 pm and 0.43 pm, respectively, as shown in Fig. 3(c). We see that the proposed dilated CNN outperforms the other three methods in terms of RMSVE. The small variations between the root-mean-square training error (RMSTE) and the RMSVE show that the model is neither overfitting nor underfitting. A model with a lower RMSVE will have better performance during peak detection, and hence a lower RMSE is achieved. Further analysis of the dilated CNN was carried out, and the relationship between RMSE, number of dilated layers and testing time is shown in Fig. 3(d). As expected, the testing time increases roughly in proportion to the number of dilated CNN layers. At the same time, the RMSE decreases with increasing layers, with the sharpest drop in RMSE between 40 and 60 CNN layers. By varying the number of CNN layers, the demodulation method can readily be tuned to meet detection accuracy and detection time requirements.


Fig. 3. Performance of the proposed dilated CNN (where DCNN is dilated CNN, Hybrid is hybrid CNN-LSTM, T is training, V is validation, RMSTE is root-mean-square training error, and RMSVE is root-mean-square validation error). a) RMSTE and RMSVE plot; the RMSVE is about the same as the RMSTE, implying that the model training was conducted using suitable learning rates and batch sizes. b) The difference between the RMSTE and RMSVE for the dilated CNN is consistently within 0.05 pm. c) The RMSVEs for the dilated CNN, LSTM, hybrid CNN-LSTM, and standard CNN are 0.16 pm, 0.21 pm, 0.24 pm, and 0.43 pm, respectively. d) Relationship between RMSE, number of layers and testing time. The error falls dramatically, especially between 40 and 60 CNN layers, while the testing time increases proportionally to the number of CNN layers.


As shown in Fig. 4(a), we have conducted peak detection tests for signals composed of peaks separated by Δλ ranging from ±0 nm to ±0.45 nm, using several deep learning methods. We see that the proposed dilated CNN has the best performance in terms of RMS error, with RMSEs below 0.06 pm even when Δλ is very small (high overlap). To study the difference in performance between the dilated CNN and the standard CNN, we set up a standard CNN and keep all the training parameters the same except for the dilation rate. The performance of the standard CNN is inconsistent, as seen in Fig. 4(a), with RMSEs as high as 0.8 pm. In addition, we have implemented a standard LSTM with two layers, whose RMSEs remain consistently around 0.12 pm for all overlaps, which is twice as large as the RMSE obtained with the dilated CNN. As a hybrid CNN-LSTM is believed to combine the benefits of both the CNN and the LSTM, we implemented a hybrid CNN-LSTM consisting of 70 layers of CNN and 2 layers of LSTM. The CNN-LSTM method gives an RMSE as high as 0.9 pm. The standard CNN and hybrid CNN-LSTM do not perform as well as the other models, possibly because the features in the total signal are correlated over a relatively large spectral scale, and a non-dilated convolution kernel cannot span the scale needed to capture these correlations. Dilation allows the CNN to increase its receptive field without introducing extra parameters, reducing the RMSE significantly.


Fig. 4. Root-mean-square error (RMSE) for different peak separations Δλ, for different neural network algorithms. In a), we look at a wide range of Δλ, and in b), we zoom-in to the region where Δλ < ±0.06 nm. In all cases, we note the consistent superiority in RMSE of the dilated CNN over the other algorithms.


In Fig. 4(b), we zoom in on the regime of highly overlapped signals (Δλ < ±0.06 nm). In this regime, the RMSEs for the dilated CNN, LSTM and hybrid CNN-LSTM are relatively consistent, in contrast to the standard CNN, whose RMSEs vary from 0.5 pm to 2.5 pm for different Δλ. In the highly overlapped region, the dilated CNN still outperforms the other three methods with an average RMSE < 0.05 pm, versus average RMSEs of 0.1 pm and 1 pm for the LSTM and hybrid CNN-LSTM schemes, respectively.

In Table 1, we compare the dilated CNN with other models in terms of RMSE and testing time. All the models are trained and tested using the same data sets in the same computing environment, including CPU and GPU (see details in Section 5.4). The proposed dilated CNN model has an average RMSE of 0.0201 pm, which outperforms the other four models. The testing time for the proposed dilated CNN model is 15.17 ms for each test, which makes it promising for real-time monitoring applications. Furthermore, previous work that uses the LSTM for signal demodulation [6] has shown that the LSTM outperforms other algorithms like ELM [14] and LS-SVR [8]. The work on LS-SVR [8] showed that LS-SVR outperforms other evolutionary algorithms (EAs) like GA [12], the dynamic multi-swarm particle swarm optimizer (DMS-PSO) [11], tree search DMS-PSO (TS-DMS-PSO) [9] and DE [13]. This suggests that our dilated CNN signal demodulation scheme outperforms many algorithms even beyond those listed in Table 1, especially in its ability to maintain a consistently low RMSE even for highly overlapped signals.


Table 1. Comparison of different algorithms

To evaluate the performance of our demodulation scheme at lower detection resolution, we have set up our training and testing model using a 10 pm resolution. Such a study is meaningful, for instance, in evaluating the potential of our scheme for use with interrogators, which have an 8 pm resolution limit even for dense interrogators (Micron Optics, BraggSense, Luna OVA), and even coarser resolutions in other cases [17]. We also include random variations of the peak reflectivity and FWHM to capture the imperfections during FBG fabrication that might alter the shape of the reflected signal; a random distribution within a set margin is therefore applied to the peak reflectivity and FWHM. For the two-channel system, we set ${I_{\textrm{peak}}}$ to 0.8 ± 0.05 and 0.7 ± 0.05 for the first and second channel, respectively. In addition, $\varDelta {\lambda _{\textrm{B,}1i}}$ = 0.2 ± 0.05 nm and $\varDelta {\lambda _{\textrm{B,}2i}}$ = 0.3 ± 0.05 nm for all i. We have also scaled up to four- and eight-channel sensing systems to test the robustness of the model. For the four-channel system, the ${I_{\textrm{peak}}}$ values are 0.9, 0.8, 0.7 and 0.6 with a margin of ±0.05, and the $\varDelta {\lambda _{\textrm{B, }ji}}$ values are 0.1 nm, 0.2 nm, 0.3 nm and 0.4 nm with a margin of ±0.05 nm. For the eight-channel system, the ${I_{\textrm{peak}}}$ values are 0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6 and 0.55 with a margin of ±0.05, and the $\varDelta {\lambda _{\textrm{B, }ji}}$ values are 0.1 nm, 0.15 nm, 0.2 nm, 0.25 nm, 0.3 nm, 0.35 nm, 0.4 nm and 0.35 nm with a margin of ±0.05 nm. A detailed study on highly overlapped spectra is shown in Fig. 5. We obtained promising RMSEs of 0.0185 pm, 0.0444 pm and 0.0829 pm for the two-, four- and eight-channel systems, respectively. Moreover, the model achieved relatively short testing times of 13.0 ms, 13.9 ms and 18.7 ms for sensing systems with two, four and eight channels, respectively. As expected, the average RMSE increases as the number of channels increases, but the increase is approximately linear and remains << 1 pm in all cases considered. Our study thus reveals the potential of our demodulation scheme to handle large numbers of FBG sensing channels.
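A short sketch of how the fabrication-tolerance margins described above might be sampled when generating training spectra is given below; uniform sampling within the stated ±0.05 margins is our assumption, since the text does not specify the distribution.

```python
import numpy as np

def sample_channel_params(i_peak_nominal, fwhm_nominal, margin_i=0.05, margin_f=0.05):
    """Draw per-channel peak reflectivity and FWHM within the given margins."""
    i_peak = [np.random.uniform(p - margin_i, p + margin_i) for p in i_peak_nominal]
    fwhm = [np.random.uniform(f - margin_f, f + margin_f) for f in fwhm_nominal]
    return i_peak, fwhm

# Four-channel configuration from the text
i_peak4, fwhm4 = sample_channel_params([0.9, 0.8, 0.7, 0.6],
                                       [0.1, 0.2, 0.3, 0.4])
```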


Fig. 5. Root-mean-square error (RMSE) for different peak separations Δλ for two-, four- and eight-channel systems. The dotted lines represent the average RMSE for each system, with black, red, and blue corresponding to two, four and eight channels, respectively. As expected, the average RMSE increases as the number of channels increases, but the increase is approximately linear and remains << 1 pm in all cases considered.


FBG interrogators are usually used in industrial implementations of quasi-distributed FBG sensor network systems, and popular interrogators are based on either a white-light setup or a scanning laser system [18]. A scanning laser system uses a narrowband laser that rapidly scans through a wavelength range. The light is multiplexed into different channels through a splitter, and each FBG array fiber is connected to a photodetector through a coupler. Therefore, the proposed deep learning model is not required, as there is no signal overlap across different channels. The white-light setup, on the other hand, is similar to the setup shown in Fig. 1(a), except that a spectrometer is used instead of an OSA. The reflected FBG spectra from different channels are overlapped together, and hence the proposed deep learning model could be implemented in the white-light setup. Spectrometers generally have a coarser wavelength resolution than an OSA. Although this work mainly focuses on a 1 pm wavelength resolution, we have also shown that the proposed dilated CNN gives promising results even with a 10 pm resolution, as shown in Fig. 5.

An alternative FBG sensing network design to that shown in Fig. 1(a) could be implemented as shown in Fig. 6. This alternative design allows reflected FBG spectra to overlap within the same fiber, and eliminates the possibility of overlapping spectra across different channels. The advantage of the design in Fig. 6 is that it allows more FBGs to be inscribed in the same fiber, and a peak detection model built on this design works well with the overlapping spectra in the same fiber. Hence, the deep learning model can be implemented, for instance, using interrogators based on a scanning laser system. However, it should be noted that overlapping spectra within the same fiber increase the probability of crosstalk, reducing peak detection accuracies.


Fig. 6. An alternative design of the FBG sensing network in which overlapping spectra are allowed within the same channel.


5. Implementation of dilated convolutional neural network

The problem of separating multiple overlapping signals can be seen as a regression problem, and sequence modelling is often used to deal with it. Studies have shown that a simple convolutional architecture is preferable to a recurrent architecture in most basic sequence modelling tasks [19]. CNNs have achieved state-of-the-art performance in various fields like speech recognition and image segmentation [20].

A convolutional neural network uses convolution operations in place of general matrix multiplication in at least one of its layers [21]. A CNN consists of input, output and hidden layers. Within its hidden layers, the CNN contains convolutional, pooling, fully connected and normalization layers. Since the overlapping spectrum problem is a regression problem that can be handled through sequence modelling, a CNN can effectively extract complex features from the input signal (FBG signals in our case). The proposed CNN model deploys convolutional layers with the leaky rectified linear unit (ReLU) activation function and batch normalization.

5.1 Convolutional layer

The convolutional layer is the primary building block of the CNN, and contains sets of learnable filters/kernels. Parameters such as the width, height and number of kernels can be adjusted. Data is convolved in this layer and passed to the next layer. A convolution operation is illustrated in Fig. 7(a), where a 4×4 input image matrix is convolved with a 2×2 kernel (with a dilation rate of 2) to form a 2×2 feature map. The final output of a convolutional layer is formed by stacking all the feature maps together. Higher performance can potentially be achieved with a larger number of kernels, at the risk of overfitting due to an increase in the number of fitting parameters.
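As a brief illustration (the channel counts and kernel size here are arbitrary assumptions, not the paper's settings), a 1D convolution with a dilation rate of 2 can be set up in PyTorch as follows.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 2000)                     # (batch, channels, spectrum length)
conv = nn.Conv1d(in_channels=1, out_channels=8,
                 kernel_size=3, dilation=2, padding=2)  # dilation inserts gaps in the kernel
y = conv(x)
print(y.shape)                                  # torch.Size([1, 8, 2000])
```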


Fig. 7. Illustration of the convolution process taking place in a CNN, and the effects of different kernel dilations. (a) The 4×4 input image matrix is convolved with a 2×2 kernel to produce a 2×2 feature map. The dilation rate used is 2, with zeros added in the dilation process. (b) Standard CNN using a non-dilated kernel (dilation rate of 1), yielding a receptive field of 7 at the output layer. (c) In contrast, a dilated CNN using dilated kernels whose dilation rate is doubled every layer achieves a receptive field of 15 at the output layer. The receptive field can expand exponentially using dilated kernels without introducing additional parameters.


In a CNN, the receptive field refers to the region of the previous layer that the current layer depends on. Increasing the receptive field allows the current layer to access a broader region of the previous layer, which helps to ensure that no crucial information is missed during the convolution process. The receptive field can be increased by stacking more layers or by implementing dilated kernels. Increasing the number of layers yields a model with better performance at the cost of longer training and testing times. Recently, the dilated CNN has been proposed for image segmentation [22] and for unmanned aerial vehicle noise reduction with low signal-to-noise-ratio input [23]. These works showed that the dilated CNN can cover a larger receptive field without increasing the memory cost or decreasing the resolution. Illustrations of the non-dilated CNN and the dilated CNN are shown in Fig. 7(b) and (c), respectively. The dilation rate is 1 in the non-dilated kernel, and we have two hidden layers in the example. After the convolution operations, the output of the non-dilated CNN has a receptive field of 7, as shown in Fig. 7(b). In the dilated CNN example, the dilation rate is doubled every layer, and the output has a receptive field of 15, as shown in Fig. 7(c). In the dilated CNN, the receptive field is increased without additional parameters, implying that fewer layers are needed to achieve the same receptive field compared to the non-dilated case.
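The receptive-field growth can be checked with a small calculation; assuming a kernel size of 3, the generic formula below reproduces the receptive fields of 7 and 15 quoted for the stacks in Fig. 7(b) and (c).

```python
def receptive_field(dilations, kernel_size=3):
    """Receptive field of a stack of convolutions with the given dilation rates."""
    rf = 1
    for d in dilations:
        rf += d * (kernel_size - 1)   # each layer widens the field by d * (k - 1)
    return rf

print(receptive_field([1, 1, 1]))     # non-dilated stack, Fig. 7(b) -> 7
print(receptive_field([1, 2, 4]))     # dilation doubled every layer, Fig. 7(c) -> 15
```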

5.2 Batch normalization

The feature map data is normalized by batch normalization before being passed to the activation function. Normalization is performed by subtracting the batch mean and dividing by the batch standard deviation, after which a learnable affine transformation is applied [24]. Batch normalization has been empirically shown to speed up the training process and to improve model performance and stability [25]. Without normalization, a larger learning rate can increase the scale of the layer parameters, so that the gradients in backpropagation are scaled up, leading to an exploding gradient problem. With batch normalization, however, the scale of the gradient is unaffected, as batch normalization takes care of the scale of the parameters [25]. Batch normalization therefore allows a larger learning rate to be used, which increases the training speed.
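The normalization step itself is simple to write out; the sketch below (illustrative values only) normalizes a batch of feature maps with the per-channel batch statistics and then applies the affine transformation.

```python
import numpy as np

x = np.random.randn(32, 8, 2000)                  # (batch, channels, feature-map length)
mean = x.mean(axis=(0, 2), keepdims=True)         # per-channel batch mean
std = x.std(axis=(0, 2), keepdims=True)           # per-channel batch standard deviation
gamma, beta = 1.0, 0.0                            # learnable affine parameters in practice
x_hat = gamma * (x - mean) / (std + 1e-5) + beta  # normalized feature map
```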

5.3 Activation function

The result of the convolution in each convolution layer is passed through an activation function to obtain the final feature map. A common activation function used in CNNs is the rectified linear unit (ReLU). As shown in Fig. 8(a), a ReLU is a linear function for all positive inputs and zero for all negative inputs. Since there is no complex mathematical operation in a ReLU, it has a shorter training time, and the linearity in the positive region has been shown to give faster convergence [26]. However, the ReLU has the “dead neuron” problem: negative inputs always lead to a zero output, which could result in a large number of inactivated neurons in the network. The Leaky ReLU [27], shown in Fig. 8(b), addresses this problem by having a linear function with a different slope (with a value like 0.01) for negative inputs, thus avoiding the “dead neuron” problem.
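The difference between the two activations is easy to see numerically; the short sketch below (with the commonly used negative slope of 0.01) shows how the Leaky ReLU keeps negative inputs from being zeroed out.

```python
import torch
import torch.nn as nn

x = torch.tensor([-2.0, -0.5, 0.0, 1.5])
print(nn.ReLU()(x))              # tensor([0.0000, 0.0000, 0.0000, 1.5000])
print(nn.LeakyReLU(0.01)(x))     # tensor([-0.0200, -0.0050, 0.0000, 1.5000])
```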


Fig. 8. Activation functions (f(x) is the output after the activation function) for neural networks. (a) A rectified linear unit (ReLU) has an output of zero for all negative inputs (the input being denoted x here). This leads to a “dead neuron” problem, where large portions of the network could be made up of inactivated neurons. (b) The Leaky ReLU is similar to the ReLU except for a small gradient in the negative region, avoiding the “dead neuron” problem, since negative inputs will also activate the neurons. The different gradients in the positive and negative input regions ensure the nonlinearity of the system.


5.4 Dilated CNN model training and testing

In this work, we trained a dilated CNN model for FBG signal demodulation in the two-channel scenario. For the model training, we used an eight-core Intel Xeon E5-1680v4 CPU and an NVIDIA GTX 1080 Ti GPU with 11 GB of GDDR5X memory. Since the overlapping signals are treated as pixels with different weights, matrix indices are used to represent the wavelength positions. The training data consists only of spectral intensities and matrix indices. The training data resolution is 1 pm, giving 2000 data points within the wavelength range of 2 nm. The input of the model is the total signal, while the outputs are the two separated FBG signals. At each convolution layer, the convolution operation, batch normalization and the leaky ReLU are applied. The gradients govern how quickly the network learns, and training slows down if the gradients are small. A vanishing gradient occurs when the gradients become very small or even zero, and this problem becomes significant as the number of dilated CNN layers increases. Residual connections between layers are used to reduce the vanishing gradient problem [28]. To improve the accuracy and speed of the model, we used the Adam stochastic optimization algorithm [29] to reduce the output losses. The overall CNN model was trained and tested using PyTorch 1.2.
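The sketch below puts these ingredients together: dilated 1D convolutions with batch normalization, Leaky ReLU activations, residual connections and the Adam optimizer. It is our assumption of one plausible architecture rather than the authors' exact model; the channel width, number of blocks and dilation schedule are illustrative.

```python
import torch
import torch.nn as nn

class DilatedBlock(nn.Module):
    def __init__(self, channels, dilation):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size=3,
                              dilation=dilation, padding=dilation)
        self.bn = nn.BatchNorm1d(channels)
        self.act = nn.LeakyReLU(0.01)

    def forward(self, x):
        return x + self.act(self.bn(self.conv(x)))     # residual connection

class DilatedCNN(nn.Module):
    def __init__(self, channels=32, n_blocks=8):
        super().__init__()
        self.stem = nn.Conv1d(1, channels, kernel_size=3, padding=1)
        self.blocks = nn.Sequential(*[DilatedBlock(channels, 2 ** (i % 6))
                                      for i in range(n_blocks)])
        self.head = nn.Conv1d(channels, 2, kernel_size=1)  # two separated spectra

    def forward(self, x):                                  # x: (batch, 1, 2000)
        return self.head(self.blocks(self.stem(x)))

model = DilatedCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=2.5e-4)  # learning rate from the text
loss_fn = nn.MSELoss()

x = torch.randn(32, 1, 2000)       # batch of total reflectivity spectra
y = torch.randn(32, 2, 2000)       # target separated FBG spectra
loss = loss_fn(model(x), y)        # one optimization step
optimizer.zero_grad()
loss.backward()
optimizer.step()
```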

To optimize our model, parameters such as the amount of training data and the number of layers were analysed during the model training process. The learning rate and batch size do affect the model performance, but not as significantly as the training data and the number of layers. We set our learning rate and batch size to 0.00025 and 32, respectively. As shown in Fig. 3(a), the RMSTE and RMSVE decrease significantly in the first 75 epochs and then slowly approach convergence. The learning rate and batch size were well chosen, as the difference between the RMSTE and RMSVE remains consistently below 0.05 pm, as illustrated in Fig. 3(b). As shown in Fig. 3(d), the testing time increases and the RMSE decreases as the number of dilated CNN layers increases. In our proposed dilated CNN, we find that the optimal number of epochs and number of layers are 300 and 80, respectively.

6. Conclusion

In this paper, a dilated convolutional neural network has been implemented for demodulating signals from a quasi-distributed FBG sensor network. We achieve extremely low root-mean-square errors in peak wavelength detection of < 0.06 pm, even in cases of extremely high signal overlap (peak separation Δλ < ±0.06 nm), with fast detection times of 15.17 ms per detection. The performance of our dilated CNN remains robust even in the presence of noise, with RMS errors of < 0.47 pm when the signal-to-noise ratio is 15 dB. By treating the signal demodulation problem as an image analysis problem in our implementation, our method can be applied to any wavelength range, not just the wavelength range in which we perform the model training. We have performed simulations to show that our dilated CNN model outperforms, in terms of RMSE, other implementations such as those based on the LSTM, standard CNN, hybrid CNN-LSTM and least squares methods. We also showed that our scheme can be robustly scaled up to a larger number of channels, achieving RMSEs of 0.10 pm (testing time 13.9 ms) and 0.12 pm (testing time 18.7 ms) for a four- and eight-channel system, respectively. Therefore, the proposed dilated CNN model is a promising candidate for enhancing the multiplexing capability and detection accuracy of quasi-distributed FBG sensor networks.

Funding

Nanyang Technological University (NSFC (11774102)).

Acknowledgements

The computations were performed using the GPU server in the Machine Learning and Data Analytics Lab at Nanyang Technological University School of Electrical and Electronic Engineering. LJW acknowledges support from the Nanyang Assistant Professorship Start-up grant.

Disclosures

The authors declare that there are no conflicts of interest related to this article.

References

1. Y. N. Zhang, Y. Zhao, T. Zhou, and Q. Wu, “Applications and developments of on-chip biochemical sensors based on optofluidic photonic crystal cavities,” Lab. Chip 18(1), 57–74 (2018). [CrossRef]  

2. Y. Zheng, Z. Wu, P. P. Shum, Z. Xu, G. Keiser, G. Humbert, H. Zhang, S. Zeng, and X. Q. Dinh, “Sensing and lasing applications of whispering gallery mode microresonators,” Opto-Electron. Adv. 1(9), 18001501–18001510 (2018). [CrossRef]  

3. X. Yu, X. Dong, X. Chen, C. Tian, and S. Liu, “Large-scale multilongitudinal mode fiber laser sensor array with wavelength/frequency division multiplexing,” J. Lightwave Technol. 35(11), 2299–2305 (2017). [CrossRef]  

4. G. Keiser, Optical Fiber Communications (McGraw-Hill, 2010).

5. H. E. Joe, H. Yun, S. H. Jo, M. B. Jun, and B. K. Min, “A review on optical fiber sensors for environmental monitoring,” Int. J. Precis. Eng. Manuf. Green Technol. 5(1), 173–191 (2018). [CrossRef]  

6. H. Jiang, Q. Zeng, J. Chen, X. Qiu, X. Liu, Z. Chen, and X. Miao, “Wavelength detection of model-sharing fiber Bragg grating sensor networks using long short-term memory neural network,” Opt. Express 27(15), 20583–20596 (2019). [CrossRef]  

7. H. Zhang, Z. Wu, P. P. Shum, R. Wang, X. Q. Dinh, S. Fu, W. Tong, and M. Tang, “Fiber Bragg gratings in heterogeneous multicore fiber for directional bending sensing,” J. Opt. 18(8), 085705 (2016). [CrossRef]  

8. J. Chen, H. Jiang, T. Liu, and X. Fu, “Wavelength detection in FBG sensor networks using least squares support vector regression,” J. Opt. 16(4), 045402 (2014). [CrossRef]  

9. J. Liang, P. Suganthan, C. Chan, and V. Huang, “Wavelength detection in FBG sensor network using tree search DMS-PSO,” IEEE Photonics Technol. Lett. 18(12), 1305–1307 (2006). [CrossRef]  

10. Y. Qi, C. Li, P. Jiang, C. Jia, Y. Liu, and Q. Zhang, “Research on demodulation of FBGs sensor network based on PSO-SA algorithm,” Optik 164, 647–653 (2018). [CrossRef]  

11. J. Liang, C. Chan, V. Huang, and P. Suganthan, “Improving the performance of a FBG sensor network using a novel dynamic multi-swarm particle swarm optimizer,” Proc. SPIE 5998, 59980O (2005). [CrossRef]  

12. C. Shi, C. Chan, W. Jin, Y. Liao, Y. Zhou, and M. Demokan, “Improving the performance of a FBG sensor network using a genetic algorithm,” Sens. Actuators, A 107(1), 57–61 (2003). [CrossRef]  

13. D. Liu, K. Tang, Z. Yang, and D. Liu, “A fiber Bragg grating sensor network using an improved differential evolution algorithm,” IEEE Photonics Technol. Lett. 23(19), 1385–1387 (2011). [CrossRef]  

14. H. Jiang, J. Chen, and T. Liu, “Wavelength detection in spectrally overlapped FBG sensor network using extreme learning machine,” IEEE Photonics Technol. Lett. 26(20), 2031–2034 (2014). [CrossRef]  

15. Y. C. Manie, R.-K. Shiu, P.-C. Peng, B.-Y. Guo, M. A. Bitew, W.-C. Tang, and H.-K. Lu, “Intensity and wavelength division multiplexing FBG sensor system using a Raman amplifier and extreme learning machine,” J. Sens. 2018, 1–11 (2018). [CrossRef]  

16. C. Chan, W. Jin, and M. Demokan, “Enhancement of measurement accuracy in fiber Bragg grating sensors by using digital signal processing,” Opt. Laser Technol. 31(4), 299–307 (1999). [CrossRef]  

17. D. Tosi, “Review of chirped fiber bragg grating (CFBG) fiber-optic sensors and their applications,” Sensors 18(7), 2147 (2018). [CrossRef]  

18. D. Tosi, “Review and analysis of peak tracking techniques for fiber Bragg grating sensors,” Sensors 17(10), 2368 (2017). [CrossRef]  

19. S. Bai, J. Z. Kolter, and V. Koltun, “An empirical evaluation of generic convolutional and recurrent networks for sequence modeling,” arXiv preprint arXiv:1803.01271 (2018).

20. D. C. Cireşan, U. Meier, L. M. Gambardella, and J. Schmidhuber, “Deep, big, simple neural nets for handwritten digit recognition,” Neural Comput. 22(12), 3207–3220 (2010). [CrossRef]  

21. I. Goodfellow, Y. Bengio, and A. Courville, Deep learning, vol. 1 (MIT Cambridge, 2016).

22. F. Yu and V. Koltun, “Multi-scale context aggregation by dilated convolutions,” arXiv preprint arXiv:1511.07122 (2015).

23. Z.-W. Tan, A. H. Nguyen, and A. W. Khong, “An Efficient Dilated Convolutional Neural Network for UAV Noise Reduction at Low Input SNR,” in Proceedings of 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), (IEEE, 2019), pp. 1885–1892.

24. N. Bjorck, C. P. Gomes, B. Selman, and K. Q. Weinberger, “Understanding batch normalization,” in Proceedings of Advances in Neural Information Processing Systems, 7694–7705 (2018).

25. S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” arXiv preprint arXiv:1502.03167 (2015).

26. C. Nwankpa, W. Ijomah, A. Gachagan, and S. Marshall, “Activation functions: Comparison of trends in practice and research for deep learning,” arXiv preprint arXiv:1811.03378 (2018).

27. B. Xu, N. Wang, T. Chen, and M. Li, “Empirical evaluation of rectified activations in convolutional network,” arXiv preprint arXiv:1505.00853 (2015).

28. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of IEEE conference on computer vision and pattern recognition, (2016), pp. 770–778.

29. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980 (2014).
