Wenjie Du, David Côté, Chris Barber, and Yan Liu, "Forecasting loss of signal in optical networks with machine learning," J. Opt. Commun. Netw. 13, E109-E121 (2021)
Loss of signal (LOS) represents a significant cost for operators of optical networks. By studying large sets of real-world performance monitoring data collected from six international optical networks, we find that it is possible to forecast LOS events with good precision one to seven days before they occur, albeit at relatively low recall, with supervised machine learning (ML). Our study covers 12 facility types, including 100G lines and ETH10G clients. We show that the precision for a given network improves when training on multiple networks simultaneously relative to training on an individual network. Furthermore, we show that it is possible to forecast LOS from all facility types and all networks with a single model, whereas fine-tuning for a particular facility or network brings only modest improvements. Hence our ML models remain effective for optical networks previously unknown to the model, which makes them usable for commercial applications.
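The setup described in the abstract — forecast LOS one to seven days ahead from performance-monitoring (PM) counters, at good precision but low recall — can be sketched as a supervised classification problem. The paper's actual models are Random Forest, XGBoost, and BRITS; the sketch below substitutes scikit-learn's gradient boosting, and all data, counter names, and window sizes are invented for illustration.

```python
# Hypothetical sketch of LOS forecasting as supervised classification:
# features are a short window of PM-counter history per port, and the label
# is whether an LOS event occurs within the next `horizon` days.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

n_ports, n_days, horizon, window = 200, 60, 7, 5
# Synthetic daily PM counters per port (two invented counters).
pm = rng.normal(size=(n_ports, n_days, 2))
# Synthetic LOS events, made weakly predictable from the first counter.
risk = 1.0 / (1.0 + np.exp(-(pm[..., 0] - 2.0)))
los = rng.random((n_ports, n_days)) < risk

X, y = [], []
for p in range(n_ports):
    for t in range(window, n_days - horizon):
        X.append(pm[p, t - window:t].ravel())      # flattened PM history
        y.append(los[p, t:t + horizon].any())      # LOS within next 7 days?
X, y = np.array(X), np.array(y, dtype=int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)
clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

# Operators care about precision at a low-recall operating point: rank ports
# by predicted risk and inspect only the highest-scoring alerts.
scores = clf.predict_proba(X_te)[:, 1]
top = np.argsort(scores)[-100:]
print("precision among top-100 alerts:", y_te[top].mean())
```

The ranking step at the end mirrors why the paper evaluates at low recall: a maintenance team can only act on a small number of alerts, so precision among the highest-risk predictions is what matters operationally.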
Table 1. Details of the Six Network Datasets Used in This Paper, Sorted by Positive Sample Rate in Ascending Order from Left to Right^a

                          Network1   Network2   Network3   Network4   Network5   Network6   Mega-Dataset
Number of protocol types         6          9          8          6          8          8              9
Time range (days)              139        222        204        299        301        208           1373
Number of ports               3000     19,000       5000     35,000     16,000     16,000         94,000
Number of samples          394,158  3,540,472    762,268  6,140,585  3,140,132  2,478,716     16,456,331
Number of features              76         83        113         72        115         91            125
Missing rate                 75.2%      72.4%      79.2%      70.1%      77.6%      79.8%          82.2%
Positive sample rate          4.3%       5.7%       5.9%       8.6%      15.8%      17.0%          10.4%

^a Each dataset includes all 12 facility types of layer-1 and layer-2 ports. The rightmost column summarizes the "mega-dataset," a combination of all six network datasets. The mega-dataset has the largest missing rate because some networks record features that others do not, which leaves nearly empty columns when the network datasets are merged.
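The footnote's point about the mega-dataset's missing rate can be seen in a toy merge: features present in one network but absent in another become nearly empty columns in the combined frame. The feature names below are invented.

```python
# Toy illustration of why merging per-network datasets inflates the missing
# rate: an outer concat aligns on the union of columns, so network-specific
# features turn into mostly-NaN columns in the "mega-dataset".
import numpy as np
import pandas as pd

net1 = pd.DataFrame({"port": ["a", "b"], "OPR-AVG": [1.0, 2.0], "CV-OTU": [3.0, 4.0]})
net2 = pd.DataFrame({"port": ["c", "d"], "OPR-AVG": [5.0, 6.0], "ES-ETH": [7.0, 8.0]})

mega = pd.concat([net1, net2], ignore_index=True)  # outer join on columns

def missing_rate(df: pd.DataFrame) -> float:
    """Mean fraction of NaN entries across feature columns."""
    return float(df.drop(columns="port").isna().mean().mean())

print(missing_rate(net1), missing_rate(net2), missing_rate(mega))
```

Here both source frames are fully observed, yet the merged frame is one-third missing, purely because `CV-OTU` and `ES-ETH` each exist in only one network.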
Table 2. Performance Comparison across Models Trained on Single-Network Datasets^a

Model                              Network1  Network2  Network3  Network4  Network5  Network6  Weighted Avg.
Random Forest (zero imputation)       0.036     0.010     0.018     0.023     0.046     0.042          0.029
Random Forest (median imputation)     0.035     0.007     0.003     0.012     0.049     0.005          0.018
XGBoost (zero imputation)             0.034     0.023    *0.087     0.039     0.066     0.056          0.048
BRITS (zero imputation)              *0.050     0.026     0.078     0.039     0.065     0.053          0.048
XGBoost                               0.033     0.026    *0.087    *0.041     0.066    *0.060         *0.050
BRITS                                 0.044    *0.029    *0.087     0.040    *0.067     0.056         *0.050

^a The evaluation metric is PR-AUC to recall 0.1, defined in Section 4.A. The rightmost column shows each model's scores averaged over the six test datasets, weighted by the number of samples in each, as a measure of overall performance. The best result in each column is marked with *.
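The metric "PR-AUC to recall 0.1" is defined in the paper's Section 4.A, which is not included here; a plausible reading is the area under the precision-recall curve restricted to recall values between 0 and 0.1, reflecting the low-recall operating regime. The sketch below implements that reading and should not be taken as the paper's exact definition.

```python
# Plausible sketch of "PR-AUC to recall 0.1": trapezoidal area under the
# precision-recall curve, integrated only over recall in [0, max_recall].
import numpy as np
from sklearn.metrics import precision_recall_curve

def pr_auc_to_recall(y_true, y_score, max_recall=0.1):
    precision, recall, _ = precision_recall_curve(y_true, y_score)
    order = np.argsort(recall)           # recall comes back in decreasing order
    r, p = recall[order], precision[order]
    keep = r <= max_recall               # restrict to the low-recall segment
    r, p = r[keep], p[keep]
    # Trapezoidal area under precision over the restricted recall range.
    return float(np.sum(np.diff(r) * (p[1:] + p[:-1]) / 2.0))

rng = np.random.default_rng(0)
y = (rng.random(1000) < 0.1).astype(int)     # ~10% positives, like Table 1
score = y * 0.5 + rng.random(1000) * 0.5     # informative but noisy scores
print(pr_auc_to_recall(y, score))
```

Because the integral only runs over a recall range of width 0.1, the metric is bounded above by 0.1, which matches the scale of the values reported in Tables 2-4.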
Table 3. Performance Comparison across Models Trained on Mega-Datasets^a

Model                               Network1  Network2  Network3  Network4  Network5  Network6  Weighted Avg.
Random Forest (zero imputation)        0.024     0.009     0.009     0.022     0.044     0.028          0.024
XGBoost                                0.058     0.023    *0.089    *0.040     0.064     0.055          0.048
BRITS (pre-trained only)               0.056     0.026     0.088     0.038     0.066     0.056          0.049
BRITS (fine-tune the classifier only) *0.060     0.026     0.085     0.038    *0.068     0.056          0.050
BRITS (fine-tune the entirety)         0.055    *0.030     0.087    *0.040    *0.068    *0.057         *0.052

^a The evaluation metric is PR-AUC to recall 0.1. The rightmost column shows each model's scores averaged over the six test datasets, weighted by the number of samples in each, as a measure of overall performance. The best result in each column is marked with *.
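The "Weighted Avg." columns of Tables 2 and 3 weight each per-network score by the size of that network's test set. The test-set sample counts are not listed in this excerpt, so the counts in the sketch below are hypothetical; only the averaging formula is being illustrated.

```python
# Sample-weighted average of per-network scores, as used for the
# "Weighted Avg." columns.  Scores are from Table 3 (BRITS, fine-tune the
# entirety); the test-set sizes are hypothetical stand-ins.
import numpy as np

scores = np.array([0.055, 0.030, 0.087, 0.040, 0.068, 0.057])  # Network1..6
n_test = np.array([40_000, 350_000, 76_000, 610_000, 310_000, 250_000])  # hypothetical

weighted_avg = float(np.average(scores, weights=n_test))
print(round(weighted_avg, 3))
```

With these invented weights the result lands near the smaller-score networks simply because they dominate the sample counts; the published weighted averages use the papers' actual test-set sizes.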
Table 4. Comparison of Model Performance on Two Important Use Cases^a

Model                                       100G OTN Line Cards   10G Ethernet Clients
XGBoost trained on 100G OTN line cards                    0.049                      /
XGBoost trained on 10G Ethernet clients                       /                  0.047
XGBoost trained on all 12 facility types                  0.049                  0.053
BRITS trained on 100G OTN line cards                      0.054                      /
BRITS trained on 10G Ethernet clients                         /                  0.050
BRITS trained on all 12 facility types                    0.055                  0.052

^a All models in this table are trained on the mega-dataset, but on different subsets of the 12 facility types. For example, the XGBoost model trained on 100G OTN line cards uses only samples collected from line-facing ports of 100G OTN line cards. The evaluation metric is PR-AUC to recall 0.1. Models trained on all 12 facility types perform comparably to models trained specifically on 100G lines or ETH10G clients.
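The subset-training setup behind Table 4 amounts to filtering the same training frame by facility type before fitting each model. The sketch below shows that filtering step only; the column name `facility_type`, the label column, and the facility labels are invented for illustration.

```python
# Sketch of the per-facility training subsets used in Table 4: the same
# mega-dataset is either filtered to one facility type or used whole.
import pandas as pd

mega = pd.DataFrame({
    "facility_type": ["OTM4", "ETH10G", "OTM4", "ETH10G", "OTM2"],
    "feature_1": [0.1, 0.2, 0.3, 0.4, 0.5],
    "los_within_7d": [0, 1, 0, 0, 1],
})

# Facility-specific subset (e.g. only 100G line-facing ports) ...
single_facility = mega[mega["facility_type"] == "OTM4"]
# ... versus the all-facility-types set used to train the single model.
all_facilities = mega

print(len(single_facility), len(all_facilities))
```

Table 4's takeaway is that the model trained on `all_facilities` loses essentially nothing relative to the facility-specific subsets, which is what makes a single shared model practical.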
Models presented in this table are all trained on the mega-dataset, but on different subsets of the 12 facilities. For example, XGBoost trained on 100G OTN line cards is trained on samples collected from only line-facing ports of 100G OTN line cards. The evaluation metric used is PR-AUC to recall 0.1. Models trained on 12 facility types obtain close performance with models specifically trained on 100G lines or ETH10G clients.