
Bayesian generative adversarial network emulator based end-to-end learning strategy of the probabilistic shaping for OAM mode division multiplexing IM/DD transmission


Abstract

Orbital angular momentum (OAM) mode division multiplexing (MDM) has emerged as a new multiplexing technology that can significantly increase transmission capacity. In addition, probabilistic shaping (PS) is a well-established technique that can push the transmission capacity of an optical fiber close to the Shannon limit. However, both mode coupling and nonlinear impairment lead to a considerable gap between the OAM-MDM channel and the conventional additive white Gaussian noise (AWGN) channel, meaning that existing PS technology is not suitable for an OAM-MDM intensity-modulation direct-detection (IM/DD) system. In this paper, we propose a Bayesian generative adversarial network (BGAN) emulator based end-to-end (E2E) learning strategy of PS for OAM-MDM IM/DD transmission with two modes. The weights and biases of the BGAN emulator are treated as probability distributions, which can be accurately matched to the stochastic nonlinear model of the OAM-MDM channel. The BGAN-emulator-based E2E learning strategy is then used to find the optimal probability distribution of PS for the OAM-MDM IM/DD system. An experiment was conducted with a 200 Gbit/s, two-OAM-mode carrierless amplitude phase-32 (CAP-32) signal transmitted over a 5 km ring-core fiber, and the results showed that the proposed BGAN emulator outperformed a conventional CGAN emulator, with improvements in modelling accuracy of 29.3% and 26.3% for the two OAM modes, respectively. Moreover, the generalized mutual information (GMI) of the proposed E2E learning strategy outperformed the conventional Maxwell-Boltzmann (MB) distribution and the CGAN emulator by 0.31 and 0.33 bits/symbol and by 0.16 and 0.2 bits/symbol for the two OAM modes, respectively. Our experimental results demonstrate that the proposed E2E learning strategy with the BGAN emulator is a promising candidate for OAM-MDM IM/DD optical fiber communication.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

In recent years, the explosive growth of global Internet traffic has imposed high-capacity requirements on IM/DD transmission for short-reach optical interconnections [1,2]. In order to overcome the capacity bottleneck of single-mode fiber (SMF), orbital angular momentum (OAM) mode division multiplexing (MDM) has been developed as a new multiplexing technology [3–7]. Typical schemes for OAM-MDM IM/DD transmission are based on ring-core fiber (RCF), and include a 400-Gbit/s OAM-MDM transmission scheme using PAM modulation, a 186.4-Gbit/s OAM-MDM transmission scheme with DMT modulation, and a 2.5-Gbaud 3D CAP-8 architecture for OAM-MDM transmission [8–10]. RCF-based OAM-MDM IM/DD communication systems have achieved ultra-high-capacity transmission when used in optical interconnections.

Probabilistic shaping (PS) is a well-established technique in optical communication that can raise the transmission capacity to close to the Shannon limit without increasing the transmission power or the complexity of the system [11]. However, the probability distribution in conventional PS technology always obeys a Maxwell-Boltzmann (MB) distribution, which is optimal for the additive white Gaussian noise (AWGN) channel of an SMF. The mode coupling between different OAM modes significantly affects the channel model of the OAM-MDM system, and in short-reach OAM-MDM IM/DD systems, nonlinear impairments are significantly increased by the insufficient bandwidth of electro-optical devices such as electrical amplifiers (EAs), electro-optic modulators, and spatial light modulators (SLMs). In particular, the nonlinear effects of the SLM may arise through several physical phenomena, such as self-phase modulation and memory effects, making it difficult to create an accurate model. Both the mode coupling and the nonlinear impairment lead to a considerable gap between the OAM-MDM channel and the conventional AWGN channel. Existing PS technology is therefore unsuitable for an OAM-MDM IM/DD system.

The end-to-end (E2E) learning strategy is a machine learning technique based on an autoencoder (AE) that provides a way of automatically tailoring signal modulation to arbitrary communication links [12]. Stark et al. were the first to present an E2E solution for achieving the optimal probability distribution of the AWGN channel [13]. Aref et al. proposed a novel AE-based learning of PS for coded-modulation systems [14]. An accurate channel model of a real optical fiber communication system is difficult to create, and deep-learning-based E2E frameworks are usually adopted to provide one, such as conditional generative adversarial networks (CGANs) [15,16], deep neural networks (DNNs) [17] or heterogeneous neural networks [18,19]. However, the channel seen by the test signal changes greatly due to the random mode coupling in time-variant OAM-MDM transmission. Channel modelling strategies based on conventional machine learning assign weight coefficients with fixed values based on the training signal, and as a result, it is difficult to use these strategies to build a stochastic channel model for an OAM-MDM IM/DD system.

In this paper, we propose a scheme based on a Bayesian generative adversarial network (BGAN) and E2E learning for PS in an OAM-MDM IM/DD system. First, our BGAN-based E2E learning scheme can accurately fit a stochastic nonlinear model of an OAM-MDM system by treating the weights and biases as probability distributions. Second, we use the BGAN-based E2E strategy to learn the optimal probability distribution for the carrierless amplitude phase (CAP) modulation format. An experiment is carried out to verify the effectiveness of the proposed method, and the results demonstrate that the proposed strategy can be used to build an accurate channel model of the OAM-MDM system, and significantly improves the performance of PS in a time-variant OAM-MDM IM/DD system compared with the conventional MB distribution.

2. End-to-end deep learning of probabilistic shaping for OAM-MDM system using BGAN emulator

In traditional SMF optical communication systems, PS is commonly implemented using an MB distribution, which is optimal for the AWGN channel. However, in an OAM-MDM system, there is a significant gap between the OAM-MDM channel and the AWGN channel. As a result, existing PS techniques are not suitable for an OAM-MDM system. In order to achieve an accurate channel response for an OAM-MDM system, a novel E2E learning strategy based on a BGAN emulator is proposed for OAM-MDM transmission with CAP-32 modulation. The E2E learning scheme considered in this paper is shown in Fig. 1.

Fig. 1. Diagram showing the proposed E2E learning strategy based on a BGAN emulator for an OAM-MDM system: (a) actual OAM-MDM system; (b) proposed OAM-MDM system based on AE.

Figure 1(a) shows an OAM-MDM system with CAP-32 modulation. In the proposed E2E learning scheme, the whole OAM-MDM system is implemented as an AE, comprising the transmitter, a channel based on a BGAN emulator, and the receiver, which enables joint training of the transmitter and receiver. The AE encoder learns an optimal probability distribution ${P_M}$. The network parameters of the BGAN emulator are treated as probability distributions rather than fixed constants, meaning that the BGAN emulator can accurately model the channel response of the OAM-MDM system even though the received signal is subject to strong stochastic nonlinear impairment. Then, as shown in Fig. 1(b), PS based on the BGAN emulator is achieved through E2E learning, thereby enhancing the performance of the OAM-MDM system.

2.1. BGAN-based channel modelling strategy for an OAM-MDM system

For a given transmitted signal in the OAM-MDM system, the received signal can be mathematically represented as

$$y(n) = H(x(n)) + noise(n), $$
where n denotes the order of the CAP-32 symbol. $x(n) = [{x_1},{x_2}, \cdots ,{x_n}],\;{x_n} \in ( - 1,1)$ denotes the normalised transmitted CAP-32 symbol sequence that has been upsampled with a factor of $m$. Hence, each symbol ${x_i}\;(i = 1,2, \ldots ,n)$ consists of m samples, and can be expressed as ${x_i} = [x_i^1,x_i^2, \cdots ,x_i^m]$. $noise(n)$ denotes the additive noise generated in the high-speed OAM-MDM system. $y(n) = [{y_1},{y_2}, \cdots ,{y_n}]$ represents the received CAP-32 symbol sequence, where each symbol ${y_i}\;(i = 1,2, \ldots ,n)$ is also composed of m samples, that is, ${y_i} = [y_i^1,y_i^2, \cdots ,y_i^m]$. $H$ represents the complex channel response of the OAM-MDM system. Each transmitted symbol of $x(n)$ will be input to the BGAN emulator to simulate the distorted symbol $\widehat y(n)$.

Before emulating the channel response of the OAM-MDM system, the transmitted and received CAP-32 signals are preprocessed as shown in Fig. 2. The current symbol ${x_i}$ is wrapped with its l preceding and l succeeding symbols to form a condition vector ${X_i} = [{x_{i - l}}, \cdots ,{x_{i - 1}},{x_i},{x_{i + 1}}, \cdots ,{x_{i + l}}]$. The transmitted time-series signal $x(n) = [{x_1},{x_2}, \cdots ,{x_n}]$ is transformed into a matrix composed of $(n - 2l)$ condition vectors, each of length $(2l + 1)m$. The condition vector matrix can be expressed as

$$X = \left( {\begin{array}{ccc} {x_1^1}& \ldots &{x_{2l + 1}^m}\\ \vdots & \ddots & \vdots \\ {x_{n - 2l}^1}& \cdots &{x_n^m} \end{array}} \right) = \left( \begin{array}{l} {X_1}\\ \vdots \\ {X_{n - 2l}} \end{array} \right). $$

The received symbols corresponding to the $(n - 2l)$ condition vectors are taken as the real data, and can be expressed as

$$Y = \left( {\begin{array}{ccc} {y_{l + 1}^1}& \ldots &{y_{l + 1}^m}\\ \vdots & \ddots & \vdots \\ {y_{n - l}^1}& \cdots &{y_{n - l}^m} \end{array}} \right) = \left( \begin{array}{l} {Y_1}\\ \vdots \\ {Y_{n - 2l}} \end{array} \right). $$
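To make the windowing explicit, the sketch below (NumPy; the function and variable names are ours, and the symbol-aligned reshaping assumes the two streams are already synchronised) constructs the condition-vector matrix X and the aligned real-data matrix Y defined above.

```python
import numpy as np

def build_condition_vectors(x, y, m, l):
    """Build the condition-vector matrix X and the aligned real-data matrix Y
    from synchronised transmitted/received sample streams.

    x, y : 1-D arrays of transmitted / received samples (n symbols, m samples each)
    m    : upsampling factor (samples per symbol)
    l    : number of preceding and succeeding symbols in each condition vector
    """
    n = len(x) // m
    x_sym = x[:n * m].reshape(n, m)      # one row per transmitted symbol x_i
    y_sym = y[:n * m].reshape(n, m)      # one row per received symbol y_i

    X, Y = [], []
    for i in range(l, n - l):
        X.append(x_sym[i - l:i + l + 1].ravel())   # window of 2l+1 symbols, length (2l+1)m
        Y.append(y_sym[i])                         # centre received symbol, length m
    return np.asarray(X), np.asarray(Y)            # shapes (n-2l, (2l+1)m) and (n-2l, m)
```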

Fig. 2. Data preprocessing

The condition vector X and real data Y are combined to form the dataset $Q = \{{X,Y} \}$, which is then divided into two parts to give the training dataset ${Q_T} = \{{{X_T},{Y_T}} \}$ and the testing dataset ${Q_P} = \{{{X_P},{Y_P}} \}$, where ${X_T} = [{{X_1},{X_2} \cdots {X_T}} ]$, ${Y_T} = [{{Y_1},{Y_2} \cdots {Y_T}} ]$, ${X_P} = [{{X_1},{X_2} \cdots {X_P}} ]$ and ${Y_P} = [{{Y_1},{Y_2} \cdots {Y_P}} ]$.

The structure of the BGAN emulator is shown in Fig. 3(a). It consists of a generator and a discriminator, both of which comprise an input layer, a hidden layer, and an output layer. The generator models the channel of the OAM-MDM system to generate the emulated signal, while the discriminator compares the emulated signal with the real data. The BGAN emulator is trained using the training dataset ${Q_T}$ to calculate the probability distribution of the parameters (weights and bias). The probability distributions of the generator and discriminator parameters are updated iteratively through adversarial training until the emulated signal matches the real data.

Fig. 3. Schematic diagram of the BGAN emulator: (a) structure of the BGAN emulator; (b) training of the model parameters.

First, in the input layer, a noise vector z is randomly drawn from a standard normal distribution. The noise vector enables the generator to produce diverse data that match the noise distribution of the channel. The condition vector ${X_i}$ and the noise vector z are then concatenated to form the input vector $({{X_i} + z} )$ for the generator.

Next, the hidden part of the generator is formed from two fully connected layers. All the weights and biases of the generator are initialized to follow a standard normal distribution with mean 0 and standard deviation 1, as shown in Eq. (4):

$$p({{\theta_g}} )= \frac{1}{{\sqrt {2\pi {\sigma ^2}} }} \times {e^{ - \frac{{{{({\theta _g} - \mu )}^2}}}{{2{\sigma ^2}}}}}, $$
where $\mu$ is the mean value of the trainable parameter, and its initial value is 0. $\sigma$ is the standard deviation of the trainable parameter, and its initial value is 1. ${\theta _g}$ represents the parameters (${W_1}$, ${W_2}$, ${W_3}$, ${b_1}$, ${b_2}$, ${b_3}$) of the generator. $p({{\theta_g}} )$ represents the prior of the generator parameters. The emulated signal ${\widehat y_i}$ is calculated by the hidden layer and the output layer of the generator, and can be expressed as
$${\widehat y_i} = {W_3}(relu({W_2}(relu({W_1}({X_i} + z) + {b_1}) + {b_2}))) + {b_3}, $$
where ${\widehat y_i}$ is a vector. ${W_1}$, ${W_2}$ and ${b_1}$, ${b_2}$ are the weights and biases of the hidden layer, respectively, and ${W_3}$ and ${b_3}$ are the weights and biases of the output layer, respectively.
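A minimal PyTorch sketch of the generator described by this expression is given below. The mean/ρ parameterisation with a softplus link and the reparameterisation trick are standard choices for Bayesian layers and are our assumptions for illustration; the layer sizes are placeholders, not the values used by the authors.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

RHO_INIT = 0.5413  # softplus(0.5413) ~= 1, i.e. unit initial standard deviation as in Eq. (4)

class BayesLinear(nn.Module):
    """Fully connected layer whose weights and biases are Gaussian random variables."""
    def __init__(self, n_in, n_out):
        super().__init__()
        self.w_mu  = nn.Parameter(torch.zeros(n_out, n_in))            # posterior mean, init 0
        self.w_rho = nn.Parameter(torch.full((n_out, n_in), RHO_INIT)) # posterior scale parameter
        self.b_mu  = nn.Parameter(torch.zeros(n_out))
        self.b_rho = nn.Parameter(torch.full((n_out,), RHO_INIT))

    def forward(self, x):
        # reparameterisation: theta = mu + sigma * eps, eps ~ N(0, 1)
        w = self.w_mu + F.softplus(self.w_rho) * torch.randn_like(self.w_mu)
        b = self.b_mu + F.softplus(self.b_rho) * torch.randn_like(self.b_mu)
        return F.linear(x, w, b)

class BayesGenerator(nn.Module):
    """Two ReLU hidden layers and a linear output layer, matching the expression above."""
    def __init__(self, cond_len, noise_len, h1, h2, out_len):
        super().__init__()
        self.noise_len = noise_len
        self.fc1 = BayesLinear(cond_len + noise_len, h1)
        self.fc2 = BayesLinear(h1, h2)
        self.fc3 = BayesLinear(h2, out_len)

    def forward(self, cond):
        z = torch.randn(cond.shape[0], self.noise_len)   # standard-normal noise vector z
        h = torch.cat([cond, z], dim=1)                  # concatenation of X_i and z
        h = F.relu(self.fc1(h))
        h = F.relu(self.fc2(h))
        return self.fc3(h)                               # emulated received symbol \hat{y}_i
```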

In the next step, the real data ${Y_i}$ and the corresponding emulated signal ${\widehat y_i}$ are concatenated with the condition vector ${X_i}$, to serve as the input vector for the discriminator.

The hidden part of the discriminator is likewise formed from two fully connected layers. All the weights and biases of the discriminator are initialized to follow a standard normal distribution with mean 0 and standard deviation 1, as shown in Eq. (6):

$$p({{\theta_d}} )= \frac{1}{{\sqrt {2\pi {\sigma ^2}} }} \times {e^{ - \frac{{{{({\theta _d} - \mu )}^2}}}{{2{\sigma ^2}}}}}, $$
where ${\theta _d}$ represents the parameters (${W_1}^\prime$, ${W_2}^\prime$, ${W_3}^\prime$, ${b_1}^\prime$, ${b_2}^\prime$, ${b_3}^\prime$) of the discriminator. $p({{\theta_d}} )$ represents the prior of the discriminator parameters.

The likelihood probabilities of the generator and discriminator parameters are then calculated using the hidden and output layers of the discriminator, and can be expressed as

$$\begin{aligned} p({Q_T}|{\theta _g}) &= sigmoid(relu( {W_3}^{\prime}(relu({W_2}^{\prime}(relu({W_1}^{\prime}({X_i} + {\widehat y_i}) + {b_1}^{\prime})) + {b_2}^{\prime}) + {b_3}^{\prime}))\\ & = sigmoid(relu( {W_3}^{\prime}(relu({W_2}^{\prime}(relu({W_1}^{\prime}({X_i} + {W_3}(relu({W_2}(relu({W_1}\\ &({X_i} + z) + {b_1}) + {b_2}))) + {b_3}) + {b_1}^{\prime})) + {b_2}^{\prime}) + {b_3}^{\prime})) \end{aligned}$$
$$\begin{aligned} p({Q_T}|{\theta _d}) &= sigmoid(relu( {W_3}^{\prime}(relu({W_2}^{\prime}(relu({W_1}^{\prime}({X_i} + {Y_i}) + {b_1}^{\prime})) + {b_2}^{\prime}) + {b_3}^{\prime}))\\ & + (1 - sigmoid(relu({W_3}^{\prime}(relu({W_2}^{\prime}(relu({W_1}^{\prime}({X_i} + {\widehat y_i}) + {b_1}^{\prime})) + {b_2}^{\prime}) + {b_3}^{\prime})))\\ & = sigmoid(relu( {W_3}^{\prime}(relu({W_2}^{\prime}(relu({W_1}^{\prime}({X_i} + {Y_i}) + {b_1}^{\prime})) + {b_2}^{\prime}) + {b_3}^{\prime}))\\ & + (1 - sigmoid(relu({W_3}^{\prime}(relu({W_2}^{\prime}(relu({W_1}^{\prime}({X_i} + {W_3}(relu({W_2}(relu\\ & ({W_1}({X_i} + z) + {b_1}) + {b_2}))) + {b_3}) + {b_1}^{\prime})) + {b_2}^{\prime}) + {b_3}^{\prime}))) \end{aligned}, $$
where ${W_1}^\prime$, ${W_2}^\prime$ and ${b_1}^\prime$, ${b_2}^\prime$ are the weights and biases of the hidden layer respectively. ${W_3}^\prime$ and ${b_3}^\prime$ are the weights and bias of the output layer, respectively.

The output P of the discriminator is a probability score indicating the “realness” of the emulated signal, which can be expressed as

$$ \begin{aligned} P& = sigmoid(relu( {W_3}^{\prime}(relu({W_2}^{\prime}(relu({W_1}^{\prime}({X_i} + {\widehat y_i}) + {b_1}^{\prime})) + {b_2}^{\prime}) + {b_3}^{\prime}))\\ & = sigmoid(relu( {W_3}^{\prime}(relu({W_2}^{\prime}(relu({W_1}^{\prime}({X_i} + {W_3}(relu({W_2}(relu({W_1}\\ & ({X_i} + z) + {b_1}) + {b_2}))) + {b_3}) + {b_1}^{\prime})) + {b_2}^{\prime}) + {b_3}^{\prime})) \end{aligned}$$

Since the parameters of the BGAN emulator are determined by the stochastic nonlinearity of the OAM-MDM channel, the purpose of training the BGAN emulator is to find the posterior distribution of the model parameters $p(\theta |{Q_T}),\theta = \{{{\theta_g},{\theta_d}} \}$ through Bayes' formula, which can be expressed as

$$p({\theta _g}|{Q_T}) = \frac{{p({Q_T}|{\theta _g}) \times p({\theta _g})}}{{p({Q_T})}}$$
$$p({\theta _d}|{Q_T}) = \frac{{p({Q_T}|{\theta _d}) \times p({\theta _d})}}{{p({Q_T})}}, $$
where $p({Q_T})$ denotes the marginal likelihood. The prior and likelihood probabilities can be obtained using Eqs. (4)–(8). However, $p({Q_T})$ requires marginalising over all possible values of the parameters $\theta$, which is difficult to obtain. Hence, variational inference is used as an analytical approximation technique to learn the posterior distribution $p(\theta |{Q_T})$ based on the prior and likelihood probabilities of the BGAN emulator parameters [20]. As shown in Fig. 3(b), variational inference approximates the posterior distributions $p({\theta _g}|{Q_T})$ and $p({\theta _d}|{Q_T})$ by Gaussian distributions ${q_{{\omega _g}}}({{\theta_g}} )$ and ${q_{{\omega _d}}}({{\theta_d}} )$ with variational parameters ${\omega _g} = ({\mu ,\sigma } )$ and ${\omega _d} = ({\mu^{\prime},\sigma^{\prime}} )$. First, ${q_{{\omega _g}}}({{\theta_g}} )$ and ${q_{{\omega _d}}}({{\theta_d}} )$ are initialized as standard normal distributions, and variational learning is then used to optimise the parameters ${\omega _g} = ({\mu ,\sigma } )$ and ${\omega _d} = ({\mu^{\prime},\sigma^{\prime}} )$ of ${q_{{\omega _g}}}({{\theta_g}} )$ and ${q_{{\omega _d}}}({{\theta_d}} )$ by minimising the Kullback-Leibler divergence with respect to the posterior distributions $p({\theta _g}|{Q_T})$ and $p({\theta _d}|{Q_T})$ [21], as follows:
$$\begin{aligned} {\omega _g}^{\ast} &= \arg \mathop {\min }\limits_{{\omega _g}} KL[{{q_{{\omega_g}}}({{\theta_g}} )\parallel p({\theta_g}|{Q_T})} ]\\ &= \arg \mathop {\min }\limits_{{\omega _g}} \int {{q_{{\omega _g}}}({{\theta_g}} )\log \frac{{{q_{{\omega _g}}}({{\theta_g}} )}}{{p({{\theta_g}} )p({Q_T}|{\theta _g})}}} d{\theta _g}\\ &= \arg \mathop {\min }\limits_{{\omega _g}} KL[{{q_{{\omega_g}}}({\theta_g})\parallel p({\theta_g})} ]- {\mathrm{\mathbb{E}}_{{q_{{\omega _g}}}({{\theta_g}} )}}[{\log p({Q_T}|{\theta_g})} ]\end{aligned}$$
$$\begin{aligned} {\omega _d}^{\ast} &= \arg \mathop {\min }\limits_{{\omega _d}} KL[{{q_{{\omega_d}}}({{\theta_d}} )\parallel p({\theta_d}|{Q_T})} ]\\ &= \arg \mathop {\min }\limits_{{\omega _d}} \int {{q_{{\omega _d}}}({{\theta_d}} )\log \frac{{{q_{{\omega _d}}}({{\theta_d}} )}}{{p({{\theta_d}} )p({Q_T}|{\theta _d})}}} d{\theta _d}\\ &= \arg \mathop {\min }\limits_{{\omega _d}} KL[{{q_{{\omega_d}}}({\theta_d})\parallel p({\theta_d})} ]- {\mathrm{\mathbb{E}}_{{q_{{\omega _d}}}({{\theta_d}} )}}[{\log p({Q_T}|{\theta_d})} ]\end{aligned}. $$

After variational learning, ${q_{{\omega _g}^ \ast }}({{\theta_g}} )\approx p({\theta _g}|{Q_T})$ and ${q_{{\omega _d}^ \ast }}({{\theta_d}} )\approx p({\theta _d}|{Q_T})$, indicating that the parameter distributions of the BGAN emulator have been optimised and match the probability distribution of the stochastic nonlinearity of the OAM-MDM channel.
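In practice this minimisation can be carried out with Monte-Carlo estimates of the variational objective, as in Bayes by Backprop [21]. The sketch below illustrates the generator-side loss for one batch; the `BayesGenerator`/`BayesLinear` classes are those from the earlier sketch, and the closed-form Gaussian KL term and the logarithmic likelihood surrogate are our simplifications for illustration, not code from the authors.

```python
import torch
import torch.nn.functional as F

def gaussian_kl(mu_q, sigma_q):
    """Closed-form KL( N(mu_q, sigma_q^2) || N(0, 1) ), summed over all parameters."""
    return (-torch.log(sigma_q) + 0.5 * (sigma_q ** 2 + mu_q ** 2) - 0.5).sum()

def generator_vi_loss(generator, discriminator, cond):
    """One Monte-Carlo estimate of
       KL[ q_w(theta_g) || p(theta_g) ] - E_q[ log p(Q_T | theta_g) ]."""
    y_hat = generator(cond)                                  # theta_g is sampled inside the forward pass
    p_real = discriminator(torch.cat([cond, y_hat], dim=1))  # discriminator "realness" score
    log_lik = torch.log(p_real + 1e-7).sum()                 # stands in for log p(Q_T | theta_g)

    kl = 0.0
    for layer in (generator.fc1, generator.fc2, generator.fc3):
        kl = kl + gaussian_kl(layer.w_mu, F.softplus(layer.w_rho))
        kl = kl + gaussian_kl(layer.b_mu, F.softplus(layer.b_rho))
    return kl - log_lik
```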

The combination of stochastic mode coupling in OAM-MDM transmission with the nonlinearity of optoelectronic devices typically results in strong stochastic nonlinear impairment, which poses a challenge for traditional channel emulators with fixed parameters. However, the parameters of the BGAN emulator follow a probability distribution rather than taking fixed values. By leveraging variational inference and updating the posterior probability distribution of the parameters using the prior and likelihood probabilities, the stochastic nonlinear features of the OAM-MDM CAP-32 signal can be accurately extracted from the probability distribution of the parameters. Hence, our BGAN emulator can obtain an accurate model from the posterior probability distribution of each parameter, thus forming an effective OAM-MDM transmission channel emulator for stochastic nonlinearity.

2.2. Probabilistic shaping for OAM-MDM system based on AE

Once the BGAN emulator has been obtained, gradients can propagate from the AE decoder back to the AE encoder. In this way, the proposed E2E AE scheme with PS for an OAM-MDM system with CAP-32 modulation is implemented.

Our E2E AE scheme based on PS for an OAM-MDM system with CAP-32 modulation is depicted in Fig. 4. It consists of an encoder, a mapper, CAP modulation, a BGAN emulator, CAP demodulation, and a decoder. The encoder is composed of two fully connected layers, while the decoder is composed of three fully connected layers.

Fig. 4. Diagram showing our AE scheme for PS in an OAM-MDM system.

The constant t is first fed into the AE encoder, as shown in Fig. 4(a). The probability ${P_M}$ of the constellation point is calculated by the two fully connected layers of the encoder, and can be expressed as

$${P_M} = SoftMax({W_2}^{\prime \prime }(ReLU({W_1}^{\prime \prime } \times t + {b_1}^{\prime \prime })) + {b_2}^{\prime \prime }), $$
where ${P_M} = \{{{p_1}, \ldots ,{p_M}} \}$ represents the probability of the constellation point ${C_M} = \{{{c_1}, \ldots {c_M}} \}$, and M represents the modulation order. ${W_1}^{\prime \prime }$, ${W_2}^{\prime \prime }$ and ${b_1}^{\prime \prime }$, ${b_2}^{\prime \prime }$ are the weights and biases of the fully connected layer. $ReLU ()$ and $SoftMax ()$ are the activation functions.
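A compact PyTorch sketch of this encoder is shown below; the hidden width and the scalar input t = 1 are illustrative assumptions rather than the values used by the authors.

```python
import torch
import torch.nn as nn

class ShapingEncoder(nn.Module):
    """Maps the constant input t to the shaping probabilities P_M via two fully
    connected layers, as in the SoftMax expression above."""
    def __init__(self, M=32, hidden=64):
        super().__init__()
        self.fc1 = nn.Linear(1, hidden)
        self.fc2 = nn.Linear(hidden, M)

    def forward(self, t=1.0):
        t = torch.tensor([[float(t)]])
        h = torch.relu(self.fc1(t))
        return torch.softmax(self.fc2(h), dim=-1).squeeze(0)   # P_M: M probabilities summing to 1

# p_M = ShapingEncoder(M=32)(1.0)   # tensor of shape (32,) giving the CAP-32 shaping distribution
```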

Next, as shown in Fig. 4(b), in order to generate a symbol sequence with probability ${P_M}$, we apply Algorithm 2 from [22] with batch size N. We convert the symbol sequence to its one-hot representation to obtain the one-hot vector matrix V of size $N \times M$ [14]. The one-hot vector matrix can be expressed as

$$V = \left( {\begin{array}{ccc} {{v_{1 \times 1}}}& \ldots &{{v_{1 \times M}}}\\ \vdots & \ddots & \vdots \\ {{v_{N \times 1}}}& \cdots &{{v_{N \times M}}} \end{array}} \right) = \left\{ \begin{array}{l} {V_1}\\ \vdots \\ {V_N} \end{array} \right\}. $$

Then, to keep the constellation power constant, the constellation point ${C_M} = \{{{c_1}, \ldots {c_M}} \}$ is normalised based on the probability ${P_M} = \{{{p_1}, \ldots ,{p_M}} \}$, which can be expressed as

$${\hat{c}_m} = {c_m}/\sqrt {\sum\nolimits_{i = 1}^M {{p_i}{c_i}^2} }, $$
where ${c_m}$($m = 1, \ldots M$) represents the constellation point, ${\hat{c}_m}$ represents the normalised constellation point, and ${\hat{C}_M} = \{{{{\hat{c}}_1}, \ldots {{\hat{c}}_M}} \}$. The one-hot vector matrix V is then multiplied by ${\hat{C}_M}$ to obtain the shaped symbol sequence $K$:
$$K = V \times {\hat{C}_M}. $$
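The normalisation and mapping steps can be written compactly as follows (PyTorch; representing each CAP-32 constellation point by its in-phase/quadrature pair is an assumption about the data layout rather than a detail from the paper):

```python
import torch

def shape_symbols(V, C_M, P_M):
    """Normalise the constellation to unit average power under P_M and map the
    one-hot matrix V to the shaped symbol sequence K = V x C_hat.

    V   : (N, M) one-hot matrix of drawn symbols
    C_M : (M, 2) constellation points as (I, Q) pairs
    P_M : (M,) shaping probabilities
    """
    power = (P_M * (C_M ** 2).sum(dim=1)).sum()   # sum_i p_i |c_i|^2
    C_hat = C_M / torch.sqrt(power)               # normalised constellation \hat{C}_M
    return V @ C_hat                              # shaped symbols K, shape (N, 2)
```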

Next, as shown in Fig. 4(c), the shaped symbol sequence $K = \{{{k_1}, \ldots {k_N}} \}$ is upsampled and filtered using two shaping filters ${f_1}$ and ${f_2}$. The two filtered streams are then summed to generate a CAP signal $X = \{{{x_1}, \ldots {x_N}} \}$, completing the CAP modulation.
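The CAP modulation step can be sketched as below (NumPy). The zero-stuffed upsampling and the use of an orthogonal (Hilbert-pair) shaping-filter pair are standard CAP practice and are assumed here; the sign convention for combining the two filtered streams depends on how f2 is defined.

```python
import numpy as np

def cap_modulate(K, f1, f2, m):
    """Upsample the shaped I/Q symbols K (shape (N, 2)) by a factor m, filter the
    two streams with the orthogonal shaping filters f1 and f2, and combine them."""
    N = K.shape[0]
    up_i = np.zeros(N * m)
    up_q = np.zeros(N * m)
    up_i[::m] = K[:, 0]                               # zero-stuffed in-phase stream
    up_q[::m] = K[:, 1]                               # zero-stuffed quadrature stream
    return np.convolve(up_i, f1, mode='same') + np.convolve(up_q, f2, mode='same')
```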

As shown in Fig. 4(d), the CAP signal $X = \{{{x_1}, \ldots {x_N}} \}$ is then fed into the BGAN emulator to generate a received signal $Y = \{{{Y_1}, \ldots {Y_N}} \}$, which contains stochastic nonlinear impairment in the OAM-MDM channel.

At the receiver, the received signal $Y = \{{{Y_1}, \ldots {Y_N}} \}$ is sent to the matched filters ${g_1}$, ${g_2}$ and then downsampled to generate the CAP demodulated signal $E = \{{{E_1}, \ldots {E_N}} \}$, as shown in Fig. 4(e).

Finally, as shown in Fig. 4(f), the decoder maps the CAP demodulated signal $E = \{{{E_1}, \ldots {E_N}} \}$ to a probability vector ${R_j}(j = 1, \ldots N)$, which can be expressed as

$${R_j} = SoftMax({W_3}^{\prime \prime \prime }(ReLU({W_2}^{\prime \prime \prime }(ReLU({W_1}^{\prime \prime \prime } \times E + {b_1}^{\prime \prime \prime })) + {b_2}^{\prime \prime \prime })) + {b_3}^{\prime \prime \prime }), $$
where ${W_1}^{\prime \prime \prime }$, ${W_2}^{\prime \prime \prime }$, ${W_3}^{\prime \prime \prime }$ and ${b_1}^{\prime \prime \prime }$, ${b_2}^{\prime \prime \prime }$, ${b_3}^{\prime \prime \prime }$ are the weights and biases of the fully connected layer.

When the probability vector $R = {\{{{R_1}, \ldots ,{R_N}} \}^T}$ has been obtained, the loss function is calculated based on R, V and ${P_M}$, and can be expressed as [13]

$$L = \frac{1}{N}\sum\nolimits_{n = 1}^N {[{ - {V_n}\log ({R_n})} ]} - H({{P_M}} ), $$
where $H({{P_M}} )$ is the entropy of ${P_M}$. The E2E AE uses L as the loss for the training process, and the Adam optimiser is applied to update the model parameters $({W_1}^{\prime \prime },{b_1}^{\prime \prime },{W_2}^{\prime \prime },{b_2}^{\prime \prime },{W_1}^{\prime \prime \prime },{b_1}^{\prime \prime \prime },{W_2}^{\prime \prime \prime },{b_2}^{\prime \prime \prime },{W_3}^{\prime \prime \prime },{b_3}^{\prime \prime \prime })$ using backpropagation. The optimal probability distribution ${P_M}$ is found when the loss function L converges. In this way, the AE can obtain the optimal PS for the OAM-MDM system through the use of the BGAN emulator.
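A direct PyTorch sketch of this loss is given below; using the natural logarithm for both terms (so that the cross-entropy and the entropy are measured in the same units) and the small eps for numerical stability are our choices for illustration.

```python
import torch

def ae_loss(R, V, P_M, eps=1e-12):
    """Loss L above: mean cross-entropy between the decoder output R and the
    one-hot labels V, minus the entropy of the shaping distribution P_M."""
    cross_entropy = -(V * torch.log(R + eps)).sum(dim=1).mean()   # (1/N) sum_n -V_n log(R_n)
    entropy = -(P_M * torch.log(P_M + eps)).sum()                 # H(P_M)
    return cross_entropy - entropy
```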

3. Experimental

3.1. Experimental setup

In order to verify the effectiveness of the proposed BGAN emulator, an experiment was carried out on a 200 Gbit/s OAM-MDM IM/DD transmission with two OAM modes over a 5 km RCF, as shown in Fig. 5.

Fig. 5. Experimental setup for the E2E OAM-MDM system.

At the transmitter, a pseudo-random bit sequence with a length of ${2^{18}}$ was generated. The bit data were converted into CAP-32 symbols according to the constellation mapping, and then filtered using the shaping filters and summed to generate the CAP-32 signal. The electrical signal was generated using an arbitrary waveform generator (AWG). Every 11 adjacent symbols were input to the BGAN emulator as the condition vector ${X_i}$. Different random number seeds were used for the training data and the test data, yielding two data sequences with different characteristics; this guarantees that the proposed emulator learns the nonlinear characteristics of the channel rather than the pattern of a particular pseudo-random sequence. Following amplification by an electrical amplifier (EA), the electrical signal was used to modulate an optical carrier with a wavelength of $1550.12$ nm through a Mach-Zehnder modulator (MZM), generating a double-sideband optical signal. Subsequently, an optical coupler (OC) was employed to split the generated optical signal into two branches, which were then amplified using an erbium-doped fiber amplifier (EDFA). One of these branches was delayed by a 10 m SMF for data decorrelation between the modes. Two polarisation controllers (PCs) were used to adjust the polarisation direction of the signal light. The two light beams were sent to the spatial light modulators (SLMs) through collimators (Col.) and linear polarizers (LPs). The two Gaussian beams were converted into OAM beams with $l = 2$ and $l = 3$ by the two vortex patterns of the SLMs. These two OAM beams were then combined into a single beam using a polarisation beam splitter (PBS). In view of the transmission characteristics of the RCF, the multiplexed OAM beam was coupled into the 5 km RCF through a quarter-wave plate (QWP) for transmission.

In Fig. 5, inset (i) shows the cross-section of the RCF, and the intensity profiles of the two OAM modes $l = 2$ and $l = 3$ after 5 km of RCF transmission are shown in insets (ii) and (iii), respectively. At the receiving side, the multiplexed OAM beams were split into two using a beam splitter (BS) and converted into linearly polarised beams using two QWPs. The two beams were then passed through vortex phase plates (VPPs), which converted them into Gaussian beams. These two Gaussian beams were coupled into SMFs via collimators and converted into electrical signals via two photodetectors (PDs). The electrical waveform was recorded using a real-time oscilloscope. We then sampled the electrical signal using the real-time oscilloscope to obtain the corresponding digital signal, which underwent sequential resampling, low-pass filtering and synchronisation before being sent to the BGAN emulator as the real data ${Y_i}$. In order to compare the performance of the BGAN and CGAN emulators, the digital signal was also sent to the conventional CGAN emulator via the same process.

The condition vector ${X_i}$ and real data ${Y_i}$ were used as the dataset for training the BGAN and CGAN emulators. The trained emulators were then integrated with the AE to obtain the optimal probability distribution for the OAM-MDM system. The original input bit sequence was mapped using a distribution matcher to produce a symbol sequence with the learned optimal probability distribution [23]. The symbol sequence was modulated to generate a CAP electrical signal using the AWG, and the electrical signal was then modulated onto an optical carrier with the MZM and transmitted over the OAM-MDM system, as shown in Fig. 5. The GMI and BER for the BGAN and CGAN emulators were compared to verify the performance of the proposed scheme.

3.2. Experimental results and analysis

3.2.1 Performance of the BGAN emulator for the OAM-MDM system

Figures 6(a) and (b) show the signal waveforms in the time and frequency domains at the output of the real OAM-MDM channel, the CGAN emulator, and the BGAN emulator for the two OAM modes at a received optical power (ROP) of 1 dBm. Compared with the conventional CGAN emulator, the output signal waveform of the BGAN emulator shows higher consistency with the real channel output in both the time and frequency domains. In order to quantitatively represent the channel modelling accuracy of the BGAN and CGAN emulators for the OAM-MDM system, we adopt the normalised MSE (NMSE) of the power, following the method in [24]. This quantity can be expressed as

$$NMSE = \frac{{\sum\nolimits_1^k {{{(\hat{y} - y)}^2}} }}{{\sum\nolimits_1^k {{y^2}} }}, $$
where k is the data length, y is the real OAM-MDM channel output data after synchronisation, and $\hat{y}$ is the data generated by the emulator. Figures 7(a) and (b) show the NMSE of the signals generated by the CGAN and BGAN emulators at different ROPs for the two OAM modes ($l = 2, 3$). Compared with the conventional CGAN emulator, the maximum improvements in modelling accuracy of the BGAN emulator are 29.3% and 26.3% at an ROP of −3 dBm for the two modes, respectively. This result verifies the effectiveness of the proposed BGAN emulator in characterising the stochastic nonlinear impairment in an OAM-MDM system based on a probability distribution over the parameters.
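For completeness, the NMSE metric reduces to a few lines of NumPy; the emulated and real outputs are assumed to be synchronised arrays of equal length.

```python
import numpy as np

def nmse(y_hat, y):
    """Normalised mean-square error between emulated output y_hat and real channel output y."""
    y_hat, y = np.asarray(y_hat), np.asarray(y)
    return np.sum((y_hat - y) ** 2) / np.sum(y ** 2)
```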

Fig. 6. Channel output waveforms in the time and frequency domains based on OAM-MDM system channel, CGAN emulator and BGAN emulator for (a) $l = 2$, (b) $l = 3$.

Fig. 7. NMSEs of signal amplitude versus ROP for (a) $l = 2$, (b) $l = 3$ in an OAM MDM IM/DD transmission system over a 5 km RCF.

To further validate the effectiveness of the BGAN emulator for the OAM-MDM system channel, we experimentally compared the BER performance of the BGAN and CGAN emulators under different ROPs of the CAP-32 transmitted signal, as shown in Fig. 8. The average BER errors between the signal generated by the BGAN emulator and the real received signal were only 9% and 8.9% for $l = 2$ and $l = 3$, respectively; these were much lower than the values of 23.5% and 21.9% found for the error between the signal generated by the CGAN emulator and the real received signal. This indicates that the proposed BGAN emulator is feasible for use in modelling the OAM-MDM system channel with stochastic nonlinear impairment.

Fig. 8. BER for CAP32 versus ROP for (a) $l = 2$, (b) $l = 3$ in an OAM-MDM IM/DD transmission system over a 5 km RCF.

To further validate the robustness of the BGAN emulator for the OAM-MDM system channel, we experimentally compared the NMSE performance of a fixed BGAN emulator and a retrained BGAN emulator at different peak-to-peak voltages (Vpps). The fixed BGAN emulator was trained only at a Vpp of 350 mV and tested on transmitted signals with Vpps ranging from 200 mV to 450 mV. The retrained BGAN emulator was trained at each Vpp from 200 mV to 450 mV and tested over the same range. As shown in Fig. 9, the average NMSE errors between the fixed and retrained BGAN emulators were only 6.1% and 7.2% for $l = 2$ and $l = 3$, respectively. This indicates that the fixed BGAN emulator has good robustness and can adapt to different signal conditions.

Fig. 9. The performance of the fixed trained BGAN emulator under different Vpps for (a) $l = 2$, (b) $l = 3$ in an OAM-MDM IM/DD transmission system over a 5 km RCF.

The complexity of both the proposed BGAN emulator and the CGAN emulator was also analysed. In terms of the number of real multiplications, the complexity of the BGAN emulator can be expressed as [25]:

$$C{C_{BGAN}} = ({d_1} + {d_2} + 1){h_1} + ({h_1} + 1){h_2} + ({h_2} + 1){d_3}, $$
where ${d_1}$ denotes the length of the condition vector, and ${d_2}$ denotes the length of the noise vector. ${h_1}$ denotes the number of neurons in the first hidden layer, ${h_2}$ is the number of neurons in the second hidden layer, and ${d_3}$ is the length of the generated data.

The number of real multiplications for the CGAN emulator can be expressed as:

$$C{C_{CGAN}} = ({d_1} + {d_2}){k_1} + {k_1}{k_2} + {k_2}{d_3}, $$
where ${k_1}$ denotes the number of neurons in the first hidden layer, ${k_2}$ denotes the number of neurons in the second hidden layer, and ${d_1}$, ${d_2}$, and ${d_3}$ are the same as for the BGAN emulator.
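The two complexity counts can be evaluated directly from the layer sizes; the sizes in the usage comment below are placeholders, not the values behind Tables 1 and 2.

```python
def cc_bgan(d1, d2, d3, h1, h2):
    """Real multiplications of the BGAN emulator (expression above)."""
    return (d1 + d2 + 1) * h1 + (h1 + 1) * h2 + (h2 + 1) * d3

def cc_cgan(d1, d2, d3, k1, k2):
    """Real multiplications of the CGAN emulator (expression above)."""
    return (d1 + d2) * k1 + k1 * k2 + k2 * d3

# example with placeholder layer sizes:
# cc_bgan(d1=44, d2=4, d3=4, h1=32, h2=16), cc_cgan(d1=44, d2=4, d3=4, k1=48, k2=24)
```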

The complexity of the proposed BGAN emulator and the CGAN emulator was evaluated for the two OAM modes at different values of the ROP. The complexity of the two emulators at similar NMSE performance is presented in Tables 1 and 2. Compared with the conventional CGAN emulator, the maximal reductions in complexity achieved by the BGAN emulator are 36.3% and 33.7%, respectively, at an ROP of −3 dBm for the two modes. We can conclude that the proposed BGAN emulator offers both high performance and low complexity.

Table 1. Experimental results for the complexity of the CGAN and BGAN emulators for l = 2

Table 2. Experimental results for the complexity of the CGAN and BGAN emulators for l = 3

3.2.2 Performance validation of AE-based PS for OAM-MDM system

In this section, we report the use of an AE to perform PS for the OAM-MDM system. The parameters of the AE are shown in Table 3. The PS results based on the AE implementation of the BGAN and CGAN emulators at different input optical powers (IOPs) for the two OAM modes are shown in Figs. 10 and 11. With an increase in the IOP, the entropy of the AE-based PS for both the CGAN and BGAN emulators first increases and then decreases. This is because, as the IOP increases, the signal-to-noise ratio (SNR) of the signal increases at first and the nonlinear effect of the system decreases. As the IOP continues to increase, the nonlinear effect increases, which results in the AE assigning a higher probability to the central constellation points. Furthermore, as shown in Fig. 11, the values of entropy achieved by the AE for the BGAN emulator are 4.4, 4.42, 4.7, 4.65, 4.59, and 4.56, which are lower than the values of 4.47, 4.52, 4.73, 4.71, 4.64, and 4.66 obtained for the CGAN emulator at the same IOPs for the two OAM modes. This is because it is hard for the conventional CGAN emulator with fixed parameters to simulate the stochastic nonlinear impairment caused by mode coupling in the OAM-MDM system. However, the BGAN emulator can model the probability distribution of the stochastic nonlinear impairment statistically, and can therefore simulate the OAM-MDM transmission channel model accurately. The BGAN emulator captures more complex channel impairments, and therefore the values of entropy achieved by the AE for the BGAN emulator are lower than those for the CGAN emulator at the same IOP.

Fig. 10. Learned PS for the (a)-(c), (g)-(i) BGAN and (d)-(f), (j)-(l) CGAN emulators versus IOP for (a)-(f) $l = 2$, (g)-(l) $l = 3$ in the IM-DD OAM-MDM transmission system over a 5 km RCF.

Fig. 11. Entropy of learned PS-based BGAN and CGAN emulators for CAP-32 versus IOP for (a) $l = 2$, (b) $l = 3$ in an IM-DD OAM-MDM transmission system over a 5 km RCF.

Table 3. Parameters of the AE

Figure 12 shows the measured GMI versus ROP for the different OAM modes in the OAM-MDM IM-DD transmission system over a 5 km RCF. Compared with OAM-MDM transmission without PS, the GMIs of the AE-based PS with the BGAN emulator are increased by a maximum of 0.5 and 0.55 bits/symbol at an ROP of −3 dBm for OAM modes $l = 2$ and $l = 3$, respectively. Compared with the MB distribution with the same entropy, the GMIs of the AE-based PS with the BGAN emulator are increased by a maximum of 0.31 and 0.33 bits/symbol at an ROP of −3 dBm for the two OAM modes, respectively. Compared with the AE-based PS with the CGAN emulator, the GMIs of the AE-based PS with the BGAN emulator are increased by a maximum of 0.16 and 0.2 bits/symbol at an ROP of −3 dBm for the two OAM modes. This result verifies that the proposed AE-based BGAN emulator can achieve an optimal PS that matches the stochastic nonlinear impairment in the OAM-MDM system.

Fig. 12. Measured GMI versus ROP for (a) $l = 2$, (b) $l = 3$ in the IM-DD OAM-MDM transmission system over a 5 km RCF.

4. Conclusion

This paper has proposed an E2E learning scheme with PS based on a BGAN emulator for an OAM-MDM IM/DD system. Due to the random mode coupling in time-varying OAM-MDM transmission, it is difficult for traditional channel emulators with fixed weight coefficients to capture the stochastic nonlinear impairment in OAM-MDM transmission. However, the weights and biases of the BGAN emulator are regarded here as probability distributions, and can accurately match the stochastic nonlinear model of OAM-MDM. Our experimental results show that the proposed BGAN emulator outperforms the conventional CGAN emulator, with improvements in modelling accuracy of 29.3% and 26.3% for OAM modes $l = 2$ and $l = 3$, respectively. Furthermore, an AE-based PS scheme for the BGAN emulator is presented. In terms of the GMI, the proposed BGAN emulator with AE-based PS outperforms the model without PS, the conventional MB distribution, and the CGAN emulator by 0.5 and 0.55, 0.31 and 0.33, and 0.16 and 0.2 bits/symbol for the two OAM modes, respectively. These experimental results demonstrate that the proposed AE-based PS scheme with the BGAN emulator is a promising candidate for OAM-MDM IM/DD optical fiber communication.

Funding

National Science Fund for Distinguished Young Scholars (62022016); National Key Research and Development Program of China (2019YFA0706300); National Natural Science Foundation of China (62105026, 62205023); Fundamental Research Funds for the Central Universities, Beijing Municipal Natural Science Foundation (4222075); BIT Research and Innovation Promoting Project (2023YCXY028).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. K. Zhong, X. Zhou, J. Huo, C. Yu, C. Lu, and A. Lau, “Digital Signal Processing for Short-Reach Optical Communications: A Review of Current Technologies and Future Trends,” J. Lightwave Technol. 36(2), 377–400 (2018). [CrossRef]  

2. Q. Li, X. Yang, H. Wen, Q. Xu, J. Yang, Y. Li, H. Yang, M. Leeson, and T. Xu, “Flexible All-Optical 8QAM Signal Format Conversion Using Pump Assisted Nonlinear Optical Loop Mirror,” J. Lightwave Technol. 41(20), 6446–6456 (2023). [CrossRef]  

3. J. Zhang, Y. Wen, H. Tan, J. Liu, and S. Yu, “80-Channel WDM-MDM Transmission over 50-km Ring-Core Fiber Using a Compact OAM DEMUX and Modular 4×4 MIMO Equalization,” Optical Fiber Communication Conference 2019 (OFC, 2019), W3F.3.

4. R. Ryf, S. Randel, A. H. Gnauck, et al., “Mode-Division Multiplexing Over 96 km of Few-Mode Fiber Using Coherent 6 × 6 MIMO Processing,” J. Lightwave Technol. 30(4), 521–531 (2012). [CrossRef]  

5. L. Zhu, H. Yao, H. Chang, et al., “Adaptive Optics for Orbital Angular Momentum-Based Internet of Underwater Things Applications,” IEEE Internet Things J. 9(23), 24281–24299 (2022). [CrossRef]  

6. H. Chang, X. Yin, H. Yao, et al., “Low-Complexity Adaptive Optics Aided Orbital Angular Momentum Based Wireless Communications,” IEEE Trans. Veh. Technol. 70(8), 7812–7824 (2021). [CrossRef]  

7. S. Zhou, X. Liu, R. Gao, Z. Jiang, and H. X. Xin, “Adaptive Bayesian neural networks nonlinear equalizer in a 300-Gbit/s PAM8 transmission for IM/DD OAM mode division multiplexing,” Opt. Lett. 48(2), 464–467 (2023). [CrossRef]  

8. F. Wang, R. Gao, Z. Li, et al., “400 Gbit/s 4 mode transmission for IM/DD OAM mode division multiplexing optical fiber communication with a few-shot learning-based AffinityNet nonlinear equalizer,” Opt. Express 31(14), 22622–22634 (2023). [CrossRef]  

9. J. Zhang, J. Liu, Z. Lin, and S. Yu, “Nonlinearity-Aware Adaptive Bit and Power Loading DMT Transmission over Low-Crosstalk Ring-Core Fiber with Mode Group Multiplexing,” J. Lightwave Technol. PP(99), 1 (2020). [CrossRef]  

10. M. Yang, L. Wang, H. Wang, and L. Shen, “MDM Transmission of 3-D CAP over 4.1-km Ring-Core Fiber in Passive Optical Networks,” Optical Fiber Communication Conference 2021 (OFC 2021), Th1A. 3.

11. T. Fehenberger, A. Alvarado, G. Böcherer, and N. Hanik, “On Probabilistic Shaping of Quadrature Amplitude Modulation for the Nonlinear Fiber Channel,” J. Lightwave Technol. 34(21), 5063–5073 (2016). [CrossRef]  

12. T. O'Shea and J. Hoydis, “An introduction to deep learning for the physical layer,” IEEE Trans. Cogn. Commun. Netw. 3(4), 563–575 (2017). [CrossRef]  

13. M. Stark, F. Ait Aoudia, and J. Hoydis, “Joint learning of geometric and probabilistic constellation shaping,” in 2019 IEEE Globecom Workshops (GC Wkshps) (IEEE, 2019).

14. V. Aref and M. Chagnon, “End-to-end learning of joint geometric and probabilistic constellation shaping,” in Optical Fiber Communication Conference (OFC) 2022, (Optica Publishing Group, 2022), p. W4I.3.

15. B. Karanov, M. Chagnon, and V. Aref, “Concept and experimental demonstration of optical IM/DD end-to-end system optimization using a generative model,” in Optical Fiber Communication Conference (OFC) 2020, (Optica Publishing Group, 2020), p. Th2A.48.

16. H. Yang, Z. Niu, and S. Xiao, “Fast and accurate optical fiber channel modeling using generative adversarial network,” J. Lightwave Technol. 39(5), 1322–1333 (2021). [CrossRef]  

17. Z. Li, J. Shi, Y. Zhao, G. Li, J. Chen, J. Zhang, and N. Chi, “Deep learning based end-to-end visible light communication with an in-band channel modeling strategy,” Opt. Express 30(16), 28905–28921 (2022). [CrossRef]  

18. Y. Zhao, P. Zou, W. Yu, and N. Chi, “Two tributaries heterogeneous neural network based channel emulator for underwater visible light communication systems,” Opt. Express 27(16), 22532–22541 (2019). [CrossRef]  

19. J. Shi, Z. Li, J. Jia, Z. Li, C. Shen, J. Zhang, and N. Chi, “Waveform-to-Waveform End-to-End Learning Framework in a Seamless Fiber-Terahertz Integrated Communication System,” J. Lightwave Technol. 41(8), 2381–2392 (2023). [CrossRef]  

20. Y. Gal, “Uncertainty in Deep Learning,” Ph.D. thesis, University of Cambridge (2016).

21. C. Blundell, J. Cornebise, and K. Kavukcuoglu, “Weight uncertainty in neural networks,” in Proceedings of the 32nd International Conference on Machine Learning, PMLR 37, 1613–1622 (2015). [CrossRef]  

22. G. Böcherer and B. C. Geiger, “Optimal quantization for distribution synthesis,” IEEE Trans. Inf. Theory 62(11), 6162–6172 (2016). [CrossRef]  

23. P. Schulte and G. Böcherer, “Constant composition distribution matching,” IEEE Trans. Inf. Theory 62(1), 430–434 (2016). [CrossRef]  

24. D. Wang, Y. Song, J. Li, J. Qin, T. Yang, and M. Zhang, “Data-driven optical fiber channel modeling: A deep learning approach,” J. Lightwave Technol. 38(17), 4730–4743 (2020). [CrossRef]  

25. P. J. Freire, Y. Osadchuk, B. Spinnler, et al., “Performance versus complexity study of neural network equalizers in coherent optical systems,” J. Lightwave Technol. 39(19), 6085–6096 (2021). [CrossRef]  
