Linewidth-tolerant and multi-format carrier phase estimation schemes for coherent optical m-QAM flexible transmission systems

Tao Yang; Chen Shi; Xue Chen; Min Zhang; Yuefeng Ji; Feng Hua; Yufei Chen

doi:10.1364/OE.26.010599

1. Introduction

Nowadays, 100G/200G coherent systems have been deployed globally, enabled by advances in digital signal processing (DSP). Future elastic optical networks (EONs), which face dynamic, heterogeneous and unpredictable traffic, are expected to be high-capacity and spectrum/power-efficient by using flexible transmission systems with adaptive baud rate and multiple modulation formats (MFs) [1,2]. In this scenario, the DSP design targets include lower complexity and power consumption, higher tolerance to both linear and nonlinear noise, and better flexibility and intelligence, so as to further reduce the cost per bit while increasing the network efficiency in EONs [3,4]. Accordingly, hybrid m-QAM formats are considered promising candidates for flexible modulation because of their potential for high-speed optical transmission at high spectral efficiency (SE) [5,6]. In addition, multi-dimensional modulation is another alternative that supports fine-granularity adjustment of the tradeoff among bit rate, SE, reachable distance and optical signal-to-noise ratio (OSNR) tolerance [7,8]. However, as the modulation order increases, high-order MFs place stricter requirements on the linewidth of the free-running transmitter and local oscillator (LO) lasers, because the constellation points are inherently closer together in the Euclidean plane [9]. Thus, carrier phase estimation (CPE), a vital block in DSP units, urgently needs to become applicable to multiple MFs with higher performance but lower complexity and power consumption [10].

Existing CPE schemes for high-order MFs can be classified as either data-aided techniques, which have inferior SE, or non-data-aided techniques. Moreover, they can be implemented either in a feedforward manner or in a feedback structure, the latter being unfavorable to high-speed parallel processing in practice. Current blind feedforward algorithms mainly include blind phase search (BPS) [11], quadrature phase shift keying (QPSK) partitioning [12], maximum likelihood (ML) estimation [13] and constellation transformation (CT) [14]. The BPS algorithm has very high linewidth tolerance and is applicable to multiple MFs; however, because it relies entirely on blind search, a large number of test phases is required as the modulation order increases, which implies a heavy computational burden. QPSK partitioning schemes, on the other hand, are inherently inefficient in terms of laser linewidth tolerance because only a small portion of the current symbols can be used for the phase estimation, especially in high-order m-QAM systems. The ML and CT algorithms require less computational effort but must be implemented after a rough recovery of the constellation, and thus they are vulnerable to performance degradation caused by wrong constellation-assisted hard decisions or improper transformation.

To further improve performance and flexibility and to reduce complexity, significant efforts have been made in the area of efficient two-stage CPE. The key idea behind these schemes is either to incorporate an additional MF identification process to enhance the flexibility of the original algorithm [15–17], or to use a hardware-efficient but less accurate phase estimator for a coarse estimate at the first stage and fine phase estimators in subsequent stages to improve linewidth tolerance and/or reduce the overall complexity [18–23]. The first-stage algorithm can be BPS with a coarse search step [18,19] or QPSK partitioning [20]. Once the first stage has provided a coarse estimate, the fine-stage estimation can be realized using BPS with a finer step size, ML estimation or a CT-based algorithm [21–23]. The combination is chosen based on the tradeoff between complexity and performance. However, none of the above schemes has yet been designed as a flexible, feedforward and high-performance CPE with low implementation complexity, as EONs require. In addition, the reliability of the first stage directly affects the performance of the following stages, so enhancing the performance of the first stage is of great significance. Therefore, it is important to combine the advantages of current solutions while simultaneously reducing the complexity and enhancing the performance and flexibility of CPE.

In this paper we propose a novel multi-format and linewidth-tolerant CPE scheme. We theoretically obtain the quasi-linear relationship between the rotation angle and the corresponding offset distance. Afterwards, a two-stage CPE scheme is presented where an extended QPSK partitioning in the first stage could effectively improve the performance while a quasi-linear approximation (QLA) method is crucial to dramatically reducing the complexity. The numerical simulation and the experimental investigation of the scheme are conducted to optimize its parameters and evaluate the performance. The simulation result demonstrates its greatly enhanced linewidth tolerance compared with the BPS and QPSK partitioning, while slightly better OSNR performance is experimentally validated on polarization multiplexing (PM) 4-QAM and 16-QAM systems.

2. Principle of the QLA scheme

2.1 Quasi-linear relationship

The ideal 64-QAM constellation $(\pm a \pm j \cdot b),\ a, b \in \{1, 3, 5, 7\}$ (which also contains lower-order MFs such as QPSK, 16-QAM and 32-QAM) is illustrated in Fig. 1(a). Generally, for BPS schemes, all symbols within a specific block length need to be rotated by a large number of test angles, which increases with the modulation level and results in an unacceptable complexity in hardware implementation. In contrast to BPS, conventional QPSK partitioning is a hardware-efficient scheme that utilizes for CPE only the symbols lying on the rings C₁, C₃ and C₉ (hereinafter referred to as QPSK-points), i.e., 50%, 25% and 19% of the points of 16-QAM, 32-QAM and 64-QAM, respectively. These points are highlighted in red in Fig. 1(a). It can be seen that, for both 16-QAM and 32-QAM, these points are located on the inner rings and have a relatively lower SNR than those lying on the outer rings. Therefore, conventional QPSK partitioning needs a long block to smooth out additive white Gaussian noise (AWGN). However, increasing the block length decreases the phase tracking performance. Meanwhile, having only a few available symbols in a block greatly increases the risk of erroneous cycle-slip detection.

Fig. 1 (a) The constellation partitioning of the ideal 64-QAM in conventional QPSK partitioning scheme, (b) the principle of the proposed extended QPSK partitioning scheme. Symbols used for conventional QPSK partitioning are highlighted by the red points, while those used for extended QPSK partitioning are highlighted by the red and blue points.

Considering the tradeoff between performance and implementation complexity, we propose a novel scheme to extend the set of available symbols. Figure 1(b) illustrates the ideal constellation distribution without phase noise. The symbols highlighted in blue, lying on C₅ and C₈, not only have a relatively higher SNR than the inner points but also have phase offsets of only ±14° and ±9.5°, respectively, from the diagonal lines with slope +1 or −1. Therefore, those symbols (hereinafter referred to as QPSK-like symbols) can be helpful for CPE. In addition, to further increase the number of available symbols, we rotate the symbols of C₄ by π/4 so that their phase offsets are reduced to ±11.3°. After this rotation, those symbols can also reasonably be treated as QPSK-like symbols. However, the symbols of C₂, C₆ and C₇, which have many possible phase angles, are not suitable, because even after being rotated they still have relatively large phase offsets. Thus, the number of available symbols increases from 8 to 24 (from 25% to 75%) for 32-QAM and from 12 to 36 (from 19% to 56%) for 64-QAM.
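The ring bookkeeping above can be checked numerically. The following is a minimal sketch, assuming that the rings C₁ to C₉ are indexed by increasing radius and that 32-QAM is the usual cross constellation (the 6 × 6 grid without its four corner points); the helper names are illustrative only and do not come from the paper.

```python
import math
from collections import defaultdict

def rings(points):
    """Group constellation points into rings of equal radius, innermost first."""
    groups = defaultdict(list)
    for a, b in points:
        groups[a * a + b * b].append((a, b))
    return [groups[r2] for r2 in sorted(groups)]

def diagonal_offset_deg(point, pre_rotation_deg=0.0):
    """Angular distance in degrees from a point to the nearest +/-45 degree diagonal."""
    a, b = point
    angle = math.degrees(math.atan2(b, a)) + pre_rotation_deg
    return abs(angle % 90.0 - 45.0)

levels = (-7, -5, -3, -1, 1, 3, 5, 7)
qam64 = [(a, b) for a in levels for b in levels]
# Assumption: cross 32-QAM is the 6x6 grid minus the four corners (+/-5, +/-5).
qam32 = [(a, b) for a in levels[1:-1] for b in levels[1:-1]
         if not (abs(a) == 5 and abs(b) == 5)]

c64 = rings(qam64)  # c64[0] is C1, ..., c64[8] is C9
c32 = rings(qam32)  # c32[0] is C1, ..., c32[4] is C5

# Phase offsets of the QPSK-like rings (C4 is pre-rotated by pi/4 = 45 degrees).
offset_c5 = max(diagonal_offset_deg(p) for p in c64[4])
offset_c8 = max(diagonal_offset_deg(p) for p in c64[7])
offset_c4_rotated = max(diagonal_offset_deg(p, 45.0) for p in c64[3])

# Symbol counts for conventional (C1, C3, C9) and extended partitioning.
conventional_64 = sum(len(c64[i]) for i in (0, 2, 8))
extended_64 = sum(len(c64[i]) for i in (0, 2, 3, 4, 7, 8))
conventional_32 = sum(len(c32[i]) for i in (0, 2))
extended_32 = sum(len(c32[i]) for i in (0, 2, 3, 4))

print(round(offset_c5, 1), round(offset_c8, 1), round(offset_c4_rotated, 1))
print(conventional_32, extended_32, conventional_64, extended_64)
```

Under these assumptions the script reproduces the offsets and the symbol counts quoted in the paragraph above.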

Consequently, after the MF has been identified by a robust scheme [16,17], the available symbols r(k) are selected from the received samples y(k) to perform CPE. The mechanism of the proposed scheme is described in detail as follows. Typically, at the coherent receiver, after analog-to-digital conversion, quadrature imbalance correction, chromatic dispersion compensation, retiming, polarization de-multiplexing and frequency offset estimation, the symbol-rate sample before CPE can be expressed as

$$r(k) = A(k)\, e^{j(θ_{k} + ϕ)} + n(k), \quad k = 0, 1, 2, \dots \tag{1}$$

where $A(k) \cdot e^{jθ_{k}}$ denotes the kth transmitted symbol drawn from a QAM constellation, $θ_{k}$ is the modulation phase, approximately equal to $(2n + 1) \cdot π / 4$, and n is an integer. $ϕ$ represents the phase noise induced by the laser linewidth, which remains unchanged within a proper time window, and n(k) stands for additive complex white Gaussian noise. For m-QAM, which has 90° symmetry, to recover the carrier phase in a feedforward approach, the received $r(k)$ is rotated by test phase angles $φ_{b}\ (b = 1, \dots, B)$ spanning $-π/4$ to $π/4$:

$$Z(k, b) = r(k) \cdot e^{-jφ_{b}} \tag{2}$$

Then, neglecting the additive noise term, the real part $Z_{I}(k,b)$ and the imaginary part $Z_{Q}(k,b)$ of the rotated sample $Z(k,b)$ can be expressed as

$$\begin{cases} Z_{I}(k, b) = |Z(k, b)| \cos(θ_{k} + ϕ - φ_{b}) \\ Z_{Q}(k, b) = |Z(k, b)| \sin(θ_{k} + ϕ - φ_{b}) \end{cases} \tag{3}$$

Depending on the received format (QPSK, 16-QAM, 32-QAM or 64-QAM), we map the symbols of all four quadrants to the first quadrant by taking the absolute values of the real and imaginary parts, respectively. Thus, as shown in the example in Fig. 1(b), we define a simplified metric, the offset distance $d(k, b)$, which measures the distance from an available symbol to its closest diagonal, as follows

$$d(k, b) = \big|\, |Z_{I}(k, b)| - |Z_{Q}(k, b)| \,\big| \tag{4}$$

where $|\cdot|$ stands for taking the absolute value. In order to remove noise distortions, the offset distances of N₁ consecutive received symbols rotated by the same phase angle $φ_{b}$ are summed, which can be formulated as

$$\begin{aligned} e_{b,1}(k) &= \sum_{n = -\mathrm{ceil}(N_{1}/2) + 1}^{\mathrm{floor}(N_{1}/2)} d(k - n, b) \\ &= \sum_{n = -\mathrm{ceil}(N_{1}/2) + 1}^{\mathrm{floor}(N_{1}/2)} \Big[\, \big|\, |Z(k - n, b)| \cdot |\cos(θ_{k-n} + ϕ - φ_{b})| - |Z(k - n, b)| \cdot |\sin(θ_{k-n} + ϕ - φ_{b})| \,\big| \,\Big] \\ &= \sum_{n = -\mathrm{ceil}(N_{1}/2) + 1}^{\mathrm{floor}(N_{1}/2)} \Big[\, |Z(k - n, b)| \cdot \big|\sin\big((θ_{k-n} - \tfrac{π}{4}) + (ϕ - φ_{b})\big)\big| \,\Big] \\ &= \sum_{n = -\mathrm{ceil}(N_{1}/2) + 1}^{\mathrm{floor}(N_{1}/2)} \Big[\, |Z(k - n, b)| \cdot |\sin(ϕ - φ_{b})| \,\Big] \end{aligned} \tag{5}$$

where floor(⋅) and ceil(⋅) denote the floor and ceiling functions, respectively, and $(θ_{k-n} - π/4)$ is roughly equal to nπ. Therefore, if the phase compensation error $ϕ_{error} = ϕ - φ_{b}$ is equal to zero, i.e., the phase noise is totally compensated, the offset distance $e_{b,1}(k)$ reaches its minimum value. Meanwhile, since sin($ϕ_{error}$) is monotonic within [−π/4, π/4] and is quasi-linear over this range, we can conclude that the offset distance function $e_{b,1}(k)$ can be approximately described by a quasi-linear relationship with the rotation angle $φ_{b}$ (corresponding to the phase compensation error $ϕ_{error}$).
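To illustrate this behaviour, the following minimal sketch evaluates the summed offset distance for noise-free QPSK symbols affected by a constant phase error and sweeps the test phase. The block length, the phase value, the ±1 ± j symbol grid and the function names are assumptions chosen for illustration, not parameters taken from the paper.

```python
import numpy as np

def offset_distance(r, phi_b):
    """Sum of | |Z_I| - |Z_Q| | over a block rotated by the test phase phi_b."""
    z = r * np.exp(-1j * phi_b)
    return np.sum(np.abs(np.abs(z.real) - np.abs(z.imag)))

rng = np.random.default_rng(0)
n1 = 64                      # illustrative block length
phase_noise = 0.2            # constant phase error in radians (illustrative)

# Noise-free QPSK symbols on the +/-1 +/- j grid, rotated by the phase error.
symbols = rng.choice([-1.0, 1.0], n1) + 1j * rng.choice([-1.0, 1.0], n1)
received = symbols * np.exp(1j * phase_noise)

# Sweep test phases over [-pi/4, pi/4] on a 1-degree grid.
test_phases = np.linspace(-np.pi / 4, np.pi / 4, 91)
e = np.array([offset_distance(received, p) for p in test_phases])
best = test_phases[np.argmin(e)]

# For this symbol grid the sum is proportional to |sin(phase error)|
# as long as the residual error stays within +/- pi/4.
err = phase_noise - test_phases
inside = np.abs(err) <= np.pi / 4
closed_form = 2.0 * n1 * np.abs(np.sin(err))

print(best, np.allclose(e[inside], closed_form[inside]))
```

In this idealized setting the metric is V-shaped around the true phase error, which is the property the quasi-linear approximation relies on; with additive noise or QPSK-like symbols the two branches are only approximately straight.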

Figure 2(a) illustrates the offset distance as a function of the phase compensation error for QPSK, 16-QAM, 32-QAM and 64-QAM signals. From the figure we can see that the optimal phase angle, which minimizes the offset distance, can be determined by finding the intersection of the two quasi-straight lines. Thanks to the symmetry of the two lines, the intersection can easily be calculated from only three points distributed on them. In addition, it should be noted that QPSK and 16-QAM are superior to 32/64-QAM in the linearity of the offset distance function, because of the phase offsets of the QPSK-like symbols introduced for CPE in 32/64-QAM systems. Fortunately, the decrease in linearity is small and the impact of the phase offsets can be mitigated by the averaging window. Consequently, this method, which we call quasi-linear approximation (QLA), is applicable to multiple QAM formats.

Fig. 2 Normalized offset distance as a function of phase compensation error for QPSK, 16-QAM, 32-QAM and 64-QAM respectively, in (a) the first stage, (b) the second stage.

However, because a portion of the current symbols still cannot be used for the phase estimation for 16/32/64-QAM, the above method should in practice be applied as a coarse stage. Hence, to further improve the accuracy of the phase estimation, we add a more accurate subsequent stage that uses all symbols of a shorter block of length N₂ to approach the optimal phase estimate, as reported in our previous work [24], where the offset distance can be calculated directly as follows

$$\begin{cases} e_{I,b,2}(k) = \sum_{n = -\mathrm{ceil}(N_{2}/2) + 1}^{\mathrm{floor}(N_{2}/2)} \Big|\, \big|\, \big|\, |y_{I}(k - n, b)| - 4 \,\big| - 2 \,\big| - 1 \,\Big| \\ e_{Q,b,2}(k) = \sum_{n = -\mathrm{ceil}(N_{2}/2) + 1}^{\mathrm{floor}(N_{2}/2)} \Big|\, \big|\, \big|\, |y_{Q}(k - n, b)| - 4 \,\big| - 2 \,\big| - 1 \,\Big| \\ e_{b,2}(k) = e_{I,b,2}(k) + e_{Q,b,2}(k) \end{cases} \tag{6}$$

For different modulation formats like QPSK, 16-QAM, 32-QAM and 64-QAM here, we use the same calculating formula. The relationship between offset distance and phase compensation error in this fine stage is depicted in Fig. 2 (b). It can be observed that the quasi-linear interval is gradually narrowed to a range close to actual phase noise. Similarly, the intersection in this stage, i.e. the optimal phase estimation, could be more accurately calculated by the same method as in the former stage.
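The nested absolute values of the fine-stage metric can be read as a folding operation that returns the distance from a rail value to the nearest ideal level. The sketch below checks this reading against a brute-force search and evaluates the metric on an ideal 64-QAM grid. It assumes samples scaled to the ±1, ±3, ±5, ±7 grid; the function names are illustrative.

```python
import numpy as np

def level_distance(x):
    """| | | |x| - 4 | - 2 | - 1 |: distance from x to the nearest of +/-1, +/-3, +/-5, +/-7."""
    return np.abs(np.abs(np.abs(np.abs(x) - 4.0) - 2.0) - 1.0)

def fine_offset_distance(y, phi_b):
    """Fine-stage metric: folded distances of the I and Q rails summed over a block."""
    z = y * np.exp(-1j * phi_b)
    return np.sum(level_distance(z.real) + level_distance(z.imag))

levels = np.array([-7, -5, -3, -1, 1, 3, 5, 7], dtype=float)
grid = (levels[:, None] + 1j * levels[None, :]).ravel()  # ideal 64-QAM points

# Compare the folded expression with a brute-force nearest-level search.
x = np.linspace(-9.0, 9.0, 721)
brute_force = np.min(np.abs(x[:, None] - levels[None, :]), axis=1)

print(np.allclose(level_distance(x), brute_force))
print(fine_offset_distance(grid, 0.0), fine_offset_distance(grid, 0.05))
```

Because the folding needs only absolute values and subtractions, the metric avoids explicit hard decisions and multiplications, which is consistent with the adder and comparator counts given in the complexity analysis.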

2.2 Scheme principle and implementation

The block diagram of the proposed two-stage CPE scheme is shown in Fig. 3. The CPE operates on blocks of N₁ and N₂ samples in the first and second stage, respectively, and comprises four steps in each stage: a small number of rotation angles are applied to the received signal, an offset distance is evaluated, three points distributed on the two lines are identified, and the optimal phase estimate is determined using QLA. In the following, we describe the respective stages in more detail.

Fig. 3 Block diagram of the proposed two-stage CPE scheme. (1) Coarse stage, using the introduced QPSK-like symbols and two fixed rotation angles to obtain a coarse phase estimate; (2) fine stage, using all current symbols and four dynamically adapted rotation angles to determine the optimum phase estimate.

In the first stage, we first rotate C₄ by π/4; the symbols lying on the rings C₁, C₃, C₄, C₅, C₈ and C₉ are then selected and rotated by the three angles 0 and ±π/6 (only two actual rotations are required, since the angle 0 leaves the symbols unchanged). Then we calculate the offset distance $e_{i,1}$ of all available symbols to the closest diagonal by Eq. (5). Thus, three points lying on the symmetrical quasi-straight lines are obtained, namely (−π/6, e_1,1), (0, e_2,1) and (π/6, e_3,1). Please note that these points are located in the real plane. Finally, the intersection point of the two straight lines, i.e. the optimal phase estimate of the first stage, can be calculated from these three points. Since the three points have three possible distributions in the linear interval, as shown in Fig. 4, the calculation of the optimal phase angle can be summarized in three cases.

Fig. 4 Three possible distributions of normalized offset distance versus rotation angle.

For the first case, when (−π/6, e_1,1) is the minimum of the three points, as shown in Fig. 4(a), the optimal phase angle is calculated as follows. Owing to the π/2 periodicity of the offset distance, we translate (π/6, e_3,1) to the left by one period by subtracting π/2 from its abscissa. Thus, a new point (−π/3, e_3,1) is obtained, and (−π/3, e_3,1) and (−π/6, e_1,1) determine a straight line, Line1. Similarly, (−π/6, e_1,1) and (0, e_2,1) determine the other line, Line2. Considering that the straight line with the larger absolute slope represents the true estimate, we first compare the slopes of the two lines to calculate the minimum point.

If e_3,1 > e_2,1, Line1 is the actual estimated result, and the other, symmetric line, which passes through the point (0, e_2,1), has a slope opposite to that of Line1. According to the point-slope form of the straight-line equation, we have:

$$\begin{cases} y - e_{3,1} = \frac{e_{3,1} - e_{1,1}}{(-π/3) - (-π/6)} \cdot (x - (-π/3)) \\ y - e_{2,1} = \frac{-(e_{3,1} - e_{1,1})}{(-π/3) - (-π/6)} \cdot (x - 0) \end{cases} \tag{7}$$

On the contrary, if e_3,1 $\leq$ e_2,1, the straight-line equation of Line2 and its symmetric line can be obtained in a similar way. Thus, the abscissa x of the minimum point, i.e. the optimal phase estimation, can be expressed as:

$$φ_{est,1} = \begin{cases} \frac{π (e_{3,1} - e_{2,1})}{12 (e_{3,1} - e_{1,1})} - \frac{π}{6}, & e_{3,1} > e_{2,1} \\ \frac{π (e_{3,1} - e_{2,1})}{12 (e_{2,1} - e_{1,1})} - \frac{π}{6}, & e_{3,1} \leq e_{2,1} \end{cases} \tag{8}$$
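As a worked check of the first branch (e_3,1 > e_2,1), the abscissa of the intersection follows from equating Line1 with its mirrored counterpart through (0, e_2,1). The symbol m is introduced here only as shorthand for the slope of Line1.

```latex
m = \frac{e_{3,1} - e_{1,1}}{(-\pi/3) - (-\pi/6)} = -\frac{6\,(e_{3,1} - e_{1,1})}{\pi}

e_{3,1} + m\left(x + \frac{\pi}{3}\right) = e_{2,1} - m\,x
\;\Longrightarrow\;
x = \frac{e_{2,1} - e_{3,1}}{2m} - \frac{\pi}{6}
  = \frac{\pi\,(e_{3,1} - e_{2,1})}{12\,(e_{3,1} - e_{1,1})} - \frac{\pi}{6}
```

The second branch follows in the same way with the slope of Line2, $6\,(e_{2,1} - e_{1,1})/π$, and the mirrored line passing through (−π/3, e_3,1).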

For the second case, when (0, e_2,1) is the minimum of the three points, as shown in Fig. 4(b), we can use (−π/6, e_1,1), (0, e_2,1) and (π/6, e_3,1) to calculate the minimum point which can be expressed as:

$$φ_{est,1} = \begin{cases} \frac{π (e_{1,1} - e_{3,1})}{12 (e_{3,1} - e_{2,1})}, & e_{3,1} > e_{1,1} \\ \frac{π (e_{1,1} - e_{3,1})}{12 (e_{1,1} - e_{2,1})}, & e_{3,1} \leq e_{1,1} \end{cases} \tag{9}$$

Similarly, for the third case, when (π/6, e_3,1) is the minimum of the three points, as shown in Fig. 4(c), we translate (−π/6, e_1,1) to the right by one period by adding π/2 to its abscissa. Thus, a new point (π/3, e_1,1) is obtained. Then, we can use (0, e_2,1), (π/6, e_3,1) and (π/3, e_1,1) to calculate the minimum point as follows:

$$φ_{est,1} = \begin{cases} \frac{π (e_{2,1} - e_{1,1})}{12 (e_{2,1} - e_{3,1})} + \frac{π}{6}, & e_{2,1} > e_{1,1} \\ \frac{π (e_{2,1} - e_{1,1})}{12 (e_{1,1} - e_{3,1})} + \frac{π}{6}, & e_{2,1} \leq e_{1,1} \end{cases} \tag{10}$$
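The three cases can be collected into one small routine. The following is a minimal sketch of Eqs. (8)-(10) as written above. The function name, the guard for the degenerate case of equal offset distances, and the idealized triangular metric used to exercise it are assumptions added for illustration.

```python
import math

def qla_first_stage(e1, e2, e3):
    """Coarse phase estimate from the offset distances measured at the
    test angles -pi/6, 0 and +pi/6 (three-case intersection, Eqs. (8)-(10))."""
    p = math.pi
    if e1 <= e2 and e1 <= e3:            # case 1: minimum at -pi/6
        den = (e3 - e1) if e3 > e2 else (e2 - e1)
        return -p / 6 if den == 0 else p * (e3 - e2) / (12 * den) - p / 6
    elif e2 <= e1 and e2 <= e3:          # case 2: minimum at 0
        den = (e3 - e2) if e3 > e1 else (e1 - e2)
        return 0.0 if den == 0 else p * (e1 - e3) / (12 * den)
    else:                                # case 3: minimum at +pi/6
        den = (e2 - e3) if e2 > e1 else (e1 - e3)
        return p / 6 if den == 0 else p * (e2 - e1) / (12 * den) + p / 6

def ideal_metric(x, phi0):
    """Idealized offset distance: a perfect V with period pi/2 and minimum at phi0."""
    d = (x - phi0) % (math.pi / 2)
    return min(d, math.pi / 2 - d)

for phi0 in (-0.3, 0.1, 0.3):            # one example per case
    e1, e2, e3 = (ideal_metric(a, phi0) for a in (-math.pi / 6, 0.0, math.pi / 6))
    print(phi0, qla_first_stage(e1, e2, e3))
```

For a perfectly linear, symmetric metric the intersection recovers the minimum exactly; with the real quasi-linear metric the result is an approximation, which is why a second stage follows.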

In the second stage, based on 𝜑_est,₁ of the first stage, we add four new rotation angles to accurately estimate the final phase noise as follows

$$φ_{i,2} = φ_{est,1} + i \cdot \mathrm{QLI}/4 - 5 \cdot \mathrm{QLI}/8, \quad i = 1, 2, 3, 4 \tag{11}$$

where QLI represents the quasi-linear interval shown in Fig. 2(b). N₂ consecutive samples are rotated by each of the four angles, and the four offset distances $e_{i,2}$ are then calculated by Eq. (6). Thus, we obtain four points (𝜑_1,2, e_1,2), (𝜑_2,2, e_2,2), (𝜑_3,2, e_3,2) and (𝜑_4,2, e_4,2), which probably lie on two symmetrical quasi-straight lines. Similarly, the intersection in the second stage, i.e. the final phase estimate, can be calculated by the same method as in the first stage. The calculation of the optimal phase estimate can be summarized as follows:

When (𝜑_2,2, e_2,2) or (𝜑_3,2, e_3,2) is the minimum of the four points, which means that the phase estimate from the first stage is very close to the actual phase noise, the four points are very likely located in the linear interval of the second stage. Therefore, we can use $(φ_{s-1,2}, e_{s-1,2})$, $(φ_{s,2}, e_{s,2})$ and $(φ_{s+1,2}, e_{s+1,2})$, with s = 2 or 3 denoting the index of the minimum, to calculate the intersection of the two straight lines as follows:

$$φ_{est,2} = \begin{cases} \frac{(φ_{s+1,2} - φ_{s,2}) \cdot (e_{s-1,2} - e_{s+1,2})}{2 (e_{s+1,2} - e_{s,2})} + \frac{φ_{s+1,2} + φ_{s-1,2}}{2}, & e_{s+1,2} > e_{s-1,2} \\ \frac{(φ_{s-1,2} - φ_{s,2}) \cdot (e_{s+1,2} - e_{s-1,2})}{2 (e_{s-1,2} - e_{s,2})} + \frac{φ_{s+1,2} + φ_{s-1,2}}{2}, & e_{s+1,2} \leq e_{s-1,2} \end{cases} \tag{12}$$

On the contrary, when the minimum of the four points is (𝜑_1,2, e_1,2) or (𝜑_4,2, e_4,2), the phase estimate 𝜑_est,₁ deviates from the true phase noise, which in turn causes the four rotation angles of the second stage to fall outside the linear interval. Therefore, in this case, we take the rotation angle of the minimum point as the approximate final phase estimate. Please note that the total number of rotation angles in the proposed two-stage QLA scheme is 2 + 4.
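The fine stage can be sketched in the same spirit. The code below generates the four test angles of Eq. (11), applies the three-point intersection when the minimum is an interior point, and otherwise falls back to the angle of the minimum. It assumes that the angular factor in the interior case is the spacing between adjacent test angles, which is the reading that matches the symmetric-line construction of the first stage. That reading, the function name, the QLI value and the V-shaped toy metric are assumptions for illustration.

```python
def qla_second_stage(phi_est1, qli, metric):
    """Fine phase estimate around phi_est1.

    metric(phi) must return the fine-stage offset distance for test angle phi.
    """
    # Eq. (11): four angles spaced by QLI/4 and centred on the coarse estimate.
    angles = [phi_est1 + i * qli / 4 - 5 * qli / 8 for i in (1, 2, 3, 4)]
    e = [metric(a) for a in angles]
    s = e.index(min(e))                      # 0-based index of the minimum

    if s in (0, 3):
        # Minimum at an outer angle: the points are probably outside the
        # linear interval, so the minimum angle itself is returned.
        return angles[s]

    e_left, e_min, e_right = e[s - 1], e[s], e[s + 1]
    step = angles[s + 1] - angles[s]         # spacing between adjacent angles
    den = max(e_left, e_right) - e_min       # rise of the steeper branch
    if den == 0:
        return angles[s]
    # Intersection of the steeper line with its mirror image.
    return angles[s] + step * (e_left - e_right) / (2 * den)

# Toy metric: a perfect V with its minimum at the true phase.
close = qla_second_stage(0.0, 0.4, lambda p: abs(p - 0.02))
far = qla_second_stage(0.0, 0.4, lambda p: abs(p - 0.30))
print(close, far)
```

In the first call the coarse estimate is close to the true phase and the intersection recovers it; in the second call the minimum lies at an outer test angle, so the fallback branch returns that angle.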

2.3 Complexity analysis and comparison

We compare the computational complexity (CC) of the proposed CPE scheme with traditional QPSK partitioning and BPS. For clarity, all complexities are calculated for a single polarization with phase unwrapping and compensation, and the computation is based on optimum implementations [25]. Thus, taking 64-QAM as an example, the complexity computation of the proposed scheme is summarized as follows.

For the first-stage CPE, we have: ① The extended QPSK partitioning operation takes 2 real multipliers and 1 real adder to calculate the amplitude of each symbol, and 1 comparator to compare the amplitude of each symbol with the threshold. Thus the partition of N₁ symbols requires 2N₁ real multipliers, N₁ real adders and N₁ comparators, while the rotation of C₄ requires N₁/2 real multipliers and N₁/4 real adders. ② The rotation by the two angles (−π/6 and π/6) requires 9/2∙N₁ real multipliers and 9/4∙N₁ real adders. ③ The calculation of the offset distance requires 3∙N₁∙9/16 real adders and 3∙3N₁∙9/16 comparators, and summing the offset distances requires 3∙(N₁∙9/16−1) real adders. Please note that the complexity of taking an absolute value is equal to that of a comparator. ④ The calculation of the optimal phase angle requires 2 real multipliers, 3 real adders and 3 comparators. In addition, the unwrapping operation requires 1 comparator and 1 real adder. Please note that the linear expressions are only used to describe the two straight lines and do not contribute to the complexity. ⑤ Finally, the phase compensation requires 4 real multipliers and 2 real adders.

For the proposed two-stage CPE algorithm, it is worth noting that once the optimal phase angle is calculated in ④, it is sent directly to the second stage to calculate the four new rotation angles, which means that unwrapping and phase compensation are not required at this point. Therefore, we have: ⑥ The calculation of Eq. (11) for the four rotation angles requires 5 real multipliers and 5 real adders. ⑦ Rotating all N₂ symbols by the four angles requires 4∙4N₂ real multipliers and 4∙2N₂ real adders. ⑧ In Eq. (6), calculating the offset distance for the real and imaginary parts of the rotated samples requires 4∙7N₂ real adders and 4∙8N₂ comparators. ⑨ In Eq. (12), calculating the final estimate $φ_{est,2}$ requires 1 comparator, 5 real adders and 4 real multipliers. ⑩ Finally, the unwrapping operation requires 1 comparator and 1 real adder, and the phase compensation requires 4 real multipliers and 2 real adders.

In conclusion, the overall CC is summarized in Table 1. The complexities of BPS and QPSK partitioning are also listed as reported in [25], where N is the block length and B is the number of test phases for BPS. It can be observed that the complexity of the first-stage QLA is close to that of QPSK partitioning, while the complexity of the proposed two-stage QLA is roughly one-tenth of that of BPS at B = 64 for 64-QAM systems.

Table 1. Computational complexity comparison of different CPE schemes

3. Simulations and performance evaluation

To analyze the impact of key parameters and comprehensively evaluate the performance of the proposed schemes, we construct a polarization-multiplexed 16-GBaud 4/16/32/64-QAM simulation platform in the back-to-back (BTB) scenario, where the phase noise is modeled as a Wiener process. White Gaussian noise is added in front of the receiver to emulate optical noise. Quadrature imbalance, timing error, polarization impairments and frequency offset are adequately compensated before CPE so that we can focus on the effect of the laser linewidth alone on performance. Differential coding is not applied, and more than 10⁶ symbols are used for BER counting to guarantee sufficient evaluation accuracy.
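A Wiener phase-noise sequence of the kind used in such simulations can be generated as a cumulative sum of Gaussian increments. The sketch below assumes the commonly used discrete-time model in which the per-symbol increments are independent, zero-mean Gaussian variables with variance 2π·Δv·T_s, where Δv is the combined linewidth. The paper does not spell out its generator, so this model, the function name and the numerical values are assumptions.

```python
import numpy as np

def wiener_phase_noise(num_symbols, dv_ts, rng):
    """Phase-noise samples (radians) as a random walk with per-symbol
    increment variance 2*pi*dv_ts (linewidth times symbol duration)."""
    sigma = np.sqrt(2.0 * np.pi * dv_ts)
    return np.cumsum(rng.normal(0.0, sigma, num_symbols))

rng = np.random.default_rng(1)
dv_ts = 1e-4                                  # illustrative value
phase = wiener_phase_noise(200_000, dv_ts, rng)

# The empirical increment variance should be close to 2*pi*dv_ts.
increment_variance = np.var(np.diff(phase))
print(increment_variance, 2.0 * np.pi * dv_ts)
```

Multiplying a symbol sequence by `np.exp(1j * phase)` and adding complex white Gaussian noise then gives a simple test signal for a CPE routine.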

3.1 Block length optimization

Generally speaking, a larger block length contributes more to the mitigation of additive noise, while a shorter block length is preferred for accelerating the phase tracking and reducing the correlation of phase noise within a block. For the proposed schemes in particular, this tradeoff determines the block length that maximizes the linewidth tolerance of the first-stage QLA and, in turn, increases the probability that the three points fall correctly within the linear interval for the subsequent fine QLA. On the other hand, the CC increases with the block length. Therefore, as the block length has an evident effect on the performance of CPE, obtaining the optimum block length is critical for a reasonable evaluation and a fair comparison of different CPEs. Considering the state-of-the-art FEC limit, a target BER of 10⁻² is used to evaluate the performance of the schemes. The reference OSNR, which varies with the MF, is obtained at BER = 1 × 10⁻² without any phase noise or CPE algorithm. We compare the performance of the proposed scheme with conventional QPSK partitioning and BPS. Here, taking the tradeoff between CC and performance into account, the number of test phases for BPS is optimized for a fair comparison as reported in [11], where 32, 32, 32 and 64 test phases are required for QPSK, 16-QAM, 32-QAM and 64-QAM, respectively. Please note that the number of test phases for BPS is fixed in our subsequent investigation.

Figure 5 shows the relationship between the tolerated linewidth times symbol duration product $Δ v \cdot T_{s}$ and the block length for QPSK, 16-QAM, 32-QAM and 64-QAM, under the condition of 1-dB OSNR penalty at BER = 1 × 10⁻². It can be observed that the tolerance of $Δ v \cdot T_{s}$ rises steeply with growing block length until it reaches a maximum value, and then gradually degrades if the block length is further enlarged. In the case of 1-dB OSNR penalty, we find that the optimum block lengths of the proposed first-stage QLA, BPS and conventional QPSK partitioning are 80, 40 and 120 for 64-QAM, respectively. Similarly, the optimum block lengths are 120, 60 and 200 for 32-QAM, 40, 26 and 42 for 16-QAM, and 30, 28 and 32 for QPSK. After obtaining and fixing the optimum N₁ in each case, the best N₂ (N₂ < N₁) of the second-stage QLA is also investigated; the values are 18, 18, 30 and 26 for QPSK, 16-QAM, 32-QAM and 64-QAM, respectively, as shown in Fig. 5(e). In addition, we can see from the figure that the optimum block length of the first-stage QLA lies between those of the other two schemes, and the maximum $Δ v \cdot T_{s}$ tolerance of the QLA scheme is superior to that of QPSK partitioning. However, the reduction in optimum block length is not very significant compared with the increase in available symbols (from 25% to 75% for 32-QAM and from 19% to 56% for 64-QAM). The main reason is the phase offsets of the QPSK-like symbols, which need a longer block length to average out the phase errors. Thus, we can conclude that the increased number of available symbols leads to a shorter block length and helps to accelerate the phase tracking of the first-stage CPE. It is noteworthy that the optimum block lengths of 32-QAM for the different CPEs are larger than those of 16-QAM and 64-QAM. The reasons are three-fold: the pattern effect in BPS [21], the lack of high-SNR available symbols in QPSK partitioning, and the larger phase offsets in the first-stage QLA.

Fig. 5 The effect of block length on linewidth tolerance for (a) QPSK, (b) 16-QAM, (c) 32-QAM and (d) 64-QAM in the first stage. (e) The optimum block length in the second stage QLA.

After determining the optimal block length, the CC of the different schemes can be analyzed and summarized in detail. Here, we compare only the numbers of multiplications and additions, since they are the main contributors to CC. Compared with BPS, which has optimal linewidth tolerance, we find that the CC of the two-stage QLA scheme is significantly reduced, by factors of 10.7/6.2, 8.7/5.4, 8.8/6.1 and 15.7/10.3 (in the form of multiplications/additions) for QPSK, 16-QAM, 32-QAM and 64-QAM, respectively. For instance, given 1-dB OSNR penalty at BER = 1 × 10⁻² in 64-QAM systems, the two-stage QLA with the optimal block lengths of N₁ = 80 and N₂ = 26 shows better performance than BPS with N = 40. The required number of multipliers/adders is 991/1499 for the two-stage QLA, while it is 15520/15378 for BPS, which results in a reduction factor of 15.7/10.3.
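The 64-QAM figures quoted above can be reproduced by adding up items ① to ⑩ of Section 2.3, leaving out the first-stage unwrapping and phase compensation as stated there. In the sketch below the factor 9/16 is read as the share of symbols usable in the first stage (36 of 64), and the BPS counts are simply copied from the text. That reading and the function name are assumptions.

```python
def two_stage_qla_cost(n1, n2):
    """Real multipliers and real adders of the two-stage QLA for 64-QAM,
    tallied from items (1)-(10) of the complexity analysis."""
    usable = 9 * n1 / 16                 # symbols used in the first stage

    mult = 0.0
    add = 0.0
    # (1) partitioning of N1 symbols plus the rotation of ring C4
    mult += 2 * n1 + n1 / 2
    add += n1 + n1 / 4
    # (2) rotation by -pi/6 and +pi/6
    mult += 9 * n1 / 2
    add += 9 * n1 / 4
    # (3) offset distances for three angles and their summation
    add += 3 * usable + 3 * (usable - 1)
    # (4) coarse angle; first-stage unwrapping and (5) compensation are skipped
    mult += 2
    add += 3
    # (6) four new rotation angles
    mult += 5
    add += 5
    # (7) rotation of N2 symbols by four angles
    mult += 4 * 4 * n2
    add += 4 * 2 * n2
    # (8) fine-stage offset distances
    add += 4 * 7 * n2
    # (9) final estimate
    mult += 4
    add += 5
    # (10) unwrapping and phase compensation
    mult += 4
    add += 1 + 2
    return int(mult), int(add)

qla_mult, qla_add = two_stage_qla_cost(80, 26)
bps_mult, bps_add = 15520, 15378         # BPS values quoted in the text
print(qla_mult, qla_add)
print(round(bps_mult / qla_mult, 1), round(bps_add / qla_add, 1))
```

With N₁ = 80 and N₂ = 26 this tally gives 991 multipliers and 1499 adders, and the ratios to the quoted BPS counts round to 15.7 and 10.3.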

3.2 Linewidth tolerance evaluation

Secondly, the linewidth tolerance, defined as linewidth times symbol duration product $Δ v \cdot T_{s}$ for 1dB OSNR penalty at BER = 1 × 10⁻², of the proposed CPE scheme is evaluated and compared with the QPSK partitioning and BPS, as shown in Fig. 6. It should be noted that the block length is fixed once the optimal value has been determined under the condition of 1-dB OSNR penalty at BER = 1 × 10⁻². The OSNR obtained without any phase noise and CPE algorithms at BER = 1 × 10⁻² is used as the reference for the penalties.

Fig. 6 Required OSNR versus the linewidth tolerance $Δ v \cdot T_{s}$ at a target BER of 1 × 10⁻² for (a) QPSK, (b) 16-QAM, (c) 32-QAM and (d) 64-QAM systems.

Figure 6 shows the required OSNR versus the $Δ v \cdot T_{s}$ tolerance at a target BER of 1 × 10⁻². It is observed that, for a 1-dB required-OSNR penalty, the first-stage QLA and QPSK partitioning have almost the same linewidth tolerance. However, as the OSNR increases, the linewidth tolerance of the first-stage QLA becomes progressively better than that of QPSK partitioning. In addition, $Δ v \cdot T_{s}$ of 1 × 10⁻³, 4.5 × 10⁻⁴, 1.6 × 10⁻⁴ and 9.4 × 10⁻⁵ can be tolerated for 4/16/32/64-QAM by the proposed two-stage QLA scheme, while the corresponding values are 8.5 × 10⁻⁴, 3.6 × 10⁻⁴, 1.1 × 10⁻⁴ and 6.2 × 10⁻⁵ for BPS. Therefore, significant improvements of about 18%, 25%, 45% and 51% in the linewidth tolerance $Δ v \cdot T_{s}$ are achieved for 4/16/32/64-QAM, respectively. Meanwhile, the better the OSNR is, the greater the difference in linewidth tolerance within each of the two pairs of compared schemes (the first-stage QLA versus QPSK partitioning, and the two-stage QLA versus BPS). This phenomenon can be attributed to the proposed two-stage configuration, in which better mitigation of additive noise is achieved in the first stage and faster phase tracking is obtained in the second stage owing to its shorter block length.
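The quoted improvements follow directly from the ratios of the tolerated $Δ v \cdot T_{s}$ values. The short check below uses only the numbers given in the paragraph above; the variable names are illustrative.

```python
formats = ("4-QAM", "16-QAM", "32-QAM", "64-QAM")
qla = (1e-3, 4.5e-4, 1.6e-4, 9.4e-5)      # two-stage QLA tolerance
bps = (8.5e-4, 3.6e-4, 1.1e-4, 6.2e-5)    # BPS tolerance
quoted = (18, 25, 45, 51)                 # improvements stated in the text, in percent

# Relative improvement of the QLA tolerance over the BPS tolerance.
gains = [100.0 * (q / b - 1.0) for q, b in zip(qla, bps)]
for name, gain, ref in zip(formats, gains, quoted):
    print(f"{name}: {gain:.1f}% (quoted: about {ref}%)")
```

The computed gains agree with the quoted values to within one percentage point.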

3.3 Phase tracking performance

Then, to further disclose the reasons for the different performance of CPE schemes, taking 16-QAM and 64-QAM as examples, we investigate the estimation accuracy and phase tracking performance of the proposed scheme. The QPSK partitioning and BPS schemes with a typical number of test phases are also introduced as contrasts, respectively.

The simulation results of the phase deviation from the actual phase and the corresponding standard deviation are presented in Fig. 7. We can see that the phase deviation and the standard deviation of the first-stage QLA are smaller than those of QPSK partitioning, but a relatively large phase deviation still exists. Fortunately, instead of searching the phase blindly with a fixed step size as the traditional BPS algorithm does, the two-stage QLA obtains a more robust and accurate phase estimate, resulting in a smaller phase deviation and standard deviation, which demonstrates the superior phase tracking performance of the proposed scheme. It is worth noting that 16-QAM at $Δ v \cdot T_{s} = 2 \times 10^{-4}$ has a relatively larger deviation than 64-QAM because of its heavier linewidth burden.

Fig. 7 Deviations to the actual phase of different CPE schemes, (a) 16-QAM at OSNR = 18 dB and $Δ v \cdot T_{s} = 2 \times 10^{- 4}$ , (b) 64-QAM at OSNR = 28 dB and $Δ v \cdot T_{s} = 8 \times 10^{- 5}$ .

3.4 BER performance

Finally, the overall performance of the proposed scheme is analyzed and compared with BPS by investigating the dependence of the BER on the OSNR in the BTB system for various linewidths. The theoretical optimum, given as a reference, is calculated by Eq. (13) [26]:

$$\begin{cases} \mathrm{BER} = \dfrac{2}{\log_{2} M} \left(1 - \dfrac{1}{\sqrt{M}}\right) \mathrm{erfc}\left[\sqrt{\dfrac{3 \log_{2} M}{2 (M - 1)} \cdot \dfrac{E_{b}}{N_{0}}}\right] \\ \mathrm{OSNR} = \dfrac{E_{b}}{N_{0}} \cdot \dfrac{R_{b}}{2 B_{\mathrm{ref}}} \end{cases} \tag{13}$$

where M is the modulation order, $R_{b}$ is the total bit rate and $B_{\mathrm{ref}}$ = 12.5 GHz. The resulting relationship between BER and OSNR is shown in Fig. 8, using fixed block lengths determined under the condition of a 1 dB OSNR penalty at BER = 1 × 10⁻². Owing to its high estimation accuracy, the proposed scheme shows slightly better OSNR performance than BPS in the lower OSNR region. In the higher OSNR region, a significantly reduced OSNR penalty is observed for the first-stage QLA and the two-stage QLA compared with QPSK partitioning and BPS, respectively.
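
Eq. (13) can be evaluated numerically in a few lines. The sketch below is illustrative only: the function names are hypothetical, and the example bit rate assumes a 16 GBaud polarization-multiplexed QPSK signal (2 polarizations × 2 bits × 16 GBaud = 64 Gb/s), a value borrowed from the experimental section rather than stated for the simulations.

```python
import math

B_REF_HZ = 12.5e9  # reference noise bandwidth given in the text


def ebn0_from_osnr(osnr_db, bit_rate):
    """Second line of Eq. (13), solved for Eb/N0 (linear scale)."""
    return 10.0 ** (osnr_db / 10.0) * 2.0 * B_REF_HZ / bit_rate


def theoretical_ber(osnr_db, m, bit_rate):
    """First line of Eq. (13) for an M-QAM format."""
    k = math.log2(m)
    ebn0 = ebn0_from_osnr(osnr_db, bit_rate)
    arg = math.sqrt(3.0 * k / (2.0 * (m - 1)) * ebn0)
    return 2.0 / k * (1.0 - 1.0 / math.sqrt(m)) * math.erfc(arg)


# Assumed example: 16 GBaud PM-QPSK, Rb = 64 Gb/s, OSNR = 12.5 dB.
ber_qpsk = theoretical_ber(12.5, 4, 64e9)
print(f"BER = {ber_qpsk:.2e}")
```

For M = 4 the expression reduces to the familiar QPSK result, 0.5·erfc(√(Eb/N0)), which provides a quick sanity check of the implementation.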

Fig. 8 BER as a function of OSNR for (a) QPSK with $Δ v \cdot T_{s}$ of 3 × 10⁻⁴ and 32-QAM with $Δ v \cdot T_{s}$ of 6 × 10⁻⁵, (b) 16-QAM with $Δ v \cdot T_{s}$ of 1.25 × 10⁻⁴ and 64-QAM with $Δ v \cdot T_{s}$ of 4 × 10⁻⁵.

Constellation diagrams at the output of BPS and the two-stage QLA for the four formats are shown in Fig. 9. In each case, the simulation uses the OSNR that is theoretically required for a BER of 10⁻⁴. BPS with a typical number of test phases yields a poor constellation, whereas the constellation recovered by the two-stage QLA scheme is noticeably cleaner.

Fig. 9 Constellation diagrams recovered by BPS (upper) and the proposed two-stage QLA scheme (bottom) for (a) QPSK at OSNR = 12.5 dB and $Δ v \cdot T_{s} = 8 \times 10^{- 4}$ , (b) 16-QAM at OSNR = 19.2 dB and $Δ v \cdot T_{s} = 4 \times 10^{- 4}$ , (c) 32-QAM at OSNR = 22.4 dB and $Δ v \cdot T_{s} = 9 \times 10^{- 5}$ , (d) 64-QAM at OSNR = 25.4 dB and $Δ v \cdot T_{s} = 6 \times 10^{- 5}$ .
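
The OSNR values in the caption of Fig. 9 can be cross-checked by inverting Eq. (13) for a target BER of 10⁻⁴. The sketch below does this by bisection. It rests on two assumptions that are not stated for the simulations: a 16 GBaud polarization-multiplexed signal (the experimental symbol rate), and the use of the square-QAM expression for 32-QAM as an approximation. Function names are hypothetical.

```python
import math

B_REF_HZ = 12.5e9  # reference noise bandwidth from the text
BAUD = 16e9        # assumption: symbol rate taken from the experimental setup


def ber_eq13(osnr_db, m, bit_rate):
    """Eq. (13): theoretical BER of M-QAM at a given OSNR in dB."""
    k = math.log2(m)
    ebn0 = 10.0 ** (osnr_db / 10.0) * 2.0 * B_REF_HZ / bit_rate
    arg = math.sqrt(3.0 * k / (2.0 * (m - 1)) * ebn0)
    return 2.0 / k * (1.0 - 1.0 / math.sqrt(m)) * math.erfc(arg)


def required_osnr_db(target_ber, m, bit_rate, lo=0.0, hi=40.0):
    """Bisection on OSNR in dB; the BER decreases monotonically with OSNR."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if ber_eq13(mid, m, bit_rate) > target_ber:
            lo = mid   # BER still too high, need more OSNR
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Total bit rate: 2 polarizations x log2(M) bits x symbol rate.
required = {m: required_osnr_db(1e-4, m, 2 * math.log2(m) * BAUD)
            for m in (4, 16, 32, 64)}

for m, osnr in required.items():
    print(f"{m}-QAM: {osnr:.1f} dB")
```

Under these assumptions the computed values agree with the 12.5, 19.2, 22.4 and 25.4 dB quoted in the caption to within roughly 0.1 dB.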

4. Experiment setup and results

Figure 10 shows the experimental setup for 16 GBaud PM-QPSK and PM-16QAM coherent systems used for BTB and 800 km transmission measurements. A 2¹⁷−1 PRBS is used to generate multi-level Gray-mapped QPSK/16-QAM signals. The signals (Ix, Qx, Iy, Qy) are then sent to four synchronized 8-bit DACs with ~13 GHz analog bandwidth operating at 64 GSa/s. Fourfold oversampling is used to generate the 16 GBaud signals, which drive a polarization-multiplexed I/Q modulator. An integrated tunable external cavity laser (ECL) at 193.475 THz with a measured linewidth of 50 kHz is used as the transmitter laser. The fiber link is composed of eight 100 km spans of standard single-mode fiber (SSMF) with a low attenuation of 0.159 dB/km. Variable amounts of ASE noise are added to adjust the OSNR, which is measured by an optical spectrum analyzer with a 0.1 nm noise reference bandwidth. The amplified signal is then filtered by an optical band-pass filter (OBPF). The local oscillator (LO) laser is identical to the transmitter laser, with the same measured linewidth of 50 kHz at 193.475 THz. The optical signal is detected by a coherent receiver and digitized by an 80 GSa/s real-time sampling oscilloscope. In the offline DSP, the received signal is first down-sampled to 2 samples/symbol; I/Q imbalance compensation, chromatic dispersion (CD) compensation and clock recovery are then carried out, and the signal is passed through a butterfly FIR filter for polarization demultiplexing. The frequency offset is estimated by a 4th-power frequency-domain FFT algorithm. Viterbi-Viterbi, QPSK partitioning, BPS and the two-stage QLA are used for CPE. Finally, more than 10⁶ symbols are used for BER counting.
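
The principle of the 4th-power frequency-domain estimator mentioned above can be sketched as follows for QPSK at one sample per symbol. This is an illustration of the general technique, not the authors' implementation: it uses a naive DFT in place of an FFT to stay dependency-free, and the block size, noise level and test offset are hypothetical values chosen for the example.

```python
import cmath
import math
import random


def estimate_frequency_offset(samples, symbol_rate):
    """Fourth-power spectral-peak estimator at 1 sample/symbol.

    Raising QPSK symbols to the 4th power strips the modulation and leaves
    a tone at four times the frequency offset; the DFT peak locates it.
    """
    n = len(samples)
    fourth = [s ** 4 for s in samples]
    best_bin, best_mag = 0, -1.0
    for k in range(n):  # naive DFT; an FFT would be used in practice
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(fourth))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    if best_bin >= n // 2:  # map upper bins to negative frequencies
        best_bin -= n
    return best_bin * symbol_rate / (4.0 * n)


rng = random.Random(0)
SYMBOL_RATE = 16e9     # 16 GBaud, as in the experiment
N = 256                # hypothetical block size
TRUE_OFFSET = 125e6    # hypothetical offset; lies on a DFT bin for this N

rx = []
for i in range(N):
    sym = cmath.exp(1j * (math.pi / 4 + rng.randrange(4) * math.pi / 2))
    rot = cmath.exp(2j * math.pi * TRUE_OFFSET * i / SYMBOL_RATE)
    noise = complex(rng.gauss(0.0, 0.05), rng.gauss(0.0, 0.05))
    rx.append(sym * rot + noise)

estimated = estimate_frequency_offset(rx, SYMBOL_RATE)
print(f"estimated offset: {estimated / 1e6:.1f} MHz")
```

The estimator resolves offsets in steps of the symbol rate divided by 4N and is unambiguous only within ± one eighth of the symbol rate, because the 4th-power operation multiplies the offset by four.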

Fig. 10 Experiment setup for 16 GBaud PM-QPSK and PM-16QAM coherent systems.

Figures 11(a) and 11(b) show the BER versus OSNR curves for 16 GBaud PM-QPSK and PM-16QAM, respectively, in the BTB and 800 km transmission scenarios. Simulation results with the experimental parameters are presented for comparison, with the transmitter and LO given the same linewidth of 2.4 MHz for QPSK and 1 MHz for 16QAM. The theoretical results are also given as a reference. The experimental results show that the transmission penalties of the PM-QPSK and PM-16QAM systems are 0.8 dB and 1 dB, respectively. The linewidth-induced phase impairment is compensated effectively by the proposed two-stage QLA scheme. Furthermore, the two-stage QLA shows slightly better BER performance than Viterbi-Viterbi, QPSK partitioning and BPS, which is consistent with the simulation results in Section 3.4 under this very small linewidth condition. For larger linewidths, the simulation results show that the two-stage QLA could be superior to the other two schemes.

Fig. 11 BER as a function of OSNR for 16 GBaud (a) PM-QPSK, (b) PM-16QAM systems.

5. Conclusion

We propose and verify an innovative multi-format CPE scheme for coherent optical m-QAM flexible transmission systems, based on extended QPSK partitioning and quasi-linear approximation. The scheme combines the low complexity of QPSK partitioning with the excellent linewidth tolerance of BPS. Comprehensive simulations demonstrate its versatility and high estimation accuracy, and show that, for a 1 dB OSNR penalty at BER = 1 × 10⁻², linewidth and symbol duration products $Δ f \cdot T_{s}$ of 5.5 × 10⁻⁴, 1.8 × 10⁻⁴, 5.7 × 10⁻⁵ and 1.5 × 10⁻⁵ are tolerated for 4/16/32/64-QAM, respectively, an improvement in linewidth tolerance of about 18%, 25%, 45% and 51% over BPS. In addition, compared with BPS, the computational complexity (CC) is reduced by factors of 10.7/6.2, 8.7/5.4, 8.8/6.1 and 15.7/10.3 (multipliers/adders) for 4/16/32/64-QAM, respectively, with better performance. We also experimentally evaluate its performance for 16 GBaud BTB and 800 km SSMF transmission, and observe slightly better OSNR performance for the proposed scheme. These advantages make the proposed scheme a promising CPE candidate for multiple high-order m-QAM signals in EONs.

Funding

National Natural Science Foundation of China (61571061).

References and links

1. O. Gerstel, M. Jinno, A. Lord, and S. J. B. Yoo, “Elastic optical networking: a new dawn for the optical layer?” IEEE Commun. Mag. 50(2), s12–s20 (2012).

2. X. Zhou and L. Nelson, “Advanced DSP for 400 Gb/s and Beyond Optical Networks,” J. Lightwave Technol. 32(16), 2716–2725 (2014).

3. A. Lau, Y. Gao, Q. Sui, D. Wang, Q. Zhuge, M. Morsy-Osman, M. Chagnon, X. Xu, C. Lu, and D. Plant, “Advanced DSP techniques enabling high spectral efficiency and flexible transmissions: toward elastic optical networks,” IEEE Signal Process. Mag. 31(2), 82–92 (2014).

4. D. A. Morero, M. A. Castrillón, A. Aguirre, M. R. Hueda, and O. E. Agazzi, “Design Tradeoffs and Challenges in Practical Coherent Optical Transceiver Implementations,” J. Lightwave Technol. 34(1), 121–136 (2016).

5. Q. Zhuge, M. Morsy-Osman, X. Xu, M. Chagnon, M. Qiu, and D. V. Plant, “Spectral Efficiency-Adaptive Optical Transmission Using Time Domain Hybrid QAM for Agile Optical Networks,” J. Lightwave Technol. 31(15), 2621–2628 (2013).

6. P. Winzer, “High-spectral-efficiency optical modulation formats,” J. Lightwave Technol. 30(8), 3824–3835 (2012).

7. S. Zhang and F. Yaman, “Design and Comparison of Advanced Modulation Formats Based on Generalized Mutual Information,” J. Lightwave Technol. 36(2), 416–423 (2018).

8. Z. He, W. Liu, B. Shen, X. Chen, X. Gao, S. Shi, Q. Zhang, D. Shang, Y. Ji, and Y. Liu, “Flexible multidimensional modulation formats based on PM-QPSK constellations for elastic optical networks,” Chin. Opt. Lett. 14(4), 20–23 (2016).

9. M. Xiang, S. Fu, L. Deng, M. Tang, P. Shum, and D. Liu, “Low-complexity feed-forward carrier phase estimation for M-ary QAM based on phase search acceleration by quadratic approximation,” Opt. Express 23(15), 19142–19153 (2015).

10. B. Baeuerle, A. Josten, F. Abrecht, M. Eppenberger, E. Dornbierer, D. Hillerkuss, and J. Leuthold, “Multi-format carrier recovery for coherent real-time reception with processing in polar coordinates,” Opt. Express 24(22), 25629–25640 (2016).

11. T. Pfau, S. Hoffmann, and R. Noe, “Hardware-efficient coherent digital receiver concept with feed forward carrier recovery for M-QAM constellations,” J. Lightwave Technol. 27(24), 989–999 (2009).

12. Y. Gao, A. P. T. Lau, S. Yan, and C. Lu, “Low-complexity and phase noise tolerant carrier phase estimation for dual-polarization 16-QAM systems,” Opt. Express 19(22), 21717–21729 (2011).

13. X. Zhou, “An improved feed-forward carrier recovery algorithm for coherent receivers with M-QAM modulation format,” IEEE Photonics Technol. Lett. 14(14), 1051–1053 (2010).

14. S. M. Bilal, C. R. Fludger, V. Curri, and G. Bosco, “Multistage carrier phase estimation algorithms for phase noise mitigation in 64-quadrature amplitude modulation optical systems,” J. Lightwave Technol. 32(17), 2973–2980 (2014).

15. X. Zhou, K. Zhong, Y. Gao, C. Lu, A. P. T. Lau, and K. Long, “Modulation-format-independent blind phase search algorithm for coherent optical square M-QAM systems,” Opt. Express 22(20), 24044–24054 (2014).

16. M. Xiang, Q. Zhuge, M. Qiu, X. Zhou, M. Tang, D. Liu, S. Fu, and D. V. Plant, “RF-pilot aided modulation format identification for hitless coherent transceiver,” Opt. Express 25(1), 463–471 (2017).

17. G. Liu, R. Proietti, K. Zhang, H. Lu, and S. J. Ben Yoo, “Blind modulation format identification using nonlinear power transformation,” Opt. Express 25(25), 30895–30904 (2017).

18. Q. Zhuge, C. Chen, and D. V. Plant, “Low computation complexity two-stage feedforward carrier recovery algorithm for M-QAM,” in Proceedings of OFC (2011), paper OMJ5.

19. K. P. Zhong, J. H. Ke, Y. Gao, and J. C. Cartledge, “Linewidth tolerant and low-complexity two-stage carrier phase estimation based on modified QPSK partitioning for dual-polarization 16-QAM systems,” J. Lightwave Technol. 31(1), 50–57 (2013).

20. I. Fatadin, D. Ives, and S. J. Savory, “Carrier-phase estimation for 16-QAM optical coherent systems using QPSK partitioning with barycenter approximation,” J. Lightwave Technol. 32(13), 2420–2427 (2014).

21. J. Li, L. Li, Z. Tao, T. Hoshida, and J. C. Rasmussen, “Laser-Linewidth-Tolerant Feed-Forward Carrier Phase Estimator with Reduced Complexity for QAM,” J. Lightwave Technol. 29(16), 2358–2364 (2011).

22. J. Han, W. Li, L. Huang, H. Li, Q. Hu, and S. Yu, “Carrier Phase Estimation Based on Error Function Calculation for 16-QAM Systems,” IEEE Photonics Technol. Lett. 28(22), 2561–2564 (2016).

23. J. Feng, W. Li, J. Xiao, J. Han, H. Li, L. Huang, and Y. Zheng, “Carrier Phase Estimation for 32-QAM Optical Systems Using Quasi-QPSK partitioning Algorithm,” IEEE Photonics Technol. Lett. 28(1), 75–78 (2015).

24. T. Yang, X. Chen, S. Shi, E. Sun, and C. Shi, “Low-complexity and modulation-format-independent carrier phase estimation scheme using linear approximation for elastic optical networks,” Opt. Commun. 410, 376–384 (2018).

25. J. H. Ke, K. P. Zhong, Y. Gao, J. C. Cartledge, A. S. Karar, and M. A. Rezania, “Linewidth-Tolerant and Low-Complexity Two-Stage Carrier Phase Estimation for Dual-Polarization 16-QAM Coherent Optical Fiber Communications,” J. Lightwave Technol. 30(24), 3987–3992 (2012).

26. J. D. Proakis and M. Salehi, Digital Communications (McGraw-Hill, 2008).

Scheme	Real multipliers	Real adders	Comparators	LUTs	Decisions
QPSK partitioning	10N + 2	5N	2N + 1	1	N
BPS	6N∙B + 4N	6N∙B − B + 2N + 2	B	0	N∙B + N
First-stage QLA	7N₁ + 6	6N₁ + 7N₁/8 + 3	6N₁ + N₁/16 + 4	0	0
Two-stage QLA	7N₁ + 16N₂ + 15	6N₁ + 7N₁/8 + 36N₂ + 13	6N₁ + N₁/16 + 32N₂ + 5	0	0
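
The multiplier and adder expressions in the BPS and two-stage QLA rows of the table can be evaluated directly, as sketched below. The interpretation of the symbols is an assumption here: N is taken as the block length and B as the number of test phases, following common BPS notation, with N₁ and N₂ as the block lengths of the two QLA stages. The parameter values are hypothetical and are not those used in the paper, so the example does not attempt to reproduce the reduction factors quoted in the conclusion.

```python
def bps_cost(n, b):
    """BPS row of the table: (real multipliers, real adders)."""
    multipliers = 6 * n * b + 4 * n
    adders = 6 * n * b - b + 2 * n + 2
    return multipliers, adders


def two_stage_qla_cost(n1, n2):
    """Two-stage QLA row of the table: (real multipliers, real adders)."""
    multipliers = 7 * n1 + 16 * n2 + 15
    adders = 6 * n1 + 7 * n1 / 8 + 36 * n2 + 13
    return multipliers, adders


# Hypothetical parameters, chosen only to exercise the formulas.
N, B, N1, N2 = 32, 32, 64, 16
bps_mult, bps_add = bps_cost(N, B)
qla_mult, qla_add = two_stage_qla_cost(N1, N2)

print(f"BPS:           {bps_mult} multipliers, {bps_add} adders")
print(f"Two-stage QLA: {qla_mult} multipliers, {qla_add:.0f} adders")
```

The BPS cost grows with the product N∙B, whereas the QLA expressions are linear in the block lengths, which is the source of the complexity advantage discussed in the paper.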

Optics Express