Realization of real-time DSP for C-band PAM-4 transmission in inter-datacenter network

Sang-Rok Moon; Seung-Woo Lee; In-Ki Hwang; Hun-Sik Kang; Hae Young Rha; Joon Ki Lee

doi:10.1364/OE.382194

1. Introduction

The evolution in signal format from non-return to zero (NRZ) to pulse amplitude modulation-4 (PAM-4) has been one of the biggest issues in optical networks because of the rapid increase in bandwidth requirement. The intensity modulation/direct detection (IM/DD) PAM-4 format was first introduced for short-reach transmission in intra-datacenter networks, because of its spectral efficiency, cost-effectiveness, energy efficiency, and simplicity. However, its application is now being expanded to inter-datacenter networks [1,2] or mobile front-haul [3], using C-band.

One of the most important issues for the C-band IM/DD PAM-4 system is coping with inter-symbol interference (ISI), which is induced by chromatic dispersion in the optical fiber. Although dispersion compensating fiber (DCF) and dispersion compensating modules (DCMs) can be used as bulk dispersion compensators, dispersion compensation using digital signal processing (DSP) at the receiver side has shown promise for guaranteeing fine-tuned dispersion tolerance. Several novel DSP algorithms have been proposed and their performance reported [4–14]. However, most of these results were obtained in off-line simulation, and only a few reports have shown real-time demonstrations [10–14]. To the best of our knowledge, the reported real-time dispersion compensation capability at > 50 Gb/s has been limited to 170 ps/nm ∼ 260 ps/nm (corresponding to a transmission distance of 10 km ∼ 15 km) [12–14].

In this paper, we report the performance of a DSP for a 56 Gb/s PAM-4 transmission system, realized in real time on field-programmable gate arrays (FPGAs). In Section 2, we introduce our DSP architecture and discuss implementation issues. The DSP algorithm is based on our previous results [15–17].

To increase the dispersion compensation capacity, a combined structure of a decision feedback equalizer (DFE) and a maximum likelihood sequence equalizer (MLSE) was employed. The Mueller-Müller (M-M) algorithm was used for timing recovery. For forward error correction (FEC), a low-overhead low-density parity check (LDPC) code (8.51% overhead) was used instead of the conventional Reed-Solomon (RS) code to provide better coding gain. Since the LDPC code requires probability information for each bit, the MLSE was modified to produce a soft output using the soft-decision Viterbi algorithm (SOVA). The SOVA was modified from the conventional SOVA to reduce implementation complexity [17]. To mitigate the effect of burst errors, a block interleaver was used.

In Section 3, we analyze the transmission results of the real-time DSP, which demonstrated 25 km transmission, corresponding to a dispersion compensation capacity of 425 ps/nm. We empirically verified the feasibility of the DSP in an inter-datacenter network. We assumed a configuration that employs DCF/DCM and the DSP simultaneously, where the DCF/DCM provides a fixed amount of dispersion compensation and the DSP provides dispersion tolerance. With a DCF providing -1013 ps/nm (equivalent to the dispersion of ∼ 60 km of single mode fiber (SMF)) and the DSP, we were able to achieve error-free transmission for 35 km ∼ 85 km. In Section 4, we summarize the results.

2. System architecture

2.1. Overview of PAM-4 transmission system

Figure 1 shows the schematic of a conventional PAM-4 transmission system. At the transmitter-DSP (Tx-DSP), the bit stream is encoded by an FEC encoder and mapped into the PAM-4 signal format. The output of the Tx-DSP is fed to the optical modulator for E-O conversion. The optical signal is transmitted along the optical fiber, where an optical amplifier can be used to compensate for the transmission loss. The optical signal is converted to an electrical signal by a photo-detector (PD). At the receiver-DSP (Rx-DSP), the signal is synchronized, equalized, de-mapped, and finally decoded by the FEC decoder into a bit stream.

Fig. 1. Schematics of conventional PAM-4 system.

Our real-time DSP was designed for the PAM-4 transmission system described above. Details of the DSP are explained in the following sections.

2.2. Tx DSP architecture

Figure 2(a) shows the function blocks of the Tx-DSP. As the FEC encoder, we used a quasi-cyclic LDPC (QC-LDPC) (2256, 2448) encoder with submatrix size Z = 48. Its encoding matrix was constructed from additive groups of prime fields [18], and the threshold (pre-FEC) BER required to reach a decoded BER of 1 × 10⁻¹² was 1 × 10⁻³ [16]. For real-time operation, three LDPC encoders were operated in parallel. Each LDPC encoder was implemented using simple shift registers and logic circuits. The base parity check matrix and circular shifter were hard-wired for simplicity.

Fig. 2. Tx-DSP architecture and frame format. (a) Tx-DSP structure, (b) Frame structure.

After LDPC encoding, a block interleaver spread the bit stream to minimize the effect of burst errors after transmission [16]. Simple block interleaving without permutations was used, and the block had the form of a 12 × 1224 row-column matrix. The bit stream was then mapped into the PAM-4 signal format with Gray coding. A termination symbol was inserted after every 31 symbols, so that a block consisted of 32 symbols. The termination symbols were used to terminate the Viterbi paths in the MLSE. They also enabled the realization of a highly parallel decision feedback equalizer by cutting the feedback loop of the DFE. Furthermore, they blocked the error propagation of the DFE.
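The interleaving, mapping, and termination steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the write-by-row/read-by-column order, the particular Gray labelling, and the value of the termination symbol are not given in the text, and the function names are hypothetical.

```python
ROWS, COLS = 12, 1224      # interleaver block size quoted in the text
DATA_PER_BLOCK = 31        # data symbols between termination symbols
TERMINATION = 0            # ASSUMED termination level (not given in the text)
# ASSUMED Gray labelling: adjacent levels differ in exactly one bit.
GRAY_MAP = {(0, 0): 0, (0, 1): 1, (1, 1): 2, (1, 0): 3}


def interleave(bits, rows=ROWS, cols=COLS):
    """Write row by row, read column by column (ASSUMED order)."""
    assert len(bits) == rows * cols
    return [bits[r * cols + c] for c in range(cols) for r in range(rows)]


def deinterleave(bits, rows=ROWS, cols=COLS):
    """Inverse of interleave()."""
    assert len(bits) == rows * cols
    out = [0] * (rows * cols)
    i = 0
    for c in range(cols):
        for r in range(rows):
            out[r * cols + c] = bits[i]
            i += 1
    return out


def map_pam4(bits):
    """Map consecutive bit pairs to PAM-4 levels 0..3 with Gray coding."""
    return [GRAY_MAP[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


def insert_termination(symbols, n_data=DATA_PER_BLOCK, term=TERMINATION):
    """Append one termination symbol after every complete run of n_data symbols."""
    out = []
    for i in range(0, len(symbols), n_data):
        chunk = symbols[i:i + n_data]
        out.extend(chunk)
        if len(chunk) == n_data:
            out.append(term)
    return out
```

With this read/write order, two bits that are adjacent within a codeword leave the interleaver 12 positions apart, which is what spreads a burst of channel errors over several codewords.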

Figure 2(b) shows the frame structure of the Tx-DSP output. Each frame started with a frame marker consisting of 16 consecutive ‘3’ symbols followed by 16 consecutive ‘0’ symbols. The training symbols were transmitted to train the timing error recovery and the channel equalizers of the receiver DSP. After transmission of the frame marker and training symbols, data symbols were transmitted continuously. The sampling rate of the DAC was 28 GSa/s, and thus the line rate was 56 Gb/s. The net data rate was 50 Gb/s (=56 Gb/s × 2256/2448 × 31/32).
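The rate bookkeeping above can be checked with a short calculation. The sketch below only restates the numbers in the text; the variable names are illustrative, and, like the formula in the text, it ignores the frame marker and training symbols.

```python
BAUD_GBD = 28.0              # symbol rate implied by the 28 GSa/s DAC
BITS_PER_SYMBOL = 2          # PAM-4 carries two bits per symbol
LDPC_INFO_BITS = 2256        # information bits per LDPC codeword
LDPC_CODE_BITS = 2448        # coded bits per LDPC codeword
DATA_SYMBOLS = 31            # data symbols per 32-symbol block
BLOCK_SYMBOLS = 32           # block length including the termination symbol

line_rate = BAUD_GBD * BITS_PER_SYMBOL                    # Gb/s
code_rate = LDPC_INFO_BITS / LDPC_CODE_BITS
fec_overhead = LDPC_CODE_BITS / LDPC_INFO_BITS - 1.0
termination_efficiency = DATA_SYMBOLS / BLOCK_SYMBOLS
net_rate = line_rate * code_rate * termination_efficiency  # Gb/s

print(f"line rate     : {line_rate:.2f} Gb/s")
print(f"FEC overhead  : {100 * fec_overhead:.2f} %")
print(f"net data rate : {net_rate:.3f} Gb/s")
```

The product evaluates to just under 50 Gb/s, which is consistent with the rounded net data rate and with the 8.51% LDPC overhead quoted in the introduction.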

2.3. Rx DSP architecture

The Rx-DSP architecture is shown in Fig. 3(a). First, the Rx-DSP received the signal from a 56 GSa/s analog-to-digital converter (ADC) (2 samples/symbol). Then, the starting position of the data stream was identified using cross-correlation with the frame marker. After the frame start was found, the data was passed to the timing recovery subsystem, which consisted of a timing error estimator, a timing error compensator, and a loop filter. The Mueller-Müller algorithm, a decision-directed algorithm, was used to estimate the timing error because it is robust against signal distortion due to chromatic dispersion. Timing error estimation was performed using the symbols decided at the DFE and the timing-recovered symbols.
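As an illustration of the decision-directed timing error estimate, the sketch below implements the textbook Mueller-Müller detector, e[k] = d[k-1]·y[k] − d[k]·y[k-1], where y are symbol-rate samples and d are decisions. It is not the FPGA implementation: the normalised levels, the local slicer (in the paper the decisions come from the DFE), and the toy ISI channel are all assumptions made for the example.

```python
import random

LEVELS = (-3.0, -1.0, 1.0, 3.0)   # ASSUMED normalised PAM-4 levels


def slice_pam4(y):
    """Hard decision: nearest PAM-4 level."""
    return min(LEVELS, key=lambda level: abs(y - level))


def mm_timing_error(y_prev, y_cur, d_prev, d_cur):
    """Mueller-Muller detector: e[k] = d[k-1]*y[k] - d[k]*y[k-1]."""
    return d_prev * y_cur - d_cur * y_prev


def mean_timing_error(samples):
    """Average the detector output over a run of symbol-rate samples."""
    decisions = [slice_pam4(y) for y in samples]
    errors = [
        mm_timing_error(samples[k - 1], samples[k], decisions[k - 1], decisions[k])
        for k in range(1, len(samples))
    ]
    return sum(errors) / len(errors)


def toy_channel(symbols, precursor, postcursor):
    """Toy ISI model: y[k] = d[k] + precursor*d[k+1] + postcursor*d[k-1]."""
    return [
        symbols[k] + precursor * symbols[k + 1] + postcursor * symbols[k - 1]
        for k in range(1, len(symbols) - 1)
    ]


random.seed(1)
symbols = [random.choice(LEVELS) for _ in range(4000)]
symmetric = mean_timing_error(toy_channel(symbols, 0.0, 0.0))
post_heavy = mean_timing_error(toy_channel(symbols, 0.0, 0.2))
pre_heavy = mean_timing_error(toy_channel(symbols, 0.2, 0.0))
print(symmetric, post_heavy, pre_heavy)
```

On average the detector output is proportional to the difference between the post-cursor and pre-cursor samples of the pulse, so it is zero for a symmetric pulse and changes sign with the direction of the asymmetry. In a real loop this value would be smoothed by the loop filter before steering the timing compensator.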

Fig. 3. (a) Rx-DSP architecture, (b) Details of the Rx-DSP hardware design for FPGA1, (c) Details of the Rx-DSP hardware design for FPGA2.

Then a second-order loop filter was used to stabilize the timing recovery loop. After timing error recovery, the signal was fed into the DFE, with four blocks operating in parallel. The DFE consisted of a T/2-spaced 25-tap feed-forward equalizer (FFE) and a 1-tap feedback equalizer (FBE). In each cycle, the least mean square (LMS) algorithm was used to obtain the DFE coefficients from 16 timing-recovered symbols and the corresponding decided symbols. The update rate of the coefficients was initially set to 0.0002, and three different values were then used depending on the time index.
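A minimal floating-point sketch of an LMS-adapted DFE with a 25-tap FFE and a 1-tap FBE is given below. It updates the taps once per symbol, whereas the hardware described above works on 16 symbols per cycle in parallel blocks. The centre-spike initialisation, the normalised levels, the use of the quoted update rate as the LMS step size, and all function names are assumptions made for illustration.

```python
N_FFE = 25                        # T/2-spaced feed-forward taps (from the text)
MU = 0.0002                       # initial update rate quoted in the text
LEVELS = (-3.0, -1.0, 1.0, 3.0)   # ASSUMED normalised PAM-4 levels


def slice_pam4(y):
    """Hard decision: nearest PAM-4 level."""
    return min(LEVELS, key=lambda level: abs(y - level))


def init_taps(n_ffe=N_FFE):
    """Centre-spike initialisation (an ASSUMPTION, not from the paper)."""
    ffe = [0.0] * n_ffe
    ffe[n_ffe // 2] = 1.0
    return ffe, 0.0


def dfe_output(ffe, fbe, window, prev_decision):
    """Equalizer output for one symbol.

    window holds the N_FFE most recent T/2-spaced samples; the single
    feedback tap subtracts the ISI contributed by the previous decision.
    """
    feed_forward = sum(w * x for w, x in zip(ffe, window))
    return feed_forward - fbe * prev_decision


def lms_step(ffe, fbe, window, prev_decision, reference=None, mu=MU):
    """One LMS update; uses the training symbol if given, else the decision."""
    y = dfe_output(ffe, fbe, window, prev_decision)
    decision = slice_pam4(y)
    target = decision if reference is None else reference
    err = target - y
    # Gradient descent on err**2 with respect to each coefficient.
    new_ffe = [w + mu * err * x for w, x in zip(ffe, window)]
    new_fbe = fbe - mu * err * prev_decision
    return new_ffe, new_fbe, decision, err
```

During the training symbols the known symbol would be passed as `reference`; afterwards the slicer decision takes its place (decision-directed mode). Resetting `prev_decision` to the known termination symbol at each 32-symbol boundary is what allows the feedback loop to be cut between blocks.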

After symbol equalization at the DFE, the output of the FFE was connected to the MLSE with a memory length of two. For channel estimation, the mean values of every 1024 symbols of each state were obtained at the MLSE statistics calculation block. The MLSE operated on blocks of 32 symbols (31 data symbols and 1 termination symbol). Using the channel estimate, a SOVA modified to reduce complexity was used to calculate the log-likelihood ratio (LLR) [17]. The de-interleaver was used to rearrange the LLR values for robustness against burst errors. The LDPC decoder used the min-sum (MS) algorithm, which allows a simpler and more robust implementation than the belief propagation (BP) algorithm. The routing between check nodes (CNs) and variable nodes (VNs) used a fully broadcasting architecture, which mitigates routing congestion by reducing the complexity of the interconnection between the nodes in the Tanner graph [19]. The LDPC decoders were also designed with a fully parallel architecture, and three FEC decoder cores were operated in parallel, as with the LDPC encoders.
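To make the min-sum decoding step concrete, the following sketch shows the standard floating-point check-node and variable-node updates. It is a generic textbook form, not the quantized, fully parallel hardware described above; no scaling or offset correction is applied, and the function names are hypothetical.

```python
def min_sum_check_node(incoming):
    """Outgoing message on each edge of one check node.

    Each output combines all OTHER incoming LLRs: the sign is the product
    of their signs and the magnitude is their minimum absolute value.
    """
    outgoing = []
    for i in range(len(incoming)):
        others = incoming[:i] + incoming[i + 1:]
        sign = 1.0
        for llr in others:
            if llr < 0:
                sign = -sign
        outgoing.append(sign * min(abs(llr) for llr in others))
    return outgoing


def variable_node(channel_llr, incoming):
    """Outgoing message on each edge of one variable node.

    Each output is the channel LLR plus all incoming check-node messages
    except the one that arrived on that same edge.
    """
    total = channel_llr + sum(incoming)
    return [total - message for message in incoming]


print(min_sum_check_node([2.0, -3.0, 1.0, -0.5]))  # [0.5, -0.5, 0.5, -1.0]
print(variable_node(1.0, [0.5, -2.0]))             # [-1.0, 1.5]
```

The check-node update needs only comparisons and sign logic, whereas belief propagation requires evaluating nonlinear functions at every check node; this is the usual reason min-sum is preferred in hardware.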

Details of the Rx-DSP hardware design for FPGA1 and FPGA2 are shown in Fig. 3(b) and Fig. 3(c). The figures show the schematic parallel structure, the bit-width, and the clock cycles spent in each block. The synchronizer received 256 6-bit samples in parallel. To support the 25-tap FFE and the 5-tap timing error compensators operating in parallel on the 256 samples, the data aligner passed 280 samples in parallel to the timing compensator, and the timing compensator then output 280 samples to the FFE. FPGA1 contained a feedback loop consisting of the timing compensator, DFE, timing error estimator, and second-order loop filter. The T-spaced MLSE received 128 symbols in parallel and output 5-bit LLR values for 256 bits. Using a buffer, the termination symbols were removed and the de-interleaver received 256 LLR values. The de-interleaver de-interleaved them based on a 1224 × 12 matrix. It output 288 LLR values at a time and delivered 96 LLR values to each of the three LDPC decoder cores.

Each iterative block of an LDPC decoder core consisted of a check node block and a variable node block. Each of these blocks contained a node calculator, which computed the node messages, and an interconnect module, which routed the messages between the check nodes and the variable nodes in a fully parallel manner.

In the hardware design, the operating frequency of the FPGAs was set to 218.75 MHz. The total latency of the data path was 1.79 µs (391 clock cycles). Note that the latency between the two FPGAs was not included in this calculation, because in field deployment the entire DSP would be implemented in a single DSP chip.
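The clock and latency figures can be cross-checked against the parallelism described earlier. The short calculation below assumes that the 256 samples taken in per clock cycle by the synchronizer set the clock frequency; the text does not state this explicitly, but the numbers agree.

```python
ADC_RATE_SA_PER_S = 56e9     # 2 samples per symbol at 28 GBaud
SAMPLES_PER_CLOCK = 256      # parallel samples received by the synchronizer
LATENCY_CYCLES = 391         # data-path latency in clock cycles

# ASSUMPTION: one clock cycle consumes exactly SAMPLES_PER_CLOCK samples.
clock_hz = ADC_RATE_SA_PER_S / SAMPLES_PER_CLOCK
symbols_per_clock = SAMPLES_PER_CLOCK // 2
latency_s = LATENCY_CYCLES / clock_hz

print(f"clock frequency : {clock_hz / 1e6:.2f} MHz")
print(f"symbols / clock : {symbols_per_clock}")
print(f"latency         : {latency_s * 1e6:.2f} us")
```

Under that assumption the clock comes out at 218.75 MHz with 128 symbols processed per cycle, matching the 128-symbol MLSE input width, and 391 cycles correspond to about 1.79 µs.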

In the following experiment, the Tx-DSP was implemented on a Xilinx VC7215 evaluation board with an XC7VX690T FPGA, and the Rx-DSP was implemented on Xilinx XCVU190 and XCVU13P FPGAs. Table 1 shows the utilization report for each FPGA and for the key DSP blocks.

Table 1. Transmitter and Receiver Implementation Resources.

3. Experiment

3.1. Experimental setup

Figure 4 shows the experimental setup used to investigate the performance of the real-time DSP. A pseudo-random binary sequence (PRBS) pattern with a length of 2¹⁵ − 1 bits was used as data. After the Tx-DSP, the output of the DAC drove a dual-drive Mach-Zehnder modulator (MZM) with near-zero chirp. The bandwidths of the DAC and MZM were 40 GHz and 30 GHz, respectively. In order to use the linear region of the MZM, the extinction ratio was limited to 7 dB. The wavelength of the laser diode (LD) was 1548.3 nm.

Fig. 4. Experimental setup to investigate dispersion compensation capacity.

The modulated light was transmitted through single mode fiber. An erbium-doped fiber amplifier (EDFA) was used to compensate for the transmission loss. An amplified spontaneous emission (ASE) source and a variable optical attenuator (VOA) were used to adjust the optical signal-to-noise ratio (OSNR) at the PIN-PD. An embedded ADC in a real-time oscilloscope was used for data capture. The bandwidths of the PIN-PD and the oscilloscope were 40 GHz and 20 GHz, respectively. Using a personal computer (PC) and a universal asynchronous receiver/transmitter (UART) interface, the captured data was saved in the random access memory (RAM) on the Rx-FPGA board, and the saved data was passed to the Rx-FPGA once enough data had been collected. Because of the speed limits of the UART interface and of the data capture, the Rx-FPGA was operated sparsely in time. Nevertheless, this may still be regarded as a valid way to test the real-time FPGA implementation.

3.2. Experimental result

In order to identify the hardware implementation loss, we measured BER as a function of OSNR in a back-to-back configuration. The BER was read via Xilinx's VIO (virtual IO) using embedded pattern matchers in the FPGA. The results are plotted together with floating-point offline simulation results, as shown in Fig. 5. For each data point, the BER was measured over ∼5 × 10⁷ bits. Data points without errors are marked with unfilled symbols at a BER of 2 × 10⁻⁸ (the inverse of 5 × 10⁷) to visualize the trend of the BER curves. As shown in Fig. 5, the FEC threshold BER was achieved at an OSNR of 31 dB. Compared to the simulation results, the observed implementation penalty at a decoded BER of 10⁻⁵ was ∼0.5 dB. One source of the penalty is the delay in the timing recovery path. Because of this delay, there is a timing mismatch between the estimated timing error and the signal to which it is applied, and this mismatch degrades the BER. The other source of the penalty is the limited bit-width.

Fig. 5. Experimental results for identifying hardware implementation penalty.

To identify the dispersion compensation capacity, BER was measured as a function of OSNR for various transmission distances. As the transmission distance increased, the uncoded BER degraded, as shown in Fig. 6(a). We achieved error-free transmission up to 30 km. However, signs of an error floor were observed at 30 km.

Fig. 6. Experimental results with various transmission distances. (a) coded BER vs. OSNR, (b) coded vs. uncoded BER.

For a deeper investigation, the relation between the coded and uncoded BER is plotted in Fig. 6(b). The slope of the curve shows the error correction performance of the LDPC code. The slope becomes noticeably more gradual at 30 km, indicating degradation of the LDPC performance. The reason for the degradation might be an inaccurate LLR calculation in the MLSE-SOVA, which would mean that the dispersion at 30 km exceeds the dispersion compensation capacity. On this basis, the achievable transmission distance without signs of an error floor is 25 km, and the corresponding dispersion compensation capacity is ∼ 425 ps/nm.

3.3. Transmission result for inter-datacenter network

As mentioned, the main application of the C-band PAM-4 transmission system will be inter-datacenter networks. Figure 7 shows the experimental setup used to verify its feasibility. It is reasonable to assume that the maximum transmission distance of an inter-datacenter network is ∼ 80 km. We employed a DCF and the DSP simultaneously. In this configuration, the DCF provides a fixed amount of dispersion compensation and the DSP provides dispersion tolerance. A DCF with -1013 ps/nm dispersion (compensating the dispersion of 59.5 km of SMF) was placed at the receiver. We measured BER with 35 km ∼ 85 km of SMF in order to confirm error-free transmission, and the results are shown in Fig. 8.

Fig. 7. Experimental setup to verify feasibility in inter-datacenter network with the real-time DSP.

Fig. 8. BER vs. transmission distance for inter-datacenter network.

For each data point, BER was measured with ∼ 2 × 10⁸ bits. If the transmission was error free, the data point was marked with an unfilled symbol at 5 × 10⁻⁹ BER (the inverse of 2 × 10⁸). At the edge of the graph, degradation of the uncoded BER was observed due to the remaining dispersion after DCF. However, error-free transmission was achieved for the 35 km ∼ 85 km transmission distance, confirming the feasibility of the system. We also verified stable operation for 8 hours by monitoring BER for transmission distances of 35 km, 60 km, and 85 km.
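The residual dispersion left for the DSP at each distance can be estimated with a short calculation. The fiber dispersion coefficient of 17 ps/(nm·km) used below is an assumption inferred from the 425 ps/nm at 25 km quoted in Section 3.2; it is not stated directly in the text, and the function name is illustrative.

```python
D_SMF = 17.0           # ps/(nm*km); ASSUMED, inferred from 425 ps/nm over 25 km
DCF = -1013.0          # ps/nm, fixed compensation of the DCF (from the text)
DSP_CAPACITY = 425.0   # ps/nm, capacity reported in Section 3.2


def residual_dispersion(length_km, d_smf=D_SMF, dcf=DCF):
    """Net dispersion in ps/nm after the SMF span and the DCF."""
    return d_smf * length_km + dcf


for length_km in (35, 60, 85):
    residual = residual_dispersion(length_km)
    print(f"{length_km:3d} km: residual {residual:+7.1f} ps/nm "
          f"({abs(residual) / DSP_CAPACITY:.2f} x DSP capacity)")
```

Under this assumption the residual dispersion is about -418 ps/nm at 35 km, close to zero at 60 km, and about +432 ps/nm at 85 km. Both ends of the range sit near the ∼ 425 ps/nm capacity, which is consistent with the degradation of the uncoded BER observed at the edges of Fig. 8.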

4. Summary

We reported the realization of a real-time DSP including FEC for a 56 Gb/s (corresponding to a net data rate of 50 Gb/s) PAM-4 system. To increase the dispersion compensation capacity, a combined structure of DFE and MLSE was employed. To increase the coding gain, SOVA and an LDPC code with 8.51% overhead were used instead of the conventional RS code. The M-M algorithm was adopted for timing error recovery. For implementation, we employed a fully parallelized structure. The performance of the realized DSP was investigated experimentally. The measured implementation penalty compared to a floating-point offline simulation was ∼ 0.5 dB. We demonstrated 25 km transmission without signs of an error floor, indicating a dispersion compensation capacity of ∼ 425 ps/nm. To the best of our knowledge, this is the highest dispersion compensation capacity reported for a real-time DSP. We also empirically verified the feasibility of the DSP in an inter-datacenter network, confirming error-free transmission over 35 km ∼ 85 km.

Funding

Institute for Information and Communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (2017-0-00047, Development of 200Gb/s optical transceiver for metro-access network).

Disclosures

The authors declare no conflicts of interest.

References

1. N. Eiselt, J. Wei, H. Griesser, A. Dochhan, M. Eiselt, J.-P. Elbers, J. J. V. Olmos, and I. T. Monroy, “First realtime 400G PAM-4 demonstration for inter-data center transmission over 100 km of SSMF at 1550 nm,” in Proc. of Optical Fiber Communication Conference (OFC) (2016).

2. J. H. Lee, S. H. Chang, J. Y. Huh, S.-K. Kang, K. Kim, and J. K. Lee, “EML based real-time 112 Gbit/s (2 × 56.25 Gbit/s) PAM-4 signal transmission in C-band over 80 km SSMF for inter DCI applications,” Opt. Fiber Technol. 45, 141–145 (2018).

3. H. Li, R. Hu, Q. Yang, M. Luo, Z. He, P. Jiang, Y. Liu, X. Li, and S. Yu, “Improving performance of mobile fronthaul architecture employing high order delta-sigma modulator with PAM-4 format,” Opt. Express 25(1), 1–9 (2017).

4. X. Tang, S. Liu, Z. Sun, H. Cui, X. Xu, J. Qi, M. Guo, Y. Lu, and Y. Qiao, “C-band 56-Gb/s PAM4 transmission over 80-km SSMF with electrical equalization at receiver,” Opt. Express 27(18), 25708–25717 (2019).

5. J. Lee, N. Kaneda, and Y.-K. Chen, “112-Gbit/s intensity-modulated direct-detect vestigial sideband PAM4 transmission over an 80-km SSMF link,” in Proc. of European Conference on Optical Communication (ECOC) (2016).

6. C. Ye, D. Zhang, X. Huang, H. Feng, and K. Zhang, “Demonstration of 50Gbps IM/DD PAM4 PON over 10 GHz class optics using neural network based nonlinear equalization,” in Proc. of European Conference on Optical Communication (ECOC) (2017).

7. X. Tang, S. Liu, X. Xu, J. Qi, M. Guo, J. Zhou, and Y. Qiao, “50-Gb/s PAM4 over 50-km single mode fiber transmission using efficient equalization technique,” in Proc. of Optical Fiber Communication Conference (OFC) (2019).

8. S. Zhou, X. Li, L. Yi, Q. Yang, S. J. H. Lee, and S. Fu, “Transmission of 2×56 Gb/s PAM-4 signal over 100 km SSMF using 18 GHz DMLs,” Opt. Lett. 41(8), 1805–1808 (2016).

9. J. Zhang, J. Yu, X. Li, Y. Wei, K. Wang, L. Zhao, W. Zhou, J. Xiao, X. Pang, B. Liu, X. Xin, L. Zhang, and Y. Zhang, “Demonstration of 100-Gb/s/λ PAM-4 transmission over 45 km SSMF Using one 10G-class DML in the C-band,” in Proc. of Optical Fiber Communication Conference (OFC) (2019).

10. J. Verbist, M. Verplaetse, S. A. Srinivasan, J. Van Kerrebrouck, P. De Heyn, P. Absil, T. De Keulenaer, R. Pierco, A. Vyncke, G. Torfs, X. Yin, G. Roelkens, J. Van Campenhout, and J. Bauwelinck, “Real-time 100 Gb/s NRZ and EDB transmission with a GeSi electro-absorption modulator for short-reach optical interconnects,” J. Lightwave Technol. 36(1), 90–96 (2018).

11. V. Houtsma and D. V. Veen, “Demonstration of symmetrical 25 Gbps TDM-PON with 31.5 dB optical power budget using only 10 Gbps optical components,” in Proc. of European Conference on Optical Communication (ECOC) (2015).

12. J. Wei, N. Eiselt, H. Griesser, K. Grobe, M. H. Eiselt, J. J. V. Olmos, I. T. Monroy, and J.-P. Elbers, “Demonstration of the first real-time end-to-end 40-Gb/s PAM-4 for next-generation access applications using 10-Gb/s transmitter,” J. Lightwave Technol. 34(7), 1628–1635 (2016).

13. R. Nagarajan, M. Filer, Y. Fu, M. Kato, T. Rope, and J. Stewart, “Silicon photonics-based 100 Gbit/s, PAM4, DWDM data center interconnect,” J. Opt. Commun. Netw. 10(7), B25–B36 (2018).

14. N. Eiselt, H. Griesser, J. Wei, A. Dochhan, M. Eiselt, J.-P. Elbers, J. J. V. Olmos, and I. T. Monroy, “Real-time evaluation of 26-GBaud PAM-4 intensity modulation and direct detection systems for data-center interconnects,” in Proc. of Optical Fiber Communication Conference (OFC) (2016).

15. S.-R. Moon, H.-S. Kang, H. Y. Rha, and J. K. Lee, “58.125 Gb/s 80 km transmission of PAM-4 signal with improved dispersion tolerance,” in Proc. of OptoElectronics and Communications Conference (OECC) (2018).

16. S.-R. Moon, H.-S. Kang, H. Y. Rha, and J. K. Lee, “C-band PAM-4 signal transmission using soft output MLSE and LDPC code,” Opt. Express 27(1), 110–120 (2019).

17. H. Y. Rha, S.-R. Moon, H.-S. Kang, S.-W. Lee, I.-K. Hwang, and J. K. Lee, “Low-complexity soft-decision Viterbi algorithm for IM/DD 56-Gb/s PAM-4 system,” IEEE Photonics Technol. Lett. 31(5), 361–364 (2019).

18. L. Lan, L. Zeng, Y. Y. Tai, L. Chen, S. Lin, and K. Abdel-Ghaffar, “Construction of quasi-cyclic LDPC codes for AWGN and binary erasure channels: a finite field approach,” IEEE Trans. Inf. Theory 53(7), 2429–2458 (2007).

19. A. Darabiha, A. C. Carusone, and F. R. Kschischang, “Block-interlaced LDPC decoders with reduced interconnect complexity,” IEEE Trans. Circuits Syst. II 55(1), 74–78 (2008).

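Table 1 data (utilization for each FPGA and for the key DSP blocks)
Resource	TX FPGA (XC7VX690T)	RX FPGA (XCVU190)	RX FPGA (XCVU13P)	Key block: DFE	Key block: MLSE	Key block: De-interleaver	Key block: LDPC decoder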
LUT	20.8%	66.1%	62.1%	24.2%	33.5%	33.5%	24%
Register	13.4%	28.0%	23.9%	9.38%	15.4%	15.4%	5.5%
Memory	31.2%	33.9%	35.1%	0.74%	3.6%	4.6%
DSP	-	23.7%	38.4%	23.3%	23.4%	38.4%

Optics Express