Expand this Topic clickable element to expand a topic
Skip to content
Optica Publishing Group

FPGA implementation of concatenated non-binary QC-LDPC codes for high-speed optical transport

Open Access Open Access

Abstract

In this paper, we propose a soft-decision-based FEC scheme that is the concatenation of a non-binary LDPC code and hard-decision FEC code. The proposed NB-LDPC + RS with overhead of 27.06% provides a superior NCG of 11.9dB at a post-FEC BER of 10−15. As a result, the proposed NB-LDPC codes represent the strong FEC candidate of soft-decision FEC for beyond 100Gb/s optical transmission systems.

© 2015 Optical Society of America

1. Introduction

Super forward error correction (Super-FEC) coding with powerful error correction capability has been extensively investigated over the past decades, and represents one of the key technologies enabling the development of 400Gb/s, 1Tb/s Ethernet and beyond [1]. Among a number of candidates, such as Reed-Solomon (RS) code and turbo product codes (TPC) [1,2], low-density parity-check (LDPC) codes have led to their recent inclusion in several standards in optical community, such as ITU-T G.975, G.709, thanks to their remarkable error correcting capabilities and the recent development of corresponding integrated circuits. In literature, concatenated LDPC(9252,7967) + RS(992,956) and concatenated LDPC(73728,62464) + RS(781,765) were reported with a Q limit of 7.1dB at BER of 10−13 and a Q limit of 5.8dB at BER of 10−12, respectively [3,4]. A single LDPC code and a convolutional LDPC with the codeword length of nearly 20000 and rate of 0.8 were demonstrated with Q limit of 5.9dB and 5.7dB, both at BER of 10−15 [5,6]. Q limit was further improved to 5dB at BER of 10−15 by a spatially-coupled type LDPC code [7]. Due to the good error correction performance and low circuit complexity, binary QC-LDPC codes were widely adopted in academia and industry [8]. Non-binary LDPC codes, which have been shown to outperform binary LDPC codes and RS codes, were studied by Monte Carlo simulations [9–11].

In this paper, we present FPGA-based implementation of a concatenated non-binary LDPC code + shortened-RS code with 27.06% overhead that can achieve a Q-limit of 5.05dB at BER of 10−15, which corresponds to a NCG of 11.9dB. To the best of our knowledge, this is the first work implementing a concatenated non-binary LDPC code that represents a promising solution for the next generation of beyond 100Gb/s optical transport networks.

The rest of this paper is organized as follows. In Section 2, we provide the construction method of non-binary LDPC codes and the corresponding decoding algorithm. In Section 3, we present the details of fixed point simulation results and the FPGA implementation architecture of layered non-binary LDPC decoders as well as the RS decoders. In Section 4, we present the emulation results and discuss their significance. Finally, Section 5 concludes our paper.

2. Construction of non-binary LDPC code and its decoding algorithms

There exist various algebraic and combinatorial methods to construct structured non-binary LDPC codes [12–14], however, the two-stage construction method used in this paper is described as follows. We first design a binary quasi-cyclic (QC) LDPC code of girth-8 or girth-10 following the guidelines in [14]. Then we replace the 1’s in binary parity check matrix with non-zero elements in Galois field GF(q) by random selection. Thus, the parity-check matrix HNB of a (γ,ρ)-regular non-binary QC-LDPC code can be represented by

HNB,γp×ρp=Wγp×ρpHB,γp×ρp=(w0,0w0,ρ1wγ1,0wγ1,ρ1)(A0,0A0,ρ1Aγ1,0Aγ1,ρ1).
where wi,j is a p×p matrix over GF(q) and Ai,j is a p×p circulant permutation matrix over GF(2). Operation is element-wise multiplication over GF(q). For the sake of efficient implementation, we chose wi,j such that every element in submatrix is the same.

Several algorithms were proposed for decoding of non-binary LDPC codes, such as Q-ary sum-product algorithm (QSPA), Log-domain FFT-QSPA (Log-FFT), mixed-domain FFT-QPSA, Max-Log QSPA, extended min sum algorithm, min max algorithm (MMA) [15–20]. Due to the low complexity, we adopt the MMA and some modification has been made to further make it suitable for implementation. A partial parallel layered decoding for (γ,ρ)-regular non-binary QC-LDPC code over GF(q) is employed due to the low memory requirement and the fast convergence speed [21]. In this method, the rows (i.e., the check nodes) of parity-check matrix are divided into groups (layers), where the size of each group is p. Let Rcvk,l, Lcvk,l, and Lv represent the check c to variable v message vector, the variable v to check c message vector, and the log-likelihood ratio at k-th iteration and l-th layer respectively; where k=1,...,Imax and l=1,...,γ. The modified layered min-max algorithm (LMMA) can be formulated as follows:

  • •Tentative decisions:
    Lvk,l(a)=Lv(a)+l'Rcvk,l'(a).
    c^v=argminaGF(q)Lvk,l(a).
  • •Variable node processing step:
    Lvck,l(a)=Lv(a)+l'lRcvk,l'(a).
    Lvck,l(a)=Lvck,l(a)mina'GF(q)(Lvck,l(a')).
  • •Check node processing step:
    Rcvk,l(a)=minav'L(c|av=a)(maxv'N(c)/vLv'ck,l(av')).

Equation (2) and Eq. (4) are different from that in [21], where instead of updating Lv and Rcvk,l in each layer, we only update Rcvk,l while keeping Lv untouched. The benefits of this modification are twofold. Firstly, the unwanted earlier saturation of Rcvk,l than Lv can be avoided without introducing additional operations. Secondly, updating Rcvk,l’s is more routing-friendly than updating both Lv and Rcvk,l. The Eq. (5) is necessary for numerical reasons to ensure the non-divergence of the algorithm and Eq. (6) is realized by trellis-based recursive approach.

3. FPGA architecture of concatenated non-binary LDPC decoder

In this Section, we first provide the fixed point simulation results of non-binary LDPC decoder with the purpose to identify the required number of bits. After that we provide the details of proposed FPGA architecture of non-binary LDPC decoder together with comparison with corresponding binary counterpart. We also provide the details of Reed-Solomon decoder implementation, which serves as an outer decoder.

3.1 Fixed-point simulation results

Before implementing the layered min-max decoding algorithm in hardware, quantization is an important issue that needs to be addressed. In this paper, we have simulated different types of quantization schemes to find the best tradeoff between the hardware complexity and the decoding performance. Let (I, F) represent a fixed-point scheme with I bits for the integer part and F bits for the fraction part. The layered min-max algorithm have been simulated with fixed point precision for GF(4) (16935,13550) non-binary LDPC code over binary-input continuous-output AWGN (BI-AWGN) channel. Different fixed point representations for initial LLR, variable to check node message, and check to variable node message have been investigated. Although a non-uniform quantization and different quantization methods can be used to represent Lv, Rcvk,l, and Lvck,l; we employ the uniform quantization scheme for all three types of messages. The results shown in Fig. 1 indicate that at least 5 bits (including sign bit) are required to achieve good decoding performance. However, both binary and non-binary LDPC decoders will use 6 bits precision representation of Lv, Rcvk,l, and Lvck,l for the purpose of comparison.

 figure: Fig. 1

Fig. 1 Fixed-point simulation over BI-AWGN.

Download Full Size | PDF

3.2 Architecture of non-binary LDPC decoder

Figure 2(a) presents the architecture of the LMMA-based non-binary LDPC decoder, which consists of two parts; namely, memories and processors. The processors can be divided into one variable node unit (VNU) corresponding to Eq. (2), Eq. (4), and Eq. (5), one check node unit (CNU) corresponding to Eq. (6) and one early termination unit (ETU) corresponding to Eq. (3).

 figure: Fig. 2

Fig. 2 Non-binary LDPC decoder architecture: (a) overall architecture, (b) architecture of CNU, and (c) architecture of BCJR-based min-max processor.

Download Full Size | PDF

There are four types of memories used in implementation: memory for Rcv with size of γ×n×q×WR stores the information from check nodes to variable nodes, memory for Lv with size of n×q×WL stores the initial log-likelihood ratios, memory for c^ with size n×log2(q) stores the decoded bits, and memories inside each CNU store the intermediate values. In discussion above, γ denotes the column weight, n is the codeword length, q is the size of Galois field, WR and WL represent the word-lengths for Rcv and Lv.

As shown in Fig. 2(b), it is obvious that CNU is the most complex part of the decoding algorithm, which consists of ρ inverse permutators, ρ BCJR-based minmax processors and ρ permutators, and two types of the first-in-first-out (FIFO) registers. The inverse permutator block shifts the incoming message vector cyclically. The first FIFO is used to perform the parallel–to-serial conversion as required in min-max processor. In Fig. 2(c), the min-max processor consists of one forward recursion block, one backward recursion block, two memories storing intermediate values, and one merge block. Then FIFO block is used again to perform serial-to-parallel conversion, followed by the permutator block. In order to reduce the latency of min-max processor, we adopted the bidirectional recursion technique. In conventional BCJR processor, it takes ρcycles updating forward metric and backward metric recursively and additional ρ cycles to combine them. However, the combining process can be proceeding once half of the recursion is done, which saves up to ρ cycles. Because of high complexity of CNU design and high memory requirements of non-binary decoder than that of binary decoder, reduced-complexity architectures and selective version of MMA have been widely studied [22,23].

At last, we present in Table 1 the utilization results for hardware implementation of the QC (3,15)-regular, girth-8 (16935, 13550, 0.8) LDPC code over GF(2) and GF(4). In order to make fair comparison, we adopt the 6 bits precision (including the sign bit) for both decoders, the maximum number of iterations is set to 15, 15 variable node units and one check node processor are employed. One can clearly notice that LMMA consumes 3.6 times larger memory than layered attenuated min sum algorithm (LAMSA) because of large field size, while occupied number of slices for LMMA is five times of that of LAMSA because of higher complexity involved in CNU. On the other hand, Max-Log algorithm has larger logic usage than Min-Max algorithm as they are both based on the BCJR algorithm while the min operation is replaced by addition.

Tables Icon

Table 1. Utilization Summary of LDPC Decoders and Reed-Solomon Decoder.

3.3 Architecture of Reed-Solomon decoder

When designing a Reed-Solomon decoder, we first have to implement adder and multiplier over finite field GF(q). Addition in GF(q) is realized by bitwise exclusive-OR (XOR) gate while multiplication is implemented by a set of AND gates and XOR gates [24]. As shown in Fig. 3, an RS decoder consists of three stages. First is the syndrome calculator block that computes the syndrome polynomial S(x) based on the received polynomial R(x). The Euclidean divider and multiplier block implement Euclidean algorithm stage that solves the key equations and provides an error-location polynomial σ(x) and error-evaluation polynomial ω(x). The third stage is the Chien search block that finds the roots of σ(x) and Forney algorithm block that calculates the error values. Thus errors found in Forney algorithm block are corrected in Error Correction block. For more details of the implementation of Euclidean Algorithm, an interested reader is referred to [25,26]. The last column of Table 1 shows the utilization summary of RS (1023, 1007). Clearly, the logic utilization is twice larger than that of binary LDPC decoder, while the memory utilization is comparable. Thus a well-designed LDPC code represents an alternative solution to combat the error floor phenomenon.

 figure: Fig. 3

Fig. 3 Overall architecture of Reed-Solomon decoder.

Download Full Size | PDF

4. Emulation results and discussion

4.1 Non-binary LDPC decoder performance

Before presenting the BER performance of concatenated non-binary LDPC codes, we first studied the performance of various decoding algorithms for non-binary LDPC codes. The LDPC codes under investigation were constructed following the two-stage procedure as discussed above, which are QC (3,15)-regular, girth-8 (16935, 13550, 0.8) LDPC code over GF(2) and GF(4). By computer search, we found the number of 8-cycles of this code to be 118,545. We adopted layered attenuated min-sum algorithm for decoding of binary LDPC codes, while layered Log-FFT algorithm, Max-Log algorithm, and Min-Max algorithm for decoding of non-binary LDPC codes. Both binary and non-binary decoders are employing 6 bits precision (including sign bit) and maximum number of iteration is set to 15. As shown in Fig. 4, the binary code exhibits an error floor at BER of 10−13, while non-binary decoders exhibit error floors at BER of 10−7, 10−8, and 10−9 for Log-FFT, Max-Log, and Min-Max algorithms, respectively. One can notice that the shape of error floor is different than the binary one; which could be contributed to the existence of the trapping sets that highly depend on decoding algorithms in addition to the fixed point representation of decoding algorithms.

 figure: Fig. 4

Fig. 4 BER performances of various non-binary LDPC decoders.

Download Full Size | PDF

4.2 Concatenated non-binary LDPC decoder performance

A better non-binary LDPC code is designed and investigated in proposed concatenated scheme, which is QC (3,15)-regular, girth-10 (34635, 27710, 0.8) LDPC code over GF(4). While the RS (988,972) is obtained by shortening from RS (1023,1007) over GF(10). We adopted layered min-max algorithm and Euclidean algorithm for decoding of non-binary QC-LDPC codes and RS code, respectively. Table 2 shows the utilization summary of non-binary LDPC decoder with 6 bits precision, RS decoder with 10 bits precision, and the proposed concatenated decoder. The better non-binary LDPC decoder consumes 1.5 times larger memory than the short one since the codeword length has been doubled with comparable logic utilization. The proposed concatenated scheme requires lower memory usage than the combined non-binary LDPC decoder and RS decoder because an efficient mapping has been achieved.

Tables Icon

Table 2. Utilization Summary of Proposed Concatenated Decoder.

The purpose of concatenating a RS code is to eliminate unwanted error floor. However, we should carefully choose the outer code in order to correct all the residual errors after the non-binary LDPC decoder as well as not introduce unnecessary overhead (OH). Thus we investigate the post-LDPC decoder behavior by FPGA emulation. Figure 5 shows the emulated histogram of the residual error bits in error LDPC codeword for a post-FEC BER of 10−10, the simulated number of codewords is ~1.3×109. Noticing that two-error bits are dominant after LDPC decoding, we notice that this is well matched with the error floor phenomenon caused by the decoding algorithms and fixed point representation. Theoretically speaking, we can construct an outer code with OH of ~0% if we have infinite depth of de-interleaver depth. However, this is not feasible as we fixed the size of de-interleaver to the same length as that of LDPC codeword. In our case, the maximum number of error bits in one LDPC codeword was at most 10 and each LDPC codeword consists of 7.01 RS codewords. Thus we can conclude that, if we choose RS codes able to correct more than 8 symbols, almost all of the residual error bits will be corrected for.

 figure: Fig. 5

Fig. 5 Error bits histogram of the post-LDPC decoder.

Download Full Size | PDF

The BER vs. Q-factor performances of the concatenated non-binary LDPC code are presented in Fig. 6. The FPGA-based simulation was conducted over BI-AWGN channel and 5, 6 bits precision are used in binary and non-binary LDPC code respectively. The results for 5, 10, and 15 maximum decoding iterations prove the fast convergence of layered decoding algorithm. The concatenated non-binary QC-LDPC code of overall rate 0.787 can achieve a Q-limit of 5.05 dB at BER of 10−15, which corresponds to NCG of 11.91 dB, while the binary LDPC code of rate 0.8 can achieve a Q-limit of 5.2dB at 10−15, which corresponds to NCG of 11.83dB. The performance of binary LDPC code is provided as reference. In addition, combined non-binary LDPC code with higher level modulation schemes can brings more benefit than binary LDPC code.

 figure: Fig. 6

Fig. 6 BER performances of concatenated non-binary LDPC codes.

Download Full Size | PDF

5. Conclusion

We proposed a novel SD-FEC employing the concatenation of a girth-10 non-binary QC-LDPC code and a RS code with overall 27% OH for high-speed optical transmission systems. The BER performance was verified through FPGA emulation system. Superior waterfall and error floor performance is demonstrated at a post-FEC BER of 10−15. No error floor has been found and 5.05dB in Q-limit is achieved, corresponding to NCG of 11.91dB. To the best of our knowledge, this is the first implementation of concatenated non-binary LDPC code + shortened-RS code. We believe that the proposed non-binary QC-LDPC code is one of the promising candidates for the next generation optical communication systems.

Acknowledgment

This work was supported in part by NSF ERC Center for Integrated Access Networks (CIAN) under grant EEC-0812072 as well as by NSF under Grant CCF-0952711.

References and links

1. ITU-T G. 975. 1, Forward error correction for high bit-rate DWDM submarine system, 2004.

2. S. Dave, L. Esker, F. Mo, W. Thesling, J. Keszenheimer, and R. Fuerst, “Soft-decision forward error correction in a 40-nm ASIC for 100-Gbps OTN Applications,” in Optical Fiber Communication Conference (OFC/NFOEC’2011), paper JWA14. [CrossRef]  

3. Y. Miyata, W. Matsumoto, H. Yoshida, and T. Mizuochi, “Efficient FEC for optical communications using concatenated codes to combat error-floor,” in Optical Fiber Communication Conference (OFC/NFOEC’2008), paper OTuE4.

4. N. Kamiya and S. Shioiri, “Concatenated QC-LDPC and SPC codes for 100 Gbps ultra long-haul optical transmission system,” in Optical Fiber Communication Conference (OFC/NFOEC’2010), paper OThL2. [CrossRef]  

5. D. Chang, F. Yu, Z. Xiao, Y. Li, N. Stojanovic, C. Xie, X. Shi, X. Xu, and Q. Xiong, “FPGA verification of a single QC-LDPC code for 100 Gb/s optical systems without error floor down to BER of 10−15,” in Optical Fiber Communication Conference (OFC/NFOEC’2011), paper OTuN2.

6. D. Chang, F. Yu, Z. Xiao, N. Stojanovic, F. N. Hauske, Y. Cai, C. Xie, L. Li, X. Xu, and Q. Xiong, “LDPC convolutional codes using layered decoding algorithm for high speed coherent optical transmission,” in Optical Fiber Communication Conference (OFC/NFOEC’2012), paper OW1H.4. [CrossRef]  

7. K. Sugihara, Y. Miyata, T. Sugihara, K. Kubo, H. Yoshida, W. Matsumoto, and T. Mizuochi, “A spatially-coupled type LDPC code with an NCG of 12dB for optical transmission beyond 100 Gb/s,” in Optical Fiber Communication Conference (OFC/NFOEC’2013), paper OM2B.4.

8. D. Zou and I. B. Djordjevic, “FPGA implementation of high-performance QC-LDPC decoder for optical communications,” Proc. SPIE 9388, 93880P (2015). [CrossRef]  

9. C. Seok, H. Lee, N. Kaneda, and Y. Chen, “Concatenated non-binary LDPC and HD-FEC codes for 100 Gb/s optical transport system,” in Proceedings of IEEE International Symposium on Circuits and Systems (IEEE, 2012), pp. 1783–1786.

10. I. B. Djordjevic and B. Vasic, “Nonbinary LDPC codes for optical communication systems,” IEEE Photon. Technol. Lett. 17(10), 2224–2226 (2005). [CrossRef]  

11. M. Arabaci, I. B. Djordjevic, R. Saunders, and R. M. Marcoccia, “Polarization-multiplexed rate-adaptive non-binary-quasi-cyclic-LDPC-coded multilevel modulation with coherent detection for optical transport networks,” Opt. Express 18(3), 1820–1832 (2010). [CrossRef]   [PubMed]  

12. B. Zhou, J. Kang, S. Song, S. Lin, K. A. Ghaffar, and M. Xu, “Construction of non-binary quasi-cyclic LDPC codes by arrays and array dispersion,” IEEE Trans. Commun. 57(6), 1652–1662 (2009). [CrossRef]  

13. B. Zhou, J. Kang, Y. Tai, S. Lin, and Z. Ding, “High performance non-binary quasi-cyclic LDPC codes on Euclidean geometry,” IEEE Trans. Commun. 57(5), 1298–1311 (2009). [CrossRef]  

14. M. P. C. Fossorier, “Quasi-cyclic low-density parity-check codes from circulant permutation matrices,” IEEE Trans. Inf. Theory 50(8), 1788–1793 (2004). [CrossRef]  

15. M. Davey and D. MacKay, “Low density parity check codes over GF(q),” IEEE Commun. Lett. 2(6), 165–167 (1998). [CrossRef]  

16. L. Barnault and D. Declercq, “Fast decoding algorithm for LDPC over GF(2^q),” in Proceedings of IEEE Information Theory Workshop (IEEE, 2003) pp. 70–73. [CrossRef]  

17. D. Declercq and M. Fossorier, “Decoding algorithm for nonbinary LDPC codes over GF(q),” IEEE Trans. Commun. 55(4), 633–643 (2007). [CrossRef]  

18. C. Spagnol, E. Popovici, and W. Marnane, “Hardware implementation of GF(2m) LDPC decoders,” IEEE Trans. Circ. Syst. 56(12), 2609–2620 (2009). [CrossRef]  

19. D. Declercq and M. Fossorier, “Decoding algorithm for nonbinary LDPC codes over GF(q),” IEEE Trans. Commun. 55(4), 633–643 (2007). [CrossRef]  

20. V. Savin, “Min-max decoding for non-binary LDPC codes,” in Proceedings of IEEE International Symposium on Information Theory (IEEE, 2008) pp. 960–964.

21. D. E. Hocevar, “A reduced complexity decoder architecture via layered decoding of LDPC codes,” in Proceedings of IEEE Signal Processing Systems (IEEE, 2004) pp. 107–112. [CrossRef]  

22. Y. Ueng, K. Liao, H. Chou, and C. Yang, “A high-throughput trellis-based layered decoding architecture for non-binary LDPC codes using Max-Log-QSPA,” IEEE Trans. Signal Process. 61(11), 2940–2951 (2013). [CrossRef]  

23. X. Zhang and F. Cai, “Reduced-complexity decoder architecture for non-binary LDPC codes,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 19(7), 1229–1238 (2011).

24. E. D. Mastrovito, “VLSI designs for multiplication over finite fields GF(2m),” in Sixth Symp. Algebra, Algebraic Algorithms, and Error Correcting Codes (AAECC-6, 1988) pp. 297–309.

25. H. Lee, “A high-speed low-complexity Reed-Solomon decoder for optical communications,” IEEE Trans. Circ. Syst. 52(8), 461–465 (2005). [CrossRef]  

26. S. B. Wicker, Error Control Systems for Digital Communication and Storage (Prentice Hall, 1995).

Cited By

Optica participates in Crossref's Cited-By Linking service. Citing articles from Optica Publishing Group journals and other participating publishers are listed here.

Alert me when this article is cited.


Figures (6)

Fig. 1
Fig. 1 Fixed-point simulation over BI-AWGN.
Fig. 2
Fig. 2 Non-binary LDPC decoder architecture: (a) overall architecture, (b) architecture of CNU, and (c) architecture of BCJR-based min-max processor.
Fig. 3
Fig. 3 Overall architecture of Reed-Solomon decoder.
Fig. 4
Fig. 4 BER performances of various non-binary LDPC decoders.
Fig. 5
Fig. 5 Error bits histogram of the post-LDPC decoder.
Fig. 6
Fig. 6 BER performances of concatenated non-binary LDPC codes.

Tables (2)

Tables Icon

Table 1 Utilization Summary of LDPC Decoders and Reed-Solomon Decoder.

Tables Icon

Table 2 Utilization Summary of Proposed Concatenated Decoder.

Equations (6)

Equations on this page are rendered with MathJax. Learn more.

H NB,γp×ρp = W γp×ρp H B,γp×ρp =( w 0,0 w 0,ρ1 w γ1,0 w γ1,ρ1 )( A 0,0 A 0,ρ1 A γ1,0 A γ1,ρ1 ).
L v k,l ( a )= L v ( a )+ l' R cv k,l' ( a ) .
c ^ v =arg min aGF( q ) L v k,l ( a ).
L vc k,l ( a )= L v ( a )+ l'l R cv k,l' ( a ) .
L vc k,l ( a )= L vc k,l ( a ) min a'GF( q ) ( L vc k,l ( a' ) ).
R cv k,l ( a )= min a v' L( c| a v =a ) ( max v'N( c )/v L v'c k,l ( a v' ) ).
Select as filters


Select Topics Cancel
© Copyright 2024 | Optica Publishing Group. All rights reserved, including rights for text and data mining and training of artificial technologies or similar technologies.