Flexible Multi-ASIP SoC for Turbo/LDPC Decoder
Purushotham Murugappa Velayuthan, Pallavi Reddy, Rachid Al Khayat,
Jean-Noël Bazin, Amer Baghdadi, Fabien Clermidy, Michel Jezequel
To cite this version:
Purushotham Murugappa Velayuthan, Pallavi Reddy, Rachid Al Khayat, Jean-Noël Bazin,
Amer Baghdadi, et al. Flexible Multi-ASIP SoC for Turbo/LDPC Decoder. SOC-SIP : col-




Submitted on 24 Aug 2012



















(1) Institut Mines-Telecom; Telecom Bretagne; CNRS Lab-STICC, CS 83818, 29238, Brest, France.
(2) CEA-LETI, MINATEC, 38054, Grenoble, France.
 
I. INTRODUCTION 
Emerging digital communication standards must provide
very high throughput at low energy consumption. Among the
components of a communication system, the channel decoder
is probably the most demanding in terms of computation,
communication and memory. To improve their quality and
efficiency, error correcting codes are now specified with
a wide variety of options, resulting in a growing need for
flexible hardware solutions (Table 1). This need is
reinforced by new applications in opportunistic and
cognitive radio, which require quasi-simultaneous decoding
of multiple data streams.
Table 1. Selection of standards and channel codes (CC: 
Convolutional Code, SBTC: Single-Binary Turbo Code, DBTC: 
Duo-Binary Turbo Code, LDPC: Low-Density Parity-Check 
code)  





Standard             Code   Rates       States   Block size   Throughput
IEEE-802.11 (WiFi)   CC     1/2 - 3/4   64       ≤ 4095       ≤ 54 Mbps
                     CC     2/3         256      -            -
                     LDPC   1/2 - 5/6   -        ≤ 1944       ≤ 450 Mbps
IEEE-802.16 (WiMax)  CC     1/2 - 7/8   64       ≤ 2040       -
                     DBTC   1/3 - 5/6   8        ≤ 4800       ≤ 75 Mbps
                     LDPC   1/2 - 3/4   -        ≤ 2304       ≤ 75 Mbps
DVB-RCS              DBTC   1/3 - 6/7   8        96 - 1728    2 Mbps
3GPP-LTE             CC     1/3         64       -            -
                     SBTC   1/3         8        40 - 6144    ≤ 150 Mbps
However, solutions that are optimal in terms of
performance, area and power consumption are yet to be
invented: a “blind” approach to flexibility leads to losses
in these criteria that are unacceptable for most
applications.
Thus, the originality of this work lies in unifying
flexibility-oriented and optimization-oriented approaches.
The main goal is to deliver basic enablers and hardware
building blocks in order to derive, for specific application
needs, the best balance between a highly flexible solution
and a specifically optimized one.
To reach the desired tradeoff between flexibility and
performance, we propose a multiprocessor architecture
model integrating networks-on-chip and computation units
based on the ASIP (Application Specific Instruction-set
Processor) concept. Energy and area efficiency are pursued
by sharing hardware resources as far as possible, through a
pragmatic approach that applies advanced optimization
techniques at all levels: algorithm, architecture, and
target technology (Figure 1).
The main result of this work is the proposal and design
of an innovative architecture for a universal channel
decoder that achieves high throughput, energy efficiency
and modularity with controlled complexity. The proposed
channel decoder has been validated on an FPGA
demonstrator supporting LDPC and Turbo decoding with the
parameters specified in the WiFi, WiMax, LTE and DVB-RCS
standards.
II. MULTI-ASIP DECODER ARCHITECTURE  
The proposed ASIP was modeled in the LISA language using
CoWare’s Processor Designer. Synthesis in a 90 nm CMOS
technology gave 0.2 mm² per ASIP with a maximum clock
frequency Fclk = 500 MHz. The proposed channel decoder
architecture, with 8 ASIPs and the interconnection
network, thus occupies about 1.76 mm², with a total memory
area of 1.2 mm². The best throughput achieved in LDPC mode
is 460 Mbps, for the WiMax code with rate Crate=5/6, Z=96,
Mb=4, Nb=24 and Niter=10 iterations. The architecture has
NA=8 ASIPs, each processing CAN=3 check nodes every
ClkCN=2 clock cycles.
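The LDPC parameters quoted above can be related to each other with a few lines of arithmetic. The sketch below assumes the usual quasi-cyclic reading of the symbols (Z as the expansion factor, Mb and Nb as the base-matrix rows and columns); the variable names are illustrative and not taken from the paper. It stops short of a throughput estimate, because the per-check-node cycle cost and scheduling details are not given here.

```python
from fractions import Fraction

# Values quoted in the text for the WiMax rate-5/6 case.
Z = 96                      # expansion factor (assumed quasi-cyclic meaning)
MB, NB = 4, 24              # base-matrix rows and columns
CODE_RATE = Fraction(5, 6)  # Crate
NUM_ASIPS = 8               # NA
CN_PER_ASIP = 3             # check nodes handled by one ASIP ...
CLOCKS_PER_CN_GROUP = 2     # ... every ClkCN clock cycles

codeword_bits = Z * NB                      # coded bits per frame
check_nodes = Z * MB                        # parity checks per frame
info_bits = int(codeword_bits * CODE_RATE)  # information bits per frame

# Aggregate check-node rate over all ASIPs, in check nodes per clock.
cn_per_clock = Fraction(NUM_ASIPS * CN_PER_ASIP, CLOCKS_PER_CN_GROUP)

print(f"codeword bits    : {codeword_bits}")
print(f"information bits : {info_bits}")
print(f"check nodes      : {check_nodes}")
print(f"check nodes/clock: {cn_per_clock}")
```

Under these assumptions the codeword length comes out at 2304 bits, which matches the largest WiMax LDPC block size listed in Table 1.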
In turbo mode, an average of Ninstr=4 instructions is
needed to process one symbol, which is composed of
Bitssym=2 bits. With Niter=6 iterations, the maximum
throughput achieved is 160 Mbps.
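As a rough plausibility check of the turbo-mode figure, the quoted parameters can be combined into an idealised throughput estimate. The formula below is an assumption, not one stated in this text: it supposes that the 8 ASIPs are split into two groups of four, one group per component decoder, that both groups work concurrently, and that each ASIP issues one instruction per clock without stalls. The function and parameter names are hypothetical.

```python
def turbo_throughput_mbps(f_clk_mhz, bits_per_symbol, instr_per_symbol,
                          iterations, asips_per_component):
    """Idealised estimate: one instruction per clock, no pipeline or memory stalls.

    ASSUMPTION: the ASIPs of one component decoder share a frame evenly,
    so the symbol rate scales linearly with asips_per_component.
    """
    # Mega-symbols per second processed in a single iteration.
    msymbols_per_s = f_clk_mhz / instr_per_symbol * asips_per_component
    # Every frame is visited once per iteration.
    return msymbols_per_s * bits_per_symbol / iterations


# Fclk = 500 MHz, Bitssym = 2, Ninstr = 4, Niter = 6 come from the text;
# four ASIPs per component decoder is the assumption described above.
estimate = turbo_throughput_mbps(500, 2, 4, 6, asips_per_component=4)
print(f"{estimate:.1f} Mbps")
```

With these inputs the estimate is about 167 Mbps, the same order as the reported 160 Mbps. The difference may come from overheads that this simple model ignores.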
Table 2 compares the obtained results with related
works. The achieved throughput is comparable to [4] in
LDPC WiFi mode, while the proposed architecture
achieves twice the throughput in LDPC WiMax mode. In
SBTC mode, the throughput is more than 8 times that
achieved by [4], at the cost of about twice the occupied
area. The decoder in [5] occupies 20% more area than the
proposed architecture and does not meet the throughput
requirement of LTE. On the other hand, [3] achieves
higher throughput in LDPC and SBTC modes at the cost of
7.75% more area, and does not support DBTC.








































Table 2. Comparison with related works (only the row for [5] is recoverable)
[5] 0.9 45 70 100 70 18 150
III. ENERGY-AWARE OPTIMISATIONS  
A power consumption analysis was conducted and novel
power reduction techniques have been proposed. The
analysis uses the Synopsys PrimePower tool and is based
on post-synthesis simulations. Targeting ASIC technology,
the technological features and constraints of memories in
terms of power consumption and area have been thoroughly
analyzed. In addition, each memory block has been studied
in terms of size (width and depth), access behaviour and
scheduling, bandwidth requirements, and memory interface.
Based on these two studies, a new organisation of the
memory banks is proposed, together with a more suitable
memory access control. The proposed optimization
techniques (related to pipeline structure, memory
organization, and access control) yield a gain in
normalized energy efficiency of between 4% and 54%
compared to the state of the art.
Figure 1. FPGA demonstrator of the proposed flexible Multi-ASIP Turbo/LDPC decoder
Even though architecture-level optimizations can reduce
power consumption considerably, algorithm-level ones can
potentially yield higher gains. In this context, stopping
criteria are among the most common algorithm-level power
reduction methods in the literature. These methods always
come with some hardware overhead. In this research
activity, a new trellis-based stopping criterion was
proposed. Its novelty is a lower hardware overhead,
obtained by using trellis states as the key parameter to
stop the iterative process. Results show that the size of
this added hardware matters for the overall efficiency of a
stopping method. Compared to state-of-the-art LLR-based
techniques, the proposed Low Complexity Trellis Based
(LCTB) criterion consumes 23% less power on average, for a
comparable BER/FER performance level.
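The general idea of a state-based stopping criterion can be sketched as an early-exit loop around the decoding iterations. The exact LCTB rule is not spelled out in this text, so the sketch substitutes a generic agreement test: decoding stops once the trellis states selected in two consecutive iterations are identical. The function name and the run_iteration hook are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def decode_with_early_stop(
    run_iteration: Callable[[int], Sequence[int]],
    max_iterations: int,
) -> Tuple[List[int], int]:
    """Iterate until the state sequence repeats or the budget is spent.

    run_iteration(i) is a hypothetical hook that performs decoding
    iteration i and returns the trellis states selected in that iteration.
    Returns the last state sequence and the number of iterations used.
    """
    previous = None
    states: List[int] = []
    for i in range(1, max_iterations + 1):
        states = list(run_iteration(i))
        # ASSUMED criterion: two consecutive iterations agree on all states.
        if previous is not None and states == previous:
            return states, i
        previous = states
    return states, max_iterations


# Stub decoder whose state sequence settles from the second iteration on.
trace = {1: [0, 3, 5], 2: [0, 2, 5]}
final_states, used = decode_with_early_stop(
    lambda i: trace.get(i, [0, 2, 5]), max_iterations=6
)
print(f"stopped after {used} of 6 iterations")
```

In this stub, three of the six budgeted iterations are executed. Skipping the remaining iterations is where the power saving of any stopping criterion comes from, while the comparison itself only needs the state values already present in the decoder.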
IV. CONCLUSION 
To meet the flexibility and performance constraints of
current and future digital communication applications,
multiple ASIPs combined with dedicated communication
and memory architectures are required. In this work we
considered the design of an innovative universal channel
decoder architecture model that unifies flexibility-oriented
and optimization-oriented approaches. Towards this
objective, we have designed a flexible and scalable
multiprocessor platform based on a novel ASIP
architecture for high-throughput turbo/LDPC decoding.
The proposed platform supports the turbo and LDPC codes
of most emerging wireless communication standards
(WiFi, WiMax, LTE, and DVB-RCS). Energy-aware
optimisation techniques have also been proposed and
implemented.
Finally, a fully functional FPGA demonstrator is
available, and the proposed Multi-ASIP architecture has
been successfully integrated into a new-generation
telecom chip.
ACKNOWLEDGMENTS 
This work was supported in part by the UDEC and TEROPP 
projects of the French Research Agency (ANR). 
V. REFERENCES 
[1] P. Murugappa, R. Al Khayat, A. Baghdadi, and M. Jézéquel. "A
Flexible High Throughput Multi-ASIP Architecture for
LDPC and Turbo Decoding", DATE’11, 2011.
[2] P. Reddy, F. Clermidy, A. Baghdadi, and M. Jézéquel. "A
Low Complexity Stopping Criterion for Reducing Power
Consumption in Turbo Decoders", DATE’11, 2011.
[3] Y. Sun and J. Cavallaro. "A Flexible LDPC/Turbo Decoder
Architecture", Journal of Signal Processing Systems, vol.
64, July 2010.
[4] M. Alles, T. Vogt, and N. Wehn. "FlexiChaP: A
reconfigurable ASIP for convolutional, turbo, and LDPC
code decoding", 5th Inter. Sym. on Turbo Codes and
Related Topics, 2008.
[5] G. Gentile, M. Rovini, and L. Fanucci. "A Multi-Standard
Flexible Turbo/LDPC Decoder via ASIC Design", Inter.
Sym. on Turbo Codes and Iter. Info. Processing, 2010.
 
