Reconfigurable Architectures for Wireless Systems: Design Exploration and Integration Challenges by Cavallaro, Joseph R. et al.
  
Wireless World Research Forum 
(WWRF) 
 
 
 
Abstract — Mobile devices are severely 
power and area limited due to battery capacity 
and system size. In many of these 
systems, advanced features require 
computationally complex signal processing on 
high-speed data streams for enhanced 
networking capabilities. Thus, mapping high-
level communication and networking 
algorithms to system architectures is a 
complex and challenging procedure. An 
important challenge is to characterize the area, 
time, and power requirements of these 
embedded system modules and to use this 
information effectively to determine the 
architecture of programmable, reconfigurable, 
and fixed-function modules. In this paper, we 
will focus on application examples in wireless 
networking which highlight these challenges 
in reconfigurable systems integration. 
 
Index Terms — Reconfigurable Elements, 
Flexible Air-Interface – Physical layer hardware 
architectures and software environments 
enabling reconfigurable radio  
 
Introduction 
 
Most wireless physical layers are 
characterized by several signal 
processing blocks which are similar in their 
structure and functionality but differ in their 
implementations. The digital baseband 
processing complexity is dominated by 
channel estimation, detection and decoding 
in the presence of interference. Interconnect 
is a limited design resource needed for 
achieving parallelism in these architectures 
[1-3]. In recent research, we have proposed 
algorithms and architectures for these three 
major blocks [4-12, 15, 16] for both base-
stations and mobile handsets in cellular and 
indoor wireless LAN systems. The proposed 
fixed-function ASIC-like implementations 
simultaneously achieve high performance 
and area-power efficiency by exploiting 
special algorithmic and architectural 
structures, and it is important to preserve 
these features when developing 
reconfigurable systems. 
 
With rapid development and adoption of 
advanced wireless systems, it is important to 
reduce the time to develop hardware 
research prototypes of these new algorithms 
[17, 18]. It is well known that VLSI 
implementations consume minimal power but 
typically require long design cycles. On the 
other hand, DSPs are completely 
programmable but are in general unable to 
meet real-time deadlines and power budgets 
for high data rate mobile communications. A 
new class of application specific processors 
is an area of current research [19]. These 
architectures are programmable and may 
also be reconfigurable to allow for application 
specific instructions [20, 21]. FPGAs present 
an excellent platform for prototyping and 
evaluation of these architectures, particularly 
the class of heterogeneous FPGAs that 
contain both programmable fabric and 
embedded processor cores. However, there 
are many new challenges in interconnect 
design to effectively integrate the embedded 
host and programmable fabric co-processor 
functions in these FPGAs (see Figure 1). 

Joseph R. Cavallaro, Michael C. Brogioli, Alexandre de Baynast, and Predrag Radosavljevic, 
Center for Multimedia Communication, Department of Electrical and Computer Engineering, 
Rice University, MS 380, 6100 Main Street, Houston, TX 77005 USA,  +1-713-348-4719 
(voice), +1-713-348-6196 (fax), cavallar@rice.edu, brogioli@rice.edu, 
debaynas@ece.rice.edu, rpredrag@rice.edu  
 
Design Exploration and 
Integration Challenges 
 
Advanced signal processing algorithms 
contain a number of key numerically 
intensive kernels. Many of these kernels 
operate on streams of vectors or matrices 
and present challenges in interconnect and 
datapath design. As an example, wireless 
communication systems are composed of a 
pipeline of sub-algorithms with varying 
degrees of instruction and data parallelism. 
Our key research innovations are in mapping 
signal processing algorithms to parallel 
architectures. As wireless systems both increase 
in data rate and require increased flexibility to 
adapt to changing environmental channel 
conditions, VLSI signal processing 
architectures are expanding from fixed-
function ASICs and general programmable 
DSPs to application-specific instruction 
processors (ASIPs) and heterogeneous 
FPGAs with embedded processor cores and 
programmable fabrics [22-24]. We are 
investigating the power and interconnect 
challenges of these systems through 
partitioning into clusters [25] and the design 
of host to co-processor interfaces [29]. Our 
Rice University CMC wireless testbed will 
contain several high-density Xilinx Virtex-II 
Pro FPGAs that will allow us to study the 
efficiency of these devices and to provide 
FPGA modules and design techniques for 
intra-chip and inter-chip communication. 
 
As an example, our research in flexible Low 
Density Parity Check (LDPC) codes [26] has 
produced an initial prototype on standard FPGAs. 
LDPC decoding is a highly parallel algorithm that 
requires large amounts of interconnect 
resources, and is therefore a research 
challenge in system integration. Our use of 
FPGA systems for rapid prototyping provides 
new core modules beyond those currently 
available from existing sources [27,28]. In our 
current research, we plan to utilize the Rice 
testbed to verify performance and integration 
in heterogeneous FPGAs and to make these 
host to coprocessor interface modules 
available. Additionally, we will describe our 
simulation environment for SoC design 
exploration that models the TI C64x DSP 
core with customizable programmable co-
processors. Our goals are to clearly and 
carefully model the software and hardware 
communication interfaces and bottlenecks in 
heterogeneous systems to improve the 
modularity and reusability of hardware and 
software for high data rate reconfigurable 
communication systems. It is important to 
model and simulate the data transfer 
between the hosts and co-processors so that 
the design of the interface ports and the 
communication link reduces the overall system 
cycle count. A poorly designed interface may 
actually reduce system performance. 
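
The effect of the interface on system cycle counts can be illustrated with a simple first-order model. The sketch below is not the simulation environment described in this paper; the class name OffloadModel, its fields, and all cycle numbers are hypothetical and chosen only for illustration. It assumes that data transfer and co-processor computation do not overlap.

```python
from dataclasses import dataclass


@dataclass
class OffloadModel:
    """First-order cycle model for moving one kernel from a host to a co-processor.

    All fields are illustrative assumptions, not measured values.
    """
    host_cycles: int          # cycles to run the kernel on the host alone
    coproc_cycles: int        # cycles to run the kernel on the co-processor
    words_moved: int          # words sent to, plus returned from, the co-processor
    cycles_per_word: float    # interface cost per word (bus width, arbitration)
    setup_cycles: int = 0     # fixed cost per invocation (handshake, configuration)

    def transfer_cycles(self) -> float:
        return self.setup_cycles + self.words_moved * self.cycles_per_word

    def offloaded_cycles(self) -> float:
        # Assumes no overlap between transfer and computation (pessimistic).
        return self.coproc_cycles + self.transfer_cycles()

    def speedup(self) -> float:
        return self.host_cycles / self.offloaded_cycles()


# Same kernel and co-processor, two different interface designs.
fast_link = OffloadModel(host_cycles=10_000, coproc_cycles=1_000,
                         words_moved=512, cycles_per_word=1.0, setup_cycles=100)
slow_link = OffloadModel(host_cycles=10_000, coproc_cycles=1_000,
                         words_moved=512, cycles_per_word=20.0, setup_cycles=100)

print(f"fast link speedup: {fast_link.speedup():.2f}")
print(f"slow link speedup: {slow_link.speedup():.2f}")
```

With these assumed numbers, the co-processor is ten times faster than the host on the kernel itself, yet the slow link yields a speedup below 1, so offloading makes the system slower. A real design exploration would replace these constants with statistics collected from simulation.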
 
Processor Challenges for 4G 
Systems 
 
The ongoing definition of 4G wireless 
systems underscores several urgent 
challenges for system realization and design. 
Higher data rates and complex antenna and 
modulation schemes will require efficient 
algorithms and flexible chip architectures for 
baseband processing. Not only will area, 
time, and power consumption be the main 
design constraints, but system 
reconfigurability and design reuse will also be 
needed for fast creation of new systems. This 
flexibility will be key for MIMO antenna 
processing blocks that may be configured 
with a varying number of antennas. 

Figure 1: Parallel Interference Cancellation detector example on a heterogeneous DSP-FPGA system. Interconnect among modules may reduce system performance, after [5].
 
As data rates have increased, a single DSP 
processor is no longer able to keep up with the 
signal processing requirements of advanced 
algorithms. Multiple DSP processors do not 
have the power efficiency to exploit the 
instruction and data parallelism in the 
algorithms. A solution to this DSP limitation 
has focused on custom ASIC co-processor 
designs clustered around a programmable 
DSP host. However, the multiple co-processor 
solution requires extensive design 
and verification time, and with the spiraling 
costs of ASIC fabrication, these systems are 
increasing in cost and offer little opportunity for 
reuse or modification. The class of application 
specific instruction set processors has 
recently emerged to fill this middle ground 
between DSP and ASIC solutions. For 
example, stream processors, as shown in 
Figure 2, contain multiple SIMD-like clusters 
with specialized memory systems. Our recent 
research on chip equalizers for HSDPA and 
1X-EV-DO systems and LDPC decoding has 
shown the flexibility of these ASIP solutions 
which will lead to System on Chip (SoC) 
processors. From design exploration of 
parallelism and flexibility requirements, these 
future SoC processors will be an efficient 
integration of DSP, ASIP, and ASIC 
structures. 
 
For high data rate 4G systems operating at 
greater than 100 Mbps, several emerging 
research trends will 
require significant advances in VLSI 
architectures to provide performance under 
tight area, time, and power constraints. For 
many MIMO OFDM systems, effective coding 
will be important. In [30], an iterative turbo-
like system integrating LDPC decoding with 
the receiver is presented. However, for 
outdoor highly mobile environments, 
equalization will be needed even in OFDM 
systems [31,32]. Furthermore, parallel 
interference cancellation can also be used to 
enhance these MC-CDMA systems [33]. With 
the focus in 4G on adapting OFDM, there can 
also be benefits from applying techniques 
from WLAN applications, and also from low-
power UWB OFDM architectures [34]. 
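
For readers unfamiliar with parallel interference cancellation, one hard-decision stage can be sketched as follows. This is a textbook-style toy example, not the detector of [5] or [33]; it assumes a synchronous, noiseless two-user system in which the receiver knows the cross-correlation matrix R and the amplitudes A. All names and values are illustrative.

```python
def sign(x: float) -> int:
    """Hard decision: map a soft value to +1 or -1."""
    return 1 if x >= 0 else -1


def matched_filter_decisions(y):
    """Conventional detector: decide each user's bit from its own matched filter output."""
    return [sign(v) for v in y]


def pic_stage(y, R, A, b_prev):
    """One parallel interference cancellation stage.

    For every user k, the interference from all other users is rebuilt from
    the previous stage's decisions and subtracted before a new decision.
    The per-user updates are independent, so they can run in parallel.
    """
    K = len(y)
    b_next = []
    for k in range(K):
        interference = sum(R[k][j] * A[j] * b_prev[j] for j in range(K) if j != k)
        b_next.append(sign(y[k] - interference))
    return b_next


# Toy case: user 2 is three times stronger than user 1 (near-far situation).
R = [[1.0, 0.6], [0.6, 1.0]]   # assumed cross-correlations between spreading codes
A = [1.0, 3.0]                 # assumed received amplitudes
b_true = [1, -1]               # transmitted bits
y = [sum(R[k][j] * A[j] * b_true[j] for j in range(2)) for k in range(2)]

b0 = matched_filter_decisions(y)
b1 = pic_stage(y, R, A, b0)
print("matched filter:", b0)
print("after one PIC stage:", b1)
```

In this toy case the matched filter misdetects the weak user, and a single cancellation stage recovers the transmitted bits. Each user's update needs the decisions of every other user, which is where the module-to-module interconnect demand arises.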
 
Figure 2: Programmable multi-cluster stream processor architecture with FPGA co-processor, highlighting interconnect challenges between functional units and register files, after [1]. 

In addition to the flexibility needed for 
adaptation as 4G standards continue to 
evolve, there is growing interest in system 
reconfigurability and design modularization 
and reuse. However, system reconfigurability 
typically introduces overhead (lower data rate 
and higher power) that may not be 
acceptable in many applications. The 
commercial aspects of the military XG and 
JTRS hardware research are leading to 
cognitive radio and are also being studied in the 
European E2R end-to-end reconfigurability 
project and the WWRF WG-6 on 
Reconfigurability. These current system 
initiatives highlight research challenges in 
increasing data rate and lowering power 
consumption. In order to develop low-cost, 
low-power systems, there is much research 
to be done in creating a low-power 
architecture with an efficient hardware 
abstraction layer (HAL). This HAL would 
allow for efficient hardware and software 
partitioning and for the use of appropriate 
modules to create an adaptable 4G system. 
Current wireless system architectural blocks 
do not have the design standardization and 
modularization to allow for efficient mix and 
match for reconfigurable (or cognitive) radio 
systems. Research in the use of design and 
architecture simulation tools, such as Xilinx 
System Generator, is leading to more 
efficient design reuse strategies. 
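
One way to picture such a hardware abstraction layer is as a common module interface with interchangeable software and co-processor implementations. The sketch below is purely illustrative: the class names, the choice of an FIR filter as the kernel, and the word-counting co-processor stand-in are all assumptions made here, not part of any existing HAL or of the authors' design.

```python
from abc import ABC, abstractmethod
from typing import List


class FirModule(ABC):
    """Hypothetical HAL interface: callers do not know where the kernel runs."""

    @abstractmethod
    def run(self, samples: List[float], taps: List[float]) -> List[float]:
        ...


class SoftwareFir(FirModule):
    """Reference implementation executed on the host processor."""

    def run(self, samples, taps):
        out = []
        for n in range(len(samples)):
            acc = 0.0
            for k, tap in enumerate(taps):
                if n - k >= 0:
                    acc += tap * samples[n - k]
            out.append(acc)
        return out


class CoprocessorFir(FirModule):
    """Stand-in for a fabric co-processor.

    It only models the interface traffic (words moved in each direction)
    and delegates the arithmetic to the software model.
    """

    def __init__(self):
        self.words_transferred = 0
        self._model = SoftwareFir()

    def run(self, samples, taps):
        self.words_transferred += len(samples) + len(taps)  # host to co-processor
        result = self._model.run(samples, taps)
        self.words_transferred += len(result)               # co-processor to host
        return result


def receiver_chain(fir: FirModule, samples: List[float]) -> List[float]:
    """Application code written once against the abstract interface."""
    return fir.run(samples, taps=[0.5, 0.5])


sw = receiver_chain(SoftwareFir(), [1.0, 3.0, 5.0])
hw_module = CoprocessorFir()
hw = receiver_chain(hw_module, [1.0, 3.0, 5.0])
print(sw, hw, hw_module.words_transferred)
```

Because `receiver_chain` depends only on the abstract interface, the partitioning decision can change without touching application code, and the transfer counter gives a simulator a place to collect interface statistics.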
 
Conclusion 
 
Reconfigurability and system design and re-
use will be important for 4G wireless 
systems. The mixture of DSP, ASIP, and 
ASIC structures will require efficient 
interconnection interfaces for high 
performance system design. The design of 
these ports, buffers, queues, and the 
resulting hardware abstraction layer will need 
to be optimized for overall system 
performance. Simulation environments will 
need to be enhanced to collect system 
statistics for design exploration of processor 
utilization and power efficiency. 
ACKNOWLEDGMENT 
This work was supported in part by Nokia 
Corporation, Texas Instruments, Inc., and by 
NSF under grants EIA-0224458 and 
EIA-0321266. 
REFERENCES 
[1] S. Rajagopal, J. R. Cavallaro, and S. Rixner, 
“Design Space Exploration for Real-Time 
Embedded Stream Processors,” IEEE Micro, vol. 
24, July-August 2004. 
[2] M. Sgroi, M. Sheets, A. Mihal, K. Keutzer, S. Malik, 
J. Rabaey, and A. Sangiovanni-Vincentelli, 
“Addressing the System-on-a-Chip Interconnect 
Woes through Communication-based Design,” in 
Proceedings of the Design Automation Conference, 
(Las Vegas, NV), pp. 667–672, June 2001. 
[3] I. Verbauwhede, P. Schaumont, C. Piguet, and B. 
Kienhuis, “Architectures and Design Techniques for 
Energy Efficient Embedded DSP and Multimedia 
Processing,” in Proceedings of the Design 
Automation and Test in Europe Conference, vol. 2, 
pp. 988–993, February 2004. 
[4] S. Das, S. Rajagopal, C. Sengupta, and J. R. 
Cavallaro, “Arithmetic acceleration techniques for 
wireless communication receivers,” in 33rd 
Asilomar Conference on Signals, Systems, and 
Computers, (Pacific Grove, CA), pp. 1469–1474, 
October 1999. 
[5] S. Rajagopal, B. A. Jones, and J. R. Cavallaro, 
“Task partitioning base-station receiver algorithms 
on multiple DSPs and FPGAs,” in International 
Conference on Signal Processing, Applications and 
Technology, (Dallas, TX), August 2000. 
[6] C. Sengupta, J. R. Cavallaro, and B. Aazhang, “On 
multipath channel estimation for CDMA using 
multiple sensors,” IEEE Transactions on 
Communications, vol. 49, pp. 543–553, March 
2001. 
[7] B. Aazhang and J. R. Cavallaro, “Multi-tier Wireless 
Communications,” Kluwer Wireless Personal 
Communications, Special Issue on Future Strategy 
for the New Millennium Wireless World, vol. 17, pp. 
323–330, June 2001. 
[8] S. Rajagopal and J. R. Cavallaro, “A bit-streaming 
pipelined multiuser detector for wireless 
communications,” in IEEE International Symposium 
on Circuits and Systems (ISCAS), vol. 4, (Sydney, 
Australia), pp. 128–131, May 2001. 
[9] K. Chadha and J. R. Cavallaro, “A Dynamically 
Reconfigurable Viterbi Decoder Architecture,” in 
Proc. Asilomar Conference on Signals, Systems 
and Computers, (Pacific Grove, CA), pp. 66–71, 
November 2001. 
[10] J. R. Cavallaro and M. Vaya, “VITURBO: A 
Reconfigurable Architecture for Viterbi and Turbo 
Decoding,” in IEEE International Conference on 
Acoustics, Speech, and Signal Processing 
(ICASSP), vol. II, (Hong Kong, China), pp. 497–500, 
April 2003. 
[11] G. Xu, S. Rajagopal, J. R. Cavallaro, and B. 
Aazhang, “VLSI Implementation of the Multistage 
Detector for Next Generation Wideband CDMA 
Receivers,” Journal of VLSI Signal Processing, vol. 
30, pp. 21–33, March 2002. 
[12] S. Rajagopal, S. Bhashyam, J. R. Cavallaro, and B. 
Aazhang, “Real-time algorithms and architectures 
for multiuser channel estimation and detection in 
wireless base-station receivers,” IEEE Transactions 
on Wireless Communications, vol. 1, pp. 468–479, 
July 2002. 
[13] W. R. Davis, N. Zhang, K. Camera, D. Markovic, T. 
Smilkstein, M. J. Ammer, E. Yeo, S. Augsburger, B. 
Nikolic, and R. W. Brodersen, “A Design 
Environment for High-Throughput Low-Power 
Dedicated Signal Processing Systems,” IEEE 
Journal of Solid-State Circuits, vol. 37, pp. 420–431, 
March 2002. 
[14] R.W. Brodersen, W. R. Davis, D. Yee, and N. 
Zhang, “Wireless Systems-on-a-chip Design,” in 
Proc. Intl. Symposium on VLSI Technology, 
Systems, and Applications, (Hsinchu, Taiwan), pp. 
45–48, April 2001. 
[15] B. A. Jones, “Rapid Prototyping of Wireless 
Communications Systems,” Master’s thesis, 
Department of Electrical and Computer 
Engineering, Rice University, Houston, TX, January 
2002. 
[16] S. Rajagopal, S. Bhashyam, J. R. Cavallaro, and B. 
Aazhang, “Efficient VLSI architectures for baseband 
signal processing in wireless base-station 
receivers,” in 12th IEEE International Conference 
on Application-specific Systems, Architectures and 
Processors (ASAP), (Boston, MA), pp. 173–184, 
July 2000. 
[17] A. Bria, F. Gessler, O. Queseth, R. Stridh, M. 
Unbehaun, J. Wu, and J. Zander, “4th-Generation 
Wireless Infrastructures: Scenarios and Research 
Challenges,” IEEE Personal Communications, pp. 
25–31, December 2001. 
[18] U. Varshney and R. Jain, “Issues in Emerging 4G 
Wireless Networks,” IEEE Computer, pp. 94–96, 
June 2001. 
[19] H. Corporaal, “A different approach to high 
performance computing,” in High Performance 
Computing, 1997. Proceedings. Fourth International 
Conference on, (Bangalore, India), Dec. 1997. 
[20] S. Srikanteswara, J. Neel, J. H. Reed, and P. 
Athanas, “Designing Soft Radios for High Data 
Rate Systems and Integrated Global Services,” in 
Proc. Asilomar Conference on Signals, Systems 
and Computers, (Pacific Grove, CA), November 
2001. 
[21] I. P. Seskar and N. B. Mandayam, “A Software 
Radio Architecture for Linear Multiuser Detection,” 
IEEE Journal on Selected Areas in 
Communications, vol. 17, pp. 814–823, May 1999. 
[22] S. W. Ellinson and M. P. Fitz, “A Software Radio-
based System for Experimentation in Wireless 
Communications,” in IEEE Vehicular Technology 
Conference (VTC), vol. 3, (Ottawa, Canada), pp. 
2348–2352, May 1998. 
[23] J. Ou, S. Choi, and V. K. Prasanna, “Performance 
Modeling of Reconfigurable SoC Architectures and 
Energy-efficient Mapping of a Class of 
Applications,” in IEEE Symposium on Field-
Programmable Custom Computing Machines, pp. 
241–250, April 2003. 
[24] C. Dick and F. J. Harris, “Configurable Logic for 
Digital Communications: Some Signal Processing 
Perspectives,” IEEE Communications Magazine, 
pp. 107–111, August 1999. 
[25] S. Rajagopal, Data-Parallel Digital Signal 
Processors: Algorithm mapping, Architecture 
Scaling, and Workload Adaptation. PhD thesis, 
Rice University, 2004. 
[26] M. Karkooti, “VLSI architectures for real time LDPC 
decoding,” Master’s thesis, Rice University, 2004. 
[27] OpenCores Library. http://opencores.org/. 
[28] Xilinx Programmable Logic Core Library. 
http://xilinx.com/. 
[29] P. Willmann, M. C. Brogioli, and V. Pai, “Spinach: A 
liberty-based simulator for programmable network 
interface architectures,” in Languages, Compilers, 
and Tools for Embedded Systems, 2004. 
[30] B. Lu, G. Yue, and X. Wang, “Performance Analysis 
and Design Optimization of LDPC-Coded MIMO 
OFDM Systems,” IEEE Transactions on Signal 
Processing, vol. 52, pp. 348–361, February 2004. 
[31] W. Bai, L. Tang, and Z. Bu, “An Equalization 
Method for OFDM in Time Varying Multipath 
Channels,” in IEEE 60th Vehicular Technology 
Conference (VTC 2004-Fall), Paper 0303_16, 
September 2004. 
[32] M. B. Breinholt, M. D. Zoltowski, and T. A. Thomas, 
“Space-Time Equalization and Interference 
Cancellation for MIMO OFDM,” in Proc. Thirty-Sixth 
Asilomar Conference on Signals, Systems 
and Computers, vol. 2, pp. 1688–1693, 
November 2002. 
[33] S. Iraji and J. Lilleberg, “Interference Cancellation 
for Space-Time Block-Coded MC-CDMA Systems 
over Multipath Fading Channels,” in IEEE 58th 
Vehicular Technology Conference (VTC 2003-Fall), 
vol. 2, pp. 1104–1108, October 2003. 
[34] A. Batra, J. Balakrishnan, and A. Dabak, “Multi-band 
OFDM: A New Approach for UWB,” in 
Proceedings of the 2004 International 
Symposium on Circuits and Systems (ISCAS '04), 
vol. 5, pp. 365–368, May 2004. 
 
