Abstract-This
I. INTRODUCTION
Multicarrier wireless communication technology depends heavily on efficient digital signal processing (DSP). For high data rate communication, these DSP functions must be realized by means of parallel signal processing techniques. Fortunately, the principal DSP building blocks for multicarrier communications, such as DFTs and FIR filter banks, lend themselves readily to parallel processing architectures. However, the realization of multicarrier communication circuitry within a software defined radio [1] or cognitive radio platform may impose a further requirement on the DSP portion of the radio, namely that it be reconfigurable. In terms of these DSP building blocks, reconfiguration could, for example, involve changes to dimensionality as well as to filter coefficient and "twiddle factor" values. In addition, the dynamic range of the intermediate values within and between the different DSP building blocks may also have to be adjusted to assure computational accuracy while maintaining acceptable power levels.
These requirements call for flexible and optimized hardware that can support complex DSP computation in real time [2] [3]. Most reconfigurable communications systems are designed to target specific applications, such as the system in [4], which is aimed specifically at signal routing for the processing elements of an SoC circuit, and the partially reconfigurable system in [5] for WLANs. Other reconfigurable systems offer more flexibility; for example, the DRAW system is a systolic array circuit that can be configured to support functions such as M-sequence generation or FIR filtering [6]. In this paper, an SoC design that is reconfigurable and capable of supporting a wide range of simple and complex DSP functions is presented. The reconfigurable systolic array (RSA) structure presented here supports the implementation of N-channel Polyphase filters, IDFT/DFTs, and IDFT-Polyphase/Polyphase-DFT functions. The IDFT-Polyphase and Polyphase-DFT functions are usually employed to perform up and down conversion of composite FDM signals. A circuit that supports dynamic reconfiguration of the IDFT-Polyphase/Polyphase-DFT function could be used to process a composite FDM carrier for different channel or spectrum bandwidths according to requirements. As a result, the RSA circuit could potentially be used to support dynamic spectrum access for cognitive network applications [7].
The paper is organized as follows. A reconfigurable systolic array architecture and representative configurations for wireless applications are considered in Section II. Section III describes the design of the RSA structure and its two main building blocks: the processing element (PE) and the switch (SW). Configurations of the RSA circuit that realize DFT, Polyphase filter, Polyphase-DFT, and IDFT-Polyphase functionality are also dealt with there. FPGA implementation and simulation results for the SW cell and RSA circuit are presented in Section IV. The mapping and simulation results for the 32-point DFT, 8-channel Polyphase filter, 8-channel IDFT-Polyphase, and 8-channel Polyphase-DFT circuits mapped onto the RSA are also presented in that section. Section V provides a summary and conclusions for this work.
II. SYSTOLIC ARRAY ARCHITECTURE FOR WIRELESS APPLICATIONS
The discrete Fourier transform (DFT) and FIR filtering are among the most common DSP functions employed in wireless applications. Both DFTs and FIR filters can be realized by means of regular structures consisting of interconnected multiplier-accumulator (MAC) arithmetic units, in which the input data is multiplied by either filter coefficients or "twiddle factors" and the generated partial results are then combined. Systolic arrays with dedicated processing elements have been used effectively to implement FIR filters and DFTs [8] [9] [10] [11] [12]. A block diagram of the array architecture intended to support FIR filters, DFTs, and more complex DSP functions is depicted in Fig. 1. In fact, the IDFT-Polyphase and Polyphase-DFT functions employed to perform up and down conversion of composite FDM signals can be implemented on such a systolic array. The IDFT-Polyphase and Polyphase-DFT circuits both consist of three DSP building blocks [13], namely a Polyphase filter bank, an IDFT/DFT module, and a bank of phase shifters implemented by means of complex multipliers. The FIR filters and the phase shifters operate independently. As a result, the building blocks of the IDFT-Polyphase circuit can be mapped onto a systolic array of MACs as shown in Fig. 2.
Similarly, a Polyphase-DFT circuit can be implemented on a systolic array of MACs as depicted in Fig. 2 by replacing the IDFT with a DFT and exchanging its position with that of the Polyphase filter bank.
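The Polyphase filter bank named above as one of the three building blocks rests on a standard decomposition: a prototype FIR filter followed by decimation by M can be computed by M short branch filters running at the low rate. The following Python sketch illustrates that identity only. The function names `fir_direct` and `polyphase_decimate`, the zero initial state, and the requirement that the filter length be a multiple of M are assumptions made for illustration; they are not taken from the circuit described in this paper.

```python
def fir_direct(h, x):
    """Full-rate FIR: y[n] = sum_i h[i] * x[n - i], with x[n] = 0 for n < 0."""
    return [
        sum(h[i] * x[n - i] for i in range(len(h)) if n - i >= 0)
        for n in range(len(x))
    ]


def polyphase_decimate(h, x, M):
    """Decimate-by-M FIR computed with M polyphase branches.

    Branch p holds the coefficients e_p[k] = h[k*M + p] and sees the
    low-rate input sequence x[m*M - p] (illustrative indexing convention).
    """
    if len(h) % M != 0:
        raise ValueError("filter length must be a multiple of M in this sketch")
    K = len(h) // M  # taps per branch
    branches = [[h[k * M + p] for k in range(K)] for p in range(M)]
    n_out = (len(x) + M - 1) // M  # number of decimated output samples
    y = []
    for m in range(n_out):
        acc = 0
        for p in range(M):          # the M branches are independent filters
            for k in range(K):      # each branch is a K-tap FIR (K MACs)
                idx = (m - k) * M - p
                if idx >= 0:
                    acc += branches[p][k] * x[idx]
        y.append(acc)
    return y
```

Because the branches do not exchange data, each one can be assigned to its own chain of MAC units, which is consistent with the statement above that the FIR filters operate independently.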
III. RECONFIGURABLE SYSTOLIC ARRAY DESIGN

A. The reconfigurable systolic array structure (RSA)
The RSA consists of two types of modules: the processing elements (PE) and the switches (SW). The PE is responsible for arithmetic computations while the SW supports signal routing between the PE cells. Figure 3 depicts the interconnections between the reconfigurable SWs and PEs, which jointly enable the RSA to support different modes of operation. 
B. The PE cell
The PE cell is designed to perform a combination of addition and multiplication operations, depending on the circuit configuration requirements. A block diagram of the PE cell is shown in Fig. 4. A more detailed description of the PE's architecture and design will be published in a companion paper. The PE's mode of operation is governed by the configuration data stored in the control circuit. Because the configuration data can be changed dynamically, the PE's architecture can be reconfigured in real time. Coefficients stored in the PE cell's RAM block are also configurable in real time.
The PE cell has two sets of inputs, each of which consists of four input buses. Data sequences are shifted out of the PE cell via four output buses, as shown in Fig. 4 . Input sequences shifted into the first set of input buses consist of data to be multiplied by coefficients, while partial results from another PE cell are shifted in via the second set of input buses. These partial results are normally added to results generated from the multiplication process, to produce the PE output data. Each multiplier block in Fig. 4 performs two real multiplications involving two sets of data and coefficients. Thus, four real multiplication operations are performed by the PE cell. 
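One way to picture the arithmetic described above is as a complex multiply-accumulate step: four real products form one complex product, which is then added to the incoming partial result. The Python sketch below is a behavioral illustration under that assumption. The function name `pe_cell`, the grouping of the four products into two multiplier blocks, and the (real, imaginary) tuple interface are hypothetical; the actual PE datapath is deferred to the companion paper.

```python
def pe_cell(data, coeff, partial):
    """Behavioral model of one PE step (illustrative, not the actual design).

    data, coeff, partial are (real, imag) pairs. Returns
    partial + data * coeff, computed with four real multiplications.
    """
    d_re, d_im = data
    c_re, c_im = coeff
    p_re, p_im = partial

    # Assumed multiplier block 1: two real products for the real part.
    m1 = d_re * c_re
    m2 = d_im * c_im
    # Assumed multiplier block 2: two real products for the imaginary part.
    m3 = d_re * c_im
    m4 = d_im * c_re

    # Combine the products and add the partial result from the neighboring PE.
    return (p_re + m1 - m2, p_im + m3 + m4)
```

For a real-valued FIR configuration, the same model applies with the imaginary parts set to zero.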
C. The switch (SW)
The switches route signals from the RSA's inputs to the PE cells, between the PE cells, and from the PE cells to the array's output. Each SW interfaces with four neighboring SWs and three nearby PE cells. The signals from the two neighboring switches to the north and west are global signals, each originating either as an RSA input or as the output of a PE cell to the west or north of the SW. Signals from the two neighboring PE cells on the west and north sides of the SW are either global or local signals. The SW routes global signals to its two neighboring SWs on its east and south sides and to the PE cell on its east side, to which it also routes the local signals.
The global signal routed to the PE cell contains data that is subjected to addition and/or multiplication operations and combined with the local signal to generate a partial output result. Figure 5 depicts the interconnections between a SW and its neighboring SW and PE cells.
To realize an N-tap FIR filter, an RSA structure with N PE cells is needed. Global signals corresponding to the RSA data inputs are routed to each PE cell. Local signals corresponding to intermediate results are routed to each PE cell from its neighboring PE cell, as in Fig. 6, which depicts the signal connections for the DFT and FIR filter circuits. The upper and lower parts of Fig. 8 show the circuit configurations and signal interconnections for Polyphase-DFT and IDFT-Polyphase respectively.
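The N-tap FIR mapping described above, with the input broadcast to every PE as a global signal and partial sums passed between neighbors as local signals, corresponds to the well-known transposed-form FIR structure. The Python sketch below models it cycle by cycle. The function name `systolic_fir`, the one-register delay on each local link, and the order in which coefficients are assigned to PEs are assumptions for illustration rather than details taken from Fig. 6.

```python
def systolic_fir(coeffs, samples):
    """Cycle-level model of an N-tap FIR on a chain of N PE cells.

    Each cycle, the current sample is broadcast to all PEs (global signal).
    PE i multiplies it by its stored coefficient and adds the partial sum
    that PE i-1 produced on the previous cycle (local signal).
    """
    n = len(coeffs)
    regs = [0] * n  # regs[i]: partial sum registered at the output of PE i
    out = []
    for x in samples:
        regs = [
            # PE i holds coeffs[n-1-i], so the last PE applies coeffs[0].
            coeffs[n - 1 - i] * x + (regs[i - 1] if i > 0 else 0)
            for i in range(n)
        ]
        out.append(regs[-1])  # the last PE in the chain drives the output
    return out
```

Feeding a unit impulse through the chain returns the coefficients in order, which is a quick check that the broadcast and neighbor links are wired consistently.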
IV. CIRCUIT IMPLEMENTATIONS AND SIMULATION RESULTS
In this section, circuit implementation and simulation results for the SW cell and the RSA structures are presented. Results are given for an N-point DFT implementation, where N=32, with serial input and parallel outputs. The same basic RSA implementation has been configured to operate as a bank of eight 8-tap Polyphase filters, an 8-channel Polyphase-DFT, or an 8-channel IDFT-Polyphase. The RSA circuit consists of two arrays with a total of 16 PE and 16 SW cells. Two's complement number representation and fixed-point arithmetic have been used throughout. The hardware implementation is based on the Xilinx XC2VP100-6ff1704 FPGA device [14].
The input and output data are represented by 24-bit and 34-bit words respectively, while the DFT coefficients are represented by 18-bit words, all in complex format. Simulations for the RSA and the SW circuits show that they can operate at maximum frequencies of 30 and 222.3 MHz respectively. Simulation results for the RSA circuit show that its throughput performance is limited by that of the PE cell, which operates at a clock rate of 32.5 MHz. Thus, for a 32-point DFT circuit, a performance of up to 240M DFT operations per second can be achieved. For the 8-channel Polyphase filter, the circuit throughput is 120 MSPS. For the IDFT-Polyphase and Polyphase-DFT configurations, the RSA circuit shows a throughput of 60 MSPS in both cases.
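The serial-input, parallel-output DFT arrangement mentioned above can be illustrated with a simple behavioral model in which every output bin accumulates its own sum as the samples arrive one at a time. The Python sketch below uses floating-point arithmetic and one accumulator per bin. Both are simplifying assumptions: the implementation described here uses fixed-point two's complement arithmetic, and the way the 16 PE cells are shared among the 32 bins is not modeled. The function name `serial_in_parallel_out_dft` is hypothetical.

```python
import cmath


def serial_in_parallel_out_dft(samples):
    """Behavioral DFT model: serial input, one accumulator per output bin.

    X[k] = sum_t x[t] * exp(-2j*pi*k*t/N), accumulated sample by sample.
    """
    n = len(samples)
    acc = [0j] * n
    for t, x in enumerate(samples):      # samples arrive serially
        for k in range(n):               # in hardware these MACs run in parallel
            twiddle = cmath.exp(-2j * cmath.pi * k * t / n)
            acc[k] += x * twiddle        # multiply by twiddle factor, accumulate
    return acc                           # all bins are available together
```

After the last input sample has been processed, all N accumulators hold valid results at once, which is what makes the outputs parallel.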
Simulation results for the SW cell are shown in Table I, and Table II presents the logic resource requirements for the various RSA circuit implementations. Throughput results for the 32-point DFT, the 8-channel Polyphase FIR filter, and the 8-channel IDFT-Polyphase and Polyphase-DFT circuit mappings are presented in Table III.
V. SUMMARY
A reconfigurable SoC circuit implementation based on systolic array architecture, suitable for multi-carrier wireless applications, has been presented. The circuit structure consists of an array of identical processing elements and switches that can be reconfigured to perform the arithmetic computations required for DFT, FIR-Polyphase filters, or IDFT-Polyphase/Polyphase-DFT. The coarse grain and regular characteristics of the RSA architecture make it suitable for system on chip applications. Further research to extend the RSA architecture to support other complex DSP functions is envisaged.
