Recent studies have shown that most of the assigned radio spectrum is underutilized. On the other hand, the increasing number of wireless multimedia applications leads to spectrum scarcity. Cognitive Radio ([1], [2]) is proposed as a promising technology to address this paradox of spectrum scarcity and spectrum under-utilization. In Cognitive Radio, spectrum sensing locates the unused spectrum segments in a targeted spectrum pool. These segments are then used optimally without causing harmful interference to licensed users. This technology is called spectrum pooling [3]. In spectrum pooling, OFDM is proposed as the baseband transmission scheme, and the subcarriers that would cause interference to licensed users are nullified. Therefore, OFDM based Cognitive Radio has to be reconfigurable so that it can adapt to the changing wireless channels. This reconfigurability has to be supported by a reconfigurable platform. Our research in the Adaptive Ad-hoc Freeband (AAF) project [4] focuses on mapping the baseband algorithms of Cognitive Radio onto a reconfigurable platform.

II. MPSoC FOR COGNITIVE RADIO

Cognitive Radio is seen as an evolution of the software-defined radio platform [1]. However, the traditional software-defined radio platform for digital processing is mainly based on General Purpose Processors (GPPs) and Digital Signal Processors (DSPs), which are inadequate for future high data rate wireless communications in terms of processing speed and energy efficiency. With the advance of semiconductor technology, wireless baseband processors are moving toward Multiprocessor Systems-on-Chip (MPSoCs), which integrate heterogeneous processing elements tailored to different processing tasks. MPSoCs offer high performance, reconfigurability and energy efficiency. Therefore, we proposed a tiled MPSoC architecture (see figure 1) to support Cognitive Radio [5]. The tiles can be various processing elements, including GPPs, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs) and Domain Specific Reconfigurable Hardware (DSRH) modules that target specific algorithm domains. The Montium [6] tile processor developed at the University of Twente is an example of a DSRH. It targets the digital signal processing (DSP) algorithm domain and has the flexibility to adapt to different algorithms in an energy-efficient manner. Therefore, the Montium tile processor is the key element in our proposed reconfigurable platform for Cognitive Radio. The tiles in the SoC are interconnected by a Network-on-Chip (NoC). Both the SoC and the NoC are dynamically reconfigurable, which means that the programs (running on the reconfigurable processing elements) as well as the communication links between the processing elements are configured at run-time.

III. TTL DESIGN METHODOLOGY

MPSoCs offer many advantages, as described in the previous section. However, mapping applications onto MPSoCs is a challenging task. First, the applications to be mapped onto the MPSoC become more complex: they consist of more and more tasks, and some of the tasks may change their behavior dynamically. Second, to map tasks to different components of an MPSoC, designers have to deal with the low-level interfaces for inter-component communication and synchronization, which become a bottleneck from both a performance and an energy point of view. Further, opportunities for reusing hardware and software modules are limited, and no method exists for exploring their trade-offs. There is therefore a gap between the application models used for specification and the optimized implementation of the application on an MPSoC. The task transaction level (TTL) interface approach [7] was proposed to help close this gap by raising the abstraction level. We propose to use the TTL approach both for developing the Cognitive Radio application at the system level and as a platform interface for implementing the application on the MPSoC architecture.

[...] where $R$ is the data rate; $K$ is the number of subcarriers; $N_0$ is the noise power density; $B$ is the band of interest for Cognitive Radio; $h_k$ is the subcarrier gain and $P_k$ is the power allocated to the corresponding subcarrier. $F_k$ is a factor indicating the availability of subcarrier $k$ to Cognitive Radio, where $F_k = 1$ means that the $k$th subcarrier can be used by Cognitive Radio. The system power can also be minimized under the constraint of a constant data rate. We formulate this as follows:
$$\min \sum_{k=1}^{K} P_k = P_{\text{total}} \quad \text{subject to a constant data rate } R$$
It is beneficial, and often necessary, to exploit the sparse structure algorithmically to reduce the number of operations of the standard algorithms. Therefore, we propose a low complexity FFT algorithm as an option for OFDM based Cognitive Radio [9]. We term this FFT algorithm the sparse FFT. The algorithm is based on the transform decomposition of [10], but has been tailored to our Cognitive Radio system. Transform decomposition can be seen as a modified Cooley-Tukey FFT in which the DFT is decomposed into two smaller DFTs. The detailed mathematical derivation of the algorithm can be found in [10]; here we only show the computational structure in figure 3. We made some modifications to the original algorithm to facilitate efficient hardware implementations. The total number of subcarriers $N$ is chosen as a power-of-two integer, and $L$ is the number of non-zero outputs. $N_1$ is chosen as the nearest power-of-two integer larger than $L$; this choice of $N_1$ helps to exploit more regularity. Thus $N_2$ is also a power-of-two integer, satisfying $N = N_1 N_2$. The algorithm is decomposed into two major parts: the $N_2$ blocks of $N_1$-point DFTs, which can be implemented as radix-2 FFTs, and the multiplications with twiddle factors followed by the recombination of the products. The reduction in computation comes from the second part, where only $L$ twiddle factors are multiplied with each $X_{n_2}((k)_{N_1})$ (where $(k)_{N_1}$ denotes $k$ modulo $N_1$) for $n_2 = 1, 2, \ldots, N_2 - 1$; the block $n_2 = 0$ needs no twiddle factor.

Based on this computational structure, we quantify the computational complexity by counting the number of complex multiplications. The sparse FFT requires $(N_2 - 1)L + \frac{N}{2}\log_2 N_1$ complex multiplications, which is less than the $\frac{N}{2}\log_2 N$ required by a radix-2 FFT when $L < N/2$. This holds because $L < N/2$ implies $N_2 \ge 2$ and $L < N_1$, so $(N_2 - 1)L < N_1(N_2 - 1) \le \frac{N}{2}\log_2 N_2$, while $\frac{N}{2}\log_2 N = \frac{N}{2}\log_2 N_1 + \frac{N}{2}\log_2 N_2$. Therefore, we propose that the system be reconfigured to a sparse FFT when there are a large number of zeros in the bit allocation vector.

C. The TTL Implementation

We implemented the reconfigurable sparse FFT in the TTL environment to achieve the following goals: 1) to verify the sparse FFT at system level; 2) to obtain high level profile information [...]

In the TTL approach, an application is modelled as a task graph [...]. During the first step, we create the task graph of the reconfigurable sparse FFT (see figure 4). The source task generates the input samples, which are sent via the data channel to the FFT task. The destination task consumes the output samples of the FFT task. A configuration manager decides the type of FFT algorithm depending on the number $L$ of non-zero values in the bit allocation vector. If $L < N/2$, the configuration manager generates the configuration data for a sparse FFT. It then instructs the FFT task to perform a sparse FFT and sends all the configuration data to the FFT task via the configuration channel. The configuration manager then goes to standby until a new bit allocation vector arrives. Depending on $L$, the FFT task performs either a radix-2 FFT or the sparse FFT.

The TTL functions are called from the TTL C/C++ library to create tasks, define communication interfaces and generate the task graph. At the system level, the tasks are coded in C/C++, but in the platform implementation the tasks can be implemented on a particular processor. Here we give a pseudo code example of the TTL implementation to show how the reconfiguration is done for the FFT task.

    Task Task_FFT
    {
        initialization;
        while (true) {
            local variables;
            // check the configuration updates
            if (tryAcquiredata(Task_FFT->config_inport)) {
                // update L: read in the configuration
                ttl_read(Task_FFT->config_inport, SFFT_CONFIG_DATA);
            }
            // read in data
            for (i = 0; i < num_samples; i++)
                ttl_read(Task_FFT->data_inport, proc_buffer[i]);
            // sparse FFT or radix-2 FFT
            if (L < num_samples/2) {
                // sparse FFT processing
                call sparse_FFT;
            } else {
                // radix-2 FFT processing
                call radix2_FFT;
            }
            // write out data
            for (i = 0; i < num_samples; i++)
                ttl_write(Task_FFT->data_outport, proc_buffer[i]);
        }
    }
The FFT task checks for updates on the configuration channel. If the configuration manager has generated a new configuration, the FFT task reads in the configuration data [...]

We applied the reconfigurable sparse FFT to an OFDM system with 512 subcarriers. The following two scenarios show an example of the sparse FFT reconfiguration for this system. We denote all zero output indexes of the FFT with 0 and all non-zero indexes with 1. In Scenario 1, 420 of the 512 indexes are non-zero, which means that most of the subcarriers can be used by Cognitive Radio. To avoid causing interference to a potential licensed user, Cognitive Radio has to switch off a certain number of subcarriers and re-assign the transmitted information to the available subchannels. This corresponds to Scenario 2, where only 56 of the 512 indexes are non-zero. From Scenario 1 to Scenario 2, the FFT task is reconfigured from a radix-2 FFT to a sparse FFT.

The high level TTL implementation has been run on a Linux PC, and the computation results verify the functional correctness of the sparse FFT. The TTL run-time environment can generate high level profile information in terms of computation workload and communication workload. The computation workload is measured by counting the number of annotated instructions, while the communication workload is measured by counting the number of tokens (data units) travelling through the TTL channels. Table II shows the computation workload of the FFT task in the two scenarios, as generated by TTL. The reduction [...] dependent profile for specific implementations. By associating execution times with instructions, and multiplying these execution times by the instruction counts, one can obtain a rough estimate of the total execution time of a task on a certain processor.

TABLE III
MINIMUM PROCESSING REQUIREMENTS

If the FFT task is mapped onto the Montium, which can execute one complex multiplication instruction per clock cycle, Table III shows that the Montium has to run at 23 MHz or more for Scenario 1 and at 19 MHz or more for Scenario 2. In other words, the sparse FFT saves 16% of the processing capacity. Such a reconfiguration only takes place when the bit allocation vector has been updated. We expect the bit allocation vector not to change very often: it will be constant over at least several OFDM frames. Therefore, the reconfiguration overhead is relatively small compared to the computation saved.

From the TTL profile information, we compare the computation workload of the 512-point sparse FFT for various $L$ with different zero distributions. Figure 6 shows that the computation workload increases with the number $L$ of non-zero outputs.
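The multiplication counts of the sparse FFT can be checked against the reported Montium figures. The C sketch below (function names are illustrative) evaluates $(N_2 - 1)L + \frac{N}{2}\log_2 N_1$ and the radix-2 count $\frac{N}{2}\log_2 N$ for $N = 512$: Scenario 1 ($L = 420$) uses the radix-2 FFT with 2304 complex multiplications, and Scenario 2 ($L = 56$, so $N_1 = 64$ and $N_2 = 8$) needs 1928, about 16% fewer. Assuming one complex multiplication per clock cycle, as stated for the Montium, and that these multiplications dominate the cycle count, the required clock frequency scales by the same ratio.

```c
#include <assert.h>

/* log2 of a power of two. */
static int ilog2(int n) {
    int r = 0;
    while (n > 1) {
        n >>= 1;
        r++;
    }
    return r;
}

/* Complex multiplications of an N-point radix-2 FFT: (N/2) log2 N. */
long radix2_mults(int n) {
    assert(n > 1 && (n & (n - 1)) == 0);
    return (long)(n / 2) * ilog2(n);
}

/* Sparse FFT: (N2 - 1) L + (N/2) log2 N1, where N1 is the smallest
   power of two larger than L and N2 = N / N1. */
long sparse_mults(int n, int L) {
    int n1 = 1;
    while (n1 <= L)
        n1 *= 2;
    int n2 = n / n1;
    return (long)(n2 - 1) * L + (long)(n / 2) * ilog2(n1);
}

/* Count for the reconfigurable FFT task: sparse FFT only when L < N/2. */
long fft_task_mults(int n, int L) {
    return (L < n / 2) ? sparse_mults(n, L) : radix2_mults(n);
}

/* Assumes one complex multiplication per clock cycle and that these
   multiplications dominate the cycle count, so the required clock
   frequency scales with the multiplication count. */
double scaled_clock_mhz(double ref_mhz, long ref_mults, long mults) {
    return ref_mhz * (double)mults / (double)ref_mults;
}
```

With these counts, $1 - 1928/2304 \approx 0.163$, in line with the 16% saving, and scaled_clock_mhz(23.0, 2304, 1928) gives about 19.25 MHz, consistent with the 19 MHz requirement for Scenario 2.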
