We have designed a coarse-grained, dynamically reconfigurable architecture specifically for implementing the wireless MAC layer in consumer hand-held devices. The Dynamically Reconfigurable MAC Processor is a SoC architecture that delegates time-critical tasks to a Reconfigurable Hardware Co-Processor. The coprocessor can reconfigure packet-by-packet, handling up to 3 data streams of different protocols concurrently. We present results of simulations involving transmission and reception of packets, showing that the platform concurrently handles three protocol streams and reconfigures dynamically, yet meets the protocol timing constraints with margin, all at a moderate frequency. Thus we show that this architecture is capable of replacing up to three MAC processors in a wireless device. Its heterogeneous and coarse-grained functional units, the limited connectivity required between these units, and the idle time of hardware resources promise a very modest power consumption, suitable for mobile devices.
I. INTRODUCTION
The trend towards ubiquitous communication requires the implementation of multiple wireless standards on one compact, power-sensitive device. Such devices demand platforms with the flexibility to implement multiple standards with short time-to-market at a low power consumption. Domain-limited, heterogeneous reconfigurable architectures offer a solution that strikes the right balance between flexibility and power consumption for mobile devices. According to [1], "Reconfigurable architectures that are just-flexible-enough to implement all wireless modes offer a good compromise between low cost, short time-to-market and low power consumption". The 'Dynamically Reconfigurable MAC Processor' (DRMP) is such an architecture, aiming to implement the MAC layer on a power-efficient platform that offers domain-specific flexibility.
We introduced this architecture in some detail in [2] and [3]. In this paper we briefly discuss the architecture and then present results confirming its capability of handling data of 3 protocols concurrently, reconfiguring packet-by-packet. Section II discusses related research and products. Section III presents the architecture of the DRMP, and Section IV presents the results of simulation runs and discusses their implications.
II. BACKGROUND
Some existing flexible architectures address the wireless domain, e.g. the Quicksilver [4] and Chameleon [5] platforms. These are in some ways similar to the DRMP. However, the foremost difference is that these platforms target operations associated with the PHY layer [6], while the DRMP addresses the MAC layer, which has different design considerations. There are other important differences too. Chameleon targets base stations, where power is not an important consideration, and its 'Datapath Unit' is quite general-purpose. The DRMP is a power-conscious device; its flexibility is limited to the MAC layer, and it has heterogeneous, function-specific RFUs. The Quicksilver platform aims to address the flexible signal processing needs of Software-Defined Radios. Traditional alternatives like ASICs, microprocessors or FPGAs are either not flexible enough, or incur a power and resource overhead for their flexibility that is too high for mobile devices.
There has also been some research that discusses the similarity of functions amongst different MAC protocols, and the possibilities of exploiting them ([7], [8], [9] and [10]). We have not, however, come across any architecture like the DRMP that specifically addresses the wireless MAC layer for hand-held devices, promising the flexibility to dynamically switch between multiple protocol MACs on the same platform while maintaining a power efficiency acceptable for mobile devices.
III. RECONFIGURABLE MAC PROCESSOR
To design a domain-specific reconfigurable platform for implementing the wireless MAC layer in consumer hand-held devices, we analyzed three wireless standards: WiFi (IEEE 802.11), WiMAX (IEEE 802.16), and the High-speed WPAN (IEEE 802.15.3). Investigation of these wireless standards indicated that there is substantial overlap amongst these protocols [11]. This is the rationale for the design of a domain-limited reconfigurable platform that exploits these commonalities by using function-oriented Reconfigurable Functional Units (RFUs). The prototype platform is designed to be flexible enough to concurrently implement three MACs. This implementation is expected to be more power-efficient than an equivalent implementation of the three MACs on either a microprocessor or an FPGA.
In the DRMP, the functionality of wireless MACs has been partitioned between a microprocessing unit (MPU) and a Reconfigurable Hardware Co-Processor (RHCP) (Fig. 1). The MPU implements management and high-level control functions of the MAC. The remaining functionality primarily consists of time-critical operations associated with packet transmission and reception. This is the area in which we found the maximum overlap amongst the standards we investigated, hence its implementation on reconfigurable hardware. To optimize power efficiency, the RHCP has coarse-grained, heterogeneous, function-specific RFUs.
The Reconfigurable Hardware Co-Processor: The RHCP (Fig. 2) interacts with the MPU through an Interface and Reconfiguration Controller (IRC), which delegates tasks to flexible RFUs. The RFUs carry out the tasks requested by the MPU and have a uniform interface. They are dynamically and individually reconfigurable. They are connected by a single packet-bus that also connects them to the packet-memory and the IRC. An Event Handler interprets Rx events and formats service [...]. Inside the IRC, an Interface Controller (IC) interprets MPU commands to the RHCP and delegates them to RFUs. A complementary Reconfiguration Controller (RC) controls reconfiguration of the RFUs packet-by-packet. The control task of the IC is delegated to three Task Handlers (TH), one for each of the three protocol modes that are running concurrently.
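The division of labour inside the IRC can be illustrated with a small behavioural sketch. It is not the Simulink model: the class layout, the method names and the RFU names ("crc", "fragmenter") are hypothetical, and reconfiguration is reduced to switching a mode label. It only shows the IC selecting a per-mode Task Handler and the RC reconfiguring a shared RFU when the mode changes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

MODES = ("WiFi", "WiMAX", "WPAN")  # the three standards analysed above


@dataclass
class RFU:
    """Function-specific reconfigurable unit; unit names used below are hypothetical."""
    name: str
    mode: Optional[str] = None  # protocol mode the unit is currently configured for
    reconfig_count: int = 0

    def run(self, mode: str, task: str) -> str:
        if self.mode != mode:
            raise RuntimeError(f"{self.name} is not configured for {mode}")
        return f"{self.name}:{task}@{mode}"


class ReconfigurationController:
    """RC: switches an RFU to the requested mode only when it differs."""

    def prepare(self, rfu: RFU, mode: str) -> None:
        if rfu.mode != mode:
            rfu.mode = mode
            rfu.reconfig_count += 1


@dataclass
class TaskHandler:
    """TH: one per protocol mode; records the tasks completed for that mode."""
    mode: str
    completed: List[str] = field(default_factory=list)


class IRC:
    """Interface and Reconfiguration Controller: routes MPU commands to RFUs."""

    def __init__(self, rfus: List[RFU]) -> None:
        self.rfus: Dict[str, RFU] = {r.name: r for r in rfus}
        self.rc = ReconfigurationController()
        self.handlers: Dict[str, TaskHandler] = {m: TaskHandler(m) for m in MODES}

    def command(self, mode: str, rfu_name: str, task: str) -> str:
        handler = self.handlers[mode]   # IC picks the handler for this mode
        rfu = self.rfus[rfu_name]
        self.rc.prepare(rfu, mode)      # RC reconfigures if the mode changed
        result = rfu.run(mode, task)
        handler.completed.append(result)
        return result


irc = IRC([RFU("crc"), RFU("fragmenter")])
irc.command("WiFi", "crc", "check")
irc.command("WiMAX", "crc", "check")  # mode change: the shared RFU is reconfigured
irc.command("WiMAX", "crc", "check")  # same mode again: no reconfiguration
print(irc.rfus["crc"].reconfig_count)  # prints 2
```

Keeping the reconfiguration decision in a separate controller mirrors the IC/RC split described above: the per-mode handlers never need to know which mode an RFU was last used for.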
IV. IMPLEMENTATION AND RESULTS
On a prototype DRMP model in Simulink, we successfully transmitted and received 3 packets concurrently, assumed to be of 3 different protocol modes. The timing is cycle-approximate. The bus-interface is approximate but more detailed than a transaction-level model.
A. Simulation Results
In [11], we had estimated that considerable time slack would be available to the DRMP, because the time taken to process a packet would be considerably less than the packet duration. In [3], we presented results that confirmed these estimates. Based on these estimates and results, we proposed that packet-by-packet reconfiguration would be possible, allowing concurrent protocol processing, and also that there would be room to improve power efficiency by trading off time slack.
We have now successfully run simulations of concurrent transmission and reception of three packets of different modes. The application processor of the transmitting device sends three packets, each belonging to a separate protocol data stream. The DRMP processes these packets one by one, reconfiguring RFUs as it switches from one mode to another, and then stores the packets in their respective transmit buffers. The receiving device receives these packets concurrently in its buffers. The MAC processing is then done in the DRMP sequentially, with the RFUs reconfigured and shared among the three modes.
The size of the packet in each mode is 200 bytes, broken into 3 fragments. The architecture is assumed to run at a frequency of 200 MHz. The exchange of data with the PHY is modeled at 20 Mbps for all three modes. Fig. 3 shows the output taken directly from the simulation, showing the active and idle times of various blocks in the DRMP for the first 30 microseconds of the transmission of the three packets. Note that while the task handlers (unique to each protocol mode) run concurrently, the RFUs are time-multiplexed among the three protocol modes. Yet the packets are processed and ready to be sent in a fraction of the packet duration. Fig. 4 shows a similar situation for packet reception (with the complete packet duration shown). Tables I and II show the actual and proportional durations for which the blocks are busy during transmission and reception. These results have been compared with results from a simulation with just one protocol mode [3].
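The one-by-one processing with a reconfiguration on every mode switch can be sketched as a simple timeline calculation. This is a minimal model under stated assumptions: every packet costs the same RFU time, a fixed reconfiguration cost is charged whenever the mode changes, and the numeric values are hypothetical, chosen only for illustration and not taken from the simulation.

```python
from typing import List, Optional, Tuple


def serialise(modes: List[str], proc_us: float,
              reconfig_us: float) -> List[Tuple[str, float, float]]:
    """Process one packet per entry in `modes`, back to back on shared RFUs.

    Returns (mode, start_us, finish_us) for each packet. A reconfiguration
    cost is charged whenever the mode differs from the previous packet's.
    """
    t = 0.0
    current: Optional[str] = None
    timeline = []
    for mode in modes:
        if mode != current:
            t += reconfig_us  # RFUs are reconfigured for the new mode
            current = mode
        start = t
        t += proc_us          # RFUs are busy with this packet
        timeline.append((mode, start, t))
    return timeline


# Hypothetical durations, for illustration only.
timeline = serialise(["mode A", "mode B", "mode C"], proc_us=8.0, reconfig_us=0.5)
worst_case_finish_us = timeline[-1][2]
for mode, start, finish in timeline:
    print(f"{mode}: {start:.1f} us -> {finish:.1f} us")
```

In this model the last mode in the queue sees the worst-case latency, which is the quantity of interest when three modes share the RFUs.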
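The simulation parameters fix the time budget per packet. As a rough sketch, assuming the packet duration is simply the 200-byte payload clocked out at the modeled 20 Mbps PHY rate (preambles and per-fragment headers are ignored, so the real duration would be somewhat longer), the budget in microseconds and in 200 MHz clock cycles is:

```python
def airtime_us(payload_bytes: int, rate_mbps: float) -> float:
    """Time to move the payload across the PHY interface; bits / (Mbit/s) = us."""
    return payload_bytes * 8 / rate_mbps


def cycles(duration_us: float, clock_mhz: float) -> float:
    """Number of clock cycles in a duration; us * MHz = cycles."""
    return duration_us * clock_mhz


packet_us = airtime_us(200, 20)          # 80.0 us per 200-byte packet
packet_cycles = cycles(packet_us, 200)   # 16000.0 cycles at 200 MHz
window_cycles = cycles(30, 200)          # 6000 cycles in the 30 us window of Fig. 3
print(packet_us, packet_cycles, window_cycles)
```

Under these assumptions each packet leaves on the order of sixteen thousand cycles for MAC processing, which gives a sense of scale for the slack discussed in the text.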
B. Discussion of Results
These results show that it is possible to dynamically reconfigure the DRMP architecture on a packet-by-packet basis and handle three protocol modes concurrently. The platform can thus be used in a multi-standard device to concurrently handle the MAC processing of 3 wireless protocols. All this is achievable at a moderate frequency of 200 MHz on a 32-bit architecture.
It is worth pointing out that large parts of the architecture are idle even when three modes run concurrently: a typical RFU is active for around 10% of the packet duration. In fact, when just one mode is active, which we can expect to be the case for most of the time the device is being used, the RFUs are typically busy for less than 5% of the packet duration. We can save considerable power by exploiting this time slack. For example, parts of the DRMP can be switched off when idle, or we could dynamically scale the operating frequency so that the DRMP's throughput is just fast enough to meet real-time protocol constraints, and no more. Compared to general-purpose reconfigurable architectures like FPGAs, the DRMP needs fewer interconnect resources. Moreover, heterogeneous function-specific reconfigurable units will need less configuration data than general-purpose units like LUT-based logic blocks. All these features would add up to give power-efficient flexibility in the DRMP.
We have also compared the duration from the time that a request for packet transmission is received to the time the packet is completely processed and stored in the transmission buffer. We first measured the duration with one protocol running [3], and then measured it with three protocol modes running, taking the worst-case result of the three modes. We observed that the packet processing lag increases from 8.9 µs for one mode to 24.5 µs with three modes concurrently active. This increase of 15.6 µs is a fraction of the packet duration. We can conclude that the processing lag experienced by one protocol mode due to sharing the DRMP's resources with two other modes is not significant. The abstract software model simply keeps track of the state of the system and does not perform computationally intensive tasks. The software is completely interrupt-driven and only generates control signals, resulting in a very simple, lightweight API.
We have currently modeled most of the RFUs as context-switching RFUs, whereas when three different protocols are actually deployed, some RFUs may have to read configuration data from a memory on a mode switch. However, because the RFUs are function-specific, it is safe to assume that the configuration data will be very small compared to that of more general-purpose functional units. A reconfiguration data throughput of up to 6.4 Gbps (32-bit reconfiguration bus at 200 MHz) will ensure that this small amount of reconfiguration data is loaded very quickly.
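The frequency-scaling idea can be given a first-order estimate. The sketch below assumes that an RFU's busy time scales inversely with clock frequency and that the only deadline is the packet duration; actual MAC deadlines, such as inter-frame response times, may be tighter, so the result is an optimistic lower bound rather than a design figure. The function name and the `margin` parameter are hypothetical.

```python
def min_clock_mhz(clock_mhz: float, busy_fraction: float, margin: float = 1.0) -> float:
    """Lowest clock at which the work still fits in the packet duration.

    busy_fraction: share of the packet duration the unit is busy at clock_mhz.
    margin: safety factor (>= 1) applied on top of the bare minimum.
    """
    if not 0 < busy_fraction <= 1:
        raise ValueError("busy_fraction must be in (0, 1]")
    if margin < 1:
        raise ValueError("margin must be >= 1")
    return clock_mhz * busy_fraction * margin


three_modes = min_clock_mhz(200, 0.10)              # RFU busy ~10% with three modes
one_mode = min_clock_mhz(200, 0.05)                 # RFU busy <5% with one mode
with_headroom = min_clock_mhz(200, 0.10, margin=2)  # same, with a 2x safety factor
print(three_modes, one_mode, with_headroom)
```

Even with a generous safety factor, the estimate stays well below 200 MHz, which is consistent with the claim that the slack leaves room for power savings.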
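The reported lag figures can be put in proportion with a short calculation. The two measured values come from the text; the 80 µs packet duration is an assumption derived from the stated 200-byte packet at 20 Mbps, ignoring PHY overhead, and the "naive serial bound" is simply three single-mode runs placed back to back.

```python
single_mode_us = 8.9    # measured lag with one protocol mode
three_modes_us = 24.5   # measured worst-case lag with three modes
packet_us = 200 * 8 / 20  # assumed packet duration: 1600 bits at 20 Mbps

increase_us = round(three_modes_us - single_mode_us, 1)
share_of_packet = increase_us / packet_us
naive_serial_us = round(3 * single_mode_us, 1)

print(f"increase: {increase_us} us")
print(f"share of packet duration: {share_of_packet:.1%}")
print(f"three back-to-back single-mode runs: {naive_serial_us} us")
```

Under these assumptions the added lag is roughly a fifth of the packet duration, and the measured worst case stays slightly below the naive serial bound.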
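The peak throughput of a 32-bit bus at 200 MHz works out to 6.4 Gbit/s, and the load time for a given amount of configuration data follows directly. The sketch below assumes one bus word is transferred every cycle with no arbitration or memory latency; the 256-byte configuration size is hypothetical, since the text does not state the configuration size of any RFU.

```python
def bus_throughput_gbps(width_bits: int, clock_mhz: float) -> float:
    """Peak throughput, assuming one full-width transfer per clock cycle."""
    return width_bits * clock_mhz / 1000  # bits * MHz = Mbit/s; /1000 -> Gbit/s


def load_time_us(config_bytes: int, width_bits: int, clock_mhz: float) -> float:
    """Time to stream a configuration over the bus, rounding up to whole words."""
    bits = config_bytes * 8
    words = (bits + width_bits - 1) // width_bits  # integer ceiling division
    return words / clock_mhz                       # cycles / MHz = us


peak_gbps = bus_throughput_gbps(32, 200)
example_load_us = load_time_us(256, 32, 200)  # hypothetical 256-byte configuration
print(peak_gbps, example_load_us)
```

For configurations of a few hundred bytes, the load time under these assumptions is a small fraction of a microsecond, which is small next to the packet processing times reported above.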
V. CONCLUSION
The DRMP is an innovative coarse-grained, dynamically reconfigurable architecture, specialized for the wireless MAC layer. It has been designed to address the requirements of multi-standard consumer hand-held devices. In this paper, we have presented the results of simulations involving three protocol modes transmitting and receiving concurrently. From the results, we have shown that the DRMP is more than capable of meeting the protocol timing requirements even though it shares the hardware resources amongst the three protocol modes and dynamically reconfigures the functional resources on every packet. This performance is achieved at a modest 200 MHz clock, and yet leaves considerable time slack that can be exploited to gain even more power efficiency than the coarse-grained and heterogeneous nature of the DRMP inherently offers.
We are working to synthesize the model to lower levels of abstraction so that we can get accurate results for its resource and power consumption. There is also room to explore the architecture at the current level, focusing on, e.g., the kind of RFUs that are best suited for prevalent protocols. We are confident our future investigations will confirm that the DRMP is a well-suited platform for commercial, multi-standard consumer devices. In the context of the MAC layer, it will offer an attractive combination of flexibility, power efficiency, programmability and, provided it is produced in sufficient numbers, cost.
