INTRODUCTION
FPGAs provide a flexible computing platform for embedded systems. However, this flexibility comes at the cost of higher power consumption. Interconnect can account for 60% or more of the power budget in large FPGA designs.
The configurability offered by programmable switch elements introduces large delay and power overheads on long paths. In this project, we designed and analyzed circuit techniques that allow fine-grain tradeoffs in power consumption. Additionally, the logic block is optimized for implementing distributed-arithmetic-based filters. Fine-grain voltage domains allow low-energy operation in non-critical areas of logic and routing segments. This work includes an implementation using a semi-custom design flow for a submicron fabrication process†.
RESEARCH SUMMARY
Power scaling in FPGAs can be enabled with support for distributed configurable voltage domains and block-level control. Contemporary designs using this approach are described in [1], [2]. Each Configurable Logic Block (CLB) represents a unique power domain tile on the chip. Each tile can be configured independently to operate at high VDD, to operate at low VDD, or to be disabled. Additionally, the switch elements in the array can select their operating voltage by using a dual-VDD buffer. An example of configurable VDD routing is shown in Figure 1.
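The per-tile configuration described above can be pictured with a minimal Python sketch. It is only an illustration of the three states a tile may take; the names `TileMode`, `make_array`, and `count_modes` are hypothetical and do not correspond to the actual configuration memory format or to any tool used in this work.

```python
from enum import Enum


class TileMode(Enum):
    """Hypothetical encoding of the three per-tile power states."""
    HIGH_VDD = "high"
    LOW_VDD = "low"
    OFF = "off"


def make_array(rows, cols, default=TileMode.HIGH_VDD):
    """Build a rows x cols grid of tiles, all starting in the default mode."""
    return [[default for _ in range(cols)] for _ in range(rows)]


def count_modes(array):
    """Count how many tiles are in each power state."""
    counts = {mode: 0 for mode in TileMode}
    for row in array:
        for mode in row:
            counts[mode] += 1
    return counts


# Example: an 8x8 array with one tile at low VDD and one tile disabled.
tiles = make_array(8, 8)
tiles[0][0] = TileMode.LOW_VDD
tiles[7][7] = TileMode.OFF
summary = count_modes(tiles)
```

Each tile carries its own state, so changing one entry leaves every other tile untouched, which mirrors the independent configuration of the power domains.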
In this work, a customized design flow to support multiple on-chip voltage domains leverages existing commercial tools such as Astro and standard cell library enhanced with a few additional cells for power management. A layout for one tile is shown in Figure 2. The standard flow was modified using a scripting language to control physical placement and isolation of the individual regions. Clock skew was managed using Clock Tree Synthesis where all inserted buffers run on a common VDD separate from the other voltage domains.
† We would like to acknowledge support for this project from MARCO Interconnect Focus Center and DARPA contracts. We also acknowledge National Semiconductor for providing fabrication services.
Power gating devices add area and delay overhead. Including the configuration memory for control, the area overhead is a few percent of the total area. The power switches are sized for a worst-case delay penalty of 5%. Figure 3 depicts a buffered switch element for dual-VDD routing.
Interconnect capacitance is reduced further by partitioning non-critical configuration elements so that wire lengths shrink. Relocating the switch block and CLB configuration memory to the periphery of the array saves substantial area, which shortens the critical routing wires and reduces their parasitic capacitance. The routing overhead for this partitioning was shown to be manageable in an 8x8 array. For very large arrays, this partitioning can be applied hierarchically.
We developed a power-aware place and route tool, using a modified version of VPR [3], to determine which non-critical paths can run at reduced voltage and to configure the various domains of the array accordingly. Low-overhead level converters provide voltage conversion between domains. With these fine-grain controls, the software is able to make tradeoffs between power and performance and deliver maximum power savings with minimum performance impact. Subthreshold leakage reduction is achieved with fine-grain sleep regions that shut down inactive regions, resulting in an order of magnitude reduction in leakage current. A more detailed analysis of this technique is described in [4].
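The idea of moving non-critical paths to the lower supply can be sketched as a simple slack check. This is not the algorithm of the modified VPR tool, which is not detailed here; it is a simplified illustration that assumes each path has a known delay at high VDD and that low VDD slows every path by a single factor. The names `Path`, `assign_vdd`, and `low_vdd_slowdown` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str
    delay_ns: float  # path delay when running at high VDD


def assign_vdd(paths, clock_period_ns, low_vdd_slowdown):
    """Assign 'low' to every path that still meets timing after slowdown.

    low_vdd_slowdown is an assumed multiplicative delay factor (> 1) for
    running at low VDD; real tools would use per-segment delay models and
    account for level converter delay.
    """
    assignment = {}
    for path in paths:
        slowed_delay = path.delay_ns * low_vdd_slowdown
        if slowed_delay <= clock_period_ns:
            assignment[path.name] = "low"   # enough slack: save power
        else:
            assignment[path.name] = "high"  # critical: keep performance
    return assignment


# Example with made-up delays: a 10 ns clock and a 1.5x slowdown at low VDD.
example_paths = [Path("critical", 9.0), Path("relaxed", 4.0)]
result = assign_vdd(example_paths, clock_period_ns=10.0, low_vdd_slowdown=1.5)
```

Treating paths independently is a deliberate simplification: in a real array, paths share tiles and routing segments, so a shared resource must run at the highest voltage required by any path through it.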
Several benchmark circuits are analyzed using a high VDD of 1.8 V and a low VDD ranging from 1.3 V to 0.5 V. A pseudo-randomly distributed set of vectors is applied at the inputs. The switching power is measured at the
A test chip containing an 8x8 array of CLBs has been fabricated using a 0.18 µm CMOS process. This test chip will provide validation for this low-power design methodology and will provide further insights into designing deep submicron FPGAs.
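As a rough guide to what the supply voltages used in the benchmark analysis imply, the standard first-order model of CMOS switching power can be applied. The model below is an assumption brought in for illustration, not a result of this work: it takes the activity factor, the switched capacitance C, and the clock frequency f to be the same at both supplies, and it ignores short-circuit current, leakage, and level converter overhead.

```latex
\begin{align}
P_{\mathrm{sw}} &= \alpha \, C \, V_{DD}^{2} \, f \\
\frac{P_{\mathrm{sw}}(V_{L})}{P_{\mathrm{sw}}(V_{H})} &= \left(\frac{V_{L}}{V_{H}}\right)^{2} \\
\left(\frac{1.3}{1.8}\right)^{2} &\approx 0.52, \qquad
\left(\frac{0.5}{1.8}\right)^{2} \approx 0.077
\end{align}
```

Under these assumptions, a node moved from 1.8 V to 1.3 V dissipates roughly half the switching power, and at 0.5 V it dissipates under a tenth. Measured savings for a whole design will be smaller, since only non-critical logic and routing run at the low supply.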
CONCLUSION
By introducing these fine-grain hardware controls at several levels, this power-aware FPGA forms the basis for a platform that allows for effective in-system energy-delay tradeoffs for energy-constrained applications such as handheld wireless systems. This work extends contemporary work on dual-VDD FPGAs and addresses important implementation details. An FPGA test chip has been designed that contains 64 CLBs and 128 I/Os using the National Semiconductor CMOS 0.18 µm 5LM process.
