Power will be the key limiter to system scalability as interconnection networks take up an increasingly significant portion of system power. In this paper, we propose an architectural leakage power modeling methodology that achieves 95-98% accuracy against HSPICE estimates. When applied to interconnection networks and combined with previously proposed dynamic power models, it yields valuable insights into total network power consumption. Our modeling shows router buffers to be a prime candidate for leakage power optimization. We thus investigate the design space of power-aware buffer policies, propose a suite of policies, and explore the impact of various circuit mechanisms on these policies.
INTRODUCTION
As power becomes the dominant constraint in many computer systems, research into power-efficient systems has thrived. In many of these systems, the network fabric is a significant consumer of power. This has resulted in researchers modeling [9] and optimizing [8, 10] the dynamic power consumption of interconnection networks. As technology scales to deep sub-micron processes, leakage power becomes increasingly significant compared to dynamic power. There is thus a growing need to characterize and optimize network leakage power as well.
ISLPED'03,
In this paper, we propose a new architectural methodology for estimating leakage power that distinguishes technology-dependent from technology-independent variables, providing the flexibility of an architecture-level power model, where architectural parameters suffice, together with the rigorous accuracy of a low-level model. An accurate model allows architects to rapidly estimate leakage power as they iterate across alternative designs. We applied our methodology to both on-chip and chip-to-chip interconnection networks, and validated our estimates against HSPICE, obtaining 95-98% accuracy.
By combining our proposed leakage power model with a dynamic power model [9], we were able to gather insights on the total power consumption of networks, characterizing the power breakdown of various network components as technology scales. Our modeling guided us to investigate and propose power-aware buffers as a leakage power optimization technique. We then explore the design space of architectural policies for power-aware buffers, and propose a suite of techniques that are able to save up to 96.6% of total buffer leakage power.
AN ARCHITECTURAL LEAKAGE POWER MODELING METHODOLOGY
Leakage current has five basic components: reverse-biased pn junction current, subthreshold leakage current, gate-induced drain leakage, punchthrough current, and gate tunneling current. These leakage current components have an almost linear relation with transistor width. For instance, the subthreshold current $I_{sub}$, which currently dominates leakage current, is defined as follows [1]:
For a given circuit type i and input state s at a process technology, subthreshold current is almost proportional to the transistor width W. Although different components will have different impacts on leakage current as technology scales (e.g., gate-induced drain leakage will become more and more significant), the total leakage current still keeps an almost linear relation with transistor width. We believe this is the first leakage power modeling methodology that truly separates technology-dependent and technology-independent variables. In [2], a single $k_{design}$ is used to reflect the composition of device types (N/P), geometries (W/L), states (on/off), and stacking factors. As a result, $k_{design}$ is extremely sensitive to changes in any of the variables, and the impact of architectural parameters is hard to isolate. In [6], $P_{leak} = X_{lib} \cdot \mathrm{Cells}^{S_{lib}}$ is used to estimate the leakage power in an ASIC design environment, where $X_{lib}$ and $S_{lib}$ are technology-dependent parameters derived through experiments and "Cells" is the number of cells in the design. This model targets a later design stage than the architectural stage, when designers explore various circuit designs for a selected architecture.
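The near-linear dependence on transistor width suggests a simple table-driven estimate: a technology-dependent table of leakage current per unit width, indexed by circuit type and input state, multiplied by architecture-derived widths. The following is a minimal sketch of that idea, not the authors' implementation. The table values, the names `UNIT_LEAKAGE`, `component_leakage` and `block_leakage`, and the weighting by input-state probability are all illustrative assumptions.

```python
# Hypothetical leakage current per unit transistor width (A per um),
# keyed by (circuit type, input state). Placeholder values, not measured data.
UNIT_LEAKAGE = {
    ("INV", 0): 2.0e-8,
    ("INV", 1): 1.0e-8,
    ("NAND2", 0): 0.5e-8,
    ("NAND2", 3): 3.0e-8,
}


def component_leakage(ctype, state, width_um, table=UNIT_LEAKAGE):
    """Leakage current of one component, assumed proportional to its width."""
    return width_um * table[(ctype, state)]


def block_leakage(instances, table=UNIT_LEAKAGE):
    """Sum leakage over components.

    instances: iterable of (ctype, width_um, {state: probability}).
    The state-probability weighting is an assumption made for this sketch.
    """
    total = 0.0
    for ctype, width_um, state_probs in instances:
        for state, prob in state_probs.items():
            total += prob * component_leakage(ctype, state, width_um, table)
    return total
```

Only the table depends on the process technology; the widths and the instance list come from architectural parameters, which is the separation the methodology aims for.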
Derivation of $I'_{leak}$

Leakage power modeling of router buffers
We applied our methodology to the major building blocks of interconnection networks as identified in [9]: buffers, crossbars, arbiters, and links. Here, we walk through our modeling of router buffers to demonstrate the methodology. Finally, we can estimate the total leakage current of a router buffer (Eq. 6), while its leakage power is the leakage current multiplied by the supply voltage (Eq. 7).
[Eq. 6, partially recoverable: terms include $W(\mathrm{type}(\mathrm{INV}, 1))$ and $I_{leak}(\mathrm{INV}, 1)$]
Validation
We validated our model with HSPICE simulation of each complete functional unit of a chip-to-chip router (crossbar, arbiter, and buffers) in 0.07μm technology. Leakage currents under different input states were estimated with our model and compared with the leakage currents obtained from HSPICE simulation of the same functional unit with the same input states, structure, and feature sizes. For instance, a 5-by-5 matrix crossbar unit has 5 data inputs and 25 control signals. The combination of their values determines the input state of the crossbar and thus the leakage current. For such functional units with a vast number of possible input states, we select a random sample of typical input states for validation. The accuracy of our model for these functional units is computed by averaging across different input states. Table 3 shows the mean and standard deviation of our model's error in 0.07μm technology compared with HSPICE simulation. Since leakage current is large at 0.07μm, we expect the magnitude of error to be larger than in earlier process technologies.
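The averaging step can be illustrated with a short sketch that takes paired leakage currents for the sampled input states and returns the mean and standard deviation of the percentage error. The function name `error_stats`, the use of absolute relative error, and the population standard deviation are assumptions of this sketch; the exact error definition behind Table 3 may differ.

```python
from statistics import mean, pstdev


def error_stats(model_currents, hspice_currents):
    """Mean and standard deviation of the model's error, in percent.

    Each pair is the leakage current for the same functional unit and the
    same sampled input state, from the model and from HSPICE respectively.
    """
    if len(model_currents) != len(hspice_currents) or not model_currents:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [
        abs(m - h) / h * 100.0
        for m, h in zip(model_currents, hspice_currents)
    ]
    return mean(errors), pstdev(errors)
```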
DYNAMIC AND LEAKAGE POWER CHARACTERIZATION OF INTERCONNECTION NETWORKS
Combining our model with Orion, an architectural dynamic power model for networks [9], we characterized the total power consumption of both an on-chip network and a chip-to-chip network. The on-chip network is parameterized as in [3], with a 4-by-4 mesh network on a 12mm² chip, each node clocked at 1GHz, with 5 input/output ports (one of which is the injection/ejection port), 64 flit buffers per input port (each flit 128 bits wide), connected with a 5-by-5 matrix crossbar and five 5:1 arbiters. The router in the chip-to-chip network instead has 256 128-bit flit buffers per input port, with the other parameters the same as in the on-chip network. The feature size of the transistors is derived by Orion [9] from architectural parameters based on the timing delay requirements and assuming minimum area.
Effect of process technology. Tables 4 and 5 show the estimates for a router in a chip-to-chip and an on-chip network, respectively, at a 50% flit arrival rate over 1s at 80°C. As technology scales, leakage power becomes increasingly significant, rising from 2.5% of total (leakage+switching) power at current 0.18μm technology to a hefty 60% at 0.07μm technology if clock frequency is kept invariant for the chip-to-chip network. Even assuming that clock frequency doubles as we scale process technology, leakage power remains a significant 27% at 0.07μm. Though the on-chip network has fewer storage elements, leakage power still rises to a significant 21% at 0.07μm, assuming clock frequency doubles each process generation. From Table 5, it is evident that full-swing on-chip link drivers and wires consume substantial dynamic power, overwhelming that of the router core in 0.10 and 0.07μm processes. For leakage power, however, the converse is true. As wires do not dissipate leakage power, the leakage power consumption of just the drivers is minimal compared to that of the router core. This prompted us to delve into a leakage power breakdown of the various functional units within an on-chip router.
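The interaction between clock frequency and leakage share is simple arithmetic, sketched below under two assumptions that are not taken from the tables: dynamic power scales linearly with clock frequency, and leakage power is unaffected by it. The function names and the 4x clock factor are illustrative only.

```python
def leakage_share(leakage_w, dynamic_w):
    """Fraction of total (leakage + dynamic) power that is leakage."""
    return leakage_w / (leakage_w + dynamic_w)


def scale_clock(dynamic_w, factor):
    """Dynamic power after scaling the clock, assuming linear dependence."""
    return dynamic_w * factor


# Illustrative split: leakage is 60% of total power at an unchanged clock.
leakage, dynamic = 0.6, 0.4
share_fixed_clock = leakage_share(leakage, dynamic)
# Assumed 4x faster clock (two doublings); leakage is left untouched.
share_fast_clock = leakage_share(leakage, scale_clock(dynamic, 4.0))
```

With these inputs, `share_fixed_clock` is 0.60 and `share_fast_clock` is about 0.27: a faster clock dilutes the leakage share by raising dynamic power, but does not reduce the leakage power itself.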
Breakdown of leakage power within a router. Fig. 2 shows the leakage power consumed by the major functional units of an on-chip router and its links at different process technologies. Buffers consume approximately 64% of the total node (router+link) leakage power at all process technologies, making them the largest leakage power consumer. Our characterization highlights router buffers as a prime candidate for leakage power optimization.
POWER-AWARE BUFFERS
As interconnection networks experience significant temporal and spatial variance in workload that leads to highly varying buffer utilization, we propose power-aware buffers as an architectural technique for leakage power optimization in interconnection networks, i.e., buffers that regulate their own leakage power consumption based on actual utilization. To explore the potential of power-aware buffers, we first characterize network buffer utilization with the traffic model proposed in [8], with Poisson task inter-arrival rates and self-similar packet inter-arrival rates within each task session. This workload exhibits the high temporal and spatial variance present in many real-life networks. We simulate the chip-to-chip network described in Sec. 3 (2 virtual channels per port). Fixed-length packets of 20 flits are assumed. Fig. 3 graphs the average and minimum number of idle buffers as traffic increases. As expected, a large number of buffers are left idle at low injection rates. Interestingly, while there are routers whose buffers are fully occupied (minimum number of idle buffers = 0) at high network load, average buffer utilization remains rather low, with about 85% idle buffers. This reflects the high variance in the workload, which results in the large gap between average and maximum network utilization that is inherent in many actual workloads. Clearly, placing these idle buffers in an inactive mode that uses less leakage power will result in significant leakage power savings.
Power-aware buffer policy design
A router buffer is utilized in a stream-like fashion. When a flit enters a router, it gets written into an unoccupied buffer, and sits there while a series of router operations is triggered: routing, virtual-channel allocation and switch allocation. When it is scheduled to leave the router, the flit is read from the buffer pool, and the buffer is then marked as unoccupied and released back to the free list, ready to be reused when a new flit enters the router.
We term a policy that turns a buffer to inactive mode only when it is unoccupied single, and one that switches a buffer to inactive mode any time it is not being accessed, i.e., whether it is unoccupied or occupied, double. To evaluate the effectiveness of any policy, we need a yardstick: we define two theoretically ideal, though unachievable, policies. Ideal-Single reduces leakage power to zero instantly for buffers that are unoccupied, with no additional power overhead. Ideal-Double does the same for any buffer that is not being accessed, whether it is unoccupied or occupied.
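To make the two yardsticks concrete, the sketch below computes the fraction of a buffer's leakage energy that each ideal policy would remove, given a per-cycle trace of that buffer's state. The three-state encoding and the function name `ideal_savings` are assumptions introduced for illustration.

```python
UNOCCUPIED, OCCUPIED_IDLE, ACCESSED = "U", "O", "A"


def ideal_savings(trace):
    """Return (ideal_single, ideal_double) leakage savings as fractions.

    trace: sequence of per-cycle states for one buffer. Ideal-Single saves
    leakage only in unoccupied cycles; Ideal-Double saves it in every cycle
    in which the buffer is not being accessed.
    """
    if not trace:
        raise ValueError("trace must not be empty")
    total = len(trace)
    unoccupied = sum(1 for s in trace if s == UNOCCUPIED)
    occupied_idle = sum(1 for s in trace if s == OCCUPIED_IDLE)
    return unoccupied / total, (unoccupied + occupied_idle) / total
```

For the trace `list("UUAOOAUU")` this gives 0.5 for Ideal-Single and 0.75 for Ideal-Double; the gap between the two is the additional saving that only a data-preserving inactive mode could capture.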
A power-aware buffer policy can be oblivious, i.e., it does not take current buffer utilization or workload into account, or adaptive, tuning the policy according to current utilization. It can also be conservative, making sure network performance is not impacted, or aggressive, targeting as much leakage power savings as possible, even at the expense of network performance. Fig. 4 shows the design space of power-aware buffer policies that we envision, and several simple policies that we propose at each design point. Each policy can target either single or double leakage power savings.
First, we propose a conservative policy, Lookahead, that obliviously places buffers in low-leakage mode and wakes them up N cycles before they are accessed. When a flit is read from the buffer queue, that buffer is switched to the low-leakage inactive mode (if there are more than N free buffers), and when a flit arrives and is written into a router buffer at the tail of the queue, the buffer that is N cells ahead is switched to normal operating mode. The policy is conservative because it sets the lookahead window N to the number of cycles needed to switch a buffer from inactive to active mode (the transition delay), so a free buffer will always be available when flits arrive, and network performance will never be affected. Clearly, if the buffer size B is less than N, our policy will result in no leakage power savings. An aggressive variant of this policy, Lookahead-Agg, simply shortens N to less than the transition delay, trading off performance for higher leakage power savings. Our implementation of Lookahead inserts a newly freed buffer back at the head of the free list, so an active buffer has the highest chance of reuse, reducing the impact on network performance.
A simple adaptive policy, which we call Predictive, uses prior buffer utilization history to predict future usage, adjusting the lookahead window N accordingly.
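The Lookahead bookkeeping described above can be sketched as a circular FIFO that keeps only the next N free slots awake. This is a simplified illustration, not the authors' simulator code: it assumes a single-mode policy (occupied buffers stay active), treats a wake-up issued N slots ahead as complete by the time that slot is written, and uses a circular queue in place of the free list. The class and method names are hypothetical.

```python
class LookaheadBuffer:
    """Circular FIFO of `size` buffers with a lookahead window of `n` slots."""

    def __init__(self, size, n):
        if size <= 0 or n < 1:
            raise ValueError("size must be positive and n at least 1")
        self.size = size
        self.n = n
        self.head = 0    # oldest occupied slot (next to be read)
        self.tail = 0    # next slot to be written
        self.count = 0   # number of occupied slots
        # Only the first n free slots start in normal operating mode.
        self.active = [i < n for i in range(size)]

    def write(self):
        """A flit arrives: fill the tail slot and wake the slot n ahead."""
        if self.count == self.size:
            raise OverflowError("buffer full")
        self.active[(self.tail + self.n) % self.size] = True
        self.tail = (self.tail + 1) % self.size
        self.count += 1

    def read(self):
        """A flit leaves: free the head slot, sleeping it if enough are free."""
        if self.count == 0:
            raise IndexError("buffer empty")
        slot = self.head
        self.head = (self.head + 1) % self.size
        self.count -= 1
        free = self.size - self.count
        if free > self.n:   # more than n free buffers: safe to sleep this one
            self.active[slot] = False

    def sleeping(self):
        """Number of buffers currently in low-leakage mode."""
        return self.active.count(False)
```

With `size=8` and `n=2`, six buffers start asleep; a write wakes one more, and the following read puts the freed buffer back to sleep. When `n` is at least `size`, no buffer ever sleeps, matching the observation that the policy yields no savings when B is less than N.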
We use a simple statistic: when there are more writes than reads to a buffer in a time window W, N is incremented until it hits an upper bound $N_{high}$. Otherwise, it is decremented down to a lower bound $N_{low}$. The intuition is that when buffer writes outnumber reads, the buffer pool is building up, with fewer and fewer free buffers, so an adaptive policy should be less aggressive in switching buffers to inactive mode in order to preserve network performance. Conversely, when more flits are leaving than entering the router, an adaptive policy can switch off buffers more aggressively, on the expectation that fewer will be needed.
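The window update rule translates almost directly into code. The sketch below assumes a step of one per time window and is called once at the end of each window with that window's write and read counts; the function and parameter names are illustrative.

```python
def update_lookahead(n, writes, reads, n_low, n_high):
    """Return the lookahead window for the next time window.

    More writes than reads: the buffer pool is filling up, so widen the
    window (capped at n_high). Otherwise narrow it (floored at n_low).
    The step size of 1 is an assumption of this sketch.
    """
    if writes > reads:
        return min(n + 1, n_high)
    return max(n - 1, n_low)
```

For example, `update_lookahead(3, writes=5, reads=2, n_low=1, n_high=4)` returns 4, and a further window with the same counts leaves N at the upper bound.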
Circuit-level mechanisms
Power-aware buffers require circuit-level mechanisms that allow buffers to be put into inactive mode for leakage power savings. Several circuit-level mechanisms have been proposed for leakage power savings in SRAMs [4, 7], targeted at microprocessor caches. Since router buffers are usually constructed with SRAMs, these can be readily applied to power-aware buffers. The characteristics of circuit-level mechanisms that are critical to power-aware buffers are: (1) transition delay: the time it takes to switch a buffer between the normal operating mode and the inactive mode; (2) transition energy: the dynamic energy incurred each time to effect a transition; (3) leakage power savings: the difference between the leakage power incurred in normal operating mode and that in inactive mode; and (4) data preservation: whether the inactive mode preserves the contents of the SRAMs, i.e., whether the circuit technique can be applied to both single and double power-aware buffer policies.
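The first three characteristics jointly determine whether putting a buffer to sleep pays off for a given idle period. The sketch below works through that trade-off under the assumption that each idle period costs two transitions (one to enter and one to leave the inactive mode). All names and parameter values are hypothetical and are not taken from Table 6.

```python
def net_energy_saved(idle_s, leakage_saved_w, transition_energy_j):
    """Net energy saved by sleeping through one idle period, in joules.

    Assumes one transition into and one out of the inactive mode, each
    costing transition_energy_j. A negative result means sleeping costs
    more energy than it saves.
    """
    return leakage_saved_w * idle_s - 2.0 * transition_energy_j


def break_even_s(leakage_saved_w, transition_energy_j):
    """Shortest idle period for which sleeping does not lose energy."""
    return 2.0 * transition_energy_j / leakage_saved_w
```

With an illustrative 1 mW of leakage saved and 0.2 nJ per transition, the break-even idle period is 0.4 μs. A mechanism with larger leakage savings but costlier or slower transitions therefore favors long idle periods, while a cheaper, faster one can exploit short gaps.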
In this paper, we choose two circuit-level mechanisms with fairly different characteristics: Drowsy [4] and Gated-Vdd SRAMs [7]. Drowsy SRAMs have shorter transition delays than Gated-Vdd SRAMs and preserve data content, but deliver smaller leakage energy savings in the inactive mode, as shown in Table 6. Both techniques have a negligible effect on access time.
EXPERIMENTAL RESULTS
We extend a C++ network simulator to investigate the ...
The converse is, however, true with smaller buffers (Fig. 6). Here, the large N of Lookahead (Gated-Vdd) constrains the number of buffers that can be turned inactive, and the low transition delay of Drowsy cells wins out. Note that as the traffic rate increases, however, flits occupy buffers for a longer time, so Lookahead-Single (Drowsy) is unable to exploit its fast transition delay. Lookahead-Double (Drowsy), however, leverages this for higher leakage power savings at high traffic injection rates. Fig. 7 shows that, as expected, Lookahead-Agg improves on the leakage power savings of Lookahead, pushing savings up to 81% at low traffic. Predictive pushes savings even further, up to 88% at low traffic. Even at very high traffic loads, Predictive still saves 71% of leakage power, as it adapts better to actual utilization. This shows that even a simple adaptive policy can outperform oblivious policies.
Performance impact of power-aware buffer policies. Lookahead, being a conservative policy, has no impact on performance, as it always ensures that at least one active buffer is available for an arriving flit. However, the aggressive Lookahead-Agg and Predictive policies can potentially incur performance penalties. Fig. 8 shows the simulated latency-throughput performance of these two policies, with negligible performance degradation for both compared to a network with no power-aware buffers.
CONCLUSIONS
We have proposed a methodology for modeling leakage power at the architecture level. To facilitate the use of this methodology, we will distribute the tables online. We have also incorporated our network architectural leakage power models into Orion [9] so architects can easily factor in dynamic and leakage power estimates when evaluating network architectures. By delineating the design space for power-aware buffer policies, and exploring the impact of several simple alternatives, we hope our work will motivate the proposal of more sophisticated policies in the future.
... detailed parameters of Drowsy Cache and Gated-Vdd, respectively. At Princeton, we wish to thank Hang-Sheng Wang for his help in characterizing dynamic power using Orion and Li Shang for assistance with the PopNet network simulator. This work is partially funded by NSF CAREER grant CCR-0237540.
