Partitioning of a system to maximize exploitable sleep time for low-power synthesis is discussed. The motivation is to deactivate the memory refresh circuitry, apply power down, or disable the clock signals during the inactive periods of operation of circuit elements, and thus minimize the power consumption. Since it is impractical to have a separate set of control signals for each circuit element (otherwise, the control itself would consume a lot of power), it is advisable to partition a circuit based on the activity patterns of its elements so that the partitions can be switched into sleep mode for long periods of time. In this paper, we formulate this partitioning problem and show that it is NP-hard. We present GeoPart, a geometric partitioning heuristic for this problem. An efficient implementation of GeoPart using the segment tree data structure is discussed. Experimental results are encouraging.
Introduction
Not so long ago, the main objectives in the automation process of designing VLSI chips were: 1. increased processing speed, 2. reduced chip area, and 3. testability. With the modern advances in VLSI and packaging technologies, the average transistor count has shown an increase rate of about one hundred-fold per decade [3], allowing much more complex functionality per chip. The minimization of the average power consumption in modern VLSI circuits is an emerging objective of utmost importance for a number of reasons, including: longer battery life for portable communication and computing appliances, the heat dissipation bottleneck in highly integrated circuits [3], and environmental issues [2]. Because of the importance of the power consumption issue, there has been a considerable shift of attention in the logic and layout synthesis areas [15, 16], and more recently in high-level synthesis [4, 10], from delay and area minimization towards low power design.
There are three sources of power consumption in CMOS circuits: the charging and discharging of capacitive loads during switchings at gate outputs, the short circuit current which flows during output transitions, and the leakage current. The last two sources should be dealt with and optimized using proper device and circuit design techniques; hence the design automation community has focused on the minimization of the first source, which is frequently referred to as the switching power or dynamic power. Transition density, or average switching rate at different sites in a circuit, is introduced in [11] as a quantity to measure the circuit activity, which can be used to estimate the average dynamic power consumption in a digital circuit. In this paper we study the partitioning problem to exploit sleep mode operation for power minimization in digital circuits. In a general setting, this problem can be viewed as partitioning a set of circuit elements such that the savings in power consumption achieved by switching each partition as a whole into sleep mode is maximized. Placing a partition into sleep mode assumes different meanings in each setting of the problem. The problem finds many applications in low power design, e.g., memory segmentation, partitioning to power down portions of the design, and activity-driven clock tree design (see Figure 1).
In this paper we formulate the partitioning problem to maximize sleep time while controlling the total number of switchings of the partitions into and out of sleep mode, and show that the problem is NP-hard. Because of the geometric nature of the problem, direct application of graph partitioning schemes to this partitioning problem is not possible. We present GeoPart, a geometric algorithm based on iterative improvement techniques [9] for this problem. We present an efficient implementation of this algorithm using the segment tree data structure [12], and discuss its time complexity. We also discuss upper bounds on the total achievable sleep time for a given instance of the problem. Experiments for memory segmentation were conducted on a number of sorting, matrix multiplication, and DSP algorithms to show
Problem Formulation
consists of a set of non-overlapping intervals, or an NIS (Non-overlapping Interval Set), during all of which $m$ is idle. We assume that the idle sets of elements in $M$ are given as a set the idle set of $m_i$ (see Figure 2). An empty interval is denoted by $\emptyset$. Given intervals $I_1 = (l_1, r_1)$ and $I_2 = (l_2, r_2)$, we say
. I,", ) represents
The intersection of more than two NISs is defined similarly. The endpoint set $E_N$ of NIS $N$ is defined as the set of endpoints of the intervals in $N$, that is: $E_N = \{p \mid \exists q : \ldots\}$ That is, $M_i$ is the set of CEs whose idle sets are partitioned into $S_i$. The density of a partition $S_i$ at a given point $p$ is the number of NISs in $S_i$ that contain some interval containing $p$. The gain $G(S_1, S_2)$ of a b-balanced bi-partitioning $(S_1, S_2)$ of $S$ is defined as: $G(S_1, S_2) = f(t_1, t_2, sw_1, sw_2)$
where $t_i = D(A(S_i))$ and $sw_i = |A(S_i)|$, for $i = 1, 2$, are referred to as the sleep times and the switchings of partitions $S_1$ and $S_2$, respectively. Note that the internal intersection of a partition $S_i$ of $S$ can be thought of as the idle set of the corresponding partition $M_i$ of $M$, i.e., the set of maximal intervals during which all the CEs in $M_i$ are idle. Hence, we will use the terms idle set and internal intersection of a partition interchangeably in the rest of this paper. Any interval in the idle set of a partition is an idle interval or sleep interval of that partition.
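As a small worked illustration of these definitions, consider a partition $S_1 = \{N_1, N_2\}$. The two NISs below are made up for the example, and $D$ is assumed here to denote the total length of the intervals in an NIS, with $A$ the internal intersection:

```latex
\begin{align*}
N_1 &= \{(0,4),\ (6,10)\}, \qquad N_2 = \{(2,7)\},\\
A(S_1) &= N_1 \cap N_2 = \{(2,4),\ (6,7)\},\\
t_1 &= D(A(S_1)) = (4-2) + (7-6) = 3,\\
sw_1 &= |A(S_1)| = 2.
\end{align*}
```

Under these assumptions the partition can sleep for 3 time units in total, and it enters sleep mode twice.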
$M_i = \{m_j \mid N_j \in S_i\}$. Note that STMP is formulated as an optimization problem here. Using a transformation from the min-cut graph partitioning problem [7], we can show that the decision version of STMP is NP-complete. We state this result as the following theorem (proof omitted for brevity):
Theorem 1. The problem STMP is NP-hard.
An Iterative Improvement Heuristic
The iterative improvement heuristic is a widely used technique in graph partitioning algorithms [9]. Generally, the heuristic starts with a random or other initial balanced partition and then goes through a series of iterations to gradually improve the partitioning solution. In each iteration, the algorithm keeps moving one or more vertices from one partition to the other. The vertices to be moved are chosen such that the partitioning gain is maximized. The partitioning gain is defined as the net reduction in the number of edges connecting a vertex in one partition to a vertex in the other partition, that is, the reduction in the number of cut edges. Once a vertex is moved, it is locked in the destination partition and will not be considered for future moves in the current iteration.
An iteration stops when all vertices are locked in their partitions. At the beginning of each new iteration, all vertices are unlocked. The algorithm continues running one iteration after another until there is no improvement in an entire iteration. As the algorithm proceeds, the best visited partitioning solution is recorded, and it is reported at the end as the final solution. We can employ a similar technique, keeping in mind that the vertices are replaced by NISs and the gain of a bi-partitioning is as defined earlier. The geometric flavor of the algorithm, however, makes the choice of the best possible move more complicated. We call the resulting algorithm GeoPart.
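The move-and-lock scheme described above can be sketched in a few lines of Python. This is only an illustrative sketch and not the GeoPart implementation. The gain is assumed to be the total sleep time $t_1 + t_2$ (the switching terms are ignored), "b-balanced" is assumed to mean that each side keeps at least `b` NISs, and each tentative move is evaluated by brute-force recomputation. The names `intersect`, `sleep_time`, `gain`, and `improve` are hypothetical.

```python
from functools import reduce

def intersect(a, b):
    """Intersect two NISs, each a sorted list of disjoint (l, r) pairs."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        l, r = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if l < r:
            out.append((l, r))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out

def sleep_time(part):
    """Total length of the internal intersection of a list of NISs."""
    if not part:
        return 0
    return sum(r - l for l, r in reduce(intersect, part))

def gain(s1, s2):
    # Assumption: gain is simply the total sleep time of both partitions.
    return sleep_time(s1) + sleep_time(s2)

def improve(nis_list, side, b):
    """side[i] in {0, 1} is the initial partition of nis_list[i]."""
    def parts(assign):
        return ([n for n, s in zip(nis_list, assign) if s == 0],
                [n for n, s in zip(nis_list, assign) if s == 1])

    best, best_gain = list(side), gain(*parts(side))
    improved = True
    while improved:                      # one pass per loop iteration
        improved = False
        cur, locked = list(best), set()  # all NISs start unlocked
        while len(locked) < len(cur):
            cand = None
            for i in range(len(cur)):
                if i in locked or cur.count(cur[i]) - 1 < b:
                    continue             # locked, or move breaks balance
                trial = cur[:]
                trial[i] ^= 1
                g = gain(*parts(trial))
                if cand is None or g > cand[0]:
                    cand = (g, i)
            if cand is None:
                break
            g, i = cand
            cur[i] ^= 1                  # commit the best move and lock it
            locked.add(i)
            if g > best_gain:            # remember the best visited solution
                best, best_gain, improved = cur[:], g, True
    return best, best_gain

A, C = [(0, 5)], [(5, 10)]
print(improve([A, A, C, C], [0, 1, 0, 1], b=1))  # -> ([1, 1, 0, 0], 10)
```

In the toy run, the initial partition pairs NISs with disjoint idle times and has zero gain. Two moves regroup the identical NISs and reach a total sleep time of 10.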
A Segment Tree Implementation
In order to obtain a fast implementation we should be able to try a tentative move and compute its resulting gain efficiently. To do so, we have to devise a mechanism that allows updating (or computing from scratch) the values of $t_1$, $t_2$, $sw_1$, $sw_2$ after each tentative move in an efficient way.
The segment tree (ST) data structure [12] was introduced to handle intervals on an axis whose endpoints belong to a fixed set of points, called the endpoint-set of the ST. An important property of the ST data structure is that it allows representation of each interval using at most $O(\log P)$ canonical intervals, where $P$ is the size of the endpoint set of the ST. Furthermore, insertion and deletion of an interval into an ST takes $O(\log P)$ time.
A slight modification of the ST data structure can be used to maintain each of our partitions. The endpoint-set of the ST for each partition is the same as the endpoint-set of the STMP instance. To add an NIS to a partition, we insert all its intervals into the ST, and to delete an NIS from a partition, we delete all its intervals from the ST.
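One way such a modified segment tree could look is sketched below. This is an assumption-laden illustration, not the paper's data structure. Each node stores the maximum cover count in its subtree and the total length attaining that maximum. A partition holding `k` NISs is then idle exactly where the count equals `k`. The class name `CoverTree` and its methods are hypothetical, and only the sleep time is maintained here, not the number of switchings.

```python
class CoverTree:
    """Segment tree over a fixed endpoint set with interval add/remove."""

    def __init__(self, endpoints):
        self.xs = sorted(set(endpoints))
        self.idx = {x: i for i, x in enumerate(self.xs)}
        self.n = len(self.xs) - 1            # number of elementary intervals
        size = 4 * len(self.xs)
        self.add_ = [0] * size               # count added to the whole node
        self.mx = [0] * size                 # max cover count in the subtree
        self.length = [0] * size             # total length attaining mx
        self._build(1, 0, self.n)

    def _build(self, node, lo, hi):
        if hi - lo == 1:
            self.length[node] = self.xs[hi] - self.xs[lo]
            return
        mid = (lo + hi) // 2
        self._build(2 * node, lo, mid)
        self._build(2 * node + 1, mid, hi)
        self.length[node] = self.length[2 * node] + self.length[2 * node + 1]

    def _update(self, node, lo, hi, l, r, delta):
        if r <= lo or hi <= l:
            return                           # no overlap with this node
        if l <= lo and hi <= r:
            self.add_[node] += delta         # canonical interval: O(log P) of these
            self.mx[node] += delta
            return
        mid = (lo + hi) // 2
        self._update(2 * node, lo, mid, l, r, delta)
        self._update(2 * node + 1, mid, hi, l, r, delta)
        a, b = 2 * node, 2 * node + 1
        best = max(self.mx[a], self.mx[b])
        self.length[node] = ((self.length[a] if self.mx[a] == best else 0) +
                             (self.length[b] if self.mx[b] == best else 0))
        self.mx[node] = self.add_[node] + best

    def add(self, l, r, delta):
        """Insert (delta=+1) or delete (delta=-1) the interval (l, r)."""
        self._update(1, 0, self.n, self.idx[l], self.idx[r], delta)

    def sleep_time(self, k):
        """Total length where all k NISs of the partition are idle."""
        return self.length[1] if k > 0 and self.mx[1] == k else 0

tree = CoverTree([0, 2, 4, 6, 7, 10])
for l, r in [(0, 4), (6, 10)]:               # insert a first NIS
    tree.add(l, r, +1)
tree.add(2, 7, +1)                            # insert a second NIS
print(tree.sleep_time(2))                     # -> 3
```

A tentative move then amounts to deleting the intervals of one NIS from one tree and inserting them into the other, after which the sleep time of each partition is read from the root.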
Upper Bounds on Total Sleep Time
In order to get a feeling for how far we are from the optimal solution, we need some reliable, yet realistic, upper bounds on the total achievable sleep time. In this section we present two upper bounds on the total sleep time of the partitions in any b-balanced bi-partitioning of a given STMP instance. Note that the minimum of these is itself a valid upper bound.
Consider a given STMP instance $(S, b)$. Note that if $|S| < 2b$, the upper bound on the total sleep time of each b-balanced bi-partitioning of $S$ would trivially be zero. Therefore we assume $|S| \ge 2b$.
Application to Memory Segmentation
We say that a DRAM cell $m$ is idle during interval $I = (l, r)$ if $m$ need not be refreshed during $I$. Let $M = \{m_1, m_2, \ldots, m_n\}$ represent the set of memory elements (MEs) used in an application. Assume that the access sequence for each ME $m \in M$ during a whole run cycle is given as a sequence of ordered pairs, each of the form $(n_t, A_t)$, where $n_t$ corresponds to the access time and $A_t \in \{R, W\}$ represents the type of access, read (R) or write (W). Given the access sequence for all the MEs and our definition of an idle interval, we see that a DRAM cell is idle:
1. After its final access time,
2. Before each write access until the closest read access (or the start of computation).
Using these two rules we can calculate the idle set for each memory element from its access pattern.
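The two rules can be turned into a short routine. The following is a minimal sketch under stated assumptions. "Closest read access" is read as the closest read preceding the write, accesses are `(time, kind)` pairs sorted by time with `kind` in `{'R', 'W'}`, and the run spans `(start, end)`. The function name `idle_set` and its parameters are hypothetical.

```python
def idle_set(accesses, start, end):
    """Return the idle set (an NIS) of one memory element as sorted (l, r) pairs."""
    intervals = []
    last_read = start                 # closest preceding read, or start of run
    for t, kind in accesses:
        if kind == 'W':
            # Rule 2: idle from the closest preceding read up to this write.
            intervals.append((last_read, t))
        else:
            last_read = t
    # Rule 1: idle after the final access (or for the whole run if never accessed).
    intervals.append((accesses[-1][0] if accesses else start, end))

    # Merge overlapping or touching intervals so the result is non-overlapping.
    merged = []
    for l, r in sorted(intervals):
        if l >= r:
            continue                  # skip empty intervals
        if merged and l <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], r))
        else:
            merged.append((l, r))
    return merged

trace = [(2, 'W'), (5, 'R'), (7, 'R'), (9, 'W'), (12, 'R')]
print(idle_set(trace, 0, 15))  # -> [(0, 2), (7, 9), (12, 15)]
```

In the example trace, the cell must be refreshed between the write at time 2 and the read at time 7, and again between the write at time 9 and the read at time 12. It is idle everywhere else.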
Experimental Results
We implemented GeoPart in C and tested its performance on a number of test cases. Two matrix multiplication algorithms (mtrxmult1, mtrxmult2), a number of sorting algorithms (hpsrt, qsrt, shellsrt), and a set of DSP routines (fourl, realft, avevar, moment) were selected from [13] as test cases for memory segmentation in low power synthesis of ASIC designs. To generate the idle sets for the memory elements, we used a profiling tool that provides the utilization of the registers, memory, and various computational units of a processor executing a program. The idle sets were generated from the utilization information provided by the profiling tool, using the methodology presented in Section 6.
In order to estimate the percentage of savings in power consumption, $R$, using our partitioning technique in memory segmentation, we used the data sheets for a number of DRAM memories [1]. Note that a partition runs in standby mode, consuming significantly lower power, when switched into sleep mode. Let $P$, $P'$ represent the power consumption with and without exploiting sleep mode using our partitioning technique; then using $\text{Power} = \frac{\text{Energy consumption}}{\text{Time}}$, and ignoring the leakage current and the power consumption due to the sleep mode control unit, we have:
where $P_o$ and $P_s$ represent the power consumption of each partition of the memory in operation and standby modes,
The data sheets indicate that for a given memory chip, we typically have $P_o / P_s > 25$; therefore the percentage of saving in the power consumption using the proposed partitioning technique would be at least:
In the other cases, where our result achieves a sleep time that is far from the given upper bound, one explanation may be that the upper bounds themselves are not tight enough. The estimated savings in power consumption using our technique range from 3.55% to 60.48%, averaging 20.02%, which is quite encouraging.
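To make the link between sleep time and power savings concrete, the following is one possible derivation under simplifying assumptions. It is a sketch, not necessarily the expression used in the original analysis. It assumes two partitions that each draw $P_o$ in operation and $P_s$ in standby, a run of length $T$ (a symbol introduced here), and sleep times $t_1$ and $t_2$:

```latex
\begin{align*}
P' &= \frac{2 P_o T}{T} = 2 P_o,\\
P  &= \frac{P_o\,(2T - t_1 - t_2) + P_s\,(t_1 + t_2)}{T},\\
R  &= \frac{P' - P}{P'}
    = \frac{t_1 + t_2}{2T}\left(1 - \frac{P_s}{P_o}\right).
\end{align*}
```

Under these assumptions, a ratio $P_o / P_s > 25$ gives $1 - P_s / P_o > 0.96$, so the savings would be at least $0.96\,(t_1 + t_2) / (2T)$.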
Conclusion
In this paper we studied the system partitioning problem to maximize sleep time, which can be exploited to minimize the power consumption. The motivation is to deactivate the memory refresh circuitry, apply power down, or just disable the clock signals during the inactive periods of operation of circuit elements. We formulated this partitioning problem and showed that it is NP-hard. We also presented a geometric heuristic algorithm based on the iterative improvement scheme for this problem, and discussed an efficient implementation of this algorithm using a segment tree data structure to maintain each partition. We conducted experiments for memory segmentation in low power synthesis of ASICs on a number of numerical and DSP algorithms. The experiments indicate that the estimated power consumption due to the memory unit is decreased by 3.55 to 60.48 percent, with an average of 20.02 percent. Our future research in this area will be in the following directions: i) achieve tighter upper bounds, ii) design fast sleep time estimation algorithms to provide quick feedback to scheduling and allocation tasks in compilers and high-level synthesis, so that scheduling and allocation can be targeted for higher sleep time and thus lower power consumption, iii) study the more general problem of multi-way partitioning, in which the algorithm should compute the optimal number of partitions as well as the contents of each partition, iv) experiment with other applications. In some applications the connectivity of the circuit elements should also be taken into account when doing the partitioning. Partitioning algorithms should be developed to allow trade-offs between routing area, delay, and power consumption.
