Data transfer intensive applications consume a significant amount of energy in memory access. The selection of a memory location from a memory array involves driving row and column select lines.
INTRODUCTION
In data transfer intensive applications, such as video and image processing, a significant fraction of the total energy consumption of the system is due to memory access [16]. Dynamic power dissipation is significant in CMOS circuits, and therefore behavioural level energy minimization efforts often attempt to minimize signal transition counts, particularly on high capacitance nodes [13].
ICCAD '02, San Jose, California, USA.
In order to minimize switching activity caused by memory access, it is necessary to have some knowledge about the access sequences. For many application-specific integrated circuits (ASICs), the access sequences are usually known a priori. ASICs may also contain data dependent access sequences. However, because the application is known, statistical information can be collected about the data dependent sequences. Information about the access sequences enables the application of energy optimizations to ASICs which would not be applicable to general purpose systems.
CMOS memory cell arrays are usually organized into rectangular blocks of memory cells. The selection of a memory location involves driving row, column and in some cases block select signals. Signal transitions on high capacitance signals such as the row and block select lines consume more energy than those on column select lines [3].
In this paper, we present a methodology for minimizing the energy consumption of memory access through address assignment that minimizes row switching. The novel contributions of this paper are: (1) formulation of the energy efficient address assignment problem as a multi-way graph partitioning problem; (2) application of an existing graph partitioning heuristic to solve the problem; (3) evaluation of the solution in terms of solution quality and run time.
This paper is organized as follows. Section 2 presents some of the previous work in the area of memory access energy minimization. Section 3 formulates the energy efficient address assignment problem as a graph partitioning problem. Section 4 contains a list of assumptions we have made in this work. Section 5 describes our address assignment methodology. Section 6 reports experimental results and Section 7 contains conclusions and indicates some possible future work.
PREVIOUS WORK
The previous work of interest addresses the problem of reducing memory access energy in ASICs at the behavioural level.
For a given number of memory accesses, energy can be further minimized by address assignment that attempts to minimize signal transition counts, particularly high energy transitions on off-chip address busses [13]. Address and data bus encoding methods can also be used to minimize switching activity on off-chip busses [11].
None of the above-mentioned work exploits the fact that different select lines in the memory cell array consume different amounts of energy. Our work reported in this paper exploits this property of the cell arrays to add a complementary energy minimization approach to the existing architectural level energy minimization techniques.
PROBLEM DEFINITION
The problem of minimizing the activity on the row select lines can be thought of as a problem of clustering the accessed data items such that the number of transitions between the clusters is minimized. The cluster size is no greater than the number of memory columns and the number of clusters is equal to the number of memory rows. We now define, mathematically, the minimum row switching address assignment problem as a graph partitioning problem.
Given an undirected edge-weighted graph $G(V, E)$, where $V = \{v_i \mid i = 0, 1, \ldots, n-1\}$ is the set of vertices and $E$ is the set of weighted edges, and positive integers $p$ and $q$ where $p \times q \ge |V|$, partition $V$ into $p$ subsets $V_0, V_1, \ldots, V_{p-1}$ such that:
1. $\bigcup_{i} V_i = V$ and $V_i \cap V_j = \emptyset$ for $i \ne j$;
2. $|V_i| \le q$ for all $i$;
3. the cut size, i.e., the sum of weights of edges crossing between subsets¹, is minimized.
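The two constraints and the objective above can be checked mechanically. The following is a minimal sketch, assuming the graph is given as a dictionary of weighted edges and a candidate solution as a dictionary from vertex to subset index; the names `is_valid_partition` and `cut_size` and the small example graph are illustrative and do not come from the paper.

```python
def is_valid_partition(part, n, p, q):
    """Check that `part` maps every vertex 0..n-1 to one of p subsets,
    with no subset holding more than q vertices."""
    if sorted(part) != list(range(n)):
        return False
    sizes = [0] * p
    for subset in part.values():
        if not 0 <= subset < p:
            return False
        sizes[subset] += 1
    return all(size <= q for size in sizes)


def cut_size(edges, part):
    """Sum the weights of edges whose endpoints lie in different subsets."""
    return sum(w for (u, v), w in edges.items() if part[u] != part[v])


# Hypothetical example: 4 vertices, p = 2 rows, q = 2 columns.
edges = {(0, 1): 3, (1, 2): 1, (2, 3): 2, (0, 3): 1}
part_a = {0: 0, 1: 0, 2: 1, 3: 1}  # keeps the two heaviest edges inside a row
part_b = {0: 0, 1: 1, 2: 0, 3: 1}  # cuts every edge

print(cut_size(edges, part_a), cut_size(edges, part_b))
```

Both partitions satisfy the size constraint, but `part_a` cuts only the two light edges, so it corresponds to fewer row transitions.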
ASSUMPTIONS
The following assumptions are made in this paper:
1. A signal transition on a row select line consumes more energy than a signal transition on a column select line.
2. Reducing switching activity on high capacitance signals such as row select lines reduces energy consumption.
3. Every individual data item has been mapped to a location in a physical memory but is not bound to a physical memory address.
4. There is no preference when switching from one row to another, i.e., switching from row 1 to row 3 is the same as switching from row 1 to row 2.
5. Address generators have not yet been synthesized.
6. Memory cell arrays are rectangular. p and q represent the number of rows and the number of columns respectively. w represents the width of a memory word. p and q are known and p × q gives the total number of locations of that cell array. p × q × w gives the total capacity of the cell array in bits.
7. Memory arrays have one read/write port.
8. The data access sequences are known.
9. Memory cells are accessed by activating row and column select lines. We ignore block select lines.
10. Each data array is assigned to a separate memory cell array, and the memory cell array is just large enough to contain the assigned data array.
Assumptions 1-6 are necessary for the problem formulation and its solution. Assumptions 7-10 are simplifying assumptions. These assumptions hold throughout this paper unless stated otherwise.
¹ The sum of weights of edges crossing between subsets is equal to the number of row transitions.
ENERGY EFFICIENT ADDRESS ASSIGNMENT METHODOLOGY

Input
Transition Graph
In Section 3 we defined the problem of finding an address assignment that minimizes the activity on the row select lines as a multi-way graph partitioning problem. Therefore, the input symbolic address sequences are converted to a transition graph for each memory cell array. The transition graph contains information regarding the unique memory locations accessed and the number of transitions between each pair of unique memory locations.
A transition graph is created from a symbolic address sequence as follows. Every unique symbolic address is mapped to a vertex in the transition graph. A transition between a pair of symbolic addresses is indicated with an undirected edge between the corresponding vertices. The edges are weighted, and the edge weight is equal to the number of transitions.
Figure 1(a) shows the transition graph for access sequence S.
The numbers inside the vertices indicate the symbolic addresses and the numbers on the edges indicate the weights of the edges.
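The construction described above can be sketched in a few lines. This is a minimal illustration, assuming the symbolic addresses are integers; the function name `transition_graph` and the example sequence are hypothetical (the sequence is not the sequence S of Figure 1), and skipping repeated accesses to the same address is an assumption made here because such accesses cause no transition.

```python
from collections import Counter


def transition_graph(sequence):
    """Build an undirected, edge-weighted transition graph.

    Returns (vertices, edges). `vertices` is the set of unique symbolic
    addresses. `edges` maps an unordered pair, stored as (u, v) with u < v,
    to the number of consecutive accesses that move between u and v.
    """
    vertices = set(sequence)
    edges = Counter()
    for a, b in zip(sequence, sequence[1:]):
        if a != b:  # assumption: a repeated access is not a transition
            edges[(min(a, b), max(a, b))] += 1
    return vertices, dict(edges)


# Hypothetical symbolic address sequence.
seq = [0, 1, 0, 2, 3, 2, 1]
vertices, edges = transition_graph(seq)
print(sorted(vertices), edges)
```

For this sequence the pairs (0, 1) and (2, 3) each carry weight 2, so a partition that keeps each pair in one row avoids four of the six transitions.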
Multi-Way Graph Partitioning Heuristic
Once the transition graph for a memory cell array has been created, we require a mapping of each vertex to one of p rows such that no row contains more than q vertices.
For two-dimensional data arrays there are two simple address assignment methods: row major and column major. In general, the problem can be viewed as a multi-way graph partitioning problem. Since the graph partitioning problem is NP-complete [5], even for the simplest case of graph bisection with unweighted edges and vertices, a heuristic approach is used to solve this problem.
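To make the idea of a partitioning heuristic concrete, the sketch below applies a greedy pairwise-swap refinement to a sequential starting assignment. It is a deliberately simple stand-in and is not the multilevel-KL algorithm; the function names are hypothetical. Because it only swaps vertices between subsets, subset sizes never change, so the limit of q vertices per row holds throughout, but any spare capacity is not exploited.

```python
def cut_size(edges, part):
    """Sum the weights of edges whose endpoints lie in different subsets."""
    return sum(w for (u, v), w in edges.items() if part[u] != part[v])


def swap_refine(n, edges, p, q):
    """Greedy swap refinement (illustrative only, not multilevel-KL).

    Vertices are 0..n-1. Starts from a sequential assignment and keeps any
    swap of two vertices in different subsets that lowers the cut size.
    """
    assert p * q >= n
    part = {v: v // q for v in range(n)}  # sequential start, sizes <= q
    best = cut_size(edges, part)
    improved = True
    while improved:
        improved = False
        for u in range(n):
            for v in range(u + 1, n):
                if part[u] == part[v]:
                    continue
                part[u], part[v] = part[v], part[u]
                cost = cut_size(edges, part)
                if cost < best:
                    best = cost
                    improved = True
                else:
                    part[u], part[v] = part[v], part[u]  # undo the swap
    return part, best


# Hypothetical graph: the heavy edges join vertices that start in different rows.
edges = {(0, 2): 5, (1, 3): 5, (0, 1): 1}
part, best = swap_refine(4, edges, p=2, q=2)
print(part, best)
```

Each accepted swap strictly lowers the cut size, so the loop terminates, but like any local search it can stop in a local minimum on larger graphs.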
There are many graph partitioning heuristics and software tools available [4]. The Chaco graph partitioning tool developed by Hendrickson and Leland [7] was used for our application. Chaco provides several partitioning algorithms. We chose the multilevel-KL algorithm [8] as this offers low cut size for large problems in moderate time [7]. As data transfer intensive systems access large memories, the ability to quickly partition large graphs into a large number of sets is very important. Chaco is controlled by a host of variables. One of the most important for our application is the KL_IMBALANCE variable. When KL_IMBALANCE is set to 0, set sizes for graphs whose vertices are not weighted vary by at most one [7]. With this guarantee we can prove that the multilevel-KL algorithm meets the partitioning constraint $|V_i| \le q$ defined in Section 3:
Proposition: If the largest and smallest set sizes in a partitioned graph differ by one, and the size of the memory cell array that the vertices of the original graph are mapped to is given by p × q, where p is the number of rows and q is the number of columns, then for all positive integer values of p and q the largest set size will always be smaller than or equal to q.
PROOF. Let a be the number of sets of size x and b be the number of sets of size x - 1. Because the total number of vertices has to be less than or equal to the size of the cell array, we have:
Since p is the total number of partitions, simplifying and substituting [...]
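The inequality and the closing steps of the proof can be reconstructed from the definitions in the proposition. The following is a sketch of one such reconstruction rather than the original derivation; it assumes $a \ge 1$, since at least one set has the largest size $x$, and uses $a + b = p$.

```latex
\begin{align*}
a\,x + b\,(x - 1) &\le p\,q \\
a\,x + (p - a)(x - 1) &\le p\,q && \text{substituting } b = p - a \\
p\,x - p + a &\le p\,q \\
x &\le q + 1 - \frac{a}{p} \;<\; q + 1 && \text{since } a \ge 1
\end{align*}
```

Because $x$ is an integer and $x < q + 1$, it follows that $x \le q$.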
Output
The output is a mapping of symbolic addresses to rows of the memory cell array. It should be noted that the exact row and column addresses are not fixed after graph partitioning. In other words, graph partitioning only performs a partial address assignment. The row and column addresses can be sequentially assigned. However, this sequential assignment may not be optimal for address generators. There is further optimization opportunity here for reducing energy consumption in the address generators.
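The sequential assignment mentioned above might look as follows. This is a minimal sketch under stated assumptions: the partition is a dictionary from symbolic address to row, columns are handed out in ascending order of symbolic address within each row, and the linear address is formed as row × q + column. The function name `assign_addresses` and the address layout are illustrative choices, not part of the original methodology.

```python
def assign_addresses(part, q):
    """Map each symbolic address to (row, column, linear address).

    `part` maps a symbolic address to its row. Columns are assigned
    sequentially within each row; linear address = row * q + column
    (an assumed layout with the row bits as the upper address bits).
    """
    next_col = {}
    addresses = {}
    for sym in sorted(part):
        row = part[sym]
        col = next_col.get(row, 0)
        if col >= q:
            raise ValueError(f"row {row} holds more than {q} items")
        addresses[sym] = (row, col, row * q + col)
        next_col[row] = col + 1
    return addresses


# Hypothetical partition of four symbolic addresses into two rows of two columns.
part = {0: 1, 1: 0, 2: 1, 3: 0}
addresses = assign_addresses(part, q=2)
print(addresses)
```

Any other ordering of columns within a row gives the same row transition count, which is why this step is left open for address generator optimization.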
EXPERIMENTAL RESULTS
We performed experiments on several memory access sequences to evaluate our energy efficient graph based address assignment method. The total number of row transitions caused by memory accesses was used as an energy consumption metric for each of the row major, column major and graph based address assignment schemes. A row transition in a larger memory may consume more energy than a row transition in a smaller memory. However, in our experiments row transition counts are not weighted to take account of the memory size.
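The row transition count metric for the two simple mapping schemes can be sketched as follows. The example assumes a K × K data array stored contiguously in a memory with q columns, so that the memory row of an element is its linear offset divided by q. The function names and the column-wise access sequence are hypothetical and are not one of the benchmark sequences.

```python
def row_major(i, j, K, q):
    """Memory row of element (i, j) when the array is stored row by row."""
    return (i * K + j) // q


def column_major(i, j, K, q):
    """Memory row of element (i, j) when the array is stored column by column."""
    return (j * K + i) // q


def row_transition_count(accesses, mapping, K, q):
    """Count consecutive accesses that fall in different memory rows."""
    rows = [mapping(i, j, K, q) for i, j in accesses]
    return sum(1 for a, b in zip(rows, rows[1:]) if a != b)


K, q = 4, 2
# Hypothetical access sequence: walk the array column by column.
accesses = [(i, j) for j in range(K) for i in range(K)]
rtc_row = row_transition_count(accesses, row_major, K, q)
rtc_col = row_transition_count(accesses, column_major, K, q)
print(rtc_row, rtc_col)
```

For this column-wise walk, row major mapping changes memory row on every access, while column major mapping changes row only once per q accesses.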
Memory access sequences used for our experiments are obtained from the following examples: Gauss-Seidel formula (GSR) [10]. Some of the examples contain several data arrays, exhibiting different access sequences. From such examples we have manually selected the data array with the largest number of accesses for our experiments. For simplicity, two-dimensional data arrays in our examples are K × K square arrays. We varied the data array dimension K from 10 to 1000 at intervals of 10 (16 to 1000 at intervals of 8 for DCT) and studied the effects on the row transition count (RTC) when the arrays are mapped with different mapping schemes to memory cell arrays with different numbers of columns. Table 1 shows the average percentage reductions in row switching achieved through graph based mapping over row major mapping when the data arrays are mapped to a memory with 32 columns (q = 32). For some examples, average reductions of 40-70% are achievable over row major mapping. Row major mapping was chosen as the benchmark because, for all our access sequences, row major mapping produced lower RTCs than column major mapping when q = 32. Table 1 also shows absolute numbers for row transition count when K = 256.
The magnitude of reduction in the row transition count over row major mapping achievable through our method depends on the particular access sequence. For some access sequences row major mapping may be optimal and our method would not yield any improvement in row transition count. Where reductions in RTC are possible, the amount of reduction additionally depends on the number of columns in the memory. Figure 3 shows row transition count for SOR against number of columns when K = 256. When the number of columns is one, all three mapping schemes produce the same RTC. Also, if the number of columns is large enough to [...]
Graph partitioning address assignment aims to achieve a near optimal row transition count. In order to evaluate our graph based mapping with respect to the optimal mapping we have carried out the following experiment. The source[i][j] array in the image flip algorithm has a row major access sequence. Therefore, row major mapping is optimal for this access sequence.
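The optimality claim for a row major access sequence can be checked against a simple lower bound: K × K distinct locations occupy at least ⌈K²/q⌉ memory rows, and each of those rows must be entered at least once, so no mapping can achieve fewer than ⌈K²/q⌉ - 1 row transitions. The sketch below assumes a horizontal flip loop of the form `dest[i][K-1-j] = source[i][j]`; this loop and the function names are assumptions for illustration, since the original listing is not reproduced here.

```python
from math import ceil


def flip_source_accesses(K):
    """Order in which source[i][j] is read by an assumed horizontal flip:
    for each i, for each j: dest[i][K-1-j] = source[i][j]."""
    return [(i, j) for i in range(K) for j in range(K)]


def rtc_row_major(accesses, K, q):
    """Row transition count when the array is stored in row major order."""
    rows = [(i * K + j) // q for i, j in accesses]
    return sum(1 for a, b in zip(rows, rows[1:]) if a != b)


K, q = 8, 32
rtc = rtc_row_major(flip_source_accesses(K), K, q)
lower_bound = ceil(K * K / q) - 1  # every occupied row is entered at least once
print(rtc, lower_bound)
```

Row major mapping meets the bound for this sequence, which is what makes it a useful reference point for judging how close a graph based mapping comes to the optimum.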
CONCLUSION AND FUTURE WORK
In this paper we have presented a methodology for energy efficient address assignment through minimization of memory row switching. Row transition counts for many commonly found access sequences in multimedia applications can be reduced by 40%-70% over row major mapping with our methodology. We have also demonstrated that our methodology can achieve row transition counts very close to the optimum.
The methodology is directly applicable to address generator synthesis methods which require an expanded address sequence, such as counter based methods [6], shift register based methods [9] and finite state machine based methods [9].
Energy consumption in the address generators was not considered. It should be noted that the methodology presented performs a grouping of data array variables which should be assigned to the same memory row. It does not fix row and column addresses. We could simply sequentially assign row and column addresses to mapped data variables. However, this sequential assignment could be further optimized for a given address generator architecture.
Furthermore, the methodology presented can be extended to data dependent memory access sequences through the use of statistical methods for construction of the transition graph. Also, our methodology can be easily extended to more complex memory organizations [3] that use block select lines in addition to row and column select lines by first assigning data variables to blocks and then to rows.
ACKNOWLEDGMENTS
This work was funded by LSI Logic Corporation, US and the Overseas Research Students Awards Scheme, UK. The authors would like to thank Thomas Niemann, Christos Bouganis and Andy Royal for their help with the implementation of the sZg programme and Nishanth Kulasekeram for his comments on the paper.
