Energy consumption in caches depends on the number of enabled sets, ways, and blocks. Energy consumption is lowest when only one set, one way, or one block is enabled. This paper proposes an algorithm that maps a cache line to a single block in a fully associative cache by XOR'ing the address with a constant. Bit selection is applied to the result and the selected block is accessed, so only one block is accessed under this mapping. The proposed model is simulated with SPEC2K benchmarks. The average memory access time is comparable with that of a traditional fully associative cache, with energy savings.
INTRODUCTION
Caches come in three models: direct mapped, set associative and fully associative [1, 4]. In a direct mapped cache, a block can be placed in exactly one set. In a w-way set associative cache, a block can occupy any of the w ways of its cache set. In a fully associative cache, a cache line can be placed in any of the cache blocks. Bit selection, which is equivalent to the mod function, is usually used to choose the cache set in direct mapped and set associative caches. In a fully associative cache, the address is compared with every cache block and, on a hit, the matching block is accessed. On a miss, the block to be replaced is usually chosen by the least recently used (LRU) algorithm. For a fully associative cache with n blocks, the energy consumed by the match logic is nE per access, where E is the energy consumed per block.
Energy consumption is a widely studied topic in caches. The authors in [3] propose a power efficient cache that matches the four least significant bits of the tags. The authors in [5] propose a method that disables a subset of the ways in a set associative cache during periods of modest activity, while the full cache remains operational during more cache intensive periods, to save energy. The authors in [7] suggest a method that accesses only the tag store during comparison. In [8] a method is proposed to enable only a single way of a set associative cache. In [9] a way is predicted and probed during access of the set associative cache, saving energy. Partial tags are compared in the method proposed in [10]. The cache is divided into subcaches in [11], which also proposes different techniques for data placement, subcache prediction and selective probing. The author in [12] proposes a method to save energy in set associative caches. The author in [14] evaluates the idea of restricting the number of blocks to be searched in a fully associative cache. The energy consumed by a cache operation depends on the number of blocks or sets accessed and on the number of tag comparisons. The authors in [2] discuss XOR mapping for direct mapped, set associative, victim, hash-rehash, column associative and skew-symmetric caches.
This paper proposes an algorithm that maps an address to one block in a fully associative cache of n blocks. The algorithm negates each bit in the address and uses bit selection to map the result to a block. This is equivalent to bitwise XOR'ing the address with a string of 1's.
The rest of the paper is organized as follows. Section 2 gives the motivation, section 3 the proposed model, section 4 the mathematical analysis of the proposed system, section 5 the simulation, section 6 the conclusion, section 7 the acknowledgements and section 8 the references.
MOTIVATION
Consider a fully associative cache of eight blocks and the address stream 1, 2, 1, 3, 1, 4. The line corresponding to address 1 is placed in block 0, address 2 in block 1, address 3 in block 2, and address 4 in block 3 of the fully associative cache. There are four misses and two hits. All the blocks are searched for every address. If the energy consumed per block in the match logic is 10J, then 80J of energy is consumed per access. The total energy consumed for the given address stream is 6*80J = 480J. Now consider the following algorithm.
1. XOR the address a with 7.
2. Use bit selection to map the result to a block.
3. Stop.
According to the proposed algorithm, address 1 (001 in binary) is XOR'ed with 7, which is 111 in binary. This gives 110, so address 1 is placed in block 6. Address 2 XOR'ed with 7 gives 101 (010 XOR 111) and is placed in block 5. Address 3 XOR'ed with 7 gives 100 (011 XOR 111) and is placed in block 4. Address 4 XOR'ed with 7 gives 011 (100 XOR 111) and is placed in block 3. The number of hits is two and the number of misses is four, the same as in the fully associative cache. Only one block is accessed for each address, so the energy consumed per access is 10J. For the given address stream the total energy consumed is 6*10J = 60J, a saving of 87.5% relative to 480J. This is the motivation of this paper.
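The worked example above can be replayed with a short simulation. This is a minimal sketch, not the authors' simulator: the function names are hypothetical, the fully associative cache is assumed to use LRU replacement, and bit selection is assumed to keep the low log2(n) bits of the XOR result.

```python
from collections import OrderedDict

ENERGY_PER_BLOCK_J = 10   # per-block match energy used in the example
NUM_BLOCKS = 8            # cache size used in the example


def run_fully_associative(trace, num_blocks=NUM_BLOCKS):
    """LRU fully associative cache (assumed policy); every access probes all blocks."""
    resident = OrderedDict()          # oldest entry first
    hits = 0
    for address in trace:
        if address in resident:
            hits += 1
            resident.move_to_end(address)      # mark as most recently used
        else:
            if len(resident) == num_blocks:
                resident.popitem(last=False)   # evict least recently used
            resident[address] = True
    energy = len(trace) * num_blocks * ENERGY_PER_BLOCK_J
    return hits, len(trace) - hits, energy


def run_xor_mapped(trace, num_blocks=NUM_BLOCKS):
    """XOR with num_blocks - 1, then keep the low bits; one block probed per access."""
    mask = num_blocks - 1             # 7 (binary 111) for eight blocks
    blocks = [None] * num_blocks
    placement = {}                    # address -> block index it maps to
    hits = 0
    for address in trace:
        index = (address ^ mask) & mask
        placement[address] = index
        if blocks[index] == address:
            hits += 1
        else:
            blocks[index] = address   # miss: fill or overwrite the block
    energy = len(trace) * ENERGY_PER_BLOCK_J
    return hits, len(trace) - hits, energy, placement


trace = [1, 2, 1, 3, 1, 4]
print(run_fully_associative(trace))   # (2, 4, 480)
print(run_xor_mapped(trace))          # (2, 4, 60, {1: 6, 2: 5, 3: 4, 4: 3})
```

Both runs report two hits and four misses, and the energy totals of 480J and 60J match the figures in the text.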
PROPOSED MODEL
Consider a fully associative cache with n blocks and an address stream. In the traditional model, the addresses are mapped to the cache according to the fully associative scheme. All the blocks of the fully associative cache, and the match logic circuitry of every block, are enabled for the entire trace. Now consider the following algorithm for an address a.
1. XOR the address a with n-1.
2. Use bit selection to map the result to one of the n blocks.
3. Stop.
Only one block is accessed in this algorithm, and only the match circuitry of the accessed block is enabled. This saves energy. The address mapping is depicted in Figure 1.
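The mapping step can be sketched as a single function. The names below are hypothetical, and two assumptions are made that the text does not state explicitly: n is a power of two, so that n-1 is a string of 1's in binary, and bit selection keeps the low log2(n) bits. Under those assumptions, XOR'ing with n-1 gives the same block as negating every address bit before bit selection.

```python
def xor_block_index(address: int, num_blocks: int) -> int:
    """Block index obtained by XOR with num_blocks - 1 followed by bit selection."""
    if num_blocks <= 0 or num_blocks & (num_blocks - 1):
        # Assumption: the mask num_blocks - 1 is all 1's only for powers of two.
        raise ValueError("num_blocks must be a power of two")
    mask = num_blocks - 1
    return (address ^ mask) & mask


def negated_block_index(address: int, num_blocks: int) -> int:
    """Same mapping expressed as bitwise negation followed by bit selection."""
    mask = num_blocks - 1
    return ~address & mask


# For non-negative addresses both forms agree, and both equal (n - 1) - (a mod n).
for a in range(64):
    assert xor_block_index(a, 8) == negated_block_index(a, 8) == 7 - (a % 8)
```

Because the index depends only on the selected low bits, two addresses that agree in those bits map to the same block.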
MATHEMATICAL ANALYSIS OF PROPOSED MODEL
Consider a fully associative cache with n blocks, denoted $C_{trad}$. Consider an address trace of R references, placed according to the logic of the fully associative cache. Let H be the number of hits, t the hit time per reference, and M the miss penalty. The AMAT for this model is given by
Next, consider the algorithm proposed in section 3. Let the system implementing this algorithm be denoted $C_{prop}$. Let h be the number of hits, T the time to access one cache block, and k the XOR mapping time for one address. The AMAT of the proposed system is given by
An improvement in AMAT is seen if
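Written out with the symbols defined above, equations (1) to (3) can take the following form. This is a sketch under assumptions rather than a definitive statement of the expressions: a miss is charged only the miss penalty M, the XOR mapping time k is paid on every one of the R references, and the subscripted names $\mathrm{AMAT}_{trad}$ and $\mathrm{AMAT}_{prop}$ are labels introduced here for the traditional and proposed caches.

```latex
\begin{align}
\mathrm{AMAT}_{trad} &= \frac{1}{R}\bigl(Ht + (R-H)M\bigr) \tag{1}\\
\mathrm{AMAT}_{prop} &= \frac{1}{R}\bigl(Rk + hT + (R-h)M\bigr) \tag{2}\\
Rk + hT + (R-h)M &< Ht + (R-H)M \tag{3}
\end{align}
```

Condition (3) is simply (2) < (1) with the common factor 1/R removed.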
Consider the energy consumed in a fully associative cache. In the traditional cache, the match logic operates in parallel on all n blocks. If the energy consumed per block is E joules, the total energy consumed over R references is

$$REn \qquad (4)$$

In the proposed model, only one cache block is accessed for match logic. The total energy consumed in the proposed model is

$$RE \qquad (5)$$

An improvement in energy consumption of $\frac{n-1}{n}$ over the traditional cache is observed.
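The energy expressions can be checked numerically. The helper names below are hypothetical; the values R = 6, E = 10 and n = 8 are the ones from the example in section 2.

```python
def match_energy(references: int, energy_per_block: float, blocks_probed: int) -> float:
    """Total match-logic energy: references x per-block energy x blocks probed."""
    return references * energy_per_block * blocks_probed


def energy_saving_fraction(num_blocks: int) -> float:
    """Fraction saved when one block is probed instead of all n: (n - 1) / n."""
    return (num_blocks - 1) / num_blocks


traditional = match_energy(6, 10, 8)   # R * E * n = 480
proposed = match_energy(6, 10, 1)      # R * E     = 60
print(traditional, proposed, energy_saving_fraction(8))   # 480 60 0.875
```

The saving depends only on n and approaches 1 as the number of blocks grows.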
SIMULATION
The proposed model was simulated with SPEC2000 benchmarks. The address traces were collected using the SimpleScalar toolkit, and routines were written in C to simulate the proposed model. The hits and misses were collected for each benchmark. The parameters for the simulation are given in Table 1. The proposed model was compared with fully associative and direct mapped cache models for AMAT. As seen from Figure 2, the AMAT of the proposed model is comparable with that of these models. The AMAT is calculated as follows. The cache is simulated as an array. Each address is mapped to a block in the fully associative cache. The numbers of hits and misses are gathered from the address trace. The AMAT is then calculated by substituting these values into equations (1) and (2) for the traditional and proposed models respectively.
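The substitution step can be sketched as follows. The formulas used here are an assumption: they charge a hit the hit time and a miss the miss penalty alone, and add the XOR mapping time k to every reference in the proposed model. The function names and the timing values in the usage lines are hypothetical and are not taken from Table 1.

```python
def amat_traditional(references: int, hits: int, hit_time: float, miss_penalty: float) -> float:
    """Assumed form: (H*t + (R - H)*M) / R."""
    misses = references - hits
    return (hits * hit_time + misses * miss_penalty) / references


def amat_proposed(references: int, hits: int, block_time: float,
                  xor_time: float, miss_penalty: float) -> float:
    """Assumed form: (R*k + h*T + (R - h)*M) / R."""
    misses = references - hits
    return (references * xor_time + hits * block_time + misses * miss_penalty) / references


# Hypothetical timings in cycles, with the hit counts of the six-reference example.
print(amat_traditional(6, 2, hit_time=1, miss_penalty=10))            # 7.0
print(amat_proposed(6, 2, block_time=1, xor_time=1, miss_penalty=10)) # 8.0
```

With equal hit counts and equal access times, the two models differ only by the XOR mapping time k per reference.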
CONCLUSION
An address mapping algorithm for fully associative caches is proposed in this paper. An address is XOR'ed with n-1 for a fully associative cache of n blocks. Bit selection is applied to the result and the corresponding block is accessed. The proposed model is simulated with SPEC2000 benchmarks. The AMAT of the proposed system is comparable with that of a fully associative cache and a direct mapped cache of the same size. An average energy saving of 99% in operational mode is observed.
