A High-Performance Hardware-Efficient Memory Allocation Technique and Design by Cam, Hasan et al.
A High-Performance Hardware-Efficient Memory Allocation
Technique and Design
Hasan Cam, Mostafa Abd-El-Barr, and Sadiq M. Sait
King Fahd University of Petroleum and Minerals
Computer Engineering Department
Dhahran 31261, Saudi Arabia
Abstract
This paper presents a hardware-efficient memory allocation (EMA)
technique designed to eliminate both internal and external
fragmentation that appear in the buddy system. EMA can allocate a
free memory block of any size in any part of memory. A hardware
implementation of EMA is introduced, but only part of its circuits
is shown in the paper due to space limitations. Simulation results
show that EMA utilizes memory space more efficiently than previously
known techniques.
1 Introduction
Dynamic memory allocation is an important issue in the design of
computer systems. It has been reported that dynamic memory
management consumes 23% to 38% of the time in six
allocation-intensive C programs run on a 17-SPECmark SPARC
architecture with 80 MB of memory [1]. Object-oriented programs have
a very high object creation rate and, therefore, the speed of memory
allocation is crucial for improving system performance. It is highly
desirable to have a fast and efficient memory allocator that
allocates free space in blocks of exactly the prescribed length.
A number of memory allocation algorithms have been implemented in
hardware, but each of them has some drawbacks. The buddy system,
introduced by Knowlton [2], is a fast and simple memory allocation
technique. It allocates memory in blocks whose lengths are powers of
2 and, therefore, suffers from high internal and external
fragmentation. If the requested block size is not a power of 2, then
the size is rounded up to the next power of 2. This may leave a big
chunk of unused space at the end of an allocated block [3], thereby
resulting in internal fragmentation. Many researchers have focused on
improving the performance of the buddy system using either software
techniques [4, 5] or hardware techniques [3]. Chang and Gehringer [6]
have recently proposed a modified hardware-based buddy system which
eliminates internal fragmentation. This paper presents an efficient
hardware technique designed to detect any available free block of the
requested size in the well-known buddy system memory allocation
technique and to eliminate internal fragmentation.
2 Efficient Memory Allocation (EMA) System
We assume that the memory is divided into a number of chunks, each
having the same number of words. A memory block consists of one or
more chunks. A bit-vector is used to represent the status (free or
used) of all memory chunks such that bits 0 and 1 correspond to the
case of a chunk being free or used, respectively.
Next, we introduce algorithm EMA, which receives two types of
requests, namely, allocation and deallocation, along with the size k
of the requested block.
Algorithm EMA
Step 1. If the request is a deallocation, go to Step 5; otherwise
(i.e., allocation), go to Step 2.
Step 2. (i) Detect all free blocks of size $2^{\lceil \log_2 k \rceil}$;
(ii) activate the address registers of these free blocks.
Step 3. (i) Detect the free block with the highest starting address
among those free blocks; (ii) return the starting address of this
free block to the memory manager (indicating that the first k chunks
of the block are free).
Step 4. Invert (from 0 to 1) all k bits corresponding to the first k
chunks of the free block. End of allocation. Stop.
Step 5. Invert (from 1 to 0) the k bits starting at the bit whose
address equals the given starting address of the block to be
deallocated.
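The steps of Algorithm EMA can be mirrored in software to make the control flow concrete. The following is a minimal sketch, not the hardware design: the class name `EMASketch` and its methods are hypothetical, and the linear scan over start addresses stands in for what the circuits of the next paragraphs do in parallel.

```python
class EMASketch:
    """Software model of Algorithm EMA over a chunk bit-vector.

    Hypothetical illustration only: 0 marks a free chunk, 1 a used chunk,
    as in the paper's bit-vector convention.
    """

    def __init__(self, n_chunks):
        self.bits = [0] * n_chunks

    def allocate(self, k):
        """Steps 2-4: return the starting address, or None if no block fits."""
        size = 1 << (k - 1).bit_length()      # 2^ceil(log2 k), for k >= 1
        # Step 2: every start address whose next `size` chunks are all free.
        candidates = [a for a in range(len(self.bits) - size + 1)
                      if not any(self.bits[a:a + size])]
        if not candidates:
            return None
        # Step 3: keep the free block with the highest starting address.
        sa = max(candidates)
        # Step 4: invert (0 -> 1) only the first k bits of that block.
        for addr in range(sa, sa + k):
            self.bits[addr] = 1
        return sa

    def deallocate(self, sa, k):
        """Step 5: invert (1 -> 0) the k bits starting at address sa."""
        for addr in range(sa, sa + k):
            self.bits[addr] = 0


mem = EMASketch(16)
first = mem.allocate(5)     # looks for 8 free chunks, marks only 5 -> 8
second = mem.allocate(3)    # looks for 4 free chunks, marks only 3 -> 4
mem.deallocate(first, 5)
```

Because only k bits are marked, the chunks left over from the power-of-2 search window stay free for later requests.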
Detection of Free Blocks (Step 2): The or-gate prefix circuit shown
in Figure 1(a) is used to detect all free blocks of size
$2^{\lceil \log_2 k \rceil}$. The circuit assumes a memory of size N
chunks. Any node at level $L_i$ represents an OR gate. For
$i \ge 1$, the number of nodes at level $L_i$ of the or-gate prefix
circuit is $2^{i-1}$ less than the number of nodes at level
$L_{i-1}$. An example circuit for N = 10 is shown in Figure 1(b).
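The level structure described above can be checked with a short software model. This is a sketch under the assumption that node j at level $L_i$ ORs the $2^i$ consecutive bits starting at address j, which is consistent with the stated node counts; the function name `or_prefix_levels` and the example bit-vector are illustrative and are not taken from Figure 1.

```python
def or_prefix_levels(bits):
    """Return the levels L0, L1, ... of an or-gate prefix circuit.

    Node j at level i is the OR of bits j .. j + 2**i - 1, built from two
    nodes of the previous level that lie 2**(i-1) positions apart.
    """
    levels = [list(bits)]
    span = 1                              # 2**(i-1) while building level i
    while len(levels[-1]) > span:
        prev = levels[-1]
        levels.append([prev[j] | prev[j + span]
                       for j in range(len(prev) - span)])
        span *= 2
    return levels


# Hypothetical 10-chunk bit-vector: chunks 2..5 are free.
levels = or_prefix_levels([1, 1, 0, 0, 0, 0, 1, 1, 1, 1])
node_counts = [len(level) for level in levels]   # [10, 9, 7, 3]
```

For N = 10 the node counts drop by 1, 2, and 4 from one level to the next, and the single 0 at level 2 (index 2) marks the free block of size 4 that starts at address 2.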
The or-gate prefix circuit can detect any free block if its size is a
power of 2, wherever the free block is located in the memory. This is
an advantage over the or-gate tree [6], which can only detect those
free blocks of size j whose starting address is a multiple of j,
where j is a power of 2. To determine the free block whose first
chunk's address is the greatest, a so-called level selector line
$S_i$ is placed into level i, $0 \le i \le n$, of the or-gate prefix
circuit as shown in Figure 2. There are n + 1 level selectors labeled
$S_0, S_1, \ldots, S_n$ for a $2^n$-bit vector. When a block of size
k is requested, only level selector $S_i$, for
$i = \lceil \log_2 k \rceil$, is asserted. For any free block of size
$2^i$, there will be exactly one corresponding or-gate node with
value 0 at level $L_i$ of the or-gate prefix circuit.
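The difference between the two detection schemes can be illustrated with a small comparison. The sketch below is an assumption-laden software analogue: `free_starts` is a hypothetical helper, the `aligned_only` flag models the or-gate tree's restriction to starting addresses that are multiples of the block size, and the bit-vector is made up for the example.

```python
def free_starts(bits, k, aligned_only=False):
    """Start addresses of free blocks of size 2^ceil(log2 k).

    aligned_only=False models the or-gate prefix circuit (any address);
    aligned_only=True models the or-gate tree (multiples of the size).
    """
    i = (k - 1).bit_length()          # index of the asserted level selector
    size = 1 << i
    step = size if aligned_only else 1
    return [a for a in range(0, len(bits) - size + 1, step)
            if not any(bits[a:a + size])]


bits = [1, 1, 0, 0, 0, 0, 1, 1]       # chunks 2..5 are free
prefix_hits = free_starts(bits, 4)                    # [2]
tree_hits = free_starts(bits, 4, aligned_only=True)   # []
```

Here a request for 4 chunks succeeds only when unaligned blocks are visible, since the free run starts at address 2, which is not a multiple of 4.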
Detection of the Free Block with the Highest Address (Step 3): The
bits of a bit-vector are labeled from left to right, starting with 0.
The label of each bit is stored in its address register circuit
(ARC). When more than one address register is set by the vertical
lines, the selection of the address register with the highest address
is achieved by the hardware implementation of the binary countdown
algorithm proposed by Fraser [7]. The detect-first tree and disabling
circuits shown in Figure 2 implement the binary countdown algorithm
in n steps as follows. Assume that the address of the kth block in
the bit vector is represented in binary by
$x_k^1 x_k^2 \ldots x_k^n$. In the jth step, $1 \le j \le n$, the
$x^j$ bits of the addresses of the enabled address registers are
ORed by the detect-first tree, starting with j = 1 in the first step.
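The n-step selection can be sketched in software as follows. The text above describes only the OR performed in each step; the sketch assumes the usual binary countdown rule for what happens next, namely that when the OR is 1 every enabled register holding a 0 in that bit position is disabled, and that $x^1$ is the most significant address bit. The function name `binary_countdown_max` is hypothetical.

```python
def binary_countdown_max(addresses, n_bits):
    """Select the highest address among the enabled address registers.

    Assumed rule (standard binary countdown): in step j the j-th most
    significant bits are ORed; if the result is 1, registers with a 0
    in that position are disabled.
    """
    enabled = set(addresses)
    for j in range(1, n_bits + 1):
        shift = n_bits - j                    # x^1 is the most significant bit
        # Detect-first tree: OR of bit x^j over all enabled registers.
        wired_or = any((a >> shift) & 1 for a in enabled)
        if wired_or:
            # Disabling circuit: drop registers whose bit x^j is 0.
            enabled = {a for a in enabled if (a >> shift) & 1}
    return enabled.pop() if enabled else None


winner = binary_countdown_max([2, 9, 5, 12, 6], n_bits=4)   # 12
```

After n steps exactly one register remains enabled when the candidate addresses are distinct, so the selection time depends on the address width rather than on the number of candidates.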
Figure 1: (a) General structure of an or-gate prefix circuit for a
memory of size N chunks. (b) An or-gate prefix circuit for a
bit-vector of length N = 10. In the bit-vector, 1 denotes a used
block and 0 denotes a free block.
Bit Inversion (Step 4): Let SA and EA denote the starting and ending
addresses, respectively, of the block determined by the circuit shown
in Figure 2. Note that, given the size S of the requested block, EA
can be easily computed using the formula EA = SA + S - 1. If the bits
corresponding to the allocated block are represented by a subvector V
of the bit-vector, SA and EA correspond to the addresses of the first
and last bit, respectively, of V. To indicate that the bits of V are
allocated, they are inverted to 1. As soon as the free block with the
highest address is determined, the allocated block can be used, while
the bit inversion is simultaneously done to update the status of the
bit vector.
Memory Deallocation (Step 5): In the case of memory deallocation, the
starting address SA and the size S of the block to be deallocated are
given. The only difference between memory allocation and memory
deallocation is that the starting address SA is given rather than
detected, and the bit-inverters invert the bits from 1 to 0 to
indicate that they are free.
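Since allocation and deallocation both reduce to inverting the bits from SA to EA, a single routine can serve both directions. The following is a minimal sketch under that reading; `invert_block` is a hypothetical name, and the list-based bit-vector is only a stand-in for the hardware bit-inverters.

```python
def invert_block(bits, sa, s):
    """Invert bits SA..EA in place, with EA = SA + S - 1; return EA."""
    ea = sa + s - 1
    for addr in range(sa, ea + 1):
        bits[addr] ^= 1                 # 0 -> 1 on allocation, 1 -> 0 on release
    return ea


vector = [0] * 8
ea = invert_block(vector, sa=3, s=4)    # allocation: bits 3..6 become 1
after_alloc = list(vector)
invert_block(vector, sa=3, s=4)         # deallocation: same bits back to 0
```

The routine does not check the current state of the bits; it assumes, as the text does, that the caller supplies a starting address and size that describe a block in the expected state.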
Figure 2: The detect-free-block (DFB) circuit for determining the
free block with the highest starting address, for a 16-bit vector;
only four bits of the vector are shown due to space limitations. The
lines of the or-gate prefix circuit are illustrated by the thick
solid lines.
3 Simulation Results
We have conducted a number of experiments to evaluate EMA. In each
experiment, a supply of synthetic allocations/deallocations was
created with their execution times. EMA is compared with the buddy
system [3] and the memory allocator (CGMA) of Chang and Gehringer
[6]. To evaluate EMA, we define a measurement called memory
allocation efficiency (MAE) as the ratio of the number of requested
memory blocks to the number of allocated memory blocks. Figure 3
shows MAE versus memory space.
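To make the MAE definition concrete, the sketch below computes the ratio for a handful of made-up request sizes. It is only an illustration of the formula, assuming that sizes are counted in chunks and that the only loss is the buddy system's rounding to a power of 2; it does not reproduce the simulation or the data behind Figure 3, and the names `mae`, `buddy_size`, and `requests` are hypothetical.

```python
def mae(requested_sizes, allocated_sizes):
    """Memory allocation efficiency: requested chunks / allocated chunks."""
    return sum(requested_sizes) / sum(allocated_sizes)


def buddy_size(k):
    """Block size a buddy system hands out for a request of k chunks."""
    return 1 << (k - 1).bit_length()    # next power of 2 that is >= k


requests = [3, 5, 8, 9]                 # hypothetical request sizes
buddy_mae = mae(requests, [buddy_size(k) for k in requests])   # 25 / 36
exact_mae = mae(requests, requests)                            # 1.0
```

Under these assumptions an allocator that marks exactly k chunks per request reaches a ratio of 1, while rounding each request up to a power of 2 lowers it.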
4 Conclusion
EMA is fast and flexible enough to allocate/deallocate a free block
in any part of memory. This leads to better utilization of memory
space, thereby allowing more memory blocks to remain free than is
possible with the known hardware memory allocators.
Acknowledgment
The authors wish to acknowledge the support provided by King Fahd
University of Petroleum and Minerals under Project No.
COE/ARRAYS/177.
Figure 3: Comparison of memory allocation efficiency versus memory
space for EMA, CGMA, and buddy system. (Vertical axis: memory
allocation efficiency, 0.1 to 1; horizontal axis: memory space in
number of blocks, 32 to 384.)
References
[1] B. Zorn, "The measured cost of conservative garbage collection,"
Software: Practice and Experience, Vol. 23, No. 7, pp. 733-756,
July 1993.
[2] K.C. Knowlton, "A fast storage allocator," Comm. ACM, Vol. 8,
pp. 623-625, Oct. 1965.
[3] E.V. Puttkamer, "A simple hardware buddy system memory
allocator," IEEE Trans. Computers, Vol. 24, No. 10, pp. 953-957,
Oct. 1975.
[4] I.P. Page and J. Hagins, "Improving the performance of buddy
systems," IEEE Trans. Computers, Vol. 35, No. 5, pp. 441-447,
May 1986.
[5] R.E. Barkley and T.P. Lee, "A lazy buddy system bounded by two
coalescing delays per class," Proc. 12th Symp. Operating Systems
Principles, Vol. 23, No. 5, pp. 167-176, Dec. 1989.
[6] J.M. Chang and E.F. Gehringer, "A high performance memory
allocator for object-oriented systems," IEEE Trans. Computers,
Vol. 45, No. 3, pp. 357-366, March 1996.
[7] A.G. Fraser, "Towards a universal data transport system,"
Advances in Local Area Networks, K. Kummerle, F. Tobagi and
J.O. Limb (Eds), New York: IEEE Press, 1987.
