Performance-Area Trade-Off of Address Generators for Address
Decoder-Decoupled Memory
Sambuddhi Hettiaratchi, Peter Y.K. Cheung, Thomas J.W. Clarke
Department of Electrical and Electronic Engineering
Imperial College of Science, Technology and Medicine, London
E-mail: s.hetti@ic.ac.uk, p.cheung@ic.ac.uk, t.clarke@ic.ac.uk
Abstract
Multimedia applications are characterized by a large number of data
accesses and complex array index manipulations. The built-in address
decoder in the RAM model used by most memory synthesis tools
unnecessarily restricts the freedom of address generator synthesis.
We therefore propose a memory model in which the address decoder is
decoupled from the memory cell array. To demonstrate the benefits and
limitations of this alternative memory model, synthesis results for a
Shift Register based Address Generator, which does not require address
decoding, are compared to those for a counter-based address generator,
which does. The results show that delay can be nearly halved at the
expense of increased area.
1. Introduction
Multimedia applications such as video and image pro-
cessing are often characterized by a large number of data
accesses. Many image processing applications such as the
block matching motion estimation algorithm also contain
complex index manipulations, resulting in complex address
patterns. Furthermore, the high throughput required for
such applications imposes tight timing constraints on ad-
dress generation. Therefore, optimization of address gen-
eration in data transfer intensive applications is critical to
system synthesis.
Although memory synthesis and optimization have been studied
elsewhere, and a good overview can be found in [9], most of this
research is based on a random access memory (RAM) model as shown in
Figure 1. This common RAM model accepts a binary coded address and
decodes it using built-in decoders into row and column select signals.
Such a memory architecture is advantageous, or even necessary, for
applications where the sequence of accesses to memory is not
predictable. For many application-specific integrated circuits
(ASICs), however, the address sequences are usually known a priori
and the actual sequence often follows a reasonably regular pattern.
In this paper, we examine the impact on the area and performance of
the memory access circuitry of eliminating the row and column address
decoders from the memory and incorporating any necessary decoding into
the address generation circuit. The novel contributions of this paper
are: (1) to propose an “address decoder-decoupled memory” model that
separates the built-in address decoder from memory and incorporates
the address generation and any address decoding logic into one or more
finite state machines (FSMs); (2) to evaluate such a generic model for
address sequencing in the context of circuit synthesis and assess its
limitations; (3) to propose a shift register based architecture with
“two-hot” encoding as an efficient implementation of such FSMs for
regular address sequences; (4) to compare the proposed approach with
those using conventional address counter/decoding methods in terms of
area and performance.

[Figure 1. Conventional RAM model with address generator. Block
diagram: an address generator with inputs CLK, Reset and Next drives
an (m+n)-bit address into a RAM with built-in address decoders; the
m-bit row address (RA) and the n-bit column address (CA) are decoded
into 2^m row select (RS) and 2^n column select (CS) lines for the
memory cell array.]
This paper is organized as follows. Section 2 gives a
brief summary of the previous work. Section 3 presents a
generalized model for the address generator targeted at the
address decoder-decoupled memory (ADDM). Section 4
Proceedings of the 2002 Design, Automation and Test in Europe Conference and Exhibition (DATE02) 
1530-1591/02 $17.00 © 2002 IEEE 
describes a novel shift register based address generator ar-
chitecture which does not use address decoders. Section 5
describes the procedure used to map an address sequence
to this particular architecture. Section 6 reports synthesis
results and Section 7 contains conclusions and possibilities
for future work.
2. Related work
Memory synthesis optimizations such as in-place map-
ping, loop transformations and array clustering typically
add extra complexity to the address sequences in the non-
optimized application code [9]. Furthermore, most modern
memory architectures are based on memory hierarchy [2].
This involves adding extra smaller and faster layers of mem-
ory between larger memory and computational units and in-
troduces extra data transfers [12]. Explicit transfer memory
hierarchy, a memory hierarchy whose address sequences are
known and synthesized at compile time, requires extra ad-
dress generation [12].
In many digital signal processing applications the array access
patterns are regular and periodic [4]. In these cases it becomes
feasible and efficient to generate the necessary address patterns
either directly from a dedicated counter or via circuit
transformations applied to a counter output [4]. The PHIDEO silicon
compiler is targeted at stream-based video applications [6]. The
address allocation task of PHIDEO trades off memory size against
addressing cost. The ZIPPO tool, which is integrated with PHIDEO,
considers several address streams accessing different on-chip memory
modules and synthesises an address generator that is area optimized
by sharing hardware [5].
The combined ADOPT methodology for address gen-
erator synthesis described in [8] is a general framework
for optimizing address generators. ADOPT considers both
counter-based and arithmetic-based address generator styles
and also considers hardware sharing. Schmit and Thomas
employ some interesting properties of the exclusive-OR
function to overcome the offset addition problem that is in-
troduced by array clustering memory optimization [10].
All these works use a conventional RAM model. Access to memory is
specified by a binary coded address which is split to form row and
column addresses. These are then decoded to access a two-dimensional
array of memory cells. Such a memory architecture has the advantage
of reducing the number of signals needed to access memory, in
particular for off-chip devices. It also decouples memory access from
address sequencing, making the device suitable for any application.
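To make the conventional access path concrete, the following minimal
Python sketch splits a binary coded address into row and column
fields and decodes each field into one-hot select lines. The function
names, the choice of the low-order bits as the column field, and the
widths m and n used in the example are assumptions made for
illustration only; the paper does not prescribe them.

```python
def split_address(addr, n):
    """Split a binary coded address into (row, column) fields.

    Assumption: the n low-order bits form the column address (CA) and
    the remaining high-order bits form the row address (RA).
    """
    return addr >> n, addr & ((1 << n) - 1)


def decode(value, width):
    """Model a width-to-2**width decoder: exactly one select line is 1."""
    return [1 if k == value else 0 for k in range(1 << width)]


def select_lines(addr, m, n):
    """Return the (RS, CS) select-line vectors for a binary address."""
    row, col = split_address(addr, n)
    return decode(row, m), decode(col, n)


# Hypothetical 4 x 4 array (m = n = 2): address 6 is row 1, column 2.
rs, cs = select_lines(6, m=2, n=2)
```

Only m + n address signals cross the memory boundary here, whereas
the decoded form needs 2^m + 2^n select lines, which is the signal
count advantage mentioned above.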
However, some researchers have argued that the built-in address
decoder which is present in the conventional RAM model restricts the
freedom of memory system optimizers to arrive at an optimal memory
architecture for a given algorithm [1, 3, 11]. This is particularly
true for application-specific integrated circuits where the address
sequences are deterministic and often regular. Under such
circumstances, decoupling the address decoder from the memory array
and incorporating it into the address generator circuit may offer
more scope for optimization.

[Figure 2. Finite state machine address generator model for the
address decoder-decoupled memory. Block diagram: a finite state
machine with inputs CLK, Reset and Next directly drives the 2^m row
select (RS) and 2^n column select (CS) lines of the memory cell
array.]
3. Generalized address generator model for the
address decoder-decoupled memory
In the case of deterministic access patterns, the address generator
for the address decoder-decoupled memory can be generalized as an FSM
(Figure 2). The problem of address generator optimization can then be
formulated as a generic state encoding and assignment problem.
Unfortunately, a repetitive address sequence of length N requires an
FSM with N states. For the relatively long sequences that are common
in many applications, this presents an intractable problem to a logic
optimizer, resulting in large circuits that take a very long time to
synthesize. Alternatively, the general FSM architecture can be
replaced by a structured solution based on shift registers. These two
alternatives were investigated for a simple incremental access
sequence (i.e. 0, 1, 2, ..., N-1). Both circuits were synthesized with
Synopsys’ Design Compiler for a 0.18 µm CMOS process. Figures 3 and 4
show the delay and the area, respectively, of the FSM-based and the
shift register based solutions. The shift register is, in general,
over twice as fast as the binary encoded symbolic state machine, with
only a 10% increase in area. Moreover, as the sequence length
increases, the synthesis time for the symbolic state machine becomes
impractical. (For N = 256, FSM synthesis took over 6 hours, compared
to 36 minutes for the shift register solution, on a SUN Ultra-5
workstation.)
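As a behavioural illustration of the two alternatives for the
incremental sequence, the sketch below compares a one-hot shift
register with a binary counter followed by a decoder. It is a
functional model only, written in Python with hypothetical function
names; it shows that both produce the same select-line activity and
says nothing about the synthesized delay or area reported above.

```python
def shift_register_selects(n_lines, steps):
    """One-hot shift register: a single token circulates through
    n_lines flip-flops, each driving one select line."""
    state = [1] + [0] * (n_lines - 1)      # reset places the token on line 0
    for _ in range(steps):
        yield list(state)
        state = [state[-1]] + state[:-1]   # each `next` shifts the token by one


def counter_decoder_selects(n_lines, steps):
    """Binary counter followed by a decoder, as in the conventional model."""
    count = 0
    for _ in range(steps):
        yield [1 if k == count else 0 for k in range(n_lines)]
        count = (count + 1) % n_lines


N = 8
sr = list(shift_register_selects(N, 2 * N))
cd = list(counter_decoder_selects(N, 2 * N))
addresses = [s.index(1) for s in sr]       # position of the token at each step
```

In the shift register the state is already the decoded select vector,
so no combinational decoding sits between the flip-flops and the
select lines.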
[Figure 3. The delay through the address generator for different
address sequence lengths. Plot of delay (ns) against address sequence
length (8 to 256) for the shift register and the symbolic state
machine.]
[Figure 4. The area of the address generator for different address
sequence lengths. Plot of area (cell units) against address sequence
length (8 to 256) for the shift register and the symbolic state
machine.]
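[Figure 5. SRAG architecture. Schematic: two four-stage shift
registers, S0 (flip-flops s0,0 to s0,3) and S1 (flip-flops s1,0 to
s1,3), each with clock (C1), enable (E2) and reset (R3) inputs and
each fed through a two-input multiplexor (X0, X1). The counters
DivCnt and PassCnt, each followed by combinational logic, derive the
enable and pass signals from next. The flip-flop outputs drive the
select lines l5, l1, l4, l0 (S0) and l3, l7, l6, l2 (S1). Global
inputs: clk, reset.]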
4. Shift register based address generator archi-
tecture
A shift register can be used to generate a simple incre-
menting sequence of addresses for the decoder-decoupled
memory. For more complex address sequences, a more
sophisticated architecture is required. We propose a Shift
Register based Address Generator (SRAG) architecture, as
shown in Figure 5, which can generate many regular ad-
dress sequences, especially those found in block-based im-
age processing algorithms. This design is an improvement
on the Sequential FIFO memory (SFM) proposed by Alo-
qeely [1]. SFM works on the same principle as a RAM
but the address decoder is replaced by two single-bit shift
registers, one for read and the other for write (Figure 6).
SFM, despite its advantages, suffers from three main limi-
tations. Firstly, SFM assumes that memory is one dimen-
sional whereas memory arrays are two dimensional. Sec-
ondly, SFM uses one-hot encoding for its address generator
state machine. Finally, SFM is a first-in first-out (FIFO)
memory and cannot be applied to other types of address se-
quences such as block access.
In contrast, our SRAG architecture works explicitly with a
two-dimensional memory. A two-dimensional memory allows us to use
“two-hot” encoding, that is, “one-hot” encoding in each of the row
and column address selections, which takes up far less area than
one-hot encoding of the whole address. The two-dimensional
arrangement of the memory array naturally implements decoding for
two-hot encoded addresses, and therefore two-hot encoding does not
incur any delay penalty over one-hot encoding. Finally, as we
demonstrate in this paper, SRAG is capable of implementing access
patterns other than FIFO type access.
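The area argument can be illustrated with a small count of the
flip-flops that hold the address state. This is a rough sketch under
stated assumptions: it counts only one flip-flop per select line,
ignores the control counters and multiplexors, and uses hypothetical
function names.

```python
def one_hot_flip_flops(rows, cols):
    """One flip-flop per memory word when the whole address is one-hot."""
    return rows * cols


def two_hot_flip_flops(rows, cols):
    """One flip-flop per row select line plus one per column select line."""
    return rows + cols


def selected_cell(rs, cs):
    """The cell array itself combines the two one-hot vectors: the
    selected cell lies where the active row and column lines cross."""
    return rs.index(1), cs.index(1)


for side in (16, 64, 256):
    print(side, one_hot_flip_flops(side, side), two_hot_flip_flops(side, side))
```

For a square array the one-hot count grows with the number of words,
while the two-hot count grows only with the sum of the array
dimensions.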
The complete SRAG is composed of a row SRAG and a column SRAG
controlling the row select (RS) and the column select (CS) lines
respectively. The row and column SRAGs are identical in architecture.
Therefore, for the sake of simplicity, we will consider address
generation for one dimension of the memory array, which has just one
set of select lines. Figure 5 shows the schematic for such an SRAG
containing two shift registers and two multiplexors. All shift
registers have a clock input C1, an enable input E2, and a reset
input R3.

[Figure 6. Sequential FIFO memory. Block diagram: memory cells with
data in and data out ports, addressed by a tail-pointer shift register
and a head-pointer shift register, each with next and reset inputs.]
The general architecture of the SRAG is composed of a set of shift
registers, $S = (S_0, S_1, \ldots, S_{N-1})$, a set of two-input
multiplexors, $X = (X_0, X_1, \ldots, X_{N-1})$, and two synchronous
binary counters, DivCnt and PassCnt. The SRAG has inputs clk, next
and reset, internal control signals enable and pass, and an output
set of select lines, $L = (l_0, l_1, \ldots, l_{w-1})$. Each shift
register $S_i \in S$ is defined as a set of flip-flops,
$(s_{i,0}, s_{i,1}, \ldots, s_{i,M_i-1})$.
When enable = 0, all shift registers are disabled and retain their
previous states. When enable = 1, at every rising edge of clk,
$s_{i,j} := s_{i,j-1}$, where $i = 0 \ldots N-1$ and
$j = 1 \ldots M_i-1$. The value assigned to the first flip-flop of
each shift register is controlled by the pass signal. If pass = 1
then $s_{(i+1) \bmod N,\,0} := s_{i,M_i-1}$, where $i = 0 \ldots N-1$;
otherwise, if pass = 0, then $s_{i,0} := s_{i,M_i-1}$, where
$i = 0 \ldots N-1$. The set of multiplexors implements the controlled
assignments to $s_{i,0}$, where $i = 0 \ldots N-1$. If N = 1,
multiplexors are not required. At any given time only one flip-flop
in the set S can output the value 1, which we call the token.
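The update rules above can be sketched as a bit-level step function.
This is a minimal behavioural model in Python, not the synthesized
circuit; the function name and the list-of-lists state representation
are assumptions made for illustration.

```python
def srag_step(regs, enable, pass_):
    """Return the flip-flop state after one rising edge of clk.

    regs   -- list of shift registers, each a list of 0/1 flip-flop values
    enable -- when 0, every register retains its previous state
    pass_  -- when 1, the first flip-flop of register i is loaded from the
              last flip-flop of register (i - 1) mod N; when 0, from the
              last flip-flop of register i itself
    """
    if not enable:
        return [list(r) for r in regs]
    n = len(regs)
    nxt = []
    for i, r in enumerate(regs):
        # Multiplexor X_i selects the source of s_{i,0}.
        source = regs[(i - 1) % n] if pass_ else r
        # s_{i,0} takes the selected value; s_{i,j} := s_{i,j-1} for j >= 1.
        nxt.append([source[-1]] + list(r[:-1]))
    return nxt


state = [[0, 0, 0, 1], [0, 0, 0, 0]]   # token in the last flip-flop of S_0
recirculated = srag_step(state, enable=1, pass_=0)
passed_on = srag_step(state, enable=1, pass_=1)
```

With pass = 0 the token wraps around inside its own shift register;
with pass = 1 it moves to the first flip-flop of the next one. For
N = 1 both cases reduce to the same recirculation, which is why no
multiplexor is needed.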
The output of each flip-flop $s_{i,j}$ of each shift register is
mapped onto a distinct select line, $l_k \in L$. The token travels
from one flip-flop to the next, thus activating the select lines in
the desired order. The subscripts of the select lines correspond to
the one-dimensional memory address, $a_n$. The mapping of
one-dimensional addresses to the flip-flops is described in detail in
the next section.
In general, address generators are driven by a next signal. Every
next signal advances the address generator from the current position
in the address sequence, $a_n$, to the next position, $a_{n+1}$.
Therefore, the next signal controls the enable signal. If the current
address in the address sequence is identical to the next address,
i.e. $a_n = a_{n+1}$, then it is necessary to pass the next signal
through a divider. DivCnt counts the number of next signals up to a
predetermined value dC. The connected combinational logic block
asserts enable when the counter value equals $dC - 1$ and the input
next is also high. The SRAG has only one DivCnt, which imposes the
restriction that every address in the address sequence must be
repeated the same number of times. For example, the SRAG shown in
Figure 5, with dC = 2 and assuming the pass signal is always
asserted, gives the address sequence
5, 5, 1, 1, 4, 4, 0, 0, 3, 3, 7, 7, 6, 6, 2, 2. In contrast, the
sequence 5, 5, 5, 1, 1, 4, 4, 0, 0, 3, 3, 7, 7, 6, 6, 2, 2 has a dC
of 3 for address 5 and a dC of 2 for all other addresses, and
therefore violates the DivCnt restriction.
PassCnt controls the number of iterations for which a particular
shift register $S_i \in S$ retains the token by controlling the pass
signal that switches the two-input multiplexors. It does so by
counting the number of enable signals. When the counter value reaches
a predetermined value $pC - 1$, the connected combinational logic
asserts the pass signal. There is only one PassCnt in the SRAG. This
imposes the restriction that the length of each shift register
multiplied by the number of iterations of that shift register has to
be the same for all shift registers. For example, the SRAG shown in
Figure 5, with pC = 8 and dC = 1, gives the sequence
5, 1, 4, 0, 5, 1, 4, 0, 3, 7, 6, 2, 3, 7, 6, 2. In contrast, the
sequence 5, 1, 4, 0, 5, 1, 4, 0, 5, 1, 4, 0, 3, 7, 6, 2, 3, 7, 6, 2
has a pC of 12 for $S_0$ and 8 for $S_1$ and therefore would violate
the PassCnt restriction.
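The interaction of DivCnt and PassCnt can be checked with a
token-level behavioural model. The sketch below is an illustration in
Python rather than the authors' implementation. Two things are
assumptions: the assignment of select lines to flip-flops
(5, 1, 4, 0 for S_0 and 3, 7, 6, 2 for S_1) is read off the example
sequences in the text, and "pass always asserted" is modelled as
pC = 1.

```python
def srag_addresses(registers, dC, pC, steps):
    """Return the first `steps` addresses of a behavioural SRAG model.

    registers -- select-line index driven by each flip-flop, per register
    dC        -- number of `next` pulses per `enable` pulse (DivCnt)
    pC        -- number of `enable` pulses per `pass` pulse (PassCnt)
    """
    reg, pos = 0, 0                  # reset puts the token in s_{0,0}
    div_cnt = pass_cnt = 0
    out = []
    for _ in range(steps):
        out.append(registers[reg][pos])   # address seen before this `next`
        enable = div_cnt == dC - 1
        div_cnt = 0 if enable else div_cnt + 1
        if not enable:
            continue                      # token holds its position
        pass_ = pass_cnt == pC - 1
        pass_cnt = 0 if pass_ else pass_cnt + 1
        if pos < len(registers[reg]) - 1:
            pos += 1                      # shift inside the register
        else:
            if pass_:                     # hand the token to the next register
                reg = (reg + 1) % len(registers)
            pos = 0                       # otherwise wrap around
    return out


FIG5 = [[5, 1, 4, 0], [3, 7, 6, 2]]       # assumed select-line mapping
held = srag_addresses(FIG5, dC=2, pC=1, steps=16)
looped = srag_addresses(FIG5, dC=1, pC=8, steps=16)
```

Under these assumptions `held` and `looped` reproduce the two valid
example sequences given above for dC = 2 and for pC = 8.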
The restrictions on DivCnt and PassCnt limit the spec-
trum of addressing patterns that can be implemented with
SRAG. This can be relaxed by using multiple counters that
provide more flexibility in the sequences that can be gen-
erated. Furthermore, it is not necessary to use counters for
deriving the enable and the pass signals. It is possible to use
shift registers or interacting FSMs to derive these signals.
5. Automatic Mapping Procedure
The input to the mapping procedure is a row address
(RA) and a column address (CA) sequence. The mapper
works on the RA and the CA sequences separately. The
function of the mapper is to identify the shift register set S,
the common division count dC and the common pass count
pC for a given address sequence.
In order to better explain the mapping procedure we use the
two-dimensional array new_img[I0][I1] from the block matching motion
estimation algorithm (Figure 7), where img_width = 4, img_height = 4,
mb_width = 2, mb_height = 2 and m = 0. Table 1 shows the resulting
linear address sequence (LinAS), row address sequence (RowAS), and
column address sequence (ColAS). In this paper, we assume that the
new_img array is row-major mapped to the memory array so that
RA = I0, CA = I1, and the linear address is
LA = I0 × img_width + I1. Data organization within the memory cell
array can greatly affect the available regularity at the RowAS and
ColAS level. The optimal data organization within the memory cell
array for a given arbitrary address sequence is not addressed in this
paper. Since the same procedure applies to both the RowAS and the
ColAS, we will now focus on the RowAS alone.

Name    Address Sequence
LinAS   0, 1, 4, 5, 2, 3, 6, 7, 8, 9, 12, 13, 10, 11, 14, 15
RowAS   0, 0, 1, 1, 0, 0, 1, 1, 2, 2, 3, 3, 2, 2, 3, 3
ColAS   0, 1, 0, 1, 2, 3, 2, 3, 0, 1, 0, 1, 2, 3, 2, 3
Table 1. Address sequences

Parameter   Value
I           0, 0, 1, 1, 0, 0, 1, 1, 2, 2, 3, 3, 2, 2, 3, 3
D           2, 2, 2, 2, 2, 2, 2, 2
R           0, 1, 0, 1, 2, 3, 2, 3
U           0, 1, 2, 3
O           2, 2, 2, 2
Z           0, 1, 4, 5
S           (0, 1), (2, 3)
P           4, 4
dC          2
pC          4
Table 2. Mapping parameters for row address sequence
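The sequences in Table 1 can be reproduced by replaying the read
accesses to new_img under the row-major mapping. The Python sketch
below is an illustration only: the function name is hypothetical, and
it assumes a single search position per macroblock (the m = 0 case),
so that each element of new_img is read exactly once.

```python
def read_sequences(img_width, img_height, mb_width, mb_height):
    """Return (LinAS, RowAS, ColAS) for block-by-block reads of new_img."""
    lin, row, col = [], [], []
    for g in range(img_height // mb_height):        # macroblock row
        for h in range(img_width // mb_width):      # macroblock column
            for k in range(mb_height):              # row inside the block
                for l in range(mb_width):           # column inside the block
                    i0 = g * mb_height + k          # RA = I0
                    i1 = h * mb_width + l           # CA = I1
                    row.append(i0)
                    col.append(i1)
                    lin.append(i0 * img_width + i1) # LA = I0 * img_width + I1
    return lin, row, col


lin_as, row_as, col_as = read_sequences(4, 4, 2, 2)
```

The block-based traversal is what makes LinAS irregular while RowAS
and ColAS each remain short repeating patterns.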
Table 2 shows the mapping parameters for the RowAS. The input to the
procedure is the address sequence I. The outputs are S, dC and pC.
The following describes the sequence of steps involved in the
mapping.

• The division count set D is found by counting the consecutive
  repetitions of individual addresses in I. The restriction on DivCnt
  means that all 8 elements of D should be equal. In this case, dC is
  equal to any element of the sequence D.

• Once dC is known, I is reduced by replacing the repeated blocks by
  a single element to give the reduced address sequence R, e.g., 00
  is replaced by 0.

• Then the mapper examines R to find the unique address sequence U.
  An address appearing in R is recorded in U if that address is not
  already recorded in U. The order in which addresses are recorded in
  U corresponds to the order in which they first appear in R.

• For each of the unique addresses in U, the number of times it
  occurs and the position of its first occurrence in R are recorded
  in the sets O and Z respectively.

• The grouping and mapping of select lines to shift registers is done
  in two steps. First, initial grouping is done by checking if two
  consecutive unique addresses, $u_k, u_{k+1} \in U$, that occur the
  same number of times also have consecutive first appearances in R.
  If they do, then the select lines $l_{u_k}, l_{u_{k+1}} \in L$ are
  grouped and mapped to the same shift register,
  $s_{i,j} \to l_{u_k}$, $s_{i,j+1} \to l_{u_{k+1}}$; otherwise
  $s_{i,M_i-1} \to l_{u_k}$, $s_{i+1,0} \to l_{u_{k+1}}$. Initial
  grouping may fail for certain address sequences such as
  1, 2, 3, 4, 3, 2, 1, 4; therefore, a verification step follows. The
  number of shift registers N is simply the number of groups, and the
  number of flip-flops in each group, $M_i$, is the number of
  elements in that group.

• Finally, for each shift register the pass count is found and
  recorded in P. The pass count is calculated by finding the length
  of R that is produced by each of the shift registers. The
  restriction on the PassCnt means that all elements in P should be
  equal. In this case pC is equal to any element of the sequence P.

for (g = 0; g < img_height/n; g++)      /* n: macroblock size, 2 in the example */
   for (h = 0; h < img_width/n; h++) {
      m_vect[g][h] = big;
      /* inclusive bounds, so that m = 0 gives a single search position */
      for (i = -m; i <= m; i++)
         for (j = -m; j <= m; j++) {
            diff = 0;
            for (k = 0; k < mb_height; k++)
               for (l = 0; l < mb_width; l++) {
                  diff += abs(new_img[g*mb_height+k][h*mb_width+l]
                            - old_img[g*mb_height+i+k][h*mb_width+j+l]);
               }
            m_vect[g][h] = min(diff, m_vect[g][h]);
         }
   }
Figure 7. Block matching motion estimation algorithm
We have implemented these mapping rules in our SRAd-
Gen tool. The tool accepts a sequence of one-dimensional
addresses and, if mapping is successful, produces synthe-
sisable VHDL code describing the corresponding SRAG.
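The mapping steps can be sketched in a few lines of Python. This is a
minimal illustration under stated assumptions and not the authors'
tool: the function names are hypothetical, the pass count of a shift
register is interpreted as the length of the first contiguous stretch
of R that it serves, and the verification step is modelled by
replaying the mapped SRAG and comparing its output with I.

```python
from itertools import groupby


def simulate(S, dC, pC, steps):
    """Replay the token through shift registers S (lists of addresses)."""
    reg = pos = div_cnt = pass_cnt = 0
    out = []
    for _ in range(steps):
        out.append(S[reg][pos])
        enable = div_cnt == dC - 1
        div_cnt = 0 if enable else div_cnt + 1
        if not enable:
            continue
        pass_ = pass_cnt == pC - 1
        pass_cnt = 0 if pass_ else pass_cnt + 1
        if pos < len(S[reg]) - 1:
            pos += 1
        else:
            if pass_:
                reg = (reg + 1) % len(S)
            pos = 0
    return out


def map_sequence(I):
    """Return (S, dC, pC) for address sequence I, or raise ValueError."""
    runs = [(addr, len(list(grp))) for addr, grp in groupby(I)]
    D = [length for _, length in runs]            # division counts
    if len(set(D)) != 1:
        raise ValueError("DivCnt restriction violated")
    dC = D[0]
    R = [addr for addr, _ in runs]                # reduced sequence
    U = list(dict.fromkeys(R))                    # unique, first-seen order
    O = [R.count(u) for u in U]                   # occurrences in R
    Z = [R.index(u) for u in U]                   # first positions in R
    S = [[U[0]]]                                  # initial grouping
    for k in range(1, len(U)):
        if O[k] == O[k - 1] and Z[k] == Z[k - 1] + 1:
            S[-1].append(U[k])
        else:
            S.append([U[k]])
    P = []                                        # pass counts (assumed reading)
    for group in S:
        start, n = R.index(group[0]), 0
        while start + n < len(R) and R[start + n] in group:
            n += 1
        P.append(n)
    if len(set(P)) != 1:
        raise ValueError("PassCnt restriction violated")
    pC = P[0]
    if simulate(S, dC, pC, len(I)) != list(I):    # verification step
        raise ValueError("verification failed")
    return S, dC, pC


row_as = [0, 0, 1, 1, 0, 0, 1, 1, 2, 2, 3, 3, 2, 2, 3, 3]
S, dC, pC = map_sequence(row_as)
```

For the RowAS of Table 1 this yields the S, dC and pC values listed
in Table 2, and for the sequence 1, 2, 3, 4, 3, 2, 1, 4 the initial
grouping is rejected by the verification step.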
6. Experimental Results
In this section, we present synthesis results in terms of area and
delay of the SRAG architecture and compare them to those for a
counter-based address generator using address decoders (CntAG). Both
types of address generator were synthesized with Synopsys’ Design
Compiler for a 0.18 µm CMOS process. The counter-based style was
chosen as the benchmark because, for regular access patterns, it
performs better than arithmetic-based address generators [7]. Note
that we cannot compare SRAG with SFM because SFM is only a FIFO
memory. The address sequences used are the write and read sequences
for the array new_img in the block matching motion estimation
algorithm shown in Figure 7. This code segment defines the read
sequence for new_img, which exhibits block-based memory access.
However, it does not define the production order (i.e. the write
sequence) for new_img. Therefore, we assume that the write sequence
is such that LinAS is incremental (i.e. 0, 1, 2, ..., N-1).
The delay results in Figure 8 indicate that, for both read and write
accesses (excluding the delay through the memory cell array), the
SRAG is on average approximately twice as fast as the CntAG. The
delay through the SRAGs increases slowly with array size. This is
because the delay is contributed mainly by the counters in the
control circuit, which depend on the repetitive pattern in the
address sequence and not on the size of the array. In contrast, the
delay in the CntAG increases much faster with array size. This is
because, as N gets larger, the size and delay of the address decoder
dominate. Figure 9 shows the delay through each component of the
CntAG; the total delay is the sum of the counter delay and the worse
of the row and column decoder delays. This shows that as the array
size increases the decoder delay begins to dominate.
The area results in Figure 10 illustrate the performance-area
trade-off. SRAG may be twice as fast as CntAG, but it is also
approximately three times larger in area. However, since the address
generator/decoder circuit is in general a small fraction of the total
area of the memory, this area penalty is usually unimportant when
compared to the speed increase.
Table 3 shows the average delay reduction and area increase factors
for several other examples. dct refers to an access sequence from a
separable discrete-cosine transform (DCT). zoombytwo is a sequence
from an image zooming algorithm. motion_est and fifo are the read and
write sequences for our motion estimation example, respectively.
[Figure 8. Address generator delay for different array sizes. Plot of
delay (ns) against array size (img_width x img_height, 16x16 to
256x256) for SRAG(Write), CntAG(Write), SRAG(Read) and CntAG(Read).]

[Figure 9. Delay figures for the components of the CntAG. Plot of
delay (ns) against array size (img_width x img_height, 16x16 to
256x256) for the counter, the row decoder and the column decoder.]

[Figure 10. Address generator area for different array sizes. Plot of
area (cell units, up to 3 x 10^4) against array size (img_width x
img_height, 16x16 to 256x256) for SRAG(Write), CntAG(Write),
SRAG(Read) and CntAG(Read).]

Example      Avg. delay reduction factor   Avg. area increase factor
dct          1.7                           3.2
zoombytwo    1.7                           3.1
motion_est   1.8                           3.0
fifo         1.9                           2.4
Table 3. Average delay reduction and area increase for different
examples

7. Conclusion and Future Work
The results presented in this paper have shown the performance gains
achievable from address decoder decoupling. However, it is clear that
the increase in performance comes at a cost in area. It is possible
to reduce the area of SRAG through enhancements such as reuse of
control circuitry between the row and the column address sequences or
exploiting the interaction between the row and the column address
generators. However, the contribution of this paper is not only the
SRAG architecture but also the encouraging results it gives for the
address decoder decoupling approach to array based memory access in
general. For applications where the access patterns are regular, such
as image and video processing, this approach can effectively
eliminate the need for address decoding. However, the physical-level
viability and reliability of the ADDM has to be demonstrated. For
example, it must be guaranteed that no two row select lines will be
asserted at the same time, as this could corrupt data in the memory.
This particular work operates on a single address sequence at a time.
Therefore, any reuse of address generation circuitry between
different address sequences is not considered. The reuse of address
circuitry between different address sequences in space and time can
greatly reduce the area resources required. As most modern
high-performance memory systems are based on distributed memory
architectures, the interconnect and routing costs should also be
considered [7].
Although we expect this decoder decoupling approach to reduce power
dissipation, in this work we have not carried out a rigorous study of
it. Furthermore, the SRAG architecture is somewhat limited in its
application and is very much targeted towards block-based image
processing applications and FIFO type access. If SRAG is not
applicable to a particular access pattern then, for example, the
CntAG architecture or an arithmetic-based architecture can be used.
We have not demonstrated the impact of the delay reduction achieved
through decoder decoupling on the overall memory access delay, due to
lack of data for the memory cell array.
Our final goal is to discover algorithms and heuristics
which can explore the vast design space opened up by ad-
dress decoder decoupling at a high level of abstraction and
choose the best architecture for low level circuit optimiza-
tion.
Acknowledgement
This work was funded by LSI Logic Corporation and the Overseas
Research Students Awards Scheme. The authors would like to
acknowledge the contributions of Mike Brookes, George Constantinides,
Andreas Dante, Nishanth Kulasekeram, Andrew Royal and Marcus von
Scotti.
References
[1] M. Aloqeely. A simple alternative for storage allocation in
high-level synthesis. In Proceedings of the IEEE Interna-
tional Symposium on Circuits and Systems, volume 6, pages
377 – 380, June 1998.
[2] L. Benini and G. de Micheli. System-level power optimization:
Techniques and tools. ACM Transactions on Design Automation of
Electronic Systems, 5(2):115–192, Apr. 2000.
[3] S. Gerez and E. Woutersen. Assignment of storage values
to sequential read-write memories. In Proceedings of the
European Design Automation Conference, pages 302 – 308,
Sept. 1996.
[4] D. Grant, P. B. Denyer, and I. Finlay. Synthesis of address
generators. In Proceedings of the IEEE International Con-
ference on Computer-Aided Design, pages 116 – 119, Nov.
1989.
[5] D. Grant, J. V. Meerbergen, and P. Lippens. Optimization of
address generator hardware. In Proceedings of the European
Design and Test Conference, pages 325 – 329, Feb. 1994.
[6] P. Lippens, J. V. Meerbergan, A. V. der Werf, and W. Ver-
haegh. PHIDEO: a silicon compiler for high speed algo-
rithms. In Proceedings of the European Conference on De-
sign Automation, pages 436 – 441, Feb 1991.
[7] M. Miranda, F. Catthoor, M. Janssen, and H. D. Man.
ADOPT: efficient hardware address generation in dis-
tributed memory architectures. In Proceedings of the In-
ternational Symposium on System Synthesis, pages 20 – 25,
Nov. 1996.
[8] M. Miranda, F. Catthoor, and M. J. H. D. Man. High-
level address optimization and synthesis techniques for data-
transfer-intensive applications. IEEE Transactions on Very
Large Scale Integration (VLSI) Systems, 6(4):677 – 686,
Dec. 1998.
[9] P. R. Panda, F. Catthoor, N. D. Dutt, K. Danckaert, E. Brock-
meyer, C. Kulkarni, A. Vandercappelle, and P. G. Kjelds-
berg. Data and memory optimization techniques for embed-
ded systems. ACM Transactions on Design Automation of
Electronic Systems, 6(2):149 – 206, Apr. 2001.
[10] H. Schmit and D. E. J. Thomas. Address generation for
memories containing multiple arrays. IEEE Transactions on
Computer-Aided Design of Integrated Circuits and Systems,
17(5):377–385, May 1998.
[11] J. van Sas, F. Catthoor, L. Inzé, and H. D. Man. Testability
strategy for registers and memories in a multi-processor
architecture. In Proceedings of the European Test Conference,
pages 294–303, Apr. 1989.
[12] S. Wuytack, Jean-Philippe, F. V. Catthoor, and H. J. D. Man.
Formalized methodology for data reuse exploration for low-
power hierarchical memory mappings. IEEE Transactions
on Very Large Scale Integration (VLSI) Systems, 6(4):529–
537, Dec. 1998.
