Survey of Low Power Techniques for ROMs

Edwin de Angel
Crystal Semiconductor Corporation
P.O. Box 17847
Austin, TX 78744

Earl E. Swartzlander, Jr.
Department of Electrical and Computer Engineering
University of Texas at Austin
Austin, TX 78712
Abstract
This paper presents a survey of low power techniques for Read
Only Memories (ROMs). Significant savings in power dissipation
are achieved through the use of techniques at the circuit and
architecture levels. The ROM circuits have been designed in 0.35 µm
CMOS technology and simulated using PowerMill.
Introduction
With the development of submicron technologies and the increase
in complexity of VLSI chips, the market for portable applications,
digital signal processors and ASIC implementations has focused
significant effort on the design of low power systems [1]. ROMs
(Read Only Memories) are an important part of many digital systems
(e.g., digital filters, digital signal processors, microprocessors,
etc.). The high area density of ROMs makes these circuits very
attractive for storing fixed information (e.g., the coefficients of a
digital filter). As new submicron technologies are developed, the fast
speeds of these processes allow the implementation of architectures
which could not be implemented in the past. Also, the increase in
the number of metal layers becomes a main instrument to reduce
switched capacitance without penalty in the density of the ROM.
Significant savings in power are achieved through the implementation
of several techniques. The focus of this paper is on techniques
at the architecture and transistor levels and their global impact on
power dissipation.
The first section of the paper explains traditional ROM designs
and the sources of power dissipation. The second part of this paper
discusses low power techniques at the architecture level. The next
section presents techniques that are applicable at the circuit level.
The last section shows results and conclusions.
1 Sources of Power Dissipation
Figure 1 shows the traditional architecture of a ROM. The decoder
selects among the row lines that run through the ROM core, turning
on only one row line at a given time. The column multiplexer and
driver select which column is being read and drive the data bus.
The control logic generates the internal signals of the ROM (i.e.,
precharge, read, etc.). The ROM core stores information
through the placement of transistors. There are two main types of
ROMs: the NAND array, where the pull down transistors are in series,
and the NOR array, where the pull down transistors are in parallel.
This paper focuses on ROMs using a NOR array since these structures
are faster than NAND arrays and are the most frequently used [2].
[Figure 1: ROM Block Diagram. Blocks: Decoder, ROM Core (row lines and bit lines), Column Mux & Driver, Control. Signals: Address, Clk, Dataout.]
[Figure 2: ROM Bitlines. Bit lines feeding a 12-1 mux.]
In order to save power, most ROMs precharge during one phase
of the clock and evaluate in the other. Table 1 shows the power
dissipation in a 2K x 18 ROM designed in 0.6 µm technology at
3.3V and clocked at 10 MHz. As the table shows, the precharge of
the bit lines in the ROM core dissipates most of the power. There
are two main reasons for this. First, bit lines have large capacitance
(drain capacitance of the transistors tied to the line, parallel plate
and fringe components to substrate, plus the overlap of the row lines
and other metal layers). Second, more than 18 bit lines are switched
per access; this is because the word line selects more bit lines than
is necessary (see figure 2). The example presents a multiplexer
ratio of 12 to 1. As a result at least 4 more bit lines will switch
instead of one.
Table 1: Power Dissipation ROM 2K x 18
Block       Power (mW)   Percentage (%)
Decoder     0.06         2.1
ROM Core    2.24         89
Control     0.18         7.2
Drivers     0.05         1.7
The control logic dissipates power because it contains
all the drivers that generate the signals feeding the decoder. The
control logic also generates the precharge signal, which is used to
precharge the ROM core, enable the output drivers, and enable the
decode logic. The power dissipated in the decoder is small
since only one row line switches per access.
2 Low Power Techniques: Architecture
Since most of the power dissipated is due to switching of the bit
lines, a significant number of the following techniques focus on the
ROM core.
2.1 Hierarchical Word Line
This concept has been proposed for static random access memories
(SRAMs) [3]. The basic idea is to divide the memory into different
blocks and run the block word line in one layer (i.e., metal1 or
poly) and a global word line in another layer. As a result only the
bit cells of the desired block are accessed. The same concept can
be applied to ROMs. The ROM can be divided into several blocks
and a given block is enabled through the address bits. Although a
significant amount of the power dissipated can be reduced through
this technique, it does not solve the problem completely. The main
reason is that, due to layout considerations, a ratio of at least 4 to 1
is required in the multiplexer. A significant reduction in power is
obtained, but more than one bit line per bit could still be switching.
2.2 Selective Precharge
A large capacitance is switched per cycle because every bit line
is precharged high during the first part of the cycle and many
bit lines are discharged even when these locations are not accessed.
Through selective precharge only the bit lines which will be accessed
are precharged [5]. The hardware overhead of this technique is low
since most of its control logic is the same control logic required to
control the multiplexers at the bottom of the ROM.
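
The effect of selective precharge on the number of bit lines charged per access can be sketched as follows. This is a minimal counting model, not a circuit simulation; the function name and its parameters are hypothetical, and it assumes that a conventional ROM precharges every bit line in the core while selective precharge charges only the one bit line per output bit that the column multiplexer will read.

```python
def bit_lines_precharged(word_bits: int, mux_ratio: int, selective: bool) -> int:
    """Count the bit lines precharged per access (hypothetical counting model).

    word_bits: number of output bits per word
    mux_ratio: bit lines per output bit (column multiplexer ratio)
    selective: True if only the bit lines to be read are precharged
    """
    if selective:
        # One bit line per output bit is precharged.
        return word_bits
    # Every bit line in the core is precharged.
    return word_bits * mux_ratio


# Example: a 24-bit word with a 4-to-1 column multiplexer.
conventional = bit_lines_precharged(24, 4, selective=False)
selected_only = bit_lines_precharged(24, 4, selective=True)
print(conventional, selected_only)
```

Under these assumptions the count of precharged bit lines drops by the multiplexer ratio; the actual power saving is smaller because the decoder, control logic and drivers are unaffected.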
2.3 Minimization of Non-zero Terms
This technique focuses on reducing the capacitance of the
bit lines and the row lines. This is achieved by minimizing
the number of non-zero terms in the ROM table, which reduces the
number of NMOS devices in the ROM core. The technique is
very efficient since zero terms do not switch bit lines and they
reduce capacitance in both bit lines and row lines.
2.3.1 Inverted ROM
If the number of ones is very high, the whole ROM core can be
inverted and the final data inverted in the drivers. The efficiency of
this type of encoding depends on the original number of non-zero
terms. If the number of non-zero terms is close to half the number
of bits in the ROM core, then the reduction of non-zero terms will
be small or none.
2.3.2 Inverted Row
The reduction of non-zero terms can also be performed on a row
by row basis. A given row is inverted if more than half of its bits
are non-zero terms. Figure 3 shows two original rows and the result
after the technique has been applied. It is important to observe that
an extra bit per row is required to perform the encoding. Also note
that if the whole ROM had been inverted, the reduction
of non-zero terms in one row would have been offset by the increase
in the other one.
[Figure 3: Inverted Row. Two original rows and the same rows after one row encoding, each with its extra encoded bit.]
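
The row by row encoding described above can be sketched in a few lines. This is an illustrative model only; the function names and the example rows are hypothetical and are not taken from Figure 3. A row is stored inverted when more than half of its bits are ones, and one extra flag bit per row records the decision.

```python
def encode_row(bits):
    """Return (flag, stored_bits); the row is inverted if ones are the majority."""
    ones = sum(bits)
    if ones > len(bits) / 2:
        return 1, [1 - b for b in bits]
    return 0, list(bits)


def decode_row(flag, stored_bits):
    """Recover the original row: XOR every stored bit with the flag."""
    return [b ^ flag for b in stored_bits]


rows = [
    [0, 1, 1, 1, 1, 0, 1, 1],  # six ones: stored inverted
    [0, 1, 1, 0, 0, 0, 0, 0],  # two ones: stored unchanged
]
for row in rows:
    flag, stored = encode_row(row)
    assert decode_row(flag, stored) == row
    print(row, "->", flag, stored, "ones:", sum(row), "->", sum(stored) + flag)
```

The flag bit itself counts as a non-zero term when it is set, so a row with only a slight majority of ones gains little or nothing from being inverted.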
2.3.3 Sign Magnitude Representation
Often a ROM is used to store the coefficients of a digital filter. As a
result, a significant amount of the non-zero terms are due to the sign
extension of the negative coefficients. Sign magnitude representation
can be used to remove a significant number of these ones. The
main drawback of this type of encoding is that a conversion to two's
complement is required at the end of a cycle, which slows down
the ROM. Still, for applications like mixed-signal systems where
speed is not an issue, this type of encoding can be very useful.
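
The saving for small negative coefficients can be illustrated by counting ones in both representations. The sketch below is only an illustration; the helper names, the 8-bit word width and the value -3 are assumptions chosen for the example.

```python
def twos_complement_bits(value: int, width: int):
    """Bits (MSB first) of value in two's complement with the given width."""
    masked = value & ((1 << width) - 1)
    return [(masked >> i) & 1 for i in reversed(range(width))]


def sign_magnitude_bits(value: int, width: int):
    """Bits (MSB first): one sign bit followed by width - 1 magnitude bits."""
    magnitude = abs(value)
    if magnitude >= 1 << (width - 1):
        raise ValueError("magnitude does not fit in width - 1 bits")
    sign = 1 if value < 0 else 0
    return [sign] + [(magnitude >> i) & 1 for i in reversed(range(width - 1))]


value, width = -3, 8
tc = twos_complement_bits(value, width)   # 1 1 1 1 1 1 0 1
sm = sign_magnitude_bits(value, width)    # 1 0 0 0 0 0 1 1
print("two's complement ones:", sum(tc))
print("sign magnitude ones:  ", sum(sm))
```

For this value the sign extension accounts for most of the ones in two's complement, which is the effect the encoding removes.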
2.3.4 Sign Magnitude and Inverted Block
The number of non-zero terms can be reduced further if the
sign magnitude representation is implemented along with the
inverted row encoding. After the sign magnitude conversion is done,
the inverted row encoding can be applied to a subset of the row
(e.g., the 5 least significant bits).
2.4 Difference Encoding
Difference encoding can be used to reduce the overall size of the
ROM core. For digital filters and other applications the ROM is
accessed sequentially. If the data do not change significantly
between one address and the next, the ROM
core can store the difference between the data instead of the whole
value [4]. The main disadvantage is that an adder is required to
calculate the original value.
A variation of the same concept is to hard wire different
constants (i.e., offsets) and store only the difference with respect
to the constant.
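
A minimal sketch of difference encoding for a sequentially accessed ROM follows. The function names and sample values are hypothetical, and the sketch assumes the first entry is stored in full (as a difference from zero); the running sum in the decoder plays the role of the adder mentioned above.

```python
from itertools import accumulate


def diff_encode(values):
    """Store each value as the difference from the previous address."""
    previous = 0
    diffs = []
    for v in values:
        diffs.append(v - previous)
        previous = v
    return diffs


def diff_decode(diffs):
    """Rebuild the original values with a running sum (the adder)."""
    return list(accumulate(diffs))


values = [1000, 1004, 1010, 1009]
stored = diff_encode(values)
print(stored)
assert diff_decode(stored) == values
```

Only the first stored entry needs the full word width here, which is one reason the hard wired offset variation can be attractive.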
2.5 Smaller ROMs
Figure 4 shows the coefficients of a 102 tap FIR filter. If these
coefficients are stored in ROM, the largest coefficients determine
the size of the ROM required. More than 70% of the coefficients
are below 18 bits, yet the largest coefficient goes up to 24 bits.
As a result the ROM core has wasted space and additional capacitance.
A better implementation can be achieved if the large coefficients are
stored in a wide ROM with fewer addresses and the small coefficients
are stored in a narrow ROM with many addresses. A similar principle
can be applied to locations in ROM which are often accessed:
locations that are accessed frequently are stored in a small, fast ROM,
while the other locations are stored in a larger ROM [6].
[Figure 4: 102 Tap FIR Filter]
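
The storage argument for splitting the ROM can be checked with a short calculation. The sketch below is illustrative only: the coefficient list is made up (it is not the filter of Figure 4), the 18-bit threshold is taken from the text, and the helper names are hypothetical.

```python
def signed_width(value: int) -> int:
    """Minimum two's complement width (in bits) needed to hold value."""
    return (value if value >= 0 else ~value).bit_length() + 1


def storage_bits(coefficients, narrow_width):
    """Return (single_rom_bits, split_rom_bits) for a wide + narrow ROM pair."""
    widths = [signed_width(c) for c in coefficients]
    full_width = max(widths)
    single = len(coefficients) * full_width
    wide_entries = sum(1 for w in widths if w > narrow_width)
    narrow_entries = len(coefficients) - wide_entries
    split = wide_entries * full_width + narrow_entries * narrow_width
    return single, split


# Hypothetical coefficients: a few large values and many small ones.
coefficients = [5000000, -1200000, 300, -45, 12, 7, -3, 100000, 2, -1]
single, split = storage_bits(coefficients, narrow_width=18)
print("single ROM bits:", single, "split ROM bits:", split)
```

The split organization needs extra decode logic to steer each address to the right ROM, which this count does not include.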
3 Low Power Techniques: Circuit Level
Low power techniques at the circuit level can be powerful tools to
reduce the power in VLSI systems [7].
3.1 NMOS Precharge
An important technique to reduce the power dissipated in the bit
lines is limiting the voltage swing. This can be done through
NMOS precharge of the ROM core: NMOS transistors are used to
precharge the bit lines high. As a result, bit lines are precharged to
Vdd - Vt, where Vt is the threshold voltage. Since the bit lines
switch only between Vdd - Vt and ground, significant savings can
be achieved. Drawbacks of this technique are the degradation of noise
margins and the body bias effect (which increases the threshold
voltage), both of which require careful design of the output drivers.
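
A first-order estimate of the saving can be written down directly. The expressions below are a sketch under stated assumptions: the symbol C_BL (bit line capacitance) is introduced here for illustration, the bit line is assumed to be fully discharged before each precharge, and the energy is counted as the charge drawn from the supply multiplied by Vdd.

```latex
\[
E_{\mathrm{full}} = C_{BL} V_{dd}^{2}, \qquad
E_{\mathrm{NMOS}} = C_{BL} V_{dd} \,(V_{dd} - V_t), \qquad
\frac{E_{\mathrm{NMOS}}}{E_{\mathrm{full}}} = 1 - \frac{V_t}{V_{dd}}
\]
```

For example, with Vdd = 3.3V and an assumed effective Vt of 0.7V the ratio is about 0.79; the body bias effect raises Vt and so increases the saving, at the cost of a lower precharge level.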
3.2 Voltage Keeper
Once the number of non-zero terms has been minimized, switching
of bit lines is reduced. Still, even if the same location of the ROM
is accessed repeatedly, bit lines need to be precharged every time.
In order to avoid switching in the data bus or in the adder required to
convert from sign magnitude to two's complement, a voltage keeper
is used.
Figure 5 shows a possible implementation of the keeper with
the invert logic. The voltage keeper stores past history
and avoids transitions in the data bus and adder (if sign magnitude
is implemented). The Fire signal is enabled after the ROM core has
evaluated. The Pass and Invert signals are used if sign magnitude or
Row Invert is implemented.
[Figure 5: Output Stage. Signals: Vdd, Fire, Input, Output, Precharge, Invert, Invert_b, Pass, Pass_b.]
3.3 Buffer Sizing
A large set of buffers is required in the control logic to drive the
address lines through the decoder, generate the control signals for
the column multiplexers, drive the row lines and drive the precharge
signals. For a long time, the optimum buffer tapering factor e = 2.72
has been used [8]. Figure 6 presents the model used. In
the figure g represents the conductance while λ represents the taper,
defined as:
\[ (W/L)_{k+1} = \lambda \, (W/L)_k \qquad (1) \]
where W and L are the width and length of the transistors in a given
stage. In this case λ indicates the size of stage k + 1 relative to
stage k. The number of stages required for a given capacitive load
is:
\[ n = \frac{\ln (C_L / C_i)}{\ln \lambda} \qquad (2) \]
[Figure 6: Driving Large Capacitive Loads]
This model ignores the effect of parasitic capacitances at the
output of each stage. Haviland [9] includes the parasitic capacitance
in the calculations using a split capacitor model (see figure 7).
C_x and C_y are the inherent output capacitance and the incidental
load capacitance respectively. Using this model and developing an
equation to minimize delay, the optimum taper factor satisfies:
\[ \lambda \, [\ln(\lambda) - 1] = \frac{C_x}{C_y} \qquad (3) \]
[Figure 7: Improved Model]
[Figure 8: Power-Delay Product versus Delay. Horizontal axis: Cx/Cy; vertical axis: Taper Factor; curves: Delay, Power-Delay.]
This equation shows that the optimum taper λ depends on the
ratio C_x/C_y. Still, this equation has been developed to minimize
delay. For power dissipation, there are often large capacitive loads
which are not in the critical path. Choi [10] derived the tapering
factor that minimizes the power-delay product using the same model.
The optimum λ can be expressed as:
\[ (\lambda - 2) \ln(\lambda) - (\lambda - 1) = 0 \qquad (4) \]
If the parasitic capacitances are neglected, λ = 4.25. Haviland [9]
shows that both tapering factors can be related by:
\[ \lambda_{\mathrm{Power\text{-}Delay}} = (\lambda_{\mathrm{Delay}})^{\alpha} \qquad (5) \]
where α ≈ 1.44. Figure 8 shows a graph comparing the λ for
different ratios of C_x/C_y. A different derivation to minimize power
under a delay constraint has been done by Figueras [13].
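
The taper equations of this section can be evaluated numerically. The sketch below is an illustration under stated assumptions: it solves the delay-optimal condition (3) and the power-delay condition (4) by bisection, reading the garbled operators in those equations as minus signs, and it evaluates the stage count of equation (2). All function names are hypothetical.

```python
import math


def bisect_root(f, lo, hi, tol=1e-9):
    """Find a root of f in [lo, hi] by bisection; the root must be bracketed."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("root is not bracketed")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


def delay_optimal_taper(cx_over_cy: float) -> float:
    """Solve taper * (ln(taper) - 1) = Cx/Cy, as in equation (3)."""
    return bisect_root(lambda t: t * (math.log(t) - 1.0) - cx_over_cy, 1.0, 100.0)


def power_delay_optimal_taper() -> float:
    """Solve (taper - 2) ln(taper) - (taper - 1) = 0, as in equation (4)."""
    return bisect_root(lambda t: (t - 2.0) * math.log(t) - (t - 1.0), 2.0, 10.0)


def num_stages(c_load: float, c_in: float, taper: float) -> float:
    """Equation (2): n = ln(C_L / C_i) / ln(taper); round to an integer in practice."""
    return math.log(c_load / c_in) / math.log(taper)


taper_delay = delay_optimal_taper(0.0)       # no parasitics: close to e
taper_pd = power_delay_optimal_taper()       # close to the quoted 4.25
alpha = math.log(taper_pd) / math.log(taper_delay)
print(taper_delay, taper_pd, alpha)
print(num_stages(1000.0, 1.0, taper_delay), num_stages(1000.0, 1.0, taper_pd))
```

With parasitics neglected, the exponent computed from the two roots is close to the value of 1.44 quoted for equation (5), and the larger taper needs fewer stages for the same load.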
3.4 Reduction of Short Circuit Currents
Careful design of the control logic is required in order to avoid
turning on row lines when the precharge circuitry is on. Also, the
output drivers need to be enabled after the ROM core has evaluated.
Delay lines can be used to generate signals with precise timing [12].
A robust design of the delay lines is needed to avoid performance
degradation through process variations.
A significant reduction of the short circuit dissipation can also be
achieved through scaling of the power supply. Accurate expressions
to estimate short circuit currents have been derived by Caufape [13].
Table 2: ROM Encoding
Encoding             Power (mW)
Two's Complement     0.80
Sign Magnitude       0.78
Row Invert           0.69
Table 3: Selective Precharge
Selective Precharge  Power (mW)
Before               0.69
After                0.58
3.5 Voltage Scaling
Voltage scaling is one of the most powerful tools to reduce
power dissipation. A quadratic improvement can be achieved
through voltage scaling. Although this technique is very effective in
reducing power, the speed of the circuits is degraded as the voltage
goes down. A first order derivation [1] shows that the delay of
CMOS gates can be expressed as:
\[ T_{delay} = \frac{C_L V_{dd}}{I} = \frac{2 \, C_L V_{dd}}{\mu C_{ox} (W/L) (V_{dd} - V_t)^2} \qquad (6) \]
The speed of ROMs is degraded significantly because the
transistor driving the bit lines is close to minimum size.
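
The trade-off expressed by equation (6) can be sketched numerically. This is a first-order illustration only: the threshold voltage of 0.7V is an assumed value that is not given in the paper, the function names are hypothetical, and all device constants are taken to cancel in the ratios.

```python
def relative_delay(vdd: float, vt: float) -> float:
    """Delay up to a constant factor, from equation (6): Vdd / (Vdd - Vt)^2."""
    return vdd / (vdd - vt) ** 2


def ideal_power_ratio(v_new: float, v_old: float) -> float:
    """Ideal quadratic scaling of switching power with supply voltage."""
    return (v_new / v_old) ** 2


VT_ASSUMED = 0.7  # volts; assumption for illustration only

slowdown = relative_delay(2.5, VT_ASSUMED) / relative_delay(3.3, VT_ASSUMED)
power_ratio = ideal_power_ratio(2.5, 3.3)
print("delay increase factor:", round(slowdown, 2))
print("ideal power ratio:    ", round(power_ratio, 3))
```

Under these assumptions, scaling from 3.3V to 2.5V would ideally reduce switching power to roughly 57% while increasing delay by more than half; the measured ratio in Table 4 (0.39 mW against 0.58 mW) is about 0.67.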
Results
Tables 2, 3 and 4 show the cumulative effects of applying multiple
low power methods. First, a conventional 256 x 24 ROM using two's
complement was designed. Next, sign magnitude was applied to the
data plugged into the ROM. The next design implements the row
invert encoding in addition to sign magnitude. Table 2 compares
the results of the several encodings in a 256 x 24 ROM. The data
stored in the ROM was generated through a pseudorandom function
in the C language. The ROMs were designed with a mux ratio of 4
to 1 and simulated with PowerMill [14] at 3.3V and 10 MHz in 0.35 µm
technology.
From the table it can be observed that, since the data in the ROM
is random, power savings using row invert encoding are greater than
those using sign magnitude encoding. For digital filters (see figure 4)
and other applications where small negative numbers are required, sign
magnitude gives better results.
Table 3 shows a comparison of the ROM with row invert
encoding before and after selective precharge has been implemented.
Through selective precharge only 1 out of 4 columns is precharged,
resulting in significant savings in power.
Table 4 shows the power dissipation of the ROM when the
voltage is scaled to 2.5V. Although significant savings are reached,
quadratic savings are not achieved due to an increase in short circuit
currents.
Table 4: Voltage Scaling
Voltage   Power (mW)
3.3V      0.58
2.5V      0.39
Table 5: Power Savings
Technique            Conditions                            Power Savings (%)
Sign Magnitude       Random Data                           2.5
Row Invert           After Sign Magnitude                  11
Selective Precharge  After Sign Magnitude and Row Invert   14
Voltage Scaling      After Other Techniques                24
Total                After all techniques                  51
Table 5 shows the power savings of the different techniques.
The power savings shown for selective precharge and voltage scaling
are after the other techniques have been applied.
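
The percentages in Table 5 can be reproduced from the power figures in Tables 2, 3 and 4. The short calculation below is a sketch; it assumes that each saving is expressed relative to the 0.80 mW two's complement baseline, which is the reading that matches the table after rounding. The variable and function names are hypothetical.

```python
# Power in mW after each cumulative step, taken from Tables 2, 3 and 4.
power_mw = [
    ("Two's Complement", 0.80),
    ("Sign Magnitude", 0.78),
    ("Row Invert", 0.69),
    ("Selective Precharge", 0.58),
    ("Voltage Scaling", 0.39),
]

baseline = power_mw[0][1]


def savings_percent(before: float, after: float, reference: float) -> float:
    """Saving of one step as a percentage of the reference power."""
    return 100.0 * (before - after) / reference


step_savings = {
    name: savings_percent(power_mw[i - 1][1], power, baseline)
    for i, (name, power) in enumerate(power_mw)
    if i > 0
}
total = savings_percent(baseline, power_mw[-1][1], baseline)

for name, value in step_savings.items():
    print(f"{name:20s} {value:5.2f} %")
print(f"{'Total':20s} {total:5.2f} %")
```

Because every step is referred to the same baseline, the individual savings add up to the total.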
Conclusion
Low power techniques for ROMs at the architectural and circuit levels
have been presented. The use of several of these techniques
significantly reduces the power dissipated in the ROM. The efficiency
of the different techniques depends on the data to be stored
in the ROM core, the speed requirements and the area overhead. High
power savings can only be achieved through the use of multiple
techniques.
REFERENCES
[1] A. P. Chandrakasan, S. Sheng and R. W. Brodersen, “Low-Power CMOS Digital Design,” IEEE Journal of Solid-State Circuits, vol. 27, pp. 473-483, 1992.
[2] D. A. Hodges and H. G. Jackson, Analysis and Design of Digital Integrated Circuits, Second edition, McGraw-Hill Publishing Company, pp. 346-353, 1988.
[3] M. Yoshimoto, K. Anami, H. Shinohara, T. Yoshihara, H. Takagi, S. Nagao, S. Kayano, and T. Nakano, “A Divided Word-Line Structure in the Static RAM and its Application to a 64K Full CMOS RAM,” IEEE Journal of Solid-State Circuits, vol. SC-18, pp. 479-485, 1983.
[4] N. Sankarayya and K. Roy, “Algorithms for Low Power FIR Filter Realization Using Differential Coefficients,” IEEE 10th International Conference on VLSI Design, Hyderabad, India, pp. 174-178, 1997.
[5] N. Weste and K. Eshraghian, Principles of CMOS VLSI Design: A Systems Perspective, Second edition, Addison-Wesley, pp. 585-588, 1993.
[6] C. Piguet, “Low-Power Microprocessors and Memories,” NATO Seminar on Low Power Design in Deep Submicro Electronics, Lucca, Tuscany, Italy, August 20-30, 1996.
[7] E. de Angel and E. E. Swartzlander Jr., “Survey of Techniques for Low Power VLSI Design,” International Conference on Innovative Systems in Silicon, pp. 159-169, 1996.
[8] R. C. Jaeger, “Comments on ‘An optimized output stage for MOS integrated circuits,’” IEEE Journal of Solid-State Circuits, vol. 10, pp. 185-186, 1975.
[9] G. L. Haviland and A. A. Tuszynski, “CMOS Tapered Buffer,” IEEE Journal of Solid-State Circuits, vol. 25, pp. 1005-1008, 1990.
[10] J. Choi and K. Lee, “Design of CMOS Tapered Buffer for Minimum Power-Delay Product,” IEEE Journal of Solid-State Circuits, vol. 29, pp. 1142-1145, 1994.
[11] H. J. Veendrick, “Short-Circuit Dissipation of Static CMOS Circuitry and Its Impact on the Design of Buffer Circuits,” IEEE Journal of Solid-State Circuits, vol. SC-19, pp. 468-473, 1984.
[12] M. Santoro, Design and Clocking of VLSI Multipliers, Ph.D. Dissertation, Stanford University, 1990.
[13] J. Figueras, “Power Modeling,” NATO Seminar on Low Power Design in Deep Submicro Electronics, Lucca, Tuscany, Italy, August 20-30, 1996.
[14] C. X. Huang, B. Zhang, A-C. Deng, and B. Swirski, “The Design and Implementation of PowerMill,” Proceedings 1995 International Symposium on Low Power Design, pp. 105-109, 1994.