A low supply noise content-sensitive ROM architecture for SOC by Meng-Fan Chang et al.
The 2004 IEEE Asia-Pacific Conference on
Circuits and Systems, December 6-9, 2004
A LOW SUPPLY NOISE CONTENT-SENSITIVE ROM 
ARCHITECTURE FOR SOC 
Meng-Fan Chang’, Lih-Yih Chiou’, Kuei-Ann Wen’ 
Institute of Electronics, National Chiao Tung University, Hsin Chu, Taiwan 
’Department of Electrical Engineering, National Cheng Kung University, Tainan, Taiwan 
mfchang.ee89g@nctu.edu.tw 
ABSTRACT 
Supply noise caused by fluctuation of peak current is
increasingly important for SoCs. This work proposes a
content-sensitive architecture that effectively reduces the
fluctuation of peak current and lowers the power
consumption of ROMs across various code-patterns and
cycles. We achieve both by arranging the data patterns of
the ROM and by adjusting the bitline structures accordingly.
Our experiments on 0.25µm 256Kb ROM macros
show that the fluctuations of peak current and
power consumption are less than 1.02% of the values
obtained using conventional approaches.
1. INTRODUCTION 
Embedded read only memory (ROM) macros are 
commonly integrated into SoCs for code-storing or 
waveform tables. The demands for higher speed, lower 
power and larger capacity embedded ROM have defined 
the trends of SoCs. As digital, mixed-signal, 
radio-frequency (RF), and memory blocks are integrated 
on the same substrate of a single chip, the supply noise 
and supply-induced substrate noise [1] in SoCs may not
only degrade the overall system performance, but also
cause sensitive blocks to suffer from functional failures.
Previous studies have addressed the effects of such noise
on mixed-signal [2-3], RF [4], and memory [5] circuits.
In SoCs, the operating patterns of various cycles
draw different peak currents and consume different
amounts of power. Because of parasitic resistance (IR drop) and
inductance effects (L·di/dt), these different current
consumptions cause the supply noise to fluctuate. This
fluctuating noise environment increases the
pattern-dependent soft-failures of sensitive blocks in SoCs.
Therefore, high-quality silicon IP (SIP) must be designed 
not only to generate small overall supply noise in SoCs 
but also to have small variations of supply noise from 
cycle to cycle. 
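The two parasitic contributions named above can be combined as V = I·R + L·(di/dt). The following is a minimal Python sketch of that relation, assuming a lumped supply path; the function name, the resistance, inductance, currents and the 1 ns current edge are hypothetical values chosen only for illustration, not figures from this work.

```python
def supply_noise(current_a, di_dt_a_per_s, resistance_ohm, inductance_h):
    """Supply disturbance in volts: resistive IR drop plus inductive L*di/dt."""
    return current_a * resistance_ohm + inductance_h * di_dt_a_per_s

# Hypothetical lumped supply-path parasitics (illustrative only).
R_OHM = 0.5
L_H = 1.5e-9
EDGE_S = 1e-9  # assumed time over which the current ramps up

# Two cycles that draw different peak currents (hypothetical values).
light_cycle = supply_noise(0.010, 0.010 / EDGE_S, R_OHM, L_H)
heavy_cycle = supply_noise(0.100, 0.100 / EDGE_S, R_OHM, L_H)

# The cycle-to-cycle fluctuation is the difference between the two cycles.
fluctuation = heavy_cycle - light_cycle
print(f"light: {light_cycle:.3f} V, heavy: {heavy_cycle:.3f} V, "
      f"fluctuation: {fluctuation:.3f} V")
```

In this simple model, a block whose peak current barely changes from cycle to cycle produces almost no fluctuation, even if its absolute noise is non-zero.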
Unfortunately, various accessed ROM code-patterns 
cause large fluctuations of peak current and power 
consumptions across cycles. Previous reports focus on 
the reduction of overall power consumption of ROM 
[6-8]. To the best of our knowledge, this is the first work to
investigate the code-pattern-dependent fluctuations of
peak current and current-induced supply noise for 
embedded ROM. 
A content-sensitive architecture (CSA) is developed 
for embedded ROM macros and ROM generators. It 
achieves small cycle-to-cycle supply noise variations, 
low standby current and low power consumption. The 
remainder of this presentation is organized as follows. 
Section 2 describes the proposed CSA. Section 3 
presents the experimental results. Section 4 draws 
conclusions. 
2. CONTENT-SENSITIVE ARCHITECTURE
The code-patterns of the bit cells on a ROM bitline affect
the amount of parasitic capacitance. The CSA explores
the structure of a bitline and develops an effective procedure
to identify the amount of parasitic capacitances and to 
minimize code-dependent fluctuations in parasitic effects. 
Thus, the peak current and supply noise can be 
significantly reduced. 
Increasing the number of connected bit cells on a
bitline raises its parasitic capacitance. In a typical NOR-type
contact/via/metal programming ROM, the datum stored
in a bit cell is determined by whether the electrical
connection between the transistor of the bit cell and its
bitline (BL) is on or off. In this work, a bit cell with a
contact/via/metal layer that connects its NMOS transistor
to its bitline stores the datum “0”; a cell without such a
layer stores the datum “1”, as depicted in Fig. 1. For a bit
cell that stores the datum “0”, the parasitic capacitance
on the drain side of the transistor is connected to its
bitline. Hence, different data patterns result in different
capacitive loading on a bitline. The bitline of a column
whose bit cells all store the datum “0” experiences the
largest loading effect. This large parasitic capacitance on
a bitline makes the power consumed in precharging
the bitline high and the development of the bitline
voltage swing slow. In conventional ROM, a bit
cell with datum “0” represents the code “0”.
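The dependence of bitline loading on the stored pattern can be illustrated by counting the cells that store datum “0” in each column, since only those cells connect their drain capacitance to the bitline. This is a minimal Python sketch; the function names and the small 4x3 array are hypothetical, and the loading is expressed in units of bit cells rather than farads.

```python
def bitline_loading(column_data):
    """Count the cells on one bitline whose drain is connected (datum 0)."""
    return sum(1 for datum in column_data if datum == 0)

def loading_per_column(rom_data):
    """rom_data is a list of rows; return the loading of every column."""
    return [bitline_loading(column) for column in zip(*rom_data)]

# Hypothetical 4-row by 3-column array of stored data.
rom_data = [
    [0, 1, 0],
    [0, 1, 1],
    [0, 1, 0],
    [0, 1, 1],
]
loads = loading_per_column(rom_data)
print(loads)  # column 0 is all "0" (largest loading), column 1 is all "1"
```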
0-7803-8660-4/04/$20.00 ©2004 IEEE
[Figure 1 labels: datum “0” (CO/via); WL (PO); VSS (M1/OD); BL (M2)]
Fig. 1. (a) Circuit and (b) layout of ROM bit cells. The layout
includes diffusion (OD), contact (CO), poly (PO), via and
metal-2 (M2) layers.
Fig. 2. Floor plan of a CSA ROM with up to 4 segments
(dynamically merged, k=4) in each column.
The CSA, as shown in Fig. 2, consists of divided 
programmed bitlines, local bitline switches (LBS), flag 
tables, dual-path output drivers and a ROM code 
programming procedure. 
Each column (bitline) can be divided into numerous 
segments (sub-bitlines) by LBSs and programmed based 
on the input ROM code. An LBS has two NMOS
transistors and one shared contact that connects its two
sub-bitlines (sub-BL) to the bitline. The layout of each
transistor in an LBS is identical to that of the transistor in a ROM
bit cell. Accordingly, better layout uniformity in the cell
array is achieved, reducing device mismatch after
manufacturing. For each bitline, a k-bit table stores the
flags that indicate the data-inversion status of each 
sub-bitline. If the flag signifies true for a sub-bitline, 
then the output driver takes the inversion path from the 
output of the sense amplifier. The codes read from the 
CSA ROM are still correct. 
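The flag-controlled inversion path can be modelled as an XOR between the sensed datum and the flag of the accessed sub-bitline. The sketch below is a behavioural illustration only, not the circuit used in this work; the function names are hypothetical.

```python
def program_sub_bitline(codes, invert):
    """Return (stored data, flag) for one sub-bitline.

    When invert is true, every code bit is stored inverted and the flag is 1.
    """
    data = [bit ^ 1 for bit in codes] if invert else list(codes)
    return data, 1 if invert else 0

def read_code(stored_datum, flag):
    """Behavioural model of the dual-path output driver.

    flag == 1 selects the inverting path, flag == 0 the direct path.
    """
    return stored_datum ^ flag

# A sub-bitline whose codes are mostly "0" is stored inverted.
codes = [0, 0, 0, 1]
data, flag = program_sub_bitline(codes, invert=True)
recovered = [read_code(datum, flag) for datum in data]
print(data, flag, recovered)
```

Whatever the inversion decision, the codes read back equal the codes programmed, which is why the inversion is transparent to the user of the ROM.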
We have developed a three-step code programming
procedure to identify the amount of parasitic
capacitance and to minimize variations in parasitic
effects for bitlines. The procedure is illustrated in Fig. 3.
At step-1, an input ROM code is initially programmed
based on the CSA with i rows, j columns and k segments
per bitline. The user-defined maximum number (k) of
divided segments on each column is equal to 2, 4, 8, or
16. At step-2, if the number of bit cells with code “0” on
a sub-bitline exceeds m, then CSA programs data “1” for
code “0” on this sub-bitline. Otherwise, it programs data
“0” for code “0”. The threshold m is defined as
$$ m = \frac{rs}{2} \times \mathrm{ROUNDUP}\left[\frac{i}{k \times rs},\ 0\right] $$
where rs, which is 4 in this work, specifies the minimum
number of increasing steps on the rows due to the layout
of leaf-cells in a ROM generator. At step-3, if the
number of bit cells with data “0” on consecutive
sub-bitlines is not larger than m, these sub-bitlines are
merged. The LBSs assigned at step-1 for these merged
sub-bitlines are removed at this step. The removal of
LBSs reduces the loading on bitlines and segment
selection lines to save power.
[Figure 3 flowchart: inputs are the ROM code and the maximum number of segments (k) per column. Step-2, false branch: data “0” = code “0” for sub-BL(p, q), Flag(p, q)=0; true branch: data “1” = code “0”. Step-3: checks the number of data “0” on consecutive sub-BLs and chooses between no merge and merging the sub-BLs with LBS removal.]
Fig. 3. ROM code programming procedure for CSA, where
k=4, m=2.
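The threshold m used in step-2 and step-3 can be evaluated directly. The sketch below reads the definition as m = (rs/2) × ROUNDUP[i/(k × rs), 0], which is our reading of the formula, and treats ROUNDUP(x, 0) as a ceiling for positive x. The function name and the row counts in the usage lines are hypothetical; only rs = 4 and the allowed values of k come from the text.

```python
def threshold_m(i, k, rs=4):
    """Inversion/merge threshold m for i rows and k segments per bitline.

    Assumes m = (rs / 2) * ROUNDUP(i / (k * rs), 0) with an even rs,
    where ROUNDUP to zero decimals is a ceiling for positive arguments.
    """
    steps = -(-i // (k * rs))  # integer ceiling of i / (k * rs)
    return (rs * steps) // 2

# Hypothetical row counts, for illustration only.
print(threshold_m(16, 4))     # 16 rows, 4 segments of 4 rows each
print(threshold_m(1024, 16))  # 1024 rows, 16 segments of 64 rows each
```

With these inputs m equals half the segment length, so a sub-bitline is inverted exactly when more than half of its cells would otherwise store data “0”.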
As an example of the code programming procedure
depicted in Fig. 4, a bitline is initially divided into four
(k=4) sub-BLs by two LBSs. At step-2, the data in
sub-BL[1] and sub-BL[3] are inverted. At step-3, since the
total number of cells with data “0” on sub-BL[3:1] does
not exceed m (m=2), these three sub-BLs are merged and
one LBS is employed for this column instead of two.
The largest loading on a CSA bitline is m bit cells, which
is much smaller than the i bit cells of the conventional ROM
architecture.
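Steps 2 and 3 can be sketched for a single column as follows. This is an illustrative Python model under stated assumptions, not the ROM generator used in this work: the function name is hypothetical, the merge is done greedily from sub-BL[0] upward (the text does not specify the merge order), and the 16-row column is a made-up input chosen so that the outcome mirrors the example above (sub-BL[1] and sub-BL[3] inverted, sub-BL[3:1] merged).

```python
def program_column(codes, k, m):
    """Apply step-2 (inversion) and step-3 (merging) to one column.

    codes: list of 0/1 code bits; its length must be divisible by k.
    Returns (data, flags, groups), where data[s] is the stored data of
    sub-bitline s, flags[s] is its inversion flag, and groups lists which
    consecutive sub-bitlines end up merged together.
    """
    seg_len = len(codes) // k
    segments = [codes[s * seg_len:(s + 1) * seg_len] for s in range(k)]

    # Step-2: invert a sub-bitline if it holds more than m code "0" cells.
    data, flags = [], []
    for seg in segments:
        invert = seg.count(0) > m
        data.append([bit ^ 1 for bit in seg] if invert else list(seg))
        flags.append(1 if invert else 0)

    # Step-3 (greedy, an assumption): merge consecutive sub-bitlines while
    # the total number of data "0" cells in the group does not exceed m.
    groups = [[0]]
    load = data[0].count(0)
    for s in range(1, k):
        zeros = data[s].count(0)
        if load + zeros <= m:
            groups[-1].append(s)
            load += zeros
        else:
            groups.append([s])
            load = zeros
    return data, flags, groups

# Hypothetical 16-row column, k=4 segments, m=2.
codes = [0, 1, 0, 1,   # sub-BL[0]: two "0" cells, kept as is
         0, 0, 0, 1,   # sub-BL[1]: three "0" cells, inverted
         1, 1, 1, 1,   # sub-BL[2]: no "0" cells
         0, 0, 0, 0]   # sub-BL[3]: four "0" cells, inverted
data, flags, groups = program_column(codes, k=4, m=2)
print(flags, groups)
```

In this model every merged group carries at most m data “0” cells, which matches the statement that the largest loading on a CSA bitline is m bit cells.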
Therefore, the data-patterns and the number of LBSs
on each bitline are “smartly” programmed based on the
code-patterns. The parasitic loading on the accessed
bitlines in CSA is optimized for the smallest variations (m
bit cells) across various code-patterns. CSA achieves
smaller fluctuations of peak current than the inverted
ROM or inverted row techniques [6]. Those techniques focus
on reducing the overall power consumption but
may still have large fluctuations of loading on accessed
bitlines for various code-patterns and cycles.
In addition to partitioning bitlines and the inverting
mechanism based on the content of a ROM code, we
also apply a selective-precharge scheme to CSA to save
power. The scheme applied to CSA incurs a much smaller
speed penalty than on a conventional configurable ROM
architecture. Unlike conventional ROM, CSA ROM
precharges only the selected bitlines rather than all of
them, to reduce the power consumption [9]. In
conventional ROM architecture [10], the precharge
phase begins at the end of each access cycle and overlaps
the decoding and control phases of a new cycle to reduce
the cycle time. However, in a selective-precharge scheme,
the precharge phase cannot begin until the updated
addresses of a new cycle are decoded, to prevent the
unselected bitlines from being precharged.
This precharge phase normally delays the clock
access time, making the standard selective-precharge scheme
infeasible for high-speed applications [7] in SoCs.
Fortunately, the precharge time and sensing time in CSA
are much shorter than those of the standard
selective-precharge or conventional ROM architecture
because its bitline loading is much smaller. Therefore,
the access time of the CSA ROM is not delayed, and the
cycle time is shorter than that of the conventional ROM.
In the standby mode, the CSA ROM is not
precharged, which reduces the leakage current, and there is no
timing penalty to wake up the CSA ROM. Between any
two access cycles, during which all the control signals are
reset, the wordline is off and the bitlines keep their last
status without precharging or discharging. Accordingly,
the leakage current is reduced by the gated supply
scheme while the ground is still provided.
Two code-patterns are used to evaluate various ROM
structures. The minimum-accessed-loading (MinAL)
pattern, in which all bit cells on the accessed row and
selected columns store data “1”, is the code-pattern for a
cycle that has the smallest parasitic capacitance on
accessed bitlines and consumes the minimum current for
both conventional and CSA ROM. The
maximum-accessed-loading (MaxAL) pattern, which is
similar to the MinAL pattern but stores data “0” instead
of “1”, consumes the largest current for the conventional
ROM architecture. In CSA ROM, code-patterns in which
all of the bit cells on the accessed row and m bit cells on
each selected sub-bitline store data “0” are the
code-patterns (MaxAL-CSA) that have the largest parasitic
capacitance on bitlines and the largest power consumption.
The MinAL and MaxAL/MaxAL-CSA patterns may co-exist in a
ROM code but occur at different cycles.
3. EXPERIMENTAL RESULTS 
We investigated the peak and cycle current of 256Kb
CSA ROMs. Various values of k and memory
configurations are analyzed. The supply noise due to
parasitic effects of the package and current fluctuations of
the ROM is also presented. All experimental ROM
structures are designed and physically laid out using
1P4M 0.25µm CMOS technology. Simulations are
performed on RC-extracted netlists. A 256Kb
conventional ROM, the reference design in this work,
was used to verify the accuracy of our simulation methodology.
[Figure 5 bar labels: CSA-cyc, std-cyc, CSA-peak, std-peak]
Fig. 5. Fluctuations of peak and cycle current for 256Kb CSA
(k=16) ROM and conventional (std) ROM macros with MinAL
(Min.) and MaxAL/MaxAL-CSA (Max.) code-patterns.
[Figure 6 x-axis: Segments, k = 2, 4, 8, 16]
Fig. 6. Fluctuations (diff.) of peak, cycle current and the area
overhead (OH) of 256Kb CSA ROM macros with various k
and codes (MinAL and MaxAL-CSA).
[Figure 7 x-axis: 16Kx16, 8Kx32, 4Kx64]
Fig. 7. Current consumptions and fluctuation (diff.) of three
256Kb CSA ROM (k=16) macros (16K words by 16 bits, 8K
words by 32 bits and 4K words by 64 bits).
As revealed in Fig. 5, the simulated variations in
cycle and peak current consumptions of a 256Kb CSA
ROM macro (k=16) are 0.96% and 1.02% of those of a
conventional one, respectively. Accordingly, CSA ROM
not only consumes little power but also exhibits small
cycle-to-cycle peak current fluctuations across various
code-patterns. Figure 6 presents the simulated
fluctuations of current consumption and the corresponding
area overheads for CSA ROM macros with various
values of k. For a given CSA configuration, a larger value
of k in a column yields more uniform cycle-to-cycle
peak current and power consumption across various
code-patterns and cycles. Although a design with a larger
value of k incurs more area overhead, the penalty is less
than 1.6% for a 256Kb macro.
Table I
Supply voltage variations due to ROM and package
(External Supply = 2.5V)

Supply Voltage Fluctuations (V)   Ref. Design   CSA    Reduction
Intra-Cycle Vpp                   0.55          0.12   77.5%

Vpp: peak-to-peak voltage variation across code-patterns

Table II
Comparisons between CSA ROM and previous works

                    NHS-PD [7]  CRCS [8]  HSCS [11]  CSA (k=16)  Ref. Design
Capacity (bits)     16K         128K      32K        256K        256K
Process (µm)        0.35        0.35      0.25       0.25        0.25
Word-width (bits)   8           16        8          16          16
VDD (V)             3.3         3.3       1.0        2.5         2.5
Cycle Time (MHz)    131.5       120       170        250*        233
Power (mW)          5.29*       8.63*     2.2        8.3*        81.5
Peak Current (mA)   N.A.        N.A.      N.A.       25.6*       364.6*
PDP-bit             2.51        0.56      0.39       0.12        1.36

* Simulation result of worst-case code-patterns or of un-specified code-patterns
Power is measured at 100MHz
PDP-bit: Power-Delay Product per bit (pico-Joule/bit)
For a given memory capacity of CSA ROM, the
cycle-to-cycle variations of current consumption
increase as the word width increases. On the
other hand, the reductions (relative to conventional macros) in
worst-case peak current and cycle current
consumption are greater for CSA ROM macros
with a greater word width. Fig. 7 presents the
simulated current consumptions and fluctuations of three
256Kb CSA ROMs. A ROM with a greater
word width has more bitlines that need to be
accessed during an operational cycle. Additionally, the
reduction in the variations of the loading on each bitline
is the same for CSA ROMs with the same value of k.
Accordingly, a CSA ROM with more accessed bitlines has
greater variations in bitline loading, but also achieves a
larger reduction in parasitic capacitance compared to the
conventional architecture.
The fluctuations of supply noise due to peak current
and various ROM code-patterns are investigated by the
co-simulation of a QFN package model and ROM
macros. In practice, the envelopes of current consumption,
bonding structures and the number of power pins on a
die determine the supply noise associated with the
parasitic effects of a package. A QFN electrical model
with 1.557nH, 0.15pF, and 0.559Ω for a bondwire,
whose length and diameter are 1.557mm and 1.0mil,
respectively, is adopted in this package-model/ROM
co-simulation. As illustrated in Table I, CSA reduces
the peak-to-peak supply voltage variations within a
cycle by 77.5% and the cycle-to-cycle supply voltage
fluctuations by 93.3%.
The simulated worst-case standby current of a 256Kb
CSA ROM for various code-patterns is only 0.1µA,
compared to 2.1µA for the conventional one (a 95.3%
reduction). CSA ROM clearly has good performance in
peak current and the best power-delay product to
memory capacity ratio. Table II compares the
performance of CSA and other low-power solutions.
4. CONCLUSIONS 
We have proposed CSA to reduce the intra-cycle and 
cycle-to-cycle fluctuations in peak current and power 
consumption for embedded ROM. The experiment 
examines various ROM architectures by
package-model/ROM co-simulations. The reduction in
fluctuations of peak and cycle current of a 256Kb CSA
ROM macro (k=16) can be up to 99%, while the area
overhead is less than 1.6%. Furthermore, the CSA has a
smaller power-delay product to memory capacity ratio
than those of reported works.
5. ACKNOWLEDGMENT 
The authors would like to thank C. H. Lee, and Y. C. 
Liao for the implementations of chips. 
6. REFERENCES 
[1] M. Badaroglu et al., “Methodology and experimental
verification for substrate noise reduction in CMOS
mixed-signal ICs with synchronous digital circuits,” IEEE
J. Solid-State Circuits, pp. 1383-1395, Nov. 2002.
[2] N. Barton et al., “The effect of supply and substrate noise
on jitter in ring oscillators,” Proc. IEEE 2002 Custom
Integrated Circuits Conf., pp. 505-508, 2002.
[3] F. Herzel and B. Razavi, “A study of oscillator jitter due to
supply and substrate noise,” IEEE Trans. Circuits and
Systems-II, pp. 56-62, Jan. 1999.
[4] M. Xu, T. H. Lee and B. A. Wooley, “Measuring and
modeling the effects of substrate noise on the LNA for a
CMOS GPS receiver,” IEEE J. Solid-State Circuits, pp.
473-485, Mar. 2001.
[5] M.-F. Chang, K.-A. Wen and D.-M. Kwai, “Supply and
Substrate Noise Tolerance Using Dynamic Tracking
Clusters in Configurable Memory Designs,” IEEE Proc.
Int’l Symp. Quality Electronic Design, pp. 225, Mar. 2004.
[6] E. de Angel and E. E. Swartzlander, Jr., “Survey of
low-power techniques for ROMs,” IEEE Int’l Symp. Low
Power Electronics and Design, pp. 18-20, Aug. 1997.
[7] C.-R. Chang, J.-S. Wang, and C.-H. Yang, “Low-Power
and High-Speed ROM Modules for ASIC Applications,”
IEEE J. Solid-State Circuits, pp. 1516-1523, Oct. 2001.
[8] B. D. Yang and L. S. Kim, “A Low-Power ROM Using
Charge Recycling and Charge Sharing Technique,” IEEE J.
Solid-State Circuits, pp. 641-653, April 2003.
[9] N. Weste and K. Eshraghian, Principles of CMOS VLSI
Design: A Systems Perspective, second edition. Reading,
MA: Addison-Wesley, pp. 585-588, 1993.
[10] R. L. Barry et al., “A high-performance ROM compiler for
0.50 µm and 0.36 µm CMOS technologies,” IEEE Proc.
Int’l ASIC Conference and Exhibit, pp. 370-373, Sep. 1995.
[11] R. Sasagawa, I. Fukushi, M. Hamaminato and S.
Kawashima, “High-speed Cascode Sensing Scheme for
1.0V Contact-programming Mask ROM,” IEEE Symp. on
VLSI Circuits, Digest of Technical Papers, pp. 95-96, 1999.
