A sensing circuit for single-ended read-ports of SRAM cells with bit-line power reduction and access-time enhancement by T. Heselhaus & T. G. Noll
Adv. Radio Sci., 9, 247–253, 2011
www.adv-radio-sci.net/9/247/2011/
doi:10.5194/ars-9-247-2011
© Author(s) 2011. CC Attribution 3.0 License.
Advances in
Radio Science
Chair of Electrical Engineering and Computer Systems, RWTH Aachen University, 52062 Aachen, Germany
Abstract. The conventional sensing scheme of single-ended
read-only-ports as integrated in 8T-SRAM cells suffers from
low performance compared to double-ended complementary
sensing schemes. In the proposed sensing scheme the pre-
charge voltage of the single-ended read-bit-line is set to a
level above the threshold voltage of the sensing device with
an adjustable margin. This margin is minimized to speed up
the read access on the one hand and kept large enough to
provide a sufﬁcient bit-line noise margin on the other hand.
The pre-charge voltage level of the proposed sensing circuit
tracks the threshold voltage of the sensing device under pro-
cess variations in order to maintain a minimum required bit-
line noise margin. To avoid unnecessary bit-line discharg-
ing, the proposed sensing scheme employs a modiﬁed 8T-
SRAM cell. Compared to the conventional 8T-SRAM cell,
the read port of the proposed cell provides a virtual ground
line running in parallel to the bit-lines. An internal driver of
the sensing circuit releases the virtual ground line during the
evaluation period to prevent the charge dissipation resulting
in a raised voltage level. The reduced pre-charge level and
the increased virtual ground lead to a reduced bit-line volt-
age swing and thus a bit-line power reduction. Access time,
energy dissipation, and noise margin of the proposed sens-
ing circuit are compared with conventional sensing circuits
from the literature for different numbers of memory cells
connected to the bit-line. It is shown that, for a specific number
of memory cells per bit-line, the proposed circuit achieves
the fastest access time at low power operation.
Correspondence to: T. Heselhaus
(heselhaus@eecs.rwth-aachen.de)
1 Introduction
Single-ended read sensing schemes will become more important
in memory architectures of SRAM cells with additional
read-only-ports to overcome the reliability problems related
to standard 6T-SRAM cells. Though such single-ended ports
require only one bit-line, the main power dissipation origi-
nates from the full-swing bit-line sensing schemes.
The intention of this work is to set the initial bit-line volt-
age close to the threshold level of the individual local sens-
ing device and stop the discharging, once the local sensing
has detected the data bit, such that no more than necessary of
the bit-line charge is dissipated. The expected side effect is a
performance upgrade, since the initial bit-line voltage resides
close to the sensing threshold level, such that this level can
be reached in a shorter evaluation time.
Section 2 presents a modiﬁed memory cell, which is re-
quired to enable the bit-line swing reduction. In Sect. 3 the
proposed sensing circuit is described. In Sect. 4 the proposed
sensing scheme is veriﬁed with simulations and compared
with other sensing circuits from the literature.
2 The proposed 8T-SRAM cell
Figure 1 shows the conventional (Chang et al., 2005) and the
proposed 8T-SRAM cell. Both cell variants consist of a
conventional cross-coupled inverter-pair connected via two
access transistors to a complementary bit-line pair p and p̄,
which represents the write-port for storing data into the cell
by activating the storage-word-line s. The read-only-port
consists of two series connected transistors for single-ended
read-out with a separate read-word-line r.
The area and most of the wiring of the proposed cell are
identical to those of the conventional cell; however, the proposed
cell connects the source of the read-only-port to an additional virtual
ground line v instead of VSS. This additional virtual ground
Published by Copernicus Publications on behalf of the URSI Landesausschuss in der Bundesrepublik Deutschland e.V.
Fig. 1. Conventional and proposed 8T-SRAM cell as schematic and
layout view. The additional virtual ground line is labeled as v.
line runs in parallel to the read-bit-line m and can be easily
placed in the thin layout cell without any area penalty. The
intention of this additional virtual ground line is to reduce
the bit-line swing and power dissipation. The functionality
of this virtual ground line will be described in the next sec-
tion.
3 Pre-charge and sensing circuit
Figure 2 shows the pre-charge and sensing circuit of a single-
ended bit-line for one column of memory cells. Only the
read-only-port of one memory cell is shown, however, there
are N memory cells connected to the bit-lines m and v. The
basic idea of the pre-charge circuit is to pre-charge the bit-
line m to a deﬁned voltage level above the threshold-level of
the sensing transistor M3. Because the drain current of M3
is used to pre-charge the bit-line, M3 should itself turn off as
the pre-charge voltage is reached. The pre-charge process is
described as follows.
During the pre-charge-phase (φ =0) the word-line is de-
activated (r =0) and the sensing is disabled (e =1). For an
initially discharged bit-line m the gate potentials of M3 and
M4 are both zero and the bit-line is charged by the series con-
nected transistors M3 and M4. As the gates of M5 and M6
are at zero, these transistors establish a voltage divider of the
actual bit-line voltage. The gate of M3 is connected to the di-
vided voltage u, which is a fraction of the bit-line voltage m.
While the bit-line voltage rises, the tapped gate potential u
rises as well. The pre-charge process of the bit-line stops once
u rises above VDD−|Vth|, such that M3 enters the off-region.
Hence the final pre-charge voltage Vm,max of the bit-line
exceeds the sensing threshold level VDD−|Vth| of M3. This
overshoot can be adjusted by appropriately dimensioning
M5 and M6.
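The relation between the divider and the overshoot can be illustrated with a small sketch. It assumes an idealized linear divider u = k·m with a constant ratio k; the ratio k, the function name precharge_level and the example value |Vth| = 450mV are hypothetical and not taken from the paper, and the real divider built from M5 and M6 is nonlinear. Only VDD = 900mV comes from the text.

```python
def precharge_level(vdd, vth_abs, k):
    """Return (Vm_max, overshoot) for an idealized divider u = k * m.

    k is a hypothetical constant divider ratio in (0, 1]; the transistor
    divider M5/M6 of the real circuit is not linear.
    """
    if not 0.0 < k <= 1.0:
        raise ValueError("divider ratio k must lie in (0, 1]")
    v_sense = vdd - vth_abs            # sensing threshold level VDD - |Vth| of M3
    # Pre-charging stops when u = k * m reaches v_sense; the bit-line is
    # charged from VDD, so it can never exceed VDD.
    v_m_max = min(v_sense / k, vdd)
    return v_m_max, v_m_max - v_sense


# Example with assumed values: |Vth| = 450 mV and k = 0.75 are illustrative only.
v_m_max, overshoot = precharge_level(vdd=0.9, vth_abs=0.45, k=0.75)
print(f"Vm,max = {v_m_max * 1e3:.0f} mV, overshoot = {overshoot * 1e3:.0f} mV")
```

In this simplified picture a smaller k gives a larger overshoot, which corresponds to adjusting the bit-line noise margin through the dimensioning of M5 and M6.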
Fig. 2. Schematic of the proposed single-ended read sensing and
pre-charge circuit for one memory column and domino connected
digit line multiplex.
The intention of the overshoot is to achieve the required
noise margin on the bit-line, which must be maintained even
under process variations. With the feed-back loop for pre-
charging the bit-line via the transistors M3, M4, M5 and M6,
the voltage overshoot of Vm,max tracks the sensing threshold
level of M3. This tracking effect is veriﬁed by simulations in
Sect. 4.
In fact, in the proposed sensing scheme the signal is de-
tected with two successive sensing stages, realized by M3
and M9. It behaves similarly to an inverter sense circuit
followed by a domino NMOS pull-down device. However, the
nodes x, y and v in the sensing circuit have to be pre-charged
or clamped before entering the evaluation phase. During the
pre-charge-phase the bit-line m rises to Vm,max and shortly
afterwards M8 begins to discharge the internal node y such that
the main output stage M9 turns off. The drain of M9 is connected to
a common node x, which could be a global bit-line in a sub-
divided bit-line memory architecture or a common digit line
in some other hierarchical memory architecture (e.g. a col-
umn or data word multiplex output). Here, this output x is
assumed to be domino-connected, thus x requires an addi-
tional pre-charge device M11, which is not part of the sens-
ing circuit. However, M11 pre-charges x to VDD such that
M10 turns on and clamps the virtual ground line v to VSS.
The circuit is now prepared for sensing.
The pre-charge phase is completed by reverting φ to VDD,
which turns off M4. Since M6 is connected as a diode, M6
turns off as well; the voltage divider built by M5 and M6 is
then no longer active, and u follows the bit-line voltage via
M5. The transistor M5’ may be added to provide a close
coupling between m and u over a wide range of bit-line voltages.
Actually both M5’ and M5 together build up a transmission
gate to connect m to the gate of M3. In some cases where M5
is strong enough for a close coupling between m and u, M5’
might be omitted. However, in this work M5’ is included
Fig. 3. Simulated waveform for a read-cycle of the proposed sens-
ing circuit (slow corner, 125°C, VDD =900mV).
in each power and performance simulation and comparison.
The transistor M5’ is minimally dimensioned.
Additionally, e is now tied to zero, and M7 connects the
sensing transistor M3 to the internal node y. Any leakage of
M3 due to the near threshold-level pre-charged bit-line m and
node u is now compensated by the weakly dimensioned
transistor M8, such that y is kept at VSS. The evaluation-phase
starts by activating the word-line r. Depending on the stored
data g, the bit-line either remains pre-charged or, if g = VDD,
is discharged via M2 and M1 of the memory cell and M10 of the
sensing circuit. In the latter case the gate potential u of M3
falls and M3 turns on directly, which leads to a steep rising
edge on the internal node y. The rising edge on y activates M9,
which now discharges the common data output node x.
The power reduction feature is induced by the reduced
pre-charge level and the discharge inhibition with the additional
virtual ground line v of the proposed 8T-SRAM cell and
transistor M10 of the sensing circuit. Since M10 is connected to
the data output x, which goes to zero if the selected memory cell
discharges the bit-line m, M10 turns off and inhibits further
charge dissipation to VSS as soon as the data bit is detected.
Because the bit-lines m and v are designed similarly in the layout
view, the parasitic capacitances of both bit-lines are similar.
Thus, once M10 turns off, the remaining charge on the bit-line
m is balanced between m and v. The final potential of m
therefore settles to nearly half of the bit-line potential
present at the moment the data bit was detected. The final
potential depends on the ratio of the parasitic capacitances of
m and v.
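The charge balancing between m and v can be written as a simple charge-conservation estimate. The following is a minimal sketch under idealized assumptions: both lines are modeled as lumped capacitors, v starts at VSS (0V) because it was clamped by M10, and no charge is lost after M10 turns off. The function name shared_potential and the example capacitances are hypothetical.

```python
def shared_potential(v_m, c_m, c_v, v_v=0.0):
    """Common potential of m and v after ideal charge sharing.

    v_m, v_v: potentials of bit-line m and virtual ground line v at the
              moment M10 turns off (v_v defaults to VSS = 0 V).
    c_m, c_v: lumped parasitic capacitances of the two lines.
    """
    if c_m <= 0 or c_v <= 0:
        raise ValueError("capacitances must be positive")
    # Total charge is conserved and redistributed over c_m + c_v.
    return (c_m * v_m + c_v * v_v) / (c_m + c_v)


# Equal capacitances (assumed 10 fF each) give half of the initial potential.
print(shared_potential(v_m=0.5, c_m=10e-15, c_v=10e-15))
```

For equal capacitances the result is half of the potential of m at the moment of detection; a larger capacitance on v lowers the final potential, and a smaller one raises it.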
When considering process variations of all transistors of
the sensing circuit, this also affects the fraction of the voltage
divider M5/M6. Figure 5 shows the simulated correlation of
the sensing threshold level variation to the pre-charge level
variation. The left side of Fig. 5 shows the voltage level
distributions, where all transistors of the sensing circuit in
Fig. 2 are designed with a gate length of lM5 = lM6 = 40nm.
Since device mismatches of the voltage divider transistors
M5 and M6 disturb the correlation of the sensing threshold
level to the pre-charge level, a reduction of the variability of
M5 and M6 can improve the correlation. Choosing a gate
length of lM5 = lM6 = 70nm for the voltage divider M5/M6
improves the correlation coefficient to 0.8 compared to
0.65 (for lM5 = lM6 = 40nm). Reducing the variability
of the voltage divider reduces the standard deviation of the
access time from 10.1% to 9%, while the mean access time
remains nearly constant.
Fig. 4. Veriﬁcation of the sensing threshold level tracking. The
left diagram shows the resulting access time proﬁle with a ﬁxed bit-
line pre-charge voltage, while the right diagram shows the results
when the sensing circuit includes the voltage divider M5/M6 for
the tracking (Monte Carlo simulations, 25°C, VDD =900mV).
4 Simulation results
The sensing circuit was simulated for a 40nm general pur-
pose bulk CMOS technology with a nominal supply volt-
age of 900mV. All circuit simulations are based on
transistor schematics. Extracted parasitics from a layout
view are added to the bit-lines and to the virtual ground
line corresponding to the connected number of cells N ∈
{8,16,32,64,128}. Internal nodes of the sensing circuit are
loaded with estimated parasitic capacitances rated between
0.4fF and 0.7fF depending on the assumed wiring complex-
ity. The transistor dimensioning of the proposed sensing cir-
cuit was basically designed with the minimal allowed gate
width of w = 120nm. Only the driving/sensing transistors
M3, M4, M9, and M10 have a gate width of w =480nm.
Simulations using slow corner device models at 125°C and
a minimal supply voltage of 900mV are used to verify the
function of the proposed sensing circuit under worst case
timing conditions. Figure 3 shows the simulated pre-charge
and evaluation-phase of a read cycle with a stored “1” on the
selected memory cell (g = VDD). The achieved bit-line swing
ΔV is approximately 435mV, slightly less than VDD/2,
such that bit-line energy savings of about 50% compared to a
full swing bit-line sensing scheme can be expected.
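The expected saving can be checked with a short calculation. The sketch assumes that the energy drawn from the supply per read is E = C·VDD·ΔV, i.e. the bit-line capacitance C is recharged from VDD by the swing ΔV, and that C is the same in both schemes; the function name bitline_energy_saving is hypothetical, and the estimate ignores the sensing circuit itself.

```python
def bitline_energy_saving(delta_v, vdd):
    """Fractional bit-line energy saving versus a full-swing scheme.

    Assumes E = C * VDD * delta_v for the reduced swing and E = C * VDD**2
    for full swing, with the same bit-line capacitance C in both cases.
    """
    if not 0.0 <= delta_v <= vdd:
        raise ValueError("delta_v must lie between 0 and vdd")
    return 1.0 - delta_v / vdd


# Values from the text: swing of 435 mV at VDD = 900 mV.
print(f"{bitline_energy_saving(0.435, 0.9):.1%}")  # prints 51.7%
```

Under these assumptions the saving is slightly above one half, consistent with the roughly 50% stated above.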
Tracking the pre-charge voltage according to the sensing
threshold level of M3 under process variations was function-
ally veriﬁed by Monte Carlo simulations. For this case pro-
cess variations apply to M3 only, while all other transistors
operate in the typical corner. Figure 4 shows the probability
density functions (PDF) of sensing threshold level variations
and the resulting access time proﬁle without tracking on the
left side and with tracking on the right side. On the ordinate
the mean sensing threshold level Vth, the mean pre-charge
level Vm,max, and their PDFs are plotted. The abscissas show
the mean access times tACC and their PDFs for both cases.
The standard deviation of the access time PDF was reduced
from 8.1% to 1.5%.
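Why tracking narrows the access time distribution can be illustrated with a toy Monte Carlo model. This is only a sketch under strong assumptions and does not attempt to reproduce the simulated figures above: the divider is idealized as u = k·m, |Vth| of M3 is drawn from a normal distribution, and the access time is taken to be proportional to the margin between pre-charge level and sensing threshold (constant discharge current). All numeric values except VDD = 900mV, as well as the function name relative_margin_spread, are hypothetical.

```python
import random
import statistics


def relative_margin_spread(track, n=1000, seed=1):
    """Relative standard deviation of the pre-charge margin in a toy model.

    track=True : pre-charge level follows the sampled sensing threshold.
    track=False: pre-charge level is fixed at the nominal design value.
    """
    rng = random.Random(seed)
    vdd = 0.9                 # supply voltage from the text
    vth_mean = 0.45           # assumed nominal |Vth| of M3
    vth_sigma = 0.02          # assumed |Vth| spread
    k = 0.75                  # assumed ideal divider ratio
    v_pre_fixed = (vdd - vth_mean) / k
    margins = []
    for _ in range(n):
        v_sense = vdd - rng.gauss(vth_mean, vth_sigma)
        v_pre = v_sense / k if track else v_pre_fixed
        margins.append(v_pre - v_sense)   # voltage to discharge before detection
    return statistics.stdev(margins) / statistics.mean(margins)


print("fixed   :", relative_margin_spread(track=False))
print("tracking:", relative_margin_spread(track=True))
```

In this model the margin with tracking scales with the sensing threshold itself, so its relative spread is smaller than with a fixed pre-charge level, where the full threshold variation appears in a comparatively small margin.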
Fig. 5. Simulated pre-charge level correlation to M3-threshold-level
variations of 1000 Monte Carlo runs for two different gate lengths
of the voltage divider M5/M6 (25°C, VDD =900mV).
Increasing the bit-line noise margin was simulated by en-
larging the gate length of M5 up to lM5 =420nm for a series
of Monte Carlo simulations, while keeping the gate length of
M6 xed to lM6 =70nm. The results shown in Fig. 6 exhibit
a similar distribution of the pre-charge level in correlation to
the sensing threshold voltage like depicted on the right dia-
gram in Fig. 5. However, the dot clouds are shifted to higher
pre-charge levels with increasing gate length lM5. The el-
lipses inside the dot clouds denote the standard error ellipse
of the two dimensional density function. The resulting im-
provement of the mean bit-line noise margin with increasing
the gate length lM5 =70nm!f140nm;280nm;420nmg are
respectively VBL;margin =f35mV;100mV;150mVg.
The proposed sensing circuit was compared to the follow-
ing sensing circuits (refer to Table 1). As explained for the
proposed sensing circuit, the sensing circuits for compari-
300 350 400 450 500 550
600
650
700
750
800
PSfrag replacements
)=nm
)=nm
)=nm
+
G

G
}
R

m
V
+   j+NBj mV
Fig. 6. Monte Carlo simulated pre-charge level for different lM5 in
correlation to the M3-threshold-level (lM6 =70nm, 1000 iterations,
25°C, VDD =900mV).
Designation
A The proposed sense circuit as shown in Fig. 2
B A domino style sensing of the local bit-line m with a
PMOS transistor, which implicitly implements the digit
line multiplex (see Fig. 7) (Takeda et al., 2004)
C Sensing with an inverter followed by an NMOS transistor
for digit line multiplex (Cosemans et al., 2007). A similar
sensing circuit implies a static CMOS NAND gate instead
of an inverter for early digit line multiplex (Chang et al.,
2007).
D An AC coupled sense amplier as PMOS replacement in a
domino read path (Qazi et al., 2010). For the reduced bit-
line swing due to word-line pulsing 400mV is assumed. To
compare this circuit, the driving part for the global bit-line
is included.
E An AC coupled sense amplier (Verma and Chandrakasan,
2009). For the reduced bit-line swing due to word-line
pulsing 200mV is assumed.
Table 1. Proposed and reference sensing circuits for comparison.
son are simulated using transistor schematics with estimated
parasitics for their internal nodes (0:4:::0:7fF). The transis-
tors are minimally dimensioned (l =40nm, w =120nm), ex-
cept for pre-charge and other driving devices (w =480nm).
For any numberof cells N 2f8;16;32;64;128gthe extracted
parasitics from a column of conventional 8T-SRAM cells as
shown in Fig. 1 are attached to the bit-line m of each ref-
erence circuit schematic. The circuit simulations of the ref-
erence circuits are applied for the same simulation corners,
temperature and supply voltage as for the proposed sensing
circuit. The cycle period was 2500ns. Fig. 7 shows the ref-
erence circuits B and C as implemented for comparison. For
the reference circuits domino-read and standard inverter the
gate widths of the sensing transistor M3 and the pre-charge
transistor M4 were set to w = 480nm, all others were de-
signed with minimal gate width of w = 120nm. Table 2
shows the simulated results of the proposed circuit A versus
the reference circuits B...E for N =128 cells. The access
time tACC of the proposed circuit A is faster than B, C, D
and E for N =128 cells. The access time tACC consists of a
Domino-Read
M4
M3
M3
M4
Standard Inverter
PSfrag replacements
  C C
> >


8 8
Fig. 7. Schematics of the domino-read (B) and inverter-read (C)
sensing circuits for comparison.
Fig. 5. Simulated pre-charge level correlation to M3-threshold-level
variations of 1000 Monte Carlo runs for two different gate lengths
of the voltage divider M5/M6 (25°C, VDD = 900 mV).
Process variations of all transistors of the sensing circuit
also affect the division ratio of the voltage divider M5/M6.
Figure 5 shows the simulated correlation of the sensing threshold
level variation to the pre-charge level variation. The left side
of Fig. 5 shows the voltage level distributions for the case where
all transistors of the sensing circuit in Fig. 2, including the
divider transistors, are designed with a gate length of
lM5 = lM6 = 40 nm. Since device mismatches of the voltage divider
transistors M5 and M6 disturb the correlation of the sensing
threshold level to the pre-charge level, reducing the variability
of M5 and M6 improves the correlation. Choosing a gate length of
lM5 = lM6 = 70 nm for the voltage divider M5/M6 raises the
correlation coefficient to ρ = 0.8, compared to ρ = 0.65 for
lM5 = lM6 = 40 nm. Reducing the variability of the voltage divider
lowers the standard deviation of the access time from 10.1% to 9%,
while the mean access time remains nearly constant.
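The influence of divider mismatch on the correlation can be illustrated with a toy model. The sketch below assumes, purely for illustration, that the pre-charge level equals the sensing threshold plus a constant margin plus an independent, normally distributed divider error. All numerical values (mean, margin, sigmas) are invented and not taken from the simulations. In this model the Pearson coefficient is sigma_th / sqrt(sigma_th^2 + sigma_div^2), so a smaller divider spread moves it toward 1.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_rho(sigma_th_mv, sigma_div_mv, runs=1000, seed=0):
    """Monte Carlo estimate of rho for V_pre = V_th + margin + divider error."""
    rng = random.Random(seed)
    margin_mv = 100.0  # hypothetical constant offset; it does not affect rho
    v_th = [rng.gauss(400.0, sigma_th_mv) for _ in range(runs)]
    v_pre = [v + margin_mv + rng.gauss(0.0, sigma_div_mv) for v in v_th]
    return pearson(v_th, v_pre)


def analytic_rho(sigma_th_mv, sigma_div_mv):
    """Closed-form rho of the same toy model."""
    return sigma_th_mv / math.hypot(sigma_th_mv, sigma_div_mv)


# Invented spreads: a noisier and a quieter voltage divider.
for sigma_div in (30.0, 15.0):
    print(sigma_div,
          round(simulate_rho(25.0, sigma_div), 2),
          round(analytic_rho(25.0, sigma_div), 2))
```

The model only reproduces the qualitative trend described above; it makes no claim about the actual mismatch statistics of M5 and M6.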
Increasing the bit-line noise margin was simulated by enlarging
the gate length of M5 up to lM5 = 420 nm in a series of Monte Carlo
simulations, while keeping the gate length of M6 fixed at
lM6 = 70 nm. The results shown in Fig. 6 exhibit a distribution of
the pre-charge level in correlation to the sensing threshold voltage
similar to the one depicted in the right diagram of Fig. 5. However,
the dot clouds are shifted to higher pre-charge levels with
increasing gate length lM5. The ellipses inside the dot clouds denote
the standard error ellipse of the two-dimensional density function.
Increasing the gate length from lM5 = 70 nm to {140 nm, 280 nm, 420 nm}
improves the mean bit-line noise margin by
ΔVBL,margin = {35 mV, 100 mV, 150 mV}, respectively.
The proposed sensing circuit was compared to the follow-
ing sensing circuits (refer to Table 1). As explained for the
proposed sensing circuit, the sensing circuits for compari-
son are simulated using transistor schematics with estimated
parasitics for their internal nodes (0.4...0.7fF). The transis-
Fig. 6. Monte Carlo simulated pre-charge level for different lM5 in
correlation to the M3-threshold-level (lM6 =70nm, 1000 iterations,
25°C, VDD =900mV).
Table 1. Proposed and reference sensing circuits for comparison.
Designation  Description
A  The proposed sense circuit as shown in Fig. 2.
B  A domino-style sensing of the local bit-line m with a PMOS
   transistor, which implicitly implements the digit line
   multiplex (see Fig. 7) (Takeda et al., 2004).
C  Sensing with an inverter followed by an NMOS transistor
   for digit line multiplex (Cosemans et al., 2007). A similar
   sensing circuit uses a static CMOS NAND gate instead
   of an inverter for early digit line multiplex (Chang et al.,
   2007).
D  An AC-coupled sense amplifier as PMOS replacement in a
   domino read path (Qazi et al., 2010). A reduced bit-line
   swing of 400 mV due to word-line pulsing is assumed. To
   compare this circuit, the driving part for the global bit-line
   is included.
E  An AC-coupled sense amplifier (Verma and Chandrakasan,
   2009). A reduced bit-line swing of 200 mV due to word-line
   pulsing is assumed.
tors are minimally dimensioned (l =40nm, w =120nm), ex-
cept for pre-charge and other driving devices (w =480nm).
For each number of cells N ∈ {8, 16, 32, 64, 128}, the extracted
parasitics from a column of conventional 8T-SRAM cells as
shown in Fig. 1 are attached to the bit-line m of each reference
circuit schematic. The reference circuits are simulated for the
same simulation corners, temperature, and supply voltage as the
proposed sensing circuit. The cycle period was 2500 ps. Figure 7
shows the reference circuits B and C as implemented for comparison.
For the reference circuits domino-read and standard
inverter the gate widths of the sensing transistor M3 and the
Fig. 7. Schematics of the domino-read (B) and inverter-read (C)
sensing circuits for comparison.
Table 2. Simulated results for N = 128 cells (slow corner, 125°C,
VDD = 900 mV).
                         A       B       C       D       E
tACC / ps              300     408     512     401     325
tACC,SA / ps            82      45      53      94     126
ESense / fJ           5.08    1.12    2.21    5.58    9.95
EBL / fJ              5.85   13.40   13.42    6.19    2.94
Etot / fJ            10.93   14.53   15.62   11.77   12.89
VBL,margin / mV        120     181     379      96      53
Transistor count (a)    10       3       5      13      14
(a) Without transistors of memory cells.
pre-charge transistor M4 were set to w = 480 nm; all others
were designed with the minimal gate width of w = 120 nm. Table 2
shows the simulated results of the proposed circuit A
versus the reference circuits B...E for N = 128 cells. The
access time tACC of the proposed circuit A is shorter than that of
B, C, D, and E for N = 128 cells. The access time tACC consists of
a bit-line dependent contribution tACC,BL and an access time
offset tACC,SA, which depends on the internal complexity of
the sensing circuit. Circuits B and C have the smallest offsets,
but the proposed circuit shows a smaller intrinsic access time
tACC,SA than circuits D and E.
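The decomposition can be applied to the values of Table 2. The short sketch below assumes that the two contributions simply add, tACC = tACC,BL + tACC,SA, and derives the bit-line dependent part for each circuit. The variable names are illustrative.

```python
# Access times from Table 2 (N = 128 cells), in picoseconds.
t_acc = {"A": 300, "B": 408, "C": 512, "D": 401, "E": 325}
t_acc_sa = {"A": 82, "B": 45, "C": 53, "D": 94, "E": 126}

# Assumed additive split: t_ACC = t_ACC,BL + t_ACC,SA.
t_acc_bl = {name: t_acc[name] - t_acc_sa[name] for name in t_acc}

# List the circuits from the shortest to the longest bit-line contribution.
for name in sorted(t_acc_bl, key=t_acc_bl.get):
    print(f"{name}: t_ACC,BL = {t_acc_bl[name]} ps")
```

Under this assumption circuit E has the shortest bit-line contribution (199 ps) but the largest offset, while the proposed circuit A combines a short bit-line contribution (218 ps) with a moderate offset.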
The sense error immunity to bit-line noise was simulated
with current pulses to inject charge to the bit-line during an
evaluation period. Read evaluation faults are detected within
an evaluation period of 2500ps. The tolerable bit-line noise
VBL,margin is shown in Table 2 for each circuit.
Table 2 shows that the bit-line energy per cycle of circuit A
is similar to that of circuit D. This results from nearly the
same bit-line swing being achieved in circuits A and D. The
sensing energy ESense of circuit A is lower than that of circuits
D and E. Since circuits B and C operate with full-swing
bit-lines but dissipate the least energy ESense, there must be
a crossover in total energy at a certain number of cells.
Figure 8a shows that this crossover point can be identified
at N ≥ 64 cells.
Fig. 8. Simulated tACC and Etot for different number of connected
cells N (normalized to the proposed circuit A, slow corner, 125°C,
VDD =900mV).
Table 3. Comparison of the average access time and relative standard
deviation of the proposed sensing circuit (with lM5 = lM6 = 70 nm)
for N = 64 using 1000 Monte Carlo simulations.
                      A        B        C        D        E
mean(tACC) (a)     152 ps   167 ps   205 ps   169 ps   157 ps
std(tACC) (a)       9.0%    11.1%     9.5%    10.1%    10.6%
mean(tACC) (b)     153 ps   164 ps   204 ps   170 ps   158 ps
std(tACC) (b)       9.5%    14.7%    10.6%    10.6%    11.3%
(a) Constant temperature 25°C and supply voltage VDD = 900 mV.
(b) Normally distributed temperature and supply voltage variations,
    temperature ∈ [−55°C...125°C], VDD ∈ [810 mV...990 mV].
A similar crossover point for the access time is located at
N ≈ 40 cells (see Fig. 8b). Simulations indicate smaller access
times for circuit E for N > 256 cells. However, the proposed
sensing circuit dissipates the least energy and
shows the shortest access time of all compared circuits
for N ∈ [64...128].
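The energy figures of Table 2 can be cross-checked, and the location of the energy crossover can be roughly estimated. The sketch below first confirms that Etot matches ESense + EBL within table rounding. It then assumes, as a crude model that is not taken from the simulations, that EBL scales linearly with N while ESense stays constant, and solves for the N at which circuits A and B dissipate the same total energy. The function name is hypothetical.

```python
# Energies from Table 2 (N = 128 cells), in femtojoules.
e_sense = {"A": 5.08, "B": 1.12, "C": 2.21, "D": 5.58, "E": 9.95}
e_bl = {"A": 5.85, "B": 13.40, "C": 13.42, "D": 6.19, "E": 2.94}
e_tot = {"A": 10.93, "B": 14.53, "C": 15.62, "D": 11.77, "E": 12.89}

# Consistency check: E_tot = E_Sense + E_BL up to table rounding (0.01 fJ).
max_dev = max(abs(e_sense[k] + e_bl[k] - e_tot[k]) for k in e_tot)


def crossover_cells(ref: str, other: str, n_table: int = 128) -> float:
    """Cell count at which `ref` and `other` dissipate equal total energy.

    Crude assumption: E_BL grows linearly with N and E_Sense is constant,
    i.e. E_tot(N) = E_Sense + E_BL(n_table) * N / n_table.
    """
    d_sense = e_sense[ref] - e_sense[other]
    d_bl_per_cell = (e_bl[other] - e_bl[ref]) / n_table
    return d_sense / d_bl_per_cell


n_cross_b = crossover_cells("A", "B")
print(f"max deviation of the energy sums: {max_dev:.3f} fJ")
print(f"estimated A/B energy crossover: N = {n_cross_b:.0f}")
```

With these assumptions the crossover against circuit B comes out at about N = 67, in the same region as the crossover read from Fig. 8a. The linear model is only a plausibility check and does not replace the simulated curves.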
Table 3 shows the mean access time over 1000 Monte
Carlo iterations for each circuit and its standard deviation rel-
ative to the mean access time. The proposed sensing circuit
provides the fastest mean access time and lowest access time
variation compared to the reference circuits.
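The statement can be checked directly against the entries of Table 3. The sketch below picks the circuit with the smallest mean access time and the smallest relative standard deviation for both simulation conditions, and converts the relative spread of circuit A into picoseconds. The helper names are illustrative.

```python
# Table 3 values for N = 64: (mean access time in ps, relative std in %).
constant = {"A": (152, 9.0), "B": (167, 11.1), "C": (205, 9.5),
            "D": (169, 10.1), "E": (157, 10.6)}
varied = {"A": (153, 9.5), "B": (164, 14.7), "C": (204, 10.6),
          "D": (170, 10.6), "E": (158, 11.3)}


def absolute_std_ps(mean_ps: float, rel_std_percent: float) -> float:
    """Convert a relative standard deviation into picoseconds."""
    return mean_ps * rel_std_percent / 100.0


def best(table, column):
    """Name of the circuit with the smallest value in the given column."""
    return min(table, key=lambda name: table[name][column])


for label, table in (("constant T/VDD", constant), ("varied T/VDD", varied)):
    fastest, tightest = best(table, 0), best(table, 1)
    sigma_a = absolute_std_ps(*table["A"])
    print(f"{label}: fastest={fastest}, lowest spread={tightest}, "
          f"sigma(A)={sigma_a:.1f} ps")
```

For both conditions the selection returns circuit A in each column, in line with the statement above.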
To implement the proposed sensing circuit in a hierarchical
memory architecture, the sensing circuit on all but the last
hierarchy level needs a small modification to provide the
signaling scheme consisting of a digit line and a virtual ground
line. The modification affects the output stage of the proposed
sensing circuit and requires an additional internal inverter
for driving the virtual ground switch. Figure 9 shows such a
modified sensing circuit and its application in a hierarchical
memory architecture. The source terminal of the
output transistor is now connected to a virtual ground line
xv, which enables the same signaling scheme as used for the
Fig. 9. The modiﬁed sensing circuit with a virtual ground on the
output side and its implementation in a hierarchical memory archi-
tecture. SA* denotes the modiﬁed sensing circuit.
Fig. 10. Test circuits for sensing scheme comparison. Test circuit 1
shows the proposed sensing scheme with virtual grounds. Test cir-
cuit 2 shows a domino-style sensing scheme. The gray shaded tran-
sistors are used to pre-charge the digit-lines and are not relevant for
the access time comparison. The dotted box encloses the read-only
port of the (modiﬁed) 8T-SRAM cell.
modified 8T-SRAM cell. Because the output signal xm no longer
falls below the threshold voltage of the virtual ground
switch M10, it can no longer be used to control
the gate of M10. The control signal for the virtual ground
switch must therefore be generated inside the sensing
circuit with an additional inverter (highlighted in Fig. 9 by a
gray shaded box).
To show the performance improvement of the proposed
signaling scheme, two test circuits representing the data signal
path of a hierarchical memory architecture were designed
and compared in simulation. The first test circuit employs
the proposed sensing circuits and pairs of digit-line and virtual
ground line on each level of the hierarchy. The second
test circuit employs a domino-style sensing circuit similar to
the left circuit of Fig. 7. However, in domino-style sensing
circuits successive stages alternately employ NMOS
and PMOS sensing devices. The test circuits are depicted
in Fig. 10. For fair comparison, each corresponding bit-line,
Fig. 11. Comparison of simulated waveforms for a read-cycle of the
proposed sensing circuit on the left and a domino-style full-swing
signaling circuit on the right (slow corner, 125°C, VDD =900mV).
digit-line, and virtual ground of the test circuits is equally
loaded with parasitic capacitances on each hierarchy level.
Figure 11 shows the simulated waveforms for read-cycles of
both sensing circuits. The left read-cycle was simulated for
the proposed signaling scheme including the virtual ground
concept. This waveform shows the same properties as the
waveform in Fig. 3: the virtual grounds are successively
released as soon as the data bit is detected. The reduced voltage
swing of approximately 475 mV leads to digit-line energy
savings of approximately 50%. The right waveform
shows the full-swing signals of the domino-style sensing
circuits. The access time of the proposed sensing circuit
(tACC = 531 ps) is smaller than the access time of the domino-style
sensing scheme (tACC = 673 ps), which corresponds to an access
time improvement of approximately 21%.
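The two figures quoted above can be checked with a short calculation. The sketch below is illustrative only: the access times, the supply voltage, and the swing are taken from the text and Fig. 11, while the energy model (charge restored from the VDD supply, so that the energy per read scales with C · VDD · ΔV on an equally loaded line) is a simplifying assumption and not a formula stated here. The function names are hypothetical.

```python
def access_time_improvement(t_ref_ps, t_new_ps):
    """Relative access-time reduction with respect to the reference circuit."""
    return (t_ref_ps - t_new_ps) / t_ref_ps


def swing_energy_saving(vdd_mv, swing_mv):
    """Relative energy saving versus a full VDD swing.

    Assumed model (not from the paper): E = C * VDD * dV for the same
    line capacitance C, so the ratio reduces to dV / VDD.
    """
    return 1.0 - swing_mv / vdd_mv


# Values quoted in the text: tACC = 673 ps (domino), 531 ps (proposed),
# VDD = 900 mV, reduced swing of approximately 475 mV.
improvement = access_time_improvement(673, 531)
saving = swing_energy_saving(900, 475)

print(f"access-time improvement: {improvement:.1%}")   # 21.1%
print(f"digit-line energy saving: {saving:.1%}")       # 47.2%
```

Under this first-order model the swing reduction alone gives a saving of about 47%, which is consistent with the stated "approximately 50%". The simulated figure may include contributions that this model ignores.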
5 Conclusions
A new sensing circuit for a single-ended read-only-port of
SRAM cells is introduced. Instead of pre-charging the bit-line
to VDD, the proposed circuit sets the pre-charge level
close to the threshold level VDD − |Vth| of the individual
sensing device, while ensuring a process-variation-tolerant
bit-line noise margin. With a small modification of the 8T-SRAM
cell to provide an additional bit-line v, the charge
dissipation of the bit-line m during evaluation can be stopped
automatically by the proposed circuit once the data bit
is detected by the sensing circuit. Both effects together reduce
the bit-line swing and the bit-line power dissipation.
For N = 64...128 cells, the proposed circuit dissipates the least
energy among the compared reference circuits along with good
performance improvements. The proposed sensing circuit is
also suited for a hierarchical memory architecture utilizing
the optimized pre-charge level and virtual ground concepts.
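The tracking property of the pre-charge level can be illustrated with a minimal numerical sketch. It assumes a sensing device that switches at VDD − |Vth| and a pre-charge level placed a fixed margin above that point. The function name, the margin of 100 mV, and the |Vth| values are hypothetical and chosen only for illustration; only VDD = 900 mV is taken from Fig. 11.

```python
def precharge_level_mv(vdd_mv, vth_abs_mv, margin_mv):
    """Pre-charge level placed margin_mv above the switching level
    VDD - |Vth| of the sensing device, capped at VDD."""
    return min(vdd_mv, vdd_mv - vth_abs_mv + margin_mv)


VDD_MV = 900      # supply voltage used in Fig. 11
MARGIN_MV = 100   # hypothetical noise margin, not a value from the paper

rows = []
for vth_abs_mv in (300, 350, 400):   # hypothetical |Vth| spread
    switch_level = VDD_MV - vth_abs_mv
    tracked = precharge_level_mv(VDD_MV, vth_abs_mv, MARGIN_MV)
    rows.append((vth_abs_mv, tracked, tracked - switch_level, VDD_MV - switch_level))

# Columns: |Vth|, tracked pre-charge level, margin with tracking,
# margin when pre-charging to VDD (all in mV).
for row in rows:
    print(row)
```

With tracking, the margin above the switching level stays at the chosen value for every |Vth|, whereas pre-charging to VDD yields a margin equal to |Vth| itself, so the bit-line has to be discharged further before the sensing device switches.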
References
Chang, L., Fried, D. M., Hergenrother, J., Sleight, J. W., Dennard,
R. H., Montoye, R. K., Sekaric, L., McNab, S. J., Topol, A. W.,
Adams, C. D., Guarini, K. W., and Haensch, W.: Stable SRAM
Cell Design for the 32nm Node and Beyond, in: Symposium on
VLSI Technology, 128–129, 2005.
Chang, L., Nakamura, Y., Montoye, R. K., Sawada, J., Martin,
A. K., Kinoshita, K., Gebara, F. H., Agarwal, K. B., Acharyya,
D. J., Haensch, W., Hosokawa, K., and Jamsek, D.: A 5.3GHz
8T-SRAM with Operation Down to 0.41V in 65nm CMOS, in:
Symposium on VLSI Circuits, 252–253, 2007.
Cosemans, S., Dehaene, W., and Catthoor, F.: A Low-Power Em-
bedded SRAM for Wireless Applications, IEEE Journal of Solid-
State Circuits, 42, 1607–1617, 2007.
Qazi, M., Stawiasz, K., Chang, L., and Chandrakasan, A.: A
512kb 8T SRAM Macro Operating Down to 0.57V with An
AC-Coupled Sense Ampliﬁer and Embedded Data-Retention-
Voltage Sensor in 45nm SOI CMOS, in: IEEE International
Solid-State Circuits Conference, 350–351, 2010.
Takeda, K., Hagihara, Y., Aimoto, Y., Nomura, M., Uchida, R.,
Nakazawa, Y., Hirota, Y., Yoshida, S., and Saito, T.: Per-Bit
Sense Ampliﬁer Scheme for 1GHz SRAM Macro in Sub-100nm
CMOS Technology, in: IEEE International Solid-State Circuits
Conference, 1, 502–542, 2004.
Verma, N. and Chandrakasan, A. P.: A High-Density 45nm SRAM
Using Small-Signal Non-Strobed Regenerative Sensing, IEEE
Journal of Solid-State Circuits, 44, 163–173, 2009.