Probabilistic And Introverted Switching To Conserve Energy In A Digital System by Palem, Krishna V. et al.
(12) United States Patent
Palem et al. 
(54) PROBABILISTIC AND INTROVERTED 
SWITCHING TO CONSERVE ENERGY IN A 
DIGITAL SYSTEM 
(75) Inventors: Krishna V. Palem, Atlanta, GA (US); 
Suresh Cheemalavagu, Marietta, GA 
(US); Pinar Korkmaz, Istanbul (TR); 
Bilge E. Akgul, Istanbul (TR) 
(73) Assignee: Georgia Tech Research Corporation, 
Atlanta, GA (US) 
( *) Notice: Subject to any disclaimer, the term of this 
patent is extended or adjusted under 35 
U.S.C. 154(b) by 380 days. 
(21) Appl. No.: 11/115,651
(22) Filed: Apr. 27, 2005 
(10) Patent No.: US 7,290,154 B2
(45) Date of Patent: Oct. 30, 2007
(65) Prior Publication Data
US 2005/0240787 A1 Oct. 27, 2005
Related U.S. Application Data
(60) Provisional application No. 60/565,748, filed on Apr. 27, 2004.
(51) Int. Cl.
G06N 3/02 (2006.01)
(52) U.S. Cl.: 713/320; 713/300; 713/310; 713/321; 713/322; 713/323; 713/324; 713/330; 713/340; 706/14; 706/45; 706/52
(58) Field of Classification Search: 713/300, 713/310, 320-324, 330, 340; 706/14, 44, 706/52
See application file for complete search history.
(56) References Cited
U.S. PATENT DOCUMENTS
4,855,690 A        8/1989   Dias            331/78
6,104,968 A *      8/2000   Ananth          700/297
6,195,669 B1       2/2001   Onodera et al.  708/3
6,314,441 B1 *     11/2001  Raghunath       708/322
6,463,422 B1       10/2002  Hangartner      706/14
6,469,576 B2       10/2002  Hasegawa        330/69
6,542,014 B1       4/2003   Saito           327/164
6,567,927 B1 *     5/2003   Brinkmann       714/10
6,593,788 B1       7/2003   Vogts           327/164
6,618,711 B1 *     9/2003   Ananth          706/14
7,085,749 B2 *     8/2006   Matsugu         706/20
2004/0088272 A1 *  5/2004   Jojic et al.    706/13
OTHER PUBLICATIONS 
Palem, K., "Energy Aware Computing Through Probabilistic 
Switching: A study of Limits," CASES '03, Oct. 30-Nov. 2, 2003, 5 
pages. 
Hedge, R. and Shanbhag, N., "Toward Achieving Energy Efficiency 
in Presence of Deep Submicron Noise," IEEE Transactions on Very 
Large Scale Integration (VLSI) Systems, vol. 8, No. 4, Aug. 2000, 
pp. 379-391. 
(Continued) 
Primary Examiner-A. Elamin 
(74) Attorney, Agent, or Firm: Thomas, Kayden,
Horstemeyer & Risley, LLP
(57) ABSTRACT
A processor having binary switches is configured to operate 
at a predetermined probability value that the logical value of 
each switch is correct. A supply voltage is coupled to the 
binary switches. A randomized signal detector is configured 
to detect a randomized signal, which may be amplified to a 
predetermined level if the randomized signal is low. A 
computing element outputs a probabilistic binary bit having 
a 0 or 1 with a predetermined probability value of being 
correct in correspondence with the supply voltage and/or an 
amplification level of a noise signal. Subsequently, an appli-
cation executed by the processor receives the probabilistic 
binary bit for one or more additional operations. By oper-
ating on the probabilistic binary bits instead of conventional 
deterministic bits, the processor consumes less energy and 
completes its execution faster. For battery-powered portable
electronic devices, use of a processor configured for
probabilistic binary bits substantially lengthens battery life.
30 Claims, 10 Drawing Sheets 
OTHER PUBLICATIONS 
Asahi, N., Akazawa, M., and Amemiya Y., "Single-Electron Logic 
Device Based on the Binary Decision Diagram," IEEE Transactions 
on Electron Devices, vol. 44, No. 7, Jul. 1997, pp. 1109-1116. 
Stein, K., "Noise-Induced Error Rate as Limiting Factor for Energy 
per Operation in Digital IC's," IEEE Journal of Solid-State Circuits, 
Co. SC-21, No. 5, Oct. 1977, pp. 527-530. 
* cited by examiner 
U.S. Patent Oct. 30, 2007 Sheet 1 of 10 US 7,290,154 B2 
[FIG. 1: drawing not reproduced]
[Sheet 2 of 10. FIG. 2 and FIG. 3: drawings not reproduced. FIG. 3 shows input R.]
[Sheet 3 of 10. FIG. 4: drawing not reproduced. Labels: NOISE SOURCE 34 with resistor R 37, DETECTOR/AMPLIFIER 39, COMPUTING ELEMENT 53 with output O1, implicit RCE 31; signals Vss and Vout.]
[Sheet 4 of 10. FIG. 5: drawing not reproduced. Labels: CROSS-COUPLED INVERTER 70, I/O PAD 62, reference numerals 65 and 67, RCE 60.]
[Sheet 5 of 10. FIG. 6: drawing not reproduced. Labels: CLOCK, INPUT X, ENTRY BRANCH, 1 BRANCH, 0 BRANCH, EXIT BRANCHES, SINGLE ELECTRON, JUNCTION, ISLAND; reference numerals 81-84, 89-91, 93.]
[Sheet 6 of 10. FIG. 7: drawing not reproduced. Labels: Switch sw 80 with outputs enout1, out, and enout2.]
[Sheet 7 of 10]
Identity Function      Complement Function
in   f(in)             in   f(in)
0    0                 0    1
1    1                 1    0
Constant Function 0    Constant Function 1
in   f(in)             in   f(in)
0    0                 0    1
1    0                 1    1
FIG. 8
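The four one-input functions tabulated in FIG. 8 can be enumerated in a few lines of Python. This is only an illustrative sketch; the names `UNARY_FUNCTIONS` and `truth_table` are hypothetical and do not appear in the disclosure.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical names: the four Boolean functions of a single input bit.
UNARY_FUNCTIONS: Dict[str, Callable[[int], int]] = {
    "identity": lambda x: x,        # f(in) = in
    "complement": lambda x: 1 - x,  # f(in) = NOT in
    "constant_0": lambda x: 0,      # f(in) = 0 for every input
    "constant_1": lambda x: 1,      # f(in) = 1 for every input
}

def truth_table(f: Callable[[int], int]) -> List[Tuple[int, int]]:
    """Return the (in, f(in)) pairs for in = 0 and in = 1."""
    return [(x, f(x)) for x in (0, 1)]

if __name__ == "__main__":
    for name, f in UNARY_FUNCTIONS.items():
        print(name, truth_table(f))
```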
[Sheet 8 of 10. FIG. 9: drawing not reproduced. Labels: Vdd, Gnd, enin1, enin2, in1, in2, out, enout1, enout2; reference numeral 90.]
[Sheet 9 of 10. FIG. 10: drawing not reproduced. Labels: switch1 through switch5, en, enout1, enout2, out; reference numeral 90.]
[Sheet 10 of 10. FIG. 11: drawing not reproduced. Panels: instance A (en=1, in1=0) and instance B (en=1, in1=1), each showing the signal values on switch1 through switch5; reference numeral 90.]
PROBABILISTIC AND INTROVERTED 
SWITCHING TO CONSERVE ENERGY IN A 
DIGITAL SYSTEM 
CROSS REFERENCE TO RELATED 
APPLICATION 
This application claims priority to U.S. provisional appli-
cation entitled, "Randomized Computing Elements And 
Their Applications," having Ser. No. 60/565,748, filed Apr. 
27, 2004, which is entirely incorporated herein by reference. 
STATEMENT REGARDING FEDERALLY 
SPONSORED RESEARCH OR DEVELOPMENT 
This invention was made with Government support under
contract number F30602-02-2-0124, awarded by the U.S.
Air Force. The U.S. Government has certain rights in and to
this invention.
TECHNICAL FIELD 
The present disclosure pertains to computer processors, 
and more specifically, to a system and method for probabi-
listic determinations of binary switch states and introverted 
switching to reduce energy consumption and achieve accel-
erated execution times. 
BACKGROUND 
The use of portable electronic devices has exploded in 
recent years. As shown in FIG. 1, individuals today carry 
portable electronic devices such as wireless phones 18, 
PDAs, music players 20, laptop computers 12, and other 
similar devices. These devices may be battery powered, 
which means that each device may operate for a finite period 
of time before its respective battery is exhausted. Even with 
improved battery technologies, it is not uncommon for the 
battery life for such portable electronic devices to be any-
where from a few hours to a few days. 
As a nonlimiting example, a person taking a cross-country 
flight from New York to Los Angeles desiring to utilize 
music player 20 for the long flight may need to recharge the 
player 20 before returning back to the East Coast on the 
return flight. Thus, the individual would need to pack the 
requisite accompanying equipment to recharge the player's 
battery prior to the return flight, in this nonlimiting example, 
as the music player 20 otherwise would not have the battery 
life to operate for both the outgoing and return flights. This 
is but one nonlimiting example, as one of ordinary skill in 
the art would readily know of other examples wherein the 
battery life of such portable electronic devices may lead to 
periodic recharging. 
These portable electronics may include one or more 
processors, which may be configured to consume a signifi-
cant amount of battery energy, thereby leading to a relatively 
short battery life. At least one reason that a processor may 
prematurely exhaust a battery may be related to the fact that 
processors of today are oftentimes configured to make 
certain that each data bit processed is either a 0 or 1 at each 
step of a calculation. In making attempts to ensure a par-
ticular value as being a 0 or a 1, a processor may consume 
a significant amount of energy by holding that particular 
state at the 0 or 1. Furthermore, a processor containing 
millions of switches may consume a significant amount of 
the battery's energy in ensuring the accuracy of each 
switch's change of state from either 1 to 0 or 0 to 1. So in 
ensuring the accuracy of each switch as it changes state, a 
typical processor may consume additional battery energy 
that may otherwise be used for extended processing time. 
Moreover, one of ordinary skill would know that some 
quantifiable level of battery energy may be lost due to 
leakage currents in the processor's transistor switches. Leak-
age current refers to the amount of current that flows through 
the transistor switch when there is no switching action. So 
for the millions of transistors in a typical processor chip, the 
aggregated leakage may significantly reduce battery life.
FIG. 1, as referenced above, is a diagram of a nonlimiting 
exemplary group of devices that may operate on battery 
power and/or may otherwise be configured for power con-
servation operations. In this nonlimiting example of FIG. 1, 
processor chip 10 may be included in each of laptop 12,
videocamera 14, television 16, wireless telephone 18, and 
music player 20. These devices shown in FIG. 1 are but 
nonlimiting examples, as one of ordinary skill in the art 
would know of additional nonlimiting examples which may 
also have a processor similar to processor 10 of FIG. 1.
Nevertheless, in this nonlimiting example, exploded win-
dow 24 depicts four transistor switches 26-29 of a poten-
tially much larger number, which may be contained in 
processor 10. Stated another way, transistor switches 26-29 
may comprise four of the millions of transistor switches
resident in the processor 10 chip. 
As the transistor switches 26-29 switch states from 0 to 1 
and/or 1 to 0, the battery resident in each of the portable 
devices of FIG. 1 may be more quickly drained if processor 
10 is configured so as to calculate with a greater degree of
certainty for each transistor to change its status from a 0 to 
1 or a 1 to 0. Moreover, as stated above, the leakage current 
for each transistor 26-29, as well as the rest of the millions 
of transistors on processor 10, may be aggregated to a 
significant amount such that the battery in each of devices
12, 14, 16, 18 and 20 of FIG. 1 may be caused to expire more 
quickly. 
Furthermore, in compliance with Moore's Law, which 
states that silicon power doubles approximately every 18 
months to 2 years, the number of transistors resident on
processor 10 will likely increase over time as the size of the 
transistors decreases. As a nonlimiting example, if semicon-
ductor engineers reduce the size of transistors by only 
approximately 10% a year, a twofold increase in the number 
of transistors on a chip will likely be realized every 18
months to 2 years. 
Even on processor 10, transistors on the chip are not 
necessarily identical to each other. As transistors become 
smaller and smaller, variability between transistors steadily 
increases. As the various transistors on processor 10 may
look different electrically, the result may be realized in 
haphazard variations in performance of processor 10. As a 
result, processor 10 may actually become more unreliable as 
an increasing number of transistors are included in processor
10, thereby resulting in an untrustworthy quotient in each of
laptop 12, camera 14, television 16, wireless phone 18, and
music player 20, all of FIG. 1.
The amount of heat generated by a processor is a limita-
tion for processors configured with a large number of 
transistors, as well as processors configured for high speed
operation and calculations. Thus, to avoid melting the cop-
per circuit lines within the processor 10, such chips may be 
fitted with speed limiters. Speed limiters may be configured 
to prevent a processor 10 chip from calculating numbers as 
fast as it otherwise may. Nevertheless, the heat generated in
processor 10 may be attributed to the deterministic design
of the chip itself, that is, to ensure data bits at 1s or 0s.
Prior attempts have been made to conserve battery life and also to reduce the heat generated by processor 10; however, results have been mixed. In one nonlimiting example, chip designers have created schemes that activate certain blocks of transistors used for particular calculations and deactivate the remaining blocks of transistors of the processor 10 so as to conserve battery energy and to reduce heat from a lesser number of transistors that are actually used. Here, the activation of blocks of transistors refers to triggering these blocks of transistors for receipt and evaluation of their inputs in order to produce the corresponding new output data. The deactivation of blocks of transistors, by contrast, refers to preventing these blocks of transistors from receiving and hence evaluating their inputs. Therefore, once the blocks of transistors are deactivated, they do not perform any switching action and hence hold their previous states.
While this solution is an improvement over prior designs which simply activate all transistors of a processor 10, thereby consuming a greater amount of energy of a device's battery, this solution still fails to adequately preserve a battery. In this instance, a substantial number of transistors even in the blocks that are activated are still unused, thereby essentially wasting valuable battery energy. Plus, this solution does not account for the energy consumed due to the deterministic approach to ensure the accuracy of each calculation as being either a 1 or a 0. Even activating select blocks of transistors on processor 10 still results in wasted battery energy and reduced battery life.
Moreover, due to decreased feature sizes, dopant fluctuations, thermal noise, increased sensitivity to V n capacitive/inductive noise, interconnect variations, crosstalk, power grid noise, tunneling noise, inter-/intra-die process variations as well as defects, etc., the devices at nano-scale can easily become destabilized and hence show randomized behaviors. To overcome noise effects and some of the aforementioned imperfections, energy consumption has been traded in for noise tolerance via increasing the operating supply voltage, which in turn increases the energy consumption. However, energy consumption and the associated thermal/heat dissipation are already major problems that are increasing with technology scaling. Therefore, it is difficult to fulfill low-power requirements while preserving a robust device operation. Stated another way, meeting low-power requirements increases the likelihood of unreliable device operations.
Such unreliable devices are likely to be more susceptible to noise, heat, defects, etc., which means that they may most likely be probabilistic, rather than deterministic, in nature. As the probabilistic behavior becomes the inevitable feature of these devices, processors will have to be made up of these unreliable devices while still operating at a desired performance level.
As a result, a heretofore unaddressed need exists to overcome the deficiencies and shortcomings described above.
DESCRIPTION OF THE DRAWINGS
Many aspects of this disclosure can be better understood with reference to the following drawings. The components in the drawings are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the present disclosure. Moreover, in the drawings, like reference numerals designate corresponding parts throughout the several views.
FIG. 1 is a diagram depicting various portable electronic devices that have an embedded processor.
FIG. 2 is a diagram of a computing element (CE), as may be utilized by the processor of claim 1.
FIG. 3 is a diagram of a random computing element (RCE) based in part on the CE of FIG. 2, but with I_2 replaced with input R.
FIG. 4 is a nonlimiting exemplary diagram of an implicit RCE, as may be implemented in the processor of FIG. 1.
FIG. 5 is a diagram of an alternative embodiment of the implicit RCE of FIG. 4, as may also be implemented in the processor of FIG. 1.
FIG. 6 is an alternative embodiment of an implicit RCE, as may be implemented in the processor of FIG. 1.
FIG. 7 is a diagram of an introverted switch, which may be included in the processor of FIG. 1.
FIG. 8 is a table diagram of the possible functions of the switch of FIG. 7.
FIG. 9 is a nonlimiting exemplary diagram of the introverted switch of FIG. 7.
FIG. 10 is a nonlimiting exemplary diagram of a 2-input AND gate circuit implemented using five introverted switches of FIG. 9.
FIG. 11 depicts the values of the input and output signals of all the switches of FIG. 10 when the inputs of the AND gate are in1in2=01 (in instance A) and in1in2=11 (in instance B).
DETAILED DESCRIPTION
In addition to the drawings discussed above, this description describes one or more embodiments as illustrated in the above-referenced drawings. However, there is no intent to limit this disclosure to a single embodiment or embodiments that are disclosed herein. On the contrary, the intent is to cover all alternatives, modifications, and equivalents included within the spirit and scope of this disclosure and as defined by the appended claims.
As described above, today's processors consume a significant amount of energy for deterministic calculations. As described above, deterministic calculations pertain to ensuring that each data bit is either a 1 or 0 at every step of a calculation. However, if an application does not need such certainty, it is possible that energy consumption may be significantly reduced. Moreover, in this instance, the running time of a particular application may actually be decreased, and hence, its performance increased. As further described above, with the decreasing size of transistors on processor chips, such as processor 10 of FIG. 1, and the increasing variability between the various transistors on the chip, the predictability of such transistors may likewise be considered to be decreasing. Nevertheless, processors may be configured in hardware to utilize the unpredictable nature of such transistors while still performing the intended calculations of an application.
Instead of ensuring with a high degree of certainty the data bits as being a 1 or a 0, an approach may be made for the validation of probabilistic bits instead of the conventional bits described above. A probabilistic bit, or PBIT, is similar to a conventional bit as described above, in that it takes on a 0 or 1 value. However, the certainty of the PBIT may be expressed with the probability of p. In using PBITs instead of conventional bits, a particular computing element may calculate a value with less energy, thereby extending the life of a battery or other power source for the processor.
Depending upon the desired probability p for a given calculation, a corresponding amount of energy may be associated with producing that probabilistic bit PBIT. For applications that may be suited for lower probability p values, a lesser amount of energy may be utilized such that battery life may be extended 10-fold, 100-fold, or even 1000-fold from current lifecycles.
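The behavior of a PBIT described above can be illustrated with a minimal software sketch. The names `pbit` and `empirical_correctness` are hypothetical, and a seeded pseudo-random generator stands in for the physical randomness that the disclosure obtains in hardware; the sketch models only the probability of correctness, not the energy involved.

```python
import random

def pbit(intended: int, p: float, rng: random.Random) -> int:
    """Return `intended` (0 or 1) with probability p, otherwise its complement.

    Illustrative model only: the randomness here comes from software.
    """
    if intended not in (0, 1):
        raise ValueError("intended must be 0 or 1")
    if not 0.5 <= p <= 1.0:
        raise ValueError("p is expected to lie in [0.5, 1]")
    return intended if rng.random() < p else 1 - intended

def empirical_correctness(intended: int, p: float, trials: int, seed: int = 0) -> float:
    """Fraction of trials in which the PBIT equals the intended value."""
    rng = random.Random(seed)
    hits = sum(pbit(intended, p, rng) == intended for _ in range(trials))
    return hits / trials

if __name__ == "__main__":
    # The observed fraction should be close to the requested p.
    print(empirical_correctness(1, 0.9, 20000))
```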
Probabilistic computing has been previously utilized in software. However, as computer technology reaches physical limits, physical considerations accordingly become an increased concern. By applying probabilistic algorithms previously utilized in software applications to the hardware side of an embedded processor with reliable probability values, the result may be reduced power consumption, as well as decreased running-time of the computations. Moreover, by identifying the specific transistors needed for certain operations, processor 10 may be configured to turn off individual transistors of the hundreds of millions of transistors on the processor 10 chip bed so as to further conserve battery energy. The result in each of these operations is a lower amount of dissipated heat as well as a faster operating processor, while obtaining desired quality of execution of a particular application.
Randomization is a mathematical technique that enables design of algorithms wherein at each step, a random event influences the next step to be executed. Randomized algorithms do not always execute predictably, as opposed to deterministic algorithms, which may be utilized in the processor as described above. The transistor switches of a processor may operate as randomized devices in that they may be on or off with the interpretation that if the switch is on, its value is 1, whereas if it is off, its value is 0, or vice versa. As described above, the switching of these randomized devices (the transistors of a processor) utilizes a measurable quantity of energy, which may be otherwise supplied by a battery of one or more of the portable electronic devices in FIG. 1.
The on and off states may be detected by observing the status change, which may be accomplished by measuring the voltage or current. The outcome of an energy expending action to turn the switch on or off is randomized in that it will occur with a fixed a priori probability p. The resulting output state of an on or off occurring with a probability p may be sustained with the same probability for the time it is switched, as a conventional switch. Consequently, the disclosure herein provides a system and method using randomness in such devices to achieve meaningful computation as well as to achieve performance in terms of faster execution times and reduced energy consumption.
More specifically, this disclosure describes a processor having one or more binary switches (i.e., transistors) configured to operate at a predetermined probability p that the logical value of each switch is correct. A supply voltage is coupled to the binary switches. A randomized signal detector is configured to detect a randomized signal, which may be amplified to a predetermined level if the randomized signal is low. A computing element outputs a probabilistic binary bit having a 0 or 1 with a predetermined probability value of being correct in correspondence with the supply voltage. Subsequently, an application executed by the processor receives the probabilistic binary bit for one or more additional operations. By operating on the probabilistic binary bit instead of a conventional bit as described herein, the processor consumes less energy and completes the execution of a particular application faster. For battery-powered portable electronic devices, use of a processor configured for probabilistic binary bits substantially lengthens battery life.
FIG. 2 is a diagram of a computing element (CE). The CE is an element that maps n inputs to m outputs where both n and m are greater than or equal to 1. In this nonlimiting example of FIG. 2, the output O_1 will be equal to I_1 if I_2 = 0. Likewise, output O_2 will be equal to I_1 if I_2 = 1. The CE of FIG. 2 may be a random computing element (RCE) if its outputs take on the value of 1 or 0 at random with a probability p, such that 0.5 < p < 1.
FIG. 3 is a diagram of the CE of FIG. 2, wherein I_2 of FIG. 2 is replaced with input R. In this nonlimiting example, input R will take the value 0 with a probability p and will take the value of 1 with the probability 1-p. Consequently, the output O_1 will be equal to I_1 with the probability of p, and O_2 will be equal to I_1 with the probability 1-p.
The element of FIG. 3 may actually be referred to as an RCE. If the randomness utilized by the RCE of FIG. 3 is obtained through the application of externally generated random bits to the RCE via its inputs or outputs, this result causes the RCE to be further characterized as an explicit RCE. An explicit RCE may generally include: (a) a dedicated component that generates the random output data bits; and (b) a dedicated component to which a random input is applied and which, in turn, produces the random output data bits. A nonlimiting example for an explicit RCE is a pseudo random number generator, as one of ordinary skill in the art would know.
However, if the randomness of an RCE is not externally generated, but is obtained through the utilization of the inherent randomness in the computing element, then this RCE may be characterized as an implicit RCE. FIG. 4 is a nonlimiting exemplary diagram of an implicit RCE 31.
The implicit RCE 31 of FIG. 4 is comprised of three portions. The first portion 34 relates to the generation of a random signal. In this nonlimiting example, resistor 37 is used as a device that interprets randomization physically. The random source, in this nonlimiting example, may be thermal noise in the resistor. One of ordinary skill in the art would know, however, that utilizing thermal noise as the noise source is but one of many possible nonlimiting examples. As an additional nonlimiting example, the circuit configuration of FIG. 4 may be configured to detect power supply noise, tunneling noise, interconnect noise, crosstalk noises, and other deep sub-micron noise sources.
In the nonlimiting example of FIG. 4, the thermal noise signal across resistor 37 is a Gaussian noise signal, and the mean square value of the thermal noise voltage across resistor 37 may be represented by
\[ \overline{v_n^2} = \frac{4hfR\,\Delta f}{\exp(hf/kT) - 1}, \]
where
k is Boltzmann's constant,
T is the absolute temperature,
R is the resistance,
h is Planck's constant, and
Δf is the frequency bandwidth over which noise is measured.
Where f << kT/h, the equation above may be approximated as
\[ \overline{v_n^2} \approx 4kTR\,\Delta f. \]
Likewise, the mean square value of the thermal noise current may be represented as
\[ \overline{i_n^2} \approx (4kT/R)\,\Delta f. \]
As a nonlimiting example, the rms noise voltage of a resistor having a resistance of 100 kΩ at room temperature and a frequency range of 1 MHz is 40.216 µV with an rms noise current of 0.402 nA. Thus, this nonlimiting example depicts a noise signal that is small and potentially undetectable by a computing element. Accordingly, detector/amplifier 39 may be coupled to resistor 37 and forms the second portion.
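The figures in the nonlimiting example above can be checked numerically with the low-frequency approximation of the thermal noise. The sketch below is illustrative only: the function name is hypothetical, and a room temperature of 293 K is an assumption, since the text does not state the temperature used.

```python
import math

K_BOLTZMANN = 1.380649e-23  # Boltzmann's constant in J/K

def thermal_noise_rms(resistance_ohm: float, bandwidth_hz: float, temperature_k: float):
    """Return (rms voltage in V, rms current in A) of resistor thermal noise.

    Uses the approximation valid for f << kT/h:
    mean square voltage = 4kTR*df, mean square current = (4kT/R)*df.
    """
    v_rms = math.sqrt(4.0 * K_BOLTZMANN * temperature_k * resistance_ohm * bandwidth_hz)
    i_rms = math.sqrt(4.0 * K_BOLTZMANN * temperature_k * bandwidth_hz / resistance_ohm)
    return v_rms, i_rms

if __name__ == "__main__":
    # 100 kOhm, 1 MHz bandwidth, 293 K (assumed room temperature)
    v, i = thermal_noise_rms(100e3, 1e6, 293.0)
    print(f"v_rms = {v * 1e6:.2f} uV, i_rms = {i * 1e9:.3f} nA")
```

Under these assumptions the result is roughly 40.2 µV and 0.402 nA, in line with the quoted values; the small difference in the last digits depends on the exact temperature and constant used.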
Detector/amplifier 39 detects the small noise signal, as illustrated above, and, as a nonlimiting example, may be configured as a differential amplifier that operates in the sub-threshold region. The detector/amplifier 39 may be configured with transistors 41-48 coupled as shown having a differential stage that operates as a gain stage and a push-pull output stage. In this nonlimiting example, the sub-threshold region of operation may be utilized to maintain a low power consumption for the detector/amplifier 39 as desirable. In sub-threshold operation of a MOS transistor, the amount of inversion layer charge on the transistor surface is lower than a threshold value. Due to this small inversion charge, the resulting drain current may also be small. Smaller drain current results in lower power dissipation, as the current that will be charging and discharging capacitors will also be small.
The output of detector/amplifier 39 is coupled to the input of the computing element 53, the third portion of implicit RCE 31. Computing element 53 is comprised of one input and one output O_1; therefore, the output of detector/amplifier 39 will follow the input of the computing element 53. More specifically, O_1 may be represented to be equal to the inverse of the input of the computing element 53. Thus, O_1 takes on a value of 0 or 1 at random due to the fact that the input of the computing element 53 is a randomly generated signal, as described above.
The probability that the value 0 or 1 is correct for the implicit RCE 31 of FIG. 4 may be calculated according to the following equation:
where V_m is the switching point of the inverter defined as the point at which the input voltage of the inverter is the same as the output voltage of the inverter. In this nonlimiting example, V_m may be represented as
\[ V_m = \frac{V_{dd} - |V_{Tp}| + \sqrt{\frac{\mu_n}{\mu_p}\cdot\frac{(W/L)_n}{(W/L)_p}}\; V_{Tn}}{1 + \sqrt{\frac{\mu_n}{\mu_p}\cdot\frac{(W/L)_n}{(W/L)_p}}} \]
where V_dd is the supply voltage of the inverter. V_Tp and V_Tn are the threshold voltages of the PMOS and NMOS transistors of computing element 53. Also, µ_n and µ_p are the average mobility of electrons and holes for the NMOS and PMOS transistors of the computing element 53, respectively. The (W/L)_n/(W/L)_p is the ratio of the aspect ratio of the NMOS transistor to the aspect ratio of the PMOS transistor.
Referring to the probability determination above, σ is the standard deviation of the Gaussian noise signal (i.e., σ is the rms value of the Gaussian noise signal), and erf is the well-known error function, which may be represented as
\[ \operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-u^2}\,du. \]
For this implicit RCE implementation, as shown in FIG. 4, it is evident from the equations above that p is related to the supply voltage (V_dd) and the standard deviation (σ) of the noise. Thus, p may be varied through changing V_dd or σ. As a nonlimiting example, if V_m = V_dd/2, with V_dd = 5 V and σ = 0.36 V, p for a data bit may be calculated to be 0.999999999999810. If V_dd = 0.5 V, probability p for the data bit may be calculated to be 0.75629823516199. Depending on the needs of the application, V_dd can be set accordingly.
FIG. 5 is a diagram of an alternative embodiment of the implicit RCE 31 of FIG. 4. In this implementation of FIG. 5, RCE 60 is configured as a 6-transistor SRAM cell, which may be a part of a VLSI chip, as one of ordinary skill in the art would know. The source of randomness is the noise caused by I/O signaling. The noise on I/O pad 62 is coupled via capacitor 65 to one of the input points 67 of cross-coupled inverter 70. One of ordinary skill in the art would know, additionally, that the Gaussian assumption on noise distribution may also be realized for I/O signaling. Thus, in this nonlimiting example of FIG. 5, Gaussian distribution for noise caused by I/O signaling is assumed.
At any point in time, cross-coupled inverter 70 may be connected to either one of the supply voltages (V_ddH or V_ddL) depending upon the value of the one-bit rand-mem signal. When rand-mem is equal to 1, cross-coupled inverter 70 is connected to V_ddH. Otherwise, cross-coupled inverter 70 is connected to V_ddL when the rand-mem signal is equal to 0.
In this implicit RCE 60 implementation, it may be assumed that V_ddH is greater than 10 times the standard deviation of the noise on I/O pad 62. Since the noise is affecting one of the inverters 72, 74 in cross-coupled inverter 70, the probability equation set forth above may still be utilized for this implicit RCE 60 implementation. Consequently, probability p of this implementation 60 may be manipulated by changing the supply voltage, the standard deviation of noise, or both. Based on the probability equation set forth above, the probability p of PBIT being correct is at least 0.99999971334843 if V_ddH is greater than 10 times the standard deviation of the noise at I/O pad 62.
It is assumed that V ddL is no more than twice the standard 
deviation, which means that the probability p is at most 
0.84134474606854. Consequently, when V ddH is connected 
50 to cross-coupled inverter 70, the result corresponds to highly 
reliable operation. Conversely, the presence of V ddL con-
nected to cross-coupled inverter 70 results in unreliable 
operation, which may further be characterized as random-
ized operation. However, a particular application with 
55 probabilistic-based computations may be executed with a 
lower probability level while conserving battery life and/or 
generating less heat in the processor. 
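The two probability bounds quoted for VddH and VddL can be checked numerically. The following is a minimal sketch, not the patented implementation: it assumes that the probability of a correct bit takes the Gaussian form p = 1/2 + (1/2) erf(Vm / (sqrt(2) σ)) and that the inverter is symmetric, so that Vm = Vdd/2. The function name `correct_probability` and the variable names are hypothetical.

```python
import math


def correct_probability(v_m: float, sigma: float) -> float:
    """Probability that the output bit is correct.

    Assumed model: zero-mean Gaussian noise with standard deviation
    `sigma` and an inverter switching point `v_m` (same units).
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return 0.5 + 0.5 * math.erf(v_m / (math.sqrt(2.0) * sigma))


sigma = 1.0               # arbitrary noise level (hypothetical value)
v_dd_high = 10.0 * sigma  # VddH at its assumed lower bound of 10 sigma
v_dd_low = 2.0 * sigma    # VddL at its assumed upper bound of 2 sigma

# Assumption: symmetric inverter, so the switching point is Vdd / 2.
p_high = correct_probability(v_dd_high / 2.0, sigma)
p_low = correct_probability(v_dd_low / 2.0, sigma)

print(f"p at VddH = 10 sigma: {p_high:.14f}")
print(f"p at VddL =  2 sigma: {p_low:.14f}")
```

Under these assumptions the two printed values agree with the bounds 0.99999971334843 and 0.84134474606854 stated in the text, which suggests that the quoted figures correspond to a switching point at half the supply voltage.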
The use of this dual supply voltage scheme may be explained in the
context of the randomized algorithms. In some of the randomized
algorithms such as random routing, genetic algorithms, simulated
annealing, and random sorting, random numbers that are generated are
typically stored, as further use of the random numbers is needed.
Yet, it may not be necessary to generate a random bit and then write
it into a memory cell, such as the 6-transistor SRAM cell in FIG. 5.
The random bit need not be written to memory because the first read
from a
random memory element may also serve the purpose of generating the
random bit. When cross-coupled inverter 70 is coupled to VddL, which
means that the random memory signal is equal to 0, there is a random
value stored in the memory cell. The I/O signaling fluctuations of
the output of inverter 72 cause the storage of a random value in the
memory cell. If the random memory signal is set to 1 before a read
access to this memory, the random bit will be read (generated). As
long as the rand-mem signal is 1, the random bit will be reliably
stored in the memory cell. Hence, the generation and storage of a
random bit can be realized through the first read from the random
memory element by means of the dual supply voltage scheme.
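The read-to-generate behavior of the random memory element can be illustrated with a small behavioral model. This is only a sketch under simplifying assumptions: the class `RandomMemoryCell` and its methods are hypothetical, the low-supply state is modeled as a fair coin (the text only bounds the correctness probability at VddL), and no circuit-level effect is simulated.

```python
import random


class RandomMemoryCell:
    """Toy model of a memory cell with a rand-mem supply select.

    rand_mem == 0: cell is on the low supply, its content fluctuates.
    rand_mem == 1: cell is on the high supply, its content is held.
    """

    def __init__(self, rng=None):
        self.rng = rng or random.Random()
        self.rand_mem = 0
        self._value = self.rng.getrandbits(1)

    def set_rand_mem(self, level: int) -> None:
        if level not in (0, 1):
            raise ValueError("rand-mem must be 0 or 1")
        if self.rand_mem == 0 and level == 1:
            # Assumption: the noisy value present at the moment the
            # supply is raised is the one that gets held.
            self._value = self.rng.getrandbits(1)
        self.rand_mem = level

    def read(self) -> int:
        if self.rand_mem == 0:
            # Unreliable mode: every read observes a fresh noisy value.
            self._value = self.rng.getrandbits(1)
        return self._value


cell = RandomMemoryCell(random.Random(0))
cell.set_rand_mem(1)       # raise the supply before the first read
first_read = cell.read()   # this read "generates" the random bit
later_reads = [cell.read() for _ in range(100)]
```

In this model no separate write is needed: the first read after rand-mem is set to 1 yields the random bit, and every later read returns the same value for as long as rand-mem stays at 1.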
FIG. 6 is an alternative embodiment of an implicit RCE 78. In this
nonlimiting example, the implicit RCE 78 comprises a plurality of
single electron transistors (SETs) 81-84. The implicit RCE 78 of
FIG. 6 is a single-electron switch with differential inputs. In this
switch 78, there are four tunnel junctions 81-84 and three
capacitances 89-91.
The input voltage X may be applied through capacitor 90, while a
complement of X may be applied through capacitor 91. If the input
voltage X is an appropriate positive voltage, then the input value
may be interpreted as a binary 1 in the 1 branch. However, if X is a
proper negative voltage, then the input value may be a binary 0 on
the 0 branch.
Tunneling through junctions 81 and 83 is controlled by the charge on
capacitors 89-91, the values of X and X̄, and the value of the clock
voltage φ. As a nonlimiting example, electron 87 may tunnel through
single electronic junction 81 toward island 93 if the charge on the
right side of single electronic junction 81 is more positive than a
charge on the left side of single electronic junction 81, wherein
the charge on the right side is dependent upon the capacitance of
capacitor 90, the value of the input voltage X, and the value of the
clock voltage φ. Similarly, the charge on the left side of single
electron junction 81 is dependent upon the capacitance of capacitor
90, the value of the complementary input voltage X̄, and the value
of clock voltage φ. When an electron is supplied at the entry
branch, it follows the path A→B→D (the 1 branch) if X has a positive
value. Likewise, an electron follows the path A→C→E (the 0 branch)
if X has a negative value. Thus, if an electron follows the A→B→D
path, it may be interpreted as switching to 1. If an electron 87
follows the path A→C→E, it may be interpreted as switching to 0.
Tunneling is a probabilistic phenomenon, and the waiting time for
the expected tunneling is not fixed. For the device operation
without error, the clock duration should be sufficiently long. The
probability that the waiting time for the expected tunneling will be
longer than the clock period (tCLK) is given by
$$P_e = e^{-\Gamma_1 t_{CLK}}$$
where Γ1 is the mean tunneling rate. The probability for incorrect
operation may be represented as Pe, whereas the probability for
correct operation in the implementation of this implicit RCE 78 of
FIG. 6 may be represented as p=1-Pe. Thus, the equation above
depicts that the probability p may be varied by changing the clock
period tCLK.
The implicit computing elements described above generate
probabilistic bits such that applications or algorithms may exploit
the probabilistic nature of these bits to execute probabilistic
computations faster and with less energy. Stated another way, using
one or more of the circuit implementations described above (which
one of ordinary skill in the art would know could be implemented
across an entire processor core) enables an application running on
such processor to still be able to properly execute with an
increased but tolerable amount of error. As a nonlimiting example,
applications configured for such processors may be configured to
tolerate error up to approximately 25%, which may equate to a
significant power reduction in the processor. As a nonlimiting
example, if the power reduction were on the order of a factor of 10,
a normal battery cycle of 20 hours may be extended to a period of
200 hours. In application, energy improvements may be realized by a
factor of 30 or 40 and, in other applications, perhaps even by a
factor of 1000, depending upon the desired operational quality for
the application.
As a nonlimiting example, a soldier on a battlefield equipped with a
battery-powered electronic device may be able to switch processor 10
to do probabilistic computations so as to conserve battery power
while still obtaining the desired substantive battlefield
information. As a nonlimiting example, a soldier's display, which
may otherwise provide topology and other data, may be configured to
show more simplistic drawings in the probabilistic computation mode,
as described above, but still show the data desired by the soldier.
As an additional nonlimiting example, the soldier may switch the
processor into even a further power reduction mode such that even
less auxiliary data is provided so that the processor may use
probabilistic computations to still provide the substantive
battlefield data the soldier desires. In this nonlimiting example,
the battery life of the battery-powered electronic device utilized
by the soldier may be extended while the soldier is deployed in the
battlefield, unable to otherwise charge or replace the battery. So,
by configuring the processor for probabilistic calculations rather
than ensuring absolute accuracy of data bits, the soldier may remain
on the battlefield longer with valuable battlefield information.
The nonlimiting implementation examples for implicit RCEs above can
be viewed as building blocks or switches used to build more
complicated processors that are composed of switches. These
processors are built out of millions of switches that can be on or
off. In the ideal case, these switches should be enabled when needed
and should be disabled otherwise. A processor can be configured to
turn off portions of the chip in order to conserve battery energy
and/or reduce heat generated by the enabled but idle switches.
Instead of just deactivating blocks of switches, which leaves other
blocks on, but still having idle switches, the processor may be
configured to activate or deactivate individual switches. This
provides an improved resolution in deactivating idle or unneeded
switches, which would otherwise waste battery energy.
Yet to accomplish this operation, the gate, or switch, has to be
configured for such operation. In this instance, the switch may be
configured to have an introverted sleep state, which may be
activated on command. The switch may be activated, or awakened, on
command when needed for operation.
FIG. 7 is a diagram of an introverted switch 80. As shown in FIG. 7,
the switch 80 has four inputs: primary inputs (in1 and in2) and the
associated enable inputs (enin1 and enin2). The enable input enin1
is associated with input in1 and the enable input enin2 is
associated with input in2. The switch also has one primary output
(out) and two enable outputs (enout1 and enout2).
For a correct operation of the switch, either enable signal enin1 or
enable signal enin2, but not both, is asserted. If enin1 is asserted
(enin1=1 and enin2=0), then out will be a function, f, of the input
in1. If enin2 is asserted (enin1=0 and enin2=1), then out will be a
function, f, of the input in2.
The function f can be one of the four possible functions 
shown in FIG. 8, which is a table diagram of the possible 
functions. Specifically, the possible functions include: iden-
tity, complement, constant function 0 and constant function 
1. 
For a fixed function f, the introverted switch behaves as follows:
if enin1=1 and enin2=0, then
    out=f(in1)
    enout1=out
    enout2=out
else if enin1=0 and enin2=1, then
    out=f(in2)
    enout1=out
    enout2=out
else if enin1=0 and enin2=0, then
    out=0
    enout1=0
    enout2=0.
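The behavior listed above can be expressed as a short executable sketch. It follows the listing as printed, in which both enable outputs take the value of out; if the original drawing distinguishes the two enable outputs, that distinction is not recoverable from the text and is not modeled here. The names `FUNCTIONS` and `introverted_switch` are hypothetical, and raising an error when both enables are asserted is an assumption, since the text leaves that case undefined.

```python
from typing import Callable, Tuple

# The four single-input functions named in the description of FIG. 8.
FUNCTIONS = {
    "identity": lambda x: x,
    "complement": lambda x: 1 - x,
    "const0": lambda x: 0,
    "const1": lambda x: 1,
}


def introverted_switch(
    f: Callable[[int], int], in1: int, in2: int, enin1: int, enin2: int
) -> Tuple[int, int, int]:
    """Return (out, enout1, enout2) for one introverted switch."""
    if enin1 == 1 and enin2 == 0:
        out = f(in1)
    elif enin1 == 0 and enin2 == 1:
        out = f(in2)
    elif enin1 == 0 and enin2 == 0:
        # Sleep state: the switch is idle and drives all outputs low.
        return (0, 0, 0)
    else:
        # Assumption: asserting both enables is treated as an error.
        raise ValueError("enin1 and enin2 must not both be asserted")
    # As printed in the listing, both enable outputs follow out.
    return (out, out, out)


awake = introverted_switch(FUNCTIONS["complement"], 0, 1, 1, 0)
asleep = introverted_switch(FUNCTIONS["complement"], 0, 1, 0, 0)
```

With the complement function and only enin1 asserted, the switch evaluates f(in1); with neither enable asserted it stays in the sleep state regardless of its primary inputs.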
FIG. 9 is a nonlimiting exemplary diagram of the introverted switch
80 of FIG. 7. The function f of the switch 80 is the complement
function, as one of ordinary skill in the art would know.
One other property of an introverted switch is that its energy
consumption due to leakage is also very low. Referring back to
FIG. 9, the Vdd and Gnd lines are isolated in terms of subthreshold
leakage current paths. One Vdd-Gnd subthreshold leakage path exists
due to the inverters producing out and enout2 values, and this path
is isolated by the transistors M1, M2, M3 and M4 that help minimize
the leakage current. Another way to minimize the leakage energy
consumption is by using a high-VT (high threshold-voltage)
transistor type for M1 and M3.
Using introverted switches, logic operations such as AND, NAND, OR,
NOR, and many others can be implemented, as one of ordinary skill
would know. FIG. 10 is a diagram of a 2-input AND gate circuit 90
implemented using five introverted switches 80 of FIG. 7. In this
figure, the function f of the switches switch1, switch2 and switch5
is the identity function, the function f of the switch switch3 is
the constant function 0, and the function f of the switch switch4 is
the constant function 1.
This AND gate switch 90 implementation is a naturally energy-aware
design, as one of ordinary skill in the art would know. FIG. 11 is a
nonlimiting example of two instantiations A and B of the circuit 90
of FIG. 10. Instance A of FIG. 11 depicts the values of the input
and output signals of all the switches when the inputs of the AND
gate are in1in2=01. Instance B depicts the case when the inputs of
the AND gate are in1in2=11. The shaded regions in these two
instances include the switches that are active for the given input.
As a nonlimiting example, in instance A, the switches switch1,
switch3 and switch5 are active, whereas the switches switch2 and
switch4 are idle.
Therefore, for a given input, there exists a unique path that
includes active switches. The remaining switches outside this path
are idle. This unique path property allows a large portion of the
network to be quiescent and enables on-demand energy consumption.
This technique is similar to a guarded evaluation technique, as one
of skill in the art would know. A guarded evaluation technique is
applicable for coarse-grained circuit building blocks. In the
introverted-switch-based designs, however, the on-demand energy
consumption is at the switch-level granularity, and hence, the
energy savings are significant.
It should be emphasized that the above-described embodiments and
nonlimiting examples are merely possible examples of
implementations, merely set forth for a clear understanding of the
principles disclosed herein. Many variations and modifications may
be made to the above-described embodiment(s) and nonlimiting
examples without departing substantially from the spirit and
principles disclosed herein. All such modifications and variations
are intended to be included herein within the scope of this
disclosure and protected by the following claims.
We claim:
1. A method for conserving an amount of energy consumed by a
processor, comprising the steps of:
supplying the processor with a predetermined supply voltage;
detecting a randomized signal, wherein the detected randomized
signal is amplified to a predetermined level if the detected
randomized signal is below the predetermined level; and
providing an output data bit having a logical value of a 1 or 0 from
one or more switches in the processor according to a probability
value that the output data bit is correct, the probability value
being established in association with the randomized signal and the
predetermined supply voltage, wherein an application executed by the
processor uses the output data bit for one or more additional
computations.
2. The method of claim 1, wherein the predetermined supply voltage
is adjustable to a plurality of levels, wherein the probability
value of the output data bit being correct increases or decreases in
correspondence as the predetermined supply voltage is changed
between the plurality of levels.
3. The method of claim 2, wherein the level predetermined supply
voltage is configurable by a user of a device utilizing the
processor according to an input received from the user.
4. The method of claim 2, wherein the level predetermined supply
voltage is selected by an application being executed by the
processor so as to increase or decrease the probability of an output
data bit being correct according to a predetermined value.
5. The method of claim 1, wherein the randomized signal is amplified
to a predetermined level of a plurality of levels, wherein the
probability value of the output data bit being correct increases or
decreases in correspondence as the amplified level of the randomized
signal.
6. The method of claim 5, wherein the amplified level of the
randomized signal is configurable by a user of a device utilizing
the processor according to an input received from the user.
7. The method of claim 5, wherein the amplified level of the
randomized signal is determined according to an application being
executed by the processor so as to increase or decrease the
probability of correctness of an output data bit utilized during
application execution.
8. The method of claim 1, wherein the detected randomized signal
corresponds to thermal noise.
9. The method of claim 1, wherein the detected randomized signal
corresponds to noise generated by a random signal generator.
10. The method of claim 1, wherein the detected randomized signal
corresponds to input/output signaling noise.
11. The method of claim 1, wherein the processor is utilized in a
portable electronic device that is battery-powered, wherein the life
of the battery is extended when the determined probability is
reduced.
12. The method of claim 1, further comprising the steps of:
asserting a first enable input signal coupled to a switch in the
processor in association with a first input to the switch, wherein a
second enable input signal coupled to the switch in association with
a second input to the switch is nonasserted;
asserting the second enable input signal coupled to the switch,
wherein the first enable input signal coupled is nonasserted; and
providing a primary output signal and first and second enable output
signals from the switch in accordance with the first and second
enable inputs.
13. The method of claim 12, wherein the first and second enable
input signals may be nonasserted as logical zeros so that the switch
is deactivated until one of the first and second enable input
signals is asserted as a logical one to activate the switch.
14. The method of claim 12, wherein the switch may be utilized as
part of a logical AND, NAND, OR, or NOR operation.
15. The method of claim 1, wherein the one or more switches may be
individually deactivated according to one or more enable inputs
coupled to each switch.
16. A method for conserving an amount of energy consumed by a
processor having switches comprised of one or more single electron
transistors, comprising the steps of:
supplying a clock signal;
establishing a randomized signal corresponding to a time for an
electron to tunnel through one or more single electron transistors;
determining a probability that an output of a switch in the
processor is correct in association with the randomized signal and
the clock signal; and
providing an output data bit from the switch having a logical value
of a 1 or 0 according to the determined probability, wherein an
application executed by the processor uses the output data bit for
one or more additional computations.
17. The method of claim 16, wherein a duration of the clock signal
is extended to increase the determined probability that the output
from the switch is accurate and shortened to decrease the determined
probability that the output from the switch is accurate.
18. The method of claim 17, wherein the duration of the clock signal
is established according to an input received from a user of a
device utilizing the processor.
19. The method of claim 17, wherein the duration of the clock signal
is established by an application being executed by the processor.
20. The method of claim 16, wherein the processor is utilized in a
portable electronic device that is battery-powered, wherein the life
of the battery is extended when the determined probability is
reduced.
21. A processor having one or more binary switches configured to
operate at a predetermined probability that the one or more binary
switch outputs is correct, comprising:
a predetermined supply voltage coupled to the one or more binary
switches;
a randomized signal detector configured to detect a randomized
signal;
an amplifier configured to amplify the randomized signal to a
predetermined level if the randomized signal is below the
predetermined level; and
a computing element configured to output a binary bit having a 0 or
a 1 with a predetermined probability value of being correct in
correspondence with the predetermined supply voltage, wherein an
application executed by the processor receives the output binary bit
for one or more additional operations.
22. The processor of claim 21, wherein the predetermined probability
value may be increased or decreased in association with an
adjustment of a level of amplification by the amplifier.
23. The processor of claim 21, further comprising:
an user-selectable input coupled to the binary switch and configured
to increase or decrease the predetermined supply voltage to an
adjusted level according to a user selection, the predetermined
probability value varying in proportion to the adjustment of the
predetermined supply voltage.
24. The processor of claim 21, wherein the processor is a component
of a portable electronic device powered by a battery.
25. The processor of claim 24, wherein the battery provides power to
the processor for a greater period of time when the predetermined
probability level is decreased and provides power to the processor
for a shorter period of time when the predetermined probability
level is increased.
26. The processor of claim 21, further comprising:
a resistor coupled to a binary switch and configured to produce the
randomized signal in correspondence with thermal noise in the
processor.
27. The processor of claim 21, wherein the detected signal
corresponds to noise in the processor.
28. The processor of claim 21, the one or more binary switches
further comprising:
a first and second enable input path, wherein the binary switch is
deactivated if the logical value of a signal on both the first and
second enable input paths is zero, and wherein the binary switch is
activated if the logical value of a signal on either the first and
second enable input paths is one.
29. The processor of claim 21, wherein the one or more binary
switches may be configured to execute AND, NAND, OR, or NOR logic
operations.
30. The processor of claim 21, wherein the one or more binary
switches may be individually activated or deactivated so that less
than a total number of switches are activated at a given time for
execution of an operation.
* * * * * 
