Reconfigurable Mixed-signal VLSI Implementation Of Distributed Arithmetic by Ozalevli, Erhan et al.
(12) United States Patent
Ozalevli et al. 
(54) RECONFIGURABLE MIXED-SIGNAL VLSI 
IMPLEMENTATION OF DISTRIBUTED 
ARITHMETIC 
(75) Inventors: Erhan Ozalevli, Atlanta, GA (US); 
Paul Hasler, Atlanta, GA (US); David 
Verl Anderson, Alpharetta, GA (US);
Walter Geeshan Huang, Greer, SC 
(US) 
(73) Assignee: Georgia Tech Research Corporation, 
Atlanta, GA (US) 
( *) Notice: Subject to any disclaimer, the term of this 
patent is extended or adjusted under 35 
U.S.C. 154(b) by 0 days. 
(21) Appl. No.: 11/465,192
(22) Filed: Aug. 17, 2006 
(65) Prior Publication Data 
US 2007/0040712 A1 Feb. 22, 2007
Related U.S. Application Data 
(63) Continuation-in-part of application No. 11/381,068, 
filed on May 1, 2006, now Pat. No. 7,280,063. 
(60) Provisional application No. 60/709,138, filed on Aug. 
17, 2005. 
(51) Int. Cl. 
H03M 1/66 (2006.01)
(52) U.S. Cl. ....................................... 341/144; 341/149
(58) Field of Classification Search ......... 341/118-170 
See application file for complete search history. 
(56) References Cited 
U.S. PATENT DOCUMENTS 
4,989,179 A 1/1991 Simko 
(10) Patent No.: US 7,348,909 B2
(45) Date of Patent: Mar. 25, 2008
5,235,273 A 8/1993 Akar et al. 
5,349,351 A 9/1994 Ohara et al. 
5,623,279 A 4/1997 Itakura et al. 
5,682,175 A 10/1997 Kitamura 
6,094,153 A 7/2000 Rumsey et al.
6,483,448 B2 11/2002 Martin et al. 
6,744,317 B2 * 6/2004 Kim et al. .................. 330/151 
6,958,947 B2 * 10/2005 Park et al. .................. 365/228 
OTHER PUBLICATIONS 
International Search Report for corresponding PCT Patent Appli-
cation No. PCT/US2006/032194 dated Apr. 20, 2007. 
* cited by examiner 
Primary Examiner-Lam T. Mai 
(74) Attorney, Agent, or Firm-James E. Schutz, Esq.; 
Seyed Kaveh E. Rashidi-Yazd, Esq.; Troutman Sanders LLP 
(57) ABSTRACT 
Disclosed herein is a reconfigurable mixed signal distributed 
arithmetic system including: an array of tunable voltage 
references operable for receiving a delayed digital input 
signal; a combination device in electrical communication 
with the array of tunable floating-gate voltage references 
that selectively combines an output of the array of tunable 
voltage references into an analog output signal; and a 
feedback element in electrical communication with the 
combination device, wherein the array of tunable voltages 
and the delayed digital input signal combine to perform a 
distributed arithmetic function and the reconfigurable mixed 
signal distributed arithmetic system responsively generates 
the analog output signal. 
18 Claims, 8 Drawing Sheets 
[Drawing Sheets 1-8]
Sheet 1: Figure 1A, Figure 1B (block diagrams; labels include x(t), S/H, sign, Reset, 1/2, y(nT))
Sheet 2: Figure 1C (timing diagram)
Sheet 3: Figure 2 (circuit diagram)
Sheet 4: Figure 3 (timing diagram; traces include shift register latch clock, data, CLK1, CLK2, CLK3, Invert, Reset)
Sheet 5: Figure 4 (epot circuit; Select input)
Sheet 6: Figures 5A, 5B, 5C (labels include CLK, Vin, Vref, Vout, C1, C2)
Sheet 7: Figures 5D, 5E
Sheet 8: Figures 6A, 6B (axis: Normalized Frequency (rad/sec))
RECONFIGURABLE MIXED-SIGNAL VLSI 
IMPLEMENTATION OF DISTRIBUTED 
ARITHMETIC 
This application claims priority of U.S. Provisional Patent Application No. 60/709,138, filed Aug. 17, 2005, and is a continuation-in-part of U.S. application Ser. No. 11/381,068, filed May 1, 2006, now U.S. Pat. No. 7,280,063, the entire contents and substance of which are hereby incorporated by reference.
BACKGROUND 
1. Field of the Invention

The present invention relates generally to mixed-signal distributed arithmetic, and more specifically to a reconfigurable mixed-signal very-large-scale integration (VLSI) implementation of distributed arithmetic.

2. Description of Related Art

The battery lifetime of portable electronics has become a major design concern as greater functionality is incorporated into portable electronic devices. The shrinking power budget of modern portable devices requires the use of low-power circuits for signal processing applications. These devices include, but are not limited to, flash-memory and hard-disk-based audio players. The data, or media, in these devices is generally stored in a digital format, but the output is still synthesized as an analog signal. The signal processing functions employed in such devices may include finite impulse response (FIR) filters, discrete cosine transforms (DCTs), and discrete Fourier transforms (DFTs), which have traditionally been performed using digital signal processing (DSP). DSP implementations typically use multiply-and-accumulate (MAC) units for the calculation of these operations, and as a result the computation time increases linearly as the length of the input vector grows.

In many other applications, the input data is analog rather than digital, while the output remains analog. Often, the processing for these applications does not require digital signal processing components and therefore does not require the analog input to be converted into a digital signal; such a conversion would waste power. Alternatively, the processing for these applications may occur at a point where a digital-to-analog signal processing component would not be appropriate. For such applications, an analog-to-analog signal processing component is preferred. Examples of these applications include, but are not limited to, signal processing for sensor networks, wireless communications, audio systems, hearing aids, and video systems.

Distributed arithmetic (DA) is an efficient way to compute an inner product, which is a common feature of the FIR filter, DCT, and DFT functions. DA computes an inner product in a fixed number of cycles, which is determined by the precision of the input data. In a traditional DA implementation, the inner product operation

$$y[n] = \sum_{i=0}^{K-1} w_i\, x[n-i] \qquad (1)$$

is done as follows. Let the input signal samples be represented as B-bit 2's complement binary numbers,

$$x[n-i] = -b_{i0} + \sum_{l=1}^{B-1} b_{il}\, 2^{-l}, \qquad i = 0, \ldots, K-1, \qquad (2)$$

where $b_{il}$ is the $l$th bit in the 2's complement representation of $x[n-i]$. Substituting equation (2) into equation (1) and swapping the order of the summations yields

$$y[n] = -\left[\sum_{i=0}^{K-1} b_{i0}\, w_i\right] + \sum_{l=1}^{B-1} \left[\sum_{i=0}^{K-1} b_{il}\, w_i\right] 2^{-l}. \qquad (3)$$

For a given set of $w_i$ ($i = 0, \ldots, K-1$), the terms in the square braces may take only one of $2^K$ possible values, which are stored in a lookup table (LUT). The DA computation is then an implementation of equation (3). Another way to interpret equation (1) is to represent the coefficients as B-bit 2's complement binary numbers,

$$w_i = -b_{i0} + \sum_{l=1}^{B-1} b_{il}\, 2^{-l}, \qquad i = 0, \ldots, K-1, \qquad (4)$$

where $b_{il}$ is the $l$th bit in the 2's complement representation of $w_i$. Substituting equation (4) into equation (1) and swapping the order of the summations yields

$$y[n] = -\left[\sum_{i=0}^{K-1} b_{i0}\, x[n-i]\right] + \sum_{l=1}^{B-1} \left[\sum_{i=0}^{K-1} b_{il}\, x[n-i]\right] 2^{-l}. \qquad (5)$$

Now the LUT contains all possible combination sums of the input signal samples $\{x[n], x[n-1], \ldots, x[n-K+1]\}$.

DA is computationally more efficient than the MAC-based approach when the input vector length is large. However, the trade-off for this computational efficiency is increased power consumption and area usage due to the use of a large memory. What is needed, therefore, is a mixed-signal circuit implementation that optimizes DA performance, power consumption, and area usage.

BRIEF SUMMARY

Disclosed herein is a reconfigurable mixed signal distributed arithmetic system including: an array of tunable voltage references operable for receiving a delayed digital input signal; a combination device in electrical communication with the array of tunable floating-gate voltage references that selectively combines an output of the array of tunable voltage references into an analog output signal; and a feedback element in electrical communication with the combination device, wherein the array of tunable voltages and the delayed digital input signal combine to perform a distributed arithmetic function and the reconfigurable mixed signal distributed arithmetic system responsively generates the analog output signal.
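The LUT-based evaluation of equation (3) can be sketched in a few lines of Python to make the bit-slice indexing concrete. This is a minimal illustration, not the patented circuit: it assumes each input is an exactly representable B-bit fraction in the format of equation (2), uses floating-point weights, and the helper names (`to_bits`, `build_lut`, `da_inner_product`) are hypothetical.

```python
def to_bits(value, B):
    """Bits [b_0, ..., b_{B-1}] of `value` in the format of equation (2):
    value = -b_0 + sum_{l>=1} b_l * 2**-l.
    Assumes value * 2**(B-1) is an integer in [-2**(B-1), 2**(B-1) - 1]."""
    n = round(value * 2 ** (B - 1))
    if not -(2 ** (B - 1)) <= n < 2 ** (B - 1):
        raise ValueError("value does not fit in B bits")
    n &= (1 << B) - 1  # two's complement bit pattern as an unsigned integer
    return [(n >> (B - 1 - l)) & 1 for l in range(B)]


def build_lut(w):
    """All 2**K partial sums of the weights; bit i of the address selects w[i]."""
    K = len(w)
    return [sum(w[i] for i in range(K) if (addr >> i) & 1) for addr in range(2 ** K)]


def da_inner_product(w, xs, B):
    """Equation (3): y = -LUT[sign-bit slice] + sum_l 2**-l * LUT[bit-l slice].
    xs holds [x[n], x[n-1], ..., x[n-K+1]]."""
    lut = build_lut(w)
    bits = [to_bits(x, B) for x in xs]  # bits[i][l] = b_il
    y = 0.0
    for l in range(B):  # exactly B iterations, independent of the data values
        addr = sum(bits[i][l] << i for i in range(len(xs)))
        y += -lut[addr] if l == 0 else lut[addr] * 2.0 ** -l
    return y


if __name__ == "__main__":
    w = [0.5, -0.25, 0.125]
    xs = [0.75, -0.5, 0.25]  # exactly representable with B = 4
    direct = sum(wi * xi for wi, xi in zip(w, xs))
    print(da_inner_product(w, xs, 4), direct)
```

The table has 2**K entries however large B is, while the loop count is fixed by B; this is the memory-versus-cycles trade-off the paragraph above describes.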
Also disclosed herein is a method for performing mixed 
signal distributed arithmetic including: receiving an analog 
input signal; storing the analog input signal in a plurality of 
storage elements; selectively combining the delayed/stored 
analog input signal; and responsively generating an analog 
output signal, wherein selectively combining the delayed 
analog input signal is performed with a plurality of digital 
circuit elements. 
Further disclosed herein is a method for performing 
mixed signal distributed arithmetic including: receiving a 
digital input signal; storing the digital input signal in a shift 
register; combining the digital input signal from the shift 
register with an array of tunable voltage references; and 
responsively generating an analog output signal. 
Also disclosed herein is a reconfigurable mixed signal distributed arithmetic system including: a plurality of storage elements operable for receiving a sampled analog input signal; a combination device in electrical communication with the plurality of storage elements that selectively combines an output of the plurality of storage elements into an analog output signal; and a feedback element in electrical communication with the combination device, wherein the combination device uses a plurality of digital circuit elements to selectively combine the output of the plurality of storage elements, and wherein the plurality of storage elements and the plurality of digital circuit elements combine to perform a distributed arithmetic function and the reconfigurable mixed signal distributed arithmetic system responsively generates the analog output signal.
These and other objects, features and advantages of the 
present invention will become more apparent upon reading 
the following specification in conjunction with the accom-
panying drawing figures. 
BRIEF DESCRIPTION OF THE DRAWINGS 
The subject matter that is regarded as the invention is 
particularly pointed out and distinctly claimed in the claims 
at the conclusion of the specification. The foregoing and 
other objects, features, and advantages of the invention are 
apparent from the following detailed description taken in 
conjunction with the accompanying drawings in which: 
FIGS. 1A-C are block and timing diagrams that illustrate mixed-signal DA systems in accordance with exemplary embodiments of the invention;
FIG. 2 is a circuit diagram illustration of a mixed-signal 
DA system in accordance with exemplary embodiments of 
the invention; 
FIG. 3 is a digital clock diagram corresponding to the DA 
system depicted in FIG. 2; 
FIG. 4 is a circuit diagram that illustrates a modified epot 
in accordance with an exemplary embodiment of the inven-
tion; 
FIGS. 5A-E are circuit diagrams that illustrate various components of the mixed-signal FIR filter depicted in FIG. 2; and
FIGS. 6A-B are graphs that illustrate the computational 
error of the mixed-signal DA system and the frequency 
response of the variance for symmetric offset error. 
The detailed description explains the preferred embodiments of the invention, together with advantages and features, by way of example with reference to the drawings.

DETAILED DESCRIPTION

Disclosed herein is a mixed-signal DA system built utilizing the analog storage capabilities of floating-gate (FG) transistors for reconfigurability and programmability. Referring now to FIG. 1A, a block diagram of a mixed-signal DA system 10 in accordance with exemplary embodiments is illustrated. The mixed-signal system 10 includes a shift register 12 for receiving and delaying a digital input signal, an array of tunable voltage references, or analog weights, 14, a combination device 16 for combining the weighted signals and generating an analog output signal, and a storage element 18. In exemplary embodiments, a delay element 18a, a scaling unit 18b, and a switch 18c are used in a feedback path of the mixed-signal DA system 10 for the DA computation. The switch 18c may be used to select between zero and the feedback path.

Turning now to FIG. 1B, a block diagram of another mixed-signal DA system 20 in accordance with exemplary embodiments is illustrated. The mixed-signal system 20 includes a plurality of storage elements 22 for receiving, delaying, and optionally weighting an analog input signal. The mixed-signal system 20 also includes a plurality of digital circuit elements 24 for selecting one or more desired stored input signals from at least a portion of the plurality of storage elements 22 and transmitting the desired stored input signals to a combination device 26. The mixed-signal system 20 includes a sample-and-hold element 28 that samples and stores an output of the combination device 26. In exemplary embodiments, an inverter 28a, a scaling unit 28b, and a switch 28c are used in a feedback path of the mixed-signal DA system 20 for the DA computation.

Continuing with reference to FIG. 1B, in one embodiment there are K+1 sample-and-hold circuits, so that while one sample-and-hold circuit is sampling the analog input, x(t), the other sample-and-hold circuits can be used for computing the output. Without the additional sample-and-hold circuit, the computation would have to wait for the sampling operation to complete before beginning. Each sample-and-hold circuit stores a time-delayed version of the input. This delay ranges from zero to nT, where n is the bit precision of the coefficients and T is the amount of time needed to compute the output. Rather than using a cascade of analog storage elements, in which each element represents how long the input has been delayed relative to x(t) and each element is fixed to a certain coefficient, this architecture views each storage element as an absolute time when the input was captured and generates the relative time delay by moving the coefficient to the appropriate storage element as time elapses.

FIG. 1C illustrates a timing diagram 30 corresponding to the mixed-signal DA system 20 depicted in FIG. 1B. The coefficient vector is stored digitally, so creating the time delay simply means storing the coefficients in a single shift register whose size is equal to n(K+1), where n is the bit precision of the coefficients and K is the number of elements in the coefficient vector. A single bit is shifted in the shift register every T/n interval to perform the DA computation and to ensure that each coefficient element is shifted into the next register in T time. Having one of the registers hold zero at time t ensures that the sample-and-hold circuit that is sampling is not used in the computation. Only K of the sample-and-hold circuits are used for computing the output, and which sample-and-hold circuits are used, and when, is determined by the coefficient vector and the current iteration of the computation. There are n iterations in one cycle of the computation.
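The bookkeeping described for FIGS. 1B and 1C, in which the data stay in place and the coefficients move, can be checked against an ordinary FIR delay line with a small behavioural model. This is a sketch under stated assumptions: analog samples are modeled as plain numbers, the n bit-serial DA iterations within each period T are collapsed into a single weighted sum, and the function names are hypothetical.

```python
from collections import deque


def rotating_coefficient_fir(x, w):
    """K+1 storage slots; sample n is written into slot n % (K+1) and stays there.
    The extended coefficient vector [0, w_0, ..., w_{K-1}] rotates by one slot per
    sample period, so the slot currently sampling always sees the zero entry.
    The output at step n is sum_i w_i * x[n-1-i], i.e. y[n-1] of a direct FIR."""
    K = len(w)
    ext = [0.0] + list(w)  # ext[d]: coefficient for a sample captured d periods ago
    # Coefficient assignment at n = 0: slot s is (0 - s) % (K+1) periods old.
    coef = deque(ext[(-s) % (K + 1)] for s in range(K + 1))
    slots = [0.0] * (K + 1)  # held samples, initially zero
    out = []
    for n, sample in enumerate(x):
        if n > 0:
            coef.rotate(1)  # move every coefficient to the next storage element
        slots[n % (K + 1)] = sample  # the sampling S/H captures x[n]
        out.append(sum(c * v for c, v in zip(coef, slots)))
    return out


def direct_fir(x, w):
    """Reference: y[n] = sum_i w_i * x[n-i] with zero initial conditions."""
    return [sum(w[i] * x[n - i] for i in range(len(w)) if n - i >= 0)
            for n in range(len(x))]


if __name__ == "__main__":
    print(rotating_coefficient_fir([1, 0, 0, 0, 5], [1, 2, 3]))
    print(direct_fir([1, 0, 0, 0, 5], [1, 2, 3]))
```

The model lags the direct filter by one sample period because the newest sample is still being acquired while the other K slots are combined, which is the role the text gives to the extra sample-and-hold circuit.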
The compact size of the mixed-signal DA systems 10 and 20 is obtained through the iterative nature of the DA computational framework, in which many multipliers and adders are replaced with an addition stage, a single gain multiplication, and a coefficient array. Low-power implementations of these filters can readily ease the power consumption requirements of portable devices. Also, due to the serial nature of the DA computation, the power and area of this filter increase linearly with its order. Hence, the mixed-signal DA systems 10 and 20 allow for a compact and low-power implementation of high-order FIR filters, DCT, and DFT functions.
Referring now to FIG. 2, a circuit diagram of an exemplary embodiment of yet another mixed-signal DA system 100 is illustrated. The mixed-signal DA system 100 consists of four base components: a shift register 102, an array of tunable FG voltage references (epots) 104, inverting amplifiers (AMP) 106, and sample-and-hold (SH) circuits 108. Digital inputs are introduced to the mixed-signal DA system 100 by using the shift register 102. These digital input words represent the digital bits, $b_{ij}$ in equation (2), which select the epot 104 voltages to form the appropriate sum of weights necessary for the DA computation at the $j$th bit. The clock frequency of the shift register 102 depends on the input data precision, K, and the length of the filter, M, and is equal to M·K times the sampling frequency. Once the $j$th input word is serially loaded into the top shift register, the data from this register is latched at K times the sampling frequency. Alternatively, M shift registers could be used to feed the digital input data, reducing the clock to K times the sampling frequency; a clock that is K times faster than the sampling frequency would preferably be used for this ideal configuration. In one embodiment, where the amount of area used by the shift registers is not a design concern, an M-tap FIR filter could have M shift registers.

The analog weights of DA are stored by the epots 104. For a more thorough discussion of the configuration and operation of the epot 104, see U.S. patent application Ser. No. 11/381,068, "Programmable Voltage-Output Floating-Gate Digital to Analog Converter and Tunable Resistors." When selected, these weights are added by employing a charge amplifier structure 116 composed of same-size capacitors and a two-stage amplifier 106, AMP1. The epot 104 voltages, as well as the rest of the analog voltages in the system, may be referenced to a reference voltage 118, Vref = 2.5 V. Since the addition operation is performed by using an inverting amplifier, the relative output voltage, when the Reset 120 signal is enabled, becomes equal to the negative sum of the selected weights for $C_{in_i} = C_{FBamp1}$. For the first computational cycle, the result of the addition stage represents the summation

$$\sum_{i=0}^{M-1} w_i\, b_{i(K-1)},$$

which is the addition of weights for the LSBs of the digital input data.

A delay element 110, an inverter 112, and a divide-by-two element 114 may be used in the feedback path of the mixed-signal DA system 100 for the DA computation. In one embodiment, sample-and-hold circuits, SH1 124 and SH2 126, and inverting amplifiers, AMP1 106 and AMP2 122, are employed in the feedback path. The SH1 124 and SH2 126 circuits store the amplifier output to feed it back to the mixed-signal DA system 100 for the next cycle of the computation. Non-overlapping clocks, CLK1 128 and CLK2 130, are used to hold the analog voltage while the next stream of digital data is introduced to the addition stage. In one embodiment, these clocks have a frequency of K times the sampling frequency. The stored data is then inverted relative to the reference voltage, Vref 118, by the second inverting amplifier, AMP2 122, to obtain the same sign as the summed epot voltages. Ideally, AMP2 122 is identical to AMP1 106 and has the same size input/feedback capacitors. After the delay and the sign correction, the stored analog data is fed back to the addition stage as delayed analog data. During the addition, it is also divided by two by using $C_{FB} = C_{FBamp}/2 = C/2$, which gives a gain of 0.5 when it is added to the new sum. This operation is repeated until the MSBs of the digital input data are loaded into the shift register 102. The MSBs correspond to the (K-1)th bits and are used to make the computation 2's-complement compatible. This compatibility is achieved by disabling the inverting amplifier in the feedback path during the last cycle of the computation by enabling the Invert signal. As a result, during the last cycle of the computation, the relative output voltage of AMP1 106 becomes

$$V_{out1} - V_{ref} = -\sum_{i=0}^{M-1} \frac{C_{in_i}}{C_{FBamp1}} \left(V_{ref} - V_{epot_i}\right) b_{i0} + \sum_{j=1}^{K-1} 2^{-j} \sum_{i=0}^{M-1} \frac{C_{in_i}}{C_{FBamp1}} \left(V_{ref} - V_{epot_i}\right) b_{ij}. \qquad (6)$$

Finally, when the computation of the output voltage in equation (6) is finished, it is sampled by SH3 108 using CLK3 132, which is enabled once every K cycles. SH3 108 holds the computed voltage until the next analog output voltage is ready. The new computation starts by enabling the Reset 120 signal to zero out the effect of the previous computation. Then, the same processing steps are repeated for the next digital input data.

Continuing with reference to FIG. 3, a timing diagram 200 for the mixed-signal DA system 100 is illustrated, including a shift register latch clock signal 202, a data signal 204, an Invert signal 206, a Reset signal 208, and various clock signals, CLK1 210, CLK2 212, and CLK3 214. The timing of the digital data and control bits governs the DA computation.
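The feedback loop described for FIG. 2 implements an LSB-first recurrence: Reset clears the previous result, each cycle adds the newly selected weight sum $S_j = \sum_i w_i b_{ij}$ to half of the previous result, and the final (sign-bit) cycle subtracts $S_0$ instead. Unrolled, this gives the right-hand side of equation (6), with $w_i$ standing for $(C_{in_i}/C_{FBamp1})(V_{ref} - V_{epot_i})$. The Python sketch below is a behavioural model assuming ideal components; it applies the sign handling directly to the last cycle rather than modeling the individual AMP1/AMP2 inversions, and its name is hypothetical.

```python
def bit_serial_da(weights, words, K):
    """LSB-first DA with divide-by-two feedback (ideal components assumed).

    weights: w_0 .. w_{M-1}
    words:   signed K-bit integers, one per tap, read as the fractions
             word / 2**(K-1) = -b_i0 + sum_{j>=1} b_ij * 2**-j (bit 0 = sign/MSB).
    Returns -S_0 + sum_{j=1}^{K-1} 2**-j * S_j, where S_j = sum_i w_i * b_ij.
    """
    mask = (1 << K) - 1

    def bit(word, j):
        # b_ij: j = 0 is the MSB (sign bit), j = K-1 is the LSB
        return ((word & mask) >> (K - 1 - j)) & 1

    acc = 0.0  # Reset: the effect of the previous computation is cleared
    for j in range(K - 1, -1, -1):  # LSBs are loaded first, MSBs last
        s_j = sum(w for w, word in zip(weights, words) if bit(word, j))
        if j == 0:
            acc = -s_j + 0.5 * acc  # last cycle (Invert enabled): sign bit subtracted
        else:
            acc = s_j + 0.5 * acc   # fed-back value scaled by 0.5 and added to new sum
    return acc


if __name__ == "__main__":
    # words 6, -4, 2 with K = 4 encode 0.75, -0.5, 0.25
    print(bit_serial_da([0.5, -0.25, 0.125], [6, -4, 2], 4))
```

No multiplier appears in the loop: the repeated scaling by 0.5 produces the $2^{-j}$ factors of equation (6), which is why a single gain stage in the feedback path suffices.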
To achieve an accurate computation using DA, the circuit components are designed to minimize the gain and offset errors in the signal path. In an exemplary embodiment of the mixed-signal DA system 100 illustrated in FIG. 2, epots 104, inverting amplifiers 106 and 122, and sample-and-holds 108, 124, and 126 are utilized. In this embodiment, an array of epots 104 is used for storing the filter weights, and during programming, individual epots 104 are controlled and read by a decoder. In one embodiment, the epots 104 and inverting amplifiers 106 and 122 use FG transistors to exploit their analog storage and capacitive coupling properties. Precise tuning of the stored voltage on the FG node may be achieved by utilizing the hot-electron injection and Fowler-Nordheim tunneling mechanisms. Exemplary methods for programming FG transistors are disclosed in U.S. patent application Ser. No. 11/382,640, entitled "Systems and Methods for Programming Floating-Gate Transistors," the entire contents and substance of which is hereby incorporated by reference, and in U.S. patent application Ser. No. 11/381,068, entitled "Programmable Voltage-Output Floating-Gate Digital to Analog Converter and Tunable Resistors." The epots 104 employ FG transistors to store the analog coefficients of the
inner product. In contrast, the inverting amplifiers 106 and 
122 use FG transistors not only to obtain capacitive coupling 
at their inverting-node, but also to remove the offset at the 
FG terminals. 
Turning now to FIG. 4, an exemplary embodiment of an 
epot circuit 300 is illustrated. The epot circuit 300 may be 
modified from its original version to obtain a low-noise 
voltage reference. The epot circuit 300 is a dynamically 
reprogrammable, on-chip voltage reference that uses a low-
noise amplifier integrated with FG transistors and program-
ming circuitry 304 to tune the stored analog voltage. The 
amplifier 302 in the epot circuit 300 may be used to buffer 
the stored analog voltage so that the epot circuit 300 can 
achieve low noise and low output resistance as well as the 
desired output voltage range. 
One advantage of exploiting FG transistors in the mixed-
signal DA system is that the area allocated for the capacitors 
may be dramatically reduced. This structure helps to over-
come the area overhead, which is mainly due to layout 
techniques used to minimize the mismatches between the 
input and feedback capacitors. In one embodiment, the unit 
capacitor, C, is set to 300 fF, and no layout technique is 
employed. As expected, due to inevitable mismatches between the capacitors, there will be a gain error contributed by each input capacitor. The stored weights are also used to compensate for this mismatch: when the analog weights are stored to the epots, the gain errors are also taken into account to achieve accurate DA computation. Additional explanation of the size reduction realized using FG transistors may be found in U.S. patent application Ser. No. 11/381,068, "Programmable Voltage-Output Floating-Gate Digital to Analog Converter and Tunable Resistors."
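One way to read the mismatch compensation described above: per equation (6), a selected epot contributes $(C_{in_i}/C_{FBamp1})(V_{ref} - V_{epot_i})$ to the sum, so a known capacitor ratio can be folded into the stored voltage. The sketch below assumes the capacitances (or their ratios) are known, for example from measurement, and that weights are expressed in volts relative to Vref; the helper names and the mismatch figures are hypothetical.

```python
def epot_targets(weights, c_in, c_fb, v_ref=2.5):
    """Epot voltages that realise `weights` through the adder of equation (6),
    where a selected epot contributes (c_in[i] / c_fb) * (v_ref - v_epot[i]).
    Solving for v_epot pre-distorts each voltage by its capacitor ratio."""
    return [v_ref - w * c_fb / c for w, c in zip(weights, c_in)]


def realised_weights(v_epot, c_in, c_fb, v_ref=2.5):
    """Weights actually produced by the adder for the given epot voltages."""
    return [(c / c_fb) * (v_ref - v) for v, c in zip(v_epot, c_in)]


if __name__ == "__main__":
    C = 300e-15                    # unit capacitor value given in the text
    c_in = [1.01 * C, 0.98 * C]    # hypothetical +1% / -2% mismatch
    weights = [0.30, -0.20]        # target weights in volts (assumed scale)
    naive = [2.5 - w for w in weights]            # ignores the mismatch
    print(realised_weights(naive, c_in, C))       # roughly [0.303, -0.196]
    print(realised_weights(epot_targets(weights, c_in, C), c_in, C))
```

Because the correction is applied once, when the voltage is programmed, no layout-level matching is needed in this model, which is consistent with the area argument made above.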
The epots 300 may be incorporated into the design not only to store the weights of DA, but also to obtain reconfigurability/tunability. An exemplary embodiment of a programming circuitry 304 for the epot is shown in FIG. 4. The stored voltage is tuned by using the Fowler-Nordheim tunneling and hot-electron injection mechanisms. Tunneling is utilized for coarse programming of the epot voltage 316 and is used to reach 200 mV below the target voltage. The purpose of undershooting is to avoid the coupling effect of the tunneling junction on the floating gate when tunneling is turned off. The tunneling mechanism decreases the number of electrons, thus increasing the epot voltage 316. After the desired epot 300 is selected by enabling its Select signal 306, a tunneling bit 308, digtunnel, is activated and a high voltage is created across the tunneling junction. During programming in accordance with an exemplary embodiment, the high-voltage amplifier is powered with 14 V.
In contrast to the coarse programming, the precise programming is achieved by using hot-electron injection. The desired epot 300 is selected by enabling its Select signal 306 and an injection bit 322, diginject. A hot-electron injection mechanism 310 decreases the epot voltage 316 by increasing the number of electrons on the FG terminal. In accordance with an exemplary embodiment, hot-electron injection may be performed by pulsing a 6.5 V signal across the drain and source terminals of a pFET. As the FG voltage 312, Vfg, decreases, the injection efficiency drops exponentially, since the injection transistor has better injection efficiency for smaller source-to-gate voltages. By keeping the FG potential at a constant voltage, the number of injected electrons, and hence the output voltage change, is accurately controlled. To keep the FG at a constant potential, the input voltage of the epot 300, Vref 314, is modulated during programming based on the epot voltage 316, since the epot voltage 316 is approximately at the same potential as Vfg 312. After programming, the tunneling voltage Vtun 318 and injection voltage 320 are preferably set to ground to decrease power consumption and minimize the coupling to the floating-gate terminal. Also, Vref 314 is set to 2.5 V so that all parts of the system share the same reference voltage.

The epot voltage 316 is programmed with respect to this voltage reference with an error of less than 1 mV over a 4 V output range. The amount of charge that needs to be stored at an epot 300 depends on the targeted weight and the gain error introduced by the input/feedback capacitors at the addition stage. During programming, the Reset signal is enabled and all other capacitor inputs are connected to Vref 314 while the targeted epot 300 is periodically switched, in order to find the voltage difference between the selected and unselected states of the epot 300. This voltage may be used to find the approximate value of the stored weight.

Unlike switched-capacitor amplifiers, the addition in the mixed-signal DA system is achieved without resetting the inverting node of the amplifiers, because the floating-gate inverting node of the amplifiers allows for continuous-time operation. Turning now to FIG. 5a, an exemplary embodiment of an inverting amplifier 400 is illustrated. The inverting amplifier 400 may be implemented by using a two-stage amplifier structure to obtain a high gain and a large output swing. Similar to the epots, the charge on the FG node of these amplifiers is precisely programmed by monitoring the amplifier output while the system operates in the reset mode. In the reset mode, the shift registers are cleared and the Reset signal is enabled. Therefore, all the input voltages to the input capacitors, including the voltage to the feedback capacitor, CFB, are set to the reference voltage. These conditions ensure that the amplifier output becomes equal to the reference voltage when the charge on the FG is compensated. The charge on the FG terminal may be tuned using the hot-electron injection and Fowler-Nordheim tunneling mechanisms. By using this technique, the offset at the amplifier output may be reduced to less than 1 mV.
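The reset-mode offset calibration amounts to a closed loop: measure Vout - Vref, apply a charge-adjusting pulse that moves the output toward Vref, and stop once the residual is below the 1 mV target. The text does not specify a pulse schedule, so the sketch below assumes a generic coarse-to-fine loop in which each pulse shifts the output by a fixed step (standing in for tunneling or injection, whichever moves the output the right way) and the step is halved after an overshoot. The function name, step size, and pulse limit are hypothetical, not the patent's programming algorithm.

```python
def trim_offset(initial_offset, coarse_step=0.05, tol=1e-3, max_pulses=1000):
    """Hypothetical coarse-to-fine offset trim.

    initial_offset: Vout - Vref (volts) measured with the system in reset mode.
    Each pulse moves the output by `step` toward Vref; the step is halved
    whenever a pulse overshoots (the offset changes sign).
    Returns (residual_offset, pulses_used).
    """
    offset, step, pulses = initial_offset, coarse_step, 0
    while abs(offset) >= tol:
        if pulses >= max_pulses:
            raise RuntimeError("offset trim did not converge")
        new = offset - step if offset > 0 else offset + step
        if (new > 0) != (offset > 0):  # overshoot: refine the step size
            step /= 2
        offset = new
        pulses += 1
    return offset, pulses


if __name__ == "__main__":
    residual, pulses = trim_offset(0.123)
    print(f"residual {residual * 1e3:.3f} mV after {pulses} pulses")
```

Halving the step on every overshoot bounds the residual by twice the current step, so the loop terminates once the step falls below the tolerance.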
Referring now to FIG. 5b, an exemplary embodiment of an SH circuit 500 that achieves high sampling speed and high sampling precision is illustrated. The SH circuit 500 may be implemented by utilizing the sample-and-hold technique using Miller hold capacitance. The SH circuit 500 reduces the signal-dependent error, while maintaining the sampling speed and precision, by using the Miller capacitance technique together with amplifier Amp3 502, shown in more detail in FIG. 5c. For simplification, assume there is no coupling between M1 504 and M2 506 and that amplifier Amp3 502 has a large gain; then the pedestal error contributed by turning switches M1 504 and M2 506 off can be written as

(7)

where $\Delta Q_1$ and $\Delta Q_2$ are the charges injected by M1 504 and M2 506, respectively. Also, A and $C_{in}$ are the gain and input capacitance of the amplifier Amp3 502. $\Delta Q_2$ is independent of the input level; therefore, $\Delta V_{s2}$ may be treated as an offset. In addition, the error contributed by M1 504, $\Delta V_{s1}$, may be minimized using the Miller feedback, and this error decreases as A increases. Due to the serial nature of the DA computation, offset in the feedback path may be attenuated as the precision of the digital input data increases. Therefore, Amp3 502 is preferably designed to minimize the signal-dependent error, $\Delta V_{s1}$.
In another exemplary embodiment, a gain-boosting technique may be incorporated into the SH amplifier, Amp4 508, shown in more detail in FIG. 5d, to achieve high gain and fast settling. Two SH circuits, SH1 and SH2, are used in the feedback path to obtain the fixed delay for the sampled analog voltage. In addition, the third SH circuit, SH3 108, is utilized to sample and hold the final computed output once every K cycles. SH3 108 uses a negative-feedback output stage 600, shown in FIG. 5e, to be able to buffer the output voltage off-chip. Due to the performance requirements of the system, these SH circuits may typically consume more power than the rest of the system.
DA is typically implemented in digital circuits; therefore, the error sources introduced by the analog components of the disclosed system should be analyzed. These error sources include gain and offset errors, non-ideal weights, and noise in the signal path. The effect of non-ideal weights mostly depends on the application for which DA is used.

Continuing with reference to FIG. 2, as in serial digital-to-analog converters, the gain and offset errors determine the accuracy of the DA computation. If the error at the addition stage due to the weight errors in the epots and the mismatch errors between the input capacitors, $C_{in,i}$ (for $i = 1, 2, \ldots$), is assumed to be negligible, then the gain/offset errors and the noise in the data paths become the main sources of error. In the mixed-signal DA system, the inverting amplifiers AMP1 106 and AMP2 122 may introduce gain and offset errors, and the sample-and-hold circuits SH1 124 and SH2 126 may cause offset errors. In addition, the mismatch between $C_{FB}$ and $C_{FBamp1}$, as well as between $C_{FBamp2}$ and $C_{inamp2}$, may cause gain errors.

Unlike in the digital domain, where a division by two is simply a one-bit shift, in the analog domain this operation is achieved by employing an analog circuit. This circuit implementation often introduces an error, so that the result of the division becomes 0.5 plus a gain error, $\Delta$. The following error calculations and explanations are provided to enhance understanding of the theories underlying the present disclosure. They are not intended to limit the scope of the present invention. The effect of $\Delta$ on the output of a DA computation, $y[n]$, is modeled by

$$ y[n] = -\sum_{i=0}^{M-1} w_i b_{i0} + \sum_{j=1}^{K-1} (0.5+\Delta)^j \sum_{i=0}^{M-1} w_i b_{ij} \tag{8} $$

The output error caused by $\Delta$ can be found by computing the difference of equations (5) and (8). For simplification, the term $\sum_{i=0}^{M-1} w_i b_{ij}$ is set to $\alpha$ for every $j$. Therefore, the output error, $E$, reduces to the difference of two geometric sums and can be expressed as

$$ E = \alpha\,\frac{1-(0.5+\Delta)^K}{1-(0.5+\Delta)} - \alpha\,\frac{1-0.5^K}{1-0.5} = \alpha\,\frac{1-(0.5+\Delta)^K}{0.5-\Delta} - \alpha\,\frac{1-0.5^K}{0.5} \tag{9} $$

A plot of the output error due to the gain error, normalized by $\alpha$, for varying values of $\Delta$ and $K$ is illustrated in FIG. 6a. Since this system converts the digital input data to an analog output, the output error due to quantization is also provided. The intersection of the output error and quantization error curves gives the minimum achievable output error of the proposed system and determines the precision of an equivalent digital system. For example, when $\Delta = 2^{-11}$, the two curves intersect at $E/\alpha \approx 0.002$ and $K = 8$. This intersection point represents the minimum error when $\Delta = 2^{-11}$ and indicates that the proposed system is equivalent to an 8-bit digital DA.

As $K$ becomes large, $E/\alpha$ approaches the limit $2\Delta/(0.5-\Delta)$. Another source of error is the offset error. It is modeled as a constant error, $\delta$, added to each $j$th summation of weights, as follows:

$$ y[n] = -\left(\delta + \sum_{i=0}^{M-1} w_i b_{i0}\right) + \sum_{j=1}^{K-1} 2^{-j}\left(\delta + \sum_{i=0}^{M-1} w_i b_{ij}\right) \tag{10} $$

After distributing $\sum_{j=1}^{K-1} 2^{-j}$ and then grouping the $\delta$ terms into one term, the error due to offset can be written as the summation of a geometric series:

$$ \text{error}_{\text{offset}} = \delta\,\frac{1-0.5^K}{1-0.5} - 2\delta = -\delta\,2^{-(K-1)} \tag{11} $$

As $K$ increases, the offset error in the feedback loop decreases, which is a byproduct of how DA handles two's complement numbers. In DA, the last summation of weights, $\sum_{i=0}^{M-1} w_i b_{i0}$, is subtracted rather than added. This system decreases the offset error, especially when $K$ is large. For $K = 8$ and $\delta = 100$ mV, the magnitude of the offset error becomes 0.7813 mV.
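The bit-serial computation and the gain and offset error models above can be checked numerically. The following is a minimal Python sketch, not the disclosed circuit: it assumes fractional two's complement inputs with bit 0 as the sign bit, models the analog divide-by-two as multiplication by a gain of $0.5+\Delta$, adds the offset $\delta$ to every partial sum of weights as in equation (10), and evaluates the closed forms (9) and (11). The function names (`to_bits`, `da_output`, `gain_error`, `offset_error`) are hypothetical.

```python
def to_bits(x, K):
    """Return the K-bit fractional two's complement bits of x in [-1, 1).

    bits[0] is the sign bit (weight -1); bits[j] for j >= 1 has weight 2**-j.
    """
    q = round(x * 2 ** (K - 1)) % (2 ** K)  # K-bit two's complement integer code
    return [(q >> (K - 1 - j)) & 1 for j in range(K)]


def da_output(weights, bits, gain=0.5, offset=0.0):
    """Bit-serial DA inner product, LSB first, with a non-ideal divide-by-two.

    gain = 0.5 + Delta reproduces equation (8); offset = delta reproduces
    equation (10). With gain=0.5 and offset=0.0 the result is the ideal output.
    """
    K = len(bits[0])
    assert K >= 2, "need a sign bit plus at least one fractional bit"
    # Partial sum of the weights selected by bit j of every input, plus offset.
    partial = [offset + sum(w * b[j] for w, b in zip(weights, bits)) for j in range(K)]
    acc = partial[K - 1]               # start with the least significant bit
    for j in range(K - 2, 0, -1):      # j = K-2, ..., 1
        acc = partial[j] + gain * acc  # scale the running result by the gain
    return -partial[0] + gain * acc    # sign-bit partial sum is subtracted


def gain_error(alpha, delta, K):
    """Equation (9): output error E for a divide-by-two of 0.5 + delta."""
    r = 0.5 + delta
    return alpha * (1 - r ** K) / (1 - r) - alpha * (1 - 0.5 ** K) / (1 - 0.5)


def offset_error(delta_off, K):
    """Equation (11): error due to a constant offset added to each partial sum."""
    return delta_off * (1 - 0.5 ** K) / (1 - 0.5) - 2 * delta_off


if __name__ == "__main__":
    w = [0.25, -0.5]
    b = [to_bits(0.5, 4), to_bits(-0.75, 4)]
    print(da_output(w, b))               # ideal: 0.25*0.5 + (-0.5)*(-0.75) = 0.5
    print(gain_error(1.0, 2 ** -11, 8))  # about 0.0019, cf. FIG. 6a at K = 8
    print(offset_error(0.1, 8))          # -0.1 * 2**-7 V, i.e. about -0.78 mV
```

Setting every partial sum to the same value $\alpha$ (one weight with all bits set) makes the difference between a non-ideal and an ideal `da_output` call equal to `gain_error`, which is exactly the simplification used to derive equation (9).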
The random error is assumed to be Gaussian and is represented by $X_j$. The random variable $X_j$ is added to the summation of weights at each $j$th iteration, and all $X_j$'s are independent and identically distributed, so that the output becomes
$$ y[n] = -\left(X_0 + \sum_{i=0}^{M-1} w_i b_{i0}\right) + \sum_{j=1}^{K-1} 2^{-j}\left(X_j + \sum_{i=0}^{M-1} w_i b_{ij}\right) \tag{12} $$

Once the term $\sum_{j=1}^{K-1} 2^{-j}$ is distributed and the $X_j$ terms are collected into one summation, the mean and variance of the random error in $y[n]$ can be written as

$$ \mu_y = \mu_X\,\frac{1-0.5^K}{1-0.5} - 2\mu_X $$

and

$$ \sigma_y^2 = \sigma_X^2\,\frac{1-0.25^K}{1-0.25}, $$

respectively. As $K$ approaches infinity, the mean of the random error approaches zero and the maximum variance of the random error becomes $\frac{4}{3}\sigma_X^2$.
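The closed-form mean and variance above can be cross-checked by simulation. This minimal Python sketch assumes i.i.d. Gaussian $X_j$, as stated above, and draws them with the standard library's `random.Random.gauss`; the function names, the seed, and the trial count are illustrative choices, not part of the disclosure.

```python
import random
import statistics


def random_error_stats(mu_x, sigma2_x, K):
    """Closed-form mean and variance of the random error term in equation (12)."""
    mean = mu_x * (1 - 0.5 ** K) / (1 - 0.5) - 2 * mu_x  # equals -mu_x * 2**-(K-1)
    var = sigma2_x * (1 - 0.25 ** K) / (1 - 0.25)        # tends to (4/3) * sigma2_x
    return mean, var


def simulate_random_error(mu_x, sigma_x, K, trials=100_000, seed=1):
    """Monte Carlo estimate of the mean and variance of -X_0 + sum_j 2**-j X_j."""
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        x = [rng.gauss(mu_x, sigma_x) for _ in range(K)]
        samples.append(-x[0] + sum(2.0 ** -j * x[j] for j in range(1, K)))
    return statistics.fmean(samples), statistics.pvariance(samples)


if __name__ == "__main__":
    print(random_error_stats(0.0, 1.0, 8))
    print(simulate_random_error(0.0, 1.0, 8))  # agrees up to sampling noise
```

The variance uses the factor $0.25^j = (2^{-j})^2$ because scaling a random variable by $2^{-j}$ scales its variance by the square of that factor.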
The errors due to non-ideal filter weights, such as random offset error, are caused by the limited precision of the epot programming and by the epot noise. The effects of these errors are similar to the quantization effects in the digital domain, which cause the linear difference equation of an FIR filter to become nonlinear.
In determining symmetric offset error, the frequency response for $e[n]$ can be written as

$$ E(\omega) = \sum_{n=0}^{M-1} e[n]\,e^{-j\omega n}. $$

Assuming the FIR filter is Type-2 and the offset errors have the same symmetry as the filter, $E(\omega)$ can be rewritten as a summation of cosines:

$$ E(\omega) = e^{-j\omega(M-1)/2} \sum_{n=0}^{M/2-1} 2e[n]\cos\!\left(\omega\,\frac{M-2n-1}{2}\right) \tag{13} $$

Treating $e[n]$ as a random variable with a variance of $\sigma_e^2$ and using some trigonometric identities and Euler's rule, the variance of $E(\omega)$ can be written as follows:

$$ \sigma_E^2(\omega) = \sigma_e^2\left(M + \frac{\sin(\omega M)}{\sin(\omega)}\right) \tag{14} $$

where $\sigma_E^2(\omega)$ can vary from zero to $2M\sigma_e^2$. The frequency response of $\sigma_E^2(\omega)$ for $M = 32$ is illustrated in FIG. 6b. The effects of the symmetrical offset errors are similar to the effects of coefficient quantization in symmetrical digital FIR filters. These effects are reduced pass-band width, increased pass-band ripple, a wider transition band, and reduced minimum stop-band attenuation.

In determining non-symmetric offset error, $E(\omega)$ should not be rewritten as a summation of cosines because the offset error is not symmetrical. Assuming $e[n]$ is a random variable with a variance of $\sigma_e^2$, the variance of $E(\omega)$, $\sigma_E^2$, is equal to $M\sigma_e^2$ for an M-tap FIR filter. Unlike the variance for symmetrical offset errors, which varies with frequency, the variance for non-symmetric offset errors is constant.

The effects of time-varying random error on DA computation can be modeled as

$$ y[n] = -\sum_{i=0}^{M-1} (w_i + e_{i0})\,b_{i0} + \sum_{j=1}^{K-1} 2^{-j} \sum_{i=0}^{M-1} (w_i + e_{ij})\,b_{ij} \tag{15} $$

Assuming each $e_{ij}$ is a random variable that is independent and identically distributed, the error can be expressed as

$$ \text{error} = -\sum_{i=0}^{M-1} e_{i0}\,b_{i0} + \sum_{j=1}^{K-1} 2^{-j} \sum_{i=0}^{M-1} e_{ij}\,b_{ij} \tag{16} $$

Since the above equation is just a summation of random variables, the parameter of significance for this analysis is the maximum variance of the random error, $\sigma_{\text{error}}^2$. For simplification, the analysis assumes that $b_{ij}$ is equal to 1 for all $i$ and $j$. First, the variance of $-\sum_{i=0}^{M-1} e_{i0} b_{i0}$ is computed as $M\sigma^2$. Then, because scaling by $2^{-j}$ scales the variance by $2^{-2j}$, the variance of $\sum_{j=1}^{K-1} 2^{-j} \sum_{i=0}^{M-1} e_{ij} b_{ij}$ is calculated as

$$ M\sigma^2 \sum_{j=1}^{K-1} 0.25^{j} = M\sigma^2\,\frac{1-0.25^{K-1}}{3}. $$

These two variances result in a total variance, $\sigma_{\text{error}}^2 = M\sigma^2\,(4-0.25^{K-1})/3$, which approaches $\frac{4}{3}M\sigma^2$ when $K$ is large, consistent with the $\frac{4}{3}\sigma_X^2$ limit obtained above for $X_j = \sum_i e_{ij}$.
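Equation (14) for symmetric offset errors, and the constant $M\sigma_e^2$ for non-symmetric offset errors, can be checked against a direct evaluation of the expected power of $E(\omega)$. The Python sketch below assumes zero-mean independent offsets with variance $\sigma_e^2$, an even tap count $M$ (Type-2 symmetry $e[n] = e[M-1-n]$), and $0 < \omega < \pi$ so that $\sin(\omega) \neq 0$; the function names are hypothetical. Each independent symmetric pair of taps contributes $\sigma_e^2\,|e^{-j\omega n} + e^{-j\omega(M-1-n)}|^2$, while in the non-symmetric case each tap contributes $\sigma_e^2$.

```python
import cmath
import math


def sym_variance_closed_form(sigma2_e, M, w):
    """Equation (14): variance of E(w) for Type-2 symmetric offsets (M even)."""
    return sigma2_e * (M + math.sin(w * M) / math.sin(w))


def sym_variance_direct(sigma2_e, M, w):
    """Sum the contributions of the M/2 independent pairs e[n] = e[M-1-n]."""
    return sigma2_e * sum(
        abs(cmath.exp(-1j * w * n) + cmath.exp(-1j * w * (M - 1 - n))) ** 2
        for n in range(M // 2)
    )


def nonsym_variance_direct(sigma2_e, M, w):
    """Non-symmetric case: M independent taps, each contributing sigma2_e."""
    return sigma2_e * sum(abs(cmath.exp(-1j * w * n)) ** 2 for n in range(M))


if __name__ == "__main__":
    M = 32
    for w in (0.1, 0.5, 1.0, 2.0, 3.0):
        print(w, sym_variance_closed_form(1.0, M, w),
              sym_variance_direct(1.0, M, w), nonsym_variance_direct(1.0, M, w))
```

Near $\omega = 0$ the closed form approaches $2M\sigma_e^2$, and near $\omega = \pi$ it approaches zero, matching the range stated for $\sigma_E^2(\omega)$.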
Switched-capacitor techniques are suitable for FIR filter implementations and offer precise control over the filter coefficients. To avoid the power and speed trade-off in switched-capacitor FIR filter implementations, a transposed FIR filter structure is preferably employed. In addition, a rotating switch matrix may be used to eliminate error accumulation. Alternatively, these problems can be partially alleviated by employing oversampling design techniques. Filter implementations with these techniques offer design flexibility by allowing for coefficient and/or input modulation. However, this design approach requires higher clock rates to obtain high oversampling ratios.

The programmability in analog FIR filter implementations can also be obtained by utilizing switched-current techniques. These techniques allow for the integration of the digital coefficients through the use of the current division technique or multiplying digital-to-analog converters (MDACs). Moreover, a circular buffer architecture can be utilized to ease the problems associated with analog delay stages and to avoid the propagation of both offset voltage and noise. A switched-current FIR filter based on DA can also be used in pre-processing applications to decrease the hardware complexity and area requirements of the FIR filters. Some of these techniques may be employed for post-processing by using them after a DAC. However, the use of a high-resolution and/or high-speed DAC in addition to the FIR filter implementation increases the area and power consumption. The disclosed DA structure, which can be used for FIR filtering, employs DA for signal processing and utilizes the analog storage capabilities of FG transistors to obtain programmable analog coefficients for reconfigurability. The DAC is used as part of the DA implementation, which achieves digital-to-analog conversion and signal processing at the same time.

Compared to switched-capacitor implementations, which have their coefficients set by using different capacitor ratios, the proposed implementation offers more design flexibility since its coefficients can be set by tuning the stored weights at the epots. Also, offset accumulation and signal attenuation make it difficult to implement long tapped delay lines with traditional approaches. In one embodiment, DA processing decreases the offset as the precision of the digital input data increases. Furthermore, the gain error is mainly caused by the two inverting stages (implemented using AMP1 and AMP2) and may be minimized by applying special layout techniques to these stages only. The measurement results showed that the output signal of the filter follows the ideal response very closely, because the filter is insensitive to the number of filter taps and most of the computation is performed in the feedback path. In addition, the power and area of the proposed design increase linearly with the number of taps due to the serial nature of the DA computation. Therefore, the disclosed system is well suited for compact and low-power implementations of high-order filters for post-processing applications. The programmable analog coefficients of this filter will enable the implementation of adaptive systems that can be used in applications such as adaptive noise cancellation and adaptive equalization. Since DA is an efficient computation of an inner product, the disclosed system can also be utilized for signal processing transforms such as a modified discrete cosine transform.

In one embodiment of the DA system 10, the digital input signal may be a digital representation of an analog input signal. For example, the DA system 10 may receive an analog input signal that is sampled and represented as a plurality of digital bits, or the digital input signal, as described above with reference to FIG. 1A.

While the preferred embodiment of the invention has been described, it will be understood that those skilled in the art, both now and in the future, may make various improvements and enhancements which fall within the scope of the claims which follow. These claims should be construed to maintain the proper protection for the invention first described.

What is claimed is:

1. A reconfigurable mixed signal distributed arithmetic system comprising:
an array of tunable voltage references operable for receiving a delayed digital input signal;
a combination device in electrical communication with the array of tunable floating-gate voltage references that selectively combines an output of the array of tunable voltage references into an analog output signal; and
a feedback element in electrical communication with the combination device, wherein the array of tunable voltage references and the delayed digital input signal combine to perform a distributed arithmetic function and the reconfigurable mixed signal distributed arithmetic system responsively generates the analog output signal.

2. The reconfigurable mixed signal distributed arithmetic system of claim 1, further comprising:
a shift register in electrical communication with and operable for receiving and delaying a digital input signal and transmitting the delayed digital input signal to the array of tunable voltage references; and
a sample-and-hold circuit in electrical communication with the combination device operable for sampling and storing the analog output signal.

3. The reconfigurable mixed signal distributed arithmetic system of claim 1, wherein the tunable voltage references comprise a floating-gate transistor and programming circuitry.

4. The reconfigurable mixed signal distributed arithmetic system of claim 3, wherein the tunable voltage references further comprise a low-noise amplifier.

5. The reconfigurable mixed signal distributed arithmetic system of claim 1, wherein the tunable voltage references are tuned using Fowler-Nordheim tunneling and a hot-electron injection mechanism.

6. The reconfigurable mixed signal distributed arithmetic system of claim 1, wherein the feedback element further comprises an inverting amplifier, a delay element, and a divide-by-two operation.

7. A method for performing mixed signal distributed arithmetic comprising:
programming analog coefficients to a plurality of circuit elements;
storing the programmed analog coefficients in a plurality of storage elements;
outputting delayed analog input signal samples from a feedback signal;
outputting the stored analog coefficients at a plurality of times;
summing the selected analog coefficients and delayed analog input signal samples at a plurality of times; and
sampling and holding a result of the summation.

8. The method for performing mixed signal distributed arithmetic of claim 7, further comprising combining the feedback signal with the delayed analog input signal samples.

9. The method for performing mixed signal distributed arithmetic of claim 8, wherein the feedback signal is the delayed analog input signal samples that have been inverted, delayed, and divided by two.

10. A method for performing mixed signal distributed arithmetic comprising:
receiving a digital input signal having a plurality of bits;
storing the digital input signal in a shift register;
generating a plurality of weighted analog signals by 
combining one or more of the bits of the digital input 
signal with an array of tunable voltage references; 
selectively combining one or more of the plurality of 
weighted analog signals using a plurality of digital 
circuit elements; and 
responsively generating an analog output signal. 
11. The method for performing mixed signal distributed arithmetic of claim 10, further comprising combining a feedback signal with the combined plurality of weighted analog signals.

12. The method for performing mixed signal distributed arithmetic of claim 11, wherein the feedback signal is a combined weighted analog signal that has been delayed, inverted, and divided by two.

13. The method for performing mixed signal distributed arithmetic of claim 10, wherein the tunable voltage references comprise a floating-gate transistor and programming circuitry.

14. The method for performing mixed signal distributed arithmetic of claim 10, wherein the tunable voltage references further comprise a low-noise amplifier.

15. The method for performing mixed signal distributed arithmetic of claim 10, wherein the tunable voltage references are tuned using Fowler-Nordheim tunneling and a hot-electron injection mechanism.

16. The method for performing mixed signal distributed arithmetic of claim 10, wherein the digital circuit elements comprise a binary value formed by selected bits of the digital input signal which is stored in the shift register.
17. The method for performing mixed signal distributed 
arithmetic of claim 10, wherein the step of receiving a digital 
input signal having a plurality of bits further comprises: 
receiving an input analog signal; and 
generating the digital input signal having a plurality of 
bits responsive to the input analog signal. 
18. A reconfigurable mixed signal distributed arithmetic 
system comprising: 
a plurality of storage elements operable for receiving a 
sampled analog input signal; 
a combination device in electrical communication with 
the plurality of storage elements that selectively com-
bines an output of the plurality of storage elements into 
an analog output signal; and 
a feedback element in electrical communication with the 
combination device, wherein the combination device 
uses a plurality of digital circuit elements to selectively combine the output of the plurality of storage elements and wherein the plurality of storage elements and the plurality of digital circuit elements combine to perform
a distributed arithmetic function and the reconfigurable 
mixed signal distributed arithmetic system respon-
sively generates the analog output signal. 
* * * * * 
