On-Chip Contrastive Divergence Learning in
analogue VLSI
Patrice Fleury, Hsin Chen, Alan F. Murray
Institute for Integrated Micro and Nano Systems,
The University of Edinburgh, Scotland - UK
Patrice.Fleury, Hsin.Chen, Alan.Murray@ee.ed.ac.uk
Abstract—We have mapped the contrastive divergence learning scheme of the Products of Experts (PoE) onto electrical circuits. This paper discusses the issues raised by that hardware translation and describes the circuits that embody our solutions. The entire learning rule is implemented in mixed-signal VLSI in a 0.6 µm CMOS process. Chip results validating our approach and methodology are also presented.
I. INTRODUCTION
Hardware implementations of probabilistic neural networks may have much to offer for bio-medical [1], sensor fusion [2] and deep-sub-micron applications. We are therefore investigating the hardware translation of one such probabilistic neural network, the Products of Experts (PoE). Emphasis is placed on its contrastive divergence learning [3], and circuits that implement it on-chip are presented.
II. PRODUCTS OF EXPERTS
The PoE is an unsupervised, stochastic ANN comprising
a set of probabilistic generative models (“Experts”). A full
explanation of the PoE is given in [4], [5], [6].
A binary stochastic neuron adopts a state $s_j \in \{0, 1\}$ according to:

$$p(s_j = 1) = \varphi_j\Big(\sum_i w_{ij}\, s_i\Big) \qquad (1)$$

where $\varphi_j$ is the activation function (Gaussian, sigmoidal, etc.), $w_{ij}$ the synaptic weights of a neuron, and $s_i$ and $s_j$ the states of the input (visible) and hidden layers.
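A minimal software sketch of Eq.1 follows. It assumes a logistic sigmoid for $\varphi_j$ (the text also allows Gaussian or other activation functions), omits bias terms, and uses Python's `random` module for the stochastic decision; the helper names are hypothetical and do not describe the chip circuitry.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Logistic function: one possible choice for the activation phi_j of Eq.1."""
    return 1.0 / (1.0 + math.exp(-x))


def activation_probability(w_j, s_visible) -> float:
    """p(s_j = 1) = phi_j(sum_i w_ij * s_i), with phi_j assumed to be a sigmoid."""
    net = sum(w * s for w, s in zip(w_j, s_visible))
    return sigmoid(net)


def sample_state(p: float, rng: random.Random) -> int:
    """Binary stochastic decision: s_j = 1 with probability p, otherwise 0."""
    return 1 if rng.random() < p else 0


rng = random.Random(0)        # seeded for reproducibility
w_j = [0.5, -0.25, 1.0]       # hypothetical weights w_ij into one hidden unit j
s_i = [1, 0, 1]               # a binary visible (input) vector
p_j = activation_probability(w_j, s_i)   # sigmoid(0.5 + 1.0) = sigmoid(1.5)
s_j = sample_state(p_j, rng)             # 0 or 1
```

Sampling against a uniform random number is what turns the deterministic probability of Eq.1 into a stochastic binary state.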
Learning minimises the “Contrastive Divergence”, based upon the difference between the input datum and its one-step reconstruction by the Experts (see Fig.1), as given in Eq.2:

$$\Delta w_{ij} = \varepsilon \left( \langle s_i^{+} s_j^{+} \rangle - \langle s_i^{-} s_j^{-} \rangle \right) \qquad (2)$$

where $\varepsilon$ is the learning rate, $s_i^{+}$ the input vector, $s_i^{-}$ its one-step reconstruction, $s_j^{+}$ and $s_j^{-}$ their respective neural activities, and $\langle\cdot\rangle$ denotes an average over the training data.
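For reference, Eq.2 can be evaluated in software from the states obtained in STEP 1 and STEP 3 of Fig.1. The sketch below takes $\langle\cdot\rangle$ to be a plain mean over a small batch; the function name and argument layout are illustrative assumptions, not part of the hardware.

```python
def cd_update(v_plus, h_plus, v_minus, h_minus, epsilon):
    """Eq.2 in software: delta_w[i][j] = epsilon * (<s_i+ s_j+> - <s_i- s_j->).

    v_plus/h_plus: visible/hidden states for each datum (STEP 1 of Fig.1).
    v_minus/h_minus: the same after the one-step reconstruction (STEP 3).
    <.> is taken here as the mean over the samples in the lists.
    """
    n = len(v_plus)
    n_vis, n_hid = len(v_plus[0]), len(h_plus[0])
    delta = [[0.0] * n_hid for _ in range(n_vis)]
    for i in range(n_vis):
        for j in range(n_hid):
            pos = sum(v_plus[k][i] * h_plus[k][j] for k in range(n)) / n
            neg = sum(v_minus[k][i] * h_minus[k][j] for k in range(n)) / n
            delta[i][j] = epsilon * (pos - neg)
    return delta


# Two data vectors, two visible units, one hidden unit (illustrative values)
delta = cd_update(
    v_plus=[[1, 0], [1, 1]], h_plus=[[1], [0]],
    v_minus=[[0, 0], [1, 1]], h_minus=[[1], [1]],
    epsilon=0.1,
)
```

With these values the first weight is unchanged (both averages equal 0.5) and the second decreases by 0.05.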
The learning rule of Eq.2 is unusually amenable to VLSI. However, we have implemented a small simplification of it (Eq.3) to further ease its hardware implementation:

$$\Delta w_{ij} = \varepsilon \, \mathrm{sign}\left( \langle s_i^{+} s_j^{+} \rangle - \langle s_i^{-} s_j^{-} \rangle \right) \qquad (3)$$

The weight update $\Delta w_{ij}$ can now assume only three possible values: $+\varepsilon$, $-\varepsilon$ and zero. This has the minor effect of increasing the learning convergence time [7]. It also has one major implication for the quality of the learning: the fixed step size must be chosen carefully, since it has to be small enough for equilibrium to be reached.
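The effect of this simplification can be sketched in a few lines. `cd_sign_update` implements Eq.3; the `chase` loop is a hypothetical toy in which the sign of (target - w) stands in for the sign of the contrastive-divergence difference, used only to show why a fixed step lets the weight settle to within about ε of its equilibrium.

```python
def sign(x: float) -> int:
    """Return +1, -1 or 0 according to the sign of x."""
    return (x > 0) - (x < 0)


def cd_sign_update(pos_corr: float, neg_corr: float, epsilon: float) -> float:
    """Eq.3: delta_w = epsilon * sign(<s_i+ s_j+> - <s_i- s_j->); only +eps, -eps or 0."""
    return epsilon * sign(pos_corr - neg_corr)


def chase(target: float, epsilon: float, steps: int, w: float = 0.0) -> list:
    """Toy loop (hypothetical): sign(target - w) stands in for the CD difference."""
    history = []
    for _ in range(steps):
        w += cd_sign_update(target, w, epsilon)
        history.append(w)
    return history


coarse = chase(0.37, 0.1, 20)     # settles into an oscillation of about +/-0.1
fine = chase(0.37, 0.01, 100)     # smaller step, smaller residual oscillation
```

With ε = 0.1 the toy weight ends up hopping between roughly 0.3 and 0.4 around a target of 0.37; with ε = 0.01 the residual oscillation shrinks accordingly.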
III. VLSI IMPLEMENTATION OF THE LEARNING SCHEME
The block diagram of Fig.2 shows the PoE's learning process (Fig.1) schematically. $S_i$ and $S_j$ are voltages representing the activity of the visible and hidden layers, $\varepsilon$ is the current defining the learning rate and $W_{ij}$ is the weight voltage. The first three blocks implement Eq.2, discarding the learning-rate element $\varepsilon$. The two voltages $S_i$ and $S_j$ are multiplied together, producing a current that is then memorised as $I_{+}$ (STEP 1 of Fig.1). A current $I_{-}$ is similarly derived after one-step Gibbs sampling (STEP 3 of Fig.1). These two currents are then subtracted, $I_{sub} = I_{+} - I_{-}$, and accumulated and averaged over a period of time to give $I_{avg}$. The sign of $I_{avg}$ represents the learning direction. Finally, the weight $W_{ij}$ is “nudged” in the direction dictated by the value of $I_{avg}$, under the control of the two digital signals Up and Down.
[Figure: blocks Mult, Sub, Current Accumulator/Averaging circuit, Sign, Digital Control and Weight Change, with signals Si, Sj, Imem, I+, I−, Isub, Iavg, Iout, Q1-Q4, clock, ReadOut, Up, Down, ε and Wij.]
Fig. 2. Block diagram of the learning rule described in Eq.3
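An idealised behavioural model of the Fig.2 chain for one weight may help to follow the order of operations. It assumes a perfectly linear multiplier, treats currents as plain numbers with zero at 0 (on the chip, “zero” is an offset current, 1.5 µA in later sections), and uses the four-sample data sets described in Section III-D; the function name is hypothetical.

```python
def learning_rule_cycle(plus_pairs, minus_pairs):
    """Idealised behavioural model of Fig.2 for one weight W_ij.

    plus_pairs:  (S_i, S_j) for each datum of a data set (STEP 1 of Fig.1)
    minus_pairs: (S_i, S_j) after the one-step reconstruction (STEP 3)
    Returns "Up", "Down" or None (weight left unchanged).
    """
    i_sub = []
    for (si_p, sj_p), (si_m, sj_m) in zip(plus_pairs, minus_pairs):
        i_plus = si_p * sj_p            # Mult output, held by the current memory as I+
        i_minus = si_m * sj_m           # Mult output after reconstruction, I-
        i_sub.append(i_plus - i_minus)  # Sub block: I_sub = I+ - I-
    i_avg = sum(i_sub) / len(i_sub)     # accumulator/averaging block: I_avg
    if i_avg > 0:                       # Sign block selects the weight-change direction
        return "Up"
    if i_avg < 0:
        return "Down"
    return None


direction = learning_rule_cycle([(4, 4)] * 4, [(3, 4)] * 4)
```

Only the final sign leaves the chain, which is why the blocks before it can tolerate offsets and non-linearity.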
A. Analogue multiplier
Analogue multipliers are widely used in analogue signal processing. They tend to be four-quadrant and relatively bulky (e.g. the Gilbert multiplier). Fortunately, the PoE multiplier need only work with positive values (see Eq.1 and Eq.3), i.e. we require only two quadrants. Every two consecutive outputs from our multiplier must be subtracted from one another, so a current output is more appropriate. This also means that the multiplier's linearity is not crucial: we are only interested in the relative difference between the two outputs, so minor offsets and other non-idealities are unimportant.
The multiplier depicted in Fig.3 is a wide-range modification of Chible's transconductance multiplier [8]. The chip results of Fig.4 show the output current for a range of input values.
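The argument that offsets do not matter can be checked numerically. The multiplier model below is hypothetical (an output offset plus an arbitrary mild non-linearity, not Chible's circuit); because consecutive outputs are subtracted, the offset cancels and, for a model that is monotonic in both inputs, the sign of the difference is preserved.

```python
def nonideal_multiplier(si: float, sj: float, offset: float = 0.3) -> float:
    """Hypothetical 2-quadrant multiplier: output offset plus a mild, arbitrary
    non-linearity (the 0.05 * si term). Inputs are non-negative."""
    return offset + si * sj * (1.0 + 0.05 * si)


def output_difference(first, second, offset: float) -> float:
    """Difference between two consecutive multiplier outputs (STEP 1 minus STEP 3)."""
    return (nonideal_multiplier(*first, offset=offset)
            - nonideal_multiplier(*second, offset=offset))


d_with_offset = output_difference((4.0, 4.0), (3.0, 4.0), offset=0.3)
d_without_offset = output_difference((4.0, 4.0), (3.0, 4.0), offset=0.0)
```

Both differences are identical and positive, so the learning direction is unaffected by the offset.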
[Figure: three panels, STEP 1, STEP 2 and STEP 3, showing visible units (Si+, Si−) and hidden units (Sj+, Sj−) connected by the weights Wij, with a probabilistic sigmoid/Gaussian activation function and probability sampling.]
Fig. 1. A three-step learning process. STEP 1: the data is first fed up the network to derive the neurons' activation probabilities (Eq.1), which are then randomly sampled to decide upon the neurons' states. STEP 2: a downward pass generates a “fantasy” vector ($s_i^{-}$) from the previous neural activity. STEP 3: the one-step reconstructed data is finally passed up again, giving rise to new probabilities ($s_j^{-}$). Once these three steps are completed, learning is computed according to Eq.2 or Eq.3.
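The three steps of Fig.1 can be sketched in software for a single binary data vector. Assumptions: a logistic activation, no bias terms, and, in STEP 3, the hidden probabilities used directly instead of sampled states (one common convention, consistent with the caption's “new probabilities”). All names are illustrative.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sample(p: float, rng: random.Random) -> int:
    """Stochastic binary decision: 1 with probability p."""
    return 1 if rng.random() < p else 0


def cd1_statistics(v_plus, W, rng):
    """Run the three steps of Fig.1 for one binary data vector (no biases).

    W[i][j] is the weight between visible unit i and hidden unit j.
    Returns two matrices of products: s_i+ s_j+ and s_i- s_j-.
    """
    n_vis, n_hid = len(W), len(W[0])
    # STEP 1: data fed up -> activation probabilities (Eq.1) -> sampled hidden states
    h_plus = [sample(sigmoid(sum(W[i][j] * v_plus[i] for i in range(n_vis))), rng)
              for j in range(n_hid)]
    # STEP 2: downward pass -> sampled "fantasy" (reconstructed) visible vector
    v_minus = [sample(sigmoid(sum(W[i][j] * h_plus[j] for j in range(n_hid))), rng)
               for i in range(n_vis)]
    # STEP 3: reconstruction fed up again -> new hidden probabilities
    h_minus = [sigmoid(sum(W[i][j] * v_minus[i] for i in range(n_vis)))
               for j in range(n_hid)]
    pos = [[v_plus[i] * h_plus[j] for j in range(n_hid)] for i in range(n_vis)]
    neg = [[v_minus[i] * h_minus[j] for j in range(n_hid)] for i in range(n_vis)]
    return pos, neg


W = [[0.5, -0.5], [1.0, 0.2], [-0.3, 0.8]]   # illustrative 3x2 weight matrix
pos, neg = cd1_statistics([1, 0, 1], W, random.Random(1))
```

Averaging `pos` and `neg` over several data vectors gives the two terms of Eq.2.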
Since both $S_i$ and $S_j$ are positive at all times, the output of our multiplier is restricted to the upper-right quadrant of the graph. However, it can be seen that for values of $S_j$ below zero (i.e. below 2.5 V) the output stays inactive.
[Figure: transistor-level schematic with transistors M1-M14, inputs Si and Sj, Vzero, R_Izero and Vdd.]
Fig. 3. Chible’s multiplier
B. Current memory
The current memory circuit is used to “hold” the current resulting from the first pass of the data through the network (i.e. $s_i^{+} s_j^{+}$ of Eq.3) until a second current, representing its one-step reconstruction ($s_i^{-} s_j^{-}$ of Eq.3), can be deducted from it.
[Figure: multiplier output current IMult (µA, 2 to 4) versus Sj (V, 0 to 5), for Si = 2, 2.1, ..., 4 V.]
Fig. 4. Multiplier test results
The output current of the multiplier is fed to the input of the current memory circuit shown in Fig.5. The current $I_{in}$ is mirrored to M3 and M4, generating a corresponding voltage at the drain of M3. When the clock switches on M5, this voltage is stored on $C_{mem}$, thereby inducing a current equal to $I_{in}$ through M9 (the transistors are matched). The dynamic current mirror comprising M5, M9 and $C_{mem}$ therefore memorises the value of $I_{in}$ every time the clock is high [9].
When the clock is low, $I_{+}$ retains its previous value, while $I_{-}$ is identical to $I_{in}$ at all times. At the end of a clock cycle, two currents have thus been isolated by our circuit. They are then both fed to the subtracter to calculate the “closeness of fit” for that particular reconstruction from the generative model.
Transistor M8 is a dummy transistor [10] used to compensate for charge injection [11] at M5 during switching of the clock. The clock is inverted by the CMOS pair M6 and M7, so that the two identical but opposite clock feedthroughs of M5 and M8 cancel. This gives the voltage across the capacitor some immunity from the clock and ensures better accuracy in the stored current $I_{+}$.
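The current memory behaves as a sample-and-hold on $I_{+}$ and a pass-through on $I_{-}$. This idealised sketch ignores charge injection and mismatch, and its initial held value of 0 is an arbitrary assumption.

```python
def current_memory(i_in_sequence, clock_sequence):
    """Idealised Fig.5: I+ is sampled while the clock is high and held while it
    is low; I- follows I_in at all times. Returns a list of (I+, I-) pairs."""
    i_plus = 0.0          # arbitrary initial held value (assumption)
    outputs = []
    for i_in, clock_high in zip(i_in_sequence, clock_sequence):
        if clock_high:
            i_plus = i_in  # dynamic current mirror (M5, M9, Cmem) samples I_in
        outputs.append((i_plus, i_in))
    return outputs


# STEP 1 current sampled with the clock high, then the STEP 3 current with it low (uA)
pairs = current_memory([2.0, 1.4], [True, False])
i_plus, i_minus = pairs[-1]      # both currents are now available to the subtracter
```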
[Figure: schematic with transistors M1-M9, capacitor Cmem, clock, Vdd, input Iin and outputs I+ and I−.]
Fig. 5. Current memory
C. Current subtracter
The role of the current subtracter is to compare the network's neural activity when presented with a data set (STEP 1 of Fig.1) with its activity after the one-step reconstruction of that data (STEP 3 of Fig.1). This comparison is performed by subtracting the two current inputs, $I_{+} - I_{-}$. The resulting current, $I_{sub}$, is a direct representation of the model's fitness, where $I_{sub}$ = ‘0’ (which actually corresponds to $I_{bias}$ in Fig.6) is the best fit.
[Figure: schematic with transistors M1-M22, inputs Iin+ and Iin−, bias current Ibias, output Iout and Vdd.]
Fig. 6. Accurate current subtracter
The current subtracter we have designed for this task is depicted in Fig.6. We have enhanced the circuit's accuracy by using improved Wilson current mirrors to keep channel-length modulation to a minimum. The sum of the current inputs, $I_{in+} + I_{bias}$, is accurately mirrored to M8 and M6 (Fig.6). The current $I_{in-}$ is mirrored to M13 and M15 from the current flowing into M6 at the drain of M9. The current through M9 and M11, which is the difference between $I_{in+} + I_{bias}$ and $I_{in-}$, is itself mirrored to the output transistors M20 and M18, so that $I_{out} = I_{in+} + I_{bias} - I_{in-}$.
The test results depicted in Fig.7 show that the response is very linear. The bottom graph of Fig.7 expresses the error, i.e. the non-linearity, as a percentage. It can be seen that it is extremely low over most of the range and only increases when the output current approaches zero.
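The subtracter's ideal transfer function follows from the description above: $I_{out} = I_{in+} + I_{bias} - I_{in-}$. The sketch below assumes $I_{bias}$ = 1.5 µA, the value later sections treat as “zero”, and assumes that the percentage error of Fig.7 is taken relative to the ideal output, which would explain why it grows as the output approaches zero.

```python
I_BIAS_UA = 1.5   # assumed: the current later sections treat as "zero"


def subtracter(i_in_plus: float, i_in_minus: float, i_bias: float = I_BIAS_UA) -> float:
    """Ideal model of Fig.6: I_out = I_in+ + I_bias - I_in- (all in uA)."""
    return i_in_plus + i_bias - i_in_minus


def nonlinearity_percent(measured: float, ideal: float) -> float:
    """Error relative to the ideal output, in % (assumed definition for Fig.7)."""
    return 100.0 * abs(measured - ideal) / ideal


best_fit = subtracter(2.0, 2.0)               # equal inputs give I_out = I_bias
small_err = nonlinearity_percent(2.02, 2.0)   # 20 nA off a 2 uA output: 1 %
large_err = nonlinearity_percent(0.12, 0.1)   # same 20 nA off a 0.1 uA output: 20 %
```

The same absolute deviation weighs far more as the output nears zero, matching the shape of the error plot.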
[Figure: top, Iout, Ibias, Iin+ and Iin− (µA, 0 to 3) versus Iin− (µA, 0 to 3); bottom, error (%, 0 to 20) over the same range, with an annotated value of 1.573.]
Fig. 7. Chip results of Fig.6
D. Current Accumulator/Averaging circuit
The weight update values, $\Delta w_{ij}$, are aggregated over a number of data samples before the weights are actually modified. The quality of the learning is directly related to the size of the data set used before the synaptic strengths are updated: the bigger the data set (i.e. the more features to learn), the better. This is, however, impractical in aVLSI. We have therefore compromised and used data sets of 4, meaning that the weight updates are calculated and applied every 4 samples.
[Figure: four dynamic current mirrors, each with a memory capacitor Cmem switched by one of the clock phases F1-F4; transistors M1-M16, input Iin, mirror ratios 1/1 and 1/4, Vdd.]
Fig. 8. Current Accumulator/Averaging circuit
Our current accumulator circuit (Fig.8) juxtaposes four dynamic current mirrors [9], each active on a different clock cycle. The dynamic current mirror relies on the fact that the current flowing through a MOS transistor depends primarily on the voltage applied to its gate, so for two identical MOSFETs any given $V_{GS}$ will generate an identical current in each transistor.
The current $I_{in}$ produces a voltage at the gate of the active load. When switched on for long enough, this voltage is passed onto a short-term memory capacitor, $C_{mem}$, which biases an identical transistor. Over four clock cycles, currents are memorised on four such transistors. These four currents are mirrored and summed at a common node, and the 1:4 transistor size ratio of the output current mirror finally averages them (i.e. divides by 4).
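An ideal model of the accumulator/averager: four held currents (one per clock phase F1-F4) are summed and the 1:4 mirror divides the sum by four. The sign test against 1.5 µA follows the text's definition of “zero”; the sample values and function names are illustrative.

```python
I_ZERO_UA = 1.5   # the current treated as "zero" in the text


def accumulate_and_average(samples_ua):
    """Idealised Fig.8: one held current per clock phase F1-F4, summed, then
    divided by 4 by the 1:4 output mirror."""
    if len(samples_ua) != 4:
        raise ValueError("the accumulator holds exactly four samples")
    total = sum(samples_ua)      # the four mirrored currents are summed at one node
    return total / 4             # 1:4 size ratio performs the averaging


def learning_direction(i_avg_ua):
    """Only the sign relative to 'zero' (1.5 uA) is used by Eq.3."""
    if i_avg_ua > I_ZERO_UA:
        return "Up"
    if i_avg_ua < I_ZERO_UA:
        return "Down"
    return None


i_avg = accumulate_and_average([1.25, 2.0, 1.75, 1.5])   # illustrative I_sub values in uA
```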
Although not represented in Fig.8, dummy switches [10] are used to compensate for charge injection [11] caused by the switching of F1-F4. As the injected charge is signal dependent [12], the dummy transistor sizes have been chosen to maximise accuracy around “zero” (i.e. 1.5 µA). Only the sign of the result is used in Eq.3, so the highest accuracy is required at the transition between positive and negative values, to ensure the correctness of the weight update, i.e. that the weight is nudged in the right direction.
The test results presented in Table I show that the accumulation and averaging of currents is performed to within about 7%. However, the consistency of the results across the six chips suggests that the output currents are mostly affected by small negative offsets. These offsets are acceptable, as they will be trained out during learning, reducing the margin of error of our circuit.
TABLE I
Test results of the current accumulation and averaging. The currents are in µA and are averaged over several runs. The error represents the mean error across the six chips.
Input  chip1  chip2  chip3  chip4  chip5  chip6  Err. (%)
0.5    0.464  0.473  0.463  0.451  0.467  0.474  6.93
0.999  0.937  0.948  0.931  0.878  0.939  0.936  7.09
1.499  1.409  1.419  1.397  1.351  1.405  1.411  6.69
1.999  1.875  1.891  1.857  1.819  1.864  1.881  6.72
2.498  2.335  2.333  2.321  2.263  2.334  2.329  7.15
2.998  2.794  2.816  2.776  2.729  2.792  2.796  7.14
[Figure: schematic with transistors M1-M8, input Iin, reference Iref, output Vcomp and Vdd.]
Fig. 9. Current Comparator
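The Err.(%) column of Table I can be reproduced from the chip data: it matches the shortfall of the six-chip mean relative to the first column, expressed as a percentage and truncated (not rounded) to two decimals. This reading is inferred from the numbers rather than stated in the text. The same data also shows that every chip reads below the expected value, consistent with the negative offsets mentioned above.

```python
import math

# Table I: (first-column current, [chip1 ... chip6]), all in uA
TABLE_I = [
    (0.5,   [0.464, 0.473, 0.463, 0.451, 0.467, 0.474]),
    (0.999, [0.937, 0.948, 0.931, 0.878, 0.939, 0.936]),
    (1.499, [1.409, 1.419, 1.397, 1.351, 1.405, 1.411]),
    (1.999, [1.875, 1.891, 1.857, 1.819, 1.864, 1.881]),
    (2.498, [2.335, 2.333, 2.321, 2.263, 2.334, 2.329]),
    (2.998, [2.794, 2.816, 2.776, 2.729, 2.792, 2.796]),
]
REPORTED_ERR = [6.93, 7.09, 6.69, 6.72, 7.15, 7.14]


def mean_error_percent(expected: float, chips: list) -> float:
    """Shortfall of the mean chip output relative to the expected value, in %."""
    mean = sum(chips) / len(chips)
    return 100.0 * (expected - mean) / expected


def truncate_2dp(x: float) -> float:
    """Truncate (not round) to two decimal places."""
    return math.floor(x * 100) / 100


computed = [truncate_2dp(mean_error_percent(e, c)) for e, c in TABLE_I]
```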
E. Sign circuit
The sign circuit receives a current whose sign determines the “direction” of the learning step. Its role is to decide the value of the weight update (Eq.3) in response to that input current, by generating short pulses to control the weight change circuit [13]. One pulse forces the weight to go up, whilst the other pushes it down. The situation where both are ‘on’ simultaneously is avoided, as the direction of learning must be clear at all times. The widths of these pulses, combined with the current input $\varepsilon$ of the weight change circuit, define the learning rate.
The sign circuit is composed of two identical short-pulse generators (Fig.11) and two similar current comparators (Fig.9).
1) Current Comparator: The current comparators signal whether the incoming current is positive or negative (zero being 1.5 µA). One comparator's output goes high when its current input is bigger than 1.5 µA, while the other's goes low. The comparators consist of one NMOS and two PMOS current mirrors and one or two CMOS inverters (Fig.9). When the input current is smaller than the reference current, i.e. $I_{in} < I_{ref}$, one mirror transistor tries to source more current than its counterpart can take. This causes a transistor to turn off and the input voltage of the CMOS inverter to reach zero. On the other hand, when $I_{in} > I_{ref}$, one mirror transistor tries to sink more current than its counterpart can provide, causing a transistor to turn off and its drain voltage to approach Vdd [14]. The transistor sizes have been chosen so that when $I_{in} = I_{ref}$ both comparator outputs are high, ensuring that the weight remains unchanged (Fig.10).
[Figure: top, Iin and Iref (µA, 0 to 3) versus Iin (µA); bottom, Vcomp (V, 0 to 5) of Comparator 1 and Comparator 2 versus Iin (µA, 0 to 3).]
Fig. 10. Current Comparator test results
[Figure: D-type flip-flop (D, Q, Q̄, S, R, C, Reset) with an AND gate and an inverter, transistors M1-M3, capacitor Cdelay, inputs Vin, Vbias and clock, output Vpulse, Vdd.]
Fig. 11. Short Pulse Generator
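The two-comparator decision logic can be sketched as follows. The dead band around 1.5 µA is a hypothetical stand-in for the transistor sizing that keeps both outputs high when $I_{in} = I_{ref}$; outputs are booleans with True meaning “high”, and a low output requests a pulse.

```python
I_ZERO_UA = 1.5      # current treated as "zero" in the text
DEAD_BAND_UA = 0.05  # hypothetical: the chip sets this window by transistor sizing


def comparator_outputs(i_in_ua, i_ref_ua=I_ZERO_UA, dead_band_ua=DEAD_BAND_UA):
    """Return (up_out, down_out) as booleans, True meaning 'high'.

    up_out goes low when I_in is clearly above I_ref (weight to be incremented);
    down_out goes low when I_in is clearly below I_ref (weight to be decremented);
    near I_ref both stay high, so the weight is left unchanged.
    """
    up_out = not (i_in_ua > i_ref_ua + dead_band_ua)
    down_out = not (i_in_ua < i_ref_ua - dead_band_ua)
    return up_out, down_out


def decide(i_in_ua):
    up_out, down_out = comparator_outputs(i_in_ua)
    if not up_out:
        return "Up"
    if not down_out:
        return "Down"
    return None   # both outputs high: no pulse, weight unchanged
```

Because the two thresholds sit on opposite sides of the reference, both outputs can never be low together, which matches the requirement that Up and Down never overlap.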
2) Short Pulse Generator: The direction of learning is controlled by two pulse generators, one controlling the ‘UP’ step and the other the ‘DOWN’ step. Their widths define the time for which the weight will change, so it is necessary to include control of the width and, consequently, of the learning step.
Each pulse generator is connected to a current comparator (Fig.9). The output of one comparator is low when the weight is to be incremented, while the other is low when it is to be decremented (Fig.10). If both are high, the weight remains unchanged. Our circuit must therefore generate a low pulse only when it detects a low input signal.
When the input $V_{in}$ of the D-type flip-flop (Fig.11) is below its threshold voltage, the flip-flop output goes low on the rising edge of the clock. This output signal switches on the PMOS M2 of the CMOS inverter M2-M3 and causes the capacitor $C_{delay}$ to charge up at a rate fixed by $V_{bias}$. When the voltage across the capacitor reaches the threshold level of the inverter it is connected to, the flip-flop forces its output high again (Fig.12), thereby generating a short pulse. The time needed for the capacitor to trigger that inverter therefore defines the pulse width. This width, controlled by the top PMOS M1, can take any value from 5 ns, when the PMOS is fully on, to many seconds when it is pushed into subthreshold (Fig.13).
[Figure: waveforms of clock, Vdelay, pulse and Vin versus time (µs, 0 to 4.5).]
Fig. 12. Test results obtained with the pulse generator of Fig.11
[Figure: pulse width (s, log scale from 10^-8 to 10^2) versus bias voltage (V, 0.5 to 5), spanning nanosecond, microsecond, millisecond and second time scales.]
Fig. 13. Test results showing the pulse widths resulting from the bias voltage $V_{bias}$
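Because the pulse width is set by the time a current takes to charge $C_{delay}$ up to the inverter's switching level, a first-order estimate is $t = C_{delay} V_{switch} / I$. The capacitance, switching level and currents below are hypothetical, chosen only to show how a current swing from strong inversion to deep subthreshold spans nanoseconds to seconds. In subthreshold the drain current depends exponentially on the gate voltage, which is why Fig.13 covers so many decades.

```python
def pulse_width_s(c_delay_f: float, v_switch_v: float, i_charge_a: float) -> float:
    """Idealised width: time for a constant current to charge C_delay up to the
    inverter switching level, t = C * V / I."""
    return c_delay_f * v_switch_v / i_charge_a


C_DELAY_F = 1e-12   # hypothetical 1 pF
V_SWITCH_V = 2.5    # hypothetical inverter switching level

fast = pulse_width_s(C_DELAY_F, V_SWITCH_V, 500e-6)  # strong inversion: 5 ns
slow = pulse_width_s(C_DELAY_F, V_SWITCH_V, 1e-12)   # deep subthreshold: 2.5 s
```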
F. Digital control
The digital circuit generating the clocks that control all the on-chip learning-rule instances is depicted in Fig.14. It generates a sequence of clocks from a general clock. The first four signals, F1-F4, control the current accumulator by sampling each of the four data of a data set. When these four accumulations and the averaging are done, i.e. once the sign of the learning is known, the ReadOut signal allows the sign circuit to change the synaptic strengths. Test results are presented in Fig.15. Some transient spikes appear as ReadOut goes down, but they have no spurious effect on the learning, as they only appear after ReadOut and before the first current accumulation at F1.
[Figure: two D-type flip-flops driven by the general clock, producing the phases F1-F4 and the read-out signal.]
Fig. 14. Digital control circuit
[Figure: waveforms of clock, F1, F2, F3, F4 and ReadOut (0 to 1) versus time (µs, 0 to 20).]
Fig. 15. Test results of Fig.14
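The phase sequence of Fig.14 and Fig.15 can be modelled as a repeating generator: four accumulation phases, then the read-out phase that enables the weight update. This is a behavioural sketch, not a description of the flip-flop logic.

```python
from itertools import islice


def control_phases():
    """Repeating phase sequence (behavioural sketch of Fig.14/Fig.15):
    F1-F4 each sample one datum into the accumulator, then ReadOut lets the
    sign circuit update the weight."""
    while True:
        for phase in ("F1", "F2", "F3", "F4", "ReadOut"):
            yield phase


first_two_cycles = list(islice(control_phases(), 10))
```

Since ReadOut is always followed by F1, any glitch on ReadOut's falling edge lands before the next accumulation starts, as the text observes.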
G. The weight change circuit
The circuit we have designed to update the weights according to the learning rule of Eq.3 is presented in Fig.16. The learning direction is controlled by the two switches Up and Down, which allow $W_{+}$ or $W_{-}$ to be fed back to the input. As Fig.16 suggests, $W_{+}$ is slightly bigger than the stored weight, while $W_{-}$ is slightly smaller. By allowing feedback in either direction, the weight $W_{ij}$ can be ramped to virtually any value. In theory, the step size can be made arbitrarily small. In practice, however, the mismatch between the differential pair N1 and N2 determines the smallest possible update value. To minimise this, we have laid out large interlaced transistors for the differential pair. A fuller description of the weight update circuit and chip results can be found in [13].
We have added a voltage limiter (N10, N11, P7, P8, P9) to the circuit presented in [13]. We want to prevent the weight from going below 1.8 V, thereby ensuring a good response from our circuit. To do so, we use a simple unbalanced inverter: P7 and N10 are sized so that the inverter's output goes high when the voltage stored on $C_{weight}$ goes below 1.8 V. The second inverter, made of P8 and N11, then switches on P9, which resets the weight to Wlim.
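A behavioural sketch of one weight-change event with the lower limiter follows. `step_v` stands for the voltage change produced by one pulse (set by the pulse width and the learning-rate current); the Wlim reset value is not given in the text, so it is assumed here to equal the 1.8 V threshold.

```python
W_LIMIT_V = 1.8   # lower bound enforced by the limiter (from the text)
W_LIM_V = 1.8     # hypothetical: Wlim reset voltage, not stated in the text


def weight_step(w_v: float, up: bool, down: bool, step_v: float) -> float:
    """Apply one Up or Down pulse to the stored weight voltage (idealised)."""
    if up and down:
        raise ValueError("Up and Down pulses must never overlap")
    if up:
        w_v += step_v
    elif down:
        w_v -= step_v
    if w_v < W_LIMIT_V:   # unbalanced inverter trips; P9 resets the weight
        w_v = W_LIM_V
    return w_v


w_after_up = weight_step(2.0, up=True, down=False, step_v=0.025)
w_clamped = weight_step(1.85, up=False, down=True, step_v=0.1)
```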
[Figure: transistor-level schematic with differential pair N1-N2, transistors N3-N11 and P1-P9, resistors Rlow and Rhigh, capacitor Cweight, signals Up, Down, W+, W−, W', W, feedback, Vref_in, Vref_out, Init_ctrl, Init_weight, Wlim and Vdd.]
Fig. 16. Weight change circuit with easily tunable learning rate/step (ε in Eq.3).
H. The learning rule
The entire learning rule, as depicted in Fig.2, has been implemented in the AMS CUP 0.6 µm CMOS process. The two random slices of learning selected in Fig.17 show that the chip is capable of changing the synaptic strengths of its neurons in response to input stimuli, i.e. it learns. To facilitate testing, we fixed $s_j^{+}$ and $s_j^{-}$ to 4 V and restricted the values of $s_i^{+}$ and $s_i^{-}$ to 2, 3 and 4 V (ref. Fig.4). We controlled and multiplexed the inputs with a PC, allowing only three scenarios for the input pattern: $s_i^{+} s_j^{+} > s_i^{-} s_j^{-}$, $s_i^{+} s_j^{+} = s_i^{-} s_j^{-}$ and $s_i^{+} s_j^{+} < s_i^{-} s_j^{-}$. According to Eq.3, the weight should increase in the first scenario, remain constant in the second and decrease in the third. The test results show that the first and third scenarios give the expected weight change. Unfortunately, the second does not: the weight oscillates between two possible values. This is due to the many on-chip offsets and the difficulty of obtaining a true ‘zero’ when computing with analogue values. However, this will not be a problem when the step size is set much smaller than the precision required by the application. The bottom graph of Fig.17 shows that synaptic changes below 25 mV can be realised by our circuitry. This is an important feature, as it allows the learning parameters to be tuned to suit the needs of different applications and data sets.
[Figure: two panels of weight Wij (V) versus time (s, 0 to 5) with inputs Si+ and Si−; top, weight 2 to 4 V with steps ΔW ≈ 100 mV; bottom, weight 2 to 3 V with steps ΔW < 25 mV.]
Fig. 17. Test results showing on-chip learning
IV. CONCLUSIONS
We have presented the contrastive divergence learning scheme of the Products of Experts, which we have slightly altered to render it even more hardware-friendly. The circuits designed for its VLSI implementation are described in this paper. Finally, chip results from the individual circuits and then from the entire learning rule were presented to validate our approach. They show that our chip is capable of accurate on-chip learning and that the learning rate is easily tunable.
REFERENCES
[1] H. Chen and A.F. Murray, “A continuous restricted Boltzmann machine with an implementable training algorithm,” IEE Proceedings Vision, Image and Signal Processing, 2003.
[2] T.B. Tang, H. Chen, and A.F. Murray, “Adaptive Stochastic Classifier for Noisy pH-ISFET Measurements,” in International Conference on Artificial Neural Networks, June 2003.
[3] P. Fleury and A.F. Murray, “Mixed-Signal VLSI Implementation of the Products of Experts Contrastive Divergence Learning Scheme,” in International Symposium on Circuits and Systems (ISCAS), 26-28 May 2003.
[4] G.E. Hinton, “Training Products of Experts by Minimizing Contrastive Divergence,” Tech. Rep. GCNU TR 2000-004, 2000.
[5] G.E. Hinton, “Products of Experts,” in International Conference on Artificial Neural Networks (ICANN’99), Edinburgh, Scotland, Sept. 1999, pp. 1–6.
[6] G.E. Hinton, “Training Products of Experts by maximizing contrastive likelihood,” Tech. Rep., Gatsby Computational Neuroscience Unit, 1999.
[7] A.F. Murray, “Novelty Detection using Products of Simple Experts - A Potential Architecture for Embedded Systems,” Neural Networks, 2001.
[8] H. Chible, “Analog circuit for synapse neural networks VLSI implementation,” in International Conference on Electronics, Circuits and Systems (ICECS), 2000, vol. 2, pp. 1004–1007.
[9] G. Wegmann and E.A. Vittoz, “Very accurate dynamic current mirrors,” Electronics Letters, vol. 25, no. 10, pp. 644–646, 11 May 1989.
[10] C. Eichenberger and W. Guggenbuhl, “Dummy transistor compensation of analog MOS switches,” IEEE Journal of Solid-State Circuits, vol. 24, no. 4, pp. 1143–1146, Aug. 1989.
[11] G. Wegmann, E.A. Vittoz, and F. Rahali, “Charge injection in analog MOS switches,” IEEE Journal of Solid-State Circuits (JSSC), vol. 22, pp. 1091–1097, Dec. 1987.
[12] K. Leelavattananon, J.B. Hughes, and C. Toumazou, “Very low charge injection switched-current memory cell,” in ISCAS, 1998, vol. 1, pp. 531–534.
[13] P. Fleury and A.F. Murray, “High-accuracy mixed-signal VLSI for weight modification in contrastive divergence learning,” in International Conference on Artificial Neural Networks (ICANN’2002), 27-30 Aug. 2002, pp. 426–431.
[14] T. Borgstrom and S. Bibyk, “A neural network integrated circuit utilizing programmable threshold voltage devices,” in ISCAS, 1989, vol. 2, pp. 1227–1230.