The design, implementation and operation of a low power multilayer perceptron chip (Kakadu) in the framework of a cardiac arrhythmia classification system is presented in this paper. This system, called MATIC, makes timing decisions using a decision tree, and a neural network is used to identify heartbeats with abnormal morphologies. This classifier was designed to be suitable for use in implantable devices and a VLSI neural network chip (Kakadu) was designed so that the computationally expensive neural network algorithm can be implemented with low power consumption. Kakadu implements a (10,6,4) perceptron and has a typical power consumption of tens of microwatts. When used with the arrhythmia classification system, the chip can operate with an average power consumption of less than 25 nW.
tecture enables one to keep the benefits of a powerful morphology classification algorithm yet maintain low power consumption.
Section II describes the neural network VLSI chip (Kakadu) which was designed to perform the morphology part of the MATIC system. A comparison of various training algorithms and their ability to train the Kakadu chip is made in Section III. In Section IV, examples of Kakadu applied to simple classification problems are described. Section V describes the MATIC algorithm and the system's performance when applied to a large database of arrhythmias. Finally, a brief discussion followed by conclusions on this work are presented.
II. Kakadu MLP Chip
The Kakadu chip was implemented using low power analog techniques because they have the following advantages: subthreshold analog circuits enable implementations which consume very little energy; they are easily interfaced to the analog signals in an ICD (in contrast to digital systems, which require analog to digital conversion); analog circuits are generally small in area; fully parallel implementations are possible; and a certain amount of fault tolerance may be exhibited by the neural network.
A. Architecture
Kakadu implements an artificial neural network based on the multilayer perceptron model [3]. A block diagram of the chip is shown in Figure 1. The chip takes voltage inputs and passes them through the first array of synapses to produce six pairs of hidden layer currents. These currents are converted to voltages using linear resistors that are external to the chip. The same nodes are used as voltage inputs to the next layer which produces output currents which are converted to voltage outputs by the third neuron layer.
The main blocks of the chip are two synapse arrays, a current source and weight addressing circuitry. The synapse's digital to analog converters are binary weighted current sources which are controlled by digitally stored weights. A common current source is used to supply bias voltages to synapses in each DAC. The circuit can be operated over a wide range of bias currents.
Although inputs to the neural network are analog, synapse values are written digitally. This enables configuration of the chip to be performed digitally but keeps the actual signal processing in the analog domain. The synapse array appears as an 84 word RAM (the first layer having 10 × 6 words and the second layer having 6 × 4 words) with a 6 bit word size. Synapses are addressed by row and column through pairs of multiplexed row and column shift registers.
B. Implementation
B.1 Current Source
A single current source is used to provide biases for all synapses of the chip. The current source (Figure 2) is constructed by summing unit current sources. For transistors with uncorrelated matching properties, summing N unit current sources improves the matching by a factor of $\sqrt{N}$ [4]. Correlated matching properties such as changes in doping or oxide thickness are addressed by arranging the current sources in a common centroid configuration [4]. Large (553 µm²) transistors are used for the current source although smaller (81 µm²) transistors are used inside the digital to analogue converters (DACs) in order to keep the total synapse area small.
The bias current is controlled by an off-chip current or voltage. Since all of the currents feeding the synapses are derived from this single input, the entire circuit can be switched off by making Iin equal to zero. The current source can operate in either strong inversion or subthreshold mode, depending on the magnitude of the bias current. In the experiments, subthreshold operation was used.
B.2 Synapse
The synapse is composed of registers which store the weight values, a linear DAC and a transconductance multiplier. The bias current is the same as the unit current for the DAC so each DAC can output 31 times the bias current. A circuit diagram of the synapse is shown in Figure 3.
Since synapses are the most numerous elements in a neural network, the size of the network that will fit in a given area is controlled by their dimensions. Although small synapses are required, the matching of crucial transistors (the 5 mirror transistors connected to I0 to I4) within the synapse is proportional to the square root of the transistor area and so these transistors should be made as large as possible. A compromise was reached in selecting 81 µm² transistors for the I0 to I4 mirrors within the synapse.
Storage of the synapse values is achieved using registers, the values of which are converted to analog values via a DAC. To achieve a small synapse area, the registers were designed to be as narrow as possible since each register contains 6 flip-flops.
The DAC is constructed through current summing. Each bit of the DAC is controlled by a pass transistor which can be turned on or off depending on the value stored in the (static) input flip-flop (B0 to B4). I0 to I4 are voltages taken from the current source which serve to bias the currents in powers of two. B5 is used to encode the sign and is included in the synapse rather than the DAC. The entire synapse array appears as a large (write only) register to the controlling digital circuitry which programs the weight values.
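As an illustration of the weight format described above, the following Python sketch packs a signed weight in the range -31 to +31 into a 6 bit word with a 5 bit magnitude (B0 to B4) and a sign bit (B5). The function names, the placement of B5 as the most significant bit and the choice of B5 = 1 for a negative weight are assumptions made for this example; the text does not state the bit polarity or the word layout.

```python
def encode_weight(w: int) -> int:
    """Pack a signed weight (-31..31) into a 6 bit word: B5 = sign, B4..B0 = magnitude."""
    if not -31 <= w <= 31:
        raise ValueError("weight must lie in -31..31")
    sign_bit = 1 if w < 0 else 0           # assumed polarity: 1 means negative
    return (sign_bit << 5) | abs(w)


def decode_weight(word: int) -> int:
    """Recover the signed weight from a word produced by encode_weight."""
    magnitude = word & 0b11111             # B0..B4 select the binary weighted currents
    return -magnitude if word & 0b100000 else magnitude


def dac_current(word: int, i_bias: float) -> float:
    """DAC output current: magnitude times the unit (bias) current, always positive."""
    return (word & 0b11111) * i_bias
```

With a unit current equal to the bias current, the largest magnitude (31) gives 31 times the bias current, matching the figure quoted for the DAC.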
The DAC is connected to a transconductance multiplier to form a synapse. The multiplier has a pair of voltage inputs, a current input (from the DAC) and a pair of current outputs. The transfer function of this multiplier is given by the relation 
The multiplier is linear with the current inputs (from the DAC) and nonlinear to the neuron voltage inputs. This is the desired situation: if they were reversed, the tanh function would only serve to compress the range of weight values available and would not allow nonlinear problems to be solved. The DAC only produces positive values. Current switching logic controlled by B5 enables the output to be changed in sign if a negative weight is desired. The V+ and V− inputs are from either neurons or input pins. The outputs of the multiplier are two current sinks.
The area of a synapse in 1.2 µm double metal, single poly nwell technology is 106 × 113 µm, which includes all of the weight storage registers and switches, I0 to I4 current mirrors, multiplier and sign switching circuitry. A neural network can be constructed from a single current source (described in Section II-B.1) and a synapse array. A larger, single layer version of Kakadu has been designed, which contains a 50 × 50 array of synapses, current source and weight addressing logic on a 7.2 × 7.2 mm die.
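The behaviour described above (linear in the DAC current, tanh shaped in the differential input voltage, sign set by B5) can be sketched as follows. This is a minimal model under stated assumptions, not the circuit equation itself: the functional form `i_dac * tanh(kappa * (v_plus - v_minus) / 2)` is assumed from the tanh nonlinearity used later in the paper, the name `kappa` is hypothetical, and its default value is the curve fit constant 26.0719 reported in Section C.3 (units of 1/V assumed).

```python
import math

KAPPA = 26.0719  # curve fit constant from Section C.3; symbol name and units assumed


def synapse_current(weight: int, v_plus: float, v_minus: float,
                    i_bias: float = 6.63e-9, kappa: float = KAPPA) -> float:
    """Differential output current (A) of one synapse under the assumed model.

    weight : signed integer in -31..31; the sign is applied by the B5 switching logic
    i_bias : unit DAC current in amperes (6.63 nA is the value used in the experiments)
    """
    i_dac = abs(weight) * i_bias                 # the DAC itself only produces positive values
    sign = -1.0 if weight < 0 else 1.0
    return sign * i_dac * math.tanh(kappa * (v_plus - v_minus) / 2.0)
```

For a fixed input voltage the output scales linearly with the weight, while for a fixed weight it saturates at plus or minus the DAC current as the input difference grows.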
B.3 Neurons
In a low power system, where the neuron input current can be of the order of ten nanoamps, a high impedance of the order of 1 MΩ is required. This is hard to implement in standard MOS technology because diffusion and polysilicon do not have the high resistance necessary, and an active circuit with the desired transfer characteristic is difficult to design. If on-chip neurons are used, a method of measuring the activation of at least the output neurons is required for training, and this requires buffers to drive the signals off-chip.
A possible solution to this problem is to implement the neurons using off-chip resistors. The resistors are provided off-chip in order to allow easy control of the impedance and transfer characteristics of the neuron. The neurons also serve as test points for the chip. It is envisaged that these neurons will later be integrated onto the chip using either an active circuit or a high resistance material such as that used in a static RAM process. Since the neurons are implemented externally to the chip, nonlinear neurons can also be used.
However, a resistor has a linear characteristic which, at first glance, appears unsuitable. This problem was addressed by implementing the nonlinear characteristic required by the neural network in the synapse instead of the neuron. Using this technique, the nonlinearity of the Gilbert multiplier is used to an advantage.
The linear neurons mean that the transfer function of the Kakadu network is proportional to that of the synapses alone, and the transfer function is
$a_i = \alpha u_i$ (3)

where $u_i$ is the summed output of the synapses, $a_i$ is the neuron output, $\kappa$ and $\alpha$ are constants, $l$ denotes the $l$th layer ($0 \le l \le L-1$), $L$ is the total number of layers (namely 2), $N_l$ is the number of neuron units at the $l$th level, $i$ is the neuron number ($1 \le i \le N_l$) and $f_l(x) = \tanh(\kappa x / 2)$. For a two layer network, Equation 3 is very similar to the typical multilayer perceptron model as illustrated in Figure 4. Any set of inputs can be scaled so that they are within the range of $f_0(x)$, and so the initial nonlinearity applied by $f_0(x)$ (i.e. $f_l(x)$ where $l = 0$) does not change the computational capability of the circuit. There is no nonlinearity in the final layer, and this can be thought of as a linear output unit. Equation 3 can thus be rewritten in the familiar multilayer perceptron form
where $g_0(x) = \tanh(\kappa x / 2)$, $g_1(x) = x$ and it is assumed that the inputs have been initially passed through $g_0(x)$.
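The layered computation described in this section (a tanh applied inside each synapse, followed by a linear neuron that sums and scales the synapse outputs) can be sketched numerically as below. This is an illustrative model under assumptions rather than the paper's own equations: the weighted sum over synapses, the names `kappa` and `alpha`, and the default `alpha = 1.0` are all hypothetical choices made for the example.

```python
import math


def forward(inputs, w_hidden, w_out, kappa=26.0719, alpha=1.0):
    """Two layer feedforward pass with tanh in the synapses and linear neurons.

    inputs   : list of input voltages (10 for the full Kakadu network)
    w_hidden : one row of weights per hidden neuron (6 rows of 10)
    w_out    : one row of weights per output neuron (4 rows of 6)
    alpha    : linear neuron gain (hypothetical scale set by resistor and bias current)
    """
    def layer(values, weights):
        squashed = [math.tanh(kappa * v / 2.0) for v in values]    # synapse nonlinearity
        return [alpha * sum(w * s for w, s in zip(row, squashed))  # linear neuron
                for row in weights]

    hidden = layer(inputs, w_hidden)
    return layer(hidden, w_out)      # no nonlinearity after the final layer
```

Setting rows or columns of the weight matrices to zero removes the corresponding synapses, which is how a smaller network can be mapped onto the same structure.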
As shown in Section IV, this does not affect the neural network's ability to solve highly nonlinear problems such as XOR and the parity problems. The disadvantages of using off-chip neurons are that the currents must travel through pins, so pin leakage may affect the circuit, and that, for larger networks, the number of pins required may become excessive. The larger parasitic capacitances associated with the neurons also reduce the bandwidth and possibly increase the power consumption of the system. It should also be noted that all analogue VLSI neural network implementations have limited input and output ranges since they are (at best) bound by voltage and current restrictions imposed by the supplies.
B.4 Kakadu MLP Chip
Kakadu was fabricated using Orbit Semiconductor's 1.2 µm double metal, single poly nwell process. A photomicrograph showing the main synapse blocks, row shift registers and the current source is shown in Figure 5. Kakadu has 10 input, 6 hidden and 4 output neurons and hence is called a (10,6,4) neural network. It can implement any network smaller than (10,6,4) by setting unused synapse values to zero. A summary of the major chip features is shown in Table I.
C. Chip Testing
C.1 Jiggle Chip Tester
The Kakadu chip was tested using the "Jiggle" test jig [5]. Jiggle was designed at the University of Sydney and is a general purpose chip tester having 64 12-bit analog input/output channels as well as 64 digital input/output channels. Jiggle connects to a VME bus, and the VME cage is interfaced to a Sun SPARCstation IPC via a Bit 3 Model 466 SBUS to VME converter.
Jiggle allows arbitrary analog or digital signals to be presented to the pins of the test chip and thus allows software control of the weight updating and training of the Kakadu chip. For the experiments described below, a supply voltage of 3 V, a bias current of 6.63 nA and neuron values of 1.2 MΩ were used.
C.2 MDAC Linearity Test
The transconductance multiplier used in the Kakadu MDAC has a transfer function described by Equation 1. The output of the DAC ($I_{DAC}$ in Figure 3) is equal to $I_{out+} - I_{out-}$ (in Equation 1) and so for fixed input voltages the equation can be simplified to

$I_{out} = \beta I_{DAC}$ (6)

where $\beta$ is a constant and

$I_{out} = I_{out+} - I_{out-}$ (7)
$I_{out}$ is connected to a neuron (a 1.2 MΩ pullup resistor), and so the output can be measured as a voltage. Plots of the (measured) MDAC linearity for three bias currents are shown in Figure 6. One can see that monotonicity is not achieved for the bias current of 6.63 nA (at which the chip is operated). The two points at which it is not monotonic occur when the absolute value of the DAC input changes from 15 to 16, and the non-monotonicity is hence due to the most significant bit current source. At 15.9 nA, however, the DAC is monotonic. Training is used to account for the nonlinearity of the DAC.
C.3 Synapse Transfer Function
The synapse followed by a neuron has a transfer function described by

$V_{out} = R (I_{out+} - I_{out-})$ (8)

where $R = 1.2 \times 10^{6}\,\Omega$ and $(I_{out+} - I_{out-})$ is given by Equation 1. A curve fit was used to find $\kappa$ (26.0719) and a plot of the measured and expected synapse transfer function can be seen in Figure 7.
C.4 Power Consumption
It is useful to be able to estimate the power consumption of a chip like Kakadu. This is a function which is linear with the weight values since $I_{DAC}$ in Figure 3 is the current drawn for that particular synapse. A number of current consumption measurements were made for different weight values and then a least squares fit was used to derive the current consumption formula

$I_{KAKADU} = 0.842 + 0.00736 \sum_{i=0}^{N} |w_i| \quad (\mu\mathrm{A})$ (9)

where $w_i$ is the $i$th weight, $i$ indexes through all of the weights in the chip and $I_{KAKADU}$ is the current consumption in µA. Figure 8 shows the measured current dissipation of the chip and the curve fit of Equation 9 to this data. Note that the maximum power consumption (60 µW) of this chip occurs when all the weights are set to the maximum value.
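The fitted formula can be evaluated directly. The short Python sketch below applies Equation 9 to a list of weights and multiplies by the 3 V supply used in the experiments; the function names are hypothetical and the calculation covers only the static current predicted by the fit.

```python
def kakadu_current_uA(weights):
    """Estimated supply current in microamps from the least squares fit (Equation 9)."""
    return 0.842 + 0.00736 * sum(abs(w) for w in weights)


def kakadu_power_uW(weights, supply_v=3.0):
    """Estimated static power in microwatts at the given supply voltage."""
    return kakadu_current_uA(weights) * supply_v


# All 84 weights at the maximum magnitude of 31
print(round(kakadu_power_uW([31] * 84), 1))   # prints 60.0
```

With all 84 weights at magnitude 31 the fit gives about 20 µA, or about 60 µW at 3 V, which agrees with the maximum figure quoted above.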
III. Comparison of Training Algorithms
Training of analog neural network chips is much harder to achieve than training of their digital counterparts. Mismatch of transistors, imperfect transistor models and noise mean that a mathematical formula for the neural network transfer function cannot be attained and thus a formula for the gradient also cannot be reliably computed.
The training problem can be posed as an optimization problem in which a function $f(w)$ can be defined as

$f(w) = \sum_{p} \left( \mathit{ff}(w, p) - o(p) \right)^2$ (10)

where $p$ = input patterns, $\mathit{ff}$ = Kakadu feedforward function, $w$ = weights, $o(p)$ = desired output for pattern $p$, and the optimization task is to minimize $f(w)$.
Training is achieved by using the chip in feedforward mode, with the chip interfaced to a computer via Jiggle. An initial set of weights is written to the chip, and the training vectors are applied in turn to compute $f(w)$. The weights are updated by the training rule in order to minimize $f(w)$, and this procedure is repeated until the error reaches a sufficiently low value. In all of the training experiments, the error termination condition was that the mean squared error divided by the number of training patterns must be less than $1 \times 10^{-4}$.
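Because only the feedforward output can be measured, perturbation-based rules estimate the gradient of Equation 10 from error measurements alone. The Python sketch below shows one sequential pass of a simple forward-difference weight perturbation rule. It is a simplified illustration under assumptions, not the exact update rule used in the experiments: the step sizes `delta` and `rate` are hypothetical, the weights are kept as continuous values (the chip would quantize them to 6 bits), and `toy_model` is a stand-in for the chip measured through the test jig.

```python
def total_error(feedforward, weights, patterns):
    """Equation 10: squared error summed over all (input, target) patterns."""
    return sum((feedforward(weights, x) - t) ** 2 for x, t in patterns)


def weight_perturbation_pass(feedforward, weights, patterns,
                             delta=0.01, rate=0.5, limit=31.0):
    """Visit each weight in turn, estimate dE/dw by a forward difference, and step."""
    weights = list(weights)
    for i in range(len(weights)):
        base = total_error(feedforward, weights, patterns)
        weights[i] += delta                                   # perturb one weight
        grad = (total_error(feedforward, weights, patterns) - base) / delta
        weights[i] -= delta                                   # restore it
        weights[i] = max(-limit, min(limit, weights[i] - rate * grad))
    return weights


# Stand-in for the chip: a linear model with two weights (purely illustrative).
def toy_model(weights, x):
    return weights[0] * x[0] + weights[1] * x[1]


patterns = [((1.0, 0.0), 2.0), ((0.0, 1.0), -3.0)]
trained = weight_perturbation_pass(toy_model, [0.0, 0.0], patterns)
```

Each weight update costs two error evaluations over the training set, which is why sequential perturbation is slower per iteration than the parallel perturbation methods.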
Several techniques were used to train Kakadu by minimizing Equation 10. A comparison between different training algorithms is given below. An extensive set of experiments was performed to assess the capabilities of the Kakadu architecture in terms of training and generalization performance, and to compare the speed of various training algorithms. The algorithms used were

bp: back-propagation, with the gradient calculation derived from Equation 3,
cprs: constant perturbation random sign [6],
csa: combined search algorithm as described in Section IV-.1,
sa-only: pure simulated annealing,
sa: combined search algorithm with simulated annealing instead of the partial random search,
sed: stochastic error descent [7],
swnp: summed weight neuron perturbation [8] and
wp: weight perturbation [9].

A complete description of the training experiments has been reported [10] and a summary of the relevant results is presented here. Note that wp is sequential with respect to the weights, whereas swnp, sed and cprs are parallel weight perturbation methods.

The training problem was that of ICEG morphology classification. The training set consists of 8 QRS complexes taken from a single patient (4 VT 1:1 and 4 NSR). The testing set consists of 220 patterns (150 VT 1:1 and 70 NSR). Each experiment was repeated twenty times with different starting conditions.

A. Batch Training
Tables II and III show the performance of batch mode training, where weight updates were computed for each pattern and applied when all patterns had been processed. The maximum number of iterations was set to 2000. The results show that in batch mode, Kakadu can be trained to produce very good generalization performance.
B. Online Training
Tables IV and V show the performance of the training in online mode, where weight updates are computed for a given input pattern and then applied before the consideration of the next pattern. The maximum number of iterations was set to 1000. The combined search algorithms (csa, sa and sa-only) do not appear in the tables since they work only in batch mode. As can be seen from the tables, all algorithms are capable of training Kakadu with very good generalization performance.
C. Discussion on Training
All the training algorithms tried were successful in training Kakadu in batch or online mode, or both. The algorithm that was most successful in both batch and online mode was sequential weight perturbation (wp). None of the online methods managed to converge every time.
Of the batch algorithms, csa, sa-only, sa and wp converged in all cases. Of these four algorithms that converged every time, csa had the best generalization performance.
Since only 6 bit weights are used, the limited range and resolution of the weights make Equation 10 discontinuous and hence difficult to train. However, these training results show that reliable training and generalization can be achieved despite this problem.
IV. Other Chip Training Examples
Although Kakadu was designed primarily for low power arrhythmia classification, it has been applied to other classification problems. In this section, a series of training examples of increasing complexity are described.
An output was considered correct if the difference between the measured and desired output was less than a particular margin. In all of these experiments, this margin was set to be 0.08 V.
There is a linear relationship between the value of the neuron gains (which are determined by the resistance of the neurons) and the value of the bias current. For example, the Kakadu chip will produce the same output if the neuron gain is doubled and the bias current halved. Since the neuron gains were fixed at 1.2 MΩ, the bias current can be adjusted to suit the problem. This was not necessary in the experiments that were performed, but it is envisaged that this may be necessary for some problems. In all of the experiments below, the bias current used was 6.63 nA unless otherwise stated.
The Combined Search Algorithm [11] was chosen to perform the training in all of the Kakadu training experiments because of its reliable training and generalization performance.
.1 Combined Search Algorithm
The combined search algorithm [11] employs two minimization strategies, namely modified weight perturbation and random search. Modified weight perturbation is a local search and the random search algorithm is a non-local search technique. CSA can be described by the following pseudocode.
The CSA algorithm is very simple and the results obtained are surprisingly good, convergence being very fast for small problems. Although CSA has been successfully used to train Kakadu, it is expected that performance would degrade rapidly for larger neural networks.
.2 XOR
XOR has been a benchmark problem for neural networks because it is a simple yet highly nonlinear application. The minimum network size which can solve this problem is (3,2,1) with one input being a bias. To make Kakadu behave like a smaller network, the weight values for the unconnected synapses are set to zero. Kakadu was successfully trained on this problem, results of this test being shown in Table VI.
The power consumption for these neural network training problems was measured after the outputs had settled to the final output value. This is the static consumption and includes the chip plus the off-chip neuron dissipation. For XOR, this figure was 6.9 µW at 3 V and the standard 6.63 nA. The same problem has been trained with bias currents down to 3.5 nA.
The settling time from a change in the inputs until the output reaches 90% of the final value (the Kakadu chip driving a 2 pF active probe) was typically 30 µs. Of course, a higher load capacitance will increase the settling time of the chip.
.3 PARITY (3 BIT)
Another nonlinear benchmark test is the 3 bit parity problem which can be thought of as XOR in three dimensions. In this example, a bipolar coding was used instead of the unipolar coding of the XOR problem to show that Kakadu is capable of both.
Kakadu was successfully trained using a (4,3,1) network, the results of the experiment being shown in Table VII. The quiescent power consumption for this problem was 9.0 µW.
.4 PARITY (4 BIT)
Another nonlinear benchmark test is the 4 bit parity problem which can be thought of as XOR in four dimensions. This test was successfully trained using a (5,4,1) network, the results of the experiment being shown in Table VIII. The quiescent power consumption for this problem was 15.6 µW.
In this example, a satisfactory result could not be achieved by the direct optimization of Equation 10 on the test chip. A mathematical model of the chip was derived from Equation 3 and weights were allowed to be floating point values within the maximum .
Pattern Recognition
Kakadu was tested on a simple character recognition problem (Figure 9). The inputs of a (10,6,4) network were divided into a bias unit and a 3 × 3 pixel array. The network was trained (bias current 4.4 nA) on the characters '0', '1', '7' and '+', each output being assigned to one character. After training, a single bit in each character was corrupted and the network output passed through a "winner take all" decision to determine the network's classification of the corrupted character. The results of this experiment (shown in Table IX and Figure 9) show that Kakadu was able to correctly classify patterns that it had not been trained on. Kakadu draws 22.5 µW during this test.
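The corruption and decision steps of this test are simple enough to sketch in Python. The pixel coding (0 or 1 per pixel), the order in which the four outputs map to characters, and the function names are assumptions made for illustration; the text specifies only that one bit is corrupted and that the largest output wins.

```python
def flip_pixel(pixels, index):
    """Return a copy of a 9 element pixel list with one pixel inverted (0 <-> 1)."""
    corrupted = list(pixels)
    corrupted[index] = 1 - corrupted[index]
    return corrupted


def winner_take_all(outputs, labels=("0", "1", "7", "+")):
    """Return the label whose output voltage is largest (output order is assumed)."""
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    return labels[best]
```

For example, output voltages of 0.1, 0.2, 0.9 and 0.3 V would be classified as the third character under this labelling.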
V. MATIC Experiment
A. MATIC Algorithm
The MATIC algorithm [2] classifies arrhythmias based on timing and morphological features (see Figure 10). Inputs to the classifier are voltage levels obtained from temporary catheters implanted on the surface of the heart, one in the high right atrium (HRA) and one in the right ventricular apex (RVA). Such signals are known as intracardiac electrograms (ICEG). The raw signals are then bandpass filtered (0.5 to 100 Hz) and this becomes the input to the MATIC algorithm. MATIC can identify four different types of arrhythmia, namely ventricular fibrillation (VF), ventricular tachycardia (VT), supraventricular tachycardia (SVT) and normal sinus rhythm (NSR). These four different arrhythmias correspond to the four different therapies available in an ICD, which are to apply a high energy shock (defibrillate), pace the ventricles, pace the atrium or do nothing.
The R and P wave detectors are peak detectors which are used to identify depolarisations in the RVA and HRA channels respectively. From the output of the R and P wave detectors, the timing classifier can determine the depolarisation sequences of the heart. From this timing information, reliable classification can be made for most arrhythmias. A certain type of VT, called ventricular tachycardia with 1:1 retrograde conduction (VT 1:1), cannot be classified based on timing between two channels. However, VT 1:1 is often characterized by a change in morphology of the RVA signal and a neural network is used to recognize normal and VT 1:1 signals for a particular patient. The timing and morphology classifiers run in parallel and the results are combined using simple arbitration logic to produce a final classification.
A.1 MATIC Configuration
Most patients do not require morphology classification, timing being sufficient to identify the arrhythmia. However, for some patients morphological classification is required. A human must configure the MATIC system before it is used in order to identify whether morphology is required. MATIC is used with morphology only for VT 1:1 patients. In the case of non VT 1:1 patients, the arbitration logic discards all results from the neural network. A flow chart/block diagram of this configuration process is shown in Figure 11.
Configuration involves deciding if the patient is a VT 1:1 patient or not and if so, 4 samples each of the patient's NSR and VT 1:1 morphology must be provided to serve as templates for the morphology classifier. After training, the weights need not be changed.
After configuration, the MATIC system is fully automatic, taking the filtered data as input and producing NSR, SVT, VT and VF classifications as output.
A.2 Timing Logic
The timing logic used in MATIC can be described by the flowchart shown in Figure 12. The timing logic first computes the interval between R waves (RR interval), the interval between P waves (PP interval) and the interval between the last P wave and the last R wave (PR interval). From these three numbers, the timing logic can identify four different types of arrhythmia, namely ventricular fibrillation (VF), ventricular tachycardia (VT), supraventricular tachycardia (SVT) and normal sinus rhythm (NSR). These four different arrhythmias correspond to the four different therapies available in an ICD, which are to apply a high energy shock (defibrillate), pace the ventricles, pace the atrium or do nothing. It is a very simple classifier, implemented using a decision tree which classifies the sequences much as a human would.
A.3 Neural Network Morphology Classifier
A typical VT 1:1 patient is shown in Figure 13. It can be easily seen in the figure that the first three peaks (QRS complexes) have a different morphology to the last 5 QRS complexes. These patterns correspond to normal sinus rhythm (NSR) and ventricular tachycardia (VT) respectively and the neural network is used to recognize these rhythms.
A window of samples is formed via a delay line as illustrated in Figure 14. The neural network can be thought of as running continuously, and the output of the network is read when the QRS complex is centered in the 10 sample window formed by the delay line. The R wave detector is used to detect the middle of a QRS complex and hence center the QRS complex.
In order to construct a training set for a given patient, 4 normal and 4 VT 1:1 rhythms are hand selected. Normal rhythms were trained to have an output of 0.0 V and VT 1:1 rhythms 1.0 V. The weights used by the MATIC morphology classifier are then obtained by training the neural network on this data set. When ICEG signals are applied to the neural network, a VT 1:1 morphology is assumed present if the output becomes greater than 0.8 V.
A.4 The output is then passed through an X out of Y detector which outputs a classification only if 5 out of the last 6 output classes from the post processor are the same.
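An X out of Y detector of this kind can be sketched in a few lines of Python. The class name and interface are hypothetical, and two behaviours are assumptions not stated in the text: "5 out of the last 6" is read as at least 5, and no classification is produced until 6 inputs have been seen.

```python
from collections import deque


class XOutOfY:
    """Emit a class only when at least x of the last y inputs agree."""

    def __init__(self, x=5, y=6):
        self.x = x
        self.history = deque(maxlen=y)     # keeps only the most recent y labels

    def update(self, label):
        self.history.append(label)
        if len(self.history) < self.history.maxlen:
            return None                    # not enough history yet (assumed behaviour)
        for candidate in set(self.history):
            if self.history.count(candidate) >= self.x:
                return candidate
        return None                        # no class reaches the required count
```

Because x is more than half of y, at most one class can satisfy the condition at any time, so the result does not depend on the order in which candidates are examined.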
B. MATIC Results
The MATIC system was used on a database of 12483 QRS complexes recorded from 67 patients [12] during electrophysiological studies (EPS) where temporary catheters were placed under fluoroscopic guidance in the ventricle (RVA) and the atrium (HRA). These signals were sampled at 1000 Hz and recorded.
The MATIC algorithm was applied with the morphology algorithm being implemented on the Kakadu chip (via Jiggle) and the rest of the algorithm implemented in software. Note that in order to facilitate debugging, the delay line of Figure 14 was implemented in software. An experimental prototype of a bucket brigade device delay line was implemented on an earlier chip [13] and found to have a charge transfer inefficiency of 0.035%, which would be more than adequate for this application.
The database was "played" through this hybrid hardware/software system in order to obtain the results which are tabulated in Table X. The results of classifying only the VT 1:1 patients (the neural network is not used for other patients) are shown in Table XI. In order to compare the results of the experiment with that of a digital neural network, a standard two layer perceptron (TLP) of the same network size was first trained. The results of this are shown in Table XII. The TLP network has marginally better performance than the Kakadu network and this is mostly due to the limited precision weight range available on the Kakadu chip (6 bits).
The power consumption of Kakadu for the 10 VT 1:1 patients is shown in Table XIII. The maximum power consumption of the chip was 25 µW for the patients studied. The propagation delay of the Kakadu chip is approximately 30 µs, and Kakadu has negligible current consumption (< 100 pA) at zero bias. If a conservative value of 1000 µs for propagation is allowed, the energy consumed is 25 nJ per feedforward operation. The bias to the chip can be turned off when it is not being used (99.9% of the time), and so the average power consumption of the system (assuming the normal heart rate is 1 Hz) can be reduced by a factor of 1000 to less than 25 nW.
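The duty cycle arithmetic behind these figures can be checked with a short Python sketch. The function names are hypothetical; the default values are the 25 µW maximum power, the conservative 1000 µs active time and the 1 Hz heart rate quoted above.

```python
def energy_per_classification_nJ(power_uW=25.0, active_time_us=1000.0):
    """Energy per feedforward pass in nanojoules (uW * us = pJ, so divide by 1000)."""
    return power_uW * active_time_us / 1000.0


def average_power_nW(power_uW=25.0, active_time_us=1000.0, heart_rate_hz=1.0):
    """Average power in nanowatts when the bias is on only during each classification."""
    duty_cycle = active_time_us * 1e-6 * heart_rate_hz   # fraction of time the chip is biased
    return power_uW * 1000.0 * duty_cycle
```

With the default values the duty cycle is 0.1%, giving 25 nJ per classification and an average of 25 nW; a faster heart rate raises the average proportionally.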
VI. Discussion
The problem of arrhythmia classification by morphology is not a particularly difficult problem if one is given a reasonable power budget. However, an ICD requires very low power consumption, and this precludes the use of most arrhythmia classification algorithms.
Neural networks have been successfully applied to pattern recognition problems in many areas of signal processing. Their regular and parallel architecture makes them suitable for VLSI implementation and they can be implemented efficiently using analog techniques.
Neural networks are particularly suitable for use in implantable devices since analog computation leads to small area, and by operating transistors in the subthreshold region, low power consumption is achieved. These design ideas were used to implement the MATIC system which has the advantages of neural network classifiers, while maintaining low power consumption.
VII. Conclusion
We have described the MATIC system which classifies arrhythmias based on a hybrid decision tree and neural network approach. Because a neural network is computationally expensive, a low power analog VLSI neural network chip was designed and successfully trained to perform this task. The system can classify a large database of arrhythmias to an accuracy of 98.4% while consuming an average power of less than 25 nW.
VIII. Acknowledgements
This project was supported by Australian Generic Technology Grant 16029 and Telectronics Pacing Systems Ltd, Australia. The authors would also like to thank the reviewers for their constructive comments.
