Abstract - A novel circuit-level Hamming artificial neural network architecture based on the principle of analog charge-based computation of the neural function is proposed. k-winner-take-all and k-loser-take-all operations are performed in the time domain, allowing for fast and compact realization of complex functions. The VLSI realization of a two-dimensional array arrangement of the Hamming network is presented, targeting a precision alignment image processing application.
I. INTRODUCTION
The complexity of artificial neural network (ANN) algorithms has opened the door to their hardware implementation, with the main target of accelerating all computations involved. Silicon CMOS has proven to be an advantageous medium, allowing for high integration density and very fast operation. Hence, many realizations have been demonstrated both in the digital and analog domains [1]-[2].
However, most of the proposed circuits face severe limitations in terms of circuit resolution, temporary memory size and access schemes, and the realization of high interconnect density. Moreover, the increasing demand for low-power embedded systems remains a hurdle for computation-intensive ANN-related algorithms. Analog VLSI addresses these issues by implementing analog atomic elements, each processing a specific neural function, which are repeated into a regular structure forming a high-performance processing unit.
The hardware Hamming ANN implementation proposed in this paper follows this approach, with the goal of constructing a building block for high-speed and low-power image processing applications.
II. CAPACITIVE-BASED NEURAL HAMMING OPERATION
A Hamming ANN is a two-layer feed-forward neural network which has the ability to classify input patterns based on the criterion of the Hamming distance between previously stored patterns and the actual input vectors [3]. It always converges towards one of the stored patterns, as a benefit of its architecture. The first layer, called the quantifier network, is composed of a number of neurons which perform the Hamming distance computation. The second layer, called the discrimination layer, is traditionally composed of a feed-forward network which performs the winner-take-all (WTA) operation, i.e. selects the first-layer neuron of smallest Hamming distance as the winner.
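As an illustration, the two-layer operation described above can be sketched behaviorally in Python. This is a minimal software model and not the circuit itself; the function names and the example patterns are hypothetical and chosen only for this sketch.

```python
from typing import List, Sequence, Tuple


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """First layer: count the bit positions where two binary vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def classify(stored: Sequence[Sequence[int]], x: Sequence[int]) -> Tuple[int, List[int]]:
    """Second layer: winner-take-all over the first-layer distances.

    Returns the index of the stored pattern with the smallest Hamming
    distance to x, together with all distances. On a tie, the lowest
    index wins (an arbitrary choice made for this sketch).
    """
    distances = [hamming_distance(p, x) for p in stored]
    winner = min(range(len(distances)), key=distances.__getitem__)
    return winner, distances


# Hypothetical stored patterns and input vector.
stored_patterns = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
]
winner, distances = classify(stored_patterns, [1, 1, 0, 0, 0, 0])
print(winner, distances)
```

The input differs from the second stored pattern in a single bit, so that pattern is selected as the winner.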
Pattern classification applications based on the charge-based operation of the Hamming VLSI circuit have been previously demonstrated [4]-[6], where the quantifier network is composed of capacitive-threshold logic (CTL) gates [7], and the discrimination network consists of an n-input version of a sense amplifier, all processing thus being performed in the analog domain. CTL has been proven to accommodate a very large fan-in while performing the weighted sum of input vectors at high speed and low power, thus qualifying for signal processing applications [8].
The design proposed in this paper consists of a modified CTL-based first-layer network driving decision circuits, which in turn are connected to analog or digital post-processing circuitry depending on the targeted application. The aim is to replace regular WTA units with units of increased functionality operating in the time domain.
The quantification network consisting of CTL gates is depicted in Fig. 1, where a number of capacitances bridge a common node, called the row, and the set of digital inputs and precharge circuitry. A perturbation column is added to the regular CTL gate as an analog input to modify the row voltage state [9]. The comparator and buffer stages build the decision network, which provides the outside world with a binary decision.
0-7803-7898-9/03/$17.00 © 2003 IEEE
The circuit operation is based on a three-cycle scheme, throughout which charge conservation applies to the row node. The quantification operation is realized with a very simple two-phase non-overlapping clock scheme consisting of a precharge phase ($\Phi_R$) and an evaluation phase ($\Phi_E$). A voltage is imposed on all nodes during the precharge phase. The row voltage is set to an externally imposed voltage $V_{th}$, while the voltages of the capacitors' other nodes are imposed as reference voltages set to $V_{DD}$, or to GND with minor adaptation of the circuit. Note that the row precharge voltage $V_{th}$ can be conveniently synthesized as the comparator's threshold voltage, as used in a later phase, e.g. Fig. 6. The amount of charge transferred to the row is fixed by these imposed voltages and by the capacitances connected to the row.
All nodes are set back into high impedance after completion of the precharge phase. The subsequent evaluation phase then starts, throughout which several vectors may be applied without performing any extra reset. The charge on the row node is considered constant, since the time constant of the parasitic leakage process is significantly larger than that of the system operation. However, the charge on the row capacitors' nodes is affected by the column voltages and the driving signals. Assuming the equality of these two charges, the row voltage is forced to vary by $\Delta V_{row}$.
Eventually, the comparator circuit restores a binary voltage depending on the sign of the row voltage variation (here the binary voltage is taken at the output of the comparator stage). The case $\Delta V_{row} = 0$ corresponds to the limit of the circuit precision. SPICE simulations of the whole process are shown in Fig. 2, where a first layer of twelve neurons was used. The row voltage variation is proportional to the Hamming distance; thus the first layer can be used either way, with or without any WTA unit.
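The charge-conservation argument can be illustrated with a textbook capacitive-divider model of a floating row node. This sketch is an assumption made for illustration and does not reproduce the equations of the paper: the capacitance values, the 3.3 V swing, the parasitic term and all function names are hypothetical.

```python
def row_voltage_shift(n_switched: int, n_inputs: int, v_swing: float,
                      c_unit: float = 1.0, c_pert: float = 1.0,
                      c_par: float = 0.0) -> float:
    """Voltage shift of a floating row node under charge conservation.

    n_switched of the n_inputs unit capacitors see their far plate move
    by v_swing; the perturbation capacitor c_pert and the parasitic
    capacitance c_par stay at fixed potentials. The row then moves by
    the capacitive-divider ratio.
    """
    if not 0 <= n_switched <= n_inputs:
        raise ValueError("n_switched must lie between 0 and n_inputs")
    c_total = n_inputs * c_unit + c_pert + c_par
    return n_switched * c_unit * v_swing / c_total


def comparator(delta_v: float) -> int:
    """Binary decision from the sign of the row voltage variation.

    A variation of exactly zero is the precision limit of the circuit;
    this sketch arbitrarily maps it to 0.
    """
    return 1 if delta_v > 0 else 0


# Twelve inputs per neuron, hypothetical 3.3 V input swing.
shifts = [row_voltage_shift(k, 12, 3.3) for k in range(13)]
for k, dv in enumerate(shifts):
    print(k, round(dv, 4), comparator(dv))
```

In this model the shift grows linearly with the number of switched inputs, which is the property the first layer relies on.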
III. OPERATION MODES OF THE HAMMING ANN
A number of working modes can be derived from the basic architecture described previously. The selection must be made prior to design, since each working mode has its own circuit operation characteristics and related hardware, which depend on the targeted application.
The circuit can be operated to detect relative or absolute Hamming distances. The neuron configurations for these two modes are depicted in Fig. 4.
In order to work on relative distances, only the capacitances representing a 'Logic 1' value to be stored for later comparison are integrated. Thus the operation actually implemented is an AND between the input and the pattern stored in the capacitances. Consequently, this sort of neuron produces a distance computation that is not absolute, i.e. its result must be compared to those of the other neurons in order to produce meaningful information. Weighted computations are allowed in this mode, typically using different capacitance values, which allows for more advanced discriminations [6].
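A short behavioral sketch may help to see why the AND-based result is only meaningful relative to other neurons. The function name, the weights and the example vectors below are hypothetical; the weights stand in for capacitance values.

```python
from typing import Optional, Sequence


def relative_score(stored: Sequence[int], x: Sequence[int],
                   weights: Optional[Sequence[float]] = None) -> float:
    """Weighted AND between the stored pattern and the input.

    Only positions where the stored bit is 1 carry a capacitor, so only
    positions where both bits are 1 contribute to the score.
    """
    if len(stored) != len(x):
        raise ValueError("vectors must have the same length")
    if weights is None:
        weights = [1.0] * len(stored)
    return sum(w for s, b, w in zip(stored, x, weights) if s == 1 and b == 1)


x = [1, 1, 0, 0]
score_a = relative_score([1, 1, 0, 0], x)  # stored pattern equal to the input
score_b = relative_score([1, 1, 1, 1], x)  # stored pattern with two extra ones
score_w = relative_score([1, 1, 0, 0], x, weights=[2.0, 1.0, 1.0, 1.0])
print(score_a, score_b, score_w)
```

Both unweighted scores are equal although only the first stored pattern matches the input exactly, which shows that a single score is not an absolute Hamming distance.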
Using a unit capacitance for each input, while performing a digital Hamming distance operation prior to the capacitive stage, allows the computation of the absolute Hamming distance. The logic operation to be performed is an XOR between the input and stored data. The row voltage variation after evaluation of this sort of neuron is then set by the number of mismatching bits, i.e. by the absolute Hamming distance.
The second proposed working mode is related to the kind of perturbation signal that is applied to the perturbation columns, and which hence affects the row voltages. In this paper we consider the perturbation signal to be equal for each neuron, which is not restrictive and can easily be modified for further applications. Both a ramp and a pulse perturbation signal prove to be interesting candidates, targeting very different potential applications. Their respective effects on the row voltage are shown in the accompanying figure.
Very fast operation times can nevertheless be achieved using this working mode. A limited amount of digital and mixed-mode circuitry has to be added as post-processing elements. Regular WTA operation is achieved by detecting the neuron which switches last as a result of the ramp perturbation. Conversely, loser-take-all (LTA) operation consists of detecting the neuron which switches first. Moreover, the k-WTA and k-LTA operations simply consist of detecting the k neurons which switch their output polarity last and first, respectively. The post-processing circuits allowing these features can be developed from regular Boolean logic. A better solution, however, consists in using a CTL gate to detect that k neurons have switched, which in turn triggers a bank of latches to capture the network output state as the computation result.
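The time-domain selection can be sketched with a simple behavioral model. It assumes that every row starts with some voltage margin above the comparator threshold and that a common linear ramp removes this margin at a constant rate, so that a neuron switches once its margin is used up. The margins, the ramp slope and the function names are hypothetical.

```python
from typing import List, Sequence


def switching_times(margins: Sequence[float], ramp_slope: float) -> List[float]:
    """Time at which each neuron's row crosses the comparator threshold."""
    if ramp_slope <= 0:
        raise ValueError("ramp_slope must be positive")
    return [m / ramp_slope for m in margins]


def k_lta(times: Sequence[float], k: int) -> List[int]:
    """Indices of the k neurons that switch first."""
    return sorted(range(len(times)), key=lambda i: times[i])[:k]


def k_wta(times: Sequence[float], k: int) -> List[int]:
    """Indices of the k neurons that switch last, latest first."""
    return sorted(range(len(times)), key=lambda i: times[i], reverse=True)[:k]


# Hypothetical row margins in volts and a ramp of 0.1 V per time unit.
times = switching_times([0.30, 0.10, 0.25, 0.05], ramp_slope=0.1)
print(k_lta(times, 2), k_wta(times, 2))
```

In hardware the sorting is implicit: the order in which the comparators toggle already encodes the ranking, and only the detection of the first or last k events has to be added.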
Similarly, using a limited amount of digital circuitry which can be easily synthesized, it is possible to achieve vector ranking based on the Hamming distances to the stored patterns. One possible post-processing circuit would consist of a $\log_2(n)$-bit counter, $n$ memory cells of $\log_2(n)$ bits each, and some combinational logic. Each switching neuron triggers both the counter and the latching of the counter's value into the neuron's own related memory cell, allowing for proper ranking of all events, including simultaneous switching.
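The ranking scheme can be modelled in a few lines. The sketch below assumes that neurons switching at the same instant latch the same counter value and that the counter advances once per switching instant; this tie handling, like the function names, is an assumption of the sketch and not a detail given in the text.

```python
from typing import List, Optional, Sequence


def counter_bits(n: int) -> int:
    """Number of bits needed to count n switching events (at least one)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return max(1, (n - 1).bit_length())


def rank_by_switching(times: Sequence[float]) -> List[Optional[int]]:
    """Latch a shared counter value into each neuron's memory cell.

    Neurons are served in order of their switching time; simultaneous
    switches receive the same rank.
    """
    ranks: List[Optional[int]] = [None] * len(times)
    counter = 0
    for t in sorted(set(times)):
        for i, ti in enumerate(times):
            if ti == t:
                ranks[i] = counter  # latch the current counter value
        counter += 1  # advance once per switching instant
    return ranks


ranks = rank_by_switching([3.0, 1.0, 1.0, 2.0])
print(ranks, counter_bits(16))
```

Neurons 1 and 2 switch simultaneously and share rank 0, while neuron 0 switches last and obtains the highest rank.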
Finally, some slight modifications of the hardware are shown to produce dramatically different row voltage responses during the evaluation process. The possible cases are shown in Fig. 6, to which the following paragraphs refer. Cases 1 and 4 depict the maximal allowed row voltage amplitude, and Cases 2 and 3 depict the minimal forced row voltage. An offset capacitance with a value of one half of the unit capacitance is added on each neuron's row to guarantee this minimal voltage difference from the decision threshold of the comparator gate.
In the preceding discussion, it was assumed that the precharge state of the input columns is GND and that the precharge state of the perturbation column is $V_{DD}$. Following these assumptions, the circuit state during evaluation is depicted by Cases 1 and 2, where the evaluation process causes a voltage increase above the decision threshold level, and the subsequent perturbation ramp has to start at $V_{DD}$ and drop to GND. Alternatively, it is possible to switch the precharge values of the input and perturbation columns, resulting in an evaluation state depicted by Cases 3 and 4. Here, the evaluation voltage is lower than the threshold level, due to the precharge at $V_{DD}$ being higher than the total voltage induced by the input vector. The perturbation ramp or pulse then has to be driven from a lower to a higher voltage.
Considering one of these modes, for example precharge of the input columns to GND, it has been shown that the row voltage response is limited to Cases 1 and 2. Moreover, a hardware-driven assignment of WTA and LTA to either Case 1 or Case 2 can be achieved by the proper selection of the logic function applied to the input and stored bits prior to application of the operation result to the capacitance. The selection of the XNOR causes a row voltage increase when the input and the stored vector have bit-by-bit similarities, whereas the opposite is true for the selection of the XOR. Consequently, using an XNOR, the winner neuron is depicted by Case 1 and the loser neuron by Case 2; using an XOR, the winner neuron (the one with maximal similarity) is depicted by Case 2 and the loser neuron by Case 1. Noticing that the neuron depicted in Case 2 always switches first during the perturbation phase, it becomes possible to adapt the hardware to the desired application in order to obtain the fastest circuit response time.
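The complementary roles of the XNOR and XOR functions can be checked with a small behavioral sketch. It only counts how many unit capacitors would be driven under each choice of logic function; the function names and example vectors are hypothetical.

```python
from typing import Sequence


def xnor_count(stored: Sequence[int], x: Sequence[int]) -> int:
    """Number of matching bit positions (capacitors driven with an XNOR)."""
    return sum(1 for s, b in zip(stored, x) if s == b)


def xor_count(stored: Sequence[int], x: Sequence[int]) -> int:
    """Number of differing bit positions (capacitors driven with an XOR)."""
    return sum(1 for s, b in zip(stored, x) if s != b)


patterns = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
]
x = [1, 1, 0, 0, 0, 0]

matches = [xnor_count(p, x) for p in patterns]
mismatches = [xor_count(p, x) for p in patterns]

# The most similar pattern drives the most capacitors under XNOR
# and the fewest under XOR.
winner_xnor = max(range(len(patterns)), key=matches.__getitem__)
winner_xor = min(range(len(patterns)), key=mismatches.__getitem__)
print(matches, mismatches, winner_xnor, winner_xor)
```

Since the two counts always add up to the vector length, the same neuron is the winner in both cases; only its row voltage amplitude, and therefore its switching order during the perturbation phase, is exchanged.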
IV. AN IMAGE PROCESSING BUILDING BLOCK
A two-dimensional array arrangement of the Hamming network, where each data bit is connected to a horizontal and a vertical row simultaneously, thus allowing direct mapping with a black-and-white image, was demonstrated as a successful circuit for precision alignment systems. Closed-loop system simulations were run using Matlab software. The correction algorithms were intentionally kept very simple in order to maintain a very low hardware overhead.
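The two-dimensional arrangement can be pictured with the following sketch, which computes one absolute Hamming distance per horizontal row and one per vertical column of a binary image. This is an interpretation of the array for illustration only: the function names and the 3x3 example images are hypothetical, and the correction algorithm of the alignment system is not modelled.

```python
from typing import List, Sequence, Tuple

Image = Sequence[Sequence[int]]


def row_column_distances(stored: Image, observed: Image) -> Tuple[List[int], List[int]]:
    """Hamming distance of every row and every column of two binary images."""
    n_rows, n_cols = len(stored), len(stored[0])
    if len(observed) != n_rows or any(len(r) != n_cols for r in observed):
        raise ValueError("images must have the same shape")
    # XOR of the two images, pixel by pixel.
    diff = [[s ^ o for s, o in zip(srow, orow)]
            for srow, orow in zip(stored, observed)]
    rows = [sum(r) for r in diff]
    cols = [sum(diff[i][j] for i in range(n_rows)) for j in range(n_cols)]
    return rows, cols


# A vertical bar, and the same bar shifted by one pixel to the right.
reference = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
shifted = [[0, 0, 1], [0, 0, 1], [0, 0, 1]]
rows, cols = row_column_distances(reference, shifted)
print(rows, cols)
```

In this example the column distances localize the horizontal displacement, while the row distances are uniform, which is the kind of information a simple alignment correction could exploit.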
Absolute Hamming distance computation, ramp signal perturbation and regular XNOR Hamming operation were chosen as the working mode. 
V. VLSI INTEGRATION OF THE HAMMING ANN
The system-level view of the two-dimensional Hamming distance comparator is depicted in Fig. 8. The Hamming core processing array is surrounded by several peripheral units which are devoted to the tasks of generating working signals and taking decisions on the basis of the processing results. The offset cells located on each row are activated during evaluation in order to guarantee that the worst-case row voltage is above the decision threshold level. The decision network consists of regular CMOS inverters and buffers.
The proposed circuit architecture is integrated using a 0.35 µm double-polysilicon CMOS technology. The capacitances are realized as overlaps of two polysilicon layers. The Hamming array consists of a unit cell, to be abutted in the horizontal and vertical directions, thus creating the core Hamming network. The unit cell used to build a 16x16 array is depicted in Fig. 9.
All control signals are routed in a single direction. The undesired coupling of the signal lines is envisioned to be cancelled in a future design using the third metal layer as a shield.
