The neurochip Totem is proving to be a viable solution for high-quality, real-time computational tasks in HEP, including event classification, triggering and signal processing. The architecture of the chip is based on a "derivative-free" algorithm called Reactive Tabu Search (RTS), which performs well even with low-precision weights. ISA, VME or PCI boards integrate the chip as a coprocessor in a host computer. This paper presents: 1) the state of the art and the next evolution of the design of Totem; 2) its performance in the Higgs search at LHC as an example. Reference number: 303
Introduction
Neural networks implemented as VLSI hardware are being considered as good candidates for time-critical, high-quality pattern recognition in High Energy Physics (HEP) [1][2][3]. The main benefit is speed, owing to the massively parallel architecture. The cost is usually a very complex architecture, since common algorithms such as backpropagation, being derivative-based, require high-precision computation [4]. To gain a significant improvement in this respect, Battiti and Tecchiolli devised a "derivative-free" algorithm within a novel approach to the training problem, which is first transformed into a combinatorial optimization task and then solved by means of the heuristic method called Reactive Tabu Search (RTS) [7, 8]. RTS is based on the construction of search trajectories in the space of binary strings of length $L = N \times B$, into which the $N$ weights needed to configure a neural network are coded using $B$ bits per weight. The search is intended to locate the best "suboptimal" minimum on a cost surface by means of a sequence of elementary moves, each consisting of a single bit-flip in the string of weights. When a move is made, its inverse is forbidden for a prohibition period of $T$ successive steps (Glover's Tabu Search method [10]), which provides some diversification in the search process. RTS greatly enhances this diversification by dynamically adjusting the parameter $T$ through a simple mechanism that evaluates and reacts to the current local shape of the cost surface. In this way it escapes rapidly from local minima and cycles, and it finds solutions even for low-precision weights, largely independently of the starting point.
Sect. 2 is devoted to a description of the neurochip Totem; Sect. 3 presents a new architectural design; Sect. 4 gives the results of a sample application, namely the extraction of Higgs events from background in simulated data at LHC energies.
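The search scheme described above can be illustrated with a minimal sketch. It is a deliberately simplified variant and not the algorithm of [7, 8]: the reaction rule (lengthen $T$ when a configuration is revisited, shorten it after 50 steps without repetition), the growth and decay factors, the offset-binary weight coding in `decode`, and all function names are assumptions made for illustration only.

```python
import random

def decode(bits, n_weights, bits_per_weight):
    """Split a string of L = N * B bits into N signed integers (offset binary, assumed)."""
    offset = 1 << (bits_per_weight - 1)
    weights = []
    for n in range(n_weights):
        chunk = bits[n * bits_per_weight:(n + 1) * bits_per_weight]
        weights.append(int("".join(map(str, chunk)), 2) - offset)
    return weights

def reactive_tabu_search(cost, length, steps=500, seed=0):
    """Minimise cost(bits) over binary strings using single bit-flip moves."""
    assert length >= 3
    rng = random.Random(seed)
    bits = [rng.randint(0, 1) for _ in range(length)]
    best_bits, best_cost = bits[:], cost(bits)
    T = 1.0                          # prohibition period, adapted during the search
    last_flip = [-10**9] * length    # step at which each bit was last flipped
    seen = set()                     # configurations visited so far
    last_change = 0                  # step at which T was last adjusted

    for step in range(steps):
        key = tuple(bits)
        if key in seen:
            # Repetition detected: react by lengthening the prohibition period.
            T = min(T * 1.1 + 1.0, float(length - 2))
            last_change = step
        elif step - last_change > 50:
            # No repetition for a while: shorten the prohibition period.
            T = max(T * 0.9, 1.0)
            last_change = step
        seen.add(key)

        # Try every non-prohibited bit-flip and keep the best one, even if uphill.
        best_i, best_move_cost = None, None
        for i in range(length):
            if step - last_flip[i] <= T:
                continue             # the inverse of a recent move is forbidden
            bits[i] ^= 1
            c = cost(bits)
            bits[i] ^= 1
            if best_move_cost is None or c < best_move_cost:
                best_i, best_move_cost = i, c
        bits[best_i] ^= 1
        last_flip[best_i] = step
        if best_move_cost < best_cost:
            best_bits, best_cost = bits[:], best_move_cost
    return best_bits, best_cost

# Toy cost: Hamming distance to a hidden 16-bit string (four 4-bit weights).
target = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]
hamming = lambda bits: sum(b != t for b, t in zip(bits, target))
best_bits, best_cost = reactive_tabu_search(hamming, len(target))
```

In a training application the toy cost would be replaced by the network error evaluated on `decode(bits, N, B)`; no derivative of the cost is ever required.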
The Totem chip
Totem is a full-custom chip designed to operate as a co-processor in a host system, carrying out the most compute-intensive operations for RTS [9]. ISA boards and high-performance PCI and VME boards have been developed to couple the chip to the host. The chip includes an array of 32 parallel processors with associated on-chip weight memory and control logic, together with broadcast and output buses. Pipelining is used to speed up operations. A 32-bit static storage register on the output of the MACs allows data transfer from the neurons of a layer in an MLP net to occur concurrently with a parallel input-multiply-accumulate operation on all the processors. The memory depth of 128 8-bit words allows neurons with up to 128 inputs to be implemented. Because the weights are accessed sequentially, the chip can realize different MLP topologies with a high degree of flexibility: the memory bank can either be assigned to a single neuron or be partitioned among neurons on different layers. The sigmoid function is implemented off-chip by a RAM-based look-up table. Up to four chips can be operated in parallel in each layer of a network. With 250,000 transistors on a 70 mm$^2$ die manufactured in a 1.2 µm CMOS technology, Totem performs 1 GMAC/sec when clocked at 30 MHz. A doubling of the processor density and a higher operating speed will be obtained by the transition to a 0.8 µm CMOS technology, currently in progress.
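As a rough behavioural illustration of the datapath just described, the sketch below models one layer: each input is broadcast in turn, every processor multiplies it by the next word of its own weight memory and accumulates, and the sigmoid is applied afterwards through a look-up table. The processor count, memory depth and clock rate are taken from the text; the table size, the accumulator scaling (`shift`), the table `scale` and the function names are assumptions, not a description of the actual hardware.

```python
import math

N_PROCESSORS = 32      # parallel MAC units on the chip (from the text)
MEMORY_DEPTH = 128     # 8-bit weight words per processor (from the text)
CLOCK_HZ = 30e6        # clock rate quoted in the text

def make_sigmoid_lut(size=256, scale=32.0):
    """Tabulated sigmoid with 8-bit outputs (size and scale are assumed)."""
    return [round(255 / (1 + math.exp(-(i - size // 2) / scale))) for i in range(size)]

def layer_forward(inputs, weights, lut, shift=8):
    """weights[p][k] is the k-th word in the weight memory of processor p."""
    assert len(weights) <= N_PROCESSORS and len(inputs) <= MEMORY_DEPTH
    acc = [0] * len(weights)
    for k, x in enumerate(inputs):        # one broadcast input per step
        for p in range(len(weights)):     # done in parallel on the chip
            acc[p] += weights[p][k] * x   # multiply-accumulate
    outputs = []
    for a in acc:
        # Off-chip look-up: scale the accumulator and clip it to the table range.
        idx = max(0, min(len(lut) - 1, (a >> shift) + len(lut) // 2))
        outputs.append(lut[idx])
    return outputs

# One MAC per processor per clock cycle gives the quoted peak rate.
peak_macs_per_second = N_PROCESSORS * CLOCK_HZ   # 9.6e8, i.e. about 1 GMAC/sec

lut = make_sigmoid_lut()
out = layer_forward([100] * 4, [[127] * 4, [-128] * 4], lut)
```

The last line evaluates two neurons with four inputs each; the neuron with positive weights saturates near the top of the table and the one with negative weights near the bottom.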
Advances in the design of Totem and the plog encoding
A considerable fraction of the silicon real estate of the Totem chip had to be devoted to the multiplication units and to the memories where the weights are stored. Both areas will be reduced by the technological migration already mentioned. For the multipliers, however, an alternative and complementary approach can be considered: moving to a logarithmic representation of the feature inputs and the axon weights, so that multiplication is replaced by addition, which is cheaper in silicon area and allows both lower power consumption and a faster clock rate. Some of the authors (P. Lee, I. Lazzizzera, A. Sartori, G. Tecchiolli and A. Zorat) are exploring this approach [5]. The first problem is to find a reasonable approximation to the bin-to-log and log-to-bin conversions, since they are quite expensive [6]. [...] $-f$) and the approximation to $\log_2$ ($\le 0.0861$): it amounts to at most 10%. When the plog encoding is applied to neural nets, the multiplier stage of a neuron is replaced by an adder and a plog-to-bin unit. In such a plog-based architecture the RTS training method for a Multi Layer Perceptron (MLP) is applied without modification. The error estimated above, at most 10%, turns out to be the same as that incurred by Totem with a 4-bit weight setup in the conventional multiplier architecture; even with such low-precision weights, adequate solutions are still obtained for many problems [8]. The point is that, assuming the same fabrication technology, the area of the multiplication blocks is reduced by a factor of 10, power consumption is reduced by a factor of 12, and computational speed increases by a factor of 3. The reduction in power consumption can be exploited to increase both the number of processors per die and the operating speed. These figures pave the way to new implementations with high performance at the same fabrication cost.
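The definition of the plog encoding is not reproduced here, so the following sketch rests on an assumption: that the encoding is of the classical piecewise-linear kind, in which a number $x = 2^e (1 + f)$ with $0 \le f < 1$ is represented by $e + f$ instead of $e + \log_2(1 + f)$. This assumption is at least consistent with the bound quoted above, since the maximum of $\log_2(1 + f) - f$ on that interval is about 0.0861. The function names are hypothetical.

```python
import math

def approx_log2(x):
    """Piecewise-linear log2 of x > 0: for x = 2**e * (1 + f), return e + f."""
    m, e = math.frexp(x)              # x = m * 2**e with 0.5 <= m < 1
    return (e - 1) + (2.0 * m - 1.0)  # exponent part plus linear fraction part

def approx_exp2(y):
    """Inverse conversion: for y = e + f with 0 <= f < 1, return 2**e * (1 + f)."""
    e = math.floor(y)
    return math.ldexp(1.0 + (y - e), e)

def log_multiply(a, b):
    """Multiply two positive numbers with one addition in the log domain."""
    return approx_exp2(approx_log2(a) + approx_log2(b))

# Largest gap between log2(1 + f) and its linear approximation f, on a fine grid.
max_log_error = max(math.log2(1 + i / 10000) - i / 10000 for i in range(10000))

exact_case = log_multiply(2.0, 4.0)    # powers of two are represented exactly
approx_case = log_multiply(3.0, 5.0)   # gives 14.0 instead of 15
```

In this sketch the only arithmetic between the two conversions is an addition, which is the source of the area and power savings discussed above.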
As an example, the fabrication of a neural processor hosting hundreds of neurons and running at a reasonable 100 MHz clock rate is feasible within a couple of years. With such a processor, triggering tasks requiring neural nets of approximately 100 neurons can easily sustain input rates of the order of $10^7$ events per second, making its use possible even in the most time-critical experiments, such as those at LHC.
Higgs search: observables and performance of Totem
Totem has been tested in the discrimination of Higgs events from background at LHC energies using simulation data obtained by the PYTHIA/JETSET Monte Carlo code.
Arbitrarily, we assume the Higgs mass to be $M_H = 200$ GeV/$c^2$, just above the threshold for the creation of two real Z's [12]. In this case the dominant production mechanism is gluon-gluon fusion, and the best decay channel for its identification is the so-called gold-plated channel:
$pp \to HX \to ZZ \to 4\mu X$,
whose cross section is $2.84 \times 10^{-12}$ mb as computed by the Pythia MC code. We consider the two main expected backgrounds, given the current value of the top quark mass ($M_t = 175$ GeV [14]):
$t\bar{t}$ production, with 4 muons produced by semileptonic decays of the top and antitop;
$Zb\bar{b}$ production, with a muon pair produced by $Z^0$ decay and the other one by semileptonic $b$ and $\bar{b}$ decays.
These two backgrounds have cross sections of $7.84 \times 10^{-9}$ mb and $6 \times 10^{-9}$ mb respectively, as computed by the Pythia MC code. We order the final muons by the magnitude of their transverse momenta and use the following ten variables as physical observables: ($X_1$ to $X_4$) the transverse momenta of the four muons; ($X_5$ to $X_8$) the invariant masses of the four $\mu^+\mu^-$ pairs; ($X_9$) the invariant mass of the four muons; ($X_{10}$) the hadron multiplicity of the hard jets, identified with the $K_\perp$ clustering algorithm for hadron-hadron collisions [13].
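The construction of the ten observables can be sketched as follows. The event format (a list of four muons, each with a four-momentum and a charge), the order in which the four opposite-sign pair masses are listed, and the function names are assumptions; the hadron multiplicity $X_{10}$ is taken as an input, since the jet clustering step is outside the scope of this sketch.

```python
import math
from itertools import product

def invariant_mass(*particles):
    """Invariant mass of particles given as (E, px, py, pz) tuples."""
    E, px, py, pz = (sum(p[i] for p in particles) for i in range(4))
    return math.sqrt(max(E * E - px * px - py * py - pz * pz, 0.0))

def observables(muons, hadron_multiplicity):
    """muons: four dicts {"p4": (E, px, py, pz), "charge": +1 or -1}."""
    # X1..X4: transverse momenta in decreasing order.
    pts = sorted((math.hypot(m["p4"][1], m["p4"][2]) for m in muons), reverse=True)
    # X5..X8: the 2 x 2 opposite-sign pairings of two mu+ and two mu-.
    plus = [m["p4"] for m in muons if m["charge"] > 0]
    minus = [m["p4"] for m in muons if m["charge"] < 0]
    pair_masses = [invariant_mass(a, b) for a, b in product(plus, minus)]
    # X9: invariant mass of the four-muon system.
    m4 = invariant_mass(*(m["p4"] for m in muons))
    return pts + pair_masses + [m4, hadron_multiplicity]

# Hypothetical event with four massless muons (energies and momenta in GeV).
event = [
    {"p4": (10.0, 10.0, 0.0, 0.0), "charge": +1},
    {"p4": (10.0, -10.0, 0.0, 0.0), "charge": -1},
    {"p4": (5.0, 0.0, 5.0, 0.0), "charge": +1},
    {"p4": (5.0, 0.0, -5.0, 0.0), "charge": -1},
]
x = observables(event, hadron_multiplicity=7)
```

The resulting list `x` is the ten-component input vector of the network for one event.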
Totem has been trained using a sample of 4000 Higgs events, mixed with 2000 events of each background. The test set, disjoint from the training set, consisted of $N_H = 2000$ Higgs events mixed with about 360,000 $t\bar{t}$ and 270,000 $Zb\bar{b}$ events (thus respecting only the ratio between the cross sections of the two backgrounds). Some results are listed in Table 1.
References
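The remark on the composition of the test set can be made concrete with a short calculation. It assumes that the cross sections quoted above apply directly to the generated samples, with no further selection efficiency; the dictionary keys and variable names are illustrative.

```python
# Cross sections in mb and test-set sizes, as quoted in the text.
cross_section = {"higgs": 2.84e-12, "ttbar": 7.84e-9, "zbb": 6e-9}
n_test = {"higgs": 2000, "ttbar": 360_000, "zbb": 270_000}

# The two background samples follow the ratio of their cross sections.
background_ratio_samples = n_test["ttbar"] / n_test["zbb"]                     # about 1.33
background_ratio_physics = cross_section["ttbar"] / cross_section["zbb"]       # about 1.31

# The signal is over-represented: relative weight each test event would need
# in order to restore the physical proportions, normalised to a Higgs event.
per_event = {k: cross_section[k] / n_test[k] for k in n_test}
relative_weight = {k: per_event[k] / per_event["higgs"] for k in n_test}
```

Under this assumption each background event in the test set stands for roughly 15 events relative to one Higgs event, which has to be kept in mind when reading efficiencies and purities off the test sample.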
