The Enable Machine is a systolic 2nd level trigger processor for the transition radiation detector (TRD) of ATLAS/LHC. It is developed within the EAST/RD-11 collaboration at CERN. The task of the processor is to find electron tracks and to reject pion tracks according to the EAST benchmark algorithm in less than 10 µs. Tracks are identified by template matching in a (φ,z) region of interest (RoI) selected by a 1st level trigger. In the (φ,z) plane tracks of constant curvature are straight lines. The relevant lines form mask templates. Track identification is done by histogramming the coincidences of the templates and the RoI data for each possible track. The Enable Machine is an array processor that handles tracks of the same slope in parallel, and tracks of different slope in a pipeline. It is composed of two units, the Enable histogrammer unit and the Enable z/φ-board. The interface daughter board is equipped with a HIPPI interface developed at JINR, Dubna, and Xilinx 'corner turning' data converter chips. Enable uses programmable gate arrays (Xilinx) for histogramming and synchronous SRAMs for pattern storage. At a clock rate of 40 MHz the trigger decision time is 6.5 µs and the latency 7.0 µs. The Enable Machine is scalable in the RoI size as well as in the number of tracks processed. It can be adapted to different recognition tasks and detector setups.
INTRODUCTION
Many pattern recognition tasks can be solved by mapping the input data to a feature space in which a feature vector indicates a set of feature attributes. For n attributes, these span an n-dimensional coordinate system. The coordinate transform can often be executed by a systolic array of identical processing elements (PEs). These PEs operate in parallel on k data streams that are synchronized by a sequential ordering parameter. The coordinate transform is expressed as a space-time equation which is formulated as a fixed data flow processing scheme. Data arrive just-in-time at the right place where the right operation is executed. Examples of such systolic processors for high energy physics applications are given elsewhere [1].
¶ Work partly supported by the German Bundesministerium für Forschung und Technologie (BMFT) under grant PH/-RS6-93/09; the Gesellschaft für Schwerionenforschung (GSI) Darmstadt, Germany, under grant HD Spe C; and the RD-11 and RD-6 collaborations at CERN, Geneva, Switzerland.
PRINCIPLE OF OPERATION
The task of the Enable Machine is the identification of 'electron compatible' tracks in the TRD at the LHC within 10 µs. This problem is solved as described below. The Enable Machine has been designed within the EAST/RD-11 collaboration at CERN and conforms exactly to the EAST benchmark specifications [2]. It has been implemented in programmable gate arrays. The prototype system was tested in an international beam test experiment at CERN in October 1993. The Enable Machine is designed to find tracks in the outer parts of the TRD [3]. As a second level trigger system it receives information about regions of interest from a first level trigger, in this case calorimeter modules positioned around the TRD surface. In the region considered, the TRD is assembled from equidistant slices along the beam axis z. Each slice consists of radially oriented 'straws', small drift chambers of 4 mm diameter. If hit by a particle, a straw marks a pixel in the (φ,z) plane, where φ is given by the straw number and z by the slice number. Since the detector operates in a homogeneous magnetic field, particles have trajectories of nearly constant radius of curvature, i.e., constant dφ/dz. In the (φ,z) plane such tracks correspond to straight lines of different slope. To identify electron compatible tracks, one has to take into account the amplitude of the signal generated in a straw. For a rough discrimination two thresholds are applied to these signals, generating a 'total' data set containing all signals above a low threshold, and a 'high' data set containing only signals above a high threshold. To solve the pattern recognition task, the best defined track in the 'total' image has to be identified, and the ratio of the number of hits along this track in both images has to be computed.
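The two-threshold discrimination described above can be illustrated by a minimal sketch. The function name, the amplitude units and the threshold values are hypothetical and chosen only for illustration; in the detector the discrimination is performed before the data reach the trigger processor.

```python
def discriminate(amplitudes, low_threshold, high_threshold):
    """Split straw amplitudes into binary 'total' and 'high' images.

    amplitudes[z][phi] is the signal of straw phi in slice z
    (arbitrary units; the values below are illustrative only).
    """
    total = [[1 if a > low_threshold else 0 for a in row] for row in amplitudes]
    high = [[1 if a > high_threshold else 0 for a in row] for row in amplitudes]
    return total, high


# Three slices (z) with four straws (phi) each.
amplitudes = [
    [0.0, 0.3, 0.0, 0.0],
    [0.0, 0.0, 1.2, 0.0],
    [0.1, 0.0, 0.0, 0.4],
]
total, high = discriminate(amplitudes, low_threshold=0.2, high_threshold=1.0)
```

By construction every pixel set in the 'high' image is also set in the 'total' image, so the ratio of hits along a track never exceeds one.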
DATA FLOW ARCHITECTURE
To find the best electron compatible track in the TRD, two histograms have to be generated, one for the total data set and one for the high data set. In both histograms one counter is assigned to every possible track. The straight lines searched for are given by a combination of a certain slope and offset. The counter contents then represent a measure of the probability that the respective track is present in the input image. The operations described above have to be applied to all pixels of the input images, so massive parallelism can be exploited to reach the required speed. This is most easily achieved if the pattern recognition processors are implemented as systolic arrays. In such an architecture a large number of very simple identical processing elements (PEs) operate in lock-step on spatially distributed data. In order to avoid data routing problems the images to be processed are pipelined through the processing array so that only nearest-neighbor interconnects are required. Both features of a PE, simple structure and local interconnects, greatly ease an FPGA implementation of systolic processors [4].
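The histogramming step can be sketched in software as follows. This is a sequential illustration under stated assumptions, not the hardware data flow: the line parametrisation φ = offset + slope·z, the rounding rule and all names are hypothetical, and the loops that run one after another here are executed in parallel and in a pipeline by the systolic array.

```python
import math


def histogram_tracks(image, slopes, n_offsets):
    """Count hits along every candidate line phi = offset + slope * z.

    image[z][phi] is a binary pixel; returns counters[slope_index][offset].
    """
    n_phi = len(image[0])
    counters = [[0] * n_offsets for _ in slopes]
    for s, slope in enumerate(slopes):
        # One template per slope: the phi displacement of the line in each slice.
        template = [math.floor(slope * z + 0.5) for z in range(len(image))]
        for offset in range(n_offsets):
            for z, shift in enumerate(template):
                phi = offset + shift
                if 0 <= phi < n_phi and image[z][phi]:
                    counters[s][offset] += 1
    return counters


# A track of slope +1 starting at phi = 1, plus one noise hit at (z=2, phi=0).
image = [
    [0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 0],
]
counters = histogram_tracks(image, slopes=[-1.0, 0.0, 1.0], n_offsets=6)
```

Only one template is needed per slope; the offsets are obtained by shifting it, which matches the idea of storing a single pattern per processor column.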
HARDWARE ARCHITECTURE
The Enable Machine for the recognition of electron-compatible tracks is composed of three building blocks: one histogram generation unit each for the 'total' and the 'high' data, and a trigger decision unit (Fig. 1). The histogram generation units count the number of pixels that may belong to each physically reasonable particle track. The trigger decision unit analyzes the contents of the histogram channels and does the e/π classification in programmable hardware. The Enable Machine has a pipelined architecture. One column of each of the two input images is entered every step and forwarded through a number of pipeline stages. Only a small number (<10) of different slopes are of interest. In addition about 20 φ values must be handled. Thus every column of the Enable Machine handles all tracks of a single slope, i.e., parallel tracks of all possible offsets. Only one pattern has to be stored in a local memory, and it can therefore be chosen arbitrarily. The input image is compared with this track pattern after the corresponding offset has been added. Where matches are found counters are incremented, one for every offset. After the last column of both images has passed the first column of the two histogramming units, the first two histograms are ready. The histograms are read out to the trigger decision unit, which first has to find the maximum of the weighted sum of the two corresponding histogram channels within this column. This is done by shifting the counter contents sequentially into a look-up table where the weighted sum is computed. It is compared to the maximum found so far, which is updated accordingly. This lookup/compare operation is done in parallel with the ongoing histogram generation for further columns, which now produces two more histograms every step. As soon as the local maximum in the first processor column has been found, the corresponding value is forwarded to the following column. There it is compared with the local maximum of that column, which has been computed exactly when the forwarded value arrives. Eventually the global maximum has been found and the two counter contents corresponding to this track are applied to a look-up table. The ratio of (number of 'high' pixels)/(number of 'total' pixels), used for electron identification/pion rejection, is read out and compared to a programmed threshold.
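The decision logic described above can be summarised in a short sequential sketch. The weights, the threshold and all names are hypothetical; the hardware obtains the weighted sum and the ratio from look-up tables and overlaps the search with the histogramming, whereas the sketch simply loops over the columns.

```python
def trigger_decision(total_hist, high_hist, w_total, w_high, ratio_threshold):
    """Find the best track by weighted sum, then cut on the high/total ratio.

    total_hist[slope][offset] and high_hist[slope][offset] are counter values.
    """
    best_score, best_track = float("-inf"), None
    for slope, (total_col, high_col) in enumerate(zip(total_hist, high_hist)):
        # Local maximum of the weighted sum within one processor column.
        scores = [w_total * t + w_high * h for t, h in zip(total_col, high_col)]
        offset = max(range(len(scores)), key=scores.__getitem__)
        # Compare with the maximum forwarded from the previous columns.
        if scores[offset] > best_score:
            best_score, best_track = scores[offset], (slope, offset)
    slope, offset = best_track
    n_total, n_high = total_hist[slope][offset], high_hist[slope][offset]
    ratio = n_high / n_total if n_total else 0.0
    return best_track, ratio, ratio > ratio_threshold


track, ratio, accept = trigger_decision(
    total_hist=[[0, 1, 1], [1, 4, 0]],
    high_hist=[[0, 0, 1], [0, 3, 0]],
    w_total=1, w_high=1, ratio_threshold=0.5,
)
```

In this example the track with slope index 1 and offset 1 wins; three of its four hits are above the high threshold, so the event would be accepted under the assumed cut.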
FPGA IMPLEMENTATION
A prototype of the Enable Machine has been implemented in field programmable gate arrays. This implementation has been chosen since it provides high flexibility and substantially reduces the development time. The prototype serves as a test and evaluation board. The system resides on a 36 cm x 40 cm VME board. The computational core of the system consists of 36 XC3190 chips and 36 128K synchronous SRAMs, one dedicated to each Xilinx chip. A two-block arrangement of 6 x 3 Xilinx chips each has been adopted for processing the 'high' and 'total' data, respectively. The SRAMs are placed at the location of each Xilinx chip on the opposite layer of the board (4 Mbyte pattern storage capacity). Two columns of the histogramming unit have been mapped into a single Xilinx 3190 chip. Forty 8-bit counters could be integrated in one chip, and therefore 400 tracks can be searched in total (10 different slopes with 20 parallel lines each for ascending and descending tracks). The architecture of the Enable Machine makes it easy to adjust several critical parameters. It is possible to change the trigger algorithm by reprogramming the Xilinx chips. In addition, the tracks searched for as well as the threshold of the e/π ratio are specified by freely programmable look-up tables. Exploiting these features it is also possible to search for specific patterns along tracks or for nonlinear tracks if this should be required.
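How a programmable look-up table can encode the e/π ratio threshold may be sketched as follows. The table layout is an assumption made for illustration (indexed by the two counter values, one accept bit per entry) and the names are hypothetical; only the 8-bit counter width is taken from the text.

```python
def build_ratio_lut(max_count, ratio_threshold):
    """lut[n_high][n_total] is True when n_high / n_total exceeds the threshold."""
    return [
        [n_total > 0 and n_high / n_total > ratio_threshold
         for n_total in range(max_count + 1)]
        for n_high in range(max_count + 1)
    ]


# 8-bit counters hold values 0..255; the threshold 0.5 is illustrative.
lut = build_ratio_lut(max_count=255, ratio_threshold=0.5)
accept = lut[3][4]  # 3 'high' hits out of 4 'total' hits
```

With such a table the cut can be changed by rewriting its contents, and no division has to be carried out at trigger time.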
SYSTEM EMBEDDING
The Enable Machine is a VME-based system. It is controlled by a dedicated SparcVme CPU card. The system can be programmed from an X11 graphical user interface. All communication is handled via remote procedure calls (RPC). The contents of the look-up tables as well as the Xilinx contents, i.e., the architecture of the Enable Machine, can be programmed from the host workstation. The configuration program for the Xilinx chips, for example, can be downloaded in a few milliseconds. The search patterns can be generated on the workstation with a pattern generation program and downloaded automatically into the SRAMs of Enable.
HIPPI INTERFACE
For data input the High Performance Parallel Interface (HIPPI) has been chosen, which sustains a transmission rate of 800 Mbit/s. The HIPPI interface has been designed by JINR, Dubna, and has been integrated in the z/φ-board. The VME-based z/φ-board of size 36 cm x 40 cm serves mainly two functions: I/O interfacing and data reordering (z/φ corner turning). Images of size 256 x 32 x 2 are transmitted over the two HIPPI busses. All functional blocks are interleaved with FIFOs to provide separate watch-points on all intermediate processing steps. The functionality of the HIPPI input part has been tested with a HIPPI tester, SLATE, designed by the Royal Holloway and Bedford New College, London, and manufactured by the KFKI, Budapest. The performance of the Enable Machine is essentially limited by the speed of the input data flow. With a worst case clock frequency of 40 MHz the trigger decision time is 6.5 µs and the latency 7.0 µs.
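Corner turning amounts to transposing the incoming data so that they can be fed to the histogrammer column by column. The following minimal sketch assumes, purely for illustration, that the image arrives as a list of rows along one axis and is needed as columns along the other; the actual word format on the HIPPI busses is not described here, and the function name is hypothetical.

```python
def corner_turn(rows):
    """Transpose a list of equal-length rows into a list of columns."""
    return [list(column) for column in zip(*rows)]


# Two rows of three pixels become three columns of two pixels.
columns = corner_turn([[1, 0, 0],
                       [0, 1, 1]])
```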
CONCLUSIONS
The systolic processor presented solves a simple pattern recognition task in binary images at very high speed. It should first be noted that the development of large-scale systems of this type is less demanding than expected. The reason is the regular structure of these processors. The PEs are simple and can be developed relatively easily using CAD tools. The nearest-neighbor interconnection structure can also be implemented easily and efficiently in programmable gate arrays or in application specific integrated circuits. The basic operations are increment or decrement of counters, and shifting of data in the spatial or time domain (shifting in the time domain corresponds to the delay of data or the forwarding over a number of data). A unified architecture is therefore under development, in which arbitrary Hough transforms can be computed.
In an international beam test experiment at CERN in October 1993 the principal functionality of the system was demonstrated with correctly processed events.
