The NA48 charged trigger is a mixed hardware and software real-time processing system intended to detect the interesting configurations of K⁰ charged decays. It achieves real-time event building, track reconstruction and kinematics computation on drift chamber data at an event rate of 100 kHz and within a maximum decision latency of 100 µs.
I. INTRODUCTION
NA48 is a particle physics experiment aiming at the precise measurement of a fundamental parameter of physics through the statistical study of Kaon decays into Pions [1]. The experiment comprises two main detectors: a calorimeter for neutral decays (K → 2π⁰, ...) and a spectrometer for charged ones (K → π⁺π⁻, π±e∓ν, ...). The spectrometer consists of four drift chambers and a magnet. The electronic signals left by the charged particles in the drift chambers allow the system to reconstruct their trajectories (tracks), and the track deflections caused by the magnet yield the kinematic parameters of the decays.
Figure 1: The Charged Trigger Inside the Experiment
The primary rate of charged decays is about 200 kHz. However, among all possible charged decays, only K → π⁺π⁻ is relevant for the proposed measurement; it is considered the "signal" ("good" events), and all other decays are therefore considered "noise." By that definition, the raw signal/noise ratio produced by the front-end data acquisition system is of the order of 10⁻⁴. The function of the so-called "charged trigger" is to increase this signal/noise ratio substantially, in real time, by eliminating as much noise as possible, in order to improve the statistical significance of the data and to ease the bandwidth and storage constraints of the experiment.
As shown in Figure 1, the charged trigger system is made of two subsystems: the level 1 trigger (L1C), which reduces the primary event rate down to 100 kHz, and the level 2 trigger (L2C), which reduces that rate down to ~2 kHz. The L1C is a fast logic trigger, based on several simple criteria, which makes a first selection of the charged event data and injects them into the L2C. The L2C is a parallel processing system mixing hardware and software elements; for each event, it computes the coordinates of the particle traces, reconstructs tracks, calculates the kinematics and flags the event as "signal" or "noise." This paper is concerned with the description of the salient features of the L2C.
II. BASIC CONCEPTS

A. The Charged Detection Principles (Figure 2)
The axis of the Kaon beam defines the z-axis of the experiment's coordinate system. Each drift chamber (DCH) has 8 parallel sense wire planes, all perpendicular to the z-axis.
They are grouped in staggered pairs to form 4 coordinate views (VIEWs). Each VIEW is essentially made of 512 parallel wires perpendicular to the axis of the coordinate it measures. Whenever a charged particle crosses a VIEW, it necessarily passes close to two neighboring wires, leaving an electric pulse on each. The coordinate of the crossing point (also called the "space-point") is computed by combining the coordinates of the wire pair with an analysis of the timing difference between the two pulses.
0-7803-4258-5/98/$10.00 © 1998 IEEE
Theoretically, two VIEWs (x and y) would be enough to determine the space-point of a particle inside a chamber. But since we are interested in pairs of particles (K → π⁺π⁻), a typical event comprises a pair of x coordinates and a pair of y coordinates; a third coordinate, $u = (x+y)/\sqrt{2}$, corresponding to a fixed linear combination of x and y, is then needed to determine which x is associated with which y. Finally, since each VIEW has an inefficiency of ~1%, a fourth plane, $v = (y-x)/\sqrt{2}$, is added to each DCH to make up for it.
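The role of the u view can be illustrated with a small sketch: with two x and two y values there are two possible pairings, and only the pairing whose predicted $u = (x+y)/\sqrt{2}$ values match measured u coordinates survives. The function name and the matching tolerance are hypothetical, and the v view (which would take over when a u hit is missing) is not modeled.

```python
import math
from itertools import permutations

SQRT2 = math.sqrt(2.0)

def associate_xy(xs, ys, us, tol=0.5):
    """Return every x-y pairing whose predicted u = (x + y)/sqrt(2) values
    all match a measured u within tol (hypothetical tolerance, same units
    as the coordinates)."""
    solutions = []
    for ys_perm in permutations(ys):
        pairs = list(zip(xs, ys_perm))
        # Each candidate space-point must be confirmed by some measured u.
        if all(any(abs((x + y) / SQRT2 - u) <= tol for u in us) for x, y in pairs):
            solutions.append(pairs)
    return solutions
```

With true space-points (1, 5) and (4, −2), the wrong pairing predicts u values near −0.71 and 6.36, which match no measurement, so only the correct pairing is returned.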
The L2C uses all the pulse timings (hits) produced by DCH1, DCH2 and DCH4 to compute the coordinates, tracks and kinematics of each event it receives.
B. The Trigger Principles
The whole NA48 experiment is synchronized by a 40 MHz clock; that is, time is defined for all subsystems by a reference clock in units of 25 ns. All data are associated with a "Time Stamp" (TS) value, representing the time at which the corresponding event (decay) took place. For the system, an "event" is a set of data detected around a TS. When the L1C spots a candidate event around a particular TS, it prompts the acquisition system to send the corresponding data to the L2C. The L2C processes that event and produces a single "Trigger Word" (TW) containing the TS and several flags which summarize the main characteristics of the event.
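As a minimal sketch of the time-stamp convention, assuming hit times expressed as integer nanoseconds, the code below converts a time into 25 ns clock ticks and collects the hits lying within a window around a given TS. The window half-width and the hit tuple layout are hypothetical choices for illustration, not NA48 parameters.

```python
CLOCK_PERIOD_NS = 25   # one time-stamp unit = one period of the 40 MHz clock

def to_time_stamp(t_ns):
    """Convert an integer time in ns into a time-stamp value (25 ns ticks)."""
    return t_ns // CLOCK_PERIOD_NS

def select_event(hits, ts, window=2):
    """Return the hits whose time stamp lies within +/- window ticks of ts.

    hits: iterable of (time_ns, channel) tuples. The tuple layout and the
    window half-width are hypothetical, not NA48 parameters.
    """
    return [hit for hit in hits if abs(to_time_stamp(hit[0]) - ts) <= window]
```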
The TW is then transmitted to the trigger supervisor of the experiment which, depending on the trigger patterns coming from all trigger systems, decides whether to read out the corresponding data or to discard it.
III. THE L2C REQUIREMENTS
Since NA48 is a precision measurement experiment, all biases introduced into the acquired data by the instrumentation (especially the triggers) must be known to a very high degree of precision.
The L2C must sustain an input event rate of 100 kHz. It must also process events within a latency of about 100 µs: if the processing of an event takes more than 100 µs, that event is flagged as "Not Computed" (NC) and the system discards it. Therefore, the rate of NC events must be low enough that no more than a few percent of good events are lost. The relative inefficiency of the system (that is, the loss of good events) for the two types of Kaon decay (K_S and K_L) must be known with very high precision.
The L2C is an asynchronous queued system. Since the arrival of events in time follows Poisson statistics, the queues fluctuate around an average value, provided that the average processing rate is higher than the event rate [2]. The actual processing latency of an event is due to 1) its intrinsic complexity and 2) the time lost waiting in queues. Therefore, reducing queue lengths increases system efficiency. In order to control the queuing levels, the L2C includes an "Xon/Xoff" mechanism: when queues reach a critical level, the L2C asserts an "L1OFF" signal, turning off the L1C; once the queues have been absorbed by the system, the L2C releases L1OFF, allowing the L1C to resume its activity. For all practical purposes, this mechanism reduces inefficiency at the cost of increased dead time, which is preferable because statistical biases due to dead time are better controlled. However, the dead time generated by this mechanism must not exceed 1% at the nominal L2C input event rate of 100 kHz.
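The interplay between Poisson arrivals, queueing, the 100 µs NC limit and the Xon/Xoff throttle can be explored with a toy discrete-time model. It is a sketch under simplifying assumptions, not a model of the real L2C: a single queue feeds a farm of identical workers with a fixed service time, and the farm size, service time and high/low queue thresholds are all hypothetical. Dead time is counted as the fraction of 1 µs steps during which L1OFF is asserted, and NC events are only counted, not removed.

```python
import random
from collections import deque

def simulate(rate_khz=100.0, n_workers=8, service_us=60, high=16, low=4,
             latency_limit_us=100, duration_us=100_000, seed=1):
    """Toy 1-us-step model of a queue feeding a worker farm, with Xon/Xoff.

    Returns (dead_time_fraction, accepted_events, nc_events). All parameter
    defaults are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    p_arrival = rate_khz * 1e-3      # mean arrivals per 1 us step (100 kHz -> 0.1)
    queue = deque()                  # arrival times of waiting events
    free_at = [0] * n_workers        # time at which each worker becomes free
    l1off = False
    dead_steps = accepted = nc = 0
    for t in range(duration_us):
        # Xon/Xoff with hysteresis: assert L1OFF at the high mark,
        # release it once the queue has drained to the low mark.
        if not l1off and len(queue) >= high:
            l1off = True
        elif l1off and len(queue) <= low:
            l1off = False
        if l1off:
            dead_steps += 1              # L1C is off: no event accepted this step
        elif rng.random() < p_arrival:   # Bernoulli approximation of Poisson arrivals
            queue.append(t)
            accepted += 1
        # Hand waiting events to free workers.
        for w in range(n_workers):
            if queue and free_at[w] <= t:
                arrival = queue.popleft()
                free_at[w] = t + service_us
                # Latency = time waited + processing time; over the limit -> NC.
                if (t - arrival) + service_us > latency_limit_us:
                    nc += 1
    return dead_steps / duration_us, accepted, nc
```

Shrinking the farm to a single worker makes the offered rate far exceed capacity, and L1OFF then stays asserted most of the time; raising the thresholds out of reach disables the throttle entirely, so dead time vanishes and the cost shows up instead as queueing latency and NC events.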
The L2C must be sufficiently flexible to allow for small algorithmic changes. It must be scalable in order to adapt to rate modifications. It also has to provide all tools and probes needed for the optimization and fine-tuning of the overall system.
IV. THE L2C CORE STRUCTURE
As shown in Figure 3, the L2C system is made of four subsystems: the Coordinate Builders (CB), the Event Builder and Dispatcher (EBD), the Event Workers (EW) and the Event Worker Farm Manager (FM).
A. The Coordinate Builders (CB)
Each VIEW has its own CB board, in charge of producing coordinates by analyzing wire hit timings. The raw data are injected by the data acquisition system into the input FIFOs of the CB. These data are then processed by a 40 MHz pipelined algorithm implemented in firmware (Xilinx™): all neighboring wire hits are matched and, for each pair of hits, the event time is subtracted from the hit times, yielding two drift times which are then used as indexes into a 2-D lookup table to retrieve the corresponding coordinate value. The implemented version of this algorithm exhibits an O(3n) dependency (where n is the data multiplicity), whereas the original nested-loop algorithm is O(n(n+2)). For a typical event, the processing latency in the CB is 1.2 µs, which allows for an event rate of about 800 kHz. All computed coordinates are then sent through an optical link (Fibre Channel, 266 Mbit/s) to the EBD in variable-length packets.
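The single-pass structure of the CB algorithm can be sketched in software as follows, assuming hits already sorted by wire index and drift times quantized into integer bins. The lookup table is filled from a purely hypothetical linear model (crossing point at d1/(d1 + d2) of the wire pitch); the actual table contents are not described here. Because each hit is compared only with its successor, the loop is linear in the multiplicity n, in contrast with a nested loop over all hit pairs.

```python
def build_coordinates(hits, event_time, lut, wire_pitch=1.0):
    """Turn the hits of one VIEW into coordinates in a single pass.

    hits: (wire_index, hit_time) tuples sorted by wire index, with times in
    integer TDC bins (hypothetical unit); lut: maps a (drift_bin_1, drift_bin_2)
    pair to the crossing position measured from the first wire, in units of
    the wire pitch.
    """
    coords = []
    # Compare each hit only with its successor: O(n) instead of all pairs.
    for (w1, t1), (w2, t2) in zip(hits, hits[1:]):
        if w2 - w1 != 1:                 # only neighbouring wires form a pair
            continue
        d1, d2 = t1 - event_time, t2 - event_time   # subtract the event time
        offset = lut.get((d1, d2))       # 2-D lookup table indexed by drift times
        if offset is not None:
            coords.append((w1 + offset) * wire_pitch)
    return coords

# Hypothetical table: linear drift model, crossing point at d1/(d1 + d2) of the pitch.
LUT = {(a, b): a / (a + b) for a in range(1, 64) for b in range(1, 64)}
```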
B. The Event Builder and Dispatcher (EBD)
Like the CBs, the EBD's design is based on firmware (AlteraTM 
C. The Event Worker (EW) Farm
The function of an EW is to receive the 12 coordinate packets of an event and use them to compute the particle space-points, tracks and magnetic deflections in order to determine whether the event is compatible with a K → π⁺π⁻ decay. The principle of the algorithm is to assume that the event is a K → π⁺π⁻ decay and compute the corresponding invariant mass; if the calculated mass is not compatible with the Kaon mass, then the assumption is false and the event can be flagged as "noise."
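The invariant-mass test can be written in a few lines. The sketch below assigns the charged pion mass to both tracks, computes $M^2 = (E_1+E_2)^2 - |\vec p_1 + \vec p_2|^2$ (with c = 1 and momenta in MeV/c) and compares M with the neutral kaon mass. The ±15 MeV/c² acceptance window and the function names are hypothetical; the masses are rounded standard particle-data values.

```python
import math

M_PI = 139.570    # charged pion mass, MeV/c^2 (rounded)
M_K0 = 497.611    # neutral kaon mass, MeV/c^2 (rounded)

def invariant_mass(p1, p2, m=M_PI):
    """Invariant mass of two tracks with 3-momenta p1, p2 (MeV/c, c = 1),
    both assigned mass m, i.e. the K -> pi+ pi- hypothesis."""
    e1 = math.sqrt(m * m + sum(c * c for c in p1))
    e2 = math.sqrt(m * m + sum(c * c for c in p2))
    px, py, pz = (a + b for a, b in zip(p1, p2))
    m2 = (e1 + e2) ** 2 - (px * px + py * py + pz * pz)
    return math.sqrt(max(m2, 0.0))    # guard against tiny negative rounding

def is_signal(p1, p2, window=15.0):
    """Hypothetical acceptance window of +/- 15 MeV/c^2 around the kaon mass."""
    return abs(invariant_mass(p1, p2) - M_K0) <= window
```

Two back-to-back pions, each with momentum $p^* = \sqrt{(M_K/2)^2 - m_\pi^2}$, reconstruct exactly to the kaon mass, whereas two collinear pions of equal momentum reconstruct to $2m_\pi$ and are rejected.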
A major complication, mainly due to the high luminosity of the Kaon beam, is that many events are not "clean" 2-particle events: depending on various conditions, up to 20% of events may have a coordinate count greater than 2, which introduces a combinatorial problem in both the space-point computation (XYW) and the mass computation (MASS) algorithms. For instance, if there are n (resp. p) space-points in DCH1 (resp. DCH2), the number of possible pairs of tracks is $n(n-1)p(p-1)/2$, which means that the number of combinations in the MASS algorithm grows like $n^4$, where n is the number of particles per event. This is the main reason for introducing MIMD (Multiple Instruction, Multiple Data) parallelism into the present implementation of the EWs.
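The combinatorial count can be checked by brute force: each track joins one DCH1 space-point to one DCH2 space-point, and the two tracks of a pair must not share a space-point. Choosing an unordered pair in DCH1 and an ordered pair in DCH2 enumerates each candidate exactly once, which reproduces $n(n-1)p(p-1)/2$. The function names are illustrative only.

```python
from itertools import combinations, permutations

def track_pairs(dch1_points, dch2_points):
    """Enumerate candidate track pairs. A track joins one DCH1 space-point to
    one DCH2 space-point; the two tracks of a pair share no space-point."""
    pairs = []
    for a, b in combinations(dch1_points, 2):       # unordered pair in DCH1
        for c, d in permutations(dch2_points, 2):   # ordered pair in DCH2
            pairs.append(((a, c), (b, d)))
    return pairs

def n_track_pairs(n, p):
    """Closed form n(n-1)p(p-1)/2 for n points in DCH1 and p points in DCH2."""
    return n * (n - 1) * p * (p - 1) // 2
```

For n = p = 2, 3 and 5 the closed form gives 2, 18 and 200 candidates, illustrating the roughly $n^4$ growth.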
Presently, an EW is a cluster of 4 fully connected TMS320C40 ('C40) DSPs [3] by Texas Instruments [4]. The 'C40 DSPs are mounted on industrial octal boards by MIZAR [5]. As represented in Figure 4, DSP1 receives the data and dispatches the packets of DCH2 (resp. DCH4) to DSP2 (resp. DSP4), keeping the packets of DCH1. Then, the three begin the XYW computation in order to associate x and y coordinates and produce the space-points of each DCH. together all possible 2-track combinations, eliminating those which, either physically or geometrically, are not compatible with an interesting K → π⁺π⁻ decay. All three send compatible track pairs to DSP4 which, upon reception, extrapolates the tracks to DCH4, spots possible deflections by comparing the extrapolated space-points to DCH4 data and computes the associated invariant mass. If it finds a mass compatible with the Kaon mass, it flags the event as "signal." The TW bears this flag as well as other bits summarizing the characteristics of the event. When DSP4 finishes all its computations, it sends the TW to the FM and then proceeds with various table purges and housekeeping tasks. When that is done, the EW signals that it is available for another event by sending (through DSP3) its associated EOP id to the FM which, in turn, writes it to the AOP FIFO of the EBD. Figure 5 shows the chronological processing distribution between the 4 DSPs, in a case where there are 3 (resp. 3, 5) particles in DCH1 (resp. DCH2, DCH4).
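The DSP4 step can be sketched as a straight-line extrapolation from the DCH1 and DCH2 space-points to the DCH4 plane, followed by a comparison with the measured DCH4 space-points. This is a simplified illustration under stated assumptions: tracks are taken as straight between DCH1 and DCH2, chamber z positions are plain parameters, the magnet is assumed to bend tracks in x only (so candidates are matched in y and the x difference is taken as the deflection), and the y tolerance is hypothetical. Converting the deflection into a momentum for the mass computation is not shown.

```python
def extrapolate(p1, p2, z4):
    """Straight-line extrapolation of the track through space-points p1 (DCH1)
    and p2 (DCH2), given as (x, y, z), to the plane z = z4. Returns (x, y)."""
    x1, y1, z1 = p1
    x2, y2, z2 = p2
    s = (z4 - z1) / (z2 - z1)
    return (x1 + s * (x2 - x1), y1 + s * (y2 - y1))

def match_in_dch4(p1, p2, dch4_points, z4, y_tol=1.0):
    """Compare the extrapolated track with DCH4 space-points (x, y).

    Assumes (hypothetically) that the magnet bends tracks in x only, so a
    candidate must agree in y within y_tol; its x offset is the deflection.
    Returns a list of ((x, y), dx) tuples.
    """
    xe, ye = extrapolate(p1, p2, z4)
    return [((x, y), x - xe) for x, y in dch4_points if abs(y - ye) <= y_tol]
```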
D. The Event Worker Farm Manager (FM)
The main functions of the FM are 1) to receive the TWs from the EWs' TW daisy chain and send them to the experiment's Trigger Supervisor, 2) to receive the EOP ids of available EWs from the corresponding daisy chain and write them back into the AOP FIFO and 3) to manage the L1C Xon/Xoff mechanism which regulates queue levels. It also implements specific signaling functions for system debugging and emulation.² The FM is firmware-based and controlled by a Quad MIZAR board [5] through 'C40 communication links. The firmware has recently been reprogrammed in order to implement hardware control of the Xon/Xoff mechanism, allowing the system to sustain event rates above 100 kHz (up to 400 kHz for the FM itself).
² For debugging and fine-tuning purposes, the L2C system is equipped with a full emulation mode which allows it to run standalone, using either real or Monte-Carlo data, at a maximum rate of 300 kHz.
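The free-worker bookkeeping shared by the EBD and the FM behaves like a FIFO of worker ids. The toy class below, whose names are hypothetical, dispatches each new event to the EW whose id is at the head of the AOP FIFO and appends a worker's id back to the FIFO when it reports completion. Events arriving while no EW is free wait in a queue, which is where the Xon/Xoff thresholds described earlier would apply.

```python
from collections import deque

class Dispatcher:
    """Toy model of the AOP FIFO protocol: the EBD pops a free EW id for each
    new event; when an EW finishes, the FM writes its id back. Class and
    method names are hypothetical."""

    def __init__(self, worker_ids):
        self.aop_fifo = deque(worker_ids)   # ids of EWs ready for a new event
        self.waiting = deque()              # events queued while no EW is free
        self.assigned = {}                  # EW id -> event being processed

    def new_event(self, event):
        self.waiting.append(event)
        self._dispatch()

    def worker_done(self, worker_id):
        """Called when an EW's id comes back through the FM."""
        del self.assigned[worker_id]
        self.aop_fifo.append(worker_id)
        self._dispatch()

    def _dispatch(self):
        # Pair queued events with free workers, oldest first on both sides.
        while self.waiting and self.aop_fifo:
            w = self.aop_fifo.popleft()
            self.assigned[w] = self.waiting.popleft()
```

Because free ids are recycled in completion order, faster EWs naturally receive more events, which is one way such a farm balances load without a central scheduler estimating event complexity.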
V. THE L2C SOFTWARE AND CONTROL
The L2C is installed in 4 VME crates and one workstation, spread over distances of up to 100 meters: 3 crates for the CBs associated with DCH1, 2 and 4, and one crate containing the EBD, the EWs and the FM. Each crate is controlled by a SPARC® VME single-board computer (SBC) running UNIX. The whole system is monitored by a Sun™ workstation through a private Ethernet network. The core software of the L2C consists of the DSP programs and the libraries developed for the control of specific VME boards. In order to maximize performance, the DSPs do not run any multitasking system. A higher-level distributed program achieves the synchronization and monitoring of the whole setup.
A. The Core Software
The CBs and the EBD both have a slave VME interface which allows for their full configuration and setup. (Figures 4 and 5).
The control software implements various functions, namely configuration, process synchronization, online monitoring and communication with the NA48 experiment's run control program. It uses ISIS™, an off-the-shelf communication software package [6] which eases the development of distributed software. The visualization modules are based on PAW [7], a powerful data analysis toolkit developed by CERN.
The running conditions of the NA48 experiment depend strongly on the CERN SPS particle accelerator. Two phases alternate during the run: burst and interburst. The burst phase corresponds to the actual release of the Kaon beam and lasts ~2 s. The interburst phase corresponds to the acceleration phase inside the circular accelerator and lasts ~12 s. Therefore, the actual acquisition and triggering take place during the burst, while the interburst is used to retrieve monitoring data from the system. Within the L2C, the EWs, the FM and the CBs accumulate monitoring data in on-board static RAM or FIFOs during the burst and transmit them to the control software during the interburst.
VI. SOME RESULTS
The first run of the L2C took place in 1995 in a reduced configuration, allowing thorough debugging and setup. The results of that run allowed for finer modeling of the system and, therefore, more precise estimates of the final performance. In 1997, the full system ran, complying with all the requirements, with results even better than predicted. The online K⁰ mass resolution is 5 MeV/c² and the rejection power ~60 (Figure 6).
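As a consistency check, and assuming that the rejection power is the ratio of the L2C input rate to its accepted rate, a rejection power of about 60 at the nominal 100 kHz input gives an output rate close to the ~2 kHz quoted in the introduction:

```latex
R_{\mathrm{out}} \approx \frac{R_{\mathrm{in}}}{60} = \frac{100\ \mathrm{kHz}}{60} \approx 1.7\ \mathrm{kHz}
```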
The main constraint in the L2C system is the strict latency 
VII. PROJECTED UPGRADE OF THE EWs
A. New Requirements for the L2C
The NA48 collaboration has requested that the charged trigger be able to sustain an increased event rate of about 150-200 kHz. At such a rate, the present implementation would see its dead time and event loss increase to prohibitive values. Meanwhile, we have studied the possibility of replacing the EWs based on DSP clusters with mono-processor EWs based on state-of-the-art general-purpose RISC processors: the straightforward, sequential software of a mono-processor EW would indeed be much easier to maintain than the present MIMD code, as it would eliminate much of the communication and synchronization code.
B. Benchmarks
We have run the L2C core software on various processors and obtained very encouraging results. This tremendous leap in performance can be explained by the fact that the architecture of present-day general-purpose processors is particularly well suited to the L2C algorithm. The amount of data per event is very small (typically ~160 bytes), while the track-reconstruction algorithm is relatively complex (though not bulky). The first access to the data by the processor practically loads the whole event into the processor's cache, and the algorithm then proceeds at full speed. This also means that any increase in a processor's clock frequency yields a proportional reduction in computing time.
D. Implementation Inside the Existing System
Presently, the communication between the EWs and the rest of the system goes either through 'C40 links for fast data transfers or through VME for monitoring and control. Any upgrade of the EWs poses a hardware problem, which is to interface the new RISC boards with the EBD and the FM through the 'C40 link protocol, and a software problem, which involves porting the DSP software to a RISC platform and developing the drivers for the hardware interface. Since most of today's RISC VME boards are designed to accept at least one PCI mezzanine card (PMC), the most straightforward way of interfacing the mono-processor EWs with the rest of the system is to develop a specific PMC that handles the different data transfers in firmware, using the 'C40 link protocol (Figure 9). It will also have to take on some low-level functions (e.g. timers) which are present on the DSP chips but will not be available on the RISC processor.
VIII. CONCLUSION
The L2C is a successful implementation of a high-rate, scalable software trigger system. Its strong points include a small-scale, efficient and compact switch-based event builder, as well as a scalable processing farm of commercial CPUs which allows for easy upgrades.
It should be noted that standalone emulation and testing capabilities are crucial for the development of such a complex system: without these features, the integration of the system would have proved much more difficult.
The importance of control software should also be underlined: monitoring an event-processing system running at 100 kHz or more requires a strong software back end which, above all, has to be robust.
The NA48 charged trigger may be considered a small-scale example of what will have to be done for future LHC experiments: software trigger systems, statistical performance, massive parallelism, commercial electronics. The experience acquired through the design of this particular system is valuable for the design of future systems.
