Abstract-A new hardware trigger system based on tracks detected by a stereo drift chamber has been developed for the BABAR experiment at Stanford Linear Accelerator Center. The z0 pT Discriminator (ZPD) is capable of fast, 3-dimensional reconstruction of charged particle tracks, and provides rejection of background events due to beam particles interacting with the beam pipe at the first-level trigger. Over 1 gigabyte of data is processed per second by each ZPD module. Rapid track reconstruction has been realized by the use of the latest-generation FPGAs.
I. INTRODUCTION
THE BABAR Experiment, operating since 1999 at Stanford Linear Accelerator Center, studies the physics of B mesons produced in collisions of electrons and positrons with a center-of-mass energy of 10.58 GeV. The data acquisition system for the experiment is driven by a multi-level trigger system. The first hardware trigger system is called the Level 1 (L1) Trigger. An L1 Accept initiates data transfer via optical fibers that connect the front-end electronics to the off-detector processors, where further data reduction takes place. The latency of the L1 Trigger is fixed at 12 µs, determined by the length of the data FIFO in the front-end electronics.
The rate at which BABAR can record data is primarily limited by the bandwidth between the front-end and off-detector electronics. The data acquisition system was designed for a sustained L1 rate of 1 kHz, and saturates at around 3.5 kHz. On the other hand, the L1 rate is expected to rise as the accelerator continues to improve its luminosity. It was therefore necessary to upgrade the L1 Trigger system to reduce the background rate while maintaining high efficiency for the physics signal events.
The small cross-section of the e+e− collisions means that, even with the expected luminosity increase, the physics event rate remains less than 100 Hz. Fig. 1 shows the distribution of z0, the position of the track's closest approach to the beamline (z-axis) measured from the nominal interaction point, for the tracks in the events that were accepted by the current L1 Trigger. In addition to the peak near z0 = 0 produced by the e+e− collisions, many tracks are detected with distances of 20 cm or larger from the interaction point. These tracks, produced by beam particles and synchrotron radiation photons hitting the beamline components near the experiment, comprise the bulk of the L1 rate but are not useful for physics measurements.
Fig. 1
Origin along the beamline (z-axis) of drift chamber tracks which pass the current Drift Chamber Trigger. The peak at z0 = 0 is from e+e− collisions; the remaining tracks are primarily background tracks from beam particle collisions with beamline components.
The current BABAR detector and trigger systems are described in [1]. The L1 Drift Chamber Trigger (DCT) receives inputs from a cylindrical small-cell drift chamber with 40 concentric layers arranged in 10 superlayers of 4 layers each. Four of the ten superlayers are strung parallel to the beam axis, while the remaining six are given 40-70 mrad stereo angles. The z information provided by this arrangement is used in the software trigger and in the offline event reconstruction, but not in the L1 DCT, which finds tracks only in the 2D projection. The existing DCT is therefore unable to use the z0 distribution shown in Fig. 1 to distinguish tracks by their origin with respect to the interaction point.
II. TRIGGER OVERVIEW
A block diagram of the upgraded trigger system is shown in Fig. 2 below.
Fig. 2
Hit information from the 7104 sense wires of the BABAR Drift Chamber is transmitted to the L1 DCT every 269 ns. Since the time resolution is shorter than the ~600 ns total drift time, each event is spread over a few consecutive clock cycles. The Track Segment Finder (TSF) modules [2] use the time development of the wire hit patterns over 3 clock cycles to identify track segments in each superlayer with a resolution of ~800 µm. These track segments are sent to the ZPD modules, which perform 3D track reconstruction.
The drift chamber is organized into octants and superlayers as shown in Fig. 3 below. Superlayer designations A, U, and V correspond to axial, right-hand stereo, and left-hand stereo, respectively.
Fig. 3
Each ZPD receives segment data from three octants and searches for tracks that come from the origin and pass through one of twelve "seed" segments in the A7 or A10 superlayer of the central octant. The two adjacent octants are used to provide tracking coverage for low pT tracks. The tracks are then fit for their z0, curvature, and dip angle. These parameters are used to select signal events which have tracks from e+e− collisions. The current L1 DCT uses TSF modules that report detailed segment information only for the axial superlayers. A set of 24 updated TSF modules that send the full segment information from both the axial and the stereo superlayers is being built to replace the current modules.
Additional trigger modules use electromagnetic calorimeter information to make trigger decisions based upon neutral particles. A Global Trigger board combines information from each L1 Trigger subsystem to make the L1 Trigger decision.
III. z0 pT DISCRIMINATOR (ZPD)
Organization of the ZPD is shown in Fig. 4 below.
Fig. 4
TSF data are input as single-ended standard 3.3 V CMOS levels to each ZPD on a 160 bit wide, 60 MHz bus for a total data rate of ~10 Gb/s. Each such slice of data contains geographical locations of hit segments. Sixteen of these slices are set off by a "frame bit" and constitute a "packet" of data. These packets are continuously reformatted and transmitted along a 75 bit multi-drop LVDS bus, called the "MegaBus", to six "Finder-Fitter" FPGAs (Xilinx XC2V4000, 1152 pin BGA). The MegaBus transmits its data in "Double Data Rate" (DDR) format, in which the data change at both edges of the 60 MHz clock. Electrically, the MegaBus consists of pairs of 6 mil traces on top of each other, surrounded top and bottom by ground planes in the circuit board. This forms a 60 Ohm differential transmission line terminated at both ends and gives very clean data transitions, which have been tested to over 160 Mb/s (30% above spec) on each line with negligible bit error rate. One such pair fits between the vias in the 1 mm ball grid array of the Finder-Fitter FPGA. The ZPD is a twelve layer printed circuit board, the number of layers driven primarily by the need to accommodate the MegaBus differential traces and ground planes.
0-7803-8257-9/04/$20.00 © 2004 IEEE.
There are six Finder-Fitters on the MegaBus, each responsible for two of the twelve seed segments. The VHDL code for all the Finder-Fitters is identical, and a geographic address is used to distinguish them.
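As a rough cross-check of the bus figures quoted above, the arithmetic can be written out. This is only a sketch: the bus widths and the clock are taken from the text, the constant and function names are illustrative, and the margin is computed relative to the nominal DDR line rate, which is an assumption about what "spec" refers to.

```python
# Rough link-rate arithmetic using the figures quoted in the text.
# All names are illustrative; only the numeric inputs come from the paper.
TSF_BUS_WIDTH_BITS = 160   # width of the TSF input bus
MEGABUS_WIDTH_BITS = 75    # width of the MegaBus
CLOCK_HZ = 60e6            # nominal clock
DDR_FACTOR = 2             # DDR: data change on both clock edges


def tsf_input_rate_bps():
    """Raw input rate of one ZPD in bits per second."""
    return TSF_BUS_WIDTH_BITS * CLOCK_HZ


def megabus_line_rate_bps():
    """Nominal rate of a single MegaBus line in bits per second."""
    return CLOCK_HZ * DDR_FACTOR


def megabus_total_rate_bps():
    """Nominal aggregate MegaBus rate in bits per second."""
    return MEGABUS_WIDTH_BITS * megabus_line_rate_bps()


def margin_over_nominal(tested_bps):
    """Fractional headroom of a tested line rate over the nominal one."""
    return tested_bps / megabus_line_rate_bps() - 1.0


if __name__ == "__main__":
    print(f"TSF input:    {tsf_input_rate_bps() / 1e9:.1f} Gb/s")
    print(f"MegaBus line: {megabus_line_rate_bps() / 1e6:.0f} Mb/s")
    print(f"Margin at 160 Mb/s: {margin_over_nominal(160e6):.0%}")
```

With these inputs the input rate is 9.6 Gb/s, or 1.2 GB/s, which agrees with the abstract's figure of over 1 gigabyte per second per module.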
A. ZPD Track Finder Algorithm
The ZPD Track Finder algorithm uses a Hough transform [3] to search for charged particle tracks which originate from the e+e− interaction point and pass through a seed segment in superlayer A7 or A10. For each seed segment, 260 different curvature and dip angle possibilities are considered in parallel. The best option is selected based upon the number of segments which are consistent with each track parameter hypothesis.
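The voting step of such a Hough search can be sketched in software as follows. This is an illustration under stated assumptions, not the ZPD firmware: the superlayer radii and stereo angles are placeholder values, the 20 x 13 split of the 260 hypotheses is a guess, the helix model and matching tolerance are simplified, and every function name is hypothetical.

```python
import math

# Hypothetical superlayer geometry: (name, radius in cm, tan of stereo angle).
# These numbers are placeholders, not the BABAR drift chamber dimensions.
SUPERLAYERS = [
    ("A1", 26.0, 0.0), ("U2", 32.0, 0.045), ("V3", 38.0, -0.050),
    ("A4", 44.0, 0.0), ("U5", 50.0, 0.055), ("V6", 56.0, -0.060),
    ("A7", 62.0, 0.0), ("U8", 68.0, 0.065), ("V9", 74.0, -0.070),
    ("A10", 80.0, 0.0),
]

# Assumed 20 x 13 = 260 hypothesis grid; the real binning is not given here.
CURVATURES = [-0.0095 + 0.001 * i for i in range(20)]   # 1/cm
TAN_DIPS = [-0.6 + 0.1 * j for j in range(13)]


def expected_phi(radius, stereo_tan, curvature, tan_dip):
    """Azimuth, relative to the track direction at the origin, at which a
    helix starting at the origin crosses a superlayer of the given radius."""
    half_angle = math.asin(0.5 * curvature * radius)
    arc = radius if curvature == 0.0 else 2.0 * half_angle / curvature
    z = arc * tan_dip                 # z0 = 0 for tracks from the origin
    return half_angle + stereo_tan * z / radius


def hough_votes(segments, seed_name, tol=0.01):
    """Count, for each hypothesis, the superlayers with a matching segment.

    segments maps a superlayer name to a list of segment phi values given
    relative to the seed phi. Returns {(i_curv, j_dip): superlayer count}.
    """
    geom = {name: (r, s) for name, r, s in SUPERLAYERS}
    seed_r, seed_s = geom[seed_name]
    votes = {}
    for i, curv in enumerate(CURVATURES):
        for j, tan_dip in enumerate(TAN_DIPS):
            seed_phi = expected_phi(seed_r, seed_s, curv, tan_dip)
            count = 0
            for name, (r, s) in geom.items():
                want = expected_phi(r, s, curv, tan_dip) - seed_phi
                hits = segments.get(name, [])
                if any(abs(phi - want) < tol for phi in hits):
                    count += 1
            votes[(i, j)] = count
    return votes
```

The loops here run sequentially, whereas the text states that the hardware evaluates all hypotheses in parallel; the sketch only shows what is being counted.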
Each Finder receives 5 track segments of data on the MegaBus every 8.3 ns. These segments are stored in dual port block RAMs until all segments in a packet have been received. The algorithm reads these memories at 60 MHz, processing up to 26 segments at a time.
Each Finder searches for tracks using two different seed segments which are processed serially using the same pipelined logic. The first step of the algorithm adjusts the segment phi values by subtracting the seed phi so that the subsequent algorithm is independent of an overall azimuthal rotation.
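The seed-relative phi adjustment can be illustrated with integer arithmetic of the kind a hardware implementation would use. The number of phi bins (1024) and the function name are assumptions made for this sketch; the text does not state the phi encoding.

```python
def relative_phi(segment_phi, seed_phi, n_bins=1024):
    """Return segment_phi - seed_phi as a signed offset in phi bins.

    Phi values are integers in [0, n_bins) that wrap around the full
    circle. The result lies in [-n_bins/2, n_bins/2).
    """
    diff = (segment_phi - seed_phi) % n_bins
    if diff >= n_bins // 2:
        diff -= n_bins
    return diff
```

Because only the difference enters, rotating both the segment and the seed by the same number of bins leaves the result unchanged, which is the property the paragraph above relies on.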
A Pattern Recognition Matrix (PRM) records how many superlayers have segments which are consistent with the track parameter hypothesis for each bin. The bin with the most hits is selected as the most likely track trajectory. If the maximum spans several bins, the track parameter values are averaged over those bins. The best track option must have no more than 2 superlayers with missing segments.
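A minimal sketch of the bin selection described above, assuming the PRM is available as a 2D array of superlayer counts indexed by curvature bin and dip angle bin. The function name is hypothetical, and averaging over every tied bin (whether or not the bins are adjacent) is a simplification of this sketch.

```python
def select_best_bin(prm, n_superlayers=10, max_missing=2):
    """Pick the most populated PRM bin, averaging over tied bins.

    prm is a list of rows; prm[i][j] is the number of superlayers with a
    segment consistent with hypothesis (i, j). Returns a tuple
    (count, mean_i, mean_j), or None if too many superlayers are missing.
    """
    best = max(max(row) for row in prm)
    if n_superlayers - best > max_missing:
        return None
    tied = [(i, j)
            for i, row in enumerate(prm)
            for j, count in enumerate(row)
            if count == best]
    mean_i = sum(i for i, _ in tied) / len(tied)
    mean_j = sum(j for _, j in tied) / len(tied)
    return best, mean_i, mean_j
```

For example, a matrix whose maximum of 9 appears in two neighboring dip angle bins yields the midpoint of those bins, while a matrix whose maximum is 7 is rejected because three superlayers are missing.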
A look up table converts the track parameters into the expected phi positions of segments along the track. The segments closest to the expected phi value are associated with the track and passed to the Fitter algorithm for the full 3D track parameter fit.
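The association step can be sketched as a nearest-neighbor choice per superlayer. The dictionary-based interface and the function name are assumptions for illustration; in the hardware the expected positions come from a lookup table.

```python
def associate_segments(expected, segments):
    """Choose, per superlayer, the segment closest to the expected phi.

    expected maps a superlayer name to the expected phi on the track.
    segments maps a superlayer name to a list of measured segment phis.
    Superlayers without any segment are left out of the result.
    """
    chosen = {}
    for layer, want in expected.items():
        candidates = segments.get(layer, [])
        if not candidates:
            continue
        chosen[layer] = min(candidates, key=lambda phi: abs(phi - want))
    return chosen
```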
Individual steps of the Finder algorithm are location constrained within the Virtex-II FPGA in order to assist the Xilinx place and route software. The PRM logic dominates the resource usage of the Finder-Fitter FPGAs and care was taken in order to minimize the resource usage of the basic elements which are replicated thousands of times.
A C++ program which simulates the Finder algorithm track finding efficiency and purity was used to optimize the PRM size and make design choices related to algorithm performance versus resource usage. This program outputs VHDL code containing the PRM constants which map segment phi values into PRM bins. The VHDL code was designed to minimize the differences between A7 and A10 Finders and the logic for either is selected by a single generic constant at the time of synthesis.
B. ZPD Track Fitter Algorithm
The Fitter algorithm receives a set of up to 10 spatial points, one in each superlayer, from the Finder algorithm, and attempts to fit a helix to them. This is done in a 2-stage, noniterative process that ensures efficient hardware implementation with a fixed latency.
In the first stage, the points are projected on the x-y plane at z = 0 and are fitted to an arc. Each of the three pairs of neighboring stereo points (U2 and V3, U5 and V6, and U8 and V9) is combined to cancel the effects of the stereo angles, so that each pair provides an effectively axial measurement. The fit starts from an arc that passes through the interaction point and the seed (A7 or A10) segment, with the curvature determined by the Finder. The residuals at each spatial point and their first derivatives with respect to the curvature are calculated. Newton's method is applied once to improve the value of the curvature. Iterative application of this procedure would improve the accuracy of the curvature only marginally at a significant cost in latency.
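A single curvature update of this kind can be sketched as follows. The sketch assumes a least-squares (Gauss-Newton) form of the Newton step and a simple parametrization in which the arc is pinned to the origin and to the seed; the paper does not give the exact formulas, and all names are hypothetical. Phi wrap-around is ignored, and at least one point away from the seed radius is required.

```python
import math


def predicted_phi(radius, curvature, seed_radius, seed_phi):
    """Azimuth at `radius` of the arc through the origin and the seed."""
    return (seed_phi
            + math.asin(0.5 * curvature * radius)
            - math.asin(0.5 * curvature * seed_radius))


def dphi_dcurvature(radius, curvature, seed_radius):
    """Derivative of predicted_phi with respect to the curvature."""
    def term(r):
        x = 0.5 * curvature * r
        return 0.5 * r / math.sqrt(1.0 - x * x)
    return term(radius) - term(seed_radius)


def newton_step(points, curvature, seed_radius, seed_phi):
    """Apply one Gauss-Newton update to the curvature.

    points is a list of (radius, measured_phi) pairs, where stereo pairs
    are assumed to be already combined into effectively axial points.
    """
    num = 0.0
    den = 0.0
    for radius, phi in points:
        residual = phi - predicted_phi(radius, curvature,
                                       seed_radius, seed_phi)
        slope = dphi_dcurvature(radius, curvature, seed_radius)
        num += residual * slope
        den += slope * slope
    return curvature + num / den
```

Because the predicted azimuth is nearly linear in the curvature for stiff tracks, one update already removes most of the error in the starting value, consistent with the statement above that further iterations bring only marginal gains.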
In the second stage, the residual of each stereo point with respect to the 2-dimensional arc is converted into a z measurement. These (up to 6) spatial points form a straight line on the cylinder defined by the arc. A simple linear least-squares fit gives the z0 and the dip angle. The error on the z0 is estimated from the number of spatial points available to the Fitter.
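The straight-line fit in the second stage can be illustrated with the standard closed-form least-squares solution. The sketch assumes equal weights for all points and takes z as a function of the arc length s along the fitted arc, with the slope interpreted as the tangent of the dip angle; the function name and these conventions are assumptions.

```python
def fit_z0_and_slope(arc_lengths, z_values):
    """Unweighted least-squares fit of z = z0 + slope * s.

    Returns (z0, slope), where slope is dz/ds, taken here as the tangent
    of the dip angle. Needs at least two points at distinct arc lengths.
    """
    n = len(arc_lengths)
    if n < 2 or n != len(z_values):
        raise ValueError("need at least two (s, z) points")
    s_mean = sum(arc_lengths) / n
    z_mean = sum(z_values) / n
    s_var = sum((s - s_mean) ** 2 for s in arc_lengths)
    if s_var == 0.0:
        raise ValueError("arc lengths must not all be equal")
    cov = sum((s - s_mean) * (z - z_mean)
              for s, z in zip(arc_lengths, z_values))
    slope = cov / s_var
    return z_mean - slope * s_mean, slope
```

The solution is a fixed sequence of sums, multiplications, and one division, so it has no iteration and suits a fixed-latency pipeline.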
The Fitter algorithm is implemented in a Xilinx FPGA utilizing the dedicated multipliers and block RAM. The highly pipelined algorithm runs with a 60 MHz clock and requires the dedicated resources to meet timing requirements. The block RAM is used as a lookup table where algorithm values are stored. The critical path in the design is a multiplication with a value from a lookup table. The dedicated multipliers and block RAM are physically close in the FPGA to minimize delays. The remaining algorithm logic is constrained to be in the same area to further minimize delays.
C. Decision module
The Decision Module collects fitted track parameter information from each of the Finder-Fitters and makes trigger decisions. Four independent decision bits are reported to the global trigger based upon cuts in curvature, dip angle, z0, and the error on z0. An additional 4 decision bits are stored in the output data stream for testing new cut parameters. The cut parameters are stored in memory which is configured at the beginning of a data-taking run.
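The decision logic can be sketched as one configurable cut set per decision bit. The rule used here (a bit is set when at least one track passes all cuts of its set), the symmetric form of the cuts, and all names and values are assumptions made for illustration; the text only states which quantities are cut on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    curvature: float
    tan_dip: float
    z0: float
    z0_error: float


@dataclass(frozen=True)
class CutSet:
    """Hypothetical cut parameters, loaded at the start of a run."""
    max_abs_curvature: float
    max_abs_tan_dip: float
    max_abs_z0: float
    max_z0_error: float


def passes(track, cuts):
    """True if the track satisfies every cut in the set."""
    return (abs(track.curvature) <= cuts.max_abs_curvature
            and abs(track.tan_dip) <= cuts.max_abs_tan_dip
            and abs(track.z0) <= cuts.max_abs_z0
            and track.z0_error <= cuts.max_z0_error)


def decision_bits(tracks, cut_sets):
    """Return an integer whose bit n is set if any track passes cut set n."""
    bits = 0
    for n, cuts in enumerate(cut_sets):
        if any(passes(track, cuts) for track in tracks):
            bits |= 1 << n
    return bits
```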
D. Diagnostic memories
Six diagnostic memories at various places throughout the dataflow provide debugging capabilities for the ZPDs. These memories are implemented using dedicated block RAMs within the Virtex-II FPGAs and can store 64 events of information. These can either record the datastream or be used to play data to downstream pieces of the data processing pipeline. These memories have been used to test the hardware, debug the firmware implementation, and compare the ZPD results with the C++ simulation code.
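The record and playback roles of such a memory can be modeled in a few lines. The 64-event depth is from the text; the class name and the choice to stop recording when full (rather than overwrite the oldest event) are assumptions of this sketch.

```python
class DiagnosticMemory:
    """Software model of a 64-event record/playback buffer."""

    DEPTH = 64

    def __init__(self):
        self._events = []

    def record(self, event):
        """Store one event; return False once the memory is full."""
        if len(self._events) >= self.DEPTH:
            return False
        self._events.append(event)
        return True

    def play(self):
        """Yield the stored events in order, e.g. to drive downstream logic."""
        yield from self._events
```

Loading one such buffer with simulated input and reading another back after the downstream logic allows an event-by-event comparison with the simulation.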
E. Implementation tools
The CAE tools used to design the ZPD were provided by Mentor Graphics. All of the firmware was written in VHDL using FPGA Advantage to produce a well laid out hierarchical design. The design implementation relied heavily on accurate VHDL simulations using ModelSim. Leonardo Spectrum was used for VHDL synthesis. Xilinx ISE was used to place and route the FPGAs.
IV. PERFORMANCE
The on-board interconnects among the FPGAs are validated by a boundary scan. The "MegaBus," which carries the largest amount of data, has been tested using dedicated FPGA firmware that transmitted rotating bit patterns. This allowed us to stress-test the bus with up to ~10^16 bits and with higher clock frequencies than the nominal 60 MHz.
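A rotating bit pattern test of this kind can be mimicked in software as follows. The 75 bit width is from the text; the specific pattern (a word rotated left by one bit per transfer), the seed value, and the function names are assumptions, since the test firmware itself is not described.

```python
MEGABUS_WIDTH = 75


def rotate_left(word, width=MEGABUS_WIDTH):
    """Rotate a `width`-bit word left by one bit."""
    mask = (1 << width) - 1
    return ((word << 1) | (word >> (width - 1))) & mask


def rotating_patterns(seed, count, width=MEGABUS_WIDTH):
    """Yield `count` words, each the previous one rotated by one bit."""
    word = seed & ((1 << width) - 1)
    for _ in range(count):
        yield word
        word = rotate_left(word, width)


def count_bit_errors(sent, received):
    """Total number of differing bits between two word sequences."""
    return sum(bin(a ^ b).count("1") for a, b in zip(sent, received))
```

A receiver that knows the seed can regenerate the same sequence and compare it with what arrives, so every line is exercised with both logic levels and the error count needs no stored reference data.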
In order to test more complex functionalities, several blocks of diagnostic memories are distributed along the data path of the ZPD module. They allow us to transmit simulated data through various parts of the system, record the output, and compare it with the expectation. We generated, using the BABAR Monte-Carlo simulation software, physics events in which two B mesons are created in an e+e− collision. The expected response of the BABAR Drift Chamber was fed into a program that simulated the ZPD algorithm. We were successful in demonstrating that the hardware matched the simulation bit-wise.
