Asynchronous Transfer Mode (ATM) is a candidate technology to implement the high performance network in the data collection system for the ATLAS experiment. This work presents the results of modelling and simulation studies which aim at integrating the detailed organization of the detector read-out, the trigger requirements and the capabilities of ATM switching networks. The status of hardware development of small scale demonstrators is outlined.
I. INTRODUCTION
The next generation of High Energy Physics experiments, ATLAS [1] and CMS [2], proposed at the CERN Large Hadron Collider (LHC), will place heavy demands on the data acquisition and on-line filtering systems. A variable portion of the 106-1# detector channels in those experiments will be fired by tens of interactions created by two bunches of hadrons colliding at a 40 MHz rate. Sophisticated multi-level selection systems will reduce the raw data flow from a few tens of TBytes/s to the several tens of MBytes/s that will then be recorded on tape for subsequent off-line analysis. A first reduction of this data will be carried out by fast pipelined logic that will retain only those events that satisfy some simple geometrical and energy deposition criteria.
After this first level selection the remaining data bandwidth is expected to be of the order of 1000 Gbit/s. Traditional bus-based data acquisition (DAQ) systems are not adequate to handle this high bandwidth. Several data acquisition conceptual models have been proposed for use downstream of the first level trigger ([1], [2]). The RD-31 project [3] aims at evaluating a new, parallel approach to data acquisition based on the use of standard Asynchronous Transfer Mode (ATM) packet switching technology [4]. This technology holds the promise of becoming a "universal" communication standard, unifying the telecommunications and local area network markets on the time scale of the LHC.
A group of collaborators within RD-31 is involved in the ATLAS experiment. It focuses its efforts on architecture design and simulation studies adapted to an ATLAS trigger system based on ATM technology. In this paper we propose an integrated architecture for the level 2 and level 3 selection and data read-out systems. It is based on the so-called data "Pull" control strategy. We discuss the relative merits of this approach and evaluate its performance by means of simulations.
0018-9499/96$05
This paper is organized as follows. Section II describes the principles of the ATLAS event selection and data read-out models. Some of the system bandwidth requirements are presented in section III. The motivations leading to the choice of ATM for our application are given in section IV. Our proposed "integrated Pull architecture" is described in section V. The system that we have modelled and the corresponding simulation results are presented in section VI. The status of two hardware demonstrators is outlined in section VII. A summary and future plans are presented in section VIII.
II. EVENT SELECTION AND DATA READ-OUT IN ATLAS

A. The Basic Principles
The ATLAS trigger consists of three logical levels, shown schematically in Fig. 1. Beam crossing interactions occur at a rate of 40 MHz. At the nominal luminosity of 10^34 cm^-2 s^-1, the input event rate resulting from the level 1 trigger threshold cuts is estimated to be approximately 30-40 kHz. A safe design value of 100 kHz has been adopted. The level 1 trigger is without deadtime, because all the data are pipelined during the fixed 2 µs latency needed to decide whether to accept or reject the event candidates.
The event rate that can be recorded on tape is estimated to be in the range 10-100 Hz. Thus a further reduction of the order of 10^3-10^4 is necessary. This large rejection factor can be achieved by using two more logical sequential steps for each event. First, the RoIs within an event are processed individually to identify particle candidates such as electrons, jets or muons. Then a topological analysis of the event is performed by combining the previously identified particles. The trigger decision is issued accordingly. An additional event rejection factor of ~10 is expected to be achieved by the level 3 trigger, which executes sophisticated algorithms and selects events on the basis of physics signatures. The level 3 system should provide access to the complete event data in order to perform a full event analysis similar to that applied off-line. Events accepted by the level 3 selection are recorded on tape for subsequent off-line studies.
B. Data Flow
The logical organization of the data flow for the read-out and the level 2 (LVL2) and level 3 (LVL3) triggering systems is depicted in Fig. 2. For each bunch crossing the signals from all subdetectors are stored locally in pipeline memories (digital or analog) during the level 1 processing. For events accepted by level 1 (LVL1) the data from all the detector front-end memories are transferred via optical links to about 2000 read-out cards located in the counting room. The read-out cards contain a data buffer to store events during the subsequent triggering steps. In addition they possess enough processing capability to preprocess and format data for the level 2 and level 3 selection systems. The data is transmitted to those systems via a dedicated port of the read-out cards. The RoI data from each subdetector are processed individually and in parallel.
One Local Processor (LP) per RoI per subdetector is allocated (see Fig. 2 ). In a given subdetector the data for a RoI can be spread across several read-out cards. Therefore a network providing local connectivity (Local Network, Fig. 2 ) is needed to gather the RoI data into the local processor. However, full connectivity at this stage provides more flexibility, better load balancing among the local processors [5] and might be simpler.
For each event the results of the local processing (referred to as "features") are combined in a Global Processor (GP) via a Global Network for the subsequent topological analysis (Fig. 2) . The Global Processor issues the LVL2 trigger decision. The LVL3 selection starts for the accepted events. Rejected events are discarded.
The LVL3 selection algorithm may require the complete event data from all subdetectors. The duration of the algorithm is estimated to be of the order of 0.1-1 second. Therefore, a farm of processors is needed in order to cope with the expected 1 kHz LVL2 rate, since one processor is allocated per event. The necessary connectivity between read-out cards and the processing farm is provided by the L3 network (Fig. 2).
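The size of the LVL3 farm implied by these figures can be estimated as the product of the accept rate and the algorithm duration. The following is a minimal sketch of that arithmetic only; the function name is hypothetical and no safety margin for processor occupancy is included.

```python
def lvl3_processors_needed(lvl2_accept_rate_hz: float, algo_duration_s: float) -> float:
    """Average number of simultaneously busy processors (rate x service time)."""
    return lvl2_accept_rate_hz * algo_duration_s


# 1 kHz LVL2 accept rate, algorithm duration between 0.1 s and 1 s.
low = lvl3_processors_needed(1_000, 0.1)
high = lvl3_processors_needed(1_000, 1.0)
print(f"between about {low:.0f} and {high:.0f} processors")
```

With one processor allocated per event, the farm must therefore hold several hundred processors before any headroom is added.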
The control and management of the LVL2 and LVL3 systems is performed by the L2 and L3 supervisors respectively.
III. BANDWIDTH REQUIREMENTS
The expected average event data sizes for all subdetectors are given in Table 1. The table also shows the aggregate bandwidth requirements for data transmission to the LVL2 and LVL3 systems. These correspond to level 2 and level 3 input rates of 100 kHz and 1 kHz respectively.
Table 1. Column headings: L2 Bandwidth (Gbit/s), L3 Bandwidth (Gbit/s).
The data transferred to the LVL2 system corresponds to the regions of interest selected by the LVL1 trigger. It is estimated that an event will contain 5 RoIs on average. It can be seen that the aggregate bandwidth required for the triggering system is of the order of several tens of Gbit/s. This high bandwidth cannot be handled by traditional bus-based data acquisition systems.
It is expected that switching technology will allow implementation of the high performance, cost-effective and expandable network required for this challenging application. However, building this high performance network is not a trivial task. In order to shorten the overall development phase and facilitate maintenance of such a complex system, it is desirable to use commercially available components wherever possible. Compliance with widely adopted industrial standards ensures the interoperability (software and hardware) of equipment from various vendors.
ATM technology is also being adopted for high-performance local area networking (LAN) applications, and all major workstation companies are actively engaged in developing the technology (typical activities are the development of LAN hubs based on ATM switches, ATM interfaces to workstations, and the implementation of the internet TCP/IP protocol over ATM). Efforts in this area are coordinated by an industry association, the ATM Forum [8], which parallels the ITU's standardization efforts, while focussing on the needs of the workstation/LAN industry. It appears likely that ATM technology will dominate both high-performance WAN and LAN networking throughout the time-span of experiments at the LHC. The growth of multimedia applications and the adoption of ATM by the more cost-competitive LAN industry suppliers are expected to render ATM affordable on the time scale of the LHC.
V. PROPOSED ARCHITECTURE
A. The Principles
Currently several different implementations of the proposed read-out scheme (Fig. 2), based either on "Push" or on "Pull" strategies, are under evaluation within the ATLAS collaboration. In the "Push" approach the sources send their data to the processors as soon as the data is ready and the destination processor entity is known. This scheme implies that sources must know, prior to the execution of the selection algorithm, which data will be needed by processors. Therefore, for LVL2 all RoIs from all subdetectors should be sent to the local processors and examined in parallel, even if this is not necessary. By contrast, in the "Pull" strategy the destination processors request data from the sources as it is needed. This approach allows the implementation of a sequential LVL2 algorithm: the RoI data for a particular subdetector is sent to the local processors only if it is required by the subsequent steps of the algorithm. The sequential steps of the level 2 selection can significantly reduce the requirements for switching network aggregate bandwidth and processing power.
At present it is not decided whether the full event data will be presented to the LVL3 or whether the selection will be based on partial event data. It should be mentioned that in a LVL3 triggering system based on the "Push" approach, the full event data will always be sent to the destination processor. The event flow control in this case might be relatively simple, at the expense of a higher demand on the aggregate bandwidth of the switching network. On the other hand, an architecture based on the "Pull" strategy will allow implementation of partial read-out and/or full event building schemes.
In any DAQ architecture, control information should be exchanged between various parts of the system (e.g. between the data sources and the destination processors, etc.). The control information can either pass via a dedicated network, or it can use the same network as the one used for data transmission. The main advantages of the second approach are the simplification of using a common medium for all types of traffic (LVL2 / LVL3 data and control) and the requirement of a single network adapter per node.
B. An Integrated "Pull" Architecture
In this section we describe a possible LVL2 / LVL3 selection and data read-out system based on the "Pull" principle and a single physical network. This physical network will support the "local", "global" and "L3" networks of the logical model shown in Fig. 2.
We assume that the information on each event accepted by level 1 (number of RoIs and their position in the (η, φ) coordinate system) will be delivered to the L2 supervisor via a dedicated path. One of the tasks of the supervisor is to allocate processing resources for this event, e.g. assign a local processor per RoI and a global processor for the LVL2 decision. Currently we propose a very simple destination processor assignment scheme: we estimate that simple sequential allocation is adequate [5]. More sophisticated algorithms are not excluded.
The control and data flow in the proposed system is shown in Fig. 3. The global decision processor receives a notification message from the L2 supervisor (the flow labelled 1 in Fig. 3). This message contains the event ID, a list of RoIs and a list of IDs of the local processors assigned to these RoIs. The global processor sends a notification message to each local processor allocated for this event (flow 2). The message contains the event ID, a RoI ID, the Global Processor ID, etc. Therefore, global and local processors know their respective partners for the event, e.g. the global processor knows from which local processors it has to expect feature data. This can simplify error detection and recovery.
From the RoI ID information the local processor knows which sources contain data for the particular RoI (deduced by table look-up in the L2 supervisor, the global processor or even in the local processor itself). The local processor will send a request message to each source concerned (flow 3). In response to message (3), sources send the requested data (flow 4) to the local processor after preprocessing (if needed) and formatting. It is proposed that when all data for a given RoI have been delivered to the local processor, it starts to execute the feature extraction algorithm. Features are sent (flow 5) to the global processor. The global processor executes an algorithm based on the collected features for the event. Note that neither the local processors nor the global processor need to idle while waiting for the data. For example, they can work on previous events. A LVL2 "Yes/No" decision is issued when the global algorithm completes.
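The LVL2 part of this message exchange (flows 1 to 5) can be traced with a small sketch. Everything below is illustrative: the look-up table, the buffer contents, the summing "feature extraction", the decision threshold and all names are hypothetical placeholders, and the processing is shown sequentially although the real system works on several events concurrently.

```python
from itertools import cycle

# Hypothetical look-up table: which sources hold data for each RoI ID.
ROI_TO_SOURCES = {0: [0, 1], 1: [1, 2], 2: [3]}
# Hypothetical source buffers: source ID -> event ID -> data fragment.
SOURCE_BUFFERS = {s: {42: [s, s + 1]} for s in range(4)}


def run_lvl2_pull(event_id, roi_ids, local_processor_ids, log):
    """Trace flows 1-5 of the pull scheme for one event and return the decision."""
    # The supervisor assigns local processors sequentially (round-robin).
    lp_pool = cycle(local_processor_ids)
    assignment = {roi: next(lp_pool) for roi in roi_ids}
    # Flow 1: L2 supervisor notifies the global processor (GP).
    log.append((1, "L2 supervisor", "GP"))
    features = []
    for roi, lp in assignment.items():
        # Flow 2: GP notifies the local processor assigned to this RoI.
        log.append((2, "GP", f"LP{lp}"))
        fragments = []
        for src in ROI_TO_SOURCES[roi]:
            # Flow 3: the local processor requests the RoI data from a source.
            log.append((3, f"LP{lp}", f"SRC{src}"))
            # Flow 4: the source returns the requested fragment.
            fragments.extend(SOURCE_BUFFERS[src][event_id])
            log.append((4, f"SRC{src}", f"LP{lp}"))
        # Placeholder feature extraction: sum of the fragment words.
        features.append(sum(fragments))
        # Flow 5: features are sent to the GP.
        log.append((5, f"LP{lp}", "GP"))
    # Placeholder global algorithm: accept if the summed features pass a threshold.
    return sum(features) > 5


log = []
accepted = run_lvl2_pull(event_id=42, roi_ids=[0, 1, 2], local_processor_ids=[0, 1], log=log)
requests = sum(1 for flow, _, _ in log if flow == 3)
print(accepted, requests, len(log))
```

The log makes the cost of the pull scheme visible: each RoI adds one request message per source concerned, which is the protocol traffic that the simulations account for.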
We consider two possibilities for the treatment of the level 2 decisions. In one case, the sources are notified only if the event has been accepted by the LVL2 selection. Only the LVL2 decision "Yes" is sent to the L3 supervisor (flow 6), which then multicasts it to all sources (flow 7). No immediate action is taken for the events which did not pass the LVL2 selection. The oldest event is simply overwritten in the source buffer when a new event is read from the front-end modules. This scheme is attractive because it does not generate unnecessary traffic in the network (99% of the LVL2 decisions are expected to be "No"), it simplifies control logic in the data sources and it requires fewer actions per event in the system. An alternative solution is to send both LVL2 decisions, "Yes" and "No", to the L3 supervisor (flow 6).
It is possible that the same processor which performed the LVL2 global decision will continue to work on the LVL3 selection for the event (because it already possesses substantial information about it). Another option is to use different processors for those tasks. In this case the L3 supervisor allocates a processor for the LVL3 selection (flow 8).
As for the LVL2, the allocation algorithm can be either simple sequential (e.g. round-robin) or can use a more sophisticated discipline.
The allocated L3 processor sends request messages to the concerned sources (flow 9). In response, sources send the requested data (flow 10) to the L3 processor after preprocessing (e.g. zero suppression) and formatting. When the required data is available, the L3 processor executes the LVL3 selection algorithm. For accepted events, if necessary, the remaining part of the event data is collected prior to writing the event to mass storage. Rejected events are discarded.
As can be seen from the above description, several different types of traffic are transported by the network. Each of them has its own properties. For example, control traffic uses short messages; the LVL2 data has to be delivered at high rate; the level 3 traffic requires continuous concentration of data flow from many front-end sources to a destination processor, etc.
The ATM technology has been designed to carry simultaneously various types of traffic having different service requirements (e.g. real-time video, audio, data) on a common physical medium. Therefore, we propose to investigate whether ATM is adequate in our application to handle efficiently the LVL2 / LVL3 data and control traffic.
VI. SIMULATION STUDIES
The ATLAS detector is composed of three main subsystems: calorimeter, muon chambers and tracker (see Table 1 ). We decided to concentrate our simulation efforts on the LVL2 / LVL3 trigger for the calorimeter sub-system because the ATLAS Saclay group is strongly involved in calorimetry; therefore we had direct access to relevant information concerning the physics and the read-out organization of the calorimeter.
A. Physics
As previously mentioned, the ATLAS triggering system is based on the concept of regions of interest (RoI). The LVL2 selection uses only data from RoIs. The number of RoIs within events and their properties (size, amount of data, etc.) are important parameters for the design of the overall triggering system. For example, the system described in Fig. 2 may not be practical if the number of RoIs within events is very large (no reduction of the required aggregate bandwidth) or if it is too small (no cost-effective gain from parallel processing of the RoIs).
Extensive Monte Carlo simulation studies have been made by physicist groups in order to evaluate trigger performance [6]. A sample of ~1000 di-jet events which passed the LVL1 electron / gamma selection has been produced. Di-jet events are expected to give the largest contribution (~60%) to the level 1 trigger rate. Samples of events which passed other LVL1 selection criteria (muon, jet and missing energy triggers) have been produced and are currently under analysis.
The distribution of the number of RoIs per event, shown in Fig. 4.a, has been derived from the analysis of those events. It can be seen that each event contains an average of 5 RoIs. However, the maximum number of RoIs can be as high as 12. It should be mentioned that those parameters depend on the thresholds used at the LVL1 selection. For each RoI, the LVL1 trigger indicates its geographical position in the (η, φ) coordinates, its type (electron/gamma, jet, muon, etc.) and possibly some other information. The distribution of electron / gamma RoIs along the η coordinate (direction parallel to the beam axis) is depicted in Fig. 4.
We consider that one link per crate is used to transmit the data from the read-out cards to the LVL2 and LVL3 triggering system. At present, the necessary connectivity inside a crate is provided by a back-plane bus. The simulated architecture is presented in Fig. 5 [...] [10], [11]).
Specific features of the SAR, such as static and/or dynamic bandwidth allocation and servicing priorities, have been implemented. In the case of the level 3 full event building, data fragments for LVL3 are ~10 times bigger than RoI data for LVL2. In order to guarantee fast servicing times for the level 2 data and protocol packets, we intend to use a higher priority for these types of message. A lower priority will be assigned to the LVL3 data. The funnelling of the large LVL3 data packets towards a destination processor induces severe contention in the switching network. This contention can be reduced by an appropriate bandwidth allocation scheme, as described below.
The model of a source is shown in Fig. 6. The source maintains the necessary number of semi-permanent Virtual Connections (VC), providing a connection path to all farms of local processors. The VCs are associated with a high priority logical queue. They are serviced in FIFO order at the full link bandwidth. The protocol and LVL2 data (i.e. AAL5 packets) are placed in this high priority queue.
Lower priority queues, dedicated to the LVL3 data, are serviced whenever the high priority queue is empty. The source manages D low priority logical queues. Each queue contains the event fragments to be transferred to one of the D level 3 destinations. A semi-permanent VC to a destination is associated with each queue. Rate control is used in the sources in order to limit the traffic on each virtual connection so that the aggregate bandwidth of all traffic to a given destination does not exceed the available bandwidth of an output port. A rate control mechanism ensures that one cell is read from the head of each logical queue periodically. The period for servicing the logical queues can be chosen to be N times the cell transmission delay (i.e. 0.68 µs at 622 Mbit/s), where N is a programmable parameter (N >= D). The fraction of the available bandwidth allocated per VC is 1/N. Therefore, the peak LVL3 bandwidth per source is D/N of the 622 Mbit/s link rate.
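The servicing discipline of a source can be illustrated with a slot-by-slot sketch. This is a simplified reading of the scheme and not the model used in our simulation programs: it assumes that slot t of every period of N cell slots is reserved for low priority queue t mod N, and that a low priority queue simply loses its slot when a high priority cell is waiting. All function and variable names are hypothetical.

```python
from collections import deque

CELL_BITS = 53 * 8  # an ATM cell is 53 bytes long


def cell_time_us(link_mbit_s: float) -> float:
    """Transmission time of one cell in microseconds (bits / (Mbit/s))."""
    return CELL_BITS / link_mbit_s


def serve_slot(slot: int, high_queue: deque, low_queues: list, n_period: int):
    """Return the cell sent in this cell slot, or None if the slot stays idle."""
    # Protocol and LVL2 cells always go first, at the full link rate.
    if high_queue:
        return high_queue.popleft()
    # Otherwise each LVL3 queue gets one opportunity every n_period slots,
    # which caps each virtual connection at 1/n_period of the link rate.
    index = slot % n_period
    if index < len(low_queues) and low_queues[index]:
        return low_queues[index].popleft()
    return None


print(f"cell time at 622 Mbit/s: {cell_time_us(622):.2f} us")

high = deque(["lvl2-cell"])
lows = [deque(["d0-cell-a", "d0-cell-b"]), deque(["d1-cell-a"])]
sent = [serve_slot(t, high, lows, n_period=4) for t in range(8)]
print(sent)
```

In this example D = 2 and N = 4, so at most half of the slots can carry LVL3 cells, in line with the D/N peak bandwidth stated above.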
The required functionality of a source can be implemented with available industrial ATM components.
B.2. Destinations
The same destination model has been used for the farms of local processors and global/LVL3 processors. A destination contains a master unit and several processors. The master unit is responsible for: sending requests for data via the network, formatting the received data (e.g. reassemble RoI), distributing the formatted data to the processors, handling the results of the processors.
The processors in a farm perform the actual execution of the appropriate algorithm (feature extraction for RoI, Global LVL2 decision, etc.). If all processors in a farm are busy, events ready to be processed are queued.
Assuming a 100 kHz level 1 trigger rate, 5 RoIs per event, a 128 µs feature extraction algorithm duration and 50% processor occupancy, 128 local processors are needed. Our model contains 16 farms of 8 processors. At present we do not model the execution of the LVL2 Global and LVL3 algorithms, because we simulate only one subdetector.
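The processor count quoted above follows from the RoI rate, the algorithm duration and the target occupancy. A minimal sketch of this calculation is given below; the function name and its parameters are hypothetical.

```python
import math


def local_processors_needed(lvl1_rate_hz: int, rois_per_event: int,
                            algo_time_us: int, occupancy: float) -> int:
    """Number of local processors for a given target occupancy."""
    # Average number of processors busy at any time (rate x service time).
    busy = lvl1_rate_hz * rois_per_event * algo_time_us / 1e6
    # Dividing by the occupancy adds the idle headroom; rounding guards
    # against floating point noise before taking the ceiling.
    return math.ceil(round(busy / occupancy, 9))


n = local_processors_needed(100_000, 5, 128, 0.5)
print(n, "local processors, e.g.", n // 8, "farms of 8")
```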
B.3. L2 and L3 supervisors
The main tasks of the supervisors were described in section V.B. Our estimates show that ~3 ATM cells are needed to deliver the list of RoI pointers and allocated processors to a global processor. Assuming a 100 kHz level 1 trigger rate, the corresponding average bandwidth is 125 Mbit/s. In our model 622 Mbit/s links were used to connect the L2 and L3 supervisors to the network.
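The supervisor bandwidth can be checked with a short calculation. The sketch below assumes that full 53-byte cells (header included) are counted, which is our reading of the estimate; the function name is hypothetical.

```python
def supervisor_bandwidth_mbit_s(cells_per_event: int, event_rate_hz: int,
                                cell_bytes: int = 53) -> float:
    """Average link bandwidth used by notification messages, in Mbit/s."""
    return cells_per_event * cell_bytes * 8 * event_rate_hz / 1e6


bw = supervisor_bandwidth_mbit_s(3, 100_000)
print(f"{bw:.1f} Mbit/s, i.e. {100 * bw / 622:.0f}% of a 622 Mbit/s link")
```

Under this assumption the result is about 127 Mbit/s, consistent with the rounded value of 125 Mbit/s and well within the capacity of a single 622 Mbit/s link.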
B.4. Switching fabric
The switching fabric is a regular interconnection of switching elements. The buffer sizes in the switching elements and the bit-rates of the fabric's external and internal links are programmable.
Semi-permanent virtual connections are used to provide the required connectivity in the system. The connections are not established dynamically. This avoids the complexity of signalling and admission control. At present switching fabrics supporting up to 4,000 VCs per link are available [16].
C. Simulation Results
Two completely independent simulation programs have been developed in concurrent object oriented languages, Modsim [17] and pC++ [18]. The same set of input parameters was used for both programs to cross-check results. The results obtained from the two different codes agree within ~1%.
Our queueing models do not take into account various overheads (e.g. processor I/O, software, etc.). We plan to refine our models with the measurements performed on the hardware demonstrator systems (see section VII). However, we believe that our models are adequate to evaluate the performance of a single network when it is used to carry both data and protocol traffic for the LVL2 and the LVL3 systems.
They allow us to study interference between the two types of traffic in the system, and to evaluate methods to minimize it. The ability of ATM networks to carry various types of traffic, specific to our application, and the influence of fabric architecture have been investigated.
Performance evaluation of the LVL2 and LVL3 triggering systems requires passing a large number of events through the simulation program to accumulate enough statistics (one LVL3 accepted event corresponds to ~1000 initial LVL1 events). At present, the number of LVL1 accepted events available from the Monte Carlo studies is ~1000. We developed an event generator to rapidly produce large sets of events which possess characteristics similar to those of the physics events (number of RoIs, their distributions, etc.). In our simulations a set of 50,000 such events passed through the system. This corresponds to ~0.5 seconds of LHC operation. In what follows we present our simulation results. Unless otherwise specified, all simulation results correspond to an average of 100 kHz LVL1 and 1 kHz LVL2 Poisson distributed trigger rates and LVL3 full event building.
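The principle of such an event generator can be sketched as follows. This is not the generator used for the results presented here: the RoI multiplicity weights are placeholder values (not the distribution of Fig. 4.a), only the arrival time and the number of RoIs are generated, and all names are hypothetical. Poisson distributed triggers are obtained by drawing exponential inter-arrival times.

```python
import random

# Placeholder weights for the number of RoIs per event (1 to 12).
PLACEHOLDER_ROI_WEIGHTS = {1: 2, 2: 5, 3: 10, 4: 16, 5: 20, 6: 17,
                           7: 12, 8: 8, 9: 5, 10: 3, 11: 1, 12: 1}


def generate_events(n_events, lvl1_rate_hz, roi_weights, seed=0):
    """Return a list of (event_id, arrival_time_s, n_rois) tuples."""
    rng = random.Random(seed)
    counts = list(roi_weights)
    weights = [roi_weights[c] for c in counts]
    t = 0.0
    events = []
    for event_id in range(n_events):
        # Exponential inter-arrival times give a Poisson trigger process.
        t += rng.expovariate(lvl1_rate_hz)
        n_rois = rng.choices(counts, weights=weights)[0]
        events.append((event_id, t, n_rois))
    return events


events = generate_events(50_000, 100_000, PLACEHOLDER_ROI_WEIGHTS)
print(f"{len(events)} events spanning {events[-1][1]:.2f} s")
```

At an average rate of 100 kHz, 50,000 events span roughly half a second, as stated above.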
C.1. Network bandwidth utilization
During the simulation the load on each link of the network is monitored. Figure 7 shows the bandwidth utilization for each source. As can be seen in Fig. 7.a, the LVL2 data traffic (RoI) from the 26 sources to the 16 local processor farms requires ~35% of the sources' 622 Mbit/s output link bandwidth. This traffic creates ~50% load on the destinations' input links. [...] significantly bigger than that of the HAC and IFC. The traffic which delivers features from the local processors to the global processor uses ~19 Mbit/s of the bandwidth of the global processors' input link and increases their load up to 50%. The request messages for the RoI and LVL3 data use less than 5% of the available bandwidth of the source input links (Fig. 7.b). On average 25% utilization of the available switching fabric aggregate bandwidth is observed.
C.2. The rate division technique for level 3 traffic
In our simulations, the event data fragments for LVL3 are ~10 times bigger than RoI data fragments. Therefore, if each source segments LVL2 and LVL3 data packets sequentially, the RoI data can be blocked for a long time while waiting for a LVL3 packet transmission to terminate. When segmentation of the LVL2 and LVL3 packets was not interleaved, we observed an unacceptably high latency for the LVL2 traffic. However, the ATM technology allows concurrent segmentation of packets belonging to different virtual connections. Therefore, in our model, cells carrying the LVL2 data can be interleaved in the cell stream of a LVL3 packet.
Furthermore, the concentration of many long LVL3 cell streams towards the same outlet creates severe contention inside the switching fabric. This introduces long latencies for LVL3 traffic. If there is no support for different levels of routing priority inside the switching fabric, control traffic and LVL2 traffic will also be affected. In order to prevent the sources from saturating the switching fabric with LVL3 data, a rate division technique is used for the LVL3 traffic. This technique will be described for a system with S sources and D destinations. We consider only LVL3 event building traffic. As was presented in section VI.B.1, each source maintains a semi-permanent virtual connection for each L3 destination. A programmable fraction of the available link bandwidth is allocated to each virtual connection. In our application LVL3 events are evenly distributed among destinations. Therefore all the D virtual channels within a source should be granted an identical bandwidth. The sum of the bandwidths of all VCs in a source cannot exceed the link bandwidth at the input to the switch. Hence, the average fraction of the available bandwidth used by any VC will not exceed 1/D. Our model includes 26 sources and 14 L3 destinations. Therefore 1/14 of the 622 Mbit/s link rate can be allocated to each VC (i.e. ~44 Mbit/s) within a source.
As many sources concurrently send event data to the same destination, on average the sum of their traffic contributions cannot exceed the available output link bandwidth. The simplest scheme is to allocate the same fraction of bandwidth to each virtual connection in the system, provided that it does not exceed 1/D. For a system with S sources and D destinations, this fraction would be 1/S. This guarantees that the output links will not be saturated. Therefore, for our model with 26 sources and 14 L3 destinations, only 1/26 of the 622 Mbit/s link rate (i.e. ~24 Mbit/s) can be granted to each VC within a source. In this case 14 x 24 = 336 Mbit/s will be allocated in each source.
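The two constraints of the equal allocation scheme (no source input link and no destination output link may be over-subscribed) can be written down in a few lines. This is a sketch of the arithmetic only, with hypothetical names.

```python
def equal_vc_fraction(n_sources: int, n_destinations: int) -> float:
    """Largest equal link fraction per VC that saturates neither side."""
    # A source feeds D VCs from one input link: fraction <= 1/D.
    # A destination receives S VCs on one output link: fraction <= 1/S.
    return min(1 / n_sources, 1 / n_destinations)


LINK_MBIT_S = 622
S, D = 26, 14
fraction = equal_vc_fraction(S, D)
per_vc = fraction * LINK_MBIT_S    # bandwidth granted to each VC
per_source = D * per_vc            # LVL3 bandwidth allocated in each source
per_destination = S * per_vc       # worst-case load on an output link
print(f"{per_vc:.1f} Mbit/s per VC, {per_source:.0f} Mbit/s per source, "
      f"{per_destination:.0f} Mbit/s per destination")
```

With the unrounded per-VC value the allocation per source is about 335 Mbit/s; the figure of 336 Mbit/s quoted above results from rounding the per-VC bandwidth to 24 Mbit/s first.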
This equal bandwidth allocation scheme is adequate if all sources have approximately the same amount of data to send. However, it can be seen from Table 3 that our system is very unbalanced. The EM sources have to send event data fragments ~10 times larger than the others. For a 1 kHz LVL2 trigger rate, each EM barrel source needs at least 250 Mbit/s to send its data. The required bandwidth for each VC in the different sources is also given in [...]
We compared the system performance with and without the rate division technique. In both cases, a 2-stage Omega network composed of 8x8 switching elements has been used to model the 64-port ATM network. The switching elements operate with output queues configured as a dynamically shared memory [13]. The buffer occupancy of the switching elements reflects the contention within the fabric. Figure 8.a shows the tail distribution of the occupancy of the shared buffer memory in the switching elements (the tail distribution indicates the probability of buffer overflow as a function of the switching element buffer size).
Switching element buffer occupancy
It can be seen that the rate division technique significantly reduces internal buffer occupancy and contention in the fabric.
In switching fabrics in which there is no hardware link-level flow control mechanism the internal buffers can overflow; in this case cells will be lost. At present, switching elements with a shared memory large enough to buffer 256 cells are commonly available. For that case, our simulation predicts 1@ cell loss probability if the rate division technique is used.
The time necessary to gather RoI data, distributed among several sources, into a local processor is referred to as RoI building latency. The probability that this latency exceeds a given value (the tail distribution) is plotted in Fig. 8.b for the two different cases considered. The average RoI building latency amounts to 58 µs and 183 µs with and without rate division respectively.
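For reference, a tail distribution of the kind used in these plots can be computed from a set of simulated samples as the fraction of samples exceeding each threshold. The sketch below uses made-up sample values for illustration only; the function name is hypothetical.

```python
def tail_distribution(samples, thresholds):
    """Empirical P(X > x) for each threshold x."""
    n = len(samples)
    return [sum(1 for s in samples if s > x) / n for x in thresholds]


# Illustrative latencies in microseconds (not simulation results).
latencies_us = [40, 55, 58, 60, 62, 75, 90, 120]
thresholds_us = [50, 100, 150]
for x, p in zip(thresholds_us, tail_distribution(latencies_us, thresholds_us)):
    print(f"P(latency > {x} us) = {p:.3f}")
```

Applied to buffer occupancy samples, the same function gives the probability that a buffer of a given size would overflow.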
The time required to gather all LVL3 event data fragments into a L3 processor is referred to as event building latency. Longer event building latencies (an average of 18 ms compared to 13 ms) have been observed when no rate division was applied. In this case, in order to reduce contention in the fabric, the L3 processor requested LVL3 data fragments one by one from each source sequentially. If no rate division is applied and if all sources send their data to the L3 processor simultaneously, the system is immediately saturated.
C.3. Influence of the LVL3 traffic on the LVL2 traffic
As was mentioned in the previous section, a degradation of the LVL2 performance is observed when no rate division is applied for the LVL3 traffic. We investigated the influence of the LVL3 traffic on the LVL2 traffic when this technique was used. The switching fabric was a 6-stage Banyan network of 2x2 switching elements with a hardware link-level flow control mechanism (AT&T like [14]).
The RoI building latency tail distribution is plotted in Fig. 9 for three different cases. In the first case in Fig. 9 the LVL3 traffic is not present in the system and the average RoI building latency amounts to 58 µs.
When the rate division technique is applied to the LVL3 traffic, a 25% increase of the average latency has been observed. However, if the LVL2 data is serviced at a higher priority in the sources, the LVL3 traffic has no significant influence on LVL2 traffic.
C.4. Influence of the architecture of the switching fabric
We conducted simulations with various types of switching fabrics. Figure 10.a shows the RoI and LVL3 event building latencies for two different switching fabric types. Curve 1 corresponds to the 6-stage Banyan network of 2x2 switching elements. Curve 2 relates to the 2-stage Omega network composed of 8x8 switching elements. As can be seen, the shapes of the RoI building latency distributions are identical. Figure 10.b shows that the event building latency for the two types of switching fabrics is identical. It amounts to 13 ms on average. Our simulations show that, when the rate division technique is applied, the internal architecture of the switching fabric has a minor influence on the performance of the system.
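Both fabrics compared here provide 64 ports. For a multistage network built from k x k switching elements with one element per group of k ports in each stage (the usual construction for Banyan and Omega networks, assumed here), the dimensions follow from the number of stages. The function name below is hypothetical.

```python
def fabric_dimensions(element_size: int, n_stages: int):
    """Return (ports, switching elements per stage, total switching elements)."""
    ports = element_size ** n_stages
    per_stage = ports // element_size
    return ports, per_stage, per_stage * n_stages


for name, k, stages in [("Banyan, 2x2 elements", 2, 6), ("Omega, 8x8 elements", 8, 2)]:
    ports, per_stage, total = fabric_dimensions(k, stages)
    print(f"{name}: {ports} ports, {per_stage} elements per stage, {total} in total")
```

The two fabrics thus differ widely in the number of stages a cell must cross and in the number of switching elements, while offering the same external connectivity.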
C.5. Push vs. Pull
We have simulated the previously described data flow control strategies, namely "Push" and "Pull". In the push approach, the sources send RoI and LVL3 data to the allocated processors as soon as they become available. In our model, the time required to distribute the necessary information (RoI pointers, allocated processors) to the sources was not taken into account.
In the pull approach, the local and L3 processors request the necessary data from the sources when needed. The transfer delay for the protocol traffic through the fabric has been modelled. Our simulations indicate that the average latency introduced by the network for the request messages amounts to ~8 µs.
The average RoI building latency for the pull approach is ~10 µs more than that for the push data flow strategy. The difference is due to the latency of the request traffic. The average event building latencies for both cases are identical, since the influence of the protocol traffic delay is negligible.
C.6. Influence of trigger rates.
We have evaluated the system performance for various LVL1 and LVL2 trigger rates. The simulation results are shown in Fig. 11. In one case, the LVL1 trigger rate was varied while the average LVL2 rate was kept constant at 1 kHz. As can be seen from Fig. 11.a, the system behavior is satisfactory up to the targeted 100 kHz LVL1 trigger rate. In the second case, the LVL2 trigger rate was varied. Figure 11.b shows that, even for the LVL3 full event building, the system is not saturated for LVL2 trigger rates up to 1.1 kHz (the targeted rate is 1 kHz).
A second demonstrator based on the Phoenix AT&T switch [14] is foreseen. At present, we have installed two SBus/ATM and one VME/ATM interfaces [2]. We evaluate the performance of the interfaces at various protocol levels (e.g. LAN emulation, AAL5 layer) and under real-time requirements. An 8-port AT&T switching fabric will be delivered soon.
VIII. SUMMARY
The ATLAS Collaboration proposes to build a general-purpose proton-proton detector which is designed to exploit the full discovery potential of the Large Hadron Collider (LHC) at CERN (Geneva). Asynchronous Transfer Mode (ATM) packet switching network technology has been proposed as the interconnect for building high-performance data acquisition architectures for future physics experiments.
In this document we have proposed an integrated architecture for the ATLAS LVL2 / LVL3 selection and data read-out system. It is based on the "Pull" principle and a single network which carries both data and protocol traffic. We have performed simulation studies for the ATLAS calorimeter subsystem to validate the proposed concepts and investigate the feasibility of using ATM as the network technology. Satisfactory system behavior has been observed at the targeted ATLAS level 1 and level 2 trigger rates. The bandwidth allocation technique provided by ATM technology makes an ATM network adequate to handle our specific types of traffic efficiently. We plan to extend our simulation efforts to cover other sub-systems of the ATLAS detector. The demonstrator systems, currently under evaluation, will allow us to refine our models and evaluate performance issues on real hardware.
IX. ACKNOWLEDGEMENTS
