The work reported involves the construction of a large modular testbed using IEEE 1355 DS link technology. A thousand nodes will be interconnected by a switching fabric based on the STC104 packet switch. The system has been designed and constructed in a modular way in order to allow a variety of different network topologies to be investigated. Network throughput and latency have been studied for different network topologies under various traffic conditions. We also present results of studies carried out with traffic patterns expected for the ATLAS second level trigger system.
Introduction
Data acquisition and trigger systems for LHC experiments will depend critically on the use of high speed point-to-point links and packet switches [1, 2]. A range of alternative technologies is presently under study, including Fibre Channel [3], ATM [4], SCI [5] and IEEE 1355 [6] (DS and HS links).
To date, practical experience with packet switching networks in High Energy Physics (HEP) has been confined to relatively small systems, and there are no experimental results on how the performance of such systems will scale up to several hundred or even several thousand nodes.
Theoretical studies [7, 8] have been carried out for large IEEE 1355 networks of up to one thousand nodes connected by different switching topologies. However, the traffic patterns used are not those relevant to HEP experiments, but rather those found in telecommunications systems or massively parallel computers.
We present results obtained on a large modular testbed using DS link and switch technology. One thousand nodes will be interconnected by a switching fabric based on the 32-way STC104 packet switch [9]. The system has been designed and constructed in a modular way to allow a variety of different network topologies to be investigated (Clos [10], hypercube, grid, torus, etc.).
The Macramé testbed has been constructed using three basic building blocks:
- Traffic nodes, which can send and receive data at the nominal link speed of 100 Mbits/s with a programmable traffic profile.
- Packet switch units based on the STC104.
- Timing nodes, which transmit and analyse time-stamped packets in order to perform latency measurements.
Figure 1 shows the testbed architecture; the topology shown is a two-dimensional grid (2D grid).
This work is carried out within the framework of the European Union's Esprit program as part of the Macramé project (Esprit project 8603).
Results

Network latency for Clos networks
Figure 3 shows the latency as a function of the aggregate network throughput for Clos networks of three different sizes. The traffic pattern is random, i.e. transmitting nodes send 64 byte packets to a destination chosen from a uniform distribution. The results are produced by varying the network load and measuring the corresponding throughput and latency values. It can be seen that the average latency increases exponentially as the network throughput approaches saturation. To achieve low average latencies, the applied network load must therefore be below the saturation throughput.
Figure 4 shows the probability that the latency of a packet will be greater than a given value for various network loads. The traffic pattern is random, with a packet length of 64 bytes. For 10% load the latency distribution is very narrow. Near the saturation throughput (about 60% load), a significant percentage of the packets experience a latency many times the average value of 18 µs. To reduce the probability of large network latencies, the network must be lightly loaded.
Comparison of network topologies
Figure 5 shows the per node throughput for 2D grid and Clos networks of different sizes under random traffic as a function of the packet length. The Clos networks show better performance because of the higher cross-sectional bandwidth of this topology.
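As an aside on the latency tail probabilities of the kind plotted in Figure 4, the following minimal sketch shows how such a curve can be computed from a list of per-packet latencies. It is an illustration only: the function names and the sample values are hypothetical and are not testbed measurements or the authors' analysis code.

```python
from bisect import bisect_right


def tail_probability(latencies_us, threshold_us):
    """Return the fraction of packets whose latency exceeds threshold_us."""
    if not latencies_us:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_us)
    # bisect_right returns the number of samples <= threshold_us.
    exceeding = len(ordered) - bisect_right(ordered, threshold_us)
    return exceeding / len(ordered)


def tail_curve(latencies_us, thresholds_us):
    """Evaluate the tail probability at each threshold (one curve per load)."""
    return [(t, tail_probability(latencies_us, t)) for t in thresholds_us]


# Hypothetical latency samples in microseconds, for illustration only.
samples = [12.0, 14.0, 15.0, 18.0, 18.0, 22.0, 40.0, 95.0]

for threshold, prob in tail_curve(samples, [10.0, 18.0, 50.0]):
    print(f"P(latency > {threshold} us) = {prob:.3f}")
```

Repeating the calculation for latency samples taken at each network load gives one curve per load. A narrow distribution falls to zero quickly, while a heavily loaded network leaves a long tail.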
The effect of packet length on throughput can also be seen. For small packets the throughput is reduced due to fixed packet overheads. Medium sized packets give the best performance because of the buffering present in the STC104: each switch can buffer 32 bytes in both the link input and output ports. Long packets fill the entire path from source to destination, and therefore throughput is reduced by head-of-line blocking.
Scalability of Clos and grid networks
Figure 5 also shows that the throughput of Clos and 2D grid networks does not scale linearly with network size under random traffic: the per node throughput is reduced as the network size increases. For random traffic, contention at the destinations and internally to the network reduces the network throughput compared to that obtained for systematic traffic, where there is no destination contention. The fall-off in performance from systematic to random traffic is more pronounced for the grid than for the Clos.
The overhead in dispatching packets in the traffic nodes is determined by hardware and is approximately 650 ns. This will not in general be the case when interfacing links to a microprocessor. To demonstrate the effect of the packet overhead, the dispatching delay has been artificially increased. Figure 7 shows the dependence of the total network packet rate on the individual packet overhead for a 128 node Clos under random traffic. The fall-off in performance is particularly marked for short packets; the packet rate drops by nearly an order of magnitude when the overhead is increased from 10 to 100 µs. This underlines the importance of an efficient processor-to-link interface.
Network performance under traffic patterns as expected in the ATLAS second level trigger system has also been investigated. Figure 8 shows the total network throughput versus the event rate for a 256 node Clos network. The traffic shows a fan-in pattern, i.e. several sources (level-2 buffers) are sending to the same destination (trigger processor). Several destinations are selected per event. The traffic parameters are based on the barrel of the Silicon Tracker (SCT) subdetector (see table below) and references [1, 13].
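The packet overhead dependence discussed in connection with Figure 7 can be sanity-checked with a simple per-node bound. The sketch below assumes that each packet costs one dispatch overhead plus its serialisation time at the nominal 100 Mbits/s link speed, and that the two do not overlap. It ignores link protocol overheads, packet headers, flow control and network contention, so it is an upper bound under stated assumptions rather than a model of the testbed. The function name is hypothetical.

```python
LINK_BITS_PER_SECOND = 100e6  # nominal DS link speed quoted in the text


def packet_rate_bound(packet_bytes, overhead_s, link_bps=LINK_BITS_PER_SECOND):
    """Upper bound on packets per second sent by one node.

    Assumes (for illustration) that dispatch overhead and serialisation
    are strictly sequential and that nothing else limits the rate.
    """
    if packet_bytes <= 0 or overhead_s < 0 or link_bps <= 0:
        raise ValueError("packet size and link speed must be positive, overhead non-negative")
    serialisation_s = packet_bytes * 8 / link_bps
    return 1.0 / (overhead_s + serialisation_s)


# Compare a 10 us and a 100 us dispatch overhead for 64 byte packets.
rate_10us = packet_rate_bound(64, 10e-6)
rate_100us = packet_rate_bound(64, 100e-6)
print(f"10 us overhead:  {rate_10us:.0f} packets/s per node")
print(f"100 us overhead: {rate_100us:.0f} packets/s per node")
print(f"ratio: {rate_10us / rate_100us:.2f}")
```

Under these assumptions the bound for 64 byte packets drops by roughly a factor of seven between the two overheads, and the ratio approaches ten as the packets get shorter. This matches the trend reported for short packets.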
