Abstract-This paper describes past and present techniques and technologies for the transport of digital data in recent physics experiments. After an overview of the typical requirements for modern data acquisition systems in the field of large scale experimental physics, we detail the successes and failures observed over the last 20 years of evolution of high-speed point-to-point link technology, networking standards and products. Modern data transport technology is presented along with several applications to experiments under construction. Advanced techniques, emerging technologies and trends in the field of high-speed digital data transport are outlined in the perspective of future experiments.
I. INTRODUCTION
THE basic principles for the design of data acquisition systems (DAQ) in experimental physics have not changed very much over the last three decades. The art of the DAQ designer is to cascade in a clever way, behind a detector, electronic devices that fall in one of three categories: data processing devices, data retention devices and data transport devices. Data processing devices, such as discrete transistors, logic gates or computers, transform, and generally reduce, data to extract content information. Data retention devices, such as capacitors or digital memories, temporarily hold data that cannot be processed or transported as such, for example bursty data, or data awaiting the decision to be discarded or kept. Data transport devices, from simple wires to local area network links, are used to move/gather/route data to an environment appropriate for temporary storage or processing. This paper addresses the techniques and technologies used for the transport of digital data within large-scale physics experiments. What solutions were deployed in the past, and what performance was achieved? Which products and technologies were successful, and why did others fail? What are today's key standards, products and techniques? How are these going to evolve? Without being an exhaustive survey of the abundant literature on the subject, this paper offers partial answers drawn from personal observations, readings, discussions and the general experience of the author.
II. REQUIREMENTS FOR THE TRANSPORT OF DIGITAL DATA IN PHYSICS EXPERIMENTS
Some of the relevant items to classify digital data transport devices include:
• bandwidth: kbps, Mbps or Gbps-class;
• connectivity: point-to-point, multicast/broadcast, random traffic, fixed pattern;
• latency: deterministic or variable;
• medium: electrical, optical, radio;
• distance: short (< 10 m), medium, large (> 2 km).
Among the 216 possible combinations of the previous parameters, it is both a pleasant and difficult exercise to spot any sensible configuration that was never exploited by at least one physics experiment. To remain general, this paper focuses on some of the most common configurations.
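The figure of 216 follows from multiplying the number of options on each axis of the list above (3 x 4 x 2 x 3 x 3). The short sketch below enumerates the combinations; the value labels are shorthand chosen here for illustration, not terms defined by the paper.

```python
from itertools import product

# Classification axes as listed above; labels are illustrative shorthand.
AXES = {
    "bandwidth": ["kbps", "Mbps", "Gbps"],
    "connectivity": ["point-to-point", "multicast/broadcast",
                     "random traffic", "fixed pattern"],
    "latency": ["deterministic", "variable"],
    "medium": ["electrical", "optical", "radio"],
    "distance": ["short", "medium", "large"],
}


def configurations(axes=AXES):
    """Yield every combination as a dict mapping axis name to value."""
    names = list(axes)
    for values in product(*(axes[name] for name in names)):
        yield dict(zip(names, values))


all_configs = list(configurations())
print(len(all_configs))  # 3 * 4 * 2 * 3 * 3 combinations
```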
In modern large experiments, the transport of data off-detector typically uses tens to thousands of Gigabit-per-second-class point-to-point links running in parallel [1]. Radiation hardness and/or magnetic field resistance are common requirements. Optical fibers are often preferred over copper to shield sensitive front-end electronics from external electromagnetic pollution. Timing distribution systems are an example of latency-critical multicast/broadcast networks. These routinely fan out synchronous signals at rates of a few tens of MHz from a central point to hundreds of units located several tens of meters apart, with nanosecond-order system-wide skew and sub-nanosecond jitter [2]. Trigger systems and event builders require data transport networks with multi-gigabit per second aggregate bandwidth, and tens to hundreds of ports [3].
III. UNTIL THE MID-80S: THE BUS ERA
From the 50s to the mid-80s, the typical size of electronic systems for experimental physics grew from one, to a few, then tens of crates. Backplane bus standardization was a key activity, and interconnecting multiple crates with point-to-point links was the way to grow beyond the limitations imposed by a single enclosure. Several key standards were established by the nuclear science community: NIM (Nuclear Instrumentation Methods, standard DOE/ER-0457 established in 1964), CAMAC (Computer Automated Measurement And Control, European ESONE, US NIM standard EUR 4100/IEC 516 established in 1968) and FastBus (ANSI/IEEE 960-1986, IEC 935) [4], [5]. FastBus was successfully used in many major physics experiments in the 80s (over 20 publications related to Fastbus appear in [6]), but the complexity of the standard and the lack of acceptance by the industry mass market made users' interest shift towards the VMEbus (Versa Module Europe) [7] and VXI (VME eXtensions for Instrumentation). For these various bus standards, inter-crate links running at up to 20 MB/s over several meters were based on parallel flat ribbon cables using TTL or ECL signaling levels. To interconnect VME crates, the VICbus [8] was popular. Up to 33 MB/s transfer rates could be achieved over several meters using 64 twisted-pair cables.
0018-9499/$20.00 © 2006 IEEE
Commercial networking technologies included Fiber Distributed Data Interface (FDDI) [9] running at 100 Mbps over optical media, Ethernet based on a shared medium (10 Mbps) [10], and Token Ring (4/16 Mbps), a technology originating from IBM and standardized with minor variations by the IEEE 802.5 working group [11].
Optical links running at 100 Mbps offered the highest practical speed available at relatively low cost, thanks to industry support for the FDDI market. Dedicated chipsets such as Advanced Micro Devices TAXIchip™ [12] provided a ready-to-use solution for point-to-point links running at up to 140 Mbps over fiber-optic, coaxial cable or twisted pair media. Some applications of these devices are reported in [13], and notably in [14], where 128 optical links running in parallel were used for the readout of a time projection chamber.
IV. LATE 80S TO MID 90S: THE TRANSPUTER AND DSP ERA
Backplane buses played a central role in gathering detector data until bandwidth limitations became a critical bottleneck. To increase throughput, switch-based systems with multiple high speed links were introduced. Standard buses were nonetheless kept for mechanics, power, cooling, configuration, slow control, monitoring, and less demanding DAQ tasks.
A. Transputers
One of the most innovative products in the history of microprocessors is certainly the Transputer (trans-istor com-puter) introduced by INMOS in 1984 [15]. The basic idea is to integrate on the same silicon die a microprocessor core, a memory and communication links. By assembling many of these individually simple devices, parallel systems capable of performing complex tasks could be devised. A Transputer had 4 bi-directional links, running concurrently with the CPU. Operating speed was 5 Mbps for the first generation of Transputers, and reached 100 Mbps for the last model (T9000). Cross-point switches were also available to build complex network topologies. The C104 chip integrated 32 ports at 100 Mbps on a single die [16]. Transputers and their companion switch devices found numerous applications in instrumentation. A few examples are the event builder of the Zeus experiment [17], the DAQ system of the GA.SP experiment [18], the second level trigger of the L3 experiment [19], and on-line filtering in CPLEAR [20]. A 1024-node system based on the C104 packet switch was constructed [21].
Transputer links used a clever scheme called Data Strobe (DS) encoding: data is sent on the Data line unmodified; the Strobe line changes state only if a data bit has the same value as the previous one. The exclusive OR of the Data and Strobe lines provides the receiver with the reconstructed clock (dual-edge) for sampling the Data line. This scheme is simple, does not require any PLL, allows for a full bit period of skew tolerance, and lets the receiver follow the transmit rate automatically (auto-baud). Several successor standards exploit this data encoding scheme: IEEE Std. 1355 [22], Firewire [23], and SpaceWire [24].
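The DS rule described above can be illustrated with a small bit-level model. This is only a sketch: it treats each bit period as one list element, assumes both lines idle at 0 before the first bit, and ignores electrical skew; the function names are chosen here for illustration.

```python
def ds_encode(bits, data0=0, strobe0=0):
    """Return the (data, strobe) line levels for each bit period."""
    data, strobe = [], []
    prev_d, prev_s = data0, strobe0
    for b in bits:
        # Strobe toggles only when the Data line keeps its previous value.
        s = prev_s ^ 1 if b == prev_d else prev_s
        data.append(b)
        strobe.append(s)
        prev_d, prev_s = b, s
    return data, strobe


def ds_decode(data, strobe, data0=0, strobe0=0):
    """Sample Data on every edge of the clock recovered as Data XOR Strobe."""
    bits = []
    prev_clk = data0 ^ strobe0
    for d, s in zip(data, strobe):
        clk = d ^ s
        if clk != prev_clk:  # edge of the reconstructed dual-edge clock
            bits.append(d)
        prev_clk = clk
    return bits


bits = [0, 1, 0, 0, 1, 1, 0, 1, 1]
data, strobe = ds_encode(bits)
recovered = ds_decode(data, strobe)
clock = [d ^ s for d, s in zip(data, strobe)]
```

Because exactly one of the two lines changes in every bit period, the recovered clock toggles once per bit, which is why no PLL is needed at the receiver.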
The enthusiasm for Transputers lasted about 6 years. Excessive delays in the production of the T9000 (bugs in the silicon, slower clock rate than initially planned), and the introduction of fast Digital Signal Processors (DSPs) put an abrupt end to the concept.
B. DSPs for Parallel Systems
At least two very successful DSP devices for parallel systems are worth mentioning: Texas Instruments' TMS320C40 [25] introduced in 1991, and its aggressive competitor, Analog Devices' Sharc (ADSP-2106x), introduced in 1994 [26]. Both devices integrate on a single chip a 32-bit floating point processor core, a static RAM, and six high speed (30-40 MB/s) communication links. Commercial multi-DSP boards equipped with 4-10 DSPs became widely used. Some applications in physics are reported in [27] and notably in [28], where 140 boards, each carrying six Sharc DSPs, were used to buffer and switch detector data in the Hera-B experiment.
Each C'40 link was a half-duplex port (8 data + 4 control lines) and could run at up to 30 MB/s. Each Sharc link was programmable in either transmit or receive mode, used four lines for data, one line for clock (up to twice the CPU clock rate of 40 MHz), and one acknowledge line. Unipolar TTL level signaling was used. Transfer rates of up to 40 MB/s could be achieved over a few tens of centimeters using high quality ribbon cables.
The competitive advantage of using DSPs in instrumental physics lasted about half a decade. For several years, market demands drove the evolution of DSPs more towards lower power consumption and low cost than ultimate computing power and I/O bandwidth. Steadily increasing CPU power and the introduction of the PCI bus in 1992 brought decent I/O capabilities to PCs for a bargain price compared to multi-DSP boards. Today, DSPs running at GHz clock rates could return to the forefront of the scene.
V. MID/LATE 90S: THE ATM, SCI, FAST ETHERNET DEBATE
By the late 80s, there was a crucial need for a new generation of standards for high speed data transfers in many different sectors of the information technology industry. Not surprisingly, each community came up with at least two different and competing standards. Vast R&D programs to evaluate some of these technologies in view of the future Large Hadron Collider were initiated by CERN [29], [30]. What happened to these (too numerous) products and standards?
A. Beyond the Bus Concept
Looking for a successor to Fastbus and Futurebus, a community, partly close to high energy physics, developed the Scalable Coherent Interface (SCI) [31], while a competing group proposed Quick Ring. SCI is based on 1 GByte/s links (originally ECL signaling) interconnecting devices in a ring topology, with possible bridging between rings. It has some support for cache coherency and features high bandwidth and extremely low-latency transfers. This makes SCI ideal for building multi-processor systems. Quick Ring is a slower and simpler version (no cache coherency). Unfortunately, both SCI and Quick Ring lacked the industry support and mass-market applications needed to really take off. The main tribute to SCI is an extension of the initial standard that defines one of the two standards for Low Voltage Differential Signaling (LVDS) [32], [33]. In the opinion of the author, the LVDS specification is undoubtedly the most influential document of the decade in the field.
B. Linking Storage Devices
Initially meant for supercomputer to mass storage transfers, research on gigabit technology at Los Alamos National Laboratory led to the High Performance Parallel Interface (HIPPI) standard [34]. HIPPI initially offered 800 Mbit/s of bandwidth over a 32-bit interface. Also meant to connect computers to storage devices, Fibre Channel [35] was developed by an industrial consortium. The initial version ran at 266 Mbps and was followed by a 1 Gbps version. Although both standards still co-exist and are being developed, Fibre Channel is having more commercial success than HIPPI.
In DAQ systems for experimental physics, the chipsets and components developed for Fibre Channel/serial HIPPI found many applications. Examples of devices for high speed point-to-point links are Cypress Hotlink (266-400 Mbps) and the extremely popular Hewlett Packard's G-link (1 Gbps). To prevent the obsolescence of a device from rendering a board design unusable, and to benefit easily from new faster devices, the S-Link concept was proposed [36]. It defines a FIFO-like interface which is independent of the physical link layer. Board designers just need to place connectors on their motherboard following the specification. Then any S-Link compliant third-party mezzanine card that includes the physical-layer-of-the-year can be used. The concept found many applications (e.g., in [37]), and numerous products are still available.
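The idea of a FIFO-like interface that hides the physical layer can be sketched in software terms. The class and method names below are hypothetical and do not reproduce the signals or semantics of the actual S-Link specification; the loopback card is a toy stand-in for a real mezzanine.

```python
from abc import ABC, abstractmethod
from collections import deque


class LinkCard(ABC):
    """Hypothetical FIFO-like contract between a motherboard and a link card."""

    @abstractmethod
    def write_word(self, word: int) -> bool:
        """Queue one word for transmission; return False if the link is full."""

    @abstractmethod
    def read_word(self):
        """Return the next received word, or None if nothing is waiting."""


class LoopbackCard(LinkCard):
    """Toy physical layer: words written are simply read back in order."""

    def __init__(self, depth: int = 4):
        self._fifo = deque()
        self._depth = depth

    def write_word(self, word: int) -> bool:
        if len(self._fifo) >= self._depth:
            return False  # back-pressure, as a full FIFO would signal
        self._fifo.append(word)
        return True

    def read_word(self):
        return self._fifo.popleft() if self._fifo else None


def send_fragment(card: LinkCard, words) -> int:
    """Board-side code: depends only on the LinkCard contract."""
    sent = 0
    for word in words:
        if not card.write_word(word):
            break
        sent += 1
    return sent


card = LoopbackCard(depth=4)
sent = send_fragment(card, [0xCAFE, 0xBEEF, 0x1234, 0x5678, 0x9ABC])
received = [card.read_word() for _ in range(sent)]
```

Swapping `LoopbackCard` for another `LinkCard` subclass leaves `send_fragment` untouched, which mirrors the motivation given above: the board design survives a change of physical layer.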
C. Unifying Wide Area/Local Area Networking (WAN/LAN)
Promising the convergence of voice, data and video traffic over a common medium, a forum (with strong representatives from the telecom world) established Asynchronous Transfer Mode (ATM) technology [38]. In ATM, data are carried in short (53-byte) cells, time-multiplexed over a common medium. Quality of service mechanisms allow multimedia, voice and data traffic to be transferred across a unified infrastructure. Commercial switching products had up to 256 bi-directional 155 Mbps ports. Despite massive support from the telecom industry and some LAN equipment vendors, ATM failed to capture more than a small fraction of the LAN market. In physics applications, ATM products have been used for event builders in [39] and [40]. ATM was the technology of choice for backbone infrastructure (e.g., Internet Service Providers) until the late 90s. Although equipment is progressively being phased out, the ATM service market is still a very profitable business today ($4B revenue worldwide in 2003 compared to $2.5B for Ethernet), thanks to the success of ADSL (Asymmetric Digital Subscriber Line) [41] technology for high speed Internet surfing, low cost telephony and television channel distribution over plain old telephone lines.
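To give a feel for the cost of carrying data in 53-byte cells, the sketch below counts the cells needed for a message and the resulting efficiency. The split into a 5-byte header and 48-byte payload is the standard ATM cell layout, which the text does not spell out; adaptation-layer overhead is ignored, so the numbers are an upper bound on efficiency.

```python
import math

CELL_BYTES = 53
HEADER_BYTES = 5  # standard ATM cell header (assumption not stated in the text)
PAYLOAD_BYTES = CELL_BYTES - HEADER_BYTES


def cells_needed(message_bytes: int) -> int:
    """Number of cells required to carry a message of the given size."""
    if message_bytes <= 0:
        raise ValueError("message_bytes must be positive")
    return math.ceil(message_bytes / PAYLOAD_BYTES)


def wire_bytes(message_bytes: int) -> int:
    """Bytes actually sent, counting headers and padding of the last cell."""
    return cells_needed(message_bytes) * CELL_BYTES


def efficiency(message_bytes: int) -> float:
    """Fraction of transmitted bytes that are useful message data."""
    return message_bytes / wire_bytes(message_bytes)


print(cells_needed(1500), wire_bytes(1500), round(efficiency(1500), 3))
```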
D. Upgrading Proven LAN Technology
Based on the success of 10 Mbps Ethernet, the local area network community proposed a tenfold increase of performance with two competing standards: Fast Ethernet [42] and 100VG-AnyLan. The advantage of the 100VG-AnyLan proposal was to support both Ethernet and Token Ring frame types, but this argument did not appeal much to customers. Fast Ethernet supports half-duplex and full-duplex operations and is interoperable with the first generation of Ethernet (identical frame format, auto-negotiation of speed on each segment). Another major evolution was the introduction of switched Fast Ethernet. Instead of sharing bandwidth between devices attached to the same segment, star topologies based on multi-port switches could be built. A modest-size 16-port Fast Ethernet switched network provides a 320-fold increase of bandwidth compared to a single-segment half-duplex 10 Mbps Ethernet! Fast Ethernet captured the largest fraction of the LAN market very rapidly, reducing month after month the chances of success of ATM. Fast Ethernet technology is ubiquitous in today's networking and one would hardly find systems for experimental physics where no single bit of data is sooner or later carried over a Fast Ethernet connection.
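The 320-fold figure can be reproduced with simple arithmetic, assuming it counts both directions of every full-duplex port on a non-blocking switch. That interpretation is an assumption made here; the function name is illustrative.

```python
def switched_aggregate_mbps(ports: int, port_rate_mbps: int,
                            full_duplex: bool = True) -> int:
    """Aggregate capacity of a non-blocking switch, summed over all ports."""
    directions = 2 if full_duplex else 1
    return ports * port_rate_mbps * directions


shared_segment_mbps = 10  # single half-duplex 10 Mbps Ethernet segment
switched_mbps = switched_aggregate_mbps(ports=16, port_rate_mbps=100)
gain = switched_mbps // shared_segment_mbps
print(switched_mbps, gain)
```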
E. Proprietary Products
Adding to the profusion of new standards that appeared in the 90s, several companies put their own products in the arena: Mercury's Raceway (ANSI/VITA Standard 5-1994) used in [43], Myricom's Myrinet (ANSI/VITA Standard 26-1998) used in [44], Sky Computers' SkyChannel Packet Bus (ANSI/VITA Standard 10-1995), etc. Despite the merit of each product, deployment in physics experiments was marginal, and choosing exotic technologies often proved to be a good recipe for having regular system "upgrades"!
VI. LATE 90S TO PRESENT: THE (MULTI-)GIGABIT ERA

A. LVDS Technology and FPGAs
As mentioned earlier, the LVDS specification had a profound impact on a whole sector of the silicon industry. Families of devices to transport data at (multi-)Gbps rate over up to several meters of copper cable were introduced by National Semiconductor [45], and lower speed devices are available from Texas Instruments. These devices are cheap, flexible, and draw very low power. An example application is reported in [46], where a system based on 320 LVDS serializers and de-serializers handles 300 Gbit/s of data with an input-to-output delay of 200 ns. LVDS also supports multi-drop applications and high speed backplane bus applications (up to 5 Gbps per bus line pair).
The progress made on field programmable gate arrays (FPGAs) is also spectacular. Two major evolutions took place recently: the ability of FPGAs to support LVDS and other high speed signaling I/O standards, and the integration of dedicated blocks to complement programmable logic: RAMs, multi-gigabit class transceivers, and DSP slices/RISC processors; a concept referred to as "a system-on-a-chip". Multi-million gate FPGA devices capable of digesting several tens of Gbps of I/O bandwidth offer ultimate flexibility for a few hundred dollars. FPGAs are an indispensable ingredient in countless industrial applications, consumer products, and modern DAQ systems in experimental physics.
B. Gigabit Ethernet
Following its predecessors, the Gigabit Ethernet standard introduced in 1998 had a large and immediate commercial success. In less than a year, Gigabit Ethernet put a definitive end to the market of ATM at the core of enterprise networks. The reasons for the success of Gigabit Ethernet are manifold: the technology did not aspire to be universal; the standard converged very quickly; the technology is rather simple, bears a famous name, and is compatible (at the frame level at least) with previous generations of Ethernet. The original CSMA/CD scheme was reworked for Gigabit Ethernet, but the real intended use of the technology is full duplex point-to-point links interconnecting hosts via switches. Note that the "1 Gbps" bandwidth quoted for Gigabit Ethernet is the bit rate before 8B/10B line encoding while the bandwidth advertised for Fibre Channel products is the line rate after encoding.
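The distinction between the rate before and after 8B/10B encoding can be made concrete with a short calculation. The sketch takes the nominal "1 Gbps" figures quoted above at face value; actual product line rates may differ slightly, and the function names are illustrative.

```python
# 8B/10B encoding sends 10 line bits for every 8 data bits.
DATA_BITS = 8
LINE_BITS = 10


def line_rate_mbps(data_rate_mbps: float) -> float:
    """Line rate needed to carry a given data rate through 8B/10B."""
    return data_rate_mbps * LINE_BITS / DATA_BITS


def data_rate_mbps(line_rate: float) -> float:
    """Data rate left once 8B/10B overhead is removed from a line rate."""
    return line_rate * DATA_BITS / LINE_BITS


gbe_line = line_rate_mbps(1000)  # Gigabit Ethernet: 1 Gbps quoted before encoding
fc_data = data_rate_mbps(1000)   # nominal 1 Gbps quoted after encoding
print(gbe_line, fc_data)
```

Under these nominal figures, two links both advertised as "1 Gbps" differ by 25% in usable data rate depending on which side of the encoder the number refers to.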
The DAQ of the Compass experiment uses four 16-port Gigabit Ethernet switches [37]. Most of the large scale experiments under construction plan to use Gigabit Ethernet in their DAQ system; some R&D work is presented in [3]. High-end commercial switches now offer up to 400 Gigabit Ethernet ports and 700 Gbps of (theoretical) aggregate capacity.
C. Optical Technology
Much progress on optical interconnects was also made during the last decade. Public networks built on Synchronous Optical Network/Synchronous Digital Hierarchy (SONET/SDH) standards were developed [47]. The OC-3 rate (155.52 Mbps) popular in the mid-90s was quickly followed by OC-12 (622.08 Mbps), then OC-48 (2.48 Gbps), OC-192 (10 Gbps), and products are now available for OC-768 (40 Gbps).
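The OC-n rates listed above are integer multiples of a common base rate. The sketch below assumes the OC-1 base of 51.84 Mbps, which is not stated in the text but is consistent with 155.52 Mbps divided by 3; integer kbps are used to avoid rounding noise.

```python
OC1_KBPS = 51_840  # OC-1 base rate in kbps (assumption consistent with OC-3 / 3)


def oc_rate_kbps(n: int) -> int:
    """Line rate of an OC-n signal in kbps."""
    return n * OC1_KBPS


rates = {n: oc_rate_kbps(n) for n in (3, 12, 48, 192, 768)}
for n, kbps in rates.items():
    print(f"OC-{n}: {kbps / 1000:.2f} Mbps")
```

The exact values show that the "10 Gbps" and "40 Gbps" figures for OC-192 and OC-768 are rounded labels rather than exact line rates.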
Dense Wavelength Division Multiplexing (DWDM) is a technique to increase the throughput of optical links in a scalable way by combining many wavelengths (up to 160) onto a single fiber. Products are almost exclusively used for long haul transport. The technique is used to transport the data of the Antares underwater neutrino telescope located 40 km off-shore (6 wavelengths per fiber) [48], and an evaluation of an OC-48 DWDM transponder in view of future DAQ systems is given in [49]. For less demanding applications, full-duplex transceivers using a single fiber are available from several vendors in speeds ranging from 100 Mbps to 1.25 Gbps.
Ultimate speed is one of the motivations for using optical technology, but in many industrial, medical and physics applications, reducing EMI is the main argument. Paradoxically, designing for gigabit per second links has often become simpler than relying on lower speeds because most 100 Mbps-class chipsets are now obsolete and serializers embedded in modern FPGAs use PLLs that are not capable of running below 622 Mbps. To design with optics in the 0-200 Mbps range, recommended readings include [50], [51], and [52].
Parallel optics is also a technology that has evolved rapidly in recent years. The SNAP12 Multi Source Agreement (MSA) is a specification followed by many vendors for interoperable parallel optics based on 12-fiber composite cables (MTP®/MPO). Current products include 12-channel transmitters and receivers (up to 3.125 Gbps per channel) and transceivers with four duplex channels. Transceivers with up to 36 duplex 2.7 Gbps channels are also being introduced by Tyco-AMP. An evaluation of an early 12-fiber optical transceiver is reported in [53]. Parallel optics is attractive for applications that require high bandwidth and compactness at a reasonable cost. When a direct connection between two parallel optics devices is made, channel bonding can be used to create "fat pipes" running at 10-40 Gbps over distances of up to a few hundred meters. This is faster than state-of-the-art single channel devices and significantly cheaper than an equivalent solution based on DWDM. The edge of a standard 9U card can accommodate up to 30 high speed links (copper cable assembly or single channel optical transceiver), leading to a maximum aggregate bandwidth of 60-120 Gbps per card [54]. Parallel optics available today offers at least a tenfold increase of performance. This can open a new era of applications with terabit per second per module capacity. Each individual channel of a parallel optics device may be used independently: through the appropriate patch cord, an extremely compact parallel optics device mounted on a concentrator card can be linked to 12 electronic cards each equipped with a single channel optical transceiver. Multiple parallel optics devices may be linked through passive optical cross-connects to make a network with a pre-defined topology. An application of parallel optics that exploits simultaneously the concept of fiber aggregation and cross-connection is described in [55].
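Channel bonding, mentioned above as the way to build "fat pipes", can be pictured as striping a byte stream across several lanes and re-interleaving it at the far end. The sketch below is a simplified model: real links also need lane alignment and deskew, which are ignored here, and the function names are illustrative.

```python
def stripe(data: bytes, lanes: int) -> list:
    """Distribute bytes round-robin over the given number of lanes."""
    return [data[i::lanes] for i in range(lanes)]


def merge(lane_data: list) -> bytes:
    """Re-interleave per-lane byte streams into the original order."""
    lanes = len(lane_data)
    out = bytearray(sum(len(chunk) for chunk in lane_data))
    for i, chunk in enumerate(lane_data):
        out[i::lanes] = chunk
    return bytes(out)


def bonded_rate_gbps(lanes: int, per_lane_gbps: float) -> float:
    """Aggregate rate of a bonded link, ignoring protocol overhead."""
    return lanes * per_lane_gbps


message = bytes(range(50))
per_lane = stripe(message, lanes=12)
restored = merge(per_lane)
fat_pipe = bonded_rate_gbps(12, 3.125)  # 12 channels at 3.125 Gbps each
print(fat_pipe)
```

With the 12-channel, 3.125 Gbps devices quoted above, the aggregate comes to 37.5 Gbps, at the upper end of the 10-40 Gbps range given in the text.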
VII. EMERGING TECHNOLOGIES AND STANDARDS
Recent standards for high speed digital electronics deal with signals above 300 MHz and data rates in the 1-40 Gbps range. The general trend is to use high speed point-to-point serial links at various levels within systems, encapsulate information flows in packets, and bond multiple channels together at the logical level (rather than at the physical level) to increase throughput. Until recently, different techniques were used to interconnect components at the board level, interconnect boards within a system, and link systems together. As a consequence, the progress made in one area (e.g., backplane buses) had almost no impact in other areas (e.g., component interconnects or networking). The situation is changing now and the convergence of techniques for interconnects at different scales acts as a common steering force for part of the silicon industry.
A. Board-Level Interconnects
Legacy board designs often use a system synchronous clock to exchange data between components. Clock skew and other factors put a limit of 200 Mbps on the data transfer rate achievable per component pin. More recent devices, like SDRAM, use source synchronous interfaces to achieve up to 533 Mbps per pin using double edge clocking, low voltage unipolar signaling and parallel transfers with one strobe line per group of eight data lines. To increase throughput, leading-edge multi-channel ADCs use serial LVDS outputs to reach up to 800 Mbps per pin pair. Serial memories, like RamBus™ RDRAM, reach 1.2 Gbps data throughput per pin. As previously stated, a new aspect of modern electronic design is that connecting components at the board level is starting to use techniques that once applied only to inter-system links: differential signaling, serial transmission, line encoding, data packets, etc. HyperTransport™ is an I/O technology for board level architecture that exploits some of these concepts [56]. To ease the interface with fast components, modern FPGAs incorporate Delay-Locked Loops (DLLs) to deskew clocks, built-in support for multiple clock domains, individual programmable delay lines on each user I/O pin, and versatile serializers/de-serializers (e.g., Xilinx Virtex 4 family).
B. System-Level Interconnects
For over three decades, the multi-drop parallel bus paradigm has been the key concept for intra-system interconnects. Almost all possible ways to boost performance have been exploited: maximize the width of the data path by adding connector pins and/or time-multiplexing data over the address lines, increase clock speed, transfer data on both edges of the clock, and use reflective wave switching (PCI) instead of the traditional incident wave switching technique. Although the limits of the traditional parallel bus may soon be reached, the evolutions of the PCI bus are still promising: three different directions are being pursued [57]. "Conventional PCI" is the evolution of the original specifications. PCI-X is a backward compatible version, operating at 133 MHz, 266 MHz, or 533 MHz. A radical change is proposed with PCI-Express™, where the multi-drop parallel bus paradigm is abandoned in favor of multiple serial lanes (typically 2.5 Gbps per lane) transporting data packets. A competitive, and possibly complementary, standard is RapidIO® [58]. Quality of service and multi-cast are among the new features offered. Following CompactPCI and CompactPCI eXtensions for Instrumentation (PXI), a very promising new crate and backplane infrastructure is being defined: the Advanced Telecommunications and Computing Architecture (AdvancedTCA) [59]. As in many other modern standards, the parallel backplane bus is abandoned and replaced by multiple point-to-point links. Several backplane topologies are available: mesh, star, dual-star, etc. Each link may operate at up to 5 Gbps and the aggregate bandwidth cross-section of a 14-slot chassis can scale up to 2.5 Tbps.
C. Peripherals and Networking
For mass market products, Firewire will soon deliver 1.6 Gbps and possibly 3.2 Gbps. USB2 (currently at 480 Mbps) may also evolve, although wireless USB could become the dominant trend. For interfaces to mass storage devices, Serial ATA is evolving from today's 1.5 Gbps rate towards 3 Gbps, while 4-10 Gbps Fibre Channel products are expected this year. The standard for 10 Gbps Ethernet was ratified in 2002, and products are now reaching maturity. The InfiniBand™ architecture is heavily promoted [60]. Specifications for interoperable parallel copper cable and parallel optics modules are being set up by the IBPACK Multi Source Agreement group [61]. Many of the new optical networking standards are being set up by the Optical Internetworking Forum [62].
VIII. SUMMARY AND OUTLOOK
Over the last 25 years, the I/O bandwidth handled by electronic components and systems has increased by two to three orders of magnitude. Have the limits of parallel buses finally been reached? Will ATCA replace VME and CompactPCI? Will PCI-Express™ be more successful than SCI? Will RapidIO® links bring DSPs back to the forefront of I/O intensive real-time computing? Or will FPGAs keep their current leadership? How fast and far can copper really go? Will the parallel optics market take off? Success is unpredictable, but experimental physics will, for sure, continue to benefit from the latest technological advances in the field.
