The research project RD24 [11] is studying applications of the Scalable Coherent Interface (SCI, IEEE-1596) standard for the Large Hadron Collider (LHC). First SCI node chips from Dolphin were used to demonstrate the use and functioning of SCI's packet protocols and to measure data rates. We present results from a first, two-node SCI ringlet at CERN, based on an R3000 RISC processor node and a DMA node on an MC68040 processor bus. A diagnostic link analyzer monitors the SCI packet protocols up to full link bandwidth. In its second phase, RD24 will build a first implementation of a multi-ringlet SCI data merger.
I. BASICS of SCI
SCI [1] provides bus-like features between SCI nodes in a ringlet. Point-to-point links interconnect the inputs and outputs of SCI nodes (fig 1). The nodes either forward incoming packets to the output link or direct them into an input FIFO. Packets generated by user logic on the Cbus [2] side are queued in an output FIFO until the bypass FIFO is empty. In this way, several nodes in a ringlet may be receiving and transmitting simultaneously at the intrinsic node-chip speed, achieving a ringlet bandwidth which is significantly higher than the node-chip bandwidth.
SCI links transport packets as shown schematically in fig 2. A flag signal delimits packets, which are composed of data or control symbols clocked at every transition of the SCI clock. The 16 bit wide link of a GaAs NodeChip™ [3] from Dolphin transmits one 16 bit symbol every 2 ns, resulting in a raw link bandwidth of 1 Gbyte/s. SCI packets are framed by a header, containing address and command fields, and a CRC trailer.
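The raw link bandwidth quoted above follows from the symbol width and the symbol period. The following Python sketch reproduces that arithmetic; the function name and parameters are hypothetical and serve only as an illustration.

```python
def raw_link_bandwidth(symbol_bits: int, symbol_period_ns: float) -> float:
    """Return the raw link bandwidth in bytes per second.

    symbol_bits      -- width of one link symbol (16 for the link described here)
    symbol_period_ns -- time between two symbols in nanoseconds
    """
    if symbol_bits <= 0 or symbol_period_ns <= 0:
        raise ValueError("symbol width and period must be positive")
    bytes_per_symbol = symbol_bits / 8
    symbols_per_second = 1e9 / symbol_period_ns
    return bytes_per_symbol * symbols_per_second


# One 16 bit symbol every 2 ns, as stated in the text.
nominal = raw_link_bandwidth(symbol_bits=16, symbol_period_ns=2.0)
print(f"{nominal / 1e9:.1f} Gbyte/s")  # prints "1.0 Gbyte/s"
```

This is the raw figure only; packet headers, CRC symbols and idle symbols reduce the bandwidth available for user data.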
Transactions consist of two subactions: during the request subaction a packet containing address, command and optionally data is sent to a responder node. After its intrinsic latency, the responder starts the response subaction, which in the case of a read transaction returns data via a response packet. Typical transactions implemented in the first node chips are: read/write cached or noncached 64 byte, read/write 1 to 16 byte noncached, and move 64 byte transactions.
SCI uses 64 bit addresses. The upper 16 bits specify the node-identifier within a 64K node address space. The remaining 48 bits are the internal byte address in that node. SCI complies fully with the IEEE 1212 CSR standard [4] .
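The address split described above can be illustrated with a few lines of Python. This is a minimal sketch: the helper names `pack_sci_address` and `unpack_sci_address` are hypothetical and are not part of any SCI software interface.

```python
from typing import Tuple

NODE_ID_BITS = 16                      # upper bits: node identifier
OFFSET_BITS = 48                       # lower bits: byte address inside the node
OFFSET_MASK = (1 << OFFSET_BITS) - 1


def pack_sci_address(node_id: int, offset: int) -> int:
    """Combine a 16 bit node identifier and a 48 bit byte offset."""
    if not 0 <= node_id < (1 << NODE_ID_BITS):
        raise ValueError("node_id does not fit into 16 bits")
    if not 0 <= offset <= OFFSET_MASK:
        raise ValueError("offset does not fit into 48 bits")
    return (node_id << OFFSET_BITS) | offset


def unpack_sci_address(address: int) -> Tuple[int, int]:
    """Split a 64 bit SCI address into (node_id, offset)."""
    if not 0 <= address < (1 << 64):
        raise ValueError("address does not fit into 64 bits")
    return address >> OFFSET_BITS, address & OFFSET_MASK


address = pack_sci_address(node_id=0x0012, offset=0x1000)
print(hex(address))                    # prints "0x12000000001000"
print(unpack_sci_address(address))     # prints "(18, 4096)"
```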
II. A PREVIEW of SCI for DATA ACQUISITION
Data acquisition systems in High Energy Physics experiments are faced with increasing demands in size and data rates. New approaches are being investigated [5] within a series of research projects. The project RD24 has demonstrated that SCI can be used to build systems which scale in size and performance beyond the limits of conventional bus systems. The first experience gained with the construction of a multi-node SCI ringlet system, including the design of prototype SCI processor and memory nodes, allows us to preview the possibility of building large and uniform SCI systems with the following advantages:
• Data rates beyond 100 Mbyte/s per individual channel
• Link speeds at 1 Gbyte/s
• Simultaneous (split) transactions between nodes
• Shared memory or data-driven systems
• Short and long distances
• VLSI chips with both requester and responder protocols
• Optional use of caches and cache-coherency to reduce latencies and avoid event copies
As a first SCI implementor, RD24 investigates, in several project phases, the applicability of SCI to key areas of Data Acquisition Systems. The areas of interest for application of SCI in a typical Data Acquisition system are shown in fig 3:
1) Access to numerous data buffers after first reduction and compression processors, typically dual ported memories
2) High rate data collection over distance using optical fibers
3) Crossover network to coherently build events from randomly distributed event fragments
4) Interfaces to high-performance processor farms
5) Transparent interfaces to workstations
6) Data logging and transmission to storage media
A. Multiple Ringlet DAQ System
Several physical SCI implementations will shortly become available, allowing the design of multi-ringlet SCI systems for data collection (merging) of high rate data over distance (fig 4).
High rate HEP experiments can use SCI ringlets to merge SCI data streams over optical SCI fibers from distributed detector sources. CPU farms, investigating event by event in real time, can be interfaced via SCI-processor interfaces. Data from the detector's digitizing stages can be connected via CMOS SCI ringlets. These low cost nodechips [2], initially capable of 100 MByte/s, roughly match the 1.4 Gigabit/s speed of already existing GiGa chips [9] for serial encoding of 16/17 bit data over coaxial cables or optical fibers. In order to build a first, multi-ringlet SCI data merger, RD24 has designed prototype SCI nodes with the following characteristics:
• SCI to memory node
• DMA node for dual ported memories
• I/O node via a firmware driven RISC processor
For a complete implementation in experiments at CERN, work in RD24 is continuing with the following design projects:
• optical SCI-Fiber ringlet
• SCI-Fiber to copper bridges
• transparent processor interface
B. Bandwidth exploitation of SCI ringlets
The bandwidth of an SCI ringlet has an upper limit given by a saturation at around 1.5 times the link bandwidth of a nodechip.
In practice, the ratio of packet overhead to packet data reduces the available throughput, depending on the packet protocols in use (table 1). A high bandwidth SCI data merger makes best use of SCI by using the move protocols. For instance, the dmove64 protocol uses 64 byte packets which, unlike most other SCI transactions, have no response subaction. The implementation of a data driven DAQ system, based on dmove64 write transactions generated by SCI nodes at the data sources, is an obvious goal of RD24. We have so far successfully designed a DMA node using the dmove protocols and achieved data transfer rates in excess of 100 Mbyte/s.
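The effect of packet overhead on the usable bandwidth can be estimated with a short calculation. The Python sketch below is illustrative only: the function names are hypothetical, the 16 byte overhead is a placeholder and not a value taken from table 1, and echo and response packets are ignored.

```python
def payload_fraction(data_bytes: int, overhead_bytes: int) -> float:
    """Fraction of the bytes of one packet that carry user data."""
    if data_bytes < 0 or overhead_bytes < 0:
        raise ValueError("sizes must be non-negative")
    total = data_bytes + overhead_bytes
    if total == 0:
        raise ValueError("packet must contain at least one byte")
    return data_bytes / total


def usable_bandwidth(link_bytes_per_s: float, data_bytes: int,
                     overhead_bytes: int) -> float:
    """Link bandwidth scaled by the payload fraction of each packet."""
    return link_bytes_per_s * payload_fraction(data_bytes, overhead_bytes)


# ASSUMPTION: placeholder overhead per packet (header plus CRC), for
# illustration only. Replace it with the figures for the protocol in use.
ASSUMED_OVERHEAD_BYTES = 16
LINK_BYTES_PER_S = 1e9  # raw link bandwidth of 1 Gbyte/s

for data_bytes in (16, 64):
    rate = usable_bandwidth(LINK_BYTES_PER_S, data_bytes, ASSUMED_OVERHEAD_BYTES)
    print(f"{data_bytes:2d} data bytes per packet: {rate / 1e6:.0f} Mbyte/s usable")
```

Whatever the actual overhead, the trend is the same: for a fixed overhead per packet, larger data payloads use the link more efficiently, which is why 64 byte move packets suit a data merger.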
C. Interfacing
Both transparent and I/O optimized interfaces are required for the online data processing architectures of Data Acquisition Systems.
RD24 decided to keep its first interface design projects as flexible as possible and to take a firmware approach which can be easily extended or modified. In order to compensate for speed losses due to this software approach, we used a RISC processor as a programmable state machine between an SCI node chip and an application bus.
D. SCI links
The transfer over parallel SCI copper cables is limited in distance due to skew requirements in the sub-nanosecond range. The twin-coaxial cables used for the GaAs ringlet at CERN are manufactured with a PCB design correction of skew times and allow for a length of 10-12 meters, staying within the 250 ps skew requirements of SCI. Ringlets with distances beyond 12 m can, however, be implemented by serializing SCI, i.e. by transmission over either coaxial cable or optical fibers (fig 5).
The 17 bit mode of the 1.4 Gbaud HDMP [7] receiver/transmitter GiGa chips, connected to 17 link signals of a parallel SCI link of a node chip, is a first two-chip approach to implementing long distance SCI ringlets with a performance of around 100 Mbyte/s. The direct output of the GiGa chips can drive single-ended 50 Ohm lines. The use of high quality single mode lasers allows SCI ringlets to be extended up to 10 km. RD24 has started a design project for a long distance optical ringlet to investigate a shared memory application between two distant VMEbus crates.
III. FIRST SCI RINGLET DESIGN
The 1 GByte/s Dolphin GaAs NodeChip™ [2] transmits packets over 16 ECL signal pairs at 500 MHz, giving a raw link bandwidth of 1 Gbyte/s per node. RD24 has used early engineering samples of these chips for a first SCI ringlet design. We ran SCI at half the nominal link speed, so all our figures need to be scaled accordingly for a final version of the Dolphin GaAs chips.
The NodeChip™ is mounted on a VME-sized mezzanine board, including DC-DC converters and initialization hardware.
RD24 used commercial VME processor cards from CES [8, 9] to interface the mezzanine nodechip cards to an R3000 RISC processor board (RIO) and an MC68040 processor board (FIC).
E. Design environment
The CERN Cadence/Verilog CAE design environment on Sun stations was used for both the R3000 node interface and the DMA/68040 interface. We used a Verilog model for the Cbus [2] (the application-side protocol of Dolphin nodechips) and behavioral libraries for the 68040 bus. Consistent logical behavior of the interfaces was simulated prior to building hardware. For our prototypes, we used the ABEL HDL language and conventional programmable PAL logic. In a following stage we plan to move the designs into compact, field programmable logic array chips.
F. R3000 RIO I/O node
The design of the R3000 based SCI node was started before the details of the NodeChip™ were fully known. Its main design goal was therefore flexibility. We have opted for a firmware solution for the following reasons:
• The complexity of some SCI protocols requires state machines which are too complex to implement in hardware without use of ASIC technology.
• We wanted to be able to easily adapt the design to more advanced second-generation node chips.
This firmware approach gives us flexibility through software emulation, at the price of some performance loss. It also allows us to study all SCI protocols implemented in the first version of node chips.
An input-output processor, the RIO 8260 from CES [8], was chosen to implement the firmware and to carry the interface. This VME board is equipped with an R3051 embedded RISC processor (25 MHz), 4 MB of DRAM and both a VME master and slave port, used for downloading software during development. No operating system is running on this board.
The MIPS 3000 bus (32 bits @ 25 MHz) is connected to the synchronous Cbus of the NodeChip (64 bits @ 32 MHz) via a set of four FIFO chips (fig 6): request-out and response-in for the requester part, request-in and response-out for the responder part. This approach allows the node to act simultaneously as a requester and responder. The FIFOs are 256 x 32 bits, with synchronous ports running at 40 MHz on the Cbus side and 25 MHz on the processor side. A mailbox is used to control sending of FIFO packets to the Cbus.
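The role of the four FIFOs can be pictured with a small software model. The Python sketch below is purely illustrative: the class names are hypothetical, only the depth of 256 words of 32 bits comes from the text, and clock domains, the mailbox and the 64 bit Cbus side are not modelled.

```python
from collections import deque

FIFO_DEPTH_WORDS = 256  # 256 x 32 bit FIFOs, as stated in the text


class WordFifo:
    """Bounded first-in first-out queue of 32 bit words."""

    def __init__(self, depth: int = FIFO_DEPTH_WORDS) -> None:
        self.depth = depth
        self._words = deque()

    def push(self, word: int) -> None:
        if not 0 <= word <= 0xFFFFFFFF:
            raise ValueError("word does not fit into 32 bits")
        if len(self._words) >= self.depth:
            raise OverflowError("FIFO full")
        self._words.append(word)

    def pop(self) -> int:
        # Raises IndexError if the FIFO is empty.
        return self._words.popleft()

    def __len__(self) -> int:
        return len(self._words)


class FourFifoInterfaceModel:
    """Hypothetical model of the processor-to-Cbus FIFO arrangement."""

    def __init__(self) -> None:
        # Requester part: the node starts transactions.
        self.request_out = WordFifo()
        self.response_in = WordFifo()
        # Responder part: the node answers transactions from other nodes.
        self.request_in = WordFifo()
        self.response_out = WordFifo()


node = FourFifoInterfaceModel()
node.request_out.push(0xDEADBEEF)   # outgoing request word
node.response_out.push(0x00000001)  # outgoing response word, queued independently
print(len(node.request_out), len(node.response_out))  # prints "1 1"
```

Because the requester and responder paths use separate queues, an outstanding request never blocks the node from answering an incoming one, which is the property the text refers to.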
Firmware formats the data as it arrives on the MIPS R3000 bus into SCI packets and unpacks data packets from SCI before returning the data via the R3000 bus. Address mapping is also done in firmware. This results in a flexible design: data can be formatted into any of the SCI packet formats which the Cbus supports. The interface provides a bi-directional I/O connection between SCI, memory, VMEbus and other devices which can be interfaced to the MIPS R3000 bus. It is also used as a research tool to test the cache coherency option of SCI. The functionality of a coherent SCI cache and memory controller can be emulated in the firmware to test the "typical set" of SCI coherency protocols as implemented in the NodeChip.
G. DMA node
A dual port interface to the 68040 bus was designed to allow generation of data block movement in excess of 100 MByte/s under control of a 68040 processor.
The DMA uses a sequence of fast dmove64 packet protocols with wsb (write selected byte) synchronization protocols at the start and end of 64 byte data boundaries. The 16 Kbyte deep dual port memory can also be addressed via SCI as a simple memory node, mapped into the 68040 address space, and therefore represents a first implementation of an SCI memory node.
The performance achieved with this interface was measured as an average latency of 560 ns per dmove64 packet, corresponding to a data rate of 114 Mbyte/s.
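The quoted data rate follows directly from the packet payload and the measured latency. The Python sketch below reproduces the arithmetic; the function names are hypothetical, and 1 Mbyte is taken as 10^6 bytes.

```python
DMOVE64_PAYLOAD_BYTES = 64  # data bytes carried by one dmove64 packet


def data_rate_mbyte_per_s(payload_bytes: int, latency_ns: float) -> float:
    """Data rate in Mbyte/s for one packet of payload_bytes every latency_ns."""
    if latency_ns <= 0:
        raise ValueError("latency must be positive")
    # bytes per nanosecond equals Gbyte/s; multiply by 1000 for Mbyte/s
    return payload_bytes / latency_ns * 1e3


def latency_for_rate_ns(payload_bytes: int, rate_mbyte_per_s: float) -> float:
    """Largest per-packet latency in ns that still sustains the given rate."""
    if rate_mbyte_per_s <= 0:
        raise ValueError("rate must be positive")
    return payload_bytes / rate_mbyte_per_s * 1e3


measured = data_rate_mbyte_per_s(DMOVE64_PAYLOAD_BYTES, 560.0)
print(f"{measured:.0f} Mbyte/s")  # prints "114 Mbyte/s"

limit = latency_for_rate_ns(DMOVE64_PAYLOAD_BYTES, 100.0)
print(f"{limit:.0f} ns")          # prints "640 ns"
```

In other words, any average latency below 640 ns per dmove64 packet meets the design goal of more than 100 Mbyte/s, and the measured 560 ns leaves some margin.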
H. Test setup
The test ringlet at CERN consists of two SCI nodes, mounted on VMEbus modules and controlled from a SUN host computer through VMEbus. The DMA node, based on the 68040 FIC module, allows for fast block transfers. The node based on the R3000 RIO module is driven by firmware and implements all SCI transactions of the nodechip at a substantially lower speed. The SCI diagnostic tracer, inserted as a passive device, is used to display SCI packets on the screen of a Tektronix DAS 9200 logic state analyzer.
Firmware is cross-developed on the SUN host and downloaded to the nodes through the PT SBS-915 SBUS to VME interface.
I. Test and server software
During the development of the two node interfaces, several test packages have been written to validate hardware and software. A terminal-driven diagnostic package can be executed either on the R3000 or on the SUN host. More complicated client/server state machines have been written to debug the first SCI hardware. Millions of SCI transactions have been validated on each node. In a further step, the two nodes have been tested communicating with each other, again without problems in data or protocol consistency. Our next step will be the verification of a more flexible R3000 server, capable of transactions between SCI, VMEbus and a planned TURBOchannel interface, optionally via SCI coherent protocols.
J. Diagnostics
As part of its product suite, Dolphin is developing an SCI Tracer for debugging and diagnosis of SCI systems. The SCI Tracer is a special logic analyzer which can acquire and analyze SCI symbols from an active SCI link at the speed of 1 GByte/s. The hardware is implemented in a multi-board VME module. It consists of an SCI board, a map board with memory maps for trigger patterns to control the tracing process, and a storage board with 256 KByte of memory. Different types of triggers can be recognized and used as the basis for storing sequences of SCI packets. During the project, the first version of the SCI Tracer, the LinkProbe™, has been made available as a product [10]. This is a front-end to a logic analyzer (such as the Tektronix DAS 9000), and is upgradable to a complete SCI Tracer.
IV. CONCLUSION
First SCI protocol chips have been successfully used to design HEP specific nodes for an SCI ringlet at CERN. A CERN designed DMA node achieved more than 100 Mbyte/s, and this rate will scale up by a factor of two for the final NodeChip™. Our firmware approach to controlling an SCI RISC node in a flexible and re-programmable way was successful and allowed generation of a large set of SCI transactions for validation of SCI protocols. We used an SCI tracer module from Dolphin to monitor SCI protocols and packet content directly at the link.
V. ACKNOWLEDGMENT
This project has been funded by the DRDC Committee from CERN, the Norwegian Research Council, the Physics Department of the University of OSLO, Creative Electronic Systems (CES) in Geneva, the DEC Joint Project at CERN and APPLE Computer Inc. in Cupertino.
