A parallel array of eight minicomputers has been assembled in an attempt to deal with kiloparameter data events. By exporting computer system functions to a separate processor, we have been able to achieve computer amplification linearly proportional to the number of executing processors.
Introduction
Empirically, it is true that much of today's nuclear science results in multi-parameter data recorded verbatim on line to be later replayed for complete and detailed analysis. Furthermore, whereas a decade ago such data would have typically been fewer than eight parameters per event, today's data tends to have more than eight parameters per event and in some cases is hundreds or even thousands of parameters per event.
The results of this trend for data analysis systems are twofold. First, collection of more parameters per event implies a search for very complex correlations. This in turn implies that, to be truly effective, data analysis will often require some interactive intervention on the part of the experimenter. Second, the physical volume of data with which such interaction is desirable becomes very large. These two results conflict. Effective interaction requires fast response times, but large raw data volumes imply longer data perusal times for the data analysis system. The response of many experimenters to this conflict is a continuous and expensive subsidy to computer manufacturers in the form of ongoing purchases of more and more machines used by fewer and fewer experimenters with poorer and poorer results. Data analysis requirements remain many paces ahead of the ultimate capacity of available sequential computing machines.
In an attempt to solve this dilemma, we have paralleled eight stripped-down minicomputers in a configuration which demonstrably gives the resulting system greater than eight times the processing speed of a single minicomputer. Furthermore, the array is highly modular, allowing repair, replacement or upgrading of major components without major downtime periods or major system overhauls. We plan to assemble the eight-processor array into a larger array which shares a common interactive processor, one programmed to interface with several users simultaneously. The system is expected to defy obsolescence by virtue of its ability to have its relatively inexpensive high-speed processing modules upgraded as new ones appear, and by virtue of its use of a separate processor dedicated to user interaction. The user-interaction processor, whose main virtue is its program rather than its hardware, is not expected to become rapidly obsolete. This paper is a description of our initial eight-processor array and some results based on the data analyses presently being run through the system.
Hardware
As Fig. 1 illustrates, the heart of the configuration is the crossbar-like connection of sixteen memories to any system processor. System processors include pipelined processors for inputting and outputting data, the array of minicomputer central processors, and a special sorting module (not yet implemented). In addition, each memory is given its own zero processor, allowing it to be cleared rapidly.
The crossbar configuration allows data to be switched through the system in blocks. After being filled from the input pipeline, the memory (4096 words by 32 bits in the present system) is switched to become part of the address space of an available minicomputer central processor, after which it is switched to be unloaded by the output pipeline processor. Rather than being centralized, the gating which effects the crossbar is distributed among the sixteen memory modules, thus relieving interconnect congestion.
Control of the crossbar is centralized, however, as illustrated schematically in Fig. 2. A microprocessor is used to remember the desired sequence of processors. It also remembers when data must exit from the system in the same sequence in which it entered. Using this stored information, it empties and then loads the First-In First-Out (FIFO) memory devices used to control the crossbar connections during a processing pass.
Before the system is started, the sequence processor loads a list of four-bit numbers into the FIFO associated with the first process to be executed. For example, codes representing memories zero through fifteen would normally be loaded into the FIFO controlling the memory-zeroing operation. Since memory-zeroing is independent of all other operations, this process would be initiated for each memory in rapid sequence. As a memory finishes being zeroed, its unique tag is passed into the zero processor's exhaust FIFO, where its presence flags the sequence processor. The sequence processor, using its pre-loaded list of the processor sequence, removes the memory code from the zero-processor exhaust FIFO and puts it into the FIFO of the next prescribed processor, usually the input pipeline processor. Upon finishing with a memory, the input pipeline processor causes the memory code to be passed to its exhaust FIFO, once again flagging the sequence processor. In this way, the memory circulates through all the prescribed processors. If the sequence processor has been told that the sequence of data entering the system must be the same as that of data exiting the system, it remembers the sequence of memory tags going into the connect FIFO for the input pipeline processor, and uses this to guarantee the same sequence of memories passing into the connect FIFO for the output pipeline processor.
A 12-bit high-speed RAM is used to generate storage addresses for data. The twelve bits of RAM address may be driven either from an event counter, from a selected part of the data word, or from some combination of the two. The RAM is programmed to put each incoming event on one of a set of preselected fixed address boundaries, thus greatly enhancing the speed of data analysis when the memory arrives at its central processor unit.
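The tag-passing scheme described above can be illustrated with a small software model. The following is only a sketch under stated assumptions, not a description of the actual microprocessor program: the stage names, the `Sequencer` class, and the `held` set used to delay early-finishing memories are all hypothetical, and recirculation of tags from the output stage back to the zero processors is not modeled.

```python
from collections import deque

# Hypothetical stage names standing in for the prescribed processor sequence.
SEQUENCE = ["zero", "input", "cpu", "output"]


class Sequencer:
    """Toy model of the sequence processor and its connect/exhaust FIFOs."""

    def __init__(self, num_memories=16, keep_order=True):
        self.connect = {s: deque() for s in SEQUENCE}  # tags awaiting a processor
        self.exhaust = {s: deque() for s in SEQUENCE}  # tags a processor has released
        self.keep_order = keep_order
        self.input_order = deque()  # order in which tags entered the input stage
        self.held = set()           # tags done with the CPUs but not yet due for output
        # Pre-load the FIFO of the first process with every memory tag.
        self.connect[SEQUENCE[0]].extend(range(num_memories))

    def finish(self, stage, tag):
        """A processor is done with a memory: its tag moves to the exhaust FIFO."""
        self.connect[stage].remove(tag)
        self.exhaust[stage].append(tag)

    def service(self):
        """Move flagged tags from each exhaust FIFO to the next connect FIFO."""
        for stage, nxt in zip(SEQUENCE, SEQUENCE[1:]):
            while self.exhaust[stage]:
                tag = self.exhaust[stage].popleft()
                if nxt == "input":
                    self.input_order.append(tag)  # remember the entry sequence
                if nxt == "output" and self.keep_order:
                    self.held.add(tag)            # wait until this tag is due
                else:
                    self.connect[nxt].append(tag)
        # Release held tags strictly in the remembered entry sequence.
        while self.input_order and self.input_order[0] in self.held:
            tag = self.input_order.popleft()
            self.held.remove(tag)
            self.connect["output"].append(tag)


def demo(keep_order):
    """Four memories; the CPUs release them in the order 2, 0, 1, 3."""
    seq = Sequencer(num_memories=4, keep_order=keep_order)
    for stage in ("zero", "input"):
        for tag in range(4):
            seq.finish(stage, tag)
        seq.service()
    for tag in (2, 0, 1, 3):
        seq.finish("cpu", tag)
        seq.service()
    return list(seq.connect["output"])
```

In this model, `demo(True)` delivers the memories to the output stage in their entry order even though the CPUs finish out of order, while `demo(False)` passes them on in the order the CPUs release them.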
The output pipeline processor is not a pipeline in the prototype unit (see Fig. 5). It consists of a counter used to generate memory addresses and some logic used to detect codes added to the data by the central processor units. These codes select either a destination for the data or mark the end of the data buffer. Future plans call for a genuine pipeline as the output processor to enhance the flexibility of the system. Data from the system may be sent to a histogramming memory, to either of two bulk storage devices, or to any combination of these simultaneously.
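The unloading logic of the prototype output processor can be sketched as a simple scan over a memory block. The word encoding below is entirely an assumption made for illustration (the paper does not give the code format): a set top bit marks a control word, its low three bits select destinations, and a control word with no destination bits marks the end of the buffer.

```python
# Hypothetical encoding, not taken from the paper.
CODE_FLAG = 0x8000_0000                      # top bit set: word is a control code
HIST, BULK_A, BULK_B = 0b001, 0b010, 0b100   # destination select bits


def unload(memory):
    """Scan a memory block with an address counter and route the data words."""
    sinks = {HIST: [], BULK_A: [], BULK_B: []}
    mask = 0                                 # currently selected destinations
    for address in range(len(memory)):       # the address counter
        word = memory[address]
        if word & CODE_FLAG:
            mask = word & 0b111
            if mask == 0:                    # end-of-buffer code
                break
            continue
        for bit, sink in sinks.items():
            if mask & bit:                   # any combination may be selected
                sink.append(word)
    return sinks
```

For example, a block holding `[CODE_FLAG | HIST | BULK_A, 10, 11, CODE_FLAG | BULK_B, 12, CODE_FLAG, 99]` sends 10 and 11 to both the histogramming memory and the first bulk device, sends 12 to the second bulk device, and ignores everything after the end code.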
Programs which run in the parallel CPUs are compiled on a separate computer system, and only the required run-time code is downloaded into the parallel CPUs. Since this separate computer looks like an operator at the control panel of each of the parallel CPUs, it has absolute control. After downloading a program, the separate computer system puts the start address into the appropriate hardware register and turns on the RUN switch. Implementing the computer-CPU connection in this manner makes it unnecessary to have any resident code in any of the parallel CPUs.
This prevents "overhead functions" (those normally associated with operating systems and computer input/output) from robbing us of actual computing power. The result is a system whose actual computing power increases linearly with the number of parallel CPUs running.
Fig. 1. The processor array consists of an input pipeline, output pipeline, special sorting module (not yet built) and zero-processors used to clear each memory. A crossbar connects processors to memories. The crossbar control processor sequences the connections, guaranteeing no interference.
Fig. 2. The crossbar control processor is a microprocessor/discrete-logic hybrid. The microprocessor remembers desired and actual sequences and uses FIFOs to save its decisions until the hardware executes a switch. An actual switch requires less than 50 ns.
The input pipeline is a shift register containing simple processing logic between stages. Logic is used to detect and divert headers and comments. It finds event boundaries and places them at predefined locations in the memory into which it empties.
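The placement of events at predefined memory locations by means of the 12-bit address RAM can be pictured as a table lookup. The sketch below is an illustration under assumptions, not the actual hardware design: the function names, the fixed `event_stride`, and the way counter bits and data-word bits are concatenated are all hypothetical.

```python
ADDRESS_BITS = 12
TABLE_SIZE = 1 << ADDRESS_BITS   # 4096 entries, one per 12-bit RAM address
MEMORY_WORDS = 4096              # size of one memory block in the present system


def program_table(event_stride):
    """Fill the lookup RAM so that entry k points at word k * event_stride."""
    return [(k * event_stride) % MEMORY_WORDS for k in range(TABLE_SIZE)]


def ram_address(event_count, data_word, count_bits, field_shift):
    """Build the 12-bit RAM address from counter bits and a data-word field.

    count_bits = 12 drives the address from the event counter alone;
    count_bits = 0 drives it from the selected data-word field alone.
    """
    field_bits = ADDRESS_BITS - count_bits
    count_part = event_count & ((1 << count_bits) - 1)
    field_part = (data_word >> field_shift) & ((1 << field_bits) - 1)
    return (count_part << field_bits) | field_part


def storage_address(table, event_count, data_word, count_bits=12, field_shift=0):
    """Look up the fixed boundary at which the incoming event is stored."""
    return table[ram_address(event_count, data_word, count_bits, field_shift)]
```

With `program_table(16)` and the address driven from the event counter alone, event 5 is stored starting at word 80, so a program running in a central processor can find every event at a known offset without searching for boundaries.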
