A data acquisition and experiment control system for experiments at the Biology Small-Angle X-ray Scattering Station at the National Synchrotron Light Source has been developed based on a multiprocessor, functionally distributed architecture. The system controls an x-ray monochromator and spectrometer and acquires data from any one of three position-sensitive x-ray detectors. The average data rate from the position-sensitive detector is 10^6 events/sec. Data are stored in a one-megaword histogramming memory. The experiments at this Station require that x-ray diffraction patterns be correlated with timed stimuli applied to the sample.
INTRODUCTION
Small-angle x-ray scattering experiments have been used to investigate relatively large-scale ordered structures in biological systems (Refs. 1-3). The structure of such a system is probed through the diffraction pattern produced when the incident x-rays scatter from the sample.
These experiments seek to correlate changes in the biological structure, measured as changes in the associated diffraction pattern, with various external stimuli applied to the sample. Whether the stimulus is electrical, chemical or mechanical, the objective is to record a series of time-resolved diffraction patterns correlated with the applied stimulus. Auxiliary information, including monochromator and spectrometer settings, is also recorded; this information, however, remains constant over a series of time-resolved diffraction patterns. This paper describes the design and partial implementation of a data acquisition and experiment control system capable of satisfying the requirements of the time-resolved diffraction experiments outlined above. The system is being developed for the Biology Small-Angle X-ray Scattering Station at the National Synchrotron Light Source.
The need for a complex, noncommercial data acquisition system stems from the data flow and storage requirements set by the time-resolved experiments planned for the biology beamline. The detectors of x-ray diffraction patterns produce high data rates and require large memory arrays to store the collected data. X-ray diffraction data are taken on position-sensitive detectors, three of which are planned for use with the data acquisition system. The first is a one-dimensional detector with a global position readout of 1024 channels, capable of digitizing one event every microsecond. The second detector is also one-dimensional, with an amplifier, discriminator and scaler chain on each of its 100 anode elements. This detector can accumulate data in each scaler during data acquisition at 10^6 events/sec/channel.
Each of these functions is confined to an individual processor with a minimal set of support components. Additional components and peripheral devices are added as required to perform the specific function assigned. The functions assigned to the three subfunction processors are discussed in more detail below.
Along with the subfunction processors is a histogramming memory with a fast read/increment/write processor, all on a separate data bus segment called the HBUS. This subsystem is capable of storing detector data at 10^6 events/sec in noninterleaved operation and 2 x 10^6 events/sec when the addresses are interleaved by a factor of two.
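The factor-of-two gain from interleaving can be checked with a small timing model. This is a sketch under stated assumptions, not the hardware design: each bank is taken to need a full 1 µsec read/increment/write cycle, the low-order address bits select the bank, and the processor is assumed able to start a new access every 500 nsec when the target bank is idle. The function name and its parameters are hypothetical.

```python
def completion_time_ns(addresses, cycle_ns=1000, issue_ns=500, interleave=2):
    """Time at which the last read/increment/write cycle finishes.

    Hypothetical model: a bank is busy for cycle_ns per access, and the
    processor may issue a new access every issue_ns if that bank is idle.
    """
    bank_free = [0] * interleave   # time at which each bank becomes idle
    next_issue = 0                 # earliest time the next access can start
    for addr in addresses:
        bank = addr % interleave   # low-order address bits select the bank
        start = max(next_issue, bank_free[bank])
        bank_free[bank] = start + cycle_ns
        next_issue = start + issue_ns
    return max(bank_free)


n = 1000
alternating = [i % 2 for i in range(n)]   # consecutive events hit different banks
same_bank = [0] * n                        # every event hits bank 0
for name, addrs in (("alternating", alternating), ("same bank", same_bank)):
    t = completion_time_ns(addrs)
    print(f"{name}: {n / (t * 1e-9):.3g} events/sec")
```

In this model an alternating address stream approaches 2 x 10^6 events/sec, while a stream confined to one bank (or a noninterleaved memory, `interleave=1`) stays at 10^6 events/sec; randomly distributed addresses fall between the two.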
Interprocessor Communication
Communication among the four LSI-11 family processors is carried out through a commonly accessed memory array called the multiport memory subsystem (Refs. 4, 5). Figure 1 shows a block diagram of several processors interconnected via the multiport memory. Each processor, either an LSI-11 (QBUS) or PDP-11 (UNIBUS), has an access port through which it reads from or writes to the multiport memory. The access ports and memory share a bus, called the MBUS, with 26 bits of address space; the MBUS is an extension of the DEC UNIBUS. Standard UNIBUS memory with some extra address decoding is used in the subsystem. On the processor side, each port connects to the standard processor bus (QBUS or UNIBUS). Each port, which is constructed on one DEC quad printed circuit module, has a 4K-word window in processor physical address space. Accesses to the multiport memory from processor address space are routed through this window by a mapping register located in the processor I/O space. In this way the full 26-bit multiport memory space is available to the local processor. Contention for access to the multiport memory is resolved by a round-robin scheduling module, labeled SCANNER in Fig. 1. The scanner asserts a PORT SELECT signal to each access port in turn, for 100 nsec if that port has no memory request pending. If a request is pending, the selected port asserts the SCAN STOP signal to prevent the scanner from advancing. The scanner is released after the memory cycle and continues on to process memory requests from other access ports. Memory cycle time-outs are processed by the scanner module in such a way that no time-out in one processor can cause another processor to time out.
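The two mechanisms in this paragraph, address translation through the 4K-word window and round-robin arbitration by the scanner, can be sketched as follows. This is a minimal model, assuming byte addressing (as on the PDP-11) and a mapping register that holds the number of a 4K-word page; the function names are hypothetical, and the sketch does not model the 100 nsec scan slots or the time-out isolation.

```python
from collections import deque

WINDOW_WORDS = 4096                  # 4K-word window, from the text
WINDOW_BYTES = 2 * WINDOW_WORDS      # assumption: byte addressing, as on the PDP-11
MBUS_LIMIT = 1 << 26                 # 26-bit multiport memory address space


def mbus_address(map_register, window_offset):
    """Translate a byte offset inside the processor's window to an MBUS address.

    Assumption: the mapping register holds the number of a 4K-word page.
    """
    if not 0 <= window_offset < WINDOW_BYTES:
        raise ValueError("offset falls outside the 4K-word window")
    addr = map_register * WINDOW_BYTES + window_offset
    if addr >= MBUS_LIMIT:
        raise ValueError("address exceeds the 26-bit MBUS space")
    return addr


def scanner_grants(pending):
    """Round-robin arbitration among access ports.

    pending[p] is a deque of requests queued at port p.  The scanner visits
    ports in turn; a port with a request holds the scanner (SCAN STOP) for
    one memory cycle, after which the scanner moves on to the next port.
    """
    grants = []
    port = 0
    while any(pending):                             # some port still has work
        if pending[port]:
            grants.append(pending[port].popleft())  # one memory cycle
        port = (port + 1) % len(pending)            # advance after the cycle
    return grants


ports = [deque(["A1", "A2"]), deque(), deque(["C1"]), deque(["D1"])]
print(scanner_grants(ports))       # ['A1', 'C1', 'D1', 'A2']
print(oct(mbus_address(3, 0o20)))  # 0o60020
```

Port A's second request is served only after every other port with pending work has had its turn, which is the fairness property that round-robin scanning provides without any priority logic.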
A block diagram of the overall system architecture is shown in Fig. 2.
Fig. 2. Overall architecture of the data acquisition and experiment control system. Three types of bus segments are used to implement the system: (1) the LSI-11 bus, (2) the multiport memory bus, and (3) the histogramming memory bus.
Two different types of display device are incorporated into the display processor of the data acquisition system. The first is commercially available and consists of a DEC VT-100 video terminal modified for raster-scan graphics with a Selanar Graphics-100 add-on unit. The display terminal connects to the display processor over a 9600-baud RS-232-C serial line. The command language of the modified graphics terminal emulates that of a Tektronix 4010, so the Tektronix Plot-10 graphics software can be used. The display software has therefore been implemented so that programs executing in the experiment control processor invoke subroutine calls in the standard Plot-10 format. These calls are converted into function requests directed to the display processor, and software executing in the display processor sends the usual Plot-10-compatible ASCII character sequences to the terminal. Because a processor is dedicated to the display function, the experiment control processor also has subroutine-callable display functions of a much higher level available. For instance, a single function request produces a complete standard display of position-sensitive detector data.
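The display processor's side of this arrangement can be sketched as a translator from move/draw function requests to Tektronix 4010 graph-mode bytes. The GS/US control codes and the four-byte address encoding (high Y, low Y, high X, low X, each with its own tag bits) follow the standard 4010 conventions; the request tuple format, the function names and the `detector_plot` helper are hypothetical. The 4010 also allows some unchanged address bytes to be omitted, an optimization this sketch does not attempt.

```python
GS = 0x1D   # enter graph mode; the next address is a dark (invisible) move
US = 0x1F   # return to alphanumeric mode


def tek_address(x, y):
    """Encode a screen point as the four-byte Tektronix 4010 address.

    Byte order is high Y, low Y, high X, low X; the tag bits in each byte
    tell the terminal which part of the coordinate it carries.
    """
    if not (0 <= x < 1024 and 0 <= y < 1024):
        raise ValueError("4010 coordinates are 10-bit values")
    return bytes([0x20 | (y >> 5), 0x60 | (y & 0x1F),
                  0x20 | (x >> 5), 0x40 | (x & 0x1F)])


def render(requests):
    """Turn ("move" | "draw", x, y) function requests into terminal bytes."""
    out = bytearray()
    in_graph = False
    for op, x, y in requests:
        if op == "move":
            out.append(GS)          # GS makes the following vector dark
            in_graph = True
        elif op != "draw":
            raise ValueError(f"unknown request {op!r}")
        elif not in_graph:
            raise ValueError("draw issued before any move")
        out += tek_address(x, y)
    out.append(US)
    return bytes(out)


def detector_plot(counts, height=779):
    """Hypothetical high-level request: one call draws a whole spectrum."""
    peak = max(counts) or 1
    last = len(counts) - 1 or 1
    points = [(i * 1023 // last, c * height // peak) for i, c in enumerate(counts)]
    reqs = [("move", *points[0])]
    reqs += [("draw", x, y) for x, y in points[1:]]
    return reqs


reqs = detector_plot([0, 10])
print(reqs)                    # [('move', 0, 0), ('draw', 1023, 779)]
print(render(reqs).hex(" "))   # 1d 20 60 20 40 38 6b 3f 5f 1f
```

A single call such as `detector_plot` on a 1024-channel spectrum expands into about a thousand primitive requests, which is why it pays to generate them in the display processor rather than send each one from the experiment control processor.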
The second display device associated with the display processor is now under development at BNL. This device employs a high-speed, electrostatic deflection CRT (Hewlett-Packard 1304A) in conjunction with a vector/raster generator that reads data directly from the detector data memory. X-, Y-, and Z-axis data are combined using analog techniques so that rotatable isometric and density plot representations may be easily achieved.
Input/Output Operations Manager Processor
The lowest level processor in the functional hierarchy, called the input/output operations manager processor, controls a set of peripheral devices and supplies programs to all other processors in the system except the experiment control processor. The most important of these peripherals is a medium-capacity disk drive, a Data Systems Design 880. This disk uses Winchester technology and emulates a DEC RL01 with an extended capacity of 7.8 Mbytes. The input/output operations manager processor maintains copies of each subfunction processor's operating system on this disk, as well as a number of main program-overlay program sets written in FORTRAN for use by the subfunction processors.
At present, x-ray diffraction pattern data are stored on peripheral devices (disk or magnetic tape) connected to the experiment control processor. However, a special-purpose file management subsystem will be written for execution on the input/output operations manager processor. This subsystem will use a file structure that facilitates storage and recall of large data arrays and experiment parameters.
Histogramming Memory
Data from the x-ray position-sensitive detector are accumulated in the histogramming memory after being conditioned by the time-slice interface. The histogramming memory consists of three subunits, shown in Fig. 2, connected to a separate bus segment called the HBUS. This bus is electrically and physically similar to the MBUS and has an identical 26-bit physical address space. The histogramming memory behaves like ordinary memory on the MBUS when accessed from any of the system processors. It is, however, a two-ported memory: data from the position-sensitive detector flow into one port (the read/increment/write processor), and access requests from the acquisition system flow through the other (the MBUS-to-HBUS access port). Only requests for physical memory locations that lie within the histogramming memory address space are passed through the MBUS-to-HBUS access port, so interprocessor communication on the MBUS does not load down the HBUS. The read/increment/write processor can add any value from 1 to 2^24 - 1 to any location in histogramming memory in 1 µsec. The value actually added depends on the mode in which data are collected from the position-sensitive detector. The word size in memory is programmable to be either 16 or 32 bits.
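The two ports of the histogramming memory can be modeled as two methods on one object: an address check standing in for the MBUS-to-HBUS port's decoding, and a read/increment/write operation standing in for the detector port. This is a sketch under assumptions: word addressing is used, and a word that overflows simply wraps modulo its 16- or 32-bit size, since the text does not say how the hardware handles overflow. The class and method names are hypothetical.

```python
MAX_INCREMENT = (1 << 24) - 1   # largest value one cycle can add


class HistogramMemory:
    """Sketch of the histogramming memory as seen from its two ports.

    Word addressing and wrap-around on overflow are assumptions; the text
    does not say how the hardware handles overflow.
    """

    def __init__(self, base, n_words, word_bits=16):
        if word_bits not in (16, 32):
            raise ValueError("word size is programmable to 16 or 32 bits")
        self.base = base
        self.mask = (1 << word_bits) - 1
        self.words = [0] * n_words

    def claims(self, addr):
        """MBUS-to-HBUS port: forward only addresses inside this memory."""
        return self.base <= addr < self.base + len(self.words)

    def read_increment_write(self, addr, increment=1):
        """Detector port: one read/increment/write cycle (1 usec in hardware)."""
        if not 1 <= increment <= MAX_INCREMENT:
            raise ValueError("increment must lie in 1 .. 2**24 - 1")
        if not self.claims(addr):
            raise IndexError("address outside histogramming memory")
        i = addr - self.base
        self.words[i] = (self.words[i] + increment) & self.mask
        return self.words[i]


hm = HistogramMemory(base=0o4000000, n_words=1 << 20, word_bits=16)  # one megaword
hm.read_increment_write(0o4000005)                 # a single x-ray event adds 1
print(hm.read_increment_write(0o4000005, 70000))   # 4465: wrapped in a 16-bit word
print(hm.claims(0o3777777))                        # False: request stays on the MBUS
```

The base address is arbitrary here; the point is that the address check, not bus arbitration, is what keeps ordinary interprocessor traffic off the HBUS.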
Scheduling between the MBUS-to-HBUS access port and the read/increment/write processor is done in round-robin fashion, similar to that described for the multiport memory. No priorities are assigned to the ports, because the HBUS bandwidth is allocated so that no experimental data are lost. Derandomization of detector data in the read/increment/write processor ensures that HBUS latency does not result in data loss.
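Derandomization can be illustrated with a small queue model. Even when the average event rate is within the processor's capacity, random arrivals bunch together; a first-in, first-out buffer absorbs such bursts so that only a burst longer than the buffer causes a loss. The FIFO depth, the fixed per-event service time (which could be taken to include HBUS arbitration delay) and the function name are all hypothetical, not parameters from the design.

```python
from collections import deque


def events_lost(arrivals_ns, service_ns=1000, depth=4):
    """Count detector events dropped by a derandomizing FIFO.

    Events arrive at arbitrary times; the read/increment/write processor
    retires one event per service_ns.  An event is lost only when all
    `depth` slots are already occupied when it arrives.
    """
    occupied = deque()   # completion times of events currently held
    busy_until = 0       # when the processor finishes its current event
    lost = 0
    for t in sorted(arrivals_ns):
        while occupied and occupied[0] <= t:
            occupied.popleft()          # these events have been stored
        if len(occupied) >= depth:
            lost += 1                   # buffer full: the event is dropped
            continue
        start = max(t, busy_until)
        busy_until = start + service_ns
        occupied.append(busy_until)
    return lost


burst = [0, 100, 200, 300, 400]         # five events within half a microsecond
print(events_lost(burst, depth=4), events_lost(burst, depth=8))   # 1 0
```

Without the buffer (depth 1) every event in the burst after the first would be lost, even though the long-term rate is well below 10^6 events/sec.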
IMPLEMENTATION
This section describes selected portions of hardware and software implementation in the data acquisition system. The time-slice interface, read/increment/write processor, interprocessor communication protocol, software implementation, and subfunction processor implementation are discussed.
Time-Slice Interface
The time-slice interface, shown in Fig. 3, conditions data from the position-sensitive detector and stimulates the sample synchronously with the individual time slices. The time-slice interface is a list-driven device; the list is a series of 16-word entries held in the local memory of the time-slice processor. Data are transferred to the interface by a direct-memory-access interface (DEC DRV-11B) and stored in a series of buffer registers until the time in the event clock matches the specified event time. At that moment the data in the buffer registers are transferred to the output registers, and the function implied by these registers is executed. As soon as the data are transferred to the output registers, the buffer registers are reloaded from the next entry in the event list.
Read/Increment/Write Processor
The read/increment/write processor, which is connected to the HBUS, is the second major component that handles data from the position-sensitive x-ray detector on its way to storage in data memory. Depending on the detector type, this specially built processor can add any value from 1 to 2^24 - 1 to the contents of a 16- or 32-bit word in histogramming memory. The usual device of this type, found in most pulse-height analysis systems, can only increment the contents of data memory by one. However, as mentioned in the Introduction, one of the detectors used in this experiment accumulates data in its 100 scalers before storing the data in a burst at the end of a time slice. The other detectors process data one x-ray event at a time and hence require the usual increment by one.
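The advantage of an arbitrary increment is clearest in a sketch of the end-of-slice transfer for the 100-anode detector: each scaler total is added with one read/increment/write cycle, where a unit-increment device would need one cycle per counted event. The memory layout (one block of words per time slice) and the function name are hypothetical.

```python
MAX_INCREMENT = (1 << 24) - 1   # largest value a single cycle can add


def dump_scalers(histogram, slice_index, scaler_counts, base=0):
    """Store one time slice of scaler totals with one cycle per channel.

    Hypothetical layout: slice k occupies words base + k*n .. base + k*n + n - 1,
    where n is the number of scalers.  Returns the number of
    read/increment/write cycles used.
    """
    n = len(scaler_counts)
    cycles = 0
    for channel, count in enumerate(scaler_counts):
        if count == 0:
            continue                      # nothing to add, skip the cycle
        if count > MAX_INCREMENT:
            raise ValueError("scaler total exceeds one increment")
        histogram[base + slice_index * n + channel] += count
        cycles += 1
    return cycles


hist = [0] * 300                          # room for three 100-channel slices
counts = [5, 0, 7] + [0] * 97
print(dump_scalers(hist, 2, counts), sum(counts))   # 2 cycles instead of 12
```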
A block diagram of the read/increment/write processor is shown in Fig. 4.
Interprocessor Communication Protocol
At the system software level, the basic unit of the communication protocol is the transaction. A transaction is a function request directed from a master processor to a slave processor. The master-slave relationships among the processors in the system are established at system generation time; these relationships define the functional hierarchy of the processors. At the highest level of the hierarchy is the experiment control processor, which is always the master in any transaction in which it is involved. The experiment control processor uses transactions to request functions of any other processor in the system. The input/output operations manager processor, on the other hand, is the lowest level processor in the system. It always acts as transaction slave and services function requests made by all other system processors.
Transaction processing involves three types of memory blocks located in the multiport memory module.
A master mode processor requests a function of a lower level processor by loading the parameters of the function into a request transaction parameter block in shared memory. Transaction processing begins when the master mode processor sets the function request flag in this same request block.
Lower-level processors, which act as transaction slave mode processors, examine in round-robin fashion the transaction request blocks of all potential transaction master processors. When a pending function request is located, the slave processor extracts the function parameters from the multiport memory and performs the requested function. Parameters that represent the output of the function are loaded into an acknowledge transaction parameter block in multiport memory, and a function completion flag is set within this block. The transaction master processor uses the completion flag as a signal that the requested function has been performed, and then extracts the parameters returned by the function from the transaction acknowledge block in multiport memory. The transaction request and acknowledge parameter blocks are each 64 (decimal) words long.
A third block, the transaction datablock, may be involved in a transaction. The datablock is the largest of the transaction blocks, with a maximum length of 8K words, and is also contained in the multiport memory. The datablock contains the non-control information associated with the transaction. If the datablock information is to be transferred from the master to the slave, the master processor must load the datablock into multiport memory before the function request flag is set. If the datablock information is to be transferred in the opposite direction, the slave processor must place the requested data in multiport memory before setting the function completion flag.
A typical function request is one in which a master processor requests a FORTRAN overlay program for execution. The memory limit parameters of the overlay (start address, transfer address, etc.) are returned in the acknowledge transaction parameter block; the overlay program itself is returned in the transaction datablock.
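The request/acknowledge handshake, including the overlay request just described, can be sketched with plain Python objects standing in for multiport memory. The block sizes come from the text; the field names, function code 1 for an overlay request, the handler table and the returned addresses are hypothetical, and a single thread replaces the concurrently running processors. In real shared memory the order of the writes matters (data before flag); the comments mark where that ordering would have to be enforced.

```python
from dataclasses import dataclass, field

PARAM_WORDS = 64        # request and acknowledge blocks, from the text
DATA_MAX_WORDS = 8192   # largest transaction datablock (8K words)


@dataclass
class PairBlocks:
    """Blocks shared by one master-slave pair in multiport memory."""
    request: list = field(default_factory=lambda: [0] * PARAM_WORDS)
    ack: list = field(default_factory=lambda: [0] * PARAM_WORDS)
    data: list = field(default_factory=list)
    request_flag: bool = False
    completion_flag: bool = False


def master_request(blk, function, params, data=None):
    if len(params) + 1 > PARAM_WORDS:
        raise ValueError("parameters do not fit in the request block")
    if data is not None:
        if len(data) > DATA_MAX_WORDS:
            raise ValueError("datablock too large")
        blk.data = list(data)              # datablock loaded before the flag
    blk.request[0] = function
    blk.request[1:1 + len(params)] = params
    blk.completion_flag = False
    blk.request_flag = True                # setting the flag starts the transaction


def slave_poll(blocks, handlers, start=0):
    """Examine each master's request block once, round-robin from `start`.

    Services the first pending request found and returns that master's
    index, or None if nothing is pending.
    """
    for k in range(len(blocks)):
        m = (start + k) % len(blocks)
        blk = blocks[m]
        if not blk.request_flag:
            continue
        blk.request_flag = False
        results, out_data = handlers[blk.request[0]](blk.request[1:], blk.data)
        if out_data is not None:
            blk.data = list(out_data)      # returned data before completion
        blk.ack[:len(results)] = results
        blk.completion_flag = True         # master may now read the results
        return m
    return None


def load_overlay(params, _data):
    """Hypothetical handler for an overlay request (function code 1)."""
    overlay_number = params[0]
    image = [overlay_number] * 16                   # stand-in for program words
    return [0o60000, 0o60000, len(image)], image    # start, transfer, length


blocks = [PairBlocks(), PairBlocks()]   # two masters of one slave
master_request(blocks[1], 1, [7])
print(slave_poll(blocks, {1: load_overlay}), blocks[1].ack[:3])   # 1 [24576, 24576, 16]
```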
The master and slave mode processor routines, which allocate the transaction blocks appropriate to a particular master-slave pair and set the physical addresses of those blocks in the map registers of the multiport-memory access port, are part of the operating systems in the lower-level processors. The routines are initiated by invoking software traps and take the form of trap service routines in the processor's synchronous system trap (SST) module.
Any slave processor that responds to transaction requests from several masters is termed a shared-service processor. A shared-service processor must resolve a potential conflict when transactions from several masters compete for the resources needed to complete them. Principal among such resources are parameter blocks and data buffers. The strategy used in this system is to avoid such competition by preallocating, in the multiport memory subsystem, individual interprocessor communication blocks and data buffers for every transaction that might cause it. Because the multiport memory has a virtually unlimited address space and memory is now inexpensive, this solution is judged the most cost-effective. Other resources that may be claimed for processing a transaction include slave processor peripheral devices such as disk and magnetic tape subsystems. Round-robin scheduling allows the slave processor to claim all resources required for a transaction before transaction processing begins and to release them before asserting the completion flag.
Thus the ability to claim a complete set of slave resources is assured at the beginning of any transaction. This simple set of rules serves quite adequately in a data acquisition system, where data collection procedures are inherently sequential.
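The cost of the preallocation strategy is easy to estimate. The sketch below gives each master-slave pair its own request, acknowledge and data blocks of the sizes stated earlier. It assumes word addressing, a fixed ordering of the two middle processors, and that every processor can act as master to any processor below it; the text states this only for the experiment control and input/output operations manager processors. The names are hypothetical.

```python
from itertools import combinations

REQUEST_WORDS = ACK_WORDS = 64
DATABLOCK_WORDS = 8192

# Hypothetical ordering: the text fixes only the top (experiment control)
# and the bottom (input/output operations manager) of the hierarchy.
HIERARCHY = ["experiment control", "data display", "time-slice",
             "input/output operations manager"]


def preallocate(hierarchy, base=0):
    """Give every (master, slave) pair its own request, ack and data blocks.

    Returns the layout and the total number of words used.
    """
    layout = {}
    addr = base
    for master, slave in combinations(hierarchy, 2):   # master ranks higher
        layout[(master, slave)] = {
            "request": addr,
            "ack": addr + REQUEST_WORDS,
            "data": addr + REQUEST_WORDS + ACK_WORDS,
        }
        addr += REQUEST_WORDS + ACK_WORDS + DATABLOCK_WORDS
    return layout, addr - base


layout, total = preallocate(HIERARCHY)
print(len(layout), total)   # 6 49920
```

Six pairs need under 50K words, well under 1% of the 26-bit multiport address space, which supports the judgment that avoiding contention is cheaper than arbitrating it.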
Software
As noted above, the data acquisition system uses as much commercially available software as possible, especially for the processor with which users interact most, the experiment control processor. The experiment control processor executes the DEC RSX-11M operating system. This choice has the advantage that program development can be carried out on a separate system, away from the data acquisition system. The experimental group for which this acquisition system is being constructed has a DEC PDP-11/45 that runs RSX-11M and is available for program development.
The user's data acquisition programs can take full advantage of the peripheral device control and file management provided by RSX-11M. The file management is expected to be used to maintain extensive experiment parameter files on the acquisition system disk. X-ray diffraction pattern data are recorded on magnetic tape. Programs are transferred from the program development system to the data acquisition system via floppy diskette.
A FORTRAN program executing on the experiment control processor requests functions of lower-level processors in the functional hierarchy by executing FORTRAN subroutine calls. The subroutine code sets up in multiport memory the transaction parameter blocks required to control the function request. The FORTRAN subroutines do not access registers in the multiport-memory access port directly but instead invoke a set of software interrupt routines. In RSX-11M these software interrupts are called synchronous system traps (SST's). The interrupt service routines are maintained in an RSX-11M memory-resident library. The user links his program to this library of standard subroutines and synchronous system trap definitions when building the program.
Subfunction Processors
The time-slice processor, the data display processor and the input/output operations manager processor, denoted collectively as subfunction processors, all execute software that is structurally similar. The operating system is a slightly modified version of RSX-11S, the DEC memory-resident version of RSX-11M. This operating system was chosen because it supports elementary input/output operations for a large number of peripheral device types. The method of initiating input/output operations (the "QIO" executive directive) and the device-driver structure used by RSX-11S have been preserved intact. Figure 5 shows the utilization of local memory for the subfunction processors. The operating system occupies approximately 6K words of processor physical address space. Functions requested of these processors are performed by code written in FORTRAN. The programs take the form of main program-overlay program sets; that is, a FORTRAN main program remains resident in one part of processor memory and FORTRAN overlay programs are loaded into another part. The FORTRAN main program is typically 6K words long and the overlays typically 8K words long. A main program-overlay program set may contain as many as 63 overlays. Also resident in the memory of these processors is a library of synchronous system trap (SST) service routines. These routines are independent of the particular FORTRAN program executing in the processor. They manage system parameters, such as the number of the processor currently requesting a function, and also higher level functions such as loading and initiating execution of FORTRAN main programs. The SST routines also control access to regions in the multiport memory subsystem.
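A quick budget shows how these regions fit in a subfunction processor's local memory. On an LSI-11 without memory management, the I/O page occupies octal byte addresses 160000-177777, leaving 28K words below it. The sketch lays the regions out in the order listed, starting at address zero; the text gives their sizes but not their placement, so the ordering and the function name are assumptions.

```python
K = 1024
IO_PAGE_START = 0o160000          # byte address where the LSI-11 I/O page begins
LOCAL_WORDS = IO_PAGE_START // 2  # 28K words of addressable local memory

# Region sizes from the text; their order in memory is an assumption.
REGIONS = [("RSX-11S executive", 6 * K),
           ("FORTRAN main program", 6 * K),
           ("overlay region", 8 * K)]


def memory_map(regions, limit_words=LOCAL_WORDS):
    """Lay regions out from address 0 and return octal byte-address ranges
    plus the number of words left for the SST library and other uses."""
    rows, next_word = [], 0
    for name, words in regions:
        first, last = 2 * next_word, 2 * (next_word + words) - 1
        rows.append((name, f"{first:06o}", f"{last:06o}"))
        next_word += words
    if next_word > limit_words:
        raise ValueError("regions do not fit below the I/O page")
    return rows, limit_words - next_word


rows, spare = memory_map(REGIONS)
for name, lo, hi in rows:
    print(f"{lo}-{hi}  {name}")
print(spare, "words spare")   # 8192 words spare
```

Under these assumptions the executive, main program and one overlay occupy 000000-117777, leaving 8K words for the SST library and any other resident code.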
All processors at functional hierarchy levels above the lowest obtain their main program-overlay sets from the input/output operations manager processor. The input/output operations manager processor itself, being the lowest level processor, cannot request programs and overlays from itself; it obtains its programs from designated areas in the multiport memory. On system initialization, bootstrap programs in the other processors execute a wait loop of predetermined length, during which the input/output operations manager processor loads its operating system, main program and first overlay from a read-only portion of multiport memory. The bootstrap programs then request the input/output operations manager processor to supply them with the operating system image and standard main program appropriate to their processors.
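The start-up sequence carries one timing constraint: the bootstrap wait loop must outlast the input/output operations manager's own load, or the first program requests find no one to service them. The sketch below makes that constraint explicit. The abbreviation IOOM, the function name and the timing values are hypothetical.

```python
def boot_order(processors, wait_loop_us, ioom_boot_us):
    """Sequence of events when the system is initialized.

    The input/output operations manager (IOOM) loads its own operating
    system, main program and first overlay from read-only multiport memory
    while every other processor spins in its bootstrap wait loop.
    """
    if wait_loop_us <= ioom_boot_us:
        raise RuntimeError("wait loop ends before the IOOM can serve requests")
    events = [(0, "IOOM", "load executive, main program, first overlay from ROM area"),
              (ioom_boot_us, "IOOM", "ready to service transactions")]
    for p in processors:
        events.append((wait_loop_us, p, "request executive image and main program"))
    return events


for t, who, what in boot_order(["time-slice", "data display"], 2_000_000, 500_000):
    print(f"{t:>9} us  {who:<13} {what}")
```

Choosing the wait-loop length with a generous margin is the simplest way to satisfy this constraint without any handshake before the transaction mechanism itself is running.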
STATUS
As mentioned in the Introduction, the acquisition system has been partially implemented. This section briefly describes the current status of the data acquisition system. The hardware for all processors including the histogramming memory has been integrated and tested. The time-slice interface and the vector display interface are under development. Software for interprocessor communication, the main program-overlay set manager subsystem and the initial data display has been developed and tested. Current software development includes code for the time-slice processor functions and fast vector display. Application code for the experiment control processor is being developed by the scientific group which will carry out the experi- 
