Abstract: The hardware and packaging technology that provide the high performance of the CRAY-1 computer are reviewed. A brief overview of the computer is given, followed by a description of the computer circuits, packaging, power distribution, and cooling system.
II. SYSTEM OVERVIEW
Table I is a listing of some CRAY-1S performance characteristics. The logic gates in the CPU are mostly emitter-coupled logic (ECL) dual NAND integrated circuits (ICs). Gate counts of the hardware functional units are given in Table II. Additional gates are used for control and storage functions that tie the hardware functions together. A full-memory CRAY-1S contains four million words of high-speed ECL random access memory (RAM). The 4K × 1 bit RAM chips are arranged to give a word length of 64 bits plus 8 bits for single error correction, double error detection (SECDED). The SECDED scheme is an implementation of a Hamming code. A total of 73 728 RAM chips are used.
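The 64 + 8 bit word is what an extended Hamming (SECDED) code needs for 64 data bits: seven Hamming check bits locate any single flipped bit among 71 positions, and one overall parity bit distinguishes a single error from a double error. The 72-bit width also accounts for the chip count: 4 194 304 words × 72 bits / 4096 bits per chip = 73 728 chips. The Python sketch below uses a textbook bit ordering; the check-bit positions (`CHECK_POSITIONS`) and the names `encode` and `decode` are assumptions for illustration, not the CRAY-1's actual check-bit wiring.

```python
from __future__ import annotations

# Textbook extended-Hamming layout (an assumption, not the CRAY-1's wiring):
# index 0 holds overall parity; indices 1..71 form a Hamming code with
# check bits at the power-of-two positions and data bits everywhere else.
CHECK_POSITIONS = (1, 2, 4, 8, 16, 32, 64)
DATA_POSITIONS = tuple(p for p in range(1, 72) if p not in CHECK_POSITIONS)
assert len(DATA_POSITIONS) == 64  # 64 data + 7 check + 1 overall parity = 72


def encode(data: int) -> list[int]:
    """Encode a 64-bit word as a 72-bit SECDED codeword (list of 0/1 bits)."""
    if not 0 <= data < 1 << 64:
        raise ValueError("data must fit in 64 bits")
    code = [0] * 72
    for i, pos in enumerate(DATA_POSITIONS):
        code[pos] = (data >> i) & 1
    # Check bit p makes parity even over every position whose index has bit p set.
    for p in CHECK_POSITIONS:
        code[p] = sum(code[q] for q in range(1, 72) if q & p) % 2
    # Overall parity over bits 1..71 extends single correction to double detection.
    code[0] = sum(code[1:]) % 2
    return code


def decode(code: list[int]) -> tuple[int | None, str]:
    """Return (data, status); status is 'ok', 'corrected', or 'uncorrectable'."""
    code = list(code)  # work on a copy; do not modify the caller's list
    syndrome = 0
    for q in range(1, 72):
        if code[q]:
            syndrome ^= q  # XOR of set positions is 0 for a valid codeword
    overall = sum(code) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1 and syndrome < 72:
        # Odd parity: treat as one flipped bit at index `syndrome`
        # (syndrome 0 means the overall parity bit itself flipped).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even parity with a nonzero syndrome (a double-bit error), or an
        # out-of-range syndrome: detect the error but do not correct it.
        return None, "uncorrectable"
    data = 0
    for i, pos in enumerate(DATA_POSITIONS):
        data |= code[pos] << i
    return data, status


if __name__ == "__main__":
    word = 0x0123_4567_89AB_CDEF
    cw = encode(word)
    print(decode(cw))   # clean word
    cw[10] ^= 1         # one flipped bit is corrected
    print(decode(cw))
    cw[20] ^= 1         # a second flipped bit is only detected
    print(decode(cw))
```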
Manuscript received October 16, 1980; revised January 29, 1981. The author is with Cray Laboratories, 3375 Mitchell Lane, Boulder, CO 80301.

The CRAY-1 is a vector processor with the ability to operate iteratively on strings of up to 64 (the vector length) operands or operand pairs. By contrast, scalar processors perform one iteration on one operand or one pair of operands. Each operand has a word length of 64 bits, which for floating-point operations comprises a 49-bit signed coefficient and a 15-bit exponent. The CRAY-1 operates efficiently even with short vector lengths [1], and it is also a fast scalar processor.
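The floating-point word format can be made concrete by splitting a 64-bit word into its fields. This Python sketch takes the widths from the text (a sign bit plus a 48-bit magnitude make the 49-bit signed coefficient, and 1 + 15 + 48 = 64) but assumes a field order (sign in the most significant bit, then exponent, then coefficient). It does not interpret exponent bias or normalization, and `split_word` and `join_word` are hypothetical helper names.

```python
from __future__ import annotations

# Field widths from the text. Assumed order, most significant first:
# [sign: 1 bit][exponent: 15 bits][coefficient magnitude: 48 bits]
EXP_BITS = 15
COEF_BITS = 48
EXP_MASK = (1 << EXP_BITS) - 1
COEF_MASK = (1 << COEF_BITS) - 1


def split_word(word: int) -> tuple[int, int, int]:
    """Split a 64-bit word into its (sign, exponent, coefficient) bit fields."""
    if not 0 <= word < 1 << 64:
        raise ValueError("word must fit in 64 bits")
    sign = word >> (EXP_BITS + COEF_BITS)
    exponent = (word >> COEF_BITS) & EXP_MASK
    coefficient = word & COEF_MASK
    return sign, exponent, coefficient


def join_word(sign: int, exponent: int, coefficient: int) -> int:
    """Pack the three fields back into one 64-bit word."""
    if sign not in (0, 1):
        raise ValueError("sign must be 0 or 1")
    if not 0 <= exponent <= EXP_MASK or not 0 <= coefficient <= COEF_MASK:
        raise ValueError("field out of range")
    return (sign << (EXP_BITS + COEF_BITS)) | (exponent << COEF_BITS) | coefficient


if __name__ == "__main__":
    w = join_word(1, 0o40001, 1 << 47)
    print(hex(w), split_word(w))
```

Packing and unpacking are exact inverses, so any field triple within range survives a round trip.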
Machine performance is expressed in millions of floating-point operations per second (megaflops, or Mflops) rather than in instructions per second, because a single vector instruction does the work of a loop of several scalar instructions. The CRAY-1 has been shown capable of a sustained rate of 138 Mflops and of 250 Mflops in short bursts [2]. System reliability is excellent, with an availability greater than 98 percent. Mean time between interruptions (MTBI) is more than 100 h, and mean time to repair (MTTR) is about 1 h [3].
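The reliability figures are related by the usual steady-state ratio A = MTBI / (MTBI + MTTR). The small Python check below (the function name is illustrative) counts only unscheduled interruptions and repairs; with 100 h and 1 h it gives about 99 percent, consistent with the quoted availability of greater than 98 percent.

```python
def availability(mtbi_h: float, mttr_h: float) -> float:
    """Steady-state availability: fraction of each up/repair cycle spent up."""
    if mtbi_h <= 0 or mttr_h < 0:
        raise ValueError("MTBI must be positive and MTTR non-negative")
    return mtbi_h / (mtbi_h + mttr_h)


if __name__ == "__main__":
    # Figures quoted in the text: MTBI more than 100 h, MTTR about 1 h.
    print(f"{availability(100.0, 1.0):.2%}")  # 99.01%
```

Because the MTBI is quoted as a lower bound, the computed value is a lower bound on this idealized availability as well.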
III. CIRCUIT COMPONENTS
One chip type accounts for about 95 percent of the ICs in the CPU. The chip is a negative-logic, five- and four-input dual NAND gate (5/4 gate) with complementary outputs. A logic diagram is shown in Fig. 4. The 5/4 gate is an ECL circuit with 750-ps propagation delay and 60-mW power dissipation per gate (120 mW per package). The 0.036-in square silicon die is packaged in a hermetically sealed 16-pin ceramic flatpack. Package dimensions are 3/8 × 1/4 × 1/12 in, with leads on 50-mil centers; the package is pictured alongside the smaller resistor package (described below).

Signal propagation between gates is governed by strict timing rules, as shown in Fig. 8. The 12.5-ns clock period is divided into eight "gate times" of about 1.5 ns each. Roughly half of each gate time is circuit propagation delay, and half is board-foil delay. A gate that is not needed in a path can be dropped by substituting 3 in of foil conductor.
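The timing and power numbers above can be checked with a few lines of arithmetic. This Python sketch uses only values from the text (12.5-ns clock, eight gate times, 750-ps gate delay, 60 mW per gate). The `fits_in_clock` helper, which simply sums gate and foil delays along one path between clocked stages, is a hypothetical illustration and not the CRAY-1's actual timing rule set.

```python
# Values from the text.
CLOCK_PERIOD_NS = 12.5
GATE_TIMES_PER_CLOCK = 8
GATE_DELAY_NS = 0.750    # 5/4 gate propagation delay
GATE_POWER_W = 0.060     # dissipation per gate (two gates per package)

gate_time_ns = CLOCK_PERIOD_NS / GATE_TIMES_PER_CLOCK   # 1.5625 ns
foil_budget_ns = gate_time_ns - GATE_DELAY_NS            # share left for board foil
clock_mhz = 1e3 / CLOCK_PERIOD_NS                        # 80 MHz
power_delay_pj = GATE_POWER_W * GATE_DELAY_NS * 1e3      # W x ns = nJ; x 1e3 gives pJ


def fits_in_clock(gates: int, foil_delay_ns: float) -> bool:
    """Hypothetical check: do `gates` gate delays plus foil delay fit in one clock?"""
    return gates * GATE_DELAY_NS + foil_delay_ns <= CLOCK_PERIOD_NS


if __name__ == "__main__":
    print(f"gate time {gate_time_ns} ns, foil share {foil_budget_ns} ns")
    print(f"clock {clock_mhz:.0f} MHz, power-delay product {power_delay_pj:.0f} pJ")
    print(fits_in_clock(8, 6.5), fits_in_clock(8, 7.0))
```

The foil share of about 0.81 ns confirms the "roughly half" split, and the 45-pJ power-delay product is a convenient single figure of merit for comparing gate technologies.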
A CRAY-1 module consists of two PC boards sandwiching a 0.08-in thick copper cooling plate that is also the ground bus. Signals between boards are carried differentially over 120-Ω twisted pair in the backplane, as shown schematically in Fig. 9. The 120-Ω twisted pair is matched to two 60-Ω board traces in series. This differential method provides a low-crosstalk communication path and delivers both the signal and its complement to the receiving module, so an inverter gate (and its gate delay) is saved whenever the complement is needed. The twisted pair attaches to the module through a 96-pin connector pair with pins on 0.05-in centers. The board side of the connector pair is visible at the top of Fig. 6.
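Two textbook relations behind this differential scheme can be illustrated in Python. A line terminated in its own characteristic impedance reflects nothing, because the reflection coefficient (ZL - Z0)/(ZL + Z0) is zero; and a receiver that compares the two wires ignores noise coupled equally onto both. The voltage levels and noise amplitude below are illustrative assumptions, and the function names are hypothetical.

```python
from __future__ import annotations


def reflection_coefficient(z_load: float, z_line: float) -> float:
    """Voltage reflection coefficient at a line/load junction: (ZL - Z0) / (ZL + Z0)."""
    return (z_load - z_line) / (z_load + z_line)


def differential_receive(v_a: float, v_b: float) -> tuple[int, int]:
    """Return (bit, complement), where bit = 1 means wire A is the more positive wire.

    Mapping this to a logic value (the CRAY-1 used negative logic) is left out.
    """
    bit = 1 if v_a > v_b else 0
    return bit, 1 - bit


TWISTED_PAIR_OHMS = 120.0
BOARD_TRACE_OHMS = 60.0

if __name__ == "__main__":
    # Two 60-ohm traces in series present 120 ohms to the 120-ohm pair: no reflection.
    print(reflection_coefficient(2 * BOARD_TRACE_OHMS, TWISTED_PAIR_OHMS))  # 0.0
    # A 100-ohm load on the same pair would reflect about 9 percent, inverted.
    print(round(reflection_coefficient(100.0, TWISTED_PAIR_OHMS), 4))       # -0.0909
    # Noise that couples equally onto both wires does not change the received bit.
    high, low, noise = -0.9, -1.7, 0.3   # illustrative ECL-like levels in volts (assumed)
    print(differential_receive(high + noise, low + noise))                  # (1, 0)
```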
Interconnection ground rules restrict twisted pair lengths to multiples of gate times, or multiples of 1 ft to a maximum of 4 ft. These circuit rules prevent timing problems or race conditions from happening anywhere in the machine. The CRAY-1 backplane contains 67 mi of twisted pair wire. The first machine was completed with no wiring errors.
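The length rule lends itself to a mechanical check of a wire list. This Python sketch is hypothetical (the wire names and the dictionary format are invented for illustration) and checks only the whole-foot, 4-ft-maximum form of the rule, not the timing of each path.

```python
from __future__ import annotations

# Rule from the text: twisted pair lengths are multiples of 1 ft, at most 4 ft.
ALLOWED_LENGTHS_FT = frozenset({1, 2, 3, 4})


def check_wire_list(wires: dict[str, float]) -> list[str]:
    """Return, sorted, the names of wires whose length breaks the length rule."""
    return sorted(name for name, length_ft in wires.items()
                  if length_ft not in ALLOWED_LENGTHS_FT)


if __name__ == "__main__":
    # Hypothetical wire list: name -> length in feet.
    wires = {"W001": 1, "W002": 2.5, "W003": 4.0, "W004": 5}
    print(check_wire_list(wires))  # ['W002', 'W004']
```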
V. SYSTEM POWER
Voltage requirements for the PC board are -5.2 V (VEE) for the ICs and -2 V (VTT) for the termination resistors. No bypass capacitors are used on the PC board. A benefit of using the 5/4 gate IC is that it presents a balanced load to the supply: when one output turns on, its complementary output turns off, so the power supply loading is purely resistive and constant. Any ripple that occurs during the transitions is filtered out by the 16-nF capacitor formed by the power and ground planes in the PC board. A CPU module (two boards) can dissipate up to 36 W (7 A) from VEE and about 1.2 W (0.6 A) from VTT. There are 576 CPU modules; taking an average of 25 W per module, the CPU dissipates a total of 14.4 kW. Each memory module contains 64 of the 4K × 1 bit ECL RAM chips, which dissipate about 1 W each, plus some interface logic chips. The power dissipation of a memory module is about 70 W. A full four-million-word CRAY-1S contains 1152 memory modules, with a total memory power dissipation of 81 kW. Total power dissipation for the computer is 95 kW. Approximately 130 kW is supplied to the entire machine, including power supply losses.
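The power budget can be reproduced from the per-module figures. This Python sketch uses only numbers from the text (the constant names are ours); the termination voltage magnitude of 2 V is the value implied by 1.2 W at 0.6 A. It also cross-checks the RAM chip count and the peak module power against the supply currents.

```python
# Figures from the text (average per-module values).
CPU_MODULES, CPU_W_PER_MODULE = 576, 25.0
MEM_MODULES, MEM_W_PER_MODULE = 1152, 70.0
RAM_CHIPS_PER_MEM_MODULE = 64
INPUT_KW = 130.0            # supplied to the whole machine, including supply losses
VEE_V, VEE_A = 5.2, 7.0     # magnitude of the IC supply and peak CPU-module current
VTT_V, VTT_A = 2.0, 0.6     # termination supply magnitude implied by 1.2 W / 0.6 A

cpu_kw = CPU_MODULES * CPU_W_PER_MODULE / 1000       # 14.4 kW
mem_kw = MEM_MODULES * MEM_W_PER_MODULE / 1000       # 80.64 kW, quoted as 81 kW
total_kw = cpu_kw + mem_kw                           # about 95 kW
ram_chips = MEM_MODULES * RAM_CHIPS_PER_MEM_MODULE   # 73 728 chips
module_peak_w = VEE_V * VEE_A + VTT_V * VTT_A        # 36.4 W + 1.2 W per CPU module
delivered_fraction = total_kw / INPUT_KW             # share of input power dissipated in the computer

if __name__ == "__main__":
    print(f"CPU {cpu_kw:.1f} kW, memory {mem_kw:.2f} kW, total {total_kw:.1f} kW")
    print(f"RAM chips {ram_chips}, peak CPU module {module_peak_w:.1f} W")
    print(f"{delivered_fraction:.0%} of the 130 kW input reaches the logic")
```

Roughly a quarter of the input power is therefore lost in conversion and distribution before it reaches the modules.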
The computer modules receive VEE and VTT power from power supplies located under the seats around the periphery of the computer. These power supplies are simple linear rectifier-filter types that receive 400-Hz ac voltage from 36 variable transformers on a power distribution unit (PDU). The PDU receives 208 V ac at 400 Hz from a motor-generator (MG) set, which converts 480 V at 60 Hz to 208 V at 400 Hz. Each installation has two MG sets, one of which is available as a backup.
VI. SYSTEM COOLING
The CRAY-1S cooling system is designed to limit the IC die temperature to a maximum of 65°C. This provides a reliability margin below the 150°C absolute maximum IC junction temperature. The IC package case is maintained at 54°C. Heat generated in the silicon die flows through the IC package to the PC board ground plane and then to the 0.08-in thick copper cold plate. The cold plate conducts the board heat to its edges, which are held at 25°C by contact with a cast aluminum cold bar. The aluminum cold bars form the twelve vertical columns in the computer mainframe, into which the modules slide horizontally on 0.4-in spacings. A refrigerant, Freon 22, flows through stainless steel tubes embedded in the cold bars. The development of the composite aluminum/stainless steel cold bars solved one of the more difficult design problems of the CRAY-1. Cast aluminum is porous, and oil mixed with the Freon could seep through it onto the modules and cause reliability problems. A method of bonding stainless steel tubing into cast aluminum had to be invented to make the cold bars practical.
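The temperatures above form a simple series stack from die to cold bar. This Python sketch computes the drop across each stage and the margin below the absolute maximum, using only values from the text. The last line is an inference, not a figure from the paper: assuming all 120 mW of a package leaves through the package case, it gives the largest die-to-case thermal resistance that still keeps the die at or below 65°C.

```python
# Temperatures from the text, in degrees C.
T_JUNCTION_ABS_MAX = 150.0
T_DIE_MAX = 65.0
T_CASE = 54.0
T_COLD_BAR = 25.0
PACKAGE_POWER_W = 0.120     # two 60-mW gates per 5/4 package

die_to_case_c = T_DIE_MAX - T_CASE              # 11 C across the package
case_to_cold_bar_c = T_CASE - T_COLD_BAR        # 29 C through board and cold plate
margin_c = T_JUNCTION_ABS_MAX - T_DIE_MAX       # 85 C below the absolute maximum
# Assumption: all package power flows die -> case, so theta = delta-T / P.
theta_die_case_max = die_to_case_c / PACKAGE_POWER_W   # about 92 C/W

if __name__ == "__main__":
    print(f"die->case {die_to_case_c} C, case->cold bar {case_to_cold_bar_c} C")
    print(f"margin {margin_c} C, max die-to-case resistance {theta_die_case_max:.0f} C/W")
```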
The refrigerant is maintained at 18.5°C by an evaporative refrigeration system. Freon 22, which boils at -41°C at atmospheric pressure, absorbs heat from the cold bar and changes to the gas phase. It passes through a compressor and condenses back to a liquid by releasing heat to a cold water supply which

IEEE TRANSACTIONS ON COMPONENTS, HYBRIDS, AND MANUFACTURING TECHNOLOGY, VOL. CHMT-4, NO. 2, JUNE 1981

Large numbers of gates per chip, as in LSI, will cause total chip power to be well over 1 W, and the requirement for dense packaging for short wire delays will cause a heat density

can cause severe delay and waveform degradation. A package can add 1 pF or more to the gate input capacitance, which adds to signal delay at a rate of about 30 ps/pF.

Improved circuit technology gives a direct benefit to computer speed. Table IV gives some rough numbers of circuit performance for comparison purposes. Silicon ECL may reach a delay limit in the low hundreds of picoseconds, and a power-delay limit in the low picojoule range. Significant improvement is gained by switching to high-mobility materials such as gallium arsenide or indium phosphide. Technical problems need to be solved, however: processing of these materials is still in the early stages. Gates have been fabricated using GaAs transistors, but problems exist with device threshold voltage uniformity and with limited current drive due to input saturation, as compared to silicon bipolar or MOS devices. An insulated gate structure would alleviate the current drive

ACKNOWLEDGMENT

The author wishes to thank all the Cray personnel whose efforts and contributions made the CRAY-1, and therefore this paper, possible.
