-- Peripherals - the quality and range of peripherals are still much better on the central computers.
As this argument continues, there is a movement to let you have your cake and eat it too! Good networking of heterogeneous systems is at last within sight as de facto standards, such as NFS (Network File System, for distributed file access), X-Windows as a windowing standard and PostScript as a standard output language for graphics, are being rapidly adopted by many suppliers. If all goes as planned, file systems will be accessed using NFS. Applications will talk to windowing systems on workstations using X-Windows. All printers will accept PostScript format to print both text and graphics, enabling users to print files wherever they want. The progress in this area has been due to the various manufacturers coming to a consensus on the directions to follow. No longer do they wait for the standards organisations; they prefer to follow de facto but functional standards now.
Finally, the debate is taking on larger proportions as the personal workstation manufacturers are announcing new machines of 10, 20, 30 and even 40 Mips of power. They are incorporating new RISC (Reduced Instruction Set Computer) processor chips with 10 Mips power each into multi-processor architectures. We are certainly looking at an exciting future.
The title of this report could also have been "Trends in Supercomputing". The reason is that powerful parallel vector computers can deliver a high computational performance only if the application software is adapted to their architectures. It becomes more and more evident that to solve the biggest problems, the software must be adapted to the computer architecture and vice versa.
EPS CPG

Present Situation
Supercomputer is a designation given to about 300 computers installed worldwide with a peak computing power of over 100 megaflops (million floating point operations per second). These machines have mainly been used for numerical experimentation in various scientific domains such as fluid dynamics, structural mechanics, seismic explorations, reservoir modelling, quantum mechanics, plasma physics and materials science. It is also believed that portfolio analysis and financial transaction business in banking will, in the near future, be executed on computers having many powerful processors and memories of the order of a few gigabytes.
Specific characteristics of supercomputers are high peak computing power achieved by very short clock periods (4.1 ns for a CRAY 2), parallel processors (65 536 in the highly parallel Connection Machine), pipeline architectures (instructions are divided into subinstructions which can all be in execution at the same time, so working like an assembly line) leading to up to two operations (1 add + 1 multiply) per clock period, large memories (more than 2 gigabytes for CRAY 2 and ETA 10) and very rapid connections to the outside world. The peak power and the memory space can be up to three orders of magnitude bigger than for workstations. The major flaws of these computers are that their operating systems and their mode of access do not yet match modern standards. In addition, to benefit most from the high computing power, it is necessary to formulate an application in such a way that at least 90% of all the computations are executed as vector operations. Unfortunately, this is not yet the case for most of the applications now running on supercomputers. This further implies that non-vectorizable organizational work such as editing, interactive graphics and documentation, data handling and communication tasks should be taken care of by a user-friendly, highly interactive personal workstation directly linked to the supercomputer via a high speed fibre optic network. At the moment, however, most of the supercomputers are accessible only through mainframes, thus reducing their attractiveness.
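Why the vectorized fraction matters so much can be illustrated with Amdahl's law, a standard model that the text itself does not name. The sketch below is in Python and rests on assumptions: the function name is hypothetical, and the factor of 10 by which vector operations are taken to run faster than scalar ones is an illustrative value, not a figure from this report.

```python
def amdahl_speedup(vector_fraction: float, vector_gain: float) -> float:
    """Overall speedup when `vector_fraction` of the work runs
    `vector_gain` times faster and the rest runs at scalar speed."""
    if not 0.0 <= vector_fraction <= 1.0:
        raise ValueError("vector_fraction must lie between 0 and 1")
    scalar_fraction = 1.0 - vector_fraction
    return 1.0 / (scalar_fraction + vector_fraction / vector_gain)


if __name__ == "__main__":
    # Assumed vector gain of 10; only the fractions vary.
    for fraction in (0.5, 0.9, 0.99):
        speedup = amdahl_speedup(fraction, 10.0)
        print(f"{fraction:.0%} vectorized -> overall speedup {speedup:.2f}")
```

Under these assumptions, even a code that is 90% vectorized reaches only about half of the assumed vector gain, because the remaining scalar part dominates the run time.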
Scientists' Expectations
A scientist or engineer would like to be able to solve numerically the most realistic physical model described, for instance, by a set of partial differential equations (PDEs). As an example, one would like to compute the time evolution of the solution of six coupled nonlinear PDEs in three dimensions. For a discrete approach (finite elements, finite differences or finite volumes) a discretization of 100 x 100 x 100 intervals is today considered necessary to obtain physically relevant results. To reduce the number of time steps, implicit methods should be used; hence the necessity of very efficient iterative solvers based upon multigrid or preconditioned conjugate gradient methods. To make the best use of the parallel architectures of future supercomputers, algorithms and programming techniques leading to codes with coarse parallel granularity should be adopted. High parallelization at the subroutine level can be obtained by a decomposition of the geometrical domain into subdomains, each subdomain being assigned to one processor. To handle the connectivities between subdomains, fast networks built around high speed buses, global memories or direct connections between nearest processors are needed. An estimate for three-dimensional simulation programs shows the necessity for an effective computing power of 10000 Mflops (so that one run does not take more than one hour of CPU time) and a memory space of eight gigabytes (all the matrix elements are in memory, thus excluding input/output operations, which reduces the turn-around time and the memory occupancy).
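The orders of magnitude quoted above can be checked with a few lines of arithmetic. The Python sketch below uses the figures from the text; the 64-bit word size, the decimal gigabyte and the treatment of 100 intervals as 100 grid points per axis are assumptions made only for this illustration, and all names are hypothetical.

```python
GRID_POINTS_PER_AXIS = 100   # from the text (100 x 100 x 100), taken as points per axis
COUPLED_EQUATIONS = 6        # from the text: six coupled nonlinear PDEs
EFFECTIVE_MFLOPS = 10_000    # from the text
RUN_TIME_SECONDS = 3600      # from the text: one hour of CPU time
MEMORY_BYTES = 8 * 10**9     # from the text: eight gigabytes (decimal, assumed)
BYTES_PER_WORD = 8           # assumption: 64-bit floating point words


def unknowns() -> int:
    """Number of unknowns per time step: one value per equation per grid point."""
    return COUPLED_EQUATIONS * GRID_POINTS_PER_AXIS ** 3


def operations_per_run() -> float:
    """Floating point operations available in one run at the quoted rate."""
    return EFFECTIVE_MFLOPS * 1e6 * RUN_TIME_SECONDS


def words_in_memory() -> int:
    """Number of floating point words that fit into the quoted memory."""
    return MEMORY_BYTES // BYTES_PER_WORD


def words_per_unknown() -> float:
    """Memory words available per unknown, e.g. for matrix elements."""
    return words_in_memory() / unknowns()


if __name__ == "__main__":
    print("unknowns:", unknowns())
    print("operations per run:", operations_per_run())
    print("words in memory:", words_in_memory())
    print("words per unknown:", round(words_per_unknown(), 1))
```

Under these assumptions the memory holds on the order of a hundred matrix elements per unknown, which is the kind of budget a sparse matrix from a three-dimensional discretization of coupled equations would need.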
This expectation of what one can call a "realistic case" is the current estimate of what one will be able to compute in the next five to ten years. However, if more physics is included in the model, i.e. more physical quantities influence the result, the number of PDEs will increase. The solution will then show more detail and, as a consequence, the number of time steps will increase. In addition, fine structures in the result will need finer spatial resolution. In the end one will require still more powerful computers. We hope that they will exist. What will they look like? Let us try to answer this question.
Future Supercomputers
The performance of today's supercomputers can be increased in different ways. They are:
(a) Reduction of the cycle time
Within 4.1 ns up to two floating point operations can be computed per processor in a CRAY 2. During the same time an electrical or a light signal travels about 1.2 m in a cable. This means that the physical size of the computer starts to impose limits on the cycle times. Reductions of the cycle times have been realized by replacing silicon by GaAs technology (CRAY 3, ready in 1989) or by cooling the electronic components to liquid nitrogen temperature (ETA 10, already available). For the near future, it is believed that the discovery of high temperature superconductors will lead to high speed circuits based upon the Josephson effect. These circuits will have cycle times far below 1 ns. One can expect that by the end of the century, circuits in the 100 ps (10⁻¹⁰ s) range will be brought to the market. Thus in the next 30 years one can expect that the cycle time will be reduced by another factor of 100.
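The link between cycle time and machine size follows from a one-line calculation. The Python sketch below uses the vacuum speed of light as an upper bound; real signals in a cable are somewhat slower, so the distances are optimistic. The function name is hypothetical.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # defined SI value; an upper bound for any signal


def max_signal_distance_m(cycle_time_s: float) -> float:
    """Largest distance a signal can cover within one clock period."""
    return SPEED_OF_LIGHT_M_PER_S * cycle_time_s


if __name__ == "__main__":
    # Cycle times mentioned in the text: 4.1 ns, 1 ns and 100 ps.
    for label, cycle_time in (("4.1 ns", 4.1e-9), ("1 ns", 1e-9), ("100 ps", 100e-12)):
        distance_cm = 100.0 * max_signal_distance_m(cycle_time)
        print(f"{label}: at most {distance_cm:.1f} cm per cycle")
```

At 100 ps the bound shrinks to about 3 cm, which indicates how compact the critical signal paths of such a machine would have to be.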
(b) Multipipelining and functional units
Before the advent of the CDC 6600 in the sixties, operations were performed strictly sequentially. At a given time, only one operation or one instruction interpretation step was in execution. The CDC 6600 introduced stacking. Not only does the interpretation of one instruction overlap with an execution step, but it is also possible to activate all the processing units at once. This way of making better use of the available hardware was the first step towards pipelining. The step was further advanced on the CRAY 1. In this machine, the fundamental operations such as add or multiply are sliced into suboperations which can all be active when working on a vector instruction. The load/store unit, the branching processor, the scalar unit and the instruction interpreter can be simultaneously active as well.
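The gain from slicing an operation into suboperations can be estimated with a simple cycle count. The Python sketch below is an idealized model, not a description of any particular machine: it assumes that every suboperation takes exactly one clock period and that the pipeline never stalls. The stage count of 6 and the vector length of 64 are illustrative values, and the function names are hypothetical.

```python
def sequential_cycles(n_operands: int, n_stages: int) -> int:
    """Cycles needed when each operation must finish all of its
    suboperations before the next operation may start."""
    return n_operands * n_stages


def pipelined_cycles(n_operands: int, n_stages: int) -> int:
    """Cycles needed when a new operation enters the pipeline every
    cycle: n_stages cycles to fill it, then one result per cycle."""
    if n_operands == 0:
        return 0
    return n_stages + n_operands - 1


if __name__ == "__main__":
    n_operands, n_stages = 64, 6  # illustrative values only
    slow = sequential_cycles(n_operands, n_stages)
    fast = pipelined_cycles(n_operands, n_stages)
    print(f"sequential: {slow} cycles, pipelined: {fast} cycles, "
          f"ratio: {slow / fast:.2f}")
```

For long vectors the ratio approaches the number of stages, which is why pipelining pays off mainly on vector instructions.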
A new type of computer (TRACE 7 of Multiflow) has recently appeared on the market and another has been announced (CHoPP). In these computers one tries to make maximal use of the pipelining concept by combining a set of instructions into one super instruction. The format of these long super instructions offers a multiple compute and store and branch pipelined sequence every cycle. In order to avoid memory conflicts the LOAD operation is executed in broadcast mode. In the TRACE computer the most probable path is followed in the case of a branch. The compiler provides for compensation code to be able to undo the wrong path and to take the right one afterwards.
(c) Parallelization
Parallelization is another way of increasing computing power. Different types of parallel architectures are proposed. In the synchronous approach all the processors perform the same instruction at a given time, whereas in the asynchronous one, each processor executes its own program. The biggest problem in these parallel computers is the intercommunication between processors. This can be done by direct local connections as in hypercubes (where each processor has its own local memory) or by a global memory as in the CRAYs. In the ETA 10 architecture each processor has its own local memory but communicates with the others through a global memory.
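The direct local connections of a hypercube can be made concrete with the usual binary labelling, in which two processors are linked when their numbers differ in exactly one bit. The Python sketch below illustrates that general scheme only and is not the wiring of any specific machine; the function names are hypothetical.

```python
from typing import List


def hypercube_neighbors(node: int, dimension: int) -> List[int]:
    """Direct neighbours of `node` in a hypercube with 2**dimension
    processors: flip each of the `dimension` address bits in turn."""
    if not 0 <= node < 2 ** dimension:
        raise ValueError("node number outside the hypercube")
    return [node ^ (1 << bit) for bit in range(dimension)]


def hops_between(node_a: int, node_b: int) -> int:
    """Minimum number of links a message must cross: the number of
    address bits in which the two node numbers differ."""
    return bin(node_a ^ node_b).count("1")


if __name__ == "__main__":
    print(hypercube_neighbors(5, 3))   # neighbours of node 5 in an 8-processor cube
    print(hops_between(0, 7))          # opposite corners of the same cube
```

Each processor needs only as many links as the cube has dimensions, and no message has to cross more links than that, which keeps the wiring manageable as the number of processors doubles.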
Parallelization can be obtained by splitting a program into a number of independent tasks, each of which is executed at the same time by a different processor (this mode is called multitasking), or by decomposing a DO loop into small tasks (called microtasking). The first case must be programmed by the user, whereas the latter is handled by the compiler. One could imagine that in future architectures, the tasks will be given to different types of processor, each one optimal for the task. Vector code would automatically be sent to the special vector processor, scalar code to the special scalar part, graphics to graphics machines etc. This would guarantee a better use of the computer hardware.
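The decomposition of a loop into independent tasks can be sketched in a few lines. The Python example below splits the iterations of a loop into contiguous chunks and hands each chunk to a worker from the standard `concurrent.futures` module. It shows the bookkeeping only: all function names are hypothetical, the sum of squares merely stands in for real work, and in standard CPython the threads do not speed up such a compute-bound loop.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def split_range(n_iterations: int, n_tasks: int) -> List[Tuple[int, int]]:
    """Split range(n_iterations) into n_tasks contiguous chunks whose
    lengths differ by at most one iteration."""
    base, extra = divmod(n_iterations, n_tasks)
    chunks = []
    start = 0
    for task in range(n_tasks):
        stop = start + base + (1 if task < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def partial_sum_of_squares(bounds: Tuple[int, int]) -> int:
    """Work of one task: the loop body applied to its own chunk."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def parallel_sum_of_squares(n_iterations: int, n_tasks: int) -> int:
    """Run one task per chunk and combine the partial results."""
    chunks = split_range(n_iterations, n_tasks)
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        return sum(pool.map(partial_sum_of_squares, chunks))


if __name__ == "__main__":
    print(split_range(10, 3))
    print(parallel_sum_of_squares(100, 4))
```

The tasks are independent because each one touches only its own chunk; the single point of communication is the final sum of the partial results.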
(d) Software improvement
One of the major causes of inefficiency in supercomputers is that the software is not optimized for the specific architecture of the machine. Users usually would like to continue using an old code developed on conventional computers. These sequential codes are often not vectorizable since logical paths are followed step by step where one step consists of one single operation. This has been done to minimize the number of floating point operations. Such an organization is opposed to the concept of high level parallelization and medium level vectorization. In addition, a compiler often cannot improve the efficiency of such codes. Substantial parts of a program remain unoptimized.
The compilers (FORTRAN, PASCAL, C) available at the moment are not always at the level the user would like them to be. At best the innermost loops can be optimized. Many memory transfers are necessary and, as a consequence, the maximum speed is relatively low. More sophisticated compilers and better designed languages will hopefully improve this situation.
It is believed that in the near future the so-called "Cancer Codes", which have grown in a more or less uncontrolled way into monsters of several hundred thousand lines of code, will be rewritten to benefit from parallel vector architectures and from huge central memories.
Outlook
One can see that there are opportunities to improve the computational power of present supercomputers by many orders of magnitude. The ETA 10 computer will soon outperform a CRAY 1 by a factor of 30 and by 1989, CRAY 3 and NEC SX3 computers will be 100 times faster. These machines incorporate more and more parallelism and offer memories of up to 8 gigabytes (CRAY 3).
The announcement by Seymour Cray of 64-processor machines for the early nineties shows that the classical supercomputer manufacturers' trend is towards high parallelism. They will in the future benefit from the developments now under way on highly parallel architectures. With these interesting experimental machines one can study problems of organization of the data flow either through a global memory or by an interprocessor communication system. However, a pragmatic user of supercomputers is still reluctant to invest too much of his time to adapt a program to an experimental architecture.
