The RELACS language is a systolic programming language that simplifies the programmer's task by making the data flow of systolic algorithms explicit and by exposing the data delivery mechanism. The underlying architecture model differs from other SIMD architectures in that it physically separates computation and data management. We introduce the RELACS language as a syntactic and semantic extension of the C language. We show that the RELACS programming model provides a simple programming method for systolic algorithms, which is applicable to a variety of parallel machines.
Introduction
Since the introduction of the systolic architecture concept by Kung and Leiserson in 1978, numerous algorithms and designs have been proposed to solve compute-bound problems. Programmable parallel machines are now available to develop, test and execute systolic algorithms. However, programming such machines correctly is a difficult task: both the computations within the systolic cells and the systolic data transfers between cells must be specified. Describing the computational part of the algorithm is relatively easy, but the communication and control management of the array raises subtle issues: synchronization protocols, deadlock avoidance, I/O interfaces, and data partitioning.
To bridge the gap between systolic algorithms and parallel architectures, we propose a programming language called RELACS. This language captures the essential features of systolic machines in a simple programming model and allows the programmer to make use of the capabilities of the underlying target architecture. This paper is a quick review of the RELACS programming environment. Further explanations and details are given in an extended version of this paper [1].
The RELACS language
As the control of a systolic computer is not radically different from that of a conventional computer, we chose to extend the C language, a well-known and efficient compiled language.
The programming model. We have retained a programming model that is, at first sight, close to data-parallel. The programmer sees the systolic machine as a programmable accelerator connected to a general-purpose workstation. This accelerator appears as a SIMD network composed of a linear array of identical, conventional processors that communicate synchronously with their nearest neighbours. Only the two end processors are linked to the host (see figure 1). The host broadcasts the same instruction to each processor, which executes the instruction on its own data. The user writes a single source program in RELACS, from which the compiler generates code for execution both on the host and on each cell of the network. The partitioning is done explicitly by data types and is described below. The compiler handles the details of communication protocols and generates the code for the target machine. Efficiency issues are discussed in [2].
It is necessary to distinguish between host variables (scalar variables) and those located on the systolic array (systolic variables). For this, we have extended the C language storage class set. The RELACS language defines a new storage class specifier, the systolic class, that is used to reserve variables on each processor of the network. The default class, the static class, specifies a scalar variable residing on the host.
An expression operating on systolic variables produces a synchronous execution of the corresponding operations on each processor of the array. An expression operating on scalar variables performs the corresponding operations only on the host. All variables involved in an expression must be of the same class. Exchanges between systolic and scalar variables occur only during communication operations.
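The distinction between the two classes can be emulated on a sequential machine. The following C sketch is an illustration only, not RELACS syntax or compiler output: it assumes that a systolic variable is represented as an array with one element per cell, and the names systolic_add, scalar_add and n_cells are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Emulates a systolic expression such as "z = x + y": every cell
   applies the same instruction to its own copy of the operands.
   Arrays x, y, z hold one element per cell (assumed layout). */
static void systolic_add(int *z, const int *x, const int *y, size_t n_cells)
{
    assert(z != NULL && x != NULL && y != NULL);
    for (size_t i = 0; i < n_cells; i++)
        z[i] = x[i] + y[i];
}

/* A scalar expression involves host variables only and is
   evaluated once, on the host. */
static int scalar_add(int a, int b)
{
    return a + b;
}
```

In this emulation the loop stands for the lock-step execution of one broadcast instruction; there is deliberately no function mixing a per-cell array with a host scalar, since the two classes may not appear in the same expression.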
Control structures. The flow of control is sequential, and a classical block structure is provided. The RELACS language offers the same iterative and conditional control structures as the C language. However, the SIMD execution model, which requires that all the processors receive the same instruction, complicates the treatment of the conditional jump. A local condition on SIMD machines leads to a sequentialization of the execution of the two code branches, and so to a reduction of array processor efficiency. We chose to support only the simpler but much more efficient SIMD model which does [...] On the other hand, it appears useful to us to have a more limited local control, for example to realize a maximum operation in each processor of the array. A conditional assignment instruction has been introduced in our programming model to provide for this local test without requiring one sequencer per cell. The basic form of a local test is written c ? a : b, and returns a if c is true, otherwise b. This instruction is the only one where a systolic class condition is allowed. In C, such an instruction has the same meaning as the conditional instruction if (cond) exp1; else exp2; where only one expression is evaluated according to the value of cond. In RELACS, both members are always evaluated.
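The per-cell maximum mentioned above can be sketched in plain C as follows. This is an emulation under our own assumptions (one array element per cell, hypothetical function name systolic_max), not the code generated by the RELACS compiler. It makes the "both members are always evaluated" rule visible: each cell computes both candidate values, and the local condition only selects which one is stored.

```c
#include <assert.h>
#include <stddef.h>

/* Emulates the local test "r = (x > y) ? x : y" on systolic operands.
   Every cell executes the same instruction stream; no cell branches. */
static void systolic_max(int *r, const int *x, const int *y, size_t n_cells)
{
    assert(r != NULL && x != NULL && y != NULL);
    for (size_t i = 0; i < n_cells; i++) {
        int c = x[i] > y[i];  /* local condition, may differ per cell */
        int a = x[i];         /* first member, always evaluated */
        int b = y[i];         /* second member, always evaluated */
        r[i] = c ? a : b;     /* selection only, no change of control flow */
    }
}
```

Because both members are evaluated in every cell, a member with side effects would behave differently here than in a standard C conditional expression, which evaluates only one of the two.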
Communications. In systolic architectures, data transfers between processors are very important. Special care is devoted to this I/O mechanism in the RELACS language. New operators match the hardware architecture and express the tight coupling between neighbouring cells.
Our programming model assumes a SIMD execution mode and synchronous communications. Correct use of this model implies that the emission of a value by a processor in one direction is followed by the reception of the data sent by the neighbouring processor located in the opposite direction. This sequence of operations is expressed by assignment operators acting on systolic class variables:
• Left assignment operator, example: x =< y. Each processor sends a value to the left and receives one from the right (figure 2(a)). The rightmost processor does not change the content of its variable x.
• Right assignment operator, example: x => y. Each processor sends a value to the right and receives one from the left (figure 2(b)). The leftmost processor does not change the content of its variable x.
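The effect of the two operators can be emulated sequentially. The C sketch below is only an illustration: it assumes that cell 0 is the leftmost cell, that the value sent by each cell is its copy of y, and that the received value is stored in x. The function names shift_left and shift_right are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Emulates "x =< y": each cell stores in x the value of y held by
   its right neighbour; the rightmost cell keeps x unchanged.
   Ascending order keeps the result correct when x and y alias. */
static void shift_left(int *x, const int *y, size_t n_cells)
{
    assert(n_cells > 0);
    for (size_t i = 0; i + 1 < n_cells; i++)
        x[i] = y[i + 1];
}

/* Emulates "x => y": each cell stores in x the value of y held by
   its left neighbour; the leftmost cell keeps x unchanged.
   Descending order keeps the result correct when x and y alias. */
static void shift_right(int *x, const int *y, size_t n_cells)
{
    assert(n_cells > 0);
    for (size_t i = n_cells - 1; i > 0; i--)
        x[i] = y[i - 1];
}
```

The loop direction matters only for the sequential emulation; on the array itself all cells exchange their values in the same synchronous step.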
The global effect of these operators is a shift by one of the network variables, which reflects the data flow characteristic of systolic algorithms. It is quite natural to extend the range of these operators in order to include the communication between the network and the host. Two optional parameters to systolic assignments make this extension possible:
• On the left assignment operator: x : A =< y : B. The rightmost processor receives the value of B sent by the host. The leftmost processor sends its variable x to the host, which stores it in A (figure 2(c)).
• On the right assignment operator: x : A => y : B. The leftmost processor receives the value of B sent by the host. The rightmost processor sends its variable x to the host, which stores it in A (figure 2(d)).
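The host-extended forms can be emulated in the same way. To stay within what the text states, the C sketch below restricts itself to the case where both systolic operands are the same variable, i.e. x : A =< x : B and x : A => x : B, and assumes that A receives the value held by the end cell before the shift. The function names and the choice of returning A as the function result are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Emulates "x : A =< x : B": the leftmost cell's x goes to the host
   (returned as A), every cell takes its right neighbour's x, and the
   rightmost cell receives the host value b. */
static int shift_left_host(int *x, size_t n_cells, int b)
{
    assert(n_cells > 0);
    int a = x[0];
    for (size_t i = 0; i + 1 < n_cells; i++)
        x[i] = x[i + 1];
    x[n_cells - 1] = b;
    return a;
}

/* Emulates "x : A => x : B": the rightmost cell's x goes to the host
   (returned as A), every cell takes its left neighbour's x, and the
   leftmost cell receives the host value b. */
static int shift_right_host(int *x, size_t n_cells, int b)
{
    assert(n_cells > 0);
    int a = x[n_cells - 1];
    for (size_t i = n_cells - 1; i > 0; i--)
        x[i] = x[i - 1];
    x[0] = b;
    return a;
}
```

Calling one of these functions repeatedly from a host loop streams a sequence of values through the array, with one value entering at one end and one leaving at the other on each step.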
The introduction of these new operators frees the programmer from the synchronization details during communication phases. The RELACS compiler translates the source program into several concurrent C programs corresponding to the computation task and to the I/O management task of a systolic algorithm. The compiling method employed allows RELACS to produce efficient code for a variety of machines, including SIMD and MIMD parallel machines. The current RELACS environment includes an operational compiler for the following parallel machines: iPSC/2, ARMEN [3] and iWARP. The ability to compile parallel programs for sequential machines is of utmost importance for algorithm development and debugging. It allows deterministic execution, global state observation, and program profiling. The compilation of a RELACS program for a sequential machine produces a single C program, which is also exploited by a custom graphic debugging tool.
Among other applications developed with RELACS, matrix computations, video coding and string processing have been coded [4, 5]. Current research involves the automatic partitioning of systolic RELACS programs on regular structures.
