interesting property since each primitive program is well defined and induces a specific architecture.
The paper presents an investigation we have performed using the Perle-1 FPGA board [2] [3] developed by the DEC Paris Research Lab. In order to show the validity of our approach, we manually implement the prime factorization algorithm.
The paper is organized as follows: section 2 gives a short presentation of the Gamma formalism; section 3 introduces the set of primitive programs (tropes); section 4 presents the implementation of the prime factorization problem on the Perle-1 board.
The Gamma Programming Paradigm
The Gamma model can be described as a multiset transformer: the computation is a succession of applications of rules which consume elements of the multiset and produce new elements. The computation ends when no rule can be applied. The application of rules is performed in a non-deterministic way.
The basic information structuring facility is the multiset. A multiset is similar to a set except that it may contain multiple occurrences of the same element. Atomic components of multisets may be of type real, character, integer or tuple of arbitrary type.
The main feature of the model is the Γ operator, which can be defined in the following way:
The operator R is called the "reaction condition"; it is a boolean function indicating under which conditions some elements of the multiset can react. The A function ("action") describes the result of this reaction. We point out that if the reaction condition holds for several subsets at the same time, the choice made among them is not deterministic; if these subsets are disjoint, the reactions can even take place at the same time.
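As an illustration of the behaviour just described, the following Python sketch models a single (reaction condition, action) couple applied until the multiset is stable. It is a software model only, not the hardware discussed in this paper; the function name `gamma`, the explicit `arity` parameter and the use of a random choice to mimic non-determinism are assumptions made for the example.

```python
import random
from collections import Counter
from itertools import permutations

def gamma(reaction, action, arity, multiset, rng=random):
    """Apply (reaction, action) until no tuple of `arity` elements can react."""
    pool = list(multiset)
    while True:
        # Every ordered tuple of distinct positions whose elements satisfy R.
        candidates = [idx for idx in permutations(range(len(pool)), arity)
                      if reaction(*(pool[i] for i in idx))]
        if not candidates:
            return Counter(pool)             # stable state: no rule applies
        chosen = rng.choice(candidates)      # non-deterministic choice
        consumed = [pool[i] for i in chosen]
        pool = [x for i, x in enumerate(pool) if i not in chosen]
        pool.extend(action(*consumed))       # products join the multiset

# Example: the maximum of a multiset, where the smaller element of a pair is consumed.
largest = gamma(lambda x, y: x <= y, lambda x, y: [y], 2, [3, 1, 4, 1, 5])
```

Whatever order the reactions are chosen in, this particular example always stabilizes on the multiset containing only the largest element.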
Let us take one example to illustrate the programming style entailed by the Gamma model. The sieve of Eratosthenes can be written as follows:
The following figure describes the computation of sieve(8). The lines between elements indicate reactions. Of course, this is one among the possible paths leading to the stable state.
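The sieve can be simulated in a few lines of Python. The sketch below assumes the usual Gamma formulation of the sieve, in which two elements x and y react when x is a multiple of y and the action keeps only y; this rule is an assumption of the example, and the function name `sieve` and the `seed` parameter are illustrative.

```python
import random

def sieve(n, seed=None):
    """Simulate the Gamma sieve on the multiset {2, ..., n}."""
    rng = random.Random(seed)
    pool = list(range(2, n + 1))
    while True:
        # Pairs (x, y) of distinct elements where x is a multiple of y.
        pairs = [(x, y) for x in pool for y in pool if x != y and x % y == 0]
        if not pairs:
            return sorted(pool)       # stable state: only primes remain
        x, _ = rng.choice(pairs)      # one possible reaction path
        pool.remove(x)                # the action keeps y and drops x

primes_up_to_8 = sieve(8)
```

Different seeds follow different reaction paths, but all of them reach the same stable multiset.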
The Gamma model presented above is not the most general definition; actually, the Γ operator can take any number of couples (Reaction, Action), each reaction condition indicating in which case the associated action can be applied. As an example, consider the Fibonacci computation. It can be expressed as follows: fib(n) = Γ(R3,A3)(Γ((R2,A2),(R1,A1))({n})) where
The initial number n is decomposed into a number of ones which are then summed up to produce the result. The couples of Reaction/Action (R2,A2) and (R1,A1) work in parallel. Once the multiset is stable (i.e. no more reactions occur), the next Reaction/Action (R3,A3) is performed.
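The two-stage structure of this program can be sketched in Python as follows. The rule bodies are assumptions chosen to match the description above (R1/A1 splits x > 1 into x-1 and x-2, R2/A2 turns 0 into 1, R3/A3 adds two elements); the helper `gamma` and its rule encoding are likewise illustrative.

```python
import random

def gamma(rules, multiset, rng):
    """rules: list of (arity, reaction, action); runs until no rule can fire."""
    pool = list(multiset)
    while True:
        enabled = []  # (indices, action) for every rule instance that can fire
        for arity, reaction, action in rules:
            if arity == 1:
                enabled += [((i,), action)
                            for i, x in enumerate(pool) if reaction(x)]
            else:
                enabled += [((i, j), action)
                            for i in range(len(pool)) for j in range(len(pool))
                            if i != j and reaction(pool[i], pool[j])]
        if not enabled:
            return pool
        indices, action = rng.choice(enabled)   # non-deterministic choice
        consumed = [pool[i] for i in indices]
        pool = [x for k, x in enumerate(pool) if k not in indices]
        pool.extend(action(*consumed))

def fib(n, seed=0):
    rng = random.Random(seed)
    r1 = (1, lambda x: x > 1, lambda x: [x - 1, x - 2])   # assumed (R1, A1)
    r2 = (1, lambda x: x == 0, lambda x: [1])             # assumed (R2, A2)
    r3 = (2, lambda x, y: True, lambda x, y: [x + y])     # assumed (R3, A3)
    ones = gamma([r1, r2], [n], rng)    # first stage: decompose n into ones
    (result,) = gamma([r3], ones, rng)  # second stage: sum the ones
    return result
```

With these assumed rules the convention is fib(0) = fib(1) = 1.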
TROPES

Presentation
Tropes are a way of decomposing Gamma programs. They constitute a set of primitive program schemes which, together with just two basic combining forms, provide an expressive parallel programming language. The tropes take the form of parametrized conditional rewrite rules in which computation proceeds by nondeterministic local rewriting of a global multiset.
The following notation is used to denote multiset rewriting:
Five rewrite rules, called tropes, have been selected to provide a set of primitive programs:
They are defined in terms of multiset rewrites. As an example, the transmuter, the reducer and the expander can be expressed in the following way:
The transmuter applies the same operation to all the elements of the multiset until no element satisfies the condition C. The reducer decreases the size of the multiset by applying a function to pairs of elements satisfying the condition C. The expander decomposes the elements of the multiset into a collection of basic values. The interested reader can find a complete description of the tropes in [4].
The examples taken in the previous section can both be expressed as a combination of tropes. The sieve of Eratosthenes is simply a reducer:
The Fibonacci computation is a combination of three tropes P1, P2 and P3 (expander, transmuter, reducer):
Note that this program can also be expressed using only the sequential combination operator:
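The three tropes discussed above can be modelled as higher-order functions. The sketch below assumes the rule shapes x → f(x) ⇐ C(x) for the transmuter, x, y → f(x, y) ⇐ C(x, y) for the reducer and x → f1(x), f2(x) ⇐ C(x) for the expander, and it uses a fixed "first applicable element" schedule, which is only one of the possible reaction orders. The helper `seq` and the concrete conditions and functions used for the sieve and for Fibonacci are illustrative assumptions.

```python
from functools import reduce
from operator import add

def transmuter(cond, f):
    def run(ms):
        ms = list(ms)
        while True:
            i = next((k for k, x in enumerate(ms) if cond(x)), None)
            if i is None:
                return ms                 # no element satisfies C
            ms[i] = f(ms[i])
    return run

def expander(cond, f1, f2):
    def run(ms):
        ms = list(ms)
        while True:
            i = next((k for k, x in enumerate(ms) if cond(x)), None)
            if i is None:
                return ms
            x = ms.pop(i)
            ms += [f1(x), f2(x)]          # one element becomes two
    return run

def reducer(cond, f):
    def run(ms):
        ms = list(ms)
        while True:
            pair = next(((i, j) for i in range(len(ms)) for j in range(len(ms))
                         if i != j and cond(ms[i], ms[j])), None)
            if pair is None:
                return ms
            merged = f(ms[pair[0]], ms[pair[1]])
            ms = [x for k, x in enumerate(ms) if k not in pair] + [merged]
    return run

def seq(*programs):
    """Sequential composition: seq(p3, p2, p1)(m) runs p1, then p2, then p3."""
    return lambda ms: reduce(lambda acc, p: p(acc), reversed(programs), ms)

# Sieve as a single reducer; Fibonacci as reducer after transmuter after expander.
sieve = reducer(lambda x, y: x % y == 0, lambda x, y: y)
fib = seq(reducer(lambda x, y: True, add),
          transmuter(lambda x: x == 0, lambda x: 1),
          expander(lambda x: x > 1, lambda x: x - 1, lambda x: x - 2))
```

For example, `fib([6])` stabilizes on a one-element multiset and `sieve(range(2, 21))` on the primes up to 20.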
Synthesis
From an architectural point of view, the decomposition into primitive programs provides an interesting way to simplify the synthesis of Gamma programs. As the tropes are well defined, a hardware skeleton may be associated with each of them.
The basic procedure for synthesizing a Gamma program is first to decompose it into tropes. This step is done manually. In other words, the programmer has to write a Gamma program using only a set of five primitives and two operators.
Once the tropes have been selected, the next step is to synthesize a corresponding architecture for each of them. Experiments have shown that the decomposition into tropes is not sufficient: depending on the types of the elements processed inside a trope, the architecture can be rather different. An element can have the type singleton, pair, triplet, etc., and be composed of integers, characters, etc. A pair of integers and a pair of characters have different types. As an example, consider the transmuter:
Remember that this trope applies the same operation to all the elements of the multiset until no element satisfies the condition C. Two cases may be observed: either the type of x is different from the type of f(x), or the types of x and f(x) are identical.
In the first case (type(x) ≠ type(f(x))), if an element x reacts, it will be transformed into y = f(x). As the type of y is different from the type of x, we are sure that it will never react again. A possible skeleton architecture is:
The boxes f and C stand respectively for the action and the reaction. The C box drives a multiplexer which, according to the computed condition, outputs x or f(x).
In the second case (type(x) = type(f(x))), if an element x reacts, f(x) must be tested since it may react again. A possible architecture is:
This second architecture is more complex and implies a more sophisticated control.
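The behavioural difference between the two skeletons can be sketched in software. The Python model below is an illustration only, not a description of the actual circuits: `transmute_once` mimics the multiplexer of the first case (each element passes through exactly once), while `transmute_feedback` mimics the second case by re-testing f(x) until the condition fails. Both function names and the example conditions are hypothetical.

```python
def transmute_once(cond, f, stream):
    """Case type(x) != type(f(x)): C selects between x and f(x), single pass."""
    return [f(x) if cond(x) else x for x in stream]

def transmute_feedback(cond, f, stream):
    """Case type(x) == type(f(x)): f(x) is fed back and tested again.

    The caller must supply a condition that eventually fails for every element.
    """
    out = []
    for x in stream:
        while cond(x):        # the result may react again
            x = f(x)
        out.append(x)
    return out

# First case: integers become pairs, which can never satisfy the condition again.
pairs = transmute_once(lambda x: isinstance(x, int), lambda x: (x, x * x), [2, 3])
# Second case: halve even integers until they become odd.
odds = transmute_feedback(lambda x: x % 2 == 0, lambda x: x // 2, [12, 7, 8])
```

The feedback loop is what makes the second skeleton's control more involved: the number of passes per element is data dependent.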
For each trope, different skeleton architectures are provided depending on the difference between the input and output types. The choice of the right skeleton can be made automatically by analyzing the properties of the function f.
The idea behind this refinement of the decomposition into tropes is to simplify the hardware mechanisms: the resulting design is faster and saves FPGA resources.
The final step is to assemble the tropes. 
Implementation
The hardware platform we use is the PRL-DEC Perle-1 board [2]. It is based on the PAM (Programmable Active Memory) concept [3]: like a RAM memory module, a PAM is attached to the system bus of a host computer. The processor can write into, and read from, the PAM. Being an active hardware coprocessor, however, the PAM processes data between write and read instructions. The specific processing is determined by the content of its configuration memory.
The Perle-1 board is built around a large array of bit-level configurable logic cells. This array is surrounded by local RAM banks used as a cache, a programmable clock generator and some addit,ional logic to manage the host bus interface.
The central computational array consists of a 4x4 matrix of Xilinx XC3090 programmable gate arrays [5]. Four 32-bit wide RAM banks (1 MByte) are provided on each side. The host bus interface is a TurboChannel interface delivering a 100 MBytes/s bandwidth.
Programming Perle-1 consists of describing an architecture using an object-oriented language (C++) and built-in primitive functions. More compact designs may be achieved by controlling the place and route process directly by software.
In order to manage efficiently a first implementation of tropes on Perle-1, we imposed some restrictions:
only one trope per FPGA, 16-bit datapath, use of the memory only if necessary.
As the matrix is a 4x4 matrix, a maximum of 16 tropes can be implemented. Actually, this is not really restrictive since the majority of Gamma programs can be written using very few tropes.
The next section presents an experiment we have performed in order to validate our approach.
Experiment
The goal of the experiment was twofold. First, we wanted to demonstrate that the Perle-1 board is a suitable platform to support the implementation of tropes. Second, we wanted to show that a machine derived from high-level specifications, such as Gamma, may have performance as good as a von Neumann machine executing a sequential program. As a validating example, we chose the prime factorization problem.
Prime Factorization Problem
A fundamental theorem of arithmetic states that every positive integer n can be written as a product of primes and that this decomposition is unique. This fact gives a one-to-one correspondence between positive integers and multisets of prime numbers: for example, if n = 120 = 2^3 * 3 * 5, the corresponding multiset is {2,2,2,3,5}.
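The correspondence can be checked on the example with a few lines of Python, using `collections.Counter` as a multiset. The helper name `to_integer` is illustrative.

```python
from collections import Counter
from math import prod

def to_integer(primes):
    """Map a multiset of primes back to the integer it factors."""
    return prod(primes.elements())

# The multiset {2, 2, 2, 3, 5}: each key is a prime, each count its multiplicity.
factors_of_120 = Counter({2: 3, 3: 1, 5: 1})
n = to_integer(factors_of_120)   # 2 * 2 * 2 * 3 * 5
```

The empty multiset maps to 1, the empty product.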
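The prime factorization Gamma program can be expressed as a combination of 6 primitive programs, F1 to F6. Among them is a rule on pairs of type 2, (a, b)2, (c, d)2 → (c, d)2 ⇐ multiple(a, c), which corresponds to the behaviour of F4 described below.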
The initial multiset is composed of a single pair, (2, n). Three different types of elements are present: pairs of type 1, noted (a, b)1, pairs of type 2, noted (a, b)2, and singletons.
The sequence (F2 o F1)({(2, n)1}) produces a multiset of pairs (i, n)2 where 2 ≤ i ≤ n. F3 removes all pairs (i, n)2 for which i does not divide n (condition noted ¬multiple(i, n)). After the execution of F4, the multiset contains the pairs (i, n)2 where i is a prime and i divides n. F5 produces the prime factors and F6 removes the remaining pairs. Each primitive program is then associated with a trope. The correspondence between the primitive programs and the tropes is done by analyzing the input/output types. As an example, the primitive program F1 is a particular expander which takes one pair as input.
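The staged behaviour described above can be mimicked in Python. This is a hypothetical reconstruction: the bodies given here for F1 to F6 are guesses that are merely consistent with the prose (in particular the way F5 peels factors off a pair), they are executed stage by stage rather than as true multiset rewriting, and the function name `gamma_factorize` is illustrative.

```python
def gamma_factorize(n):
    """Stage-by-stage model of the six primitive programs (assumed bodies)."""
    # F1 then F2 (assumed): from (2, n) build the candidate pairs (i, n), 2 <= i <= n.
    pairs = [(i, n) for i in range(2, n + 1)]
    # F3: drop the pairs whose first component does not divide n.
    pairs = [(i, m) for (i, m) in pairs if m % i == 0]
    # F4: drop (a, b) whenever a is a multiple of the first component c of another pair.
    pairs = [(a, b) for (a, b) in pairs
             if not any(a != c and a % c == 0 for (c, _) in pairs)]
    # F5 (assumed): emit the singleton p and replace (p, m) by (p, m / p) while p divides m.
    factors, leftovers = [], []
    for p, m in pairs:
        while m % p == 0:
            factors.append(p)
            m //= p
        leftovers.append((p, m))
    # F6: the leftover pairs are discarded, so only singletons remain.
    return sorted(factors)

example = gamma_factorize(120)
```

In this model the intermediate multiset after F4 for n = 120 holds the pairs (2, 120), (3, 120) and (5, 120).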
Perle-1 Implementation
Figure 4.2 shows the implementation done on the Perle-1 board. The tropes are pipelined from the input FIFO to the output FIFO. Each FPGA contains one trope, except for the trope corresponding to F4.
Actually, this trope requires a test on two elements (say a1 and a2). To be efficient, it was decided when defining the trope skeleton architecture to implement two tests concurrently (C(a1, a2) and C(a2, a1)), since both reactions have to be evaluated.
In the present case, the condition has to determine if a1 is a multiple of a2, which is not a low-cost operator in terms of CLB resources. To minimize the size of this operator, it has been implemented using a division operator proceeding sequentially. Each division step requires two operations, a shift and a subtraction (which are done in parallel).
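A bit-serial divider of this kind can be modelled in software as a restoring shift-and-subtract loop. The sketch below is an assumption about the general technique, not a transcription of the circuit: it processes one dividend bit per step for a 16-bit operand, and performs the shift and the subtraction one after the other rather than in parallel. The function name `is_multiple` and the `width` parameter are illustrative.

```python
def is_multiple(a1, a2, width=16):
    """Return True if a1 is a multiple of a2 (0 <= a1 < 2**width, a2 > 0)."""
    if a2 == 0:
        raise ZeroDivisionError("a2 must be non-zero")
    remainder = 0
    for bit in range(width - 1, -1, -1):
        # Shift: bring the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((a1 >> bit) & 1)
        # Subtract: remove the divisor when it fits.
        if remainder >= a2:
            remainder -= a2
    return remainder == 0       # a zero remainder means a2 divides a1
```

Evaluating both C(a1, a2) and C(a2, a1) concurrently, as in the skeleton described above, amounts to running two such loops side by side.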
Implementing this trope in only one FPGA is nevertheless possible if one accepts a decrease in performance by providing only one reaction operator. The two conditions are then evaluated sequentially.
Performance Comparison
The implementation has been compared with a similar algorithm written in C and executed on two Sun workstations (Sparc-2 and Sparc-10). The algorithm, like the Gamma program, first searches for a prime number which divides n; then it produces the prime factors:
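A sequential algorithm of this shape can be sketched as follows. The original C program is not reproduced in this text, so the code below is only an assumed equivalent using plain trial division, written in Python for brevity; the function name `prime_factors` is illustrative.

```python
def prime_factors(n):
    """Sequential trial division: find a divisor, then emit it while it divides n."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:      # d is prime here: smaller factors were removed
            factors.append(d)
            n //= d
        d += 1
    if n > 1:                  # what remains is a prime larger than sqrt(n)
        factors.append(n)
    return factors
```

Running it on 120 yields the same multiset as the example given earlier.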
The diagram below shows the execution time versus the number n. The performance of the Gamma machine lies between that of the two Sun workstations. This implementation demonstrates that using a suitable architecture for executing a high-level programming language, such as Gamma, may provide performance as good as a von Neumann machine executing a sequential program.
Deriving an FPGA architecture from a high-level programming model using the composition of primitive programs (tropes) is a promising approach. Our investigation of implementing the Gamma formalism on the Perle-1 FPGA platform has shown that it is realistic. The next step is to automate this approach. Presently, specific skeleton architectures corresponding to a set of primitive Gamma programs have been defined. From these skeletons, the challenge is to synthesize efficient tropes and, given these tropes, to place them optimally on the FPGA matrix.
Software tools provided with the Perle-1 board allow direct synthesis in the same way as VLSI design. The tools support interactive placement of the CLBs, allowing the designer to control the hardware topology of the design. Experiments have shown that a good assignment of the CLBs with a few judicious routing directives may considerably increase what the FPGAs can achieve: temporal performance is better and CLBs are saved.
