Term Rewriting System (TRS) is a good formalism for describing concurrent systems that embody asynchronous and nondeterministic behavior in their specifications. Elsewhere, we have used TRS's to describe speculative micro-architectures and complex cache-coherence protocols, and proven the correctness of these systems. In this paper, we describe the compilation of TRS's into a subset of Verilog that can be simulated and synthesized using commercial tools. TRAC, Term Rewriting Architecture Compiler, enables a new hardware development framework that can match the ease of today's software programming environment. TRAC reduces the time and effort in developing and debugging hardware. For several examples, we compare TRAC-generated RTL's with hand-coded RTL's after they are both compiled for Field Programmable Gate Arrays by Xilinx tools. The circuits generated from TRS are competitive with those described using Verilog RTL, especially for larger designs.
MOTIVATION
Term Rewriting Systems (TRS's) [Baader and Nipkow, 1998] have been used extensively to give operational semantics of programming languages. More recently, we have used TRS's in computer architecture research and teaching. TRS's have made it possible, for example, to describe a processor with out-of-order and speculative execution succinctly in a page of text [Arvind and Shen, 1999]. Such behavioral descriptions in TRS's are also amenable to formal verification because one can show whether two TRS's "simulate" each other. This paper describes hardware synthesis from TRS's.
We describe the Term Rewriting Architecture Compiler (TRAC) that compiles high-level behavioral descriptions in TRS's into a subset of Verilog that can be simulated and synthesized using commercial tools. The TRAC compiler enables a new hardware design framework that can match the ease of today's software programming environment. By supporting a high-level abstraction in design entry, TRAC reduces the level of expertise required for hardware design. By eliminating human involvement in the lower-level implementation tasks, the time and effort for developing and debugging hardware are reduced. These same qualities also make TRAC an attractive tool for experts to prototype large designs.
This paper describes the compilation of TRS into RTL via simple examples. Section 2 presents an introduction to TRS's for hardware descriptions. Section 3 explains how TRAC extracts logic and state from a TRS's type declaration and rewrite rules. Section 4 discusses TRAC's strategy for scheduling rules for concurrent execution to increase hardware performance. Section 5 compares TRAC-generated RTL against hand-coded RTL after each is compiled for Field Programmable Gate Arrays (FPGA) using the Xilinx Foundation 1.5i synthesis package. Section 6 surveys related work in high-level hardware description and synthesis. Finally, Section 7 concludes with a few brief remarks.
TRS FOR HARDWARE DESCRIPTION
A TRS consists of a set of terms and a set of rewriting rules. The general structure of rewriting rules is:
$\mathit{pat}_{lhs}$ if $p$ $\rightarrow$ $\mathit{exp}_{rhs}$
A rule can be used to rewrite a term $s$ if the rule's left-hand-side pattern $\mathit{pat}_{lhs}$ matches $s$ or a subterm in $s$ and the predicate $p$ evaluates to true. A successful pattern match binds the free variables of $\mathit{pat}_{lhs}$ to subterms of $s$. When a rule is applied, the resulting term is determined by evaluating the right-hand-side expression $\mathit{exp}_{rhs}$ in the bindings generated during pattern matching.
In a functional interpretation, a rule is a function which may be expressed as:
$\lambda s.$ case $s$ of
$\mathit{pat}_{lhs} \Rightarrow$ if $p$ then $\mathit{exp}_{rhs}$ else $s$
$\_ \Rightarrow s$
The function uses a case construct with pattern-matching semantics in which a list of patterns is checked against $s$ sequentially top-to-bottom until the first successful match is found. A successful match of $\mathit{pat}_{lhs}$ to $s$ creates bindings for the free variables of $\mathit{pat}_{lhs}$, which are used in the evaluation of the "consequent" expression $\mathit{exp}_{rhs}$. If $\mathit{pat}_{lhs}$ fails to match $s$, the wild-card pattern '_' matches $s$ successfully and the function returns a term identical to $s$.
In a TRS, the effect of a rewrite is atomic, that is, the whole term is "read" in one step and if the rule is applicable then a new term is returned in the same step. If several rules are applicable, then any one of them is chosen nondeterministically and applied. Afterwards, all rules are re-evaluated for applicability on the new term. Starting from a specially-designated starting term, successive rewriting progresses until the term cannot be rewritten using any rule.
Example 1 (GCD):
Euclid's Algorithm for finding the greatest common divisor (GCD) of two integers may be written as follows in TRS notation:
The terms of this TRS have the form Gcd(a,b), where a and b are positive integers. The answer is the first sub-term of Gcd(a,b) when Gcd(a,b) cannot be reduced any further. For example, the term Gcd(2,4) can be reduced by applying the Flip and Mod rules to produce the answer 2:
Gcd(2,4) $\rightarrow$ Gcd(4,2) $\rightarrow$ Gcd(2,2) $\rightarrow$ Gcd(0,2) $\rightarrow$ Gcd(2,0)
TRS's for hardware description are often nondeterministic ("not confluent" in the programming language parlance) and restricted so that the terms cannot grow. The latter restriction guarantees that a system described by our TRS's can be synthesized using a finite amount of hardware. The nondeterministic aspect of TRS's has a strong flavor of modeling distributed algorithms as state-transition systems (see, for example, [Manna and Pnueli, 1991, Lamport, 1994, Lynch, 1996, Chandy and Misra, 1988]). The focus of this paper, however, is on automatic synthesis rather than on formal verification of an implementation against a specification.
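The reduction of Gcd(2,4) can be replayed with a small interpreter. The following Python sketch is illustrative only: the rule bodies are not reproduced in the text above, so the Mod and Flip rules used here are an assumption inferred from the reduction trace (Mod subtracts b from a when a ≥ b and b ≠ 0; Flip swaps the operands when a < b). The names `RULES` and `rewrite` are hypothetical.

```python
import random

# Each rule is a (name, predicate, rewrite) triple over a term Gcd(a, b),
# represented as the tuple (a, b).
# ASSUMPTION: these rule bodies are inferred from the trace
# Gcd(2,4) -> Gcd(4,2) -> Gcd(2,2) -> Gcd(0,2) -> Gcd(2,0).
RULES = [
    ("Mod",  lambda a, b: a >= b and b != 0, lambda a, b: (a - b, b)),
    ("Flip", lambda a, b: a < b,             lambda a, b: (b, a)),
]

def rewrite(term, rng=random):
    """Rewrite until no rule applies; return the list of terms visited."""
    trace = [term]
    while True:
        applicable = [rule for rule in RULES if rule[1](*term)]
        if not applicable:
            return trace
        # TRS semantics: pick any one applicable rule nondeterministically.
        _, _, apply_rule = rng.choice(applicable)
        term = apply_rule(*term)
        trace.append(term)

print(rewrite((2, 4)))  # [(2, 4), (4, 2), (2, 2), (0, 2), (2, 0)]
```

Under the assumed rule bodies the two predicates are mutually exclusive, so the random choice never has more than one candidate and the trace is deterministic.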
In the rest of this section we will describe the TRS notation accepted by TRAC. It includes built-in integers, booleans, common arithmetic and logical operators, non-recursive algebraic types and a few abstract datatypes such as arrays and FIFO's. Other user-defined abstract datatypes, with both sequential and combinational functionalities, can be included in synthesis by providing an interface declaration and its implementation.
We begin by describing simple types (STYPE), which include built-in integer, product and algebraic (disjoint) union types. Product types are designated by a constructor name followed by one or more elements. An algebraic union is made up of two or more disjuncts. A disjunct is syntactically similar to a product except a disjunct may have zero elements. An algebraic union with only zero-element disjuncts is also known as an enumerable type. Product and algebraic unions can be composed to construct an arbitrary type hierarchy, but no recursive types are allowed.
The TRS in Example 1 should be accompanied by the type declaration:
Example 2 (GCD_2): We give another implementation of GCD to illustrate some modularity and type issues. Suppose we have the following TRS to implement the mod function.
Hardware Synthesis from Term Rewriting Systems
Mod Iterate Rule
Using this definition of mod, GCD can be written as follows:
ABSTRACT TYPES
Abstract datatypes are defined by their interfaces only and are included to facilitate hardware description and synthesis. An interface can be classified as either combinational or state-transforming. We discuss array, FIFO and content addressable memory abstract datatypes next.
Array is used to model register files and memories, and has only two operations defined in its interface. Syntactically, if a is an Array then a[idx] represents a combinational "read" operation which gives the value stored in the idx'th location, and a[idx:=v] represents a state-transforming "write" operation which gives a new Array identical to a except that location idx has been updated to value v. We only support Array of STYPE with an enumerable index type.
Fifo buffers provide the primary means of communication between different modules and pipeline stages. The two main state-transforming operations on Fifo's are enqueuing and dequeuing. Enqueuing element e to q appears as enq(q,e) while dequeuing the first element from q appears as deq(q). An additional state-transforming interface clr(q) clears the contents of the Fifo. The combinational operation first(q) gives the value of the first element in q. In the description phase, Fifo is abstracted to have a bounded but unspecified size. A rule that makes use of Fifo interfaces has an implied predicate condition that tests whether the Fifo is not empty or not full, as appropriate. We also support access to other Fifo entries with appropriate projection functions. Fifo entries are also restricted to be of STYPE.
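The Fifo interface and its implied predicates can be sketched in Python using immutable tuples, so that each state-transforming operation returns a new Fifo as in the TRS notation. This is only a model of the described behavior, not TRAC's implementation: the bound `DEPTH`, the helpers `not_empty` and `not_full`, and the example rule `move_rule` are assumptions made for illustration.

```python
DEPTH = 2  # ASSUMPTION: illustrative bound; the description leaves the size unspecified

def enq(q, e):
    """State-transforming: Fifo q with e appended at the tail."""
    return q + (e,)

def deq(q):
    """State-transforming: Fifo q without its first element."""
    return q[1:]

def clr(q):
    """State-transforming: the empty Fifo."""
    return ()

def first(q):
    """Combinational: value of the first element in q."""
    return q[0]

def not_empty(q):
    return len(q) > 0

def not_full(q):
    return len(q) < DEPTH

def move_rule(state):
    """Hypothetical rule: (q1, q2) -> (deq(q1), enq(q2, first(q1)))."""
    q1, q2 = state
    # Implied predicate: the source is not empty and the destination is not full.
    if not_empty(q1) and not_full(q2):
        return (deq(q1), enq(q2, first(q1)))
    return state  # rule not applicable; the term is unchanged
```

A rule written against `enq`, `deq` and `first` never has to mention the guard explicitly in the TRS source; the sketch makes the guard visible only to show what the implied predicate amounts to.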
Array_CAM is similar to Array except its data fields are subdivided into a key field and a normal-data field. The same is true for Fifo_CAM and Fifo.
The content-associative lookup interface cam(a,key) returns true if an entry with a matching key field is found. The content-associative lookup interface camidx(a,key) returns the index of an entry with a matching key field whereas camdata(a,key) returns the data field. The values of camidx(a,key) and camdata(a,key) are undefined when cam(a,key) is false.
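The three content-associative lookups can be modeled as follows. This Python sketch assumes that entries are stored as (key, data) pairs, that the first matching entry wins when several keys match (the text does not say), and that `None` stands in for the "undefined" result; all three choices are illustrative assumptions.

```python
def cam(a, key):
    """True if some entry of a has a matching key field."""
    return any(k == key for k, _ in a)

def camidx(a, key):
    """Index of the first entry whose key matches.

    ASSUMPTION: None models the 'undefined' value when cam(a, key) is false.
    """
    for i, (k, _) in enumerate(a):
        if k == key:
            return i
    return None

def camdata(a, key):
    """Data field of the first entry whose key matches (None if no match)."""
    i = camidx(a, key)
    return None if i is None else a[i][1]

# Usage: a three-entry content-addressable array of (key, data) pairs.
table = [(7, "x"), (3, "y"), (9, "z")]
print(cam(table, 3), camidx(table, 3), camdata(table, 3))  # True 1 y
```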
As can be seen from the definition of TYPE, abstract datatypes are not allowed in algebraic disjuncts. Thus, only a complex product type can have elements of abstract types.
RULE SYNTAX
Syntactically, a rule is composed of a left-hand-side pattern and a right-hand-side expression. The predicate and where bindings are optional. The where bindings on the left-hand-side can require pattern matching. Any failure in matching PAT_i to EXP_i in the where bindings also deems the rule inapplicable. The expression on the right-hand-side, $\mathit{exp}_{rhs}$, can also have where bindings, but RHS where bindings can be made only to simple variables and do not involve pattern matching. In the following, '_' represents the "don't care" symbol.
The type of PAT_lhs must be either CPRODUCT or ALGEBRAIC. In addition, each rule must have PAT_lhs and EXP_rhs of the same type. This restriction, together with non-recursive type declaration, guarantees that the size of every term is finite and that the size does not change by applying the rewriting rules. In Example 2, VAL is an ALGEBRAIC type with two disjuncts, Val and Mod. It is because of this type declaration that the Mod Done Rule does not violate the type discipline: both sides of the rule have the type VAL.
Example 3 (Single-Cycle RISC Processor): The state of an unpipelined, simple RISC processor is described by its program counter (PC), register file (RF) and memory (MEM). This information is captured in the following type declaration:
|| Loadpc(RNAME) || Add(RNAME,RNAME,RNAME) || Sub(RNAME,RNAME,RNAME) || Bz(RNAME,RNAME) || Load(RNAME,RNAME) || Store(RNAME,RNAME)
The processor we synthesized in Section 5 has four 32-bit general purpose registers, i.e. N=32, m=4. The behavior of the 7 instructions (move PC to register, load immediate, register-to-register addition and subtraction, branch if zero, memory load and store) can be specified as a TRS by giving a rewrite rule for each instruction. The following rule conveys the execution of the Add instruction.
Proc_s(pc, rf, mem) where
Example 4 (Pipelined RISC Processor): The processor in Example 3 can be pipelined by introducing FIFO's as pipeline-stage buffers and by systematically splitting each rule into local rules for various pipeline stages. For example, in a two-stage pipeline design, the processing of an instruction can be broken down into separate fetch and execute steps. We model buffers between pipeline stages as a Fifo of an unspecified but finite size. In a behavioral description, it is convenient if the operation of each stage can be described without reference to other stages. FIFO buffers provide this isolation; most pipelined design rules dequeue an input from one FIFO and enqueue the result into another FIFO. In the synthesis phase these FIFO buffers are replaced by a fixed-depth FIFO or simply registers, and flow control logic ensures that a rule does not fire if the destination FIFO is full.
Here, we introduce the pipeline buffer BS in the declaration of the PROC p term.
The Add and Bz instruction rules are split into Fetch and Execute stage rules:
Fetch Rule
Proc_p(pc, rf, bs, mem) $\rightarrow$ Proc_p(pc+1, rf, enq(bs,mem[pc]), mem)
Add Rule
Proc_p(pc, rf, bs, mem) where Add(rd,r1,r2) = first(bs)
where Bz(rc,ra) = first(bs)
Notice the Fetch rule is always ready to fire. At the same time one of the execute stage rules may be ready to fire as well. This is the first example we have seen where more than one rule can be enabled on a given state. Even though according to TRS semantics, only one rule should be fired in each step, we will see that our compiler tries to fire as many rules in parallel as possible while maintaining correct TRS execution semantics. Without parallel firing of rules we won't get the pipelining effect we want.
Since there is a race to update the pc between the Fetch and the Branch Taken rules, the above rules can exhibit nondeterministic behavior. Specification of microprocessors and cache-coherence protocols often entails nondeterminism, even though a given realization is usually completely deterministic. Our compiler can handle such nondeterministic TRS's.
In addition to the TRS-to-RTL compilation to be described in Sections 3 and 4, we are developing source-to-source TRS transformations that can achieve the kind of pipelining described in Example 4. The dependence between the rules has to be analyzed carefully to ensure the correctness of all such transformations. Presently, human intervention is required to guide the transformation process at the high level. It is also possible to automatically derive the rules for a superscalar version of the pipelined processor in Example 4 [Arvind and Shen, 1999].
INPUT AND OUTPUT
Traditionally a TRS describes a closed system, but we are experimenting with new notations and semantics to support description of a system with input and output (I/O) ports. In an approach that only requires minimal deviation from a standard TRS, the designer assigns I/O specific semantics to terms using source code annotations. For example, a wrapper to start and terminate a GCD computation can be given as:
Ignoring the I/O annotations ( iport and oport ), the type declaration and rules can be interpreted exactly as before. In fact, the combinational logic generated by TRAC is the same irrespective of I/O annotations. The first rule states as long as the first subterm of TOP is Load( ), the GCD term can be rewritten using the second and third subterms of TOP. The second rule states if the first subterm of TOP is Run and the GCD computation is done (when the second subterm of GCD is 0), then copy the first subterm of GCD to the fourth subterm of TOP.
The only effect of annotating the fourth subterm of TOP as an oport is that TRAC will attach wires to the output of the registers in that subterm and make their content externally visible through an output port. Conversely, the effect of annotating a term as an iport is that the wires normally connected to the output of the registers in that term are redirected to an input port instead.
A rule cannot rewrite a term labeled as an iport since the value of the term does not correspond to any internal register. From the TRS perspective, an iport term may change unexpectedly, but atomically, without any rule application. By driving the appropriate values on the input ports corresponding to the first three subterms of TOP, a new GCD computation is started. Asserting signals corresponding to Run( ) at the input port enables GCD to execute to completion, at which point the answer appears on the output port as a consequence of the GCD Done rule.
BASIC SYNTHESIS STRATEGY
In this section we first describe a functional interpretation of each rule and then derive an "action on state" view of the same rule. The latter view is the starting point for hardware synthesis.
FUNCTIONAL INTERPRETATION OF A RULE: $\pi$ AND $\delta$ FUNCTIONS
Using $\pi$, the firing condition of a rule, and $\delta$, the function that computes the rewritten term, an equivalent functional representation of a rule is
rule = $\lambda s.$ if $\pi(s)$ then $\delta(s)$ else $s$
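The functional form of a rule translates directly into a higher-order function. In this Python sketch, `pi` is the firing condition and `delta` computes the rewritten term; `make_rule` and the example rule `flip` are hypothetical names introduced only for illustration.

```python
def make_rule(pi, delta):
    """Build the function  s -> delta(s) if pi(s) else s."""
    return lambda s: delta(s) if pi(s) else s

# Hypothetical example rule on a pair (a, b): swap the operands when a < b.
flip = make_rule(
    pi=lambda s: s[0] < s[1],
    delta=lambda s: (s[1], s[0]),
)

print(flip((2, 4)))  # (4, 2): the rule fires
print(flip((4, 2)))  # (4, 2): the rule does not fire, the term is returned unchanged
```

Separating the predicate from the state transformation is what later allows the two parts to be analyzed and merged independently.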
A RULE AS A STATE TRANSFORMER
In the architectural context, terms represent state, and rules define how the state can be transformed. If we restrict ourselves to synchronous circuits then each rule "reads" the state at the beginning of the clock cycle and if it can fire, it modifies the state at the end of the same clock cycle. In this "actions on state" view of a rule, one needs to update only those parts of the state that actually change. If two rules are enabled simultaneously and affect disjoint parts of the state then it is possible to execute both rules in the same clock cycle. After discussing the hardware to execute one rule in this section, we will return to the issue of concurrent firings in the next section.
Mapping Terms to Storage Elements:
A term can be represented as a tree based on its type. For example, the tree representation of GCD_2 is shown in Figure 1.1. Algebraic types have an extra branch, Tag, where a register of width $\lceil \log_2 d \rceil$ records which of the $d$ disjuncts the term belongs to. An ALGEBRAIC node has a branch for each of the disjuncts, but, at any time, only the branch whose tag matches the content of the tag register holds meaningful data. As an example we have shaded the active portions of the tree corresponding to Gcd(Val(2), Mod(4, 2)) in Figure 1.1.
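As a small worked instance of the tag-width formula, the number of tag bits for a union of d disjuncts can be computed with integer arithmetic. The helper name `tag_width` is hypothetical.

```python
def tag_width(d):
    """Bits needed to record which of d disjuncts is active: ceil(log2(d))."""
    if d < 1:
        raise ValueError("a union needs at least one disjunct")
    # (d - 1).bit_length() equals ceil(log2(d)) for every d >= 1,
    # and avoids floating-point rounding.
    return (d - 1).bit_length()

print(tag_width(2))  # 1: e.g. a union with two disjuncts such as Val and Mod
print(tag_width(7))  # 3: e.g. a union with seven disjuncts
```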
We can assign a unique name to each storage element based on its path (also known as projection) from the root, as for the second (from the left) NUM register in Figure 1.1. As an optimization, registers on different disjuncts of an ALGEBRAIC node can share the same physical register. In Figure 1.1, the registers aligned horizontally are mappable to the same register. This idea can be expressed as allowing multiple pathnames to be associated with a single register state element. In a type structure that includes Array and other abstract datatypes, nodes corresponding to the abstract datatypes appear at the leaves of the tree.
The value embedded in the storage elements of a term can be represented in a similar manner using a set of <proj, value> pairs. Because the left-hand side and the right-hand side of a rule are required to have the same type, the term resulting from a rewrite must have the same storage structure as the initial term. In other words, beginning with a TRS's starting term and its storage elements, successive rewrite operations never add or delete any storage elements. To implement a TRS, TRAC generates a state structure that is extracted from the starting term, and the rules are implemented as combinational logic that updates the content of the storage elements.
The pattern matching on the left-hand-side of a rule (the $\pi$ function) essentially tests the values of some of the storage elements. $\pi$ can also include combinational functions from the interface of abstract datatypes.
$\pi$ for the Flip&Mod rule of Example 2 will look like the following:
The right-hand-side of a rule (or $\delta$) can be viewed as specifying actions on the storage elements of the input term. The actions can be represented in a set of <proj, action> pairs.
In general, a rule can be applied to a subterm of a whole term. In these cases, extract-state(s,proj) is called by a projection, relative to the whole term, that corresponds to the subterm. Furthermore, a rule can be applied to many parts of a term. In these cases, a rule's logic is instantiated multiple times, once for each state sub-structure where the rule is to be applied.
In an alternative interpretation, a subterm-applicable rule needs to be lifted to the same type as the TRS's starting term prior to analysis. The effect of applying the lifted rule to the whole term is the same as applying the original rule to the subterm within the whole term. A subterm rule may be applicable to multiple positions in the whole term. A separate lifted version must be created for each possible application. For example, the Mod Done rule from GCD_2 in Example 2 could be applicable to both the first and second subterms of a GCD_2 term. The two lifted versions of the Mod Done rule are:
and
CIRCUIT SYNTHESIS
The $\pi$ and $\delta$ functions for the two GCD rules, GCD Mod and GCD Flip, in Example 1 are given below. A valid starting term for this TRS has the form Gcd(x, y) where x and y are positive integers. This starting term implies the set of storage elements: {<Proj_1, REG[32]>, <Proj_2, REG[32]>}. For conciseness, we refer to these registers as a and b in the following definitions:
For hardware synthesis we break down $\delta$ into actions on individual storage elements as specified above. Therefore, for each storage element $e$ affected by a rule $R$, $\delta_R^e$ gives its next state value. $\pi_R$ is the latch-enable signal of all the affected registers. Two state transition circuits corresponding to the two GCD rules, considered independently, are first shown in Figure 1.2.
[Figure 1.2: state transition circuits for the Mod Rule and the Flip Rule]
The final circuit is arrived at by combining the two circuits. In this case, both rules affect the storage element a but only one of them can actually fire in a given state. When merging the actions from rules with mutually exclusive firing conditions ($\pi$), the final latch enable is simply the logical OR of their firing conditions (e.g., $\pi_{Mod} + \pi_{Flip}$ in this example), and the next state values are chosen from all of the $\delta$'s using a multiplexer where a rule's $\pi$ enables its own $\delta$. A sample update circuit that merges $\delta$'s from two mutually exclusive rules is illustrated as circuit A in Figure 1.4. Figure 1.3 shows the FSM generated by combining the $\pi$ and $\delta$ from both GCD rules.
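The merged update logic can be simulated one clock cycle at a time. The Python sketch below models registers a and b with an OR-ed latch enable and a multiplexer per register. It is a behavioral model rather than TRAC output, and the rule bodies are an assumption inferred from the reduction trace in Example 1 (Mod: a ≥ b and b ≠ 0 gives a := a − b; Flip: a < b swaps a and b). The names `gcd_cycle` and `run` are hypothetical.

```python
def gcd_cycle(a, b):
    """One clock cycle of the merged GCD circuit: returns (next_a, next_b, fired)."""
    # Firing conditions, evaluated on the current register values.
    # ASSUMPTION: rule bodies inferred from the trace in Example 1.
    pi_mod = a >= b and b != 0
    pi_flip = a < b

    # Latch enables: logical OR of the firing conditions of the rules
    # that write each register.
    en_a = pi_mod or pi_flip   # both rules write a
    en_b = pi_flip             # only Flip writes b

    # Multiplexers: each rule's firing condition selects its own next value.
    next_a = (a - b) if pi_mod else b
    next_b = a

    return (next_a if en_a else a, next_b if en_b else b, en_a or en_b)

def run(a, b):
    """Clock the circuit until no rule fires; return (answer, cycles)."""
    cycles = 0
    while True:
        a, b, fired = gcd_cycle(a, b)
        if not fired:
            return a, cycles
        cycles += 1

print(run(2, 4))  # (2, 4): answer 2 after 4 cycles
```

Because the two assumed firing conditions can never hold together, the two-input multiplexer for a never has to arbitrate between competing values.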
However, in general, several $\pi$'s could be asserted, i.e., several rules could be applicable. In the simplest solution, a new set of disjoint triggers $\phi_1 \ldots \phi_n$ can be generated using a round-robin priority encoder fed by $\pi_1 \ldots \pi_n$. The $\phi$'s, which are mutually exclusive, globally replace the $\pi$'s at all multiplexers and at all latch enable OR-gates. A sample update circuit that merges $\delta$'s from two possibly conflicting rules is illustrated as circuit B in Figure 1.4. This arbitration is simple and correct, but the circuit is inefficient and allows only one rewrite per cycle.
TRAC does not synthesize any state structures for abstract datatypes. When an abstract datatype is used in a TRS, TRAC instantiates the corresponding Verilog module in the RTL and makes appropriate connections to the interfaces. The user or the library is expected to provide a Verilog module in RTL for each abstract datatype. A state-transforming interface has an implied signal driven by $\pi$ (or $\phi$) to enable the state changes when the corresponding rule is fired.
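A round-robin priority encoder of the kind described can be modeled behaviorally as below. This Python sketch is one plausible formulation, not TRAC's circuit: it assumes the encoder remembers the index of the most recently granted rule (the `last` argument) and searches from the next index onward. The function name and the pointer convention are illustrative assumptions.

```python
def round_robin(pi, last):
    """Map firing conditions to at most one asserted trigger.

    pi   : list of bools, pi[i] is the firing condition of rule i
    last : index of the rule granted most recently (assumed encoder state)
    Returns (phi, new_last) where phi is a list of bools with at most
    one True entry.
    """
    n = len(pi)
    # Search starting just after the last granted rule, wrapping around.
    for offset in range(1, n + 1):
        i = (last + offset) % n
        if pi[i]:
            return [j == i for j in range(n)], i
    return [False] * n, last  # no rule is applicable this cycle

# Usage: rules 0 and 1 are both applicable; they are granted alternately.
phi, last = round_robin([True, True, False], last=0)
print(phi, last)  # [False, True, False] 1
phi, last = round_robin([True, True, False], last)
print(phi, last)  # [True, False, False] 0
```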
EXPLOITING PARALLELISM
According to TRS semantics, if multiple rules can simultaneously become applicable on a given term s, one of the rules is chosen nondeterministically and applied atomically to rewrite s to s'. Next, a new round of rewriting is started from scratch on s'. When a TRS exhibits such nondeterminism, multiple behaviors are allowed. Using a scheduler based on a round-robin priority encoder as discussed in Section 3.3, TRAC implements one of the allowed behaviors in a deterministic circuit that fires one rule per clock cycle.
If the simultaneously applicable rules involve mutually disjoint parts of the term, then these rules can be executed in any sequence successively to reach the same final term. In this scenario, although the semantics of a TRS specifies a sequential and atomic term rewriting, a hardware implementation can exploit the underlying parallelism and execute the rules concurrently in the same clock cycle. In general it is not safe to allow two arbitrary applicable rules to execute in the same clock cycle because executing one of them can alter the value of the $\pi$ or the $\delta$ function of the other. This section formalizes the conditions for simultaneous rule execution and suggests a scheduling that improves hardware performance by firing multiple rules in the same clock cycle when allowed.
TRANSPARENCY
The minimum condition for allowing two simultaneously applicable rules to fire in the same clock cycle is captured by the $\pi$-transparent relationship.
Definition 1 ($\pi$-transparent)
Rule $R_1$ is $\pi$-transparent to rule $R_2$, denoted as $R_1 <_\pi R_2$, if $\forall s: (\pi_1(s) \wedge \pi_2(s)) \Rightarrow \pi_2(\delta_1(s))$
This condition states that if two rules ever become applicable on the same term and $R_1 <_\pi R_2$, then firing $R_1$ first does not prevent $R_2$ from firing on the resulting term. Firing in the reverse order may not necessarily be allowed, unless a stronger condition of mutual transparency (or $\pi$-conflict-free) is satisfied.
Definition 2 ($\pi$-conflict-free)
Rules $R_1$ and $R_2$ are $\pi$-conflict-free if $(R_1 <_\pi R_2) \wedge (R_2 <_\pi R_1)$
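For a state space small enough to enumerate, the transparency condition (firing the first rule never disables the second when both are applicable) can be checked by brute force. The Python sketch below does this for two hypothetical rules over pairs (x, y); it illustrates the definition only and is not the analysis TRAC performs. Each rule is represented as a (firing condition, state transformer) pair, and all names are illustrative.

```python
from itertools import product

def pi_transparent(r1, r2, states):
    """True if, on every state where both rules are applicable,
    r2 is still applicable after r1 has fired."""
    (pi1, delta1), (pi2, _) = r1, r2
    return all(pi2(delta1(s)) for s in states if pi1(s) and pi2(s))

def pi_conflict_free(r1, r2, states):
    """Transparency in both directions."""
    return pi_transparent(r1, r2, states) and pi_transparent(r2, r1, states)

# Hypothetical rules over states (x, y):
#   inc_y  : if x > 0 then (x, y + 1)
#   clear_x: if x > 0 then (0, y)
inc_y = (lambda s: s[0] > 0, lambda s: (s[0], s[1] + 1))
clear_x = (lambda s: s[0] > 0, lambda s: (0, s[1]))

STATES = list(product(range(3), repeat=2))
print(pi_transparent(inc_y, clear_x, STATES))    # True: inc_y leaves x > 0 intact
print(pi_transparent(clear_x, inc_y, STATES))    # False: clear_x disables inc_y
print(pi_conflict_free(inc_y, clear_x, STATES))  # False
```

Exhaustive enumeration is only feasible for toy state spaces; for realistic register widths the check has to be done symbolically or conservatively.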
Given two rules where $R_1 <_\pi R_2$, there are two basic approaches to allow both rules to fire in the same clock cycle. The first approach cascades the combinational logic from the two rules such that $R_1$ is applied first to the physical state elements, and $R_2$ is applied to the effective state after attempting to apply $R_1$. In effect, we are creating a composite rule where
Arbitrary cascading does not always improve circuit performance since cascading combinational logic may lead to a longer cycle time, especially when several rules are composed. In a synchronous design, if the clock period increases, every rule firing is penalized, even when at most one rule can fire.
In a more practical approach, the inputs to the combinational logic from all rules are driven directly by state elements. Two transparent rules are allowed to execute in the same clock cycle only if the correct resulting state can be constructed from independent evaluation of the same current state.
PARALLEL COMPOSIBILITY
Two rules that do not affect the same storage elements are parallel composible, provided allowing them to execute concurrently on the same state produces a behavior that corresponds to at least one ordering of rule-execution in TRS.
Definition 3 (Parallel-Composible Transparency)
Rule $R_1$ is PC-transparent to rule $R_2$, denoted as $R_1 <_{PC} R_2$, if
Essentially, this definition says that if both rules $R_1$ and $R_2$ want to update a register, then they must produce the same value. In the case of an array, if the two rules update different elements of the array, then parallel composition will work assuming the array has multiple write ports. In the case of a FIFO, if one rule enqueues and the other dequeues then they can be combined to execute in the same cycle.
Note that $R_1 <_{PC} R_2$ does not imply that the outcome is confluent. Consider the following two rules that operate on four registers:
Now consider the starting term F(1,1,$r_C$,$r_D$). The effect of executing $R_2$ after $R_1$ is F(0,1,1,1). On the other hand, if $R_2$ is executed first the result would be F(0,1,$r_C$,1) and $R_1$ will no longer fire.
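The order dependence can be reproduced with a pair of stand-in rules. The two rule bodies are not reproduced in the text above, so the Python sketch below uses hypothetical rules chosen only to match the stated outcomes: `r1` sets the third register to 1 when the first register is 1, and `r2` sets the first register to 0 and the fourth to 1 when the second register is 1. They should not be read as the original rules.

```python
def r1(t):
    """Hypothetical R1: F(a,b,c,d) if a == 1  ->  F(a,b,1,d)."""
    a, b, c, d = t
    return (a, b, 1, d) if a == 1 else t

def r2(t):
    """Hypothetical R2: F(a,b,c,d) if b == 1  ->  F(0,b,c,1)."""
    a, b, c, d = t
    return (0, b, c, 1) if b == 1 else t

# Symbolic placeholders stand for the unknown initial register contents.
start = (1, 1, "rC", "rD")

print(r2(r1(start)))  # (0, 1, 1, 1): R1 fires first, then R2
print(r1(r2(start)))  # (0, 1, 'rC', 1): R2 fires first and disables R1
```

The two stand-in rules write disjoint registers, yet the two firing orders reach different final terms, which is the sense in which the outcome is not confluent.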
For two rules to be confluent we need the following stronger condition.
Definition 5 (Conflict-free)
Rules $R_1$ and $R_2$ are conflict-free if $(R_1 <_{PC} R_2) \wedge (R_2 <_{PC} R_1)$
If two rules are parallel composible, the $\delta$'s do not collide and no special merging circuit is required to arbitrate their actions on the affected storage elements.
SEQUENTIAL COMPOSIBILITY
Even if two rules do affect some common state, by carefully prioritizing the effects of the two rules such that the effect of $R_2$ overrides that of $R_1$ (in case $R_1 <_\pi R_2$), a legal outcome can still be constructed from simultaneous evaluation of the two rules on the same current state.
Definition 6 (Sequentially-Composible Transparency)
Rule $R_1$ is SC-transparent to rule $R_2$, denoted as $R_1 <_{SC} R_2$, if $(R_1 <_\pi R_2) \wedge \forall s: (\pi_1(s) \wedge \pi_2(s)) \Rightarrow \delta_2(\delta_1(s)) = SC(s, \delta_1(s), \delta_2(s))$
Sequential composition that implements the prioritization is defined as
$\mathrm{Array}_n\ a \Rightarrow \forall\, 1 \le i \le n:$
If a register is only affected by either $R_1$ or $R_2$ then $\delta$ can be used directly.
Circuit (C) in Figure 1.4 illustrates the update circuit for this case.
DOMINANCE
Definition 8 (Dominance)
Rule $R_2$ dominates rule $R_1$, denoted as $R_1 <_D R_2$, if $(R_1 <_\pi R_2) \wedge \forall s: (\pi_1(s) \wedge \pi_2(s)) \Rightarrow \delta_2(\delta_1(s)) = \delta_2(s)$
If two rules, $R_1$ and $R_2$, are conflicting, but $R_2$ dominates $R_1$, we can include this information in the priority encoder when generating $\phi$'s for global replacement of their $\pi$'s. If $\pi_1$ and $\pi_2$ are both asserted on a cycle, instead of using a fair round-robin priority encoder, the encoder would statically give priority to $\pi_2$. For a two-rule circuit, $\phi_2 = \pi_2$ and $\phi_1 = \pi_1 \wedge \neg\pi_2$. Circuit (D) in Figure 1.4 illustrates the update circuit for this case.
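The static-priority encoder for a dominated pair reduces to two gates. The Python sketch below tabulates it for all input combinations; the function name is hypothetical and `phi1`/`phi2` denote the arbitrated triggers that replace the raw firing conditions `pi1`/`pi2`.

```python
def dominance_encoder(pi1, pi2):
    """Static priority to rule 2: phi2 = pi2, phi1 = pi1 AND NOT pi2."""
    phi2 = pi2
    phi1 = pi1 and not pi2
    return phi1, phi2

# Truth table: at most one trigger is ever asserted.
for pi1 in (False, True):
    for pi2 in (False, True):
        print(pi1, pi2, dominance_encoder(pi1, pi2))
```

Unlike the round-robin encoder, this one needs no state, because the dominated rule's effect would be overwritten anyway whenever both rules are applicable.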
SCHEDULING FOR SIMULTANEOUS FIRING
To conclude this section, we describe a scheduler that is currently implemented in TRAC that makes use of conflict-free (CF) relationships. In general, an exact test for a CF relationship between two arbitrary rule instances is expensive (finding an $s$ such that $\pi_i(s) \wedge \pi_j(s)$ is like solving SAT). Instead, TRAC performs several conservative tests to find as many CF relationships as possible. First, two rule instances that read and write non-overlapping parts of the system are CF. Second, if two rule instances do not rewrite the same registers, and if none of the registers affected by the $\delta$ of one is used by the $\pi$ and $\delta$ of the other, and vice versa, then the two rules are CF since this condition is stronger than the requirement for CF. Lastly, TRAC symbolically analyzes pairs of $\pi$'s to conservatively determine when a pair can never be satisfied simultaneously and thus are CF by default.
TRAC makes use of certain axioms when analyzing the conflict relationships between rules that reference abstract datatype interfaces. For example,
deq(enq(q,e)) = enq(deq(q),e) if q is not empty
first(q) = first(enq(q,e)) if q is not empty
Based on the analysis above and taking into account the properties of FIFO buffers, it can be shown that the rules of Example 4 are CF except for the Fetch and the Branch-Taken rule. However, it can be shown that the Branch-Taken rule dominates the Fetch rule in the sense that the effect of applying the Branch-Taken rule after the Fetch rule is the same as not applying the Fetch rule at all, i.e., $\delta_{BzN}(s) = \delta_{BzN}(\delta_{Fetch}(s))$. Thus, instead of arbitrating between these two rules, the compiler gives priority to the Branch-Taken rule.
After TRAC has established CF relationships between as many rule instances as possible, a graph of rule instances can be constructed by adding an edge between each non-CF pair. Scheduling groups are formed by partitioning the graph into connected components. Different groups never interfere and can be scheduled independently. For each group, a round-robin priority encoder can be used to map $\pi$ to $\phi$ for arbitration. For a small group, an $n \times n$ look-up table can be computed off-line to encode $\pi$ to $\phi$ where more than one $\phi$ can be asserted if the rules of the asserted $\pi$'s are CF.
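The grouping step is a standard connected-components computation. The Python sketch below assumes rule instances are numbered 0 to n−1 and that the non-CF pairs have already been determined; the function name and input format are illustrative, not TRAC's internal representation.

```python
def scheduling_groups(n, non_cf_pairs):
    """Partition rule instances 0..n-1 into connected components of the
    graph that has an edge for every non-CF pair."""
    adjacent = {i: set() for i in range(n)}
    for i, j in non_cf_pairs:
        adjacent[i].add(j)
        adjacent[j].add(i)

    seen = set()
    groups = []
    for start in range(n):
        if start in seen:
            continue
        # Depth-first search collects one connected component.
        seen.add(start)
        stack, group = [start], []
        while stack:
            v = stack.pop()
            group.append(v)
            for w in adjacent[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        groups.append(sorted(group))
    return groups

# Usage: rules 0-1 and 1-2 conflict; rules 3 and 4 conflict with nothing.
print(scheduling_groups(5, [(0, 1), (1, 2)]))  # [[0, 1, 2], [3], [4]]
```

Each resulting group can then be given its own arbiter, and singleton groups need no arbitration at all.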
PERFORMANCE EVALUATION
TRAC generates RTL Verilog that can be synthesized to a variety of technologies by commercial tools like Synopsys and Xilinx hardware compilers. In this paper, we evaluate the quality of the TRAC-generated RTL's against hand-coded RTL when compiled for Xilinx FPGA's.
