Abstract
Introduction
The Abstract Transition System (ATS), which has semantics similar to those of Term Rewriting Systems (TRS) [2] and guarded commands [7], was previously proposed for high-level hardware description and synthesis [9, 10]. ATS hardware description is operation-centric because the description of behavior is organized into atomic operations that each affect multiple state elements (rather than next-state equations that each control one state element). An ATS consists of a set of state elements and a set of state transitions. An ATS transition comprises a predicate and a set of state-update actions. In a state where the predicates of several transitions are simultaneously true, any one of those transitions can be selected non-deterministically. The selected transition executes atomically; that is, in each round of state transitions, all transitions read all state values in one instantaneous step before the one selected transition writes all state elements in another instantaneous step. Thus, the execution of an ATS can be abstractly interpreted as a sequence of atomic applications of transitions, where each transition produces a state that satisfies the predicate of the next transition.
The atomic and sequential semantics of ATS do not prevent a correct implementation from executing several transitions concurrently. References [9] and [10] describe a method for synthesizing ATS into a highly concurrent clock-synchronous implementation that efficiently executes multiple transitions per clock cycle but still maintains the appearance of a sequential and atomic execution. Additional details of ATS and its compilation are described in Section 2.
Despite its various advantages, the non-deterministic semantics of ATS presents an obstacle to describing hardware designs that require synchronous and deterministic operation. The non-deterministic semantics of ATS states that any enabled transition is permitted, but not committed, to execute. Hence, an ATS designer cannot guarantee that a transition will execute in a particular clock cycle, or even at all. To address this shortcoming, we propose two synchronous extensions to ATS that permit the specification of synchronous behavior and compose naturally with the original ATS.
The two extensions are committing transitions and synchronously delayed expressions. The resulting ATS+ is intended to simplify the description of a hardware system where only a small portion of the overall activity must be synchronous. For example, in a layered communication stack, the activities in the upper layers interact through asynchronous handshakes (e.g., request/grant), and only the bottommost physical layer must match the timing of the physical medium. When describing a controller for such a protocol, we can simplify the specification of the upper layers using the original ATS's simplifying atomic and sequential semantics. At the same time, where necessary, the synchronous extensions can specify the timing-critical behavior of the physical layer.
Paper outline. Section 2 provides additional background on ATS. Section 3 introduces the ATS+ extensions. Section 4 explains the compilation of ATS+ for hardware synthesis. Section 5 presents an evaluation of ATS+; this evaluation compares ATS+ to Verilog RTL and Esterel. Section 6 discusses prior work in operation-centric and synchronous HDLs. Finally, Section 7 presents a summary and our conclusions.
ATS primer

In this section, we present the ATS abstraction. ATS is an intermediate hardware representation used to support the compilation of operation-centric source-level languages such as TRSpec [9] and Bluespec [1]. We use ATS to present the ideas in this paper to avoid the complications of source-level languages.
Overview
ATS is a state-based abstract hardware representation with operation-centric state transitions. The structure of ATS is summarized in Figure 1. At the top level, an ATS is a triple (S, S0, X). S is a list of explicitly declared state elements, including registers (R), arrays (A), and FIFOs (F).
S0 is a list of initial values for the elements in S. X is a list of transitions, (π, α). In a transition, π is a boolean predicate expression and α is a list of concurrent actions, with exactly one action for each state element in S. As described in Section 1, the operation-centric transition semantics is non-committing; that is, an enabled transition is permitted, but not committed, to execute. An R-type register state element can store an integer value up to a specified maximum word size. The value stored in a register R can be referenced using the side-effect-free get() query and set to v using the set(v) action. (In our examples, we abbreviate R.get() simply as R and R.set(v) as R:=v.) Figure 1 lists the actions supported for arrays and FIFOs; a complete description of ATS, including I/O elements, is given in [9]. Without loss of generality, this paper focuses on systems with only R elements.
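To make the (S, S0, X) structure and the read-then-write semantics concrete, here is a minimal Python sketch restricted to R elements. It is an illustration under our own assumptions, not the ATS representation used by TRSpec or BSC: states are plain dicts, each transition is a hypothetical `Transition` object with a predicate `pred` (π) and a dict of `set` actions (α), and registers missing from that dict take the 'ε' (no-change) action. The non-deterministic choice is modeled with a seeded random generator. The example is Euclid's subtraction-based GCD; its swap transition only works because both new values are computed from the old state.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

State = Dict[str, int]  # register name -> current value (R elements only)


@dataclass
class Transition:
    name: str
    pred: Callable[[State], bool]               # pi: side-effect-free predicate
    actions: Dict[str, Callable[[State], int]]  # alpha: R.set(v); omitted registers take 'epsilon'


@dataclass
class ATS:
    S: List[str]         # declared registers
    S0: State            # initial values
    X: List[Transition]  # transitions


def apply(t: Transition, s: State) -> State:
    # Read step: every new value is computed from the old state s ...
    updates = {r: f(s) for r, f in t.actions.items()}
    # ... write step: all updates take effect together; other registers are unchanged.
    return {**s, **updates}


def step(ats: ATS, s: State, rng: random.Random) -> Optional[State]:
    enabled = [t for t in ats.X if t.pred(s)]
    if not enabled:
        return None                       # no transition is applicable
    return apply(rng.choice(enabled), s)  # any enabled transition may be chosen


def run(ats: ATS, rng: random.Random, max_steps: int = 1000) -> State:
    s = dict(ats.S0)
    for _ in range(max_steps):
        nxt = step(ats, s, rng)
        if nxt is None:
            break
        s = nxt
    return s


def gcd_ats(a: int, b: int) -> ATS:
    sub = Transition("sub", lambda s: s["a"] >= s["b"] and s["b"] != 0,
                     {"a": lambda s: s["a"] - s["b"]})
    swap = Transition("swap", lambda s: s["a"] < s["b"],
                      {"a": lambda s: s["b"], "b": lambda s: s["a"]})
    return ATS(S=["a", "b"], S0={"a": a, "b": b}, X=[sub, swap])


print(run(gcd_ats(12, 18), random.Random(0)))  # {'a': 6, 'b': 0}
```

The two predicates here are mutually exclusive, so this particular ATS happens to behave deterministically; with overlapping predicates, `step` would pick among the enabled transitions at random.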
Compilation
Basic strategy. This paper is concerned with mapping ATS to a clock-synchronous implementation. In the assumed mapping strategy, the elements of S (registers, in this case) are instantiated from a design library, and the transitions in X are combined to form the next-state logic for the instantiated state elements. In each clock cycle, the π expressions of all transitions are evaluated combinationally, and some subset of the transitions whose π expression is asserted is selected to update the state elements on the next clock edge.
In a naive implementation, only one transition is selected in each clock cycle; this automatically satisfies the sequential and atomic semantics of operation-centric transitions. This naive implementation is functionally correct but inefficient because it exploits no hardware concurrency. In practice, the implementation produced by an ATS compiler employs arbitration logic that selects multiple enabled transitions to update state elements concurrently in each clock cycle, provided the resulting new state values correspond to some valid sequence of atomic executions of the same constituent transitions. The ATS compiler generates such arbitration logic based on a static analysis of the conflict-free (<>CF) and sequentially-composable (<SC) properties among the ATS transitions.
Arbiter synthesis. <>CF is a symmetric relationship between two transitions that ensures the two transitions can be executed correctly in the same clock cycle in a clock-synchronous implementation. Given that Ta and Tb are both applicable in state s, Ta <>CF Tb implies that executing Ta and Tb concurrently in s produces the same state as applying them in sequence, in either order. On the other hand, <SC is an asymmetric relationship between two transitions that also ensures the two transitions can be correctly executed in the same clock cycle. The <SC relationship is less strict than <>CF in that it only requires the concurrent execution to agree with one order of execution. Figure 2 gives examples of <>CF, <SC, and conflicting transitions that cannot be executed in the same clock cycle. In these examples, we write a transition (π, α) concretely as
In this notation, we omit registers whose action is 'ε' from the register-action list. In the <>CF example (a), T1 and T2 read and write two disjoint sets of registers. In the <SC example (b), T3 and T4 have a read-write dependence on register A, but concurrent execution of T3 and T4 gives the same result as if T3 were applied before T4 in sequence. In the conflicting example (c), a circular dependence between T5 and T6 prevents the two transitions from producing a valid result if executed concurrently. Formal definitions of <SC and <>CF are given in [9]. The same reference also gives theorems stating that it is correct for an ATS compiler to devise arbitration logic that, on each clock cycle, selects an arbitrary subset of enabled transitions, provided that 1. each pair of selected transitions is related either by <>CF or by <SC, and 2. a partial order with respect to <SC exists for the selected transitions.
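The two conditions can be checked by brute force on toy examples. The sketch below is an illustration under stated assumptions, not the static analysis performed by BSC: it enumerates every state of two small registers A and B, treats a pair as <SC when concurrent execution (all reads from the latched state) equals the sequential result "Ta, then Tb" with Tb still enabled after Ta, and treats <>CF as <SC in both directions. The transitions T1 to T6 are hypothetical stand-ins for the three cases of Figure 2. Selected sets are ordered with Python's `graphlib.TopologicalSorter`, which raises `CycleError` when no partial order exists.

```python
import itertools
from graphlib import CycleError, TopologicalSorter
from typing import Callable, Dict, List, Optional

State = Dict[str, int]
STATES = [dict(zip("AB", v)) for v in itertools.product(range(3), repeat=2)]


class T:
    def __init__(self, name: str, actions: Dict[str, Callable[[State], int]],
                 pred: Callable[[State], bool] = lambda s: True):
        self.name, self.actions, self.pred = name, actions, pred


def apply(t: T, s: State) -> State:
    return {**s, **{r: f(s) for r, f in t.actions.items()}}


def concurrent(ts: List[T], s: State) -> Optional[State]:
    # Every transition reads the same latched state s; writes are merged.
    out, written = dict(s), set()
    for t in ts:
        for r, f in t.actions.items():
            if r in written:
                return None  # two writers of one register: no valid merge
            written.add(r)
            out[r] = f(s)
    return out


def is_sc(ta: T, tb: T) -> bool:
    """ta <SC tb: concurrent execution matches 'ta, then tb' wherever both are enabled."""
    for s in STATES:
        if not (ta.pred(s) and tb.pred(s)):
            continue
        mid = apply(ta, s)
        both = concurrent([ta, tb], s)
        if both is None or not tb.pred(mid) or both != apply(tb, mid):
            return False
    return True


def is_cf(ta: T, tb: T) -> bool:
    return is_sc(ta, tb) and is_sc(tb, ta)  # must agree with both orders


def schedule(selected: List[T]) -> Optional[List[str]]:
    """Return a sequential order explaining concurrent execution, or None."""
    preds = {t.name: set() for t in selected}
    for ta, tb in itertools.combinations(selected, 2):
        if is_cf(ta, tb):
            continue
        if is_sc(ta, tb):
            preds[tb.name].add(ta.name)  # ta must precede tb
        elif is_sc(tb, ta):
            preds[ta.name].add(tb.name)
        else:
            return None                  # condition 1 violated
    try:
        return list(TopologicalSorter(preds).static_order())
    except CycleError:
        return None                      # condition 2 violated


T1 = T("T1", {"A": lambda s: s["A"] + 1})
T2 = T("T2", {"B": lambda s: s["B"] + 1})  # disjoint from T1: <>CF
T3 = T("T3", {"B": lambda s: s["A"]})      # reads A
T4 = T("T4", {"A": lambda s: s["A"] + 1})  # writes A: T3 <SC T4
T5 = T("T5", {"A": lambda s: s["B"]})
T6 = T("T6", {"B": lambda s: s["A"]})      # circular with T5: conflict

print(schedule([T3, T4]), schedule([T5, T6]))  # ['T3', 'T4'] None
```

Enumerating states only scales to toy registers; a real compiler has to reason about the read and write sets of the transitions instead, which is why its analysis is conservative.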
Bluespec compiler. The synthesis results in this paper are produced by the Bluespec Compiler (BSC) [1]. The theorems mentioned above are applied in BSC to produce highly concurrent clock-synchronous implementations from a sequentially conceived and interpreted operation-centric hardware description. Two characteristics of these implementations are noteworthy. First, although a clock cycle in such an implementation appears to execute a number of transitions in sequence, those transitions all observe the same state values as latched on the previous clock edge. The second characteristic, a direct consequence of the first, is that the critical delay path of a multi-transition-per-cycle implementation is still determined only by the combinational delay of the single worst-case transition.
Limitations of operation-centric semantics
An enabled ATS transition is permitted, but not committed, to execute. When multiple ATS transitions are enabled together, ATS's abstract semantics allows any one transition to be selected non-deterministically. Hence, an ATS admits multiple "correct" deterministic clock-synchronous implementations. In fact, the ATS compiler exploits this freedom to select a subset of non-conflicting transitions to execute in each clock cycle. The downside of this freedom is that although a transition is guaranteed to execute only when enabled, it is never guaranteed to execute on a particular clock cycle, or even at all.
Consequently, ATS lacks the ability to describe a system whose correctness depends on both functionality and the exact timing of events. In BSC, this limitation is addressed by an assert pragma which, when applied to a transition, informs the compiler that the transition must execute whenever its predicate is true. If the compiler cannot guarantee the assertion (e.g., the user labels two conflicting transitions), then compilation should fail with an error.
This assert pragma, in essence, imparts synchronous semantics to the labeled transitions. The assert pragma has been successfully used to specify designs with synchronous behavioral constraints. In this paper, we formalize the assert pragma as committing transitions. We also develop the notation and compilation procedure to enable intuitive integration of synchronous committing transitions with the original non-committing transitions in the same representation framework.
Synchronous extensions in ATS+
We propose two extensions to ATS to support the specification of synchronous behavior. The current formulation of these two extensions assumes the particular synthesis strategy given in [9] and briefly described in Section 2.2. The resulting extended system is called ATS+. The first extension is a new class of transitions with committing execution semantics. The second extension adds support for synchronously delayed expressions that enable a transition's predicate and actions to read past values of state elements.
Committing transitions
The definition of ATS+ includes a new class of committing transitions, $X_c = (T_{c,1}, \ldots, T_{c,n})$. A committing transition has the same structure (π, α) as the original ATS transitions, which we now refer to as non-committing transitions.
In our examples, we write committing transitions with the symbol '+c' to differentiate them from non-committing transitions. Like non-committing transitions, committing transitions obey the invariant that they execute only when their predicate is true. Furthermore, the execution semantics of committing transitions remains atomic. As in ATS, the execution of an ATS+ implementation must still correspond to an interleaving of atomic transitions (both committing and non-committing), where each transition leads to a state that enables the predicate of the next transition.
The sequential interleaving of transitions in ATS+ is, however, sectioned into clock periods that each contain one or more transitions. The clock periods correspond to the real clock in a synthesized clock-synchronous implementation. If a committing transition's predicate is satisfied at the start of a clock period, then it must be executed in that clock period. Consequently, for a valid ATS+, the set of committing transitions must 1. be pairwise <>ME¹, <>CF, or <SC, and 2. have a partial order with respect to <SC.
These two conditions ensure that all committing transitions that could potentially be enabled in the same clock cycle can be executed concurrently in an implementation, as required by their committing semantics. On the other hand, if a non-committing transition is enabled in a clock cycle but conflicts with another enabled committing transition, the non-committing transition can always be correctly deferred to the next clock cycle.
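The per-clock-period rule can be sketched as a small selection function. This is a hypothetical model, not BSC's generated arbiter: each transition is summarized only by its read and write sets (a conservative approximation in the spirit of a static conflict analysis); a set can fire together if no register has two writers and the "reader before writer" precedence graph is acyclic; all enabled committing transitions are selected first; and each enabled non-committing transition is added only if the set stays compatible, otherwise it is deferred. The register and transition names are made up for illustration.

```python
from dataclasses import dataclass
from graphlib import CycleError, TopologicalSorter
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Tr:
    name: str
    reads: FrozenSet[str]
    writes: FrozenSet[str]
    committing: bool = False


def can_fire_together(ts: List[Tr]) -> bool:
    written = set()
    for t in ts:
        if written & t.writes:
            return False  # two writers of one register
        written |= t.writes
    # a must precede t if a reads a register that t writes (a must see the old value).
    preds = {t.name: {a.name for a in ts if a is not t and a.reads & t.writes} for t in ts}
    try:
        list(TopologicalSorter(preds).static_order())
        return True
    except CycleError:
        return False


def schedule_cycle(enabled: List[Tr]) -> Tuple[List[str], List[str]]:
    """Return (fired, deferred) transition names for one clock period."""
    selected = [t for t in enabled if t.committing]
    if not can_fire_together(selected):
        # ATS+ rejects such a description at compile time; this sketch just refuses.
        raise ValueError("invalid ATS+: enabled committing transitions cannot all fire")
    deferred = []
    for t in enabled:
        if t.committing:
            continue
        if can_fire_together(selected + [t]):
            selected.append(t)
        else:
            deferred.append(t.name)  # safe: it may fire in a later cycle
    return [t.name for t in selected], deferred


grant0 = Tr("grant0", frozenset({"req0"}), frozenset({"tok"}), committing=True)
grant1 = Tr("grant1", frozenset({"req1"}), frozenset({"tok"}))
count = Tr("count", frozenset({"cnt"}), frozenset({"cnt"}))
print(schedule_cycle([grant1, grant0, count]))  # (['grant0', 'count'], ['grant1'])
```

Selecting committing transitions before non-committing ones mirrors the text: a committing transition can never be deferred, whereas deferring a non-committing one is always correct.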
¹ Ta <>ME Tb implies that Ta and Tb have mutually exclusive predicate conditions, i.e., $\forall s.\ \neg\left(\pi_{T_a}(s) \wedge \pi_{T_b}(s)\right)$.
Synchronously delayed expressions
ATS+ with committing transitions has the ability to specify all synchronous hardware behaviors. Unfortunately, describing synchronous behaviors that span multiple clock periods can be tedious because a committing transition can only relate states that are separated by one clock edge. Such multi-cycle behaviors must be constructed using a sequence of committing transitions as basic building blocks. Below we introduce two syntactic shorthands that simplify the specification of multiple-cycle synchronous behavior in ATS+.
"was" expressions. 
ATS+ compilation
Our goal is to map an ATS+ description to a clock-synchronous Verilog RTL description. Our ATS+ compiler is a meta-compiler layered on top of the Bluespec Compiler (BSC), as shown in Figure 4.
Committing transitions
Delayed expressions
The first pass of ATS+ compilation expands delayed expressions into a set of committing transitions without delayed expressions.
Assuming r > 0, the expansion of this transition is
The delayed expressions in an ATS+ are expanded one at a time until only committing and non-committing transitions without delayed expressions remain. In the second pass of ATS+ compilation, every committing transition is recast into a non-committing transition annotated by BSC's assert pragma. The committing transitions need to be checked for validity according to the requirement posed in Section 3.1 (i.e., the committing transitions should be pairwise <>ME, <>CF, or <SC). By design, this validity check coincides with BSC's existing check for the scheduling of asserted transitions: if BSC cannot schedule all of the asserted transitions, it generates an error. The committing transitions generated by the ATS+ compiler during delayed-expression expansion cannot change the validity of an ATS+. However, because BSC's conflict analysis is a conservative approximation, it is possible for a valid set of committing transitions to fail compilation. This issue is addressed by the second of the two optimizations described next.
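Because the expansion displays are not reproduced here, the following is only a plausible sketch of the first pass under our own assumptions: a reference e[r] with r > 0 becomes a chain of r hypothetical delay registers e_d1 to e_dr, each updated by a committing transition whose predicate is always true (e_d1 := e, and e_dk := e_d(k-1) for k > 1), and every use of e[r] reads e_dr. Since all these transitions fire in every cycle and read the latched state, the chain behaves as a shift register. The register naming and the reset value 0 are assumptions.

```python
from typing import List, Tuple


def expand_delay(expr: str, r: int) -> List[Tuple[str, str]]:
    """Expand expr[r] (r > 0) into (target, source) pairs.

    Each pair stands for a committing transition with predicate true and
    action target := source; the last target replaces expr[r].
    """
    if r <= 0:
        raise ValueError("delay must be positive")
    regs = [f"{expr}_d{k}" for k in range(1, r + 1)]
    return list(zip(regs, [expr] + regs[:-1]))


def simulate(trace: List[int], r: int) -> List[int]:
    """Value seen by a reference to e[r] in each cycle, given e's per-cycle values."""
    chain = expand_delay("e", r)
    state = {target: 0 for target, _ in chain}  # assumed reset value
    seen = []
    for e_now in trace:
        env = {**state, "e": e_now}
        seen.append(state[chain[-1][0]])  # e[r] reads the last register in the chain
        # Clock edge: all committing transitions fire, reading the latched values.
        state = {target: env[source] for target, source in chain}
    return seen


print(expand_delay("e", 2))          # [('e_d1', 'e'), ('e_d2', 'e_d1')]
print(simulate([1, 2, 3, 4, 5], 2))  # [0, 0, 1, 2, 3]
```

The generated chain transitions each write a distinct register and only read their predecessor, so under the concurrent read-then-write semantics they can always fire together, consistent with the claim that the generated committing transitions do not change the validity of an ATS+.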
Optimizations
Interval optimization. For delayed intervals with large upper bounds, the shift-register chain introduces a high area overhead. A Boolean interval expression using the reduction operator '&' can be optimized by computing the reduction using a saturating counter. This optimization replaces a $(t'-t+1)$-bit section of the shift-register chain with a $\lceil \log_2(t'-t+2) \rceil$-bit counter. This optimization is expressed in the expansion below.
Therefore, T1 and T2 are valid and synthesizable together even though their actions are neither <>CF nor <SC. Next, consider T3 and T4, whose predicates are mutually exclusive delayed expressions. In this case, after de-sugaring, the actions of T3 and T4 are each predicated by values of delay registers that correspond to the values of (X&Y) and (X&!Y) from 3 cycles ago. BSC would fail to recognize that the delayed predicate values are mutually exclusive, since they appear to come from two uncorrelated registers. Therefore, T3 and T4 would not be synthesizable by BSC, although they are valid together.
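Returning to the interval optimization, the equivalence between the shift-register form and the counter form can be checked in simulation. This is our own sketch of one way to realize the optimization, assuming 1 <= t <= t' and reset values of False and 0: the reference form ANDs stages t to t' of a t'-stage shift register holding past values of e, while the optimized form keeps only t-1 shift stages plus a counter, saturating at t'-t+1, of how many consecutive cycles e was true up to t cycles ago. That counter takes t'-t+2 distinct values, which is where the ⌈log2(t'-t+2)⌉-bit width comes from. In the code, `tp` stands for t'.

```python
import math
import random
from collections import deque
from typing import List


def shift_register_and(trace: List[bool], t: int, tp: int) -> List[bool]:
    """AND of e over cycles t..tp ago, using a tp-stage shift register."""
    stages = deque([False] * tp, maxlen=tp)  # stages[k-1] holds e from k cycles ago
    out = []
    for e in trace:
        out.append(all(stages[k - 1] for k in range(t, tp + 1)))
        stages.appendleft(e)  # clock edge; the oldest value falls off the end
    return out


def counter_and(trace: List[bool], t: int, tp: int) -> List[bool]:
    """Same result with t-1 shift stages plus a saturating run-length counter."""
    sat = tp - t + 1
    prefix = deque([False] * (t - 1), maxlen=t - 1)  # e from 1..t-1 cycles ago
    count = 0  # consecutive cycles e was true, ending t cycles ago (saturates at sat)
    out = []
    for e in trace:
        out.append(count >= sat)
        feed = e if t == 1 else prefix[-1]          # e from t-1 cycles ago
        count = min(count + 1, sat) if feed else 0  # clock edge
        if t > 1:
            prefix.appendleft(e)
    return out


def counter_bits(t: int, tp: int) -> int:
    return math.ceil(math.log2(tp - t + 2))


rng = random.Random(1)
trace = [rng.random() < 0.8 for _ in range(500)]
assert shift_register_and(trace, 3, 20) == counter_and(trace, 3, 20)
print(counter_bits(3, 20), "counter bits instead of", 20 - 3 + 1, "shift-register bits")
# 5 counter bits instead of 18 shift-register bits
```

The saving grows with the interval length, since the counter width grows logarithmically while the replaced shift-register section grows linearly.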
To fully expose the data dependencies between transitions to BSC, the ATS+ compiler pushes the delay of an expression down into its variables, as shown in transitions T5 and T6. After common sub-expression elimination, the expression Y[3] in the de-sugared versions of both T5 and T6 is replaced by references to the same delay register. Thus, BSC can deduce T5 <>ME T6 and successfully synthesize these two transitions together. However, the push transformation increases the number of shift-register chains in a design, since each delayed variable uses its own chain. Therefore, our ATS+ compiler invokes BSC twice. First, the pushed transitions are compiled by the ATS+ compiler and then by BSC to obtain the most precise conflict analysis possible. Then, to produce the more efficient implementation, the ATS+ compiler and BSC are invoked again on the unmodified transitions, annotated with pragmas that override BSC's analysis.
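The effect of pushing delays into variables on mutual-exclusion detection can be illustrated with a toy expression model. The sketch is hypothetical and only mimics a conservative analysis: every delay register is treated as an unconstrained Boolean, and two predicates are deemed exclusive only if no assignment to the registers they read makes both true. Without pushing, (X&Y)[3] and (X&!Y)[3] each read their own opaque register; after pushing, both read X[3] and Y[3], and the shared Y[3] lets the enumeration prove exclusivity. Pushing is sound because delaying the output of combinational logic equals applying that logic to delayed inputs, assuming reset values are chosen consistently.

```python
from itertools import product
from typing import Dict, Set, Tuple

# Expressions: ("var", name) | ("not", e) | ("and", e1, e2) | ("delay", e, n)
# After lowering, leaves are ("reg", name, d): a register holding name from d cycles ago.
Key = Tuple[str, int]


def push(e, d: int = 0):
    """Push delays down to the variables: (X & Y)[3] -> X[3] & Y[3]."""
    kind = e[0]
    if kind == "var":
        return ("reg", e[1], d)
    if kind == "delay":
        return push(e[1], d + e[2])
    if kind == "not":
        return ("not", push(e[1], d))
    return ("and", push(e[1], d), push(e[2], d))


def opaque(e):
    """Without pushing: a delayed expression gets one register of its own."""
    return ("reg", repr(e[1]), e[2]) if e[0] == "delay" else push(e)


def regs(e) -> Set[Key]:
    if e[0] == "reg":
        return {(e[1], e[2])}
    return set().union(*(regs(c) for c in e[1:]))


def evaluate(e, env: Dict[Key, bool]) -> bool:
    if e[0] == "reg":
        return env[(e[1], e[2])]
    if e[0] == "not":
        return not evaluate(e[1], env)
    return evaluate(e[1], env) and evaluate(e[2], env)


def provably_exclusive(p, q) -> bool:
    keys = sorted(regs(p) | regs(q))  # shared registers appear once (CSE)
    for values in product([False, True], repeat=len(keys)):
        env = dict(zip(keys, values))
        if evaluate(p, env) and evaluate(q, env):
            return False
    return True


X, Y = ("var", "X"), ("var", "Y")
p3 = ("delay", ("and", X, Y), 3)           # T3's predicate
p4 = ("delay", ("and", X, ("not", Y)), 3)  # T4's predicate
print(provably_exclusive(opaque(p3), opaque(p4)))  # False: two unrelated registers
print(provably_exclusive(push(p3), push(p4)))      # True: Y[3] is shared
print(sorted(regs(push(p3)) | regs(push(p4))))     # [('X', 3), ('Y', 3)]
```

The last line also shows the cost noted in the text: after pushing, each delayed variable (here X and Y) needs its own delay chain.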
Results
In this section, we compare ATS+ to hand-coded Verilog RTL and Esterel.
In Tables 1 and 2, we report the synthesized area and cycle time of the ATS+ and RTL Verilog examples. The results are generated using Synopsys Design Compiler for a commercial 0.18 µm standard-cell library. In these examples, the quality of the circuits generated from ATS+ descriptions through BSC and from hand-coded Verilog is very similar. In the first two examples, based on standalone primitives, the circuits synthesized from ATS+ through BSC are essentially identical to those synthesized directly from hand-coded RTL and therefore have the same area and cycle time.
In addition, we synthesized the interval expression example both with and without the optimization discussed in Section 4.3, and compared the results to a Verilog counterpart that was manually converted to use counters. The counter optimization achieves the expected area reduction; the magnitude of the reduction is a function of the interval length.
Delayed-expression primitives. We first compare how simple ATS+ delayed expressions would be expressed in Verilog RTL and Esterel. Because neither Verilog nor Esterel supports references to values more than one cycle ago, the equivalent expression has to be explicitly constructed from single-cycle primitives.
Shared token example 1. This example is based on an arbiter that manages the sharing of a common resource between two clients with the help of a token. A client is allowed to use the resource if it logically possesses the token. A client must maintain its request signal until it is granted the token. This example has a hard synchronous requirement: a client must be granted the token within 10 cycles. Excerpts of the descriptions of this arbiter in ATS+, Verilog, and Esterel are shown in Figure 7.
The ATS+ description specifies the above requirements with four transitions. The first two are committing transitions that specify the synchronous requirement, i.e., if a client has been waiting for 10 cycles, then the token must be taken from the other client. The last two are non-committing transitions stating that a client may get the token upon request, provided that no other committing or non-committing transition conflicts with the token grant.
The equivalent Esterel description consists of three parallel loops. We use the first two loops to keep track of ten successive requests. It is important to point out that, since both Esterel and Verilog are deterministic, their descriptions must fix a priority for the case where both clients request the token but neither has waited without the token for 10 cycles.
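The 10-cycle requirement of shared token example 1 can be exercised with a cycle-level model. This is not a translation of Figure 7; it is a hypothetical sketch with our own assumptions: requests arrive at random, a request is held until granted, the non-committing grant fires with probability `p_grant` in any cycle (standing in for the non-deterministic choice), and the committing grant fires in the 10th consecutive cycle a client spends waiting. Because only the client without the token can be waiting, the two committing transitions are never enabled together. Whatever the non-deterministic choices, no request should wait more than 10 cycles.

```python
import random

LIMIT = 10  # hard bound from the example: a request is granted within 10 cycles


def simulate(cycles: int, p_grant: float, seed: int = 0) -> int:
    """Return the longest wait (in cycles) observed before a grant."""
    rng = random.Random(seed)
    owner = 0        # client currently holding the token
    pending = False  # the other client has an outstanding request
    waited = 0       # cycles that request has been outstanding, including this one
    longest = 0
    for _ in range(cycles):
        if not pending:
            pending = rng.random() < 0.5  # assumed request arrival rate
        if pending:
            waited += 1
            must = waited >= LIMIT        # committing transition enabled
            may = rng.random() < p_grant  # non-committing grant chosen
            if must or may:
                longest = max(longest, waited)
                owner, pending, waited = 1 - owner, False, 0  # token changes hands
    return longest


print(simulate(1000, p_grant=0.0))  # 10: only the committing grant ever fires
print(max(simulate(1000, p, seed) for p in (0.1, 0.5, 0.9) for seed in range(5)))
```

Setting `p_grant` to 0 models the least helpful scheduler permitted by the non-committing semantics; the committing transition alone still meets the bound.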
Shared token example 2. We present another shared-token arbiter; only excerpts of its ATS+ description are shown in Figure 8. This arbiter handles requests from three clients for a shared resource and uses a token that circulates among the clients to determine priority. The first transition circulates the token among the three clients. The next three committing transitions specify that a request from the client currently holding the token is granted right away. The final three non-committing transitions grant the request of a client without the token if the token holder is not requesting in the same cycle. Non-committing transitions are ideal in these latter scenarios since these events have no hard synchronous requirements. This exemplifies the use of non-determinism to relieve the designer of "don't care" decisions.
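The division of labor in shared token example 2 can likewise be sketched as a per-cycle grant function. This is a hypothetical model, not the Figure 8 description, and it adds assumptions the text does not state: the token advances to the next client every cycle, at most one client is granted per cycle, and the arbitrary choice among requesting non-holders is modeled with a random pick. The checks at the end confirm the committing obligation (a requesting token holder is always granted) and that every other grant goes to a requesting client.

```python
import random
from typing import List, Optional, Tuple

N = 3  # number of clients


def arbitrate(holder: int, req: List[bool], rng: random.Random) -> Optional[int]:
    if req[holder]:
        return holder  # committing: the token holder is granted at once
    others = [i for i in range(N) if i != holder and req[i]]
    return rng.choice(others) if others else None  # non-committing: any requester is fine


def simulate(cycles: int, seed: int = 0) -> List[Tuple[int, List[bool], Optional[int]]]:
    rng = random.Random(seed)
    holder, log = 0, []
    for _ in range(cycles):
        req = [rng.random() < 0.5 for _ in range(N)]  # hypothetical request pattern
        g = arbitrate(holder, req, rng)
        log.append((holder, req, g))
        holder = (holder + 1) % N  # assumed: the token advances every cycle
    return log


for holder, req, g in simulate(500):
    if req[holder]:
        assert g == holder
    elif any(req):
        assert g is not None and req[g] and g != holder
    else:
        assert g is None
print("all cycles satisfy the grant rules")
```

A deterministic Verilog or Esterel version would have to replace `rng.choice` with a fixed priority; in ATS+ that choice is left to the compiler.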
Related work
ATS is a high-level hardware representation based on operation-centric state transitions. ATS descriptions are not behavioral descriptions [4] in that an ATS description is not a procedural description with sequential control flow (as in C or behavioral Verilog). ATS was developed as an intermediate representation for the compilation of TRSpec [9] and is currently also used to support Bluespec [1]. Our synthesis flow uses the Bluespec operation-centric language and compiler.
The ATS model of computation is inspired by Term Rewriting Systems (TRS) [2], a well-known reduction formalism with lineage from the lambda calculus. ATS is also similar to Dijkstra's guarded commands [7]. Similar semantics have also served as the basis of parallel programming languages [6], hardware description languages for synchronous and asynchronous design synthesis [13, 15, 16], and languages for hardware design verification [12, 14].
ATS+ supports the natural specification of synchronous and asynchronous behaviors in the same framework. The synchronous subset of ATS+ is a simple synchronous language. "Synchronous languages" [3] refers to a class of formal specification/programming languages exemplified by Esterel [5], Lustre [8], and Signal [11]. Synchronous languages typically offer 1. a discrete model of time, 2. explicit expressions of concurrency, and 3. deterministic compiled behavior.
Conclusions
In this paper, we presented committing transitions and synchronously delayed expressions as two synchronous extensions to ATS. These two new synchronous language elements enable ATS+ to capture both synchronous and asynchronous behaviors in the same hardware description. The intent is to allow the original ATS's simplifying atomic and sequential semantics to assist in the description of complex concurrent internal behaviors, and in the same description, the synchronous extensions are used to describe synchronous interfaces to external modules or internal synchronous IP blocks.
We described the compilation of the synchronous extensions using existing ATS synthesis capabilities (i.e., BSC). Namely, synchronously delayed expressions are expanded into automatically generated committing transitions, and both the user-specified and the generated committing transitions are translated into non-committing transitions annotated by BSC's assert pragma. In our evaluation, we compared ATS+ to Verilog RTL and Esterel in terms of ease of description, and to Verilog RTL in terms of synthesis quality. We showed that ATS+ enables compact descriptions of complex synchronous and asynchronous behavior and permits efficient synthesis to clock-synchronous implementations.
