In this paper we formally define 
Introduction
In today's SoC era, most new hardware blocks are still designed at the register transfer level (RTL). Engineers first use a hardware description language (HDL) such as Verilog or VHDL to model the RTL hardware components: control logic and datapath units such as multiplexers, registers, and arithmetic units. Synthesis, simulation, and optionally formal verification tools are then applied to this model to generate the hardware logic and verify its functional correctness. This is often a long and error-prone process that has been, and continues to be, the major bottleneck of SoC design productivity. One of the main sources of the problem is commonly attributed to Verilog and VHDL themselves. They were designed for, and are well suited to, gate-level simulation; for example, their delta-cycle semantics guarantee that they can simulate combinational logic behavior even when the circuit contains combinational feedback loops. They do not conveniently support true RTL semantics for synthesis, simulation, and formal verification. Furthermore, their semantics are not formally defined, and even the IEEE standard documents contain ambiguities that have led to incompatible simulation results among different vendors. In practice, whatever coding style the dominant EDA tool accepts becomes the de facto standard for RTL semantics.
To alleviate this problem, many languages [2, 10, 6, 3] and associated tools have been proposed and implemented in industry and academia. However, in our opinion some of these languages are not suited for RTL, some inherit the very problems that exist in today's HDLs, and some are headed in the wrong direction [4]. We believe a study of the appropriate RTL semantics is necessary and could shed light on the design of the next generation of HDLs by allowing precise specifications of intended program behavior and permitting proofs that a program does (or perhaps does not) meet its specification.
In Section 2 we review related work. In Section 3 we propose a new language, RTL++, as the basis for discussing and describing the enhanced RTL semantics. In Section 4 we present the formal execution semantics of RTL++ using structural operational semantics (SOS) [12] as the framework. In Section 5 we define a new model of computation called the Register-transfer Finite State Machine (RFSM) as a platform for describing the synthesis semantics of RTL++. Section 6 shows an example of extending SystemC to support the key semantic enhancement. Section 7 concludes the paper and suggests future research directions.
Related Work
There have been several efforts to define formal semantics for existing HDLs. The most relevant is [7], in which Gordon defined a trace semantics model for the synthesizable subset of Verilog. Accellera made an effort in 2001 to standardize RTL semantics [1], and [14] provides an implementation of the proposed semantics as a C++ library. Zhu [15] defined a meta RTL language to enrich RTL data types. Another source of inspiration is the synchronous programming community, which pioneered the concept of synchrony that is also assumed in this paper. Esterel [3] is ideal for control-dominated applications and has been adopted in the control software domain, while Lustre [8] has largely been applied to digital signal processing software development. We believe a good RTL language should be simpler and better attuned to both control-oriented and data-oriented applications.
The RTL++ Language

    module diffeq(input wire bool start,
                  output pregister<2> bool ready,
                  input wire bool reset,
                  input wire integer Xinp,
                  input wire integer Yinp,
                  input wire integer Uinp,
                  input wire integer Ap,
                  input wire integer DXp,
                  output register integer Xoutp,
                  output register integer Youtp,
                  output register integer Uoutp,
                  clock clk);
      register integer x, y, u, a, dx;
      register integer t1, t3, t4, t5;
      pregister<2> integer t2;  // path to t2 is pipelined

      while true {
        wait until (start == 1);
        ready = 0;
        x = Xinp; y = Yinp; u = Uinp; a = Ap; dx = DXp;
        wait;
        while (x < a) {
          t1 = u * dx;
          t2 = 3 * x;    // start 1st pipelined op
          wait;
          t2 = 3 * y;    // start 2nd pipelined op
          x = x + dx;
          wait;
          t4 = t1 * t2;  // t2 from 1st pipelined op
          y = y + t1;
          wait;
          t5 = dx * t2;  // t2 from 2nd pipelined op
          wait;
          u = u - t4;
          wait;
          u = u - t5;
          wait;
        }
        Xoutp = x; Youtp = y; Uoutp = u;
        ready = 1;
      }
    endmodule

Listing 1. RTL++ code example for a solver of the differential equation y'' + 3xy' + 3y = 0

Our goal is to provide a formal semantic framework to guide the related simulation, synthesis, and formal verification tools. It is not our intention to invent yet another HDL, but we have not found a suitable existing language in which to define the RTL semantics we propose. We would also like to stay language neutral, to avoid inheriting features whose consequences we do not understand. We therefore start by defining a new RTL language called RTL++, which serves as an experimental ground for studying the proposed semantics. The new features of RTL++ could help inspire the design of future HDLs or extensions of existing languages. In Section 6 we show an example of extending SystemC to support one feature of RTL++.
A good way to introduce a new language is by example. Listing 1 presents an RTL++ program adapted from [11]. The example demonstrates most of the language constructs needed to specify an RTL block. As in HDLs like Verilog, the module is the basic compositional unit for building structural hierarchy. A system modeled in RTL++ is composed of a list of modules that run in parallel and communicate through ports connected by wires. Each module is synchronous to a clock. The whole system can be seen as globally asynchronous but locally synchronous, which suffices to represent most hardware systems. The detailed communication mechanism is outside the scope of this paper, which focuses on the behavior of each individual module. The module behavior is specified with a while true loop statement. The language is mainly imperative, a style familiar to most engineers. In contrast to the current practice of describing an FSM with an HDL case statement that explicitly branches to different action blocks based on specific state values, RTL++ promotes the so-called implicit state machine style. A sequential block of the program is divided into cycle blocks by a cycle delimiter, the wait statement in RTL++. Each cycle block implies a distinct control flow state of the program.
In the following sections we present the main ingredients of RTL++ that distinguish it from other HDLs. Many important topics are omitted, including data structures, data types, functions, and procedures. For instance, RTL++ accepts only integer and bool as data types for arithmetic and boolean operations. In this paper we focus on the execution flow control aspect of an RTL program.
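The difference between the implicit and the explicit state machine styles can be sketched in Python. This is only an illustration, not RTL++ itself: a generator's `yield` is assumed to stand in for `wait`, and the action names (`load`, `compute`, `store`) are hypothetical.

```python
def implicit_fsm(trace):
    """Implicit style: each `yield` plays the role of `wait` and ends a cycle block."""
    while True:
        trace.append("load")      # cycle block 0
        yield
        trace.append("compute")   # cycle block 1
        yield
        trace.append("store")     # cycle block 2
        yield


def explicit_fsm(trace, cycles):
    """Explicit style: a state variable plus case-like branching, one branch per cycle."""
    state = 0
    for _ in range(cycles):
        if state == 0:
            trace.append("load")
            state = 1
        elif state == 1:
            trace.append("compute")
            state = 2
        else:
            trace.append("store")
            state = 0


def run(cycles):
    """Drive both descriptions for the same number of clock cycles."""
    implicit, explicit = [], []
    proc = implicit_fsm(implicit)
    for _ in range(cycles):
        next(proc)  # one clock cycle: execute up to the next wait
    explicit_fsm(explicit, cycles)
    return implicit, explicit
```

Both descriptions produce the same sequence of actions per clock cycle; in the implicit version the control state is simply the position of the program counter between two waits.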
Variables
To avoid shortcomings of other HDLs such as register inference ambiguities, clear synthesis semantics is the primary design objective of RTL++. We define three types of variables: wire, register, and pipelined register. Each is intended to map exactly to the same type of object in the synthesized hardware logic, and all are associated with the module clock. Their informal semantics are as follows.
Wire
A wire variable stores intermediate computation results within a cycle block. Once a new value is assigned, it is immediately available to all readers. In this sense it is similar to a VHDL variable, but it differs in that the storage of a wire variable is volatile. The variable lives only in its defining cycle block, that is, the block containing an assignment with the variable on the left-hand side. It dies as soon as execution passes the cycle block boundary delimited by a wait statement. A wire variable can be mapped to the combinational output of a function in hardware logic.
Register
A register variable stores computation results across cycle block boundaries. It essentially uses two buffers, one holding its current value and one holding its new value. When a value is assigned to the register variable, it is first stored in the new-value buffer. Only at the beginning of the next clock cycle is the current value updated from the new-value buffer. Unlike a VHDL signal or a Verilog reg, which may be synthesized to registers or wires depending on how they are inferred, a register variable in RTL++ corresponds to a specific physical register (flip-flop) in hardware.
Pipelined Register
Pipelined operations are very common in RTL design, yet capturing their design intent in an RTL HDL is awkward. We introduce the pipelined register variable in RTL++ for this purpose. As illustrated in Figure 1, a back-to-back shift register array is used to represent the effect of a pipelined operation. The depth of the array equals the number of pipeline stages. With respect to the pipeline output this is an equivalent abstraction: the same input sequence produces the same output sequence. (This holds under the normal assumption that the intermediate pipeline stage registers are not used directly for computation.) As its name suggests, this new type of variable helps raise the RTL abstraction level, which usually deals only with computations that happen within one clock cycle. The abstraction clearly works for modeling and simulation. To be synthesizable, it relies on the availability of pipeline retiming tools such as Synopsys Design Compiler [13], whose behavioral retiming feature can find an optimal boundary in the combinational logic at which to insert pipeline stage registers. As exemplified in Listing 1, a pipelined register variable can also be used on module ports to account for interconnect delay, which has become dominant in deep submicron chip design.
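The informal semantics of the three variable types can be sketched as a small Python model. This is a sketch under stated assumptions rather than the authors' definition: the class name `Var`, its methods, and the use of `None` for an undefined value are hypothetical, and it is assumed that an assignment always writes the first element of the shift array while a read always observes the last one.

```python
UNDEF = None  # stands in for an undefined value (assumption)


class Var:
    """An RTL++ variable modelled as a shift array v[0..depth].

    depth = 0: wire, depth = 1: register, depth = n: n-stage pipelined register.
    """

    def __init__(self, depth):
        self.depth = depth
        self.v = [UNDEF] * (depth + 1)

    def write(self, value):
        # Assumption: an assignment always lands in v[0].
        self.v[0] = value

    def read(self):
        # Assumption: a read always observes the last element.
        return self.v[self.depth]

    def wait(self):
        """Advance one clock cycle (the effect of an RTL++ wait)."""
        if self.depth == 0:
            self.v[0] = UNDEF  # a wire dies at the cycle block boundary
        else:
            # Shift towards the output; v[0] keeps its value, so a
            # register that is not reassigned holds its contents.
            self.v[1:] = self.v[:-1]


def t2_trace(x, y):
    """Replay the accesses to the pregister<2> variable t2 in Listing 1."""
    t2 = Var(2)
    t2.write(3 * x)     # start 1st pipelined op
    t2.wait()
    t2.write(3 * y)     # start 2nd pipelined op
    t2.wait()
    first = t2.read()   # t2 from 1st pipelined op
    t2.wait()
    second = t2.read()  # t2 from 2nd pipelined op
    return first, second
```

Under these assumptions a value written to a two-stage pipelined register becomes readable two cycles later, which matches the comments on `t2` in Listing 1.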
Abstract Syntax
We define the RTL++ language through an abstract syntax, which lets us focus on the program structures that carry semantic significance rather than on parsing lexical tokens correctly. The syntax is captured in a BNF-like notation over the following syntactic categories, each with meta-variables that range over its constructs:
• n ranges over numerals,
• w ranges over wires,
• r ranges over registers,
• p ranges over pipelined registers,
• v ranges over all three types of variables, v = w ∪ r ∪ p,
• a ranges over arithmetic expressions,
• b ranges over boolean expressions,
• S ranges over statements,
• aop ∈ {+, −, *, ...} ranges over a finite set of binary integer operators,
• bop_a ∈ {=, <, >, ...} ranges over a finite set of binary boolean operators on integers,
• bop_b1 ∈ {not, ...} ranges over a finite set of unary boolean operators,
• bop_b2 ∈ {and, or, ...} ranges over a finite set of binary boolean operators.
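One grammar consistent with these meta-variables and with the statements used in Listing 1 is sketched below. The productions are an assumption made for illustration, not the authors' definition; in particular the concrete form of the branch statement and the no-op statement skip are hypothetical.

```latex
\begin{align*}
a &::= n \mid v \mid a_1 \; aop \; a_2 \\
b &::= \mathtt{true} \mid \mathtt{false} \mid a_1 \; bop_a \; a_2
       \mid bop_{b1} \; b \mid b_1 \; bop_{b2} \; b_2 \\
S &::= v = a \mid \mathtt{skip} \mid S_1 ; S_2
       \mid \mathtt{if}\ b\ \mathtt{then}\ S_1\ \mathtt{else}\ S_2 \\
  &\;\;\mid \mathtt{while}\ b\ S \mid \mathtt{wait} \mid \mathtt{wait\ until}\ b
\end{align*}
```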
Formal Execution Semantics of RTL++
We define the execution semantics of RTL++ using Plotkin-style structural operational semantics (SOS) as our framework. The idea is to describe explicitly how RTL++ programs compute in a stepwise fashion and the possible state transformations they perform. SOS uses a transition system, which is a structure ⟨Γ, ⇒⟩ where Γ is a set of configurations and ⇒ ⊆ Γ × Γ is the transition relation.
In the context of RTL++, a configuration is a pair ⟨S, s⟩, where S is an RTL++ syntactic construct as defined in Section 3.2 and s is the value (state) of all RTL++ variables, including wires, registers, and pipelined registers. We use A⟦a⟧s and B⟦b⟧s to denote the semantic functions for arithmetic expressions and boolean expressions respectively. A semantic function takes a syntactic entity as its argument and returns its meaning. We define only the semantics of variables and omit the remaining details of the expression semantics of RTL++. We uniformly define each of the three types of variables as a vector υ = {υ_0, ..., υ_n}, where n = 0 for a wire, n = 1 for a regular register, and n is the total number of pipeline stages for a pipelined register. Three variable functions are defined over this vector.
The rule for the branch statement shows that the first step in executing it is to perform the test and select the appropriate branch. The [while] rule shows that the first step in executing the while construct is to unfold it one level, that is, to rewrite it as a branch statement. The test is therefore performed in the second step of the execution, where one of the axioms for the branch statement applies.
The last rule, [wait until], shows that a wait until statement can be translated into an equivalent while statement.
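The rules described above can be written in the usual textbook SOS style, as sketched here. This is a reconstruction under assumptions: the rule names for the branch statement, the no-op statement skip, and the exact loop that replaces wait until (here a single wait guarded by the negated condition) are hypothetical and may differ from the authors' rule table.

```latex
\begin{align*}
[\mathrm{if}^{tt}] \quad & \langle \mathtt{if}\ b\ \mathtt{then}\ S_1\ \mathtt{else}\ S_2,\ s \rangle
    \Rightarrow \langle S_1,\ s \rangle
    && \text{if } \mathcal{B}[\![ b ]\!] s = \mathbf{tt} \\
[\mathrm{if}^{ff}] \quad & \langle \mathtt{if}\ b\ \mathtt{then}\ S_1\ \mathtt{else}\ S_2,\ s \rangle
    \Rightarrow \langle S_2,\ s \rangle
    && \text{if } \mathcal{B}[\![ b ]\!] s = \mathbf{ff} \\
[\mathrm{while}] \quad & \langle \mathtt{while}\ b\ S,\ s \rangle
    \Rightarrow \langle \mathtt{if}\ b\ \mathtt{then}\ (S ;\ \mathtt{while}\ b\ S)\ \mathtt{else}\ \mathtt{skip},\ s \rangle \\
[\mathrm{wait\ until}] \quad & \langle \mathtt{wait\ until}\ b,\ s \rangle
    \Rightarrow \langle \mathtt{while}\ (\mathtt{not}\ b)\ \mathtt{wait},\ s \rangle
\end{align*}
```

Note that [while] and [wait until] leave the state s unchanged: they only rewrite the statement, and the test on b is evaluated by a branch axiom in the following step.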
Synthesis Semantics: Register-transfer Finite State Machine
RTL++ has very clear synthesis semantics. All constructs of a program can be translated directly onto a hardware model of computation while preserving its execution semantics. A commonly used model of computation for RTL design is the FSMD [5]. However, it does not directly support the semantics of pipelined operations. Here we define a new model called the Register-transfer Finite State Machine (RFSM) that natively supports the proposed RTL++ semantics. The model is illustrated in Figure 2, and its components are as follows.
• R is the set of registers (memory elements) that store the datapath state. Its initial value is R_0 = {undef}. R contains n registers in total, with each pipelined register counted as one. A pipelined register R_i consists of m stage registers, where m is the number of pipeline stages of R_i; the first-stage registers and the last-stage registers form two distinguished subsets of R.
• I is a finite set of input symbols.
• O is a finite set of output symbols.
• f_s : PC × R × I → PC is the state transition function.
• f_r is the register update function.
• f_o : PC × R → O is the output function.
Each cycle block of an RTL++ program can be labeled with a unique natural number, starting with 0 for the first cycle block. These cycle block labels correspond directly to the values of the PC state register of the RFSM. A synthesis tool is free to re-encode the state values, for example with one-hot or one-cold encoding. The register and pipelined register variables also map one-to-one from their declarations in the program to the datapath register set R of the RFSM. The physical registers can of course be shared among register variables with non-overlapping lifetimes [9]; this topic is outside the scope of this paper. The remaining RTL++ assignment statements can be handled by a synthesis tool using allocation and binding algorithms [5] to generate the combinational logic and the connections.
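The way the three functions drive an RFSM from one clock cycle to the next can be sketched in Python. This is a minimal sketch under assumptions, not the authors' formal model: the class `RFSM`, the example program with its single register `acc`, and the signature assumed for f_r (PC × R × I → R) are all hypothetical, and pipelined registers are left out for brevity.

```python
class RFSM:
    """One-cycle-at-a-time interpreter for an RFSM with plain registers only."""

    def __init__(self, f_s, f_r, f_o, regs, pc=0):
        self.f_s, self.f_r, self.f_o = f_s, f_r, f_o
        self.regs = dict(regs)  # datapath register set R
        self.pc = pc            # label of the current cycle block

    def step(self, inp):
        out = self.f_o(self.pc, self.regs)            # f_o : PC x R -> O
        next_pc = self.f_s(self.pc, self.regs, inp)   # f_s : PC x R x I -> PC
        next_regs = self.f_r(self.pc, self.regs, inp) # assumed: PC x R x I -> R
        # PC and all registers update together at the clock edge.
        self.pc, self.regs = next_pc, next_regs
        return out


# Hypothetical two-cycle-block program:
#   block 0: wait for start, then load acc from input x
#   block 1: acc = acc * 2, return to block 0
def f_s(pc, regs, inp):
    if pc == 0:
        return 1 if inp["start"] else 0
    return 0


def f_r(pc, regs, inp):
    new = dict(regs)
    if pc == 0 and inp["start"]:
        new["acc"] = inp["x"]
    elif pc == 1:
        new["acc"] = regs["acc"] * 2
    return new


def f_o(pc, regs):
    return regs["acc"]


def run(inputs):
    machine = RFSM(f_s, f_r, f_o, {"acc": None})  # None models undef
    return [machine.step(i) for i in inputs]
```

Because the output function reads only PC and R, a value loaded in one cycle becomes visible at the output in the next cycle, which mirrors the register semantics of Section 3.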
#define NUM_STAGES 3
struct algstages : sc_module {
Conclusions and Future directions
In this paper we have proposed an enhanced RTL semantics that supports pipelined operations. For lack of a suitable HDL in which to define the proposed semantics, we have also defined the abstract syntax of the RTL++ language, which is intended to capture the minimal but necessary set of ingredients for RTL design modeling rather than to be a perfect language. For example, the wait statement in RTL++ advances only one clock cycle. A real-world language would certainly add a wait[n] statement so that users need not repeat wait multiple times, improving coding convenience and readability. The execution semantics of RTL++ is documented in the simple and intuitive notation of structural operational semantics. We have also defined a formal RFSM model in which to discuss the synthesis semantics of RTL++. These contributions form an unambiguous basis for future algorithm and tool development in RTL synthesis, formal verification, and simulation. We have shown one example of extending SystemC to support the pipelined register variable concept of the RTL++ semantics. Our immediate future research direction is to study the communication semantics between modules and to investigate whether a higher level of abstraction is needed for RTL.
