Automated Formal Equivalence Verification of Pipelined Nested Loops in
  Datapath Designs by Behnam, Payman et al.
 Automated Formal Equivalence Verification of 
Pipelined Nested Loops in Datapath Designs 
Payman Behnam, Student Member IEEE, Bijan Alizadeh, Senior Member IEEE, Sajjad Taheri, 
Student Member IEEE  
Abstract— The ever-growing complexity of digital systems has made designers move toward using Electronic System Level 
(ESL) design methodology at a higher abstraction level. The designs at ESL are then automatically synthesized to Register 
Transfer Level (RTL) by means of High Level or behavioral Synthesis (HLS) tools. Due to the possibility of buggy synthesis, 
especially when the target design must be manipulated or optimized (e.g., by pipelining), an efficient equivalence checking method 
is necessary to check functional equivalence of the ESL specification and the RTL implementation. This problem is even more 
serious in the case of loop pipelining, since several challenges such as overlapping execution, retiming and forwarding occur, 
which make traditional sequential equivalence checking approaches inapplicable. At the same time, the growing market for 
datapath dominated applications such as DSP for multimedia applications and embedded systems requires a suitable 
Computer-Aided Design (CAD) support for their verification. In this paper, we present an efficient formal approach to check the 
equivalence of synthesized RTL against the high-level specification in the presence of pipelining transformations. To increase 
the scalability of our proposed method, we dynamically divide the designs into several smaller parts called segments by 
introducing cut-points. Then we employ Modular Horner Expansion Diagram (M-HED) to check whether the specification and 
implementation are equivalent or not. In an iterative manner, the equivalence checking for each segment is performed. At each 
step, the equivalent nodes and those nodes which have an impact on them are removed until the whole design is covered. Our 
proposed method enables us to deal with the equivalence checking problem for behaviorally synthesized designs even in the 
presence of pipelines for nested loops. The empirical results demonstrate the efficiency and scalability of our proposed method 
in terms of run-time and memory usage for several large designs synthesized by a commercial behavioral synthesis tool. 
Average improvements in terms of the memory usage and run time in comparison with SMT- and SAT-based equivalence 
checking are 16.7× and 111.9×, respectively. 
Index Terms— Formal verification, equivalence checking, pipelined nested loop, HED 
1. INTRODUCTION
The complexity of the next generation of digital systems 
has overtaken traditional time-consuming handcrafted 
RTL design methods. Therefore, some approaches are de-
sirable to generate RTL codes automatically. High Level 
Synthesis (HLS) tools have been provided to respond to 
such needs. The HLS is the process of generating RTL de-
sign from higher level programs such as C, C++, SystemC, 
or so on [6]. Using HLS tools leads to more productive 
designs for next-generation, computationally intensive 
applications. When we make use of HLS tools, however, 
we need to make sure that synthesized RTL is bug free. 
This indicates that the transformation correctness of high 
level or behavioral synthesis phase is very important [2, 
3]. 
A large amount of work has been done to verify the 
RTL against its specification. A combinational equivalence 
checking approach between designs in SystemC and RTL 
has been suggested in [4]. The authors of [7], [8] and [9] 
presented Sequential Equivalence Checking (SEC) ap-
proaches between software specification and hardware 
implementation. During equivalence checking, several  
optimization techniques such as cut-point, cut-plane and 
cut-loop are used. The cut-point optimization is to find 
internal equivalent nodes of specification and their corre-
sponding circuit implementations. Cut-points reduce the 
size of symbolic expressions by replacing verified sub-
circuits with new symbolic values [28]. Cut-plane is con-
sidered as a set of cut-points while cut-loop is considered 
as a cut-plane at the end of a loop [30]. Some techniques 
have been proposed to check the equivalency between 
combinational circuits with some structural similarities 
using bit-level decision diagrams [28, 31], SAT-based 
approaches [38, 39, 43, 44], probabilistic methods [46] and 
directed test generation [47]. The structural similarities 
enable them to find identical internal nets as cut-points to 
partition the whole design into a set of smaller segments. 
However, these approaches to the problem of equivalence 
checking are limited to bit-level verification and hence 
cannot handle large RTL designs. In [30], the authors have 
proposed a novel approach to verify equivalence of C-
based system level description versus RTL model by look-
ing for merge-points as early as possible to reduce the size 
of equivalence checking problems. This method however 
is suggested for hand crafted RTL codes. In addition, it 
makes use of cut-loop techniques which are inapplicable 
to pipelined designs as will be discussed in Section 2. 
Lots of work has been performed for equivalence 
checking of generated RTL using HLS tools against its 
specification [5, 10, 11, 12, 13, 14]. The authors of [5] have 
———————————————— 
Payman Behnam is with the Computer Science Department, University of Utah, SLC, USA. 
Bijan Alizadeh is with the Electrical and Computer Engineering Department, University of 
Tehran, Tehran, Iran. Sajjad Taheri is with the Computer Science Department, University of 
California Irvine, Irvine, USA (e-mail: payman.behnam@utah.edu, b.alizadeh@ut.ac.ir, 
sajjadt@uci.edu). 
 
  
used a bi-simulation correspondence checking to validate 
designs generated by the SPARK behavioral synthesis 
tool. A suite of optimizations for the SEC framework has 
been presented by [10] which exploit both the explicit con-
trol and data flow representations in the Clocked Control 
and Data Flow Graph (CCDFG) and the module struc-
tures in the ESL description. The authors of [12] proposed 
a SEC framework to compare an ESL design with its be-
haviorally synthesized RTL in the presence of optimiza-
tions such as operation gating and global design varia-
bles. The work in [13] has tried to solve the equivalence 
checking problem for compiler transformations in behav-
ioral synthesis. 
The process of behavioral synthesis consists of several 
transformation phases including compilation, scheduling, 
allocation, binding, and control generation [6, 18]. On the 
other hand, when the target design must be pipelined, 
loop pipelining is employed as part of scheduling and 
binding phases which results in several challenges, such 
as overlapping execution, retiming, forwarding, and los-
ing direct one to one mapping between the specification 
and the pipelined RTL. Hence, traditional sequential 
equivalence checking approaches become inefficient. 
Despite the existence of numerous methods to verify 
pipelined microprocessors [15, 16, 17], there are few 
published approaches on formal equivalence checking of 
behaviorally synthesized pipelined loop designs. 
The comparison of input-output relations between the 
specification and the high level synthesized pipelined RTL 
is prohibitively expensive for loops with many iterations. 
A reference pipelining transformation on the CCDFG was 
proposed in [11] to deal with the problem of loop pipelin-
ing without using approaches based on input-output 
comparison. The proposed method is based on building a 
reference pipeline model with a certified specific 
transformation and checking the equivalence between the 
reference model and the synthesized RTL using dual-rail 
symbolic simulation. However, it requires several parameters 
whose values must be obtained from the HLS tool. 
Additionally, although it can handle SEC of pipelined 
designs with nested loops, it cannot apply the proposed SEC 
optimization techniques to internal loops and hence has to 
unroll internal loops. The authors of [14] 
solved the problem of equivalence checking for function 
pipelining (instead of loop pipelining) in behavioral syn-
thesis.  
Tackling the equivalence checking problem of high-level 
synthesized designs requires a scalable representation 
model. In recent years, a strong and scalable high-level 
decision diagram called M-HED has been proposed [25, 
27, 33]. This decision diagram has a compact and a canon-
ical form, and is also close to high-level descriptions of a 
design. Other properties of M-HED, such as a facility 
for expressing primary outputs of a design in terms of 
primary inputs in a polynomial form, presenting state 
variables in terms of integer equations in a formal model, 
and the availability of word-level arithmetic operations, 
have made it a powerful and scalable platform for 
verification [25, 27, 33, 37, 41, 42]. 
In this technical report, we present a scalable formal 
equivalence checking methodology for pipelined loop 
designs synthesized at high level, in which nested loops are 
pipelined and no information from the HLS tool is 
necessary. A short version of the presented method has been 
published in [45] as an ASP-DAC paper. In this version, 
we describe our proposed method in more detail with 
several examples.  As Figure 1 shows, we first perform a 
symbolic simulation to create a list of assignments for the 
specification (LAASC) and for the behaviorally synthesized 
design in which all loops are pipelined (LAPRTL). These lists 
completely describe the behavior of the specification and 
the implementation. For equivalence checking, we employ an 
efficient canonical hybrid bit- and word-level decision 
diagram called M-HED [25, 27, 33]. The key idea to increase 
the scalability of our framework is to avoid cost-
prohibitive input-output comparison by introducing dy-
namic cut-points instead of fixed cut-points used in [28]. 
Although the authors of [28] used BDDs and cut-loops 
for equivalence checking, their method is not able to han-
dle large designs and those designs that contain pipelined 
loops. As stated before, loop pipelining is one of the 
most intricate transformations [40], which raises challenges 
in developing automated equivalence checking methods. 
In addition, unlike the method presented in [11], our 
cut-points are not operation mappings between the 
specification and the RTL design. Moreover, in contrast with 
[11], our proposed method does not need several parameters 
whose values are obtained from the HLS tool, and it does not 
rely on sophisticated formal verification concepts such as 
theorem proving for a certification framework. This property 
allows other researchers to understand and replicate it 
easily. The cut-points in our method 
are inserted into parts of the code where the size of its 
corresponding M-HED is maximized. This way, the de-
sign is divided into several smaller segments. Therefore, 
the size of the equivalence checking problem significantly 
reduces. We continuously check the equivalence of corre-
sponding segments from specification list and pipelined 
loop synthesized implementation list to detect the equal 
nodes. Then, we cut out the equivalent nodes and introduce 
them as new primary inputs for the rest of the segments. 
These primary inputs are used when the next segments 
are checked.  If no match is found while 
comparing segments, we make use of an internal equiva-
lence approach that enables us to incrementally remove 
outputs of corresponding segments, expose temporary 
nodes as new output nodes, and explore them to check 
whether they are equivalent or not.  
[Figure 1 (diagram not reproduced): the specification and the CatapultC pipelined implementation each go through symbolic simulation, producing LAASC and LAPRTL; equivalence checking based on M-HED and dynamic cut-point insertion then either passes (equivalent design) or fails.]
Figure 1. Proposed equivalence checking methodology. 
We employ CatapultC [1] as an HLS tool to automatically 
generate thousands of lines of RTL code from C++ 
code description of several large-sized designs. Although 
the individual parts of our presented technique are 
known, our innovation is to employ the unique combina-
tion of them. Hence, the main contributions of this paper 
are as follows:   
• Creating a framework based on M-HED for fully 
automating SEC of complex design where pipe-
lined nested loops are supported. To the best of our 
knowledge it is the first work which utilizes high 
level decision diagrams for equivalence checking 
of high level synthesized pipelined loop in data 
path designs.   
• Proposing a method to overcome the lack of optimi-
zation methods for equivalence checking of pipe-
lined loop designs synthesized at high level. In 
the proposed method, we can handle the problem 
of equivalence checking in a reasonable run time 
in the complex data path designs with pipelined 
nested loops. 
• Dynamically specifying the cut-points so that pipe-
lined loops can efficiently be verified.  
The rest of this paper is structured as follows. The chal-
lenge of loop pipelining in equivalence checking is dis-
cussed in Section 2. The M-HED as a hybrid canonical 
representation is expressed in Section 3. Our proposed 
equivalence checking approach with a simple example is 
described in Section 4. Experimental results are reported 
in Section 5, while a brief conclusion is presented in Sec-
tion 6. 
2. LOOP PIPELINING CHALLENGES IN 
EQUIVALENCE CHECKING 
In this section, we explain the critical challenges of loop 
pipelining in verification and equivalence checking when 
a code in high level language such as C is synthesized into 
RTL. Loop pipelining is an operation that allows the next 
iteration of a loop to start before the previous one has 
fully finished. This operation increases parallelism and 
throughput. In HLS tools, loop pipelining can be con-
trolled by the number of cycles that must elapse between 
two successive iterations. This parameter is called initia-
tion interval [1]. For example, consider a pair of three-
level nested loops shown in Figure 2. The result of a single 
iteration of the outer loop when none of the middle and 
inner loops is pipelined is shown in Figure 3. When initia-
tion interval is one, the result of a single iteration of outer 
loop when both inner and middle loops are pipelined is 
shown in Figure 4(a). Finally, Figure 4(b) illustrates the 
result of single iteration when all loops are pipelined 
while initiation interval is one.  
   As shown in Figures 3 and 4, when loops are pipelined, 
the execution of consecutive iterations overlaps. This 
means that the (i+1)th iteration can be initiated before 
(or concurrently with) the commitment of the ith iteration. 
This retiming and out-of-order execution causes the sequence 
of operations in the specification and the generated RTL, as 
well as their controlling finite-state machines, to differ. 
Hence, optimizations such as cut-loop techniques cannot be 
applied directly, which makes the problem of equivalence 
checking difficult. 
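To make the overlap concrete, the following Python sketch computes when each loop iteration occupies the datapath for a given initiation interval. The function names `schedule`, `in_flight` and `total_cycles`, and the three-cycle iteration latency, are illustrative assumptions; they are not taken from CatapultC or from the method proposed in this report.

```python
def schedule(num_iters, latency, ii):
    """Return the (first_cycle, last_cycle) pair of every iteration.

    latency: cycles one iteration needs (assumed value, for illustration).
    ii:      initiation interval, i.e. cycles between two iteration starts.
    """
    return [(i * ii, i * ii + latency - 1) for i in range(num_iters)]


def in_flight(sched, cycle):
    """Indices of the iterations that are executing during `cycle`."""
    return [i for i, (first, last) in enumerate(sched) if first <= cycle <= last]


def total_cycles(sched):
    """Number of cycles until the last iteration has finished."""
    return max(last for _, last in sched) + 1


# Not pipelined: the next iteration starts only when the previous one ends.
seq = schedule(num_iters=4, latency=3, ii=3)
# Pipelined with initiation interval one: a new iteration starts every cycle.
pipe = schedule(num_iters=4, latency=3, ii=1)

for cycle in range(total_cycles(pipe)):
    print(cycle, in_flight(pipe, cycle))
```

In the non-pipelined schedule every cycle belongs to exactly one iteration, so a boundary placed every three cycles lines up with one iteration of the specification. In the pipelined schedule a single cycle holds operations of several iterations, which is why a fixed boundary of that kind no longer exists.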
Outer: for (...) {
    middle_1: for (int i = 0; i < 2; i++)
        inner_1: for (int j = 0; j < 3; j++)
            a1[j] = b[i]-c[i]+d[i]-e[i];
    middle_2: for (int k = 0; k < 2; k++)
        inner_2: for (int l = 0; l < 3; l++)
            a2[l] = b[k]-c[k]+d[k]-e[k];
}
Figure 2.  An example of a pair of three-level nested loops. 
[Figure 3 (timeline not reproduced): blocks labeled Inner_1, Inner_2, Middle_1 (iteration 0), Middle_1 (iteration 1), Middle_2 (iteration 0) and Middle_2 (iteration 1).]
Figure 3. Execution order of three-level nested loops with no pipeline 
To give a glimpse of the kind of challenge we face 
when C code is synthesized into RTL code, let us 
consider the C code and related RTL model illustrated in 
Figure 5. The RTL schematic is obtained by CatapultC [1] 
when the specification is synthesized at a frequency of 
100 MHz. In each iteration, four multiplications, two 
additions and lastly one multiplication must be performed. 
In this case, one can use the fixed cut-loop optimization 
technique for equivalence checking, because one iteration 
of the specification corresponds to two cycles of the 
implementation.  
 
[Figure 4 (timelines not reproduced): panels (a) and (b), with blocks labeled Inner_1 and Inner_2.]
Figure 4. Execution order of three-level nested loops (a) when middle 
and inner loops are pipelined, (b) when outer, middle, and inner 
loops are pipelined. 
 
   In other words, the symbolic values on a cut in the 
specification always equal the symbolic values on a specific 
cut in the implementation. However, when a design is 
pipelined, such a correspondence is lost and it is no longer 
possible to determine fixed cut-loops. A fixed cut-loop 
requires a location such that, always after a fixed number 
of cycles in the implementation, all points in the specific 
cut of the implementation and of the specification become 
equivalent. 
[Figure 5 (schematic not reproduced): RTL generated by CatapultC with inputs a[63:0], b[63:0], c[63:0] and d[63:0], multipliers (*), adders (+), registers (REG, with clk and rst), an FSM, and output res; a fixed cut-loop is marked on the schematic. The C specification shown with it is:]
Outer: for (int i=0; i<2; i++)
{
    Inner: for (int j=0; j<2; j++)
    {
        tempf = a[j]*b[i] + c[i]*d[j];
        temps = c[j]*d[i] + a[i]*b[j];
        res[i][j] = tempf * temps;
    }
}
Figure 5. Using a high level synthesis tool (CatapultC) to generate RTL 
without pipelining nested loop. 
Suppose that five multipliers and two adders are available 
during the high level synthesis phases. Hence, the four 
multiplications in the second iteration of the inner loop 
(Figure 5) are computed before the computation of 
the first iteration is finished. Such reordering deprives us 
of the opportunity to select a fixed cut-loop in the 
implementation (cutIi, i = 1 to 6, in Figure 6) and the 
specification (cutSi, i = 1 to 6, in Figure 6) in such a way 
that a regular behavior in the ordering of operations in the 
implementation and the specification is observed. For 
example, suppose that we want to find a corresponding cycle 
in the implementation for the first iteration of the 
specification (i.e., cutS3) and check whether this cycle can 
be used as a fixed cut-loop or not. By moving forward through 
successive cycles of the implementation, cutI3 is found. But 
several multiplication and addition operations in cutI3 have 
no corresponding operations in cutS3 of the specification. 
Based on the definition of a fixed cut-loop given above, 
cutI3 cannot be considered a cut-loop because no equivalent 
statement for the other statements in cutI3 can be found in 
cutS3. Proceeding through the remaining cycles of the 
implementation, we observe that neither cutI3 nor any other 
cut can be used as a cut-loop such that, always after a 
fixed number of cycles, all statements in that cut become 
equivalent to all statements in a specific cut in the 
specification. This example shows that in the case of a 
pipelined loop implementation, using optimization techniques 
such as cut-point, cut-loop, or cut-plane for equivalence 
checking is not straightforward. These challenges 
motivate us to come up with an efficient methodology for 
equivalence checking between the specification and the 
synthesized pipelined loop implementation. 
3. MODULAR HORNER EXPANSION DIAGRAM (M-HED) 
In order to make this paper self-contained, we introduce a 
graph-based representation called Horner Expansion Dia-
gram (HED) for functions with a mixed Boolean and inte-
ger domain, and an integer range to represent arithmetic 
operations at a high level of abstraction [25, 33]. By con-
trast, other Word Level Decision Diagrams (WLDDs) are 
graph-based representations that provide a concise repre-
sentation of integer-valued functions defined over binary 
variables as a bit vector. On the other hand, Binary Deci-
sion Diagrams (BDDs) or Satisfiability (SAT) based meth-
ods suffer from size explosion problems when designs 
grow in size and complexity. BDD-based verification tools 
have not been very successful for designs containing large 
arithmetic data-path units due to prohibitive memory 
requirements. In HED, we assume that the set of variables 
is totally ordered and all vertices that have been con-
structed obey this ordering. 
 
 
[Figure 6 (diagram not reproduced): the unrolled specification, Iteration1 to Iteration4 with cuts cutS1 to cutS12, shown against the pipelined implementation, Cycle1 to Cycle6 with cuts cutI1 to cutI6; the operands a[0], a[1], b[0], b[1], c[0], c[1], d[0] and d[1] are combined by multipliers (X), adders (+) and registers (REG, clk).]
Figure 6. Demonstration of inability to use fixed cut-loops in order to check the equivalence of specification and pipelined RTL implementation 
when loops are unrolled. 
  
These conventions are similar to other word-level 
canonical representations and are not discussed here 
for brevity [21,23]. The HED is a binary graph-based 
representation where the algebraic expression F(X,Y, 
…) is expressed by a first-order linearization of the 
Taylor series expansion [30]. Suppose variable X is the 
top variable of F(X,Y, …). Equation (1) shows F(X,Y, 
…), where const is independent of the variable X, while 
linear is the coefficient of variable X. 
F(X, Y, …)=F(X=0,…)+X×[F'(X=0,…)+…] = const + X×linear    (1) 
The HED is a directed acyclic graph G = (VR, ED) 
with vertex set VR and edge set ED. While the vertex 
set VR consists of two types of vertices: Constant (C) 
and Variable (V), the edge set indicates integer values 
as weight attribute. A Constant node v has as its attribute 
a value val(v) ∈ Z. A Variable node v has as attributes 
an integer variable var(v) and two children 
const(v) and linear(v) ∈ {V, C}.  
3.1 Reduction Rules and Canonicity 
Analogous to BDDs and *BMDs, HED can be reduced 
by removing redundant nodes and merging isomor-
phic nodes. In order to do so, the following reduction 
rules have been employed: 
Rule 1: Remove a node if its linear portion (right child) 
is 0-terminal or its right edge has 0 weight. Then replace 
this node with its const portion (left child). Figure 7(a) 
illustrates this situation where node v contains only 
const part and therefore the function computed at that 
node is independent of variable var(v). 
Rule 2: Merge isomorphic nodes. This merging rule 
identifies isomorphic sub-graphs. Two nodes are iso-
morphic if they not only have the same const and linear 
portions but also their variable IDs should be the same. 
Figure 7(b) shows that nodes v1 and v2 are isomorphic 
and then merged together as a new node v. 
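The two reduction rules can be sketched with a unique table, the usual way decision-diagram packages share isomorphic nodes. This is a minimal illustration under stated assumptions, not the M-HED package itself: nodes are plain Python tuples `(var, const, linear)`, constants are Python integers, edge weights are left out, and the names `make_node` and `_unique` are hypothetical.

```python
# Unique table: maps a node's structure to the single shared node object.
_unique = {}


def make_node(var, const, linear):
    """Return the reduced node that represents const + var * linear."""
    # Rule 1: a zero linear part means the function does not depend on var,
    # so the node is redundant and its const child is returned instead.
    if linear == 0:
        return const
    # Rule 2: nodes with the same variable, const part and linear part are
    # isomorphic, so only the first one built is kept and then reused.
    node = (var, const, linear)
    return _unique.setdefault(node, node)


v1 = make_node("y", 3, 1)        # 3 + y
v2 = make_node("y", 3, 1)        # built again, shared with v1
v3 = make_node("x", v1, 0)       # redundant x node collapses to v1
print(v1 is v2, v3 is v1)
```

Because every reduced function has one shared node, checking whether two sub-functions are equal reduces to comparing node identities.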
Figure 8 illustrates how f(x, y, z) = 24-8z+12y+12yz 
-6x-6x^2z is represented by the HED. Let the ordering of 
variables be x > y > z. First the decomposition w.r.t. x is 
taken into account. As shown in Figure 8(a), after 
rewriting f(x, y, z) = (24-8z+12y+12yz) + x(-6-6xz) based 
on (1), const and linear parts will be 24-8z+12y+12yz 
and -6-6xz, respectively. The linear part is decomposed 
w.r.t. variable x again due to the x^2 sub-monomial. After 
that, the decomposition is performed w.r.t. variable y 
and then z as shown in Figure 8(b). In order to reduce 
the size of the HED representation, redundant nodes 
are removed and isomorphic sub-graphs are merged. 
In Figure 8(b), 24-8z, 12+12z and -6z are rewritten by 
8[3+z(-1)], 12[1+z(1)] and -6[0+z(1)], respectively. In 
order to normalize the weights, gcd(12,12) = 12, 
gcd(8,12) = 4 and gcd(-6,-6) = -6 are taken to extract 
common factors. Finally, Figure 8(c) shows the normal-
ized graph where gcd(4,-6) = 2 is taken to extract the 
common factor between out-going edges from x node. 
In this representation, dashed and solid lines indicate 
const and linear parts, respectively. Note that, in order to 
have a simpler graph, paths to the 0-terminal have not 
been drawn in Figure 8(c). 
 
Figure 7. Reduction rules: (a) redundant nodes, (b) isomorphic 
sub-graphs. 
Figure 8. HED representation of 24-8z+12y+12yz-6x-6x^2z: (a) 
decomposition with respect to variable x, (b) decomposition with 
respect to variables x and y, (c) decomposition with respect to 
variables x, y and z. 
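The first decomposition steps of this example can be reproduced with a few lines of Python. The sketch assumes a simple dictionary representation, where each monomial is a tuple of exponents for (x, y, z) mapped to its integer coefficient; the helper name `split` is hypothetical and the code only mirrors Equation (1), not the graph construction of the HED package.

```python
def split(poly, k):
    """Split poly into (const, linear) with poly = const + var_k * linear.

    poly maps exponent tuples, e.g. (2, 0, 1) for x^2 * z, to coefficients.
    const keeps the monomials without var_k; linear keeps the others with
    the exponent of var_k lowered by one.
    """
    const, linear = {}, {}
    for mono, coeff in poly.items():
        if mono[k] == 0:
            const[mono] = coeff
        else:
            reduced = mono[:k] + (mono[k] - 1,) + mono[k + 1:]
            linear[reduced] = coeff
    return const, linear


# f(x, y, z) = 24 - 8z + 12y + 12yz - 6x - 6x^2z, exponents ordered (x, y, z)
f = {
    (0, 0, 0): 24, (0, 0, 1): -8, (0, 1, 0): 12,
    (0, 1, 1): 12, (1, 0, 0): -6, (2, 0, 1): -6,
}

const_x, linear_x = split(f, 0)             # decomposition w.r.t. x
const_xx, linear_xx = split(linear_x, 0)    # linear part still contains x
print(const_x)    # 24 - 8z + 12y + 12yz
print(linear_x)   # -6 - 6xz
print(const_xx, linear_xx)   # -6 and -6z
```

The second call shows why the linear part is decomposed with respect to x again: the x^2 sub-monomial leaves one factor of x inside it.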
In this graph basic arithmetic operators such as ad-
dition, unary addition, subtraction, unary subtraction 
and multiplication are available that work for symbolic 
integer variables. In order to represent Boolean func-
tions, logical bitwise operations including NOT, AND, 
and OR have been provided that are discussed in the 
following subsections [32, 33]. 
3.2 Boolean Logic 
In order to have an integrated representation of Boole-
an and integer variables, the logical operations need to 
be supported as well as arithmetic expressions. Boole-
an operations are defined based on the arithmetic op-
erations illustrated in Figure 9. To address bit-slicing 
problem, in [32] we have introduced a hybrid method 
to check the equivalence between a high level specifica-
tion and the RTL implementation. In our hybrid equiv-
alence checking approach, we have proposed a de-
composition technique as shown in Figure 10 that ena-
bles us to deal with bit-slicing problem. The main idea 
is that word-level variables whose bit-slices are used 
in the Boolean part of the design are decomposed into 
other word-level variables. For example, if the ith bit of 
variable Tmp, i.e. Tmp[i], is used in the Boolean part, the 
related word-level variable, i.e., Tmp, is decomposed 
into other word-level variables according to Tmp = 
2^(i+1)×TmpH + 2^i×y + TmpL, where TmpH and TmpL are new 
integer variables and y is Tmp[i]. 
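Both ideas of this subsection can be checked numerically. The Python sketch below uses the arithmetic encodings listed for Figure 9 (NOT X as 1 - X, X AND Y as X*Y, X OR Y as X + Y - X*Y) and the decomposition Tmp = 2^(i+1)×TmpH + 2^i×y + TmpL. The function names are hypothetical, and the check works on concrete non-negative integers rather than on symbolic HED variables.

```python
from itertools import product


def b_not(x):
    return 1 - x


def b_and(x, y):
    return x * y


def b_or(x, y):
    return x + y - x * y


def decompose(tmp, i):
    """Split a non-negative integer around bit i into (TmpH, y, TmpL)."""
    tmp_l = tmp % (1 << i)       # bits below position i
    y = (tmp >> i) & 1           # the bit-slice Tmp[i]
    tmp_h = tmp >> (i + 1)       # bits above position i
    return tmp_h, y, tmp_l


# The arithmetic encodings agree with the Boolean operators on {0, 1}.
for x, y in product((0, 1), repeat=2):
    assert b_not(x) == int(not x)
    assert b_and(x, y) == int(x and y)
    assert b_or(x, y) == int(x or y)

# Recombining the three parts gives back the original word.
tmp, i = 0b101101, 2
tmp_h, bit, tmp_l = decompose(tmp, i)
assert tmp == 2 ** (i + 1) * tmp_h + 2 ** i * bit + tmp_l
```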
3.3 Arithmetic Operations 
Addition (W = X+Y) and subtraction (W = X-Y) can be 
represented canonically based on decompositions in (2) 
and (3) respectively, as shown in Figure 11(a) and Fig-
ure 11(b): 
W = X + Y = Y + X*(1) = [0 + Y*(1)] + X*(1)      (2) 
W = X - Y = -Y + X*(1) = [0 + Y*(-1)] + X*(1)     (3) 
Multiplication (W = X*Y) can be represented canonically 
based on (4), as shown in Figure 11(c). 
W = X*Y = 0 + X*(Y) = 0 + X*[0 + Y*(1)]      (4) 
 
Figure 9. Logical operations in HED graph. 
 
Figure 10. Decomposition technique to address the bit-slicing problem. 
 
Figure 11. Arithmetic operations in HED: (a) addition, (b) subtraction, 
(c) multiplication. 
In order to describe how arithmetic operators such as 
addition and multiplication are applied to two HED nodes 
to generate a new node, let u and v be the two 
nodes to be composed, resulting in a new node q. Let 
var(u) = x and var(v) = y denote the decomposing 
variables of the two nodes to be composed. 
The following cases should be considered: 
1- If both nodes are Constant nodes (u, v ∈ C), a new 
Constant node q is computed as follows: 
Addition q = u + v: val(q) = val(u) + val(v) 
Multiplication q = u * v: val(q) = val(u) * val(v) 
2- If one of the nodes is a Constant node (v ∈ C), a new 
Variable node q is created as follows. For addition 
operation, the const part of result, i.e. q0, is obtained 
by adding constant val(v) to const part of u (u0). The 
linear part of result will be linear part of u (u1). For 
multiplication operation, val(v) should be multiplied 
with both const and linear parts of u (u0 and u1) to cre-
ate const and linear parts of the result (q0 and q1) re-
spectively.    
    Addition q = u + v: q0 + x * q1 = (u0 + val(v)) + x * u1 
    Multiplication q= u * v: q0 + x * q1  
  = u0 * val(v) + x * u1 *val(v)  
3- If both nodes are Variable nodes (u, v ∈ V), proceed 
according to the variable order. Suppose order(x) > 
order(y). 
a. Where the two nodes are indexed by different varia-
bles, var(q) = max(var(u), var(v)) = x. For addition 
operation, the const part of result, i.e. q0, is computed 
by adding v node to const part of u (u0). The linear 
part of result will be linear part of u (u1). For multipli-
cation operation, v node should be multiplied with 
both const and linear parts of u (u0 and u1) to generate 
const and linear parts of the result (q0 and q1) respec-
tively. 
Addition q = u + v: q0 + x * q1 = (u0 + v) + x * u1 
 Multiplication q = u * v: q0 + x * q1 = u0 * v + x * u1 * v 
b. Where the nodes have the same index then var(q) = 
x. In this case, the const part of q is created by pairing 
the const parts of two nodes (u0+v0 for addition and 
u0*v0 for multiplication). The linear part of q is ob-
tained as a sum of two cross products of const and 
linear parts when multiplication should be done. 
Furthermore, the quadratic term, i.e. q2, is taken into 
account in linear portion of q.   
        Addition q = u + v: q0 + x * q1 = (u0 + v0) + x * (u1 + v1) 
         Multiplication q = u * v: q0 + x * (q1 + x * q2)  
              =  u0 * v0 + x * (u1 * v0 + u0 * v1 + x * (u1 * v1)) 
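The case analysis above can be sketched as two recursive functions. This is an illustrative simplification, not the HED package: nodes are nested tuples `(var, const, linear)`, constants are Python integers, edge weights are not modeled, and the names `ORDER`, `mk`, `top`, `parts`, `add` and `mul` as well as the variable order x > y are assumptions. Cases 2 and 3(a) are handled together by viewing a node that is not rooted at the top variable as having a zero linear part.

```python
ORDER = {"x": 2, "y": 1}   # assumed order: larger rank is closer to the root


def mk(var, const, linear):
    """Node for const + var * linear; a zero linear part is reduced away."""
    return const if linear == 0 else (var, const, linear)


def top(n):
    """Rank of the root variable; constants get rank 0."""
    return ORDER[n[0]] if isinstance(n, tuple) else 0


def parts(n, var):
    """View n as const + var * linear with respect to var."""
    if isinstance(n, tuple) and n[0] == var:
        return n[1], n[2]
    return n, 0                      # n does not depend on var


def add(u, v):
    if not isinstance(u, tuple) and not isinstance(v, tuple):
        return u + v                 # case 1: two constants
    var = u[0] if top(u) >= top(v) else v[0]
    (u0, u1), (v0, v1) = parts(u, var), parts(v, var)
    return mk(var, add(u0, v0), add(u1, v1))


def mul(u, v):
    if u == 0 or v == 0:
        return 0
    if not isinstance(u, tuple) and not isinstance(v, tuple):
        return u * v                 # case 1: two constants
    var = u[0] if top(u) >= top(v) else v[0]
    (u0, u1), (v0, v1) = parts(u, var), parts(v, var)
    # (u0 + x*u1) * (v0 + x*v1) = u0*v0 + x*(u1*v0 + u0*v1 + x*(u1*v1))
    cross = add(mul(u1, v0), mul(u0, v1))
    linear = add(cross, mk(var, 0, mul(u1, v1)))
    return mk(var, mul(u0, v0), linear)


x, y = mk("x", 0, 1), mk("y", 0, 1)
lhs = mul(add(x, y), add(x, mul(-1, y)))      # (x + y) * (x - y)
rhs = add(mul(x, x), mul(-1, mul(y, y)))      # x*x - y*y
print(lhs == rhs)
```

Since the reduced form is unique in this sketch, two expressions are equal as polynomials exactly when their nested tuples are equal, which is the property an equivalence check relies on.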
3.4 Shift Operations 
While the shift left operator, <<, can be viewed as scalar 
multiplication, the shift right operator, >>, can be modeled 
as a division by 2^N. In order to compute the division on 
HED, assuming the divisor is a constant integer that is a 
power of two, 2^N, the following recursive algorithm is 
applied until terminal cases are reached. At terminal 
nodes, the division is converted to a numerical division 
problem, which is performed easily. If, however, constant 
values at terminal nodes are less than 2^N, the 
related variable, Var, is replaced by Var/2^N [34]. 
Z = f / 2^N = (fconst + X * flinear) / 2^N = (fconst/2^N) + X * (flinear/2^N)   
If terminal < 2^N : replace Var by Var/2^N 
3.5 Conditional Statements 
In order to handle conditional statements such as if-then-
else statement and case statement, variables from differ-
ent branches of conditional statements are rewritten by 
different indices (e.g., variable n is defined as n1, n2, … , 
nm variables for m cases to consider both if and else parts 
in different iterations). Then these new variables are add-
ed to the design in place of conditional statements as 
shown in Figure 12. 
 
 
 
 
 
Figure 12. Conditional statements (a) original source code (b) list 
of assignments 
3.6 Modular Horner Expansion Diagram (M-HED) 
In order to verify polynomial data paths over bit-vectors, 
we have extended the HED to manipulate modular 
arithmetic [25]. Although the equivalence verification 
over Zm is known to be NP-hard when m³2 [29], analyzing 
polynomials over arbitrary finite integer rings and their 
properties are useful to deal with the equivalence check-
ing problem [20]. The theory of univariate vanishing pol-
ynomials over Zm, mÎN, m>1; i.e., those polynomials f 
such that f(x) mod m º 0, has been presented in [24]. The 
authors of [26] have extended the concepts of the work 
[24] and derived a unique form representation of a multi-
variate polynomial over finite integer rings of the form
1) NOT X ==> 1–X  
2) X AND Y ==> X*Y  
3) X OR Y ==> X+Y-X*Y 
High Level Synthesis and Partitioning 
Decision Making 
Tmp 
Tmp[i] 
Tmp[i] = y 
Word-level Representa-
tion (HED) 
Low level SAT 
Checking 
Boolean Expressions Arithmetic Expressions 
Behavioral Model 
Decompose Tmp as 
2(i+1)´TmpH+2i´y+TmpL 
  
  
 
-1 
X 
Y 
1 
X 
Y 
1 0 0 
X 
1 
Y 
0 (c) (b) (a) 
For (n=0; n<2; n++) 
     IF (n = 0)  A = B + C; 
     ELSE  A = B × C; 
                  (a) 
A0 = B+ C; 
A1 = B×C; 
       (b) 
  
npZ , where p is any prime integer. Let us consider a sim-ple example that defines functions f1[3:0] = 15(Y[3:0])3 - 
5(Y[3:0])2 + 19Y[3:0] + 6 and f2[3:0] = 7(Y[3:0])3 + 3(Y[3:0])2 + 
3Y[3:0] + 6. While f1 is not equivalent to f2 as polynomial 
functions over Z, they are equivalent overZ24, i.e. f1 mod 24º 
f2 mod 24. Computing their difference over Z24 results in 
f1[3:0] – f2[3:0] = 8(Y[3:0])3 - 8(Y[3:0])2 + 16Y[3:0]. While the 
result is non-zero polynomial, 
}15,...,1,0{,016mod)1688( 23 Î"=+- YYYY  and we 
say 8Y3-8Y2+16Y vanishes in Z24. In general, it is not 
straightforward at all to see whether given polynomials 
are vanishing ones or not. Actually, a straightforward ap-
proach which expands everything into Boolean domain 
does not clearly work. 
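The example can be checked by brute force for this small bit-width. The following sketch is only an illustration: it enumerates all 16 values of a 4-bit Y, which is the kind of exhaustive expansion that does not scale and that a canonical form is meant to avoid. The helper name `equivalent_mod` is an assumption made for the illustration.

```python
MOD = 2 ** 4  # 4-bit data path, i.e. arithmetic over Z_{2^4}


def f1(y):
    return 15 * y**3 - 5 * y**2 + 19 * y + 6


def f2(y):
    return 7 * y**3 + 3 * y**2 + 3 * y + 6


def equivalent_mod(f, g, mod):
    """True when f and g agree modulo `mod` on every residue 0..mod-1."""
    return all((f(y) - g(y)) % mod == 0 for y in range(mod))


# The two polynomials differ over the integers (for example at Y = 1) ...
differs_over_z = any(f1(y) != f2(y) for y in range(MOD))
# ... but their difference 8Y^3 - 8Y^2 + 16Y vanishes modulo 16.
print(differs_over_z, equivalent_mod(f1, f2, MOD))
```

The same two polynomials are not equivalent modulo 32, which shows that the result depends on the bit-width of the data path.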
We have discussed how the properties of polynomial 
functions over finite integer ring allow us to reduce two 
polynomial functions to a canonical form based on the 
HED [25]. We follow the basic idea proposed in [22] but 
make use of the HED for efficient implementations and 
manipulations of polynomials with fixed bit-width that 
results in a new package called M-HED. Therefore, equiv-
alent polynomial functions over finite integer rings, i.e., 
data paths with finite bit-width in hardware designs, are 
automatically identified due to the canonical representa-
tion of the M-HED. Since we only use the M-HED to 
check the equivalence between two polynomials over 
finite word-length, and manipulating polynomials with 
fixed bit-width is not our contribution in this paper, we 
will not discuss further details and we refer the interested 
reader to our previous work [25]. 
4. PROPOSED EQUIVALENCE CHECKING 
APPROACH 
In this section we explain our methodology in more detail. First, it is worthwhile
explaining how a specification and an RTL implementation with pipelined loops are
converted to a list of assignments, so that they can be represented by M-HED.
4.1 Symbolic Simulation 
The Algorithmic Specification in C (ASC) and the RTL with pipelined loops in Verilog
generated by CatapultC (PRTL) are the inputs of the SEC-PIPED algorithm. The executions
of the specification and of the implementation are translated into lists of assignments
by symbolic simulation (SymSim function in lines 1-2 of Figure 15). In symbolic
simulation, symbolic values rather than concrete ones (integer or binary values) are used
as input vectors. As a clarifying example, consider the C code of the 4-point FFT shown
in Figure 13. This code performs the butterfly computations with three main loops. The
outer loop iterates over the 2 stages of the FFT computation and gives rise to heavily
data-dependent computations. The inner loops perform the individual butterfly
computations of each stage. The heart of the FFT algorithm is the block of code that
performs each butterfly computation in the third loop. Note that the wi and wr
parameters are commonly known as twiddle factors and can be computed before the
algorithm is executed. To symbolically simulate such code, the loops are unrolled, which
yields a list of assignments. Then the controlling assignments are removed and the array
indexes are adjusted. Figure 14 illustrates the list of assignments after removing the
controlling assignments and adjusting the array indexes in such a way that multiple
assignments to a single variable do not happen while data dependencies are preserved.
The result of symbolic simulation is a list of assignments that exactly mimics the
behavior of the given C code. RTL code is symbolically simulated in a similar manner.
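The unrolling step can be sketched as follows. This is an illustration under stated assumptions, not the authors' SymSim function: the control variables (stage, j, index, windex, len, incr) are executed concretely, only the data assignments are emitted as text, and the renaming of repeated targets into single-assignment form is omitted. The function name `symsim_fft4` is hypothetical.

```python
def symsim_fft4():
    """Unroll the 4-point FFT loops of Figure 13 and return the
    straight-line data assignments as strings."""
    out = []
    length, incr = 4, 1
    for stage in range(2):
        len1, length = length, length // 2
        windex = 0
        for j in range(length):
            out.append(f"C = wr[{windex}]; S = wi[{windex}];")
            # Butterfly computation: indices are concrete, data stay symbolic.
            for index in range(j, 4, len1):
                index2 = index + length
                out.append(f"tmr = aar[{index}] - aar[{index2}];")
                out.append(f"tmi = aai[{index}] - aai[{index2}];")
                out.append(f"aar[{index}] = aar[{index}] + aar[{index2}];")
                out.append(f"aai[{index}] = aai[{index}] + aai[{index2}];")
                if windex == 0:  # branch resolved during unrolling
                    out.append(f"aar[{index2}] = tmr;")
                    out.append(f"aai[{index2}] = tmi;")
                else:
                    out.append(f"aar[{index2}] = tmr*C - tmi*S;")
                    out.append(f"aai[{index2}] = tmr*S + tmi*C;")
            windex += incr
        incr *= 2
    return out


for line in symsim_fft4():
    print(line)
```

Because every loop bound is known, the branch on windex disappears from the output, and the emitted data assignments follow the order of those listed in Figure 14.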
Note that the list of assignments contains three types of variables: inputs, which
appear only on the right-hand side; outputs, which appear only on the left-hand side;
and intermediate signals, which appear on both sides. The objective of equivalence
checking is to decide whether the output variables in the assignment lists generated
from the specification and from the implementation are functionally equivalent.
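A short sketch of this classification, assuming a single-assignment list of statements written as `name = expression;`. The variable names in the example and the helper `classify` are hypothetical.

```python
import re


def classify(assignments):
    """Split variable names into inputs, outputs and intermediate signals."""
    lhs, rhs = set(), set()
    for line in assignments:
        left, right = line.rstrip(";").split("=", 1)
        lhs.add(left.strip())
        # Every identifier on the right-hand side is a used variable.
        rhs.update(re.findall(r"[A-Za-z_]\w*", right))
    return {
        "inputs": rhs - lhs,        # only on the right-hand side
        "outputs": lhs - rhs,       # only on the left-hand side
        "intermediate": lhs & rhs,  # on both sides
    }


example = ["t0 = a * b;", "t1 = c * d;", "s = t0 + t1;", "res = s * t0;"]
print(classify(example))
```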
 
 
 
// Arrays: wr(i) = cos(i*2*PI/N)*512 and wi(i) = sin(i*2*PI/N)*512
len = 4;
incr = 1;
for (stage = 0; stage < 2; stage++) {
    len1 = len;
    len = len/2;
    windex = 0;
    for (j = 0; j < len; j++) {
        C = wr[windex];
        S = wi[windex];
        // butterfly computation
        for (index = j; index < 4; index = index + len1) {
            index2 = index + len;
            tmr = aar[index] - aar[index2];
            tmi = aai[index] - aai[index2];
            aar[index] = aar[index] + aar[index2];
            aai[index] = aai[index] + aai[index2];
            if (windex == 0) {
                aar[index2] = tmr;
                aai[index2] = tmi;
            } else {
                aar[index2] = tmr*C - tmi*S;
                aai[index2] = tmr*S + tmi*C;
            }
        }
        windex = windex + incr;
    }
    incr = 2*incr;
}

Figure 13. 4-point FFT specification.

4.2 SEC Algorithm using M-HED
As mentioned in Section 2, the main idea is to introduce dynamic cut-points (CPs) to deal
with reordering and out-of-order execution. In addition, CPs divide the list of
assignments into several smaller segments, which makes large designs tractable by M-HED.

len = 4; incr = 1; 
//stage = 0 
len1 = 4; len = 2; windex = 0; 
//j=0 
C = wr[0]; S = wi[0]; 
index = 0; index2 = 2; 
tmr = aar[0] - aar[2]; 
tmi = aai[0] - aai[2]; 
aar[0] = aar[0] + aar[2]; 
aai[0] = aai[0] + aai[2]; 
aar[2] = tmr; 
aai[2] = tmi; 
windex = 1; 
//j=1 
C = wr[1]; S = wi[1]; 
index = 1; index2 = 3; 
tmr = aar[1] - aar[3]; 
tmi = aai[1] - aai[3]; 
aar[1] = aar[1] + aar[3]; 
aai[1] = aai[1] + aai[3]; 
aar[3] = tmr*C - tmi*S;
aai[3] = tmr*S + tmi*C; 
windex = 2;  incr = 2; 
//stage = 1 
len1 = 2; len = 1;  windex = 0; 
//j=0 
C = wr[0]; S = wi[0]; 
index = 0;  index2 = 1; 
tmr = aar[0] - aar[1]; 
tmi = aai[0] - aai[1]; 
aar[0] = aar[0] + aar[1]; 
aai[0] = aai[0] + aai[1]; 
aar[1] = tmr; 
aai[1] = tmi; 
index = 2;  index2 = 3; 
tmr = aar[2] - aar[3]; 
tmi = aai[2] - aai[3]; 
aar[2] = aar[2] + aar[3]; 
aai[2] = aai[2] + aai[3]; 
aar[3] = tmr; 
aai[3] = tmi; 
Figure 14. List of assignments after symbolic simulation, remov-
ing controlling variables and adjusting array indexes. 
 
In other words, dynamic CP insertion increases the scalability of M-HED for the
equivalence checking problem. Selecting the right number and location of CPs is a
challenge: choosing too few CPs leads to a blow-up of the forward M-HED construction,
whereas choosing too many CPs results in many re-substitutions for false negative
elimination. In our proposed method, a CP is inserted iteratively at a location that
makes the corresponding M-HED as large as possible. This generates two segments: one
whose M-HED size equals the maximum allowed M-HED size, and another whose M-HED size may
exceed it. The first generated segments of the specification (sasc) and of the
implementation (sprtl) are then compared for equivalence. Although resource sharing in
HLS usually makes the mapping between the specification and the RTL implementation
many-to-one, it is not necessary to compare each node in the specification segment with
all nodes in the RTL segment. Because we use a canonical decision diagram, i.e., M-HED,
functionally equivalent nodes are detected automatically. Each iteration of the
SEC-PIPED algorithm performs the following operations.
If LAASC is not empty (line 6 of Figure 15), we insert a CP in the assignment list, which
generates two segments. The M-HED of the first segment (sasc) has the maximum size
allowed by M-HED. This segment is chosen from LAASC by the segmentSelector function and
removed from LAASC (lines 7-8). Next, the M-HED of all statements in sasc is created
(Hsasc in line 9). In the same way, if LAPRTL is not empty (line 13), a CP is inserted in
LAPRTL. A segment of LAPRTL (sprtl) is then chosen by the segmentSelector function and
removed from LAPRTL (lines 14-15). Next, the M-HED representation of sprtl (Hsprtl) is
created (line 16).
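One possible reading of the segment selection step is a greedy prefix cut. The sketch below is hypothetical: the paper measures the size of the resulting M-HED, whereas here the size estimate `size_of` defaults to the number of statements as a stand-in, and the function name `segment_selector` only mirrors segmentSelector in Figure 15.

```python
def segment_selector(assignment_list, max_size, size_of=len):
    """Return the longest prefix of `assignment_list` whose estimated
    diagram size stays within `max_size` (at least one statement when the
    list is non-empty, so that progress is always possible)."""
    cut = 1
    while (cut < len(assignment_list)
           and size_of(assignment_list[:cut + 1]) <= max_size):
        cut += 1
    return assignment_list[:cut]


la_asc = ["t0 = a*b", "t1 = c*d", "s = t0+t1", "r = s*t0", "q = r+a"]
segment = segment_selector(la_asc, max_size=3)
remaining = la_asc[len(segment):]  # LAASC = LAASC - sasc
print(segment, remaining)
```

The cut-point is the boundary between `segment` and `remaining`; a realistic `size_of` would build the diagram incrementally instead of counting statements.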
Note that the maximum design size that M-HED can handle depends on the structure of the
design. Typically, M-HED can handle 30000 assignment lines. Based on this information
and on the structure of the designs to be verified, the proper location of each CP is
determined automatically. After Hsasc and Hsprtl are created, they are compared using
M-HED, and the equivalent output nodes are collected in eo (line 20). As mentioned in
Section 4.1, output nodes are those that appear only on the left-hand side within a
segment. As a result of equivalence checking, the equivalent output nodes, together with
the nodes on which they depend (dns and dni), are removed from sasc (lines 24-26) and
sprtl (lines 28-30). New primary inputs are introduced in place of the equivalent nodes
and are used in the next segments of LAASC and LAPRTL. The remaining segments are
updated by the UpdateSegment function (lines 27 and 31) to reflect the effect of these
new primary inputs.
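The removal step can be sketched on a small dependency graph. This is a simplified illustration under assumptions: a segment is a dictionary mapping each defined node to its operands, the node names are hypothetical, and a node that is still needed by unchecked nodes is simply kept in the segment, whereas the paper rewrites such a node in terms of primary inputs using M-HED.

```python
def fan_in(defs, roots):
    """All defined nodes that transitively feed any node in `roots`."""
    seen, stack = set(), list(roots)
    while stack:
        node = stack.pop()
        for operand in defs.get(node, ()):
            if operand in defs and operand not in seen:
                seen.add(operand)
                stack.append(operand)
    return seen - set(roots)


def remove_equivalent(defs, eo):
    """Drop the equivalent outputs `eo` and the nodes that only they depend
    on; return the reduced segment and the new primary inputs replacing eo."""
    dependents = fan_in(defs, eo)
    kept_roots = [n for n in defs if n not in eo and n not in dependents]
    still_needed = fan_in(defs, kept_roots)  # shared with unchecked nodes
    removable = set(eo) | (dependents - still_needed)
    reduced = {n: ops for n, ops in defs.items() if n not in removable}
    new_pis = {n: f"np_{n}" for n in eo}
    return reduced, new_pis


segment = {
    "i1": ("p1", "p2"),
    "i2": ("p2", "p3"),
    "i4": ("i1", "i2"),  # assumed to be found equivalent to a spec output
    "i6": ("i2", "p4"),  # not yet checked, still needs i2
}
reduced, new_pis = remove_equivalent(segment, {"i4"})
print(reduced, new_pis)
```

Here i1 is removed together with i4, while i2 survives because the unchecked node i6 still depends on it.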
 
SEC-PIPED (ASC, PRTL)
1   LAASC  = SymSim(ASC);
2   LAPRTL = SymSim(PRTL);
3   sasc  = ∅;
4   sprtl = ∅;
5   WHILE (LAASC ≠ ∅ or LAPRTL ≠ ∅)
6       IF (LAASC ≠ ∅)
7           sasc  = segmentSelector(LAASC);
8           LAASC = LAASC - sasc;
9           Hsasc = HEDGen(sasc);
10      ELSE
11          Hsasc = HEDGen(sasc);    //since it isn't constructed in line 9
12      END IF
13      IF (LAPRTL ≠ ∅)
14          sprtl  = segmentSelector(LAPRTL);
15          LAPRTL = LAPRTL - sprtl;
16          Hsprtl = HEDGen(sprtl);
17      ELSE
18          Hsprtl = HEDGen(sprtl);  //since it isn't constructed in line 16
19      END IF
20      eo = EquChecking(Hsasc, Hsprtl);  //equivalent output nodes
21      IF (eo = ∅)  //no equivalent nodes are found
22          (LAASC, LAPRTL, sasc, sprtl) =
                INTERNAL-EQU(LAASC, LAPRTL, sasc, sprtl);
23      ELSE
24          sasc = sasc - eo;  //remove equ. output nodes from sasc
25          dns = {n ∈ sasc: ∃ eq ∈ eo & n has an impact on eq};
26          sasc = sasc - dns;
27          LAASC = UpdateSegment(LAASC, eo, dns, PIs);
28          sprtl = sprtl - eo;  //remove equ. output nodes from sprtl
29          dni = {n ∈ sprtl: ∃ eq ∈ eo & n has an impact on eq};
30          sprtl = sprtl - dni;
31          LAPRTL = UpdateSegment(LAPRTL, eo, dni, PIs);
32      END IF
33  END WHILE
34  IF (sasc = ∅ and sprtl = ∅)
35      RETURN “EQUIVALENT Designs”;
36  ELSE
37      RETURN “UNEQUIVALENT Designs”;
38  END IF

Figure 15. Sequential equivalence checking (SEC) of pipelined data path designs.

Another point to note is that removing internal nodes that have an impact on equivalent
nodes is not always safe, because these nodes may also have an impact on other nodes
that have not been checked yet. Replacing them with primary inputs would remove useful
correlation among those other nodes. To avoid this unsafe operation, when the remaining
segments are updated by the UpdateSegment function, internal nodes that are used in
unprocessed segments are described in terms of primary inputs. This is easily done with
the embedded feature of M-HED described in Section 3.
The while loop of the SEC-PIPED algorithm terminates when both LAASC and LAPRTL are
empty. If the last updated sasc and sprtl are empty, then every output node in sasc has
an equivalent node in sprtl and vice versa, which implies that the specification and the
implementation are equivalent (lines 34-35). Otherwise they are not equivalent (line 37).
Figure 16 illustrates the main idea behind the SEC-PIPED algorithm. In this figure, pi,
ii, si and npi denote a primary input, an implementation node, a specification node and
a new primary input defined in place of equivalent nodes, respectively. As shown in the
figure, the first segments of the implementation (sprtl) and of the specification (sasc)
are compared using M-HED. The equivalent output nodes of these segments (i4, i5, s1, s2)
are detected, and the segments are updated by introducing new primary inputs in place of
the equivalent nodes (np1 and np2 in Figure 16(b)). In addition, during equivalence
checking the nodes that have an impact on equivalent nodes are removed, except those
that may affect other nodes still to be checked (i2 in Figure 16(a)). Such nodes are
detected while the segments are updated and are described in polynomial form in terms of
primary inputs using M-HED (p2p3 in Figure 16(b)). Section 3 described how this
polynomial can be written. The new segments are compared again, and the equivalent
output nodes (i8 and s3 in Figure 16(b)) are detected.
4.3 Finding Internal Equivalent Nodes 
Suppose we follow the SEC-PIPED algorithm, but no equivalent nodes exist for the given
segment size (lines 21-23 of Figure 15). In this case no nodes can be removed from sasc
or sprtl, and therefore the M-HED construction is blocked. Since the size of the
corresponding M-HED of sasc or sprtl equals the maximum allowed M-HED size, no statement
can be added to the selected segment in the next iteration. The same segment would
therefore be selected again and again, and the algorithm would fall into an endless
loop. In fact, because of redundancies or optimizations introduced during pipelined RTL
generation, the assignment lists of the specification (LAASC) and of the implementation
(LAPRTL) can differ in size. Hence, when a pair of segments (sasc and sprtl) is chosen
for comparison, equivalent output nodes may not exist. The question is whether our
methodology can handle such a case.
To avoid blocking the forward M-HED construction, we propose the INTERNAL-EQU algorithm
shown in Figure 17. The basic idea is to look for non-output (internal) equivalent
nodes. Since the output nodes of Htsprtl and of the original Htsasc are not equivalent,
we first compare the output nodes of Htsprtl with those of a modified version of Htsasc.
To obtain the modified Htsasc, sasc and sprtl are copied to temporary locations (tsasc
and tsprtl in lines 1-2 of Figure 17). We remove the output statements of tsasc and
append them to LAASC (lines 9-11 of Figure 17). After such deletions, some previously
intermediate nodes appear as new output nodes. Afterwards, the M-HED of the modified
tsasc is constructed (line 12). If all statements have been removed and no new
equivalent nodes have been obtained, the process is repeated for tsprtl (lines 17-24),
because output nodes of sprtl may be equivalent to internal nodes of sasc or vice versa.
To perform this operation, sasc is first restored and the corresponding M-HED is
reconstructed (lines 15-16). Then the steps of the first while loop are repeated for
sprtl to obtain internal nodes that are equivalent to some output nodes of sasc. When an
equivalent pair of nodes is found, sasc, sprtl, LAASC and LAPRTL are updated and
returned as results (lines 7 and 19). In this way, equivalent nodes are detected by an
iterative deletion approach.
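The iterative deletion idea can be sketched as follows. This is not the authors' implementation: exhaustive evaluation over the tiny ring Z_4 is used as a stand-in for the canonical M-HED comparison, segments are dictionaries of two-operand `+`/`*` nodes, all node names are hypothetical, and appending the peeled statements back to LAASC or LAPRTL is omitted.

```python
from itertools import product

MOD = 4  # tiny ring Z_4 so that exhaustive evaluation stays cheap (assumption)


def evaluate(defs, node, env):
    if node not in defs:
        return env[node]  # primary input
    op, a, b = defs[node]
    x, y = evaluate(defs, a, env), evaluate(defs, b, env)
    return (x + y) % MOD if op == "+" else (x * y) % MOD


def signature(defs, node, pis):
    """Value table of `node` over all primary-input assignments."""
    return tuple(evaluate(defs, node, dict(zip(pis, vals)))
                 for vals in product(range(MOD), repeat=len(pis)))


def outputs(defs):
    used = {x for _, a, b in defs.values() for x in (a, b)}
    return [n for n in defs if n not in used]


def equ_checking(spec, impl, pis):
    impl_sigs = {signature(impl, n, pis): n for n in outputs(impl)}
    return [(s, impl_sigs[signature(spec, s, pis)])
            for s in outputs(spec) if signature(spec, s, pis) in impl_sigs]


def internal_equ(spec, impl, pis):
    """Peel outputs off a copy of `spec` until one of its outputs matches an
    output of `impl`; if that fails, peel a copy of `impl` instead."""
    for peel_spec in (True, False):
        t_spec, t_impl = dict(spec), dict(impl)
        target = t_spec if peel_spec else t_impl
        while target:
            pairs = equ_checking(t_spec, t_impl, pis)
            if pairs:
                return pairs
            for n in outputs(target):  # expose internal nodes as new outputs
                del target[n]
    return []


pis = ["a", "b", "c", "d", "np"]
spec = {"m0": ("*", "a", "b"), "m1": ("*", "c", "d"),
        "temps0": ("+", "m0", "m1"), "res00": ("*", "np", "temps0")}
impl = {"mul2": ("*", "a", "b"), "mul3": ("*", "c", "d"),
        "add1": ("+", "mul2", "mul3")}
print(equ_checking(spec, impl, pis), internal_equ(spec, impl, pis))
```

In this example no output matches at first, because res00 and add1 compute different functions; once res00 is peeled off, temps0 becomes an output and is matched with add1.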
[Figure 16 graphic: (a) implementation segment sprtl with nodes i1-i8 and specification
segment sasc with nodes s1-s3, both over primary inputs p1-p4; (b) updated segments
after introducing the new primary inputs np1 and np2 in place of the equivalent nodes.]

Figure 16. Illustrations of the main part of SEC-PIPED function (a) when the first segments (sasc and sprtl) are compared (b) when the second
segments (sasc and sprtl) are compared.

Figure 17. Procedure of finding internal equivalent nodes.
    
Figure 18 demonstrates how this algorithm is used. Suppose that the maximum M-HED size
only allows us to construct eleven nodes of the implementation and six nodes of the
specification. As shown in Figure 18(a), s1 and s2 are equivalent to i4 and i5, but i4
and i5 are internal nodes (the output nodes are i6 and i7), so no output nodes are
equivalent. Hence, the INTERNAL-EQU algorithm removes the output nodes of sprtl (i6 and
i7) and introduces the internal nodes (i4 and i5) as new outputs. In this way,
equivalent output nodes appear and the M-HED construction can be resumed as in Figure
16(a), as seen in Figure 18(b).
4.4 Example 
In this subsection we illustrate our methodology with an example. As mentioned before,
Figure 5 and Figure 6 show a nested loop in C code as an ASC block and the related
synthesized hardware. As can be seen in Figure 5, in the ASC the results of two
multiplications are added and stored in temporary variables (tempf and temps). These
temporary variables are multiplied and stored in a res array. The RTL code synthesized
by CatapultC is not pipelined, and therefore finding a cut-loop is possible.
However, Figure 6 shows the sequence of cycles obtained when RTL code with a pipelined
loop is synthesized. As mentioned in Section 2, pipelining makes equivalence checking
hard. Figure 19 demonstrates the three steps of equivalence checking using our
methodology. In this figure, the output nodes in each segment of the implementation and
of the specification are colored blue and green, respectively. Suppose that the first
segment of the specification (segment1s) includes tempf0 and temps0 as output nodes and
the first segment of the implementation (segment1i) includes mul2, mul3 and add0 as
output nodes. After creating M-HEDs for all assignments in these segments,
add0 ∈ segment1i and tempf0 ∈ segment1s are detected as equivalent nodes. However,
because add1 ∉ segment1i, neither add1 nor any other node is detected as equivalent to
temps0. At this point, a new primary input is defined in place of the equivalent nodes
(NEW PI in Figure 19(b)), and {mul0, mul1, add0} ⊆ segment1i and {tempf0} ⊆ segment1s
are removed.
Since {mul2, mul3} ⊆ segment1i have no impact on add0 and temps0 ∈ segment1s has no
equivalent node in segment1i, they are not removed from segment1i and segment1s. In the
next phase, as shown in Figure 19(b), the new segments segment2s and segment2i are taken
into account, so that {NewPI, temps0, res00} ⊆ segment2s and
{NewPI, mul2, mul3, mul4, mul5, add1} ⊆ segment2i. During equivalence checking no output
node is matched, and therefore the INTERNAL-EQU algorithm needs to be called. In this
algorithm, the output res00 is first removed from segment2s, which makes temps0 a
primary output of segment2s (Figure 19(c)). Then the M-HEDs of the updated segments are
constructed again and SEC-PIPED is resumed. At this point, the equivalent nodes temps0
and add1 are detected. Next, they are removed and new primary inputs are defined in
their place. This procedure continues until all nodes have been removed from sasc and
sprtl. If all segments have been processed, so that LAASC and LAPRTL as well as the
final sasc and sprtl are empty, the algorithm returns EQUIVALENT Designs. As can be
observed, our proposed solution can be applied to RTL designs with pipelined nested
loops and complex structure.
[Figure 18 graphic: implementation segment sprtl (nodes i1-i8) and specification segment
sasc (nodes s1-s3) over primary inputs p1-p4, shown in panels (a) and (b). Steps between
the panels: 1. Removing output nodes of Implementation. 2. Introducing previous internal
nodes as new output nodes.]

Figure 18. Illustration of the main part of INTERNAL-EQU function. (a) before finding internal-equivalent nodes when the first segments (sasc
and sprtl) are compared. (b) after finding internal-equivalent nodes when the first segments (sasc and sprtl) are compared.
INTERNAL-EQU(LAASC, LAPRTL, sasc, sprtl)
1   tsasc  = sasc;
2   tsprtl = sprtl;
3   Htsasc  = HEDGen(tsasc);
4   Htsprtl = HEDGen(tsprtl);
5   WHILE (tsasc ≠ ∅)
6       IF EquChecking(Htsasc, Htsprtl) ≠ ∅
7           RETURN LAASC, LAPRTL, tsasc, tsprtl;
8       ELSE
9           ontsasc = output statements in tsasc;
10          tsasc   = tsasc - ontsasc;
11          LAASC   = LAASC ∪ ontsasc;
12          Htsasc  = HEDGen(tsasc);
13      END IF
14  END WHILE
15  tsasc  = sasc;
16  Htsasc = HEDGen(tsasc);
17  WHILE (tsprtl ≠ ∅)
18      IF EquChecking(Htsasc, Htsprtl) ≠ ∅
19          RETURN LAASC, LAPRTL, tsasc, tsprtl;
20      ELSE
21          ontsprtl = output statements in tsprtl;
22          tsprtl   = tsprtl - ontsprtl;
23          Htsprtl  = HEDGen(tsprtl);
24          LAPRTL   = LAPRTL ∪ ontsprtl;
25      END IF
26  END WHILE
27  RETURN “UNEQUIVALENT Designs”
  
[Figure 19 graphic: panels (a), (b) and (c) showing multiplier (X), adder (+) and
register (REG, clk) networks for segment1s/segment1i, segment2s/segment2i and
segment3s/segment3i over the inputs a[0], a[1], b[0], b[1], c[0], c[1], d[0], d[1], with
specification nodes tempf0, temps0, res00 and implementation nodes mul0-mul8, add0,
add1. Annotations in the graphic: "Equivalent functions are detected (using M-HED)";
"No equivalent function is detected"; "Update segments by introducing PIs in places of
equivalent nodes"; "At this point, no node can be removed and therefore deadlock occurs
in forward M-HED construction!"; "Removing output nodes of Specification and introducing
previous internal nodes as new output nodes"; "Equivalent functions are detected".]

Figure 19. Demonstration of steps of our proposed equivalence checking methodology
 
5. LIMITATIONS OF PROPOSED METHODOLOGY 
False negatives are failing properties that wrongly flag the designs as non-equivalent
although the design contains no bug. Verification engineers must eliminate false
negatives to ensure that only real bugs lead to failing properties [19]. It is well
known that any type of cut-point, in any kind of decision diagram or equivalence
checking approach, can potentially lead to false negatives. The proposed methodology
tries to avoid false negatives in two ways.
First, we move nonequivalent nodes of a given segment i into another segment j and
repeat the checking process until the nodes become equivalent or all segments are
covered (i.e., the end of the statements is reached). In other words, our methodology
does not decide on the result of equivalence checking immediately after finding
nonequivalent nodes in a pair of checked segments, because two nodes in a specific sasc
and sprtl (e.g., sasci and sprtlk) may not be equivalent, whereas moving a node from
sasci into sascj can expose equivalent nodes (like temps0 in Figure 19). The scalability
of the proposed methodology is maintained by removing equivalent nodes and the nodes
that affect them. These eliminations significantly reduce the size of the equivalence
checking problem.
Second, whenever internal nodes that affect equivalent nodes as well as other nodes in
the unprocessed segments are removed, new inputs are introduced in their place. These
inputs are described in polynomial form in terms of the primary inputs of the design
using M-HED, which helps to avoid false negatives as far as possible (like i2 in Figure
16(a)). Note that false positives do not occur with M-HED because of its canonical
representation.
6. EXPERIMENTAL RESULTS 
Our equivalence checking methodology has been implemented in C++ (using Qt Creator as
the IDE) and run on a 2.8 GHz Intel Core i7 with 8 GB of main memory under Linux. To
demonstrate the effectiveness of the proposed technique, we apply it to several designs
common in Digital Signal Processing (DSP) and multimedia applications. The benchmarks
include ColorConversion, an algorithm that enables different conversion standards to be
supported with the same hardware; Sobel, a convolution algorithm that is the core of
many image processing algorithms; Finite Impulse Response with two sizes (FIR4, FIR32),
the most common digital filter; Discrete Cosine Transform (DCT16, DCT32); and Fast
Fourier Transform (FFT32, FFT64, FFT256, FFT512). These designs come from a variety of
problem domains such as mathematics, digital signal processing and multimedia.
Furthermore, we employ CatapultC as the HLS tool to automatically generate RTL code from
the C code of these benchmarks [1]. Each circuit has been pipelined at a certain
frequency. In addition, we use Minisat [35] as the SAT solver and Z3 [36], which is
generally considered the fastest SMT solver. The timeout (TO) is set to 1000 seconds.
Table I shows the experimental results with and without our methodology. The first
column (Benchmark) gives the benchmark name. The second column (#LRTL) shows the number
of RTL lines generated by CatapultC for each benchmark. The third column (#LS) shows the
number of lines obtained after symbolic simulation. The Loop information columns give
the number of single, 2-nested and 3-nested loops in the design. The major column
without our methodology shows the results when the primary outputs are directly
expressed in terms of primary inputs and then represented by M-HED. The last major
column, with our methodology, shows the memory usage (MemoryUsage) and the required
processing time (CpuTime) after applying the proposed method.
 
 
  
Table I. Experimental results of equivalence checking without/with our methodology.

                                   Loop information            without our methodology         with our methodology
Benchmark         #LRTL    #LS     Singles 2-nested 3-nested   MemoryUsage (MB)  CpuTime (s)   MemoryUsage (MB)  CpuTime (s)
ColorConversion     141    712        0       1        0             1              0.1               1              0.1
FIR4                170    848        2       0        0             1.9            0.2               1.9            0.2
FIR32               243   2212        2       0        0             5.2            6.1               2.6            1.3
DCT16               854   3996        0       0        2             MO             NA                5.2            6.1
DCT32              1258   6352        0       0        2             MO             NA               11.7            9.2
Sobel              2268   9765        0       3        0             MO             NA               12.1           12.7
FFT32              2742  10156        0       0        1             MO             NA               15.6           14.2
FFT64              2986  14258        0       0        1             MO             NA               38.5           31.1
FFT256             3378  32568        0       0        1             MO             NA               89.5           82.7
FFT512             3976  88682        0       0        1             MO             NA              121.6          219.2

(MO: out of 8 GB memory; NA: not applicable, the time is not reported because of the memory out; CPU time is given in seconds)
 
As the results show, for some benchmarks (ColorConversion, FIR4 and FIR32) equivalence
checking succeeds both without and with our methodology. For the other benchmarks our
methodology handles the problem efficiently, whereas without it we run out of memory
(MO). These results indicate that equivalence checking without our methodology is
prohibitively expensive and even impossible for large designs. Furthermore, as stated in
Section 2, optimizations for equivalence checking such as the cut-loop and cut-plane
techniques cannot be applied directly when the loops of the design are pipelined. In
this situation, our methodology solves the equivalence checking problem efficiently. In
fact, M-HED can represent arithmetic operations at the word level, so there is no need
to encode them as bit-level operations. Besides, it can handle bit-level operations as
well as word-level ones. Indeed, M-HED is a strong and scalable decision diagram for the
representation and verification of datapath circuits [27, 30].
In another experiment, we tried to solve the equivalence checking problem with a SAT
solver. The results reported in Table II show that SAT solvers are inefficient on
datapath circuits, especially those with many arithmetic components. As can be seen,
using SAT increases the verification run time significantly, even for small circuits.
With our methodology, the run time for equivalence checking is reduced by 111.9× on
average, i.e., an average speedup of two orders of magnitude. In contrast to low-level
methods such as Boolean SAT-based techniques, the results indicate that our method not
only uses an efficient canonical form to represent symbolic expressions but also scales
to large circuits.
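The reported average can be rechecked from the CpuTime columns of Tables I and II. The sketch below only redoes the arithmetic on the published numbers; the variable names are hypothetical, and for the three benchmarks where SAT timed out the reported improvement factors are taken as given.

```python
# CpuTime (s) with our methodology (Table I) and with SAT (Table II).
ours = {"ColorConversion": 0.1, "FIR4": 0.2, "FIR32": 1.3, "DCT16": 6.1,
        "DCT32": 9.2, "Sobel": 12.7, "FFT32": 14.2}
sat = {"ColorConversion": 41.1, "FIR4": 54.6, "FIR32": 121.0, "DCT16": 301.2,
       "DCT32": 518.6, "Sobel": 872.7, "FFT32": 923.5}

# Speedup = SAT time / our time, for the benchmarks SAT finished.
recomputed = {name: sat[name] / ours[name] for name in ours}

# Improvement factors as reported in Table II (all ten benchmarks).
reported = [411, 273, 93.1, 49.4, 56.4, 68.7, 65.1, 48.2, 36.3, 18.3]
average = sum(reported) / len(reported)

for name, ratio in recomputed.items():
    print(f"{name:16s} {ratio:6.1f}x")
print(f"average of reported factors: {average:.2f}x")
```

The recomputed ratios agree with the reported column up to rounding in the last digit, and the mean of the reported factors is about 111.95, consistent with the stated 111.9×.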
To complete the set of results, Table III compares our results against Z3 used as an SMT
solver. SMT solvers try to overcome the weakness of SAT solvers on arithmetic designs by
combining SAT solvers with different mathematical theories. In these engines, the input
design is first simplified by using theories such as linear arithmetic, arrays and
bit-vectors.
 
Table II. Improvements in comparison with SAT-based method

                  Using SAT                          Improvements by using our methodology
Benchmark         MemoryUsage (MB)  CpuTime (sec)    MemoryUsage    CpuTime
ColorConversion        9.2              41.1             9.2×        411×
FIR4                  10.1              54.6             5.3×        273×
FIR32                 28.2             121              10.9×        93.1×
DCT16                 32.2             301.2             6.2×        49.4×
DCT32                 47.2             518.6             4.1×        56.4×
Sobel                 78.2             872.7             6.5×        68.7×
FFT32                 87.2             923.5             5.6×        65.1×
FFT64                 NA               TO                NA          48.2×
FFT256                NA               TO                NA          36.3×
FFT512                NA               TO                NA          18.3×
Average improvement by using our methodology             NA          111.9×

TO: out of 1000 sec; NA: not applicable, the memory usage is not reported because of the timeout
 
Hence, the remaining instance is smaller and easier to solve. Although the most useful
theories in equivalence checking are those of bit-vectors and arrays, for small cases
applying these theories for abstraction and simplification is more time-consuming than
solving the problem directly with M-HED, so the run time increases. For the DCT16 and
DCT32 benchmarks, the results obtained by the SMT solver are better than those of M-HED,
because the synthesized designs of these benchmarks contain many bit-level descriptions.
Although M-HED has features to handle bit-level operations efficiently, SMT solvers are
powerful verification tools when word-level operations and many bit-level operations are
mixed in a single design. Compared with M-HED, however, they are less effective on
datapath designs that are described mostly at the word level. In addition, without our
methodology, word-level engines cannot handle the equivalence checking problem of
datapath designs with pipelined loops efficiently.
 
Table III. Improvements in comparison with SMT-based method.

                  Using SMT                          Improvements by using our methodology
Benchmark         MemoryUsage (MB)  CpuTime (sec)    MemoryUsage    CpuTime
ColorConversion        1.5               1.2             1.5×        12×
FIR4                   2.5               1.7             1.3×        8.5×
FIR32                  2.8               3.5             1.1×        2.7×
DCT16                  3.9               4.4             0.8×        0.7×
DCT32                  7.7               6.5             0.7×        0.7×
Sobel                 19.8              15.1             1.6×        1.2×
FFT32                 44.3              21.8             2.8×        1.5×
FFT64                101.9              47.2             2.6×        1.5×
FFT256                MO                NA              89.4×        NA
FFT512                MO                NA              65.8×        NA
Average improvement by using our methodology            16.7×        NA

MO: out of 8 GB memory; NA: not applicable, the time is not reported because of the memory out
[Figure 20 graphic: bar chart, vertical axis Time (s) from 0 to 350. Plotted values:]

Benchmark         Time (Max size)   Time (Max size/2)   Time (Max size/3)
ColorConversion        0.1               0.1                 0.1
FIR4                   0.2               0.2                 0.2
FIR32                  1.3               1.3                 1.3
DCT16                  6.1              10.3                15.2
DCT32                  9.2              16.7                21.8
Sobel                 12.7              18.4                27.2
FFT32                 14.2              21.8                31.4
FFT64                 31.1              42.7                51.7
FFT256                82.7             107.2               136.8
FFT512               219.2             282.6               313.3

Figure 20. The effect of segmentation size on the run time.
 
In the last experiment, we investigate the effect of the
segmentation size, and of selecting the right number and
location of CPs, on the run time. Figure 20 reports the
results for three cases: the maximum allowed M-HED size
(Max Size), Max Size/2 and Max Size/3. As can be seen
in this figure, the run time remains unchanged for the
first three benchmarks (Color conversion, FIR4 and FIR32),
because the whole design is located in one segment even
in the case of Max Size/3. For the remaining benchmarks,
reducing the segmentation size increases the processing
time. The reason is that a smaller segmentation size
increases the number of non-equivalent output nodes in
each segment, so the INTERNAL-EQU function must be used
to avoid blocking the forward M-HED construction, which
adds to the processing time.
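The trend reported in Figure 20 can be quantified with a few lines of Python. The sketch below only uses the run times listed with the figure; the helper name `slowdown` and the list names are illustrative assumptions. It computes, for each benchmark, how much slower the check becomes when the maximum segment size is halved or divided by three, and it lists the benchmarks whose run time does not change.

```python
# Run times in seconds, copied from the data listed with Figure 20.
benchmarks = ["Color conversion", "FIR4", "FIR32", "DCT4", "DCT32",
              "SOBEL", "FFT32", "FFT64", "FFT256", "FFT512"]
time_max = [0.1, 0.2, 1.3, 6.1, 9.2, 12.7, 14.2, 31.1, 82.7, 219.2]
time_half = [0.1, 0.2, 1.3, 10.3, 16.7, 18.4, 21.8, 42.7, 107.2, 282.6]
time_third = [0.1, 0.2, 1.3, 15.2, 21.8, 27.2, 31.4, 51.7, 136.8, 313.3]


def slowdown(baseline, other):
    """Element-wise ratio other / baseline (1.0 means no change)."""
    return [o / b for b, o in zip(baseline, other)]


half_ratio = slowdown(time_max, time_half)
third_ratio = slowdown(time_max, time_third)

# Benchmarks that fit in a single segment in all three settings show
# identical run times across the three rows.
unchanged = [name
             for name, b, h, t in zip(benchmarks, time_max, time_half, time_third)
             if b == h == t]

for name, h, t in zip(benchmarks, half_ratio, third_ratio):
    print(f"{name:17s} Max Size/2: {h:4.2f}x   Max Size/3: {t:4.2f}x")
print("unchanged:", unchanged)
```

On these numbers the relative slowdown is largest for the DCT benchmarks and smaller for the largest FFT designs, although the absolute increase in seconds is greatest for FFT512.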
7. CONCLUSION 
In this paper, we have introduced a formal equivalence
checking methodology for behaviorally synthesized
pipelined designs with nested loops. It is based on a
canonical decision diagram called M-HED, which supports
modular polynomial computations. To increase the
scalability of our methodology, we employ dynamic
cut-points, which enable us to perform sequential
equivalence checking effectively. To the best of our
knowledge, this is the first work that formally checks
the equivalence of pipelined nested loops by using
high-level decision diagrams. The experimental results
demonstrate that our proposed methodology can support
designs with arbitrary structures and large datapaths.
Average improvements in terms of memory usage and run
time in comparison with SMT- and SAT-based equivalence
checking are 16.7× and 111.9×, respectively.
8. REFERENCES 
[1] Mentor Graphics, Catapult C Reference Manual, 2011. 
[2] Y.L. Lin, “Recent Developments in High-level Synthesis,” In 
ACM Transactions on Design Automation of Electronic Sys-
tems, volume 2, issue 1, pp. 2-21,1997. 
[3] J. Cong, Y. Fan, G. Han, W. Jiang, and Z. Zhang, “Behavioral 
and Communication Co-Optimizations for Systems with Se-
quential Communication Media,” In Proc. of Design Auto-
mation Conference, pp. 675-678. 2006.  
[4] A. J. Hu, “High-level vs. RTL Combinational Equivalence: 
An Introduction,” In Proc. of International Conference on 
Computer Design, pp. 274-279, 2006.  
[5] S. Kundu, S. Lerner, and R. Gupta, “Validating High-Level 
Synthesis,” In Proc. of Computer Aided Verification, pp. 459-
472, 2008. 
[6] D. Gajski, N. D. Dutt, A. Wu, and S. Lin, High Level Synthe-
sis: Introduction to Chip and System Design, Kluwer Aca-
demic Publishers, MA, USA, 1993. 
[7] X. Feng, A. J. Hu, and J. Yang, “Partitioned Model Checking 
from Software Specifications,” In Proc. of Asia and South 
Pacific-Design Automation Conference, pp. 583-588, 1993. 
[8] L. Claesen, M.Genoe, and E. Verlind, “Implementa-
tion/Specification Verification by Means of SFG-Tracing,” In 
Proc. of International Conference on Correct Hardware De-
sign and Verification Methods, pp. 583-588, 2005. 
[9] P. Chauhan, et al., “Non Cycle Accurate Sequential Equiva-
lence Checking,” In Proc. of Design Automation Conference, 
pp. 460-465, 2009. 
[10] K. Hao, F. Xie, S. Ray, and J. Yang, “Optimizing Equivalence 
Checking for Behavioral Synthesis,” In Proc. of Design Au-
tomation & Test in Europe, pp. 1500-1505, 2010. 
[11] K. Hao, S. Ray, F. Xie, “Equivalence Checking for Behavior-
ally Synthesized Pipelines,” In Proc. of Design Automation 
Conference, pp. 344-349, 2012. 
[12] Z. Yang, K. Hao, S. Ray, and F. Xie, “Handling Design and 
Implementation Optimizations in Equivalence Checking for 
Behavioral Synthesis,” In Proc. of Design Automation Con-
ference, pp. 1-6, 2013. 
[13]  Z. Yang, K. Hao, K. Cong, S. Ray, and F. Xie, “Equivalence 
Checking for Compiler Transformations in Behavioral Syn-
thesis,” In Proc. of International Conference on Computer 
Design, pp. 491-494, 2013. 
[14]  K. Hao, S. Ray, and F. Xie, “Equivalence Checking for Func-
tion Pipelining in Behavioral Synthesis,” In Proc. of Design 
Automation & Test in Europe, pp. 1-6, 2014. 
[15] M. N. Velev, and P. Gao, ”Automatic Formal Verification of 
Multithreaded Pipelined Microprocessors,” In Proc. of Inter-
national Conference on Computer Aided Design, pp.679-
686, 2011. 
[16] B. Alizadeh, ”Formal Verification and Debugging of Precise 
Interrupts on High Performance Microprocessors,” In ACM 
  
Transactions on Design Automation of Electronic Systems, 
vol. 17, no. 4, pp. 37-1:37-8, 2012. 
[17] B. Alizadeh, A. Gharehbaghi, and M. Fujita,” Pipelined Mi-
croprocessors Optimization and Debugging,” In Proc. of 
Applied Reconfigurable Computing, pp. 435-444, 2010. 
[18] C. Karfa, C. Mandal, D. Sarkar, S.R. Pentakota, and C. Reade, 
“A Formal Verification Method of Scheduling in High-level 
Synthesis,” In Proc. of International Symposium on Quality 
Electronic Design, pp. 71–78. 2006. 
[19] B. Wile, J. Goss, and W. Roesner, Comprehensive Functional 
Verification: The Complete Industry Cycle. First Edition, 
Elsevier, 2008. 
[20] Shekhar, P. Kalla, and F. Enescu, “Equivalence Verification of 
Polynomial datapaths using Ideal Membership Testing,” In 
IEEE Transactions on Computer-aided-design, vol. 26, no. 7, 
pp. 1320-1330. 2007. 
[21] S. Horeth, and R. Drechsler, “Formal Verification of Word-
level Specifications,” In Proc. of Design Automation and Test 
in Europe, pp.52-58, 1999. 
[22] N. Shekhar, P. Kalla, and F. Enescu, “Equivalence Verifica-
tion of Polynomial Datapaths with Multiple Word-length 
Operands,” In Proc. of the Conference on Design, Automa-
tion and Test in Europe, pp. 824-829. 2006. 
[23] B. Becker, R. Drechsler, and R. Enders, “On the Representa-
tional Power of bit-level and Word-level Decision Dia-
grams,” In Proc. of Asia and South Pacific-Design Automa-
tion Conference, pp. 461-467, 1997. 
[24] D. Singmaster, “On Polynomial Functions (mod m),” In 
Journal of Number Theory, vol. 6, pp. 345-352, 1974.  
[25] B. Alizadeh, and M. Fujita, “Modular Datapath Optimiza-
tion and Verification Based on Modular-HED,” In IEEE 
Transactions on Computer-aided Design of Integrated Cir-
cuits and Systems, vol. 29, no. 9, pp. 1422-1435, 2010. 
[26] N. Hungerbuhler, and E. Specker, “A Generalization of the 
Smarandache Function to Several Variables,” In Electronic 
Journal of Combinatorial Number Theory, vol. 6, pp. A23-11, 
2006.  
[27] B. Alizadeh, and M. Fujita, “A Unified Framework for 
Equivalence Verification of Datapath Oriented Applica-
tions,” In Transactions on Information and Systems (IEICE), 
vol. E92-D, no. 5, pp. 985-994, 2009. 
[28] A. Kuehlman, and F. Krohm, “Equivalence Checking Using 
Cuts and Heaps,” In Proc. of Design Automation Confer-
ence, pp. 263-268, 1997.  
[29] O.H. Ibarra, and S. Moran, “Probabilistic Algorithms for 
Deciding Equivalence of Straight-line Programs,” in Journal 
of the Association for Computing Machinery, vol. 30, pp. 
217-228, 1983. 
[30] B. Alizadeh, and M. Fujita, “Automatic Merge Point Detec-
tion for Sequential Equivalence Checking of System-level 
and RTL Descriptions,” In Proc. of International Symposium 
on Automated Technology for Verification and Analysis, 
LNCS 4762, pp. 129-144, 2007. 
[31] B. Alizadeh, and M. Fujita, “A Functional Test Generation 
Technique for RTL Datapaths,” In Proc. International High 
Level Design Validation and Test Workshop, pp. 64-70, 2012. 
[32] B. Alizadeh, and M. Fujita, “A Hybrid Approach for Equiva-
lence Checking Between System Level and RTL Descrip-
tions,” In Proc. of International Workshop on Logic and Syn-
thesis, pp. 298-30, 2007. 
[33] B. Alizadeh, and M. Fujita, “LTED: A Canonical and Com-
pact Hybrid Word-Boolean Representation as a Formal 
Model for Hardware/Software Co-designs,” In Proc. of In-
ternational Workshop on Constraints on Formal Verification, 
pp. 15-29, 2007. 
[34] B. Alizadeh, and M. Fujita, “A Novel Formal Approach To 
Generate High Level Test Vectors Without ILP And SAT 
Solvers,” In Proc. of International High Level Design Valida-
tion and Test Workshop, pp. 97-104, 2007. 
[35] N. Eén, and N. Sörensson, “An Extensible SAT solver,” In 
Proc. of Theory and Applications of Satisfiability Testing vol. 
2919, pp. 502–518., 2003. 
[36] L. de Moura, and N. Bjørner, “Z3: An Efficient SMT Solver,” 
In Proc. of Tools and Algorithms for the Construction and 
Analysis of Systems Conference, pp. 337–340, 2008. 
[37] B. Alizadeh, and P. Behnam, “Formal Equivalence Verifica-
tion and Debugging Techniques with Auto-correction Mech-
anism for RTL Designs,” In Journal of Microprocessors and 
Microsystems, vol. 37, no. 8, pp. 1108-1121, 2013. 
[38] C. A. J. van Eijk, “Sequential Equivalence Checking Based on 
Structural Similarities,” In IEEE Trans. Computer-Aided De-
sign of Integrated Circuits and Systems, pp. 814-819, volume 
19, Issue: 7, 2000. 
[39] E. Goldberg,  M. Prasad, and R. Brayton, “Using SAT 
for Combinational Equivalence Checking,” In Proc. of De-
sign Automation and Test in Europe, pp. 114-121, 2001.  
[40] J. Tristan, and X. Leroy, “A Simple, Verified Validator for 
Software Pipelining,” In Proc. of ACM SIGPLAN-SIGACT 
Symposium on Principles of Programming Languages, pp. 
83–92, 2010. 
[41] S. Sadeghi-kohan, P. Behnam, B. Alizadeh, M. Fujita, and Z. 
Navabi, “Improving Polynomial Datapath Debugging with 
HEDs,” in Proc. European Test Symposium, pp. 1-6, 2014. 
[42] B. Alizadeh, P. Behnam, S. Sadeghi-Kohan, “A Scalable For-
mal Debugging Approach with Auto-Correction Capability 
Based on Static Slicing and Dynamic Ranking for RTL 
Datapath Designs,” in IEEE Transactions on Computers, 
vol.64, no.6, pp.1564-1578, 2015. 
[43] P. Behnam, B. Alizadeh, Z. Navabi, “Automatic Correction 
of Certain Design Errors Using Mutation Technique,” in 
Proc. IEEE European Test Symposium, pp. 1-2, 2014. 
[44] P. Behnam, and B. Alizadeh, “In-circuit Mutation-based 
Automatic Correction of Certain Design Errors Using SAT 
Mechanisms," in Proc. IEEE Asian Test Symposium, pp. 
199_204, 2015. 
[45] P. Behnam, B. Alizadeh, S. Taheri, and M. Fujita, “Formally 
Analyzing Fault Tolerance in Datapath Designs Using 
Equivalence Checking,” In Proc. IEEE/ACM Asia and South 
Pacific Design Automation Conference, Macao, China, 
pp.133-138, 2016. doi: 10.1109/ASPDAC.2016.7428001.  
URL:http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnu  
mber=7428001&isnumber=7427971 
[46] H. Sabaghian-Bidgoli, P. Behnam, B. Alizadeh and Z. Nava-
bi, "Reducing Search Space for Fault Diagnosis: A Probabil-
ity-Based Scoring Approach," IEEE Computer Society Annu-
al Symposium on VLSI, pp. 545-550, 2017. 
[47] A. Ahmed, F. Farahmandi, and P. Mishra, “Directed Test 
Generation using Concolic Testing of RTL Models,” 
IEEE/ACM Design Automation and Test in Europe, 2018. 
 
