Study of a multilevel approach to partitioning for parallel logic simulation by Swaminathan Subramanian et al.
Study of a Multilevel Approach to Partitioning for Parallel Logic Simulation
Swaminathan Subramanian, Dhananjai M. Rao, and Philip A. Wilsey
Experimental Computing Laboratory, Cincinnati, OH 45221–0030
Abstract
Parallel simulation techniques are often employed to
meet the computational requirements of large hardware
simulations in order to reduce simulation time. In addi-
tion, partitioning for parallel simulations has been shown
to be vital for achieving higher simulation throughput. This
paper presents the results of our partitioning studies con-
ducted on an optimistic parallel logic simulation frame-
work based on the Time Warp synchronization protocol.
The paper also presents the design and implementation of
a new partitioning algorithm based on a multilevel heuris-
tic, developed as a part of this study. The multilevel al-
gorithm attempts to balance load, maximize concurrency,
and reduce inter-processor communication in three phases
to improve performance. The experimental results obtained
from our benchmarks indicate that the multilevel algorithm
yields better partitions than other partitioning algorithms
included in the study.
1 Introduction
Parallel simulation tools are frequently used to simu-
late large and complex digital circuits in order to reduce
time for simulation [3]. To extract better performance from
parallel logic simulators, partitioning techniques are neces-
sary [5, 19, 20]. The partitioning techniques can exploit ei-
ther the parallelism inherent in (i) the simulation algorithm,
or (ii) the circuit being simulated. The amount of paral-
lelism that can be gained from the former method is limited
by the algorithm used for simulation. The latter method at-
tempts to improve performance by dividing the circuit to
be simulated across processors. Hence, during simulation,
the workload is distributed and the concurrency and paral-
lelism in the circuit are exploited. Its success is bounded
by the amount of parallelism inherent in the circuit and the
number of processors available for simulation. The partitioning
algorithms discussed in this paper concentrate on
achieving speedup by improving concurrency, minimizing
inter-processor communication, and balancing the processor
workload based on the circuit being simulated [1]. The
new multilevel approach to partitioning attempts to optimize
the aforementioned factors by decoupling them into
separate phases. The multilevel algorithm for partitioning
has been studied and analyzed in [8, 12] and has been shown
to produce higher quality partitions (measured with respect
to edges cut, i.e., the number of edges that cross partition
boundaries) than several partitioning algorithms such as the
inertial and the spectral bisection algorithms. The complexity
of the multilevel algorithm is $O(N_E)$, where $N_E$ represents
the number of edges in the circuit graph, making the
multilevel partitioning technique a fast linear time heuristic.

Support for this work was provided in part by the Defense Advanced
Research Projects Agency under contract DABT63–96–C–0055.
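The edge-cut metric used above to judge partition quality can be illustrated with a minimal sketch. The function name `edge_cut`, the four-gate circuit, and the partition assignment are hypothetical and chosen only for illustration; they are not taken from the framework described in this paper.

```python
def edge_cut(edges, part):
    """Count the edges whose two endpoints lie in different partitions."""
    return sum(1 for u, v in edges if part[u] != part[v])


# Hypothetical four-gate circuit: g1 drives g2 and g3, which both drive g4.
edges = [("g1", "g2"), ("g1", "g3"), ("g2", "g4"), ("g3", "g4")]

# Hypothetical 2-way partition of the gates.
part = {"g1": 0, "g2": 0, "g3": 1, "g4": 1}

cut = edge_cut(edges, part)  # g1->g3 and g2->g4 cross the boundary
print(cut)
```

Each cut edge corresponds to a signal whose events must be sent between processors, which is why a smaller edge cut is used as a proxy for lower communication.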
The remainder of the paper discusses the partitioning
studies that were conducted to improve the performance of
a parallel VHDL simulation framework, developed as a part
of the SAVANT [21] project. The partitioning techniques
were seamlessly integrated into the simulation framework
in order to ease their study and use. A brief summary of
the different partitioning techniques developed previously
is presented in Section 2. Section 3 illustrates the multilevel
partitioning algorithm developed as a part of this study. A
brief description of the experimental framework, along with
the issues involved in the design and integration of the
partitioning algorithms, is presented in Section 4. The
results of the experiments conducted using the framework are
presented in Section 5. Section 6 presents the conclusions
drawn from the study along with pointers to future work.
2 Related Work
Several techniques have been developed to partition
logic circuits for parallel simulation. The algorithms address
various issues related to concurrency, communication,
and load balancing. Agrawal [1] presents a technique for
partitioning using element strings. The algorithm assigns
chains of logic gates to each partition, to encourage concurrency,
and keeps gates with different delays on a
fanout together, to minimize communication. Patil et al. [17]
employed several heuristics, such as the greedy and simulated
annealing techniques, for partitioning, based on a cost
function that estimates the execution time of a parallel simulation,
given the processor assignment and the underlying
architecture. The two phase corolla approach to partitioning
for Time Warp simulation was studied by Sporrer and
Bauer [20]. In this approach, a fine-grained clustering step
initially identifies strongly connected regions; it is followed
by a coarse-grained step that forms the partitions. A concurrency
preserving partitioning (CPP) algorithm, which employed
instantaneous workload for load balancing, was presented
by Kim and Jean [14]. Bagrodia and Liao [2] have illustrated the
use of an acyclic multi-way partitioning scheme for gate
level simulations.
A partitioning scheme based on fanout/fanin cone
clustering starting from the input gates was studied by
Smith [19]. A random partitioning scheme that assigns
nodes to partitions in a random and load balanced man-
ner was reported in [15]. A major bottleneck for the ran-
dom partitioner is communication. A Depth-First traversal
of the circuit graph can also be utilized for partitioning by
assigning nodes to partitions in the order traversed [11].
Cloutier [5] and Smith [19] utilize a topological (or level)
algorithm for partitioning. This technique proceeds by ﬁrst
levelizing the circuit graph and then assigning nodes at the
same topological level to a partition. Detailed analyses of
this partitioning algorithm for Time Warp based simulations
are available in the literature [5, 19]. The fanout/fanin cone
clustering, random, depth-first search, and topological
partitioners have been included in this study.
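As a rough illustration of the topological (level) partitioner described above, the sketch below first levelizes an acyclic circuit graph and then maps each level to a partition. Two points are assumptions made here and not details from [5, 19]: the graph is assumed to be acyclic (feedback through sequential elements would have to be broken beforehand), and levels are mapped to the `k` partitions round-robin. The function names are hypothetical.

```python
from collections import defaultdict, deque


def levelize(nodes, edges):
    """Return the topological level of each node (longest path from an input)."""
    succ = defaultdict(list)
    indeg = {n: 0 for n in nodes}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    level = {n: 0 for n in nodes}
    queue = deque(n for n in nodes if indeg[n] == 0)  # primary inputs
    while queue:
        u = queue.popleft()
        for v in succ[u]:
            level[v] = max(level[v], level[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:  # all fanin nodes have been levelized
                queue.append(v)
    return level


def topological_partition(nodes, edges, k):
    """Assign nodes at the same level to the same partition (assumed: level mod k)."""
    level = levelize(nodes, edges)
    return {n: level[n] % k for n in nodes}


# Hypothetical circuit: inputs a and b feed gate c, which feeds gate d.
nodes = ["a", "b", "c", "d"]
edges = [("a", "c"), ("b", "c"), ("c", "d")]
print(topological_partition(nodes, edges, 2))
```

Because consecutive levels land in different partitions, almost every signal crosses a partition boundary, which is consistent with the communication overhead reported for this partitioner later in the paper.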
3 The Multilevel Approach
The partitioning algorithms included in this study use a
directed graph representation of the input circuit. The circuit
graph is represented as a directed graph $G = (V, E)$,
where $V$ forms the vertex set and $E$ the edge set of the
graph. The vertices denote logic gates and edges represent
signals [9] that interconnect these logic gates. An ideal
partitioning of a circuit graph implies that the load is per-
fectly balanced and an equal number of gates are active
in each partition at any simulation instance. A particular
partitioning algorithm cannot provide ideal partitioning for
all circuit graphs because each circuit has its own structure
and pattern of communication. The multilevel algorithm at-
tempts to satisfy such constraints by separating out these
concerns in three phases; namely (i) the coarsening phase;
(ii) the initial partitioning phase; and (iii) the reﬁnement
phase. Unlike other algorithms, the strength of the multi-
level approach stems from the fact that it allows reﬁning
the circuit graph at several intermediate levels instead of at
only the original circuit graph level. A detailed description
of each phase is presented in the following paragraphs.
Coarsening: The various stages involved in the coarsening
phase are shown in Figure 1. The coarsening phase
proceeds from the primary input nodes in the graph. The
graph of the initial set of processes (gates or nodes) constituting
the circuit to be simulated is represented by $G_0$.
The coarsening phase produces a hierarchical sequence of
smaller graphs, say $G_1, G_2, \ldots, G_m$, from the original graph
$G_0$. Each vertex (also called a globule) in a lower level graph,
say $G_1$, represents a set of connected vertices in its immediate
higher level graph $G_0$. Each stage in this phase coarsens
(or subsumes) a set of inter-connected vertices to yield a
single vertex in the next stage. This phase emphasizes
concurrency: distributing the objects in a concurrent manner
reduces the number of rollbacks [7] that occur during
optimistic simulations, and thereby improves performance.
Coarsening of the graphs can be achieved by using different
schemes based on various parameters. Coarsening
(irrespective of the scheme used) produces a sequence of
smaller graphs derived from the original graph $G_0$. Any
scheme simply defines the manner in which coarsening proceeds.
As shown in Figure 1, coarsening starts from input
vertices and combines a set of vertices from a higher level
to form new vertices or globules in the next lower level. At
each level, a vertex is allowed to be coarsened only once,
and vertices that contain a primary input vertex are not allowed
to be combined together. The restriction is placed in
order to maintain concurrency. The coarsening procedure
halts when the number of globules falls below a threshold or
if all the globules are input globules, preventing further
combination. The resulting graphs from coarsening satisfy the
following relation. Given that $G_0$ is the original graph, and
$G_i = (V_i, E_i)$ and $G_{i+1} = (V_{i+1}, E_{i+1})$ are the graphs at levels $i$
and $(i+1)$, then
$V_{i+1} = \{V_{(i+1),0}, V_{(i+1),1}, \ldots, V_{(i+1),n}\}$.
Each $V_{(i+1),k} \subseteq V_i$ and $V_{(i+1),k} \cap V_{(i+1),l} = \emptyset$ for $k \neq l$. The edge
set of a vertex at level $(i+1)$ then becomes the union of the
edges of the vertices at level $i$ from which it was originally
composed. In the current implementation, a fanout coarsening
scheme was employed in order to improve concurrency
of parallel simulations. Graphs with a number of vertices
on their signals/edges tend to increase the number of rollbacks [7]
in optimistic simulations if vertices on interconnecting
signals are split across partitions. Fanout coarsening
avoids this by keeping vertices on a signal together
in a partition; this reduces communication across partitions.
In this technique, coarsening begins from primary input vertices
and proceeds in a depth-first manner. When a vertex is
chosen for coarsening in this scheme, it is combined with all
other vertices on its fanout. A vertex could be connected to
several signals; however, only one of them is considered for
coarsening. At each level other than the first, coarsening
starts from vertices that were just added to a globule in the
previous level, thereby increasing concurrency with linear
chains.
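One level of fanout coarsening, as described above, might be sketched as follows. This is a simplified illustration under stated assumptions and not the implementation used in the study: each vertex is assumed to drive a single output signal, so its successor list is its fanout; the traversal is a plain depth-first walk from the input vertices; and internal edges of a globule are dropped when the coarser graph is built. The names `fanout_coarsen` and `contract` are hypothetical.

```python
def fanout_coarsen(succ, is_input):
    """Perform one coarsening level and return a list of globules (vertex sets).

    succ:     dict mapping each vertex to the list of vertices on its fanout
    is_input: dict mapping each vertex to True if it contains a primary input
    """
    claimed = set()   # a vertex may be coarsened only once per level
    seen = set()
    globules = []
    stack = [v for v in reversed(list(succ)) if is_input[v]]
    while stack:      # depth-first traversal starting from the inputs
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        if v not in claimed:
            claimed.add(v)
            globule, has_input = {v}, is_input[v]
            for w in succ[v]:
                # Skip vertices already coarsened; never merge two input vertices.
                if w in claimed or (has_input and is_input[w]):
                    continue
                claimed.add(w)
                globule.add(w)
                has_input = has_input or is_input[w]
            globules.append(globule)
        stack.extend(reversed(succ[v]))
    # Vertices not reachable from any input become singleton globules.
    globules.extend({v} for v in succ if v not in claimed)
    return globules


def contract(succ, globules):
    """Build the next-level graph: a globule inherits the edges of its members."""
    owner = {v: i for i, g in enumerate(globules) for v in g}
    coarse = {i: set() for i in range(len(globules))}
    for v, ws in succ.items():
        for w in ws:
            if owner[v] != owner[w]:  # internal edges are dropped in this sketch
                coarse[owner[v]].add(owner[w])
    return coarse


# Hypothetical circuit with two primary inputs, i1 and i2.
succ = {"i1": ["a", "b"], "i2": ["b", "c"], "a": ["d"], "b": ["d"], "c": [], "d": []}
is_input = {v: v.startswith("i") for v in succ}
globules = fanout_coarsen(succ, is_input)
coarse = contract(succ, globules)
```

Repeating the two calls on `coarse` (with an updated input flag per globule) until the number of globules falls below a threshold, or until only input globules remain, would yield the sequence $G_1, \ldots, G_m$.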
Figure 1. Coarsening Procedure (circuit graph at Level 0, globules at Level 1, coarsest level at Level m)

Figure 2. Refining Procedure (initial partition at Level m, partitions at Level (m-1), final partitioned graph at Level 0)

Initial Partitioning: The initial partitioning phase forms
the second stage of the multilevel partitioning approach.
Initial partitioning at the coarsest level provides a “$k$-way”
partitioning of the original graph. The value of $k$ is determined
by the number of partitions desired. This phase attempts
to balance load by distributing an equal number of vertices
across partitions while preserving concurrency. This
phase is responsible for assigning vertices to partitions. The
partitions generated in this phase are further refined in the
next phase of the multilevel algorithm. The initial partition
before refinement is shown in Figure 2 using dotted lines.
Initially, all the input globules in the coarsest level are
split equally across the partitions such that the load is
sufficiently balanced. Any remaining globules are assigned to
partitions in a random manner, maintaining load balance.
Let the lowest level be $m$, $V_{i,j}$ be the $j$th vertex at level $i$
($0 \leq i \leq m$), and $P[V_{i,j}]$ be the partition to which vertex $V_{i,j}$
was assigned. It can be shown that
$\forall (v \in V_{i,j}),\ P[v] = P[V_{i,j}]$,
where $v$ corresponds to a vertex from level $(i-1)$ that has
been combined with other vertices from level $(i-1)$ to form
$V_{i,j}$ at level $i$.
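The initial partitioning step and the relation $P[v] = P[V_{i,j}]$ can be sketched as below. The sketch makes several assumptions that are not stated in the paper: load is measured as the number of globules per partition, input globules are dealt out round-robin, and each remaining globule is taken in random order and placed on the currently lightest partition. The names `initial_partition` and `project` are hypothetical.

```python
import random


def initial_partition(globules, is_input, k, seed=0):
    """Assign coarsest-level globules to k partitions."""
    rng = random.Random(seed)
    part, load = {}, [0] * k
    inputs = [g for g in globules if is_input[g]]
    others = [g for g in globules if not is_input[g]]
    # Input globules are split equally across the partitions.
    for i, g in enumerate(inputs):
        part[g] = i % k
        load[i % k] += 1
    # Remaining globules: random order, always onto the lightest partition.
    rng.shuffle(others)
    for g in others:
        p = min(range(k), key=load.__getitem__)
        part[g] = p
        load[p] += 1
    return part


def project(part, members):
    """Project a partition to the next finer level: P[v] = P[globule containing v]."""
    return {v: part[g] for g, vs in members.items() for v in vs}


# Hypothetical coarsest level with five globules, two of them input globules.
globules = ["A", "B", "C", "D", "E"]
is_input = {"A": True, "B": True, "C": False, "D": False, "E": False}
part = initial_partition(globules, is_input, k=2)

# Hypothetical membership of finer-level vertices in each globule.
members = {"A": ["a1", "a2"], "B": ["b1"], "C": ["c1"], "D": ["d1", "d2"], "E": ["e1"]}
finer = project(part, members)
```

Applying `project` once per level carries the coarsest-level assignment back to the original graph, with refinement applied at each intermediate level.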
Refinement: The coarsening and the initial partitioning
phases concentrate on improving concurrency and load balance.
The refinement phase attempts to reduce communication
by placing strongly connected globules together in
a partition. Starting from the lowest coarse-grained level,
this phase tries to balance load and reduce communication
through “$k$-way” refinement at each intermediate level. The
“$k$-way” partition of the original graph generated by the initial
partitioning phase is utilized. The greedy algorithm for
local refinement was used [12]. The greedy algorithm converges
in a few iterations, reducing the time needed for partitioning.
The greedy technique has also been shown to yield
better partitions [12], with reduced edge-cut, compared to
other refinement algorithms (e.g., Kernighan-Lin [13] and
Fiduccia-Mattheyses [6]).
This phase starts from the lowest level of the hierarchical
sequence of graphs, namely level $m$. The cut-set, which
represents the number of edges that cross over partitions,
is used as the parameter to be minimized by the greedy algorithm.
The greedy refinement algorithm selects a vertex
at random and computes the gain in the cut-set (reduction
in edge-cut) for every partition that the vertex can be moved
to. The partition with maximum gain is then selected for the
move. A move is feasible if it reduces the cut-set and preserves
load balance. Reducing the cut-set in turn reduces
communication overheads. Once a vertex is selected for
a move, it is “locked”, preventing it from moving again until
an iteration of the greedy algorithm finishes. The greedy algorithm
was found to converge in a few iterations. As illustrated
in Figure 2, the graphs are recursively projected to the next
higher level, preserving the partitioning information, while
refining across levels.
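A single iteration of the greedy refinement described above can be sketched as follows. This is an illustrative reading of the description and not the code from [12] or from the framework: the graph is treated as undirected for counting cut edges, load balance is enforced through an assumed per-partition cap `max_load`, and the visiting order is passed in by the caller (it would normally be a random shuffle). All names are hypothetical.

```python
from collections import Counter


def edge_cut(adj, part):
    """Number of undirected edges that cross partitions (each edge is listed twice)."""
    return sum(part[u] != part[v] for u in adj for v in adj[u]) // 2


def greedy_refine_pass(adj, part, k, max_load, order):
    """Run one greedy k-way refinement iteration; mutates part, returns move count."""
    load = Counter(part.values())
    locked = set()
    moves = 0
    for v in order:
        if v in locked:
            continue
        here = part[v]
        conn = Counter(part[u] for u in adj[v])  # edges from v into each partition
        best, best_gain = here, 0
        for p in range(k):
            if p == here or load[p] + 1 > max_load:
                continue  # the move would violate load balance
            gain = conn[p] - conn[here]  # reduction in edge cut if v moves to p
            if gain > best_gain:
                best, best_gain = p, gain
        if best != here:  # feasible move: strictly reduces the cut
            part[v] = best
            load[here] -= 1
            load[best] += 1
            locked.add(v)  # no further moves for v in this iteration
            moves += 1
    return moves


# Hypothetical chain a - b - c - d with a poor initial 2-way partition.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
part = {"a": 0, "b": 1, "c": 0, "d": 1}
cut_before = edge_cut(adj, part)
moves = greedy_refine_pass(adj, part, k=2, max_load=3, order=["b", "c", "a", "d"])
cut_after = edge_cut(adj, part)
```

Calling `greedy_refine_pass` repeatedly until it returns zero moves corresponds to running the greedy algorithm to convergence at one level.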
4 Experimental Framework
The simulation framework used for the partitioning studies
(illustrated in Figure 3) consists of three primary components [21]:
(i) SAVANT, (ii) TYVIS and (iii) WARPED.
The primary input to the framework is the description of
the hardware component in VHDL [9]. The input VHDL is
analyzed into an Internal Intermediate Representation (IIR)
called the Advanced Intermediate Representation with Extensibility
(AIRE) [22] using scram, the VHDL parser and
code-generator developed as a part of the SAVANT project.
The intermediate form is used to generate C++ code compliant
with the TYVIS interface. The generated code is compiled
along with the TYVIS libraries to obtain the final simulation
executable.
TYVIS is a VHDL kernel that provides the necessary runtime
support for simulation of VHDL designs. It provides
the basic data structures and methods necessary to interface
the generated C++ code from scram with the WARPED [18]
parallel simulation kernel. WARPED is an optimistic parallel
discrete event simulator developed at the University of
Cincinnati. It uses the Time Warp mechanism [10] for distributed
synchronization. In WARPED, the logical processes
(LPs) that represent the physical processes being modeled
are placed into groups called “clusters” that represent operating
system level parallel processes. LPs within a cluster
operate as classical Time Warp processes. Further details
on the working of TYVIS and WARPED are available in the
literature [21].

Circuit   Inputs   Gates   Outputs
s5378     35       2779    49
s9234     36       5597    39
s15850    77       10383   150
Table 1. Characteristics of benchmarks
The necessary infrastructure to enable partitioning of the
VHDL processes was integrated with the TYVIS kernel.
A set of six different partitioning strategies, including the
multilevel strategy, was incorporated. Since the framework
employs a runtime elaboration technique [21], partitioning
occurs at runtime, after the simulation is instantiated
and initialized. The runtime support functions generated by
scram are used by the partitioning algorithms to build the
necessary data structures. The runtime partitioning technique
provides the flexibility to choose from different partitioning
algorithms without necessitating re-compilation of
the system. The design also provides a simple technique to
choose the number of partitions. Also, the object oriented
techniques employed provide an efficient interface that can
be used to integrate other partitioning algorithms.
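The idea of choosing a partitioning algorithm and the number of partitions at runtime through a common interface can be sketched as below. This is not the TYVIS interface (which is written in C++ and is not reproduced in this paper); the registry, the decorator, and the two toy strategies are hypothetical and only show the general pattern.

```python
from typing import Callable, Dict, List

# A partitioner maps a list of process names and a partition count to an assignment.
Partitioner = Callable[[List[str], int], Dict[str, int]]
REGISTRY: Dict[str, Partitioner] = {}


def register(name: str):
    """Decorator that makes a strategy selectable by name at runtime."""
    def wrap(fn: Partitioner) -> Partitioner:
        REGISTRY[name] = fn
        return fn
    return wrap


@register("roundrobin")
def round_robin(nodes: List[str], k: int) -> Dict[str, int]:
    """Toy strategy: deal processes out to partitions in turn."""
    return {n: i % k for i, n in enumerate(nodes)}


@register("block")
def block(nodes: List[str], k: int) -> Dict[str, int]:
    """Toy strategy: contiguous blocks of roughly equal size."""
    size = max(1, -(-len(nodes) // k))  # ceiling division
    return {n: i // size for i, n in enumerate(nodes)}


def partition(nodes: List[str], algorithm: str, k: int) -> Dict[str, int]:
    """Look up the requested strategy by name; no recompilation is needed."""
    return REGISTRY[algorithm](nodes, k)


procs = ["p0", "p1", "p2", "p3", "p4"]
print(partition(procs, "roundrobin", 2))
print(partition(procs, "block", 2))
```

Adding a new strategy then amounts to registering one more function, which mirrors the extensibility claimed for the object oriented design.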
5 Experiments
Three of the ISCAS ’89 benchmarks [4] were used to
evaluate the performance of the partitioning algorithms.
The characteristics of the benchmarks used in the experiments
are shown in Table 1. All the partitioning algorithms
failed to provide speedup for benchmarks with fewer than
2500 gates, since such models were small enough for the
sequential simulator to outperform the parallel version. The
parallel simulation experiments were conducted on eight
workstations inter-connected by fast ethernet. Each workstation
consisted of dual Pentium II processors with 128
MB of RAM running Linux 2.2.12. The experiments were
repeated five times and the average was used as the representative
value in all the characteristics. The partitioning
techniques used in the experiments included the multilevel
methodology and the following five algorithms: (i) Random,
(ii) Topological, (iii) Depth First, (iv) Cluster (Breadth
First) and (v) Fanout cone.
The simulation times for the s9234 benchmark are shown
in Figure 4. It is observed that the multilevel algorithm outperforms
all other partitioning algorithms when more than
4 nodes are involved in the simulation. The performance of
the Cluster and DFS algorithms deteriorates as the number
of nodes increases, due to lack of concurrency. The lack of
concurrency also increases the number of rollbacks in the
simulations. The performance of the Topological algorithm
is limited by increased communication overheads; more
signals are split across partitions for concurrency. The simulation
execution times for all the benchmarks are
tabulated in Table 2. As illustrated in the table, the multilevel
strategy performs better than the other strategies when
the number of processors employed lies between 8 (4 workstations)
and 16 (8 workstations). When 4 processors (2 workstations) were
employed to simulate the s15850 model, the simulations ran
out of memory and hence the results are not presented in
Table 2.
The messaging characteristics for the simulation experiments
are presented in Figure 5. As shown in the figure,
the multilevel algorithm reduces the amount of communication
in the 8 to 16 processor region. The Cone partitioner
performed well due to lower communication and better concurrency
features. Increased communication overheads due
to greater edge cut in the case of the Topological partitioner
resulted in increased execution times. The Cluster and the
DFS partitioners did not perform well in the 16 node case
due to similar reasons.
The rollback characteristics of the s9234 model are shown
in Figure 6. As illustrated by the bar chart, the multilevel
algorithm greatly reduces the number of rollbacks during
simulation, highlighting the equilibrium achieved between
concurrency and communication. The sudden dips in the
execution time graph (Figure 4) are caused by a lower number
of rollbacks and lower communication for most of the
partitioning algorithms. The Cluster, DFS and the Topological
algorithms suffered from a number of rollbacks together with
more communication, degrading their performance in the 16
processor case.
Figure 3. The Simulation Framework (a VHDL design file is processed by the SAVANT VHDL analyzer into the IIR; the code generator produces C++ code, which is combined with the TyVIS VHDL kernel library, the partitioning library, and the WARPED library to form the simulation executable)

Figure 4. Execution times of s9234 (execution time in secs versus number of nodes; series: Sequential, Random, DFS, Cluster, Topological, Multilevel, ConePartition)

Figure 5. Messaging characteristics of s9234 (number of application messages, x 10^5, versus number of nodes; series: Random, DFS, Cluster, Topological, Multilevel, ConePartition)

Circuit  Seq Time  No. of Nodes  Random   DFS      Cluster  Topological  Multilevel  Cone
s5378    149.96    2             166.44   118.72   97.45    128.63       91.66       166.54
                   4             116.11   84.80    83.28    331.45       84.07       113.11
                   6             131.95   76.12    96.86    194.34       63.61       96.07
                   8             101.89   81.09    78.62    152.91       52.94       76.56
s9234    651.24    2             675.07   473.90   417.63   577.14       529.39      701.10
                   4             496.30   424.41   322.02   434.85       341.84      502.60
                   6             520.80   320.98   373.41   539.59       316.96      414.65
                   8             383.32   489.97   415.02   360.90       290.31      351.35
s15850   2154.21   4             2090.82  1279.19  1317.28  2272.62      1043.43     1832.24
                   6             1434.79  906.08   1351.17  1439.99      943.91      1363.40
                   8             1407.33  947.64   1215.64  2735.07      864.03      1176.36
Table 2. Simulation time (in secs) for the different partitioning algorithms

Figure 6. Rollback characteristics of s9234 (total number of rollbacks, x 10^4, versus number of nodes; series: Random, DFS, Cluster, Topological, Multilevel, ConePartition)

6 Conclusions
In this paper, we presented the partitioning studies that
were conducted on a parallel VHDL simulation framework
(SAVANT/TYVIS/WARPED). A new partitioning technique
based on the multilevel heuristic was developed and its performance
relative to existing partitioning strategies was investigated.
In addition, the design and the integration of
the various partitioning strategies into the simulation framework
were described. Results from the experimental
analysis indicate that the multilevel technique yielded better
partitions than the other partitioning strategies. Parallel simulation
(of all the sample applications) on 16 processors using
the multilevel technique executed in less than half the
time taken by a sequential simulation of the same application(s).
This speedup can be attributed to the reduction
in both the number of rollbacks and the amount of inter-processor
communication. Since the multilevel technique
is a linear time heuristic, it can be easily scaled to partition
for a large number of processors. Research is currently ongoing
to incorporate several enhancements to the multilevel
heuristic. For example, we are currently investigating the
use of activity levels of communication to make better decisions
while coarsening. In addition, different schemes for
coarsening and refinement are also being studied.
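The "less than half the time" statement can be checked directly against Table 2. The short calculation below uses only the sequential times and the multilevel times at 8 nodes (16 processors) from that table; the variable names are illustrative.

```python
# Sequential and multilevel (8 nodes = 16 processors) times in seconds, from Table 2.
sequential = {"s5378": 149.96, "s9234": 651.24, "s15850": 2154.21}
multilevel_8_nodes = {"s5378": 52.94, "s9234": 290.31, "s15850": 864.03}

speedup = {c: sequential[c] / multilevel_8_nodes[c] for c in sequential}
for circuit, s in speedup.items():
    print(f"{circuit}: speedup {s:.2f}")
```

The resulting speedups are roughly 2.83, 2.24 and 2.49 for s5378, s9234 and s15850 respectively, so each parallel run takes less than half of the corresponding sequential time.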
References
[1] P. Agrawal. Concurrency and communication in hardware
simulators. IEEE Transactions on Computer-Aided Design,
Oct. 1986.
[2] R. L. Bagrodia and W. Liao. Maisie: A language for the de-
sign of efﬁcient discrete-event simulations. IEEE Transac-
tions on Software Engineering, 20(4):225–238, Apr. 1994.
[3] M. L. Bailey, J. V. Briner, Jr., and R. D. Chamberlain. Par-
allel logic simulation of VLSI systems. ACM Computing
Surveys, 26(3):255–294, Sept. 1994.
[4] CAD Benchmarking Lab, NCSU. ISCAS’89 Benchmark Information.
(available at http://www.cbl.ncsu.edu/www/CBL_Docs/iscas89.html).
[5] J. Cloutier, E. Cerny, and F. Guertin. Model partitioning and
the performance of distributed time warp simulation of logic
circuits. In Simulation Practice and Theory, pages 83–99,
1997.
[6] C. M. Fiduccia and R. M. Mattheyses. A linear time heuris-
tic for improving network partitions. In Proceedings of the
19th IEEE Design Automation Conference, pages 175–181,
1982.
[7] R. Fujimoto. Parallel discrete event simulation. Communi-
cations of the ACM, 33(10):30–53, Oct. 1990.
[8] B. Hendrickson and R. Leland. A multi-level algorithm for
partitioning graphs. In Proceedings of the 1995 ACM/IEEE
Supercomputing Conference, Dec. 1995.
[9] IEEE Standard VHDL Language Reference Manual. New
York, NY, 1993.
[10] D. Jefferson. Virtual time. ACM Transactions on Program-
ming Languages and Systems, 7(3):405–425, July 1985.
[11] K. L. Kapp, T. C. Hartrum, and T. S. Wailes. An improved
cost function for static partitioning of parallel circuit simulations
using a conservative synchronization protocol. In Proceedings
of the 9th Workshop on Parallel and Distributed
Simulation (PADS ’95), pages 78–85, 1995.
[12] G. Karypis and V. Kumar. Multilevel k-way partitioning
scheme for irregular graphs. Technical Report TR 95-055,
University of Minnesota, Computer Science Department,
Minneapolis, MN 55414, Aug. 1995.
[13] B. W. Kernighan and S. Lin. An efficient heuristic procedure
for partitioning graphs. The Bell System Technical Journal,
pages 291–307, Feb. 1970.
[14] H. K. Kim and J. Jean. Concurrency preserving partition-
ing (CPP) for parallel logic simulation. In Proceedings of
the Tenth Workshop on Parallel and Distributed Simulation,
pages 98–105, May 22–24 1996.
[15] S. A. Kravitz and B. D. Ackland. Static vs. dynamic partitioning
of circuits for a MOS timing simulator on a message-based
multiprocessor. In Proceedings of the SCS Multiconference
on Distributed Simulation, 1988.
[16] N. Manjikian and W. M. Loucks. High performance paral-
lel logic simulation on a network of workstations. In Pro-
ceedings of the 7th Workshop on Parallel and Distributed
Simulation, pages 76–84, May 1993.
[17] S. Patil, P. Banerjee, and C. D. Polychronopoulos. Efﬁcient
circuit partitioning algorithms for parallel logic simulation.
In Proceedings, Supercomputing ’89, pages 361–370, Nov.
1989.
[18] R. Radhakrishnan, D. E. Martin, M. Chetlur, D. M. Rao,
and P. A. Wilsey. An Object-Oriented Time Warp Simula-
tion Kernel. In D. Caromel, R. R. Oldehoeft, and M. Thol-
burn, editors, Proceedings of the International Symposium
on Computing in Object-Oriented Parallel Environments
(ISCOPE’98), volume LNCS 1505, pages 13–23. Springer-
Verlag, Dec. 1998.
[19] S. P. Smith, B. Underwood, and M. R. Mercer. An analysis
of several approaches to circuit partitioning for parallel
logic simulation. In Proceedings of the 1987 International
Conference on Computer Design, pages 664–667.
IEEE, New York, 1987.
[20] C. Sporrer and H. Bauer. Corolla partitioning for distributed
logic simulation of VLSI-circuits. In Proceedings of the 7th
Workshop on Parallel and Distributed Simulation, pages 85–92,
May 1993.
[21] K. Subramani, D. E. Martin, and P. A. Wilsey.
SAVANT/TyVIS/WARPED: Components for the analysis and
simulation of VHDL. VHDL User’s Group, pages 195–201,
1998.
[22] J. C. Willis, P. A. Wilsey, G. D. Peterson, J. Hines, A. Zam-
friescu, D. E. Martin, and R. N. Newshutz. Advanced
intermediate representation with extensibility (AIRE). In
VHDL Users’ Group Fall 1996 Conference, pages 33–40,
Oct. 1996.