Parallel Cycle Simulation by Hering, Klaus
Parallel Cycle Simulation
Klaus Hering*
Institute of Computer Science
University of Leipzig
Abstract
Parallelization of logic simulation on register-transfer and gate level is a promising way to accelerate the extremely time-consuming simulation of whole processor structures. In this report, parallel simulation realized by means of the functional simulator parallelTEXSIM, which is based on the clock-cycle algorithm, is considered. Within such a simulation, several simulator instances co-operate over a loosely-coupled processor system, each instance simulating a part of a synchronous hardware design. Therefore, parallel simulation has to be prepared by partitioning the hardware model, and this partitioning essentially determines the efficiency of the subsequent simulation.
A framework of formal concepts for an abstract description of parallel cycle simulation is developed. It provides the basis for partition valuation within partitioning algorithms.
Starting from the definition of a Structural Hardware Model as a special bipartite graph, Sequential Cycle Simulation is introduced as a sequence of actions. Following a cone-based partitioning approach, a Parallel Structural Hardware Model is defined as a set of Structural Hardware Models. Furthermore, a model of parallel computation called Communicating Processors is introduced, which is closely related to the well-known LogP model. Together with the preceding concepts it forms the basis for defining Parallel Cycle Simulation as a sequence of action sets.
*khering@informatik.uni-leipzig.de
1 Introduction
Driven by advancing technological capabilities, the attainable complexity of VLSI designs is growing rapidly. Detecting design faults as early as possible prevents valuable resources from being wasted. Therefore, verification processes are indispensable in all design phases. Simulation is a very important VLSI design verification method. The background of our work is functional simulation on register-transfer and gate level (logic simulation) without consideration of timing aspects. In [9] a simulation strategy is presented whose underlying hardware models embody complete processor structures, with simulation stimuli (test cases) being microprogrammes or machine instruction sequences. During system simulation, time-consuming simulation runs for the final validation of complex designs are performed. Aiming at significant run time reductions for such simulation processes, we parallelized the sequential functional simulator TEXSIM¹, which operates on the basis of the clock-cycle algorithm. parallelTEXSIM is documented in [2]. We chose a parallelization approach that makes use of model-inherent parallelism. Within a corresponding parallel simulation, several simulator instances co-operate over a loosely-coupled processor system, each instance simulating a part of a synchronous hardware design. Therefore, in preparation of parallel simulation, the whole hardware model has to be partitioned, and this partitioning essentially determines the run time behaviour of the subsequent simulation.
In [3] a project comprising the investigation, development and implementation of model partitioning algorithms in the context of parallelTEXSIM is outlined. A hierarchical partitioning strategy is introduced in [4], followed by a special instance called the mixture of experts approach, described in [5]. Based on ideas of D. Zike and W. Roesner presented in [8], we consider fan-in cones as elementary components for building model partitions. Related work is reported in [6] and [7].
The model partitioning problem can be formulated as a combinatorial optimization problem. In this context, partitions are related to quantities (costs) which more or less directly express a connection to parallel simulation run time (partition valuation). Partition valuation requires a model of the corresponding parallel simulation. Choosing such a model means striking a balance between (a necessary degree of) detail and (a sufficient degree of) simplicity.
¹ Developed by IBM.
Beyond the special background of partition valuation, the objective of the present report is to summarize a framework of formal concepts for the characterization of parallel cycle simulation as realized by parallelTEXSIM, providing an abstract basis for the development and investigation of corresponding model partitioning algorithms. A schematic representation of the key concepts is given in Table 1.
Structural Hardware Model (SHM) (bipartite graph)
⇓
Sequential Cycle Simulation (SCS) (sequence of actions)
⇓
Parallel Structural Hardware Model (PSHM) (set of SHMs)
⇓
Extended Sequential Cycle Simulation (ESCS) (sequence of actions)
⇓
Parallel Cycle Simulation (PCS) (sequence of action sets)
⇑
Unrestricted Parallel Behaviour (UPB) (sequence of action sets)
⇑
Communicating Processors (CP) (model of parallel computation)
Table 1: Concepts for modelling parallelTEXSIM simulation
First, a Structural Hardware Model (SHM) is defined as a directed bipartite graph, with a subset of the node set representing design components such as gates or latches. The complementary node set stands for wires. Underlying designs are assumed to be synchronous.
To assign a behaviour to a SHM corresponding to the simulation of one cycle, a set of (node-related) actions is introduced. These actions are considered to be basic simulation components. They are not supplied with semantic details such as state transition functions (for a first approach to a detailed semantic description of parallel cycle simulation see [2]). Taking into account a levelizing of nodes, special action sequences are chosen as Sequential Cycle Simulation (SCS).
Furthermore, a Parallel Structural Hardware Model (PSHM) is defined in relation to a cone-based partition of a SHM, the latter being interpreted as a representation of the whole design under consideration. A PSHM embodies a set of SHMs, each of them determined by a partition component (set of cones). For SHMs which are elements of a PSHM, Extended Sequential Cycle Simulation (ESCS) is introduced as component behaviour. An ESCS is a sequence of actions containing a special action that represents communication between PSHM components.
Finally, Parallel Cycle Simulation (PCS) is defined as the behaviour of a PSHM. A PCS represents a sequence of action sets, the individual actions being related to the ESCSs belonging to components of the PSHM considered. As a basis for this definition, a model of parallel computation called Communicating Processors (CP) is introduced. Besides its application to modelling parallel cycle simulation, CP enables the investigation of various communication mechanisms for asynchronously working processor systems. CP behaviour is determined by the given (sequential) component behaviour and the communication mechanisms involved. As a general framework, Unrestricted Parallel Behaviour (UPB) is defined, which can be restricted to concrete CP behaviour by including synchronization conditions.
2 Structural Hardware Model
Essential components of our structural hardware model are given by the family of sets listed below:
• $M_E$: logical boxes - representing logical gates, multiplexers, ...
• $M_I$: input boxes - representing design elements for signal input
• $M_O$: output boxes - representing design elements for signal output
• $M_L$: storing boxes - representing clocked elements (latches)
• $M_S$: nets - representing wires
In the following, $N(x)$ and $N^-(x)$ denote the set of immediate successors and predecessors of a node x within a directed graph, respectively (with the usual extension of the definitions to sets of arguments).
Definition 2.1 (SHM) Let $M_E, M_I, M_O, M_L, M_S$ be pairwise disjoint finite sets, $M_B = M_E \cup M_I \cup M_O \cup M_L$ and $M_B, M_S \neq \emptyset$. Then a directed bipartite graph $M = (M_B, M_S, M_R)$ satisfying the following conditions is called a Structural Hardware Model (SHM):
1. $\{x \mid x \in M_B \cup M_S \wedge N^-(x) = \emptyset\} = M_I$
2. $\{x \mid x \in M_B \cup M_S \wedge N(x) = \emptyset\} = M_O$
3. Any directed cycle in M includes at least one element of $M_L$.
Figure 1 roughly illustrates a SHM. Thick arrows represent subsets of $M_S$.
Figure 1: A Structural Hardware Model schematically
Remark 2.1 Condition 3 expresses the exclusion of asynchronous feedback loops in combinational logic (synchrony of the underlying design).
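To make Definition 2.1 concrete, the following Python sketch checks its three conditions for a small graph. The encoding is an assumption made for illustration only: nodes are strings, `succ` maps each node to its successor set N(x), and `kind` assigns 'E', 'I', 'O', 'L' (boxes) or 'S' (nets). Condition 3 is tested via the equivalent statement that the graph becomes acyclic once the storing boxes are removed, which is exactly what Remark 2.1 describes.

```python
from typing import Dict, Set

def is_shm(succ: Dict[str, Set[str]], kind: Dict[str, str]) -> bool:
    """Check Definition 2.1 for a graph given by successor sets N(x).

    kind maps every node to 'E', 'I', 'O', 'L' (boxes) or 'S' (nets).
    """
    nodes = set(kind)
    nets = {x for x in nodes if kind[x] == "S"}
    if not nets or nets == nodes:            # M_B and M_S must be non-empty
        return False
    pred: Dict[str, Set[str]] = {x: set() for x in nodes}
    for x in nodes:
        for y in succ.get(x, set()):
            if (x in nets) == (y in nets):   # bipartite: only box <-> net edges
                return False
            pred[y].add(x)
    # Conditions 1 and 2: sources are exactly M_I, sinks are exactly M_O.
    if {x for x in nodes if not pred[x]} != {x for x in nodes if kind[x] == "I"}:
        return False
    if {x for x in nodes if not succ.get(x)} != {x for x in nodes if kind[x] == "O"}:
        return False
    # Condition 3: every cycle meets M_L <=> the graph without latches is acyclic.
    rest = {x for x in nodes if kind[x] != "L"}
    indeg = {x: len(pred[x] & rest) for x in rest}
    ready = [x for x in rest if indeg[x] == 0]
    removed = 0
    while ready:                             # Kahn's topological sort
        x = ready.pop()
        removed += 1
        for y in succ.get(x, set()) & rest:
            indeg[y] -= 1
            if indeg[y] == 0:
                ready.append(y)
    return removed == len(rest)


# Input i feeds gate g; latch l closes a feedback loop g -> l -> g; o is an output.
example_kind = {"i": "I", "n1": "S", "g": "E", "n2": "S",
                "l": "L", "n3": "S", "o": "O"}
example_succ = {"i": {"n1"}, "n1": {"g"}, "g": {"n2"}, "n2": {"l"},
                "l": {"n3"}, "n3": {"g", "o"}, "o": set()}
print(is_shm(example_succ, example_kind))                # True
print(is_shm(example_succ, dict(example_kind, l="E")))   # False: loop without a latch
```

Turning the latch into a logical box leaves a purely combinational cycle, which condition 3 forbids.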
Remark 2.2 There exists a sequence of design description transformations leading to SHMs, starting from original descriptions given in DSL or BDL/S (a combination of both languages can be used within one description). So-called protos, as data structures of the Design Automation Data Base DA DB², form the last stage before reaching SHMs.
For defining a behaviour of SHMs, a levelizing of logical boxes is introduced. For this purpose, logical boxes are grouped according to their longest distance from storing or input boxes (via input paths). Boxes belonging to the same group are to be interpreted as carriers of simulation activities which can be performed independently of each other.
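Definition 2.2 (Levelizing) Let M be a SHM with $M_E \neq \emptyset$ and let $\bar{L}_i$ be defined as follows:
1. $\bar{L}_0 = M_I \cup M_L$
2. $\bar{L}_{i+1} = \bar{L}_i \cup \{x \mid x \in M_E \wedge N^-(N^-(x)) \subseteq \bar{L}_i\}$
Let
$k = \min\{j \mid \bar{L}_j = \bar{L}_{j+1}\}$. (2.1)
Then
$\mathcal{L}(M) = \{L_1, \dots, L_k\}$ with $L_i = \bar{L}_i \setminus \bar{L}_{i-1}$ for $i \in \{1, \dots, k\}$
is called levelizing of $M_E$.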
Remark 2.3 For justification of the above definition, the existence of $k \geq 1$ with property (2.1) is evident.
Lemma 2.1 $\mathcal{L}(M)$ is a partition of $M_E$ (in the mathematical sense).
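The fixed-point construction of Definition 2.2 translates directly into a loop. The Python sketch below is an illustration only; the graph encoding (predecessor sets `pred` giving $N^-(x)$, node kinds in `kind`) is an assumption and not taken from parallelTEXSIM. Each iteration adds the logical boxes whose input nets are driven exclusively by boxes already placed, and the loop stops at the first j with $\bar{L}_j = \bar{L}_{j+1}$.

```python
from typing import Dict, List, Set

def levelize(pred: Dict[str, Set[str]], kind: Dict[str, str]) -> List[Set[str]]:
    """Return [L_1, ..., L_k] as in Definition 2.2.

    pred maps each node to N^-(x); kind maps it to 'E', 'I', 'O', 'L' or 'S'.
    """
    cumulative = {x for x, t in kind.items() if t in ("I", "L")}   # \bar L_0
    logical = {x for x, t in kind.items() if t == "E"}
    levels: List[Set[str]] = []
    while True:
        # x joins the next level once all boxes in N^-(N^-(x)) are already placed.
        new = {x for x in logical - cumulative
               if all(b in cumulative for net in pred[x] for b in pred[net])}
        if not new:                  # \bar L_{j+1} = \bar L_j: fixed point k reached
            return levels
        levels.append(new)           # L_{j+1} = \bar L_{j+1} minus \bar L_j
        cumulative |= new


# g1 is fed by input i and latch l; g2 is fed by g1 and i; g2 drives output o.
example_kind = {"i": "I", "l": "L", "g1": "E", "g2": "E", "o": "O",
                "n1": "S", "n2": "S", "n3": "S", "n4": "S"}
example_pred = {"i": set(), "l": {"n3"}, "n1": {"i"}, "n2": {"l"},
                "g1": {"n1", "n2"}, "n3": {"g1"}, "g2": {"n3", "n1"},
                "n4": {"g2"}, "o": {"n4"}}
print(levelize(example_pred, example_kind))   # [{'g1'}, {'g2'}]
```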
² Developed by IBM.
3 Sequential Cycle Simulation
For the representation of basic components of cycle simulation, abstract actions (bound to nodes other than nets) are introduced. They are not supplied with semantic details such as state transition functions. With respect to partition valuation, actions are considered as sources of simulation expense. In relation to a SHM M we consider an action set
$A = A_E \cup A_I \cup A_O \cup A_L$ (3.1)
with a bijective assignment function $a: M_B \to A$ assuming $a(M_\omega) = A_\omega$ for $\omega \in \{E, I, O, L\}$.
An action a is to be interpreted as the evaluation of a logical function for $a \in A_E$ and as an update action (of an input, output or latch, respectively) for $a \in A_I \cup A_O \cup A_L$.
Definition 3.1 (SCS) With $\Vert$ denoting sequence concatenation (including the usual extension to sets of sequences), $A^+$ denoting the set of all finite non-empty sequences over A and $k = |\mathcal{L}(M)|$, a sequence
$s_{seq} \in A^+$
satisfying the following conditions is called Sequential Cycle Simulation (SCS) with respect to M:
1. Each action of A appears exactly once within $s_{seq}$.
2. $s_{seq} \in A_I^+ \Vert a(L_1)^+ \Vert \dots \Vert a(L_k)^+ \Vert A_O^+ \Vert A_L^+$
Remark 3.1 All boxes imply exactly one action during the simulation of
one cycle (time-driven simulation). In future work it might be of interest to
consider the TEXSIM capability of excluding parts of combinational logic
from cycle simulation.
Remark 3.2 Levelizing determines a pre-ordering of actions belonging to logical boxes. The order of actions within the sub-sequences of $s_{seq}$ belonging to $A_I^+$, $a(L_j)^+$, $A_O^+$ or $A_L^+$ is assumed to be irrelevant for the simulation result.
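Definition 3.1 can be read as a membership test. Below is a hedged Python sketch, assuming that every action a(x) is identified with the name of its box x and that the levelizing is given as a list of sets (both assumptions for illustration). Following Remark 3.2, any order inside a block is accepted, while the order of the blocks themselves is fixed.

```python
from typing import List, Sequence, Set

def is_block_sequence(seq: Sequence[str], blocks: List[Set[str]]) -> bool:
    """True iff seq is a concatenation of runs where the j-th run is an
    ordering of blocks[j] (blocks are assumed pairwise disjoint)."""
    pos = 0
    for block in blocks:
        run = seq[pos:pos + len(block)]
        # X^+ is empty for an empty X, and every action must occur exactly once.
        if not block or len(run) != len(block) or set(run) != block:
            return False
        pos += len(block)
    return pos == len(seq)

def is_scs(seq: Sequence[str], inputs: Set[str], levels: List[Set[str]],
           outputs: Set[str], latches: Set[str]) -> bool:
    """Definition 3.1, identifying every action a(x) with the name of its box x."""
    return is_block_sequence(seq, [inputs, *levels, outputs, latches])


levels = [{"g1"}, {"g2"}]
print(is_scs(["i", "g1", "g2", "o", "l"], {"i"}, levels, {"o"}, {"l"}))  # True
print(is_scs(["i", "g2", "g1", "o", "l"], {"i"}, levels, {"o"}, {"l"}))  # False
```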
4 Parallel Structural Hardware Model
In the following, PSHMs are introduced to give a structural representation of design partitions related to parallelTEXSIM simulation. Our partitioning approach is based on fan-in cones (see [4]). In our context, a fan-in cone comprises the set of all logical boxes which are (potentially) able to influence, via their box-related actions, the respective action of a cone-defining (head) box during the simulation of one cycle. Note that the cone head itself is an element of the corresponding cone, too.
Definition 4.1 (Fan-in cone) With M being a SHM, the fan-in cone $co(x)$ for $x \in M_E \cup M_L \cup M_O$ is defined as the smallest set satisfying the following conditions:
1. $x \in co(x)$
2. $y \in M_E \wedge N(N(y)) \cap co(x) \neq \emptyset \rightarrow y \in co(x)$
Obviously, $co(x) \subseteq M_E \cup M_L \cup M_O$ holds. Input boxes and storing boxes (different from x) immediately feeding the cone co(x) are not elements of it.
Lemma 4.1 For $x, y \in M_E \cup M_L \cup M_O$ the following relation holds:
$co(x) = co(y) \Longrightarrow x = y$
Therefore, calling x the head of co(x) is justified.
Basic elements for building partitions of a hardware model M are given by the following set (cone set of M):
$Co(M) = \{co(x) \mid x \in M_L \cup M_O\}$ (4.1)
Lemma 4.2 $M_E \cup M_L \cup M_O = \bigcup_{c \in Co(M)} c$
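The smallest set in Definition 4.1 can be computed by a backward search that passes through logical boxes only and stops at input and storing boxes. The following Python sketch (hypothetical encoding: `pred` gives $N^-(x)$, `kind` the node type) builds each cone and the cone set Co(M) of (4.1), keyed by cone head as Lemma 4.1 permits. In the example the two cones share the box g1, an instance of the cone overlapping discussed in Remark 4.1.

```python
from typing import Dict, FrozenSet, Set

def fan_in_cone(x: str, pred: Dict[str, Set[str]], kind: Dict[str, str]) -> FrozenSet[str]:
    """co(x) of Definition 4.1: x plus every logical box whose output nets
    (transitively, through logical boxes only) feed a box of the cone."""
    cone = {x}
    stack = [x]
    while stack:
        z = stack.pop()
        for net in pred.get(z, set()):        # nets in N^-(z)
            for y in pred.get(net, set()):    # boxes y with z in N(N(y))
                if kind[y] == "E" and y not in cone:
                    cone.add(y)
                    stack.append(y)
    return frozenset(cone)

def cone_set(pred: Dict[str, Set[str]], kind: Dict[str, str]) -> Dict[str, FrozenSet[str]]:
    """Co(M) of (4.1), keyed by cone head (heads are unique by Lemma 4.1)."""
    return {x: fan_in_cone(x, pred, kind) for x, t in kind.items() if t in ("L", "O")}


example_kind = {"i": "I", "l": "L", "g1": "E", "g2": "E", "o": "O",
                "n1": "S", "n2": "S", "n3": "S", "n4": "S"}
example_pred = {"i": set(), "l": {"n3"}, "n1": {"i"}, "n2": {"l"},
                "g1": {"n1", "n2"}, "n3": {"g1"}, "g2": {"n3", "n1"},
                "n4": {"g2"}, "o": {"n4"}}
cones = cone_set(example_pred, example_kind)
print(sorted(cones["o"]), sorted(cones["l"]))   # ['g1', 'g2', 'o'] ['g1', 'l']
```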
Definition 4.2 (Partition) Let M be a SHM. Then a partition Π (in the mathematical sense) of Co(M) is called a partition of M.
Remark 4.1 Different elements (cones) of Co(M) may have common boxes (cone overlapping). If we assume that a partition component determines the model part to be handled by a simulator instance on a single processor during parallel simulation, then overlapping cones as elements of different partition components stand for replication of simulation work. Besides this drawback, the cone-based partitioning approach has the advantage that interprocessor communication during parallel cycle simulation is necessary only at cycle boundaries.
In the following, the objective is to construct PSHMs with respect to partitions of SHMs on the basis of "sub-models" defined by partition components. Later on, sub-model behaviour will be combined into the behaviour of the whole parallel model characterizing parallel cycle simulation. For the sub-model definition, some concepts related to partition components (cone sets) are introduced.
Let M be a SHM. For arbitrary cone sets $C \subseteq Co(M)$ we define:
• $B^C = \bigcup_{c \in C} c$
(set of all boxes belonging to at least one cone of C)
• $head(C) = \{x \mid co(x) \in C\}$
(set of all heads belonging to cones of C)
• $feed(C) = \{s \mid s \in M_S \wedge N(s) \cap B^C \neq \emptyset \wedge N^-(s) \not\subseteq B^C\}$
(set of all nets feeding C from "outside")
Corresponding nets have at least one sink box within a cone of C and one source box lying outside all cones of C.
• $leave(C) = \{s \mid s \in M_S \wedge N^-(s) \cap head(C) \neq \emptyset \wedge N(s) \cap B^{Co(M) \setminus C} \neq \emptyset\}$
(set of all nets leaving C via cone heads)
Corresponding nets have at least one cone head of C as a source box and one sink box belonging to a cone outside C. Due to possible cone overlapping, this does not exclude the existence of a cone in C covering the corresponding sink box as well.
Lemma 4.3 $s \in feed(C) \wedge b \in N^-(s) \setminus B^C \longrightarrow b \in M_I \cup M_L$
Remark 4.2 The elements of $feed(C) \cup leave(C)$ are to be interpreted as carriers of information at cycle boundaries with respect to C. Nets belonging to feed(C) can have a source box within a cone of C, and nets belonging to leave(C) can have a sink box within a cone of C. Note that there may exist nets not included in leave(C) which have a source box within a cone of C and a sink box lying outside of all cones belonging to C. These nets are not related to communication at cycle boundaries.
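The derived notions $B^C$, head(C), feed(C) and leave(C) translate into set comprehensions. In the Python sketch below, which is only an illustration under assumed encodings, a cone set C is represented by the set of its heads (admissible because of Lemma 4.1, so head(C) is C itself), `cones` maps each head to its cone, and `succ`/`pred` give N(s) and $N^-(s)$ for nets. The example model has an output cone co(o) = {o, g1, g2} and a latch cone co(l) = {l, g1}; the net n2 from latch l leaves {l} and at the same time feeds {o}.

```python
from typing import Dict, FrozenSet, Set

Cones = Dict[str, FrozenSet[str]]   # head x -> co(x)

def boxes_of(C: Set[str], cones: Cones) -> Set[str]:
    """B^C for a cone set C given by its heads (so head(C) = C)."""
    return set().union(*(cones[h] for h in C))

def feed(C: Set[str], cones: Cones, succ: Dict[str, Set[str]],
         pred: Dict[str, Set[str]], nets: Set[str]) -> Set[str]:
    """Nets with a sink box in B^C and a source box outside B^C."""
    bc = boxes_of(C, cones)
    return {s for s in nets if succ[s] & bc and not pred[s] <= bc}

def leave(C: Set[str], cones: Cones, succ: Dict[str, Set[str]],
          pred: Dict[str, Set[str]], nets: Set[str]) -> Set[str]:
    """Nets driven by a head of C with a sink box in some cone outside C."""
    outside = boxes_of(set(cones) - C, cones)
    return {s for s in nets if pred[s] & C and succ[s] & outside}


cones = {"o": frozenset({"o", "g1", "g2"}), "l": frozenset({"l", "g1"})}
nets = {"n1", "n2", "n3", "n4"}
succ = {"n1": {"g1", "g2"}, "n2": {"g1"}, "n3": {"g2", "l"}, "n4": {"o"}}
pred = {"n1": {"i"}, "n2": {"l"}, "n3": {"g1"}, "n4": {"g2"}}
print(sorted(feed({"o"}, cones, succ, pred, nets)))    # ['n1', 'n2']
print(sorted(leave({"l"}, cones, succ, pred, nets)))   # ['n2']
```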
Now, models related to cone sets, embodying components of a partition of a
SHM, are introduced.
Definition 4.3 (Sub-model) Let M be a SHM, Π a partition of M and $C \in \Pi$. $M^C_\Pi = (M^C_B, M^C_S, M^C_R)$ with $M^C_B = M^C_E \cup M^C_I \cup M^C_O \cup M^C_L$ is called sub-model of M with respect to Π if the corresponding components satisfy the following conditions:
1. $M^C_E = B^C \cap M_E$, $M^C_L = B^C \cap M_L$
2. $M^C_I = M^C_{I,I} \cup M^C_{I,L}$
• $M^C_{I,I} = N^-(feed(C)) \cap M_I$
• $M^C_{I,L} = \{(C', s) \mid s \in feed(C) \wedge N^-(s) \cap (B^{C'} \setminus B^C) \neq \emptyset \wedge C' \in \Pi \wedge C' \neq C\}$
3. $M^C_O = M^C_{O,O} \cup M^C_{O,L}$
• $M^C_{O,O} = B^C \cap M_O$
• $M^C_{O,L} = \{(s, C') \mid s \in leave(C) \wedge N(s) \cap B^{C'} \neq \emptyset \wedge C' \in \Pi \wedge C' \neq C\}$
4. $M^C_S = \{s \mid s \in M_S \wedge N(s) \cap B^C \neq \emptyset\} \cup leave(C)$
5. $M^C_R = \left[M_R \cap \left[\left((B^C \cup M^C_{I,I}) \times M^C_S\right) \cup \left(M^C_S \times B^C\right)\right]\right] \cup \{((C', s), s) \mid (C', s) \in M^C_{I,L}\} \cup \{(s, (s, C')) \mid (s, C') \in M^C_{O,L}\}$
In all cases, N and $N^-$ are related to M.
Remark 4.3 The elements (C', s) of $M^C_{I,L}$ are to be interpreted as input boxes for $M^C_\Pi$. They are related to the set $N^-(s) \cap (B^{C'} \setminus B^C)$ of "foreign latches" belonging to the component $C' \neq C$ of Π feeding C via the net s. $N^-(s) \cap (B^{C'} \setminus B^C) \subseteq M_L$ is a consequence of Lemma 4.3 and of the exclusion of input boxes from cones. The elements of $M^C_{I,I}$ embody global input boxes of $M^C_\Pi$.
Remark 4.4 The elements (s, C') of $M^C_{O,L}$ are to be interpreted as output boxes for $M^C_\Pi$ related to the set $N(s) \cap B^{C'}$ of "foreign boxes" belonging to the component $C' \neq C$ of Π fed by C via the net s. Due to possible cone overlapping, one element of N(s) can belong to different components of Π. The elements of $M^C_{O,O}$ embody global output boxes of $M^C_\Pi$.
Lemma 4.4 Let M be a SHM and Π be the single-block partition $\{Co(M)\}$. Then $M^{Co(M)}_{\{Co(M)\}} = M$ holds.
Lemma 4.5 A sub-model $M^C_\Pi$ of M is a SHM.
Definition 4.4 (PSHM) Let M be a SHM and Π be a partition of M.
$M_\Pi = \{M^C_\Pi \mid C \in \Pi\}$
is called Parallel Structural Hardware Model (PSHM) with respect to Π.
In Figure 2 a sub-model $M^C_\Pi$ in the context of a PSHM is represented schematically.
Figure 2: A sub-model in PSHM context
$M_\Pi$ implies a binary communication relation over Π. Set
$M^{C' \to C''}_{I,L} = \{(C', s) \mid (C', s) \in M^{C''}_{I,L}\}$ and $M^{C' \to C''}_{O,L} = \{(s, C'') \mid (s, C'') \in M^{C'}_{O,L}\}$.
$M^{C' \to C''}_{I,L}$ contains all input boxes of $M^{C''}_\Pi$ related to output boxes in $M^{C'}_\Pi$, and $M^{C' \to C''}_{O,L}$ contains all output boxes of $M^{C'}_\Pi$ related to input boxes in $M^{C''}_\Pi$.
Obviously, we have $\bigcup_{C^* \in \Pi} M^{C^* \to C''}_{I,L} = M^{C''}_{I,L}$ and $\bigcup_{C^* \in \Pi} M^{C' \to C^*}_{O,L} = M^{C'}_{O,L}$.
Lemma 4.6 Let Π be a partition of a SHM M and $C', C'' \in \Pi$. Then $|M^{C' \to C''}_{I,L}| = |M^{C' \to C''}_{O,L}|$ holds.
Definition 4.5 (Communication relation) Let $M_\Pi$ be a PSHM. Then
$Comm_\Pi = \{(C', C'') \mid C', C'' \in \Pi \wedge M^{C' \to C''}_{O,L} \neq \emptyset\}$
is called the communication relation of $M_\Pi$.
Remark 4.5 Regarding the components of a PSHM $M_\Pi$ as representations of model parts to be handled on single processors during parallel simulation, $(C', C'') \in Comm_\Pi$ means interprocessor communication (directed from the processor handling C' to the one handling C'').
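Definition 4.5 only needs leave(C') and the box sets $B^{C''}$: the pair (C', C'') is related exactly when some net of leave(C') has a sink box in $B^{C''}$. A hedged Python sketch, assuming cones are given as a dictionary from head to cone, components as sets of heads, and components identified by their index in a list (all of these are choices made for illustration):

```python
from typing import Dict, FrozenSet, List, Set, Tuple

def comm_relation(partition: List[FrozenSet[str]], cones: Dict[str, FrozenSet[str]],
                  succ: Dict[str, Set[str]], pred: Dict[str, Set[str]],
                  nets: Set[str]) -> Set[Tuple[int, int]]:
    """Comm_Pi of Definition 4.5. Components are sets of cone heads and are
    identified by their position in `partition`."""
    def boxes_of(heads: Set[str]) -> Set[str]:
        return set().union(*(cones[h] for h in heads))

    comm: Set[Tuple[int, int]] = set()
    for i, c1 in enumerate(partition):
        outside = boxes_of(set(cones) - c1)
        leaving = {s for s in nets if pred[s] & c1 and succ[s] & outside}   # leave(C')
        for j, c2 in enumerate(partition):
            # (C', C'') is related iff M^{C'->C''}_{O,L} is non-empty,
            # i.e. some net of leave(C') has a sink box in B^{C''}.
            if i != j and any(succ[s] & boxes_of(c2) for s in leaving):
                comm.add((i, j))
    return comm


cones = {"o": frozenset({"o", "g1", "g2"}), "l": frozenset({"l", "g1"})}
nets = {"n1", "n2", "n3", "n4"}
succ = {"n1": {"g1", "g2"}, "n2": {"g1"}, "n3": {"g2", "l"}, "n4": {"o"}}
pred = {"n1": {"i"}, "n2": {"l"}, "n3": {"g1"}, "n4": {"g2"}}
partition = [frozenset({"o"}), frozenset({"l"})]
print(comm_relation(partition, cones, succ, pred, nets))   # {(1, 0)}
```

Here only the processor handling the latch cone has to send data (via net n2) to the processor handling the output cone.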
5 Extended Sequential Cycle Simulation
Having introduced a parallel hardware model from a structural point of view in the previous section, we now consider the behaviour of its components. As for SHMs outside a PSHM context, the behaviour of PSHM components is again chosen to be an action sequence.
Consider a sub-model $M^C_\Pi$ of M with respect to Π. In this context an action set
$A^C = A^C_E \cup A^C_{I,I} \cup A^C_{I,L} \cup A^C_{O,O} \cup A^C_{O,L} \cup A^C_L \cup \{c\}$ (5.1)
with a bijective assignment function $a: M^C_B \to A^C \setminus \{c\}$ assuming $a(M^C_\omega) = A^C_\omega$ (ω representing an arbitrary variant of the lower indices appearing in (5.1)) is introduced. Unlike in SCS, a special action c is involved which is not bound to any box and represents component communication at cycle boundaries. $A^C$ reflects the splitting of the sets of input and output boxes within $M^C_\Pi$.
Definition 5.1 (ESCS) Let $M^C_\Pi$ be a sub-model of M with respect to Π, $\mathcal{L}(M^C_\Pi) = \{L^C_1, \dots, L^C_{k_C}\}$ be the levelizing of $M^C_E$ and $A^C$ be given as in (5.1). Then a sequence
$s^C_{seq} \in (A^C)^+$
satisfying the following conditions is called Extended Sequential Cycle Simulation (ESCS) with respect to $M^C_\Pi$:
1. Each action of $A^C$ appears exactly once within $s^C_{seq}$.
2. $s^C_{seq} = s^C_{cycle} \Vert s^C_{comm}$ with
- $s^C_{cycle} \in (A^C_{I,I})^+ \Vert a(L^C_1)^+ \Vert \dots \Vert a(L^C_{k_C})^+ \Vert (A^C_{O,O})^+ \Vert (A^C_L)^+$ and
- $s^C_{comm} = s^C_{pre\_comm} \Vert (c) \Vert s^C_{post\_comm}$, $s^C_{pre\_comm} \in (A^C_{O,L})^+$, $s^C_{post\_comm} \in (A^C_{I,L})^+$
Remark 5.1 $s^C_{cycle}$ appears as an SCS with respect to $M^C_\Pi$, modified by omitting the actions from $A^C_{I,L} \cup A^C_{O,L}$ (assigned to boxes from $M^C_{I,L} \cup M^C_{O,L}$). $s^C_{comm}$ represents three phases of communication-related work:
• $s^C_{pre\_comm}$: preparation of interprocessor communication under the sending aspect with respect to $M^C_\Pi$ (extraction of sub-model data with subsequent placement in communication-related structures)
• c: (possibly) complex communication action at cycle boundaries
• $s^C_{post\_comm}$: post-processing of interprocessor communication under the receiving aspect with respect to $M^C_\Pi$ (extraction of data from communication-related structures with subsequent placement in sub-model structures)
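The phase structure of Definition 5.1 can be checked block by block. The Python sketch below is an illustration, not parallelTEXSIM code: it identifies actions with hypothetical names (for example `send_n2` for an action of $A^C_{O,L}$ and `recv_n5` for one of $A^C_{I,L}$) and assumes the levelizing of $M^C_E$ is given. Taking the definition literally, every block, including the communication blocks, must be non-empty.

```python
from typing import List, Sequence, Set

def is_escs(seq: Sequence[str], global_inputs: Set[str], levels: List[Set[str]],
            global_outputs: Set[str], latches: Set[str],
            comm_outputs: Set[str], comm_inputs: Set[str], c: str = "c") -> bool:
    """Definition 5.1 with actions identified by name: the s_cycle blocks,
    then s_pre_comm (A_{O,L}), the action c, then s_post_comm (A_{I,L})."""
    blocks = [global_inputs, *levels, global_outputs, latches,   # s_cycle
              comm_outputs, {c}, comm_inputs]                     # s_comm
    pos = 0
    for block in blocks:
        run = seq[pos:pos + len(block)]
        # Each block must be non-empty and occur as a contiguous run,
        # so every action appears exactly once.
        if not block or len(run) != len(block) or set(run) != block:
            return False
        pos += len(block)
    return pos == len(seq)


levels = [{"g1"}, {"g2"}]
ok = ["i", "g1", "g2", "o", "l", "send_n2", "c", "recv_n5"]
early_c = ["i", "g1", "g2", "o", "c", "l", "send_n2", "recv_n5"]
print(is_escs(ok, {"i"}, levels, {"o"}, {"l"}, {"send_n2"}, {"recv_n5"}))       # True
print(is_escs(early_c, {"i"}, levels, {"o"}, {"l"}, {"send_n2"}, {"recv_n5"}))  # False
```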
The structure of $s^C_{seq}$ reflects the restriction of communication between components involved in parallel cycle simulation to cycle boundaries. To combine the behaviour of PSHM components into a behaviour of the whole PSHM, we make use of a model of parallel computation.
6 Communicating Processors
A model of parallel computation combines descriptions of a more or less abstract parallel processor structure (consisting of processing elements, memory modules and an interconnection network) and its behaviour. It determines a framework for the investigation of concurrent processes co-operating within the realization of parallel algorithms. A variety of corresponding models has been developed, all of them balancing a necessary degree of detail (to allow addressing relevant problems) against a sufficient degree of simplicity (to keep these problems tractable). Working on parallel logic simulation, we were looking for a model of parallel computation
• related to loosely-coupled parallel machines without supposing a special architecture, but giving the possibility of introducing architecture-dependent properties via parameters,
• allowing different communication mechanisms to be considered and
• describing behavioural capabilities of single processes in terms of sequences of abstract actions, so that they can be related to several interpretations (for instance, to simulation time amount as basis for partition valuation).
In [1] a model of a distributed-memory multiprocessor with processors communicating by point-to-point messages is introduced. The model is called LogP, with the four letters representing the main parameters of the model:
• L as an upper bound on the latency for communicating "small" messages from source to target
• o as overhead in terms of the length of time a processor is engaged in the transmission or reception of a message
• g as gap representing the minimum time interval between consecutive message transmissions / receptions at one processor
• P as the number of processors and memory modules considered
LogP specifies the performance characteristics of an underlying interconnection network via the parameters given above, without consideration of special network topologies. Inspired by LogP, we introduce Communicating Processors (CP) as a model of a loosely-coupled parallel machine providing the possibility of integrating a set of communication mechanisms according to current needs.
Definition 6.1 (CP) A model of parallel computation called Communicating Processors (CP) is defined as a triplet $P = (P_P, P_A, P_C)$ where
• $P_P = \{P_1, \dots, P_n\}$ is a set of (abstract) processors working asynchronously,
• $P_A = \{A_1, \dots, A_n\}$ is a family of finite processor-bound action sets and
• $P_C = \{\mathcal{M}_1, \dots, \mathcal{M}_l\}$ is a finite set of communication mechanisms. A communication mechanism is given as an ordered pair with a qualitative characteristic as first component and a (possibly empty) set of quantitative characteristics as second component. A qualitative characteristic comprises
  - the determination of actions related to the corresponding mechanism,
  - the determination of source/target relations within a set of involved processors,
  - the determination of synchronization conditions.
A quantitative characteristic appears as a real function or constant, valuating a communication-related aspect.
Remark 6.1 In the context of CP, actions represent the execution of operations on the processors under consideration. Nothing is said about their complexity: an action may stand for the execution of an extensive high-level procedure as well as for the handling of a single microcode instruction. There is a certain freedom in assigning semantic details to actions according to the requirements of the objectives at hand. We use actions as basic blocks to build sequences interpreted as behaviour of an underlying processor.
Remark 6.2 Within $P_C$, elementary point-to-point communication mechanisms built on send and receive actions can be considered as well as collective communication mechanisms realizing broadcasts or related tasks with (usually) more than two actions (on different processors) involved. Specifying synchronization conditions results in a potential blocking of actions from a behavioural point of view. This directly restricts the possible combinations of component behaviour sequences within CP behaviour. An example of reifying $P_C$ is given in relation to the definition of Parallel Cycle Simulation in Section 8.
Remark 6.3 Quantitative characteristics are introduced within $P_C$ to allow communication properties of real parallel architectures to flow into the CP model. For instance, time bounds of special communication-related events (see latency, gap and overhead within LogP) could be such characteristics. Another example is given by functions yielding run time estimations for communication processes depending on the number of involved processors, message lengths, network load situation and similar arguments. Contrary to the qualitative characteristics, the quantitative ones are not intended to influence action-based behaviour definitions.
7 Unrestricted Parallel Behaviour
The definition of CP behaviour will be based on given component behaviour (as action sequences) and synchronization conditions according to the qualitative characteristics of communication mechanisms described in $P_C$. First, we want to determine unrestricted CP behaviour, omitting the specified synchronization conditions. We start with some initial definitions:
• Let $B_i$ be a given set of all behaviour sequences of $P_i$ for a CP P ($B_i \subseteq A_i^+$). For technical reasons we will switch to an "enriched" component behaviour. Thereby, we consider an action a within a component behaviour sequence s together with the corresponding processor index i and the position of a within s as an "enriched" action. With $\mathbb{N}$ denoting the set of natural numbers we call
$\bar{A}_i = A_i \times \{i\} \times \mathbb{N}$ (7.1)
the enriched action set of processor $P_i$. Enriched action sets of different processors within $P_P$ are disjoint. In the following, act(a), proc(a) and pos(a) denote the three components of actions $a \in \bar{A}_i$. The enriched component behaviour $\bar{B}_i$ with respect to $P_i$ is built from $B_i$ by replacing, in every action sequence, the actions with the corresponding "enriched" actions, as schematically represented below:
$s = (\dots, a_k, \dots) \in B_i \;\Longrightarrow\; \bar{s} = (\dots, (a_k, i, k), \dots) \in \bar{B}_i$ (7.2)
Hence, $\bar{B}_i \subseteq \bar{A}_i^+$ holds. We call
$\bar{A} = \bigcup_{i=1}^{n} \bar{A}_i$ (7.3)
the total enriched action set with respect to P.
• For any finite sequence $s = (s_1, \dots, s_m)$ we define expand(s) as the set of all finite sequences $s' = (s'_1, \dots, s'_{m'})$ satisfying the following condition:
There exists a finite sequence $i = (i_1, \dots, i_m)$ of immediately consecutive closed intervals $[1, n_1], [n_1 + 1, n_2], \dots, [n_{m-1} + 1, m']$ of natural numbers such that $s'_k = s_j$ holds for $k \in i_j$.
For example, with s = (0, 1, 0) we have (0, 0, 0, 1, 1, 0) ∈ expand(s). In every case $s \in expand(s)$ holds. For sets S of sequences we define $expand(S) = \bigcup_{s \in S} expand(s)$.
The background of this definition is the multiplication of action occurrences in consecutive snapshots of CP behaviour.
• Let $M_i$ be n arbitrarily chosen sets with $i \in [1, n]$, $M = \bigcup_{i=1}^{n} M_i$ and $s \in (2^M)^+$, where $2^M$ denotes the power set of M. Furthermore, let $s^i = (s^i_1, \dots, s^i_m)$ be the maximal sub-sequence of s with $s^i_k \cap M_i \neq \emptyset$ for each component $s^i_k$ of $s^i$. Then $proj_i(s)$ (for $i \in [1, n]$) is defined as the set of all sequences $s^* = (s^*_1, \dots, s^*_m)$ with $s^*_k \in s^i_k \cap M_i$ for $k \in [1, m]$.
For example, consider $M_1 = \{0\}$, $M_2 = \{1\}$ and $s = (\{0, 1\}, \{0\}, \{1\})$. Then we have $s^1 = (\{0, 1\}, \{0\})$, $s^2 = (\{0, 1\}, \{1\})$, $proj_1(s) = \{(0, 0)\}$ and $proj_2(s) = \{(1, 1)\}$.
The intention of this definition is to identify component behaviour within the system behaviour of a CP.
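Both auxiliary operators can be stated operationally. The following Python sketch is one possible reading of the definitions above (the function names are hypothetical): membership in expand(s) is decided by comparing maximal runs of equal elements, since equal neighbours in s merge into one run of the expanded sequence, and $proj_i(s)$ is enumerated as a Cartesian product over the components of $s^i$.

```python
from itertools import groupby, product
from typing import Any, List, Sequence, Set, Tuple

def runs(seq: Sequence[Any]) -> List[Tuple[Any, int]]:
    """Maximal runs of equal consecutive elements as (value, length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(seq)]

def in_expand(s_prime: Sequence[Any], s: Sequence[Any]) -> bool:
    """True iff s_prime lies in expand(s), i.e. arises from s by repeating every
    component at least once. Equal neighbours in s merge into one run of
    s_prime, so the maximal runs of both sequences are compared."""
    rs, rp = runs(s), runs(s_prime)
    return len(rs) == len(rp) and all(
        v == w and q >= p for (v, p), (w, q) in zip(rs, rp))

def proj(s: Sequence[Set[Any]], m_i: Set[Any]) -> Set[Tuple[Any, ...]]:
    """proj_i(s): keep the components of s that meet M_i (the sub-sequence s^i)
    and choose one element of s^i_k intersected with M_i from each of them."""
    s_i = [list(set(comp) & m_i) for comp in s if set(comp) & m_i]
    return set(product(*s_i))


print(in_expand((0, 0, 0, 1, 1, 0), (0, 1, 0)))   # True
s = [{0, 1}, {0}, {1}]
print(proj(s, {0}), proj(s, {1}))                 # {(0, 0)} {(1, 1)}
```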
Definition 7.1 (UPB) Let P be a CP model, B a family of sets $B_i \subseteq A_i^+$ of component behaviour sequences and $\bar{A}_i$, $\bar{B}_i$, $\bar{A}$ defined as in (7.1), (7.2) and (7.3), respectively. Then the set $U_B(P)$ of all sequences
$s \in (2^{\bar{A}})^+$
satisfying the following conditions is called Unrestricted Parallel Behaviour (UPB) of P with respect to B:
1. $s_k \neq \emptyset$ for all component indices k of s
2. $|s_k \cap \bar{A}_i| \leq 1$ for all indices k of s and $i \in [1, n]$
3. $\emptyset \subset proj_i(s) \subseteq expand(\bar{B}_i)$ for all $i \in [1, n]$ with $M_i = \bar{A}_i$
4. $a \in s_k \wedge a \notin s_{k+1} \longrightarrow a \notin s_{k+l+1}$ for all component indices k of s, $a \in \bar{A}$ and $l \in \mathbb{N}$
Remark 7.1 The components of s are to be interpreted as maximal sets (snapshots) of simultaneously active actions on different processors. Condition 2 expresses that each processor contributes at most one action to such a set.
Remark 7.2 According to condition 3, within s a "line" concerning each processor can be found which is related to a possible "local" behaviour sequence. Due to condition 2, $proj_i(s)$ supplies exactly one sequence of actions belonging to $(\bar{A}_i)^+$, which has to be an expansion of an "enriched" component behaviour sequence belonging to processor $P_i$. An action of $\bar{A}_i$ can occur in several consecutive snapshots. However, condition 4 prevents gaps between such phases. Note that within an original behaviour sequence belonging to $B_i$, consecutive components can embody the same action; in the corresponding "enriched" behaviour sequence all consecutive actions are different from each other due to the inclusion of the sequence position into the actions.
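For finite behaviour sets, the four conditions of Definition 7.1 can be checked mechanically. The hedged Python sketch below represents an enriched action as a triple (act, proc, pos) following (7.1) and (7.2), and it uses the observation of Remark 7.2 that, under condition 2, $proj_i(s)$ consists of a single sequence. All names are hypothetical, and every processor index occurring in s is assumed to be a key of `behaviours`. In the example, `gap` satisfies conditions 1 to 3 but violates condition 4, because the action of processor 1 disappears for one snapshot and then reappears.

```python
from itertools import groupby
from typing import Dict, List, Sequence, Set, Tuple

Action = Tuple[str, int, int]   # enriched action (act(a), proc(a), pos(a))

def enrich(seq: Sequence[str], i: int) -> Tuple[Action, ...]:
    """(7.2): attach processor index i and the 1-based position k to each action."""
    return tuple((a, i, k) for k, a in enumerate(seq, start=1))

def in_expand(s_prime: Sequence, s: Sequence) -> bool:
    """s_prime in expand(s), decided by comparing maximal runs of equal elements."""
    rs = [(v, len(list(g))) for v, g in groupby(s)]
    rp = [(v, len(list(g))) for v, g in groupby(s_prime)]
    return len(rs) == len(rp) and all(v == w and q >= p for (v, p), (w, q) in zip(rs, rp))

def is_upb(s: List[Set[Action]], behaviours: Dict[int, List[Sequence[str]]]) -> bool:
    """Conditions 1-4 of Definition 7.1 for finite sets B_i (behaviours[i])."""
    if any(not snapshot for snapshot in s):                       # condition 1
        return False
    for i, b_i in behaviours.items():
        if any(sum(a[1] == i for a in snap) > 1 for snap in s):   # condition 2
            return False
        # Condition 3: with condition 2, proj_i(s) is the single "line" of P_i.
        line = tuple(a for snap in s for a in snap if a[1] == i)
        if not line or not any(in_expand(line, enrich(seq, i)) for seq in b_i):
            return False
    for k in range(len(s) - 1):                                   # condition 4
        for a in s[k] - s[k + 1]:
            if any(a in later for later in s[k + 2:]):
                return False
    return True


B = {1: [("x", "c")], 2: [("y", "c")]}
x, c1, y, c2 = ("x", 1, 1), ("c", 1, 2), ("y", 2, 1), ("c", 2, 2)
good = [{x, y}, {x, c2}, {c1, c2}]
gap = [{x, y}, {y}, {x, c2}, {c1, c2}]     # x disappears and reappears
print(is_upb(good, B), is_upb(gap, B))     # True False
```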
UPB represents a general framework which is to be restricted to concrete CP behaviour by including synchronization conditions according to the communication mechanisms integrated in CP. In the next section, such a restriction is performed, leading to a definition of parallel cycle simulation on the basis of a collective communication mechanism.
8 Parallel Cycle Simulation
In the following we consider an arbitrarily chosen PSHM
$M_\Pi = \{M^C_\Pi \mid C \in \Pi\}$
determined by a partition Π of a SHM M as introduced in Definition 4.4. It is taken as the basis for constructing a CP model $P = (P_P, P_A, P_C)$. Then, the (parallel) behaviour of P based on the component behaviour of $M_\Pi$ will comprise sequences of action sets, which we call Parallel Cycle Simulation with respect to $M_\Pi$.
Let us assume $|M_\Pi| = n$. We determine an ordering over $M_\Pi$ by introducing the component denotations $M_1, \dots, M_n$. According to (5.1), each $M_i$ is related to a set of abstract actions which is now called $A_i$. By $B_i$ we denote the set of all Extended Sequential Cycle Simulations (see Definition 5.1) which belong to $M_i$ ($B_i \subseteq A_i^+$).
We set $P_P = \{P_1, \dots, P_n\}$ (a set of abstract processors) and $P_A = \{A_1, \dots, A_n\}$. Furthermore, we introduce one communication mechanism $\mathcal{M}$ into P ($P_C = \{\mathcal{M}\}$). $\mathcal{M}$ does not depend on the concrete PSHM under consideration. It is related to the mpc_index command belonging to the Message Passing Library of the AIX Parallel Environment. This command was used for the implementation of interprocessor communication at cycle boundaries during simulation with parallelTEXSIM. The qualitative characteristic of $\mathcal{M}$ in the framework of P is as follows:
• The only action engaged in $\mathcal{M}$ is the communication action c, which is an element of every action set $A_i$.
• The whole processor set $P_P$ is involved in $\mathcal{M}$. Each processor sends individual messages to each of the remaining processors (all-to-all personalized communication).
• $\mathcal{M}$ is a collective communication for which n actions c (one at each processor) have to synchronize.
For the definition of Parallel Cycle Simulation, quantitative characteristics of $\mathcal{M}$ are not taken into account.
Definition 8.1 (PCS) Let $M_\Pi = \{M_1, \dots, M_n\}$ be a PSHM with the corresponding family $B = \{B_1, \dots, B_n\}$ of sets of component behaviour sequences (ESCSs) and a CP model $P = (P_P, P_A, P_C)$ constructed as above. The set $R_B(P)$ of all sequences $s \in U_B(P)$ which satisfy the following condition is called parallel behaviour of P with respect to B:
There exists a sequence component $s_k$ with $|s_k| = n$ such that $act(a) = c$ holds for all $a \in s_k$.
The elements of $R_B(P)$ are called Parallel Cycle Simulation (PCS) with respect to $M_\Pi$.
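Given a sequence already known to lie in $U_B(P)$, the additional condition of Definition 8.1 amounts to a search for the synchronizing snapshot. A minimal Python sketch, encoding enriched actions as triples (act(a), proc(a), pos(a)); this encoding and the function name are assumptions made for illustration. The returned index corresponds to the snapshot that separates the cycle and pre_comm phases from the post_comm phase.

```python
from typing import List, Optional, Set, Tuple

Action = Tuple[str, int, int]   # enriched action (act(a), proc(a), pos(a))

def sync_index(s: List[Set[Action]], n: int, c: str = "c") -> Optional[int]:
    """Index of a snapshot holding n actions that are all the communication
    action c (the restricting condition of Definition 8.1), or None."""
    for k, snapshot in enumerate(s):
        if len(snapshot) == n and all(act == c for act, _, _ in snapshot):
            return k
    return None


s = [{("x", 1, 1), ("y", 2, 1)}, {("x", 1, 1), ("c", 2, 2)}, {("c", 1, 2), ("c", 2, 2)}]
print(sync_index(s, 2))        # 2
print(sync_index(s[:2], 2))    # None: no snapshot contains both c actions
```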
Remark 8.1 The restricting condition required in the definition above expresses the synchronization effect related to the (collective) communication action c. Note that we consider the simulation of exactly one cycle.
Lemma 8.1 Under the previous definitions, $R_B(P) \neq \emptyset$ holds for any PSHM $M_\Pi$.
Lemma 8.2 Let $s$ be a PCS with respect to $M_\Pi$. For each $i \in [1, n]$, the sequences $s_i \in B_i$ and $s^*_i \in A_i^{+}$ with $\mathrm{proj}_i(s) = \{s^*_i\}$ and $s^*_i \in \mathrm{expand}(s_i)$ are uniquely determined.
According to (7.2), the action sequence

$$s_i = \left(\ldots, \mathrm{act}\left(s^i_k\right), \ldots\right) \qquad (8.1)$$

built from $s_i$ belongs to $B_i$ and is therefore an ESCS (see definition 5.1). $s_i$ identifies the behaviour of $M_i$ within $s$ and is denoted by $\mathrm{beh}_i(s)$ in the following.
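Projecting a PCS onto one processor and applying act elementwise, as in (8.1), can be sketched in Python. The encoding of abstract actions as (processor index, action name) pairs and the helper names `project` and `beh` are hypothetical. The sketch assumes that each component of $s$ holds at most one action per processor, and it does not model the operation expand.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical encoding: an abstract action is (processor index, action name).
AbstractAction = Tuple[int, str]


def project(s: List[FrozenSet[AbstractAction]], i: int) -> List[AbstractAction]:
    """Subsequence of the abstract actions of processor i, in component order.

    Assumes at most one action of processor i per component, so that the
    result is unique (cf. lemma 8.2).
    """
    out: List[AbstractAction] = []
    for step in s:
        mine = [a for a in step if a[0] == i]
        if len(mine) > 1:
            raise ValueError("more than one action of processor i in a component")
        out.extend(mine)
    return out


def beh(s: List[FrozenSet[AbstractAction]], i: int) -> Tuple[str, ...]:
    """Apply act (here: take the action name) to each projected element."""
    return tuple(name for (_, name) in project(s, i))
```

For a two-processor sequence whose middle component is the synchronizing step, each behaviour contains c exactly once, between the actions performed before and after it.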
Lemma 8.3 Let $s$ be a PCS with respect to $M_\Pi$, and let $k$ be an index of $s$ such that $|s_k| = n$ and $\mathrm{act}(a) = c$ for all $a \in s_k$. Consider an arbitrarily chosen $a \in s_{k'} \cap A_i$ with $\mathrm{act}(a) \neq c$, where $i \in [1, n]$ and $k'$ is an index of $s$. If $\mathrm{act}(a)$ lies in the cycle or pre-comm phase of $\mathrm{beh}_i(s)$ (see definition 5.1), then $k' < k$. If $\mathrm{act}(a)$ lies in the post-comm phase of $\mathrm{beh}_i(s)$, then $k' > k$.
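The ordering stated in lemma 8.3 can be tested on concrete sequences. In this Python sketch every abstract action is encoded, hypothetically, as (processor index, action name, phase), where the phase label ("cycle", "pre_comm", "post_comm" or "comm") stands for the phase of $\mathrm{beh}_i(s)$ in which the action lies. The function first locates the synchronizing component $s_k$ and then checks the position $k'$ of every other action.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical encoding: (processor index, action name, phase label).
Action = Tuple[int, str, str]


def respects_phase_order(s: List[FrozenSet[Action]], n: int, comm: str = "c") -> bool:
    """Check the ordering of lemma 8.3 for a candidate sequence s."""
    # Index k of the synchronizing component: n actions, all equal to comm.
    ks = [k for k, step in enumerate(s)
          if len(step) == n and all(a[1] == comm for a in step)]
    if not ks:
        return False  # s violates the condition of definition 8.1
    k = ks[0]
    for k_prime, step in enumerate(s):
        for (_, name, phase) in step:
            if name == comm:
                continue
            # Cycle and pre-comm actions must precede s_k.
            if phase in ("cycle", "pre_comm") and not k_prime < k:
                return False
            # Post-comm actions must follow s_k.
            if phase == "post_comm" and not k_prime > k:
                return False
    return True
```

A post-comm action placed before the synchronizing component makes the check fail, as does a sequence without such a component.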
Figure 3 visualizes Parallel Cycle Simulation for a 3-processor variant. Actions belonging to input, output, latch and logical boxes are concentrated in separate areas. Two sub-sequences with SCS and ESCS structure, respectively, are shaded grey. Areas shown in black relate to the pre- and post-communication sub-sequences. The synchronization effect of the special communication action $c$ is emphasized by vertical bars.

Figure 3: Parallel Cycle Simulation based on a model partition with 3 components
For partition valuation, estimates of run time are assigned to action sequences. An estimate of the cycle time can be expressed as follows:

$$t_{\mathrm{cycle}} = \max_j \left( t^j_{I,I} + t^j_E + t^j_{O,O} + t^j_L + t^j_{O,L} + t^j_{I,L} \right) + t_{\mathrm{comm}} \qquad (8.2)$$
Here $t_{\mathrm{comm}}$ denotes the expected time for one collective communication at cycle boundaries. It is given by a function of the number of processors and the maximum length of a message that has to be sent between any two processors. This function serves as a quantitative characteristic of the communication mechanism $M$ within the CP model considered. The other time intervals occurring in (8.2) are estimated from average execution times of boxes belonging to certain classes (known from pre-simulation) together with structural model information.
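Equation (8.2) translates directly into a small Python function. The six per-processor terms are passed as dictionaries keyed by their subscripts, and $t_{\mathrm{comm}}$ is supplied as a function of the number of processors and the maximum message length. The communication model `toy_t_comm`, its latency and per-unit parameters, and all numbers in the usage example are hypothetical placeholders, not measurements from parallelTEXSIM.

```python
from typing import Callable, Dict, List

# Subscripts of the six per-processor terms of (8.2).
TERMS = ("I,I", "E", "O,O", "L", "O,L", "I,L")


def cycle_time(per_processor: List[Dict[str, float]],
               t_comm: Callable[[int, int], float],
               max_msg_len: int) -> float:
    """t_cycle = max_j (sum of the six terms of processor j) + t_comm(p, m)."""
    compute = max(sum(t[key] for key in TERMS) for t in per_processor)
    return compute + t_comm(len(per_processor), max_msg_len)


def toy_t_comm(p: int, m: int, latency: float = 5.0, per_unit: float = 0.01) -> float:
    # Hypothetical model only: p - 1 point-to-point sends per processor,
    # each costing a fixed latency plus a term linear in the message length.
    return (p - 1) * (latency + per_unit * m)


p0 = {"I,I": 1.0, "E": 2.0, "O,O": 1.0, "L": 1.0, "O,L": 0.5, "I,L": 0.5}
p1 = {"I,I": 2.0, "E": 4.0, "O,O": 1.0, "L": 1.0, "O,L": 1.0, "I,L": 1.0}
estimate = cycle_time([p0, p1], toy_t_comm, max_msg_len=100)
```

Here processor 1 dominates the maximum with 10 time units, and the toy communication term adds 6, giving an estimate of 16. The maximum reflects load balance, while the additive $t_{\mathrm{comm}}$ term reflects the communication cost of the partition.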
9 Concluding Remarks
The framework of concepts developed here provides a formal basis for the construction, investigation and implementation of model partitioning algorithms. On the one hand, the cone-related definitions allow the partitioning subject to be described exactly. They lead to data structures derived from Structural Hardware Models, for instance Overlap Hypergraphs and Communication Graphs, which are frequently used in partitioning. On the other hand, abstract modelling of Parallel Cycle Simulation supports partition valuation, thereby combining load balancing and communication aspects. In its strongest form, partition valuation amounts to performance prediction for the corresponding parallel simulation processes. Early performance prediction [10] is one of the challenges of our future work.
Acknowledgements
Heartfelt thanks to Reiner Haupt, Thomas Villmann and Udo Petri for many valuable discussions and for reviewing the manuscript.
References
[1] D. Culler, R. Karp, D. Patterson, A. Sahay, K. E. Schauser, E. Santos,
R. Subramonian, and T. von Eicken. LogP: Towards a realistic model
of parallel computation. 4th ACM SIGPLAN Symposium on Principles
and Practice of Parallel Programming, pages 1–12, 1993.
[2] D. Döhler. Entwurf und Implementierung eines parallelen Logiksimulators auf Basis von TEXSIM. Diplomarbeit, Universität Leipzig, Fakultät für Mathematik und Informatik, 1996.
[3] K. Hering. Partitionierungsalgorithmen für Modelldatenstrukturen zur parallelen compilergesteuerten Logiksimulation (Projekt). Technical Report 5(94), Universität Leipzig, Institut für Informatik, 1994.
[4] K. Hering, R. Haupt, and T. Villmann. Cone-basierte, hierarchische Modellpartitionierung zur parallelen compilergesteuerten Logiksimulation beim VLSI-Design. Technical Report 13(95), Universität Leipzig, Institut für Informatik, 1995.
[5] K. Hering, R. Haupt, and T. Villmann. Hierarchical strategy of model partitioning for VLSI-design using an improved mixture of experts approach. Proc. of 10th Workshop on Parallel and Distributed Simulation, pages 106–113, 1996.
[6] N. Manjikian. High performance parallel logic simulation on a network
of workstations. Technical Report CCNG T-220, University of Water-
loo, Department of Electrical and Computer Engineering and Computer
Communications Network Group, 1992.
[7] R. B. Mueller-Thuns, D. G. Saab, R. F. Damiano, and J. A. Abraham.
VLSI logic and fault simulation on general purpose parallel computers.
IEEE Transactions on Computer-Aided Design of Integrated Circuits
and Systems 12, pages 446–460, 1993.
[8] W. Roesner. TEXSIM for loosely coupled multi-processors - perfor-
mance estimates, sizing. IBM internal, 1993.
[9] W. G. Spruth. The Design of a Microprocessor. Springer, 1989.
[10] Z. Xu and K. Hwang. Early prediction of MPP performance: The SP2, T3D and Paragon experiences. Parallel Computing 22, pages 917–942, 1996.
