Efficient Composition of Scenario-based Hardware Specifications by de Gennaro A et al.
Efficient Composition of Scenario-based Hardware Specifications
Alessandro de Gennaro*, Paulius Stankaitis, Andrey Mokhov
School of Engineering, Newcastle University, UK
*Corresponding author: a.de-gennaro@ncl.ac.uk
Abstract—Complex hardware systems can be designed by
breaking down their behaviour into high-level descriptions of
constituent scenarios and then composing these scenarios into
an efficient hardware implementation using a form of high-
level synthesis. There are a few existing methodologies for such
scenario-based specification and synthesis, and in this paper
we focus on highly concurrent systems, whose scenarios are
typically described using explicit concurrency models such as
partial orders.
We propose a new algorithm for composition of partial order
scenarios. Unlike previously published methods, the proposed
algorithm supports composition constraints, which allow the
designer to restrict certain aspects of the composition in order
to reuse legacy IP. Furthermore, our implementation is more
scalable and can cope with specifications comprising hundreds
of scenarios at the cost of only ≈5% area overhead compared
to optimal solutions obtained by exhaustive search.
The proposed algorithm is implemented in an open-source
EDA tool, validated on a set of benchmarks, and compared to the
state-of-the-art behavioural composition approaches and to other
existing methodologies that make use of behavioural synthesis.
I. INTRODUCTION
Hardware systems grow more complex every year: proces-
sors gain new features and application-specific instructions [1],
and the number of processing cores and other IP components
steadily increases following the need for IP reuse [2]. Con-
ventional approaches to the development of hardware systems
rely on HDL system descriptions, which require designers to
deal with low-level implementation details. As complexity
increases, it becomes convenient to raise the level of abstraction
to simplify system representation, to exploit automatic hardware
synthesis and, consequently, to increase productivity [3]; see,
for example, [4], [5].
In this work, we use the high-level methodology based on
the Conditional Partial Order Graphs (CPOG) [6] formalism
to design hardware architectures in the control domain. This
methodology, originally conceived for the design of processor
instruction set architectures (ISAs) [7], is supported by an
automated hardware synthesis flow and by algorithms for the
derivation of efficient hardware implementations [8]. However,
previously published algorithms do not scale to large numbers
of behavioural scenarios and have no support for composition
constraints, which are important for real-life systems heavily
relying on IP reuse. This motivates our research.
The paper comprises the following sections.
• Background: Section II reviews the CPOG formalism and
the related methodology [7].
• Related work: Section III compares the methodology with
other approaches in the field of behavioural synthesis, and
reviews existing algorithms for efficient composition of
scenarios.
• Scenario composition algorithm: Section IV presents our
main contribution: a new algorithm for CPOG composition
that scales to systems comprising hundreds of partial order
scenarios and supports composition constraints.
• Design automation: Section V describes the developed
open-source tool SCENCO [9], which is integrated in the
WORKCRAFT framework [10] as an external plugin and
implements the CPOG methodology.
• Algorithm and tool validation: Section VI validates
the presented contributions on a set of benchmarks that
includes ad-hoc controllers, processor instruction sets, and
software output logs.
We discuss achieved results and future research in Section VII.
This paper is an extended version of [11] and includes
the following changes. We review the CPOG-based design
methodology in Section II-B and summarise differences
with other existing behavioural synthesis approaches in Sec-
tion III-A. The new algorithm for scenario composition is
described in greater detail: in particular, we discuss how
to reduce the space of possible solutions to improve the
performance of the algorithm (Section IV-A), describe how
the composition algorithm handles constraints (Sections IV-B
and IV-D), and analyse the algorithm’s correctness and com-
plexity (Sections IV-E and IV-F). We describe how to syn-
thesise the interface between the controller and the controlled
datapath modules in Section V-A. Finally, the presented algo-
rithm and tool are evaluated on an extended set of benchmarks
in Section VI.
II. BACKGROUND
Complex systems are designed by breaking them down
into their constituent behaviours, or scenarios. In this paper,
a scenario is a list of operations that are executed in a
specified order. Formally, a scenario s = (O, ≺) is a partial
order (PO) [12], i.e. a set of operations O together with a binary
precedence relation ≺ ⊆ O × O describing the dependencies
between them, which satisfies two properties:
• Irreflexivity: ∀a ∈ O,¬(a ≺ a)
• Transitivity: ∀a, b, c ∈ O, (a ≺ b) ∧ (b ≺ c)⇒ (a ≺ c)
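For finite scenarios, both properties can be checked mechanically. The Python sketch below is only an illustration: the helper is_partial_order is hypothetical (not part of any tool discussed in this paper), and ≺ is represented as a set of ordered pairs (a, b) meaning a ≺ b.

```python
from itertools import product

def is_partial_order(ops, prec):
    """Check that prec, a set of pairs (a, b) meaning a ≺ b, is a partial order on ops."""
    if any(a not in ops or b not in ops for a, b in prec):
        return False                      # every pair must relate operations in O
    if any((a, a) in prec for a in ops):
        return False                      # irreflexivity violated
    for (a, b), (c, d) in product(prec, prec):
        if b == c and (a, d) not in prec:
            return False                  # transitivity violated
    return True

ops = {"fetch", "decode", "loadA"}
print(is_partial_order(ops, {("fetch", "decode"), ("decode", "loadA")}))                      # False
print(is_partial_order(ops, {("fetch", "decode"), ("decode", "loadA"), ("fetch", "loadA")}))  # True
```

Relations written down as chains of direct dependencies therefore have to be transitively closed before they satisfy the definition.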
A scenario specification formally captures the behaviour of
a system by the set of its constituent scenarios S = {s1, .., sn}.
As an example, the behaviour of a processor can be specified
by the set of instructions it can execute, see Figure 1a.
[Fig. 1 panels: (a) Scenario specification, S1: Arithmetic instruction and S2: Unconditional branch; (b) Inefficient behavioural composition, with s1 and s2 implemented separately and chosen via a select signal; (c) Efficient behavioural composition.]
Fig. 1: Subfigure (a) shows a scenario specification comprising two processor instructions whose behaviour is expressed using
partial orders. Scenario s1 corresponds to the arithmetic instruction that fetches an instruction from the program memory,
decodes it, loads the two operands concurrently (loadA ‖ loadB), uses them to perform an arithmetic operation (ALU), and
subsequently saves the result into the memory via the saveMEM operation. Scenario s2 is the unconditional branch, which
takes one operand to compute the jump address (ALU) and saves the result into the program counter register (savePC).
Subfigures (b-c) show two approaches to behavioural composition of scenarios.
We use Conditional Partial Order Graphs (CPOGs) (reviewed
in Section II-A) to represent scenario specifications. CPOGs are
supported by an efficient scenario composition methodology that
takes advantage of the similarities between scenarios. The
methodology will be described in detail in Section II-B; here we
provide an intuitive explanation of what we mean by ‘efficient
composition’. Figure 1b shows an inefficient composition, where
each scenario is synthesised in isolation and the right scenario is
selected at runtime by means of (de)multiplexers. A more efficient
approach consists of deriving a hardware implementation in which
system resources and common parts of behaviour are shared, as
shown in Figure 1c.
This paper presents a new approach to composition of
scenarios for deriving efficient hardware implementations of
control circuits, such as interface controllers and processor
instruction decoders.
A. Conditional Partial Order Graphs
A CPOG is a collection of scenarios in the form of partial
orders. Formally, a CPOG [6] is a tuple H = (V, E, B, φ)¹:
• V is a set of vertices which correspond to operations (or
events) in a modelled system.
• E ⊆ V × V is a set of arcs representing dependencies
between the operations.
• B is a set of Boolean variables {b1, b2, ..., b|B|}. A code is
an assignment c : B → {0, 1} of these variables; e.g. for
B = {b1, b2}, the code with c(b1) = 0 and c(b2) = 1 is denoted
c = 01 for brevity. A code selects a particular
PO from those contained in the CPOG.
• Function φ : (V ∪ E) → F (B), with F (B) being the
set of all Boolean functions over variables in B, assigns a
Boolean condition φ(z) ∈ F (B) to every vertex and arc
z ∈ V ∪ E.
¹ In [6], a CPOG is defined as H = (V, E, B, φ, ρ); we do not use ρ in this paper.
CPOGs can be represented graphically: vertices are depicted
as circles ○, and arcs are depicted as arrows →. Vertices
and arcs are labelled by their conditions φ(z). For example,
Figure 2 shows a CPOG with two possible projections on
top (we define projections in Section II-B3). The purpose of
conditions φ is to switch vertices and arcs on (off) when
the conditions on them are (not) satisfied. We use dashed
circles and arrows to represent vertices and arcs that have been
switched off by their conditions.
The example in Figure 2 shows that a CPOG can be used
to represent multiple behavioural scenarios compactly by over-
laying their common parts. In practice CPOGs remain compact
and easy to understand even when the number of scenarios
increases, making the formalism suitable for representing a
large class of hardware systems.
B. Design methodology
This section reviews the design methodology based on the
CPOGs [7], see Figure 3. The scenarios of a system are
formally specified as a scenario specification (in the form of
a set of partial orders). Scenarios are composed into a system
specification (in the form of a CPOG), which represents the
complete system behaviour. The latter is used to synthesise a
hardware controller (in the form of a gate-level description in
Verilog). The presented approach enables the specification of
composition constraints (in the form of codes). The controller
is then automatically interfaced to the specified datapath
modules in the final system implementation.
As a running example, the methodology is applied to the
system described by the scenarios in Figure 1a.
1) Scenario specification: A hardware system is described
by a collection of scenarios, each in the form of a partial
order. Vertices and arcs constitute the basic elements of these
graphs, where vertices represent system operations (or events),
and arcs represent dependencies between them.
System scenarios can be specified either graphically (see
Section V) or textually in a file. Text files containing scenarios
are parsed, and each scenario is converted into a graph.

[Fig. 2 diagram: the CPOG H (bottom) and its projections H|b=0 and H|b=1 (top).]
Fig. 2: Example of CPOG with 2 projections: H|b=0 on the left side, H|b=1 on the right.

[Fig. 3 diagram: scenario specification (partial orders, Section II-B1) and composition constraints (codes, Section IV-B) feed scenario encoding and composition (Sections II-B2 and II-B3), producing the system specification (CPOG, Section II-A); hardware synthesis (Section II-B4) derives the hardware controller (Verilog netlist); interface synthesis (Section V-A) connects it to the datapath modules (Verilog), giving the system implementation (Verilog netlist).]
Fig. 3: Specification and hardware synthesis flow.

As an example, text-based descriptions of the scenarios in Fig. 1a
are shown below.
s1 = fetch→ decode→ (loadA + loadB)→ ALU→ saveMEM
s2 = fetch → decode → loadA → ALU → savePC
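The textual notation above can be turned into a partial order in a few lines. The sketch below assumes a simple layered syntax inferred from the two examples: stages separated by → (or ->), with a parenthesised group joined by + listing concurrent operations. The function name parse_scenario is hypothetical, and the file format actually accepted by the tool may differ.

```python
def parse_scenario(text):
    """Parse e.g. 'fetch → decode → (loadA + loadB) → ALU' into (operations, ≺).

    Assumed syntax: stages separated by '→' or '->'; a parenthesised group joined
    by '+' lists the concurrent operations of one stage.
    """
    stages = []
    for stage in text.replace("->", "→").split("→"):
        names = stage.strip().strip("()").split("+")
        stages.append([name.strip() for name in names if name.strip()])
    ops = {op for stage in stages for op in stage}
    # Each operation precedes every operation of all later stages, so the
    # resulting relation is already transitively closed.
    prec = {(a, b) for i, stage in enumerate(stages)
            for later in stages[i + 1:] for a in stage for b in later}
    return ops, prec

ops, prec = parse_scenario("fetch→ decode→ (loadA + loadB)→ ALU→ saveMEM")
print(len(ops), len(prec))   # 6 14
```

For s1, loadA and loadB end up unordered with respect to each other, which is exactly the concurrency loadA ‖ loadB of Fig. 1a.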
The effort required by engineers to produce such scenario
specifications is high, and it is desirable to extract scenarios
from higher-level descriptions. There are several examples of
high-level specification languages targeting processor architec-
tures, e.g. see Arm’s Architecture Specification Language [13]
and Sail [14]. This aspect of automation is outside the scope of
this paper; we refer the reader to [15] for a relevant example.
2) Scenario encoding: Scenario encoding is the process of
finding an injective function from a set of scenarios to a
set of codes. Let n be the number of scenarios. The following
definitions will be used to formally state the CPOG encoding
problem.
• S is the set of scenarios {s1, s2, ..., sn} described as POs.
• C is the universe of codes {c1, c2, ..., c|C|} satisfying the
following two properties:
1) |C| = 2^|B|;
2) ci ≠ cj for 1 ≤ i < j ≤ |C|;
e.g. given a set of Boolean variables B = {b1, b2}, the
corresponding code universe is C(B) = {00, 01, 10, 11}.
• Encoding is a set of n pairs {(s1, c1), ..., (sn, cn)}, where
each scenario si is encoded by the code ci, such that:
– si ≠ sj ∧ ci ≠ cj for all 1 ≤ i < j ≤ n.
The arithmetic instruction scenario s1 and the unconditional
branch scenario s2 in Fig. 1a can be encoded by one Boolean
variable B = {b}, with the code universe C(B) = {0, 1}. The
encoding illustrated in Fig. 2 is e = {(s1, 0), (s2, 1)}.
Different encodings lead to different CPOGs, and consequently
to different hardware implementations, as discussed in the
following sections.
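The encoding definitions translate directly into code. In this minimal sketch (hypothetical helper names), codes are bit strings, an encoding is a dictionary from scenario names to codes, and validity means that every code belongs to C(B) and no code is reused.

```python
from itertools import product

def code_universe(num_vars):
    """Return the 2^|B| codes over |B| Boolean variables as bit strings."""
    return ["".join(bits) for bits in product("01", repeat=num_vars)]

def is_valid_encoding(encoding, num_vars):
    """Check an encoding given as {scenario: code}.

    Scenarios are distinct because they are dictionary keys; the codes must
    belong to C(B) and be pairwise distinct, so the mapping is injective.
    """
    universe = set(code_universe(num_vars))
    codes = list(encoding.values())
    return all(c in universe for c in codes) and len(set(codes)) == len(codes)

print(code_universe(2))                              # ['00', '01', '10', '11']
print(is_valid_encoding({"s1": "0", "s2": "1"}, 1))  # True
print(is_valid_encoding({"s1": "0", "s2": "0"}, 1))  # False
```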
3) Composition: Let e = {(s1, c1), ..., (sn, cn)} be a sce-
nario encoding for a CPOG H = (V,E,B, φ). The following
definitions will be used to formally state the CPOG synthesis
problem.
• A projection H|ci applies the code ci to all Boolean
conditions of H . The result is a graph Hi, whose vertex/arc
conditions are now fully evaluated to 1 or 0, see Figure 2.
• The operation scen(Hi) removes vertices and arcs with 0
condition, and applies the transitive closure to the resulting
graph, obtaining the scenario si.
The purpose of the above definitions is to let a code ci select
a scenario si from the CPOG:
$$ \forall\, 1 \le i \le n: \quad \mathrm{scen}(H|_{c_i}) = s_i $$
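The projection and scen operations can be sketched in Python as follows. The CPOG used here is an assumption: one possible CPOG for the encoding {(s1, 0), (s2, 1)} of Fig. 1a, in which loadB and saveMEM carry the condition b̄, savePC carries b, and all other vertices and all arcs are unconditional. Conditions are modelled as Python functions of the code, and the helper names project and scen mirror the notation above but are hypothetical.

```python
def project(vertices, arcs, code):
    """H|c: keep the vertices and arcs whose condition evaluates to 1 under code."""
    v_on = {v for v, cond in vertices.items() if cond(code)}
    a_on = {a for a, cond in arcs.items() if cond(code)}
    return v_on, a_on

def scen(v_on, a_on):
    """Drop arcs touching switched-off vertices, then take the transitive closure."""
    prec = {(u, v) for (u, v) in a_on if u in v_on and v in v_on}
    while True:  # naive closure, adequate for small graphs
        new = {(a, d) for (a, b) in prec for (c, d) in prec if b == c} - prec
        if not new:
            return v_on, prec
        prec |= new

# Assumed CPOG for the encoding {(s1, 0), (s2, 1)}; conditions take a code dict.
ONE = lambda code: 1
NOT_B = lambda code: 1 - code["b"]
B = lambda code: code["b"]
vertices = {"fetch": ONE, "decode": ONE, "loadA": ONE, "loadB": NOT_B,
            "ALU": ONE, "saveMEM": NOT_B, "savePC": B}
arcs = {(u, v): ONE for u, v in [("fetch", "decode"), ("decode", "loadA"),
        ("decode", "loadB"), ("loadA", "ALU"), ("loadB", "ALU"),
        ("ALU", "saveMEM"), ("ALU", "savePC")]}

ops1, prec1 = scen(*project(vertices, arcs, {"b": 0}))   # scenario s1
ops2, prec2 = scen(*project(vertices, arcs, {"b": 1}))   # scenario s2
print(sorted(ops2))  # ['ALU', 'decode', 'fetch', 'loadA', 'savePC']
```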
The CPOG synthesis process uses the encoding e to synthesise
the CPOG H. It produces the encoding functions
F (B) = {f1, f2, ..., fn}, so that the code ci ∈ e selects the
scenario si ∈ e. Following [6], we represent the CPOG H as
the following linear combination of projections:
$$ H = f_1 H|_{c_1} + \dots + f_n H|_{c_n} = \sum_{1 \le i \le n} f_i H_i = \sum_{1 \le i \le n} f_i \, \mathrm{scen}^{-1}(s_i) $$
The CPOG synthesis requirement is satisfied if the encoding
functions are orthogonal (fi fj = 0 for 1 ≤ i < j ≤ n) and are
not contradictions, i.e. fi ≠ 0 for all 1 ≤ i ≤ n.
As an example, consider the encoding {(s1, 0), (s2, 1)} of
the scenarios in Figure 1a. The resulting CPOG should be in
the form of H = f1H|c1 + f2H|c2 such that scen(H|c1) = s1
and scen(H|c2) = s2. The CPOG is represented by the linear
combination H = b̄H|0 + bH|1, and the encoding functions
f1 = b̄ and f2 = b satisfy the synthesis requirement. Figure 2
shows the resulting CPOG H at the bottom, and the projec-
tions H|b=0 and H|b=1 on the top. The CPOG represents the
system specification.
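The synthesis requirement can be verified on small examples by enumerating the truth table. The sketch below (hypothetical helper name; exponential in |B| and therefore only suitable for illustration) checks orthogonality and non-contradiction for the functions f1 = b̄ and f2 = b of the one-bit encoding {(s1, 0), (s2, 1)}.

```python
from itertools import product

def satisfies_synthesis_requirement(funcs, variables):
    """True if every f_i is satisfiable and no two f_i are true together.

    Uses exhaustive truth-table enumeration, so it is only practical for a
    small number of Boolean variables.
    """
    rows = [dict(zip(variables, bits))
            for bits in product((0, 1), repeat=len(variables))]
    # f_i ≠ 0: each encoding function is true for at least one assignment.
    if not all(any(f(row) for row in rows) for f in funcs):
        return False
    # f_i f_j = 0 for i < j: at most one function is true for each assignment.
    return all(sum(bool(f(row)) for f in funcs) <= 1 for row in rows)

f1 = lambda row: 1 - row["b"]   # f1 = b̄
f2 = lambda row: row["b"]       # f2 = b
print(satisfies_synthesis_requirement([f1, f2], ["b"]))   # True
```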
4) Hardware synthesis: The hardware synthesis step of
the design flow extracts a set of Boolean equations from the
derived CPOG, obtaining an implementation of the controller.
Its area, latency and power strongly correlate with the CPOG
complexity [16], defined as the number of Boolean literals of
conditions φ. An operation v ∈ V can be executed if:
1) it belongs to the current projection, i.e. φ(v) = 1;
2) all preceding vertices have already been executed:
∀u ∈ V, (u ≺ v)⇒ ack(u).
This is captured in terms of Boolean equations as follows:
$$ \mathrm{req}(v) = \varphi(v) \wedge \prod_{\forall u \in V} \big[ \varphi(u) \wedge \varphi((u, v)) \Rightarrow \mathrm{ack}(u) \big], $$
where (u, v) is the arc from u to v, req(v) is the request
signal which activates the v operation, while ack(u) is the
acknowledgement signal which comes from the u operation,
and indicates its completion. As an example, the hardware
implementation (in the form of Boolean equations) of the
CPOG in Figure 2 is shown below:
req(fetch) = go
req(decode) = ack(fetch)
req(loadA) = ack(decode)
req(loadB) = b̄ ∧ ack(decode)
req(ALU) = ack(loadA) ∧ (b̄ ⇒ ack(loadB))
req(savePC) = b ∧ ack(ALU)
req(saveMEM) = b̄ ∧ ack(ALU)
done = (b ⇒ ack(savePC)) ∧ (b̄ ⇒ ack(saveMEM))
Signals go and done are automatically added to the set of
operations to delimit the start and the end of a scenario
execution. The above Boolean equations are used for the synthesis
of the gate-level description of the system hardware controller
(in Verilog), which is in compliance with its scenario specifi-
cation. Finally, the controller can be connected to the specified
synchronous or asynchronous datapath modules automatically,
see Section V-A, and the final system implementation can be
further processed by conventional EDA tools.
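To see how the request equations follow from the general formula for req(v), the formula can be evaluated for a given code and a given set of acknowledged operations. The sketch below assumes the vertex conditions implied by those equations (φ(loadB) = φ(saveMEM) = b̄, φ(savePC) = b, all other vertices and all arcs unconditional); the names are hypothetical, and the go and done signals are left out, so req(fetch) is not gated by go here.

```python
# Assumed conditions for the CPOG of the running example (code = {"b": 0 or 1}).
ONE = lambda code: 1
NOT_B = lambda code: 1 - code["b"]
B = lambda code: code["b"]
PHI_V = {"fetch": ONE, "decode": ONE, "loadA": ONE, "loadB": NOT_B,
         "ALU": ONE, "saveMEM": NOT_B, "savePC": B}
PHI_E = {("fetch", "decode"): ONE, ("decode", "loadA"): ONE,
         ("decode", "loadB"): ONE, ("loadA", "ALU"): ONE, ("loadB", "ALU"): ONE,
         ("ALU", "saveMEM"): ONE, ("ALU", "savePC"): ONE}

def req(v, code, acked):
    """req(v) = φ(v) ∧ ∏_u [φ(u) ∧ φ((u, v)) ⇒ ack(u)]; absent arcs have φ = 0."""
    if not PHI_V[v](code):
        return False
    # Every predecessor u that is switched on (vertex and arc) must have acknowledged.
    return all(u in acked
               for (u, w), phi in PHI_E.items()
               if w == v and PHI_V[u](code) and phi(code))

print(req("ALU", {"b": 0}, {"fetch", "decode", "loadA"}))  # False: waits for loadB
print(req("ALU", {"b": 1}, {"fetch", "decode", "loadA"}))  # True: loadB is off
```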
III. RELATED WORK
A. Behavioural synthesis
Behavioural synthesis is not new and several other ap-
proaches exist that allow the designer to formally describe
the behaviour of a controller and synthesise the correspond-
ing hardware implementation. The most relevant approaches
are: the work by Cortadella et al. [17] that is based on
Signal Transition Graphs (STG) as the formal specification
model and produces asynchronous controllers; and the work
by De Micheli [18] that uses synchronous Finite State Machines
(FSM) to derive controllers implemented as microcode
memories or hard-wired control units.

[Fig. 4 diagrams: (a) STG specification of scenarios S1 and S2; (b) FSM specification with initial state s0, where go+ with b=0 enters scenario S1 and go+ with b=1 enters scenario S2, each ending with done+ before returning to s0.]
Fig. 4: The processor specification described as STG and FSM.
In [6], CPOGs were compared to STGs and FSMs in
terms of their compactness and ease of use when specifying
asynchronous circuits. Below we highlight the main reasons
for using CPOGs in the broader context of scenario-based
synthesis.
• The separation of datapath (scenarios) and control (en-
coding) abstraction layers enables scenarios to remain
unchanged when the encoding changes.
• Underlying partial orders can efficiently represent highly
concurrent systems without incurring exponential state
explosion.
• Scenario composition allows CPOGs to remain compact
even when the size of the specification grows.
• Opportunity to minimise various design criteria (e.g. area,
power, latency) by scenario encoding, which is our main
goal.
In this paper, we compare CPOGs, STGs and FSMs prac-
tically by synthesising real scenario-based specifications. Our
benchmarks, evaluated in Section VI, highlight that: (1) the
STG methodology does not scale to specifications that include
many scenarios, (2) the presented approach shows better
results than the FSM methodology.
As an example of specifications, Fig. 4a and 4b show an
STG and FSM model of the processor scenarios in Fig. 1a. In
these figures: red transitions are the inputs of the designed
controller, blue ones are the outputs and green ones are
dummy transitions (used to simplify the model). In the STG
in Figure 4a, the two scenarios are mutually excluded via
the choice place p1, and the causality dependencies of their
operations are modelled via sequences of request/acknowledge
transitions. The two scenarios are encoded by the same en-
coding used in Figure 2: {(s1, 0), (s2, 1)} on one bit b. STG
specifications are handled by the EDA tools Petrify [19] and
MPSat [20], which synthesise asynchronous implementations
using different algorithms. Petrify uses binary decision dia-
grams [17], while MPSat uses Petri net unfoldings [20].
In the FSM specification in Figure 4b, the two scenarios
are selected via one bit b sampled on the rising edge of the go
signal, which starts the computation. Upon the completion of
each scenario, all output requests r_all are reset, and the FSM
returns to the initial state s0 when all input acknowledgements
a_all are also reset. Such specifications are described
in VHDL as FSMs, and are handled by Design Compiler [21]
to derive synchronous controllers. We applied concurrency
reduction [6] to some of the considered FSM specifications
to avoid state explosion, see benchmarks in [9].
B. CPOG scenario composition
The characteristics of the synthesised hardware controller
correlate with the selected encoding [8]. In this work, we
present a metric that captures this correlation, and a heuristic
algorithm for efficient behavioural composition. In this section,
we review other encoding
techniques available for the efficient composition of scenarios
into a CPOG.
The Single-literal encoding [16] is based on the graph
colouring algorithm [22]. It finds an encoding under the
constraint that each Boolean condition φ(z) of the synthesised
CPOG has at most one literal. The number of Boolean
variables |B| determines the colours available for solving
the graph colouring problem, and can be increased above
⌈log2 |S|⌉ automatically.
The SAT-based encoding [8] uses SAT solvers (CLASP [23]
or MINISAT [24]) for minimising the synthesised CPOG
Boolean equations. The number of Boolean variables |B| for
encoding is set by the user. In this paper, we set
|B| = ⌈log2 |S|⌉.
In Section VI, we show that the above approaches do not
scale well to a high number of scenarios (|S| > 15).
IV. SCENARIO COMPOSITION ALGORITHM
The optimal scenario encoding problem is NP-
complete [16]. Finding the encoding that optimises a
target hardware characteristic can only be achieved by
synthesising and comparing all available encodings. In
practice, this exhaustive search is infeasible due to the
exponential growth of the number of available encodings |E|
(defined in Section IV-A) when either the number of scenarios
|S| or the number of codes |C| increases. This motivates the
proposed Heuristic encoding, described in this section.
TABLE I: Symmetric encodings derivable from e1, e.g. e2 is
symmetric to e1, as it can be obtained by negating the Boolean
variable b1 in all the codes in e1.
Scenarios e1(b1b2) e2(b1b2) e3(b1b2) e4(b1b2)
s1 00 10 01 11
s2 01 11 00 10
s3 10 00 11 01
s4 11 01 10 00
A. Symmetric encodings
It is inefficient to inspect encodings that result in similar
hardware implementations. This is the case for symmetric
encodings, which are best explained by an example. The
encoding e1 = {(s1, 00), (s2, 01), (s3, 10), (s4, 11)} has three
symmetric encodings: e2, e3 and e4, see examples in Table I.
A symmetric encoding can be obtained by negating one or
more Boolean variables in all the codes of an encoding. We do
not consider symmetric encodings, as the corresponding im-
plementations differ only in terms of input inverters, which is
insignificant. To rule out symmetric encodings, we always en-
code the first scenario by the first available code, e.g. the zero
code 00..0: e = {(s1, 00..0), · · · , (s|S|, c|S|)} for all e ∈ E .
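Negating a subset of the variables in every code is a bitwise XOR with a fixed mask, which makes symmetric encodings easy to enumerate and to normalise. In the sketch below (hypothetical helper names, codes as bit strings), canonical applies the symmetry-breaking rule: XOR-ing every code with the code of the first scenario maps that scenario to the zero code.

```python
from itertools import product

def negate(code, mask):
    """Negate the bits of code where mask has a 1 (bitwise XOR of bit strings)."""
    return "".join("1" if c != m else "0" for c, m in zip(code, mask))

def symmetric_encodings(enc):
    """All encodings obtained by negating any subset of the Boolean variables."""
    width = len(enc[0])
    return [[negate(c, "".join(mask)) for c in enc]
            for mask in product("01", repeat=width)]

def canonical(enc):
    """Representative of a symmetry class: the first scenario gets the zero code."""
    return [negate(c, enc[0]) for c in enc]

e1 = ["00", "01", "10", "11"]      # codes of s1..s4 in Table I
for e in symmetric_encodings(e1):  # masks 00, 01, 10, 11 give e1, e3, e2, e4
    print(e, canonical(e))
```

Because XOR with a fixed mask preserves the Hamming distance between any two codes, all encodings in a symmetry class also share the same pairwise code distances.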
Symmetry breaking restricts the universe of allowed
encodings E = {e1, e2, ..., e|E|} to the set that satisfies
the two properties below:
1) All encodings are different: ei ≠ ej for 1 ≤ i < j ≤ |E|.
2) No two encodings ei and ej are symmetric.
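Given |S| scenarios and |C| codes, since s1 is fixed to the zero
code and the remaining |S| − 1 scenarios receive distinct codes
among the other |C| − 1, the size of the universe of encodings is:
$$ |E| = \frac{(|C| - 1)!}{(|C| - |S|)!} $$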
Note that at least ⌈log2 |S|⌉ Boolean variables are needed to
encode |S| scenarios (|B| ≥ ⌈log2 |S|⌉). In this paper, we fix
the number of such variables to the minimum, and restrict |E|
using |C| = 2^⌈log2 |S|⌉ codes (see Section II-B2).
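The formula follows from fixing s1 to the zero code and assigning distinct codes to the remaining |S| − 1 scenarios out of the |C| − 1 codes left. A quick sketch (hypothetical helper name) evaluates it with |C| set to the minimal power of two.

```python
from math import factorial

def num_encodings(num_scenarios):
    """|E| = (|C| - 1)! / (|C| - |S|)! with the minimal |C| = 2^ceil(log2 |S|)."""
    # (n - 1).bit_length() equals ceil(log2 n) for n >= 1, without floating point.
    num_codes = 2 ** (num_scenarios - 1).bit_length()
    return factorial(num_codes - 1) // factorial(num_codes - num_scenarios)

print(num_encodings(8))    # 5040
print(num_encodings(11))   # 10897286400
```

For 8 scenarios this gives 7! = 5040, the number of encodings inspected exhaustively for the Intel 8051 subset in Fig. 5a; for 11 scenarios (16 codes) it already exceeds 10^10.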
B. Composition constraints
In real-life systems, there are composition constraints that
restrict the space of allowed encodings, for example due to
backward compatibility requirements. Consider the two sce-
narios in Figure 1a, and assume that the following constraints
must be met:
• The code of the arithmetic instruction (s1) consists of an
arbitrary 2-bit opcode, and two 3-bit operands A and B.
• The unconditional branch (s2) consists of the fixed 00111
opcode, and a 3-bit branch offset.
The above requirements can be ex-
pressed with the composition constraints
G = {(s1, ??XXXXXX), (s2, 00111XXX)}, which
uses 8 Boolean variables B = {b1, b2, ..., b8}.
• The arithmetic instruction 2-bit opcode (b1b2) is denoted
by ??, where each ? is a don’t care bit that becomes either
0 or 1 in the encoding. Each X is a don’t use bit, which
is not used for selecting a PO from those contained in the
CPOG. Hence, 6 bits are left unused for the two operands
A (b3b4b5) and B (b6b7b8).
• The unconditional branch opcode (b1b2b3b4b5) is fixed to
00111, the remaining 3 bits are left unused for the branch
offset operand (b6b7b8).
As shown in the above example, a constraint g is an assign-
ment g : B → {0, 1, ?, X} of the set of Boolean variables B.
Sets of constraints are used to express composition constraints
G = {(s1, g1), (s2, g2), ..., (s|S|, g|S|)}.
The presented scenario composition algorithm handles
composition constraints. As an example, the constraint set above
can be satisfied by {(s1, 10XXXXXX), (s2, 00111XXX)}.
On the other hand, {(s1, 00XXXXXX), (s2, 00111XXX)}
is an incorrect encoding, as the code 00111000 selects both
instructions. Codes such as 00XXXXXX and 00111XXX
are said to be conflicting. The implementation details for satisfying
the composition constraints and finding an initial encoding
prior to the heuristic optimisation are described in Algo-
rithm 1.
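The conflict in the example above can be detected position by position: two codes conflict when, in every bit, they agree or at least one of them leaves the bit unused (X), so that a single input word (here 00111000) selects both. A minimal sketch with hypothetical helper names; treating ? like X is an assumption that allows checking constraints before their don't care bits are assigned.

```python
def compatible(a, b):
    """Two bits can both be matched by one input value ('X' and '?' accept either)."""
    return a == b or a in "X?" or b in "X?"

def conflicting(code1, code2):
    """True if some fully binary input word would select both codes."""
    return len(code1) == len(code2) and all(
        compatible(a, b) for a, b in zip(code1, code2))

print(conflicting("00XXXXXX", "00111XXX"))  # True: 00111000 selects both
print(conflicting("10XXXXXX", "00111XXX"))  # False: they differ in b1
```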
Algorithm 1: Algorithm for satisfying the given compo-
sition constraints and finding the initial encoding.
1 Function findInitialEncoding(B,G);
Input : Boolean variables B, a set of |S| constraints
G.
Output : Encoding enc or error.
Parameter: MAX (= 10) number of possible iterations.
2 C ← (0^|B|, · · · , 1^|B|) ; // universe of codes
3 enc ← (−_1, · · · , −_|S|) ; // empty encoding
4 foreach i such that ? ∉ G[i] do
5 if G[i] ∉ C then return error;
6 enc[i] ← G[i] ; // fully constrained si
7 C ← C \ G[i];
8 foreach i such that (? ∈ G[i]) ∧ (G[i] ≠ ?^|B|) do
9 iteration ← 0;
10 do
11 code ← randomAssignment(G[i])
12 while (code ∉ C ∧ iteration++ < MAX);
13 if code ∉ C then return error;
14 enc[i] ← code ; // partially constr. si
15 C ← C \ code;
16 if |C| < |− ∈ enc| then return error;
17 if (G[0] = ?^|B|) ∧ (0^|B| ∈ C) then
18 enc[0] ← 0^|B| ; // avoid symmetric encs.
19 C ← C \ enc[0];
20 foreach i such that (G[i] = ?^|B|) ∧ (enc[i] = −) do
21 enc[i]← pickRandom(C) ;
// unconstrained si
22 C ← C \ enc[i];
23 return enc;
The function findInitialEncoding takes as input
the Boolean variables B for encoding and the set of
composition constraints G. The latter is an array of
size |S| whose indexes represent the scenarios and
whose elements represent the constraints. As a running
example, we consider the constraints on 6 scenarios
{(s0, ???), (s1, ??X), (s2, ???), (s3, 110), (s4, ???), (s5, ???)}
over the 3 variables B = {b1, b2, b3}, which are represented by
the array G = (???, ??X, ???, 110, ???, ???).
Initially, the universe of codes C is initialised with 2^|B|
codes (line 2), and the encoding enc with |G| no-code
symbols (−) as the encoding is initially empty (line 3). The
array enc represents the initial encoding, its indexes represent
the scenarios and its elements represent the codes. In the
example, C and enc are:
C = (000, 001, 010, 011, 100, 101, 110, 111)
enc = (−,−,−,−,−,−)
In lines 4-7, the codes paired with fully constrained scenarios
(i.e. ? ∉ G[i]) are checked to be contained in the universe
of codes C (line 5). Subsequently, these scenarios are encoded
by the provided codes (line 6). The latter are removed from C
(line 7), as they cannot be used for encoding other scenarios.
C and enc become:
C = (000, 001, 010, 011, 100, 101, 111)
enc = (−,−,−, 110,−,−)
In lines 8-15, partially constrained scenarios (i.e. ? ∈ G[i] ∧
G[i] ≠ ?^|B|) are encoded: the function randomAssignment(G[i])
randomly turns their don’t care bits (? ∈ G[i]) into binary
values {0, 1} (line 11). The resulting codes encode the
scenarios si (line 14), and are subsequently removed from C
(line 15). In the example, the constraint ??X is turned to 01X,
which removes {010, 011}
from C. C and enc become:
C = (000, 001, 100, 101, 111)
enc = (−, 01X,−, 110,−,−)
The function randomAssignment can introduce conflicting
codes. In the example, the constraint G[1] = ??X cannot be
turned to 11X, as the latter conflicts with the code 110 already
used for encoding s3 (11X ∉ C). Lines 10-12 can be repeated up
to MAX = 10 times to increase the probability of satisfying all
constraints. If the constraint is still not satisfied, an error is
returned (line 13).
In line 16, an error is returned if the number of codes left
for encoding (|C|) is less than the number of scenarios that
still need to be encoded (|− ∈ enc|). In this case, more codes,
and hence more Boolean variables B, are required for encoding
the given S under the constraints G.
In lines 17-19, the first scenario s0 is encoded by the zero
code if it is unconstrained (G[0] = ?^|B|) and if the zero code
has not been used (0^|B| ∈ C). This is necessary for avoiding
symmetric encodings. In the example, C and enc become:
C = (001, 100, 101, 111)
enc = (000, 01X,−, 110,−,−)
In lines 20-22, the remaining unconstrained scenarios
(G[i] = ?^|B|) are encoded randomly. Codes left are extracted
from C, used to encode scenarios si (line 21), and subsequently
removed from C (line 22). In the example, C and enc become:
C = (101)
enc = (000, 01X, 100, 110, 001, 111)
The output of Algorithm 1 is the encoding enc, which
satisfies G and can be optimised via the heuristics that we
will describe shortly.
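For readers who prefer executable code, the following Python sketch mirrors Algorithm 1 (pseudocode line numbers in the comments). It is an illustration under stated assumptions, not the SCENCO implementation: codes are strings over {0, 1, X}, a code containing X stands for all binary codes it covers, and membership in C is checked for every covered code. The last loop only visits scenarios that are still unencoded, as in the worked example.

```python
import random
from itertools import product

def expand(pattern):
    """Binary codes covered by a code whose 'X' (don't use) bits take any value."""
    choices = [("0", "1") if bit == "X" else (bit,) for bit in pattern]
    return {"".join(bits) for bits in product(*choices)}

def find_initial_encoding(num_vars, G, max_tries=10, rng=random):
    """Sketch of Algorithm 1: returns a list of codes, or raises ValueError."""
    C = {"".join(bits) for bits in product("01", repeat=num_vars)}  # line 2
    enc = [None] * len(G)                                           # line 3
    free = "?" * num_vars
    for i, g in enumerate(G):                # lines 4-7: fully constrained
        if "?" not in g:
            if not expand(g) <= C:
                raise ValueError(f"code {g} is not available")
            enc[i] = g
            C -= expand(g)
    for i, g in enumerate(G):                # lines 8-15: partially constrained
        if "?" in g and g != free:
            for _ in range(max_tries):
                code = "".join(rng.choice("01") if bit == "?" else bit for bit in g)
                if expand(code) <= C:
                    break
            else:
                raise ValueError(f"cannot satisfy constraint {g}")
            enc[i] = code
            C -= expand(code)
    if len(C) < enc.count(None):             # line 16
        raise ValueError("not enough codes: more Boolean variables are needed")
    zero = "0" * num_vars
    if G[0] == free and zero in C:           # lines 17-19: symmetry breaking
        enc[0] = zero
        C.discard(zero)
    for i in range(len(G)):                  # lines 20-22: unconstrained
        if enc[i] is None:
            enc[i] = rng.choice(sorted(C))
            C.discard(enc[i])
    return enc

G = ["???", "??X", "???", "110", "???", "???"]
print(find_initial_encoding(3, G, rng=random.Random(1)))
```

On the running example, s3 always keeps 110 and ??X is never turned into 11X; the remaining codes depend on the random seed.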
C. Heuristic cost function
The main idea of the heuristics is to encode similar
scenarios by similar codes. Similarities between codes are
determined using the classic Hamming distance metric [25].
Similarities between scenarios, on the other hand, are de-
termined by referring to their partial order representation.
Consider two scenarios s1 = (O1,≺1) and s2 = (O2,≺2).
The distance between s1 and s2 is computed following the
two rules below:
1) An operation o ∈ O1 counts as a difference if o /∈ O2.
2) A dependency (os ≺ ot) ∈ ≺1 counts as a difference if
(os ≺ ot) ∉ ≺2, and if it connects two operations which
are both present in the operation sets of the two scenarios:
(os ∈ O1 ∧ os ∈ O2) ∧ (ot ∈ O1 ∧ ot ∈ O2).
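A minimal sketch of this distance follows, assuming both rules are applied in both directions (from s1 to s2 and from s2 to s1), which makes the distance symmetric. Scenarios are (operations, precedence) pairs with transitively closed precedence; the helper names layered and scenario_distance are hypothetical.

```python
def layered(*stages):
    """Build (operations, precedence) for a chain of stages of concurrent operations."""
    ops = {op for stage in stages for op in stage}
    prec = {(a, b) for i, stage in enumerate(stages)
            for later in stages[i + 1:] for a in stage for b in later}
    return ops, prec

def scenario_distance(s1, s2):
    """Differences between two scenarios, counting rules 1 and 2 in both directions."""
    (o1, p1), (o2, p2) = s1, s2
    common = o1 & o2
    # Rule 1: operations that appear in only one of the two scenarios.
    op_diff = len(o1 ^ o2)
    # Rule 2: dependencies between shared operations present in only one scenario.
    dep_diff = sum(1 for (a, b) in p1 ^ p2 if a in common and b in common)
    return op_diff + dep_diff

s1 = layered(["fetch"], ["decode"], ["loadA", "loadB"], ["ALU"], ["saveMEM"])
s2 = layered(["fetch"], ["decode"], ["loadA"], ["ALU"], ["savePC"])
print(scenario_distance(s1, s2))  # 3: loadB, saveMEM, savePC
```

For the two instructions of Fig. 1a, the shared operations appear in the same order, so only the three unshared operations contribute.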
Distances between pairs of scenarios are elements of
the Scenario Distance matrix SD. Distances between pairs
of codes are elements of the Code Distance matrix CD.
Both SD and CD have size |S| × |S|, where |S| is the
size of the scenario specification. Elements SDij and
CDij represent the number of differences between the ith
and jth scenarios and codes, respectively, in an encoding
e = {(s1, c1), · · · , (s|S|, c|S|)}. These matrices are used to
evaluate encodings heuristically via the following cost function:
$$ \mathcal{F}(S, e) = \sum_{1 \le i < j \le |S|} \left( SD_{ij} - CD_{ij} \right)^2 \qquad (1) $$
Intuitively, minimising F means encoding similar scenarios
with similar codes. We evaluated the cost function F empiri-
cally, by analysing several scenario specifications.
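Equation (1) takes only a few lines to evaluate. In this sketch the helper names are hypothetical, codes are bit strings, CD is computed on the fly with the Hamming distance, and the SD matrix values are made up for illustration rather than taken from a benchmark.

```python
from itertools import combinations

def hamming(c1, c2):
    """Code distance: number of positions in which two codes differ."""
    return sum(a != b for a, b in zip(c1, c2))

def cost(SD, codes):
    """F(S, e) of eq. (1): sum over pairs i < j of (SD[i][j] - CD[i][j])^2."""
    return sum((SD[i][j] - hamming(codes[i], codes[j])) ** 2
               for i, j in combinations(range(len(codes)), 2))

SD = [[0, 1, 2],   # illustrative scenario distances, not taken from a benchmark
      [1, 0, 1],
      [2, 1, 0]]
print(cost(SD, ["00", "01", "11"]))  # 0: code distances match scenario distances
print(cost(SD, ["00", "11", "01"]))  # 2
```

The first encoding gives the most similar scenarios the closest codes, which is exactly what minimising F rewards.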
As an example, Figure 5a shows the analysis of a subset
of 8 scenarios of the Intel 8051 [7] scenario specification,
where the universe of encodings E is fully inspected, and
5040 controllers are synthesised with a 90 nm technology
library [9]. The size of the controllers is plotted against the
heuristic value F of the corresponding encodings. Figure 5b,
in turn, shows the analysis of the scenario specification of
the Arm Cortex M0+ [11], composed of 11 scenarios. In this
figure, 10² controllers produced by the proposed algorithm,
described in Section IV-D, are compared to 10⁵ controllers
produced by encoding scenarios randomly.
The two figures highlight the existence of a correlation
between the controller area and F , and suggest the following
two claims:
[Fig. 5 plots: controller area [µm²] versus cost function value F.]
(a) All controllers (|E| = 5040) plotted with respect to the cost function F, with the low-F region marked as promising candidates (Intel 8051 [7], subset of 8 scenarios).
(b) 10² heuristic and 10⁵ random controllers are compared, and plotted with respect to F; series: random search and proposed approach, with a variability span marked (Arm Cortex M0+ [11], 11 scenarios).
Fig. 5: The presented cost function is studied over two benchmarks.
• the likelihood of synthesising efficient implementations is
higher where F is lower, see Promising candidates in
Fig. 5a;
• the likelihood of synthesising efficient implementations
increases with the number of encodings inspected, due
to the inaccuracy of the heuristics, see the Variability span in
Fig. 5b.
D. The heuristic encoding algorithm
The presented heuristic algorithm is based on the cost
function F and on the following implementation of
simulated annealing (SA) [26], a heuristic method for
optimisation problems in which a function must be
minimised over a large search space. The pseudo-code
is shown in Algorithm 2.
Algorithm 2: The presented heuristic encoding algorithm.
1  Function heuristicEncoding(B, S, G)
     Input:      Boolean variables B, scenarios S and their constraints G.
     Output:     Heuristic encoding enc.
     Parameters: t0 = 10, a = 0.996, te = 0.1
2    enc ← findInitialEncoding(B, G);
3    encbest ← enc;
4    C ← {0^|B|, ..., 1^|B|} \ {enc[0]};
5    while (t0 > te) do
6      encnext ← enc;
7      i ← pickRandomScenario(1 ≤ i < |S|);
8      code ← pickRandomCode(C);
9      if ∃ j such that code ∈ encnext[j] then
10       encnext[i] ↔ encnext[j];
11     else
12       encnext[i] ← code;
13     if ∀ i satisfy(encnext[i], G[i]) then
14       if F(S, encnext) < F(S, encbest) then
15         encbest ← encnext;
16       d ← F(S, encnext) − F(S, enc);
17       v ← pickRandomValue(0 ≤ v < 1);
18       if v < e^(−d/t0) then
19         enc ← encnext;
20     t0 ← t0 × a;  // cool down t0 by a
21   return encbest;
22 Function satisfy(c, g)
     Input:  A code c, and a constraint g.
     Output: Boolean values True or False.
23   if |c| ≠ |g| then return False;
24   foreach 1 ≤ i ≤ |g| do
25     if (c[i] = 1) ∧ (g[i] = 0) then return False;
26     if (c[i] = 0) ∧ (g[i] = 1) then return False;
27     if (c[i] ≠ X) ∧ (g[i] = X) then return False;
28   return True;
The inputs of the function heuristicEncoding are the Boolean variables B, the scenarios S and the constraints G. We continue the running example used for Algorithm 1, where the constraints G = {???, ??X, ???, 110, ???, ???} were turned into enc = {000, 01X, 100, 110, 001, 111} by the function findInitialEncoding (line 2). The encoding is also copied into encbest (line 3), which holds the best encoding found during the SA search.
Simulated annealing parameters were calibrated experimentally. The initial temperature is t0 = 10, the cooling factor is a = 0.996, and the ending temperature is te = 0.1. These parameters can be modified to increase or decrease the number of iterations of the SA optimisation.
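Because the temperature is multiplied by a once per iteration (line 20) until it no longer exceeds te (line 5), the parameters fix the number of iterations n. The sketch below, with illustrative function names, computes n both by simulating the cooling schedule and in closed form, n = ⌈ln(te/t0) / ln a⌉.

```python
import math


def sa_iterations(t0: float = 10.0, a: float = 0.996, te: float = 0.1) -> int:
    """Count iterations of the loop `while t0 > te: ...; t0 *= a`."""
    count = 0
    t = t0
    while t > te:
        t *= a  # exponential multiplicative cooling (line 20)
        count += 1
    return count


def sa_iterations_closed_form(t0: float = 10.0, a: float = 0.996, te: float = 0.1) -> int:
    # Smallest k such that t0 * a**k <= te, i.e. k >= ln(te / t0) / ln(a).
    return math.ceil(math.log(te / t0) / math.log(a))


print(sa_iterations(), sa_iterations_closed_form())  # 1149 1149
```

With the default parameters the loop therefore runs 1149 times; moving a closer to 1 (or lowering te) lengthens the search.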
Line 4 initialises the universe of codes C. The code of the first scenario, enc[0], is removed to avoid symmetric encodings.
Lines 5-20 minimise the heuristic value F(S, enc) of the initial encoding by repeatedly applying random code swaps, until the temperature t0 drops to the ending temperature te (line 5). Line 6 copies the current encoding enc into the next encoding encnext. Lines 7-8 select a random scenario si in enc (1 ≤ i < |S|) and a random code cj in C, respectively; these are used to swap codes in encnext (lines 9-12). In the example, if i = 4 and j = 7, the code 001 of scenario s4 is swapped with the code c7 = 111, which identifies s5 in enc. enc and encnext become:
enc = (000, 01X, 100, 110, 001, 111)
encnext = (000, 01X, 100, 110, 111, 001)
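The move of lines 6-12 can be sketched as follows. Codes are modelled as strings over {0, 1, X} and scenarios are indexed from 0, as in the example above; the test at line 9 is assumed to be an exact match between the picked code and an existing code. The name `swap_move` is illustrative.

```python
def swap_move(enc: list[str], i: int, code: str) -> list[str]:
    """Lines 6-12 of Algorithm 2: return a neighbour of `enc`.

    If `code` already encodes some scenario j, the codes of scenarios i and j
    are exchanged; otherwise scenario i simply takes the unused `code`.
    """
    enc_next = list(enc)  # line 6: copy, so the current encoding is untouched
    if code in enc_next:  # line 9 (exact match assumed)
        j = enc_next.index(code)
        enc_next[i], enc_next[j] = enc_next[j], enc_next[i]  # line 10
    else:
        enc_next[i] = code  # line 12
    return enc_next


enc = ["000", "01X", "100", "110", "001", "111"]
print(swap_move(enc, 4, "111"))  # ['000', '01X', '100', '110', '111', '001']
```

Copying before modifying keeps enc available for the acceptance test of lines 16-19, and a code already in use is exchanged rather than duplicated.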
The next encoding encnext is only considered if it satisfies the composition constraints G (line 13). The function satisfy (lines 22-28) checks that the bit size of the code matches that of the constraint (line 23), and that the bits constrained by {0, 1, X} hold these values in the final code (lines 24-27). Bits constrained by ? do not need to be checked, as both logic values {0, 1} satisfy such constraints.
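A direct transcription of satisfy (lines 22-28) is given below as a sketch, with codes written as strings over {0, 1, X} and constraints as strings over {0, 1, ?, X}; this string representation is an assumption made for illustration.

```python
def satisfy(code: str, constraint: str) -> bool:
    """Function satisfy of Algorithm 2 (lines 22-28).

    Positions constrained by '?' accept any value and are not checked.
    """
    if len(code) != len(constraint):  # line 23
        return False
    for c, g in zip(code, constraint):  # line 24
        if c == "1" and g == "0":  # line 25
            return False
        if c == "0" and g == "1":  # line 26
            return False
        if c != "X" and g == "X":  # line 27
            return False
    return True  # line 28


G = ["???", "??X", "???", "110", "???", "???"]
enc_next = ["000", "01X", "100", "110", "111", "001"]
print(all(satisfy(c, g) for c, g in zip(enc_next, G)))  # True
```

The swapped encoding of the running example therefore passes the check of line 13.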
In lines 14-15, encnext replaces the best encoding encbest found during the SA optimisation if the former has a lower heuristic value than the latter, i.e. F(S, encnext) < F(S, encbest). In lines 16-19, encnext also replaces enc either if the former has a heuristic value not higher than the latter (d ≤ 0, in which case e^(−d/t0) ≥ 1 > v for every 0 ≤ v < 1), or, if d > 0, when the random value v is lower than e^(−d/t0). In the second case, a worse encoding (with a higher F) replaces enc, which allows the search to escape local minima.
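The acceptance test of lines 16-19 is the classic Metropolis criterion of simulated annealing. A minimal sketch follows; the function name and the explicit random generator argument are illustrative choices, not part of the original pseudo-code.

```python
import math
import random


def accept(d: float, t: float, rng: random.Random) -> bool:
    """Acceptance test of lines 16-19 (Metropolis criterion).

    d = F(S, enc_next) - F(S, enc) and t is the current temperature t0.
    Improvements (d <= 0) are always accepted, since exp(-d/t) >= 1 > v.
    """
    if d <= 0:
        return True  # also avoids overflow in exp() for large improvements
    v = rng.random()  # line 17: 0 <= v < 1
    return v < math.exp(-d / t)  # line 18
```

As t0 cools, worse encodings become rarer: for d = 1 the acceptance probability is e^(−0.1) ≈ 0.90 at t0 = 10 but e^(−10) ≈ 4.5 × 10^(−5) at t0 = 0.1.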
Because of this randomness, heuristicEncoding may return a different encbest at every execution. The solution space is connected, as every e ∈ E is reachable through a sequence of swap moves.
The current implementation of the algorithm runs in a single thread of execution. However, multiple instances of the heuristicEncoding function can be run on multiple threads, so that several encodings are produced concurrently. The parallelisation of the presented algorithm is left as future research.
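Selecting the best of #e independent runs, possibly executed concurrently, can be sketched as follows. `run(seed)` is a hypothetical stand-in for one heuristicEncoding call followed by synthesis, returning a (cost, encoding) pair; it is not a SCENCO API. In CPython a thread pool only gives real parallelism if `run` releases the global interpreter lock (for instance while waiting on an external synthesis tool); a process pool is the usual alternative for pure-Python work.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence, Tuple

Result = Tuple[float, Sequence[str]]  # (cost, encoding), e.g. cost = controller area


def best_of_runs(run: Callable[[int], Result], seeds: Sequence[int], workers: int = 4) -> Result:
    """Execute independent randomised runs concurrently and keep the cheapest one.

    Each seed drives a different random trajectory, hence a different encbest.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run, seeds))  # results keep the order of `seeds`
    return min(results, key=lambda result: result[0])  # ties: first seed wins


def toy_run(seed: int) -> Result:
    # Hypothetical toy cost: any deterministic function of the seed would do.
    return float((seed * 7) % 5), [format(seed, "03b")]


print(best_of_runs(toy_run, range(10)))  # (0.0, ['000'])
```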
E. Correctness
An encoding enc, constrained by composition constraints G, is said to be correct if:
1) enc satisfies G, i.e. for all 1 ≤ i ≤ |S|, every code enc[i] can be derived from G[i] by substituting every ? by either 0 or 1;
2) enc does not contain conflicting codes, i.e. codes that do not identify scenarios uniquely (see Section IV-B).
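Condition 2) can be checked pairwise. The precise notion of conflict is defined in Section IV-B; the sketch below assumes that X is a don't-care bit, so two codes conflict when every bit position either matches or is X in at least one of them (they share a fully specified code). Function names are illustrative.

```python
from itertools import combinations


def codes_overlap(a: str, b: str) -> bool:
    """True if ternary codes a and b (over {'0','1','X'}) share a Boolean code.

    Assumption: 'X' is a don't-care bit.
    """
    return len(a) == len(b) and all(x == y or "X" in (x, y) for x, y in zip(a, b))


def has_conflicts(enc: list[str]) -> bool:
    """Correctness condition 2): some pair of scenarios is not uniquely identified."""
    return any(codes_overlap(a, b) for a, b in combinations(enc, 2))


print(has_conflicts(["000", "01X", "100", "110", "001", "111"]))  # False
print(has_conflicts(["000", "01X", "010"]))  # True: 01X covers 010
```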
Whenever Algorithm 1 terminates, it returns a correct encoding enc by construction. Indeed, enc is built by selecting codes from the universe C, which only contains valid codes of |B| bits, the number of bits selected for encoding (line 2). Fully and partially constrained scenarios are always encoded by codes derived from their constraints, see lines 6 and 9-14, respectively. Thus the resulting enc always satisfies the constraints G. Also, overlapping codes cannot be introduced in the final encoding enc: a code is always removed from C when it is used to encode a scenario, and thus cannot be reused to encode a different scenario, see lines 7, 15, 19 and 22.
On the other hand, Algorithm 1 reports an error if the constraints G cannot be met, for any of the following reasons:
• The user introduces overlapping constraints, see line 5.
• In none of the MAX iterations is a partially constrained code completed into a code c that is still available for encoding: every attempted completion satisfies c ∉ C (see lines 9-13). An exact alternative would be an exhaustive search over all completions, which we avoid to limit the algorithmic complexity.
• The number of codes |C| is not sufficient to encode |G| scenarios with the given constraints G, see line 16.
With regard to Algorithm 2, the function heuristicEncoding takes the encoding enc returned by findInitialEncoding and advances to encbest through a sequence of swap moves that inspect many intermediate encodings encnext. Given a correct enc (see line 2), each intermediate encoding encnext derived by a swap (see lines 7-12) contains no conflicting codes (i.e. a code swap does not introduce encoding conflicts). Intermediate encodings can replace encbest only if they satisfy the constraints G (lines 13 and 22-28). Consequently, encbest is also correct by construction. The two algorithms always terminate, as they contain no unbounded loops.
F. Time complexity analysis
The function findInitialEncoding consists of a sequence of three loops. The first one (lines 4-7) encodes fully constrained scenarios by moving their codes into the encoding enc. Its complexity only depends on the number of fully constrained codes: O(|S|). The second loop (lines 8-15) encodes partially constrained scenarios by looping over the |B| bits of each constraint, in order to turn every ? into 0 or 1. Thus, its complexity is O(|S| · |B|). The third loop (lines 20-22) uses the function pickRandom (O(1)) to extract codes left in the code universe C and encode the unconstrained scenarios. Its complexity depends on the number of unconstrained scenarios: O(|S|). Consequently, the complexity of Algorithm 1 (A1) is dominated by the second loop. In this paper, we assume that |B| = ⌈log2 |S|⌉, hence:
$$O(A_1) = O(|S|) + O(|S| \cdot |B|) + O(|S|) = O(|S| \cdot \log |S|)$$
On the other hand, the function heuristicEncoding, excluding the call to findInitialEncoding in line 2, consists of a loop that implements the exponential multiplicative cooling strategy of simulated annealing (SA) [26], i.e. the initial temperature t0 is multiplied by a constant factor a at each iteration, until the ending temperature te is reached. This yields a fixed number of iterations n, which can be tuned by modifying these parameters. At each SA iteration, the most computationally expensive statements are in lines 13 and 22-28, where encnext is checked against the constraints G, and in lines 14 and 16, where the function F has to be computed. The former has a complexity of O(|S| · ⌈log2 |S|⌉), as the encoding has to be checked for every bit of each constraint. The latter has a complexity of O(|S|²), see Formula 1. Consequently, Algorithm 2 (A2) has the following time complexity:
$$O(A_2) = n \cdot \left[ O\left(|S| \cdot \lceil \log_2 |S| \rceil\right) + O\left(|S|^2\right) \right] = O\left(n \cdot |S|^2\right)$$
This analysis disregards the implementation details of the further functions (e.g. pickRandomScenario) that the proposed algorithm relies on. However, these additional functions do not increase the above time complexity, provided they are implemented efficiently.
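To give a sense of scale (these are counts of elementary steps up to constant factors, not runtime measurements), the two bounds can be evaluated for the largest benchmark of Section VI-D, Documentflow (|S| = 651), assuming the default SA parameters of Algorithm 2:

```latex
\begin{aligned}
|B| &= \lceil \log_2 651 \rceil = 10, \qquad
n = \left\lceil \frac{\ln(t_e / t_0)}{\ln a} \right\rceil
  = \left\lceil \frac{\ln 0.01}{\ln 0.996} \right\rceil = 1149,\\
A_1:\quad |S| \cdot |B| &= 651 \cdot 10 = 6510,\\
A_2:\quad n \cdot |S|^2 &= 1149 \cdot 651^2 = 1149 \cdot 423801 \approx 4.87 \times 10^{8}.
\end{aligned}
```

According to these bounds, the repeated evaluation of F inside the SA loop, rather than the initial encoding, governs the cost of the heuristic on large specifications.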
V. DESIGN AUTOMATION
The design methodology described in Section II-B is imple-
mented in the EDA tool SCENCO [9], which stands for SCE-
Nario ENCOder. It features the following scenario encoding
algorithms.
1) Exhaustive search fully explores the universe of encod-
ings E .
2) SAT-based encoding and Single-literal encoding are
described in Section III.
3) Heuristic encoding is the proposed algorithm (Sec-
tion IV).
4) Random search encodes scenarios randomly.
5) Sequential encoding assigns codes sequentially,
i.e. {(s1, 000), (s2, 001), (s3, 010), ...}.
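The two baseline encodings are simple enough to sketch. The code below assumes codes of minimum length ⌈log2 |S|⌉ (at least one bit) and is an illustration of the idea, not the SCENCO implementation; the function names are hypothetical.

```python
import random


def num_bits(num_scenarios: int) -> int:
    """Minimum code length ceil(log2 |S|), with at least one bit."""
    return max(1, (num_scenarios - 1).bit_length())


def sequential_encoding(num_scenarios: int) -> list[str]:
    """Sequential encoding: the k-th scenario gets the binary code of k (s1 -> 0...0)."""
    b = num_bits(num_scenarios)
    return [format(k, f"0{b}b") for k in range(num_scenarios)]


def random_encoding(num_scenarios: int, rng: random.Random) -> list[str]:
    """One random-search candidate: distinct codes drawn uniformly at random."""
    b = num_bits(num_scenarios)
    return [format(k, f"0{b}b") for k in rng.sample(range(2 ** b), num_scenarios)]


print(sequential_encoding(5))  # ['000', '001', '010', '011', '100']
```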
SCENCO relies on Espresso [27] for Boolean minimisation and on Abc [28] for technology mapping; the gate library is specified in the GenLib format [29]. Abc is also used to produce the synthesised controllers as Verilog files. SCENCO also uses the Clasp [23] and MiniSAT [24] SAT solvers to support the SAT-based encoding.
The SCENCO graphical user interface is described in [30]; Figure 6 shows and describes an example of the applied design methodology in WORKCRAFT.
A. Interface synthesis
The synthesis of the interface between the controller and the datapath has been automated in the EDA tool [9], relying on the ideas elaborated in [31] and summarised below.
The controller can be interfaced either with asynchronous datapath modules, relying on the request/acknowledge handshake, or with synchronous modules using matched delays [32], which produce acknowledgement signals after a chosen delay. In turn, since the controller resets request signals only at the end of each scenario execution, decouple and merge elements [31] are needed to release datapath modules immediately after they acknowledge their completion. Merge elements are also used when a module is executed multiple times within a scenario; see the schematic of the interface in Figure 7.
The developed tool [9] takes as input the datapath modules
in the form of Verilog, and interfaces them to the synthesised
controller automatically. The produced Verilog file contains
the final system implementation, see Figure 3.
Fig. 6: The methodology described in Section II-B is implemented in WORKCRAFT. Scenario specification: three scenarios
(LOAD, ADD and PUSH) are introduced in the form of text (Tools controls window), and are parsed and converted into partial
orders (Instruction set generated [CPOG] window). Scenarios can also be entered or edited graphically. Scenario encoding:
scenarios are encoded using the chosen algorithm (Encoding menu). The encoding {(LOAD, 01), (ADD, 10), (PUSH, 11)} is
found in the example. Composition: The CPOG is generated automatically (Conditional partial order graph [CPOG] window).
Hardware synthesis: the hardware controller is synthesised from the CPOG (Synthesised controller [circuit] window). The
controller interface is highlighted in the window Controller interface [circuit]. The hardware controller can be simulated and
formally verified using other tools available in the WORKCRAFT framework. For example, the window Simulation of the PUSH
instruction [DTD] shows a simulation of the PUSH scenario of the controller.
[Schematic labels: controller, datapath, async and sync (clock) modules, decouple, merge, matched delay, bus, req/ack signals.]
Fig. 7: Interface between controller and datapath.
VI. ALGORITHM AND TOOL VALIDATION
We validate the presented algorithm and tool over a set
of benchmarks coming from three domains: ad-hoc con-
trollers, processor instruction sets and process mining in Sec-
tions VI-B, VI-C and VI-D, respectively. Experimental results
are compared with existing scenario composition approaches
on all benchmarks, and also with the behavioural synthesis
methodologies based on STGs and FSMs in the processor
benchmarks. All used benchmarks can be found online (see
benchmarks folder in [9]), and can be displayed and run in
WORKCRAFT [10].
A. Configuration and notation
We ran our experiments on an Intel i7-3610QM 2.30 GHz CPU with 8 GB of DDR 1600 MHz RAM. Benchmarks are:
1) Specified in the form of partial orders in
WORKCRAFT [10], and synthesised by the presented tool
SCENCO [9].
2) Specified as STGs in WORKCRAFT, and synthesised by
Petrify [19] and MPSat [20].
3) Specified as synchronous FSMs in VHDL, and synthesised
by Synopsys Design Compiler [21].
The same 90 nm gate library is used for technology mapping.
For presenting benchmark results, we use the following notation. #e denotes the number of encodings generated and synthesised by the proposed algorithm; the smallest of the resulting controllers is reported. Area (|B|) denotes the area [µm²] of the resulting controller, with the number of bits used for encoding in brackets. RT denotes the tool runtime [s], i.e. the time from parsing the specification to obtaining the final implementation. We only consider results produced within 1 hour; runs exceeding this limit are denoted as timeout (TO). Finally, we use the dash character ‘−’ when a behavioural synthesis approach cannot be applied to a benchmark due to a technical issue; see the textual description for an explanation.
TABLE II: Comparison of CPOG scenario encoding algorithms over the ad-hoc controller benchmarks. Units of measure: Area (|B|) = [µm²] (number of bits).
Model       |S|  Exhaust.  Single-lit.  SAT-based  Proposed
                                                   #e = 10
Buck contr.   4  266 (2)   261 (3)      266 (2)    266 (2)
Rec. pipe.   13  TO        357 (12)     TO         481 (4)
Controllers derived from FSMs and STGs include sequential components (registers and C-elements, respectively) for holding system states. In the results, we only consider the combinational part of the controllers, so as not to penalise them in the comparison.
B. Ad-hoc controllers
The first set of benchmarks includes an on-chip power
management controller of a buck converter [33], and an
asynchronous controller for the reconfigurable pipeline of a
dataflow processor [34].
The power management controller is required to regulate
the activation of the PMOS (gp) and NMOS (gn) transistors
in response to three signals coming from sensors within the
power regulator: over-current (oc), under-voltage (uv) and
zero-crossing (zc). The two transistors must never be on at the
same time to avoid a short circuit. Two of the four scenarios
that compose the power management controller are shown in
Figure 8a and described below.
Over-current scenario: when the oc condition is detected
(event oc+), the PMOS transistor must be switched off
(event gp-). Afterwards, the NMOS transistor must be
switched on (event gn+).
Zero-crossing followed by under-voltage scenario: If zc is
detected before uv, the NMOS transistor must be switched
off (event gn-). The two transistors must stay off until
the arrival of the uv condition. Afterwards, the PMOS
transistor must be switched on (event gp+).
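Each scenario is a partial order of events. The sketch below encodes the two buck scenarios as maps from each event to its required predecessors and uses Python's `graphlib` to obtain one valid execution order; the event names follow the text, while the exact precedence edges (in particular zc+ before uv+) are an assumption. It illustrates the partial-order view only, not CPOG composition or synthesis.

```python
from graphlib import TopologicalSorter

# Each scenario maps an event to the set of events that must precede it.
OVER_CURRENT = {
    "gp-": {"oc+"},
    "gn+": {"gp-"},
}

ZC_THEN_UV = {
    "gn-": {"zc+"},
    "uv+": {"zc+"},  # assumption: zc is detected before uv
    "gp+": {"gn-", "uv+"},
}


def linearisation(scenario: dict[str, set[str]]) -> list[str]:
    """One total order of events consistent with the scenario's partial order."""
    return list(TopologicalSorter(scenario).static_order())


def precedes(order: list[str], a: str, b: str) -> bool:
    return order.index(a) < order.index(b)


oc = linearisation(OVER_CURRENT)
zu = linearisation(ZC_THEN_UV)
# Safety requirement: a transistor is switched on only after the other is off.
assert precedes(oc, "gp-", "gn+")
assert precedes(zu, "gn-", "gp+")
print(oc)  # ['oc+', 'gp-', 'gn+']
```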
The asynchronous dataflow processor contains a 16-stage reconfigurable pipeline for statistical analysis of data streams. Its controller manages the energy-quality trade-off of the result by controlling the number of active pipeline stages. It is an important case study, as it was fabricated as an ASIC and tested [34]. Three of the 13 scenarios of the reconfigurable pipeline are specified below in text form: scenario s1 activates 4 pipeline stages, s2 activates 5 stages, and so on up to scenario s13, which activates all 16 stages of the pipeline:
s1 = stage1 → stage2 → stage3 → stage4
s2 = stage1 → stage2 → stage3 → stage4 → stage5
...
s13 = stage1 → stage2 → · · · → stage16
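The regular structure of these scenarios (sk activates the first k + 3 stages in sequence) means the whole specification can be generated rather than written by hand. A sketch, using "->" for the precedence arrow; the concrete text syntax accepted by WORKCRAFT is not assumed here, and `pipeline_scenario` is an illustrative name.

```python
def pipeline_scenario(k: int) -> str:
    """Text form of scenario s_k: stages 1..k+3 activated in sequence (1 <= k <= 13)."""
    if not 1 <= k <= 13:
        raise ValueError("the specification has 13 scenarios")
    return " -> ".join(f"stage{n}" for n in range(1, k + 4))


scenarios = {f"s{k}": pipeline_scenario(k) for k in range(1, 14)}
print(scenarios["s1"])  # stage1 -> stage2 -> stage3 -> stage4
```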
Evaluation: Table II shows the results of applying the state-of-the-art encoding algorithms. The Single-literal encoding produces a buck controller that is 1.9% smaller than those of the other approaches, but uses one more variable than needed (|B| = 3). On 2 variables, the optimal controller is, by definition, generated by the Exhaustive search. The same controller is also obtained by the SAT-based algorithm and by the proposed Heuristic algorithm.
The reconfigurable pipeline controller is not produced within the considered timeout by the Exhaustive and SAT-based algorithms, due to the complexity of the corresponding scenario specification. The Single-literal controller is ≈25.8% smaller than the controller produced by the proposed encoding technique, but uses 3× more variables (12 instead of 4). In [34], we implemented the controller produced by the proposed encoding technique, as the final design was constrained by the pins of the external package.
The runtime of the tool for processing the above bench-
marks is always less than 1 s.
C. Processor instruction sets
The second set of benchmarks includes different subsets
of instructions of the ARM Cortex M0+ [11], Texas Instru-
ments MSP430 [8] and Intel 8051 [7], [35]. These processor
specifications were derived by analysing their corresponding
ISA reference manuals, and identifying classes of instructions
(scenarios) that share similar functionalities and addressing
modes.
Regarding the design of real processors, the above manual scenario extraction is not ideal for obtaining accurate specifications. However, recent research on specification languages for processor architectures (see Section II-B1) makes it possible to fully specify the behaviour of modern systems comprising hundreds of instructions, and to derive accurate specifications for synthesising real processors. In this context, the presented algorithm is important, as it scales well to hundreds of scenarios (as we show in Section VI-D), making the CPOG methodology suitable for the design of such modern systems.
The ARM Cortex M0+ scenario specification is fully described in [11]. This processor has an ISA comprising 68 instructions. The specification, composed of 11 scenarios and 6 datapath modules (scenario operations), models 61 of these instructions. As an example, two scenarios of the specification are shown in Figure 8b and described below.
Load (reg.) covers the LDR (reg.) instruction. The ALU op-
eration computes the memory address, the MAU loads
a value from the memory and stores it into a specified
register. The IFU fetches a new processor instruction.
Arit/Log (Imm.) covers arithmetical, logical and data transfer
instructions with immediate addressing, e.g. ADD (imm.),
LSR (imm.). An immediate value is fetched from the
instruction register (PCIU → IFU), and used as operand
for the selected operation (ALU). The result is stored into a
specified register. The ALU operation is executed concur-
rently with the program counter incrementation (PCIU/2).
The resulting PC is used for fetching a new instruction
(IFU/2).
[Fig. 8 panels: (a) Over-current and Zero-crossing followed by under-voltage; (b) Load (Reg.) and Arit/Log (Imm.); (c) ALU op. #123 to Rn and Cond. ALU op. #123 to Rn; (d) ARM specification (proposed encoding) and ARM specification (random search), with conditions over the variables b1-b4.]
Fig. 8: (a) shows 2 scenarios of the Buck controller specification [33]. (b) shows 2 scenarios of the ARM CORTEX M0+ [11].
(c) shows 2 scenarios of the TI MSP430 specification [8], the scenario Cond. ALU op. #123 to Rn is a conditional scenario in
the form of CPOG, due to the conditional operation ALU/2. (d) shows the CPOGs synthesised from full scenario specification
in [11], i.e. the CPOG derived by the proposed encoding is on the left-hand side, and the one derived by the Random search
is on the right-hand side.
The Texas Instruments (TI) MSP430 scenario specification
has been introduced in [8]. The specification composed of 8
scenarios and 7 datapath modules models the full instruction
set composed of 51 instructions. This benchmark is important
as some of its scenarios have conditional elements. As an
example, two of its scenarios are shown in Figure 8c and
described below:
ALU op. #123 to Rn executes an arithmetic operation be-
tween two general purpose registers {A,B}, and writes
the result back into one of them (ALU). Operations
PCIU → IFU fetch a new instruction concurrently.
Cond. ALU op. #123 to Rn executes the above arithmetic
operation between two general purpose registers {A,B}
conditionally (ALU/2), on the condition le = A < B
set by the ALU. The flag le is input to the synthesised
controller, and is used for managing the activation of
ALU/2.
The second scenario is said to be conditional, and can be described in the form of a CPOG. Conditional scenarios can be composed with other scenarios in the same way as regular scenarios, see [16] for further details.
The Intel 8051 specification supported the design of an
asynchronous version of this processor [35]. It comprises 37
scenarios and 17 datapath modules that model 255 processor
instructions. It is important for the validation of the CPOG methodology, as it contains 3× more scenarios and 2× more operations than the other processor benchmarks.
Evaluation: Table III shows the results of applying the state-of-the-art CPOG encoding algorithms to the described set of processor benchmarks. This set is also used to compare the proposed algorithm based on CPOGs with the methodologies based on FSMs and STGs, as it is the most diverse set, characterised by (1) specifications of different sizes (from 4 to 37 scenarios), (2) specifications with conditional scenarios (see TI MSP430), and (3) specifications comprising different numbers of datapath modules (from 6 in the ARM processor to 17 in the Intel 8051). For these reasons, it highlights the characteristics of all the behavioural synthesis approaches used.
The Exhaustive search produces the smallest instruction decoders using ⌈log2 |S|⌉ variables. In practice, it is applicable to specifications that contain up to 8 scenarios, as its runtime increases exponentially with the specification size.
The Single-literal encoding produces the smallest instruction decoders in most of the cases in which it does not exceed the time limit. However, the synthesised decoders might not be applicable to real processors, as the code size |B| is fixed by the algorithm rather than by the processor (op)code specifications.
TABLE III: The proposed algorithm is compared with existing CPOG composition techniques, and with the FSM and STG synthesis approaches, over 26 processor instruction set benchmarks. Bold results are the smallest controllers for each model. Units of measure: Area (|B|) = [µm²] (number of bits), Runtime (RT) = [s].
Model          |S|  Exhaustive        Single-literal    SAT-based         Proposed          Proposed          FSM (seq. encod.) STG (one-hot encod.)
                                                                          #e = 10           constr., #e = 10  dc_shell          Petrify / MPSat
                    Area (|B|)  RT    Area (|B|)  RT    Area (|B|)  RT    Area (|B|)  RT    Area (|B|)  RT    Area (|B|)  RT    Area (|B|)     RT
ARM Cortex M0+   4  162 (2)     1     177 (4)     1     162 (2)     1     162 (2)     2     167 (2)     1     193 (2)     5     265 / 200 (4)  51 / 93
                 5  179 (3)     162   202 (5)     1     201 (3)     1     182 (3)     2     192 (3)     2     227 (3)     7     243 / 242 (5)  47 / 295
                 6  201 (3)     524   198 (5)     1     225 (3)     3     201 (3)     2     241 (3)     2     303 (3)     7     − / 263 (6)    − / 270
                 7  209 (3)     1051  198 (5)     1     225 (3)     3     226 (3)     2     249 (3)     2     316 (3)     7     − / 319 (7)    − / 409
                 8  180 (3)     1005  165 (5)     2     224 (3)     2     224 (3)     2     230 (3)     2     345 (3)     7     − / 343 (8)    − / 981
                 9  TO          TO    220 (5)     1     269 (4)     1     224 (4)     2     235 (4)     2     457 (4)     6     − / 456 (9)    − / 2232
                10  TO          TO    218 (5)     1     232 (4)     1     241 (4)     2     252 (4)     2     467 (4)     10    − / TO         − / TO
                11  TO          TO    212 (5)     1     246 (4)     2     249 (4)     2     279 (4)     2     498 (4)     7     − / TO         − / TO
TI MSP430        4  154 (2)     1     177 (4)     1     −           −     154 (2)     2     171 (2)     1     288 (2)     5     292 / TO (4)   119 / TO
                 5  174 (3)     162   181 (6)     1     −           −     180 (3)     2     193 (3)     1     294 (3)     7     − / TO         − / TO
                 6  189 (3)     489   191 (7)     2     −           −     205 (3)     2     235 (3)     1     384 (3)     7     − / TO         − / TO
                 7  252 (4)     1059  223 (8)     1     −           −     276 (3)     2     293 (3)     2     376 (3)     7     − / TO         − / TO
                 8  299 (4)     1145  304 (8)     1     −           −     321 (3)     2     345 (3)     2     390 (3)     6     − / TO         − / TO
Intel 8051       4  175 (2)     1     166 (3)     1     175 (2)     1     175 (2)     2     175 (2)     1     240 (2)     7     − / TO         − / TO
                 5  175 (3)     165   170 (4)     1     189 (3)     1     178 (3)     2     193 (3)     2     335 (3)     7     − / TO         − / TO
                 6  214 (3)     521   196 (5)     1     235 (3)     1     226 (3)     2     224 (3)     1     377 (3)     7     − / TO         − / TO
                 7  234 (3)     1111  242 (6)     1     239 (3)     1     240 (3)     2     260 (3)     1     422 (3)     7     − / TO         − / TO
                 8  295 (3)     1145  267 (7)     1     TO          TO    302 (3)     2     312 (3)     2     498 (3)     7     − / TO         − / TO
                 9  TO          TO    286 (8)     1     303 (4)     13    322 (4)     2     347 (4)     2     519 (4)     10    − / TO         − / TO
                10  TO          TO    464 (9)     2     TO          TO    335 (4)     2     368 (4)     2     565 (4)     10    − / TO         − / TO
                15  TO          TO    TO          TO    TO          TO    679 (4)     2     736 (4)     2     943 (4)     12    − / TO         − / TO
                20  TO          TO    TO          TO    TO          TO    822 (5)     2     842 (5)     2     1187 (5)    15    − / TO         − / TO
                25  TO          TO    TO          TO    TO          TO    1147 (5)    3     1202 (5)    3     1519 (5)    17    − / TO         − / TO
                30  TO          TO    TO          TO    TO          TO    1376 (5)    4     1450 (5)    4     1879 (5)    20    − / TO         − / TO
                35  TO          TO    TO          TO    TO          TO    1690 (6)    4     1741 (6)    4     2159 (6)    21    − / TO         − / TO
                37  TO          TO    TO          TO    TO          TO    1879 (6)    4     2037 (6)    4     2336 (6)    20    − / TO         − / TO
The SAT-based encoding produces decoders with an average overhead of ≈7.4% in comparison to Exhaustive decoders. The current implementation does not support scenarios in the form of CPOGs (see the missing results − in the TI MSP430 rows). The runtime of the SAT-based and Single-literal approaches increases exponentially (exceeding the timeout) as |S| grows.
On average, the Proposed encoding produces implementations with an area overhead of ≈4.5% in comparison to optimal solutions. It scales to a higher number of scenarios (see the Intel 8051 results), and supports scenarios in the form of CPOGs (see the TI MSP430 results). The runtime is always within the timeout. As an example, Figure 8d shows the ARM system specifications obtained by composing its 11 constituent scenarios via the proposed encoding (left-hand side) and via the random search (right-hand side). The ‘proposed’ CPOG contains shorter conditions φ.
We also ran the Proposed encoding after randomly constraining ⌈|S|/2⌉ scenarios of every processor specification using {0, 1, ?, X}. The resulting decoders always satisfy the given composition constraints, and have an average overhead of ≈12.4% in comparison to optimal implementations.
Finally, we used the behavioural synthesis approaches based on Finite State Machines (FSM) and Signal Transition Graphs (STG) to show that the proposed methodology outperforms established techniques in the field. The approach based on synchronous FSMs and Design Compiler (known as dc_shell in the Synopsys tool-chain) is always able to synthesise controllers from the given specifications using the sequential encoding. The synthesised implementations show an average area overhead of ≈56% in comparison to the proposed unconstrained approach, with a comparable processing runtime.
On the other hand, the methodology based on STGs is never able to synthesise implementations from the given specifications with the sequential encoding. The results shown in the table are derived with the one-hot encoding, which simplifies the specifications by replacing the go transitions and their dependencies with the codes, see [9]. However, even after this simplification, the methodology is often unsuccessful. In most cases, Petrify returns the error “support too big for minimisation” (see the missing results − on the left-hand side of the STG column), and MPSat does not find a solution within the given time limit (see the TO entries on the right-hand side). MPSat is partially successful with the ARM Cortex M0+, whose scenarios include fewer datapath modules (6 compared to the 17 modules of the Intel 8051) and which, unlike the TI MSP430, does not include conditional scenarios. On average, the methodology based on STGs has an area overhead of ≈51% in comparison to the proposed unconstrained approach, and a much higher synthesis runtime.
D. Software output logs
The third set of benchmarks includes scenario specifications
that describe a set of different software output logs [36].
TABLE IV: Three configurations of the proposed algorithm are compared with trivial CPOG composition techniques on 11 software output logs divided into (S)mall, (M)edium and (L)arge sizes. Bold results are the smallest controllers for each model. Units of measure: Area (|B|) = [µm²] (number of bits), Runtime (RT) = [s].
Set Model          |S|  Sequential        Random search     (a) Proposed      (b) Proposed      (c) Proposed
                                                            #e = 1            #e = 10           #e = 1, SA ×10
                        Area (|B|)  RT    Area (|B|)  RT    Area (|B|)  RT    Area (|B|)  RT    Area (|B|)  RT
(S) BigLog1         16  503 (4)     1     530 (4)     1     495 (4)     1     389 (4)     2     447 (4)     1
    Purchasetopay   20  581 (5)     1     713 (5)     1     502 (5)     1     434 (5)     3     546 (5)     1
    BigLog2         26  953 (5)     1     906 (5)     1     699 (5)     1     565 (5)     3     568 (5)     1
(M) Log2            32  1163 (5)    1     1271 (5)    1     969 (5)     1     825 (5)     3     815 (5)     1
    Incidenttelco   77  3531 (7)    1     3490 (7)    1     2953 (7)    1     2927 (7)    6     2613 (7)    1
    Svn log         92  3215 (7)    1     3274 (7)    1     2928 (7)    1     2769 (7)    5     2363 (7)    2
    Telecom        122  4417 (7)    1     4644 (7)    1     4123 (7)    1     4237 (7)    7     3938 (7)    2
    Colibrilog     167  6870 (8)    2     7777 (8)    2     7212 (8)    3     6789 (8)    11    6852 (8)    6
(L) Caise2014      401  37314 (9)   11    38339 (9)   11    37234 (9)   14    37180 (9)   115   35616 (9)   32
    Log1-filtered  402  13910 (9)   3     20215 (9)   4     18212 (9)   6     18004 (9)   46    14343 (9)   31
    Documentflow   651  21131 (10)  8     25222 (10)  8     24700 (10)  12    24623 (10)  112   22646 (10)  57
They come from the process mining community: artificial logs derived from the simulation of a process model (BigLog1, Log2, Caise2014), and real-life traces from various other contexts (purchasetopay, incidenttelco, svn log, telecom, documentflow).
Due to the size of these benchmarks (from 16 to 651 scenarios), we compare the Proposed encoding to the Sequential encoding and Random search, as the other CPOG algorithms always exceed the time limit. The proposed encoding is applied with three configurations: (a) with #e = 1, (b) with #e = 10, and (c) with #e = 1 and the Simulated Annealing parameters modified to allow 10× more iterations for the optimisation (SA ×10).
Evaluation: On average, the controllers found by the proposed encoding in configurations (a) and (b) are 4.7% and 9.8% smaller, respectively, than the sequential controllers, and 12.9% and 18% smaller than the random controllers. In turn, the results produced by the proposed encoding in configuration (c) are 13.2% and 21.7% smaller, on average, than the sequential and random implementations, respectively.
On average, the Sequential encoding produces ≈8.56% smaller controllers than the Random search algorithm. This good result is due to a certain degree of similarity between pairs of subsequent scenarios, which the Sequential algorithm naturally encodes with pairs of subsequent and similar codes.
In Table IV, the benchmarks are divided into three sets of different sizes, from top to bottom: (S)mall (10 < |S| < 30), (M)edium (30 < |S| < 400) and (L)arge (400 < |S| < 652). We observe the following:
Small set: configuration (b) of the proposed encoding finds the best results, as the higher number of encodings inspected (#e = 10) gives a higher chance of producing a good result. The increased number of SA iterations of configuration (c) is not justified in this set due to the small |S|.
Medium set: configuration (c) finds the best results, as the higher |S| justifies the longer time given to the Simulated Annealing optimisation.
Large set: configuration 3 finds smaller controllers than configurations 1 and 2. However, these benchmarks expose the heuristic (inexact) nature of the proposed approach, which may find larger controllers than the trivial algorithms. For this set, a higher number of SA iterations would be needed to obtain good results.
These benchmarks show that the proposed approach can handle specifications comprising hundreds of scenarios, and that it can be tuned by adjusting the time allotted to the SA optimisation.
TABLE V: Features of the CPOG composition algorithms. Max |S|: maximum number of scenarios supported. CPOG-scenarios: support of scenarios in the form of CPOGs. Constraints: support of composition constraints.
Exhaust. SAT-based Single-lit. Proposed
Max |S| 8 ≈10/15 ≈10/15 ≈650
CPOG-scenarios √ √ √
Constraints √
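The tuning knob discussed above is the Simulated Annealing (SA) iteration budget [26]. The sketch below is a generic SA search over scenario encodings, not the SCENCO implementation. The cost function is a hypothetical proxy (similarity-weighted Hamming distance between codes) standing in for the controller-area estimate, and the move set, cooling schedule and all names are assumptions. Under these assumptions, the ×10 setting of configuration 3 corresponds to multiplying `iterations` by ten.

```python
import math
import random


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two codes differ."""
    return bin(a ^ b).count("1")


def cost(codes: list[int], similarity: list[list[float]]) -> float:
    """Hypothetical area proxy: similar scenario pairs (large weight)
    are penalised for receiving distant codes."""
    n = len(codes)
    return sum(similarity[i][j] * hamming(codes[i], codes[j])
               for i in range(n) for j in range(i + 1, n))


def anneal(similarity: list[list[float]], width: int, iterations: int,
           t0: float = 1.0, t_end: float = 1e-3, seed: int = 0):
    """Simulated annealing over injective scenario-to-code assignments.

    The geometric cooling rate is derived from the iteration budget, so a
    ten times larger budget cools ten times more slowly over the same range.
    """
    rng = random.Random(seed)
    n, space = len(similarity), 2 ** width
    if iterations < 1 or n < 2 or n > space:
        raise ValueError("need iterations >= 1 and 2 <= #scenarios <= 2**width")
    alpha = (t_end / t0) ** (1.0 / iterations)
    codes = rng.sample(range(space), n)  # random initial encoding
    current = cost(codes, similarity)
    best, best_cost = codes[:], current
    t = t0
    for _ in range(iterations):
        candidate = codes[:]
        unused = sorted(set(range(space)) - set(codes))
        if unused and rng.random() < 0.5:
            # Move one scenario to a currently unused code.
            candidate[rng.randrange(n)] = rng.choice(unused)
        else:
            # Swap the codes of two scenarios.
            i, j = rng.sample(range(n), 2)
            candidate[i], candidate[j] = candidate[j], candidate[i]
        new_cost = cost(candidate, similarity)
        delta = new_cost - current
        # Metropolis rule: accept improvements; accept worse moves with prob e^(-delta/t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            codes, current = candidate, new_cost
            if current < best_cost:
                best, best_cost = codes[:], current
        t *= alpha
    return best, best_cost


if __name__ == "__main__":
    # Toy input: scenarios 0/1 and 2/3 are pairwise similar (hypothetical weights).
    sim = [[0, 1, 0, 0],
           [1, 0, 0, 0],
           [0, 0, 0, 1],
           [0, 0, 1, 0]]
    for budget in (1_000, 10_000):  # baseline vs. "SA x10"
        codes, c = anneal(sim, width=3, iterations=budget)
        print(budget, [format(x, "03b") for x in codes], c)
```

Deriving the cooling rate from the budget means that a larger budget is spent exploring more gradually, rather than on extra iterations at near-zero temperature. For this toy input the minimum possible cost is 2, reached when each similar pair of scenarios receives codes that differ in a single bit.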
VII. DISCUSSION AND FUTURE RESEARCH
This paper presented a novel approach to scenario composition for the design methodology based on Conditional Partial Order Graphs (CPOGs). The presented open-source SCENCO tool, embedded in the EDA toolsuite WORKCRAFT, implements this methodology. The algorithm and tool were evaluated on a set of benchmarks and compared to the state-of-the-art composition algorithms for CPOGs, and to behavioural synthesis techniques based on FSMs and STGs.
Table V summarises the comparison of all CPOG composition techniques, based on the experimental results presented and evaluated in Section VI. Unlike previously published techniques, the proposed algorithm handles hundreds of scenarios with a good trade-off between area and synthesis runtime, and supports composition constraints. It also supports conditional scenarios for modelling behaviours that contain dynamic branching. Furthermore, the experimental results show that the CPOG methodology produces more area-efficient implementations than the approaches based on FSMs and STGs, which can be applied only to relatively compact models.
To further improve the CPOG methodology, we recommend two directions for future research. (1) A parallel implementation of the presented algorithm, which would further improve the efficiency of behavioural composition by exploring more solutions at no extra runtime cost. (2) Support for x-aware scenario encoding (with x being latency, power, energy, or other characteristics), which is important for making the methodology attractive to many practical domains.
ACKNOWLEDGMENT
We would like to thank all members of the µSystems
Research Group at Newcastle University for supporting us and
this research. In particular, we thank Alex Yakovlev, Danil
Sokolov, Maxim Rykunov, Jonathan Beaumont and Ghaith
Tarawneh. This research was supported by the Royal Society
Research Grant “Computation Alive: Design of a Processor
with Survival Instincts”.
REFERENCES
[1] R. Hameed, W. Qadeer, M. Wachs, O. Azizi, A. Solomatnikov, B. C.
Lee, S. Richardson, C. Kozyrakis, and M. Horowitz. Understanding
sources of inefficiency in general-purpose chips. In ACM SIGARCH
Computer Architecture News, volume 38, pages 37–47. ACM, 2010.
[2] D. Gajski, AC-H Wu, V. Chaiyakul, S. Mori, T. Nukiyama, and
P. Bricaud. Essential issues for IP reuse. In Design Automation
Conference, 2000. Proceedings of the ASP-DAC 2000. Asia and South
Pacific, pages 37–42. IEEE, 2000.
[3] G. Martin and G. Smith. High-level synthesis: Past, present, and future.
IEEE Design & Test of Computers, 26(4):18–25, July 2009.
[4] A. Reid. Trustworthy specifications of ARM v8-A and v8-M system
level architecture. In 2016 Formal Methods in Computer-Aided Design
(FMCAD), pages 161–168, Oct 2016.
[5] A. Fox and M.O. Myreen. A Trustworthy Monadic Formalization of the
ARMv7 Instruction Set Architecture, pages 243–258. Springer Berlin
Heidelberg, Berlin, Heidelberg, 2010.
[6] A. Mokhov and A. Yakovlev. Conditional partial order graphs:
Model, synthesis, and application. IEEE Transactions on Computers,
59(11):1480–1493, Nov 2010.
[7] A. Mokhov, A. Iliasov, D. Sokolov, M. Rykunov, A. Yakovlev,
and A. Romanovsky. Synthesis of Processor Instruction Sets from
High-Level ISA Specifications. IEEE Transactions on Computers,
63(6):1552–1566, June 2014.
[8] A. Mokhov, A. Alekseyev, and A. Yakovlev. Encoding of processor
instruction sets with explicit concurrency control. IET Computers & Digital
Techniques, 5(6):427–439, November 2011.
[9] SCENCO code. GitHub repository: github.com/tuura/scenco.
[10] WORKCRAFT. GitHub repository: github.com/workcraft/workcraft,
WORKCRAFT website: www.workcraft.org.
[11] A. de Gennaro, P. Stankaitis, and A. Mokhov. A heuristic algorithm
for deriving compact models of processor instruction sets. In 2015
15th International Conference on Application of Concurrency to System
Design, pages 100–109, June 2015.
[12] G. Birkhoff. Lattice Theory. Number v. 25, pt. 2 in American
Mathematical Society colloquium publications. American Mathematical
Society, 1940.
[13] Alastair Reid, Rick Chen, Anastasios Deligiannis, David Gilday, David
Hoyes, Will Keen, Ashan Pathirane, Owen Shepherd, Peter Vrabel, and
Ali Zaidi. End-to-end verification of processors with ISA-Formal. In
International Conference on Computer Aided Verification, pages 42–58.
Springer, 2016.
[14] Kathryn E Gray, Peter Sewell, Christopher Pulte, Shaked Flur, and
Robert Norton-Wright. The Sail instruction-set semantics specification
language. 2017.
[15] Georgy Lukyanov and Andrey Mokhov. Concurrency Oracles for Free.
In Proceedings of the Algorithms and Theories for the Analysis of Event
Data 2018 Workshop, 2018 (in print).
[16] A. Mokhov. Conditional Partial Order Graphs. PhD thesis, Newcastle
University, 2009.
[17] J. Cortadella, M. Kishinevsky, A. Kondratyev, L. Lavagno, and
A. Yakovlev. Logic Synthesis for Asynchronous Controllers and In-
terfaces. Springer Publishing Company, Incorporated, 2013.
[18] Giovanni De Micheli. Synthesis and Optimization of Digital Circuits.
McGraw-Hill Higher Education, 1st edition, 1994.
[19] J. Cortadella, M. Kishinevsky, A. Kondratyev, L. Lavagno, and
A. Yakovlev. Petrify: A tool for manipulating concurrent specifications
and synthesis of asynchronous controllers, 1996.
[20] V. Khomenko, M. Koutny, and A. Yakovlev. Detecting state coding
conflicts in STG unfoldings using SAT. In Third International Conference
on Application of Concurrency to System Design, 2003. Proceedings.,
pages 51–60, June 2003.
[21] Pran Kurup and Taher Abbasi. Logic Synthesis Using Synopsys. Springer
Publishing Company, Incorporated, 2nd edition, 2011.
[22] T.H. Cormen, C.E. Leiserson, R.L. Rivest, and C. Stein. Introduction
to Algorithms, Third Edition. The MIT Press, 3rd edition, 2009.
[23] M. Gebser, B. Kaufmann, and T. Schaub. Conflict-driven answer set
solving: From theory to practice. Artificial Intelligence, 187:52 – 89,
2012.
[24] Niklas Eén and Niklas Sörensson. An Extensible SAT-solver, pages 502–518. Springer Berlin Heidelberg, Berlin, Heidelberg, 2004.
[25] R.W. Hamming. Error detecting and error correcting codes. The Bell
System Technical Journal, 29(2):147–160, April 1950.
[26] S. Kirkpatrick, C.D. Gelatt, and M.P. Vecchi. Optimization by simulated
annealing. Science, 220(4598):671–680, 1983.
[27] P.C. McGeer, J.V. Sanghavi, R.K. Brayton, and A.L. Sangiovanni-Vincentelli. Espresso-signature: a new exact minimizer for logic functions. IEEE Transactions on Very Large Scale Integration (VLSI)
Systems, 1(4):432–440, Dec 1993.
[28] Berkeley Logic Synthesis and Verification Group. ABC, a system for
sequential synthesis and verification.
[29] E.M. Sentovich, K.J. Singh, L. Lavagno, C. Moon, R. Murgai, A. Saldanha, H. Savoj, P.R. Stephan, R.K. Brayton, and A.L. Sangiovanni-Vincentelli. SIS: A system for sequential circuit synthesis. Technical
Report UCB/ERL M92/41, EECS Department, University of California,
Berkeley, 1992.
[30] SCENCO documentation. workcraft.org/help/encoding plugin.
[31] A. Mokhov, M. Rykunov, D. Sokolov, and A. Yakovlev. Design of
processors with reconfigurable microarchitecture. Journal of Low Power
Electronics and Applications, 4(1):26–43, 2014.
[32] J. Sparso and S. Furber. Principles of Asynchronous Circuit Design: A
Systems Perspective. Springer Publishing Company, Incorporated, 1st
edition, 2010.
[33] D. Sokolov, V. Dubikhin, V. Khomenko, D. Lloyd, A. Mokhov, and
A. Yakovlev. Benefits of asynchronous control for analog electronics:
Multiphase buck case study. In Design, Automation & Test in Europe
Conference & Exhibition (DATE), 2017, pages 1751–1756, March 2017.
[34] D. Sokolov, A. de Gennaro, and A. Mokhov. Reconfigurable asynchronous pipelines: from formal models to silicon. In Design, Automation & Test in Europe Conference & Exhibition (DATE), 2018, March 2018.
[35] M. Rykunov. Design of Asynchronous Microprocessor for Power
Proportionality. PhD thesis, Newcastle University, 2013.
[36] A. Mokhov, J. Carmona, and J. Beaumont. Mining Conditional Partial
Order Graphs from Event Logs, pages 114–136. Springer Berlin
Heidelberg, Berlin, Heidelberg, 2016.
