A formal semantics for the Taverna 2 workflow model  by Sroka, Jacek et al.
Journal of Computer and System Sciences 76 (2010) 490–508
Jacek Sroka a, Jan Hidders b, Paolo Missier c,∗, Carole Goble c
a University of Warsaw, Poland
b Delft University of Technology, The Netherlands
c University of Manchester, United Kingdom
Article info
Article history:
Received 10 March 2009
Received in revised form 20 August 2009
Available online 22 November 2009
Keywords:
Taverna 2
Scientific workflows
Trace semantics
Formal semantics
Abstract
This paper presents a formal semantics for the Taverna 2 scientific workflow system.
Taverna 2 is a successor to Taverna, an open-source workflow system broadly adopted
within the e-science community worldwide. The new version improves upon the existing
model in two main ways: (i) by adding support for data pipelining, which in turn enables
input streams of indefinite length to be processed efficiently; and (ii) by providing new
extensibility points that make it possible to add new operators to the workflow model.
Consistent with previous work by some of the authors, we use trace semantics to describe
the effect of workflow computations, and we show how they can be used to describe the
new features in the Taverna 2 model.
© 2009 Elsevier Inc. All rights reserved.
1. Introduction
Taverna [9] is a workﬂow language and computation model designed to support the automation of complex, service-
based processes, speciﬁcally in the domain of science. Version 1 was ﬁrst described in 2004 [17] and has since proven
to be a useful tool for scientists, mainly in bioinformatics [4,12]. The formal semantics of Taverna was described ﬁrst by
Turi et al. [23], and then by Hidders et al. [7]. The latter in particular uses trace semantics to take into account possible
side-effects that are associated with the services.
As a language, Taverna is strongly user-focused, as it attempts to strike a balance between expressivity of its operators
and simplicity of use, with the ultimate goal of empowering users, who have only a rudimentary understanding of program-
ming, to assemble complex workﬂows. The re-design of the workﬂow engine, which is now called “Taverna 2”, reﬂects the
need to reassess this balance as a result of years of practical usage, and of changing requirements.
This paper presents a new trace semantics for Taverna 2, following a similar approach to that of Hidders et al. [7]. We
speciﬁcally formalise some of the model’s new features, namely (i) support for data streaming, through pipelined execution
of workﬂows; and (ii) support for extensibility of the set of workﬂow operators. Despite the fact that techniques for exploit-
ing implicit data parallelism in workﬂows have been known for a long time [25], pipelining was not available in Taverna 1,
resulting in a limitation in the scalability of applications where data consists of large collections.
Although we describe these features formally, and characterize the types of parallelism that are available in Taverna, our
presentation is focused entirely on the development of the theoretical model, while a quantitative performance analysis of
the workﬂow engine is beyond the scope of this paper.
We also pay special attention to Taverna’s support of repeated processor execution by iteration over lists. This feature
is important for scientific workflows and is shared among some other workflow systems. Notably, one of the computation
models in the Kepler system [13], namely the COMAD model, supports streaming of data collections by letting workflow
processors manipulate XML documents [14]. This has been recently advocated as a way to better meet the requirements of
non-expert users [15].
* Corresponding author.
E-mail addresses: j.sroka@mimuw.edu.pl (J. Sroka), a.j.h.hidders@tudelft.nl (J. Hidders), pmissier@cs.man.ac.uk (P. Missier), carole.goble@manchester.ac.uk
(C. Goble).
doi:10.1016/j.jcss.2009.11.009
The importance of deﬁning formal models of various types of workﬂow computation has been recognised for a long
time [1], including a formal model of Kepler and its precursor, the Ptolemy system [5]. Such models can be used to support
the design of workﬂow languages and of their interpreters, compilers and optimizers as well as of debuggers, and to support
the deﬁnition of veriﬁcation procedures, similar to those used for verifying the correctness of complex business transactions.
In the case of Taverna, a system whose design was largely driven by the practical need to support process automation for a
variety of user types, such formalisation is even more important, in that it provides workflow designers, developers and
users with an a posteriori description of the system's behaviour that is both unambiguous and complete.
The ﬁrst question that needs to be answered before a formal semantics can be deﬁned is what it is that we would like
the semantics to describe about the speciﬁed workﬂows and at what abstraction level. It can be argued, for example, that
from the point of view of the user a Taverna workﬂow just describes a computation that maps a set of input values to a
set of output values. In that case the semantics of a Taverna workﬂow would be a non-deterministic function that maps the
input values to the output values, or in other words, if two workﬂows express the same mapping then they are equivalent as
far as the user is concerned. This is for example the perspective that was taken in earlier work by Turi et al. [23]. However,
in this paper we assume that in some cases the user does care about which steps are executed by the computation and in
what order. This is for example the case if these are calls to functions with side effects, e.g., an update to a database, or calls
to web services that require a certain protocol. In this case the formalism should not just indicate to which output the input
is mapped, but also which steps are taken to do so, and hence a process formalism is more appropriate. We will assume
that the relevant events that are to be described are the consumption of the input values, the invocation of functions that
have side effects or call external web services and the production of output values.
There are many formalisms and languages already available to describe processes and in particular processes that compose
web services. Many of these have a syntax and semantics based on either the π-calculus [16] or on Petri nets [21].
Based on the π-calculus are, for example, WS-CDL [10], WSCI [3] and XLANG, and based on Petri nets are BPMN [6], YAWL [24]
and WSFL [11]. A somewhat special case is BPEL [2], which is based on both XLANG and WSFL. Its semantics is at least
partially describable in terms of both Petri nets, e.g., [18], and the π-calculus, e.g., [27], and it is explicitly aimed at defining an
executable process speciﬁcation.
Despite the existence of the previously described range of languages and formalisms, we have chosen to deﬁne a speciﬁc
tailored set of operators for describing the semantics of Taverna 2 in terms of traces. We have done so for several reasons:
Observational equivalence. The main goal of the presented formalism is to deﬁne when two workﬂow speciﬁcations are
equivalent, and to allow reasoning over what can and cannot be expressed in the Taverna 2 workﬂow language. This requires
the chosen language to have a complete formal semantics that indeed describes when two workﬂows are observationally
equivalent. Note that for formalisms that are based on transition systems, such as Petri nets and π -calculus, this means
that next to the transition system an equivalence relation also needs to be deﬁned. See for example [26] for a range of
such equivalence relations and an explanation of when they are appropriate depending on the type of processes that are
being described. In the case of workﬂow enactment such as deﬁned in Taverna, it is not hard to see that trace semantics is
the correct equivalence relation since the user can observe the events, i.e., the calls to services, but cannot inﬂuence them.
As a consequence, formalisms based on bisimulation semantics, such as π -calculus, are less suitable or at least require a
proof that this does not lead to incorrect equivalencies. It will also be clear that a formalism that directly deﬁnes the set
of possible traces, rather than indirectly through Petri nets which in turn deﬁne sets of traces, will be easier to understand
and reason over.
Taverna 2 speciﬁc features. Typically, Taverna workﬂows involve computations on recursively nested lists, i.e., list composition
and iteration. Although extensions exist of π -calculus, e.g., Pict [20], and of Petri nets, e.g., DFL [8], that can describe this,
these require major extensions of the basic formalism. Also here we claim that a tailored formalism that directly deﬁnes
the traces for such behavior is more compact and easier to comprehend. A similar argument can be made for the two new
features in Taverna 2 that are the focus of this paper: the pipelined execution of processors and the dispatch stack which
deﬁnes an extensible mechanism to alter the behavior of individual processors.
The rest of the paper is organised as follows. We present an informal overview of the Taverna model and the calculus
in Section 2, followed by a deﬁnition of trace, in Section 3. Traces are then used in Section 4 to describe the semantics of
workﬂow graphs, in Section 5 to formalise Taverna’s iteration strategies, and again in Section 6 to describe the extensibility
features of the model. Finally, in Section 7 we describe a translation of a Taverna 2 speciﬁcation into our calculus, and we
conclude in Section 8.
2. Overview of Taverna and introduction to the calculus
Informally, a Taverna workﬂow graph is a directed acyclic graph where nodes, called processors, represent software com-
ponents, for instance Web services or local scripts, with an interface that consists of input and output ports. Arcs in the
graph connect pairs of ports, and specify a data dependency from the output port of one processor to the input port of
another. Additionally, a control link between a source and a sink processor can be used to specify that the sink processor
cannot execute until the source processor has produced all its output.
Fig. 1. Examples of Taverna workflows: (a) a simple bioinformatics Taverna workflow; (b) a nested workflow.
A simple example of a Taverna workflow for bioinformatics is shown in Fig. 1(a), in full-detail notation with processor and port
names (see footnote 1). The workflow takes a user-supplied list of genes, in the format expected by the KEGG database (see footnote 2), and
it retrieves information about the metabolic pathways in which the input genes are involved, according to the database.
It uses the KEGG Web Service to fetch the data, as well as a number of ancillary scripts, known as “shims”, as adapters
between the output of a service and the input of the next.
Workﬂow computation is mostly data-driven: when there are no control links, a processor is ready to execute as soon as
all of its input ports are bound to a value, and its execution causes its output ports to be bound to the result values. These
values are then propagated along any arcs that originate in any of the output ports to create new bindings downstream. In
particular, workflow execution begins when the workflow inputs are bound to initial input values, e.g., Kegg_gene_id_list
in the example, and terminates when output values cannot be propagated any further down the graph. In the example, the
ﬁnal results are bound to the output values gene_descriptions, pathway_descriptions, and pathway_by_genes.
Note that processors need not have any input ports, for instance regex in the figure, which is used to produce a constant
value. Similarly a processor may have no output ports, and would only be executed for its side effects. Note also that the
separator input ports of merge_descriptions, merge_pathways and merge_pathways_2 processors have no incoming arcs. For
such input ports a default value is provided, but this is not visible in the graphical notation.
With processors we associate activities, which deﬁne the core functionality of the processor. These can be either basic
activities, i.e., representing a local program or script or the invocation of an external service, or workflows themselves, i.e.,
nested workflows, as shown in Fig. 1(b) where processor P2 contains a workflow with processors P4 and P5. All activities
are characterized by a pair of interfaces, each of which is a set of port names. Basic activities in particular perform some kind
of computation using the values on their input ports and produce new values on the output ports.
1 The workflow is published on the myExperiment website, please see http://www.myexperiment.org/workflows/611.
2 KEGG is available at http://www.genome.jp/kegg/pathway.html.
Fig. 2. List values in Taverna.
Let B be a countably inﬁnite set of basic activity names and P a countably inﬁnite set of port names. For example,
with reference to Fig. 1(a), Kegg_query ∈ B, and string, return, attachmentList ∈ P (see footnote 3). An interface is a finite set I ⊂ P. We use
variants of the variables I and J to denote interfaces. Interfaces describe the input and output ports of activities (including
sub-workﬂows).
The type of an interface is based upon the type of its ports, which consists of a single basic type s containing basic
values such as strings, booleans and numbers, and a list type constructor:
Deﬁnition 1 (Port type). The set of port types T is deﬁned by the syntax rule T ::= s | [T ].
Here s denotes the basic type and [τ ] the type of lists of elements of type τ . For instance, port return in processor
Kegg_query is of type [s]. Note that lists can be nested, as for example Y for processor P4 in Fig. 2. We will use variants of
the variables τ , σ and ρ to denote port types.
Deﬁnition 2 (Interface type). An interface type is deﬁned as a partial function ι : P → T that is deﬁned for a ﬁnite subset
of P . Such an interface type is denoted as 〈a1 : τ1, . . . ,an : τn〉, for example 〈return : [s],attachmentList : [s]〉, and the empty
type is allowed in order to accommodate processors with no input (see for instance processor regex, which produces a
constant value), and is denoted as 〈 〉. The set of all interface types is denoted as Θ .
We will use variants of the variables ι and κ to denote interface types. The disjoint union of two interface types ι and
κ is denoted as ι, κ . For partial functions and other binary relations t such as interface types the domain of t is deﬁned as
dom(t) = {x | (x, y) ∈ t}. We assume that for all basic activity names f ∈ B , an input interface type ι and output interface
type κ are given, which is denoted as f : ι → κ, for instance Kegg_query : 〈string : [s]〉 → 〈return : [s], attachmentList : [s]〉.
Lists, i.e., values of type [τ ], play a central role in the semantics of the calculus since the values that are manipulated by
the workﬂows are mostly lists. Furthermore, we also use lists to describe traces. Lists are denoted as [v1, . . . , vn], with the
empty list denoted as [ ]. The concatenation of two lists v and w is denoted as v · w . We generalize this notation such that
•i=1..n vi denotes v1 · … · vn if n > 0 and [ ] if n = 0. If w = [w1, . . . , wn] then we write a ∈ w to denote that a appears
in w, i.e., there exists an i with 1 ≤ i ≤ n such that wi = a. The length of a list v is denoted as |v|.
In order to define the semantics of port types we postulate the set of basic values S, which will be the semantics of
the basic type s. We will assume that the basic values include strings, booleans and numbers, but not a special error
value err.
3 As activities are often wrappers for third party Web Services, the sometimes uninspiring names of the ports are not chosen by the workflow designer,
but rather they are mandated by the service's WSDL interface definition.
The semantics of a port type τ, denoted as [[τ]], defines the set of values that can be associated to the port, and is
defined with induction on the structure of τ such that (1) [[s]] = S ∪ {err} and (2) [[[τ]]] is the union of {err} and the set of
all lists [v1, . . . , vn] with n ≥ 0 and vi ∈ [[τ]] for all 1 ≤ i ≤ n. The set of all possible port values is the union of all semantics
of all port types and is denoted as V, i.e., V = ⋃τ∈T [[τ]]. We will use variants of the variables v, w, x and y to range over
port values. The semantics of interface types is based on the notion of tuples over port values.
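The inductive definition of [[τ]] can be read as a membership test. The following is a minimal Python sketch, not part of the calculus: it assumes that a port type is encoded by its nesting depth (0 for s, 1 for [s], and so on, which is possible because T ::= s | [T]), that Python lists stand for list values, and that a sentinel object named ERR stands for err. The names ERR and has_type are hypothetical.

```python
class Err:
    """Sentinel for the special error value err, which is not a member of S."""
    def __repr__(self):
        return "err"

ERR = Err()  # hypothetical name for the value err

def has_type(v, depth: int) -> bool:
    """Return True if v is in [[tau]], where tau is the port type of the given depth."""
    if v is ERR:
        # err belongs to the semantics of every port type
        return True
    if depth == 0:
        # basic values: strings, booleans and numbers
        return isinstance(v, (str, bool, int, float))
    # a list of any length n >= 0 whose elements all belong to the element type
    return isinstance(v, list) and all(has_type(x, depth - 1) for x in v)
```

For instance, under these assumptions has_type(["cat", ERR], 1) holds, while has_type("cat", 1) does not.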
Deﬁnition 3 (Interface value). An interface value is a partial function t : P → V that is deﬁned for a ﬁnite subset of P . Such
a value is denoted as t = 〈a1 = v1, . . . ,an = vn〉.
An example is the value given by the tuple
〈return = ["Thyroid cancer...","PPAR signaling pathway..."],attachmentList = []〉
produced by processor Kegg_query. The set of all interface values is denoted as I . We will use variants of t to represent
interface values. Element ai of tuple t is denoted t(ai).
The semantics of an interface type ι = 〈a1 : τ1, . . . ,an : τn〉, denoted as [[ι]], is deﬁned as the set of all interface values
t = 〈a1 = v1, . . . , an = vn〉 such that vj ∈ [[ι(aj)]] for all 1 ≤ j ≤ n.
For the basic activities f : ι1 → ι2 we assume their semantics to be deﬁned by a binary relation [[ f ]] ⊆ [[ι1]] × [[ι2]]. For
example, [[Kegg_query]] includes the pair consisting of the input value
〈string = ["path:hsa05216","path:hsa03320"]〉
and the output value
〈return = ["Thyroid cancer...","PPAR signaling pathway..."], attachmentList = []〉
Note that this binary relation is not required to be either total or functional. We allow it to be partial to model that the
associated service always fails for certain input values even if they belong to the required input type. This might for example
be the case if they do not satisfy certain additional preconditions that the service requires. We allow the semantics to be
non-functional to model that the result of a service call may be non-deterministic from the point of view of the workﬂow,
i.e., not be completely determined by the arguments.
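To make the distinction between partial and non-functional relations concrete, the following Python sketch represents [[f]] as a finite set of (input, output) pairs and checks the two properties. It is only an illustration under stated assumptions: interface values are encoded as sorted tuples of (port, value) items, list values as Python tuples so that they are hashable, and the helper names as_value, is_functional and is_total as well as the toy relation coin are hypothetical.

```python
def as_value(ports: dict):
    """Encode an interface value as a hashable, order-independent tuple of items."""
    return tuple(sorted(ports.items()))

def is_functional(rel) -> bool:
    """True if no input is related to two different outputs."""
    seen = {}
    for t_in, t_out in rel:
        if seen.setdefault(t_in, t_out) != t_out:
            return False
    return True

def is_total(rel, inputs) -> bool:
    """True if every given input is related to at least one output."""
    return set(inputs) <= {t_in for t_in, _ in rel}

query = as_value({"string": ("path:hsa05216", "path:hsa03320")})
result = as_value({
    "return": ("Thyroid cancer...", "PPAR signaling pathway..."),
    "attachmentList": (),
})
kegg = {(query, result)}  # the single pair quoted in the text

# A toy non-deterministic activity: the same input may yield two outputs.
toss = as_value({"x": "go"})
coin = {(toss, as_value({"y": "heads"})), (toss, as_value({"y": "tails"}))}
```

Here kegg is functional but not total with respect to any input set that contains a value other than query, and coin is not functional.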
2.1. List values and repeated processor invocation
Although the workﬂow is an acyclic graph, and thus does not permit users to explicitly describe loops over groups of
processors, the Taverna model specifies how a processor can execute repeatedly, by iterating over the list values on its input ports.
Here we give an informal account of this mechanism, while a complete formal model will be given in Section 5.
Let pvd(v) denote the depth of a port value v, for example, pvd("cat") = 0, pvd(["cat","dog"]) = 1,
pvd([["cat"], ["dog"]]) = 2, et cetera. Similarly, let ptd(τ) denote the depth of a port type τ. Let A : ι → κ be an activity
with a simple input interface type ι = 〈a : τ〉, and let t = 〈a = v〉 be an interface value that describes the input
to A. Consider the case where pvd(v) = ptd(τ) + 1, that is, the depth of the value assigned to a is one greater than the
depth of the type τ of a. This depth mismatch is dealt with by executing A repeatedly, once for each element vi ∈ v. This
can be described as the application of the well-known map function to A and v, i.e., (map A v) = [(A v1), . . . , (A vn)] where n = |v|.
More generally, map is applied recursively pvd(v) − ptd(τ) times to A. For example, if ι = 〈a : s〉 and t = 〈a = v〉 where
v = [["cat","dog"], ["black","white"]], then the workflow computes
(map (map A) [["cat","dog"], ["black","white"]]) =
[(map A ["cat","dog"]), (map A ["black","white"])] =
[[(A "cat"), (A "dog")], [(A "black"), (A "white")]]
This implicit iteration rule is designed to facilitate the composition of individual services into workﬂows. For example, it
makes it possible to feed the results of a database lookup service P1, which returns a collection of records, to a service P2
that is designed to analyse a single record. In this case, implicit iteration is a natural interpretation of the data dependency
between P1 and P2.
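The implicit iteration rule for a single input port can be sketched in a few lines of Python. This is an informal illustration only, assuming that Python lists stand for list values, that the depth of a value is read off its first element, and that an activity is an ordinary one-argument function; the names map_n and iterate are hypothetical, and the case of a value shallower than the port type is deliberately not modelled.

```python
def pvd(v) -> int:
    """Depth of a port value: 0 for a basic value, 1 more than its elements for a list."""
    if not isinstance(v, list):
        return 0
    # Assumption: an empty list is given depth 1, since it has no element to inspect.
    return 1 + (pvd(v[0]) if v else 0)

def map_n(activity, v, n: int):
    """Apply map to the activity n times, i.e. descend n list levels before calling it."""
    if n == 0:
        return activity(v)
    return [map_n(activity, x, n - 1) for x in v]

def iterate(activity, v, ptd: int):
    """Resolve a depth mismatch of pvd(v) - ptd by nested application of map."""
    mismatch = pvd(v) - ptd
    if mismatch < 0:
        raise ValueError("value is shallower than the port type; not modelled here")
    return map_n(activity, v, mismatch)
```

With str.upper standing in for A, iterate(str.upper, [["cat", "dog"], ["black", "white"]], 0) descends two levels and preserves the nesting of the input, as in the worked example above.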
The iteration rule extends to the case where depth mismatches appear on multiple inputs. Fig. 2 shows a workflow
fragment, where all input ports (presented on the top right side of the processor) of P4 expect simple values. When only
the value c is basic as expected, and list-valued inputs a = [a1, . . . , an] and b = [b1, . . . , bm] are provided on the other two ports,
the multiple port mismatches pvd(a) = ptd(P4:X1) + 1 and pvd(b) = ptd(P4:X2) + 1 are solved with the help of the cross
product a ⊗ b = [[〈a1,b1〉, . . . , 〈a1,bm〉], . . . , [〈an,b1〉, . . . , 〈an,bm〉]]. We can describe the result using standard λ-calculus
notation, as follows:
y = (map (map (λ xa xb.(P4 xa xb c))) (a ⊗ b))
where |y| = n, and λ xa xb.(P4 xa xb c) denotes a function with arguments xa and xb that applies P4 to xa, xb, and c. The
map operator repeatedly applies this function, by iteratively binding xa and xb to each element of the product a ⊗ b. In the
example, the result is a list of the form [[y1,1, . . . , y1,m], . . . , [yn,1, . . . , yn,m]] where yi,j = (P4 ai bj c). Note that here c
is treated as a constant parameter, because there is no port mismatch on X3, therefore the same value c is used in each
iteration.
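The cross product strategy can be illustrated with a short Python sketch. It assumes Python lists for list values and Python pairs for the tuples 〈ai, bj〉; the function p4 is a hypothetical stand-in for the activity of processor P4 that simply returns its three arguments, so that the shape of the result is easy to inspect.

```python
def cross(a, b):
    """Cross product: one row per element of a, one column per element of b."""
    return [[(x, y) for y in b] for x in a]

def map_map(f, rows):
    """(map (map f) rows) for a two-argument function f applied to each pair."""
    return [[f(x, y) for (x, y) in row] for row in rows]

def p4(xa, xb, c):
    # Hypothetical stand-in for the activity of P4.
    return (xa, xb, c)

a = ["a1", "a2"]
b = ["b1", "b2", "b3"]
c = "c"  # no depth mismatch on this input, so it is passed unchanged to every call

y = map_map(lambda xa, xb: p4(xa, xb, c), cross(a, b))
```

Under these assumptions y has n = 2 rows of m = 3 entries each, and the entry at row i, column j is the result of applying p4 to the i-th element of a, the j-th element of b, and c.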
Additionally, Taverna workflow designers have the option to choose a different iteration strategy, based on the
dot product a ⊙ b = [〈a1,b1〉, 〈a2,b2〉, . . . , 〈ak,bk〉] where k = min(n,m). The description in λ-calculus notation is
y = (map (λ xa xb.(P4 xa xb c)) (a ⊙ b))
where |y| = k. The overall iteration strategy of a processor can be specified by an iteration strategy expression, such as for
example (a ⊗ b) ⊙ c, where the symbol ⊗ indicates the cross product strategy and the symbol ⊙ indicates the dot product strategy.
In Section 5 we further generalize and formalize the cross and dot product operators, showing in particular that the cross
product operator can be simulated using the dot product iteration and one additional operator of the calculus.
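For comparison with the cross product, the dot product strategy pairs elements position by position. The sketch below makes the same assumptions as before (Python lists for list values, pairs for tuples) and again uses a hypothetical stand-in p4 for the activity of P4. Python's built-in zip stops at the shorter input, which matches k = min(n, m).

```python
def dot(a, b):
    """Dot product: pair the i-th element of a with the i-th element of b."""
    return list(zip(a, b))  # zip truncates to the shorter list

def p4(xa, xb, c):
    # Hypothetical stand-in for the activity of P4.
    return (xa, xb, c)

a = ["a1", "a2", "a3"]
b = ["b1", "b2"]
c = "c"

# (map (lambda xa xb. P4 xa xb c) (dot a b))
y = [p4(xa, xb, c) for (xa, xb) in dot(a, b)]
```

The result is a flat list of length k, in contrast to the nested list produced by the cross product.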
2.2. Pipelining
The Taverna 2 execution engine provides support for the pipelined execution of processors along a path in the work-
ﬂow, an important feature not previously available in Taverna 1, as mentioned in the introduction. This feature is strictly
connected with the implicit iteration mechanism just discussed, and is aimed at improving eﬃcient execution over large
data collections. In Taverna, iterations over items in a list may be executed concurrently if suﬃcient resources are available
to support parallel execution threads. When this happens, the elements of the result list may be produced in arbitrary or-
der, due to different speeds of each of the threads. These possible different arrival orders are captured in workﬂow traces,
described in Section 3, by recording events that represent individual locations (see Deﬁnition 4) of an output list being
assigned a value, and values at input locations being used by an activity.
In Taverna 1, once the output list is complete, the engine activates each of the downstream processors that consume the
list. While this behaviour is correct, it does not take advantage of additional implicit parallelism that is available due to
the mutual independence between data elements within a list. In contrast, Taverna 2 allows for the consumer processors
to begin to process individual elements of the result without the need to wait for the entire list to be complete. The most
common and useful case occurs when consumers are also subject to implicit iteration. Suppose for example that, in the
fragment of Fig. 2, P4:Y has an outgoing arc to some port P5:X, with port type depth equal to 0. As we know, P5 will
iterate on each of the elements yi, j in y, and furthermore, these iterations will all be independent of one another. Since the
sub-lists as well as the simple values in y are themselves independently generated, each of them can be forwarded to P5 as
soon as it becomes available on port P4:Y, independently from the others. By greedily pushing partial results to downstream
processors, the speed at which individual values are computed is no longer bounded by the slowest thread that accounts
for one iteration over P4.
We can frame this behaviour in terms of a collection of workﬂow patterns for parallel computing, proposed in [19].
Within this framework, Taverna implements a superscalar pipeline execution semantics, whereby a new thread is allocated
to each list element in the input to an iterating processor. At the same time, however, an upper bound on the number
of available threads can be set for each processor (through an advanced conﬁguration option). With this limitation, some
pipelines can be blocking, if the upstream processors are systematically faster than downstream processors.
In addition, Taverna also supports streaming pipelines. In this context, a stream is deﬁned as an unpredictably long col-
lection of discrete input data elements, an increasingly common occurrence in applications such as process monitoring, or
sensor data processing. Indeed, Biomart (www.biomart.org) [22] is an example of a service that supplies its output in a
streamed fashion. Taverna is able to consume the stream as it appears on the output port, either by feeding it incrementally
to downstream services that also accept streams, or by buffering the output prior to forwarding it to services that expect
complete data as input. Conceptually, buffers have unlimited capacity, thanks to a two-tier data architecture that involves
an in-memory buffer with a database overﬂow. A complete description of the data architecture is beyond the scope of this
paper, however.
3. Trace semantics
As anticipated in the introduction, we model the semantics of workﬂows in terms of traces. Informally, traces record
sequences of three types of elementary events: (i) input events, representing values arriving on input ports; (ii) the atomic
execution of a basic activity, and (iii) output events, i.e., the availability of a value on an output port. In this section we
introduce our notation for describing traces, while in the next section we introduce the set of operators of our calculus
that describe Taverna 2 workﬂows and deﬁne the semantics of such operators as the set of all legal traces that the use of the
operator can produce. As a workﬂow is described by an expression that combines operators of the calculus, by extension
the semantics of a workﬂow is deﬁned as the set of all legal traces that the expression can produce. Note that by de-
scribing the semantics in terms of traces that contain basic activity executions, rather than just input and output events,
we account for the fact that these activities involve possibly stateful service invocations, a common occurrence in scientiﬁc
workﬂows.
Traces carry information about the relative ordering of events which are assumed to be atomic, but they are not concerned
with the actual relative duration of parts of the computation. Although a basic activity execution can in principle
be thought of as the combination of two steps: sending a request and receiving a response, we are going to assume for
simplicity that basic activity executions are atomic and they are therefore described by a single event.
Fig. 3. Basic workflow graphs to illustrate sequential and parallel composition: (a) sequential composition; (b) parallel composition.
As mentioned in the previous section, collections consist of elementary values that can, in principle, be consumed and
produced, and thus appear in input and output trace events, in any order. We use locations to indicate the position of an
elementary basic value within a containing collection value, as follows.
Definition 4 (Location). A location is a possibly empty list of non-zero natural numbers. These lists are denoted by separating
the numbers with dots, such as in 1.4.2.1 and 2.3, and the empty location is denoted as ε. The set of all locations is denoted
as L. The concatenation of two locations λ1 and λ2 is denoted as (λ1.λ2).
The numbers further from the location head give the indexes in deeper nested lists. As an example, each element yi,j of
the list [[y1,1, . . . , y1,m], . . . , [yn,1, . . . , yn,m]] is at location i.j, for all 1 ≤ i ≤ n, 1 ≤ j ≤ m.
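Locations can be computed mechanically from a nested list. The following Python sketch is an illustration only: it assumes Python lists for list values, encodes a location as a tuple of 1-based indexes (so the empty location is the empty tuple), and reports a location for every basic value and every empty list, since these are the values that can appear in single events. The names locations and dotted are hypothetical.

```python
def locations(v, prefix=()):
    """Yield (location, value) for every basic value or empty list inside v."""
    if isinstance(v, list) and v:
        for i, x in enumerate(v, start=1):
            # indexes further from the head address deeper nesting levels
            yield from locations(x, prefix + (i,))
    else:
        yield prefix, v

def dotted(location) -> str:
    """Render a non-empty location in the dotted notation, e.g. (1, 4) as '1.4'."""
    return ".".join(str(i) for i in location)

y = [["y11", "y12"], ["y21", "y22"]]
table = {dotted(loc): val for loc, val in locations(y)}
```

In this example the element in row i and column j is found at location i.j, and a basic value on its own sits at the empty location.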
Definition 5 (Workflow trace). A workflow trace is a list of events. An event is either
• a successful basic activity execution: f(t1) ↦ t2 where f ∈ B and t1, t2 ∈ I,
• a failed basic activity execution: f(t1) ↦ ⊥ where f ∈ B and t1 ∈ I,
• an input event: ina(λ, v) with a ∈ P, λ ∈ L and v ∈ S ∪ {err, [ ]},
• an output event: outa(λ, v) with a ∈ P, λ ∈ L and v ∈ S ∪ {err, [ ]},
• a fail event: fail.
Since a workﬂow execution involves both parallelism across processors, and pipelining across iterations of a proces-
sor, different executions may generate different traces for the same inputs, even when the services involved are state-
less.
Fig. 3 includes two simple workﬂows. We use them to demonstrate: (1) complete workﬂow trace examples and (2) that
traces can have different relative ordering of events. Here are two possible traces for two executions of the workﬂow of
Fig. 3(a):
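\[
\begin{aligned}
\alpha = [\, & \mathit{in}_a(1, x_1),\ \mathit{in}_a(2, x_2),\ F_S(\langle b = x_1\rangle) \mapsto \langle c = y_1\rangle,\ F_S(\langle b = x_2\rangle) \mapsto \langle c = y_2\rangle, \\
& F_T(\langle b = y_1\rangle) \mapsto \langle c = z_1\rangle,\ F_T(\langle b = y_2\rangle) \mapsto \langle c = z_2\rangle,\ \mathit{out}_f(1, z_1),\ \mathit{out}_f(2, z_2) \,] \\
\beta = [\, & \mathit{in}_a(1, x_1),\ \mathit{in}_a(2, x_2),\ F_S(\langle b = x_1\rangle) \mapsto \langle c = y_1\rangle,\ F_T(\langle e = y_1\rangle) \mapsto \langle e = z_1\rangle, \\
& \mathit{out}_f(1, z_1),\ F_S(\langle b = x_2\rangle) \mapsto \langle c = y_2\rangle,\ F_T(\langle d = y_2\rangle) \mapsto \langle d = z_2\rangle,\ \mathit{out}_f(2, z_2) \,]
\end{aligned}
\]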
Here FS and FT are the basic activities associated with processor S and T. Note that in both traces these processors iterate
over their input, a list with two elements, but in β processor T already starts before S is ﬁnished, as might be expected in
a pipelined execution.
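The difference between the two orderings can be reproduced with a small Python sketch. It is not a model of the Taverna engine: it assumes that the activities of S and T are ordinary one-argument functions, records events as simple tuples, and contrasts a schedule that finishes all iterations of S before starting T with one that forwards each element as soon as it is produced. The names batch and pipelined are hypothetical.

```python
def batch(xs, fs, ft):
    """S iterates over the whole list before T starts (ordering of trace alpha)."""
    trace = [("in", i, x) for i, x in enumerate(xs, start=1)]
    ys = []
    for x in xs:
        y = fs(x)
        trace.append(("S", x, y))
        ys.append(y)
    zs = []
    for y in ys:
        z = ft(y)
        trace.append(("T", y, z))
        zs.append(z)
    trace += [("out", i, z) for i, z in enumerate(zs, start=1)]
    return trace

def pipelined(xs, fs, ft):
    """Each element is pushed through S, T and the output at once (ordering of trace beta)."""
    trace = [("in", i, x) for i, x in enumerate(xs, start=1)]
    for i, x in enumerate(xs, start=1):
        y = fs(x)
        trace.append(("S", x, y))
        z = ft(y)
        trace.append(("T", y, z))
        trace.append(("out", i, z))
    return trace

fs = lambda x: "y" + x[1:]  # toy activity for S: x1 becomes y1
ft = lambda y: "z" + y[1:]  # toy activity for T: y1 becomes z1
alpha = batch(["x1", "x2"], fs, ft)
beta = pipelined(["x1", "x2"], fs, ft)
```

Both schedules produce the same events; only their relative order differs, which is exactly the information that a trace records.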
We also give two traces for two executions of the workﬂow of Fig. 3(b):
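\[
\begin{aligned}
\alpha &= [\, \mathit{in}_a(\epsilon, x),\ F_U(\langle b = x\rangle) \mapsto \langle c = y_1\rangle,\ F_V(\langle d = x\rangle) \mapsto \langle e = y_2\rangle,\ \mathit{out}_f(1, y_1),\ \mathit{out}_f(2, y_2) \,] \\
\beta &= [\, \mathit{in}_a(\epsilon, x),\ F_V(\langle d = x\rangle) \mapsto \langle e = y_2\rangle,\ F_U(\langle b = x\rangle) \mapsto \langle c = y_1\rangle,\ \mathit{out}_f(2, y_2),\ \mathit{out}_f(1, y_1) \,]
\end{aligned}
\]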
Note that the complete traces include only initial input events and ﬁnal output events. As will become clear with the
presentation of the semantics rules, the corresponding intermediate input and output events cancel themselves out when
the composition operator is used.
To describe all possible orderings of events in a trace in a compact way, we write (α ||| β) to denote the set of all possible
interleavings of two traces α and β. Formally: (α ||| β) = {α1 · β1 · … · αn · βn | n ≥ 1, α = α1 · … · αn, β = β1 · … · βn}.
Note that this indeed defines all possible interleavings of the events in α and β since αi and βj can be the empty trace. We
generalize the interleaving set for more than two traces, denoted as (α1 ||| . . . ||| αn) or |||i=1..n αi, such that β ∈ (α1 ||| . . . ||| αn)
iff there is a γ ∈ (α2 ||| . . . ||| αn) such that β ∈ (α1 ||| γ). For n = 1 we define (α1 ||| . . . ||| αn) as {α1}, and for n = 0 as {[ ]}, i.e., the
singleton set containing the empty trace.
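The interleaving operator can be computed directly for small traces. The Python sketch below is an illustration under stated assumptions: traces are encoded as tuples of hashable events, the binary case is defined by the usual recursion on the first event (which yields the same set as the definition by segments given above), and the n-ary case folds the binary one over the list of traces. The names interleavings and interleave_all are hypothetical.

```python
def interleavings(alpha, beta):
    """Return the set of all interleavings of two traces, each a tuple of events."""
    if not alpha:
        return {tuple(beta)}
    if not beta:
        return {tuple(alpha)}
    # The first event of the result is either the first of alpha or the first of beta.
    first_a = {(alpha[0],) + rest for rest in interleavings(alpha[1:], beta)}
    first_b = {(beta[0],) + rest for rest in interleavings(alpha, beta[1:])}
    return first_a | first_b

def interleave_all(traces):
    """n-ary interleaving: {()} for n = 0, {alpha1} for n = 1, folded otherwise."""
    result = {()}
    for t in traces:
        result = {g for r in result for g in interleavings(r, t)}
    return result
```

The relative order of events within each argument trace is preserved in every result, so two traces of lengths p and q have (p + q)! / (p! q!) interleavings when all events are distinct.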
As we have seen from the previous example, multiple events may be used to represent the arrival or generation of list
values. For convenience, we summarize the entire set of such events by denoting the encoding of an entire input (resp.,
output) value v in a trace α as ina(v) ⇓ α (resp. outa(v) ⇓ α). Note that the elements of the list may be produced in any
order, therefore different executions may show different encodings for the same values.
It will also be necessary to describe list values of depth n, which are obtained by wrapping an existing list of depth
n− 1, e.g., a list [[a1,a2]] obtained by wrapping list [a1,a2]. If ina(λ, v) is the event corresponding to the arrival of an input
v at location λ, then the corresponding location in the wrapped list is i.λ for some positive i. Therefore we let i.α denote
the trace that we obtain if in α we replace each event ina(λ, v) with ina(i.λ, v) and each event outa(λ, v) with outa(i.λ, v).
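Formally, $\mathit{in}_a(v) \Downarrow \alpha$ is defined by induction on the structure of $v$ by the following inference rules:
\[
\frac{v \in S \cup \{\mathit{err}, [\,]\}}{\mathit{in}_a(v) \Downarrow [\mathit{in}_a(\epsilon, v)]}
\qquad
\frac{\forall_{i=1..n}(\mathit{in}_a(v_i) \Downarrow \alpha_i) \quad \gamma \in |||_{i=1..n}(i.\alpha_i)}{\mathit{in}_a([v_1, \ldots, v_n]) \Downarrow \gamma}
\]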
These are examples of inference rules that will be used extensively throughout the paper. The first rule states that if $v$ is a basic value (or err or $[\,]$), then the arrival of $v$ on port $a$ is encoded by the single-event trace $[\mathit{in}_a(\epsilon, v)]$. The second rule states that if $n$ values $v_1, \ldots, v_n$ arrive on port $a$, and each such arrival is encoded as $\alpha_i$, then the arrival of a list $[v_1, \ldots, v_n]$ is encoded by a trace $\gamma$, which can be any interleaving of the traces $i.\alpha_i$. The rules for output traces, i.e., $\mathit{out}_a(v) \Downarrow \alpha$, are similar and omitted for brevity.
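As an illustration of the two rules, the sketch below computes one possible encoding of a value, namely the one in which list elements arrive strictly in index order. The tuple layout `("in", port, location, value)` and the name `encode_input` are assumptions made for this example; locations are modeled as tuples of positive integers, with the empty tuple as the empty location.

```python
def encode_input(port, value, location=()):
    """Return one trace that encodes `value` arriving on `port`.

    Basic values, "err" and the empty list yield a single event.
    A non-empty list yields the events of its elements, each located
    under the element's 1-based index.
    """
    if not isinstance(value, list) or value == []:
        return [("in", port, location, value)]
    events = []
    for i, element in enumerate(value, start=1):
        events.extend(encode_input(port, element, location + (i,)))
    return events
```

Any other interleaving of the per-element subtraces would be an equally valid encoding under the second rule.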
Where basic values and lists are associated with streams on individual input and output ports, tuple values are used to model the complete input or output of a processor and in that sense arrive at or are produced by a processor as a whole. Correspondingly, as a further shorthand we introduce the notation $\mathit{in}^*(t) \Downarrow \beta$ and $\mathit{out}^*(t) \Downarrow \beta$ to denote that trace $\beta$ encodes the tuple interface value $t$ in input events or output events, respectively. Formally, such an encoding $\beta$ is an interleaving of traces $\alpha_i$, where each $\alpha_i$ encodes the arrival (resp., generation) of one field $a_i$ of the tuple $t$ on port $a_i$:
\[
\frac{\mathit{dom}(t) = \{a_1, \ldots, a_n\} \quad \forall_{i=1..n}(\mathit{in}_{a_i}(t(a_i)) \Downarrow \alpha_i) \quad \beta \in |||_{i=1..n}\,\alpha_i}{\mathit{in}^*(t) \Downarrow \beta}
\]
The rules for output traces, i.e., $\mathit{out}^*(t) \Downarrow \beta$, are similar and omitted for brevity.
Finally, we introduce the notation $\alpha|_I$ to denote the filtering of the trace $\alpha$ on the input events on ports from $I$, disregarding all other event types, i.e., $\alpha|_I = \{\mathit{in}_a(\lambda, v) \mid \mathit{in}_a(\lambda, v) \in \alpha,\ a \in I\}$, and a counterpart notation for output events, $\alpha|^I = \{\mathit{out}_a(\lambda, v) \mid \mathit{out}_a(\lambda, v) \in \alpha,\ a \in I\}$. Note that a filtering is a set, rather than a trace itself. We use $\alpha|$ to denote the set of all events in $\alpha$, i.e., $\alpha| = \{e \mid e \in \alpha\}$.
4. Semantics of workﬂow graphs
In the calculus we distinguish two types of expressions, fragments and activities. Fragments represent parts of workflow graphs, i.e., single processors and subgraphs. Activities represent wrappers for real-world services that are executed
by workﬂows, e.g., web service clients. The difference between the two is that fragments never fail, while activities may
terminate with failure. We need to make this distinction explicit because T2 assumes that nodes in the graph represent
processors that never fail. However, since in reality the services that are invoked when the processors are executed can fail,
this potential mismatch is resolved by the processor execution and the semantics accounts for this.
In this section we give the formal semantics for both types of expressions. The fragments correspond to subgraphs of workflow graphs, which can be composed into bigger graphs. The operators that describe such fragments have types of the form $\iota \rightsquigarrow \kappa$, where $\iota$ and $\kappa$ are interface types that describe the input ports and the output ports, respectively. The semantics of these operators is defined by judgments of the form $e \Downarrow \alpha$, where $e$ is a fragment expression and $\alpha$ a possible trace that can include only input events, output events and basic activity executions but no failure events. The intended semantics of a type $\iota \rightsquigarrow \kappa$ is that if a fragment has this type then the associated set of traces satisfies the following: (1) no trace in the set contains fail, (2) for each interface value $t$ of interface type $\iota$ there is at least one trace $\alpha$ in this set such that $\alpha|_P$, i.e., the set of input events, encodes $t$, and (3) for each trace $\alpha$ in this set, if $\alpha|_P$ encodes a value of type $\iota$, then $\alpha|^P$, i.e., the set of output events in $\alpha$, encodes a value of type $\kappa$.
Activities are executed during the run of a workflow and can be either basic activities that represent real world services or nested workflow graphs. Operators that describe activities have types of the form $\iota \Rightarrow \kappa$ and their semantics is defined by judgments of the form $e \Downarrow \alpha$, where $e$ is an activity expression and $\alpha$ a possible trace that can include fail. The intended semantics of a type $\iota \Rightarrow \kappa$ is that if an activity has this type then the associated set of traces satisfies the following: (1) for each interface value $t$ of interface type $\iota$ there is at least one trace $\alpha$ in this set such that $\alpha|_P$ encodes $t$, (2) for each trace $\alpha$ in this set, if $\alpha|_P$ encodes a value of type $\iota$ then either $\alpha$ contains fail or $\alpha|^P$ encodes a value of type $\kappa$, and (3) for each trace in this set, if it contains a fail then this is the last event in the trace. Note that every fragment of type $\iota \rightsquigarrow \kappa$ is also an activity of type $\iota \Rightarrow \kappa$. Although we do not formally prove this in this paper, it can be verified that the sets of traces that are defined by the presented operators indeed satisfy the requirements imposed by the types of the operators.
Fig. 4. Illustrations of (a) $\mathit{ln}_{a \to a,b,c}$, (b) $\mathit{merge}_{a;b;c \to a}$ and (c) $F \triangleright_{b,c} G$.
In Section 4.1 we present the basic operators for deﬁning fragments. Then, in Section 4.2, we explain how to translate
basic workﬂow graphs deﬁned in Taverna 2 into these basic operators. Finally, in Section 4.3 we introduce activities and
deﬁne the conversions between fragments and activities.
4.1. Basic fragment operators
The syntax of the fragment operators is given by:
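\[
\frac{\iota = \langle b_1 : \tau, \ldots, b_n : \tau\rangle \quad I = \mathit{dom}(\iota)}{\mathit{ln}_{a \to I} : \langle a : \tau\rangle \rightsquigarrow \iota}
\qquad
\frac{\iota = \langle a_1 : \tau, \ldots, a_n : \tau\rangle}{\mathit{merge}_{a_1;\ldots;a_n \to b} : \iota \rightsquigarrow \langle b : [\tau]\rangle}
\]
\[
\frac{\begin{array}{c}
F : \iota_1 \rightsquigarrow \kappa_1, \kappa_2 \quad G : \kappa_2, \iota_2 \rightsquigarrow \kappa_3 \quad I = \mathit{dom}(\kappa_2) \\
\mathit{dom}(\iota_1) \cap \mathit{dom}(\iota_2) = \emptyset \quad \mathit{dom}(\kappa_1) \cap \mathit{dom}(\kappa_3) = \emptyset
\end{array}}{(F \triangleright_I G) : \iota_1, \iota_2 \rightsquigarrow \kappa_1, \kappa_3}
\]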
The linking operator $\mathit{ln}_{a \to I}$ is used to express arcs that transport data values, and defines a fragment in which values from the port $a$ are copied to all the ports in $I$. This is illustrated in Fig. 4(a). The merge operator $\mathit{merge}_{a_1;\ldots;a_n \to b}$ represents the (almost) symmetrical situation where values from many arcs are combined into a list produced on one port. Here the ordering of $a_1; \ldots; a_n$ in the operator matters, as emphasized by the use of semicolons, and determines the ordering of elements in the result list. This is illustrated in Fig. 4(b) where the numbers on edges depict this ordering. Finally, the composition operator $F \triangleright_I G$, illustrated in Fig. 4(c), constructs a fragment by combining the fragments $F$ and $G$ on the interface $I$ so that the values produced by $F$ on this interface are consumed by $G$.
The semantics of the linking workﬂow graph is deﬁned by:
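\[
\frac{\begin{array}{c}
\{c_1, \ldots, c_n\} = I \quad \gamma = [\mathit{in}_a(\lambda_1, v_1)] \cdot \beta_1 \cdot \ldots \cdot [\mathit{in}_a(\lambda_m, v_m)] \cdot \beta_m \\
\forall_{i=1..m}(\beta_i = [\mathit{out}_{c_1}(\lambda_i, v_i), \ldots, \mathit{out}_{c_n}(\lambda_i, v_i)])
\end{array}}{\mathit{ln}_{a \to I} \Downarrow \gamma}
\]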
The rule states that there is a valid trace in which the stream from the input port $a$, i.e., its subsequent events $\mathit{in}_a(\lambda_1, v_1), \ldots, \mathit{in}_a(\lambda_m, v_m)$, is read and written to each of the output ports in $I$, and this is encoded by the corresponding subtraces $\beta_1, \ldots, \beta_m$. The output ports in $I$ are chosen in an arbitrary order but the streams are not reordered.
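The linking rule can be read as a simple copying loop. The sketch below builds the trace for a given input stream under the assumption that events are tuples `(kind, port, location, value)`; the name `link_trace` and the fixed order of output ports are choices made for this illustration only.

```python
def link_trace(in_port, out_ports, input_events):
    """Build a trace of the linking operator for a stream of (location, value) pairs.

    Each input event is immediately followed by one output event per
    output port, carrying the same location and value.
    """
    trace = []
    for location, value in input_events:
        trace.append(("in", in_port, location, value))
        for port in out_ports:
            trace.append(("out", port, location, value))
    return trace
```

The rule itself allows the output ports to be enumerated in any order; the sketch fixes the order in which `out_ports` is given.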
The semantics of the merge operator is deﬁned by:
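\[
\frac{n > 0 \quad \forall_{j=1..m}(1 \leq i_j \leq n) \quad \alpha \in |||_{j=1..m}\,[\mathit{in}_{a_{i_j}}(\lambda_j, v_j), \mathit{out}_b(i_j.\lambda_j, v_j)]}{\mathit{merge}_{a_1;\ldots;a_n \to b} \Downarrow \alpha}
\qquad
\frac{}{\mathit{merge}_{\to b} \Downarrow [\mathit{out}_b(\epsilon, [\,])]}
\]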
Here we have $n$ input ports on which $m$ values arrive. The premise $\forall_{j=1..m}(1 \leq i_j \leq n)$ assigns to each of these values the port on which it arrives, i.e., value $v_j$ is assumed to arrive on port $a_{i_j}$. Then in the trace $\alpha$ the value from port $a_{i_j}$ is output as the $i_j$th element of the result list. Thus the ordering of the result list is determined by the ordering of the input ports. Nevertheless the order of events in the stream can change. This is because $\alpha$ is constructed by interleaving, so this definition allows the merge to buffer certain events. The second rule covers the case of $n = 0$ where the merge becomes the empty list constructor.
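To make the index bookkeeping of the merge rules concrete, here is a small sketch that produces one trace without any buffering, i.e., each output event directly follows its input event. The event tuples and the name `merge_trace` are assumptions of this example.

```python
def merge_trace(in_ports, out_port, arrivals):
    """Build one trace of the merge operator.

    `in_ports` is the ordered list a_1; ...; a_n and `arrivals` is a list of
    (port, location, value) triples in arrival order. The position of the
    port in `in_ports` is prepended to the location of the output event.
    """
    if not in_ports:
        # n = 0: the merge is the empty list constructor.
        return [("out", out_port, (), [])]
    trace = []
    for port, location, value in arrivals:
        index = in_ports.index(port) + 1  # 1-based position i_j of the port
        trace.append(("in", port, location, value))
        trace.append(("out", out_port, (index,) + location, value))
    return trace
```

Even when the value on the second port arrives first, it still ends up at position 2 of the result list, which is the point made in the text about port ordering versus event ordering.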
Fig. 5. A basic workflow graph to illustrate the representation in calculus.
The semantics of the composition operation is based on the notion of composing traces for a certain set of port names $I$. This means that two traces are interleaved such that the output events of one trace for the ports in $I$ coincide with the input events for these ports in the other trace. For example, consider the traces
\[
\alpha = [\mathit{in}_a(\epsilon, v_1),\; f(t_1) \mapsto t_2,\; \mathit{out}_b(1, v_2),\; g(t_2) \mapsto t_4,\; \mathit{out}_a(\epsilon, v_3),\; \mathit{out}_b(2, v_4),\; \mathit{in}_b(\epsilon, v_5),\; \mathit{out}_c(\epsilon, v_6)]
\]
and
\[
\beta = [\mathit{in}_d(\epsilon, v_7),\; h(t_3) \mapsto t_4,\; \mathit{in}_b(1, v_2),\; f(t_6) \mapsto t_7,\; \mathit{out}_b(\epsilon, v_8),\; \mathit{in}_b(2, v_4),\; \mathit{in}_c(\epsilon, v_6),\; g(t_8) \mapsto t_9,\; \mathit{out}_c(\epsilon, v_9)].
\]
The output events of $\alpha$ for the port names in $\{b, c\}$ are $[\mathit{out}_b(1, v_2), \mathit{out}_b(2, v_4), \mathit{out}_c(\epsilon, v_6)]$. The input events in $\beta$ for this set of port names are $[\mathit{in}_b(1, v_2), \mathit{in}_b(2, v_4), \mathit{in}_c(\epsilon, v_6)]$. Observe that the considered output events of $\alpha$ exactly match the considered input events of $\beta$ in the same order. This means that if $\alpha$ and $\beta$ are traces of fragments $F$ and $G$ (Fig. 4(c)), where indeed the ports $b$ and $c$ are connected, then $\alpha$ could describe a run of $F$, while $\beta$ describes a possible simultaneous run of $G$. A run of such a composition of $F$ and $G$ could be constructed by interleaving $\alpha$ and $\beta$ such that the corresponding output and input events coincide and then removing these output and input events. To illustrate this, we first identify the subtraces of $\alpha$ and $\beta$ that lie between the relevant output and input events:
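\[
\begin{array}{cll}
 & \alpha & \beta \\
1 & [\mathit{in}_a(\epsilon, v_1),\; f(t_1) \mapsto t_2] & [\mathit{in}_d(\epsilon, v_7),\; h(t_3) \mapsto t_4] \\
\text{--} & [\mathit{out}_b(1, v_2)] & [\mathit{in}_b(1, v_2)] \\
2 & [g(t_2) \mapsto t_4,\; \mathit{out}_a(\epsilon, v_3)] & [f(t_6) \mapsto t_7,\; \mathit{out}_b(\epsilon, v_8)] \\
\text{--} & [\mathit{out}_b(2, v_4)] & [\mathit{in}_b(2, v_4)] \\
3 & [\mathit{in}_b(\epsilon, v_5)] & [\,] \\
\text{--} & [\mathit{out}_c(\epsilon, v_6)] & [\mathit{in}_c(\epsilon, v_6)] \\
4 & [\,] & [g(t_8) \mapsto t_9,\; \mathit{out}_c(\epsilon, v_9)]
\end{array}
\]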
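Here the numbered rows indicate the subtraces that will be interleaved and the unnumbered rows indicate the events on which the traces are synchronized. A possible result could for example be
\[
[\mathit{in}_a(\epsilon, v_1),\; \mathit{in}_d(\epsilon, v_7),\; h(t_3) \mapsto t_4,\; f(t_1) \mapsto t_2] \cdot [f(t_6) \mapsto t_7,\; g(t_2) \mapsto t_4,\; \mathit{out}_b(\epsilon, v_8),\; \mathit{out}_a(\epsilon, v_3)] \cdot [\mathit{in}_b(\epsilon, v_5)] \cdot [g(t_8) \mapsto t_9,\; \mathit{out}_c(\epsilon, v_9)].
\]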
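The trace composition operator, denoted $\alpha \circ_I \beta$, describes all possible compositions of traces $\alpha$ and $\beta$ that synchronize on the port names in $I$. We define the set $\alpha \circ_I \beta$ using the following inference rule:
\[
\frac{\begin{array}{c}
\{\mathit{out}_{b_1}(\lambda_1, v_1), \ldots, \mathit{out}_{b_m}(\lambda_m, v_m)\} = \alpha|^I \quad \{\mathit{in}_{b_1}(\lambda_1, v_1), \ldots, \mathit{in}_{b_m}(\lambda_m, v_m)\} = \beta|_I \\
\alpha = \alpha_1 \cdot [\mathit{out}_{b_1}(\lambda_1, v_1)] \cdot \ldots \cdot \alpha_m \cdot [\mathit{out}_{b_m}(\lambda_m, v_m)] \cdot \alpha_{m+1} \\
\beta = \beta_1 \cdot [\mathit{in}_{b_1}(\lambda_1, v_1)] \cdot \ldots \cdot \beta_m \cdot [\mathit{in}_{b_m}(\lambda_m, v_m)] \cdot \beta_{m+1} \quad \forall_{i=1..m+1}(\gamma_i \in (\alpha_i \mathbin{|||} \beta_i))
\end{array}}{\gamma_1 \cdot \ldots \cdot \gamma_{m+1} \in (\alpha \circ_I \beta)}
\]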
The first two premises of the rule identify the output and input events that should be synchronized, i.e., $\alpha|^I$ and $\beta|_I$. The next two premises identify the subtraces between the events that are synchronized, namely $\alpha_1, \ldots, \alpha_{m+1}$ and $\beta_1, \ldots, \beta_{m+1}$. The final premise states that $\gamma_1$ is an interleaving of $\alpha_1$ and $\beta_1$, $\gamma_2$ is an interleaving of $\alpha_2$ and $\beta_2$, etc. Then, in the conclusion, it is stated that the trace $\gamma_1 \cdot \ldots \cdot \gamma_{m+1}$ is in the set $\alpha \circ_I \beta$.
With the help of the trace composition operator, we can now define the semantics of the fragment composition operator, as follows.
\[
\frac{F \Downarrow \alpha \quad G \Downarrow \beta \quad \gamma \in \alpha \circ_I \beta}{(F \triangleright_I G) \Downarrow \gamma}
\]
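The trace composition described above can be sketched as a small enumeration procedure: split both traces at the synchronization events, check that these events match in order, and interleave the remaining segments pairwise. The event encoding as tuples `(kind, port, location, value)` with kinds `"in"`, `"out"` and `"exec"`, and the names `split_on` and `compose`, are assumptions of this example rather than notation from the paper.

```python
from itertools import combinations


def interleavings(alpha, beta):
    """Yield every interleaving of two traces, keeping each trace's own order."""
    n, m = len(alpha), len(beta)
    for positions in combinations(range(n + m), n):
        chosen = set(positions)
        a_iter, b_iter = iter(alpha), iter(beta)
        yield [next(a_iter) if k in chosen else next(b_iter) for k in range(n + m)]


def split_on(trace, kind, ports):
    """Split a trace at events of the given kind on the given ports.

    Returns the segments between those events and the (port, location, value)
    triples of the events themselves, in order.
    """
    segments, sync, current = [], [], []
    for event in trace:
        if event[0] == kind and event[1] in ports:
            sync.append(event[1:])
            segments.append(current)
            current = []
        else:
            current.append(event)
    segments.append(current)
    return segments, sync


def compose(alpha, beta, ports):
    """Return all compositions of alpha and beta that synchronize on `ports`."""
    a_segments, a_sync = split_on(alpha, "out", ports)
    b_segments, b_sync = split_on(beta, "in", ports)
    if a_sync != b_sync:
        return []  # output and input events do not match: nothing to compose
    results = [[]]
    for a_seg, b_seg in zip(a_segments, b_segments):
        results = [prefix + gamma
                   for prefix in results
                   for gamma in interleavings(a_seg, b_seg)]
    return results
```

Applied to the example traces of this section with ports b and c, the segment lengths are (2, 2), (2, 2), (1, 0) and (0, 2), so the sketch yields 6 · 6 · 1 · 1 = 36 composed traces, each with the synchronized events removed.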
4.2. Representing basic Taverna 2 workﬂow graphs
The preceding operators are suﬃcient to model the semantics of simple Taverna 2 workﬂow graphs, such as the one
shown in Fig. 5. In this workﬂow graph the small circle indicates merging of values arriving from different edges into a list.
In Fig. 6 we illustrate how this workﬂow graph can be represented in the calculus.
We assume that the behavior of the processors P, Q and R as fragments is described by the calculus expressions $F_P$, $F_Q$ and $F_R$. Then the total workflow graph can be represented as
\[
F_7 = (((((\mathit{ln}_{a \to e} \triangleright_\emptyset \mathit{ln}_{b \to f}) \triangleright_{e,f} F_P) \triangleright_g \mathit{ln}_{g \to h,j}) \triangleright_h F_Q) \triangleright_j F_R) \triangleright_{i,k} \mathit{merge}_{i;k \to d}.
\]
This expression builds up the workflow graph from left to right. It starts with the edge connecting input $a$ to input port $e$, which translates to $F_1 = \mathit{ln}_{a \to e}$. Note that this represents a workflow graph with inputs $\{a\}$ and outputs $\{e\}$. This is combined with the other edge from $b$ to $f$, giving $F_2 = F_1 \triangleright_\emptyset \mathit{ln}_{b \to f}$. Note that the operator $\triangleright_\emptyset$ indicates that $F_1$ and $\mathit{ln}_{b \to f}$ do not synchronize on any ports, and so operate in parallel. Also note that $F_2$ has inputs $\{a, b\}$ and outputs $\{e, f\}$. In the next step we construct $F_3 = F_2 \triangleright_{e,f} F_P$ which connects the output ports $\{e, f\}$ of $F_2$ with the input ports $\{e, f\}$ of $F_P$ and therefore has outputs $\{g\}$. We then construct $F_4 = F_3 \triangleright_g \mathit{ln}_{g \to h,j}$ with outputs $\{h, j\}$ being copies of the output $g$ of $F_3$. Then $F_5 = F_4 \triangleright_h F_Q$ has outputs $\{i, j\}$, $F_6 = F_5 \triangleright_j F_R$ has outputs $\{i, k\}$, and finally $F_7 = F_6 \triangleright_{i,k} \mathit{merge}_{i;k \to d}$ has outputs $\{d\}$.
Fig. 6. Illustration of a calculus expression for the workflow graph from Fig. 5.
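Let us illustrate how to use the semantic rules to derive a possible trace of the expression. We assume that P executes a basic activity $f_P$ which performs an addition over its inputs, and so $F_P \Downarrow [\mathit{in}_e(\epsilon, 3), \mathit{in}_f(\epsilon, 4), f_P(\langle e = 3, f = 4\rangle) \mapsto \langle g = 7\rangle, \mathit{out}_g(\epsilon, 7)]$. We assume that Q doubles its input and therefore $F_Q \Downarrow [\mathit{in}_h(\epsilon, 7), f_Q(\langle h = 7\rangle) \mapsto \langle i = 14\rangle, \mathit{out}_i(\epsilon, 14)]$. Finally we assume that R squares its input, which means that for example $F_R \Downarrow [\mathit{in}_j(\epsilon, 7), f_R(\langle j = 7\rangle) \mapsto \langle k = 49\rangle, \mathit{out}_k(\epsilon, 49)]$. We can now derive a trace for $F_7$ as follows. For $F_1 = \mathit{ln}_{a \to e}$ it holds that $F_1 \Downarrow [\mathit{in}_a(\epsilon, 3), \mathit{out}_e(\epsilon, 3)]$, and similarly $\mathit{ln}_{b \to f} \Downarrow [\mathit{in}_b(\epsilon, 4), \mathit{out}_f(\epsilon, 4)]$. It then follows by applying the rule for $\triangleright_I$ with $I = \emptyset$ that $F_2 = (F_1 \triangleright_\emptyset \mathit{ln}_{b \to f}) \Downarrow [\mathit{in}_b(\epsilon, 4), \mathit{in}_a(\epsilon, 3), \mathit{out}_e(\epsilon, 3), \mathit{out}_f(\epsilon, 4)]$. Clearly the output events for ports $e$ and $f$ in this final trace match with the input events for ports $e$ and $f$ in the mentioned trace of $F_P$, i.e., they happen in the same order for the same ports, locations and values, and so the $\triangleright_{e,f}$ operator can combine them, which allows us to derive that $F_3 = (F_2 \triangleright_{e,f} F_P) \Downarrow [\mathit{in}_b(\epsilon, 4), \mathit{in}_a(\epsilon, 3), f_P(\langle e = 3, f = 4\rangle) \mapsto \langle g = 7\rangle, \mathit{out}_g(\epsilon, 7)]$. Since $\mathit{ln}_{g \to h,j} \Downarrow [\mathit{in}_g(\epsilon, 7), \mathit{out}_h(\epsilon, 7), \mathit{out}_j(\epsilon, 7)]$ it follows that $F_4 = (F_3 \triangleright_g \mathit{ln}_{g \to h,j}) \Downarrow [\mathit{in}_b(\epsilon, 4), \mathit{in}_a(\epsilon, 3), f_P(\langle e = 3, f = 4\rangle) \mapsto \langle g = 7\rangle, \mathit{out}_h(\epsilon, 7), \mathit{out}_j(\epsilon, 7)]$. Continuing this for the whole expression we can for example derive that $F_7 \Downarrow [\mathit{in}_b(\epsilon, 4), \mathit{in}_a(\epsilon, 3), f_P(\langle e = 3, f = 4\rangle) \mapsto \langle g = 7\rangle, f_Q(\langle h = 7\rangle) \mapsto \langle i = 14\rangle, f_R(\langle j = 7\rangle) \mapsto \langle k = 49\rangle, \mathit{out}_d(1, 14), \mathit{out}_d(2, 49)]$. Note that since the final result is a list with two elements this is encoded as a stream with two events for port $d$.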
4.3. Activities
The syntax of basic activities and translations between activities and fragments is given by:
\[
\frac{f \in B \quad f : \iota \to \kappa}{f : \iota \Rightarrow \kappa}
\qquad
\frac{A : \iota \Rightarrow \kappa \quad J = \mathit{dom}(\kappa)}{\mathit{fr}_J(A) : \iota \rightsquigarrow \kappa}
\]
The first rule from the left gives the syntax for the basic activities for which, recall from Section 2, we assume $f$ to be the basic activity name with an input interface $\iota$ and output interface $\kappa$. The second rule defines the fragment operator which transforms an activity into a fragment, which means that it converts fail events into error values for all the output ports. Here $J$ denotes the output interface of the activity and the resulting fragment. Note that we do not need an additional operator that turns a fragment into an activity, because fragments are activities by definition. We do, however, need to state this, using the following sub-typing rule:
\[
\frac{e : \iota \rightsquigarrow \kappa}{e : \iota \Rightarrow \kappa}
\]
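Since basic activities can fail, we need two rules to define their semantics, to deal with a successful execution and with failure, respectively:
\[
\frac{(t_1, t_2) \in [\![ f ]\!] \quad \mathit{in}^*(t_1) \Downarrow \alpha \quad \mathit{out}^*(t_2) \Downarrow \beta}{f \Downarrow \alpha \cdot [f(t_1) \mapsto t_2] \cdot \beta}
\qquad
\frac{\mathit{in}^*(t_1) \Downarrow \alpha}{f \Downarrow \alpha \cdot [f(t_1) \mapsto \bot] \cdot [\mathit{fail}]}
\]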
The semantics of the fragment operator is defined by the following two rules. The first rule states that traces of an activity $A$ that do not contain a fail event are also traces of the corresponding fragment. The second rule describes what happens if the trace does contain a fail event. In this case, the result trace begins with the events in $\alpha$, followed by the output events in $\beta$ representing an output tuple which contains error values for all output ports in $J$.
\[
\frac{A \Downarrow \alpha \quad \mathit{fail} \notin \alpha}{\mathit{fr}_J(A) \Downarrow \alpha}
\qquad
\frac{A \Downarrow \alpha \cdot [\mathit{fail}] \quad J = \{a_1, \ldots, a_m\} \quad \beta = [\mathit{out}_{a_1}(\epsilon, \mathit{err}), \ldots, \mathit{out}_{a_m}(\epsilon, \mathit{err})]}{\mathit{fr}_J(A) \Downarrow \alpha \cdot \beta}
\]
Note that we use the assumption that failing activities have no output events (so no such events need to be removed),
and that the fail event is the last event.
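The interplay between the two basic-activity rules and the fragment operator can be sketched as follows. This is a simplified model under stated assumptions: ports carry only basic values, input events are emitted in sorted port order, a Python exception stands in for a service failure, and the names `run_basic_activity` and `to_fragment` are invented for this illustration.

```python
def run_basic_activity(name, func, inputs):
    """Return one trace of a basic activity applied to the input tuple `inputs`.

    `inputs` maps port names to basic values. `func` maps that dict to a dict
    of outputs, or raises an exception to model a failing service.
    """
    trace = [("in", port, (), inputs[port]) for port in sorted(inputs)]
    try:
        outputs = func(inputs)
    except Exception:
        # Failure rule: the execution event has no result and fail comes last.
        return trace + [("exec", name, inputs, None), ("fail",)]
    trace.append(("exec", name, inputs, outputs))
    trace.extend(("out", port, (), outputs[port]) for port in sorted(outputs))
    return trace


def to_fragment(activity_trace, out_ports):
    """Turn an activity trace into a fragment trace.

    A trailing fail event is replaced by error values on all output ports.
    """
    if activity_trace and activity_trace[-1] == ("fail",):
        errors = [("out", port, (), "err") for port in out_ports]
        return activity_trace[:-1] + errors
    return list(activity_trace)
```

With an adding function on ports e and f, the first function reproduces the shape of the trace assumed for the processor P in Section 4.2.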
5. Iteration strategies and their trace semantics
In Section 2.1 we introduced and motivated iteration strategies. Here we formalize the iteration strategies in terms of
calculus operators, and show that those operators are suﬃcient to represent all the iteration strategies deﬁned in Taverna 2.
5.1. Operators for expressing iteration strategies
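Only two additional operators are needed to express Taverna 2 iteration strategies, namely $\odot_{I \to J}(F)$ and $\mathit{rpt}_{a;b \to c}$. Their syntax is given by:
\[
\frac{\begin{array}{c}
F : \langle a_1 : \tau_1, \ldots, a_n : \tau_n\rangle \rightsquigarrow \langle b_1 : \sigma_1, \ldots, b_m : \sigma_m\rangle \\
n > 0 \quad I = \{a_1, \ldots, a_n\} \quad J = \{b_1, \ldots, b_m\}
\end{array}}{\odot_{I \to J}(F) : \langle a_1 : [\tau_1], \ldots, a_n : [\tau_n]\rangle \rightsquigarrow \langle b_1 : [\sigma_1], \ldots, b_m : [\sigma_m]\rangle}
\qquad
\frac{}{\mathit{rpt}_{a;b \to c} : \langle a : \tau, b : [\sigma]\rangle \rightsquigarrow \langle c : [\tau]\rangle}
\]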
The dot product iteration operator $\odot_{I \to J}(F)$ expects a list on each input port in $I$ and iterates on all of them at the same time, i.e., the fragment $F$ is executed once for every combination of elements that are at the same index in the input lists. This means that if the input lists are of unequal length then the unmatched elements from the longer lists are ignored. The repeat operator $\mathit{rpt}_{a;b \to c}$ repeats the value on port $a$ in a list as often as the list on port $b$ has elements, e.g., it maps the input $\langle a = [1], b = [[2], [3,4], [\,]]\rangle$ to the output $\langle c = [[1], [1], [1]]\rangle$.
With these two operators we can simulate complex iteration strategies. For example, assume we would like to compute the combinations for the strategy $(a \otimes b)$ where for ports $a$ and $b$ the expected port types are $s$ and $s$ and the offered port types are $[s]$ and $[s]$, respectively. If the overall input is, for example, $\langle a = [1,2], b = [3,4]\rangle$, then we need to produce the pseudo value $[[\langle a = 1, b = 3\rangle, \langle a = 1, b = 4\rangle], [\langle a = 2, b = 3\rangle, \langle a = 2, b = 4\rangle]]$. Note that this is not really a value because we do not allow tuples in lists, only lists and basic values. However this pseudo value indicates for which combinations the processor will execute, and how the result values are constructed, i.e., each tuple is replaced with the result for that combination. We can encode this pseudo value as two real values, one containing the $a$ values, and the other the $b$ values. The first value would be $[[1,1], [2,2]]$ and the second $[[3,4], [3,4]]$. The first value can be produced by $\mathit{ln}_{a \to a,a'} \triangleright_{a',b} \mathit{rpt}_{b;a' \to c} \triangleright_{a,c} \odot_{a,c \to a}(\mathit{rpt}_{a;c \to a})$ and the second value by $\mathit{rpt}_{b;a \to b}$. As will be illustrated later, this is in fact possible for any iteration strategy and combination of expected and offered port types.
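At the level of values (ignoring traces, streaming and error values), the two operators and the cross product construction above can be sketched in a few lines. The function names `rpt` and `dot` are chosen to mirror the operators; treating the iterated fragment as an ordinary Python function is an assumption of this example.

```python
def rpt(a_value, b_list):
    """Repeat the value from port a once for every element of the list on port b."""
    return [a_value for _ in b_list]


def dot(func, *lists):
    """Apply func to the elements found at the same index of every input list.

    zip stops at the shortest list, so unmatched elements of longer lists
    are ignored, as described for the dot product iteration operator.
    """
    return [func(*items) for items in zip(*lists)]


# Cross product of a = [1, 2] and b = [3, 4], encoded as two nested lists.
a, b = [1, 2], [3, 4]
c = rpt(b, a)                  # b repeated once per element of a
first_value = dot(rpt, a, c)   # each a_i repeated once per element of c_i
second_value = rpt(b, a)
```

Here `first_value` holds the a components and `second_value` the b components of the four combinations, position by position.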
The semantics of the dot product iteration operator is given by three rules. We start with the case where all the input lists are non-empty:
\[
\frac{\begin{array}{c}
p > 0 \quad \forall_{j=1..p}(F \Downarrow \alpha_j) \quad \alpha \in |||_{j=1..p}(j.\alpha_j) \quad \{c_1\} \cup \cdots \cup \{c_k\} \subseteq \{a_1, \ldots, a_n\} \\
\beta = [\mathit{in}_{c_1}(\lambda_1, v_1), \ldots, \mathit{in}_{c_k}(\lambda_k, v_k)] \quad \delta \in \alpha \mathbin{|||} \beta
\end{array}}{\odot_{a_1,\ldots,a_n \to b_1,\ldots,b_m}(F) \Downarrow \delta}
\]
Here $p$ is the length of the shortest of the input lists, the trace $\alpha_j$ for each $j \in 1, \ldots, p$ is the trace of the execution of fragment $F$ for position $j$, and the event $\mathit{in}_{c_i}(\lambda_i, v_i)$ for each $i \in 1, \ldots, k$ is an input event that describes part of the input lists beyond the first $p$ elements and will therefore be ignored. The trace of the dot product iteration is then composed from (i) $\alpha$, which is a combination of traces $\alpha_j$ extended in such a way that the input and output events refer to sub-values of the value at position $j$, and (ii) $\beta$ which includes the possible additional input events trimmed from longer input lists.
The second rule deals with the case where there is at least one empty input list, which means that an empty list is also
produced on all the output ports. Observe that the result, encoded by β , can be produced as soon as an empty list appears
on any of the input ports.
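\[
\frac{\begin{array}{c}
\mathit{in}^*(\langle a_1 = v_1, \ldots, a_n = v_n\rangle) \Downarrow \alpha \quad v_i = [\,] \\
\alpha = \alpha_1 \cdot [\mathit{in}_{a_i}(\epsilon, [\,])] \cdot \alpha_2 \quad \mathit{out}^*(\langle b_1 = [\,], \ldots, b_m = [\,]\rangle) \Downarrow \beta \quad \gamma \in \alpha_2 \mathbin{|||} \beta
\end{array}}{\odot_{a_1,\ldots,a_n \to b_1,\ldots,b_m}(F) \Downarrow \alpha_1 \cdot [\mathit{in}_{a_i}(\epsilon, [\,])] \cdot \gamma}
\]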
The ﬁnal rule deals with the case where there are no empty input lists, but at least one error value, which means that
error values are produced on all the output ports. Here β starts after at least one input event for every input port has
arrived in α1, because at that moment it is clear that on port ai there arrived an error value and that none of the input
values is an empty list.
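\[
\frac{\begin{array}{c}
\mathit{in}^*(\langle a_1 = v_1, \ldots, a_n = v_n\rangle) \Downarrow \alpha_1 \cdot \alpha_2 \\
\forall_{j=1..n}(v_j \neq [\,]) \quad \forall_{j=1..n}\exists_{\lambda, v'}(\mathit{in}_{a_j}(\lambda, v') \in \alpha_1) \\
v_i = \mathit{err} \quad \mathit{out}^*(\langle b_1 = \mathit{err}, \ldots, b_m = \mathit{err}\rangle) \Downarrow \beta \quad \gamma \in \alpha_2 \mathbin{|||} \beta
\end{array}}{\odot_{a_1,\ldots,a_n \to b_1,\ldots,b_m}(F) \Downarrow \alpha_1 \cdot \gamma}
\]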
Finally, the semantics for the repeat operator is given by:
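\[
\frac{\begin{array}{c}
\mathit{in}^*(\langle a = v_a, b = [v_1, \ldots, v_n]\rangle) \Downarrow \gamma_1 \cdot \ldots \cdot \gamma_{m+1} \\
\mathit{out}^*(\langle c = \bullet_{k=1..n}[v_a]\rangle) \Downarrow [\mathit{out}_c(i_1.\lambda_1, w_1), \ldots, \mathit{out}_c(i_m.\lambda_m, w_m)] \\
\forall_{j=1..m}(\exists_{w', \lambda'}(\mathit{in}_b(i_j.\lambda', w') \in \gamma_1 \cdot \ldots \cdot \gamma_j)) \quad \forall_{j=1..m}(\mathit{in}_a(\lambda_j, w_j) \in \gamma_1 \cdot \ldots \cdot \gamma_j)
\end{array}}{\mathit{rpt}_{a;b \to c} \Downarrow \gamma_1 \cdot [\mathit{out}_c(i_1.\lambda_1, w_1)] \cdot \gamma_2 \cdot \ldots \cdot [\mathit{out}_c(i_m.\lambda_m, w_m)] \cdot \gamma_{m+1}}
\]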
The ﬁrst precondition speciﬁes that input events in the concatenation of γ1, . . . , γm+1 encode exactly the input tuple.
The second gives m output events encoding the list with va repeated n times, from which it follows that the sequence of
pairs (λi,wi) includes n repetitions of the values needed to encode va . The output trace is constructed by interleaving the
subsequent parts of the input trace with the subsequent output events. The last two preconditions ensure that an output
event outc(i j .λ j,w j) is produced only after (i) on the input b we have seen the index i j and (ii) on the input a we have
seen the relevant piece of va , i.e., (λ j,w j). Note that the ﬁnal part of the output trace γm+1 is necessary because each of
the values v1, . . . , vn might be encoded with more than one input event, but to produce the output event for its index we
only need to have consumed one of those input events. Note also that this deﬁnition is not as greedy as it could be, because
in principle the operator could output events encoding va on all the positions 1, . . . , i, if there was an input event for vi+1,
but the given deﬁnition matches the current implementation observed in Taverna 2.
5.2. Translating iteration strategy expressions
In this section we show that the presented operators and primitives are sufficient to represent all the iteration strategies in Taverna. As an example, consider a processor with input interface type $\langle a : s, b : s, c : s\rangle$ which is offered values of type $[s]$, $[s]$ and $[[s]]$ on ports $a$, $b$ and $c$, respectively, with iteration strategy $(a \otimes b) \odot c$. The semantics of the iteration strategy is defined in terms of a calculus expression $\mathit{IS}_{(a[1] \otimes b[1]) \odot c[2]} : \langle a : [s], b : [s], c : [[s]]\rangle \rightsquigarrow \langle a : [[s]], b : [[s]], c : [[s]]\rangle$ that consumes the offered values and produces three streams that contain all the combinations over which the processor is to iterate. The integer numbers associated with each port indicate the difference in nesting depth between the offered type and the expected type. This is a positive number if the offered value is nested too deep and negative if it is not nested deep enough. For example, if the input value is $\langle a = [1,2], b = [3,4], c = [[5,6], [7]]\rangle$ then the output streams of $\mathit{IS}_{(a[1] \otimes b[1]) \odot c[2]}$ encode $\langle a = [[1,1], [2]], b = [[3,4], [3]], c = [[5,6], [7]]\rangle$. The processor then iterates over these three lists, simultaneously processing each time the combinations it finds at a certain position. In this case it processes for position 1.1 the value $\langle a = 1, b = 3, c = 5\rangle$, for position 1.2 the value $\langle a = 1, b = 4, c = 6\rangle$ and for position 2.1 the value $\langle a = 2, b = 3, c = 7\rangle$.
Expressing $\mathit{IS}_{(a[1] \otimes b[1]) \odot c[2]}$ requires first expressing $\mathit{IS}_{a[1] \otimes b[1]}$. The output for the $b$ port can be computed as $F_b = \mathit{rpt}_{b;a \to b}$. Note that this maps $\langle a = [1,2], b = [3,4]\rangle$ to $\langle b = [[3,4], [3,4]]\rangle$. The output for the $a$ port can be computed by $F_a = (\mathit{ln}_{a \to a,a'} \triangleright_{a'} \mathit{rpt}_{b;a' \to b'}) \triangleright_{a,b'} \odot_{a,b' \to a}(\mathit{rpt}_{a;b' \to a})$. Given $\langle a = [1,2], b = [3,4]\rangle$ the expression $(\mathit{ln}_{a \to a,a'} \triangleright_{a'} \mathit{rpt}_{b;a' \to b'})$ produces $\langle a = [1,2], b' = [[3,4], [3,4]]\rangle$ and from this $\odot_{a,b' \to a}(\mathit{rpt}_{a;b' \to a})$ produces $\langle a = [[1,1], [2,2]]\rangle$. It is then not hard to see that $\mathit{IS}_{a[1] \otimes b[1]} = \mathit{ln}_{a \to a,a'} \triangleright_\emptyset \mathit{ln}_{b \to b,b'} \triangleright_{a,b} F_b \triangleright_{a',b'} (\mathit{ln}_{a' \to a} \triangleright_\emptyset \mathit{ln}_{b' \to b} \triangleright_{a,b} F_a)$. Indeed, it can be shown that it is possible to express an operator $\otimes^{i,j}_{I;J}$ that computes the cross product iteration strategy assuming that the left-hand side is encoded in the ports in the set $I$ at nesting depth $i$ and the right-hand side is in ports $J$ at depth $j$.
The next step in expressing $\mathit{IS}_{(a[1] \otimes b[1]) \odot c[2]}$ consists in combining the output of $\mathit{IS}_{a[1] \otimes b[1]}$ with that of port $c$. Note that this output encodes the iteration values at nesting depth 2. So we need to take the dot product of ports $a$, $b$ and $c$ at nesting depth 2, which can be done by nesting the iteration operator twice and applying it to the identity fragment: $\odot_{a,b,c \to a,b,c}(\odot_{a,b,c \to a,b,c}(\mathit{ln}_{a \to a} \triangleright_\emptyset \mathit{ln}_{b \to b} \triangleright_\emptyset \mathit{ln}_{c \to c}))$. Note that this indeed maps $\langle a = [[1,1], [2,2]], b = [[3,4], [3,4]], c = [[5,6], [7]]\rangle$ to $\langle a = [[1,1], [2]], b = [[3,4], [3]], c = [[5,6], [7]]\rangle$. Also here it is possible to express a general operator $\odot^{i}_{I;J}$ that computes the dot product iteration strategy assuming that the left-hand side is encoded in the ports in $I$, the right-hand side in ports $J$ and at depth $i$ for all ports. At the time of writing Taverna 2 indeed requires that at the point that the dot product is computed the nesting depth of the expected values is on both sides equal; a workflow graph in which this is not the case is considered incorrect. However, in future versions this might be taken care of by inserting extra list-wrapping operators that increase the nesting depth.
A complete iteration strategy annotated with positive nesting depth differences can be relatively straightforwardly translated using the operators $\otimes^{i,j}_{I;J}$ and $\odot^{i}_{I;J}$, e.g., $\mathit{IS}_{(a[1] \otimes b[1]) \odot c[2]}$ is translated to $\otimes^{1,1}_{a;b} \triangleright_{a,b,c} \odot^{2}_{a,b;c}$. If the nesting depth differences are negative then this is remedied by inserting extra wrapping operations that increase the nesting depth. For example, $\mathit{IS}_{a[-1] \otimes b[1]} = \mathit{merge}_{a \to a} \triangleright_a \mathit{IS}_{a[0] \otimes b[1]}$.
In the complete processor semantics, the iteration expression is followed by the actual iteration of the processor over the computed combinations. Observe that in this case this means that it iterates at nesting depth 2. For this we introduce the notation $\odot^{i}_{I \to J}(F)$ which indicates the iteration at nesting depth $i$ and is defined such that $\odot^{0}_{I \to J}(F) = F$ and $\odot^{i+1}_{I \to J}(F) = \odot_{I \to J}(\odot^{i}_{I \to J}(F))$. So in the example presented earlier, assuming that the body of the processor is represented by the expression $F' : \langle a : s, b : s, c : s\rangle \rightsquigarrow \langle d : [s]\rangle$, its total semantics becomes $\mathit{IS}_{(a[1] \otimes b[1]) \odot c[2]} \triangleright_{a,b,c} \odot^{2}_{a,b,c \to d}(F')$.
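The recursive definition of iteration at nesting depth i can be mirrored directly at the level of values. The sketch below ignores traces, error values and streaming, and treats the iterated fragment as a plain Python function; the names `dot_at_depth` and `project` are hypothetical helpers for this illustration.

```python
def dot_at_depth(depth, func, *values):
    """Apply func at the given nesting depth, pairing elements by position.

    Depth 0 applies func directly. Depth i + 1 takes the dot product over
    the outer lists and recurses, truncating to the shortest list.
    """
    if depth == 0:
        return func(*values)
    return [dot_at_depth(depth - 1, func, *items) for items in zip(*values)]


def project(k):
    """Return a function that selects its k-th argument (an identity on one port)."""
    return lambda *items: items[k]


# Output of the cross product of a and b, plus the offered value on port c.
a = [[1, 1], [2, 2]]
b = [[3, 4], [3, 4]]
c = [[5, 6], [7]]

streams = [dot_at_depth(2, project(k), a, b, c) for k in range(3)]
combinations_seen = dot_at_depth(2, lambda x, y, z: (x, y, z), a, b, c)
```

Here `streams` corresponds to the three output lists of the iteration strategy in the running example, and `combinations_seen` lists the tuples on which the processor body would be invoked, grouped by position.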
6. The dispatch stack
As mentioned in the introduction, one of the interesting features of the new Taverna 2 model is the extensibility of the
process activation sequence. In practice, each processor P in the graph is conﬁgured by means of a stack of execution layers
and the processing that takes place when the workﬂow execution reaches P depends on this conﬁguration. Each layer in
the stack receives a request from the layer above, it performs a certain function that transforms the request and then it
either forwards the request to the layer below or it “bounces” it back up the stack as a response. Taverna 2 comes with a
choice of prebuilt standard layers but the stack can also include custom layers of a user provided type. This gives a way to
implement additional control operators in Taverna 2.
Each processor's stack is configured independently from the others, using any of the available layers. This makes it possible to associate a different behavior with different nodes in the graph. The semantics of each processor's execution is completely defined by the semantics of each layer, along with the interaction amongst the layers in the stack.
The layers in the stack can be variously conﬁgured, but at the bottom is always the Invoke layer, which is responsible
for launching the execution of a basic activity and collecting its result. When the activity is a Web Service, for example, this
layer implements a client that initiates the service invocation, collects its result and detects error conditions. Other layers
perform additional tasks, intended mostly to provide quality of service to the execution. The $\mathit{Retry}_k$ layer, for example, accounts for the possibility that a service is temporarily unreachable; the Bounce layer returns a failure state without forwarding the request to the layer below it if the incoming request from the layer above contains an error; and the Failover layer tries several alternative activities in turn, hoping that one of them will succeed. We are going to describe the function
of each layer more precisely in the rest of the section. Speciﬁcally, we show how the behavior of each standard layer
can be formalized in terms of our trace semantics. We then provide a formal deﬁnition for two additional custom layers,
namely the Branch layer and the Loop layer, which can be used to add new control ﬂow operators to Taverna, an if-
then-else and a while-loop operator respectively. This shows how one can exploit the stack-based architecture to extend
the workﬂow deﬁnition language of Taverna 2 and furthermore how such extensions can be described within the trace
semantics framework.
6.1. Dispatch stack activities
In the following we describe the semantics of the dispatch stack in terms of activities, i.e., given a list of activities
A = [A1, . . . , An] and a dispatch stack S = [L1, . . . , Lm] we describe the behaviour of activity S(A). Later on in Section 7 we
show how the total behavior of a processor is described based on this, but in terms of fragments and taking the iteration
strategy into account.
The syntax for the dispatch stack layers is as follows:
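\[
\frac{A_1 : \iota \Rightarrow \kappa}{[\,]([A_1]) : \iota \Rightarrow \kappa}
\qquad
\frac{S(\mathcal{A}) : \iota \Rightarrow \kappa}{[\mathit{Retry}_k \mid S](\mathcal{A}) : \iota \Rightarrow \kappa}
\]
\[
\frac{\mathcal{A} = [A_1, \ldots, A_n] \quad \forall_{i=1..n}(S([A_i]) : \iota \Rightarrow \kappa)}{[\mathit{Failover} \mid S](\mathcal{A}) : \iota \Rightarrow \kappa}
\qquad
\frac{S(\mathcal{A}) : \iota \Rightarrow \kappa}{[\mathit{Bounce} \mid S](\mathcal{A}) : \iota \Rightarrow \kappa}
\]
\[
\frac{\mathcal{A} = [A_1, \ldots, A_n] \quad \forall_{i=1..n}(S([A_i]) : \iota \Rightarrow \kappa) \quad c \in \mathit{dom}(\iota)}{[\mathit{Branch}_c \mid S](\mathcal{A}) : \iota \Rightarrow \kappa}
\qquad
\frac{S(\mathcal{A}) : \iota \Rightarrow \iota \quad c \in \mathit{dom}(\iota)}{[\mathit{Loop}_c \mid S](\mathcal{A}) : \iota \Rightarrow \iota}
\]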
Here we adopt the notation [H|T ] to denote a list with head H and tail T . In the following we describe the behavior of
each layer in the same order.
Note that, if a stack does not include either a Failover or a $\mathit{Branch}_c$ layer, then its activity list must be a singleton, because the bottom Invoke layer requires exactly one activity. Note also that, when Failover or $\mathit{Branch}_c$ are present, the activity list can be of an arbitrary length, even empty, yet all activities in it will be of the same type.
6.2. Standard layers
The Invoke layer expects a single activity in the list, and executes it. Since it is always required as the last layer it is not
represented in the formal deﬁnition: its behavior is described by the behavior of the empty stack.
\[
\frac{S = [\,] \quad \mathcal{A} = [A_1] \quad A_1 \Downarrow \alpha}{S(\mathcal{A}) \Downarrow \alpha}
\]
The role of the Retryk layer is to submit the activities in its input list at most k times to the layers below it. The layer
returns with failure, if the layers below it in the stack fail all k times, and it returns with success as soon as one of the
attempts succeeds.
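$$\frac{S = [\mathit{Retry}_0|R] \qquad \mathit{in}^{*}(t) \Downarrow \alpha}{S(A) \Downarrow \alpha \cdot [\mathit{fail}]}
\qquad
\frac{S = [\mathit{Retry}_k|R] \qquad k > 0 \qquad R(A) \Downarrow \alpha \qquad \mathit{fail} \notin \alpha}{S(A) \Downarrow \alpha}$$

$$\frac{S = [\mathit{Retry}_k|R] \quad k > 0 \quad R(A) \Downarrow \alpha \cdot [\mathit{fail}] \quad S' = [\mathit{Retry}_{k-1}|R] \quad S'(A) \Downarrow \beta \quad \beta \in \beta_1 \mathbin{|||} \beta_2 \quad \alpha|_P = \beta|_P = \beta_1|}{S(A) \Downarrow \alpha \cdot \beta_2}$$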
The ﬁrst rule explicitly allows that 0 retries are speciﬁed, which means that the processor always fails after consuming
the whole input. The second rule describes a successful execution where the trace of the layers remaining in R does not
contain a failure and becomes the result trace. The third rule deals with the case where the trace of the layers remaining
in R contains a failure, i.e., equals α · [fail]. The resulting trace is obtained by removing the failure event and replacing it
with a new trace β2. This new trace is obtained by removing the input events from a possible trace β of S ′(A) where S ′ is
equal to S except the number of retries is decreased by one. The equation α|P = β|P guarantees that β includes the same
input events as α, albeit possibly in some different order. The part of β that is included in the resulting trace, namely β2,
contains all events in β except the input events. This is guaranteed by β|P = β1| which implies that all input events of β
are in β1 and therefore not in β2.
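The three Retryk rules can be read as a recursive procedure on traces. The sketch below is only an illustration under simplifying assumptions: events are encoded as tuples `("in", port, index, value)`, `("out", port, index, value)` or `("fail",)`, the callable `below` stands for the remaining layers R(A) and returns one possible trace per call, and a failing trace is assumed to end with the fail event. All names are hypothetical.

```python
from typing import Callable, List, Tuple

Event = Tuple                      # ("in", ...), ("out", ...) or ("fail",)
Trace = List[Event]
FAIL: Event = ("fail",)


def retry(k: int, below: Callable[[], Trace], inputs: Trace) -> Trace:
    """One possible trace of [Retry_k | R](A); `below()` yields a trace of R(A)."""
    if k == 0:
        # Rule 1: consume the whole input, then fail.
        return inputs + [FAIL]
    alpha = below()
    if FAIL not in alpha:
        # Rule 2: the layers below succeeded, their trace is the result.
        return alpha
    # Rule 3: drop the failure event and continue with one retry fewer.
    beta = retry(k - 1, below, inputs)
    # beta2 is beta without its input events, which alpha already contains.
    beta2 = [event for event in beta if event[0] != "in"]
    return alpha[:-1] + beta2
```

Filtering the input events out of β corresponds to choosing the interleaving β ∈ β1 ||| β2 in which β1 holds exactly the input events.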
The role of the Failover layer is to try and submit each activity in the input list to the layer below. It returns with
success when an activity succeeds, and it fails when all activities fail. This is similar to the Retryk layer, except that the
number of attempts is determined by the length of the list A, rather than by a user-deﬁned parameter k.
In principle, this layer may implement one of a number of strategies to prioritize the activities. We illustrate the simple
model where the activities are tried in the order in which they appear in the list.
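$$\frac{S = [\mathit{Failover}|R] \qquad A = [\,] \qquad \mathit{in}^{*}(t) \Downarrow \alpha}{S(A) \Downarrow \alpha \cdot [\mathit{fail}]}
\qquad
\frac{S = [\mathit{Failover}|R] \qquad A = [A_1|B] \qquad R([A_1]) \Downarrow \alpha \qquad \mathit{fail} \notin \alpha}{S(A) \Downarrow \alpha}$$

$$\frac{S = [\mathit{Failover}|R] \quad A = [A_1|B] \quad R([A_1]) \Downarrow \alpha \cdot [\mathit{fail}] \quad S(B) \Downarrow \beta \quad \beta \in \beta_1 \mathbin{|||} \beta_2 \quad \alpha|_P = \beta|_P = \beta_1|}{S(A) \Downarrow \alpha \cdot \beta_2}$$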
The rules for the Failover layer are analogous to those for the Retryk layer.
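The analogy with Retryk becomes visible when the Failover rules are written as a recursion over the activity list instead of over a counter. As before, this is a sketch under assumptions: events are tuples tagged `"in"`, `"out"` or `"fail"`, activities are represented by hypothetical names, and `below(a)` returns one trace of R([a]).

```python
from typing import Callable, List, Sequence, Tuple

Event = Tuple
Trace = List[Event]
FAIL: Event = ("fail",)


def failover(activities: Sequence[str],
             below: Callable[[str], Trace],
             inputs: Trace) -> Trace:
    """One possible trace of [Failover | R](A), trying activities in list order."""
    if not activities:
        # Empty list: consume the input and fail.
        return inputs + [FAIL]
    head, rest = activities[0], activities[1:]
    alpha = below(head)
    if FAIL not in alpha:
        # The head activity succeeded.
        return alpha
    # The head activity failed: continue with the tail of the list and
    # keep only the non-input events of the continuation.
    beta = failover(rest, below, inputs)
    return alpha[:-1] + [event for event in beta if event[0] != "in"]
```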
The Bounce layer returns a failure state without forwarding the request to the layer below it, if the incoming request
from the previous layer contains an error. Otherwise it returns the value provided by the layer below it.
$$\frac{S = [\mathit{Bounce}|R] \qquad \mathit{in}^{*}(t) \Downarrow \alpha \qquad \exists_{a,\lambda}\,(\mathit{in}_a(\lambda, \mathit{err}) \in \alpha)}{S(A) \Downarrow \alpha \cdot [\mathit{fail}]}
\qquad
\frac{S = [\mathit{Bounce}|R] \qquad R(A) \Downarrow \alpha \qquad \neg\exists_{a,\lambda}\,(\mathit{in}_a(\lambda, \mathit{err}) \in \alpha)}{S(A) \Downarrow \alpha}$$

The left rule deals with the case where the value on some input port a contains an error at some index λ. In this case,
the request is bounced off and a fail event is appended to the complete input. The right rule deals with the complementary
case where none of the input values is an error or contains an error as one of its subvalues. The resulting trace is then
the complete trace of the rest of the stack R(A).
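The two Bounce rules amount to a guard on the input events. A minimal sketch, assuming input events are tuples `("in", port, index, value)`, the error value is the string `"err"`, and the callable `below` produces a trace of R(A); these encodings are illustrative only.

```python
from typing import Callable, List, Tuple

Event = Tuple
Trace = List[Event]
FAIL: Event = ("fail",)
ERR = "err"                        # assumed encoding of the error value


def bounce(below: Callable[[], Trace], inputs: Trace) -> Trace:
    """One possible trace of [Bounce | R](A) for the given input events."""
    if any(event[0] == "in" and event[3] == ERR for event in inputs):
        # Left rule: some input event carries an error, so the request is
        # bounced without consulting the layers below.
        return inputs + [FAIL]
    # Right rule: no error on any input, the result is the trace of R(A).
    return below()
```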
6.3. Extension layers
So far we have described the layers that comprise the standard conﬁguration of the dispatch stack. A powerful feature
of Taverna 2 is its ability to accept new layers as part of a stack’s conﬁguration. As each processor’s layer is conﬁgured
independently from the others, new layers can be selectively added to provide speciﬁc processors with new functionality.
Here we describe two such layers, which add control operator functionality to Taverna 2.
The ﬁrst of the two, the Loopc layer, is used to implement a conditional loop processor P . We assume that the input and
output ports of P are identical and include one distinguished port named c. If the layer receives a request with an interface
value t where t(c) is true, then it forwards the request to the next layer. If this returns a result then it feeds this back as
a request to itself, but if there is no result it returns failure. If however in the request t(c) is not true, then the input is
immediately returned unchanged.
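$$\frac{S = [\mathit{Loop}_c|R] \quad \alpha = \alpha_1 \cdot [\mathit{in}_c(\epsilon, v)] \cdot \alpha_2 \quad v \neq \mathit{true} \quad \alpha \in |||_{i=1..n}([\mathit{in}_{a_i}(\lambda_i, v_i), \mathit{out}_{a_i}(\lambda_i, v_i)]) \quad \alpha_1| = \alpha_1|_P}{S(A) \Downarrow \alpha}$$

$$\frac{S = [\mathit{Loop}_c|R] \quad R(A) \Downarrow \alpha \quad \alpha = \alpha_1 \cdot [\mathit{in}_c(\epsilon, \mathit{true})] \cdot \alpha_2 \quad \alpha_1| = \alpha_1|_P \quad S(A) \Downarrow \beta \quad \gamma \in \alpha \circ_P \beta}{S(A) \Downarrow \gamma}$$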
The ﬁrst rule covers the case where the value provided on the port c is not true. The operation described by the result
trace α ∈ |||i=1..n([inai (λi, vi),outai (λi, vi)]) is simply copying the input to the output, possibly with some buffering. Note that we
do not allow output events in α1, i.e., before we know that the value on port c is not the boolean true.
In the second rule α describes the initial iteration of the loop, which occurs because a boolean true is provided
on port c. Here, too, we do not allow output events in α1. The result trace of the whole iteration is obtained by
composing α with β, the trace of the remaining iterations of the loop, on the whole domain of input ports P ,
using the trace composition operator ◦I from Section 4.1.
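Operationally, the Loopc layer behaves like a while-loop over interface values. The sketch below abstracts from traces and works on requests directly, assuming a request is a dictionary from port names to values and that `below` returns the result of the layers beneath, or `None` on failure. The `max_iterations` guard is our own addition to keep the sketch terminating and is not part of the model.

```python
from typing import Callable, Dict, Optional

Request = Dict[str, object]


def loop(c: str,
         below: Callable[[Request], Optional[Request]],
         request: Request,
         max_iterations: int = 1000) -> Optional[Request]:
    """Feed the result of the layers below back as a request while t(c) is true."""
    for _ in range(max_iterations):
        if request[c] is not True:
            # First rule: the value on port c is not the boolean true,
            # so the input is returned unchanged.
            return request
        result = below(request)
        if result is None:
            # No result from the layers below: the loop layer fails.
            return None
        # Second rule: the output becomes the next request.
        request = result
    raise RuntimeError("loop did not terminate within max_iterations")
```

Because the input and output ports of the processor are identical, the result of one iteration can serve directly as the request for the next.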
The second of the two custom layers enables workﬂow designers to create If-then-else processors. The layer’s behavior
is similar to that of the Loop layer, in that it assumes a conditional value is bound to a particular variable c, i.e., (c, i) ∈ t .
This time the value, a positive integer in the range 1 . . . |A|, is used to select the ith element A(i) of list A. This activity is
forwarded to the layer below for execution. A value that is out of range results in a failure.
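$$\frac{S = [\mathit{Branch}_c|R] \qquad \alpha| = \alpha|_P \qquad \mathit{in}_c(\epsilon, v) \in \alpha \qquad v \notin \{1, \ldots, |A|\}}{S(A) \Downarrow \alpha \cdot [\mathit{fail}]}$$

$$\frac{S = [\mathit{Branch}_c|R] \quad v \in \{1, \ldots, |A|\} \quad R([A(v)]) \Downarrow \alpha \quad \alpha = \alpha_1 \cdot [\mathit{in}_c(\epsilon, v)] \cdot \alpha_2 \quad \alpha_1| = \alpha_1|_P}{S(A) \Downarrow \alpha}$$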
The ﬁrst rule deals with the case where the value v provided on port c is out of range. The resulting trace contains all
the input events followed by failure. Note that we allow for the input event for port c to be followed by some other input
events. The second rule deals with the case that the value v provided on port c is in the correct range. The result trace is
then a valid trace for R([A(v)]), i.e., the remaining layers and the vth activity, but we require that in the part of the trace
before the input event for port c, that is in α1, there are no other events except input events.
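The selection performed by the Branchc layer can be sketched in the same operational style, again assuming that a request is a dictionary from port names to values and that `below(activity, request)` returns the result of running the selected activity through the remaining layers, with `None` standing for failure. The names are illustrative.

```python
from typing import Callable, Dict, Optional, Sequence

Request = Dict[str, object]


def branch(c: str,
           activities: Sequence[str],
           below: Callable[[str, Request], Optional[Request]],
           request: Request) -> Optional[Request]:
    """Select the v-th activity (1-based), where v is the value on port c."""
    v = request[c]
    # bool is a subclass of int in Python, so booleans are excluded explicitly.
    in_range = (isinstance(v, int) and not isinstance(v, bool)
                and 1 <= v <= len(activities))
    if not in_range:
        # First rule: a value out of range results in failure.
        return None
    # Second rule: forward only the selected activity A(v).
    return below(activities[v - 1], request)
```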
7. Translating Taverna 2 processor speciﬁcations
This section explains how to represent the behavior of a single processor as a calculus expression. Recall that
Section 4.2 showed how to represent a workflow graph where the behavior of the individual processors was
represented by abstract expressions such as FP , FQ and FR . Here we explain how these expressions are constructed from
the processors' specification.
The complete speciﬁcation of a processor’s behavior in Taverna 2 consists of the following components: (1) the expected
types on each of its input ports, (2) the iteration strategy expression, (3) the dispatch stack S and (4) a list of activity
speciﬁcations. In the default conﬁguration, the stack S is of the form S = [Bounce,Failover,Retryk]. This has the effect that
a request submitted to the processor succeeds if at least one of the activities in A succeeds, possibly after failing at most
k − 1 times, and it fails otherwise. The additional extension layers, like the Loopc layer, are typically placed at the top of
the stack. The list of activity speciﬁcations can contain basic activity names and workﬂow graphs.
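The combined effect of the default stack [Bounce, Failover, Retryk] can be pictured as three nested checks. The following operational sketch makes several assumptions: activities are Python callables that return a result dictionary or `None` on failure, input values are atomic values or nested lists, and the error value is a sentinel object `ERR`. None of these encodings come from Taverna 2 itself.

```python
from typing import Callable, Dict, Optional, Sequence

ERR = object()                     # assumed sentinel for the error value
Request = Dict[str, object]
Activity = Callable[[Request], Optional[Request]]


def contains_error(value: object) -> bool:
    """True if the value is an error or contains one as a subvalue."""
    if isinstance(value, list):
        return any(contains_error(v) for v in value)
    return value is ERR


def run_default_stack(activities: Sequence[Activity],
                      request: Request,
                      k: int) -> Optional[Request]:
    # Bounce: refuse requests whose inputs carry an error.
    if any(contains_error(v) for v in request.values()):
        return None
    # Failover: try the activities in list order.
    for activity in activities:
        # Retry_k: give each activity at most k attempts.
        for _ in range(k):
            result = activity(request)
            if result is not None:
                return result
    return None
```

The nesting order mirrors the order of the layers in the stack: Retryk sits below Failover, so every activity is attempted up to k times before the next one is considered.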
The construction of the corresponding calculus expression is done in three steps. In the ﬁrst step we construct an
expression for the core semantics which describes the behavior of the dispatch stack and the activities list. In the second
step we extend this expression such that it describes the step semantics which describes the behavior of a single iteration
step of the processor. In the last step the expression is again extended such that it describes the iteration semantics by
incorporating the iteration strategy and describing how the processor iterates over the provided input values.
We begin with the list of activity specifications, which is translated to a list of activity expressions A.
The basic activity names are simply translated to themselves. Each workﬂow graph is represented by a fragment expression
that gives the semantics of this workﬂow graph, as explained in Section 4.2. The core semantics of the processor is then
described by the activity expression S(A).
The core semantics falls short of describing the step semantics of a processor, i.e., the semantics for a single iteration
step, in two ways. The ﬁrst is that it might still fail, and this can be remedied by applying the fragment operator fr J ()
which replaces failure with the production of error values. The second is that a processor always waits with executing an
iteration step until all the input values for that step have completely arrived. To represent this we introduce the input
synchronization operator:
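$$\frac{F : \iota \to \kappa}{\mathit{sync}(F) : \iota \to \kappa}
\qquad
\frac{F \Downarrow \alpha \qquad \alpha \in \alpha_1 \mathbin{|||} \alpha_2 \qquad \alpha_1| = \alpha|_P = \beta_1|}{\mathit{sync}(F) \Downarrow \beta_1 \cdot \alpha_2}$$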
We then can represent the step semantics of a processor as sync(fr J (S(A))) where J is the set of output ports of the
processor.
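The effect of the sync operator on a single trace can be illustrated as a reordering that moves all input events to the front. This sketch assumes events are tuples whose first component is `"in"`, `"out"` or `"fail"`, and it picks one particular β1, namely the input events in their original order, whereas the rule permits any ordering of them.

```python
from typing import List, Tuple

Event = Tuple
Trace = List[Event]


def sync(alpha: Trace) -> Trace:
    """Return beta1 . alpha2: all input events first, then everything else."""
    beta1 = [event for event in alpha if event[0] == "in"]
    alpha2 = [event for event in alpha if event[0] != "in"]
    return beta1 + alpha2
```

A pipelined trace in which outputs are produced between inputs is thus turned into one where the step starts only after all of its input has arrived.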
We now proceed with representing the iteration semantics of a processor. An iteration strategy that is annotated with
nesting depth differences can be expressed by an expression ISis , as discussed in Section 5.2. The nesting depth difference
can be computed from the expected type, which is given, and the offered type, which can be relatively straightforwardly
computed. For example, if the input port is connected to an output port of a preceding processor, then it is the produced
output type of that output port, and if it is connected to a workﬂow input then it is the expected type of that workﬂow
input. Assuming that in the result of ISis the expected values are found at nesting depth i, the full semantics of the processor
can be described as ISis I iI→ J (sync(fr J (S(A)))) where I and J are the set of input ports and output ports, respectively,
of the processor.
Together with the translation given in Section 4.2 for the workﬂow graph that contains the processors, this demonstrates
how the semantics of a complete Taverna 2 workﬂow graph can be represented as a fragment expression in the calculus.
As an example consider again the workﬂow graph in Fig. 5 in Section 4.2. Recall that the whole workﬂow graph could be
represented as F7 = (((((lna→e ∅ lnb→ f ) e, f FP) g lng→h, j) h FQ)  j FR) i,k mergei,k→d . Now assume that we know for
processor P that its dispatch stack is the default one, i.e., S = [Bounce,Failover,Retry4], and its list of activities contains
one basic service viz. s1. In that case the core activity of the processor is given by S(A) = [Bounce,Failover,Retry4]([s1]).
Assume that for P the speciﬁed iteration strategy is (e  f ) and that the types of the workﬂow inputs a and b are [s]
and [s], and the expected input types for the ports e and f of service s1 are s and s, respectively. It then follows that
for P for ports e and f the offered types are [s] and [s], respectively, and the expected types are s and s, respectively.
So the annotated iteration strategy is (e[1] f [1]) and its corresponding calculus expression is expressed by 1e; f which
simply copies the input lists from the input port e to the output port e and the same for the input port f and the output
port f . Since this encodes the iteration values at nesting depth 1 the full representation of the processor semantics becomes
FP = 1e, f e, f 1e, f→g(sync(frg([Bounce,Failover,Retry4]([s1])))).
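The annotation step in this example, which derives the nesting depth difference for each port from its offered and expected type, can be sketched as follows. We assume here that types are written as strings such as `s`, `[s]` or `[[s]]`; this textual encoding and the function names are ours and serve only as an illustration.

```python
from typing import Dict


def depth(port_type: str) -> int:
    """Nesting depth of a type written like 's', '[s]' or '[[s]]'."""
    d = 0
    while port_type.startswith("[") and port_type.endswith("]"):
        port_type = port_type[1:-1]
        d += 1
    return d


def annotate(offered: Dict[str, str], expected: Dict[str, str]) -> Dict[str, int]:
    """Nesting depth difference (offered minus expected) for every input port."""
    return {port: depth(offered[port]) - depth(expected[port])
            for port in expected}
```

For processor P, `annotate({"e": "[s]", "f": "[s]"}, {"e": "s", "f": "s"})` gives a difference of 1 for both ports, matching the annotations e[1] and f[1] above.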
A feature that was not discussed is the control link between two processors, which indicates that the sink processor
cannot start execution before the source processor, also called the controlling processor, has produced all its output. This
can be simulated by adding dummy input ports to the sink processor for each of the output ports of the source processor
and connecting them. The expected types of these dummy input ports should be exactly the types produced by the corresponding
output ports, taking into account the offered types and iteration strategy for the source processor. Such
dummy input ports can be added by wrapping the processor in a nested workflow that has an empty dispatch stack and
contains only this processor, and adding the dummy ports as extra unconnected workflow inputs. These unconnected input
ports can be represented by a special dummy input port operator duma which deﬁnes a fragment with a single input port a
of port type τ and no output ports. Its behavior is simply that it consumes the input values from port a and nothing else.
Formally its syntax and semantics is deﬁned by:
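$$\mathit{dum}_a : \langle a : \tau \rangle \to \langle\rangle
\qquad
\frac{\mathit{in}^{*}(\langle a = v \rangle) \Downarrow \alpha}{\mathit{dum}_a \Downarrow \alpha}$$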
Besides representing unconnected workflow inputs, this operator also allows us to represent workflow graphs where certain
output ports are not connected to any input port or workﬂow output. In this case the output port can be connected to the
input port of a dummy input port operator.
8. Conclusion and further research
We have presented in this paper a calculus with trace semantics to deﬁne the formal semantics of Taverna 2 workﬂow
graphs. This calculus consists of a limited set of operators that each capture a speciﬁc aspect of the execution of workﬂows
represented in these workﬂow graphs. Speciﬁcally they allow the description of the iteration strategies, the pipe-lined
execution of processors, and the dispatch stack mechanism that is used in Taverna 2 to conﬁgure the speciﬁc behavior of a
processor. Although the formal deﬁnition of the semantics of the presented operators is far from trivial, we claim that their
intuitive meaning is not hard to understand and can help to create insight into the exact meaning of Taverna 2 workﬂow
graphs. Moreover, we think it can serve as a starting point for determining whether the behavior of two different workﬂow
graphs is identical.
Although the formalisation of properties like parallel and pipelined execution is complete, aspects of the deﬁnition of
workﬂow graphs remain to be formalized, in particular their complete translation into calculus expressions remains the
subject of future research. This would include the complete algorithms for deriving the offered types for processor input
ports, the produced types for processor output ports and the complete translation of an annotated iteration strategy, which
were all only described informally here. The translation of control links was likewise only described informally and
will be formalized in more detail in future work.
Acknowledgments
The authors would like to acknowledge the support of all members of the myGrid team, and are especially grateful to
Stian Soiland-Reyes and Stuart Owen for providing precious technical insight into the Taverna model.
References
[1] W.M.P. van der Aalst, The application of Petri nets to workﬂow management, J. Circuits Systems Comput. 8 (1) (1998) 21–66.
[2] Alexandre Alves, Assaf Arkin, Sid Askary, Ben Bloch, Francisco Curbera, Yaron Goland, Neelakantan Kartha, Sterling, Dieter König, Vinkesh Mehta, Satish
Thatte, Danny van der Rijn, Prasad Yendluri, Alex Yiu, Web Services Business Process Execution Language Version 2.0, OASIS Committee Draft, May
2006.
[3] Assaf Arkin, Sid Askary, Scott Fordin, Wolfgang Jekeli, Kohsuke Kawaguchi, David Orchard, Stefano Pogliani, Karsten Riemer, Susan Struble, Pal Takacsi-
Nagy, Ivana Trickovic, Sinisa Zimek, Web service choreography interface (wsci) 1.0, World Wide Web Consortium, Note NOTE-wsci10-20020808, August
2002.
[4] P. Fisher, C. Hedeler, K. Wolstencroft, H. Hulme, H. Noyes, S. Kemp, R. Stevens, A. Brass, A systematic strategy for large-scale analysis of genotype
phenotype correlations: Identification of candidate genes involved in African trypanosomiasis, Nucleic Acids Research 35 (16)
(August 2007) 5625–5633.
[5] A. Goderis, C. Brooks, I. Altintas, E. Lee, C. Goble, Composing different models of computation in kepler and ptolemy ii, in: International Conference on
Computational Science, in: Lecture Notes in Comput. Sci., Springer, Berlin/Heidelberg, 2007, pp. 182–190.
[6] Object Management Group, Business Process Modeling Notation (BPMN), Version 1.2, OMG Document Number: Formal/2009-01-03, Standard document
URL: http://www.omg.org/spec/BPMN/1.2, January 2009.
[7] J. Hidders, J. Sroka, Towards a calculus for collection-oriented scientiﬁc workﬂows with side effects, in: OTM Conferences (1), 2008, pp. 374–391.
[8] Jan Hidders, Natalia Kwasnikowska, Jacek Sroka, Jerzy Tyszkiewicz, Jan Van den Bussche, DFL: A dataﬂow language based on petri nets and nested
relational calculus, Inf. Syst. 33 (3) (2008) 261–284.
[9] D. Hull, K. Wolstencroft, R. Stevens, C.A. Goble, M.R. Pocock, P. Li, T. Oinn, Taverna: A tool for building and running workﬂows of services, Nucleic Acids
Research, 34 (Web-Server-Issue), 2006, pp. 729–732.
[10] Nickolas Kavantzas, David Burdett, Greg Ritzinger, Tony Fletcher, Yves Lafon, Charlton Barreto, Web services choreography description language version
1.0, World Wide Web Consortium, Candidate Recommendation CR-ws-cdl-10-20051109, November 2005.
[11] Frank Leymann, Web services ﬂow language (WSFL 1.0), Technical report, IBM, May 2001.
[12] P. Li, J. Castrillo, G. Velarde, I. Wassink, S. Soiland-Reyes, S. Owen, D. Withers, T. Oinn, M. Pocock, C. Goble, S. Oliver, D. Kell, Performing statistical
analyses on quantitative data in taverna workﬂows: An example using R and maxdBrowse to identify differentially-expressed genes from microarray
data, BMC Bioinformatics 9 (334) (2008), August.
[13] B. Ludäscher, I. Altintas, C. Berkley, Scientiﬁc workﬂow management and the Kepler system, in: Concurrency and Computation: Practice and Experience,
Special Issue on Scientiﬁc Workﬂows, 2005.
[14] T. McPhillips, S. Bowers, B. Ludäscher, Collection-oriented scientiﬁc workﬂows for integrating and analyzing biological data, in: Proceedings 3rd Inter-
national Conference on Data Integration for the Life Sciences (DILS), LNCS/LNBI, Springer, 2006.
[15] T. McPhillips, S. Bowers, D. Zinn, B. Ludäscher, Scientiﬁc workﬂow design for mere mortals, Future Generation Computer Systems 25 (5) (2009) 541–
551.
[16] Robin Milner, Communicating and Mobile Systems: The Pi-Calculus, Cambridge University Press, 1999, June.
[17] T. Oinn, M. Addis, J. Ferris, D. Marvin, M. Senger, M. Greenwood, T. Carver, K. Glover, M.R. Pocock, A. Wipat, P. Li, Taverna: A tool for the composition
and enactment of bioinformatics workﬂows, Bioinformatics 20 (17) (2004) 3045–3054.
[18] Chun Ouyang, Eric Verbeek, Wil M.P. van der Aalst, Stephan Breutel, Marlon Dumas, Arthur H.M. ter Hofstede, Formal semantics and analysis of control
ﬂow in WS-BPEL, Sci. Comput. Program. 67 (2–3) (2007) 162–198.
[19] C. Pautasso, G. Alonso, Parallel computing patterns for grid workﬂows, in: Proc. of the HPDC2006 Workshop on Workﬂows in Support of Large-Scale
Science (WORKS06), Paris, France, 2006.
[20] Benjamin C. Pierce, David N. Turner, Pict: A programming language based on the pi-calculus, in: G. Plotkin, C. Stirling, M. Tofte (Eds.), Proof, Language
and Interaction: Essays in Honour of Robin Milner, MIT Press, 2000.
[21] Wolfgang Reisig, Petri Nets: An Introduction, Springer-Verlag New York, NY, USA, 1985.
[22] D. Smedley, S. Haider, B. Ballester, R. Holland, D. London, G. Thorisson, A. Kasprzyk, Biomart – biological queries made easy, BMC Genomics 10 (22)
(2009).
[23] D. Turi, P. Missier, D. De Roure, C. Goble, T. Oinn, Taverna workﬂows: Syntax and semantics, in: Proceedings of the 3rd e-Science Conference, Bangalore,
India, December 2007.
[24] W.M.P. van der Aalst, A.H.M. ter Hofstede, YAWL: Yet another workflow language, Inf. Syst. 30 (4) (2005) 245–275.
[25] Wil M.P. van der Aalst, Arthur H.M. ter Hofstede, Bartek Kiepuszewski, Alistair P. Barros, Workﬂow patterns, Distrib. Parallel Databases 14 (1) (2003)
5–51.
[26] R. van Glabbeek, The linear time-branching time spectrum, in: Proceedings on Theories of Concurrency: Uniﬁcation and Extension, Springer-Verlag,
1990, pp. 278–297.
[27] M. Weidlich, G. Decker, M. Weske, Eﬃcient analysis of BPEL 2.0 processes using π -calculus, in: Asia-Paciﬁc Service Computing Conference, The 2nd
IEEE, December 2007, pp. 266–274.
