This paper presents a formal semantics for the Taverna 2 scientific workflow system. Taverna 2 is a successor to Taverna, an open-source workflow system broadly adopted within the e-science community worldwide. The new version improves upon the existing model in two main ways: (i) by adding support for data pipelining, which in turn enables input streams of indefinite length to be processed efficiently; and (ii) by providing new extensibility points that make it possible to add new operators to the workflow model. Consistent with previous work by some of the authors, we use trace semantics to describe the effect of workflow computations, and we show how they can be used to describe the new features in the Taverna 2 model.
Introduction
Taverna [9] is a workflow language and computation model designed to support the automation of complex, service-based processes, specifically in the domain of science. Version 1 was first described in 2004 [17] and has since proven to be a useful tool for scientists, mainly in bioinformatics [4, 12]. The formal semantics of Taverna was described first by Turi et al. [23], and then by Hidders et al. [7]. The latter in particular uses trace semantics to take into account possible side-effects that are associated with the services.
As a language, Taverna is strongly user-focused, as it attempts to strike a balance between expressivity of its operators and simplicity of use, with the ultimate goal of empowering users, who have only a rudimentary understanding of programming, to assemble complex workflows. The re-design of the workflow engine, which is now called "Taverna 2", reflects the need to reassess this balance as a result of years of practical usage, and of changing requirements. This paper presents a new trace semantics for Taverna 2, following a similar approach to that of Hidders et al. [7] . We specifically formalise some of the model's new features, namely (i) support for data streaming, through pipelined execution of workflows; and (ii) support for extensibility of the set of workflow operators. Despite the fact that techniques for exploiting implicit data parallelism in workflows have been known for a long time [25] , pipelining was not available in Taverna 1, resulting in a limitation in the scalability of applications where data consists of large collections.
Although we describe these features formally, and characterize the types of parallelism that are available in Taverna, our presentation is focused entirely on the development of the theoretical model, while a quantitative performance analysis of the workflow engine is beyond the scope of this paper.
We also pay special attention to Taverna's support of repeated processor execution by iteration over lists. This feature is important for scientific workflows and is shared among some other workflow systems. Notably, one of the computation models in the Kepler system [13] , namely the COMAD model, supports streaming of data collections by letting workflow processors manipulate XML documents [14] . This has been recently advocated as a way to better meet the requirements of non-expert users [15] .
The importance of defining formal models of various types of workflow computation has been recognised for a long time [1], including a formal model of Kepler and its precursor, the Ptolemy system [5]. Such models can be used to support the design of workflow languages and of their interpreters, compilers and optimizers as well as of debuggers, and to support the definition of verification procedures, similar to those used for verifying the correctness of complex business transactions. In the case of Taverna, a system whose design was largely driven by the practical need to support process automation for a variety of user types, such formalisation is even more important, in that it provides workflow designers, developers and users with an a posteriori description of the system's behaviour that is both unambiguous and complete.
The first question that needs to be answered before a formal semantics can be defined is what it is that we would like the semantics to describe about the specified workflows and at what abstraction level. It can be argued, for example, that from the point of view of the user a Taverna workflow just describes a computation that maps a set of input values to a set of output values. In that case the semantics of a Taverna workflow would be a non-deterministic function that maps the input values to the output values, or in other words, if two workflows express the same mapping then they are equivalent as far as the user is concerned. This is for example the perspective that was taken in earlier work by Turi et al. [23] . However, in this paper we assume that in some cases the user does care about which steps are executed by the computation and in what order. This is for example the case if these are calls to functions with side effects, e.g., an update to a database, or calls to web services that require a certain protocol. In this case the formalism should not just indicate to which output the input is mapped, but also which steps are taken to do so, and hence a process formalism is more appropriate. We will assume that the relevant events that are to be described are the consumption of the input values, the invocation of functions that have side effects or call external web services and the production of output values.
There are many formalisms and languages already available to describe processes and in particular processes that compose web services. Many of these have a syntax and semantics based on either the π-calculus [16] or on Petri nets [21].
Examples based on the π-calculus are WS-CDL [10], WSCI [3] and XLANG; examples based on Petri nets are BPMN [6], YAWL [24] and WSFL [11]. A somewhat special case is BPEL [2], which is based on both XLANG and WSFL. Its semantics is at least partially describable in terms of both Petri nets, e.g., [18], and π-calculus, e.g., [27], and it is explicitly aimed at defining an executable process specification. Despite the existence of the previously described range of languages and formalisms, we have chosen to define a specific tailored set of operators for describing the semantics of Taverna 2 in terms of traces. We have done so for several reasons:
Observational equivalence. The main goal of the presented formalism is to define when two workflow specifications are equivalent, and to allow reasoning over what can and cannot be expressed in the Taverna 2 workflow language. This requires the chosen language to have a complete formal semantics that indeed describes when two workflows are observationally equivalent. Note that for formalisms that are based on transition systems, such as Petri nets and π -calculus, this means that next to the transition system an equivalence relation also needs to be defined. See for example [26] for a range of such equivalence relations and an explanation of when they are appropriate depending on the type of processes that are being described. In the case of workflow enactment such as defined in Taverna, it is not hard to see that trace semantics is the correct equivalence relation since the user can observe the events, i.e., the calls to services, but cannot influence them.
As a consequence, formalisms based on bisimulation semantics, such as π -calculus, are less suitable or at least require a proof that this does not lead to incorrect equivalencies. It will also be clear that a formalism that directly defines the set of possible traces, rather than indirectly through Petri nets which in turn define sets of traces, will be easier to understand and reason over.
Taverna 2 specific features. Typically, Taverna workflows involve computations on recursively nested lists, i.e., list composition and iteration. Although extensions exist of π -calculus, e.g., Pict [20] , and of Petri nets, e.g., DFL [8] , that can describe this, these require major extensions of the basic formalism. Also here we claim that a tailored formalism that directly defines the traces for such behavior is more compact and easier to comprehend. A similar argument can be made for the two new features in Taverna 2 that are the focus of this paper: the pipelined execution of processors and the dispatch stack which defines an extensible mechanism to alter the behavior of individual processors.
The rest of the paper is organised as follows. We present an informal overview of the Taverna model and the calculus in Section 2, followed by a definition of trace, in Section 3. Traces are then used in Section 4 to describe the semantics of workflow graphs, in Section 5 to formalise Taverna's iteration strategies, and again in Section 6 to describe the extensibility features of the model. Finally, in Section 7 we describe a translation of a Taverna 2 specification into our calculus, and we conclude in Section 8.
Overview of Taverna and introduction to the calculus
Informally, a Taverna workflow graph is a directed acyclic graph where nodes, called processors, represent software components, for instance Web services or local scripts, with an interface that consists of input and output ports. Arcs in the graph connect pairs of ports, and specify a data dependency from the output port of one processor to the input port of another. Additionally, a control link between a source and a sink processor can be used to specify that the sink processor cannot execute until the source processor has produced all its output. A simple example of a Taverna workflow for bioinformatics is illustrated in full detail notation with processor and port names in Fig. 1(a). The workflow takes a user-supplied list of genes, in the format expected by the KEGG database, and it retrieves information about the metabolic pathways in which the input genes are involved, according to the database.
It uses the KEGG Web Service to fetch the data, as well as a number of ancillary scripts, known as "shims", as adapters between the output of a service and the input of the next. Workflow computation is mostly data-driven: when there are no control links, a processor is ready to execute as soon as all of its input ports are bound to a value, and its execution causes its output ports to be bound to the result values. These values are then propagated along any arcs that originate in any of the output ports to create new bindings downstream. In particular, workflow execution begins when the workflow inputs are bound to initial input values, e.g., Kegg_gene_id_list in the example, and terminates when output values cannot be propagated any further down the graph. In the example, the final results are bound to the output values gene_descriptions, pathway_descriptions, and pathway_by_genes.
Note that processors need not have any input ports, for instance regex in the figure, which is used to produce a constant value. Similarly, a processor may have no output ports, and would only be executed for its side effects. Note also that the separator input ports of the merge_descriptions, merge_pathways and merge_pathways_2 processors have no incoming arcs. For such input ports a default value is provided, but this is not visible in the graphical notation.
With processors we associate activities, which define the core functionality of the processor. These can be either basic activities, i.e., representing a local program or script or the invocation of an external service, or workflows themselves, i.e., nested workflows, as shown in Fig. 1(b) where processor P2 contains a workflow with processors P4 and P5. All activities are characterized by a pair of interfaces that are both a set of port names. Basic activities in particular perform some kind of computation using the values on their input ports and produce new values on the output ports.
Let B be a countably infinite set of basic activity names and P a countably infinite set of port names. For example, with reference to Fig. 1(a), Kegg_query ∈ B, and string, return, attachmentList ∈ P. An interface is a finite set I ⊂ P. We use variants of the variables I and J to denote interfaces. Interfaces describe the input and output ports of activities (including sub-workflows). [s]. Note that lists can be nested, as for example Y for processor P4 in Fig. 2. We will use variants of the variables τ, σ and ρ to denote port types.
Definition 2 (Interface type).
An interface type is defined as a partial function ι : P → T that is defined for a finite subset of P. Such an interface type is denoted as a 1 : τ 1 , . . . , a n : τ n , for example return : [s], attachmentList : [s] , and the empty type is allowed in order to accommodate processors with no input (see for instance processor regex, which produces a constant value), and is denoted as . The set of all interface types is denoted as Θ.
We will use variants of the variables ι and κ to denote interface types. The disjoint union of two interface types ι and κ is denoted as ι, κ. For partial functions and other binary relations t such as interface types the domain of t is defined as dom(t) = {x | (x, y) ∈ t}. We assume that for all basic activity names f ∈ B , an input interface type ι and output interface type κ are given, which is denoted as f : ι → κ, for 
If $w = [w_1, \ldots, w_n]$ then we write $a \in w$ to denote that a appears in w, i.e., there exists an i with $1 \le i \le n$ such that $w_i = a$. The length of a list v is denoted as |v|.
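As an informal illustration of interface types, their domain and their disjoint union, the following Python sketch represents an interface type as a dictionary from port names to port types. The encoding of port types as nested tuples and the helper names `list_of`, `dom` and `disjoint_union` are assumptions made for this sketch only; they are not part of the calculus.

```python
# Assumed encoding of port types: "s" is the basic type,
# ("list", t) is the type of lists whose elements have type t.
S = "s"

def list_of(t):
    """Build the list type over element type t."""
    return ("list", t)

def dom(iface):
    """Domain of an interface type: the port names on which it is defined."""
    return set(iface)

def disjoint_union(iota, kappa):
    """Disjoint union of two interface types.

    The union is only defined when the two types share no port name.
    """
    shared = dom(iota) & dom(kappa)
    if shared:
        raise ValueError(f"port names used twice: {sorted(shared)}")
    return {**iota, **kappa}

# The example interface type from the text: return : [s], attachmentList : [s]
out_type = {"return": list_of(S), "attachmentList": list_of(S)}

# The empty interface type, for processors without input ports.
empty_type = {}
```

A finite dictionary matches the requirement that an interface type is a partial function defined on a finite subset of the port names.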
In order to define the semantics of port types we postulate the set of basic values S, which will be the semantics of the basic type s. We will assume that the basic values include strings, booleans and numbers, but not a special error value err. Note that this binary relation is not required to be either total or functional. We allow it to be partial to model that the associated service always fails for certain input values even if they belong to the required input type. This might for example be the case if they do not satisfy certain additional preconditions that the service requires. We allow the semantics to be non-functional to model that the result of a service call may be non-deterministic from the point of view of the workflow, i.e., not completely determined by the arguments.
List values and repeated processor invocation
Although the workflow is an acyclic graph, and thus does not permit users to explicitly describe loops over groups of processors, the Taverna model specifies how a processor can execute repeatedly, by iterating over the list values on its input ports. Here we give an informal account of this mechanism, while a complete formal model will be given in Section 5.
Let pvd(v) denote the depth of a port value v, for example, 
This implicit iteration rule is designed to facilitate the composition of individual services into workflows. For example, it makes it possible to feed the results of a database lookup service P1, which returns a collection of records, to a service P2 that is designed to analyse a single record. In this case, implicit iteration is a natural interpretation of the data dependency between P1 and P2.
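The implicit iteration rule for a single input can be illustrated with a short Python sketch. The function name `iterate`, the representation of a port's declared depth as an integer, and the choice to give the empty list depth 1 are assumptions of this sketch; the formal treatment is the one given in Section 5.

```python
def pvd(v):
    """Depth of a port value: 0 for a basic value, 1 + element depth for a list.

    Assumption of this sketch: an empty list has depth 1.
    """
    if isinstance(v, list):
        return 1 + max((pvd(x) for x in v), default=0)
    return 0

def iterate(f, v, declared_depth):
    """Apply f to v, iterating over outer list levels for as long as the
    value is deeper than the depth declared by the input port."""
    if pvd(v) > declared_depth:
        return [iterate(f, x, declared_depth) for x in v]
    return f(v)

# P1 returns a collection of records; P2 (here str.upper) expects one record.
records = ["r1", "r2", "r3"]
analysed = iterate(str.upper, records, declared_depth=0)
```

Here the depth mismatch is 1, so the function is applied once per record and the results are collected in a list of the same shape.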
The iteration rule extends to the case where depth mismatches appear on multiple inputs. Fig. 2 $[[y_{1,1}, \ldots, y_{1,m}], \ldots, [y_{n,1}, \ldots, y_{n,m}]]$ where $y_{i,j} = (\mathrm{P4}\ a_i\ b_j\ c)$. Note that here c is treated as a constant parameter, because there is no port mismatch on X3; therefore the same value c is used in each iteration.
Additionally, Taverna workflow designers have the option to choose a different iteration strategy, based on the
where |y| = k. The overall iteration strategy of a processor can be specified by an iteration strategy expression, such as for example (a b) c, where the symbol indicates the cross product strategy and the indicates the dot product strategy.
In Section 5 we further generalize and formalize the cross and dot product operators, showing in particular that the cross product operator can be simulated using the dot product iteration and one additional operator of the calculus.
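As an informal picture of the two iteration strategies, the sketch below computes a cross product and a dot product over two list inputs. The function names `cross`, `dot` and `p4` are hypothetical, and the equal-length check in `dot` is an assumption of this sketch. The generalized operators are the ones formalized in Section 5.

```python
def cross(f, a, b):
    """Cross product: one invocation per pair (a_i, b_j), as a list of lists."""
    return [[f(ai, bj) for bj in b] for ai in a]

def dot(f, a, b):
    """Dot product: one invocation per position.

    Assumption of this sketch: both lists must have the same length.
    """
    if len(a) != len(b):
        raise ValueError("dot product needs lists of equal length")
    return [f(ai, bi) for ai, bi in zip(a, b)]

def p4(a, b, c):
    """Hypothetical stand-in for processor P4: concatenates its arguments."""
    return f"{a}{b}{c}"

# c has no depth mismatch, so the same value is used in every iteration.
c = "c"
y = cross(lambda ai, bj: p4(ai, bj, c), ["a1", "a2"], ["b1", "b2", "b3"])
z = dot(lambda ai, bi: p4(ai, bi, c), ["a1", "a2"], ["b1", "b2"])
```

The cross product yields a result one level deeper than its inputs, while the dot product keeps the depth of its inputs.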
Pipelining
The Taverna 2 execution engine provides support for the pipelined execution of processors along a path in the workflow, an important feature not previously available in Taverna 1, as mentioned in the introduction. This feature is strictly connected with the implicit iteration mechanism just discussed, and is aimed at improving efficient execution over large data collections. In Taverna, iterations over items in a list may be executed concurrently if sufficient resources are available to support parallel execution threads. When this happens, the elements of the result list may be produced in arbitrary order, due to different speeds of each of the threads. These possible different arrival orders are captured in workflow traces, described in Section 3, by recording events that represent individual locations (see Definition 4) of an output list being assigned a value, and values at input locations being used by an activity.
In Taverna 1, once the output list is complete, the engine activates each of the downstream processors that consume the list. While this behaviour is correct, it does not take advantage of additional implicit parallelism that is available due to the mutual independence between data elements within a list. In contrast, Taverna 2 allows for the consumer processors to begin to process individual elements of the result without the need to wait for the entire list to be complete. The most common and useful case occurs when consumers are also subject to implicit iteration. Suppose for example that, in the fragment of Fig. 2, P4:Y has an outgoing arc to some port P5:X, with port type depth equal to 0. As we know, P5 will iterate on each of the elements $y_{i,j}$ in y, and furthermore, these iterations will all be independent of one another. Since the sub-lists as well as the simple values in y are themselves independently generated, each of them can be forwarded to P5 as soon as it becomes available on port P4:Y, independently from the others. By greedily pushing partial results to downstream processors, the speed at which individual values are computed is no longer bounded by the slowest thread that accounts for one iteration over P4.
We can frame this behaviour in terms of a collection of workflow patterns for parallel computing, proposed in [19] . Within this framework, Taverna implements a superscalar pipeline execution semantics, whereby a new thread is allocated to each list element in the input to an iterating processor. At the same time, however, an upper bound on the number of available threads can be set for each processor (through an advanced configuration option). With this limitation, some pipelines can be blocking, if the upstream processors are systematically faster than downstream processors.
In addition, Taverna also supports streaming pipelines. In this context, a stream is defined as an unpredictably long collection of discrete input data elements, an increasingly common occurrence in applications such as process monitoring, or sensor data processing. Indeed, Biomart (www.biomart.org) [22] is an example of a service that supplies its output in a streamed fashion. Taverna is able to consume the stream as it appears on the output port, either by feeding it incrementally to downstream services that also accept streams, or by buffering the output prior to forwarding it to services that expect complete data as input. Conceptually, buffers have unlimited capacity, thanks to a two-tier data architecture that involves an in-memory buffer with a database overflow. A complete description of the data architecture is beyond the scope of this paper, however.
Trace semantics
As anticipated in the introduction, we model the semantics of workflows in terms of traces. Informally, traces record sequences of three types of elementary events: (i) input events, representing values arriving on input ports; (ii) the atomic execution of a basic activity, and (iii) output events, i.e., the availability of a value on an output port. In this section we introduce our notation for describing traces, while in the next section we introduce the set of operators of our calculus that describe Taverna 2 workflows and define the semantics of such operators as the set of all legal traces that the use of the operator can produce. As a workflow is described by an expression that combines operators of the calculus, by extension the semantics of a workflow is defined as the set of all legal traces that the expression can produce. Note that by describing the semantics in terms of traces that contain basic activity executions, rather than just input and output events, we account for the fact that these activities involve possibly stateful service invocations, a common occurrence in scientific workflows.
Traces carry information about the relative ordering of events which are assumed to be atomic, but they are not concerned with the actual relative duration of parts of the computation. Although a basic activity execution can in principle be thought of as the combination of two steps: sending a request and receiving a response, we are going to assume for simplicity that basic activity executions are atomic and they are therefore described by a single event.
As mentioned in the previous section, collections consist of elementary values that can, in principle, be consumed and produced, and thus appear in input and output trace events, in any order. We use locations to indicate the position of an elementary basic value within a containing collection value, as follows. 
Definition 4 (Location).
Definition 5 (Workflow trace). A workflow trace is a list of events. An event is either
• a successful basic activity execution: $f(t_1) \to t_2$ where $f \in B$ and $t_1, t_2 \in I$,
• a failed basic activity execution: $f(t_1) \to \bot$ where $f \in B$ and $t_1 \in I$,
Since a workflow execution involves both parallelism across processors, and pipelining across iterations of a processor, different executions may generate different traces for the same inputs, even when the services involved are stateless. 
Here F S and F T are the basic activities associated with processor S and T. Note that in both traces these processors iterate over their input, a list with two elements, but in β processor T already starts before S is finished, as might be expected in a pipelined execution.
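The difference between a staged and a pipelined ordering of activity executions can be made concrete with a small sequential simulation. This is only a sketch: the functions `run_staged` and `run_pipelined` are hypothetical, each produces just one of the many possible orderings, and only basic activity executions are recorded (input and output events are left out).

```python
def run_staged(xs, f_s, f_t):
    """S processes the whole list before T starts."""
    trace, ys, zs = [], [], []
    for x in xs:
        y = f_s(x)
        trace.append(("F_S", x, y))
        ys.append(y)
    for y in ys:
        z = f_t(y)
        trace.append(("F_T", y, z))
        zs.append(z)
    return zs, trace

def run_pipelined(xs, f_s, f_t):
    """T consumes each element as soon as S has produced it."""
    trace, zs = [], []

    def stage_s():
        for x in xs:
            y = f_s(x)
            trace.append(("F_S", x, y))
            yield y  # hand the element downstream immediately

    for y in stage_s():
        z = f_t(y)
        trace.append(("F_T", y, z))
        zs.append(z)
    return zs, trace

inc = lambda x: x + 1
dbl = lambda x: 2 * x
staged_out, staged_trace = run_staged([1, 2], inc, dbl)
piped_out, piped_trace = run_pipelined([1, 2], inc, dbl)
```

Both runs produce the same output list, but in the pipelined trace the first execution of F_T appears before the last execution of F_S.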
We also give two traces for two executions of the workflow of Fig. 3(b) :
Note that the complete traces include only initial input events and final output events. As will become clear with the presentation of the semantics rules, the corresponding intermediate input and output events cancel themselves out when the composition operator is used.
To describe all possible orderings of events in a trace in a compact way, we write (α ||| β) to denote the set of all possible interleavings of two traces α and β. Formally:
Note that this indeed defines all possible interleavings of the events in α and β since α i and β j can be the empty trace. We generalize the interleaving set for more than two traces, denoted as (α 1 
as {α 1 }, and for n = 0 as {[ ]}, i.e., the singleton set containing the empty trace.
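The interleaving set can be computed directly for small traces. The sketch below represents traces as tuples and sets of traces as Python sets. The function names `interleavings` and `interleave_all` are assumptions of this sketch, and the recursive formulation is one convenient way to enumerate the set rather than a restatement of the formal definition.

```python
def interleavings(alpha, beta):
    """Set of all interleavings of two traces (order within each is kept)."""
    alpha, beta = tuple(alpha), tuple(beta)
    if not alpha:
        return {beta}
    if not beta:
        return {alpha}
    # The first event of the result comes from alpha or from beta.
    return ({(alpha[0],) + rest for rest in interleavings(alpha[1:], beta)} |
            {(beta[0],) + rest for rest in interleavings(alpha, beta[1:])})

def interleave_all(traces):
    """Generalization to n traces; n = 0 gives the set with the empty trace."""
    result = {()}
    for trace in traces:
        result = {g for r in result for g in interleavings(r, trace)}
    return result

two = interleavings(("a", "b"), ("c",))
```

With one trace the result is the singleton containing that trace, and with no traces it is the singleton containing the empty trace, matching the two boundary cases stated above.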
As we have seen from the previous example, multiple events may be used to represent the arrival or generation of list values. For convenience, we summarize the entire set of such events by denoting the encoding of an entire input (resp., 
These are examples of inference rules that will be used extensively throughout the paper. Whereas basic values and lists are associated with streams on individual input and output ports, tuple values are used to model the complete input or output of a processor and in that sense arrive at or are produced by a processor as a whole. Correspondingly, as a further shorthand we introduce the notation $in^*(t) \Downarrow \beta$ and $out^*(t) \Downarrow \beta$ to denote that trace β encodes the tuple interface value t in input events or output events, respectively. Formally, such an encoding β is an interleaving of traces $\alpha_i$, where each $\alpha_i$ encodes the arrival (resp., generation) of one field $a_i$ of the tuple t on port $a_i$:
The rules for output traces, i.e., out * (t) ⇓ β, are similar and omitted for brevity.
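To make the role of locations concrete, the sketch below flattens a (possibly nested) list value on a port into one event per basic value, each tagged with its location. Several choices here are assumptions of this sketch: events are tuples `(kind, port, location, value)`, locations are tuples of 1-based indices, empty lists are not modelled, and the function name `leaf_events` is hypothetical. The sketch returns the events in one canonical order only, whereas the encodings described above allow other orders as well.

```python
def leaf_events(kind, port, value, loc=()):
    """One event per basic value in `value`, in left-to-right order.

    kind is "in" or "out"; loc is the location of `value` within the
    enclosing collection (the empty tuple for a top-level value).
    """
    if isinstance(value, list):
        events = []
        for i, item in enumerate(value, start=1):
            events.extend(leaf_events(kind, port, item, loc + (i,)))
        return events
    return [(kind, port, loc, value)]

def tuple_events(kind, t):
    """Per-port event lists for a tuple value given as a dict port -> value."""
    return {port: leaf_events(kind, port, value) for port, value in t.items()}

per_port = tuple_events("out", {"b": ["v2", "v3"], "c": "v4"})
```

An encoding of the whole tuple is then any interleaving of the per-port event lists.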
Finally, we introduce the notation $\alpha|_I$ to denote the filtering of the stream α on the input events on ports from I disregarding all other event types, i.e., $\alpha|_I = \{ in_a(\lambda, v) \mid in_a(\lambda, v) \in \alpha,\ a \in I \}$ and a counterpart notation for output events
Note that a filtering is a set, rather than a trace itself. We use α| to denote the set of all events in α, i.e., α| = {e | e ∈ α}.
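The filtering notation can be mirrored in a few lines of Python. The event representation, tuples `(kind, port, location, value)` for input and output events and `("exec", name, input, output)` for activity executions, and the function names are assumptions of this sketch.

```python
def filter_inputs(trace, ports):
    """Set of input events of the trace on the given ports."""
    return {e for e in trace if e[0] == "in" and e[1] in ports}

def filter_outputs(trace, ports):
    """Set of output events of the trace on the given ports."""
    return {e for e in trace if e[0] == "out" and e[1] in ports}

def all_events(trace):
    """Set of all events of the trace, forgetting their order."""
    return set(trace)

alpha = [
    ("in", "a", (), 3),
    ("in", "b", (), 4),
    ("exec", "f_P", (("a", 3), ("b", 4)), (("g", 7),)),
    ("out", "g", (), 7),
]
ins = filter_inputs(alpha, {"a"})
outs = filter_outputs(alpha, {"g"})
```

Because the results are sets, the relative order of the selected events is deliberately lost, which is the point made in the note above.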
Semantics of workflow graphs
In the calculus we distinguish two types of expressions, fragments and activities. Fragments represent parts of workflow graphs, i.e., single processors and subgraphs. Activities represent wrappers for real-world services that are executed by workflows, e.g., web service clients. The difference between the two is that fragments never fail, while activities may terminate with failure. We need to make this distinction explicit because T2 assumes that nodes in the graph represent processors that never fail. However, since in reality the services that are invoked when the processors are executed can fail, this potential mismatch is resolved by the processor execution and the semantics accounts for this.
In this section we give the formal semantics for both types of expressions. The fragments correspond to subgraphs of workflow graphs, which can be composed into bigger graphs. The operators that describe such fragments have types of the form ι κ, where ι and κ are interface types that describe the input ports and the output ports, respectively. The semantics of these operators is defined by judgments of the form e ⇓ α, where e is a fragment expression and α a possible trace that can include only input events, output events and basic activity executions but no failure events. The intended semantics of a type ι κ is that if a fragment has this type then it holds for the associated set of traces that (1) no trace in the set contains fail, (2) for each interface value t of interface type ι there is at least one trace α in this set such that α| P , i.e., the set of input events, encodes t, and (3) for each trace α in this set, if α| P encodes a value of type ι, then α| P , i.e., the set of output events in α, encodes a value of type κ.
Activities are executed during the run of a workflow and can be either basic activities that represent real world services or nested workflow graphs. Operators that describe activities have types of the form ι ⇒ κ and their semantics is defined by judgments of the form e ⇓ α, where e is an activity expression and α a possible trace that can include fail. The intended semantics of a type ι ⇒ κ is that if an activity has this type then it holds for the associated set of traces that (1) for each interface value t of interface type ι there is at least one trace α in this set such that α| P encodes t, (2) for each trace α in this set, if α| P encodes a value of type ι then either α contains fail or α| P encodes a value of type κ, and (3) for each trace in this set, if it contains a fail then this is the last event in the trace. Note that every fragment of type ι κ is also an activity of type ι ⇒ κ. Although we do not formally prove this in this paper, it can be verified that the sets of traces that are defined by the presented operators indeed satisfy the requirements imposed by the types of the operators. In Section 4.1 we present the basic operators for defining fragments. Then, in Section 4.2, we explain how to translate basic workflow graphs defined in Taverna 2 into these basic operators. Finally, in Section 4.3 we introduce activities and define the conversions between fragments and activities.
Basic fragment operators
The syntax of the fragment operators is given by:
. . , a n : τ merge a 1 ;...;a n →b : ι
The linking operator ln a→I is used to express arcs that transport data values, and defines a fragment in which values from the port a are copied to all the ports in I. This is illustrated in Fig. 4(a). The merge operator merge a 1 ;...;a n →b represents the (almost) symmetrical situation where values from many arcs are combined into a list produced on one port. Here the ordering of a 1 ; . . . ; a n in the operator matters, as emphasized by the use of semicolons, and determines the ordering of elements in the result list. This is illustrated in Fig. 4(b) where the numbers on edges depict this ordering. Finally, the composition operator F I G, illustrated in Fig. 4(c), constructs a fragment by combining the fragments F and G on the interface I so that the values produced by F on this interface are consumed by G. The semantics of the linking workflow graph is defined by:
The rule states that there is a valid trace in which the stream from the input port a, i.e., its subsequent events
, is read and written to each of the output ports in I , and this is decoded by corresponding subtraces β 1 , . . . , β m . The output ports in I are chosen in an arbitrary order but the streams are not reordered.
The semantics of the merge operator is defined by:
merge a 1 ;...;a n →b ⇓ α
Here we have n input ports on which m values arrive. The premise $\forall_{j=1..m}\,(1 \le i_j \le n)$ specifies, for each of those values, the port on which it arrives, i.e., value $v_j$ is assumed to arrive on port $a_{i_j}$. Then in the trace α the value from port $a_{i_j}$ is output as the $i_j$-th element of the result list. Thus the ordering of the result list is determined by the ordering of the input ports.
Nevertheless the order of events in the stream can change. This is because α is constructed by interleaving, so this definition allows the merge to buffer certain events. The second rule covers the case of n = 0 where the merge becomes the empty list constructor. The semantics of the composition operation is based on the notion of composing traces for a certain set of port names I . This means that two traces are interleaved such that the output events of one trace for the ports in I coincide with the input events for these ports in the other trace. For example, consider the traces out c ( , v 9 ) ]. The output events of α for the (1, v 2 ), out b (2, v 3 ), out c ( , v 4 ) ]. The input events in β for this set of port names are ( , v 4 ) ]. Observe that the considered output events of α exactly match the considered input events of β in the same order. This means that if α and β are traces of fragments F and G (Fig. 4(c) ), where indeed the ports b and c are connected, then α could describe a run of F , while β describes a possible simultaneous run of G. A run of such a composition of F and G could be constructed by interleaving α and β such that the corresponding output and input events coincide and then removing these output and input events. To illustrate this, we first identify the subtraces α and β that lay between the relevant output and input events:
Here the numbered rows indicate the subtraces that will be interleaved and the unnumbered rows indicate the events on which the traces are synchronized. A possible result could for example be [in a ( ,
The trace composition operator, denoted α • I β, describes all possible compositions of traces α and β that synchronize on the port names in I . We define the set α • I β using the following inference rule:
The first two premises of the rule identify the output and input events that should be synchronized, i.e., $\alpha|_I$ and $\beta|_I$. The next two premises identify the subtraces between the events that are synchronized, namely $\alpha_1, \ldots, \alpha_{m+1}$ and $\beta_1, \ldots, \beta_{m+1}$. The final premise states that $\gamma_1$ is an interleaving of $\alpha_1$ and $\beta_1$, $\gamma_2$ is an interleaving of $\alpha_2$ and $\beta_2$, etc. Then, in the conclusion, it is stated that the concatenation $\gamma_1 \cdot \ldots \cdot \gamma_{m+1}$ is in the set $\alpha \bullet_I \beta$.
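The steps of this rule can be followed in a small Python sketch. It assumes events are tuples `(kind, port, location, value)` with `kind` equal to `"in"` or `"out"`, plus `("exec", ...)` tuples for activity executions. The function names `interleavings`, `split_on` and `compose_traces` are hypothetical, and the sketch is an illustration of the rule as described in the text, not a transcription of the formal definition.

```python
def interleavings(alpha, beta):
    """Set of all interleavings of two traces, represented as tuples."""
    alpha, beta = tuple(alpha), tuple(beta)
    if not alpha:
        return {beta}
    if not beta:
        return {alpha}
    return ({(alpha[0],) + r for r in interleavings(alpha[1:], beta)} |
            {(beta[0],) + r for r in interleavings(alpha, beta[1:])})

def split_on(trace, is_sync):
    """Split a trace into the m+1 subtraces between its m sync events."""
    segments, current, sync = [], [], []
    for event in trace:
        if is_sync(event):
            segments.append(tuple(current))
            current = []
            sync.append(event)
        else:
            current.append(event)
    segments.append(tuple(current))
    return segments, sync

def compose_traces(alpha, beta, ports):
    """All compositions of alpha and beta that synchronize on `ports`."""
    a_segs, a_sync = split_on(alpha, lambda e: e[0] == "out" and e[1] in ports)
    b_segs, b_sync = split_on(beta, lambda e: e[0] == "in" and e[1] in ports)
    # Outputs of alpha must match inputs of beta: same port, location and
    # value, in the same order. Otherwise there is no composition.
    if [e[1:] for e in a_sync] != [e[1:] for e in b_sync]:
        return set()
    result = {()}
    for seg_a, seg_b in zip(a_segs, b_segs):
        result = {g + h for g in result for h in interleavings(seg_a, seg_b)}
    return result

alpha = [("in", "a", (), 1), ("out", "b", (), 2)]
beta = [("in", "b", (), 2), ("out", "c", (), 3)]
composed = compose_traces(alpha, beta, {"b"})
```

The synchronized events themselves do not appear in the result, and composing on the empty set of ports reduces to plain interleaving.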
With the help of the trace composition operator, we can now define the semantics of the fragment composition operator, as follows.
Representing basic Taverna 2 workflow graphs
The preceding operators are sufficient to model the semantics of simple Taverna 2 workflow graphs, such as the one shown in Fig. 5 . In this workflow graph the small circle indicates merging of values arriving from different edges into a list. In Fig. 6 we illustrate how this workflow graph can be represented in the calculus.
We assume that the behavior of the processors P, Q and R as fragments is described by the calculus expressions F P , F Q and F R . Then the total workflow graph can be represented as
This expression builds up the workflow graph from left to right. It starts with the edge connecting input a to input port e, which translates to F 1 = ln a→e . Note that this represents a workflow graph with inputs {a} and outputs {e}. This is combined with the other edge from b to f , giving F 2 = F 1 ∅ ln b→ f . Note that the operator ∅ indicates that F 1 and ln b→ f do not synchronize on any ports, and so operate in parallel. Also note that F 2 has inputs {a, b} and outputs {e, f }.
In the next step we construct F 3 = F 2 e, f F P which connects the output ports {e, f } of F 2 with the input ports {e, f } of F P and therefore has outputs {g}. We then construct F 4 = F 3 g ln g→h, j with outputs {h, j} being copies of the output g of F 3 . Then F 5 = F 4 h F Q has outputs {i, j}, F 6 = F 5 j F R has outputs {i, k}, and finally F 7 = F 6 i,k merge i,k→d has outputs {d}.
Let us illustrate how to use the semantic rules to derive a possible trace of the expression. We assume that P executes a basic activity f P which performs an addition over its inputs, and so
]. We assume that Q doubles its input and therefore 14) ]. Finally we assume that R squares its input, which means that for example 3) , out e ( , 3) , out f ( , 4) ]. Clearly the output events for ports e and f in this final trace match with the input events for ports e and f in the mentioned trace of F P , i.e., they happen in the same order for the same ports, locations and values, and so the e, f operator can combine them, which allows us to derive that
, out j ( , 7) ]. Continuing this for the whole expression we can for example derive that
Note that since the final result is a list with two elements this is encoded as a stream with two events for port d.
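As a rough cross-check of this example, the value-level behavior of the workflow graph can be sketched in Python. This is only an illustration under the assumptions stated in the text (P adds, Q doubles, R squares); it ignores pipelining and interleaving entirely. The names `run_workflow` and `encode` are hypothetical, as is the tuple encoding of events, and empty lists are not handled.

```python
def run_workflow(a, b):
    """Value-level reading of the workflow graph: ignores streaming and ordering."""
    g = a + b        # processor P: addition over its two inputs
    h, j = g, g      # the link from g to h and j copies the value
    i = 2 * h        # processor Q: doubles its input
    k = j * j        # processor R: squares its input
    return [i, k]    # the merge of i and k into d builds a two-element list


def encode(port, value, index=()):
    """Encode a (possibly nested) list as output events, one event per leaf value."""
    if isinstance(value, list):
        events = []
        for position, item in enumerate(value, start=1):
            events += encode(port, item, index + (position,))
        return events
    return [("out", port, index, value)]
```

For inputs 3 and 4 the sketch yields a two-element list for port d, and `encode` turns it into two events, one per list position.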
Activities
The syntax of basic activities and translations between activities and fragments is given by:
The first rule from the left gives the syntax for the basic activities for which, recall from Section 2, we assume f to be the basic activity name with an input interface ι and output interface κ. The second rule defines the fragment operator which transforms an activity into a fragment, which means that it converts fail events into error values for all the output ports. Here J denotes the output interface of the activity and the resulting fragment. Note that we do not need an additional operator that turns a fragment into an activity, because fragments are activities by definition. We do, however, need to state this, using the following sub-typing rule:
e : ι κ e : ι ⇒ κ
Since basic activities can fail, we need two rules to define their semantics, one for successful execution and one for failure:
The semantics of the fragment operator is defined by the following two rules. The first rule states that traces of an activity A that do not contain a fail event are also traces of the corresponding fragment. The second rule describes what happens if the trace does contain a fail event. In this case, the result trace begins with the events in α, followed by the output events in γ representing an output tuple which contains error values for all output ports in J .
Note that we use the assumption that failing activities have no output events (so no such events need to be removed), and that the fail event is the last event.
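The two rules for the fragment operator can be illustrated with a small Python sketch. It is a simplification under the stated assumptions (the fail event is last and a failing trace has no output events). The names `fr`, `FAIL` and `ERROR`, the tuple encoding of events, and the choice to emit the error outputs in sorted port order are all assumptions made for this example; the rule itself does not fix an order.

```python
FAIL = ("fail",)     # placeholder for the fail event
ERROR = "error"      # placeholder for the error value


def fr(trace, output_ports):
    """Turn an activity trace into a fragment trace.

    A trace without a fail event is kept as is. A trace ending in a fail
    event keeps its prefix and gets one error-valued output event per
    output port in place of the fail event.
    """
    if trace and trace[-1] == FAIL:
        errors = [("out", port, (), ERROR) for port in sorted(output_ports)]
        return trace[:-1] + errors
    return list(trace)
```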
Iteration strategies and their trace semantics
In Section 2.1 we introduced and motivated iteration strategies. Here we formalize the iteration strategies in terms of calculus operators, and show that those operators are sufficient to represent all the iteration strategies defined in Taverna 2.
Operators for expressing iteration strategies
Only two additional operators are needed to express Taverna 2 iteration strategies, namely I→ J (F ) and rpt a;b→c . Their syntax is given by:
The dot product iteration operator I→ J (F ) expects a list on each input port in I and iterates on all of them at the same time, i.e., the fragment F is executed once for every combination of elements that are on the same indexes in the input lists. This means that if the input lists are of unequal length then the unmatched elements from the longer lists are ignored. The repeat operator rpt a;b→c repeats the value on port a in a list as often as the list on port b has elements, e.g., it maps [3, 4] ]. The first value can be produced by ln a→a,a a ,b rpt b;a →c a,c a,c→a (rpt a;c→a ) and the second value by rpt b;a→b . As will be illustrated later, this is in fact possible for any iteration strategy and combination of expected and offered port types. The semantics of the dot product iteration operator is given by three rules. We start with the case where all the input lists are non-empty:
..,a n →b 1 ,...,b m (F ) ⇓ δ Here p is the length of the shortest of the input lists, the trace α j for each j ∈ 1, . . . , p is the trace of the execution of fragment F for position j, and the event in c i (λ i , v i ) for each i ∈ 1, . . . ,k is an input event that describes part of the input lists beyond the first p elements and will therefore be ignored. The trace of the dot product iteration is then composed from (i) α, which is a combination of the traces α j extended in such a way that the input and output events refer to subvalues of the value at position j, and (ii) β, which includes the possible additional input events trimmed from longer input lists.
The second rule deals with the case where there is at least one empty input list, which means that an empty list is also produced on all the output ports. Observe that the result, encoded by β, can be produced as soon as an empty list appears on any of the input ports.
The final rule deals with the case where there are no empty input lists, but at least one error value, which means that error values are produced on all the output ports. Here β starts after at least one input event for every input port has arrived in α 1 , because at that moment it is clear that on port a i there arrived an error value and that none of the input values is an empty list.
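At the level of values rather than traces, the three cases can be summarized in a short Python sketch. It is a simplification that assumes a single output port and a step function standing in for the fragment F; the names `dot` and `ERROR` are hypothetical.

```python
ERROR = "error"   # placeholder for the error value


def dot(step, *lists):
    """Value-level dot product iteration over the given input lists."""
    # Second rule: an empty input list yields an empty output list.
    if any(value == [] for value in lists):
        return []
    # Third rule: no empty lists, but at least one error value.
    if any(value == ERROR for value in lists):
        return ERROR
    # First rule: zip stops at the shortest list, so unmatched
    # elements of longer lists are ignored.
    return [step(*args) for args in zip(*lists)]
```

Note that the empty-list case is checked first, which reflects the statement that the error rule applies only when none of the input lists is empty.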
Finally, the semantics for the repeat operator is given by:
The first precondition specifies that input events in the concatenation of γ 1 , . . . , γ m+1 encode exactly the input tuple.
The second gives m output events encoding the list with v a repeated n times, from which it follows that the sequence of pairs (λ i , w i ) includes n repetitions of the values needed to encode v a . The output trace is constructed by interleaving the subsequent parts of the input trace with the subsequent output events. The last two preconditions ensure that an output event out c (i j .λ j , w j ) is produced only after (i) on the input b we have seen the index i j and (ii) on the input a we have seen the relevant piece of v a , i.e., (λ j , w j ). Note that the final part of the output trace γ m+1 is necessary because each of the values v 1 , . . . , v n might be encoded with more than one input event, but to produce the output event for its index we only need to have consumed one of those input events. Note also that this definition is not as greedy as it could be, because in principle the operator could output events encoding v a on all the positions 1, . . . , i, if there was an input event for v i+1 , but the given definition matches the current implementation observed in Taverna 2.
Translating iteration strategy expressions
In this section we show that the presented operators and primitives are sufficient to represent all the iteration strategies in Taverna [7] ] then the output streams of IS (a [1] b [1] ) c [2] encode a = [ [1, 1] , [2] ], b = [ [3, 4] , [3] ], c = [ [5, 6] , [7] ] . The processor then iterates over these three lists, simultaneously processing each time the combinations it finds at a certain position. In this case it processes for position 1. Expressing IS (a [1] b [1] ) c [2] requires first expressing IS a [1] b [1] . The output for the b port can be computed as F b = rpt b;a→b . Note that this maps a = [1, 2] , b = [3, 4] to b = [ [3, 4] , [3, 4] ] . The output for the a port can be computed by F a = (ln a→a,a a rpt b;a →b ) a,b a,b →a (rpt a;b →a ) . [3, 4] , [3, 4] ] and from this a,b →a (rpt a;b →a ) produces a = [ [1, 1] , [2, 2] ] . It is then not hard to see that IS a [1] b [2] = ln a→a, a ∅ ln b→b,b a,b F b a ,b (ln a →a ∅ ln b →b a,b F a ) . Indeed, it can be shown that it is possible to express an operator ⊗ i, j I; J that computes the cross product iteration strategy assuming that the left-hand side is encoded in the ports in the set I at nesting depth i and the right-hand side is in ports J at depth j. [2] consists in combining the output of IS a [1] b [1] with that of port c. Note that this output encodes the iteration values at nesting depth 2. So we need to take the dot product of ports a, b and c at nesting depth 2, which can be done by nesting the iteration operator twice and applying it to the identity fragment: a,b,c→a,b,c ( a,b,c→a,b,c (ln a→a ∅ ln b→b ∅ ln c→c ) ). Note that this indeed maps a = [ [1, 1] , [2, 2] ], b = [ [3, 4] , [3, 4] ], c = [ [5, 6] , [7] ] to a = [ [1, 1] , [2] ], b = [ [3, 4] , [3] ], c = [ [5, 6] , [7] ] . 
Also here it is possible to express a general operator i I; J that computes the dot product iteration strategy assuming that the left-hand side is encoded in the ports in I , the righthand side in ports J and at depth i for all ports. At the time of writing Taverna 2 indeed requires that at the point that the dot product is computed the nesting depth of the expected values is on both sides equal; a workflow graph in which this is not the case is considered incorrect. However, in future versions this might be taken care of by inserting extra list-wrapping operators that increase the nesting depth.
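The value mappings quoted in this section can be reproduced with a small value-level Python sketch. It only illustrates how the repeat operator and the dot product iteration combine; it is not the calculus expression itself. The names `rpt`, `dot`, `cross` and `trim_depth2` are hypothetical helpers.

```python
def rpt(a, b):
    """Repeat the value a as often as the list b has elements."""
    return [a for _ in b]


def dot(step, *lists):
    """Apply step position by position, stopping at the shortest list."""
    return [step(*args) for args in zip(*lists)]


def cross(a, b):
    """Encode the cross product of two flat lists at nesting depth 2."""
    b2 = rpt(b, a)          # repeat the whole list b once per element of a
    a2 = dot(rpt, a, b2)    # repeat each element of a once per element of b
    return a2, b2


def trim_depth2(a, b, c):
    """Dot product of three ports at nesting depth 2, applied to the identity."""
    out_a, out_b, out_c = [], [], []
    for xs, ys, zs in zip(a, b, c):
        n = min(len(xs), len(ys), len(zs))
        out_a.append(xs[:n])
        out_b.append(ys[:n])
        out_c.append(zs[:n])
    return out_a, out_b, out_c
```

With a = [1, 2] and b = [3, 4], `cross` gives [[1, 1], [2, 2]] and [[3, 4], [3, 4]], and trimming these against c = [[5, 6], [7]] gives the values stated in the text.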
The next step in expressing IS
A complete iteration strategy annotated with positive nesting depth differences can be relatively straightforwardly . If the nesting depth differences are negative then this is remedied by inserting extra wrapping operations that increase the nesting depth. For
In the complete processor semantics, the iteration expression is followed by the actual iteration of the processor over the computed combinations. Observe that in this case this means that it iterates at nesting depth 2. For this we introduce the notation (F ).
The dispatch stack
As mentioned in the introduction, one of the interesting features of the new Taverna 2 model is the extensibility of the process activation sequence. In practice, each processor P in the graph is configured by means of a stack of execution layers and the processing that takes place when the workflow execution reaches P depends on this configuration. Each layer in the stack receives a request from the layer above, it performs a certain function that transforms the request and then it either forwards the request to the layer below or it "bounces" it back up the stack as a response. Taverna 2 comes with a choice of prebuilt standard layers but the stack can also include custom layers of a user provided type. This gives a way to implement additional control operators in Taverna 2.
Each processor's stack is configured independently from the others, using any of the available layers. This makes it possible to associate a different behavior with different nodes in the graph. The semantics of each processor's execution is completely defined by the semantics of each layer, along with the interaction amongst the layers in the stack.
The layers in the stack can be variously configured, but at the bottom is always the Invoke layer, which is responsible for launching the execution of a basic activity and collecting its result. When the activity is a Web Service, for example, this layer implements a client that initiates the service invocation, collects its result and detects error conditions. Other layers perform additional tasks, intended mostly to provide quality of service to the execution. The Retry k layer, for example, accounts for the possibility that a service is temporarily unreachable; the Bounce layer returns a failure state without forwarding the request to the layer below it if the incoming request from the layer above contains an error; and the Failover layer tries several alternative activities in turn, hoping that one of them will succeed. We are going to describe the function of each layer more precisely in the rest of the section. Specifically, we show how the behavior of each standard layer can be formalized in terms of our trace semantics. We then provide a formal definition for two additional custom layers, namely the Branch layer and the Loop layer, which can be used to add new control flow operators to Taverna, an if-then-else and a while-loop operator respectively. This shows how one can exploit the stack-based architecture to extend the workflow definition language of Taverna 2 and furthermore how such extensions can be described within the trace semantics framework.
Dispatch stack activities
In the following we describe the semantics of the dispatch stack in terms of activities, i.e., given a list of activities
we describe the behaviour of activity S(A). Later on in Section 7 we show how the total behavior of a processor is described based on this, but in terms of fragments and taking the iteration strategy into account.
The syntax for the dispatch stack layers is as follows:
Here we adopt the notation [H|T ] to denote a list with head H and tail T . In the following we describe the behavior of each layer in the same order.
Note that, if a stack includes neither a Failover nor a Branch c layer, then its activity list must be a singleton, because the bottom Invoke layer requires exactly one activity. Note also that, when Failover or Branch c is present, the activity list can be of arbitrary length, even empty, yet all activities in it will be of the same type.
Standard layers
The Invoke layer expects a single activity in the list, and executes it. Since it is always required as the last layer it is not represented in the formal definition: its behavior is described by the behavior of the empty stack.
The role of the Retry k layer is to submit the activities in its input list at most k times to the layers below it. The layer returns with failure, if the layers below it in the stack fail all k times, and it returns with success as soon as one of the attempts succeeds.
The first rule explicitly allows that 0 retries are specified, which means that the processor always fails after consuming the whole input. The second rule describes a successful execution where the trace of the layers remaining in R does not contain a failure and becomes the result trace. The third rule deals with the case where the trace of the layers remaining in R contains a failure, i.e., equals α · [fail]. The resulting trace is obtained by removing the failure event and replacing it with a new trace β 2 . This new trace is obtained by removing the input events from a possible trace β of S′(A), where S′ is equal to S except that the number of retries is decreased by one. The equation α| P = β| P guarantees that β includes the same input events as α, albeit possibly in some different order. The part of β that is included in the resulting trace, namely β 2 , contains all events in β except the input events. This is guaranteed by β| P = β 1 | which implies that all input events of β are in β 1 and therefore not in β 2 .
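Abstracting away from traces, the outcome described by these three rules can be sketched in Python. This is an illustrative model only: a layer is represented as a function from an activity list and an input to a result, failure is represented by a sentinel, and the names `FAIL`, `invoke` and `retry` are assumptions of this sketch rather than part of Taverna 2.

```python
FAIL = object()   # sentinel standing in for a failed execution


def invoke(activities, inputs):
    """Bottom of the stack: expects exactly one activity and executes it."""
    (activity,) = activities
    return activity(inputs)


def retry(k, lower):
    """Build a layer that submits the request to `lower` at most k times."""
    def layer(activities, inputs):
        for _ in range(k):
            result = lower(activities, inputs)
            if result is not FAIL:
                return result      # succeed as soon as one attempt succeeds
        return FAIL                # all k attempts failed (always so for k = 0)
    return layer
```

With k = 0 the loop body never runs and the layer fails without invoking anything, which matches the first rule.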
The role of the Failover layer is to try and submit each activity in the input list to the layer below. It returns with success when an activity succeeds, and it fails when all activities fail. This is similar to the Retry k layer, except that the number of attempts is determined by the length of the list A, rather than by a user-defined parameter k.
In principle, this layer may implement one of a number of strategies to prioritize the activities. We illustrate the simple model where the activities are tried in the order in which they appear in the list.
The rules for the Failover layer are analogous to those for the Retry k layer.
The Bounce layer returns a failure state without forwarding the request to the layer below it, if the incoming request from the previous layer contains an error. Otherwise it returns the value provided by the layer below it.
The left rule deals with the case where the value on some input port a contains an error on any index λ. In this case, the request is bounced off and a fail event is appended to the complete input. The right rule deals with the complementary case where none of the input values is an error or contains an error as one of its subvalues. The resulting trace is then the complete trace of the rest of the stack R(A).
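The Failover and Bounce layers can be sketched in the same value-level style. As before, this is a simplified model and not the trace semantics: layers are functions, failure is a sentinel, inputs are a dictionary from port names to values, and all names (`FAIL`, `ERROR`, `invoke`, `failover`, `bounce`, `contains_error`) are hypothetical.

```python
FAIL = object()    # sentinel standing in for a failed execution
ERROR = object()   # sentinel standing in for an error value


def invoke(activities, inputs):
    """Bottom of the stack: expects exactly one activity and executes it."""
    (activity,) = activities
    return activity(inputs)


def failover(lower):
    """Try the activities in list order until one of them succeeds."""
    def layer(activities, inputs):
        for activity in activities:
            result = lower([activity], inputs)
            if result is not FAIL:
                return result
        return FAIL
    return layer


def contains_error(value):
    """True if the value is an error or has an error among its subvalues."""
    if isinstance(value, list):
        return any(contains_error(item) for item in value)
    return value is ERROR


def bounce(lower):
    """Fail immediately if any input contains an error, else forward the request."""
    def layer(activities, inputs):
        if any(contains_error(value) for value in inputs.values()):
            return FAIL
        return lower(activities, inputs)
    return layer
```

Layers compose by nesting, for example `bounce(failover(invoke))`, which reads top-down like the stack notation used in the text.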
Extension layers
So far we have described the layers that comprise the standard configuration of the dispatch stack. A powerful feature of Taverna 2 is its ability to accept new layers as part of a stack's configuration. As each processor's layer is configured independently from the others, new layers can be selectively added to provide specific processors with new functionality. Here we describe two such layers, which add control operator functionality to Taverna 2.
The first of the two, the Loop c layer, is used to implement a conditional loop processor P . We assume that the input and output ports of P are identical and include one distinguished port named c. If the layer receives a request with an interface value t where t(c) is true, then it forwards the request to the next layer. If this returns a result then it feeds this back as a request to itself, but if there is no result it returns failure. If however in the request t(c) is not true, then the input is immediately returned unchanged.
The first rule covers the case where the value provided on the port c is not true. The operation described by the result
is simply copying the input to the output, possibly with some buffering. Note that we do not allow output events in α 1 , i.e., before we know that the value on port c is not the boolean true.
In the second rule α describes the initial iteration of the loop, which occurs because a boolean true is provided on port c. Here too we do not allow output events in α 1 . The result trace of the whole iteration is obtained by composing α, using the trace composition operator • I from Section 4.1, with the trace β of the remaining iterations of the loop, on the whole domain of input ports P.
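The control flow captured by these two rules can be sketched at the value level in Python. The sketch models only the outcome of the loop, not the interleaving of events. It assumes that inputs and results are dictionaries over the same port names, including the distinguished port c; the names `FAIL`, `invoke` and `loop` are hypothetical.

```python
FAIL = object()   # sentinel standing in for a failed execution


def invoke(activities, inputs):
    """Bottom of the stack: expects exactly one activity and executes it."""
    (activity,) = activities
    return activity(inputs)


def loop(lower, cond_port="c"):
    """Feed results back as requests for as long as the condition port is true."""
    def layer(activities, inputs):
        current = inputs
        while current[cond_port] is True:
            result = lower(activities, current)
            if result is FAIL:
                return FAIL        # no result from the layers below
            current = result       # the result becomes the next request
        return current             # condition not true: return input unchanged
    return layer
```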
The second of the two custom layers enables workflow designers to create If-then-else processors. The layer's behavior is similar to that of the Loop layer, in that it assumes a conditional value is bound to a particular variable c, i.e., (c, i) ∈ t. This time the value, a positive integer in the range 1 . . . |A|, is used to select the ith element A(i) of list A. This activity is forwarded to the layer below for execution. A value that is out of range results in a failure.
The first rule deals with the case where the value v provided on port c is out of range. The resulting trace contains all the input events followed by failure. Note that we allow for the input event for port c to be followed by some other input events. The second rule deals with the case that the value v provided on port c is in the correct range. The result trace is then a valid trace for R ([A(v)]) , i.e., the remaining layers and the vth activity, but we require that in the part of the trace before the input event for port c, that is in α 1 , there are no other events except input events.
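A value-level Python sketch of the Branch layer follows. It captures only the selection logic of the two rules; the constraint on event ordering before the input on port c is not modelled. The names `FAIL`, `invoke` and `branch`, and the dictionary representation of inputs, are assumptions of this sketch.

```python
FAIL = object()   # sentinel standing in for a failed execution


def invoke(activities, inputs):
    """Bottom of the stack: expects exactly one activity and executes it."""
    (activity,) = activities
    return activity(inputs)


def branch(lower, cond_port="c"):
    """Select the v-th activity (1-based), where v is the value on the condition port."""
    def layer(activities, inputs):
        v = inputs[cond_port]
        in_range = (isinstance(v, int) and not isinstance(v, bool)
                    and 1 <= v <= len(activities))
        if not in_range:
            return FAIL                         # out-of-range value: failure
        return lower([activities[v - 1]], inputs)
    return layer
```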
Translating Taverna 2 processor specifications
In this section it is explained how to represent in a calculus expression the behavior of a single processor. Recall that in Section 4.2 it was shown how to represent a workflow graph where the behavior of the individual processors was represented by abstract expressions such as F P , F Q and F R . Here we explain how these expressions are constructed from the processors' specification.
The complete specification of a processor's behavior in Taverna 2 consists of the following components: (1) the expected types on each of its input ports, (2) the iteration strategy expression, (3) the dispatch stack S and (4) a list of activity specifications. In the default configuration, the stack S is of the form S = [Bounce, Failover, Retry k ]. This has the effect that a request submitted to the processor succeeds if at least one of the activities in A succeeds, possibly after failing at most k − 1 times, and it fails otherwise. The additional extension layers, like the Loop c layer, are typically placed at the top of the stack. The list of activity specifications can contain basic activity names and workflow graphs.
The construction of the corresponding calculus expression is done in three steps. In the first step we construct an expression for the core semantics which describes the behavior of the dispatch stack and the activities list. In the second step we extend this expression such that it describes the step semantics which describes the behavior of a single iteration step of the processor. In the last step the expression is again extended such that it describes the iteration semantics by incorporating the iteration strategy and describing how the processor iterates over the provided input values.
We begin with the translation of the list of activity specifications, which is translated to a list of activity expressions A.
The basic activity names are simply translated to themselves. Each workflow graph is represented by a fragment expression that gives the semantics of this workflow graph, as explained in Section 4.2. The core semantics of the processor is then described by the activity expression S(A).
The core semantics falls short of describing the step semantics of a processor, i.e., the semantics for a single iteration step, in two ways. The first is that it might still fail, and this can be remedied by applying the fragment operator fr J () which replaces failure with the production of error values. The second is that a processor always waits with executing an iteration step until all the input values for that step have completely arrived. To represent this we introduce the input synchronization operator:
We then can represent the step semantics of a processor as sync(fr J (S(A))) where J is the set of output ports of the processor.
We now proceed with representing the iteration semantics of a processor. An iteration strategy that is annotated with nesting depth differences can be expressed by an expression IS is , as discussed in Section 5.2. The nesting depth difference can be computed from the expected type, which is given, and the offered type, which can be computed relatively straightforwardly. For example, if the input port is connected to an output port of a preceding processor, then it is the produced output type of that output port, and if it is connected to a workflow input then it is the expected type of that workflow input. Assuming that in the result of IS is the expected values are found at nesting depth i, the full semantics of the processor can be described as IS is I i I→ J (sync(fr J (S(A)))) where I and J are the set of input ports and output ports, respectively, of the processor.
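The effect of iterating the step semantics at nesting depth i can be illustrated at the value level with a short recursive Python function. This is a sketch under simplifying assumptions: a single output port, a plain function `step` standing in for sync(fr J (S(A))), and no treatment of errors or empty lists. The name `iterate` is hypothetical.

```python
def iterate(step, depth, *values):
    """Apply step to the combinations found at the given nesting depth.

    At depth 0 the step is applied directly to the values. At depth i > 0
    the inputs are lists, which are traversed position by position (a dot
    product), and the step is applied at depth i - 1 to each position.
    """
    if depth == 0:
        return step(*values)
    return [iterate(step, depth - 1, *args) for args in zip(*values)]
```

For instance, with an addition step and depth 1, the lists [1, 2] and [10, 20] are combined position by position into [11, 22].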
Together with the translation given in Section 4.2 for the workflow graph that contains the processors, this demonstrates how the semantics of a complete Taverna 2 workflow graph can be represented as a fragment expression in the calculus. As an example consider again the workflow graph in Fig. 5 in Section 4.2. Recall that the whole workflow graph could be represented as F 7 = (((((ln a→e ∅ ln b→ f ) e, f F P ) g ln g→h, j ) h F Q ) j F R ) i,k merge i,k→d . Now assume that we know for processor P that its dispatch stack is the default one, i.e., S = [Bounce, Failover, Retry 4 ], and its list of activities contains one basic service viz. s 1 . In that case the core activity of the processor is given by S(A) = [Bounce, Failover, Retry 4 ] ([s 1 ] ). Assume that for P the specified iteration strategy is (e f ) and that the types of the workflow inputs a and b are [s] and [s] , and the expected input types for the ports e and f of service s 1 are s and s, respectively. It then follows that for P for ports e and f the offered types are [s] and [s], respectively, and the expected types are s and s, respectively. So the annotated iteration strategy is (e [1] f [1] ) and its corresponding calculus expression is expressed by 1 e; f which simply copies the input lists from the input port e to the output port e and the same for the input port f and the output port f . Since this encodes the iteration values at nesting depth 1 the full representation of the processor semantics becomes 
)))).
A feature that was not discussed is the control link between two processors, which indicates that the sink processor cannot start execution before the source processor, also called the controlling processor, has produced all its output. This can be simulated by adding dummy input ports to the sink processor for each of the output ports of the source processor and connecting them. The expected types of these dummy input ports should be exactly the types produced by the corresponding output ports, taking into account the offered types and iteration strategy for the source processor. Adding such dummy input ports can be done by nesting the processor in a nested workflow that has an empty dispatch stack and contains only this processor, and adding the dummy ports as extra unconnected workflow inputs. These unconnected input ports can be represented by a special dummy input port operator dum a which defines a fragment with a single input port a of port type τ and no output ports. Its behavior is simply that it consumes the input values from port a and does nothing else.
Formally, its syntax and semantics are defined by:
Besides representing unconnected workflow inputs, this operator also allows us to represent workflow graphs where certain output ports are not connected to any input port or workflow output. In this case the output port can be connected to the input port of a dummy input port operator.
Conclusion and further research
We have presented in this paper a calculus with trace semantics to define the formal semantics of Taverna 2 workflow graphs. This calculus consists of a limited set of operators that each capture a specific aspect of the execution of workflows represented in these workflow graphs. Specifically they allow the description of the iteration strategies, the pipelined execution of processors, and the dispatch stack mechanism that is used in Taverna 2 to configure the specific behavior of a processor. Although the formal definition of the semantics of the presented operators is far from trivial, we claim that their intuitive meaning is not hard to understand and can help to create insight into the exact meaning of Taverna 2 workflow graphs. Moreover, we think it can serve as a starting point for determining whether the behavior of two different workflow graphs is identical.
Although the formalisation of properties like parallel and pipelined execution is complete, aspects of the definition of workflow graphs remain to be formalized; in particular, their complete translation into calculus expressions remains the subject of future research. This would include the complete algorithms for deriving the offered types for processor input ports, the produced types for processor output ports and the complete translation of an annotated iteration strategy, which were all only described informally here. In addition, the translation of control links was only described informally and will be formalized in more detail in future work.
