Abstract
Unlike their traditional, silicon counterparts, DNA computers have natural interfaces with both chemical and biological systems. These can be used for a number of applications, including the precise arrangement of matter at the nanoscale and the creation of smart biosensors. Like silicon circuits, DNA strand displacement systems (DSD) can evaluate non-trivial functions. However, these systems can be slow and are susceptible to errors. It has been suggested that localised hybridization reactions could overcome some of these challenges. Localised reactions occur in DNA 'walker' systems which were recently shown to be capable of navigating a programmable track tethered to an origami tile. We investigate the computational potential of these systems for evaluating Boolean functions and forming composable circuits. We find that systems of multiple walkers have severely limited potential for parallel circuit evaluation. DNA walkers, like DSDs, are also susceptible to errors. We develop a discrete stochastic model of DNA walker 'circuits' based on experimental data, and demonstrate the merit of using probabilistic model checking techniques to analyse their reliability, performance and correctness. This analysis aids in the design of reliable and efficient DNA walker circuits.
Introduction
The development of simple biomolecular computers is attractive for engineering and health applications that require in vitro or in vivo information processing capabilities. DNA computing models which use hybridization and strand displacement reactions to perform computation have been particularly successful. DNA strand displacement systems (DSD) have been shown experimentally to simulate logic circuits (Qian and Winfree 2011; Seelig et al. 2006) and are known to be Turing-universal (Qian et al. 2010). However, computing with biomolecules creates many challenges. For example, reactions within a DSD are global in the following sense: strands which are intended to react must first encounter one another in a mixed solution. The mixing of all reactants may lead to unintended reactions between strands. These systems do not, at present, ensure the spatial locality typical of other computing models. Qian and Winfree suggested that tethering DNA-based circuits to an origami tile could overcome some of these challenges (Qian and Winfree 2011). This idea was explored and expanded upon by Chandran et al. (2011), who investigate how such systems could be realised experimentally, give constructions of composable circuits, and propose a biophysical model for verification of tethered, hybridization-based circuits. Our work is largely inspired by theirs, but we consider another setting which also exhibits localised reactions: DNA walker systems (Bath et al. 2005, 2009; Cha et al. 2014; Green et al. 2008; Omabegho et al. 2009; Wickham et al. 2011; Yin et al. 2004), in particular programmable walkers (Muscat et al. 2011, 2012; Wickham et al. 2012). Theoretical work on the motion of walkers from a non-computational perspective includes Semenov et al. (2011).
In the walker system considered in this work, the walker traverses a track of strands, called anchorages, that are tethered to a DNA origami tile (Wickham et al. 2012). Anchorages contain a domain that is complementary to the walker strand. Movement of the walker between anchorages is shown in Fig. 1. After experimental preparation, all anchorages are unblocked (they are hybridized to the origami and to no other strand), with the exception of designated blocking anchorages that are initially bound in a duplex to a blocking strand. Anchorages and their blocking strands are addressed by means of distinct toehold sequences (shown coloured): anchorages are selectively unblocked by adding strands complementary to their blockers as input. Much like field programmable gate arrays, these systems are easily reconfigured. By using programmable anchorages at track junctions, Wickham et al. (2012) demonstrate that a walker can be directed to any leaf in a complete two-level binary tree using input strands that unblock the intended path.
In Sect. 2, the computational expressiveness of such walker systems, including their potential to create composable circuits, is explored using a theoretical framework that assumes ideal conditions. We highlight significant limitations of current walker systems and motivate future work. In Sect. 3 we develop a probabilistic model to analyse the impact of different sources of error that arise in experiments on the reliability, performance and correctness of the computation. The model can be used to support the design and verification of DNA walker circuits.
Computational potential of DNA walker circuits
In this section we explore the computational potential of DNA walker systems. We focus on deterministic Boolean function evaluation, and call the resulting constructions DNA walker circuits. We begin by defining a model of computation that makes explicit the underlying assumptions that characterize the DNA walker systems considered here. These assumptions are consistent with current published experimental systems: in particular, we do not explore the potential for multiple walkers to interact within the same circuit. However, we do consider the potential consequences for parallel computation. To simplify the presentation, some technical proofs have been moved to the appendix.
A model of computation for DNA walker circuits
A DNA walker circuit is composed of straight, undirected tracks (consecutive anchorages) and gates (track junction points) that connect at most three tracks. A gate can have at most one Boolean guard for each track that it connects. A particular guard is implemented using one or more blocking strands that share a common toehold sequence. Distinct guards use distinct toehold sequences: a logical relationship between guards (e.g., one is the negation of another) does not imply a relationship between their toehold sequences. A track adjacent to a gate is unblocked if it has a guard that evaluates to true, i.e., if its unblocking strands are added to solution, and is otherwise blocked. When the system is prepared, a self-consistent set of unblocking strands is added to unblock X or ¬X but not both. For example, Fig. 1 depicts a circuit of a single gate connecting three tracks. The track ending with the anchorage marked with the red fluorophore (top right of panel 1) has the Boolean guard X, while the track ending with the anchorage marked with the green fluorophore has the Boolean guard ¬X. Panel 2 of Fig. 1 shows that the path to the green fluorophore is unblocked when ¬X evaluates to true (i.e., the unblocking strands for ¬X are added to solution). In this case, X evaluates to false and the path to the red fluorophore remains blocked (i.e., the unblocking strands for X are not added to solution). This is an example of a fork gate. We define a fork gate as having at most one input track, and exactly two guarded output tracks. Each circuit has one source: a fork gate with no input track, denoting the initial position of a walker. A join gate with an output track has two guarded input tracks. A join gate with no output track is a sink and has at most two (unguarded) input tracks. Each circuit has one or more true sinks and one or more false sinks.

Fig. 1 (1) The walker strand carries a load (Q) that will quench fluorophores (F) when nearby. After experimental preparation, the walker is attached to the initial anchorage and blocking strands are present on designated blocking anchorages. Selected anchorages that are initially blocked can become unblocked by adding complementary unblocking strands. In this case, unblocking strands are added for the blocked anchorages that are labelled by ¬X. (2) Once a nicking enzyme (E) is added, it can attach to the walker-anchorage complex and cut the anchorage. The anchorage top melts away from the walker, exposing 6 nucleotides as a toehold. (3) The exposed toehold becomes attached to the next anchorage. (4) In a displacement reaction, the walker migrates to the new anchorage. The stepping is energetically favourable, because it reforms the base pairs that were lost after the previous anchorage was cut. (5) Repeating this process, the walker arrives at a junction. The walker continues down the unblocked track, eventually reaching the final anchorage and quenching the fluorophore
In a circuit C with Boolean guards over n variables, a variable assignment A for C is a truth assignment of those n variables. Consider any DNA walker circuit C and variable assignment A for C. Let C[A] denote the set of unblocked paths originating from the source of C, after all guards are evaluated as blocked or unblocked, under assignment A. We say that C is deterministic under assignment A if there is exactly one unblocked path from the source to a sink in C[A]. A fork gate is deterministic if, under any assignment for which it is reachable, exactly one output track is unblocked. Similarly, a join gate is deterministic if, under any assignment for which it is reachable by one input track, the other input track is blocked. Note that this definition of determinism precludes the possibility of a deadlock (i.e., when there is no unblocked path from the source to a sink). Let VALUE(C[A]) be the output value of the circuit under assignment A (i.e., whether the reachable sink is a true sink or a false sink). Circuit C is deterministic if it is deterministic under all possible variable assignments.
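The definition of determinism under an assignment can be checked by brute force on small circuits. The following is a minimal sketch, not the authors' tooling: the track encoding, the gate names and the convention that a path ends at the first sink it meets are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical encoding (not from the paper): each undirected track is a tuple
# (gate_a, gate_b, guards); guards is a list of (variable, polarity) literals
# that must all hold for the track to be unblocked.
SINGLE_FORK = [
    ("source", "true_sink", [("X", True)]),    # guard X
    ("source", "false_sink", [("X", False)]),  # guard not-X
]
SINKS = {"true_sink": True, "false_sink": False}


def unblocked_adjacency(tracks, assignment):
    """Adjacency map containing only the tracks unblocked under the assignment."""
    adj = {}
    for a, b, guards in tracks:
        if all(assignment[var] == pol for var, pol in guards):
            adj.setdefault(a, []).append(b)
            adj.setdefault(b, []).append(a)  # tracks are undirected
    return adj


def sink_paths(tracks, sinks, assignment, source="source"):
    """Simple unblocked paths from the source, each ending at the first sink met."""
    adj = unblocked_adjacency(tracks, assignment)
    paths, stack = [], [(source,)]
    while stack:
        path = stack.pop()
        if path[-1] in sinks:
            paths.append(path)
            continue
        for nxt in adj.get(path[-1], []):
            if nxt not in path:
                stack.append(path + (nxt,))
    return paths


def is_deterministic(tracks, sinks, variables):
    """True if every assignment leaves exactly one unblocked source-to-sink path."""
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if len(sink_paths(tracks, sinks, assignment)) != 1:
            return False
    return True


def value(tracks, sinks, assignment):
    """VALUE(C[A]) for a circuit that is deterministic under the assignment."""
    (path,) = sink_paths(tracks, sinks, assignment)
    return sinks[path[-1]]
```

For the single fork gate, `is_deterministic(SINGLE_FORK, SINKS, ["X"])` holds. Replacing the guard ¬X by an unrelated guard Y would break determinism, since X = Y = true unblocks two paths and X = Y = false causes a deadlock.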
A circuit set S, consisting of one or more unconnected circuits, is deterministic if and only if VALUE(C_i[A]) = VALUE(C_j[A]) for each C_i, C_j ∈ S, under any possible assignment A. Informally, this states that different circuit components in a deterministic circuit set cannot report different output values under the same assignment. (The purpose of circuit sets is demonstrated in the next section.) Let VALUE(S[A]) be the value of S under assignment A. The size of S, denoted by SIZE(S), is the total count of component gates.¹ We define the worst case time of a computation in S, denoted by TIME(S), as the longest unblocked path from a source to a sink, under any variable assignment. This notion of time captures the ability of multiple walkers to simultaneously traverse disjoint paths (one per unconnected circuit).
Let S[A] denote the set of unblocked paths in S under assignment A (one per unconnected circuit). Given a circuit C_i ∈ S, we say that a gate G ∈ C_i is reachable in C_i[A] (equivalently S[A]) if there exists an unblocked path from the source of C_i to G, under assignment A. Gates that are not reachable under any variable assignment are called redundant. An example of a fork gate that is never reachable, and therefore redundant, is shown in Fig. 2. We will reason about circuit sets where all gates are non-redundant. When this is not the case, the circuit set can be simplified to one that is logically equivalent.
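Redundant gates can be found by checking reachability under every assignment. The sketch below assumes a hypothetical layout (it is not the circuit of Fig. 2) and a hypothetical track encoding in which each undirected track carries a list of guard literals; sinks are leaves in this example.

```python
from itertools import product

# Hypothetical layout (not Fig. 2 of the paper): the source forks on X; the X
# branch leads to gate g1, and a track guarded by not-X leads from g1 to g2.
TRACKS = [
    ("source", "g1", [("X", True)]),
    ("source", "false_sink", [("X", False)]),
    ("g1", "g2", [("X", False)]),
    ("g1", "true_sink", [("X", True)]),
]
GATES = {"source", "g1", "g2", "true_sink", "false_sink"}


def reachable_gates(tracks, assignment, source="source"):
    """Gates connected to the source by unblocked tracks under the assignment."""
    seen, frontier = {source}, [source]
    while frontier:
        gate = frontier.pop()
        for a, b, guards in tracks:
            if gate not in (a, b):
                continue
            if not all(assignment[var] == pol for var, pol in guards):
                continue  # track is blocked
            other = b if gate == a else a
            if other not in seen:
                seen.add(other)
                frontier.append(other)
    return seen


def redundant_gates(tracks, gates, variables):
    """Gates that no variable assignment makes reachable from the source."""
    ever_reached = set()
    for values in product([False, True], repeat=len(variables)):
        ever_reached |= reachable_gates(tracks, dict(zip(variables, values)))
    return gates - ever_reached
```

Here `g2` is redundant because reaching it would require traversing a track guarded by X and then one guarded by ¬X. The enumeration is exponential in the number of variables, so it is only meant for small examples.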
Deterministic fork and join gates in DNA walker circuits
In this section, we begin by establishing the necessary and sufficient conditions for a fork gate to be deterministic in a DNA walker circuit. The fork gate in this model is the primitive used to branch a computation based on the values of its output guards.
Recall that a fork gate that is not reachable under any variable assignment is called redundant. A fork gate G with output track guards G_L and G_R is trivial if every path p that can reach G either traverses a track guarded by G_L and another guarded by ¬G_R before reaching G, or traverses a track guarded by ¬G_L and another guarded by G_R before reaching G. Note that if G is reachable (i.e., non-redundant), then it must be the case that G_L ≢ G_R. We say that gate G is trivial because any path leading to G fully dictates which output track will be traversed. The following lemma holds by definition of a trivial fork gate.
Lemma 1 A non-redundant fork gate G in a DNA walker circuit is deterministic if it is trivial.
Note that when G_L ≡ ¬G_R (i.e., the guards are negations of each other), a gate is trivial if every path p that can reach G must first traverse a track guarded by any of G_L, G_R, ¬G_L, or ¬G_R. An example of a trivial fork gate of this kind is depicted in Fig. 2. A trivial fork gate does have uses, as we will see in Sect. 2.3.
Lemma 2 A fork gate in a DNA walker circuit without a distinct guard on both output tracks is either redundant, trivial or not deterministic.
The following theorem shows that non-redundant fork gates that are deterministic must be trivial or have output guards that are negations of each other.
Theorem 1 A non-redundant fork gate in a DNA walker circuit is deterministic if and only if it is trivial or there exists some guard G such that the left output track is guarded by G and the right is guarded by ¬G.
Given any Boolean function f : {0,1}^n → {0,1}, there exists a deterministic DNA walker circuit set S that can evaluate f, under any assignment to its n variables, such that TIME(S) = O(n). One construction is to simply form a canonical binary decision tree over some fixed order of the n variables. However, in such a construction SIZE(S) = Θ(2^n). It is natural to consider more space efficient representations to evaluate f, such as binary decision diagrams (BDDs) (Bryant 1992). In particular, reduced ordered BDDs are capable of representing some Boolean functions in a compressed form that can be exponentially smaller than the canonical binary decision tree representation. Like walker circuits, BDDs have a unique source. Unlike general BDDs, DNA walker circuits are necessarily planar. Either we are limited to considering planar BDD representations or additional fork and join nodes must be added to a BDD representation when realising it as a walker circuit; we show how a non-planar circuit can be made planar in Sect. 2.3. A significant difference, however, is that BDDs form directed acyclic graphs while tracks in a DNA walker circuit are undirected. Consider the case when a walker reaches a join gate via its left input track. Unless the right input track is blocked, the walker is equally likely to continue on the right input track as it is on the output track. Additional steps are necessary to compensate for the undirected nature of tracks in walker circuits.
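The decision tree construction can be sketched as follows. This is an illustrative encoding only: representing a fork gate as a Python pair and a sink as a boolean is an assumption, and `majority` is merely an example function. Only fork gates are counted; the sinks add a further term that does not change the Θ(2^n) bound.

```python
from itertools import product


def build_tree(f, n, prefix=()):
    """Canonical decision tree over variables x_0..x_{n-1} in a fixed order.

    Internal nodes are pairs (false_branch, true_branch), i.e. fork gates with
    guards not-x_i and x_i; leaves are booleans, i.e. false or true sinks.
    """
    if len(prefix) == n:
        return bool(f(*prefix))
    return (build_tree(f, n, prefix + (False,)),
            build_tree(f, n, prefix + (True,)))


def count_forks(tree):
    """Number of fork gates (internal nodes) in the tree."""
    if isinstance(tree, bool):
        return 0
    return 1 + count_forks(tree[0]) + count_forks(tree[1])


def walk(tree, assignment):
    """Follow the single unblocked path; return (sink value, forks traversed)."""
    node, forks = tree, 0
    for bit in assignment:
        node = node[1] if bit else node[0]
        forks += 1
    return node, forks


def agrees_everywhere(f, n):
    """Check that the tree reports f on all 2^n assignments after n forks."""
    tree = build_tree(f, n)
    return all(walk(tree, bits) == (bool(f(*bits)), n)
               for bits in product([False, True], repeat=n))


def majority(a, b, c):
    """Example function: true when at least two inputs are true."""
    return (a + b + c) >= 2
```

For n = 3 the tree has 2^3 − 1 = 7 fork gates and every walk traverses exactly 3 of them, which matches the linear time and exponential size stated above.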
Unlike for fork gates, it is not obvious whether all join gates can be made deterministic. Theorem 2 characterizes both the necessary and sufficient conditions: a deterministic join of two disjoint sets of paths, one for each input track, is only possible if they were previously "forked"² on some variable X (i.e., in one set all paths traverse an edge guarded by X and in the other set all traverse an edge guarded by ¬X). This property is exemplified by the contrast between the disjunction circuit of Fig. 3a and the disjunction of two conjunctions circuit shown in Fig. 3b. In the latter, two walkers are used in an attempt to parallelize the evaluation. However, as the clauses do not have literals over a common variable, there are no guards that can be assigned to the join gate labelled J to ensure the circuit is deterministic. Note that this limitation is not caused by the restricted topology of walker circuits (i.e., their layout on a planar surface), but rather by the property that their tracks are undirected.
Theorem 2 A non-redundant join gate in a DNA walker circuit is deterministic if and only if it is a sink or there exists some guard G such that the left input track is guarded by G, the right by ¬G and, prior to reaching those guards, all paths that can reach the left input must traverse a track guarded by G and all paths that can reach the right must traverse a track guarded by ¬G.

Fig. 3 (caption fragment) No assignment of guards to the join gate labelled J can ensure that this circuit is deterministic. This is evident when
Composable DNA walker circuits
Despite the shortcomings of join gates in current DNA walker circuits, it is not the case that Boolean formulas must be evaluated using a circuit forming a binary decision tree. Large scale combinatorial Boolean circuit design is possible because of composable gates. In this section, we demonstrate that DNA walker circuits can simulate fundamental Boolean functions that are easily composable into larger circuits. A gate is composable if and only if it has one input gate and two output gates-one denoting true, the other false-such that the input and output gates lie on the external face of the connectivity graph of the circuit. The last condition ensures that a track can be connected to these gates without crossing another existing track. A common design pattern is to first create gates for simple functions. In turn, these can be composed to create circuits for functions with increased complexity. Using automated verification techniques, these fundamental component circuits could be designed and optimised for a number of criteria, including reliability and expected time. (This is the focus of Sect. 3.)
We begin by considering composable DNA walker circuits that can function as composable gates for Boolean functions over two variables. Each DNA walker circuit can realize a number of different Boolean functions by interpreting different blocking anchorages as different Boolean guards. Consider the composable circuits over two variables, X and Y, in Fig. 4; the common circuit topology shown in the top row can realize six different Boolean functions, while the common topology shown in the middle and bottom rows can realize eight different Boolean functions. Furthermore, each instantiation is fully composable: its source can be connected to the output of a previous circuit, and both its true and false outputs can be connected to other circuits.
The common topology of the middle and bottom rows can be composed to simulate any Boolean function. This is possible as the topology (i) is composable and (ii) can function as either of two universal gates (NAND and NOR). While just two different topologies can realize fourteen different Boolean functions over two variables, note that there are in fact sixteen possible Boolean functions over two variables.³ The two functions which cannot be realized by either of these topologies are EQUAL (are both inputs equal?) and XOR (are the inputs different?). The two topologies of Fig. 4 are called simple circuits: circuits that do not contain trivial fork gates. Recall that a fork gate with output guards G and ¬G is trivial if all paths leading to the gate must first traverse a track guarded by G or ¬G. We next show that topological constraints preclude composable, simple circuits from evaluating certain classes of functions, such as EQUAL and XOR. An invalid topology for a simple, composable DNA walker circuit that can simulate the EQUAL and XOR functions is shown in Fig. 5.
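The count of sixteen functions, and the two that are singled out, can be reproduced by enumerating truth tables. The sketch below is purely illustrative; it does not model the topologies of Fig. 4 and only identifies each function with its output column over a fixed input order, which is an encoding chosen here.

```python
from itertools import product

INPUTS = list(product([False, True], repeat=2))  # all (X, Y) pairs, fixed order


def truth_table(f):
    """Output column of f over the four inputs; this identifies the function."""
    return tuple(f(x, y) for x, y in INPUTS)


# Every Boolean function of two variables is one output column of length 4.
ALL_FUNCTIONS = set(product([False, True], repeat=len(INPUTS)))

XOR = truth_table(lambda x, y: x != y)
EQUAL = truth_table(lambda x, y: x == y)
NAND = truth_table(lambda x, y: not (x and y))
NOR = truth_table(lambda x, y: not (x or y))

# The functions left once EQUAL and XOR are set aside.
remaining = ALL_FUNCTIONS - {XOR, EQUAL}
```

There are 2^4 = 16 columns in total, and removing XOR and EQUAL leaves fourteen, among them the universal gates NAND and NOR.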
Our strategy for identifying a class of topologies which cannot be realized as composable, simple walker circuits is to identify those which are non-planar. We use the following characterization of planar graphs due to Wagner (1937). Specifically, we will identify topologies that contain the forbidden graph minor K_{3,3}, the complete bipartite graph where each partition has three vertices.
Theorem 3 (Wagner 1937) A finite graph is planar if and only if it contains neither K_5 nor K_{3,3} as a minor.
Theorem 4 If a DNA walker circuit is composable and simple then both output gates are not reachable from two other distinct gates.
Proof Suppose, for contradiction, that some DNA walker circuit C is composable and simple and that both output gates are reachable from two other distinct gates. Label one output gate t, the other f, the single input gate s and let p and q be the labels of the two distinct gates which can reach both t and f. There must exist a path from s to p and a path from s to q. By the assumption, there is also a path from p to t, p to f, q to t and q to f. As C is a composable circuit, s, t and f must be on the external (unbounded) face of the connectivity graph. To enforce this condition, add a new vertex x to the external face and connect it to s, t and f. If s, t and f did lie on an external face before adding x, then they can be connected to x without crossing any edges. The resulting topological connections are illustrated in Fig. 6 (left). However, these connected paths contain K_{3,3} as a minor and therefore C is not planar by Theorem 3. As such, C is not a composable, simple walker circuit (or a valid walker circuit in general). Contradiction. □
The composable, simple circuits of Fig. 5 are invalid as their topologies are non-planar. However, composable circuits for these functions which are planar, but not simple, can be realized. A non-planar topology can be made planar by introducing additional join and trivial fork gates. The general strategy is illustrated in Fig. 7a. Consider a pair of crossing paths (tracks). Firstly, their intersection is replaced by a simple gadget consisting of a join gate, a connecting track, and a fork gate. As each connected component of a walker circuit has a single source, these paths must have diverged from one or more fork gates, as otherwise they would be the same path. Suppose that the last fork gate where these paths diverged had output guards L and ¬L. Without loss of generality, suppose the path originating in the top left of the crossing traversed the output track guarded by L (of the last fork gate at which the paths diverged). Then the path originating in the top right of the crossing traversed the output track guarded by ¬L. Secondly, add the guards L and ¬L to the left and right input tracks of the new join gate, respectively, and add the guards ¬L and L to the left and right output tracks of the fork gate, respectively. This completes the transformation. By Theorem 2, the new join gate is deterministic as it is guarded by L and ¬L, a variable on whose value the two paths were previously "forked". By Theorem 1, the fork gate is deterministic as the guards are negations of each other. Together, these gates ensure that there is a unique path through the connecting gadget for any variable assignment. An example of transforming the invalid, composable, simple walker circuit for XOR into a valid, composable (but not simple) walker circuit is shown in Fig. 7b. In this case, the crossing paths last diverged at the initial fork gate guarded by X and ¬X. In general, this transformation necessarily results in a larger circuit: two gates and four guards are added for every pair of crossing paths.

Fig. 6 An illustration used in the proof of Theorem 4 demonstrating that no composable, simple circuit is planar if both output gates are reachable from two distinct gates. If both output gates were reachable from two distinct gates, then the connectivity graph (left) must contain K_{3,3} (right) as a graph minor

³ A Boolean function must differ from another on at least one input. As such, there are 2^(2^n) possible Boolean functions of n variables that differ in at least one of the 2^n possible inputs. Intuitively, this is the number of unique binary strings of length 2^n.
Reporting output in DNA walker circuits
Output of a DNA walker circuit can be reported with the use of different coloured (spectrally resolvable) fluorophores and also quenchers. If a walker carries a quencher cargo, then it has the potential to decrease one of a number of different fluorescent signals from fluorophores positioned at the circuit sinks. This scenario is illustrated in Fig. 8 (Left). In a circuit that decides a Boolean function, a single, quenching walker can only decrease the signal of a particular colour (corresponding to a particular fluorophore) by an amount that is inversely proportional to the number of sinks labelled with that same colour. Accurate output reporting could be problematic in larger circuits with many sinks. We will therefore focus only on reporting strategies that fully suppress a particular colour. Rather than carrying a quencher, a walker instead carries a fluorophore of a single colour and either all true sinks or all false sinks are labelled with quenchers. An example with quenching true sinks is shown in Fig. 8 (Center). This circuit can fully suppress the fluorophore signal when it evaluates to true, regardless of its size. However, this is a one-sided reporting strategy as one cannot distinguish between an incomplete computation and one evaluating to false. As illustrated in Fig. 8 (Right), this shortcoming can be addressed by using two circuits in parallel, with each using a one-sided reporting strategy. Each of the two (otherwise identical) circuits uses a different coloured walker: one has quenching false sinks and the other quenching true sinks. In this circuit set, one colour will be fully suppressed when the circuit evaluates to true, the other when it evaluates to false, and neither will be suppressed until the computation completes.

Any Boolean formula can be represented in one of its canonical forms. In this section, we focus on conjunctive normal form (CNF), which is a single conjunction of clauses, where each clause is a disjunction over literals.
A formula in CNF is said to be k-CNF if the largest clause has size k. Using a standard transformation, a Boolean formula in k-CNF with at most l total literals can be converted to an equisatisfiable 3-CNF formula over O(l) variables, with at most O(l) clauses (each having at most 3 literals) (Karp 1972). As such, we will reason exclusively about circuits to evaluate 3-CNF formulas. Constructing a walker circuit to represent a formula in 3-CNF with m clauses is straightforward. Each clause can be represented by the disjunction circuit of Fig. 3a. The source of the circuit will be the first fork gate of the first clause. The output track signalling that the i-th clause is satisfied is connected to the input track of clause i + 1. Thus, the walker will only reach the single true sink of the circuit (output from clause m) if the formula is satisfied for that particular variable assignment. To ensure that both true and false signals can be reported deterministically, we use the reporting strategy depicted in Fig. 8 (Right) which employs two parallel copies of the circuit, each using different coloured walkers and different quenching sinks.
Theorem 5 Let F be any 3-CNF Boolean formula with m clauses. There exists a DNA walker circuit set S, with SIZE(S) = Θ(m) and TIME(S) = O(m), such that given any variable assignment A for F, VALUE(S[A]) is the truth value of F under assignment A.
Proof The construction is described in the preceding paragraph. It is easy to see that the circuit is deterministic and that it correctly reports the truth value of F under assignment A. What remains is to bound the circuit size and worst case time. The construction uses a set of two circuits: C_T and C_F. Consider the circuit C_T used to evaluate if F is true under assignment A. There are m clauses and each is simulated by a disjunction circuit of size O(1). These circuits are composed in series to form C_T. Therefore, SIZE(C_T) = Θ(m) and TIME(C_T) = O(m). The arguments are the same for circuit C_F and, as both are evaluated in parallel, the claim follows. □
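The series composition of clause circuits can be illustrated with a small simulation. This is a sketch under assumptions: the internal layout of the disjunction circuit of Fig. 3a is not reproduced here, so the code assumes one fork gate per literal, tried in order, and the example formula and variable names are hypothetical.

```python
from itertools import product

# A literal is (variable, polarity); a clause is a list of up to three literals.
FORMULA = [
    [("x1", True), ("x2", False), ("x3", True)],   # x1 or not-x2 or x3
    [("x1", False), ("x2", True), ("x3", True)],   # not-x1 or x2 or x3
]


def walk_cnf(clauses, assignment):
    """Visit clause circuits in series; return (value, forks traversed).

    Assumed clause layout: one fork per literal, tried in order; the first true
    literal sends the walker on to the next clause, and a clause with no true
    literal sends it to a false sink.
    """
    forks = 0
    for clause in clauses:
        for var, polarity in clause:
            forks += 1
            if assignment[var] == polarity:
                break            # clause satisfied: continue to the next clause
        else:
            return False, forks  # no literal held: walker reaches a false sink
    return True, forks           # output track of the last clause: true sink


def direct_value(clauses, assignment):
    """Reference semantics of a CNF formula, for comparison."""
    return all(any(assignment[v] == p for v, p in clause) for clause in clauses)


def max_forks(clauses, variables):
    """Worst-case forks traversed over all assignments (at most 3 per clause)."""
    return max(walk_cnf(clauses, dict(zip(variables, vals)))[1]
               for vals in product([False, True], repeat=len(variables)))
```

Because each clause contributes a constant number of gates and the walker visits clauses one after another, both the gate count and the worst-case path length grow linearly in m under this layout, in line with the bounds of Theorem 5.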
While the construction of Theorem 5 can represent any Boolean formula, and some in exponentially less space than a binary decision tree, the resulting circuit set is formula specific. Given the effort of creating DNA walker circuits, a more uniform circuit, one capable of evaluating many Boolean functions, is worth exploring. As with silicon circuits, we can construct a uniform circuit to evaluate any 3-CNF formula, under any variable assignment, up to some bound on the number of variables. Each variable can be present in a clause as either a positive or negative literal, but not both. (The circuit can be modified to handle this case if necessary.) Therefore, there are at most 2^3 · (n choose 3) unique clauses in any 3-CNF Boolean formula over n variables. This bound also holds true for any formula over m variables, where m ≤ n. In this general circuit, we supplement each possible clause with an initial fork gate guarded on the condition of the clause being active or inactive in the particular formula being evaluated. If it is inactive, the walker can pass through to the output track denoting true, without traversing guards for the literals of the clause. Note that this only increases the size of each clause by a constant.

Fig. 8 (caption fragment) Center: a green coloured walker and quenching true sinks. When the circuit evaluates to true the green signal is fully suppressed. However, the fluorescence output from this circuit cannot distinguish between an incomplete computation and a false one. Right: two parallel copies of the circuit, with different fluorophores labelling the walkers and with quenching true sinks in one and quenching false sinks in the other: the computation is complete and unambiguously reported when one colour is suppressed
Corollary 1 There exists a DNA walker circuit set S, with SIZE(S) = O(n^3) and TIME(S) = O(n^3), that can evaluate any 3-CNF Boolean formula over m ≤ n variables under any variable assignment.
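The cubic bound follows from counting the possible clauses: three distinct variables are chosen out of n, and each is given one of two polarities. The sketch below checks the closed form against an explicit enumeration; the function names are illustrative.

```python
from itertools import combinations, product
from math import comb


def clause_bound(n):
    """2^3 * (n choose 3): pick three distinct variables, then a polarity each."""
    return 2**3 * comb(n, 3)


def enumerate_clauses(n):
    """All clauses over three distinct variables out of n, with polarities."""
    return [tuple(zip(triple, signs))
            for triple in combinations(range(n), 3)
            for signs in product([False, True], repeat=3)]
```

Since (n choose 3) = n(n − 1)(n − 2)/6 is at most n^3/6 and each clause circuit has constant size, a uniform circuit that contains every possible clause has O(n^3) gates.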
A 3-CNF formula with m clauses can be evaluated in polylogarithmic time (in m) using a silicon circuit in a straightforward manner: each clause can be evaluated in parallel and those results can be combined using a binary reduction tree of height O(log m); only if all clauses are satisfied will the root of the reduction tree output true. Is the same possible in DNA walker circuits? Unfortunately, this is not the case in general. Such a circuit would require a new kind of join gate, outside of our current model of computation, to perform a conjunction of multiple walkers: one walker leaves the gate only after all input walkers have arrived. Parallel evaluation of circuits representing formulas in disjunctive normal form (DNF) does not fare better. Consider the case of a DNF formula with m clauses where clause m − 1 and clause m have no literals over a common variable. By Theorem 2, a join gate connecting the circuits for these clauses cannot be deterministic. An example of this situation is given in Fig. 3b.
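For comparison, the silicon-style reduction tree mentioned above can be sketched in a few lines. The code runs sequentially; it only counts the levels that parallel hardware would evaluate in one step each, and the function name is illustrative.

```python
def reduce_and(values):
    """Combine clause results pairwise; return (conjunction, number of levels).

    Expects at least one clause result.
    """
    level, height = list(values), 0
    while len(level) > 1:
        # One level of the tree: every adjacent pair is combined independently,
        # so on parallel hardware the whole level takes one time step.
        level = [all(level[i:i + 2]) for i in range(0, len(level), 2)]
        height += 1
    return level[0], height
```

With 8 clause results the tree has 3 levels, and with 5 it also has 3, i.e. the ceiling of log2(m). It is this pairwise conjunction of independently computed results that has no counterpart among the join gates of the walker model.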
Design and verification of DNA walker circuits
We have so far assumed DNA walker circuits to work perfectly. In a real experiment various errors can occur, for example, the walker may release from a track, or a blockade can fail to block an anchorage. In this section, we analyse the reliability and performance of DNA walker circuits using probabilistic model checking. We develop a continuous-time Markov chain (CTMC) model, based on DNA walker experiments (Bath et al. 2005; Wickham et al. 2011, 2012), and analyse it against quantitative properties such as the probability of the computation terminating or the expected number of steps until termination. We use the PRISM model checker (Kwiatkowska et al. 2011), which accepts properties in the form of temporal logic. The stepping behaviour of the walker is modelled using rate constants that were estimated in previous work (Wickham et al. 2011). The predictions of the model match the experiments in Fig. 9 well, and we use the rate constants without further fitting. We do, however, fit a failure rate for blockades using single-junction tracks (Fig. 10). The quality of the model is evaluated by comparison with experimental results for the double-junction circuit. While doing so, we identify a leakage transition within that circuit, which can be removed by changing the layout of the circuit. As an example, we optimise a large circuit composed of three circuits that only differ in blockade guards, and state design principles that aim to minimize leakage reactions.
Probabilistic model checking with PRISM
Probabilistic model checking is a formal verification technique for the modelling and analysis of discrete stochastic models, such as those arising from biochemical reactions (Kwiatkowska et al. 2007, 2010). We use the probabilistic model checker PRISM (Kwiatkowska et al. 2011), which supports numerical (uniformisation) and statistical (Monte Carlo) methods, to compute various properties of DNA walker circuits. PRISM employs the Continuous Stochastic Logic (CSL) (Aziz et al. 2006; Baier et al. 2003), endowed with the reward operator introduced in Kwiatkowska et al. (2007), over which the properties are specified. A path is an alternating sequence of states and residence times, where states are labelled with atomic propositions, e.g., we can label all states of the model where a walker quenches any fluorophore by "finished". Temporal logic properties allow us to reason about the order of events in a path. For example, the formula "F finished" (eventually) is true for a path if there exists a state in the path labelled "finished", and the time-bounded formula "F^[0,T] finished" (eventually by time T) is true if there exists a state in the path labelled "finished" that is reached in up to T time units. More generally, given state formulas Φ and Ψ, the (until) formula "Φ U^[T,T'] Ψ" is true for a path if Ψ becomes true at a time point in the interval [T, T'], and Φ remains true until then. The unbounded variant "Φ U Ψ" is defined similarly. The formula "F Φ" is equivalent to "true U Φ".
The set of paths of a CTMC is endowed with a probability measure (Baier et al. 2003; Kwiatkowska et al. 2007), and we can use the CSL probability operator $\mathrm{P}$, which takes a path formula as an argument, to reason about the probability of events occurring. For example, the formula $\mathrm{P}_{=?}[\,\mathrm{F}^{[T,T]}\ \mathit{finished}\,]$ yields the probability of all paths that satisfy $\mathrm{F}^{[T,T]}\ \mathit{finished}$ (in other words, the probability of the event of reaching a state where a walker has quenched a fluorophore at time $T$). PRISM also provides the reward operator $\mathrm{R}$, which computes the expected reward over paths. The model is enhanced with a reward structure, which assigns nonnegative real numbers to states and transitions. For example, the formula $\mathrm{R}_{\{\text{"steps"}\}=?}[\,\mathrm{C}^{\leq T}\,]$ denotes the expected number of walker steps accumulated until time $T$ ($\mathrm{C}$ stands for cumulative reward), assuming the reward structure "steps" assigns the value 1 to each transition that represents a walker step and 0 to all states.
To facilitate the generation of walker circuit models, we develop a custom tool to generate PRISM models with matching track-layout graphs. PRISM then builds an internal representation of the model and computes the probability and reward values for the given formulas. There are several methods provided to compute these (Kwiatkowska et al. 2007, 2011), including numerical computation performed to a specified precision based on uniformisation, and simulation-based methods (also known as approximate model checking). For more details concerning probabilistic model checking of molecular networks the reader is invited to consult (Kwiatkowska et al. 2007, 2010).
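To illustrate the numerical route, the following is a minimal sketch of uniformisation for a toy CTMC. It is not PRISM's implementation. The toy track (four anchorages in a line, forward steps only, a single rate of 0.009 taken to be per second) is an assumption made for illustration, and the function and variable names are hypothetical.

```python
import math


def transient_distribution(rates, initial, t, eps=1e-9):
    """Approximate the CTMC state distribution at time t by uniformisation.

    rates[i][j] is the rate of the transition i -> j (diagonal entries are 0);
    initial is the state distribution at time 0.
    """
    n = len(rates)
    exit_rates = [sum(row) for row in rates]
    q = max(exit_rates)  # uniformisation rate
    if q == 0.0 or t == 0.0:
        return list(initial)

    weight = math.exp(-q * t)  # Poisson(q*t) probability of zero jumps
    if weight == 0.0:
        raise ValueError("q*t is too large for this naive sketch")

    def jump(dist):
        # One step of the uniformised discrete-time chain P = I + Q/q.
        new = [0.0] * n
        for i, mass in enumerate(dist):
            new[i] += mass * (1.0 - exit_rates[i] / q)
            for j, r in enumerate(rates[i]):
                new[j] += mass * r / q
        return new

    result = [0.0] * n
    dist = list(initial)
    jumps, covered = 0, 0.0
    # Sum Poisson-weighted jump distributions until the ignored mass is < eps.
    while covered < 1.0 - eps:
        for i in range(n):
            result[i] += weight * dist[i]
        covered += weight
        jumps += 1
        weight *= q * t / jumps
        dist = jump(dist)
    return result


# Hypothetical toy track: 4 anchorages in a line, forward steps only.
K_S = 0.009  # base stepping rate; units of 1/s are an assumption
N = 4
rates = [[K_S if j == i + 1 else 0.0 for j in range(N)] for i in range(N)]
dist = transient_distribution(rates, [1.0, 0.0, 0.0, 0.0], t=600.0)
print(f"P(walker on last anchorage at T = 10 min) ~ {dist[N - 1]:.4f}")
```

Because the last state is absorbing, its probability at time T plays the role of a $\mathrm{P}_{=?}[\,\mathrm{F}^{[T,T]}\ \mathit{finished}\,]$ query on this toy chain. The sketch truncates the Poisson sum naively and would need a more careful scheme for large values of the rate times the time bound.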
Model
In the model, each transition corresponds to the walker changing its location from one anchorage to the next, skipping over any intermediate steps (Fig. 1). Anchorages are constrained to lie on a triangular lattice (Wickham et al. 2011, 2012). Experiments show that the walker can step onto anchorages that are fixed as far away as 19 nm. We assume non-zero rates for the walker to step onto any intact anchorage within 24 nm distance. This range was chosen by taking into account the lengths of the empty anchorage and walker-anchorage complex, estimated around 15 nm and 11 nm respectively.
Assume that the stepping rate $k$ depends on the distance $d$ between anchorages and some base stepping rate $k_s$. Denote by $d_a = 6.2$ nm the average distance between anchorages in the experiment shown in Fig. 9. Denote by $d_M = 24$ nm the maximal interaction distance discussed earlier. Based on previous experimental estimates of (Wickham et al. 2011), we approximate the stepping rate $k$ as
$$
k = \begin{cases}
k_s & \text{when } d \leq 1.5\,d_a \\
k_s/50 & \text{when } 1.5\,d_a < d \leq 2.5\,d_a \\
k_s/100 & \text{when } 2.5\,d_a < d \leq d_M \\
0 & \text{otherwise.}
\end{cases}
$$
These rates define a sphere of reach around the walker-anchorage complex, allowing the walker to step onto an uncut anchorage when it is nearby. In Fig. 10b the sphere of reach is depicted to scale with walker circuits. Because both the initial anchorage and the absorbing anchorages are slightly different from regular anchorages, we allow two exceptions to the stepping rate. Firstly, the domain complementary to the walker on the initial anchorage is two bases longer than the corresponding domain of a regular anchorage. Stepping from the initial anchorage was reported to happen 3× more slowly: this is incorporated in the model. Secondly, absorbing anchorages include a mismatched base that prevents cutting by the nicking enzyme. Based on the experimental data, we fit a tenfold reduction for the rate of stepping onto the final absorbing anchorage (Fig. 9).
[Fig. 10 caption, beginning truncated] ... is the probability that a finished walker quenches the correct fluorophore at time $T$ (conditional probability), expressed as the ratio of $\mathrm{P}_{=?}[\,\mathrm{F}^{[T,T]}\ (\mathit{finished\text{-}correct})\,]$ and $\mathrm{P}_{=?}[\,\mathrm{F}^{[T,T]}\ (\mathit{finished})\,]$. Deadlock is the probability for the walker to get stuck prematurely by time $T$, with no intact anchorage within reach (see Fig. 13), given by property $\mathrm{P}_{=?}[\,\mathrm{F}^{[T,T]}\ \mathit{deadlock}\,]$. Steps indicates the expected number of steps taken by time $T$, given by property $\mathrm{R}_{\{\text{"steps"}\}=?}[\,\mathrm{C}^{\leq T}\,]$, where "steps" is a reward structure that assigns 1 to each step transition. The red dotted line indicates a leakage transition. The results for the single-junction circuit are obtained using PRISM's fast adaptive uniformisation (Dannenberg et al. 2013) method to an absolute error of at most $10^{-6}$. The results for the dual-junction circuit are generated by checking at least $10^5$ paths against the property.
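The distance-dependent stepping rate can be sketched as a small function. This is an illustration under stated assumptions: the base rate is taken to apply up to $1.5\,d_a$ and the value $k_s/50$ to the intermediate band, as suggested by the later discussion of waiting times. The numerical value 0.009 is the base rate quoted later in the text, its unit (per second) is an assumption, and the function name is hypothetical. The two exceptions for the initial and absorbing anchorages are not included.

```python
K_S = 0.009  # base stepping rate k_s; units of 1/s are an assumption
D_A = 6.2    # average distance between anchorages, in nm
D_M = 24.0   # maximal interaction distance, in nm


def stepping_rate(d, k_s=K_S, d_a=D_A, d_m=D_M):
    """Return the stepping rate onto an intact anchorage at distance d (nm)."""
    if d <= 0.0:
        raise ValueError("distance must be positive")
    if d <= 1.5 * d_a:
        return k_s          # assumed: base rate for nearest anchorages
    if d <= 2.5 * d_a:
        return k_s / 50.0   # intermediate band
    if d <= d_m:
        return k_s / 100.0  # far band, up to the maximal interaction distance
    return 0.0              # out of reach


for distance in (6.2, 12.4, 19.0, 25.0):
    print(f"d = {distance:5.1f} nm -> k = {stepping_rate(distance):.6f}")
```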
Three types of interaction that are known to occur are omitted from the model: all three could be incorporated in future. Firstly, a rate of $k_s/5000$ is reported (Wickham et al. 2011) for transfer of the walker between separate tracks built on different DNA origami tiles. Transfer between tiles could be eliminated by binding the tiles to a surface, thus keeping them apart. Secondly, the walker can move between intact anchorages, with a rate of $\sim k_s/13$ (Wickham et al. 2011). We assume that the enzymatic activity is high, so that an anchorage is nicked as soon as the walker steps onto it. Thirdly, the walker can step backward onto cut anchorages. This requires a blunt-end strand-displacement reaction which is known to be slow relative to toehold-mediated displacement (Zhang and Winfree 2009). A variant of the model with a backward rate $k_b = k/500$ is shown in dotted lines in Fig. 9 (Left). In this case the model predicts significant quenching of fluorophore F2 at late times by walkers whose forward motion is obstructed by omission of one or more anchorages: this does not match experimental data.
The time-dependent responses of fluorescent probes F2 and F8 shown in Fig. 9 (Top) are predicted by the Markov chain model using the rate parameters discussed above without any further fitting: they correspond well to the experimental data.
An additional parameter is needed to model branched circuits (Fig. 10a). We add a possibility of failure of the blocking mechanism, such that, before the walker starts the computation, each blocked anchorage may unblock spontaneously. The probability of unblocking is uniform for all blocked anchorages. The walker can step onto such spontaneously unblocked anchorages, and may thus divert from the intended path. This may delay the walker, or, worse, it may even direct the walker to reach a different end-node, leading to the computation returning the wrong result. We infer a failure rate of 30 % by fitting to the results of the single-junction experiment (Wickham et al. 2012), see Fig. 10.
Additional sources of error may exist in the system, for example, an anchorage or the walker itself could be missing from the track, or the reporting mechanism could fail. These have not been considered, but could be added to the model using our approach.
Model results
Having used experiments on straight tracks and the single-junction circuit to determine the parameters of the model, we use the double-junction circuit shown in Fig. 10c to evaluate its quality. The model captures essential features of the walker behaviour and is reasonably well aligned with experimental data. In the model, not all walkers reach an absorbing anchorage by time $T = 200$ min, although the predicted quenching is much higher than observed. The reason for this discrepancy is not easily determined and motivates further study. We exercise the model by model checking against temporal logic queries aimed at quantifying the reliability and performance of the computation. We note that not all the walkers that finish actually do quench the intended fluorophore. In both the model and the experiments we can identify a difference between paths that follow the side of the circuit (paths LL and RR in Fig. 10), and paths that enter the interior (paths RL and LR): the probability of a correct outcome for the side paths is greater. This is explained by leakage transitions between neighbouring paths; an example leakage transition is indicated by a red dotted line in Fig. 10c. Walkers on an interior path can leak to both sides, but a path that follows the exterior can only leak to one side. This effect can also be shown by inspecting paths. The property
denotes the probability that a walker stays on the intended path until it quenches the correct fluorophore, by time T. By using this property we can reason about the probability of the walker deviating from the intended path. For the double-junction circuit in Fig. 10c , we infer that the probability of staying on the intended path and reaching the absorbing anchorage within 200 min is 55 % for paths LR and RL, and 58 % for paths LL and RR. This shows that walkers on interior paths are indeed more likely to deviate from the intended path than walkers on paths that follow the exterior of the circuit.
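A path property of this kind can be written with the time-bounded until operator introduced earlier. The following is only a sketch of one possible form: the labels $\mathit{on\_path}$ and $\mathit{finished\_correct}$ are hypothetical names, and the choice of the interval $[0,T]$ is an assumption.

```latex
\[
  \mathrm{P}_{=?}\bigl[\ \mathit{on\_path}\ \ \mathrm{U}^{[0,T]}\ \ \mathit{finished\_correct}\ \bigr]
\]
```

Read with the semantics given above, this asks for the probability of paths on which $\mathit{finished\_correct}$ becomes true at some time in $[0,T]$ while $\mathit{on\_path}$ holds at all earlier times.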
The double-junction circuit can be improved by reducing the probability of leakage from the intended path. By decreasing the proximity of off-path anchorages and reducing the track length, both the proportion of walkers finishing and correctness are increased (see Fig. 10d). The asymmetry between paths (LL, RR vs. LR, RL) also disappears.
Increasing the number of consecutive blockades that form a track guard also results in better performance. Fig. 11 shows a redesign of the circuit from Fig. 10c; the number of consecutive blockades that constitute a guard has been increased from two to six. Additional blockades beyond the second are decreasingly effective at improving the probability that the walker arrives at the correct end-node, while the probability of deadlock increases with the depth of the circuit. Deadlock occurs when a walker is isolated on a non-absorbing anchorage with no intact anchorage in range, which can happen when the walker switches direction after stepping over an intact anchorage, as in Fig. 13. From a computational standpoint deadlock is undesirable, as it is impossible to differentiate a deadlocked process from a live process. Note that the leakage transition from the original double-junction circuit (see red arrow in Fig. 10c) is still present in this circuit.
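A rough back-of-the-envelope calculation helps to see why extra blockades give diminishing returns. The sketch below assumes that blockades fail independently with the fitted probability of 30 % and, as a deliberate simplification, that a guard is breached only if all of its blockades unblock. The model itself is geometric (the walker can step over nearby anchorages), so these numbers are illustrative only and are not results from the paper. The function name is hypothetical.

```python
def all_blockades_unblocked(p_fail, n_blockades):
    """Probability that every blockade in a guard unblocks (independence assumed)."""
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be a probability")
    if n_blockades < 1:
        raise ValueError("a guard needs at least one blockade")
    return p_fail ** n_blockades


P_FAIL = 0.30  # blockade failure probability fitted in the text

for n in range(1, 7):
    print(f"{n} blockade(s): {all_blockades_unblocked(P_FAIL, n):.6f}")
```

Under these assumptions, the absolute reduction gained by each additional blockade shrinks geometrically, while every added anchorage lengthens the track.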
The expected waiting time before the walker steps is, given the model, equal to $1/k$, where $k$ is the rate of stepping. This means that, for unblocked anchorages on the origami that are immediately adjacent to the walker-anchorage complex, the expected waiting time is $1/0.009\ \mathrm{s}^{-1} \approx 111$ s $\approx 2$ min. Figure 11 shows the probability for the walker to finish the computation (correct or incorrect) and the probability for it to deadlock. If the walker were always able to take a step with the base stepping rate $k_s = 0.009\ \mathrm{s}^{-1}$, then we would expect nearly all walkers to finish computation or deadlock at the given time bound.
The results show a different picture, which is due to the fact that the walker can also step onto anchorages that are further away (i.e., not immediately adjacent on the origami). It is therefore possible to reach a state where all anchorages that are adjacent to the walker on the origami are cut, but one or more uncut (non-adjacent) anchorages are within range of the walker. In this case, the walker is not (yet) deadlocked; however, the stepping rate to these non-adjacent anchorages is either $k_s/50$ or $k_s/100$, depending on their range.
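The expected waiting times implied by these rates follow directly from the mean $1/k$ of an exponentially distributed delay. The short sketch below evaluates them. It assumes that the base rate of 0.009 is expressed per second, which is consistent with the waiting times quoted in the text. The function name is hypothetical.

```python
K_S = 0.009  # base stepping rate k_s; units of 1/s are an assumption


def expected_wait_minutes(rate_per_second):
    """Mean of an exponential waiting time with the given rate, in minutes."""
    if rate_per_second <= 0.0:
        raise ValueError("rate must be positive")
    return 1.0 / rate_per_second / 60.0


for label, rate in (("k_s", K_S), ("k_s/50", K_S / 50.0), ("k_s/100", K_S / 100.0)):
    print(f"{label:8s} -> {expected_wait_minutes(rate):7.1f} min")
```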
Therefore the expected waiting time before a jump occurs can be as high as $1/(k_s/100) \approx 185$ min. By using the property
we can compute the probability for the walker to either finish the computation or deadlock at time $T$. For the circuit in Fig. 11, that probability at time $T = 4$ min × depth of the circuit $= 76$ min is equal to 57 %. If we remove from the model the ability for the walker to step onto an anchorage that is not adjacent on the origami, then the walker is much more likely to finish or deadlock, as the value now becomes 90 %. This shows that the ability of the walker to step onto anchorages that are not adjacent on the origami degrades the performance of the walker. It is unclear at this point whether the actual walker also suffers from this type of behaviour.
The performance of PRISM (Kwiatkowska et al. 2011) depends on the model checking method. For small tracks, as in Fig. 9, PRISM computes the value of a property to a precision of $10^{-6}$ within 2 s, using standard uniformisation and the default options for the hybrid engine on common hardware. Properties for the single-junction circuit in Fig. 10 were model checked within 3 s to a precision of $10^{-6}$ using fast adaptive uniformisation (Dannenberg et al. 2013). For large circuits, for which uniformisation may become infeasible, we may use simulation-based methods that estimate the true value of the probability to a given confidence interval by checking a given number of simulated execution paths against the property (Hérault et al. 2004). For the dual-junction circuit in Fig. 10, single-threaded simulation of $10^5$ paths takes 27 s. For the probability of the walker making it to any absorbing anchorage, PRISM reports the 95 %-confidence interval to be $90.2 \pm 0.2$ % given the input LL.
More details about the molecular walkers case study and the analysis methods used can be found at.
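The width of the reported confidence interval can be sanity-checked with a textbook normal approximation for a binomial proportion. This is a sketch only: PRISM's own interval computation may use a different method, and the function name is hypothetical.

```python
import math


def normal_ci_half_width(p_hat, n_paths, z=1.96):
    """Half-width of an approximate 95 % interval for an estimated probability."""
    if not 0.0 <= p_hat <= 1.0:
        raise ValueError("p_hat must be a probability")
    if n_paths < 1:
        raise ValueError("need at least one simulated path")
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n_paths)


# Values quoted in the text: estimate 90.2 % from 10^5 simulated paths.
half_width = normal_ci_half_width(0.902, 10**5)
print(f"half-width ~ {100.0 * half_width:.2f} percentage points")
```

Under this approximation the half-width comes out just under 0.2 percentage points, which is in line with the interval quoted above. Since the half-width shrinks with the square root of the number of paths, halving it would require roughly four times as many simulated paths.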
Design principles for walker circuits
In this section we show how a composable circuit can be optimised, and establish some design principles that increase the reliability of the circuits. We consider a 3-CNF circuit with three clauses where each clause is a disjunction over three literals.
The first circuit layout in Fig. 12 (Left) is a straightforward embedding of the topology of the circuit.
However, this layout results in potential leak reactions, some of which are indicated by red arrows in Fig. 12 (Left). Such transitions should be removed from the design as far as possible. To minimize the probability of leak reactions, we apply the following design principles for the hand-optimised embedding in Fig. 12 (Right).
-(Principle 1) Increasing the distance between tracks reduces the probability for the walker to deviate from the intended path. Increasing the distance between tracks can be accomplished by using junctions that are equiangular and by elongating connecting tracks.
-(Principle 2) Increasing the length of the tracks increases the probability for the walker to deadlock. In addition, it increases the expected amount of time until the walker finishes the computation. Therefore, long tracks should be avoided if possible.
Note that these two principles can conflict as increasing the distance between tracks can require that at least one be elongated, thus increasing the rate of deadlock. A pragmatic approach was applied to the design of the layout in Fig. 12 (Right). The output tracks leading to the false terminals of each clause were made equiangular to two other tracks incident to the common fork gate (principle 1). Furthermore, the connecting tracks between clauses were elongated (principle 1), but only modestly in order to avoid an unnecessary increase in the probability of deadlock (principle 2). We emphasise that the stated design principles are guidelines based on our understanding of the model, and that our software tool was used as a computer aided design (CAD) tool that provides the user with feedback on the performance of circuits. The manually improved circuit is not necessarily optimal, and it is possible that other designs would perform even better. Optimising the circuit does improve its performance, as demonstrated in Table 1 .
Conclusions
The capability for an autonomous DNA walker to navigate a programmable track has been recently demonstrated (Wickham et al. 2012) . We have considered the potential for this system to implement DNA walker 'circuits'. Working from experimental observations, we have developed a simple model that explains the influence of track architecture, blockade failure and stepping characteristics on the reliability and performance of walker circuits. The model can be further extended as more detailed experimental measurements become available. Model checking enables analysis of path properties and quantitative measures such as the expected number of steps, which cannot be established using traditional ODE frameworks. A major advantage of our approach is that circuit designs can be manipulated to study the properties of variant architectures. We have demonstrated the utility of this approach by manually redesigning circuit layouts to decrease the potential for leak reactions to occur. It is not clear if an optimal layout can be determined by an automated algorithm that is also efficient. However, this may not be necessary. As demonstrated in Sects. 2 and 3, it is possible to design DNA walker circuits that are easily composable and which compute fundamental Boolean functions. By determining optimal layouts for composable circuits, larger circuits can be made more reliable in a principled manner. An example of this design principle was demonstrated in Sect. 3 by first optimising a composable 3-CNF clause gadget that resulted in a more reliable circuit overall. An interesting future direction for research would be to explore whether one can efficiently synthesise or evolve DNA circuit layouts.
We have shown that walker circuits can be designed to evaluate any Boolean function. We have also demonstrated the potential for large DNA walker circuits to be built from composable components. One motivation for implementing circuits with a DNA walker system, instead of a DNA strand displacement system (DSD), is the potential for faster reaction times due to spatial locality. However, the walker system we have considered has severely limited potential for parallel circuit evaluation using multiple walkers. As this is not an issue in a DSD, this walker system requires exponentially more time to compute certain Boolean functions than a corresponding DSD. As an example, an arbitrary 3-CNF formula of $m$ clauses requires $O(m)$ time to be evaluated by this walker system, whereas the same formula could be evaluated by a DSD cascade in $O(\log m)$ parallel time (e.g., by forming a DSD cascade of a binary reduction tree of the formula). This is not necessarily true of all walker systems. The problem arises in the system under consideration due to the undirected nature of the tracks that are traversed by a walker. For the same reason, it is not clear if circuit redundancy techniques (von Neumann 1956) can be used in these walker systems in order to improve their reliability. Other autonomous walker systems with directed tracks have been demonstrated (Green et al. 2008; Muscat et al. 2011; Omabegho et al. 2009; Yin et al. 2004), including mechanisms that do not modify the track (Omabegho et al. 2009; Yin et al. 2004). It would be interesting to explore the potential of reusable circuits, the information processing capabilities of DNA walkers beyond circuit evaluation, and the potential for multiple interacting walkers. Finally, it would be interesting to explore systems where circuits could be evaluated efficiently by many walkers in parallel, and are amenable to well established design techniques to improve overall circuit reliability (von Neumann 1956).
[Table 1 note] In these tests, the time at which the property is evaluated is different for the two circuits and proportional to the size of the circuit. For each of the 128 possible inputs, we simulate 200 paths per property.
[Footnote] In this case the depth of the circuit corresponds to the longest possible path, under any variable assignment.
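The gap between sequential and parallel evaluation of an $m$-clause formula can be made concrete with a small counting sketch. It counts only abstract units (one per clause visited in sequence, one per level of a binary reduction tree) and ignores the constants hidden by the $O$-notation as well as actual reaction times. The function names are hypothetical.

```python
def sequential_units(m_clauses):
    """Clauses visited one after another by a single walker: grows as O(m)."""
    if m_clauses < 1:
        raise ValueError("need at least one clause")
    return m_clauses


def reduction_tree_depth(m_clauses):
    """Levels of a binary reduction tree over m clauses: ceil(log2(m))."""
    if m_clauses < 1:
        raise ValueError("need at least one clause")
    # (m - 1).bit_length() equals ceil(log2(m)) for every integer m >= 1.
    return (m_clauses - 1).bit_length()


for m in (3, 8, 64, 1024):
    print(f"m = {m:5d}: sequential = {sequential_units(m):5d}, "
          f"tree depth = {reduction_tree_depth(m):2d}")
```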
Case 1 ($p$ has no guards in common with an output track of $G$) Consider any variable assignment where $p$ reaches $G$ and set $G_L = G_R = \mathit{false}$. The gate $G$ is still reachable via $p$ since they do not share a guard. However, reaching $G$ results in a deadlock. Contradiction.
Case 2 ($p$ has a guard or its negation in common with one output track of $G$) Suppose $p$ contains the guard $G_L$. Consider any variable assignment where $p$ reaches $G$ and therefore $G_L = \mathit{true}$. By assigning $G_R = \mathit{true}$, both output tracks of $G$ are accessible. Contradiction. The case when $p$ contains the guard $G_R$ is symmetric. Similarly, it can be shown that variable assignments exist to ensure $G$ will result in a deadlock, and thus a contradiction, when $p$ contains $\neg G_L$ or $\neg G_R$.
Case 3 ($p$ has a guard or its negation in common with both output tracks of $G$) Since $G$ is not trivial then (i) $p$ is not guarded by both $G_L$ and $\neg G_R$, and (ii) $p$ is not guarded by both $\neg G_L$ and $G_R$. By conditions (i) and (ii), $p$ must either contain the guards $G_L$ and $G_R$ or the guards $\neg G_L$ and $\neg G_R$. When $p$ reaches $G$ and the former is true, then two output tracks are accessible, and when the latter is true, a deadlock occurs. Both cases result in a contradiction. $\square$
Proof of Theorem 2
Proof (if-sufficiency) If the non-redundant gate is a sink then any path that reaches it from its left input track cannot be extended via its right input track, and vice versa; it is therefore deterministic. Suppose the gate is not a sink and its left input track is guarded by $G$, the right by $\neg G$ and, prior to reaching those guards, all paths that can reach the left input must traverse a track guarded by $G$ and all paths that can reach the right must traverse a track guarded by $\neg G$. There are two cases to consider. Suppose $G$ evaluates to true. Then, no path can reach the right input since, by the assumption, those paths must traverse a track guarded by $\neg G$ prior to reaching the gate. It follows that all paths that can reach the gate when $G$ evaluates to true must be to the left input. Furthermore, as the right input is guarded by $\neg G$, those paths can only be extended via the output of the gate. The other case ($G$ evaluates to false) is symmetric. Furthermore, as the guards are negations of each other, they cannot simultaneously evaluate to false and cause a potential deadlock.
(only if-necessity) A non-redundant sink is deterministic, so consider the case when the gate is not a sink. By definition of a join gate that is not a sink, it must have two guarded input tracks. Let $G_L$ and $G_R$ be the guards of the left and right inputs, respectively. First, consider all paths that can reach the left input, guarded by $G_L$. It must simultaneously be true that (i) none of those paths traverse a track guarded by $\neg G_L$ and (ii) all of those paths traverse a track guarded by $\neg G_R$. If condition (i) is not satisfied, then there would exist a path that traverses a track guarded by $\neg G_L$ and, to extend past the join gate, must traverse another guarded by $G_L$. As this is not possible, the path would end in a deadlock and the gate would not be deterministic. If condition (ii) is not satisfied then there would exist some path $p$ that does not traverse a track guarded by $\neg G_R$, but may possibly traverse a track guarded by $G_R$. In this case, there exists a variable assignment where $G_R$, and all other guards on path $p$, evaluate to true. With such a variable assignment, path $p$ could be extended via the output track or the right input track. Thus, condition (ii) must also be satisfied, as otherwise the gate would not be deterministic. The conditions (and the argument that both are necessary) when considering all paths that can initially reach the right input, guarded by $G_R$, are symmetric.
The sufficiency argument shows the gate is deterministic when $G_L \equiv \neg G_R$. It remains to show it is not deterministic otherwise. First, consider the consequence when both $G_L$ and $G_R$ evaluate to true. By condition (ii) all paths leading to the left (right) input traverse a track guarded by $\neg G_R$ ($\neg G_L$). In this case, no paths can reach the gate. Thus, consider when both $G_L$ and $G_R$ evaluate to false. The conditions permit that paths can reach the gate; however, if any path does it will deadlock as both inputs to the gate are blocked. Thus, for all paths that can reach the gate, it will be deterministic only when $G_L \equiv \neg G_R$. $\square$
