The Complexity of All-switches Strategy Improvement. by Fearnley, John & Savani, Rahul
THE COMPLEXITY OF ALL-SWITCHES STRATEGY IMPROVEMENT
JOHN FEARNLEY AND RAHUL SAVANI
University of Liverpool, Liverpool, United Kingdom
e-mail address: john.fearnley@liverpool.ac.uk
University of Liverpool, Liverpool, United Kingdom
e-mail address: rahul.savani@liverpool.ac.uk
Abstract. Strategy improvement is a widely-used and well-studied class of algorithms
for solving graph-based infinite games. These algorithms are parameterized by a switching
rule, and one of the most natural rules is “all switches” which switches as many edges as
possible in each iteration. Continuing a recent line of work, we study all-switches strategy
improvement from the perspective of computational complexity. We consider two natural
decision problems, both of which have as input a game G, a starting strategy s, and an
edge e. The problems are: 1.) The edge switch problem, namely, is the edge e ever
switched by all-switches strategy improvement when it is started from s on game G? 2.)
The optimal strategy problem, namely, is the edge e used in the final strategy that is
found by strategy improvement when it is started from s on game G? We show PSPACE-
completeness of the edge switch problem and optimal strategy problem for the following
settings: Parity games with the discrete strategy improvement algorithm of Vöge and
Jurdziński; mean-payoff games with the gain-bias algorithm [14,37]; and discounted-payoff
games and simple stochastic games with their standard strategy improvement algorithms.
We also show PSPACE-completeness of an analogous problem to edge switch for the bottom-
antipodal algorithm for finding the sink of an Acyclic Unique Sink Orientation on a cube.
A preliminary version of this paper appeared in the proceedings of SODA 2016 [13].
1. Introduction
In this paper we study strategy improvement algorithms for solving two-player games such as
parity games, mean-payoff games, discounted games, and simple stochastic games [24,37,43].
These games are interesting both because of their important applications and their unusual
complexity status. Parity games, for example, arise in several areas of theoretical computer
science, notably in relation to the emptiness problem for tree automata [6, 22] and as
an algorithmic formulation of model checking for the modal µ-calculus [7, 41]. Moreover,
all of these problems are in NP ∩ coNP, and even UP ∩ coUP [3, 26], so they are unlikely to
be NP-complete. However, despite much effort from the community, none of these problems
are known to be in P, and whether there exists a polynomial-time algorithm to solve these
games is a very important and long-standing open problem.
Strategy improvement is a well-studied method for solving these games [24, 37, 43]. It
is an extension of the well-known policy iteration algorithms for solving Markov decision
processes. The algorithm selects one of the two players to be the strategy improver. Each
strategy of the improver has a set of switchable edges, and switching any subset of these
edges produces a strictly better strategy. So, the algorithm proceeds by first choosing
an arbitrary starting strategy, and then in each iteration, switching some subset of the
switchable edges. Eventually this process will find a strategy with no switchable edges, and
it can be shown that this strategy is an optimal strategy for the improver.
To completely specify the algorithm, a switching rule is needed to pick the subset of
switchable edges in each iteration (this is analogous to the pivot rule used in the simplex
method). Many switching rules have been proposed and studied [9, 23, 30, 31, 38]. One
of the most natural rules, and the one that we consider in this paper, is the all-switches
rule, which always switches a vertex if it has a switchable edge. In particular, we consider
greedy all-switches, which chooses the best edge whenever more than one edge is switchable
at a vertex (ties are broken arbitrarily). For a long time, the all-switches variant of the
discrete strategy improvement algorithm of Vöge and Jurdziński [43] was considered the
best candidate for a polynomial-time algorithm to solve parity games. Indeed, no example
was known that required a super-linear number of iterations. However, these hopes were
dashed when Friedmann showed an exponential lower bound for greedy all-switches strategy
improvement for parity games [15]. In the same paper, he showed that his result extends
to strategy improvement algorithms for discounted games and simple stochastic games.
The computational power of pivot algorithms. In this paper, we follow a recent line
of work that seeks to explain the poor theoretical performance of pivoting algorithms using
a complexity-theoretic point of view. The first results in this direction were proved for
problems in the complexity classes PPAD and PLS. It is known that, if a problem is
tight-PLS-complete, then computing the solution found by the natural improvement algorithm is
PSPACE-complete [25]. Similarly, for the canonical PPAD-complete problem End-of-the-Line,
computing the solution that is found by the natural line-following algorithm is
PSPACE-complete [35]. This was extended to show that it is PSPACE-complete to compute any of the
solutions that can be found by the Lemke-Howson algorithm [21], a pivoting algorithm that
solves the PPAD-complete problem of finding a Nash equilibrium of a bimatrix game.
Until recently, results of this type were only known for algorithms for problems that,
due to known hardness results, are unlikely to lie in P. However, a recent series of papers
has shown that similar results hold even for the simplex method for linear programming.
Adler, Papadimitriou and Rubinstein [1] showed that there exists an artificially contrived
pivot rule for which it is PSPACE-complete to decide if a given basis is on the path of bases
visited by the simplex algorithm with this pivot rule. Disser and Skutella [5] studied the
natural pivot rule that Dantzig proposed when he introduced the simplex method, and
they showed that it is NP-hard to decide whether a given variable enters the basis when the
simplex method is run with this pivot rule. Finally, Fearnley and Savani strengthened both
these results by showing that the decision problem that Disser and Skutella considered is
actually PSPACE-complete [12], and they also showed that determining if a given variable is
used in the final optimal solution found by the algorithm is PSPACE-complete. This result
exploited a known connection between single-switch policy iteration for Markov Decision
Processes (MDPs) and the simplex method for a corresponding linear program (LP). The
result was first proved for a greedy variant of single-switch policy iteration, which then
implied the result for the simplex method with Dantzig’s pivot rule.
All of the results on linear programming are motivated by the quest to find a strongly
polynomial algorithm for linear programming, which was included in Smale’s list of great
unsolved problems of the 21st century [40]. One way of resolving this problem would be to
design a polynomial-time pivot rule for the simplex method, and if we are to do this, then
it is crucial to understand why existing pivot rules fail. The PSPACE-completeness indicates
that they fail because in fact they can do something far more than is necessary, namely
they are capable of solving any problem that can be computed in polynomial space.
We face a similar quest to find a polynomial-time algorithm for the games studied in
this paper. Strategy improvement is a prominent algorithm for solving these games, and
indeed it is one of the only algorithms for solving discounted and simple stochastic games.
So, devising a polynomial-time switching rule is an obvious direction for further study. It
may in fact be easier to devise a polynomial-time switching rule, because there is a lot more
freedom in each step of the algorithm: simplex pivot rules correspond to switching rules that
can only switch a single edge, whereas strategy improvement rules can switch any subset of
edges. Indeed, it may be the case that the polynomial Hirsch conjecture fails, ruling out
a strongly polynomial simplex method, even though the analogue of the Hirsch conjecture
for strategy improvement is known to be true: one can always reach an optimal strategy in
at most n strategy improvement iterations, where n is the number of vertices in the game.
Our contribution. Our main results are that, for greedy all-switches strategy improve-
ment, determining whether the algorithm switches a given edge is PSPACE-complete, and
determining whether the optimal strategy found by the algorithm uses a particular edge is
PSPACE-complete. One of the key features that strategy improvement has is the ability to
switch multiple switchable edges at the same time, rather than just one as in the simplex
method. Our results show that naively using this power does not help to avoid the PSPACE-
completeness results that now seem to be endemic among pivoting algorithms. The proof
primarily focuses on the strategy improvement algorithm of Vöge and Jurdziński for solving
parity games [43]. The following definition formalises the problem that we are interested
in.
Definition 1.1. Let G be a game, and let e be an edge and σ be a strategy profile of G.
The problem EdgeSwitch(G, σ, e) is to decide if the edge e is ever switched by greedy
all-switches strategy improvement when it is applied to G starting from σ.
The main technical contribution of the paper is to show the following theorem.
Theorem 1.2. EdgeSwitch for Vöge and Jurdziński’s algorithm is PSPACE-complete.
We use this theorem to show similar results for other games. For mean-payoff games,
our results apply to the gain-bias algorithm [14]; and for discounted and simple stochastic
games our results apply to the standard strategy improvement algorithms [4,36]. We utilise
the well-known polynomial-time reductions from parity games to mean-payoff games [36],
mean-payoff games to discounted games, and from discounted games to simple stochastic
games [44]. The parity games we construct have the property that when they are reduced
to the other games mentioned above strategy improvement will behave in the same way (for
discounted and simple stochastic games this was already observed by Friedmann [15]), so
we get the following corollary of Theorem 1.2.
Corollary 1.3. EdgeSwitch for the gain-bias algorithm, and the standard strategy im-
provement algorithms for discounted-payoff and simple stochastic games is PSPACE-complete.
Theorem 1.2 proves a property about the path taken by strategy improvement during
its computation. Previous results have also studied the complexity of finding the opti-
mal strategy that is produced by strategy improvement, which we encode in the following
problem.
Definition 1.4. Let G be a game, and let e be an edge and σ be a strategy profile of G.
The problem OptimalStrategy(G, σ, e) is to decide if the edge e is used in the optimal
strategy that is found by greedy all-switches strategy improvement when applied to G starting
from σ.
Augmenting our construction for parity games with an extra gadget gives the following
theorem.
Theorem 1.5. OptimalStrategy for Vöge and Jurdziński’s algorithm is PSPACE-complete.
This result requires that the parity games that we construct have multiple optimal
solutions, because otherwise the PSPACE-hardness of OptimalStrategy would imply NP ∩
coNP = PSPACE. With further modifications, we can again extend this result to strategy
improvement algorithms for other games.
Corollary 1.6. OptimalStrategy for the gain-bias algorithm, and the standard strat-
egy improvement algorithms for discounted-payoff and simple stochastic games is PSPACE-
complete.
Our results can also be applied to unique sink orientations (USOs). An orientation of
an n-dimensional hypercube is a function that assigns a direction to each edge of the cube.
The faces of an n-dimensional cube are the k-dimensional cubes that can be obtained by
fixing n− k of the dimensions and letting the other dimensions be free. An orientation is a
USO if every face of the cube has a unique sink [42].
Recently, it was shown that recognising a USO is coNP-complete, and that recognising
an acyclic USO (AUSO) is PSPACE-complete [20]. As we will see, the games that we consider
will be guaranteed to induce AUSOs. The fundamental algorithmic problem for USOs is
to find the global sink assuming oracle access to the edge orientation. The design and
analysis of algorithms for this problem is an active research area [16, 17, 19, 23, 33, 39], in
particular for AUSOs. The BottomAntipodal algorithm [39] for AUSOs on cubes starts
at an arbitrary vertex and in each iteration jumps to the antipodal vertex in the sub-cube
spanned by the outgoing edges at the current vertex.
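As a rough illustration of the jump rule just described, the following Python sketch runs BottomAntipodal against an orientation oracle. The encoding is an assumption made for this example only: cube vertices are integers used as bit masks, and the hypothetical oracle `outgoing(v)` returns the mask of coordinates whose edge at v is outgoing. The `max_steps` cap is a safety guard for the sketch, not part of the algorithm.

```python
def bottom_antipodal(outgoing, v, max_steps=1000):
    """Follow the BottomAntipodal rule from vertex v until a sink is reached."""
    visited = [v]
    for _ in range(max_steps):
        mask = outgoing(v)      # coordinates with an outgoing edge at v
        if mask == 0:           # no outgoing edge: v is the global sink
            return visited
        v ^= mask               # jump to the antipodal vertex of that sub-cube
        visited.append(v)
    raise RuntimeError("step limit reached before finding a sink")


# Toy AUSO (illustrative): every edge points towards the 0-end of its
# coordinate, so the outgoing coordinates at v are exactly the 1-bits of v.
def uniform_orientation(v):
    return v


trace = bottom_antipodal(uniform_orientation, 0b101)
```

On this toy orientation a single jump reaches the sink; the orientations induced by the games in this paper are of course far less benign.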
For binary games, where vertices have outdegree at most two, the valuation functions
used by strategy improvement induce an AUSO on a cube, and all-switches strategy im-
provement corresponds to BottomAntipodal on this AUSO. For non-binary games we
instead get an AUSO on a grid [18], so our results immediately give a PSPACE-hardness
result for grid USOs for a problem analogous to EdgeSwitch. To get a similar result
for AUSOs on cubes we turn our construction into a binary parity game, and we get the
following.
Corollary 1.7. Let C be a d-dimensional cube AUSO, specified by a poly(d)-sized circuit
that computes the edge orientations for each vertex of C. Given a dimension k ∈ {1, . . . , d},
and a vertex v, it is PSPACE-complete to decide if BottomAntipodal, started at v, ever
switches the kth coordinate.
Since a USO has a unique solution, by definition, we cannot hope to get a result for
AUSOs that is analogous to OptimalStrategy, since, as noted above, PSPACE-hardness
of OptimalStrategy requires multiple optimal solutions under standard complexity-
theoretic assumptions.
Related work. The best known algorithms for mean-payoff, discounted, and simple sto-
chastic games have subexponential running time: the random facet strategy improvement
algorithms combine strategy improvement with the random-facet algorithm for LPs [30–32].
Following the work of Friedmann [15] that we build on heavily in this paper, Friedmann,
Hansen, and Zwick showed a sub-exponential lower bound for the random facet strategy
improvement algorithm [16]. They also used a construction of Fearnley [8] to extend the
bound to the random facet pivot rule for the simplex method [17].
For parity games, there are several algorithms that perform better than random facet
strategy improvement. First, a deterministic subexponential-time algorithm was found [28].
Very recent work has improved this even further by producing an algorithm that uses quasi-
polynomial time and space [2], and it has subsequently been shown that there are algorithms
that use quasi-polynomial time and polynomial space [10,27].
Roadmap. In Section 2, we give a formal definition of parity games, and more specifi-
cally the one-sink games used by Friedmann that we also use for our construction. We
then give a high-level overview of how all-switches strategy improvement works. Our main
reduction starts with an iterated circuit evaluation problem. In Section 3, we describe
our main construction of a parity game that will implement iterated circuit evaluation when
strategy improvement is run on it. In Section 4, we describe the sequence of strategies that
all-switches strategy improvement will go through as it implements the iterated circuit eval-
uation. In Section 5, we show that the construction works as claimed and thus prove that
EdgeSwitch is PSPACE-hard for parity games. In Section 6, we show how this result for
EdgeSwitch extends to strategy improvement algorithms for other games. In Section 7,
we show how to augment our construction with an extra gadget to give PSPACE-hardness
results for OptimalStrategy. In Section 8, we state some open problems.
2. Preliminaries
2.1. Parity games. A parity game is defined by a tuple G = (V, VEven, VOdd, E, pri), where
(V,E) is a directed graph. The sets VEven and VOdd partition V into the vertices belonging to
player Even, and the vertices belonging to player Odd, respectively. The priority function
pri : V → {1, 2, . . . } assigns a positive natural number to each vertex. We make the
standard assumption that there are no terminal vertices, which means that every vertex is
required to have at least one outgoing edge. The strategy improvement algorithm of Vöge
and Jurdziński also requires that we assume, without loss of generality, that every priority
is assigned to at most one vertex.
A strategy for player Even is a function that picks one outgoing edge for each Even
vertex. More formally, a deterministic positional strategy for Even is a function σ : VEven →
V such that, for each v ∈ VEven we have that (v, σ(v)) ∈ E. Deterministic positional
strategies for player Odd are defined analogously. Throughout this paper, we will only
consider deterministic positional strategies, and from this point onwards, we will refer to
them simply as strategies. We use ΣEven and ΣOdd to denote the set of strategies for players
Even and Odd, respectively.
A play of the game is an infinite path through the game. More precisely, a play is a
sequence v0, v1, . . . such that for all i ∈ N we have vi ∈ V and (vi, vi+1) ∈ E. Given a pair
of strategies σ ∈ ΣEven and τ ∈ ΣOdd, and a starting vertex v0, there is a unique play that
occurs when the game starts at v0 and both players follow their respective strategies. So,
we define Play(v0, σ, τ) = v0, v1, . . . , where for each i ∈ N we have vi+1 = σ(vi) if vi ∈ VEven,
and vi+1 = τ(vi) if vi ∈ VOdd.
Given a play π = v0, v1, . . . we define
MaxIo(π) = max{p : ∃ infinitely many i ∈ N s.t. pri(vi) = p},
to be the maximum priority that occurs infinitely often along π. We say that a play π
is winning for player Even if MaxIo(π) is even, and we say that π is winning for Odd if
MaxIo(π) is odd. A strategy σ ∈ ΣEven is a winning strategy for a vertex v ∈ V if, for
every strategy τ ∈ ΣOdd, we have that Play(v, σ, τ) is winning for player Even. Likewise,
a strategy τ ∈ ΣOdd is a winning strategy for v if, for every strategy σ ∈ ΣEven, we have
that Play(v, σ, τ) is winning for player Odd. The following fundamental theorem states that
parity games are positionally determined.
Theorem 2.1 ( [6,34]). In every parity game, the set of vertices V can be partitioned into
winning sets (W0,W1), where Even has a positional winning strategy for all v ∈ W0, and
Odd has a positional winning strategy for all v ∈W1.
The computational problem that we are interested in is, given a parity game, to determine
the partition (W0,W1).
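The definitions of Play and MaxIo above can be sketched in a few lines of Python. This is only an illustration under assumed encodings: strategies are dictionaries from vertices to successors, `even` is the set of Even's vertices, and `pri` maps vertices to priorities. The function names `play_lasso`, `max_io`, and `winner` are hypothetical. Because both strategies are positional, the play is a finite prefix followed by a repeated cycle, so MaxIo is simply the largest priority on that cycle.

```python
def play_lasso(v0, sigma, tau, even):
    """Split Play(v0, sigma, tau) into its finite prefix and its repeated cycle."""
    position, path, v = {}, [], v0
    while v not in position:
        position[v] = len(path)
        path.append(v)
        v = sigma[v] if v in even else tau[v]   # the owner of v picks the edge
    start = position[v]                          # first vertex that repeats
    return path[:start], path[start:]


def max_io(v0, sigma, tau, even, pri):
    """Largest priority occurring infinitely often, i.e. the maximum on the cycle."""
    _, cycle = play_lasso(v0, sigma, tau, even)
    return max(pri[u] for u in cycle)


def winner(v0, sigma, tau, even, pri):
    return "Even" if max_io(v0, sigma, tau, even, pri) % 2 == 0 else "Odd"


# Toy game (illustrative): Even owns a; Odd owns b and c.
even = {"a"}
pri = {"a": 4, "b": 3, "c": 1}
tau = {"b": "a", "c": "c"}
```

In this toy game, Even wins from a by moving to b (the cycle a, b has maximum priority 4) and loses by moving to c.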
2.2. Strategy improvement. We now describe the strategy improvement algorithm of
Vöge and Jurdziński [43] for solving parity games, which will be the primary focus of this
paper.
Valuations. The algorithm assigns a valuation to each vertex v under every pair of strate-
gies σ ∈ ΣEven and τ ∈ ΣOdd. Since both of these strategies are positional, we know that
Play(v, σ, τ) consists of a finite initial path followed by an infinitely repeated simple cycle.
Let p be the largest priority that is seen infinitely often along Play(v, σ, τ). Since every
priority is assigned to at most one vertex, there is a unique vertex u with pri(u) = p. We
use this vertex to decompose the play: let P (v, σ, τ) be the finite simple path that starts at
v and ends at u, and let C(v, σ, τ) be the infinitely-repeated cycle that starts at u and ends
at u. We can now define the valuation function $\mathrm{Val}^{\sigma,\tau}_{\mathrm{VJ}}(v) = (p, S, d)$ where p is as above
and:
• S is the set of priorities on the finite path that are strictly greater than p:
S = {pri(u) : u ∈ P (v, σ, τ) and pri(u) > p}.
• d is the length of the finite path: d = |P (v, σ, τ)|.
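A minimal sketch of how the triple (p, S, d) could be computed for a fixed pair of strategies is given below. It assumes the same hypothetical encoding as elsewhere (dictionaries for strategies and priorities), relies on the assumption that priorities are unique, and counts the length d of the finite path in edges, which is one reading of |P(v, σ, τ)|.

```python
def valuation(v, sigma, tau, even, pri):
    """Return (p, S, d) for Play(v, sigma, tau), assuming unique priorities."""
    position, path, x = {}, [], v
    while x not in position:                 # walk until a vertex repeats
        position[x] = len(path)
        path.append(x)
        x = sigma[x] if x in even else tau[x]
    cycle = path[position[x]:]
    p = max(pri[u] for u in cycle)           # largest priority seen infinitely often
    d = next(i for i, u in enumerate(path) if pri[u] == p)   # edges from v to u
    finite_path = path[:d + 1]               # P(v, sigma, tau): from v up to u
    S = frozenset(pri[u] for u in finite_path if pri[u] > p)
    return p, S, d


# Toy game (illustrative): s is a sink with priority 1; Even owns a.
even = {"a"}
pri = {"s": 1, "a": 3, "c": 4}
sigma = {"a": "c"}
```

With Odd playing c to s, the play from a ends in the sink and collects the priorities 3 and 4 on the way; with Odd playing c back to a, the play cycles through a and c and the valuation's first component becomes 4.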
We now define an order over valuations. First we define an order ⪯ over priorities: we
have that p ≺ q if one of the following holds:
• p is odd and q is even.
• p and q are both even and p < q.
• p and q are both odd and p > q.
Furthermore, we have that p ⪯ q if either p ≺ q or p = q.
Next we define an order of the sets of priorities that are used in the second component
of the valuation. Let P,Q ⊂ N. We first define:
MaxDiff(P, Q) = max((P \ Q) ∪ (Q \ P)).
If d = MaxDiff(P, Q) then we define P ⊏ Q to hold if one of the following conditions holds:
• d is even and d ∈ Q.
• d is odd and d ∈ P.
Furthermore, we have that P ⊑ Q if either P = Q or P ⊏ Q.
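The order on priority sets can be sketched directly from this definition. The function names below are hypothetical, and sets of priorities are represented as Python sets; MaxDiff is only meaningful when the two sets differ, which the sketch handles by testing equality first.

```python
def max_diff(P, Q):
    """Largest priority in the symmetric difference of P and Q (requires P != Q)."""
    return max(P ^ Q)


def set_less(P, Q):
    """Strict order on priority sets: True when P is strictly below Q."""
    if P == Q:
        return False
    d = max_diff(P, Q)
    # An even d favours the set that contains it; an odd d counts against it.
    return d in Q if d % 2 == 0 else d in P


def set_leq(P, Q):
    return P == Q or set_less(P, Q)
```

For instance, {5, 4} lies strictly below {6, 3}, because the largest differing priority is 6, which is even and belongs to the second set.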
Finally, we can provide an order over valuations. We have that (p, S, d) ≺ (p′, S′, d′) if
one of the following conditions holds:
• p ≺ p′.
• p = p′ and S ⊏ S′.
• p = p′ and S = S′ and p is odd and d < d′.
• p = p′ and S = S′ and p is even and d > d′.
Furthermore, we have that (p, S, d) ⪯ (p′, S′, d′) if either (p, S, d) ≺ (p′, S′, d′) or (p, S, d) =
(p′, S′, d′).
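Putting the three levels together, the order on valuations might be sketched as follows. The helper names are hypothetical, and the priority order is encoded through a signed key (even priorities count positively, odd ones negatively), which agrees with the three defining cases because all priorities are positive.

```python
def priority_less(p, q):
    """Priority order: odd below even, larger even is better, larger odd is worse."""
    key = lambda r: r if r % 2 == 0 else -r
    return key(p) < key(q)


def set_less(P, Q):
    """Strict order on priority sets, decided by the largest differing priority."""
    if P == Q:
        return False
    d = max(P ^ Q)
    return d in Q if d % 2 == 0 else d in P


def valuation_less(a, b):
    """Strict order on valuations (p, S, d), compared component by component."""
    (p, S, d), (q, T, e) = a, b
    if p != q:
        return priority_less(p, q)
    if S != T:
        return set_less(S, T)
    # Same p and S: with odd p a longer path is better, with even p a shorter one.
    return d < e if p % 2 == 1 else d > e
```

The last rule reflects that Even prefers to delay reaching an odd cycle and to hasten reaching an even one.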
Best responses. Given a strategy σ ∈ ΣEven, a best response against σ is a strategy
τ∗ ∈ ΣOdd such that, for every τ ∈ ΣOdd and every vertex v we have
$\mathrm{Val}^{\sigma,\tau^*}_{\mathrm{VJ}}(v) \preceq \mathrm{Val}^{\sigma,\tau}_{\mathrm{VJ}}(v)$.
Vöge and Jurdziński proved the following properties.
Lemma 2.2 ( [43]). For every σ ∈ ΣEven a best response τ∗ can be computed in polynomial
time.
We define Br(σ) to be an arbitrarily chosen best response strategy against σ. Furthermore,
we define $\mathrm{Val}^{\sigma}_{\mathrm{VJ}}(v) = \mathrm{Val}^{\sigma,\mathrm{Br}(\sigma)}_{\mathrm{VJ}}(v)$.
Switchable edges. Let σ be a strategy and (v, u) ∈ E be an edge such that σ(v) ≠ u. We
say that (v, u) is switchable in σ if $\mathrm{Val}^{\sigma}_{\mathrm{VJ}}(\sigma(v)) \prec \mathrm{Val}^{\sigma}_{\mathrm{VJ}}(u)$. Furthermore, we define a most
appealing outgoing edge at a vertex v to be an edge (v, u) such that, for all edges (v, u′) we
have $\mathrm{Val}^{\sigma}_{\mathrm{VJ}}(u') \preceq \mathrm{Val}^{\sigma}_{\mathrm{VJ}}(u)$.
There are two fundamental properties of switchable edges that underlie the strategy
improvement technique. The first property is that switching any subset of the switchable
edges will produce an improved strategy. Let σ be a strategy, and let W ⊆ E be a set of
switchable edges in σ such that, for each vertex v, there is at most one edge of the form
(v, u) ∈ W. Switching W in σ creates a new strategy σ[W] where for all v we have:
$$\sigma[W](v) = \begin{cases} u & \text{if } (v, u) \in W, \\ \sigma(v) & \text{otherwise.} \end{cases}$$
We can now formally state the first property.
Lemma 2.3 ( [43]). Let σ be a strategy and let W ⊆ E be a set of switchable edges in σ
such that, for each vertex v, there is at most one edge of the form (v, u) ∈W . We have:
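• For every vertex v we have $\mathrm{Val}^{\sigma}_{\mathrm{VJ}}(v) \preceq \mathrm{Val}^{\sigma[W]}_{\mathrm{VJ}}(v)$.
• There exists a vertex v for which $\mathrm{Val}^{\sigma}_{\mathrm{VJ}}(v) \prec \mathrm{Val}^{\sigma[W]}_{\mathrm{VJ}}(v)$.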
The second property concerns strategies with no switchable edges. A strategy σ ∈ ΣEven
is optimal if for every vertex v and every strategy σ′ ∈ ΣEven we have $\mathrm{Val}^{\sigma'}_{\mathrm{VJ}}(v) \preceq \mathrm{Val}^{\sigma}_{\mathrm{VJ}}(v)$.
Lemma 2.4 ( [43]). A strategy with no switchable edges is optimal.
Vöge and Jurdziński also showed that winning sets for both players can be extracted from
an optimal strategy. If σ is an optimal strategy, then W0 contains every vertex v for which
the first component of $\mathrm{Val}^{\sigma}_{\mathrm{VJ}}(v)$ is even, and W1 contains every vertex v for which the first
component of $\mathrm{Val}^{\sigma}_{\mathrm{VJ}}(v)$ is odd. Hence, to solve the parity game problem, it is sufficient to
find an optimal strategy.
The algorithm. The two properties that we have just described give rise to an obvious
strategy improvement algorithm that finds an optimal strategy. The algorithm begins by
selecting an arbitrary strategy σ ∈ ΣEven. In each iteration, the algorithm performs the
following steps:
(1) If there are no switchable edges, then terminate.
(2) Otherwise, select a set W ⊆ E of switchable edges in σ such that, for each vertex v,
there is at most one edge of the form (v, u) ∈ W.
(3) Set σ := σ[W] and go to step 1.
By the first property, each iteration of this algorithm produces a strictly better strategy
according to the ≺ ordering, and therefore the algorithm must eventually terminate. How-
ever, the algorithm can only terminate when there are no switchable edges, and therefore
the second property implies that the algorithm will always find an optimal strategy.
The algorithm given above does not specify a complete algorithm, because it does not
specify which subset of switchable edges should be chosen in each iteration. Indeed, there
are many variants of the algorithm that use a variety of different switching rules. In this
paper, we focus on the greedy all-switches switching rule. This rule switches every vertex
that has a switchable edge, and if there is more than one switchable edge, it arbitrarily
picks one of the most appealing edges.
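The following Python sketch puts the pieces of this section together for very small games. It is a toy illustration, not the authors' implementation: in particular, the best response is found by brute-force enumeration of all Odd strategies, which stands in for the polynomial-time procedure of Lemma 2.2 and is exponential in general. The game encoding (`edges`, `even`, `pri`), all function names, and the example game are assumptions made for this example, and path length is counted in edges.

```python
from itertools import product


def reward(p):
    return p if p % 2 == 0 else -p            # p is below q iff reward(p) < reward(q)


def set_less(P, Q):
    if P == Q:
        return False
    d = max(P ^ Q)                             # MaxDiff(P, Q)
    return d in Q if d % 2 == 0 else d in P


def val_less(a, b):
    (p, S, d), (q, T, e) = a, b
    if p != q:
        return reward(p) < reward(q)
    if S != T:
        return set_less(S, T)
    return d < e if p % 2 == 1 else d > e


def valuation(v, choice, pri):
    """(p, S, d) for the play from v when every vertex follows `choice`."""
    seen, path, x = {}, [], v
    while x not in seen:
        seen[x] = len(path)
        path.append(x)
        x = choice[x]
    p = max(pri[u] for u in path[seen[x]:])    # largest priority on the cycle
    d = next(i for i, u in enumerate(path) if pri[u] == p)
    S = frozenset(pri[u] for u in path[:d + 1] if pri[u] > p)
    return p, S, d


def best_response_values(edges, even, pri, sigma):
    """Brute force: valuations under an Odd strategy that is minimal at every vertex."""
    odd = [v for v in edges if v not in even]
    tables = []
    for targets in product(*(edges[v] for v in odd)):
        choice = {**sigma, **dict(zip(odd, targets))}
        tables.append({v: valuation(v, choice, pri) for v in edges})
    for t in tables:
        if all(not val_less(o[v], t[v]) for o in tables for v in edges):
            return t
    raise ValueError("no pointwise-minimal response found")


def greedy_all_switches(edges, even, pri, sigma):
    """Run greedy all-switches; return the final strategy and the edges switched."""
    switched = []
    while True:
        val = best_response_values(edges, even, pri, sigma)
        new_sigma = dict(sigma)
        for v in sorted(even):
            best = sigma[v]
            for u in edges[v]:                 # keep a most appealing successor
                if val_less(val[best], val[u]):
                    best = u
            if best != sigma[v]:               # (v, best) is switchable
                new_sigma[v] = best
                switched.append((v, best))
        if new_sigma == sigma:                 # no switchable edge: terminate
            return sigma, switched
        sigma = new_sigma


# Toy game (illustrative): sink s with priority 1; Even owns a; Odd owns s and c.
edges = {"s": ["s"], "a": ["s", "c"], "c": ["s", "a"]}
even = {"a"}
pri = {"s": 1, "a": 3, "c": 4}
final, switched = greedy_all_switches(edges, even, pri, {"a": "s"})
```

In this sketch, an EdgeSwitch-style query for an edge e amounts to checking whether e appears in `switched`, and an OptimalStrategy-style query to checking whether `final` uses e. The hardness results of the paper say that, in general, one should not expect to answer such queries much more cleverly than by running the algorithm.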
One-sink games. Friedmann observed that, for the purposes of showing lower bounds, it is
possible to simplify the Vöge-Jurdziński algorithm by restricting the input to be a one-sink
game [15].
A one-sink parity game contains a sink vertex s such that pri(s) = 1. An even strategy
σ ∈ ΣEven is called a terminating strategy if, for every vertex v, the first component of
$\mathrm{Val}^{\sigma}_{\mathrm{VJ}}(v)$ is 1. This means that, when the opponent plays optimally against σ, the play will
terminate in the sink s. Formally, a parity game is a one-sink parity game if:
• There is a vertex s ∈ V such that pri(s) = 1, and (s, s) is the only outgoing edge
from s. Furthermore, there is no vertex v with pri(v) = 0.
• All optimal strategies are terminating.
Now, suppose that we apply the Vöge-Jurdziński algorithm, and furthermore suppose
that the initial strategy is terminating. Since the initial and optimal strategies are both
terminating, we have that, for every strategy σ visited by the algorithm and every vertex
v, the first component of $\mathrm{Val}^{\sigma}_{\mathrm{VJ}}(v)$ is 1, and so it can be ignored. Furthermore, since there
is no vertex with priority 0, the second component of $\mathrm{Val}^{\sigma}_{\mathrm{VJ}}(v)$ must be different from the
second component of $\mathrm{Val}^{\sigma}_{\mathrm{VJ}}(u)$, for every pair of distinct vertices v, u ∈ V . Therefore, the third component of the
valuation can be ignored.
Thus, for a one-sink game, we can define a simplified version of the Vöge-Jurdziński
algorithm that only uses the second component. So, we define $\mathrm{Val}^{\sigma}(v)$ to be equal to the
second component of $\mathrm{Val}^{\sigma}_{\mathrm{VJ}}(v)$, and we carry out strategy improvement using the definitions
given above, but with $\mathrm{Val}^{\sigma}(v)$ substituted for $\mathrm{Val}^{\sigma}_{\mathrm{VJ}}(v)$. Note, in particular, that in this
strategy improvement algorithm, an edge (v, u) is switchable in σ if $\mathrm{Val}^{\sigma}(\sigma(v)) \sqsubset \mathrm{Val}^{\sigma}(u)$.
In our proofs, we will frequently want to determine the maximum difference between
two valuations. For this reason, we introduce the following notation. For every strategy σ,
and every pair of vertices v, u ∈ V , we define $\mathrm{MaxDiff}^{\sigma}(v, u) = \mathrm{MaxDiff}(\mathrm{Val}^{\sigma}(v), \mathrm{Val}^{\sigma}(u))$.
2.3. Circuit iteration problems. To prove our PSPACE-completeness results, we will re-
duce from two circuit iteration problems, which we now define.
The problems. A circuit iteration instance is a triple (F,B, z), where:
• $F : \{0, 1\}^n \to \{0, 1\}^n$ is a function represented as a boolean circuit C,
• $B \in \{0, 1\}^n$ is an initial bit-string, and
• z is an integer such that 1 ≤ z ≤ n.
We use standard notation for function iteration: given a bit-string $B \in \{0, 1\}^n$, we
recursively define $F^1(B) = F(B)$, and $F^i(B) = F(F^{i-1}(B))$ for all i > 1.
We now define two problems that will be used as the starting point for our reduction.
Both are decision problems that take as input a circuit iteration instance (F,B, z).
• BitSwitch(F, B, z): decide whether there exists an even $i \le 2^n$ such that the z-th
bit of $F^i(B)$ is 1.
• CircuitValue(F, B, z): decide whether the z-th bit of $F^{2^n}(B)$ is 1.
The requirement for i to be even in BitSwitch is a technical requirement that is necessary
in order to make our reduction to strategy improvement work.
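To make the two problems concrete, here is a naive Python sketch that decides them by direct iteration. It takes time exponential in n and is meant only to pin down the definitions. The representation is an assumption: F is an ordinary Python function on tuples of bits rather than a boolean circuit, bit z is stored at tuple position z − 1, and the even indices i start at 2 since the iterates are defined for i ≥ 1. The `increment` function is a toy example, not a circuit from the paper.

```python
def bit_switch(F, B, z):
    """Is the z-th bit of F^i(B) equal to 1 for some even i <= 2^n?"""
    current = B
    for i in range(1, 2 ** len(B) + 1):
        current = F(current)                 # current == F^i(B)
        if i % 2 == 0 and current[z - 1] == 1:
            return True
    return False


def circuit_value(F, B, z):
    """Is the z-th bit of F^(2^n)(B) equal to 1?"""
    current = B
    for _ in range(2 ** len(B)):
        current = F(current)
    return current[z - 1] == 1


def increment(bits):
    """Toy F: add one to a binary counter whose least significant bit comes first."""
    out, carry = [], 1
    for b in bits:
        out.append(b ^ carry)
        carry = b & carry
    return tuple(out)
```

For the two-bit counter started at (0, 0), the iterates are (1, 0), (0, 1), (1, 1), (0, 0), so the second bit is set at the even index i = 2 while the first bit is never set at an even index.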
The fact that these problems are PSPACE-complete should not be too surprising, be-
cause F can simulate a single step of a space-bounded Turing machine, so when F is
iterated, it simulates a run of the space-bounded Turing machine. The following result was
shown in [12].
Lemma 2.5. [12, Lemma 7] BitSwitch and CircuitValue are PSPACE-complete.
Circuits. For the purposes of our reduction, we must make some assumptions about
the format of the circuits that represent F . Let C be a boolean circuit with n input bits,
n output bits, and k gates. We assume, w.l.o.g., that all gates are or-gates or not-gates.
The circuit will be represented as a list of gates indexed 1 through n + k. The indices 1
through n represent the n inputs to the circuit. Then, for each i > n, we have:
• If gate i is an or-gate, then we define I1(i) and I2(i) to give the indices of its two
inputs.
• If gate i is a not-gate, then we define I(i) to give the index of its input.
The gates k + 1 through k + n correspond to the n output bits of the circuit, respectively.
For the sake of convenience, for each input bit i, we define I(i) = k + i, which indicates
that, if the circuit is applied to its own output, input bit i should copy from output bit
I(i). Moreover, we assume that the gate ordering is topological. That is, for each or-gate i
we assume that i > I1(i) and i > I2(i), and we assume that for each not-gate i we have
i > I(i).
For each gate i, let d(i) denote the depth of gate i, which is the length of the longest
path from i to an input bit. So, in particular, the input bits are at depth 0. Observe that
we can increase the depth of a gate by inserting dummy or-gates: given a gate i, we can
add an or-gate j with I1(j) = i and I2(j) = i, so that d(j) = d(i) + 1. We use this fact in
order to make the following assumptions about our circuits:
• For each or-gate i, we have d(I1(i)) = d(I2(i)).
• There is a constant c such that, for every output bit i ∈ {k + 1, . . . , k + n}, we have
d(i) = c.
From now on, we assume that all circuits that we consider satisfy these properties. Note
that, since all output gates have the same depth, we can define d(C) = d(k + 1), which is
the depth of all the output bits of the circuit.
Given an input bit-string $B \in \{0, 1\}^n$, the output of each gate in C can be determined.
We define Eval(B, i) = 1 if gate i is true for input B, and Eval(B, i) = 0 if gate i is false
for input B.
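The gate-list format and the functions Eval and d can be sketched as follows. The dictionary encoding of gates, with entries of the form ("or", i1, i2) or ("not", j), and the function names are assumptions made for this illustration. Inputs occupy indices 1 through n, and the topological ordering assumed in the text is what allows the gates to be processed in increasing index order.

```python
def evaluate(n, gates, B):
    """Return {index: 0 or 1} for every input and gate of the circuit on input B."""
    value = {i + 1: B[i] for i in range(n)}      # indices 1..n are the inputs
    for i in sorted(gates):                      # topological by assumption
        gate = gates[i]
        if gate[0] == "or":
            value[i] = value[gate[1]] | value[gate[2]]
        else:                                    # ("not", j)
            value[i] = 1 - value[gate[1]]
    return value


def depth(n, gates):
    """Length of the longest path from each gate down to an input bit."""
    d = {i: 0 for i in range(1, n + 1)}
    for i in sorted(gates):
        d[i] = 1 + max(d[j] for j in gates[i][1:])
    return d


# Toy circuit (illustrative) with n = 2 inputs and k = 2 gates, so the
# outputs are gates k + 1 = 3 and k + 2 = 4, both at depth 1.
gates = {3: ("or", 1, 2), 4: ("not", 1)}
values = evaluate(2, gates, (1, 0))
```

Here `values[i]` plays the role of Eval(B, i) for B = (1, 0).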
Given a circuit C, we define the negated form of C to be a transformation of C in
which each output bit is negated. More formally, we transform C into a circuit C ′ using
the following operation: for each output bit k + i in C, we add a not-gate n + k + i with
I(n + k + i) = k + i.
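A possible sketch of the negation step is shown below. It assumes the indexing convention stated earlier in this section, namely that the outputs of C are the gates k + 1 through k + n, and it uses a hypothetical dictionary encoding in which a not-gate with input j is written ("not", j). Each new not-gate takes the next free index after the existing n + k entries.

```python
def negated_form(n, k, gates):
    """Append one not-gate per output bit; the new gates become the outputs."""
    negated = dict(gates)
    for i in range(1, n + 1):
        negated[n + k + i] = ("not", k + i)   # negate the i-th output of C
    return negated


# Toy circuit (illustrative): n = 2 inputs, k = 2 gates, outputs are gates 3 and 4.
gates = {3: ("or", 1, 2), 4: ("not", 1)}
negated = negated_form(2, 2, gates)
```

The original gates are left untouched, so the new circuit computes the bitwise complement of the original outputs at gates 5 and 6.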
3. The Construction
Overview. We will show that EdgeSwitch is PSPACE-complete by reducing from the
circuit iteration problem BitSwitch. Figure 1 gives a high-level picture of the construction.
Given a circuit F that is to be iterated, we create a gadget that is capable of computing F
on a given input. Our construction will contain two copies of this gadget, which will be
numbered 0 and 1. The two circuits alternate, with the output of one circuit being passed
to the input of the other circuit. So, given an initial bit-string B, circuit 0 computes F (B),
then circuit 1 computes F (F (B)), then circuit 0 computes F (F (F (B))), and so on. The
technical reason for having two copies of the circuit is that our circuit gadget cannot handle
the input bits being changed before the output bits are read, and so a single circuit gadget
cannot feed its own outputs back into its inputs.
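The alternation between the two copies can be pictured with a few lines of Python. This models only the intended data flow (which copy produces which iterate of F), not the parity-game gadgets themselves; the function names and the toy function `flip_first` are hypothetical.

```python
def alternate(F, B, steps):
    """Record which circuit copy (0 or 1) produces each successive iterate of F."""
    trace, current = [], B
    for t in range(steps):
        current = F(current)              # the active copy applies F to its input
        trace.append((t % 2, current))    # copy 0 acts first, then copy 1, and so on
    return trace


def flip_first(bits):
    """Toy F: negate the first bit and leave the rest unchanged."""
    return (1 - bits[0],) + bits[1:]


trace = alternate(flip_first, (0, 1), 3)
```

In this picture, copy 1 produces exactly the iterates with an even number of applications of F.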
Figure 1 also shows the clocks which play a fundamental role in driving the construction.
Each copy of the circuit is equipped with its own clock, which controls the timing of that
circuit. In particular, each clock has two states r and s. Ordinarily, the valuation of r
is larger than the valuation of s. Every so often, the clock produces a signal, which is
transmitted by the valuation of s being larger than the valuation of r. This signal causes
the associated circuit to begin computing based on its current input. Thus, the clocks play
an important role in synchronising the two circuits, and ensuring that each circuit starts
computing only after its partner has finished computing the previous iteration.
[Figure 1 shows Circuit 0, Circuit 1, Clock 0, Clock 1, Inputs 0, and Inputs 1, connected through the nodes r0, s0, r1, and s1.]
Figure 1. High-level overview of our construction. There are two copies
of the underlying circuit, and two clocks. The two are synchronized via
the nodes r0, s0, r1, and s1. In this diagram the directions of arrows are
consistent with the directed edges in the corresponding parity game.
Each clock is implemented by a modified version of Friedmann’s exponential-time exam-
ple. Friedmann’s examples are designed to force greedy all-switches strategy improvement
to mimic a binary counter. The signal sent by the clock occurs when Friedmann’s example
increments the binary counter to the next number. Our modifications serve only to increase
the number of strategy improvement iterations that take place between each increment.
Finally, Figure 1 shows the Input/Output gadgets. These gadgets are responsible for
transmitting bit-strings between the two circuits, and so they are the most complex part
of the construction. These gadgets have two modes. When they are in output mode, they
are connected at the outputs of a circuit, where they read and store the outputs. When
they are in input mode they are connected at the inputs of the other circuit, where they
allow the stored bit-string to be read. For this reason, the Input/Output gadgets must
be connected to both clocks, so that they are able to transition between the circuits at the
correct time.
The reduction. Formally, let (F,B, z) be the input to the circuit iteration problem, and
let C be the negated form of the circuit that computes F . Throughout this section, we will
use n as the bit-length of B, and k = |C| as the number of gates used in C. We will use Or,
Not, and Input/Output to denote the set of or-gates, not-gates, and input/output-gates,
respectively.
[Figure 2 shows the deceleration lane, the bit gadgets bit0, bit1, . . . , bitn, the sink, and the vertices s and r.]
Figure 2. High-level overview of a clock.
In the rest of this section, we describe the construction. We begin by giving an overview
of Friedmann’s example, both because it plays a key role in our construction, and because
the Not gate gadgets in our circuits are a modification of the bit-gadget used by Friedmann.
We then move on to describe the gate gadgets, and how they compute the function F .
Priorities. As we have mentioned, the strategy improvement algorithm that we consider
requires that every priority is assigned to at most one vertex. This is unfortunately a
rather cumbersome requirement when designing more complex constructions. To help with
this, we define a shorthand for specifying priorities. Let c ∈ N, let i, l ∈ {1, . . . , |V |}, let
j ∈ {0, 1, 2}, and let e ∈ {0, 1}. We define
\[ P(c, i, l, j, e) = 6 \cdot |V|^2 \cdot c + 6 \cdot |V| \cdot i + 6 \cdot l + 2 \cdot j + e. \]
The first four parameters should be thought of as a lexicographic ordering, which deter-
mines how large the priority is. The final number e determines whether the priority is odd
or even. Note that P(c, i, l, j, e) is an injective function, so if we ensure that the same set of
arguments are never used twice, then we will never assign the same priority to two different
vertices. One thing to note is that, since this priority notation is rather cumbersome, it is
not possible to use it in our diagrams. Instead, when we draw parts of the construction, we
will use representative priorities, which preserve the order and parity of the priorities used
in the gadgets, but not their actual values.
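The priority shorthand can be sketched in Python as follows. The function names `priority` and `is_injective` and the bound `max_c` are assumptions introduced for this illustration; the brute-force check only covers the parameter ranges stated above for a small |V|.

```python
from itertools import product


def priority(c, i, l, j, e, num_vertices):
    """P(c, i, l, j, e) = 6|V|^2 c + 6|V| i + 6 l + 2 j + e."""
    v = num_vertices
    return 6 * v * v * c + 6 * v * i + 6 * l + 2 * j + e


def is_injective(num_vertices, max_c):
    """Brute-force check: no two argument tuples give the same priority.

    Ranges follow the text: i, l in 1..|V|, j in {0, 1, 2}, e in {0, 1}.
    The range 0..max_c for c is an assumption of this sketch.
    """
    seen = set()
    ranges = product(
        range(max_c + 1),
        range(1, num_vertices + 1),
        range(1, num_vertices + 1),
        range(3),
        range(2),
    )
    for args in ranges:
        p = priority(*args, num_vertices)
        if p in seen:
            return False
        seen.add(p)
    return True
```

Since every term other than e is even, the parity of the priority is decided by e alone, and a larger c always dominates any choice of the remaining arguments.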
3.1. Friedmann’s exponential-time example. In this section, we give an overview of
some important properties of Friedmann’s exponential-time examples. In particular, we
focus on the properties that will be important for our construction. A more detailed de-
scription of the example can be found in Friedmann’s original paper [15].
A high level view of Friedmann’s construction is shown in Figure 2. It works by forcing
greedy all-switches strategy improvement to simulate an n bit binary counter. It consists
of two components: a bit gadget that is used to store one of the bits of the counter, and a
deceleration lane that is used to ensure that the counter correctly moves from one bit-string
to the next.
[Figure 3 shows the vertices t0, . . . , t4 and a1, . . . , a4, where each ti has edges to r and s. Representative
priorities: t0: 16, t1: 7, t2: 9, t3: 11, t4: 13, a1: 8, a2: 10, a3: 12, a4: 14.]
Figure 3. Friedmann’s deceleration lane of length 4. Each ti vertex with
i > 0 has an odd priority, while each ai vertex has an even priority that is
equal to pri(ti) + 1. The vertex t0 has an even priority that is larger than
any of the priorities assigned to the ai vertices.
The deceleration lane. Friedmann’s example contains one copy of the deceleration lane.
The deceleration lane has a specified length m, and Figure 3 shows an example of a deceler-
ation lane of length 4. Friedmann’s construction contains one copy of the deceleration lane
of length 2n. Remember that our diagrams use representative priorities, which preserve the
order and parity of the priorities used, but not their values.
A key property of the deceleration lane is that greedy all-switches strategy improvement
requires m iterations to find the optimal strategy. Consider an initial strategy in which each
vertex ti uses the edge to r, and suppose that the valuation of r is always larger than the valuation
of s. First note that, since there is a large even priority on t0, the optimal strategy is for
every vertex ti, with i ≥ 1, to use the edge to ti−1. However, since the vertices ti with
i ≥ 1 are all assigned odd priorities, in the initial strategy only the edge from t1 to t0 is
switchable. Furthermore, once this edge has been switched, only the edge from t2 to t1 is
switchable. In this way, the gadget ensures that m iterations are required to move from the
initial strategy to the optimal strategy for this gadget.
Another important property is that the gadget can be reset. This is achieved by having
a single iteration in which the valuation of s is much larger than the valuation of r, followed
by another iteration in which the valuation of r is much larger than the valuation of s. In
the first iteration all vertices ti switch to s, and in the second iteration all vertices switch
back to r. Note that after the second iteration, we have arrived back at the initial strategy
described above.
The bit gadget. The bit gadget is designed to store one bit of a binary counter. The
clock construction will contain n copies of this gadget, which will be indexed 1 through n.
Figure 4 gives a depiction of a bit gadget with index i.
The current value of the bit for index i is represented by the choice that the current
strategy makes at di. More precisely, for every strategy σ we have:
• If σ(di) = ei, then bit i is 1 in σ.
• If σ(di) ≠ ei, then bit i is 0 in σ.
The Odd vertex ei plays a crucial role in this gadget. If σ(di) = ei, then Odd’s best response
is to use edge (ei, hi), to avoid creating the even cycle between di and ei. On the other
hand, if σ(di) ≠ ei, then Odd’s best response is to use (ei, di), to avoid seeing the large even
priority at hi.
[Figure 4 shows the vertices di, ei, fi, hi, gi, and ki, with edges to a1, a2, . . . , a2i, to gi+1, . . . , gn, and to x, r, and s.
Representative priorities: di: 3, ei: 4, fi: 15, hi: 16, gi: 5, ki: 13.]
Figure 4. An example of Friedmann’s bit gadget. The vertex di has a
small odd priority, while the vertex ei has an even priority that is equal
to pri(di) + 1. The vertex fi has a large odd priority, while the vertex hi
has a large even priority that is equal to pri(fi) + 1. The priorities assigned
to ki and gi are not relevant to the operation of Friedmann’s construction.
One thing to note is that, in the case where σ(di) ≠ ei, the edge to ei is always
switchable. To prevent di from immediately switching to ei, we must ensure that there is
always a more appealing outgoing edge from di, so that the greedy all-switches rule will
switch that edge instead. The edges from di to the deceleration lane provide this. Once t1
has switched to t0, the edge from di to a1 becomes more appealing than the edge to ei;
once t2 has switched to t1, the edge from di to a2 becomes more appealing than the edge
to ei; and so on. In this way, we are able to prevent di from switching to ei for 2i iterations
by providing outgoing edges to the first 2i vertices of the deceleration lane.
The vertices s and r. The vertex s has outgoing edges to every vertex fi in the bit
gadgets, and the vertex r has outgoing edges to every vertex gi in the bit gadgets. If i is
the index of the least significant 1 bit, then s chooses the edge to fi and r chooses the edge
to gi. The priority assigned to r is larger than the priority assigned to s, which ensures
that the valuation of r is usually larger than the valuation of s, as required to make the
deceleration lane work.
When the counter moves from one bit-string to the next, the index of the least signifi-
cant 1 changes to some i′ ≠ i. The vertex s switches to fi′ one iteration before the vertex r
switches to gi′. This creates the single iteration in which the valuation of s is larger than
the valuation of r, which resets the deceleration lane.
Simulating a binary counter. To simulate a binary counter, we must do two things.
Firstly, we must ensure that if the counter is currently at some bit-string K ∈ {0, 1}^n, then
the least significant zero in K must be flipped to a one. Secondly, once this has been done,
all bits whose index is smaller than the least significant zero must be set to 0. If these two
operations are always performed, then strategy improvement will indeed count through all
binary strings.
The least significant zero is always flipped because each bit i has 2i edges to the decel-
eration lane. Since the purpose of the deceleration lane is to prevent the vertex di switching
to ei, the vertex di′, where i′ is the index of the least significant zero, is the first to run out
of edges, and subsequently switches to ei′.
Once this has occurred, all bits with index smaller than the least significant zero are set
to 0 due to the following chain of events. The vertex s switches to fi′, and then the vertex di′′
in all bits with index i′′ < i′ will be switched to s. Since di′′ no longer uses the edge to ei′′,
the bit has now been set to 0.
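The two counter operations described above (flip the least significant zero, then clear every lower bit) can be sketched directly in Python. The list encoding, in which `bits[0]` holds bit 1, and the function names are assumptions of this sketch; it models the counter only, not the strategy improvement iterations that realise it.

```python
def lsz(bits):
    """1-based index of the least significant zero, or None if all bits are 1."""
    for index, bit in enumerate(bits, start=1):
        if bit == 0:
            return index
    return None


def increment(bits):
    """Flip the least significant zero to 1 and set all lower bits to 0."""
    z = lsz(bits)
    if z is None:
        raise ValueError("counter overflow: no zero bit left to flip")
    return [0] * (z - 1) + [1] + list(bits[z:])


def to_int(bits):
    """Integer represented by the bit-string, with bits[0] least significant."""
    return sum(bit << position for position, bit in enumerate(bits))
```

Repeatedly applying `increment` to the all-zero string visits every bit-string in numerical order, which is the behaviour that the clock gadget forces on strategy improvement.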
Our modifications to Friedmann’s example. In order to use Friedmann’s example as a
clock, we make a few minor adjustments to it. Firstly, we make the deceleration lane longer.
Friedmann’s example uses a deceleration lane of length 2n, but we use a deceleration lane
of length 2k + 4n + 6. Furthermore, while the vertex di has outgoing edges to each aj with
j ≤ 2i in Friedmann’s version, in our modified version the vertex di has outgoing edges to
each aj with j ≤ 2i + 2k + 2n + 6.
The reason for this is that Friedmann’s example can move from one bit-string to the
next in as little as four iterations, but we need more time in order to compute the circuit F .
By making the deceleration lane longer, we slow down the construction, and ensure that
there are at least 2k + 2n + 6 iterations before the clock moves from one bit-string to the
next.
The second change that we make is to change the priorities, because we need to make
room for the gadgets that we add later. However, we have not made any fundamental
changes to the priorities: the ordering of priorities between the vertices and their parity is
maintained. We have simply added larger gaps between them.
The following table specifies the version of the construction that we use. Observe that
two copies are specified: one for j = 0 and the other for j = 1. Furthermore, observe that
the vertex x will be the sink in our one-sink game.
[Figure 5 shows, arranged by depth (Depth 0, Depth 1, Depth 2), the gadgets Input^0_1, Input^0_2, Gate^0_3 (NOT),
Gate^0_4 (OR), Gate^0_5 (OR), Gate^0_6 (NOT), Input^1_1, and Input^1_2, their output states o^0_1, . . . , o^0_6, o^1_1, and o^1_2,
and their edges to r^0, s^0, r^1, and s^1.]
Figure 5. Example of how we implement a specific circuit with three gates.
Vertex | Conditions                      | Edges                                            | Priority                  | Player
t^j_0  | j ∈ {0, 1}                      | r^j, s^j                                         | P(2, 0, 2k + 4n + 4, j, 0) | Even
t^j_l  | j ∈ {0, 1}, 1 ≤ l ≤ 2k + 4n + 6 | r^j, s^j, t^j_{l−1}                              | P(2, 0, l, j, 1)          | Even
a^j_l  | j ∈ {0, 1}, 1 ≤ l ≤ 2k + 4n + 6 | t^j_l                                            | P(2, 0, l + 1, j, 0)      | Even
d^j_i  | j ∈ {0, 1}, 1 ≤ i ≤ n           | e^j_i, s^j, r^j, a^j_l for 1 ≤ l ≤ 2k + 2n + 6 + 2i | P(1, i, 0, j, 1)          | Even
e^j_i  | j ∈ {0, 1}, 1 ≤ i ≤ n           | h^j_i, d^j_i                                     | P(1, i, 1, j, 0)          | Odd
g^j_i  | j ∈ {0, 1}, 1 ≤ i ≤ n           | f^j_i                                            | P(1, i, 2, j, 1)          | Even
k^j_i  | j ∈ {0, 1}, 1 ≤ i ≤ n           | x, g^j_l for i < l ≤ n                           | P(8, i, 0, j, 1)          | Even
f^j_i  | j ∈ {0, 1}, 1 ≤ i ≤ n           | e^j_i                                            | P(8, i, 1, j, 1)          | Even
h^j_i  | j ∈ {0, 1}, 1 ≤ i ≤ n           | k^j_i                                            | P(8, i, 2, j, 0)          | Even
s^j    | j ∈ {0, 1}                      | x, f^j_l for 1 ≤ l ≤ n                           | P(7, 0, 0, j, 0)          | Even
r^j    | j ∈ {0, 1}                      | x, g^j_l for 1 ≤ l ≤ n                           | P(7, 0, 1, j, 0)          | Even
x      |                                 | x                                                | P(0, 0, 0, 0, 1)          | Even
3.2. Our construction.
Circuits. Given a circuit, we will produce a gadget that simulates that circuit. An exam-
ple is given in Figure 5. For each gate in the circuit, we design a gadget that computes
the output of that gate. The idea is that greedy all-switches strategy improvement will
compute these gates in depth order. Starting from an initial strategy, the first iteration
will compute the outputs for all gates of depth 1, the next iteration will use these outputs
to compute the outputs for all gates of depth 2, and so on. In this way, after k iterations
of strategy improvement, the outputs of the circuit will have been computed. We then use
one additional iteration to store these outputs in an input/output gadget.
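The depth-ordered computation can be illustrated with a plain round-based evaluator, in which round t computes exactly the gates of depth t. This is a sketch of the intended schedule only, not of strategy improvement itself; the gate encoding and the name `evaluate_by_depth` are assumptions made for the example.

```python
def evaluate_by_depth(gates, depth, input_values):
    """Evaluate gates level by level.

    gates: gate id -> ("not", a) or ("or", a, b)   (hypothetical encoding)
    depth: gate id -> d(i) >= 1
    input_values: id -> bit for the depth-0 inputs
    Returns the final values and a snapshot of the known values per round.
    """
    values = dict(input_values)
    rounds = []
    for level in range(1, max(depth.values()) + 1):
        for i, gate in gates.items():
            if depth[i] != level:
                continue
            if gate[0] == "not":
                values[i] = 1 - values[gate[1]]
            else:
                values[i] = max(values[gate[1]], values[gate[2]])
        # One round corresponds to one iteration in the description above.
        rounds.append(dict(values))
    return values, rounds
```

The number of rounds equals the depth of the circuit, and the value of a gate first appears in the snapshot for the round that matches its depth.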
Strategy improvement valuations will be used to represent the output of each gate.
Each gate i has a state o^j_i, and the valuation of this state will indicate whether the gate
evaluates to true or false. In particular the following rules will be followed.
Property 3.1. In every strategy σ we have the following properties.
(1) Before the gate has been evaluated, we will have Val^σ(o^j_i) ⊏ Val^σ(r^j).
(2) If the gate has been evaluated to false, we will continue to have Val^σ(o^j_i) ⊏ Val^σ(r^j).
(3) If the gate has been evaluated to true, then we will instead have Val^σ(r^j) ⊏ Val^σ(o^j_i),
and MaxDiff^σ(r^j, o^j_i) will be a large even priority.
The input/output gadgets are connected to both circuits, and these gadgets have two modes.
(1) When circuit j is computing, the gadget is in output mode, where it reads the output
of circuit j and stores it.
(2) When circuit 1− j is computing, the gadget is in input mode, where it outputs the
value that was stored from the previous computation into circuit 1− j.
Therefore, the gates of depth 1 in circuit j read their input from the input/output gadgets
in circuit 1−j, while the input/output gadgets in circuit j read their input from the outputs
of circuit j. To formalise this, we introduce the following notation. For every Not-gate, we
define InputState(i, j) as follows:
\[
\text{InputState}(i, j) =
\begin{cases}
o^{1-j}_{I(i)} & \text{if } d(i) = 1, \\
o^{j}_{I(i)} & \text{if } d(i) > 1.
\end{cases}
\]
For every Or-gate, and every l ∈ {1, 2}, we define InputState(i, j, l) as follows:
\[
\text{InputState}(i, j, l) =
\begin{cases}
o^{1-j}_{I_l(i)} & \text{if } d(i) = 1, \\
o^{j}_{I_l(i)} & \text{if } d(i) > 1.
\end{cases}
\]
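A minimal Python rendering of InputState is given below. Representing the state o^j_i as the tuple `("o", j, i)`, and passing the depths and gate inputs as dictionaries, are assumptions of this sketch.

```python
def input_state(depth, inputs, i, j, l=1):
    """Return the state that gate i in circuit j reads as its l-th input.

    depth:  gate id -> d(i)
    inputs: gate id -> tuple of input gate ids (one for Not, two for Or)
    A state o^j_i is encoded as the tuple ("o", j, i).
    """
    source = inputs[i][l - 1]
    # Gates of depth 1 read from the input/output gadgets of the other circuit.
    circuit = 1 - j if depth[i] == 1 else j
    return ("o", circuit, source)
```

For a Not-gate the default `l=1` selects its single input I(i); for an Or-gate, `l` ranges over 1 and 2.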
The clocks. As we have mentioned, we use two copies of Friedmann’s example to act as
clocks in our construction. These clocks will be used to drive the computation. In particular,
the vertices rj and sj will play a crucial role in synchronising the two circuits. As described
in the previous section, when the clock advances, i.e., when it moves from one bit-string
to the next, there is a single iteration in which the valuation of sj is much larger than the
valuation of rj . This event will trigger the computation.
• The iteration in which the valuation of s0 is much larger than the valuation of r0
will trigger the start of computation in circuit 0.
• The iteration in which the valuation of s1 is much larger than the valuation of r1
will trigger the start of computation in circuit 1.
In order for this approach to work, we must ensure that the two clocks are properly
synchronised. In particular, the gap between computation starting in circuit j and compu-
tation starting in circuit 1 − j must be at least k + 3, to give enough time for circuit j to
compute the output values, and for these values to be stored. We now define notation for
this purpose. First we define the number of iterations that it takes for a clock to move from
bit-string K to K + 1. For every bit-string K ∈ {0, 1}^n, we define Lsz(K) to be the index
of the least significant zero in K: that is, the smallest index i such that K_i = 0. For each
K ∈ {0, 1}^n, we define:
\[ \text{Length}(K) = (2k + 2n + 6) + 2 \cdot \text{Lsz}(K) + 5. \]
This term can be understood as the length of the deceleration lane to which all bits in the
clock have edges, plus the number of extra iterations it takes to flip the least-significant
zero, plus five extra iterations needed to transition between the two bit-strings.
Next we introduce the following delay function, which gives the amount of time each
circuit spends computing. For each j ∈ {0, 1} and each K ∈ {0, 1}^n, we define:
\[
\text{Delay}(j, K) =
\begin{cases}
(d(C) + 3) + 2n & \text{if } j = 0, \\
(d(C) + 3) + 2 \cdot \text{Lsz}(K) + 5 & \text{if } j = 1.
\end{cases}
\]
Circuit 1 starts computing Delay(0,K) iterations after Circuit 0 started computing, and
Circuit 0 starts computing Delay(1,K) iterations after circuit 1 started computing. Observe
that Delay(0,K) + Delay(1,K) = Length(K), which ensures that the two circuits do not
drift relative to each other. The term d(C) + 3 in each of the delays ensures that there is
always enough time to compute the circuit, before the next circuit begins the subsequent
computation.
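The timing functions can be written out as a short Python sketch. The function names and the list encoding of K (with `bits[0]` holding bit 1) are assumptions. The example check of the sum is run with d(C) = k, which is the case in which the two formulas as written add up term by term; treating the depth this way is an assumption of the sketch, not a statement from the text.

```python
def lsz(bits):
    """1-based index of the least significant zero in K (bits[0] is bit 1)."""
    for index, bit in enumerate(bits, start=1):
        if bit == 0:
            return index
    raise ValueError("K has no zero bit")


def length(k, n, bits):
    """Length(K) = (2k + 2n + 6) + 2 Lsz(K) + 5."""
    return (2 * k + 2 * n + 6) + 2 * lsz(bits) + 5


def delay(j, circuit_depth, n, bits):
    """Delay(j, K) as defined above; circuit_depth plays the role of d(C)."""
    if j == 0:
        return (circuit_depth + 3) + 2 * n
    return (circuit_depth + 3) + 2 * lsz(bits) + 5
```

For example, with k = d(C) = 4, n = 3 and K = 101 (so Lsz(K) = 2), the two delays are 13 and 16 and Length(K) is 29.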
Or gates. The gadget for a gate i ∈ Or is quite simple, and is shown in Figure 6. It is
not difficult to verify that the three rules given in Property 3.1 hold for this gate. Before
both inputs have been evaluated, the best strategy at o^j_i is to move directly to r^j, since the
valuation of both inputs is lower than the valuation of r^j. Note that in this configuration
the valuation of o^j_i is smaller than the valuation of r^j, since o^j_i has been assigned an odd
priority.
Since, by assumption, both inputs have the same depth, they will both be evaluated at
the same time. If they both evaluate to false, then nothing changes and the optimal strategy
at o^j_i will still be r^j. This satisfies the second rule. On the other hand, if at least one input
evaluates to true, then the optimal strategy at o^j_i is to switch to the corresponding input
state. Since the valuation of this input state is now bigger than the valuation of r^j, the
valuation of o^j_i will also be bigger than the valuation of r^j, so the third rule is also satisfied.
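[Figure 6 shows the vertex o^j_i (representative priority 1) with edges to r^j, s^j, o^j_{I_1(i)}, and o^j_{I_2(i)}.]
Vertex | Conditions         | Edges                                            | Priority         | Player
o^j_i  | j ∈ {0, 1}, i ∈ Or | s^j, r^j, InputState(i, j, 1), InputState(i, j, 2) | P(4, i, 0, j, 1) | Even
Figure 6. The Or gate.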
Not gates. The construction for a gate i ∈ Not is more involved. The gadget is quite
similar to a bit-gadget from Friedmann’s construction. However, we use a special modified
deceleration lane, which is shown in Figure 7.
The modified deceleration lane is almost identical to Friedmann’s deceleration lane,
except that state t^j_{i,d(i)} is connected to the output state of the input gate. The idea is that,
for the first d(i) − 1 iterations the deceleration lane behaves as normal. Then, in iteration
d(i), the input gate is evaluated. If it evaluates to true then the valuation of t^j_{i,d(i)} will be
large, and the deceleration lane continues switching as normal. If it evaluates to false, then
the valuation of t^j_{i,d(i)} will be low, and the deceleration lane will stop switching.
The Not gate gadget, which is shown in Figure 8, is a simplified bit gadget that is
connected to the modified deceleration lane. As in Friedmann’s construction, the strategy
chosen at d^j_i will represent the output of the gate. In a strategy σ, the gate outputs 1 if
σ(d^j_i) = e^j_i, and it outputs 0 otherwise. As we know, Friedmann’s bit gadget is distracted
from switching d^j_i to e^j_i by the deceleration lane. By using the modified deceleration lane,
we instead obtain a Not gate. Since the deceleration lane keeps on switching if and only if
the input gate evaluates to true, the state d^j_i will switch to e^j_i in iteration d(i) if and only if
the input gate evaluates to false. This is the key property that makes the Not gate work.
[Figure 7 shows the vertices t^j_{i,d(i)−1}, t^j_{i,d(i)}, t^j_{i,d(i)+1} and a^j_{i,d(i)−1}, a^j_{i,d(i)}, a^j_{i,d(i)+1}, where t^j_{i,d(i)} has an
edge to o^j_{I(i)} and the other t vertices have edges to r^j and s^j.]
Figure 7. Modified deceleration lane for a Not gate i in circuit j.
[Figure 8 shows the vertices d^j_i, e^j_i, o^j_i, and h^j_i, with edges to a^j_{i,1}, a^j_{i,2}, . . . , a^j_{i,m}, s^j, and r^j.
Representative priorities: d^j_i: 3, e^j_i: 4, o^j_i: 15, h^j_i: 16.]
Figure 8. Not gate with index i in circuit j.
To see that the three rules specified in Property 3.1 are respected, observe that there is
a large odd priority on the state o^j_i, and an even larger even priority on the state h^j_i. This
causes the valuation of o^j_i to be larger than the valuation of r^j if and only if d^j_i chooses
the edge to e^j_i, which only happens when the gate evaluates to true.
Finally, when the computation in circuit j begins again, the Not-gate is reset. This
is ensured by giving the vertex d^j_i edges to both s^j and r^j. So, when the clock for circuit
j advances, no matter what strategy is currently chosen, the vertex d^j_i first switches to s^j,
and then to r^j, and then begins switching to the deceleration lane.
The following table formally specifies the Not gate gadgets that we use in the con-
struction.
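[Figure 9 shows, on the left, a vertex with edges to r^j and s^j and, on the right, the same vertex with edges to y^j
(representative priority 4) and z^j (representative priority 2), which in turn have edges to s^j, r^j, and r^{1−j}.]
Figure 9. Circuit mover gadget for circuit j. Left: the edges to r^j and
s^j in a vertex from a Not-gate. Right: the replacement for these outgoing
edges.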
Vertex       | Conditions                                               | Edges                                          | Priority                  | Player
t^j_{i,0}    | j ∈ {0, 1}, i ∈ Not                                      | r^j, s^j                                       | P(5, i, 2k + 4n + 4, j, 0) | Even
t^j_{i,l}    | j ∈ {0, 1}, i ∈ Not, 1 ≤ l ≤ 2k + 4n + 6, and l ≠ d(i)   | r^j, s^j, t^j_{i,l−1}                          | P(5, i, l, j, 1)          | Even
t^j_{i,d(i)} | j ∈ {0, 1}, i ∈ Not                                      | InputState(i, j)                               | P(5, i, d(i), j, 1)       | Even
a^j_{i,l}    | j ∈ {0, 1}, i ∈ Not, 1 ≤ l ≤ 2k + 4n + 6, and l ≠ d(i)   | t^j_{i,l}                                      | P(5, i, l + 1, j, 0)      | Even
a^j_{i,d(i)} | j ∈ {0, 1}, i ∈ Not                                      | t^j_{i,d(i)}                                   | P(4, i, 0, j, 0)          | Even
d^j_i        | j ∈ {0, 1}, i ∈ Not                                      | s^j, r^j, e^j_i, a^j_{i,l} for 1 ≤ l ≤ 2k + 4n + 6 | P(4, i, 0, j, 1)          | Even
e^j_i        | j ∈ {0, 1}, i ∈ Not                                      | h^j_i, d^j_i                                   | P(4, i, 1, j, 0)          | Odd
o^j_i        | j ∈ {0, 1}, i ∈ Not                                      | e^j_i                                          | P(6, i, 0, j, 1)          | Even
h^j_i        | j ∈ {0, 1}, i ∈ Not                                      | r^j                                            | P(6, i, 1, j, 0)          | Even
Input/output gates. For each input-bit in each copy of the circuit, we have an input/output
gate. Recall that these gadgets have two modes. When Circuit j is computing, the in-
put/output gadgets in circuit j are in output mode, in which they store the output of the
circuit, and the input/output gadgets in circuit 1 − j are in input mode, in which they
output the value that was stored in the previous computation.
At its core, the input/output gadget is simply another copy of the Not-gate gadget
that is connected to the ith output bit of circuit j. However, we modify the Not-gate
gadget by adding in extra vertices that allow it to be moved between the two circuits. The
most important part of this circuit mover apparatus is shown in Figure 9: all of the vertices
in the Not-gate gadget that have edges to r^j and s^j are modified so that they instead have
edges to y^j and z^j. Figures 10 and 11 show the Input/Output gadget and its associated
modified deceleration lane, respectively. There are three differences between this gadget
and the Not gate, which are the inclusion of the vertices h^j_{i,∗} and q^j_{i,∗} (shown in Figure 10),
and the vertices p^j_i and p^j_{i,1} (shown in Figure 11). All of these vertices are involved in the
operation of moving the gadget between the two circuits.
[Figure 10 shows the vertices d^j_i, e^j_i, o^j_i, q^j_{i,0}, q^j_{i,1}, h^j_{i,0}, h^j_{i,1}, and h^j_{i,2}, with edges to a^j_{i,1}, a^j_{i,2}, . . . , a^j_{i,m},
to z^j and y^j, and to r^{1−j} and r^j.]
Figure 10. Input/Output gate with index i in circuit j.
When the gadget is in output mode, the vertex y^j chooses the edge to r^j, the vertex
h^j_{i,0} chooses the edge to h^j_{i,1}, and the vertex p^j_i chooses the edge to o^j_{I(i)}. When these edges
are chosen, the gadget is essentially the same as a Not-gate at the top of the circuit. So,
once the circuit has finished computing, the vertex d^j_i chooses the edge to e^j_i (i.e., the stored
bit is 1) if and only if the ith output from the circuit was a 0. Since the circuit was given
in negated form, the gadget has therefore correctly stored the ith bit of F (B).
Throughout the computation in circuit j, the valuation of r^j is much larger than the
valuation of r^{1−j}. The computation in circuit 1 − j begins when the clock in circuit 1 − j
advances, which causes the valuation of r^{1−j} to become much larger than the valuation of
r^j. When this occurs, the input/output gate then transitions to input mode. The transition
involves the vertex y^j switching to r^{1−j}, the vertex h^j_{i,0} switching to h^j_{i,2}, and the vertex
p^j_i switching to p^j_{i,1}. Moreover, the player Odd vertex q^j_{i,0} switches to e^j_i. This vertex acts as
a circuit breaker, which makes sure that the output of the gadget is only transmitted to
circuit 1 − j when the gadget is in input mode.
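[Figure 11 shows the vertices t^j_{i,d(C)−1}, t^j_{i,d(C)}, t^j_{i,d(C)+1} and a^j_{i,d(C)−1}, a^j_{i,d(C)}, a^j_{i,d(C)+1}, together with p^j_i
(representative priority 2), which has edges to o^j_{I(i)} and p^j_{i,1} (representative priority 14), which has an edge to r^{1−j}.
The remaining t vertices have edges to y^j and z^j.]
Figure 11. Modified deceleration lane for an Input/Output gate i in circuit j.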
The key thing is that all of these switches occur simultaneously in the same iteration.
Since strategy improvement only cares about the relative difference between the outgoing
edges from the vertex, and since all edges leaving the gadget switch at the same time, the
operation of the Not-gate is not interrupted. So, the strategy chosen at dji will continue to
hold the ith bit of F (B), and the gadget has transitioned to input mode.
When the gadget is in input mode, it can be viewed as a Not gate at the bottom of circuit
1 − j that has already been computed. In particular, the switch from h^j_{i,1} to h^j_{i,2} ensures
that, if the output is 1, then the gadget has the correct output priority. Moreover, the
deceleration lane has enough states to ensure that, if the output is 0, then the output of the
gadget will not flip from 0 to 1 while circuit 1 − j is computing.
Finally, once circuit 1− j has finished computing, the clock for circuit j advances, and
the input/output gadget moves back to output mode. This involves resetting the Not gate
gadget back to its initial state. This occurs because, when the clock in circuit j advances,
there is a single iteration in which the valuation of sj is higher than the valuation of rj .
This causes zj to switch to sj which in turn causes a single iteration in which the valuation
of zj is higher than the valuation of yj . Then, in the next iteration the vertex yj switches to
rj , and so the valuation of yj is then larger than the valuation of zj . So, the valuations of
yj and zj give exactly the same sequence of events as rj and sj , which allows the Not-gate
to reset.
The following table specifies the input/output gadget.
Vertex       | Conditions                                                        | Edges                                          | Priority                  | Player
y^j          | j ∈ {0, 1}                                                        | r^{1−j}, r^j                                   | P(3, 0, 1, j, 0)          | Even
z^j          | j ∈ {0, 1}                                                        | r^j, s^j                                       | P(3, 0, 0, j, 0)          | Even
t^j_{i,0}    | j ∈ {0, 1}, i ∈ Input/Output                                      | y^j, z^j                                       | P(5, i, 2k + 4n + 4, j, 0) | Even
t^j_{i,l}    | j ∈ {0, 1}, i ∈ Input/Output, 1 < l ≤ 2k + 4n + 6, and l ≠ d(C)   | y^j, z^j, t^j_{i,l−1}                          | P(5, i, l, j, 1)          | Even
t^j_{i,d(C)} | j ∈ {0, 1}, i ∈ Input/Output                                      | p^j_i                                          | P(5, i, d(C), j, 1)       | Even
p^j_i        | j ∈ {0, 1}, i ∈ Input/Output                                      | o^j_{I(i)}, p^j_{i,1}                          | P(3, i, 2, j, 0)          | Even
p^j_{i,1}    | j ∈ {0, 1}, i ∈ Input/Output                                      | r^{1−j}                                        | P(5, i, 2k + 4n + 5, j, 0) | Even
a^j_{i,l}    | j ∈ {0, 1}, i ∈ Input/Output, 1 ≤ l ≤ 2k + 4n + 6                 | t^j_{i,l}                                      | P(5, i, l + 1, j, 0)      | Even
d^j_i        | j ∈ {0, 1}, i ∈ Input/Output                                      | y^j, z^j, e^j_i, a^j_{i,l} for 1 ≤ l ≤ 2k + 4n + 6 | P(4, i, 0, j, 1)          | Even
e^j_i        | j ∈ {0, 1}, i ∈ Input/Output                                      | h^j_{i,0}, d^j_i                               | P(4, i, 1, j, 0)          | Odd
q^j_{i,0}    | j ∈ {0, 1}, i ∈ Input/Output                                      | e^j_i, q^j_{i,1}                               | P(4, i, 2, j, 0)          | Odd
q^j_{i,1}    | j ∈ {0, 1}, i ∈ Input/Output                                      | r^{1−j}                                        | P(6, d(C) + 2, 0, j, 0)   | Even
o^j_i        | j ∈ {0, 1}, i ∈ Input/Output                                      | q^j_{i,0}                                      | P(6, i, 0, j, 1)          | Even
h^j_{i,0}    | j ∈ {0, 1}, i ∈ Input/Output                                      | h^j_{i,1}, h^j_{i,2}                           | P(3, i, 3, j, 0)          | Even
h^j_{i,1}    | j ∈ {0, 1}, i ∈ Input/Output                                      | r^j                                            | P(6, d(C) + 1, 1, j, 0)   | Even
h^j_{i,2}    | j ∈ {0, 1}, i ∈ Input/Output                                      | r^{1−j}                                        | P(6, 0, 1, j, 0)          | Even
One-sink game. If we are to use the simplified strategy improvement algorithm, we must
first show that this construction is a one-sink game. We do so in the following lemma.
Lemma 3.2. The construction is a one-sink game.
Proof. In order to show that the construction is a one-sink game, we must show that the
two required properties hold. Firstly, we must show that there is a vertex that satisfies the
required properties of a sink vertex. It is not difficult to verify that vertex x does indeed
satisfy these properties: the only outgoing edge from x is the edge (x, x), and we have
pri(x) = P(0, 0, 0, 0, 1) = 1. Furthermore, no vertex is assigned priority 0.
Secondly, we must argue that all optimal strategies are terminating. Recall that a
terminating strategy has the property that the first component of the Vöge-Jurdziński
valuation is 1, which implies that all paths starting at all vertices eventually arrive at the
sink x. So, consider a strategy σ that is not terminating, and let v be a vertex such that
the first component of Val^σ_VJ(v) is strictly greater than 1. Let C be the cycle that is eventually
reached by following σ and Br(σ) from v. There are two cases to consider:
• If C contains at least one vertex from a clock, then C must be entirely contained
within that clock, because there are no edges that leave either of the two clocks.
In this case we have that σ is not optimal because Friedmann has shown that his
construction is a one-sink game.
• If C does not contain a vertex from a clock, then it is entirely contained within the
circuits. First observe that C cannot be a two-vertex cycle using the vertices dji and
eji , because it is not a best response for Odd to allow a cycle with an even priority to
be formed, since he can always move to rj , and from there eventually reach a cycle
with priority p  1 (because the clock is a one-sink game). But, the only other way
to form a cycle in the circuits is to pass through both of the circuits. In this case,
the highest priority on the cycle will be an odd priority assigned to the state oji in
either a Not-gate (if there is one on the path), or an input/output gate (otherwise).
Since this odd priority is strictly greater than 1, and since player Even can always
assure a priority of 1 by, for example, moving to rj in every input/output state dji ,
we have that σ is not an optimal strategy.
Therefore, we have shown that the construction is a one-sink game.
4. Strategies
In this section, we define an initial strategy, and describe the sequence of strategies that
greedy all-switches strategy improvement switches through when it is applied to this initial
strategy. We will define strategies for each of the gadgets in turn, and then combine these
into a full strategy for the entire construction.
It should be noted that we will only define partial strategies in this section, which means
that some states will have no strategy specified. This is because our construction will work
no matter which strategy is chosen at these states.
To deal with this, we must define what it means to apply strategy improvement to a
partial strategy. If χ is a partial strategy and σ ∈ ΣEven is a strategy, then we say that σ
agrees with χ if σ(v) = χ(v) for every vertex v ∈ V for which χ is defined. So, if χ1 and χ2
are partial strategies, then we say that greedy all-switches strategy improvement switches
χ1 to χ2 if, for every strategy σ1 ∈ ΣEven that agrees with χ1, greedy all-switches strategy
improvement switches σ1 to a strategy σ2 that agrees with χ2.
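As an illustration only, the notion of agreement can be sketched in Python. The sketch assumes a representation that is ours and not part of the construction: strategies are dictionaries mapping vertex names to chosen successors, and a partial strategy simply omits the vertices at which it is undefined.

```python
def agrees_with(sigma, chi):
    """Return True if the strategy sigma agrees with the partial strategy chi.

    Both arguments are dicts from vertex to chosen successor (a hypothetical
    representation). chi may leave some vertices unspecified, and only the
    vertices on which chi is defined are compared.
    """
    return all(sigma.get(v) == succ for v, succ in chi.items())


# Toy example: chi leaves vertex "c" unspecified.
chi = {"a": "b", "b": "x"}
sigma_ok = {"a": "b", "b": "x", "c": "a"}
sigma_bad = {"a": "c", "b": "x", "c": "a"}
```

Under this representation, saying that χ1 is switched to χ2 amounts to checking `agrees_with` on the successor of every strategy that agrees with χ1.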
We now describe the sequence of strategies. Each part of the construction will be
considered independently.
The clock. We start by defining the sequence of strategies that occurs in the two clocks. For
each clock bit-string $K \in \{0,1\}^n$, we define a sequence of strategies
$\kappa^K_1, \kappa^K_2, \dots, \kappa^K_{\mathrm{Length}(K)}$.
Greedy all-switches strategy improvement switches through each of these strategies in turn,
and then switches from $\kappa^K_{\mathrm{Length}(K)}$ to $\kappa^{K+1}_1$, where $K + 1$ denotes
the bit-string that results by adding 1 to the integer represented by $K$. The sequence begins
in the first iteration after the valuation of $s^j$ is larger than the valuation of $r^j$. We will
first present the building blocks of this strategy, and then combine the building blocks into
the full sequence.
We begin by considering the vertices tjl in the deceleration lane. Recall that these states
switch, in sequence, from rj to tjl−1. This is formalised in the following definition. For each
m ≥ 1, each l in the range 1 ≤ l ≤ 2k + 4n+ 6, and each j ∈ {0, 1}, we define:
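\[
\rho_m(t^j_0) = \begin{cases}
s^j & \text{if } m = 1,\\
r^j & \text{if } m > 1.
\end{cases}
\]
\[
\rho_m(t^j_l) = \begin{cases}
s^j & \text{if } m = 1,\\
r^j & \text{if } m > 1 \text{ and } m \le l + 1,\\
t^j_{l-1} & \text{if } m > l + 1.
\end{cases}
\]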
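The definition above can be read as a small function of the iteration m and the lane index l. The following Python sketch is an illustration under our own assumptions: successors are returned as symbolic strings ("s" for s^j, "r" for r^j, and "t3" for t^j_3), and the clock index j is left implicit.

```python
def rho_lane(m, l):
    """Successor chosen by the lane vertex t^j_l in iteration m >= 1.

    Labels are symbolic strings chosen for this sketch only.
    """
    if m < 1 or l < 0:
        raise ValueError("m must be at least 1 and l must be non-negative")
    if m == 1:
        return "s"
    # t^j_0 has no predecessor in the lane, so it stays at r^j for all m > 1.
    if l == 0 or m <= l + 1:
        return "r"
    return f"t{l - 1}"


# The vertex t^j_2 over the first five iterations.
trace = [rho_lane(m, 2) for m in range(1, 6)]
```

In this trace the vertex t^j_2 leaves r^j for t^j_1 in iteration 4, one iteration after t^j_1 does the same, which is the sequential switching that the lane is designed to produce.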
We now move on to consider the vertices dji , which represent the bits in the counter.
We begin by defining a sequence of strategies for the bits that are 0. Recall that these
vertices switch to the states aji along the deceleration lane until they run out of edges, at
which point they switch to the vertex eji . This is formalised in the following definition. For
each i in the range 1 ≤ i ≤ n, each m ≥ 1, and each j ∈ {0, 1}, we define:
\[
\rho_m(d^j_i) = \begin{cases}
s^j & \text{if } m = 1,\\
r^j & \text{if } m = 2,\\
a^j_{2k+2n+6+2i} & \text{if } m = 3,\\
a^j_{m-3} & \text{if } 4 \le m \le 2k + 2n + 6 + 2i + 3,\\
e^j_i & \text{if } m > 2k + 2n + 6 + 2i + 3.
\end{cases}
\]
Note that the first three iterations are special, because the edge to aj1 only becomes switch-
able in the third iteration. The edge to rj and the edge to aj2k+2n+6+2i prevent the edge to
eji being switched before this occurs.
We now give a full strategy definition for the vertices dji . The bits that are 0 follow
the strategy that we just defined, and the bits that are 1 always choose the edge to eji . For
each bit-string K ∈ {0, 1}n, each i in the range 1 ≤ i ≤ n, each m ≥ 1, and each j ∈ {0, 1},
we define:
\[
\rho^K_m(d^j_i) = \begin{cases}
\rho_m(d^j_i) & \text{if } K_i = 0,\\
e^j_i & \text{if } K_i = 1.
\end{cases}
\]
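To make the case analysis for the bit vertices concrete, here is a minimal Python sketch. It is an illustration only: the function names, the symbolic labels ("s", "r", "a5", "e"), and the convention that the bit-string K is a tuple whose entry K[i - 1] holds bit i are all our assumptions.

```python
def rho_d(m, i, k, n):
    """Successor of d^j_i in iteration m when bit i is 0."""
    last = 2 * k + 2 * n + 6 + 2 * i  # index of the last lane vertex a^j_l used by d^j_i
    if m == 1:
        return "s"
    if m == 2:
        return "r"
    if m == 3:
        return f"a{last}"
    if m <= last + 3:
        return f"a{m - 3}"
    return "e"


def rho_d_K(m, i, K, k, n):
    """Successor of d^j_i in iteration m for clock bit-string K (a tuple of bits)."""
    return "e" if K[i - 1] == 1 else rho_d(m, i, k, n)
```

For example, with k = 0, n = 1 and i = 1 the last usable lane index is 10, so a 0 bit walks through a1, ..., a10 in iterations 4 to 13 and only switches to e^j_i in iteration 14.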
Finally, we consider the other vertices in the clock. To define strategies for these
vertices, we must first define some notation. For each i in the range 1 ≤ i ≤ n, we define
NextBit(K, i) to be a partial function that gives the index of the first 1 that appears higher
than index i: that is, the smallest index j > i such that Kj = 1. We now define the
strategies. These strategies all depend on the current clock bit-string K, and have no
dependence on how far the deceleration lane has switched, so the parameter m is ignored.
For each bit-string K ∈ {0, 1}n, each m ≥ 1, each i in the range 1 ≤ i ≤ n, and each
j ∈ {0, 1}, we define:
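\[
\rho^K_m(g^j_i) = \begin{cases}
k^j_i & \text{if } K_i = 0,\\
f^j_i & \text{if } K_i = 1.
\end{cases}
\]
\[
\rho^K_m(k^j_i) = \begin{cases}
g^j_{\mathrm{NextBit}(K,i)} & \text{if } \mathrm{NextBit}(K, i) \text{ is defined},\\
x & \text{otherwise.}
\end{cases}
\]
\[
\rho^K_m(r^j) = \begin{cases}
g^j_{\mathrm{NextBit}(K,0)} & \text{if } \mathrm{NextBit}(K, 0) \text{ is defined},\\
x & \text{otherwise.}
\end{cases}
\]
\[
\rho^K_m(s^j) = \begin{cases}
f^j_{\mathrm{NextBit}(K,0)} & \text{if } \mathrm{NextBit}(K, 0) \text{ is defined},\\
x & \text{otherwise.}
\end{cases}
\]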
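The partial function NextBit and the four strategies that depend on it can be sketched as follows. This is an illustrative sketch, not part of the construction: we assume that K is a tuple of bits with K[i - 1] holding bit i, that an undefined value of NextBit is represented by None, and that successors are symbolic strings such as "g4" for g^j_4.

```python
def next_bit(K, i):
    """Smallest index l > i with K_l = 1 (bits are 1-indexed), or None if undefined."""
    for l in range(i + 1, len(K) + 1):
        if K[l - 1] == 1:
            return l
    return None


def rho_clock(K, vertex, i=0):
    """Successor chosen at g^j_i, k^j_i, r^j or s^j for clock bit-string K.

    vertex is one of "g", "k", "r", "s"; the index i is only used for "g" and "k".
    """
    if vertex == "g":
        return f"k{i}" if K[i - 1] == 0 else f"f{i}"
    nb = next_bit(K, i if vertex == "k" else 0)
    if nb is None:
        return "x"
    # s^j moves to an f vertex, whereas k^j_i and r^j move to a g vertex.
    return f"{'f' if vertex == 's' else 'g'}{nb}"
```

As in the definitions, the iteration counter m does not appear: these choices are fixed for as long as the clock stores K.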
When the clock transitions between two clock bit-strings, there is a single iteration in
which the strategies defined above are not followed. This occurs one iteration after the
vertex $d^j_{\mathrm{Lsz}(K)}$ switches to $e^j_{\mathrm{Lsz}(K)}$. In this iteration, the vertices
$g^j_{\mathrm{Lsz}(K)}$ and $s^j$ switch to $f^j_{\mathrm{Lsz}(K)}$, while every other vertex
continues to use the strategies that were defined above. We
now define a special reset strategy that captures this. For each bit-string K ∈ {0, 1}n, and
every vertex v in either of the two clocks we define:
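\[
\rho^K_{\mathrm{Reset}}(v) = \begin{cases}
f^j_{\mathrm{Lsz}(K)} & \text{if } v = g^j_{\mathrm{Lsz}(K)} \text{ or } v = s^j,\\
\rho^K_{\mathrm{Length}(K)}(v) & \text{otherwise.}
\end{cases}
\]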
We can now combine the strategies defined above in order to define the full sequence
of strategies that are used in the clocks. In the first Length(K) − 1 iterations, we follow
the sequence defined by the strategies ρKm(v), and in the final iteration we use the strat-
egy ρKReset(v). Formally, for each bit-string K ∈ {0, 1}n, each m in the range 1 ≤ m ≤
Length(K), and every vertex v in either of the two clocks, we define:
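\[
\kappa^K_m(v) = \begin{cases}
\rho^K_m(v) & \text{if } m \le \mathrm{Length}(K) - 1,\\
\rho^K_{\mathrm{Reset}}(v) & \text{if } m = \mathrm{Length}(K).
\end{cases}
\]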
Friedmann showed the following lemma.
Lemma 4.1 ([15]). Let $K \in \{0,1\}^n$. If we start all-switches strategy improvement at
$\kappa^K_1$ for clock j, then it will proceed by switching through the sequence
$\kappa^K_1, \kappa^K_2, \dots, \kappa^K_{\mathrm{Length}(K)}, \kappa^{K+1}_1$.
The circuits. For each bit-string $B \in \{0,1\}^n$, we give a sequence of strategies
$\sigma^B_1, \sigma^B_2, \dots$, which describes the sequence of strategies that occurs when B is
the input of the circuit. The sequence is indexed from the point at which the circuit's clock
advances to the next bit-string. That is, $\sigma^B_1$ occurs one iteration after the valuation of
$s^j$ exceeds the valuation of $r^j$.
Recall that all of the gates with the same depth are evaluated in the same iteration.
We can now make this more precise: each gate i will be evaluated in the strategy $\sigma^B_{d(i)+2}$.
After this iteration, there will then be two cases based on whether the gate evaluates to 1
or 0. To deal with this, we require the following notation. For each bit-string B and each
gate i, we define Eval(B, i) to be 1 if gate i outputs true on input B, and 0 otherwise.
Or gates. Before the gate is evaluated, the state $o^j_i$ chooses the edge to $r^j$. Once the gate
has been evaluated, there are four possibilities. If both input gates evaluate to false, then
the state $o^j_i$ continues to use the edge to $r^j$. If one of the two inputs is true, then $o^j_i$
will switch to the corresponding input state. The case where both inputs are true is the most
complicated. Obviously, $o^j_i$ will switch to one of the two input states, and in fact, it switches
to the one with the highest valuation. Since the overall correctness of our construction does
not care which successor is chosen in this case, we simply define OrNext(i, B, m) to be the
successor with the highest valuation in step m of the sequence for bit-string B.
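We can now formally define the sequence of strategies used by an Or-gate. For every
gate $i \in$ Or, every bit-string $B \in \{0,1\}^n$, and every $m \ge 1$ we define:
\[
\sigma^B_m(o^j_i) = \begin{cases}
s^j & \text{if } m = 1,\\
r^j & \text{if } m > 1 \text{ and } m \le d(i) + 2,\\
r^j & \text{if } m > d(i) + 2 \text{ and } \mathrm{Eval}(B, I_1(i)) = 0 \text{ and } \mathrm{Eval}(B, I_2(i)) = 0,\\
\mathrm{InputState}(i, j, 1) & \text{if } m > d(i) + 2 \text{ and } \mathrm{Eval}(B, I_1(i)) = 1 \text{ and } \mathrm{Eval}(B, I_2(i)) = 0,\\
\mathrm{InputState}(i, j, 2) & \text{if } m > d(i) + 2 \text{ and } \mathrm{Eval}(B, I_1(i)) = 0 \text{ and } \mathrm{Eval}(B, I_2(i)) = 1,\\
\mathrm{OrNext}(i, B, m) & \text{if } m > d(i) + 2 \text{ and } \mathrm{Eval}(B, I_1(i)) = 1 \text{ and } \mathrm{Eval}(B, I_2(i)) = 1.
\end{cases}
\tag{4.1}
\]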
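The six cases of (4.1) can be sketched as a short Python function. This is an illustration only, under assumptions of our own: the two input values Eval(B, I1(i)) and Eval(B, I2(i)) are passed in directly as 0 or 1, successors are symbolic strings, and the parameter or_next is a hypothetical stand-in for OrNext(i, B, m), whose real value depends on valuations that are not modelled here.

```python
def sigma_or(m, depth, in1, in2, or_next="input1"):
    """Successor of the Or-gate state o^j_i in iteration m.

    depth is d(i); in1 and in2 are the values of the two input gates (0 or 1);
    or_next is a placeholder for OrNext(i, B, m).
    """
    if m == 1:
        return "s"
    # Before evaluation, or when both inputs are false, the state stays at r^j.
    if m <= depth + 2 or (in1 == 0 and in2 == 0):
        return "r"
    if in1 == 1 and in2 == 0:
        return "input1"
    if in1 == 0 and in2 == 1:
        return "input2"
    return or_next
```

For a gate of depth 3, the state remains at r^j up to and including iteration 5, and the choice made in iteration 6 reveals the output of the gate.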
Not gates. There are two components of the Not-gate gadget: the modified deceleration
lane and the state dji . We begin by considering the modified deceleration lane.
We first define a strategy for the case where the gate evaluates to false. In this case,
the input gate evaluates to true, which causes the modified deceleration lane to continue
switching after iteration d(i) + 2. We formalise this in the following definition, which is
almost identical to the definition given for the deceleration lane used in the clock. For each
i ∈ Not, each l in the range 1 ≤ l ≤ 2k + 4n + 6 with l ≠ d(i), each j ∈ {0, 1}, and each
m ≥ 1, we define:
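\[
\sigma_m(t^j_{i,0}) = \begin{cases}
s^j & \text{if } m = 1,\\
r^j & \text{if } m > 1.
\end{cases}
\tag{4.2}
\]
\[
\sigma_m(t^j_{i,l}) = \begin{cases}
s^j & \text{if } m = 1,\\
r^j & \text{if } m > 1 \text{ and } m \le l + 1,\\
t^j_{i,l-1} & \text{if } m > l + 1.
\end{cases}
\tag{4.3}
\]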
On the other hand, if the gate evaluates to true, then the deceleration lane stops
switching. This is formalised in the following definition, which uses the previous definition
to give the actual strategy used by the modified deceleration lane. For each i ∈ Not, each
$B \in \{0,1\}^n$, each l in the range 0 ≤ l ≤ 2k + 4n + 6 with l ≠ d(i), each j ∈ {0, 1}, and each
m ≥ 1, we define:
\[
\sigma^B_m(t^j_{i,l}) = \begin{cases}
\sigma_m(t^j_{i,l}) & \text{if } l \le d(i) + 1, \text{ or } l > d(i) + 1 \text{ and } \mathrm{Eval}(B, I(i)) = 1,\\
s^j & \text{if } l > d(i) + 1 \text{ and } m = 1 \text{ and } \mathrm{Eval}(B, I(i)) = 0,\\
r^j & \text{if } l > d(i) + 1 \text{ and } m > 1 \text{ and } \mathrm{Eval}(B, I(i)) = 0.
\end{cases}
\]
We now turn our attention to the state dji , where we again begin by considering the
case where the gate evaluates to false. In this case, the state dji continues switching to
the modified deceleration lane. This is formalised in the following definition, which is
almost identical to the definition given for the corresponding states in the clock. For all
i ∈ Not ∪ Input/Output, all B ∈ {0, 1}n, for all m ≥ 1, and all j ∈ {0, 1}, we define:
\[
\sigma_m(d^j_i) = \begin{cases}
s^j & \text{if } m = 1,\\
r^j & \text{if } m = 2,\\
a^j_{i,2k+4n+6} & \text{if } m = 3,\\
a^j_{i,m-3} & \text{if } 4 \le m \le 2k + 4n + 6 + 3,\\
e^j_i & \text{if } 2k + 4n + 6 + 3 < m.
\end{cases}
\]
On the other hand, if the gate evaluates to true, then after iteration d(i)+2, the state dji
switches to eji . This is formalised in the following definition, where the previous definition is
used in order to give the actual sequence of strategies for the state dji . For all B ∈ {0, 1}n,
all i ∈ Not, all j ∈ {0, 1}, and all m ≥ 1 we define:
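\[
\sigma^B_m(d^j_i) = \begin{cases}
\sigma_m(d^j_i) & \text{if } m \le d(i) + 2, \text{ or } m > d(i) + 2 \text{ and } \mathrm{Eval}(B, I(i)) = 1,\\
e^j_i & \text{if } m > d(i) + 2 \text{ and } \mathrm{Eval}(B, I(i)) = 0.
\end{cases}
\tag{4.4}
\]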
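The interaction between the default lane-following behaviour and the evaluation of the gate in (4.4) can be sketched in Python. This is an illustrative sketch under our own assumptions: the value Eval(B, I(i)) of the input gate is passed in directly as input_val, and successors are symbolic strings.

```python
def sigma_d(m, k, n):
    """Default successor of the Not-gate state d^j_i in iteration m."""
    last = 2 * k + 4 * n + 6  # length of the modified deceleration lane
    if m == 1:
        return "s"
    if m == 2:
        return "r"
    if m == 3:
        return f"a{last}"
    if m <= last + 3:
        return f"a{m - 3}"
    return "e"


def sigma_d_B(m, depth, input_val, k, n):
    """Successor of d^j_i in iteration m, where depth is d(i) and input_val is Eval(B, I(i))."""
    # A false input means that the Not-gate outputs true, so d^j_i jumps to e^j_i
    # as soon as the gate has been evaluated.
    if m > depth + 2 and input_val == 0:
        return "e"
    return sigma_d(m, k, n)
```

With k = 0, n = 1 and a gate of depth 2, the two possible inputs produce the same behaviour up to iteration 4 and diverge in iteration 5.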
Input/output gates. We now describe the sequence of strategies used in the input/output
gates. These strategies are almost identical to the strategies that would be used in a
Not-gate with depth d(C) + 1, but with a few key differences. Firstly, whereas the Not-gates
used edges to rj and sj , these have instead been replaced with the edges to yj and zj from
the circuit movers. Secondly, the circuit movers cause a one iteration delay at the start
of the sequence. Note, however, that despite this delay, the input/output gates are still
evaluated on iteration d(C) + 3.
We begin by giving the strategies for the modified deceleration lane used in the
input/output gates. For each i ∈ Input/Output, each l in the range 1 ≤ l ≤ 2k + 4n + 6 with
l ≠ d(C), each j ∈ {0, 1}, and each m ≥ 1, we define:
\[
\sigma_m(t^j_{i,l}) = \begin{cases}
z^j & \text{if } m = 2,\\
y^j & \text{if } m = 1 \text{ or } m > 1 \text{ and } m \le l + 2,\\
t^j_{i,l-1} & \text{if } m > l + 2.
\end{cases}
\]
\[
\sigma_m(t^j_{i,0}) = \begin{cases}
z^j & \text{if } m = 2,\\
y^j & \text{if } m = 1 \text{ or } m > 2.
\end{cases}
\]
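Then, for each i ∈ Input/Output, each $B \in \{0,1\}^n$, each l in the range 0 ≤ l ≤ 2k + 4n + 6
with l ≠ d(C), each j ∈ {0, 1}, and each m ≥ 1, we define:
\[
\sigma^B_m(t^j_{i,l}) = \begin{cases}
\sigma_m(t^j_{i,l}) & \text{if } l < d(C) + 3, \text{ or } l \ge d(C) + 3 \text{ and } B_i = 0,\\
z^j & \text{if } l > d(C) + 3 \text{ and } m = 2 \text{ and } B_i = 1,\\
y^j & \text{if } l > d(C) + 3 \text{ and either } m = 1 \text{ or } m > 2 \text{ and } B_i = 1.
\end{cases}
\]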
Finally, we give the strategy for the state dji . We reuse the strategy σm−1 from the Not-gate
definitions, but with a one iteration delay. For all B ∈ {0, 1}n, all i ∈ Input/Output, all
j ∈ {0, 1}, and all m > 1 we define:
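\[
\sigma^B_m(d^j_i) = \begin{cases}
\sigma_{m-1}(d^j_i) & \text{if } 1 < m \le d(C) + 3, \text{ or } m > d(C) + 3 \text{ and } B_i = 0,\\
e^j_i & \text{if } m > d(i) + 3 \text{ and } B_i = 1.
\end{cases}
\tag{4.5}
\]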
The circuit mover states. Finally, we describe the sequence of strategies used in the
states that move the input/output gates between the circuits. These strategies do not
depend on the current input bit-string to the circuit. Instead, they depend on the state of
both of the clocks, and are parameterized by the value of the delay function that we defined
earlier.
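Formally, for every m ≥ 1, every i ∈ Input/Output, and every clock bit-string
$K \in \{0,1\}^n$ we define:
\[
\sigma^K_m(z^j) = \begin{cases}
s^j & \text{if } m = 1,\\
r^j & \text{if } m > 1.
\end{cases}
\tag{4.6}
\]
\[
\sigma^K_m(y^j) = \begin{cases}
r^{1-j} & \text{if } m = 1 \text{ or } m \ge \mathrm{Delay}(j, K) + 1,\\
r^j & \text{if } m > 1 \text{ and } m < \mathrm{Delay}(j, K) + 1.
\end{cases}
\tag{4.7}
\]
\[
\sigma^K_m(p^j_i) = \begin{cases}
p^j_{i,1} & \text{if } m = 1 \text{ or } m \ge \mathrm{Delay}(j, K) + 1,\\
o^j_{I(i)} & \text{if } 1 < m \le \mathrm{Delay}(j, K) + 1.
\end{cases}
\tag{4.8}
\]
\[
\sigma^K_m(h^j_{i,0}) = \begin{cases}
h^j_{i,2} & \text{if } m = 1 \text{ or } m \ge \mathrm{Delay}(j, K) + 1,\\
h^j_{i,1} & \text{if } 1 < m < \mathrm{Delay}(j, K) + 1.
\end{cases}
\tag{4.9}
\]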
Putting it all together. We can now define a combined sequence of strategies for the
entire construction. We will define a sequence of strategies
$\chi^{B,K,j}_1, \chi^{B,K,j}_2, \dots$, which
describes a computation in circuit j under the following conditions:
• The clock for circuit j currently stores K in its binary counter.
• The input to circuit j is B.
Before stating the strategies, we first define some necessary notation. For every clock
bit-string $K \in \{0,1\}^n$, and every j ∈ {0, 1} we define:
\[
\mathrm{OC}(K, j) = \begin{cases}
K - 1 & \text{if } j = 0,\\
K & \text{if } j = 1.
\end{cases}
\]
This gives the bit-string used in the other clock, when circuit j is computing. Since clock 0
is ahead of clock 1, we have that OC(K, 0) is the bit-string before K, while OC(K, 1) is the
same as K.
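We can now define the sequence. For each bit-string $B \in \{0,1\}^n$, each bit-string
$K \in \{0,1\}^n$, each m ≥ 1, and every vertex v we define:
\[
\chi^{B,K,j}_m(v) = \begin{cases}
\kappa^K_m(v) & \text{if } v \text{ is in clock } j,\\
\kappa^{\mathrm{OC}(K,j)}_{m+\mathrm{Delay}(1-j,\mathrm{OC}(K,j))}(v) & \text{if } v \text{ is in clock } 1 - j,\\
\sigma^B_m(v) & \text{if } v \text{ is in a Not or Or gate in circuit } j,\\
\sigma^{F(B)}_m(v) & \text{if } v \text{ is in an input/output gate in circuit } j,\\
\sigma^B_{m+\mathrm{Delay}(1-j,\mathrm{OC}(K,j))}(v) & \text{if } v \text{ is in an input/output gate in circuit } 1 - j,\\
\sigma^K_m(v) & \text{if } v \text{ is a circuit mover state in circuit } j,\\
\sigma^{\mathrm{OC}(K,j)}_{m+\mathrm{Delay}(1-j,\mathrm{OC}(K,j))}(v) & \text{if } v \text{ is a circuit mover state in circuit } 1 - j.
\end{cases}
\]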
The first two cases of this definition deal with the clocks: the clock in circuit j follows the
sequence for bit-string K, while the clock in circuit 1 − j continues to follow the sequence
for bit-string OC(K, j). Observe that the clock for circuit 1 − j has already been running
for Delay(1 − j,OC(K, j)) iterations, so the strategies for this clock start on iteration 1 +
Delay(1−j,OC(K, j)). The next two cases deal with the gate gadgets in circuit j: the Not
and Or gates follow the sequence for bit-string B, and then the input/output gates for
circuit j, which are in output mode, store F (B). The next case deals with the input/output
gates in circuit 1 − j, which are in input mode, and so follow the strategy for bit-string B. The
final two cases deal with the circuit mover states, which follow the strategies for the clock
bit-string used in their respective clocks. Observe that no strategy is specified for the gate
gadgets in circuit 1− j, because the strategy chosen here is irrelevant.
For technical convenience, we define:
\[
\chi^{B,K,0}_{\mathrm{Delay}(0,K)} = \chi^{F(B),K,1}_1
\]
\[
\chi^{B,K,1}_{\mathrm{Delay}(1,K)} = \chi^{F(B),K+1,0}_1
\]
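The two identities above chain the individual sequences together: circuit 0 computes on input B with clock value K, circuit 1 then computes on F(B) with the same clock value, and circuit 0 resumes with clock value K + 1. The following Python sketch lists these phases. It is an illustration under our own assumptions: clock bit-strings are represented by the integers that they encode, and `flip` is a toy stand-in for the circuit function F.

```python
def other_clock(K, j):
    """OC(K, j): the value stored in the other clock while circuit j computes."""
    return K - 1 if j == 0 else K


def phases(F, B, K, count):
    """List the first `count` phases (input, clock value, circuit), starting in circuit 0."""
    result = []
    j = 0
    for _ in range(count):
        result.append((B, K, j))
        B = F(B)      # the output of circuit j becomes the input of circuit 1 - j
        if j == 1:
            K += 1    # once circuit 1 has finished, circuit 0 runs with clock value K + 1
        j = 1 - j
    return result


def flip(bits):
    """Toy circuit function: negate every bit."""
    return tuple(1 - b for b in bits)
```

For example, `phases(flip, (0, 1), 3, 3)` yields the phases ((0, 1), 3, 0), ((1, 0), 3, 1) and ((0, 1), 4, 0), and in each phase `other_clock` gives the value held by the clock of the idle circuit.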
Using this definition, we can now state the main technical claim of the paper.
Lemma 4.2. Let $B \in \{0,1\}^n$ be a bit-string, let $K \in \{0,1\}^n$ be a bit-string such that
$K \ne (1, 1, \dots, 1)$, and let j ∈ {0, 1}. If greedy all-switches strategy improvement is applied
to $\chi^{B,K,0}_1$, then it will pass through the sequence:
\[
\chi^{B,K,j}_1, \chi^{B,K,j}_2, \dots, \chi^{B,K,j}_{\mathrm{Delay}(j,K)}.
\]
Unfortunately, the proof of this lemma is quite long, and the vast majority of it is presented
in the appendix. In Section 5, we give an overview of the proof, and describe how each of
the individual appendices fits into the overall proof.
Best responses. Recall that, for each strategy considered, strategy improvement computes
a best-response for the opponent. Now that we have defined the sequence of strategies, we
can also define the best-responses to these strategies. For each strategy $\chi^{B,K,j}_i$, we define
a strategy $\mu^{B,K,j}_i \in \Sigma_{\mathrm{Odd}}$ that is a best-response to $\chi^{B,K,j}_i$. We
will later prove that these strategies are indeed best-responses.
We begin by considering the vertices $e^j_i$ for each Not-gate i. Recall that these vertices
only pick the edge to $h^j_{i,0}$ in the case where they are forced to by $d^j_i$ selecting the edge to
$e^j_i$. As defined above, this only occurs in the case where the Not-gate evaluates to true.
Formally, for each bit-string $B \in \{0,1\}^n$, each bit-string $K \in \{0,1\}^n$, each m ≥ 1, and
every i ∈ Not we define:
\[
\mu^{B,K,j}_m(e^j_i) = \begin{cases}
h^j_{i,0} & \text{if } m > d(i) + 2 \text{ and } \mathrm{Eval}(B, I(i)) = 0,\\
d^j_i & \text{otherwise.}
\end{cases}
\]
For the input/output gadgets in circuit 1 − j, which will provide the input to circuit
j, the situation is the same. The vertex $e^{1-j}_i$ chooses the edge to $h^{1-j}_{i,0}$ if and only if
$B_i$ is 1. Formally, for each bit-string $B \in \{0,1\}^n$, each bit-string $K \in \{0,1\}^n$, each
m ≥ 1, and every vertex i ∈ Input/Output we define:
\[
\mu^{B,K,j}_m(e^{1-j}_i) = \begin{cases}
h^{1-j}_{i,0} & \text{if } B_i = 1,\\
d^{1-j}_i & \text{if } B_i = 0.
\end{cases}
\]
For the input/output gadgets in circuit j, the situation is largely the same as
for a Not-gate with depth d(C) + 1, and the edge chosen depends on F (B)i. However, one
difference is that we do not define a best-response for the case where m = 1, because the
input/output gadget does not reset until the second iteration, and our proof does not depend
on the best response chosen in iteration one. Formally, for each bit-string B ∈ {0, 1}n, each
bit-string K ∈ {0, 1}n, each m > 1, and every vertex i ∈ Input/Output we define:
\[
\mu^{B,K,j}_m(e^j_i) = \begin{cases}
h^j_{i,0} & \text{if } m > d(i) + 3 \text{ and } F(B)_i = 1,\\
d^j_i & \text{otherwise.}
\end{cases}
\]
Finally, we define the best responses for the vertices qj as follows. For each bit-string
B ∈ {0, 1}n, each bit-string K ∈ {0, 1}n, each m in the range 1 ≤ m ≤ Delay(j,K)−1, and
every vertex i ∈ Input/Output we define:
\[
\mu^{B,K,j}_m(q^j_{i,0}) = \begin{cases}
e^j_i & \text{if } m = 1,\\
q^j_{i,1} & \text{if } m > 1.
\end{cases}
\]
\[
\mu^{B,K,j}_m(q^{1-j}_{i,0}) = e^{1-j}_i.
\]
5. The Proof
In this section we give the proof for Lemma 4.2. Let B,K ∈ {0, 1}n be two bit-strings, let
j ∈ {0, 1}, and let m be in the range 1 ≤ m ≤ Delay(j,K)− 1. We must show that greedy
all-switches strategy improvement switches $\chi^{B,K,j}_m$ to $\chi^{B,K,j}_{m+1}$.
Let σ be a strategy that agrees with $\chi^{B,K,j}_m$. Since we are using the all-switches switching
rule, we can consider each vertex v independently, and must show that the most appealing
outgoing edge at v is the one specified by $\chi^{B,K,j}_{m+1}$ (in our construction there will always be
exactly one most appealing edge, so we do not care how ties are broken by the switching
rule). Hence, the majority of the proof boils down to calculating the valuation of each
outgoing edge of v, and then comparing these valuations.
To compare the valuation of two outgoing edges (v, u) and (v, w), we usually use the
following technique. First we consider the two paths π1 and π2 that start at u and w,
respectively, and follow σ and Br(σ). Then we find the first vertex v′ that is contained in
both paths. Since the ⊑ relation only cares about the maximum difference between the two
paths, all priorities that are visited after v′ are irrelevant, since they appear in both Valσ(u)
and Valσ(w). On the other hand, since each priority is assigned to at most one vertex, all of
the priorities visited by π1 before reaching v′ are contained in Valσ(u) and not contained in
Valσ(w), and all of the priorities visited by π2 before reaching v′ are contained in Valσ(w)
and not contained in Valσ(u). So it suffices to find the largest priority on the prefix of π1
before v′ and the prefix of π2 before v′. The parity of this priority then determines whether
Valσ(u) ⊑ Valσ(w) according to the rules laid out in the definition of ⊑.
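The comparison technique can be sketched in a few lines of Python. The sketch is an illustration under an assumption that we state explicitly: valuations are represented as sets of priorities, and the strict order is decided by the largest priority d in the symmetric difference, with Valσ(u) strictly below Valσ(w) exactly when d is even and belongs to Valσ(w), or d is odd and belongs to Valσ(u). This is our reading of the usual rule for one-sink games; the formal definition of ⊑ is given earlier in the paper and takes precedence.

```python
def strictly_less(val_u, val_w):
    """Decide whether val_u is strictly below val_w (assumed comparison rule).

    val_u and val_w are sets of priorities. Shared priorities, such as those
    visited after the first common vertex v', cancel in the symmetric difference.
    """
    diff = set(val_u) ^ set(val_w)
    if not diff:
        return False
    d = max(diff)
    return (d % 2 == 0 and d in val_w) or (d % 2 == 1 and d in val_u)
```

For instance, with Valσ(u) = {1, 3, 7} and Valσ(w) = {1, 4}, the shared priority 1 is ignored, the largest differing priority is the odd priority 7 on the side of u, and so the edge to w is the more appealing one for player Even.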
We now give an outline of the proof.
• The fact that the two clocks switch through their respective strategies follows from
Lemma 4.1.
• The difference in valuation between the states rj and sj of the clock is the driving
force of the construction. In Appendix A, we give two lemmas that formalize this
difference.
• Next, in Appendix B, we prove that the best-response strategies defined in Section 4
are in fact the best responses. That is, we show that $\mu^{B,K,j}_m$ is a best response to
every strategy σ that agrees with $\chi^{B,K,j}_m$.
• In Appendix B.4, we give two key lemmas that describe the valuations of the output
states $o^j_i$. These lemmas show three important properties. Firstly, if m ≤ d(i) + 2,
then the valuation of $o^j_i$ is low, so there is no incentive to switch to $o^j_i$ before gate i is
evaluated. Secondly, if m > d(i) + 2 and the gate evaluates to 0, then the valuation
of $o^j_i$ remains low. Finally, if m > d(i) + 2 and the gate evaluates to 1, then the
valuation of $o^j_i$ is high. These final two properties allow the gates with depth strictly
greater than i to compute their outputs correctly.
• The rest of the proof consists of proving that all vertices switch to the correct
outgoing edge. The states $o^j_i$ in the Or gate gadgets are dealt with in Appendix C.
The states $t^j_{i,l}$ in the Not gates are dealt with in Appendix D. The states $d^j_i$ in the
Not gates are dealt with in Appendix E. The states $z^j$ and $z^{1-j}$ are dealt with
in Appendix F, and the states $y^j$ and $y^{1-j}$ are dealt with in Appendix G. The
states $p^j$ and $p^{1-j}$ are dealt with in Appendix H and the states $h^j_{i,0}$ are dealt with
in Appendix I. Finally, the states in the Input/Output gates, which behave in a
largely identical way to the Not gates, are dealt with in Appendix J.
All of the above combines to provide a proof for Lemma 4.2.
Having shown this Lemma, we can now give the reduction from BitSwitch to EdgeSwitch.
Given a circuit iteration instance (F, B, z), we produce the parity game G corresponding
to F, we use $\chi^{B,1,0}_2$ as the initial strategy, and if i ∈ Input/Output is the input/output
gate corresponding to index z, then we will monitor whether the edge from $d^0_i$ to $e^0_i$ is ever
switched by greedy all-switches strategy improvement. We therefore produce the instance
EdgeSwitch(G, $(d^0_i, e^0_i)$, $\chi^{B,1,0}_2$). We must take care to ensure that $\chi^{B,1,0}_2$ is a
terminating strategy, which is proved in the following Lemma.
Lemma 5.1. We have that $\chi^{B,1,0}_2$ is a terminating strategy.
Proof. Firstly, since we use the same strategies as Friedmann in the clock, we do not need
to prove that these portions of the strategies are terminating, because this has already
been shown by Friedmann. In particular, this implies that all paths starting at $s^j$ and $r^j$
for j ∈ {0, 1} will eventually arrive at the sink x. Therefore, it is sufficient to show that the
first component of $\mathrm{Val}^{\chi^{B,1,0}_2}_{\mathrm{VJ}}(v)$ is 1 for every vertex v in the circuits.
Observe that, in the strategy $\chi^{B,1,0}_2$, we have that the only possible cycles that player
Odd can form in the best response are two-vertex cycles of the form $d^j_i$ and $e^j_i$, but these
cycles have an even priority, so the best response cannot choose them. In particular, it is
not possible to form a cycle that passes through the input/output gadgets in both circuits,
because the path that starts at $o^0_i$ for every input/output gate i must eventually arrive at
$s^j$. Thus, we have that all paths starting at all vertices in the circuits that follow
$\chi^{B,1,0}_2$ and its best response will eventually arrive at the sink x.
Now all that remains is to argue that EdgeSwitch(G, $(d^0_i, e^0_i)$, $\chi^{B,1,0}_2$) is true if and
only if BitSwitch(F, B, z) is true. To do this, we simply observe that the sequence of
strategies used in Lemma 4.2 only ever specifies that $d^0_i$ must be switched to $e^0_i$ in the case
where there is some even j such that $F^j(B)_z = 1$. At all other times, the vertex $d^0_i$ chooses
an edge other than the edge to $e^0_i$. Hence, the reduction is correct, and we have shown Theorem 1.2.
6. Other Algorithms
Other strategy improvement algorithms. As we mentioned in the introduction, The-
orem 1.2 implies several results about other algorithms. In particular, discounted games
and simple-stochastic games both have natural strategy improvement algorithms given by
Puri [36], and Condon [4], respectively. Friedmann showed that if you take a one-sink parity
game, and apply the natural reduction from parity games to either discounted or simple
stochastic games, then the greedy variants of Puri’s and Condon’s algorithms will switch
exactly the same edges as the algorithm of Vöge and Jurdziński [15, Corollary 9.10 and
Lemma 9.12]. Hence, Theorem 1.2 also implies the discounted and simple-stochastic cases
of Corollary 1.3.
One case that was missed by Friedmann was mean-payoff games. There is a natural
strategy improvement algorithm [14] for mean-payoff games that adopts the well-known
gain-bias formulation from average-reward MDPs [37]. In this algorithm, the valuation
has two components: the gain of a vertex gives the long-term average-reward that can be
obtained from that vertex under the current strategy, and the bias measures the short term
deviation from the long-term average.
We argue that, if we apply the standard reduction from parity games to mean-payoff
games, and then set the reward of x to 0, then the gain-bias algorithm for mean-payoff games
switches exactly the same edges as the algorithm of Vöge and Jurdziński. The standard
reduction from parity games to mean-payoff games [26,36] replaces each priority p with the
weight $(-m)^p$, where m denotes the number of vertices in the parity game. By setting the
weight of x to 0, we ensure that the long-term average reward from each state is 0. Previous
work has observed [12] that if the gain is 0 at every vertex, then the bias represents the
total reward that can be obtained from each state. It is not difficult to prove that, after the
standard reduction has been applied, the total reward that can be obtained from a vertex
v is larger than the total reward from a vertex u if and only if Valσ(u) ⊏ Valσ(v) in the
original parity game. This is because the rewards assigned by the standard reduction grow
quickly enough so that only the largest priority visited matters. Hence, we also have the
mean-payoff case of Corollary 1.3.
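The claim that the weights grow quickly enough can be checked exhaustively on small instances. The following Python sketch is an illustration under assumptions of our own: valuations are sets of distinct priorities, the strict order is decided by the largest priority in the symmetric difference (even on the larger side, or odd on the smaller side), and the helper names are hypothetical.

```python
from itertools import combinations


def weight(priority, m):
    """Weight assigned to a priority by the reduction, for a game with m vertices."""
    return (-m) ** priority


def total_reward(priorities, m):
    """Total reward of a path that visits each of the given priorities once."""
    return sum(weight(p, m) for p in priorities)


def strictly_less(val_u, val_w):
    """Assumed parity-game order on sets of distinct priorities."""
    diff = set(val_u) ^ set(val_w)
    if not diff:
        return False
    d = max(diff)
    return (d % 2 == 0 and d in val_w) or (d % 2 == 1 and d in val_u)


def orders_agree(m, max_priority):
    """Check that the two orders coincide on every pair of subsets of {1, ..., max_priority}."""
    priorities = range(1, max_priority + 1)
    subsets = [frozenset(c) for size in range(max_priority + 1)
               for c in combinations(priorities, size)]
    return all((total_reward(u, m) < total_reward(w, m)) == strictly_less(u, w)
               for u in subsets for w in subsets)
```

The check succeeds because, for m ≥ 2, the absolute value of $(-m)^d$ exceeds the sum of the absolute values of all weights of smaller priorities, so the largest differing priority alone fixes the sign of the difference in total reward.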
Björklund and Vorobyov have also devised a strategy improvement algorithm for mean-
payoff games. Their algorithm involves adding an extra sink vertex, and then adding
edges from every vertex of the maximizing player to the sink. Their valuations are also
the total reward obtained before reaching the sink. We cannot show a similar result for
their algorithm, but we can show a result for a variant of their algorithm that only gives
additional edges to a subset of the vertices of the maximizing player. To do this, we do
the same reduction as we did for the gain-bias algorithm, and then we only add an edge
from x to the new sink added by Björklund and Vorobyov. The same reasoning as above
then implies that this variant of the Björklund-Vorobyov algorithm will make the same
switches as the Vöge-Jurdziński algorithm.
Unique sink orientations. As mentioned in the introduction, there is a relationship be-
tween strategy improvement algorithms and sink-finding algorithms for unique sink orienta-
tions. Our result already implies a similar lower bound for sink-finding algorithms applied
to unique sink orientations. However, since the vertices in our parity game have more than
two outgoing edges, these results only hold for unique sink orientations of grids. The more
commonly studied model is unique sink orientations of hypercubes, which correspond to
binary parity games, where each vertex has at most two outgoing edges. We argue that our
construction can be formulated as a binary parity game.
Figure 12. The binary Or gate.
Friedmann has already shown that his construction can be formulated as a binary parity
game [15], so we already have that the clocks can be transformed so that they are binary.
Furthermore, since our Not-gates and our Input/Output-gates are taken directly from
Friedmann’s bit gadget, we can apply Friedmann’s reduction to make these binary. In
particular, note that all of the extra states that we add to the input/output gate are binary,
so these states do not need any modification.
The only remaining part of the construction is the Or-gate, which has four outgoing
edges. We replace the existing gadget with a modified gadget, shown in Figure 12.
Vertex | Conditions | Edges | Priority | Player
$o^j_i$ | j ∈ {0, 1}, i ∈ Or | $s^j$, $o^j_{i,1}$ | P(4, i, 0, j, 1) | Even
$o^j_{i,1}$ | j ∈ {0, 1}, i ∈ Or | $r^j$, $o^j_{i,2}$ | P(4, i, 1, j, 1) | Even
$o^j_{i,2}$ | j ∈ {0, 1}, i ∈ Or | InputState(i, j, 1), InputState(i, j, 2) | P(4, i, 2, j, 1) | Even
This gadget replaces the single vertex of the original Or-gate with three binary vertices.
The only significant difference that this gadget makes to the construction is that now it can
take up to two strategy improvement iterations for the Or-gate to compute its output.
This is because we may have to wait for $o^j_{i,2}$ to switch before $o^j_{i,1}$ can switch. The vertex
$o^j_i$ always chooses the edge to $o^j_{i,1}$ during the computation, because the valuation of $r^j$
is larger than the valuation of $s^j$.
To deal with this, we can redesign the construction so that each Not-gate i is computed
on iteration 2i rather than iteration i, and each Or-gate is computed before iteration 2i.
This involves making the following changes:
• The length of the deceleration lane in the two clocks must be extended by 2k, to
account for the 2k extra iterations it takes for the circuits to compute (k extra
iterations for circuit 0 and k extra iterations for circuit 1). Moreover, the delays for
both of the clocks must be increased by k.
• For the same reason, the length of the modified deceleration lanes in the Not and
Input/Output gates must be increased by 2k.
• Finally, the edge to $o^j_{\mathrm{InputState}(i,j)}$ must be moved from $t_{i,d(i)}$ to $t_{i,2d(i)}$.
Once these changes have been made, we have produced a binary parity game.
One final thing we must be aware of is that we only get a unique-sink orientation if there
is never a tie between the valuation of two vertices. This, however, always holds in a one-
sink game where every vertex has a distinct priority, because all paths necessarily contain
a distinct set of priorities, which prevents ties in the ⊑ ordering. Therefore, we have the
PSPACE-completeness result for the BottomAntipodal algorithm claimed in Corollary 1.7.
7. The optimal strategy result
We are also able to prove a result about the complexity of determining which optimal
strategy is found by the Vöge-Jurdziński algorithm. However, we cannot formulate this
in the context of a one-sink game, because any result of this nature must exploit ties in
valuations. In a one-sink game, since every vertex has a different priority, no two paths can
have the same set of priorities, so ties are not possible. Hence, for a one-sink game, there
will be a unique optimal strategy, and so the complexity of finding it can be no harder than
solving the parity game itself, and this problem is not PSPACE-complete unless PSPACE =
UP ∩ coUP.
On the other hand, ties in valuations are possible in the original Vöge-Jurdziński
algorithm. This is because the first component of their valuation is not necessarily 1, and so the
second component does not necessarily contain every priority along the relevant path (recall
that priorities smaller than the first component are not included in the second component).
These facts mean that it is possible to construct parity games that have multiple optimal
strategies under the Vöge-Jurdziński valuation.
Our modified construction. We will use a slight modification of our construction to show
that computing the optimal strategy found by the Vöge-Jurdziński algorithm is PSPACE-
complete. The key difference is the addition of a third clock with n + 1 bits, which will
be indexed by 2. We remove a single edge from this clock: the edge from $e^2_{n+1}$ to $h^2_{n+1}$ is
removed.
Recall that in the clock construction, the odd vertices $e^j_i$ do not use the edge to $h^j_i$
unless they are forced to by the vertex $d^j_i$ selecting the edge to $e^j_i$. Hence, until the (n+1)th
bit is flipped, the third clock behaves like any other clock. When the (n+1)th bit is flipped,
after $2^n$ iterations have taken place, a new cycle is formed with a very large even priority.
We also modify the edges that leave $d^1_z$. For each edge $e = (d^1_z, u)$ we do the following:
(1) We delete e.
(2) We introduce a new vertex $v_u$ owned by player Even. This vertex is assigned an insignificant priority that, in particular, is much smaller than the priorities assigned to $e^2_{n+1}$ and $d^2_{n+1}$.
(3) We add the edges $(d^1_z, v_u)$, $(v_u, u)$, and $(v_u, f^2_{n+1})$.
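The three steps above can be sketched as a small graph transformation. The Python code below is a minimal illustration under stated assumptions: edges are plain (source, target) pairs, the fresh vertices are named by tuples ("v", u), and priorities and vertex ownership are omitted entirely. The function name and this representation are hypothetical and are not taken from the construction itself.

```python
def reroute_through_fresh_vertices(edges, d, f):
    """Replace every edge (d, u) by a path d -> v_u -> u, and give each
    fresh vertex v_u an extra edge to f.

    `edges` is a set of (source, target) pairs. Fresh vertices are
    represented as tuples ("v", u), a naming chosen only for this sketch.
    """
    new_edges = set()
    for (src, dst) in edges:
        if src == d:
            v_u = ("v", dst)            # step (2): introduce v_u
            new_edges.add((d, v_u))     # step (3): edge (d, v_u)
            new_edges.add((v_u, dst))   # step (3): edge (v_u, u)
            new_edges.add((v_u, f))     # step (3): edge (v_u, f)
            # step (1): the original edge (d, u) is not copied
        else:
            new_edges.add((src, dst))   # all other edges are unchanged
    return new_edges


edges = {("d", "y"), ("d", "e"), ("e", "h")}
result = reroute_through_fresh_vertices(edges, "d", "f")
```

After the transformation every path from d to the target f that uses a fresh vertex has exactly two edges, which is the property used later in the argument for discounted games.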
The following table summarises the extra clock that we add to the construction, and the new outgoing edges from $d^1_z$. For ease of notation, we define $U = \{y^1, z^1, e^1_z\} \cup \{a^1_{z,l} : 1 \le l \le 2k + 4n + 6\}$ to be the set of targets of the original outgoing edges from $d^1_z$ that will now be replaced. Moreover, we assume that each vertex u is represented by a number in the range 1 ≤ u ≤ |U|, which will be used as part of the priority for $v_u$.
Vertex    Conditions            Edges                                                         Priority                      Player
$t^2_0$                         $r^2$, $s^2$                                                  P(2, 0, 2k + 4n + 4, 2, 0)    Even
$t^2_l$   1 ≤ l ≤ 2k + 4n + 6   $r^2$, $s^2$, $t^2_{l-1}$                                     P(2, 0, l, 2, 1)              Even
$a^2_l$   1 ≤ l ≤ 2k + 4n + 6   $t^2_l$                                                       P(2, 0, l + 1, 2, 0)          Even
$d^2_i$   1 ≤ i ≤ n             $e^2_i$, $s^2$, $r^2$, $a^2_l$ for 1 ≤ l ≤ 2k + 2n + 6 + 2i   P(1, i, 0, 2, 1)              Even
$e^2_i$   1 ≤ i ≤ n             $d_i$                                                         P(1, i, 1, 2, 0)              Odd
$g^2_i$   1 ≤ i ≤ n             $f^2_i$                                                       P(1, i, 2, 2, 1)              Even
$k^2_i$   1 ≤ i ≤ n             x, $g^2_l$ for i < l ≤ n                                      P(8, i, 0, 2, 1)              Even
$f^2_i$   1 ≤ i ≤ n             $e^2_i$                                                       P(8, i, 1, 2, 1)              Even
$h^2_i$   1 ≤ i ≤ n             $k^2_i$                                                       P(8, i, 2, 2, 0)              Even
$s^2$                           x, $f^2_l$ for 1 ≤ l ≤ n                                      P(7, 0, 0, 2, 0)              Even
$r^2$                           x, $g^2_l$ for 1 ≤ l ≤ n                                      P(7, 0, 1, 2, 0)              Even
$v_u$     u ∈ U                 u                                                             P(0, 0, 0, u, 0)              Even
$d^1_z$                         $v_u$ for all u ∈ U                                           P(4, i, 0, j, 1)              Even
PSPACE-completeness. We now argue how this modified construction provides a PSPACE-hardness proof for the optimal strategy decision problem. Before the (n+1)th bit of the third clock flips, the edge $(v_u, f^2_{n+1})$ is never switchable due to the large odd priority assigned to $f^2_{n+1}$, so this modification does not affect the computation of $F^{2^n}(B)$. On the other hand, once the (n+1)th bit of the third clock flips, all edges of the form $(v_u, f^2_{n+1})$ immediately become switchable, because the first component of the valuation of $f^2_{n+1}$ is now a large even priority, and not 1. So all of these edges will be switched simultaneously.
The key thing to note is that, since the priorities assigned to the vertices $v_u$ are insignificant, they do not appear in the second component of the valuation, and so vertex $d^1_z$ is now indifferent between all of its outgoing edges. Moreover, the vertices $v_u$ never switch away from $f^2_{n+1}$ for the following reasons:
• The only even cycle that can be forced by player Even is the one that uses $d^2_{n+1}$ and $e^2_{n+1}$. So, these vertices must select a strategy that reaches this cycle eventually.
• All priorities used in the circuits are smaller than the priority of the cycle between $d^2_{n+1}$ and $e^2_{n+1}$. So, the second component of the valuation function is irrelevant, and the only way of improving the strategy would be to find a shorter path to the cycle.
• The vertices $v_u$ are the only vertices that have edges to the third clock, so the only way a vertex $v_u$ could reach the third clock would be to travel through both circuits to reach $d^1_z$, and then use a different vertex $v_{u'}$, but this would be a much longer path, and therefore this would have a lower valuation.
Hence, the vertices $v_u$ will never switch away from $f^2_{n+1}$.
Observe that, after $2^n$ iterations, the input/output gadgets in circuit 1 − j store the value of $F^{2^n}(B)$, and therefore $d^1_z$ chooses the edge to $e^j_z$ if and only if the zth bit of $F^{2^n}(B)$ is 1. The above argument implies that $d^1_z$ does not switch again, so in the optimal strategy found by the algorithm, the vertex $d^1_z$ chooses the edge to $e^j_z$ if and only if the zth bit of $F^{2^n}(B)$ is 1. Thus, we have that computing the optimal strategy found by the Vöge-Jurdziński strategy improvement algorithm is PSPACE-complete, as claimed in Theorem 1.5.
Other games. We also get similar results for the gain-bias algorithm for mean-payoff games, and the standard strategy improvement algorithms for discounted and simple stochastic games. For the most part, we can still rely on the proof of Friedmann for these results. This is because, although we do not have a one-sink game, the game behaves as a one-sink game until the (n+1)th bit in the third clock is flipped. An easy way to see this is to reinstate the edge between $e^2_{n+1}$ and $h^2_{n+1}$ to create a one-sink game, and observe that, since the edge is not used in the best response until the (n+1)th bit is flipped, it cannot affect the sequence of strategies visited by strategy improvement. Once the (n+1)th bit has flipped, we only care about making $d^1_z$ indifferent between its outgoing edges, and in this section we explain how this is achieved.
For mean-payoff games, we apply the same reduction as in Section 6 to our altered construction. After doing this, we set the weight of the vertices $v_u$ to 0 to ensure that $d^1_z$ will be exactly indifferent between all of its outgoing edges once these vertices switch to $f^2_{n+1}$. This gives the result for the gain-bias algorithm in mean-payoff games.
For discounted games, once the standard reduction from mean-payoff to discounted games has been applied, the proof of Friedmann already implies that the discounted game algorithm makes the same decisions as the Vöge-Jurdziński algorithm for the vertices other than $d^1_z$. The only worry is that the discount factor may make the vertex $d^1_z$ not indifferent between some of its outgoing edges. However, it is enough to note that all paths from $d^1_z$ to $f^2_{n+1}$ have length 2, and therefore the vertex will be indifferent no matter what discount factor is chosen. This gives the result for the standard strategy improvement algorithm for discounted games.
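The role of the path length can be checked numerically. The Python sketch below is an illustration under assumptions that are not spelled out in the text: weights are attached to vertices, the intermediate vertices keep weight 0 after the reduction, and the weight w_d and the tail value are arbitrary hypothetical numbers. It compares a two-step route with a hypothetical three-step route to the same target.

```python
def discounted_value(path_weights, tail_value, discount):
    """Discounted value of following a finite path and then continuing
    from a vertex whose own discounted value is `tail_value`."""
    value = 0.0
    for step, weight in enumerate(path_weights):
        value += (discount ** step) * weight
    return value + (discount ** len(path_weights)) * tail_value


w_d = 3.0     # hypothetical weight on the starting vertex
tail = -7.5   # hypothetical discounted value at the common target

for discount in (0.5, 0.9, 0.999):
    # Every route of the form start -> (weight-0 vertex) -> target
    # has the same weights and the same length, hence the same value.
    two_step = discounted_value([w_d, 0.0], tail, discount)
    # A hypothetical longer route would be discounted differently.
    three_step = discounted_value([w_d, 0.0, 0.0], tail, discount)
    assert two_step != three_step
```

Since every two-step route evaluates the same expression, the choice of intermediate vertex cannot change the value, whatever discount factor is used; only a difference in length (or weight) could break the tie.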
Finally, after applying the standard reduction from discounted to simple stochastic games, the proof of Friedmann can be applied to argue that the valuations in the simple stochastic game are related to the valuations in the discounted game by a linear transformation. Hence, $d^1_z$ will still be indifferent between its outgoing edges after the (n+1)th bit is flipped. This gives the result for the standard strategy improvement algorithm for simple stochastic games. Thus, we have the claimed results from Corollary 1.6.
8. Open problems
Strategy improvement generalizes policy iteration, which solves mean-payoff and discounted-payoff Markov decision processes (MDPs) [37]. The exponential lower bounds for greedy all-switches have been extended to MDPs. Fearnley showed that the second player in Friedmann’s construction [15] can be simulated by a probabilistic action, and used this to show an exponential lower bound for the all-switches variant of policy iteration for average-reward MDPs [8]. This technique cannot be applied to the construction in this paper, because we use additional Odd player states (in particular the vertices $q^j_{i,1}$) that cannot be translated in this way. Can our PSPACE-hardness results be extended to all-switches strategy improvement for MDPs?
Also, there are other pivoting algorithms for parity games that deserve attention. It has
been shown that Lemke’s algorithm and the Cottle-Dantzig algorithm for the P-matrix lin-
ear complementarity problem (LCP) can be applied to parity, mean-payoff, and discounted
games [11, 29]. It would be interesting to come up with similar PSPACE-completeness re-
sults for these algorithms, which would also then apply to the more general P-matrix LCP
problem.
References
[1] I. Adler, C. H. Papadimitriou, and A. Rubinstein. On simplex pivoting rules and complexity theory. In
Proc. of IPCO, pages 13–24, 2014.
[2] C. S. Calude, S. Jain, B. Khoussainov, W. Li, and F. Stephan. Deciding parity games in quasipolynomial
time. In Proc. of STOC, pages 252–263, 2017.
[3] A. Condon. The complexity of stochastic games. Information and Computation, 96(2):203–224, 1992.
[4] A. Condon. On algorithms for simple stochastic games. In Advances in Computational Complexity
Theory, volume 13 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science,
pages 51–73. American Mathematical Society, 1993.
[5] Y. Disser and M. Skutella. The simplex algorithm is NP-mighty. In Proc. of SODA, pages 858–872,
2015.
[6] E. A. Emerson and C. S. Jutla. Tree automata, mu-calculus and determinacy. In Proc. of FOCS, pages
368–377, 1991.
[7] E. A. Emerson, C. S. Jutla, and A. P. Sistla. On model-checking for fragments of µ-calculus. In Proc.
of CAV, pages 385–396, 1993.
[8] J. Fearnley. Exponential lower bounds for policy iteration. In Proc. of ICALP, pages 551–562, 2010.
[9] J. Fearnley. Non-oblivious strategy improvement. In Proc. of LPAR, pages 212–230, 2010.
[10] J. Fearnley, S. Jain, S. Schewe, F. Stephan, and D. Wojtczak. An ordered approach to solving parity
games in quasi polynomial time and quasi linear space. In Proc. of SPIN, pages 112–121, 2017.
[11] J. Fearnley, M. Jurdziński, and R. Savani. Linear complementarity algorithms for infinite games. In
Proc. of SOFSEM, pages 382–393, 2010.
[12] J. Fearnley and R. Savani. The complexity of the simplex method. In Proc. of STOC, pages 201–208,
2015.
[13] J. Fearnley and R. Savani. The complexity of all-switches strategy improvement. In Proc. of SODA,
pages 130–139, 2016.
[14] J. Filar and K. Vrieze. Competitive Markov Decision Processes. Springer, 1997.
[15] O. Friedmann. An exponential lower bound for the latest deterministic strategy iteration algorithms.
Logical Methods in Computer Science, 7(3), 2011.
[16] O. Friedmann, T. D. Hansen, and U. Zwick. A subexponential lower bound for the random facet
algorithm for parity games. In Proc. of SODA, pages 202–216, 2011.
[17] O. Friedmann, T. D. Hansen, and U. Zwick. Subexponential lower bounds for randomized pivoting rules
for the simplex algorithm. In Proc. of STOC, pages 283–292, 2011.
[18] B. Gärtner, W. D. J. Morris, and L. Rüst. Unique sink orientations of grids. Algorithmica, 51(2):200–235,
2008.
[19] B. Gärtner and I. Schurr. Linear programming and unique sink orientations. In Proc. of SODA, pages
749–757, 2006.
[20] B. Gärtner and A. Thomas. The complexity of recognizing unique sink orientations. In Proc. of STACS,
pages 341–353, 2015.
[21] P. W. Goldberg, C. H. Papadimitriou, and R. Savani. The complexity of the homotopy method, equi-
librium selection, and Lemke-Howson solutions. ACM Trans. Economics and Comput., 1(2):9, 2013.
[22] E. Grädel, W. Thomas, and T. Wilke, editors. Automata, Logics, and Infinite Games. A Guide to
Current Research, volume 2500 of LNCS. Springer, 2002.
[23] T. D. Hansen, M. Paterson, and U. Zwick. Improved upper bounds for random-edge and random-jump
on abstract cubes. In Proc. of SODA, pages 874–881, 2014.
[24] A. J. Hoffman and R. M. Karp. On nonterminating stochastic games. Management Science, 12(5):359–
370, 1966.
[25] D. S. Johnson, C. H. Papadimitriou, and M. Yannakakis. How easy is local search? J. Comput. Syst.
Sci., 37(1):79–100, 1988.
[26] M. Jurdziński. Deciding the winner in parity games is in UP ∩ co-UP. Information Processing Letters,
68(3):119–124, 1998.
[27] M. Jurdziński and R. Lazić. Succinct progress measures for solving parity games. In Proc. of LICS,
2017.
[28] M. Jurdziński, M. Paterson, and U. Zwick. A deterministic subexponential algorithm for solving parity
games. In Proc. of SODA, pages 117–123, 2006.
[29] M. Jurdziński and R. Savani. A simple P-matrix linear complementarity problem for discounted games.
In Proc. of CiE, pages 283–293, 2008.
[30] G. Kalai. A subexponential randomized simplex algorithm. In Proc. of STOC, pages 475–482, 1992.
[31] W. Ludwig. A subexponential randomized algorithm for the simple stochastic game problem. Informa-
tion and Computation, 117(1):151–155, 1995.
[32] J. Matoušek, M. Sharir, and E. Welzl. A subexponential bound for linear programming. Algorithmica,
16(4–5):498–516, 1996.
[33] J. Matoušek and T. Szabó. Random edge can be exponential on abstract cubes. In Proc. of FOCS,
pages 92–100, 2004.
[34] A. W. Mostowski. Games with forbidden positions. Technical Report 78, University of Gdańsk, 1991.
[35] C. H. Papadimitriou. On the complexity of the parity argument and other inefficient proofs of existence.
Journal of Computer and System Sciences, 48(3):498–532, 1994.
[36] A. Puri. Theory of Hybrid Systems and Discrete Event Systems. PhD thesis, University of California,
Berkeley, 1995.
[37] M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley
& Sons, Inc. New York, NY, USA, 2005.
[38] S. Schewe. An optimal strategy improvement algorithm for solving parity and payoff games. In Proc.
of CSL, pages 369–384, 2008.
[39] I. Schurr and T. Szabó. Jumping doesn’t help in abstract cubes. In Proc. of IPCO, pages 225–235, 2005.
[40] S. Smale. Mathematical problems for the next century. The Mathematical Intelligencer, 20(2):7–15,
1998.
[41] C. Stirling. Local model checking games. In Proc. of CONCUR, pages 1–11, 1995.
[42] T. Szabó and E. Welzl. Unique sink orientations of cubes. In Proc. of FOCS, pages 547–555, 2001.
[43] J. Vöge and M. Jurdziński. A discrete strategy improvement algorithm for solving parity games. In
Proc. of CAV, pages 202–215, 2000.
[44] U. Zwick and M. S. Paterson. The complexity of mean payoff games on graphs. Theoretical Computer
Science, 158(1–2):343–359, 1996.
Appendix A. Facts about the clock
In this section we prove two important lemmas about the clocks. The first lemma shows an important property about the difference in valuation between $r^j$ and $s^j$ for the clock used by circuit j. The second lemma considers the difference in valuation across the two clocks, by comparing the valuations of $r^j$, $s^j$, $r^{1-j}$, and $s^{1-j}$.
Lemma A.1. Let σ be a strategy that agrees with $\kappa^K_m$ for some m in the range 1 ≤ m ≤ Length(K), some clock-value $K \in \{0, 1\}^n$, and some j ∈ {0, 1}. We have:
(1) If m = Length(K) − 1 then $\mathrm{Val}^\sigma(r^j) \sqsubset \mathrm{Val}^\sigma(s^j)$.
(2) If m < Length(K) − 1 then $\mathrm{Val}^\sigma(s^j) \sqsubset \mathrm{Val}^\sigma(r^j)$.
In both cases, we have that $\mathrm{MaxDiff}^\sigma(s^j, r^j) \ge P(7, 0, 0, 0, 0)$.
Proof. We begin with the first case. In this case, by definition, we have that the path that starts at $s^j$ and follows σ moves to $f^j_i$ for some i, whereas the path that starts at $r^j$ and follows σ moves to $g^j_{i'}$ for some $i' \neq i$. There are two possibilities.
(1) If i > i′, then since i is the least significant zero in K, we must have that i′ = NextBit(K, 0) = 1. Hence, the path that starts at $g^j_{i'}$ passes through the bit gadgets for all bits strictly smaller than i before eventually arriving at $g^j_{\mathrm{NextBit}(K,i)}$ (or x if NextBit(K, i) is not defined). In particular, since the path does not pass through $h^j_i$, the largest priority on the path is strictly smaller than P(8, i, 2, j, 0).
On the other hand, the path that starts at $f^j_i$ eventually arrives at $g^j_{\mathrm{NextBit}(K,i)}$ (or x if NextBit(K, i) is not defined) and it does pass through $h^j_i$. The largest priority on this path is P(8, i, 2, j, 0), and since the priority is even, we can conclude that $\mathrm{Val}^\sigma(r^j) \sqsubset \mathrm{Val}^\sigma(s^j)$.
(2) If i < i′, then since i′ is the least significant one in K, we must have that i = 1. Hence, the path that starts at $f^j_i$ eventually moves to $k^j_i$ and then directly to $g^j_{i'}$. The largest priority on this path is P(8, i′, 2, j, 0), and since this is even, we can conclude that $\mathrm{Val}^\sigma(r^j) \sqsubset \mathrm{Val}^\sigma(s^j)$.
We now move on to the second case. In this case, by definition, we have that the path that starts at $s^j$ moves to $f^j_i$ for some i, whereas the path that starts at $r^j$ moves to $g^j_i$ and then to $f^j_i$. Since the priority on $g^j_i$ is strictly smaller than P(7, 0, 1, j, 0), we have that $\mathrm{MaxDiff}^\sigma(r^j, s^j) = P(7, 0, 1, j, 0)$, which is the priority assigned to $r^j$. Since this priority is even, we have that $\mathrm{Val}^\sigma(s^j) \sqsubset \mathrm{Val}^\sigma(r^j)$, as required.
Observe that in all cases considered above, we have shown that $\mathrm{MaxDiff}^\sigma(r^j, s^j) \ge P(7, 0, 0, 0, 0)$ as required. Hence, we have completed the proof of this lemma.
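The pattern of reasoning used in this proof (find the largest priority on which two valuations differ, then look at its parity) can be sketched in Python. The sketch rests on assumptions: valuations are modelled as sets of plain integer priorities rather than the P(·) tuples of the construction, and the ordering implemented here is inferred from how MaxDiff is used in these proofs, not copied from the formal definition given earlier in the paper. The function names are hypothetical.

```python
def max_diff(val_a, val_b):
    """Largest priority in the symmetric difference of two valuations,
    or None when the valuations are equal."""
    difference = val_a ^ val_b
    return max(difference) if difference else None


def strictly_below(val_a, val_b):
    """True when val_a is strictly worse for player Even than val_b."""
    p = max_diff(val_a, val_b)
    if p is None:
        return False
    if p % 2 == 0:
        return p in val_b   # a large even priority favours the path holding it
    return p in val_a       # a large odd priority penalises the path holding it


val_r = {3, 5}   # hypothetical priorities seen from one vertex
val_s = {3, 8}   # hypothetical priorities seen from another vertex
r_below_s = strictly_below(val_r, val_s)
```

Here the largest differing priority is 8, it is even, and it belongs to the second valuation, so the first valuation is the smaller one, just as in the cases above where an even priority P(8, i, 2, j, 0) appears only on the path from one of the two vertices.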
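Lemma A.2. Let σ be a strategy that agrees with $\chi^{B,K,j}_m$ for some m ≥ 1, some bit-strings $B, K \in \{0, 1\}^n$, and some j ∈ {0, 1}. We have:
(1) If m = Delay(j, K) − 1, then $\mathrm{Val}^\sigma(r^{1-j}) \sqsubset \mathrm{Val}^\sigma(r^j) \sqsubset \mathrm{Val}^\sigma(s^{1-j})$ and $\mathrm{MaxDiff}^\sigma(r^{1-j}, s^j) \ge P(7, 0, 0, 0, 0)$ and $\mathrm{MaxDiff}^\sigma(r^{1-j}, r^j) \ge P(7, 0, 0, 0, 0)$.
(2) If m < Delay(j, K) − 1, then $\mathrm{Val}^\sigma(r^{1-j}) \sqsubset \mathrm{Val}^\sigma(s^j) \sqsubset \mathrm{Val}^\sigma(r^j)$ and $\mathrm{MaxDiff}^\sigma(r^{1-j}, s^j) \ge P(7, 0, 0, 0, 0)$ and $\mathrm{MaxDiff}^\sigma(s^j, r^{1-j}) \ge P(7, 0, 0, 0, 0)$.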
Proof. We begin with the second claim. The fact that $\mathrm{Val}^\sigma(s^j) \sqsubset \mathrm{Val}^\sigma(r^j)$ follows from part 2 of Lemma A.1, so it is sufficient to show that $\mathrm{Val}^\sigma(r^{1-j}) \sqsubset \mathrm{Val}^\sigma(s^j)$. There are two cases to consider, based on whether j = 0 or j = 1.
(1) If j = 0, then clock j uses bit-string K, and clock 1 − j uses bit-string K − 1. Observe that the clock strategies specify that the path starting at $s^j$ visits $h^j_i$ if and only if $K_i = 1$. Similarly, the path starting at $r^{1-j}$ visits $h^{1-j}_i$ if and only if $(K - 1)_i = 1$. If i′ is the index of the least significant 1 in K, then we have that the path that starts at $s^j$ visits $h^j_{i'}$ and $k^j_{i'}$, and the path that starts at $s^{1-j}$ does not visit these vertices. Moreover, these two paths are the same after this point. Hence, we have that $\mathrm{MaxDiff}^\sigma(s^j, r^{1-j})$ is P(8, i′, 2, j, 0), and since this priority is even, we can conclude that $\mathrm{Val}^\sigma(r^{1-j}) \sqsubset \mathrm{Val}^\sigma(s^j)$.
(2) If j = 1, then both clocks use bit-string K. Hence, the paths starting at $s^j$ and $r^{1-j}$ use the same path through their respective clocks. So, if i′ is the index of the most significant 1 in K, then we have that P(8, i′, 2, 0, 0) is the largest priority on the path starting at $r^{1-j} = r^0$, and P(8, i′, 2, 1, 0) is the largest priority on the path starting at $s^j = s^1$. Thus, $\mathrm{MaxDiff}^\sigma(s^j, r^{1-j}) = P(8, i', 2, 1, 0)$, and since this priority is even and contained in $\mathrm{Val}^\sigma(s^j)$ we can conclude that $\mathrm{Val}^\sigma(r^{1-j}) \sqsubset \mathrm{Val}^\sigma(s^j)$.
We now move on to the first claim. Here the same reasoning as we gave for the second case can be used to prove that $\mathrm{Val}^\sigma(r^{1-j}) \sqsubset \mathrm{Val}^\sigma(s^j)$, and therefore Lemma A.1 implies that $\mathrm{Val}^\sigma(r^{1-j}) \sqsubset \mathrm{Val}^\sigma(r^j)$. What remains is to prove that $\mathrm{Val}^\sigma(r^j) \sqsubset \mathrm{Val}^\sigma(s^{1-j})$. Again there are two cases to consider.
(1) If j = 0, then clock j uses bit-string K, and clock 1 − j is about to transition from bit-string K − 1 to bit-string K. In fact, the path from $s^{1-j}$ is already the path for bit-string K, so the proof from item 2 above can be reused.
(2) If j = 1, then clock j uses bit-string K, and clock 1 − j is about to transition from bit-string K to bit-string K + 1. In fact, the path from $s^{1-j}$ is already the path for bit-string K + 1, so the proof from item 1 above can be reused.
Finally, we observe that all of the maximum difference priorities used in the proof are strictly larger than P(7, 0, 0, 0, 0), which completes the proof.
Appendix B. Best responses
In this section, we prove that the best responses defined in Section 4 are indeed best responses to $\chi^{B,K,j}_m$. There are two types of odd vertices used in the construction: the vertices $e^j_i$ used in the Not and Input/Output gates, and the vertices $q^j_{i,0}$ used in the Input/Output gates. We begin by proving a general lemma concerning the vertices $e^j_i$ used in the Not and Input/Output gates.
Lemma B.1. Let σ be a strategy that agrees with $\chi^{B,K,j}_m$ for some bit-strings $B, K \in \{0, 1\}^n$, some m in the range 1 ≤ m ≤ Delay(j, K) − 1, and some j ∈ {0, 1}. For every i ∈ Not ∪ Input/Output, and every l ∈ {0, 1}, we have that if $\sigma(d^l_i) = e^l_i$, then $\mathrm{Br}(\sigma)(e^l_i) \neq d^l_i$.
Proof. Note that if player Odd uses the edge from $e^l_i$ to $d^l_i$, then this would create a cycle with largest priority P(1, i, 1, j, 0), which is even. Since the game is a one-sink game, and since the initial strategy is terminating, we have that Odd can eventually reach the odd cycle at $x^j$ from the vertices $r^j$, $r^{1-j}$, $s^j$, and $s^{1-j}$. Furthermore, Odd can reach one of these four vertices by moving to $h^l_i$. Since the odd cycle at $x^j$ has priority smaller than P(1, i, 1, j, 0), we can conclude that $\mathrm{Br}(\sigma)(e^l_i) \neq d^l_i$.
We now proceed to prove an individual lemma for each of the vertices that belong to player Odd. Each type of Odd vertex will be considered in a different subsection.
B.1. The vertices $q^l_{i,0}$. We now consider the vertices $q^l_{i,0}$ for i ∈ Input/Output. The first lemma considers the case where l = j, and the second lemma considers the case where l = 1 − j.
Lemma B.2. Let σ be a strategy that agrees with $\chi^{B,K,j}_m$ for some bit-strings $B, K \in \{0, 1\}^n$, some m in the range 1 ≤ m ≤ Delay(j, K) − 1, and some j ∈ {0, 1}. For every i ∈ Input/Output, we have that $\mathrm{Br}(\sigma)(q^j_{i,0}) = \mu^{B,K,j}_m(q^j_{i,0})$.
Proof. There are two cases to consider.
• If m = 1, then we must show that the edge to $e^j_i$ is chosen by Odd in the best response. Consider a strategy τ where $\tau(e^j_i) = h^j_{i,0}$, and $\tau(q^j_{i,0}) = e^j_i$. When τ is played against σ, the path that starts at $e^j_i$ eventually arrives at $r^{1-j}$, and the largest priority on this path is strictly smaller than P(6, d(C) + 2, 0, j, 0). On the other hand, taking the edge to $q^j_{i,1}$ leads directly to $r^{1-j}$ while visiting the priority P(6, d(C) + 2, 0, j, 0). Since this priority is even, we can conclude that Odd would prefer playing τ to using the edge from $q^j_{i,0}$ to $q^j_{i,1}$ in his best response. Therefore, we must have that $\mathrm{Br}(\sigma)(q^j_{i,0}) = e^j_i$, as required.
• If m > 1, then we must show that the edge to $q^j_{i,1}$ is the least appealing edge at $q^j_{i,0}$. Observe that the path that starts at $e^j_i$ and follows σ eventually arrives at $r^j$, and every priority on this path is strictly smaller than P(7, 0, 0, 0, 0). On the other hand, the path that starts at $q^j_{i,1}$ moves directly to $r^{1-j}$. Hence, we can apply Lemma A.2 (both parts) to argue that $\mathrm{Val}^\sigma(q^j_{i,1}) \sqsubset \mathrm{Val}^\sigma(e^j_i)$, as required.
Lemma B.3. Let σ be a strategy that agrees with $\chi^{B,K,j}_m$ for some bit-strings $B, K \in \{0, 1\}^n$, some m in the range 1 ≤ m ≤ Delay(j, K) − 1, and some j ∈ {0, 1}. For every i ∈ Input/Output, we have that $\mathrm{Br}(\sigma)(q^{1-j}_{i,0}) = \mu^{B,K,j}_m(q^{1-j}_{i,0})$.
Proof. We must show that the edge to $e^{1-j}_i$ is the least appealing edge at $q^{1-j}_{i,0}$. Observe that the path that starts at $e^{1-j}_i$ and follows σ will eventually arrive at either $r^j$ or $r^{1-j}$. In either case, the largest priority on this path will be strictly smaller than P(6, d(C) + 2, 0, j, 0). On the other hand, the path that starts at $q^{1-j}_{i,1}$ moves directly to $r^j$, and the largest priority on this path is P(6, d(C) + 2, 0, j, 0). Since this priority is even, we can conclude that $\mathrm{Val}^\sigma(e^{1-j}_i) \sqsubset \mathrm{Val}^\sigma(q^{1-j}_{i,0})$, as required.
B.2. The vertices $e^l_i$ in Not gates. The following lemma considers the vertices $e^l_i$ for l = j and i ∈ Not. We do not need to prove a lemma for the case where l = 1 − j and i ∈ Not, because these vertices are in the non-computing circuit, and we do not specify strategies for the Not and Or gadgets in the non-computing circuits.
Lemma B.4. Let σ be a strategy that agrees with $\chi^{B,K,j}_m$ for some bit-strings $B, K \in \{0, 1\}^n$, some m in the range 1 ≤ m ≤ Delay(j, K) − 1, and some j ∈ {0, 1}. For every i ∈ Not, we have that $\mathrm{Br}(\sigma)(e^j_i) = \mu^{B,K,j}_m(e^j_i)$.
Proof. There are three cases to consider.
(1) If m = 1, then the path that starts at $d^j_i$ and follows σ moves directly to $s^j$. On the other hand, the path that starts at $h^j_i$ moves directly to $r^j$. All of the priorities on these paths are strictly smaller than P(7, 0, 0, 0, 0), so we can apply Lemma A.1 part 2 to argue that $\mathrm{Val}^\sigma(d^j_i) \sqsubset \mathrm{Val}^\sigma(h^j_i)$, and therefore we have $\mathrm{Br}(\sigma)(e^j_i) = d^j_i$.
(2) If m > 1 and either m ≤ d(i) + 2, or m > d(i) + 2 and Eval(B, I(i)) = 1, then observe that the path that starts at $d^j_i$ and follows σ eventually arrives at $r^j$, and that the largest priority on this path is strictly smaller than P(6, i, 1, j, 0). On the other hand, the path that starts at $h^j_i$ and follows σ moves directly to $r^j$, and the largest priority on this path is P(6, i, 1, j, 0). Since this priority is even, we can conclude that $\mathrm{Val}^\sigma(d^j_i) \sqsubset \mathrm{Val}^\sigma(e^j_i)$, and therefore we have $\mathrm{Br}(\sigma)(e^j_i) = d^j_i$.
(3) If m > d(i) + 2 and Eval(B, I(i)) = 0, then we have $\sigma(d^j_i) = e^j_i$, so we can apply Lemma B.1 to prove that $\mathrm{Br}(\sigma)(e^j_i) = h^j_i$.
B.3. The vertices $e^j_i$ in input/output gates. The following lemmas consider the vertices $e^l_i$ when i ∈ Input/Output. The first lemma considers the case where l = j, and the second lemma considers the case where l = 1 − j.
Lemma B.5. Let σ be a strategy that agrees with $\chi^{B,K,j}_m$ for some bit-strings $B, K \in \{0, 1\}^n$, some m in the range 1 ≤ m ≤ Delay(j, K) − 1, and some j ∈ {0, 1}. For every i ∈ Input/Output, we have that $\mathrm{Br}(\sigma)(e^j_i) = \mu^{B,K,j}_m(e^j_i)$.
Proof. There are a number of cases to consider.
(1) If m > 1 and $F^2(B)_i = 0$ and either m ≤ d(i) + 3, or m > d(i) + 3 and $F^2(B)_i = 0$, then the path that starts at $d^j_i$ eventually arrives at $r^j$, and the largest priority on this path is strictly smaller than P(6, d(C) + 1, 1, j, 0). On the other hand, the path that starts at $h^j_{i,0}$ and follows σ passes through $h^j_{i,1}$ and then arrives at $r^j$. The largest priority on this path is P(6, d(C) + 1, 1, j, 0), and since this priority is even, we can conclude that $\mathrm{Val}^\sigma(d^j_i) \sqsubset \mathrm{Val}^\sigma(h^j_{i,0})$. Therefore, we have that $\mathrm{Br}(\sigma)(e^j_i) = d^j_i$.
(2) If m > 1 and m > d(i) + 3 and $F^2(B)_i = 1$, then $\sigma(d^j_i) = e^j_i$, and so we can apply Lemma B.1 to argue that $\mathrm{Br}(\sigma)(e^j_i) = h^j_{i,0}$.
Lemma B.6. Let σ be a strategy that agrees with $\chi^{B,K,j}_m$ for some bit-strings $B, K \in \{0, 1\}^n$, some m in the range 1 ≤ m ≤ Delay(j, K) − 1, and some j ∈ {0, 1}. For every i ∈ Input/Output, we have that $\mathrm{Br}(\sigma)(e^{1-j}_i) = \mu^{B,K,j}_m(e^{1-j}_i)$.
Proof. There are a number of cases to consider.
(1) If m = 1 and $B_i = 0$, then the path that starts at $d^{1-j}_i$ and follows σ will eventually reach $r^{1-j}$, and the largest priority on this path is strictly smaller than P(6, d(C) + 1, 1, j, 0). On the other hand, the path that starts at $h^{1-j}_{i,0}$ moves to $h^{1-j}_{i,1}$ and then arrives at $r^{1-j}$, and the largest priority on this path is P(6, d(C) + 1, 1, j, 0). Since this priority is even, we have that $\mathrm{Val}^\sigma(d^{1-j}_i) \sqsubset \mathrm{Val}^\sigma(h^{1-j}_{i,0})$ and therefore $\mathrm{Br}(\sigma)(e^{1-j}_i) = d^{1-j}_i$.
(2) If m > 1 and $B_i = 0$, then the path that starts at $d^{1-j}_i$ moves to $r^j$, and the largest priority on this path is strictly smaller than P(6, 0, 1, j, 0). On the other hand, the path that starts at $h^{1-j}_{i,0}$ moves to $h^{1-j}_{i,2}$ and then arrives at $r^j$, and the largest priority on this path is P(6, 0, 1, j, 0). Since this priority is even, we have that $\mathrm{Val}^\sigma(d^{1-j}_i) \sqsubset \mathrm{Val}^\sigma(h^{1-j}_{i,0})$ and therefore $\mathrm{Br}(\sigma)(e^{1-j}_i) = d^{1-j}_i$.
(3) If $B_i = 1$, then we have $\sigma(d^{1-j}_i) = e^{1-j}_i$, and we can apply Lemma B.1 to argue that $\mathrm{Br}(\sigma)(e^{1-j}_i) = h^{1-j}_{i,0}$.
B.4. Gate outputs. In this section we give two key lemmas that describe the valuation of the output states $o^j_i$. The first lemma considers the case where m = 1 or m = 2, and the second lemma considers the case where m ≥ 3.
Lemma B.7. Let σ be a strategy that agrees with $\chi^{B,K,j}_m$ for some $K, B \in \{0, 1\}^n$, some j ∈ {0, 1}, and for m = 1 or m = 2. For every gate i, we have $\mathrm{Val}^\sigma(o^j_i) \sqsubset \mathrm{Val}^\sigma(r^j)$.
Proof. There are four cases to consider.
(1) We begin by showing the claim for an input/output gate i from circuit 1 − j in the case where m = 1. Note that Lemma B.6 implies that $\mathrm{Br}(\sigma)(e^{1-j}_i) = d^{1-j}_i$. Observe that by definition, the path that starts at $d^j_i$ and follows σ will trace a path through the gadgets for circuit 1 − j, and will eventually reach $r^{1-j}$. Furthermore, the largest priority that can be seen along this path is P(6, d(C) + 1, 1, j, 0) < P(7, 0, 0, 0, 0). Hence, we can apply Lemma A.2 to argue that $\mathrm{Val}^\sigma(o^{1-j}_i) \sqsubset \mathrm{Val}^\sigma(r^j)$.
(2) Now we consider an input/output gate i from circuit 1 − j in the case where m = 2. Again, Lemma B.6 implies that $\mathrm{Br}(\sigma)(e^{1-j}_i) = d^{1-j}_i$, but in this case since m = 2, we have that the vertices $y^{1-j}$ and $p^{1-j}_i$ have both switched to $r^j$. Hence, the path that starts at $o^{1-j}_i$ will eventually arrive at $r^j$. It can be verified that, whatever path is taken from $o^{1-j}_i$ to $r^j$, the largest priority along this path is P(6, i, 0, j, 1) on the vertex $o^{1-j}_i$. Since this priority is odd, we have that $\mathrm{Val}^\sigma(o^{1-j}_i) \sqsubset \mathrm{Val}^\sigma(r^j)$.
(3) Next we consider the case where i is an Or-gate. If m = 1 then we have that $\sigma(o^j_i) = s^j$, and if m = 2 then $\sigma(o^j_i) = r^j$. In both cases, we can use Lemma A.1 and the fact that the priority assigned to $o^j_i$ is odd, to conclude that $\mathrm{Val}^\sigma(o^j_i) \sqsubset \mathrm{Val}^\sigma(r^j)$.
(4) Finally, we consider the case where i is a Not-gate. We can apply Lemma B.4 to argue that $\mathrm{Br}(\sigma)(e^j_i) = d^j_i$. If m = 1 then we have that $\sigma(d^j_i) = s^j$, and if m = 2 then $\sigma(d^j_i) = r^j$. In both cases, the highest priority on the path from $o^j_i$ to either $s^j$ or $r^j$ is the odd priority from $o^j_i$, so we can use this fact, along with Lemma A.1, to conclude that $\mathrm{Val}^\sigma(o^j_i) \sqsubset \mathrm{Val}^\sigma(r^j)$.
Lemma B.8. Let σ be a strategy that agrees with $\chi^{B,K,j}_m$ for some $K, B \in \{0, 1\}^n$, some j ∈ {0, 1}, and some m in the range 3 ≤ m ≤ Delay(j, K). For every gate i, we have:
(1) If m ≤ d(i) + 2, then $\mathrm{Val}^\sigma(o^j_i) \sqsubset \mathrm{Val}^\sigma(r^j)$.
(2) If m > d(i) + 2 and Eval(B, i) = 0, then $\mathrm{Val}^\sigma(o^j_i) \sqsubset \mathrm{Val}^\sigma(r^j)$.
(3) If m > d(i) + 2 and Eval(B, i) = 1, then $\mathrm{Val}^\sigma(r^j) \sqsubset \mathrm{Val}^\sigma(o^j_i)$ and:
$P(6, 0, 0, 0, 0) \le \mathrm{MaxDiff}^\sigma(r^j, o^j_i) \le P(6, i, 1, j, 0)$.
Proof. We will prove this claim by induction over the depth of the gates. For the base case we consider an input/output gate i from circuit 1 − j, which provides the input values for circuit j. Since we consider these gates to have depth 0, we always have m > d(i) + 2, so there are two cases to prove based on whether $B_i$ is zero or one. First, observe that since m ≥ 3, the circuit mover gadgets attached to the input/output gadget for bit i in circuit 1 − j have $\sigma(y^{1-j}) = r^j$. Since Delay(1 − j, K) ≥ d(C) + 3, the definition given in (4.5) implies that the strategy at $d^{1-j}_i$ is determined by $B_i$. So we have the following two cases.
• If $B_i = 0$, then $\sigma(d^{1-j}_i) = a^{1-j}_{i,l}$ for some l. By definition, in the strategy σ, all paths from $a^{1-j}_{i,l}$ eventually arrive at $r^j$, and the maximum priority on any of these paths is smaller than P(6, 0, 0, j, 1). Hence, the largest priority on the path from $o^{1-j}_i$ to $r^j$ is the priority P(6, 0, 0, j, 1) on the vertex $o^{1-j}_i$, and since this is an odd priority, we have $\mathrm{Val}^\sigma(o^{1-j}_i) \sqsubset \mathrm{Val}^\sigma(r^j)$.
• If $B_i = 1$, then $\sigma(d^{1-j}_i) = e^{1-j}_i$ and Lemma B.6 then implies $\mathrm{Br}(\sigma)(e^{1-j}_i) = h^{1-j}_i$. So, the path that starts at $o^{1-j}_i$ and follows σ passes through $e^{1-j}_i$, $h^{1-j}_{i,0}$, $h^{1-j}_{i,2}$, and then arrives at $r^j$. The largest priority on this path is P(6, 0, 1, j, 0) > P(6, 0, 0, j, 1) on the state $h^{1-j}_{i,2}$, so we have $\mathrm{Val}^\sigma(r^j) \sqsubset \mathrm{Val}^\sigma(o^{1-j}_i)$.
Hence, the base case of the induction has been shown. The inductive step will be split into two cases, based on whether i is a Not-gate or an Or-gate.
Suppose that the inductive hypothesis holds for all gates i with d(i) < k, and let i be an Or-gate with d(i) = k. We must prove three cases.
• The first two cases use the same proof. If m ≤ d(i) + 2, or if m > d(i) + 2 and Eval(B, i) = 0, then by definition we have $\sigma(o^j_i) = r^j$. Since the priority assigned to $o^j_i$ is odd, we have $\mathrm{Val}^\sigma(o^j_i) \sqsubset \mathrm{Val}^\sigma(r^j)$, as required.
• If m > d(i) + 2 and Eval(B, i) = 1, then by definition we have that $\sigma(o^j_i) = \mathrm{InputState}(i, j, l)$ for some gate l with l ∈ {1, 2}, and we know that $\mathrm{Eval}(B, I_l(i)) = 1$. Hence, we can apply the inductive hypothesis to argue that $\mathrm{MaxDiff}^\sigma(r^j, \mathrm{InputState}(i, j, l))$ is even, and it satisfies:
$P(6, 0, 0, 0, 0) \le \mathrm{MaxDiff}^\sigma(r^j, \mathrm{InputState}(i, j, l)) \le P(6, l, 1, j, 0)$.
Since the priority assigned to $o^j_i$ is smaller than P(6, 0, 0, 0, 0), we have that the same two properties apply to $\mathrm{MaxDiff}^\sigma(r^j, o^j_i)$. Hence, $\mathrm{Val}^\sigma(r^j) \sqsubset \mathrm{Val}^\sigma(o^j_i)$, and the required bounds on $\mathrm{MaxDiff}^\sigma(r^j, o^j_i)$ hold because P(6, l, 1, j, 0) < P(6, i, 1, j, 0).
Now suppose that the inductive hypothesis holds for all gates i with d(i) < k, and let
i be a Not-gate with d(i) = k. We must prove three cases.
• If $m \le d(i) + 2$, then by definition, the path that starts at $o^j_i$ and follows $\sigma$ passes
through $e^j_i$, and Lemma B.4 implies that it then moves to $d^j_i$, a vertex of the form
$a^j_{i,l}$, a number of vertices of the form $t^j_{i,l}$, before finally arriving at $r^j$. It can easily
be verified that the largest priority on this path is $P(6, i, 0, j, 1)$ from the vertex $o^j_i$.
So, we have $\mathrm{MaxDiff}^\sigma(r^j, o^j_i) = P(6, i, 0, j, 1)$, and since this priority is odd, we have
$\mathrm{Val}^\sigma(o^j_i) \sqsubset \mathrm{Val}^\sigma(r^j)$, as required.
• If $m > d(i) + 2$ and $\mathrm{Eval}(B, i) = 0$, then we must have $\mathrm{Eval}(B, I(i)) = 1$. Hence,
by definition, the path that starts at $o^j_i$ and follows $\sigma$ first passes through $e^j_i$,
and Lemma B.4 implies that it then passes through $d^j_i$, and then some vertex
of the form $a^j_{i,l}$, followed by a number of vertices of the form $t^j_{i,l}$, before finally
arriving at $t^j_{i,d(i)}$ and then moving to $\mathrm{InputState}(i, j)$. The largest priority
on this path is $P(6, i, 0, j, 1)$ from the vertex $o^j_i$. By the inductive hypothesis, we
have $\mathrm{MaxDiff}^\sigma(r^j, \mathrm{InputState}(i, j)) < P(6, I(i), 1, j, 0)$, and since $I(i) < i$ we have
that $P(6, i, 0, j, 1) > P(6, I(i), 1, j, 0)$. Hence, we have that $\mathrm{MaxDiff}^\sigma(r^j, o^j_{I(i)}) =
P(6, i, 0, j, 1)$, and since this is odd, we have that $\mathrm{Val}^\sigma(o^j_i) \sqsubset \mathrm{Val}^\sigma(r^j)$, as required.
• If $m > d(i) + 2$ and $\mathrm{Eval}(B, i) = 1$, then by definition we have that the path that
starts at $o^j_i$ and follows $\sigma$ passes through $e^j_i$, and then Lemma B.4 implies that
it passes through $h^j_i$, and then reaches $r^j$. The largest priority on this path is
$P(6, i, 1, j, 0)$ on the vertex $h^j_i$. Hence we have $\mathrm{MaxDiff}^\sigma(r^j, o^j_i) \le P(6, i, 1, j, 0)$, and
since this priority is even, we have that $\mathrm{Val}^\sigma(r^j) \sqsubset \mathrm{Val}^\sigma(o^j_i)$. Hence, we have shown
both of the required properties for this case.
Now that we have shown the two versions of the inductive hypothesis, we have completed
the proof.
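The case analysis above repeatedly uses one pattern: find the largest priority that separates two valuations, then read off the order from its parity. The following is a minimal Python sketch of that pattern, under the assumption that a valuation can be modelled as the set of distinct priorities seen on the path to the sink. The function names `max_diff` and `strictly_below` and the toy priorities are hypothetical and are not taken from the construction.

```python
def max_diff(val_u, val_v):
    """Largest priority that occurs in exactly one of the two valuations."""
    diff = set(val_u) ^ set(val_v)
    return max(diff) if diff else None


def strictly_below(val_u, val_v):
    """Return True when val_u is strictly below val_v (val_u ⊏ val_v)."""
    p = max_diff(val_u, val_v)
    if p is None:
        return False  # identical valuations are not strictly ordered
    if p % 2 == 0:
        return p in val_v  # an even maximum favours the side that contains it
    return p in val_u      # an odd maximum penalises the side that contains it


# Toy priorities (assumed for illustration, not the P(...) values of the paper).
val_r = {3}           # priorities seen from r^j to the sink
val_o_odd = {3, 7}    # o passes an extra odd priority before reaching r^j
val_o_even = {3, 8}   # o passes an extra even priority before reaching r^j

print(strictly_below(val_o_odd, val_r))   # odd case: o is below r^j
print(strictly_below(val_r, val_o_even))  # even case: r^j is below o
```

The two calls mirror the odd and even cases of the proof: an odd largest priority on the path from $o^j_i$ places it below $r^j$, and an even one places it above.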
Appendix C. Or gates
The following pair of lemmas show that the states oji in the Or gate gadgets correctly switch
to the outgoing edge specified by χB,K,jm+1 . The first lemma considers the case where 1 ≤
m < Delay(j,K)− 1, and the second lemma considers the case where m = Delay(j,K)− 1.
Lemma C.1. Let $\sigma$ be a strategy that agrees with $\chi^{B,K,j}_m$ for some $K, B \in \{0, 1\}^n$, some
$j \in \{0, 1\}^n$, and some $m$ in the range $1 \le m < \mathrm{Delay}(j, K) - 1$. For each Or-gate $i$, greedy
all-switches strategy improvement will switch $o^j_i$ to $\chi^{B,K,j}_{m+1}(o^j_i)$.
Proof. In this proof we will show that the most appealing edge is the one that is specified
in Equation (4.1). This boils down to a case analysis.
First suppose that $m < d(i) + 2$. We must prove that the edge to $r^j$ is the most appealing
edge at $o^j_i$. By Lemma A.1, we have that $\mathrm{Val}^\sigma(s^j) \sqsubset \mathrm{Val}^\sigma(r^j)$. Furthermore, since $d(I_1(i)) =
d(I_2(i)) = d(i) - 1$, part 1 of Lemma B.8 implies that $\mathrm{Val}^\sigma(\mathrm{InputState}(i, j, l)) \sqsubset \mathrm{Val}^\sigma(r^j)$
for $l \in \{1, 2\}$. Hence, we have that $r^j$ is the most appealing edge at $o^j_i$, as required.
Now suppose that d(i) + 2 ≤ m ≤ Delay(j,K)− 1. There are three cases to consider.
• If both input gates are false, then Lemma A.1 and part 2 of Lemma B.8 imply that
rj will continue to be the most appealing edge at oji , as required.
• If $I_l(i)$ is true and $I_{1-l}(i)$ is false, for some $l \in \{1, 2\}$, then part 3 of Lemma B.8
implies that $\mathrm{Val}^\sigma(r^j) \sqsubset \mathrm{Val}^\sigma(\mathrm{InputState}(i, j, l))$, and $\mathrm{Val}^\sigma(\mathrm{InputState}(i, j, 1 - l)) \sqsubset
\mathrm{Val}^\sigma(r^j)$. Therefore, the most appealing edge at $o^j_i$ is the one to $\mathrm{InputState}(i, j, l)$,
as required.
• Finally, if both input gates are true, then Lemma B.8 implies that the highest appeal
edge at $o^j_i$ is either $\mathrm{InputState}(i, j, 1)$ or $\mathrm{InputState}(i, j, 2)$. Since $\mathrm{OrNext}(i, B, m)$
is defined to be the successor with highest appeal, we have that the highest appeal
edge at $o^j_i$ is the one to $o^j_{I_{\mathrm{OrNext}(i,B,m)}(i)}$, as required.
This completes the proof that greedy all-switches strategy improvement will switch $o^j_i$ to
$\chi^{B,K,j}_{m+1}(o^j_i)$.
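The proof of Lemma C.1 amounts to picking, among the successors of $o^j_i$, the one whose valuation is largest. The following Python sketch illustrates that selection for an Or-gate with one true and one false input. It assumes that valuations are sets of distinct priorities ordered by the parity of the largest differing priority; the names `compare` and `most_appealing`, the successor labels, and the toy priorities are hypothetical.

```python
from functools import cmp_to_key


def compare(val_u, val_v):
    """Three-way comparison: -1 if val_u ⊏ val_v, 1 if val_v ⊏ val_u, else 0."""
    diff = set(val_u) ^ set(val_v)
    if not diff:
        return 0
    p = max(diff)
    # u is ahead if it holds an even maximum, or if v holds an odd maximum.
    u_ahead = (p in val_u) == (p % 2 == 0)
    return 1 if u_ahead else -1


def most_appealing(successors):
    """Return the name of the successor with the largest valuation."""
    by_valuation = cmp_to_key(lambda a, b: compare(successors[a], successors[b]))
    return max(successors, key=by_valuation)


# Toy Or-gate (assumed values): input 1 is true, input 2 is false.
successors = {
    "r": {4},             # valuation of r^j
    "input_1": {4, 10},   # even priority above r^j, so above r^j
    "input_2": {4, 9},    # odd priority above r^j, so below r^j
}
print(most_appealing(successors))
```

When both inputs are false in this toy model, both input valuations fall below that of `"r"`, so the selection stays at `"r"`, matching the first case of the proof.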
Lemma C.2. Let $\sigma$ be a strategy that agrees with $\chi^{B,K,j}_m$ for some $K, B \in \{0, 1\}^n$, some
$j \in \{0, 1\}^n$, and for $m = \mathrm{Delay}(j, K) - 1$. For each Or-gate $i$, greedy all-switches strategy
improvement will switch $o^{1-j}_i$ to $\chi^{B,K,j}_{m+1}(o^{1-j}_i)$.
Proof. We must show that the edge to s1−j is the most appealing edge at o1−ji . It can be
verified that all paths starting at InputState(i, j, 1) and InputState(i, j, 2) either reach r1−j ,
s1−j , or rj . Furthermore, the largest possible priority on these paths is strictly smaller than
P(7, 0, 0, 0, 0). Hence, we can apply part 1 of Lemma A.1 and part 1 of Lemma A.2 to
conclude that the edge to s1−j is the most appealing outgoing edge at o1−ji .
Appendix D. The states tji,l in Not gates
In this section we show that the states tji,l in the Not gate gadgets correctly switch to the
outgoing edge specified by χB,K,jm+1 . The first two lemmas consider the case where 1 ≤ m <
Delay(j,K) − 1, and the third lemma considers the case where m = Delay(j,K) − 1. The
first lemma deals with the case where l < d(i), while the second lemma deals with the case
where l > d(i). Observe that there is no need to deal with the case where l = d(i), because
tji,d(i) only has one outgoing edge.
Lemma D.1. Let $\sigma$ be a strategy that agrees with $\chi^{B,K,j}_m$ for some $K, B \in \{0, 1\}^n$, some
$j \in \{0, 1\}^n$, and some $m$ in the range $1 \le m < \mathrm{Delay}(j, K) - 1$. For each Not-gate $i$, and
each $l$ in the range $1 \le l < d(i)$, greedy all-switches strategy improvement will switch $t^j_{i,l}$ to
$\chi^{B,K,j}_{m+1}(t^j_{i,l})$.
Proof. Lemma A.1 implies that the state $t^j_{i,l}$ will not switch to $s^j$. In the rest of this proof
we consider the two remaining outgoing edges at this state. We have two cases to consider.
(1) We first deal with the case where $m < l + 1$, where we must show that the edge to
$r^j$ is the highest appeal edge at $t^j_{i,l}$. Let $v = \sigma(t^j_{i,l})$ be the successor of $t^j_{i,l}$ according
to $\sigma$ (if $m = 1$ then $v = s^j$, and otherwise $v = r^j$). Since $m \le l$, the definition
in Equation (4.2) implies that $\sigma(t^j_{i,l-1}) = v$. Since the priority assigned to $t^j_{i,l-1}$
is odd, we therefore have that $\mathrm{Val}^\sigma(t^j_{i,l-1}) \sqsubset \mathrm{Val}^\sigma(v)$. Since we already know from
Lemma A.1 that $\mathrm{Val}^\sigma(s^j) \sqsubset \mathrm{Val}^\sigma(r^j)$, we have therefore proved that the edge to $r^j$
is the most appealing edge at $t^j_{i,l}$.
(2) Now we deal with the case where $m \ge l + 1$, where we must show that the most
appealing edge at $t^j_{i,l}$ is the one to $t^j_{i,l-1}$. Since $m > l$, the definition in Equation (4.2)
implies that the path that starts at $t^j_{i,l-1}$ and follows $\sigma$ will pass through $t^j_{i,l'}$ for all
$l'$ in the range $0 \le l' < l$ before arriving at $r^j$. The highest priority on this path is
$P(5, i, 2k + 4n + 4, j, 0)$ on the vertex $t^j_{i,0}$. Since this priority is even we have that
$\mathrm{Val}^\sigma(r^j) \sqsubset \mathrm{Val}^\sigma(t^j_{i,l-1})$, and therefore the edge to $t^j_{i,l-1}$ is the most appealing edge
at $t^j_{i,l}$.
Lemma D.2. Let $\sigma$ be a strategy that agrees with $\chi^{B,K,j}_m$ for some $K, B \in \{0, 1\}^n$, some
$j \in \{0, 1\}^n$, and some $m$ in the range $1 \le m < \mathrm{Delay}(j, K) - 1$. For each Not-gate $i$, and
each $l > d(i)$, greedy all-switches strategy improvement will switch $t^j_{i,l}$ to $\chi^{B,K,j}_{m+1}(t^j_{i,l})$.
Proof. Lemma A.1 implies that the state $t^j_{i,l}$ will not switch to $s^j$. In the rest of this proof
we consider the two remaining outgoing edges at this state.
(1) First we deal with the case where $m < l + 1$, where we must show that the edge to $r^j$
is the most appealing edge at $t^j_{i,l}$. When $l > d(i) + 1$, the proof of this fact is identical
to Item 1 in the proof of Lemma D.1. For $l = d(i) + 1$, we invoke Lemma B.8 to
argue that, since $m < l + 1 = d(i) + 2$, we have that $\mathrm{Val}^\sigma(o^j_i) \sqsubset \mathrm{Val}^\sigma(r^j)$. Since the
priority assigned to $t^j_{i,d(i)}$ is odd, we therefore also have that $\mathrm{Val}^\sigma(t^j_{i,d(i)}) \sqsubset \mathrm{Val}^\sigma(r^j)$.
Therefore, the edge to $r^j$ is the most appealing edge at $t^j_{i,d(i)+1}$.
(2) Now we deal with the case where $m \ge l + 1$, and where $\mathrm{Eval}(B, I(i)) = 0$. Here we
must show that the edge to $r^j$ is the most appealing edge at $t^j_{i,l}$. For each $l > d(i) + 1$,
the proof is identical to the proof of the first case in the proof of this lemma. For
$l = d(i) + 1$, we invoke Lemma B.8 to argue that, since $\mathrm{Eval}(B, I(i)) = 0$ we must
have $\mathrm{Val}^\sigma(o^j_{I(i)}) \sqsubset \mathrm{Val}^\sigma(r^j)$, and therefore the edge to $r^j$ is the most appealing edge
at $t^j_{i,l}$.
(3) Finally, we deal with the case where $m \ge l + 1$ and where $\mathrm{Eval}(B, I(i)) = 1$. Here we
must show that the edge to $t^j_{i,l-1}$ is the most appealing edge at $t^j_{i,l}$. From the definition
given in Equation (4.2), we have that the path that starts at $t^j_{i,l-1}$ and follows $\sigma$ will
pass through $t^j_{i,l'}$ for each $l'$ in the range $d(i) \le l' < l$ before moving to $o^j_{I(i)}$. Since
$m \ge l + 1 > d(i)$, we have that $m \ge d(i) + 2$, and therefore Lemma B.8 implies that
$\mathrm{Val}^\sigma(r^j) \sqsubset \mathrm{Val}^\sigma(o^j_{I(i)})$ and that $\mathrm{MaxDiff}^\sigma(r^j, o^j_{I(i)}) \ge P(6, 0, 0, 0, 0)$. All priorities
on the path from $t^j_{i,l-1}$ to $o^j_{I(i)}$ are smaller than $P(6, 0, 0, 0, 0)$, so we can conclude
that $\mathrm{Val}^\sigma(r^j) \sqsubset \mathrm{Val}^\sigma(t^j_{i,l-1})$, as required.
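The case analyses of Lemmas D.1 and D.2 can be summarised as a small decision procedure for the edge that $t^j_{i,l}$ switches to. The following Python sketch encodes only the cases stated in those two proofs, for $m$ in the range covered by the lemmas. The function name `t_switch_target`, its parameters, and the string labels are hypothetical; `input_true` stands for $\mathrm{Eval}(B, I(i)) = 1$.

```python
def t_switch_target(m, l, d_i, input_true):
    """Edge chosen at t_{i,l} in circuit j, following Lemmas D.1 and D.2.

    m          -- current iteration index (assumed 1 <= m < Delay(j, K) - 1)
    l          -- index of the state in the deceleration lane (l >= 1)
    d_i        -- depth d(i) of the Not-gate
    input_true -- True when Eval(B, I(i)) = 1
    """
    if l < 1:
        raise ValueError("the lemmas only cover l >= 1")
    if l == d_i:
        raise ValueError("t_{i,d(i)} has only one outgoing edge")
    if m < l + 1:
        return "r"                  # item 1 of both proofs
    if l < d_i:
        return f"t_{l - 1}"         # Lemma D.1, item 2
    # l > d(i): the lane only advances when the input gate is true.
    return f"t_{l - 1}" if input_true else "r"


# Illustrative calls with an assumed depth d(i) = 3.
print(t_switch_target(m=2, l=2, d_i=3, input_true=False))
print(t_switch_target(m=3, l=2, d_i=3, input_true=False))
print(t_switch_target(m=6, l=5, d_i=3, input_true=True))
print(t_switch_target(m=6, l=5, d_i=3, input_true=False))
```

The sketch does not model the valuations themselves; it only records which edge each case of the proofs identifies as most appealing.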
Lemma D.3. Let $\sigma$ be a strategy that agrees with $\chi^{B,K,j}_m$ for some $K, B \in \{0, 1\}^n$, some
$j \in \{0, 1\}^n$, and for $m = \mathrm{Delay}(j, K) - 1$. For each Not-gate $i$, and each $l$ in the range
$1 \le l < 2k + 4n + 6$, greedy all-switches strategy improvement will switch $t^{1-j}_{i,l}$ to $\chi^{B,K,j}_{m+1}(t^{1-j}_{i,l})$.
Proof. We must show that the edge to $s^{1-j}$ is the most appealing edge at $t^{1-j}_{i,l}$. It can be
verified that all paths starting at $t^{1-j}_{i,l-1}$ reach one of $r^{1-j}$, $s^{1-j}$, or $r^j$. Furthermore, the
largest possible priority on these paths is strictly smaller than $P(7, 0, 0, 0, 0)$. Hence, we can
apply part 1 of Lemma A.1 and part 1 of Lemma A.2 to conclude that the edge to $s^{1-j}$ is
the most appealing outgoing edge at $t^{1-j}_{i,l}$.
Appendix E. The state dji in Not gates
In this section we show that the states dji in the Not gate gadgets correctly switch to the
outgoing edge specified by χB,K,jm+1 . The first lemma considers the case where m = 1, the
second lemma considers the case where m = 2, the third lemma considers the case where
3 ≤ m < d(i)+2, the fourth lemma considers the case where d(i)+2 ≤ m < Delay(j,K)−1
and the gate outputs 0, the fifth lemma considers the case where d(i)+2 ≤ m < Delay(j,K)
and the gate outputs 1, and the final lemma considers the case where m = Delay(j,K)− 1.
Lemma E.1. Let $\sigma$ be a strategy that agrees with $\chi^{B,K,j}_m$ for some $K, B \in \{0, 1\}^n$, some
$j \in \{0, 1\}^n$, and where $m = 1$. For each Not-gate $i$, greedy all-switches strategy improvement
will switch $d^j_i$ to $\chi^{B,K,j}_{m+1}(d^j_i)$.
Proof. According to the definition given in Equation (4.4), we must show that the edge to
rj is the most appealing edge at dji . We do so by a case analysis.
(1) First we consider the vertex sj . Here we can apply part 2 of Lemma A.1 to argue
that Valσ(sj) @ Valσ(rj).
(2) Next we consider a vertex $a^j_{i,l}$ with $l \ne d(i)$. Here the definition given in
Equation (4.2) implies that the path that starts at $a^j_{i,l}$ and follows $\sigma$ passes through $t^j_{i,l}$
and then arrives at $s^j$. The largest priority on this path is $P(5, i, l + 1, j, 0)$ on the
vertex $a^j_{i,l}$. However, since this priority is smaller than $P(7, 0, 0, 0, 0)$, we can apply
part 2 of Lemma A.1 to prove that $\mathrm{Val}^\sigma(a^j_{i,l}) \sqsubset \mathrm{Val}^\sigma(r^j)$.
(3) Next we consider a vertex $a^j_{i,l}$ with $l = d(i)$. Here we can apply Lemma B.7 to argue
that $\mathrm{Val}^\sigma(o^j_{I(i)}) \sqsubset \mathrm{Val}^\sigma(r^j)$. Furthermore, the largest priority on the path from $a^j_{i,l}$ to
$o^j_i$ is the odd priority on $t^j_{i,d(i)}$. Hence, we can conclude that $\mathrm{Val}^\sigma(a^j_{i,d(i)}) \sqsubset \mathrm{Val}^\sigma(r^j)$.
(4) Finally, we consider the vertex $e^j_i$. Lemma B.4 implies that $\mathrm{Br}(\sigma)(e^j_i) = d^j_i$. Hence,
the path that starts at $e^j_i$ moves to $d^j_i$ and then to $s^j$. The highest priority on this
path is $P(4, i, 1, j, 0) < P(7, 0, 0, 0, 0)$. Therefore, we can apply Lemma A.1 to show
that $\mathrm{Val}^\sigma(e^j_i) \sqsubset \mathrm{Val}^\sigma(r^j)$.
Hence, we have shown that the edge to $r^j$ is the most appealing outgoing edge at $d^j_i$, so
greedy all-switches strategy improvement will switch $d^j_i$ to $r^j$.
Lemma E.2. Let $\sigma$ be a strategy that agrees with $\chi^{B,K,j}_m$ for some $K, B \in \{0, 1\}^n$, some
$j \in \{0, 1\}^n$, and where $m = 2$. For each Not-gate $i$, greedy all-switches strategy improvement
will switch $d^j_i$ to $\chi^{B,K,j}_{m+1}(d^j_i)$.
Proof. According to the definition given in Equation (4.4), we must show that the edge to
$a^j_{i,2k+4n+6}$ is the most appealing edge at $d^j_i$. We do so by a case analysis.
(1) First we consider the vertex $r^j$. Observe that the path that starts at $a^j_{i,2k+4n+6}$
and follows $\sigma$ visits $t^j_{i,2k+4n+6}$ and then arrives at $r^j$. The highest priority on this
path is the even priority assigned to $a^j_{i,2k+4n+6}$, so we can conclude that
$\mathrm{Val}^\sigma(r^j) \sqsubset \mathrm{Val}^\sigma(a^j_{i,2k+4n+6})$.
(2) Next we consider the vertex $s^j$. Here we can apply Lemma A.1 to argue that
$\mathrm{Val}^\sigma(s^j) \sqsubset \mathrm{Val}^\sigma(r^j)$, and we have already shown that $\mathrm{Val}^\sigma(r^j) \sqsubset \mathrm{Val}^\sigma(a^j_{i,2k+4n+6})$.
(3) Next we consider a vertex $a^j_{i,l}$ with $l \ne d(i)$ and $l \ne 2k + 4n + 6$. The path that starts
at $a^j_{i,l}$ and follows $\sigma$ passes through $t^j_{i,l}$ and then arrives at $r^j$. The largest priority
on this path is the even priority assigned to $a^j_{i,l}$. However, since $l < 2k + 4n + 6$,
we have that this priority is smaller than the even priority assigned to $a^j_{i,2k+4n+6}$.
Therefore, we have $\mathrm{Val}^\sigma(a^j_{i,l}) \sqsubset \mathrm{Val}^\sigma(a^j_{i,2k+4n+6})$.
(4) Next we consider the vertex $a^j_{i,d(i)}$. Here we can apply Lemma B.7 to argue that
$\mathrm{Val}^\sigma(o^j_{I(i)}) \sqsubset \mathrm{Val}^\sigma(r^j)$. Furthermore, the largest priority on the path from $a^j_{i,l}$ to $o^j_i$
is the odd priority on $t^j_{i,d(i)}$. Since the highest priority on the path from $a^j_{i,2k+4n+6}$
to $r^j$ is even, we can conclude that $\mathrm{Val}^\sigma(a^j_{i,d(i)}) \sqsubset \mathrm{Val}^\sigma(a^j_{i,2k+4n+6})$.
(5) Finally, we consider the vertex $e^j_i$. Lemma B.4 implies that $\mathrm{Br}(\sigma)(e^j_i) = d^j_i$. Hence,
the path that starts at $e^j_i$ moves to $d^j_i$ and then to $r^j$. The highest priority on this
path is $P(4, i, 1, j, 0)$ on the vertex $e^j_i$. However, this is smaller than the largest
priority on the path from $a^j_{i,2k+4n+6}$ to $r^j$, so we can conclude that $\mathrm{Val}^\sigma(e^j_i) \sqsubset
\mathrm{Val}^\sigma(a^j_{i,2k+4n+6})$.
Lemma E.3. Let $\sigma$ be a strategy that agrees with $\chi^{B,K,j}_m$ for some $K, B \in \{0, 1\}^n$, some
$j \in \{0, 1\}^n$, and some $m$ in the range $3 \le m < d(i) + 2$. For each Not-gate $i$, greedy
all-switches strategy improvement will switch $d^j_i$ to $\chi^{B,K,j}_{m+1}(d^j_i)$.
Proof. Lemma A.1 implies that the state $d^j_i$ will not switch to $s^j$. In the rest of this proof
we consider the other outgoing edges at this state.
Since we are in the case where $m < d(i) + 2$, the definition from Equation (4.4) specifies
that greedy all-switches strategy improvement must switch to $a^j_{i,m-2}$. Hence we must argue
that the edge to $a^j_{i,m-2}$ is the most appealing edge at $d^j_i$, and we will start by considering
the appeal of this edge. The definition in Equation (4.2) implies that the path that starts
at $a^j_{i,m-2}$ passes through $t^j_{i,l}$ for all $l \le m - 2$ before arriving at $r^j$. The highest priority on
this path is $P(5, i, 2k + 4n + 4, j, 0)$ on the vertex $t^j_{i,0}$, and the second highest priority on this
path is $P(5, i, m - 1, j, 0)$ on the vertex $a^j_{i,m-2}$. We will now show that all other edges are
less appealing.
(1) First we consider the vertex $r^j$. Since $P(5, i, 2k + 4n + 4, j, 0)$ is even, we immediately
get that $\mathrm{Val}^\sigma(r^j) \sqsubset \mathrm{Val}^\sigma(a^j_{i,m-2})$.
(2) Next we consider a vertex $a^j_{i,l}$ with $l < m - 2$. The path that starts at this vertex and
follows $\sigma$ passes through $t^j_{i,l'}$ for all $l' \le l$ before arriving at $r^j$. The highest priority
on this path is $P(5, i, 2k + 4n + 4, j, 0)$ on the vertex $t^j_{i,0}$, and the second highest
priority on this path is $P(5, i, l + 1, j, 0)$ on the vertex $a^j_{i,l}$. Hence, we have that
$\mathrm{MaxDiff}^\sigma(a^j_{i,m-2}, a^j_{i,l})$ is $P(5, i, m - 1, j, 0)$, and since this is even, we can conclude
that $\mathrm{Val}^\sigma(a^j_{i,l}) \sqsubset \mathrm{Val}^\sigma(a^j_{i,m-2})$.
(3) Next we consider a vertex $a^j_{i,l}$ with $l > m - 2$ and with $l \ne d(i)$. The path that
starts at this vertex and follows $\sigma$ passes through $t^j_{i,l}$ and then moves immediately
to $r^j$. The highest priority on this path is $P(5, i, l + 1, j, 0)$ on the vertex $a^j_{i,l}$. Since
$l + 1 < 2k + 4n + 4$ we have that $\mathrm{MaxDiff}^\sigma(a^j_{i,m-2}, a^j_{i,l})$ is $P(5, i, 2k + 4n + 4, j, 0)$,
and since this priority is even, we can conclude that $\mathrm{Val}^\sigma(a^j_{i,l}) \sqsubset \mathrm{Val}^\sigma(a^j_{i,m-2})$.
(4) Next we consider the vertex $a^j_{i,d(i)}$. The path that starts at this vertex passes
through $t^j_{i,d(i)}$, and then moves to $\mathrm{InputState}(i, j)$. Since $m < d(i) + 2$, we have
that $m \le d(I(i)) + 2$, and therefore the first case of Lemma B.8 tells us that
$\mathrm{Val}^\sigma(\mathrm{InputState}(i, j)) \sqsubset \mathrm{Val}^\sigma(r^j)$. Hence we can conclude that $\mathrm{Val}^\sigma(a^j_{i,d(i)}) \sqsubset
\mathrm{Val}^\sigma(a^j_{i,m-2})$.
(5) Finally, we consider the vertex $e^j_i$. By Lemma B.4, the path that starts at $e^j_i$ and
follows $\sigma$ passes through $d^j_i$. If $m = 3$, it then moves to $a^j_{i,2k+4n+6}$, and if $m > 3$
it then moves to $a^j_{i,m-3}$. In either case, since the priority assigned to $e^j_i$ is smaller
than the priorities assigned to the vertices $a^j_{i,l}$ and $t^j_{i,l}$, we can reuse the arguments
made above to conclude that $\mathrm{Val}^\sigma(e^j_i) \sqsubset \mathrm{Val}^\sigma(a^j_{i,m-2})$.
Therefore, we have shown that the edge to aji,m−2 is the most appealing outgoing edge at
dji , and so this edge will be switched by greedy all-switches strategy improvement.
Lemma E.4. Let $\sigma$ be a strategy that agrees with $\chi^{B,K,j}_m$ for some $K, B \in \{0, 1\}^n$, some
$j \in \{0, 1\}^n$, and some $m$ in the range $d(i) + 2 \le m < \mathrm{Delay}(j, K) - 1$. For each Not-gate $i$,
if $\mathrm{Eval}(B, I(i)) = 1$ then greedy all-switches strategy improvement will switch $d^j_i$ to $\chi^{B,K,j}_{m+1}(d^j_i)$.
Proof. Lemma A.1 implies that the state $d^j_i$ will not switch to $s^j$. In the rest of this proof
we consider the other outgoing edges at this state.
Since we are in the case where $m \ge d(i) + 2$ and $\mathrm{Eval}(B, I(i)) = 1$, the definition
from Equation (4.4) specifies that greedy all-switches strategy improvement must switch to
$a^j_{i,m-2}$. Hence we must argue that the edge to $a^j_{i,m-2}$ is the most appealing edge at $d^j_i$, and we
will start by considering the appeal of this edge. The definition in Equation (4.2) implies
that the path that starts at $a^j_{i,m-2}$ passes through $t^j_{i,l}$ for all $l$ in the range $d(i) \le l \le m - 2$
before arriving at $\mathrm{InputState}(i, j)$. Since $m \ge d(i) + 2$ we have $m > d(I(i)) + 2$, and so
Lemma B.8 implies that $\mathrm{MaxDiff}^\sigma(r^j, \mathrm{InputState}(i, j)) \ge P(6, 0, 0, 0, 0)$. We now consider
the other outgoing edges from $d^j_i$.
(1) First we consider $r^j$, where Lemma B.8 immediately gives that $\mathrm{Val}^\sigma(r^j) \sqsubset \mathrm{Val}^\sigma(a^j_{i,m-2})$.
(2) Next we consider a vertex $a^j_{i,l}$ with $l < d(i)$. Using the same reasoning as Item 2 in
the proof of Lemma E.3, we can conclude that the highest priority on the path from
$a^j_{i,l}$ to $r^j$ is $P(5, i, 2k + 4n + 4, j, 0) < P(6, 0, 0, 0, 0)$ and therefore $\mathrm{Val}^\sigma(a^j_{i,l}) \sqsubset
\mathrm{Val}^\sigma(a^j_{i,m-2})$.
(3) Next we consider a vertex $a^j_{i,l}$ with $l$ in the range $d(i) \le l < m - 2$. The path that
starts at $a^j_{i,l}$ and follows $\sigma$ passes through $t^j_{i,l'}$ for all $l'$ in the range $d(i) \le l' \le l$ and
then arrives at $\mathrm{InputState}(i, j)$. The highest priority on this path is $P(5, i, l + 1, j, 0)$.
On the other hand, the largest priority on the path from $a^j_{i,m-2}$ to $\mathrm{InputState}(i, j)$
is $P(5, i, m - 1, j, 0)$. Since this priority is even, we can conclude that $\mathrm{Val}^\sigma(a^j_{i,l}) \sqsubset
\mathrm{Val}^\sigma(a^j_{i,m-2})$.
(4) Next we consider a vertex $a^j_{i,l}$ with $l > m - 2$. Using the same reasoning as Item 3
in the proof of Lemma E.3 we can conclude that the highest priority on the path
from $a^j_{i,l}$ to $r^j$ is $P(5, i, l + 1, j, 0) < P(6, 0, 0, 0, 0)$ and therefore $\mathrm{Val}^\sigma(a^j_{i,l}) \sqsubset
\mathrm{Val}^\sigma(a^j_{i,m-2})$.
(5) Finally, we consider the vertex $e^j_i$, where we can use the same reasoning as Item 5
in the proof of Lemma E.3 to conclude that $\mathrm{Val}^\sigma(e^j_i) \sqsubset \mathrm{Val}^\sigma(a^j_{i,m-2})$.
Therefore, we have shown that the edge to aji,m−2 is the most appealing outgoing edge at
dji , and so this edge will be switched by greedy all-switches strategy improvement.
Lemma E.5. Let $\sigma$ be a strategy that agrees with $\chi^{B,K,j}_m$ for some $K, B \in \{0, 1\}^n$, some
$j \in \{0, 1\}^n$, and some $m$ in the range $d(i) + 2 \le m < \mathrm{Delay}(j, K) - 1$. For each Not-gate $i$,
if $\mathrm{Eval}(B, I(i)) = 0$ then greedy all-switches strategy improvement will switch $d^j_i$ to $\chi^{B,K,j}_{m+1}(d^j_i)$.
Proof. Lemma A.1 implies that the state $d^j_i$ will not switch to $s^j$. In the rest of this proof
we consider the other outgoing edges at this state.
Since $m \ge d(i) + 2$ and $\mathrm{Eval}(B, I(i)) = 0$, the definition in Equation (4.4) specifies that
the edge to $e^j_i$ is the most appealing edge at $d^j_i$. To prove this, we first show that all other
edges are less appealing than $a^j_{i,d(i)-1}$, and we will then later show that $e^j_i$ is more appealing
than $a^j_{i,d(i)-1}$. The definition in Equation (4.2) implies that the path that starts at $a^j_{i,d(i)-1}$
passes through $t^j_{i,l}$ for all $l$ in the range $0 \le l \le d(i) - 1$ before arriving at $r^j$. The largest
priority on this path is $P(5, i, d(i), j, 0)$ on the state $a^j_{i,d(i)-1}$. We now consider the other
outgoing edges.
(1) First we consider the vertex $r^j$. Since $P(5, i, 2k + 4n + 4, j, 0)$ is even, we immediately
get that $\mathrm{Val}^\sigma(r^j) \sqsubset \mathrm{Val}^\sigma(a^j_{i,m-2})$.
(2) Next we consider a vertex $a^j_{i,l}$ with $l < d(i) - 1$. Here we can use the same reasoning
as we used in Item 2 in the proof of Lemma E.3 to argue that $\mathrm{Val}^\sigma(a^j_{i,l}) \sqsubset
\mathrm{Val}^\sigma(a^j_{i,d(i)-1})$.
(3) Next we consider a vertex $a^j_{i,l}$ with $l > d(i)$. Here we can use the same reasoning
as we used in Item 3 in the proof of Lemma E.3 to argue that $\mathrm{Val}^\sigma(a^j_{i,l}) \sqsubset
\mathrm{Val}^\sigma(a^j_{i,d(i)-1})$.
(4) Finally, we consider the vertex $a^j_{i,d(i)}$. Here we can use the same reasoning as we used
in Item 4 in the proof of Lemma E.3 to argue that $\mathrm{Val}^\sigma(a^j_{i,d(i)}) \sqsubset \mathrm{Val}^\sigma(a^j_{i,d(i)-1})$,
although this time we will use case two of Lemma B.8.
So, we have shown that every edge other than the one to $e^j_i$ is less appealing than the edge
to $a^j_{i,d(i)-1}$.
Now we will show that the edge to $e^j_i$ is more appealing than the edge to $a^j_{i,d(i)-1}$. There
are two cases to consider.
• If $m = d(i) + 2$, then $\sigma(d^j_i) = a^j_{i,d(i)-1}$. Since Lemma B.4 implies that $\mathrm{Br}(\sigma)(e^j_i) = d^j_i$,
we have that the path that starts at $e^j_i$ and follows $\sigma$ passes through $d^j_i$ and then
arrives at $a^j_{i,d(i)-1}$. The largest priority on this path is $P(4, i, 1, j, 0)$, and since this
is even, we can conclude that $\mathrm{Val}^\sigma(a^j_{i,d(i)-1}) \sqsubset \mathrm{Val}^\sigma(e^j_i)$.
• If $m > d(i) + 2$, then $\sigma(d^j_i) = e^j_i$. In this case Lemma B.4 implies that $\mathrm{Br}(\sigma)(e^j_i) =
h^j_i$. The path that starts at $e^j_i$ and follows $\sigma$ passes through $h^j_i$ and then moves
directly to $r^j$. The largest priority on this path is $P(6, i, 1, j, 0)$, which is bigger than
$P(5, i, d(i), j, 0)$. Therefore, $\mathrm{MaxDiff}^\sigma(e^j_i, a^j_{i,d(i)-1}) = P(6, i, 1, j, 0)$, and since this is
even, we can conclude that $\mathrm{Val}^\sigma(a^j_{i,d(i)-1}) \sqsubset \mathrm{Val}^\sigma(e^j_i)$.
Hence, we have shown that the edge to $e^j_i$ is the most appealing edge at $d^j_i$.
Lemma E.6. Let $\sigma$ be a strategy that agrees with $\chi^{B,K,j}_m$ for some $K, B \in \{0, 1\}^n$, some
$j \in \{0, 1\}^n$, and for $m = \mathrm{Delay}(j, K) - 1$. For each Not-gate $i$, greedy all-switches strategy
improvement will switch $d^{1-j}_i$ to $\chi^{B,K,j}_{m+1}(d^{1-j}_i)$.
Proof. We must show that the edge to $s^{1-j}$ is the most appealing edge at $d^{1-j}_i$. It can
be verified that all paths starting at the vertices $a^{1-j}_{i,l}$ and $e^j_i$ reach one of $r^{1-j}$, $s^{1-j}$, or
$r^j$. Furthermore, the largest possible priority on all of these paths is strictly smaller than
$P(7, 0, 0, 0, 0)$. Hence, we can apply part 1 of Lemma A.1 and part 1 of Lemma A.2 to
conclude that the edge to $s^{1-j}$ is the most appealing outgoing edge at $d^{1-j}_i$.
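Lemmas E.1 to E.5 together determine where $d^j_i$ switches in each iteration of circuit $j$. As a reading aid, the following Python sketch collects those cases into one function. It is only a summary of the lemma statements and proofs above, not of Equation (4.4) itself; the function name `d_switch_target`, its parameters, and the string labels are hypothetical, `top` stands for $2k + 4n + 6$, and `input_true` stands for $\mathrm{Eval}(B, I(i)) = 1$.

```python
def d_switch_target(m, d_i, delay, input_true, top):
    """Edge chosen at d_i in circuit j for 1 <= m < Delay(j, K) - 1.

    m          -- current iteration index
    d_i        -- depth d(i) of the Not-gate
    delay      -- Delay(j, K)
    input_true -- True when Eval(B, I(i)) = 1
    top        -- index of the last deceleration-lane entry, 2k + 4n + 6
    """
    if not 1 <= m < delay - 1:
        raise ValueError("Lemmas E.1 to E.5 only cover 1 <= m < Delay(j, K) - 1")
    if m == 1:
        return "r"                # Lemma E.1
    if m == 2:
        return f"a_{top}"         # Lemma E.2
    if m < d_i + 2:
        return f"a_{m - 2}"       # Lemma E.3
    # m >= d(i) + 2: the input gate has been evaluated.
    if input_true:
        return f"a_{m - 2}"       # Lemma E.4: keep following the lane
    return "e"                    # Lemma E.5: the Not-gate outputs 1


# Illustrative trace with assumed values k = 1, n = 1, d(i) = 3, Delay = 20.
top = 2 * 1 + 4 * 1 + 6
for m in range(1, 7):
    print(m, d_switch_target(m, d_i=3, delay=20, input_true=False, top=top))
```

The case $m = \mathrm{Delay}(j, K) - 1$ of Lemma E.6 concerns the other circuit's vertex $d^{1-j}_i$ and is deliberately left out of this sketch.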
Appendix F. The vertices zl
In this section we show that the states zl correctly switch to the outgoing edge specified by
χB,K,jm+1 . The first lemma considers the case where l = j, while the second lemma considers
the case where l = 1− j.
Lemma F.1. Let $\sigma$ be a strategy that agrees with $\chi^{B,K,j}_m$ for some $K, B \in \{0, 1\}^n$, some
$j \in \{0, 1\}^n$, and $m$ in the range $1 \le m \le \mathrm{Delay}(j, K) - 1$. The greedy all-switches strategy
improvement algorithm will switch $z^j$ to $\chi^{B,K,j}_{m+1}(z^j)$.
Proof. We must show that the edge to rj is the most appealing edge at zj . This follows
immediately from part 2 of Lemma A.1.
Lemma F.2. Let $\sigma$ be a strategy that agrees with $\chi^{B,K,j}_m$ for some $K, B \in \{0, 1\}^n$, some
$j \in \{0, 1\}^n$, and some $m$ in the range $1 \le m \le \mathrm{Delay}(j, K) - 1$. The greedy all-switches
strategy improvement algorithm will switch $z^{1-j}$ to $\chi^{B,K,j}_{m+1}(z^{1-j})$.
Proof. There are two cases to consider.
(1) If m < Delay(j,K) − 1, then we must show that the edge to r1−j is the most
appealing edge at z1−j . This follows immediately by applying part 2 of Lemma A.1.
(2) If m = Delay(j,K)− 1, then since Delay(j,K) + Delay(1− j,K) = Length(K), we
must show that the edge to s1−j is the most appealing edge at z1−j . This follows
immediately from part 1 of Lemma A.1.
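The two lemmas of this appendix say that $z^j$ always chooses $r^j$, while $z^{1-j}$ chooses $r^{1-j}$ until the last iteration and then resets to $s^{1-j}$. The following small Python sketch records that behaviour. The function name `z_switch_target`, the flag `same_circuit`, and the string labels are hypothetical.

```python
def z_switch_target(m, delay, same_circuit):
    """Edge chosen at z^j (same_circuit=True) or z^(1-j) (same_circuit=False).

    m     -- current iteration index, assumed 1 <= m <= Delay(j, K) - 1
    delay -- Delay(j, K)
    """
    if not 1 <= m <= delay - 1:
        raise ValueError("Lemmas F.1 and F.2 only cover 1 <= m <= Delay(j, K) - 1")
    if same_circuit:
        return "r^j"                  # Lemma F.1
    if m < delay - 1:
        return "r^(1-j)"              # Lemma F.2, case 1
    return "s^(1-j)"                  # Lemma F.2, case 2: circuit 1-j resets


# Illustrative calls with an assumed Delay(j, K) = 10.
print(z_switch_target(4, 10, same_circuit=True))
print(z_switch_target(4, 10, same_circuit=False))
print(z_switch_target(9, 10, same_circuit=False))
```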
Appendix G. The vertices yl
In this section we show that the states yl correctly switch to the outgoing edge specified by
χB,K,jm+1 . The first lemma considers the case where l = j, while the second lemma considers
the case where l = 1− j.
Lemma G.1. Let $\sigma$ be a strategy that agrees with $\chi^{B,K,j}_m$ for some $K, B \in \{0, 1\}^n$, some
$j \in \{0, 1\}^n$, and some $m$ in the range $1 \le m \le \mathrm{Delay}(j, K) - 1$. The greedy all-switches
strategy improvement algorithm will switch $y^j$ to $\chi^{B,K,j}_{m+1}(y^j)$.
Proof. The definition of $\chi^{B,K,j}_{m+1}(y^j_i)$ specifies that the edge chosen at $y^j$ is defined by $\sigma_{m+\mathrm{Delay}(j,K)}(y^j)$,
which is given in Equation (4.7). According to this definition, we must show that the edge
to $r^j$ is the most appealing outgoing edge at $y^j$. This follows immediately from parts 1
and 2 of Lemma A.2.
Lemma G.2. Let $\sigma$ be a strategy that agrees with $\chi^{B,K,j}_m$ for some $K, B \in \{0, 1\}^n$, some
$j \in \{0, 1\}^n$, and some $m$ in the range $1 \le m \le \mathrm{Delay}(j, K) - 1$. The greedy all-switches
strategy improvement algorithm will switch $y^{1-j}$ to $\chi^{B,K,j}_{m+1}(y^{1-j})$.
Proof. According to Equation (4.7), we must show that the edge to rj is the most appealing
edge at y1−j . This follows immediately from Lemma A.2.
Appendix H. The vertices pli
In this section we show that the vertices pli correctly switch to the outgoing edge specified by
χB,K,jm+1 . The first lemma considers the case where l = j, while the second lemma considers
the case where l = 1− j.
Lemma H.1. Let $\sigma$ be a strategy that agrees with $\chi^{B,K,j}_m$ for some $K, B \in \{0, 1\}^n$, some
$j \in \{0, 1\}^n$, and some $m$ in the range $1 \le m \le \mathrm{Delay}(j, K) - 1$. For each $i \in$ Input/Output,
the greedy all-switches strategy improvement algorithm will switch $p^j_i$ to $\chi^{B,K,j}_{m+1}(p^j_i)$.
Proof. Since $m$ is in the range $1 \le m \le \mathrm{Delay}(j, K) - 1$, the definition given in Equation (4.8)
specifies that $o^j_{I(i)}$ should be the most appealing outgoing edge at $p^j_i$. There are two cases
to consider.
(1) If $m = 1$, then observe that the path that starts at $o^j_{I(i)}$ and follows $\sigma$ will eventually
reach $s^j$, no matter whether $I(i)$ is a Not-gate or an Or-gate. Furthermore, the
largest priority on this path is strictly smaller than $P(7, 0, 0, 0, 0)$. On the other
hand, the path that starts at $p^j_{i,1}$ moves immediately to $r^{1-j}$, and the largest priority
on this path is strictly smaller than $P(7, 0, 0, 0, 0)$. Hence, we can apply part 2 of
Lemma A.2 to argue that $\mathrm{Val}^\sigma(p^j_{i,1}) \sqsubset \mathrm{Val}^\sigma(o^j_{I(i)})$, as required.
(2) If $m > 1$, then observe that the path that starts at $o^j_{I(i)}$ and follows $\sigma$ will
eventually reach $r^j$, and again the largest priority on this path is strictly smaller than
$P(7, 0, 0, 0, 0)$. Hence, we can apply part 2 of Lemma A.2 to argue that $\mathrm{Val}^\sigma(p^j_{i,1}) \sqsubset
\mathrm{Val}^\sigma(o^j_{I(i)})$, as required.
Lemma H.2. Let $\sigma$ be a strategy that agrees with $\chi^{B,K,j}_m$ for some $K, B \in \{0, 1\}^n$, some
$j \in \{0, 1\}^n$, and some $m$ in the range $1 \le m \le \mathrm{Delay}(j, K) - 1$. For each $i \in$ Input/Output,
the greedy all-switches strategy improvement algorithm will switch $p^{1-j}_i$ to $\chi^{B,K,j}_{m+1}(p^{1-j}_i)$.
Proof. The definition given in Equation (4.8) specifies that the edge to $p^{1-j}_i$ should be the
most appealing edge at $p^{1-j}_i$. The path that starts at $o^{1-j}_{I(i)}$ and follows $\sigma$ must eventually
arrive at either $s^{1-j}$ or $r^{1-j}$. In particular, observe that $r^j$ cannot be reached due to the
vertices $q^j_{i,0}$, which by Lemma B.2 select the edge towards $q^j_{i,1}$. On the other hand, the path
that starts at $p^{1-j}_{i,1}$ moves directly to $r^j$. Moreover the largest priorities on both of these
paths are strictly smaller than $P(7, 0, 0, 0, 0)$. Hence, we can apply Lemmas A.1 and A.2 to
argue that $\mathrm{Val}^\sigma(o^j_{I(i)}) \sqsubset \mathrm{Val}^\sigma(r^j)$.
Appendix I. The vertices hli,0
In this section we show that the vertices hli,0 correctly switch to the outgoing edge specified
by χB,K,jm+1 . The first lemma considers the case where l = j, while the second lemma considers
the case where l = 1− j.
Lemma I.1. Let $\sigma$ be a strategy that agrees with $\chi^{B,K,j}_m$ for some $K, B \in \{0, 1\}^n$, some
$j \in \{0, 1\}^n$, and some $m$ in the range $1 \le m \le \mathrm{Delay}(j, K) - 1$. For each $i \in$ Input/Output,
the greedy all-switches strategy improvement algorithm will switch $h^j_{i,0}$ to $\chi^{B,K,j}_{m+1}(h^j_{i,0})$.
Proof. We must show that the most appealing edge at $h^j_{i,0}$ should be $h^j_{i,1}$. Observe that both
$h^j_{i,1}$ and $h^j_{i,2}$ move directly to $r^j$ and $r^{1-j}$, respectively. Moreover, the priorities assigned to
these vertices are strictly smaller than $P(7, 0, 0, 0, 0)$. Since $h^j_{i,1}$ moves to $r^j$, we can apply
part 2 of Lemma A.2 to prove that $h^j_{i,1}$ is the most appealing edge at $h^j_{i,0}$.
Lemma I.2. Let $\sigma$ be a strategy that agrees with $\chi^{B,K,j}_m$ for some $K, B \in \{0, 1\}^n$, some
$j \in \{0, 1\}^n$, and some $m$ in the range $1 \le m \le \mathrm{Delay}(j, K) - 1$. The greedy all-switches
strategy improvement algorithm will switch $h^{1-j}_{i,0}$ to $\chi^{B,K,j}_{m+1}(h^{1-j}_{i,0})$.
Proof. We must show that the most appealing edge at $h^j_{i,0}$ should be $h^j_{i,1}$. Observe that
both $h^{1-j}_{i,1}$ and $h^{1-j}_{i,2}$ move directly to $r^{1-j}$ and $r^j$, respectively. Moreover, the priorities
assigned to these vertices are strictly smaller than $P(7, 0, 0, 0, 0)$. We will use these facts
in order to apply Lemma A.2 in the following case analysis. According to Equation (4.9),
there are two cases to consider. Since $h^j_{i,1}$ moves to $r^j$, we can apply part 2 of Lemma A.2
to prove that $h^j_{i,1}$ is the most appealing edge at $h^j_{i,0}$.
Appendix J. Input/output gates
In this section we show that the vertices in the input/output gadgets correctly switch to
the outgoing edge specified by χB,K,jm+1 . The first two lemmas deal with the case where the
input/output gadget for circuit j resets. Note that this occurs one iteration later than
the rest of the vertices in circuit j, which is why we prove separate lemmas for this case.
Otherwise, the input/output gadgets behave as if they are Not gates, so the proofs that
we have already given for the Not gates can be applied with only minor changes. This
is formalized in the final two lemmas of this section. The first of these lemmas considers
the input/output gadgets in circuit j and the second considers the input/output gadgets in
circuit 1− j.
Lemma J.1. Let $\sigma$ be a strategy that agrees with $\chi^{B,K,j}_m$ for some $K, B \in \{0, 1\}^n$, some
$j \in \{0, 1\}^n$, and for $m = 1$. For each Input/Output-gate $i$, and each $l$ in the range
$1 \le l < 2k + 4n + 6$, greedy all-switches strategy improvement will switch $t^j_{i,l}$ to $\chi^{B,K,j}_{m+1}(t^j_{i,l})$.
Proof. We must show that the edge to $z^j$ is the most appealing edge at $t^j_{i,l}$. All paths that
start at $t^j_{i,l}$ and follow $\sigma$ will eventually arrive at $r^{1-j}$, either via $y^j$, or via $p^j_i$. On the
other hand, the path that starts at $z^j$ moves directly to $s^j$. Therefore, part 2 of Lemma A.2
implies that the edge to $z^j$ is the most appealing edge at $t^j_{i,l}$.
Lemma J.2. Let $\sigma$ be a strategy that agrees with $\chi^{B,K,j}_m$ for some $K, B \in \{0, 1\}^n$, some
$j \in \{0, 1\}^n$, and for $m = 1$. For each Input/Output-gate $i$, greedy all-switches strategy
improvement will switch $d^j_i$ to $\chi^{B,K,j}_{m+1}(d^j_i)$.
Proof. This proof uses the same argument as the proof of Lemma J.1, because all paths
that start at $d^j_i$ will eventually arrive at $r^{1-j}$.
Lemma J.3. Let $\sigma$ be a strategy that agrees with $\chi^{B,K,j}_m$ for some $K, B \in \{0, 1\}^n$, some
$j \in \{0, 1\}^n$, and for $m$ in the range $2 \le m \le \mathrm{Delay}(j, K) - 1$. For each Input/Output-gate
$i$, and each $l$ in the range $1 \le l < 2k + 4n + 6$, greedy all-switches strategy improvement
will switch $t^j_{i,l}$ to $\chi^{B,K,j}_{m+1}(t^j_{i,l})$ and $d^j_i$ to $\chi^{B,K,j}_{m+1}(d^j_i)$.
Proof. Since $m \ge 2$, we have that $\sigma(y^j) = r^j$ and $\sigma(z^j) = s^j$. Hence, both $t^j_{i,l}$ and $d^j_i$ behave
in exactly the same way as the states $t^j_{i',l}$ and $d^j_{i',l}$ for $i' \in$ Not, with the exception that
these states are one step behind the Not-gate vertices, due to the delay introduced by $y^j$. Note,
however, that this is accounted for by placing the edge to $p^j_i$ on $t^j_{i,d(C)}$, rather than $t^j_{i,d(C)+1}$,
as would be expected for a Not-gate with depth $d(C) + 1$. Therefore, to prove this lemma,
we can use exactly the reasoning that we gave for Lemmas D.1, D.2, E.1, E.2, E.3, E.5, and E.4.
This is because all of the reasoning used there is done relative to $r^j$ and $s^j$, and since $y^j$
and $z^j$ have insignificant priorities, none of this reasoning changes.
Lemma J.4. Let $\sigma$ be a strategy that agrees with $\chi^{B,K,j}_m$ for some $K, B \in \{0, 1\}^n$, some
$j \in \{0, 1\}^n$, and for $m$ in the range $2 \le m < \mathrm{Delay}(j, K) - 1$. For each Input/Output-gate
$i$, and each $l$ in the range $1 \le l < 2k + 4n + 6$, greedy all-switches strategy improvement
will switch $t^{1-j}_{i,l}$ to $\chi^{B,K,j}_{m+1}(t^{1-j}_{i,l})$ and $d^{1-j}_i$ to $\chi^{B,K,j}_{m+1}(d^{1-j}_i)$.
Proof. Much like the proof of Lemma J.3, this claim can be proved using essentially the
same reasoning as was given for Lemmas D.1, D.2, E.1, E.2, E.3, E.5, and E.4, because
an Input/Output-gate behaves exactly like a Not-gate.
In particular, note that since we have defined $\chi^{B,K,0}_{\mathrm{Delay}(0,K)} = \chi^{B,K,1}_1$ and $\chi^{B,K,1}_{\mathrm{Delay}(1,K)} =
\chi^{B,K+1,0}_1$, the gate gadgets in circuit $1 - j$ continue to have well-defined strategies, so the
$d^{1-j}_i$ will continue to act like a Not-gate between iterations 1 and 2.
The one point that we must pay attention to is that, in the transition between $\chi^{B,K,j}_1$
and $\chi^{B,K,j}_2$, the state $y^{1-j}$ switches from $r^{1-j}$ to $r^j$. Note, however, that $h^j_{i,0}$ and $p^j_i$
both switch to a vertex that eventually leads to $r^j$ at exactly the same time, so all paths that
exit the gadget will switch from $r^{1-j}$ to $r^j$. Since almost all of the reasoning in the above
lemmas is done relative to $r^j$, the relative orders over the edge appeals cannot change.
The only arguments that must be changed are the ones that depend on Lemma B.8.
Here, the fact that the priority $P(5, i, 2k + 4n + 5, j, 0)$ is assigned to $p^{1-j}_{i,1}$ is sufficient to
ensure that $\mathrm{Val}^\sigma(r^j) \sqsubset \mathrm{Val}^\sigma(t^{1-j}_{i,d(C)})$, so the deceleration lane will continue switching. On
the other hand, since $P(5, i, 2k + 4n + 5, j, 0) < P(7, 0, 0, 0, 0)$, this priority is not large
enough to cause $d^j_i$ to switch away from $e^j_i$, if $B_i = 1$.
