Abstract-Given two pushdown automata, the bisimilarity problem asks whether the infinite transition systems they induce are bisimilar. While this problem is known to be decidable, our main result states that it is nonelementary, improving on EXPTIME-hardness, which was the best previously known lower bound for this problem. Our lower bound result holds for normed pushdown automata as well.
I. INTRODUCTION
A central problem in theoretical computer science is to decide whether two machines or systems behave equivalently. Equivalence checking is the problem of determining whether two systems are semantically identical. Although this problem is undecidable in general for Turing machines, a lot of research has been devoted to finding subclasses of machines for which it becomes decidable.
It is well-known that even language equivalence of pushdown automata is undecidable, in fact already their universality is undecidable. On the positive side, a celebrated result due to Sénizergues states that language equivalence of deterministic pushdown automata is decidable [1] . The best known upper bound for the latter problem is a tower of exponentials [2] (see [3] for a more recent proof), while only hardness of deterministic polynomial time is known to date.
Among the numerous notions of equivalence [4] in the realm of formal verification and concurrency theory, the central one is bisimulation equivalence (bisimilarity for short), which enjoys pleasant mathematical properties. It plays a distinguished role: there are important characterizations of the bisimulation-invariant fragments of first-order logic and of monadic second-order logic in terms of modal logic [5] and of the modal μ-calculus [6], respectively. In particular, bisimilarity is a fundamental notion for process algebraic formalisms [7]. As a result, a great deal of research in the analysis of infinite-state systems (such as pushdown automata or Petri nets) has been devoted to deciding bisimilarity of two given processes, see e.g. [8] for a comprehensive overview.
A milestone result in this context was proven by Sénizergues: bisimilarity on pushdown systems (i.e. transition systems induced by pushdown automata) is decidable [9]. Since a pushdown system can be viewed as an abstraction of the call-and-return behavior of a recursive program, this result can be read as saying that equivalence of recursive programs, in terms of their visible behavior, is decidable.
In [9] bisimilarity is proven to be decidable even for the more general class of equational graphs of finite out-degree.
Concerning decidability, this result can in some sense be considered best possible, since on the slightly more general classes of type-1 rewrite systems [10] and order-two pushdown graphs [11] bisimilarity becomes undecidable.
Sénizergues' algorithm for deciding bisimilarity of pushdown systems consists of two semi-decision procedures, and in fact no complexity-theoretic upper bound is known for this problem to date. On the other hand, the best known lower bound for this problem is EXPTIME-hardness, shown by Kučera and Mayr [12]. In [13] EXPTIME-hardness has been established even for the subclass of basic process algebras, for which a 2EXPTIME upper bound is known [14] (in [15] a simpler proof has recently been announced). Such complexity gaps are typical in the context of infinite-state systems.
In fact, in cases where decidability is known, the precise computational complexity of bisimilarity on infinite-state systems is known only for a few classes, including basic parallel processes (communication-free Petri nets) [16] and one-counter systems (the transition systems induced by pushdown automata over a singleton stack alphabet) [17].
Our contribution. The main result of this paper states that bisimilarity of (systems induced by) pushdown automata is nonelementary, even in the normed case. We give small descriptions of pushdown systems on which a bisimulation game is implemented that allows one to push and verify encodings of nonelementarily big counters à la Stockmeyer [18]. As an important technical tool we realize deterministic verification phases in the bisimulation game by simulating non-erasing real-time transducers that are fed with the stack content. As building blocks, we use the well-established technique of Defender's forcing [10]. We are optimistic that our technique gives new insights for potential further lower bounds for bisimilarity of PA processes, regularity for pushdown systems, and weak bisimilarity of basic process algebras.
Organisation. In Section II we introduce preliminaries. Section III overviews the ideas in the proof. In Section IV we recall basics on transductions, introduce useful abbreviations for pushdown rules, and recall the "forcing" technique. Section V explains the key construction that allows a check for bisimilarity to model manipulations on counters for large integers. Section VI consists of our nonelementary lower bound proof for bisimilarity of pushdown automata, while Section VII extends this to PDAs that satisfy an additional condition, normedness. Section VIII gives conclusions.
II. PRELIMINARIES
A (real-time and non-erasing) transducer on Σ and Υ is a tuple T = (Q, q_0, Σ, Υ, δ), where Q is a finite set of states, q_0 ∈ Q is an initial state, Σ and Υ are finite alphabets, and δ : Q × Σ → Q × Υ^+ is a transition function with output. We say that T is letter-to-letter if δ(q, a) ∈ Q × Υ for each q ∈ Q and each a ∈ Σ. We inductively extend δ to the function
* is said to be letter-to-letter if T is. We define the size of T
A pushdown automaton (PDA) is a tuple P = (Q, Γ, Act, − →), where Q is a finite set of control states, Γ is a finite set of stack symbols, Act is a finite set of actions,
is a finite set of internal rules, push rules, and pop rules, respectively. The size of P is defined as |P|
− → q w} for each a ∈ Act. We will abbreviate each configuration (q, w) in S(P) by qw; in particular the configuration (q, ε) will be denoted by just q.
Given a PDA P = (Q, Γ, Act, − →), q_1, q_2 ∈ Q and w_1, w_2 ∈ Γ*, the PDA bisimilarity problem asks whether q_1 w_1 ∼ q_2 w_2 holds in S(P). In this paper we prove the following theorem:
Theorem 1. PDA bisimilarity is nonelementary.
III. PROOF OVERVIEW
For a start, let us recall that bisimilarity has a very natural game-theoretic account. Given two labelled transition systems, one can consider a bisimulation game involving two players, traditionally called Attacker and Defender respectively. They play rounds, in which Attacker fires a transition from one of the systems and Defender has to follow with an identically labelled transition from the other system. In the first round, the chosen transitions must lead from the states to be tested for bisimilarity, while, in each subsequent round, they must start at the states reached after the preceding round. Defender loses if she cannot find a matching transition. In this framework, bisimilarity corresponds to the existence of a winning strategy for Defender.
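On a finite labelled transition system, the existence of a winning strategy for Defender can be computed as a greatest fixpoint. The following is a minimal sketch under that finiteness assumption (the systems studied in this paper are infinite, so this is only an illustration of the game); the function name `bisimilarity` and the example states are hypothetical.

```python
from itertools import product


def bisimilarity(states, transitions):
    """Return the largest bisimulation on a finite LTS as a set of pairs.

    transitions is a set of (source, label, target) triples.
    """
    succ = {}
    for source, label, target in transitions:
        succ.setdefault(source, set()).add((label, target))

    def defender_can_match(p, q, rel):
        # Every Attacker move from p needs an equally labelled answer
        # from q that stays inside the current candidate relation.
        return all(
            any(b == a and (p2, q2) in rel for b, q2 in succ.get(q, ()))
            for a, p2 in succ.get(p, ())
        )

    # Start from all pairs and discard positions that Attacker wins.
    rel = set(product(states, repeat=2))
    while True:
        losing = {
            (p, q)
            for p, q in rel
            if not (defender_can_match(p, q, rel)
                    and defender_can_match(q, p, rel))
        }
        if not losing:
            return rel
        rel -= losing


# a.(b + c) on the left versus a.b + a.c on the right.
states = {"p0", "p1", "p2", "p3", "q0", "q1", "q2", "q3", "q4"}
transitions = {
    ("p0", "a", "p1"), ("p1", "b", "p2"), ("p1", "c", "p3"),
    ("q0", "a", "q1"), ("q0", "a", "q2"),
    ("q1", "b", "q3"), ("q2", "c", "q4"),
}
rel = bisimilarity(states, transitions)
print(("p0", "q0") in rel)  # False: Attacker plays a, then b or c
print(("p2", "q3") in rel)  # True: both states are deadlocked
```

In the example, Attacker wins from (p0, q0) by moving to p1 and then firing whichever of b, c Defender's chosen state cannot answer.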
The game-theoretic reading suggests an intuitive way of reducing halting problems for Turing machines to bisimulation problems, based on constructing bisimulation games that satisfy the following condition: a given Turing machine halts on an input string if and only if Defender has a winning strategy. Such games can be viewed as a competition between the players, in which Defender is given an opportunity to exhibit an accepting run and Attacker is equipped with mechanisms to challenge (and verify) the correctness of Defender's construction. The effect of constructing a run by Defender is achieved by allowing Defender to make choices during the game. As the process of playing a bisimulation game naturally favours Attacker as the decision maker, it is not actually clear that the game can be used to express Defender's choice. Nevertheless, it turns out that thanks to the forcing technique of [10] , it is possible to construct transition systems in which Defender effectively ends up making choices.
When proving hardness of bisimilarity for classes of computational models, such as pushdown automata, the positions of bisimulation games discussed above must correspond to configurations of the machines. In particular, this means that during the game, players can be thought of as having access to the associated computational resources. For example, in our case, Defender will make moves that store her proposed accepting run on the stack. Additionally, the game can also store some information in the control state of the pushdown system, but since we are interested in finding polynomial-time reductions, the set of control states has to be of polynomial size.
Next we give more intuition for our argument by discussing how PSPACE computations can be modelled through bisimulation games, following the argument of Kučera and Mayr [12] (their argument is for EXPTIME, which is equal to alternating PSPACE, but we omit alternation from the discussion, because alternating computation will not be used in our main argument). Let us consider a PSPACE machine M and an input word. We can code the tape configuration of such a machine by a stack of polynomial size, and we will thus naturally consider a reduction that produces a pair of PDAs (in fact, they are the same PDA but with a different initial state) whose stack configurations at any point represent alleged sequences of configurations of M with separators (older configurations occur deeper in the stack). The PDA will have moves that can push new tape symbols of the machine M on the stacks of each configuration, and we can rely on Defender's forcing to delegate the choice of such moves to Defender. The control state can be used to make sure that tapes are the correct size, because each configuration is of polynomial size and we can afford to create polynomially many control states as part of a polynomial-time reduction.
In order to check that Defender's choices amount to a computation history, the PDA is able to move into a "verification mode": at this point, suppose the top of the stacks corresponds to a cell having position i at time t + 1; the top stack symbol σ is saved in the control state, the stack is popped until the top element corresponds to cell position i at time t, and then the symbol appearing is compared to σ: if the symbol corresponds to what the transition relation of the machine says it should be, the machines behave in a bisimilar manner, and otherwise they do not. Note that in order to support popping from position i at time t + 1 to position i at time t, a counter will be required. Because in this case only polynomially many steps are needed, the control state space of the PDA can be used for that purpose.
What breaks down in this argument when we try to move to more powerful machines, e.g. EXPSPACE machines? Firstly, tape configurations are now of size 2^n, and hence we can no longer use the control state to verify that the tape configurations are even of the right size. Secondly, the verification of a valid transition can no longer be achieved by having the machines simply pop their stacks in synch with one another: they would not know when they have reached the corresponding cell position at the previous tape configuration.
We deal with the first difficulty by adding counters to every cell in the stack content; thus the code of a tape configuration will consist of a sequence of n address bits followed by a tape content. We can use these address bits to know that the end of a tape configuration has occurred, and thus restrict the machines to have separators between configurations. The fact that the addresses really do represent counters moving up sequentially will need to be verified, but for EXPSPACE this can be done through popping and control states.
The solution to the second difficulty is to perform verification of transitions in a very different way from the PSPACE case. Verification will be carried out only when the machines reach the boundary of a tape configuration. At this point, the machines will first go out of synch by one tape configuration, with one machine popping the stack down to the next configuration marker while the other keeps its stack intact. (Technically, this will be achieved as follows: first, both machines push, in synch, a configuration; then, both machines pop stack symbols, but one of them at half speed, so that one machine obtains the previous stack, whereas the other one effectively pops a configuration.) After this the machines will pop stack symbols, but with one machine emitting symbols corresponding exactly to what it sees, while the other machine emits symbols corresponding to the configuration obtained by applying the transition function to the symbols it sees. Thus, in the second phase, the machines will emit the same symbols exactly when the two successive configurations obey the transition function.
The above idea can be extended from EXPSPACE to k-EXPSPACE inductively. Indices that count up to a given tower of exponentials will now precede each tape symbol. The indices used to capture smaller towers will be embedded into those for larger ones. For instance, assuming that c_0, …, c_{2^n − 1} are the binary strings representing the numbers 0, …, 2^n − 1, sequences of the form c_0 σ_0 c_1 σ_1 ⋯ c_{2^n − 1} σ_{2^n − 1}, where the σ_i's are bits, will be used to represent natural numbers from the interval [0, 2^{2^n} − 1].
The indexing can be used to enforce that the stack consists of tape configurations of the correct size. The verification that counting indices are incremented correctly as well as the verification that the tape configurations obey the transition function, can be done using the technique of going out of synch and reading distinct symbols.
Altogether, we get k-EXPSPACE-hardness for all k, and thus a nonelementary lower bound.
IV. NOTATION AND TECHNIQUES
In order to prove Theorem 1 we are going to show that PDA bisimilarity is k-EXPSPACE-hard for each fixed k ≥ 1.
To that end, given a k-EXPSPACE Turing machine M with an input string, we will construct (in polynomial time) a PDA P such that M accepts the input if and only if certain two configurations of P are bisimilar. We will rely on a number of techniques and notational conventions introduced below.
In this section and the next we will progressively reveal more and more technical details about the special structure of P = (Q, Γ, Act, − →). For a start, we shall assume a certain partitioning of Q.
Suppose B is a finite set. Let •B = {•b : b ∈ B} and B• = {b• : b ∈ B}. We are going to assume that Q is partitioned as follows:
The role of the partition will become clear in a moment.
In the interest of succinctness and readability we will define − → via macro rules, which compactly represent collections of PDA transition rules with a certain role. They take one of the following five shapes:
The various indices on a macro rule (such as T, a_1, …) will be explained shortly. For the moment we mention that the implementation of each macro rule will contribute a number of implicit states (that is, elements of Q_impl) to the automaton. Convention. We assume that no implicit state can be used by two different macros. Moreover, if a state occurs on the left-hand side of one of the rules, it cannot occur on the left-hand side of any other rule, except that if p occurs in the first rule then we allow other rules of the first kind with the same p but different σ.
We explain each of the macro formats next.
A. Single pop with fixed trace
For p, q ∈ Q_main, σ ∈ Γ and a_1, …, a_ℓ ∈ Act, we write
for the sequence of transitions displayed below
B. Transduction of stack content with matching
Our PDA construction will also require the automaton to read certain sequences of action names depending on stack content. This can be conveniently expressed using the language of transducers. In particular, emissions of signals during pop transitions will be important. The next macro will make it easy to specify such activities flexibly.
For p, q ∈ Q main , a regular language L ⊆ Γ * and a transducer T on Γ and Act we write
to stipulate the sequence of transitions described below. They will make sure that, once P reaches configuration py for y ∈ Γ * , the shortest prefix w of y with w ∈ L will be popped, #T (w)# will be read (where # ∈ Act is a special action symbol), and the control state will be changed to q; if y does not have a prefix w with w ∈ L, then y will be popped, and #T (y) will be read.
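The intended effect of this macro on a single configuration can be sketched as follows. This is only an illustration of the behavior just described, not the product construction itself: the stack is modelled as a string with its top at index 0, the language L is given as a regular expression, actions are single characters, and the names `run_transducer`, `pop_with_transduction` and the sample transducer `delta` are hypothetical.

```python
import re


def run_transducer(delta, q0, word):
    """Apply a transducer given as delta[(state, letter)] = (state, output)."""
    state, out = q0, []
    for letter in word:
        state, emitted = delta[(state, letter)]
        out.append(emitted)
    return "".join(out)


def pop_with_transduction(stack, lang_regex, delta, q0):
    """Return (actions read, remaining stack, whether state q is reached)."""
    pattern = re.compile(lang_regex)
    # Look for the shortest prefix w of the stack content with w in L.
    for cut in range(len(stack) + 1):
        w = stack[:cut]
        if pattern.fullmatch(w):
            return "#" + run_transducer(delta, q0, w) + "#", stack[cut:], True
    # No prefix belongs to L: the whole stack is popped, no closing '#'.
    return "#" + run_transducer(delta, q0, stack), "", False


# A one-state transducer that renames a, b, $ to x, y, z.
delta = {
    ("t0", "a"): ("t0", "x"),
    ("t0", "b"): ("t0", "y"),
    ("t0", "$"): ("t0", "z"),
}
found = pop_with_transduction("ab$ab", r"[ab]*\$", delta, "t0")
missing = pop_with_transduction("abab", r"[ab]*\$", delta, "t0")
print(found)    # ('#xyz#', 'ab', True)
print(missing)  # ('#xyxy', '', False)
```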
The transitions are the result of a product construction between a deterministic finite automaton (DFA) accepting L (e.g. the minimal one) and the transducer T . More precisely, let A = (Q A , q 
C. Synchronized pushing
The next macro will use elements of B to construct simultaneously transitions involving both •B and B•. More precisely, given s, t ∈ B and σ_1, …, σ_ℓ ∈ Γ (ℓ ≥ 1), we will write 
D. Forcing
Recall that the PDA P to be constructed in our argument is supposed to enable a bisimulation game on S(P), which will correspond to a step-by-step construction of an accepting run. The run will be represented as a sequence of configurations. During the game Defender will have the power to decide what to add to the sequence, whereas Attacker will be able to initiate correctness checks that can detect mistakes in Defender's choices. To construct parts of P that will allow for such choices at suitable stages, we are going to use two blueprint designs for labeled transition systems: Or-widgets (Defender's forcing) and And-widgets (Attacker's forcing), shown in Figure 1. They express respectively logical disjunction and logical conjunction with respect to bisimulation.
Lemma 2 ([10]). Consider the states and transitions of a widget from Figure 1, viewed as part of a larger LTS in which there are no other outgoing transitions from •s, s• than those shown in the figure and no other transitions involving u_i (1 ≤ i ≤ 3). Then we have: (a) Or-widget: •s ∼ s• if and only if •t_1 ∼ t_1• or •t_2 ∼ t_2•; (b) And-widget: •s ∼ s• if and only if •t_1 ∼ t_1• and •t_2 ∼ t_2•.
In terms of the Defender-Attacker game, if the players reach ( • s, s • ) in the game, the Or-widget allows Defender to decide if the play should continue in ( • t 1 , t 1• ) or ( • t 2 , t 2• ), whereas, in the And-widget, it is Attacker who makes this choice.
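Lemma 2 can be checked mechanically on small finite instances. Since Figure 1 is not reproduced here, the sketch below assumes the usual shape of the forcing gadgets from the literature (three intermediate states u1, u2, u3 for the Or-widget, none for the And-widget); the state names, the labels a, b1, b2, c1, c2 and the ok/ko leaves are all hypothetical. The flags eq1 and eq2 control whether the leaf pairs (•t_1, t_1•) and (•t_2, t_2•) are bisimilar.

```python
def bisimilar(trans, p0, q0):
    """Decide p0 ~ q0 on a finite LTS given as (source, label, target) triples."""
    states = {s for s, _, _ in trans} | {t for _, _, t in trans}
    succ = {s: {(a, t) for (s2, a, t) in trans if s2 == s} for s in states}
    rel = {(p, q) for p in states for q in states}

    def matched(p, q):
        # Each move of p has an equally labelled answer from q inside rel.
        return all(any(a == b and (p2, q2) in rel for b, q2 in succ[q])
                   for a, p2 in succ[p])

    while True:
        losing = {(p, q) for p, q in rel
                  if not (matched(p, q) and matched(q, p))}
        if not losing:
            return (p0, q0) in rel
        rel -= losing


def leaves(eq1, eq2):
    # Left leaves always offer "ok"; a right leaf offers "ok" only if the
    # corresponding pair is meant to be bisimilar.
    return {
        ("Lt1", "ok", "end"), ("Lt2", "ok", "end"),
        ("t1R", "ok" if eq1 else "ko", "end"),
        ("t2R", "ok" if eq2 else "ko", "end"),
    }


def or_widget(eq1, eq2):
    # Assumed Defender's-forcing shape: Attacker must move to u1, and
    # Defender answers with u2 or u3, thereby picking the leaf pair.
    return leaves(eq1, eq2) | {
        ("Ls", "a", "u1"), ("Ls", "a", "u2"), ("Ls", "a", "u3"),
        ("sR", "a", "u2"), ("sR", "a", "u3"),
        ("u1", "b1", "Lt1"), ("u1", "b2", "Lt2"),
        ("u2", "b1", "t1R"), ("u2", "b2", "Lt2"),
        ("u3", "b1", "Lt1"), ("u3", "b2", "t2R"),
    }


def and_widget(eq1, eq2):
    # Assumed Attacker's-forcing shape: the label picks the leaf pair.
    return leaves(eq1, eq2) | {
        ("Ls", "c1", "Lt1"), ("Ls", "c2", "Lt2"),
        ("sR", "c1", "t1R"), ("sR", "c2", "t2R"),
    }


for eq1 in (False, True):
    for eq2 in (False, True):
        print(eq1, eq2,
              bisimilar(or_widget(eq1, eq2), "Ls", "sR"),
              bisimilar(and_widget(eq1, eq2), "Ls", "sR"))
```

Under these assumptions the third printed column behaves as a disjunction of the first two and the fourth as their conjunction, in line with the lemma.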
The next macro is based on the Or-widget (Figure 1(a)). Given σ_1, σ_2 ∈ Γ ∪ {ε} and s, t_1, t_2 ∈ B, we write
to denote a sequence of transitions closely following the widget. It will allow Defender to choose between two (possibly) pushing transitions. More specifically, we want to add the following transitions on the understanding that u 1 , u 2 , u 3 ∈ Q impl :
Note that Lemma 2 concerns labeled transition systems, whereas the definitions above refer to PDAs. Consequently, in order to induce the Or-widget in the LTS S(P) for • s = • sx and s • = s • y, where x, y ∈ Γ * , we will assume that x = y (due to the need to reach the same configuration from • sx and s • y in a single transition). Intuitively, when the state of the Defender-Attacker game is ( • sx, s • x) for x ∈ Γ * , Defender can choose whether the game will proceed to ( 
. , t w }
to denote that a sequence of Or-widgets is used to achieve
Similarly we write
. . , t w }
to denote that And-widgets (Attacker's forcing, Figure 1 (b) ) are used to achieve
Note that the shape of the And-widget does not contain any state synchronizations. Consequently, it does not matter whether the stack content is the same at • s and s • . However, we will not need this level of generality in our argument.
V. COUNTERS
To represent configurations of a k-EXPSPACE Turing machine we shall use binary strings whose length is equal to the tower of exponentials of height k. For technical reasons discussed in Section III, rather than working with raw configurations we shall precede each binary symbol with a number that indicates its position in the string. The following definitions introduce the relevant technical notions.
For each ℓ, n ≥ 0 we define Tower(ℓ, n) inductively as Tower(0, n) = n and Tower(ℓ + 1, n) = 2^{Tower(ℓ, n)}.
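To get a feeling for the growth involved, the tower function can be evaluated for small arguments. The sketch assumes the usual inductive definition Tower(0, n) = n and Tower(ℓ + 1, n) = 2^Tower(ℓ, n); the function name `tower` is ours.

```python
def tower(level, n):
    """Tower of exponentials of the given height with n at the top."""
    return n if level == 0 else 2 ** tower(level - 1, n)


for level in range(4):
    print(level, tower(level, 2))
# 0 2
# 1 4
# 2 16
# 3 65536
```

Already Tower(4, 2) = 2^65536 has almost twenty thousand decimal digits, which is why such values are handled through stack encodings rather than through control states.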
When n is clear from the context, we may speak of an ℓ-counter to mean an (ℓ, n)-counter. Observe that the length of each (ℓ, n)-counter is uniquely determined by ℓ and n. Note also that the set of values taken by (ℓ, n)-counters is equal to [0, Tower(ℓ + 1, n) − 1]. In the two extreme cases (val(c) = 0 or val(c) = Tower(ℓ + 1, n) − 1) we shall call the (ℓ, n)-counters zeros and ones (or, when n is clear from context, "the ones ℓ-counter"), respectively. In the following we write
Binary strings of length Tower(k, n) in which each bit is preceded by a number indicating its position are thus naturally represented as k-counters. Consequently, k-counters will be taken to represent configurations of k-EXPSPACE Turing machines. Because during our construction we will be interested in storing configurations on the stack, from now on we assume that our stack alphabet Γ includes Ω ≤k .
Next we present a construction that enables one to compare two consecutive counters pushed on the stack via bisimulation. Its key idea is the use of transducers to communicate information about stack content as well as to desynchronize the two stacks involved in playing the bisimulation game. It will also illustrate the pL T − → q macro at work. Given an alphabet Ω and a word w ∈ Act * , we write Ω → w to refer to a transducer T w that outputs w on reading each letter of the input string from Ω * .
Lemma 4. Let T_1, T_2 be letter-to-letter transducers on Ω_{≤ℓ+1} and Act for some ℓ ≥ 0. Suppose the definition of P = (Q, Γ, Act, − →) involves, possibly among others, the following macros.
•
pop one ℓ-counter and one Ω_{ℓ+1}-symbol
, and let w_1, w_2, w_3 be ℓ-counters.
Proof:
(last two rules).
Given two transductions f
and i ∈ {1, 2}. We note that from two given transducers T 1 , T 2 with transductions f T1 : Σ *
In what follows we rely on two specific transducers T_{+0}, T_{+1} on Ω_ℓ ∪ Ω_{ℓ+1} and {0, 1, a, b} depicted in Figure 2. Formally they are not transducers, since some outgoing transitions are missing, but the missing transitions will not be relevant later and are therefore omitted. They interpret the input word over Ω_ℓ as a number in binary, with the least significant bit read first. Transducer T_{+0} copies the number and outputs a upon reading an Ω_{ℓ+1}-symbol. Transducer T_{+1} attempts to increase the number by 1 and outputs a upon reading an Ω_{ℓ+1}-symbol, but it will output b if the input number consisted only of 1s. If w_1, w_2 are ℓ-counters and σ_1, σ_2 ∈ Ω_{ℓ+1}, then we have
if and only if val (w 1 ) = val (w 2 ) + 1. Using the two transducers one can verify through bisimilarity whether two counters placed suitably on the stack have consecutive values.
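The behavior of the two transducers can be illustrated on flat bit strings. The sketch below is a simplification under stated assumptions: a counter is modelled as a plain list of bits with the least significant bit first (the nested lower-level counters are ignored), the terminating Ω_{ℓ+1}-symbol is written "$", and the names `encode`, `t_plus0` and `t_plus1` are hypothetical. Figure 2 itself is not reproduced, so the state structure (a single carry flag) is our guess at a natural implementation.

```python
def encode(value, width):
    """Bits of value, least significant first, padded to the given width."""
    return [(value >> i) & 1 for i in range(width)]


def t_plus0(word):
    """Copy every bit and emit 'a' on the terminating symbol."""
    return "".join("a" if sym == "$" else str(sym) for sym in word)


def t_plus1(word):
    """Add one while copying; emit 'b' instead of 'a' on overflow."""
    carry, out = True, []
    for sym in word:
        if sym == "$":
            # The carry is still pending only if all bits were 1.
            out.append("b" if carry else "a")
        elif carry:
            out.append(str(1 - sym))
            carry = sym == 1
        else:
            out.append(str(sym))
    return "".join(out)


width = 3
for v1 in range(2 ** width):
    for v2 in range(2 ** width):
        same = t_plus0(encode(v1, width) + ["$"]) == t_plus1(encode(v2, width) + ["$"])
        assert same == (v1 == v2 + 1)
print(t_plus0(encode(4, width) + ["$"]), t_plus1(encode(3, width) + ["$"]))  # 001a 001a
print(t_plus1(encode(7, width) + ["$"]))  # 000b
```

The exhaustive loop confirms, for 3-bit values, that the two outputs coincide exactly when the first value is the successor of the second.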
Lemma 5. Suppose stop , testDec , testDec 1 ∈ B and the definition of P = (Q, Γ, Act, − →) involves the following macros. 
In the remainder of the paper we will assume a very particular shape of the set B. Suppose I is a finite set of instructions. Then we insist that
It may be helpful to think of α as a bounded sequence of instructions that are manipulated separately from the unbounded pushdown stack. In what follows we shall use β to range over α ∈ I * such that 1 ≤ |α| ≤ k + 1. Our next result shows how to define macro rules for managing counters on the stack. 
VI. REDUCTIONS
We prove Theorem 1 by showing that PDA bisimilarity is k-EXPSPACE-hard for all k ≥ 1.
To that end we introduce a somewhat abstract description of accepting runs, based on transducers. It will be convenient to rely on it when representing a k-EXPSPACE computation through PDA configurations. 
[Figure annotations: "push from Ω_ℓ and a decremented (ℓ − 1)-counter OR push from Ω_ℓ and a zeros (ℓ − 1)-counter".]
Suppose z_0, …, z_t are binary sequences representing configurations of an accepting run of a deterministic Turing machine, i.e. z_0, z_t correspond to initial and accepting configurations respectively and, for any 0 ≤ i < t, z_{i+1} represents the successor configuration with respect to that corresponding to z_i. If we imagine that T_1 is a transducer capable of generating successor configurations and T_2 is a copy-cat transducer, then the relationship between z_i and z_{i+1} boils down to the requirement that T_1(z_i) = T_2(z_{i+1}). This motivates the definition below, where we allow an arbitrary T_2, not just a copy-cat. This will permit T_2 to be a copy-cat but with some delay in outputting configurations, which is necessary for computing successor configurations.
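The requirement T_1(z_i) = T_2(z_{i+1}) is easy to test for concrete letter-to-letter transducers. In the sketch below a binary increment (least significant bit first) stands in for the successor relation of a Turing machine; this stand-in, the dictionary representation, and the names `run`, `COPY`, `INC` and `is_history` are assumptions made purely for illustration.

```python
def run(transducer, word):
    """Output of a letter-to-letter transducer on the given word."""
    delta, state = transducer["delta"], transducer["init"]
    out = []
    for letter in word:
        state, emitted = delta[(state, letter)]
        out.append(emitted)
    return "".join(out)


# T_2: a copy-cat transducer.
COPY = {"init": "c",
        "delta": {("c", "0"): ("c", "0"), ("c", "1"): ("c", "1")}}

# T_1: produces the "successor", here simply the value plus one.
INC = {"init": "carry",
       "delta": {("carry", "0"): ("done", "1"), ("carry", "1"): ("carry", "0"),
                 ("done", "0"): ("done", "0"), ("done", "1"): ("done", "1")}}


def is_history(zs, t1, t2):
    """Check T_1(z_i) = T_2(z_{i+1}) for all consecutive pairs."""
    return all(run(t1, z) == run(t2, z_next) for z, z_next in zip(zs, zs[1:]))


print(is_history(["000", "100", "010", "110"], INC, COPY))  # True
print(is_history(["000", "010"], INC, COPY))                # False
```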
Definition 7. Let T 1 , T 2 be letter-to-letter transducers on {0, 1} and Υ. Let h be in N. We say that the pair (T 1 , T 2 ) is h-terminating if there exist t ∈ N and z 0 , . . . , z t ∈ {0, 1} h such that
Fix for the rest of the paper k ≥ 1. Given that h-terminating pairs of transducers were introduced as a generalization of accepting computation histories, the following result does not come as a surprise.
TRANSREACH(k): Given (n, T_1, T_2), where n ∈ N is presented in unary and (T_1, T_2) is a Tower(k, n)-terminating pair of transducers on {0, 1} and Υ, decide whether last(Tower(k, n), T_1, T_2) = 0^{Tower(k,n)}.
Proposition 8. TRANSREACH(k) is k-EXPSPACE-complete with respect to polynomial-time many-one reductions.
The main result will now follow immediately from reducing TRANSREACH(k) to bisimilarity: Lemma 9. TRANSREACH(k) is polynomial-time reducible to PDA bisimilarity.
Proof: Let us fix an instance (n, T 1 , T 2 ) of TRANSREACH(k). Using notation introduced in Sections IV and V, we construct P = (Q, Γ, Act, − →) next.
The PDA P will be able to push a code of a sequence of words onto the stack, where each word ρ_i is encoded as a k-counter, say w_i, in the obvious way: ρ_i = η(w_i), where η : Ω*_{≤k} → {0, 1}* denotes the homomorphism with η(σ) = val(σ) for σ ∈ Ω_k and η(σ) = ε otherwise. The w_i's will be separated on the stack by the symbol $ = 0_{k+1}, i.e. we shall use Ω_{≤k} ∪ {0_{k+1}} as the stack alphabet.
[Figure annotations: "OR go on to the next z_i"; "rule à la Lemma 4 to test equality with 0".]
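The homomorphism η simply erases all index symbols and keeps the top-level bits. As a small sketch, assume that a stack symbol is represented as a pair (bit, level), so that Ω_j consists of (0, j) and (1, j) and val returns the bit; this representation and the name `eta` are ours, not the paper's.

```python
def eta(word, k):
    """Keep the bits of level-k symbols and erase everything else."""
    return "".join(str(bit) for bit, level in word if level == k)


k = 1
# Two level-0 index bits precede each level-1 content bit.
word = [(0, 0), (0, 0), (1, 1),
        (1, 0), (0, 0), (0, 1)]
separator = (0, k + 1)
print(eta(word, k))                # 10
print(eta(word + [separator], k))  # 10, the separator is erased as well
```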
Formally, P is defined as follows. 
The rules defining − → are those listed in Figure 4 along with the rules from Lemma 5 and Figure 3 . In Figure 4 we include brief intuitions for each of the new rules, referring to the sequence z 0 , z 1 , . . . associated with the transducers. Note that there are no outgoing rules involving stop so as to satisfy the technical condition in Lemma 5.
Altogether, the rules amount to playing a game in which Defender is allowed to construct sequences while Attacker can check whether these represent a Tower(k, n)-terminating sequence z_0, z_1, … ending in 0^{Tower(k,n)}. To prove that the reduction is correct, one first shows that the three conditions below are satisfied, where x ∈ Γ* and w_1, w_2, w_3 are k-counters.
(a) ∼ testFin w_1 $x iff η(w_1) = 0^{Tower(k,n)}.
(b) ∼ testTran w_3 $w_2 $w_1 $x iff T_1(η(w_1)) = T_2(η(w_2)).
(c) ∼ start iff last(Tower(k, n), T_1, T_2) = 0^{Tower(k,n)}.
Applying the last item, we have ∼ start if and only if last(Tower(k, n), T_1, T_2) = 0^{Tower(k,n)}, which completes the reduction.
Observe that the definition of P involves polynomially many macro rules and the size of each is polynomial in (n, T 1 , T 2 ). Because macros can be expanded into ordinary rules in polynomial time, the overall reduction can also be performed in polynomial time.
Proposition 8 and Lemma 9 imply Theorem 1.
VII. NORMEDNESS
We say that a configuration of a PDA is normed if each reachable configuration can reach a deadlock configuration where the stack is empty. Note that if a configuration is normed then every reachable configuration is normed too. Here we strengthen Theorem 1 to show that our lower bound remains valid even if we restrict ourselves to normed configurations.
Recall that in our previous construction, P contains control states • stop and stop • for which we did not provide any outgoing transitions. Hence, once P reaches either of the control states, it will not be possible to empty the stack. Fortunately, the absence of outgoing transitions was not a necessary condition in our argument. Rather, we took advantage of a property that holds vacuously in these configurations:
