Synchronizing Data Words for Register Automata by Quaas, Karin & Shirmohammadi, Mahsa
arXiv:1710.02329v2 [cs.FL] 9 Jun 2019
Synchronizing Data Words for Register Automata
Karin Quaas
Universität Leipzig
Mahsa Shirmohammadi
CNRS & IRIF
Abstract
Register automata (RAs) are finite automata extended with a finite set of registers to store and
compare data from an infinite domain. We study the concept of synchronizing data words in RAs:
does there exist a data word that sends all states of the RA to a single state?
For deterministic RAs with k registers (k-DRAs), we prove that inputting data words with 2k+1
distinct data from the infinite data domain is sufficient to synchronize. We show that the synchro-
nization problem for DRAs is in general PSPACE-complete, and it is NLOGSPACE-complete for
1-DRAs. For nondeterministic RAs (NRAs), we show that Ackermann(n) distinct data (where n is
the size of the RA) might be necessary to synchronize. The synchronization problem for NRAs is
in general undecidable; however, we establish Ackermann-completeness of the problem for 1-NRAs.
Another main result is the NEXPTIME-completeness of the length-bounded synchronization problem
for NRAs, where a bound on the length of the synchronizing data word, written in binary, is given.
A variant of this last construction allows us to prove that the length-bounded universality problem for
NRAs is co-NEXPTIME-complete.
1 Introduction
Given a deterministic finite automaton (DFA), a synchronizing word is a word that sends all states of
the automaton to a unique state. Synchronizing words for finite automata have been studied since the
1970s [8, 25, 30, 23] and are the subject of one of the most well known open problems in automata
theory: the Černý conjecture. This conjecture states that the length of a shortest synchronizing word
for a DFA with n states is at most (n−1)^2. Synchronizing words moreover have applications in planning,
control of discrete event systems, biocomputing, and robotics [3, 30, 15]. More recently the notion has
been generalized from automata to games [21, 28, 20] and infinite-state systems [14, 9], with applications
to modelling complex systems such as distributed data networks or real-time embedded systems.
In this paper we are interested in synchronizing data words for register automata. Data words are
sequences of pairs, where the first element of each pair is taken from a finite alphabet and the second
element is taken from an infinite data domain, such as the natural numbers or ASCII strings. Data words
have applications in querying and reasoning about data models with complex structural properties, e.g.,
XML and graph databases [1, 16, 5, 2]. For reasoning about data words, various formalisms have been
considered, including first-order logic for data words [4, 6], extensions of linear temporal logic [22, 12, 11,
13], data automata [7, 4], register automata [19, 26, 24, 11] and extensions thereof, e.g. [29, 17, 10].
Register automata (RAs) are a generalization of finite automata for processing data words. RAs are
equipped with a finite set of registers that can store data values. While processing a data word such an
automaton can store the datum at the current position in one of its registers; it can also test the current
datum for equality with data already stored in its registers. In applications, RAs allow for handling
parameters such as user names, passwords, identifiers of connections, sessions, etc. RAs come in many
variants, including one-way, two-way, deterministic, nondeterministic, and alternating. For alternating
one-way RAs, classical language-theoretic decision problems, such as emptiness, universality and inclusion
are undecidable. In this paper, we focus on the class of one-way nondeterministic RAs, which have a
decidable emptiness problem [19], and the subclass of nondeterministic RAs with a single register, which
has a decidable universality problem [11].
Semantically, an RA defines an infinite-state transition system due to the unbounded domain for the
data stored in the registers. Synchronizing words were introduced for infinite-state systems with infinite
branching in [14, 28]; in particular, the notion of synchronizing words is motivated and studied for
weighted automata and timed automata. In some infinite-state settings, such as nested-word automata,
finding the right definition of synchronizing word is however more challenging [9]. We define the syn-
chronization problem for RAs within the framework suggested in [14, 28]: given an RA R over a finite
alphabet Σ and an infinite data domain D, does there exist a data word w ∈ (Σ ×D)+ and some state
qw such that the word w sends each of the infinitely many states of R to qw? Note that the state qw
depends on the word w; we call such a data word a synchronizing data word.
Contribution. The problem of finding synchronizing data words for RAs poses new challenges in the
area of synchronization. It is natural to ask how many distinct data are necessary and sufficient to syn-
chronize an RA, which we refer to as the data efficiency of synchronizing data words. We show that the
data efficiency is polynomial in the number of registers for deterministic RAs (DRAs). For nondeterminis-
tic RAs (NRAs), we provide an example that shows that the data efficiency may be Ackermann(n), where
n is the number of states of the NRA. Remarkably, the data efficiency is tightly related to the complexity
of deciding the synchronization problem. For DRAs, we prove that for all automata R with k registers,
if R has a synchronizing data word, then it also has one with data efficiency at most 2k+1. We provide
a family (Rk)k∈N of DRAs with k registers, for which indeed a polynomial data efficiency (in k) is neces-
sary to synchronize. This bound is the base of an (N)PSPACE-algorithm for DRAs; we prove a matching
PSPACE lower bound by ideas carried over from timed settings [14]. We show that the synchroniza-
tion problems for DRAs with a single register (1-DRAs) and for DFAs are NLOGSPACE-interreducible,
implying that the problem is NLOGSPACE-complete for 1-DRAs.
For NRAs, a reduction from the non-universality problem yields the undecidability of the synchro-
nization problem. For single-register NRAs (1-NRAs), we prove Ackermann-completeness of the problem
by a novel construction proving that the synchronization problem and the non-universality problem for
1-NRAs are polynomial-time interreducible. We believe that this technique is useful in studying synchro-
nization in all nondeterministic settings, requiring careful analysis of the size of the construction.
Another main contribution is to prove NEXPTIME-completeness of the length-bounded synchroniza-
tion problem for NRAs: given a bound on the length (written in binary), does there exist a synchronizing
data word with length at most the given bound? For the lower bound, we present a reduction from the
membership problem of O(2^n)-time bounded nondeterministic Turing machines. The crucial ingredient
in this reduction is a family of RAs implementing binary counters. A variant of our construction yields a
proof for co-NEXPTIME-completeness of the length-bounded universality problem for NRAs; the length-
bounded universality problem asks whether all data words of length at most a given bound (written in
binary) are in the language of the automaton. We further make a connection to the emptiness problem
of single-register alternating RAs.
An extended abstract of this article has appeared in the Proceedings of the 41st International Sym-
posium on Mathematical Foundations of Computer Science, (MFCS) 2016 [?]. In comparison with the
extended abstract, here we simplify two of the main constructions and add detailed proofs of all re-
sults. The main improvement is giving a simpler NEXPTIME-hardness reduction for the length-bounded
synchronization problem for NRAs.
2 Preliminaries
A deterministic finite-state automaton (DFA) is a tuple A = 〈Q, Σ,∆〉, where Q is a finite set of states,
Σ is a finite alphabet, and ∆ : Q×Σ→ Q is a transition function that is totally defined. The function ∆
extends to finite words in a natural way: ∆(q,wa) = ∆(∆(q,w), a) for all words w ∈ Σ∗ and letters a ∈ Σ;
it extends to all sets S ⊆ Q by ∆(S,w) = ⋃_{q∈S} ∆(q,w).
Data Words and Register automata. For the rest of this paper, fix an infinite data domain D.
Given a finite alphabet Σ, a data word over Σ is a finite word over Σ × D. For a data word w =
(a1, d1)(a2, d2) · · · (an, dn), the length |w| of w is n. We use data(w) = {d1, . . . , dn} ⊆ D to refer to the
set of data values occurring in w, and we define the data efficiency of w to be |data(w)|.
Let R be a finite set of register variables. We define register constraints φ over R by the grammar
φ ::= true | =r | φ ∧ φ | ¬φ,
where r ∈ R. We denote by Φ(R) the set of all register constraints over R. We may use ≠r for the
inequality constraint ¬(=r). A register valuation is a mapping ν : R → D that assigns a data value to
each register; we sometimes write ν = (ν(r1), · · · , ν(rk)) ∈ D^k, where R = {r1, · · · , rk}. The satisfaction
relation of register constraints is defined on D^k × D as follows: (ν, d) satisfies the constraint =r if ν(r) = d;
the other cases follow. For example, ((d1, d2, d1), d2) satisfies ((=r1) ∧ (=r2)) ∨ (≠r3) if d1 ≠ d2. For
the set up ⊆ R and d ∈ D, we define the update ν[up := d] of valuation ν by (ν[up := d])(r) = d if r ∈ up,
and (ν[up := d])(r) = ν(r) otherwise.
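As an illustration of the definitions above, the following Python sketch evaluates register constraints and applies updates. The class names (TrueC, Eq, And, Not), the helpers neq and or_, and the dictionary encoding of valuations are our own assumptions for this sketch; disjunction is not part of the grammar and is derived here from ¬ and ∧.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, Iterable, Union


@dataclass(frozen=True)
class TrueC:
    """The constraint `true`."""


@dataclass(frozen=True)
class Eq:
    """The constraint `=r` for a register name r."""
    reg: str


@dataclass(frozen=True)
class And:
    left: "Constraint"
    right: "Constraint"


@dataclass(frozen=True)
class Not:
    arg: "Constraint"


Constraint = Union[TrueC, Eq, And, Not]
Valuation = Dict[str, Hashable]  # register name -> stored datum


def neq(reg: str) -> Constraint:
    """Abbreviation `≠r` for ¬(=r)."""
    return Not(Eq(reg))


def or_(a: Constraint, b: Constraint) -> Constraint:
    """Derived disjunction: a ∨ b is ¬(¬a ∧ ¬b)."""
    return Not(And(Not(a), Not(b)))


def satisfies(nu: Valuation, d: Hashable, phi: Constraint) -> bool:
    """Decide whether the pair (nu, d) satisfies the constraint phi."""
    if isinstance(phi, TrueC):
        return True
    if isinstance(phi, Eq):
        return nu[phi.reg] == d
    if isinstance(phi, And):
        return satisfies(nu, d, phi.left) and satisfies(nu, d, phi.right)
    if isinstance(phi, Not):
        return not satisfies(nu, d, phi.arg)
    raise TypeError(f"unknown constraint: {phi!r}")


def update(nu: Valuation, up: Iterable[str], d: Hashable) -> Valuation:
    """Return nu[up := d]: registers in `up` receive d, the others are kept."""
    up = set(up)
    return {r: (d if r in up else v) for r, v in nu.items()}


# The example from the text: ((d1, d2, d1), d2) with d1 ≠ d2.
nu = {"r1": "d1", "r2": "d2", "r3": "d1"}
phi = or_(And(Eq("r1"), Eq("r2")), neq("r3"))
print(satisfies(nu, "d2", phi))
print(update(nu, {"r1", "r3"}, "d5"))
```

The constraint is satisfied through its second disjunct, since r3 stores d1 and the input datum is d2.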
A register automaton (RA) is a tuple R = 〈L,R, Σ,T 〉, where L is a finite set of locations, R is a
finite set of registers, Σ is a finite alphabet and T ⊆ L × Σ × Φ(R) × 2^R × L is a transition relation. We
may use ℓ --[φ a up↓]--> ℓ′ to show transitions (ℓ, a, φ, up, ℓ′) ∈ T. We call ℓ --[φ a up↓]--> ℓ′ an a-transition and φ
the guard of this transition. A guard true is vacuously true and may be omitted. Likewise we may omit
up if up = ∅. We may write r↓ when up = {r} is a singleton set. For NRAs with only one register, we
may simply write = and ≠ for the guards =r and ≠r, respectively, and ↓ for the update r↓.
A configuration of R is a pair (ℓ, ν) ∈ L × D^{|R|} of a location ℓ and a register valuation ν. We
describe the behaviour of R as follows: given a configuration q = (ℓ, ν) and some input (a, d) ∈ Σ × D,
an a-transition ℓ --[φ a up↓]--> ℓ′ may be fired from q if (ν, d) satisfies the constraint φ; then R moves to the
successor configuration q′ = (ℓ′, ν′), where ν′ = ν[up := d] is the update of ν. By post(q, (a, d)), we denote
the set of all successor configurations q′ of q on input (a, d). We extend post to sets S ⊆ L × D^{|R|} of
configurations by post(S, (a, d)) = ⋃_{q∈S} post(q, (a, d)); and we extend post to words by post(S, w·(a, d)) =
post(post(S,w), (a, d)) for all words w ∈ (Σ×D)∗, and all inputs (a, d) ∈ Σ×D.
A run of R over the data word w = (a1, d1)(a2, d2) · · · (an, dn) is a sequence of configurations
q0q1 . . . qn, where qi ∈ post(qi−1, (ai, di)) for all 1 ≤ i ≤ n. If R reaches a configuration q = (ℓ,x)
during processing a word w, we may say that an x-token is in ℓ (or simply a token is in ℓ).
In the rest of the paper, we consider complete RAs, meaning that for all configurations q ∈ L × D^{|R|}
and all inputs (a, d) ∈ Σ × D, there is at least one successor: |post(q, (a, d))| ≥ 1. We also classify
the RAs into deterministic RAs (DRAs) and nondeterministic (NRAs), where an RA is deterministic if
|post(q, (a, d))| ≤ 1 for all configurations q and all inputs (a, d). A k-NRA (k-DRA, respectively) is an
NRA (DRA, respectively) with |R| = k.
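The semantics of post can be sketched in Python as follows. This is a minimal illustration under our own encoding, not a construction from the paper: valuations are tuples indexed by register position, guards are Python predicates on (valuation, datum), and the two-location automaton at the end is a hypothetical example.

```python
from typing import Callable, FrozenSet, Hashable, Iterable, List, Set, Tuple

Valuation = Tuple[Hashable, ...]
Config = Tuple[str, Valuation]
Guard = Callable[[Valuation, Hashable], bool]
# (source location, letter, guard, indices of updated registers, target location)
Transition = Tuple[str, str, Guard, FrozenSet[int], str]


def post(transitions: List[Transition], configs: Iterable[Config],
         letter: str, datum: Hashable) -> Set[Config]:
    """Successor configurations of a set of configurations on input (letter, datum)."""
    successors = set()
    for loc, nu in configs:
        for src, a, guard, up, dst in transitions:
            if src == loc and a == letter and guard(nu, datum):
                # nu[up := datum]
                nu2 = tuple(datum if i in up else v for i, v in enumerate(nu))
                successors.add((dst, nu2))
    return successors


def post_word(transitions: List[Transition], configs: Iterable[Config],
              word: Iterable[Tuple[str, Hashable]]) -> Set[Config]:
    """Extension of post to data words, applied letter by letter."""
    current = set(configs)
    for letter, datum in word:
        current = post(transitions, current, letter, datum)
    return current


# Hypothetical complete and deterministic 1-register RA over the letter "a":
# p stays in p on an equal datum, moves to q and stores the datum otherwise.
example = [
    ("p", "a", lambda nu, d: nu[0] == d, frozenset(), "p"),
    ("p", "a", lambda nu, d: nu[0] != d, frozenset({0}), "q"),
    ("q", "a", lambda nu, d: True, frozenset(), "q"),
]

print(post(example, {("p", (5,))}, "a", 5))
print(post(example, {("p", (5,))}, "a", 7))
```

On any finite sample of configurations and inputs, completeness and determinism amount to every call of post on a single configuration returning exactly one successor.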
Synchronizing words and synchronizing data words. Synchronizing words are a well-studied
concept for DFAs, see, e.g., [30]. Informally, a synchronizing word leads the automaton from every state
to the same state. Formally, the word w ∈ Σ+ is synchronizing for a DFA A = 〈Q, Σ,∆〉 if there exists
some state q ∈ Q such that ∆(Q,w) = {q}. The synchronization problem for DFAs asks, given a DFA A,
whether there exists some synchronizing word for A.
The synchronization problem for DFAs is in NLOGSPACE by using the pairwise synchronization
technique: given a DFA A = 〈Q, Σ,∆〉, it is known that A has a synchronizing word if and only if for all
pairs of states q, q′ ∈ Q, there exists a word v such that ∆(q, v) = ∆(q′, v) (see [30] for more details). The
pairwise synchronization algorithm initially sets S_{|Q|} = Q. For i = |Q| − 1, · · · , 1, the algorithm repeats
the following two steps: (a) For two distinct states q, q′ ∈ S_{i+1}, find v_i such that ∆(q, v_i) = ∆(q′, v_i). (b)
Set S_i = ∆(S_{i+1}, v_i) (and repeat the loop). The word w = v_{|Q|−1} · · · v_1 is synchronizing for A.
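The pairwise synchronization technique described above can be sketched in Python as follows. The function names and the dictionary encoding of ∆ are assumptions of this sketch; merging words are found by a breadth-first search over pairs of states, and the loop stops as soon as the current set is a singleton (so the returned word is empty for a one-state DFA). The four-state example automaton is of the kind used in the literature on the Černý conjecture.

```python
from collections import deque


def run(delta, q, word):
    """Extension of the transition function to words."""
    for a in word:
        q = delta[q, a]
    return q


def merge_word(delta, alphabet, q1, q2):
    """Shortest word v with delta(q1, v) == delta(q2, v), or None if there is none."""
    start = (q1, q2)
    parent = {start: None}  # pair -> (previous pair, letter read)
    queue = deque([start])
    while queue:
        pair = queue.popleft()
        if pair[0] == pair[1]:
            word = []
            while parent[pair] is not None:
                pair, letter = parent[pair]
                word.append(letter)
            return word[::-1]
        for a in alphabet:
            nxt = (delta[pair[0], a], delta[pair[1], a])
            if nxt not in parent:
                parent[nxt] = (pair, a)
                queue.append(nxt)
    return None


def synchronizing_word(states, alphabet, delta):
    """Pairwise synchronization: repeatedly merge two states of the current set."""
    current = set(states)
    word = []
    while len(current) > 1:
        q1, q2 = sorted(current)[:2]
        v = merge_word(delta, alphabet, q1, q2)
        if v is None:
            return None  # some pair cannot be merged: no synchronizing word
        current = {run(delta, q, v) for q in current}
        word.extend(v)
    return word


# Example: letter "a" is a cyclic shift on {0, 1, 2, 3}, letter "b" maps 3 to 0.
states = [0, 1, 2, 3]
delta = {(q, "a"): (q + 1) % 4 for q in states}
delta.update({(q, "b"): (0 if q == 3 else q) for q in states})
w = synchronizing_word(states, "ab", delta)
print("".join(w), {run(delta, q, w) for q in states})
```

The word produced this way is synchronizing but not necessarily a shortest one.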
We introduce synchronizing data words for RAs. Given an RA R = 〈L,R, Σ,T 〉, a data word w ∈
(Σ×D)+ is synchronizing for R if there exists some configuration qw = (ℓ, ν) such that post(L × D^{|R|}, w) =
{qw}. Intuitively, no matter what the starting location and register valuation are, by inputting the data
word w, R will be in the unique successor configuration qw. This configuration qw depends on w. The
synchronization problem for RAs asks, given an RA R over a data domain D, whether there exists some
synchronizing data word for R. The length-bounded synchronization problem for RAs decides, given an
RA R and a bound N ∈ N written in binary, whether there exists some synchronizing data word w for
R satisfying |w| ≤ N .
3 Synchronizing data words for DRAs
In this section, we first show that the synchronization problems for 1-DRAs and DFAs are NLOGSPACE-interreducible,
implying that the problem is NLOGSPACE-complete for 1-DRAs. Next, we prove that
the problem for k-DRAs, in general, can be decided in PSPACE; a reduction similar to a timed setting,
as in [14], provides the matching lower bound. To obtain the complexity upper bounds, we prove that
inputting words with data efficiency 2|R| + 1 is sufficient to synchronize a DRA.
Figure 1: A DRA with registers r1, r2, r3 and the single letter a (omitted from transitions) that can
be synchronized in the configuration (synch, x4) by the data word wsynch = (a,x1)(a,x2)(a,x3)(a,x4) if
{x1,x2,x3,x4} ⊆ D is a set of 4 distinct data. (Diagram omitted; its locations are init, ℓ1, ℓ2, ℓ3, ℓ′1, ℓ′2, ℓ′3 and synch.)
The concept of synchronization requires that all runs of an RA, whatever the initial configuration
(initial location and register valuations), end in the same configuration (ℓsynch, νsynch), only depending on
the synchronizing data word wsynch; formally, post(L × D^{|R|}, wsynch) = {(ℓsynch, νsynch)}. While processing
a synchronizing data word, the infinite set of configurations of RAs must necessarily shrink to a finite
set of configurations. The DRA R with 3 registers depicted in Figure 1 illustrates this phenomenon.
Consider the set {x1,x2,x3} ⊆ D of distinct data values: starting from any of the infinitely many configurations
in {init} × D^3, when processing the data word (a,x1)(a,x2)(a,x3), R will be in a configuration in the
finite set {(ℓ3, (x1,x2,x3)), (ℓ′3, (x1,x2,x3))}. We use this observation to provide a linear bound on the
number of distinct data values that is sufficient for synchronizing DRAs.
In Lemma 1 below, we prove that data words over only |R| distinct data values are sufficient to shrink
the infinite set of all configurations of DRAs to a finite set. We establish this result based on the following
two key facts:
(1) When processing a synchronizing data word wsynch from a configuration (ℓ, ν) with some register
r ∈ R such that ν(r) ∉ data(wsynch), the register r must be updated. Observe that such updates must
happen at inequality-guarded transitions, which themselves must be accessible by inequality-guarded
transitions (possibly with no update). As an example, consider the DRA R in Figure 1, and assume
d1, d2 ∉ data(wsynch). The two runs of R starting from (init, d1, d1, d1) and (init, d2, d2, d2) first take
the transition init --[≠r1 a r1↓]--> ℓ′1 updating register r1. Next, the two runs must take ℓ′1 --[else a r2↓]--> ℓ′2 to
update r2 and ℓ′2 --[else a r3↓]--> ℓ′3 to update r3; otherwise these two runs would never be synchronized in a
single configuration.
(2) Moreover, to shrink the set L × D^{|R|}, for every ℓ ∈ L, one can find a word wℓ that leads the
DRA from {ℓ} × D^{|R|} to some finite set. Since R is deterministic, appending some prefix or suffix to wℓ
achieves the same objective. This allows us to use a variant of the pairwise synchronization technique to
shrink the infinite set L × D^{|R|} to a finite set, by successively inputting wℓ for a location ℓ that appears
with infinitely many data in the current successor set of L × D^{|R|}.
Lemma 1. For all DRAs for which there exist synchronizing data words, there exists some data word w
such that |data(w)| ≤ |R| and post(L × D^{|R|}, w) ⊆ L × (data(w))^{|R|}.
Proof. Let R = 〈L,R, Σ,T 〉 be a DRA on the data domain D with k ≥ 1 registers. Let v be a synchro-
nizing data word for R with N = |data(v)| distinct data. Suppose that k < N ; otherwise the statement
of the lemma trivially holds.
For all 1 ≤ i ≤ k, we say that xi is the i-th datum in the synchronizing data word v = (a1, d1)(a2, d2) · · · (an, dn)
if there exists j ≤ k such that xi = dj, xi ∉ {d1, · · · , d_{j−1}} and |{d1, · · · , dj}| = i. For every i ≤ k,
denote by 〈L, i〉 the set
〈L, i〉 = L × {ν ∈ D^k | ∃R′ ⊆ R · |R′| ≥ i · ∀r ∈ R′ · ν(r) ∈ {x1, · · · ,xi}}.
We claim that for all locations ℓ ∈ L and all 1 ≤ i ≤ k, there exists some data word ui such that
• data(ui) ⊆ {x1,x2, · · · ,xi}, and
• post({ℓ} × D^k, ui) ⊆ 〈L, i〉, meaning that after reading ui all reached configurations have at least i
registers with values from {x1,x2, · · · ,xi}.
For ℓ ∈ L, let wℓ = uk satisfy the above condition. Set S0 = L × D^k and w0 = ε. Then, for all i =
1, · · · , |L|, repeat the following: if there exists some ℓ ∈ L such that ({ℓ} × (D \ {x1, · · · ,xk})^k) ∩ S_{i−1} ≠ ∅,
then set wi = wℓ and Si = post(S_{i−1}, wi). Otherwise set wi = w_{i−1} and Si = S_{i−1}. Observe that
w = (wi)_{1≤i≤|L|} proves the statement of the lemma. It remains to prove the claim.
Proof of Claim. Let ℓ̂ be some location in the DRA R. The proof is by induction on i.
Base of induction. Let wait = {ℓ̂} × (D \ data(v))^k be the set of configurations with location ℓ̂ such
that the data stored in all k registers are not in data(v). Note that for all configurations (ℓ̂, ν) ∈ wait,
the unique run of R starting in (ℓ̂, ν) on (a prefix of) v consists of the same sequence of the following
transitions:
• a prefix of transitions --[(⋀_{r∈R} ≠r) ∅↓]-->, with inequality guards on all registers and with no register
update,
• followed by a transition --[(⋀_{r∈R} ≠r) up↓]-->, with inequality guards on all registers and with an update for
some non-empty set up ⊆ R.
Otherwise, the two runs starting from any pair of configurations (ℓ̂, ν1), (ℓ̂, ν2) ∈ wait with unequal
valuations ν1 ≠ ν2 would end up in distinct configurations, say (ℓ, ν′1), (ℓ, ν′2) with ν′1 ≠ ν′2. This is a
contradiction to the fact that the data word v is synchronizing.
Now let the inequality-guarded transition --[(⋀_{r∈R} ≠r) up↓]-->, updating the registers in up, be fired at the j-th
input (aj, dj) while reading v; see Figure 2. We prove that the data word u1 = (a1,x1)(a2,x1) · · · (aj,x1)
with data(u1) = {x1} guides {ℓ̂} × D^k to a subset in which each configuration has some register with
value x1: post({ℓ̂} × D^k, u1) ⊆ 〈L, 1〉. This phenomenon is depicted in Figure 3 and can be argued as
follows. Observe that x1 = d1 is the first input datum; thus after inputting (a1,x1) the set of successors
is a disjoint union of two branches:
• either at least one register r has datum x1 after the transition --[(⋁_{r∈R} =r) a1]-->. All the following
successors in this branch, on input (a2,x1)(a3,x1) · · · (aj,x1), preserve the datum x1 in the register
r;
• or none of the registers is assigned x1 after the transition --[(⋀_{r∈R} ≠r) a1]-->. By inputting (a2,x1)(a3,x1) · · · (aj,x1),
all the following successors in this branch thus take inequality-guarded transitions, and would not
update any registers, except for the last transition --[(⋀_{r∈R} ≠r) up↓]--> fired by (aj,x1).
The above argument proves that u1 with data(u1) ⊆ {x1} is such that post({ℓ̂} × D^k, u1) ⊆ 〈L, 1〉. The
base of induction holds.
Step of induction. Assume that the induction hypothesis holds for i − 1, namely, there exists some
word u_{i−1} with data(u_{i−1}) ⊆ {x1, · · · ,x_{i−1}} such that post({ℓ̂} × D^k, u_{i−1}) ⊆ 〈L, i − 1〉. To construct ui,
we define the concept of a symbolic state: we say (ℓ, up, ν, j) is a symbolic state if ℓ ∈ L, the set up ⊆ R
of registers is such that |up| ≥ min(j, k) and ν ∈ {x1, · · · ,xj}^k and j ≤ N. The semantics of (ℓ, up, ν, j)
is the following set:
⟦(ℓ, up, ν, j)⟧ = {ℓ} × {ν′ ∈ D^k | ν′(r) = ν(r) if r ∈ up}.
Figure 2: Runs of R over the data word (a1, d1)(a2, d2) · · · (aj, dj). (Diagram omitted.)
Figure 3: Runs of R over the data word u1 = (a1,x1)(a2,x1) · · · (aj,x1). All successors in the branch entered through the guard ⋁_{r∈R} =r preserve the value x1 in the register r that satisfies the guard =r of the first transition. (Diagram omitted.)
Denote by Γ the set of all such symbolic states (ℓ, up, ν, i − 1). By definition, the set Γ is finite. Now we
can construct ui as follows. Let S0 = post({ℓ̂} × D^k, u_{i−1}) and w0 = u_{i−1}. Recall that S0 ⊆ 〈L, i − 1〉 and
observe that S0 ⊆ ⋃_{q∈Γ} ⟦q⟧. Start with j = 0 and, while Sj ≠ ∅, pick a symbolic state q = (ℓ, up, ν, i − 1)
such that ⟦q⟧ ∩ Sj ≠ ∅ and construct a word uq (as explained in the details below) such that
• data(uq) ⊆ {x1,x2, · · · ,xi}, and
• post(⟦q⟧, uq) ⊆ 〈L, i〉.
Let S_{j+1} = post(Sj \ ⟦q⟧, uq) and w_{j+1} = wj · uq. Repeat the loop for j + 1. Observe that ui = w_{j∗},
where j∗ ≤ |S0| is such that S_{j∗} = ∅, satisfies the induction statement.
Below, given a symbolic state q = (ℓ, up, ν, i − 1), the aim is to construct the data word uq. Without
loss of generality, we assume that |up| = i − 1; otherwise uq = u_{i−1}. Let
wait = ⟦(ℓ, up, ν, i − 1)⟧ ∩ ({ℓ} × {ν′ | ν′(r) ∈ D \ data(v) if r ∉ up})
be the set of all configurations in the symbolic state q, where all data stored in the registers r ∉ up are
not in data(v). Similarly to the induction base, no matter what the register valuation in a configuration
in wait looks like, the unique run of R on the synchronizing word v = (a1, d1)(a2, d2) · · · (an, dn) starting
in that configuration takes the same sequence of transitions. Since ν ∈ {x1, · · · ,x_{i−1}}^k, after inputting
successive data from data(v), all successors of configurations in wait are elements of a symbolic state.
For all 0 ≤ j ≤ n, let the symbolic state qj = (ℓj, upj, νj, N) be such that ⟦q0⟧ = ⟦q⟧ ∩ wait, and
post(⟦q_{j−1}⟧, (aj, dj)) ⊆ ⟦qj⟧ if j ≥ 1.
In the sequel, we argue that there exists some 1 ≤ m ≤ n such that, in the sequence of transitions from
one symbolic state to another symbolic state over the prefix (a1, d1)(a2, d2) · · · (am, dm) of v (the first
m inputs), the following holds:
• on inputting (aj, dj) for all 1 ≤ j < m, the transition --[((⋀_{r∈Λj} =r) ∧ (⋀_{r∉Λj} ≠r)) aj Γj↓]--> with Λj, Γj ⊆ up
is taken from q_{j−1} to qj. It implies that ν_{j−1}(r) = dj for all r ∈ Λj, and νj(r) = dj for all r ∈ Γj.
• and on inputting (am, dm), the transition --[((⋀_{r∈Λm} =r) ∧ (⋀_{r∉Λm} ≠r)) am Γm↓]-->, that is taken from q_{m−1}
to qm, is such that Λm ⊆ upm whereas Γm ⊈ upm.
Now from the prefix (a1, d1)(a2, d2) · · · (am, dm) of v, i.e., the first m inputs, and from the set of data
{x1,x2, · · · ,xi}, we construct the word uq = (a1, y1)(a2, y2) · · · (am, ym) for q = (ℓ, up, ν, i − 1) as follows:
for all 1 ≤ j ≤ m,
• if Λj ≠ ∅, i.e., some register r ∈ up already stores the datum dj, then yj = dj.
• if Λj = ∅, i.e., none of the registers r ∈ up stores the datum dj, then yj = d where d ∈
{x1,x2, · · · ,xi} \ {ν_{j−1}(r) | r ∈ up}. The existence of such d is guaranteed since |up| = i − 1
and |{x1,x2, · · · ,xi}| = i. Moreover, since the transitions --[(⋀_{r∈up} ≠r) aj Γj↓]--> have inequality guards
for all registers, changing the datum from dj to yj results in taking the same
transition.
Observe that data(uq) ⊆ {x1, · · · ,xi}. As a result, all registers that are updated along the runs of R
over uq store some datum from {x1, · · · ,xi}. This argument shows that post(⟦q⟧, uq) ⊆ 〈L, i〉. This
concludes the step of induction, and completes the proof.
After reading some word that shrinks the infinite set of configurations of DRAs to a finite set S of
configurations, we generalize the pairwise synchronization technique [30] to finally synchronize configu-
rations in S. By this generalization, we achieve the following Lemma 2, for which the detailed proof can
be found in Appendix 6.
Lemma 2. For all DRAs for which there exist synchronizing data words, there exists a synchronizing
data word w such that |data(w)| ≤ 2|R| + 1.
Given a 1-DRA R, the synchronization problem can be solved as follows. (1) Check that from each
location ℓ an update on the single register is achieved by going through inequality-guarded transitions,
which can be done in NLOGSPACE. Lemma 1 ensures that feeding R consecutively with a single
datum x ∈ D is sufficient for this phase and that the set of successors of L × D is then a subset of L × {x}.
(2) Pick an arbitrary set {x, y, z} of data including x. By Lemma 2 and the pairwise synchronization
technique, the problem reduces to the synchronization problem for DFAs, where data in registers and
input data extend locations and the alphabet: the state set is Q = L × {x, y, z} and the alphabet is Σ × {x, y, z}.
Since a 1-DRA, where all transitions update the register and are guarded with true, is equivalent to a DFA,
we obtain the next theorem.
Theorem 3. The synchronization problem for 1-DRAs is NLOGSPACE-complete.
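Step (2) of the procedure above, the passage from a 1-DRA to a DFA with states L × {x, y, z} and alphabet Σ × {x, y, z}, can be sketched in Python as follows. The encoding of transitions as tuples (source, letter, guard, update flag, target) with guards "true", "=" and "!=" is an assumption of this sketch, and the small automaton at the end is a hypothetical example.

```python
from itertools import product


def guard_holds(guard, stored, d):
    """Evaluate a single-register guard against the stored datum and the input datum."""
    if guard == "true":
        return True
    if guard == "=":
        return stored == d
    if guard == "!=":
        return stored != d
    raise ValueError(f"unknown guard: {guard}")


def dra1_to_dfa(locations, alphabet, transitions, data):
    """Restrict a complete 1-DRA to the finite data set `data`.

    Returns the transition function of a DFA with states (location, stored datum)
    and letters (letter, input datum), as a dictionary.
    """
    delta = {}
    for loc, v, a, d in product(locations, data, alphabet, data):
        enabled = [(upd, dst) for (src, b, g, upd, dst) in transitions
                   if src == loc and b == a and guard_holds(g, v, d)]
        if len(enabled) != 1:
            raise ValueError("the automaton is not deterministic and complete")
        upd, dst = enabled[0]
        delta[(loc, v), (a, d)] = (dst, d if upd else v)
    return delta


# Hypothetical 1-DRA: p loops on an equal datum, otherwise stores it and moves to q;
# q always stores the input datum and stays in q.
transitions = [
    ("p", "a", "=", False, "p"),
    ("p", "a", "!=", True, "q"),
    ("q", "a", "true", True, "q"),
]
data = ("x", "y", "z")
delta = dra1_to_dfa(["p", "q"], ["a"], transitions, data)

# After phase (1) all configurations store x; the letter (a, y) then merges them.
print({delta[(loc, "x"), ("a", "y")] for loc in ("p", "q")})
```

Any DFA synchronization procedure, such as the pairwise technique of Section 2, can then be run on the resulting transition function.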
We provide a family of DRAs, for which a linear bound on the data efficiency of synchronizing data
words, depending on the number of registers, is necessary. This necessary and sufficient bound is crucial
to establish membership of synchronizing DRAs in PSPACE.
Lemma 4. There is a family of single-letter DRAs (Rn)n∈N, with n = |R| registers and O(n) locations,
such that all synchronizing data words have data efficiency Ω(n).
Proof. The family of DRAs Rn (n ∈ N) is defined over an infinite data domain D. The DRA Rn
has n registers and a single letter a. The structure of Rn is composed of two distinguished locations init
and synch and two chains, where each chain has n locations: ℓ1, ℓ2, · · · , ℓn and ℓ′1, ℓ′2, · · · , ℓ′n. The DRA R3
is shown in Figure 1. The only transition in synch is a self-loop with update on all n registers, thus Rn
can only be synchronized in synch. There are two transitions in init, each going to one of the chains:
init --[=r1 a r1↓]--> ℓ1 and init --[≠r1 a r1↓]--> ℓ′1.
Then, post({init} × D^n, (a,x)) = {ℓ1, ℓ′1} × ({x} × D^{n−1}) for all x ∈ D.
From {ℓ1, ℓ′1} × ({x} × D^{n−1}), informally speaking, in both chains the respective i-th locations are
simultaneously reached after inputting i distinct data: for all 1 ≤ i < n, in each ℓi and ℓ′i there are two
transitions. One transition is a self-loop, with a satisfied equality guard on at least one of the registers
r1, . . . , ri updated so far. The other transition goes to the next location ℓ_{i+1} in the chain, with an
inequality guard on all registers r1, r2, · · · , ri updated so far, and an update on the next register r_{i+1}:
ℓi --[(⋁_{r∈{r1,··· ,ri}} =r) a]--> ℓi and ℓi --[(⋀_{r∈{r1,··· ,ri}} ≠r) a r_{i+1}↓]--> ℓ_{i+1},
ℓ′i --[(⋁_{r∈{r1,··· ,ri}} =r) a]--> ℓ′i and ℓ′i --[(⋀_{r∈{r1,··· ,ri}} ≠r) a r_{i+1}↓]--> ℓ′_{i+1}.
At the last locations ℓn and ℓ′n of the two chains, there is one transition with inequality guards on all
registers leaving the chain to synch, and there is one transition which is, again, a self-loop with an equality
constraint for at least one of the registers.
ℓn --[(⋀_{r∈R} ≠r) a R↓]--> synch and ℓn --[else a]--> ℓn,
ℓ′n --[(⋀_{r∈R} ≠r) a R↓]--> synch and ℓ′n --[else a]--> ℓ′n.
By construction, we see that n + 1 distinct data values must be read for reaching synch from the
infinite set {init} × D^n. Since Rn can only be synchronized in synch, all synchronizing data words must
have data efficiency at least n + 1 ∈ Ω(n).
It remains to prove that Rn has indeed some synchronizing word. Let {x1,x2, · · · ,x_{n+1}} be a
set of n + 1 distinct data values and wsynch = (a,x1)(a,x2) · · · (a,xn)(a,x_{n+1}). For the set of
locations L = {init, synch, ℓ1, · · · , ℓn, ℓ′1, · · · , ℓ′n}, observe that post(L × D^n, wsynch) = {(synch, x_{n+1})} and
|data(wsynch)| = n + 1. The proof is complete.
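The behaviour of Rn can be explored with a short Python simulation. This is a sketch under our own encoding (chain locations are pairs ("l", i) and ("lp", i), valuations are tuples, the letter a is left implicit), and it only inspects a finite sample of the infinite configuration space, so it illustrates the statement of Lemma 4 rather than proving it.

```python
from itertools import product


def step(n, config, d):
    """Unique successor of a configuration of R_n on the input datum d."""
    loc, nu = config  # nu[i - 1] is the content of register r_i
    if loc == "synch":
        return ("synch", (d,) * n)            # self-loop updating all registers
    if loc == "init":
        chain = "l" if nu[0] == d else "lp"   # guard =r1 or ≠r1, both update r1
        return ((chain, 1), (d,) + nu[1:])
    chain, i = loc
    if d in nu[:i]:                           # equality with some of r_1, ..., r_i
        return (loc, nu)                      # self-loop, no update
    if i < n:                                 # inequality on r_1, ..., r_i
        return ((chain, i + 1), nu[:i] + (d,) + nu[i + 1:])
    return ("synch", (d,) * n)                # leave the chain, update all registers


def run(n, config, data):
    for d in data:
        config = step(n, config, d)
    return config


def sample_configs(n, pool):
    """All locations combined with all valuations over the finite pool of data."""
    locs = ["init", "synch"] + [(c, i) for c in ("l", "lp") for i in range(1, n + 1)]
    return [(loc, nu) for loc in locs for nu in product(pool, repeat=n)]


n = 3
word = ["x1", "x2", "x3", "x4"]               # n + 1 distinct data
pool = word + ["f1", "f2"]                    # plus two data not occurring in the word
images = {run(n, c, word) for c in sample_configs(n, pool)}
print(images)

# With only n distinct data, a run from init gets stuck at the end of a chain.
print(run(n, ("init", ("f1",) * n), ["x1", "x2", "x3"] * 2))
```

On this sample, every configuration is sent to synch with all registers holding the last datum, whereas a word over only n distinct data leaves the run from init in ℓ′n.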
Theorem 5. The synchronization problem for k-DRAs is PSPACE-complete.
Proof. (Sketch) The synchronization problem for k-DRAs is in PSPACE using the following co-(N)PSPACE
algorithm: (1) Pick a set X = {x1,x2, · · · ,x_{2k+1}} of distinct data values. (2) Guess some location ℓ ∈ L
and check if there is no word w ∈ (Σ × {x1,x2, · · · ,xk})∗ with length |w| ≤ 2k|L||Σ| such that along
firing transitions that are inequality-guarded on all k registers, some registers are not updated. If (2)
is satisfied, then return “no” (meaning that there is no synchronizing data word for the input k-DRA).
Otherwise, (3) guess two configurations q1, q2 ∈ L × X^k such that there is no word w ∈ (Σ × X)∗ with
length |w| ≤ 2(2k+1)|L||Σ| such that |post({q1, q2},w)| = 1. If (3) is satisfied, then the algorithm returns
“no”; otherwise it returns “yes”.
For PSPACE-hardness, we adapt an established reduction (see, e.g., [14]) from the non-emptiness
problem for k-DRAs, see Appendix 6. The result then follows by PSPACE-completeness of the non-emptiness
problem for k-DRAs [11].
4 Synchronizing data words for NRAs
In this section, we study the synchronization problems for NRAs. We slightly update a result in [14]
to present a general reduction from the non-universality problem to the synchronization problem for
NRAs. This reduction proves the undecidability result for the synchronization problem for k-NRAs,
and Ackermann-hardness in 1-NRAs. We then prove that for 1-NRAs, the synchronization and non-
universality problems are indeed interreducible, which completes the picture by Ackermann-completeness
of the synchronization problem for 1-NRAs.
In the nondeterministic synchronization setting, we present two kinds of counting features, which are
useful for later constructions. For the first one, we define a family (Rcounter(n))_{n∈N} of 1-NRAs with size
only linear in n, where an input datum x ∈ D must be read 2^n times to achieve synchronization.
Lemma 6. There is a family of 1-NRAs (Rcounter(n))_{n∈N} with O(n) locations, such that for all synchronizing
data words w, some datum d ∈ data(w) appears in w at least 2^n times.
Proof. (Sketch) The 1-NRA Rcounter(n) shown in Figure 4 encodes a binary counter that ensures that in
every synchronizing data word w some datum x ∈ data(w) appears at least 2^n times. The location synch
has self-loops on all letters, thus, Rcounter(n) can only be synchronized in location synch. Generally
speaking, the counting involves an initializing process and several incrementing processes. The initializing
process is started by firing a ⋆-transition, which places a token, let us say: an x-token, into location zero.
This sets the counter to 0. Note that firing ⋆-transitions is the only way to guide tokens out of reset;
hence, whenever there is some token in reset, a new initializing process must be started. We use this to
enforce a new initializing process whenever some transition is fired that is incorrect with respect to the
incrementing process.
[Figure 4 diagram omitted: locations $2^n, \dots, 2^2, 2^1, 2^0$ and $2^n_c, \dots, 2^2_c, 2^1_c, 2^0_c$, reset (drawn twice), synch and zero; edges labelled with guards and letters among Bit_0, ..., Bit_n, #, ⋆ and Σ.]
Figure 4: A partial picture of the 1-NRA Rcounter(n) (with n ≥ 3) implementing a binary counter. In
order to avoid crossing edges in the figure, we use two copies of the same location reset. All locations
have inequality-guarded self-loops for all letters in Σ\{⋆}. All missing equality-guarded ⋆-transitions are
directed to zero. For all 0 ≤ i < n, missing equality-guarded #-transitions from $2^i_c$ are directed to synch
with an update on the register. All other non-depicted equality-guarded transitions are directed to reset,
and all other non-depicted inequality-guarded transitions are self-loops.
An incrementing process can be set off by inputting the datum x via equality guards. The numbers
1 ≤ m ≤ $2^n$ are represented by placing a copy of the x-token in the locations corresponding to the binary
representation of m. An x-token in location $2^i$ (in $2^i_c$, respectively) means that the i-th least significant
bit in the binary representation is set to 1 (to 0, respectively). First, a Bit_0-transition places a copy of the
x-token in each of $\{2^n_c, \dots, 2^2_c, 2^1_c, 2^0\}$ to represent 0...001. In each incrementation step the x-tokens
are repositioned by firing specific Bit_i-transitions (0 ≤ i ≤ n), following the standard procedure of binary
incrementation. At the end, when a copy of the x-token is located in each of $\{2^n, 2^{n-1}_c, \dots, 2^0_c\}$ (representing
10...0), the #-transitions guide all of these tokens to location synch and finally synchronize Rcounter(n). We
give a detailed explanation of the structure of Rcounter(n) in Appendix 7.
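The counting argument can be replayed with a small Python sketch. It abstracts from tokens and registers and only tracks the counter value. The function name `bit_letter_indices` is hypothetical, and the convention that each incrementation step fires the Bit_i-letter of the bit that flips from 0 to 1 is an assumption made for illustration (the exact transition structure is the one described in Appendix 7). Under this assumption, counting from 0...001 up to 10...0 uses $2^n$ Bit-letters, each of which is read together with the datum x.

```python
def bit_letter_indices(n):
    """Indices i of the Bit_i letters fired while counting from 1 up to 2**n.

    Assumption (illustrative only): every step fires exactly one Bit_i
    letter, where i is the position of the bit that flips from 0 to 1.
    """
    indices = [0]                  # the first Bit_0 letter sets the counter to 0...001
    value = 1
    while value < 2 ** n:
        i = 0
        while (value >> i) & 1:    # trailing 1-bits flip back to 0
            i += 1
        indices.append(i)          # bit i flips from 0 to 1
        value += 1
    return indices


if __name__ == "__main__":
    letters = bit_letter_indices(3)
    print(letters)                 # order in which the Bit letters are fired
    print(len(letters) == 2 ** 3)  # one read of the datum x per Bit letter
```

The sketch only counts letters; it does not model the reset mechanism that forces this order in the automaton.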
We present a second kind of counting feature in RAs that explains the hardness of synchronizing
NRAs, even with a single register. In Lemma 7, we define a family of 1-NRAs (with only O(n) locations),
where tower(n) distinct data must be read to achieve synchronization. Recall from [27] that the function
tower is at level three of the infinite Ackermann hierarchy $(A_k)_{k \in \mathbb{N}}$ of fast-growing functions
$A_k : \mathbb{N} \to \mathbb{N}$, inductively defined by $A_1(n) = 2n$ and
$$A_{k+1}(n) = A_k^{n}(1) = \underbrace{A_k(\dots(A_k}_{n \text{ times}}(1))\dots).$$
Hence, applying $\mathit{doub} \overset{\mathrm{def}}{=} A_1$, $\mathit{exp} \overset{\mathrm{def}}{=} A_2$, and $\mathit{tower} \overset{\mathrm{def}}{=} A_3$, respectively, to some natural number n results in some number
that is double, exponential, and tower, respectively, in n. The function $A_\omega(n) = A_n(n)$ is a non-primitive
recursive Ackermann-like function, defined by diagonalization.
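For small arguments, the first levels of the hierarchy can be evaluated directly. The following Python sketch follows the inductive definition with $A_1(n) = 2n$ and $A_{k+1}(n)$ obtained by applying $A_k$ n times to 1; the function name `ackermann_level` is hypothetical and chosen only for this illustration.

```python
def ackermann_level(k, n):
    """A_k(n), where A_1(n) = 2n and A_{k+1}(n) applies A_k n times to 1."""
    if k == 1:
        return 2 * n
    value = 1
    for _ in range(n):
        value = ackermann_level(k - 1, value)
    return value


if __name__ == "__main__":
    for n in range(1, 5):
        # columns: n, doub(n), exp(n), tower(n)
        print(n, ackermann_level(1, n), ackermann_level(2, n), ackermann_level(3, n))
```

Already level 3 grows too quickly to evaluate beyond very small n, which is the point of the lower bound in Lemma 7.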
Lemma 7. There is a family of 1-NRAs (Rtower(n))n∈N with O(n) locations, such that |data(w)| ≥
tower(n) for all synchronizing data words w.
Proof. The data domain of the family of 1-NRAs (Rtower(n))n∈N is the set N of natural numbers. The alphabet
of Rtower(n) is Σ = {#, ⋆, rep, doub, exp, tow}. The structure of Rtower(n) is composed of n locations
$\mathit{data}_1, \mathit{data}_{1,2}, \dots, \mathit{data}_{1,2,\dots,n}$ and 6 more locations reset, synch, store, rep, waitDoub, waitExp. The general
structure of Rtower(n) is partially depicted in Figure 5. We show that |data(w)| ≥
tower(n) for all synchronizing data words w.
All transitions in synch are self-loops with an update on the register, $\mathit{synch} \xrightarrow{\Sigma\ r\downarrow} \mathit{synch}$; thus, Rtower(n)
can only be synchronized in synch. Moreover, synch is only accessible from store by a #-transition.
Assuming w is one of the shortest synchronizing words, we see that post(L×D,w) = {(synch,x)}, where
w ends with (#,x).
From all locations ℓ ∈ L \ {synch}, we have $\ell \xrightarrow{\star\ r\downarrow} \mathit{data}_1$; we say that ⋆-transitions reset Rtower(n).
Moreover, the only outgoing transition in location reset is the ⋆-transition. Thus, a reset must occur
in order to synchronize Rtower(n). After this forced reset, say on reading (⋆, 1), the set of reached
configurations is $\{(\mathit{data}_1, 1), (\mathit{synch}, 1)\}$. Since resetting is inefficient, we try to avoid it; we call all
transitions leading to reset inefficient.
[Figure 5 diagram omitted: locations reset, $\mathit{data}_1$, $\mathit{data}_{1,2}$, ..., waitTow (the location $\mathit{data}_{1,2,\dots,n}$), waitExp, waitDoub, rep, store and synch; edges labelled with guards and letters among ⋆, #, rep, doub, exp, tow and Σ.]
Figure 5: A partial illustration of the 1-NRA Rtower(n) for n ≥ 3. All ⋆-transitions are directed to $\mathit{data}_1$
with an update on the register. All other non-depicted transitions are directed to reset.
For all locations $\mathit{data}_{1,\dots,i}$ with 1 ≤ i < n, we define the two transitions
$\mathit{data}_{1,\dots,i} \xrightarrow{\neq r\ \mathit{rep}} \mathit{data}_{1,\dots,i+1}$ and $\mathit{data}_{1,\dots,i} \xrightarrow{\neq r\ \mathit{rep}\ r\downarrow} \mathit{data}_{1,\dots,i+1}$.
All other transitions in $\mathit{data}_{1,\dots,i}$ are inefficient and directed to reset. Below, we rename $\mathit{data}_{1,2,\dots,n}$ to
waitTow. We partially depict the transitions from waitTow, waitExp, waitDoub, rep and store in Figure 5.
All transitions are inefficient, except
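• $\mathit{waitTow} \xrightarrow{= r\ \mathit{tow}} \mathit{waitExp}$, $\mathit{waitTow} \xrightarrow{\neq r\ \mathit{tow}} \mathit{waitTow}$, and $\mathit{waitTow} \xrightarrow{\sigma} \mathit{waitTow}$ for all σ ∈ {doub, exp, rep},
• $\mathit{waitExp} \xrightarrow{= r\ \mathit{exp}} \mathit{waitDoub}$, $\mathit{waitExp} \xrightarrow{\mathit{doub}} \mathit{waitExp}$ and $\mathit{waitExp} \xrightarrow{\mathit{rep}} \mathit{waitExp}$,
• $\mathit{waitDoub} \xrightarrow{= r\ \mathit{doub}} \mathit{rep}$, $\mathit{waitDoub} \xrightarrow{\neq r\ \mathit{doub}} \mathit{waitDoub}$ and $\mathit{waitDoub} \xrightarrow{\neq r\ \mathit{rep}} \mathit{waitDoub}$,
• $\mathit{rep} \xrightarrow{\neq r\ \mathit{rep}} \mathit{store}$ and $\mathit{rep} \xrightarrow{\neq r\ \mathit{rep}\ r\downarrow} \mathit{store}$,
• $\mathit{store} \xrightarrow{\mathit{tow}} \mathit{waitExp}$, $\mathit{store} \xrightarrow{\mathit{exp}} \mathit{waitDoub}$, $\mathit{store} \xrightarrow{\neq r\ \mathit{doub}} \mathit{store}$ and $\mathit{store} \xrightarrow{\neq r\ \mathit{rep}} \mathit{store}$, and
• $\mathit{store} \xrightarrow{\#\ r\downarrow} \mathit{synch}$.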
We remark that $\mathit{store} \xrightarrow{\#\ r\downarrow} \mathit{synch}$ is the only #-transition that is not inefficient. This implies that
for efficiently synchronizing Rtower(n), one needs to move all produced tokens to store before firing a
#-transition. The main issue in moving produced tokens, however, is that some inequality-guarded
transitions are unavoidable, and these transitions may replicate the tokens. For example, if one token
is in $\mathit{data}_1$, firing the two transitions $\mathit{data}_1 \xrightarrow{\neq r\ \mathit{rep}} \mathit{data}_{1,2}$ and $\mathit{data}_1 \xrightarrow{\neq r\ \mathit{rep}\ r\downarrow} \mathit{data}_{1,2}$ replicates one token
to two tokens in $\mathit{data}_{1,2}$. Using this, one can implement doubling, exponentialization, and towering of
distinct tokens, as explained in the following.
Doubling: Assume that there are n distinct tokens {1, 2, . . . ,n} in waitDoub. Then the only efficient
transition is $\mathit{waitDoub} \xrightarrow{= r\ \mathit{doub}} \mathit{rep}$. In particular, all {#, exp, tow}-transitions activate a reset. As
a result, as long as some token is in waitDoub, {#, exp, tow}-transitions should be avoided for the sake
of efficiency. This implies that for all 1 ≤ i ≤ n, the i-token in waitDoub can leave the location only
individually on the input (doub, i). Now, inputting (doub, i) moves the i-token to rep. Here the i-token
must immediately move on to store via the inequality-guarded rep-transitions, which will replicate
the i-token into two tokens. Note that we must fire rep-transitions with some "fresh" datum j such that
j ∉ {1, . . . ,n}, otherwise a reset is evoked. (For simplicity, we use j = i + n by convention.) It can now
be easily seen that the only efficient way to guide all n tokens out of waitDoub is by inputting the data
word
$w_{doub}(n) = (\mathit{doub}, 1)(\mathit{rep}, n+1)(\mathit{doub}, 2)(\mathit{rep}, n+2) \dots (\mathit{doub}, n)(\mathit{rep}, 2n)$,
which puts 2n distinct tokens into store.
Exponentialization: Assume there are n distinct tokens {1, 2, . . . ,n} in waitExp. The only efficient
transition is $\mathit{waitExp} \xrightarrow{= r\ \mathit{exp}} \mathit{waitDoub}$. In particular, all {#, tow}-transitions activate a reset, and should
be avoided as long as some token is in waitExp. This implies that for all 1 ≤ i ≤ n, the i-token in waitExp
can leave the location only individually on the input (exp, i). Now, inputting (exp, 1) moves the 1-token
to waitDoub. From above we know that the only efficient way for guiding a single token in waitDoub
towards synchronization is by inputting the data word $w_{doub}(1)$, resulting in two distinct tokens in store:
1 and 2. We can now proceed to remove the 2-token from waitExp by inputting (exp, 2). Note that this
also guides the {1, 2}-tokens residing in store to waitDoub. Again, for efficient synchronization, we must
input the data word $w_{doub}(2)$, which results in four distinct tokens {1, 2, 3, 4} in store. It is now easy to
see that the only efficient way to guide all n tokens out of waitExp is by inputting the data word
$w_{exp}(n) = (\mathit{exp}, 1) \cdot w_{doub}(1) \cdot (\mathit{exp}, 2) \cdot w_{doub}(2) \cdot (\mathit{exp}, 3) \cdot w_{doub}(4) \cdot \ldots \cdot (\mathit{exp}, n) \cdot w_{doub}(2^{n-1})$,
which puts $2^n$ distinct tokens into store.
Towering: Assume there are n distinct tokens {1, 2, . . . ,n} in waitTow. The only efficient transition
is $\mathit{waitTow} \xrightarrow{= r\ \mathit{tow}} \mathit{waitExp}$. In particular, firing #-transitions activates a reset, and should be avoided
as long as some token is in waitTow. This implies that for all 1 ≤ i ≤ n, the i-token in waitTow can
leave the location only individually on the input (tow, i). Now, inputting (tow, 1) moves the 1-token to
waitExp. From above we know that the only efficient way for guiding a single token in waitExp towards
synchronization is by inputting the data word $w_{exp}(1)$, resulting in two distinct tokens in store: 1 and
2. We can now proceed to remove the 2-token from waitTow by inputting (tow, 2). Note that this also
guides the {1, 2}-tokens residing in store to waitExp. Again, for efficient synchronization, we must input
the data word $w_{exp}(2)$, which results in four distinct tokens {1, 2, 3, 4} in store. It is now easy to see that
the only efficient way to guide all n tokens out of waitTow is by inputting the data word
$w_{tow}(n) = (\mathit{tow}, 1) \cdot w_{exp}(1) \cdot (\mathit{tow}, 2) \cdot w_{exp}(2) \cdot (\mathit{tow}, 3) \cdot w_{exp}(4) \cdot \ldots \cdot (\mathit{tow}, n) \cdot w_{exp}(\mathit{tower}(n-1))$,
which puts tower(n) distinct tokens into store.
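The three data words can be generated mechanically from their definitions. The Python sketch below builds them literally as lists of (letter, datum) pairs, using the naming convention of the text (tokens are numbered 1, ..., n and fresh data are i + n), and counts the distinct data they contain. The function names are hypothetical; the sketch only assembles the words and does not simulate Rtower(n) itself.

```python
def w_doub(n):
    """(doub,1)(rep,n+1)(doub,2)(rep,n+2)...(doub,n)(rep,2n)."""
    word = []
    for i in range(1, n + 1):
        word.append(("doub", i))
        word.append(("rep", n + i))   # fresh datum i + n, by convention
    return word


def w_exp(n):
    """(exp,1) w_doub(1) (exp,2) w_doub(2) ... (exp,n) w_doub(2^(n-1))."""
    word = []
    for i in range(1, n + 1):
        word.append(("exp", i))
        word.extend(w_doub(2 ** (i - 1)))
    return word


def tower(n):
    """tower(0) = 1 and tower(n) = 2 ** tower(n - 1)."""
    value = 1
    for _ in range(n):
        value = 2 ** value
    return value


def w_tow(n):
    """(tow,1) w_exp(1) (tow,2) w_exp(2) ... (tow,n) w_exp(tower(n-1))."""
    word = []
    for i in range(1, n + 1):
        word.append(("tow", i))
        word.extend(w_exp(tower(i - 1)))
    return word


def distinct_data(word):
    """Number of distinct data values occurring in a data word."""
    return len({datum for _, datum in word})


if __name__ == "__main__":
    print(w_doub(2))
    print(distinct_data(w_exp(3)), distinct_data(w_tow(3)))
```

Only very small n are practical here, since the length of the generated word for towering grows with tower(n).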
Now, after the (forced) initial reset by firing ⋆-transitions, it is easy to see that the only data word
that advances in synchronizing is (rep, 2)(rep, 3) · · · (rep,n). It replicates the 1-token to n distinct tokens
1, 2, · · · ,n, which are placed into waitTow. From above we know that the only efficient way to guide all
n tokens out of waitTow is by inputting wtow(n), which places tower(n) distinct tokens into store. We can
now fire #-transitions to synchronize Rtower(n) without evoking a reset, but note that due to the equality
guard at the #-transition from store to synch, each of the tower(n) distinct tokens in store can move to
synch only individually. This implies |data(w)| ≥ tower(n) for all synchronizing words w.
We can now use similar ideas as in Lemma 7 for defining a family of 1-NRAs $R_{A_n}(m)$ (n,m ∈ N)
such that all synchronizing data words of $R_{A_n}(m)$ have data efficiency at least $A_n(m)$, where $A_n$ is at
level n of the Ackermann hierarchy. This provides a good intuition that the synchronization problem for
NRAs must be Ackermann-hard, even if the NRA has a single register. In the following, we prove that
the synchronization problem and the non-universality problem for NRAs are interreducible.
Let us first define the non-universality problem for RAs. To define the language of a given NRA R, we
equip it with an initial location ℓin and a set Lf of accepting locations, where, without loss of generality,
we assume that all outgoing transitions from ℓin update all registers. The language L(R) is the set of
all data words w ∈ (Σ × D)∗, for which there is a run from (ℓin, νin) to (ℓf, νf) such that ℓf ∈ Lf and
νin, νf ∈ $D^{|R|}$. The non-universality problem asks, given an RA, whether there exists some data word w
over Σ such that w ∉ L(R). We adapt an established reduction from [14] to obtain the following lemma.
Lemma 8. The non-universality problem is reducible to the synchronization problem for NRAs.
The detailed proof can be found in Appendix 7. As an immediate result of Lemma 8 and the undecid-
ability of the non-universality problem for NRAs (Theorems 2.7 and 5.4 in [11]), we obtain the following
theorem.
Theorem 9. The synchronization problem for NRAs is undecidable.
Next, we present a reduction showing that, for 1-NRAs, the synchronization problem is reducible to
the non-universality problem, providing the tight complexity bounds for the synchronizing problem.
Lemma 10. The synchronization problem is reducible to the non-universality problem for 1-NRAs.
Proof. We establish a reduction from the synchronization problem to the non-universality problem for 1-
NRAs as follows. Given a 1-NRA R = 〈L,R, Σ,T 〉, we construct a 1-NRA Rcomp equipped with an initial
location and a set of accepting locations such that R has some synchronizing word if, and only if, there
exists some data word that is not in L(Rcomp).
First, we see that an analogue of Lemma 1 holds for 1-NRAs: for all 1-NRAs with some synchronizing
data word, there exists some word w with data efficiency 1 such that post(L×D,w) ⊆ L×data(w). For all
locations ℓ ∈ L, such a data word must update the register by firing an inequality-guarded transition that
is reached only via inequality-guarded transitions; this can be checked in NLOGSPACE. Given R, we
assume that such a data word w always exists; otherwise, we define Rcomp to be a 1-NRA with a single
(initial and accepting) location equipped with self-loops for all letters, so that L(Rcomp) = (Σ × D)∗.
Given data(w) = {x}, we say that R has some synchronizing word v if post(L× {x}, v) is a singleton.
Second, we define a data language lang such that data words in this language are encodings of the syn-
chronizing process. Let L = {ℓ1, ℓ2, · · · , ℓn} be the set of locations and x, y two distinct data. Informally,
each data word in lang starts with the
• initial block : a delimiter (⋆, y), the sequence (ℓ1,x), (ℓ2,x), · · · , (ℓn,x) and an input (a, d) ∈ Σ×D
as the beginning of a synchronizing word. The initial block is followed by several
• normal blocks : the delimiter (⋆, y), the set of successor configurations reached from the configu-
rations and the input of the previous block, and the next input (a′, d′) of the synchronizing data
word. The data word finally ends with the
• final block : the delimiter (⋆, y), a single successor configuration reached from the configurations and
the input of the previous block, and the delimiter (⋆, y).
Formally, the language lang is defined over the alphabet Σlang = Σ ∪ L ∪ {⋆} where ⋆ ∉ Σ ∪ L. It
contains all data words u that satisfy the following membership conditions:
1. The data word u starts with (⋆, y)(ℓ1,x)(ℓ2,x) · · · (ℓn,x) for some x, y ∈ D with y ≠ x; this
condition guarantees the correctness of the encoding for the initial block.
2. Let proj(u) be the projection of u onto Σlang (i.e., omitting the data values). Then there exists
some ℓsynch ∈ L where proj(u) ∈ $(\star L^{+} \Sigma)^{+} \star \ell_{synch} \star$. This condition guarantees the right form of
data words to be encodings of synchronizing processes.
The next two conditions guarantee the uniqueness of the delimiter:
3. The letter ⋆ in u occurs only with datum y.
4. No other letter in u occurs with datum y.
The next three conditions guarantee that all the successors that can be reached from configurations
and inputs in each block are correctly inserted in the next block. For all (ℓ,x) ∈ L×D and (a, d) ∈ Σ×D
in the same block,
5. if x = d and there exists a transition $\ell \xrightarrow{= r\ a} \ell'$ (with or without update), then (ℓ′,x) must be in
the next block.
6. if x ≠ d and there exists a transition $\ell \xrightarrow{\neq r\ a} \ell'$, then (ℓ′,x) must be in the next block.
7. if x ≠ d and there exists a transition $\ell \xrightarrow{\neq r\ a\ r\downarrow} \ell'$, then (ℓ′, d) must be in the next block.
By construction, the NRA R has some synchronizing data word if, and only if, lang ≠ ∅. Below, we
construct a 1-NRA Rcomp that accepts the complement of lang. Then, the NRA R has some synchronizing
data word if, and only if, there exists some data word that is not in L(Rcomp).
The 1-NRA Rcomp is the union of several 1-NRAs from the families R1,R2, · · · ,R7,
where a 1-NRA is in the family Ri if it accepts data words violating the i-th membership condition
of lang.
1. Family R1: we add a 1-NRA that accepts data words not starting with (⋆, y)(ℓ1,x), · · · , (ℓn,x).
[Diagram omitted: locations i, 0, 1, ..., n and f, with transitions labelled "⋆ ↓", "≠ r ℓ1 ↓", "= r ℓ2", ..., "= r ℓn", "else" and "Σ′".]
2. Family R2: we add a DFA that accepts data words u such that proj(u) is not in the regular
language $(\star L^{+} \Sigma)^{+} \star \ell_{synch} \star$.
3. Family R3: we add a 1-NRA that accepts data words in which two delimiters ⋆ have different data.
[Diagram omitted: locations 1, 2, 3, with transitions labelled "⋆ ↓", "else", "≠ r ⋆" and "Σ′".]
4. Family R4: we add a 1-NRA that accepts data words in which the datum of the first ⋆ is also used
by some letter other than ⋆.
[Diagram omitted: locations 1, 2, 3, with transitions labelled "⋆ ↓", "else", "= r Σ′ \ {⋆}" and "Σ′".]
5. Family R5: for all transitions $\ell \xrightarrow{= r\ a} \ell'$, we add a 1-NRA that only accepts data words such that
one block contains some (ℓ,x) and (a, d) with x = d where the next block does not have (ℓ′,x).
[Diagram omitted: locations 1 to 7, with transitions labelled "Σ′", "⋆", "L \ {ℓ}", "ℓ ↓", "L", "= r a", "else" and "= r ℓ′".]
6. Family R6: for all transitions $\ell \xrightarrow{\neq r\ a} \ell'$, we add a 1-NRA that only accepts data words such that
one block contains some (ℓ,x) and (a, d) with x ≠ d where the next block does not have (ℓ′,x).
[Diagram omitted: locations 1 to 7, with transitions labelled "Σ′", "⋆", "L \ {ℓ}", "ℓ ↓", "L", "≠ r a", "else" and "= r ℓ′".]
7. Family R7: for all transitions $\ell \xrightarrow{\neq r\ a\ r\downarrow} \ell'$, we add a 1-NRA that only accepts data words such
that one block contains some (ℓ,x) and (a, d) with x ≠ d where the next block does not have (ℓ′, d).
[Diagram omitted: locations 1 to 7, with transitions labelled "Σ′", "⋆", "L \ {ℓ}", "ℓ ↓", "L", "≠ r a ↓", "else" and "= r ℓ′".]
[Figure 6 diagram omitted: an RA over the letters a and b with locations 1 to 6 and synch.]
Figure 6: An RA with synchronizing data word (a,x)(b, y)(b, z) with three distinct data values x, y, z.
The approach of using a unique data value to shrink the infinite set of configurations to a finite subset
only yields synchronizing data words of length greater than 3.
The proof is complete.
By Lemmas 8 and 10 and Ackermann-completeness of the non-universality problem for 1-NRA, which
follows from Theorem 2.7 and the proof of Theorem 5.2 in [11], and the result for counter automata with
incrementing errors in [18], we obtain the following theorem.
Theorem 11. The synchronization problem for 1-NRAs is Ackermann-complete.
5 Length-Bounded synchronizing data words for NRAs
As proved in the previous section, the synchronization problem for NRAs is in general undecidable. In
this section, we study the length-bounded synchronization problem for NRAs, in which the synchronizing
data words are required to be shorter than a given length (written in binary).
To decide the synchronization problem in 1-RAs, both in the deterministic and nondeterministic
setting, we rely on Lemma 1. With this lemma at hand, it was sufficient to search for synchronizing data
words that first input a single datum x (chosen arbitrarily) as many times as necessary to have the set of
successor configurations included in L×{x}. In the next step, this obtained set of successor configurations
was synchronized to a singleton. However, the shortest synchronizing data words do not always follow
this pattern; for an example, see Figure 6. Observe that the data word (a,x)(b, y)(b, z) is synchronizing
with length 3 (not exceeding the bound 3). However, all synchronizing data words that repeat a datum
such as x, to first bring the RA to a finite set, have length at least 4. The example shows that one cannot
rely on the techniques developed in Section 4 to decide the length-bounded synchronization problem for
NRAs.
In this section, we prove
Theorem 12. The length-bounded synchronization problem for NRAs is NEXPTIME-complete.
The NEXPTIME-membership of the length-bounded synchronization problem is straightforward:
guess a data word w shorter than the given length (that is written in binary and thus may be expo-
nential in the length) and check in EXPTIME whether w is synchronizing. Our main contribution is to
prove the NEXPTIME-hardness of this problem, for which in turn, by Lemma 8, it is sufficient to show
that the length-bounded universality problem is co-NEXPTIME-complete. The length-bounded univer-
sality problem asks, given an RA and N ∈ N encoded in binary, whether all data words w with |w| ≤ N
are in the language of the automaton.
Theorem 13. The length-bounded universality problem for NRAs is co-NEXPTIME-complete.
Proof. The length-bounded universality problem for NRAs can be solved in co-NEXPTIME, by guessing
a (possibly exponentially long) data word and checking whether the guessed word is a witness for
non-universality of the RA.
We prove that the complement of the length-bounded universality problem is NEXPTIME-hard.
The proof is a reduction from the membership problem of $O(2^n)$-time bounded nondeterministic Turing
machines: given a nondeterministic Turing machine M and an input word x, decide whether M accepts
x within time bound $2^{|x|}$. This problem is NEXPTIME-complete.
Given a nondeterministic Turing machine M and an input x of length n, we construct an NRA R
equipped with an initial location and a set of accepting locations, and a bound N (encoded in binary)
such that there exists a witness of non-universality w (i.e., w ∉ L(R)) with |w| ≤ N if, and only if, M
has some accepting computation on x within time bound $2^n$.
Let M have the set Q of control states and the tape alphabet Γ. Let us recall that a configuration
of M is a word in the language Γ∗(Q×Γ)Γ∗, where each letter in (Q×Γ)∪Γ encodes a single cell and the
position of the reading/writing head. A computation ρ of M is a sequence c0c1c2 · · · of configurations that
respects the transition function of the Turing machine. Without loss of generality, we assume that M has
a self-loop on all accepting states. Hence for the input x ∈ Γ∗ of length n, all accepting computations ρ
of M are sequences of length exactly $2^n$, and all configurations ci along such a computation are words
ci ∈ Γ∗(Q× Γ)Γ∗ of length at most $2^n$. In the following, we pad the configurations shorter than $2^n$ with
the blank symbol $\sqcup$ at the tail such that the length of all such configurations becomes equal to $2^n$.
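Let $\Sigma_M := \Sigma \cup \Sigma'$, where
$\Sigma = (Q \times \Gamma) \cup \Gamma \cup \dot{(Q \times \Gamma)} \cup \dot{\Gamma} \cup \{\sqcup, \dot{\sqcup}, \#, \star\}$
is such that $\sqcup, \dot{\sqcup}, \#, \star \notin \Gamma$. Here, $\dot{(Q \times \Gamma)}$ and $\dot{\Gamma}$ denote dotted versions of the letters in Q×Γ and Γ; formally
$\{\dot{(q, a)} \mid (q, a) \in Q \times \Gamma\}$ and $\{\dot{a} \mid a \in \Gamma\}$,
and Σ′ will be defined later. Let $K = 2^{3n} + 2^{2n} + 1$. Given a computation $\rho = c_1 \cdots c_{2^n}$, we define
$u(\rho) \in \Sigma^{K}$, roughly speaking, such that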
1. It consists of $2^n$ copies of ρ (with some extra delimiters).
2. Between all consecutive copies of ρ there is a ⋆ delimiter, and u(ρ) starts and ends with ⋆, too.
Hence, there are $2^n + 1$ occurrences of ⋆ in u(ρ).
3. In each copy of ρ, there is a # delimiter between consecutive configurations. Since there are $2^n$
configurations in (each copy of) ρ, the number of # in u(ρ) is $2^n(2^n - 1)$.
4. In the i-th copy of ρ, the letter for the i-th cell of every participating configuration is dotted; all
other letters are non-dotted. Hence, in each copy of ρ there are exactly $2^n$ dotted letters (one in
each configuration of ρ), with distance $2^n + 1$.
5. The distance between two consecutive ⋆ delimiters is $2^{2n} + 2^n - 1$, due to the fact that ρ consists of $2^n$
configurations, each of which has $2^n$ tape cells in turn and is separated from the next configuration
by a # delimiter.
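Figure 7 illustrates an example of u(ρ). Observe that for all $\rho = c_1 \cdots c_{2^n}$, we have
$$|u(\rho)| = \underbrace{2^n}_{\text{number of copies of } \rho} \cdot \underbrace{2^n}_{\text{number of configurations } c_i \text{ in } \rho} \cdot \underbrace{2^n}_{\text{length of } c_i} + \underbrace{2^n(2^n - 1)}_{\text{number of } \#} + \underbrace{(2^n + 1)}_{\text{number of } \star} = K.$$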
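The shape of u(ρ) can be checked on a toy instance. The Python sketch below assembles the projection of u(ρ) from a list of $2^n$ configurations with $2^n$ cells each, following items 1 to 5. The function name `encode`, the use of "*" for ⋆, and the "." suffix marking a dotted letter are conventions chosen for this sketch only; the configurations in the example are dummies and do not come from an actual Turing machine.

```python
def encode(configs):
    """Projection of u(rho) for a computation given as a list of configurations.

    configs: list of 2**n configurations, each a list of 2**n cell letters.
    In the i-th copy, the i-th cell of every configuration is dotted.
    """
    size = len(configs)
    assert all(len(config) == size for config in configs)
    word = ["*"]                                   # u(rho) starts with the delimiter
    for copy in range(size):
        for k, config in enumerate(configs):
            for j, cell in enumerate(config):
                word.append(cell + "." if j == copy else cell)
            if k < size - 1:
                word.append("#")                   # between consecutive configurations
        word.append("*")                           # after every copy of rho
    return word


if __name__ == "__main__":
    n = 2
    dummy = [["a"] * 2 ** n for _ in range(2 ** n)]
    u = encode(dummy)
    print(len(u) == 2 ** (3 * n) + 2 ** (2 * n) + 1)   # length K
    print(u.count("#") == 2 ** n * (2 ** n - 1), u.count("*") == 2 ** n + 1)
```

The same instance also exhibits the spacing of dotted letters: $2^n + 1$ positions apart inside a copy, and $2^n + 2$ positions apart across a ⋆ delimiter.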
We define a data language lang over the alphabet Σ such that data words in this language are faithful
encodings of computations ρ of M over the input word x. In particular, the language contains all data
words v that satisfy the following conditions:
[Figure 7 diagram omitted: successive copies of the encoding, in which the dotted letter shifts by one cell from one copy to the next, annotated with the unique data identifiers carried by the positions.]
Figure 7: Partial encoding of u(ρ) for an accepting $2^2$-time bounded computation ρ of a Turing machine
on a1a2.
6. Let proj(v) be the projection of v into Σ (i.e., omitting the data values). There exists some accepting
computation ρ of M on the input x such that proj(v) = u(ρ).
7. The letters ⋆ and # occur only with a unique datum, say datum 0 (and no other letter occurs with
that datum).
8. For all occurrences of ⋆, for all 1 ≤ i ≤ $2^{2n} + 2^n - 1$, all letters at the i-th positions after each ⋆
must carry the same datum, say datum i. Except for occurrences of #, the datum i is exclusive for
the i-th positions after occurrences of ⋆.
Given a data word v ∈ lang such that proj(v) = u(ρ) for some computation ρ, condition (8) and
previous conditions on u(ρ) entail that for all 1 ≤ j, k ≤ $2^n$ the j-th tape cell in the k-th configuration
ck of all copies of ρ in v carries the same datum (revisit Figure 7). Observe that all data words v ∈ lang
use exactly $2^{2n} + 1$ distinct data values.
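Conditions (7) and (8) fix the data of a word in lang up to renaming. A minimal Python sketch of this assignment, using datum 0 for the delimiters and datum i for the i-th position after a ⋆ as suggested in the text, is given below. The function name `attach_data` and the use of "*" for ⋆ are assumptions for illustration; the input is any projection word of the right shape.

```python
def attach_data(letters):
    """Attach data to a projection word according to conditions (7) and (8).

    Delimiters "*" and "#" carry datum 0; any other letter carries its
    position counted from the last preceding "*".
    """
    data_word = []
    position = 0
    for letter in letters:
        if letter == "*":
            position = 0
            data_word.append((letter, 0))
        else:
            position += 1
            data_word.append((letter, 0 if letter == "#" else position))
    return data_word


if __name__ == "__main__":
    # Toy shape for n = 1: two copies, two configurations of two cells each.
    v = attach_data(list("*aa#aa*aa#aa*"))
    print(v)
    print(len({datum for _, datum in v}))   # number of distinct data values
```

For the toy shape with n = 1, the positions 1, 2, 4 and 5 after each ⋆ hold cell letters, so the word uses $2^{2n} + 1 = 5$ distinct data values, in line with the count stated above.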
By definition of lang, we see that lang is non-empty if, and only if, there is an accepting computation ρ
of M over x. Recall that $\Sigma_M = \Sigma \cup \Sigma'$ (where Σ′ is defined later). Below, we construct a 1-NRA R
over the alphabet ΣM such that the language accepted by R (projected onto Σ, ignoring Σ′ letters) is the
complement of lang. At the end, we examine the existence of N ∈ O(K) such that M has an accepting
computation over x if, and only if, R is (length-bounded) non-universal with respect to the bound N.
The 1-NRA R is the union of several 1-NRAs and DFAs that we describe in the following. Each of
these automata violates one of the necessary conditions for data words v to be in lang.
• We add a DFA that accepts data words v such that proj(v) is not in the regular language $(\star L)^{*} \star$,
where L is defined by
$\bigl( (\Gamma + \dot{\Gamma})^{*} \bigl( (Q \times \Gamma) + \dot{(Q \times \Gamma)} \bigr) (\Gamma + \dot{\Gamma})^{*} (\sqcup + \dot{\sqcup})^{*} \# \bigr)^{*}$.
• We add a DFA that accepts data words v such that proj(v) does not start with
$\star \bigl( \dot{(q_{init}, a_1)}\, a_2 a_3 \dots a_n \sqcup^{*} \# \bigr)$,
where qinit is the initial control state of M and x = a1a2 · · · an is the input. This regular expression
also guarantees that in the first copy of ρ, the first cell is dotted.
• We add a DFA that accepts data words v containing at least two dotted letters between two
consecutive #.
• We add a 1-NRA that accepts data words in which some delimiter occurs with some datum different
from the datum for the first ⋆.
[Diagram omitted: locations 1, 2, 3, with transitions labelled "⋆ ↓", "else", "≠ r ⋆, #" and "Σ".]
• We add a 1-NRA that accepts data words in which some other letter appears with the datum
dedicated to delimiters ⋆ and #.
[Diagram omitted: locations 1, 2, 3, with transitions labelled "⋆ ↓", "else", "= r Σ \ {⋆, #}" and "Σ".]
• We add a 1-NRA that accepts data words in which there are two letters (other than #) between two
consecutive ⋆ that carry the same datum.
[Diagram omitted: locations 1 to 4, with transitions labelled "Σ", "⋆", "Σ \ {#, ⋆} ↓", "else" and "= r Σ \ {⋆, #}".]
• We add a 1-NRA that accepts data words v such that there are two consecutive # whose distance is
not exactly $2^n$ (ignoring the occurrences of ⋆). For this we use a variant of Rcounter(n) implementing
a binary counter introduced in Section 4. For accepting data words v such that the distance between
two consecutive # is less than $2^n$, we add a transition
$2^n_c \xrightarrow{\#} \ell_f$,
and for accepting those words in which the distance is more than $2^n$, we add a transition
$2^n \xrightarrow{\Sigma} \ell_f$.
Here, ℓf is an accepting location with a self-loop for every letter in Σ.
For the next four 1-NRAs we can use simple variants of Rcounter(n):
• We add a 1-NRA that accepts data words v such that between two consecutive ⋆, the letter # does
not occur exactly $2^n - 1$ times.
• We add a 1-NRA that accepts data words v such that ⋆ does not occur exactly $2^n + 1$ times.
• We add a 1-NRA that accepts data words v such that the distance between two consecutive dotted
letters is not exactly $2^n + 1$, if no delimiter ⋆ is seen between these two letters. We add another
1-NRA that accepts data words v such that the distance between two consecutive dotted letters is
not exactly $2^n + 2$ if ⋆ is seen.
• We add a 1-NRA that accepts data words v such that the letters with $2^{2n} + 2^n - 1$ distance carry
different data.
To implement the above binary counters with 1-NRAs, we finally define
$\Sigma' = \{\mathit{Bit}^{d}_i, \mathit{Bit}^{\#}_i, \mathit{Bit}^{\star}_i, \dot{\mathit{Bit}}_i, \mathit{Bit}^{x}_i, \mathit{Bit}^{x}_{i+n} \mid 0 \le i \le n\}$,
where
• the letters $\mathit{Bit}^{d}_0, \dots, \mathit{Bit}^{d}_n$ are for counting the distance between two consecutive #. The counter takes into
account only letters in Σ \ {⋆}, ignoring the occurrences of ⋆ and other Bit_i-letters from Σ′. The
1-NRA detects whether the distance is less or greater than $2^n$.
• the letters $\mathit{Bit}^{\#}_0, \dots, \mathit{Bit}^{\#}_n$ are for counting the occurrences of #. The 1-NRA detects whether the number
of # between two consecutive ⋆ is less or greater than $2^n - 1$.
• the letters $\mathit{Bit}^{\star}_0, \dots, \mathit{Bit}^{\star}_n$ are for counting the occurrences of ⋆ (to check against $2^n + 1$).
• the letters $\dot{\mathit{Bit}}_0, \dots, \dot{\mathit{Bit}}_n$ are for counting the distance between two consecutive dotted letters (to check
against $2^n + 1$ or $2^n + 2$).
• the letters $\mathit{Bit}^{x}_0, \dots, \mathit{Bit}^{x}_{2n}$ are for counting the distance between two letters that carry the same datum (to
check against $2^{2n} + 2^n + 1$).
We construct all these gadgets such that the Bit-letters always carry the same datum as the delimiters.
The union of all above 1-NRAs and DFAs accepts all data words except those v such that proj(v) =
$(\star \rho)^{2^n} \star$ (that in addition respect the uniqueness conditions on data appearing in v). Finally, we add
NRAs that check whether $\rho = c_1 \cdots c_{2^n}$ in such v is not a faithful computation of M, or it is not an
accepting computation. To this aim, for all words $\sigma_1\sigma_2\sigma_3 \in ((Q \times \Gamma) \cup \Gamma)^3$ of length three such that
σ1σ2σ3 can appear at some position i in a valid configuration c of M, we define Post(σ1σ2σ3) to be the
set of words $u \in ((Q \times \Gamma) \cup \Gamma)^3$ that can appear in a successor configuration of c in the same position i
(according to the rules of M).
• For all words $\dot{\sigma}_1\sigma_2\sigma_3 \in (\dot{(Q \times \Gamma)} \cup \dot{\Gamma})((Q \times \Gamma) \cup \Gamma)^2$ that start with a dotted letter, we add a 1-NRA
that accepts data words in which, for some occurrence of the subword (σ˙1, d1)(σ2, d2)(σ3, d3) with some
data d1, d2, d3, the subword τ1τ˙2τ3 (ignoring the data values) at distance exactly $2^{2n} + 2^{n+1} + 1$
is not in Post(σ1σ2σ3). Observe that the subword σ˙1σ2σ3 intuitively indicates some part of
some configuration c in some copy of ρ, and τ1τ˙2τ3 at distance $2^{2n} + 2^{n+1} + 1$ is a subword of the
successor configuration of c in the next copy of ρ.
The following NRA is for the case (qinit, a1)a2. To implement this 1-NRA, we rely on the previous
conditions that two letters (apart from the delimiters) with the same datum have the exact distance
$2^{2n} + 2^{n+1} + 1$ (checked with a parallel 1-NRA).
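[Diagram omitted: locations 1 to 12, with transitions labelled, among others, "$\dot{(q_{init}, a_1)}$, ↓", "a2", "Σ \ {⋆}", "⋆", "Σ, ≠ r", "(qinit, a1), = r", "$\dot{a}_2$", "Σ \ {#}", "#", "τ1", "$\dot{\tau}_2$", "τ3" and "Σ".]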
• We add a DFA that accepts data words v such that the last configuration in ρ does not contain a
letter in (Qf × Γ) ∪ ˙(Qf × Γ), where Qf is the set of accepting control states of M.
To complete the proof, we examine the existence of N ∈ O(K) such that M has an accepting
computation over x if, and only if, R is (length-bounded) non-universal with respect to the bound N .
Given the shortest witness $w \in \Sigma_M^{+}$ of non-universality of R, the projection v of w onto Σ encodes
an accepting computation of M over x, and consequently has length exactly K. The extra letters of w
compared to v are used to implement the five needed counters faithfully. However, these letters do not increase
the length of w much beyond K: for instance, the condition for counting the occurrences of # requires
that we accompany every # with a single $\mathit{Bit}^{\#}_i$-letter. Hence,
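$$N \le \Bigl( \overbrace{2^{3n+1}}^{\text{cell letters and } \mathit{Bit}^{d}_i} + \underbrace{2^{n+1}(2^n - 1)}_{\# \text{ and } \mathit{Bit}^{\#}_i} + \overbrace{2^{n+1} + 2}^{\star \text{ and } \mathit{Bit}^{\star}_i} + \underbrace{2^{3n+1}}_{\text{cell letters and } \dot{\mathit{Bit}}_i} + \overbrace{2^{3n+1}}^{\text{cell letters and } \mathit{Bit}^{x}_i} \Bigr).$$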
Note that N is still exponential in n.
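As a sanity check on the claim N ∈ O(K), the five summands of the bound can be evaluated and compared with $K = 2^{3n} + 2^{2n} + 1$. The Python sketch below does this for small n; the function names `K` and `length_bound` are hypothetical, and the constant 6 is simply one constant that works for the stated bound, not a value taken from the paper.

```python
def K(n):
    """Length of u(rho): 2^(3n) + 2^(2n) + 1."""
    return 2 ** (3 * n) + 2 ** (2 * n) + 1


def length_bound(n):
    """Sum of the five summands bounding N."""
    cells_and_bit_d = 2 ** (3 * n + 1)
    hashes_and_bit_hash = 2 ** (n + 1) * (2 ** n - 1)
    stars_and_bit_star = 2 ** (n + 1) + 2
    cells_and_bit_dot = 2 ** (3 * n + 1)
    cells_and_bit_x = 2 ** (3 * n + 1)
    return (cells_and_bit_d + hashes_and_bit_hash + stars_and_bit_star
            + cells_and_bit_dot + cells_and_bit_x)


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, K(n), length_bound(n), length_bound(n) <= 6 * K(n))
```

The sum simplifies to $6 \cdot 2^{3n} + 2^{2n+1} + 2$, which is at most 6K for every n, so the bound stays within a constant factor of K.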
The construction of R is complete and the NEXPTIME-hardness follows from the sketched reduction.
Note that the result already holds for 1-NRAs.
There is a natural reduction from the non-universality problem for 1-NRAs to the emptiness problem
for single-register alternating RAs (1-ARAs). The trivial NEXPTIME membership (guess and check) and
Theorem 12 lead to the NEXPTIME-completeness of the length-bounded emptiness problem for 1-ARAs.
Acknowledgements. We thank Sylvain Schmitz for helpful discussions on well-structured systems
and non-elementary complexity classes. We thank James Worrell for inspiring discussions, especially
for drawing our attention to a trick that simplified the NEXPTIME-hardness construction. We thank
the anonymous reviewers for their insightful comments and suggestions.
References
[1] R. Angles and C. Gutierrez. Survey of graph database models. ACM Comput. Surv., 40(1):1:1–1:39,
Feb. 2008.
[2] P. Barceló, L. Libkin, A. W. Lin, and P. T. Wood. Expressive languages for path queries over
graph-structured data. ACM Trans. Database Syst., 37(4):31:1–31:46, Dec. 2012.
[3] Y. Benenson, R. Adar, T. Paz-Elizur, Z. Livneh, and E. Shapiro. DNA molecule provides a computing
machine with both data and fuel. Proc. National Acad. Sci. USA, 100:2191–2196, 2003.
[4] M. Bojańczyk, A. Muscholl, T. Schwentick, L. Segoufin, and C. David. Two-variable logic on words
with data. In 21th IEEE Symposium on Logic in Computer Science (LICS 2006), 12-15 August
2006, Seattle, WA, USA, Proceedings, pages 7–16. IEEE Computer Society, 2006.
[5] M. Bojańczyk and P. Parys. XPath evaluation in linear time. J. ACM, 58(4):17:1–17:33, July 2011.
[6] A. Bouajjani, P. Habermehl, Y. Jurski, and M. Sighireanu. Rewriting systems with data. In
E. Csuhaj-Varjú and Z. Ésik, editors, Fundamentals of Computation Theory, 16th International
Symposium, FCT 2007, Budapest, Hungary, August 27-30, 2007, Proceedings, volume 4639 of
Lecture Notes in Computer Science, pages 1–22. Springer, 2007.
[7] P. Bouyer, A. Petit, and D. Thérien. An algebraic approach to data languages and timed languages.
Inf. Comput., 182(2):137–162, 2003.
[8] J. Černý. Poznámka k homogénnym experimentom s konečnými automatmi. Matematicko-fyzikálny
časopis, 14(3):208–216, 1964.
[9] D. Chistikov, P. Martyugin, and M. Shirmohammadi. Synchronizing automata over nested words.
In Foundations of Software Science and Computation Structures - 19th International Conference,
FOSSACS 2016, Held as Part of the European Joint Conferences on Theory and Practice of Software,
ETAPS 2016, Eindhoven, The Netherlands, April 2-8, 2016, Proceedings, volume 9634 of Lecture
Notes in Computer Science, pages 252–268. Springer, 2016.
[10] L. Clemente and S. Lasota. Timed pushdown automata revisited. In 30th Annual ACM/IEEE
Symposium on Logic in Computer Science, LICS 2015, Kyoto, Japan, July 6-10, 2015, pages 738–
749. IEEE, 2015.
[11] S. Demri and R. Lazic. LTL with the freeze quantifier and register automata. ACM Trans. Comput.
Log., 10(3), 2009.
[12] S. Demri, R. Lazic, and D. Nowak. On the freeze quantifier in constraint LTL: decidability and
complexity. Inf. Comput., 205(1):2–24, 2007.
[13] S. Demri, R. Lazic, and A. Sangnier. Model checking memoryful linear-time logics over one-counter
automata. Theor. Comput. Sci., 411(22-24):2298–2316, 2010.
[14] L. Doyen, L. Juhl, K. G. Larsen, N. Markey, and M. Shirmohammadi. Synchronizing words for
weighted and timed automata. In V. Raman and S. P. Suresh, editors, 34th International Conference
on Foundation of Software Technology and Theoretical Computer Science, FSTTCS 2014, December
15-17, 2014, New Delhi, India, volume 29 of LIPIcs, pages 121–132. Schloss Dagstuhl - Leibniz-
Zentrum fuer Informatik, 2014.
[15] L. Doyen, T. Massart, and M. Shirmohammadi. Infinite synchronizing words for probabilistic automata.
In Mathematical Foundations of Computer Science 2011 - 36th International Symposium,
MFCS 2011, Warsaw, Poland, August 22-26, 2011. Proceedings, volume 6907 of Lecture Notes in
Computer Science, pages 278–289. Springer, 2011.
[16] D. Figueira. Satisfiability of downward xpath with data equality tests. In Proceedings of the Twenty-
eighth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS ’09,
pages 197–206, New York, NY, USA, 2009. ACM.
[17] D. Figueira. Alternating register automata on finite words and trees. Logical Methods in Computer
Science, 8(1), 2012.
[18] D. Figueira, S. Figueira, S. Schmitz, and P. Schnoebelen. Ackermannian and primitive-recursive
bounds with Dickson’s lemma. In Proceedings of the 26th Annual IEEE Symposium on Logic in
Computer Science, LICS 2011, June 21-24, 2011, Toronto, Ontario, Canada, pages 269–278. IEEE
Computer Society, 2011.
[19] M. Kaminski and N. Francez. Finite-memory automata. Theor. Comput. Sci., 134(2):329–363, 1994.
[20] J. Křetínský, K. G. Larsen, S. Laursen, and J. Srba. Polynomial time decidability of weighted synchronization
under partial observability. In 26th International Conference on Concurrency Theory,
CONCUR 2015, Madrid, Spain, September 1-4, 2015, volume 42 of LIPIcs, pages 142–154. Schloss
Dagstuhl - Leibniz-Zentrum fuer Informatik, 2015.
[21] K. G. Larsen, S. Laursen, and J. Srba. Synchronizing strategies under partial observability. In
CONCUR 2014 - Concurrency Theory - 25th International Conference, CONCUR 2014, Rome,
Italy, September 2-5, 2014. Proceedings, volume 8704 of Lecture Notes in Computer Science, pages
188–202. Springer, 2014.
[22] A. Lisitsa and I. Potapov. Temporal logic with predicate lambda-abstraction. In 12th International
Symposium on Temporal Representation and Reasoning (TIME 2005), 23-25 June 2005, Burlington,
Vermont, USA, pages 147–155. IEEE Computer Society, 2005.
[23] P. V. Martyugin. Complexity of problems concerning carefully synchronizing words for PFA and directing
words for NFA. In Computer Science - Theory and Applications, 5th International Computer
Science Symposium in Russia, CSR 2010, Kazan, Russia, June 16-20, 2010. Proceedings, volume
6072 of Lecture Notes in Computer Science, pages 288–302. Springer, 2010.
[24] F. Neven, T. Schwentick, and V. Vianu. Finite state machines for strings over infinite alphabets.
ACM Trans. Comput. Log., 5(3):403–435, 2004.
[25] J. Pin. Sur les mots synchronisants dans un automate fini. Elektronische Informationsverarbeitung
und Kybernetik, 14(6):297–303, 1978.
[26] H. Sakamoto and D. Ikeda. Intractability of decision problems for finite-memory automata. Theor.
Comput. Sci., 231(2):297–308, 2000.
[27] S. Schmitz. Complexity hierarchies beyond elementary. ACM Trans. Comput. Theory, 8(1):3:1–3:36,
2016.
[28] M. Shirmohammadi. PhD thesis: Qualitative analysis of probabilistic synchronizing systems. 2014.
[29] N. Tzevelekos. Fresh-register automata. In T. Ball and M. Sagiv, editors, Proceedings of the
38th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2011,
Austin, TX, USA, January 26-28, 2011, pages 295–306. ACM, 2011.
[30] M. V. Volkov. Synchronizing automata and the Černý conjecture. In C. Martín-Vide, F. Otto,
and H. Fernau, editors, Language and Automata Theory and Applications, Second International
Conference, LATA 2008, Tarragona, Spain, March 13-19, 2008. Revised Papers, volume 5196 of
Lecture Notes in Computer Science, pages 11–27. Springer, 2008.
Appendix
6 Proofs for Deterministic Register Automata
Lemma 2. For all DRAs for which there exist synchronizing data words, there exists a synchronizing
data word w with data efficiency at most 2|R| + 1.
Proof. Let R = 〈L,R, Σ,T 〉 be a DRA on the data domain D and with k ≥ 1 registers. Recall that
we denote by data(w) the data occurring in data words w; for configurations q = (ℓ, ν) we use the same
notation data(q) = {ν(r) | r ∈ R} to denote the data appearing in the valuation of q. Let π : Y1 → Y2
be a bijection on data where Y1,Y2 ⊆ D. For every configuration q = (ℓ, ν), define π(q) = (ℓ, ν′),
where ν′ satisfies ν′(r) = π(ν(r)) for all r ∈ R. For every data word w = (a1, d1) . . . (an, dn), define
π(w) = (a1,π(d1)) . . . (an,π(dn)). Note that the application of π on q and w preserves the reachability
property, i.e., post(π(q),π(w)) = {π(q′) | q′ ∈ post(q,w)}.
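The invariance of reachability under renaming can be checked on a toy example. The sketch below is illustrative only: the tuple encoding of transitions, the three-transition automaton, and the helper names `post`, `rename_config` and `rename_word` are assumptions made for this example, not notation from the paper. It uses a single register, so a configuration is a pair (location, register value).

```python
# Hypothetical encoding of a 1-register automaton (not taken from the paper):
# each transition is (source, letter, guard, store, target), where guard is
# "eq" (input datum equals the register), "neq", or "any", and store=True
# overwrites the register with the input datum.
TRANSITIONS = [
    ("p", "a", "neq", True, "p"),
    ("p", "a", "eq", False, "s"),
    ("s", "a", "any", False, "s"),
]

def post(configs, word, transitions=TRANSITIONS):
    """Configurations (location, register value) reached from `configs` on `word`."""
    current = set(configs)
    for letter, datum in word:
        nxt = set()
        for loc, val in current:
            for src, a, guard, store, tgt in transitions:
                if src != loc or a != letter:
                    continue
                if guard == "eq" and datum != val:
                    continue
                if guard == "neq" and datum == val:
                    continue
                nxt.add((tgt, datum if store else val))
        current = nxt
    return current

def rename_config(pi, config):
    """Apply the data renaming pi to the register value of a configuration."""
    loc, val = config
    return (loc, pi[val])

def rename_word(pi, word):
    """Apply the data renaming pi to every datum of a data word."""
    return [(a, pi[d]) for a, d in word]

q = ("p", 1)
w = [("a", 2), ("a", 2)]
pi = {1: 10, 2: 20}  # injective renaming of the data occurring in q and w

lhs = post({rename_config(pi, q)}, rename_word(pi, w))
rhs = {rename_config(pi, c) for c in post({q}, w)}
print(lhs == rhs)
```

Guards only compare data for equality, which is why an injective renaming cannot change which transitions are enabled.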
Assuming that R has some synchronizing data word, we first prove the following claim by induction.
Claim. For all pairs of configurations q1, q2, if there exists w such that |post({q1, q2},w)| = 1, then
• for all sets X = {x1,x2, · · · ,x2k+1} ⊂ D with data(q1), data(q2) ⊆ X ,
• there exists some data word wq1,q2 ∈ (Σ × X)∗ such that |post({q1, q2},wq1,q2)| = 1.
Note that by |X | = 2k + 1, the data efficiency of wq1,q2 is at most 2k + 1.
Proof of Claim. Let q1 and q2 be two configurations of R and define data(q1, q2) = data(q1)∪ data(q2).
Since R has some synchronizing data words, there exists w such that |post({q1, q2},w)| = 1. The proof
is by an induction on the length of w.
Base of induction. Assume w = (a, d) has length |w| = 1. Let X be an arbitrary set of data such
that |X | = 2k + 1 and data(q1, q2) ⊆ X . There are two cases:
• d ∈ X : This entails that data(w) ⊆ X . Observe that wq1,q2 = w satisfies the induction statement.
• d ∉ X : Since |data(q1, q2)| ≤ 2k, there exists a datum x ≠ d such that x ∈ X \ data(q1, q2). Since
x ≠ d, we can define the bijection π : {d} ∪ data(q1, q2) → {x} ∪ data(q1, q2) such that π(d) = x and
π(d′) = d′ for all d′ ∈ data(q1, q2). Observe that π(qi) = qi for all i ∈ {1, 2}. Then
|post({q1, q2}, (a, d))| = |post({π(q1), π(q2)}, (a, π(d)))| = |post({q1, q2}, (a, x))|.
This and the assumption |post({q1, q2}, (a, d))| = 1 yield |post({q1, q2}, (a, x))| = 1. The word wq1,q2 =
(a, x) satisfies the induction statement.
The base of induction hence holds.
Step of induction. Assume that the induction hypothesis holds for i− 1. Consider some word (a, d) ·w
such that |w| = i− 1 and |post({q1, q2}, (a, d) · w)| = 1.
Consider some set X of cardinality 2k + 1 with data(q1, q2) ⊆ X . We construct the data
word wq1,q2 as follows. Let p1 = post(q1, (a, d)) and p2 = post(q2, (a, d)), and let data(p1, p2) = data(p1) ∪
data(p2). Due to the fact that p1, p2 are successors of q1, q2 after inputting (a, d), we know that if
d ∈ data(q1, q2) then d ∈ data(p1, p2). There are two cases:
• d ∈ data(q1, q2) or d ∉ data(p1, p2). These guarantee that data(p1, p2) ⊆ data(q1, q2) if d ∈
data(q1, q2), and that data(p1, p2) = data(q1, q2) if d ∉ data(p1, p2). As a result, data(p1, p2) ⊆ X .
By induction hypothesis, there exists some data word wp1,p2 over data domain X such that
|post({p1, p2},wp1,p2)| = 1. For wq1,q2 = (a, d) · wp1,p2 the statement of induction holds, as
|post({q1, q2},wq1,q2)| = 1.
• d ∉ data(q1, q2) and d ∈ data(p1, p2). Without loss of generality, we assume that d ∉ X . Otherwise
d ∈ X would imply data(p1, p2) ⊆ X , and we simply let wq1,q2 = (a, d) · wp1,p2 . Since |data(q1, q2)| ≤ 2k,
there exists some datum x ≠ d such that x ∈ X \ data(q1, q2). Since x ≠ d, we can define the
bijection π : {d} ∪ data(q1, q2) → {x} ∪ data(q1, q2) such that π(d) = x and π(d′) = d′ for all
d′ ∈ data(q1, q2). Since data(p1, p2) \ {d} ⊆ data(q1, q2) and d is in the domain of π, the bijection π
is defined on all of data(p1, p2). By induction hypothesis, there exists some data word wp1,p2 over data
domain (X \ {x}) ∪ {d} such that |post({p1, p2},wp1,p2)| = 1. Then, |post({π(p1), π(p2)}, π(wp1,p2))| =
1. For all 1 ≤ i ≤ 2, we have π(pi) ∈ post(qi, (a, x)) since pi ∈ post(qi, (a, d)) and x = π(d). By the
above arguments, we conclude that |post({q1, q2}, (a, x)π(wp1,p2))| = 1. Since {x} ∪ data(q1, q2) ⊆ X ,
the data word wq1,q2 = (a, x)π(wp1,p2) satisfies the statement of induction.
The above arguments prove that in all cases, there exists wq1,q2 ∈ (Σ × X)∗ that merges two
configurations q1 and q2 into a singleton, which completes the proof of Claim.
Since R has some synchronizing data word, using Lemma 1, we know that there exists some word w
with data efficiency k such that $post(L \times D^k, w) \subseteq L \times data(w)^k$. Consider some set X = {x1, x2, · · · , x2k+1} ⊂
D such that data(w) ⊆ X . We use the pairwise synchronization technique as follows. Define $S_n = L \times X^k$
and $n = |L|(2k + 1)^k$, i.e., |Sn| = n. For all i = n − 1, · · · , 1 repeat the following:
1. Take a pair of configurations q1, q2 ∈ Si+1. By the Claim above, one can find some word wq1,q2 ∈ (Σ × X)∗
such that |post({q1, q2},wq1,q2)| = 1,
2. Define vi = wq1,q2 and Si = post(Si+1, vi).
Note that by determinism of R, for every i ∈ {1, · · · , n − 1}, we have |Si| ≤ |Si+1| − 1. Thus the
word wsynch = w · vn−1 · · · v2 · v1 is a synchronizing data word for R. Since data(w) ⊆ X and data(vi) ⊆ X
for all i ∈ {1, · · · , n − 1}, the data efficiency of wsynch is at most 2k + 1. The proof is complete.
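The pairwise synchronization technique used above can be sketched on a plain DFA, where states play the role of configurations and data are ignored. This is an illustration under that simplifying assumption, not the construction for DRAs; the function names and the dictionary encoding `delta[state][letter]` are hypothetical. The breadth-first search in `merging_word` stands in for the Claim, which supplies a merging word for each pair.

```python
from collections import deque

def run(delta, state, word):
    """State reached from `state` after reading `word` in a complete DFA."""
    for a in word:
        state = delta[state][a]
    return state

def merging_word(delta, alphabet, p, q):
    """Shortest word sending p and q to the same state, or None if there is none."""
    start = (p, q)
    parent = {start: None}
    queue = deque([start])
    while queue:
        pair = queue.popleft()
        if pair[0] == pair[1]:
            word = []
            while parent[pair] is not None:
                pair, a = parent[pair]
                word.append(a)
            return word[::-1]
        for a in alphabet:
            nxt = (delta[pair[0]][a], delta[pair[1]][a])
            if nxt not in parent:
                parent[nxt] = (pair, a)
                queue.append(nxt)
    return None

def synchronize(delta, alphabet):
    """Repeatedly merge two of the remaining states; each round removes at least one."""
    current = set(delta)
    word = []
    while len(current) > 1:
        p, q = sorted(current)[:2]
        v = merging_word(delta, alphabet, p, q)
        if v is None:
            return None
        word += v
        current = {run(delta, s, v) for s in current}
    return word

# Toy input: the 4-state Cerny automaton (a is a cyclic shift, b maps state 3 to 0).
cerny = {i: {"a": (i + 1) % 4, "b": 0 if i == 3 else i} for i in range(4)}
sync_word = synchronize(cerny, ["a", "b"])
print(sync_word, {run(cerny, s, sync_word) for s in cerny})
```

By determinism the set of remaining states shrinks by at least one per round, mirroring |Si| ≤ |Si+1| − 1 in the proof.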
Lemma 5. The synchronization problem for k-DRAs is PSPACE-complete.
Proof. We prove PSPACE-hardness by a reduction from the non-emptiness problem for k-DRAs. Let
R = (L,R, Σ,T ) be a k-DRA equipped with an initial location ℓi and an accepting location ℓf , where,
without loss of generality, we assume that all outgoing transitions from ℓi update all registers, and that
ℓf has no outgoing edges. We also assume that R is complete, otherwise, we add some non-accepting
location and direct all undefined transitions to it.
The reduction is such that from R we construct another k-DRA Rsyn such that the language of R is
not empty if, and only if, Rsyn has some synchronizing data word. We define Rsyn = (Lsyn,R, Σsyn,Tsyn)
as follows. The set of locations is Lsyn = L ∪ {reset}, where reset ∉ L is a new location; the alphabet is
Σsyn = Σ ∪ {⋆}, where ⋆ ∉ Σ. To define Tsyn, we add the following transitions to T .
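• $\ell_f \xrightarrow{a\ R\downarrow} \ell_f$ for all letters a ∈ Σsyn,
• $\ell_i \xrightarrow{\star\ R\downarrow} \ell_i$,
• $\text{reset} \xrightarrow{a\ R\downarrow} \ell_i$ for all letters a ∈ Σsyn,
• $\ell \xrightarrow{\star\ R\downarrow} \text{reset}$ for all ℓ ∈ Lsyn except for reset, ℓi, ℓf .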
Note that Rsyn is indeed deterministic and complete. To establish the correctness of the reduction, we
prove that the language of R is not empty if, and only if, Rsyn has a synchronizing data word.
First, assume that the language of R is not empty. Then there exists a data word w = (a1, d1) . . . (an, dn)
such that w ∈ L(R). Hence there exists a run starting from (ℓi, νi) and ending in (ℓf , νf ) for some
$\nu_i, \nu_f \in D^{|R|}$. The data word (⋆, d)(⋆, d)w(⋆, d) for some d ∈ D synchronizes Rsyn in location ℓf .
Second, assume that Rsyn has some synchronizing data word. Let w ∈ (Σsyn × D)∗ be one of the
shortest synchronizing data words. All transitions in ℓf are self-loops with update on all registers;
hence, Rsyn can only be synchronized in ℓf . Therefore, we also have post((ℓi, νi),w) = {(ℓf , νf )} (for
some $\nu_i, \nu_f \in D^{|R|}$). By the fact that w is a shortest synchronizing data word, we can infer that the
corresponding run does not contain any ⋆-transitions except for two self-loops in ℓi in the very beginning.
Hence there exists a run from (ℓi, νi) to ℓf and thus L(R) ≠ ∅.
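The shape of this reduction can be exercised on a register-free analogue: a complete DFA with a single accepting state, where "synchronizing" means sending all states to one state. The sketch below is a simplification under that assumption (registers and the update R↓ are dropped); the function names, the dictionary encoding, and the use of "*" for ⋆ are hypothetical, and the state name "reset" is assumed not to occur in the input automaton.

```python
def build_syn(states, alphabet, delta, init, final):
    """Register-free analogue of R_syn: add `reset` and the letter '*'."""
    syn_alphabet = list(alphabet) + ["*"]
    syn = {}
    for s in states:
        for a in alphabet:
            # the accepting location only has self-loops; others keep their transitions
            syn[(s, a)] = final if s == final else delta[(s, a)]
        if s == final:
            syn[(s, "*")] = final
        elif s == init:
            syn[(s, "*")] = init
        else:
            syn[(s, "*")] = "reset"
    for a in syn_alphabet:
        syn[("reset", a)] = init
    return list(states) + ["reset"], syn_alphabet, syn

def synchronizable(states, alphabet, delta):
    """Subset search: does some word send all states to a single state?"""
    start = frozenset(states)
    seen, stack = {start}, [start]
    while stack:
        current = stack.pop()
        if len(current) == 1:
            return True
        for a in alphabet:
            nxt = frozenset(delta[(s, a)] for s in current)
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False

def nonempty(alphabet, delta, init, final):
    """Language non-emptiness: is the accepting state reachable from the initial one?"""
    seen, stack = {init}, [init]
    while stack:
        s = stack.pop()
        for a in alphabet:
            t = delta[(s, a)]
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return final in seen

# Two toy complete DFAs over {a}, with initial state 0 and accepting state 2.
reach = {(0, "a"): 1, (1, "a"): 2, (2, "a"): 2}
unreach = {(0, "a"): 1, (1, "a"): 0, (2, "a"): 2}
results = []
for delta in (reach, unreach):
    syn_states, syn_alphabet, syn_delta = build_syn([0, 1, 2], ["a"], delta, 0, 2)
    results.append((nonempty(["a"], delta, 0, 2),
                    synchronizable(syn_states, syn_alphabet, syn_delta)))
print(results)
```

On both toy inputs the two answers agree, as the correctness argument predicts for this simplified setting.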
7 Proofs for Non-deterministic Register Automata
Lemma 6. There is a family of 1-NRAs (Rcounter(n))n∈N with O(n) locations, such that for all
synchronizing data words w, some datum d ∈ data(w) appears in w at least $2^n$ times.
Proof. The family of 1-NRAs (Rcounter(n))n∈N is defined as follows. The alphabet of the RA Rcounter(n)
is Σ = {#, ⋆, Bit0, Bit1, · · · , Bitn}. The structure of Rcounter(n) is composed of three distinguished
locations synch, reset, zero and locations $2^n, 2^{n-1}, \dots, 2^1, 2^0$ and $2^n_c, 2^{n-1}_c, \dots, 2^1_c, 2^0_c$. The general structure
of Rcounter(n) is partially depicted in Figure 4. The RA Rcounter(n) is constructed such that for all
synchronizing data words w, some datum x ∈ data(w) appears in w at least $2^n$ times. A counting feature
is thus embedded in Rcounter(n): intuitively, the set of all reached configurations represents the counter
value. Starting from {(zero, x)}, the first increment results in $\{2^n_c, \dots, 2^2_c, 2^1_c, 2^0\} \times \{x\}$, where location $2^i$
means that the i-th least significant bit in the binary representation of the counter value is set to 1,
and location $2^i_c$ means that the i-th bit is set to 0. Informally, we say that there is an x-token in every
reached location. Here, $2^n_c, \dots, 2^2_c, 2^1_c, 2^0$ have x-tokens. A sequence of counter increments is encoded
by moving the x-tokens, as shown in the following sequence of sets of locations: $\{2^n_c, \dots, 2^2_c, 2^1, 2^0_c\}$,
$\{2^n_c, \dots, 2^2_c, 2^1, 2^0\}$, $\{2^n_c, \dots, 2^3_c, 2^2, 2^1_c, 2^0_c\}$, etc. The transitions of Rcounter(n) are defined in such a way
that, starting from {(zero, x)}, either $2^i$ or $2^i_c$ has a token, but never both of them at the same time. We
now present a detailed explanation of the structure of Rcounter(n).
All transitions in synch are self-loops with an update on the register: $\text{synch} \xrightarrow{\Sigma\ r\downarrow} \text{synch}$. Thus,
Rcounter(n) can only be synchronized in synch. Moreover, synch is only accessible by #-transitions. Sim-
ilarly, all transitions except for those with label ⋆, are self-loops in location reset; thus, Rcounter(n) can
only be synchronized by leaving reset by reading ⋆. We use this also to avoid transitions which are incor-
rect with respect to the binary incrementing process: all incorrect actions are guided to reset to enforce
another ⋆. Assuming w to be one of the shortest synchronizing words, we see that post(L × D,w) =
{(synch,x)}, where w starts with (⋆,x) and ends with (#,x).
The counting involves an initializing process and several incrementing processes.
• initializing the counter to zero: the ⋆-transitions are devised to place a token in zero: from all
locations ℓ ∈ L \ {synch} we have $\ell \xrightarrow{\star\ r\downarrow} \text{zero}$. This sets the counter to 0.
• incrementing the counter : we use Bit0, . . . ,Bitn-transitions with equality guards to control the
increment. Intuitively, an equality-guarded Biti-transition is taken to set the i-th bit in the binary
representation of the counter value according to the standard rules of binary incrementation.
Initially, the token in zero splits into $2^0$ and $2^n_c, \dots, 2^1_c$ to represent 0 · · · 01, by taking the transitions
$\text{zero} \xrightarrow{=r\ \mathrm{Bit}_0} 2^0$ and $\text{zero} \xrightarrow{=r\ \mathrm{Bit}_0} 2^j_c$ for all 1 ≤ j ≤ n. Equality-guarded Biti-transitions for
i ∈ {1, . . . , n} are incorrect for zero and thus guided to reset. Whenever data different from x is
processed, Rcounter(n) takes self-loops (omitted in Figure 4) and keeps the x-tokens unmoved.
The equality-guarded Biti-transitions should only be taken if the i-th bit is not set, or, equivalently,
if the location $2^i$ contains no token. This is guaranteed by a Biti-transition $2^i \xrightarrow{=r\ \mathrm{Bit}_i} \text{reset}$, for
every 0 ≤ i ≤ n, which results in an incorrect transition and should be avoided. (Otherwise the
counting process has to restart from 0.) In Figure 4, we depict the corresponding transitions for
i = 2 and i = n.
Further, we need to guarantee that for all i ≥ 1 a Biti-transition is taken only if all less significant
bits are set, or, equivalently, if all locations $2^{i-1}, \dots, 2^0$ contain a token. This is ensured by a
Biti-transition $2^j_c \xrightarrow{=r\ \mathrm{Bit}_i} \text{reset}$, for every 0 ≤ j < i, which again results in an incorrect transition. See,
e.g., the transition $2^2_c \xrightarrow{=r\ \mathrm{Bit}_i} \text{reset}$ in Figure 4 for every 3 ≤ i ≤ n.
Finally, Biti-transitions must produce tokens in $2^i$ and $2^0_c, \dots, 2^{i-1}_c$, thus $2^i_c \xrightarrow{=r\ \mathrm{Bit}_i} 2^i$ and
$2^j \xrightarrow{=r\ \mathrm{Bit}_i} 2^j_c$ for all 0 ≤ j < i. All tokens in locations $2^j$ and $2^j_c$, respectively, for j > i remain where they are,
which is implemented by equality-guarded Biti-self-loops in $2^j$ and $2^j_c$, respectively.
By construction, it is easy to see that Biti-transitions are the only way to produce a token in $2^i$, which
can be fired if $2^i_c$ has a token. The Biti-transitions then consume the token in $2^i_c$. This guarantees that
after the first ⋆-transition, which puts a token into zero, the two locations $2^i$ and $2^i_c$ will never have a
token at the same time.
Finally, all equality-guarded #-transitions in $2^n_c$ and $2^i$ for all 0 ≤ i < n are sent to reset. In contrast,
all #-transitions in $2^n$ and $2^i_c$ for all 0 ≤ i < n are sent to synch, with an update on the register. This
guarantees that the counter must correctly count from 0 to 10 · · · 0, meaning that at least one datum x
appears at least $2^n$ times while synchronizing Rcounter(n).
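The counting argument can be replayed on a small token game. The sketch below is an abstraction, not the automaton itself: it tracks only the locations holding an x-token, models only equality-guarded Bit-letters, and treats any token set containing reset as a dead end (since only ⋆ leaves reset and restarts the count). The names `bit_step` and `shortest_count` and the encoding of location $2^j$ as `("set", j)` and $2^j_c$ as `("clr", j)` are hypothetical.

```python
from collections import deque

def bit_step(loc, i, n):
    """Successor locations of one x-token in `loc` on an equality-guarded Bit_i letter."""
    if loc == "zero":
        if i == 0:
            # the token splits: bit 0 is set, all higher bits are cleared
            return {("set", 0)} | {("clr", j) for j in range(1, n + 1)}
        return {"reset"}
    kind, j = loc
    if j > i:
        return {loc}  # more significant bits keep their token
    if j == i:
        # bit i must be clear before Bit_i; if it is already set the move is incorrect
        return {("set", i)} if kind == "clr" else {"reset"}
    # j < i: all less significant bits must be set, and they are cleared by Bit_i
    return {("clr", j)} if kind == "set" else {"reset"}

def shortest_count(n):
    """Fewest Bit-letters leading from {zero} to the token set encoding 10...0."""
    start = frozenset({"zero"})
    goal = frozenset({("set", n)} | {("clr", j) for j in range(n)})
    dist = {start: 0}
    queue = deque([start])
    while queue:
        tokens = queue.popleft()
        if tokens == goal:
            return dist[tokens]
        for i in range(n + 1):
            nxt = frozenset().union(*(bit_step(loc, i, n) for loc in tokens))
            if "reset" in nxt or nxt in dist:
                continue  # incorrect increments would force a restart
            dist[nxt] = dist[tokens] + 1
            queue.append(nxt)
    return None

counts = [shortest_count(n) for n in range(1, 5)]
print(counts)
```

In this abstraction exactly one Bit-letter is correct at every step, so the search simply follows the binary counter up to 10 · · · 0.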
Lemma 8. The non-universality problem is reducible to the synchronization problem for NRAs.
Proof. The reduction is based on the construction presented in Theorem 17 in [14].
Let R = 〈L,R, Σ,T 〉 be an NRA equipped with an initial location ℓin and a set Lf of accepting
locations, where, without loss of generality, we assume that all outgoing transitions from ℓin update all
registers. We also assume that R is complete, otherwise, we add some non-accepting location and direct
all undefined transitions to it.
We construct an NRA Rsyn such that there exists some data word that is not in L(R) if, and only
if, Rsyn has some synchronizing data word. We define Rsyn = 〈Lsyn, R, Σsyn, Tsyn〉 as follows. The set
of locations is Lsyn = L ∪ {reset, synch} where synch, reset ∉ L are two new locations. The alphabet is
Σsyn = Σ ∪ {#, ⋆} where #, ⋆ ∉ Σ. The transition relation Tsyn is the union of T and the set containing the
following transitions:
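• $\text{synch} \xrightarrow{a\ R\downarrow} \text{synch}$ for all letters a ∈ Σsyn,
• $\text{reset} \xrightarrow{\star\ R\downarrow} \ell_{in}$ and $\text{reset} \xrightarrow{a\ R\downarrow} \text{reset}$ for all letters a ∈ Σsyn \ {⋆},
• $\ell \xrightarrow{\star\ R\downarrow} \ell_{in}$ for all locations ℓ ∈ L,
• $\ell \xrightarrow{\#\ R\downarrow} \text{synch}$ for all non-accepting locations ℓ ∈ L \ Lf,
• $\ell \xrightarrow{\#\ R\downarrow} \text{reset}$ for all accepting locations ℓ ∈ Lf.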
Next, we prove the correctness of the reduction.
First, assume there exists a data word w = (a1, d1) . . . (an, dn) such that w ∉ L(R). Hence, all
runs starting in (ℓin, νi) with $\nu_i \in D^{|R|}$ end in some configuration (ℓ, ν) with ℓ ∉ Lf. The data word
(⋆, d) · w · (#, d) with d ∈ D synchronizes Rsyn in location synch, proving that Rsyn has some synchronizing
data word.
Second, assume that Rsyn has some synchronizing data word. All transitions in synch are self-loops
with update on all registers; thus, Rsyn can only synchronize in synch. Moreover, synch is only accessible
with #-transitions; assuming w is one of the shortest synchronizing data words, we see that post(L ×
D, w) = {(synch, ν)} for some $\nu \in D^{|R|}$. From all locations ℓ ∈ L we have $\ell \xrightarrow{\star\ R\downarrow} \ell_{in}$; we say that
⋆-transitions reset Rsyn. Moreover, the only transition leaving location reset is the ⋆-transition.
Thus, a reset followed by some # must occur while synchronizing. Let w = w0(⋆, d⋆)w1(#, d#)w2, where
w1 ∈ (Σ × D)+ is the data word between the last occurrence of ⋆ and the first following occurrence of #,
and w2 ∈ ((Σsyn \ {⋆}) × D)∗. We prove that w1 ∉ L(R). By contradiction, assume that w1 is in the language;
thus, there exist valuations $\nu_i, \nu_f \in D^{|R|}$ such that Rsyn has a run over w1 starting in (ℓin, νi) and
ending in (ℓf , νf ) where ℓf ∈ Lf. In fact, since all outgoing transitions in ℓin update all registers,
for all valuations νi, Rsyn has an accepting run over w1.
Note that w0 cannot be a synchronizing word for Rsyn, because this would contradict the assumption
that w is one of the shortest synchronizing data words. It implies that there must be some configuration q
such that postRsyn(q, w0) contains some configuration (ℓ, ν) with ℓ ≠ synch. From (ℓ, ν), inputting the
next (⋆, d⋆) (that is after w0 in the synchronizing word w), we reach $(\ell_{in}, \{d_\star\}^{|R|})$. Since for all valuations νi,
starting in (ℓin, νi), Rsyn has an accepting run over w1, it must have an accepting run from $(\ell_{in}, \{d_\star\}^{|R|})$
to some accepting configuration (ℓf , νf ) too. Reading the last # (that is after w1 in the synchronizing
word w), reset is reached. Since w2 does not contain any ⋆, reset is never left, meaning that Rsyn cannot
synchronize in synch, a contradiction. The proof is complete.
Note that the reduction preserves the number of registers in the NRAs.
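The reduction of Lemma 8 can likewise be tried out on a register-free analogue, where an NFA stands in for the NRA and a word is synchronizing if it sends the set of all states to a singleton. This is a sketch under that simplifying assumption (registers and the update R↓ are dropped, and universality is taken over all words including the empty one); the function names, the dictionary encoding `delta[(state, letter)]`, and the letters "*" and "#" for ⋆ and # are hypothetical, and the state names "reset" and "synch" are assumed not to occur in the input automaton.

```python
def post(delta, states, letter):
    """Set of successors of `states` on `letter` in an NFA."""
    return frozenset(t for s in states for t in delta.get((s, letter), ()))

def reachable_subsets(delta, alphabet, start):
    """All subsets reachable from `start` in the subset construction."""
    seen, stack = {start}, [start]
    while stack:
        current = stack.pop()
        for a in alphabet:
            nxt = post(delta, current, a)
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def non_universal(delta, alphabet, init, accepting):
    """Some word is rejected iff a reachable subset contains no accepting state."""
    subsets = reachable_subsets(delta, alphabet, frozenset({init}))
    return any(not (subset & accepting) for subset in subsets)

def build_syn(states, alphabet, delta, init, accepting):
    """Register-free analogue of R_syn with new states reset, synch and letters '#', '*'."""
    syn = {key: set(targets) for key, targets in delta.items()}
    syn_alphabet = list(alphabet) + ["#", "*"]
    for a in syn_alphabet:
        syn[("synch", a)] = {"synch"}
        syn[("reset", a)] = {init} if a == "*" else {"reset"}
    for s in states:
        syn[(s, "*")] = {init}
        syn[(s, "#")] = {"reset"} if s in accepting else {"synch"}
    return set(states) | {"reset", "synch"}, syn_alphabet, syn

def synchronizable(states, alphabet, delta):
    """Does some word send the set of all states to a single state?"""
    subsets = reachable_subsets(delta, alphabet, frozenset(states))
    return any(len(subset) == 1 for subset in subsets)

# Two toy complete NFAs over {a} with initial state 0 and accepting set {0}.
universal = ([0], {(0, "a"): {0}})
rejecting = ([0, 1], {(0, "a"): {1}, (1, "a"): {1}})
results = []
for states, delta in (universal, rejecting):
    accepting = frozenset({0})
    syn_states, syn_alphabet, syn_delta = build_syn(states, ["a"], delta, 0, accepting)
    results.append((non_universal(delta, ["a"], 0, accepting),
                    synchronizable(syn_states, syn_alphabet, syn_delta)))
print(results)
```

For the second automaton the word ⋆ a # is synchronizing, matching the shape (⋆, d) · w · (#, d) used in the proof.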
