Register automata (RAs) are finite automata extended with a finite set of registers to store and compare data from an infinite domain. We study the concept of synchronizing data words in RAs: does there exist a data word that sends all states of the RA to a single state? For deterministic RAs with k registers (k-DRAs), we prove that inputting data words with 2k + 1 distinct data from the infinite data domain is sufficient to synchronize. We show that the synchronization problem for DRAs is in general PSPACE-complete, and it is NLOGSPACE-complete for 1-DRAs. For nondeterministic RAs (NRAs), we show that Ackermann(n) distinct data (where n is the size of the RA) might be necessary to synchronize. The synchronization problem for NRAs is in general undecidable; however, we establish Ackermann-completeness of the problem for 1-NRAs. Another main result is the NEXPTIME-completeness of the length-bounded synchronization problem for NRAs, where a bound on the length of the synchronizing data word, written in binary, is given. A variant of this last construction allows us to prove that the length-bounded universality problem for NRAs is co-NEXPTIME-complete.
INTRODUCTION
Given a deterministic finite automaton (DFA), a synchronizing word is a word that sends all states of the automaton to a unique state. Synchronizing words for finite automata have been studied since the 1970s [9, 24, 26, 31] and are the subject of one of the best-known open problems in automata theory: the Černý conjecture. This conjecture states that the length of a shortest synchronizing word for a DFA with n states is at most (n − 1)^2. Synchronizing words, moreover, have applications in planning, control of discrete event systems, biocomputing, and robotics [4, 16, 31]. More recently, the notion has been generalized from automata to games [21, 22, 29] and infinite-state systems [10, 15], with applications to modeling complex systems such as distributed data networks or real-time embedded systems.
In this work, we are interested in synchronizing data words for register automata (RAs). Data words are sequences of pairs, where the first element of each pair is taken from a finite alphabet and the second element is taken from an infinite data domain, such as the natural numbers or ASCII strings. Data words have applications in querying and reasoning about data models with complex structural properties, such as XML and graph databases [1, 3, 6, 17]. For reasoning about data words, various formalisms have been considered, including first-order logic for data words [5, 7], extensions of linear temporal logic [12-14, 23], data automata [5, 8], RAs [12, 20, 25, 27], and extensions thereof (e.g., [11, 18, 30]).
RAs are a generalization of finite automata for processing data words. RAs are equipped with a finite set of registers that can store data values. While processing a data word, such an automaton can store the datum at the current position in one of its registers; it can also test the current datum for equality with data already stored in its registers. In applications, RAs allow for handling parameters such as user names, passwords, identifiers of connections, and sessions. RAs come in many variants, including one-way, two-way, deterministic, nondeterministic, and alternating. For alternating one-way RAs, classical language-theoretic decision problems, such as emptiness, universality, and inclusion, are undecidable. In this article, we focus on the class of one-way nondeterministic RAs (NRAs), which have a decidable emptiness problem [20] , and the subclass of NRAs with a single register, which has a decidable universality problem [12] .
Semantically, an RA defines an infinite-state transition system due to the unbounded domain for the data stored in the registers. Synchronizing words were introduced for infinite-state systems with infinite branching in Doyen et al. [15] and Shirmohammadi [29]; in particular, the notion of synchronizing words is motivated and studied for weighted automata and timed automata. In some infinite-state settings, such as nested-word automata, finding the right definition of synchronizing word is more challenging [10]. We define the synchronization problem for RAs within the framework suggested in Doyen et al. [15] and Shirmohammadi [29]: given an RA R over a finite alphabet Σ and an infinite data domain D, does there exist a data word w ∈ (Σ × D)^+ and some state q_w such that the word w sends each of the infinitely many states of R to q_w? Note that the state q_w depends on the word w; we call such a data word a synchronizing data word.
Contribution. The problem of finding synchronizing data words for RAs poses new challenges in the area of synchronization. It is natural to ask how many distinct data are necessary and sufficient to synchronize an RA, which we refer to as the data efficiency of synchronizing data words. We show that the data efficiency is polynomial in the number of registers for deterministic RAs (DRAs). For NRAs, we provide an example that shows that the data efficiency may be Ackermann(n), where n is the number of states of the NRA. Remarkably, the data efficiency is tightly related to the complexity of deciding the synchronization problem. For DRAs, we prove that for all automata R with k registers, if R has a synchronizing data word, then it also has one with data efficiency at most 2k + 1. We provide a family (R_k)_{k∈N} of DRAs with k registers, for which indeed a polynomial data efficiency (in k) is necessary to synchronize. This bound is the basis of an (N)PSPACE-algorithm for DRAs; we prove a matching PSPACE lower bound by ideas carried over from timed settings [15]. We show that the synchronization problems for DRAs with a single register (1-DRAs) and for DFAs are NLOGSPACE-interreducible, implying that the problem is NLOGSPACE-complete for 1-DRAs.
For NRAs, a reduction from the nonuniversality problem yields the undecidability of the synchronization problem. For single-register NRAs (1-NRAs), we prove Ackermann-completeness of the problem by a novel construction proving that the synchronization problem and the nonuniversality problem for 1-NRAs are polynomial-time interreducible. We believe that this technique is useful in studying synchronization in all nondeterministic settings, requiring careful analysis of the size of the construction.
Another main contribution is to prove NEXPTIME-completeness of the length-bounded synchronization problem for NRAs: given a bound on the length (written in binary), does there exist a synchronizing data word with length at most the given bound? For the lower bound, we present a reduction from the membership problem of O(2^n)-time bounded nondeterministic Turing machines. The crucial ingredient in this reduction is a family of RAs implementing binary counters. A variant of our construction yields a proof for co-NEXPTIME-completeness of the length-bounded universality problem for NRAs; the length-bounded universality problem asks whether all data words of length at most a given bound (written in binary) are in the language of the automaton. We further make a connection to the emptiness problem of single-register alternating RAs.
An extended abstract of this article appeared in the Proceedings of the 41st International Symposium on Mathematical Foundations of Computer Science (MFCS'16) [2] . In comparison with the extended abstract, here we simplify two of the main constructions and add detailed proofs of all results. The main improvement is giving a simpler NEXPTIME-hardness reduction for the length-bounded synchronization problem for NRAs.
PRELIMINARIES
A deterministic finite-state automaton (DFA) is a tuple A = ⟨Q, Σ, Δ⟩, where Q is a finite set of states, Σ is a finite alphabet, and Δ : Q × Σ → Q is a transition function that is totally defined. The function Δ extends to finite words in a natural way: Δ(q, wa) = Δ(Δ(q, w), a) for all words w ∈ Σ^* and letters a ∈ Σ; it extends to all sets S ⊆ Q by Δ(S, w) = ⋃_{q∈S} Δ(q, w).
Data words and RAs. For the rest of this article, fix an infinite data domain D. Given a finite alphabet Σ, a data word over Σ is a finite word over Σ × D. For a data word w = (a_1, d_1)(a_2, d_2) · · · (a_n, d_n), the length |w| of w is n. We use data(w) = {d_1, . . . , d_n} ⊆ D to refer to the set of data values occurring in w, and we define the data efficiency of w to be |data(w)|.
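The definitions of data(w) and data efficiency can be illustrated by a minimal sketch, assuming a data word is represented as a Python list of (letter, datum) pairs; the function names below are illustrative and not part of the article.

```python
from typing import Hashable, List, Set, Tuple

# Assumed representation: a data word is a list of (letter, datum) pairs.
DataWord = List[Tuple[str, Hashable]]


def data(w: DataWord) -> Set[Hashable]:
    """Return the set of data values occurring in w."""
    return {d for _, d in w}


def data_efficiency(w: DataWord) -> int:
    """Return |data(w)|, the number of distinct data values in w."""
    return len(data(w))


# A word of length 3 that uses only two distinct data values.
example_word: DataWord = [("a", 1), ("b", 2), ("a", 1)]
print(len(example_word), data_efficiency(example_word))
```

Note that the length of a data word and its data efficiency are independent measures: repeating a datum increases the former but not the latter.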
Let R be a finite set of register variables. We define register constraints ϕ over R by the grammar
ϕ ::= true | =r | ϕ ∧ ϕ | ¬ϕ,
where r ∈ R. We denote by Φ(R) the set of all register constraints over R. We may use ≠r for the inequality constraint ¬(=r). A register valuation is a mapping ν : R → D that assigns a data value to each register; we sometimes write ν = (ν(r_1), . . . , ν(r_k)) ∈ D^k, where R = {r_1, . . . , r_k}. The satisfaction relation of register constraints is defined on D^k × D as follows: (ν, d) satisfies the constraint =r if ν(r) = d; the other cases follow. For example,
where L is a finite set of locations, R is a finite set of registers, Σ is a finite alphabet, and
and ϕ the guard of this transition. A guard true is vacuously true and may be omitted. Likewise, we may omit up if up = ∅. We may write r↓ when up = {r} is a singleton set. For NRAs with only one register, we may write = and ≠ for the guards =r and ≠r, respectively, and ↓ for the update r↓.
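The satisfaction relation for register constraints can be sketched as a small recursive evaluator. The sketch assumes that constraints are built from true, =r, conjunction, and negation, and encodes them as nested tuples; this encoding and the helper name `neq` are our own illustration, not notation from the article.

```python
from typing import Dict, Hashable, Tuple

# Assumed encoding of register constraints as nested tuples:
#   ("true",)        the guard true
#   ("eq", r)        the guard =r
#   ("and", g1, g2)  conjunction
#   ("not", g)       negation
Guard = Tuple
Valuation = Dict[str, Hashable]


def satisfies(nu: Valuation, d: Hashable, guard: Guard) -> bool:
    """Decide whether the pair (nu, d) satisfies the register constraint."""
    kind = guard[0]
    if kind == "true":
        return True
    if kind == "eq":
        return nu[guard[1]] == d
    if kind == "and":
        return satisfies(nu, d, guard[1]) and satisfies(nu, d, guard[2])
    if kind == "not":
        return not satisfies(nu, d, guard[1])
    raise ValueError(f"unknown constraint: {guard!r}")


def neq(r: str) -> Guard:
    """Inequality constraint, i.e., the negation of =r."""
    return ("not", ("eq", r))


nu = {"r1": 5, "r2": 7}
fresh = ("and", neq("r1"), neq("r2"))  # the input differs from all registers
print(satisfies(nu, 5, ("eq", "r1")), satisfies(nu, 9, fresh))
```

The guard `fresh` corresponds to the kind of inequality guard on all registers that plays a central role in the arguments for DRAs.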
A configuration of R is a pair (ℓ, ν) ∈ L × D^{|R|} of a location ℓ and a register valuation ν. We describe the behavior of R as follows. Given a configuration q = (ℓ, ν) and some input
during processing a word w, we may say that an x-token is in ℓ (or simply a token is in ℓ).
In the rest of the article, we consider complete RAs, meaning that for all configurations q ∈ L × D |R | and all inputs (a, d ) ∈ Σ × D, there is at least one successor: |post(q, (a, d ))| ≥ 1. We also classify the RAs into DRAs and NRAs, where an RA is deterministic if |post(q, (a, d ))| ≤ 1 for all configurations q and all inputs (a, d ). A k-NRA (k-DRA, respectively) is an NRA (DRA, respectively) with |R| = k.
Synchronizing words and synchronizing data words. Synchronizing words are a well-studied concept for DFAs (e.g., see Volkov [31]). Informally, a synchronizing word leads the automaton from every state to the same state. Formally, the word w ∈ Σ^+ is synchronizing for a DFA A = ⟨Q, Σ, Δ⟩ if there exists some state q ∈ Q such that Δ(Q, w) = {q}. The synchronization problem for DFAs asks, given a DFA A, whether there exists some synchronizing word for A.
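As an illustration of the definition, the following sketch searches for a shortest synchronizing word by breadth-first search over subsets of Q. This is a brute-force procedure of exponential cost, shown only to make the definition concrete; it is not the algorithm used in this article. The example DFA is the standard four-state Černý automaton, and the encoding of Δ as a dictionary is our assumption.

```python
from collections import deque


def shortest_synchronizing_word(states, alphabet, delta):
    """Breadth-first search over subsets of states.

    delta maps (state, letter) to a state. Returns a shortest nonempty
    word w with |delta(states, w)| = 1 as a tuple of letters, or None.
    """
    start = frozenset(states)
    seen = {start}
    queue = deque([(start, ())])
    while queue:
        subset, word = queue.popleft()
        for a in alphabet:
            image = frozenset(delta[(q, a)] for q in subset)
            if len(image) == 1:
                return word + (a,)
            if image not in seen:
                seen.add(image)
                queue.append((image, word + (a,)))
    return None


def cerny(n):
    """Cerny automaton: 'a' is a cyclic shift, 'b' merges state n-1 into 0."""
    delta = {}
    for i in range(n):
        delta[(i, "a")] = (i + 1) % n
        delta[(i, "b")] = 0 if i == n - 1 else i
    return list(range(n)), ["a", "b"], delta


states, alphabet, delta = cerny(4)
word = shortest_synchronizing_word(states, alphabet, delta)
print(len(word))  # prints 9, i.e., (4 - 1)^2
```

For this automaton the length of the word found matches the bound (n − 1)^2 of the Černý conjecture with n = 4.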
The synchronization problem for DFAs is in NLOGSPACE by using the pairwise synchronization technique: given a DFA A = ⟨Q, Σ, Δ⟩, it is known that A has a synchronizing word if, and only if, for all pairs of states q, q' ∈ Q, there exists a word v such that Δ(q, v) = Δ(q', v) (see Volkov [31] for more details). The pairwise synchronization algorithm initially sets S_{|Q|} = Q. For i = |Q| − 1, . . . , 1, the algorithm repeats the following two steps: (a) for two distinct states q, q' ∈ S_{i+1}, find
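The pairwise criterion can be checked directly on the pair automaton. The following is a minimal polynomial-time sketch, not a logarithmic-space implementation; the function names and the dictionary encoding of Δ are illustrative assumptions.

```python
from collections import deque
from itertools import combinations


def pair_mergeable(p, q, alphabet, delta):
    """Check whether some word v satisfies delta(p, v) = delta(q, v)."""
    seen = {(p, q)}
    queue = deque([(p, q)])
    while queue:
        s, t = queue.popleft()
        if s == t:
            return True
        for a in alphabet:
            nxt = (delta[(s, a)], delta[(t, a)])
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def has_synchronizing_word(states, alphabet, delta):
    """A DFA is synchronizable iff every pair of states can be merged."""
    return all(
        pair_mergeable(p, q, alphabet, delta)
        for p, q in combinations(states, 2)
    )


# Example 1: 'a' rotates three states, 'b' merges state 2 into state 0.
rotate_merge = {}
for i in range(3):
    rotate_merge[(i, "a")] = (i + 1) % 3
    rotate_merge[(i, "b")] = 0 if i == 2 else i

# Example 2: a single letter that swaps two states (a permutation).
swap = {(0, "s"): 1, (1, "s"): 0}

print(has_synchronizing_word([0, 1, 2], ["a", "b"], rotate_merge))
print(has_synchronizing_word([0, 1], ["s"], swap))
```

The second example cannot be synchronized because its only letter acts as a permutation, so no pair of states is ever merged.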
We introduce synchronizing data words for RAs. Given an RA R = ⟨L, R, Σ, T⟩, a data word w ∈ (Σ × D)^+ is synchronizing for R if there exists some configuration q_w = (ℓ, ν) such that post(L × D^{|R|}, w) = {q_w}. Intuitively, whatever the starting location and register valuation, after reading the data word w, R is in the unique successor configuration q_w. This configuration q_w depends on w. The synchronization problem for RAs asks, given an RA R over a data domain D, whether there exists some synchronizing data word for R. The length-bounded synchronization problem for RAs asks, given an RA R and a bound N ∈ N written in binary, whether there exists some synchronizing data word w for R satisfying |w| ≤ N.
SYNCHRONIZING DATA WORDS FOR DRAS
In this section, we first show that the synchronization problems for 1-DRAs and DFAs are NLOGSPACE-interreducible, implying that the problem is NLOGSPACE-complete for 1-DRAs. Next, we prove that the problem for k-DRAs, in general, can be decided in PSPACE; a reduction similar to a timed setting, as in Doyen et al. [15] , provides the matching lower bound. To obtain the complexity upper bounds, we prove that inputting words with data efficiency 2|R| + 1 is sufficient to synchronize a DRA.
The concept of synchronization requires that all runs of an RA, whatever the initial configuration (initial location and register valuations), end in the same configuration ( synch , ν synch ), only depending on the synchronizing data word w synch , formally post(L × D |R | , w synch ) = {( synch , ν synch )}. While processing a synchronizing data word, the infinite set of configurations of RAs must necessarily shrink to a finite set of configurations. The DRA R with three registers depicted in Figure 1 illustrates this phenomenon. Consider the set {x 1 , x 2 , x 3 } ⊆ D of distinct data values: starting from any of the infinite configurations in {init} × D 3 , when processing the data word (a, x 1 )(a, x 2 )(a, x 3 ), R will be in a configuration in the finite set {( 3 , (x 1 , x 2 , x 3 )), ( 3 , (x 1 , x 2 , x 3 )}. We use this observation to provide a linear bound on the number of distinct data values that is sufficient for synchronizing DRAs. 
In Lemma 3.1, we prove that data words over only |R| distinct data values are sufficient to shrink the infinite set of all configurations of DRAs to a finite set. We establish this result based on the following two key facts:
(1) When processing a synchronizing data word w_synch from a configuration (ℓ, ν) with some register r ∈ R such that ν(r) ∉ data(w_synch), the register r must be updated. Observe that such updates must happen at inequality-guarded transitions, which themselves must be accessible by inequality-guarded transitions (possibly with no update). As an example, consider the DRA R in Figure 1, and assume
(2) Moreover, to shrink the set L × D^{|R|}, for every ℓ ∈ L, one can find a word w_ℓ that leads the DRA from {ℓ} × D^{|R|} to some finite set. Since R is deterministic, appending some prefix or suffix to w_ℓ achieves the same objective. This allows us to use a variant of the pairwise synchronization technique to shrink the infinite set L × D^{|R|} to a finite set, by successively inputting w_ℓ for a location ℓ that appears with infinitely many data in the current successor set of L × D^{|R|}.
Proof. Let R = ⟨L, R, Σ, T⟩ be a DRA on the data domain D with k ≥ 1 registers. Let v be a synchronizing data word for R with N = |data(v)| distinct data. Suppose that k < N; otherwise, the statement of the lemma trivially holds.
For all 1 ≤ i ≤ k, we say that x i is the i-th datum in the synchronizing data
We claim that for all locations ∈ L and all 1 ≤ i ≤ k, there exists some data word u i such that Otherwise, the two runs starting from any pair of configurations (ˆ , ν 1 ), (ˆ , ν 2 ) ∈ wait with unequal valuations ν 1 ν 2 would end up in distinct configurations, say ( , ν 1 ), ( , ν 2 ) with ν 1 ν 2 . This is a contradiction to the fact that the data word v is synchronizing. Now let the inequality-guarded transition r ∈R r up↓ − −−−−−−−−−− →, updating the registers in up, be fired at the j-th input (a j , d j ) while reading v (Figure 2) . We prove that the data word
to a subset in which each configuration has some register with value
This phenomenon is depicted in Figure 3 and can be argued as follows. Observe that x 1 = d 1 is the first input datum; thus, after inputting (a 1 , x 1 ), the set of successors is a disjoint union of two branches:
• either at least one register r has datum x 1 after the transition r ∈R =r a 1 − −−−−−−−− →, and all the following successors in this branch, on input (a 2 , x 1 )(a 3 , x 1 ) · · · (a j , x 1 ), preserve the datum x 1 in the register r ,
• or none of the registers is assigned x 1 after the transition 
The preceding argument proves that u 1 with data(
The base of induction holds.
Step of induction. Assume that the induction hypothesis holds for i − 1, namely, there exists some word
To construct u i , we define the concept of a symbolic state: we say that ( , up, ν, j) is a symbolic state if ∈ L, the set up ⊆ R of registers is such that |up| ≥ min(j, k ) and ν ∈ {x 1 , . . . , x j } k , and j ≤ N . The semantics of ( , up, ν, j) is the following set:
Denote by Γ the set of all such symbolic states ( , up, ν, i − 1). By definition, the set Γ is finite. Now we can construct u i as follows.
Start with j = 0 and, while S_j ≠ ∅, pick a symbolic state q = (ℓ, up, ν, i − 1) such that ⟦q⟧ ∩ S_j ≠ ∅ and construct a word u_q (as explained in the following details) such that
where j * ≤ |S 0 | is such that S j * = ∅, satisfies the induction statement. In the following, given a symbolic state q = ( , up, ν, i − 1), the aim is to construct the data word u q . Without loss of generality, we assume that
be the set of all configurations in the symbolic state q, where all data stored in the registers r up are not in data(v). Similarly to the induction base, no matter what the register valuation in a configuration in wait looks like, the unique run of R on the synchronizing word v = (a 1 , d 1 )(a 2 , d 2 ) · · · (a n , d n ) starting in that configuration takes the same sequence of transitions. Since ν ∈ {x 0 , . . . , x i−1 } k , after inputting successive data from data(v), all successors of configurations in wait are elements of a symbolic state. For all 0 ≤ j ≤ n, let the symbolic state
In the sequel, we argue that there exists some 1 ≤ m ≤ n such that, in the sequence of transitions from one symbolic state to another symbolic state over the prefix
, the following holds:
• and on inputting (a m , d m ), the transition 
Now from the prefix
e., the first m inputs), and from the set of data {x 1 , x 2 , . . . , x i }, we construct the word
• if Λ_j ≠ ∅, i.e., some register r ∈ up already stores the datum d_j, then y_j = d_j.
• if Λ j = ∅ (i.e., none of the registers r ∈ up stores the datum Observe that data(u q ) ⊆ {x 1 , . . . , x i }. As a result, all registers that are updated along the runs of R over u q store some datum from {x 1 , . . . , x i }. This argument shows that post( q , u q ) ⊆ L, i . This concludes the step of induction and completes the proof.
After reading some word that shrinks the infinite set of configurations of DRAs to a finite set S of configurations, we generalize the pairwise synchronization technique [31] to finally synchronize configurations in S. By this generalization, we achieve the following Lemma 3.2, for which the detailed proof can be found in Appendix A.
Given a 1-DRA R, the synchronization problem can be solved as follows. (1) Check that from each location an update on the single register is achieved by going through inequality-guarded transitions, which can be done in NLOGSPACE. Lemma 3.1 ensures that feeding R consecutively with a single datum x ∈ D is sufficient for this phase, and the set of successors of L × D is then a subset of L × {x}. (2) Pick an arbitrary set {x, y, z} of data including x; by Lemma 3.2 and the pairwise synchronization technique, the problem reduces to the synchronization problem for DFAs, where data in registers and input data extend locations and the alphabet: the state set is Q = L × {x, y, z} and the alphabet is Σ × {x, y, z}. Since a 1-DRA in which all transitions update the register and are guarded with true is equivalent to a DFA, we obtain the next theorem.
We provide a family of DRAs for which a linear bound on the data efficiency of synchronizing data words, depending on the number of registers, is necessary. This necessary and sufficient bound is crucial to establish membership of synchronizing DRAs in PSPACE.
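The reduction from 1-DRAs to DFAs with state set L × {x, y, z} and alphabet Σ × {x, y, z} can be sketched as follows. The sketch assumes the usual RA semantics (a transition fires if its guard holds for the register content and the input datum, and an update stores the input datum in the register) and an encoding of transitions as tuples (source, letter, guard, update, target) with guard in {"=", "!=", "true"}. Both the encoding and the toy automaton are our own illustration and do not appear in the article.

```python
def product_dfa(locations, alphabet, transitions, data_values):
    """Build the DFA over L x X with alphabet Sigma x X for a 1-DRA.

    transitions: tuples (source, letter, guard, update, target), where
    guard is "=", "!=" or "true" and update is a bool (assumed encoding).
    """
    delta = {}
    for loc in locations:
        for reg in data_values:            # current register content
            for a in alphabet:
                for d in data_values:      # input datum
                    targets = set()
                    for src, letter, guard, update, dst in transitions:
                        if src != loc or letter != a:
                            continue
                        if guard == "=" and reg != d:
                            continue
                        if guard == "!=" and reg == d:
                            continue
                        targets.add((dst, d if update else reg))
                    if len(targets) != 1:
                        raise ValueError("not complete and deterministic")
                    delta[((loc, reg), (a, d))] = targets.pop()
    return delta


# Hypothetical toy 1-DRA with locations p and q and a single letter a.
toy_transitions = [
    ("p", "a", "=", False, "p"),   # equal datum: stay in p
    ("p", "a", "!=", True, "q"),   # fresh datum: store it and move to q
    ("q", "a", "true", True, "q"),  # always store the datum
]
X = ["x", "y", "z"]
dfa = product_dfa(["p", "q"], ["a"], toy_transitions, X)

# Apply the data word (a, x)(a, y) to every state of the product DFA.
image = {(loc, reg) for loc in ["p", "q"] for reg in X}
for symbol in [("a", "x"), ("a", "y")]:
    image = {dfa[(state, symbol)] for state in image}
print(image)
```

On this toy automaton the data word (a, x)(a, y) sends every state of the product DFA to (q, y), so any DFA synchronization procedure applied to the product would report a synchronizing word.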
Lemma 3.4. There is a family of single-letter DRAs (R n ) n ∈N , with n = |R| registers and O(n) locations such that all synchronizing data words have data efficiency Ω(n).
Proof. The family of DRAs R n (n ∈ N) is defined over an infinite data domain D. The DRA R n has n registers and a single letter a. The structure of R n is composed of two distinguished locations init and synch and two chains, where each chain has n locations: 1 , 2 , . . . , n and 
, informally speaking, in both chains the respective i-th locations are simultaneously reached after inputting i distinct data: for all 1 ≤ i < n, in each i and i there are two transitions. One transition is a self-loop, with a satisfied equality guard on at least one of the updated registers r 1 , . . . , r i so far. The other transition goes to the next location i+1 in the chain, with an inequality guard on all updated registers r 1 , r 2 , . . . , r i so far, and an update on the next register r i+1 : At the last locations n and n of the two chains, there is one transition with inequality guards on all registers leaving the chain to synch, and there is one transition that is, again, a self-loop with an equality constraint for at least one of the registers: By construction, we see that n + 1 distinct data values must be read for reaching synch from the infinite set {init} × D n . Since R n can only be synchronized in synch, all synchronizing data words must have data efficiency at least n + 1 ∈ Ω(n).
It remains to prove that R n has indeed some synchronizing word. Let {x 1 , x 2 , . . . , x n+1 } be a set of n + 1 distinct data values and w synch = (a,
and |data(w_synch)| = n + 1. The proof is complete.
Theorem 3.5. The synchronization problem for k-DRAs is PSPACE-complete.
Proof (Sketch). The synchronization problem for k-DRA is in PSPACE using the following co-(N)PSPACE algorithm. (1) Pick a set X = {x 1 , x 2 , . . . , x 2k+1 } of distinct data values. (2) Guess some location ∈ L and check if there is no word w ∈ (Σ × {x 1 , x 2 , . . . , x k }) * with length |w | ≤ 2 k |L | |Σ | such that along firing transitions that are inequality-guarded on all k registers, some registers are not updated. If (2) is satisfied, then return "no" (meaning that there is no synchronizing data word for the input k-DRA). Otherwise, (3) guess two configurations (3) is satisfied, then the algorithm returns "no"; otherwise, return "yes".
For PSPACE-hardness, we adapt an established reduction (e.g., see Doyen et al. [15]) from the nonemptiness problem for k-DRAs (see Appendix A). The result then follows by PSPACE-completeness of the nonemptiness problem for k-DRAs [12].
SYNCHRONIZING DATA WORDS FOR NRAS
In this section, we study the synchronization problems for NRAs. We slightly update a result in Doyen et al. [15] to present a general reduction from the nonuniversality problem to the synchronization problem for NRAs. This reduction proves the undecidability result for the synchronization problem for k-NRAs and Ackermann-hardness in 1-NRAs. We then prove that for 1-NRAs, the synchronization and nonuniversality problems are indeed interreducible, which completes the picture by Ackermann-completeness of the synchronization problem for 1-NRAs.
Fig. 4. A partial picture of the 1-NRA R_counter(n) (with n ≥ 3) implementing a binary counter. To avoid crossing edges in the figure, we use two copies of the same location reset. All locations have inequality-guarded self-loops for all letters in Σ\{ }. All missing equality-guarded -transitions are directed to zero. For all 0 ≤ i < n, missing equality-guarded #-transitions from 2^i_c are guided to synch with an update on the register. All other nondepicted equality-guarded transitions are directed to reset, and inequality-guarded transitions are self-loops.
In the nondeterministic synchronization setting, we present two kinds of counting features, which are useful for later constructions. For the first one, we define a family (R_counter(n))_{n∈N} of 1-NRAs with size only linear in n, where an input datum x ∈ D must be read 2^n times to achieve synchronization.
Lemma 4.1. There is a family of 1-NRAs (R_counter(n))_{n∈N} with O(n) locations such that for all synchronizing data words w, some datum d ∈ data(w) appears in w at least 2^n times.
Proof (Sketch). The 1-NRA R_counter(n) shown in Figure 4 encodes a binary counter that ensures that in every synchronizing data word w some datum x ∈ data(w) appears at least 2^n times. The location synch has self-loops on all letters, and thus R_counter(n) can only be synchronized in location synch. Generally speaking, the counting involves an initializing process and several incrementing processes. The initializing process is started by firing a -transition, which places a token, let us say an x-token, into location zero. This sets the counter to 0. Note that firing -transitions is the only way to guide tokens out of reset; hence, whenever there is some token in reset, a new initializing process must be started. We use this to enforce a new initializing process whenever some transition is fired that is incorrect with respect to the incrementing process.
An incrementing process can be set off by inputting the datum x via equality guards. The numbers 1 ≤ m ≤ 2^n are represented by placing a copy of the x-token in the locations corresponding to the binary representation of m. An x-token in location 2^i (in 2^i_c, respectively) means that the i-th least significant bit in the binary representation is set to 1 (to 0, respectively). First, a Bit_0-transition places a copy of the x-token in each of {2^n_c, . . . , 2^2_c, 2^1_c, 2^0} to represent 0. . . 001. In each incrementation step, the x-tokens are replaced by firing specific Bit_i-transitions (0 ≤ i ≤ n), following the standard procedure of binary incrementation. At the end, when a copy of the x-token is located in each of {2^n, 2^{n−1}_c, . . . , 2^0_c} (representing 10. . . 0), the #-transitions guide all of these tokens to location synch and finally synchronize R_counter(n). We give a detailed explanation of the structure of R_counter(n) in Appendix B.
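The counting argument can be checked on a plain binary counter, independently of the automaton. The sketch below only counts how many standard increments lead from 0. . . 001 to 10. . . 0 on n + 1 bits; it assumes, as a simplification of our own, that each incrementation step corresponds to one reading of the datum x, and it does not model the transitions of R_counter(n).

```python
def increments_until_top_bit(n):
    """Count increments from 0...01 to 10...0 on n + 1 bits.

    bits[i] is the i-th least significant bit, mirroring the pair of
    locations used for bit i in the counter construction.
    """
    bits = [0] * (n + 1)
    bits[0] = 1                      # the counter starts at 1
    target = [0] * n + [1]           # the counter stops at 2^n
    steps = 0
    while bits != target:
        i = 0
        while bits[i] == 1:          # clear trailing ones (carry)
            bits[i] = 0
            i += 1
        bits[i] = 1                  # set the lowest zero bit
        steps += 1
    return steps


for n in range(1, 6):
    print(n, increments_until_top_bit(n))
```

Under the stated assumption, the 2^n − 1 increments together with the initial Bit_0 step account for 2^n readings of x, in line with the bound of Lemma 4.1.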
We present a second kind of counting feature in RAs that explains the hardness of synchronizing NRAs, even with a single register. In Lemma 4.2, we define a family of 1-NRAs (with only O(n) locations), where tower(n) distinct data must be read to gain synchronization. Recall from Schmitz [28] that the function tower is at level three of the infinite Ackermann hierarchy (A_k)_{k∈N} of fast-growing functions A_k : N → N, inductively defined by A_1(n) = 2n and A_{k+1}(n) = A_k^n(1), where A_k^n denotes the n-fold iteration of A_k. Hence, applying A_1, A_2, and A_3, respectively, on some natural number n results in some number that is double, exponential, and tower, respectively, in n. The function A_ω(n) = A_n(n) is a nonprimitive recursive Ackermann-like function, defined by diagonalization.
Proof. The domain of the family of 1-NRAs (R_tower(n))_{n∈N} is the natural numbers N. The alphabet of R_tower(n) is Σ = {#, , rep, doub, exp, tow}. The structure of R_tower(n) is composed of n locations data_1, data_{1,2}, . . . , data_{1,2,...,n} and six more locations reset, synch, store, rep, waitDoub, waitExp. The general structure of R_tower(n) is partially depicted in Figure 5. The NRA R_tower(n) is such that |data(w)| ≥ tower(n) for all synchronizing data words w.
All transitions in synch are self-loops with an update on the register, synch --[Σ, r↓]--> synch; thus, R_tower(n) can only be synchronized in synch. Moreover, synch is only accessible from store by a #-transition. Assuming that w is one of the shortest synchronizing words, we see that post(L × D, w) = {(synch, x)}, where w ends with (#, x).
From all locations ∈ L \ {synch}, we have r ↓ − −−−− → data 1 ; we say that -transitions reset R tower (n) . Moreover, the only outgoing transition in location reset is the -transition. Thus, a reset must occur to synchronize R tower(n) . After this forced reset, say on reading ( , 1), the set of reached configurations is {(data 1 , 1), (synch, 1)}. Since resetting is inefficient, we try to avoid it; we call all transitions leading to reset inefficient. For all locations data 1, ...,i with 1 ≤ i < n, we define the two transitions All other transitions in data 1, ...,i are inefficient and directed to reset. In the following, we rename data 1,2, ...,n to waitTow. We partially depict the transitions from waitTow, waitExp, waitDoub, rep, and store in Figure 5 . All transitions are inefficient except the following:
• waitTow
We remark that store --[#, r↓]--> synch is the only #-transition that is not inefficient. This implies that to synchronize R_tower(n) efficiently, one needs to move all produced tokens to store before firing a #-transition. The main issue in moving produced tokens, however, is that some inequality-guarded transitions are unavoidable, and these transitions may replicate the tokens. In particular, all {#, exp, tow}-transitions activate a reset. As a result, as long as some token is in waitDoub, {#, exp, tow}-transitions should be avoided for the sake of efficiency. This implies that for all 1 ≤ i ≤ n, the i-token in waitDoub can leave the location only individually on the input (doub, i). Now, inputting (doub, i) moves the i-token to waitRep. Here, the i-token must immediately move on to store via the inequality-guarded rep-transitions, which will replicate the i-token into two tokens. Note that we must fire rep-transitions with some "fresh" datum j such that j ∉ {1, . . . , n}; otherwise, a reset is evoked. (For simplicity, we use j = i + n by convention.) It can now be easily seen that the only efficient way to guide all n tokens out of waitDoub is by inputting the data word
which puts 2n distinct tokens into store.
Exponentialization. Assume that there are n distinct tokens {1, 2, . . . , n} in waitExp. The only efficient transition is waitExp --[=r, exp]--> waitDoub. In particular, all {#, tow}-transitions activate a reset and should be avoided as long as some token is in waitExp. This implies that for all 1 ≤ i ≤ n, the i-token in waitExp can leave the location only individually on the input (exp, i). Now, inputting (exp, 1) moves the 1-token to waitDoub. From earlier, we know that the only efficient way for guiding a single token in waitDoub toward synchronization is by inputting the data word w_doub(1), resulting in two distinct tokens in store: 1 and 2. We can now proceed to remove the 2-token from waitExp by inputting (exp, 2). Note that this also guides the {1, 2}-tokens residing in store to waitDoub. Again, for efficient synchronization, we must input the data word w_doub(2), which results in four distinct tokens {1, 2, 3, 4} in store. It is now easy to see that the only efficient way to guide all n tokens out of waitExp is by inputting the data word
which puts 2 n distinct tokens into store.
Towering. Assume that there are n distinct tokens {1, 2, . . . , n} in waitTow. The only efficient transition is waitTow --[=r, tow]--> waitExp. In particular, firing #-transitions activates a reset and should be avoided as long as some token is in waitTow. This implies that for all 1 ≤ i ≤ n, the i-token in waitTow can leave the location only individually on the input (tow, i). Now, inputting (tow, 1) moves the 1-token to waitExp. From earlier, we know that the only efficient way for guiding a single token in waitExp toward synchronization is by inputting the data word w_exp(1), resulting in two distinct tokens in store: 1 and 2. We can now proceed to remove the 2-token from waitTow by inputting (tow, 2). Note that this also guides the {1, 2}-tokens residing in store to waitExp. Again, for efficient synchronization, we must input the data word w_exp(2), which results in four distinct tokens {1, 2, 3, 4} in store. It is now easy to see that the only efficient way to guide all n tokens out of waitTow is by inputting the data word
which puts tower(n) distinct tokens into store. Now, after the (forced) initial reset by firing -transitions, it is easy to see that the only data word that advances in synchronizing is (rep, 2)(rep, 3) · · · (rep, n). It replicates the 1-token to n distinct tokens 1, 2, . . . , n, which are placed into waitTow. From earlier, we know that the only efficient way to guide all n tokens out of waitTow is by inputting w tow(n) , which places tower(n) distinct tokens into store. We can now fire #-transitions to synchronize R tower(n) without evoking a reset, but note that due to the equality guard at the #-transition from store to synch, each of the tower(n) distinct tokens in store can move to synch only individually. This implies |data(w )| ≥ tower(n) for all synchronizing words w.
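The token counts obtained by doubling, exponentialization, and towering (2n, 2^n, and tower(n) tokens in store) follow the first three levels of the Ackermann hierarchy. The sketch below evaluates the hierarchy numerically, assuming the inductive step A_{k+1}(n) = A_k^n(1), that is, n-fold iteration of A_k starting from 1, which is consistent with the double, exponential, and tower behavior described before Lemma 4.2. The function name `ackermann_level` is illustrative.

```python
def ackermann_level(k, n):
    """Evaluate A_k(n), assuming A_1(n) = 2n and A_{k+1}(n) = A_k^n(1)."""
    if k < 1 or n < 0:
        raise ValueError("expected k >= 1 and n >= 0")
    if k == 1:
        return 2 * n
    value = 1
    for _ in range(n):               # n-fold iteration of A_{k-1} on 1
        value = ackermann_level(k - 1, value)
    return value


# Level 1 doubles, level 2 exponentiates, level 3 builds a tower.
print([ackermann_level(1, n) for n in range(1, 5)])
print([ackermann_level(2, n) for n in range(1, 5)])
print([ackermann_level(3, n) for n in range(1, 5)])
```

The iteration mirrors the proof: each token leaving waitExp triggers one round of doubling on the tokens collected so far, and each token leaving waitTow triggers one round of exponentialization. Values grow very quickly, so the sketch is only practical for small arguments.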
We can now use similar ideas as in Lemma 4.2 for defining a family of 1-NRAs $R_{A_n(m)}$ (n, m ∈ N) such that all synchronizing data words of $R_{A_n(m)}$ have data efficiency at least $A_n(m)$, where $A_n$ is at level n of the Ackermann hierarchy. This provides a good intuition that the synchronization problem for NRAs must be Ackermann-hard, even if the NRA has a single register. In the following, we prove that the synchronization problem and the nonuniversality problem for NRAs are interreducible.
Let us first define the nonuniversality problem for RAs. To define the language of a given NRA R, we equip it with an initial location $\ell_{in}$ and a set $L_f$ of accepting locations, where, without loss of generality, we assume that all outgoing transitions from $\ell_{in}$ update all registers. The language L(R) is the set of all data words $w \in (\Sigma \times D)^*$ for which there is a run from $(\ell_{in}, \nu_{in})$ to $(\ell_f, \nu_f)$ such that $\ell_f \in L_f$ and $\nu_{in}, \nu_f \in D^{|R|}$. The nonuniversality problem asks, given an RA, whether there exists some data word w over Σ such that $w \notin L(R)$. We adapt an established reduction in Doyen et al. [15] to provide the following lemma.
Lemma 4.3. The nonuniversality problem is reducible to the synchronization problem for NRAs.
The detailed proof can be found in Appendix B. As an immediate result of Lemma 4.3 and the undecidability of the nonuniversality problem for NRAs (Theorems 2.7 and 5.4 in Demri and Lazic [12] ), we obtain the following theorem.
Theorem 4.4. The synchronization problem for NRAs is undecidable.
Next, we present a reduction showing that for 1-NRAs, the synchronization problem is reducible to the nonuniversality problem, providing the tight complexity bounds for the synchronizing problem.
Lemma 4.5. The synchronization problem is reducible to the nonuniversality problem for 1-NRAs.
Proof. We establish a reduction from the synchronization problem to the nonuniversality problem for 1-NRAs as follows. Given a 1-NRA $R = \langle L, R, \Sigma, T \rangle$, we construct a 1-NRA $R_{comp}$ equipped with an initial location and a set of accepting locations such that R has some synchronizing word if, and only if, there exists some data word that is not in $L(R_{comp})$.
11:14 K. Quaas and M. Shirmohammadi
First, we see that an analogue of Lemma 3.1 holds for 1-NRAs: for all 1-NRAs with some synchronizing data word, there exists some word w with data efficiency 1 such that post(L × D, w) ⊆ L × data(w). For all locations ℓ ∈ L, such a data word must update the register by firing an inequality-guarded transition that is reached only via inequality-guarded transitions; this can be checked in NLOGSPACE. Given R, we assume that such a data word w always exists; otherwise, we define $R_{comp}$ to be a 1-NRA with a single (initial and accepting) location equipped with self-loops for all letters so that $L(R_{comp}) = (\Sigma \times D)^*$. Given data(w) = {x}, we say that R has some synchronizing word v if post(L × {x}, v) is a singleton.
Second, we define a data language lang such that data words in this language are encodings of the synchronizing process. Let $L = \{\ell_1, \ell_2, \ldots, \ell_n\}$ be the set of locations and x, y two distinct data. Informally, each data word in lang consists of the following blocks, in this order:
• initial block: a delimiter carrying the datum y, the sequence $(\ell_1, x), (\ell_2, x), \ldots, (\ell_n, x)$, and an input (a, d) ∈ Σ × D as the beginning of a synchronizing word;
• several normal blocks: each consists of the delimiter carrying y, the set of successor configurations reached from the configurations and the input of the previous block, and the next input (a′, d′) of the synchronizing data word;
• final block: the delimiter carrying y, a single successor configuration reached from the configurations and the input of the previous block, and again the delimiter carrying y.
Formally, the language lang is defined over the alphabet
It contains all data words u that satisfy the following membership conditions:
(1) The data word u starts with the delimiter carrying y, followed by $(\ell_1, x), (\ell_2, x), \ldots, (\ell_n, x)$, for some x, y ∈ D with y ≠ x; this condition guarantees the correctness of the encoding for the initial block.

By construction, the NRA R has some synchronizing data word if, and only if, lang ≠ ∅. In the following, we construct a 1-NRA $R_{comp}$ that accepts the complement of lang. Then, the NRA R has some synchronizing data word if, and only if, there exists some data word that is not in $L(R_{comp})$.
The 1-NRA $R_{comp}$ is the union of several 1-NRAs from the families $R_1, R_2, \ldots, R_7$, where a 1-NRA is in the family $R_i$ if it accepts data words that violate the i-th membership condition of lang:
(1) Family $R_1$: We add a 1-NRA that accepts data words not starting with ( , y)
(2) Family $R_2$: We add a DFA that accepts data words u such that proj(u) is not in the regular language ( L + Σ) + synch .
(3) Family $R_3$: We add a 1-NRA that accepts data words in which two delimiters have different data.
(4) Family $R_4$: We add a 1-NRA that accepts data words in which the datum of the first delimiter is not used only by occurrences of the delimiter.

The proof is complete.

Fig. 6. ... (a, x)(b, y)(b, z) with three distinct data values x, y, z. The approach of using a unique data value to shrink the infinite set of configurations to a finite subset only yields synchronizing data words of length greater than 3.
By Lemmas 4.3 and 4.5 and Ackermann-completeness of the nonuniversality problem for 1-NRAs, which follows from Theorem 2.7 and the proof of Theorem 5.2 in Demri and Lazic [12], and the result for counter automata with incrementing errors in Figueira et al. [19], we obtain the following theorem.

Theorem 4.6. The synchronization problem for 1-NRAs is Ackermann-complete.
LENGTH-BOUNDED SYNCHRONIZING DATA WORDS FOR NRAS
As proved in the previous section, the synchronization problem for NRAs is in general undecidable. In this section, we study the length-bounded synchronization problem for NRAs, in which the synchronizing data words are required to be shorter than a given length (written in binary).
To decide the synchronization problem in 1-RAs, both in the deterministic and nondeterministic setting, we rely on Lemma 3.1. With this lemma at hand, it was sufficient to search for synchronizing data words that first input a single datum x (chosen arbitrarily) as many times as necessary to have the set of successor configurations included in L × {x}. In the next step, the obtained set of successor configurations was synchronized to a singleton. However, the shortest synchronizing data words do not always follow this pattern (for an example, see Figure 6). Observe that the data word (a, x)(b, y)(b, z) is synchronizing with length 3 (not exceeding the bound 3). However, all synchronizing data words that repeat a datum such as x, to first bring the RA to a finite set, have length at least 4. The example shows that one cannot rely on the techniques developed in Section 4 to decide the length-bounded synchronization problem for NRAs.
In this section, we prove the following theorem.
Theorem 5.1. The length-bounded synchronization problem for NRAs is NEXPTIME-complete.
The NEXPTIME-membership of the length-bounded synchronization problem is straightforward: guess a data word w shorter than the given length (the bound is written in binary, so w may be exponentially long in the size of the input), and check in EXPTIME whether w is synchronizing. Our main contribution is to prove the NEXPTIME-hardness of this problem, for which in turn, by Lemma 4.3, it is sufficient to show that the length-bounded universality problem is co-NEXPTIME-complete. The length-bounded universality problem asks, given an RA and N ∈ N encoded in binary, whether all data words w with |w| ≤ N are in the language of the automaton.
Theorem 5.2. The length-bounded universality problem for NRAs is co-NEXPTIME-complete.
Proof. The length-bounded universality problem for NRAs can be solved in co-NEXPTIME, by guessing a (possibly exponentially long) data word, and checking whether the guessed word is a witness for nonuniversality of the RA.
We prove that the complement of the length-bounded universality problem is NEXPTIME-hard. The proof is a reduction from the membership problem of $O(2^n)$-time bounded nondeterministic Turing machines: given a nondeterministic Turing machine M and an input word x, decide whether M accepts x within time bound $2^{|x|}$. This problem is NEXPTIME-complete.
Given a nondeterministic Turing machine M and an input x of length n, we construct an NRA R equipped with an initial location and a set of accepting locations, and a bound N (encoded in binary) such that there exists a witness of nonuniversality w (i.e., $w \notin L(R)$) with |w| ≤ N if, and only if, M has some accepting computation on x within time bound $2^n$.
Let M have the set Q of control states and the tape alphabet Γ. Let us recall that a configuration of M is a word in the language $\Gamma^* (Q \times \Gamma) \Gamma^*$, where each letter in (Q × Γ) ∪ Γ encodes a single cell and the position of the reading/writing head. A computation ρ of M is a sequence $c_0 c_1 c_2 \cdots$ of configurations that respects the transition function of the Turing machine. Without loss of generality, we assume that M has a self-loop on all accepting states. Hence, for the input $x \in \Gamma^*$ of length n, all accepting computations ρ of M are sequences of length exactly $2^n$, and all configurations $c_i$ along such a computation are words $c_i \in \Gamma^* (Q \times \Gamma) \Gamma^*$ of length at most $2^n$. In the following, we pad the configurations shorter than $2^n$ with a padding symbol at the tail such that the length of all such configurations becomes equal to $2^n$.
Let
and Σ′ will be defined later. Let $K = 2^{3n} + 2^{2n} + 1$. Given a computation $\rho = c_1 \cdots c_{2^n}$, we define $u(\rho) \in \Sigma^K$, roughly speaking, as follows:
(1) It consists of $2^n$ copies of ρ (with some extra delimiters).
(2) Between all consecutive copies of ρ there is a delimiter, and u(ρ) starts and ends with this delimiter as well. Hence, there are $2^n + 1$ occurrences of the delimiter in u(ρ).
(3) In each copy of ρ, there is a # delimiter between consecutive configurations. Since there are $2^n$ configurations in (each copy of) ρ, the number of # in u(ρ) is $2^n(2^n - 1)$.
(4) In the i-th copy of ρ, the letter for the i-th cell of every participating configuration is dotted; all other letters are nondotted. Hence, in each copy of ρ, there are exactly $2^n$ dotted letters (one in each configuration of ρ), with distance $2^n + 1$.
(5) The distance between two consecutive delimiters is $2^{2n} + 2^n - 1$, due to the fact that ρ consists of $2^n$ configurations, each of which has $2^n$ tape cells in turn and is separated from the next configuration by a # delimiter.
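The value of K can be checked against items (1) to (5) by direct counting. The short Python sketch below is only an arithmetic sanity check; the function name `encoding_length` is hypothetical, and the counts it uses are exactly those listed above ($2^n$ copies, $2^n$ configurations of $2^n$ cells each, $2^n - 1$ separators # per copy, and $2^n + 1$ outer delimiters).

```python
def encoding_length(n: int) -> int:
    """Length of u(rho), counted from its parts (hypothetical helper)."""
    copies = 2 ** n                      # item (1): number of copies of rho
    configs_per_copy = 2 ** n            # configurations in one copy
    cells_per_config = 2 ** n            # padded length of a configuration
    hashes_per_copy = configs_per_copy - 1   # item (3): '#' between configs
    copy_length = configs_per_copy * cells_per_config + hashes_per_copy
    delimiters = copies + 1              # item (2): outer delimiters
    return copies * copy_length + delimiters


if __name__ == "__main__":
    for n in range(1, 11):
        # Agrees with K = 2^{3n} + 2^{2n} + 1 for every tested n.
        assert encoding_length(n) == 2 ** (3 * n) + 2 ** (2 * n) + 1
    print("K matches the component count for n = 1..10")
```

The check also confirms item (5): one copy contributes $2^{2n} + 2^n - 1$ letters between two consecutive outer delimiters.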
We define a data language lang over the alphabet Σ such that data words in this language are faithful encodings of computations ρ of M over the input word x. In particular, the language contains all data words v that satisfy the following conditions:
(6) Let proj(v) be the projection of v into Σ (i.e., omitting the data values). There exists some accepting computation ρ of M on the input x such that proj(v) = u(ρ).
(7) The delimiter and # occur only with a unique datum, say datum 0 (and no other letter occurs with that datum).
(8) For all occurrences of the delimiter, for all $1 \le i \le 2^{2n} + 2^n - 1$, all letters at the i-th positions after each delimiter must carry the same datum, say datum i. Except for occurrences of #, the datum i is exclusive for the i-th positions after occurrences of the delimiter.
Given a data word v ∈ lang such that proj(v) = u(ρ) for some computation ρ, condition (8) and previous conditions on u(ρ) entail that for all $1 \le j, k \le 2^n$ the j-th tape cell in the k-th configuration $c_k$ of all copies of ρ in v carries the same datum (revisit Figure 7). Observe that all data words v ∈ lang use exactly $2^{2n} + 1$ distinct data values.
By definition of lang, we see that lang is nonempty if, and only if, there is an accepting computation ρ of M over x. Recall that $\Sigma_M = \Sigma \cup \Sigma'$ (where Σ′ is defined later). In the following, we construct a 1-NRA R over alphabet $\Sigma_M$ such that the language accepted by R (projected into Σ, ignoring Σ′ letters) is the complement of lang. At the end, we examine the existence of N ∈ O(K) such that M has an accepting computation over x if, and only if, R is (length-bounded) nonuniversal with respect to the bound N.
The 1-NRA R is the union of several 1-NRAs and DFAs that we describe in the following. Each of these automata violates one of the necessary conditions for data words v to be in lang:
• We add a DFA that accepts data words v such that proj(v) is not in the regular language ( L) * , where L is defined by
• We add a DFA that accepts data words v such that proj(v) does not start with
where q init is the initial control state of M and x = a 1 a 2 · · · a n is the input. This regular expression also guarantees that in the first copy of ρ, the first cell is dotted.
• We add a DFA that accepts data words w containing at least two dotted letters between two consecutive #.
• We add a 1-NRA that accepts data words in which some delimiter occurs with a datum different from the datum of the first delimiter.
• We add a 1-NRA that accepts data words in which some other letter appears with the datum dedicated to delimiters and #.
• We add a 1-NRA that accepts data words in which there are two letters (other than #) between two consecutive delimiters that carry the same datum.
• We add a 1-NRA that accepts data words v such that there are two consecutive # whose distance is not exactly $2^n$ (ignoring the occurrences of the delimiter). For this, we use a variant of $R_{\mathrm{counter}(n)}$ implementing a binary counter introduced in Section 4. For accepting data words v such that the distance between two consecutive # is less than $2^n$, we add a transition
and for accepting those words in which the distance is more than $2^n$, we add a transition
Here, f is an accepting location with a self-loop for every letter in Σ.
For the next four 1-NRAs, we can use simple variants of $R_{\mathrm{counter}(n)}$:
• We add a 1-NRA that accepts data words v such that between two consecutive delimiters, the letter # does not occur exactly $2^n - 1$ times.
• We add a 1-NRA that accepts data words v such that the delimiter does not occur exactly $2^n + 1$ times.
• We add a 1-NRA that accepts data words v such that the distance between two consecutive dotted letters is not exactly $2^n + 1$, if no delimiter is seen between these two letters. We add another 1-NRA that accepts data words v such that the distance between two consecutive dotted letters is not exactly $2^n + 2$ if a delimiter is seen.
• We add a 1-NRA that accepts data words v such that the letters with $2^{2n} + 2^n - 1$ distance carry different data.
To implement the preceding binary counters with 1-NRAs, we finally define
n for counting the distance between two consecutive #. The counter takes into account only letters in Σ other than the delimiter, ignoring the occurrences of the delimiter and other $\mathrm{Bit}_i$-letters from Σ′. The 1-NRA detects whether the distance is less or greater than $2^n$.
• letters $\mathrm{Bit}^{\#}_0, \ldots, \mathrm{Bit}^{\#}_n$ for counting the occurrences of #. The 1-NRA detects whether the number of # between two consecutive delimiters is less or greater than $2^n - 1$.
• letters $\mathrm{Bit}_0, \ldots, \mathrm{Bit}_n$ for counting the occurrences of the delimiter (to check against $2^n + 1$).
• letters $\dot{\mathrm{Bit}}_0, \ldots, \dot{\mathrm{Bit}}_n$ for counting the distance between two consecutive dotted letters (to check against $2^n + 1$ or $2^n + 2$).
• letters $\mathrm{Bit}^x_0, \ldots, \mathrm{Bit}^x_{2n}$ for counting the distance between two letters that carry the same datum (to check against $2^{2n} + 2^n + 1$).
We construct all of these gadgets such that the Bit-letters always carry the same datum as the delimiters.
The union of all preceding 1-NRAs and DFAs accepts all data words except those v such that proj(v) consists of $2^n$ delimiter-separated copies of some ρ (and that in addition respect the uniqueness conditions on data appearing in v). Finally, we add NRAs that check whether $\rho = c_1 \cdots c_{2^n}$ in such v is not a faithful computation of M, or is not an accepting computation. To this aim, for all words $\sigma_1\sigma_2\sigma_3 \in ((Q \times \Gamma) \cup \Gamma)^3$ of length three such that $\sigma_1\sigma_2\sigma_3$ can appear at some position i in a valid configuration c of M, we define $\mathrm{Post}(\sigma_1\sigma_2\sigma_3)$ to be the set of words $u \in ((Q \times \Gamma) \cup \Gamma)^3$ that can appear in a successor configuration of c in the same position i (according to the rules of M):
• For all words $\dot{\sigma}_1\sigma_2\sigma_3$ that start with a dotted letter (that is, a dotted letter followed by a word in $((Q \times \Gamma) \cup \Gamma)^2$), we add a 1-NRA that accepts data words that for some occurrence of the subword
, the subword $\tau_1\dot{\tau}_2\tau_3$ (ignoring the data values) with exactly $2^{2n} + 2^{n+1} + 1$ distance is not in $\mathrm{Post}(\sigma_1\sigma_2\sigma_3)$. Observe that the subword $\dot{\sigma}_1\sigma_2\sigma_3$ is intuitively indicating some part of some configuration c in some copy of ρ, and $\tau_1\dot{\tau}_2\tau_3$ with distance $2^{2n} + 2^{n+1} + 1$ is a subword of the successor configuration of c in the next copy of ρ.
The following NRA is for the case $(q_{init}, a_1)a_2$. To implement this 1-NRA, we rely on the previous conditions that two letters (apart from the delimiters) with the same datum have the exact distance $2^{2n} + 2^{n+1} + 1$ (checked with a parallel 1-NRA).
• We add a DFA that accepts data words v such that the last configuration in ρ does not contain a letter, dotted or nondotted, from $Q_f \times \Gamma$, where $Q_f$ is the set of accepting control states of M.
To complete the proof, we examine the existence of N ∈ O(K) such that M has an accepting computation over x if, and only if, R is (length-bounded) nonuniversal with respect to the bound N. Given the shortest witness $w \in \Sigma_M^+$ of nonuniversality of R, the projection v of w into Σ encodes an accepting computation of M over x and subsequently has length exactly K. The extra letters of w compared to v are to implement the five needed counters faithfully. However, these letters do not increase the length of w much more than K: for instance, the condition for counting the occurrences of # requires that we accompany every # with a single Bit-letter.
Note that N is still exponential in n.
The construction of R is complete, and the NEXPTIME-hardness follows from the sketched reduction. Note that the result already holds for 1-NRAs.
There is a natural reduction from the nonuniversality problem for 1-NRAs to the emptiness problem for single-register alternating RAs (1-ARAs). The trivial NEXPTIME membership (guess and check) and Theorem 5.1 lead to the NEXPTIME-completeness of the length-bounded emptiness problem for 1-ARAs.

Proof. Let $R = \langle L, R, \Sigma, T \rangle$ be a DRA on the data domain D and with k ≥ 1 registers. Recall that we denote by data(w) the data occurring in data words w; for configurations q = (ℓ, ν), we use the same notation data(q) = {ν(r) | r ∈ R} to denote the data appearing in the valuation of q. Let π :
, where ν satisfies ν (r ) = π (ν (r )) for all r ∈ R. For every data word w = (a 1 , d 1 
Note that the application of π on q and w preserves the reachability property (i.e., post(π(q), π(w)) = {π(q′) | q′ ∈ post(q, w)}).
Assuming that R has some synchronizing data word, we first prove the following claim by an induction.
Claim. For all pairs of configurations q 1 , q 2 , if there exists w such that |post({q 1 , q 2 }, w )| = 1, then
Note that by |X| = 2k + 1, the data efficiency of $w_{q_1,q_2}$ is at most 2k + 1.
Proof of claim. Let $q_1$ and $q_2$ be two configurations of R and define $\mathrm{data}(q_1, q_2) = \mathrm{data}(q_1) \cup \mathrm{data}(q_2)$. Since R has some synchronizing data word, there exists w such that $|\mathrm{post}(\{q_1, q_2\}, w)| = 1$. The proof is by induction on the length of w.
Base of induction.
Assume that w = (a, d) has length |w| = 1. Let X be an arbitrary set of data such that |X| = 2k + 1 and $\mathrm{data}(q_1, q_2) \subseteq X$. There are two cases:
• d ∈ X : This entails that data(w ) ⊆ X . Observe that w q 1 ,q 2 = w satisfies the induction statement.
Since x ≠ d, we can define the bijection π :
This and the assumption |post(
The base of induction hence holds.
Step of induction. Assume that the induction hypothesis holds for i − 1. Consider some word (a, d ) · w such that |w | = i − 1 and |post({q 1 
Considering some set X that has cardinality 2k + 1 and data(q 1 , q 2 ) ⊆ X , we construct the data word w q 1 ,q 2 as follows. Let p 1 = post(q 1 , (a, d ) ) and p 2 = post(q 2 , (a, d )), and let data(p 1 , p 2 ) = data(p 1 ) ∪ data(p 2 ). Due to the fact that p 1 , p 2 are successors of q 1 , q 2 after inputting (a, d ), we know that if d ∈ data(q 1 , q 2 ) then d ∈ data(p 1 , p 2 ). There are two cases: 
Without loss of generality, we assume that d ∉ X.
Otherwise, d ∈ X would imply $\mathrm{data}(p_1, p_2) \subseteq X$, and we simply let $w_{q_1,q_2} = w_{p_1,p_2}$. Since $|\mathrm{data}(q_1, q_2)| \le 2k$, there exists some datum x ≠ d such that $x \in X \setminus \mathrm{data}(q_1, q_2)$. Since x ≠ d, we can define the bijection π : q 2 ) , having d in the domain of π, the bijection π ranges over $\mathrm{data}(p_1, p_2)$. By induction hypothesis, there exists some data word
By the preceding arguments, we conclude that |post((
The preceding arguments prove that in all cases there exists $w_{q_1,q_2} \in (\Sigma \times X)^*$ that merges two configurations $q_1$ and $q_2$ into a singleton, which completes the proof of the claim.
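The idea of merging two configurations at a time has a well-known finite-state analogue, which may help to visualize how such merging words are combined. The Python sketch below works on an ordinary complete DFA rather than on a DRA, so it ignores registers and data entirely; this simplification, the dictionary encoding `delta[state][letter]`, and all function names are assumptions made for illustration only.

```python
from collections import deque


def merging_word(delta, alphabet, p, q):
    """Shortest word sending p and q to the same state, or None (BFS on pairs)."""
    start = (p, q)
    parent = {start: None}
    queue = deque([start])
    while queue:
        pair = queue.popleft()
        if pair[0] == pair[1]:
            word = []
            while parent[pair] is not None:   # walk back to the start pair
                pair, letter = parent[pair]
                word.append(letter)
            return word[::-1]
        for a in alphabet:
            nxt = (delta[pair[0]][a], delta[pair[1]][a])
            if nxt not in parent:
                parent[nxt] = (pair, a)
                queue.append(nxt)
    return None


def run(delta, state, word):
    """State reached from `state` after reading `word`."""
    for a in word:
        state = delta[state][a]
    return state


def synchronizing_word(delta, alphabet):
    """Concatenate pair-merging words until a single state remains."""
    current = set(delta)
    word = []
    while len(current) > 1:
        p, q = sorted(current)[:2]
        w = merging_word(delta, alphabet, p, q)
        if w is None:            # some pair cannot be merged at all
            return None
        word += w
        current = {run(delta, s, w) for s in current}   # shrinks by >= 1
    return word


if __name__ == "__main__":
    # Four-state example: 'a' is a cyclic shift, 'b' maps state 3 to 0.
    delta = {i: {"a": (i + 1) % 4, "b": 0 if i == 3 else i} for i in range(4)}
    w = synchronizing_word(delta, "ab")
    assert w is not None and len({run(delta, s, w) for s in delta}) == 1
```

As in the deterministic register setting, determinism guarantees that every round reduces the number of remaining states by at least one, so the loop terminates after fewer rounds than there are states.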
Since R has some synchronizing data word, using Lemma 3.1, we know that there exists some word w with data efficiency k such that
We use the pairwise synchronization technique as follows. Define $S_n = L \times X^k$ and $n = |L|(2k + 1)^k$ (i.e., $|S_n| = n$). For all i = n − 1, . . . , 1 repeat the following:
(1) Take a pair of configurations q 1 , q 2 ∈ S i+1 . By the preceding claim, one can find some word
Note that by determinism of R, for every i ∈ {1, . . . , n − 1}, we have $|S_i| \le |S_{i+1}| - 1$. Thus, the word $w_{synch} = w \cdot v_{n-1} \cdots v_2 \cdot v_1$ is a synchronizing data word for R. Since data(w) ⊆ X and $\mathrm{data}(v_i) \subseteq X$ for all i ∈ {1, . . . , n − 1}, the data efficiency of $w_{synch}$ is at most 2k + 1. The proof is complete.

Lemma 3.5. The synchronization problem for k-DRAs is PSPACE-complete.
Proof. We prove PSPACE-hardness by a reduction from the nonemptiness problem for k-DRAs. Let R = (L, R, Σ, T) be a k-DRA equipped with an initial location $\ell_i$ and an accepting location $\ell_f$, where, without loss of generality, we assume that all outgoing transitions from $\ell_i$ update all registers, and that $\ell_f$ has no outgoing edges. We also assume that R is complete; otherwise, we add some nonaccepting location and direct all undefined transitions to it.
The reduction is such that from R we construct another k-DRA $R_{syn}$ such that the language of R is not empty if, and only if, $R_{syn}$ has some synchronizing data word. We define $R_{syn} = (L_{syn}, R, \Sigma_{syn}, T_{syn})$ as follows. The set of locations is $L_{syn} = L \cup \{\mathit{reset}\}$, where reset ∉ L is a new location; the alphabet $\Sigma_{syn}$ extends Σ by one new letter that is not in Σ. To define $T_{syn}$, we add the following transitions to T:
Note that $R_{syn}$ is indeed deterministic and complete. To establish the correctness of the reduction, we prove that the language of R is not empty if, and only if, $R_{syn}$ has a synchronizing data word.
First, assume that the language of R is not empty. Then there exists a data word w = (a 1 , d 1 ) · · · (a n , d n ) such that w ∈ L(R). Hence, there exists a run starting from ( i , ν i ) and ending
Second, assume that $R_{syn}$ has some synchronizing data word. Let $w \in (\Sigma_{syn} \times D)^*$ be one of the shortest synchronizing data words. All transitions in $\ell_f$ are self-loops with update on all registers; therefore, $R_{syn}$ can only be synchronized in $\ell_f$. Hence, we also have post(
. By the fact that w is a shortest synchronizing data word, we can infer that the corresponding run does not contain any transitions on the new letter except for two self-loops in $\ell_i$ in the very beginning. Hence, there exists a run from $(\ell_i, \nu_i)$ to $\ell_f$, and thus L(R) ≠ ∅.
B PROOFS FOR NONDETERMINISTIC REGISTER AUTOMATA
Initially, the token in zero splits in 2 0 and 2 n c , . . . for all 1 ≤ j ≤ n. Equality-guarded Bit itransitions for i ∈ {1, . . . , n} are incorrect for zero and thus guided to reset. Whenever data different from x is processed, R counter(n) takes self-loops (omitted in Figure 4 ) and keeps the x-tokens unmoved.
The equality-guarded $\mathrm{Bit}_i$-transitions should only be taken if the i-th bit is not set or, equivalently, if the location $2^i$ contains no token. This is guaranteed by a $\mathrm{Bit}_i$-transition $2^i \xrightarrow{=r\ \mathrm{Bit}_i} \mathit{reset}$, for every 0 ≤ i ≤ n, which results in an incorrect transition and should be avoided. (Otherwise, the counting process has to restart from 0.) In Figure 4, we depict the corresponding transitions for i = 2 and i = n.
Further, we need to guarantee that for all i ≥ 1, a $\mathrm{Bit}_i$-transition is taken only if all less significant bits are set or, equivalently, if all locations $2^{i-1}$, . . .

By construction, it is easy to see that $\mathrm{Bit}_i$-transitions are the only way to produce a token in $2^i$, which can be fired if $2^i_c$ has a token. The $\mathrm{Bit}_i$-transitions then consume the token in $2^i_c$. This guarantees that after the first -transition, which puts a token into zero, the two locations $2^i$ and $2^i_c$ will never have a token at the same time.
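The discipline imposed on the $\mathrm{Bit}_i$-transitions is that of incrementing a binary counter. The Python sketch below mirrors it on a plain list of bits; it is a hedged illustration, not the automaton itself. The names `fire_bit` and `count_to_top` are hypothetical, `bits[j]` stands for "location $2^j$ holds a token", a return value of None stands for being guided to reset, and the clearing of the less significant bits is the usual increment rule, assumed here rather than taken from the text.

```python
def fire_bit(bits, i):
    """Fire Bit_i on the counter state, or return None if it is incorrect.

    Bit_i is allowed only if bit i is not set and all less significant
    bits are set (assumed increment rule: set bit i, clear bits below i).
    """
    if bits[i] or not all(bits[:i]):
        return None                      # incorrect transition: reset
    new_bits = list(bits)
    new_bits[i] = True
    for j in range(i):
        new_bits[j] = False
    return new_bits


def count_to_top(n):
    """Number of correct Bit-transitions needed to set bit n from 0."""
    bits = [False] * (n + 1)
    steps = 0
    while not bits[n]:
        i = bits.index(False)            # lowest unset bit: only enabled Bit_i
        bits = fire_bit(bits, i)
        steps += 1
    return steps


if __name__ == "__main__":
    for n in range(1, 8):
        assert count_to_top(n) == 2 ** n
    print("bit n is reached after exactly 2**n increments")
```

In every counter state exactly one $\mathrm{Bit}_i$ is enabled, which is why the most significant bit can only be reached after $2^n$ correct steps.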
Finally, all equality-guarded #-transitions in $2^n_c$ and $2^i$ for all 0 ≤ i < n are sent to reset. In contrast, all #-transitions in $2^n$ and $2^i_c$ for all 0 ≤ i < n are sent to synch, with an update on the register. This guarantees that the counter must correctly count from 0 to 10 · · · 0, meaning that at least one datum x appears at least $2^n$ times while synchronizing $R_{\mathrm{counter}(n)}$.

Lemma 4.3. The nonuniversality problem is reducible to the synchronization problem for NRAs.
Proof. The reduction is based on the construction presented in Theorem 17 in Doyen et al. [15]. Let $R = \langle L, R, \Sigma, T \rangle$ be an NRA equipped with an initial location $\ell_{in}$ and a set $L_f$ of accepting locations, where, without loss of generality, we assume that all outgoing transitions from $\ell_{in}$ update all registers. We also assume that R is complete; otherwise, we add some nonaccepting location and direct all undefined transitions to it.
We construct an NRA $R_{syn}$ such that there exists some data word that is not in L(R) if, and only if, $R_{syn}$ has some synchronizing data word. We define $R_{syn} = \langle L_{syn}, R, \Sigma_{syn}, T_{syn} \rangle$ as follows. The set of locations is $L_{syn} = L \cup \{\mathit{reset}, \mathit{synch}\}$, where synch, reset ∉ L are two new locations. The alphabet $\Sigma_{syn}$ extends Σ by two new letters that are not in Σ: the letter # and a reset letter. The transition relation $T_{syn}$ is the union of T and the set containing the following transitions:
• synch

Next, we prove the correctness of the reduction. First, assume there exists a data word $w = (a_1, d_1) \ldots (a_n, d_n)$ such that $w \notin L(R)$. Hence, all runs starting in $(\ell_{in}, \nu_i)$ with $\nu_i \in D^{|R|}$ end in some configuration (ℓ, ν) with $\ell \notin L_f$. The data word ( , d) · w · (#, d) with d ∈ D synchronizes $R_{syn}$ in location synch, proving that $R_{syn}$ has some synchronizing data word.
Second, assume that R syn has some synchronizing data word. All transitions in synch are selfloops with update on all registers; thus, R syn can only synchronize in synch. Moreover, synch is only accessible with #-transitions; assuming w is one of the shortest synchronizing data words, we see that post(L × D, w ) = {(synch, ν ))} for some ν ∈ D |R | . From all locations ∈ L, we have R ↓ − −−−− → in ; we say that -transitions reset R syn . Moreover, the only outgoing transition in location reset is the -transition. Thus, a reset followed by some # must occur while synchronizing. Let w = w 0 ( , d )w 1 (#, d # )w 2 , where w 1 ∈ (Σ × D) + is the data word between the last occurrence of and the first following occurrence of #, and w 2 ∈ (Σ \{ }) * . We prove that w 1 L(R). By contradiction, assume that w 1 is in the language; thus, there exist valuations ν i , ν f ∈ D |R | such that R syn has a run over w 1 , for instance, starting in ( in , ν i ) and ending in ( f , ν f ) where f ∈ L f . In fact, since all outgoing transitions in in update all registers, then for all valuations ν i , R syn has an accepting run over w 1 .
Note that $w_0$ cannot be a synchronizing word for $R_{syn}$, because this would contradict the assumption that w is one of the shortest synchronizing data words. It implies that there must be some configuration q such that $\mathrm{post}_{R_{syn}}(q, w_0)$ contains some configuration (ℓ, ν) with ℓ ≠ synch. From (ℓ, ν), inputting the next ( , d) (that is, the letter after $w_0$ in the synchronizing word w), we reach $(\ell_{in}, \{d\}^{|R|})$. Since for all valuations $\nu_i$, starting in $(\ell_{in}, \nu_i)$, $R_{syn}$ has an accepting run over $w_1$, it must have an accepting run from $(\ell_{in}, \{d\}^{|R|})$ to some accepting configuration $(\ell_f, \nu_f)$ as well. Reading the last # (that is, the letter after $w_1$ in the synchronizing word w), reset is reached. Since $w_2$ does not contain any reset letter, reset is never left, meaning that $R_{syn}$ cannot synchronize in synch, a contradiction. The proof is complete.
Note that the reduction preserves the number of registers in the NRAs.
