| Title | Hierarchies of Complexity Classes Based on PRAMs and DLOGTIME-Uniform Circuit Families (Studies on Computational Models and Computational Complexity) |
| :---: | :--- |
| Author(s) | Iwamoto, Chuzo; Iwama, Kazuo |
| Citation | RIMS Kokyuroku (数理解析研究所講究録) (1996), 950: 26-32 |
| Issue Date | 1996-05 |
| URL | http://hdl.handle.net/2433/60345 |
| Right | |
| Type | Departmental Bulletin Paper |
| Textversion | publisher |

# Hierarchies of Complexity Classes Based on PRAMs and DLOGTIME-Uniform Circuit Families

Kyushu Institute of Design: Chuzo Iwamoto (岩本 宙造)<br>Faculty of Engineering, Kyushu University: Kazuo Iwama (岩間 一雄)

## 1 Introduction

It is well recognized that separating $\mathrm{NC}^{k}$ from $\mathrm{NC}^{k+1}$ is very difficult. It is even unknown whether all sets in P are recognized by logspace-uniform circuits of linear size and logarithmic depth. This might be the reason why we usually think that, unlike the sequential case, it is hopeless to try to prove hierarchies for parallel complexities. However, it should be noted that this perception is reasonable for just one model, logspace-uniform circuits. In this paper, it is shown that (i) there exist a constant $d$ and a language $L$ such that $L$ is recognizable in time $d T(n)$ by some PRIORITY CRCW PRAM but is not recognizable in time $T(n)$ by any PRIORITY CRCW PRAM if the number of processors is fixed, and (ii) there exist constants $c, d$ and a language $L$ such that $L$ is recognizable by some family of DLOGTIME-uniform circuits of size $(Z(n))^{c}$ and depth $d T(n)$ but is not recognizable by any family of DLOGTIME-uniform circuits of size $Z(n)$ and depth $T(n)$ if $T(n)$ is not bounded by $O(\log n)$. Result (i) improves the hierarchy of PRAM-based parallel complexity classes shown by Kirchherr [19]; as for (ii), somewhat surprisingly, no such hierarchies based on circuits have been presented before.

Kirchherr [19] showed that there exists a language $L$ which is not recognizable in time $\log ^{i} n$ by any PRIORITY CRCW PRAM with $n^{j}$ processors but is recognizable in time $\log ^{i+8} n$ by a PRIORITY CRCW PRAM with $n^{96 j+104}$ processors. This hierarchy is obtained by transforming it into the hierarchy of time- and reversal-bounded deterministic TMs. This transformation might be the reason why his hierarchy is much less tight than our present one. In this paper, we apply the diagonalization method directly to PRAMs, by which we can show that a constant increase of parallel time (and no increase of processors) yields a new PRAM-based parallel complexity class. In the sequential case, a tight time-hierarchy theorem is known for RAMs [12].

More precisely, our second result shows: There exist constants $c$ and $d$ such that if $T_{2}(n)>d T_{1}(n)$ and $Z_{2}(n)>\left(Z_{1}(n)\right)^{c}$ for all $n$ greater than some $n_{0}$, then $\text{DLT-U}\left(T_{1}(n), Z_{1}(n)\right) \nsupseteq \text{DLT-U}\left(T_{2}(n), Z_{2}(n)\right)$, where $\text{DLT-U}(T(n), Z(n))$ is the class of sets recognizable by DLOGTIME-uniform circuits of depth $T(n)$ and size $Z(n)$. This immediately implies hierarchies for big-O complexities, like

$$
\begin{array}{ll} 
& \text { DLT-U }\left(O(\log n), O\left(n^{2}\right)\right) \\
\subsetneq & \text { DLT-U }\left(O(\log n \log \log n), O\left(n^{2 c}\right)\right) \\
\subsetneq & \text { DLT-U }\left(O\left(\log ^{2} n\right), O\left(n^{2 c^{2}}\right)\right) \subsetneq \cdots \subsetneq P .
\end{array}
$$

Recall that in the case of logspace-uniform circuits, even whether $\text{LS-U}(O(\log n), O(n)) \subsetneq P$ is not known, where $\text{LS-U}(T(n), Z(n))$ is the class of sets recognizable by logspace-uniform circuits of depth $T(n)$ and size $Z(n)$.

At the same time, however, it is also a fact that DLOGTIME-uniform circuits do not seem to differ that much from logspace-uniform circuits, since if $k \geq 2$ then $\mathrm{NC}^{k}$ coincides for both uniformities [21]. One might think that a proper hierarchy under DLOGTIME-uniformity could imply a proper hierarchy under logspace uniformity, since (i) logspace-uniform circuits can be translated to DLOGTIME-uniform circuits with constant and polynomial loss in depth and size [21], (ii) constant and polynomial increase of depth and size strictly enlarges the complexity class of DLOGTIME-uniform circuits (our new result in this paper), and (iii) all DLOGTIME-uniform circuits are obviously logspace-uniform.

Nevertheless, (fairly standard) diagonalization works for DLOGTIME-uniform circuits but does not seem to work for logspace-uniform ones. The main contribution of this paper is to call attention to this distinction between the two uniformities. (The answer to the above skepticism that our hierarchy might imply a logspace-uniform hierarchy will be given in Section 3.)

Since we consider fan-in 2 circuits in this paper, our hierarchy theorem does not hold for depth less than $\log n$. In the class $\mathrm{NC}^{1}$, several separation results are known. For example, there is a non-collapsing hierarchy in $\mathrm{AC}^{0}$, which is the class of problems solvable by constant-depth, polynomial-size, unbounded fan-in circuits. It is known [22] that there are problems in $\mathrm{AC}_{k}^{0}-\mathrm{AC}_{k-1}^{0}$ for each $k>0$, where $\mathrm{AC}_{k}^{0}$ is the class of problems solvable by DLOGTIME-uniform, depth-$k$, polynomial-size, unbounded fan-in circuits. Also, it is known [2, 13] that the exclusive OR function is not in $\mathrm{AC}^{0}$, which implies that $\mathrm{AC}^{0} \subsetneq \mathrm{NC}^{1}$. On the other hand, it is open whether $\mathrm{ACC} \subsetneq \mathrm{NC}^{1}$ [4], where ACC is the class of problems solvable by constant-depth, polynomial-size, unbounded fan-in Boolean circuits in which any "$\operatorname{MOD}(k)$-gate," $k>1$, may be used. ([3] conjectured that $\mathrm{ACC} \neq \mathrm{NC}^{1}$.) One of the results inside $\mathrm{NC}^{1}$ is that there are problems complete for DLOGTIME-uniform $\mathrm{NC}^{1}$ under DLOGTIME reductions [5, 9]. (More information on the class $\mathrm{NC}^{1}$ may be found in [6, 7].)

On the other hand, almost no results are known about the $\mathrm{NC}^{k}$ hierarchy. It is open whether there exists an integer $k$ such that $\mathrm{NC}^{k}=\mathrm{NC}^{k+1}$ [17]. No one has succeeded in proving $\mathrm{NC}^{1} \neq \mathrm{P}$ (or even $\mathrm{ACC} \neq \mathrm{NP}$). Also, the $\mathrm{NC}^{k}$ hierarchy collapses if NC has complete problems under either logspace or $\mathrm{NC}^{1}$ reductions (see [10] for $\mathrm{NC}^{1}$-reducibility). One approach to studying parallel complexity classes is to characterize circuits by TMs. Ruzzo [21] showed that $\mathrm{NC}^{k}=\mathrm{ASPACE,TIME}\left(\log n, \log ^{k} n\right)$ for $k \geq 2$. If the circuits are $U_{\mathrm{B}}$-uniform [21] ($U_{\mathrm{B}}$-uniform circuits of depth $T(n)$ are constructed by $T(n)$-space DTMs), it is known that $\operatorname{DTIME}(T) \subseteq \text{uniform size}(T \log T) \subseteq \operatorname{DTIME}\left(T \log ^{3} n\right)$ [20] and $\operatorname{NSPACE}(S) \subseteq \text{uniform depth}\left(S^{2}\right) \subseteq \operatorname{DSPACE}\left(S^{2}\right)$ [8]. Also, it seems quite certain that the parallel complexity of some problems gradually increases with the value of parameters. Those problems include $k$-connectivity [18], $\alpha$-connectivity [14, 15], and some artificial languages having unlimitedly lower parallel time-complexities [16].

DLOGTIME-uniformity has slightly different definitions in the literature. Our present definition is the one using the extended connection language. The same results hold for another definition using the direct connection language if $T(n)=\Omega(\log n \log \log n)$. Also, a similar hierarchy holds for unbounded fan-in circuits. Note that our result needs to specify an explicit size of circuits; it does not say anything about whether $\mathrm{NC}^{k} \subsetneq \mathrm{NC}^{k+1}$, or even whether $\mathrm{NC}^{1} \subsetneq P$, under DLOGTIME-uniformity, which is still open.

## 2 Definitions and Results

All Turing Machines (TMs) in this paper are deterministic $k$-tape TMs with only 0 and 1 as their tape symbols.

Our PRAM is essentially the same model as defined in [23]. A PRAM has a common memory, $M[0], M[1], M[2], \ldots$, and a sequence of processors (RAMs) operating synchronously in parallel. (See [1] for RAM.) Each processor of the PRAM has its own local memory, $R[0], R[1], R[2], \ldots$, and has instructions for addition, subtraction, logical OR, AND, conditional branches based on predicates $=$ and $<$, and reading and writing into its local memory. The processors can access the common memory, and each processor has instructions for reading from and writing into the common memory using its local memory to specify the common memory address. If more than one processor attempts to write to the same location in common memory at the same time, the lowest-numbered processor succeeds. All processors have the same program. The input string of length $n$ is given in $M[0], M[1], \ldots, M[n-1]$. The computation halts when all processors have halted. The PRAM operates in time $T(n)$ if it halts within $T(n)$ steps on any input of length $n$. When the PRAM accepts (rejects) the input string, symbol $1$ ($0$) appears in $M[0]$ after $T(n)$ steps. The complexity of a PRAM program is measured according to the uniform cost criterion.
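The PRIORITY write rule described above can be illustrated with a minimal Python sketch. The function name `priority_crcw_write` and the representation of the common memory as a dictionary are assumptions made for this illustration only; they are not part of the model in [23].

```python
def priority_crcw_write(memory, requests):
    """Apply one synchronous write step under the PRIORITY rule.

    memory:   dict mapping common-memory addresses to values (assumed layout).
    requests: iterable of (processor_id, address, value) triples issued in
              the same step.
    """
    winners = {}
    for pid, addr, value in requests:
        # Among all writers to one address, keep the lowest-numbered processor.
        if addr not in winners or pid < winners[addr][0]:
            winners[addr] = (pid, value)
    for addr, (_, value) in winners.items():
        memory[addr] = value
    return memory


mem = {0: 0, 1: 0}
priority_crcw_write(mem, [(3, 0, 7), (1, 0, 5), (2, 1, 9)])
print(mem)  # {0: 5, 1: 9}
```

Processors 3 and 1 both write to address 0 in the same step, and the value from processor 1 is the one that is stored.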

The definitions of circuits are mostly from [21]. A combinatorial circuit is a directed acyclic graph, where each node (gate) has indegree $d \leq 2$ and is labeled by some Boolean function of $d$ variables, or has indegree 0 and is labeled by "$x$" (an input). Nodes with outdegree 0 are outputs. In this paper, we consider a family $C=\left(\alpha_{1}, \alpha_{2}, \ldots, \alpha_{n}, \ldots\right)$ of circuits, where $\alpha_{n}$ has $n$ inputs and one output. We denote the size and depth of $\alpha_{n}$ by $Z(n)$ and $T(n)$, respectively.

Let $g(p)$ denote the gate reached by following the path $p$ of inputs to $g$. For example, $g(\varepsilon)$ is $g$, $g(L)$ is $g$'s left input, $g(LR)$ is $g$'s left input's right input, and so on. The standard encoding $\bar{\alpha}_{n}$ is a string of 4-tuples $\langle n, g, p, y\rangle$, where $g \in\{0,1\}^{*}$, $p \in\{\varepsilon, \mathrm{L}, \mathrm{R}\}$, and $y \in\{x, \wedge, \vee, \neg\} \cup\{0,1\}^{*}$ such that in $\alpha_{n}$ either (i) $p=\varepsilon$ and gate $g$ is a $y$-gate, $y \in\{x, \wedge, \vee, \neg\}$, or (ii) $p \neq \varepsilon$ and gate $g(p)$ is numbered $y$, $y \in\{0,1\}^{*}$. The direct connection language $L_{\mathrm{DC}}$ of the family $C$ is the set of strings of the form $\langle n, g, p, y\rangle$. The family of circuits, $C=\left(\alpha_{1}, \alpha_{2}, \ldots, \alpha_{n}, \ldots\right)$, of size $Z(n)$ and depth $T(n)$ is said to be logspace-uniform if the mapping $n \rightarrow \bar{\alpha}_{n}$ is computable by a DTM in space $\log Z(n)$.
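The notation $g(p)$ and the 4-tuples of the standard encoding can be made concrete with a small Python sketch. The toy circuit `CIRCUIT`, the ASCII gate-type names (`"and"`, `"or"`, `"not"`, `"x"`), and the helper names `follow` and `standard_encoding` are all assumptions chosen for illustration; the paper itself fixes only the abstract form $\langle n, g, p, y\rangle$.

```python
# Hypothetical toy circuit with two inputs.
# Gate name (binary string) -> (type, left input, right input).
CIRCUIT = {
    "0": ("or", "1", "10"),
    "1": ("and", "11", "100"),
    "10": ("not", "100", None),
    "11": ("x", None, None),
    "100": ("x", None, None),
}


def follow(circuit, g, p):
    """Return g(p), the gate reached from g along path p over {L, R}.

    Returns None if the path leaves the circuit.
    """
    for step in p:
        if g is None:
            return None
        _, left, right = circuit[g]
        g = left if step == "L" else right
    return g


def standard_encoding(circuit, n):
    """List the 4-tuples <n, g, p, y> with p in {"", "L", "R"}."""
    tuples = []
    for g, (gate_type, _, _) in circuit.items():
        tuples.append((n, g, "", gate_type))      # case (i): p is empty
        for p in ("L", "R"):
            y = follow(circuit, g, p)
            if y is not None:
                tuples.append((n, g, p, y))       # case (ii): p is nonempty
    return tuples


print(follow(CIRCUIT, "0", "LR"))  # 100
```

Allowing longer paths `p` in `follow` gives the tuples of the extended encoding, which differ only in the permitted path length.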

The definition of the extended encoding $\widehat{\alpha}_{n}$ is the same as $\bar{\alpha}_{n}$, except $p \in\{\mathrm{L}, \mathrm{R}\}^{*}$ and $|p| \leq \log Z(n)$. The extended connection language $L_{\mathrm{EC}}$ of the family $C$ is the set of strings of the form $\langle n, g, p, y\rangle$. The family of circuits, $C=\left(\alpha_{1}, \alpha_{2}, \ldots, \alpha_{n}, \ldots\right)$, of size $Z(n)$ and depth $T(n)$ is said to be DLOGTIME-uniform if there is a DTM recognizing $L_{\mathrm{EC}}$ which takes time $O(\log Z(n))$. This definition is the same as $U_{\mathrm{E}}$-uniform in [21].

Remark. Another definition of DLOGTIME-uniformity uses the direct connection language $L_{\mathrm{DC}}$, i.e., the circuit family $C$ of size $Z(n)$ and depth $T(n)$ is said to be DLOGTIME-uniform if there is a DTM recognizing $L_{\mathrm{DC}}$ which takes time $O(\log Z(n))$. Theorem 2 also holds for this definition of DLOGTIME-uniformity if $T(n)=\Omega(\log n \log \log n)$.

Now we are ready to show our main results. (The proof of Theorem 1 is omitted. The proof of Theorem 2 is given in Section 3.)

Theorem 1. Suppose $T_{1}(n)$ is a function which is not constant and is computable by a $T_{1}(n)$-time PRAM with $P(n)$ processors. Then, there exist a language $L$, a constant d, and an integer $n_{0}$ such that (i) $T_{2}(n)>d T_{1}(n)$ for all $n \geq n_{0}$ and (ii) $L$ is recognizable by a $T_{2}(n)$-time $P R A M$ with $P(n)$ processors but is not recognizable by any $T_{1}(n)$-time PRAM with $P(n)$ processors.

Let $\operatorname{PRAM}(T(n), P(n))$ be the class of sets recognizable by PRAMs with $P(n)$ processors in time $T(n)$. If $T_{2}(n)=d T_{1}(n)$, then $T_{2}(n)>T_{1}(n)$ for all integers $n>0$. Hence:

Corollary 1. For a similar $T_{1}(n)$ as above, there exists a constant d such that $\operatorname{PRAM}\left(T_{1}(n), P(n)\right) \subsetneq$ $\operatorname{PRAM}\left(d T_{1}(n), P(n)\right)$.

Theorem 2. Suppose that $T_{1}(n)$ is a polylogarithmic function not bounded by $O(\log n)$ and $Z_{1}(n)$ is a polynomial function such that $\log Z_{1}(n)$ and $T_{1}(n)$ are computable by $O(\log n)$-time DTMs if input $n$ is given in binary. Then, there exist a language $L$, constants $c, d$, and an integer $n_{0}$ such that (i) $T_{2}(n)>$ $d T_{1}(n)$ and $Z_{2}(n)>\left(Z_{1}(n)\right)^{c}$ for all $n \geq n_{0}$ and (ii) $L$ is recognizable by a family of DLOGTIME-uniform circuits of size $Z_{2}(n)$ and depth $T_{2}(n)$ but is not recognizable by any family of DLOGTIME-uniform circuits of size $Z_{1}(n)$ and depth $T_{1}(n)$.

The functions $T_{1}(n)$ computable by $O(\log n)$-time DTMs include many specific functions, such as $\log n \log ^{*} n$, $\log n \log \log n$, and $\log ^{k} n$.

Corollary 2. For similar $T_{1}(n), Z_{1}(n)$, and constants $c, d$ as above, (i) $\text{DLT-U}\left(T_{1}(n), Z_{1}(n)\right) \subsetneq \text{DLT-U}\left(d T_{1}(n),\left(Z_{1}(n)\right)^{c}\right)$.
(ii) If $\lim _{n \rightarrow \infty} T_{1}(n) / T_{2}(n)=\lim _{n \rightarrow \infty}\left(Z_{1}(n)\right)^{c} / Z_{2}(n)=0$, then $\text{DLT-U}\left(T_{1}(n), Z_{1}(n)\right) \subsetneq \text{DLT-U}\left(T_{2}(n), Z_{2}(n)\right)$.

## 3 Proof of Theorem 2

Let $\beta_{n}$ be circuits of size $Z_{1}(n)$ and depth $T_{1}(n)$, and $\alpha_{n}$ be circuits of size $Z_{2}(n)$ and depth $T_{2}(n)$. In order to prove that the class of languages recognizable by $\beta_{n}$ is properly contained in the class of languages recognizable by $\alpha_{n}$ by the diagonalization method, we will show that (i) $\alpha_{n}$ can generate the extended encoding $\widehat{\beta}_{n}$ of any $\beta_{n}$ and (ii) $\alpha_{n}$ can simulate $\beta_{n}$. Under the DLOGTIME-uniformity, the extended connection language is recognized by an $O(\log n)$-time DTM. By simulating this DTM, $\alpha_{n}$ generates the extended encoding $\widehat{\beta}_{n}$ of any $\beta_{n}$. For (ii), we can use the universal circuit $U$ of Cook and Hoover [11], which can simulate any circuit $\beta_{n}$ if the extended encoding $\widehat{\beta}_{n}$ is given to $U$ as its input.

The depth bound $T_{1}(n)$ may be any well-behaved function not bounded by $O(\log n)$. The reason why the theorem does not hold for $T_{1}(n)=O(\log n)$ is that $\alpha_{n}$ must be able to simulate any $O(\log n)$-time DTM, which may have arbitrarily many states, tapes and symbols. (This is similar to the space-hierarchy theorem for DTMs, i.e., the class of languages recognizable by $S_{1}(n)$-space DTMs is properly contained in the class recognizable by $S_{2}(n)$-space DTMs for any well-behaved function $S_{2}(n)$ not bounded by $O\left(S_{1}(n)\right)$, but the theorem does not hold for $S_{2}(n)=O\left(S_{1}(n)\right)$.) It is also shown in [11] that the extended encoding can be obtained from the standard encoding by a conversion circuit of depth $O(\log n \log \log n)$. Therefore, Theorem 2 also holds for DLOGTIME-uniformity with the direct connection language if $T(n)=\Omega(\log n \log \log n)$.

This might be a good point to give an answer to the skepticism in Section 1. To be exact, [21] says that any single language recognizable by a family of logspace uniform circuits of size $Z(n)$ and depth $T(n)$ is recognizable by a family of DLOGTIME-uniform circuits of size $(Z(n))^{c_{1}}$ and depth $d_{1} T(n)$ for some constants $c_{1}, d_{1}$. One should notice that this does not say that the class of languages recognizable by families of logspace uniform circuits of size $Z(n)$ and depth $T(n)$ is contained in the class of languages recognizable by families of DLOGTIME-uniform circuits of size $(Z(n))^{c_{2}}$ and depth $d_{2} T(n)$ for some constants $c_{2}, d_{2}$.

Also, it should be mentioned that if we use Theorem 1, it is straightforward to show that $\text{DLT-U}(T(n), Z(n)) \subsetneq \text{DLT-U}\left(T(n) \log n,(Z(n))^{c}\right)$ using the circuit simulation of PRAMs [23]. Removing this $\log n$ gap needs direct diagonalization or efficient simulation of TMs by DLOGTIME-uniform circuits, which includes several subtle details as described in the rest of the paper.

Recall that what we have to do is to construct a family of DLOGTIME-uniform circuits $\alpha_{n}$ of size $Z_{2}(n)$ and depth $T_{2}(n)$ such that the language $L$ recognized by $\alpha_{n}$ is not recognizable by any family of DLOGTIME-uniform circuits of size $Z_{1}(n)$ and depth $T_{1}(n)$.

The overview of the circuit $\alpha_{n}$ is as follows. The circuit $\alpha_{n}$ is composed of three subcircuits, $\alpha_{n}^{\mathrm{tm}}$, $\alpha_{n}^{\mathrm{code}}$ and $\alpha_{n}^{\mathrm{sim}}$. The output of $\alpha_{n}^{\mathrm{tm}}$ is 1 if and only if there exists a TM, say, $T_{b}$, such that the input string $b$ is an encoding of $T_{b}$. Let $L_{\mathrm{EC}}$ be the extended connection language accepted by $T_{b}$, and let $\widehat{\beta}_{n}$ be the extended encoding of the circuits $\beta_{n}$ defined by $L_{\mathrm{EC}}$. Circuit $\alpha_{n}^{\mathrm{code}}$ generates the extended encoding $\widehat{\beta}_{n}$ of $\beta_{n}$ by simulating $T_{b}$. ($\beta_{n}$ may have more than $Z_{1}(n)$ gates, so we consider the first $Z_{1}(n)$ gates. If the input string $b$ is ill-formed or if $T_{b}$ does not halt within $O(\log n)$ time, then the outputs of $\alpha_{n}^{\mathrm{code}}$ are all 0.) Strictly speaking, $\alpha_{n}^{\mathrm{code}}$ does not generate the extended encoding $\widehat{\beta}_{n}$; instead, $\alpha_{n}^{\mathrm{code}}$ checks whether each string of the form $\langle n, g, p, y\rangle$ is in $L_{\mathrm{EC}}$. Since the number of different strings of the form $\langle n, g, p, y\rangle$ is roughly $(Z(n))^{3}$, $\alpha_{n}^{\mathrm{code}}$ outputs a string over $\{0,1\}$ of length about $(Z(n))^{3}$, where 1 (0) indicates that the corresponding string $\langle n, g, p, y\rangle$ is (is not) in $L_{\mathrm{EC}}$. Therefore, $\alpha_{n}^{\mathrm{code}}$ contains about $(Z(n))^{3}$ copies of the TM simulator.
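The estimate of roughly $(Z(n))^{3}$ candidate strings can be checked by a direct count. The sketch below is only an illustration under stated assumptions: $Z$ is a power of two, there are exactly $Z$ gate names, and only well-typed tuples are counted (a gate type for $y$ when $p=\varepsilon$, a gate number for $y$ otherwise). The helper name `count_candidate_tuples` is hypothetical.

```python
from math import log2


def count_candidate_tuples(Z):
    """Count strings <n, g, p, y> for one fixed n (Z assumed a power of two)."""
    depth = int(log2(Z))
    # Number of paths p over {L, R} with |p| <= log Z, including the empty path.
    paths = sum(2 ** length for length in range(depth + 1))
    gate_types = 4  # y ranges over {x, and, or, not} when p is empty
    # p empty:    Z gates, 4 possible types.
    # p nonempty: Z gates, (paths - 1) paths, Z possible gate numbers for y.
    return Z * gate_types + Z * (paths - 1) * Z


for Z in (4, 64, 1024):
    print(Z, count_candidate_tuples(Z) / Z ** 3)
```

Under these assumptions the count is $2Z^{3}-2Z^{2}+4Z$, so the ratio to $Z^{3}$ stays below 2 and the count is $\Theta(Z^{3})$, in line with the estimate in the text.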

Circuit $\alpha_{n}^{\mathrm{sim}}$ outputs 1 if the circuit $\beta_{n}$ has depth at most $T_{1}(n)$ and $\beta_{n}$ outputs 1 on the input. The output of $\alpha_{n}$ is 1 if and only if the outputs of $\alpha_{n}^{\mathrm{tm}}$ and $\alpha_{n}^{\mathrm{sim}}$ are 1 and 0, respectively. Finally, we will show that the language accepted by $\alpha_{n}$ cannot be accepted by any family of DLOGTIME-uniform circuits of size $Z_{1}(n)$ and depth $T_{1}(n)$ (see Lemma 1).

It is known [11] that there exists a DLOGTIME-uniform universal circuit $U$ of depth $O\left(T_{1}(n)\right)$ and size $O\left(\left(Z_{1}(n)\right)^{3} T_{1}(n) / \log Z_{1}(n)\right)$ such that if the extended encoding $\widehat{\beta}_{n}$ of any circuit $\beta_{n}$ of size $Z_{1}(n)$ and depth $T_{1}(n)$ is given as input, then the circuit $U$ simulates $\beta_{n}$. Although the encoding given as input to $U$ is slightly different from that of $\alpha_{n}^{\mathrm{sim}}$, the construction of $\alpha_{n}^{\mathrm{sim}}$ is very similar to that of $U$, and therefore we omit it. We shall start with circuit $\alpha_{n}^{\mathrm{tm}}$.

Circuits $\alpha_{n}^{\mathrm{tm}}$: The output of $\alpha_{n}^{\mathrm{tm}}$ is 1 if and only if there exists a TM $T_{b}$ such that the input string $b$ is an encoding of $T_{b}$. First, we must fix the encoding rule for TMs. Suppose that a TM has states $s_{1}, s_{2}, \ldots$ and uses 0, 1 as its tape symbols. Let $k$ be the minimum integer such that the numbers of tapes and states are less than $k$. State $s_{i}$ is encoded into the string $1^{i} 00 \cdots 0$ of length $k$. Tape symbols 0 and 1 are encoded into $100 \cdots 0$ and $110 \cdots 0$ of length $k$, respectively.

Strings $100 \cdots 0$ and $110 \cdots 0$ of length $k$ also represent that the head is moved to the left and right, respectively. For example, if $k=4$, we encode the next move function $\delta\left(s_{3}, 0,1,0\right)=\left(s_{1}, 1,1,0, L, R, L\right)$ into the following string of length $3 k^{2}+2 k$ :


Note that each substring of length $k$ consists of 1's followed by at least one 0. The encoding of the TM is a concatenation of the encodings of the next move functions, followed by the substring $1^{k}$, which is in turn followed by an arbitrarily long string. The string $1^{k}$ marks the end of the sequence of next move functions. Also, all next move functions of $T_{b}$ must appear in the prefix of length $\psi(n)$. ($\psi(n)$ is any slowly growing function computable by an $O(\log n)$-time DTM. We will fix $\psi(n)$ in Lemma 1.)

Circuit $\alpha_{n}^{\mathrm{tm}}$ checks whether there exists an integer $k$ such that the input $b$ is an encoding of some TM with at most $k$ tapes and $k$ states. $\alpha_{n}^{\mathrm{tm}}$ has circuits $c_{n}^{\mathrm{tm}}(1), c_{n}^{\mathrm{tm}}(2), \ldots, c_{n}^{\mathrm{tm}}(k), \ldots, c_{n}^{\mathrm{tm}}(\psi(n))$, where each $c_{n}^{\mathrm{tm}}(k)$ checks whether $b$ is an encoding of some TM with at most $k$ tapes and $k$ states. $\alpha_{n}^{\mathrm{tm}}$ outputs 1 iff there exists an integer $k$ such that circuit $c_{n}^{\mathrm{tm}}(k)$ outputs 1. The structure of $c_{n}^{\mathrm{tm}}(k)$ is as follows. Each circuit $c_{n}^{\mathrm{tm}}(k)$ decides whether the input $b$ contains at least one substring $1^{k}$ (which marks the end of the next move functions). Let $b_{1}$ be the maximum prefix of $b$ which does not contain $1^{k}$. The string $b$ is an encoding of a DTM if the prefix $b_{1}$ satisfies the following conditions: (i) $b_{1}$ consists of substrings of length $3 k^{2}+2 k$, (ii) each substring consists of $3 k+2$ blocks of length $k$, and (iii) each block consists of 1's followed by 0's. The values of $3 k^{2}+2 k$, $3 k+2$, and $\psi(n)$ can be computed by an $O(\log n)$-time DTM. Once those values are known and held on storage tapes, the above structure can be checked by a single scan, as a finite automaton would do. Thus, $\alpha_{n}^{\mathrm{tm}}$ can be a DLOGTIME-uniform circuit.
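The structural test performed by $c_{n}^{\mathrm{tm}}(k)$ can be sketched sequentially in Python. This is an illustration rather than the circuit itself, and it rests on assumptions that the text leaves open: $b_{1}$ is taken to be the part of $b$ before the first occurrence of $1^{k}$, at least one next move function is required, and a block may contain zero 1's as long as it ends in a 0. The names `is_tm_encoding` and `alpha_tm` are hypothetical.

```python
import re

BLOCK = re.compile(r"1*0+")  # 1's followed by at least one 0 (assumed form)


def is_tm_encoding(b, k):
    """Check conditions (i)-(iii) for one fixed k, in the role of c_n^tm(k)."""
    end = b.find("1" * k)            # position of the terminal 1^k
    if end <= 0:
        return False                 # no terminal, or no next move function
    b1 = b[:end]
    move_len = 3 * k * k + 2 * k     # (3k + 2) blocks of length k
    if len(b1) % move_len != 0:
        return False                 # conditions (i) and (ii)
    # Condition (iii): every length-k block is 1's followed by 0's.
    return all(BLOCK.fullmatch(b1[s:s + k]) for s in range(0, len(b1), k))


def alpha_tm(b, psi):
    """Output 1 iff some k in 1..psi accepts, in the role of alpha_n^tm."""
    return int(any(is_tm_encoding(b, k) for k in range(1, psi + 1)))


print(alpha_tm("10" * 8 + "11" + "0101", 3))  # 1
```

In the usage line, $k=2$ gives next move functions of length $3k^{2}+2k=16$, and the prefix `"10" * 8` is one such string followed by the terminal `11`.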

If no such $k$ exists, $\alpha_{n}^{\mathrm{tm}}$ outputs 0, and thus the output of $\alpha_{n}$ becomes 0, regardless of the outputs of $\alpha_{n}^{\mathrm{code}}$ and $\alpha_{n}^{\mathrm{sim}}$. Therefore, in the following, we can assume that the input $b$ meets the conditions.

Circuits $\alpha_{n}^{\mathrm{code}}$: Recall that $L_{\mathrm{EC}}$ is the extended connection language accepted by $T_{b}$, and $\beta_{n}$ is the circuit whose extended connection language is $L_{\mathrm{EC}}$. In order to find the type of gate $g(p)$ of $\beta_{n}$, we must decide whether $\langle n, g, p, y\rangle$ is in $L_{\mathrm{EC}}$ for each $y \in\{x, \wedge, \vee, \neg\}$ by simulating $T_{b}$. Thus, $\alpha_{n}^{\mathrm{code}}$ has the following subcircuits $c_{n}^{\mathrm{type}}(k, g, p, y)$ for all $k$ and all $y \in\{x, \wedge, \vee, \neg\}$ such that each $c_{n}^{\mathrm{type}}(k, g, p, y)$ checks whether $\langle n, g, p, y\rangle$ is in $L_{\mathrm{EC}}$.

For simplicity, we consider a $k$-tape $k$-state TM $T_{b}$ which uses 0 and 1 as its tape symbols. We represent the configuration of $T_{b}$ at step $t$ by four words

$$
\begin{gathered}
s\left(k, g, p, y ; t ; j_{1}, \ldots, j_{k}\right) \\
h\left(k, g, p, y ; t, i, j_{i} ; j_{1}, \ldots, j_{k}\right) \\
w_{0}\left(k, g, p, y ; t, i, j ; j_{1}, \ldots, j_{k}\right) \\
w_{1}\left(k, g, p, y ; t, i, j ; j_{1}, \ldots, j_{k}\right)
\end{gathered}
$$

where $s\left(k, g, p, y ; t ; j_{1}, \ldots, j_{k}\right)$ is a $(\log k)$-bit binary word, and the remaining three words are single bits. (In the circuit, each bit is represented by, e.g., a single AND gate of fan-in 1.) The word $s\left(k, g, p, y ; t ; j_{1}, \ldots, j_{k}\right)$ represents the state of $T_{b}$ at step $t$ if $j_{1}, \ldots, j_{k}$ coincide with the head positions. For example, $s\left(k, g, p, y ; 0 ; j_{1}, \ldots, j_{k}\right)$ represents the initial state if $j_{1}=\cdots=j_{k}=1$ (i.e., every head is placed at the first cell at step 0); otherwise all bits of $s\left(k, g, p, y ; 0 ; j_{1}, \ldots, j_{k}\right)$ are 0. If the $j$th cell of the $i$th tape of $T_{b}$ contains symbol 0 (1) and if $j_{1}, \ldots, j_{k}$ coincide with the head positions, then $w_{0}\left(k, g, p, y ; t, i, j ; j_{1}, \ldots, j_{k}\right)=1$ ($w_{1}\left(k, g, p, y ; t, i, j ; j_{1}, \ldots, j_{k}\right)=1$); otherwise $w_{0}\left(k, g, p, y ; t, i, j ; j_{1}, \ldots, j_{k}\right)=0$ ($w_{1}\left(k, g, p, y ; t, i, j ; j_{1}, \ldots, j_{k}\right)=0$). If the head of the $i$th tape of $T_{b}$ is placed at the $j_{i}$th cell and if $j_{1}, \ldots, j_{k}$ coincide with the head positions, then $h\left(k, g, p, y ; t, i, j_{i} ; j_{1}, \ldots, j_{k}\right)=1$; otherwise $h\left(k, g, p, y ; t, i, j_{i} ; j_{1}, \ldots, j_{k}\right)=0$.

Recall that the next move function is encoded by a string of length $3 k^{2}+2 k$ and that this string is composed of $3 k+2$ blocks of length $k$. We transform each block of the encoding of the next move function, say, $\delta_{f}, f=1,2, \ldots$, by the following words

$$
q_{1}(k, f), a(k, f ; i), q_{2}(k, f), b(k, f ; i), d(k, f ; i)
$$

where $q_{1}(k, f)$ and $q_{2}(k, f)$ are $(\log k)$-bit binary words, and $a(k, f ; i)$, $b(k, f ; i)$ and $d(k, f ; i)$ are single bits. Those words mean that if the state is $q_{1}(k, f)$ and the $i$th head is reading symbol $a(k, f ; i)$ for each $i$, then the state is changed into $q_{2}(k, f)$ and the $i$th head writes $b(k, f ; i)$ and moves to the left (right) if $d(k, f ; i)=0$ ($d(k, f ; i)=1$). This transformation is done by a DLOGTIME-uniform circuit whose depth is roughly $\log k$ (details are omitted).
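The transformation from unary blocks to the words $q_{1}, a, q_{2}, b, d$ can be sketched in Python as follows. This is a sequential illustration only, and it relies on assumptions not fixed by the text: the machine has exactly $k$ tapes, the blocks appear in the order state, $k$ symbols read, new state, $k$ symbols written, $k$ directions, and states are returned as integers rather than $(\log k)$-bit words. The helper names `decode_block` and `decode_move` are hypothetical.

```python
def decode_block(block):
    """Return i for a block of the form 1^i 0^(k-i)."""
    i = len(block) - len(block.lstrip("1"))
    if "1" in block[i:]:
        raise ValueError("block is not 1's followed by 0's")
    return i


def decode_move(encoded, k):
    """Split a (3k^2 + 2k)-bit string into (q1, a, q2, b, d).

    Assumed block order: state, k symbols read, new state,
    k symbols written, k head directions.
    """
    if len(encoded) != 3 * k * k + 2 * k:
        raise ValueError("wrong length for one next move function")
    vals = [decode_block(encoded[s:s + k]) for s in range(0, len(encoded), k)]
    q1, reads = vals[0], vals[1:1 + k]
    q2, writes, dirs = vals[1 + k], vals[2 + k:2 + 2 * k], vals[2 + 2 * k:]
    # Symbols and directions: one 1 means 0 / left, two 1's mean 1 / right.
    a = [v - 1 for v in reads]
    b = [v - 1 for v in writes]
    d = [v - 1 for v in dirs]
    return q1, a, q2, b, d


blocks = ["110", "100", "110", "100",   # state s_2, symbols read 0, 1, 0
          "100", "110", "110", "100",   # state s_1, symbols written 1, 1, 0
          "100", "110", "100"]          # directions L, R, L
print(decode_move("".join(blocks), 3))
# (2, [0, 1, 0], 1, [1, 1, 0], [0, 1, 0])
```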

Now we show how to simulate a single step of TM $T_{b}$. If $t=0$, $s(k, g, p, y ; 0 ; 1,1, \ldots, 1)$ represents the initial state, $w_{0}(k, g, p, y ; 0,1, j ; 1,1, \ldots, 1)$ and $w_{1}(k, g, p, y ; 0,1, j ; 1,1, \ldots, 1)$ for $j \geq 1$ contain the input string $\langle n, g, p, y\rangle$ in binary, and $h(k, g, p, y ; 0, i, 1 ; 1,1, \ldots, 1)=1$. The remaining words

$$
\begin{gathered}
s\left(k, g, p, y ; 0 ; j_{1}, \ldots, j_{k}\right), \\
h\left(k, g, p, y ; 0, i, j_{i} ; j_{1}, \ldots, j_{k}\right), \\
w_{0}\left(k, g, p, y ; 0, i, j ; j_{1}, \ldots, j_{k}\right), \\
w_{1}\left(k, g, p, y ; 0, i, j ; j_{1}, \ldots, j_{k}\right)
\end{gathered}
$$

are set to be 0 . In the following, we show the connection between steps $t$ and $t+1$.

First of all, the following circuit determines whether $j_{1}, j_{2}, \ldots, j_{k}$ coincide with the head positions.

$$
\begin{aligned}
& \text { heads }\left(k, g, p, y ; t ; j_{1}, \ldots, j_{k}\right) \\
& \quad=\bigwedge_{i=1}^{k}\left(h\left(k, g, p, y ; t, i, j_{i} ; j_{1}, \ldots, j_{k}\right)\right)
\end{aligned}
$$

This is an AND gate of fan-in $k$ and is replaced by a depth-( $\log k$ ) circuit of fan-in 2. The following circuit outputs 1 iff two binary $k$-bit words $y, z$ are the same.

$$
E Q(y, z)=\bigwedge_{i=1}^{k}\left(y_{i} z_{i} \vee\left(\neg y_{i}\right)\left(\neg z_{i}\right)\right)
$$

where $y_{i}$ and $z_{i}$ are the $i$ th bit of $y$ and $z$, respectively. We compare the current state and $q_{1}(k, f)$ by

$$
\begin{aligned}
& \text { cmp-state }\left(k, g, p, y, f ; t ; j_{1}, \ldots, j_{k}\right) \\
& =E Q\left(s\left(k, g, p, y ; t ; j_{1}, \ldots, j_{k}\right), q_{1}(k, f)\right) \\
& \wedge \operatorname{heads}\left(k, g, p, y ; t ; j_{1}, \ldots, j_{k}\right) .
\end{aligned}
$$

Since $s\left(k, g, p, y ; t ; j_{1}, \ldots, j_{k}\right)$ and $q_{1}(k, f)$ are $(\log k)$-bit binary words, this comparison needs depth roughly $\log \log k$. We then compare the symbol read by the $i$th head of $T_{b}$ and $a(k, f ; i)$ by

$$
\begin{aligned}
& \text{cmp-sybl}_{1}\left(k, g, p, y, f ; t, i, j_{i} ; j_{1}, \ldots, j_{k}\right) \\
& \quad = w_{1}\left(k, g, p, y ; t, i, j_{i} ; j_{1}, \ldots, j_{k}\right) \wedge a(k, f ; i) \\
& \qquad \wedge \text{heads}\left(k, g, p, y ; t ; j_{1}, \ldots, j_{k}\right), \\
& \text{cmp-sybl}_{0}\left(k, g, p, y, f ; t, i, j_{i} ; j_{1}, \ldots, j_{k}\right) \\
& \quad = w_{0}\left(k, g, p, y ; t, i, j_{i} ; j_{1}, \ldots, j_{k}\right) \wedge \neg a(k, f ; i) \\
& \qquad \wedge \text{heads}\left(k, g, p, y ; t ; j_{1}, \ldots, j_{k}\right).
\end{aligned}
$$

We define $\text{cmp-sybl}\left(k, g, p, y, f ; t, i, j_{i} ; j_{1}, \ldots, j_{k}\right)$ as

$$
\begin{aligned}
& \text{cmp-sybl}\left(k, g, p, y, f ; t, i, j_{i} ; j_{1}, \ldots, j_{k}\right) \\
& \quad = \text{cmp-sybl}_{0}\left(k, g, p, y, f ; t, i, j_{i} ; j_{1}, \ldots, j_{k}\right) \\
& \qquad \vee \text{cmp-sybl}_{1}\left(k, g, p, y, f ; t, i, j_{i} ; j_{1}, \ldots, j_{k}\right).
\end{aligned}
$$

Then $\text{cmp-sybl}\left(k, g, p, y, f ; t, i, j_{i} ; j_{1}, \ldots, j_{k}\right)=1$ iff the $i$th head of $T_{b}$ is reading the symbol which coincides with $a(k, f ; i)$. Therefore, the current configuration agrees with the next move function $\delta_{f}$ iff the following $\text{agree}\left(k, g, p, y, f ; t ; j_{1}, \ldots, j_{k}\right)=1$.

$$
\begin{aligned}
& \text{agree}\left(k, g, p, y, f ; t ; j_{1}, \ldots, j_{k}\right) \\
& \quad = \text{cmp-state}\left(k, g, p, y, f ; t ; j_{1}, \ldots, j_{k}\right) \\
& \qquad \wedge\left(\bigwedge_{i=1}^{k} \text{cmp-sybl}\left(k, g, p, y, f ; t, i, j_{i} ; j_{1}, \ldots, j_{k}\right)\right)
\end{aligned}
$$
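The predicates $EQ$ and agree can be mirrored by a short Python sketch. It is a simplification under stated assumptions: the family of gates indexed by guessed head positions $j_{1}, \ldots, j_{k}$ is flattened into a single call, with the heads predicate passed in as the flag `heads_ok`, and the function names `eq` and `agree` as well as the argument layout are hypothetical.

```python
def eq(y, z):
    """EQ(y, z): 1 iff the two equal-length bit lists agree in every position."""
    return int(all((yi and zi) or (not yi and not zi) for yi, zi in zip(y, z)))


def agree(state_bits, q1_bits, symbols_under_heads, a_bits, heads_ok=1):
    """1 iff the configuration matches next move function delta_f.

    state_bits, q1_bits:  current state and q1(k, f) as bit lists.
    symbols_under_heads:  symbol read by head i, for i = 1..k.
    a_bits:               a(k, f; i) for i = 1..k.
    heads_ok:             value of the heads predicate for the guessed positions.
    """
    cmp_state = eq(state_bits, q1_bits) and heads_ok
    # cmp-sybl: symbol 1 with a = 1 (cmp-sybl_1) or symbol 0 with a = 0 (cmp-sybl_0).
    cmp_sybl = all(
        heads_ok and ((sym == 1 and a == 1) or (sym == 0 and a == 0))
        for sym, a in zip(symbols_under_heads, a_bits)
    )
    return int(bool(cmp_state and cmp_sybl))


print(agree([1, 0], [1, 0], [0, 1, 0], [0, 1, 0]))  # 1
print(agree([1, 0], [0, 1], [0, 1, 0], [0, 1, 0]))  # 0
```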

Let $d_{i}$ be a single bit, and let $d_{i}^{\prime}=-1$ if $d_{i}=0$ and $d_{i}^{\prime}=1$ if $d_{i}=1$. In the following, $(d_{1}, d_{2}, \ldots, d_{k})$ means that the $i$th head is moved to the left (resp. right) if $d_{i}=0$ (resp. $d_{i}=1$). We define $\operatorname{move}\left(k, f ; d_{1}, d_{2}, \ldots, d_{k}\right)$ as

$$
\operatorname{move}\left(k, f ; d_{1}, d_{2}, \ldots, d_{k}\right)=\bigwedge_{i=1}^{k} E Q\left(d_{i}, d(k, f ; i)\right)
$$

We define $s\left(k, g, p, y, f ; t+1 ; j_{1}, \ldots, j_{k} ; d_{1}, \ldots, d_{k}\right)$ as

$$
\begin{aligned}
& q_{2}(k, f) \wedge \operatorname{agree}\left(k, g, p, y, f ; t ; j_{1}-d_{1}^{\prime}, \ldots, j_{k}-d_{k}^{\prime}\right) \\
& \wedge \operatorname{move}\left(k, f ; d_{1}, \ldots, d_{k}\right)
\end{aligned}
$$

Now the next state is updated by

$$
\begin{aligned}
& s\left(k, g, p, y ; t+1 ; j_{1}, \ldots, j_{k}\right)= \\
& \quad \bigvee_{d_{1}, \ldots, d_{k} \in\{0,1\}}\left(\bigvee_{f=1}^{k \cdot 2^{k}} s\left(k, g, p, y, f ; t+1 ; j_{1}, \ldots, j_{k} ; d_{1}, \ldots, d_{k}\right)\right)
\end{aligned}
$$

where $k \cdot 2^{k}$ is the number of the next move functions. This is an OR gate of fan-in $2^{k} k 2^{k}$, and thus this can be replaced by a fan-in 2 circuit of depth $2 k+\log k$.
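The replacement of a large fan-in gate by a fan-in 2 circuit can be sketched as a balanced tree. The Python function below (the name `reduce_fan_in` is hypothetical) evaluates a nonempty list of bits level by level and reports the depth of the resulting tree, which is $\lceil \log_{2} m \rceil$ for fan-in $m$.

```python
def reduce_fan_in(bits, op):
    """Evaluate a large fan-in gate as a balanced tree of fan-in 2 gates.

    bits: nonempty list of input bits; op: the binary gate (e.g. OR or AND).
    Returns (value, depth of the tree).
    """
    level, depth = list(bits), 0
    while len(level) > 1:
        # Pair up neighbours; an unpaired last wire passes through unchanged.
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level, depth = nxt, depth + 1
    return level[0], depth


k = 4
fan_in = (2 ** k) * k * (2 ** k)          # 2^k * k * 2^k = 1024
value, depth = reduce_fan_in([0] * (fan_in - 1) + [1], lambda a, b: a | b)
print(value, depth)  # 1 10
```

For $k=4$ the depth is $10 = 2k+\log k$, matching the bound stated above; the same tree with AND gates gives depth $\log k$ for the fan-in $k$ gate used in heads.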

Then, we define $w_{1}\left(k, g, p, y, f ; t+1, i, j ; j_{1}+d_{1}, \ldots, j_{k}+d_{k} ; d_{1}, \ldots, d_{k}\right)$ as

$$
\begin{aligned}
& \quad\left(b(k, f ; i) \wedge \operatorname{agree}\left(k, g, p, y, f ; t ; j_{1}, \ldots, j_{k}\right)\right. \\
& \left.\quad \wedge h\left(k, g, p, y ; t, i, j_{i} ; j_{1}, \ldots, j_{k}\right) \wedge \operatorname{move}\left(k, f ; d_{1}, \ldots, d_{k}\right)\right) \\
& \vee\left(w_{1}\left(k, g, p, y ; t, i, j ; j_{1}, \ldots, j_{k}\right)\right. \\
& \left.\quad \wedge \neg h\left(k, g, p, y ; t, i, j_{i} ; j_{1}, \ldots, j_{k}\right) \wedge \operatorname{move}\left(k, f ; d_{1}, \ldots, d_{k}\right)\right)
\end{aligned}
$$

Tape symbols are updated by

$$
\begin{aligned}
& w_{1}\left(k, g, p, y ; t+1, i, j ; j_{1}, \ldots, j_{k}\right)= \\
& \quad \bigvee_{d_{1}, \ldots, d_{k} \in\{0,1\}}\left(\bigvee_{f=1}^{k \cdot 2^{k}} w_{1}\left(k, g, p, y, f ; t+1, i, j ; j_{1}, \ldots, j_{k} ; d_{1}, \ldots, d_{k}\right)\right)
\end{aligned}
$$

$w_{0}\left(k, g, p, y, f ; t+1, i, j ; j_{1}, \ldots, j_{k} ; d_{1}, \ldots, d_{k}\right)$ and $w_{0}\left(k, g, p, y ; t+1, i, j ; j_{1}, \ldots, j_{k}\right)$ are defined similarly.

The head positions are updated if the head positions at step $t+1$ are adjacent to the positions at step $t$. We define $h\left(k, g, p, y, f ; t+1, i, j_{i} ; j_{1}, j_{2}, \ldots, j_{k} ; d_{1}, d_{2}, \ldots, d_{k}\right)$ as

$$
\begin{aligned}
& h\left(k, g, p, y ; t, i, j_{i}-d_{i}^{\prime} ; j_{1}-d_{1}^{\prime}, \ldots, j_{k}-d_{k}^{\prime}\right) \\
& \quad \wedge \operatorname{agree}\left(k, g, p, y, f ; t ; j_{1}-d_{1}^{\prime}, \ldots, j_{k}-d_{k}^{\prime}\right) \\
& \quad \wedge \operatorname{move}\left(k, f ; d_{1}, \ldots, d_{k}\right)
\end{aligned}
$$

$h\left(k, g, p, y, f ; t+1, i, j_{i} ; j_{1}, j_{2}, \ldots, j_{k} ; d_{1}, d_{2}, \ldots, d_{k}\right)=1$ iff the $i$th head is placed at the $\left(j_{i}-d_{i}^{\prime}\right)$th cell at step $t$ and is moved to the $j_{i}$th cell at step $t+1$ for each $i$. Then, the head positions are updated by

$$
\begin{aligned}
& h\left(k, g, p, y ; t+1, i, j_{i} ; j_{1}, j_{2}, \ldots, j_{k}\right)= \\
& \quad \bigvee_{d_{1}, \ldots, d_{k} \in\{0,1\}}\left(\bigvee_{f=1}^{k \cdot 2^{k}} h\left(k, g, p, y, f ; t+1, i, j_{i} ; j_{1}, \ldots, j_{k} ; d_{1}, \ldots, d_{k}\right)\right)
\end{aligned}
$$
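The head-position update can be illustrated on a much simplified instance. The Python sketch below assumes a single head ($k=1$) whose position is a one-hot Boolean vector, and it collapses the $\operatorname{agree}$ and $\operatorname{move}$ terms into one known direction bit `d_chosen`; these simplifications and all names are assumptions made for illustration, not the construction of the paper.

```python
from typing import List


def d_prime(d: int) -> int:
    """Direction bit to signed offset: 0 -> -1 (left), 1 -> +1 (right)."""
    return 1 if d == 1 else -1


def step_head(h: List[bool], d_chosen: int) -> List[bool]:
    """One update of a one-hot head-position vector.

    new_h[j] = OR over d in {0, 1} of (h[j - d'] AND EQ(d, d_chosen)),
    mirroring the shape of the update: the head is at cell j at step t+1
    iff it was at the adjacent cell j - d' at step t and moved in direction d.
    """
    n = len(h)

    def h_at(j: int) -> bool:
        # Cells outside the tape segment are treated as "head not here".
        return h[j] if 0 <= j < n else False

    return [
        any(h_at(j - d_prime(d)) and d == d_chosen for d in (0, 1))
        for j in range(n)
    ]


def one_hot(n: int, pos: int) -> List[bool]:
    """One-hot vector of length n with the head at cell pos."""
    return [j == pos for j in range(n)]
```

With the head at cell 3 of an 8-cell segment, `step_head(one_hot(8, 3), 1)` places it at cell 4 and `step_head(one_hot(8, 3), 0)` places it at cell 2.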

As shown in Lemma 1, it is enough that $\alpha_{n}^{\text {code }}$ simulates $\psi(n) \log n$ steps of $T_{b}$. A single step of the simulation needs depth $c_{1} k$ for some constant $c_{1}$. Therefore, $\alpha_{n}^{\text {code }}$ has depth $c_{1} k \psi(n) \log n$ in total.

Each gate has its own name, and each name is represented by a binary string of length $c_{2} \log n$ for some constant $c_{2}$. The depth of the connection between step $t$ and step $t+1$ is a polynomial in $k$. Therefore, the type and gate number of the gate reached by following some path $p,|p| \leq \log Z(n)$, from each gate of $\alpha_{n}^{\text {code }}$ can be computed by an $O(\log n)$-time DTM. Hence, $\alpha_{n}$ is a DLOGTIME-uniform circuit. The following lemma concludes the proof.

Lemma 1. Any family of DLOGTIME-uniform circuits of size $Z_{1}(n)$ and depth $T_{1}(n)$ cannot recognize the language which is recognized by the above-defined $\alpha_{n}$ of size $Z_{2}(n)$ and depth $T_{2}(n)$.

Proof. Assume for contradiction that there exists a family of DLOGTIME-uniform circuits, say, $\beta_{n}$, of size $Z_{1}(n)$ and depth $T_{1}(n)$ such that $\beta_{n}$ can accept the language accepted by $\alpha_{n}$. Since $\beta_{n}$ is DLOGTIME-uniform, there exists an $O(\log n)$-time DTM $T_{b}$ which recognizes the extended connection language $L_{\mathrm{EC}}$ of $\beta_{n}$. Consider a sufficiently long string $b$ such that the encoding of $T_{b}$ appears in the prefix of length $\psi(n)$. If we define $\alpha_{n}^{\text {code }}$ by using an appropriate slowly growing function $\psi(n)$ computable by an $O(\log n)$-time DTM (e.g., $\psi(n)=\min \left(\log ^{*} n, \sqrt{T_{1}(n) / \log n}\right)$), then the depth $c_{1} k \psi(n) \log n$ of $\alpha_{n}^{\text {code }}$ becomes at most $d T_{1}(n)$. (Note that $k$ is at most $\psi(n)$ and $T_{1}(n)$ is not bounded by $O(\log n)$.)
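The depth bound can be verified in one line. The following derivation is a sketch that assumes the example choice $\psi(n)=\min \left(\log ^{*} n, \sqrt{T_{1}(n) / \log n}\right)$ and uses only $k \leq \psi(n)$ and $\psi(n) \leq \sqrt{T_{1}(n) / \log n}$:

$$
c_{1} k \psi(n) \log n \leq c_{1} \psi(n)^{2} \log n \leq c_{1} \cdot \frac{T_{1}(n)}{\log n} \cdot \log n=c_{1} T_{1}(n)
$$

Under these assumptions, any constant $d \geq c_{1}$ suffices for the depth of $\alpha_{n}^{\text {code }}$.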

If such a long string $b$ is given to $\alpha_{n}$ as its input, then (i) the output of $\alpha_{n}^{\mathrm{tm}}$ is 1, (ii) $\alpha_{n}^{\text {code }}$ correctly outputs the extended encoding $\widehat{\beta}_{n}$ of $\beta_{n}$, and therefore (iii) $\alpha_{n}^{\text {sim }}$ outputs 1 if and only if $\beta_{n}$ outputs 1. Recall that the output of $\alpha_{n}$ is 1 if and only if the outputs of $\alpha_{n}^{\mathrm{tm}}$ and $\alpha_{n}^{\text {sim }}$ are 1 and 0, respectively. Therefore, $\alpha_{n}$ outputs 1 if and only if $\beta_{n}$ outputs 0, a contradiction.

## Acknowledgments

We would like to thank Eric Allender for valuable suggestions and comments on the initial drafts of this paper. This work originated from a discussion with him while the first author visited Rutgers University. Thanks are also due to an anonymous referee who suggested the universal circuits of Cook and Hoover. This research was supported in part by a Scientific Research Grant, Ministry of Education, Japan.

## References

[1] A.V. Aho, J.E. Hopcroft, and J.D. Ullman, The design and analysis of computer algorithms, Addison-Wesley, Reading, MA, 1974.
[2] M. Ajtai, " $\Sigma_{1}^{1}$-formulae on finite structure," Ann. Pure Appl. Logic, Vol. 24, pp. 1-48, 1983.
[3] D.A. Barrington, "Bounded-width polynomial-size branching programs can recognize exactly those languages in $\mathrm{NC}^{1}$," J. Comput. System Sci., Vol. 38, pp. 150-164, 1989.
[4] D.A.M. Barrington, K. Compton, H. Straubing, and D. Thérien, "Regular languages in $\mathrm{NC}^{1}$," J. Comput. System Sci., Vol. 44, pp. 478-499, 1992.
[5] D.A.M. Barrington, N. Immerman, and H. Straubing, "On uniformity within $\mathrm{NC}^{1}$," J. Comput. System Sci., Vol. 41, pp. 274-306, 1990.
[6] D.A.M. Barrington, H. Straubing, and D. Thérien, "Non-uniform Automata over groups," Inform. and Comput., Vol. 89, pp. 109-132, 1990.
[7] D.A.M. Barrington and D. Thérien, "Finite monoids and the fine structure of $\mathrm{NC}^{1}$," J. Assoc. Comput. Mach., Vol. 35, No. 4, pp. 941-952, 1988.
[8] A. Borodin, "On relating time and space to size and depth," SIAM J. Comput., Vol. 6, No. 4, pp. 733-743, 1977.
[9] S.R. Buss, S.A. Cook, A. Gupta, and V. Ramachandran, "An optimal parallel algorithm for formula evaluation," SIAM J. Comput., Vol. 14, pp. 755-780, 1992.
[10] S.A. Cook, "A taxonomy of problems with fast parallel algorithms," Inform. and Control, Vol. 64, pp. 2-22, 1985.
[11] S.A. Cook and H.J. Hoover, "A depth-universal circuit," SIAM J. Comput., Vol. 14, No. 4, pp. 833-839, 1985.
[12] S.A. Cook and R.A. Reckhow, "Time bounded random access machines," J. Comput. System Sci., Vol. 7, pp. 354-375, 1973.
[13] M. Furst, J. Saxe, and M. Sipser, "Parity, circuits, and the polynomial time hierarchy," Math. Systems Theory, Vol. 17, pp. 12-27, 1984.
[14] K. Iwama and C. Iwamoto, "Extended graph connectivity and its gradually increasing parallel complexity," in: Proc. Fifth Annual International Symposium on Algorithms and Computation (Lecture Notes in Computer Science 834), pp. 478-486, 1994.
[15] K. Iwama and C. Iwamoto, " $\alpha$-connectivity: A gradually non-parallel graph problem," to appear in the Journal of Algorithms, Vol. 20, No. 3, 1996.
[16] K. Iwama, C. Iwamoto, and M. Morshed, "Time lower bounds do not exist for CRCW PRAMs," to appear in Theoretical Computer Science, Vol. 160, 1996.
[17] D.S. Johnson, "A catalog of complexity classes," in: J. van Leeuwen, ed., Handbook of Theoretical Computer Science, Vol. A (North-Holland, Amsterdam, 1990) pp. 69-161.
[18] S. Khuller and B. Schieber, "Efficient parallel algorithms for testing $k$-connectivity and finding disjoint $s-t$ paths in graphs," SIAM J. Comput., Vol. 20, No. 2, pp. 352-375, 1991.
[19] W.W. Kirchherr, "A hierarchy theorem for PRAM-based complexity classes," in: Proc. 8th Conference on Foundations of Software Technology and Theoretical Computer Science (Lecture Notes in Computer Science 338), pp. 240-249, 1988.
[20] N. Pippenger and M.J. Fischer, "Relations among complexity measures," J. Assoc. Comput. Mach., Vol. 26, No. 2, pp. 361-381, 1979.
[21] W.L. Ruzzo, "On uniform circuit complexity," J. Comput. System Sci., Vol. 22, pp. 365-383, 1981.
[22] M. Sipser, "Borel sets and circuit complexity," in: Proc. 15th Ann. ACM Symp. on Theory of Computing, pp. 61-69, 1983.
[23] L. Stockmeyer and U. Vishkin, "Simulation of parallel random access machines by circuits," SIAM J. Comput., Vol. 13, No. 2, pp. 409-422, 1984.

