Abstract: A zero-one sequence $x_1, \ldots, x_n$ is $k$-tonic if the number of $i$'s such that $x_i \neq x_{i+1}$ is at most $k$. The notion generalizes the well-known bitonic sequences. In negation-limited complexity, one considers circuits with a limited number of NOT gates, motivated by the gap in our understanding of monotone versus general circuit complexity and by the hope of better understanding the power of NOT gates. In this context, the study of inverters, i.e., circuits with inputs $x_1, \ldots, x_n$ and outputs $\neg x_1, \ldots, \neg x_n$, is fundamental since an inverter with $r$ NOTs can be used to convert a general circuit to one with only $r$ NOTs. In particular, if a linear-size log-depth inverter with $r$ NOTs exists, we do not lose generality by only considering circuits with at most $r$ NOTs when we seek superlinear size lower bounds or superlogarithmic depth lower bounds. Markov [JACM1958] showed that the minimum number of NOT gates necessary in an $n$-inverter is $\lceil \log_2(n+1) \rceil$. Beals, Nishino, and Tanaka [SICOMP98-STOC95] gave a construction of an $n$-inverter with size $O(n \log n)$, depth $O(\log n)$, and $\lceil \log_2(n+1) \rceil$ NOTs. We give a construction of circuits inverting $k$-tonic sequences with size $O((\log k) n)$ and depth $O(\log k \log\log n + \log n)$ using $\log_2 n + \log_2\log_2\log_2 n + O(1)$ NOTs. In particular, for the case where $k = O(1)$, our $k$-tonic inverter achieves asymptotically optimal linear size and logarithmic depth. Our construction improves all the parameters of the $k$-tonic inverter by Sato, Amano, and Maruoka, which has size $O(kn)$, depth $O(k \log^2 n)$, and $O(k \log n)$ NOTs. We also give a construction of $k$-tonic sorters achieving linear size and logarithmic depth with $\log_2\log_2 n + \log_2\log_2\log_2 n + O(1)$ NOT gates for the case where $k = O(1)$. The following question by Turán remains open: Is the size of any depth-$O(\log n)$ inverter with $O(\log n)$ NOT gates superlinear?
Introduction and Summary
Although exponential lower bounds are known for the monotone circuit size [12] , [8] , [5] , at present we cannot prove a superlinear lower bound for the size of circuits computing an explicit Boolean function. It is natural to ask: What happens if we allow a limited number of NOT gates? The hope is that by the study of negation-limited complexity of Boolean functions under various scenarios [6] , [7] , [4] , [3] , [2] , [13] , [9] , we obtain a better understanding about the power of NOT gates.
As explained in the abstract, the study of inverters is fundamental in this context. We consider circuits consisting of AND/OR/NOT gates, and the size of a circuit is the number of gates in it. The best known construction of a general inverter is due to Beals, Nishino, and Tanaka [4]. Their inverter has size $O(n \log n)$ and depth $O(\log n)$ and uses $\lceil \log_2(n+1) \rceil$ NOT gates. In a recent paper [13], Sato, Amano, and Maruoka considered circuits that are guaranteed to invert a restricted class of inputs, and gave a construction for a $k$-tonic inverter, i.e., a circuit that inverts all $k$-tonic 0/1 sequences, with size $O(kn)$, depth $O(k \log^2 n)$, and $O(k \log n)$ NOTs. We give a new, different construction of a $k$-tonic inverter that improves all three parameters. In particular, for $k = O(1)$, we achieve asymptotically optimal linear size and logarithmic depth using only slightly more than $\log_2 n$ NOT gates. The improvements are shown in Table 1.
Theorem 1.
There is a $k$-tonic inverter that has size $O((\log k)n)$ and depth $O(\log k \log\log n + \log n)$, and uses $\log_2 n + \log_2\log_2\log_2 n + O(1)$ NOT gates.

Table 1. The parameters of the $k$-tonic inverters of Sato et al. [13] and this paper

| | Sato et al. [13] | this paper |
|---|---|---|
| size | $O(kn)$ | $O((\log k)n)$ |
| depth | $O(k \log^2 n)$ | $O(\log k \log\log n + \log n)$ |
| # of NOTs | $O(k \log n)$ | $\log_2 n + \log_2\log_2\log_2 n + O(1)$ |
Amano, Maruoka, and Tarui [3] considered the minimum size of a circuit that merges two 0/1 sequences using $t$ NOT gates, and they showed that it is $\Theta((n \log n)/2^t)$, thus demonstrating a smooth trade-off of size versus the number of NOTs from the monotone case of $\Theta(n \log n)$ to the general case of $\Theta(n)$. Their merging circuit actually works for any bitonic sequence. Sato, Amano, and Maruoka [13] also considered a generalized scenario in terms of $k$-tonic sequences and, for $t \le \log_2 n$ and $k = O(\log n)$, they gave a construction of a $k$-tonic sorter, i.e., a circuit that sorts all $k$-tonic binary sequences, that has size $O(kn + (n \log n)/2^t)$ and uses $O(tk^2)$ NOT gates. The design principle and the analysis of our $k$-tonic inverter immediately yield an improved $k$-tonic sorter:
Theorem 2.
There is a $k$-tonic sorter that uses $t$ NOT gates and has size $O((\log k)n + (n \log n)(t/2^t))$ and depth $O((\log k) \cdot t + \log n)$.
Component Circuits/Networks
In this section we explain the components that we use in our circuits. The constructions in Sections 2.1 and 2.2 are due to Beals, Nishino, and Tanaka [4] . The reader may choose to skip this section and come back to it after seeing how components are assembled and used in our circuits.
inverting the inputs of a comparator network
Let $N_1$ be a comparator network (see, e.g., Knuth [10]) with inputs $v_1, \ldots, v_n$ and outputs $w_1, \ldots, w_n$. Assume that $N_1$ has depth $d$ and contains $s$ comparators. Consider the case where inputs are Boolean. In the Boolean case, each comparator can be considered as a pair of one AND gate and one OR gate (Figure 1), and thus $N_1$ can be considered as a depth-$d$ size-$2s$ monotone circuit.
Assume that the negations of the outputs of $N_1$, i.e., $\neg w_1, \ldots, \neg w_n$, are computed by another circuit and are available. Then, we can construct a circuit $N_2$ that outputs the negations of the inputs $\neg v_1, \ldots, \neg v_n$ as follows. For each comparator $c$ with inputs $x_1$ and $x_2$ and outputs $y_1$ and $y_2$, we can compute $\neg x_1$ and $\neg x_2$ from $x_1, x_2, \neg y_1, \neg y_2$ as shown in Figure 1. Repeatedly apply this construction, considering comparators one by one from the outputs of $N_1$ towards the inputs, and obtain the network $N_2$. The circuit $N_2$ has depth $2d$ and consists of $2s$ ANDs and $2s$ ORs.
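Since Figure 1 is not reproduced here, the following Python sketch spells out one possible realization of the per-comparator step. It assumes that the comparator computes $y_1 = x_1 \wedge x_2$ and $y_2 = x_1 \vee x_2$, and it uses the identities $\neg x_1 = \neg y_2 \vee (\neg y_1 \wedge x_2)$ and $\neg x_2 = \neg y_2 \vee (\neg y_1 \wedge x_1)$. These formulas are a reconstruction chosen to match the stated cost of two ANDs and two ORs per comparator, not necessarily the gates drawn in the figure, and the function names are hypothetical.

```python
from itertools import product


def invert_comparator_inputs(x1, x2, not_y1, not_y2):
    """Return (NOT x1, NOT x2) using two ANDs and two ORs and no NOT gate.

    Assumption: the comparator outputs are y1 = x1 AND x2, y2 = x1 OR x2,
    and their negations not_y1, not_y2 are supplied by another circuit.
    """
    not_x1 = not_y2 | (not_y1 & x2)
    not_x2 = not_y2 | (not_y1 & x1)
    return not_x1, not_x2


def check_all_inputs():
    """Exhaustively compare against the true negations on all four inputs."""
    for x1, x2 in product((0, 1), repeat=2):
        y1, y2 = x1 & x2, x1 | x2
        got = invert_comparator_inputs(x1, x2, 1 - y1, 1 - y2)
        if got != (1 - x1, 1 - x2):
            return False
    return True
```

Each negated input costs one AND followed by one OR, which is consistent with a comparator layer of depth 1 in $N_1$ becoming a layer of depth 2 in $N_2$.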
the Beals-Nishino-Tanaka inverter
The inverter operates as follows. Sort $x_1, \ldots, x_n$ by the (upside-down) AKS $n$-sorting network [1] with depth $O(\log n)$ and size $O(n \log n)$, and obtain $y_1 \ge \cdots \ge y_n$. Apply Fischer's network $M_n$ [6], [7], [4], and obtain $\neg y_1, \ldots, \neg y_n$. Finally, apply the network explained in Section 2.1 that outputs $\neg x_1, \ldots, \neg x_n$ using $\neg y_1, \ldots, \neg y_n$. Here $M_n$ is a network that inverts a sorted 0/1-sequence $y_1 \ge \cdots \ge y_n$ with size $O(n)$ and depth $O(\log n)$ using $\lceil \log_2(n+1) \rceil$ NOT gates.
(More precisely, for $n = 2^r - 1$, $M_n$ has size $4n - 3r$; this is the minimum size [9] of circuits inverting $n$ sorted inputs with $r$ NOTs.) The inverter uses $\lceil \log_2(n+1) \rceil$ NOT gates and has depth $O(\log n)$ and size $O(n \log n)$.
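The three stages can be traced in a small Python simulation. This is only a sketch under simplifying assumptions: an odd-even transposition network (quadratic size) stands in for the AKS network, the sorted sequence is negated directly instead of by Fischer's $M_n$, and the backward pass uses the identity $\neg a = \neg\mathrm{hi} \vee (\neg\mathrm{lo} \wedge b)$ for a comparator with inputs $a, b$ and outputs $\mathrm{hi} = a \vee b$, $\mathrm{lo} = a \wedge b$. All names are hypothetical.

```python
def transposition_network(n):
    """Comparators (i, i+1) of odd-even transposition sort, n rounds."""
    return [(i, i + 1) for r in range(n) for i in range(r % 2, n - 1, 2)]


def invert_via_sorting(x):
    """Return [NOT x_1, ..., NOT x_n] following the sort / invert / undo plan."""
    n = len(x)
    net = transposition_network(n)
    wires = list(x)
    history = []  # the input pair seen by each comparator, in network order
    for i, j in net:
        a, b = wires[i], wires[j]
        history.append((a, b))
        wires[i], wires[j] = a | b, a & b  # larger value first: y_1 >= ... >= y_n

    # Stand-in for Fischer's M_n: negate the sorted sequence directly.
    neg = [1 - y for y in wires]

    # Undo the comparators from the outputs towards the inputs.
    for (i, j), (a, b) in zip(reversed(net), reversed(history)):
        not_hi, not_lo = neg[i], neg[j]
        neg[i] = not_hi | (not_lo & b)  # NOT a
        neg[j] = not_hi | (not_lo & a)  # NOT b
    return neg
```

In the actual construction the only NOT gates are those inside $M_n$; the backward pass is monotone in the available values, which is why the NOT count of the whole inverter equals that of $M_n$.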
conditional shifter
Let $p \in \{0, 1\}$. Assume that $\delta \le \alpha$ and let $y_1, \ldots, y_{\alpha+\delta}$ be a 0/1-sequence. Suppose that we want to let $z_1, \ldots, z_\alpha$ respectively be $y_1, \ldots, y_\alpha$ if $p = 1$ and $y_{\delta+1}, \ldots, y_{\alpha+\delta}$ if $p = 0$. In other words, we want to either (1) discard the last $\delta$ $y_j$'s, or (2) discard the first $\delta$ $y_j$'s and then shift by $\delta$. This can easily be done using $p$ and $\neg p$ as follows: For $j = 1, \ldots, \alpha$, compute
$z_j = (p \wedge y_j) \vee (\neg p \wedge y_{j+\delta})$.
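The selection described above can be sketched in Python as follows. The sketch assumes the natural multiplexer form $z_j = (p \wedge y_j) \vee (\neg p \wedge y_{j+\delta})$; the function name is hypothetical and lists are 0-indexed.

```python
def conditional_shift(y, alpha, delta, p):
    """Return z_1..z_alpha: y_1..y_alpha if p = 1, else y_{delta+1}..y_{alpha+delta}."""
    assert len(y) == alpha + delta and delta <= alpha
    not_p = 1 - p  # in the circuit, NOT p comes from one NOT gate applied to p
    return [(p & y[j]) | (not_p & y[j + delta]) for j in range(alpha)]
```

For example, with `y = [0, 0, 1, 1, 1]`, `alpha = 3`, and `delta = 2`, the call with `p = 1` keeps the first three entries and the call with `p = 0` keeps the last three. In this form each output uses two ANDs and one OR.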
Let $y'_1 \le \cdots \le y'_{\alpha+\delta}$ and $z'_1 \le \cdots \le z'_\alpha$ respectively be the 0/1-sequences obtained by sorting $y_1, \ldots, y_{\alpha+\delta}$ and $z_1, \ldots, z_\alpha$. Assume further that the following condition holds:
Then, we can easily compute $\neg y_1, \ldots, \neg y_{\alpha+\delta}$ and $y'_1, \ldots, y'_{\alpha+\delta}$ as follows.
pivot bits in our circuit to achieve the claimed depth. It turns out that most work we do is giving appropriate definitions and developing an appropriate framework for analysis. In Section 3.3 we explain how we can achieve the number of NOT gates as claimed in Theorem 1 by a simple finer analysis. In Section 3.4 we explain how we can obtain our k-tonic sorter in a similar way.
overall structure of the k-tonic inverter
We first explain using a general algorithmic language and then explain in terms of circuits. We consider a $k$-tonic binary input sequence of length $n$ and assume that $n = k2^r$ for some integer $r$. If $n$ is not of this form, we can pad an input sequence $x = x_1, \ldots, x_n$ with trailing 1's and obtain the sequence $x' = x_1, \ldots, x_n, 1, \ldots, 1$ whose length is the minimum $n' \ge n$ of the form $n' = k2^r$, apply the inverter for the $(k+1)$-tonic sequence $x'$, and discard the last $n' - n$ outputs. Let $x = x_1, \ldots, x_n$ be a $k$-tonic 0/1 sequence of length $n = 4km$. Think of the $x_i$'s as entries of a $4k \times m$ matrix $M$ in row-major order, so that the $i$-th row is $x_{(i-1)m+1}, \ldots, x_{im}$. (We won't be doing any linear algebra; we can equally speak in terms of a rectangular array or a two-dimensional grid.)
A row is dirty if it contains both 0 and 1; otherwise it is clean; an all-0 row is 0-clean and an all-1 row is 1-clean.
Since the sequence $x$ is $k$-tonic, among the $4k$ rows of $M$, at most $k$ rows are dirty. Sort each column of $M$ with smaller entries up, and obtain the matrix $M_0$. The matrix $M_0$ has at most $k$ dirty rows, and all of them are middle rows. Thus either the bottom $(4k - k)/2 = 3k/2$ rows are all 1-clean, or else the top $3k/2$ rows are all 0-clean. For now, we use the following weaker form: Either (1) all the bottom $k$ rows are 1-clean or (2) all the top $k$ rows are 0-clean.
Define the pivot bit $p$ as $p$ = AND of the $km$ entries in the bottom $k$ rows of $M_0$. Use one NOT gate and obtain $\neg p$. Using $p$ and $\neg p$, discard either the bottom $k$ 1-clean rows or the top $k$ 0-clean rows according to whether $p = 1$ or $0$, i.e., whether (1) holds or not. The remaining $3k$-row matrix $M_1$ has at most $k$ dirty rows, and all of them are middle rows. Again discard the bottom $(3k - k)/2 = k$ rows or the top $k$ rows using the pivot bit for the bottom $k$ rows together with one NOT gate.
We are left with a $2k \times m$ matrix $M_2$. Split each row of $M_2$ into the first half and the last half. Let $L$ be the $4k \times (m/2)$ matrix whose $4k$ rows are the $4k$ halves of the rows of $M_2$, with the two halves of each row stacked on top of one another. At most $k$ rows of $L$ are dirty. Thus using two NOT gates we have halved the problem size: We can apply the same operation and arguments for $L$, i.e., sort each column and discard $2k$ clean rows using two NOT gates.
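One round of this reduction can be simulated directly on 0/1 lists. The Python sketch below is an illustration only: it uses ordinary sorting in place of sorting networks, computes each pivot bit with `all`, and assumes the row-major layout of $M$; the function names are hypothetical.

```python
def column_sort(M):
    """Sort every column with smaller entries up (0s above 1s)."""
    cols = [sorted(col) for col in zip(*M)]
    return [list(row) for row in zip(*cols)]


def discard_by_pivot(M, k):
    """Drop the bottom k rows if they are all 1 (pivot p = 1), else the top k rows."""
    p = int(all(v == 1 for row in M[-k:] for v in row))
    return (M[:-k] if p else M[k:]), p


def halve_once(x, k):
    """One round: 4k x m matrix -> 4k x (m/2) matrix, using two pivot bits."""
    m = len(x) // (4 * k)
    M = [x[i * m:(i + 1) * m] for i in range(4 * k)]  # row-major layout
    M0 = column_sort(M)
    M1, p1 = discard_by_pivot(M0, k)
    M2, p2 = discard_by_pivot(M1, k)
    half = m // 2
    L = [part for row in M2 for part in (row[:half], row[half:])]
    return L, (p1, p2)


def dirty_rows(M):
    """Number of rows containing both 0 and 1."""
    return sum(1 for row in M if 0 < sum(row) < len(row))
```

For the 1-tonic input `[1]*6 + [0]*10` with `k = 1`, the first pivot is 1 and the second is 0, and the resulting four-row matrix has no dirty row.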
We now explain in terms of circuits. We start over with the $4k \times m$ matrix $M$ above. To sort each column of $M$, apply AKS sorting networks each sorting $4k$ elements; use $m$ separate networks for the $m$ columns in parallel. Now consider the column-sorted matrix $M_0$, and let $y_1, \ldots, y_n$ be the entries in its first row through its last row.
Consider
Continue halving the problem size $\nu = \log_2\log_2 n$ times so that the size is $n' = n/\log_2 n$, and then use the Beals-Nishino-Tanaka inverter for $n'$ inputs.
Our circuit consists of three parts (Figure 2):
subcircuit 1, which computes pivot bits and reduces the problem size from $n$ to $n'$;
subcircuit 2, the Beals-Nishino-Tanaka inverter for $n' = n/\log_2 n$ inputs;
subcircuit 3, which shifts the outputs appropriately using the pivot bits.
The inverter explained so far has the parameters shown in Table 2 . The parameters of subcircuits 2 and 3 are readily seen. We provide some explanation for the parameters of subcircuit 1.
For $n = km$, consider the first parallel application of $m$ separate AKS sorting networks each sorting $k$ elements. This first part has size $O((\log k)n)$. Since the value of $n$ decreases geometrically by a constant factor, the size of subcircuit 1 is dominated by the size of this first part. If we compute each pivot bit naively by taking the AND of some $y_j$'s as above, this takes depth $O(\log n)$; we repeat this $\nu$ times; thus the depth of subcircuit 1 will be $O(\log n \cdot \nu) = O(\log n \log\log n)$. In the consideration above we halve the problem size using two NOT gates; thus a total of $2\nu = 2\log_2\log_2 n$ NOTs are used for the problem size reduction.
In Sections 3.2 and 3.3 we explain how to reduce the depth of subcircuit 1 to $O(\log k \log\log n)$ and the number of NOTs in subcircuit 1 to $\log_2\log_2 n + \log_2\log_2\log_2 n + O(1)$, and thus obtain a $k$-tonic inverter with the parameters claimed in Theorem 1.
reducing the depth to O(log k log log n)
Let $M$ be an $l \times m$ 0/1-matrix. Let $t$ be a nonnegative integer such that $2^t$ divides $m$, i.e., we can divide each row into $2^t$ consecutive parts of equal size. The parameter $t$ represents the number of pivot bits in our circuit, which equals the number of times that we shrink the problem size by a constant factor, which also equals the number of NOT gates that we use.
For $s = 0, 1, \ldots, t$, we define an $s$-block of $M$ as follows. Each row itself forms a 0-block; there are $l$ 0-blocks. Split each 0-block, i.e., split each row into the first half and the last half. These halves are the $2l$ 1-blocks. Similarly, splitting each $(s-1)$-block yields two $s$-blocks; there are $2^s l$ $s$-blocks. Thus each row forms $2^s$ $s$-blocks for $s = 0, \ldots, t$, and hence forms a total of $u = \sum_{s=0}^{t} 2^s = 2^{t+1} - 1$ blocks. For each row, order these $u$ blocks as follows. The first block is the 0-block, i.e., the whole row. Then come the two 1-blocks, i.e., the first half and the last half, in this order. Then come the four 2-blocks, i.e., the first quarter up to the last quarter; and so on.
Let $F = (f_{ij})$ be an $l \times u$ 0/1-matrix, where $u$ is as above. Our intention will roughly be to let the equality $f_{ij} = 1$ represent the fact that the $j$-th block in the $i$-th row of $M$ is 1-clean, i.e., all-1.
In our circuit, we call $f_{ij}$ a flag bit. We compute the flag bits $f_{ij}$ just once as follows. For each $t$-block $b$, which is a smallest block, compute the flag bit $f_b$ for block $b$ as $f_b = \bigwedge_{x_i \in b} x_i$. For $s = t-1, \ldots, 0$, each $s$-block $b$ contains two $(s+1)$-blocks $b_1$ and $b_2$; compute $f_b$ as $f_b = f_{b_1} \wedge f_{b_2}$. Thus all flag bits are initially computed using only ANDs in depth $\log_2 m$. After the initial computation, we sort flag bits column-wise using AKS sorting networks and discard bottom or top rows of flag bits as we discard top or bottom rows of input bits.
Right after the initial computation, the flag bit $f_b$ for a block $b$ is 1 iff the block $b$ is 1-clean. After sorting the $f_b$'s, this may not hold: it is possible that a block $b$ is 1-clean but $f_b = 0$. But sorting maintains the property that if $f_b = 1$, then $b$ is 1-clean (we later call this 1-conservative), and we show how this suffices for our purposes. We discard the bottom $k$ rows of input bits if the AND of the corresponding $k$ flag bits is 1, computing the AND in depth $\lceil \log_2 k \rceil$. In other words, the first pivot is computed in depth $\lceil \log_2 n \rceil$, and thereafter each pivot bit is computed in depth $\lceil \log_2 k \rceil$. This is how we obtain the claimed depth.
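The initial bottom-up computation of the flag bits for one row can be sketched as follows. This is an illustrative Python model, not a circuit description; the function name is hypothetical, and the output follows the block order defined above (the 0-block first, then the 1-blocks, and so on down to the $t$-blocks).

```python
def flag_bits(row, t):
    """Flag bits of all 2**(t+1) - 1 blocks of one row, coarsest block first."""
    m = len(row)
    assert m % (1 << t) == 0, "2**t must divide the row length"
    size = m >> t
    # Flags of the t-blocks: AND of the entries inside each smallest block.
    level = [int(all(row[i * size:(i + 1) * size])) for i in range(1 << t)]
    levels = [level]
    # Flags of coarser blocks: AND of the flags of the two sub-blocks.
    for _ in range(t):
        level = [level[2 * i] & level[2 * i + 1] for i in range(len(level) // 2)]
        levels.append(level)
    return [f for lvl in reversed(levels) for f in lvl]
```

For the row `[1, 1, 1, 1, 0, 1, 1, 1]` with `t = 2`, the result is `[0, 1, 0, 1, 1, 0, 1]`: the whole row is not 1-clean, the first half is, and only the third quarter fails among the 2-blocks.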
We proceed to show the correctness of the method above. We give definitions of key properties; Lemma 2 says that the properties hold after the initial computation of flag bits; Lemma 3 says that the properties are maintained by the operations above.
For two matrices $M$ and $F$ as above, the pair $(M, F)$ is 1-conservative if the following holds: For $1 \le i \le l$ and $1 \le j \le u$, if $f_{ij} = 1$ then the $j$-th block in the $i$-th row of $M$ is 1-clean. The $j$-th block $b$ in the $i$-th row of $M$ is good if either (1) $b$ is 1-clean and $f_{ij} = 1$ or (2) $b$ is 0-clean; otherwise $b$ is a bad block. Say that $(M, F)$ is $k$-mixed if there are at most $k$ bad $s$-blocks for each $s = 0, \ldots, t$. Note that the definitions of 1-conservative and $k$-mixed are with respect to the parameter $t$. When appropriate, we make this dependence explicit by saying, e.g., $k$-mixed with respect to $t$ subdivisions.
Let $(M, F)$ be as above: $M$ is an $l \times m$ matrix and $F$ is an $l \times u$ matrix, where $u = \sum_{s=0}^{t} 2^s = 2^{t+1} - 1$ for a parameter $t \le \log_2 m$. Stacking $(M, F)$ yields the pair of matrices $(\tilde{M}, \tilde{F})$, where $\tilde{M}$ is a $2l \times (m/2)$ matrix and $\tilde{F}$ is a $2l \times ((u-1)/2)$ matrix obtained as follows. Split each row of $M$ into the first half and the last half. Stack the $2l$ halves thus obtained on top of one another, and obtain a $2l \times (m/2)$ matrix $\tilde{M}$. In other words, the first half and the last half of the $i$-th row of $M$ are respectively the $(2i-1)$-th row and the $2i$-th row of $\tilde{M}$. As for $\tilde{F}$: Throw away the first column of $F$, which corresponds to the 0-blocks of $M$; the 0-blocks have been thrown away by stacking $M$ into $\tilde{M}$. Put the second column of $F$ on top of the third column; put the 4th and the 5th columns on top of the 6th and the 7th columns respectively; in general put the $(2^s + r)$-th column on top of the $(2^s + 2^{s-1} + r)$-th column ($0 \le r < 2^{s-1}$, $1 \le s \le t$), and obtain $\tilde{F}$.
Lemma 2. Let $x_1, \ldots, x_n$ be a $k$-tonic 0/1-sequence of length $n = lm$. Let $M$ be the $l \times m$ matrix having the $x_i$'s in row-major form: e.g., the first row is $x_1, \ldots, x_m$. Consider the $s$-blocks of $M$ for $0 \le s \le t$. Let $F = (f_{ij})$ be the $l \times u$ 0/1-matrix such that $f_{ij}$ = AND of all $x_r$'s in the $j$-th block of the $i$-th row. Then, $(M, F)$ is 1-conservative and $k$-mixed with respect to $t$ subdivisions.
Proof. By definition of the $f_{ij}$'s, the pair $(M, F)$ is 1-conservative; furthermore, clearly there is no 1-clean block with the corresponding flag bit $f_{ij}$ being 0: In the setting above a block $b$ is bad iff it is dirty. A 0-1 change in the sequence $x_1, \ldots, x_n$ produces at most one dirty $s$-block for each $s = 0, \ldots, t$. The lemma follows.
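The stacking operation amounts to an index rearrangement, which the following Python sketch makes explicit. It assumes that "on top of" is applied row by row, so that the flags of the first half of row $i$ land in row $2i-1$ of the stacked flag matrix and those of the last half in row $2i$, matching the stacked input matrix. The function name is hypothetical and indices are 0-based.

```python
def stack(M, F, t):
    """Stack (M, F): M is l x m and F is l x (2**(t+1) - 1); both results have 2l rows."""
    half = len(M[0]) // 2
    M_new, F_new = [], []
    for row, flags in zip(M, F):
        first, last = [], []
        for s in range(1, t + 1):
            start = (1 << s) - 1           # 0-based index of the first s-block flag
            mid = start + (1 << (s - 1))   # s-blocks before mid lie in the first half
            end = start + (1 << s)
            first += flags[start:mid]
            last += flags[mid:end]
        M_new += [row[:half], row[half:]]
        F_new += [first, last]
    return M_new, F_new
```

The flag of the 0-block (index 0) is dropped, and each new row keeps $2^t - 1 = (u-1)/2$ flags, ordered as the blocks of the half-row with respect to $t - 1$ subdivisions.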
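The counting step of the proof can be checked empirically. The Python sketch below uses the fact that, in row-major layout, the $s$-blocks are exactly the consecutive segments of $x$ of length $m/2^s$; it counts dirty $s$-blocks and compares the count with the number of 0-1 changes. The helper names are hypothetical, and the random test is only a sanity check, not a proof.

```python
import random


def changes(x):
    """Number of positions i with x[i] != x[i+1]; x is k-tonic iff this is <= k."""
    return sum(a != b for a, b in zip(x, x[1:]))


def dirty_blocks(x, l, s):
    """Count dirty s-blocks when x fills an l x m matrix in row-major order."""
    size = len(x) // (l * (1 << s))
    blocks = [x[i:i + size] for i in range(0, len(x), size)]
    return sum(1 for b in blocks if 0 < sum(b) < len(b))


def lemma2_holds(x, l, t):
    """Check that there are at most k dirty s-blocks for every s = 0..t."""
    k = changes(x)
    return all(dirty_blocks(x, l, s) <= k for s in range(t + 1))


def random_check(trials=200, n=32, l=4, t=3, seed=0):
    """Run lemma2_holds on random 0/1 sequences of length n."""
    rng = random.Random(seed)
    return all(
        lemma2_holds([rng.randint(0, 1) for _ in range(n)], l, t)
        for _ in range(trials)
    )
```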
Lemma 3 Now consider the 1-blocks of $M$, i.e., the first halves and the last halves of the rows. Assume that there are $k_1$ bad 1-blocks among the first halves and $k_2$ bad 1-blocks among the last halves, where $k_1 + k_2 \le k$. We can apply the above argument for 0-blocks separately for the first halves and for the last halves. In each of the two cases, after column-sorting there are as many good 1-blocks as before. Hence the assertion holds for 1-blocks. For the $s$-blocks, we can argue similarly, separately considering $2^s$ groups of $s$-blocks. This completes our proof that $(M', F')$ is $k$-mixed.
Finally, stacking does not destroy being 1-conservative nor does it introduce any new bad block.
In our circuit stacking simply corresponds to rearranging the ordering of the intermediate gates; it does not need any gate. This completes our proof of the claimed depth reduction.
reducing the number of NOTs
Consider, as in Section 3.1, a $4k$-row matrix $M$ consisting of a $k$-tonic 0/1-sequence, and the matrix $M_0$ obtained from $M$ by sorting each column. Discard $k$ rows using one NOT gate. Now, instead of again discarding as explained in Section 3.1, consider processing the remaining $6k$ 1-blocks; i.e., halve the $3k$ rows, stack the $6k$ halves, and consider the resulting $6k$-row matrix. At most $k$ rows are bad. Discard $2k$ rows. (Actually we can discard $(6k - k)/2 = (5/2)k$ rows; this does not yield an asymptotic improvement.)
Further halve and stack to obtain $2(6k - 2k) = 8k$ rows, discard $3k$ rows, obtain $2(8k - 3k) = 10k$ rows, discard $4k$ rows, and so on. In general, at iteration $s$, discard $sk$ rows out of $(2s + 2)k$ rows; halve and stack to obtain $((2s + 2)k - sk) \times 2 = 2sk + 4k = (2(s + 1) + 2)k$ rows. Thus with $t$ NOT gates we reduce the problem as follows. We assume that $t \ge 2$.
$n' = n \cdot \frac{3}{4} \cdot \frac{4}{6} \cdot \frac{5}{8} \cdots \frac{t+2}{2t+2} = n \cdot \frac{t+2}{2^{t+1}} \le n \cdot \frac{t}{2^t}$,
where we have the second equality since each denominator is twice the previous numerator. Thus we can reduce the size to a $1/\log_2 n$ fraction using $t$ NOT gates with $t$ satisfying $2^t/t \ge \log_2 n$, and hence with $t = \log_2\log_2 n + \log_2\log_2\log_2 n + O(1)$. In the scheme above, the column size in column-sorting increases, and we use AKS $k'$-sorting networks for increasing $k'$. This increases the depth of subcircuit 1, but we can easily see that the asymptotic depth does not change. This completes the description of our $k$-tonic inverter as claimed in Theorem 1, and thus the proof of Theorem 1.
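The size reduction can be checked with exact arithmetic. The Python sketch below multiplies the per-iteration fractions (rows kept)/(rows before discarding) $= (s+2)/(2s+2)$ derived from the rule "discard $sk$ rows out of $(2s+2)k$ rows", and searches for the smallest $t$ that brings the size down to at most $n/\log_2 n$. The function names are hypothetical, and the closed form $(t+2)/2^{t+1}$ asserted in the comment is what the telescoping product evaluates to.

```python
from fractions import Fraction
from math import log2


def remaining_fraction(t):
    """Fraction of the input left after t discards; equals (t + 2) / 2**(t + 1)."""
    frac = Fraction(1)
    for s in range(1, t + 1):
        frac *= Fraction(s + 2, 2 * s + 2)
    return frac


def nots_needed(n):
    """Smallest t with remaining_fraction(t) <= 1 / log2(n)."""
    t = 1
    while remaining_fraction(t) * log2(n) > 1:
        t += 1
    return t
```

For instance, for $n = 2^{16}$ the search stops at $t = 6$, which agrees with $\log_2\log_2 n + \log_2\log_2\log_2 n = 4 + 2$.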
negation-limited k-tonic sorter
We can obtain the $k$-tonic sorter of Theorem 2 as follows. Use exactly the same design as above to reduce the problem size to $n' = n(t/2^t)$ with $t$ NOT gates. Then, instead of the Beals-Nishino-Tanaka inverter, use the AKS $n'$-sorting network. Apply the shifter for sorted outputs described in Section 2.3.
Open Problems
The following question by Turán remains open: Is the size of any depth-O(log n) inverter with O(log n) NOT gates superlinear?
For $k = O(1)$, can we reduce the number of NOT gates in a $k$-tonic inverter to $\log_2 n + O(1)$ while maintaining size $O(n)$ and depth $O(\log n)$? This question may be of interest somewhat beyond its minor technical appearance: in one way it asks whether we can utilize each NOT gate to exactly halve the problem size.
