Abstract. We introduce a family of graphs C(n, i, s, a) that generalizes the binary search tree. The graphs represent logic circuits with fan-in i, restricted fan-out s, and arising by n progressive additions of random gates to a starting circuit of a isolated nodes. We show via martingales that a suitably normalized version of the number of terminal nodes in binary circuits converges in distribution to a normal random variate.
Introduction
The well-known binary search tree has many applications as a data structure (see Mahmoud (1992)), as a model underlying searching and sorting applications (see Knuth (1998) and Mahmoud (2000)), and as a model for formal languages and computer algebra (see Kemp (1984)). We introduce a family of acyclic directed graphs C(n, i, s, a) that generalizes the binary search tree.
The graphs represent logic circuits with indegree (fan-in) i and restricted outdegree (fan-out) s, arising by n progressive additions of random gates to a starting circuit of a isolated nodes. The initial a nodes represent the initial input lines. The model has applications in neurosciences (see Valiant (1994)) and electrosciences (see Hutton, Rose, Grossman and Corneil (1998)).
The family of graphs C(n, i, s, a) we introduce is a hierarchy of combinatorial structures that includes the binary search tree. We shall refer to a circuit with fan-in i and restricted fan-out s as the i-ary circuit with fan-out s. For specified parameters i, s, a, we refer to the structure simply as the circuit. Nodes with outdegree s are considered saturated. At each stage, i insertion positions are chosen from unsaturated nodes as the parents of a new child. The outdegree of each of these nodes increases by the number of insertion positions taken from it. In the circuit interpretation, i output lines of previous stages are taken as input lines into a new i-ary gate, which can have up to s output lines drawn from it. The consumed lines are no longer viable inputs at later stages.
Let $L_{n,i,s,a}$ denote the number of terminal nodes in the circuit having a initial input nodes. The main result of this paper is the convergence in distribution of a suitably normed version of $L_{n,i,s,a}$ to a normal variate for binary circuits. The algebra gets quite unwieldy if we keep all the parameters. For clarity of exposition, we shall illustrate the result on the subfamily C(n, 2, 3, 1) of binary fan-in, fan-out 3, and growing out of a single node. Throughout all insertion stages, the underlying undirected graph remains one connected component; the fan-out is restricted to 3. We shall return at the end to the general binary case, state the adjustments needed in the proof to get a general result for the family C(n, 2, s, a), and make some conjectures about what to expect in the class C(n, i, s, a).
Throughout, we shall use the following notation. We shall denote the normally distributed random variate with mean 0 and variance $\sigma^2$ by $N(0, \sigma^2)$. We shall use the symbols $\xrightarrow{D}$ and $\xrightarrow{P}$ for convergence in distribution and in probability, respectively. The notation $O_{L_1}(g(n))$ will stand for a random variable that is $O(g(n))$ in the $L_1$ norm.
Let the notation $L_n$ be reserved for $L_{n,2,3,1}$. One main result of this investigation is the central limit tendency:
and its extension to a similar central limit theorem for $L_{n,2,s,a}$. Section 2 gives a precise definition of the circuits. In Section 3 we derive the exact first two moments of the number of terminal nodes for C(n, 2, 3, 1). The central limit theorem for C(n, 2, 3, 1) is derived in Section 4 via a martingale transform. The scope of the result and proof technique is extended in Section 5 to the binary circuits family C(n, 2, s, a), where a conjecture for C(n, i, s, a) is presented, too.
The growth of a random circuit
An i-ary circuit with fan-out s is a directed graph that starts out with a isolated nodes of indegree and outdegree 0. Each node can have up to s positional children (output lines in the language of circuits), with a distinguished left-to-right order. A childless node is a terminal node. A nonterminal node with k children has s − k positional insertion places, which are the positions not taken by its positional children in the left-to-right order. The circuit evolves in stages as follows. After n − 1 stages, a circuit C(n − 1, i, s, a) has grown. At the nth stage, i positional insertion places from unsaturated nodes are chosen from C(n − 1, i, s, a) as parents for a new node (a gate, in the language of circuits). The new node is adjoined to the circuit with edges directed from the i positional insertion places to it, and is given outdegree 0.
With regard to the interpretation of C(n, i, s, a) as a circuit, the number of terminal nodes is a parameter of interest. In Boolean circuits it stands for how many "answers" can be derived from the a given inputs. The problem is trivial for s = 1. The case i = 1 is not very challenging. The least i that makes the problem significant is 2. The case C(n, 2, 2, a) is not too interesting, as the number of terminal nodes remains finite and at most a throughout all the stages. The smallest significant member of the family C(n, 2, s, a) for the study is C(n, 2, 3, a). For simplicity of exposition we take a = 1 to reduce the number of parameters. We shall focus on the binary circuit (fan-in i = 2) with fan-out 3, growing out of a single input. We develop results for C(n, 2, 3, 1) in Sections 3 and 4. In Section 5 we shall sketch the extension of these results to cover the entire C(n, 2, s, a) family.
The graphs C(n, i, s, a) generalize binary search trees (which are C(n, 1, 2, 1)). This runs in a parallel vein to the generalization of recursive trees into a hierarchy of random graphs called recursive circuits (Tsukiji and Mahmoud (2001)). For a broad review of the definition and uses of recursive trees, the reader is referred to the survey in Smythe and Mahmoud (1996).
For modeling and analysis purposes, the random graphs C(n, i, s, a) are extended by supplying each node with a sufficient number of special nodes (called external nodes) to universally make the outdegree of all the circuit nodes (now viewed as internal) equal to s. Figure 1 shows one possible extended binary circuit with fan-out 3, after two insertion steps into an initial graph of one node. The internal nodes are shown as bullets and the external nodes are shown as squares.
Many models of randomness can be imposed on i-ary circuits with fan-out s. A natural probability model is one in which all pairs of external nodes in C(n − 1, i, s, a) are equally likely candidate inputs for the nth entrant. Note that under this model, the various circuits of one size are not equally likely.
Terminal nodes in C(n, 2, 3, 1)
For the rest of this section, we focus on the circuit C(n, 2, 3, 1), and $L_n$ is to be understood as the number of terminal nodes $L_{n,2,3,1}$ in this particular random graph. To keep track of terminal nodes, we monitor the number of external nodes joined to terminal nodes. We do this by a color code. These external nodes are colored white (W). The rest of the external nodes are colored blue (B). In Figure 1 white external nodes are shown as blank squares, and blue external nodes are shown as crossed squares.
Let $W_n$ and $B_n$ be, respectively, the numbers of white and blue external nodes in the extended circuit after n nodes have been added.
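The growth rule and the color bookkeeping can be sketched in a few lines of Python. This is only an illustrative simulation, assuming the uniform-pairs model described above; the function name `simulate_binary_circuit` and its parameters are hypothetical labels introduced for the sketch, not notation from the text.

```python
import random

def simulate_binary_circuit(n, s=3, a=1, seed=None):
    """Grow a binary circuit C(n, 2, s, a); return [(W_k, B_k) for k = 0..n].

    Assumption: each new gate takes a uniformly random pair of distinct
    external nodes as its two inputs (requires s >= 2).
    """
    rng = random.Random(seed)
    free = [s] * a                                    # free[v]: external nodes still attached to node v
    slots = [v for v in range(a) for _ in range(s)]   # one entry per external node (its owner's id)
    white = s * a                                     # external nodes attached to terminal nodes
    history = [(white, len(slots) - white)]
    for _ in range(n):
        for _ in range(2):                            # draw two external nodes without replacement
            j = rng.randrange(len(slots))
            slots[j], slots[-1] = slots[-1], slots[j]
            owner = slots.pop()
            if free[owner] == s:                      # owner was terminal: none of its externals stay white
                white -= s
            free[owner] -= 1
        free.append(s)                                # the new gate is terminal, with s white external nodes
        slots.extend([len(free) - 1] * s)
        white += s
        history.append((white, len(slots) - white))
    return history

history = simulate_binary_circuit(2000, seed=1)
w_n, b_n = history[-1]
print(w_n / 2000, w_n + b_n)
```

The number of terminal nodes after k insertions is `history[k][0] // s`, since every terminal node carries exactly s white external nodes.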
Choosing two external nodes as inputs for the nth insertion from the nodes of C(n − 1, 2, 3, 1) can occur in one of four ways:
- The two external nodes are children of the same terminal node (both white) in C(n − 1, 2, 3, 1). Let $\mathbf{1}^{(n)}_{WW,\mathrm{same}}$ be the indicator of this event. In this case, two external nodes of C(n − 1, 2, 3, 1) are taken as inputs, and a new terminal node appears in C(n, 2, 3, 1). Their sibling is no longer white; it turns blue. The new terminal node acquires 3 white external nodes, for a net gain of 0 white external nodes.
The change in the number of white external nodes can be written conditionally as
If we let $\mathcal{F}_n$ be the sigma field generated by the first n steps, we have the conditional expectation
(1)
We have a steady rate of increase in the number of external nodes. We always consume two external nodes to produce three new ones (a net gain of one external node per addition); after n insertions in the circuit there are n + 3 external nodes, and the numbers of white and blue external nodes are tied together via the total count
$$W_n + B_n = n + 3.$$
According to the definition of the indicators, we have
Plugging (2)-(4) into the conditional equation (1), we see that (fortunately) quadratic terms in $W_{n-1}$ disappear, giving the manageable recurrence
Taking expectations, we have an unconditional recurrence:
with solution
Likewise, we can work toward the second moment. We start with (1) in the squared form
Plugging in (2)-(4), and taking expectations, we have an unconditional recurrence:
Solving the equation, then subtracting the square of the mean, we get the variance 
The growth rates of the mean and variance give a concentration law. It is immediate from (6) and (9), by Chebyshev's inequality, that
$$\frac{W_n}{n} \xrightarrow{P} \frac{3}{7}.$$
We can therefore represent $W_n$ as $\frac{3}{7}n + o_P(n)$. In the sequel we shall need a slightly sharper representation. We shall need to sum up the lower-order terms. When appropriately scaled, these lower-order terms leave behind $o_P(1)$ terms. Generally speaking, the sum of $o_P(1)$ terms is not guaranteed to remain $o_P(n)$, but in this case it does because we have the stronger convergence in $L_1$.
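The constant 3/7 can be traced back to the growth rule. The following is a sketch of that computation as we reconstruct it from the uniform-pairs model of Section 2; the indicator names and the intermediate steps are our reconstruction and may differ in form from the displayed equations (1)-(6). Only two kinds of insertion change the white count: two blue inputs add 3 white external nodes, and two white inputs from different terminal nodes remove 3 (six turn non-white, three are created). The other two cases, including the same-terminal-node case above, have net change 0.

```latex
\begin{align*}
W_n &= W_{n-1} + 3\,\mathbf{1}^{(n)}_{BB} - 3\,\mathbf{1}^{(n)}_{WW,\mathrm{diff}},\\
\mathbb{E}\bigl[\mathbf{1}^{(n)}_{BB}\mid\mathcal{F}_{n-1}\bigr]
  &= \frac{B_{n-1}(B_{n-1}-1)}{(n+1)(n+2)},\qquad
\mathbb{E}\bigl[\mathbf{1}^{(n)}_{WW,\mathrm{diff}}\mid\mathcal{F}_{n-1}\bigr]
  = \frac{W_{n-1}(W_{n-1}-3)}{(n+1)(n+2)},\\
\mathbb{E}[W_n\mid\mathcal{F}_{n-1}]
  &= \frac{(n-1)(n-2)}{(n+1)(n+2)}\,W_{n-1} + 3
  \qquad\text{(substituting } B_{n-1} = n+2-W_{n-1}\text{)}.
\end{align*}
% Linear ansatz E[W_n] ~ alpha * n in the unconditional recurrence:
\begin{align*}
\alpha n = \Bigl(1-\frac{6}{n}+O(n^{-2})\Bigr)\,\alpha(n-1) + 3
\;\Longrightarrow\; 7\alpha = 3
\;\Longrightarrow\; \alpha = \tfrac{3}{7}.
\end{align*}
```

The quadratic terms $W_{n-1}^2$ from the two indicator probabilities cancel in the difference, which is why the conditional recurrence is linear in $W_{n-1}$.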
Lemma 1 W
Proof. We have from (6) and (9)
By the Cauchy-Schwarz inequality, we have
Obviously, $W_n + \frac{3}{7}n$ is $O(n)$. So, by (10) the latter inequality gives
The lemma follows.
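The linear growth of the mean and variance can also be checked numerically. Under the uniform-pairs model the white count changes only by +3 (two blue inputs), by −3 (two white inputs from different terminal nodes), or by 0, with probabilities that depend only on $W_{n-1}$ and n, so its exact distribution can be propagated step by step. The sketch below does this in exact rational arithmetic; the transition probabilities are our reconstruction from the model, and the function names are hypothetical.

```python
from fractions import Fraction

def white_count_distributions(n_max):
    """Return dists with dists[n][w] = P(W_n = w) in C(n, 2, 3, 1), as exact fractions."""
    dists = [{3: Fraction(1)}]                      # W_0 = 3: the single input node is terminal
    for n in range(1, n_max + 1):
        pairs = (n + 1) * (n + 2) // 2              # pairs among the n + 2 external nodes
        nxt = {}
        for w, p in dists[-1].items():
            b = n + 2 - w                           # blue external nodes before the n-th insertion
            p_up = Fraction(b * (b - 1) // 2, pairs)       # both inputs blue: W gains 3
            p_down = Fraction(w * (w - 3) // 2, pairs)     # white inputs from two terminal nodes: W loses 3
            for w_new, q in ((w + 3, p_up), (w - 3, p_down), (w, 1 - p_up - p_down)):
                if q:
                    nxt[w_new] = nxt.get(w_new, 0) + p * q
        dists.append(nxt)
    return dists

def mean_and_variance(dist):
    """Exact mean and variance of a finite distribution given as {value: probability}."""
    m1 = sum(w * p for w, p in dist.items())
    m2 = sum(w * w * p for w, p in dist.items())
    return m1, m2 - m1 * m1

dists = white_count_distributions(120)
for n in (30, 60, 120):
    m, v = mean_and_variance(dists[n])
    print(n, float(m / n), float(v / n))
```

If the reconstruction is right, the printed ratios should settle near constants as n grows, the first of them near 3/7.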
Central limits
The recurrence (5) can be "martingalized": appropriate factors $b_n$ and $c_n$ can be chosen so that $b_n W_n + c_n$ is a martingale. We develop this useful lead next. We let $M_n = b_n W_n + c_n$ and seek $b_n$ and $c_n$ so as to satisfy
Lemma 2 The random variable
is a martingale with respect to the sigma fields $\mathcal{F}_n$.
Proof. Let $M_n = b_n W_n + c_n$, for yet-to-be-determined constants $b_n$ and $c_n$ that render $M_n$ a martingale sequence with respect to the sigma fields $\mathcal{F}_n$. These constants must then satisfy
Using the recurrence (5), we obtain
for every n ≥ 1. This is possible if
which unwinds to
for an arbitrary constant $b_2$, and we can take $b_1 = 0$. Equating the free terms in (11), we get
We also want $\mathbb{E}[M_1] = 0$, requiring that $c_1 = 0$. Hence, for arbitrary $b_2$,
is a martingale.
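For concreteness, here is a sketch of how the constants work out if the conditional recurrence takes the form $\mathbb{E}[W_n \mid \mathcal{F}_{n-1}] = \frac{(n-1)(n-2)}{(n+1)(n+2)} W_{n-1} + 3$, which is what the uniform-pairs model gives by our own computation. This is a reconstruction, not a quotation of the displayed formulas, and the normalization $b_2$ is left free as in the proof.

```latex
\begin{align*}
b_n\,\frac{(n-1)(n-2)}{(n+1)(n+2)} &= b_{n-1}, \qquad n \ge 2,\\
b_n &= b_2 \prod_{k=3}^{n} \frac{(k+1)(k+2)}{(k-1)(k-2)}
     = b_2\,\frac{(n-1)\,n^2\,(n+1)^2\,(n+2)}{144},\\
c_n &= c_{n-1} - 3\,b_n, \quad c_1 = 0
  \;\Longrightarrow\; c_n = -3\sum_{k=2}^{n} b_k .
\end{align*}
```

The closed form vanishes at n = 1, in agreement with the choice $b_1 = 0$, and it grows like $n^6$. Differences of $M_k$ are then of order $k^6$, so their squares sum to order $n^{13}$.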
The main result of this section is presented next.
Theorem 1 Let $L_n$ be the number of terminal nodes in a random C(n, 2, 3, 1) circuit after the insertion of n gates. The terminal nodes follow the asymptotic central limit law:
Proof. To study the number of terminal nodes, we can look at the white external nodes, for $L_n = \frac{1}{3} W_n$. Further, let $M_k$ be the martingale in Lemma 2, and let $\nabla$ be the backward difference operator; that is, for a function h, $\nabla h(n) = h(n) − h(n − 1)$. For any constant factor $A_n$, $A_n \nabla M_k$ is a martingale difference sequence (with respect to the increasing sigma-field sequence $\mathcal{F}_k$). The factor $A_n = n^{-13/2}$ suits our purpose. We verify the martingale central limit theorem for the martingale differences $n^{-13/2} \nabla M_k$. It suffices to check the conditional Lindeberg condition and the conditional variance condition on the martingale differences (see Hall and Heyde (1980)).
The conditional Lindeberg condition requires that, for all ε > 0,
We have
We also have
Therefore, all the sets $\{|\nabla M_k|^2 > \varepsilon^2 n^{13}\}$ are empty for large n and every k ≤ n. Deterministically, $U_n \to 0$; the conditional Lindeberg condition has been verified. A Z-conditional variance condition requires that
For this calculation, we note that
and subsequently we have
Substituting (5) and (8) and invoking Lemma 1, one derives 
Extending the results to C(n, 2, s, a)
In this section we generalize the results presented in Sections 3 and 4 for the circuit C(n, 2, 3, 1) to C(n, 2, s, a). All the definitions can be extended in a natural way. We still color the external nodes joined to terminal nodes white and call their number after n stages $W_n$, color the rest of the external nodes blue and call their number after n stages $B_n$, and so on. Starting with sa external nodes, the circuit receives s − 2 external nodes per addition, so that after n insertions
$$W_n + B_n = sa + n(s - 2).$$
By tracking the changes of external node colors according to the coding scheme (the argument here parallels that of Section 3), equations (1) and (7) are extended as
where the conditional expectations of the indicators are
After taking expectations we have
Recurrence equations of this type can be solved asymptotically by induction. For example, one can show that
for suitably chosen constants $k_1$ and $k_2$, and an asymptotic representation follows. One gets
It seems that the results can be extended, with quite a bit of extra effort, to the family of random graphs C(n, i, s, a), where one still gets linear means and variances and a central limit theorem for the number of terminal nodes. For instance, in the case i = 3, cubic terms $W_{n-1}^3$ appear in the conditional expectation of $W_n$, arising from terms like
However, both cubic and quadratic terms cancel out upon unconditioning, just like in the binary circuits.
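Returning to the binary family C(n, 2, s, a), the first-moment computation can be sketched for general s as follows. This is our own back-of-the-envelope reconstruction under the uniform-pairs model, not a substitute for the displayed equations of this section. The symbol $T_{n-1}$, the total number of external nodes before the nth insertion, is introduced here only for convenience.

```latex
\begin{align*}
T_{n-1} &= sa + (n-1)(s-2),\\
\mathbb{E}[W_n \mid \mathcal{F}_{n-1}]
 &= W_{n-1} + s\,\frac{B_{n-1}(B_{n-1}-1) - W_{n-1}(W_{n-1}-s)}{T_{n-1}(T_{n-1}-1)}\\
 &= \Bigl(1 - \frac{s\,(2T_{n-1}-1-s)}{T_{n-1}(T_{n-1}-1)}\Bigr) W_{n-1} + s
 \qquad\text{(substituting } B_{n-1} = T_{n-1} - W_{n-1}\text{)}.
\end{align*}
% For s > 2, the linear ansatz E[W_n] ~ alpha * n gives
\begin{align*}
\alpha\Bigl(1 + \frac{2s}{s-2}\Bigr) = s
\;\Longrightarrow\;
\alpha = \frac{s(s-2)}{3s-2},
\qquad
\frac{\mathbb{E}[L_{n,2,s,a}]}{n} \to \frac{s-2}{3s-2}.
\end{align*}
```

For s = 3 and a = 1 this reduces to the constant 3/7 for the white count of Section 3, and for s = 2 the leading constant is 0, consistent with the remark in Section 2 that the number of terminal nodes of C(n, 2, 2, a) stays bounded.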
