Introduction
In this paper a network will be considered an acyclic graph having several input nodes (inputs) and some (at least one) output nodes (outputs). The nodes are characterised by fan-in (the number of incoming edges, denoted by ∆) and fan-out (the number of outgoing edges), while the network has a certain size (the number of nodes) and depth (the number of edges on the longest input-to-output path). If a synaptic weight is associated with each edge, and each node computes the weighted sum of its inputs to which a nonlinear activation function is then applied (artificial neuron), the network is a neural network (NN):

$$ f(x_0, \ldots, x_{n-1}) = \sigma\left( \sum_{i=0}^{n-1} w_i x_i - \theta \right) \qquad (1) $$
with $w_i \in \mathbb{R}$ the synaptic weights, $\theta \in \mathbb{R}$ known as the threshold, and $\sigma$ a non-linear activation function. If the non-linear activation function is the threshold (step) function, the neurons are threshold gates (TGs) and the network is just a threshold gate circuit (TGC) computing a Boolean function (BF). The cost functions associated with an NN are depth and size. These are linked to the delay T ≈ depth and the area A ≈ size of a VLSI chip. Unfortunately, NNs do not closely follow these proportionalities, as:
(i) the area of the connections counts [2, 3, 9]; and (ii) the area of one neuron is related to its associated weights. That is why the size and depth complexity measures are not the best criteria for ranking different solutions when going to silicon [11]. Several authors have taken into account the fan-in [1, 9, 10, 12], the total number of connections, the total number of bits needed to represent the weights [8, 15], or even more precise approximations such as the sum of all the weights and thresholds [2] [3] [4] [5] [6] [7]:

$$ A \approx \sum_{\text{all neurons}} \left( \sum_{i=0}^{\Delta-1} |w_i| + |\theta| \right) \qquad (2) $$
An alternative definition of 'complexity' for an NN is $\sum_{i=0}^{n-1} w_i^2$ [16]. It is worth mentioning that there are also several sharp limitations for VLSI implementations, such as: (i) the maximal value of the fan-in cannot grow beyond a certain limit; and (ii) the ratio between the largest and the smallest weight is bounded. For simplicity, in the following we shall consider only NNs having n binary inputs and k binary outputs. If real inputs and outputs are needed, it is always possible to quantize them to a certain number of bits so as to achieve a desired precision. The fan-in of a gate will be denoted by ∆, and all logarithms are taken to base 2 unless mentioned otherwise. Section 2 will present previous results for which proofs have already been given [2] [3] [4] [5] [6] [7]. In Section 3 we shall prove our main claim, while also showing several simulation results.
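As an illustration of these definitions, the following Python sketch evaluates a single threshold gate and computes the two cost measures discussed above: the area estimate of equation (2) (sum of absolute weights and thresholds) and the squared-weight 'complexity' of [16]. The helper names are hypothetical, and the step convention (output 1 when $\sum w_i x_i - \theta \ge 0$) is an assumption on our part.

```python
from typing import Sequence

def threshold_gate(weights: Sequence[int], theta: int, x: Sequence[int]) -> int:
    """Step activation applied to sum(w_i * x_i) - theta (assumed convention)."""
    s = sum(w * xi for w, xi in zip(weights, x))
    return 1 if s - theta >= 0 else 0

def area_estimate(gates: Sequence[tuple[Sequence[int], int]]) -> int:
    """Area proxy in the spirit of equation (2): sum over gates of sum|w_i| + |theta|."""
    return sum(sum(abs(w) for w in ws) + abs(theta) for ws, theta in gates)

def complexity_sq(weights: Sequence[int]) -> int:
    """Squared-weight 'complexity' of a single neuron: sum of w_i**2."""
    return sum(w * w for w in weights)

# Example: 3-input MAJORITY as one TG (all weights 1, threshold 2).
maj_w, maj_theta = [1, 1, 1], 2
print(threshold_gate(maj_w, maj_theta, [1, 0, 1]))  # two of three inputs on
print(area_estimate([(maj_w, maj_theta)]))
print(complexity_sq(maj_w))
```

The two measures can rank circuits differently, since large weights are penalised linearly by the area proxy but quadratically by the squared-weight measure.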
Background
A novel synthesis algorithm, derived from the decomposition of COMPARISON, has recently been proposed. We have been able to prove that [2, 3]:
One restriction is that the input variables are pair-dependent, meaning that the ∆ input variables can be grouped into ∆/2 pairs of two input variables each, $(g_{\Delta/2-1}, e_{\Delta/2-1}), \ldots, (g_0, e_0)$, and that in each such pair one variable is 'dominant' (i.e., when the dominant variable is 1, the other variable forming the pair will also be 1):
Each $f_\Delta$ can be built starting from the previous one, $f_{\Delta-2}$ (having a lower fan-in), by copying its synaptic weights; the constructive proof has led to [5]:
Proposition 2 The COMPARISON of two n-bit numbers can be computed by a ∆-ary tree neural network with polynomially bounded integer weights and thresholds ($\le n^k$) having size $O(n/\Delta)$ and depth $O(\log n / \log\Delta)$.
For a closer estimate of the area we have used equation (2) and proved [5] :
Proposition 3 The neural network with polynomially bounded integer weights (and thresholds) computing the COMPARISON of two n-bit numbers occupies an area of $O(n \cdot 2^{\Delta/2} / \Delta)$ for all the values of the fan-in (∆) in the range 3 to $O(\log n)$.
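A minimal sketch of a ∆-ary tree for COMPARISON, under our own assumptions: each bit position yields a pair $(g_i, e_i)$ with $g_i = [x_i > y_i]$ (the dominant variable) and $e_i = [x_i \ge y_i]$, and every tree node receives ∆/2 such pairs and emits one pair for the block of bits it covers. Each node uses two TGs in which both $g_i$ and $e_i$ get weight $2^i$, with thresholds $2^{\Delta/2}$ and $2^{\Delta/2} - 1$. This weight assignment is one we chose for illustration (it keeps weights and thresholds within $2^{\Delta/2}$, in line with the factor in Proposition 3); it is not necessarily the construction of [5]. The function names are hypothetical, only even fan-ins ≥ 4 are handled, and the reported depth counts tree levels only (not the gates producing the per-bit pairs).

```python
from itertools import product

def bit_pairs(x: int, y: int, n: int) -> list[tuple[int, int]]:
    """Per-bit pairs (g_i, e_i), least significant first.

    g_i = [x_i > y_i] is dominant: g_i = 1 implies e_i = [x_i >= y_i] = 1.
    """
    pairs = []
    for i in range(n):
        xi, yi = (x >> i) & 1, (y >> i) & 1
        pairs.append((int(xi > yi), int(xi >= yi)))
    return pairs

def tg_node(pairs: list[tuple[int, int]]) -> tuple[int, int]:
    """Two TGs over up to Delta/2 dominant pairs (least significant first).

    Both members of pair i get weight 2**i; the 'greater' gate uses
    threshold 2**k and the 'greater-or-equal' gate threshold 2**k - 1.
    """
    k = len(pairs)
    s = sum((g + e) << i for i, (g, e) in enumerate(pairs))
    return int(s >= (1 << k)), int(s >= (1 << k) - 1)

def compare_tree(x: int, y: int, n: int, fan_in: int) -> tuple[int, int, int]:
    """Return ([X > Y], [X >= Y], number of tree levels) for n-bit X, Y."""
    if fan_in < 4 or fan_in % 2:
        raise ValueError("this sketch assumes an even fan-in >= 4")
    k = fan_in // 2  # pairs per node
    level = bit_pairs(x, y, n)
    depth = 0
    while len(level) > 1:
        level = [tg_node(level[j:j + k]) for j in range(0, len(level), k)]
        depth += 1
    g, e = level[0]
    return g, e, depth

def check(n: int, fan_in: int) -> bool:
    """Exhaustively compare the tree with Python's integer comparison."""
    return all(
        compare_tree(x, y, n, fan_in)[:2] == (int(x > y), int(x >= y))
        for x, y in product(range(1 << n), repeat=2)
    )

print(check(6, 4), compare_tree(5, 3, 8, 4))
```

The assignment works because, under dominance, $g_i + e_i - 1 \in \{-1, 0, 1\}$ and $2^i > \sum_{j<i} 2^j$, so the sign of $\sum_i 2^i (g_i + e_i - 1)$ is decided by the most significant pair that is not 'equal'. The same argument applies at every level, since each node's output pair is again dominant.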
The result presented there is: 
More precisely we have:
which leads to:
For $2m > 2^\Delta$ the equations are much more intricate, while the complexity values for area and for $AT^2$ are only reduced by a factor (equal to the fan-in [6, 7]). If we now suppose that a feedforward NN with n inputs and k outputs is described by m examples, it can be directly constructed as simultaneously implementing k different functions from $F_{n,m}$ [4, 6, 7]:
be computed by a neural network with polynomially bounded integer weights (and thresholds) having size $O(m(2n+k)/\Delta)$, depth $O(\log(mn)/\log\Delta)$ and occupying an area of $O(mn \cdot 2^\Delta/\Delta + mk)$ if $2m \le 2^\Delta$, for all the values of the fan-in (∆) in the range 3 to $O(\log n)$.
The architecture has a first layer of COMPARISONs, which can be implemented either using classical Boolean gates (BGs) or, as has been shown previously, by TGs. The desired function can be synthesised either by one more layer of TGs, or by a classical two-layer AND-OR structure: a second hidden layer of AND gates (one for each hypercube) and a third layer of k OR gates representing the outputs. To minimise the area, some COMPARISONs could be replaced by AND gates (as in a classical disjunctive normal form implementation).
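The COMPARISON/AND/OR layering described above can be sketched as follows. This is a hypothetical illustration, not the synthesis procedure of [4, 6, 7]: we assume each hypercube is given as an interval [lo, hi] of the n-bit input read as an unsigned integer, so that it is detected by two COMPARISONs (X ≥ lo and X ≤ hi) followed by an AND gate, and each of the k outputs is an OR over the hypercubes assigned to it. The COMPARISONs are modelled as plain integer comparisons rather than as gate networks, and all names are ours.

```python
from typing import Sequence

def geq(x: int, c: int) -> int:
    """Stand-in for a COMPARISON block computing [X >= C]."""
    return int(x >= c)

def leq(x: int, c: int) -> int:
    """Stand-in for a COMPARISON block computing [X <= C]."""
    return int(x <= c)

def layered_net(x: int,
                cubes: Sequence[tuple[int, int]],
                outputs: Sequence[Sequence[int]]) -> list[int]:
    """Evaluate COMPARISON -> AND -> OR layers for the input value x.

    cubes:   hypothetical (lo, hi) interval per hypercube
    outputs: for each output, indices of the hypercubes it ORs together
    """
    # Layer 1: two COMPARISONs per hypercube.
    comps = [(geq(x, lo), leq(x, hi)) for lo, hi in cubes]
    # Layer 2: one AND gate per hypercube.
    ands = [a & b for a, b in comps]
    # Layer 3: one OR gate per output.
    return [int(any(ands[j] for j in idx)) for idx in outputs]

# Example with n = 4 inputs, 3 hypercubes and k = 2 outputs.
cubes = [(2, 5), (9, 9), (12, 15)]
outputs = [[0, 2], [1]]
for x in (3, 7, 9, 13):
    print(x, layered_net(x, cubes, outputs))
```

Sharing the first layer across outputs mirrors the observation in the text that delay and area are dominated by the COMPARISON layer; the AND and OR layers add only one gate per hypercube and per output.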
Which is the VLSI-Optimal Fan-In?
To avoid complicating the proofs, we shall determine the VLSI-optimal fan-in when implementing COMPARISON (in fact an $F_{n,1}$ function), for which the solution was detailed in Propositions 1 to 3. The same result is valid for $F_{n,m}$ functions, as can be intuitively expected either by comparing equations (3) and (4), or because the delay is determined by the first layer of COMPARISONs, while the area is mostly influenced by the same first layer of COMPARISONs (the additional area for implementing the symmetric 'alternate addition' [4] can be neglected). For a better understanding, we have plotted equation (3) in Figure 1.
Proposition 6 The VLSI-optimal (i.e., $AT^2$-minimising) neural network which computes the COMPARISON of two n-bit numbers has small-constant fan-in 'neurons' with small-constant bounded weights and thresholds.
Proof:
Starting from the first part of equation (3), we can compute its derivative, which unfortunately involves transcendental functions of the variables in an essentially non-algebraic way. If we consider the simplified 'complexity' version of equation (3), we have:
which, when equated to zero, leads to $\ln\Delta\,(\Delta\ln 2 - 2) = 4$ (also a transcendental equation). This has ∆ = 6 as 'solution', and as the weights and the thresholds are bounded by $2^{\Delta/2}$, they are bounded by the small constant $2^3 = 8$.
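One way to see where this transcendental equation can come from, assuming the simplified estimate multiplies the area of Proposition 3 by the square of a depth of order $\log n / \log\Delta$ (our reading; the exact form of equation (3) may differ). Because the ratio $\log n / \log\Delta$ does not depend on the base of the logarithm, only the ∆-dependent factor matters; setting the derivative of its logarithm to zero and multiplying by $2\Delta\ln\Delta > 0$ gives the stated equation:

```latex
\begin{aligned}
AT^2 &\propto \frac{n\,2^{\Delta/2}}{\Delta}\left(\frac{\ln n}{\ln\Delta}\right)^{2}
      = n\ln^{2} n\cdot\frac{2^{\Delta/2}}{\Delta\ln^{2}\Delta},\\
\frac{d}{d\Delta}\ln\frac{2^{\Delta/2}}{\Delta\ln^{2}\Delta}
     &= \frac{\ln 2}{2}-\frac{1}{\Delta}-\frac{2}{\Delta\ln\Delta}=0
     \;\Longleftrightarrow\;
     \Delta\ln 2\,\ln\Delta-2\ln\Delta-4=0
     \;\Longleftrightarrow\;
     \ln\Delta\,(\Delta\ln 2-2)=4 .
\end{aligned}
```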
The proof relies on several successive approximations (neglecting ceilings and using a 'simplified' complexity estimate). That is why we present in Figure 2 exact plots of the $AT^2$ measure, which support our claim. It can be seen that the optimal fan-in 'constantly' lies between 6 and 9 (as $\Delta_{optim}$ = 6…9, one can minimise the area by using COMPARISONs only if the group of ones has a length of α ≥ 64; see [4] [5] [6] [7]). Some plots in Figure 2 also include a TG-optimal solution, denoted SRK [14], and the logarithmic fan-in solution (∆ = log n), denoted B_lg [5].
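The analytical step can also be checked numerically. The Python sketch below is ours and is not the code behind Figure 2: it solves $\ln\Delta\,(\Delta\ln 2 - 2) = 4$ by bisection and scans integer fan-ins for the minimum of $2^{\Delta/2}/(\Delta\ln^2\Delta)$, which we assume is the ∆-dependent factor left in the simplified $AT^2$ estimate when the area $O(n \cdot 2^{\Delta/2}/\Delta)$ is multiplied by the square of a depth of order $\log n/\log\Delta$. It keeps the simplified estimate (no ceilings), so it supports only the approximation, not the exact plots.

```python
import math

def residual(delta: float) -> float:
    """ln(D) * (D ln 2 - 2) - 4; its zero is the stationary point."""
    return math.log(delta) * (delta * math.log(2) - 2) - 4

def bisect(lo: float = 3.0, hi: float = 10.0, tol: float = 1e-9) -> float:
    """Bisection; residual is increasing on [3, 10] and changes sign there."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if residual(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def simplified_factor(delta: int) -> float:
    """Fan-in-dependent factor 2**(D/2) / (D * ln(D)**2) (assumed form)."""
    return 2 ** (delta / 2) / (delta * math.log(delta) ** 2)

root = bisect()
best = min(range(3, 33), key=simplified_factor)
print(f"root = {root:.3f}, best integer fan-in = {best}")
```

The real root lies just above 6, and among integer fan-ins the factor is smallest at ∆ = 6, consistent with the 'solution' used in the proof.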
Conclusions
This paper has presented a theoretical proof for one of the intrinsic limitations of digital VLSI technology: there are no 'optimal' solutions able to cope with highly connected structures. To do so, we have proven the contrary, namely that constant fan-in NNs are VLSI-optimal for digital architectures (either Boolean or using TGs). Open questions remain as to whether and how such a result could be used for purely analog or mixed analog/digital VLSI circuits.
