We prove that polynomial size discrete Hopfield networks with hidden units compute exactly the class of Boolean functions PSPACE/poly, i.e., the same functions as are computed by polynomial space-bounded nonuniform Turing machines. As a corollary to the construction, we observe also that networks with polynomially bounded interconnection weights compute exactly the class of functions P/poly, i.e., the class computed by polynomial time-bounded nonuniform Turing machines.
Introduction
We investigate the power of discrete Hopfield networks (Hopfield 1982) as general computational devices. Our main interest is in the problem of Boolean function computation by symmetric networks of weighted threshold logic units; but for the constructions, we also need to consider asymmetric nets.
In our model of network computation, the input to a net is initially loaded onto a set of designated input units; then the states of the units in the network are updated repeatedly, according to their local update rules, until the network (possibly) converges to some stable global state, at which point the output is read from a set of designated output units. We consider only finite networks of units with binary states, and with discrete-time synchronous dynamics (i.e., all the units are updated simultaneously in parallel). However, it is known that any computation on a synchronous network can be simulated on an asynchronous network where the updates are performed in a specific sequential order (Tchuente 1986; Bruck and Goodman 1988), or indeed even on a network where the update order is a priori totally undetermined (Orponen 1995).
Following the early work of McCulloch and Pitts (1943) and Kleene (1956) (see also Minsky 1972), it has been customary to think of finite (asymmetric) networks of threshold logic units as equivalent to finite automata (for recent work along these lines see, e.g., Alon et al. 1991; Horne and Hush 1994; Indyk 1995). However, in Kleene's construction for the equivalence, the input to a net is given as a sequence of pulses, whereas from the point of view of many current applications it would be more natural to think of all of the input as being loaded onto the network at the beginning of a computation. [This is also the input convention followed in standard Boolean circuit complexity theory (Wegener 1987).] Of course, this view makes the network model nonuniform, in the sense that any single net operates on only fixed-length inputs, and to compute a function on an infinite domain one needs a sequence of networks.
Since a recurrent net of $s$ units converging in time $t$ may be "unwound" into a feedforward circuit of size $s \cdot t$, the class of Boolean functions computed by polynomial size, polynomial time asymmetric nets coincides with the class P/poly of functions computed by polynomial size Boolean circuits or, equivalently, polynomial time Turing machines with a polynomially bounded number of nonuniform "advice bits" (Karp and Lipton 1982; Balcázar et al. 1988).¹ On the other hand, if computation times are not bounded, then a relatively straightforward argument shows that the class of functions computed by polynomial size asymmetric nets equals the class PSPACE/poly of functions computed by polynomial space bounded Turing machines with polynomially bounded advice. Parberry (1990) attributes this result to an early unpublished report by Lepley and Miller (1983), but for completeness we outline a proof in Section 3.
Thus, general asymmetric recurrent nets are fairly easy to characterize computationally, and turn out to be quite powerful. On the other hand, as pointed out by Hopfield (1982), networks with symmetric interconnections possess natural Liapunov functions, and are thus at least dynamically much more constrained. (The simple dynamics, and the generic form of the Liapunov functions, are also what make symmetric networks so attractive in many applications.) Specifically, Goles (1982) and Hopfield (1982) observed that under asynchronous updates, any symmetric net with no negative self-connections at the units always converges from any initial state to some stable final state. By analyzing the rate of decrease of the Liapunov functions, Fogelman et al. (1983) further showed that in a symmetric net of $p$ units with no negative self-connections, and with integer connection weights $w_{ij}$, the convergence requires at most a total of $O(\sum_{i,j} |w_{ij}|)$ unit state changes. Under synchronous updates a similar bound holds also for nets with negative self-connections (Poljak and Sura 1983; Goles et al. 1985; Bruck and Goodman 1988), but in this model the network may also converge to oscillate between two alternating states instead of a unique stable state.

¹ In a recent paper, Siegelmann and Sontag (1994) prove that also finite size asymmetric recurrent nets with real-valued unit states and connection weights, and a saturated-linear transfer function, compute in polynomial time exactly the functions in P/poly.
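The Liapunov argument can be made concrete with a short calculation. This is only a sketch under assumptions chosen here for illustration: unit states $s_i \in \{0,1\}$, the update rule $s_k := \mathrm{sgn}(\sum_j w_{kj} s_j - h_k)$ with $\mathrm{sgn}(0) = 1$, integer weights and thresholds, symmetric $w$ with $w_{kk} \ge 0$, and one unit changing at a time. The symbols $E$, $s$, and $\Delta$ are introduced here, and the resulting constant is not claimed to match the cited analysis.

```latex
\begin{align*}
E(s) &= -\tfrac12 \sum_{i}\sum_{j} w_{ij}\, s_i s_j
        + \sum_i \bigl(h_i - \tfrac12\bigr) s_i . \\[4pt]
\intertext{If unit $k$ alone changes by $\Delta s_k \in \{+1,-1\}$, then, using
$w_{ij} = w_{ji}$ and $s_k^2 = s_k$,}
\Delta E &= -\Delta s_k \Bigl( \sum_{j \ne k} w_{kj} s_j
            + \tfrac12 w_{kk} - h_k + \tfrac12 \Bigr). \\[4pt]
\intertext{Change $0 \to 1$: the rule fired because
$\sum_{j \ne k} w_{kj} s_j - h_k \ge 0$, so}
\Delta E &\le -\tfrac12 w_{kk} - \tfrac12 \le -\tfrac12 . \\[4pt]
\intertext{Change $1 \to 0$: the rule gave
$\sum_{j \ne k} w_{kj} s_j + w_{kk} - h_k \le -1$ (integrality), so}
\Delta E &\le -1 - \tfrac12 w_{kk} + \tfrac12 \le -\tfrac12 . \\[4pt]
\intertext{Since $|E(s)| \le \tfrac12 \sum_{i,j} |w_{ij}| + \sum_i |h_i - \tfrac12|$,
the number of state changes is at most}
&\quad 2 \sum_{i,j} |w_{ij}| + 4 \sum_i \bigl|h_i - \tfrac12\bigr| .
\end{align*}
```

Under these assumptions every state change lowers $E$ by at least $\tfrac12$, which is why a bound on the weights translates directly into a bound on the number of state changes.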
Thus, in particular, symmetric networks with polynomially bounded weights converge in polynomial time. On the other hand, networks with exponentially large weights may indeed require an exponential time to converge, as was first shown by Goles and Olivos (1981) for synchronous updates, and by Haken and Luby (1988) for a particular asynchronous update rule. [The former construction was later simplified by Goles and Martinez (1989).] A network requiring exponential time to converge under an arbitrary asynchronous update order was first demonstrated by Haken (1989). The existence of networks with exponentially long asynchronous transients is now known to follow also from the general theory of local search for optimization problems (Schaffer and Yannakakis 1991).
In this paper, we prove that despite their constrained dynamics, computationally symmetric networks lose nothing of their power: specifically, symmetric polynomial size networks with unbounded weights can compute all functions in PSPACE/poly, and networks with polynomially bounded weights can compute all functions in P/poly. The idea, presented in Section 4, is to start with the simulation of space-bounded Turing machines by asymmetric nets, and then replace each of the asymmetric edges by a sequence of symmetric edges whose behavior is sequenced by clock pulses. The appropriate clock can be obtained from, e.g., the symmetric exponential-transient network by Goles and Martinez (1989) . Obviously, such a clock network cannot run forever (in this case it is not sufficient to have a network that simply oscillates between two states), but nevertheless the sequence of pulses it generates is long enough to simulate a polynomially space-bounded computation or, in the case of polynomially bounded weights, a polynomially time-bounded one.
For more information on the computational aspects of recurrent threshold logic networks, or more generally automata networks, see, e.g., the survey articles and books by Floréen and Orponen (1994), Fogelman et al. (1987), Goles and Martinez (1990), Kamp and Hasler (1990), and Parberry (1990, 1994).
Preliminaries
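Following Parberry (1990), we define a (discrete) neural network as a 6-tuple $N = (V, I, O, A, w, h)$, where $V$ is a finite set of units, which we assume are indexed as $V = \{1, \ldots, p\}$; $I \subseteq V$ and $O \subseteq V$ are the sets of input and output units, respectively; $A \subseteq V$ is a set of initially active units, of which we require that $A \cap I = \emptyset$; $w : V \times V \to \mathbb{Z}$ is the edge weight matrix, and $h : V \to \mathbb{Z}$ is the threshold vector. A Hopfield net is a neural network whose weight matrix is symmetric. Given an input $x$, the input units are initialized to the bits of $x$, the units in $A$ to 1, and all other units to 0; the states $s_i$ of all the units are then updated synchronously according to the rule $s_i := \mathrm{sgn}(\sum_{j} w_{ji} s_j - h_i)$, where $\mathrm{sgn}(t) = 1$ for $t \ge 0$, and $\mathrm{sgn}(t) = 0$ for $t < 0$. This updating process is repeated until no more changes occur, at which point we say that the network has converged, and the output value $f_N(x)$ can be read off the output units (in order). If the network does not converge on input $x$, the value $f_N(x)$ is undefined.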
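To make the computation model concrete, the following is a minimal Python sketch of a synchronous threshold network. The function names (`sync_step`, `run`), the list-of-lists weight layout with `w[j][i]` as the weight of the edge from unit `j` to unit `i`, the indexing of units from 0, and the three-unit AND example are all illustrative assumptions rather than part of the definition.

```python
from typing import List, Optional, Sequence, Tuple


def sgn(t: int) -> int:
    """Threshold function: 1 for t >= 0, and 0 for t < 0."""
    return 1 if t >= 0 else 0


def sync_step(w: Sequence[Sequence[int]], h: Sequence[int],
              state: Tuple[int, ...]) -> Tuple[int, ...]:
    """Update all units simultaneously from the current global state."""
    p = len(state)
    return tuple(
        sgn(sum(w[j][i] * state[j] for j in range(p)) - h[i])
        for i in range(p)
    )


def run(w, h, inputs: Sequence[int], outputs: Sequence[int],
        active: Sequence[int], x: Sequence[int],
        max_steps: Optional[int] = None) -> Optional[List[int]]:
    """Load x onto the input units, switch on the initially active units,
    and iterate until the state stops changing. Returns the output bits,
    or None if no stable state is reached within max_steps updates."""
    p = len(h)
    start = [0] * p
    for unit, bit in zip(inputs, x):
        start[unit] = bit
    for unit in active:
        start[unit] = 1
    state = tuple(start)
    if max_steps is None:
        max_steps = 2 ** p  # a convergent run cannot repeat a configuration
    for _ in range(max_steps):
        nxt = sync_step(w, h, state)
        if nxt == state:
            return [state[u] for u in outputs]
        state = nxt
    return None


# Hypothetical example: units 0 and 1 are inputs that hold their own value
# through a self-connection; unit 2 computes their AND.
AND_W = [[1, 0, 1],
         [0, 1, 1],
         [0, 0, 0]]
AND_H = [1, 1, 2]
```

For instance, `run(AND_W, AND_H, [0, 1], [2], [], (1, 1))` loads the input `11` and reads the single output unit after convergence. The example network is asymmetric; a Hopfield net would additionally require `w[i][j] == w[j][i]`.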
For simplicity, we consider from now on only networks with a single output unit; the extensions to networks with multiple outputs are straightforward. The language recognized by a single-output network $N$, with $n$ input units, is defined as $L(N) = \{x \in \{0,1\}^n \mid f_N(x) = 1\}$.
Given a language $A \subseteq \{0,1\}^*$, denote $A^{(n)} = A \cap \{0,1\}^n$. We consider the following complexity classes of languages:

PNETS = $\{A \subseteq \{0,1\}^* \mid$ for some polynomial $q$, there is for each $n$ a network of size at most $q(n)$ that recognizes $A^{(n)}\}$

PNETS(symm) = $\{A \subseteq \{0,1\}^* \mid$ for some polynomial $q$, there is for each $n$ a Hopfield net of size at most $q(n)$ that recognizes $A^{(n)}\}$

PNETS(symm, small) = $\{A \subseteq \{0,1\}^* \mid$ for some polynomial $q$, there is for each $n$ a Hopfield net of weight at most $q(n)$ that recognizes $A^{(n)}\}$

Let $\langle x, y \rangle$ be some standard pairing function mapping pairs of binary strings to binary strings (see, e.g., Balcázar et al. 1988, p. 7). A language $A \subseteq \{0,1\}^*$ belongs to the nonuniform complexity class PSPACE/poly (Karp and Lipton 1982; Balcázar et al. 1987, 1988), if there exist a polynomial space bounded Turing machine $M$ and an "advice" function $f : \mathbb{N} \to \{0,1\}^*$, such that for some polynomial $q$ and all $n \in \mathbb{N}$, $|f(n)| \le q(n)$, and for all $x \in \{0,1\}^*$, $x \in A$ if and only if $M$ accepts $\langle x, f(|x|) \rangle$.
The class P/poly is defined analogously, using polynomial time instead of space bounded Turing machines. The class P/poly can also be characterized as the class of languages recognized by polynomial size-bounded sequences of feedforward Boolean circuits (Balcázar et al. 1988, p. 111). This includes circuits using threshold logic gates, as any threshold function on $k$ variables can be implemented as an AND/OR/NOT-circuit of size $O(k^2 \log^2 k)$ and depth $O(\log k)$ (Parberry 1994, p. 173).
Simulating Turing Machines with Asymmetric Nets
Simulating space bounded Turing machines with asymmetric neural nets is fairly straightforward.

Theorem 3.1. PNETS = PSPACE/poly.

To prove the inclusion PSPACE/poly $\subseteq$ PNETS, let $A \in$ PSPACE/poly via a machine $M$ and advice function $f$. Let the space complexity of $M$ on input $\langle x, f(|x|) \rangle$ be bounded by a polynomial $q(|x|)$. Without loss of generality (see, e.g., Balcázar et al. 1988) we may assume that $M$ has only one tape, halts on any input $\langle x, f(|x|) \rangle$ in time $c^{q(|x|)}$, for some constant $c$, and indicates its acceptance or rejection of the input by printing a 1 or a 0 on the first square of its tape.
Following the standard simulation of Turing machines by combinational circuits (Balcázar et al. 1988, pp. 106-112), it is straightforward to construct for each $n$ a feedforward threshold logic circuit that simulates the behavior of $M$ on inputs of length $n$. More precisely, the circuit simulates $M$ on inputs $\langle x, f(n) \rangle$ with $|x| = n$, and consists of exponentially many consecutive layers of wires, each layer representing one configuration of $M$ (Fig. 1, left). Every two consecutive layers of wires are interconnected by an intermediate layer of $q(n)$ constant-size subcircuits, each implementing the local transition rule of machine $M$ at a single position of the simulated configuration. The input $x$ is entered to the circuit along input wires; the advice string $f(n)$ appears as a constant input on another set of wires; and the output is read from the particular wire at the end of the circuit that corresponds to the first square of the machine tape.

One may now observe that the interconnection patterns between layers are very uniform: all the local transition subcircuits are similar, with a structure that depends only on the structure of $M$, and their number depends only on the length of $x$. Hence we may replace the exponentially many consecutive layers in the feedforward circuit by a single transformation layer that feeds back on itself (Fig. 1, right). (As can be seen in the figure, we now use an explicit layer of units to represent the configuration of machine $M$, with small positive self-connections to maintain the representation between successive transformations.) The size of the recurrent network thus obtained is only $O(q(n))$. When initialized with input $x$ loaded onto the appropriate input units, and advice string $f(n)$ mapped to the appropriate initially active units, the network will converge in $O(c^{q(n)})$ update steps, at which point the output can be read off the unit corresponding to the first square of the machine tape.
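The uniformity observation can be illustrated in a few lines of Python: one local rule, applied identically at every tape position, forms a transformation layer that is fed back on itself until the configuration stops changing. This is only a sketch of the idea in ordinary code, not a threshold logic implementation; the cell encoding `(symbol, state_or_None)`, the names `local_rule`, `transformation_layer`, and `run_to_fixed_point`, and the toy bit-flipping machine `FLIP` are all assumptions made for the example.

```python
BLANK = "_"
PAD = (BLANK, None)  # boundary cell: blank symbol, no head present


def local_rule(left, cell, right, delta):
    """New contents of one tape cell, computed from the cell itself and its
    two neighbours. A cell is (symbol, state), with state None when the head
    is elsewhere. delta maps (state, symbol) to (state, symbol, move)."""
    symbol, state = cell
    new_symbol, new_state = symbol, None
    if state is not None:                  # the head is on this cell
        if (state, symbol) in delta:
            q, a, move = delta[(state, symbol)]
            new_symbol = a
            if move == 0:
                new_state = q              # head stays in place
        else:
            new_state = state              # no transition: machine has halted
    for neighbour, arriving_move in ((left, +1), (right, -1)):
        n_symbol, n_state = neighbour
        if n_state is not None and (n_state, n_symbol) in delta:
            q, _, move = delta[(n_state, n_symbol)]
            if move == arriving_move:      # head moves onto this cell
                new_state = q
    return (new_symbol, new_state)


def transformation_layer(config, delta):
    """Apply the same local rule at every position of the configuration."""
    padded = [PAD] + list(config) + [PAD]
    return [local_rule(padded[i - 1], padded[i], padded[i + 1], delta)
            for i in range(1, len(padded) - 1)]


def run_to_fixed_point(config, delta, max_steps):
    """Feed the layer back on itself until the configuration is stable."""
    for _ in range(max_steps):
        nxt = transformation_layer(config, delta)
        if nxt == config:
            return config
        config = nxt
    return None


# Toy machine: sweep right, flipping bits, and halt on the first blank.
FLIP = {("s", "0"): ("s", "1", +1), ("s", "1"): ("s", "0", +1)}
start = [("1", "s"), ("0", None), ("1", None), (BLANK, None)]
final = run_to_fixed_point(start, FLIP, max_steps=10)
```

Here halting of the machine shows up as convergence of the iteration, which mirrors how the recurrent network's stable state encodes the final configuration. The sketch assumes the head never leaves the represented tape segment.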
Simulating Asymmetric Nets with Symmetric Nets
Having now shown how to simulate polynomial space bounded Turing machines by polynomial size asymmetric nets, the remaining problem is how to simulate the asymmetric edges in a network by symmetric ones. This is not possible in general, as witnessed by the different convergence behaviors of asymmetric and symmetric nets. However, in the special case of convergent computations on asymmetric nets the simulation can be effected.

Theorem 4.1. PNETS(symm) = PSPACE/poly.

Proof. Because PNETS(symm) $\subseteq$ PNETS, and by the previous theorem PNETS $\subseteq$ PSPACE/poly, it suffices to prove the inclusion PSPACE/poly $\subseteq$ PNETS(symm). Given any $A \in$ PSPACE/poly, there is by Theorem 3.1 a sequence of polynomial size asymmetric networks recognizing $A$. Rather than show how this specific sequence of networks can be simulated by symmetric networks, we shall show how to simulate the convergent computations of an arbitrary asymmetric network of $n$ units and $e$ edges of nonzero weight on a symmetric network of $O(n + e)$ units and $O(n^2)$ edges.
The construction combines two network "gadgets": a simplified version of a mechanism due to Hartley and Szu (1987) for simulating an asymmetric edge by a sequence of symmetric edges and their interconnecting units, whose behavior is coordinated by a system clock (Figs. 2 and 3); and a binary counter network due to Goles and Martinez (1989; see also Goles and Martinez 1990, pp. 88-95) that can count up to $2^n$ using about $3n$ units and $O(n^2)$ symmetric edges (Fig. 4). An important observation here is that any convergent computation by a network of $n$ units has to terminate in $2^n$ synchronous update steps, because otherwise the network repeats a configuration and goes into a cycle; hence, the exponential time counter network can be used to provide a sufficient number of clock pulses for the simulation to be performed.
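The pigeonhole observation can be checked directly on small networks. The sketch below runs a network synchronously until some configuration repeats, which must happen within $2^n$ steps for $n$ units, and reports the transient length and the cycle period (period 1 meaning convergence). The function names, the `w[j][i]` weight layout (edge from unit `j` to unit `i`), and the two example networks are assumptions made for illustration.

```python
def sgn(t):
    """Threshold function: 1 for t >= 0, and 0 for t < 0."""
    return 1 if t >= 0 else 0


def sync_step(w, h, state):
    """One synchronous update of all units."""
    p = len(state)
    return tuple(
        sgn(sum(w[j][i] * state[j] for j in range(p)) - h[i])
        for i in range(p)
    )


def transient_and_period(w, h, state):
    """Return (transient, period): the number of steps before the network
    enters its cycle, and the length of that cycle. With p units there are
    only 2**p configurations, so the loop ends after at most 2**p steps."""
    seen = {}
    state = tuple(state)
    t = 0
    while state not in seen:
        seen[state] = t
        state = sync_step(w, h, state)
        t += 1
    first_visit = seen[state]
    return first_visit, t - first_visit


# Asymmetric ring: unit i copies unit i-1, so a single 1 circulates forever.
RING_W = [[0, 1, 0],
          [0, 0, 1],
          [1, 0, 0]]
RING_H = [1, 1, 1]

# Symmetric pair: each unit copies the other.
SWAP_W = [[0, 1],
          [1, 0]]
SWAP_H = [1, 1]
```

On these examples the asymmetric ring started from `(1, 0, 0)` cycles with period 3, while the symmetric pair either sits in a stable state or alternates between two states, in line with the convergence behavior of symmetric nets under synchronous updates.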
Let us first consider the gadget for a symmetric simulation of an asymmetric edge of weight $w$ from a unit $i$ to a unit $j$ (Fig. 2). Here the idea is that the two units inserted between the units $i$ and $j$ in the symmetric network function as locks in a canal, permitting information to move only from left to right. The locks are sequenced by clock pulses emanating from the units labeled A and B, in cycles of period three as presented in Figure 3.
Let us consider the behavior of the gadget starting at some time $t = 0$ (for simplicity), assuming that at this time clock A is on, the first intermediate unit is clear, clock B is off, and the current state of the simulated unit $i$ is represented in the second intermediate unit. At time 1 clock B turns on, clearing the second intermediate unit at time 2 (note that the connection from unit $j$ is not strong enough to turn this unit back on). However, simultaneously at time 2, a new state is computed at unit $j$.

The next question is how to generate the clock pulses A and B. It is not possible to construct a symmetric clock network that runs forever: at best such a network can end up oscillating between two states, but this is not sufficient to generate the period 3 pulse sequences required for the previous construction. However, Figure 4 presents the first two stages in the construction of a $(3n - 4)$-unit symmetric network with a convergence time of more than $2^n$ (actually, $2^n + 2^{n-1} - 3$) synchronous update steps. [For the full details of the construction, see Goles and Martinez (1989).] The idea here is that the $n$ units in the upper row implement a binary counter, counting from all 0s to all 1s (in the figure, the unit corresponding to the least significant bit is to the right). For each "counter" unit added to the upper row, after the two initial ones, two "control" units are added to the lower row. The purpose of the latter is to first turn off all the "old" units, when the new counter unit is activated, and from then on balance the input coming to the old units from the new units, so that the old units may resume counting from zero.²

Figure 4: The first two stages in the construction of a binary counter network (Goles and Martinez 1989).

It is possible to derive from such a counter network a sufficient number of the A and B pulse sequences by means of the delay line network presented in Figure 5. Here the unit at the upper left corner is some sufficiently slow oscillator; since we require pulse sequences of period three, this could be the second counter unit in the preceding construction, which is "on" for four update steps at a time. (Thus, a $2^{n+1}$-counter suffices to sequence computations of length up to $2^n - 1$.) The delay line operates as follows: when the oscillator unit turns on, it "primes" the first unit in the line; but nothing else happens until the oscillator turns off. At that point the "on" state begins to travel down the line, one unit per update step, and the pulses A and B are derived from the appropriate points in the line.
The value $W$ used in the construction has to be chosen so large that the states of the units in the underlying network have no effect back on the delay line. It is sufficient to choose $W$ larger than the total weight of the underlying network. Similarly, the weights and thresholds in the counter network have to be modified so that the connections to the delay line do not interfere with the counting. Assuming that $W \ge 3$, it is here sufficient to multiply all the weights and thresholds by $6W$, and then subtract one from each threshold. Note that as the counter network eventually converges in a state with all the units "on," the clock pulses correspondingly freeze in positions A "on" and B "off." This makes further updates in the underlying network impossible, but retains it in a consistent configuration. □

² Following the construction by Goles and Martinez (1989), we have made use of one negative self-connection in the counter network. If desired, this can be removed by making two copies of the least significant unit, both with threshold 0, interconnected by an edge of weight -1, and with the same connections to the rest of the network as the current single unit. All the other weights and thresholds in the network must then be doubled.

Concerning the edge weights in the above constructions, one can see that in the network implementing the machine simulation (Figs. 1 and 2), the weights actually are bounded by some constant that depends only on the simulated machine $M$; in the delay line, the weights are proportional to the total weight of the underlying network; and the weights in the counter network (Fig. 4) are proportional to the length of the required simulation and, less significantly, to the weight of the delay line. (Note that each new counter unit doubles the running time of the counter network, and, on the other hand, introduces weights of magnitude at most equal to the sum of all the earlier weights.)

Thus, we obtain as a corollary to the construction that if the simulated Turing machine (or, more generally, asymmetric network) is known to converge in polynomial time, then it is sufficient to have polynomially bounded weights in the simulating symmetric network. Formulating this in terms of nonuniform complexity classes, we obtain:

Corollary 4.2. PNETS(symm, small) = P/poly.

Consequently, anything that can be computed by asymmetric networks, or nonuniform Turing machines, in polynomially bounded time can also be computed by polynomial weight symmetric networks, with their guaranteed good convergence properties.
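The remark on how the counter weights grow can be turned into a short calculation. This is a sketch under one reading of that remark, namely that adding a counter unit roughly doubles the running time while the total weight it adds is at most the total weight already present. The symbols $T_k$ and $W_k$ (running time and total weight of the counter with $k$ counter units) are introduced here for illustration only.

```latex
\[
  T_{k+1} \ge 2\,T_k, \qquad W_{k+1} \le W_k + W_k = 2\,W_k
  \qquad\Longrightarrow\qquad
  \frac{W_{k+1}}{T_{k+1}} \le \frac{W_k}{T_k}.
\]
\[
  \text{Hence } W_k = O(T_k); \text{ taking the smallest counter with }
  T_k \ge T \text{ for } T = q(n) \text{ gives } T_k = O(q(n))
  \text{ and so } W_k = O(q(n)).
\]
```

Under this reading the counter weights stay proportional to the length of the simulation, so a polynomially time-bounded computation needs only polynomially bounded counter weights, before the rescaling applied to connect the counter to the delay line.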
The result also implies that large (i.e., superpolynomial) weights are essential for the computational power of polynomial size symmetric networks if and only if P/poly $\ne$ PSPACE/poly. [The condition P/poly = PSPACE/poly is known to have the unlikely consequence of collapsing the "polynomial time hierarchy" to its second level (Karp and Lipton 1982).] In asymmetric networks large weights are not needed; in fact even bounded weights suffice, as can be seen by conceptually replacing each threshold logic unit by a corresponding AND/OR/NOT subcircuit.
Conclusion and Open Problems
We have characterized the classes of Boolean functions computed by asymmetric and, more interestingly, symmetric polynomial size recurrent networks of threshold logic units under a synchronous update rule. When no restrictions are placed on either computation time or the sizes of interconnection weights, both of these classes of networks compute exactly the class of functions PSPACE/poly. If interconnection weights are limited to be polynomial in the size of the network, the class of functions computed by symmetric networks reduces to P/poly. This limitation has no effect on the computational power of asymmetric nets. Although we have considered here only networks with discrete synchronous dynamics, it can be shown that any computation on such a network can also be performed on a slightly larger network with a totally asynchronous dynamics (Orponen 1995) .
Some of the open problems suggested by this work are the following. In the original associative memory model proposed by Hopfield (1982) , all the units are used for both input and output, and no hidden units are allowed. Although this is a somewhat artificial restriction from the function computation point of view, characterizing the class of mappings computed by such networks would nevertheless be of some interest in the associative memory context.
Of more general interest would be the study of the continuous-time version of Hopfield's network model (Hopfield 1984; Hopfield and Tank 1985). It will be an exciting broad research task to define the appropriate notions of computability and complexity in this model, and attempt to characterize its computational power.

the author was visiting the Institute for Theoretical Computer Science, Technical University of Graz, Austria.
