Hybrid CMOS/nanoelectronic circuits, combining CMOS chips with simple nanoelectronic crossbar add-ons, may extend the exponential Moore's-Law progress of microelectronics beyond the 10-nm frontier. This paper reviews the development of neuromorphic networks ("CrossNets") based on this prospective technology. In these networks, the neural cell bodies ("somas") are implemented in the CMOS subsystem, crossbar nanowires are used as axons and dendrites, and two-terminal crosspoint devices are used as elementary synapses. Extensive analysis and simulations have shown that such networks may perform virtually all information processing tasks demonstrated with software-implemented neural networks, with much higher performance. Estimates show that CrossNets may eventually surpass bio-cortical circuits in density, at comparable connectivity, while operating 4 to 6 orders of magnitude faster, at manageable power dissipation.
INTRODUCTION: CMOL/NANOELECTRONIC HYBRIDS
Recent research has shown that the impending crisis of the exponential ("Moore's-Law") progress of microelectronics may be postponed for more than a decade by the transfer from purely CMOS technology to hybrid CMOS/nanodevice circuits. [1] [2] [3] [4] In such a circuit, a specially designed CMOS chip is complemented with a simple nanoelectronic add-on: a nanowire crossbar with simple, similar, two-terminal nanodevices at each crosspoint (Fig. 1). Two types of two-terminal nanodevices are being explored for use in the hybrids: "latching switches" (sometimes called "resistive switches" or "programmable diodes"), with resistive bistability shown schematically in Figure 2, and "memristive" devices whose parameters, including effective resistance, depend gradually on their operation history.
Effective connection between the CMOS subsystem and the crossbar subsystem may be provided by an area-distributed interface; Figure 3 shows the so-called "CMOL" version of such an interface, 1-4 which allows the CMOS subsystem to address every nanowire, and hence every crosspoint device, of the add-on crossbar.
The basic idea behind such CMOS/nanoelectronic hybrids is that nanowire levels of the crossbar do not need alignment ("overlay"), 5 8 and hence may be fabricated using advanced patterning methods, such as nanoimprint, [9] [10] [11] EUV interference lithography, [12] [13] [14] or block-copolymer lithography, 15 16 for which nanoscale-accurate overlay is not available. As a result, the crossbar half-pitch $F_{nano}$ may be much smaller than that ($F_{CMOS}$) of the CMOS subsystem. Detailed simulations have shown that for the ratio $F_{CMOS}/F_{nano} \sim 10$, which may be anticipated by the end of this decade, 23 the CMOS/nano hybrids with the CMOL interface may provide a nearly two-orders-of-magnitude advantage over pure CMOS (with the same $F_{CMOS}$ and power per unit area, and comparable speed) for at least two digital applications: resistive memories 24 and FPGA-like reconfigurable logic circuits. [25] [26] This advantage alone, crudely equivalent to 10 to 15 years of Moore's Law extension, may be a sufficient motivation for the industrial introduction of the hybrids.
The goal of this paper is to review another area of possible applications of such hybrids, in integrated circuits of another type: bio-inspired neuromorphic networks ("CrossNets"), 2-5 27-35 whose estimated performance advantage over CMOS circuits with similar functionality is even higher, up to 6 orders of magnitude. The review starts with a description (in Section 2) of the basic idea and topology of CrossNets, followed by their performance estimates (Section 3). Section 4 reviews the developed methods for the import and/or runtime adaptation of their synaptic weights (which, for neural computation, play the role that programming plays in conventional computing). Several examples of tasks which may be performed by CrossNets are described in Section 5, while the concluding Section 6 discusses prospects of their practical applications and major challenges to be met on the way towards their practical introduction.
Fig. 2. The I-V curve of a latching switch (schematically). The device may be switched from OFF state to ON state and back by applying sufficiently high voltages, V > V t and V < V t, respectively. Such devices have been implemented using a broad range of materials (for reviews, see Refs. [5, 17, 18]), some of them with acceptable device-to-device reproducibility. [19] [20] [21] [22]
TOPOLOGIES
Figure 4 shows the generic geometry of CrossNets. Neural cell bodies (somas) are implemented in the CMOS subsystem. For the simplest firing-rate models of neural networks, [36] [37] [38] the soma may be just an analog amplifier with saturation, while in more bio-plausible, "spiking" models the somatic circuit receives, processes, and generates its own nerve pulses ("spikes").
Output signal voltage $V_k(t)$ of a soma is applied, through an area-distributed interface (Fig. 3), to two "axonic" nanowires, shown with red lines in Figure 4. Perpendicular, physically similar "dendritic" nanowires of the crossbar (blue lines) lead to inputs of other somas. If the two-terminal device at the crosspoint of the k-th axon and the j-th dendrite is in its ON state (Fig. 2), this voltage provides a substantial contribution to the current $I_j$ injected into the j-th dendritic wire, so that in the linear approximation, and at low input load, the total current is

$$I_j = s_j \sum_{k=1}^{M} G_{jk} V_k, \qquad (1)$$

where $G_{jk}$ is the crosspoint device conductance, and $s_j = \pm 1$ is the input polarity. Equation (1) is exactly the key add-multiply operation pertinent to any artificial neural network (with the product $s_j G_{jk}$ playing the role of the synaptic weight $w_{jk}$), which consumes most computing resources when the network is implemented in software run on general-purpose digital computers. In CrossNets, this is an analog operation which may be extremely fast (see the next section).
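The weighted summation performed by the crossbar, $I_j = s_j \sum_k G_{jk} V_k$, can be illustrated with a minimal software sketch. The function name, the ON/OFF conductance values, and the voltages below are hypothetical choices made only for illustration; they are not taken from any CrossNet design.

```python
from typing import List

def dendritic_currents(G: List[List[float]], V: List[float], s: List[int]) -> List[float]:
    """Return I_j = s_j * sum_k G[j][k] * V[k] for every dendritic wire j."""
    currents = []
    for j, row in enumerate(G):
        if len(row) != len(V):
            raise ValueError("each row of G needs one conductance per axon")
        currents.append(s[j] * sum(g * v for g, v in zip(row, V)))
    return currents

# Hypothetical example: 2 dendrites, 3 axons (conductances in siemens, voltages in volts).
G_ON, G_OFF = 1e-6, 0.0   # assumed ON/OFF conductances of a bistable crosspoint device
G = [[G_ON, G_OFF, G_ON],
     [G_OFF, G_ON, G_ON]]
V = [0.5, -0.5, 1.0]      # axonic voltages V_k
s = [+1, -1]              # input polarities s_j
I = dendritic_currents(G, V, s)
```

In software this costs one multiplication and one addition per synapse, whereas in the crossbar all terms of the sum are formed simultaneously as currents flowing into the same wire.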
Fig. 6. Composite synapse providing $L = n^2 + 1$ discrete weight levels. Dark-gray rectangles are resistive metallic strips at soma/nanowire interfaces.
Perhaps the most important feature of CrossNets is that their connectivity M (the upper limit of the sum in Eq. (1), i.e., the number of cells providing signal to any given one "directly", via a synaptic contact) depends only on the distance between the somas and is theoretically unlimited, despite the quasi-2D geometry of these circuits. This is very important for modeling (and eventually competing with) bio-cortical circuits, whose average connectivity is close to $10^4$. 39
Besides the intercell distance (and hence the connectivity), CrossNet properties depend on the cell distribution over the synaptic field. Figure 5 shows the feedforward versions of two CrossNet types most explored so far: the so-called FlossBar and InBar. 28 The former network is more natural for the implementation of multilayered perceptrons (MLP, see Section 5.2), while the latter system may be preferable for recurrent network implementations (Section 5.1).
The generic topologies shown in Figures 4-5 may be readily extended to more advanced neuromorphic networks. For example, flexible combinations of FlossBar and InBar plaquettes are straightforward to engineer. 55 Also, if the crosspoint devices used are bistable (Fig. 2), i.e., if a single device provides a binary synaptic weight, synapses with multi-level weights may be organized from small arrays of such bistable switches (Fig. 6); two complementary square arrays, of n × n switches each, provide $L = 2n^2 + 1$ discrete weight levels, with L = 33 (i.e., n = 4) being sufficient for some key algorithms. 32
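The level count for two complementary arrays can be checked with a short sketch. It assumes, purely for illustration, that the synaptic weight is proportional to the difference between the numbers of ON switches in the "positive" and "negative" arrays; the function names and the particular encoding (one of the two arrays kept empty) are hypothetical.

```python
def weight_levels(n: int) -> int:
    """Number of distinct weight levels from two complementary n x n arrays."""
    return 2 * n * n + 1

def encode_level(level: int, n: int) -> tuple:
    """Map a signed level in [-n*n, n*n] to (ON count in positive array, ON count in negative array)."""
    max_level = n * n
    if not -max_level <= level <= max_level:
        raise ValueError("level out of range for this array size")
    return (level, 0) if level >= 0 else (0, -level)

def decode_level(on_plus: int, on_minus: int) -> int:
    """Assumed readout: the weight is proportional to the difference of ON counts."""
    return on_plus - on_minus

n = 4
all_levels = [decode_level(*encode_level(k, n)) for k in range(-n * n, n * n + 1)]
```

For n = 4 the list `all_levels` runs from -16 to +16, i.e., the 33 levels quoted above.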
PERFORMANCE ESTIMATES
The most important motivation for the architecture development, quantitative simulation (and eventually hardware implementation) of CMOL CrossNets comes from estimates of their possible density, speed, and power. In CMOL topology, the total area occupied by synapses serving one cell is close to

$$A = M L \, (2F_{nano})^2. \qquad (2)$$
At reasonable connectivity ($M > 10^2$) and weight level discreteness (L ≥ 33), this area is more than sufficient for the layout of even the most complex (say, spiking) somatic circuitry, so that CrossNet density may be estimated from Eq. (2). For such realistic numbers as L = 33, $M = 3 \times 10^3$, and $F_{nano}$ = 5 nm, it yields $A \sim 10^{-7}$ cm$^2$, corresponding to approximately 10 M neural cells per cm$^2$. This is already close to mammal cerebral cortex neuron density (per cortex area 39 ). A substantial additional density increase may be obtained by further crossbar scaling (some conceptual problems, such as quantum-mechanical tunneling between adjacent nanowires, do not start until $F_{nano} \sim 2$ nm), quasi-3D 40 and genuine-3D 41 integration, as well as by using single "memristive" crosspoint devices, with continuously adjustable conductance, as synapses (see Section 6 below).
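The quoted area can be cross-checked with a rough calculation. The sketch below assumes that each cell is served by about M × L crosspoint devices, each occupying a crossbar cell of area $(2F_{nano})^2$; this scaling form, like the function name, is an assumption of the sketch rather than a statement of the exact layout.

```python
def synaptic_area_cm2(M: int, L: int, f_nano_nm: float) -> float:
    """Area per cell, assuming M*L crosspoint devices of area (2*F_nano)^2 each."""
    nm2_to_cm2 = 1e-14  # 1 nm = 1e-7 cm, so 1 nm^2 = 1e-14 cm^2
    return M * L * (2.0 * f_nano_nm) ** 2 * nm2_to_cm2

area = synaptic_area_cm2(M=3000, L=33, f_nano_nm=5.0)  # in cm^2
cells_per_cm2 = 1.0 / area
```

With these inputs the area comes out just under $10^{-7}$ cm$^2$, and `cells_per_cm2` is its reciprocal.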
CrossNet speed estimates are even more impressive. Since some crosspoint devices may be switched faster than 100 ns (see Tables I-III [...] For this power level, which does not require dedicated cooling, the intercell delay scale RC is of the order of 2.5 μs. If P is increased to 100 W/cm$^2$ (typical for high-performance microprocessors), the latency decreases to ∼25 ns. These numbers are, respectively, approximately 3 and 5 orders of magnitude lower than the average intercell latency in the biological cortex. 38 39

Of course, such a comparison of CrossNets with bio-cortical circuitry would be completely fair only if their functionality were close. So far, theoretical neuroscience is still very far from telling us how this goal may be achieved, and gives recipes for performing only relatively simple cognitive tasks (some of which, nevertheless, already have important practical applications). The next section describes some options for implementing such recipes in CrossNets.
Weight Import
If the set of synaptic weights for a particular task has been determined using an external digital computer (a "precursor"), these weights may be imported into a CrossNet. This task is not trivial. Indeed, setting a certain set $w_{jk}$ of synaptic weights in a CrossNet with bistable crosspoint devices requires switching each device into its proper state, either ON or OFF. For that, the voltages applied to the two nanowires leading to each device have to be manipulated in a way which ensures the desired switching event in the selected device while not perturbing the states of other ("semi-selected") devices connected to each of the wires. Moreover, the import procedure should be parallelized as much as possible to ensure practicable weight import times. Nevertheless, such import procedures, with the number of time steps scaling as M (rather than as the total number of crosspoint devices in the network!), have been developed for both InBars and FlossBars, with both binary 29 and multi-level 29 32 synaptic weights. It seems likely that these solutions may be extended to virtually any future CrossNet topology.
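The requirement of switching the selected devices without disturbing the semi-selected ones can be illustrated with a generic half-voltage biasing sketch. This is not the import procedure of the cited works; it is a textbook-style V/2 scheme, with hypothetical threshold and write voltages, in which all devices of one axonic wire are programmed in a single time step, so that the number of steps scales with the number of wires rather than with the number of devices.

```python
V_T = 1.0       # assumed switching threshold (arbitrary units)
V_WRITE = 1.5   # assumed write voltage, chosen so that V_WRITE / 2 < V_T < V_WRITE

def import_weights(target, initial=None):
    """Program a crossbar of bistable devices, one axonic wire per time step.

    target[j][k] is the desired state (True = ON) of the device joining
    dendrite j and axon k. Returns (final_state, number_of_time_steps).
    """
    n_dend, n_axon = len(target), len(target[0])
    if initial is None:
        state = [[False] * n_axon for _ in range(n_dend)]
    else:
        state = [row[:] for row in initial]
    steps = 0
    # First pass writes ON states (positive polarity), second pass writes OFF states.
    for polarity, new_state in ((+1, True), (-1, False)):
        for k in range(n_axon):
            steps += 1
            for j in range(n_dend):
                # A dendrite is biased only if its device on axon k must change to new_state.
                row_selected = target[j][k] == new_state
                v_dend = -polarity * V_WRITE / 2 if row_selected else 0.0
                for kk in range(n_axon):
                    v_axon = polarity * V_WRITE / 2 if kk == k else 0.0
                    v = v_axon - v_dend          # voltage across device (j, kk)
                    if polarity * v > V_T:       # true only for fully selected devices
                        state[j][kk] = new_state
    return state, steps

target = [[True, False, True, False],
          [False, False, True, True],
          [True, True, False, False]]
final_state, n_steps = import_weights(target)
```

Semi-selected devices see only half of the write voltage, which stays below the assumed threshold, so their states are preserved; the whole 3 × 4 array is written in 8 steps (two passes over 4 axons) rather than 12 single-device operations.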
In-Situ Adaptation: Firing-Rate Models
If the task of synaptic weight calculation is too large for a digital precursor, it has to be performed within the CrossNet itself. Of several adaptation algorithms, the Hebb rule and its variations [36] [37] [38] are believed to be the most important. Figure 7 shows the method (based on the well-known "statistical multiplication" approach) of providing Hebb-type plasticity in firing-rate CrossNets. 30 In this method, each synapse consists of four arrays of n × n elementary latching switches, fed by bipolar (dual-rail) voltages, so that in the signal propagation mode, the synaptic weight may take any of the L = [...]
In this arrangement, the probability that each comparator applies the voltage of the proper polarity to its output wire is proportional to the input analog signal. A straightforward analysis 30 shows that, as a result, the average synaptic weight change during a time-division multiplexing cycle is
where the prefactor is a constant depending on the bistable device parameters and the comparator output voltage. This is just the standard $x_1 x_2$ form of the Hebb rule, but with additional saturation (which is unavoidable in any hardware implementation of synaptic weights).
An important feature of this adaptation method is that all necessary circuitry (including the comparator and pseudorandom number generator) serving one axonic or dendritic line may be shared by all M synapses of that line. As a result, the overhead of CMOS hardware necessary for their implementation does not affect density of CrossNets with biologically-plausible values of connectivity M.
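The principle of statistical multiplication can be sketched in a few lines. The model below is an idealization assumed for illustration only: each comparator outputs +1 with probability $(1 + x)/2$ and -1 otherwise, for an input signal x in [-1, 1], and the two comparators use independent pseudorandom numbers. The function names are hypothetical, and the saturation of a real bounded weight is not modeled.

```python
import random

def stochastic_sign(x: float, rng: random.Random) -> int:
    """Comparator model: +1 with probability (1 + x) / 2, else -1, for x in [-1, 1]."""
    return 1 if rng.uniform(-1.0, 1.0) < x else -1

def mean_coincidence(x1: float, x2: float, cycles: int, seed: int = 0) -> float:
    """Average product of two independent comparator outputs over many cycles."""
    rng = random.Random(seed)
    total = 0
    for _ in range(cycles):
        total += stochastic_sign(x1, rng) * stochastic_sign(x2, rng)
    return total / cycles

estimate = mean_coincidence(0.6, -0.5, cycles=100_000)
```

Since each output has mean value x, the average product of two independent outputs tends to $x_1 x_2$, so `estimate` should be close to -0.3. If the sign of that product decides whether a randomly selected switch is turned ON or OFF, the average weight change is proportional to $x_1 x_2$, which is the Hebb-type dependence described above.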
In-Situ Adaptation: STDP
In spiking neuromorphic networks, which explicitly model neural pulses in biological systems, the most popular way of implementing the Hebb rule is the so-called spike-time-dependent plasticity (STDP). 42 The idea is to increase the probability of OFF → ON switching if the pre-synaptic and post-synaptic spikes are within a certain time window (with the latter spike following the former one), and to suppress that probability in the opposite case. Figure 8 shows a simple example of how this can be achieved in spiking CrossNets. 6 Each spike consists of two voltage pulses: a longer pulse A and a shorter but higher pulse B, with the amplitude approaching (but somewhat below) the switching threshold $V_t$. Applied to a composite synapse (Fig. 6), the former pulse creates a current pulse which causes a gradual RC-increase of the dendritic voltage $V_d$. (The contribution of pulse B to the recharging is small, due to its short duration.)
If the dendritic voltage was well below the spiking threshold of the post-synaptic cell, the dendritic wire recharging decreases the net voltage $V = V_a - V_d$ applied to the crosspoint switches, and hence creates a certain (small) probability of ON → OFF switching of those devices which were in their ON state. However, if $V_d$ was close enough to the threshold, the recharging triggers a similar spike in the post-synaptic neuron. The somatic circuitry sends this pulse not only to that soma's output ($V_a$), but also back to the dendrite with the appropriate (negative) polarity. Now the net voltage V is increased, so that when the short pulse B is applied to the axonic nanowire, V exceeds the threshold, creating a finite probability of switching for those crosspoint devices of the composite synapse which were still in their OFF state. (Similar schemes, but with more complex somatic circuitry, have been proposed in Refs. [44] [45] [46].)
Unfortunately, numerical modeling shows that this scheme does not promise good scaling of STDP with growing connectivity M (Fig. 9). The reason is that the bio-plausible average rate f of spikes generated by each cell may be as high as ∼0.1/τ, where τ is the spike duration, so that the product Mfτ may be much higher than 1, meaning that at the input of each soma, many spikes may overlap. As a result, the STDP response becomes noisy, and its average deviates from the desired antisymmetric function of the spike delay.
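The size of the overlap problem is easy to estimate. The sketch below assumes that the M inputs of a soma spike independently, each being "mid-spike" for a fraction fτ of the time, where τ denotes the spike duration; the function names are hypothetical.

```python
def mean_overlapping_spikes(M: int, rate_times_duration: float) -> float:
    """Average number of input spikes present at a soma at any instant: M * f * tau."""
    return M * rate_times_duration

def prob_any_spike_present(M: int, rate_times_duration: float) -> float:
    """Probability that at least one of M independent inputs is mid-spike."""
    return 1.0 - (1.0 - rate_times_duration) ** M

# Bio-plausible connectivity M ~ 1e4 with f * tau ~ 0.1:
overlap_cortical = mean_overlapping_spikes(10_000, 0.1)
# A small network with M = 10 at the same firing rate:
p_small = prob_any_spike_present(10, 0.1)
```

At cortex-like connectivity the average number of simultaneously present input spikes is of the order of a thousand, so a given pre-synaptic spike is practically never seen in isolation, while for M = 10 the dendrite is free of other spikes about a third of the time.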
Ways toward better scaling still have to be explored; here let me only mention that these complications may be naturally avoided in "Flash CrossNets", model circuits using flash memory cells working in the analog mode (see Figure 10 and Table I). (Earlier suggestions of using flash technology in neuromorphic networks [47] [48] [49] were based on more complex cells.) Figure 11 shows a typical result of theoretical analysis of such flash synapses, with the simple somatic feedback circuit shown in Figure 10.

[...] (Fig. 10) versus the spike delay. Small points: simulation results from ∼2,400 numerical experiments; squares: results given by the single-spike approximation.

Of course, the flash memory technology is essentially a variant of CMOS, so that it requires patterning with accurate layer alignment, and cannot be scaled down as much as nanowire crossbars. However, in Flash CrossNets, one flash cell may provide a synaptic weight accuracy comparable to that of a multi-latching-switch array (Figs. 6, 7), at a comparable network density. (Suggestions 50 to use continuously-adjusted memristive crosspoint devices for providing analog synaptic weights would probably require much lower device-to-device variability than the one demonstrated experimentally.)
APPLICATION EXAMPLES
The results presented in this section have been obtained by simulating CrossNets using realistic CrossNet models. They give some idea of the possible performance of such networks.
Hopfield Networks: Pattern Recognition
Possibly the simplest type of artificial neural net is the recurrent, firing-rate network with symmetric synaptic weights, $w_{jk} = w_{kj}$. (Such networks had been explored by several researchers 51 before they were made famous by Hopfield. 52 ) Properly trained, the Hopfield network may work as an associative memory, using a part of a pre-written pattern to restore ("recognize") the whole pattern.
Since the capacity of such memory is very weakly affected by synaptic weight discreteness, a CrossNet with just one latching switch per synapse may operate very well in this mode; its main difference from the generic Hopfield net is the quasi-local (rather than global) connectivity M, limiting its capacity to ∼0.45 M at 99% restoration fidelity. 
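A toy version of such a memory, with one binary weight per synapse and quasi-local connectivity, can be sketched as follows. The ring geometry, the clipped Hebbian rule, the function names, and the 16-cell pattern are all assumptions made for illustration; a single pattern is stored so that the outcome can be verified by hand, which of course says nothing about capacity.

```python
def train_binary(patterns, M):
    """Clipped Hebbian weights (+1/-1) on a ring; each cell is linked to its M nearest neighbors."""
    N = len(patterns[0])
    w = [[0] * N for _ in range(N)]
    for i in range(N):
        for d in range(1, M // 2 + 1):
            for j in ((i + d) % N, (i - d) % N):
                hebb = sum(p[i] * p[j] for p in patterns)
                w[i][j] = 1 if hebb > 0 else -1   # one bistable device per synapse
    return w

def recall(w, state, sweeps=5):
    """Synchronous sign updates; a zero net input leaves the cell unchanged."""
    for _ in range(sweeps):
        new_state = []
        for i, row in enumerate(w):
            h = sum(wij * sj for wij, sj in zip(row, state))
            new_state.append(state[i] if h == 0 else (1 if h > 0 else -1))
        state = new_state
    return state

pattern = [1, -1, 1, 1, -1, -1, 1, -1, 1, 1, 1, -1, -1, 1, -1, -1]
w = train_binary([pattern], M=8)
corrupted = pattern[:]
for i in (0, 5, 10):          # flip three well-separated cells
    corrupted[i] = -corrupted[i]
restored = recall(w, corrupted)
```

Each cell here receives input from M = 8 neighbors, of which at most two are corrupted, so the majority of its inputs always points to the stored value and `restored` equals `pattern`.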
Multilayer Perceptrons: Pattern Classification
A much more important, but also more demanding function of neuromorphic networks is pattern classification, which may be achieved, for example, in layered perceptrons after their supervised training by error backpropagation (see, e.g., Refs. [36] [37] [38]). Two major concerns about using CrossNets in this mode have been: (i) the necessary accuracy of synaptic weights, and (ii) defect tolerance. Figure 13 shows typical results of a study 32 which used a very common benchmark, the MNIST database of handwritten digits. 54 It shows that, for example, L = 33 synaptic levels (available from two 4 × 4 composite synapses shown in Fig. 6) are sufficient for getting virtually the same fidelity (∼98%) as for exact (continuous) synaptic weights, and that a very substantial number of stuck-at-closed defects cause only a slow fidelity degradation.
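The effect of weight discreteness can be explored with a simple quantizer. The sketch below assumes uniformly spaced levels in a symmetric range, which is one natural choice rather than the mapping used in the cited study; the function name and the range `w_max` are hypothetical.

```python
def quantize(w: float, w_max: float, levels: int = 33) -> float:
    """Round w to the nearest of `levels` equally spaced values in [-w_max, w_max]."""
    if levels < 3 or levels % 2 == 0:
        raise ValueError("levels must be odd and at least 3")
    half = (levels - 1) // 2              # 16 steps on each side of zero for 33 levels
    step = w_max / half
    clipped = max(-w_max, min(w_max, w))  # weights outside the range saturate
    return round(clipped / step) * step

# Quantize a fine grid of "continuous" weights and collect the distinct results.
distinct = {quantize(x / 1000.0, 1.0) for x in range(-1000, 1001)}
worst_error = max(abs(quantize(x / 1000.0, 1.0) - x / 1000.0) for x in range(-1000, 1001))
```

With L = 33 the rounding error never exceeds half a step, i.e., about 3% of the full weight range, which gives a feeling for why such a coarse grid may already be adequate for a trained perceptron.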
These results pertain to weights imported from a precursor network; this training method is quite sufficient, for example, for recognizing a known face in a large crowd, because it may be based on using multiple copies of the desired pattern, with a TV-raster-type search (Fig. 14). Estimates have shown 55 that such a CrossNet chip, with area below 1 cm$^2$, may identify a face on an 8-Mpixel image in approximately 100 s, the number to be compared with ∼$10^3$ s for the same algorithm run on a general-purpose microprocessor.
Global Reinforcement and TD Learning
Some cognitive tasks require unsupervised learning, in particular, global reinforcement with either instant or delayed reward. 36-38 56 A study of this mode of CrossNet operation 33 has shown that these networks are quite suitable for the most popular global-reinforcement algorithms, such as A ri. 56 Moreover, they may use similar algorithms (which have been called A 1 and A 1 ) based on synaptic rather than somatic randomness, which are more natural for nanodevice implementation of synapses. Figure 15 shows that these new algorithms provide just a slightly lower learning speed than A ri for the cart-and-pole balancing task, a popular benchmark for global reinforcement with delayed reward.
PROSPECTS, CHALLENGES, AND OPTIONS
Though studies of possible CrossNet applications are at the very beginning, it appears that these networks may be used for performing virtually any cognitive task which has been demonstrated using software-implemented neural nets, at very high speed (with manageable power dissipation). [...] (ii) Experimental networks used for high-performance studies of novel neuromorphic algorithms, both for modeling certain cortical functions and for the implementation of more complex practical cognitive tasks such as data mining or autonomous robotic operation in complex and hostile environments.
Moreover, it is possible (though by no means guaranteed) that in the future, CrossNet circuits will become the first hardware capable of challenging the mammalian (human?) cerebral cortex. This opportunity may be further enhanced by the recent suggestion of several quasi-3D 40 and genuinely-3D 41 versions of CMOL circuits (see, e.g., Figure 16).
However, in order for that to happen, numerous issues have to be resolved. First of all, the existing methods of fabrication of crosspoint devices with latching-switch functionality have to be improved to increase their yield and reduce device-to-device variability. It may happen that the solution of this problem will require latching switches with completely different physics of operation, for example, based on single-electron tunneling in molecular self-assembled monolayers. 57 Second, integration of hybrid circuits has to be demonstrated at a much larger scale than has been done so far, which in turn may require facing several technological challenges (such as fabrication of nanometer-sharp interconnect pins at temperatures allowable at the back end of the CMOS stack).
Last but not least, new CrossNet architectures, capable of performing more complex cognitive tasks, have to be suggested and explored-possibly assisted by modeling with simpler CrossNet circuits.
