A switched-capacitor (SC) neuromorphic system for closed-loop neural coupling in 28 nm CMOS is presented, occupying 600 µm by 600 µm. It offers 128 input channels (i.e., presynaptic terminals), 8192 synapses and 64 output channels (i.e., neurons). Biologically realistic neuron and synapse dynamics are achieved via a faithful translation of the behavioural equations to SC circuits. As leakage currents significantly affect circuit behaviour at this technology node, dedicated compensation techniques are employed to achieve biological-realtime operation, with faithful reproduction of time constants of several 100 ms at room temperature. Power draw of the overall system is 1.9 mW.
tion of fully digital implementations [10] , [11] , current neuromorphic systems are not able to participate in the technological advances and especially the system scaling offered by deep submicron processes.
These problems can be largely circumvented by using switched capacitor (SC) circuits [12] [13] [14] , which rely on charges and voltages to perform computation, not on currents. By replacing continuously flowing very small currents with their equivalent accumulated charge, equivalent signal levels are higher (and hence more controllable) and robust charge-based signal transmission and computation can be utilized.
We present a neuromorphic system realized in SC circuit technique in a Super Low Power 28 nm CMOS technology, operating with a 1 V supply. The system is targeted at a closed loop interface to in-vitro cortical neuron cultures. This necessitates mimicking the memory and short-term decision making dynamics of the cortical network [15] , [16] , with timescales on the order of several 100 ms [1] , [17] . We implement the model of short term dynamics presented in [18] , with transistor-level SC circuits derived from the high-level building blocks introduced in [18] . The logical organisation of the synaptic matrix was adapted from [19] .
Although the SC technique is inherently more robust to current and voltage noise than subthreshold circuits, on the timescales referred to above, stored charge signals can still be affected by the leakage currents of switching transistors. This effectively limits biological-realtime operation. Thus, we use a simplified version of the circuit techniques in [20], [21] to reduce leakage currents and achieve longer time constants. Compared to conventional biological interfacing solutions [22], [23], no digital processing chain is necessary. The behavioural models that allow the system to couple into biological dynamics are directly implemented as discrete-time analog state circuitry, driven by incoming action potentials (i.e., spikes). At the same time, the SC approach makes the system's behaviour widely digitally configurable. The use of 28 nm CMOS eases integration with low-power digital systems.
The remainder of the paper is structured as follows. First, we introduce the overall system, followed by its digital and analog building blocks. We show how biologically realistic neuron and synapse behaviour as well as biological-realtime operation is achieved with SC CMOS circuit techniques. We then give detailed measurement results on the overall system and its individual components. Lastly, we discuss the significance of the results.
II. IMPLEMENTATION

A. Overall System
Fig. 1 gives an overview of the system. 128 input circuits on the left side implement presynaptic short term dynamics for their respective row in the synaptic matrix [18], while the 64 Leaky Integrate and Fire (LIAF) neurons shown on the bottom are driven by their respective column, providing the output (i.e., stimulation) signal as a function of the 8192 synapses in the matrix coupling presynaptic inputs to the neurons. Synaptic weights are stored in a RAM block on the side of the matrix.
The entire driving circuitry of presynapses, synapses and neurons is situated on the left hand side of the matrix. In real-time operation, a state machine cycles through the columns of the synaptic matrix in 0.62 ms. At the start of the cycle, the input pulses that were registered during the last cycle are forwarded to the driver circuits and the corresponding presynaptic adaptation state is computed. Then, each synaptic column and its corresponding output neuron is activated sequentially, weighting the presynaptic pulses by the corresponding synapse state and synaptic weight, integrating them on the postsynaptic neuron and applying the leaky decay term to the neuron. Details on the cycle process can be found in Section II.D. In effect, the switched capacitor neuron and synapse matrix behaves as a fixed-timestep neural simulator with a 0.62 ms time resolution, with neuron and synapse states stored in the matrix and updates to the states carried out via the active driver circuits on the left side of the matrix [19], [24].
Fig. 2. 28 nm die picture with location of neuromorphic system outlined and the corresponding layout. The overall IC is 1.5 mm by 3 mm, with various test structures in addition to the neuromorphic system. The layout view shows detailed placement of the single building blocks of the neuromorphic system (see Fig. 1).
A picture of the manufactured IC is shown in Fig. 2 . The circuit design utilizes core devices of the SLP 28 nm technology only. In contrast to the current biasing usually employed in neuromorphic ICs [25] , the neuromorphic SC circuits are governed by voltages for amplitude settings and digital configuration for time constants. Correspondingly, there is a multi-output R-ladder based digital to analog converter (DAC) situated below the matrix in Fig. 2 . It provides the bias voltages for, e.g., postsynaptic current (PSC) scaling, neuron thresholds or reset voltages. To reduce the area of the DAC, neuromorphic elements have been assigned to groups sharing the same bias voltages. Group size is 16, so that neurons 0 to 15 share their biases, and synapse drivers and presynaptic adaptation 0 to 15 also share their biases, etc.
Time constants are set via counters that govern the switching cycles of the SC circuits. Thus, scaling of the clock frequency effectively scales the speed of the system, keeping the 0.62 ms resolution relative to the chosen time base. The neuromorphic system was designed for speeds from biological realtime up to an acceleration of 100. As the time constants scale with the clock and the DAC amplitude settings are independent of clock speed, the same configuration for all parameters can be used irrespective of the speed-up, nominally giving the same results. Communication with the system is provided by a joint test action group (JTAG) interface, implementing a generic packet-based protocol for both pulse and configuration data. Additionally, two configurable test outputs allow for monitoring analog voltages, such as membrane potentials. With its minimal interface, using only six signal pins and two bias pins (one bias current and one pin for common mode voltage), the neuromorphic system can be easily integrated into a multi-core system.
B. Digital System Design
Similar to the communication setup in [26] , [27] , the neuromorphic system employs a unified packet-based interface for configuration and incoming/outgoing pulse data. Data exchange is realized via an input first-in-first-out (FIFO) buffer and an output FIFO buffer. For system integration, only a write and read interface to these FIFOs has to be provided, which is done via JTAG in the current implementation. Each data packet has a 32 bit payload and a 16 bit header. For input data, the header includes 4 bits of type and 12 bits of address information. For output data, the header only contains a 5 bit type identifier.
Input spikes are sent to the neuromorphic system as addresses of 7 bits and one enable bit, so that four spikes fit in one data packet. Output spikes are collected over one matrix cycle and stored as one bit per neuron. If at least one neuron spiked in the current cycle, the 64 bit spike vector over all neurons is sent to the output FIFO, forming two separate entries. Similar to the grouping for the analog parameters, the digital parameters are also shared among groups of 16 each for presynaptic circuits and neurons, which reduces the digital configuration space.
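The packet layout described above can be illustrated with a minimal sketch. The field widths (7 bit address, one enable bit, four spikes per 32 bit payload, a 64 bit spike vector split into two entries) come from the text; the bit ordering inside the payload, the position of the enable bit, the word order of the two output entries and all function names are assumptions made purely for illustration.

```python
ADDRESS_BITS = 7                      # presynaptic address width (from the text)
ENABLE_MASK = 1 << ADDRESS_BITS       # enable bit placed above the address (assumed)
SPIKES_PER_PACKET = 4                 # four 8 bit fields per 32 bit payload
NUM_NEURONS = 64


def pack_input_spikes(addresses):
    """Pack up to four presynaptic addresses into one 32 bit payload."""
    if len(addresses) > SPIKES_PER_PACKET:
        raise ValueError("at most four spikes fit into one payload")
    payload = 0
    for slot, address in enumerate(addresses):
        if not 0 <= address < (1 << ADDRESS_BITS):
            raise ValueError("address must be in the range 0..127")
        # Slot 0 occupies the least significant byte (assumed ordering).
        payload |= (ENABLE_MASK | address) << (8 * slot)
    return payload


def unpack_input_spikes(payload):
    """Return the addresses of all enabled spike fields in a payload."""
    addresses = []
    for slot in range(SPIKES_PER_PACKET):
        field = (payload >> (8 * slot)) & 0xFF
        if field & ENABLE_MASK:
            addresses.append(field & (ENABLE_MASK - 1))
    return addresses


def split_spike_vector(spiked_neurons):
    """Turn the neurons that spiked in one matrix cycle into FIFO entries.

    Returns an empty list if no neuron spiked, otherwise two 32 bit
    words (low word first, an assumed ordering).
    """
    vector = 0
    for neuron in spiked_neurons:
        if not 0 <= neuron < NUM_NEURONS:
            raise ValueError("neuron index must be in the range 0..63")
        vector |= 1 << neuron
    if vector == 0:
        return []
    return [vector & 0xFFFFFFFF, (vector >> 32) & 0xFFFFFFFF]
```

Sending the full 64 bit vector only when at least one neuron spiked keeps the output FIFO empty during silent matrix cycles.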
All digital components and the SC circuits are clocked by an on-chip phase-locked loop (PLL) [28]. It produces an internal fixed frequency of 2 GHz that is downscaled to a 330 MHz output. The neuromorphic system employs an 8 bit configurable clock divider that allows for further downscaling of the clock frequency. Biological-realtime operation corresponds to a divider value of 100, i.e., a clock frequency of 3.3 MHz for the state machine of the neuromorphic system and a matrix update cycle of 0.62 ms, as mentioned in Section II.A. For the maximum speed-up factor of 100, the divider value is 1, resulting in a clock frequency of 330 MHz and a matrix update cycle of 6.2 µs. With respect to the update frequency of the matrix, the clock is somewhat high, which is due to the fact that non-overlapping switching signals for the SC-components are derived from it, see also the signal edges in Fig. 8. The maximum speed-up factor is partially limited by the high clock frequency needed for the digital components, but the actual limit is due to the RC time constants of the SC circuits, as explained later.
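The relation between divider value, clock frequency and matrix update cycle can be checked with a short calculation. The sketch below uses only the figures quoted above (330 MHz PLL output, divider value 100 and a 0.62 ms cycle in real time); the function names and the assumption that a divider value of 0 is not used are illustrative. Since 0.62 ms is a rounded figure, the derived number of clock periods per matrix cycle is approximate.

```python
PLL_OUTPUT_HZ = 330e6        # PLL output after downscaling from 2 GHz
REALTIME_DIVIDER = 100       # divider value for biological realtime
REALTIME_CYCLE_S = 0.62e-3   # matrix update cycle in real time


def system_clock_hz(divider):
    """Clock frequency of the state machine for an 8 bit divider value."""
    if not 1 <= divider <= 255:
        raise ValueError("divider must be in the range 1..255 (0 assumed unused)")
    return PLL_OUTPUT_HZ / divider


def speedup(divider):
    """Speed-up relative to biological realtime."""
    return REALTIME_DIVIDER / divider


def matrix_cycle_s(divider):
    """Matrix update cycle; a fixed number of clock periods, so it scales with the divider."""
    return REALTIME_CYCLE_S / speedup(divider)


def clocks_per_matrix_cycle():
    """Approximate number of clock periods in one matrix update cycle."""
    return REALTIME_CYCLE_S * system_clock_hz(REALTIME_DIVIDER)


if __name__ == "__main__":
    for divider in (100, 10, 1):
        print(divider, system_clock_hz(divider), matrix_cycle_s(divider))
```

With these figures, one matrix cycle spans roughly 2000 clock periods, i.e., on the order of 30 clock periods per synaptic column, which is consistent with the remark that the clock is high relative to the matrix update frequency.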
C. Presynaptic Adaptation and Synaptic Long-Term Plasticity
The presynaptic adaptation circuit (see Fig. 3) implements the model of synaptic dynamics proposed in [18], which is derived from biological measurements [29]. It is capable of reproducing depression, facilitation and combinations of both mechanisms. The circuit produces an output voltage $V_{\mathrm{PSC}}$, which represents the waveform of exponentially decaying PSCs:

$$V_{\mathrm{PSC}}(t) = \sum_{n:\, t_n \le t} A_n \, e^{-(t - t_n)/\tau_{\mathrm{PSC}}} \qquad (1)$$

where $A_n$ is the amplitude of the n-th PSC and $t_n$ its onset time. Since the short-term adaptation circuitry makes use of SC circuits, the resulting PSC voltage trace is time discrete. The time constant of the PSC decay $\tau_{\mathrm{PSC}}$, as well as the time constants for depression and facilitation, can be adjusted. The impact of the facilitation and depression mechanisms can each be controlled by a digital parameter. For details, please refer to [18]. For a list of configurable parameters and their tuning range see Table I.
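As an illustration of the time-discrete PSC trace, the following sketch updates a PSC value once per 0.62 ms matrix cycle: the value decays by a fixed factor per step and is incremented when a presynaptic spike arrives. It covers only the exponential decay; the adaptation of the amplitudes follows the model in [18] and is not reproduced here, so the amplitudes are simply passed in. All names and the example values are hypothetical.

```python
import math

DT = 0.62e-3  # time resolution of the matrix cycle in real time (seconds)


def psc_trace(spike_steps, amplitudes, tau_psc, n_steps, dt=DT):
    """Discrete-time sum of exponentially decaying PSCs.

    spike_steps: step indices at which presynaptic spikes arrive
    amplitudes:  PSC amplitude of each spike (same order as spike_steps)
    tau_psc:     PSC decay time constant in seconds
    """
    decay = math.exp(-dt / tau_psc)        # decay factor per time step
    amplitude_at = dict(zip(spike_steps, amplitudes))
    value = 0.0
    trace = []
    for step in range(n_steps):
        value *= decay                     # exponential decay over one step
        value += amplitude_at.get(step, 0.0)  # add the PSC of an arriving spike
        trace.append(value)
    return trace


if __name__ == "__main__":
    # Two spikes 20 steps apart with an assumed 10 ms decay time constant.
    trace = psc_trace([0, 20], [1.0, 0.8], tau_psc=10e-3, n_steps=40)
    print(trace[0], trace[19], trace[20])
```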
The long term plasticity model chosen for this neuromorphic system is the stochastic stop learning synapse of [30]. It is based on modifying the synaptic state as a function of the presynaptic spike and the postsynaptic membrane voltage. In our implementation of the rule, when a column gets activated during the matrix cycle, the analog synaptic state held in the corresponding synaptic capacitance in the synaptic matrix is read out. It is then modified according to the equations in [30], with appropriate configurable parameters. As this paper focuses on the overall neuromorphic system and its static operation, the reader is referred to the companion paper [31] for an in-depth circuit description and detailed measurements of long- and short-term plasticity in this neuromorphic system.
TABLE I LIST OF PARAMETERS FOR SHORT-TERM ADAPTATION AND NEURON CIRCUIT
Based on the synapse state, the PSC amplitude is then modified by a weight scaling circuit. There is one of these per synapse row, located in the presynaptic adaptation circuit. As can be seen in Fig. 4, all synapses are addressed sequentially by a state machine (see Fig. 8 for the timing diagram, which also shows the entire matrix cycle in relation to the synapse and neuron driver signals). Corresponding to the synapse address, 4 bit long-term potentiation (LTP) and long-term depression (LTD) weight values are read from a RAM. To scale the presynaptically computed PSC by the long term plasticity, the synapse state is collapsed into a binary state value, which can be either potentiated or depressed [30]. Depending on this synapse state, the 4 bit LTP or LTD weight is then selected by a multiplexer. The switches at the four binary-weighted capacitors are closed according to the given weight value. After selecting a synapse, the weight capacitors are initially reset to 0 V. In the following integration phase the differential PSC voltage is applied to the input of the weight scaling circuit. The charge is then transmitted by the capacitors to the neuron circuit (see Section II.D). Additionally, the weight scaling circuit offers the possibility to configure the synapses as either inhibitory or excitatory. This configuration bit is also stored in RAM.
While presynaptic drivers for almost all synapses are activated by an incoming pulse, synapse row 127 is always active, with a constant charge. This charge can be modulated individually for each neuron by setting the synaptic weight of row 127 and the column corresponding to the neuron. This way, a constant background current with a 4 bit weight and inhibitory or excitatory effect can be set.
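The weight scaling step can be summarized in a small behavioural sketch: the binary synapse state selects the LTP or LTD weight, the four weight bits connect binary-weighted capacitors, and the resulting charge is signed according to the excitatory/inhibitory configuration bit. This is an idealized model, not the circuit of Fig. 4; the unit capacitance `c_unit`, the function names and the sign convention are assumptions for illustration.

```python
def select_weight(potentiated, w_ltp, w_ltd):
    """Multiplexer: choose the 4 bit LTP or LTD weight from the binary synapse state."""
    return w_ltp if potentiated else w_ltd


def weighted_charge(v_psc, weight, c_unit, excitatory=True):
    """Charge delivered to the neuron for one synapse (idealized).

    v_psc:  differential PSC voltage applied to the weight scaling circuit
    weight: 4 bit weight; bit i closes the switch of a capacitor of 2**i * c_unit
    c_unit: smallest weight capacitor (hypothetical value, in farads)
    """
    if not 0 <= weight <= 15:
        raise ValueError("weight must be a 4 bit value")
    # Sum the binary-weighted capacitors whose switches are closed.
    c_total = sum((1 << bit) * c_unit for bit in range(4) if (weight >> bit) & 1)
    charge = c_total * v_psc
    return charge if excitatory else -charge


if __name__ == "__main__":
    w = select_weight(potentiated=True, w_ltp=12, w_ltd=3)
    print(weighted_charge(v_psc=0.1, weight=w, c_unit=1e-15))
```

Because the capacitors are binary weighted, the delivered charge is ideally proportional to the 4 bit weight value.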
D. Switched Capacitor Neuron
The neuron circuit implements an LIAF neuron model:

$$C_{\mathrm{mem}} \frac{dV_{\mathrm{mem}}}{dt} = -\frac{C_{\mathrm{mem}}}{\tau_{\mathrm{mem}}} V_{\mathrm{mem}} + I_{\mathrm{PSC}} \qquad (2)$$

with membrane potential $V_{\mathrm{mem}}$, membrane capacitance $C_{\mathrm{mem}}$ and membrane time constant $\tau_{\mathrm{mem}}$. $I_{\mathrm{PSC}}$ is the sum of all PSCs. If $V_{\mathrm{mem}}$ reaches the firing threshold, a spike is emitted and the membrane is reset to the reset voltage. The parameters of the LIAF neuron and their tuning ranges are listed in Table I. As can be seen in Fig. 5, the 64 fully-differential membrane circuits are located on one row and share one driver circuit. The membrane circuits are sequentially switched to active and the PSC outputs of all 128 weight scaling circuits (Fig. 4) are summed as a charge on a global summing node. The charge on this node is integrated on the currently selected membrane capacitance by the driver circuit, which is basically an SC integrator.
The integrator's opamp circuit is shown in Fig. 6. A two-stage architecture has been chosen to overcome the difficulties of stacking transistors at very low supply voltages. In order to enhance the opamp's gain, a boosting technique has been applied [32], where the load of the first stage has been split into cross-coupled transistors, providing partial positive feedback. Stability is achieved by Miller compensation and the common-mode voltage of the output stage is controlled by an SC common-mode feedback circuit. Slew rate performance is enhanced by additional source followers at the output, which is required at high speed-up factors. The bias current scales well with the speed-up, so that the opamp consumes 300 nW at biological realtime and 30 µW at a speed-up of 100.
For biological-realtime operation, large membrane time constants on the order of 100 ms are required. Since leakage currents heavily increase when scaling technologies down below 100 nm [33], a dedicated low-leakage switch similar to those in [20] and [21] has been used [see Fig. 7(a)], which operates as follows. If the membrane circuit is inactive, the membrane capacitance is fully decoupled from the rest of the circuit by turning off M1 and M2. The middle node of the T-switch is set to the common-mode voltage. This reduces the drain-source voltage and thus the channel leakage. In order to decrease junction leakage, minimally sized source/drain areas have been used. While this sizing aids in real-time operation of the matrix, it also defines the upper limit of the speed-up compared to biological realtime (i.e., the factor 100 mentioned in Section II.A), as the switch resistance determines the RC time constant of the SC circuits, limiting the charge transfer speed.
A further advantage of the low-leakage switch is that the decoupling via the middle node makes the leakage currents independent of the opamp output voltage. Gate leakage has no impact in the off-state of the switch and simulations have shown that the effect in the short on-states is negligible. In contrast to the complementary transmission gates of [20], [21], the voltage range chosen here allows the use of NMOS devices only, reducing leakage currents and circuit complexity. Regular transistors from the core library were used, rather than dedicated low-devices as in [21]. Compared to [20], the middle node is held at the common-mode voltage to reduce channel leakage caused by the drain-source voltage. The presynaptic adaptation circuit of Section II.C also employs this low leakage switch to achieve its time constants.
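While the circuit combats undesired leakage currents, it is also used to implement an intentional, configurable leakage mechanism to complete the LIAF neuron model. This is directly implemented in the individual membrane circuits [see Fig. 7(c)]. A small leakage capacitance $C_{\mathrm{leak}}$ is discharged and then shunted to the membrane capacitance $C_{\mathrm{mem}}$ (both in the femtofarad range), leading to a charge equalization. This process is triggered periodically and thus lets the membrane voltage decay exponentially towards 0 V differential voltage when no synaptic input is applied. The membrane time constant is controlled by the switching frequency, alternatively expressed as the period $T_{\mathrm{leak}}$ between leakage events:

$$\tau_{\mathrm{mem}} = \frac{T_{\mathrm{leak}}}{\ln\left(1 + C_{\mathrm{leak}}/C_{\mathrm{mem}}\right)} \approx \frac{C_{\mathrm{mem}}}{C_{\mathrm{leak}}}\, T_{\mathrm{leak}} \quad \text{for } C_{\mathrm{leak}} \ll C_{\mathrm{mem}} \qquad (3)$$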
Since the period between leakage events is derived from the system clock, the membrane time constant and all other time constants generated by SC circuits on the chip scale with the clock period, i.e., inversely with the speed-up factor. Fig. 8 shows the four control signals used by the membrane circuit of column 0.
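The switched-capacitor leak can be illustrated with a short behavioural sketch. Each leakage event shares the membrane charge with a previously discharged leakage capacitor, which multiplies the membrane voltage by a fixed factor; repeating this periodically yields an exponential decay whose time constant follows directly from charge conservation. The capacitance values used in the example are placeholders (the actual values are not reproduced here), and all function names are hypothetical.

```python
import math


def leak_step(v_mem, c_mem, c_leak):
    """One leakage event: the discharged C_leak is shunted to C_mem (charge sharing)."""
    return v_mem * c_mem / (c_mem + c_leak)


def membrane_tau(t_leak, c_mem, c_leak):
    """Effective membrane time constant for leakage events every t_leak seconds."""
    return t_leak / math.log(1.0 + c_leak / c_mem)


def liaf_step(v_mem, q_in, c_mem, v_th, v_reset):
    """Integrate the summed PSC charge, then compare against the threshold.

    Returns the new membrane voltage and whether a spike was emitted.
    """
    v_mem += q_in / c_mem
    if v_mem >= v_th:
        return v_reset, True
    return v_mem, False


if __name__ == "__main__":
    # Placeholder values: 100 fF membrane, 1 fF leak capacitor, 1 ms leak period.
    c_mem, c_leak, t_leak = 100e-15, 1e-15, 1e-3
    print("tau_mem =", membrane_tau(t_leak, c_mem, c_leak), "s")
    v = 0.1
    for _ in range(10):
        v = leak_step(v, c_mem, c_leak)
    print("membrane voltage after 10 leakage events:", v)
```

For a leak capacitor much smaller than the membrane capacitance, the time constant is approximately the leak period multiplied by the capacitance ratio, so long time constants can be reached either with a small capacitance ratio or with infrequent leakage events.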
In order to avoid a permanent integration of the opamp's offset voltage generated by device mismatch, an offset compensation technique has been applied [34]. In the reset phase, unity gain feedback is applied to the opamp. Thus, the output offset voltage is visible at the input. Since the input side is reset at this time, the opamp offset is sampled on the compensation capacitance. In the following integration phase, i.e., the phase where the PSC charges are transferred to the membrane, the opamp offset is subtracted from the input voltage.
In the comparison phase, the membrane voltage is compared against the firing threshold. The comparator circuit consists of an offset compensated preamplifier and a dynamic latch. If the membrane voltage exceeds the threshold, a spike is detected and the membrane voltage is reset. Due to the single-ended nature of the threshold and reset bias voltages, the reset is done in an asymmetric fashion, but is compensated by the opamp's common-mode feedback.
III. RESULTS
As detailed in Section II.B, the entire system is ratiometric with respect to the clock frequency. That is, the system clock can be scaled so that the system operates anywhere from biological realtime up to a factor of 100 faster. Realtime operation was used for the measurements in this paper, as the effectiveness of the leakage current techniques becomes most evident there. In addition, operation in biological realtime is the most interesting regime in terms of computation, as it allows interfacing with, e.g., neuromorphic image sensors in real time. The IC and its board are operated at ambient temperature, i.e., no special measures are undertaken to cool the IC. 
A. Measurement of the Presynaptic Adaptation
For measuring the presynaptic adaptation circuits, the two analog test outputs were captured using an oscilloscope, allowing the simultaneous measurement of the PSC voltage of the first presynaptic circuit and the membrane voltage of one neuron. The acquired data was averaged over time bins of 0.1-0.3 ms to reduce the effect of noise. Fig. 9 shows an example of a single postsynaptic potential (PSP). Compared to the expected α-shaped curve, the measurement shows a slightly sharper onset, indicating a mismatch in the actual time constants from the nominal values. The corresponding PSC waveform matches the nominal time constant well (see upper plot in Fig. 9). Thus, the mismatch can be attributed to a mismatch in the membrane leakage. As the leakage mechanisms and capacitance sizes are the same in both cases, we attribute the additional leakage in the membrane to the 128 connected PSC outputs.
To evaluate the presynaptic adaptation performance, we stimulated a presynaptic circuit with a regular spike train, choosing various parameter settings to mimic different adaptation types. Results are shown in Figs. 10 and 11. The measurements agree well with the nominal time courses even without calibrating any parameters (note that for the nominal curves, only the offset of the read-out amplifier was fitted). They differ mainly in the adaptation strength, i.e., in the ratio between highest and lowest PSC amplitude, which is smaller in the measured curves. This effect is most prominent for the depressing synapse. Also, for the synapse with combined facilitation and depression, the total amplitude is approximately 20% too small, see lower half of Fig. 11. These effects may be caused by charge injection effects, resulting in voltage offsets during updates of the adaptation variables at incoming spikes. Part of the effects may also be explained by the effective time constants being too small. To distinguish between these two effects, we measured the relaxation of a depressed synapse when adaptation was turned off, see lower plot in Fig. 10. The PSC amplitudes should progress according to an exponential with the depression time constant in this case. This resembles the measurements well. From this, we infer that the mismatch seen for the depressing synapse is mainly caused by deviating update amplitudes of the depression variable.
Overall, the measurement results show that leakage, charge injection and capacitance mismatch only have a minor impact on the time course of the state variables, showing faithful reproduction of time constants on the order of several hundred milliseconds.
B. Characterization of the LIAF Neuron
We measured the transfer functions of the LIAF neurons in order to characterize variations between neurons and performance of the leakage circuit. Single neurons were stimulated with regular spike trains at different rates and their output rates were measured over a period of 10 seconds. Fig. 12 shows results for all 64 neurons of one chip with leakage switched off. As expected, the curve increases linearly at low rates, while saturating at high rates, which is caused by saturation of the PSC voltage of the presynaptic circuits. The overall variation between neurons is quite low. A few neurons generally exhibit a lower output frequency. This is especially the case for neuron 0, which may be affected by additional parasitic capacitance at the border of the synaptic matrix. Fig. 13 summarizes measurements for different membrane time constants. As shown in the upper graph, the onset of the transfer functions varies with the time constant setting, as expected for an LIAF neuron. Ideally, the onset frequency should be inversely proportional to the membrane time constant. We used this relationship to compare the effective time constant with the configured settings. The onset frequency was determined for each transfer function by performing a linear fit in the output frequency range of 50 Hz to 150 Hz, not taking onset and saturation effects into account. Results are shown in the lower graph of Fig. 13. Note that this method is less accurate at larger time constants, where the onset frequency is close to zero, so that small absolute deviations in frequency result in high deviations in the final result. The effective time constants follow the nominal setting linearly for low values, while the slope of the curve decreases at higher values. This effect may be caused by leakage, but may as well be due to a systematic offset in the transfer functions.
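The onset-frequency extraction described above can be sketched as follows: a straight line is fitted to the part of the transfer function whose output rate lies between 50 Hz and 150 Hz, and the input rate at which this line crosses zero output rate is taken as the onset. This is a minimal ordinary least-squares sketch with hypothetical names; extrapolating the fitted line to zero output rate is our reading of the procedure, and the actual evaluation scripts are not part of the paper.

```python
def onset_frequency(f_in, f_out, lo=50.0, hi=150.0):
    """Estimate the onset input rate of a neuron transfer function.

    f_in, f_out: input and output rates in Hz (same length)
    lo, hi:      output-rate window used for the linear fit
    """
    points = [(x, y) for x, y in zip(f_in, f_out) if lo <= y <= hi]
    if len(points) < 2:
        raise ValueError("need at least two points inside the fit window")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    s_xx = sum((x - mean_x) ** 2 for x, _ in points)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = s_xy / s_xx                  # least-squares slope
    intercept = mean_y - slope * mean_x  # least-squares intercept
    return -intercept / slope            # input rate where the fitted line crosses zero


if __name__ == "__main__":
    # Synthetic transfer function with an onset at 30 Hz and a slope of 2.
    rates_in = list(range(0, 201, 5))
    rates_out = [max(0.0, 2.0 * (x - 30.0)) for x in rates_in]
    print(onset_frequency(rates_in, rates_out))
```

Since the onset frequency is ideally inversely proportional to the membrane time constant, the ratio of two onset estimates gives the ratio of the corresponding effective time constants.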
C. Characterization of the Synaptic Transfer Function
In order to characterize the synaptic transfer function, a fixed-rate pulse train is applied to a single synapse and the resulting firing rate of the postsynaptic neuron is measured. The neuron is configured for integrate-and-fire behaviour (with the membrane time constant set to infinity) to achieve a linear relation between input and output firing rate. As can be seen in Fig. 14, the individual curves show a smooth progression in output firing rate for an increase in input rate. Due to the PSC saturation effect mentioned in Section III.B, the relation between input and output firing rate declines to below linear for high input rates. As can be seen from the curve intercept on the output frequency axis, a constant background current is applied to the neuron (via synapse row 127, compare Section II.C) that sets its unstimulated firing rate at circa 80-105 Hz. From Fig. 13, it can be seen that the neuron reacts well to very low rates of synaptic input even without background current. However, if the neuron intrinsically fires at a low rate for low input firing rates, charge injection and other small-signal detrimental effects partially mask the effect of a synaptic weight increase. Thus, this background current is applied to set the intrinsic neuron firing to a high rate, enabling the analysis of the synaptic weight on a 4 bit resolution level.
TABLE II COMPARISON OF THE PRESENTED NEUROMORPHIC NEURAL INTERFACE WITH OTHER GENERAL-PURPOSE NEUROMORPHIC WORK (UPPER PART)
When sweeping the synaptic weight in Fig. 14, the curves exhibit a linear progression in slope, showing the 4 bit accuracy of the synaptic weight scaling capacitances in Fig. 4. For the plots in Fig. 15, the slopes of the curves in Fig. 14 are derived by fitting a linear function to the data points from 0 to 100 Hz input frequency. The blue dashed line shows the slopes derived for the synapse in Fig. 14. This weight sweep was carried out for 20 synapses of one neuron on a single chip. As can be seen from the sample curve, the slope progression across synaptic weights is actually far better behaved than the error bars in Fig. 15 would suggest. The large spread of curves is mainly due to the scaling error of the PSC voltage (compare Fig. 4). This error tends to even out when using several PSC circuits, as for the measurements in Section III.B, which use several PSC inputs and thus do not show such a large spread.
The PSC voltage scaling can also be calibrated to some extent via the individual DAC settings. However, this was not carried out for the above characterization, as we wanted to obtain an estimate for the typical spread that can be expected on a single chip between the individual synapses when used without calibration.
Note that this spread of transfer functions due to the presynaptic mismatch is not necessarily detrimental; it could be exploited in the context of, e.g., liquid computing [35] or in the Neural Engineering Framework (NEF) [36] , which both rely on random projections via synaptic and neuronal mismatch. However, both need well-controlled readout weights to collapse the random projections. Thus, the 4 bit weight resolution as shown in Fig. 15 together with the ability to set each synapse excitatory or inhibitory could be applied in the NEF to sophisticated population-based signal processing [37] .
D. Overall Results
The characterization results reported in the previous sections show that all components of the system, such as presynaptic adaptation, synapses and neurons are fully functional. Table II gives a comparison with state-of-the-art conventional neuromorphic systems and those targeted at biological interfaces.
Using the mixed-signal SC approach, we could aggressively scale down the neuromorphic system, taking full advantage of technology shrink. As synapse area is a major determinant of overall system size for neuromorphic systems [38] , we have included synapse area in the comparison. As expected, our implementation exhibits full technology shrink when compared with for example the synapse area of [39] .
Conventional neuromorphic systems based on subthreshold circuits [39] usually do not scale that well, as transistors need to be a certain minimum size to control mismatch [40], [41]. There are efforts to overcome this barrier by implementing synapses using analog floating gate storage [42], which is largely immune to mismatch. It could be worthwhile to explore this approach in advanced technology nodes, as floating gates continue to be scaled. However, it is not clear whether the precise storage of analog values required for this approach scales to deep submicron technologies. Current examples of this technique are still implemented in nodes around 350 nm [43], so absolute synapse sizes are still a factor of 10 larger than in our implementation [38]. The neuromorphic system of [11] in 45 nm only contains externally-programmable 1-bit synapses in the same overall area and power budget. Thus, even compared to a purely digital neuromorphic system in deep-submicron, our SC system delivers the same or better computational density at a competitive power consumption, see Table II.
Fig. 16. Power measurements for the different supply voltage domains versus the employed speed-up factor. The central clock divider was set inversely proportional to the speed-up factor, such that the resulting clock frequency scaled linearly with the speed-up (see Section II.B). The analog power draw is dominated by the OpAmps (see Section II.D). Their external bias current was chosen such that the system was still operational at the selected speed-up factor. The power consumption is largely independent of the input or output spike rates.
As shown in Fig. 16, the power consumption of the digital circuit parts dominates overall power draw in real-time operation (speed-up factor 1 in the diagram). Note that the design was not primarily optimized for low power, meaning that all the digital components of the whole IC (not just the neuromorphic system) are permanently connected to the digital supply voltage (with only the clock of the neuromorphic system being switched on), which results in a static power draw of approximately 1.1 mW at nominal supply voltage of 1 V. Furthermore, the design was not optimized for aggressive supply voltage scaling, as done in [11]. Therefore, the lowest digital supply voltage where the digital parts operate without errors is 0.75 V. We thus performed measurements both at 0.75 V and at the nominal supply voltage of 1.0 V. As expected, the lower supply voltage reduces digital power draw by almost a factor of two. As described in Section II.A, our neuromorphic system can be scaled in speed by varying the clock frequency, so that experiments can be performed either in realtime for interfacing to real-world sensors, or in accelerated time for reducing simulation time. When moving to accelerated simulations, contributions to the power budget change, as can be seen in Fig. 16. Both dynamic digital and analog power increase approximately linearly with the speed-up factor, the latter because of the increasing required bandwidth of the opamps in the analog part. As a consequence, analog power dominates for high speed-up factors. This power draw could be reduced by switching off the opamps during switching phases when their outputs are not used. The static power draw of the PLL could be reduced if instead of the current fixed-frequency PLL and subsequent clock divider, a variable-frequency PLL such as that in [44] was used, where power consumption scales with the output clock frequency.
If power consumption is normalized with respect to the speed-up of the simulation, effective power consumption reduces from 1.9 mW in realtime operation to approximately 0.15 mW for a speed-up factor of 100. In other words, the energy required for emulating a spiking neural network for one second reduces from 1.9 mJ to 0.15 mJ. This is mainly due to reduced influence of static power. Thus, accelerated simulations could be used for increasing energy efficiency for applications that do not require real-time operation. When assuming that all neurons fire with their maximum frequency of 1 kHz in real-time operation (resulting in 100 kHz at a speed-up factor of 100), the above values correspond to 30 nJ/spike in real-time operation and 2.3 nJ/spike at a speed-up factor of 100. This number is well within the range otherwise reported for power-optimized subthreshold architectures, see Table II. The value given for [11] counts only the incremental increase in power consumption per additional spike. If the metric of our system (overall power consumption divided by cumulative spike rate) is applied, the energy per spike would be 19 nJ [45].
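The energy-per-spike figures follow from dividing the overall power draw by the cumulative spike rate of all 64 neurons. The sketch below reproduces this arithmetic; the wall-clock power at a speed-up of 100 (about 15 mW) is not quoted directly but derived here from the stated 0.15 mJ per emulated second, and the function names are hypothetical.

```python
N_NEURONS = 64


def effective_power_w(wall_clock_power_w, speedup):
    """Power normalized to emulated (biological) time."""
    return wall_clock_power_w / speedup


def energy_per_spike_j(wall_clock_power_w, rate_per_neuron_hz, n_neurons=N_NEURONS):
    """Overall power divided by the cumulative spike rate of all neurons."""
    return wall_clock_power_w / (rate_per_neuron_hz * n_neurons)


if __name__ == "__main__":
    # Real time: 1.9 mW, every neuron firing at 1 kHz.
    print(energy_per_spike_j(1.9e-3, 1e3))       # about 30 nJ per spike
    # Speed-up 100: 0.15 mJ per emulated second corresponds to about 15 mW wall-clock power.
    wall_clock_power = 0.15e-3 * 100
    print(effective_power_w(wall_clock_power, 100))
    print(energy_per_spike_j(wall_clock_power, 100e3))  # about 2.3 nJ per spike
```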
In contrast to a fully addressable synaptic matrix [39] , our architecture inherently relies on feeding all synapses of a given row with the same presynaptic input. In this way, all components of the synapse circuit that depend solely on the input spikes are shared between synapses and are thus implemented only once per row, which greatly reduces overall circuit area compared to a fully addressable synaptic matrix. Memristive arrays [46] , [47] inherently use the same architecture, as their implementation as a crossbar does not allow for individual presynaptic circuits inside the synaptic matrix.
Driving all the synapses of a row with the same presynaptic input poses constraints on the realizable connection topologies. Networks that employ all-to-all connectivity or similar topologies with high local connection density can be realized efficiently, whereas for topologies with low connection density, only a fraction of the synapses in the matrix are used. Dedicated mapping algorithms can partially compensate for these restrictions [10] , [48] . Increasing the number of presynaptic circuits and letting individual synapses choose between several (e.g., two) inputs also greatly reduces the imposed constraints and makes the architecture well-suited even for topologies with low connection density, as demonstrated in [19] . At the same time, this concept retains the original approach of shared presynaptic circuits, so that the implementation presented in this paper could be easily extended in this way.
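The effect of connection density on synapse usage can be illustrated with a small calculation. The sketch below assumes the simplest possible mapping, in which each presynaptic source occupies exactly one of the 128 rows and each neuron one of the 64 columns, so that a connection uses the synapse at their crossing. This mapping, the 10 % density of the sparse example, and all names are assumptions for illustration; the sketch does not model the dedicated mapping algorithms or the multi-input synapses mentioned above.

```python
import random

N_ROWS = 128   # presynaptic input channels (one shared presynaptic circuit per row)
N_COLS = 64    # neurons (one column of synapses per neuron)


def utilization(connections):
    """Fraction of the 128 x 64 synapses used by a set of (pre, post) connections.

    Assumes a one-to-one assignment of sources to rows and neurons to columns.
    """
    used = {(pre, post) for pre, post in connections
            if 0 <= pre < N_ROWS and 0 <= post < N_COLS}
    return len(used) / (N_ROWS * N_COLS)


# All-to-all connectivity uses every synapse of the matrix.
all_to_all = [(pre, post) for pre in range(N_ROWS) for post in range(N_COLS)]

# Hypothetical sparse topology: each possible connection exists with probability 0.1.
rng = random.Random(0)
sparse = [conn for conn in all_to_all if rng.random() < 0.1]

print(f"all-to-all: {utilization(all_to_all):.2f}")
print(f"sparse:     {utilization(sparse):.2f}")
```

Under this naive mapping the sparse network leaves roughly 90 % of the synapses idle, which is the inefficiency that mapping algorithms or selectable presynaptic inputs are meant to reduce.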
In terms of interfacing to biological tissue, our approach is similar to [1] , i.e., it concentrates on the behavioural dynamics, while using conventional lab equipment to detect and record biological spikes and convert spikes of the SC neurons back into stimulation signals. A reasonable level of verisimilitude in the reproduction of physiological behaviour is needed in an interface to neural tissue [1] . The chosen short term plasticity has a firm grounding in biological measurements [29] . The long term plasticity rule chosen for this implementation has a more theoretical background, with only limited support from biological evidence [30] . However, our SC implementation is by no means restricted to this single plasticity rule. In particular, the faithful reproduction of pre- and postsynaptic waveforms (evident for example in Figs. 9 and 11 ) could also be employed by a plasticity rule based on neuronal waveforms such as that in [24] , which aims at the replication of a wide range of biological plasticity experiments [16] .
IV. CONCLUSION
We have constructed a mixed-signal neuromorphic system implemented in the 28 nm node. The use of switched capacitor circuit techniques together with dedicated low-leakage optimization allows us to achieve biological-realtime operation. The SC implementation in 28 nm enables very aggressive area scaling without compromising analog performance (see Section III). As can be seen from Table II , its power budget is competitive with recent power-optimized digital or analog neuromorphic systems [11] , [39] .
In terms of neural recording and stimulation, the high-density system integration in a 28 nm technology, the realistic synaptic and neuronal dynamics and moderate power dissipation make our system a good candidate for future implanted closed loop interfaces (when enhanced by amplifiers and spike detectors). Compared to the biologically targeted neuromorphic system presented in [1] , which is optimized for application as a spinal cord central pattern generator, our system contains significantly more neurons and various adaptation mechanisms, which are widely configurable. In collaboration with the group of S. Marom, one of the target uses of our system is replicating the experiment described in [17] with one of the biological networks in the chain replaced by a hardware network. Overall, our system fits very well with this intended usage in the context of biological interfaces.
However, the presented neuromorphic system is by no means restricted to biological interfaces. The versatility and configurability of the implemented neuron and synapse circuits can be used in general neuromorphic processing comparable to [8] , [9] . One interesting use may be in an integrated adaptive vision system [6] that directly incorporates the high-density neuromorphic processing with deep-submicron pixel cells [49] . The configurable-timescale waveform generation shown for example in Figs. 9 or 10 could also be used as a high-density driver for nanoscale memristive arrays [47] , [50] , [51] .
Currently, he is a Project Manager with the Chair of Highly-Parallel VLSI-Systems and Neuromorphic Circuits, Technische Universität Dresden, working on silicon integration of low power SoCs in deep-submicron CMOS technologies. His special research interests include low power and high-performance circuits for clocking and data transmission. He has authored or coauthored more than 30 papers and four patents.
Dr. Höppner has acted as a reviewer for various IEEE conferences and journals.
Georg Ellguth received the Dipl.-Ing. (M.Sc.) degree in electrical engineering from Technische Universität Dresden, Dresden, Germany, in 2004.
Since 2004, he has been a Research Assistant with the Chair of Highly-Parallel VLSI-Systems and Neuromorphic Circuits, Technische Universität Dresden. His research interests include low-power implementation techniques in multi-processor system-on-chip.
