Abstract: To advance our understanding of the functioning of neuronal ensembles, systems are needed that enable simultaneous recording from a large number of individual neurons at high spatiotemporal resolution and good signal-to-noise ratio. Moreover, stimulation capability is highly desirable for investigating, for example, plasticity and learning processes. Here, we present a microelectrode array (MEA) system on a single CMOS die for in vitro recording and stimulation. The system incorporates 26,400 platinum electrodes, fabricated by in-house post-processing, over a large sensing area (3.85 × 2.10 mm²) with sub-cellular spatial resolution (pitch of 17.5 µm). Owing to an area and power efficient implementation, we were able to integrate 1024 readout channels on chip to record extracellular signals from a user-specified selection of electrodes.
I. INTRODUCTION

EXTRACELLULAR RECORDINGS of the electrical activity of neural and cardiac cell networks in organs such as the brain, the retina, or the heart can provide a wealth of information about the physiology as well as the pathological degenerations that may cause diseases, such as Parkinson's or Alzheimer's. Microelectrode arrays (MEAs) have been used for a long time for in vitro extracellular recordings of electrogenic cell cultures and tissues, such as acute or organotypic brain slices and retinae [1]-[3]. They provide simultaneous multisite recording capability, which is essential to study cellular interconnections and network properties that arise from synchronized cellular activity [4], [5]. However, passive MEAs, which typically include metal electrodes on a glass substrate, are limited in both the number of electrodes (usually less than 300) and the spatial resolution (typically 30 µm), features that are needed to reconstruct large neural networks at cellular detail.
With CMOS technology, these limitations can be overcome by using multiplexing techniques, which enable access to a large number of closely-spaced electrodes to obtain large sensing areas at high spatial resolution [6] . Moreover, the monolithic integration of recording amplifiers and ADCs, on the same substrate with the electrodes, avoids off-chip parasitics and interference and, at the same time, allows for realizing a large number of recording channels with a low number of connections. Whereas many neural acquisition chips, mostly for in vivo applications, have been designed in the last decade (see for example [7] - [14] ), only a few implementations of CMOS microelectrode systems have been realized to date. Neural acquisition systems typically interface with arrays or micro-needle probes consisting of only a few hundred electrodes. CMOS MEAs, on the other hand, feature in excess of several thousand recording sites. It is therefore important to provide the capability of recording from such large arrays, while maintaining good signal characteristics. Currently available CMOS MEAs, however, are limited in either spatial resolution [15] , [16] , noise performance [16] - [18] , or readout channel count [15] , [19] .
One category of CMOS MEAs is based on an active-pixel sensor (APS) architecture [15] - [18] . Since the area for the analog front-end (AFE) amplifier is limited by the pixel size, this scheme results in a tight trade-off between noise, power consumption and spatial resolution. In addition, as all electrodes, even those without significant neural signals, are scanned, the full-frame rate is typically less than 10 kHz [16] , [18] , due to power constraints. Higher sampling rates are desirable to reconstruct the fast transient of the spike waveforms. In [16] and [18] a subset of the array can be scanned at an increased rate. Nevertheless, the selection is not flexible enough to adapt to complex morphologies or regions of interest.
Another approach is to employ an analog switch matrix [19] to continuously connect a subset of the electrodes to readout units located outside of the sensing area. Each pixel of the sensing array only contains switches and SRAM cells, leading to a high spatial resolution. The relaxed area constraints for the AFE allow for the implementation of amplifiers with lower noise and anti-aliasing filters. The system proposed in [19] is limited to 126 channels for simultaneous recordings, which makes the analysis of large neural networks difficult and time consuming. As an example, 100 subsequent acquisitions are required to cover an array area of 1.8 × 2.0 mm² [20].
In this paper, we present a recently developed CMOS MEA system that further exploits the switch-matrix approach. The system preserves sub-cellular spatial resolution over a large sensing area (8.09 mm²) and features 1024 channels for recording at high temporal resolution (20 kS/s). Despite an eight-fold increase in the channel count with respect to [19], state-of-the-art noise performance has been achieved (2.4 µVrms), owing to an area and power efficient design of the circuitry for amplification and A/D conversion. Moreover, the cross-over distortions and channel-length modulation effects, which had been observed in the design of the stimulation unit in [21], have been largely eliminated. The routing flexibility provided by the switch matrix has been substantially improved, e.g., the high-density blocks can be 5 times larger than those in [19]. Post-CMOS fabrication of electrodes and biocompatible die-bonding and encapsulation have been used to obtain a device that can be handled like a standard MEA dish (see Fig. 1).
This paper is organized in eight sections. Section II presents the system requirements and the proposed architecture. The analog switch-matrix, the readout and the stimulation units are described in Sections III, IV, and V, respectively. Section VI describes the chip implementation and fabrication. Measurement results, including electrophysiological recordings, are given in Section VII. Section VIII compares the chip to the state of the art and concludes the paper. 
A. System Requirements
To enable a broad range of experiments, we aimed at realizing a versatile platform, capable of recording from various in vitro and ex vivo biological preparations, such as cultured neuronal networks, brain slices, acute retinae and cardiac-cell cultures. High spatial resolution, down to the cellular or sub-cellular level (< 20 µm), is required to facilitate the task of separating individual signal sources [22], [23]. Such separation is necessary to understand how whole-network properties arise from cellular behavior and inter-cellular connections [5]. In the case of neurons, the cell bodies (somata) have diameters in the range of 5-50 µm, but the neurites cover a much larger area. As an example, the dendritic trees of Purkinje cells extend over several tens or hundreds of µm [24]. For most cases of neuronal preparations, sub-cellular details of single neurons can be resolved with an electrode pitch of less than 20 µm [20], [25]. In addition, it is desirable to record simultaneously from distant regions to be able to study interactions between sub-circuits, e.g., in a brain slice, as far as several millimeters away from each other. As a tradeoff with die size, we opted for a rectangular sensing area of 4 × 2 mm².
The signal levels can vary significantly depending on cell type, distance from the recording electrode, and seal resistance of the cell-electrode cleft [26]. A summary of signal characteristics is reported in Table I. In the case of cardiac myocytes, action potentials (APs) feature amplitudes of up to several tens of mV. In the case of neurons, APs recorded at the soma have amplitudes that are typically in the range 100-500 µV. In order to also detect low-amplitude spikes from single axons (20 µV [20]), for the readout channels we targeted an input-referred noise of 2 µVrms in the band 500 Hz-3 kHz, where most spike energy is concentrated [23], corresponding to a thermal noise level of 40 nV/√Hz. Further reducing the noise can result in overdesign, at the expense of circuit area or power consumption, since the overall noise performance is limited by the neural background activity and electrode noise (e.g., 80 nV/√Hz at 1 kHz, for Pt electrodes with a 25 µm diameter [26]).
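The link between the noise density and the integrated noise target can be checked with a short calculation. The sketch below assumes a flat (white) noise density across the band, which ignores 1/f noise and filter roll-off; the function name is illustrative.

```python
import math

def integrated_noise_vrms(density_v_per_rthz, f_low_hz, f_high_hz):
    """RMS noise of a flat (white) density integrated over [f_low, f_high]."""
    return density_v_per_rthz * math.sqrt(f_high_hz - f_low_hz)

# Values quoted in the text: 40 nV/sqrt(Hz) over the 500 Hz - 3 kHz band.
thermal_density = 40e-9
noise_vrms = integrated_noise_vrms(thermal_density, 500.0, 3000.0)
print(f"{noise_vrms * 1e6:.2f} uVrms")
```

Under this white-noise assumption, the 2.5 kHz band gives a factor of 50 between density and rms value, which reproduces the 2 µVrms target.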
APs have a 3 dB bandwidth typically around 2 kHz [27] , with signal content up to 6 kHz. A sampling rate of 20 kS/s is sufficient for most applications. Limiting the recording bandwidth to 7 kHz reduces the aliased noise from both the electrodes and the circuitry. Local field potentials (LFPs), arising from the synchronized activity of many neurons, can occur simultaneously with APs and exhibit amplitudes of up to a few mV, with frequency components in the range of 1-300 Hz. Therefore, in order to also study LFPs, the readout units must be capable of recording frequencies of a few tens of Hz, while rejecting the large offset and drift of the electrode-electrolyte interface potential (in the range of several hundreds of mV [10] , [15] ).
Furthermore, versatile electrical stimulation capabilities for precisely and reliably eliciting APs are essential for investigating, for example, mechanisms such as learning and synaptic plasticity in a neural network. Since neurons can be stimulated by either voltage or current signals [21], the availability of both modes is desirable. Typical stimulation pulses have durations of 50-900 µs, with amplitudes of 0.1-1 V and 50-900 µA [28]. Finally, to limit the chip-induced heating to less than 2 °C and to avoid active cooling, we aimed for a total power consumption of less than 100 mW.
B. Chip Architecture
A block diagram illustrating the system architecture is depicted in Fig. 2. The chip features a sensing area of 3.85 × 2.10 mm² with 26,400 electrodes, placed at a pitch of 17.5 µm (3,265 electrodes/mm²). A matrix of switches, placed below the electrodes, is used to connect an arbitrarily configurable selection of electrodes to 1024 readout channels and 32 stimulation units, all of which are located outside the electrode area.
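The array figures quoted above follow directly from the electrode pitch and the 220 × 120 pixel grid given in Section III. The following sketch recomputes them; the variable names are illustrative.

```python
# Array geometry: 220 x 120 pixels (Section III) at a 17.5 um pitch.
COLS, ROWS = 220, 120
PITCH_UM = 17.5

n_electrodes = COLS * ROWS                      # total electrode count
width_mm = COLS * PITCH_UM / 1000.0             # sensing-area width
height_mm = ROWS * PITCH_UM / 1000.0            # sensing-area height
area_mm2 = width_mm * height_mm                 # sensing area
density_per_mm2 = 1.0 / (PITCH_UM / 1000.0) ** 2  # electrodes per mm^2

print(n_electrodes, width_mm, height_mm)
print(f"{area_mm2:.3f} mm^2, {density_per_mm2:.0f} electrodes/mm^2")
```

The computed area of about 8.09 mm² and density of about 3,265 electrodes/mm² agree with the values stated in the text.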
To adapt to varying experimental requirements, the readout channels provide programmable bandwidth and gain. The full signal band of 1 Hz-6 kHz can be recorded from each channel. Parallel single-slope ADCs, sharing the ramp generator and a 10 bit counter, are used to digitize signals at 20 kS/s and 10 bit resolution. The stimulation circuits, to deliver both voltage and current stimulation pulses, are grouped in two blocks, each comprising 16 units and three 10 bit DACs. By quickly selecting different DAC outputs, complex stimulation patterns with independent bi-phasic or tri-phasic pulses can be generated at each stimulated electrode. Arbitrary waveforms can also be generated, such as sinusoidal waveforms for low-frequency impedance measurements.
A digital core, operating with two clock domains, transfers the readout data off-chip (24 MHz) and receives control settings through an SPI-like interface (up to 50 MHz), used to configure the array, the readout and stimulation units, and to apply stimulation patterns to the DAC inputs. To ensure data validity, both input and output data streams are protected with CRC checksums.
III. SWITCH MATRIX
The electrode array is composed of 220 × 120 pixels. Each pixel includes an electrode, three switches and two SRAM cells. The schematic of a pixel is shown in Fig. 3. Two of the switches, each with a dedicated SRAM cell to hold its on/off state, are used to configure the routing path from any specified electrode to the readout and/or stimulation units.
One of these two switches is used to connect the electrode to a signal wire. Six vertical and six horizontal signal wires are used in each pixel for routing; they are shielded by bit lines (BL), word lines (WL), and supply and ground tracks to minimize crosstalk. Neighboring electrodes are each connected to a different line. The availability of more lines per pixel improves the routing capability.
In [19], the horizontal lines extend over the whole width of the array and are common to each pixel in a row, so that only up to six electrodes per row can be addressed simultaneously. In this new design, to further improve the routing flexibility and to reduce the parasitic capacitance, the signal wires were cut into segments, which extend only for a length of 24 pixels (418 µm length). These segments can be connected through switches to form a specific path and reach the boundary of the array to connect to readout and/or stimulation units. This mesh provides high flexibility to adapt the electrode selection to the morphology of biological samples, for example in sparsely distributed sets, at points of interest, or in high-density blocks with a 17.5 µm resolution. Due to constraints given by the technology, mainly the minimum pitch between metal wires and the number of metal layers, the largest high-density blocks can contain 23 × 23 electrodes, which is about 5 times larger than what was possible with the design in [19], where high-density blocks were limited to 6 × 17 electrodes.
Large switch on-resistance can negatively affect the performance of the recording or stimulation. Transmission gates with around 1 kΩ on-resistance were chosen as switches for the given pixel size. All readout channels can be connected to randomly selected electrodes through an average of 4.4 switches. Only in a few configurations are up to 20 switches required to route some electrodes. Even in such cases, the switches contribute a noise density of about 18 nV/√Hz, which is still lower than the targeted noise level of the readout units. The stimulation units can also be directly connected to the electrode through a dedicated switch for a low-resistance path. This switch can be activated through an SRAM cell residing at one side of the array.
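The quoted switch noise is consistent with the thermal (Johnson-Nyquist) noise of the series on-resistance. The sketch below assumes a temperature of 300 K and treats each switch as an ideal 1 kΩ resistor; both are simplifying assumptions, not values stated in the text.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def thermal_noise_density(resistance_ohm, temperature_k=300.0):
    """Johnson-Nyquist voltage noise density of a resistor, in V/sqrt(Hz)."""
    return math.sqrt(4.0 * K_B * temperature_k * resistance_ohm)

R_ON = 1e3  # ohm, approximate on-resistance per switch (from the text)

# Average routing path (4.4 switches) and the reported worst case (20 switches).
for n_switches in (4.4, 20):
    density = thermal_noise_density(n_switches * R_ON)
    print(f"{n_switches} switches: {density * 1e9:.1f} nV/sqrt(Hz)")

worst_case = thermal_noise_density(20 * R_ON)
```

With 20 switches in series, the result is close to the 18 nV/√Hz figure given above, well below the 40 nV/√Hz target of the readout units.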
Custom-made CAD software was developed to design and implement the switch matrix. The performance of different designs was evaluated, in terms of electrode selection flexibility and the shortest paths between electrodes and readout units, using a mathematical graph representing the array. Wires and switches of the arrays were mapped to nodes and arcs of the graphs respectively. The connectivity between electrodes and readout channels was modeled as "flow" and the number of overall switches used as "cost" in the algorithm. In order to determine the routing paths and readout channels for all selected electrodes, an algorithm similar to the one in [19] was used. A max-flow min-cost problem is solved through Integer Linear Programming. A variety of electrode configurations, such as randomly chosen electrodes, as well as specific electrode patterns, like large contiguous blocks were evaluated.
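To illustrate the flow formulation, the following is a minimal sketch of a min-cost max-flow router on a toy graph. It is not the authors' software: it swaps the Integer Linear Programming solver for the classical successive-shortest-path method, and the node numbering, the tiny two-electrode graph, and the class name are all hypothetical. Each arc has unit capacity and a cost of one per switch traversed; wire capacity is modeled on arcs only, which is a simplification.

```python
from collections import deque

class MinCostFlow:
    """Successive-shortest-path min-cost max-flow (illustrative sketch)."""

    def __init__(self, n_nodes):
        self.n = n_nodes
        self.adj = [[] for _ in range(n_nodes)]   # edge indices per node
        self.to, self.cap, self.cost = [], [], []

    def add_edge(self, u, v, cap, cost):
        # Forward edge at index e, residual twin at index e ^ 1.
        for a, b, c, w in ((u, v, cap, cost), (v, u, 0, -cost)):
            self.adj[a].append(len(self.to))
            self.to.append(b); self.cap.append(c); self.cost.append(w)

    def solve(self, s, t):
        flow = total_cost = 0
        while True:
            dist = [float("inf")] * self.n
            prev = [-1] * self.n
            queued = [False] * self.n
            dist[s] = 0
            queue = deque([s])
            while queue:                          # Bellman-Ford with a queue
                u = queue.popleft()
                queued[u] = False
                for e in self.adj[u]:
                    v = self.to[e]
                    if self.cap[e] > 0 and dist[u] + self.cost[e] < dist[v]:
                        dist[v] = dist[u] + self.cost[e]
                        prev[v] = e
                        if not queued[v]:
                            queued[v] = True
                            queue.append(v)
            if dist[t] == float("inf"):
                return flow, total_cost           # no augmenting path left
            v = t
            while v != s:                         # push one unit along the path
                e = prev[v]
                self.cap[e] -= 1
                self.cap[e ^ 1] += 1
                v = self.to[e ^ 1]
            flow += 1
            total_cost += dist[t]

# Hypothetical toy graph: 0 = source, 1-2 = electrodes, 3-4 = wire segments,
# 5-6 = readout channels, 7 = sink.
SRC, E1, E2, W1, W2, C1, C2, SINK = range(8)
router = MinCostFlow(8)
for electrode in (E1, E2):
    router.add_edge(SRC, electrode, 1, 0)         # each electrode routed once
router.add_edge(E1, W1, 1, 1)                     # electrode-to-wire switches
router.add_edge(E2, W1, 1, 1)
router.add_edge(E2, W2, 1, 1)
router.add_edge(W1, W2, 1, 1)                     # segment-joining switch
router.add_edge(W1, C1, 1, 1)                     # wire-to-channel switches
router.add_edge(W2, C2, 1, 1)
for channel in (C1, C2):
    router.add_edge(channel, SINK, 1, 0)          # one electrode per channel

routed, switches_used = router.solve(SRC, SINK)
print(routed, switches_used)
```

In this toy case both electrodes are routed, each through two switches. A realistic model would also split wire nodes to enforce that each wire segment carries a single signal.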
The physical layout was then automatically generated from the graph representation, starting from a template containing the electrode, the switches, the SRAM cells, and the wires. Based on the configuration of the mathematical graph, a different set of vias and short-track segments was used in each pixel to join the signal wires of adjacent pixels into 24-pixel-long segments, and to provide the wire-to-switch connectivity. A pattern of 24 × 24 pixels, formed in this way, was replicated to construct the whole array. Fig. 4 shows a subset of the switch matrix. Including the periphery of the array, a total of 86,000 switches controlled by 59,000 SRAM cells are used.
The same software was used during experiments to configure the state of the switches and to program a configuration into the SRAM cells.
IV. READOUT
Good gain uniformity across all channels is desired to reconstruct the actual signal amplitudes and cell positions [24] , [29] . Closed-loop amplifier topologies were preferred over open-loop solutions to ensure gain uniformity without the need for calibration. To achieve an overall gain of more than 70 dB, three amplification stages have been employed to reduce the area of passive devices. The schematic of a readout channel is shown in Fig. 5 . In the first stage, a low-noise amplifier (LNA) provides a gain of 24 dB and high-pass filtering to reject the electrode offset. The second stage is a variable-gain amplifier (VGA), employing a digitally-assisted offset compensation scheme to cancel the output offset of the LNA. Low-pass filtering is implemented in two steps: the VGA limits the noise bandwidth and provides anti-aliasing filtering, whereas a multirate SC filter (SC LPF) further reduces thermal noise and provides precise control over the cutoff frequency.
Fully differential structures were employed for the whole readout chain, to improve rejection of power-supply interference and substrate coupling, and to reduce power consumption in the SC LPF [30] . The area and power breakdown of a readout channel are shown in Fig. 6 .
A. Low-Noise Amplifier (LNA)
AC coupling is employed to remove the offset and low-frequency drifts of the electrode potential. While alternative solutions to reset [31] or compensate [10] the input offset of the front-end amplifier have been proposed, these solutions introduce step-shaped artifacts in the recorded traces, as an abrupt change of the DC level in the output waveforms is introduced upon reset or compensation. To avoid such artifacts in the signals, continuous-time filtering was preferred for our design. An input capacitance of 1.45 pF, implemented with stacked poly-poly and MIM capacitors, was chosen for reduced area usage (196 µm² for both branches) and high input impedance (110 MΩ). The high-pass corner frequency is tunable [15], [35] and can be set as low as 100 mHz. The possibility to tune the corner frequency allows for increasing the dynamic range in experiments in which LFP recording is not required. The switches activated by the Reset signal are employed to quickly recover from amplifier saturation after a stimulation pulse [36]. An alternative scheme is offered by input switches, controlled by the Disconnect signal, used to disconnect the amplifier prior to stimulation, thus preventing saturation.
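The quoted input impedance can be related to the input capacitance with the usual expression for the impedance magnitude of a capacitor. The sketch below assumes that the figure refers to a frequency of 1 kHz; that frequency is an assumption made here for illustration, not a value stated in the text.

```python
import math

def capacitor_impedance_ohm(capacitance_f, frequency_hz):
    """Magnitude of the impedance of an ideal capacitor: 1 / (2*pi*f*C)."""
    return 1.0 / (2.0 * math.pi * frequency_hz * capacitance_f)

C_IN = 1.45e-12      # F, input capacitance from the text
F_REF = 1e3          # Hz, assumed reference frequency

z_in = capacitor_impedance_ohm(C_IN, F_REF)
print(f"{z_in / 1e6:.1f} Mohm at {F_REF:.0f} Hz")
```

At the assumed 1 kHz, the result is close to 110 MΩ, which suggests that the quoted impedance is dominated by the coupling capacitor.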
A telescopic-cascode OTA, with the common-mode feedback (CMFB) loop using transistors in the triode region, is employed in order to minimize the number of current branches. All bias currents flow through the input transistors, contributing to their transconductance. The reduced output swing is not an issue in this case due to the small signal amplitudes. For a given current budget, the thermal noise of both the input transistors and the active load is reduced by increasing the transconductance ($g_m$) of the input transistors. Since operating in weak inversion requires a very large $W/L$ ratio, the input transistors were sized for moderate inversion, with a $g_m/I_D$ of 25 V⁻¹, as a tradeoff between transconductance efficiency and area. The noise contribution of the active-load transistors was reduced by operating them in strong inversion, with a $g_m/I_D$ set to 3.8 V⁻¹. For the input transistors, a relatively short length of 1.1 µm was chosen to limit area usage. With this choice the 1/f-noise corner occurs at 300 Hz, the lower limit of the AP signal band. Higher noise levels can be tolerated in the LFP frequency band, due to larger signal amplitudes [10], [15], [37]. Further increasing the gate area of the input transistors also results in a larger input capacitance for the OTA. The input-referred noise PSD of the closed-loop amplifier, $\overline{v_{n,in}^2}$, is related to the noise of the OTA, $\overline{v_{n,OTA}^2}$, by the relation:

$$\overline{v_{n,in}^2} = \left(\frac{C_{in} + C_{f} + C_{p}}{C_{in}}\right)^{2} \overline{v_{n,OTA}^2}$$

where $C_{in}$ is the input coupling capacitance, $C_{f}$ the feedback capacitance, and $C_{p}$ the input capacitance of the OTA. Therefore, too large a $C_{p}$ can degrade the noise in the closed-loop configuration [13], [37].
B. Variable-Gain Amplifier (VGA)
The offset of the LNA, which is mainly caused by the mismatch in the resistances and the leakage currents of the pseudoresistors, can saturate the amplification chain. Performing high-pass filtering in the second stage with pseudoresistors can result in large harmonic distortion due to their non-linearity at larger signal amplitudes. The distortion can become severe at low frequencies, since the total harmonic distortion (THD) increases as the signal frequency decreases [12], [38]. To avoid this issue, we instead employed a DC-coupled amplifier with digitally-assisted offset compensation.
A differential-difference amplifier (DDA) with resistive feedback was used to provide high input impedance in a fully-differential structure. Poly-resistors of 10 k allowed for high gain in a small area. The gain can be programmed within the range 0-30 dB, with increments of 6 dB. The DDA is based on a folded-cascode topology (see Fig. 7 ). An additional input differential-pair in the DDA, driven by a 6 bit DAC, is used to implement channel offset compensation without degrading the PSRR or the CMRR of the VGA. The offset compensation is performed off-chip using a binary search algorithm [10] . At each step of the binary search, the individual bit values of all channels are determined off-chip simultaneously; then, these bit values are programmed into the registers for the compensation DACs sequentially through the SPI-like interface. Since the compensation is only used to reduce the offset of the readout circuits, whereas the electrode offset and drift are removed by AC coupling, the compensation procedure needs to be applied only once per measurement session.
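The binary search described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the linear offset model, the 1 mV DAC step, the mid-scale code of 32, and the example offsets are hypothetical. At each step one trial bit is evaluated for all channels from a single acquisition, and the bit is kept only where the residual offset has not changed sign.

```python
def binary_search_codes(measure, n_channels, n_bits=6):
    """Find, per channel, the largest DAC code that does not over-compensate.

    measure(codes) returns the residual output offset of every channel with
    the given list of codes applied (one acquisition for all channels).
    """
    codes = [0] * n_channels
    for bit in reversed(range(n_bits)):           # MSB first
        trial = [c | (1 << bit) for c in codes]
        residuals = measure(trial)
        # Keep the trial bit only where the residual is still non-negative.
        codes = [t if r >= 0 else c for t, c, r in zip(trial, codes, residuals)]
    return codes

# Hypothetical channel model: the DAC subtracts (code - 32) * 1 mV.
RAW_OFFSETS_MV = [7.3, -12.6, 0.0]
LSB_MV = 1.0
MID_CODE = 32

def measure(codes):
    return [raw - (code - MID_CODE) * LSB_MV
            for raw, code in zip(RAW_OFFSETS_MV, codes)]

codes = binary_search_codes(measure, len(RAW_OFFSETS_MV))
residuals = measure(codes)
print(codes, [round(r, 2) for r in residuals])
```

Six acquisitions suffice for a 6 bit DAC regardless of the number of channels, since every channel is resolved in parallel; in this model the residual offset ends up below one DAC step.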
Since accurate models for the offset contributed by the pseudoresistors in the LNA are not available, the compensation range can be controlled globally by adjusting the shared bias current of the DACs.
An alternative design for the VGA in the second stage, based on the same HPF topology of the LNA, was also implemented for comparison. The THD of the two designs is shown in Fig. 8 . In the HPF topology, the distortion becomes severe below 300 Hz, where the THD exceeds 45 dB.
C. Switched-Capacitor Low-Pass Filter (SC LPF)
The bandwidth of the VGA is susceptible to variations in process and bias current, and is inversely proportional to the closed-loop gain. To ensure a precise low-pass cutoff frequency, SC filtering is used in the third stage of the amplification chain. A multirate operation scheme was employed that allows for boosting the gain with a reduced capacitance spread, without impacting noise performance or circuit complexity, and led to a compact implementation based on a single OTA. A low clock rate signal ( 60 kHz, 80 kHz, 100 kHz) is used to control the switches in the feedback path to obtain a cutoff frequency around 5 kHz with a small ratio. The input signal is, instead, sampled at a frequency , which can be set higher ( 1, 2, 4) in order to reduce the noise of the input switches. The transfer function of the SC LPF is given by with . By sampling and integrating the input signal times before leaking charge through , a gain of is obtained. Such a scheme results in a low capacitance spread between , and . In this design, , , for a total of 14 unit capacitors . In contrast, a conventional SC circuit with a clock rate of would require 2 unit capacitors to achieve the same gain, power consumption, cutoff frequency and noise performance (50 , for ).
D. Analog-to-Digital Converter
Recording spikes with amplitudes of tens of µV, superimposed on LFPs with amplitudes up to a few mV, requires a resolution of at least 9 bit. A resolution of 10 bit was chosen for the single-slope ADC, as a trade-off between resolution and clock rate. The comparators consist of three gain stages with auto-zeroing. Despite a larger static power consumption compared to dynamic comparators, continuous-time comparators were chosen to avoid large kickback noise, since the ramp signal is shared among 1024 ADCs. A capacitive neutralization technique is used in the first gain stage to further reduce the kickback [39]. The input signal is sampled on capacitors during one clock phase of the SC LPF. During the count phase, a continuous ramp signal is produced by integrating a constant current onto a 20 pF capacitor. A schematic of the ramp generator is shown in Fig. 9.
The ramp current is generated by a current conveyor applying a reference voltage across a resistor. The upper bound of the ADC range is determined by the starting voltage of the ramp. The lower bound is determined by the final value of the ramp, and is, therefore, subject to process and temperature variations in the resistor and the integration capacitor, if the reference voltage is fixed. These variations are eliminated by means of a negative feedback loop. The difference between the final value of the ramp and the target voltage is sampled on a capacitor at the end of each sample frame. The sampled charge is then transferred onto a second capacitor, shifting the reference voltage accordingly. After a few sample frames, the final value of the ramp converges to the target voltage, and the slope of the ramp settles to the corresponding value. The single-ended ramp at the output of the integrator is converted to a differential signal by a capacitive-feedback amplifier, whose gain can be varied in steps of 0.25 from 0.25 to 2.0 for coarse regulation of the ADC range. The timing diagram of the ADC is shown in Fig. 10. During the ramp phase, the amplifier performs single-ended to differential conversion through the corresponding switch settings. At the end of each sample frame, the amplifier is auto-zeroed. The output common-mode of the differential ramp is set by a SC CMFB. The simulated power consumption of the ramp generator is 1.14 mW, including the output buffer. The shared counter runs with a 24 MHz clock signal (1200 clock cycles per sample frame). Each count phase lasts 1024 cycles for 10 bit operation. All switching operations in the comparators and ramp generator (auto-zeroing, calibration, CMFB of the output buffer) occur only during the additional 176 clock cycles after the count phases to avoid glitches in the ramp. Gray code is used for the Count signal, to avoid the acquisition of spurious values when the comparator triggers at transitions between two consecutive codes.
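The frame timing and the benefit of Gray coding can be illustrated with a short sketch. The constants are taken from the text; the helper functions are the standard reflected-binary Gray code conversions and are not claimed to match the on-chip counter implementation.

```python
CLOCK_HZ = 24_000_000      # counter clock (from the text)
CYCLES_PER_FRAME = 1200    # clock cycles per sample frame
COUNT_CYCLES = 1024        # count phase for 10 bit operation

sample_rate_hz = CLOCK_HZ / CYCLES_PER_FRAME
housekeeping_cycles = CYCLES_PER_FRAME - COUNT_CYCLES

def binary_to_gray(n):
    """Reflected-binary Gray code of a non-negative integer."""
    return n ^ (n >> 1)

def gray_to_binary(g):
    """Inverse conversion: XOR of all right shifts of the Gray code."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

def bit_flips(a, b):
    """Number of bit positions in which two codes differ."""
    return bin(a ^ b).count("1")

# Worst-case number of bits that change between consecutive counter values.
max_flips_gray = max(bit_flips(binary_to_gray(i), binary_to_gray(i + 1))
                     for i in range(COUNT_CYCLES - 1))
max_flips_binary = max(bit_flips(i, i + 1) for i in range(COUNT_CYCLES - 1))

print(sample_rate_hz, housekeeping_cycles, max_flips_gray, max_flips_binary)
```

With Gray coding only one bit changes per count, so a comparator that latches during a transition can be off by at most one code, whereas a plain binary counter can change all ten bits at once (for example from 511 to 512).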
V. STIMULATION UNITS
Each stimulation unit can be configured to provide either voltage or current stimulation (Fig. 11), as was also done in a previous version [21]. The core of each unit is a class-AB opamp, capable of driving loads as large as 10 nF, while maintaining a low static power consumption. In the voltage mode, the circuit is configured as an inverting amplifier with low output impedance. In the current mode, it is configured as a type II current conveyor. The input resistance can be set to either 20 kΩ or 200 kΩ, for coarse adjustment of the current range. Cascoded transistors in the output branch enhance the output impedance to keep the output current constant in the presence of a varying electrode voltage. A pre-level-shifter at the output of the opamp eliminates the cross-over distortion, which has been observed in [21]. The stimulation units also include an auto-zeroing scheme for offset compensation. In current mode, low offset is crucial, because an offset current can quickly drive the electrode voltage to either VDD or ground, or can induce undesirable electrochemical processes at the electrodes, whose reaction products, such as oxygen or hydrogen, can harm the cells or tissue samples [27]. Since most components are shared in the two modes, each stimulation unit occupies only 0.055 mm².
VI. CHIP IMPLEMENTATION
A. Layout and Floor Plan
The recording amplifiers and the parallel ADCs are grouped in blocks, each comprising 32 channels and shared logic and bias circuits (see Fig. 12 ). In each block, the amplifiers are arranged in four rows per stage, to reduce the aspect ratio of the layout and the perimeter/area ratio of the capacitors. Sensitive analog signals are routed via the top metal (MET4), and shielded from underlying circuits by MET3 planes, which carry supply and reference voltages. 
B. Bias and Test Structures
Bias currents are routed to each readout block and fed by programmable bias generators. These currents can be varied independently for each stage. In the LNA, power consumption can be traded off with noise levels [9] ; in the VGA, it can be used for global tuning of the low-pass corner and to adjust the offset-compensation range. Each readout block can be powered-down independently, to permit long-term continuous monitoring with a small number of electrodes at low power levels.
Off-chip access to the input and output of the recording amplifiers or one stimulation unit is provided by dedicated pads and switches. A voltage buffer can be inserted in front of the output pads to drive off-chip loads. Each recording amplifier can be bypassed to characterize the individual gain stages and the ADC independently.
C. Chip Fabrication
The chip was fabricated in a 0.35 µm CMOS technology (2P4M). Platinum electrodes were post-processed at wafer level by means of ion beam deposition and etching. In the same step, three Pt-resistors were fabricated on top of the CMOS passivation for use as temperature sensors. To protect the underlying circuits from the saline solution, used as biological medium, and to avoid cell contamination by the aluminum contained in the CMOS process, a multilayer SiO₂/Si₃N₄ passivation stack was deposited by plasma-enhanced chemical vapor deposition (PECVD). Openings in the passivation to the platinum, defining the actual electrode areas (9.3 × 5.4 µm²), were then obtained in a reactive-ion etching (RIE) step. A shifted-electrode layout [40] was employed to prevent any leakage of aluminum into the biological medium. Fig. 12 shows a micrograph of the chip and a close-up view of the fabricated electrodes. The chip was die-bonded on a custom PCB. A polycarbonate ring was used to contain the biological medium and a bio-compatible epoxy was used to encapsulate the bond-wires [19], [36]. In Fig. 13, an SEM image of the chip surface, plated with rat cortical neurons, is shown.

Fig. 15. Histograms of the gain (at 1 Hz, 10 Hz and 100 Hz) and the low-pass 3 dB cutoff frequency. A nominal gain of 16 × 16 × 4 was used for this measurement.
VII. MEASUREMENTS
A. Electrical Characterization
The frequency response of one readout channel for four possible gain settings is shown in Fig. 14. The measured maximum gain is 78.3 dB. The spread of the response across all channels was characterized by applying a common signal to all inputs. Owing to closed-loop topologies and SC filtering, very good uniformity in both the gain and the low-pass corner has been obtained, as shown in Fig. 15. The input-referred noise PSD of the readout chain, including the ADC, is shown in Fig. 16. The noise spectral density is 39 nV/√Hz at 1 kHz. The noise integrated over the band 1 Hz-10 kHz is 5.9 µVrms. In the LFP band (1 Hz-300 Hz) the noise amounts to 5.4 µVrms, whereas in the AP band (300 Hz-10 kHz) it is 2.4 µVrms. When filtered in the band 500 Hz-3 kHz, for spike detection, the noise is 1.8 µVrms. The CMRR of the readout was obtained from measurements on all 1024 channels, resulting in an average of 72 dB. The response of the ADC to a 1.1 kHz sine wave is shown in Fig. 17. The ADC achieves an SNDR of 59 dB and an SFDR of 68.9 dB. Kick-back from the comparators on the shared ramp has been observed. The SNDR degrades by a maximum of 8 dB in the worst-case condition, which occurs when all comparators toggle simultaneously.
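Two quick consistency checks can be made on these figures. The sketch below assumes that the noise in the LFP and AP bands is uncorrelated, so that the band contributions add in power, and uses the standard relation between SNDR and effective number of bits (ENOB) for a full-scale sine wave.

```python
import math

# Measured band noise in microvolts rms (from the text).
LFP_NOISE_UV = 5.4   # 1 Hz - 300 Hz
AP_NOISE_UV = 2.4    # 300 Hz - 10 kHz

# Uncorrelated contributions add in power (root-sum-of-squares).
total_noise_uv = math.sqrt(LFP_NOISE_UV ** 2 + AP_NOISE_UV ** 2)

def enob_from_sndr(sndr_db):
    """Effective number of bits for a full-scale sine: (SNDR - 1.76) / 6.02."""
    return (sndr_db - 1.76) / 6.02

enob = enob_from_sndr(59.0)

print(f"total noise: {total_noise_uv:.1f} uVrms, ENOB: {enob:.1f} bit")
```

The root-sum-of-squares of the two bands reproduces the 5.9 µVrms reported for the full 1 Hz-10 kHz band, and the measured SNDR corresponds to roughly 9.5 effective bits.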
The performance of the stimulation unit was assessed with typical biphasic waveforms used for eliciting electrical activity in neurons (Fig. 18). Loads as large as 10 nF can be driven in the voltage mode with pulse durations of 250 µs. In the current mode, the cross-over distortion is eliminated; channel-length modulation effects were also reduced with respect to the design in [21]. To quantify the accuracy of the amplitude of the stimulation pulses, the static linearity has been characterized by sweeping an input DC voltage and extracting the residuals of a best-fit line. In the voltage mode, 10 bit linearity within an output range of 3 V was achieved. In the current mode, the linearity is 9 bits within an output range of 50 µA.
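The best-fit-line procedure can be sketched as follows. The conversion of the worst residual into a number of bits assumes the common criterion that the deviation stays within half an LSB of the output range; this criterion and the synthetic sweep data are assumptions made here for illustration and are not taken from the measurements.

```python
import math

def best_fit_line(x, y):
    """Ordinary least-squares line through the points (x, y)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def linearity_bits(x, y):
    """Bits N such that the worst residual equals 0.5 LSB of the output range.

    Assumes a non-zero worst residual (a perfectly linear sweep has no limit).
    """
    slope, intercept = best_fit_line(x, y)
    residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
    worst = max(abs(r) for r in residuals)
    full_scale = max(y) - min(y)
    return math.log2(full_scale / (2.0 * worst))

# Synthetic DC sweep: gain of 3 with a small quadratic bow (hypothetical data).
sweep_in = [i / 100.0 for i in range(101)]
sweep_out = [3.0 * v + 1e-3 * (v - 0.5) ** 2 for v in sweep_in]

slope, intercept = best_fit_line(sweep_in, sweep_out)
bits = linearity_bits(sweep_in, sweep_out)
print(f"slope: {slope:.3f}, linearity: {bits:.1f} bit")
```

For measured data, `sweep_in` and `sweep_out` would be replaced by the applied DC input values and the recorded outputs.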
The measured total chip power dissipation is 75 mW at a supply voltage of 3.3 V. Table II shows the breakdown for the different supply domains. The power consumption of the stimulation units is largely dependent on the applied stimuli, due to their class-AB operation. Also, these blocks can be powered down during experiments that do not require stimulation. Since low power dissipation is crucial for the survival of cells cultured on the chip's surface, the temperature increase of the device, filled with PBS and placed inside an incubator, was monitored by means of the on-chip Pt temperature sensors. An increase of less than 2 °C was observed when all channels were powered up.
B. Biological Measurements
The devices were further verified with in vitro and ex vivo measurements. Portions of acute ex vivo rabbit retina were placed on the chip surface, and spontaneous electrical activity was successfully recorded. A raw trace, as recorded by the CMOS MEA without further processing, is shown in Fig. 19. Action potentials from retinal ganglion cells (RGCs) were detected. The recorded samples are marked with dots in the time zoom. Large-scale recordings were performed with cultures of cortical neurons. The neurons were isolated from rat brain and plated on the surface of MEA chips, which were pre-coated with poly(ethylenimine) and laminin. Spontaneous activity was observed, and APs were detected with a threshold 5.5 times above the noise rms. Fig. 20 shows a portion of the electrode array, with spike amplitudes obtained from simultaneous recordings of two distant high-density patches, consisting of 23 × 23 and 15 × 15 electrodes, respectively. The average spike shapes of two neurons with overlapping electrical footprints are also shown. The low-noise characteristics of the recording channels, combined with high spatial resolution, allowed for identifying and separating the individual signal sources. The stimulation capability was verified by applying biphasic voltage pulses (positive first) with a 200 µs phase duration and peak-to-peak amplitude of 800 mV. Fig. 21 shows an overlay of 26 raw traces, subsequently recorded on an electrode located 288 µm from the stimulation site. Spikes were reliably elicited 8 ms after the stimulation pulse.
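Threshold-based spike detection of the kind mentioned above can be sketched as follows. This is a minimal illustration and not the analysis pipeline used for the recordings: the noise rms is estimated from a spike-free baseline segment, the refractory window of 20 samples (1 ms at 20 kS/s) is a hypothetical choice, and the trace is synthetic and deterministic.

```python
import math

def rms(values):
    """Root-mean-square value of a sequence."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def detect_spikes(trace_uv, noise_rms_uv, factor=5.5, refractory=20):
    """Indices where |signal| exceeds factor * noise rms.

    A refractory window (in samples) suppresses repeated detections of the
    same event.
    """
    threshold = factor * noise_rms_uv
    events, last = [], -refractory
    for i, value in enumerate(trace_uv):
        if abs(value) > threshold and i - last >= refractory:
            events.append(i)
            last = i
    return events

# Synthetic trace in microvolts: bounded deterministic "noise" plus three
# negative-going spikes of 60 uV (hypothetical values).
trace = [3.0 * math.sin(0.37 * i) + 1.5 * math.sin(1.93 * i + 1.0)
         for i in range(2000)]
SPIKE_SAMPLES = (300, 900, 1500)
for idx in SPIKE_SAMPLES:
    trace[idx] -= 60.0

noise_rms = rms(trace[:200])          # baseline segment without spikes
spikes = detect_spikes(trace, noise_rms)
print(spikes)
```

In practice the noise estimate is often made robust against the spikes themselves, for example by using a median-based estimator instead of a plain rms over a baseline segment.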
VIII. COMPARISON TO STATE-OF-THE-ART AND CONCLUSION
The performance of the chip has been summarized and compared to that of other CMOS microelectrode arrays in Table III. Our device achieves state-of-the-art noise characteristics (2.4 µVrms in the frequency band of APs) while maintaining high spatial resolution (17.5 µm pitch) and low power consumption. Despite the use of closed-loop topologies, which were adopted to ensure high gain uniformity without the need for calibration, each recording amplifier occupies a very small area (0.033 mm²) and consumes little power (31 µW). The area and power efficient design of the readout channels allowed for the integration of 1024 such units, which is about 8 times the channel count of the switch-matrix-based design reported in [19]. This channel count also exceeds that of all neural acquisition ICs reported in literature (e.g., [7]-[14]), for which the integration of only up to 256 channels has been demonstrated. The simultaneous signal acquisition at many recording sites facilitates the reconstruction of interconnections in neural networks. The presented device also features the largest sensing area (8.1 mm²), which permits the simultaneous recording of large patches in distant regions, to investigate long-range interactions between sub-networks. Stimulation units with both voltage and current stimulation capabilities have also been integrated on chip. All these features make the whole MEA system a versatile platform for numerous biological applications. The chip was used to successfully record activity from a variety of biological preparations, validating the suitability of the device for high-throughput electrophysiological measurements.

TABLE III. COMPARISON TO OTHER CMOS MICROELECTRODE ARRAYS

After that, he joined the Bio Engineering Laboratory (BEL) at ETH Zurich in Basel as the head of the Circuitry Group. His research interests include analog and mixed-signal integrated-circuit design, with emphases on data converters and biosensor interfaces.
Urs Frey (M'11) received the diploma in electrical
