Abstract-A 256-site, fully implantable, 3-D neural recording microsystem has been developed. The microsystem incorporates four active neural probes with integrated circuitry for site selection, amplification, and multiplexing. The probes drive an embedded data-compression ASIC that successfully detects neural spikes in the presence of neural and circuit noise. The spike detection ASIC achieves a factor of 12 bandwidth reduction while preserving the key features of the action potential waveshape necessary for spike discrimination. This work extends the total number of neural channels that can be recorded across a transcutaneous inductively coupled wireless link from 25 to 312. When a spike is detected, this ASIC serially shifts the 5-bit amplitude and 5-bit address of the spike off the chip over a single 2.5 Mb/s wired or wireless line. The spike detection ASIC occupies 6 mm² in 0.5-μm features and consumes 2.6 mW, while the entire microsystem consumes 5.4 mW of power from a 3-V supply.
I. INTRODUCTION

IT HAS long been the goal of researchers to record from hundreds of neurons simultaneously. Recording from hundreds of neurons will allow neurophysiologists to better understand the complex circuits of the nervous system and will provide the control signals necessary to implement complex neural prosthetics. For example, intended movements are encoded in the motor cortex as population vectors, where groups of neurons' firing rates are tuned to movements in a specific direction [1]. Numerous groups [2]-[4] have used these population vectors recorded from the brains of macaque monkeys to control motor prostheses. There is currently a debate in the field as to the minimum number of neurons needed to adequately control a complex neural prosthesis (50-1000), but few disagree that the ability to record from more neurons will result in increased prosthetic control and lifetime.
Micromachined silicon recording electrodes are widely used by neurophysiologists to explore the nervous system and use of these electrodes to control motor prosthetics has recently been reported [3] . While passive silicon probes (without circuitry) have been used to record from up to 96 sites simultaneously [5] , scaling to larger site counts is problematic because of the large number of interconnects necessary (one data lead is required per recording site) and the tethering forces these interconnects produce on the probe in vivo. In order to provide the ability to record from a large number of sites with a reasonable number of lead transfers, circuits must be integrated with neural recording probes to allow front-end selection and time-division multiplexing. On-chip amplification must be included to reduce cross-talk, reduce spurious noise, perform antialiasing, and perform impedance transformations. Problems associated with recording such a large quantity of data are further exacerbated in transcutaneous wireless systems where the data rate is currently limited to 2.56 Mb/s [6] . For neural recordings sampled at 20 kHz and quantized to 5 bits of resolution, this limits the total number of neural channels that can be recorded simultaneously over a transcutaneous wireless link to 25. To overcome this limitation and to further reduce the number of interconnects (and the sheer volume of data) associated with wired probes, implantable signal processing circuitry must be developed. The ability to process neural data in vivo not only promises to increase the total number of neurons that can be recorded simultaneously, but also lays the groundwork for fully implantable, closed-loop neural prosthetics, where population vectors for motor control or the onset of epileptic seizures are processed inside the body. 
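The 25-channel limit quoted above follows directly from the link rate, the per-channel sampling rate, and the sample resolution. The short sketch below reproduces that arithmetic; the function name `max_raw_channels` is illustrative and does not come from the paper.

```python
def max_raw_channels(link_bps: float, fs_hz: float, bits_per_sample: int) -> int:
    """Whole number of uncompressed channels that fit in the link bandwidth."""
    bits_per_channel_per_second = fs_hz * bits_per_sample
    return int(link_bps // bits_per_channel_per_second)


# Values stated in the text: 2.56 Mb/s link, 20 kHz sampling, 5-bit samples.
raw_channels = max_raw_channels(2.56e6, 20e3, 5)
print(raw_channels)
```

Each channel requires 100 kb/s, so the link saturates after 25 channels regardless of how many sites the probes provide.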
II. SYSTEM DESIGN AND METHODOLOGY
In order to provide the ability to record from more neurons simultaneously and to develop the technology required for a fully implantable neuroprosthesis, a 256-site, three-dimensional (3-D) neural recording array with on-platform in vivo signal processing has been designed. The on-platform spike detection unit takes advantage of the low duty cycle of unit activity (neural spikes are sparse even when recording from highly active neurons) to eliminate transmission of neural noise out of the body. This saves valuable bandwidth and will allow many sites to be monitored on a single data lead. This 3-D microsystem will scale to a wireless 2560-site array, where 320 sites are monitored simultaneously via a telemetry interface.
A picture of the fully implantable neural recording microsystem with platform-mounted spike detection circuitry is shown in Fig. 1, and a block diagram of the system electronics is given in Fig. 2. Four 64-site, eight-channel neural recording probes [7] with integrated amplification and multiplexing circuitry are assembled into a 256-site 3-D neural recording array using the assembly technologies reported by Bai [8]. The time-division-multiplexed outputs of the neural probes are fed into a platform-mounted spike-detection ASIC, which quantizes and demultiplexes the neural waveforms, calculates the mean and standard deviation of each neural channel, sets upper and lower spike thresholds based on the mean and standard deviation, detects neural spikes that exceed these thresholds, and outputs the amplitude and location of detected neural spikes over a serial data lead. It is necessary to set spike thresholds individually for each neural channel because neural noise and signal amplitudes vary significantly from site to site. Thresholding was chosen as the detection mechanism because of its low cost in terms of die space and power consumption and its superior detection performance when compared to other algorithms [9]. While digital detection consumes 75 μW/channel, slightly more than a previously reported analog technique [10], comparator offsets do not degrade detection. In addition, quantizing the signal allows the spike waveshape to be preserved and lays the groundwork for fully implantable neural prostheses where events such as the onset of epileptic seizures are processed and treated inside the body. The amplitude and timing information transmitted by the spike detector allows the discrimination of different spikes recorded from the same site. The spike detection unit controls the clocking required by the time-division multiplexers on the two-dimensional (2-D) probes.
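The per-channel thresholding described above can be illustrated in software. The following is a minimal sketch, assuming thresholds placed k standard deviations above and below the channel mean; the function names and the example data are hypothetical, and the ASIC's fixed-point arithmetic and update schedule are not modeled.

```python
from statistics import mean, pstdev


def spike_thresholds(samples, k=3.0):
    """Return (lower, upper) thresholds at mean -/+ k standard deviations."""
    mu = mean(samples)
    sigma = pstdev(samples)
    return mu - k * sigma, mu + k * sigma


def detect_spikes(samples, lower, upper):
    """Return (index, amplitude) for every sample outside the threshold band."""
    return [(i, s) for i, s in enumerate(samples) if s > upper or s < lower]


# Hypothetical 5-bit codes for one channel: background noise around code 16.
baseline = [16, 15, 17, 16, 15, 17, 16, 16] * 4
lower, upper = spike_thresholds(baseline)

# A biphasic event appended after the baseline is the only data reported.
events = detect_spikes(baseline + [16, 27, 5, 16], lower, upper)
print(events)
```

Because only samples outside the band are reported, the noise between spikes never leaves the detector, which is the source of the bandwidth savings.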
External interfacing with the implantable neural recording microsystem is performed through the spike detection ASIC utilizing one data, three power, and four control signals, all of which can be provided wirelessly [6] or through a ribbon cable and percutaneous connector.
III. ACTIVE NEURAL PROBE DESIGN
As discussed in the introduction, there are many reasons for integrating circuits and neural recording probes, including reduced crosstalk, spurious noise, and packaging requirements, as well as signal processing to increase the number of neurons that can be monitored simultaneously. To this end, a 64-site, eight-channel active neural recording array has been developed and is pictured in Fig. 1. The 100-μm² iridium recording sites are arranged in an 8 × 8 array along the probe shanks. The vertical center-to-center site spacing is 100 μm, while the sites are spaced 200 μm apart horizontally. The shanks are 3 mm in length, taper from a maximum width of 85 μm at the base to a sharp tip, and are approximately 12 μm thick. A block diagram of the on-probe electronics is shown in Fig. 2. The probe features a 64:8 front-end selector, which allows 24 different site combinations to be chosen for recording. Eight on-chip preamplifiers provide an in-band gain of 40 dB while filtering the DC baseline polarization of the electrode. The time-division multiplexer samples the eight active channels onto one data lead, and the active filter removes the clock artifact while providing an additional in-band gain of 20 dB. The probe requires a total of eight leads: three for power, one clock lead, one reset lead, two leads for addressing the front-end selector, and one output data lead. Some designs include a ninth lead for tuning the low-frequency cutoff of the on-chip amplifiers. The input clock frequency to the probe is 80 kHz (sampling on both the positive and negative edges of the clock) for multiplexing eight channels at 20 kHz/channel. The on-probe circuitry consumes 756 μW from 1.5-V supplies and occupies 5 mm² of die space in two-poly one-metal 3-μm CMOS.
A. Front-End Selector Design
When an electrode array is inserted into the nervous system, not all of the sites are within the approximately 100-μm radius of an active neuron necessary to record unit activity. Other sites may be encapsulated by scar tissue caused by the rupturing of blood vessels upon insertion. Furthermore, researchers may only be interested in recording from a subset of neurons at any given time. For these reasons, the first level of data compression that can be achieved is front-end selection, which allows the user to choose a subset of neurons from which to record via a static multiplexer. This approach allows the user to change the recording location without another implant and can eliminate further processing of signals recorded from sites where there is no unit activity. A block diagram of the front-end selector is shown in Fig. 3. When the mode signal is high, the front-end selector will clock in a 5-bit serial address, which corresponds to a specific group of sites to be monitored. The 5-bit serial address is decoded and used to control eight 8:1 multiplexers that perform the site selection. Each probe in the 3-D array operates from a separate mode line, mode(0:3) in Fig. 2, and can be individually addressed. With 64 sites and eight channels, the maximum number of site configurations for a single probe is
$$\text{Combinations} = \binom{64}{8} = \frac{64!}{8!\,(64-8)!} \approx 4.4 \times 10^{9} \tag{1}$$
To allow for this many site configurations would require 33 serial address bits along with very large decode and multiplexer logic. Because space on the probe is limited, a subset of the most useful site configurations has been implemented and is shown in Fig. 4 . The current design allows for vertical, horizontal, and block zooming and can be expanded to accommodate more site combinations if necessary.
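The 33-bit figure can be checked by counting the ways to choose eight of 64 sites and finding the smallest address width that covers them. This is a brief sketch of that calculation; the helper `address_bits` is an illustrative name, and the comparison with the 5-bit address assumes one address per implemented configuration.

```python
import math


def address_bits(n_configurations: int) -> int:
    """Smallest number of address bits that can index n_configurations."""
    return max(1, (n_configurations - 1).bit_length())


all_configs = math.comb(64, 8)        # every choice of 8 sites out of 64
full_bits = address_bits(all_configs)  # bits needed to address all of them
subset_bits = address_bits(24)         # bits needed for the 24 implemented ones

print(all_configs, full_bits, subset_bits)
```

The 24 implemented configurations fit within the 5-bit serial address, which keeps the on-probe decode logic small.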
B. Integrated Neural Preamplifier
A schematic of the integrated preamplifier and its measured frequency response are shown in Fig. 5. The amplifier performs several key functions. Transforming the MΩ-range site impedance to less than 1 kΩ reduces crosstalk, spurious noise pick-up, and packaging requirements. The amplifier applies a measured gain of 38.9 dB to the 50-500 μV neural action potentials, which have a frequency content between 300 Hz and 10 kHz, while eliminating the DC polarization of the electrode, which is typically 250 mV with respect to the solution for iridium in phosphate-buffered saline, is highly variable from site to site, and can drift by tens of millivolts over time [11]. The amplifier has a tunable low-frequency cutoff, which is set to 300 Hz when interfacing with the neural spike detection ASIC described in this paper in order to eliminate neural field potentials that can corrupt the spike detection algorithm. The upper frequency cutoff of the amplifier is 10 kHz, which provides an anti-aliasing filter prior to the time-division multiplexer. The input-referred noise of the amplifier integrated from 300 Hz to 10 kHz is 8.9 μV rms, which is lower than the integrated noise of the electrode, 9.6 μV rms. Each amplifier consumes 68 μW from a 3-V supply and occupies 0.177 mm² in 3-μm features. A more extensive analysis of the preamplifier requirements and operation is given in [12].
C. Time-Division Multiplexer Design
The time-division multiplexer, which samples the outputs of the eight preamplifiers onto one data lead, is shown in Fig. 6 . The time-division multiplexer is implemented as a counter-decoder design, where the multiplexer samples on both the positive and negative edges of the clock. This sampling scheme allows for lower clock speeds, reduced area and power consumption, and prevents clock transitions in the middle of the sampling window, a problem encountered using previous designs [13] . The active filter at the output of the time-division multiplexer filters the clock noise, which is primarily at 4 MHz, and adds an additional in-band gain of 20 dB for an overall on-probe gain of 1000 per channel. The active filter circuit implementation is similar to that shown in Fig. 5 , with a gain of 10 and a high frequency cutoff of 200 kHz. The measured crosstalk between consecutive multiplexer channels is less than 6%. A detailed analysis of the multiplexer design and performance is given in [12] .
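Two figures from this section can be cross-checked numerically: the 80 kHz probe clock that results from dual-edge sampling of eight channels at 20 kHz/channel, and the overall on-probe gain of 1000 obtained by cascading the 40 dB preamplifier with the 20 dB active filter. The function names below are illustrative only.

```python
def mux_clock_hz(channels: int, fs_hz: float, dual_edge: bool = True) -> float:
    """Clock needed to multiplex `channels` at fs_hz each.

    With dual-edge sampling, one sample is taken on every clock edge,
    so the clock runs at half the aggregate sample rate.
    """
    aggregate_rate = channels * fs_hz
    return aggregate_rate / 2 if dual_edge else aggregate_rate


def db_to_voltage_gain(gain_db: float) -> float:
    """Convert a voltage gain in dB to a linear ratio."""
    return 10 ** (gain_db / 20)


probe_clock = mux_clock_hz(8, 20e3)           # dual-edge sampling
single_edge_clock = mux_clock_hz(8, 20e3, dual_edge=False)
total_gain = db_to_voltage_gain(40 + 20)      # cascaded stages add in dB

print(probe_clock, single_edge_clock, total_gain)
```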
IV. IMPLANTABLE NEURAL SPIKE DETECTION ASIC
A. Spike Detector Operation
A block diagram of the spike detection circuitry is shown in Fig. 2. Thirty-two channels of time-division-multiplexed neural data sampled at 20 kHz per channel are input to the spike detection ASIC on four data leads. These signals are quantized to 5 bits of resolution by four successive approximation analog-to-digital converters (ADCs), demultiplexed, and stored in data memory. The spike detection core circuitry calculates the upper and lower spike detection thresholds based on the standard deviation and mean of each neural channel and then detects neural spikes by comparing the current sample at a site to the previously calculated thresholds for that site. When a spike is detected, the 5-bit amplitude and 5-bit address of that spike are written into output memory. The spike detector core is controlled by a serially programmable instruction RAM. Calculating the thresholds using a multi-instruction pipeline allows the overall clock speed of the spike detection ASIC to exceed 2.5 MHz. The rate at which the thresholds for each channel are recalculated is software programmable from 75 μs to 100 ms. The output memory transmits neural amplitude and address data off the chip over a serial data lead at 2.5 Mb/s. The output memory contains 32 registers, which allows for neural spikes on 50 consecutive channels before a spike is missed. The output memory operates as a first-in/first-out (FIFO) queue with a latency period of 2.84 μs added each time neural spikes are detected on consecutive scans. This yields a maximum latency (occurring only under very rare circumstances such as during an epileptic seizure) of 142 μs that can be corrected in the external decoding software because the clock rate and channel sampling order are known.
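As an illustration of the output format and the latency bound described above, the sketch below packs a 5-bit address and a 5-bit amplitude into a single word and accumulates the per-event FIFO latency. The bit ordering (address in the upper bits) and the absence of any framing bits are assumptions made for this example; the paper does not specify the serial word layout.

```python
FIELD_BITS = 5
FIELD_MASK = (1 << FIELD_BITS) - 1   # 0x1F
LATENCY_PER_EVENT_US = 2.84          # stated added latency per consecutive spike


def pack_event(address: int, amplitude: int) -> int:
    """Pack a channel address and amplitude code into one 10-bit word."""
    if not (0 <= address <= FIELD_MASK and 0 <= amplitude <= FIELD_MASK):
        raise ValueError("address and amplitude must each fit in 5 bits")
    return (address << FIELD_BITS) | amplitude


def unpack_event(word: int) -> tuple[int, int]:
    """Recover (address, amplitude) from a packed 10-bit word."""
    return (word >> FIELD_BITS) & FIELD_MASK, word & FIELD_MASK


def queue_latency_us(consecutive_events: int) -> float:
    """Latency accumulated when spikes occur on consecutive channels."""
    return consecutive_events * LATENCY_PER_EVENT_US


word = pack_event(address=17, amplitude=9)
decoded = unpack_event(word)
worst_case = queue_latency_us(50)    # 50 consecutive channels, as in the text
print(word, decoded, round(worst_case, 2))
```

Because the clock rate and scan order are known, an external decoder can subtract this accumulated latency to recover the true sample time of each event.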
The spike detector core circuitry and the successive approximation register used in the ADC were designed using Verilog HDL along with synthesis and auto place and route tools, while the memory and analog ADC electronics were designed using Spice and full custom layout. The synthesized spike detector core circuitry occupies 2.5 mm², the on-chip instruction, data, and output memory occupies 1.44 mm², and the ADCs occupy 0.3 mm², for a total of 6 mm² in AMI 0.5-μm CMOS.
B. Neural Analog-to-Digital Converter Design
There is widespread disagreement over the resolution needed to quantize neural action potentials. The integrated RMS quantization noise is given by [14]

$$V_{Q,\mathrm{rms}} = \frac{V_{\mathrm{LSB}}}{\sqrt{12}} \sqrt{\frac{f_H - f_L}{f_s/2}} \tag{2}$$

where $V_{\mathrm{LSB}}/\sqrt{12}$ is white noise distributed between $\pm f_s/2$, $f_L$ and $f_H$ are the edges of the recording band, and $f_s$ is the sampling frequency. For the design of the ADC in this paper, several assumptions supported by measured data were made: neural action potentials have a bandwidth between 300 Hz and 10 kHz and a maximum amplitude of 500 μV, the total measured integrated noise (300 Hz-10 kHz) from a recording site is 9.6 μV rms and from the on-probe circuitry is 8.9 μV rms, and the on-probe gain before the ADC is 60 dB. The total integrated quantization noise referred to the site and the total integrated quantization, amplifier, and site noise versus ADC resolution are shown in Fig. 7. From this plot, 5-bit quantization was chosen as the optimum resolution because the input noise is dominated by the site/preamplifier for quantization of 5 bits and above, and the power consumption, area, and clock speed of the ADC and DSP circuitry increase with increasing ADC resolution. In order to preserve this resolution, much of the spike detector DSP operates at 8-12 bits. Since there is a gain of 1000 in the neural recording microsystem before the ADC, the LSB is 31.25 μV referred to the site, and the maximum spike amplitude that can be digitized is 500 μV. This results in a total integrated quantization noise of 8.9 μV rms. Five-bit quantization increases the total integrated noise (300 Hz-10 kHz) in the system from 13.1 μV rms to 15.8 μV rms.
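The noise figures quoted above can be reproduced with a few lines of arithmetic. The sketch below assumes the standard uniform-quantizer model, in which the quantization noise LSB/√12 is white up to half the sampling frequency and only the portion inside the 300 Hz-10 kHz band is integrated; uncorrelated noise sources are combined as a root sum of squares. The function names are illustrative, and the ±500 μV input range (a 1000 μV span) is inferred from the stated LSB.

```python
import math


def lsb_uv(span_uv: float, bits: int) -> float:
    """Site-referred LSB for an ADC covering span_uv with 2**bits codes."""
    return span_uv / (2 ** bits)


def quantization_noise_uv(lsb: float, f_lo: float, f_hi: float, fs: float) -> float:
    """In-band RMS quantization noise, assuming white noise up to fs/2."""
    return lsb / math.sqrt(12) * math.sqrt((f_hi - f_lo) / (fs / 2))


def rss(*terms: float) -> float:
    """Root-sum-square combination of uncorrelated noise sources."""
    return math.sqrt(sum(t * t for t in terms))


SITE_NOISE_UV = 9.6    # measured site noise, 300 Hz-10 kHz
AMP_NOISE_UV = 8.9     # measured on-probe circuit noise, same band

lsb = lsb_uv(1000.0, 5)                                  # +/-500 uV range
q_noise = quantization_noise_uv(lsb, 300.0, 10e3, 20e3)
noise_before = rss(SITE_NOISE_UV, AMP_NOISE_UV)
noise_after = rss(SITE_NOISE_UV, AMP_NOISE_UV, q_noise)

print(lsb, round(q_noise, 1), round(noise_before, 1), round(noise_after, 1))
```

Repeating the calculation with `bits=4` roughly doubles the quantization term, which then exceeds the combined site and amplifier noise; this is consistent with the choice of 5 bits as the lowest resolution at which the front end dominates.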
A block diagram of the successive approximation ADC is shown in Fig. 8. The input sample-and-hold circuit samples the time-division-multiplexed neural data 7/8 of the way through the sampling window, which reduces multiplexer crosstalk [12]. A dummy switch has been added to reduce clock feedthrough in the sample-and-hold circuit [14]. The track-and-latch comparator, shown in Fig. 9, compares the sampled data to the reference voltage provided by the digital-to-analog converter (DAC). The comparator features an input differential amplifier with a small gain of 4 for high-speed operation. The unity-gain buffer at the comparator input prevents kickback from the latch, which is formed by a pair of inverters, from being stored on the sample-and-hold capacitor. In the latch phase, positive feedback allows very small voltage differences at the input to be resolved very quickly at the output. During the track phase, the outputs of the two inverters are shorted together to prevent hysteresis. In addition to controlling the DAC, the synthesized SAR controller (shown as the A/D and probe timing controller block in Fig. 2) also provides the multiplexer clock to the neural probes (probe clock), the sampling clock to the A/D (hold), and the latch signal to the comparators, and controls the writing of data from the ADCs into the data memory (write data(0:3)). A timing diagram illustrating the communication between the ADCs, data memory, and on-probe time-division multiplexers is shown in Fig. 10.
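The successive approximation search performed by the SAR controller, DAC, and comparator can be sketched behaviorally as a binary search from the most significant bit down. The model below is an idealized illustration, assuming a ±0.5 V input range at the ADC (±500 μV at the site after a gain of 1000) and offset-binary output codes; neither the coding nor the function name is taken from the paper, and comparator and DAC nonidealities are ignored.

```python
def sar_convert(v_in: float, v_min: float = -0.5, v_max: float = 0.5,
                bits: int = 5) -> int:
    """Ideal successive-approximation conversion of v_in to a `bits`-wide code."""
    span = v_max - v_min
    code = 0
    for bit in reversed(range(bits)):          # MSB first
        trial = code | (1 << bit)              # tentatively set this bit
        v_dac = v_min + trial * span / (1 << bits)
        if v_in >= v_dac:                      # comparator decision
            code = trial                       # keep the bit
    return code


codes = [sar_convert(v) for v in (-0.5, -0.1, 0.0, 0.2, 0.49)]
print(codes)
```

Each conversion takes one comparator decision per bit, so a 5-bit result requires five DAC settling and compare cycles per sample.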
A diagram of the current-source DAC is shown in Fig. 11. The current sources (M8-M22) are turned on or off depending on the value of the digital feedback bits. While this architecture is often used to achieve high-speed DACs by steering currents between ground and the negative terminal of a transimpedance amplifier [14], here the current sources are turned off instead of switched to ground when they are not in use. While this decreases the speed of the design due to transistor "wake up," the average power consumption of the DAC is reduced by almost 58%. The nominal value of the DAC reference current is 25 μA, which allows quantization of 500-μV action potentials with an LSB of 31.25 μV. The reference current is adjustable for recording spikes that have a wider or smaller signal swing, with a corresponding increase or decrease in the least significant bit. Both pMOS and nMOS current sources have been used here to allow the analog output of the DAC to achieve both positive and negative values. The linearity of the DAC depends on how well the currents can be matched between the nMOS and pMOS devices. This puts stringent restrictions on the maximum tolerable current mirror error. In this design, a 6% current mirror error between the nMOS and pMOS current sources would cause a 1-bit DNL at the output of the DAC. To provide low current mirror error at a 3-V supply, the wide-swing current mirror architecture [14] was chosen. The transimpedance amplifier is shown in Fig. 12. The amplifier features a wide-swing output stage [15], which is necessary because standard output stages cannot obtain the desired output swing from a 3-V supply.
V. MEASURED RESULTS
A. Analog-to-Digital Converter
The spike detection ASIC has been fabricated and is pictured in Fig. 1. The measured transfer characteristic of the ADC is shown in Fig. 13, where the offset and gain errors have been trimmed using the adjustments shown in Fig. 11. The maximum DNL is 0.3 LSB, the maximum INL is 0.5 LSB, and there are no missing codes. The power consumption of each ADC is 600 μW, including the bias circuitry, operating at a clock frequency of 2.5 MHz.
B. System Level Spike Detector Testing
In order to evaluate the performance of the spike detection ASIC, neural waveforms recorded using multiplexed active probes were reconstructed using LabVIEW and a DAQ card and were input to the spike detector. The response of the spike detector to a sample neural waveform with the thresholds set to three times the standard deviation above/below the mean is shown in Fig. 14. The average bandwidth savings across five sample waveforms is 92% with a maximum bandwidth savings of 95%. On average, 86% of the transmitted data is associated with neural spikes and no spikes have been missed. The predicted bandwidth savings based on these records when recording from neurons in motor cortex with a firing rate of 50 firings/s [9] is 91%. Overall microsystem performance is summarized in Table I.

Fig. 14. Response of the spike detection ASIC to a neural waveform recorded using active microprobes. The top trace is the reconstructed neural signal input to the spike detector. The middle trace is the serial digital output of the spike detector. The bottom trace is the output waveform reconstructed from the serial digital output data.
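The measured bandwidth savings translate directly into the number of channels a fixed-rate link can carry. The sketch below performs that conversion for the 92% average savings, starting from the 25 uncompressed channels supported by the telemetry link; the function names are illustrative.

```python
def channel_multiplier(bandwidth_savings: float) -> float:
    """Factor by which channel count grows when a fraction of data is removed."""
    if not 0.0 <= bandwidth_savings < 1.0:
        raise ValueError("savings must be in [0, 1)")
    return 1.0 / (1.0 - bandwidth_savings)


def compressed_channels(raw_channels: int, bandwidth_savings: float) -> int:
    """Whole channels supported on the same link after compression."""
    return int(raw_channels * channel_multiplier(bandwidth_savings))


multiplier = channel_multiplier(0.92)
channels = compressed_channels(25, 0.92)
print(round(multiplier, 1), channels)
```

A 92% reduction leaves 8% of the original data, so the same link carries 12.5 times as many channels, or 312 in place of 25.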
C. Neural Spike Discrimination
The ability to discriminate between neural spikes on a single channel improves the control of neuroprosthetics and aids in neuroscience experiments. Vibert [16] evaluated neural spike separation based on eight parameters of the neural spike and concluded that using three parameters (the maximum positive spike amplitude, the minimum negative spike amplitude, and the time between these two extrema) not only adequately sorted spikes but was actually superior to spike separation using more parameters. This was because the additional parameters were all correlated with one or more of these three, while the three themselves were found to be uncorrelated. Separation of neurons using these three parameters is based on physiology, where the maximum and minimum amplitudes for a particular neuron depend on the distance of the site from the neuron, the size of the neuron, and the spatial configuration of the dendritic tree [17]. The first principal component, represented by the two amplitude parameters, contained 60%-65% [16] of the total information in Vibert's study. The time between the two amplitude extrema is also derived from physiology and depends on the distance of the recording site from the neuron's axon hillock [18]. This second principal component contained 15%-20% [16] of the information in Vibert's experiments. Since the neural spike detection ASIC shown in Fig. 1 preserves the maximum neural spike amplitude, the minimum neural spike amplitude, and the time between them, it is possible to discriminate between different neurons detected on the same channel.
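The three discrimination parameters can be extracted from the samples of a single detected spike with very little computation. The following sketch is an illustration only: the function name, the use of 5-bit codes as input, and the example waveform are hypothetical, and the sample spacing assumes the 20 kHz per-channel sampling rate used in this system.

```python
def spike_features(samples, fs_hz: float = 20e3):
    """Return (max amplitude, min amplitude, time between them in seconds)."""
    indices = range(len(samples))
    i_max = max(indices, key=samples.__getitem__)   # index of positive peak
    i_min = min(indices, key=samples.__getitem__)   # index of negative peak
    return samples[i_max], samples[i_min], abs(i_max - i_min) / fs_hz


# Hypothetical biphasic spike expressed as 5-bit ADC codes.
spike = [16, 18, 27, 22, 9, 5, 12, 16]
a_max, a_min, dt = spike_features(spike)
print(a_max, a_min, dt)
```

Spikes from different neurons on the same site would then be separated by clustering these three values, for example in external decoding software.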
VI. CONCLUSION
A 256-site neural recording microsystem with implantable data compression circuitry has been developed. Four active neural probes with integrated circuitry for site selection, amplification, and multiplexing drive an embedded data-compression ASIC that successfully detects neural spikes in the presence of neural and circuit noise. The spike detector quantizes and demultiplexes the neural data, calculates upper and lower spike thresholds based on the mean and standard deviation of each channel, detects neural spikes in the presence of noise, and serially shifts the address and amplitude of detected spikes off-chip on one data lead. The average bandwidth savings when the thresholds are set to three standard deviations above and below the mean is 92%, indicating that 12.5 times as many neural channels can be recorded simultaneously across a finite bandwidth wireless link. Given a telemetry bandwidth of 2.5 Mb/s [6] , this increases the total number of channels that can be transmitted transcutaneously at 5 bits of resolution and a sampling rate of 20 kHz/channel from 25 to 312.
work. The assistance of N. Gulari, G. Eadara, Dr. B. Jamieson, Dr. A. DeHennis, B. Casey, R. Gordenker, and J. Hetke is also very much appreciated.
