A special interface system, incorporating an embedded CPU core in a programmable logic device and accompanied by real-time software, has been developed to provide connectivity to a host computer.
I. INTRODUCTION
AVAILABILITY of multisite neuronal electrodes, such as the Michigan probe [1] or the Utah array [2], has enabled the development of highly integrated multichannel recording devices with large channel counts. These devices are of importance to various aspects of neurophysiological research [3]-[5].
Multisite electrodes can potentially provide for simultaneous monitoring of hundreds and even thousands of neurons. The raw data rates generated by such populations are large [6]. When sampled at 20 Ksps with 8-bit precision, 100 electrodes would generate a raw data rate of 16 Mbps. Communicating such volumes of neuronal data over battery-powered wireless links while maintaining reasonable battery life is hardly possible with common methods of low-power wireless communications. Evidently, some form of data reduction must be applied. One possible way is to utilize some form of lossy data compression to reduce the raw waveform data volume. A method employing the Wavelet Transform was suggested in [7]. Alternatively, one might extract the significant features of the neuronal signal and limit the transmitted data to those features only. For example, it is possible to detect the presence of neuronal spikes as demonstrated in [6] and communicate only active portions of recorded signals, which may lead to an order of magnitude reduction in the required data rate [8]. Another order of magnitude reduction can be achieved if the neuronal spikes are sorted on the chip and mere notifications of spike events are transmitted to the host. Power feasibility of on-chip spike sorting with common sorting algorithms that are usually software-based is verified in [9]. Adapting these algorithms for very large-scale integration (VLSI) can lead to further significant power savings, with only a minor sacrifice in accuracy [10], [11]. In [12], it is suggested to measure and communicate certain features of the incoming spikes; the spike sorting can subsequently operate on these features.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TBME.2006.883732

Our research is aimed at an integrated wireless recording device capable of acquiring neuronal activity over a large number of channels, digitizing, performing data reduction and communicating over a bi-directional wireless link. This paper describes a multichannel neuronal recording front-end integrated circuit, fabricated in a standard 0.35-μm CMOS process. The front-end acquires neuronal signals from 12 true-differential recording channels, performs analog signal conditioning including separation of spike and local field potential (LFP) frequency bands, digitizes the outputs and transmits the data to the host over a serial bus.
An on-chip controller provides a level of data reduction by thresholding the incoming signals and transmitting only the "active" signal portions, i.e., segments of signals immediately following threshold crossing events. The front-end is to be integrated with spike-sorting hardware and a wireless modem at the PCB level prior to full VLSI integration.
Separating the LFP and the SPK bands in the analog portion of the front-end may have certain advantages, as it reduces the dynamic range requirements on the last front-end stages. In a signal recorded by an extracellular microelectrode, neuronal firing activity occupies the 100-10 000 Hz frequency band; its amplitude is typically lower than 500 μV. The LFP occupies the lower frequencies, below 100 Hz, with amplitudes below 5 mV. The signal-to-noise ratio (SNR) of the combined signal is rather large; as the microelectrode noise [13] and background noise of cortical activity [14] are typically 5 μV, it may reach 60 dB. Since the LFP must be filtered out prior to spike sorting, it is possible to block it right at the front-end [15], by high-pass filtering below 100 Hz. It is commonly indicated, however, that the LFP carries important information [16]-[18]. Thus the recording device should preferably make this information available together with the spike data (SPK). Several front-end circuits pass the LFP band intact [19]-[21]. They block the large input dc offsets, typical of neuronal signals, by high-pass filtering below 1 Hz. As the entire combined signal is passed, the minimal required precision of subsequent data acquisition is 10 bits, defined by the signal SNR. The maximal gain is limited by the LFP magnitude and the chip supply voltage. Since the firing activity (SPK) has ten times lower magnitude than the LFP, it can be amplified only to one tenth of the output swing.
Splitting the signal into two bands after the first amplification stage allows separate processing of the LFP and SPK bands, amplifying both to the full swing. Consequently, the system dynamic range need only be 100 (40 dB), as determined by the SNR of the SPK signal, and data acquisition with no more than seven bits of precision is required.
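The bit-count argument above can be checked with a short calculation. The following is a minimal sketch, assuming that the required precision is simply the smallest number of bits whose level count covers the ratio of full-scale amplitude to noise floor; the function name and the 500-μV SPK amplitude (one tenth of the 5-mV LFP amplitude) are illustrative assumptions rather than values taken from the chip documentation.

```python
import math


def bits_for_dynamic_range(ratio: float) -> int:
    """Smallest bit count whose 2**bits levels span the given amplitude ratio."""
    return math.ceil(math.log2(ratio))


NOISE_FLOOR_V = 5e-6   # typical electrode and background noise
LFP_MAX_V = 5e-3       # LFP amplitude bound
SPK_MAX_V = 500e-6     # assumption: one tenth of the LFP amplitude

combined_ratio = LFP_MAX_V / NOISE_FLOOR_V   # about 1000, i.e. 60 dB
spk_ratio = SPK_MAX_V / NOISE_FLOOR_V        # about 100, i.e. 40 dB

combined_bits = bits_for_dynamic_range(combined_ratio)
spk_bits = bits_for_dynamic_range(spk_ratio)

print(f"combined signal: {20 * math.log10(combined_ratio):.0f} dB, {combined_bits} bits")
print(f"SPK band only:   {20 * math.log10(spk_ratio):.0f} dB, {spk_bits} bits")
```

Under these assumptions the combined signal needs 10 bits while the separated SPK band needs 7, which matches the figures quoted in the text.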
The chip architecture is described in Section II. A special embedded system design for interfacing the chip is briefly described in Section III. Recording channel circuitry is reviewed in Section IV, with selected test results presented in Section V. Section VI summarizes the discussion.
II. ARCHITECTURE
The chip architecture is shown in Fig. 1. The on-chip controller is responsible for host communication, chip timing, internal register access, channel readout and spike detection. Channel registers and analog-to-digital converters (ADCs) are accessed through an internal parallel bus, mastered by the controller.
The controller has two modes of operation, programming and streaming. In the programming mode, contents of internal registers can be stored and fetched by the host. In the streaming mode, the controller continuously polls the channel ADCs, checks for threshold crossing events on every channel, and transmits the active signal segments to the host. All 12 channels or an arbitrary subset thereof can be enabled for data streaming.
A threshold crossing event is triggered for a channel when the output of that channel falls below the low threshold or rises above the high threshold. The controller continuously polls the ADC outputs to check for such events. Following a threshold crossing event, a certain number of samples from that channel is communicated to the host.
The threshold values and the number of samples to transmit after the threshold event are programmable. The entire data stream, without clipping, can be obtained from the chip by setting both thresholds to the same value.
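The gating behavior described above can be illustrated with a small software model. This is only a sketch of the idea, not the controller logic itself: the function name, the strict comparisons, and the choice to restart the sample countdown when a new crossing occurs inside an open window are all assumptions.

```python
from typing import Iterable, List, Tuple


def active_segments(samples: Iterable[int], low: int, high: int,
                    n_after: int) -> List[Tuple[int, int]]:
    """Return the (index, value) pairs a threshold-gated stream would emit.

    A sample triggers an event when it is below `low` or above `high`
    (strict comparisons are an assumption). The triggering sample and the
    following samples are emitted until `n_after` samples have been sent;
    a new crossing inside an open window restarts the count (assumption).
    """
    emitted = []
    remaining = 0
    for index, value in enumerate(samples):
        if value < low or value > high:
            remaining = n_after
        if remaining > 0:
            emitted.append((index, value))
            remaining -= 1
    return emitted


trace = [500, 510, 700, 650, 505, 498, 502, 300, 495]
gated = active_segments(trace, low=400, high=600, n_after=3)
print([index for index, _ in gated])
```

In this model, setting `low` equal to `high` makes every sample that differs from the threshold trigger an event, so with a window of two or more samples the stream is effectively ungated.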
A. Chip Communications
The chip communicates over a McBSP [22] bus. This is a five-wire, full-duplex, bit-serial synchronous bus; a synchronization clock signal is constantly supplied by the host. The communication is carried out in frames; the host sends 24-bit frames (we refer to this direction as downwards) and the chip replies with 16-bit frames (the upwards direction). The lengths of downward and upward frames were conveniently chosen to match the lengths of a single host instruction packet and a single reply packet respectively.
The maximal data rate that is generated by the chip can be calculated as follows. A channel analog-to-digital (A/D) sample is 10 bits wide (although seven bits are sufficient, we have implemented 10-bit ADCs for verification purposes). Together with a four-bit channel number and a two-bit control field, an A/D sample can be communicated in a single 16-bit frame. With the SPK channel sampled at 40 Ksps, a single channel would generate 640 Kbps. The LFP channel needs to be sampled at a much lower rate (1 Ksps would be enough), which adds 16 Kbps for a combined rate of 656 Kbps. Although there are only 12 channels in the current version of the chip, the bus interface was designed to support 16 channels for future versions; the aggregate data rate is, therefore, 10.5 Mbps. The bus was set to operate on a slightly higher, 12.5-MHz clock signal.
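The frame budget and rate arithmetic above can be reproduced in a few lines. The sketch below is illustrative only: the text gives the field widths (10-bit sample, 4-bit channel number, 2-bit control field) but not their order within the frame, so the layout used here (control, then channel, then sample, most significant bits first) is an assumption, as are the function names.

```python
SAMPLE_BITS = 10
CHANNEL_BITS = 4
CONTROL_BITS = 2
FRAME_BITS = SAMPLE_BITS + CHANNEL_BITS + CONTROL_BITS  # 16-bit upward frame


def pack_frame(control: int, channel: int, sample: int) -> int:
    """Pack one A/D sample into a 16-bit frame (field order is assumed)."""
    if not 0 <= control < (1 << CONTROL_BITS):
        raise ValueError("control field out of range")
    if not 0 <= channel < (1 << CHANNEL_BITS):
        raise ValueError("channel number out of range")
    if not 0 <= sample < (1 << SAMPLE_BITS):
        raise ValueError("sample out of range")
    return (control << (CHANNEL_BITS + SAMPLE_BITS)) | (channel << SAMPLE_BITS) | sample


def unpack_frame(frame: int) -> tuple:
    """Inverse of pack_frame: return (control, channel, sample)."""
    sample = frame & ((1 << SAMPLE_BITS) - 1)
    channel = (frame >> SAMPLE_BITS) & ((1 << CHANNEL_BITS) - 1)
    control = frame >> (CHANNEL_BITS + SAMPLE_BITS)
    return control, channel, sample


def aggregate_rate_bps(n_channels: int, spk_sps: int = 40_000,
                       lfp_sps: int = 1_000) -> int:
    """Raw upward data rate with every channel streaming SPK and LFP."""
    return n_channels * (spk_sps + lfp_sps) * FRAME_BITS


print(aggregate_rate_bps(1))    # per-channel rate in bit/s
print(aggregate_rate_bps(16))   # bus design target
print(aggregate_rate_bps(12))   # current 12-channel chip
```

With these numbers a single channel yields 656 Kbps and 16 channels about 10.5 Mbps, as stated; the 12 channels of the current chip give about 7.9 Mbps.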
B. Instruction Set and Register Access
The chip operation is controlled through instructions sent via the McBSP bus. Four instructions are available.
• STORE reg val: Store value in a register.
• FETCH reg: Fetch register contents.
• RUN: Start streaming data.
• STOP: Halt streaming data.

There are two kinds of parameters that control the chip, those affecting controller operation and those affecting the channels. The former include clock divider settings, threshold values, number of samples to communicate upon threshold detection and channel enabling bit mask; the registers for their storage reside in the controller and are accessed directly. The latter include offset calibration data, channel gains and filter frequencies; the registers are distributed over the channels and are accessed through the internal bus.
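A host-side encoder for the four instructions listed above might look as follows. The text specifies only that a downward frame is 24 bits long and holds one instruction; the split into an 8-bit opcode, an 8-bit register address and an 8-bit value, the numeric opcode values, and all names in this sketch are hypothetical.

```python
from enum import IntEnum


class Opcode(IntEnum):
    # Numeric codes are hypothetical; the text names the instructions only.
    STORE = 0x1
    FETCH = 0x2
    RUN = 0x3
    STOP = 0x4


def encode_instruction(op: Opcode, reg: int = 0, val: int = 0) -> bytes:
    """Encode one instruction as a 24-bit (3-byte) downward frame.

    Assumed layout: opcode byte, register address byte, value byte.
    """
    for name, field in (("reg", reg), ("val", val)):
        if not 0 <= field <= 0xFF:
            raise ValueError(f"{name} does not fit in 8 bits")
    return bytes([int(op), reg, val])


# Example session: program one register, then start and stop streaming.
session = [
    encode_instruction(Opcode.STORE, reg=0x12, val=0x7F),
    encode_instruction(Opcode.RUN),
    encode_instruction(Opcode.STOP),
]
print([frame.hex() for frame in session])
```

Keeping the encoder separate from the transport makes it straightforward to substitute the real bit layout once it is known.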
The internal bus has eight data lines, two control lines and a clock. A register connected to the bus is identified by a distinct 8-bit address. Every bus access is carried out in two steps; during the address step ( is high) the address is driven on . The register matching this address is selected. During the data step ( is low) the contents of the selected register are driven on the bus by the channel ( is low) or the register is updated with the value on the bus ( is high). The bus can be accessed in three possible scenarios, -, -and ---. 
III. HOST INTERFACE
A special interface provides for communication between a personal computer and the neuronal recording front-end. The basis of the interface is an Altera Nios II development kit board incorporating an Altera Cyclone II field programmable gate array (FPGA) device, RAM and flash memory, and an integrated Ethernet physical interface/MAC (Fig. 3).
The FPGA incorporates an Altera Nios II embedded processor core (running at 50 MHz), bus logic and a custom-developed peripheral for McBSP communications with the neuronal recording front-end. The embedded processor executes a real-time operating system and custom-developed real-time software for handling the neuronal data stream. The software reads the serial McBSP data, packetizes it and transmits the packets over Ethernet to a host computer using the UDP/IP protocol. It also handles the incoming instructions from the host and communicates them to the chip.

The host-side software consists of a low-level C++ module that handles the data stream in real time, dumps it onto the disk and performs the decimation necessary for on-screen display. Displaying the data without some form of decimation (i.e., downsampling) would require screen refresh rates too high to be perceived by the human eye. Data display and system control are performed by the top-level Java GUI module (Fig. 4).

IV. RECORDING CHANNEL
Fig. 5 shows the recording channel block diagram, as implemented on a 0.35-μm CMOS chip. The input signal is amplified fifty times by the first stage, which also converts the differential signal to single-ended. A first-order RC filter splits the signal into a high-frequency SPK part and a low-frequency LFP part. The splitting pole is placed at roughly 200 Hz, with a 5-MΩ resistor (high-resistive polysilicon) and a 160-pF (gate-oxide) capacitor.
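The placement of the splitting pole can be verified from the component values. The sketch below uses the standard first-order relation f = 1/(2πRC); the resistor value of 5 MΩ is taken as an assumption here, chosen because it is consistent with the stated 200-Hz pole and 160-pF capacitor.

```python
import math


def rc_pole_hz(r_ohm: float, c_farad: float) -> float:
    """Pole frequency of a first-order RC network, f = 1 / (2*pi*R*C)."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)


R_SPLIT_OHM = 5e6        # assumption: 5 megohm polysilicon resistor
C_SPLIT_F = 160e-12      # gate-oxide capacitor

f_split = rc_pole_hz(R_SPLIT_OHM, C_SPLIT_F)
print(f"splitter pole: {f_split:.1f} Hz")
```

The result, just under 200 Hz, agrees with the nominal pole position.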
The SPK signal is amplified by an intermediate ×10 stage and a variable gain amplifier (VGA) with a digitally selectable gain of 2.5, 5, 7.5, or 10. The maximal gain of the SPK chain is, therefore, 5000. The SPK signal band is limited by a second-order Bessel LPF (Fig. 6), implemented as a Sallen-Key biquad [23]. The −3-dB frequency is digitally programmable in the range of 8-13 kHz, by means of a multitap resistor. The LFP signal is amplified by an identical VGA, without the intermediate ×10 amplifier. The maximal gain of the LFP chain is 500.
Both SPK and LFP channels have to be compensated for dc offsets introduced by element mismatch. The LFP channel amplifies the preamp input offset (typically hundreds of microvolts) by 54 dB; unless compensated, it would severely degrade the LFP dynamic range or even saturate the VGA. The SPK channel amplifies the offset of the intermediate ×10 stage by 40 dB, as the preamp dc is blocked by the splitter. Though smaller than that of the LFP, the SPK offset is still significant: the ×10 stage has a larger input offset than the preamp, as the latter uses very large input devices (due to noise requirements). DC offset compensation is carried out by adjusting the VGA reference voltages with a pair of 5-bit calibration digital-to-analog converters (DACs).
Finally, the channels are multiplexed by a Miller-capacitance sample-and-hold circuit (Fig. 6) and converted by a 10-bit successive approximation ADC, which incorporates a special, low-power inverted-ladder DAC [24].
A. Input Preamp
Voltage offsets inherent in neural signal recordings constitute a major challenge in preamplifier design. An input signal must be high-pass filtered at frequencies as low as several Hertz, to let the LFP signal pass unsuppressed. Such time constants are not readily available in integrated circuits.
Several approaches for dc offset stabilization have been reported. Off-chip elements are sometimes employed at input stages [8], [25]. Several fully integrated approaches have also been demonstrated. The signal can be capacitively coupled to the amplifier using the polarization capacitance of the electrode, shunted by either a weak-inversion MOS transistor [21] or a reverse-biased diode [26], both delivering a large small-signal impedance to form a low-frequency pole at the input. In the former, the gate bias of the shunting transistor is derived with a laser-trimmed resistor. The dc gain of this scheme is not strictly zero, since the real part of the electrode impedance, although very large, is not infinite. The dc gain is, therefore, defined by the ratio of the shunting resistance and the parallel resistance of the electrode. Another fully integrated approach suggests using a pseudoresistor device based on a weak-inversion MOS transistor and a parasitic bipolar transistor [12], [19], [27], [28]. Such a device has an extremely large small-signal resistance at small bias voltages.
The proposed preamp schematic is shown in Fig. 7. A differential stage with a gain of five and a high-pass filter (HPF) is followed by a differential-to-single-ended stage with a gain of ten. The total preamp gain is, therefore, 50. The minimal gain to be provided by the preamp is determined by noise constraints as follows. The root-mean-square noise introduced by the frequency splitter resistance $R_S$ into the SPK signal (band $\Delta f$ of 10 kHz) is

$$\sqrt{\overline{v_{n,S}^{2}}} = \sqrt{4kTR_S\,\Delta f} \approx 29\ \mu\mathrm{V} \tag{1}$$

where $k$ is the Boltzmann constant and $T$ is the absolute temperature. Hence, the preamp must provide a gain well above 20 dB to keep the splitter contribution below the target 2 μV.

We have chosen to place a weak-inversion MOS transistor in parallel with the feedback capacitor $C_f$, to provide a first-order high-pass filter for input dc suppression. The cutoff frequency is digitally programmable through gate bias voltage adjustment with a calibration DAC. Since the conductance provided by the feedback transistor is not a controlled process parameter, a significant variability (more than an order of magnitude) in cutoff frequency was measured among the channels, even on the same die (Fig. 8). By adjusting the gate bias voltage, we were able to calibrate all channels to a 1-Hz cutoff.
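The minimal-gain argument at the beginning of this subsection can be checked numerically. This is a sketch under stated assumptions: the splitter noise is taken as the usual thermal noise of a resistor, sqrt(4kTRΔf), with a 5-MΩ splitter resistance (consistent with the 200-Hz pole and the 160-pF capacitor), a temperature of 300 K, and a 2-μV input-referred target; none of these values are confirmed beyond what the surrounding text implies.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def thermal_noise_vrms(r_ohm: float, bandwidth_hz: float,
                       temp_k: float = 300.0) -> float:
    """RMS thermal noise voltage of a resistor over a given bandwidth."""
    return math.sqrt(4.0 * K_B * temp_k * r_ohm * bandwidth_hz)


def min_gain_db(noise_vrms: float, target_vrms: float) -> float:
    """Gain needed ahead of a noise source to meet an input-referred target."""
    return 20.0 * math.log10(noise_vrms / target_vrms)


splitter_noise = thermal_noise_vrms(5e6, 10e3)      # assumed R and band
required_gain = min_gain_db(splitter_noise, 2e-6)   # assumed 2 uV target

print(f"splitter noise: {splitter_noise * 1e6:.1f} uV rms")
print(f"gain needed for 2 uV input-referred: {required_gain:.1f} dB")
print(f"input-referred at gain 50: {splitter_noise / 50 * 1e6:.2f} uV rms")
```

Under these assumptions the splitter contributes roughly 29 μV rms, so a gain somewhat above 20 dB is the minimum, and the chosen gain of 50 leaves a comfortable margin.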
Given a single-pole splitter with pole frequency $f_S$, the noise energy contributed by the feedback resistor to the SPK signal is

$$\overline{v_{n,\mathrm{out}}^{2}} = \frac{kT}{C_f}\cdot\frac{g}{g + 2\pi f_S C_f} \tag{2}$$

where $g$ stands for the conductance of the dc nulling resistor and $C_f$ for the feedback capacitance it shunts. Assuming $f_S$ (about 200 Hz) is much larger than the selected cutoff frequency of the input HPF, $f_{HP} = g/(2\pi C_f)$, the expression above can be re-written as

$$\overline{v_{n,\mathrm{out}}^{2}} \approx \frac{kT}{C_f}\cdot\frac{f_{HP}}{f_S} \tag{3}$$

and reflected to the input, with $A_1$ denoting the first stage gain, as

$$\overline{v_{n,\mathrm{in}}^{2}} \approx \frac{1}{A_1^{2}}\cdot\frac{kT}{C_f}\cdot\frac{f_{HP}}{f_S} \tag{4}$$

Fig. 9. Die photo.

Placing the resistive element in the feedback has an important advantage: the noise generated by the resistor is attenuated by the amplifier gain. For $f_{HP}$ of 1 Hz, $f_S$ of 200 Hz, first stage gain of five and $C_f$ of 500 fF, we obtain about 1.8 μV input root-mean-square noise (remembering that there are two resistive elements in a differential stage). The calculations do not include the opamp noise.

Another important tradeoff is revealed by the above formula: higher $f_{HP}$ yields a higher noise contribution of the pseudoresistors and better dc rejection. In that context, providing for a selectable cutoff frequency is another advantage.
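The quoted noise figure and the cutoff-frequency tradeoff can be reproduced numerically. The sketch below assumes a simple model: each feedback element contributes an output noise energy of (kT/C)·f_HP/(f_HP + f_S), where C is the 500-fF feedback capacitance, f_HP the input HPF cutoff and f_S the splitter pole; the two elements of the differential stage add in power, and the result is divided by the first stage gain. The model, the 300-K temperature and the function name are assumptions made for illustration.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def pseudoresistor_input_noise_vrms(f_hp_hz: float, f_split_hz: float,
                                    gain: float, c_f_farad: float,
                                    n_elements: int = 2,
                                    temp_k: float = 300.0) -> float:
    """Input-referred rms noise of the feedback elements in the SPK band."""
    per_element = (K_B * temp_k / c_f_farad) * f_hp_hz / (f_hp_hz + f_split_hz)
    return math.sqrt(n_elements * per_element) / gain


nominal = pseudoresistor_input_noise_vrms(1.0, 200.0, 5.0, 500e-15)
print(f"f_HP = 1 Hz: {nominal * 1e6:.2f} uV rms")

# Sweep the HPF cutoff to show the noise versus dc-rejection tradeoff.
for f_hp in (0.1, 1.0, 10.0):
    noise = pseudoresistor_input_noise_vrms(f_hp, 200.0, 5.0, 500e-15)
    print(f"f_HP = {f_hp:5.1f} Hz -> {noise * 1e6:.2f} uV rms")
```

With a 1-Hz cutoff this model gives about 1.8 μV rms, in line with the value stated above, and the sweep shows the noise growing roughly with the square root of the cutoff frequency.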
V. MEASUREMENT RESULTS
A 0.35-μm CMOS double-poly, quad-metal, 3.7 × 3.9 mm² integrated circuit (Fig. 9) was fabricated at AustrianMicroSystems and tested electrically.
The electrical tests were carried out on 12 channels from ten different dies. The measurements were completely automated; the instruments and the chip were controlled by MATLAB software.
Small-signal responses of the SPK and the LFP channels measured on several dies are presented in Fig. 10. The flat-band gains for the SPK and LFP chains were measured as 3780 and 430, respectively. They vary little over different dies, by about 1% for SPK and 2% for LFP. The deviation from the target average values is due to an inaccurately predicted gain of the ×10 stages (and VGAs, which have similar configurations), which turned out to be 9.1 instead of 10. Thus gain errors are introduced into both the SPK and LFP chains.
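The measured flat-band gains are roughly what one would expect if every nominal ×10 block realized a gain of 9.1. The sketch below makes that check under an explicit assumption: the SPK chain is modeled as the ×5 differential stage followed by three nominal ×10 blocks (the second preamp stage, the intermediate stage and the VGA at maximal gain), and the LFP chain as the ×5 stage followed by two such blocks. This decomposition is inferred from the channel description, not stated in the measurement results.

```python
def chain_gain(first_stage_gain: float, block_gain: float, n_blocks: int) -> float:
    """Gain of a chain made of one first stage and n identical blocks."""
    return first_stage_gain * block_gain ** n_blocks


MEASURED_SPK = 3780.0
MEASURED_LFP = 430.0

spk_nominal = chain_gain(5.0, 10.0, 3)   # target SPK gain
lfp_nominal = chain_gain(5.0, 10.0, 2)   # target LFP gain
spk_actual = chain_gain(5.0, 9.1, 3)     # with 9.1 per nominal x10 block
lfp_actual = chain_gain(5.0, 9.1, 2)

for name, predicted, measured in (("SPK", spk_actual, MEASURED_SPK),
                                  ("LFP", lfp_actual, MEASURED_LFP)):
    error = abs(predicted - measured) / measured
    print(f"{name}: predicted {predicted:.0f}, measured {measured:.0f}, "
          f"difference {error:.1%}")
```

Under this model the predicted gains fall within a few percent of the measured values, which supports the explanation given for the deviation from the 5000 and 500 targets.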
It can be observed (on both SPK and LFP graphs) that the frequency splitter pole varies significantly among different curves. Its average location is also displaced, 350 Hz instead of 200 Hz. This is due to a failure in a bias circuit that was supposed to provide well bias for a large MOS capacitor inside the band splitter. Fig. 11 shows the SPK channel gain and the cutoff frequency of the output LPF for different settings of digital controls. LPF digital control input determines how many segments are connected in parallel in filter resistors. Thus the control value is directly proportional to the time constant and inversely proportional to the cutoff frequency.
Noise measurements were carried out with grounded inputs. Example results for noise measurements in the SPK and LFP channels are presented in Fig. 12, along with simulated curves. The total input-referred noise is 3 μV for the SPK chain and 10 μV for the LFP chain, when measured down to a frequency of 10 Hz. The low-frequency behavior of the LFP noise is $1/f^2$, and not $1/f$ as might be expected. This is due to leakage currents through the pseudoresistor MOS diffusions in the input stage. The area of these diffusions must be kept small in the layout.

The input stage of the preamplifier consumes 75 μA; the corresponding NEF [29]

$$\mathrm{NEF} = V_{\mathrm{rms,in}}\sqrt{\frac{2I_{\mathrm{tot}}}{\pi\,U_T\cdot 4kT\cdot BW}} \tag{5}$$

where $V_{\mathrm{rms,in}}$ is the input-referred rms noise, $I_{\mathrm{tot}}$ the supply current, $U_T$ the thermal voltage and $BW$ the bandwidth, can be calculated for the bandwidth of 10 kHz as 10.4. It is possible to design a more efficient amplifier in terms of NEF. An example is reported in [19], which also presents a thorough comparison of numerous reported neuronal preamplifiers and their NEF. Although possible, further noise reduction of our preamplifier would require an increase of the input stage area that was not allowed by the overall chip area allocation. The area occupied by the preamplifier is 0.076 mm², including the ×10 stage and the bias DAC. The overall channel current consumption (including the sample-and-hold and the ADC) is about 1 mA, mostly due to the inefficient ×10 and VGA stages (to be redesigned in future versions). The total chip current consumption is about 12 mA.

The 10-bit ADC was designed for a DNL below 1 LSB; a DNL of 0.8 LSB and an INL of 1.8 LSB were measured. The output voltage range extends to within some 0.5 V of the supply rails, limited mostly by the sample-and-hold circuit. This leads to an overall SPK channel dynamic range of 150-600, depending on the selected gain. For a sine output of 1 Vpp amplitude, a THD below 1% was measured on both SPK and LFP. The PSRR and CMRR of the entire SPK channel have been estimated by simulations as 70 and 90 dB, respectively.

An observation was made during the noise measurements of the chip regarding the better noise immunity of the fully integrated preamplifier compared to a preamplifier with external capacitors. A preamplifier with external capacitors, described in [8], was included on the chip for testing purposes. A noise measurement was performed simultaneously on the integrated preamplifier and the preamplifier with external capacitors (Fig. 13). The circuit with external elements is far more susceptible to external noise sources. We believe that the noise is induced on the discrete capacitors and on the inherently longer board tracks (due to the presence of the capacitors).
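The noise efficiency factor quoted earlier in this section for the preamplifier input stage can be approximated with the commonly used definition, NEF = V_rms,in · sqrt(2·I_tot / (π·U_T·4kT·BW)). The sketch below assumes that definition, an input stage current of 75 μA (the unit is an assumption), an input-referred noise of 3 μV rms, a 10-kHz bandwidth and a temperature of 300 K.

```python
import math

K_B = 1.380649e-23       # Boltzmann constant, J/K
Q_E = 1.602176634e-19    # elementary charge, C


def noise_efficiency_factor(v_rms_in: float, i_total_a: float,
                            bandwidth_hz: float, temp_k: float = 300.0) -> float:
    """NEF for a given input-referred noise, supply current and bandwidth."""
    thermal_voltage = K_B * temp_k / Q_E
    denominator = math.pi * thermal_voltage * 4.0 * K_B * temp_k * bandwidth_hz
    return v_rms_in * math.sqrt(2.0 * i_total_a / denominator)


nef = noise_efficiency_factor(v_rms_in=3e-6, i_total_a=75e-6, bandwidth_hz=10e3)
print(f"NEF = {nef:.1f}")
```

With these rounded inputs the result is close to 10, slightly below the quoted 10.4; the gap is plausibly explained by rounding of the noise and current figures or by a different assumed temperature.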
The recording system was successfully tested in vivo with Michigan probes implanted into the cortex of a rat. Samples of recorded signals are shown in Fig. 14.

SPK, band limited by a programmable-cutoff LPF. Another programmable-cutoff filter eliminates the dc component at the input. Amplifier offsets are compensated by means of calibration DACs. The SPK and LFP channels provide variable gains of up to 5000 and 500, respectively. Input-referred noise of 3 μV was measured on the SPK channel and 10 μV on the LFP channel. The two outputs of each channel are converted into digital signals, and the digital controller produces a serial stream at up to 8 Mbps. The controller can also apply a threshold filter to suppress inactive portions of the signal and emit only spike segments, thus potentially reducing the required communication bandwidth. A prototype of the processor has been fabricated in a 0.35-μm CMOS process and tested successfully, both electrically and in vivo. An FPGA board incorporating an embedded CPU core, providing connectivity between the recording processor and a computer host, has been developed along with appropriate real-time software.
Thanks to digitizing the recorded signal, separating spikes from LFP and detecting threshold crossings, and thanks to its programmability, the processor enables digital transmission of only the active spike segments, thus minimizing the required communication bandwidth and allowing for low-power wireless operation.
ACKNOWLEDGMENT
In vivo testing has been conducted by D. Anderson, D. Kipke, R. Parikh and K. C. Kong at the University of Michigan.
