Abstract: This paper reports an integrated 64-channel neural spike recording sensor, together with all the circuitry needed to configure the channels, process the neural data, transmit the information over a wireless link and receive the required instructions. Neural signals are acquired, filtered, digitized and compressed in the channels. Additionally, each channel implements an autocalibration algorithm which individually configures the transfer characteristics of the recording site. The system has two transmission modes: in one, the information captured by the channels is sent as uncompressed raw data; in the other, feature vectors extracted from the detected neural spikes are released. Data streams coming from the channels are serialized by the embedded digital processor. Experimental results, including in vivo measurements, show that the power consumption of the complete system is lower than 330 µW.
I. INTRODUCTION

In recent years, advances in technology have made it possible to monitor bioelectric activity using neural prostheses implanted in the brain [1]-[4]. These devices achieve much higher spatio-temporal resolution than electroencephalography (EEG) techniques, thus allowing specialists to prescribe more specific treatments and even to develop new therapeutic procedures. The potential of neural prostheses is particularly noticeable when they are tailored to the detection and analysis of Action Potentials (APs, also called spikes). For instance, changes in neural spiking activity during the different phases of epilepsy could eventually serve as a biomarker for seizure prediction [5], [6]. Implanted closed-loop neuromodulation systems based on spike identification could be a promising solution for counteracting the adverse effects of certain illnesses, such as the motor dysfunction caused by Parkinson's disease [7]. By looking at the spiking activity recorded in hippocampal ensembles, memory prosthesis devices can be designed to enhance short-term memory in real time for patients with memory disorders resulting from head injury or from chronic deterioration such as Alzheimer's disease [8]. Additionally, the efficient monitoring of spike trains is key to the development of Brain-Machine Interfaces (BMIs), in which appropriately instrumented objects can be controlled exclusively by a person's thoughts, thus improving the quality of life of patients with severe immobility problems as, for instance, after a spinal cord injury [9]-[11].
M. Delgado-Restituto, A. Rodríguez-Pérez, A. Darie and A. Rodríguez-Vázquez are with the Institute of Microelectronics of Sevilla (IMSE), University of Sevilla-CSIC, 41092, Sevilla, Spain (e-mail: mandel@imse-cnm.csic.es; alberto@imse-cnm.csic.es; angela@imse-cnm.csic.es; angel@imse-cnm.csic.es).
C. Soto-Sánchez and E. Fernández-Jover are with the Bioengineering Institute, University Miguel Hernandez and CIBER-BBN, 03201, Elche, Spain (e-mail: csoto@goumh.umh.es; e.fernandez@goumh.umh.es).
Digital Object Identifier 10.1109/TBCAS.2016.2618319
Because these neural monitoring systems are implanted, they have to achieve and maintain stable long-term recordings so that the need for re-surgery is essentially eliminated. This poses important challenges for the hardware implementation:
1) On the one hand, channel pitches and system form factors must be small. On the other, channels and systems must be versatile and adaptable. Versatility is needed to cope with the different monitoring setups determined by neurologists. Adaptability is needed to deal not only with the intrinsic statistical deviations of the fabrication process but also with the non-stationary nature of the electrode-tissue interface.
2) Power consumption must be small to prevent harmful effects due to excessive heating and to minimize energy requirements. This latter aspect is particularly relevant for remotely powered, battery-less implants.
3) Data compression techniques are needed to reduce the power budget of information processing and communications. In the case of neural spike recorders, each sensor should remain idle and process no information unless a neural spike is detected. The ultimate limit on data compression is that it must not undermine the efficiency of the therapeutic procedure or of the brain-machine interface in which the recording system is embedded. For instance, in BMIs, not only the presence of a spike but also information about its morphology must be provided so that action potentials can be properly sorted. Further, the temporal information of the events must be preserved along the entire recording period.
Several remarkable integrated low-power neural monitoring systems have been proposed in recent years [12]-[27]. Many of them feature wireless telemetry and remote powering functionalities. In some cases, bandwidth reduction techniques based on the detection and compression of neural spikes have been proposed as well [15], [21], [23]-[26]. However, existing proposals have only partially addressed the adaptability of the recording system to the plastic behavior of the neural tissue or to alterations in the galvanic contact at the electrodes.
In this work, an integrated 64-channel neural spike recording System-on-Chip (SoC) is presented. The main objective has been to provide techniques for long-life operation without penalizing power consumption or increasing area occupation. Some of the distinctive features of this SoC are: (i) it implements a bi-directional communication link, not only to retrieve information about brain activity but also to interact with the implant itself without surgical intervention; (ii) two different transmission modes, uncompressed raw data or feature vectors of detected spikes, can be externally selected on demand; (iii) channels can be individually disabled if the associated sensor does not transmit any relevant information or the tissue-electrode interface is completely degraded, thus reducing power consumption; (iv) the analog front-ends of the recording channels include programming and updating solutions for the automatic (or at least externally controlled) adjustment of their individual pass-band characteristics; (v) the identification of neural spikes, which ultimately relies on the adjustment of an empirical threshold [28], takes into account the spike-to-noise ratio of the captured signal and employs an adaptive algorithm for improving the probability of detection; (vi) the feature extraction of neural spikes operates in real time and obtains classification performance similar to that of more sophisticated techniques, with no need for dimensionality-reduction pre-processing [28]. All these features make the proposed neural spike recording SoC an observation tool which allows neurologists to examine the state of the implant and eventually reconfigure the array of sensors to better serve the purposes of the application at hand. This paper extends the results in [29] and describes the acquisition and communications sections of the SoC, the partitioning of the signal processing tasks, and the structure of the digitally-assisted neural recording channels. Additionally, it describes and experimentally demonstrates the programming and self-adaptation techniques embedded in the SoC, assesses the accuracy of the implemented data reduction algorithms and includes in vivo measurements under raw data and feature extraction modes.
The architecture of the recording SoC and a description of its different operation modes are presented in Section II. Afterward, Section III describes the embedded microcontroller in the communications node of the system. After disclosing the communication protocol in Section IV, the paper presents in Section V the architecture of the recording channels and the digital control module therein. Then, Section VI gives a brief overview of the analog front-end of the channels. Section VII illustrates the operation of the SoC, provides experimental measurements and in vivo validation results, and compares the proposed recording system to others in the current state of the art. Finally, Section VIII concludes the paper.
II. NEURAL RECORDING SOC ARCHITECTURE
Fig. 1 shows the proposed SoC architecture. It includes a communications node, which acts as a gateway between a set of N = 64 recording sites and a wireless transceiver. Both remote powering and wireless data transfer are implemented through a single inductive link to an external hub placed on the head. The approach is largely inspired by RFID techniques [30] and uses the worldwide available ISM (industrial, scientific and medical) band centered at 40.68 MHz for energy/data transmission. On-Off Keying (OOK) modulation is used to transmit data and power from the external hub to the SoC, and load shift keying (LSK) modulation is used to transmit data in the reverse direction. The SoC also embeds a power management block and a set of S = 8 frequency synthesizers (FS) for calibrating the transfer characteristics of the recording channels. The power consumption of the wireless transceiver and the power management blocks amounts to approximately 47 μW. All the blocks in Fig. 1 are fully integrated on-chip and are individually accessible through dedicated pins for testing purposes. In this paper, the focus is on the description of the communications and recording blocks; details on the remote powering and wireless data transfer will be reported elsewhere.
The recording channels, distributed in an 8 × 8 array, amplify and filter the neural signals captured by the electrodes and include circuitry to automatically calibrate the output voltage level and bandwidth of the recorded signals. Analog-to-digital conversion and compression of the information on detected neural action potentials are also performed in-channel.
The communications node includes a dedicated MicroController Unit (MCU) for: (i) interpreting and applying the commands received from the wireless transceiver; (ii) digitally processing the signals provided by the recording channels; and (iii) encoding the information collected from these sensors for transmission through the wireless link. It also includes N signal routers, one per channel, which essentially work as data-rate up-converters raising the transfer speed of the channels to the master clock rate of the MCU. In this work, a custom MCU, instead of an external commercial unit or an embedded IP solution, has been chosen in order to save power and area, as well as to reduce the form factor of the whole SoC.
The commands interpretable by the MCU fall into one of four categories, each associated with an operating mode, MODE:
1) Configuration mode (MODE '00'), in which the operating parameters of the recording channels are assigned.
2) Calibration mode (MODE '01'), in which the passband and gain of the transfer characteristics of the channels are automatically adjusted. This facility provides a convenient solution for counteracting inter- and intra-die process variations, as well as for adapting to changes in the working conditions of the implant such as, for instance, tissue-electrode interface degradation or electrode displacement.
3) Signal Monitoring mode (MODE '10'), in which the signals captured by the selected channels are transferred in raw format to the MCU. In this mode, the output stream generated by the MCU contains recordings from a single row/column of channels, all acquired at the same instant. The selected row/column may remain the same or be part of a subset of rows/columns which are cyclically visited. As the master clock rate of the communications node is fixed, this poses a trade-off between the number of channels being read out and the effective throughput rate per channel.
4) Data Compression mode (MODE '11'), in which neural action potentials are detected and characterized in real time by a reduced set of F representative parameters, FE_DATA, in order to reduce the bandwidth of the transmitted signal. The data output of a recording channel remains idle if no spike is detected.
Although different channels in a selected subset may have different parameters, they all have to run in the same operation mode. Mixed configurations, with some channels operating in, e.g., Signal Monitoring mode and others in, e.g., Data Compression mode, are not allowed.
III. DEDICATED MCU
Fig. 2 shows the structure of the dedicated MCU. It operates from a 4.0 MHz master clock CLK which is internally scaled in frequency for driving other elements in the SoC, namely, it generates: (i) signal CLKR for the routers and the shaded elements in Fig. 2 (same rate as CLK); (ii) CLKC for the recording channels and the routers …
During command reception, the data demodulated by the receiver is aligned with clock CLKR and decoded from Pulse Interval Encoding (PIE) format to binary code. This information is stored in a register, and a CRC (Cyclic Redundancy Check) block verifies the data integrity. A command ID block analyzes the content of the register and identifies the operation mode MODE specified for the SoC. Then, a finite-state machine (FSM_Rx, one per operation mode) identifies the command option, its range of application and the configuration parameters, if any, comprised in the received instruction. This information is ultimately used by a Selection and Programming block to generate the output vectors CONF and SEL, which define the states of the signal routers and the recording channels. The N-bit SEL vector, one bit per recording channel, is used to selectively activate the channels of the array. The CONF vector is used to serially load operating parameters into the channels addressed by SEL.
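The command-reception path (PIE decoding, CRC check, MODE identification) can be sketched behaviorally in Python. This is a model under stated assumptions, not the on-chip logic: PIE symbols are classified by comparing their lengths (in CLKR cycles) with a hypothetical pivot, and the CRC width and polynomial (CRC-8, 0x07), its coverage and the preamble length are assumptions, since they are not given here. Only the MODE codes and the fact that MODE immediately follows the preamble come from the text.

```python
# Behavioral sketch of command reception: PIE -> bits, CRC check, MODE lookup.
# Assumptions (not from the paper): pivot-based PIE decoding, CRC-8 with
# polynomial 0x07 covering all bits after the preamble, caller-given preamble length.
from typing import List, Optional, Tuple

MODES = {"00": "Configuration", "01": "Calibration",
         "10": "Signal Monitoring", "11": "Data Compression"}


def pie_decode(intervals: List[int], pivot: int) -> List[int]:
    """Short symbol -> 0, long symbol -> 1 (symbol lengths in CLKR cycles)."""
    return [1 if t > pivot else 0 for t in intervals]


def crc8(bits: List[int], poly: int = 0x07) -> int:
    """Bitwise, MSB-first CRC-8 with zero initial value (hypothetical choice)."""
    crc = 0
    for b in bits:
        feedback = ((crc >> 7) & 1) ^ b
        crc = (crc << 1) & 0xFF
        if feedback:
            crc ^= poly
    return crc


def parse_command(intervals: List[int], pivot: int,
                  preamble_len: int) -> Optional[Tuple[str, List[int]]]:
    """Return (MODE name, remaining fields) or None if the CRC check fails."""
    bits = pie_decode(intervals, pivot)[preamble_len:]
    body, crc_bits = bits[:-8], bits[-8:]
    received = int("".join(map(str, crc_bits)), 2)
    if crc8(body) != received:
        return None  # corrupted command: discard
    mode = MODES["".join(map(str, body[:2]))]  # MODE follows the preamble
    return mode, body[2:]
```

Because the assumed polynomial has more than one term, any single-bit error after the preamble changes the CRC, so `parse_command` returns `None` for such a frame instead of applying a wrong MODE.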
For data transmission, a finite-state machine (FSM_Tx, one per operation mode) reads, processes and sends the data stored in the selected routers to the encoding section. The encoding section is formed by a parallel-to-serial converter and a Manchester encoder. The information DATA gathered by the routers is retrieved by the MCU by means of tri-state buffers controlled by a pointer READ, which cyclically scans the selected routers. If pointer READ addresses a router with no information to transmit (signal RDY low), it jumps to the next active router. Otherwise, if RDY is high, DATA is transferred to the MCU and, afterward, RDY is reset low by pulse OFF. This data retrieval approach supports the event-based transmission of neural spikes in the Data Compression mode: recording channels in the subset SEL remain idle and do not transmit any information unless a neural spike has been detected and characterized.
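A behavioral Python sketch of the transmit path: pointer READ visits the selected routers in order, skips those with RDY low, retrieves DATA from those with RDY high and clears RDY (the role of pulse OFF); retrieved words are then serialized and Manchester-encoded. The `Router` class, the MSB-first serialization order and the Manchester polarity (IEEE 802.3 convention: 0 as high-to-low, 1 as low-to-high) are assumptions for illustration.

```python
# Sketch of the READ-pointer scan and the encoding section (parallel-to-serial
# conversion followed by Manchester coding). Names and conventions are hypothetical.
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Router:
    rdy: bool = False            # RDY: router holds data to transmit
    data: Optional[int] = None   # DATA word gathered from the channel


def read_scan(routers: List[Router], selected: List[int]) -> List[Tuple[int, int]]:
    """One cyclic pass of pointer READ over the selected routers."""
    retrieved = []
    for idx in selected:
        r = routers[idx]
        if not r.rdy:
            continue                  # nothing to send: jump to next router
        retrieved.append((idx, r.data))
        r.rdy = False                 # pulse OFF clears RDY
    return retrieved


def serialize(word: int, width: int) -> List[int]:
    """Parallel-to-serial conversion, MSB first (assumed order)."""
    return [(word >> (width - 1 - i)) & 1 for i in range(width)]


def manchester(bits: List[int]) -> List[int]:
    """IEEE 802.3 convention (assumed): 0 -> (1, 0), 1 -> (0, 1)."""
    out: List[int] = []
    for b in bits:
        out.extend((0, 1) if b else (1, 0))
    return out
```

Since routers with RDY low are simply skipped, a scan over idle channels in Data Compression mode produces no output at all, which is the event-based behavior described above.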
The MCU also comprises a data stack in which global parameters (affecting the whole array) are recorded, overwriting any previous data, and a ROM in which default values are stored. When the SoC is switched on, the data stack loads the values stored in this non-volatile memory, which are then transferred to the recording channels. The content of the ROM can also be reloaded at any moment by using the appropriate command.
Power saving has been one of the main challenges in the implementation of the MCU. All blocks not addressed by SEL or enabled by MODE enter a power-down state and consume essentially no energy. Furthermore, the use of multiple clocks per block and state also helps to reduce the overall dynamic power consumption of the MCU. Another important power-saving measure has been the adopted signal processing partitioning strategy. Heavy computational tasks in the Calibration and Data Compression modes are run locally at the channel level and, hence, distributed across the sensor array. On the one hand, this allows lower clock frequencies to be used than if the same tasks were run in the MCU; on the other, it reduces the transfer rate between the recording channels and the MCU.
The area occupation of the MCU, including all the circuit elements in Fig. 2, is about 1.6 mm² and its average power consumption is 40 μW.
IV. COMMUNICATION PROTOCOL
Fig. 3 shows the command frames recognizable by the SoC, arranged by operation mode. Numbers between parentheses represent the bit lengths of the different fields. In most cases, frames comprise six fields: preamble; operation mode, MODE; range of …
Typically, the range of application, SELECT, can address one single recording channel (identified by code ID_cell), a whole row or column in the array (identified by codes ID_row and ID_col, respectively), or the totality of sensors. However, depending on the operation mode, not all ranges of application are permitted. For example, as calibration units are shared per row (see Fig. 1), the Calibration mode may only be executed by columns or by individual sensors.
Bits in vector SEL are set to high logic state if the corresponding channels are addressed by vector SELECT. In the Signal Monitoring mode, the selection range can potentially be extended by the action of TR_Q.
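The mapping from SELECT to the N-bit vector SEL, including the restriction that Calibration may only address a column or an individual cell, can be sketched as below. The row-major channel numbering (channel = 8·row + column) and the string labels for the range kinds are hypothetical, and the TR_Q extension of the Signal Monitoring range is not modeled.

```python
from typing import List, Optional

N_ROWS = N_COLS = 8  # 8 x 8 recording array


def sel_vector(kind: str, index: Optional[int] = None,
               mode: str = "Configuration") -> List[int]:
    """Build SEL (one bit per channel) from a SELECT range.

    kind is one of "cell", "row", "col", "all" (hypothetical labels standing for
    ID_cell, ID_row, ID_col and the whole array).
    """
    if mode == "Calibration" and kind not in ("cell", "col"):
        raise ValueError("Calibration may only address a column or a single cell")
    sel = [0] * (N_ROWS * N_COLS)
    for r in range(N_ROWS):
        for c in range(N_COLS):
            ch = r * N_COLS + c  # hypothetical row-major channel numbering
            hit = (kind == "all"
                   or (kind == "row" and r == index)
                   or (kind == "col" and c == index)
                   or (kind == "cell" and ch == index))
            sel[ch] = int(hit)
    return sel
```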
Parameters, PARAMS, are only specified in the Configuration and Calibration modes. They are classified into CAL_PAR, TR_PAR and CMP_PAR depending on whether they are related to the Calibration, Signal Monitoring or Data Compression modes, respectively (more details in Section V).
Fig. 4 shows the structure of the output data frames of the SoC for the different operation modes. In all cases, the frame begins with a preamble, followed by a fixed-length (64-bit) data payload and a CRC code. The data payload carries the information provided by the channels. Fillers are appended if the channel's response is shorter than 64 bits. This happens, for instance, in the Configuration mode, in which the recording channel replies with a short confirmation code, CONFIRM, after loading vector CONF. Similarly, in the Calibration mode, the recording channel responds with the final values of the adjusted parameters TR_PAR once the calibration process has been completed.
The Signal Monitoring mode is the only operation category in which data from different recording channels are combined in the output frame. In this mode, the data payload contains K = 8 sample values RC_DATAx, x = 1, ..., K, coming respectively from the K recording channels composing a row or column in the array. These values are arranged in the frame in the same order as the channels in the array. Unlike the other operation modes, in which the identifier addresses a cell (ID_cell), in the Signal Monitoring mode the identifier refers to a row (ID_row) or column (ID_col), as appropriate.
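The 64-bit payload rules above can be illustrated with a short sketch. Splitting the payload evenly among K = 8 samples gives 8 bits per RC_DATAx value; this width is an inference from 64/K, not a figure stated in this section, and the zero-valued filler is likewise an assumption.

```python
from typing import List

PAYLOAD_BITS = 64
K = 8  # samples per row/column in Signal Monitoring mode


def pack_monitoring(samples: List[int]) -> int:
    """Pack K samples MSB-first into one 64-bit payload, in array order."""
    if len(samples) != K:
        raise ValueError("expected one sample per channel of the row/column")
    width = PAYLOAD_BITS // K  # 8 bits per sample (inferred, not stated)
    mask = (1 << width) - 1
    payload = 0
    for s in samples:
        payload = (payload << width) | (s & mask)
    return payload


def pad_payload(bits: List[int], filler: int = 0) -> List[int]:
    """Append fillers when a channel reply (e.g. CONFIRM) is shorter than 64 bits."""
    if len(bits) > PAYLOAD_BITS:
        raise ValueError("reply longer than the payload")
    return bits + [filler] * (PAYLOAD_BITS - len(bits))
```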
In the Data Compression mode, the output frame gathers the representative values FE_DATAy, y = 1, ..., F, obtained from the characterization of the neural action potential detected by the recording channel identified by code ID_cell. As will be shown in Section V.C, depending on the command option CMP_Q, only F = 5 or F = 6 parameters are needed to fully characterize an action potential.
V. NEURAL RECORDING CHANNEL STRUCTURE
… amplifier (LNA); (ii) a programmable gain amplifier (PGA) for adjusting the voltage levels provided by the LNA; (iii) an analog-to-digital converter (ADC) for digitizing the signal provided by the PGA; and (iv) a control module which identifies the operation mode of the recording channel, configures the parameters of the LNA and PGA, processes the data RC_DATA digitized by the ADC, and handles the operation of a frequency synthesizer (shared by columns in the array) during calibration. In the Calibration and Signal Monitoring modes, the recording channel operates at a rate CLKD, obtained by a local timing block which divides the incoming clock signal CLKC by three. In the Data Compression mode, two phases can be distinguished. While the channel scans the captured neural signal for spikes, it also uses the rate CLKD; however, while detected neural APs are being processed, the operation rate, including the ADC sampling speed, is given directly by CLKC.
The LNA exhibits a bandpass transfer characteristic whose High-Pass (HP) and Low-Pass (LP) cut-off frequencies can be digitally programmed by means of control words HPC<1:NHP> and LPC<1:NLP>, respectively. Likewise, the gain of the PGA can be controlled by the digital word PGC<1:NPG>. Circuit details on the Analog Front-End (AFE) are presented in Section VI. Hereafter, the focus is on the design of the control module embedded in each channel, whose block diagram is shown in Fig. 6.
The control module comprises: (i) a command reading block and a parameter storage facility in which the channel configuration is archived; (ii) three blocks which execute the instructions related to the Calibration, Signal Monitoring and Data Compression modes, respectively; and (iii) a data transmission module which builds the output DCELL of the recording channel. The command reading block indexes the content of the configuration vector CONF by identifying the operating mode MODE, the command options QERS, and the associated parameters PARAMS, if any.
Note that the Configuration mode needs no specific instruction block as parameters PARAMS are directly transferred to the storage memory. The other instruction blocks are separately described below. 
A. Calibration Block
The purpose of the calibration block is to automatically program the set of parameters TR_PAR, comprising vectors HPC, LPC and PGC, so that the transfer characteristics of the AFE meet the target passband. If no calibration is run, the recording channel uses the parameters TR_PAR stored in the local memory. As noted in Fig. 3, vector PGC can be either automatically trimmed or fixed to an externally defined value, depending on the command option CAL_Q.
The tuning approach for the adjustment of the HP and LP channel corners is based on a mixed-signal control loop which uses the output of the ADC (see Fig. 5) as the feedback signal and vectors HPC and LPC as the controlled signals. The approach relies on inspecting the peak amplitude of the ADC output and requires an auxiliary frequency synthesizer for the generation of three pilot tones. During automatic tuning, the LNA inputs are driven directly by the frequency synthesizer and are, hence, disconnected from the electrodes. The frequency synthesizer consists of a Digitally Controlled Oscillator (DCO) operated at a clock rate CLKS, followed by a MOS-based R-2R current-steering DAC and a first-order low-pass smoothing filter [31].
Data transfer between the calibration block and the frequency synthesizer (FS) uses tri-state buffers. The frequency of the pilot tones is defined by digital words NFREQx (x stands for H, L or M), which are loaded at each pulse of FLOAD. For each NFREQx, a number of tone cycles NTx is used as an estimate of the transient period required by the calibration loop to settle. Once the transient period is over, the calibration block defines a measurement period which occupies NM = NTx periods of the synthesized tone.
The foreground automatic tuning of the HP and LP poles encompasses three steps:
1) Control words HPC and LPC are initially set to the widest passband possible, and the pilot tone, defined by NFREQM, is set to an arbitrary frequency within the target passband. After a PGA calibration process to maximize the ADC input swing, the calibration block stores the digital peak amplitude V_a detected at the ADC output during the measurement period.
2) Control word HPC is set to the highest frequency possible, and the pilot tone, defined by NFREQH, is made to coincide with the intended HP pole position. The calibration block launches a search for the HPC configuration which minimizes |V_a,HP − α·V_a|, where V_a,HP is the peak amplitude of the ADC output during HP calibration and α is a scaling factor (α = 3/4 has been implemented).
3) Control word LPC is set to the lowest frequency possible, and the pilot tone, defined by NFREQL, is made to coincide with the intended LP pole position. A search procedure similar to that of step 2 obtains the LPC code that most closely approximates the intended LP pole.
During the adjustment of the PGA, the LNA inputs are connected to the electrodes. The calibration starts by setting the PGA to its maximum gain and calculating the maximum and minimum amplitudes at the output of the ADC during a time interval specified by parameter TIME. If either of these amplitudes exceeds the full scale of the ADC, the digital control word PGC is decremented by one. This process is repeated until no out-of-range values are detected. At this point, the last control word PGC is stored.
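The HP-corner search of step 2 and the PGA gain-reduction loop can be sketched behaviorally in Python. The LNA is replaced by a toy first-order high-pass magnitude response, the HPC-to-corner table is hypothetical (eight geometrically spaced corners over the 15 to 232 Hz tuning range reported in Section VI), and the search is exhaustive rather than the on-chip searching sequence; only α = 3/4 and the "decrement PGC until nothing clips" rule come from the text. The LP search of step 3 would be analogous with a low-pass model.

```python
import math
from typing import Callable, Dict, List

ALPHA = 0.75  # scaling factor alpha = 3/4 (from the text)


def hp_gain(f: float, fc: float) -> float:
    """Toy first-order high-pass magnitude |H(f)| (assumed model, not the LNA)."""
    x = f / fc
    return x / math.sqrt(1.0 + x * x)


def calibrate_hp(corners: Dict[int, float], f_mid: float, f_target: float) -> int:
    """Return the HPC code minimizing |V_a,HP - ALPHA * V_a| (exhaustive search)."""
    # Step 1: widest passband (lowest HP corner), mid-band pilot tone -> V_a.
    v_a = hp_gain(f_mid, min(corners.values()))
    # Step 2: pilot tone at the intended HP pole; compare each code's output.
    return min(corners,
               key=lambda code: abs(hp_gain(f_target, corners[code]) - ALPHA * v_a))


def calibrate_pga(capture: Callable[[int], List[float]],
                  max_code: int, full_scale: float) -> int:
    """Start at maximum gain and decrement PGC while the ADC output clips."""
    code = max_code
    while code > 0:  # assumption: stop at the lowest gain code
        samples = capture(code)  # ADC output during the TIME window
        if max(samples) <= full_scale and min(samples) >= -full_scale:
            break
        code -= 1
    return code


# Hypothetical 3-bit HPC table: 8 corners spaced geometrically over 15-232 Hz.
HPC_CORNERS = {k: 15.0 * (232.0 / 15.0) ** (k / 7) for k in range(8)}
```

With this first-order model the output equals 3/4 of the mid-band level when f/f_c ≈ 1.13, so the search settles on the code whose corner lies slightly below the pilot frequency.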
B. Signal Monitoring Block
The signal monitoring block essentially implements a selective pathway between the ADC of the recording channel and the data transmission block of the control module. Depending on the command option TR_Q, not all vectors RC_DATA are transferred to the associated router. Indeed, all of them are transferred only in configuration TR_Q '000', in which a single row/column is read. When TR_Q is '001' and two consecutive rows/columns are addressed, the block passes only one data conversion out of two. Similarly, when TR_Q is '010', one out of four vectors is transferred and, when TR_Q is '011' and the whole array is cyclically swept, only one out of eight RC_DATA reaches the data transmission module. In practice, this implies a trade-off between the number of channels being monitored and the throughput rate per channel, which halves every time the number of channels is doubled. In all configurations, the sampling rate of the channels is about 30 kS/s.
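The TR_Q-dependent decimation and the resulting per-channel throughput can be sketched as follows. The TR_Q-to-step mapping and the 30 kS/s rate follow the text; the `phase` argument, which selects which of the cyclically visited rows/columns a given sample slot belongs to, is a modeling assumption.

```python
# One-out-of-M decimation performed by the signal monitoring block.
TRQ_STEP = {"000": 1, "001": 2, "010": 4, "011": 8}  # rows/columns visited
FS = 30e3  # channel sampling rate, about 30 kS/s


def decimate(rc_data, tr_q, phase=0):
    """Pass one RC_DATA out of TRQ_STEP[tr_q], starting at slot `phase`."""
    step = TRQ_STEP[tr_q]
    if not 0 <= phase < step:
        raise ValueError("phase must index one of the visited rows/columns")
    return rc_data[phase::step]


def throughput_per_channel(tr_q, fs=FS):
    """Effective samples/s delivered per monitored channel."""
    return fs / TRQ_STEP[tr_q]


for code in TRQ_STEP:
    print(code, 8 * TRQ_STEP[code], "channels,",
          throughput_per_channel(code), "S/s per channel")
```

Monitoring the whole array (TR_Q '011', 64 channels) thus leaves about 3.75 kS/s per channel, against about 30 kS/s when a single row/column of 8 channels is read.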
C. Data Compression Block
In this block, action potentials are detected by supervising the neural signal sequence RC_DATA and identifying those instances which fall outside the band between a positive threshold V_TH+ and a negative threshold V_TH−. Once an action potential is detected, the data compression block triggers a real-time process for extracting the parameters of a piecewise-linear (PWL) approximation of the spike in the time-voltage domain. These parameters, composed of voltage amplitude values and time intervals, form the set FE_DATA in Fig. 4.
Fig. 7 shows the structure of the data compression block. It comprises two structurally identical blocks, denoted adaptive V_TH+ and adaptive V_TH−, which operate respectively with the positive and negative values of RC_DATA and allow the threshold voltages V_TH+ and V_TH− to be dynamically adjusted to compensate for possible variations in the background noise of the captured signal. Additionally, it includes a window comparator for the detection of action potentials and a spike feature extractor, mainly composed of digital comparators and counters, for the characterization of the detected spikes. All the blocks are clocked by CLKD, except the spike feature extractor, which operates at the CLKC rate. Moreover, as soon as the window comparator detects a spike, the clock of the ADC in the recording channel switches to CLKC. This is done to increase the resolution of the time parameters in set FE_DATA.
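A minimal sketch of the window comparator and of the time resolution gained by switching the ADC clock. Threshold values are arbitrary integers in LSBs, and treating consecutive out-of-band samples as a single event is an assumption. The sample periods follow from the approximate sampling rates reported for CLKD (about 30 kS/s) and CLKC (about 90 kS/s).

```python
from typing import List


def window_comparator(rc_data: List[int], vth_pos: int, vth_neg: int) -> List[bool]:
    """SPK per sample: True when RC_DATA leaves the band [vth_neg, vth_pos]."""
    return [x > vth_pos or x < vth_neg for x in rc_data]


def spike_onsets(spk: List[bool]) -> List[int]:
    """Indices where SPK rises; consecutive out-of-band samples form one event."""
    return [i for i, s in enumerate(spk) if s and (i == 0 or not spk[i - 1])]


# Time resolution of the PWL time parameters before/after the clock switch.
T_CLKD_US = 1e6 / 30e3   # ~33.3 us per sample while scanning for spikes
T_CLKC_US = 1e6 / 90e3   # ~11.1 us per sample while characterizing a spike
```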
When the adaptive modules for the V_TH+ and V_TH− calculations are active (option CMP_Q is '001'), the standard deviations of the background noise are obtained as
where RC_DATA+ and RC_DATA− denote, respectively, the positive and negative values of RC_DATA, excluding those belonging to detected spikes; S = 2^UPR, where UPR is a parameter that defines the refresh rate of the threshold voltages. Based on these estimates, V_TH+ and V_TH− are calculated as
which differ slightly from the expressions given in [32] in order to allow smoother variations of the threshold voltages with the background noise: a one-LSB variation in the standard deviation gives rise to a two-LSB correction in the threshold voltage. In (2), parameters V_CR+ and V_CR− are user-defined correction terms for tweaking the threshold values.
As shown in Fig. 7, the sums in (1) are implemented by accumulators, and the number of additions is controlled by programmable pulse counters fed by the end-of-conversion signals RC_END. Every time the window comparator detects a spike, signal SPK goes high and the counting operations in the adaptive V_TH+ and adaptive V_TH− blocks are frozen. When either of the counts in these blocks reaches the value S, the corresponding pulse counter fires a signal DUMP which dumps the content of the accumulator register into a vector shaping block. After one CLKD clock cycle, the register is emptied and the accumulation and counting process starts over. The shaping block shifts and truncates the dumped value so as to implement the scaling operation in (2) … Δ2, where parameter SPD is the estimated duration of action potentials.
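The accumulate-count-dump behavior of one adaptive V_TH block can be modeled as below. Because expressions (1) and (2) are not reproduced in this text, the sketch makes two labeled assumptions: the noise statistic is the accumulated magnitude divided by S = 2^UPR through a right shift, and the threshold is 2·estimate + V_CR, a slope chosen only to match the statement that a one-LSB change in the estimate yields a two-LSB threshold correction. It is an illustration of the hardware flow, not the paper's exact formula.

```python
class AdaptiveThreshold:
    """One adaptive V_TH block (polarity +1 for V_TH+, -1 for V_TH-).

    Assumed model, not expressions (1)-(2): estimate = (sum of |RC_DATA|) >> UPR,
    V_TH = polarity * (SLOPE * estimate + V_CR) with SLOPE = 2.
    """
    SLOPE = 2

    def __init__(self, upr: int, v_cr: int, v_th_init: int, polarity: int = 1):
        self.upr = upr
        self.s = 1 << upr           # S = 2**UPR additions per refresh
        self.v_cr = v_cr            # user-defined correction term V_CR
        self.polarity = polarity
        self.v_th = v_th_init
        self.acc = 0                # accumulator register
        self.count = 0              # pulse counter fed by RC_END

    def on_rc_end(self, value: int, spk: bool) -> int:
        """Process one conversion of matching sign; frozen while SPK is high."""
        if spk:
            return self.v_th
        self.acc += abs(value)
        self.count += 1
        if self.count == self.s:                 # counter fires DUMP
            estimate = self.acc >> self.upr      # shift-and-truncate = divide by S
            self.v_th = self.polarity * (self.SLOPE * estimate + self.v_cr)
            self.acc = 0                         # register emptied, count restarts
            self.count = 0
        return self.v_th


def feed(pos: AdaptiveThreshold, neg: AdaptiveThreshold, value: int, spk: bool) -> None:
    """Route positive samples to V_TH+ and negative samples to V_TH-."""
    if value > 0:
        pos.on_rc_end(value, spk)
    elif value < 0:
        neg.on_rc_end(value, spk)
```

Decreasing UPR by one halves S and therefore doubles the refresh rate of the thresholds, at the cost of averaging over fewer noise samples.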
The proposed approach for spike detection, based on a dual adaptive threshold, requires fewer computational resources than more elaborate energy-based detectors [28] and does not add any extra delay to the spike identification process [33]. This paves the way for the real-time characterization of action potentials without the need to temporarily store previous records of the neural signal. Fig. 9(a) illustrates the average spike Detection Accuracy (DA) versus Signal-to-Noise Ratio (SNR) obtained with the proposed technique. Following the procedures in [28] and [32], a total of 10 different neural segments (each with a different set of four identifiable action potentials) were synthesized at 41 different noise levels (from −10 to 10 dB in 0.5 dB steps). The background noise was generated from 25 aggregated neural spikes with different firing rates. The duration of all the segments was 60 s. As can be seen, the obtained DA is similar to that reported in [34] (above 90% for an SNR of 5 dB), thus confirming the improved performance of the dual threshold approach compared to the median threshold technique [32]. However, while in [34] the threshold voltages are computed off-chip during a training phase, in the proposed approach they are calculated on-chip and scale automatically with the background noise. The average Detection-Classification Accuracy (DCA) measured over the same datasets is illustrated in Fig. 9(b). Spike classification (not implemented on-chip) was performed by k-means clustering using Euclidean distances [32]. Note that for SNRs above 5 dB, the sorting error is below 10%. These results are quite similar to those reported in [34] for the Zero-Crossing Feature (ZCF) and Principal Component Analysis (PCA) algorithms.
VI. ANALOG FRONT-END
Fig. 10(a) shows the schematic of the analog front-end, which consists of a programmable bandpass LNA and a gain-adjustable ADC connected by unity-gain buffers. The LNA uses two amplifier stages and local capacitive feedback around the first OTA. This architecture creates a double pole at the low-pass corner of the passband, leading to a 40 dB/dec roll-off, which is beneficial for suppressing high-frequency noise components in the captured signal. The midband gain of the structure is given by the ratio between the input and feedback capacitances, $C_i/C_f$, and the high-pass and low-pass corners are located approximately at $1/(R_f C_f)$ and $2 g_{m2} C_f/(C_c C_l)$, respectively. The first OTA [OTA_1, illustrated in Fig. 10(b)] uses a complementary input differential pair to nearly double its equivalent transconductance for the same bias current. The second OTA [OTA_2, illustrated in Fig. 10(c)] uses a simple p-input class-A stage. Each feedback resistor, $R_f$, is implemented as a 3-bit ($N_{HP} = 3$) digitally controlled tapped cascade of transistors biased in the deep subthreshold region [35]. The $R_f$ values range from 5.5 to 85 GΩ, which allows the HP corner to be tuned externally from 15 to 232 Hz. Similarly, a 2-bit ($N_{LP} = 2$) programmable capacitive bank, $C_l$, loading the structure allows the LP corner to be tuned between 5.2 kHz and 10.15 kHz. These ranges cover the frequency band of action potentials and partially cover the band of local field potentials (β band and above).
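The LNA tuning ranges quoted in this section can be checked for internal consistency with a few lines of Python. This is a sketch under two assumptions not stated in the text: that the corner expression $1/(R_f C_f)$ is an angular frequency, so the HP corner in hertz is $1/(2\pi R_f C_f)$, and that $C_f$ is the same for every $R_f$ setting. Under them, the ratio of the $R_f$ range should match the ratio of the HP corner range, both ends should imply the same $C_f$, and, since the LP corner is inversely proportional to $C_l$, the 2-bit bank must span roughly a 1.95:1 capacitance ratio. The derived $C_f$ (about 125 fF) is an inference, not a reported design value.

```python
import math

# Ranges quoted in the text.
R_F_MIN, R_F_MAX = 5.5e9, 85e9        # feedback resistance, ohms
F_HP_MIN, F_HP_MAX = 15.0, 232.0      # HP corner range, Hz
F_LP_MIN, F_LP_MAX = 5.2e3, 10.15e3   # LP corner range, Hz


def implied_cf(r_f, f_hp):
    """Feedback capacitance implied by the ASSUMED form f_HP = 1/(2*pi*R_f*C_f)."""
    return 1.0 / (2.0 * math.pi * r_f * f_hp)


# The largest R_f gives the lowest HP corner, and vice versa.
cf_low = implied_cf(R_F_MAX, F_HP_MIN)
cf_high = implied_cf(R_F_MIN, F_HP_MAX)

r_ratio = R_F_MAX / R_F_MIN     # span of the resistor settings
f_ratio = F_HP_MAX / F_HP_MIN   # span of the HP corner
# f_LP is inversely proportional to C_l, so the bank must span this ratio.
cl_ratio = F_LP_MAX / F_LP_MIN

print(f"R_f ratio {r_ratio:.2f}, f_HP ratio {f_ratio:.2f}")
print(f"implied C_f: {cf_low * 1e15:.1f} fF and {cf_high * 1e15:.1f} fF")
print(f"required C_l ratio: {cl_ratio:.2f}")
```

Both ends of the range giving nearly the same $C_f$ is what one would expect if only $R_f$ is tuned for the HP corner, as the text describes.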
The switched-capacitor (SC) structure on the right-hand side of Fig. 10(a) implements the ADC of the recording channel [36]. It is built around an SC integrator whose gain can be controlled from 0 to 18 dB with a 3-bit ($N_{PG} = 3$) programmable input capacitor bank, $C_{in}$. Hence, besides performing the conversion, the circuit also operates as a PGA. The OTA of the integrator [OTA_3, illustrated in Fig. 10(d)] uses an n-input folded-cascode stage, and the comparator employs a current-controlled dynamic latch. Output bits are derived by successively detecting the sign of the voltage stored in the integrator; the timing diagram is shown in Fig. 10(e). Depending on the output of the comparator, the integrated voltage is updated by adding or subtracting binary-scaled versions of a voltage reference, $V_{ref}$. These binary-scaled voltages, which depend on the output resolution $n$ of the converter, are obtained by capacitive voltage division at every step of the conversion process. Resolved bits are stored in a SAR register. In the presented design, the bias current of OTA_3 is dynamically adapted to save power, taking advantage of the fact that settling requirements are progressively relaxed as the conversion proceeds.
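The conversion loop described above can be modeled behaviorally in Python. This is a generic sign-detection, successive-approximation sketch, not the circuit of [36]; it assumes (not stated here) that the k-th step adds or subtracts $V_{ref}/2^k$ for k = 1, ..., n, that the input range after the PGA is $[-V_{ref}, V_{ref}]$, and that the PGA can be modeled as a plain multiplication by `gain`. Under these assumptions, the residue left in the integrator is bounded by $V_{ref}/2^n$ after n steps.

```python
def sign_integrator_adc(v_in, v_ref, n_bits, gain=1.0):
    """Behavioral sketch of a sign-detection ADC with binary-scaled steps.

    Assumptions: step k adds or subtracts v_ref / 2**k (k = 1..n_bits), the
    gained input lies in [-v_ref, v_ref], and bit 1 means the integrator
    voltage was non-negative when the comparator fired.
    Returns (code, estimate of gain * v_in, final integrator residue).
    """
    v = gain * v_in                  # PGA modeled as an ideal multiplication
    code = 0
    for k in range(1, n_bits + 1):
        bit = 1 if v >= 0 else 0     # comparator: sign of integrator voltage
        code = (code << 1) | bit     # store the bit MSB-first (SAR register)
        step = v_ref / 2 ** k        # binary-scaled reference for this step
        v = v - step if bit else v + step
    # The code encodes sum_k (+/-1) * v_ref / 2**k, which simplifies to:
    estimate = v_ref * ((2 * code + 1) / 2 ** n_bits - 1)
    return code, estimate, v


code, estimate, residue = sign_integrator_adc(0.25, v_ref=1.0, n_bits=8)
print(format(code, "08b"), estimate, residue)  # prints: 10100000 0.25390625 -0.00390625
```

Because each step halves the bound on the residue, the estimate error never exceeds one step of the last (smallest) reference, which is what allows settling requirements to relax as the conversion proceeds.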
Full details on the analysis, design and characterization of the LNA and the programmable-gain ADC are given in [37] and [36], respectively. Here, for the sake of completeness, only the performance summary of both blocks is given in Table I. The power consumption of the LNA, including biasing and common-mode feedback circuits as well as output buffers, is 1.92 μW regardless of the SoC operation mode. The power dissipation of the PGA/ADC, when operated at CLK_D, is 0.51 μW for a sampling rate of about 30 kS/s. During the extraction of spike features, the clock changes to CLK_C and the sampling rate rises to about 90 kS/s; in this case, the power consumption of the PGA/ADC is 1.52 μW.
VII. EXPERIMENTAL RESULTS
Fig. 11(a) shows the microphotograph of the SoC fabricated in a standard 0.13 μm CMOS process. It occupies an active area of 13.45 mm², with each channel having a 400 μm pitch (see channel details in Fig. 11(b)). The DCOs are placed on the left side of the recording array, while the remaining blocks (communications node, power management, RF transceiver and some test circuits) are placed to the right. The communications node is placed on a deep n-well for digital noise isolation. Besides an external pad ring, each recording channel embeds a small pad for flip-chipping. These pads include ESD protection diodes, which are connected to a voltage clamp protection ring surrounding the array. Analog references (common-mode voltage, ADC voltage references and current bias references) for the recording chan-
A. Benchtop Characterization
Measurements of the isolated analog building blocks are reported elsewhere [37], [36]. This section focuses on the characterization of the control module and, in particular, on the functional assessment of the Calibration and Data Compression modes. Neural signal segments captured in vivo (see Section VII-B2) and resynthesized in the laboratory by means of a TTi TGA12101 arbitrary function generator have been used in the following experiments. Fig. 12 shows the measured ADC output code, as well as the evolution of the control words HPC and LPC, during the calibration of the channel bandpass characteristic. For neural spike recording, the target frequencies for the HP and LP poles are 200 Hz and 7 kHz, respectively. As can be seen, three iterations are needed to identify the code (HPC = '101') that best approximates the desired HP pole, whereas two iterations suffice for the LP pole (LPC = '10'), that is, one iteration per bit of each control word. Fig. 13 shows the measured ADC output code and the PGC control word during the calibration of the PGA (BP_CAL is set to '0'). The input voltage provided by the function generator was set to a large value of 5 mVpp. As can be seen, the control module of the recording channel initially sets the PGA to its maximum gain (18 dB), and the implemented algorithm decreases PGC every time saturation is detected at the output of the ADC. At the end of the adjustment process, the PGC output code is '000', as expected for such a large input. Fig. 14 illustrates the operation of the adaptive threshold algorithm implemented in the local digital processor of the channels. At system start-up, the positive and negative threshold voltages are set to the upper and lower bounds of the ADC code, respectively, and, after a transient, they settle to their correct values. In this experiment, UP_R = 9, V_CR+ = 6 and V_CR− = 6. It is worth observing that the positive and negative threshold levels self-adapt to changes in the background noise, particularly when bursts of spikes occur.
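The two calibration loops described above can be sketched in Python. The bandpass search is modeled here as a successive-approximation (binary) search over the control word, which is consistent with, but not confirmed by, the three iterations observed for the 3-bit HPC word and the two for the 2-bit LPC word; the gain loop follows the decrement-on-saturation rule stated in the text. The `measure` and `saturates` callbacks, the monotonically increasing code-to-frequency maps `hp_map` and `lp_map`, and the saturation model are all hypothetical stand-ins for on-chip measurements, so the codes they return are not expected to match the measured HPC = '101'.

```python
def sar_search(measure, target, n_bits):
    """Successive-approximation search over an n-bit control word.

    `measure(code)` is a hypothetical callback returning the corner frequency
    obtained with that code, assumed to increase monotonically with the code.
    Returns (largest code whose frequency does not exceed `target`, iterations).
    """
    code = 0
    iterations = 0
    for bit in reversed(range(n_bits)):   # MSB first
        trial = code | (1 << bit)
        iterations += 1
        if measure(trial) <= target:
            code = trial                  # keep this bit set
    return code, iterations


def calibrate_gain(saturates, n_bits=3):
    """Start at maximum gain and decrement PGC while the ADC output saturates.

    `saturates(pgc)` is a hypothetical callback reporting clipping at code pgc.
    """
    pgc = (1 << n_bits) - 1               # maximum gain, '111' for 3 bits
    while pgc > 0 and saturates(pgc):
        pgc -= 1
    return pgc


# Hypothetical maps: codes log-spaced over the tuning ranges given in Section VI.
hp_map = lambda code: 15.0 * (232.0 / 15.0) ** (code / 7)
lp_map = lambda code: 5.2e3 * (10.15e3 / 5.2e3) ** (code / 3)

hpc, its = sar_search(hp_map, 200.0, 3)
print(format(hpc, "03b"), its)  # prints: 110 3
```

A pure successive-approximation search always needs exactly one measurement per bit, which keeps the calibration time fixed regardless of the target frequency.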
Fig. 15 illustrates the operation of a neural channel in the Data Compression mode. The SPD parameter was set to 1.75 ms. If no spike is detected, the data compression block of the recording channel sends no information. When a spike is detected, the spike feature extractor obtains the parameters of the PWL representation (in the figure, the characterization of each of the three detected spikes is given in a corresponding table). Regarding the reconstruction accuracy of the compression, we reproduced in the laboratory different experimental neural sequences, taken from different channels and recording sessions under mode T_RQ = '000', totaling some 25,000 detected spikes. The root-mean-square deviation between the original waveforms and the generated PWL representations amounts to 8.25 ± 2.68%.
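The reconstruction-error figure above depends on how the PWL waveform is rebuilt and how the deviation is normalized, neither of which is detailed in this section. The following Python sketch assumes that the PWL representation is a list of (time, amplitude) breakpoints joined by straight lines, and it normalizes the RMS error to the peak-to-peak amplitude of the original spike; the synthetic spike and the breakpoint positions are illustrative only, not recorded data or the chip's feature set.

```python
import numpy as np


def pwl_reconstruct(t, bp_times, bp_values):
    """Rebuild a waveform by linear interpolation between PWL breakpoints."""
    return np.interp(t, bp_times, bp_values)


def rmsd_percent(x, x_hat):
    """RMS deviation as a percentage of the peak-to-peak amplitude of x
    (an ASSUMED normalization)."""
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    rms = np.sqrt(np.mean((x - x_hat) ** 2))
    return 100.0 * rms / (x.max() - x.min())


# Window of 1.75 ms (the SPD value used in the text) sampled at about
# 90 kS/s, the rate quoted for spike feature extraction.
fs = 90e3
t = np.arange(0.0, 1.75e-3, 1.0 / fs)
# Synthetic biphasic spike (illustrative only).
x = (-np.exp(-((t - 0.4e-3) / 0.1e-3) ** 2)
     + 0.4 * np.exp(-((t - 0.7e-3) / 0.2e-3) ** 2))
# Hypothetical breakpoints: a few samples around the trough and the peak.
idx = np.array([0, 27, 36, 45, 63, 90, len(t) - 1])
x_hat = pwl_reconstruct(t, t[idx], x[idx])
print(f"RMSD: {rmsd_percent(x, x_hat):.2f} %")
```

Moving or adding breakpoints near the steep phases of the spike is the natural lever for reducing this error, at the cost of a longer feature vector.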
In the Signal Monitoring mode, the power consumption of the SoC, excluding the power management unit and the wireless transceiver, rises with the number of active channels: 69.03 μW (one active column, T_RQ = '000'), 98.64 μW (two consecutive active columns, T_RQ = '001'), 142.28 μW (four consecutive active columns, T_RQ = '010') and 241.56 μW (whole array, T_RQ = '011'). In all these configurations, however, the power consumption of the communications node remains roughly constant (about 47 μW) because the number of active channels and the throughput rate are traded off. The most demanding operation mode is the Data Compression mode with adaptive thresholding enabled, for which the power consumption is 330 μW. This is mainly because the power consumption per channel rises from 3.04 μW (in Signal Monitoring mode) to 4.54 μW.
B. In Vivo Characterization
1) Measurement Setup:
Fig. 16 shows the schematic diagram of the measurement setup used for in vivo neural recording. This setup is not intended for measurements on freely moving animals but for validating and quantifying the SoC performance with two different types of microelectrode arrays (MEAs). In one case, a flexible sub-dural microelectrode array (Multi Channel Systems MCS GmbH) was used, with TiN electrodes of 30 μm diameter separated by 300 μm and placed on a polyimide substrate. The signals captured by the microelectrodes were transferred to the 64-channel neural recording system by means of flat ribbon cables connected between the MEA adapter (ADPT-FM-36, MCS GmbH) and the SoC through row precision sockets from Samtec. In the second case, the MEA was a Utah array with an ICS-96 connector (Blackrock Microsystems LLC). A custom adapter was designed to rearrange the pin distribution of the ICS-96 connector (banks A and C only) so that the same connection strategy employed with the flexible MEA could be reused.
A Field Programmable Gate Array (Nexys 2 Spartan-3E FPGA Board by Digilent) was used to interface the 64-channel neural recording SoC with a host computer through a conventional USB 2.0 port. A user interface developed in C++ was used to control the whole measurement setup. The communication between the SoC and the FPGA only needs one downward connection for the transfer of commands and one upward connection for the reception of the neural information recorded and processed by the SoC. Additionally, a 4 MHz master clock is transferred for timing purposes. Details on the FPGA programming are given in [38].
2) In Vivo Results:
Fig. 17(a) shows a 40 s neural recording segment captured from one of the channels of the SoC when operated in Signal Monitoring mode with the column scanning option enabled (T_RQ = '011', 64 channels tracked at a throughput rate per channel of 3.5 kS/s). The flexible sub-dural microelectrode array described in Section VII-B1 was used in this experiment. Although the experiment lasted for several hours, no substantial differences were observed over the course of the measurement or among the channels. In this experiment, all 64 channels of the array were selected and their bandpass characteristics were set between 15 Hz and 5.2 kHz. The control word of the PGA was set to PGC = '101'. Fig. 17(b) shows the recording captured from one of the channels of the SoC when operated in Signal Monitoring mode from two columns of the channel array (T_RQ = '001', 16 channels tracked at a throughput rate per channel of 15 kS/s). In this case, the Utah array was employed in the in vivo measurement. The bandpass characteristics of the 16 selected channels were set between 200 Hz and 7 kHz in order to capture spike activity, and the PGC word was set to '111'. As can be observed in Fig. 17(b), isolated action potentials and bursts of spikes are clearly noticeable in the recording.
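The two Signal Monitoring configurations above illustrate how the number of active channels and the per-channel rate are traded off: the aggregate sample rate stays at roughly 224 to 240 kS/s. The Python sketch below compares these aggregates with the 4.0 Mbps wireless-link limit given in the Conclusion. It ignores framing and command overhead, and the ADC resolution is not stated in this section, so the result is only an upper bound on the bits per sample the link could carry; `link_budget` is a hypothetical helper.

```python
LINK_BPS = 4.0e6  # total throughput limit of the wireless link (Conclusion)


def link_budget(n_channels, rate_per_channel, link_bps=LINK_BPS):
    """Return the aggregate sample rate (S/s) and the bits per sample the link
    could carry if all its capacity went to samples (overhead ignored)."""
    aggregate = n_channels * rate_per_channel
    return aggregate, link_bps / aggregate


for label, n_ch, rate in [("T_RQ='011'", 64, 3.5e3), ("T_RQ='001'", 16, 15e3)]:
    aggregate, max_bits = link_budget(n_ch, rate)
    print(f"{label}: {n_ch} ch x {rate / 1e3:g} kS/s = {aggregate / 1e3:g} kS/s, "
          f"at most {max_bits:.1f} bits/sample")
```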
As in the previous case, no substantial difference from the shown performance was observed during the experiment or among the channels. Fig. 18 illustrates the system operation in the Data Compression mode using the Utah array, as well as the PWL approximation of the signal captured by one of the channels. Dots represent the spikes detected by the neural array in a time slot of 5 s. Adaptive threshold calculation was enabled and, as before, the SPD parameter was set to 1.75 ms. The main digital processor cyclically examines the routers and transmits their contents when ready. The system requires 22 μs to transmit the information of one spike. This is much shorter than a typical spike duration and, of course, much shorter than the typical interval between firing events. Consequently, no information is lost, even in the unlikely case that all the channels fire at the same instant.
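The claim that no information is lost can be made concrete with a worst-case estimate. Assuming (not stated in the text) that each router holds one pending feature vector until the main processor serves it, and that a channel cannot issue a new vector within its SPD window, the backlog created when all 64 channels fire at once drains in 64 × 22 μs ≈ 1.41 ms, which is shorter than the 1.75 ms SPD setting. A minimal Python sketch of that arithmetic, with `worst_case_backlog` as a hypothetical helper:

```python
N_CHANNELS = 64
T_TX_PER_SPIKE = 22e-6   # s, measured time to transmit one spike's features
SPD = 1.75e-3            # s, SPD parameter used in the in vivo experiment


def worst_case_backlog(n_channels=N_CHANNELS, t_tx=T_TX_PER_SPIKE):
    """Time to drain one pending feature vector per channel, served one by one
    (assumes a single vector per router and strictly sequential transmission)."""
    return n_channels * t_tx


backlog = worst_case_backlog()
print(f"backlog {backlog * 1e3:.3f} ms vs SPD {SPD * 1e3:.2f} ms, "
      f"fits: {backlog < SPD}")
```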
C. State of the Art Comparison
Table II compares the proposal with the state of the art. For solutions that include wireless data transmission and remote powering, the power consumption of the transceiver and the power management unit has not been included in the total power. Similarly, if the reported chip includes stimulation circuitry, its consumption has not been accounted for either. It is worth observing that the proposed solution is one of the most power-efficient recording systems, despite offering external programming capabilities not provided by the others.
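Comparisons such as Table II are often made on a per-channel basis. As a hedged illustration using only the totals reported in Section VII-A (power management unit and transceiver excluded), the Python sketch below divides those totals by the channel count and checks a simple additive model: total ≈ active channels × per-channel power + communications node. The model is an assumption of this sketch; with the 3.04 μW per-channel and approximately 47 μW node figures it reproduces the full-array value and matches the partial-array values within about 3 μW, and it is not applied to the Data Compression mode. The eight channels per column are inferred from T_RQ = '001' selecting 16 channels over two columns.

```python
N_CHANNELS = 64
CH_PER_COLUMN = 8            # inferred: T_RQ = '001' selects 16 channels in two columns
P_CH_MONITORING = 3.04       # uW per channel, Signal Monitoring mode
P_COMM_NODE = 47.0           # uW, communications node ("about 47 uW")

# Measured Signal Monitoring totals (uW), keyed by number of active columns.
MEASURED = {1: 69.03, 2: 98.64, 4: 142.28, 8: 241.56}
P_COMPRESSION = 330.0        # uW, Data Compression mode, whole array


def additive_model(n_active):
    """ASSUMED model: active channels times per-channel power plus the node."""
    return n_active * P_CH_MONITORING + P_COMM_NODE


for cols, measured in MEASURED.items():
    n = cols * CH_PER_COLUMN
    print(f"{n:2d} ch: measured {measured:6.2f} uW, model {additive_model(n):6.2f} uW")

print(f"per channel, monitoring:  {MEASURED[8] / N_CHANNELS:.2f} uW")
print(f"per channel, compression: {P_COMPRESSION / N_CHANNELS:.2f} uW")
```

Note that per-channel figures computed this way include the shared communications node, so they are slightly pessimistic compared with the channel-only values of 3.04 μW and 4.54 μW.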
VIII. CONCLUSION
A 64-channel reconfigurable, self-calibrated neural recording/communication SoC with embedded data reduction techniques, fabricated in a standard 130 nm CMOS process, is reported. Each channel embeds all the circuitry needed to filter, amplify and digitize the input data, as well as to compress the detected neural spike activity, minimizing the amount of generated data. A distributed digital signal processing approach, with tasks at channel and array levels, has proven to be an efficient solution for reducing the power consumption of the SoC and simplifying communications through the array.
The communications section of the SoC defines the operation mode of the recording channels and implements a full-duplex communication protocol for data transmission through the wireless link. In one operation mode, the selected channels can be configured to detect and compress neural spikes so that feature vectors, instead of raw signal samples, are transferred. In another mode, the system runs a self-calibration algorithm that automatically adapts the filter bandwidth and the gain setting of the channels. The system also offers different alternatives for raw data transmission in which the number of active channels and the effective sampling rate are traded off. In all cases, the total throughput rate of the SoC remains below the 4.0 Mbps imposed by the wireless link. The sensor consumes 330 μW from a 1.2 V supply in the Data Compression mode, the most demanding one.
