Abstract-Reliable, multi-channel neural recording is critical to neuroscience research and clinical treatment. However, most hardware development of fully integrated, multi-channel wireless neural recorders to date is still at the proof-of-concept stage. To be ready for practical use, the trade-offs between performance, power consumption, device size, robustness, and compatibility need to be carefully taken into account. This paper presents an optimized wireless compressed sensing neural signal recording system. The system takes advantage of both custom integrated circuits and universally compatible wireless solutions. The proposed system includes an implantable wireless system-on-chip (SoC) and an external wireless relay. The SoC integrates 16-channel low-noise neural amplifiers, programmable filters and gain stages, a SAR ADC, a real-time compressed sensing module, and a near-field wireless power and data transmission link. The external relay integrates a 32-bit low-power microcontroller with a Bluetooth 4.0 wireless module, a programming interface, and an inductive charging unit. The SoC achieves high signal recording quality with minimized power consumption, while reducing the risk of infection from through-skin connectors. The external relay maximizes compatibility and programmability. The proposed compressed sensing module is highly configurable, featuring an SNDR of 9.78 dB with a compression ratio of 8×. The SoC has been fabricated in a 180 nm standard CMOS technology, occupying 2.1 mm × 0.6 mm silicon area. A pre-implantable system has been assembled to demonstrate the proposed paradigm. The developed system has been successfully used for long-term wireless neural recording in a freely behaving rhesus monkey.
A Fully Integrated Wireless Compressed Sensing Neural Signal Acquisition System for Chronic Recording and Brain Machine Interface
I. INTRODUCTION
NEURAL recording of large-scale brain activity revolutionizes our understanding of the human brain [1]. Recent studies estimate that simultaneous invasive recording of 100 000 neurons is needed for decoding full-body movements [2], which is beyond the ability of cutting-edge brain machine interface (BMI) devices. In practical neuroscience research, multi-channel recording from a freely behaving animal in a natural environment is important for a wide range of experiments; however, most research to date still relies on rack-mount equipment. Recording high-bandwidth neural signals from multiple channels and multiple brain areas via wireless miniature devices poses a major challenge to existing electronic technology and design techniques. An optimized design of a fully integrated neural signal acquisition system with on-chip data compression is thus highly desirable.
There are several key requirements for a successful chronic invasive neural recording system: i) longevity, with a safe electrode interface and minimum tissue damage and infection; ii) noise, bandwidth, and channel count adequate for the target signal; iii) sufficient battery life to support long-term recording; and iv) reliable data storage or wireless transmission. In addition, BMI research usually requires the front-end to be highly programmable, wirelessly compatible with commercial equipment and sensors, and easy to upgrade. All of these features together constitute a practical recording system for neuroscience research and BMI development. The requirements of the individual system blocks need to be carefully balanced.
Many prior neural recording systems have been reported with improvements in noise performance, channel count, wireless communication, and system power consumption. For neural amplifier designs, low-noise amplifiers with closed-loop gain set by capacitors [3]-[7] or resistors [8], [9] have been commonly used. Chopping is usually adopted to achieve ultra-low noise in the low frequency band [4], [5], [8]. Systems with a large number of channels have also been reported [7], [10]-[12]. Several of these designs integrate a wireless transceiver. Among them, ISM band FSK [7], [13], [14], FM [15], [16], UWB [11], [17], and backscattering [10], [18], [19] are commonly used. In addition, some of the systems are fully integrated and potentially implantable [10], [12], [20].
On-chip data compression is an effective way to reduce the system power by cutting the data rate of the wireless telemetry, which is usually the power bottleneck of the overall system [21]. Various on-chip data compression techniques for neural signals have been proposed. For single-unit or multi-unit action potential recording, spike detection [22] and spike sorting [23] are the most effective ways to reduce the recording data rate, and can also be used to drive BMIs directly. The hardware implementation of spike detection can be as simple as a comparator with a pre-defined threshold. A compression ratio higher than 100× can be achieved with little power consumption [24]. However, this usually comes at the price of losing the information in the raw waveform, and can also be unreliable in long-term recording, since the spike waveform may change with electrode impedance or electrode displacement. For EEG, ECoG, or LFP, wavelet transformation is an effective solution, given its high compression ratio and good reconstruction quality [25], [26]. However, the hardware implementation of wavelet transformation is nontrivial and usually takes considerable area and power. Moreover, a custom design for a specific signal type and sampling frequency significantly narrows the potential applications of traditional compression-based recording systems.
Compressed sensing is an emerging signal processing technique that enables sub-Nyquist sampling and near lossless reconstruction of a signal [27]. Since it was introduced in 2006 [28], the compressed sensing technique has been successfully applied to rapid MRI [28], computational image sensors [29], biomedical sensors [21], [30], high frequency receivers [31], and other applications. Compressed sensing is especially attractive for neural signal recording given its minimal hardware cost in the front-end, which favors power-constrained implanted devices. Prior research shows the sparsity of neural signals in different frequency bands [30], [32]-[34]. Since an on-chip transformation using a random matrix usually achieves sufficient incoherence and the restricted isometry property (RIP) [35], a general purpose recording device can be designed without knowledge of the target signal.
In addition, the compressed sensing measurements can also be used in signal processing (e.g., machine learning classifiers) [36], or to drive a BMI directly. Because no full reconstruction of the raw signal is needed, processing in the compressed domain can be easily implemented in a low-power embedded system.
In this paper, we describe a paradigm that meets the aforementioned requirements of a chronic neural recording system. The proposed system consists of an implantable wireless SoC and an external wearable transceiver sub-system. The system design balances power consumption, compatibility, and upgradability without sacrificing the recorded signal quality. Compared with previous designs, this work presents a complete wireless system that is ready to use in neuroscience research and BMI applications. Long-term recording in freely moving animals has been successfully conducted using the developed system prototypes. The system paradigm and circuit techniques proposed in this work can be used in the development of many related neural recording devices.
The paper is organized as follows. Section II presents an overview of the proposed system. Section III describes the circuit implementation of each building block. Section IV shows the measurement results, and Section V concludes the paper.
II. NEURAL SIGNAL ACQUISITION SYSTEM OVERVIEW
The paradigm of the chronic wireless neural signal acquisition system is illustrated in Fig. 1 . The system has a dedicated implantable sub-system and a flexible external sub-system. The implantable sub-system contains a fully integrated compressed sensing neural signal recording SoC, an inductive coil and a super capacitor within a miniature bio-compatible package. The device can be placed under the skin but above the skull bone, while the recording electrode can be placed in any brain area of interest. The external sub-system consists of a standard wireless transceiver, a rechargeable battery, and a coil in a flexible substrate. The external sub-system powers the implanted device and collects data back through backscattering.
The advantages of the proposed system are threefold: i) the implanted wireless device leaves the skin intact, which reduces the risk of infection; ii) the battery is kept outside the body, so the device's lifetime is not limited by the battery's recharging cycles, and the toxicity associated with batteries poses no danger to the subject; iii) the external transceiver makes the system flexible and versatile; for instance, different wireless solutions or flash memory can be used for different situations. Upgrading the system is also much easier, since the chronic implant can be used for years or even decades while the digital and wireless electronics develop much faster than the analog recording interface. A single pair of coils is used for both power delivery and data read back. A carrier frequency of 13.56 MHz is chosen as a trade-off between the power transfer efficiency and the data rate. Compressed sensing reduces the data rate of the wireless uplink, which is especially helpful in multi-channel recording.
Fig. 2 shows the block diagram of the analog front-end implemented in this work. A fully differential low-noise instrumentation amplifier (IA) is used to amplify the neural signal. A following gm-C based high pass filter stage (HPF) conditions the signal with a tunable cut-off frequency. The next stage (LPF) is an operational transconductance amplifier (OTA) that converts the voltage signal into a current with a programmable lowpass frequency corner. The current outputs from each channel are multiplexed and then converted to a voltage using a transimpedance amplifier (TIA) with a programmable gain. A single-to-differential (S2D) converter is used to drive the differential input ADC with an additional programmable gain. A 10-bit SAR ADC digitizes the signal.
III. CIRCUITS IMPLEMENTATION

A. Energy Efficient Analog Front-End
The IA in this work is a fully differential capacitor coupled neural amplifier, which amplifies the weak neural signal in a wide frequency band. The input capacitors block the large electrode offset and half-cell potential from the interface, giving a maximum input range. The closed-loop differential gain is set to be 34 dB to relieve the noise requirement for the following stages. The core of the IA is a low-noise OTA, as shown in Fig. 2(a-1) . The OTA has been designed to maximize the noise and power efficiency. Two stages are used to provide sufficient open-loop gain. A complementary input stage (M1-M4) is used to increase the overall transconductance without increasing the quiescent current [37] . All of the input transistors are biased in the sub-threshold region to achieve a high energy efficiency. Since the complementary stage has a limited input range, a fully differential structure is chosen. The first stage dominates the noise, so the input referred noise of the OTA can be expressed as
where $g_{m1}$ (= $g_{m2}$) is the transconductance of M1 (M2), and $g_{m3}$ (= $g_{m4}$) is the transconductance of M3 (M4). The flicker noise can be reduced by increasing the widths and lengths of the input transistors. If only thermal noise is considered in the following design optimization, the input referred noise voltage equals
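As a hedged sketch of that thermal-only case: assume the standard channel thermal-noise model, in which each input transistor contributes a drain current noise density of $4kT\gamma g_m$ (k is Boltzmann's constant, T the absolute temperature, and $\gamma$ an excess-noise factor; these symbols are introduced here for illustration), and neglect the second stage and the bias devices. Each half of the complementary pair then has a total transconductance of $g_{m1} + g_{m3}$, and the two halves add as uncorrelated sources:

```latex
\overline{v_{ni,\mathrm{thermal}}^{2}}
  = 2 \cdot \frac{4kT\gamma\,(g_{m1}+g_{m3})}{(g_{m1}+g_{m3})^{2}}\,\Delta f
  = \frac{8kT\gamma}{g_{m1}+g_{m3}}\,\Delta f
```

Under this model, the complementary stage lowers the noise for a given bias current because $g_{m1}$ and $g_{m3}$ add while sharing the same current.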
A biasing current of 1 μA is used in the first stage as a trade-off between power and noise. A biasing current of 20 nA is used in the second stage. The dominant pole is set at the second stage, and stability is guaranteed by adding an additional capacitive load. The complementary input amplifier is sensitive to PVT variations [38]. Common mode feedback, as shown in Fig. 2(a-2), is adopted to stabilize the DC output at half the supply voltage.
An ultra low-power programmable bandpass filter is integrated in each channel for selecting the frequency band of interest. The first stage is a fully differential Gm-C highpass filter. The circuit schematic of the Gm block is shown as A2 in Fig. 2(b). Current division and local feedback are used to achieve low transconductance and an extended linear input range. The cut-off frequency can be programmed by tuning the transconductance.
The second stage of the filter is a single-ended Gm-C based low pass filter. The circuit schematic of the Gm block is shown as A3 in Fig. 2(c). Source degeneration is used to achieve high linearity. The differential voltage signal is converted into a single-ended current signal. Since a standard current mirror load is used, no extra power is spent on this conversion, and the single-ended operation halves the capacitor array size, which is important because the filter is implemented in every channel. The low-pass frequency can be programmed by selecting the load capacitors.
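The link between the filter corner and the transconductance can be illustrated with the first-order Gm-C relation f_c = Gm / (2πC). This is a minimal sketch that assumes a first-order stage; the 10 pF capacitor and the 300 Hz corner used in the usage note are hypothetical values chosen only to show the order of magnitude, not design values from this work.

```python
import math

def gm_for_cutoff(f_c_hz, c_farads):
    """Transconductance (in siemens) that places a first-order Gm-C corner at f_c_hz."""
    return 2 * math.pi * f_c_hz * c_farads

def cutoff_for_gm(gm_siemens, c_farads):
    """Corner frequency (in Hz) of a first-order Gm-C stage."""
    return gm_siemens / (2 * math.pi * c_farads)

# Hypothetical example: a 300 Hz corner with an assumed 10 pF on-chip capacitor.
gm_example = gm_for_cutoff(300.0, 10e-12)
```

With these assumed values the required transconductance is on the order of tens of nanosiemens, which suggests why techniques such as current division are useful for reaching low corner frequencies with small on-chip capacitors.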
The single-ended current outputs of the 16 channels are selected by a multiplexer. The single-ended signal reduces the routing effort, and the current signal does not suffer from IR drop along long signal lines and is less susceptible to noise. The following TIA is used to convert the current signal back to a voltage with a programmable gain. The gain can be set to 5×, 6×, 7×, or 8× by the compressed sensing digital processor. Gains of 2× and 4× can be easily achieved in the binary digital processor, and 3× can be achieved by shifting the 6× signal by 1 bit.
A single-to-differential converter (S2D) is used to convert the single-ended voltage output from the TIA to differential voltages around half the supply voltage. Additional programmable gain is added in this stage. The resistor values are designed to be $R_1 = R_2 = R_4$, and the voltage gain is $2(1 + R_3/R_4)$. $R_3$ can be programmed by a shift register. A Class-AB output stage has been designed to drive the sample-and-hold circuits of the following ADC stage.
A 10-bit successive approximation register (SAR) ADC is implemented for signal digitization. A SAR ADC is attractive for low-power, moderate resolution data conversion. Since a miniature unit capacitor can be used in a conventional binary capacitor array for the SAR ADC without compromising the ENOB, custom designed capacitors are often used to achieve low input capacitance and thus ultra-low power [39], [40]. However, these designs usually require custom characterization for a specific fabrication process. In this work, a split capacitor array is adopted to reduce the total capacitance, lowering the power consumption and area. The overall ADC architecture is shown in Fig. 2(e). The capacitors are realized as a standard metal-insulator-metal (MIM) structure. A monotonic switching procedure is applied to minimize the power consumed by unnecessary charging and discharging of the capacitor array [39]. In addition, in the monotonic switching procedure, the first comparison is performed without switching, and the total capacitance is the same as that of a conventional capacitive SAR ADC's DAC array. We therefore convert the single-ended signal back to differential without a penalty in die area or power consumption, while relaxing the requirements on the comparator design. A cascaded three-stage preamplifier with a dynamic latch is used as the comparator for the ADC. The schematics of the preamplifier and the latch are shown in Fig. 2(d).
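The monotonic switching procedure can be illustrated with an idealized behavioral model. The sketch below is an assumption-laden illustration, not the circuit of this work: it ignores the split capacitor array, capacitor mismatch, comparator offset, and noise, and the function name and voltage values are hypothetical. It only shows that the first comparison happens before any switching and that, after each decision, only the higher of the two sampled voltages is stepped down by a binary-weighted amount.

```python
def sar_convert(v_p, v_n, v_ref=1.0, bits=10):
    """Idealized monotonic-switching SAR conversion of a sampled differential input.

    v_p and v_n are the sampled voltages on the two comparator inputs; the
    differential input v_p - v_n is assumed to lie within (-v_ref, +v_ref).
    """
    code = 0
    for i in range(1, bits + 1):
        # Compare first; for i == 1 no capacitor has been switched yet.
        bit = 1 if v_p > v_n else 0
        code = (code << 1) | bit
        if i < bits:  # the last decision needs no further switching
            step = v_ref / (2 ** i)
            if bit:
                v_p -= step  # lower only the higher side
            else:
                v_n -= step
    return code
```

For example, `sar_convert(0.65, 0.35)` converts a differential input of +0.3 V against an assumed 1 V reference. In this ideal model the result equals the floor of (v_in + v_ref) / (2 v_ref) × 2^bits, so a 10-bit conversion uses ten comparisons but only nine switching steps.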
The control logic generation circuitry is shown in Fig. 3(b). A global reset signal is used to synchronize the start of the AD conversion, and the control logic generation is cyclic. The number of clock cycles for the sample-and-hold time is configurable, and is used to compensate for the processing time of the following compressed sensing stage in different configurations.
B. Compressed Sensing Module
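Compressed sensing makes it possible to sample signals at a rate lower than the Nyquist rate without greatly sacrificing the quality of the original signal. The digitized neural signal, $x_{in}$, of a single channel is fed into the digital processing unit, which computes the measurement vector
$$y = \Phi\, x_{in},$$
where $y$ has length $M$, $\Phi$ is the $M \times N$ sampling matrix, and $x_{in}$ collects $N$ successive samples $x_1, \ldots, x_N$. This can be written as
$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_M \end{bmatrix} = \begin{bmatrix} \phi_{1,1} & \phi_{1,2} & \cdots & \phi_{1,N} \\ \phi_{2,1} & \phi_{2,2} & \cdots & \phi_{2,N} \\ \vdots & \vdots & \ddots & \vdots \\ \phi_{M,1} & \phi_{M,2} & \cdots & \phi_{M,N} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix}. \tag{4}$$
Eq. (4) can be rewritten as a sum of vector multiplications, as
$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_M \end{bmatrix} = \sum_{j=1}^{N} x_j \begin{bmatrix} \phi_{1,j} \\ \phi_{2,j} \\ \vdots \\ \phi_{M,j} \end{bmatrix}.$$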
There are two modes of operation. In the simple mode, the entries of the sampling matrix Φ are assigned to be 0, +1, or −1; in the high resolution mode, the entries can be assigned to be 0, ±(1/8), ±(2/8), ..., ±(7/8). The compression ratio is determined by M and N: each block of N samples is reduced to M measurements, so a compression ratio of 8× corresponds to N/M = 8. In order to avoid large on-chip storage for the sampling matrix, a shift register chain is used to preload the coefficients at the beginning of each sampling loop. Fig. 3(d) illustrates the block diagram of the compressive sensing processing unit. The parallel output from the ADC is fed into the digital module. A simple sign control is applied before sending the ADC output to the adder in the simple mode. In the high resolution mode, the coefficients ±(3/8), ±(5/8), and ±(7/8) are realized by configuring the gain of the analog amplifier to 3, 5, and 7, respectively, while shifting the ADC output by 3 bits before sending it to the adder. The coefficients ±(1/8), ±(2/8), and ±(4/8) are realized by configuring the gain of the analog amplifier to 1 while shifting the ADC output by 3 bits, 2 bits, and 1 bit, respectively. The coefficients ±(2/8) and ±(6/8) are realized by configuring the gain of the analog amplifier to 1 and 3, respectively, while shifting the ADC output by 2 bits. There are M vector multiplication units integrated in the system (M is equal to 16 in the proposed design). The entries of Φ are randomly generated off-line and used for the logic control inside each vector multiplication unit. The output measurement y is reset after every N iterations. N is tunable to meet different compression ratio requirements. The dimension of $x_{in}$ is controlled by the number of iterations. A parallel-to-serial converter is integrated in the system for the readout of the measurements. According to CS theory, a dictionary for sparsifying neural signals is required for sparse recovery.
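The gain-and-shift scheme and the sample-by-sample accumulation described above can be sketched in a few lines. This is a behavioral illustration only: the table simply restates the (analog gain, bit shift) pairs listed in the text, and the names `GAIN_SHIFT`, `coefficient`, and `compress` are hypothetical, not taken from the design.

```python
# (analog gain, right shift in bits) for each coefficient magnitude k/8,
# restating the pairs given in the text for the high resolution mode.
GAIN_SHIFT = {
    1: (1, 3), 2: (1, 2), 3: (3, 3), 4: (1, 1),
    5: (5, 3), 6: (3, 2), 7: (7, 3),
}

def coefficient(k):
    """Effective coefficient magnitude produced by gain / 2**shift for entry k/8."""
    gain, shift = GAIN_SHIFT[k]
    return gain / (1 << shift)

def compress(x, phi):
    """Accumulate y = Phi x one ADC sample at a time (simple mode: entries 0, +1, -1).

    phi is a list of M rows of length N; x is a list of N samples.
    """
    y = [0] * len(phi)
    for j, sample in enumerate(x):        # one iteration per incoming sample
        for i in range(len(phi)):         # M accumulators, parallel in hardware
            y[i] += phi[i][j] * sample    # add, subtract, or skip
    return y
```

For instance, `compress([5, 2, 7], [[1, -1, 0], [0, 1, 1]])` accumulates three samples into two measurements. Each row only needs an adder with sign control, which matches the column-by-column form of the measurement in which every incoming sample updates all M outputs.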
In this section, neural data is first recorded without compression to generate a database for algorithm analysis. The database is divided into two halves: one half is used for training the signal dependent dictionary D with an unsupervised dictionary learning algorithm [33], and the other half is used for testing the recovery performance. In the proposed CS framework, we adopt an on-chip Bernoulli sensing matrix Φ to compress the neural spikes or LFP x of length N into measurements y of length M, where normally M ≪ N and the compression ratio is set by M and N (N/M when quoted as a factor such as 8×). The recovery problem below can be solved by Orthogonal Matching Pursuit [41] where a is the sparse coefficient vector and S indicates the sparsity level, which ranges from 2 to 10. The recovered signal is defined as x̂ = Da and the recovery quality is quantitatively evaluated by the signal-to-noise and distortion ratio (SNDR), which is found by [42]
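The recovery flow described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the recovery is taken to be the usual sparsity-constrained least-squares problem (minimize ||y − ΦDa||₂ subject to ||a||₀ ≤ S), the SNDR is taken to be 20·log10(||x||₂ / ||x − x̂||₂), and a random orthonormal matrix stands in for the learned dictionary D. The function names and the sizes in `demo` are hypothetical.

```python
import numpy as np

def omp(A, y, sparsity):
    """Greedy Orthogonal Matching Pursuit using at most `sparsity` columns of A."""
    coef = np.zeros(A.shape[1])
    support = []
    residual = y.astype(float)
    for _ in range(sparsity):
        # Pick the column most correlated with the current residual.
        idx = int(np.argmax(np.abs(A.T @ residual)))
        if idx in support:
            break
        support.append(idx)
        # Re-fit all selected columns by least squares, then update the residual.
        sol, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        coef[:] = 0.0
        coef[support] = sol
        residual = y - A[:, support] @ sol
    return coef

def sndr_db(x, x_hat):
    """Assumed SNDR definition: 20*log10(||x|| / ||x - x_hat||)."""
    return 20.0 * np.log10(np.linalg.norm(x) / np.linalg.norm(x - x_hat))

def demo(n=64, m=32, sparsity=3, seed=0):
    """Compress a synthetic sparse signal with a Bernoulli matrix and recover it."""
    rng = np.random.default_rng(seed)
    D, _ = np.linalg.qr(rng.standard_normal((n, n)))   # stand-in dictionary
    a = np.zeros(n)
    a[rng.choice(n, size=sparsity, replace=False)] = rng.uniform(1.0, 2.0, size=sparsity)
    x = D @ a
    phi = rng.choice([-1.0, 1.0], size=(m, n))         # Bernoulli sensing matrix
    y = phi @ x
    a_hat = omp(phi @ D, y, sparsity)
    return sndr_db(x, D @ a_hat)
```

Calling `demo()` returns the SNDR in dB for one synthetic trial. The values it produces say nothing about the measured results reported in this paper, which depend on real neural data and the trained dictionary.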
C. On-Chip Wireless Power and Data Link
A low-power backscatter based wireless transmitter communicates with the external transceiver [43]. The backscatter transmitter consists of a PWM encoder and a buffered transistor for antenna impedance modulation.
An active rectifier is used to achieve higher power efficiency [44] . Coupling coils are implemented off-chip. The system clock is recovered from the power waveform [16] . The circuitry of the clock recovery and division module is shown in Fig. 3(a) . The clock frequency can be configured by the register. Standard bandgap reference and low drop-out (LDO) are used in the power management unit. The block diagram and the circuit schematics of the power management module are shown in Fig. 4 . 
D. External Wireless Relay Board
An external wireless relay board has also been designed to demonstrate the proposed paradigm. The external sub-system consists of a microcontroller with an integrated wireless transceiver, envelope detection circuits for reading the backscattered signal, power transmitter circuits, and a battery management system.
A 32-bit ARM Cortex-M0 based wireless transceiver (Nordic Semiconductor nRF51822) is used as the central processor and wireless transceiver. The unit features a 2.4 GHz transceiver and supports the Bluetooth 4.0 low-energy protocol, which provides an easy interface to a computer or mobile devices. Reliable wireless communication up to 5 m was measured in a normal indoor environment. A Serial Peripheral Interface (SPI) based microSD card interface is optional in the system to allow long-term recording that is not limited by the receiver range.
A computer user interface has been developed in Matlab to configure the device and read back the data. Signal reconstruction and off-line analysis are also performed in the user interface.
IV. EXPERIMENTAL RESULTS
The proposed SoC design has been fabricated in an IBM 180 nm standard CMOS technology, occupying a silicon area of 2.1 mm × 0.8 mm.
Bench testing was conducted to verify the functions of the chip and the system. The measured performance of the chip is summarized in Table I. The measured frequency response of the low noise amplifier is shown in Fig. 5. The input-referred noise spectrum is shown in Fig. 6. An rms noise floor of 2.8 μV was measured. The measured CMRR and PSRR of the analog front-end in the frequency range of 0.5 Hz to 7 kHz are > 80 dB and > 67 dB, respectively.
The SAR ADC's output spectrum with a near Nyquist input tone is shown in Fig. 7 . The measured INL and DNL are 0.85 LSB and 0.92 LSB, respectively. The ADC achieves a SNR of 56.6 dB and a SFDR of 70.3 dB.
An invasive neural recording was performed in an anesthetized rat with a tungsten microelectrode placed in its motor cortex. Action potential data is extracted by configuring the filter with a passband of 300 Hz to 7 kHz. Compression ratios of 2, 4, 8, and 16 have been applied. Dual-threshold level crossing spike detection has been used for both the uncompressed data and the restored data. Signal-to-noise and distortion ratios (SNDR) of 3.60 dB, 9.78 dB, 30.60 dB, and 52.99 dB are achieved for compression ratios of 16, 8, 4, and 2, respectively. Near lossless spike detection can be achieved when a compression ratio lower than 8 is applied.
Fig. 8 compares both the time-domain waveform and the spectrogram of the uncompressed and restored local field potential (LFP) sampling data sets. The LFP exhibited rhythmic bouts of broadband power interleaved with low power epochs. According to Fig. 8(b), the time-frequency content of the restored signal was very similar to that of the uncompressed LFP. SNDRs of 9.04 dB, 4.85 dB, and 3.78 dB are achieved for compression ratios of 4, 8, and 16, respectively.
A demonstration system was developed to show the proposed concept, as shown in Fig. 9. An open cavity plastic package is used for the chip, thus the size of the demonstration
In-vivo evaluation of the device for long-term operation was conducted in a rhesus macaque. An electrode was chronically implanted in the hippocampus. The recording device, including the external transceiver, was housed in a small chamber that was fixed to the skull. Fig. 10 shows the spectrogram of a 24-hour continuous recording while the monkey was freely behaving in his home cage. The recording shows the states of hippocampal activity throughout the day. Greater power at higher frequencies (> 20 Hz) was associated with periods in which the animal was awake and freely moving about his home cage (hours 0-7.5 and 19-24). Greater power at low frequencies (< 20 Hz) was associated with sleeping (hours 7.5-19). Individual sleep cycles can be seen. Some broadband chewing artifacts were also present (around hours 3-4.5 and 20-22), corresponding to the times when the animal was fed. The overall activity pattern matches previous observations of sleep-wake changes in neural activity.
V. CONCLUSION
In this work, a fully integrated wireless neural signal acquisition system is presented. A high efficiency wireless neural signal recording SoC with an integrated compressed sensing processor was designed and fabricated in 180 nm CMOS technology. An external wireless relay was used to power the implantable SoC, read back the data through backscattering, and transmit the data through a universal wireless link. The system features high energy efficiency, flexibility, compatibility, and upgradability without compromising signal recording quality. By performing on-chip compressive sampling, the data rate is significantly reduced, which allows the system to support more recording channels without a power penalty. According to the experimental results, a compression ratio up to 8× will cause negligible reduction of the data quality and/or information available in the raw data. Table II compares the proposed design with previous works in the literature. A pre-implant system was assembled and successfully demonstrated the proposed paradigm. Bench tests and in-vivo experimental results are presented. The system offers a promising chronic neural signal recording paradigm for neuroscience research and BMI applications.
