I. INTRODUCTION

Brain machine interface (BMI) is a device designed to create an artificial communication pathway between the brain and external hardware. BMIs have been widely used in neuroprosthetic systems to restore communication pathways that have been lost due to neural injury or disease [1]. Clinical experimental results have shown that paralyzed individuals can control a prosthetic limb [2], [3] according to the motor intent decoded from neural signals recorded by a BMI. At the same time, neural stimulation has been used to deliver encoded patterns to the brain that convey the sense of touch and proprioception according to the state of the prosthesis and its interaction with the environment [4], [5]. Establishing a bidirectional, sensorimotor neural interface is necessary to achieve the level of performance required for this technology to be clinically relevant [6], [7]. Therefore, the design of a bidirectional, closed-loop neural interface that integrates neural recording, stimulation, and signal processing modules is critical.
However, the majority of existing designs are open-loop, unidirectional devices that focus on either neural signal recording or neural stimulation. The absence of a closed-loop BMI device limits the application of BMIs in neuroscience research and clinical practice. In addition, in existing BMI systems, neural signal processing is usually performed on a high-speed external computer, which is impractical for a multiple electrode array (MEA) because of the high data rate and the delay introduced by the data interface and the computation. Furthermore, the large volume of neural data poses a major challenge for data transmission, resulting in high power consumption and high data loss, especially in wireless implementations. One promising solution to these problems is to implement the most computation-intensive neural feature extraction units and the closed-loop controllers on-chip, which greatly reduces latency, wireless data rate, and system power consumption. Fig. 1 illustrates the high-level block diagram of the proposed BMI system.
Recently, BMI systems integrating on-chip signal processing have been reported in the literature [8] - [12]. In the reported designs, LFP energy and action potentials are commonly used as neural features or feedback indicators. Both analog and digital implementations of the neural feature extraction modules have been proposed. Traditional analog implementations can achieve higher power efficiency, but usually suffer from poor programmability and low linearity. Conventional digital processors, in contrast, rely on serial computation, which cannot provide efficient feature extraction for a large number of channels.
In this work, we propose a fully programmable, bidirectional neural interface system for closed-loop neuroscience experiments. The system is built around a custom SoC, which performs noise-sensitive neural signal recording, high-safety neural stimulation, computation-intensive neural feature extraction, and on-chip closed-loop operation. A novel implementation of digitally assisted, analog, parallel neural feature extraction units is proposed. The prototype system takes advantage of the programmability of a general purpose microcontroller (MCU) with integrated flash memory, and of a universal wireless protocol (Bluetooth) to interface with general computers or workstations. A commercial sensor (3-axis accelerometer) has also been integrated in the system to monitor the animal's activity. A wireless inductive charging module is integrated for easy recharging of the battery.
This paper is organized as follows. Section II introduces the overall system with featured innovations. Section III describes the circuit implementation of each block. Experimental results are shown in Section IV, while Section V concludes the entire work.
II. SYSTEM OVERVIEW AND MOTIVATIONS
The goal of the proposed system is to provide a reliable, generalized BMI for bidirectional and closed-loop neuroscience experiments, especially experiments performed on freely behaving animals. To achieve this goal, design optimizations are performed from the neuron-electronics interface level up to the system level. The overall architecture of the proposed BMI system is shown in Fig. 2. The system consists of the custom SoC and general purpose off-the-shelf electronics.
A. Natural Logarithmic Domain Neural Energy Extraction
A substantial amount of information regarding motor intent can be inferred from field potential recordings [13], [14]. Field potentials, recorded either with electrodes penetrating the brain or at the brain surface, reflect the summed activity of thousands to millions of neurons. Oscillations are particularly prominent in field potential recordings and reflect synchronous, rhythmic changes in activity across the network. The recorded oscillations contain information correlated with a number of different behavioral processes, e.g., motor planning [15]. When decoding intent from field potentials for a neuroprosthetic application, it is typical to extract energy from several discrete frequency bands [9].
A variety of distinct brain oscillations exist, with center frequencies spaced logarithmically [16], as illustrated in Fig. 3. Conventional LFP recording hardware commonly tunes the neural feature extraction filters linearly, which is inefficient for extracting brain oscillations: a uniform frequency step is too coarse for the low-frequency bands and unnecessarily fine for the high-frequency ones. Natural logarithmic domain tuning is proposed in this work, which provides sufficient resolution for extracting the low-frequency brain oscillations without increasing the number of tuning steps.
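To make the resolution argument concrete, the following sketch compares a linear and a natural-logarithmic grid of center frequencies. The 64 steps and the 1 Hz to 200 Hz span are taken from the measurements reported in Section IV; the low-band edges (1 Hz to 8 Hz) and all function names are illustrative assumptions rather than values or circuits from this work.

```python
import math

N_STEPS = 64                 # number of tuning steps (6-bit code)
F_MIN, F_MAX = 1.0, 200.0    # Hz, span reported in Section IV


def linear_grid(n=N_STEPS, f_min=F_MIN, f_max=F_MAX):
    """Center frequencies with a constant step in Hz."""
    step = (f_max - f_min) / (n - 1)
    return [f_min + k * step for k in range(n)]


def log_grid(n=N_STEPS, f_min=F_MIN, f_max=F_MAX):
    """Center frequencies with a constant step in ln(f)."""
    step = math.log(f_max / f_min) / (n - 1)
    return [f_min * math.exp(k * step) for k in range(n)]


def count_in_band(grid, lo, hi):
    """Number of tuning steps whose center frequency lies in [lo, hi)."""
    return sum(1 for f in grid if lo <= f < hi)


LOW_BAND = (1.0, 8.0)        # assumed band edges, for illustration only
lin_steps = count_in_band(linear_grid(), *LOW_BAND)
log_steps = count_in_band(log_grid(), *LOW_BAND)
print("steps below 8 Hz: linear =", lin_steps, ", logarithmic =", log_steps)
```

Under these assumptions the logarithmic grid places many more of its 64 steps in the low-frequency band than the linear grid does, which is the motivation for tuning in the logarithmic domain.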
B. Closed-Loop System Using PID Controller
In a typical bidirectional BMI, as illustrated in Fig. 4(a), the user determines how to update the motor intent according to the encoded sensory information. In other words, the closed-loop control policy originates from the brain, not from the neural interface system. However, there are BMI applications in which it could be useful to have a closed-loop mechanism, such as a PID controller, integrated into the system. As illustrated in Fig. 4(b), [17] proposed an application aiming to improve the sensory encoding capacity of the BMI. The method involves an encoder that maps sensing data acquired from a prosthetic to desired patterns of somatosensory cortex activity. The errors between these desired patterns and those recorded in the somatosensory cortex are used by a PID controller to update stimulation of sub-cortical somatosensory areas in the thalamus or brainstem. This approach could elicit more continuous, natural sensory percepts compared to those evoked by a limited set of pre-programmed typical stimulation patterns [18].
Another application of the proposed neural interface system is to control a paralyzed arm using muscle stimulation, as shown in Fig. 4(c), rather than a prosthetic arm. Brain-controlled muscle stimulation has been shown to be a viable method of reanimating paralyzed arms in monkeys and humans [19] - [21]. In these studies, the muscle stimulation, and thus the arm movement trajectory, was entirely driven by motor cortex activity. However, prior work has shown that recording from pre-motor cortical areas to decode motor goals, rather than entire intended trajectories, can improve performance and lower cognitive demand [22], [23]. Thus, a second potential BMI application for a closed-loop controller could be to update muscle stimulation based on the error between a decoded goal and the recorded state of the reanimated arm [24], as illustrated in Fig. 4(d).
Other examples of closed-loop bidirectional BMI applications include deep brain stimulation (DBS) for Parkinson's disease, with stimulation parameters modulated according to the internal brain state. Closed-loop stimulation of the sleep slow oscillation has been proposed to enhance memory [25]. In addition, sense-stimulate devices with closed-loop controllers have also been proposed for neuromodulatory applications [26] and for closed-loop electrophysiological studies [27]. PID controllers have been used to characterize input-output neuronal relationships [28].
PID control is the most commonly used feedback control mechanism. A PID controller relies only on the process variable and the target value, and requires no knowledge of a system model or the underlying process. Although it has wide potential in neuroscience applications, it has not been widely integrated in BMI hardware. In this work, we implemented a programmable PID controller in each neural recording channel to enable a variety of closed-loop control experiments and applications.
C. Bidirectional Neural Interface System Integration
The complete integrated BMI system is a battery-powered portable device consisting of the custom-designed SoC and supporting electronics. The devices have been used in various neuroscience experiments, including behavioral experiments. The BMI device is designed to be housed in a secure chamber fixed on the animal's skull, enabling long-term studies while the animal behaves freely, for example during social interaction and locomotion.
The system-on-chip (SoC) mainly consists of 1) a 16-channel neural front-end with neural feature extraction and closed-loop controllers, 2) 16-channel programmable stimulators, 3) two ADCs with different specifications, and 4) power management units, analog references, and other peripheral circuits. The 16-channel IO pads are shared between recording and stimulation electrodes. Multiple chips can be used in parallel to increase the number of channels. The supporting electronics consist of 1) a general purpose MCU with integrated flash memory and wireless module (Bluetooth), 2) a battery management circuit (including inductive charging), 3) expansion flash memory (optional), and 4) a 3-axis accelerometer (optional). A user interface has been designed in Matlab to support configuration and data readout. It is a modified version of our previous design [8].
III. CIRCUITS IMPLEMENTATION
A. Neural Recording Front-End
The overall signal flow of the proposed neural recording front-end is illustrated in Fig. 2. The wide-band neural signals are acquired by the low-noise neural amplifiers, and then filtered into local field potential (LFP) and action potential (AP) bands for further neural feature extraction. The neural amplifier is designed with sufficient gain to relax the noise requirement of the following filter stages.
The circuit schematic of the implemented neural amplifier is shown in Fig. 5(a). A capacitively coupled input stage is used to maximize the input range and to remove the large DC offset from the electrode-tissue interface [29]. The mid-band gain is set to 40 dB by the ratio of the input and feedback capacitors [30]. MOS pseudo resistors are used to enable a very low cutoff frequency, as shown in Fig. 5(b). The gates of the MOS pseudo resistors are set high in normal operation. A synchronization signal from the stimulator can temporarily lower the resistance and shift the high-pass corner to a higher frequency, thus preventing the recording amplifier from being saturated by stimulation artifacts [31]. The input stage can also be disconnected from the pads to prevent the high stimulation voltage from damaging the input gates. The switches are implemented with thick oxide transistors. The synchronization signal is shared between the blanking and pole-shifting switches, but the two functions can be enabled independently.
The circuit schematic of the core operational transconductance amplifier (OTA) is shown in Fig. 5. A two-stage current mirror OTA is chosen, with the dominant pole at the second stage [32]. The input differential pair is biased in the sub-threshold region to maximize the noise power efficiency [30]. Source-degenerated current mirrors are used in the OTA to lower the noise contribution of the current mirrors. By properly choosing the resistor values, the noise contribution of the source-degenerated current sources can be made much smaller than that of MOS transistors alone [33].
The output of the wide-band neural amplifier is split into two paths, the LFP and AP. A 1st order GmC lowpass filter [33] is implemented to attenuate the high frequency spikes. The lowpass frequency can be tuned by programming the biasing current of the Gm block. A 2nd order highpass filter [34] is used to remove the large low frequency oscillation for further action potential discrimination.
B. Neural Feature Extraction and PID Controller
Each neural recording channel integrates a neural energy extraction module, an action potential detection module, and a PID controller. The 16 neural feature extraction units can be programmed and operated independently. The filters in each channel can also be combined as a filter bank to perform spectrum analysis for one channel, as illustrated in Fig. 6(a).
The LFP energy is extracted as follows. A lowpass filter with a corner frequency of 300 Hz first removes the high frequency spikes. Then, a 4th-order stagger-tuned biquad filter band-passes the neural signal in a programmable frequency band [35]. The filtered signal is squared in a Gilbert multiplier to calculate the energy, and the energy integral is produced by a leaky integrator with a programmable time constant [36].
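The band-pass, square, and leaky-integrate chain described above can be illustrated with a discrete-time behavioral model. This is only a sketch under stated assumptions: the chip implements the chain with analog GmC circuits, whereas the code below uses digital biquads with the widely used RBJ "Audio EQ Cookbook" band-pass coefficients, omits the 300 Hz lowpass stage, and uses hypothetical values for the sampling rate, quality factor, stagger spread, and integrator time constant.

```python
import math


def bandpass_biquad(f0, q, fs):
    """RBJ-cookbook band-pass coefficients (0 dB peak gain), normalized by a0."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (-2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a


def run_biquad(x, coeffs):
    """Direct-form I filtering of the sample list x."""
    (b0, b1, b2), (a1, a2) = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        out.append(yn)
    return out


def lfp_energy(x, fs, f0, q=2.0, spread=1.2, tau=0.5):
    """Two staggered biquads (4th order) -> squarer -> leaky integrator."""
    y = run_biquad(x, bandpass_biquad(f0 / spread, q, fs))
    y = run_biquad(y, bandpass_biquad(f0 * spread, q, fs))
    leak = math.exp(-1.0 / (tau * fs))       # per-sample decay of the integrator
    acc, energy = 0.0, []
    for yn in y:
        acc = leak * acc + (1.0 - leak) * yn * yn
        energy.append(acc)
    return energy


def tone(freq, fs, seconds, amp=1.0):
    """Test sinusoid standing in for a band-limited oscillation."""
    n = int(fs * seconds)
    return [amp * math.sin(2.0 * math.pi * freq * k / fs) for k in range(n)]


FS = 1000.0  # Hz, hypothetical sampling rate of the model
in_band = lfp_energy(tone(20.0, FS, 4.0), FS, f0=20.0)[-1]
out_band = lfp_energy(tone(100.0, FS, 4.0), FS, f0=20.0)[-1]
print("in-band energy exceeds out-of-band energy:", in_band > out_band)
```

The time constant `tau` plays the role of the programmable moving-window length: a larger value averages the squared signal over a longer interval.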
Given the low-frequency nature of the neural signal, filters with very large time constants have to be implemented on chip. Op-amp based filters feature high linearity and good signal-to-noise ratio (SNR), but suffer from high power consumption, large passive components (which are non-linear if MOS resistors are used), and difficulty in tuning. Switched capacitor filters are widely used due to their high accuracy and tunability [9], but they are limited by the tunable range, the capacitor size, and the need for additional clock generation circuitry. GmC filters can potentially realize large time constants with ultra-low power and compact circuitry, and have thus been widely used in biomedical applications [35]. In this work, a Gm block with a transconductance tunable over two decades and an extended linear range has been implemented. The circuit schematic of the implemented Gm block is shown in Fig. 6(b). The input transistors are biased in the sub-threshold region [37]. Thick oxide devices are used for low transconductance and low leakage. The transconductance is linearly related to the biasing current in the sub-threshold region [38], as expressed
where ζ is a parameter that depends on the process, and $U_T = kT/q$. The transconductance of the Gm block can be directly tuned by the biasing current. In order to realize a larger linear range, negative feedback is used in the input differential pair. The feedback path has an identical current amplitude to maximize the common mode voltage range, and reduces the distortion term by a factor of 4 [39]. Bulk degeneration [40] is also used to enhance the linear input range. Current division is used at the input differential pair to reduce the transconductance. Capacitive attenuation [41] is used to further reduce the input signal swing and lower the overall transconductance. A programmable biasing current generation module is designed, as shown in Fig. 6(c). A two-step 6-bit resistor ladder DAC is used to generate a linear tuning voltage with 64 steps between $V_{cm}$ and $V_{ref}$. An exponential current reference module is used to generate the biasing current for the Gm block. Transistors M1 to M6 are biased in the sub-threshold region. Thick oxide devices are used to minimize the leakage currents. When $V_{DS}$ is higher than $4U_T$, the sub-threshold current can be expressed as
The currents can be expressed as
Thus
Also
The generated biasing current can be expressed as
Thus, uniform voltage tuning results in an exponentially increasing biasing current, and therefore in an exponentially increasing transconductance.
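The effect of a uniform voltage step on a sub-threshold device can be checked numerically. The sketch below assumes the generic sub-threshold law $I = I_{ref} \exp(\Delta V / (n U_T))$, which is a textbook form and not necessarily the exact expression derived in this work. The slope factor `n_slope`, the reference current, and the step voltage are hypothetical; the step is chosen here so that the 64 codes span the two-decade tuning range stated for the Gm block.

```python
import math

K_BOLTZMANN = 1.380649e-23      # J/K
Q_ELECTRON = 1.602176634e-19    # C


def thermal_voltage(temp_kelvin=300.0):
    """U_T = kT/q."""
    return K_BOLTZMANN * temp_kelvin / Q_ELECTRON


def bias_current(code, i_ref, v_step, n_slope=1.5, temp_kelvin=300.0):
    """Assumed law: I = i_ref * exp(code * v_step / (n * U_T)), 6-bit code."""
    if not 0 <= code <= 63:
        raise ValueError("code must be a 6-bit value (0..63)")
    u_t = thermal_voltage(temp_kelvin)
    return i_ref * math.exp(code * v_step / (n_slope * u_t))


N_SLOPE = 1.5                   # hypothetical sub-threshold slope factor
I_REF = 1e-9                    # hypothetical 1 nA reference current
# Step voltage chosen so that 63 steps cover a factor of 100 (two decades).
V_STEP = N_SLOPE * thermal_voltage() * math.log(100.0) / 63

currents = [bias_current(c, I_REF, V_STEP, N_SLOPE) for c in range(64)]
ratios = [currents[k + 1] / currents[k] for k in range(63)]
print("step voltage (mV):", 1e3 * V_STEP)
print("current ratio per code step:", ratios[0])
```

Each code increment multiplies the current by the same factor, so equal voltage steps translate into equal steps of ln(I), and hence of ln(g_m), under the assumed law.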
where code is the digital input of the DAC. $I_{ref}$ is generated by an on-chip bandgap reference and is independent of temperature and supply voltage. Transistors with large gate area are used in the current generation module and have been carefully laid out to minimize mismatch. Process variation can be further calibrated by tuning the reference voltage. The complete LFP energy extraction circuits are shown in Fig. 6(d). A stagger-tuned 4th-order band-pass filter is implemented [42]. The center frequency and quality factor of each biquad are independently tunable. Only two grounded capacitors are used in each biquad, which keeps the circuit compact. The transfer function of the bandpass filter is given by
In this work, the transconductances of the Gm blocks are set to $g_{m1} = g_{m2}$ and $g_{m3} = g_{m4}$. The capacitors are set to $C_1 = C_2$. Thus,
From Eq. (11) and Eq. (13)
Thus the center frequency of the biquad can be exponentially tuned by the digital code. And
M7 in Fig. 6(c) is a diode-connected transistor with the same length as the current mirrors used in the Gm block. The width of M7 can be programmed to divide the current reference, so the ratio of $I_{gm1}$ to $I_{gm3}$ can be programmed to tune the quality factor. Compared with prior publications, this implementation features small silicon area, high digital programmability, and ultra-low power consumption. A Gilbert multiplier biased in the sub-threshold region is used to square the band-passed signal. The integral of the multiplier output current is computed by the leaky GmC integrator [35]. The moving window length can be tuned by programming the time constant of the integrator.
A current-mode AP discrimination unit has been integrated in each channel. Two amplitude thresholds and time windows are used to discriminate the APs of different neurons [8], as illustrated in Fig. 7(a). After the 2nd order highpass filter, the signal is converted from voltage to current by a tunable transconductor. The transconductance is set by the biasing voltage $V_{Tune}$, with M1 operating in the deep triode region. $V_{Tune}$ can also compensate for the threshold variation of M1. The signal current is compared with the depolarization threshold current TH1, which is generated by an 8-bit current DAC. The depolarization threshold is usually set to the 5σ value of the signal, and is programmed through the two wire interface (TWI), shown as dat and clock in Fig. 7(b). After TH1 is crossed, the comparator is disabled for a period Φ1 while the reference current switches to the repolarization threshold TH2. If the signal crosses TH2 within the following period Φ2, an action potential is detected.
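The dual-threshold, dual-window rule can be summarized with a short behavioral model operating on sampled data. The polarity is an assumption made for illustration: the depolarization threshold TH1 is crossed upward and the repolarization threshold TH2 is crossed downward. The function name, the sample-based window lengths, and the test waveform are hypothetical, and the on-chip unit works on currents in continuous time rather than on samples.

```python
def detect_spikes(signal, th1, th2, phi1, phi2):
    """Return the sample indices at which an action potential is accepted.

    th1  : depolarization threshold, crossed upward (assumed polarity)
    th2  : repolarization threshold, crossed downward (assumed polarity)
    phi1 : samples during which the comparator is disabled after the TH1 crossing
    phi2 : samples of the window in which TH2 must be crossed
    """
    detections = []
    n = len(signal)
    i = 1
    while i < n:
        if signal[i - 1] < th1 <= signal[i]:           # upward crossing of TH1
            start = i + phi1                            # comparator blanked for phi1
            stop = min(start + phi2, n)
            hit = next((j for j in range(start, stop) if signal[j] <= th2), None)
            if hit is not None:
                detections.append(i)                    # both conditions met
                i = hit + 1
            else:
                i = max(stop, i + 1)                    # re-arm after the window
            continue
        i += 1
    return detections


# A biphasic spike followed by a long positive excursion (artifact-like event).
waveform = [0, 0, 5, 3, -1, -4, -1, 0, 0, 0,
            0, 5, 5, 5, 5, 5, 5, 0, 0, 0]
spikes = detect_spikes(waveform, th1=4, th2=-3, phi1=1, phi2=4)
print("accepted events at samples:", spikes)
```

In this example only the first event is accepted, because the second excursion never reaches TH2 inside the Φ2 window. Choosing different (TH1, TH2, Φ1, Φ2) sets is what allows spikes of different shapes to be separated.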
A programmable PID controller has been integrated to realize closed-loop control. The circuit schematic is shown in Fig. 8. In the proposed PID control system, the error signal is the difference between the extracted neural feature and its desired value. The output of the PID controller is a weighted sum of the error signal, its derivative, and its integral. The actuator in the system is the stimulator: the output of the PID controller can be used to modulate the stimulation current amplitude, frequency, or pulse width. The sensor in the system is the neural feature extraction unit: either the neural energy in a frequency band or the action potential firing rate can be used as the input of the PID controller. In this work, the firing rate is calculated and converted to a voltage signal in the embedded MCU. However, this function can be readily integrated on-chip in the future by using a lossy integrator.
The parameters for each of the P, I, and D components are independently programmable. The transfer function of the PID controller is given by
where the time constant for the differentiator is $\tau_D = C_2/g_{m4}$. The time constant is designed to be programmable over two decades, from 1 ms to 100 ms or from 10 ms to 1 s. The programming and parameter selection of the PID controller follow established PID control theory [43], [44]. For complex neural systems, for which an accurate model can hardly be obtained, plant-exploration-based methods can be used. An initial estimate of the optimal operating point can be obtained with the Ziegler-Nichols tuning method [45]. The final controller parameters can then be determined by an iterative procedure based on the least root mean square error. Considering the requirements of a BMI system, sufficient gain and phase margins must be guaranteed.
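The control law and the tuning starting point can be sketched in discrete time. This is a behavioral illustration only: the on-chip controller is an analog circuit, and the class name, the gains, and the first-order plant used below are hypothetical. The `ziegler_nichols` helper implements the classic Ziegler-Nichols PID rule (K_p = 0.6 K_u, T_i = T_u/2, T_d = T_u/8) from an ultimate gain and oscillation period that would have to be found experimentally.

```python
class PID:
    """Discrete-time PID: rectangular integration, backward-difference derivative."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None   # avoids a derivative kick on the first sample

    def update(self, target, measured):
        """Return the control output for one sample."""
        error = target - measured
        self.integral += error * self.dt
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def ziegler_nichols(ku, tu):
    """Classic Ziegler-Nichols PID gains from ultimate gain ku and period tu."""
    kp = 0.6 * ku
    ki = kp / (0.5 * tu)      # integral time Ti = Tu / 2
    kd = kp * (0.125 * tu)    # derivative time Td = Tu / 8
    return kp, ki, kd


def simulate(pid, target=1.0, tau=0.05, steps=2000):
    """Drive a hypothetical first-order plant dy/dt = (u - y) / tau."""
    y = 0.0
    for _ in range(steps):
        u = pid.update(target, y)
        y += pid.dt / tau * (u - y)
    return y


final_value = simulate(PID(kp=2.0, ki=20.0, kd=0.001, dt=1e-3))
print("plant output after 2 s:", final_value)
```

In the system described here the measured value would be the extracted neural feature and the output would set a stimulation parameter; the integral term is what removes the steady-state error in the toy plant above.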
C. Analog to Digital Converters
Two ADCs have been integrated on-chip to optimize the power efficiency of the data conversion. An 8-bit low-power current mode ADC has been implemented for the conversion of action potential signals, while a 10-bit voltage mode SAR ADC has been implemented for digitizing neural features and LFPs. Action potential recording has a lower linearity requirement than wideband neural signals or neural features. Current signals are more robust to noise on routing lines, and current-mode operation allows a lower supply voltage. A successive approximation register (SAR) architecture is chosen for the current-mode ADC. An 8-bit binary weighted current steering DAC with current calibration [46] is implemented to achieve good linearity. The designed current mode ADC works with a supply voltage of 0.9 V.
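For readers less familiar with the SAR principle, the conversion is a binary search in which each bit is decided by one comparison between the input and the DAC output. The sketch below assumes an ideal comparator and an ideal binary-weighted current DAC; the function name and the full-scale current are hypothetical and do not describe the calibrated circuit of this work.

```python
def sar_convert(i_in, i_full_scale, bits=8):
    """Successive approximation: decide bits from MSB to LSB.

    Each trial sets one bit, compares the input current with the DAC
    current for the trial code, and keeps the bit if the input is larger
    or equal. Ideal comparator and DAC are assumed.
    """
    code = 0
    for bit in reversed(range(bits)):
        trial = code | (1 << bit)
        i_dac = i_full_scale * trial / (1 << bits)
        if i_in >= i_dac:
            code = trial
    return code


I_FS = 1.0e-6   # hypothetical 1 uA full-scale input current
for i_in in (0.0, 0.25e-6, 0.5e-6, 0.999e-6):
    print(i_in, "->", sar_convert(i_in, I_FS))
```

An N-bit conversion therefore needs N comparison cycles, which is why the architecture suits low-power, moderate-speed acquisition such as the 8-bit action potential path.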
The block diagram of the current mode ADC is shown in Fig. 9(a). The ADC starts continuous sampling when the reset line is released. The circuit schematic of the current mode comparator is shown in Fig. 9(b). The input stage of the current comparator combines a capacitive input with resistive feedback [46], [47], as a trade-off between power, speed, and sensitivity. A 10-bit voltage mode SAR ADC with an energy-efficient monotonic capacitor switching procedure [48] has been integrated. Single-ended operation is used in the feature extraction units to reduce the area, so a single-to-differential converter has been implemented to drive the differential sampling capacitor array. A split capacitor array is used to reduce the total capacitance, lowering the power consumption and the die area. The capacitors are realized as standard metal-insulator-metal (MIM) structures, as provided in the design kit. Power gating is used to shut down the ADC modules and minimize leakage power.
D. Multi-Mode Stimulator
16-channel current mode neural stimulators have been integrated in this work, supporting functional electrical stimulation or deep brain stimulation. The stimulator is designed to be fully programmable to meet the requirements of electrophysiology experiments. There are 4 independent driving modules, each of which includes a 1:4 demux to support 4 channels and provide near-simultaneous stimulation. The stimulator can perform monopolar or bipolar, monophasic or biphasic, and symmetrical or asymmetrical charge-balanced stimulation. A typical biphasic stimulation waveform is illustrated in Fig. 10. The logic and timing generation module of each stimulator can be programmed individually. The stimulator command and parameter registers are listed in Table I. In addition to the regular operating modes, the stimulator can be configured to output a continuous current in order to test the DAC and output stage.
The circuit schematic of the multi-mode stimulator is shown in Fig. 11. The stimulator consists of three modules: 1) a current mode DAC with high output impedance current sinks and sources, 2) a high voltage switch matrix with level shifter, and 3) a local logic and timing generation module. A 6-bit current mode DAC is used to generate the stimulation current reference. The output stage is designed with a regulating op-amp, resulting in an output impedance higher than 200 MΩ. The op-amp is disabled when the stimulator is in idle mode to save power. Thick oxide devices are used to tolerate the high output voltage. For larger surface electrodes, a high current mode can be enabled to boost the output current to 4 mA. In the monophasic mode, the reversal phase is disabled. In the biphasic, monopolar mode, only one electrode is selected. In the biphasic, bipolar mode, the stimulation and counter electrodes can be selected arbitrarily from all 16 channels.
TABLE I. Stimulator command and parameter registers.
Module position: 0x00-0x11
Word 00 [4:5], Stim. mode: 00-biphasic, monopolar; 01-biphasic, bipolar; 10-monophase
Word 00 [6], Stim. power: 0-low (FS: 256 uA); 1-high (FS: 4096 uA)
Word 00 [7], Stim. on/off: 0-stimulator OFF
Pulse group interval (T_L): 8 ms-2 s
Fig. 11. The proposed multi-mode stimulator module. Each module is demultiplexed to 4 channels. It consists of 1) a current mode DAC with high output impedance current sinks and sources, 2) a high voltage switch matrix, and 3) a local logic and timing generation module.
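Charge balance for symmetrical and asymmetrical biphasic pulses can be verified with a small numerical model. The sketch assumes a cathodic-first pulse, one sample per microsecond, and an LSB equal to the full-scale current divided by 2^6 (4 uA in low-power mode and 64 uA in high-power mode for the 6-bit DAC); these choices and all function names are assumptions for illustration, not register-level behavior of the chip.

```python
def dac_code(amp_ua, high_power=False, bits=6):
    """Quantize a requested amplitude (uA), assuming LSB = FS / 2**bits."""
    full_scale_ua = 4096 if high_power else 256     # FS values listed in Table I
    lsb_ua = full_scale_ua // (1 << bits)            # 4 uA or 64 uA (assumed)
    code = max(0, min((1 << bits) - 1, round(amp_ua / lsb_ua)))
    return code, code * lsb_ua


def biphasic_pulse(amp_ua, width_us, ratio=1, gap_us=0):
    """Current samples in uA, one per microsecond, cathodic phase first.

    The second phase has amplitude amp/ratio and width width*ratio, so the
    two phases carry equal and opposite charge. ratio=1 is symmetrical.
    """
    if amp_ua % ratio:
        raise ValueError("amp_ua must be divisible by ratio for exact balance")
    first = [-amp_ua] * width_us
    gap = [0] * gap_us
    second = [amp_ua // ratio] * (width_us * ratio)
    return first + gap + second


def net_charge_pc(samples, dt_us=1):
    """Net delivered charge in pC (1 uA x 1 us = 1 pC)."""
    return sum(samples) * dt_us


code, amp = dac_code(100)                                 # low-power mode
symmetric = biphasic_pulse(amp, 200, ratio=1, gap_us=50)
asymmetric = biphasic_pulse(amp, 200, ratio=4)
print(code, amp, net_charge_pc(symmetric), net_charge_pc(asymmetric))
```

The asymmetrical case trades a lower reversal amplitude for a longer reversal phase while keeping the net charge at zero.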
E. System Integration
A battery-powered PCB-based system has been integrated using the designed chip and off-the-shelf electronics. A wireless transceiver based on a 32-bit ARM Cortex M0 (Nordic Semiconductor nRF51822) is used as the central processor. The MCU integrates a 2.4 GHz wireless module and a Bluetooth 4.0 software stack, which enables a user-friendly interface to computers or mobile devices. The configuration of the SoC is stored in the flash memory of the MCU, and it can be programmed wirelessly over the Bluetooth link. Once the device is powered up, the MCU first reads the default configuration from the flash memory and then configures the chip accordingly. The interface between the MCU and the chip is shown in Fig. 12. Configuration and data readout are performed through a simplified two-wire interface (TWI) module. The TWI module supports the standard I2C protocol [49], which is compatible with most general purpose MCUs. The MCU works as the master and the chips work as slaves. The MCU first sends the address, and the chip with the matching address responds. Only two pads are used to set the address; thus, the current implementation can support up to 4 chips (64 channels in total). This can easily be expanded in the future to support 127 chips (full 8-b address). The START, STOP and ANSWER commands are also shown in Fig. 12.
T3168 and XKT510 are used as the wireless power transmitter and receiver ICs. The wireless charging uses a switching frequency of 125 kHz. MC73831 is used for battery management. A 3-axis accelerometer ADXL345 has also been integrated in the system, with 3-wire SPI interface to the MCU.
A 4.65 g, 150 mAh lithium polymer battery is used to power the system. On-chip DC-DC converters and regulators are integrated to optimize the power efficiency. The analog front-end and the voltage-mode ADC are designed to operate at 1.8 V. The current-mode ADC and the digital circuits operate at 0.9 V. The stimulator back-end is designed for a 5 V power supply. The analog references are generated on-chip, while the digital clocks are provided by the external low-power MCU.
IV. EXPERIMENTAL RESULTS
The design has been fabricated in a 180 nm CMOS process, occupying a silicon area of 3.9 mm × 0.95 mm. The die photo is shown in Fig. 13(a), with the major building blocks highlighted. One device built with the fabricated chip is shown in Fig. 13(b). The system includes two of the proposed chips to support 32 recording and stimulating channels. The device measures 30.1 mm × 18.3 mm and weighs 18 g including the battery and the inductive charging coil. Bench tests were performed to verify the functions and evaluate the performance of the proposed design. In-vivo experimental results obtained from a rhesus macaque and a Long-Evans rat are also shown.
The measured input referred noise of the analog front-end is 4.57 μVrms in a 0.3 Hz-7 kHz bandwidth, with a noise efficiency factor (NEF) [30] of 4.77. The measured CMRR is 81 dB and the PSRR is 71 dB. The cut-off frequency to separate the LFP and action potential signals is tunable, and the default value is set to be 300 Hz.
The frequency response of the natural logarithmic tuning neural energy extraction module was measured. The reference voltages were calibrated to set the center frequency of the unit programming step. All tuning is done by programming the digital registers. Fig. 14 shows the measurements for every fourth step out of the 64 possible steps, with center frequencies ranging from 1 Hz to 200 Hz. Note that the x-axis is plotted in the natural logarithmic domain. Fig. 15 shows the measured tuning of the quality factor. The other measured specifications are listed in Table II. The measured output spectrum of one biquad filter stage with a center frequency of 26.6 Hz is shown in Fig. 16. The input signal is 500 mVpp. An SFDR of 56.6 dB is achieved. The integrated noise of one biquad filter stage is less than 0.12 mV for all quality factors.
The action potential discrimination module was tested with a dataset of extracellular recordings from a crayfish abdominal ganglion, which contains four spontaneously active motor neurons. The SNRs of the four neurons are 3.1 dB, 2.7 dB, 4.5 dB, and 8.1 dB, respectively. The detection accuracy is defined as
where $TP$ is the number of correct detections, $FP$ is the number of wrong detections, and $FN$ is the number of missed action potentials. The resulting detection accuracies of the four neurons using the designed action potential discriminator are 97.5%, 96.9%, 98.8%, and 100%, respectively.
A closed-loop neuronal response clamp experiment [27], [28] was set up to test the proposed chip. The nervous systems of humans and animals respond to rapidly changing sensory information with highly variable, complex dynamics. This dynamic response is seen at every level, from the single neuron to the neuronal network. Thus, it is important to study neural behavior with a closed-loop approach, in the appropriate context of realistic input-output dependence. Voltage clamp and current clamp are well-known techniques [27] in closed-loop electrophysiology. Recently, the dynamic neuronal response clamp technique was proposed to study the threshold dynamics of neurons using extracellular stimulation and recording. A modified version of this technique is employed to test the proposed closed-loop system, including the PID controller, the stimulator, and the action potential detector.
The diagram of the designed testing system is illustrated in Fig. 17 . The integrate and fire model [50] for the single neuron employed in this experiment can be expressed as
where $\tau_m \approx 10$ ms is the membrane time constant, $V_m$ is the resting membrane potential, $V(t)$ is the actual membrane potential as a function of time, $R_m \approx 10^7$ Ω is the membrane resistance, and $I_s(t)$ is the stimulation current. Once the membrane potential reaches a certain threshold $V_{TH}$, an action potential occurs and the potential is reset to the resting membrane potential. In this test, an extra MCU (Atmel XMEGA 128A4U) with an ADC and a DAC was used to model the neuron. The MCU runs at a sampling rate of 100 kHz, corresponding to a time resolution of $dt = 10$ μs. The continuous-time differential equation is approximated by a discrete difference equation for implementation in the MCU. The MCU's ADC measures $R_m I_s[t]$, and the DAC generates $V[t]$ based on the following equations:
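A forward-Euler discretization of a leaky integrate-and-fire neuron, one standard form that is consistent with the parameters listed above, can be sketched as follows. The update rule is the textbook model, assumed here for illustration and not necessarily the exact difference equation programmed into the MCU. The time step, membrane time constant, and membrane resistance follow the values in the text; the resting potential, the threshold, and the stimulation currents are hypothetical.

```python
def simulate_lif(i_stim, dt=10e-6, tau_m=10e-3, r_m=1e7,
                 v_rest=-65e-3, v_th=-50e-3):
    """Leaky integrate-and-fire neuron, forward-Euler update.

    V[t+1] = V[t] + dt / tau_m * (-(V[t] - v_rest) + r_m * i_stim[t])
    When V reaches v_th, a spike is recorded and V is reset to v_rest.
    v_rest and v_th are assumed values for illustration.
    """
    v = v_rest
    trace, spikes = [], []
    for k, i_s in enumerate(i_stim):
        v += dt / tau_m * (-(v - v_rest) + r_m * i_s)
        if v >= v_th:
            spikes.append(k)     # sample index of the action potential
            v = v_rest           # reset to the resting potential
        trace.append(v)
    return trace, spikes


N_SAMPLES = 100_000              # 1 s at the 100 kHz rate used in the test
_, quiet = simulate_lif([1e-9] * N_SAMPLES)    # 10 mV drive, below threshold
_, firing = simulate_lif([3e-9] * N_SAMPLES)   # 30 mV drive, above threshold
print("spikes in 1 s:", len(quiet), "and", len(firing))
```

With a constant drive the model fires regularly once $R_m I_s$ exceeds the gap between threshold and resting potential, so the firing rate is a monotonic function of the stimulation current, which is the property a closed-loop controller can exploit.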
The stimulator is reconfigured in a testing mode to output a continuous stimulation current, as required for intracellular stimulation. The stimulation current amplitude is modulated by the output voltage of the PID controller. The neuron model responds to the stimulation current and outputs the membrane potential. The action potential detector module compares the membrane potential with a pre-defined threshold voltage. The output of the detector is a PWM wave, which is sent to the integrator and converted to a voltage proportional to the spike rate. In this work, the spike rate is converted to a voltage in the embedded MCU; however, this function can be integrated on-chip by using a lossy integrator. The difference between the integrator's output voltage and the reference voltage is sent to the PID controller. Fig. 18 shows 12 testing trials with different proportional-integral-derivative parameters. The dots indicate the time stamps of the action potentials. The same reference was set at time 0 in every trial, and the neuron responded to the stimulation current until it settled at a constant firing rate. Fig. 19 shows 12 testing trials with different references. The neuron settled at a relatively constant firing rate proportional to the reference, in a manner determined by the choice of the P, I, and D terms. The testing results show that, by programming the parameters, one can control the behavior of the neuron without knowledge of the exact model [27].
One in-vivo test of the device was performed on a female Long-Evans rat, with a bipolar electrode pair implanted in the sensory cortex and a monopolar electrode implanted in the motor cortex. LFPs were recorded using the designed device in both areas, as shown in Fig. 20(a). Stimulation was delivered by the device in the sensory cortex, and the compliance voltage of the stimulating electrode was measured with external instrumentation. Fig. 20(b) shows the measured compliance voltage of the stimulation pulse trains and of a single pulse. A bidirectional stimulation and recording experiment was conducted by stimulating the rat's sensory cortex and recording in the motor cortex. Fig. 21 shows the recordings obtained with different stimulation current amplitudes. When the stimulation current is too large (2 mA in this experiment, depending on the surface area and material of the electrode), the recording saturates. Discharging the input and shifting the high-pass corner frequency help the amplifier recover faster, so less information is lost to the stimulation artifacts.
Another in-vivo test of the device was performed on a male rhesus macaque (Macaca mulatta) with electrodes implanted chronically in the left hippocampus. In this experiment, the device was housed in a secure plastic chamber fixed on the skull of the macaque. Continuous recordings over 24 hours were conducted while the animal was freely behaving in its home cage.
Fig. 22. In-vivo recording in a rhesus macaque using the designed chip. The extracted energy in four brain oscillation bands (Theta, Beta, Gamma, and Fast) is compared with the theoretical computations (dashed lines).
Fig. 23. The spectrum of a 6 hour continuous recording using the proposed system. The animal transitioned from awake (high frequency oscillations more active) to asleep (low frequency oscillations more active).
V. CONCLUSION
In this paper, a bidirectional, closed-loop brain machine interface system has been reported. The system includes a 16-channel custom chip integrating low-noise neural amplifiers, neural feature extraction units, neural stimulators, ADCs, and PID controllers. A channel-level, energy-efficient neural feature extraction module has been proposed, including a natural log-domain tuning filter bank and a current-mode action potential detector. The system is highly programmable and energy efficient. A prototype chip has been fabricated in 180 nm CMOS technology. General purpose wireless modules, memory media, inductive charging, and power management units were also integrated in the system. Bench testing and in-vivo experimental results are shown in this paper. The system has been used in neuroscience research. A comparison with recently reported bidirectional neural interface designs is listed in Table III. The proposed system provides a promising solution for bidirectional brain machine interface applications, especially for closed-loop experiments with freely behaving animals.
