Abstract: We present a high-density CMOS neural probe with active electrodes (pixels), each containing dedicated in-situ circuits that amplify the signal at its source. The complete probe contains 1356 neuron-sized (20 × 20 µm²) pixels densely packed on a 50 µm thick, 100 µm wide and 8 mm long shank. It allows simultaneous high-performance recording from 678 electrodes and can also observe all 1356 electrodes simultaneously at the cost of increased noise. This considerably surpasses state-of-the-art active neural probes in electrode count and flexibility. The measured action potential band noise is 12.4 µVrms, with just 3 µW power dissipation per electrode amplifier and 45 µW per channel (including data transmission).
I. INTRODUCTION
Large-scale recording from individual neurons in multiple brain areas requires neural probes with both a high density and a high number of electrodes [1]. To minimize tissue damage, the 'shank' (the implanted part of the probe) has to be as narrow and thin as possible. Various recently developed silicon neural probes consist of a large number of tiny active electrodes that locally amplify or buffer the neural signals [1]. However, with such limited space, the CMOS pixel amplifiers (PA) underneath the electrodes are restricted to a bare minimum, while most of the signal processing is done in the 'base' of the probe. All prior active and passive neural probes [2], [3], [8] used a dedicated metal line per electrode to send the signal to the base circuitry. This limits the number of simultaneously recording electrodes to the number of metal lines that fit in the cross section of the shank (Fig. 1). To overcome this fundamental bottleneck and achieve a denser simultaneous readout, we propose a new architecture that relies on time division multiplexing and on techniques that reduce the associated noise.
II. OPERATION PRINCIPLE
Active neural probes improve recording quality by buffering or amplifying the input signal close to its source (i.e. the electrode). This approach reduces the source impedance and minimizes crosstalk caused by coupling among the long neighboring shank wires [2]. The PA has strict design constraints: the area is limited by the electrode size, the power is limited by the acceptable tissue heating, and the noise requirements are imposed by the signal amplitude (as small as tens of µV). Within the limited area, an obvious method to reduce noise is to increase the current consumption of the PA input transistor. This gives the PA a high bandwidth. Since the neural signal band itself is limited to ~7.5 kHz, the PA output can be sampled at f_s > 15 kHz in the base. Therefore, simple time division multiplexing can be embedded within the shank (Fig. 2a), allowing M PA outputs to share a single shank wire (f_MUX = M·f_s). However, without a traditional anti-aliasing filter to limit the high PA bandwidth, the in-band noise increases due to folding. Since it is not possible to fit low-pass filters within the limited area of the PA (before the sampling operation), we employ an alternative method of noise reduction: integrating the signal over a period of time T_i (Fig. 2b). The integrate, sample and reset operations strongly attenuate the signal beyond f_i = 1/T_i (f_i ≥ f_MUX), improving the signal-to-noise ratio.
For this particular design, a multiplexing factor of M = 8 was enough to overcome the shank-wire bottleneck. To avoid in-band distortion, each channel is oversampled at f_s = 40 kHz (> 15 kHz), producing a total multiplexing frequency of 320 kHz. This, in turn, limits the integration period to a maximum of 3.125 µs. We have used T_i = 2.5 µs to allow time for transitions. This process essentially acts as a low-pass operation, strongly reducing the PA bandwidth from ~4 MHz to 400 kHz and limiting the noise folding.
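The timing figures above can be checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes an ideal boxcar (integrate-and-reset) window, whose normalized magnitude response is |sin(π f T_i)/(π f T_i)|, and the function name `boxcar_gain` is a hypothetical helper, not part of the design.

```python
import math

M = 8           # multiplexing factor (from the text)
F_S = 40e3      # per-channel sampling rate in Hz (from the text)
T_I = 2.5e-6    # chosen integration period in s (from the text)

f_mux = M * F_S         # rate at which the shared shank wire switches pixels
t_slot = 1.0 / f_mux    # time available per pixel, the upper bound on T_I
f_i = 1.0 / T_I         # first null of the ideal integrate-and-reset response


def boxcar_gain(f_hz: float, t_i_s: float) -> float:
    """Normalized magnitude of an ideal boxcar integrator (assumed model)."""
    x = math.pi * f_hz * t_i_s
    return 1.0 if x == 0.0 else abs(math.sin(x) / x)


print(f"f_MUX  = {f_mux / 1e3:.0f} kHz")
print(f"T_slot = {t_slot * 1e6:.3f} us, T_i = {T_I * 1e6:.1f} us")
print(f"f_i    = {f_i / 1e3:.0f} kHz")
for f in (7.5e3, 200e3, 1e6, 3e6):
    print(f"|H({f / 1e3:g} kHz)| = {boxcar_gain(f, T_I):.4f}")
```

Under this idealized model the neural band (up to ~7.5 kHz) passes almost unattenuated, while components near multiples of f_i are nulled and the envelope falls off as 1/f above it.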
III. CIRCUIT DESCRIPTION
A. Architecture
Fig. 3 shows the block-level architecture of the complete probe. The output from an array of 8 multiplexed pixel amplifiers is sent to the base through a shared shank wire. The signal is fed to an integrator (in the base) whose output is demultiplexed (DMUX block) using 8 sample-and-hold circuits.
B. Pixel amplifier
The integrator architecture described in Section II is split into two parts. Within the limited area of the pixel, the PA acts as a voltage-to-current converter (Fig. 4). The current is then integrated for a fixed period of time (T_i = 2.5 µs) on a capacitor (C_i = 15 pF) in the base that is shared among 8 channels. After T_i, the voltage on C_i is sampled and the capacitor is discharged for the next cycle. The integration capacitor and the sample-and-hold (S/H) circuits forming the demultiplexer are located in the less area-restricted base. A flipped voltage follower buffer following the S/H circuit allows the Ref DMUX to drive multiple channels.
The PA employs an open-loop, AC-coupled transconductance (g_m) stage (M1). Combined with the integration on C_i, this produces an overall small-signal gain of 10, given by:
A_v = g_m · T_i / C_i
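As a rough numerical reading of this gain, the sketch below assumes ideal integration of the PA output current, so that the gain is g_m · T_i / C_i, and solves for the transconductance that the quoted values would imply. The resulting g_m is an inference under that assumption, not a figure reported in the text; `integrated_output` is a hypothetical helper.

```python
GAIN = 10.0     # overall small-signal gain quoted in the text
T_I = 2.5e-6    # integration period in s (from the text)
C_I = 15e-12    # integration capacitor in F (from the text)

# Ideal integrator (assumption): v_out = g_m * v_in * T_I / C_I,
# hence gain = g_m * T_I / C_I and g_m = gain * C_I / T_I.
gm_implied = GAIN * C_I / T_I


def integrated_output(v_in: float, gm: float = gm_implied) -> float:
    """Voltage step on C_I after one window for a constant input voltage."""
    return gm * v_in * T_I / C_I


print(f"implied g_m = {gm_implied * 1e6:.1f} uS")
print(f"50 uV input gives {integrated_output(50e-6) * 1e6:.1f} uV on C_i")
```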
The cascode transistor M2 reduces the clock feedthrough from switches A and B to the g_m stage. These switches overlap so that current always flows through M1. Both measures are crucial for maintaining stability, since the gate (G) of M1 is a high-impedance node. The pseudo-resistor M3 and C_1 set the high-pass corner of the PA (< 1 Hz), which is necessary to reject the relatively high DC level (hundreds of mV) produced by the electrode-tissue interface. During normal operation, the cascode transistor M4, located between the current source (i.e. the PA) and the integrating capacitor (C_i), keeps the shank wire connected to its source at a constant voltage equal to the supply rail (V_s ~ 1.2 V). Keeping all shank wires at a constant voltage reduces the crosstalk among channels caused by capacitive coupling between the long shank lines. The supply rails of the PA (1.8/1.2 V) are determined by several factors. The power budget (I_DC (V_DD - V_SS)), together with the requirement for minimal noise, imposes a trade-off between the current through M1 and its V_DS. The chosen operating point must also account for the voltage drop along the power supply lines across the shank; on this basis, a supply voltage (V_DD - V_SS) of 0.6 V is optimal. By using 1.2 V and 1.8 V rails, the current can be directly integrated on C_i (within the range of 0 to 1.2 V), eliminating the need for a negative supply. Furthermore, since the 1.2 V rail is used by the following stages, the current from the unselected pixels (switch A closed) can be reused to power the blocks in the base, reducing overall consumption. The power dissipation limit of the PA is set by the physiologically safe limit of < 1 °C tissue heating, including the dissipation in the power lines.
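The power budget expression I_DC (V_DD - V_SS) can be turned around to estimate the bias current. The sketch below takes the roughly 3 µW per pixel amplifier reported elsewhere in the paper and the 1.8/1.2 V rails. The second step is a stronger assumption: it supposes that the whole bias current of a selected pixel is integrated on C_i for one window, which the text does not state explicitly, and checks that the resulting ramp stays inside the 0 to 1.2 V range.

```python
P_PA = 3e-6     # approx. power per pixel amplifier in W (reported in the paper)
V_DD = 1.8      # upper PA rail in V (from the text)
V_SS = 1.2      # lower PA rail in V (from the text)
T_I = 2.5e-6    # integration period in s (from the text)
C_I = 15e-12    # integration capacitor in F (from the text)
V_RANGE = 1.2   # available integration range in V (from the text)

v_supply = V_DD - V_SS       # voltage across the PA
i_dc = P_PA / v_supply       # bias current implied by the power budget

# Assumption: the full bias current charges C_I during one window.
ramp = i_dc * T_I / C_I

print(f"V_DD - V_SS = {v_supply:.2f} V")
print(f"I_DC        = {i_dc * 1e6:.2f} uA")
print(f"ramp on C_i = {ramp:.3f} V (range {V_RANGE} V)")
```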
Additionally, through switch E, the PA allows for gain calibration (CAL) and electrode impedance characterization (IMP). Applying a known voltage (via the CAL/IMP port) while the electrode is floating (not connected to a sample or solution) allows the end-to-end gain to be measured and calibrated. Similarly, applying a known current while the probe is submerged in saline solution allows the electrode-tissue interface impedance to be characterized. Since this measurement requires connecting a single PA input to the shared signal injection CAL/IMP port, that particular switch E is selected by temporarily lowering the cascode voltage V_c2 while the corresponding switch B is ON. This triggers a high-threshold inverter within that specific PA, which sets switch E while still allowing normal operation of the PA. Using the output line simultaneously as a select signal eliminates the need for any control registers within the PA. With the exception of the high-threshold inverter, the transistors shown in Fig. 4 are thick-oxide transistors, in order to reduce gate leakage.
The extremely high aspect ratio of the shank, along with the limited area for supply routing (due to the large number of signal wires), results in a high voltage drop of ~120 mV across the shank power supply (Fig. 6a). Even when the gate bias V_b is generated locally, this voltage drop creates large differences among the PA bias voltages (ΔV_b ~ ΔV_DD) and severely degrades performance. To mitigate this problem, a tree structure for the supply line is implemented by splitting the shank into 12 branches (113 PAs each). Each branch then experiences a much lower drop (ΔV_b ~ 0 V), resulting in a better controlled bias current.
To maintain proper operation under these voltage drops while also accounting for supply ripple on the highly resistive lines, only 6 (arbitrarily chosen) branches can be turned ON simultaneously without an additional noise penalty. This allows recording from 6 arbitrary sections of the 8 mm shank (~0.7 mm each) with the reported performance, which is suitable for covering multiple regions of a rat brain. Moreover, the design also allows simultaneous readout from all 1356 electrodes on the shank, with increased noise and power.
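The branch arithmetic can be summarized in a small helper. This is only a bookkeeping sketch: the branch indices 0 to 11 and the function name `describe_selection` are hypothetical labels introduced here, and the mode names simply mirror the two operating regimes described in the text.

```python
N_BRANCHES = 12        # supply-tree branches on the shank (from the text)
PA_PER_BRANCH = 113    # pixel amplifiers per branch (from the text)
MAX_LOW_NOISE = 6      # branches that can be ON without a noise penalty
SHANK_LENGTH_MM = 8.0  # shank length (from the text)


def describe_selection(branches):
    """Return (active electrode count, mode) for a set of branch indices.

    Indices 0..11 are a hypothetical labeling used only in this sketch.
    """
    selected = sorted(set(branches))
    if not selected:
        raise ValueError("select at least one branch")
    if selected[0] < 0 or selected[-1] >= N_BRANCHES:
        raise ValueError("branch index out of range")
    if len(selected) <= MAX_LOW_NOISE:
        mode = "high-performance"
    else:
        mode = "increased-noise"
    return len(selected) * PA_PER_BRANCH, mode


print(describe_selection([0, 2, 4, 6, 8, 10]))
print(describe_selection(range(N_BRANCHES)))
print(f"branch length = {SHANK_LENGTH_MM / N_BRANCHES:.2f} mm")
```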
C. Channel
Each channel receives a signal (S_x) line and a reference (R_x) line from the corresponding DMUX (Fig. 3); these feed the instrumentation amplifier (IA). The reference (REF) line can be selected from i) one of the local reference PAs (Ref-PA), ii) a few locally averaged Ref-PAs, or iii) an external signal. Furthermore, single-ended operation is possible, which, together with the readout of the reference channels, enables software referencing. This may result in improved signal quality [9]. To preserve circuit symmetry and avoid distortion, each Ref-PA is demultiplexed to 8 outputs, so that for each channel the 2 inputs of the IA are demultiplexed (i.e. sampled) simultaneously.
By providing a gain of 10, the integrator also relaxes the noise budget of the IA. The IA is implemented using an AC-coupled folded-cascode OTA with its bandwidth limited to approximately 15 kHz. This prevents aliasing in the subsequent switched-capacitor (SC) band-select filter. The SC filter (operating at 80 kHz) can be configured as high-pass, configured as low-pass, or disabled, to select the action potential band (AP: 300 Hz to 7.5 kHz), the local field potential band (LFP: 1 Hz to 1 kHz) or the full band (1 Hz to 7.5 kHz), respectively. A programmable gain amplifier (PGA) follows the filter and provides 8 configurable gains between 1 and 50. After the PGA, the signal passes through an anti-aliasing filter and is buffered in order to be multiplexed and to drive the ADC. A class-AB ADC driver is used to reduce the static power consumption. Each channel allows independent band selection, gain configuration, reference selection, calibration selection and power-down through a chain of shift registers distributed across the chip.
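The band-selection options can be captured as a small lookup, together with a coarse check of the margin between the IA bandwidth and the Nyquist frequency of the SC filter clock. The mode keys (`high_pass`, `low_pass`, `disabled`) and function names are hypothetical labels for this sketch, not register names from the chip, and the margin calculation ignores the finite roll-off of the IA.

```python
F_SC_CLOCK = 80e3      # SC filter clock in Hz (from the text)
IA_BANDWIDTH = 15e3    # approximate IA bandwidth in Hz (from the text)

# SC filter mode -> (band name, lower edge in Hz, upper edge in Hz)
BANDS = {
    "high_pass": ("AP", 300.0, 7.5e3),
    "low_pass": ("LFP", 1.0, 1e3),
    "disabled": ("full", 1.0, 7.5e3),
}


def select_band(sc_mode: str):
    """Look up the recording band selected by a given SC filter mode."""
    if sc_mode not in BANDS:
        raise ValueError(f"unknown SC filter mode: {sc_mode}")
    return BANDS[sc_mode]


def nyquist_margin_hz() -> float:
    """Distance between the SC filter Nyquist frequency and the IA bandwidth."""
    return F_SC_CLOCK / 2.0 - IA_BANDWIDTH


for mode in BANDS:
    name, lo, hi = select_band(mode)
    print(f"{mode:>9}: {name} band, {lo:g} Hz to {hi:g} Hz")
print(f"Nyquist margin = {nyquist_margin_hz() / 1e3:.0f} kHz")
```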
IV. TEST RESULTS
The chip was fabricated in a 6-metal-layer 0.13 µm Al CMOS process, followed by biocompatible TiN electrode deposition and wafer thinning down to 50 µm (Fig. 7). Measurements were performed in a dark Faraday cage, using phosphate buffered saline solution. The total power consumption is 31 mW for 678 channels, with 2.3 mW dissipated in the shank (3 µW/PA) and 28.7 mW in the base, including data transmission with 4 pF loading. For the optimal (high-performance) setting of 6 arbitrarily chosen electrode branches (113 × 6 = 678), the input-referred noise is 12.4 ± 0.9 µVrms in the AP band and 50.2 ± 12 µVrms in the LFP band (Fig. 9). The entire probe (12 branches, 1356 electrodes) can be turned on simultaneously for low-fidelity scanning in order to select the region of interest. The crosstalk across the full signal chain is -63 dB at 1 kHz, with the measurement being limited by the noise floor. Fig. 10 shows a pre-recorded neural signal that has been fed into the saline solution and captured by the probe, along with its separation into the two signal bands.
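The per-channel figures follow from simple division of the reported totals. The sketch below repeats that arithmetic and converts the crosstalk figure to a linear amplitude ratio; it uses only numbers stated in the text, and the results land close to the quoted 45 µW per channel and 3 µW per PA, which appear to be rounded.

```python
P_TOTAL = 31e-3    # total power in W for 678 channels (from the text)
P_SHANK = 2.3e-3   # power dissipated in the shank in W (from the text)
P_BASE = 28.7e-3   # power dissipated in the base in W (from the text)
N_CHANNELS = 678   # simultaneously recorded channels (from the text)
XTALK_DB = -63.0   # crosstalk at 1 kHz in dB (from the text)

per_channel = P_TOTAL / N_CHANNELS       # total power per recorded channel
per_pa = P_SHANK / N_CHANNELS            # shank power per active pixel amplifier
xtalk_ratio = 10 ** (XTALK_DB / 20.0)    # dB to linear amplitude ratio

print(f"power per channel = {per_channel * 1e6:.1f} uW")
print(f"shank power / PA  = {per_pa * 1e6:.2f} uW")
print(f"crosstalk ratio   = {xtalk_ratio:.2e}")
```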
This work demonstrates at least a twofold increase in the number of simultaneous recording channels with respect to state-of-the-art active neural probes [8], while having comparable power and noise performance.
