Abstract: Accurate spike sorting is important for neuroscientific and neuroprosthetic applications. Spike sorting depends on the features extracted from neural waveforms, and better sorting performance usually comes with a higher sampling rate (SR). However, for long-duration experiments on freely moving subjects, miniaturized wireless neural recording ICs are the current trend, and sorting accuracy is usually compromised by adopting a lower SR to reduce power consumption. In this paper, we implement an on-chip spike sorting processor with integrated interpolation hardware to improve the tradeoff between power and accuracy. According to fabrication results in a 90 nm process, if interpolation is appropriately performed during spike sorting, a system operating at an SR of 12.5 k samples per second (sps) can outperform a system without interpolation at 25 ksps in both accuracy and power.
I. INTRODUCTION
Spike sorting is an important tool for studying neural activity and brain function in neuroscience research [1]-[3]. It is also a key component of cortically controlled neuroprosthetics that benefit patients with spinal cord injuries [4]. Robust sorting performance is therefore an important issue for these applications [5]: without accurate spike sorting, the results of neural decoding are less significant. At the same time, building miniaturized wireless microsystems for experiments on freely moving subjects is one of the current research trends [6]-[13]. On such resource-constrained systems, power must be minimized, which may compromise sorting performance.
One of the design parameters governing the power-accuracy tradeoff is the sampling rate (SR). Since spikes are classified by features extracted from their waveforms, better sorting performance usually comes with a higher SR. However, a high SR increases the power consumed by recording, processing, and wireless telemetry, which may make it infeasible for these applications. An SR of 100 k samples per second (sps) is recommended in [14] for excellent spike sorting performance, yet current microsystems are usually designed with SRs of 20 ksps or lower [7], [8], [10]-[12].
In this paper, to improve the power-accuracy tradeoff, we integrate interpolation hardware into the spike sorting microsystem. Since most spike energy lies below 6.25 kHz [14], once the waveform is reconstructed through interpolation, sorting performance at 100 ksps signal resolution could be achieved even if the neural recorder uses a low SR for low power consumption. As shown in Fig. 1, the improvement can be interpreted in two ways. With a fixed SR and the corresponding power consumption, interpolation improves the sorting accuracy of the system. Conversely, with interpolation, an anticipated sorting accuracy may be achieved with a lower SR and therefore a lower power consumption.
[Fig. 1: power consumption of the implanted system versus spike sorting accuracy for the original system and the proposed system with interpolation, with fixed-power and fixed-sorting-accuracy comparisons marked.]
Fig. 1. Interpolation can be utilized to improve the power-accuracy tradeoff of the spike sorting microsystem. With a fixed SR and power consumption of the neural recorder, our spike sorting system achieves higher sorting accuracy through interpolation. With an anticipated sorting accuracy, the implanted system consumes less power with a lower SR after interpolation.
Building on our proof-of-concept algorithm simulation [15] and the circuit design of the interpolation hardware [16], this paper focuses on system integration and implementation results. The remainder of the paper is organized as follows. Section II introduces the spike sorting microsystem. Section III shows how to efficiently integrate interpolation into the neural recording and spike sorting microsystem to improve the accuracy-power tradeoff. Section IV presents the implementation results, and Section V concludes this work.
II. SPIKE SORTING MICROSYSTEMS
A. Neural Recording and Spike Sorting
Most neurons in the brain communicate by firing action potentials, or spikes. These electrical voltage signals can be recorded extracellularly with very thin electrodes implanted into the brain. Very often an implanted electrode records signals from multiple surrounding neurons, so the recorded waveform is the superposition of the potentials generated by these neurons. Spike sorting is essentially the reverse process: it determines which spike in the superimposed waveform corresponds to which of these nearby neurons. Figure 2 shows the hardware modules for spike sorting. After the frontend neural recording circuitry amplifies and digitizes the microvolt-level neural potentials, the neural samples are input to the digital processor for spike sorting. Spikes are usually detected according to their localized instantaneous energy. The waveform characteristics, or features, of the spikes are then extracted after waveform alignment. Spikes with similar features should correspond to one specific neuron; therefore, the spikes are classified according to the clusters they form in the finite-dimensional feature space.
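The chip uses NEO spike detection (Section IV) to find spikes by their localized instantaneous energy, followed by alignment before feature extraction. A minimal sketch of these two steps in plain Python, assuming a fixed threshold (the chip trains its threshold off-line) and a simple forward peak search; the function names and the `refractory`, `pre`, `post`, and `search` parameters are hypothetical, not taken from the chip:

```python
def neo(x):
    """Nonlinear energy operator: psi[n] = x[n]**2 - x[n-1]*x[n+1].

    The two endpoints have no neighbour on one side and are set to 0.
    """
    psi = [0.0] * len(x)
    for n in range(1, len(x) - 1):
        psi[n] = x[n] * x[n] - x[n - 1] * x[n + 1]
    return psi


def detect_spikes(x, threshold, refractory=10):
    """Indices where the NEO output exceeds `threshold`.

    After each detection, `refractory` samples are skipped so one spike
    is not reported several times (hypothetical dead-time parameter).
    """
    psi = neo(x)
    events = []
    n = 0
    while n < len(psi):
        if psi[n] > threshold:
            events.append(n)
            n += refractory
        else:
            n += 1
    return events


def align_to_peak(x, idx, pre=8, post=16, search=8):
    """Return the pre + post samples around the largest sample found within
    `search` samples after `idx`, with that peak at offset `pre`.

    Returns None if the window would run past either end of the record.
    """
    end = min(len(x), idx + search)
    peak = max(range(idx, end), key=lambda i: x[i])
    start = peak - pre
    if start < 0 or peak + post > len(x):
        return None
    return x[start:peak + post]


# Toy record: a single spike whose largest sample is at index 21.
signal = [0.0] * 50
signal[20:23] = [0.5, 3.0, 0.8]
events = detect_spikes(signal, threshold=1.0)
windows = [align_to_peak(signal, i) for i in events]
```

Because each window is anchored on the largest sampled value, its alignment is only as precise as the sampling grid.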
B. Sampling Skew and Power-accuracy Tradeoff
Spike sorting is usually based on differences in spike shapes and extracted features. As a result, any waveform variation introduced during recording may significantly degrade sorting performance. Sampling skew is one of the main causes of such waveform variations. Because neural signals are sampled at discrete instants, the hardware can hardly sample every spike at exactly the same points relative to its waveform characteristics. This time difference, the so-called sampling skew, causes variations in the sampled spike waveforms and affects sorting performance. The most obvious variations occur in the neural polarization and depolarization regions (i.e., the peak and valley), which are the waveform characteristics generally used for spike sorting.
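The effect of sampling skew can be illustrated numerically. The sketch below samples a hypothetical biphasic spike template (a narrow positive peak and a wider negative valley; shape and timing are illustrative assumptions, not recorded data) at the three SRs discussed in this paper, shifts the sampling grid by fractions of one period, and reports how much the apparent peak amplitude varies with that shift:

```python
import math


def spike(t_us):
    """Hypothetical biphasic spike template; t_us is time in microseconds.

    A narrow positive peak at 300 us is followed by a wider, smaller
    negative valley at 600 us. Shape and timing are illustrative only.
    """
    peak = math.exp(-((t_us - 300.0) / 60.0) ** 2)
    valley = -0.5 * math.exp(-((t_us - 600.0) / 150.0) ** 2)
    return peak + valley


def sampled_peak(sr_hz, skew_fraction):
    """Largest sample of the template when the sampling grid starts
    `skew_fraction` of one sampling period late (0 <= skew_fraction < 1)."""
    period_us = 1e6 / sr_hz
    t0 = skew_fraction * period_us
    n_samples = int(1500.0 / period_us)  # cover about 1.5 ms
    return max(spike(t0 + k * period_us) for k in range(n_samples))


def peak_spread(sr_hz, steps=10):
    """Max minus min apparent peak over `steps` evenly spaced skews."""
    peaks = [sampled_peak(sr_hz, s / steps) for s in range(steps)]
    return max(peaks) - min(peaks)


for sr in (12_500, 25_000, 100_000):
    print(f"{sr / 1000:5.1f} ksps: apparent peak varies by {peak_spread(sr):.3f}")
```

The spread shrinks as the SR rises, which is why raising the SR is the conventional remedy for sampling skew.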
A common solution to sampling skew is to increase the sampling frequency. An SR of 100 ksps may be required by systems demanding extremely high sorting performance [14]. However, for portable or implantable neural recorders supporting a large number of channels, low power consumption is essential, and a system with a high SR usually consumes too much power to be feasible for these applications. Therefore, SRs of 20 ksps or lower are generally adopted in current hardware designs, at the cost of spike sorting accuracy.
III. PROPOSED SPIKE SORTING MICROSYSTEMS WITH CUBIC SPLINE INTERPOLATION
A. Interpolation
Most spike energy lies below 6.25 kHz. According to the Nyquist-Shannon sampling theorem, it should therefore be feasible to reconstruct 100 ksps spike waveforms through interpolation if the SR is higher than 12.5 ksps (twice 6.25 kHz). Uncompromised spike sorting performance may thus be achieved even with an SR as low as 12.5 ksps. Figure 3 shows how interpolation improves neuron separation. The neural signals are originally sampled at 12.5 ksps and aligned according to their peaks, as shown in Fig. 3(a). The spike waveforms are then interpolated to 25 ksps and 100 ksps in Figs. 3(b) and (c), respectively. After interpolation, the peaks of the neural spikes are reconstructed, and the waveforms can be re-aligned with less error caused by sampling skew. This improves the separation of neuron clusters in the feature space and leads to better sorting performance. For the algorithmic analysis of cubic spline interpolation, please refer to [15].
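A minimal sketch of the upsample-then-realign idea from Fig. 3, in plain Python. It fits a cubic spline through samples taken at 12.5 ksps, evaluates it at 8x the rate (100 ksps), and locates the peak on the fine grid. The natural boundary condition (zero second derivative at both ends) and the function names are assumptions for illustration; the hardware spline in [16] may be formulated differently:

```python
import math


def spline_second_derivs(y):
    """Second derivatives M of the natural cubic spline through y
    (unit sample spacing). Solves, for each interior i,
        M[i-1] + 4*M[i] + M[i+1] = 6*(y[i-1] - 2*y[i] + y[i+1])
    with M[0] = M[-1] = 0, using the Thomas tridiagonal algorithm."""
    n = len(y)
    m = [0.0] * n
    if n < 3:
        return m
    c = [0.0] * n  # modified super-diagonal coefficients
    d = [0.0] * n  # modified right-hand side
    for i in range(1, n - 1):
        rhs = 6.0 * (y[i - 1] - 2.0 * y[i] + y[i + 1])
        denom = 4.0 - c[i - 1]
        c[i] = 1.0 / denom
        d[i] = (rhs - d[i - 1]) / denom
    for i in range(n - 2, 0, -1):  # back substitution; m[n-1] stays 0
        m[i] = d[i] - c[i] * m[i + 1]
    return m


def spline_upsample(y, factor):
    """Evaluate the spline at `factor` points per input interval, so the
    output rate is factor x the input rate. Original samples are kept."""
    m = spline_second_derivs(y)
    out = []
    for i in range(len(y) - 1):
        for k in range(factor):
            b = k / factor  # fractional position inside [i, i + 1]
            a = 1.0 - b
            out.append(a * y[i] + b * y[i + 1]
                       + ((a ** 3 - a) * m[i] + (b ** 3 - b) * m[i + 1]) / 6.0)
    out.append(y[-1])
    return out


def peak_position(samples, factor):
    """Index of the largest sample, in units of the ORIGINAL period."""
    return max(range(len(samples)), key=samples.__getitem__) / factor


# Toy spike sampled at 12.5 ksps with its true peak between samples (t = 5.4).
coarse = [math.exp(-((k - 5.4) / 1.5) ** 2) for k in range(12)]
fine = spline_upsample(coarse, 8)  # 12.5 ksps -> 100 ksps
print(peak_position(coarse, 1), peak_position(fine, 8))
```

The coarse estimate lands on sample 5, 0.4 of a period away from the true peak, while the upsampled estimate falls closer to 5.4; this smaller alignment error is what tightens the clusters in Fig. 3.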
B. On-chip Spike Sorting Microsystem with Interpolation
In the proposed system, interpolation hardware is integrated to improve the power-accuracy tradeoff. Although the interpolation hardware requires additional power, it allows the system to use recording frontend circuits with a lower SR. Since the power consumed by recording frontend chips [7] is about an order of magnitude larger than that of state-of-the-art spike sorting designs [10], [11], the net effect of adding interpolation can be a smaller total power.
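The power argument can be made concrete with a simple break-even calculation. The sketch below assumes, as an idealization, that frontend and processor power both scale linearly with SR; the reference values are placeholders in arbitrary units chosen only to reflect the roughly 10:1 frontend-to-processor ratio stated above, not measured figures:

```python
def scale_with_sr(p_ref, sr_ref, sr):
    """Assumed linear scaling of power with sampling rate (idealization)."""
    return p_ref * sr / sr_ref


def interp_power_budget(p_fe_high, p_ss_high, p_fe_low, p_ss_low):
    """Largest interpolation overhead for which the low-SR system with
    interpolation still uses less total power than the high-SR system:
        p_fe_low + p_ss_low + p_interp < p_fe_high + p_ss_high
    """
    return (p_fe_high + p_ss_high) - (p_fe_low + p_ss_low)


# Placeholder reference values at 25 ksps (arbitrary units, NOT measurements):
# frontend ~10x the spike sorting processor, per the ratio cited in the text.
P_FE_25K, P_SS_25K = 100.0, 10.0
p_fe_12k = scale_with_sr(P_FE_25K, 25_000, 12_500)
p_ss_12k = scale_with_sr(P_SS_25K, 25_000, 12_500)
budget = interp_power_budget(P_FE_25K, P_SS_25K, p_fe_12k, p_ss_12k)
print(f"interpolation may cost up to {budget:.1f} units and still save power")
```

Because the frontend term dominates, halving the SR frees a budget that is large relative to the processor's own consumption; the actual margin depends on the real frontend and interpolation power.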
Furthermore, even with interpolation, some power in the spike sorting processor can be saved if high signal resolution is used only at the critical step of spike sorting. Figure 4 shows the architecture of the proposed spike sorting processor. The processor is divided into three stages that operate at different SRs for specific purposes. In the first stage, spike detection usually uses an energy detector and does not need detailed waveform information, so a low SR (SR_DET) can be used. In the second stage, interpolation is performed first, and the detected spikes are then aligned at a higher SR (SR_ALIGN) to reduce sampling skew and improve neuron separation. In the third stage, feature extraction and classification operate after down-sampling. Since sampling skew has been minimized by the high-resolution alignment, there should be relatively small waveform variations after down-sampling, so a lower SR (SR_FE&CLA) can be used to minimize power consumption.
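The three-stage, multi-rate flow of Fig. 4 can be sketched as a hypothetical software model (not the chip's firmware). Stage 1 uses a plain squared-amplitude energy detector at SR_DET; stage 2 upsamples a short segment by `up` to SR_ALIGN and aligns on the interpolated peak; stage 3 keeps every `down`-th sample to reach SR_FE&CLA (plain decimation without an anti-aliasing filter, a simplification). For brevity it swaps in Catmull-Rom cubic interpolation in place of the cubic spline used on chip; like the spline, it can place a peak between original samples. With the defaults, a 12.5 ksps input would be aligned at 100 ksps and passed on at 25 ksps; all parameter values are illustrative choices:

```python
def catmull_rom_upsample(x, factor):
    """Local cubic (Catmull-Rom) interpolation at `factor` points per
    input interval; end samples are repeated to supply missing neighbours."""
    n = len(x)
    out = []
    for i in range(n - 1):
        p0, p1 = x[max(i - 1, 0)], x[i]
        p2, p3 = x[i + 1], x[min(i + 2, n - 1)]
        for k in range(factor):
            t = k / factor
            out.append(0.5 * (2.0 * p1 + (p2 - p0) * t
                              + (2.0 * p0 - 5.0 * p1 + 4.0 * p2 - p3) * t ** 2
                              + (3.0 * p1 - p0 - 3.0 * p2 + p3) * t ** 3))
    out.append(x[-1])
    return out


def three_stage(x, threshold, up=8, down=4, pre=16, post=32, dead=8):
    """Return one decimated, peak-aligned window per detected spike.

    pre/post are counted in SR_ALIGN samples; `dead` is in SR_DET samples.
    All parameter values are hypothetical.
    """
    windows = []
    n = 0
    while n < len(x):
        # Stage 1 (SR_DET): coarse energy detection, no waveform detail needed.
        if x[n] * x[n] <= threshold:
            n += 1
            continue
        # Stage 2 (SR_ALIGN = up * SR_DET): interpolate, then align on the peak.
        segment = x[max(0, n - 4):n + 8]
        fine = catmull_rom_upsample(segment, up)
        peak = max(range(len(fine)), key=fine.__getitem__)
        if peak - pre >= 0 and peak + post <= len(fine):
            aligned = fine[peak - pre:peak + post]
            # Stage 3 (SR_FE&CLA = SR_ALIGN / down): decimate for FE and CLA.
            windows.append(aligned[::down])
        n += dead  # skip past this event
    return windows


signal = [0.0] * 40
signal[20:23] = [1.0, 3.0, 2.0]
windows = three_stage(signal, threshold=0.5)
```

Every returned window has its interpolated peak at index `pre // down`, and that peak can exceed the largest raw sample because it lies between original samples.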
IV. IMPLEMENTATION RESULTS
The proposed spike sorting processor shown in Fig. 4 is implemented and fabricated in a 90 nm low-leakage CMOS process. Most processing modules are built from reduced instruction set computer (RISC) cores in order to preserve programmability for various spike sorting algorithms. The cubic spline interpolation [16], along with alignment and down-sampling, is implemented in a dedicated parallel hardware block called the IAD engine. The interpolation engine can be configured on or off so that spike sorting performance and power consumption can be compared with and without interpolation. The RISCs and the IAD engine are cascaded as a processing pipeline and operate simultaneously. Figure 5 shows the die micrograph of the chip. Table I summarizes the chip implementation results. The CSP values are taken from our previous work [15], and the power of the analog recording frontend is estimated from [8]. Figures 6 and 7 show test results of the chip with pre-recorded neural signals from rat hippocampus [19], [20]. The data are downsampled to 12.5 ksps for chip testing. The NEO spike detection, DWT-PCA feature extraction, and K-means classification algorithms are coded in assembly language, compiled to machine code, and programmed onto the chip. Note that algorithm parameters, such as the spike detection threshold and the PCA projection vectors, are trained off-line on a PC. Different configurations of the IAD engine are tested and compared to demonstrate the accuracy improvement of spike sorting after interpolation. In Figs. 6(a) and 7(a), the IAD engine is turned off; because of the relatively low SR and large sampling skew, the boundaries between clusters of detected spikes can hardly be seen in the feature space. In Figs. 6(b) and 7(b), the IAD engine is turned on; because sampling skew is reduced, the cluster boundaries are more distinguishable. Figures 6(c) and 7(c) show the corresponding classification results.
Figure 8 shows the improvement in sorting accuracy versus power consumption. In this comparison, the sorting accuracy is taken from our previous algorithm analysis [15], the power of the spike sorting processor is measured from the chip, and the power of the analog recording circuit is estimated from [8]. In Fig. 8(a), the spike sorting processor with interpolation (A and B) has a better accuracy-power curve than the processor without the IAD engine (C and D). In Fig. 8(b), the power consumption of the analog recording circuits is also included. With interpolation, A achieves similar or even better sorting performance than D while consuming less power.
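The K-means classification step described above, with its parameters trained off-line on a PC, can be sketched as two parts: an off-line training step (plain Lloyd's k-means on feature vectors) and an on-line step that assigns each new spike to the nearest trained centroid. This split, the function names, and the 2-D toy features below are assumptions for illustration; the chip's own K-means code is written in assembly and is not reproduced here:

```python
import random


def nearest(point, centroids):
    """On-line step: index of the centroid closest in squared Euclidean distance."""
    return min(range(len(centroids)),
               key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])))


def kmeans(points, k, iters=20, seed=0):
    """Off-line step: Lloyd's k-means, initialised from k random points."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for j, members in enumerate(groups):
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = [sum(axis) / len(members) for axis in zip(*members)]
    return centroids


# Two toy clusters of 2-D feature vectors (e.g., two PCA coefficients).
cluster_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
cluster_b = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids = kmeans(cluster_a + cluster_b, k=2)
labels = [nearest(p, centroids) for p in cluster_a + cluster_b]
```

Keeping training off-line leaves only the cheap nearest-centroid comparison on the implanted side, which is consistent with the chip's use of off-line-trained parameters.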
V. CONCLUSION
In this paper, a spike sorting processor with interpolation is proposed to improve the tradeoff between sorting accuracy and power consumption. The idea is implemented and fabricated in a 90 nm low-leakage CMOS process. The results show that, if interpolation is appropriately utilized in spike sorting, the system operating at an SR of 12.5 ksps outperforms the system operating at 25 ksps without interpolation in both accuracy and power.
