Abstract-In this paper, we present a very-large-scale integrated (VLSI) implementation of a template subtraction algorithm for real-time stimulus artifact rejection (SAR), with applicability to closed-loop neuroprostheses. The SAR algorithm is based upon an infinite impulse response (IIR) temporal filtering technique, which can be efficiently implemented in VLSI with reduced power consumption and silicon area. We demonstrate that initializing the memory within the system architecture with the first recorded stimulus artifact significantly decreases the system response time compared to the case without memory initialization. Two sets of pre-recorded neural data from an Aplysia californica are used to simulate the functionality of the proposed VLSI architecture in AMS 0.35 µm complementary metal-oxide-semiconductor (CMOS) technology. Depending upon the reproducibility of the shape of the stimulus artifacts in vivo, the system eliminates virtually all artifacts in real time and recovers the extracellular neural activity with µW-level power consumption from a 1.5-V supply.
I. INTRODUCTION
In closed-loop neuroprostheses, in which electrical stimulation and recording of neuroelectrical activity occur in the same medium, large stimulus artifacts can hinder the analysis of recorded data for investigating the neurophysiological mechanisms of action underlying the therapeutic benefits of closed-loop operation [1].
In the past, blanking techniques, in which the recording amplifier input is simply disconnected during each stimulation cycle, have proven effective in rejecting large stimulus artifacts and preventing saturation at the recording amplifier output [2]. However, since no recording can be made during the stimulation cycle, blanking techniques are not suitable for high-frequency stimulation applications such as 136.25-Hz deep brain stimulation (DBS) of the subthalamic nucleus (STN) [1].
On the other hand, subtraction techniques in which a template signal representative of the stimulus artifact is subtracted from the recorded neural data can afford to retain signal information during each stimulation cycle. These techniques, however, do not prevent recording amplifier saturation on their own, and often require complex digital signal processing (DSP) to execute a template signal generation and subtraction algorithm. In subtraction-based stimulus artifact rejection (SAR) techniques, template signal generation can be effectively accomplished by temporal averaging of multiple stimulation cycles based on the assumption that the overall shape, dynamic range, and timing information of the stimulus artifacts do not change significantly over time.
We have previously reported on the implementation of such algorithms using the well-established finite impulse response (FIR) and infinite impulse response (IIR) temporal filtering techniques, and have compared the two corresponding system architectures, their overall performance, and the associated computational costs to assess the feasibility of their implementation in very-large-scale integrated (VLSI) technology [3]. Both implementation techniques were found to be capable of removing large stimulus artifacts and recovering the neural activity once the steady-state response was reached. The IIR architecture could be implemented with reduced power consumption and silicon area compared to its FIR counterpart, but was found to be much slower in reaching its steady-state response.
In this paper, we describe the VLSI implementation of an IIR template subtraction SAR algorithm in AMS 0.35 µm complementary metal-oxide-semiconductor (CMOS) technology and demonstrate that on-chip memory initialization can significantly decrease the system response time. To illustrate the functionality of the proposed system architecture, two sets of neural data pre-recorded from an Aplysia californica (sea slug) during current-controlled electrical stimulation of the nervous system are employed in this study. The two datasets contain both large stimulus artifacts and small neural action potentials that occur randomly in between the artifacts and are occasionally superimposed on them.
The paper is organized as follows. Section II discusses the IIR implementation of the SAR algorithm, while Section III presents the VLSI implementation of the system architecture. Section IV presents the simulation results, and finally Section V draws some conclusions from this work.
II. IIR IMPLEMENTATION OF SAR ALGORITHM
As previously described in [3], template signal generation can be achieved by averaging a number of properly shifted versions of the input neural data, which also contain the stimulus artifacts. This can be mathematically expressed as:

$$y(t) = \sum_{n=0}^{N-1} a(n)\, x(t - nT_{sti}) \quad (1)$$
where $y(t)$ is the estimated template signal, $x(t)$ is the input neural data, $N$ is the number of stimulus artifact waveforms used for averaging, $a(n)$ are the averaging factors, and $T_{sti}$ is the stimulation period. In order for the stimulus artifact and its template signal to have equal amplitudes, the averaging factors should sum to unity, and can all be equal to $1/N$ for a common averaging method. As shown in [3], an FIR implementation of (1) would require at least $N-1$ memory rows and $N$ summations in each period of the sampling clock, whereas an IIR implementation would require a single memory row and only three summations, at the expense of a much slower system response time. Fig. 1 depicts the system architecture for IIR implementation of the subtraction-based SAR algorithm, in which a new template signal is generated from the previous template signal and the input neural data. Therefore, in this case, it is the stimulus artifact template signal that is retained in the memory instead of the input signal. The factor $K$ ($< 1$) plays a role similar to that of $N$ in (1) and affects the system response time and accuracy. It can be selected as $1/2^n$ to avoid the use of multipliers in VLSI implementation. According to Fig. 1:

$$y_n = (1 - K)\, y_{n-1} + K\, x_n \quad (2)$$
where $y_n$ is the new stimulus artifact template signal, $y_{n-1}$ is the previous template signal, and $x_n$ is the input neural data. It can be shown from (2) that the minimum number of stimulus artifacts required to generate an accurate template signal with error less than 0.1% can be calculated as:

$$N_{\min} = \left\lceil \frac{\ln\left(0.001 / \left|1 - y_{-1}\right|\right)}{\ln(1 - K)} \right\rceil \quad (3)$$
where $y_{-1}$ is the initial condition of the memory (normalized to the stimulus artifact template signal in steady state). It can be seen from (3) that a minimum of 108 and 218 stimulus artifacts are required for $K$ values of 1/16 and 1/32, respectively, if no memory initialization is performed. Fig. 2 depicts a plot of the minimum number of stimulus artifacts required to generate an accurate template signal with error less than 0.1% as a function of the normalized initial condition of the memory. Clearly, the closer the initial condition is to the steady-state template signal, the faster the system response time. If $y_{-1} \geq 0.999$, the error will always be less than 0.1%, even at the start of the algorithm. In our VLSI implementation, the memory is initialized by the first recorded stimulus artifact waveform, which is directly written into the memory during the first clock cycle. Fig. 3 shows the simulated stimulus artifact template signal representative of the stimulus artifacts present in the first Aplysia neural dataset that is generated with an IIR architecture ($K = 1/16$) without memory initialization as well as with the memory initialized using the first recorded artifact. As can be seen, memory initialization significantly decreases the system response time to create a complete and accurate artifact template signal for subtraction.
III. VLSI SYSTEM ARCHITECTURE
Fig. 4 depicts the VLSI architecture of the IIR SAR system. The required system inputs include a stimulation timing signal, amplified and digitized (10 b) neural data containing the stimulus artifacts, system clock and ADC sampling clock signals, and 14 external control bits. The system clock frequency is at least 28 times that of the ADC sampling clock. The timing operation and signal flow in the SAR algorithm are handled on-chip using a monolithic digital controller incorporating carry-ripple adders.
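As a quick numerical check of the artifact counts quoted in Section II, the following Python sketch evaluates how many stimulus artifacts a first-order IIR average needs before the template error falls below 0.1%. It assumes the update $y_n = (1-K)\,y_{n-1} + K\,x_n$ with a constant, normalized artifact amplitude of 1; the function names `min_artifacts` and `simulate_count` and the `tol` parameter are illustrative and do not come from the paper.

```python
import math


def min_artifacts(K, y_init=0.0, tol=1e-3):
    """Closed-form count of artifacts needed for |1 - y| < tol.

    Assumes the error shrinks by a factor (1 - K) per artifact,
    starting from |1 - y_init| (amplitudes normalized to 1).
    """
    err0 = abs(1.0 - y_init)
    if err0 < tol:
        return 0  # already within tolerance at the start
    return math.ceil(math.log(tol / err0) / math.log(1.0 - K))


def simulate_count(K, y_init=0.0, tol=1e-3):
    """Same count obtained by iterating the IIR update directly."""
    y, n = y_init, 0
    while abs(1.0 - y) >= tol:
        y = (1.0 - K) * y + K * 1.0  # constant normalized input x = 1
        n += 1
    return n


if __name__ == "__main__":
    for K in (1 / 16, 1 / 32):
        print(K, min_artifacts(K), simulate_count(K))
```

Under these assumptions, both functions give 108 artifacts for K = 1/16 and 218 for K = 1/32 with a zero initial condition, in agreement with the values stated in the text, and zero artifacts when the initial condition is already within 0.1% of the steady-state template.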
The 10-b, 4-K sequential access memory (SAM) incorporates standard 6-T memory cells [4], pre-charge, read/write, and output buffer circuitry [5]. Since memory data are accessed 10 bits at a time in a sequential manner, the addressing circuitry for the SAM comprises a standard ring counter without any column decoder, leading to an area-efficient architecture for the memory. The factor K can be set to either 1/16 or 1/32 using one external control bit. An output blanking mechanism synchronized to the stimulation timing signal [3] is also integrated in the VLSI architecture to eliminate any potential residual artifacts after template subtraction, especially around the rising and falling edges of the artifact where it has a large slope. The system architecture can process stimulus artifacts with varying durations using four external control bits that set the memory length (i.e., number of 10-b samples) from 256 to 4,096. For example, for a system clock frequency of 1 MHz (ADC sampling rate of 35.7 kSa/s), the system can handle stimulus artifacts with durations up to 114.7 ms, whereas the blanking duration can be set with eight external bits (four per rising/falling edge) in the range of zero to 6.7 ms.
To perform memory initialization, the first recorded stimulus artifact is directly written into the memory. Specifically, if this feature is enabled in the system by one external control bit, the system waits for an indication of stimulation from the stimulus timing input. The digital controller then identifies whether the resulting stimulus artifact is the first recorded artifact. If so, the recorded artifact is loaded directly into the memory. With the next indication of stimulation coming from the stimulus timing signal, the IIR system executes the SAR algorithm. If the memory initialization feature is not enabled, the memory starts with zero internal values.
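The memory initialization sequence described above can be illustrated with a short behavioral sketch in Python. This is not the Verilog design: it uses floating-point arithmetic rather than the 10-b data path, it assumes each stimulation cycle has already been cut into an equal-length list of samples aligned to the stimulus timing signal, and it assumes the updated template is the one subtracted from the input. The function name `iir_sar` and its parameters are hypothetical.

```python
def iir_sar(cycles, K=1 / 16, init_memory=True):
    """Behavioral sketch of IIR template subtraction.

    cycles: list of equal-length sample lists, one per stimulation
            cycle (assumed pre-aligned to the stimulus timing signal).
    Returns (outputs, template): one output list per cycle and the
    final content of the template memory.
    """
    length = len(cycles[0])
    template = [0.0] * length  # memory starts at zero if not initialized
    outputs = []
    for idx, cycle in enumerate(cycles):
        if init_memory and idx == 0:
            # First recorded artifact is written directly into memory.
            template = [float(x) for x in cycle]
        else:
            # IIR update: y_n = y_{n-1} + K * (x_n - y_{n-1}),
            # equivalent to (1 - K) * y_{n-1} + K * x_n.
            template = [y + K * (x - y) for x, y in zip(cycle, template)]
        # Assumption: the freshly updated template is subtracted.
        outputs.append([x - y for x, y in zip(cycle, template)])
    return outputs, template


if __name__ == "__main__":
    artifact = [0.0, 100.0, 50.0, 10.0]
    cycles = [list(artifact) for _ in range(3)]
    with_init, _ = iir_sar(cycles, init_memory=True)
    without_init, _ = iir_sar(cycles, init_memory=False)
    print(with_init[1])     # residual after initialization
    print(without_init[0])  # residual when memory starts at zero
```

With identical artifacts in every cycle, the initialized version leaves no residual from the second cycle onward, whereas the uninitialized version starts with most of the artifact still present and only converges gradually. Because K is a power of two, the multiplication by K corresponds to a bit shift in hardware, which is why the sketch keeps the update in the `y + K * (x - y)` form.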
IV. SIMULATION RESULTS
The IIR architecture (K = 1/16) with a 1.5-V power supply was simulated in AMS 0.35 µm standard CMOS technology with Verilog simulation using NCLaunch tools in Cadence. The first set of pre-recorded neural data (sampled at 2 kHz and obtained during 0.5-Hz stimulation) was used to evaluate system functionality. There was very little variation in the shape and dynamic range of the stimulus artifacts over time in this dataset. The top plot in Fig. 5 depicts a 125-s portion of the neural data that contains many large stimulus artifacts occurring at 0.5 Hz. Bursts of extracellular neural activity are also present in the data. The middle plot shows the output of the SAR system without memory initialization, whereas the bottom plot depicts the same output when the memory is initialized with the first recorded stimulus artifact. Clearly, large stimulus artifacts are removed from the recorded data, recovering short bursts of neural activity that occur in between. Further, memory initialization speeds up system operation, leading to artifact removal even at the start of the algorithm, whereas it takes more than a minute to reach steady state without memory initialization. This is important in low-frequency and short-duration stimulation applications in which only a limited number of stimulus artifacts are available for processing. Fig. 6 shows a 120-ms expanded view of waveforms in this simulation. Specifically, the top plot depicts a large (400-µVpp) stimulus artifact with a small neural action potential riding on its decaying tail. The second plot shows the estimated stimulus artifact template signal to be subtracted from the contaminated neural data. The third plot depicts the IIR system output after template subtraction. Clearly, the stimulus artifact is greatly reduced and the neural spike is recovered from its decaying tail.
However, due to the low sampling rate (2 kHz) and the synchronization mismatch between stimulation and sampling, a relatively large residual spike still exists at the rising edge of the artifact. To remove these residual artifacts, the output was blanked for 4 ms around the rising and falling edges of the artifact using the built-in blanking signal synchronized to the stimulation occurrence [3]. The bottom plot in Fig. 6 shows the IIR system output after blanking.
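The output blanking step can be sketched in Python as follows. The sketch makes several assumptions that are not stated in the text: the blanking window is centered on each edge, blanked samples are forced to zero, and the edge positions are supplied as sample indices derived from the stimulation timing signal. The function name `blank_edges` and its parameters are illustrative.

```python
def blank_edges(samples, fs_hz, edge_indices, blank_ms):
    """Zero the output in a window centered on each artifact edge.

    samples:      output samples after template subtraction
    fs_hz:        sampling rate in Hz
    edge_indices: sample indices of rising/falling artifact edges
                  (assumed known from the stimulation timing signal)
    blank_ms:     total blanking duration per edge in milliseconds
    """
    half = int(round(blank_ms * 1e-3 * fs_hz / 2))  # half-window in samples
    out = list(samples)
    for edge in edge_indices:
        lo = max(0, edge - half)
        hi = min(len(out), edge + half)
        for i in range(lo, hi):
            out[i] = 0  # assumption: blanked samples are set to zero
    return out


if __name__ == "__main__":
    residual = [1] * 20
    # 4 ms at 2 kHz corresponds to 8 samples around the edge at index 10.
    print(blank_edges(residual, fs_hz=2000, edge_indices=[10], blank_ms=4))
```

At the 2-kHz sampling rate of the first dataset, a 4-ms window spans only eight samples, so blanking discards little of the recording while removing the residual spike at the edge.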
Another simulation was performed using the second set of pre-recorded neural data sampled at 35 kHz and obtained during 2-Hz stimulation. As can be seen in Fig. 7, there was some variation in the shape of successive artifacts in this dataset, particularly during their initial portion. The top plot in Fig. 8 depicts a 22-s portion of the neural data, while the bottom plot shows the IIR system output after blanking for 0.45 ms. Clearly, the majority of the artifacts are removed from the neural data, but there are still a few left that cannot be removed with the built-in blanking signal. This is attributed to variation in the shape of artifacts during the initial 3-ms portion. The total system power consumption was estimated to be only a few microwatts in both simulations.

Fig. 6 (partial caption). Second plot depicts the stimulus artifact template signal estimated by the IIR system. Third plot shows the system output with K = 1/16 in which the stimulus artifact is rejected and the neural spike is recovered. Bottom plot shows the output signal after 4-ms blanking (arrows) at the rising and falling edges of the artifact to remove the residual artifact.
V. CONCLUSION
A template subtraction technique based upon IIR temporal filtering is implemented in VLSI for real-time stimulus artifact rejection. IIR architectures can be implemented with reduced power consumption and die area requirements and can have significantly enhanced operation speed with memory initialization. Using two sets of neural data pre-recorded from a sea slug, it is shown that the proposed architecture can eliminate large stimulus artifacts from the recording in real time and recover extracellular neural activity randomly present in between or on top of stimulus artifacts. The total system power consumption is estimated to be only a few microwatts from a 1.5-V supply.

Fig. 7. A total of 111 stimulus artifacts from the second neural dataset aligned in time using the stimulus timing signal and superimposed. The artifacts have amplitude of more than 1 mVpp and last several milliseconds. There is some variation in the shape of successive artifacts during the initial 3-ms portion, but very little jitter in the time instance of their occurrence. The arrow points to the onset of stimulus artifacts, and the thick red line depicts the average artifact waveform.
