Abstract-A low-power VLSI processor architecture that computes in real time the magnitude, phase and phase synchronization of two input signals is presented. The processor is part of an envisioned closed-loop implantable or wearable microsystem for adaptive neural stimulation. The architecture uses three CORDIC processing cores that require shift-and-add operations but no multiplication. The 10-bit processor synthesized in a standard 1.2V 0.13μm CMOS technology utilizes 41,000 logic gates. For 64 input channels, it dissipates 1.1μW per input, and provides 1kS/s per-channel throughput when clocked at 1.41MHz. The power scales linearly with the number of input channels or the sampling rate.
I. INTRODUCTION
At least one percent of people worldwide suffer from epilepsy. Approximately one-third of those with epilepsy do not respond well to currently available treatments such as antiepileptic drugs [1]. Electrical stimulation has shown promising results in reducing the frequency of seizures in patients [1], [2], [3]. Typically, the stimulation pulses are applied continuously, which can result in suboptimal treatment efficacy, shorten the battery life, increase the size of the device and increase the cost of the therapy as more surgical operations are required for replacement [3]. Adding seizure detection or prediction capabilities to an implantable system to yield a closed-loop stimulator can help address these issues [4]. Extensive research has been conducted on predicting and detecting seizures before the seizure onset [5], [6].
Neurons initiate electrical oscillations that are contained in multiple frequency bands such as alpha, beta and gamma (40-80Hz) and have been linked to a wide range of cognitive and perceptual processes [7]. It has been shown that during a seizure the amount of synchrony between these oscillations from neurons located in different regions of the brain changes significantly [6]. Thus, the amount of synchrony between different neural signals is a strong indicator in predicting or detecting seizures [6], [8]. To quantify this level of synchrony between two neural signals, a phase locking value (PLV) can be computed that measures the phase synchronization between two signal sites in the brain [6], [9]. These signals can be monitored by means of electroencephalography (EEG), electrocorticography (ECoG) or multielectrode array (MEA) neural recording.
Existing VLSI systems that perform signal processing on neural signals typically employ univariate algorithms. This generally involves one or more computations on individual inputs, such as computing the spectrum estimate, spike threshold, correlation integral or autoregressive parameters [10], [11]. More advanced bivariate algorithms, involving processing two neural signals, such as phase synchronization, have been demonstrated for seizure prediction and detection [6], [8], [9] and for brain-machine interfaces (BMI) [12], [13], but only in software.
For low-power VLSI implementations, the CORDIC (COordinate Rotation DIgital Computer) algorithm is well suited to computing the phase locking value. It offers a hardware-efficient approach to computing trigonometric and vector functions, as it requires only shift-and-add operations for vector rotations. The CORDIC algorithm has been demonstrated in a large number of applications, such as matrix computations (QRD and eigenvalue estimation), image processing (DCT) and digital communications (FFT, DDS) [14].
We present a low-power digital VLSI processor architecture that performs the computationally intensive PLV estimation. It is to be integrated with multi-channel neural recording and stimulation circuits [15] to implement an implantable closed-loop microsystem as shown in Figure 1. The PLV processor combines three CORDIC processor cores, which operate on vectors to compute both the magnitude and phase of one signal and the phase synchronization between two signals. The rest of the paper is organized as follows. Section II discusses the phase synchronization algorithm. Section III presents the VLSI architecture of the processor. Section IV describes the VLSI implementation. Section V contains simulation results of the synthesized processor.
II. ALGORITHM
Quantifying the amount of phase locking between two neural signals requires a series of computations to find the phase difference, followed by the computation of a phase locking index. First, the Hilbert transform is applied to both signals $s_0$ and $s_1$
$$s_0(t) \rightarrow x_0(t) + j\,y_0(t), \qquad s_1(t) \rightarrow x_1(t) + j\,y_1(t)$$
where $x_0$ and $x_1$ are the real components and $y_0$ and $y_1$ are the imaginary components of the input signal, extracted by the Hilbert transform. The Hilbert transform is conventionally performed over the full band of frequencies in the neural spectrum, and thus a bandpass filter should be applied before the Hilbert transform to isolate the signal band of interest [9]. Next, the instantaneous phases are computed for each channel
$$\phi_0(t) = \arctan\frac{y_0(t)}{x_0(t)}, \qquad \phi_1(t) = \arctan\frac{y_1(t)}{x_1(t)}$$
and if phase synchronization exists between the two channels then the difference of the phase is equal to a constant
$$n\,\phi_0(t) - m\,\phi_1(t) = \mathrm{const}$$
where $n$ and $m$ are integers. Numerous statistical tools exist that quantify the level of phase synchronization between two signals, such as the entropy index, the mutual information index and the mean phase coherence [6]. The hardware-efficient mean phase coherence in [6] was selected, which uses a numerical value between 0 and 1 to evaluate the amount of phase synchronization. The algorithm defines PLV as
$$\mathrm{PLV} = \sqrt{\left(\frac{1}{N}\sum_{k=0}^{N-1}\sin\Delta\phi(t_k)\right)^{2} + \left(\frac{1}{N}\sum_{k=0}^{N-1}\cos\Delta\phi(t_k)\right)^{2}}$$
where $\Delta\phi$ is the phase difference between the two channels and $N$ is the number of samples averaged.
978-1-4244-7270-3/10/$26.00 ©2010 IEEE
In summary, the PLV computation requires the Hilbert transform, arctan, addition, sine and cosine, moving-average filtering and lastly, the PLV magnitude. The arctan, sine/cosine and magnitude operators will be computed using the CORDIC algorithm while the moving-average filtering will be computed using digital FIR filtering.
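The chain of operations summarized above can be sketched in floating point as a reference model. This is a minimal illustration, not the hardware datapath: the function names, the ideal quadrature tone that stands in for the bandpass and Hilbert stages, and the one-second averaging window are all assumptions made here for clarity.

```python
import math

def quadrature_tone(freq_hz, fs_hz, n, phase0=0.0):
    """Ideal real/imaginary pair; an assumed stand-in for bandpass + Hilbert."""
    w = 2.0 * math.pi * freq_hz / fs_hz
    real = [math.cos(w * k + phase0) for k in range(n)]
    imag = [math.sin(w * k + phase0) for k in range(n)]
    return real, imag

def instantaneous_phase(real, imag):
    """Four-quadrant arctan of imaginary over real, one value per sample."""
    return [math.atan2(q, i) for i, q in zip(real, imag)]

def plv(phase_a, phase_b):
    """Mean phase coherence: magnitude of the averaged sine and cosine."""
    n = len(phase_a)
    avg_sin = sum(math.sin(a - b) for a, b in zip(phase_a, phase_b)) / n
    avg_cos = sum(math.cos(a - b) for a, b in zip(phase_a, phase_b)) / n
    return math.hypot(avg_sin, avg_cos)

FS = 1000.0   # 1 kS/s, as in the paper
N = 1000      # assumed averaging window of one second

ph10 = instantaneous_phase(*quadrature_tone(10.0, FS, N))
ph10_shifted = instantaneous_phase(*quadrature_tone(10.0, FS, N, phase0=1.0))
ph12 = instantaneous_phase(*quadrature_tone(12.0, FS, N))

plv_locked = plv(ph10, ph10_shifted)   # constant phase offset
plv_unlocked = plv(ph10, ph12)         # 2 Hz offset over a whole second
```

With a constant phase offset the averaged sine and cosine keep unit magnitude, so `plv_locked` is close to 1. With a 2Hz offset averaged over exactly one second, the phase difference sweeps two full turns and the averages cancel, so `plv_unlocked` is close to 0.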
III. VLSI ARCHITECTURE
The architecture of the feedforward path of the system in Figure 1 is presented in Figure 2. After low-noise amplification of the neural signals, a high-Q bandpass filter extracts the signal in the frequency band of interest, which is then digitized by a low-power, medium-resolution analog-to-digital converter (ADC). Next, both digitized signals are transferred to two sets of 10-bit finite impulse response (FIR) filters. One FIR filter is configured to perform the Hilbert transform to shift the signal by 90 degrees, while the other FIR filter is an all-pass filter that ensures the digital delays of the two FIR filters are matched. Low-power FIR filtering can be efficiently performed in the mixed-signal VLSI domain [16].
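The pairing of a Hilbert FIR filter with a delay-matched all-pass FIR filter can be illustrated with a small floating-point sketch. The tap count (31), the Hamming window and the test tone at one quarter of the sampling rate are assumptions chosen to keep the example short; they are not the filter specifications of the processor, and a narrow low-frequency band would need a considerably longer filter.

```python
import math

def hilbert_taps(half_len):
    """Windowed ideal Hilbert response: 2/(pi*k) for odd k, 0 for even k."""
    taps = []
    for k in range(-half_len, half_len + 1):
        ideal = 2.0 / (math.pi * k) if k % 2 else 0.0
        window = 0.54 + 0.46 * math.cos(math.pi * k / half_len)  # Hamming
        taps.append(ideal * window)
    return taps

def delay_taps(half_len):
    """All-pass path: a single unit tap at the centre, delaying by half_len."""
    return [1.0 if k == half_len else 0.0 for k in range(2 * half_len + 1)]

def fir(taps, signal):
    """Direct-form FIR convolution with zero initial state."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for j, h in enumerate(taps):
            if n - j >= 0:
                acc += h * signal[n - j]
        out.append(acc)
    return out

HALF_LEN = 15                      # assumed: 31 taps in each path
OMEGA = math.pi / 2.0              # assumed test tone at fs/4
tone = [math.cos(OMEGA * k) for k in range(200)]

real_part = fir(delay_taps(HALF_LEN), tone)    # delayed copy of the input
imag_part = fir(hilbert_taps(HALF_LEN), tone)  # 90-degree shifted copy
```

Because both paths share the same group delay of `HALF_LEN` samples, `real_part[n]` and `imag_part[n]` refer to the same time instant and can be fed directly to a magnitude and phase computation once the filters have filled.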
Next, the phase locking value is computed. The CORDIC algorithm is used, which rotates a vector through a sequence of micro-rotations whose scaling factors are powers of two. This removes the need for complex multipliers and uses only adders, shifters and memory retrieval operations [14]. Using an iterative approach, CORDIC provides a high-accuracy, low-power and low-area computational algorithm at the cost of reduced speed. Two modes were implemented in CORDIC: rotational mode, which is used for computing sine and cosine, and vectoring mode, which is used to compute magnitude and phase. The two modes differ only in the directions of rotation [14].
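The two CORDIC modes can be sketched with integer shift-and-add arithmetic as follows. This is an illustrative model under stated assumptions, not the synthesized core: the fixed-point scale of 16 fractional bits, the use of unbounded Python integers, and all names are choices made here. Note that only the sign test that selects the rotation direction differs between the two functions.

```python
import math

ITERATIONS = 16          # one micro-rotation per CORDIC clock cycle
SCALE = 1 << 16          # assumed fixed-point scale: 16 fractional bits

# Lookup table of atan(2^-i): the only stored constants CORDIC needs.
ATAN_TABLE = [round(math.atan(2.0 ** -i) * SCALE) for i in range(ITERATIONS)]

# Every micro-rotation stretches the vector; the total gain is about 1.6468.
GAIN = 1.0
for i in range(ITERATIONS):
    GAIN *= math.sqrt(1.0 + 4.0 ** -i)
INV_GAIN = round(SCALE / GAIN)

def cordic_vectoring(x, y):
    """Rotate (x, y) onto the x-axis; x must be >= 0.
    Returns (GAIN * magnitude, angle), with angle in units of 1/SCALE rad."""
    z = 0
    for i in range(ITERATIONS):
        if y >= 0:   # direction chosen from the sign of y
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_TABLE[i]
        else:
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_TABLE[i]
    return x, z

def cordic_rotation(angle):
    """Rotate a pre-scaled unit vector by angle (SCALE * radians, |angle| <= pi/2).
    Returns (cos, sin), each scaled by SCALE."""
    x, y, z = INV_GAIN, 0, angle
    for i in range(ITERATIONS):
        if z >= 0:   # direction chosen from the sign of the residual angle
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_TABLE[i]
        else:
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_TABLE[i]
    return x, y

scaled_magnitude, phase = cordic_vectoring(3 * SCALE, 4 * SCALE)
cos_half, sin_half = cordic_rotation(round(0.5 * SCALE))
```

Dividing `scaled_magnitude` by `GAIN * SCALE` recovers a magnitude close to 5 for the (3, 4) input, and `phase / SCALE` is close to `atan2(4, 3)`. In the rotation case the gain is compensated up front by starting from `INV_GAIN` instead of 1.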
The architecture of the 10-bit phase synchronization and magnitude processor is shown in Figure 3 . It uses three pipelined CORDIC cores and two moving-average FIR filters. The pipelined architecture allows the supply voltage to be lowered to minimize power dissipation by using a lower frequency clock while maintaining a constant throughput. The first core receives the two digitized vectored signals, preprocesses them by extracting the quadrant of the angle and then simultaneously computes both the angle between 0 and 90 degrees and the magnitude using a 16-bit CORDIC core configured in the vectoring mode. The angles are readjusted using the stored quadrant information to output an angle between 0 and 360 degrees. The difference between the two computed angles is transferred to the next stage.
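The quadrant pre-processing and angle readjustment around the first core can be sketched as below. The function names are hypothetical, and `math.atan2` on the folded vector is used only as a stand-in for the first-quadrant vectoring core; the modulo-360 wrap of the phase difference is likewise an assumption about how the subtraction is kept in range.

```python
import math

def fold_to_first_quadrant(x, y):
    """Record the quadrant (1..4) and return the vector folded into quadrant 1."""
    if y >= 0:
        quadrant = 1 if x >= 0 else 2
    else:
        quadrant = 3 if x < 0 else 4
    return abs(x), abs(y), quadrant

def unfold_angle(theta_deg, quadrant):
    """Map a 0..90 degree angle back to 0..360 using the stored quadrant."""
    if quadrant == 1:
        return theta_deg
    if quadrant == 2:
        return 180.0 - theta_deg
    if quadrant == 3:
        return 180.0 + theta_deg
    return (360.0 - theta_deg) % 360.0

def full_angle_deg(x, y):
    """Pre-process, compute the first-quadrant angle, then post-process."""
    ax, ay, quadrant = fold_to_first_quadrant(x, y)
    theta = math.degrees(math.atan2(ay, ax))  # stand-in for the CORDIC core
    return unfold_angle(theta, quadrant)

def phase_difference_deg(angle_a, angle_b):
    """Difference of two 0..360 angles, wrapped back into 0..360."""
    return (angle_a - angle_b) % 360.0

angles = [full_angle_deg(x, y) for x, y in [(1, 1), (-1, 1), (-1, -1), (1, -1)]]
```

Folding first means the iterative core only ever sees non-negative inputs, which keeps it inside its convergence range.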
The sine and cosine of the angle difference are computed using a 16-bit CORDIC core configured in the rotational mode. The computed sine and cosine as well as the negative flags are transferred to the two 32-tap moving-average FIR filters. Higher sensitivity for the PLV algorithm can be achieved by increasing the length of the FIR filters at a significant cost in area and complexity. Lastly, the PLV is computed by extracting the magnitude of the FIR-averaged sine and cosine outputs using a 16-bit CORDIC core configured in the vectoring mode. An output multiplexer can be configured to output the instantaneous magnitude and phase of each channel, as well as the phase difference and the PLV between channels. Each CORDIC core requires 18 clock cycles: one clock cycle for pre-processing the angles, 16 clock cycles to perform the CORDIC algorithm and one clock cycle to output the data and post-process the angles.
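The back end described above, two moving-average filters followed by a magnitude computation, can be modeled as a streaming sketch. The class and function names are hypothetical, floating point replaces the fixed-point datapath, and `math.sin`, `math.cos` and `math.hypot` stand in for the second and third CORDIC cores.

```python
import math
from collections import deque

class MovingAverage:
    """Running-sum moving average over a fixed number of taps."""

    def __init__(self, taps=32):
        self.taps = taps
        self.window = deque([0.0] * taps, maxlen=taps)
        self.total = 0.0

    def update(self, sample):
        # Add the newest sample and drop the oldest from the running sum.
        self.total += sample - self.window[0]
        self.window.append(sample)   # maxlen discards the oldest entry
        return self.total / self.taps

def plv_stream(phase_differences, taps=32):
    """One PLV value per input phase difference (radians)."""
    avg_sin, avg_cos = MovingAverage(taps), MovingAverage(taps)
    plv_values = []
    for delta in phase_differences:
        s = avg_sin.update(math.sin(delta))   # stand-in for rotation-mode core
        c = avg_cos.update(math.cos(delta))
        plv_values.append(math.hypot(s, c))   # stand-in for vectoring-mode core
    return plv_values

constant_offset = plv_stream([0.7] * 64)
```

For a constant phase difference the output ramps up while the 32-tap windows fill and then settles near 1. With a power-of-two tap count, the division by `taps` reduces to a shift in hardware.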
IV. VLSI IMPLEMENTATION
The processor was designed and synthesized using a standard 8-metal 0.13μm CMOS technology. The layout of the synthesized core is shown in Figure 4. It contains a total of 41,366 gates and occupies an area of 0.178mm². The first magnitude/phase CORDIC core occupies 20.6% of the area, the second sine/cosine CORDIC core uses 12.8%, the FIR moving-average filters occupy 57%, the third magnitude CORDIC core utilizes 9% and pre-processing and the output MUX occupy 1% of the total core area. Accuracy and sensitivity of the PLV computation can be traded for area by reducing the length of the moving-average FIR filters.
Power dissipation from a 1.2V supply required to operate the processor at 1kS/s for each of the 64 multiplexed inputs is 70.4μW, or 1.1μW per channel. Increasing the clock frequency to allow processing at 7kS/s for 64 inputs increases the power dissipation to 0.5mW, or 7.8μW per channel. The univariate magnitude and phase operations and the bivariate phase difference and PLV operations are all computed simultaneously every sample and are time-multiplexed through a 10-bit output. The synthesized design can have a maximum clock frequency above 100MHz, which is beyond the requirements of the intended application. This margin allows power dissipation to be reduced further by lowering the supply voltage.
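The reported power figures can be checked against the linear scaling stated in the abstract with a few lines of arithmetic. The helper name and the strictly proportional model are assumptions; the reference point of 70.4μW at 1kS/s for 64 channels is taken from the text.

```python
REF_CHANNELS = 64
REF_RATE_KSPS = 1.0
REF_POWER_UW = 70.4   # total power at 1 kS/s per channel, 64 channels

def scaled_power_uw(rate_ksps, channels=REF_CHANNELS):
    """Total power under an assumed strictly linear scaling model."""
    return REF_POWER_UW * (rate_ksps / REF_RATE_KSPS) * (channels / REF_CHANNELS)

per_channel_1k = scaled_power_uw(1.0) / REF_CHANNELS   # about 1.1 uW
total_7k = scaled_power_uw(7.0)                        # about 492.8 uW
per_channel_7k = total_7k / REF_CHANNELS               # about 7.7 uW
```

Scaling the 1kS/s figure by seven gives roughly 493μW, which agrees with the reported 0.5mW to one significant digit; the reported 7.8μW per channel corresponds to dividing the rounded 0.5mW by 64.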
V. SIMULATION RESULTS
The phase synchronization processor was simulated at the RTL level, together with the FIR filters required to implement the Hilbert transform, using Verilog-AMS. Two filtered analog signals were digitized and each sent to the two FIR filters to obtain the Hilbert transform and its delayed version, as shown in Figure 5. The computed phase, ranging from 0 to 2π, as it propagates in time is shown in Figure 5(b). A maximum error of 0.0003 radians of deviation from the ideal case was observed.
The simulated magnitude is shown in Figure 6, when a sinusoid with an amplitude between 0V and 0.6V is applied to the processor. The maximum error is below 5% when the input is between 50mV and 600mV. For an amplifier gain of 2,000V/V, this results in an input-referred accuracy better than 5% when the neural signal is between 25μV and 300μV.
The simulated PLV between a pair of channels is shown in Figure 7. The average PLV between the two channels is computed with one input held constant at 10Hz, while the other input is swept from 4Hz to 16Hz. This represents the 8-12Hz α-band. As expected, the computed PLV is at unity when the two signals have the same frequency. When one signal has its frequency set to 8Hz or 12Hz (the boundaries of the α-band), the PLV drops to 0.3. The sensitivity can be further improved by increasing the length of the moving-average FIR filters.
Figure 8 shows the simulated computation of the instantaneous magnitude on two signals and the computation of PLV between the two signals. The inputs can be seen in Figure 8(a). When the two sinusoids have different frequencies (12Hz vs. 10Hz at t<1s), the PLV is below 1. When the frequencies become the same (10Hz at t>1s), the PLV between the two signals is equal to 1, as shown in Figure 8(c). A summary of the simulation and implementation results of the phase synchronization processor is given in Table I.
VI. CONCLUSIONS
A compact, low-power signal processing VLSI architecture has been implemented to compute the PLV on multiple neural inputs and the instantaneous magnitude on individual neural inputs. The implemented processor is used in conjunction with a neural recording front-end to operate in real-time on frequency bands in the neural spectrum and to assist in a closed-loop neural stimulation treatment of epilepsy.
The overall area of the processor is 0.178mm² and it dissipates 1.1μW per channel when computing the magnitude, phase and quantifying the phase synchronization at 1kS/s for 64 neural signal inputs from a 1.2V supply.
