This work presents an area- and power-efficient encoding system for wireless implantable devices capable of monitoring the electrical activity of the brain. Such devices are becoming an important tool for understanding, monitoring in real time, and potentially treating mental diseases such as epilepsy and depression. Recent advances in compressive sensing (CS) have shown great potential for sub-Nyquist sampling of neuronal signals. However, CS implementations still face critical issues in delivering sufficient performance and in hardware complexity. In this work, we explore the trade-offs between area and power requirements by applying a novel DCT Learning-Based Compressive Subsampling approach to a human iEEG dataset. The proposed method achieves compression rates of up to 64×, improving the reconstruction performance and reducing the wireless transmission costs with respect to the recent state of the art. This new fully digital architecture handles the data compression of each individual neural acquisition channel with an area of 490 µm × 650 µm in 0.18 µm CMOS technology and a power dissipation of only 2 µW.
INTRODUCTION
Wireless implantable devices capable of monitoring the electrical activity of the brain are becoming an important tool to close the gap between current bulky medical solutions and wearable devices for the treatment of some widespread mental diseases. While such devices exist, it is still necessary to address several challenges to make them more practical in terms of area and power dissipation. Considering multiple site or high frequency oscillation (HFO) recordings, the payload of data telemetry from the implant to the external base station grows enormously. The typical circuit blocks used in a multi-channel sensor for medical monitoring are shown in Figure 1. In a wireless monitoring system, the power consumed by the RF transmitter is usually one order of magnitude more than any other system on the chip [1]. Thus, data compression becomes crucial for reducing the power consumption of data telemetry without losing critical information. To this end, compressive sensing (CS) [2, 3] has been exploited in many recent approaches (e.g. [1, 4, 5] and references therein). In a nutshell, CS consists in taking fewer linear samples than dictated by the Shannon-Nyquist theorem, while still allowing robust off-line signal reconstruction. This is possible by exploiting the fact that the information content of a signal is often much lower than its raw data content.
In this work, we present a fully digital encoder for neuronal signals based on a DCT Learning-Based Compressive Subsampling approach (DCT-LBCS), which improves the reconstruction performance and reduces the data telemetry costs by 50% compared to [6], while trading off the area and power requirements of the DSP.
The paper is organized as follows. We introduce the main concepts of Compressive Sensing and Learning Based Compressive Subsampling in Section 2. In Section 3, we describe the digital architecture tailored for LBCS. Numerical experiments are reported in Section 4, while in Section 5 we analyse and describe our circuit design. Conclusions are drawn in Section 6.
COMPRESSION ALGORITHMS
The main tenet of CS states that a signal $x \in \mathbb{R}^N$ with $K$ non-zero coefficients can be robustly recovered from $M$ linear measurements
$$y = Ax + w, \qquad (1)$$
where $y \in \mathbb{R}^M$ is a compressed version of $x$, $A \in \mathbb{R}^{M \times N}$ is a linear operator that either satisfies the Restricted Isometry Property (RIP) or is incoherent [7], and $w$ accounts for measurement noise. If we are able to sample $y$ directly, we save on both storage and communication power. Recovering $x$, though, usually requires solving a non-linear optimization problem. Theoretically, i.i.d. sub-Gaussian matrices are incoherent and also satisfy the RIP. Furthermore, they are universal: given an orthonormal basis $\Phi$ that allows for a sparser representation of a signal $x$, the RIP or the incoherence of $A\Phi$ is the same as that of the original $A$ [7]. However, sub-Gaussian matrices are prohibitively expensive to use in practice, since they require $O(MN)$ space and time.
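As a rough illustration of the measurement model above, the following Python sketch draws a random ±1 (Bernoulli) sensing matrix and compresses a K-sparse toy signal. The sizes, the noise level and all function names are hypothetical choices made for illustration and do not come from the cited works; recovery, which needs a non-linear solver, is deliberately left out.

```python
import random

def bernoulli_matrix(m, n, rng):
    """Random M x N sensing matrix with i.i.d. +/-1 entries (sub-Gaussian)."""
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(m)]

def sparse_signal(n, k, rng):
    """Length-N signal with exactly K non-zero coefficients."""
    x = [0.0] * n
    for idx in rng.sample(range(n), k):
        x[idx] = rng.gauss(0.0, 1.0)
    return x

def measure(a, x, noise_std, rng):
    """Compressed measurements y = A x + w."""
    return [sum(aij * xj for aij, xj in zip(row, x)) + rng.gauss(0.0, noise_std)
            for row in a]

rng = random.Random(0)
N, M, K = 256, 32, 8                 # toy sizes, chosen only for illustration
A = bernoulli_matrix(M, N, rng)
x = sparse_signal(N, K, rng)
y = measure(A, x, noise_std=0.01, rng=rng)

stored_entries = M * N               # every entry of A must be stored or regenerated
print(len(y), stored_entries)
```

The `stored_entries` count makes the O(MN) cost concrete: even this small example needs 8192 matrix entries on the encoder side.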
Bernoulli (BERN) sampling [1], Multi-Channel Sampling (MCS) [5] and Structured Hadamard Sampling (SHS) [8] are randomized sampling approaches recently proposed for the compression of neural signals. These three architectures are very efficient on the sampling side, but require solving non-linear optimization problems to reconstruct the original signals.
Natural signals are often characterized by sparse and structured representations in time-frequency (or space-frequency) domains, such as those provided by wavelets [9]. As described in [10] and references therein, the number of samples required for stable recovery can be reduced by exploiting additional structure in the signal x during the recovery process, such as interdependencies between its non-zero coefficients or constraints on its support.
As discussed in [8], the Hierarchical Group Lasso (HGL) approach gives the best performance among three different structured-sparsity recovery methods. This approach is used here to reconstruct, and then compare, the iEEG signals sampled with the BERN, MCS and SHS methods.
Learning-Based Compressive Subsampling
The LBCS method [11] consists of linear encoding and linear decoding with respect to a given orthonormal basis, resulting in a much simpler and faster solution than standard CS. LBCS can be summarized as follows. Given a signal $x \in \mathbb{R}^N$, we consider the compression model
$$y = P_\Omega \Psi x, \qquad (2)$$
where $\Psi \in \mathbb{R}^{N \times N}$ is an orthonormal basis and $P_\Omega \in \mathbb{R}^{M \times N}$ is a subsampling matrix whose rows are canonical basis vectors. The effect of applying $P_\Omega$ to $\Psi x$ is to retain only the coefficients indexed by the set $\Omega$, also known as the subsampling map. The vector $y \in \mathbb{R}^M$ is the compressed version of $x$, with a nominal compression rate (CR) of $N/M$.
The signal $x$ is then approximately recovered via the fast linear decoder
$$\hat{x} = \Psi^T P_\Omega^T y. \qquad (3)$$
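Given a training set $D = \{x_1, \ldots, x_m\}$ of $m$ fully sampled signals of unit norm, the optimal subsampling map $\hat{\Omega}$ is learnt by choosing the indices that capture most of the average energy in the transform domain:
$$\hat{\Omega} = \arg\max_{\Omega : |\Omega| = M} \frac{1}{m} \sum_{j=1}^{m} \sum_{i \in \Omega} |\langle \psi_i, x_j \rangle|^2,$$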
where $\psi_i$ is the $i$-th row of $\Psi$. $\hat{\Omega}$ can be found exactly by selecting the $M$ indices whose values of $\frac{1}{m} \sum_{j=1}^{m} |\langle \psi_i, x_j \rangle|^2$ are the largest [11]. The learnt sampling scheme is then used to directly sample only those transform coefficients indexed by $\hat{\Omega}$ for all signals $x$. In [6], LBCS is implemented with the Hadamard transform matrix, which is particularly suited to hardware because each coefficient can be computed with additions and subtractions only. A DCT-based transform matrix yields better reconstruction performance for the same frame size and compression rate. However, each DCT matrix entry needs a higher bit resolution, which increases the hardware complexity. As will be discussed in Section 5, a trade-off can be found that limits the added hardware complexity while improving the reconstruction performance, motivating a DCT-based encoding system (named DCT-LBCS).
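The LBCS pipeline described in this subsection (learn the subsampling map from training data, encode with the selected rows, decode linearly) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy signals are synthetic slow cosines standing in for iEEG windows, the sizes N = 64 and M = 8 are arbitrary, and all function names are hypothetical.

```python
import math
import random

def dct_matrix(n):
    """Orthonormal DCT-II matrix; row i is the basis vector psi_i."""
    rows = []
    for i in range(n):
        scale = math.sqrt((1.0 if i == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * j + 1) * i / (2 * n))
                     for j in range(n)])
    return rows

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def learn_omega(psi, training, m_keep):
    """Indices of the m_keep rows of psi with the largest average energy."""
    energy = [sum(dot(row, x) ** 2 for x in training) / len(training)
              for row in psi]
    ranked = sorted(range(len(psi)), key=lambda i: energy[i], reverse=True)
    return sorted(ranked[:m_keep])

def encode(psi, omega, x):
    """y = P_Omega Psi x: only the M selected coefficients are computed."""
    return [dot(psi[i], x) for i in omega]

def decode(psi, omega, y):
    """x_hat = Psi^T P_Omega^T y: zero-fill, then inverse transform."""
    x_hat = [0.0] * len(psi)
    for i, yi in zip(omega, y):
        for j, pij in enumerate(psi[i]):
            x_hat[j] += pij * yi
    return x_hat

def toy_signal(n, rng):
    """Unit-norm mix of three slow cosines (a stand-in for real iEEG data)."""
    parts = [(f, rng.gauss(0.0, 1.0) / f, rng.uniform(0.0, 2.0 * math.pi))
             for f in (1, 2, 3)]
    x = [sum(a * math.cos(2.0 * math.pi * f * t / n + p) for f, a, p in parts)
         for t in range(n)]
    norm = math.sqrt(dot(x, x))
    return [v / norm for v in x]

rng = random.Random(1)
N, M = 64, 8                                   # toy sizes (CR = N / M = 8)
psi = dct_matrix(N)
omega = learn_omega(psi, [toy_signal(N, rng) for _ in range(50)], M)

x = toy_signal(N, rng)                         # unseen test window
y = encode(psi, omega, x)
x_hat = decode(psi, omega, y)
rel_err = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_hat)))
print(omega, round(rel_err, 3))
```

Because the decoder is an orthogonal projection onto the selected basis vectors, the energy kept in `y` and the squared reconstruction error always add up to the energy of `x`, which is why maximizing the captured energy during training is the natural criterion.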
Full DCT Compression vs DCT-LBCS
The optimal linear encoding would require a full transformation of the signal window x, followed by an adaptive compression that retains only the M largest coefficients of Ψx in absolute value. However, such adaptive encoding requires first computing all the coefficients of Ψx, which is prohibitive in terms of area and power consumption. For this reason, a trade-off must be found that preserves good signal quality after reconstruction while limiting the power and area needs.
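To make the cost difference concrete, the sketch below contrasts an adaptive best-M-term encoder with a fixed-map encoder on one synthetic window, and counts the multiply-accumulate (MAC) operations each needs for N = 256 and CR = 32. The test signal, the fixed map (simply the M lowest frequencies, not a learnt map) and the function names are assumptions made for illustration only.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix; row i is the basis vector psi_i."""
    rows = []
    for i in range(n):
        scale = math.sqrt((1.0 if i == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * j + 1) * i / (2 * n))
                     for j in range(n)])
    return rows

def transform(psi, x, rows):
    """Transform coefficients <psi_i, x> for the requested row indices."""
    return [sum(a * b for a, b in zip(psi[i], x)) for i in rows]

def adaptive_encode(psi, x, m_keep):
    """Best M-term encoder: all N coefficients are needed before choosing."""
    coeffs = transform(psi, x, range(len(psi)))
    top = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept = sorted(top[:m_keep])
    return kept, [coeffs[i] for i in kept]

N, CR = 256, 32
M = N // CR
psi = dct_matrix(N)
x = [math.sin(0.05 * t) + 0.3 * math.sin(0.4 * t) for t in range(N)]

kept, y_adaptive = adaptive_encode(psi, x, M)
omega_fixed = list(range(M))          # hypothetical fixed map, not a learnt one
y_fixed = transform(psi, x, omega_fixed)

energy_adaptive = sum(c * c for c in y_adaptive)
energy_fixed = sum(c * c for c in y_fixed)
adaptive_macs = N * N                 # 65536 MACs per window
fixed_macs = M * N                    # 2048 MACs per window
print(adaptive_macs, fixed_macs, energy_adaptive, energy_fixed)
```

The adaptive encoder always keeps at least as much energy as any fixed map of the same size, but it pays for a full N × N transform per window and would also have to tell the decoder which indices it kept.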
SYSTEM ARCHITECTURE
The one-channel sampling DCT-LBCS architecture proposed in this work is depicted in Figure 2. The embedded sampling and compression of the neural input signal follow the description presented in Section 2.1.
In the following, we fix $\Psi$ to be the DCT matrix. Let $D_\Omega = P_\Omega \Psi$ be the matrix composed of the rows of $\Psi$ indexed by $\Omega$. We sequentially compute $y = D_\Omega x$: looking at each component of $y$, we have
$$y_k = \sum_{j=1}^{N} d_{kj} x_j, \qquad k \in \{1, \ldots, M\},$$
where $d_{kj}$ is the $(k, j)$-entry of $D_\Omega$. The matrix $D_\Omega$ contains real-valued coefficients (positive and negative), which are stored in an SRAM, shown in Figure 2, with $N \times M$ cells of $B_{DCT}$ bits each.
A finite state machine (FSM) drives the sub-sampling procedure of the LBCS encoder. The entries $d_{kj}$ are loaded sequentially into the chip memory through the DCTCoef input. The input signal $x_j$ is the digital output of an A/D converter with a resolution of $B_i$ bits. The sampling procedure starts once the memory is loaded; the operations are carried out by a single multiplier and a single adder, which are used in a time-multiplexed manner to accumulate the $M$ output values into the registers.
At each time step $j$, $x_j$ is multiplied by the DCT entry $d_{kj}$ and added to the $B_o$-bit accumulator value $y_k$, updating each component according to the rule $y_k \leftarrow y_k + d_{kj} x_j$, $k \in \{1, \ldots, M\}$. At the beginning of each window of length $N$, the registers are reset ($y = 0$). The enable signal drives the digital registers so that every accumulator is updated before the next sample $x_j$ arrives. This design choice avoids having one multiplier-adder per accumulator lane, but requires an internal digital clock frequency $f_{encoder} = M \times f_s$, where $f_s$ is the signal sampling frequency.
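The accumulate rule above can be mimicked in software with integer arithmetic. The sketch below is a behavioural model under stated assumptions, not the RTL of the chip: the way DCT entries are scaled to 8-bit signed integers, the toy sizes, the index set and the fake ADC codes are all hypothetical.

```python
import math

def quantized_dct_rows(n, omega, bits):
    """Rows of the DCT-II matrix indexed by omega, scaled to signed integers.

    The scaling (largest possible magnitude maps to full scale) is an
    illustrative assumption, not the quantization used in the paper.
    """
    full_scale = 2 ** (bits - 1) - 1
    peak = math.sqrt(2.0 / n)
    rows = []
    for i in omega:
        scale = math.sqrt((1.0 if i == 0 else 2.0) / n)
        rows.append([round(scale * math.cos(math.pi * (2 * j + 1) * i / (2 * n))
                           / peak * full_scale) for j in range(n)])
    return rows

def encode_window(d_omega, samples):
    """Time-multiplexed MAC: one multiplier/adder serves all M accumulators."""
    m = len(d_omega)
    acc = [0] * m                       # registers reset at the start of a window
    ticks = 0                           # internal clock cycles spent
    for j, x_j in enumerate(samples):   # one new ADC sample per 1/fs
        for k in range(m):              # M updates finish before the next sample
            acc[k] += d_omega[k][j] * x_j
            ticks += 1
    return acc, ticks

N, M, B_DCT = 16, 4, 8                  # toy sizes for readability
omega = [0, 1, 2, 5]                    # hypothetical subsampling map
d_omega = quantized_dct_rows(N, omega, B_DCT)
samples = [((37 * j) % 1024) - 512 for j in range(N)]   # fake 10-bit signed codes

acc, ticks = encode_window(d_omega, samples)
direct = [sum(d * s for d, s in zip(row, samples)) for row in d_omega]
print(acc == direct, ticks)
```

The tick counter shows where the clock requirement comes from: M multiply-accumulate steps per input sample, hence M × N cycles per window.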
The input data sampling frequency for the considered dataset is 5 kHz. As further described in Section 4, choosing a window length N = 256 and a compression rate of 32× gives M = N/CR = 8, so the DCT-LBCS encoder frequency is 5 kHz × 256/32 = 40 kHz, which is still in a relatively low frequency range. If M = N/CR is large, however, the internal clock frequency may become a limiting factor, requiring additional digital blocks for clock synchronization.
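The clock arithmetic above generalizes directly to other compression rates. The helper below is a small sketch; the function name and the list of rates swept are hypothetical, while fs = 5 kHz and N = 256 are the values quoted in the text.

```python
def encoder_clock_hz(fs_hz, window_len, compression_rate):
    """Internal clock needed by a single time-multiplexed MAC: M * fs."""
    m = window_len // compression_rate      # number of retained coefficients
    return m * fs_hz

for cr in (2, 4, 8, 16, 32, 64):
    print(cr, encoder_clock_hz(5_000, 256, cr))
```

At CR = 32 this gives the 40 kHz quoted above, while at CR = 2 the same architecture would need 640 kHz, which illustrates why low compression rates put pressure on the internal clock.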
SIMULATIONS
The iEEG.org portal contains several datasets of EEG and iEEG data that are manually annotated by expert clinicians. We focus on the I001-P034-D01 dataset, which consists of approximately 1 day, 8 hours and 10 minutes of recordings at 5 kHz, or approximately $6 \cdot 10^8$ samples. To reduce the dataset size, we use samples only from the 12th and 13th seizures, and an equal number of samples before the seizure onset, for training and testing respectively. To allow a better comparison with the sampling strategy that combines samples across the channels (MCS), we consider only a sub-grid of 4 × 4 electrodes.
In the following, we use this dataset to compare the numerical results obtained with the DCT-LBCS encoder against the other approaches described in Section 2.
Experimental Protocol and Performance Evaluation
The training portion of the dataset is used to learn the sampling pattern for both the Had- and DCT-LBCS approaches and to tune the variable-density parameters for the SHS method. The fixed sampling pattern is then used by LBCS to compress all the signal windows in the test set, and the reconstruction is performed with the linear decoder (3). For the randomized methods (MCS, BERN and SHS), we draw 20 different sampling patterns from the respective distributions for each signal window in the test set and reconstruct using the tree-based HGL norm, which was shown in [8] to yield the best results.
All the reconstructed windows for each channel $j$ are concatenated together, forming the entire reconstructed signal $\hat{x}_j$ for the test seizure. We then compute the SNR for each channel as
$$\mathrm{SNR}_j = 20 \log_{10} \frac{\|x_j\|_2}{\|x_j - \hat{x}_j\|_2},$$
where $x_j$ is the recorded signal for channel $j$, and average these SNRs to obtain our final measure of performance,
$$\mathrm{SNR} = \frac{1}{\#\mathrm{ch}} \sum_{j=1}^{\#\mathrm{ch}} \mathrm{SNR}_j.$$
For the randomized methods, we also average over the 20 draws.
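The performance measure can be written as two short Python functions. This sketch assumes the usual definition of reconstruction SNR, the ratio between the norm of the recorded signal and the norm of the reconstruction error, expressed in dB; the function names and the two-channel toy data are hypothetical.

```python
import math

def snr_db(x, x_hat):
    """Reconstruction SNR in dB: 20 * log10(||x|| / ||x - x_hat||)."""
    signal = math.sqrt(sum(v * v for v in x))
    error = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_hat)))
    if error == 0.0:
        return math.inf                  # perfect reconstruction
    return 20.0 * math.log10(signal / error)

def mean_snr_db(recorded, reconstructed):
    """Average of the per-channel SNRs (one list of samples per channel)."""
    per_channel = [snr_db(x, x_hat) for x, x_hat in zip(recorded, reconstructed)]
    return sum(per_channel) / len(per_channel)

recorded = [[10.0, 0.0], [0.0, 100.0]]        # two toy channels
reconstructed = [[9.0, 0.0], [0.0, 99.0]]     # errors of norm 1 on each
print(snr_db(recorded[0], reconstructed[0]), mean_snr_db(recorded, reconstructed))
```

Note that the average is taken over the per-channel dB values, as in the text, rather than over the linear ratios.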
Numerical results
The numerical experiments were run with all the methods described in this paper, varying the length of the signal window N, the ADC resolution B_i and the compression rate CR. The LBCS approach is not very sensitive to the window length; for the sake of consistency with the Had-LBCS approach proposed in [6], we present results only for N = 256, an ADC resolution of B_i = 10 bits and a DCT coefficient resolution of B_DCT = 8 bits. Table 1 reports the reconstruction quality, in dB, obtained on the I001-P034-D01 dataset. As expected, adaptive DCT compression sets the upper limit on the achievable performance. Among the non-adaptive methods, DCT-LBCS offers the best reconstruction quality at every compression rate, with an SNR several dB higher than the other methods. Adaptive Hadamard compression yields the second-best performance and sets the upper limit for the Hadamard-based approach. Interestingly, the DCT-LBCS method offers performance comparable to Adaptive Hadamard, even at higher compression rates. In the SHS approach the variable density is adapted to the signals, but it still fails to capture as much structure as LBCS. The BERN and MCS methods perform much worse at high compression rates, because imposing structure only during reconstruction does not fully compensate for the limitations of their structure-unaware sampling mechanisms.
As described in Section 2.2, given a fixed signal window length and compression rate, the best linear encoder adaptively samples the coefficients that capture most of the energy of each signal. The LBCS-based reconstruction performance is close to that obtained with the adaptive encoder, but at a fraction of its power and area cost.
The linear decoder (3) also yields reconstructions at a lower computational cost than the other methods. Indeed, solving a single optimization problem with the HGL norm using DecOpt [12] requires approximately 0.1 s on average, while the linear decoder requires only approximately $10^{-5}$ s for a signal with 256 samples.
CIRCUIT DESIGN AND VALIDATION
The circuit implementation was defined following the experimental results discussed in Section 4 and considering the trade-off between area and power requirements. The target signal reconstruction quality is set to 30 dB. Considering a sampling window length of 256 samples and assuming an ADC resolution of $B_i = 10$ bits, the Had-LBCS method reaches 30 dB with a compression ratio CR = 16. As reported in Table 1, the DCT-LBCS approach still exceeds 30 dB at a compression ratio CR = 32 (an improvement over the Had-LBCS design). Thus, we can reduce the number of bits to transmit, which directly determines the RF data transmission cost. The internal encoder core clock frequency is $f_{encoder} = M \times f_s = 40$ kHz, with the accumulator resolution set to $B_o = B_i + \log_2(N) + 1$ to avoid overflow. We therefore define an effective compression ratio $CR_{eff} = CR \times B_i / B_o$, which takes into account the actual number of bits per accumulator after compression. Table 2 reports the performance of the system and compares it with recently published work. The table summarizes the compression power and area requirements of each method discussed in this paper. It also reports the simulated recovered-signal and transmitter performance, highlighting how the DCT-LBCS approach reduces the RF data telemetry cost while improving the reconstruction performance by almost 3 dB with respect to the best approach presented in [6]. On the other hand, the area requirement is higher because of the increased bit resolution per DCT matrix entry and because of the different CMOS technology node. However, in a multi-channel application the memory content is shared among all the channels, which reduces the impact of the storage area on the overall chip.
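The accumulator width and effective compression ratio defined above are easy to evaluate numerically. The sketch below uses N = 256 and B_i = 10 bits from the text and compares CR = 16 (the Had-LBCS operating point) with CR = 32 (DCT-LBCS); it assumes, for illustration only, that both designs use the same accumulator width B_o. Function names are hypothetical.

```python
import math

def accumulator_bits(b_in, window_len):
    """B_o = B_i + log2(N) + 1, rounded up for window lengths that are not powers of two."""
    return b_in + math.ceil(math.log2(window_len)) + 1

def effective_cr(cr, b_in, b_out):
    """CR_eff = CR * B_i / B_o: raw input bits over transmitted bits."""
    return cr * b_in / b_out

N, B_I = 256, 10
b_o = accumulator_bits(B_I, N)          # 10 + 8 + 1 = 19 bits
raw_bits = N * B_I                      # bits per uncompressed window

for name, cr in (("Had-LBCS", 16), ("DCT-LBCS", 32)):
    m = N // cr                         # accumulators sent per window
    sent_bits = m * b_o
    print(name, m, sent_bits, round(effective_cr(cr, B_I, b_o), 2))
```

Under these assumptions a 2560-bit window shrinks to 152 transmitted bits at CR = 32, against 304 bits at CR = 16, which matches the halving of the telemetry payload claimed for DCT-LBCS relative to [6].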
The architecture shown in Figure 2 has been implemented in a 1P6M 0.18 µm CMOS technology. The layout of the fully digital one-channel encoder is shown in Figure 3. To verify the functionality of the digital encoder, the digitized neuronal data is fed directly to the DCT-LBCS block. A post place-and-route simulation verified that the M outputs of the encoder are equal to the expected values computed in MATLAB. The simulation was run for a worst-case scenario, with the slow-slow process corner operating at 1.8 V, which results in an estimated power consumption for the DCT-LBCS encoder of around 2 µW. The silicon area of the encoder block is 490 µm × 650 µm.
CONCLUSIONS
This work presents an on-the-fly data compression system based on a novel DCT-LBCS approach, which improves the reconstruction performance and reduces the data telemetry costs by 50% compared to [6], while trading off the area and power requirements. The memory that stores the subsampled DCT matrix entries occupies a relatively large area. However, in a multichannel implementation, the memory content is shared among all the channels, reducing the impact of the storage area on the overall chip area. The faster off-line recovery and higher reconstruction quality of DCT-LBCS compared with standard CS make it suitable for any sparse data acquisition system for which fully sampled signals are available for training (e.g., image processing and remote sensing).
