An accurate description of muscular activity plays an important role in clinical diagnosis and rehabilitation research. Electromyography (EMG) is the most widely used technique for obtaining such descriptions. The EMG signal is associated with the electrical changes generated by the activity of the motor neurons. Typically, to decode the muscular activation during different movements, a large number of individual motor neurons are monitored simultaneously, producing large amounts of data to be transferred and processed by the computing devices. In this paper, we follow an alternative approach that can be deployed locally on the sensor side. We propose a neuromorphic implementation of a spiking neural network (SNN) to extract spatio-temporal information of EMG signals locally and classify hand gestures with very low power consumption. We present experimental results on the input data stream using a mixed-signal analog/digital neuromorphic processor. We performed a thorough investigation of the performance of the SNN implemented on the chip by: first, calculating PCA on the activity of the silicon neurons at the input and the hidden layers to show how the network helps in separating the samples of different classes; second, performing classification of the data using state-of-the-art SVM and logistic regression methods and a hardware-friendly spike-based read-out. The traditional algorithms achieved classification rates of 84% and 81%, respectively, and the spiking learning method achieved 74%. The power consumption of the SNN is 0.05 mW, showing the potential of this approach for ultra-low power processing.
detect the electric potential generated by muscle cells when they are electrically or neurologically activated. These signals have traditionally been used for detecting medical abnormalities, activation levels or muscular recruitment orders, and for analyzing the biomechanics of movement. In the context of prosthetic interfaces, the EMG amplitude is the most used feature to decode information for the motor commands to the prosthesis [1] , [2] . In arm amputee patients the EMG is measured in the remaining part of the arm by means of surface electrodes located above the skin (sEMG) and it is processed to allow the control of a Myoelectric Prosthetic Arm.
To achieve robust EMG pattern recognition performance in prosthesis control, it is necessary to employ a high-density sEMG electrode configuration. In this way, it is possible to monitor the electrical activity of a large number of neurons simultaneously. Conventional approaches typically record these signals and transmit them to external devices for off-line processing (e.g. DSP, microprocessor or computer). However, the faithful monitoring of the electrical activity of a large number of muscles results in a large amount of sEMG data, which in turn creates an information bottleneck that poses several challenges for both the transfer and the post-processing of the data. Typically, these conventional systems transmit the data at high speeds to conventional digital computing systems that are bulky, require high computational power and consume large amounts of energy. With this approach, building a compact and low-power wearable solution is quite challenging. The problem is even more demanding considering that the EMG data-processing time used to perform the gesture classification needs to be as short as possible in order to produce timely, low-latency motor commands and avoid delays in the control loop. For this reason, although the performance of modern prosthetic devices has increased conspicuously, the road to versatile wearable EMG devices is still long.
Here, we propose to exploit the low-power and low-latency features of full-custom mixed analog/digital brain-inspired computing Very Large Scale Integration (VLSI) architectures to address the problem of building efficient embedded computing systems optimized for prosthetic control, that in addition exhibit robust and adaptive computing abilities analogous to those of biological systems. Specifically, we present a neuromorphic setup that is event-driven and can thus process EMG signals in parallel with very low latency, in very compact packages, using very low power. Such technology can be miniaturized and customized to process data in real time directly on the sensor side, offering an optimal wearable solution. We describe an EMG signal processing stage based on a hardware implementation of a SNN in a compact ultra-low power neuromorphic chip [3]. Understanding how to encode continuous signals in spike sequences and how to train the network for successful classification is necessary for the creation of a new generation of brain-inspired wearable biomedical devices.
Fig. 1. The overall picture of the system: 3 hand gestures (rock, paper, scissor) were measured using the Myo, converted into spikes and sent to the DYNAP chip. The SNN is composed of 192 neurons, where each neuron receives a randomly weighted combination of the input channels (as shown in the highlighted box at the top of the figure). Each neuron receives the input channels after the spike conversion. Each input is decomposed into UP and DOWN channels, where UP (DOWN) represents the positive (negative) signal derivative. Each UP/DOWN channel is connected to each neuron in the network (color coded) with excitatory (solid line) and inhibitory (dashed line) synapses, respectively. Every channel is weighted differently, as represented by the line thickness in the figure.
We make use of a differential delta method for converting the raw EMG waveforms into spike sequences [4] and analyze the network performance, evaluating its ability to discriminate simple hand gestures.
The SNN employed in this work is a feed-forward neural network in which the synaptic weights of the input layer are fixed and randomly generated, whereas the weights of the output layer are determined by a learning algorithm [5], [6].
In EMG processing and classification, traditional approaches mainly focus on feature extraction that can be then fed into a remote classifier or regression system [7] [8] [9] . More recently, a new approach for classifying EMG signals started to emerge based on the use of SNNs. In [10] EMG signals were recorded using the Myo and encoded into spikes as input for the SNN. The classification could detect the finger movements and this was used to trigger single finger reflexes. In [11] the authors used the NeuCube spiking model, to classify hand gestures. Recently, a neuromorphic implementation of NeuCube was proposed on SpiNNaker platform [12] . In [13] , the authors presented a software SNN used for EMG feature extraction and classification with high accuracy that is trained with the back-propagation learning algorithm. Recently, we investigated how EMG signals can be processed by a dynamic SNN to identify the best time from stimulation onset for reaching the highest accuracy in the classification [14] .
In this paper, we use a multi-core neuromorphic chip with re-configurable routing schemes [3] to implement and configure a SNN for hand gesture discrimination. To validate the network we use sEMG signals that represent 3 hand gestures of the Roshambo game (rock, paper and scissor). The signals are recorded using the commercial Myo armband, which senses electrical activity in the forearm muscles [15]. We encode the sEMG signals into spike events, which serve as the input to the neuromorphic chip, and describe the details of a SNN implemented on such a chip. We show how the signals are processed by the first layer of the network and we propose three classification algorithms: two state-of-the-art methods, logistic regression and the Support Vector Machine (SVM), and a spiking learning method based on the delta rule to train an on-line read-out unit. Such a spiking algorithm can be implemented with analog neuromorphic circuits, and a possible circuit is presented.
II. METHODS
The overall architecture of the spiking neural network system implemented on the neuromorphic chip is shown in Figure 1. The sEMG signals of 3 hand gestures were recorded from the forearm muscles using a Myo armband [15]. The output is converted into spikes and sent to a Dynamic Neuromorphic Asynchronous Processor (DYNAP) [3], which implements the SNN. The read-out phase is trained off-chip using a hardware-friendly algorithm implemented in software and two traditional methods, since the DYNAP chip used in this work does not support online learning. After training, the weights can be uploaded to the chip.
A. EMG Dataset and Spike Conversion
The Myo armband is a wireless wearable device, developed by Thalmic Labs, which enables EMG recordings with limited bandwidth (∼200 Hz). The Myo armband has become popular in the biomedical scientific community and has been applied in the prosthetics field [2], [16]. Despite its simplicity and limited bandwidth, it can be compared to other devices that have been used in similar benchmark studies [17].
The Myo is composed of 8 equally spaced non-invasive sEMG sensors that can be placed approximately around the middle of the forearm. The dataset comprises 10 able-bodied subjects (3 female and 7 male) recorded during the performance of 3 hand gestures: rock, paper and scissor. Each subject performed 3 sessions, in which each hand gesture was recorded 5 times, each lasting 2 s. Between the gestures we introduced a relaxing phase of 1 s during which the muscles could return to the rest position, removing any residual muscular activation. To obtain an input compatible with the SNN, each EMG signal was then converted into a spike train to be sent to and processed by the network. The solution proposed for encoding the sEMG signals into spike trains is the delta-modulator ADC algorithm, based on [4]. Such a converter has been widely applied in biomedical circuits and systems thanks to its unique properties [18], [19]: (i) it has much lower circuit complexity and power consumption than multi-bit ADCs, two important aspects for biomedical applications and for wearable ones in particular; (ii) it has a higher tolerance to bit errors than multi-bit binary systems, which means higher reliability; (iii) an oversampling ADC can achieve a higher resolution than a Nyquist-rate multi-bit ADC. Moreover, its resolution can be changed without modifying the hardware, and it relaxes the requirements on the analog anti-aliasing filter. One of the problems introduced by the delta-modulator encoding scheme is the high sampling rate and larger data size [20]. However, the EMG signal recorded with the Myo has a bandwidth below 200 Hz, which limits this overhead.
The delta-modulator algorithm produces two digital pulse outputs (UP or DOWN) for each input. An UP (DOWN) spike is generated every time a positive (negative) change in the input signal exceeds a specific threshold. Every time a spike is produced, a "refractory period" occurs, during which the algorithm is unresponsive to the input. This parameter can be used to limit the maximum rate of the spike train produced (e.g. to control the network saturation), at the cost of a loss in temporal information. In the proposed work the amplitude of the threshold chosen for the spike conversion is 0.05, and the refractory period was set to very small values (< 1 μs) to investigate how the network encodes the temporal information of the input signals. To increase the time resolution, the signals were over-sampled to a higher frequency before the thresholding phase. We used an interpolation factor of 3500, and since the delta converter is an asynchronous method, this resulted in a minimum distance of 0.3 ms between samples. Figure 2 shows the conversion of 4 EMG signals from electrodes 1, 3, 4 and 6 (the ones with the highest variance) for the proposed three hand gestures, with the respective spike conversions.
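The thresholding step described above can be illustrated with a minimal Python sketch. It is not the encoder of [4]: the function name `delta_encode`, the choice to reset the reference level to the current sample after each spike, and the handling of the refractory window are assumptions made for illustration only.

```python
def delta_encode(signal, times, threshold=0.05, refractory=0.0):
    """Convert a sampled signal into UP and DOWN spike times.

    A spike is emitted whenever the signal has moved by more than
    `threshold` away from the level stored at the previous spike.
    Assumption: the reference level is reset to the current sample.
    """
    up, down = [], []
    reference = signal[0]          # signal level at the last emitted spike
    last_spike = float("-inf")     # time of the last emitted spike
    for t, x in zip(times[1:], signal[1:]):
        if t - last_spike < refractory:
            continue               # encoder ignores the input in this window
        if x - reference > threshold:
            up.append(t)           # positive change: UP spike
            reference, last_spike = x, t
        elif reference - x > threshold:
            down.append(t)         # negative change: DOWN spike
            reference, last_spike = x, t
    return up, down
```

Calling `delta_encode` once per electrode yields one UP and one DOWN spike train per channel; a larger `refractory` value caps the output rate, as noted in the text.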
B. The Reconfigurable Neuromorphic Processor
The proposed SNN was mapped onto the DYNAP chip shown in Figure 3 . The chip implements a multi-core neuromorphic processor with scalable architecture fabricated using a standard 0.18 μm 1P6M CMOS technology [3] . It is a full-custom asynchronous mixed-signal processor, with a fully asynchronous inter-core and inter-chip hierarchical routing architecture. Each core comprises 256 adaptive exponential integrate-and-fire (AEI&F) neurons for a total of 1k neurons per chip. Each neuron has a Content Addressable Memory (CAM) block, containing 64 addresses representing the pre-synaptic neurons that the neuron is subscribed to. Four different synapse types can be chosen for each synapse: fast excitatory/inhibitory, slow excitatory/inhibitory. Each synapse type is modeled by a dedicated Differential Pair Integrator (DPI) circuit [21] , with globally shared bias values per core that determine synaptic weights and time constants. These circuits produce EPSCs and IPSCs (Excitatory/Inhibitory Post Synaptic Currents), with time constants that can range from a few μs to hundreds of ms. The analog circuits are operated in the sub-threshold domain, thus minimizing the dynamic power consumption, and enabling implementations of neural and synaptic behaviors with biologically plausible temporal dynamics. For each core, there is an on-chip programmable temperature-compensated bias-generator which supplies 25 different parameters to the analog circuits to govern the behavior and dynamics of the neurons and synapses [22] . The asynchronous CAMs on the synapses are used to store the tags of the source neuron addresses connected to them, while the SRAM cells are used to program the address of the destination core/chip that the neuron targets. The input/output interfacing circuits that receive and transmit spike events follow the Address Event Representation (AER) communication protocol [23] . 
In the AER representation, each neuron is assigned an address, which is transmitted as soon as the neuron spikes. The information about the analog neural dynamic signals is encoded in the timing of these address-events. The DYNAP chip used in this work uses a 1.8 V power supply. At this supply voltage, the neuron circuits consume 883 pJ of energy per spike. As this DYNAP chip does not contain on-chip learning circuits, we performed a behavioral simulation of learning circuits in software to implement the SNN read-out training phase.
For fast prototyping, we interfaced the neuromorphic hardware to a PC using a Field Programmable Gate Array (FPGA). This infrastructure allows us to configure the network routers and mapping, to send input spikes to the chip and to collect output spikes from it. The FPGA is used to generate the input spike train: it creates an explicit list of indices (i.e. the IDs of the neurons that fire) and of the time stamp of each index. The indices in the spike generator correspond to the UP and DOWN channels, which gives 8 spike generators for the 4 EMG signals. These spike generators are then connected to the neurons in the hidden layer with randomly generated weights.
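As an illustration of the index and time-stamp list described above, the following sketch merges per-channel UP/DOWN spike times into a single time-ordered event list. The function name `build_event_list` and the index layout (UP at `2*c`, DOWN at `2*c + 1`) are hypothetical; the actual FPGA interface is not specified here.

```python
def build_event_list(channel_spikes):
    """Merge per-channel (up_times, down_times) into one sorted event list.

    Channel c is mapped to generator index 2*c (UP) and 2*c + 1 (DOWN);
    this layout is an assumption made for the sketch.
    Returns a list of (time_stamp, generator_index) tuples.
    """
    events = []
    for c, (up_times, down_times) in enumerate(channel_spikes):
        events += [(t, 2 * c) for t in up_times]
        events += [(t, 2 * c + 1) for t in down_times]
    events.sort()                  # order by time stamp, then by index
    return events
```

With 4 EMG channels the generator indices run from 0 to 7, matching the 8 spike generators mentioned in the text.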
C. The SNN Architecture
The SNN implemented consists of two main layers: a single-hidden-layer feed-forward neural network and a trainable read-out function. The hidden layer is stimulated by spike train inputs from the FPGA spike generator. The read-out layer is then trained to transform the neural activity of the hidden layer into the desired system outputs (recognition of the 3 hand gestures). As the network input, we selected the 4 sEMG channels showing the highest difference in shape, corresponding to the most involved muscle activations during those 3 gestures. In particular, the back pack of forearm flexor pollicis longus, flexor digitorum profundus and the palmaris longus.
The hidden layer consists of a population of 192 spiking AEI&F neurons that receive input from the FPGA spike generator. To increase the input variability every neuron received a linear combination of input spike trains, as shown in Figure 1 . For each neuron 4 different combinations of UP and DOWN channels are randomly selected and connected to it with a different number of synaptic inputs (corresponding to different synaptic CAMs on the destination neuron circuit), ranging from 0 to 6. Every CAM has a shared weight parameter, defined by the chip analog bias setting, so the total synaptic weight from the input to the hidden neuron is defined by the bias multiplied by the number of CAMs used. The UP and DOWN channels are connected with excitatory and inhibitory synapses respectively.
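The connectivity scheme above can be sketched as follows, under one reading of the text: each hidden neuron picks 4 of the 8 UP/DOWN generators at random and assigns each a CAM count between 0 and 6. The function names, the seed and the even/odd index convention for UP/DOWN are assumptions for illustration.

```python
import random

N_HIDDEN, N_GENERATORS = 192, 8          # sizes taken from the text
PICKS_PER_NEURON, MAX_CAMS = 4, 6        # 4 inputs per neuron, 0..6 CAMs each

def random_connectivity(seed=0):
    """Return cams[n][g]: CAM slots neuron n devotes to generator g."""
    rng = random.Random(seed)
    cams = [[0] * N_GENERATORS for _ in range(N_HIDDEN)]
    for row in cams:
        for g in rng.sample(range(N_GENERATORS), PICKS_PER_NEURON):
            row[g] = rng.randint(0, MAX_CAMS)   # inclusive on both ends
    return cams

def effective_weight(cam_count, generator, bias=1.0):
    """Total weight = shared bias x CAM count.

    Assumption: even generator indices are UP (excitatory, positive),
    odd indices are DOWN (inhibitory, negative).
    """
    sign = 1.0 if generator % 2 == 0 else -1.0
    return sign * bias * cam_count
```

At most 4 x 6 = 24 CAM entries are used per neuron in this sketch, which fits within the 64 CAM addresses available per neuron on the chip.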
D. Network Performance Analyses
To estimate the network's performance we performed Principal Component Analysis (PCA). PCA is a widely used tool for dimensionality reduction and visualization of high dimensional data. We perform the PCA on the mean firing rate of the hidden neurons' activity, which describes the network's state. To understand how the network's representation of the 3 input classes changes, we calculate the principal components in the input and the hidden layer. We consider the network's activity at the input neurons at the beginning of all trials and calculate the principal components of the activity during this time interval. Then we repeat the same procedure for the network's activity in the hidden layer after 100 ms, for all the classes combined. Finally, we project the high dimensional network activities onto the low dimensional space spanned by the first 3 principal components of the respective time interval to visualize the network's representation of the 3 different input classes.
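The projection step can be sketched in plain Python, assuming the input is a matrix of mean firing rates with one row per trial and one column per neuron. This sketch uses power iteration with deflation, which is one simple way to obtain leading components and is only practical for small examples; it is not claimed to be the procedure used for the reported results, where a linear-algebra library would be the usual choice.

```python
import math

def center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_component(cov, iters=500):
    """Leading eigenvalue/eigenvector of a symmetric matrix (power iteration)."""
    d = len(cov)
    v = [float(i + 1) for i in range(d)]          # arbitrary non-zero start
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break                                 # no variance left
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return lam, v

def pca(rows, n_components=3):
    """Return (components, scores) for the first n_components PCs."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    components = []
    for _ in range(n_components):
        lam, v = top_component(cov)
        components.append(v)
        # Deflation: remove the variance already explained by v.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    scores = [[sum(r[j] * v[j] for j in range(d)) for v in components]
              for r in centered]
    return components, scores
```

Each row of `scores` gives the coordinates of one trial in the space spanned by the leading components, which is what is plotted to compare the separation of the gesture classes.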
E. Read-Out Phase
The read-out phase was implemented off-chip since the DYNAP does not support online training. We used three different algorithms: SVM, logistic regression and a spiking method. All three methods were applied at the read-out layer to the spiking output measured from the DYNAP chip.
As the spiking method in this paper we propose the classical "Delta-rule" [24], [25]. It has been shown that this rule minimizes the Least Mean Square (LMS) error of a single-layer neural network, with the cost function defined as the difference between a desired target output signal T and the network output signal y, for a given set of input pattern signals x, weighted by the synaptic weight parameters w. Specifically, this learning rule sets the weight change between the ith input and the jth output neuron to be:

$$\Delta w_{ji} = \alpha \, (T_j - y_j) \, x_i \qquad (1)$$

where $\alpha$ denotes the learning rate.
In the event-based version of this learning algorithm, the input x is substituted by the pre-synaptic spike train, and T j and y j by the running averages of the teacher's and the neuron's spike trains, respectively. This is implemented by using the difference between the teacher's and the neuron's activity at the onset of the pre-synaptic spike for the weight update. Such an update mechanism can be implemented using elegant and low-power subthreshold current-mode circuits such as the "Bump circuit" [26] shown in Figure 5 [27], [28]. The Bump circuit compares the rate of the neuron spikes to a target value. The rates are calculated by a DPI circuit [21], which receives spike inputs, low-pass filters them and generates output currents proportional to the rate of the spikes.
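A minimal sketch of the event-based update described above: when a pre-synaptic spike arrives, each output neuron's weight from that input is changed in proportion to the difference between the teacher and output running averages. The function name `delta_update`, the learning rate value and the data layout are hypothetical; this is a behavioural illustration, not a model of the Bump circuit.

```python
def delta_update(weights, pre_spikes, teacher_trace, output_trace, lr=0.01):
    """Apply one event-driven delta-rule step in place.

    weights[j][i]    : weight from input i to output neuron j
    pre_spikes       : indices i of the inputs that spiked at this instant
    teacher_trace[j] : running average of the teacher spike train for j
    output_trace[j]  : running average of output neuron j's spike train
    lr               : learning rate (illustrative value)
    """
    for j, row in enumerate(weights):
        error = teacher_trace[j] - output_trace[j]
        for i in pre_spikes:       # only synapses that saw a pre-spike change
            row[i] += lr * error
    return weights
```

Because the update is triggered only by pre-synaptic events, synapses from silent inputs keep their weights, which mirrors the role of x in the rate-based rule.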
TABLE I THE PARAMETERS USED FOR THE LEARNING IN THE READ-OUT PHASE
The Bump circuit provides the analog value (V1, V2) and the direction (UP) of the difference between the neuron's and the target's spike rates, along with a flag for the similarity between the two signals (Stop); these signals are ideal for implementing the Delta update rule. We used a behavioural model of such circuits, simulated with the BRIAN2 spiking neural network simulator [29], to train the output neurons to be active for their assigned class while the input for that specific class is being presented to the network. The learning parameters used for the "Delta-rule" algorithm are shown in Table I.
III. RESULTS
We performed a thorough analysis of the spike recordings from the DYNAP chip using different methods reported in this section. Figure 4 shows an example raster plot of one trial for the paper gesture (top section), with black spikes from one output neuron in the hidden layer and red and blue spikes from the UP and DOWN input channels. The bottom section shows the measured membrane voltage of the silicon neuron in the hidden layer.
The right part of Figure 4 shows the corresponding number of spikes for the trial. Based on the DYNAP energy/spike figures, we can estimate the average power consumption used by the network, calculated across trials and movements, which amounts to 0.05 mW.
Fig. 6. The accuracy on the training and test sets of the spike recordings from the DYNAP chip using Logistic Regression, SVM and the spike-based delta rule. The results are cross-validated over 5 different combinations of train and test set for each subject separately; we report the average and standard deviation over subjects.
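The power estimate described above amounts to multiplying the energy per spike by the number of spikes per second. The sketch below assumes that only the 883 pJ per-spike figure is counted; the spike counts passed to it are hypothetical inputs, not measured values.

```python
ENERGY_PER_SPIKE_J = 883e-12       # 883 pJ per spike at 1.8 V, from the text

def average_power_w(spike_counts, trial_duration_s):
    """Mean power over trials: energy per spike x spikes per second.

    spike_counts: total network spikes in each trial (hypothetical inputs).
    """
    rates = [n / trial_duration_s for n in spike_counts]
    return ENERGY_PER_SPIKE_J * sum(rates) / len(rates)

def implied_spike_rate(power_w):
    """Aggregate spike rate (spikes/s) corresponding to a given power."""
    return power_w / ENERGY_PER_SPIKE_J
```

Under this accounting, `implied_spike_rate(0.05e-3)` corresponds to an aggregate rate on the order of 5.7 x 10^4 spikes per second across the network.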
TABLE II THE PARAMETERS USED FOR THE OUTPUT NEURONS IN THE SPIKING READ-OUT LAYER

A. PCA Analysis
To investigate the "average" activity of the overall state of the neurons in the network we performed PCA on the high dimensional spike recordings measured from the chip for each trial, to also look at the uncorrelated activity of the neurons inside the network. This result is depicted in Figure 7 . The first 3 principal components are plotted for the 3 hand gestures for the input and the hidden-layer neurons. As can be seen in the Figure, the principal components of the 3 movements become more distinct after the input has been projected into the higher dimensional space.
B. Logistic Regression and SVM
The logistic regression and SVM classifications were performed by creating training and test sets from the output spikes of the DYNAP chip, using two thirds and one third of the data, respectively. To verify the accuracy of the network, the results were cross-validated over 5 trials. The training and test sets were created by shuffling the output data. The mean accuracy of the logistic regression is 81% with a standard deviation of ±4%. The SVM accuracy is about 84% with a standard deviation of ±4%. These results are reported in Figure 6.
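The evaluation protocol above (shuffle, split two thirds/one third, repeat 5 times, report mean and standard deviation) can be sketched for a single subject as follows. To keep the example self-contained, a nearest-centroid classifier is used as a stand-in; it is not the logistic regression or SVM used for the reported results, and all function names are hypothetical.

```python
import random
import statistics

def nearest_centroid_fit(X, y):
    """Stand-in classifier: store the mean feature vector of each class."""
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    return {label: [sum(col) / len(col) for col in zip(*rows)]
            for label, rows in groups.items()}

def nearest_centroid_predict(centroids, x):
    """Return the label whose centroid is closest to x (squared distance)."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

def repeated_holdout(X, y, repeats=5, train_fraction=2 / 3, seed=0):
    """Mean and standard deviation of test accuracy over shuffled splits."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    accuracies = []
    for _ in range(repeats):
        rng.shuffle(idx)
        cut = round(train_fraction * len(idx))
        train, test = idx[:cut], idx[cut:]
        model = nearest_centroid_fit([X[i] for i in train],
                                     [y[i] for i in train])
        hits = sum(nearest_centroid_predict(model, X[i]) == y[i] for i in test)
        accuracies.append(hits / len(test))
    return statistics.mean(accuracies), statistics.stdev(accuracies)
```

Here each row of `X` would hold the spike counts of the hidden neurons for one trial and `y` the gesture label; averaging the per-subject results then gives figures comparable to those in Figure 6.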
C. Spike-Based Learning
Using the learning algorithm explained in section II-E, we trained a spike-based read-out layer. Three neurons are used in the output layer, each randomly connected to 80% of the neurons in the hidden layer. At the onset of a pre-synaptic spike produced by the neurons in the hidden layer, the corresponding weights of the output units are updated, based on equation 1. The parameters of the neurons used for this training are shown in Table II. To calculate the error, we filter the output neurons' spikes and the teaching signal spike train with an exponential kernel with a time constant of 20 ms. The teaching signal is presented as a spike train with a frequency of 100 Hz. The difference between the two filtered spike trains is used to update the corresponding weights.
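The error computation above can be sketched by filtering spike trains with an exponential kernel on a discrete time grid. The 1 ms step, the unit jump per spike and the function names are assumptions for illustration; the 20 ms time constant and the 100 Hz teacher come from the text.

```python
import math

def exp_trace(spike_steps, n_steps, tau_steps=20.0):
    """Filter a spike train with an exponential kernel.

    spike_steps: integer time-step indices of the spikes (1 step = 1 ms here,
    an assumption of this sketch); tau_steps = 20 matches the 20 ms kernel.
    """
    decay = math.exp(-1.0 / tau_steps)
    spike_set = set(spike_steps)
    value, trace = 0.0, []
    for n in range(n_steps):
        value *= decay             # exponential decay over one time step
        if n in spike_set:
            value += 1.0           # each spike adds a unit jump
        trace.append(value)
    return trace

def error_signal(teacher_trace, output_trace):
    """Point-wise difference used to drive the weight update."""
    return [t - o for t, o in zip(teacher_trace, output_trace)]

# 100 Hz teacher over 1 s: one spike every 10 ms.
teacher = list(range(0, 1000, 10))
```

With these settings the teacher trace settles to a peak value of 1 / (1 - exp(-0.5)) just after each spike, so an output neuron firing regularly at the same rate drives the error towards zero.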
Two thirds of the available trials are used as the training set and the network is tested on the remaining unseen trials, which form the test set. The results, cross-validated over 5 different combinations of training and test sets, are shown in Figure 6. The figure shows the mean and standard deviation of the accuracy achieved after 15 epochs of training. The training accuracy reaches a mean value of 95%, while the testing accuracy is 74%, obtained by averaging over the 10 subjects. It is worth noting that although the hidden layer is high dimensional, there is still a large amount of spatial overlap between the 3 patterns of activity for the hand gestures. To address this problem, we have utilized adaptation in the hidden layer neurons, which regularizes the neurons' activities. As a result, if some of the neurons are very active in all of the trials of different movements, the learning rate for their corresponding synapses is lowered, since the number of spikes is regulated by the adaptation's negative feedback loop [30]. The time constant of the adaptation was set to 250 ms.
IV. DISCUSSION
A. Power Consumption
Table III presents the average power consumption of the implemented system and of a state-of-the-art solution for sEMG recognition. The proposed spiking network implementation is two orders of magnitude more power efficient than the state-of-the-art implementation of the embedded processing phase. This large gain in power efficiency is due to the event-driven processing and the sub-threshold mixed-signal design. The main source of power consumption is the processing phase, where such an implementation is very advantageous. Regarding the read-out phase, we can select the most accurate model, since it is possible to design a mixed-signal chip where the read-out phase is implemented in the digital part.
B. Spike Encoding
In this work, we used a delta encoding scheme to convert the continuous EMG signal into spike sequences. The threshold used to produce these spikes is a hyper-parameter which changes the frequency of the input spikes. The lower the value of this threshold, the higher the fidelity of preserving the signal profile. However, there is a trade-off between signal preservation and the number of spikes generated. With a higher number of spikes, the firing rates of the neurons in the hidden layer can saturate, reducing the ability of the network to distinguish between the changes in the different input signals. In order to address this problem, we used a low value of the threshold and connected each neuron of the hidden layer to the UP and DOWN channels with excitatory and inhibitory synapses respectively. The balance between the excitation and inhibition gives rise to a lower frequency of spikes at the hidden layer and prevents the neuron from saturating while preserving the overall information of the input.
Fig. 7. 2D projections of the 3 PCs shown at the input and at the hidden layer in a representative subject. The PCA is performed on the activity of hidden neurons after 100 ms from the trial onset. The separability increases in the hidden layer, where the data are projected in a higher dimensional space.
TABLE III COMPARISON WITH STATE-OF-THE-ART LOW-POWER METHOD
C. Projection to a Higher Dimensional Space
Projecting the inputs into a higher dimensional space is necessary because classifying the different muscle activities with a simple one-layer network was not possible, since the spatial overlap between the channels is very large. As shown in Figure 7, the projection to a higher dimensional space increases the separability of the input data. The hidden layer can be considered the counterpart of the feature extraction phase of conventional approaches, while consuming a considerably lower amount of power, as shown in Table III.
D. Learning
In this work, we presented 3 different classification algorithms: two state-of-the-art approaches, SVM and logistic regression, and one spike-based method. Our end goal is a fully integrated system that can interact with the EMG sensors, with the possibility of on-chip adaptation and learning circuits, since every subject/patient is unique. This can be achieved with the proposed fully spiking neuromorphic approach by implementing multiple forms of "plasticity" operating on multiple time scales. Short-term plasticity can be implemented on neuromorphic processors through synaptic and neural dynamics [3], [21], [33]. Long-term plasticity can be achieved by designing on-line learning circuits that implement weight and/or network structural changes.
Since in our gesture recognition example the label is readily available, it is desirable to define a cost function and employ gradient descent for its optimization. Learning algorithms can then be introduced to move the parameters of the system along the gradient direction and, as a result, converge to a global minimum [34]. Deriving an update rule based on this optimization algorithm results in the Delta rule for a one-layer network and extends to "back-propagation" for deeper networks. However, the back-propagation update rule does not satisfy the locality criterion required for efficient hardware implementation. Fortunately, recent theoretical models have been proposed which show that approximations of such an update rule can be implemented in a local fashion [35], [36]. Therefore it will be possible to map these models to neuromorphic circuits and implement spike-based learning also in (deep) multi-layer neuromorphic processor chips. However, since it is possible to design mixed digital-analog systems, we can have a system where traditional methods, such as SVM and logistic regression, are implemented digitally at the output of a neuromorphic chip in order to get the best of the two approaches.
V. CONCLUSION
This work takes an important step towards realizing an end-to-end solution, from sensors to classification, for the real-time processing of sEMG signals. We recorded an EMG data set using 3 hand gestures (rock, paper and scissor), encoded the signals into spikes, and presented them to an event-based neuromorphic chip. We performed a detailed analysis of the spiking output of the chip, using PCA to reduce the dimensionality of the output, to show the ability of the network to split the classes into 3 different parts of the PC space (after about 100 ms from trial onset). This delay matches the requirements of low-latency prosthesis control, where the classification and the creation of the motor command are needed within 250 ms. In addition, we trained the network read-out layer using two state-of-the-art methods, SVM and logistic regression, and a hardware-friendly spike-based learning method for a fully event-based end-to-end system. We estimated the network power consumption to be about 0.05 mW across trials and movements. Future improvements of the read-out phase are expected to reach the accuracy of the state-of-the-art methods while maintaining its compactness and ultra-low power consumption.
The style of computation envisaged, based on the identification and configuration of a full custom hardware spiking neural network is a completely new approach that departs from traditional approaches. By properly setting the parameters of such neuromorphic VLSI devices, in the future, it will be possible to build small-scale embedded systems that can be used to learn about the input signals and the underlying internal states, while interacting with the environment in real-time.
