The objective of this work is to use a multi-core embedded platform as a computing architecture for neural applications relevant to neuromorphic engineering, e.g., robotics and artificial and spiking neural networks. Recently, it has been shown that spike-timing-dependent plasticity (STDP) can play a key role in pattern recognition. In particular, multiple repeating arbitrary spatio-temporal spike patterns hidden in spike trains can be robustly detected and learned by multiple neurons equipped with STDP listening to the incoming spike trains. This paper presents an implementation, on a biological time scale, of an STDP algorithm that localizes repeating spatio-temporal spike patterns on a multi-core embedded platform.
Introduction
In recent years, a few hardware-based spiking neural network (SNN) systems have been developed, and the descriptions of those pioneering platforms have gained remarkable attention [1, 2]. On account of their parallel and distributed structure, spiking neuronal networks can simulate neuronal activity, potentially realizing an extremely large-scale network comparable to that of the human brain in the future. Neuromorphic engineering aims to design SNNs that will be used in neuroscience to simulate signal processing in the brain.
A second approach in the neuromorphic community concerns neuromimetic systems [3, 4], which mimic the activity of biological cells more precisely and could replace the living part [5]. A neuromorphic system facilitates the building of a hybrid network incorporating both silicon and biological neurons. In recent times, the term neuromorphic has been used to describe analog, digital, mixed-mode analog/digital VLSI, and software systems that implement biologically realistic neural network models, from the electrophysiology of a single neuron to network plasticity rules. An analog circuit implementation consumes little power (down to nanowatts) per silicon neuron. However, constructing a large-scale network requires solving the problems induced by fabrication mismatch and temperature dependence [6]. Digital circuit implementations, on the other hand, overcome this limitation because they are far less sensitive to these factors, though their power consumption tends to be higher than that of analog implementations. The choice between analog and digital neural networks is therefore application-dependent. The architecture of an SNN platform is ultimately a compromise between computational cost and model complexity, which also constrains the achievable network size [7].
Spiking neural networks can be simulated either on analog/digital VLSI or on general-purpose computing architectures [8] such as clusters of CPUs or GPUs. While software tools can be configured for different types of models, hardware-based SNNs are dedicated to a given type of model.
A promising approach that offers a good trade-off is the use of a mixed hardware/software platform such as the Parallella board by Adapteva [9]. This platform is designed for developing and implementing high-performance, parallel processing applications that take advantage of the on-board Zynq programmable SoC and the Epiphany chip. The 16-core Epiphany chip consists of a scalable array of simple RISC processors, programmable in C/C++ and connected by a fast on-chip network within a single shared-memory architecture. The advantage of this mixed hardware/software platform is the presence of an FPGA, which offers a significant speedup over software designs as well as size, weight, and power efficiencies.
The objective of this work is to use the Parallella board as a computing architecture for neural applications relevant to neuromorphic engineering, e.g., robotics and artificial and spiking neural networks.
In particular, we present an implementation on the Parallella board of the spike-timing-dependent plasticity (STDP) algorithm that localizes repeating spatio-temporal spike patterns hidden in spike trains. The implementation was designed so that the computation runs on a biological time scale.
Materials and methods
In neuroscience, synaptic plasticity is the ability of the connection, or synapse, between two neurons to change in strength or efficacy in response to either use or disuse of transmission at preexisting synapses. Since memories are postulated to be represented by vastly interconnected networks of synapses in the brain, synaptic plasticity is one of the important neurochemical foundations of learning and memory. Spike-timing-dependent plasticity is a biological process that adjusts the strength of connections between neurons in the brain. The process adjusts the connection strengths based on the relative timing of a particular neuron's output and input action potentials (or spikes).
Recently, it has been shown that STDP can play a key role in detecting repeating patterns and generating selective responses to them. STDP has also been shown to be an effective learning algorithm for feed-forward artificial neural networks in pattern recognition.
In particular, Masquelier [10, 11] showed that multiple repeating arbitrary spatio-temporal spike patterns hidden in spike trains can be robustly detected and learned by multiple neurons equipped with STDP listening to the incoming spike trains (Fig. 1). The neurons become selective to successive coincidences of the patterns.
In this work, we use the models presented in the work of Masquelier [11]. Furthermore, the implementation on the multi-core embedded platform takes the biological time scale into account.
Neuron model
Specifically, for the neuron model, Gerstner's spike response model (SRM) [12] is used. This model is an alternative formulation to the integrate-and-fire model [13]. Instead of defining the evolution of the neuron's membrane potential by a differential equation, the SRM uses a kernel-based method to model the effect of spikes on the membrane potential, as in Eq. (1).
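Eq. (1) itself is not reproduced in this text. From the three contributions described in the paragraphs that follow (weighted EPSPs, the effect of the neuron's own last spike, and IPSPs received from the other neurons), it can be sketched as below. The symbol p_i for the membrane potential and the restriction of both sums to spikes received after the last post-synaptic spike t_i are assumptions borrowed from the SRM formulation of [11], not statements taken from this text.

```latex
p_i(t) = \eta(t - t_i)
       + \sum_{j \,\mid\, t_j > t_i} w_{ji}\, \varepsilon(t - t_j)
       + \sum_{k \,\mid\, t_k > t_i} \mu(t - t_k)
\tag{1}
```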
At any time t, the membrane potential of neuron i is given by Eq. (1), where the w_ji are the excitatory synaptic weights, between 0 and 1 (arbitrary units). The variation of the synaptic weight w_ji from neuron j (pre-synaptic) to neuron i (post-synaptic) depends on the timing difference Δt = t_i − t_j. When a post-synaptic spike arises after a pre-synaptic spike (Δt > 0), the connection is reinforced (long-term potentiation (LTP), Δw_ji > 0), whereas in the opposite case, it is weakened (long-term depression (LTD), Δw_ji < 0). The change of the synapse plotted as a function of the relative timing of pre- and post-synaptic action potentials is called the STDP function or learning window, and it varies between synapse types. In Eq. (1), each pre-synaptic spike j, with arrival time t_j, is supposed to add to the membrane potential an excitatory post-synaptic potential (EPSP) of the form:
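The displayed kernel is absent from this text. A double-exponential form that is consistent with the constants defined just below (τ_m, τ_s, Θ, and K), and with the SRM kernel used in [11], would be:

```latex
\varepsilon(t - t_j) = K \left[ \exp\!\left(-\frac{t - t_j}{\tau_m}\right)
                              - \exp\!\left(-\frac{t - t_j}{\tau_s}\right) \right]
                       \Theta(t - t_j)
\tag{2}
```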
where τ_m is the membrane time constant (here, 10 ms), τ_s is the synapse time constant (here, 2.5 ms), Θ is the Heaviside step function, and K is a multiplicative constant chosen so that the maximum value of the kernel is 1.
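The value of K follows from the two time constants alone. The sketch below assumes the double-exponential EPSP kernel K·(exp(−s/τ_m) − exp(−s/τ_s))·Θ(s); the function and constant names are illustrative and do not come from the original implementation.

```python
import math

TAU_M = 10.0  # membrane time constant in ms (from the text)
TAU_S = 2.5   # synapse time constant in ms (from the text)


def raw_kernel(s):
    """Unnormalized double exponential; zero for s < 0 (Heaviside step)."""
    if s < 0:
        return 0.0
    return math.exp(-s / TAU_M) - math.exp(-s / TAU_S)


# Setting the derivative of the raw kernel to zero gives the peak time.
S_PEAK = (TAU_M * TAU_S / (TAU_M - TAU_S)) * math.log(TAU_M / TAU_S)

# K rescales the kernel so that its maximum value equals 1.
K = 1.0 / raw_kernel(S_PEAK)


def epsp(s):
    """Normalized EPSP kernel, with s = t - t_j in ms (assumed form)."""
    return K * raw_kernel(s)


if __name__ == "__main__":
    print(f"peak at {S_PEAK:.3f} ms, K = {K:.4f}")
```

With τ_m = 10 ms and τ_s = 2.5 ms, the kernel peaks about 4.6 ms after the pre-synaptic spike and K is close to 2.12.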
In Eq. (1), the effect of the last emitted post-synaptic spike i on the membrane potential is modeled by the kernel η of Eq. (3), where T is the threshold of the neuron (here 550, arbitrary units), and K_1 = 2 and K_2 = 4 are constants. Furthermore, in Eq. (1), when a neuron fires at time t_k, it sends to the others an inhibitory post-synaptic potential (IPSP) μ of the form given by Eq. (4).
Here, the ε, η, and μ kernels were rounded to zero when, respectively, t − t_j, t − t_i, and t − t_k were greater than 7τ_m.
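Neither Eq. (3) nor Eq. (4) is displayed in this text. Forms that are consistent with the constants T, K_1, and K_2 defined above, and with the SRM formulation of [11], would be the following. The inhibition strength α is an assumed extra parameter that is not defined anywhere in this text.

```latex
\begin{align}
\eta(t - t_i) &= T \left[ K_1 \exp\!\left(-\frac{t - t_i}{\tau_m}\right)
                 - K_2 \left( \exp\!\left(-\frac{t - t_i}{\tau_m}\right)
                            - \exp\!\left(-\frac{t - t_i}{\tau_s}\right) \right) \right]
                 \Theta(t - t_i) \tag{3} \\
\mu(t - t_k)  &= -\alpha\, T\, \varepsilon(t - t_k) \tag{4}
\end{align}
```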
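Because the kernels vanish beyond 7τ_m = 70 ms, they can be tabulated once and looked up at run time. The following is a minimal sketch of that idea, assuming a 1 ms step, the double-exponential ε, and the η and μ forms of the SRM in [11]. ALPHA and all function names are hypothetical and are not taken from this text or from the original code.

```python
import math

TAU_M, TAU_S = 10.0, 2.5      # ms, from the text
T, K1, K2 = 550.0, 2.0, 4.0   # threshold and constants, from the text
ALPHA = 0.25                  # ASSUMPTION: inhibition strength, not given in the text
DT = 1.0                      # ms, time step
WINDOW = 7 * TAU_M            # kernels are rounded to zero beyond 7 * tau_m

# Normalization so that the EPSP kernel peaks at 1.
_S_PEAK = (TAU_M * TAU_S / (TAU_M - TAU_S)) * math.log(TAU_M / TAU_S)
K = 1.0 / (math.exp(-_S_PEAK / TAU_M) - math.exp(-_S_PEAK / TAU_S))


def epsilon(s):
    """EPSP kernel (assumed double-exponential form)."""
    if s < 0:
        return 0.0
    return K * (math.exp(-s / TAU_M) - math.exp(-s / TAU_S))


def eta(s):
    """Effect of the neuron's own last spike (assumed form)."""
    if s < 0:
        return 0.0
    decay_m = math.exp(-s / TAU_M)
    decay_s = math.exp(-s / TAU_S)
    return T * (K1 * decay_m - K2 * (decay_m - decay_s))


def mu(s):
    """IPSP kernel (assumed form: a scaled, sign-flipped EPSP)."""
    return -ALPHA * T * epsilon(s)


def make_table(kernel):
    """Tabulate a kernel at DT resolution over [0, 7 * tau_m]."""
    n = int(WINDOW / DT) + 1
    return [kernel(i * DT) for i in range(n)]


def lookup(table, s):
    """Kernel value for delay s in ms; zero outside [0, 7 * tau_m]."""
    if s < 0 or s > WINDOW:
        return 0.0
    return table[int(round(s / DT))]


EPS_TABLE = make_table(epsilon)
ETA_TABLE = make_table(eta)
MU_TABLE = make_table(mu)
```

Tabulating the kernels trades a few hundred bytes of memory per kernel for the cost of evaluating exponentials in every time step, which can matter on cores with small local memory and a tight per-step budget.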
Spike-timing-dependent plasticity rule
STDP rules are the most common form of learning used in SNNs. The dynamics of an SNN and the formation of its connectivity are governed by synaptic plasticity. Plasticity rules formulate the modifications that occur in the synaptic transmission efficacy, driven by correlations in the firing activity of pre- and post-synaptic neurons. At the network level, spikes are generally processed as events, and the synaptic weight w_ji (connection from neuron j to neuron i) varies over time according to the learning rules. As in the work of Masquelier [11], we used an additive exponential update rule, Eq. (5).
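Eq. (5) is not displayed in this text. An additive exponential rule of the kind described here has the form Δw_ji = a⁺·exp(−Δt/τ⁺) for Δt ≥ 0 (LTP) and Δw_ji = −a⁻·exp(Δt/τ⁻) for Δt < 0 (LTD), with Δt = t_i − t_j. The sketch below implements that form. The amplitudes and time constants are illustrative placeholders, the treatment of Δt = 0 as LTP is an assumption, and the clipping to [0, 1] follows the weight range stated for the neuron model.

```python
import math

# ASSUMPTION: illustrative parameter values, not taken from this text.
A_PLUS = 0.03125          # LTP amplitude
A_MINUS = 0.85 * A_PLUS   # LTD amplitude
TAU_PLUS = 16.8           # LTP time constant in ms
TAU_MINUS = 33.7          # LTD time constant in ms


def stdp_delta(t_pre, t_post):
    """Additive exponential weight change for one pre/post spike pair (ms)."""
    dt = t_post - t_pre
    if dt >= 0:
        # Post-synaptic spike follows the pre-synaptic one: potentiation.
        return A_PLUS * math.exp(-dt / TAU_PLUS)
    # Pre-synaptic spike follows the post-synaptic one: depression.
    return -A_MINUS * math.exp(dt / TAU_MINUS)


def update_weight(w, t_pre, t_post):
    """Apply the update and keep the weight in [0, 1]."""
    return min(1.0, max(0.0, w + stdp_delta(t_pre, t_post)))
```

For example, `update_weight(0.5, 10.0, 15.0)` increases the weight because the post-synaptic spike arrives 5 ms after the pre-synaptic one, whereas swapping the two times decreases it.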
Computing architecture
SNNs fall into the third generation of neural network models, increasing the level of realism in a neural simulation. The idea is that neurons in the SNN do not fire at each propagation cycle (as happens with typical multi-layer perceptron networks), but rather, only fire when a membrane potential reaches a specific value. When a neuron fires, it generates a signal that travels to other neurons which, in turn, increase or decrease their potentials in accordance with this signal.
Computational neuroscience commonly relies on software-based processing tools (NEURON, NEST, PCSIM, BRIAN, etc.). As mentioned in the introduction, neuromorphic engineering is a new interdisciplinary field that takes inspiration from biology, physics, mathematics, computer science, and engineering to design analog, digital, and mixed-mode analog/digital VLSI and software systems that mimic neurobiological architectures present in the nervous system. Some of these platforms are dedicated to the simulation of SNNs and take into account the timing of input signals by precisely computing the neurons' asynchronous spikes. While software tools can be configured for different types of models [14], hardware-based SNNs are dedicated to a given type of model.
In this work, we use the Parallella board, a mixed hardware/software platform by Adapteva [9], to investigate the capability of simulating an STDP algorithm that localizes repeating spatio-temporal spike patterns on a biological time scale.
The Parallella board comes with either a 16-core or a 64-core Epiphany chip (here, we used the 16-core version). It contains a Zynq SoC (FPGA + ARM A9) and an Epiphany coprocessor, which are connected through the eLink interface and the AXI bus (Fig. 2). The board can be connected through a Gigabit Ethernet port, and it also provides a µHDMI port, two µUSB ports, and a µSD slot.
The presence of the FPGA offers a significant speedup over software designs, as well as size, weight, and power efficiencies. In the provided bitstreams, the FPGA is configured to drive the HDMI, GPIO pins, and the host-side eLink communication between the ARM and Epiphany chip.
A single Epiphany core is computationally similar to the ARM. Both are 32-bit RISC cores capable of efficient floating-point and discrete operations. The ARM is capable of running a full operating system, such as Linux, and can interface with the external world via the FPGA.
Furthermore, we make use of the Epiphany Software Development Kit (eSDK), a software development environment targeting the Epiphany multi-core architecture. The eSDK is based on standard development tools, including an optimizing C compiler, a functional simulator, a debugger, and a multi-core integrated development environment (IDE). A drawback of the architecture is that the number of connected cores is limited to 4096, which limits its capability to simulate a sizable part of the human brain.
Network architecture

SNN execution architecture on the Parallella platform
The spike pattern recognition algorithm was implemented on a shared-memory multi-core embedded platform, using single program, multiple data (SPMD) as the technique to achieve parallelism. Tasks are split up and run simultaneously on multiple processors (here, 16 cores) with different input lines to obtain results faster. The implementation was written in C using the eSDK and the compiler provided by Adapteva. We compiled the code with gcc for the ARM and with egcc, a modified compiler, for the Epiphany chip. To evaluate the speedup of the multi-core platform, the algorithm was implemented both on a single-core ARM and on the Epiphany coprocessor mediated by the ARM host.
In particular, the algorithm was evaluated for 2048 input spike trains (128 inputs per core), with two hidden patterns integrated in parallel by five downstream SRM neurons through excitatory synapses governed by STDP. Lateral inhibitory connections are set up between the SRM neurons, so that when a neuron fires, it sends an inhibitory post-synaptic potential (IPSP) of the form presented in Eq. (4) to its neighbors. The implementation was designed so that the computation runs on a biological time scale. The input spike trains were generated for a simulation time of 500 s and stored in the shared memory. Alternatively, the FPGA provided by the platform can be configured to drive GPIO pins for data transfer.
As mentioned before, our interest is in systems that operate on a biological time scale. We therefore chose a computation time step of 1 ms, so the architecture has to compute the values of all parameters within that time. The Epiphany has very limited external I/O capabilities.
In fact, any spike train data going into or out of the coprocessor needs to be mediated by the ARM host, and the final results are all controlled by applications running on the ARM.
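The partition of 2048 input lines into 16 blocks of 128 can be illustrated with a small sketch. It emulates the SPMD split in plain Python rather than using the eSDK. The function names are hypothetical, and the host-side reduction of per-core partial sums is an assumption about how the results are combined.

```python
N_INPUTS = 2048                  # input spike trains (from the text)
N_CORES = 16                     # Epiphany cores (from the text)
PER_CORE = N_INPUTS // N_CORES   # 128 input lines per core


def core_slice(core_id):
    """Input-line indices handled by one core."""
    start = core_id * PER_CORE
    return range(start, start + PER_CORE)


def partial_drive(core_id, weights, epsp_values):
    """Work of one core: weighted EPSP sum over its own input lines."""
    return sum(weights[j] * epsp_values[j] for j in core_slice(core_id))


def total_drive(weights, epsp_values):
    """Reduction over the per-core partial sums (run serially here)."""
    return sum(partial_drive(c, weights, epsp_values) for c in range(N_CORES))
```

Every core runs the same function on a different slice, which is the defining property of SPMD. On the real hardware the 16 calls to `partial_drive` would execute concurrently.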
Data generating process for the network
In neuroscience, the words firing and spiking commonly refer to action potentials generated by a neuron. Simulating input spike trains like the ones in the raster plot shown in Fig. 3 requires only one piece of information: the firing rate of the neuron. As in the work of Masquelier [10], we generated spikes independently using a Poisson process with an instantaneous firing rate that varies randomly between 0 and 90 Hz. The maximal rate change was chosen so that the neuron could go from 0 to 90 Hz in 50 ms.
Finally, for half of the input lines, sections of 50 ms of the spike trains were replaced by the 'pattern' to be repeated. To define the pattern, we randomly pick one of these sections and copy the corresponding spikes. As shown in Fig. 3, we generate two different hidden patterns (red circles and green circles) that repeat at random intervals within stochastic Poissonian activity (blue circles).
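A minimal sketch of such a generator is given below. It assumes a 1 ms time step, approximates the Poisson process by one Bernoulli draw per step, and lets the rate follow a bounded random walk whose step is capped at 90 Hz / 50 ms. The exact rate dynamics are not specified in this text, so the random walk and all names are illustrative assumptions.

```python
import random

DT_MS = 1.0                                # time step in ms (assumed)
MAX_RATE_HZ = 90.0                         # upper bound on the rate (from the text)
MAX_STEP_HZ = MAX_RATE_HZ / 50.0 * DT_MS   # 0 -> 90 Hz in 50 ms: 1.8 Hz per ms


def generate_spike_train(duration_ms, rng):
    """Return sorted spike times (ms) for one input line."""
    rate = rng.uniform(0.0, MAX_RATE_HZ)
    spikes = []
    for step in range(int(duration_ms / DT_MS)):
        # Bernoulli approximation of a Poisson process within one step.
        if rng.random() < rate * DT_MS / 1000.0:
            spikes.append(step * DT_MS)
        # ASSUMPTION: bounded random walk of the instantaneous rate.
        rate += rng.uniform(-MAX_STEP_HZ, MAX_STEP_HZ)
        rate = min(MAX_RATE_HZ, max(0.0, rate))
    return spikes
```

Calling `generate_spike_train(1000.0, random.Random(0))` once per input line, each with its own seed, yields independent spike trains.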
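The copy-and-paste step can be sketched as follows, assuming spike trains are stored as lists of spike times in ms, keyed by input-line index. The function names and the data layout are hypothetical and are not taken from the original code.

```python
PATTERN_MS = 50.0  # pattern duration in ms (from the text)


def extract_pattern(trains, lines, start_ms):
    """Copy the spikes of the chosen lines in [start, start + 50 ms),
    stored relative to the start of the section."""
    end_ms = start_ms + PATTERN_MS
    return {
        j: [t - start_ms for t in trains[j] if start_ms <= t < end_ms]
        for j in lines
    }


def paste_pattern(trains, pattern, at_ms):
    """Replace the 50 ms section starting at at_ms by the pattern,
    on the pattern's input lines only (modifies trains in place)."""
    end_ms = at_ms + PATTERN_MS
    for j, rel_spikes in pattern.items():
        kept = [t for t in trains[j] if not (at_ms <= t < end_ms)]
        trains[j] = sorted(kept + [at_ms + t for t in rel_spikes])
```

Choosing `lines` as a random half of the input indices and calling `paste_pattern` at random, non-overlapping times reproduces the structure described above: half of the lines carry the repeating pattern, while the other half remain purely stochastic.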
Results
The spike pattern recognition algorithm was simulated for 500 s on the host ARM. We explored a situation that focuses on the recognition of two previously unknown patterns integrated in parallel by five downstream SRM neurons, through excitatory synapses governed by STDP and lateral inhibitory connections.
As in the work of Masquelier [11], we found that the same neuron cannot become selective to two distinct patterns, and that inhibition encourages the neurons to distribute themselves across all the patterns. The simulation time of about 13 min shows that real-time simulation cannot be reached with only one ARM core. Pursuant to our goal of implementing the algorithm on the biological time scale, we ran the simulation on the 16-core Epiphany chip. In this example, patterns 1 and 2 were learned by four neurons, and one neuron stopped firing after too many spikes had been generated outside the patterns (it did not learn any pattern). Figure 4 shows a typical result, in which the hidden patterns are marked by rectangles. A neuron that fires inside a rectangle has become selective to the corresponding pattern. Here, we plotted the membrane potential as a function of simulation time, at the end of the simulation. After about 80 pattern presentations and 600 generated spikes, selectivity to the pattern emerges: gradually, the neuron almost stops spiking outside the pattern, while it spikes most of the time when the pattern is present. As shown in Fig. 4, neuron 1 and neuron 2 become selective to patterns 1 and 2, respectively. The measured post-synaptic spike latency is about 5 ms, and there are no false alarms after the 676th spike, that is, for the last 400 s of simulation. The maximum speedup measured on the platform was 14 times.
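The real-time criterion for the single ARM core can be checked with simple arithmetic from the figures quoted above (500 s of simulated time, a 1 ms step, and about 13 min of wall-clock time). The sketch below is only that calculation; the variable and function names are illustrative, and 13 min is treated as exact although the text gives it as approximate.

```python
SIM_TIME_S = 500.0       # simulated time (from the text)
DT_S = 0.001             # 1 ms time step (from the text)
ARM_WALL_S = 13 * 60.0   # "about 13 min" on one ARM core (from the text)

n_steps = round(SIM_TIME_S / DT_S)                # number of 1 ms steps
arm_ms_per_step = ARM_WALL_S / n_steps * 1000.0   # wall-clock ms per step
arm_slowdown = ARM_WALL_S / SIM_TIME_S            # > 1 means slower than real time


def meets_real_time(wall_s, sim_s=SIM_TIME_S):
    """True if the simulation finishes no later than the simulated time."""
    return wall_s <= sim_s
```

On these numbers the single ARM core needs roughly 1.56 ms of wall-clock time per 1 ms step, so it misses the real-time budget by about 56%.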
Conclusion
In the present work, we presented an implementation on the Parallella board of an STDP algorithm that localizes repeating spatio-temporal spike patterns hidden in spike trains. The implementation takes the biological time scale into account and uses the 16-core Epiphany chip. The Parallella board provides a large number of cores that can perform floating-point operations, which makes it suitable for neural applications relevant to neuromorphic engineering, such as pattern recognition. The maximum speedup measured in this work shows that real-time simulation can be reached for a small neural network using the Epiphany coprocessor mediated by the ARM host. The calculation was four times faster than biological real time. We plan to use this hardware and software platform to improve the hybrid technique, also called "dynamic clamp", which consists of connecting artificial and biological neurons, in vivo and/or in vitro, to study the function of neuronal circuits using microelectrode arrays.
