A large-scale spiking neural networks emulation architecture by Pirrone, Vito
A Master Thesis report on

A LARGE-SCALE SPIKING NEURAL NETWORK EMULATION ARCHITECTURE

Submitted by
VITO PIRRONE






	   	   	   	   	  	  
	  




My heartfelt thanks go to all those who helped me in the writing of this master's thesis with suggestions, criticisms and observations: they deserve my gratitude, though I alone am responsible for any errors contained in this thesis.

I first thank Professor Jordi Madrenas Boadas, co-supervisor at the UPC of Barcelona, and Professor Eros Gian Alessandro Pasero, supervisor at the Polytechnic of Torino, for their support and their wise guidance.

Thanks to the UPC supervisor Dr. Giovanny Sanchez Rivera, with whom I worked on the thesis project: without his advice and suggestions this thesis would not exist.

I also thank the Polytechnic of Torino, which in these years of university study gave me a top-notch education and the skills needed to face the world of work.

I thank the Universitat Politecnica de Catalunya (UPC) of Barcelona and the Department of Advanced Hardware Architectures (AHA), where I spent one year of Erasmus exchange and had the opportunity to meet so many different ideas and realities, which gave me a valuable technical and human background.

I also thank the colleagues and friends who encouraged me or who spent part of their time reading and discussing the drafts of this work with me.

Last but not least, special thanks to the people dearest to me: my friends, my flatmates and in particular my family, who always supported me and helped me to achieve all my ambitions and goals in life. To my father Giovanni, my mother Mariella and my sister Laura this work is dedicated.

Vito Pirrone,
June 2014
CONTENTS

Introduction
Chapter 1. State of the art
  1.1 Spiking Neural Networks
  1.2 Leaky Integrate-and-Fire model
  1.3 FPGA implementations
    1.3.1 Bluehive system
    1.3.2 One million neuron single-FPGA neuromorphic system
    1.3.3 Ubichip system
  1.4 SNAVA: Spiking Neural-network Architecture for Versatile Applications
    1.4.1 Modules description
    1.4.2 AER system
    1.4.3 SNAVA processing phases
  1.5 Purposes of this thesis project
Chapter 2. Brief analysis of performance and resource occupation of SNAVA
  2.1 Time performance
  2.2 Area consumption
Chapter 3. Improvement proposals
Chapter 4. Implementation of SNAVA+
  4.1 Instruction set update
  4.2 From shadow registers to BRAM
  4.3 New instructions
    4.3.1 LOADNP instruction
    4.3.2 STORENP instruction
    4.3.3 LOADNPS instruction
  4.4 Leaky Integrate-and-Fire model implementation in SNAVA+
Chapter 5. Results
  5.1 Performance evaluation – Leaky Integrate-and-Fire model
  5.2 Processing speed performance
  5.3 Implementation and performance
    5.3.1 Area consumption
    5.3.2 Power consumption
Chapter 6. Conclusions and further research
  6.1 Conclusions
  6.2 Further research
Bibliography
Appendix A: Leaky Integrate-and-Fire model ASM code
Appendix B: Matlab code for creating CAM SNAVA.pkg
Appendix C: Matlab code for creating BRAMs initialization values
Appendix D: SNAVA+ experimental test

LIST OF FIGURES

1.1 Action potential waveform
1.2 Basic structure of the neuron
1.3 Bluehive rack box
1.4 Bluehive functional block diagram
1.5 One million single FPGA neuron block diagram
1.6 Ubichip architecture
1.7 SNAVA architectural overview
1.8 SNAVA PE structure
1.9 Connection Synaptic BRAM – active registers
1.10 SNAVA synaptic BRAM access switch
1.11 SNAVA CPE access control
1.12 Phases of operation of SNAVA
2.1 SNAVA processing time performance

LIST OF TABLES

2.1 Utilization summary of fully connected SNAVA
4.1 Removed SNAVA instructions
4.2 Number of neurons available
4.3 New instructions implemented in SNAVA+
5.1 Neuronal loop subroutines
5.2 Synaptic loop subroutines
5.3 Area occupation of SNAVA and SNAVA+
5.4 Area occupation of SNAVA+ for different numbers of synapses per PE
5.5 Synthesis time increasing the synapses
5.6 Power consumption of SNAVA and SNAVA+
5.7 Power consumption of a single cellular PE in SNAVA and SNAVA+
5.8 Power consumption of SNAVA+ for different numbers of synapses per PE
	  	  	  	  	  	  	  
Introduction 




[…] How is it possible for a slow, tiny brain, whether biological or electronic, to perceive, understand, predict, and manipulate a world far larger and more complicated than itself? How do we go about making something with those properties?
These are hard questions, but unlike the search for faster-than-light travel or an antigravity device, the researcher in Artificial Intelligence has solid evidence that the quest is possible. All the researcher has to do is look in the mirror to see an example of an intelligent system. […]

(Stuart J. Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, Introduction, 11-16, 3rd edition, 2009)

The brain is a massively parallel and efficient information processing system, with a computational architecture radically different from that of present-day computers. Characteristics of neural computation include event-based processing, robustness and redundancy, adaptation and learning, sensor–motor integration, cognitive tasks, as well as lower-level sensory information processing such as vision [1]. All these skills are achieved under severe constraints of size, weight, and energy resources.

Over the last half century, in the wake of the results obtained by neuroscientists, neurobiologists and mathematicians, engineers and computer scientists have envisioned building computers that match the processing capabilities of the most powerful computer in existence, the brain. Inspired by this aim, the fathers of computer science, Alan Turing and John von Neumann, looked to the brain in the 1950s to find a way to develop better computers.

In those years the term AI (artificial intelligence) was born, and research in this field took its first steps with scientists like John McCarthy, Marvin Minsky, Allen Newell, Herbert Simon and many others.

Thereafter, computer scientists started to develop artificial neural networks, which are basically circuits made by interconnecting artificial neurons that mimic the behavior of biological neurons. This type of neural network needs to be simulated by a custom system designed for this purpose, which can comprise hardware (analogue or digital) or software components and which computes mathematical models of biological neurons and biological synapses.

The first computational model for neural networks had already appeared in 1943, when McCulloch and Pitts proposed the very first model, called "threshold logic", in which the artificial neuron (the McCulloch–Pitts, or MCP, neuron) is represented by a simple Heaviside step function [3].

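As an illustration of the threshold-logic idea, the following minimal Python sketch models an MCP-style neuron as a Heaviside step applied to a weighted input sum. The function names, the unit weights and the threshold values are assumptions chosen for this example and are not taken from [3].

```python
def heaviside(x: float) -> int:
    """Heaviside step function: 1 when x >= 0, otherwise 0."""
    return 1 if x >= 0 else 0


def mcp_neuron(inputs, weights, threshold):
    """Threshold-logic unit: outputs 1 when the weighted input sum reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return heaviside(total - threshold)


# Illustrative parameters (assumed): two binary inputs with unit weights.
# Input order: (0, 0), (0, 1), (1, 0), (1, 1).
pairs = [(a, b) for a in (0, 1) for b in (0, 1)]
and_outputs = [mcp_neuron(p, (1, 1), threshold=2) for p in pairs]
or_outputs = [mcp_neuron(p, (1, 1), threshold=1) for p in pairs]

print(and_outputs)  # [0, 0, 0, 1]
print(or_outputs)   # [0, 1, 1, 1]
```

With these assumed parameters, changing only the threshold turns the same unit from a logical AND into a logical OR.
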
A further step was made by Hebb, who in 1949, in his book "The Organization of Behavior", proposed learning rules for the simple neuronal model then existing [4].

However, after the publication of the work by Marvin Minsky and Seymour Papert (Perceptrons: An Introduction to Computational Geometry, 1969), research in neural networks began to slow: Minsky and Papert formulated two key issues for the computational machines that process neural networks [6]. These problems concerned the inability of single-layer neural networks (perceptrons) to implement the XOR function, and the fact that the computers of those days could not manage the long run time required by complex neural networks.

Research therefore focused on the study of alternative models of neural networks closer to biological reality, the Spiking Neural Networks (SNN), which were initially proposed by Hodgkin and Huxley [5]; Hodgkin and Huxley proposed a detailed conductance model that attaches relevance to the spiking activity, which is in fact the core process underlying all the activities of the brain.

This model was obviously still far from the complexity of biological neural networks, but it was the starting point for more complex models, which will be mentioned in the next chapter of this work.

This trend led research to the third generation of neural networks, which raise the level of biological realism by using individual spikes. In fact, recent neurological research has shown that neurons encode information in the timing of single spikes, and not just in their average firing frequency.

Networks of spiking neurons are therefore more powerful than their non-spiking predecessors, as they can encode temporal information in their signals, a feature that allows spatio-temporal information to be incorporated in communication and computation, as real neurons do [10].

But what are the advantages of creating machines that emulate SNN? Why are universities, companies and research centers all over the world increasingly interested in neural networks?

The multiple uses of SNN emulators can be grouped into two main categories.

The first is to give neuroscientists and neurobiologists a tool for analysis and verification. As artificial neural networks get closer to their biological counterparts, it becomes possible to emulate parts of the nervous system in order to study processes that normally happen in the brain but are not yet completely understood.
Spiking neural network simulators make it possible to verify how well theoretical models approximate biological processes, and they can be used to study mental illnesses by emulating brain disorders, to test how drugs affect the brain, and in many other applications.

The second category of uses of SNN emulators is perhaps the most attractive one, and comprises all the tasks that a machine capable of learning and adapting to "environmental" conditions can achieve. Uses and goals that SNN research groups have achieved in recent years include data processing [23] [24], audio/video recognition [25] [26], real-time control [27], face and fingerprint recognition [28] [29], handwriting recognition [30], decision making [31] [32] and, according to the hopes of many people, probably in a rather distant future, the realization of intelligent humanoids [33] [34].

SNN emulators can be split into two classes, software emulators and hardware emulators:

• Software emulators are those developed to run on a general-purpose computer. This strategy only requires developing code and algorithms, mainly based on the implementation of one of the mathematical models of the brain (e.g. Leaky Integrate-and-Fire, Iglesias & Villa [19], etc.; for more details see section 1.2).
This is the approach adopted in many projects, such as Brian [7]; SpikeFun [8], a large-scale, biologically realistic neural network simulator for a standard PC based on the Izhikevich neuron model; and SpiNNaker [9], one of the most powerful because it simulates multiple neural models in real time within the same simulation.

• Hardware emulators involve the development of dedicated hardware to simulate SNN, and also comprise software simulators written for the specific hardware developed.
This class of emulators can be analogue or digital, depending on how the neurons are implemented: if information is processed directly with analogue components (transistors, capacitors, resistors, etc.), it is an analogue hardware emulator. In analogue emulators the circuits are more compact and the operation is closer to that of the real neuron, but inherent noise also appears.
Alternatively, digital hardware emulators are based on digital circuits with an architecture specific to SNN emulation, on which dedicated software runs.
The major advantage of analogue implementations is that they can be easily interfaced to the real world, as transducer signals are analogue in nature, while in the digital world physical quantities can assume only discrete values.
However, digital implementations permit more powerful data processing and SNN emulation, and in addition can be easily interfaced to tools for human understanding and analysis. Furthermore, they offer much more flexibility and programmability: with the same digital hardware, besides reconfiguring the synapses and varying the neuron parameters, it is possible to simulate different neural models, provided that a suitable architecture is developed.
The literature also describes hardware emulators that combine analogue and digital techniques, in which the distribution of the spikes across the system is entrusted to digital circuits, while the neural model is simulated using analogue components. In this case, the emulator is specific to a particular SNN model, because the technique used to build the (analogue) circuit that emulates the neural model depends completely on the model selected.

This thesis project focuses on the digital approach, in particular using FPGAs, which make it possible to build a specific architecture with customized functions for SNN simulation.

Another important advantage of FPGA systems over processor-based designs is their truly parallel mode of operation, which gives FPGAs a high capacity to perform the required information processing in real time.
Moreover, FPGAs are easily programmable and reusable.

Admittedly, FPGAs are less optimized and have a higher power consumption than ASICs, but hardware tested on an FPGA can eventually be implemented in an ASIC.

All these facts, together with the easy accessibility and low cost of FPGA circuits, make them attractive tools for the implementation of neurocontrollers.

The aim of this work is to present SNAVA+, a new version of SNAVA (Spiking Neural-network Architecture for Versatile Applications), a hardware architecture for SNN emulation. SNAVA+ is proposed as a release of SNAVA aimed at increasing performance in terms of the number of emulated neurons and at reducing area and power consumption.

SNAVA+ was developed during this thesis project and implemented on a Xilinx KC705 board, which contains a Kintex-7 FPGA (chapters 3, 4 and 5).

Chapter 1 gives an introductory presentation of Spiking Neural Networks (SNN), of a widely used SNN model, and of the state of the art of FPGA-based SNN emulators; in addition, the SNAVA architecture is presented and described in this chapter, since its basic structure has been maintained in SNAVA+.

Chapter 2 analyzes the performance of SNAVA, in order to show the bottlenecks of the old architecture and to propose the improvements (chapter 3) that led to SNAVA+.

Finally, chapter 6 draws conclusions on the performance achieved by the new architecture (shown in chapter 5), analyzes its strengths and weaknesses, and proposes further improvements for future work on a later version.

Chapter 1 – State of the art 
 
The purpose of this chapter is to briefly introduce spiking neural networks, giving a quick and simple background of the theoretical basis of SNN. In addition, the Leaky Integrate-and-Fire model, an SNN model that was implemented on SNAVA+, is briefly described.

The second part of the chapter focuses on the state of the art proper, briefly presenting some existing SNN hardware emulators implemented on FPGA. In particular, the SNN hardware emulator SNAVA, the previous version of SNAVA+ (the topic of this thesis project), is described in detail. A more accurate and deeper description is required because SNAVA+ retains a large part of the architectural structure of SNAVA and only modifies and optimizes it.

1.1 – Spiking Neural Networks 
 
The brain is a very complicated network of billions of interconnected neurons that cooperate with each other to efficiently process incoming signals and decide on actions. Typically, each neuron sends its signals out to over 10,000 other neurons, which makes clear how complicated the signal flow is. To put it mildly: we do not understand the brain that well yet. In fact, we do not even completely understand the functioning of a single neuron, and the chemical activity of the synapse already proves to be far more complex than first assumed. However, the rough concept of how neurons work is understood: neurons send out short pulses of electrical energy as signals if they have received enough of these themselves.

Starting from this very basic concept, neural network models closer and closer to biological reality have been developed over the years.

Spiking neural networks are the third generation of neural networks, and they raise the level of computational power and biological realism compared to their non-spiking predecessors, as they can encode temporal information in their signals.

In fact, biological neurons use short and sudden increases in voltage, called action potentials, spikes or pulses, to send information. Neurological research has shown that neurons encode information in the timing of single spikes, and not just in their average firing frequency.

This allows spatio-temporal information to be incorporated in communication and computation, as real neurons do. So instead of using rate coding, these neurons use pulse coding mechanisms, in which neurons receive and send out individual pulses, allowing multiplexing of information, for example the frequency and amplitude of sound [11].

Spikes play a crucial role in SNN, since biological neurons use them to communicate. Incoming signals alter the voltage of the neuron, and when this rises above a threshold value the neuron sends out an action potential itself. Due to their short duration (1 ms) and to their form and nature, we refer to them as spikes or pulses (see Figure 1.1).

The spike travels down the axon of the neuron (see Figure 1.2), and it is amplified at small interruptions of the myelin sheath that covers the axon (nodes of Ranvier) in order to minimize the information loss and speed up the propagation of the pulse.

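The difference between rate coding and pulse coding discussed in this section can be made concrete with a small Python sketch. It compares two spike trains that have the same average firing rate but different spike timing. The spike times, the 100 ms window and the function names are hypothetical values chosen only for illustration.

```python
def firing_rate_hz(spike_times_ms, window_ms):
    """Average firing rate in Hz over the observation window."""
    return len(spike_times_ms) * 1000.0 / window_ms


def inter_spike_intervals(spike_times_ms):
    """Time differences between consecutive spikes, in ms."""
    return [b - a for a, b in zip(spike_times_ms, spike_times_ms[1:])]


# Hypothetical spike trains observed over the same 100 ms window.
regular = [10, 30, 50, 70, 90]
bursty = [10, 12, 14, 80, 90]
window_ms = 100

# A rate code sees the two trains as identical.
print(firing_rate_hz(regular, window_ms), firing_rate_hz(bursty, window_ms))  # 50.0 50.0

# A timing-based code can tell them apart.
print(inter_spike_intervals(regular))  # [20, 20, 20, 20]
print(inter_spike_intervals(bursty))   # [2, 2, 66, 10]
```

Under these assumptions, any information carried by the spacing of the spikes is lost when only the average rate is read out.
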
	  	  
Figure	  1.1	  –	  Action	  potential	  (spike)	  waveform	  	  	  
Figure	  1.2	  –	  Basic	  structure	  of	  the	  neuron	  	  	  
(Royalty-Free image downloaded from www.dreamstime.com)

Spikes cannot simply cross the gap between one neuron and the next. They have to be handled by the most complicated part of the neuron: the synapse, formed by the end of the axon of the spiking neuron, a synaptic gap and the first part of the dendrite of the receiving neuron. The synapse is a very complicated signal pre-processor that generates a postsynaptic potential that reaches the target neuron.

When the sum of the potentials arriving at a given neuron reaches a threshold value, the neuron sends a spike down its axon. The neuron then enters a short period of rest (10 ms), the refractory period, in which it cannot send out another spike.

Unlike spikes, which are all very much alike, postsynaptic potentials differ in size. This is caused by the long- and short-term history of the synapse: outside influences shape the role of a synapse as a pre-processor. This is called synaptic plasticity: the modulation of the effect of an incoming presynaptic spike on the postsynaptic neuron, which forms the basis of most models of learning and development of neural networks.
1.2 – Leaky Integrate-and-Fire model
There are many different schemes for the use of spike timing information in neural computation. Two important SNN models are the “integrate-and-fire” model and the Hodgkin-Huxley model. Both belong to the general group of threshold-fire SNN models.

The Hodgkin-Huxley model is a very detailed conductance-based neuron model, but it is very complex and computationally expensive in numerical implementations. The integrate-and-fire model, instead, is simpler, easier to implement and commonly used in networks of spiking neurons, yet it approximates the detailed Hodgkin-Huxley model well. It is therefore usually considered a good trade-off between model complexity and implementation cost.

Given the scope of this thesis, and although SNAVA/SNAVA+ can model virtually any spiking neuron algorithm, only the Leaky Integrate-and-Fire (LIF) model is described here. It belongs to the category of “integrate-and-fire” models and was used to evaluate the performance of SNAVA+ (see Chapter 5). This model was implemented on SNAVA+ for use in sensory information processing applications.

The Leaky Integrate-and-Fire model is regarded as one of the simplest spiking neuron models. It has been implemented in embedded systems because of its minimal area requirements, and it can nevertheless be used to build powerful computing systems. The initial model was proposed by Iglesias and Villa [19]; it models the neuron as Leaky Integrate-and-Fire and applies spike-timing-dependent synaptic plasticity (STDP) in the synapses. To obtain the plain Leaky Integrate-and-Fire model used here, the mechanism that performs STDP in the synapses was removed from the original model.
The neuron model in LIF is described by equation 1.1:

$$V(t+1) = V_{rest}(t) + \bigl(1 - S_i(t)\bigr)\,\bigl(V(t) - V_{rest}(t)\bigr)\,\tau_{mem} + \sum_j W_j(t) \qquad (1.1)$$

where $V(t)$ is the membrane voltage, $V_{rest}(t)$ is the resting potential and $\tau_{mem}$ is the membrane voltage constant. When the membrane voltage exceeds a predefined threshold value ($V > V_{th}$), an action potential is generated and the membrane is reset to $V_{rest}(t)$.

$$\sum_j W_j(t) = S_j(t) \cdot P \qquad (1.2)$$

The input potential $W_j$ is a function of the state of the presynaptic spike $S_j$ and of the type of the synapse $P$, where $P$ is a binary parameter (positive or negative) that depends on the type of units in the network, excitatory or inhibitory.
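As an illustration of equations 1.1 and 1.2, the following is a minimal Python sketch of one discrete-time LIF update. It is not the SNAVA+ implementation. It assumes that S_i(t) is 1 when the neuron itself fired at the previous step and 0 otherwise, that equation 1.2 is summed over all presynaptic neurons j, and that P takes the values +1 (excitatory) or -1 (inhibitory). The class name `LIFNeuron` and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LIFNeuron:
    """Discrete-time LIF neuron following equations 1.1 and 1.2 (sketch)."""
    v_rest: float = 0.0   # resting potential V_rest (illustrative value)
    v_th: float = 1.0     # firing threshold V_th (illustrative value)
    tau_mem: float = 0.5  # membrane constant, applied as a decay factor as in eq. 1.1
    v: float = 0.0        # membrane voltage V(t)
    fired: bool = False   # S_i(t): True if this neuron fired at the previous step

    def step(self, pre_spikes, synapse_types):
        # Eq. 1.2, read as a sum over presynaptic neurons j (assumption):
        # each spike S_j in {0, 1} contributes P_j in {+1, -1}.
        w = sum(s * p for s, p in zip(pre_spikes, synapse_types))
        # Eq. 1.1: the factor (1 - S_i) drops the leak term after a spike,
        # which resets the membrane to V_rest before the new input is added.
        s_i = 1 if self.fired else 0
        self.v = self.v_rest + (1 - s_i) * (self.v - self.v_rest) * self.tau_mem + w
        self.fired = self.v > self.v_th
        return self.fired


neuron = LIFNeuron()
types = [+1, +1, -1]  # two excitatory synapses and one inhibitory synapse
trace = []
for spikes in ([1, 0, 0], [1, 0, 0], [0, 0, 0]):
    trace.append((neuron.step(spikes, types), neuron.v))
```

With these illustrative values the membrane integrates two input spikes, crosses the threshold at the second step and returns to the resting potential at the third.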
1.3 – FPGA implementations 
Using FPGAs for neural network implementation provides more flexibility and a faster and cheaper development than conventional application-specific VLSI neural chip design. Furthermore, with FPGAs it is possible to have a truly parallel mode of operation, which results in a high capacity for real-time processing. In addition, hardware capable of parallel processing is well suited to emulating the biological brain, which is itself a highly parallel system. The programmability of reconfigurable FPGAs makes fast special-purpose hardware available for a wide range of applications. This programmability could set the conditions to explore new neural network algorithms and problems of a scale that would not be feasible with a conventional processor [12].

There are several implementations in the literature, but only a few of them are described here.
The first two projects presented are the Bluehive system (1.3.1) and the One million neuron single-FPGA neuromorphic system (1.3.2).

Neither of these two designs considers synaptic plasticity, and the synapse has only a single parameter, i.e. the synaptic weight. The same holds for most SNN emulators implemented in FPGAs, since they follow the same design trend with minor changes in mechanisms and algorithms. Moreover, these systems are suitable only when the neural model is fixed, because their hardware architecture is built according to a fixed SNN model. So, despite the large number of neurons and synapses that can be simulated in some of these architectures, this is a limitation and a disadvantage with respect to the SNAVA (and SNAVA+) architecture, which is instead able to simulate different SNN models.

Finally, sections 1.3.3 and 1.4 describe briefly the predecessor of SNAVA, called Ubichip [16], and in more detail the SNAVA architecture itself.
1.3.1 – Bluehive system

Bluehive is a custom multi-FPGA machine targeted at scientific simulations, built with Altera DE4 boards. A particular feature of Bluehive is the communication-centric approach used to map neural networks. This contrasts with the approach more commonly found in the literature, which focuses on parallel computation.

Bluehive simulates neural networks using the Izhikevich spiking-neuron algorithm (an SNN model belonging to the integrate-and-fire family [14]). The design allows 64k neurons with 64M synapses per FPGA and is scalable to a large number of FPGAs [13].
Figure 1.3 – One of the Bluehive rack boxes containing 16 DE4 boards [13]

The design has been split into functional components, shown in the following Figure 1.4:
	  
	  
Figure	  1.4	  –	  Bluehive	  functional	  blocks	  	  	  	  
• Equation Processor: performs the neuron computation, i.e. evaluates the equations of the Izhikevich model.
• Fan-out Engine: takes neuron firing events, looks up the destination nodes to be notified and the delay to be applied, and forwards the events.
• Delay Unit: performs the first part of the fan-in phase. Messages are placed into one of sixteen 1 ms bins, thereby delaying them until the right 1 ms simulation time step.
• Accumulator:	  performs	  the	  second	  part	  of	  the	  fan-­‐in	  phase,	  accumulating	  weights	  to	  produce	  an	  I-­‐value	  for	  each	  neuron.	  
• Router:	  routes	  firing	  events	  destined	  for	  other	  processing	  nodes.	  
• Spike	  auditor:	  records	  spike	  events	  to	  output	  as	  the	  simulation	  results.	  
• Spike	  injector:	  allows	  external	  spike	  events	  to	  be	  injected	  into	  the	  simulated	  network.	  This	  is	  used	  to	  provide	  an	  initial	  stimulus.	  It	  could	  also	  be	  used	  to	  interface	  to	  external	  systems.	  
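The fan-in path listed above (Delay Unit followed by Accumulator) can be sketched as a ring of sixteen 1 ms bins. This is only an illustrative Python model, not Bluehive's hardware design. The names `DelayUnit`, `schedule`, `tick` and `accumulate` are hypothetical, and the sketch assumes that delays range from 0 to 15 ms.

```python
class DelayUnit:
    """Ring of sixteen 1 ms bins that holds messages until their time step."""
    NUM_BINS = 16

    def __init__(self):
        self.bins = [[] for _ in range(self.NUM_BINS)]
        self.now = 0  # index of the bin that belongs to the current time step

    def schedule(self, target, weight, delay_ms):
        # Assumption: a delay of 0 means "deliver in the current time step".
        if not 0 <= delay_ms < self.NUM_BINS:
            raise ValueError("delay must be between 0 and 15 ms")
        slot = (self.now + delay_ms) % self.NUM_BINS
        self.bins[slot].append((target, weight))

    def tick(self):
        """Return the messages due now and advance by one 1 ms step."""
        due = self.bins[self.now]
        self.bins[self.now] = []
        self.now = (self.now + 1) % self.NUM_BINS
        return due


def accumulate(messages):
    """Second part of fan-in: sum the weights into one I-value per neuron."""
    i_values = {}
    for target, weight in messages:
        i_values[target] = i_values.get(target, 0.0) + weight
    return i_values


unit = DelayUnit()
unit.schedule(target=3, weight=0.5, delay_ms=2)
unit.schedule(target=3, weight=0.25, delay_ms=2)
unit.schedule(target=7, weight=1.0, delay_ms=0)
per_step = [accumulate(unit.tick()) for _ in range(3)]
```

A ring buffer keeps the storage bounded: once a bin has been drained it is reused for messages that are sixteen steps further in the future.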
	  	  	  
1.3.2 – One million neuron single-FPGA neuromorphic system

This architecture provides a strategy for building a one-million-neuron system on a single off-the-shelf commercial FPGA [15]. It is capable of implementing simple integrate-and-fire and Izhikevich [14] neurons, with the ultimate aim of realizing a system for real-time multimodal scene analysis.
	  
Figure	  1.5	  –	  Neuron	  block	  diagram	  of	  the	  architecture	  [15]	  
It uses the Address Event Representation (AER) communication protocol for the spike relay; AER is an asynchronous handshaking protocol used to transmit signals between neuromorphic systems.

The mapping of the network and the synaptic weights are stored in external memory and fetched every cycle, while the main neuron engine is the one shown in Figure 1.5. Time multiplexing, implemented using a state cache, maps many neurons onto the same physical engine. The rest of the processing engine is fixed according to the algorithm.

The implementation is capable of emulating one million neurons. However, there are considerable drawbacks: it needs a large state cache, and it is time-consuming because all neurons are time-multiplexed onto the same physical engines. Moreover, the continuous communication with the external SRAM is the most time-critical process of the whole emulation cycle.
1.3.3 – Ubichip system

Ubichip was developed for a European project called Perplexus [16]. It is a SIMD processing system implemented on a customized Xilinx Spartan 3 development kit built as part of the Perplexus project.

One of the most interesting features that Ubichip offers for the emulation of SNN models is its multi-model support and scalability, a feature that has been exploited and enhanced in the later works, SNAVA and SNAVA+. The structure of the architecture is shown in Figure 1.6:
Figure 1.6 – Ubichip architecture [17]

The architecture consists mainly of three units:
• Configurable array or macrocell (MC): an N×N array (N is configurable) of 16-bit processors, each with two banks of 16-bit registers and a 16-bit ALU that performs all the arithmetic and logical operations required.
• AER controller: it consists of a control unit (encoder) and a CAM unit (decoder). It manages the communication between the PEs within the chip and also between different chips, which is performed using the Address Event Representation protocol.
• System manager: it comprises:
  ▪ configuration unit: manages the configuration of the different building blocks and allows the registers for the integrated debugging capabilities to be set.
  ▪ sequencer: controls the program flow.
  ▪ memory controller: the interface between the Ubichip and the external SRAM, which stores the instructions and the data (neural and synaptic parameters).
  ▪ CPU interface: the interface between the Ubichip and the external CPU, used for the initial configuration and for access to the chip for response analysis.

The analysis of the Ubichip architecture identified some bottlenecks and suggested changes that led to the birth of the SNAVA architecture, described in detail in Section 1.4.
1.4 – SNAVA: Spiking Neural-network Architecture for Versatile Applications

For more detailed information about the SNAVA architecture, please refer to the thesis of Giovanny Sanchez Rivera [20].

SNAVA is a Harvard hardware architecture designed to be a flexible Spiking Neural Network (SNN) emulator, capable of simulating any SNN model in which the communication between neurons takes place via spikes. The architecture has therefore been designed so that SNAVA can be re-programmed to perform different algorithms and SNN models. Hence, the aim of SNAVA is to emulate different SNN models efficiently while still providing a high level of versatility and scalability.

The core of the architecture consists of a scalable array of SIMD (Single Instruction Multiple Data) processing elements, a structure that guarantees fully parallel execution of the operations. This strategy is particularly effective for the simulation of SNN models, since they are parallel by nature.
The SNAVA architecture basically consists of a single control unit that executes one instruction at a time by controlling multiple processing elements that operate synchronously. Each processing element of the N×N array emulates up to 7 neurons, so at each step N×N neurons are processed in parallel.

The architectural overview of SNAVA is shown in Figure 1.7.
	  
	  
Figure 1.7 – SNAVA architectural overview [20]

As shown in Figure 1.7, the SNAVA architecture is composed of four modules:
• Processing	  Element	  array	  
• Execution	  module	  
• Access	  control	  module	  
• Spike	  generation	  
The flow of input and output data on SNAVA is handled by two communication protocols: Ethernet and AER. Accordingly, there are two modules (Ethernet user side and AER user side) whose function is to format the data to be sent to their respective communication channels.
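The SIMD organization described in this section, one control unit driving an N×N array of processing elements in lock-step, can be illustrated with a small Python model. This is a simplified sketch under stated assumptions and not the SNAVA instruction set: the opcodes `ACC` and `CLEAR`, the classes `ProcessingElement` and `ControlUnit`, and the per-neuron `inputs` and `state` lists are all hypothetical.

```python
class ProcessingElement:
    """One PE holding local data for up to `neurons_per_pe` emulated neurons."""

    def __init__(self, neurons_per_pe=7):
        self.inputs = [0] * neurons_per_pe  # local data, different in every PE
        self.state = [0] * neurons_per_pe   # per-neuron state updated by the PE

    def execute(self, opcode, neuron):
        # Hypothetical two-instruction set, for illustration only.
        if opcode == "ACC":
            self.state[neuron] += self.inputs[neuron]
        elif opcode == "CLEAR":
            self.state[neuron] = 0
        else:
            raise ValueError(f"unknown opcode: {opcode}")


class ControlUnit:
    """Single control unit that broadcasts each instruction to every PE."""

    def __init__(self, n, neurons_per_pe=7):
        self.neurons_per_pe = neurons_per_pe
        self.array = [[ProcessingElement(neurons_per_pe) for _ in range(n)]
                      for _ in range(n)]

    def capacity(self):
        # N x N processing elements, each emulating up to neurons_per_pe neurons.
        return len(self.array) ** 2 * self.neurons_per_pe

    def run(self, program):
        for opcode, neuron in program:   # one instruction at a time
            for row in self.array:       # in hardware all PEs act simultaneously
                for pe in row:
                    pe.execute(opcode, neuron)


cu = ControlUnit(n=2)
cu.array[0][1].inputs[3] = 5
cu.run([("ACC", 3), ("ACC", 3)])
```

Every PE receives the same instruction stream, but each one applies it to its own local data, which is what distinguishes SIMD execution from running independent programs.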
1.4.1 – Modules description 	  
Processing element array

The PE array is the heart of the architecture. Every single element, also called Cellular Processing Element, can theoretically simulate up to 7 neurons, thanks to the 7 register banks present in each PE. The structure of the Cellular Processing Element, which consists of several sub-modules, is shown in Figure 1.8.

The Central Processing Element (CPE) is a simple processor capable of performing arithmetic and logical operations, plus some operations specific to the emulation of SNNs. It contains a 16-bit ALU, 1 bank of "active registers" and n banks of "shadow registers" (in the present prototype n = 7). The active registers are those on which the ALU performs its operations, and therefore those with which the neuronal and synaptic parameters are processed. This bank consists of 8 registers, of which the first one (active register 0) is the accumulator.

The shadow registers, instead, store the neuronal parameters. For each neuron emulated by a single element of the array, there is a bank of 8 shadow registers.
When the neuronal parameters of a given neuron are processed, they are moved from the corresponding bank of shadow registers to the active bank so that the operations can be performed.

In addition, the Central Processing Element contains a 64-bit LFSR in Galois configuration, which works as a pseudo-random number generator.

The synaptic parameters, instead, are stored in the Synaptic BRAM (Block Random Access Memory available in the FPGA). Every Cellular Processing Element has one BRAM for storing the synapse parameters of all the neurons emulated by that processor of the array. Since these parameters are hardwired to the active registers of the CPE, a single-cycle instruction loads all the parameters of a single synapse at once. Similarly, a single instruction is required to save all the newly computed parameters back to the Synaptic BRAM (Figure 1.9).

The mapping shown in Figure 1.9 has been chosen in order to have only one synaptic parameter in each shadow register from register 4 to register 7. This mapping is fixed, and the only way to change it is to re-synthesize the architecture with a different mapping.

The Content Addressable Memory (CAM) emulates the behavior of the synapses. It reads the address transmitted on the AER bus, which corresponds to the presynaptic neuron ID, and generates the matches, which correspond to the post-synaptic spikes. The Spike Register stores all these matches.
	  
Figure	  1.8	  –	  SNAVA	  PE	  structure	  [20]	  
	  
	  
	  
	  
Figure	  1.9	  –	  Connection	  between	  Synaptic	  BRAM	  and	  active	  registers	  
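The 64-bit Galois LFSR used in the CPE as a pseudo-random number generator can be modelled in a few lines of Python. The feedback polynomial used in SNAVA is not stated in this section, so the sketch below assumes the commonly tabulated maximal-length tap set x^64 + x^63 + x^61 + x^60 + 1. The names `TAPS`, `lfsr_step` and `random_words`, and the choice of taking the low 16 bits as the output word, are likewise assumptions.

```python
# Tap mask for x^64 + x^63 + x^61 + x^60 + 1 (assumed polynomial, see text):
# bits 63, 62, 60 and 59 are toggled whenever a 1 is shifted out.
TAPS = 0xD800000000000000


def lfsr_step(state):
    """Advance a 64-bit Galois LFSR by one shift (right-shifting form)."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= TAPS
    return state


def random_words(seed, count):
    """Yield `count` 16-bit words, matching the 16-bit datapath of the CPE."""
    if seed == 0:
        raise ValueError("an all-zero state never leaves zero")
    state = seed
    for _ in range(count):
        state = lfsr_step(state)
        yield state & 0xFFFF  # assumption: the low 16 bits are used as output


words = list(random_words(seed=0x1, count=4))
```

In the Galois form the feedback is applied to several bits in parallel after a single shift, which makes the update cheap in hardware.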
	  
Execution module

The execution unit consists of a sequencer and an instruction Block RAM. The sequencer controls the whole control flow of the system and carries out the emulation of the neurons in two phases:
• Phase 1: the “spike processing phase”. The synaptic and neural parameters are calculated and any resulting spikes are generated, after which the sequencer halts.
• Phase 2: the “spike distribution phase”. The AER address generator module and the sequencer together generate a signal to indicate the beginning of the distribution phase.
The sequencer starts Phase 1 again only when it receives a notification from the AER control unit. The functions of the sequencer are:
1. Fetch and decode the instructions read from the instruction BRAM.
2. Execute the instructions, dispatching to the PE array those instructions that are to be handled by the PE array itself. The fetch, decode and execute operations are pipelined.
3. Provide the synapse count to the spike register and the synaptic BRAM of each PE, so that they deliver the right data to the CPE.
Access control module

The access control unit allows the sequencer and an external CPU to access the PE array. This module is used to initialize the system and also permits debugging. It consists of the BRAM access switch, the CPE access control and the Config Unit:
• BRAM access switch (Figure 1.10): it interfaces the CPU and the Sequencer with the Synaptic BRAMs of the PE array. Its task is to bring the enable signal, the read/write enable signal, the data and the address to the target BRAM. It also manages the data bus from the target BRAM to the CPU, when requested. The sequencer has read-only access to the synaptic BRAMs.
• CPE access control (Figure 1.11): it interfaces the CPU and the Sequencer with the target CPE of the PE array. The row and column signals select the target CPE, from which the ALU output and the status registers (saturation, carry and zero) can be read. It also carries the data to the target PE for CPU write access.
• Config Unit: the Config Unit has two main tasks.
(a) PE array selection lines: since the PE array selection lines are accessible by both the sequencer and the CPU, the Config Unit resolves the contention between them.
(b) Register bank for global SNAVA control: this register bank contains a set of configuration registers that allow global control of SNAVA.
	  	  	  
Figure	  1.10	  –	  SNAVA	  synaptic	  BRAM	  access	  switch	  [17]	  
	  
	  
Figure	  1.11	  –	  SNAVA	  CPE	  access	  control	  [17]	  
	  
	  





1.4.2 – AER system 
	  
The distribution of the spikes is performed by the AER communication scheme. Current FPGAs allow communication through high-speed serial transceivers, which are not compatible with older technologies. Hence, in order to provide more flexibility, an AER data interface that is independent of the transmission medium has been designed. Another advantage of this strategy is the possibility of connecting SNAVA to other systems that implement their communication using the AER scheme. The AER data interface sets the control signals to send spikes to and receive spikes from the AER module. It is capable of distributing the spikes at a high rate, a key feature for spike distribution in large-scale spiking neural networks.

Detailed information on the AER system used in SNAVA can be found in “AER-RT: Interfaz de Red con Topología en Anillo para SNN Multi-FPGA” [22]. The same AER interface used in SNAVA has been retained in SNAVA+, although it has become apparent that the current protocol limits the potential of SNAVA+. During this thesis project, mainly because of time constraints, it was not possible to redesign the AER interface.
Section 6.2, "Further research", proposes some changes to be made to the AER interface.
1.4.3 – SNAVA processing phases 
SNAVA operates in two phases: Spike Processing (Phase 1) and Spike Distribution (Phase 2). These phases of operation closely mirror the biological process flow:
• Spike Processing (Phase 1): the synaptic register, from which the incoming spikes are read, can be regarded as the dendrites of the biological neuron. Likewise, the Cellular Processing Element and the Sequencer work like the soma of the biological neuron. Hence, similarly to what happens in the biological brain, in Phase 1 the spikes are processed according to the algorithm implemented by the “soma”. Each PE can emulate multiple layers of neurons, so when a layer of neurons completes its processing, the respective spikes are generated by the AER Address Generator module. These pulses are then sent onto the AER bus, which acts like the axon of the biological neuron. At the same time the emulator continues with the algorithm execution by processing the next layer of neurons. The processing phase ends when the algorithm execution is completed, i.e. when all the layers of neurons have been processed.
• Spike Distribution (Phase 2): the spikes are distributed by broadcasting them on the AER bus. The synaptic contact between neurons is realized by the CAMs located in the PEs. Thus, the spikes that hit the right neuron are stored in the respective synaptic register. When all the spikes have been updated, Phase 2 ends and a new Phase 1 can start.
Figure 1.12 shows the correspondence between the biological neuron and the SNAVA hardware.
	  
	  
Figure	  1.12	  –	  Phases	  of	  operation	  of	  SNAVA	  [17]	  
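The two phases described above can be sketched as a toy software model. This is only an illustrative sketch, not the SNAVA implementation: the class `ProcessingElement`, the function `emulation_step`, the threshold-based "soma" and the use of `(pe_id, layer)` pairs as AER addresses are all assumptions introduced here. In the real hardware, spikes of a finished layer are sent to the AER bus while the next layer is being processed; the sketch serializes the two phases for clarity.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessingElement:
    """Toy PE: one neuron per virtual layer, CAM-based synaptic matching."""
    pe_id: int
    # cam[layer] = set of presynaptic AER addresses that neuron listens to
    cam: list
    threshold: int = 2  # hypothetical firing rule: number of input spikes
    # synaptic_reg[layer] = addresses that hit the neuron in the last Phase 2
    synaptic_reg: list = field(default_factory=list)

    def __post_init__(self):
        self.synaptic_reg = [set() for _ in self.cam]

    def process_layer(self, layer):
        """Phase 1 for one layer: placeholder 'soma' that fires when enough
        input spikes were stored. Returns an AER address or None."""
        fired = len(self.synaptic_reg[layer]) >= self.threshold
        self.synaptic_reg[layer].clear()
        return (self.pe_id, layer) if fired else None

    def receive(self, address):
        """Phase 2: CAM lookup; store the spike where a synapse matches."""
        for layer, entries in enumerate(self.cam):
            if address in entries:
                self.synaptic_reg[layer].add(address)


def emulation_step(pes, n_layers):
    """One Phase 1 + Phase 2 cycle; returns the spikes placed on the bus."""
    aer_bus = []
    for layer in range(n_layers):      # Phase 1: process layer by layer
        for pe in pes:
            spike = pe.process_layer(layer)
            if spike is not None:
                aer_bus.append(spike)
    for address in aer_bus:            # Phase 2: broadcast to every PE
        for pe in pes:
            pe.receive(address)
    return aer_bus
```

With two PEs whose CAMs point at each other and a threshold of one spike, a single injected spike bounces between the two neurons on successive calls of `emulation_step`.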
 
 
1.5 – Purposes of this thesis project 
The purpose of this work is to design a new version of the SNAVA architecture, called SNAVA+, which extends the capabilities of SNAVA towards a large-scale SNN emulator. SNAVA+ will preserve the basic hardware structure of SNAVA, while making several changes to optimize performance. In particular, the SNAVA+ project will focus on exploiting the available resources more efficiently, in order to reduce both the area occupied on the FPGA and its power consumption. A better use of the resources is, in fact, the key to increasing the capacity of the SNN emulator, i.e. the number of neurons and synapses emulated. The features and functionalities of SNAVA will not be affected in the new architecture, which will therefore retain the ability to support any SNN model and algorithm, one of the highlights of this SNN emulator.
Chapter 2 – Brief analysis of performance 
and area occupation of SNAVA 
This chapter describes the results of the FPGA implementation of SNAVA in terms of processing time and area. This analysis is the necessary starting point for identifying the aspects to be improved in order to obtain a more efficient FPGA implementation of the architecture, which is the main purpose of SNAVA+. The reported results refer to the implementation of the LIF (Leaky Integrate-and-Fire) model on SNAVA. More details about the algorithm, the conditions under which the architecture was synthesized and how to interpret these results can be found in Chapter 5, which compares the results of SNAVA+ with those of SNAVA.
 
 
2.1 Time performance 




 







Note: there are no fractional virtual layers; in SNAVA it is possible to have 1 or 2 neurons per processor. Graphs a and b were obtained by applying equations 5.3 and 5.4 (see Chapter 5) and are intended to show the trend of the execution time in relation to the number of synapses per PE and of virtual layers (number of neurons per PE).
a) Execution time vs. number of neurons and synapses per processing element, without display
b) Execution time vs. number of neurons and synapses per processing element, with display

Although SNAVA was designed to operate at 200 MHz, the system clock used to calculate the execution time is 125 MHz. Better results could of course be obtained by raising the clock frequency up to 200 MHz, but SNAVA runs at 125 MHz so that it shares the clock of the communication interfaces (the Ethernet communication system and the AER system), thus avoiding synchronization problems.
Since SNAVA, in its FPGA implementation, cannot emulate much more than 2 neurons and 50 synapses per processor in an array of 100 processors, it is reasonable to take 400 µs, the worst case (with display), as a realistic upper bound on the processing time of a single simulation step. The processing time is therefore far below the time resolution of biological neurons, which is around 1 ms.
Hence there is ample leeway to change the architecture: part of the processing-time performance can be traded for savings in other respects.
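The margin mentioned above can be made concrete with a short calculation. The sketch below only restates the figures given in the text (125 MHz, 200 MHz, 400 µs, 1 ms); the variable names are introduced here for illustration, and the 200 MHz estimate assumes that the number of clock cycles per simulation step does not change with the clock frequency.

```python
def cycles(time_s, clock_hz):
    """Number of clock cycles that fit in a time interval."""
    return round(time_s * clock_hz)


CLOCK_USED_HZ = 125e6     # clock used for the reported execution times
CLOCK_DESIGN_HZ = 200e6   # clock SNAVA was designed for
WORST_CASE_S = 400e-6     # upper bound on one simulation step (with display)
BIO_RESOLUTION_S = 1e-3   # approximate time resolution of biological neurons

worst_case_cycles = cycles(WORST_CASE_S, CLOCK_USED_HZ)
budget_cycles = cycles(BIO_RESOLUTION_S, CLOCK_USED_HZ)
slack_cycles = budget_cycles - worst_case_cycles

# Assumption: same cycle count per step when clocked at 200 MHz.
worst_case_at_design_clock_s = worst_case_cycles / CLOCK_DESIGN_HZ
```

Under these figures the worst case uses 50 000 of the 125 000 cycles available in 1 ms, which leaves 75 000 cycles per step that can be spent on slower but cheaper storage.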
2.2 Area consumption 













FF	   6623	  –	  2%	   15282	  –	  6%	   31532	  –	  8%	   58257	  –	  14%	   99487	  –	  24%	   407600	  
LUT	   9805	  –	  5%	   29595	  –	  15%	   67727	  –	  33%	   109801	  –	  54%	   171291	  –	  84%	   203800	  
Memory	  LUT	   149	  –	  1%	   149	  –	  1%	   149	  –	  1%	   149	  –	  1%	   149	  –	  1%	   64000	  
BRAM	   39	  –	  4%	   51	  –	  6%	   71	  –	  8%	   99	  –	  11%	   135	  –	  15%	   890	  
Table 2.1: Utilization summary of fully connected SNAVA with a single virtual layer (1 neuron per PE)

Table 2.1 clearly shows that LUT consumption plays a major role as the size of the array increases. More PEs mean a larger number of neurons, hence more logic to manage the processing and more, and larger, CAMs to realize the synaptic interconnection between neurons. Furthermore, the critical limitation is that the number of LUTs used grows dramatically as the number of synapses that can be emulated per PE increases. Together, these factors almost saturate the FPGA without yielding a particularly large number of neurons and synapses. The excessive use of LUTs is thus the factor that primarily limits the potential of the SNAVA emulator.
Regarding the FFs and, above all, the BRAMs, these resources are used only to a small extent compared with the LUTs. There is therefore a clear imbalance in the usage of the FPGA resources, which are not exploited in the most efficient way.
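The imbalance can be checked directly from the numbers in Table 2.1. The sketch below takes the last data column before the total (assumed here to be the largest array size listed) together with the available resources, and recomputes the utilization; the dictionary and function names are introduced only for this illustration.

```python
# (used, available) pairs copied from Table 2.1, last array size listed
largest_array = {
    "FF": (99487, 407600),
    "LUT": (171291, 203800),
    "Memory LUT": (149, 64000),
    "BRAM": (135, 890),
}


def utilization(used, available):
    """Utilization of one resource type, in percent."""
    return 100.0 * used / available


usage = {name: utilization(*pair) for name, pair in largest_array.items()}
bottleneck = max(usage, key=usage.get)      # resource closest to saturation
unused_bram = 100.0 - usage["BRAM"]         # share of BRAM left idle
```

The LUTs come out as the resource closest to saturation, while roughly 85% of the BRAM remains unused.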
 
 
 Figure	  2.2:	  LUTs	  and	  FFs	  consumption	  VS	  array	  size	  in	  SNAVA	  
 
Figure 2.3, instead, shows the resource occupation of each module of the architecture. As expected, the processor array (PU_inst) occupies most of the area, since it contains the cellular processing elements, the BRAMs and, above all, the CAMs, whose size grows with the number of synapses.
Figure 2.3: Resource occupation of each module of SNAVA [17]
(Pie chart of SNAVA_inst_tot: PU_inst 86%, AER_CU 9%, cpu_access_int 3%, seq_inst 2%, CONFIG_inst 0%)
Chapter 3 – Improvement proposals 
In Chapters 1 and 2 the characteristics of SNAVA and the results of its implementation were analyzed, in order to highlight the key features and to identify possible improvements to the architecture.
In particular, the analysis of the FPGA area consumption showed that the excessive consumption of LUTs is the factor that most limits the potential of SNAVA in terms of the number of neurons and, above all, of synapses that can be emulated. Therefore, the first objective in the design of the SNAVA+ architecture is to make better use of the available resources.
To reduce the use of LUTs (and of all resources in general), the first step is to analyze the architecture and identify structures and functionalities that can be simplified or even eliminated. The simplification, however, must not affect the features and strengths of the emulator: SNAVA+ will preserve the ability to support any SNN model and algorithm, which is one of the highlights of this SNN emulator.
Another aspect on which SNAVA+ will especially focus is increasing the number of neurons that can be emulated on a single chip.
The structure of SNAVA can theoretically reach up to 700 emulated neurons. Each processor of the array contains 7 banks of eight 16-bit registers, which store the neuronal parameters of the emulated neurons. Each neuron is assigned one register bank, giving seven 'virtualization layers', i.e. 7 neurons per processor; with the 100 processors of the array this yields the 700 neurons.
In practice, however, only up to two neurons per processor can be implemented on the FPGA, because of the excessive area consumption of this strategy. Indeed, increasing the number of virtualization levels drastically increases the consumption of FFs. It also requires a greater number of synapses, so that each neuron has a sufficient number of them.
The proposed solution to this problem is to move the storage of the neuronal parameters from the internal registers of the processor (the shadow register banks) to BRAMs internal to each PE of the array.
This strategy takes advantage of the large number of available BRAMs: about 85% of the BRAMs are not used in SNAVA. The change will lower the utilization of the other resources. Furthermore, each BRAM contains 1024 words of 32 bits, which makes it possible to increase significantly the number of neurons emulated by each processor. In this way the number of neurons will no longer be so tightly bound to, and limited by, the resources available on the FPGA.
The new strategy requires changes to the architecture on multiple levels. The structure of the processing will change, for both the neuronal and the synaptic parameters. New instructions will therefore be necessary to exploit the changes to the architecture, which means working first on the control unit and then on the datapath of the processors. The goal is to make these changes as non-invasive as possible, so as not to disrupt or undermine the global behavior of the machine.
The other aspect, which comes as a consequence of these changes, is the increase in the number of synapses, which is a key objective: the complexity and the potential of an SNN grow with the number of neurons and, especially, of synapses.
The structural changes that characterize SNAVA+ are described in more detail in Chapter 4. Chapter 5 focuses, instead, on the results obtained in the implementation of SNAVA+ on the FPGA.
Chapter 4 – Implementation of SNAVA + 
This chapter describes in general terms the changes made to the architecture of SNAVA that characterize SNAVA+. The first part deals with the optimization and the removal of some unnecessary features of SNAVA. The second part focuses on the architectural changes proper.
 
4.1 Instruction set update 
The first step towards an optimized and more powerful version of the architecture was to identify unnecessary elements and functionality that is rarely (or never) used: the result of this analysis was a "lightening" of the instruction set. This recovered an amount of FPGA resources and area that is not striking, but still sufficient to justify eliminating a few minor features that are of little use and easily replaceable without substantial drawbacks.
Table 4.1 below summarizes the usage of the removed instructions in three different assembler codes based on three reference SNN models (LIF, Izhikevich [14] and Iglesias & Villa [19]).
Instructions	LIF	Izhikevich	Iglesias & Villa
TOTAL N. of INSTRUCTIONS	139	145	326
SET	0 (0%)	0 (0%)	0 (0%)
RTL	4	3 (2%)	13 (4%)
RTR	6	6 (4%)	21 (6.4%)
INC	1	1 (0.7%)	3 (0.9%)
DEC	0 (0%)	0 (0%)	1 (0.3%)
NEG	0 (0%)	0 (0%)	0 (0%)
XOR	0 (0%)	0 (0%)	0 (0%)
SWAPM	0 (0%)	0 (0%)	0 (0%)
	  
Table	  4.1	  -­‐	  Removed	  SNAVA	  instructions	  	  	  
	  It	  is	  thus	  clear	  that	  there	  are	  two	  main	  cases:	  	  
• Some instructions are never used in these three models (and it can reasonably be assumed a priori that their use is almost nil in any SNN emulation application). In any case, these operations are fully replaceable in software using the other available instructions:
§ SET reg: sets all the bits of the selected register to ‘1’. This instruction can be replaced by “LDALL reg,SETVALUE”, which loads the constant SETVALUE into the selected register of all the PEs of the array. SETVALUE is a constant that has to be declared in the program with the value “0000FFFF”. Replacing SET requires no extra time, because it is substituted by a single instruction. The only drawback is that a constant (for example SETVALUE) has to be declared in the program, which increases the program size, but by a negligible amount.
§ NEG reg: loads into the accumulator the value of reg with the opposite sign. This instruction can be replaced by two instructions. First the accumulator is reset with “RST R0” (register R0 is the accumulator). Then “SUB reg” loads the result of the operation R0 - reg into the accumulator. Replacing NEG requires one extra clock cycle.
§ XOR reg: loads into the accumulator the result of the logical XOR between reg and the accumulator itself. The XOR operation can be decomposed in several ways. The one that requires the lowest number of operations in SNAVA+ is: A ⊕ B = (A or B) and (not (A and B)). In this case XOR reg becomes:
1. MOVR reg_aux1 (moves the original value of the accumulator into one of the registers of the processor, chosen by the programmer)
2. AND reg (loads into the accumulator the logical AND between the accumulator and reg)
3. INV R0 (performs a logical NOT on the accumulator and stores the result in the accumulator itself)
4. MOVR reg_aux2 (moves the value of the accumulator into one of the registers of the processor, chosen by the programmer)
5. MOVA reg (moves the value of reg into the accumulator)
6. OR reg_aux1 (performs a logical OR between the accumulator and reg_aux1, which contains the original value of the accumulator)
7. AND reg_aux2 (performs a logical AND between the accumulator and the value stored in reg_aux2. The result is stored in the accumulator and is exactly the XOR between reg and the original value of the accumulator)
Replacing the XOR instruction therefore costs 6 extra clock cycles and requires two support registers. This is perfectly feasible in SNAVA+, although more complex, as it requires more instructions and special attention to the "lifetime" of the registers: the previous values of the two support registers are overwritten by the sequence.
§ SWAPM reg: swaps the active register “reg” with the corresponding shadow register of every shadow register bank. This operation can be replaced by multiple SWAPS instructions (SWAPS reg), each of which performs the same swap between the active and shadow registers. Moreover, in SNAVA+ the SWAPM operation no longer makes sense, because each PE has only one shadow register bank (see section 4.2).
• Other instructions, namely RTR, RTL, INC and DEC, are instead used fairly often, but they are special cases of more general instructions and are therefore easily replaceable in hardware, without significant drawbacks in terms of performance:
§ RTR: performs a right shift of 1 bit, so it can obviously be replaced by the general right shift of n positions (SHRN). There is no extra time or program size, because RTR is replaced by a single instruction. The drawback is a slight increase in power consumption, because the general shift operation has to keep track of the number of shifts already performed.
§ RTL: performs a left shift of 1 bit, so it can obviously be replaced by the general left shift of n positions (SHLN). There is no extra time or program size, because RTL is replaced by a single instruction. The drawback is a slight increase in power consumption, because the general shift operation has to keep track of the number of shifts already performed.
§ INC: adds 1 to the value of the accumulator. It is a particular case of the ADD operation, so it can be completely replaced by it: first “LDALL reg,ONE” (ONE being a constant previously defined by the programmer with the value 0x00000001), then “ADD reg”, which in this case adds ONE to the accumulator and stores the result in the accumulator itself. This costs one extra clock cycle and one extra instruction of program size (and possibly a constant declaration).
§ DEC: subtracts 1 from the value of the accumulator. It is a particular case of the SUB operation, so it can be completely replaced by it: first “LDALL reg,ONE” (ONE being a constant previously defined by the programmer with the value 0x00000001), then “SUB reg”, which in this case subtracts ONE from the accumulator and stores the result in the accumulator itself. This costs one extra clock cycle and one extra instruction of program size (and possibly a constant declaration).

Although each of these operations, taken alone, consumes only a small amount of FPGA area (LUTs), the logic required to implement them is synthesized in each of the 100 processors of the array, so the saving obtained by deleting them is substantial. The results are shown and discussed in Chapter 5.
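The software replacements listed above can be checked with a small register-machine model. The sketch below is not the SNAVA+ simulator: the function `run`, the register names `REG`, `AUX`, `AUX1`, `AUX2` and the instruction semantics are assumptions derived only from the descriptions in this section (R0 is the accumulator, registers are taken as 16 bits wide with wrap-around arithmetic, and every instruction is assumed to take one clock cycle).

```python
MASK = 0xFFFF  # assumption: 16-bit registers with wrap-around arithmetic


def run(program, regs, constants=None):
    """Run (mnemonic, operand) pairs on a register dict; R0 is the accumulator."""
    regs = dict(regs)
    constants = constants or {}
    for op, arg in program:
        acc = regs["R0"]
        if op == "RST":        # clear a register
            regs[arg] = 0
        elif op == "INV":      # bitwise NOT of a register, in place
            regs[arg] = ~regs[arg] & MASK
        elif op == "MOVR":     # accumulator -> register
            regs[arg] = acc
        elif op == "MOVA":     # register -> accumulator
            regs["R0"] = regs[arg]
        elif op == "AND":
            regs["R0"] = acc & regs[arg]
        elif op == "OR":
            regs["R0"] = acc | regs[arg]
        elif op == "ADD":
            regs["R0"] = (acc + regs[arg]) & MASK
        elif op == "SUB":
            regs["R0"] = (acc - regs[arg]) & MASK
        elif op == "LDALL":    # load a declared constant into a register
            reg, name = arg
            regs[reg] = constants[name] & MASK
        else:
            raise ValueError(f"unknown instruction {op}")
    return regs


NEG_SEQ = [("RST", "R0"), ("SUB", "REG")]
XOR_SEQ = [("MOVR", "AUX1"), ("AND", "REG"), ("INV", "R0"), ("MOVR", "AUX2"),
           ("MOVA", "REG"), ("OR", "AUX1"), ("AND", "AUX2")]
INC_SEQ = [("LDALL", ("AUX", "ONE")), ("ADD", "AUX")]
DEC_SEQ = [("LDALL", ("AUX", "ONE")), ("SUB", "AUX")]


def extra_cycles(sequence):
    """Cycles added when one removed instruction becomes a sequence."""
    return len(sequence) - 1
```

For example, `run(XOR_SEQ, {"R0": 0b1100, "REG": 0b1010})["R0"]` gives `0b0110`, and `extra_cycles` reproduces the overheads quoted in the text (6 cycles for XOR, 1 for NEG, INC and DEC).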
4.2 From shadow registers to neuronal BRAM
In the SNAVA architecture, each processor of the array can have up to 7 banks of registers (called shadow register banks), used to store the neuronal parameters. Each neuron has its own shadow register bank, so the maximum number of neurons per processor is theoretically seven. However, because of the limited resources on the FPGA, only up to two neurons per processor can be implemented.
In order to increase the number of neurons, in SNAVA+ the storage of the neuronal parameters is moved from the shadow register banks to BRAMs internal to each PE of the array. For each neuron emulated by a PE, a configurable number of memory words is allocated to store the neuronal parameters of that specific neuron. Depending on the complexity of the model and of the algorithm implemented in SNAVA+, the user can choose how many BRAM words are needed to store the neuronal parameters. This number, between one and eight, is configured by writing the n_w_reg register (located in the sequencer) from an external CPU via the Ethernet interface.
Figures 4.1 and 4.2 below illustrate the new structure of the single processing element of the array (also called Cellular Processing Element, CPE) with the neuronal BRAM, and the structure of the CPE without the shadow registers. Comparing figure 4.1 with the SNAVA CPE structure (see figure 1.8), the main difference is the presence of the neuronal BRAM. Furthermore, as shown in figure 4.2, SNAVA+ has only one bank of “shadow” registers, which is connected to the data bus of the neuronal BRAM.
	  	  
Figure	  4.1	  -­‐	  SNAVA+	  cellular	  processing	  element	  
	  
	  
Figure	  4.2	  -­‐	  	  SNAVA+	  CPE	  	  
The increase in the number of emulated neurons is possible mainly thanks to a trade-off between area occupation and processing time: moving the neuronal parameters from the shadow registers to BRAM (see Figure 4.3) increases the execution time, but at the same time drastically reduces the number of LUTs used on the FPGA, which was the critical resource limiting the maximum number of neurons in the previous architecture. More detailed data about the results are provided in Chapter 5.
In SNAVA the maximum number of neurons per processor was 7, whereas the new architecture could potentially reach up to 100 neurons per processor (ideally 128, but it is preferable not to push the available resources to the limit, and 100 can be considered a more standard value in an SNN hardware emulator).
In fact, the available BRAMs have 32 bits of data and 10 bits of address, and every neuron can have at most 8 BRAM words assigned:
\[ \text{Max. number of neurons} = \frac{2^{10}}{8} = 128 \]
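The capacity formula above can be generalized to the configurable number of words per neuron. The text only states the case of 8 words per neuron; the sketch below assumes, as a hypothesis, that the parameter blocks of the neurons are packed contiguously in the neuronal BRAM, so that fewer words per neuron would leave room for more neurons. The function names and the address layout are illustrative and are not taken from the SNAVA+ design.

```python
BRAM_ADDRESS_BITS = 10                 # from the text: 10 address bits
BRAM_WORDS = 2 ** BRAM_ADDRESS_BITS    # 1024 words of 32 bits
MAX_WORDS_PER_NEURON = 8               # upper limit of the configurable value


def max_neurons(words_per_neuron):
    """Neurons that fit in one neuronal BRAM for a given block size."""
    if not 1 <= words_per_neuron <= MAX_WORDS_PER_NEURON:
        raise ValueError("words_per_neuron must be between 1 and 8")
    return BRAM_WORDS // words_per_neuron


def word_address(neuron, word, words_per_neuron):
    """Hypothetical contiguous layout: neuron blocks stored back to back."""
    if not 0 <= word < words_per_neuron:
        raise ValueError("word index outside the neuron block")
    if not 0 <= neuron < max_neurons(words_per_neuron):
        raise ValueError("neuron index outside the BRAM")
    return neuron * words_per_neuron + word
```

With 8 words per neuron this gives the 128 neurons of the formula, and the last word of neuron 127 falls on address 1023, the last location of the BRAM.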
	  	  
	  	  
Figure	  4.3	  -­‐	  Neuronal	  parameters	  in	  SNAVA	  and	  in	  SNAVA+,	  shadow	  registers	  VS	  BRAM	  	  
Moreover, thanks to the savings in area on the FPGA, in SNAVA+ it is also possible to increase the number of synapses that can be emulated by each processing element of the array; this is illustrated in detail in Chapter 5.
Increasing the number of virtualization levels so much, however, would require a redesign of the AER hardware, a task that is beyond the scope of this thesis, mainly for reasons of time. In a further development of the architecture it would therefore be appropriate to modify the AER interface, in order to exploit fully the potential of the hardware that SNAVA+ offers.
Table 4.2 compares the maximum number of neurons available on the whole FPGA for various array sizes of SNAVA, of SNAVA+ and of a likely future version that takes full advantage of all the available hardware of SNAVA+.
Array size	SNAVA	SNAVA+	SNAVA+ fully exploited
2 x 2 SIMD	8	28	400
4 x 4 SIMD	32	112	1600
6 x 6 SIMD	72	252	3600
8 x 8 SIMD	128	448	6400





(50000	  synapses)	  	  
Table 4.2 - Number of neurons available on the whole chip; for the maximum array size, the maximum number of synapses achievable with the FPGA used (Xilinx Kintex-7) is also shown

The SNAVA+ architecture has only one bank of 16-bit shadow registers per processing unit, used to temporarily load and store the neuronal parameters from/to the neuronal BRAM (BRAM_N). Hence, the only role of the shadow registers is to act as an intermediary between BRAM_N and the active register bank, in which the parameters are processed.

The mapping of the neuronal parameters in the shadow registers depends on the SNN model used and also on the number of BRAM words used per neuron (n_w_neuron).
	  
4.3 New instructions 
Due to the architectural changes explained in section 4.2, SNAVA+ requires the introduction of new instructions, essentially to manage the neuronal parameters stored in the neuronal BRAMs. Creating the new operations required, firstly, modifying the sequencer by adding new states to the FSM of the control unit and, secondly, modifying the processing unit, that is, the processor and its interface with the memory. These changes needed special care, especially in the management of timing, since the risk that such invasive changes would affect the functionality of the architecture was very high. In particular, pipeline registers were inserted to appropriately delay the signals that propagate from the sequencer to the core of the processor, in order to avoid creating critical paths when implementing the architecture on the FPGA. As a result, the post-synthesis reports showed a slack greater than or equal to 0 for every path.

The new instructions added are:

• LOADNP
• STORENP
• LOADNPS
	  








Instruction   Opcode   Function
LOADNP        110010   shadow_registers <= BRAM_N
STORENP       110011   BRAM_N <= shadow_registers
LOADNPS       110100   shadow_register(i*) <= BRAM_N
                       BRAM_N <= shadow_register(i*)
	  
	   	   Table	  4.3	  –	  New	  instructions	  implemented	  in	  SNAVA+.	  
*	  i	  indicates	  that	  the	  shadow	  register	  can	  be	  any	  of	  the	  8	  registers	  of	  the	  bank	  
	  
4.3.1 LOADNP instruction 
The LOADNP instruction loads the neuronal parameters stored in the neuronal BRAM into the shadow registers. This operation should therefore be placed in the neuronal loop, before the neuronal parameters of each neuron are processed.

This operation needs to halt the sequencer for a number of cycles equal to the value of n_w_neuron, stored in the configurable register “n_w_reg”. To do this, when a LOADNP instruction is read from the instruction memory, the specific halt signal (halted_l) is held high until the number of BRAM_N words read equals the number of words per neuron (n_w_neuron); meanwhile the program counter is not incremented and the sequencer is stopped.

The flow chart in Figure 4.4 shows in detail how the LOADNP instruction works from the point of view of the sequencer. When the opcode of the fetched instruction is equal to LOADNP, the program counter is incremented (note that the PC is incremented only once) and the FSM of the sequencer goes to the S_LOADNP state. Here the enable signal of the counter (inc_n_word_neu_count) is activated, the address of BRAM_N is incremented and the sequencer is stopped by the halted_l signal.

To prevent the sequencer from continuing to read the next instruction from memory during the LOADNP operation, the read_BRAM signal is disabled, thus avoiding the loss of a certain number of instructions.

Only when all the BRAM_N words of the neuron have been read and all the neuronal parameters have been moved into the shadow registers does the LOADNP instruction end, and the next instruction can be fetched.

The LOADNP instruction requires n clock cycles, where n is the number of neuronal BRAM words used per neuron (n_w_neuron).
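The halting behaviour described above can be illustrated with a small behavioural model in Python. This is only a sketch under simplifying assumptions: the BRAM is modelled as a list, one word is transferred per cycle, pipeline registers are ignored, and the names run_loadnp and base_addr are hypothetical.

```python
def run_loadnp(bram_n, base_addr, n_w_neuron):
    """Behavioural model of the S_LOADNP state: one BRAM_N word is read per cycle."""
    shadow_words = []   # words moved towards the shadow registers
    trace = []          # (cycle, halted_l, BRAM_N address) for each stalled cycle
    addr = base_addr
    word_count = 0      # models the counter enabled by inc_n_word_neu_count
    cycle = 0
    while word_count < n_w_neuron:
        halted_l = True                      # sequencer stalled, no instruction fetch
        shadow_words.append(bram_n[addr])    # read the current BRAM_N word
        trace.append((cycle, halted_l, addr))
        addr += 1                            # BRAM_N address is incremented
        word_count += 1
        cycle += 1
    return shadow_words, cycle, trace


# Example: a neuron whose parameters start at address 4 and occupy 2 words.
bram = list(range(100, 116))
words, cycles, trace = run_loadnp(bram, base_addr=4, n_w_neuron=2)
print(words, cycles)
```

In this example the model stalls for two cycles and returns the two words stored at addresses 4 and 5, matching the statement that LOADNP takes n_w_neuron clock cycles.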
	  
4.3.2 STORENP instruction 
The STORENP instruction stores the neuronal parameters contained in the shadow registers into the neuronal BRAM. This operation should therefore be placed at the end of the neuronal loop, after the neuronal parameters of each neuron have been processed.

This operation needs to halt the sequencer for a number of cycles equal to the value of n_w_neuron, stored in the configurable register “n_w_reg”. To do this, when a STORENP instruction is read from the instruction memory, the specific halt signal (halted_l) is held high until the number of BRAM_N words written equals the number of words per neuron (n_w_neuron); meanwhile the program counter is not incremented and the sequencer is stopped.

The flow chart in Figure 4.5 shows in more detail how the STORENP instruction works from the point of view of the sequencer. For a detailed description of the flow chart, refer to the LOADNP section (4.3.1), since from the point of view of the sequencer the LOADNP and STORENP instructions work in the same way.

Like LOADNP, the STORENP instruction requires n clock cycles, where n is the number of neuronal BRAM words used per neuron (n_w_neuron).
4.3.3 LOADNPS instruction 
The LOADNPS instruction moves one neuronal parameter of the neuron whose synapses are being processed from the neuronal BRAM to the shadow register used to store this parameter. In the synthesized architecture the LIF model was implemented, so the neuronal parameter that must also be loaded from the BRAM in the synaptic loop is SumW (sum of weights).

In this implementation of the LIF model, the SumW parameter is stored in shadow register 2 (SR2). Note that, to implement in SNAVA+ another SNN model whose neuronal parameters are not compatible with those of the LIF model (it may have more or fewer parameters, or require more or fewer bits), the connection between the data bus of the neuronal BRAM and the shadow registers of the PE must be changed and the architecture re-synthesized.

The LOADNPS instruction should be placed in the synaptic loop; although it is executed at each iteration of the loop, the parameter SumW is actually loaded from the BRAM into SR2 only once for each neuron. A hardware control prevents the value of SumW from being reset at every iteration to the default value or to the value it had before the synaptic loop (see Figure 4.6). Furthermore, at each change of neuron, before loading the SumW value of the next neuron, LOADNPS stores the previous SumW in the neuronal BRAM location corresponding to the previous neuron. This operation is performed automatically whenever there is a change of neuron in the processing of the synapses. It is therefore important to note that no specific instruction does this: it is an operation carried out at low level and “hidden” from the user's point of view. Figure 4.6 shows an example of how LOADNPS works.

The LOADNPS instruction requires only one clock cycle.
	  
Figure 4.6 – Example of how LOADNPS works. In this case each processor has n neurons and the first neuron has 10 synapses.
A) The synaptic loop is processing the first synapse → LOADNPS moves the parameter (in this case SumW) from the BRAM location of neuron 1 to the shadow register (in this case shadow register 2).
B) The synaptic loop continues processing with the same value of SumW (stored in SR2) until synapse 10. The LOADNPS instruction then detects that there is a change of neuron, so SumW is stored in the BRAM at the end of the processing of synapse 10.
C) Since the processing of the synapses of neuron 2 is starting, the new value of SumW is loaded into SR2.
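The load-once and automatic write-back behaviour of LOADNPS can be mimicked with a short Python model. It is a sketch only: the class name, the one-slot-per-neuron BRAM layout, the zero-based neuron indices and the use of a simple addition as a stand-in for the synaptic processing are all assumptions made for illustration.

```python
class LoadNpsModel:
    """Behavioural sketch of the hidden load/write-back control of LOADNPS."""

    def __init__(self, bram_sumw):
        self.bram_sumw = list(bram_sumw)  # one SumW slot per neuron (assumed layout)
        self.sr2 = 0                      # shadow register 2
        self.current = None               # neuron whose SumW is currently held in SR2
        self.loads = 0                    # number of real BRAM loads performed

    def loadnps(self, neuron):
        """Executed once per synapse; touches the BRAM only when the neuron changes."""
        if neuron != self.current:
            if self.current is not None:
                # Write the previous neuron's SumW back to its BRAM location.
                self.bram_sumw[self.current] = self.sr2
            self.sr2 = self.bram_sumw[neuron]  # load SumW of the new neuron
            self.current = neuron
            self.loads += 1


# Neuron 0 has 10 synapses of weight 3, neuron 1 has 2 synapses of weight 5.
model = LoadNpsModel([0, 0])
for neuron, weight in [(0, 3)] * 10 + [(1, 5)] * 2:
    model.loadnps(neuron)
    model.sr2 += weight  # stand-in for the synaptic processing that updates SumW
print(model.bram_sumw, model.sr2, model.loads)
```

Although loadnps is called twelve times, only two real loads happen, and the accumulated SumW of the first neuron reaches the BRAM only when the second neuron starts. How the last neuron's value is written back is not covered by this model.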
	  
4.4 Leaky Integrate-and-Fire model implementation in SNAVA +
One of the main features of SNAVA+ is the ability to support any SNN model in which the communication between neurons takes place via spikes, but in order to describe the architectural changes more clearly it is better to give an example with a specific SNN model.

Consider the Leaky Integrate-and-Fire (LIF) model. In this case the neuronal parameters are:

• Si: post-synaptic spike – 1 bit
• Vi: membrane potential – 16 bits
• SumW: sum of weights – 16 bits
• Mi: memory of time interval between latest spikes – 10 bits
• Tref: refractory time period – 3 bits
	  
These parameters require a total of 46 bits; since each memory word is 32 bits wide, two words are enough. Therefore the value to be written in the configuration register n_w_reg is 2 (the number of BRAM words per neuron is 2). The assembler program used to implement the LIF model is shown and commented in Appendix A.

In the SNAVA+ architecture the neuronal parameters are mapped in the neuronal BRAMs as shown in Figure 4.7.
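The word count above can be reproduced with a few lines of Python. The dictionary and function names are illustrative, and the calculation only gives the minimum number of 32-bit words; the actual bit placement is the one shown in Figure 4.7.

```python
import math

# Bit widths of the LIF neuronal parameters listed above.
LIF_PARAM_BITS = {"Si": 1, "Vi": 16, "SumW": 16, "Mi": 10, "Tref": 3}
BRAM_WORD_BITS = 32


def words_per_neuron(param_bits, word_bits=BRAM_WORD_BITS):
    """Return (total bits, minimum number of BRAM words) for one neuron."""
    total = sum(param_bits.values())
    return total, math.ceil(total / word_bits)


total_bits, n_w_neuron = words_per_neuron(LIF_PARAM_BITS)
print(total_bits, n_w_neuron)
```

The result, 46 bits and 2 words, is the value written to n_w_reg for the LIF model.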
	  
	  
Figure 4.7 - Structure of neuronal BRAM (LIF model).
Note: “nu” means not used, so the value of the bits is set to 0

Using the LOADNP and STORENP operations, these parameters are loaded into and stored from the shadow registers, where they are mapped as shown in Figure 4.8.
	  	  
Figure	  4.8	  -­‐	  Mapping	  of	  the	  neuronal	  parameters	  in	  the	  shadow	  registers	  (LIF	  model)	  








Chapter 5 - Results
This chapter summarizes the results obtained by implementing and synthesizing SNAVA+ with the Xilinx Vivado™ Design Suite. In particular, this study analyzes the performance of SNAVA+ in terms of processing speed, area and power consumption, and compares it with that of SNAVA when executing the LIF model.
5.1 Performance evaluation – Leaky integrate-and-fire model




5.2 Processing speed performance 
The following performance study analyzes the processing speed, because the new strategy for mapping the neural variables affects the performance of the computation of the LIF model in SNAVA+.
The LIF algorithm consists of 7 subroutines dedicated to computing neural parameters and a loop to calculate synaptic parameters. The number of cycles required to execute each subroutine in phase 1 is indicated in Table 5.1. The encoding of the subroutines contained in the synapse loop is shown in Table 5.2.
	  
NEURONAL LOOP
Symbol   Subroutine                   Clock cycles
LNP      Load neuronal parameters     2∙N*
SNP      Save neuronal parameters     2∙N*
MP       Membrane Potential           39
CS       Cycles per each synapse      (25)∙S
SU       Spike update                 48
RF       Refractory period            5
NS       Neuron display               24
SE       Spikes enable                6
	  
Table	  5.1	  -­‐	  Neuronal	  loop	  subroutines.	  	  
Note:	  *N	  is	  the	  number	  of	  words	  in	  the	  neuronal	  BRAM	  assigned	  to	  each	  neuron	  in	  order	  to	  store	  




SYNAPTIC LOOP
Symbol   Subroutine         Clock cycles
SL       Synapse Load       3
SW       Synaptic weight    21
SS       Synapse Save       1
 
Table	  5.2	  -­‐	  Synaptic	  loop	  subroutines.	  	  
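The per-neuron and per-synapse costs can be tallied directly from Tables 5.1 and 5.2. The grouping below is a reading of the two tables rather than a derivation given by the author; it shows where the coefficients 4N, 122 and 25 that appear in equation 5.1 come from.

```latex
\begin{align*}
\text{load/store cost per neuron} &= \underbrace{2N}_{\text{LNP}} + \underbrace{2N}_{\text{SNP}} = 4N \\
\text{fixed cost per neuron} &= \underbrace{39}_{\text{MP}} + \underbrace{48}_{\text{SU}} + \underbrace{5}_{\text{RF}} + \underbrace{24}_{\text{NS}} + \underbrace{6}_{\text{SE}} = 122 \\
\text{cost per synapse} &= \underbrace{3}_{\text{SL}} + \underbrace{21}_{\text{SW}} + \underbrace{1}_{\text{SS}} = 25
\end{align*}
```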
Hence a relation can be formulated between the number of execution cycles and the number of emulated neurons and synapses by adding the contribution of each subroutine to the total delay. There are two equations to compute the number of clock cycles needed to emulate the LIF model. The first, equation 5.1, calculates the number of clock cycles without considering the delay needed to display the neuronal parameters on the monitor, while equation 5.2 also considers the monitor delay produced by the visualization of the parameters.

Without parameters display:

$$N_T = 4 \cdot N \cdot N_v + 122 \cdot N_v + 25 \cdot S \qquad (5.1)$$

With parameters display:
In these equations:
• NT: number of clock cycles
• N: number of neuronal BRAM words per neuron
• Nv: number of virtualization levels
• S: number of synapses per Processing Element of the array
• P: number of processors
• B: bus width
• SD: number of synapse parameters to display
• ND: number of neuronal parameters to display
• BS: buffer size

Figures 5.1 and 5.2 show the execution time required to compute the LIF model in SNAVA and SNAVA+ respectively. Both figures show the execution time for the same number of virtual neurons and synapses per processing element, so that the two architectures are compared under the same conditions, in a single-step simulation of the LIF algorithm.

In the case of SNAVA+ (Figure 5.2), equations 5.1 and 5.2 have been used to calculate the number of clock cycles for a single-step simulation.
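Equation 5.1 can be evaluated with a short Python sketch. The function names are hypothetical, the 125 MHz value is the system clock quoted in this chapter, and the input values (2 words, 10 neurons and 500 synapses per processing element) are chosen only as an example; the case with parameters display is not covered.

```python
CLOCK_HZ = 125e6  # system clock used in this chapter


def lif_cycles_no_display(n_words, n_virt, n_syn):
    """Clock cycles of one LIF simulation step according to equation 5.1."""
    return 4 * n_words * n_virt + 122 * n_virt + 25 * n_syn


def cycles_to_us(cycles, clock_hz=CLOCK_HZ):
    """Convert a number of clock cycles to microseconds."""
    return cycles / clock_hz * 1e6


cycles = lif_cycles_no_display(n_words=2, n_virt=10, n_syn=500)
time_us = cycles_to_us(cycles)
print(cycles, round(time_us, 1))
```

For these inputs the synaptic term (25 x 500 = 12500 cycles) dominates the total of 13800 cycles, which corresponds to roughly 110 µs at 125 MHz.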












For SNAVA, the following equations 5.3 and 5.4 have been used instead [20]:
SNN vT ⋅+⋅= 122152  (5.3) 
 








' S ⋅SD+ Nv ⋅ND( )  (5.4)

The system clock used to calculate the execution time for both architectures is 125 MHz. Although SNAVA and SNAVA+ could operate with a clock of up to 200 MHz, they run at 125 MHz in order to use the same clock as the communication interfaces (the Ethernet communication system and the AER system) and thus avoid synchronization problems.
• Execution time for the simulation of Leaky integrate-and-fire model implemented on 










a) Execution time VS number of neurons and synapses per processing element without display 
b) Execution time VS number of neurons and synapses per processing element with display 
 
• Execution time for the simulation of Leaky integrate-and-fire model implemented on 










a) Execution time VS number of neurons and synapses per processing element without display 
b) Execution time VS number of neurons and synapses per processing element with display

The distinctive difference between equations 5.1, 5.2 and equations 5.3, 5.4 is the inclusion of the new variable N, which accounts for the number of BRAM words needed to store the neural parameters of each neuron in SNAVA+. In the presented results, the number of neuronal BRAM words per neuron is equal to 2, since two words of memory are sufficient to store the neuron parameters of the LIF model.

As can be seen from Figures 5.1 and 5.2, SNAVA+ has a penalty of about 400 µs with respect to SNAVA in both cases, without and with display. The graphs of SNAVA+ show that the execution time is only slightly influenced by the number of neurons per processing element. The major contribution to the delay is due to the increase in the number of synapses, with which the execution time grows considerably.

In the case of 4000 available synapses per processing element the execution time remains close to 1.5 ms, which may still be considered not significantly far from the time resolution of biological neurons, which is around 1 ms. The execution time is about 1 ms when around 3000 synapses per processor and up to 100 neurons per processor are simulated, which is already a large number of synapses and neurons.
This shows the enormous potential of the SNAVA+ emulator in the simulation of large-scale SNN models, where it is feasible to implement this number of neurons or even more, because this depends on the number of BRAMs used to store the neural parameters. The possible difficulty lies in the number of synapses that can be implemented, which is limited by the number of LUTs available in the current FPGA.

The proposed target for the current version of the system, called SNAVA+, is to implement 1000 neurons and 50000 synapses per FPGA, i.e. 10 neurons and 500 synapses per processor. Figure 5.3 shows the execution time for the simulation of the LIF model in SNAVA and SNAVA+ for the proposed target. The bar charts reveal the loss of time performance of SNAVA+ with respect to SNAVA. This is because in SNAVA+ the neuronal parameters are placed in the BRAMs and no longer in registers inside the processor (bank of registers), which leads to an overhead that, however, does not degrade the performance significantly. In this case, with the LIF SNN model, the removed instructions (see section 4.1) also affect the processing time. In fact, the ASM program used (see Appendix A) contains more instructions than the corresponding ASM program for the LIF model in SNAVA. However, in view of the positive results in terms of area and power consumption, a slight loss of time performance can certainly be considered a good trade-off.
 
 
 
Figure 5.3: Execution time for the simulation of 10 neurons and 500 synapses per processor (a total of 1000 neurons and 50000 synapses on the whole chip)
Note: 1 is the case without parameters display, 2 is the case with parameters display 
 
 
5.3 Implementation and performance  
The SNAVA+ prototype is implemented on the KC705 board kit (Figure 5.4), which includes a Xilinx Kintex-7 FPGA. This board offers advanced hardware modules, including high-speed serial links and advanced memory interfaces. The use of these features has facilitated the development of the present architecture with high performance in terms of communication and processing. In this section, the post-synthesis results of SNAVA+ are shown and compared with those of SNAVA, firstly in terms of area (FPGA resource occupation) and secondly in terms of power consumption.
Figure 5.4: KC705 base board
5.3.1 Area consumption  
Table 5.3 shows the comparison between SNAVA and SNAVA+ in terms of area consumption. The comparison is made under the same conditions for both architectures, that is, the same array dimension (same number of processing elements in the array) and the same number of synapses. Moreover, in both architectures the 64-bit Galois LFSRs are synthesized. Regarding the number of neurons, for SNAVA+ it should be noted that the same area is occupied on the FPGA for any number of virtualized neurons (from 1 to 128), because regardless of the number of neurons a 1024-word x 32-bit BRAM is synthesized for each processing element of the array.
Resource     SNAVA 10x10,                SNAVA+ 10x10,                 Available
             1 level of virtualization   n levels of virtualization*
             (99 syn per PE)             (99 syn per PE)
Flip-Flops   99487 – 24%                 77444 – 19%                   407600
LUTs         171291 – 84%                134400 – 66%                  203800
BRAMs        135 – 15%                   213 – 24%                     890
Table	  5.3	  –	  Area	  occupation	  of	  SNAVA	  and	  SNAVA	  +	  
Note:	  *n	  can	  be	  from	  1	  to	  128	  
	  
An important reduction of hardware resources is obtained by using the BRAMs to store the neural parameters instead of the bank of registers, as can be observed from Table 5.3. The Flip-Flop utilization is reduced by around 5 percentage points, because of the removal of the banks of registers (shadow registers) and of some instructions. The best result of the new strategy is in the utilization of the LUTs available in the FPGA, which drops by around 18 percentage points. This implies that a greater number of synapses can be implemented. The increase in BRAM usage is evident, but it is not significant considering the number of BRAMs available. In fact, the aim was precisely to take advantage of the large number of unused BRAMs in order to free FFs and especially LUTs.

Table 5.4 shows the area occupation of SNAVA+ with different numbers of available synapses per Processing Element of the array. Regarding the number of neurons, for SNAVA+ it should be noted that the same area is occupied on the FPGA for any number of virtualized neurons (from 1 to 128), because regardless of the number of neurons a 1024-word x 32-bit BRAM is synthesized for each processing element of the array. In any case, these data were obtained by synthesizing the architecture with a virtualization level of 10.

Note: these results were obtained considering a chip ID of 4 bits, that is, a maximum of 16 possible interconnected boards.
 
Table	  5.4	  –	  Area	  occupation	  of	  SNAVA+	  for	  different	  numbers	  of	  synapses	  per	  Processing	  
Element.	  Note:	  *n	  can	  be	  from	  1	  to	  128	  
 
 
             SNAVA+ 10x10, n levels of virtualization*
Resource     50 syn per PE    100 syn per PE   200 syn per PE   Available
Flip-Flops   65216 – 16%      77444 – 19%      97237 – 24%      407600
LUTs         128394 – 63%     134508 – 66%     148485 – 73%     203800
BRAMs        213 – 24%        213 – 24%        213 – 24%        890
As can be observed in Table 5.4, the utilization of LUTs and registers increases by 3% in the case of 100 synapses per Processing Element with respect to the implementation with 50 synapses per PE, while it increases by about 6.8% when passing from 100 to 200 synapses per processor.

Note: as the number of synapses increases, the synthesis performed by Xilinx Vivado™ requires more and more time: the synthesis with 100 synapses per processor required about 8 hours, while the one with 200 synapses per processor required about 3 days (see Table 5.5).
 
Table 5.5: Synthesis time required when increasing the number of synapses per processor

                    SNAVA+ 10x10        SNAVA+ 10x10        SNAVA+ 10x10
                    n levels of         n levels of         n levels of
                    virtualization*     virtualization*     virtualization*
                    (50 syn per PE)     (100 syn per PE)    (200 syn per PE)
Computation time    about 3 hours       about 8 hours       about 72 hours

However, the tests performed, including those with fewer than 50 synapses per PE (not shown in Tables 5.4 and 5.5 because they are not considered relevant), suggest that the consumption of LUTs and FFs grows almost linearly with the number of synapses. In the worst case, a growth of 4% can be assumed for every increase of 50 synapses per PE (see Figure 5.5).
 
Figure	  5.5	  –	  LUTs	  and	  FF	  consumption	  vs.	  synapses	  per	  processor	  
Thus, since the graph in Figure 5.5 shows a worst-case situation, it is reasonable to assume that SNAVA+ can reach up to 500 synapses per processor (50000 synapses in the whole FPGA).
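The worst-case assumption can be checked with a few lines of arithmetic. The sketch below is illustrative only: it takes the utilization at 50 synapses per PE from Table 5.4 and applies the assumed growth of 4 percentage points for every additional 50 synapses per PE. The function and variable names are hypothetical, and the linear model is the worst-case assumption stated in the text, not a synthesis result.

```python
# Worst-case linear extrapolation of FPGA utilization (illustrative sketch).
# Baseline percentages come from Table 5.4 (50 synapses per PE); the slope of
# 4 percentage points per 50 synapses is the worst-case assumption in the text.

BASELINE_SYNAPSES = 50
WORST_CASE_STEP_PCT = 4.0          # percentage points per extra 50 synapses per PE
BASELINE_PCT = {"LUT": 63.0, "FF": 16.0}


def worst_case_utilization(resource: str, synapses_per_pe: int) -> float:
    """Estimated utilization (%) of `resource` at `synapses_per_pe`."""
    steps = (synapses_per_pe - BASELINE_SYNAPSES) / 50
    return BASELINE_PCT[resource] + WORST_CASE_STEP_PCT * steps


def max_synapses_per_pe(resource: str, limit_pct: float = 100.0) -> int:
    """Largest multiple of 50 synapses per PE that stays within `limit_pct`."""
    synapses = BASELINE_SYNAPSES
    while worst_case_utilization(resource, synapses + 50) <= limit_pct:
        synapses += 50
    return synapses


if __name__ == "__main__":
    for n in (50, 100, 200, 500):
        print(n, worst_case_utilization("LUT", n), worst_case_utilization("FF", n))
    print("LUT-limited maximum:", max_synapses_per_pe("LUT"))
```

Under these assumptions the LUT estimate reaches 99% at 500 synapses per PE, which is consistent with the limit quoted above; with a 10x10 array this corresponds to 50000 synapses. The estimates at 100 and 200 synapses (67% and 75%) stay above the measured values of Table 5.4, as expected from a worst-case bound.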
 
5.3.2 Power consumption  







The results in this section have been obtained by enabling the Vivado settings that optimize the power of the design. These options are:
• Power Opt Design
• Post-Place Power Opt Design
	  
	  
Dynamic Power	   Static power	  
1.216 W	   0.186 W	  
 
	  
Figure 5.6 – Power consumption of SNAVA+ with 10x10 PE array size and 99 synapses per PE unit
	  
Table 5.6 compares SNAVA and SNAVA+ in terms of power consumption. The comparison is made under the same conditions for both architectures: the same array dimension (same number of processing elements in the array) and the same number of synapses. As for the number of neurons, in this comparison SNAVA+ has 10 times more neurons than SNAVA.
	  
Resource               SNAVA 10x10                   SNAVA+ 10x10
                       1 level of virtualization     10 levels of virtualization
                       (99 syn per PE)               (99 syn per PE)
SNAVA                  0.649 W                       0.922 W
Ethernet controller    0.039 W                       0.043 W
AER controller         0.249 W                       0.251 W
TOTAL                  0.931 W                       1.216 W
Table 5.6 – Power consumption of SNAVA and SNAVA+

As can be observed from Table 5.6, the power consumption of SNAVA+ is around 285 mW higher than that of SNAVA, but with 10 times as many neurons. The SNAVA module of the SNAVA+ project, which contains the cellular configurable processing elements, the sequencer (control unit) and all the components described in section 1.4, contributes 273 mW more than in the SNAVA project. This higher power consumption is generated by the Cellular PE unit. However, since SNAVA+ has many more neurons than SNAVA in this comparison, its power consumption per neuron is considerably lower than that of SNAVA. Hence, SNAVA+ has a better performance-to-power-consumption ratio than SNAVA.

Table 5.7 shows the consumption of each module of a single Cellular PE in SNAVA and SNAVA+. The integration of the BRAM blocks that store the neural parameters in SNAVA+ contributes 2 mW to the total of each cellular Processing Element.
Resource              SNAVA 10x10                   SNAVA+ 10x10
                      1 level of virtualization     10 levels of virtualization
                      (99 syn per PE)               (99 syn per PE)
CAM                   0.001 W                       0.001 W
Spike register        0.001 W                       0.001 W
Processing element    0.0039 W                      0.003 W
Neuronal BRAM         -                             0.002 W
Synaptic BRAM         0.001 W                       0.002 W
TOTAL                 0.007 W                       0.009 W
Table 5.7 – Power consumption of a single Cellular PE in SNAVA and SNAVA+

Table 5.8 shows the power consumption of SNAVA+ with different numbers of synapses and a virtualization level of 10 for each Processing Element of the array. These results were obtained with a chip ID of 4 bits, so that at most 16 boards can be connected. Increasing the number of synapses adds no consumption overhead in SNAVA+, except for the CAMs, whose size increases and with it their static power consumption.
	  
 
Resource              SNAVA+ 10x10        SNAVA+ 10x10        SNAVA+ 10x10
                      10 levels of        10 levels of        10 levels of
                      virtualization      virtualization      virtualization
                      (50 syn per PE)     (100 syn per PE)    (200 syn per PE)
CAM                   0.001 W             0.001 W             0.002 W
Spike register        0.001 W             0.001 W             0.001 W
Processing element    0.003 W             0.003 W             0.003 W
BRAM neuronal         0.002 W             0.002 W             0.002 W
BRAM synaptic         0.002 W             0.002 W             0.002 W
TOTAL                 0.009 W             0.009 W             0.01 W
 
Table 5.8: Power consumption of a single Cellular PE in SNAVA+ for different numbers of synapses per PE
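The per-neuron comparison drawn from Table 5.6 can be made explicit with a short calculation. The sketch below assumes one neuron per PE per virtualization level, that is, 100 neurons for SNAVA and 1000 for SNAVA+ on a 10x10 array; this is consistent with the factor of 10 stated in the text but is not reported there as a figure. The names are hypothetical.

```python
# Power per emulated neuron, from the totals in Table 5.6 (illustrative sketch).
# Assumption: neurons = PEs in the array * virtualization levels.

ARRAY_PES = 10 * 10


def power_per_neuron_mw(total_power_w: float, virtualization_levels: int) -> float:
    """Total power divided by the number of emulated neurons, in mW."""
    neurons = ARRAY_PES * virtualization_levels
    return total_power_w * 1000.0 / neurons


snava_mw = power_per_neuron_mw(0.931, virtualization_levels=1)
snava_plus_mw = power_per_neuron_mw(1.216, virtualization_levels=10)

if __name__ == "__main__":
    print(f"SNAVA : {snava_mw:.2f} mW per neuron")
    print(f"SNAVA+: {snava_plus_mw:.2f} mW per neuron")
    print(f"ratio : {snava_mw / snava_plus_mw:.1f}x")
```

With these assumptions SNAVA consumes about 9.3 mW per neuron against about 1.2 mW per neuron for SNAVA+, even though the total power of SNAVA+ is higher.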
 
Chapter 6 – Conclusions and further research




The purpose of SNAVA+ is to boost the performance of the SNAVA architecture in order to obtain a large-scale SNN emulator. While SNAVA could simulate 200 neurons per chip, SNAVA+ can potentially reach up to 10000 neurons per chip. The area consumption of SNAVA+ is considerably lower than that of SNAVA under equal conditions (99 synapses per PE): about 18% of the LUTs has been saved, which allows up to 50000 synapses per chip to be implemented. SNAVA+ also appears to have a better performance-to-power-consumption ratio. All these features come at the cost of only a small drop in processing speed.

However, since several trade-offs must be respected, it is almost impossible to provide an architecture for emulating this type of SNN model that is efficient with respect to all these issues. The main trade-off is between area consumption and flexibility of the design. The SNAVA+ architecture aims to offer an emulator that supports large-scale SNN models by striking a balance between these two factors. It makes it possible to emulate different SNN models with different levels of computational complexity, from the simple leaky integrate-and-fire model to the complex Iglesias and Villa model [19], while keeping area consumption low.

A significant improvement is obtained in the number of neurons supported by SNAVA+, at the cost of processing time. The performance evaluation of SNAVA+ suggests that the processing speed is only minimally decreased. In addition, a better use of the available resources has made it possible to significantly increase the number of synapses. Thus, SNAVA+ can be considered an important option for emulating SNN models, given features such as multi-model support, scalability, and low power and area consumption.
 
 
6.2 Further research 
Although important steps forward have been made with the SNAVA+ architecture, some aspects remain that, if improved, could lead to an even more powerful and competitive SNN emulator. The goal is, of course, to increase as much as possible the number of neurons and especially the number of synapses simulated.

Firstly, to fully exploit the potential of SNAVA+ in terms of the number of neurons that can be emulated, it would be necessary to radically change the AER. Currently the protocol can address a maximum of 7 neurons per processor, as there are only three bits to indicate the virtualization layer of a given PE of the array. The number of bits would have to be increased to 7 in order to address up to 128 neurons per processing element and therefore take advantage of the possibilities offered by the neuronal BRAM. However, this increase in the number of address bits would certainly entail an increase in the area occupied by the CAM. To limit the area consumption, it would be better to change the system for identifying and addressing the PEs in the array, which currently uses 4 bits to identify the row and 4 bits to identify the column.
In addition, there are 7 bits of chip ID, needed to identify the FPGA when several boards are connected (Figure 6.1).
	  
Figure	  6.1	  –	  Current	  mapping	  for	  neuron	  identification	  	  	  
	  
Figure	  6.2	  –	  Proposal	  of	  mapping	  for	  neuron	  identification	  
The proposal (Figure 6.2) is to replace the identification of the PE, which now uses the row+column system (8 bits in total), with a PE ID system in which each processor of the array is assigned a number from 1 to 100 (in the case of a 10x10 SIMD array). Only 7 bits will therefore be needed instead of 8. It should be taken into account that saving 1 bit for each line of each instantiated CAM represents a non-negligible saving of area on the FPGA. In addition, a 7-bit chip ID seems to be a waste of area, since so far no more than 4 boards have been interconnected; at first it may therefore be better to have a 4-bit chip ID, which allows a maximum of 16 interconnected boards. The area saved in this way can be invested in an increase in the number of synapses.

Another limitation of the AER is that there is no mechanism for error detection and error correction. Considering the high frequencies at which it works, it is highly recommended to make the protocol more reliable.

Another proposal is to exploit the resources available on the FPGA more efficiently. Currently the use of resources is clearly unbalanced towards the LUTs, which remain the critical and limiting resource of the FPGA. The idea is to exploit the large number of BRAMs still available (about 76%) to store the mapping of the synapses assigned to each neuron, replacing the current CAM-based mechanism, which wastes a large amount of resources and has proved inefficient for a large-scale network from the point of view of area consumption.

Finally, a further step could be to move from a commercial FPGA to an ASIC to implement the SNAVA+ hardware architecture. The advantage of creating a full-custom chip would obviously be optimized hardware, which means better performance and reduced power consumption. However, this choice would be practicable only if the production volume of such chips justified the non-recurring engineering (NRE) costs.
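The bit budget of the two neuron identification schemes discussed in this section can be tallied with a short sketch. The field widths are the ones quoted in the text; the order of the fields inside the address word, the field names and the helper functions are assumptions made for illustration, since the layouts of Figures 6.1 and 6.2 are not reproduced here.

```python
# Neuron address layouts for the AER (illustrative sketch).
# Field widths are those quoted in the text; the field order inside the word
# is an assumption.

CURRENT_LAYOUT = (("chip_id", 7), ("row", 4), ("column", 4), ("virt_level", 3))
PROPOSED_LAYOUT = (("chip_id", 4), ("pe_id", 7), ("virt_level", 7))


def bits_needed(count: int) -> int:
    """Minimum number of bits able to encode `count` distinct values."""
    return max(1, (count - 1).bit_length())


def total_bits(layout) -> int:
    """Width of the whole address word for a given layout."""
    return sum(width for _, width in layout)


def pack(layout, **fields) -> int:
    """Pack named fields into one integer, first field most significant."""
    word = 0
    for name, width in layout:
        value = fields[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(layout, word: int) -> dict:
    """Inverse of pack(): split an integer back into named fields."""
    fields = {}
    for name, width in reversed(layout):
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields


if __name__ == "__main__":
    print("bits for 100 PEs:", bits_needed(100))
    print("current word :", total_bits(CURRENT_LAYOUT), "bits")
    print("proposed word:", total_bits(PROPOSED_LAYOUT), "bits")
    address = pack(PROPOSED_LAYOUT, chip_id=3, pe_id=100, virt_level=127)
    print(unpack(PROPOSED_LAYOUT, address))
```

With the widths quoted above, both layouts add up to 18 bits. Under these assumptions the proposed mapping would address up to 128 virtualization levels per PE without widening the address word, because the bits saved on the chip ID and on the PE identifier are reused for the virtualization field.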
Bibliography

1. Stuart J. Russell and Peter Norvig, “Artificial intelligence a modern approach”, Introduction: 11-16, 3rd edition, 2009.
2. Andrew S. Cassidy, Julius Georgiou, Andreas G. Andreou, “Design of silicon brains in the nano-CMOS era: Spiking neurons learning synapses and neural architecture optimization”, 4-6, Journal of the International Neural Network Society, European Neural Network Society & Japanese Neural Network Society (ELSEVIER), 2013.
3. W.S. McCulloch and W. Pitts, “A logical calculus of the ideas immanent in nervous activity”, 5-7, Bulletin of Mathematical Biophysics, 1943.
4. D.O. Hebb, “The Organization of Behaviour”, 60-78, Wiley, New York, 1949.
5. A.L. Hodgkin and A.F. Huxley, “A quantitative description of ion currents and its applications to conduction and excitation in nerve membranes”, 117:500–544, J. of Physiology, 1952.
6. Minsky, M.; S. Papert, “An Introduction to Computational Geometry”, MIT Press, 1969.
7. Goodman, D., Brette, R., “Brian: a simulator for spiking neural networks in python”, Frontiers in neuroinformatics 2, 2008.
8. Dimkovic, I., “SpikeFun”, introduction, 2011.
9. Furber, S. B., Lester, D. R., Plana, L. A., Garside, J. D., Painkras, E., Temple, S., Brown, A. D., “Overview of the SpiNNaker System Architecture”, vol. 62, issue 12, IEEE Transactions on Computers (PrePrints), 2012.
10. J. Vreeken, “Spiking Neural Networks, an introduction”, 1-5, Technical Report UU-CS, issue: 2003-008, 2003.
11. Gerstner, W., Kempter, R., Leo van Hemmen, J. & Wagner, H., “Hebbian Learning of Pulse Timing in the Barn Owl Auditory System”, in Maass, W. & Bishop, C. M. (eds.), Pulsed Neural Networks, 360-366, MIT Press, 1999.
12. Suhap Sahin, Yasar Becerikli, Suleyman Yazici, “Neural Network Implementation in Hardware Using FPGAs”, 1105, Springer-Verlag Berlin Heidelberg, 2006.
13. Moore, S.W.; Fox, P.J.; Marsh, S.J.T.; Markettos, A.T.; Mujumdar, A., “Bluehive - A field-programable custom computing machine for extreme-scale real-time neural network simulation”, 131-140, Field-Programmable Custom Computing Machines (FCCM), IEEE 20th Annual International Symposium on April 29 2012-May 1 2012, 2012.
14. E. Izhikevich, “Simple model of spiking neurons”, vol. 14, no. 6, 1569–1572, IEEE Transactions on Neural Networks, 2003.
15. Cassidy, A.; Andreou, A.G.; Georgiou, J., “Design of a one million neuron single FPGA neuromorphic system for real-time multimodal scene analysis”, 1-6, Information Sciences and Systems (CISS), 45th Annual Conference on 23-25 March 2011, 2011.
16. Sanchez, E.; Perez-Uribe, A.; Upegui, A.; Thoma, Y.; Moreno, J.M.; Napieralski, Andrzej; Villa, A.; Sassatelli, G.; Volken, H.; Lavarec, E., “PERPLEXUS: Pervasive Computing Framework for Modeling Complex Virtually-Unbounded Systems”, 587-591, Adaptive Hardware and Systems AHS 2007, Second NASA/ESA Conference on 5-8 Aug. 2007, 2007.
17. Athul Sripad, “SNAVA: A Generic Threshold-Based-SNN Emulation Solution”, chapter 3, Master Thesis, Universitat Politècnica de Catalunya, September 2013.
18. Sanchez, G., Koickal, T.J., Sripad, T.A.A., Gouveia, L.C., Hamilton, A., Madrenas, J., “Spike-based analog-digital neuromorphic information processing system for sensor applications”, 1624-1627, Circuits and Systems (ISCAS), 2013 IEEE International Symposium on 19-23 May 2013, 2013.
19. J. Iglesias, J. Eriksson, F. Grize, M. Tomassini, and A. E. P. Villa, “Dynamics of pruning in simulated large-scale spiking neural networks”, Biosystems, vol. 79, 2005.
20. Giovanny Sanchez Rivera, “Efficient multiprocessing architecture for Spiking Neural Network emulation based on configurable devices”, chapter 5, doctoral thesis, Advanced Hardware Architecture department of UPC Barcelona, 2014.
21. Khan MM, Lester DR, Plana LA, Rast A, Jin X, Painkras E, Furber SB, “SpiNNaker: Mapping neural networks onto a massively-parallel chip multiprocessor”, 2849-2856, Neural Networks, 2008 IJCNN 2008 (IEEE World Congress on Computational Intelligence) IEEE International Joint Conference, 2008.
22. Taho Dorta Pérez, “AER-RT: Interfaz de Red con Topología en Anillo para SNN Multi-FPGA”, Master Thesis, Universitat Politecnica de Catalunya, July 2013.
23. Boudjelal Meftah, Olivier Lezoray, Soni Chaturvedi, Aleefia A. Khurshid, and Abdelkader Benyettou, “Image Processing with Spiking Neuron Networks”, 525–544, X.-S. Yang (Ed.): Artif. Intell., Evol. Comput. and Metaheuristics, SCI 427, Springer-Verlag Berlin Heidelberg 2012, 2012.
24. Thorpe, S., Delorme, A., Van Rullen, R., “Spike based strategies for rapid processing”, vol. 14, 715-726, Neural Networks, 2001.
25. A.R. Baig, R. Séguier and G. Vaucher, “A Spatio-temporal Neural Network applied to visual speech recognition”, vol. 2, 797-802, Ninth International Conference on Artificial Neural Networks (ICANN), 1999.
26. Simei Gomes Wysoski, Lubica Benuskova, and Nikola Kasabov, “Adaptive Spiking Neural Networks for Audiovisual Pattern Recognition”, 14th International Conference, ICONIP 2007, Kitakyushu, Japan, November 13-16, 2007.
27. Pearson MJ, Pipe AG, Mitchinson B, Gurney K, Melhuish C, Gilhespy I, Nibouche M., “Implementing spiking neural networks for real-time signal-processing and control applications: a model-validated FPGA approach”, vol. 5, 1472-87, IEEE Trans Neural Networks, 2007 Sep, 2007.
28. Marco K. Muller, Michael Tremer, Christian Bodenstein, Rolf P. Wurtz, “A spiking neural network for situation-independent face recognition”, conference: Proceedings of New Challenges in Neural Computation, Frankfurt, August 2011.
29. Youssef Elmir, Mohammed Benyettou, “Gabor Filters Based Fingerprint Identification Using Spike Neural Networks”, International Conference: Sciences of Electronic, Technologies of Information and Telecommunications, Tunisia, March 22-26, 2009.
30. Ankur Gupta and Lyle N. Long, “Character Recognition using Spiking Neural Networks”, 53-58, IEEE Neural Networks Conference, Orlando, FL, Aug., 2007.
31. Skorheim S, Lonjers P, Bazhenov M., “A Spiking Network Model of Decision Making Employing Rewarded STDP”, vol. 9, issue 3, PLoS ONE 9(3): e90821, doi:10.1371/journal.pone.0090821, 2014.
32. X-J Wang, “Probabilistic decision making by slow reverberation in cortical circuits”, vol. 36, 955-968, Neuron, 2002.
33. Stuart J. Russell and Peter Norvig, “Artificial intelligence a modern approach”, Conclusion: 842-850, 3rd edition, 2009.
34. Sachin Lakra, T.V. Prasad, G. Ramakrishna, “The future of neural networks”, 481-486, 6th National Conference on Computing for Nation Development, INDIACom 2012, New Delhi, India, 23-24 February, 2012.
Appendices 
Appendix-A 
Below is the LIF model ASM code developed for SNAVA+, which implements the Leaky Integrate and Fire (LIF) SNN model.

The code basically consists of three loops that implement the algorithm in two phases (see section 1.4.3):

PHASE 1 – Spikes processing
• First neuronal loop: it loads the neuronal parameters of each neuron from the neuronal BRAM, calculates the membrane potential and stores the updated neuronal parameters in the BRAM.
• Synaptic loop: it loads the synaptic parameters of each neuron from the synaptic BRAM, checks whether the synapses are inhibitory or excitatory and calculates the synaptic weight, for all the synapses of all the neurons emulated by each processor. Finally, the updated values of the synaptic parameters are stored in the synaptic BRAM.

PHASE 2 – Spikes distribution
• Second neuronal loop: it loads the neuronal parameters from the neuronal BRAM, updates the spikes related to the neuron (SPIKES_UPDATE subroutine) and updates the refractory period of the neuron (BACKGROUND ACTIVITY subroutine). Subsequently it sends the parameters of the neuron via Ethernet to the user interface tool (NEURON_DISPLAY subroutine). Finally it enables the spikes and stores the updated neuronal parameters in the neuronal BRAMs.
• Finally, the SPKDIS instruction enables the spikes distribution via the AER communication protocol.
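As a reading aid for Phase 1, the two formulas annotated in the comments of the assembly listing (the membrane update and the synaptic contribution) can be sketched in floating point. This is only an illustrative reference model: the hardware operates on fixed-point words, the class and function names are hypothetical, and spike generation (Phase 2) is deliberately left out.

```python
# Floating-point sketch of Phase 1 of the LIF algorithm (illustrative only).
# Symbols follow the ASM comments: Vi, Vres, Kmem, Si, Sj, Aji, P, SUM_WEIGHTS.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Synapse:
    activation: float          # Aji
    potential: float           # P: POT1 or POT2, chosen by the synapse type
    spike: int = 0             # Sj: 1 if the presynaptic neuron spiked


@dataclass
class Neuron:
    v: float                   # membrane potential Vi
    v_rest: float              # resting potential Vres
    k_mem: float               # membrane decay factor Kmem
    spiked: int = 0            # Si: 1 if this neuron spiked in the last step
    sum_weights: float = 0.0   # SUM_WEIGHTS accumulated by the synaptic loop
    synapses: List[Synapse] = field(default_factory=list)


def membrane_update(n: Neuron) -> None:
    """First neuronal loop: Vi <- Vres + (1-Si)*(Vi-Vres)*Kmem + SUM_WEIGHTS."""
    n.v = n.v_rest + (1 - n.spiked) * (n.v - n.v_rest) * n.k_mem + n.sum_weights
    n.sum_weights = 0.0        # the accumulator is cleared after use


def synaptic_loop(n: Neuron) -> None:
    """Synaptic loop: add Sj * Aji * P for every synapse, then clear Sj."""
    for s in n.synapses:
        n.sum_weights += s.spike * s.activation * s.potential
        s.spike = 0            # the spike is consumed once it has been counted


def phase1(neurons: List[Neuron]) -> None:
    """Spike processing: membrane update first, then the synaptic loop."""
    for n in neurons:
        membrane_update(n)
    for n in neurons:
        synaptic_loop(n)


if __name__ == "__main__":
    neuron = Neuron(v=-60.0, v_rest=-70.0, k_mem=0.5,
                    synapses=[Synapse(2.0, 1.5, spike=1), Synapse(1.0, -1.0)])
    for step in range(3):
        phase1([neuron])
        print(step, neuron.v, neuron.sum_weights)
```

The example values are arbitrary. The point of the sketch is the ordering: the weights accumulated by the synaptic loop in one step are added to the membrane potential by the first neuronal loop of the following step, after which the accumulator is reset.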
define	  synapses	  20	  	   	   ;	  select	  the	  number	  of	  synapses	  for	  processing	  element	  define	  neurons_virtualized	  	  7	   ;select	  the	  virtualization	  level	  	  	  ;	  CONSTANT	  DECLARATIONS	  	  .DATA	  	  AMAX="00000003"	  DACT1="0000FFFA"	  DACT2="0000FFFA"	  DBACK="0000E7A3"	  	  DMEM1="0000EF7D"	  DMEM2="0000EF7D"	  DSYN1="0000F9AE"	  DSYN2="0000F9AE"	  LMAX="00003FFF"	  MMAX="00000666"	  POT1="000003E8"	  	  	   	  POT2="0000FFB0"	  	  	  PROB="00001FFF"	  THETA1="0000E380"	  THETA2="0000E380"	  VREST1="0000E188"	  VREST2="0000E188"	  UNO="00000001"	  	  DOS="00000002"	  	  CTETP="0000F448"	  CTE1="00000000"	  	  	  	  	  .CODE	  	  	  GOTO	  MAIN	  	  ;	  *****************************	  PROCEDURES	  BEGIN	  ***************************	  	  	  ;	  -­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐	  LOAD	  AND	  SAVE	  NEURAL	  PARAMETERS-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐	  	  .LOAD_NEURAL_PARAMETERS	  	  NOP	  	  LOADNP	  	  NOP	  RET	   	  	  .SAVE_NEURAL_PARAMETERS	  	  NOP	  STORENP	  	  NOP	  RET	  ;	  -­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐	  	  	  ;	  -­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐	  MEMBRANE	  VALUE	  -­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐	  	  .MEMBRANE_VALUE	  ;-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐	  Vi	  <-­‐-­‐	  Vres	  +	  (1-­‐Si(t))*(Vi(t)-­‐Vres)*(Kmem)	  +	  SUM_WEIGHTS	  -­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐	  	  	  LDALL	  R4,DMEM1	   	   	   ;R4	  	  	  	  <-­‐-­‐	  
DECAY	  DONATOR	  1	  	  LDALL	  R5,VREST1	  	   	   	   ;R5	  	  	  	  <-­‐-­‐	  Vres1	  	  	  	   SWAPS	  R0	   	   	   	  	  	  ;R0	  	  	  	  <-­‐-­‐	  SR0_2	  =	  Nt	  +	  Si	  	   MOVR	  	  R3	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	   	   	  	  	  ;R3	  	  	  	  <-­‐-­‐	  Nt	  +	  Si	  	   SWAPS	  R0	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	   	   	  	  ;SR0_2	  <-­‐-­‐	  R0	  =	  Nt	  +	  Si	  	   MOVA	  	  R3	  	   SHRN	  DOS	  	   	   	   	   	   	  	  	   FREEZENC	   	   	   ;if	  neuron	  type	  =	  II	  (conditional	  load)	  
Appendices 
	   83	  
	   	   LDALL	  R4,DMEM2	   ;R4	  	  <-­‐-­‐	  DECAY	  DONATOR	  2	  	   	   LDALL	  R5,VREST2	  	   ;R5	  	  <-­‐-­‐	  Vres2	  	   UNFREEZE	  	   	  ;-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐	  R2	  <-­‐-­‐	  (1-­‐Si(t))*(Vi(t)-­‐Vres)*(Kmem)	  -­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐	  	  	   MOVA	  R3	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	   ;R0	  	  <-­‐-­‐	  R3	  =	  Nt	  +	  Si	  	   SHRN	  UNO	   	  	   FREEZEC	   	   	   ;if	  (si	  =	  0)	  then	  r2	  <-­‐-­‐	  ((1)*(vi(t)-­‐vres)*(kmem)	  	   	   SWAPS	  R1	   	   ;R1	  	  <-­‐-­‐	  SR1_2	  =	  Vi	  	  	   	   MOVA	  	  R1	   	   ;R0	  	  <-­‐-­‐	  R1	  =	  Vi	  	   	   SUB	  	  	  R5	   	   	   ;R0	  	  <-­‐-­‐	  Vi	  -­‐	  Vres	  	   	   UNMUL	  	  	  R4	   	  	  	  	  	  	   ;R0	  	  <-­‐-­‐(Vi(t)-­‐Vres)	  *	  (Kmem)	  	   	   MOVR	  	  R2	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	   ;R2	  	  <-­‐-­‐(Vi(t)-­‐Vres)	  *	  (Kmem)	   	  	   UNFREEZE	  	  	   MOVA	  R3	  	   SHRN	  UNO	  	   FREEZENC	   	   	   ;IF	  (Si	  =	  1)	  THEN	  R2	  <-­‐-­‐	  ((0)*(Vi(t)-­‐Vres)*(Kmem)	  =	  0	  	   	   RST	  	  R2	  	  	  	  	  	   	   ;R2	  	  <-­‐-­‐	  ((0)*(Vi(t)-­‐Vres)*(Kmem)	  	   UNFREEZE	  	  MOVA	  R2	  	  	  	  	  	  	  	  	  	   	   	   	   ;R0	  	  <-­‐-­‐	  (Vi(t)-­‐Vres)*(Kmem)	  ADD	  	  R5	  	  	  	  	  	  	  	  	  	   	   	   	   ;R0	  	  <-­‐-­‐	  (Vres1	  or	  Vres2)	  +	  (1-­‐Si(t))*(Vi(t)-­‐Vres)*(Kmem)	  	  	  SWAPS	  R2	  	  	  	  	  	  	  	  	   	   	   	   ;R2	  	  <-­‐-­‐	  SR2_2	  =	  SUM_WEIGHTS	  ADD	  	  R2	   	   	   	   	   ;R0	  	  <-­‐-­‐	  (Vres1	  or	  
Vres2) + (1-Si(t))*(Vi(t)-Vres)*(Kmem) + SUM_WEIGHTS
MOVR R1                 ;R1 <-- (Vres1 or Vres2) + (1-Si(t))*(Vi(t)-Vres)*(Kmem) + SUM_WEIGHTS
SWAPS R1                ;SR1_2 <-- R1 = Vi
RST R2                  ;SUM_WEIGHTS <-- 0
SWAPS R2                ;SR2_2 <-- R2 = SUM_WEIGHTS
RET
; ----------------------------------------------------------------------

; ---------------------------- SYNAPSE LOAD ----------------------------
.SYNAPSE_LOAD
NOP
LOADNP
NOP
LOADSP                  ;R4 <-- St + Sj
                        ;R5 <-- Aj
                        ;R6 <-- Lji
                        ;R7 <-- Mj
RET
; ----------------------------------------------------------------------

; --------------------------- SYNAPTIC WEIGHT --------------------------
.SYNAPTIC_WEIGHT
MOVA R4                 ;R0 <-- St + Sj
SHRN UNO
FREEZENC                ;if (sj = 1) then r0 <-- wji = aji * p
    LDALL R1,POT1       ;R1 <-- POT1
    MOVA R4             ;R0 <-- St + Sj
    SHRN DOS
    FREEZENC
        LDALL R1,POT2
    UNFREEZE
    MOVA R1             ;R0 <-- POT1 or POT2
    MUL R5              ;R0 <-- wji = Aji * P
    SWAPS R2            ;R2 <-- SR2_2 = sumW
    ADD R2              ;SR0 <-- wji = Sj * Aji * P
    MOVR R2             ;R2 <-- wji = Sj * Aji * P
    SWAPS R2            ;SR2_2 <-- R2 = sumW
UNFREEZE
RET
; ----------------------------------------------------------------------

; ---------------------------- SYNAPSE_SAVE ----------------------------
.SYNAPSE_SAVE
; THE SYNAPTIC PARAMETERS GO TO BUFFER 32 bits
;MOVA R5                ;R5 <-- Aji
MOVA R6                 ;R6 <-- Lji
SHLN UNO
SHLN UNO
OR R5
MOVR R1                 ;R1 <-- Lji + Aji
;MOVA R4                ;R4 <-- St + Sj
MOVA R7                 ;R0 <-- R7 = Mj
SHLN UNO
SHLN UNO
OR R4                   ;R0 <-- Mj + St + Sj
STOREB
NOP
MOVA R4                 ;R0 <--- R4 = St + Sj   to delete the spike
SHRN UNO
SHLN UNO
MOVR R4
STORESP
RET
; ----------------------------------------------------------------------

; ---------------------------- SPIKE UPDATE ----------------------------
.SPIKE_UPDATE
SWAPS R0                ;R0 <-- SR0_2 = Nt + Si
MOVR R2
LDALL R3,THETA1         ;R3 <-- THETA1 = "0000F060"
SHRN DOS
FREEZENC
    LDALL R3,THETA2     ;R3 <-- THETA2 = "0000F060"
UNFREEZE
MOVA R2
SHRN UNO
SHLN UNO
MOVR R2                 ;R2 <-- Neuron Type + 0. It has been set Si = 0
SWAPS R1                ;R1 <-- SR1_2 = Vi
MOVA R1
MOVR R5
SWAPS R1                ;SR1_2 <-- R1 = Vi
SHLN UNO
FREEZEC
    LDALL R5,CTETP      ; has assigned a positive value < 30 because it has verified that is lower than 0
UNFREEZE
MOVA R5                 ;R0 <-- Vi
SUB R3                  ;R0 <-- Vi - (THETA1 or THETA2)
FREEZENC
    SWAPS R4            ;R4 <-- SR4 = Tref
    RST R0
    SUB R4
    SWAPS R4
    FREEZENZ            ;if (z = 1) then tref is setting
        LDALL R3,UNO
        MOVA R2
        ADD R3
        MOVR R2
        LDALL R4,CTE1   ;CTE1 = 7
        SWAPS R4
    UNFREEZE
UNFREEZE
MOVA R2                 ;R0 <-- Nt + Si
SWAPS R0                ;SR0_2 <-- R0 = Nt + Si
RET
; ----------------------------------------------------------------------

; ---------------------------- REFRACTORY P ----------------------------
.REFRACTORY_P
SWAPS R4                ;R4 <-- SR4 = Tref
MOVA R4
SHRN UNO
MOVR R4
SWAPS R4                ;SR4 <-- R4 = Tref
RET
; ----------------------------------------------------------------------
.NEURON_DISPLAY
;SWAPS R0               ;Nt + Si
;SWAPS R1               ;Vi
;SWAPS R2               ;sum_W
;SWAPS R3               ;Mi
SWAPS R0                ;R0 <-- SR0 = Nt + Si
MOVR R1                 ;R1 <-- R0
SWAPS R0                ;SR0 <-- R0 = Nt + Si
SWAPS R3                ;R3 <-- SR3 = Mi
MOVA R3
SHLN UNO
SHLN UNO
OR R1                   ;R0 <-- Mi + Nt + Si
SWAPS R3                ;R3 <-- SR3 = Mi
SWAPS R1                ;R1 <-- SR1 = Vi
STOREB
NOP
SWAPS R1
RST R1
SWAPS R2                ;R2 <-- SR2 = sum_W
MOVA R2
SWAPS R2
STOREB
NOP
RET
; ----------------------------------------------------------------------

; ------------------------- BACKGROUND_ACTIVITY ------------------------
.BACKGROUND_ACTIVITY
SWAPS R3                ;R3 <-- SR3_2 = INITIAL SPIKING
MOVA R3
SWAPS R3
SHRN UNO
MOVR R3
FREEZENC                ;IF (Z = 1) THEN Tref is setting
    SWAPS R0            ; R0 <-- SR0_2 = Neuron Type + Si
    SHRN UNO
    SHLN UNO
    LDALL R0,UNO
    SWAPS R0            ; R0 <-- SR0_2 = Neuron Type + Si
UNFREEZE
SWAPS R3                ;SR3_2 <-- R3 = NEW SPIKING REG VALUE
RET
; ----------------------------------------------------------------------

; ---------------------- ENABLE SPIKES PROPAGATION ---------------------
.SPIKES_ENABLE
SWAPS R0
MOVR R2                 ; R2 <-- St + Si
SWAPS R0
MOVA R2                 ; R0 <== Spikes
STOREPS
RET
; ----------------------------------------------------------------------

; **************************** PROCEDURES END ******************************

; **************************** MAIN PROGRAM BEGIN ************************
.MAIN
LOOPN neurons_virtualized   ; neurons_virtualized is the number of neurons per processor
    GOTO LOAD_NEURAL_PARAMETERS
    GOTO MEMBRANE_VALUE
    GOTO SAVE_NEURAL_PARAMETERS
ENDL

LOOPS synapses              ;synaptic loop ; synapses is the number of synapses per processor
    GOTO SYNAPSE_LOAD
    GOTO SYNAPTIC_WEIGHT
    GOTO SYNAPSE_SAVE
ENDL

LOOPN neurons_virtualized
    GOTO LOAD_NEURAL_PARAMETERS
    GOTO SPIKE_UPDATE
    GOTO BACKGROUND_ACTIVITY
    GOTO NEURON_DISPLAY
    GOTO SPIKES_ENABLE
    GOTO SAVE_NEURAL_PARAMETERS
ENDL
NOP
SPKDIS
NOP
NOP
GOTO MAIN
; **************************** MAIN PROGRAM END **************************
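As a reading aid, the order in which one pass of .MAIN calls the routines can be sketched as follows. This is only an illustrative model written in Python (not a language used in the thesis): the function name cycle_schedule is hypothetical, the NOP padding instructions are left out, and nothing here emulates what the routines actually compute.

```python
from typing import List


def cycle_schedule(neurons_virtualized: int, synapses: int) -> List[str]:
    """Return the routine names in the order one pass of .MAIN invokes them.

    Hypothetical helper: it mirrors only the loop structure of the listing
    (LOOPN / LOOPS / LOOPN followed by SPKDIS), not the routines' behaviour.
    """
    calls: List[str] = []

    # First LOOPN: one membrane update per virtualized neuron.
    for _ in range(neurons_virtualized):
        calls += ["LOAD_NEURAL_PARAMETERS", "MEMBRANE_VALUE", "SAVE_NEURAL_PARAMETERS"]

    # LOOPS: one weight computation per synapse of the processor.
    for _ in range(synapses):
        calls += ["SYNAPSE_LOAD", "SYNAPTIC_WEIGHT", "SYNAPSE_SAVE"]

    # Second LOOPN: spike update, monitoring and spike output per neuron.
    for _ in range(neurons_virtualized):
        calls += [
            "LOAD_NEURAL_PARAMETERS",
            "SPIKE_UPDATE",
            "BACKGROUND_ACTIVITY",
            "NEURON_DISPLAY",
            "SPIKES_ENABLE",
            "SAVE_NEURAL_PARAMETERS",
        ]

    # SPKDIS closes the pass before GOTO MAIN restarts it.
    calls.append("SPKDIS")
    return calls


if __name__ == "__main__":
    schedule = cycle_schedule(neurons_virtualized=7, synapses=20)
    print(len(schedule), "routine calls per pass")
```

With 7 neurons and 20 synapses per PE (the configuration used in the following appendices), the sketch lists 7*3 + 20*3 + 7*6 routine calls plus the final SPKDIS.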
Appendix-B 
The MATLAB code below generates the tags of the SNAVA+ CAMs, i.e. the mapping of the synapses in the network. The content of the generated text file must be copied into the SNAVA_pkg.vhd project file. This code generates a specific topology for an implementation of SNAVA+ with 20 synapses and 7 neurons per PE. Each neuron is assigned 3 synapses, except for the neurons of the first virtualization layer, which have 2 synapses. The synapse mapping is shown in Figure B.1.
	  
Figure B.1 – The neurons of each PE are connected as shown in this figure. The circles represent the neurons (in this case there are 7 neurons per PE), while the connectors represent the synapses (three synapses for every connection except the one between neuron 7 and neuron 1, which has two).
This MATLAB code can easily be adapted to any required configuration and implementation of SNAVA+, so it should be understood as an example from which other topologies can be derived.

clc; clear all; close all;
depth=input('Enter the no. of Virtualization Level :');
row=input('Enter the no of Rows in PE array : ');
col=input('Enter the no of cols in PE array : ');
chip_id=input('Enter Total no of Chips : ');
f=fopen('snava.txt','w');
cnt=0;
for i=1:row
for j=1:col
cnt=0;
for k=1:depth
    for l=1:row
        fprintf(f,'\n(--Component %d of x\n',(l-1));
        for m=1:col
            fprintf(f,'\n(--Component %d of y \n',(m-1));
            for n=1:20
                if((i==l)&&(j==m)&&(k==n))
                    % no tag is written for this combination
                else
                    fprintf(f,'%2d =>',(n-1)); %PRINT SYNAPSE NUMBER (STARTS FROM 0)
                    fprintf(f,'\t');
                    %----PRINT CHIP_ID----
                    var=dec2bin(chip_id,7);
                    fprintf(f,'"%s',var);
                    fprintf(f,'\t');
                    %------------------------
                    %------PRINT ROW---------
                    var=dec2bin(m-1,4);
                    fprintf(f,'%s',var);
                    fprintf(f,'\t');
                    %------------------------
                    %------PRINT COLUMN------
                    var=dec2bin(l-1,4);
                    fprintf(f,'%s',var);
                    fprintf(f,'\t');
                    %------------------------
                    %------PRINT DEPTH-------
                    if n==1 || n==2
                        var=dec2bin(depth,3);
                    elseif n==3 || n==4 || n==5
                        var=dec2bin(1,3);
                    elseif n==6 || n==7 || n==8
                        var=dec2bin(2,3);
                    elseif n==9 || n==10 || n==11
                        var=dec2bin(3,3);
                    elseif n==12 || n==13 || n==14
                        var=dec2bin(4,3);
                    elseif n==15 || n==16 || n==17
                        var=dec2bin(5,3);
                    elseif n==18 || n==19 || n==20
                        var=dec2bin(6,3);
                    end
                    fprintf(f,'%s",',var);
                    fprintf(f,'\t');
                    %------------------------
                    fprintf(f,'  --synapse %d',n);
                    fprintf(f,'\n');
                    cnt=cnt+1;
                end;
            end;
            fprintf(f,'others => "1111111111111111111"),');
            fprintf(f,'--End of y component of %d\n',(m-1));
        end;
        fprintf(f,'--End of %d component of x\n',(l-1));
        fprintf(f,');\n');
    end;
end;
end;
end;

fprintf(f,'\n\nEND\n');
fclose(f); % close the output file so that its content is flushed to disk
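The bit layout of one tag, as written by the listing above, can be summarised with a short sketch. It is given in Python only for illustration (the thesis uses MATLAB); the names source_layer and cam_tag are hypothetical, and the tab characters that the MATLAB script prints between the fields are omitted. The field order follows the listing: 7 bits of chip_id, 4 bits taken from the y index (m-1), 4 bits taken from the x index (l-1), and 3 bits for the layer.

```python
def source_layer(n: int, depth: int) -> int:
    """Layer field written for synapse n (1-based, 1..20), as in the listing.

    Synapses 1-2 carry the last layer (`depth`); after that every group of
    three synapses carries the next layer, starting from layer 1.
    """
    if not 1 <= n <= 20:
        raise ValueError("the listing only defines synapses 1..20")
    if n <= 2:
        return depth
    return (n - 3) // 3 + 1


def cam_tag(chip_id: int, y: int, x: int, n: int, depth: int) -> str:
    """Concatenate the four tag fields as binary strings (7 + 4 + 4 + 3 bits)."""
    return f"{chip_id:07b}{y:04b}{x:04b}{source_layer(n, depth):03b}"


if __name__ == "__main__":
    # Synapse 1 of the PE at indices (0, 0) on chip 1, with 7 layers.
    print(cam_tag(chip_id=1, y=0, x=0, n=1, depth=7))
```

Under these assumptions, synapses 1 and 2 point back to the last layer, which corresponds to the two-synapse connection between neuron 7 and neuron 1 in Figure B.1.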
Appendix-C 
The MATLAB code below generates the initialization values of the synaptic and neuronal BRAMs. The content of the generated text file must be copied into config_file.vhd.
These MATLAB scripts can easily be adapted to any required configuration and implementation of SNAVA+, so they should be understood as examples from which other topologies can be derived.
Synaptic BRAMs:
In this example the initial content of all the synaptic BRAMs is 0x7FFE0CCC, so that all the synapses are excitatory.
clc; clear all; close all;

n_row=input('Enter the no of Rows in PE array : ');
n_col=input('Enter the no of cols in PE array : ');
n_syn=input('Enter the no of synapses per PE of the array : ');

f=fopen('bram_syn.txt','w');

PE_cnt=1;

for col = 1: n_col
    for row = 1: n_row
        fprintf(f,'\n---- BRAM SYNAPTIC PE %d\n--     address   layer   row   col   ', PE_cnt);
        for syn = 1: n_syn
            fprintf(f,'\n("10110');
            var=dec2bin((syn-1),11); %addr of BRAM
            fprintf(f,'%s',var);
            fprintf(f,'000');
            var=dec2bin(row,5);
            fprintf(f,'%s',var);
            var=dec2bin(col,5);
            fprintf(f,'%s',var);
            fprintf(f,'000');
            fprintf(f,'",X"7FFE0CCC"),');
            fprintf(f,'\t');
            fprintf(f,'-- synapse %d',syn);
        end;
        PE_cnt =PE_cnt +1;
        fprintf(f,'\n\n');
    end;
end;

fclose(f); % close the output file so that its content is flushed to disk
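The 32-bit string that the script writes in front of each data word can be modelled with a short sketch. It is written in Python only for illustration; the function name bram_config_address is hypothetical. The field names (address, layer, row, col) are taken from the comment header printed by the script, while the meaning of the 5-bit prefix and of the trailing "000" is not stated in the listing and is left uninterpreted here.

```python
def bram_config_address(prefix: str, address: int, row: int, col: int) -> str:
    """Build the 32-character bit string written before each BRAM data word.

    Layout, in the order the script prints it:
      5-bit prefix ("10110" for synaptic, "10101" for neuronal BRAMs),
      11-bit BRAM address, "000" (layer column of the header),
      5-bit row, 5-bit col, trailing "000".
    Row and col are 1-based, exactly as the MATLAB loops pass them.
    """
    if len(prefix) != 5 or set(prefix) - {"0", "1"}:
        raise ValueError("prefix must be a 5-character bit string")
    if not 0 <= address < 2**11:
        raise ValueError("address must fit in 11 bits")
    if not (0 <= row < 2**5 and 0 <= col < 2**5):
        raise ValueError("row and col must fit in 5 bits")
    return f"{prefix}{address:011b}000{row:05b}{col:05b}000"


if __name__ == "__main__":
    # First synaptic word (address 0) of the PE at row 1, column 1.
    print(bram_config_address("10110", 0, 1, 1))
```

The field widths add up to 5 + 11 + 3 + 5 + 5 + 3 = 32 bits, matching the width of the data word X"7FFE0CCC" that follows it in the generated file.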







In this example, the number of neuronal BRAM words assigned to each neuron is 2.
clc; clear all; close all;

n_row=input('Enter the no of Rows in PE array : ');
n_col=input('Enter the no of cols in PE array : ');
n_depth=input('Enter the depth : ');

f=fopen('bram_neu.txt','w');

PE_cnt=1;

for col = 1: n_col
    for row = 1: n_row
        fprintf(f,'\n---- BRAM NEURONAL PE %d\n--     address   layer   row   col   ', PE_cnt);
        for depth = 1: (2 * n_depth)
            fprintf(f,'\n("10101');
            var=dec2bin((depth-1),11); %addr of BRAM
            fprintf(f,'%s',var);
            fprintf(f,'000');
            var=dec2bin(row,5);
            fprintf(f,'%s',var);
            var=dec2bin(col,5);
            fprintf(f,'%s',var);
            fprintf(f,'000');
            if (mod(depth,2) == 0)
                %many case for LOOP BACK implementation
                if(PE_cnt <= 10 )
                    fprintf(f,'",X"E1880010"),');
                elseif (PE_cnt <= 20 && PE_cnt >= 11)
                    fprintf(f,'",X"E1880020"),');
                elseif (PE_cnt <= 30 && PE_cnt >= 21)
                    fprintf(f,'",X"E1880040"),');
                elseif (PE_cnt <= 40 && PE_cnt >= 31)
                    fprintf(f,'",X"E1880080"),');
                elseif (PE_cnt <= 50 && PE_cnt >= 41)
                    fprintf(f,'",X"E1880100"),');
                elseif (PE_cnt <= 60 && PE_cnt >= 51)
                    fprintf(f,'",X"E1880200"),');
                elseif (PE_cnt <= 70 && PE_cnt >= 61)
                    fprintf(f,'",X"E1880400"),');
                elseif (PE_cnt <= 80 && PE_cnt >= 71)
                    fprintf(f,'",X"E1880800"),');
                elseif (PE_cnt <= 90 && PE_cnt >= 81)
                    fprintf(f,'",X"E1881000"),');
                elseif (PE_cnt >= 91)
                    fprintf(f,'",X"E1882000"),');
                else
                    fprintf(f,'",X"E1880000"),');
                end
                fprintf(f,'\t');
                neuron_number = depth /2;
                fprintf(f,'-- neuron %d',neuron_number);
            else
                fprintf(f,'",X"00000000"),');
                fprintf(f,'\t');
                neuron_number = (depth + 1) /2;
                fprintf(f,'-- neuron %d',neuron_number);
            end;
        end;
        PE_cnt =PE_cnt +1;
        fprintf(f,'\n\n');
    end;
end;

fprintf(f,'\n\n-- END BRAM NEURONAL\n ');

fclose(f);
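The if/elseif chain of the listing follows a regular pattern: the PEs are taken in groups of ten, and each group sets the next higher bit of the data word, starting from 0x10. The sketch below restates this pattern compactly. It is an illustrative Python model, not part of the thesis; the name neuron_init_word is hypothetical, and pe_cnt is assumed to be a positive integer, as produced by the MATLAB counter.

```python
def neuron_init_word(pe_cnt: int, depth: int) -> int:
    """Data word written for BRAM word `depth` (1-based) of PE number `pe_cnt`.

    Odd words are zero. Even words are 0xE1880000 plus one bit selected by
    the group of ten PEs to which pe_cnt belongs; every PE from 91 onwards
    shares the last pattern, as in the listing.
    """
    if pe_cnt < 1 or depth < 1:
        raise ValueError("pe_cnt and depth are 1-based")
    if depth % 2 == 1:
        return 0x00000000
    group = min((pe_cnt - 1) // 10, 9)  # 0 for PEs 1-10, ..., 9 for PEs >= 91
    return 0xE1880000 | (0x10 << group)


def neuron_number(depth: int) -> int:
    """Neuron to which BRAM word `depth` belongs (two words per neuron)."""
    return (depth + 1) // 2


if __name__ == "__main__":
    for pe in (1, 11, 55, 91):
        print(pe, f"{neuron_init_word(pe, 2):08X}")
```

This form makes it easier to adapt the initialization to arrays with a different number of PEs per group, which the explicit chain would require editing branch by branch.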
	  
	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  
Appendix-D
This appendix presents a simple experimental test on SNAVA+ and discusses its results. The test uses the Kintex KC705 base board (see Figure 5.4) connected via Ethernet to a PC. To interpret the results of the test, a PC application called SNAVA-HMI was used; it was designed by Mr. Salvatore Cambria specifically to interface the user with SNAVA and SNAVA+. The test was carried out on an implementation of SNAVA+ with a 10x10 array, 20 synapses and 7 neurons per PE. Each neuron is assigned 3 synapses, except for the neurons of the first virtualization layer, which have 2 synapses. The synapse mapping is shown in Figure D.1.
	  
Figure D.1 – The neurons of each PE are connected as shown in this figure. The circles represent the neurons (in this case there are 7 neurons per PE), while the connectors represent the synapses (three synapses for every connection except the one between neuron 7 and neuron 1, which has two).

The initial conditions of SNAVA+ (in this case the initialization values of the synaptic BRAMs) have been set so that, during the first execution cycle, all the neurons of the first virtualization level (first layer) send a spike. The experiment verifies that these spikes are correctly transmitted to the next neurons according to the topology shown in Figure D.1.
	  Therefore,	  if	  SNAVA+	  works	  correctly,	  the	  result	  will	  be:	  	  
• CYCLE 1: all the neurons of the first layer send a spike.
• CYCLE 2: the spike is received by the neurons of the second layer, which are in Phase 1 of the execution (see Section 1.4.1 and Appendix A). No spikes are sent.
• CYCLE 3: all the neurons of the second layer send a spike.
• CYCLE 4: the spike is received by the neurons of the third layer, which are in Phase 1 of the execution (see Section 1.4.1 and Appendix A). No spikes are sent.
• CYCLE 5: all the neurons of the third layer send a spike.
• CYCLE 6: the spike is received by the neurons of the fourth layer, which are in Phase 1 of the execution (see Section 1.4.1 and Appendix A). No spikes are sent.
• CYCLE 7: all the neurons of the fourth layer send a spike.
• CYCLE 8: the spike is received by the neurons of the fifth layer, which are in Phase 1 of the execution (see Section 1.4.1 and Appendix A). No spikes are sent.
• CYCLE 9: all the neurons of the fifth layer send a spike.
• CYCLE 10: the spike is received by the neurons of the sixth layer, which are in Phase 1 of the execution (see Section 1.4.1 and Appendix A). No spikes are sent.
• CYCLE 11: all the neurons of the sixth layer send a spike.
• CYCLE 12: the spike is received by the neurons of the seventh layer, which are in Phase 1 of the execution (see Section 1.4.1 and Appendix A). No spikes are sent.
• CYCLE 13: all the neurons of the seventh layer send a spike.
• CYCLE 14: the spike is received by the neurons of the first layer, which are in Phase 1 of the execution (see Section 1.4.1 and Appendix A). No spikes are sent.
• CYCLE 15: all the neurons of the first layer send a spike.
• The same scheme is repeated for all the other execution cycles.
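The expected sequence listed above is periodic: each of the seven layers takes one cycle to send its spike and the following cycle is used for reception, so the pattern repeats every 14 cycles. The following Python sketch reproduces that schedule. It is only an illustration of the list above, not part of SNAVA+ or of the SNAVA-HMI; the function names `spiking_layer` and `receiving_layer` are hypothetical.

```python
NUM_LAYERS = 7             # layers in the ring topology of Figure D.1
PERIOD = 2 * NUM_LAYERS    # one sending cycle plus one reception cycle per layer


def spiking_layer(cycle):
    """Return the layer (1..7) expected to spike in this cycle, or None."""
    if cycle < 1:
        raise ValueError("execution cycles are numbered from 1")
    offset = (cycle - 1) % PERIOD
    if offset % 2 == 1:
        return None            # even-numbered cycle: reception only, no spikes
    return offset // 2 + 1


def receiving_layer(cycle):
    """Return the layer (1..7) receiving spikes in this cycle, or None."""
    if cycle < 1:
        raise ValueError("execution cycles are numbered from 1")
    offset = (cycle - 1) % PERIOD
    if offset % 2 == 0:
        return None            # odd-numbered cycle: a layer is sending
    # Layer 7 feeds back into layer 1, hence the modulo wrap-around.
    return ((offset + 1) // 2) % NUM_LAYERS + 1


if __name__ == "__main__":
    for cycle in range(1, 16):
        sender = spiking_layer(cycle)
        if sender is not None:
            print(f"CYCLE {cycle}: layer {sender} sends a spike")
        else:
            print(f"CYCLE {cycle}: layer {receiving_layer(cycle)} receives, no spikes sent")
```

Running the script prints one line per cycle for cycles 1 to 15, in the same order as the list above.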
Figures D.2 and D.3 show two screenshots of the SNAVA-HMI. The Y-axis shows the neuron IDs, while the X-axis shows the execution cycle numbers. The red points indicate the spikes. Figure D.2 shows only the 7 neurons of the first PE: the behavior of the system is the expected one. The correct behavior is also visible in Figure D.3, which instead shows all 700 neurons. Because of the large number of neurons, the behavior (which is the same as that of Figure D.2 and holds for each of the 100 PEs) is more difficult to appreciate in Figure D.3.

Figure D.3 – SNAVA-HMI screenshot, all the neurons
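Since the pattern is hard to judge by eye with 700 neurons, the expected raster can also be generated and compared with the recorded spikes programmatically. The sketch below is a minimal illustration in Python under stated assumptions: neuron IDs are taken to be 0-based and laid out as `pe * 7 + layer_index`, and the recorded spikes are assumed to be available as `(cycle, neuron_id)` pairs. Neither the ID layout nor such an export format is specified in this appendix, and the names `expected_raster` and `raster_mismatches` are hypothetical.

```python
NEURONS_PER_PE = 7   # one neuron per layer in each PE (assumed from Figure D.2)
NUM_PES = 100        # 100 PEs x 7 neurons = 700 neurons


def expected_raster(num_cycles):
    """Build the set of expected (cycle, neuron_id) spike points."""
    points = set()
    # Spikes are expected only on odd cycles; even cycles are reception only.
    for cycle in range(1, num_cycles + 1, 2):
        layer_index = ((cycle - 1) // 2) % NEURONS_PER_PE   # 0-based layer
        for pe in range(NUM_PES):
            # Assumed ID layout: consecutive IDs within each PE.
            points.add((cycle, pe * NEURONS_PER_PE + layer_index))
    return points


def raster_mismatches(observed, num_cycles):
    """Compare observed spike points with the expected raster."""
    expected = expected_raster(num_cycles)
    observed = set(observed)
    return {
        "missing": expected - observed,      # expected spikes that did not occur
        "unexpected": observed - expected,   # spikes that should not have occurred
    }


if __name__ == "__main__":
    recorded = expected_raster(15)           # stand-in for real recorded data
    result = raster_mismatches(recorded, 15)
    print(len(result["missing"]), len(result["unexpected"]))
```

With real data, both sets being empty would correspond to the behavior described for Figures D.2 and D.3.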
