Operation and reconstruction of signals based on integrate-and-fire conversion using FPGA by Guilherme Luis Leitão Teixeira Guia de Carvalho
FACULDADE DE ENGENHARIA DA UNIVERSIDADE DO PORTO
Operation and reconstruction of signals
based on integrate-and-fire conversion
using FPGA
Guilherme Carvalho
DISSERTATION THESIS
Mestrado Integrado em Engenharia Eletrotécnica e de Computadores
Supervisor: Vítor Grade Tavares
Co-Supervisor: João Canas Ferreira
February 14, 2017
© Guilherme Carvalho, 2017
Summary
Typical Nyquist-rate analog-to-digital conversion architectures tend to occupy a considerable area in integrated circuits and to consume more power than desired. As integrated circuit technologies approach their miniaturization limit, new and improved architectures are needed to overcome most of the problems arising from this scaling effort. Alternative solutions may lie in time encoding machines, which essentially convert information into a time sequence. In general, the encoding architectures result in simpler, low-power analog circuits, while most of the complexity is moved to a fully digital decoding block, which should be located where more resources are available.
The integrate-and-fire neuron (IFN) is a time encoding machine based on a simplified, first-order model of the operation of biological neurons. The IFN removes the relevance of amplitude at the encoder output and suits the purpose of balancing complexity between analog and digital design. The IFN modulates a signal very efficiently and, besides reconstruction, operations can be performed on the signal in the encoded domain, and useful information can be retrieved by directly analysing the pulse density (the encoded signal). The impact of this encoding method may be significant in several fields, such as wireless sensor networks. In this context, the encoding of the signals measured by the sensors needs to be very efficient. Nevertheless, the original signal will eventually have to be recovered. This is the object of research of the present work.
In this document, the current state of the art in IFN conversion is reviewed and the theory is compared with empirically obtained results. A novel hardware architecture for reconstructing the IFN signal is also presented. The underlying goal is to build a reconstruction system that can recover the original signal with sufficient quality and in a short time, so that it can be used in real-time applications.
Abstract
Typical analog-to-digital conversion architectures, at Nyquist rate, tend to occupy a large share
of the integrated circuit die area and to consume more power than desired. Because transistor
technology is reaching its limit in miniaturization, new and optimized architectures are required to
overcome most of the problems that new technology nodes are bringing. Alternatives may reside
in time encoding machines that have the capability of transforming information into a timing
sequence. Typical encoder architectures result in simpler and low-power analog circuits, while
most of the complexity is passed to a fully digital decoding stage, stationed in a place with access
to more resources.
Integrate-and-fire modulation is a time encoding machine based on a simplified first-order model of neuron operation and signal processing. It removes the relevance of magnitude at the output of the encoder, and fits the purpose of balancing complexity between analog and digital design. Integrate-and-fire modulation encodes a signal in a very power-efficient way and, besides regular reconstruction, one can perform operations on the signal in the encoded domain and retrieve useful information by directly analysing the spiking information (the encoded signal). The impact of this encoding method might be important in various fields, such as wireless sensor networks. In this context, the encoding of measured signals by sensor nodes needs to be very efficient. Nevertheless, the original signal eventually needs to be recovered. This is the object of research in the present work.
In this document, the current state of the art on the integrate-and-fire conversion will be re-
viewed and the theory compared to results obtained from empirical data. A novel hardware ar-
chitecture for the spiking signal reconstruction is also presented. The underlying goal is to ac-
complish a reconstruction system that can recover the original signal with enough quality and in a
short amount of time, such that it is suitable for real-time applications.
Acknowledgments
First of all, I would like to thank everyone who supported me and contributed to this important stage of my life. Among them, I would like to give special mention to my parents and grandparents, who have always supported me unconditionally, and to my uncle for hosting me and putting up with me throughout my academic journey.
Secondly, I thank Professor Vítor Grade Tavares and Professor João Canas Ferreira for their indispensable contribution to this document. I also extend my thanks to Iman Kianpour for helping me whenever it was needed.
Finally, I would like to mention how important all the friendships made and the experiences lived over these years have been.
Guilherme Carvalho
Contents
1 Introduction
1.1 Context and Motivation
1.2 Problem Statement and Goals
1.3 Challenges Faced
1.4 Document Structure
2 Background and Related Work
2.1 Hilbert Spaces
2.1.1 L2 Spaces and Projections
2.1.2 Continuous Wavelet Transform
2.2 Least-Squares approximation and Regularization
2.3 Sampling
2.3.1 Uniform Sampling
2.3.2 Nonuniform Sampling
2.4 Pulse Code modulation
2.4.1 Delta Modulation
2.4.2 Sigma-Delta Modulation
2.5 Time Encoding Machines
2.5.1 Asynchronous Sigma-Delta Modulation
2.5.2 Amplitude-to-time Conversion
2.6 Chapter review
3 IFN Modulation
3.1 Encoder Stage
3.2 Reconstruction
3.2.1 Spiking Model Reconstruction
3.2.2 Spline Based Reconstruction
3.2.3 Frequency Domain Reconstruction
3.3 The IFN compared to other modulations
3.4 Empirical Test Results
3.4.1 Recovery Test Results
3.4.2 IFN and the ECG signal
4 Implementation and Architecture
4.1 Adapted IFN Algorithm
4.1.1 Conjugate Gradient
4.2 Architecture
4.2.1 Memory Manager
4.2.2 Spike Location and Matrix C Calculation
4.2.3 CGSU
4.2.4 Sinc Weighting
4.2.5 Master Control
4.3 Other Implementation notes
5 Design Testing and Results
5.1 Hardware Setup
5.2 Experimental Results
5.3 Matrix-Vector Multiplication
5.3.1 Matrix-Vector Parallelization
5.3.2 Structured matrix approach
6 Conclusion and Future Work
6.1 Conclusion
6.2 Future Work
List of Figures
2.1 Projection of a vector u onto another vector v
2.2 Encoder and decoder stage of the delta modulation
2.3 Encoder and decoder stage of the sigma-delta modulation
2.4 Behavior of a sigma-delta converter. a) Original input signal to be encoded. b) Input of the integrator. c) Encoded signal
2.5 Frequency spectrum of a digital bitstream encoded using the sigma-delta modulation. As can be seen, most of the quantization noise is spread to higher frequencies whilst the original signal is preserved in the lower part of the spectrum.
2.6 Encoder stage of the asynchronous sigma-delta modulation
2.7 Behavior of the asynchronous sigma-delta converter. a) Original input signal to be encoded. b) Encoded signal. c) Integrated signal
2.8 Encoder stage of the Amplitude-to-time conversion
2.9 Behavior of an ATC converter. a) Original input signal to be encoded. b) Output of the integrator. In this particular case the threshold chosen was 0.05. c) Encoded signal
3.1 Example of an encoded IFN signal. a) Original signal at the input of the encoder. b) Spiking signal resulting from the encoding process
3.2 Encoder stage of the integrate and fire. a) represents the uniphasic IFN encoder. b) represents the biphasic IFN encoder.
3.3 FWGN signals used as example set
3.4 Reconstructions Results
3.5 ENOB of a reconstructed signal according to the Threshold θ and reconstruction filter frequency used. The upper bound of eq. 3.5 is seen in the figure as being very close to the actual optimal value
3.6 Partial Reconstruction of an ECG signal
3.7 Representation of the effects of the reconstruction threshold offset θ_offset and threshold θ on the ENOB of the reconstruction
4.1 Block Diagram of the Decoder implementation
4.2 Block Diagram of the Memory Manager implementation
4.3 Packet information and impact on reconstruction
4.4 FSM controlling the MMU
4.5 Pipeline Diagram of the CGS Hardware Adapted Algorithm. u. represents a timing unit given by the delay introduced by the SPFP pipelining
4.6 Block Diagram of the CGS implementation
4.7 Block Diagram of the Inner CGS Block implementation
4.8 Diagram of the FSM controlling the CGSU
4.9 Representation of the FSM controlling the Decoder
5.1 Block diagram of the board design
5.2 Comparison between the output result of the FPGA and the real signal. The part of the signal shown corresponds to a part of the first QRS section of the ECG signal in fig. 3.6
5.3 Reconstruction error of two FWGN signals.
5.4 Diagram of matrix-vector multiplication methodologies. a) represents row-wise block partitioning. b) represents column-wise block partitioning
List of Tables
5.1 Xilinx SPFP IP cores resource utilization
5.2 Total resource utilization
5.3 Implementation’s power consumption in watts
Abbreviations and Symbols
ADC Analog to Digital Converter
ASDM Asynchronous Sigma-Delta Modulation
ATC Amplitude-to-Time Conversion
BRAM Block RAM
CG Conjugate Gradient
CGS Conjugate Gradient Squared
CGSU Conjugate Gradient Squared Unit
CWT Continuous Wavelet Transform
DAC Digital to Analog Converter
DM Delta Modulation
DPFP Double-Precision Floating-Point
DWT Discrete Wavelet Transform
ENOB Effective Number Of Bits
FF Flip-Flop
FFT Fast Fourier Transform
FP Fixed-Point
FPGA Field-Programmable Gate Array
FSM Finite State Machine
FWGN Filtered White Gaussian Noise
HDL Hardware Description Language
IC Integrated Circuit
IFN Integrate-and-Fire Neuron
I/O Input/Output
LIF Leaky Integrate-and-Fire
LUT LookUp Table
MMU Memory Manager Unit
MRA Multi-Resolution Analysis
OS Operating System
PCM Pulse Code Modulation
PL Programmable Logic
PS Processing System
RAM Random Access Memory
RSSPD Real Square Symmetric Positive-Definite
SinInt Cardinal Sine Integral
SD Secure Digital
SDM Sigma-Delta Modulation
SDPR Simple Dual Port RAM
SPFP Single-Precision Floating-Point
SSH Secure Shell
TDPR True Dual Port RAM
TEM Time Encoding Machine
WGN White Gaussian Noise
Chapter 1
Introduction
1.1 Context and Motivation
Analog-to-digital converters (ADCs) are among the most widely used devices in electronics. They enable most naturally generated signals to be processed by digital systems and have arguably been instrumental in recent technological growth. These devices can be designed in many ways but, in general, to provide good resolution in the digital domain, ADCs occupy a significant die area and consume more power than desired. A possible way to decrease die area is simply to decrease the transistor size. This approach has been applied for many years, as transistor technology keeps improving and shrinking [1], but it does not seem to be a good long-term solution. As Gordon E. Moore put it, “The cost of building a modern chip fabrication plant will double every 18 months until it becomes infeasible to build new plants”. One possible solution is to optimize current processing technologies so as to reduce the number of components required per integrated circuit (IC), instead of increasing component density to achieve higher computational power. In recent years, interest in time-based encoding techniques, or time encoding machines (TEMs), seems to be resurging as an alternative to the widespread amplitude-based encoding.
TEMs encode analog information into a one-dimensional timing sequence, called a spiking signal. Spiking signals can be represented in the digital domain, which makes TEMs suitable for converting analog signals to the digital domain. TEMs have the peculiarity of possessing very simple encoders (which, in most cases, can be fully analog) that consume very little power and occupy a smaller die area than their amplitude-based counterparts. The decoder stages of TEMs are fairly complex compared to their encoders, but moving the complexity and power consumption from the encoder stage to the decoder stage might be an acceptable trade-off for most applications.
The integrate-and-fire neuron (IFN) is a TEM inspired by the biological neural network and
its unique capability to process analog information and convert it to a digital signal, encoding time
instead of amplitude and therefore allowing a significant decrease in amplitude related noise. The
first rudimentary electronic model of a neuron was described by Lapicque [2] and later perfected
by Hodgkin and Huxley [3], including now the dynamics of the voltage-dependent membrane
conductances responsible for action potential generation. Even though Lapicque’s model, or the
integrate-and-fire neuron model, is fairly simple, it covers the main idea of the neuron’s infor-
mation processing in a first-order system, rendering it especially useful for many applications.
Furthermore, this simple system has very low power characteristics and has the advantage that the
encoder can be fully designed using analog circuitry.
In this work, the state of the art on the IFN modulation will be reviewed and, based on the
current spiking signal reconstruction techniques, a new decoding architecture will be proposed and
tested. This same architecture should be able to recover the original signal within an acceptable
time frame and with enough quality to make it suitable for real-time applications.
1.2 Problem Statement and Goals
Recently, the costs associated with chip manufacturing have skyrocketed, and there seems to be no way to stop the trend given the demand for faster and smaller machines. Since amplitude-based encoders seem to require more power and the transmission of more information, time encoding machines are gaining attention for their comparatively simpler encoders.
The main goal of this work is to develop an IFN signal processing platform and decoder using an FPGA. This system should be capable of recovering any spike-based signal at its input and provide an ENOB of at least 8.00 bits given a set of reasonable parameters. By the end of the document, it should also be clear how the decoder can be applied to a real-time application.
1.3 Challenges Faced
There were many challenges associated with the reconstruction of IFN-based signals and the subsequent hardware implementation. The literature on the modulation itself and its applications is scarce, and it is the author’s belief that this is one of the first works describing a hardware implementation of an IFN decoder.
The main challenge is to ensure a fast recovery for real-time applications without loss of precision. Since precision and recovery time are traded off against each other, the reconstruction algorithm had to be adapted. The second challenge is to give the algorithm a certain degree of customization and versatility, allowing signals with different characteristics to be decoded. The third and last challenge concerns the hardware’s characteristics. Area and power consumption are important factors, and the architecture plays a leading role in minimizing them.
1.4 Document Structure
Besides the introduction, this document contains five more chapters. Chapter 2 gives the theoretical background required to better understand the IFN modulation and the recovery of the spiking signal. Furthermore, some other amplitude- and time-based modulations are described so as to establish similarities and differences. In chapter 3, the IFN modulation itself is presented. Both the encoding and decoding stages are thoroughly described, and some empirical test results are shown to corroborate the theory. Chapter 4 focuses on algorithm optimization for hardware implementation and on the architecture. The performance of this implementation is then presented in chapter 5. In the final chapter (chapter 6), conclusions are drawn and future improvements are discussed.
Chapter 2
Background and Related Work
This chapter covers the fundamental tools and basic concepts required to better understand the algorithms described throughout the document, and presents the modulations related to the IFN. First, the concept of viewing signals as vectors is briefly introduced, so as to improve our understanding of signals in an N-dimensional space. Secondly, uniform and nonuniform sampling are introduced. The sampling method indicates how a signal is encoded and is therefore crucial for the reconstruction process. Thirdly and lastly, modulation techniques, both amplitude- and time-based, are briefly reviewed.
Readers already familiar with these concepts can skip this chapter.
2.1 Hilbert Spaces
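A Hilbert space is a generalization of a Euclidean space. It can be either a real or a complex inner product space, its elements must have finite energy (so that, by the Cauchy–Schwarz inequality, every inner product is finite), and its inner product has the following properties:
• Hermitian symmetry: $\langle u, v \rangle = \langle v, u \rangle^{*}$
• Hermitian bilinearity: $\langle \alpha v + \beta u, w \rangle = \alpha \langle v, w \rangle + \beta \langle u, w \rangle$ and $\langle w, \alpha v + \beta u \rangle = \alpha^{*} \langle w, v \rangle + \beta^{*} \langle w, u \rangle$
• Strict positivity: $\langle v, v \rangle \ge 0$, with equality only when $v = 0$
Hilbert spaces also extend to the continuous domain. Among the most relevant Hilbert spaces are the $L^2$, $L^2[(a,b)]$ and $L^2_w[(a,b)]$ spaces.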
2.1.1 L2 Spaces and Projections
$L^2$ spaces are Lebesgue spaces (also known as $L^p$ spaces) with $p = 2$. The inner product of two complex functions in $L^2$ is always finite and is defined by
\[ \langle f, g \rangle = \int f(x)\, g^{*}(x)\, dx \tag{2.1} \]
$L^2$ spaces can also be interval-limited and weighted, giving the $L^2_w[(a,b)]$ spaces, whose inner product is defined by
\[ \langle f, g \rangle = \int_a^b f(x)\, g^{*}(x)\, w(x)\, dx \tag{2.2} \]
Since vectors and continuous signals share the same properties, functions can also be projected onto one another, and the projection theorem also applies to continuous signals:
\[ f_{|g} = \frac{\langle f, g \rangle}{\langle g, g \rangle}\, g \tag{2.3} \]
Figure 2.1: Projection of a vector u onto another vector v
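As a minimal numerical sketch of equation 2.3 (not part of the original work), the inner product of equation 2.1 can be approximated by a Riemann sum over a sampled interval. The function names `inner` and `project`, the grid and the test signals are illustrative assumptions.

```python
import numpy as np

def inner(f, g, dx):
    """Approximate <f, g> = integral of f(x) g*(x) dx by a Riemann sum."""
    return np.sum(f * np.conj(g)) * dx

def project(f, g, dx):
    """Projection of f onto g, following equation 2.3."""
    return inner(f, g, dx) / inner(g, g, dx) * g

# Illustrative grid: one period of sin(x), sampled without the endpoint
x = np.linspace(0.0, 2.0 * np.pi, 2000, endpoint=False)
dx = x[1] - x[0]

f = np.sin(x) + 0.5 * np.sin(2.0 * x)   # signal to be projected
g = np.sin(x)                           # direction of the projection

f_on_g = project(f, g, dx)              # close to sin(x)
residual = f - f_on_g                   # close to 0.5 * sin(2x)
```

Because sin(x) and sin(2x) are orthogonal over a full period, the projection keeps only the sin(x) component and the residual is orthogonal to g.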
2.1.2 Continuous Wavelet Transform
Wavelets are a good example of how this relation between signals and vectors can be beneficial. There are two types of wavelet transforms, the continuous wavelet transform (CWT) and the discrete wavelet transform (DWT). The latter is extensively used in image processing.
Wavelets take advantage of what is called Multi-Resolution Analysis (MRA). MRA allows
the adjustment of the window size to the different frequencies present in the signal. For higher
frequencies (detail) the window should be small and for lower frequencies the window can be
bigger.
\[ \mathrm{CWT}(\tau, s) = \Psi(\tau, s) = \frac{1}{\sqrt{|s|}} \int x(t)\, \psi^{*}\!\left( \frac{t - \tau}{s} \right) dt \tag{2.4} \]
$\psi(t)$ is called the mother wavelet. A mother wavelet is an oscillatory function of finite length that serves as a prototype to be scaled (by $s$) and translated (by $\tau$). The higher the scale, the coarser the approximation.
The CWT thus measures how close the signal is to the scaled and translated versions of the mother wavelet.
Strong similarities between equations 2.4 and 2.2 can be observed, with $1/\sqrt{|s|}$ serving as a weight.
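The following sketch evaluates equation 2.4 by direct numerical integration, to illustrate how the CWT measures closeness to a scaled and translated wavelet. It is an illustration only: the Mexican-hat (Ricker) mother wavelet, the function names and all numerical values are assumptions, not choices made in this work.

```python
import numpy as np

def ricker(t):
    """Mexican-hat (Ricker) mother wavelet, unnormalized; an assumed choice."""
    return (1.0 - t**2) * np.exp(-0.5 * t**2)

def cwt_point(x, t, tau, s, mother=ricker):
    """Evaluate equation 2.4 at one (tau, s) pair by a Riemann sum over t."""
    dt = t[1] - t[0]
    return np.sum(x * np.conj(mother((t - tau) / s))) * dt / np.sqrt(abs(s))

# Illustrative signal: the mother wavelet itself, translated to t = 2 and scaled by 0.5
t = np.linspace(-5.0, 10.0, 3001)
x = ricker((t - 2.0) / 0.5)

taus = np.linspace(0.0, 4.0, 81)
row = np.array([cwt_point(x, t, tau, 0.5) for tau in taus])   # one scale, all translations
tau_peak = taus[np.argmax(np.abs(row))]                       # translation of best match
```

At the matching scale, the magnitude of the transform is largest at the translation where the signal and the wavelet coincide.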
2.2 Least-Squares approximation and Regularization
A common approximation problem is of the form
\[ \min_x \|Ax - b\|_2^2 \tag{2.5} \]
where $b$ may not lie in the column space of $A$, so that no $x$ satisfies $Ax = b$ exactly. This is a convex optimization problem, and one way of solving it is to associate a residual $r = Ax - b$ with the solution $x$ found and to minimize the sum of its squared components:
\[ \min_x \|Ax - b\|_2^2 = r_1^2 + r_2^2 + \dots + r_\infty^2 \tag{2.6} \]
The least-squares approximation is a well-known and extensively used method for solving this type of problem and, provided $A^T A$ is invertible, the solution is of the form
\[ x = (A^T A)^{-1} A^T b \tag{2.7} \]
If both the residual and the norm of $x$ should be kept small, a so-called regularization problem is faced, which is of the form
\[ \min_x \|Ax - b\|_2^2 + \delta \|x\|_2^2, \quad \delta > 0 \tag{2.8} \]
The solution of the regularized least-squares problem is given by
\[ x = (A^T A + \delta I)^{-1} A^T b \tag{2.9} \]
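Equations 2.7 and 2.9 can be checked numerically with a few lines of NumPy. This is a minimal sketch: the random test system, the value of δ and the function names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed so the sketch is repeatable
A = rng.standard_normal((20, 5))        # illustrative overdetermined system
b = rng.standard_normal(20)             # b generally lies outside the column space of A

def least_squares(A, b):
    """Equation 2.7: x = (A^T A)^-1 A^T b, solved without forming the inverse."""
    return np.linalg.solve(A.T @ A, A.T @ b)

def regularized_least_squares(A, b, delta):
    """Equation 2.9: x = (A^T A + delta I)^-1 A^T b, with delta > 0."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + delta * np.eye(n), A.T @ b)

x_ls = least_squares(A, b)
x_reg = regularized_least_squares(A, b, delta=1.0)

# Regularization trades a larger residual for a smaller solution norm
res_ls = np.linalg.norm(A @ x_ls - b)
res_reg = np.linalg.norm(A @ x_reg - b)
```

Solving the linear system with `np.linalg.solve` rather than computing the inverse explicitly is the usual numerical practice; the term δI also improves the conditioning of the system when $A^T A$ is close to singular.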
2.3 Sampling
2.3.1 Uniform Sampling
The theory of uniform sampling and signal reconstruction was revolutionized by Claude E. Shannon in 1949 in a now very famous paper [4], even though it was already common knowledge among communication engineers and theorists of that period. Other notable contributions to the sampling theorem were made by J. M. Whittaker [5], V. A. Kotel’nikov [6] and many others.
The Shannon-Whittaker-Kotel’nikov, or Nyquist-Shannon, theorem assumes a channel of bandwidth $\Omega$, restricted to a period of time $T$.
Theorem 1: If a function $f(t)$ is bandlimited to $[-\Omega, \Omega]$, it is completely determined by giving its ordinates at a series of points spaced $\pi/\Omega$ seconds apart.
Let $T = \pi/\Omega$. The function $f(t)$ is given by
\[ f(t) = \sum_{n=-\infty}^{\infty} f(nT)\, \mathrm{sinc}(t/T - n) \tag{2.10} \]
where $\mathrm{sinc}(x) = \sin(\pi x)/(\pi x)$.
This simple theorem has proven to be very important and useful over the years and has even been the object of several extensions [7]. While it might seem that almost every signal can be recovered with, or benefit from, this simple method of reconstruction, hardware implementations of uniform samplers often consume more power and occupy more area than desired.
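The interpolation formula of equation 2.10 can be sketched numerically by truncating the infinite sum to a finite window of samples. The test signal, the sampling period and the window length below are assumed values chosen only for illustration.

```python
import numpy as np

def sinc_reconstruct(samples, n, T, t):
    """Truncated version of equation 2.10: sum over n of f(nT) sinc(t/T - n).

    np.sinc is the normalized sinc, sin(pi x) / (pi x), as used in equation 2.10.
    """
    # Rows index evaluation times, columns index samples
    kernel = np.sinc(t[:, None] / T - n[None, :])
    return kernel @ samples

def f(t):
    """Illustrative test signal, bandlimited to 2 Hz (an assumed example)."""
    return np.sinc(2.0 * t) ** 2

T = 0.1                                  # sampling period: 10 Hz, above twice the 2 Hz band
n = np.arange(-500, 501)                 # finite window replacing the infinite sum
samples = f(n * T)

t = np.linspace(-1.0, 1.0, 201)          # evaluation times, mostly between sampling instants
f_hat = sinc_reconstruct(samples, n, T, t)
max_error = np.max(np.abs(f_hat - f(t)))
```

The remaining error comes only from truncating the sum; it shrinks as the window of samples grows, which is why signals that decay quickly in time are convenient test cases.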
2.3.2 Nonuniform Sampling
The theory of nonuniform (or irregular) sampling was explored extensively by Feichtinger and Gröchenig [8] and builds largely on the work of Duffin and Schaeffer on frame bounds [9]. Nonuniform sampling is also considered a generalization of the uniform case.
The theory of irregular sampling states that, much like equation 2.10, a nonuniformly sampled bandlimited signal $x(t)$ can be recovered in the form
\[ x(t) = g^T c = \sum_{k \in \mathbb{Z}} c_k\, g(t - t_k) \tag{2.11} \]
where $c_k$ are the weights, $g^T$ denotes the transpose of the vector of shifted ideal low-pass filter responses $g(t - t_k)$, and $t_k$ are the sampling times. From the above equation and the statement that any bandlimited signal is completely determined by its local averages, a theorem can be formed as follows:
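Theorem 2: Let $f(t)$ be a function bandlimited to $[-\Omega, \Omega]$, $\{x_n, n \in \mathbb{Z}\}$ the corresponding sampling sequence with $\delta = \sup_n (x_{n+1} - x_n) < \pi/\Omega$, $y_n$ the midpoints $y_n = \frac{x_n + x_{n+1}}{2}$, $w_n = y_n - y_{n-1}$, and $\xi_n$ the local averages $\xi_n = \frac{1}{w_n} \int_{y_{n-1}}^{y_n} f(x)\, dx$. Define
\[ f_0 = \sum_{n \in \mathbb{Z}} \xi_n w_n \frac{\Omega}{\pi} L_{x_n} \mathrm{sinc}_\Omega \tag{2.12} \]
\[ f_{k+1} = f_k + f_0 - \sum_{n \in \mathbb{Z}} \left( \int_{y_{n-1}}^{y_n} f_k(x)\, dx \right) \frac{\Omega}{\pi} L_{x_n} \mathrm{sinc}_\Omega, \quad k \ge 0 \tag{2.13} \]
where $L_{x_n} \mathrm{sinc}_\Omega$ denotes the translation of $\mathrm{sinc}_\Omega$ by $x_n$.
Then $f = \lim_{k \to \infty} f_k$ in $L^2(\mathbb{R})$ and the error is bounded by
\[ \| f - f_k \|_2^2 \le \left( \frac{\delta \Omega}{\pi} \right)^{k+1} \| f \| \tag{2.14} \]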
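To make equation 2.11 concrete, the sketch below recovers a bandlimited signal from nonuniform point samples by solving directly for the weights $c_k$. Note that this is a plain least-squares solve, not the iterative local-average algorithm of Theorem 2, and that the kernel, the sampling times and the test signal are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def g(t):
    """Ideal low-pass response for a band of 0.5 Hz (Omega = pi rad/s); assumed kernel."""
    return np.sinc(t)

# Nonuniform sampling times: mean spacing 0.8 s with jitter, so every gap stays below pi/Omega = 1 s
t_k = 0.8 * np.arange(40) + rng.uniform(-0.05, 0.05, size=40)

# Illustrative bandlimited signal built from known weights (hypothetical test data)
c_true = rng.standard_normal(40)

def x(t):
    return g(t[:, None] - t_k[None, :]) @ c_true

samples = x(t_k)

# Equation 2.11 evaluated at the sampling times gives the linear system G c = samples
G = g(t_k[:, None] - t_k[None, :])
c_hat, *_ = np.linalg.lstsq(G, samples, rcond=None)

# Reconstruct on a dense grid with the recovered weights
t_dense = np.linspace(t_k[0], t_k[-1], 1000)
x_hat = g(t_dense[:, None] - t_k[None, :]) @ c_hat
max_error = np.max(np.abs(x_hat - x(t_dense)))
```

The matrix G can be poorly conditioned when the samples are denser than strictly necessary, so the recovered weights need not equal the original ones even though the reconstructed signal matches; this is one reason regularized solutions such as equation 2.9 are of interest.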
2.4 Pulse Code modulation
One way of encoding amplitude into a single bit stream is pulse code modulation (PCM). Since PCM relates to the IFN, it is worth reviewing some of its forms, such as the delta modulation and the sigma-delta modulation.
2.4.1 Delta Modulation
The delta modulation (DM or ∆M) was invented around the 1940s. Because the necessary advances in mixed analog-digital very large scale integration (VLSI) were still to come at that time, the delta modulation only received recognition more than a decade after its invention [10]. DM also laid the foundations for later modulations such as the sigma-delta modulation and the adaptive delta modulation (ADM).
The encoder of the delta modulation is based on a binary quantizer that quantizes the error between the samples of the original signal and the samples produced by a local decoder. This local decoder consists of an integrator in a negative feedback loop that acts as a predictor of the input signal. The original signal can easily be reconstructed with an integrator followed by a low-pass filter. The low-pass filter is essential to remove most of the quantization noise, which the high sampling rate spreads over a wide band.
Correct conversion is only possible if the input signal is tracked correctly and, in the case of the delta modulation, the maximum allowed amplitude of the input signal decreases as the input frequency increases. This makes this type of conversion suitable for signals whose amplitude spectrum decreases with increasing frequency [10, 11] (e.g. human speech).
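The encoder and decoder described above can be sketched in a few lines. This is a behavioral illustration under assumed values (sampling rate, step size and test signal), with the final low-pass filter of the decoder left out for brevity.

```python
import numpy as np

def delta_encode(x, step):
    """1-bit delta modulator: quantize the sign of the prediction error."""
    bits = np.empty(len(x), dtype=np.int8)
    prediction = 0.0                       # output of the local decoder (integrator)
    for n, sample in enumerate(x):
        bits[n] = 1 if sample >= prediction else -1
        prediction += step * bits[n]       # the integrator follows the input
    return bits

def delta_decode(bits, step):
    """Decoder: integrate the bitstream (the low-pass filter is omitted here)."""
    return step * np.cumsum(bits)

# Illustrative input: a 5 Hz sine sampled at 10 kHz (assumed values)
fs = 10_000
t = np.arange(0, 0.4, 1.0 / fs)
x = 0.5 * np.sin(2 * np.pi * 5 * t)

step = 0.01                                # larger than the maximum input change per sample
bits = delta_encode(x, step)
staircase = delta_decode(bits, step)
tracking_error = np.max(np.abs(x - staircase))
```

As long as the input changes by less than one step per sample, the staircase stays within one step of the input. Raising the amplitude or the frequency of the sine until this no longer holds produces the slope overload that limits the usable input amplitude at high frequencies.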
Figure 2.2: Encoder and decoder stage of the delta modulation
2.4.2 Sigma-Delta Modulation
The sigma-delta modulation was first proposed by Inose and Yasuda [11] and is widely used nowadays as a method for converting analog signals to the digital domain. Σ∆ converters have a relatively simple architecture and are very robust against quantization noise and circuit imperfections for both low- and medium-bandwidth signals [10, 11, 12, 13], which makes them very popular for high-fidelity audio, since these signals have a bandwidth of around 20 kHz. As in DM, the sigma-delta modulation requires oversampling for higher resolutions to be achieved. For simplicity, a first-order sigma-delta modulator is described in this document, even though higher-order sigma-delta converters are often used to achieve better performance.
The encoding stage (figure 2.3) comprises an integrator, a comparator and a single-bit DAC in a negative feedback loop. To reduce the amount of quantization noise, the comparator can be replaced by a quantizer with a higher number of bits. The feedback loop is responsible for the increased robustness in comparison to the noisier, but very similar, DM. The loop measures the difference between the converted signal and the original input (∆) and filters (Σ) that difference. The net result is a shaped quantization noise that is attenuated around the base-band frequency of the signal. This, however, only works well if the noise can be made uncorrelated with the input signal, which is hard to accomplish with a single-bit quantizer and a first-order filter.
Figure 2.3: Encoder and decoder stage of the sigma-delta modulation
Figure 2.4: Behavior of a sigma-delta converter. a) Original input signal to be encoded. b) Input of the integrator. c) Encoded signal. (All panels: amplitude in V versus time in s.)
The recovery of the original signal can now be done by means of a low-pass filter followed
by a decimator to reduce the sampling rate. In figure 2.4, the behavior of the SDM is shown
and in figure 2.5 one can see how the quantization noise is spread throughout a wide bandwidth,
magnified outside the Nyquist bandwidth, while the input signal remains in its original band.
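A behavioral sketch of a first-order sigma-delta modulator and of a crude decoder is given below. It is an illustration only: the moving-average filter stands in for a proper low-pass filter, and the input value, window length and function names are assumptions.

```python
import numpy as np

def sigma_delta_encode(x):
    """First-order, 1-bit sigma-delta modulator for inputs within [-1, 1]."""
    bits = np.empty(len(x))
    integrator = 0.0
    feedback = 0.0                               # output of the 1-bit DAC
    for n, sample in enumerate(x):
        integrator += sample - feedback          # delta (difference), then sigma (integration)
        feedback = 1.0 if integrator >= 0.0 else -1.0   # comparator followed by the DAC
        bits[n] = feedback
    return bits

def sigma_delta_decode(bits, window):
    """Crude decoder: moving-average low-pass filter followed by decimation."""
    smoothed = np.convolve(bits, np.ones(window) / window, mode="valid")
    return smoothed[::window]

# Illustrative constant input (assumed value); the bit density encodes the amplitude
x = np.full(20_000, 0.3)
bits = sigma_delta_encode(x)
decoded = sigma_delta_decode(bits, window=256)
```

Because the integrator stays bounded, the average of the bitstream over a window converges to the input value, with an error that shrinks as the window (that is, the oversampling ratio) grows.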
2.5 Time Encoding Machines
As stated by Lazar and Tóth [14], a time encoding machine (TEM) is a real-time asynchronous mechanism for encoding amplitude information into a time sequence. Like the IFN, many technologies currently being studied as ADCs can be seen as TEMs and benefit from irregular sampling. Examples include frequency modulation and the asynchronous sigma-delta modulation [15], a derivative of the original sigma-delta modulation that is very similar to the IFN.
2.5.1 Asynchronous Sigma-Delta Modulation
Due to its asynchronous behavior, the ASDM presents several advantages compared to its
synchronous counterpart. With this design, oversampling is no longer required, and thus power
consumption and the overall activity of the circuit are significantly reduced, as shown in [16, 17].
As proposed by Lazar and Tóth [14], the encoder stage of the ASDM is very simple in terms
of hardware and is composed only of an integrator and a non-inverting Schmitt trigger, as seen in
figure 2.6. Biasing the input signal before it is integrated guarantees that the output of
the integrator will be a positive (negative) increasing (decreasing) function of time.
Figure 2.5: Frequency spectrum of a digital bitstream encoded using the sigma-delta modulation (magnitude in dB versus frequency in Hz).
As can be seen, most of the quantization noise is spread to higher frequencies whilst the original
signal is preserved in the lower part of the spectrum.
There are two possible modes of operation in this modulation, one that enables a positive
feedback and another that enables a negative feedback. The input of the Schmitt trigger is
an oscillatory function between -B and B, and the output of the Schmitt trigger (and therefore the
feedback) is an oscillatory function between -A and A, which imposes the operating modes
mentioned above.
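The oscillation between -B and B at the trigger input and between -A and A at its output can be reproduced with a small discrete-time sketch. It assumes that the feedback is subtracted from the input before integration and that the input magnitude stays below A; the time step, the names and the constant test input are illustrative assumptions.

```python
def asdm_encode(samples, dt, amp_a, thr_b):
    """Asynchronous sigma-delta sketch: integrator plus Schmitt trigger.

    Returns the two-level output z and the switching times."""
    z = -amp_a          # trigger output, subtracted from the input
    y = -thr_b          # integrator state, starts at the lower threshold
    output, switch_times = [], []
    for k, x in enumerate(samples):
        y += (x - z) * dt
        if z < 0 and y >= thr_b:       # rising ramp reached +B
            z = amp_a
            switch_times.append(k * dt)
        elif z > 0 and y <= -thr_b:    # falling ramp reached -B
            z = -amp_a
            switch_times.append(k * dt)
        output.append(z)
    return output, switch_times

# Constant input of 0.3 V observed for one second (illustrative values).
z, switches = asdm_encode([0.3] * 100000, dt=1e-5, amp_a=1.0, thr_b=0.01)
mean_z = sum(z) / len(z)
```

For a constant input the two-level output spends more time at +A than at -A, so its average tracks the input while the amplitude information is carried entirely by the switching times.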
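[Figure 2.6, block diagram labels: Analog Input, summing node (∑), integrator (∫), Schmitt trigger with thresholds -B and B and output levels -A and A, Digital Signal]
Figure 2.6: Encoder stage of the asynchronous sigma-delta modulation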
Even though asynchrony brings great advantages to the encoding stage, it can lead to a much
more complex recovery of the original signal. Perfect recovery of the input signal using ASDM
is possible [14]; it is discussed later in this document, since the signal can be reconstructed in the
same way as an IFN-modulated signal. The behaviour of the ASDM encoder is
demonstrated in figure 2.7.
2.5.2 Amplitude-to-time Conversion
Amplitude-to-time conversion (ATC) is one of the first TEMs to be invented. It consists of a
comparator and an integrator (see figure 2.8), but instead of integrating the signal and comparing it
to a certain threshold, the threshold is integrated and the input signal is compared to the integration
result of the threshold, in a fashion very similar to the IFN encoding technique.
Figure 2.7: Behavior of the asynchronous sigma-delta converter (amplitude in V versus time in s). a) Original input signal to be
encoded. b) Encoded signal. c) Integrated signal
[Figure 2.8, block diagram labels: Analog Input, integrator (∫), threshold Θ, Refractory Time, Reset, Spiking Signal]
Figure 2.8: Encoder stage of the amplitude-to-time conversion
Even though the design is simple and the reconstruction process similar to the IFN or ASDM,
both IFN and ASDM tend to filter noise and have a good response and resolution to peaks in the
signal whilst ATC presents the exact opposite behavior, rendering this modulation less useful for
some applications. The behavior of the encoder is depicted in figure 2.9.
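One possible reading of the ATC principle is sketched below: a ramp obtained by integrating a constant threshold is compared with the input, and a spike is emitted (and the ramp reset) when the ramp reaches the input. The sketch assumes a strictly positive input and ignores the refractory time; names and values are illustrative.

```python
def atc_encode(samples, dt, theta):
    """Amplitude-to-time sketch: the integrated threshold is compared
    with the input; returns the sample indices of the spikes."""
    ramp = 0.0
    spikes = []
    for k, x in enumerate(samples):
        ramp += theta * dt        # integration of the constant threshold
        if ramp >= x:             # comparator fires
            spikes.append(k)
            ramp = 0.0            # reset of the integrator
    return spikes

low = atc_encode([0.5] * 5000, dt=1e-4, theta=5.0)
high = atc_encode([1.0] * 5000, dt=1e-4, theta=5.0)
```

Under these assumptions a larger amplitude yields a longer interval between spikes, which is the opposite of the IFN behavior, where larger amplitudes produce denser spikes.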
2.6 Chapter review
Everything presented in this chapter is already thoroughly described in the literature
and was included as a brief introduction to some basic concepts
and topics related to the integrate-and-fire modulation. The contents of the chapter are therefore
heavily summarized; for a deeper understanding, the reader is referred to the aforementioned
references.
Figure 2.9: Behavior of an ATC converter (amplitude in V versus time in s). a) Original input signal to be
encoded. b) Output of the integrator. In this particular case the threshold chosen was 0.05. c)
Encoded signal
Chapter 3
IFN Modulation
The IFN modulation is a time encoding machine and, as such, it shares a few properties and
similarities with the previously discussed TEMs. The encoder stage is responsible for turning the
analog signal at the input into a spiking pulse train that can be either reconstructed with fair
precision (as will be shown later) or promptly analyzed to retrieve less precise data. Very recently,
Nallathambi also described a method to operate directly on the encoded signal, including
addition [18] and multiplication [19].
Figure 3.1: Example of an encoded IFN signal. a) Original signal at the input of the encoder. b)
Spiking signal resulting from the encoding process
Fig. 3.1 also makes clear that there is a relation between the amplitude of the original signal
and the spike density: the encoded signal has fewer spikes where the amplitude
is small and a higher spike density where the amplitude is large.
3.1 Encoder Stage
The typical IFN encoder can be represented as in Fig. 3.2. It can be made completely analog
since it consists only of an integrator and a comparator. Unlike the ATC encoder (fig.2.8), the IFN
modulation integrates the input signal and compares it to a threshold. The output of the integrator
is given by
$$ v(t) = \int_{t_0}^{t} x(t)\, e^{\alpha (t - t_{k+1})}\, dt \tag{3.1} $$
where α = 1/(RC) is the leaky parameter of the integrator and x(t) is the input signal. If the
leaky parameter is considered close to zero, the equation becomes much simpler:
$$ v(t) = \int_{t_0}^{t} x(t)\, dt \tag{3.2} $$
From then on, the firing condition can also be reduced to
$$ |v(t_{k+1}) - v(t_k)| = q_k, \quad q_k \geq |\theta| \tag{3.3} $$
Ideally, when the firing condition is met, the output should be an impulse and the integrator
should be reset immediately. Because such behavior is impossible to achieve due to hardware limitations, a
small time interval τr is inevitable. This interval τr is the so-called refractory time and it defines
the maximum rate of operation of the IFN. During this interval the IFN is in the so-called reset
state.
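A minimal discrete-time sketch of the non-leaky encoder (equations 3.2 and 3.3) with a refractory interval is given below. It assumes that the integral is approximated by a running sum of samples, so that the threshold is expressed in V·sample, and that the input is simply ignored during the reset state; the function name and the test values are illustrative.

```python
def ifn_encode(samples, theta, refractory=0):
    """Non-leaky integrate-and-fire sketch.

    Returns the sample indices at which spikes are emitted."""
    spikes = []
    acc = 0.0
    hold = 0                      # samples left in the reset state
    for k, x in enumerate(samples):
        if hold > 0:              # refractory time: input is ignored
            hold -= 1
            continue
        acc += x                  # running sum, in V*sample
        if abs(acc) >= theta:     # firing condition
            spikes.append(k)
            acc = 0.0             # reset of the integrator
            hold = refractory
    return spikes

# Constant 0.6 V input and a threshold of 376 V*sample (illustrative).
ideal = ifn_encode([0.6] * 6270, theta=376)
slowed = ifn_encode([0.6] * 6270, theta=376, refractory=3)
```

With a constant input the spikes are equally spaced, and a non-zero refractory interval lengthens the spacing by exactly that interval, which is how it limits the maximum firing rate.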
3.2 Reconstruction
Unlike conventional synchronous modulations, the IFN encoder has a highly nonlinear
feedback, which makes the reconstruction of the original signal less trivial than in other modulations.
The reconstruction of the signal can be done either in the time domain [20, 21, 22, 23] or in the frequency domain
[24, 25]. A simple spiking model can also be used to reconstruct the signal [26, 27].
For the reconstruction to be possible, the distance between spikes (δ) must also
be smaller than the inverse of the Nyquist rate, as proved by Lazar [20].
$$ \delta_{max} < \frac{1}{f_{Nyquist}} \tag{3.4} $$
[Figure 3.2, block diagram labels: a) Analog Input, integrator (∫), threshold Θ, Refractory Time, Reset, Spiking Signal; b) the same blocks with thresholds Θmin and Θmax]
Figure 3.2: Encoder stage of the integrate and fire. a) represents the uniphasic IFN encoder. b)
represents the biphasic IFN encoder.
From this equation, it is also possible to retrieve an upper bound for the threshold as long as
some characteristics of the signal being encoded are known. The upper bound for the threshold is
then given by the following relation
$$ \theta = A_{min} \cdot \frac{f_{sampling}}{f_{Nyquist}} \quad (\mathrm{V \cdot sample}) \tag{3.5} $$
where Amin is the minimum amplitude of the signal being encoded, fNyquist the Nyquist frequency
of the input signal and fsampling the sampling frequency of the encoder’s output. This relation
follows directly from equations 3.2, 3.3 and 3.4:
$$ \theta = \int_{t_k}^{t_{k+1}} x(t)\, dt \tag{3.6} $$
In the worst case,
$$ \theta = \int_{t_k}^{t_k + t_{Nyquist}} A_{min}\, dt \tag{3.7} $$
$$ \theta = A_{min} \cdot (t_{Nyquist} + t_k) - A_{min} \cdot t_k = A_{min} \cdot t_{Nyquist} \quad (\mathrm{V \cdot sample}) \tag{3.8} $$
This bound can also be particularly useful when the input signal has a small peak-to-peak
amplitude, in which case the signal’s average can be used instead of the minimum amplitude.
Further ahead in this document it will be shown that this relation can in fact be used and that
the upper bound is close to the optimum threshold.
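The bound of equation 3.5 is a single multiplication, as the short sketch below shows. The numbers are illustrative assumptions: a minimum (or average) amplitude of 0.6 V, a 150 kHz sampling frequency and a Nyquist frequency of 240 Hz.

```python
def threshold_upper_bound(a_min, f_sampling, f_nyquist):
    """Upper bound for the IFN threshold (eq. 3.5), in V*sample."""
    return a_min * f_sampling / f_nyquist

# Assumed values: 0.6 V, 150 kHz sampling, 240 Hz Nyquist frequency.
theta_max = threshold_upper_bound(0.6, 150e3, 240.0)   # about 375 V*sample
```

Under these assumed values the result is about 375 V·sample, which is close to the 376 V·sample quoted for the ECG test later in the document; the exact amplitude used there is not stated, so the small difference is not explained here.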
3.2.1 Spiking Model Reconstruction
If a bandlimited signal x(t) with k IFN spikes is considered, and x(t) > 0, it can be modeled by
$$ x(t) = h(t) * \sum_{j=1}^{k} w_j\, \delta(t - s_j) = \sum_{j=1}^{k} w_j\, h(t - s_j) \tag{3.9} $$
where s_j = (t_{j+1} + t_j)/2, w_j = x(s_j) and h(t) is the sinc function (impulse response of the ideal low-pass
filter).
In matrix form, with [W]_k = w_k, [q]_l = x(t_l) and [H]_{k,l} = h(t_l − s_k), the equation becomes
$$ q = H^{T} W \tag{3.10} $$
If the non-leaky simplified version of the integrator is considered, the threshold θ is equal to:
$$ \theta = \sum_{j} w_j \int_{t_i}^{t_i + \tau_r} h(t - s_j)\, dt = \sum_{j} w_j\, c_{i,j} \tag{3.11} $$
with
$$ c_{i,j} = \int_{t_i}^{t_i + \tau_r} h(t - s_j)\, dt \tag{3.12} $$
The weights can then be computed and the input signal reconstructed:
$$ W = C^{-1} \Theta \tag{3.13} $$
$$ q = H^{T} C^{-1} \Theta \tag{3.14} $$
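The chain from spike times to weights to reconstructed samples can be sketched in a few lines. This is only an illustration under stated assumptions: each c_{i,j} is integrated between two consecutive spikes (as Algorithm 1 does later in this document), the integral is evaluated numerically with a midpoint rule, the sinc width is a free parameter, and the system is solved by plain Gaussian elimination, which is adequate only for small, well-conditioned examples.

```python
import math

def sinc(t, width):
    """h(t): ideal low-pass impulse response with zeros every `width`."""
    if t == 0.0:
        return 1.0
    a = math.pi * t / width
    return math.sin(a) / a

def integrate(f, lo, hi, steps=200):
    """Midpoint-rule integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))

def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (a[r][n] - s) / a[r][r]
    return x

def sinc_model_weights(spike_times, theta, width):
    """Build C and solve C w = theta for the weights of eq. 3.9."""
    centers = [(a + b) / 2 for a, b in zip(spike_times, spike_times[1:])]
    c_matrix = [[integrate(lambda t: sinc(t - s, width), lo, hi)
                 for s in centers]
                for lo, hi in zip(spike_times, spike_times[1:])]
    weights = solve(c_matrix, [theta] * len(centers))
    return centers, weights, c_matrix

def reconstruct(t, centers, weights, width):
    """Evaluate the weighted sum of shifted sinc functions at time t."""
    return sum(w * sinc(t - s, width) for w, s in zip(weights, centers))

# Uniform spikes every 10 samples, as a constant 0.6 V input would produce
# with theta = 6 V*sample (illustrative values).
spike_times = [10.0 * k for k in range(13)]
centers, weights, c_matrix = sinc_model_weights(spike_times, 6.0, 10.0)
midpoint_value = reconstruct(60.0, centers, weights, 10.0)
```

In a realistic setting the matrix C can be ill-conditioned, so a direct elimination like this one is not a substitute for the iterative solver discussed in the implementation chapter.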
3.2.2 Spline Based Reconstruction
In [22], Lazar describes a spline-based method to recover a signal encoded using one or more
leaky integrate-and-fire neurons (LIF) by associating the spiking model with a series of projections
in the L2 space.
Assuming a threshold θ, a capacitance C, a resistance R and a constant bias b, the membrane
potential of the LIF neuron is governed by the differential equation
$$ C \frac{dV(t)}{dt} = -\frac{V(t)}{R} + u(t) + b \tag{3.15} $$
with initial condition V(0) = 0 and resetting conditions
$$ V(t_k) = \theta \longrightarrow \lim_{t \to t_k} V(t) = 0 \tag{3.16} $$
for all t ∈ [0,T], and k = 1,2,...,n. By solving the differential equation, the t-transform of the LIF neuron
can be obtained and is given by
$$ \int_{t_k}^{t_{k+1}} (u(s) + b) \exp\left(-\frac{t_{k+1} - s}{RC}\right) ds = C\theta \tag{3.17} $$
Rewriting the t-transform of the LIF neuron in inner-product form, the following is obtained
$$ \langle u, \phi_k \rangle = q_k \tag{3.18} $$
with
$$ \phi_k = \exp\left(-\frac{t_{k+1} - t}{RC}\right) 1_{[t_k, t_{k+1}]}(t) \tag{3.19} $$
$$ q_k = C\theta - bRC \left(1 - \exp\left(-\frac{t_{k+1} - t_k}{RC}\right)\right) \tag{3.20} $$
Since the signal u originates a set of spikes (t_k, k = 1,2,...,n), another signal û is said to be
consistent and to generate a consistent reconstruction of u if and only if û triggers the exact same
spikes (t_k, k = 1,2,...,n) as u and the following equivalence is true
$$ \langle u, \phi_k \rangle = \langle \hat{u}, \phi_k \rangle \tag{3.21} $$
A consistent reconstruction û is also optimal if it minimizes the quadratic criterion
$$ \|Ju\| = \left( \int_{0}^{T} \left( \frac{d^2 u}{ds^2} \right)^{2} ds \right)^{1/2} \tag{3.22} $$
with ||Ju|| being the norm of the second derivative of the reconstructed stimulus.
It follows then that the optimal consistent reconstruction û solves the spline interpolation
problem and is unique [22].
$$ \hat{u} = d_0 + d_1 t + \sum_{k=1}^{n-1} c_k \psi_k(t) \tag{3.23} $$
where
$$ \psi_k(t) = (\phi_k * |\cdot|^3)(t) = \int_{t_k}^{t_{k+1}} |t - s|^3 \exp\left(-\frac{t_{k+1} - s}{RC}\right) ds \tag{3.24} $$
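The coefficients c_k, k = 1,2,...,n, d_0 and d_1 must also satisfy the matrix equation
$$ \begin{bmatrix} G & p & r \\ p^{T} & 0 & 0 \\ r^{T} & 0 & 0 \end{bmatrix} \begin{bmatrix} c \\ d_0 \\ d_1 \end{bmatrix} = \begin{bmatrix} q \\ 0 \\ 0 \end{bmatrix} $$
where
$$ [p]_k = \langle \phi_k, 1 \rangle \tag{3.25} $$
$$ [r]_k = \langle \phi_k, t \rangle \tag{3.26} $$
$$ [G]_{k,l} = \langle \phi_k, \psi_l \rangle \tag{3.27} $$
As in the spiking model, this problem can be easily simplified if a non-leaky integrate-and-fire
encoding mechanism is assumed. If so, q_k = θ, φ_k = 1_{[t_k,t_{k+1}]}(t) and
$$ \psi_k(t) = \int_{t_k}^{t_{k+1}} |t - s|^3\, ds \tag{3.28} $$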
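For the non-leaky case, the integral in equation 3.28 has a closed form that avoids numerical integration. The derivation below is an added worked step, not taken from the cited reference; it uses the substitution v = t − s and the antiderivative of |v|³, which is v|v|³/4.

```latex
\psi_k(t) = \int_{t_k}^{t_{k+1}} |t - s|^3 \, ds
          = \int_{t - t_{k+1}}^{t - t_k} |v|^3 \, dv
          = \frac{1}{4}\Big[ (t - t_k)\,|t - t_k|^3 - (t - t_{k+1})\,|t - t_{k+1}|^3 \Big]
```

For t between t_k and t_{k+1} this reduces to ((t − t_k)⁴ + (t_{k+1} − t)⁴)/4, and outside that interval to the difference of the two fourth powers divided by four.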
3.2.3 Frequency Domain Reconstruction
In the frequency domain reconstruction, it is assumed that the input signal can be represented
as a sum of complex exponentials [24, 25]. As such, in resemblance to eq. 2.11, a non-uniformly
sampled signal can be modeled as
$$ x(t) = \sum_{k \in \mathbb{Z}} c_k\, g(t - t_k) \tag{3.29} $$
where c_k are the elements of a vector of weights and
$$ g(t) = \alpha \sum_{n=-M}^{M} e^{jn\frac{\Omega}{M}t} \tag{3.30} $$
whose limit, when M tends to infinity, is equal to the original impulse response.
In matrix form, where [q]_l = x(t_l) and [S]_{m,l} = e^{-jm\frac{\Omega}{M}t_l}, equation 3.29 takes the form
$$ Sq = \alpha S S^{H} S c \tag{3.31} $$
where αSS^H is a Toeplitz and Hermitian matrix.
With some reformulations [25], this classical algorithm can be converted into a fast recovery
version of the original. By denoting [D] = diag(t_{l+1} − t_l) and [P]_{l,k} = δ_{l+1,k} − δ_{l,k}, equation 3.31 can be
reformulated to
$$ S D P^{-1} q = \alpha S D S^{H} S c \tag{3.32} $$
with
$$ T = \alpha S D S^{H}, \quad d = \alpha T^{+} S D P^{-1} q \tag{3.33} $$
where T is again both Toeplitz and Hermitian. The original input signal can then be approximated
by the following function f
$$ f(t) = \frac{j\Omega}{M} \sum_{n=-M}^{M} n\, d_n\, e^{jn\frac{\Omega}{M}t_k} \tag{3.34} $$
3.3 The IFN compared to other modulations
Compared to the PCM-based modulations, such as the DM or SDM, the IFN has a relatively
simpler encoder that can be fully analog. It also makes it possible to take advantage of nonuniform
sampling. On the other hand, decoding the IFN signal is considerably more complex, and it is still unknown whether
it can achieve as good a resolution as the SDM, or even whether it shares the same (or similar) noise
shaping characteristics, spreading quantization noise over a wide bandwidth. Furthermore, the
IFN reconstruction is very dependent on the signal characteristics and has many parameters that
require adjustment.
Just like the IFN, the asynchronous modulations presented above can benefit if the general
characteristics of the input signals are known. The IFN and the ATC have mirror-like
characteristics and, even though reconstructing the ATC signal is significantly easier, ATC presents worse
results for noisy input signals.
The main reasons to explore the IFN modulation are still the low power consumption, low
area cost and low amplitude-related noise associated with the IFN architecture, which are not found in most
common ADC technologies.
3.4 Empirical Test Results
Some scripts were written in MATLAB in order to test the IFN reconstruction techniques
described in the previous sections. Two filtered white Gaussian noise (FWGN) signals with a cutoff
frequency of 1.5 kHz, sampled at 10 MHz to provide higher resolution and throughput, were used
as the system’s input and tested individually with each reconstruction procedure. The FWGN
signals were also given a DC offset of 0.6 V and were re-scaled to have a low peak-to-peak amplitude.
Both these characteristics can easily be added to a signal without loss of its characteristics by
the introduction of an amplifier.
Figure 3.3: FWGN signals used as example set
Figure 3.4: Reconstruction results
3.4.1 Recovery Test Results
In figure 3.4 the reconstruction results for the three presented methods are displayed for each
FWGN signal. As expected, the signal with the smaller peak-to-peak amplitude
behaves slightly better, and each reconstruction method has a distinctive response.
Both the complex exponential and the cardinal sine methods seem to reach their threshold upper
bound at approximately the same point. This is not the case for the spline method, possibly
because the cutoff frequency of the low-pass filter inherent to the reconstruction depends
on the spike distance and therefore directly on the threshold value. Such behavior is
observable in figure 3.5. The optimum ENOB is reached at a reconstruction filter cutoff frequency
close to 2 kHz (slightly above the 1.5 kHz maximum frequency due to the non-ideal filter used)
with the adequate threshold (see fig. 3.5). The low-pass filter inherent to the modulation scheme
is also visible in the increasing ENOB as the reconstruction filter reaches the Nyquist frequency,
suggesting the correct reconstruction of lower frequency elements in the signal.
Figure 3.5: ENOB of a reconstructed signal according to the threshold θ (V·sample) and the reconstruction filter
frequency (Hz) used. The upper bound of eq. 3.5 is seen in the figure as being very close to the actual
optimal value
Each method also presents a very particular characteristic, namely the deviation of the threshold
required in the reconstruction to enable the best recovery quality (θ_offset). Because this offset
depends on the reconstruction method and appears to be independent of the input signal, it
might be easily found through a tunable system or a feedback loop that automatically searches for
the optimum θ_offset. Figure 3.7 shows the offset for each tested reconstruction method (tested
with both FWGN signals) and also shows how precise said offset must be. The spiking method
seems to be the least offset-sensitive method, as opposed to the spline-based reconstruction method,
which reaches an offset of about 5 V·sample. As far as offset sensitivity goes, all methods seem to show
a similarly small error tolerance around the optimum reconstruction threshold.
The major drawback of these three methods compared to other currently used modulation
schemes is the oversampling required. In this specific case, a frequency of 10 MHz is being used
to sample a signal whose cutoff frequency is 1.5 kHz, approximately 3333.3 times higher than
the associated Nyquist frequency. It should be noted that it is possible to use lower sampling
frequencies to recover the original signal but they come at a high cost, lowering considerably the
ENOB.
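As a quick check of the ratio quoted above, the Nyquist frequency can be taken as twice the 1.5 kHz cutoff. This is the same convention that gives the factor of 625 for the ECG signal (150 kHz over twice 120 Hz).

```latex
\frac{f_{sampling}}{f_{Nyquist}}
  = \frac{10\ \mathrm{MHz}}{2 \times 1.5\ \mathrm{kHz}}
  = \frac{10^{7}}{3 \times 10^{3}}
  \approx 3333.3
```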
3.4.2 IFN and the ECG signal
Even though FWGN is a good test signal, as it should contain all frequencies below the cutoff
frequency of the filter used, it might not represent the characteristics of the inputs applied to the
system in a real application. To test the possible response of the IFN modulation in such a case,
a resized electrocardiogram (ECG) [28] from an online biological signal bank was used [29].
It was assumed, without loss of generality, that the maximum diagnostically relevant frequency
contained in an ECG signal is 120 Hz. The original sampling rate of the signal was 500 Hz,
but to increase the recovery quality and emulate an analog input it was up-sampled to 150 kHz
using interpolation, giving a sampling frequency 625 times higher than the corresponding Nyquist
frequency. The reconstruction method used was the cardinal sine spiking model. Using
equation 3.5, the threshold used was 376 V·sample. The number of expected spikes can be found by
calculating the area of the signal’s average DC component and dividing it by the threshold.
$$ \frac{\mu \cdot N_{samples}}{\theta} = \frac{0.6 \cdot 151800}{376} = 242 \tag{3.35} $$
The results show a promising recovery with an ENOB peaking at slightly below 6.4 bits in the
QRS intervals, allowing the signal to keep its primary features and characteristics (see fig. 3.6).
The number of spikes in the encoded signal was 240, giving a prediction error of around 0.83%.
This low prediction error is valuable because it allows the memories holding the
spiking information in hardware to be sized accurately, thus using fewer resources and saving power. It
should be noted, however, that a much better ENOB is possible. By increasing the ECG signal’s
sampling frequency fivefold to 750 kHz, an ENOB of 7.4 is achieved. An ENOB higher than 8
was only reachable when the sampling frequency was above the 1.5 MHz mark.
$$ error = \frac{|240 - 242|}{240} \cdot 100 = 0.83333 \tag{3.36} $$
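The arithmetic of equations 3.35 and 3.36 can be reproduced with the short sketch below, using the values quoted in the text (average of 0.6 V, 151800 samples, threshold of 376 V·sample, 240 measured spikes). The function names are illustrative.

```python
def expected_spikes(mean_amplitude, n_samples, theta):
    """Area of the average (DC) component divided by the threshold (eq. 3.35)."""
    return mean_amplitude * n_samples / theta

def prediction_error_percent(measured, predicted):
    """Relative prediction error in percent (eq. 3.36)."""
    return abs(measured - predicted) / measured * 100.0

predicted = round(expected_spikes(0.6, 151800, 376))
error = prediction_error_percent(240, predicted)
```

A prediction of this kind is what allows the spike memories to be dimensioned before the signal is encoded.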
Figure 3.6: Partial Reconstruction of an ECG signal
Figure 3.7: Representation of the effects of the reconstruction threshold offset θ_offset and threshold θ on the ENOB of the reconstruction (panels: Sinc 1, Sinc 2, Spline 1, Spline 2, Fourier 1, Fourier 2; axes: threshold θ (V·sample) and threshold offset (V·sample))
Chapter 4
Implementation and Architecture
In this chapter the structure and architecture of the IFN decoder implemented in HDL are
presented in detail.
4.1 Adapted IFN Algorithm
From the reconstruction algorithms mentioned above, the one chosen for hardware
implementation was the spiking method. The choice is mostly supported by the algorithm’s simplicity in
association with the good reconstruction results achievable. Furthermore, the optimal threshold
seems easier to predict than with the spline method and, in general, using a higher value is less
damaging to the reconstruction quality than with the Fourier method. This last point
is particularly important in choosing the reconstruction method, since a good threshold estimation
is fundamental to optimize the encoder’s power consumption and the decoder’s reconstruction
quality.
Even though the algorithm is simpler, it requires a few adaptations in order to be implemented in
hardware, as it would be very cumbersome to generate a cardinal sine function in hardware.
To overcome this issue, and since it holds no foreseeable consequence for the overall reconstruction
quality, it was decided to load into the hardware’s memory a previously computed cardinal sine and
cardinal sine integral (SinInt) function. Besides clearly simplifying the calculation process, this
solution drastically decreases the computation time, since the generation of the cardinal sine
integral function is done only once and prior to processing. The major disadvantage of the method
is the clear increase in occupied memory space, but it is a trade-off worth taking to keep the
system as real-time compatible as possible.
Another major issue with this algorithm is solving equation 3.14. Since it is very likely for the
matrix C to be ill-conditioned or even singular, most algorithms are not suitable for solving such a
system. A naive and simple approach would be to use the Moore-Penrose pseudo-inverse
but, as that algorithm is very demanding in hardware and time, it would be unfit for a real-time system
such as the IFN decoder. Another approach would be to use a fast matrix inversion algorithm such
as QR decomposition. As described in [30] and [31], QR decomposition shows promising results
for large-scale matrix inversions and their subsequent hardware implementation, but it still faces a
major issue: processing speed comes at a major hardware cost and the total execution time is still
of the order of n³ [31].
Instead of opting for direct matrix inversion algorithms, the conjugate gradient squared (CGS) method
was chosen, as it could provide a solution to the linear system in a relatively fast iterative algorithm
of O(n²). The final adaptation can be seen in algorithm 1.
4.1.1 Conjugate Gradient
The conjugate gradient method was first introduced in 1952 by Hestenes and Stiefel [32] and
even though not widely used at first, its efficiency in solving large sparse least-squares problems
was later recognized. The original algorithm can only solve linear equations of the type Ax = b
whenever the matrix A is real, square, symmetric and positive-definite (RSSPD) but throughout
the years, many other CG based algorithms were developed to solve a wider range of systems. To
solve non-symmetric linear systems, a particular variant of the CG algorithm is widely used, the
conjugate gradient squared.
Although several variants of the algorithm exist [33], only the main algorithm was tested and
implemented, as the number of iterations required to reach a desirable error seemed low enough.
A MATLAB implementation of a preconditioned form of the algorithm was also put to the test, but no
efficient preconditioner was found.
A hardware implementation of the CGS algorithm (see algorithm 2) seems to bring a few
advantages, as the algorithm can be partially parallelized and pipelined. The algorithm’s major
flaws are the possible bottlenecks in the matrix-vector multiplications (MVM) of lines 12 and 16.
Memory bandwidth and size can also prove to be problematic for large-scale systems.
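For reference, a software sketch of the unpreconditioned CGS iteration is given below. It follows the step order of algorithm 2, with three assumptions of its own: the initial guess is zero (so the initial residual equals b), the best iterate is selected by the squared norm of the residual, and the loop stops early if one of the scalar products is exactly zero. It is a plain software illustration, not the hardware implementation.

```python
def matvec(a, v):
    """Dense matrix-vector product."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def cgs(a, b, max_iter):
    """Conjugate gradient squared for a square, possibly non-symmetric A."""
    n = len(b)
    x = [0.0] * n
    r = b[:]                 # residual of the initial guess x = 0
    r0 = b[:]                # fixed shadow residual
    p = [0.0] * n
    q = [0.0] * n
    rho_prev = 1.0
    best_x, best_err = x[:], dot(r, r)
    for _ in range(max_iter):
        rho = dot(r, r0)
        if rho == 0.0:       # breakdown guard (assumption of this sketch)
            break
        beta = rho / rho_prev
        u = [ri + beta * qi for ri, qi in zip(r, q)]
        p = [ui + beta * (qi + beta * pi) for ui, qi, pi in zip(u, q, p)]
        v = matvec(a, p)                     # first matrix-vector product
        sigma = dot(v, r0)
        if sigma == 0.0:
            break
        alpha = rho / sigma
        q = [ui - alpha * vi for ui, vi in zip(u, v)]
        uq = [ui + qi for ui, qi in zip(u, q)]
        x = [xi + alpha * s for xi, s in zip(x, uq)]
        r = [ri - alpha * s for ri, s in zip(r, matvec(a, uq))]  # second product
        err = dot(r, r)
        if err < best_err:                   # keep the best iterate so far
            best_x, best_err = x[:], err
        rho_prev = rho
    return best_x

# Small non-symmetric example with exact solution [0.1, 0.6].
solution = cgs([[4.0, 1.0], [2.0, 3.0]], [1.0, 2.0], max_iter=2)
```

The two calls to `matvec` inside the loop are the matrix-vector products that dominate the cost of each iteration.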
4.2 Architecture
The IFN decoder was designed in Verilog HDL. To ensure better performance, all of the
fundamental calculations were performed in Single-Precision Floating-Point (SPFP) instead of
Fixed-Point (FP). Double-Precision Floating-Point (DPFP) was also considered, but it
was deemed less viable as it would require a much larger area of the device, thus compromising
the design’s viability. All of the memories used in the design are Simple Dual Port RAM (SDPR)
except when otherwise noted.
As can be seen in figure 4.1, the decoder has six input ports (marked with an "I"), two of
which are optional, and two output ports (marked with an "O"). In total, the decoder has four
main memories, an SPFP multiplier, an SPFP subtractor, a Memory Manager Unit (MMU) and a
Conjugate Gradient Squared Unit (CGSU). Both the MMU and the CGSU are controlled by an
independent FSM and their implementation will be described in more detail further ahead in the
document. The size of the memories inside the decoder must be chosen according to the number
of samples and spikes expected. Because the decoder was built to support a variable number
of spikes for inputs with the same number of samples, the memories were dimensioned to be
bigger than necessary.
Algorithm 1 Hardware Adapted Spiking Model IFN Decoder
1: Inputs: DataIn, Threshold, MaxIterations.
2: Outputs: DataOut.
3: In Memory: Sinc, SinInt.
4: procedure SPIKE LOCATION
5:     Nspikes = 0
6:     for i < SizeofDataIn do
7:         if SpikeOccurs then
8:             Spike[Nspikes] = i
9:             Central[Nspikes] = i − Count/2
10:            Count = 0
11:            Nspikes = Nspikes + 1
12:        else
13:            Count = Count + 1
14:        end if
15:    end for
16: end procedure
17: procedure CALCULATION OF MATRIX C
18:    for i < Nspikes do
19:        for j < Nspikes do
20:            C[i,j] = SinInt(Central[i] + Spike[j+1]) − SinInt(Central[i] + Spike[j])
21:        end for
22:    end for
23: end procedure
24: procedure CALCULATING WEIGHTS
25:    Weights = CGS(C, Threshold, MaxIterations)
26: end procedure
27: procedure CALCULATING RECONSTRUCTED SIGNAL
28:    DataOut = 0
29:    for i > SizeofDataIn ∗ 1/3 and i < SizeofDataIn ∗ 2/3 do
30:        for j < Nspikes do
31:            DataOut[i] = Sinc(i − Central[j]) ∗ Weights[j] + DataOut[i]
32:        end for
33:    end for
34: end procedure
35: Return DataOut
Algorithm 2 CGS Algorithm
1: Inputs: Matrix, b, MaxIter.
2: Outputs: Result.
3: r = r0 = b
4: x = q = p = zeros(NSpikes)
5: β = α = σ = CurrIter = 0
6: ρ_prev = 1
7: while CurrIter < MaxIter do
8:     ρ = (r, r0)
9:     β = ρ/ρ_prev
10:    u = r + β ∗ q
11:    p = u + β ∗ (q + β ∗ p)
12:    v = Matrix ∗ p
13:    σ = (v, r0)
14:    α = ρ/σ
15:    q = u − α ∗ v
16:    r = r − α ∗ Matrix ∗ (u + q)
17:    x = x + α ∗ (u + q)
18:    xsum = sum(|x|)
19:    if xbsum > xsum or CurrIter = 0 then
20:        xbest = x
21:        xbsum = xsum
22:    end if
23:    ρ_prev = ρ
24:    CurrIter = CurrIter + 1
25: end while
26: Return Result = xbest
[Figure 4.1, block diagram labels: Memory Manager, Addr. Calculator, Sinc_Integ Mem, Sinc Mem, Subtractor, Matrix C Memory, Conjugate Gradient Squared Block, Weight Memory, Multiplier; signals: CLK 100MHz, Reset, Start, Input Data, Threshold, Output Data, Output Valid, Finished]
Figure 4.1: Block Diagram of the Decoder implementation
4.2.1 Memory Manager
The MMU is where the input data is stored. It is mainly composed of four memories, three of
them to store the signal currently being processed and a fourth one to write new
data to memory while the current signal is being processed. The signal is divided into
three memories because only a third of the signal is recovered at a time, to minimize
the error introduced. Writing and reading are controlled by an FSM and a
counter.
The input signal arrives in packets of 4kb, with a "1" denoting the beginning of the packet. In
between packets only zeros are transmitted. To prevent a misaligned first reading, the starting
condition requires that a certain number of zeros be read in a row. The number of zeros for the
starting condition was chosen to be 20000, as it is clearly bigger than a packet and small enough to
fit in between two packets.
[Figure 4.2, block diagram labels: RAM 1, RAM 2, RAM 3, RAM 4 (each with in, out, en_in, en_out), FSM (STATE), Counter; signals: CLK 100MHz, Reset, Input Data, Ready, Output Data, Output enable]
Figure 4.2: Block Diagram of the Memory Manager implementation
4.2.1.1 MMU Control
The MMU is the unit whose control is hardest to achieve, since it depends not only on an FSM
(see figure 4.4) but also on the received packet flow. If two packets arrive within a time frame
smaller than the time frame between two separate decoder outputs, information will be lost and
the recovered signal will contain wrong data for a time duration corresponding to four times
the duration of a packet.
[Figure 4.3, labels: Receiving Stream, Reconstructed Stream, Valid Packet, Invalid/Lost Packet, Valid Output, Invalid Output; packet layout: 1-bit flag followed by a 4kb stream of data (spiking information)]
Figure 4.3: Packet information and impact on reconstruction
The FSM controlling the MMU operates as follows:
1. The IDLE state sets every variable in the system to a known value. The state is reachable
only when a RST signal is applied to the system.
2. Upon receiving the start condition (START = 1), the system changes its state to INIT and
starts counting the number of consecutive zeros at the input. If the counter reaches 20000
without being reset to zero (which occurs in the presence of a one at the input), the FSM’s current
state is changed to FILL1.
3. During the FILL1 state, the concatenation of memories 2, 3 and 4 (respectively) is output and,
as soon as a new packet is detected at the input, MEM1 is filled with the spiking information
in parallel. The state is changed whenever the input READY is set to 1.
4. The remaining states FILL2, FILL3 and FILL4 behave similarly to FILL1 and
therefore will not be detailed individually.
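The state sequence listed above can be modeled in software as a small behavioral sketch. The class and method names are hypothetical, the filling of the memories is omitted, and the wrap-around from FILL4 back to FILL1 is an assumption drawn from the cyclic use of the four memories; only the transitions and the selection of output memories are modeled.

```python
ZEROS_REQUIRED = 20000   # consecutive zeros needed before the first packet

class MmuFsm:
    """Behavioral model of the MMU control states (not the HDL design)."""

    def __init__(self):
        self.reset()

    def reset(self):
        """RST: return to IDLE with known values."""
        self.state = "IDLE"
        self.zero_run = 0

    def step(self, bit=0, start=False, ready=False):
        """Advance one clock cycle and return the new state."""
        if self.state == "IDLE":
            if start:
                self.state = "INIT"
                self.zero_run = 0
        elif self.state == "INIT":
            # A one at the input restarts the count of consecutive zeros.
            self.zero_run = self.zero_run + 1 if bit == 0 else 0
            if self.zero_run >= ZEROS_REQUIRED:
                self.state = "FILL1"
        elif ready:
            # FILL1 -> FILL2 -> FILL3 -> FILL4 -> FILL1 (assumed cyclic)
            nxt = int(self.state[-1]) % 4 + 1
            self.state = "FILL%d" % nxt
        return self.state

    def output_memories(self):
        """Memories concatenated at the output while one is being filled."""
        if not self.state.startswith("FILL"):
            return ()
        n = int(self.state[-1])
        return tuple((n + k - 1) % 4 + 1 for k in (1, 2, 3))
```

While memory n is being filled, the other three are read in rotating order, so FILL1 outputs memories 2, 3 and 4, FILL2 outputs 3, 4 and 1, and so on.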
[Figure 4.4, state diagram: RST leads to IDLE; START leads from IDLE to INIT (Count Nzeros); Nzeros > 20000 leads to FILL 1 (Fill MEM1, OUTPUT 2,3,4); Ready advances to FILL 2 (Fill MEM2, OUTPUT 3,4,1), FILL 3 (Fill MEM3, OUTPUT 4,1,2) and FILL 4 (Fill MEM4, OUTPUT 1,2,3)]
Figure 4.4: FSM controlling the MMU
4.2.2 Spike Location and Matrix C Calculation
After the three required packets are received, the spike locations, midpoints and total count
are calculated as the signal is transferred from the MMU to the Address Calculator. This spike
information is passed as an address to the Sinc_Integ memory to calculate the values of the
matrix C. It should be noted that the Sinc_Integ memory is a true dual port RAM (TDPR), so as to
allow simultaneous reads from two different addresses.
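The spike location step can be sketched in software by following the SPIKE LOCATION procedure of algorithm 1 literally. The sketch assumes that the counter starts at zero and that any non-zero sample marks a spike; the function name is illustrative.

```python
def locate_spikes(data):
    """Return the spike indices and the centre values i - Count/2,
    where Count is the number of non-spike samples since the last spike."""
    spikes, centers = [], []
    count = 0
    for i, sample in enumerate(data):
        if sample:                       # a spike occurs at index i
            spikes.append(i)
            centers.append(i - count / 2)
            count = 0
        else:
            count += 1
    return spikes, centers

spikes, centers = locate_spikes([0, 0, 0, 1, 0, 0, 0, 0, 1])
```

For this input the spikes are found at indices 3 and 8, and the centre values are 1.5 and 6.0. These values, together with the spike indices, are what would be used to address the sine integral table.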
4.2.3 CGSU
As the core of the implementation, this unit is the most complex one. It implements
the CGS algorithm with a few adaptations to make it slightly quicker. First of all, lines 9-11 of
algorithm 2 were removed from the first iteration (CurrIter = 0), as p = u = r. This assignment was
also parallelized with the matrix-vector calculation of line 12. The other major modification to the
CGS algorithm was the pipelining introduced between lines 10-12 and lines 15-17 of algorithm
2, as illustrated in figure 4.5.
In this pipeline two major bottlenecks can be identified, corresponding to the matrix-vector
multiplications in lines 12 and 16. These calculations were not parallelized, as that would consume
a large area in the device, requiring the addition of an SPFP multiplier and an SPFP accumulator per
parallel stage. The total execution time of the algorithm can then be estimated by the following
expression:
$$ T_{total} = N_{iter} \cdot 2\,(2 + N_{spikes}^{2} + N_{spikes}) \tag{4.1} $$
From figure 4.6, it can be seen that the implemented algorithm requires eight I/O
ports, two of which are outputs and the rest inputs (assuming external clock signals). CLK_IP is
a clock synchronized with CLK but with a phase offset of 180°. It is used mostly for the SPFP Xilinx
Intellectual Property (IP) cores. Most of the Xilinx SPFP IP cores were chosen to be non-blocking
and to have eight pipelining levels. One multiplier core was chosen to be blocking to improve
control over the calculation of ρ (see algorithm 2). This value is particularly important since it
marks the beginning of a new CGS iteration.
Together, the CGSU and its inner blocks use six SDPR and one TDPR to hold the
coefficients of the algorithm. The TDPR is used for the coefficient u, as two simultaneous reads
from different memory positions are required for full algorithm optimization.
Because the design targets real-time applications, the implementation always performs the same
number of iterations, regardless of the value of the error, so as to improve quality as much as
possible within a fixed time frame. The CGSU interacts with the decoder unit whenever the error
associated with the current iteration is smaller than all previous ones, or when the current
iteration is the first (CurrIter = 0). The weights are then written to the WeightMemory,
overwriting previous results. The maximum number of iterations can be hard-coded into the device
or simply provided as an input to the decoder. Since the execution time of each iteration is
known, whether through empirical testing or theoretical approximation, the maximum number of
iterations for a given real-time system can be calculated, thus achieving the best performance
possible.
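The fixed-iteration policy with best-result tracking can be sketched in software as below. This is a minimal illustration under stated assumptions: it uses the textbook CGS recurrence with the shadow residual set to the initial residual, it measures the error as the sum of absolute residuals, and it does not reproduce the pipelining or other hardware adaptations of algorithm 2. All function names are hypothetical.

```python
def matvec(A, x):
    """Serial matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cgs_fixed_iterations(A, b, max_iter):
    """Run at most max_iter CGS iterations and keep the best weights seen."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                 # residual for the initial guess x = 0
    r_shadow = list(r)          # assumption: shadow residual = initial residual
    p = [0.0] * n
    q = [0.0] * n
    rho_prev = 1.0
    best_x, best_err = list(x), sum(abs(v) for v in r)
    for it in range(max_iter):
        rho = dot(r_shadow, r)
        if rho == 0.0:
            break               # breakdown or exact convergence
        if it == 0:
            u = list(r)         # first iteration: p = u = r
            p = list(r)
        else:
            beta = rho / rho_prev
            u = [ri + beta * qi for ri, qi in zip(r, q)]
            p = [ui + beta * (qi + beta * pi) for ui, qi, pi in zip(u, q, p)]
        v = matvec(A, p)        # first matrix-vector multiplication
        sigma = dot(r_shadow, v)
        if sigma == 0.0:
            break
        alpha = rho / sigma
        q = [ui - alpha * vi for ui, vi in zip(u, v)]
        uq = [ui + qi for ui, qi in zip(u, q)]
        x = [xi + alpha * w for xi, w in zip(x, uq)]
        a_uq = matvec(A, uq)    # second matrix-vector multiplication
        r = [ri - alpha * w for ri, w in zip(r, a_uq)]
        err = sum(abs(v) for v in r)
        if err < best_err:      # overwrite stored weights only on improvement
            best_x, best_err = list(x), err
        rho_prev = rho
    return best_x, best_err
```

Keeping the best weights rather than the last ones matters because the CGS residual does not decrease monotonically, so a later iteration can be worse than an earlier one.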
Figure 4.5: Pipeline Diagram of the CGS Hardware Adapted Algorithm. u. represents a timing
unit given by the delay introduced by the SPFP pipelining
4.2.3.1 CGSU Control
The CGSU control is modeled by an FSM, as previously mentioned, and its main function is to
keep track of the end of each main section of the algorithm. Figure 4.8 illustrates this FSM,
which operates as follows:
1. Each time the RST signal is applied, the system returns to the IDLE state, where all variables
are reset, regardless of the state the system was in. To leave the IDLE state, START must be
set to 1, which moves the FSM to the Start state.
2. In the Start state, the computation of ρ, ρ_prev and β takes place. After detecting the end of
these calculations, the state changes to State 1.1 (if CurrIter = 0) or State 1.2 (if
CurrIter ≠ 0).
3. At this point, the calculation of u, p and v (or just v if the current state is State 1.1)
takes place in a pipelined fashion. The FSM then detects the end of the pipeline and passes on
to State 2.
Figure 4.6: Block Diagram of the CGS implementation
4. In State 2, the FSM again waits for the computation of σ and α to finish before passing to
the next stage.
5. In State 3, the FSM must wait for the calculation of the weights and the error vector. It must
then take action depending on two parameters: the sum of the error x_sum and the current
iteration CurrIter.
• If the error x_sum is smaller than the smallest error currently stored, x_bsum, the FSM’s
state changes to Output.
• If the current iteration does not provide a smaller error and the iteration number
CurrIter is smaller than the maximum number of iterations MaxIter, the FSM sets
its state to Start.
• If none of the previous conditions is met, the FSM is set to IDLE.
6. After providing the output to the decoder, the state machine must once more verify whether
the current iteration is still within the accepted range. If it is not, the FSM goes back to the
IDLE state. If at least one more iteration is required, the system is set to the Start state.
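The transitions listed above can be summarized in a small software model. This is a sketch only: the `done` flag standing for "the calculations of the current state have finished", the function and enum names, and the choice of `curr_iter < max_iter` as the test for "within the accepted range" are assumptions made for illustration.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    START = auto()
    STATE_1_1 = auto()
    STATE_1_2 = auto()
    STATE_2 = auto()
    STATE_3 = auto()
    OUTPUT = auto()


def next_state(state, *, rst=False, start=False, done=False,
               curr_iter=0, max_iter=1, err=0.0, best_err=float("inf")):
    """Return the next CGSU control state for the given inputs."""
    if rst:                                   # RST always forces IDLE
        return State.IDLE
    if state is State.IDLE:
        return State.START if start else State.IDLE
    if not done:                              # wait for the current stage
        return state
    if state is State.START:                  # rho, rho_prev and beta ready
        return State.STATE_1_1 if curr_iter == 0 else State.STATE_1_2
    if state in (State.STATE_1_1, State.STATE_1_2):
        return State.STATE_2                  # u, p, v pipeline finished
    if state is State.STATE_2:
        return State.STATE_3                  # sigma and alpha ready
    more_iterations = curr_iter < max_iter    # assumed range check
    if state is State.STATE_3:
        if err < best_err:
            return State.OUTPUT               # better weights: write them out
        return State.START if more_iterations else State.IDLE
    # State.OUTPUT: continue iterating or stop
    return State.START if more_iterations else State.IDLE
```

Whether CurrIter is incremented before or after the range check is not stated in the text, so the comparison used here may be off by one relative to the hardware.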
Figure 4.7: Block Diagram of the Inner CGS Block implementation
4.2.4 Sinc Weighting
Once the CGS algorithm reaches its final iteration, the final multiplication of the cardinal sine
functions by the corresponding weights is performed. As previously mentioned, only a third of the
total signal is recovered at this point. During this final MVM two signals are output: one carrying
the reconstruction value in SPFP and another indicating whether that value is valid.
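The weighting step amounts to summing shifted cardinal sine functions scaled by the weights, evaluated only over the middle third of the three-packet window. The sketch below assumes a kernel of the form sinc(2·B·(t − s_j)), with the bandwidth B given in cycles per sample; the exact kernel stored in the Sinc memory is not reproduced here, and the function names are hypothetical.

```python
import math


def sinc(x):
    """Normalized cardinal sine: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def reconstruct_middle_third(weights, spike_times, packet_len, bandwidth):
    """Reconstruct the second packet of a window of three packets.

    Assumptions: spike_times are sample indices in [0, 3 * packet_len) and
    bandwidth is expressed in cycles per sample.
    """
    samples = []
    for t in range(packet_len, 2 * packet_len):
        value = sum(w * sinc(2.0 * bandwidth * (t - s))
                    for w, s in zip(weights, spike_times))
        samples.append(value)
    return samples
```

Each output sample costs one multiply-accumulate per spike, which is the O(N_spikes · N_samples) cost of the final matrix-vector multiplication.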
4.2.5 Master Control
The master control of the decoder is modeled by an FSM with five different states. Figure 4.9
depicts the master FSM, which operates as follows:
1. Just like the previous FSMs, this model also has an IDLE state to set all the variables to a
known state.
2. After START is applied to the system, the Spike Location procedure of algorithm 1
commences. This includes receiving and processing a stream of information from the MMU.
Once the processing is finished, the FSM’s current state is changed to CALC C.
3. Once in CALC C, the decoder calculates each position of matrix C and maps it into the
Matrix C Memory. As soon as the process finishes, the state is changed to CGS and the
computation of the CGS algorithm starts.
Figure 4.8: Diagram of the FSM controlling the CGSU
4. Throughout the Calculating Weights procedure (see algorithm 1), the decoder stores the
output weights in the Weight Memory, and the FSM changes state as soon as it receives
a finished flag from the CGSU.
5. At this stage, the decoder computes the reconstructed signal. It should once again be
noted that only the second third of the reconstructed signal is output, to minimize errors
due to lack of spiking information.
6. After reconstructing the desired portion of the signal, the decoder moves on to the Spike
Location state, where it waits for a new input.
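The master FSM described above can be modeled as a transition table. The event names below are taken from the transition labels of figure 4.9 (SPIKES Found, Matrix C Done, CGS Finished, RESULT Out), rewritten as identifiers; the table layout and function name are assumptions of this sketch.

```python
# (current state, event) -> next state
TRANSITIONS = {
    ("IDLE", "START"): "SPIKE_LOCATION",
    ("SPIKE_LOCATION", "SPIKES_FOUND"): "CALC_C",
    ("CALC_C", "MATRIX_C_DONE"): "CGS",
    ("CGS", "CGS_FINISHED"): "RECONSTRUCT",
    ("RECONSTRUCT", "RESULT_OUT"): "SPIKE_LOCATION",
}


def master_next(state, event):
    """Return the next master state; RST returns to IDLE from any state."""
    if event == "RST":
        return "IDLE"
    # Events that are not expected in the current state leave it unchanged.
    return TRANSITIONS.get((state, event), state)


# Walk through one full decoding cycle.
state = "IDLE"
for event in ["START", "SPIKES_FOUND", "MATRIX_C_DONE",
              "CGS_FINISHED", "RESULT_OUT"]:
    state = master_next(state, event)
```

After the loop the model is back in the Spike Location state, waiting for the next input.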
4.3 Other Implementation notes
It should be noted that both the Integration Sinc_in and the Sinc_in inputs are optional, because
the values hard-coded in both memories are usable for many different applications. The threshold,
unlike these two inputs, can be changed during the device’s operation.
Another important characteristic of the implementation to take into consideration is the lack of
a sliding window. MATLAB simulations showed a very small, almost insignificant error when
putting all the reconstructed parts together. If desired, this glitch can be partly smoothed by
introducing a low-pass filter at the output of the decoder, since the noise reveals itself as a
high-frequency spike.
Figure 4.9: Representation of the FSM controlling the Decoder
The packet size was chosen to be 4 kb for several reasons, but this exact size is not required.
The following points should be taken into consideration when dimensioning the packet size:
• The packet size should be a power of two or a sum of powers of two, for more efficient
memory sizing.
• Since the number of spikes in an encoded IFN signal depends directly on the packet size, the
system should be designed to optimize the memory space of all variables whose size depends
on the number of spikes. Therefore, the expected number of spikes should be slightly smaller
than a power of two.
• The packet size should not be too small, since reconstruction quality is directly impacted by
the amount of information at the decoder’s disposal.
• A design change in the MMU could optimize the memory usage of the decoder if, instead
of storing the original packet information in memory, the distance between "1"s in the
spiking signal were stored. The passage of information from the MMU to the decoder would
then be done in a very similar fashion to the one already implemented, but an adder unit
would have to be introduced to the design.
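The last suggestion, storing the distance between "1"s instead of the raw packet, can be sketched as follows. The function names are hypothetical, and the convention that the first distance is counted from just before the first sample is an assumption of this example.

```python
def encode_spike_distances(bits):
    """Store the distance between consecutive 1s instead of the raw bits."""
    distances = []
    gap = 0
    for b in bits:
        gap += 1
        if b == 1:
            distances.append(gap)
            gap = 0
    return distances


def decode_spike_positions(distances):
    """Recover absolute spike positions with a running sum (the adder unit)."""
    positions = []
    pos = -1                    # assumed origin: one sample before the packet
    for d in distances:
        pos += d
        positions.append(pos)
    return positions
```

With this representation the memory depth scales with the number of spikes rather than with the packet length, which is why the memories would not need to grow if the time resolution were doubled.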
Chapter 5
Design Testing and Results
In order to test the logic design, an experimental setup was made using interaction between
the programmable logic (PL) and a processing system (PS).
Figure 5.1: Block diagram of the board design
5.1 Hardware Setup
The design was implemented and tested on a ZYNQ-7 ZC706 Evaluation Board (xc7z045ffg900-2).
Since this board supports secure digital (SD) card reading and writing, a Linux-based operating
system (OS) was installed on the SD card. This OS is then accessible and configurable via a
secure shell (SSH) connection.
To establish the desired interaction between the PL and the PS, a board design must be created
in Vivado Design Suite. It should be mentioned at this stage that the power and resource
utilization results presented later do not take into account the design blocks added to allow for
this PS-PL interaction. To communicate with the decoder’s top module, an AXI lite slave
interconnect port was added at the input to set the decoder’s threshold θ and the maximum number
of CGS iterations. The start and reset inputs were mapped to external pushbuttons. Since a BRAM
was required to store the output data of the decoder, a counter was added to the design to specify
the BRAM write address. Finally, the module was wrapped in a custom IP to interact with other
blocks available in Vivado. The schematic can be seen in figure 5.1.
To control the inputs and the output retrieval of the implemented decoder, a C program must
be developed. Most of the design’s configuration is performed automatically by Vivado, but it is
still necessary to write data to the inputs of the designed IP and to retrieve the reconstruction
data from the PL’s BRAM. This is achieved by mapping a target physical address to the virtual
address space by means of a special function called getvaddr. In this design, only two physical
addresses need to be converted: the AXI lite interface address (0x43C0000) and the output
BRAM address (0x8000000).
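The address-mapping idea can be illustrated with a short sketch. It is written in Python rather than C and is not the getvaddr function used in this work: it assumes a Linux system that exposes physical memory through /dev/mem and a process with sufficient privileges, and the helper names are hypothetical.

```python
import mmap
import os


def page_align(phys_addr, page_size=mmap.PAGESIZE):
    """Split a physical address into a page-aligned base and an offset."""
    base = phys_addr & ~(page_size - 1)
    return base, phys_addr - base


def map_physical(phys_addr, length, dev="/dev/mem"):
    """Map `length` bytes starting at phys_addr into this process.

    Assumption: `dev` gives access to physical memory (usually root only).
    Returns the mapping and the offset of phys_addr inside it.
    """
    base, offset = page_align(phys_addr)
    fd = os.open(dev, os.O_RDWR | os.O_SYNC)
    try:
        mem = mmap.mmap(fd, length + offset, mmap.MAP_SHARED,
                        mmap.PROT_READ | mmap.PROT_WRITE, offset=base)
    finally:
        os.close(fd)            # the mapping stays valid after closing fd
    return mem, offset
```

The mapping must start on a page boundary, which is why the address is split into a base and an offset before calling mmap. The register layout of the custom IP is not described in the text, so no register accesses are shown.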
5.2 Experimental Results
The experimental results will mainly focus on the area occupied on the target board, time
duration of the calculation, power consumption and reconstruction quality.
Concerning resource utilization and area occupied on the target board, most of the space was
occupied by the Xilinx SPFP IP cores that needed to be used for FP calculation. By comparing
tables 5.1 and 5.2, one can clearly see that these cores occupy more than 80% of the total amount
of resources used.
Xilinx SPFP IP cores
IP core                   No. used   FF each   FF total   LUT each   LUT total
Fused Multiply-Add        6          580       3480       634        3804
Accumulator               5          1001      5005       2421       12105
Multiplier                2          173       346        110        220
Add/Subtract              2          327       654        258        516
Fixed-to-Float            1          109       109        232        232
Multiplier (Blocking)     1          466       466        292        292
Divider                   1          463       463        833        833
Comparator (Less-Than)    1          10        10         46         46
Total                     -          -         10533      -          18048
Table 5.1: Xilinx SPFP IP cores resource utilization
As for power consumption, the design appears to be reasonably power efficient, considering that
it occupies a significant portion of the board yet dissipates only a small amount of combined
static and dynamic power (see table 5.3).
Total Resource Utilization
Resource type   Utilization   Available   Utilization (%)
LUT             20425         218600      9.34
LUTRAM          663           70400       0.94
FF              12141         437200      2.78
BRAM            73            545         13.39
DSP             64            900         7.11
BUFG            2             32          6.25
Table 5.2: Total resource utilization
Even though only three operating frequencies were tested on the board, simulation estimates the
maximum operating frequency to be about 110 MHz. At a frequency of 100 MHz, it takes about
2033 µs to decode a signal that is encoded with 16 spikes and has a length of 4096 samples,
using 30 iterations of the CGS algorithm. Unfortunately, the ENOB resulting from reconstructing
the ECG signal introduced in section 3.4.2 is only slightly above 6.5 bit for the QRS section of
the ECG signal and near 4.7 bit for the remainder of the signal. An excerpt of the recovered
signal, equivalent to two packets, can be seen in figure 5.2. The ratio between the duration of a
packet and the time needed to reconstruct it can be calculated as follows:
$$\frac{N_{samples}}{f_{sample} \cdot t_{reconstruction}} = \frac{4096}{150\,\mathrm{kHz} \cdot 2.03\,\mathrm{ms}} = 13.45 \qquad (5.1)$$
This indicates that the signal can in fact be decoded in real time, and that a signal with the
same characteristics can be reconstructed in real time at a sampling frequency of up to
2.02 MHz.
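The arithmetic behind equation 5.1 and the 2.02 MHz limit can be checked with a few lines. The sketch uses the values quoted in the text (4096 samples, 150 kHz, and a reconstruction time of about 2.03 ms, i.e. the 2033 µs figure); the function names are hypothetical.

```python
def realtime_margin(n_samples, f_sample_hz, t_reconstruction_s):
    """Ratio between the duration of a packet and its reconstruction time."""
    packet_duration_s = n_samples / f_sample_hz
    return packet_duration_s / t_reconstruction_s


def max_sampling_frequency(n_samples, t_reconstruction_s):
    """Highest sampling frequency at which a packet is still decoded in time."""
    return n_samples / t_reconstruction_s


margin = realtime_margin(4096, 150e3, 2.03e-3)
f_max_hz = max_sampling_frequency(4096, 2.03e-3)
```

A margin greater than 1 means that a packet is reconstructed faster than the next one arrives; the maximum sampling frequency is simply the frequency at which the margin drops to 1.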
Figure 5.2: Comparison between the output result of the FPGA and the real signal. This part of
the signal shown corresponds to a part of the first QRS section of the ECG signal in fig. 3.6
From fig. 5.2 and the results achieved in section 3.4.2, a few conclusions can be drawn about
the reconstruction methods. On one hand, MATLAB and the FPGA implementation behave very
similarly, and both can benefit greatly from an increase in the number of CGS algorithm
iterations. On the other hand, the encoding and decoding parameters identified seem to largely
influence the final outcome of the reconstruction process. Overall, the FPGA recovery method
performs better than MATLAB in terms of computational time, recovering the same amount of
information in less than one thirtieth of the time required by software (it should be noted that
the MATLAB simulation time also includes latency introduced by external sources, e.g. the OS,
and is measured at a different clock speed).
Power Consumption (W)
                 50 MHz   75 MHz   100 MHz
Static Power     0.231    0.231    0.232
Dynamic Power    0.091    0.168    0.223
Total Power      0.322    0.399    0.454
Table 5.3: Implementation’s power consumption in Watt
As previously mentioned in chapter 3, it is possible to significantly improve the reconstruction
quality by increasing the time resolution of the IFN signal. If a sampling frequency of 300 kHz
were used to encode signals with a maximum frequency of 120 Hz, a better ENOB could be
achieved. To test this premise, the original implementation in chapter 4 was slightly altered to
support packets with a size of 8 kb, and the outcome was simulated using Vivado Design Suite
and MATLAB. The impact of this change on the design is small, mostly affecting the size of the
Sinc_Integ and Sinc memories (see fig. 4.1). It also doubles the size of the memories in the
MMU but, if the improvement suggested in section 4.3 is implemented, these memories can keep
the same size, as the number of spikes is expected to remain the same.
Figure 5.3 shows the reconstruction error of the MATLAB and FPGA recovery of two different
FWGN signals sampled at 300 kHz with a maximum frequency of 120 Hz. The first signal achieves
an ENOB of 8.32 bit in MATLAB and 7.48 bit in the FPGA, and the second 7.8 bit in MATLAB and
6.98 bit in the hardware implementation. Several other FWGN signals with the same
characteristics were tested and were consistently reconstructed with ENOBs between 6.8 and
7.5 bit, outperforming FWGN signals with half the ratio between timing resolution and maximum
frequency.
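For reference, an ENOB figure can be computed from a reference signal and its reconstruction as sketched below. This assumes the common definition ENOB = (SNDR − 1.76 dB) / 6.02 dB, with the reconstruction error treated as noise plus distortion; the exact procedure used to obtain the values reported here may differ, and the function name is hypothetical.

```python
import math


def enob(reference, reconstructed):
    """Effective number of bits from the signal-to-error power ratio.

    Assumption: ENOB = (SNDR_dB - 1.76) / 6.02, where the error is the
    sample-by-sample difference between the two signals.
    """
    n = len(reference)
    signal_power = sum(s * s for s in reference) / n
    error_power = sum((s - r) ** 2
                      for s, r in zip(reference, reconstructed)) / n
    sndr_db = 10.0 * math.log10(signal_power / error_power)
    return (sndr_db - 1.76) / 6.02
```

Under this definition each additional bit corresponds to roughly 6 dB of SNDR, so the gap between 7.48 bit and 8.32 bit is about 5 dB.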
5.3 Matrix-Vector Multiplication
The implemented design contains several matrix-vector multiplications, which seem to consume
most of the total calculation time. As they were not implemented using parallel computing
algorithms, every matrix-vector multiplication has a calculation time of around $O(n \cdot m)$.
This is extremely time consuming, especially for the final reconstructed signal multiplication,
which is in the order of $O(N_{spikes} \cdot N_{samples})$, and for the CGS algorithm, which has
two matrix-vector multiplications of $O(N_{spikes}^{2})$ per iteration.
Figure 5.3: Reconstruction error of two FWGN signals.
5.3.1 Matrix-Vector Parallelization
Matrix-vector multiplication has been extensively discussed in literature and two methods stand
out: row-wise block partitioning and column-wise block partitioning (see fig. 5.4). These
methods allow the computation of matrix-vector multiplications in approximately $O(n^2/p)$ time
(assuming a square matrix of size $n \times n$), where p denotes the number of partitions. Even
though this would speed up the calculations significantly, implementing either of these
methodologies in hardware can be quite cumbersome, as it requires more hardware (at least
one SPFP accumulator and one SPFP multiplier per partition) and a more complex control system.
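Row-wise block partitioning can be sketched in software as follows. The blocks are processed one after another here, so the code only shows how the work is divided; in hardware each block would be assigned to its own multiplier and accumulator and the blocks would run concurrently. The function name and the ceiling-division block size are choices made for this example.

```python
def matvec_rowwise(A, x, p):
    """Matrix-vector product with the rows of A split into p blocks."""
    n = len(A)
    block = -(-n // p)          # ceiling division: rows per partition
    y = []
    for start in range(0, n, block):
        rows = A[start:start + block]
        # Each block needs only its own rows and the full vector x, so the
        # blocks are independent and could be computed in parallel.
        y.extend(sum(a * b for a, b in zip(row, x)) for row in rows)
    return y
```

Since every partition performs about n/p dot products of length n, the time per partition is proportional to n²/p, at the cost of p multiply-accumulate units.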
Figure 5.4: Diagram of matrix-vector multiplication methodologies. a) represents row-wise block
partitioning. b) represents column-wise block partitioning
5.3.2 Structured matrix approach
In the final matrix-vector multiplication of the algorithm (see equation 3.10), matrix $H^T$ has
a visible structure that can be exploited to perform this calculation in less time.
If, instead of $[H]_{k,l} = [h(t - s_j)]$, a full matrix of size $l \times l$ that includes every
shifted cardinal sine wave is used (denoted $[H_f]_{l,l}$), an almost equivalent system of
equations is obtained, provided that the weight vector is zero-padded such that
$[W_z]_l = w_j$, $l = j$. It is now clear that matrix $H_f$ is a Toeplitz matrix and
$$H \cdot W = H_f \cdot W_z = q \qquad (5.2)$$
Since the convolution of a vector a of length M by a vector b of length N can be written as
the product of an M × N Toeplitz matrix and a vector of length N [34], the Fast Fourier
Transform (FFT) can be used to calculate the product $y = T_n b$ in $O(n \log n)$ time
following these steps:
1. Embed the Toeplitz matrix $T_n$ into a circulant matrix $C_{2n}$
2. $f = F_{2n} b$
3. $g = F_{2n} a$
4. $z^T = [f_1 g_1, f_2 g_2, ..., f_{2n} g_{2n}]$
5. $y = F^{*}_{2n} z$
where a is the first column of the circulant matrix $C_{2n}$ and $F_{2n}$ is the Fourier matrix
$F_{j,k} = e^{2\pi i jk/2n}$ [35].
Since the standard serial matrix-vector multiplication is $O(N_{spikes}^{2})$ and the FFT method
is $O(N_{samples} \log N_{samples})$, there might be room for improvement using this method if
the signal has a higher spiking density. Another factor in favor of this method is the existence
in literature of many methods to parallelize the FFT and improve its efficiency. One clear
disadvantage of this method is the increase in memory space required to hold the circulant
matrix column. The second major disadvantage is the introduction of complex numbers into the
calculations, which is undesirable in FPGA-based systems.
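The circulant-embedding steps can be illustrated with a short software sketch. It uses a plain recursive radix-2 FFT, so n must be a power of two, and it applies the 1/(2n) normalization explicitly in the inverse transform. The function names and the choice to describe the Toeplitz matrix by its first column and first row are assumptions of this example.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def toeplitz_matvec(col, row, b):
    """Compute y = T b for the n x n Toeplitz matrix T with first column
    `col` and first row `row` (row[0] == col[0]), using FFTs of length 2n."""
    n = len(b)
    # Step 1: first column of the 2n x 2n circulant matrix that embeds T.
    a = list(col) + [0.0] + list(row[:0:-1])
    f = fft(list(b) + [0.0] * n)               # step 2, b zero-padded to 2n
    g = fft(a)                                 # step 3
    z = [fi * gi for fi, gi in zip(f, g)]      # step 4
    y = fft(z, inverse=True)                   # step 5, unnormalized
    return [v.real / (2 * n) for v in y[:n]]   # first n entries, normalized
```

The sketch also makes both disadvantages visible: the vector a has 2n entries, and all intermediate values are complex even though the inputs and the result are real.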
Chapter 6
Conclusion and Future Work
6.1 Conclusion
The work described in this document presented a novel hardware architecture for an IFN decoder,
based on a spiking model reconstruction and implemented in an FPGA.
Firstly, the state of the art on IFN modulation and reconstruction techniques was analyzed
and studied in chapter 3. MATLAB simulations were used to determine whether significant
differences existed between theoretical values and practical results. Furthermore, the IFN
modulation was applied to an ECG signal, thus demonstrating the applicability of the modulation.
In chapter 4, the architecture of the reconstruction method based on the spiking model was
presented and divided into three major modules: the CGSU, the MMU and the general decoder. The
CGSU is responsible for solving the most demanding part of the reconstruction algorithm and is
therefore the most complex unit. The MMU is responsible for managing packet arrival and is
mostly composed of four memories, each containing a packet’s worth of information. The general
decoder contains both the MMU and the CGSU and is mostly responsible for managing and
generating the inputs to the CGSU and the final outputs.
Lastly, in chapter 5 the final implementation results were presented together with some
suggestions for improving the overall algorithm performance. The decoder is not very
area-expensive (see table 5.2), and it might prove useful to have a DPFP-based architecture,
thus increasing recovery precision and CGS convergence rate. The reconstruction results obtained
for the ECG signal are shy of the 8 bit mark, but the results obtained for FWGN signals, of up to
7.5 bits of ENOB, promise even better reconstruction once the mentioned improvements are
applied to the design (see section 5.2 of chapter 5). Furthermore, equation 5.1 makes it clear
that the proposed system is well capable of reconstructing a signal in real time, and hints that
this method might also be applied to many other biological signals whose frequency range is not
too wide (e.g. Electroencephalogram).
6.2 Future Work
Even though the results appear to be promising, the author recognizes that there is still room for
improvement in many aspects of the reconstruction method, algorithm and implemented design.
As future work it would be recommended to:
• Fully parallelize matrix-vector multiplications using the techniques described in the previ-
ous chapter.
• Experiment with DPFP architecture to improve precision.
• Modify the MMU to store spiking distance.
• Explore other variations of the CGS algorithm.
• Implement a feedback system to determine the threshold offset that maximizes the ENOB.
• Further explore optimizations between IFN modulation parameters to achieve better recon-
struction results.
Bibliography
[1] Gordon E. Moore. Cramming more components onto integrated circuits. Proceedings of the
IEEE, 86(1):82–85, 1998. Cited on page 1.
[2] Nicolas Brunel and Mark C W Van Rossum. Quantitative investigations of electrical nerve
excitation treated as polarization. Biological Cybernetics, 97(5-6):341–349, 2007. Cited on
page 2.
[3] A. L. Hodgkin and A. F. Huxley. A quantitative description of membrane current and its
application to conduction and excitation in nerve. Bulletin of Mathematical Biology, 52(1-
2):25–71, 1990. Cited on page 2.
[4] C.E. Shannon. Communication in the Presence of Noise. Proceedings of the IRE, 37(1):10–
21, jan 1949. Cited on page 8.
[5] J. M. Whittaker. The “fourier” theory of the cardinal function. Proceedings of the Edinburgh
Mathematical Society (Series 2), 1:169–176, 7 1928. Cited on page 8.
[6] V.A. Kotel’nikov. On the transmission capacity of the ‘ether’ and of cables in electrical
communications. Physics-Uspekhi, 49(7):736, 2006. Cited on page 8.
[7] Michael Unser. Sampling-50 years after Shannon. Proceedings of the IEEE, 88(4):569–587,
apr 2000. Cited on page 8.
[8] Hans G Feichtinger and Karlheinz Gröchenig. Theory and Practice of Irregular Sampling.
Wavelets: mathematics and applications2, pages 305–363, 1994. Cited on page 8.
[9] R. J. Duffin and A. C. Schaeffer. A class of nonharmonic Fourier series. Transactions of the
American Mathematical Society, 72(2):341–341, feb 1952. Cited on page 8.
[10] D.G. Zrilic. Delta Modulation Systems. In Circuits and Systems Based on Delta Modulation,
volume 1, pages 1–28. Springer-Verlag, Berlin/Heidelberg, 2005. Cited on pages 9 and 10.
[11] H. Inose and Y. Yasuda. A Unity Bit Coding Method by Negative Feedback. Proceedings of
the IEEE, 51(11):1524–1535, 1963. Cited on pages 9 and 10.
[12] Robert M Gray. Oversampled Sigma-Delta Modulation. C(5), 1987. Cited on page 10.
[13] Sangil Park. Principles of Sigma-Delta Modulation for Analog-to-Digital Converters.
1(9):72–66, 1999. Cited on page 10.
[14] A.A. Lazar and L.T. Toth. Time encoding and perfect recovery of bandlimited signals. In
2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003.
Proceedings. (ICASSP ’03)., volume 1, pages VI–709–12. IEEE, 2003. Cited on pages 11 and 12.
[15] Aurel A. Lazar and K. Simonyi. Time Encoding of Bandlimited Signals, an Overview.
Conference on Telecommunication Systems, Modeling and Analysis, (October):1–18, 2005. Cited
on page 11.
[16] Dazhi Wei, Vaibhav Garg, and J.G. Harris. An asynchronous delta-sigma converter
implementation. 2006 IEEE International Symposium on Circuits and Systems, (4):4903–4906,
2006. Cited on page 11.
[17] Dariusz Kościelnik and Marek Miśkowicz. Asynchronous Sigma-Delta analog-to-digital
converter based on the charge pump integrator. Analog Integrated Circuits and Signal
Processing, 55(3):223–238, 2008. Cited on page 11.
[18] Gabriel Nallathambi and José C. Príncipe. Signal processing with pulse trains: An algebraic
approach- part I. CoRR, abs/1611.03967, 2016. Cited on page 15.
[19] Gabriel Nallathambi and José C. Príncipe. Signal processing with pulse trains: An algebraic
approach- part II. CoRR, abs/1611.03970, 2016. Cited on page 15.
[20] Aurel A. Lazar. Time encoding with an integrate-and-fire neuron with a refractory period.
Neurocomputing, 58-60:53–58, 2004. Cited on page 16.
[21] Du Chen, Yuan Li, Dongming Xu, J.G. Harris, and J.C. Principe. Asynchronous Biphasic
Pulse Signal Coding and Its CMOS Realization. In 2006 IEEE International Symposium on
Circuits and Systems, pages 2293–2296. IEEE, 2006. Cited on page 16.
[22] Aurel A. Lazar and Eftychios A. Pnevmatikakis. Consistent Recovery of Sensory Stim-
uli Encoded with MIMO Neural Circuits. Computational Intelligence and Neuroscience,
2010:1–13, 2010. Cited on pages 16, 18, and 19.
[23] Alexander Singh Alvarado, Manu Rastogi, John G. Harris, and Jose C. Principe. The
integrate-and-fire sampler: A special type of asynchronous Σ-Δ modulator. Proceedings - IEEE
International Symposium on Circuits and Systems, 1:2031–2034, 2011. Cited on page 16.
[24] Alexander Singh Alvarado, Jose C. Principe, and John G. Harris. Stimulus reconstruction
from the biphasic integrate-and-fire sampler. 2(2):415–418, apr 2009. Cited on pages 16 and 20.
[25] A. A. Lazar, E. K. Simonyi, and L. T. Toth. Fast recovery algorithms of time encoded
bandlimited signals. In IEEE International Conference on Acoustics, Speech and Signal
Processing, volume 4, pages 237–240, Mar 2005. Cited on pages 16, 20, and 21.
[26] Iman Kianpour, Bilal Hussain, Vitor G. Tavares, Candido Duarte, Helio Mendonca, and Jose
Principe. An energy study on IR-UWB transmitter using integration-and-fire modulation. In
2014 IEEE International Conference on Ultra-WideBand (ICUWB), pages 479–483. IEEE,
sep 2014. Cited on page 16.
[27] John G. Harris, Jose C. Principe, Justin C. Sanchez, Du Chen, and Christy She. Pulse-based
signal compression for implanted neural recording systems. In 2008 IEEE International
Symposium on Circuits and Systems, pages 344–347. IEEE, may 2008. Cited on page 16.
[28] Anthony Dupre, Sarah Vincent, and Paul A Iaizzo. Basic ecg theory, recordings, and inter-
pretation. pages 191–201, 2005. Cited on page 24.
[29] A. L. Goldberger, L. A. N. Amaral, L. Glass, J. M. Hausdorff, P. Ch. Ivanov, R. G.
Mark, J. E. Mietus, G. B. Moody, C.-K. Peng, and H. E. Stanley. PhysioBank,
PhysioToolkit, and PhysioNet: Components of a new research resource for complex
physiologic signals. Circulation, 101(23):e215–e220, 2000 (June 13). Circulation
Electronic Pages: http://circ.ahajournals.org/content/101/23/e215.full PMID:1085218; doi:
10.1161/01.CIR.101.23.e215. Cited on page 24.
[30] Yong Dou, Jie Zhou, Xiaoyang Chen, Yuanwu Lei, and Jinbo Xu. FPGA accelerating three
QR decomposition algorithms in the unified pipelined framework. In 2009 International
Conference on Field Programmable Logic and Applications, volume 410073, pages 410–
416. IEEE, aug 2009. Cited on page 28.
[31] Jie Zhou, Yong Dou, Jianxun Zhao, Fei Xia, Yuanwu Lei, and Yuxing Tang. A Fine-Grained
Pipelined Implementation for Large-Scale Matrix Inversion on FPGA. pages 110–122. 2009.
Cited on page 28.
[32] Magnus Rudolph Hestenes and Eduard Stiefel. Methods of conjugate gradients for solving
linear systems, volume 49. 1952. Cited on page 28.
[33] Diederik R Fokkema, Gerard LG Sleijpen, and Henk A Van der Vorst. Generalized conjugate
gradient squared. Journal of Computational and Applied Mathematics, 71(1):125–146, 1996.
Cited on page 28.
[34] Eleanor Chu and Alan George. Inside the FFT black box: serial and parallel fast Fourier
transform algorithms. CRC Press, 1999. Cited on page 47.
[35] Zhaojun Bai, James Demmel, Jack Dongarra, Axel Ruhe, and Henk van der Vorst. Templates
for the solution of algebraic eigenvalue problems: a practical guide, volume 11. Siam, 2000.
Cited on page 47.
