Design exploration and performance strategies towards power-efficient FPGA-based architectures for sound source localization by da Silva Gomes, Bruno et al.
Research Article
Design Exploration and Performance Strategies towards Power-Efficient FPGA-Based Architectures for Sound Source Localization
Bruno da Silva,1,2 Laurent Segers,1 An Braeken,1 Kris Steenhaut,1,2 and Abdellah Touhafi1,2
1INDI Department, Vrije Universiteit Brussel, Brussels, Belgium
2ETRO Department, Vrije Universiteit Brussel, Brussels, Belgium
Correspondence should be addressed to Bruno da Silva; bruno.da.silva@vub.be
Received 24 May 2019; Accepted 26 July 2019; Published 15 September 2019
Academic Editor: Tomasz Wandowski
Copyright © 2019 Bruno da Silva et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Many applications rely on MEMS microphone arrays to locate sound sources prior to further processing. These applications are not only executed under real-time constraints but are also often embedded on low-power devices. Such environments become challenging when the number of microphones increases or when dynamic responses are required. Field-Programmable Gate Arrays (FPGAs) are usually chosen due to their flexibility and computational power. This work intends to guide the design of reconfigurable acoustic beamforming architectures that are not only able to accurately determine the sound Direction-Of-Arrival (DoA) but also capable of satisfying the most demanding applications in terms of power efficiency. Design considerations of the required operations performing the sound localization are discussed and analysed in order to facilitate the elaboration of reconfigurable acoustic beamforming architectures. Performance strategies are proposed and evaluated based on the characteristics of the presented architecture. This power-efficient architecture is compared to a different architecture prioritizing performance in order to reveal the unavoidable design trade-offs.
1. Introduction
Audio streaming applications involve multiple signal processing operations performed on streams of audio signals, often on resource- and power-limited embedded devices. Many applications demand the parallel processing of multiple input audio streams while requiring a real-time response. This is the case for microphone arrays, which are nowadays used in many acoustic applications such as hearing aids [1], biometrical systems [2–4], or speech enhancement [5, 6]. Many of these acoustic applications demand an accurate and fast localization of the sound source prior to any operation [7]. Arrays of microphones are used for acoustic sensing to increase the Signal-to-Noise Ratio (SNR) by combining the input signals from the microphones while steering the array’s response in a desired direction using acoustic beamforming techniques. Such beamforming techniques involve compute-intensive operations which must be optimized, especially when facing real-time constraints.
FPGAs present valuable features which make them interesting computational units for embedded acoustic beamformers. Firstly, the full customization of an architecture enables multisensor real-time systems. Such embedded systems demand low latency, which cannot be achieved on general-purpose CPUs (e.g., microprocessors). Secondly, FPGAs provide reprogrammable circuitry which can become very power efficient thanks to a high-level architecture customization. Although the amount of programmable logic resources of low-end FPGAs used in embedded systems is relatively low, streaming applications such as sound locators based on acoustic beamforming can largely benefit from the FPGA’s features. Real-time behavior and power efficiency are priorities for sound locators based on acoustic beamforming, since a low latency is demanded to estimate the sound Direction-of-Arrival (DoA) while consuming as little power as possible.
Hindawi
Journal of Sensors
Volume 2019, Article ID 5761235, 27 pages
https://doi.org/10.1155/2019/5761235
We propose several design considerations and performance strategies to fully exploit the capabilities of current FPGAs. On the one hand, design considerations are needed to properly satisfy the main priority of the acoustic application, and power efficiency is a key feature of the proposed architecture. On the other hand, performance strategies are proposed to accelerate this power-efficient architecture. Each performance strategy first considers the architecture’s characteristics before exploiting the FPGA’s features. As a result, reconfigurable acoustic beamforming architectures are able to operate orders of magnitude faster.
The presented work extends the architecture proposed in [8] with several improvements, such as performance strategies to accelerate the proposed power-efficient architecture. Moreover, the features of alternative architectures are discussed in detail when describing the required operations for sound source localization and their implementation. The main extensions and new results presented in this work are
(i) a complete design exploration of FPGA-based architectures using a time-domain Delay-and-Sum beamforming technique for sound source localization is used to identify the most power-efficient architectures
(ii) performance strategies are proposed to accelerate a power-efficient architecture
(iii) a detailed comparison between the proposed low-power architecture and the high-performance architectures described in [9, 10] helps to identify the trade-offs when targeting power efficiency
This paper is organized as follows. An overview of the related literature is presented in Section 2. A detailed description of the required stages is given in Section 3 in order to properly understand the impact of the architecture’s parameters. In Section 4, the metrics used to evaluate acoustic beamforming architectures are described. A power-efficient reconfigurable acoustic beamforming architecture, which embeds not only a time-domain Delay-and-Sum beamformer but also all the operations to demodulate the Pulse Density Modulation (PDM) signals from the microphones, is proposed and analysed in Section 5. This section exemplifies how performance strategies can fully exploit the architecture’s characteristics to accelerate the sound localization. A final comparison with the real-time architectures proposed in [9, 10] is made in Section 6 in order to emphasize the existing trade-offs when targeting power efficiency. Finally, some conclusions are drawn in Section 7.
2. Related Work
The interest in microphone arrays has increased in the last decade, partially thanks to recent advances in microelectromechanical systems (MEMS), which have facilitated the integration of microphone arrays in smartphones [11], tablets, or voice assistants such as Amazon’s Alexa [12]. The digital output formats, like PDM or Inter-IC Sound (I2S), currently offered by MEMS microphones facilitate their interfacing with FPGA-based systems. Figure 1 depicts the number of papers related to microphone arrays, MEMS microphones, and FPGA-based microphone arrays since 1997. The number of publications related to microphone arrays has significantly increased in the last decades (notice the log scale in the number of publications). There is a relation between the evolution of the MEMS technology and the replacement of Digital Signal Processors (DSPs) by FPGAs for audio processing. The majority of the publications related to MEMS microphones or microphone arrays mainly discuss microphone technologies, with around 4% of the overall number of publications describing FPGA-based systems using microphone arrays. We believe that FPGAs are currently underexploited in this area [13].
Figure 1: Number of publications reported in Google Scholar related to microphone arrays, MEMS microphones, and FPGAs. [Plot: number of publications per year (log scale, 1 to 10000) from 1998 to 2018 for the series "MEMS microphones", "Microphone array", and "Microphone array+FPGA".]
FPGA-based embedded systems provide enough resources to fully embed acoustic beamformers while remaining power efficient [8–10]. Such features, however, might not be directly obtained when developing reconfigurable acoustic beamforming architectures. The related literature lacks design exploration, with few publications discussing the architecture’s parameters [14, 15] or exploring the full FPGA potential [2, 16].
Fully embedded architectures are, surprisingly, rare. The available resources and the achievable performance of current FPGAs facilitate the signal processing operations required by the PDM demodulation and the beamforming techniques. An example of a fully embedded beamforming-based acoustic system for the localization of the dominant sound source is presented in [17, 18]. The authors in [17] propose an FPGA-based system consisting of a microphone array composed of up to 33 MEMS microphones. Their architecture fully embeds the PDM demodulation detailed in [19], together with a Delay-and-Sum beamformer and a Root Mean Square (RMS) detector for the sound source localization.
The authors in [20] also fully embed a beamforming-based acoustic system composed of digital MEMS microphone arrays acting as a node of a Wi-Fi-based Wireless Sensor Network (WSN) for deforestation detection. The proposed architecture performs the beamforming operations before the PDM demodulation and filtering. Instead of implementing individual PDM demodulators for each microphone, the authors propose the execution of the Delay-and-Sum beamforming algorithm over the PDM signals. The output of the Delay-and-Sum, which is no longer a 1-bit PDM signal, is filtered by windowing and processed in the frequency domain. As power consumption is a critical parameter for WSN-related applications, their architecture uses an extremely low-power Flash-based FPGA, which consumes only 21.8 mW per 8-element microphone array node. A larger version of this microphone array, composed of 16 microphones, is proposed in [21]. Their architecture is migrated to a Xilinx Spartan-6 FPGA due to the additional computational operations, leading to 61.71 mW of power consumption.
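The PDM-domain Delay-and-Sum described in [20] can be illustrated with a minimal behavioral sketch. It assumes that the steering delays are already integers expressed in PDM clock cycles and that the PDM samples are the bits 0 and 1; the function name `pdm_delay_and_sum` and the example values are hypothetical and are not taken from [20]. What it shows is that summing N delayed 1-bit streams yields a multilevel signal with values between 0 and N, so a single demodulation filter per steered beam, rather than one per microphone, is needed afterwards.

```python
from typing import List, Sequence


def pdm_delay_and_sum(pdm: Sequence[Sequence[int]], delays: Sequence[int]) -> List[int]:
    """Delay-and-Sum applied directly to 1-bit PDM streams (behavioral sketch).

    pdm[m][n] is bit n (0 or 1) of microphone m and delays[m] is its steering
    delay in PDM clock cycles (hypothetical values). Output sample k equals
    sum_m pdm[m][n - delays[m]] with n = k + max(delays), so only indices
    where every delayed stream already has data are produced.
    """
    if len(pdm) != len(delays):
        raise ValueError("one delay per microphone is required")
    length = min(len(stream) for stream in pdm)
    max_delay = max(delays)
    # The result is multilevel (0..len(pdm)), no longer a 1-bit PDM signal.
    return [
        sum(stream[n - d] for stream, d in zip(pdm, delays))
        for n in range(max_delay, length)
    ]
```

For three microphones with delays of 0, 1, and 2 cycles, each output value lies between 0 and 3 and would then go through one shared PDM demodulation filter.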
FPGA-based low-power architectures for WSN nodes performing sound source localization are, however, not an isolated case. The authors in [8] propose a multimode architecture implemented on an extremely low-power Flash-based FPGA, achieving a power consumption as low as 34 mW for the 52-element microphone array proposed in [22]. In these architectures, the strategy of beamforming PDM signals has the benefit of saving area and power consumption due to the drastic reduction in the number of filters needed. The architecture’s trade-offs, like the real-time capabilities, are, however, not discussed.
Current low-end FPGAs provide enough resources to perform complex beamforming algorithms involving tens of microphones in real time. Nevertheless, the choice of the architecture is strongly linked to the characteristics and constraints of the target application. Here, a power-efficient architecture is proposed to fully exploit power-efficient but resource-constrained FPGAs.
3. Stages of Reconfigurable Architectures for Time-Domain Acoustic Beamforming
Reconfigurable acoustic beamforming architectures share several common components to perform the signal processing operations required to locate sound sources using acoustic beamforming techniques with a MEMS microphone array. The mandatory operations can be grouped into several stages embedding the processing of the data acquired from the MEMS microphone array on the FPGA. Although the microphone array is an external sensing component from the FPGA perspective, its features directly determine some of the architecture’s characteristics. Moreover, the implementation of reconfigurable acoustic beamforming architectures demands a study and analysis of the impact of the application’s parameters. For instance, the sampling frequency (FS) of the audio input determines the filters’ response at the PDM demodulation stage. FS also affects the beamforming operation and, thereby, the FPGA resource consumption, which might be critical when targeting small FPGA-based embedded systems.
The impact of the design parameters on the implementation is analysed in this section. Firstly, the stages required for the audio retrieval, the beamforming operations, and the sound localization are detailed. Secondly, the design space of each stage is explored in order to identify the key design parameters and their impact. Such a Design-Space Exploration (DSE) is general enough to obtain power-efficient as well as high-performance reconfigurable architectures such as the one presented in [10]. We start with a short overview of the required stages enabling audio retrieval, beamforming, and sound localization.
Microphone array: digital MEMS microphones, especially PDM MEMS ones, have become popular when building microphone arrays [13]. Besides the multiple advantages of MEMS microphones, some of their features, such as low-power modes, make them interesting candidates for reconfigurable acoustic beamforming architectures. A generic microphone array is used to exemplify how the low-power mode of PDM MEMS microphones can be exploited to construct arrays supporting a variable number of active microphones. Such flexibility demands additional design considerations.
Filter stage: instead of integrating the PDM demodulator in the microphone package, PDM MEMS microphones output a single-bit PDM signal, which needs to be demodulated on the processing side. The PDM demodulation requires additional computations, which are undesirable given the relatively low amount of resources available on FPGAs in embedded systems, but it also presents an opportunity to build fully customized acoustic beamformer architectures targeting sound localization. The PDM demodulation must also be flexible enough to support dynamic microphone arrays while being power- and resource-efficient. The parameters which determine the required filter response are identified here and used to evaluate multiple designs.
Beamforming stage: its relatively low complexity makes
the Delay-and-Sum technique the most popular acoustic
beamforming technique. The inherent parallelism in large
microphone arrays can be exploited when embedding this type of beamformer on FPGAs. Although the computation of the Delay-and-Sum in the frequency domain is mentioned in the literature, its execution in the time domain is preferred since it avoids the computation of the discrete Fourier transform, which is a time- and resource-demanding computation. The complex representation of the data in the frequency domain demands a high bit width or even a floating-point representation, leading to multiple Multiply-ACCumulate (MACC) operations to perform the phase shift corrections needed to compensate for the difference in path lengths. Instead, the computation of the Delay-and-Sum beamforming technique in the time domain is reduced to the storage of audio samples to synchronize the acquired audio samples for a particular steered orientation. The consumption of the FPGA internal memory, which is used to properly delay the audio samples during the beamforming operation, can be optimized through a judicious choice of design parameters.
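The time-domain Delay-and-Sum described above can be sketched as follows for a single ring of microphones. The sketch assumes far-field plane waves, a speed of sound of 343 m/s, microphones evenly spaced on a circle, and steering delays rounded to whole samples at the PCM rate; the function names are hypothetical and the code is a behavioral model, not the architecture evaluated in this work.

```python
import math
from typing import List, Sequence

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def steering_delays(radius_m: float, num_mics: int, theta_rad: float, fs_hz: float) -> List[int]:
    """Integer sample delays aligning a far-field wave from azimuth theta.

    Microphone m sits at angle 2*pi*m/num_mics on a circle of radius radius_m.
    """
    arrivals = []
    for m in range(num_mics):
        phi = 2.0 * math.pi * m / num_mics
        # Microphones facing the source receive the wavefront earlier (negative time).
        arrivals.append(-radius_m * math.cos(theta_rad - phi) / SPEED_OF_SOUND)
    latest = max(arrivals)
    # Earlier arrivals are delayed more so that all channels line up with the latest one.
    return [round((latest - a) * fs_hz) for a in arrivals]


def delay_and_sum(signals: Sequence[Sequence[float]], delays: Sequence[int]) -> List[float]:
    """Time-domain Delay-and-Sum: out[k] = sum_m x_m[n - d_m], n = k + max(delays)."""
    length = min(len(x) for x in signals)
    max_d = max(delays)
    return [sum(x[n - d] for x, d in zip(signals, delays)) for n in range(max_d, length)]
```

For four microphones on a ring of radius 9 cm sampled at 48 kHz and steered to 0°, `steering_delays(0.09, 4, 0.0, 48000.0)` returns `[25, 13, 0, 13]`. The largest delay bounds how many samples each channel has to buffer, which is the internal-memory cost that depends on the chosen design parameters.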
Power stage: the direction of the sound source is located by measuring the Sound Relative Power (SRP) per horizontal direction. The SRP obtained by a 360° sweep of the surrounding sound field is known as the Polar Steered Response Power (P-SRP), which provides information about the power response of the array. The P-SRP is only obtained after the audio recovery and the beamforming operation. The accuracy of the DoA based on the P-SRP is determined by parameters such as the number of steered directions. The impact of this parameter is evaluated in the DSE.
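The P-SRP and the DoA decision can be sketched as below. The sketch assumes that the power of each steered beam is the sum of squared beamformed samples over Ns samples and that the No orientations are uniformly spaced over 360°, so the angular resolution is 360°/No. Both assumptions and the function names are illustrative; the power computation used by the architectures in [8–10] may differ.

```python
from typing import List, Sequence, Tuple


def polar_srp(beams: Sequence[Sequence[float]], ns: int) -> List[float]:
    """Power of each steered beam over its first ns samples (sum of squares)."""
    return [sum(v * v for v in beam[:ns]) for beam in beams]


def estimate_doa(beams: Sequence[Sequence[float]], ns: int) -> Tuple[float, List[float]]:
    """Return the azimuth (degrees) of the strongest beam and the P-SRP.

    beams[k] is assumed to be steered to k * 360 / No degrees, with No = len(beams).
    """
    srp = polar_srp(beams, ns)
    num_orientations = len(beams)
    best = max(range(num_orientations), key=lambda k: srp[k])
    return best * 360.0 / num_orientations, srp
```

With No = 8 the estimate can only take multiples of 45°, which is why a larger No improves the achievable accuracy at the cost of more beamforming work.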
The parameters leading to a dynamic response of the sound locator, which can be adapted at runtime thanks to the FPGA’s flexibility, are presented first, together with their unavoidable trade-offs.
3.1. Parameters for a Runtime Dynamic Response. FPGAs present an opportunity to develop dynamic and reconfigurable acoustic beamforming architectures, which self-adapt their configuration to satisfy certain criteria. The power consumption, for instance, can be dynamically adjusted at runtime. This dynamism is obtained by exploiting the following architecture parameters at the design stage.
Active microphones: the number of active microphones in the array (Nam) directly affects power consumption, frequency response, and performance. The architecture, however, must be designed to support a variable Nam. It must be able to selectively deactivate the PDM MEMS microphones of the array, for instance, through the clock signal, and to deactivate the FPGA resources associated with disabled microphones. The following DSE demonstrates how this deactivation can be supported at runtime, without requiring a partial reconfiguration of the FPGA.
Angular resolution: one of the parameters which determine the capability to properly estimate the DoA is the angular resolution. The number of steered orientations (No) defines the achievable angular resolution when calculating the P-SRP; for uniformly spaced orientations, this resolution is 360°/No. Similar to Nam, this parameter affects the frequency response, the performance, and, indirectly, the power consumption. With these features in mind, the architecture can be designed to support a runtime variable angular resolution, as presented in [9, 10].
Sensing time: the sensing time (ts), a well-known parameter of radio frequency applications, represents the time during which the receiver monitors the surrounding sound field. A longer sensing time is known to increase the robustness against noise [23] and directly influences the probability of properly detecting a sound source. The value of ts is determined by the number of acoustic samples processed at the power stage (Ns). A higher Ns is needed to detect and to determine the direction of sound sources under low SNR conditions. Reconfigurable acoustic beamforming architectures can support a variable Ns to adapt the sensing of the array at runtime based on a continuous SNR estimation. Although the proposed architecture must support a variable sensing time at runtime, the evaluation of this parameter is out of the scope of the presented work.
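Assuming that the Ns samples are PCM samples produced at the output rate FS/DF of the filter stage, where DF is the overall decimation factor of the PDM demodulation, the sensing time is Ns divided by that rate. The helper below and its example values (FS = 3.072 MHz, DF = 64) are hypothetical.

```python
def sensing_time_s(ns: int, fs_hz: float, df: int) -> float:
    """Time spanned by ns PCM samples at the output rate fs_hz / df (assumed model)."""
    output_rate_hz = fs_hz / df
    return ns / output_rate_hz
```

Under this model, Ns = 64 at 48 kHz corresponds to about 1.33 ms, and doubling Ns doubles ts.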
These three parameters are used to provide dynamism to reconfigurable acoustic beamforming architectures. Note that the selection at runtime of the values of Nam, No, and Ns leads to multiple trade-offs, which are summarized in Table 1 together with the exact values used for the architecture’s analysis.
3.2. Description of the Stages
3.2.1. PDM MEMS Microphone Array. The position of the microphones in the array, known as the array geometry, affects not only the system’s response but also the parameters described in Section 3.1. Moreover, the grouping of microphones in subarrays enables a variable Nam and frequency response [9, 10]. This topic has been widely explored [24–26] and is out of the scope of this work. The microphone array used to evaluate the proposed reconfigurable architectures presents the following characteristics to achieve the desired dynamism:
(i) The array is composed of PDM MEMS microphones
(ii) The PDM MEMS microphones support low-power modes (a.k.a. sleep modes)
(iii) The microphones are grouped in subarrays through a common clock signal

Table 1: Architecture’s parameters used for the analysed reconfigurable acoustic beamforming architectures, the range under evaluation, and their trade-offs.
Parameter | Range | Trade-offs
Number of active microphones (Nam) | 4, 12, 28, 52 | Resources, frequency response, directivity, and power
Number of steered orientations (No) | 4, 8, 16, 32, 64 | Execution time, performance, frequency response, and directivity
Number of acoustic samples at the power stage (Ns) | 64 | Power, execution time, and performance
The reference microphone array is composed of 52 digital PDM MEMS microphones, as described in [22]. The array geometry consists of four concentric subarrays of 4, 8, 16, and 24 PDM MEMS microphones mounted on a 20 cm circular printed circuit board, depicted in Figure 2. Each concentric subarray has a different radius and number of microphones to facilitate the capture of spatial acoustic information using a beamforming technique. The selection of PDM MEMS microphones is also motivated by the multiple modes that such microphones support. Most PDM MEMS microphones offer a low-power mode and drastically reduce their power consumption when the microphones’ clock signal is deactivated. This feature allows the construction of microphone arrays composed of multiple subarrays, whose response can be dynamically modified by individually activating or deactivating subarrays. This distributed geometry can also adapt the architecture’s response to different sound sources. For instance, not all subarrays need to be active to detect a particular sound source. The value of Nam has a direct impact on the array’s output SNR, since the SNR increases with Nam. Conversely, the computational requirements drastically decrease and the sensor array becomes more power efficient if only a few subarrays are active.
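How the subarray clocks map to Nam can be sketched as follows for the reference array of [22] (rings of 4, 8, 16, and 24 microphones). Enabling the rings cumulatively from the innermost one outwards is an assumption made here for illustration; with that order, the reachable values of Nam are 4, 12, 28, and 52, which coincide with the range listed in Table 1. The function names are hypothetical.

```python
from typing import List, Sequence

RING_SIZES = (4, 8, 16, 24)  # microphones per concentric subarray, inner to outer [22]


def active_microphones(enabled: Sequence[bool], ring_sizes: Sequence[int] = RING_SIZES) -> int:
    """Nam for a given on/off state of the subarray clocks."""
    if len(enabled) != len(ring_sizes):
        raise ValueError("one flag per subarray is required")
    return sum(size for size, on in zip(ring_sizes, enabled) if on)


def nested_configurations(ring_sizes: Sequence[int] = RING_SIZES) -> List[int]:
    """Nam when subarrays are enabled cumulatively from the innermost ring outwards."""
    n = len(ring_sizes)
    return [active_microphones([i <= k for i in range(n)], ring_sizes) for k in range(n)]
```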
The features of the described microphone array, like the deactivation of the microphones or their grouping in subarrays, lead to microphone arrays with a dynamic response, ideal for high-performance or power-efficient reconfigurable architectures.
3.2.2. Filter Stage. The audio retrieval from PDM MEMS microphones requires certain operations. The first operation to be performed on the FPGA is the PDM demultiplexing, since every pair of microphones has its PDM output signals multiplexed in time. The PDM demultiplexing is a mandatory operation to retrieve the individual sampled audio data from each microphone. At every clock edge, the incoming data from one of the two microphones of a pair is sampled, one microphone on the rising edges and the other on the falling edges. A PDM splitter block, located on the FPGA, demultiplexes the PDM samples.
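A behavioral model of the PDM splitter is sketched below. It assumes that the shared data line has already been captured as one list in which even positions hold the samples taken on rising clock edges and odd positions those taken on falling edges. Which microphone of the pair drives which edge is set by its L/R select pin (Figure 3), so the assignment used here and the function name are hypothetical.

```python
from typing import List, Sequence, Tuple


def pdm_split(interleaved: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Split a shared PDM data line into the two microphone streams.

    Assumes interleaved[2k] was captured on the k-th rising clock edge and
    interleaved[2k + 1] on the following falling edge. A trailing unpaired
    sample is dropped.
    """
    pairs = len(interleaved) // 2
    rising = [interleaved[2 * k] for k in range(pairs)]
    falling = [interleaved[2 * k + 1] for k in range(pairs)]
    return rising, falling
```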
(1) PDM Demodulators. Figure 3 depicts the internal components of a PDM MEMS microphone. The MEMS transducer converts the input Sound Pressure Level (SPL) to a voltage. This transducer is followed by an impedance converter amplifier, which stabilizes the output voltage of the MEMS for the Sigma-Delta (ΣΔ) modulator. The analog signal is digitized at the ADC and converted into a single-bit PDM signal by a fourth-order ΣΔ modulator running at a high oversampling rate. PDM is a type of modulation used to represent analog signals in the digital domain, where the relative density of the pulses corresponds to the analog signal’s amplitude. The ΣΔ modulator reduces the added noise in the audio frequency band by shifting it to higher frequency ranges. This undesirable high-frequency noise needs to be removed when recovering the original audio signal.
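The pulse-density principle can be illustrated with a first-order ΣΔ modulator. Note the substitution: the microphones described here use a fourth-order modulator, and a first-order loop is shown only because it is short enough to read. The input range [−1, 1] and the mapping of output pulses to a ±1 feedback value are assumptions of this sketch, and the function name is hypothetical.

```python
from typing import List, Sequence


def sigma_delta_first_order(x: Sequence[float]) -> List[int]:
    """First-order sigma-delta modulator producing a 1-bit PDM stream.

    Input samples are expected in [-1, 1]. The feedback value is +1 for a
    '1' pulse and -1 for a '0' pulse, so the pulse density tracks (x + 1) / 2.
    """
    integrator = 0.0
    feedback = 0.0
    bits = []
    for sample in x:
        integrator += sample - feedback        # accumulate the quantization error
        bit = 1 if integrator >= 0.0 else 0    # 1-bit quantizer
        feedback = 1.0 if bit else -1.0        # 1-bit DAC in the feedback path
        bits.append(bit)
    return bits
```

For a constant input of 0.5, about 75% of the output bits are ones, and for a constant input of 0 the output alternates, giving a density of 50%.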
Digital MEMS microphones usually operate at a clock frequency ranging from 1 MHz to 3.072 MHz [27] or up to 3.6 MHz [28]. This range of FS is chosen to oversample the audio signal in order to obtain sufficient audio quality and to generate the PDM output signal in the ΣΔ modulator. The PDM signal needs not only to be filtered in order to remove the noise but also to be downsampled to convert the audio signal to a Pulse-Code Modulation (PCM) format.
Figure 2: Examples of microphone arrays composed of two subarrays ((a) [39]) or four subarrays ((b) [22]). The reference microphone array used to evaluate the proposed power-efficient architecture for the sound source localization is the one described in [22]. [Panels: (a) Subarray 2: 8 mics (Ø 81.28 mm); Subarray 2: 4 mics (Ø 81.28 mm). (b) Ring 4 (Ø = 18 cm): 24 mics; Ring 3 (Ø = 13.5 cm): 16 mics; Ring 2 (Ø = 8.9 cm): 8 mics; Ring 1 (Ø = 4.5 cm): 4 mics.]
Several examples of PDM demodulators proposed in the literature and incorporated in commercial MEMS microphones are depicted in Figure 4. For instance, the
PDM demodulators in Figures 4(a) and 4(b) are the block diagrams of the I2S MEMS microphones [29, 30], respectively. The PDM demodulator in [29] incorporates a decimator to downsample the PDM signal by a factor of 64 and converts the signal to PCM. The remaining high-frequency components in the PCM signal are removed by a low-pass filter. The PDM demodulator in [30] is composed of two cascaded filters acting as a digital bandpass filter. The first one is a low-pass decimation filter which eliminates the high-frequency noise, followed by a high-pass filter, which removes the DC and the low-frequency components. Notice that the decimation factor and the filters’ response are fixed in both cases. This reduces the design space, since it forces the microphones’ sampling frequency FS to be a fixed multiple of the operational sampling rate of the target acoustic application. For instance, if the PDM demodulator decimates by a fixed factor of 64 like in [29], the microphones must be clocked at FS = 3.072 MHz for a desired output audio at 48 kHz. At that rate, audio signals up to 24 kHz can be recovered without aliasing according to the Nyquist theorem.
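The constraint imposed by a fixed decimation factor can be written out explicitly. The first line repeats the example above; the second uses a hypothetical microphone clock of 2.4 MHz to show that the output rate, and with it the Nyquist limit, then moves with FS.

```latex
F_S = 64 \cdot F_{\mathrm{out}} = 64 \cdot 48\ \mathrm{kHz} = 3.072\ \mathrm{MHz},
\qquad F_{\mathrm{Nyquist}} = \frac{F_{\mathrm{out}}}{2} = 24\ \mathrm{kHz}

F_S = 2.4\ \mathrm{MHz} \;\Rightarrow\; F_{\mathrm{out}} = \frac{2.4\ \mathrm{MHz}}{64} = 37.5\ \mathrm{kHz},
\qquad F_{\mathrm{Nyquist}} = 18.75\ \mathrm{kHz}
```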
PDM demodulators in Figures 4(c) and 4(d), which are proposed in [19] and in [14], respectively, present a cascade of three different types of filters in a filter chain fashion, that is, a CIC decimation filter followed by two half-band filters with a decimation factor of 2 and a low-pass FIR filter in the final stage. The CIC filters are used to convert the PDM signals into PCM format. This class of linear phase FIR filters, developed by Hogenauer [31, 32], involves only additions and subtractions. It consists of three stages: the integrator stage, the decimator or interpolator stage, and the comb section.
Figure 3: Typical digital PDM MEMS microphone block diagram. Source: [40]. [Diagram: MEMS transducer → amplifier → ADC → PDM modulator, with channel select and power management blocks; pins CLK, Data, L/R select, GND, and VDD.]
Figure 4: Examples of PDM demodulators: (a) is the internal block diagram of the digital I2S MEMS microphone SPH0645LM4H from Knowles [29]. (b) is the internal block diagram of the digital I2S MEMS microphone ICS-43434 from TDK InvenSense [30]. (c) is proposed for the Blackfin processor [19]. (d) is proposed in [14]. [Panels: (a) 1-bit PDM → decimator (↓64) → low-pass filter → 18- to 24-bit PCM; (b) 1-bit PDM → low-pass decimation filter → high-pass filter → 24-bit PCM; (c) 1-bit PDM → CIC decimator → half-band filter (↓2) → half-band filter (↓2) → low-pass filter → 16-bit PCM; (d) 1-bit PDM → CIC decimator → half-band filter (↓2) → half-band filter (↓2) → low-pass decimation filter → 16-bit PCM.]
PDM input samples are recursively added in the integrator stage and recursively subtracted with a differential delay in the comb stage. The number of recursive operations in the integrator and comb sections determines the order of the filter (NCIC). This order should be at least equal to the order of the ΣΔ modulator in the microphones’ ADC. After the CIC filter, the signal growth is G = (DCIC · D)^NCIC, where DCIC is the decimation factor and D the differential delay; it therefore grows exponentially with the filter order [32], which corresponds to NCIC · log2(DCIC · D) additional bits. CIC decimation filters decimate the signal by DCIC and convert the PDM signal into PCM at the same time. A major drawback of this type of filter is the nonflat frequency response in the desired audio frequency range. To improve the flatness of the frequency response, a CIC filter with a lower decimation factor followed by compensation filters is usually a better choice, as proposed in [19, 32, 33]. The CIC filter is followed by a couple of half-band filters of order NHB with a decimation factor of two. Half-band filters are widely used in multirate signal processing applications. These filters, which pass only half of the frequency band of the input signal, present two important characteristics. Firstly, the passband and stopband ripples are equal. Secondly, the passband-edge and stopband-edge frequencies are equidistant from the half-band frequency π/2. As a result, the filter’s coefficients are symmetrical and every second coefficient, except the central one, is zero. Both characteristics can be exploited for resource savings. The last component is a low-pass compensation FIR filter of order NFIR that removes the high-frequency noise introduced by the ΣΔ conversion process in the microphone. This filter can also be designed to compensate the passband drop usually introduced by CIC filters [32]. Optionally, it can further decimate the signal by a factor of DFIR, as proposed in Figure 4(d).
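A behavioral sketch of a Hogenauer CIC decimator, written from the description above, is given below. The function names are hypothetical; the input is a stream of 0/1 PDM bits, the integrator and comb registers start at zero, and Python’s unbounded integers stand in for the wrap-around register arithmetic that a hardware implementation would use.

```python
from typing import List, Sequence


def cic_decimate(x: Sequence[int], order: int, decimation: int, diff_delay: int = 1) -> List[int]:
    """CIC decimator: `order` integrators, downsampling by `decimation`,
    then `order` comb stages with differential delay `diff_delay`.

    Python integers never overflow, so no register-width handling is needed here;
    hardware sizes its registers for a bit growth of order * log2(decimation * diff_delay).
    """
    # Integrator section, running at the input (PDM) rate.
    integrators = [0] * order
    decimated = []
    for n, sample in enumerate(x):
        acc = sample
        for i in range(order):
            integrators[i] += acc
            acc = integrators[i]
        if n % decimation == decimation - 1:  # keep one output per `decimation` inputs
            decimated.append(acc)
    # Comb section, running at the decimated rate: y[k] - y[k - diff_delay].
    y = decimated
    for _ in range(order):
        y = [y[k] - (y[k - diff_delay] if k >= diff_delay else 0) for k in range(len(y))]
    return y


def cic_gain(order: int, decimation: int, diff_delay: int = 1) -> int:
    """DC gain G = (decimation * diff_delay) ** order."""
    return (decimation * diff_delay) ** order
```

With NCIC = 3, DCIC = 8, and D = 1, a constant stream of ones settles at G = 512, while an alternating 1010... stream (pulse density 0.5) settles at 256, showing how the pulse density becomes a PCM amplitude.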
(2) Proposed PDM Demodulator. The analysed filter stage, originally proposed in [10] and in [8], is composed of single or multiple filter chains performing the PDM demodulation. Each filter chain consists of several cascaded filters performing a PDM demodulation of the microphone array output signals (Figure 5), simplifying the PDM demodulators in Figures 4(c) and 4(d) by reducing the number of cascaded filters. Both half-band filters are replaced by a moving average filter, which removes the DC level of the CIC’s output signal, improving the dynamic range of the signal entering the low-pass compensation FIR filter. The FIR filter presents a cut-off frequency of Fmax at a sampling rate of FS/DCIC, which is the sampling rate obtained after the CIC decimation filter with a decimation factor of DCIC. The streaming nature of this architecture enables the generation of an output value from the CIC filter every clock cycle. Due to the decimation factor, only one output value per DCIC input values is propagated to the low-pass FIR filter. Consequently, the FIR filter has DCIC clock cycles to process each input value. This low-pass FIR filter needs to be designed in a serial fashion to reduce the resource consumption, and its maximum order is also determined by DCIC:
$$ N_{\mathrm{FIR}} \le D_{\mathrm{CIC}} - 1 \tag{1} $$
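Equation (1) follows from a cycle budget: a serial FIR filter with a single multiply-accumulate unit spends one clock cycle per coefficient, and a filter of order NFIR has NFIR + 1 coefficients, which must fit into the DCIC cycles available between two CIC outputs. The behavioral sketch below, with hypothetical names, checks that budget and counts the cycles; it is not an HDL description.

```python
from typing import List, Sequence, Tuple


def max_serial_fir_order(d_cic: int) -> int:
    """Largest order whose NFIR + 1 taps fit in d_cic cycles (one MAC per cycle)."""
    return d_cic - 1


def serial_fir(samples: Sequence[float], coeffs: Sequence[float], d_cic: int) -> Tuple[List[float], int]:
    """Direct-form FIR evaluated with a single MAC, one coefficient per cycle.

    Returns the filtered samples and the worst-case number of cycles spent per
    input sample, which must not exceed d_cic.
    """
    order = len(coeffs) - 1
    if order > max_serial_fir_order(d_cic):
        raise ValueError("filter order exceeds the cycle budget of equation (1)")
    history = [0.0] * len(coeffs)  # delay line, newest sample first
    out = []
    worst_cycles = 0
    for x in samples:
        history = [x] + history[:-1]
        acc = 0.0
        cycles = 0
        for h, s in zip(coeffs, history):  # one multiply-accumulate per clock cycle
            acc += h * s
            cycles += 1
        worst_cycles = max(worst_cycles, cycles)
        out.append(acc)
    return out, worst_cycles
```

A four-tap (order 3) averaging filter fits exactly into DCIC = 4 cycles, whereas DCIC = 3 would violate equation (1).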
Hereafter, NFIR is assumed to be equal to its maximum order (DCIC − 1), since the order is directly related to the quality of the filter’s response. The overall decimation factor DF can be expressed based on the downsampling rate change of each filter:
$$ D_F = \frac{F_S}{2 \cdot F_{\max}} = \frac{F_S}{BW} = D_{\mathrm{CIC}} \cdot D_{\mathrm{FIR}} \tag{2} $$
where DFIR is the decimation factor needed for the FIR filter to obtain the minimum bandwidth BW = 2 · Fmax that satisfies the Nyquist theorem for the target Fmax.
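As a worked instance of equation (2), take FS = 3.072 MHz and Fmax = 16 kHz. These values lie within the ranges quoted for the ADMP521 microphone, and the particular split of DF into DCIC and DFIR is chosen only for illustration:

```latex
D_F = \frac{F_S}{2 F_{\max}} = \frac{3.072\ \mathrm{MHz}}{2 \cdot 16\ \mathrm{kHz}} = 96
    = D_{\mathrm{CIC}} \cdot D_{\mathrm{FIR}} = 24 \cdot 4

\frac{F_S}{D_{\mathrm{CIC}}} = \frac{3.072\ \mathrm{MHz}}{24} = 128\ \mathrm{kHz}, \qquad
N_{\mathrm{FIR}} \le D_{\mathrm{CIC}} - 1 = 23, \qquad
\frac{128\ \mathrm{kHz}}{D_{\mathrm{FIR}}} = 32\ \mathrm{kHz} = 2 F_{\max}
```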
The filter chain depicted in Figure 5 enables dynamic architectures while performing the PDM demodulation. The range of parameters such as FS and Fmax depends on the PDM MEMS microphone specifications. For instance, the PDM MEMS microphone ADMP521 from Analog Devices used in [22] operates at an FS ranging from 1.25 MHz to 3.072 MHz, as specified in [27], and its frequency response ranges from 100 Hz to 16 kHz. The specifications of the acoustic application also determine Fmax, which must be within the range of frequencies supported by the microphone. Both parameters, FS and Fmax, determine the value of DF and, therefore, the signal rate of each filter. However, not all possible configurations are supported when specifying the low-pass FIR filter’s characteristics. For instance, the passband and the stopband, the transition band, or the level of attenuation of the signal outside the passband limit the supported FIR filter configurations.
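The DC removal performed by the moving average filter between the CIC decimator and the FIR filter can be sketched as subtracting a running mean from each CIC output sample. The window length and this particular formulation (output equals input minus the mean of the last W samples) are assumptions for illustration, since the text does not specify how the filter in [8, 10] is parameterized; the function name is hypothetical.

```python
from collections import deque
from typing import Deque, List, Sequence


def remove_dc_moving_average(x: Sequence[float], window: int) -> List[float]:
    """Subtract a running mean over the last `window` samples (fewer at start-up)."""
    if window < 1:
        raise ValueError("window must be positive")
    buf: Deque[float] = deque()
    running_sum = 0.0
    out = []
    for v in x:
        buf.append(v)
        running_sum += v
        if len(buf) > window:
            running_sum -= buf.popleft()  # drop the oldest sample from the mean
        out.append(v - running_sum / len(buf))
    return out
```

A constant CIC output is mapped to zero, so only the audio variations around the DC level reach the FIR filter.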
Figure 5: The filtering stage consists of single or multiple filter chains performing the PDM demodulation. [Filter chain: PDM → NCIC-th-order CIC decimation filter (↓DCIC) → moving average filter → (DCIC − 1)th-order low-pass FIR filter (↓DFIR) → PCM.]
(3) DSE of the PDM Demodulator. Based on the previous description of the different filters of the filter chain, a DSE can be performed to evaluate the supported configurations. The analysis considers an Fmax ranging from 13 kHz to 16.5 kHz in steps of 125 Hz and an FS ranging from 1.25 MHz to 3.072 MHz. All possible combinations of DCIC and DFIR in equation (2) are considered, based on the DF obtained for every possible value of FS and Fmax. The parameters of the low-pass FIR filter are NFIR, which is determined by DCIC, and Fmax as the cut-off frequency. Each possible low-pass FIR filter is generated considering a transition band of 3.5 kHz and an attenuation of at least 60 dB in the stopband. If the minimum order of the filter is higher than NFIR, the filter is discarded. Furthermore, a minimum order of 8 is defined as the threshold for NFIR. Some values are discarded because DF is a prime number or NFIR is below 8. These filter parameters are realistic constraints for low-pass FIR filters.
Figure 6 depicts the values of DCIC for the supported configurations, using the parameters detailed in Table 2. Each low-pass FIR filter is generated and evaluated in MATLAB 2016b. Through equation (2), the values of DCIC also provide information about DF and DFIR. Higher values of Fmax allow higher values of DCIC, which can greatly reduce the computational complexity of narrowband low-pass filtering. However, excessively high values of DCIC lead to such low rates that, although a higher-order low-pass FIR filter is supported, it cannot satisfy the low-pass filtering specifications. Notice how the number of possible solutions decreases when increasing Fmax. Given the ranges of FS and Fmax, the values of DF vary between 38 and 154. However, as previously explained, many of these values cannot be considered, either because they are prime numbers or because their decomposition into factors yields values of DCIC that lead to NFIR below 8. Because higher values of Fmax lead to low values of DCIC for low FS, these DCIC values cannot satisfy the specifications of the low-pass FIR filter. High values of DCIC lead to high-order low-pass FIR filters and lower values of DFIR.
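The structural part of this DSE can be sketched in Python: compute DF from equation (2), enumerate its factorizations DF = DCIC · DFIR, set NFIR = DCIC − 1 as in equation (1), and drop configurations with NFIR below 8. Rounding DF to the nearest integer and requiring both decimation factors to be greater than one (which removes prime values of DF) are assumptions of this sketch. The FIR design check against the transition band and the 60 dB attenuation, which the paper performs in MATLAB, is not modelled here.

```python
def decimation_factor(fs_hz, fmax_hz):
    """Equation (2): DF = FS / (2 * Fmax), rounded to the nearest integer (assumption)."""
    return round(fs_hz / (2.0 * fmax_hz))


def candidate_configs(fs_hz, fmax_hz, min_fir_order=8):
    """List (D_CIC, D_FIR, N_FIR) tuples allowed by equations (1) and (2).

    D_CIC ranges over the divisors of DF between 2 and DF - 1, so D_FIR >= 2
    and a prime DF yields no candidate. N_FIR takes its maximum value
    D_CIC - 1 and must reach min_fir_order.
    """
    df = decimation_factor(fs_hz, fmax_hz)
    configs = []
    for d_cic in range(2, df):
        if df % d_cic != 0:
            continue  # D_CIC must divide DF exactly
        n_fir = d_cic - 1
        if n_fir >= min_fir_order:
            configs.append((d_cic, df // d_cic, n_fir))
    return configs


if __name__ == "__main__":
    # Parameters of Table 5: FS = 2.08 MHz, Fmax = 16.25 kHz.
    print(decimation_factor(2.08e6, 16250))  # 64
    print(candidate_configs(2.08e6, 16250))  # [(16, 4, 15), (32, 2, 31)]
```

For FS = 2.08 MHz and Fmax = 16.25 kHz the sketch keeps DCIC = 16 and DCIC = 32; the latter corresponds to the configuration listed in Table 5. Each surviving candidate would still have to pass the FIR specification check before being accepted.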
The presented DSE of the filter chain performing the PDM demodulation is general enough to be applied to any of the PDM demodulators depicted in Figure 4. It can be used to identify the best-performing solutions as well as to reduce the resource consumption, as discussed in the following section.
3.2.3. Beamforming Stage. Microphone arrays can focus on a specific orientation thanks to beamforming techniques. Such techniques amplify the sound coming from the targeted direction while suppressing the sound coming from other directions. Time-domain Delay-and-Sum beamforming delays the output signal of each microphone by a specific amount of time before adding all the output signals together. Sound sources can be detected by continuously steering the array in loops of 360°. The number of steered orientations per 360° sweep, No, defines the angular resolution of the microphone array. Higher angular resolutions demand not only a longer execution time per steering loop but also more FPGA memory resources to store the precomputed delays per orientation.
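A minimal Python sketch of time-domain Delay-and-Sum with a precomputed delay table is shown below. It assumes a planar array, a far-field source, a speed of sound of 343 m/s, and integer sample delays obtained by rounding; the microphone coordinates used in the example are hypothetical and do not correspond to the 52-microphone array.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def delay_table(mic_positions, fs_hz, n_orientations, c=SPEED_OF_SOUND):
    """Precompute integer sample delays, one row per steered orientation.

    For a far-field source at angle theta, microphone m at (x, y) receives
    the wavefront earlier by (x*cos(theta) + y*sin(theta)) / c. Delaying
    each microphone by its lead over the last microphone reached aligns all
    signals, so every delay is non-negative.
    """
    table = []
    for k in range(n_orientations):
        theta = 2.0 * math.pi * k / n_orientations
        leads = [x * math.cos(theta) + y * math.sin(theta) for x, y in mic_positions]
        latest = min(leads)
        table.append([round((p - latest) / c * fs_hz) for p in leads])
    return table


def delay_and_sum(signals, delays):
    """Delay each microphone signal by its sample delay and add them up."""
    length = len(signals[0])
    out = [0.0] * length
    for x, d in zip(signals, delays):
        for n in range(d, length):
            out[n] += x[n - d]
    return out


if __name__ == "__main__":
    # Hypothetical two-microphone array, 0.343 m apart, sampled at 1 kHz.
    mics = [(0.0, 0.0), (0.343, 0.0)]
    table = delay_table(mics, 1000, 4)
    print(table)  # [[0, 1], [0, 0], [1, 0], [0, 0]]
    # Impulse reaching microphone 2 one sample earlier (source at 0 degrees).
    print(delay_and_sum([[0, 0, 0, 1, 0], [0, 0, 1, 0, 0]], table[0]))
```

Steering the array then amounts to reading row k of the table for orientation k, which is why the table size grows with both No and the number of microphones.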
The beamforming stage performs the time-domain Delay-and-Sum beamforming operation and is composed of a bank of memories, a precomputed table of delays, and several cascaded additions. Although Delay-and-Sum beamforming assumes a fixed number of microphones (NMics) and a fixed geometry, our scalable solution satisfies those restrictions while offering a flexible geometry [9]. Figure 7 shows our proposed beamforming stage, which is essentially composed of FPGA block memories (BRAMs) organized as ring buffers that properly delay each filtered microphone signal.
The delay for a given microphone is determined by its position on the array and by the focus orientation. All possible delay values (Δ) per microphone for each beamed orientation are precomputed, grouped per orientation, and stored in ROMs at compile time. At runtime, the delay values Δm(θ) of each microphone m when pointing to a certain orientation θ are obtained from this precomputed table.

Figure 6: Supported values of DCIC based on FS and Fmax. The low-pass FIR filter's specifications are detailed in Table 2. (Axes: Fmax and FS in MHz; colour scale: DCIC from 10 to 55.)

Table 2: Parameters used for the design-space analysis.
Definition | Value
FS | 1.25 MHz to 3.072 MHz
Fmax | 13.75 kHz to 16.5 kHz
Steps for Fmax | 125 Hz
Low-pass FIR filter: minimum order | 7
Low-pass FIR filter: transition band | 4 kHz
Low-pass FIR filter: stop band attenuation | 60 dB
The memory requirements of the beamforming stage are obtained for all possible locations of the beamforming stage between the components of the filter stage. Figure 8 depicts the potential locations of the Delay-and-Sum-based beamformer. The memory requirements of the beamforming stage as a function of FS and Fmax are shown in Figure 9. That figure depicts the memory requirements for the supported configurations of the explored filter chain, assuming the FIR filter characteristics summarized in Table 2. All the discussed characteristics of the filter stage depicted in Figure 6 are evaluated for each possible placement of the Delay-and-Sum-based beamformer. The first possible location is between the microphone array and the CIC filter, where the beamforming memory demand increases linearly with FS. The input signals to be stored are single-bit PDM signals, which in theory should reduce the need for memory. However, due to the high values of Δm, thousands of PDM samples need to be stored per microphone. The bit width of the output signals of the CIC filter grows with the filter order and decimation factor [32], which increases the beamforming memory demands when the beamforming stage is placed after the CIC filter. Nevertheless, the signal bit width after the moving average filter and before the low-pass FIR filter can be reduced to 32 bits. Although a lower bit width would not cause significant signal degradation, 32 bits are assumed to be enough to guarantee good signal quality for all supported microphone array configurations. Due to the downsampling of the audio signal, low values of Δm are obtained when the beamforming stage is located after the low-pass FIR filter, leading to a significant reduction of the memory demands. A detailed analysis of the beamforming memory demands therefore guides the search for the most memory-efficient architecture.
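The trend across placements can be approximated with a back-of-the-envelope Python sketch: memory ≈ NMics × max(Δ) × word width, where max(Δ) shrinks with the decimation applied before the chosen location. The CIC output width uses the standard Hogenauer bit-growth bound Bin + ⌈NCIC · log2(DCIC · M)⌉, with M the differential delay. The 1-bit PDM input and 32-bit words after the moving average and FIR filters follow the text above; treating the moving average filter as non-decimating and rounding the decimated delays are assumptions of this sketch.

```python
import math


def cic_output_width(in_bits, order, decimation, diff_delay=1):
    """Hogenauer bit growth: B_out = B_in + ceil(N * log2(R * M))."""
    return in_bits + math.ceil(order * math.log2(decimation * diff_delay))


def beamformer_memory_bits(n_mics, max_delay_samples, word_bits):
    """Ring-buffer storage: one memory of depth max(Delta) per microphone."""
    return n_mics * max_delay_samples * word_bits


def memory_per_placement(n_mics, max_delay_pdm, d_cic, d_fir, n_cic, diff_delay=1):
    """Rough memory demand (bits) for each beamformer location of Figure 8.

    max_delay_pdm is max(Delta) at the PDM rate FS; the delay in samples is
    divided by the decimation applied before each location (rounded).
    """
    after_cic = round(max_delay_pdm / d_cic)
    after_fir = round(max_delay_pdm / (d_cic * d_fir))
    cic_bits = cic_output_width(1, n_cic, d_cic, diff_delay)
    return {
        "before CIC": beamformer_memory_bits(n_mics, max_delay_pdm, 1),
        "after CIC": beamformer_memory_bits(n_mics, after_cic, cic_bits),
        "after moving average": beamformer_memory_bits(n_mics, after_cic, 32),
        "after FIR": beamformer_memory_bits(n_mics, after_fir, 32),
    }


if __name__ == "__main__":
    # 52 microphones, max(Delta) = 1048 PDM samples, CIC settings of Table 5.
    for place, bits in memory_per_placement(52, 1048, 32, 2, 4, diff_delay=32).items():
        print(f"{place:>22}: {bits} bits")
```

With max(Δ) = 1048 PDM samples (the value quoted later in this section) and the CIC settings of Table 5, the sketch gives about 54 kbit before the CIC filter, 70 kbit after it, 55 kbit after the moving average filter, and 27 kbit after the FIR filter. These are illustrative estimates that reproduce the ordering discussed above, not the values plotted in Figure 9.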
The memory requirements depicted in Figure 9 have been calculated for a beamforming stage designed to support a variable NMics. The input signals are grouped following their subarray structure. Every microphone m is associated with a memory, which delays that particular audio stream by an amount Δm. Each delay memory belonging to a subarray has the same width and length to support all possible orientations. The length is defined by the maximum delay ($\max(\Delta_{m_i})$) of that subarray i, which is determined by the planar distribution of the MEMS microphones and by FS. All memories associated with the same subarray can be disabled together. Therefore, instead of implementing one single Delay-and-Sum beamformer for a 52-element microphone array, four Delay-and-Sum beamforming operations run in parallel for the subarrays composed of 4, 8, 16, and 24 microphones. Their sum operation is first done locally for each subarray and afterwards between subarrays. The only restriction of this modular beamforming is the synchronization of the outputs so that they are properly delayed. The easiest solution is therefore to delay all the subarrays by the maximum delay of all subarrays ($\max_i(\max(\Delta_{m_i}))$). Although the output of some subarrays is already properly delayed, additional delays, shown in the Sums part of Figure 7, are inserted to ensure the proper delay for each subarray. This is achieved by using the valid output signals of each subarray beamformer, without additional resource cost. Consequently, only the Delay-and-Sum beamforming module linked to an active subarray is enabled. The nonactive beamformers are set to zero in order to avoid any negative impact on the beamforming operation.

Figure 7: Details of the internal structure of the beamforming stage performing the Delay-and-Sum beamforming technique. Note that the delay values are stored in a precomputed table. (Blocks: delay memories for microphones 1 to 4, 5 to 12, 13 to 28, and 29 to 52, grouped into subarrays 1 to 4 and summed locally; subarray delay memories and final sums; precomputed delays per orientation.)

Figure 8: Explored locations of the Delay-and-Sum-based beamformer (grey boxes) detailed in Figure 7. (Filter chain from PDM input to PCM output: NCIC-th-order CIC decimation filter (DCIC), moving average filter, and (DCIC−1)th-order low-pass FIR filter (DFIR); beamformer locations: before the CIC filter, after the CIC filter, after the moving average filter, and after the FIR filter.)

Figure 9: Memory consumption based on FS (a) and Fmax (b) for the supported values of DCIC. The memory requirements strongly depend on the position of the beamforming stage in the architecture (Figure 8). (Panels: memory requirements in bits versus FS (a) and versus Fmax (b), for the beamformer placed before the CIC filter, after the CIC filter, after the moving average filter, and after the FIR filter.)
A side benefit of this modular approach is reduced memory consumption. Figure 10 shows the memory savings for the supported configurations of the filter chain explored in the previous section. Since each subarray has its ring-buffer memory dimensioned to its own maximum sample delay, the proportion of underused memory regions is significantly lower. For the filter chain parameters under evaluation, the memory savings range between 19% and 23%, depending on the placement of the beamforming stage in the architecture. A nearly constant memory saving of around 21% is obtained when the beamforming stage is located between the microphone array and the filter chains. The largest variation occurs when the beamforming stage is located at the end of the filter chains, because the memory demands are then more sensitive to small differences in the maximum delay values. For instance, whereas in the first case max(Δ) is approximately 1048, its value is reduced to 16 when the beamforming stage is located after the filter chain. The modular approach of the beamforming stage not only increases the flexibility of the architecture by supporting a variable number of microphones at runtime but also significantly reduces the memory requirements.
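The source of these savings can be illustrated with a short Python sketch comparing a monolithic beamformer, where every delay memory is sized to the global maximum delay, against the per-subarray sizing described above. The word width is identical for all memories and cancels out of the ratio. The subarray sizes (4, 8, 16, 24) come from the text; the per-subarray maximum delays in the example are hypothetical.

```python
def uniform_memory(subarray_sizes, subarray_max_delays):
    """Monolithic beamformer: every delay memory sized to the global maximum delay."""
    return sum(subarray_sizes) * max(subarray_max_delays)


def modular_memory(subarray_sizes, subarray_max_delays):
    """Modular beamformer: each subarray's memories sized to its own maximum delay."""
    return sum(n * d for n, d in zip(subarray_sizes, subarray_max_delays))


def memory_savings(subarray_sizes, subarray_max_delays):
    """Fraction of delay-memory words saved by the subarray decomposition."""
    full = uniform_memory(subarray_sizes, subarray_max_delays)
    return 1.0 - modular_memory(subarray_sizes, subarray_max_delays) / full


def alignment_padding(subarray_max_delays):
    """Extra delay per subarray so that all partial sums line up before the final sum."""
    longest = max(subarray_max_delays)
    return [longest - d for d in subarray_max_delays]


if __name__ == "__main__":
    sizes = [4, 8, 16, 24]            # microphones per subarray (from the text)
    delays = [250, 500, 750, 1000]    # hypothetical max delays in samples
    print(f"savings: {memory_savings(sizes, delays):.1%}")  # savings: 21.2%
    print(alignment_padding(delays))  # [750, 500, 250, 0]
```

The padding values correspond to the extra alignment delays applied in the Sums part of Figure 7; in the described architecture these are realized through the valid signals rather than through additional memory.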
Figure 10: Memory savings as a result of decomposing the beamforming stage into subarrays. (Memory savings in % versus Fmax for the beamformer placed before the CIC filter, after the CIC filter, after the moving average filter, and after the FIR filter.)

3.2.4. Power Stage. The Delay-and-Sum beamforming technique makes it possible to obtain the relative sound power of the retrieved audio stream for each steering direction. Computing the SRP in each steering direction provides information about the power response of the array. The power value per steering direction is obtained by accumulating all the individual power values measured during a sensing time ts, which must be long enough to detect and locate sound sources under low SNR conditions. All the power values of one steering loop form the P-SRP. The peaks identified in the P-SRP point to the potential presence of sound sources.

Figure 11 shows the components of the power stage. Once the filtered data have been properly delayed and added, the SRP can be obtained for a particular orientation θ. The P-SRP is obtained after a steering loop, allowing the sound sources to be located. The sound source is estimated to be located in the direction of the maximum SRP peak.
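The power stage and the DoA decision can be sketched in Python as follows. The sketch assumes that the beamformed samples of one orientation are available as a list, that the 1/Nam scaling of Figure 11 is applied after accumulation, and that the No orientations are equally spaced over 360°; these are illustrative choices, not the authors' hardware implementation.

```python
def steered_power(beamformed, n_active_mics):
    """SRP for one orientation: accumulate |y[n]|^2 over the sensing window,
    then scale by 1/Nam (the position of the scaling is an assumption)."""
    acc = 0.0
    for y in beamformed:  # software equivalent of the adder with Z^-1 feedback
        acc += abs(y) ** 2
    return acc / n_active_mics


def estimate_doa(p_srp):
    """Return (index, angle in degrees) of the P-SRP maximum over a 360° sweep."""
    n_orientations = len(p_srp)
    best = max(range(n_orientations), key=lambda k: p_srp[k])
    return best, 360.0 * best / n_orientations


if __name__ == "__main__":
    # Hypothetical beamformed outputs for No = 4 orientations, Nam = 4.
    outputs = [[0.1, -0.2], [0.5, 0.4], [2.0, -1.5], [0.3, 0.2]]
    p_srp = [steered_power(y, 4) for y in outputs]
    print(estimate_doa(p_srp))  # (2, 180.0)
```

Only the maximum is reported here; detecting several sources would require a peak search over the P-SRP instead of a single argmax.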
3.2.5. Summary. The proposed design considerations and
their impact on the architecture are summarized in Table 3.
Notice, however, that such design considerations can be
individually applied to each stage.
4. Evaluation of Reconfigurable Acoustic
Beamforming Architectures
The selection of the design parameters determines the
characteristics of the reconfigurable acoustic beamforming
architecture. The speed of the architecture, the frequency
response, and the accuracy of the sound localization are some
of the features used to evaluate and compare designs. The
following metrics are used to evaluate the reconfigurable
architectures embedding time-domain beamformers for
sound location:
(1) Acoustic response
(a) Frequency response
(b) Directivity
(2) Architecture’s characteristics
(a) Timing and performance
(b) Resource and power consumption
4.1. Evaluation of the Acoustic Response. The first two metrics
are used to determine the quality of the sound localization
and use the array’s characteristics to profile the overall
response of the selected architecture. The directional power
output of a microphone array shows the directional response
of the architecture to all sound sources present in a sound
field. The lobes of this polar map can then be used to estimate
the bearing of nearby sound sources in nondiffuse sound field
conditions. The defined P-SRP allows the estimation of the
DoA of multiple sound sources under different sound field
conditions. The accuracy of its estimation can be determined
by the following quality metrics.
Frequency response: the evaluation of the frequency
response of the reconfigurable acoustic beamforming archi-
tecture for different sound source frequencies is needed.
The beam pattern of the microphone array can be represented as a waterfall diagram. Such a diagram shows the power output of the sound locator in all directions for all frequencies, which demonstrates how DP varies with multiple orientations and frequencies. Waterfall diagrams allow the evaluation of the frequency response of different reconfigurable acoustic beamforming architectures for certain beamed orientations. The resolution of the waterfall diagram can be increased by reducing the frequency steps or by increasing the number of steered angles.
Directivity: the P-SRP’s lobes are used to estimate the
bearing of nearby sound sources in nondiffuse sound field
conditions. The capacity of the main lobe to unambiguously
point to a specific bearing when considering the scenario of a
single sound source determines the architecture’s directivity
(DP). This definition of DP was originally proposed in [34] for broadband signals, where DP is used as a metric of the quality of the architecture as a sound locator, since DP depends on the main lobe shape and its capacity to unambiguously point to a specific bearing. DP is a key metric when locating sound sources because it reflects how effectively the architecture discriminates the direction of a sound source. The definition of directivity presented in [34, 35] is adapted for 2D polar coordinates [22] as follows:
$$D_P(\theta, \omega) = \frac{P(\theta, \omega)^2}{\frac{1}{2\pi}\int_0^{2\pi} P(\theta, \omega)^2 \, d\theta} \quad (3)$$
where P(θ, ω) represents the output power of the microphone array when pointing to the sound source's direction θ and $\frac{1}{2\pi}\int_0^{2\pi} P(\theta, \omega)^2 \, d\theta$ is the average output power in all other directions. DP can be expressed as the ratio between the area of a circle whose radius is the maximum power of the array and the total area enclosed by the power output. Therefore, DP defines the quality of the sound locator and can be used to specify certain thresholds for the architecture. For instance, if DP equals 8, the area enclosed by the power output is eight times smaller than that of the unit circle, and the main lobe offers a trustworthy estimation of a sound source within half a quadrant.
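A discrete counterpart of equation (3) follows directly when the P-SRP is sampled at No equally spaced orientations: the integral average becomes the mean of P² over the samples, and the numerator uses the P-SRP maximum as the source direction. This Python sketch makes that approximation explicit; it is an illustration, not the authors' evaluation code.

```python
def directivity(p_srp):
    """Discrete form of equation (3) for a P-SRP sampled at No equally
    spaced orientations: DP = max(P)^2 / mean(P^2).

    The mean over uniform angular samples approximates
    (1 / 2pi) * integral over [0, 2pi] of P(theta)^2 d(theta).
    """
    peak = max(p_srp)
    mean_sq = sum(p * p for p in p_srp) / len(p_srp)
    return peak * peak / mean_sq


if __name__ == "__main__":
    print(directivity([1.0] * 8))            # 1.0: flat response, no directivity
    print(directivity([1.0] + [0.0] * 7))    # 8.0: all power in one of 8 sectors
```

A P-SRP with a single non-zero sample gives DP = No, while a flat P-SRP gives DP = 1; with eight sectors of 45°, the second example reaches the threshold DP = 8 discussed above.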
4.2. Evaluation of the Architecture’s Characteristics. An eval-
uation of the reconfigurable acoustic beamforming architec-
ture must cover not only the quality of the sound location
but also other implementation-related parameters like the
achievable performance and the power consumption.
Time and performance analysis: a proper timing analysis helps to identify performance bottlenecks and to tune the architecture towards lower latency. The time needed by the microphone array to compute the P-SRP (tP−SRP) can be determined by decomposing the execution time of the architectural stages. A proper implementation of the stages, and especially of their data flow, can significantly reduce this time. For instance, tP−SRP decreases if the architecture is designed to pipeline the operations of each stage within a steered orientation, allowing the execution of the architecture's components to overlap. A detailed analysis of the implementation of each component and its latency provides good insight into the speed of the system. On the other hand, a performance analysis of a reconfigurable acoustic beamforming architecture indicates which design parameters have the largest impact on performance. The performance units can be defined at different levels. The number of processed audio samples per second reflects the reusability of the acquired data: during the beamforming operation, the same audio sample can be used to calculate the SRP for multiple orientations. Another performance unit could be the number of beamed orientations per second (Or/s). This type of unit better reflects the achievable performance of reconfigurable acoustic beamforming architectures and facilitates their comparison in terms of performance.

Figure 11: The power stage consists of a couple of components to calculate P-SRP, used to estimate the location of the acoustic source. (Components: squared magnitude |·|², an accumulator (adder with Z⁻¹ feedback), and a 1/Nam scaling, producing the steered response power.)
Resource and power consumption: further analysis of the power and resource consumption of the architecture is needed to satisfy the architecture's target priorities. For instance, the streaming nature of acoustic beamforming applications, with a continuous flow of incoming data, requires a large amount of memory to store the intermediate results of the signal processing operations. As analysed in the previous section, decomposing the beamforming stage into subarrays reduces the consumption of internal memory. However, it also affects the power consumption and might ultimately determine which FPGA can be used.
The metrics described above are used in the next section
to evaluate a power-efficient architecture. Different perfor-
mance strategies are proposed to increase the performance
of this architecture.
5. Power-Efficient Reconfigurable Architecture
Current low-end FPGAs offer enough resources to embed power-efficient reconfigurable acoustic beamforming architectures such as the one described and analysed in this section. The presented architecture, first presented in [8], drastically reduces resource consumption, making it suitable for low-end Flash-based FPGAs. This type of FPGA has a power consumption as low as a few tens of mW but offers limited resources. Such resource restrictions drastically reduce the achievable performance if the architecture's characteristics are not properly exploited. Here, different performance strategies are applied to accelerate this architecture, making it more attractive for time-sensitive applications.
Figure 12 depicts the main components of the power-efficient architecture. The input rate is determined by the microphone's clock and corresponds to FS. The architecture is designed to operate in streaming mode, which guarantees that each component is always computing after an initial latency.

Table 3: Summary of the design considerations per stage.
Stage | Design consideration | Effect
Microphone array | PDM MEMS microphones | Deactivation of microphones.
Microphone array | Subarray decomposition | Power savings.
Filter stage | Moving average filter | Resource savings. Improved dynamic range.
Filter stage | Serial FIR filter | Resource savings.
Beamforming stage | Subarray decomposition | Power and resource savings.

Figure 12: Overview of the proposed power-efficient architecture. The Delay-and-Sum beamforming is composed of several memories to properly delay the input signal. Our implementation groups the memories associated with each subarray to disable those memories linked to deactivated microphones. The beamformed input signal is converted to audio in the cascade of filters. The DoA is finally obtained based on the SRP obtained per orientation. (Blocks: microphone array (PDM MIC1 to PDM MIC52), PDM splitter, beamforming stage with delays-and-decimations for subarrays 1 to 4, subarray delay memories, sums and precomputed delays per orientation, filter stage (NCIC-th-order CIC decimation filter, moving average filter, (DCIC−1)th-order low-pass FIR filter), power stage producing the P-SRP, and a configuration control unit.)
The oversampled PDM signal coming from the microphones is multiplexed per microphone pair, requiring a PDM splitter block to demultiplex the input PDM signal into two separate PDM channels. The PDM streams from each microphone of the array are then delayed in the beamforming stage to perform the Delay-and-Sum beamforming operation. The beamforming stage is followed by the filter stage, where the high-frequency noise is removed and the input signal is downsampled to retrieve the audio signal. Notice that, because the beamforming stage is placed before the filter stage, the filter stage is composed of only one filter chain like the one described in Section 3.2.2 instead of NMics filter chains. The SRP for the beamed orientation is calculated in the last stage. The lobes of the P-SRP are used to estimate the DoA for the localization of the sound sources.
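Why a single filter chain suffices can be checked numerically. In exact arithmetic, every block of the chain (CIC, moving average, FIR, and decimation) is linear, so filtering the sum of the delayed PDM streams equals summing the individually filtered streams. The Python sketch below uses a small integer FIR as a stand-in for the whole chain. The splitter assumes that the two microphones of a pair alternate sample by sample on the shared data line; this is an assumption about the interface, not a detail given in the text.

```python
def split_pdm_pair(line_bits):
    """Hypothetical PDM splitter: assumes the two microphones of a pair
    alternate sample by sample on the shared data line."""
    return line_bits[0::2], line_bits[1::2]


def to_bipolar(bits):
    """Map PDM bits 0/1 to -1/+1."""
    return [2 * b - 1 for b in bits]


def delay(x, d):
    """Delay a sequence by d samples (zero-filled), keeping its length."""
    return [0] * d + x[:len(x) - d]


def fir(x, h):
    """Causal FIR filter, standing in for the whole linear filter chain."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def add(a, b):
    """Element-wise sum of two equally long sequences."""
    return [u + v for u, v in zip(a, b)]


if __name__ == "__main__":
    line = [1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0]
    mic_a, mic_b = (to_bipolar(s) for s in split_pdm_pair(line))
    h = [1, 2, 3, 2, 1]
    # Beamform first (delay-and-sum on PDM samples), then one filter chain.
    one_chain = fir(add(delay(mic_a, 1), delay(mic_b, 0)), h)
    # Filter each microphone separately, then beamform.
    per_mic = add(fir(delay(mic_a, 1), h), fir(delay(mic_b, 0), h))
    print(one_chain == per_mic)  # True: linearity lets one chain serve all mics
```

In hardware, fixed-point truncation makes the equivalence approximate rather than exact, which is one reason the bit widths of the chain are sized generously.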
5.1. Architecture Performance Exploration. The architecture is designed to satisfy power-constrained acoustic beamforming applications. Multiple performance strategies can be applied to increase performance while preserving power efficiency. Such strategies minimize the timing impact of the signal processing operations of the architecture.
The execution time (tP−SRP) is defined as the time needed to obtain the P-SRP. Each steered orientation involves multiple signal processing operations that can be executed concurrently in a pipelined way. Therefore, the times to filter (tFiltering), to beamform (tBeamforming), and to obtain the SRP (tPower) overlap with the sensing time (ts). Although most of the latency of each component of the design is hidden when pipelining operations, some cycles, defined as the Initiation Interval (II), are still dedicated to initializing the components. The proposed architecture also demands additional time to reset the filters (tr) at the end of the computation of each orientation. This relatively low value of tr can be neglected because only a few clock cycles are needed to reset the filters. As detailed in Figure 13, tP−SRP for a certain No can be determined by

$$t_{P\text{-}SRP} = N_o \cdot (t_{II} + t_r + t_s) \approx N_o \cdot (t_{II} + t_s) \quad (4)$$

where tII corresponds to the sum of the II of the filter stage ($t^{Filtering}_{II}$), the II of the beamforming stage ($t^{Beamforming}_{II}$), and the II of the SRP stage ($t^{Power}_{II}$).
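Equation (4) can be evaluated with a short Python sketch. The time values in the example are hypothetical and expressed in microseconds so that integer arithmetic stays exact; the Or/s helper implements the performance unit introduced in Section 4.2.

```python
def t_psrp_baseline(n_orient, t_ii_filter, t_ii_beam, t_ii_power, t_s, t_r=0):
    """Equation (4): every orientation pays all initiation intervals,
    the sensing time, and the filter reset (t_r is usually neglected)."""
    t_ii = t_ii_filter + t_ii_beam + t_ii_power
    return n_orient * (t_ii + t_r + t_s)


def orientations_per_second(n_orient, t_psrp_s):
    """Performance expressed as beamed orientations per second (Or/s)."""
    return n_orient / t_psrp_s


if __name__ == "__main__":
    # Hypothetical IIs and sensing time in microseconds, No = 64.
    t_us = t_psrp_baseline(64, t_ii_filter=30, t_ii_beam=500, t_ii_power=2, t_s=10000)
    print(t_us)  # 674048
    print(f"{orientations_per_second(64, t_us * 1e-6):.2f} Or/s")
```

Because the IIs are paid once per orientation, any II that can be removed from the per-orientation schedule scales the savings by No.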
The power-efficient architecture presents several limitations when considering performance strategies. For instance, due to the architecture's characteristics, the strategies proposed in [10] cannot be applied without a significant increase in resource consumption. New performance strategies are proposed here to overcome these limitations.
5.1.1. Continuous Beamforming. The computation of the P-SRP involves a reinitialization of the beamformer per beamed orientation. Such initialization can be drastically reduced if the architecture continuously beamforms the acquired data. Because the beamforming stage is mainly composed of delay memories, the data required to start the computation of the SRP for a new orientation have already been stored when computing the previous orientation. Therefore, a single initialization is needed at the very beginning, as detailed in Figure 14. The value of to becomes

$$t_o = t^{Filtering}_{II} + t^{Power}_{II} + t_r + t_s \quad (5)$$

since $t^{Beamforming}_{II}$ only needs to be considered when the system starts. In this regard, tP−SRP becomes

$$t_{P\text{-}SRP} = N_o \cdot t_o \approx N_o \cdot (t^{Filtering}_{II} + t_s) \quad (6)$$

where $t^{Power}_{II}$ and tr are neglected.

The performance when using this strategy is defined as follows:

$$\text{Performance} = \frac{1}{t_o} \approx \frac{1}{t^{Filtering}_{II} + t_s} \quad (7)$$
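The continuous-beamforming schedule of equations (5) to (7) can be sketched in Python as below. Unlike equation (6), the sketch keeps the one-time beamforming II at start-up so that the comparison with the baseline schedule is explicit; the times in the example are hypothetical and given in microseconds.

```python
def t_psrp_continuous(n_orient, t_ii_filter, t_s, t_ii_beam=0, t_ii_power=0, t_r=0):
    """Equations (5)-(6): the beamforming II is paid once at start-up; each
    orientation then costs t_o = t_II^Filtering + t_II^Power + t_r + t_s."""
    t_o = t_ii_filter + t_ii_power + t_r + t_s
    return t_ii_beam + n_orient * t_o


def performance_continuous(t_ii_filter, t_s):
    """Equation (7): orientations per unit time, neglecting t_II^Power and t_r."""
    return 1.0 / (t_ii_filter + t_s)


if __name__ == "__main__":
    # Hypothetical times in microseconds, No = 64.
    print(t_psrp_continuous(64, 30, 10000, t_ii_beam=500, t_ii_power=2))  # 642548
    # Baseline of equation (4) for comparison: 64 * (30 + 500 + 2 + 10000).
    print(64 * (30 + 500 + 2 + 10000))  # 674048
```

With these hypothetical numbers the gain is about 5%, because ts dominates each orientation; the strategy pays off most when the beamforming II is large compared with ts.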
Figure 13: Detailed schedule of the operations without any performance strategy. (For each orientation, from orientation 1 to orientation 64: the IIs of the beamforming, filter, and power stages (tII), the sensing time ts, and the filter reset tr; to denotes the time per orientation and tP-SRP the total time.)

Figure 14: Detailed schedule of the operations when continuously beamforming. (The II of the beamforming stage occurs only once, before orientation 1; each orientation then comprises the IIs of the filter and power stages, ts, and tr.)

5.1.2. Parallel Continuous Time Multiplexing. Simply increasing the functional clock frequency beyond FS is not enough to improve performance. Unfortunately, accelerating the filter stage by increasing its operational frequency demands the prestorage of all the samples acquired during ts. Although such storage can take place at the beamforming stage, this component would then greatly increase its resource consumption to store $N_S \cdot D_F / F_S$ samples per microphone.
Because of the nature of the filtering operation, switching between different orientations requires storing the intermediate values held in the multiple taps of the filter structure. Such storage would be needed for each intermediate register of each filter and for each orientation. The impact of this resource overhead would be similar to the cost of replicating each filter chain per orientation. This strategy causes a significant increase in resource consumption because the filter stage is located after the beamforming stage. The solution is to replicate the filter stage while increasing its operational frequency to FP. Figure 15 details how the architecture would perform when multiple filter chains, as many as No, are available. Two clock regions are defined, since the filter chains must operate at a higher frequency in order to retrieve the beamformed data from the beamforming stage (Figure 16). The incoming data from the microphone array enter the beamforming stage at the FS rate. To process No orientations in parallel within one clock cycle at FS, the beamforming stage needs to generate data at a frequency FP satisfying

$$F_P \ge \frac{F_S \cdot N_{FStages}}{N_B} \quad (8)$$

where NFStages is the number of filter chains available in the filter stage and NB is the number of beamformed values out of the beamforming stage accessible per clock cycle. The value of NB is defined as

$$N_B = N_{BStages} \cdot M_{ports} \quad (9)$$

with NBStages the number of beamforming stages, if the available resources support more than one, and Mports the number of memory ports of each beamforming memory. For instance, dual-port memories allow two reads per memory access, which results in two beamformed output values per clock cycle if the sums performed in the beamforming stage are duplicated. The use of dual-port memories is equivalent to duplicating a beamforming stage composed of single-port memories: in both cases, two beamformed values can be loaded from each microphone delay memory. This strategy, however, does not exploit the remaining resources to instantiate multiple beamforming stages; therefore, NB is assumed to be 1 for this strategy, since single-port memories are considered for the beamforming stage.

The value of NFStages is not only determined by the available resources but also depends on No and on the maximum operational frequency Fop. The number of orientations that can be computed in parallel when increasing the operational frequency of the filter stage to Fop defines the number of supported filter stages NF as

$$N_F = \frac{F_{op} \cdot N_B}{F_S} \quad (10)$$
Therefore, NFStages becomes

$$N_{FStages} = \min(N_o, N_R, N_F) \quad (11)$$

where NR is the supported number of filter chains determined by the available resources. Notice that NFStages can be limited to No when processing more orientations in parallel brings no additional benefit; No itself is determined by the target acoustic application.

Figure 15: Possible schedule of the operations computing in parallel. (Orientations 1 to 64 processed concurrently, with the IIs of the beamforming, filter, and power stages, the sensing time ts, and a final reset tr within tP-SRP.)

Figure 16: Clock regions for the time multiplexing of the computation of multiple No. (Blocks: microphone array, beamforming stage, filter stage, and power stage producing the P-SRP; two clock regions, FS and FP.)
With this strategy, the time to compute one orientation is reduced by a factor of NFStages:

$$t_o = \frac{t^{Filtering}_{II} + t_r + t_s}{N_{FStages}} \quad (12)$$

and the value of tP−SRP becomes

$$t_{P\text{-}SRP} = N_o \cdot t_o \approx \frac{N_o}{N_{FStages}} \cdot (t^{Filtering}_{II} + t_s) \quad (13)$$

where $t^{Power}_{II}$ and tr are neglected. Regarding the achievable performance,

$$\text{Performance} = \frac{N_{FStages}}{t^{Filtering}_{II} + t_s} \quad (14)$$
The cost, however, is an increase in resource and power consumption. The extra resources are dedicated to the NFStages filter chains, or even to additional beamforming stages if NFStages is limited by NF, in order to compute fully in parallel.
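Equations (8), (10), (11), and (13) can be combined into a small Python sketch for sizing this strategy. Flooring NF to an integer, the 50 MHz Fop, and the resource limit NR = 10 in the example are hypothetical assumptions; FS = 2.08 MHz is the value of Table 5, and NB = 1 follows the single-port assumption stated above.

```python
def n_filter_stages(n_orient, n_resource_limit, f_op_hz, f_s_hz, n_b=1):
    """Equations (10)-(11): N_F = F_op * N_B / F_S (floored, assumption),
    N_FStages = min(No, N_R, N_F)."""
    n_f = int(f_op_hz * n_b // f_s_hz)
    return min(n_orient, n_resource_limit, n_f)


def required_fp(f_s_hz, n_fstages, n_b=1):
    """Equation (8): minimum clock frequency F_P of the filter clock region."""
    return f_s_hz * n_fstages / n_b


def t_psrp_parallel(n_orient, n_fstages, t_ii_filter, t_s):
    """Equation (13): No / N_FStages * (t_II^Filtering + t_s)."""
    return n_orient / n_fstages * (t_ii_filter + t_s)


if __name__ == "__main__":
    fs = 2.08e6  # Table 5
    stages = n_filter_stages(n_orient=64, n_resource_limit=10, f_op_hz=50e6, f_s_hz=fs)
    print(stages)                            # 10: limited by resources (N_F = 24)
    print(required_fp(fs, stages) / 1e6)     # F_P of at least about 20.8 MHz
    print(t_psrp_parallel(64, stages, 30, 10000))  # hypothetical times in microseconds
```

In this example the resources, not the clock, bound the parallelism; raising NR beyond 24 would make the maximum operational frequency the limiting factor instead.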
The strategies to improve performance are summarized in Table 4. Notice that their impact is limited, since the main goal of the power-efficient architecture is to reduce the resource consumption and, as a result, the overall power consumption.
5.2. Experimental Results. The design parameters of the architecture under evaluation are summarized in Table 5. Variations in the target Fmax and in FS directly affect both the beamforming stage, by determining the length of the memories, and the filter stage, by determining the decimation factor and the FIR filter order. Moreover, the impact of Nam, which changes at runtime thanks to the subarray distribution, is analysed. As in the evaluation of the previous architecture, P-SRP is obtained from a steering loop composed of 64 orientations. The power-efficient architecture has been evaluated on a Microsemi SmartFusion2 M2S025 FPGA.
5.2.1. Frequency Response. The frequency response of the
microphone array is determined by Nam. The experiments
cover four configurations with 4, 12, 28, or 52 microphones
determined by the number of active subarrays. The waterfall
diagram of each configuration is generated in order to ana-
lyse the frequency response while locating sound sources.
The waterfall diagrams show the power output of the com-
bined subarrays in all directions for all frequencies. The
results are calculated with a single sound source placed at
180°. The frequency of the sound source varies between
100Hz and 15 kHz in steps of 100Hz. All results are normal-
ized per frequency.
Figure 17 depicts the waterfall diagrams when combining different numbers of subarrays. Every waterfall shows a clearly distinguishable main lobe. However, this lobe is most dominant when subarrays 3 and 4 are also capturing sound waves. When only subarray 1 is active, the side lobes hinder the identification of the main lobe. The frequency response of the subarrays improves when they are combined, since their frequency responses are superposed. Consequently, the combination of subarrays 1 and 2 reaches a minimum detectable frequency of 2.4 kHz, whereas the combination of subarrays 1, 2, and 3 and the combination of all subarrays reach 2.2 kHz and 1.8 kHz, respectively.

Table 4: Summary of the performance strategies for the described power-efficient architecture.
Strategy | Performance | Execution time of one P-SRP (tP−SRP)
None | $\frac{1}{t_{II} + t_s}$ | $N_o \cdot (t_{II} + t_s)$
Continuous beamforming | $\frac{1}{t^{Filtering}_{II} + t_s}$ | $N_o \cdot (t^{Filtering}_{II} + t_s)$
Parallel continuous time multiplexing | $\frac{N_{FStages}}{t^{Filtering}_{II} + t_s}$ with $N_{FStages} = \min(N_R, N_F)$ | $\frac{N_o}{N_{FStages}} \cdot (t^{Filtering}_{II} + t_s)$ with $N_{FStages} = \min(N_o, N_R, N_F)$

Table 5: Configuration of the architecture under analysis.
Parameter | Definition | Value
FS | Sampling frequency | 2.08 MHz
Fmin | Minimum frequency | 1 kHz
Fmax | Maximum frequency | 16.250 kHz
BW | Minimum bandwidth to satisfy Nyquist | 32.5 kHz
DF | Decimation factor | 64
DCIC | CIC filter decimation factor | 32
NCIC | Order of the CIC filter | 4
D | CIC differential delay | 32
DFIR | FIR filter decimation factor | 2
NFIR | Order of the FIR filter | 31
5.2.2. Directivity. The standalone waterfall diagrams only provide information about the frequency response and cannot be considered a metric of the quality of the sound source location. Alongside the waterfalls, DP is calculated to properly evaluate the quality of the array's response. The evaluation covers variable No and Nam. A low angular resolution leads to a lower resolution of the waterfall diagrams, but only the metrics can quantify its impact. The frequency response of a subarray varies strongly at the main lobe and, therefore, so does DP. A threshold of 8 for DP indicates that the main lobe's surface corresponds to at most half of a quadrant. Figure 18 depicts the evolution of DP over our frequency range when increasing the angular resolution and when combining subarrays. The angular resolution determines the upper bound to which DP, defined in equation (3), converges; this bound coincides with the number of orientations. Notice that a higher angular resolution does not improve DP when only the inner subarray is active. The value of Nam, on the other hand, determines how fast DP converges to its upper limit as a function of the sound source frequency. Thus, a higher value of Nam increases DP at lower sound source frequencies. For instance, when only subarray 1 is used, DP shows better results only at frequencies above 6.9 kHz. This frequency decreases to approximately 1.7 kHz when all microphones are used in the beamforming.
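Equation (3) is not repeated in this section, so the sketch below assumes a discrete form that has the properties stated above: DP = No·max(P)/ΣP over the No steered orientations. Under that assumption, DP equals No when all power falls in a single orientation (the upper bound) and equals 1 for a flat response. A main lobe spanning one eighth of the circle, that is, half a quadrant, gives DP = 8. The function name directivity is ours, and the paper's exact expression may differ, for example by squaring the power.

```python
import numpy as np


def directivity(power) -> float:
    """Discrete directivity of a steered-response-power map over N_o orientations.

    Assumed form: D_P = N_o * max(P) / sum(P). It equals N_o when all power is in
    one orientation and 1 for a perfectly flat response.
    """
    p = np.asarray(power, dtype=float)
    if p.ndim != 1 or p.size == 0 or np.any(p < 0) or p.sum() == 0:
        raise ValueError("power must be a non-empty 1-D array of non-negative values, not all zero")
    return float(p.size * p.max() / p.sum())


n_o = 64
single = np.zeros(n_o)
single[32] = 1.0                 # all power in the 180° orientation
flat = np.ones(n_o)              # no directional information
half_quadrant = np.zeros(n_o)
half_quadrant[28:36] = 1.0       # lobe over 8 of 64 orientations (45°)
print(directivity(single), directivity(flat), directivity(half_quadrant))
```

Under this form, the bound DP ≤ No explains why the curves for 8 orientations in Figure 18 cannot exceed 8.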
The evaluation of the frequency response of this architecture shows that the angular resolution determines the quality of the array's response. This is reflected in DP, which is clearly limited when the number of orientations is reduced. Nam determines the sound source frequency at which a given threshold is achieved. Clearly, a higher value of Nam yields a better DP at a lower frequency. A higher number of orientations and of active microphones leads to other trade-offs: whereas the angular resolution also affects the performance, Nam determines the power consumption.
[Figure 17: four waterfall panels of normalized power (0 to 1) versus angle of arrival (0 to 350 degrees) and sound source frequency (0 to 15 kHz); (a) 4 inner mics, (b) 12 inner mics, (c) 28 inner mics, (d) all mics in use.]
Figure 17: Theoretical waterfall diagrams of the power-efficient architecture obtained for 64 orientations. The plots are obtained by enabling only a certain number of subarrays. (a–d) Only the 4 innermost microphones, only the 12 innermost microphones, only the 28 innermost microphones, and all microphones.
5.2.3. Resource Consumption. The relatively low resource requirements of the power-efficient architecture allow the use of small, low-power Flash-based FPGAs. Table 6 summarizes the resource consumption of the evaluated power-efficient architecture. The target SmartFusion2 M2S025 FPGA provides enough resources to allocate one instance of the architecture when all 52 MEMS microphones of the array are used. A higher number of subarrays mainly increases the resource consumption of the beamforming stage. Moreover, the most demanding resources are the dedicated memory blocks, reaching occupancy rates of 76.5% and 90.3% for the two types of memory resources available in this Microsemi FPGA family [36], uSRAM (RAM64x18) and LSRAM (RAM1K18), respectively. Using all subarrays requires VHDL attributes to distribute the delay memories of the beamforming stage between these memory resources, since the Libero 11.8 tool does not automatically allocate the beamforming memories into them. In fact, the delay values linked to the outer subarray, composed of 24 MEMS microphones, have to be entirely allocated in uSRAM blocks. The logic resources reach a maximum consumption of 26% of the available D-type Flip-Flops (DFFs). In this regard, some of the performance strategies detailed in Section 5.1 can benefit from the remaining logic resources.
[Figure 18: four panels of DP (0 to 70) versus sound source frequency (10^2 to 10^4 Hz) for the inner 4, 12, and 28 mics, all 52 mics, and the threshold; (a) 8, (b) 16, (c) 32, and (d) 64 orientations.]
Figure 18: Directivities of the power-efficient architecture when considering a variable number of orientations and active microphones. (a–d) DP with 8, 16, 32, and 64 orientations.
Table 6: Resource consumption after placement and routing when combining the two innermost subarrays and when all microphones are active. Each subarray combination details the resource consumption of the filter and the beamforming stage.
Resources | Available resources | Inner 4 mics (Beamformer / Filters / Total) | Inner 12 mics (Beamformer / Filters / Total) | Inner 28 mics (Beamformer / Filters / Total) | All 52 mics (Beamformer / Filters / Total)
4LUT | 27696 | 377 / 839 / 1626 | 955 / 839 / 2202 | 2345 / 839 / 3592 | 4303 / 836 / 5547
DFF | 27696 | 319 / 3858 / 4649 | 750 / 3860 / 5090 | 1680 / 3862 / 6038 | 2983 / 3862 / 7365
Interface logic | 27696 | 144 / 144 / 432 | 432 / 144 / 720 | 1008 / 144 / 1296 | 1872 / 144 / 2160
RAM64x18 | 34 | 0 / 2 / 2 | 0 / 2 / 2 | 0 / 2 / 2 | 24 / 2 / 26
RAM1K18 | 31 | 4 / 0 / 4 | 12 / 0 / 12 | 28 / 0 / 28 | 28 / 0 / 28
MACC | 34 | 0 / 2 / 6 | 0 / 2 / 6 | 0 / 2 / 6 | 0 / 2 / 6
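Table 6 also makes it possible to estimate how many filter chains fit in the remaining resources, which is what the parallel strategy of Section 5.1 needs. The sketch below is a back-of-the-envelope estimate, not a place-and-route result. It assumes that each extra chain costs exactly the Filters column of Table 6, and it ignores routing congestion and timing closure. The function names are ours. Under these assumptions, the estimate gives 6, 6, 6, and 5 chains for the four subarray combinations, which agrees with the parallel-strategy entries of Table 10. The occupancy helper reproduces the 76.5% and 90.3% memory figures quoted above.

```python
AVAILABLE = {"4LUT": 27696, "DFF": 27696, "Interface": 27696,
             "RAM64x18": 34, "RAM1K18": 31, "MACC": 34}


def row(lut, dff, interface, ram64, ram1k, macc):
    """One column of Table 6 as a resource -> count mapping."""
    return {"4LUT": lut, "DFF": dff, "Interface": interface,
            "RAM64x18": ram64, "RAM1K18": ram1k, "MACC": macc}


# Table 6: (Filters column, Total column) per number of active microphones.
TABLE6 = {
    4: (row(839, 3858, 144, 2, 0, 2), row(1626, 4649, 432, 2, 4, 6)),
    12: (row(839, 3860, 144, 2, 0, 2), row(2202, 5090, 720, 2, 12, 6)),
    28: (row(839, 3862, 144, 2, 0, 2), row(3592, 6038, 1296, 2, 28, 6)),
    52: (row(836, 3862, 144, 2, 0, 2), row(5547, 7365, 2160, 26, 28, 6)),
}


def max_filter_chains(filters, total, available=AVAILABLE):
    """Chains that fit: the one already placed plus extra copies in the free resources."""
    extra = min((available[r] - total[r]) // filters[r] for r in filters if filters[r] > 0)
    return 1 + extra


def occupancy_pct(total, available=AVAILABLE):
    """Occupancy of each resource type, in percent (one decimal)."""
    return {r: round(100.0 * total[r] / available[r], 1) for r in total}


chains = {n: max_filter_chains(f, t) for n, (f, t) in TABLE6.items()}
occ_all = occupancy_pct(TABLE6[52][1])
print(chains, occ_all)
```

In this estimate, DFFs bound the extra chains for 4, 12, and 28 microphones, whereas the uSRAM blocks (RAM64x18) become the bound once the outer subarray occupies 24 of them.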
5.2.4. Power Analysis. Flash-based FPGAs like Microsemi's IGLOO2, PolarFire, or SmartFusion2 not only offer the lowest static power consumption, demanding only a few tens of mW, but also support an interesting sleep mode called Flash-Freeze. The Flash-Freeze mode is a low-power static mode that preserves the FPGA configuration while reducing the FPGA's power draw to just 1.92 mW for IGLOO2 and SmartFusion2 FPGAs [37].
Table 7 summarizes the reported power consumption. The power consumption of the FPGA design has been obtained with the Libero SoC 11.8 Power tool, which reports both the dynamic and the static power consumption. Whereas the static power consumption depends on the used resources, the dynamic power consumption is determined by the computational usage of the resources and the dynamism of the input data. The static power remains mostly constant, since it does not increase significantly with the number of microphones. The dynamic power consumption, on the contrary, increases, since a larger amount of data must be stored and processed. The overall power consumption of the reconfigurable architecture ranges from 17.8 mW to 23.7 mW, which represents a significant reduction compared to architectures like the high-performance architecture in [9, 10], whose power consumption ranges from 122 mW to 138 mW. Furthermore, the low power consumption of the FPGA is partially possible thanks to operating at a relatively low frequency (2.08 MHz).
The power consumption analysis must also include the power consumption of the microphone array. For this analysis, the InvenSense ICS-41350 PDM MEMS microphones [38] operating in their standard mode are considered as the microphones composing the array. The power consumption detailed in Table 7 decreases when MEMS microphones of the array are deactivated, which is done by disabling their clock. Thanks to the flexibility of the reconfigurable architecture, Nam can be changed at runtime. For the current measurements, the MEMS microphones are powered at 1.8 V, which represents a power consumption per microphone of 21.6 μW and 774 μW for the inactive and active microphones, respectively. As a result, when all the microphones are active, the MEMS microphones consume almost twice as much power as the FPGA.
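The microphone terms of Table 7 follow from a simple per-microphone model. The sketch below uses helper names of our own. It charges 774 μW per active microphone and 21.6 μW per clock-gated one (774 μW is the figure that reproduces Table 7, e.g., 4 × 0.774 mW = 3.096 mW) and adds the on-chip totals reported by Libero. It is an accounting aid, not a power measurement.

```python
P_ACTIVE_MW = 0.774     # per active ICS-41350 at 1.8 V, standard mode (consistent with Table 7)
P_INACTIVE_MW = 0.0216  # per clock-gated (inactive) microphone
N_MICS = 52

# On-chip totals (static + dynamic) reported by Libero in Table 7, in mW.
FPGA_MW = {4: 17.781, 12: 20.164, 28: 22.752, 52: 23.738}


def microphone_power_mw(n_active: int, n_total: int = N_MICS) -> float:
    """Array power: active microphones plus the clock-gated remainder."""
    return n_active * P_ACTIVE_MW + (n_total - n_active) * P_INACTIVE_MW


def total_power_mw(n_active: int) -> float:
    """Microphone array plus reported FPGA power for one subarray combination."""
    return microphone_power_mw(n_active) + FPGA_MW[n_active]


totals = {n: round(total_power_mw(n), 3) for n in FPGA_MW}
mic_to_fpga = microphone_power_mw(52) / FPGA_MW[52]  # about 1.7
print(totals, round(mic_to_fpga, 2))
```

The totals match the last column of Table 7 to within rounding of the third decimal.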
5.2.5. Timing Analysis. Table 8 summarizes the parameters of the timing analysis. The value of tP−SRP equals 174 ms for the analysed architecture when no strategy is applied. Notice that around 22% corresponds to the initialization of the beamforming stage when switching orientation.
The timing results when applying the performance strategies proposed in Section 5.1 are summarized in Table 9. The first strategy reduces the impact of the initialization after transitions between orientations:
$$t_{P\text{-}SRP} = N_o \cdot \left(t_{II}^{Filtering} + t_s\right) = 142.1\ \text{ms}. \tag{15}$$
Further acceleration is only possible by increasing the resource consumption while operating the filter stage at a higher frequency. Based on the remaining resources of the SmartFusion2 M2S025 and on the resource consumption of the filter chains detailed in Table 6, up to 6 filter chains can be allocated in parallel. The maximum operating frequency of this architecture ranges from 93.11 MHz to 86.92 MHz when only the inner subarray or all subarrays are active, respectively. By operating at 86.92 MHz, and considering single-port memories, up to 43 filter stages can be accommodated. The supported number of filter chains, NFStages, is obtained from equation (11):
$$N_{FStages} = \min(N_o, N_R, N_F) = \min(64, 6, 43) = 6, \tag{16}$$
where NFStages is limited by the available resources. Therefore,
$$t_{P\text{-}SRP} = \frac{N_o}{\min(N_o, N_{FStages})} \cdot \left(t_{II}^{Filtering} + t_s\right) = 23.7\ \text{ms}, \tag{17}$$
and, in case all the microphones are active, the filter stage needs to operate at least at
$$F_P = F_S \cdot \frac{N_{FStages}}{N_B} = 12.48\ \text{MHz}. \tag{18}$$
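The clock-cycle entries of Table 8 and equations (15) to (18) can be reproduced from the Table 5 parameters. The sketch below is ours. Ns = 64 is inferred from ts = DCIC·DFIR·(Ns − 1) = 4032 cycles. The ceiling in the cascaded-sum term is assumed, since it yields the 12 cycles listed for Nam = 52. NB = 1 is assumed in equation (18), since that value gives 12.48 MHz.

```python
import math

FS = 2.08e6                          # sampling frequency (Hz), Table 5
N_CIC, D_CIC, D_FIR, N_FIR = 4, 32, 2, 31
N_S = 64                             # output samples per orientation (inferred from t_s = 4032 cc)
MAX_DELTA = 1023                     # max(delta) in samples at FS, Table 8


def filter_stage_cycles() -> int:
    """t_II^Filtering = t_II^CIC + t_II^DC + t_II^FIR, in clock cycles."""
    t_cic = 2 * N_CIC + 1
    t_dc = D_CIC + 2
    t_fir = D_CIC * ((N_FIR + 1) // 2 + 1)
    return t_cic + t_dc + t_fir


def beamforming_cycles(n_am: int) -> int:
    """t_II^Beamforming = max(delta) + 2 * ceil(log2(N_am)); the ceiling is assumed."""
    return MAX_DELTA + 2 * math.ceil(math.log2(n_am))


def sensing_cycles(n_s: int = N_S) -> int:
    """t_s = D_CIC * D_FIR * (N_s - 1), in clock cycles."""
    return D_CIC * D_FIR * (n_s - 1)


def t_psrp_ms(n_o: int = 64, n_fstages: int = 1) -> float:
    """Equation (15) for n_fstages = 1 and equation (17) otherwise, in ms."""
    per_orientation_s = (filter_stage_cycles() + sensing_cycles()) / FS
    return n_o / min(n_o, n_fstages) * per_orientation_s * 1e3


def filter_clock_hz(n_fstages: int, n_b: int = 1) -> float:
    """Equation (18): F_P = F_S * N_FStages / N_B (N_B = 1 assumed)."""
    return FS * n_fstages / n_b


print(filter_stage_cycles(), beamforming_cycles(52), sensing_cycles())
print(round(t_psrp_ms(), 1), round(t_psrp_ms(n_fstages=6), 1), filter_clock_hz(6) / 1e6)
```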
Table 7: Power consumption expressed in mW when combining microphone subarrays. The values are obtained from the Libero SoC v11.8 power report for the FPGA operating at FS = 2.08 MHz and considering the standard mode of the PDM MEMS microphones [38].
Active subarrays | MEMS mics, active | MEMS mics, inactive | MEMS mics, total | On-chip, static | On-chip, dynamic | On-chip, total | Total power
Inner 4 mics | 3.096 | 1.036 | 4.132 | 13.963 | 3.818 | 17.781 | 21.913
Inner 12 mics | 9.288 | 0.864 | 10.152 | 15.333 | 4.831 | 20.164 | 30.316
Inner 28 mics | 21.672 | 0.518 | 22.190 | 15.933 | 6.819 | 22.752 | 44.942
All 52 mics | 40.248 | 0 | 40.248 | 15.943 | 7.794 | 23.738 | 63.986
5.2.6. Performance Analysis. Table 10 summarizes the achievable performance for each performance strategy. Notice that, when no strategy is applied, the power-efficient architecture analysed here achieves a higher performance than the high-performance architecture in [9, 10]. The difference arises because the configuration of the power-efficient architecture under evaluation targets a different Fmax. The two Fmax values lead to different FS and DF, which directly affect tP−SRP.
The performance strategies are less effective for the power-efficient architecture. Although the initialization time is reduced when continuously acquiring data, it is still higher than that of the high-performance architecture described in [9, 10] because of the tFilteringII overhead.
The last strategy proposed in Section 5.1 fully exploits the limited resources. The achievable performance is several orders of magnitude lower than that of the high-performance architecture described in [9, 10]. This is mainly due to two factors. Firstly, the power-efficient architecture is evaluated on a Flash-based FPGA with fewer resources than the FPGA used to evaluate the architecture in [9, 10]. Secondly, the power-efficient architecture presents certain performance limitations which cannot be fully solved through the performance strategies.
The evaluation of the power-efficient architecture and the analysis done in [10], which target different application requirements and different types of FPGAs, demonstrate how the architectures and their performance strategies can be applied to different FPGA-based architectures.
5.3. Summary. The power-efficient architecture represents an alternative when power consumption is a priority. Due to its low resource demand, this architecture can be implemented on Flash-based FPGAs. These FPGAs are very power efficient, but they offer a limited number of resources. Reduced performance of the sound locator is the price to pay for the power efficiency. Although several strategies can certainly accelerate the architecture's response, the limited FPGA resources and the characteristics of the architecture bound further acceleration.
6. Comparison of the Reconfigurable Architectures
This section presents a comparison of the high-performance architecture described in [10] and the proposed power-efficient architecture detailed in Section 5. Both architectures are evaluated on a Zynq-based platform (Figure 19) for a fair comparison. Although the two architectures have different goals, their characteristics make both of them valid solutions for most acoustic applications, as long as these applications prioritize neither performance nor power efficiency. A final comparison against the state of the art of related architectures is also briefly presented at the end of this section.
Table 9: Performance analysis of the optimized designs when applying and combining the performance strategies. The values are expressed in ms.
 | Initial | Continuous | Parallel continuous time multiplexing
tP−SRP | 174 | 142.1 | 23.7
Table 8: Definition of the architecture's parameters involved in the timing analysis. Ns is the number of output samples and Nam is the number of active microphones.
Parameter | Definition | Equation | Value (clock cycles / FS)
$t_{II}^{CIC}$ | II of the CIC filter | $2 \cdot N_{CIC} + 1$ | $9/F_S$ = 4.33 μs
$t_{II}^{DC}$ | II of the DC removal | $D_{CIC} + 2$ | $34/F_S$ = 16.35 μs
$t_{II}^{FIR}$ | II of the FIR filter | $D_{CIC} \cdot \left(\frac{N_{FIR} + 1}{2} + 1\right)$ | $544/F_S$ = 261.5 μs
$t_{II}^{Delay}$ | II of the delay memories at $F_S$ | $\max(\Delta)$ | $1023/F_S$ = 491.83 μs
$t_{II}^{Sum}$ | II of the cascaded sums | $2 \cdot \lceil \log_2 N_{am} \rceil$ | $12/F_S$ = 5.77 μs
$t_s$ | Sensing time | $D_{CIC} \cdot D_{FIR} \cdot (N_s - 1)$ | $4032/F_S$ = 1.94 ms
$t_{II}^{Filtering}$ | II of the filter stage | $t_{II}^{CIC} + t_{II}^{DC} + t_{II}^{FIR}$ | $587/F_S$ = 282.1 μs
$t_{II}^{Beamforming}$ | II of the beamforming stage | $t_{II}^{Delay} + t_{II}^{Sum}$ | $1035/F_S$ = 497.59 μs
$t_{II}^{Power}$ | II of the SRP | $2$ | $2/F_S$ = 0.96 μs
$t_{II}$ | Sum of IIs | $t_{II}^{Beamforming} + t_{II}^{Filtering} + t_{II}^{Power}$ | $1199/F_S$ = 599.5 μs
$t_{P\text{-}SRP}$ | Time to obtain a complete P-SRP | $N_o \cdot (t_{II} + t_s)$ | $361984/F_S$ = 174.03 ms
6.1. Frequency Response. For both the high-performance and the power-efficient architectures, DP is used to properly evaluate the quality of the array's response. Instead of evaluating one sound source at one particular orientation, as done in Section 5.2.1, the directivity is evaluated by placing a sound source at each of the 64 supported orientations. The average of all directivities, along with the 95% confidence interval, is calculated over the supported orientations. Figure 20(a) depicts the resulting directivities of the proposed architecture for each combination of active subarrays. Notice that, when only the 4 inner microphones are enabled, neither architecture reaches the predefined DP threshold of 8. The directivity increases when 12 microphones are enabled, reaching the value of 8 at 3.1 kHz. This value is reached at 2.1 kHz and 1.7 kHz when 28 and all 52 microphones are enabled, respectively. One can also note that the 95% confidence interval noticeably widens at 4 kHz, 6 kHz, and 7 kHz for, respectively, the inner 4, 12, and 28 and all 52 microphones.
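The averaging step can be sketched as follows. The paper does not state how the 95% interval is computed, so this sketch assumes a normal approximation, mean ± 1.96·s/√n, over the n = 64 per-orientation DP values. A Student t quantile (about 2.00 for 63 degrees of freedom) would give a slightly wider interval. The function name mean_ci95 is ours.

```python
import math
import statistics


def mean_ci95(values):
    """Mean DP and a normal-approximation 95% confidence interval.

    values: one DP value per source orientation (e.g., 64 values).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two orientations")
    mean = statistics.fmean(values)
    half_width = 1.96 * statistics.stdev(values) / math.sqrt(n)
    return mean, mean - half_width, mean + half_width


# Hypothetical DP values for 64 orientations, alternating between 6 and 10.
mean, low, high = mean_ci95([6.0, 10.0] * 32)
print(round(mean, 2), round(low, 2), round(high, 2))
```

A wide interval at a given frequency means that DP depends strongly on where the source is, which is the effect noted above around 4 kHz to 7 kHz.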
The frequency response of the power-efficient architecture outperforms that of the high-performance architecture, which is depicted in Figure 20(b). The selection of the low-pass FIR filter characteristics and the decomposition of DF in the PDM demodulation have a lower impact on the power-efficient architecture. A high DCIC leads to a higher-order low-pass FIR filter, improving the frequency response. The high-performance architecture, however, has a lower resolution in the beamforming stage, since the values of Δ are directly related to the input data rate. Therefore, although a high DCIC leads to a better frequency response, the variance of DP across sound source locations increases with the sound source frequency, and DP becomes very sensitive to the beamed orientation. A possible solution is to implement a partially parallel low-pass FIR filter, reducing the existing dependency between DCIC and NFIR. This would, however, increase the already high resource consumption of the filter stage.
The power-efficient architecture has a higher beamforming resolution thanks to beamforming before downsampling the input data. The high-performance architecture, instead, performs the beamforming after the filter stage, whose output has a lower data rate. The capacity to properly determine the DoA increases with Nam for both architectures, as shown in Figure 20.
6.2. Resource Consumption. The power-efficient architecture presents a significantly lower resource consumption than the high-performance architecture. Figure 21 compares the resource consumption of both architectures when targeting a Zynq 7020 FPGA. Although the low resource consumption of the power-efficient architecture allows the use of a smaller and less power-hungry FPGA, the Zynq 7020 FPGA is used in order to compare both architectures fairly. The amount of each type of resource demanded by the proposed architecture is significantly lower than that of the architecture presented in [9, 10]. This low resource consumption is possible thanks to the reduction of the number of filter chains, which makes the beamforming operation more resource-efficient. Whereas each microphone in the high-performance architecture has an individual filter chain, the power-efficient architecture only needs one.
This percentage decreases to 14.7% and 32.8% of the consumed registers and Look-Up Tables (LUTs), respectively, in the power-efficient architecture. An efficient memory partition is possible thanks to the storage of PDM signals and the use of LUTs as internal memory. LUTs are the constraining resource in the high-performance architecture, which justifies its increased consumption of dedicated internal memories (BRAMs). BRAMs, in turn, are the limiting resource for the power-efficient architecture. Due to the relatively low consumption of LUTs, the consumption of BRAMs could certainly be reduced if LUTs were used as internal memory in the beamforming stage. In fact, BRAMs are only required to store the incoming samples from the 24 microphones of the outer subarray. The available resources in the Zynq 7020 support up to 11 instances of the power-efficient architecture, which represents the capacity to process the incoming signals from more than 500 microphones simultaneously.
Table 10: Summary of the achievable performance and the related parameters when using the performance strategies proposed in Section 5.1.
Strategy | Parameter | Inner 4 mics | Inner 12 mics | Inner 28 mics | All 52 mics
None | Fp (MHz) | FS | FS | FS | FS
None | Performance (Or/s) | 3.98E+02 | 3.98E+02 | 3.98E+02 | 3.98E+02
Continuous processing | Fp (MHz) | FS | FS | FS | FS
Continuous processing | Performance (Or/s) | 4.50E+02 | 4.50E+02 | 4.50E+02 | 4.50E+02
Parallel continuous time multiplexing | Fop (MHz) | 93.11 | 88.02 | 91.08 | 86.92
Parallel continuous time multiplexing | NBStages | 6 | 6 | 6 | 5
Parallel continuous time multiplexing | Performance (Or/s) | 2.70E+03 | 2.70E+03 | 2.70E+03 | 2.25E+03
Figure 19: Demonstrator [41] and target platform on which both reconfigurable architectures are compared.
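The instance count is set by the scarcest resource of the device. The sketch below shows that calculation. The device capacities are those of the Zynq-7020. The per-instance usage is a set of illustrative placeholders, not values reported in this work, chosen only so that BRAM18K is the binding resource, as stated above. With 11 instances, as in Table 11, the device serves 11 × 52 = 572 microphones.

```python
# Programmable-logic capacity of the Zynq-7020 (XC7Z020).
ZYNQ_7020 = {"LUT": 53_200, "FF": 106_400, "BRAM18K": 280, "DSP48": 220}


def max_instances(available: dict, per_instance: dict) -> int:
    """Largest number of identical instances that fit, set by the scarcest resource."""
    return min(available[r] // need for r, need in per_instance.items() if need > 0)


def limiting_resource(available: dict, per_instance: dict) -> str:
    """Name of the resource that runs out first."""
    return min((available[r] // need, r) for r, need in per_instance.items() if need > 0)[1]


# Illustrative placeholders only, NOT the measured per-instance usage of the architecture.
per_instance = {"LUT": 3_000, "FF": 5_000, "BRAM18K": 24, "DSP48": 4}
n_instances = max_instances(ZYNQ_7020, per_instance)
n_microphones = n_instances * 52  # each instance serves the full 52-microphone array
print(n_instances, limiting_resource(ZYNQ_7020, per_instance), n_microphones)
```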
6.3. Timing and Performance. Figure 22 compares tP−SRP for both architectures over the supported configurations detailed in Section 3.2.2. No performance strategy has been applied. Notice how the values of tP−SRP differ by approximately 33 ms for each supported configuration. Although tP−SRP is defined identically for both architectures, the II values of the beamforming stage differ due to max(Δ), which is also reflected in the memory requirements (Figure 9).
The placement of the beamforming stage within the architecture not only affects the frequency response but also directly determines the achievable performance. The performance strategies, proposed in [10] and in Section 5.1, accentuate the architectural differences. Figure 14 exemplifies the relevance of the beamformer placement from the performance perspective. Although both architectures support the same performance strategy of continuous processing, the position of the beamforming stage determines which components' IIs increase tP−SRP. Whereas in the high-performance architecture only the II of the power stage increases tP−SRP, in the power-efficient architecture the II of the filter stage and the additional clock cycles needed to reset the filters also increase tP−SRP.
Figure 23 shows the values of tP−SRP for the same design space when applying the parallel continuous time multiplexing performance strategy. The variation of tP−SRP with Fmax reflects the dependency of tP−SRP on FS for the power-efficient architecture. This dependency, however, disappears for the high-performance architecture, where tP−SRP only depends on Fmax. This characteristic reduces the dependency on FS, which corresponds to the microphones' clock, during the design stage of a high-performance architecture.
[Figure 20: two panels of DP (0 to 70) versus sound source frequency (10^2 to 10^4 Hz) for the inner 4, 12, and 28 mics, all 52 mics, and the threshold.]
Figure 20: Average DP with a 95% confidence interval for the supported orientations when combining subarrays of the power-efficient architecture (a) and the high-performance architecture (b).
[Figure 21: bar chart of resource consumption (%) in registers, LUTs, BRAM18K, and DSP48 for the high-performance and power-efficient architectures with 4, 12, 28, and 52 mics.]
Figure 21: Comparison of both architectures based on the Zynq 7020 resource consumption after placement and routing when combining microphone subarrays.
[Figure 22: tP−SRP (ms, 140 to 220) versus Fmax (1.4×10^4 to 1.65×10^4 Hz) for the power-efficient and high-performance architectures.]
Figure 22: Evolution of tP−SRP for each architecture when no performance strategy is applied. The explored design space corresponds to the one depicted in Figure 6.
[Figure 23: tP−SRP (ms, 1.9 to 2.9) versus Fmax (1.4×10^4 to 1.65×10^4 Hz) for the power-efficient and high-performance architectures.]
Figure 23: Evolution of tP−SRP for each architecture when the best performance strategy is applied. The explored design space corresponds to the one depicted in Figure 6.
Table 11 summarizes the parameters involved in the calculation of this performance strategy for both architectures. The achievable performance, expressed in Or/s, decreases when increasing Nam on both architectures. Notice that, although the power-efficient architecture is able to operate at a higher frequency and to allocate more instances, the high-performance architecture achieves a higher performance, even though both target the same FPGA.
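The power-efficient entries of Table 11 are consistent with scaling the parallel-strategy throughput of Table 4 by the number of beamformer instances, Or/s = NBStages·NFStages/(tFilteringII + ts), with both timing terms evaluated at FS = 2.08 MHz (587 and 4032 clock cycles, Table 8). The sketch below checks that reading. It is our reconstruction, not a formula stated in the text, and the function name is ours.

```python
FS = 2.08e6            # Hz, Table 5
T_FILTERING_CC = 587   # t_II^Filtering in clock cycles at FS (Table 8)
T_S_CC = 4032          # t_s in clock cycles at FS (Table 8)


def orientations_per_second(n_bstages: int, n_fstages: int) -> float:
    """Throughput of N_BStages beamformers, each time multiplexing N_FStages filter stages."""
    per_orientation_s = (T_FILTERING_CC + T_S_CC) / FS
    return n_bstages * n_fstages / per_orientation_s


# (N_BStages, N_FStages) of the power-efficient architecture, Table 11.
TABLE11_PE = {4: (170, 70), 12: (117, 70), 28: (59, 70), 52: (11, 70)}
perf = {n: f"{orientations_per_second(nb, nf):.2E}" for n, (nb, nf) in TABLE11_PE.items()}
print(perf)
```

In this reading, Fop enters the result only through the number of filter stages that can be time multiplexed, which is why a higher Fop does not translate directly into a higher throughput.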
6.4. Summary. Table 12 summarizes the comparison of the proposed power-efficient architecture and the related works from a timing and power consumption point of view. As a consequence of the lower resource consumption, not only can larger microphone arrays be processed in parallel, but more power-efficient FPGAs can also be used to minimize the power consumption. The main difference in power consumption between the analysed architecture and the one described in [8] lies in the microphones' power consumption, since the microphones operate in a different mode. The power-efficient architecture presents a major reduction of the power consumption compared to the high-performance architecture described in [10], reaching its lowest power-per-microphone ratio when all the subarrays are active. Although the power-efficient architecture is slower than the high-performance architecture, its time-per-microphone ratio is better than that of the other related solutions.
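The per-microphone columns of Table 12 are simple ratios. The sketch below recomputes them from the Time, Power, and NMics columns; the helper name per_mic is ours. Among the listed works, the lowest power per microphone is the 0.65 mW/mic of [8]. The power-efficient architecture's 1.23 mW/mic is about five times lower than the 6.61 mW/mic of [10].

```python
# Table 12 rows: (number of microphones, time in ms or None if not reported, power in mW).
TABLE12 = {
    "[20]": (8, None, 30.44),
    "[21]": (16, 18.85, 78.99),
    "[42]": (4, 249.0, 7.20),
    "[10]": (52, 2.0, 343.92),
    "[8]": (52, 141.66, 33.78),
    "Power-efficient": (52, 23.68, 63.98),
}


def per_mic(n_mics, time_ms, power_mw):
    """Time per microphone (ms/mic, None if not reported) and power per microphone (mW/mic)."""
    t = None if time_ms is None else time_ms / n_mics
    return t, power_mw / n_mics


ratios = {ref: per_mic(*row) for ref, row in TABLE12.items()}
lowest_power = min(ratios, key=lambda ref: ratios[ref][1])
print(lowest_power, ratios["Power-efficient"])
```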
7. Conclusions
FPGA-based embedded systems offer sufficient flexibility to support dynamic acoustic beamformers able to achieve real-time performance or power efficiency. Nevertheless, these desirable features are only achievable through design considerations and performance strategies that fully exploit the FPGA's characteristics. On the one hand, design considerations lead to compromises in the selection of the architecture's components, grouped in stages, to obtain the desired response. The position of the beamforming stage, for instance, affects not only the memory requirements but also the performance, frequency response, and resource consumption of the architecture. The specifications of the PDM demodulation have a limited impact on the performance and resource consumption. On the other hand, performance strategies enable a higher performance by exploiting the FPGA's resources. Although such performance strategies are strongly dependent on the reconfigurable architecture, they can significantly increase the performance of reconfigurable acoustic beamforming architectures.
Data Availability
The data used to support the findings of this study are
available from the corresponding author upon request.
Table 11: Comparison of architectures when applying the parallel continuous time multiplex strategy. The maximum number of beamformers is obtained based on the available resources and the resource consumption of each beamformer (Table 6 and reported in [10]). The value of Fop corresponds to the maximum frequency reported by the Vivado 2016.4 tool after placement and routing.
Architecture | Parameter | 4 mics | 12 mics | 28 mics | 52 mics
High-performance architecture [10] | NBStages | 55 | 23 | 10 | 6
High-performance architecture [10] | NFStages | n/a | n/a | n/a | n/a
High-performance architecture [10] | Fop (MHz) | 95.62 | 93.27 | 91.97 | 87.91
High-performance architecture [10] | Performance (Or/s) | 8.35E+07 | 3.41E+07 | 1.46E+07 | 8.37E+06
Power-efficient architecture | NBStages | 170 | 117 | 59 | 11
Power-efficient architecture | NFStages | 70 | 70 | 70 | 70
Power-efficient architecture | Fop (MHz) | 165.04 | 141.70 | 112.91 | 94.22
Power-efficient architecture | Performance (Or/s) | 5.36E+06 | 3.69E+06 | 1.86E+06 | 3.47E+05
Table 12: Comparison of the time performance and the reported power consumption of the microphone array and the FPGA.
Reference | Year | Device | Microphone | NMics | Time (ms) | Time/mic (ms/mic) | Power (mW) | Power/mic (mW/mic)
[20] | 2015 | IGLOO2 | STMicroelectronics MP34DT01 | 8 | n/a | n/a | 30.44 | 3.80
[21] | 2016 | Spartan-6 | STMicroelectronics MP32DB01 | 16 | 18.85 | 1.18 | 78.99 | 4.94
[42] | 2016 | EFM32 | InvenSense INMP504 | 4 | 249 | 62.25 | 7.20 | 1.80
[10] | 2017 | Zynq 7020 | Analog Devices ADMP521 | 52 | 2 | 0.04 | 343.92 | 6.61
[8] | 2018 | SmartFusion2 M2S050 | InvenSense ICS-41350 | 52 | 141.66 | 2.724 | 33.78 | 0.65
Power-efficient architecture | 2019 | SmartFusion2 M2S025 | InvenSense ICS-41350 | 52 | 23.68 | 0.45 | 63.98 | 1.23
Conflicts of Interest
The authors declare that there is no conflict of interests
regarding the publication of this paper.
Acknowledgments
This work was supported by the European Regional
Development Fund (ERDF) and the Brussels-Capital
Region-Innoviris within the framework of the Operational
Programme 2014-2020 through the ERDF-2020 project
ICITY-RDI.BRU.
References
[1] B. Widrow and F. L. Luo, “Microphone arrays for hearing aids:
an overview,” Speech Communication, vol. 39, no. 1-2,
pp. 139–146, 2003.
[2] A. Izquierdo-Fuente, L. Del Val, M. I. Jiménez, and J. J. Villa-
corta, “Performance evaluation of a biometric system based on
acoustic images,” Sensors, vol. 11, no. 10, pp. 9499–9519, 2011.
[3] L. del Val, A. Izquierdo-Fuente, J. J. Villacorta, and M. Raboso,
“Acoustic biometric system based on preprocessing techniques
and linear support vector machines,” Sensors, vol. 15, no. 6,
pp. 14241–14260, 2015.
[4] M. Akay, A. Dragomir, Y. M. Akay et al., “The assessment of
stent effectiveness using a wearable beamforming MEMS
microphone array system,” IEEE Journal of Translational
Engineering in Health and Medicine, vol. 4, pp. 1–10, 2016.
[5] D. C. Moore and I. A. McCowan, “Microphone array speech
recognition: experiments on overlapping speech in meetings,”
in 2003 IEEE International Conference on Acoustics, Speech,
and Signal Processing, 2003. Proceedings. (ICASSP '03),
pp. V–497, Hong Kong, China, April 2003.
[6] S. Doclo and M. Moonen, “GSVD-based optimal filtering for
single and multimicrophone speech enhancement,” IEEE
Transactions on Signal Processing, vol. 50, no. 9, pp. 2230–
2244, 2002.
[7] M. Brandstein and D. Ward, Microphone Arrays: Signal Pro-
cessing Techniques and Applications, Springer Science & Busi-
ness Media, 2013.
[8] B. da Silva, L. Segers, A. Braeken, K. Steenhaut, and A. Touhafi,
“A Low-power FPGA-based architecture for microphone
arrays in wireless sensor networks,” in Applied Reconfigurable
Computing. Architectures, Tools, and Applications: 14th Inter-
national Symposium. ARC 2018pp. 281–293, Springer, Cham.
[9] B. da Silva, L. Segers, A. Braeken, and A. Touhafi, “Runtime reconfigurable beamforming architecture for real-time sound-source localization,” in 2016 26th International Conference on Field Programmable Logic and Applications (FPL), pp. 1–4, Lausanne, Switzerland, August-September 2016.
[10] B. da Silva, A. Braeken, K. Steenhaut, and A. Touhafi, “Design considerations when accelerating an FPGA-based digital microphone array for sound-source localization,” Journal of Sensors, vol. 2017, Article ID 6782176, 20 pages, 2017.
[11] J. Tiete, F. Domínguez, B. da Silva, A. Touhafi, and K. Steenhaut, “MEMS microphones for wireless applications,” in Wireless MEMS Networks and Applications, pp. 177–195, Elsevier, 2017.
[12] Microsemi, “Alexa voice service development kit (ZLK38AVS2),” July 2018, https://www.microsemi.com/productdirectory/connected-home/4628-zlk38avs.
[13] B. da Silva, A. Braeken, and A. Touhafi, “FPGA-based architectures for acoustic beamforming with microphone arrays: trends, challenges and research opportunities,” Computers, vol. 7, no. 3, p. 41, 2018.
[14] E. Zwyssig, M. Lincoln, and S. Renals, “A digital microphone array for distant speech recognition,” in 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 5106–5109, Dallas, TX, USA, March 2010.
[15] I. Hafizovic, C. I. C. Nilsen, M. Kjølerbakken, and V. Jahr, “Design and implementation of a MEMS microphone array system for real-time speech acquisition,” Applied Acoustics, vol. 73, no. 2, pp. 132–143, 2012.
[16] A. Izquierdo, J. Villacorta, L. del Val Puente, and L. Suárez, “Design and evaluation of a scalable and reconfigurable multi-platform system for acoustic imaging,” Sensors, vol. 16, no. 10, p. 1671, 2016.
[17] I. Salom, V. Celebic, M. Milanovic, D. Todorovic, and J. Prezelj, “An implementation of beamforming algorithm on FPGA platform with digital microphone array,” in Audio Engineering Society Convention 138, Audio Engineering Society, 2015.
[18] D. Todorović, I. Salom, V. Celebić, and J. Prezelj, “Implementation and application of FPGA platform with digital MEMS microphone array,” in Proceedings of the 4th International Conference on Electrical, Electronics and Computing Engineering, pp. 5–8, Kladovo, Serbia, 2017.
[19] N. Hegde, “Seamlessly interfacing MEMS microphones with Blackfin processors,” EE-350 Engineer-to-Engineer Note, 2010.
[20] L. Petrica and G. Stefan, “Energy-efficient WSN architecture for illegal deforestation detection,” International Journal of Sensors and Sensor Networks, vol. 3, no. 3, pp. 24–30, 2015.
[21] L. Petrica, “An evaluation of low-power microphone array sound source localization for deforestation detection,” Applied Acoustics, vol. 113, pp. 162–169, 2016.
[22] J. Tiete, F. Domínguez, B. da Silva, L. Segers, K. Steenhaut, and A. Touhafi, “SoundCompass: a distributed MEMS microphone array-based sensor for sound source localization,” Sensors, vol. 14, no. 2, pp. 1918–1949, 2014.
[23] T. E. Bogale, L. Vandendorpe, and L. B. Le, “Sensing throughput tradeoff for cognitive radio networks with noise variance uncertainty,” in Proceedings of the 9th International Conference on Cognitive Radio Oriented Wireless Networks, pp. 435–441, Oulu, Finland, 2014.
[24] T. Taguchi, T. Nakadai, R. Egusa et al., “Investigation on optimal microphone arrangement of spherical microphone array to achieve shape beamforming,” in 2014 5th International Conference on Intelligent Systems, Modelling and Simulation, pp. 330–333, Langkawi, Malaysia, January 2014.
[25] A. Malgoezar, M. Snellen, P. Sijtsma, and D. Simons, “Improving beamforming by optimization of acoustic array microphone positions,” in Proceedings of the 6th Berlin Beamforming Conference, pp. 1–24, Berlin, Germany, 2016.
[26] E. Sarradj, “A generic approach to synthesize optimal array microphone arrangements,” in 6th Berlin Beamforming Conference, BeBeC-2016-S4, p. 12, Berlin, Germany, 2016.
[27] Analog Devices, “ADMP521 datasheet,” July 2018, http://www.analog.com/media/en/technical-documentation/obsolete-data-sheets/ADMP521.pdf.
[28] TDK Invensense, “INMP621 datasheet,” July 2018, https://www.invensense.com/wp-content/uploads/2015/02/INMP621.pdf.
[29] Knowles, “SPH0645LM4H-B datasheet,” July 2018, https://www.alldatasheet.com/datasheet-pdf/pdf/791053/KNOWLES/SPH0645LM4H-B.html.
[30] TDK Invensense, “ICS-43434 datasheet,” July 2018, https://www.invensense.com/wp-content/uploads/2016/02/DS000069-ICS-43434-v1.2.pdf.
[31] E. Hogenauer, “An economical class of digital filters for decimation and interpolation,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 29, no. 2, pp. 155–162, 1981.
[32] M. P. Donadio, “CIC filter introduction,” in IEEE International Symposium on Communications, pp. 1–6, 2000.
[33] R. Lyons, “Understanding cascaded integrator-comb filters,” Embedded Systems Programming, vol. 18, pp. 14–27, 2005.
[34] M. J. Taghizadeh, P. N. Garner, and H. Bourlard, “Microphone array beampattern characterization for hands-free speech applications,” in 2012 IEEE 7th Sensor Array and Multichannel Signal Processing Workshop (SAM), pp. 465–468, Hoboken, NJ, USA, June 2012.
[35] H. L. Van Trees, Optimum Array Processing: Part IV of Detection, Estimation, and Modulation Theory, John Wiley & Sons, 2004.
[36] Microsemi SmartFusion2 SoC FPGA and IGLOO2 FPGA Fabric, “User Guide 0445 (UG0445) v.6,” July 2018, http://www.microsemi.com/index.php?option=com_docman&task=doc_download&gid=132008.
[37] Microsemi SmartFusion2 SoC and IGLOO2 FPGA Low-Power Design, “User Guide 0444 (UG0444) v.5,” July 2018, http://www.microsemi.com/document-portal/doc_download/132010-ug0444-smartfusion2-andigloo2-fpga-low-power-design-user-guide.
[38] TDK Invensense, “ICS41350 datasheet,” July 2018, https://www.invensense.com/wp-content/uploads/2016/02/DS000047-ICS-41350-v1.1.pdf.
[39] B. da Silva, L. Segers, Y. Rasschaert, Q. Quevy, A. Braeken, and A. Touhafi, “A Multimode SoC FPGA-based acoustic camera for wireless sensor networks,” in 2018 13th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC), pp. 1–8, Lille, France, July 2018.
[40] J. Lewis, “Analog and digital MEMS microphone design considerations,” July 2018, http://www.analog.com/media/en/technical-documentation/technical-articles/Analog-and-Digital-MEMS-Microphone-DesignConsiderations-MS-2472.pdf.
[41] B. da Silva, L. Segers, A. Braeken, and A. Touhafi, “A runtime reconfigurable FPGA-based microphone array for sound source localization,” in 2016 26th International Conference on Field Programmable Logic and Applications (FPL), p. 1, Lausanne, Switzerland, August-September 2016.
[42] G. Ottoy, B. Thoen, and L. De Strycker, “A low-power MEMS microphone array for wireless acoustic sensors,” in 2016 IEEE Sensors Applications Symposium (SAS), pp. 1–6, Catania, Italy, April 2016.
