Polyphonic Sound Event Detection by using Capsule Neural Networks
Artificial sound event detection (SED) aims to mimic the human ability to
perceive and understand what is happening in the surroundings. Deep learning
now offers valuable techniques for this goal, such as Convolutional Neural
Networks (CNNs). The Capsule Neural Network (CapsNet) architecture was recently
introduced in the image processing field to overcome some known limitations of
CNNs, namely their limited robustness to affine transformations (e.g., changes
in perspective, size, and orientation) and their difficulty in detecting
overlapping images. This motivated us to apply CapsNets to the polyphonic SED
task, in which multiple sound events occur simultaneously. Specifically, we
propose using capsule units to represent a set of distinctive properties of each
individual sound event. Capsule units are connected through a so-called "dynamic
routing" mechanism, which encourages learning part-whole relationships and
improves detection performance in a polyphonic context. This paper reports
extensive evaluations on three publicly available datasets, showing that the
CapsNet-based algorithm not only outperforms standard CNNs but also achieves the
best results compared with state-of-the-art algorithms.
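The abstract names dynamic routing without detailing it. As a minimal sketch, the routing-by-agreement loop from the original CapsNet formulation can be written in pure Python, assuming the prediction vectors `u_hat[i][j]` (lower capsule `i` predicting upper capsule `j`) have already been produced by the layer's learned transforms. The function names and the three-iteration default are illustrative choices, not taken from the authors' SED implementation.

```python
import math


def squash(v):
    """Capsule non-linearity: keeps the direction of v, maps its length into [0, 1)."""
    norm_sq = sum(x * x for x in v)
    if norm_sq == 0.0:
        return [0.0 for _ in v]
    norm = math.sqrt(norm_sq)
    scale = norm_sq / (1.0 + norm_sq) / norm
    return [scale * x for x in v]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat, num_iters=3):
    """Routing-by-agreement (illustrative sketch).

    u_hat[i][j] is the prediction vector from lower capsule i for upper capsule j.
    Returns the list of upper-capsule output vectors v_j.
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits, start uniform
    v = []
    for it in range(num_iters):
        # Coupling coefficients: each lower capsule distributes its output over upper capsules.
        c = [softmax(b[i]) for i in range(n_in)]
        v = []
        for j in range(n_out):
            # Weighted sum of predictions, then squash.
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in)) for d in range(dim)]
            v.append(squash(s_j))
        if it < num_iters - 1:
            # Agreement (dot product) between prediction and output raises the routing logit.
            for i in range(n_in):
                for j in range(n_out):
                    b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v
```

For example, if two lower capsules predict the same vector for upper capsule 0 but opposite vectors for upper capsule 1, routing concentrates coupling on capsule 0 and capsule 1's output stays at zero length. Reading the length of each output vector as the activity of one sound-event class (so several classes can be active at once) is a natural fit for polyphonic SED, but that mapping is an assumption here, not a detail given in the abstract.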
Super-Resolution from Short-Time Fourier Transform Measurements
While spike trains are obviously not band-limited, the theory of
super-resolution tells us that perfect recovery of unknown spike locations and
weights from low-pass Fourier transform measurements is possible provided that
the minimum spacing, $\Delta$, between spikes is not too small. Specifically,
for a cutoff frequency of $f_c$, Donoho [2] shows that exact recovery is
possible if $\Delta > 1/f_c$, but does not specify a corresponding recovery
method. On the other hand, Candès and Fernandez-Granda [3] provide a recovery
method based on convex optimization, which provably succeeds as long as
$\Delta > 2/f_c$. In practical applications, one often has access only to
windowed Fourier transform measurements, i.e., short-time Fourier transform
(STFT) measurements. In this paper, we develop a theory of super-resolution
from STFT measurements, and we propose a method that provably succeeds in
recovering spike trains from STFT measurements provided that $\Delta > 1/f_c$.
Comment: IEEE International Conference on Acoustics, Speech, and Signal
Processing (ICASSP), May 2014, to appear
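To make the two measurement models concrete, the sketch below places spikes on the unit interval treated as a torus, computes the low-pass Fourier samples $y_k = \sum_j a_j e^{-i 2\pi k t_j}$ for $|k| \le f_c$, and computes STFT samples $V(\tau, f) = \sum_j a_j\, g(t_j - \tau)\, e^{-i 2\pi f t_j}$. The Gaussian window `g`, its width `sigma`, and the use of a non-periodized window are assumptions for illustration; the paper's own window and sampling grid may differ. `min_wraparound_spacing` returns the minimum spike spacing, the quantity the recovery conditions constrain.

```python
import cmath
import math


def min_wraparound_spacing(locations):
    """Minimum distance between spike locations on the unit torus [0, 1)."""
    t = sorted(locations)
    gaps = [t[k + 1] - t[k] for k in range(len(t) - 1)]
    gaps.append(1.0 - t[-1] + t[0])  # gap that wraps around from the last spike to the first
    return min(gaps)


def lowpass_fourier(locations, weights, f_c):
    """Low-pass samples y_k = sum_j a_j exp(-i 2 pi k t_j), k = -f_c, ..., f_c."""
    return [
        sum(a * cmath.exp(-2j * math.pi * k * t) for t, a in zip(locations, weights))
        for k in range(-f_c, f_c + 1)
    ]


def stft_samples(locations, weights, taus, freqs, sigma):
    """STFT samples of x(t) = sum_j a_j delta(t - t_j).

    Assumes a real Gaussian window g(t) = exp(-t^2 / (2 sigma^2)) (hypothetical choice),
    so V(tau, f) = sum_j a_j g(t_j - tau) exp(-i 2 pi f t_j).
    Returns a list of rows, one per time shift tau, each with one entry per frequency f.
    """
    def g(t):
        return math.exp(-t * t / (2.0 * sigma * sigma))

    return [
        [
            sum(a * g(t - tau) * cmath.exp(-2j * math.pi * f * t)
                for t, a in zip(locations, weights))
            for f in freqs
        ]
        for tau in taus
    ]
```

A spacing check then reads, for instance, `min_wraparound_spacing(t) >= 2 / f_c` for the separation condition of Candès and Fernandez-Granda. Note that each STFT sample weights a spike by how close it lies to the window center `tau`, which is what distinguishes it from the global low-pass samples.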
A Review of Audio Features and Statistical Models Exploited for Voice Pattern Design
Audio fingerprinting, also known as audio hashing, is a well-established and
powerful technique for audio identification and synchronization. It involves
two major steps: fingerprint (voice pattern) design and matching search. The
first step concerns the derivation of a robust and compact audio signature,
while the second usually requires knowledge of databases and fast search
algorithms. Although this technique has a wide range of real-world
applications, to the best of the authors' knowledge, the most recent
comprehensive survey of existing algorithms appeared more than eight years ago.
In this paper, we therefore present a more up-to-date review and, to emphasize
the audio signal processing aspect, focus our state-of-the-art survey on the
fingerprint design step, discussing various audio features and their tractable
statistical models.
Comment: http://www.iaria.org/conferences2015/PATTERNS15.html; Seventh
International Conferences on Pervasive Patterns and Applications (PATTERNS
2015), Mar 2015, Nice, France
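The two steps (fingerprint design, then matching search) can be illustrated with a toy example. The sketch below assumes per-frame band energies are already available and derives one integer sub-fingerprint per frame from the signs of energy differences across adjacent bands and consecutive frames. This scheme follows the style of the Philips robust hash by Haitsma and Kalker and is chosen here only as an example; it is not drawn from the review. Matching is an exhaustive scan for the offset with the fewest differing bits, standing in for the fast search algorithms the abstract mentions.

```python
def fingerprint(band_energies):
    """band_energies[n][m] is the energy of frame n in band m.

    Returns one integer sub-fingerprint per frame, starting at frame 1:
    bit m is set when the energy difference between bands m and m+1
    grows from the previous frame to the current one.
    """
    prints = []
    for n in range(1, len(band_energies)):
        cur, prev = band_energies[n], band_energies[n - 1]
        bits = 0
        for m in range(len(cur) - 1):
            diff = (cur[m] - cur[m + 1]) - (prev[m] - prev[m + 1])
            if diff > 0:
                bits |= 1 << m
        prints.append(bits)
    return prints


def bit_errors(a, b):
    """Number of differing bits between two equal-length sub-fingerprint sequences."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def best_match(query, reference):
    """Slide the query over the reference; return (offset, bit errors) of the best alignment."""
    best = None
    for off in range(len(reference) - len(query) + 1):
        errs = bit_errors(query, reference[off:off + len(query)])
        if best is None or errs < best[1]:
            best = (off, errs)
    return best
```

Because each bit depends only on the sign of an energy difference, moderate gain changes leave the fingerprint unchanged, which is the kind of robustness the first step targets. A linear scan costs time proportional to the reference length, which is why practical systems index sub-fingerprints instead.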