1,350 research outputs found

    Polyphonic Sound Event Detection by using Capsule Neural Networks

    Artificial sound event detection (SED) aims to mimic the human ability to perceive and understand what is happening in the surroundings. Deep Learning now offers valuable techniques for this goal, such as Convolutional Neural Networks (CNNs). The Capsule Neural Network (CapsNet) architecture was recently introduced in the image processing field to overcome some known limitations of CNNs, specifically their limited robustness to affine transformations (i.e., perspective, size, orientation) and their difficulty in detecting overlapping images. This motivated the authors to apply CapsNets to the polyphonic SED task, in which multiple sound events occur simultaneously. Specifically, we propose to use the capsule units to represent a set of distinctive properties for each individual sound event. Capsule units are connected through so-called "dynamic routing", which encourages learning part-whole relationships and improves detection performance in a polyphonic context. This paper reports extensive evaluations carried out on three publicly available datasets, showing that the CapsNet-based algorithm not only outperforms standard CNNs but also achieves the best results among state-of-the-art algorithms.
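The "dynamic routing" mentioned in this abstract can be illustrated with a minimal sketch of routing-by-agreement as it is commonly described for CapsNets. This is not the authors' implementation: the function names, the three-iteration default, and the nested-list data layout are assumptions made for illustration, and the paper's network may differ in its details.

```python
import math

def squash(s):
    """Shrink vector s so its length lies in [0, 1) while keeping its direction."""
    sq = sum(x * x for x in s)
    if sq == 0.0:
        return [0.0 for _ in s]
    scale = sq / (1.0 + sq) / math.sqrt(sq)
    return [scale * x for x in s]

def softmax(row):
    """Numerically stable softmax over one list of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def dynamic_routing(u_hat, num_iters=3):
    """Route lower-level capsules to upper-level capsules by agreement.

    u_hat[i][j] is the prediction vector that lower capsule i makes for
    upper capsule j (hypothetical layout, chosen for readability).
    Returns the upper capsule outputs v and the coupling coefficients c
    that were used to compute them.
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits
    for _ in range(num_iters):
        # Each lower capsule distributes its output across upper capsules.
        c = [softmax(row) for row in b]
        v = []
        for j in range(n_out):
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in))
                   for d in range(dim)]
            v.append(squash(s_j))
        # Raise the logit where a prediction agrees with the capsule output.
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v, c

# Two lower capsules agree on upper capsule 0 and disagree on upper capsule 1.
u_hat = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, -1.0]],
]
v, c = dynamic_routing(u_hat)
```

In this toy input the coupling coefficients shift toward upper capsule 0, where the two predictions agree, which is the part-whole behaviour the abstract refers to.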

    Super-Resolution from Short-Time Fourier Transform Measurements

    While spike trains are obviously not band-limited, the theory of super-resolution tells us that perfect recovery of unknown spike locations and weights from low-pass Fourier transform measurements is possible provided that the minimum spacing, $\Delta$, between spikes is not too small. Specifically, for a cutoff frequency of $f_c$, Donoho [2] shows that exact recovery is possible if $\Delta > 1/f_c$, but does not specify a corresponding recovery method. On the other hand, Candès and Fernandez-Granda [3] provide a recovery method based on convex optimization, which provably succeeds as long as $\Delta > 2/f_c$. In practical applications one often has access to windowed Fourier transform measurements, i.e., short-time Fourier transform (STFT) measurements, only. In this paper, we develop a theory of super-resolution from STFT measurements, and we propose a method that provably succeeds in recovering spike trains from STFT measurements provided that $\Delta > 1/f_c$. Comment: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2014, to appear
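The separation conditions quoted in this abstract can be checked numerically for a given set of spike locations. The sketch below is only an illustration of the inequalities: the function names and the `factor` parameter are hypothetical, and it assumes spikes on the real line with no wrap-around, which may not match the setting used in the paper.

```python
def min_separation(locations):
    """Smallest gap between neighbouring spike locations (inf for fewer than two spikes)."""
    pts = sorted(locations)
    if len(pts) < 2:
        return float("inf")
    return min(b - a for a, b in zip(pts, pts[1:]))

def satisfies_separation(locations, f_c, factor=1.0):
    """Check the condition Delta > factor / f_c.

    factor=1.0 corresponds to the threshold 1/f_c quoted in the abstract,
    factor=2.0 to the threshold 2/f_c.
    """
    if f_c <= 0:
        raise ValueError("cutoff frequency f_c must be positive")
    return min_separation(locations) > factor / f_c

spikes = [0.0, 0.3, 0.75]   # minimum spacing is 0.3
f_c = 4.0                   # 1/f_c = 0.25, 2/f_c = 0.5

print(satisfies_separation(spikes, f_c))              # True:  0.3 > 0.25
print(satisfies_separation(spikes, f_c, factor=2.0))  # False: 0.3 < 0.5
```

The example spike train meets the $\Delta > 1/f_c$ condition but not the stricter $\Delta > 2/f_c$ condition, which is the gap between the two thresholds the abstract contrasts.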

    A Review of Audio Features and Statistical Models Exploited for Voice Pattern Design

    Audio fingerprinting, also known as audio hashing, is well established as a powerful technique for audio identification and synchronization. It involves two major steps: fingerprint (voice pattern) design and matching search. While the first step concerns the derivation of a robust and compact audio signature, the second step usually requires knowledge of databases and quick-search algorithms. Though this technique offers a wide range of real-world applications, to the best of the authors' knowledge, a comprehensive survey of existing algorithms appeared more than eight years ago. Thus, in this paper, we present a more up-to-date review and, to emphasize the audio signal processing aspect, we focus our state-of-the-art survey on the fingerprint design step, for which various audio features and their tractable statistical models are discussed. Comment: http://www.iaria.org/conferences2015/PATTERNS15.html ; Seventh International Conferences on Pervasive Patterns and Applications (PATTERNS 2015), Mar 2015, Nice, France
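The two steps named in this abstract, fingerprint design and matching search, can be sketched with a deliberately simple toy scheme. It is not taken from the reviewed paper: the frame-energy feature, the Hamming-distance search, and all function names are assumptions chosen only to make the two-step structure concrete.

```python
def fingerprint(samples, frame_len=4):
    """Step 1 (design): one bit per pair of consecutive frames.

    A bit is 1 when the frame energy rises from one frame to the next,
    else 0. Real systems use far richer features than this toy one.
    """
    energies = [
        sum(x * x for x in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]
    return [1 if b > a else 0 for a, b in zip(energies, energies[1:])]

def hamming(a, b):
    """Number of positions at which two equal-length bit lists differ."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    return sum(x != y for x, y in zip(a, b))

def best_match(query_fp, database):
    """Step 2 (search): name of the stored fingerprint closest to the query."""
    return min(database, key=lambda name: hamming(query_fp, database[name]))

track_a = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2, 2]
track_b = [3, 3, 3, 3, 2, 2, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0]
database = {"a": fingerprint(track_a), "b": fingerprint(track_b)}

# A slightly perturbed copy of track_a still maps to the same bits.
noisy_a = [0.1, 0, 0, 0, 1, 1.1, 1, 1, 0, 0, 0.1, 0, 2, 2, 1.9, 2]
print(best_match(fingerprint(noisy_a), database))  # a
```

The linear scan in `best_match` stands in for the quick-search algorithms the abstract mentions; a practical system would index the database rather than compare against every entry.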