Sound Event Detection in Synthetic Audio: Analysis of the DCASE 2016 Task Results
As part of the 2016 public evaluation challenge on Detection and
Classification of Acoustic Scenes and Events (DCASE 2016), the second task
focused on evaluating sound event detection systems using synthetic mixtures of
office sounds. This task, which follows the 'Event Detection - Office
Synthetic' task of DCASE 2013, studies the behaviour of tested algorithms when
facing controlled levels of audio complexity with respect to background noise
and polyphony/density, with the added benefit of a very accurate ground truth.
This paper presents the task formulation, evaluation metrics, and submitted
systems, and provides a statistical analysis of the results achieved with
respect to various aspects of the evaluation dataset.
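Sound event detection tasks of this kind are typically scored with segment-based metrics: the timeline is cut into fixed-length segments and reference and estimated event labels are compared per segment. A minimal sketch of a segment-based F-score, assuming a simple list-of-events representation (illustrative code, not the official DCASE evaluation toolkit):

```python
def segment_based_f1(reference, estimated, duration, seg=1.0):
    """Segment-based F-score sketch.

    reference/estimated: lists of (onset, offset, label) tuples in seconds.
    duration: total length of the recording in seconds.
    seg: segment length in seconds (1 s is a common DCASE choice).
    """
    n_seg = int(duration / seg)
    tp = fp = fn = 0
    for i in range(n_seg):
        t0, t1 = i * seg, (i + 1) * seg
        # Labels active at any point within this segment.
        ref = {lab for on, off, lab in reference if on < t1 and off > t0}
        est = {lab for on, off, lab in estimated if on < t1 and off > t0}
        tp += len(ref & est)
        fp += len(est - ref)
        fn += len(ref - est)
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0
```

Precision, recall, and F-score are accumulated over all segments rather than averaged per segment, matching the usual micro-averaged formulation.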
The bag-of-frames approach: a not so sufficient model for urban soundscapes
The "bag-of-frames" approach (BOF), which encodes audio signals as the
long-term statistical distribution of short-term spectral features, is commonly
regarded as an effective and sufficient way to represent environmental sound
recordings (soundscapes) since its introduction in an influential 2007 article.
The present paper describes a conceptual replication of this seminal article
using several new soundscape datasets, with results strongly questioning the
adequacy of the BOF approach for the task. We show that the good accuracy
originally reported with BOF likely results from a particularly favourable
dataset with low within-class variability, and that, for more realistic
datasets, BOF in fact does not perform significantly better than a mere
one-point average of the signal's features. Soundscape modeling, therefore,
may not be the closed case it was once thought to be. Progress, we argue,
could lie in reconsidering the problem and explicitly modelling the individual
acoustical events within each soundscape.
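The two representations under comparison can be sketched roughly as follows; the feature extractor and the choice of distribution summary are our own simplifications, not the paper's exact pipeline:

```python
import numpy as np

def frame_features(signal, frame=512, hop=256):
    # Short-term log-spectral features per frame (a simple stand-in for MFCCs).
    frames = [signal[i:i + frame] for i in range(0, len(signal) - frame, hop)]
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-9)

def bag_of_frames(feats):
    # "Bag of frames": a long-term distribution summary of the short-term
    # features, here reduced to per-dimension mean and variance over frames.
    return np.concatenate([feats.mean(axis=0), feats.var(axis=0)])

def one_point_average(feats):
    # The baseline: collapse the whole recording to a single mean vector.
    return feats.mean(axis=0)
```

The paper's finding is that, on realistic soundscape data, a classifier fed the second representation is not significantly worse than one fed the first.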
Efficient bandwidth extension of musical signals using a differentiable harmonic plus noise model
The task of bandwidth extension addresses the generation of missing high
frequencies of audio signals based on knowledge of the low-frequency part of
the sound. This task applies to various problems, such as audio coding or audio
restoration. In this article, we focus on efficient bandwidth extension of
monophonic and polyphonic musical signals using a differentiable digital signal
processing (DDSP) model. Such a model is composed of a neural network part with
relatively few parameters trained to infer the parameters of a differentiable
digital signal processing model, which efficiently generates the output
full-band audio signal.
We first address bandwidth extension of monophonic signals, and then propose
two methods to explicitly handle polyphonic signals. The benefits of the
proposed models are first demonstrated on monophonic and polyphonic synthetic
data against a baseline and a deep-learning-based resnet model. The models are
next evaluated on recorded monophonic and polyphonic data, for a wide variety
of instruments and musical genres. We show that all proposed models surpass a
higher complexity deep learning model for an objective metric computed in the
frequency domain. A MUSHRA listening test confirms the superiority of the
proposed approach in terms of perceptual quality.

Comment: Accepted for publication in EURASIP Journal on Audio, Speech, and
Music Processing.
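The synthesis side of a harmonic plus noise model can be sketched as below; the parameterisation is a deliberately simplified stand-in for the paper's DDSP model, whose per-frame parameters would be inferred by the neural network rather than given directly:

```python
import numpy as np

def harmonic_plus_noise(f0, harm_amps, noise_gain, sr=16000):
    """Minimal harmonic-plus-noise synthesizer sketch (assumed parameters,
    not the paper's exact model): a sum of sinusoids at integer multiples
    of the fundamental, plus white noise scaled by noise_gain."""
    n = len(f0)
    phase = 2 * np.pi * np.cumsum(f0) / sr      # instantaneous phase
    out = np.zeros(n)
    for k, amp in enumerate(harm_amps, start=1):
        partial = amp * np.sin(k * phase)
        # Silence partials that would exceed Nyquist (anti-aliasing mask).
        out += np.where(k * f0 < sr / 2, partial, 0.0)
    rng = np.random.default_rng(0)
    out += noise_gain * rng.standard_normal(n)  # stand-in for filtered noise
    return out
```

In the DDSP setting, the appeal is that such a synthesizer is differentiable, so the small network predicting `f0`, the harmonic amplitudes, and the noise parameters can be trained end-to-end; a full-band signal is then generated cheaply from low-band-derived parameters.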
On the visual display of audio data using stacked graphs
Visualisation is an important tool for many steps of a research project. In this paper, we present several displays of audio data based on stacked graphs. Thanks to a careful use of layering, the proposed displays concisely convey a large amount of information. Many flavours are presented, each useful for a specific type of data, from spectral and chromatic data to multi-source and multi-channel data. We demonstrate that, for spectral and chromatic data, such displays offer a different trade-off from the traditional spectrogram and chromagram, emphasizing timing information over frequency.
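The geometry behind a stacked graph is simple: each layer's baseline is the cumulative sum of the layers beneath it, so the total height shows overall energy while each band's thickness shows its share over time. A small sketch of that stacking step (our own minimal example, not the paper's tool):

```python
import numpy as np

def stack_layers(band_energies):
    """band_energies: array (n_bands, n_frames) of non-negative values.

    Returns (bottoms, tops) per layer, ready to be drawn with a series
    of filled regions (e.g. matplotlib's fill_between or stackplot).
    """
    tops = np.cumsum(band_energies, axis=0)
    bottoms = np.vstack([np.zeros(band_energies.shape[1]), tops[:-1]])
    return bottoms, tops
```

Feeding per-band (or per-chroma) energies through this stacking and filling each band with its own colour yields the displays discussed in the paper, where the time axis dominates and frequency detail is traded for readability.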
Large-scale feature selection with Gaussian mixture models for the classification of high dimensional remote sensing images
A large-scale feature-selection wrapper is discussed for the classification of high-dimensional remote sensing images. An efficient implementation is proposed based on intrinsic properties of Gaussian mixture models and block matrices. The criterion function is split into two parts: one that is updated to test each candidate feature, and one that needs to be updated only once per feature-selection step. This split saves a large amount of computation for each test. The algorithm is implemented in C++ and integrated into the Orfeo Toolbox. It has been compared to other classification algorithms on two high-dimensional remote sensing images. Results show that the approach provides good classification accuracy with low computation time.
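The wrapper strategy itself can be sketched generically; the snippet below uses a plain nearest-centroid accuracy as the criterion and does not reproduce the paper's Gaussian-mixture block-matrix updates, which are where the actual speed-up comes from:

```python
import numpy as np

def centroid_accuracy(Xs, y):
    # A simple stand-in criterion: training accuracy of a
    # nearest-class-centroid classifier on the selected features.
    classes = np.unique(y)
    cents = np.array([Xs[y == c].mean(axis=0) for c in classes])
    d = ((Xs[:, None, :] - cents[None]) ** 2).sum(axis=2)
    return (classes[d.argmin(axis=1)] == y).mean()

def forward_selection(X, y, score, k):
    # Greedy wrapper: at each step, add the feature whose inclusion
    # maximises the criterion on the already-selected set.
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        best = max(remaining, key=lambda f: score(X[:, selected + [f]], y))
        selected.append(best)
        remaining.remove(best)
    return selected
```

Each selection step re-scores every remaining candidate; the paper's contribution is to make that inner re-scoring cheap for GMM classifiers by splitting the criterion into a per-candidate part and a once-per-step part.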
GMM-based classification from noisy features
We consider Gaussian mixture model (GMM)-based classification from noisy features, where the uncertainty over each feature is represented by a Gaussian distribution. For that purpose, we first propose a new GMM training and decoding criterion called log-likelihood integration which, as opposed to the conventional likelihood integration criterion, does not rely on any assumption regarding the distribution of the data. Secondly, we introduce two new Expectation-Maximization (EM) algorithms, one for each criterion, that allow GMMs to be learned directly from noisy features. We then evaluate and compare the behaviours of the two proposed algorithms on a categorization task with artificial data and with speech data corrupted by additive artificial noise, assuming the uncertainty parameters are known. Experiments demonstrate the superiority of the likelihood integration criterion with the newly proposed EM learning in all tested configurations, thus giving rise to a new family of learning approaches that are insensitive to the heterogeneity of the noise characteristics between testing and training data.
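For context, the conventional likelihood integration criterion has a well-known closed form for Gaussians: integrating a component density over the feature's Gaussian uncertainty simply adds the uncertainty variance to the component variance. A sketch for a diagonal-covariance GMM (the paper's new log-likelihood integration criterion and its EM algorithms are not reproduced here):

```python
import numpy as np

def gauss_logpdf(x, mu, var):
    # Log-density of a diagonal Gaussian, summed over dimensions.
    return -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var).sum()

def gmm_loglik_uncertain(x, x_var, weights, means, variances):
    # log sum_k w_k * N(x; mu_k, var_k + x_var):
    # the feature uncertainty x_var inflates each component's variance.
    comps = [np.log(w) + gauss_logpdf(x, m, v + x_var)
             for w, m, v in zip(weights, means, variances)]
    return np.logaddexp.reduce(comps)
```

With `x_var = 0` this reduces to the ordinary GMM log-likelihood; increasing the uncertainty flattens the density, lowering the likelihood of observations near the component means.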