Exploring convolutional, recurrent, and hybrid deep neural networks for speech and music detection in a large audio dataset
Audio signals represent a wide diversity of acoustic events, from background environmental noise to spoken
communication. Machine learning models such as neural networks have already been proposed for audio signal
modeling, where recurrent structures can take advantage of temporal dependencies. This work aims to study the
implementation of several neural network-based systems for speech and music event detection over a collection of
77,937 10-second audio segments (216 h), selected from the Google AudioSet dataset. These segments belong to
YouTube videos and have been represented as mel-spectrograms. We propose and compare two approaches. The
first one is the training of two different neural networks, one for speech detection and another for music detection.
The second approach consists of training a single neural network to tackle both tasks at the same time. The studied
architectures include fully connected, convolutional and LSTM (long short-term memory) recurrent networks.
Comparative results are provided in terms of classification performance and model complexity. We would like to
highlight the performance of convolutional architectures, especially in combination with an LSTM stage. The hybrid
convolutional-LSTM models achieve the best overall results (85% accuracy) in the three proposed tasks. Furthermore,
a distractor analysis of the results has been carried out in order to identify which events in the ontology are the most
harmful for the performance of the models, showing some difficult scenarios for the detection of music and speech.
This work has been supported by project “DSSL: Redes Profundas y Modelos
de Subespacios para Deteccion y Seguimiento de Locutor, Idioma y
Enfermedades Degenerativas a partir de la Voz” (TEC2015-68172-C2-1-P),
funded by the Ministry of Economy and Competitivity of Spain and FEDE
A Multi-Resolution CRNN-Based Approach for Semi-Supervised Sound Event Detection in DCASE 2020 Challenge
Sound Event Detection is a task of rising relevance in recent years in the field of audio signal processing, due to the creation of specific datasets such as Google AudioSet or DESED (Domestic Environment Sound Event Detection) and the introduction of competitive evaluations like the DCASE Challenge (Detection and Classification of Acoustic Scenes and Events). The different categories of acoustic events can present diverse temporal and spectral characteristics. However, most approaches use a fixed time-frequency resolution to represent the audio segments. This work proposes a multi-resolution analysis for feature extraction in Sound Event Detection, hypothesizing that different resolutions can be more adequate for the detection of different sound event categories, and that combining the information provided by multiple resolutions could improve the performance of Sound Event Detection systems. Experiments are carried out over the DESED dataset in the context of the DCASE 2020 Challenge, concluding that the combination of up to 5 resolutions allows a neural network-based system to obtain better results than single-resolution models in terms of event-based F1-score in every event category and in terms of PSDS (Polyphonic Sound Detection Score). Furthermore, we analyze the impact of score thresholding in the computation of F1-score results, finding that the standard value of 0.5 is suboptimal and proposing an alternative strategy based on the use of a specific threshold for each event category, which obtains further improvements in performance.
This work was supported in part by the Project Deep Speech for Forensics and Security (DSForSec) under Grant RTI2018-098091-B-I00, in part by the Ministry of Science, Innovation and Universities of Spain, and in part by the European Regional Development Fund (ERDF
An analysis of sound event detection under acoustic degradation using multi-resolution systems
The Sound Event Detection task aims to determine the temporal locations of acoustic events in audio clips. In recent years, the relevance of this field has been rising due to the introduction of datasets such as Google AudioSet or DESED (Domestic Environment Sound Event Detection) and competitive evaluations like the DCASE Challenge (Detection and Classification of Acoustic Scenes and Events). In this paper, we analyze the performance of Sound Event Detection systems under diverse artificial acoustic conditions such as high- or low-pass filtering and clipping or dynamic range compression, as well as under a scenario of high overlap between events. For this purpose, the audio was obtained from the Evaluation subset of the DESED dataset, whereas the systems were trained in the context of the DCASE Challenge 2020 Task 4. Our systems are based upon the challenge baseline, which consists of a Convolutional-Recurrent Neural Network trained using the Mean Teacher method, and they employ a multiresolution approach which is able to improve the Sound Event Detection performance through the use of several resolutions during the extraction of Mel-spectrogram features. We provide insights on the benefits of this multiresolution approach in different acoustic settings, and compare the performance of the single-resolution systems in the aforementioned scenarios when using different resolutions. Furthermore, we complement the analysis of the performance in the high-overlap scenario by assessing the degree of overlap of each event category in sound event detection datasets.
This research and the APC were supported by project DSForSec (grant number RTI2018-098091-B-I00) funded by the Ministry of Science, Innovation and Universities of Spain and the European Regional Development Fund (ERDF
Measures of fine tuning
Fine-tuning criteria are frequently used to place upper limits on the masses
of superpartners in supersymmetric extensions of the standard model. However,
commonly used prescriptions for quantifying naturalness have some important
shortcomings. Motivated by this, we propose new criteria for quantifying fine
tuning that can be used to place upper limits on superpartner masses with
greater fidelity. In addition, our analysis attempts to make explicit the
assumptions implicit in quantifications of naturalness. We apply our criteria
to the minimal supersymmetric extension of the standard model, and we find that
the scale of supersymmetry breaking can be larger than previous methods
indicate.
Comment: 15 pages, LaTeX, 5 figures uuencoded, gz-compressed file. Minor
revisions bring the archived manuscript into agreement with published versio
Optimal Control of Underactuated Mechanical Systems: A Geometric Approach
In this paper, we consider a geometric formalism for optimal control of
underactuated mechanical systems. Our techniques are an adaptation of the
classical Skinner and Rusk approach for the case of Lagrangian dynamics with
higher-order constraints. We study a regular case where it is possible to
establish a symplectic framework and, as a consequence, to obtain a unique
vector field determining the dynamics of the optimal control problem. These
developments will allow us to develop a new class of geometric integrators
based on discrete variational calculus.
Comment: 20 pages, 2 figure
Direct and sequential radiative three-body reaction rates at low temperatures
We investigate the low-temperature reaction rates for radiative capture
processes of three particles. We compare direct and sequential capture
mechanisms and rates using realistic phenomenological parametrizations of the
corresponding photodissociation cross sections. Energy conservation prohibits
sequential capture for energies smaller than that of the intermediate two-body
structure. A finite width or a finite temperature allows this capture
mechanism. We study generic effects of positions and widths of two- and
three-body resonances for very low temperatures. We focus on nuclear reactions
relevant for astrophysics, and we illustrate with realistic estimates for the
-- and -- radiative capture
processes. The direct capture mechanism leads to reaction rates which for
temperatures smaller than 0.1 GK can be several orders of magnitude larger than
those of the NACRE compilation.
Comment: To be published in European Physical Journal