6,227 research outputs found

    Exploring convolutional, recurrent, and hybrid deep neural networks for speech and music detection in a large audio dataset

    Full text link
    Audio signals represent a wide diversity of acoustic events, from background environmental noise to spoken communication. Machine learning models such as neural networks have already been proposed for audio signal modeling, where recurrent structures can take advantage of temporal dependencies. This work aims to study the implementation of several neural network-based systems for speech and music event detection over a collection of 77,937 10-second audio segments (216 h), selected from the Google AudioSet dataset. These segments belong to YouTube videos and have been represented as mel-spectrograms. We propose and compare two approaches. The first one is the training of two different neural networks, one for speech detection and another for music detection. The second approach consists on training a single neural network to tackle both tasks at the same time. The studied architectures include fully connected, convolutional and LSTM (long short-term memory) recurrent networks. Comparative results are provided in terms of classification performance and model complexity. We would like to highlight the performance of convolutional architectures, specially in combination with an LSTM stage. The hybrid convolutional-LSTM models achieve the best overall results (85% accuracy) in the three proposed tasks. Furthermore, a distractor analysis of the results has been carried out in order to identify which events in the ontology are the most harmful for the performance of the models, showing some difficult scenarios for the detection of music and speechThis work has been supported by project “DSSL: Redes Profundas y Modelos de Subespacios para Deteccion y Seguimiento de Locutor, Idioma y Enfermedades Degenerativas a partir de la Voz” (TEC2015-68172-C2-1-P), funded by the Ministry of Economy and Competitivity of Spain and FEDE

    A Multi-Resolution CRNN-Based Approach for Semi-Supervised Sound Event Detection in DCASE 2020 Challenge

    Full text link
    Sound Event Detection is a task with a rising relevance over the recent years in the field of audio signal processing, due to the creation of specific datasets such as Google AudioSet or DESED (Domestic Environment Sound Event Detection) and the introduction of competitive evaluations like the DCASE Challenge (Detection and Classification of Acoustic Scenes and Events). The different categories of acoustic events can present diverse temporal and spectral characteristics. However, most approaches use a fixed time-frequency resolution to represent the audio segments. This work proposes a multi-resolution analysis for feature extraction in Sound Event Detection, hypothesizing that different resolutions can be more adequate for the detection of different sound event categories, and that combining the information provided by multiple resolutions could improve the performance of Sound Event Detection systems. Experiments are carried out over the DESED dataset in the context of the DCASE 2020 Challenge, concluding that the combination of up to 5 resolutions allows a neural network-based system to obtain better results than single-resolution models in terms of event-based F1-score in every event category and in terms of PSDS (Polyphonic Sound Detection Score). Furthermore, we analyze the impact of score thresholding in the computation of F1-score results, finding that the standard value of 0.5 is suboptimal and proposing an alternative strategy based in the use of a specific threshold for each event category, which obtains further improvements in performanceThis work was supported in part by the Project Deep Speech for Forensics and Security (DSForSec) under Grant RTI2018-098091-B-I00, in part by the Ministry of Science, Innovation and Universities of Spain, and in part by the European Regional Development Fund (ERDF

    An analysis of sound event detection under acoustic degradation using multi-resolution systems

    Full text link
    The Sound Event Detection task aims to determine the temporal locations of acoustic events in audio clips. In recent years, the relevance of this field is rising due to the introduction of datasets such as Google AudioSet or DESED (Domestic Environment Sound Event Detection) and competitive evaluations like the DCASE Challenge (Detection and Classification of Acoustic Scenes and Events). In this paper, we analyze the performance of Sound Event Detection systems under diverse artificial acoustic conditions such as high-or low-pass filtering and clipping or dynamic range compression, as well as under an scenario of high overlap between events. For this purpose, the audio was obtained from the Evaluation subset of the DESED dataset, whereas the systems were trained in the context of the DCASE Challenge 2020 Task 4. Our systems are based upon the challenge baseline, which consists of a Convolutional-Recurrent Neural Network trained using the Mean Teacher method, and they employ a multiresolution approach which is able to improve the Sound Event Detection performance through the use of several resolutions during the extraction of Mel-spectrogram features. We provide insights on the benefits of this multiresolution approach in different acoustic settings, and compare the performance of the single-resolution systems in the aforementioned scenarios when using different resolutions. Furthermore, we complement the analysis of the performance in the high-overlap scenario by assessing the degree of overlap of each event category in sound event detection datasetsThis research and the APC were supported by project DSForSec (grant number RTI2018- 098091-B-I00) funded by the Ministry of Science, Innovation and Universities of Spain and the European Regional Development Fund (ERDF

    Measures of fine tuning

    Get PDF
    Fine-tuning criteria are frequently used to place upper limits on the masses of superpartners in supersymmetric extensions of the standard model. However, commonly used prescriptions for quantifying naturalness have some important shortcomings. Motivated by this, we propose new criteria for quantifying fine tuning that can be used to place upper limits on superpartner masses with greater fidelity. In addition, our analysis attempts to make explicit the assumptions implicit in quantifications of naturalness. We apply our criteria to the minimal supersymmetric extension of the standard model, and we find that the scale of supersymmetry breaking can be larger than previous methods indicate.Comment: 15 pages, LaTex, 5 figures uuencoded, gz-compressed file. Minor revisions bring the archived manuscript into agreement with published versio

    Optimal Control of Underactuated Mechanical Systems: A Geometric Approach

    Full text link
    In this paper, we consider a geometric formalism for optimal control of underactuated mechanical systems. Our techniques are an adaptation of the classical Skinner and Rusk approach for the case of Lagrangian dynamics with higher-order constraints. We study a regular case where it is possible to establish a symplectic framework and, as a consequence, to obtain a unique vector field determining the dynamics of the optimal control problem. These developments will allow us to develop a new class of geometric integrators based on discrete variational calculus.Comment: 20 pages, 2 figure

    Direct and sequential radiative three-body reaction rates at low temperatures

    Get PDF
    We investigate the low-temperature reaction rates for radiative capture processes of three particles. We compare direct and sequential capture mechanisms and rates using realistic phenomenological parametrizations of the corresponding photodissociation cross sections.Energy conservation prohibits sequential capture for energies smaller than that of the intermediate two-body structure. A finite width or a finite temperature allows this capture mechanism. We study generic effects of positions and widths of two- and three-body resonances for very low temperatures. We focus on nuclear reactions relevant for astrophysics, and we illustrate with realistic estimates for the α\alpha-α\alpha-α\alpha and α\alpha-α\alpha-nn radiative capture processes. The direct capture mechanism leads to reaction rates which for temperatures smaller than 0.1 GK can be several orders of magnitude larger than those of the NACRE compilation.Comment: To be published in European Physical Journal
    corecore