10 research outputs found

    Acoustic Event Detection from Weakly Labeled Data Using Auditory Salience

    Acoustic Event Detection (AED) is an important machine listening task which, in recent years, has been addressed using common machine learning methods such as Non-negative Matrix Factorization (NMF) or deep learning. However, most of these approaches do not take into account the way the human auditory system detects salient sounds. In this work, we propose a method for AED using weakly labeled data that combines a Non-negative Matrix Factorization model with a salience model based on predictive coding in the form of Kalman filters. We show that models of auditory perception, particularly auditory salience, can be successfully incorporated into existing AED methods and improve their performance on rare event detection. We evaluate the method on Task 2 of the DCASE 2017 Challenge.

    Frame-Wise dynamic threshold based polyphonic acoustic event detection

    Acoustic event detection, the determination of the acoustic event type and the localisation of the event, has been widely used in real-world applications. Many works adopt multi-label classification techniques to perform polyphonic acoustic event detection, using a global threshold to detect the active acoustic events. However, the global threshold has to be set manually and is highly dependent on the database being tested. To deal with this, we replace the fixed threshold method with a frame-wise dynamic threshold approach. Two novel approaches, namely contour-based and regressor-based dynamic thresholds, are proposed in this work. Experimental results on the popular TUT Acoustic Scenes 2016 database of polyphonic events demonstrate the superior performance of the proposed approaches.

    DCASE 2017 Challenge setup: Tasks, datasets and baseline system

    The DCASE 2017 Challenge consists of four tasks: acoustic scene classification, detection of rare sound events, sound event detection in real-life audio, and large-scale weakly supervised sound event detection for smart cars. This paper presents the setup of these tasks: task definition, dataset, experimental setup, and baseline system results on the development dataset. The baseline systems for all tasks rely on the same implementation using a multilayer perceptron and log mel-energies, but differ in the structure of the output layer and the decision-making process, as well as in the evaluation of system output using task-specific metrics.

    Random Regression Forests for Acoustic Event Detection and Classification

    Despite the success of the automatic speech recognition framework in its own application field, its adaptation to the problem of acoustic event detection has met with limited success. In this paper, instead of treating the problem similarly to the segmentation and classification tasks in speech recognition, we pose it as a regression task and propose an approach based on random forest regression. Furthermore, event localization in time can be efficiently handled as a joint problem. We first decompose the training audio signals into multiple interleaved superframes, which are annotated with the corresponding event class labels and their displacements to the temporal onsets and offsets of the events. For a specific event category, a random-forest regression model is learned using the displacement information. Given an unseen superframe, the learned regressor outputs continuous estimates of the onset and offset locations of the events. To deal with multiple event categories, a superframe-wise recognition phase is performed before the category-specific regression phase, to reject the background superframes and to classify the event superframes into different event categories. While jointly posing event detection and localization as a regression problem is novel, the superior performance on two databases, ITC-Irst and UPC-TALP, demonstrates the efficiency and potential of the proposed approach.

    De l'indexation d'évènements dans des films (application à la détection de violence)

    In this thesis, we focus on the detection of semantic concepts in "Hollywood" movies using audio and video concepts, in the applied context of violence detection. Our work follows two main lines: the detection of violent audio concepts, such as gunshots and explosions, and the detection of violence, initially based only on audio and then based on both audio and video. In the context of audio concept detection, we first highlight a generalisation problem, and we show that this problem is probably due to a statistical divergence between the audio features extracted from different movies. To solve it, we propose to use the concept of audio words, so as to reduce this variability by grouping samples by similarity, combined with contextual Bayesian networks. The results are very encouraging, and a comparison with the state of the art obtained on the same data shows that the results are equivalent. The resulting system can either be very robust to the applied threshold, by using early fusion of features, or offer a wide variety of operating points. We finally propose an adaptation of the factor analysis scheme developed in the context of speaker recognition, and show that its integration into our system improves the results. In the context of violence detection, we present the MediaEval Affect Task 2012 evaluation campaign, which aims to bring together teams working on the topic of violence detection. We then propose three systems for detecting violence: two based only on audio, the first using a TF-IDF description and the second integrating the audio concept detection system into violence detection, and a multimodal system that uses graph structure learning in Bayesian networks. The performance obtained with the different systems, and a comparison with the systems developed within MediaEval, show that we are at the level of the state of the art, and reveal the complexity of such systems.

    Proceedings of the Detection and Classification of Acoustic Scenes and Events 2017 Workshop (DCASE2017)


    Desenvolvimentos sobre métodos de previsão, medição, limitação e avaliação em ruído e vibração ambiente

    Doctoral thesis, Marine, Earth and Environmental Sciences (Environmental Science and Technology, Acoustics), Faculdade de Ciências e Tecnologia, Univ. do Algarve, 2011. The thesis addresses environmental noise and vibration and, within that scope, presents developments in: 1. Prediction, namely: harmonisation of the upper-bound probabilities of meteorological occurrence for Portugal; the influence of the spectrum on the design of acoustic barriers; the importance and concept of Equivalent Continuous Velocity; an innovative method for determining the Acoustic Permission Area for point sources; an alternative method for the predictive calculation of tonal characteristics; an expeditious method for determining the Acoustic Influence Area of fixed sources, roads (noise) and railways (noise and vibration); and a method for predicting reverberation time for irregular sound absorption (the software applications developed are available at http://doutoramento.schiu.com/); 2. Measurement, namely: an innovative method for assessing the regularity of road vehicle pass-bys for determining LMax; an innovative method for traffic counting and speed measurement that distinguishes vehicle type using sound level meters; the acoustic specificities of expansion joints and of railway monitoring (including uncertainty calculation); the variable effectiveness of an acoustic barrier over the course of the day; the representativeness of noise measurements; and the importance of, and a method for, characterising the airborne noise of a tapping machine, exemplified using the dodecahedral sound source developed as part of the thesis; 3. Prediction relating measurements, namely: a relational method using systems of equations that relate the influencing variables; and a method for predicting vibration by determining in situ the vibration transfer functions of the sites; 4. Limitation, namely: substantiated suggestions for supplementing and correcting DL 9/2007 and DL 96/2008, as well as a suggested good-practice rule for low-frequency noise and for limiting environmental vibration; 5. Assessment, namely: criteria for making the qualification and comparative analysis of impacts (noise and vibration) more objective, as well as suggestions for supplementing and correcting the IPAC Representativeness Criteria and the LNEC Sampling Criteria.

    Proceedings of the Detection and Classification of Acoustic Scenes and Events 2016 Workshop (DCASE2016)
