Search CORE

6,261 research outputs found

Acoustic Scene Classification by Implicitly Identifying Distinct Sound Events

Author: Deng Shiwen
Du Zhihao
Han Jiqing
Song Hongwei
Publication venue: 'International Speech Communication Association'
Publication date: 26/04/2019
Field of study

In this paper, we propose a new strategy for acoustic scene classification (ASC) , namely recognizing acoustic scenes through identifying distinct sound events. This differs from existing strategies, which focus on characterizing global acoustical distributions of audio or the temporal evolution of short-term audio features, without analysis down to the level of sound events. To identify distinct sound events for each scene, we formulate ASC in a multi-instance learning (MIL) framework, where each audio recording is mapped into a bag-of-instances representation. Here, instances can be seen as high-level representations for sound events inside a scene. We also propose a MIL neural networks model, which implicitly identifies distinct instances (i.e., sound events). Furthermore, we propose two specially designed modules that model the multi-temporal scale and multi-modal natures of the sound events respectively. The experiments were conducted on the official development set of the DCASE2018 Task1 Subtask B, and our best-performing model improves over the official baseline by 9.4% (68.3% vs 58.9%) in terms of classification accuracy. This study indicates that recognizing acoustic scenes by identifying distinct sound events is effective and paves the way for future studies that combine this strategy with previous ones.Comment: code URL typo, code is available at https://github.com/hackerekcah/distinct-events-asc.gi

arXiv.org e-Print Archive

Crossref

Recommended from our members

Signal separation of musical instruments: simulation-based methods for musical signal decomposition and transcription

Author: Walmsley Paul Jospeh
Publication venue: University of Cambridge
Publication date: 29/05/2001
Field of study

This thesis presents techniques for the modelling of musical signals, with particular regard to monophonic and polyphonic pitch estimation. Musical signals are modelled as a set of notes, each comprising of a set of harmonically-related sinusoids. An hierarchical model is presented that is very general and applicable to any signal that can be decomposed as the sum of basis functions. Parameter estimation is posed within a Bayesian framework, allowing for the incorporation of prior information about model parameters. The resulting posterior distribution is of variable dimension and so reversible jump MCMC simulation techniques are employed for the parameter estimation task. The extension of the model to time-varying signals with high posterior correlations between model parameters is described. The parameters and hyperparameters of several frames of data are estimated jointly to achieve a more robust detection. A general model for the description of time-varying homogeneous and heterogeneous multiple component signals is developed, and then applied to the analysis of musical signals. The importance of high level musical and perceptual psychological knowledge in the formulation of the model is highlighted, and attention is drawn to the limitation of pure signal processing techniques for dealing with musical signals. Gestalt psychological grouping principles motivate the hierarchical signal model, and component identifiability is considered in terms of perceptual streaming where each component establishes its own context. A major emphasis of this thesis is the practical application of MCMC techniques, which are generally deemed to be too slow for many applications. Through the design of efficient transition kernels highly optimised for harmonic models, and by careful choice of assumptions and approximations, implementations approaching the order of realtime are viable.Engineering and Physical Sciences Research Counci

Apollo (Cambridge)

Machine Understanding of Human Behavior

Author: Huang Thomas
Nijholt Anton
Pantic Maja
Pentland Alex
Publication venue: University of Twente, Centre for Telematics and Information Technology (CTIT)
Publication date: 01/01/2007
Field of study

A widely accepted prediction is that computing will move to the background, weaving itself into the fabric of our everyday living spaces and projecting the human user into the foreground. If this prediction is to come true, then next generation computing, which we will call human computing, should be about anticipatory user interfaces that should be human-centered, built for humans based on human models. They should transcend the traditional keyboard and mouse to include natural, human-like interactive functions including understanding and emulating certain human behaviors such as affective and social signaling. This article discusses a number of components of human behavior, how they might be integrated into computers, and how far we are from realizing the front end of human computing, that is, how far are we from enabling computers to understand human behavior

University of Twente Research Information

Robust Sound Event Classification using Deep Neural Networks

Author: McLoughlin Ian Vince
Song Yan
Xiao Wei
Xie Zhi-Peng
Zhang Hao-min
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/03/2015
Field of study

The automatic recognition of sound events by computers is an important aspect of emerging applications such as automated surveillance, machine hearing and auditory scene understanding. Recent advances in machine learning, as well as in computational models of the human auditory system, have contributed to advances in this increasingly popular research field. Robust sound event classification, the ability to recognise sounds under real-world noisy conditions, is an especially challenging task. Classification methods translated from the speech recognition domain, using features such as mel-frequency cepstral coefficients, have been shown to perform reasonably well for the sound event classification task, although spectrogram-based or auditory image analysis techniques reportedly achieve superior performance in noise. This paper outlines a sound event classification framework that compares auditory image front end features with spectrogram image-based front end features, using support vector machine and deep neural network classifiers. Performance is evaluated on a standard robust classification task in different levels of corrupting noise, and with several system enhancements, and shown to compare very well with current state-of-the-art classification techniques

Crossref

Kent Academic Repository

The circuit architecture of cortical multisensory processing: Distinct functions jointly operating within a common anatomical network

Author: Lansink C.S.
Meijer G.T.
Mertens P.E.C.
Olcese U.
Pennartz C.M.A.
Publication venue: 'Elsevier BV'
Publication date: 01/03/2019
Field of study

International Migration, Integration and Social Cohesion online publications

Contributions of local speech encoding and functional connectivity to audio-visual speech perception

Author: Abrams
Alexandrou
Alho
Arnal
Arnal
Arnal
Ashburner
Auksztulewicz
Beauchamp
Belitski
Bernstein
Besle
Besserve
Besserve
Binder
Bornkessel-Schlesewsky
Bourguignon
Brainard
Callan
Callan
Callan
Canolty
Chandrasekaran
Chandrasekaran
Chennu
Chu
Clos
Crosse
Ding
Ding
Du
Evans
Ferstl
Fonteneau
Freedman
Ghazanfar
Ghazanfar
Giraud
Gow
Grant
Greenberg
Gross
Guediche
Hasson
Hasson
Hasson
Heim
Hickok
Hickok
Hipp
Horowitz-Kraus
Ince
Ince
Ince
Kandylaki
Kayser
Kayser
Kayser
Keitel
Krieger-Redwood
Kriegeskorte
Lakatos
Lee
Maris
Massey
McGettigan
Meister
Mesgarani
Morillon
Morís Fernández
Nath
Ng
Ohshiro
Oostenveld
Osnes
Panzeri
Park
Park
Peelle
Peelle
Pickering
Poeppel
Poeppel
Pola
Pouget
Price
Rauschecker
Riedel
Ross
Schepers
Schneidman
Schreiber
Schroeder
Schroeder
Schwartz
Schwartz
Skipper
Sohoglu
Sumby
Tavano
Thorne
van Atteveldt
van Atteveldt
van Wassenhove
Vetter
Vicente
Wibral
Wild
Wilson
Winkler
Wright
Yarkoni
Zion Golumbic
Zion Golumbic
Publication venue: eLife Sciences Publications
Publication date: 01/01/2017
Field of study

Seeing a speaker’s face enhances speech intelligibility in adverse environments. We investigated the underlying network mechanisms by quantifying local speech representations and directed connectivity in MEG data obtained while human participants listened to speech of varying acoustic SNR and visual context. During high acoustic SNR speech encoding by temporally entrained brain activity was strong in temporal and inferior frontal cortex, while during low SNR strong entrainment emerged in premotor and superior frontal cortex. These changes in local encoding were accompanied by changes in directed connectivity along the ventral stream and the auditory-premotor axis. Importantly, the behavioral benefit arising from seeing the speaker’s face was not predicted by changes in local encoding but rather by enhanced functional connectivity between temporal and inferior frontal cortex. Our results demonstrate a role of auditory-frontal interactions in visual speech representations and suggest that functional connectivity along the ventral pathway facilitates speech comprehension in multisensory environments

Crossref

HAL AMU

ZENODO

Dryad Digital Repository (Duke University)

Publications at Bielefeld University

Electronic Archiving System

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Enlighten

FigShare