Search CORE

1,827 research outputs found

Histogram of gradients of Time-Frequency Representations for Audio scene detection

Author: Gasso Gilles
Rakotomamonjy Alain
Publication venue
Publication date: 01/01/2015
Field of study

This paper addresses the problem of audio scenes classification and contributes to the state of the art by proposing a novel feature. We build this feature by considering histogram of gradients (HOG) of time-frequency representation of an audio scene. Contrarily to classical audio features like MFCC, we make the hypothesis that histogram of gradients are able to encode some relevant informations in a time-frequency {representation:} namely, the local direction of variation (in time and frequency) of the signal spectral power. In addition, in order to gain more invariance and robustness, histogram of gradients are locally pooled. We have evaluated the relevance of {the novel feature} by comparing its performances with state-of-the-art competitors, on several datasets, including a novel one that we provide, as part of our contribution. This dataset, that we make publicly available, involves

19

classes and contains about

900

minutes of audio scene recording. We thus believe that it may be the next standard dataset for evaluating audio scene classification algorithms. Our comparison results clearly show that our HOG-based features outperform its competitor

arXiv.org e-Print Archive

HAL - Normandie Université

Learning Audio Sequence Representations for Acoustic Event Classification

Author: Han Jing
Liu Ding
Qian Kun
Schuller Björn W.
Zhang Zixing
Publication venue
Publication date: 27/07/2017
Field of study

Acoustic Event Classification (AEC) has become a significant task for machines to perceive the surrounding auditory scene. However, extracting effective representations that capture the underlying characteristics of the acoustic events is still challenging. Previous methods mainly focused on designing the audio features in a 'hand-crafted' manner. Interestingly, data-learnt features have been recently reported to show better performance. Up to now, these were only considered on the frame-level. In this paper, we propose an unsupervised learning framework to learn a vector representation of an audio sequence for AEC. This framework consists of a Recurrent Neural Network (RNN) encoder and a RNN decoder, which respectively transforms the variable-length audio sequence into a fixed-length vector and reconstructs the input sequence on the generated vector. After training the encoder-decoder, we feed the audio sequences to the encoder and then take the learnt vectors as the audio sequence representations. Compared with previous methods, the proposed method can not only deal with the problem of arbitrary-lengths of audio streams, but also learn the salient information of the sequence. Extensive evaluation on a large-size acoustic event database is performed, and the empirical results demonstrate that the learnt audio sequence representation yields a significant performance improvement by a large margin compared with other state-of-the-art hand-crafted sequence features for AEC

arXiv.org e-Print Archive

OPUS Augsburg

Experiments on the DCASE Challenge 2016: Acoustic Scene Classification and Sound Event Detection in Real Life Recording

Author: Badlani Rohan
Elizalde Benjamin
Kumar Anurag
Lane Ian
Raj Bhiksha
Shah Ankit
Vincent Emmanuel
Publication venue
Publication date: 25/08/2016
Field of study

In this paper we present our work on Task 1 Acoustic Scene Classi- fication and Task 3 Sound Event Detection in Real Life Recordings. Among our experiments we have low-level and high-level features, classifier optimization and other heuristics specific to each task. Our performance for both tasks improved the baseline from DCASE: for Task 1 we achieved an overall accuracy of 78.9% compared to the baseline of 72.6% and for Task 3 we achieved a Segment-Based Error Rate of 0.76 compared to the baseline of 0.91

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

A Compact and Discriminative Feature Based on Auditory Summary Statistics for Acoustic Scene Classification

Author: Deng Shiwen
Han Jiqing
Song Hongwei
Publication venue: 'International Speech Communication Association'
Publication date: 10/04/2019
Field of study

One of the biggest challenges of acoustic scene classification (ASC) is to find proper features to better represent and characterize environmental sounds. Environmental sounds generally involve more sound sources while exhibiting less structure in temporal spectral representations. However, the background of an acoustic scene exhibits temporal homogeneity in acoustic properties, suggesting it could be characterized by distribution statistics rather than temporal details. In this work, we investigated using auditory summary statistics as the feature for ASC tasks. The inspiration comes from a recent neuroscience study, which shows the human auditory system tends to perceive sound textures through time-averaged statistics. Based on these statistics, we further proposed to use linear discriminant analysis to eliminate redundancies among these statistics while keeping the discriminative information, providing an extreme com-pact representation for acoustic scenes. Experimental results show the outstanding performance of the proposed feature over the conventional handcrafted features.Comment: Accepted as a conference paper of Interspeech 201

arXiv.org e-Print Archive

Crossref

Acoustic Scene Classification by Implicitly Identifying Distinct Sound Events

Author: Deng Shiwen
Du Zhihao
Han Jiqing
Song Hongwei
Publication venue: 'International Speech Communication Association'
Publication date: 26/04/2019
Field of study

In this paper, we propose a new strategy for acoustic scene classification (ASC) , namely recognizing acoustic scenes through identifying distinct sound events. This differs from existing strategies, which focus on characterizing global acoustical distributions of audio or the temporal evolution of short-term audio features, without analysis down to the level of sound events. To identify distinct sound events for each scene, we formulate ASC in a multi-instance learning (MIL) framework, where each audio recording is mapped into a bag-of-instances representation. Here, instances can be seen as high-level representations for sound events inside a scene. We also propose a MIL neural networks model, which implicitly identifies distinct instances (i.e., sound events). Furthermore, we propose two specially designed modules that model the multi-temporal scale and multi-modal natures of the sound events respectively. The experiments were conducted on the official development set of the DCASE2018 Task1 Subtask B, and our best-performing model improves over the official baseline by 9.4% (68.3% vs 58.9%) in terms of classification accuracy. This study indicates that recognizing acoustic scenes by identifying distinct sound events is effective and paves the way for future studies that combine this strategy with previous ones.Comment: code URL typo, code is available at https://github.com/hackerekcah/distinct-events-asc.gi

arXiv.org e-Print Archive

Crossref