5,098 research outputs found

    SubSpectralNet - Using Sub-Spectrogram based Convolutional Neural Networks for Acoustic Scene Classification

    Acoustic Scene Classification (ASC) is one of the core research problems in the field of Computational Sound Scene Analysis. In this work, we present SubSpectralNet, a novel model which captures discriminative features by incorporating frequency-band-level differences to model soundscapes. Using mel-spectrograms, we propose the idea of taking band-wise crops of the input time-frequency representation and training a convolutional neural network (CNN) on these sub-spectrograms. We also propose a modification to the training method for more efficient learning of the CNN models. We first motivate the use of sub-spectrograms through intuitive and statistical analyses, and then develop a sub-spectrogram-based CNN architecture for ASC. The system is evaluated on the public ASC development dataset provided for the "Detection and Classification of Acoustic Scenes and Events" (DCASE) 2018 Challenge. Our best model achieves an improvement of +14% in classification accuracy with respect to the DCASE 2018 baseline system. Code and figures are available at https://github.com/ssrp/SubSpectralNet
    Comment: Accepted to the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2019
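    A minimal PyTorch sketch of the band-wise cropping idea described in this abstract. The mel resolution (40 bins), band size (20), band hop (10), layer widths, and the combination of per-band and global classifiers are illustrative assumptions, not the paper's exact SubSpectralNet configuration.

```python
# Sketch: split a mel-spectrogram into overlapping sub-spectrogram bands and
# train a small CNN per band, plus a global classifier over all band features.
import torch
import torch.nn as nn

class SubSpectralCNN(nn.Module):
    def __init__(self, n_mels=40, band_size=20, band_hop=10, n_classes=10):
        super().__init__()
        self.band_size, self.band_hop = band_size, band_hop
        self.n_bands = (n_mels - band_size) // band_hop + 1
        # One small CNN per sub-spectrogram band (sizes are illustrative).
        self.band_cnns = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(1, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )
            for _ in range(self.n_bands)
        ])
        # Band-level heads plus a global head over the concatenated band features.
        self.band_heads = nn.ModuleList(
            [nn.Linear(32, n_classes) for _ in range(self.n_bands)])
        self.global_head = nn.Linear(32 * self.n_bands, n_classes)

    def forward(self, x):  # x: (batch, 1, n_mels, time)
        feats, band_logits = [], []
        for b, (cnn, head) in enumerate(zip(self.band_cnns, self.band_heads)):
            lo = b * self.band_hop
            crop = x[:, :, lo:lo + self.band_size, :]  # band-wise crop
            f = cnn(crop).flatten(1)                   # (batch, 32)
            feats.append(f)
            band_logits.append(head(f))                # per-band prediction
        global_logits = self.global_head(torch.cat(feats, dim=1))
        return global_logits, band_logits

# Usage: a batch of 8 mel-spectrograms, 40 mel bins x 500 frames.
model = SubSpectralCNN()
global_logits, band_logits = model(torch.randn(8, 1, 40, 500))
```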

    Acoustic Scene Classification by Implicitly Identifying Distinct Sound Events

    In this paper, we propose a new strategy for acoustic scene classification (ASC), namely recognizing acoustic scenes by identifying distinct sound events. This differs from existing strategies, which focus on characterizing the global acoustical distribution of the audio or the temporal evolution of short-term audio features, without analysis down to the level of sound events. To identify distinct sound events for each scene, we formulate ASC in a multi-instance learning (MIL) framework, where each audio recording is mapped into a bag-of-instances representation. Here, instances can be seen as high-level representations of sound events inside a scene. We also propose an MIL neural network model, which implicitly identifies distinct instances (i.e., sound events). Furthermore, we propose two specially designed modules that model the multi-temporal-scale and multi-modal natures of the sound events, respectively. The experiments were conducted on the official development set of DCASE2018 Task1 Subtask B, and our best-performing model improves over the official baseline by 9.4% (68.3% vs. 58.9%) in terms of classification accuracy. This study indicates that recognizing acoustic scenes by identifying distinct sound events is effective and paves the way for future studies that combine this strategy with previous ones.
    Comment: code URL typo; code is available at https://github.com/hackerekcah/distinct-events-asc.gi
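    A minimal PyTorch sketch of the bag-of-instances idea, assuming each recording has already been turned into T instance embeddings (e.g., segment-level CNN features). The attention-weighted pooling, dimensions, and class count are illustrative assumptions; the paper's actual MIL network and its multi-temporal-scale and multi-modal modules are not reproduced here.

```python
# Sketch: ASC as multi-instance learning. Instance-level scores act as implicit
# "sound event" evidence, and a pooling step aggregates them into a bag (scene)
# prediction.
import torch
import torch.nn as nn

class MILClassifier(nn.Module):
    def __init__(self, feat_dim=128, n_classes=10):
        super().__init__()
        self.instance_head = nn.Linear(feat_dim, n_classes)   # instance-level scores
        self.attention = nn.Sequential(                        # one weight per instance
            nn.Linear(feat_dim, 64), nn.Tanh(), nn.Linear(64, 1))

    def forward(self, bag):                        # bag: (batch, T, feat_dim)
        scores = self.instance_head(bag)           # (batch, T, n_classes)
        attn = torch.softmax(self.attention(bag), dim=1)  # (batch, T, 1)
        bag_logits = (attn * scores).sum(dim=1)    # attention-weighted pooling
        return bag_logits, scores, attn.squeeze(-1)

# Usage: 4 recordings, each represented as a bag of 20 instance embeddings.
model = MILClassifier()
bag_logits, instance_scores, weights = model(torch.randn(4, 20, 128))
```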

    Observational Constraints on Open Inflation Models

    We discuss observational constraints on models of open inflation. Current data from large-scale structure and the cosmic microwave background prefer models with blue spectra and/or Omega_0 >= 0.3--0.5. Models with minimal anisotropy at large angles are strongly preferred.
    Comment: 4 pages, RevTeX, with 2 postscript figures included. Second figure corrected

    Advanced automatic mixing tools for music

    PhD thesis. This thesis presents research on several independent systems that, when combined, can generate an automatic sound mix from an unknown set of multi-channel inputs. The research explores the possibility of reproducing the mixing decisions of a skilled audio engineer with minimal or no human interaction, and is restricted to non-time-varying mixes for large-room acoustics. It has applications in live music concerts, remote mixing, recording and post-production, as well as live mixing for interactive scenes. Currently, automated mixers are capable of saving a set of static mix scenes that can be loaded for later use, but they lack the ability to adapt to a different room or to a different set of inputs; in other words, they lack the ability to make mixing decisions automatically. The automatic mixing research presented here distinguishes between the engineering and the subjective contributions to a mix. It aims to automate the technical tasks related to audio mixing while freeing the audio engineer to perform the fine-tuning involved in generating an aesthetically pleasing sound mix. Although the system mainly deals with the technical constraints involved in generating an audio mix, it takes advantage of common practices performed by sound engineers whenever possible. The system also makes use of inter-dependent channel information to control signal-processing tasks while aiming to maintain system stability at all times. A working implementation of the system is described, and a subjective evaluation comparing a human mix with the automatic mix is used to measure the success of the automatic mixing tools.
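    A minimal sketch of a cross-adaptive gain stage in the spirit of this research: each channel's fader is driven by features of all channels (here, simple RMS loudness balanced towards the multitrack's geometric-mean level, with a headroom-preserving normalisation). The feature choice and target rule are illustrative assumptions rather than the thesis's actual method.

```python
# Sketch: set per-channel gains from inter-dependent channel information
# (every gain depends on the loudness of every track).
import numpy as np

def auto_gains(tracks, eps=1e-12):
    """tracks: list of mono signals (1-D NumPy arrays). Returns linear gains."""
    rms = np.array([np.sqrt(np.mean(t ** 2)) + eps for t in tracks])
    target = np.exp(np.mean(np.log(rms)))   # geometric-mean loudness of the multitrack
    gains = target / rms                    # quieter tracks boosted, louder ones attenuated
    return gains / np.max(gains)            # keep all gains <= 1 to preserve headroom

def auto_mix(tracks):
    gains = auto_gains(tracks)
    n = min(len(t) for t in tracks)
    return sum(g * t[:n] for g, t in zip(gains, tracks))

# Usage: three synthetic channels at different levels, one second at 44.1 kHz.
rng = np.random.default_rng(0)
tracks = [a * rng.standard_normal(44100) for a in (0.05, 0.2, 0.8)]
mix = auto_mix(tracks)
```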