Search CORE

12,746 research outputs found

A Detailed Investigation into Low-Level Feature Detection in Spectrogram Images

Author: Belhumeur
Belkin
Bengio
Bishop
Chen
Di Martino
Duda
Duda
Egan
Fawcett
Fukunaga
Ghosh
Gillespie
Gonzalez
Hinton
Jia
Jolliffe
Karhunen
Kendall
Koenig
Kohonen
Kohonen
Lampert
Law
Le
Lee
Mellinger
Mitchell
Mitchell
Morrissey
Nayar
Paris
Pearson
Potter
Quinn
Rife
Scharf
Simon E.M. O’Keefe
Thomas A. Lampert
Van der Maaten
von Gioi
Yan
Yang
Zhang
Publication venue: 'Elsevier BV'
Publication date: 01/09/2011
Field of study

Being the first stage of analysis within an image, low-level feature detection is a crucial step in the image analysis process and, as such, deserves suitable attention. This paper presents a systematic investigation into low-level feature detection in spectrogram images. The result of which is the identification of frequency tracks. Analysis of the literature identifies different strategies for accomplishing low-level feature detection. Nevertheless, the advantages and disadvantages of each are not explicitly investigated. Three model-based detection strategies are outlined, each extracting an increasing amount of information from the spectrogram, and, through ROC analysis, it is shown that at increasing levels of extraction the detection rates increase. Nevertheless, further investigation suggests that model-based detection has a limitation—it is not computationally feasible to fully evaluate the model of even a simple sinusoidal track. Therefore, alternative approaches, such as dimensionality reduction, are investigated to reduce the complex search space. It is shown that, if carefully selected, these techniques can approach the detection rates of model-based strategies that perform the same level of information extraction. The implementations used to derive the results presented within this paper are available online from http://stdetect.googlecode.com

Crossref

White Rose Research Online

Co-Localization of Audio Sources in Images Using Binaural Features and Locally-Linear Regression

Author: Deleforge Antoine
Girin Laurent
Horaud Radu
Schechner Yoav
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/04/2015
Field of study

This paper addresses the problem of localizing audio sources using binaural measurements. We propose a supervised formulation that simultaneously localizes multiple sources at different locations. The approach is intrinsically efficient because, contrary to prior work, it relies neither on source separation, nor on monaural segregation. The method starts with a training stage that establishes a locally-linear Gaussian regression model between the directional coordinates of all the sources and the auditory features extracted from binaural measurements. While fixed-length wide-spectrum sounds (white noise) are used for training to reliably estimate the model parameters, we show that the testing (localization) can be extended to variable-length sparse-spectrum sounds (such as speech), thus enabling a wide range of realistic applications. Indeed, we demonstrate that the method can be used for audio-visual fusion, namely to map speech signals onto images and hence to spatially align the audio and visual modalities, thus enabling to discriminate between speaking and non-speaking faces. We release a novel corpus of real-room recordings that allow quantitative evaluation of the co-localization method in the presence of one or two sound sources. Experiments demonstrate increased accuracy and speed relative to several state-of-the-art methods.Comment: 15 pages, 8 figure

arXiv.org e-Print Archive

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Rennes 1

Idealized computational models for auditory receptive fields

Author: Friberg Anders
Lindeberg Tony
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2014
Field of study

This paper presents a theory by which idealized models of auditory receptive fields can be derived in a principled axiomatic manner, from a set of structural properties to enable invariance of receptive field responses under natural sound transformations and ensure internal consistency between spectro-temporal receptive fields at different temporal and spectral scales. For defining a time-frequency transformation of a purely temporal sound signal, it is shown that the framework allows for a new way of deriving the Gabor and Gammatone filters as well as a novel family of generalized Gammatone filters, with additional degrees of freedom to obtain different trade-offs between the spectral selectivity and the temporal delay of time-causal temporal window functions. When applied to the definition of a second-layer of receptive fields from a spectrogram, it is shown that the framework leads to two canonical families of spectro-temporal receptive fields, in terms of spectro-temporal derivatives of either spectro-temporal Gaussian kernels for non-causal time or the combination of a time-causal generalized Gammatone filter over the temporal domain and a Gaussian filter over the logspectral domain. For each filter family, the spectro-temporal receptive fields can be either separable over the time-frequency domain or be adapted to local glissando transformations that represent variations in logarithmic frequencies over time. Within each domain of either non-causal or time-causal time, these receptive field families are derived by uniqueness from the assumptions. It is demonstrated how the presented framework allows for computation of basic auditory features for audio processing and that it leads to predictions about auditory receptive fields with good qualitative similarity to biological receptive fields measured in the inferior colliculus (ICC) and primary auditory cortex (A1) of mammals.Comment: 55 pages, 22 figures, 3 table

arXiv.org e-Print Archive

Publikationer från KTH

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Basic Filters for Convolutional Neural Networks Applied to Music: Training or Design?

Author: Doerfler Monika
Grill Thomas
Bammer Roswitha
Flexer Arthur
Publication venue
Publication date: 05/07/2017
Field of study

When convolutional neural networks are used to tackle learning problems based on music or, more generally, time series data, raw one-dimensional data are commonly pre-processed to obtain spectrogram or mel-spectrogram coefficients, which are then used as input to the actual neural network. In this contribution, we investigate, both theoretically and experimentally, the influence of this pre-processing step on the network's performance and pose the question, whether replacing it by applying adaptive or learned filters directly to the raw data, can improve learning success. The theoretical results show that approximately reproducing mel-spectrogram coefficients by applying adaptive filters and subsequent time-averaging is in principle possible. We also conducted extensive experimental work on the task of singing voice detection in music. The results of these experiments show that for classification based on Convolutional Neural Networks the features obtained from adaptive filter banks followed by time-averaging perform better than the canonical Fourier-transform-based mel-spectrogram coefficients. Alternative adaptive approaches with center frequencies or time-averaging lengths learned from training data perform equally well.Comment: Completely revised version; 21 pages, 4 figure

arXiv.org e-Print Archive

Dryad Digital Repository (Duke University)