23,252 research outputs found

Auditory Spectrum-Based Pitched Instrument Onset Detection

Author: Benetos E.
Stylianou Y.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/11/2010

In this paper, a method for onset detection of music signals using auditory spectra is proposed. The auditory spectrogram provides a time-frequency representation that employs a sound processing model resembling the human auditory system. Recent work on onset detection employs DFT-based features describing spectral energy and phase differences, as well as pitch-based features. These features are often combined to maximize detection performance. Here, the spectral flux and phase slope features are derived in the auditory framework and a novel fundamental frequency estimation algorithm based on auditory spectra is introduced. An onset detection algorithm is proposed, which processes and combines the aforementioned features at the decision level. Experiments are conducted on a dataset covering 11 pitched instrument types, consisting of 1829 onsets in total. Results indicate that auditory representations outperform various state-of-the-art approaches, with the onset detection algorithm reaching an F-measure of 82.6%.

City Research Online

Crossref
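
The spectral flux feature named in the abstract above can be illustrated with a minimal sketch. This is not the authors' auditory-spectrum method: it assumes a plain DFT magnitude spectrum, and the function names, frame length, hop size, and test signal are all hypothetical choices made for illustration.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (adequate for short frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_flux(signal, frame_len=64, hop=32):
    """Half-wave rectified spectral flux between consecutive Hann-windowed frames."""
    window = [0.5 * (1.0 - math.cos(2.0 * math.pi * i / (frame_len - 1)))
              for i in range(frame_len)]
    flux = []
    previous = None
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [x * w for x, w in zip(signal[start:start + frame_len], window)]
        current = magnitude_spectrum(frame)
        if previous is not None:
            # Only increases in magnitude count, since onsets add energy.
            flux.append(sum(max(c - p, 0.0) for c, p in zip(current, previous)))
        previous = current
    return flux


# Hypothetical test signal: 256 samples of silence, then a 1 kHz tone at 8 kHz.
sample_rate = 8000
signal = [0.0] * 256 + [math.sin(2.0 * math.pi * 1000.0 * t / sample_rate)
                        for t in range(256)]
flux = spectral_flux(signal)
peak_index = max(range(len(flux)), key=flux.__getitem__)
```

The flux stays at zero during the silence and peaks in the frames that straddle sample 256, where the tone begins. A detector would typically apply a threshold or peak-picking step to this curve.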

A modulation property of time-frequency derivatives of filtered phase and its application to aperiodicity and fo estimation

Author: Banno Hideki
Kawahara Hideki
Morise Masanori
Sakakibara Ken-Ichi
Toda Tomoki
Publication venue: 'International Speech Communication Association'
Publication date: 09/06/2017

We introduce a simple and linear SNR (strictly speaking, periodic to random power ratio) estimator (0dB to 80dB without additional calibration/linearization) for providing reliable descriptions of aperiodicity in speech corpora. The main idea of this method is to estimate the background random noise level without directly extracting the background noise. The proposed method is applicable to a wide variety of time windowing functions with very low sidelobe levels. The estimate combines the frequency derivative and the time-frequency derivative of the mapping from filter center frequency to the output instantaneous frequency. This procedure can replace the periodicity detection and aperiodicity estimation subsystems of the recently introduced open-source vocoder, YANG vocoder. Source code of the MATLAB implementation of this method will also be open sourced. Comment: 8 pages, 9 figures. Submitted and accepted in Interspeech201

arXiv.org e-Print Archive

Crossref

Audio Source Separation Using Sparse Representations

Author: Jafari MG
Nesbit A
Plumbley MD
Vincent E
Publication venue: 'IGI Global'
Publication date: 01/01/2010

This is the author's final version of the article, first published as A. Nesbit, M. G. Jafari, E. Vincent and M. D. Plumbley. Audio Source Separation Using Sparse Representations. In W. Wang (Ed), Machine Audition: Principles, Algorithms and Systems. Chapter 10, pp. 246-264. IGI Global, 2011. ISBN 978-1-61520-919-4. DOI: 10.4018/978-1-61520-919-4.ch010. The authors address the problem of audio source separation, namely, the recovery of audio signals from recordings of mixtures of those signals. The sparse component analysis framework is a powerful method for achieving this. Sparse orthogonal transforms, in which only a few transform coefficients differ significantly from zero, are developed; once the signal has been transformed, energy is apportioned from each transform coefficient to each estimated source, and, finally, the signal is reconstructed using the inverse transform. The overriding aim of this chapter is to demonstrate how this framework, as exemplified here by two different decomposition methods which adapt to the signal to represent it sparsely, can be used to solve different problems in different mixing scenarios. To address the instantaneous (neither delays nor echoes) and underdetermined (more sources than mixtures) mixing model, a lapped orthogonal transform is adapted to the signal by selecting a basis from a library of predetermined bases. This method is highly related to the windowing methods used in the MPEG audio coding framework. In considering the anechoic (delays but no echoes) and determined (equal number of sources and mixtures) mixing case, a greedy adaptive transform is used based on orthogonal basis functions that are learned from the observed data, instead of being selected from a predetermined library of bases. This is found to encode the signal characteristics, by introducing a feedback system between the bases and the observed data. Experiments on mixtures of speech and music signals demonstrate that these methods give good signal approximations and separation performance, and indicate promising directions for future research.

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL Descartes

Queen Mary Research Online

Surrey Research Insight

Hal-Diderot

HAL-Rennes 1

Frame Theory for Signal Processing in Psychoacoustics

Author: A. Bregman
A. Janssen
A. Ron
A.V. Oppenheim
B. Laback
B.C.J. Moore
B.R. Glasberg
C. Heil
C. Wiesmeyr
C.J. Plack
D. Soderquist
D. Wang
D.D. Greenwood
D.T. Stoeva
E. Hernández
E. Ravelli
E. Zwicker
E.A. Lopez-Poveda
G. Chardon
G. Kidd Jr
G. Matz
H. Bölcskei
H. Fastl
I. Daubechies
J. Kovačević
J. Leng
J.J. Benedetto
J.J. O’Donovan
J.S. Garofolo
K. Gröchenig
L. Chai
L.N. Trefethen
M. Bownik
M. Bézat
M. Elad
M. Unoki
M. Vetterli
N. Holighaus
N. Perraudin
N.K. Bari
O. Christensen
P. Balazs
P. Casazza
P. Søndergaard
P. Vaidyanathan
P.G. Casazza
R.D. Patterson
R.J. Duffin
R.M. Young
S. Strahl
T. Irino
T. Painter
T. Werther
T.S. Gunawan
W. Jesteadt
X. Valero
X. Zhao
Z. Cvetković
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 03/11/2016

This review chapter aims to strengthen the link between frame theory and signal processing tasks in psychoacoustics. On the one side, the basic concepts of frame theory are presented and some proofs are provided to explain those concepts in some detail. The goal is to reveal to hearing scientists how this mathematical theory could be relevant for their research. In particular, we focus on frame theory in a filter bank approach, which is probably the most relevant view-point for audio signal processing. On the other side, basic psychoacoustic concepts are presented to stimulate mathematicians to apply their knowledge in this field.

arXiv.org e-Print Archive

Crossref

Extraction of vocal-tract system characteristics from speech signals

Author: Veldhuis Raymond N.J.
Yegnanarayana B.
Publication venue: IEEE Computer Society Press
Publication date: 01/01/1998

We propose methods to track natural variations in the characteristics of the vocal-tract system from speech signals. We are especially interested in the cases where these characteristics vary over time, as happens in dynamic sounds such as consonant-vowel transitions. We show that the selection of appropriate analysis segments is crucial in these methods, and we propose a selection based on estimated instants of significant excitation. These instants are obtained by a method based on the average group-delay property of minimum-phase signals. In voiced speech, they correspond to the instants of glottal closure. The vocal-tract system is characterized by its formant parameters, which are extracted from the analysis segments. Because the segments are always at the same relative position in each pitch period, in voiced speech the extracted formants are consistent across successive pitch periods. We demonstrate the results of the analysis for several difficult cases of speech signals.

Repository TU/e

Pure OAI Repository

University of Twente Research Information

Idealized computational models for auditory receptive fields

Author: Friberg Anders
Lindeberg Tony
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2014

This paper presents a theory by which idealized models of auditory receptive fields can be derived in a principled axiomatic manner, from a set of structural properties to enable invariance of receptive field responses under natural sound transformations and ensure internal consistency between spectro-temporal receptive fields at different temporal and spectral scales. For defining a time-frequency transformation of a purely temporal sound signal, it is shown that the framework allows for a new way of deriving the Gabor and Gammatone filters as well as a novel family of generalized Gammatone filters, with additional degrees of freedom to obtain different trade-offs between the spectral selectivity and the temporal delay of time-causal temporal window functions. When applied to the definition of a second layer of receptive fields from a spectrogram, it is shown that the framework leads to two canonical families of spectro-temporal receptive fields, in terms of spectro-temporal derivatives of either spectro-temporal Gaussian kernels for non-causal time or the combination of a time-causal generalized Gammatone filter over the temporal domain and a Gaussian filter over the logspectral domain. For each filter family, the spectro-temporal receptive fields can be either separable over the time-frequency domain or be adapted to local glissando transformations that represent variations in logarithmic frequencies over time. Within each domain of either non-causal or time-causal time, these receptive field families are derived by uniqueness from the assumptions. It is demonstrated how the presented framework allows for computation of basic auditory features for audio processing and that it leads to predictions about auditory receptive fields with good qualitative similarity to biological receptive fields measured in the inferior colliculus (ICC) and primary auditory cortex (A1) of mammals. Comment: 55 pages, 22 figures, 3 tables

arXiv.org e-Print Archive

Publikationer från KTH

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central

Digitala Vetenskapliga Arkivet - Academic Archive On-line
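
For readers unfamiliar with the Gammatone filters mentioned in the last abstract, the following is a minimal sketch of the classical Gammatone impulse response, t^(n-1) exp(-2πbt) cos(2πft). It does not implement the generalized family derived in the paper. The bandwidth rule (b = 1.019 times the Glasberg and Moore equivalent rectangular bandwidth), the default order of 4, and all function names are assumptions chosen for illustration.

```python
import math


def erb_bandwidth(center_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore approximation)."""
    return 24.7 * (4.37 * center_hz / 1000.0 + 1.0)


def gammatone_impulse_response(center_hz, sample_rate, duration_s=0.025, order=4):
    """Sampled classical Gammatone impulse response, scaled to unit peak."""
    b = 1.019 * erb_bandwidth(center_hz)  # assumed bandwidth scaling
    n_samples = int(round(duration_s * sample_rate))
    response = []
    for i in range(n_samples):
        t = i / sample_rate
        # Gamma-distribution envelope multiplied by a tone at the center frequency.
        envelope = t ** (order - 1) * math.exp(-2.0 * math.pi * b * t)
        response.append(envelope * math.cos(2.0 * math.pi * center_hz * t))
    peak = max(abs(v) for v in response)
    return [v / peak for v in response]


# Hypothetical usage: a 1 kHz channel sampled at 16 kHz.
ir = gammatone_impulse_response(1000.0, 16000)
```

Convolving a signal with responses like this at a range of center frequencies gives a simple auditory-style filter bank. The envelope order and bandwidth control the trade-off between spectral selectivity and temporal delay that the abstract refers to.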