Search CORE

397,541 research outputs found

Recommended from our members

Parallels in the sequential organization of birdsong and human speech.

Author: Gentner Timothy Q
Sainburg Tim
Theilman Brad
Thielk Marvin
Publication venue: eScholarship, University of California
Publication date: 01/08/2019
Field of study

Human speech possesses a rich hierarchical structure that allows for meaning to be altered by words spaced far apart in time. Conversely, the sequential structure of nonhuman communication is thought to follow non-hierarchical Markovian dynamics operating over only short distances. Here, we show that human speech and birdsong share a similar sequential structure indicative of both hierarchical and Markovian organization. We analyze the sequential dynamics of song from multiple songbird species and speech from multiple languages by modeling the information content of signals as a function of the sequential distance between vocal elements. Across short sequence-distances, an exponential decay dominates the information in speech and birdsong, consistent with underlying Markovian processes. At longer sequence-distances, the decay in information follows a power law, consistent with underlying hierarchical processes. Thus, the sequential organization of acoustic elements in two learned vocal communication signals (speech and birdsong) shows functionally equivalent dynamics, governed by similar processes

eScholarship - University of California

Resonant Neural Dynamics of Speech Perception

Author: Grossberg Stephen
Publication venue: Boston University Center for Adaptive Systems and Department of Cognitive and Neural Systems
Publication date: 01/09/2002
Field of study

What is the neural representation of a speech code as it evolves in time? How do listeners integrate temporally distributed phonemic information across hundreds of milliseconds, even backwards in time, into coherent representations of syllables and words? What sorts of brain mechanisms encode the correct temporal order, despite such backwards effects, during speech perception? How does the brain extract rate-invariant properties of variable-rate speech? This article describes an emerging neural model that suggests answers to these questions, while quantitatively simulating challenging data about audition, speech and word recognition. This model includes bottom-up filtering, horizontal competitive, and top-down attentional interactions between a working memory for short-term storage of phonetic items and a list categorization network for grouping sequences of items. The conscious speech and word recognition code is suggested to be a resonant wave of activation across such a network, and a percept of silence is proposed to be a temporal discontinuity in the rate with which such a resonant wave evolves. Properties of these resonant waves can be traced to the brain mechanisms whereby auditory, speech, and language representations are learned in a stable way through time. Because resonances are proposed to control stable learning, the model is called an Adaptive Resonance Theory, or ART, model.Air Force Office of Scientific Research (F49620-01-1-0397); National Science Foundation (IRI-97-20333); Office of Naval Research (N00014-01-1-0624)

Boston University Institutional Repository (OpenBU)

Bio-inspired Dynamic Formant Tracking for Phonetic Labelling

Author: Fernández-Baillo Gallego de la Sacristana Roberto
Gómez Vilda Pedro
Martínez Olalla Rafael
Mazaira Fernández Luis Miguel
Muñoz Cristina
Rodellar Biarge M. Victoria
Álvarez Marquina Agustin
Publication venue: Facultad de Informática (UPM)
Publication date: 01/01/2008
Field of study

It is a known fact that phonetic labeling may be relevant in helping current Automatic Speech Recognition (ASR) when combined with classical parsing systems as HMM's by reducing the search space. Through the present paper a method for Phonetic Broad-Class Labeling (PCL) based on speech perception in the high auditory centers is described. The methodology is based in the operation of CF (Characteristic Frequency) and FM (Frequency Modulation) neurons in the cochlear nucleus and cortical complex of the human auditory apparatus in the automatic detection of formants and formant dynamics on speech. Results obtained informant detection and dynamic formant tracking are given and the applicability of the method to Speech Processing is discussed

Archivo Digital UPM

Speech dynamics

Author: Pols L.C.W.
Publication venue: Department of Chinese, Translation & Linguistics, City University of Hong Kong
Publication date: 01/01/2011
Field of study

International Migration, Integration and Social Cohesion online publications

Speech dynamics

Author: Pols L.C.W.
Publication venue: Department of Chinese, Translation & Linguistics, City University of Hong Kong
Publication date: 01/01/2011
Field of study

International Migration, Integration and Social Cohesion online publications

Nonlinear Dynamics of the Perceived Pitch of Complex Sounds

Author: A. Faulkner
A. Gerson
A. J. M. Houtsma
A. Seebeck
C. Baesens
C. Pantev
D. Johnson
Diego L. González
E. de Boer
E. Terhardt
F. L. Wightman
G. Langner
G. S. Ohm
H. L. F. von Helmholtz
J. F. Schouten
J. H. E. Cartwright
J. L. Goldstein
Julyan H. E. Cartwright
M. A. Cohen
N. Y. S. Kiang
Oreste Piro
R. D. Patterson
R. Meddis
R. Plomp
W. M. Hartmann
Publication venue: 'American Physical Society (APS)'
Publication date: 26/06/1999
Field of study

We apply results from nonlinear dynamics to an old problem in acoustical physics: the mechanism of the perception of the pitch of sounds, especially the sounds known as complex tones that are important for music and speech intelligibility

arXiv.org e-Print Archive

Crossref

Digital.CSIC

Speech Enhancement Using An {MMSE} Spectral Amplitude Estimator Based On A Modulation Domain Kalman Filter With A Gamma Prior

Author: Brookes D
Wang Y
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 21/12/2015
Field of study

In this paper, we propose a minimum mean square error spectral estimator for clean speech spectral amplitudes that uses a Kalman filter to model the temporal dynamics of the spectral amplitudes in the modulation domain. Using a two-parameter Gamma distribution to model the prior distribution of the speech spectral amplitudes, we derive closed form expressions for the posterior mean and variance of the spectral amplitudes as well as for the associated update step of the Kalman filter. The performance of the proposed algorithm is evaluated on the TIMIT core test set using the perceptual evaluation of speech quality (PESQ) measure and segmental SNR measure and is shown to give a consistent improvement over a wide range of SNRs when compared to competitive algorithms

Spiral - Imperial College Digital Repository