A Framework for Bioacoustic Vocalization Analysis Using Hidden Markov Models

Author: Ren Yao
Johnson Michael T
Clemins Patrick J.
Darre Michael
Glaeser Sharon Stuart
Osiejuk Tomasz S.
Out-Nyarko Ebenezer
Publication venue: e-Publications@Marquette
Publication date: 19/07/1999

Using Hidden Markov Models (HMMs) as a recognition framework for automatic classification of animal vocalizations has a number of benefits, including the ability to handle duration variability through nonlinear time alignment, the ability to incorporate complex language or recognition constraints, and easy extendibility to continuous recognition and detection domains. In this work, we apply HMMs to several different species and bioacoustic tasks using generalized spectral features that can be easily adjusted across species and HMM network topologies suited to each task. This experimental work includes a simple call type classification task using one HMM per vocalization for repertoire analysis of Asian elephants, a language-constrained song recognition task using syllable models as base units for ortolan bunting vocalizations, and a stress stimulus differentiation task in poultry vocalizations using a non-sequential model via a one-state HMM with Gaussian mixtures. Results show strong performance across all tasks and illustrate the flexibility of the HMM framework for a variety of species, vocalization types, and analysis tasks.
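The "one HMM per vocalization" classification scheme mentioned in this abstract can be illustrated with a minimal sketch. It assumes discrete observation symbols (0 = low-frequency frame, 1 = high-frequency frame) instead of the spectral features the authors describe, and the model names `call_A` and `call_B` and all probabilities are invented for illustration; none of them come from the paper. Each candidate model scores the sequence with the scaled forward algorithm, and the highest log-likelihood wins.

```python
import math

def forward_log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM."""
    if not obs:
        return 0.0
    n_states = len(start)
    # Initialise with the start distribution and the first observation.
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step through the transitions, then emit obs[t].
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][obs[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_lik

def classify(obs, models):
    """Return the name of the model with the highest log-likelihood."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))

# Hypothetical two-state left-to-right models (start, transitions, emissions).
left_to_right = [[0.7, 0.3], [0.0, 1.0]]
models = {
    # call_A: tends to start low and end high.
    "call_A": ([1.0, 0.0], left_to_right, [[0.9, 0.1], [0.2, 0.8]]),
    # call_B: tends to start high and end low.
    "call_B": ([1.0, 0.0], left_to_right, [[0.1, 0.9], [0.8, 0.2]]),
}

print(classify([0, 0, 0, 1, 1], models))
print(classify([1, 1, 0, 0, 0], models))
```

Because the forward algorithm sums over all state paths, sequences of different lengths can be scored by the same model, which is the duration-variability benefit the abstract refers to.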

epublications@Marquette

University of Sheffield Library Digital Collections

A Framework for Bioacoustic Vocalization Analysis Using Hidden Markov Models

Author: Clemins Patrick J.
Darre Michael
Glaeser Sharon Stuart
Johnson Michael T
Osiejuk Tomasz S.
Out-Nyarko Ebenezer
Ren Yao
Publication venue: e-Publications@Marquette
Publication date: 01/11/2009

Multidisciplinary Digital Publishing Institute

epublications@Marquette

Directory of Open Access Journals

Learning An Invariant Speech Representation

Author: Evangelopoulos Georgios
Poggio Tomaso
Rosasco Lorenzo
Voinea Stephen
Zhang Chiyuan
Publication date: 01/01/2014

Recognition of speech, and in particular the ability to generalize and learn from small sets of labelled examples like humans do, depends on an appropriate representation of the acoustic input. We formulate the problem of finding robust speech features for supervised learning with small sample complexity as a problem of learning representations of the signal that are maximally invariant to intraclass transformations and deformations. We propose an extension of a theory for unsupervised learning of invariant visual representations to the auditory domain and empirically evaluate its validity for voiced speech sound classification. Our version of the theory requires the memory-based, unsupervised storage of acoustic templates -- such as specific phones or words -- together with all the transformations of each that normally occur. A quasi-invariant representation for a speech segment can be obtained by projecting it to each template orbit, i.e., the set of transformed signals, and computing the associated one-dimensional empirical probability distributions. The computations can be performed by modules of filtering and pooling, and extended to hierarchical architectures. In this paper, we apply a single-layer, multicomponent representation for phonemes and demonstrate improved accuracy and decreased sample complexity for vowel classification compared to standard spectral, cepstral and perceptual features.
Comment: CBMM Memo No. 022, 5 pages, 2 figures
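The projection-and-pooling step described in this abstract can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' implementation: the transformation group is taken to be circular shifts, signals are short integer lists, and the function names (`orbit`, `signature`) and histogram range are hypothetical choices. The signal is projected onto every element of each template's orbit, and the projections are pooled into a histogram, which serves as the empirical distribution.

```python
def orbit(template):
    """All circular shifts of a template (a stand-in for the stored transformations)."""
    return [template[k:] + template[:k] for k in range(len(template))]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def signature(signal, templates, lo, hi, n_bins):
    """Concatenate, per template, the histogram of projections onto its orbit."""
    sig = []
    for template in templates:
        # Filtering: project the signal onto every transformed template.
        projections = [dot(signal, g) for g in orbit(template)]
        # Pooling: histogram of the projections over the range [lo, hi).
        counts = [0] * n_bins
        for p in projections:
            idx = int((p - lo) * n_bins // (hi - lo))
            idx = max(0, min(idx, n_bins - 1))  # clamp out-of-range values
            counts[idx] += 1
        sig.extend(c / len(projections) for c in counts)
    return sig

signal = [3, 1, 0, 2, 0, 1]
shifted = signal[2:] + signal[:2]  # the same signal, circularly shifted
templates = [[1, 0, 0, 1, 0, 0], [1, 2, 0, 0, 0, 0]]

print(signature(signal, templates, 0, 16, 4))
print(signature(shifted, templates, 0, 16, 4))
```

Shifting the signal only permutes the projections onto a full orbit, so both calls print the same signature. With real speech the stored transformations cover the orbit only partially, which is why the abstract calls the representation quasi-invariant.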

arXiv.org e-Print Archive

CiteSeerX

DSpace@MIT

Video augmentation for improving audio speech recognition under noise

Author: Cavallaro A
Gong S
Pachoud S
Publication venue: British Machine Vision Conference (BMVC)
Publication date: 23/02/2015

Queen Mary Research Online

Segmentation of Speech and Humming in Vocal Input

Author: Havlik J.
Polacek O.
Sporka A. J.
Publication venue: Společnost pro radioelektronické inženýrství
Publication date: 01/01/2012

Non-verbal vocal interaction (NVVI) is an interaction method in which sounds other than speech produced by a human are used, such as humming. NVVI complements traditional speech recognition systems with continuous control. In order to combine the two approaches (e.g. "volume up, mmm") it is necessary to perform a speech/NVVI segmentation of the input sound signal. This paper presents two novel methods of speech and humming segmentation. The first method is based on classification of MFCC and RMS parameters using a neural network (MFCC method), while the other method computes volume changes in the signal (IAC method). The two methods are compared using a corpus collected from 13 speakers. The results indicate that the MFCC method outperforms IAC in terms of accuracy, precision, and recall.
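Both methods in this abstract rely on a measure of signal volume. The sketch below shows one common way to compute it: frame-wise RMS energy and its frame-to-frame change. It is not the paper's MFCC or IAC method, whose details the abstract does not give; the function names, frame length, and hop size are assumptions chosen for illustration.

```python
import math

def frame_rms(samples, frame_len, hop):
    """Root-mean-square level of each full frame of the signal."""
    rms = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        rms.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return rms

def volume_changes(rms):
    """Difference in RMS level between consecutive frames."""
    return [b - a for a, b in zip(rms, rms[1:])]

# Toy signal: silence, a short burst, silence.
samples = [0.0] * 4 + [1.0, -1.0, 1.0, -1.0] + [0.0] * 4
levels = frame_rms(samples, frame_len=4, hop=4)
print(levels)
print(volume_changes(levels))
```

A segmenter could treat large positive and negative changes as candidate onsets and offsets, though how to threshold them is left open here.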

CiteSeerX

Directory of Open Access Journals

Digital library of Brno University of Technology