14 research outputs found
Individual identity in songbirds: signal representations and metric learning for locating the information in complex corvid calls
Bird calls range from simple tones to rich, dynamic, multi-harmonic structures.
The more complex calls, such as those of the scientifically important corvid
family (jackdaws, crows, ravens, etc.), remain poorly understood. Individual
birds can recognise familiar individuals from their calls, but where in the
signal is this identity encoded? We studied this question by applying a
combination of feature representations, including linear predictive coding
(LPC) and the adaptive discrete Fourier transform (aDFT), to a dataset of
jackdaw calls. We demonstrate through a classification paradigm that these
representations strongly outperform a standard spectrogram representation for
identifying individuals, and we apply metric learning to determine which
time-frequency regions contribute most strongly to robust individual
identification. Computational methods can thus help to direct the search for an
understanding of these complex biological signals.
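As an illustrative sketch only (not code from the paper), LPC coefficients of the kind used as one of the feature representations can be estimated from a signal frame with the autocorrelation method via the Levinson-Durbin recursion; the function name and parameters here are hypothetical:

```python
import numpy as np

def lpc(frame, order):
    """Estimate LPC coefficients by the autocorrelation method
    (Levinson-Durbin recursion). Returns (a, err) where a[0] = 1
    and a[1:] are the prediction coefficients."""
    # One-sided autocorrelation up to the required lag
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = np.zeros(order + 1)
    a[0] = 1.0
    e = r[0]  # prediction error energy
    for i in range(1, order + 1):
        # Reflection coefficient for this order
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / e
        a_prev = a.copy()
        a[1:i] = a_prev[1:i] + k * a_prev[i - 1:0:-1]
        a[i] = k
        e *= (1.0 - k * k)
    return a, e
```

For a decaying exponential frame (the impulse response of a first-order all-pole filter with pole 0.9), the recursion recovers a first coefficient close to -0.9.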
Gaussian Process Experts for Voice Conversion
Conventional approaches to voice conversion typically use a GMM to represent the joint probability density of source and target features. This model is then used to perform spectral conversion between speakers. This approach is reasonably effective but can be prone to overfitting and oversmoothing of the target spectra. This paper proposes an alternative scheme that uses a collection of Gaussian process experts to perform the spectral conversion. Gaussian processes are robust to overfitting and oversmoothing and can predict the target spectra more accurately. Experimental results indicate that the objective performance of voice conversion can be improved using the proposed approach. Copyright © 2011 ISCA
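A minimal sketch of the core operation in such a scheme, assuming numpy only: posterior-mean prediction from a single Gaussian process regressor with an RBF kernel. The paper's full system combines a collection of such experts for spectral conversion; the function names and the fixed length scale here are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(x1, x2, length_scale=1.0):
    """Squared-exponential (RBF) kernel between two 1-D input sets."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_predict(x_train, y_train, x_test, noise=1e-2):
    """GP regression posterior mean at x_test given noisy training pairs."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_star = rbf_kernel(x_test, x_train)
    alpha = np.linalg.solve(K, y_train)   # K^{-1} y
    return k_star @ alpha
```

Because the posterior mean is a weighted sum over all training targets, the prediction smoothly interpolates the mapping rather than collapsing to a few component means, which is the intuition behind its robustness to oversmoothing.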
Attention, sobriety checkpoint! Can humans determine by means of voice, if someone is drunk... and can automatic classifiers compete?
This paper analyzes the human performance of recognizing drunk speakers merely by voice and compares the results with the performance of an automatic statistical classifier. The study is carried out within the Interspeech 2011 Speaker State Challenge [1] employing the Alcohol Language Corpus (ALC) [2]. The 79 subjects yielded an average performance of 55.8% unweighted accuracy on a balanced intoxicated/non-intoxicated sample set. The statistical classifier developed in this study reaches a performance of 66.6% unweighted accuracy on the test set. In comparison, the subject with the highest performance yielded 70.0%. Our classifier is based on 4368 acoustic and prosodic features. Incorporating linguistic features along with feature selection using Information Gain Ratio (IGR) ranking added 0.7% absolute improvement while also reducing the feature space by 29%. Copyright © 2011 ISCA
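The IGR criterion used here for feature ranking can be sketched as follows; this is an illustrative implementation for discrete-valued features, not the authors' code (continuous acoustic features would first be discretized):

```python
import numpy as np
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a sequence of discrete values."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def info_gain_ratio(feature, labels):
    """Information gain of `feature` about `labels`, normalized by the
    feature's own entropy (split information), as in C4.5-style IGR."""
    h_y = entropy(labels)
    n = len(labels)
    h_y_given_x = 0.0
    for v in set(feature):
        idx = [i for i, f in enumerate(feature) if f == v]
        h_y_given_x += len(idx) / n * entropy([labels[i] for i in idx])
    split_info = entropy(feature)
    return (h_y - h_y_given_x) / split_info if split_info > 0 else 0.0
```

A feature that perfectly predicts the class scores 1.0; a feature independent of the class scores 0.0, so ranking by IGR discards uninformative dimensions.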
Graphone model interpolation and Arabic pronunciation generation
This paper extends n-gram graphone model pronunciation generation to use a mixture of such models. This technique is useful when only a small amount of pronunciation dictionary training data is available for a specific variant (or set of variants) of a language, such as a dialect. The performance of the interpolated n-gram graphone model is evaluated on Arabic phonetic pronunciation generation for words that cannot be handled by the Buckwalter Morphological Analyser. The pronunciations produced are also used to train an Arabic broadcast audio speech recognition system. In both cases the interpolated graphone model leads to improved performance. Copyright © 2011 ISCA
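At its core, interpolating graphone n-gram models amounts to a weighted linear combination of the component probability distributions. A minimal sketch over toy distributions follows; in practice the weight `lam` would be tuned on held-out variant-specific data, and the distributions would be conditional n-gram probabilities rather than the flat dictionaries shown here:

```python
def interpolate(p_variant, p_general, lam):
    """Linearly interpolate two probability distributions given as dicts,
    with weight lam on the variant-specific model."""
    keys = set(p_variant) | set(p_general)
    return {k: lam * p_variant.get(k, 0.0) + (1.0 - lam) * p_general.get(k, 0.0)
            for k in keys}
```

The interpolated model keeps coverage from the large general model while shifting probability mass toward the small variant-specific model.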
Compression Techniques Applied to Multiple Speech Recognition Systems
Speech recognition systems typically contain many Gaussian distributions, and hence a large number of parameters. This makes them both slow to decode speech, and large to store. Techniques have been proposed to decrease the number of parameters. One approach is to share parameters between multiple Gaussians, thus reducing the total number of parameters and allowing for shared likelihood calculation. Gaussian tying and subspace clustering are two related techniques which take this approach to system compression. These techniques can decrease the number of parameters with no noticeable drop in performance for single systems. However, multiple acoustic models are often used in real speech recognition systems. This paper considers the application of Gaussian tying and subspace compression to multiple systems. Results show that two speech recognition systems can be modelled using the same number of Gaussians as just one system, with little effect on individual system performance. Copyright © 2009 ISCA
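Gaussian tying can be sketched as clustering the component parameters to a smaller shared codebook, so many Gaussians reference the same stored entry. The toy version below clusters mean vectors only with plain k-means; real systems also account for variances and likelihood loss, and all names here are hypothetical:

```python
import numpy as np

def tie_gaussians(means, n_tied, n_iter=50, seed=0):
    """Tie a set of Gaussian mean vectors to n_tied shared codebook
    entries via k-means. Returns (codebook, assignment)."""
    rng = np.random.default_rng(seed)
    codebook = means[rng.choice(len(means), n_tied, replace=False)].copy()
    assign = np.zeros(len(means), dtype=int)
    for _ in range(n_iter):
        # Assign each Gaussian to its nearest codebook entry
        d = np.linalg.norm(means[:, None, :] - codebook[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        # Re-estimate each codebook entry from its members
        for k in range(n_tied):
            if np.any(assign == k):
                codebook[k] = means[assign == k].mean(axis=0)
    return codebook, assign
```

After tying, likelihoods are computed once per codebook entry and shared by all Gaussians mapped to it, which is what yields both the storage and decode-time savings described above.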
