Evolutionary discriminative confidence estimation for spoken term detection
The final publication is available at Springer via http://dx.doi.org/10.1007/s11042-011-0913-z
Spoken term detection (STD) is the task of searching for occurrences
of spoken terms in audio archives. It relies on robust confidence estimation
to make a hit/false alarm (FA) decision. In order to optimize the decision
in terms of the STD evaluation metric, the confidence has to be discriminative.
Multi-layer perceptrons (MLPs) and support vector machines (SVMs) exhibit
good performance in producing discriminative confidence; however, they are
severely limited by their continuous objective functions, and are therefore less
capable of dealing with complex decision tasks. This leads to a substantial
performance reduction when measuring detection of out-of-vocabulary (OOV)
terms, where the high diversity in term properties usually leads to a complicated
decision boundary.
In this paper we present a new discriminative confidence estimation approach
based on evolutionary discriminant analysis (EDA). Unlike MLPs and
SVMs, EDA uses the classification error as its objective function, resulting
in a model optimized towards the evaluation metric. In addition, EDA combines
heterogeneous projection functions and classification strategies in decision
making, leading to a highly flexible classifier that is capable of dealing
with complex decision tasks. Finally, the evolutionary strategy of EDA reduces the risk of local minima. We tested the EDA-based confidence with a
state-of-the-art phoneme-based STD system on an English meeting domain
corpus, which employs a phoneme speech recognition system to produce lattices
within which the phoneme sequences corresponding to the enquiry terms
are searched. The test corpora comprise 11 hours of speech data recorded with
individual head-mounted microphones from 30 meetings carried out at several
institutes including ICSI; NIST; ISL; LDC; the Virginia Polytechnic Institute
and State University; and the University of Edinburgh. The experimental results
demonstrate that EDA considerably outperforms MLPs and SVMs on
both classification and confidence measurement in STD, and the advantage
is found to be more significant on OOV terms than on in-vocabulary (INV)
terms. In terms of classification performance, EDA achieved an equal error
rate (EER) of 11% on OOV terms, compared to 34% and 31% with MLPs and
SVMs respectively; for INV terms, an EER of 15% was obtained with EDA
compared to 17% obtained with MLPs and SVMs. In terms of STD performance
for OOV terms, EDA presented a significant relative improvement of
1.4% and 2.5% in terms of average term-weighted value (ATWV) over MLPs
and SVMs respectively.
This work was partially supported by the French Ministry of Industry
(Innovative Web call) under contract 09.2.93.0966, “Collaborative Annotation for Video
Accessibility” (ACAV) and by “The Adaptable Ambient Living Assistant” (ALIAS) project
funded through the joint national Ambient Assisted Living (AAL) programme.
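The equal error rate (EER) reported in this abstract is the operating point at which the miss rate equals the false-alarm rate. A minimal sketch of how such a figure might be computed follows. It is not the authors' evaluation code: it assumes each putative detection carries a confidence score and a label (1 for a hit, 0 for a false alarm), and the function name and toy scores are hypothetical.

```python
def equal_error_rate(scores, labels):
    """Approximate the EER by sweeping a threshold over the observed scores.

    Returns the mean of miss rate and false-alarm rate at the threshold
    where the two rates are closest to each other.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one hit and one false alarm")

    best_gap, best_eer = None, None
    for threshold in sorted(set(scores)):
        # A detection is accepted when its confidence is >= threshold.
        misses = sum(1 for s, y in zip(scores, labels)
                     if y == 1 and s < threshold)
        false_alarms = sum(1 for s, y in zip(scores, labels)
                           if y == 0 and s >= threshold)
        miss_rate = misses / n_pos
        fa_rate = false_alarms / n_neg
        gap = abs(miss_rate - fa_rate)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (miss_rate + fa_rate) / 2
    return best_eer


# Toy confidences (hypothetical, not data from the paper).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 1, 0, 0]
print(equal_error_rate(toy_scores, toy_labels))  # 0.25
```

With few detections the two rates may never coincide exactly, which is why the sketch averages them at the closest threshold rather than interpolating.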
Real-time interactive speech technology at Threshold Technology, Incorporated
Basic real-time isolated-word recognition techniques are reviewed. Industrial applications of voice technology are described in chronological order of their development. Future research efforts are also discussed.
Travel linearity and speed of human foragers and chimpanzees during their daily search for food in tropical rainforests
To understand the evolutionary roots of human spatial cognition, researchers have compared spatial abilities of humans and one of our closest living relatives, the chimpanzee (Pan troglodytes). However, how humans and chimpanzees compare in solving spatial tasks during real-world foraging is unclear to date, as measuring such spatial abilities in natural habitats is challenging. Here we compared spatial movement patterns of the Mbendjele BaYaka people and the Taï chimpanzees during their daily search for food in rainforests. We measured linearity and speed during off-trail travels toward out-of-sight locations as proxies for spatial knowledge. We found similarly high levels of linearity in individuals of Mbendjele foragers and Taï chimpanzees. However, human foragers and chimpanzees clearly differed in their reactions to group size and familiarity with the foraging areas. Mbendjele foragers increased travel linearity with increasing familiarity and group size, without obvious changes in speed. This pattern was reversed in Taï chimpanzees. We suggest that these differences between Mbendjele foragers and Taï chimpanzees reflect their different ranging styles, such as life-time range size and trail use. This result highlights the impact of socio-ecological settings on comparing spatial movement patterns. Our study provides a first step toward comparing long-range spatial movement patterns of two closely-related species in their natural environments.
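The abstract does not say how linearity and speed were computed. One common choice, assumed here purely for illustration, is the ratio of straight-line displacement to travelled path length, with speed taken as path length over elapsed time. The sketch below uses planar (x, y) coordinates in metres. The function names and sample track are hypothetical.

```python
import math


def path_length(points):
    """Sum of straight segment lengths along a track of (x, y) points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def linearity(points):
    """Straight-line displacement divided by travelled distance (0 to 1).

    This ratio is an assumed definition, not one taken from the study.
    """
    travelled = path_length(points)
    if travelled == 0:
        raise ValueError("track has no movement")
    return math.dist(points[0], points[-1]) / travelled


def mean_speed(points, duration_s):
    """Average speed in metres per second over the whole track."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return path_length(points) / duration_s


# Hypothetical tracks: one direct, one with a right-angle detour.
direct = [(0, 0), (3, 4), (6, 8)]
detour = [(0, 0), (3, 0), (3, 4)]
print(linearity(direct), linearity(detour))
```

A value near 1 indicates almost straight travel toward the destination, while lower values indicate detours.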
Rhythmic unit extraction and modelling for automatic language identification
This paper deals with an approach to Automatic Language Identification based on rhythmic modelling. Besides phonetics and phonotactics, rhythm is one of the most promising features to be considered for language identification, even if its extraction and modelling are not straightforward. One of the main problems to address is what to model. In this paper, a rhythm extraction algorithm is described: using a vowel detection algorithm, rhythmic units related to syllables are segmented. Several parameters are extracted (consonantal and vowel duration, cluster complexity) and modelled with a Gaussian Mixture. Experiments are performed on read speech for 7 languages (English, French, German, Italian, Japanese, Mandarin and Spanish) and results reach up to 86 ± 6% of correct discrimination between stress-timed, mora-timed and syllable-timed classes of languages, and 67 ± 8% of correct language identification on average for the 7 languages with utterances of 21 seconds. These results are discussed and compared with those obtained with a standard acoustic Gaussian mixture modelling approach (88 ± 5% of correct identification for the 7-language identification task).
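The modelling step described in this abstract can be illustrated roughly as follows. Each rhythmic unit is reduced to a vector (consonantal duration, vowel duration, cluster complexity), one model is fitted per class, and an utterance is assigned to the class with the highest total log-likelihood. The sketch deliberately simplifies the paper's Gaussian mixture to a single diagonal-covariance Gaussian per class. All names and numeric values are made-up toy assumptions, not data from the paper.

```python
import math


def fit_diag_gaussian(vectors, var_floor=1e-6):
    """Fit per-dimension mean and variance (one-component stand-in for a GMM)."""
    n = len(vectors)
    dim = len(vectors[0])
    means = [sum(v[d] for v in vectors) / n for d in range(dim)]
    variances = [
        max(sum((v[d] - means[d]) ** 2 for v in vectors) / n, var_floor)
        for d in range(dim)
    ]
    return means, variances


def log_likelihood(model, vector):
    """Log density of one rhythmic-unit vector under a diagonal Gaussian."""
    means, variances = model
    return sum(
        -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
        for x, mu, var in zip(vector, means, variances)
    )


def classify(models, units):
    """Pick the class whose model gives the utterance's units the highest score."""
    totals = {
        name: sum(log_likelihood(model, u) for u in units)
        for name, model in models.items()
    }
    return max(totals, key=totals.get)


# Toy training units: (consonant duration s, vowel duration s, cluster complexity).
training = {
    "stress_timed": [(0.20, 0.08, 3), (0.22, 0.09, 3), (0.18, 0.07, 2)],
    "syllable_timed": [(0.08, 0.10, 1), (0.09, 0.11, 1), (0.07, 0.12, 2)],
}
models = {name: fit_diag_gaussian(units) for name, units in training.items()}
print(classify(models, [(0.21, 0.08, 3), (0.19, 0.08, 3)]))
```

Replacing `fit_diag_gaussian` with a multi-component mixture trained by expectation-maximisation would bring the sketch closer to the approach the abstract describes.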