Fusing Language Information from Diverse Data Sources for Phonotactic Language Recognition

Authors: Mohamed Benzeghiba, Jean-Luc Gauvain, Lori Lamel
Publication venue: HAL CCSD
Publication date: 01/06/2012

The baseline approach in building phonotactic language recognition systems is to characterize each language by a single phonotactic model generated from all the available language-specific training data. When several data sources are available for a given target language, system performance can be improved using language source-dependent phonotactic models. In this case, the common practice is to fuse language source information (i.e., the phonotactic scores for each language/source) early, at the input to the backend. This paper proposes to postpone the fusion to the end, at the output of the backend. In this case, the language recognition score can be estimated from well-calibrated language source scores. Experiments were conducted using the NIST LRE 2007 and the NIST LRE 2009 evaluation data sets with the 30s condition. On the NIST LRE 2007 eval data, a Cavg of 0.9% is obtained for the closed-set task and 2.5% for the open-set task. Compared to the common practice of early fusion, these results represent relative improvements of 18% and 11% for the closed-set and open-set tasks, respectively. Initial tests on the NIST LRE 2009 eval data gave no improvement on the closed-set task. Moreover, the Cllr measure indicates that language recognition scores estimated by the proposed approach are better calibrated than those of the common practice (early fusion).
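
The idea of fusing at the output of the backend can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the combination rule, so the sketch assumes that each language has one calibrated log-likelihood score per data source and that these are combined by an equal-weight average in the probability domain. The function names `logsumexp` and `late_fusion` and the toy scores are hypothetical.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def late_fusion(source_scores):
    """Combine per-source scores into one score per language.

    source_scores maps a language name to a list of calibrated
    log-likelihood scores, one per data source (assumed input format).
    Assumption: sources are weighted equally, so the fused score is the
    log of the mean likelihood across sources.
    """
    return {
        lang: logsumexp(scores) - math.log(len(scores))
        for lang, scores in source_scores.items()
    }


# Toy scores, invented for illustration only (not from the paper).
toy_scores = {
    "english": [-1.0, -3.0],  # e.g. two sources for the same language
    "french": [-2.5, -2.0],
}

fused = late_fusion(toy_scores)
best = max(fused, key=fused.get)  # closed-set decision: highest fused score
```

In an early-fusion setup, by contrast, all per-source scores would be stacked into a single feature vector and passed to the backend together, so no per-source calibrated score would be available at the output.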