Generating language distance metrics by language recognition using acoustic features

Authors: Hu Roland, Sluckin T. J., Sun Le, Yu Huimin
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

A language recognition system is used to build a quantitative measure of language distance. The OpenEAR toolkit is used to extract more than 6,000 features per speech sample. The features consist of 56 low-level descriptors (LLDs), their delta and delta-delta values, and the corresponding 39 functionals. The language model training component is based on the Gentle AdaBoost algorithm. When tested on a group of 10 principally Indo-European languages, the language recognition system performs comparably to other language recognizers. The UPGMA tree built from the interlanguage distances identifies the major subgroups of Indo-European. Genetic algorithms are also implemented to generate a language map on the 2D plane. Although some errors remain, the obtained language tree and map are indicators of language relationships. We discuss errors in our system and, more generally, perspectives for the use of sound file classifiers in historical linguistics.
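
As a rough illustration of how a UPGMA tree can be derived from a matrix of interlanguage distances, the following minimal Python sketch may help. It is not the authors' implementation: the function name `upgma`, the labels A to D, and the distance values are all hypothetical placeholders, and the distances do not come from the paper.

```python
from itertools import combinations


def upgma(labels, dist):
    """Cluster items with UPGMA (average linkage).

    labels: list of item names.
    dist:   symmetric matrix (list of lists) of pairwise distances.
    Returns a nested tuple (left_subtree, right_subtree, height),
    where height is half the distance at which the two subtrees merge.
    """
    # Active clusters: id -> (subtree, number of leaves)
    clusters = {i: (label, 1) for i, label in enumerate(labels)}
    # Distances between active clusters, keyed by the pair of cluster ids
    d = {
        frozenset((i, j)): dist[i][j]
        for i, j in combinations(range(len(labels)), 2)
    }
    next_id = len(labels)

    while len(clusters) > 1:
        # Merge the closest pair of clusters
        pair = min(d, key=d.get)
        a, b = sorted(pair)
        height = d[pair] / 2
        (tree_a, n_a), (tree_b, n_b) = clusters.pop(a), clusters.pop(b)

        # Distance from the merged cluster to every other cluster is the
        # size-weighted mean of the two old distances (the UPGMA update)
        new_d = {
            k: (n_a * d[frozenset((a, k))] + n_b * d[frozenset((b, k))])
            / (n_a + n_b)
            for k in clusters
        }

        # Drop distances that involve the two merged clusters
        d = {key: v for key, v in d.items() if a not in key and b not in key}
        for k, v in new_d.items():
            d[frozenset((next_id, k))] = v

        clusters[next_id] = ((tree_a, tree_b, height), n_a + n_b)
        next_id += 1

    tree, _ = next(iter(clusters.values()))
    return tree


# Hypothetical distances between four placeholder languages (not real data)
toy_labels = ["A", "B", "C", "D"]
toy_dist = [
    [0, 2, 8, 8],
    [2, 0, 8, 8],
    [8, 8, 0, 4],
    [8, 8, 4, 0],
]
tree = upgma(toy_labels, toy_dist)
print(tree)
```

In this toy input, A and B are merged first, then C and D, and the two pairs are joined last. This mirrors how closely related languages would be expected to group into subfamilies before the subfamilies themselves are joined.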

Southampton (e-Prints Soton)