
    Low-resource language recognition using a fusion of phoneme posteriorgram counts, acoustic and glottal-based i-vectors

    This paper describes our system for the Albayzin 2012 LRE competition. One of the main characteristics of this evaluation was the small number of files available for training the system, especially in the empty condition, where no training set was provided but only a development set. In addition, the whole database was created from online videos, and around one third of the training data was labeled as noisy. Our primary system was the fusion of three different i-vector based systems: one acoustic system based on MFCCs, a phonotactic system using trigrams of phone-posteriorgram counts, and another acoustic system based on RPLPs that improved robustness against noise. A contrastive system that included new features based on the glottal source was also presented. Official and post-evaluation results for all the conditions, using both the metrics proposed for the evaluation and the Cavg metric, are presented in the paper.

    Multilevel and session variability compensated language recognition: ATVS-UAM systems at NIST LRE 2009

    J. Gonzalez-Dominguez, I. Lopez-Moreno, J. Franco-Pedroso, D. Ramos, D. T. Toledano, and J. Gonzalez-Rodriguez, "Multilevel and Session Variability Compensated Language Recognition: ATVS-UAM Systems at NIST LRE 2009," IEEE Journal of Selected Topics in Signal Processing, vol. 4, no. 6, pp. 1084–1093, December 2010.

    This work presents the systems submitted by the ATVS Biometric Recognition Group to the 2009 Language Recognition Evaluation (LRE’09), organized by NIST. The new challenges in this LRE edition can be summarized by three main differences with respect to past evaluations. Firstly, the number of languages to be recognized expanded to 23, from 14 in 2007 and 7 in 2005. Secondly, data variability was increased by including telephone speech excerpts extracted from Voice of America (VOA) radio broadcasts over the Internet, in addition to Conversational Telephone Speech (CTS). The third difference was the volume of data: this evaluation involved up to 2 terabytes of speech data for development, an order of magnitude more than past evaluations. LRE’09 thus required participants to develop robust systems able not only to handle the session variability problem but also to do so with reasonable computational resources. ATVS participation consisted of state-of-the-art acoustic and high-level systems focussing on these issues. Furthermore, the problem of finding a proper combination and calibration of the information obtained at different levels of the speech signal was widely explored in this submission.

    Two original contributions were developed in this work. The first was applying a session variability compensation scheme based on Factor Analysis (FA) within the statistics domain in an SVM-supervector (SVM-SV) approach. The second was the use of a novel backend based on anchor models to fuse individual systems prior to one-vs-all calibration via logistic regression. Results on both development and evaluation corpora show the robustness and excellent performance of the submitted systems, exemplified by our system ranking 2nd in the 30-second open-set condition with remarkably scarce computational resources.

    This work has been supported by the Spanish Ministry of Education under project TEC2006-13170-C02-01. Javier Gonzalez-Dominguez also thanks the Spanish Ministry of Education for supporting his doctoral research under project TEC2006-13141-C03-03. Special thanks are given to Dr. David Van Leeuwen from TNO Human Factors (Utrecht, The Netherlands) for his strong collaboration, valuable discussions and ideas. The authors also thank Dr. Patrick Lucey for his final (non-target) Australian English review of the manuscript.

    Perception of linguistic rhythm by newborn infants

    Previous studies have shown that newborn infants are able to discriminate between certain languages, and it has been suggested that they do so by categorizing varieties of speech rhythm. However, in order to confirm this hypothesis, it is necessary to show that language discrimination is still performed by newborns when all speech cues other than rhythm are removed. Here, we conducted a series of experiments assessing discrimination between Dutch and Japanese by newborn infants, using a speech resynthesis technique to progressively degrade non-rhythmical properties of the sentences. When the stimuli are resynthesized using identical phonemes and artificial intonation contours for the two languages, thereby preserving only their rhythmic structure, newborns are still able to discriminate the languages. We conclude that newborns are able to classify languages according to their type of rhythm, and that this ability may help them bootstrap other phonological properties of their native language.

    Categorical Account of Gradient Acceptability of Word-Initial Polish Onsets

    We examine how well categorical and probabilistic phonotactic learning models extract grammars that predict Polish speakers' acceptability judgments of words with varied initial consonant clusters. Polish is an especially interesting language to look at because of its rich inventory of consonant clusters that defy sonority sequencing, often as a result of yer-deletion. In line with results by Gorman (2013) and Durvasula (2020), we find that the categorical baselines considered here generally outperformed Hayes and Wilson's (2008) maximum-entropy based phonotactic learner. We conclude that gradient acceptability judgments do not provide unambiguous evidence for gradient, probabilistic grammars.

    Language discrimination by newborns: Teasing apart phonotactic, rhythmic, and intonational cues

    Speech rhythm has long been claimed to be a useful bootstrapping cue in the very first steps of language acquisition. Previous studies have suggested that newborn infants do categorize varieties of speech rhythm, as demonstrated by their ability to discriminate between certain languages. However, the existing evidence is not unequivocal: in previous studies, stimuli discriminated by newborns always contained additional speech cues on top of rhythm. Here, we conducted a series of experiments assessing discrimination between Dutch and Japanese by newborn infants, using a speech resynthesis technique to progressively degrade non-rhythmical properties of the sentences. When the stimuli are resynthesized using identical phonemes and artificial intonation contours for the two languages, thereby preserving only their rhythmic and broad phonotactic structure, newborns still seem to be able to discriminate between the two languages, but the effect is weaker than when intonation is present. This leaves open the possibility that the temporal correlation between intonational and rhythmic cues might actually facilitate the processing of speech rhythm.

    PHONOTACTIC AND ACOUSTIC LANGUAGE RECOGNITION

    This thesis deals with phonotactic and acoustic techniques for automatic language recognition (LRE). The first part of the thesis deals with phonotactic language recognition based on co-occurrences of phone sequences in speech. A thorough study of phone recognition as a tokenization technique for LRE is carried out, with a focus on the amount of training data for the phone recognizer and on the combination of phone recognizers trained on several languages (Parallel Phone Recognition followed by Language Model, PPRLM). The thesis also deals with a novel technique of anti-models in PPRLM and investigates using phone lattices instead of strings. The work on the phonotactic approach is concluded by a comparison of classical n-gram modeling techniques and binary decision trees. Acoustic LRE was addressed too, with the main focus on discriminative techniques for training target-language acoustic models and on initial (but successful) experiments with removing channel dependencies. We have also investigated the fusion of phonotactic and acoustic approaches. All experiments were performed on standard data from the NIST 2003, 2005 and 2007 evaluations, so that the results are directly comparable to those of other laboratories in the LRE community. With the above-mentioned techniques, the fused systems defined the state of the art in the LRE field and reached excellent results in NIST evaluations.
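    As a rough illustration of the n-gram language-modeling stage mentioned in this abstract, the sketch below trains one bigram model per language on phone strings and picks the language whose model gives a test string the highest log-likelihood. It is not the thesis's implementation: the phone strings, the language labels `lang_a` and `lang_b`, the function names and the add-one smoothing are all assumptions made for the example, and a real PPRLM system would run several phone recognizers in parallel and fuse their scores.

```python
import math
from collections import Counter

def train_bigram_model(phone_sequences):
    """Count phone bigrams and their left contexts over training sequences."""
    bigrams, contexts, vocab = Counter(), Counter(), set()
    for seq in phone_sequences:
        padded = ["<s>"] + list(seq) + ["</s>"]
        vocab.update(padded)
        for prev, cur in zip(padded, padded[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return bigrams, contexts, vocab

def log_likelihood(model, seq):
    """Log-probability of a phone sequence under an add-one smoothed bigram model."""
    bigrams, contexts, vocab = model
    padded = ["<s>"] + list(seq) + ["</s>"]
    v = len(vocab)
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + v))
        for prev, cur in zip(padded, padded[1:])
    )

def identify_language(models, seq):
    """Return the language whose model scores the sequence highest."""
    return max(models, key=lambda lang: log_likelihood(models[lang], seq))

# Hypothetical phone strings standing in for the output of a phone recognizer.
training_data = {
    "lang_a": ["k a t a".split(), "t a k a".split()],
    "lang_b": ["s t r i".split(), "s p r i".split()],
}
models = {lang: train_bigram_model(seqs) for lang, seqs in training_data.items()}

guess_a = identify_language(models, "k a t".split())
guess_b = identify_language(models, "s p r i".split())
```

    Each model here smooths over its own small phone inventory, which is enough for a toy comparison; a shared phone set across languages would be the more usual choice when the same recognizer tokenizes all the data.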

    Within-Speaker Features for Native Language Recognition in the Interspeech 2016 Computational Paralinguistics Challenge

    The Interspeech 2016 Native Language recognition challenge was to identify the first language of 867 speakers from their spoken English. Effectively this was an L2 accent recognition task where the L1 was one of eleven languages. The lack of transcripts of the spontaneous speech recordings meant that the currently best performing accent recognition approach (ACCDIST), developed by the author, could not be applied. Instead, the objective of this study was to explore whether within-speaker features found to be effective in ACCDIST would also have value within a contemporary GMM-based accent recognition approach. We show that while Gaussian mean supervectors provide the best performance on this task, small gains may be had by fusing the mean supervector system with a system based on within-speaker Gaussian mixture distances.

    Language identification with suprasegmental cues: A study based on speech resynthesis

    This paper proposes a new experimental paradigm to explore the discriminability of languages, a question which is crucial to the child born in a bilingual environment. This paradigm employs the speech resynthesis technique, enabling the experimenter to preserve or degrade acoustic cues such as phonotactics, syllabic rhythm or intonation from natural utterances. English and Japanese sentences were resynthesized, preserving broad phonotactics, rhythm and intonation (Condition 1), rhythm and intonation (Condition 2), intonation only (Condition 3), or rhythm only (Condition 4). The findings support the notion that syllabic rhythm is a necessary and sufficient cue for French adult subjects to discriminate English from Japanese sentences. The results are consistent with previous research using low-pass filtered speech, as well as with phonological theories predicting rhythmic differences between languages. Thus, the proposed methodology appears to be well suited to studying language discrimination. Applications for other domains of psycholinguistic research and for automatic language identification are considered.

    Phonotactic probability and phonotactic constraints: processing and lexical segmentation by Arabic learners of English as a foreign language

    PhD Thesis

    A fundamental skill in listening comprehension is the ability to recognize words. The ability to accurately locate word boundaries (i.e. to lexically segment) is an important contributor to this skill. Research has shown that English native speakers use various cues in the signal in lexical segmentation. One such cue is phonotactic constraints; more specifically, the presence of illegal English consonant sequences such as AV and MY signals word boundaries. It has also been shown that phonotactic probability (i.e. the frequency of segments and sequences of segments in words) affects native speakers' processing of English. However, the role that phonotactic probability and phonotactic constraints play in the EFL classroom has hardly been studied, while much attention has been devoted to teaching listening comprehension in EFL. This thesis reports on an intervention study which investigated the effect of teaching English phonotactics upon Arabic speakers' lexical segmentation of running speech in English. The study involved a native English group (N = 12), a non-native speaking control group (N = 20), and a non-native speaking experimental group (N = 20). Each of the groups took three tests, namely Non-word Rating, Lexical Decision and Word Spotting. These tests probed how sensitive the subjects were to English phonotactic probability and to the presence of illegal sequences of phonemes in English, and investigated whether they used these sequences in the lexical segmentation of English. The non-native groups were post-tested with the same tasks after only the experimental group had been given a treatment which consisted of explicit teaching of relevant English phonotactic constraints and related activities for 8 weeks. The gains made by the experimental group are discussed, with implications for teaching both pronunciation and listening comprehension in an EFL setting.

    Qassim University, Saudi Arabia