3,793 research outputs found

Native Language Identification on Text and Speech

Author: Ciobanu Alina Maria
Dinu Liviu P.
Zampieri Marcos
Publication date: 01/01/2017

This paper presents an ensemble system that combines the output of multiple SVM classifiers for native language identification (NLI). The system was submitted to the fusion track of the NLI Shared Task 2017, which featured student essays and spoken responses, in the form of audio transcriptions and iVectors, by non-native English speakers of eleven native languages. Our system competed in the challenge under the team name ZCD and was based on an ensemble of SVM classifiers trained on character n-grams, achieving 83.58% accuracy and ranking 3rd in the shared task. Comment: Proceedings of the Workshop on Innovative Use of NLP for Building Educational Applications (BEA

arXiv.org e-Print Archive

Crossref

An introduction to The National Institute for Japanese Language and Linguistics : A sketch of its achievements sixth edition

Author: The National Institute for Japanese Language and Linguistics
国立国語研究所
Publication venue: 国立国語研究所
Publication date: 01/09/2019

Institutional Repositories DataBase (IRDB)

Academic Repository of the National Institute for Japanese Language and Linguistics / 国立国語研究所学術情報リポジトリ

Machine Assisted Analysis of Vowel Length Contrasts in Wolof

Author: Besacier Laurent
Gauthier Elodie
Voisin Sylvie
Publication date: 01/06/2017

Growing digital archives and improving algorithms for the automatic analysis of text and speech create new opportunities for fundamental research in phonetics. Such empirical approaches allow statistical evaluation of a much larger set of hypotheses about phonetic variation and its conditioning factors (among them geographical and dialectal variants). This paper illustrates this vision and puts automatic methods to the test on a phenomenon that is not easily observable: vowel length contrast. We focus on Wolof, an under-resourced language from Sub-Saharan Africa. In particular, we propose multiple features for a fine-grained evaluation of the degree of length contrast under different factors, such as read vs. semi-spontaneous speech and standard vs. dialectal Wolof. Our measures, made fully automatically on more than 20k vowel tokens, show that the proposed features can highlight different degrees of contrast for each vowel considered. We notably show that contrast is weaker in semi-spontaneous speech and in a non-standard semi-spontaneous dialect. Comment: Accepted to Interspeech 201

arXiv.org e-Print Archive

Crossref

Hal - Université Grenoble Alpes

Workshop on Advanced Corpus Solutions

Author: Johannessen Janne Bondi
Publication venue: Institute of Digital Enhancement of Cognitive Processing, Waseda University
Publication date: 01/01/2010

Waseda University Repository

NORA - Norwegian Open Research Archives

Vowel duration and the voicing effect across dialects of English

Author: Sonderegger Morgan
Stuart-Smith Jane
Tanner James
The SPADE Data Consortium
Publication venue: University of Toronto Libraries - UOTL
Publication date: 15/08/2019

The ‘voicing effect’ – the durational difference in vowels preceding voiced and voiceless consonants – is a well-documented phenomenon in English, where it plays a key role in the production and perception of the English final voicing contrast. Despite this supposed importance, little is known about how robust this effect is in spontaneous connected speech, which is itself subject to a range of linguistic factors. Similarly, little attention has been paid to variability in the voicing effect across dialects of English, apart from analyses of specific varieties. Our findings show that the voicing of the following consonant exhibits a weaker-than-expected effect in spontaneous speech, interacting with manner, vowel height, speech rate, and word frequency. English dialects appear to demonstrate a continuum of potential voicing effect sizes, where varieties with dialect-specific phonological rules exhibit the most extreme values. The results suggest that the voicing effect in English is both substantially weaker than previously assumed in spontaneous connected speech and subject to a wide range of dialectal variability.

University of Toronto: Journal Publishing Services

Enlighten

COMPUTER CORPORA AND THEIR USE IN LANGUAGE ANALYSIS

Author: Atsuko Umesaki
ウメサキアツコ
梅咲敦子
Publication venue: 帝塚山短期大学文芸学科
Publication date: 01/03/1996

Tezukayama University Repository

Towards dialect-inclusive recognition in a low-resource language: are balanced corpora the answer?

Author: Chasaide Ailbhe Ní
Chiaráin Neasa Ní
Gobl Christer
Lonergan Liam
Qian Mengjie
Publication date: 14/07/2023

ASR systems are generally built for the spoken 'standard', and their performance declines for non-standard dialects/varieties. This is a problem for a language like Irish, where there is no single spoken standard, but rather three major dialects: Ulster (Ul), Connacht (Co) and Munster (Mu). As a diagnostic to quantify the effect of the speaker's dialect on recognition performance, 12 ASR systems were trained, first using baseline dialect-balanced training corpora, and then using modified versions of the baseline corpora, where dialect-specific materials were either subtracted or added. Results indicate that dialect-balanced corpora do not yield similar performance across the dialects: the Ul dialect consistently underperforms, whereas Mu yields the lowest WERs. There is a close relationship between the Co and Mu dialects, but one that is not symmetrical. These results will guide future corpus collection and system building strategies to optimise for cross-dialect performance equity. Comment: Accepted to Interspeech 2023, Dubli

arXiv.org e-Print Archive