
    A corpus-based study of Spanish L2 mispronunciations by Japanese speakers

    In a companion paper (Carranza et al.) submitted to this conference, we discuss the importance of collecting specific L1-L2 speech corpora for developing effective Computer Assisted Pronunciation Training (CAPT) programs. In this paper we examine this point more deeply by reporting on a study aimed at compiling and analysing such a corpus to draw up an inventory of recurrent pronunciation errors to be addressed in a CAPT application that makes use of Automatic Speech Recognition (ASR). In particular, we discuss some of the results obtained in the analyses of this corpus and some of the methodological issues we had to deal with. The corpus features 8.9 hours of spontaneous, semi-spontaneous and read speech recorded from 20 Japanese students of Spanish L2. The speech data was segmented and transcribed at the orthographic, canonical-phonemic and narrow-phonetic levels using the Praat software [1]. We adopted the SAMPA phonemic inventory adapted to Spanish [2] for the phonemic transcription and added 11 new symbols and 7 diacritics taken from X-SAMPA [3] for the narrow-phonetic transcription. Non-linguistic phenomena and incidents were also annotated with XML tags in independent tiers. Standards for transcribing and annotating non-native spontaneous speech ([4], [5]), as well as the error encoding system used in the project, are also addressed. Up to 13,410 errors were segmented, aligned with the canonical-phonemic and narrow-phonetic tiers, and annotated following an encoding system that specifies the type of error (substitution, insertion or deletion), the affected phone, and the preceding and following phonemic contexts in which the error occurred. We then carried out additional analyses to check the accuracy of the transcriptions by asking two other annotators to transcribe a subset of the speech material and calculating inter-transcriber agreement coefficients.
The data was automatically extracted with Praat scripts and statistically analysed with R. The frequency ratios obtained for the most frequent errors and their most frequent contexts of appearance were statistically tested for significance. We report on the analyses of the combined annotations and draw up an inventory of errors that should be addressed in the training. We then consider how ASR can be employed to reliably detect these errors. Finally, we suggest possible exercises that may be included in the training to remediate the errors identified.
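The error-encoding scheme this abstract describes (error type, affected phone, and surrounding phonemic context) lends itself to a simple tabular representation with a frequency tally. The sketch below is a hypothetical illustration, not the project's actual format; the field names and example records are assumptions:

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class ErrorRecord:
    # Illustrative fields inferred from the abstract's description
    error_type: str  # "substitution", "insertion", or "deletion"
    phone: str       # affected phone, in SAMPA notation
    left_ctx: str    # preceding phonemic context
    right_ctx: str   # following phonemic context

# Toy annotations standing in for the 13,410 segmented errors
records = [
    ErrorRecord("substitution", "r", "a", "o"),
    ErrorRecord("substitution", "r", "a", "o"),
    ErrorRecord("deletion", "s", "o", "#"),
]

# Tally the most frequent (error type, phone) pairs, as a first step
# toward the frequency ratios the study tests statistically
freq = Counter((r.error_type, r.phone) for r in records)
print(freq.most_common(1))  # [(('substitution', 'r'), 2)]
```

In a real pipeline, such records would be read from the Praat TextGrid tiers and the resulting frequency table exported for the statistical tests in R.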

    The use of initial h- in the writing-tablets from Roman Britain

    This chapter focuses on the treatment of h in initial position in non-literary texts written on tablets from Roman Britain. The analysis highlights the variation in the treatment of h-. We consider the cases of h- insertion in initial position in the Vindolanda corpus, which targets specific areas of the lexicon: everyday language (Tab.Vindol. 622, hostrea) and, more importantly, personal names (Tab.Vindol. 184, Huettius). In contrast, the other non-literary corpora of Londinium-Bloomberg, Carlisle and the curse tablets show a different outcome, as they contain only cases of h- deletion in initial position, which follows a more widely attested non-standard Latin development eventually reflected in the Romance languages.

    Oral corpora, French language education and Francophonie: how to turn linguistic data into pedagogical resources


    Charting the rise and demise of a phonotactically motivated change in Scots

    Although Old English [f] and [v] are represented unambiguously in Older Scots orthography by <f> and <v> (or <u>) in initial and morpheme-internal position, in morpheme-final position <f> and <v>/<u> appear to be used interchangeably for both of these Old English sounds. As a result, there is often a mismatch between the spellings and the etymologically expected consonant. This paper explores these spellings using a substantial database of Older Scots texts, which have been grapho-phonologically parsed as part of the From Inglis to Scots (FITS) project. Three explanations are explored for this apparent mismatch: (1) it was a spelling-only change; (2) there was a near merger of /f/ and /v/ in Older Scots; (3) final [v] devoiced in (pre-)Older Scots but this has subsequently been reversed. A close analysis of the data suggests that the Old English phonotactic constraint against final voiced fricatives survived into the pre-Literary Scots period, leading to automatic devoicing of any fricative that appeared in word-final position (a version of Hypothesis 3), and this, interacting with final schwa loss, gave rise to the complex patterns of variation we see in the Older Scots data. Thus, the devoicing of [v] in final position was not just a phonetically natural sound change, but also one driven by a pre-existing phonotactic constraint in the language. This paper provides evidence for the active role of phonotactic constraints in the development of sound changes, suggesting that phonotactic constraints are not necessarily at the mercy of the changes which conflict with them, but can themselves be involved in directing sound change.

    An Exploratory analysis of individual variation in schwa epenthesis in Flemish Dutch and Scottish English

    This study examines schwa epenthesis as it occurs in Flemish Dutch and Scottish English from a forensic phonetic perspective, in terms of its potential for speaker comparison purposes, as prior investigations have shown that sociolinguistic variables, such as schwa insertion, may also reveal individual speaker variability. An analysis of spontaneous speech samples from two homogeneous groups of speakers revealed that schwa epenthesis in Flemish showed higher inter-speaker variability than in Scottish English. This suggests that while Flemish schwa insertion may be suitable for use in forensic phonetic casework, the same process in Scottish English is not.
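The notion of inter-speaker variability underpinning the forensic argument can be illustrated with a toy computation: a feature is more useful for speaker comparison when its values spread widely across speakers. The per-speaker rates below are invented for illustration only, not the study's data:

```python
from statistics import pstdev

# Hypothetical per-speaker schwa-epenthesis rates (proportion of eligible
# consonant clusters realised with an inserted schwa); all values invented.
flemish = {"F1": 0.42, "F2": 0.10, "F3": 0.75, "F4": 0.28}
scottish = {"S1": 0.30, "S2": 0.33, "S3": 0.29, "S4": 0.31}

# A larger spread across speakers suggests greater speaker-discriminating
# potential for the variable in question.
print(pstdev(flemish.values()) > pstdev(scottish.values()))  # True
```

Under this toy comparison, the Flemish rates vary much more from speaker to speaker, mirroring the study's conclusion about which variety shows forensic potential.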

    Individual and Domain Adaptation in Sentence Planning for Dialogue

    One of the biggest challenges in the development and deployment of spoken dialogue systems is the design of the spoken language generation module. This challenge arises from the need for the generator to adapt to many features of the dialogue domain, user population, and dialogue context. A promising approach is trainable generation, which uses general-purpose linguistic knowledge that is automatically adapted to the features of interest, such as the application domain, individual user, or user group. In this paper we present and evaluate a trainable sentence planner for providing restaurant information in the MATCH dialogue system. We show that trainable sentence planning can produce complex information presentations whose quality is comparable to the output of a template-based generator tuned to this domain. We also show that our method easily supports adapting the sentence planner to individuals, and that the individualized sentence planners generally perform better than models trained and tested on a population of individuals. Previous work has documented and utilized individual preferences for content selection, but to our knowledge, these results provide the first demonstration of individual preferences for sentence planning operations, affecting the content order, discourse structure and sentence structure of system responses. Finally, we evaluate the contribution of different feature sets, and show that, in our application, n-gram features often do as well as features based on higher-level linguistic representations.
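As a small illustration of the kind of n-gram features the evaluation refers to, the sketch below extracts word bigrams from a candidate realisation; the example sentence and feature design are illustrative assumptions, not the MATCH system's actual feature set:

```python
def ngram_features(text: str, n: int = 2) -> set:
    """Return the set of word n-grams in `text` (a toy feature extractor)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

# A hypothetical sentence-plan realisation for a restaurant recommendation
plan = "Babbo has good decor and good service"
features = ngram_features(plan)
print(("good", "decor") in features)  # True
```

In a trainable ranker, such features would be fed to a learned scoring model that prefers realisations resembling those human judges rated highly.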

    A spoken Chinese corpus: development, description, and application in L2 studies: a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Applied Linguistics at Massey University, Manawatū, New Zealand

    This thesis introduces a corpus of present-day spoken Chinese, which contains over 440,000 words of orthographically transcribed interactions. The corpus is made up of an L1 corpus and an L2 corpus. It includes data gathered in informal contexts in 2018, and is, to date, the first Chinese corpus resource of its kind investigating non-test/task-oriented dialogical interaction of L2 Chinese. The main part of the thesis is devoted to a detailed account of the compilation of the spoken Chinese corpus, including its design, data collection, and transcription. In doing this, the study attempts to answer the question: what are the key considerations in building a spoken Chinese corpus of informal interaction, especially a spoken L2 corpus of L1–L2 interaction? The thesis then compares the L1 corpus and the L2 corpus before using them to carry out corpus studies. Differences between and within the two subcorpora are discussed in some detail. This corpus comparison is essential to any L1–L2 comparative studies conducted on the basis of the spoken Chinese corpus, and it addresses the question: to what extent is the L1 corpus comparable to the L2 corpus? Finally, the thesis demonstrates the research potential of the spoken Chinese corpus by presenting an analysis of the L2 use of the discourse marker 就是 jiushi in comparison with the L1 use. The analysis mainly considers the contribution 就是 jiushi makes, as a reformulation marker, to utterance interpretation within the relevance-theoretic framework. To do this, it seeks to answer the question: what are the features that characterise the L2 use of the marker 就是 jiushi in informal speech? The results of this study make several useful contributions to the academic community. First of all, the spoken Chinese corpus is available to the academic community through the project website, so the corpus itself is expected to be of use to researchers, Chinese teachers, and students who are interested in spoken Chinese.
In addition to the data itself, this thesis presents a transparent account of each step in the compilation of both the L1 and L2 corpora. As a result, the decisions and strategies taken in designing and constructing the spoken corpora can offer valuable guidance to researchers who want to build their own spoken Chinese corpora. Finally, the findings of the comparative analysis of the L2 use of the marker 就是 jiushi will contribute to research on the teaching and learning of interactive spoken Chinese.
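A comparison of marker frequency across subcorpora of different sizes, like the L1 vs L2 comparison of 就是 jiushi described above, is typically normalised per fixed number of words. The counts below are invented purely to demonstrate the normalisation, not the thesis's figures:

```python
# Hypothetical token totals and raw counts of the marker 就是 (jiushi)
# in an L1 and an L2 subcorpus; all numbers are invented for illustration.
corpora = {
    "L1": {"tokens": 240_000, "jiushi": 960},
    "L2": {"tokens": 200_000, "jiushi": 420},
}

# Normalise raw counts to occurrences per 10,000 words so that
# subcorpora of different sizes can be compared directly.
for name, c in corpora.items():
    per_10k = c["jiushi"] / c["tokens"] * 10_000
    print(f"{name}: {per_10k:.1f} per 10,000 words")
```

With these toy figures, the normalised rates (40.0 vs 21.0 per 10,000 words) would suggest less frequent use of the marker in the L2 subcorpus, the kind of contrast the comparative analysis investigates.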
