15 research outputs found

    Building the Arabic Learner Corpus and a System for Arabic Error Annotation

    Recent developments in learner corpora have highlighted their growing role in linguistic and computational research areas such as language teaching and natural language processing. However, a well-designed Arabic learner corpus that can be used for studies in these areas is lacking. This thesis introduces a detailed and original methodology for developing a new learner corpus. This methodology, which represents the major contribution of the thesis, combines resources, proposed standards and tools developed for the Arabic Learner Corpus (ALC) project. The resources include the Arabic Learner Corpus, which is the largest learner corpus for Arabic based on systematic design criteria, and the Error Tagset of Arabic, which was designed for annotating errors in Arabic and covers 29 error types under five broad categories. The Guide on Design Criteria for Learner Corpus, an example of the proposed standards, was created from a review of previous work and focuses on 11 aspects of corpus design. The tools include the Computer-aided Error Annotation Tool for Arabic, which provides functions that facilitate error annotation, such as smart selection and auto-tagging, and the ALC Search Tool, which enables searching the ALC and downloading the source files based on a number of determinants. The project recruited 992 people, including language learners, data collectors, evaluators, annotators and collaborators, from more than 30 educational institutions in Saudi Arabia and the UK. The data of the Arabic Learner Corpus has been used in a number of projects for different purposes, including error detection and correction, native language identification, evaluation of Arabic analysers, applied linguistics studies and data-driven Arabic learning. These uses of the ALC highlight the importance of developing this project.

    Proceedings of the Sixth International Conference Formal Approaches to South Slavic and Balkan Languages

    The Proceedings of the Sixth International Conference Formal Approaches to South Slavic and Balkan Languages publish 22 papers presented at the conference, which was held in Dubrovnik, Croatia, on 25-28 September 2008.

    Compiling and annotating a learner corpus for a morphologically rich language: CzeSL, a corpus of non-native Czech

    Learner corpora, linguistic collections documenting a language as used by learners, provide an important empirical foundation for language acquisition research and teaching practice. This book presents CzeSL, a corpus of non-native Czech, against the background of theoretical and practical issues in current learner corpus research. Languages with rich morphology and relatively free word order, including Czech, are particularly challenging for the analysis of learner language. The authors address both the complexity of learner error annotation, describing three complementary annotation schemes, and the complexity of describing non-native Czech in terms of standard linguistic categories. The book discusses in detail the practical aspects of corpus creation: the process of collection and annotation itself, the supporting tools, the resulting data, their formats and search platforms. The chapter on use cases exemplifies the usefulness of learner corpora for teaching, language acquisition research, and computational linguistics. Any researcher developing learner corpora will appreciate the concluding chapter, which lists lessons learned and pitfalls to avoid.

    Tools for linguistic variation

    Index: Introducción a los problemas y métodos según los principios de la Escuela Dialectométrica de Salzburgo (con ejemplos sacados del “Atlante Italo-Svizzero”, AIS) (Hans Goebl); Some further dialectometrical stops (John Nerbonne, Jelena Prokic, Martijn Wieling and Charlotte Gooskens); Tools for dialect syntax: the case of CORDIAL-SIN (an annotated corpus of Portuguese dialects) (Ernestina Carrilho); Le projet Vivaldi: présentation d’un atlas linguistique parlant virtual (Roland Bauer); Le Thesaurus Occitan: une base de données multimedia consacrée aux dialectes occitans (Guylaine Brun-Trigaud); The Thesaurus Occitan: a multimedia database dedicated to Occitan dialects; presentation of its morphosyntax module (Pierre-Aurélien Georges); New methods for the study of grammatical variation and the Audible Corpus of Spoken Rural Spanish (Inés Fernández Ordóñez); The application of speech synthesis and speech recognition techniques in dialectal studies (María Pilar Perea); Relevancia del análisis lingüístico en el tratamiento cuantitativo de la variación dialectal (Esteve Clua); El procesamiento informático de los materiales del Atlas Lingüístico de la Península Ibérica de Tomás Navarro Tomás (Pilar García Mouton); Un retrato del artículo vasco en el año 1895 mediante el programa VDM (Ekaitz Santazilia); Technology for prosodic variation (Gotzon Aurrekoetxea and Aitor Iglesias).

    The very model of a modern linguist — in honor of Helge Dyvik
