4 research outputs found

    Linguistic complexity in second language development: variability and variation at advanced stages

    As shown by a growing body of research, language is a dynamic system and language development is a dynamic process characterized by variability (intra-individual changes) and variation (inter-individual differences). This implies that the development of linguistic complexity is an individually owned process, and we should not assume a priori that generalization beyond the individual is possible. Therefore, if we want to explore common patterns across learners, it is best to trace development in individual cases.
    This paper explores which linguistic complexity measures most convincingly characterize development at advanced L2 stages. Study 1 is a single case study that asks which measures capture overall development best for this individual; average word length and finite verb ratio are found to correlate strongly with both development over time and text ratings. Study 2 traces these two measures, along with dependent clauses, in two other, similar learners. The three learners were indeed somewhat similar in the development of the two general measures, but not in the development of dependent clauses. A Hidden Markov Modeling analysis also showed that the learners developed in different stages, confirming the dynamic hypothesis of individually owned trajectories.
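The two measures named in the abstract are straightforward to compute from tokenized, part-of-speech-tagged text. A minimal sketch follows; it is not the authors' implementation, and the toy sentence, its tags, and the choice of Penn Treebank finite-verb tags (`VBD`, `VBP`, `VBZ`) are illustrative assumptions.

```python
# Sketch of two complexity measures from the abstract: average word length
# and finite verb ratio. Tag set and example data are assumptions.

def average_word_length(tokens):
    """Mean number of characters per word token."""
    return sum(len(t) for t in tokens) / len(tokens)

def finite_verb_ratio(tagged_tokens, finite_tags=("VBD", "VBP", "VBZ")):
    """Share of tokens tagged as finite verb forms (tag set is assumed)."""
    finite = sum(1 for _, tag in tagged_tokens if tag in finite_tags)
    return finite / len(tagged_tokens)

# Toy tagged sentence (tags are illustrative, Penn Treebank style).
tagged = [("she", "PRP"), ("writes", "VBZ"), ("long", "JJ"),
          ("sentences", "NNS"), ("daily", "RB")]
tokens = [t for t, _ in tagged]

print(average_word_length(tokens))   # (3+6+4+9+5) / 5 = 5.4
print(finite_verb_ratio(tagged))     # 1 finite verb of 5 tokens = 0.2
```

In practice the tags would come from a POS tagger rather than being hand-supplied, and the ratio could be restricted to clauses rather than raw tokens; the measures themselves stay this simple.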

    Incorporating Weak Statistics for Low-Resource Language Modeling

    Automatic speech recognition (ASR) requires a strong language model to guide the acoustic model and favor likely utterances. While many tasks enjoy billions of language model training tokens, many domains which require ASR do not have readily available electronic corpora. The only source of useful language modeling data is expensive and time-consuming human transcription of in-domain audio. This dissertation seeks to quickly and inexpensively improve low-resource language modeling for use in automatic speech recognition. It first considers the efficient use of non-professional human labor to improve system performance, and demonstrates that it is better to collect more data, despite higher transcription error, than to transcribe data redundantly to improve quality. In the process of developing procedures to collect such data, this work also presents an efficient rating scheme for detecting poor transcribers without gold-standard data. As an alternative to this process, automatic transcripts are generated with an ASR system, and this work explores efficiently combining these low-quality transcripts with a small amount of high-quality transcripts. Standard n-gram language models are sensitive to the quality of the highest-order n-gram and are unable to exploit accurate weaker statistics. Instead, a log-linear language model is introduced, which elegantly incorporates a variety of background models through MAP adaptation. This work introduces marginal class constraints, which effectively capture knowledge of transcriber error and improve performance over n-gram features. Finally, this work constrains the language modeling task to keyword search of words unseen in the training text. While overall system performance is good, these words suffer the most due to a low probability in the language model. Semi-supervised learning effectively extracts likely n-grams containing these new keywords from a large corpus of audio. By using a search metric that favors recall over precision, this method captures over 80% of the potential gain.
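The MAP-adaptation idea the abstract mentions can be illustrated in its simplest form: smoothing a small in-domain count table toward a larger background model, with a prior weight controlling how strongly the background is trusted. This is a minimal unigram sketch under assumed data and a hypothetical `tau` parameter, not the dissertation's exact log-linear formulation.

```python
# Sketch of MAP adaptation for a unigram language model: in-domain counts
# are blended with a background distribution. tau is a hypothetical prior
# weight; higher tau trusts the background model more.
from collections import Counter

def map_adapt_unigram(in_domain_tokens, background_probs, tau=10.0):
    counts = Counter(in_domain_tokens)
    n = len(in_domain_tokens)
    vocab = set(counts) | set(background_probs)
    return {w: (counts[w] + tau * background_probs.get(w, 0.0)) / (n + tau)
            for w in vocab}

# Toy background model (assumed) and three in-domain tokens.
background = {"the": 0.5, "cat": 0.25, "dog": 0.25}
adapted = map_adapt_unigram(["the", "dog", "dog"], background, tau=2.0)

print(adapted["dog"])  # (2 + 2*0.25) / (3 + 2) = 0.5
```

Because the background probabilities sum to one, the adapted distribution is also a proper distribution; the same blend generalizes to higher-order n-grams, which is where the dissertation's log-linear model and marginal class constraints come in.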