118 research outputs found

    Processing Relative Clauses in Turkish as a Second Language

    The present study focuses on the processing of relative clauses in Turkish as a second language. Its specific purpose is to address a gap in previous research: why certain relative clause constructions should be more difficult to process than others. For example, in English, object relative clauses such as "the lion that the cow carries" are more difficult to comprehend and produce than subject relative clauses such as "the lion that carries the cow." It has been stated for both L1 and L2 learners that these observed differences in difficulty parallel the implicational relationships in Keenan and Comrie's (1977) Noun Phrase Accessibility Hierarchy Hypothesis (NPAH). Although there has been some research on this issue, the question of why the acquisition order follows this pattern has never been fully answered, since different theories make the same predictions for the languages investigated thus far. However, in an SOV language like Turkish, because of its particular structural characteristics, the predictions of those theories diverge, and thus their separate effects can be disentangled. Therefore, the present study explores the issue using the Turkish language. The results of picture selection tasks taken by 20 English and 7 Japanese, Korean and Mongolian learners of Turkish indicate that learners process object relative clauses more easily than subject relative clauses, contrary to the results in the literature for the same construction in other languages. These results have significant implications for the theory of second language acquisition. These implications include, among others, questions about the accuracy of current views of "interlanguages" (language learner languages) and of the role of "language universals" in second language acquisition.

    Statistical Parsing by Machine Learning from a Classical Arabic Treebank

    Research into statistical parsing for English has enjoyed over a decade of successful results. However, adapting these models to other languages has met with difficulties. Previous comparative work has shown that Modern Arabic is one of the most difficult languages to parse due to rich morphology and free word order. Classical Arabic is the ancient form of Arabic, and is understudied in computational linguistics, relative to its worldwide reach as the language of the Quran. The thesis is based on seven publications that make significant contributions to knowledge relating to annotating and parsing Classical Arabic. Classical Arabic has been studied in depth by grammarians for over a thousand years using a traditional grammar known as i’rāb (إعراب). Using this grammar to develop a representation for parsing is challenging, as it describes syntax using a hybrid of phrase-structure and dependency relations. This work aims to advance the state-of-the-art for hybrid parsing by introducing a formal representation for annotation and a resource for machine learning. The main contributions are the first treebank for Classical Arabic and the first statistical dependency-based parser in any language for ellipsis, dropped pronouns and hybrid representations. A central argument of this thesis is that using a hybrid representation closely aligned to traditional grammar leads to improved parsing for Arabic. To test this hypothesis, two approaches are compared. As a reference, a pure dependency parser is adapted using graph transformations, resulting in an 87.47% F1-score. This is compared to an integrated parsing model with an F1-score of 89.03%, demonstrating that joint dependency-constituency parsing is better suited to Classical Arabic. The Quran was chosen for annotation as a large body of work exists providing detailed syntactic analysis. Volunteer crowdsourcing is used for annotation in combination with expert supervision. A practical result of the annotation effort is the corpus website: http://corpus.quran.com, an educational resource with over two million users per year.

    Proceedings of the 1st Conference on Central Asian Languages and Linguistics (ConCALL)

    The Conference on Central Asian Languages and Linguistics (ConCALL) was founded in 2014 at Indiana University by Dr. Öner Özçelik, the residing director of the Center for Languages of the Central Asian Region (CeLCAR). As the nation’s sole U.S. Department of Education funded Language Resource Center focusing on the languages of the Central Asian Region, CeLCAR’s main mission is to strengthen and improve the nation’s capacity for teaching and learning Central Asian languages through teacher training, research, materials development projects, and dissemination. As part of this mission, CeLCAR has an ultimate goal to unify and fortify the Central Asian language learning community by facilitating networking between linguists and language educators, encouraging research projects that will inform language instruction, and providing opportunities for professionals in the field to both showcase their work and receive feedback from their peers. Thus ConCALL was established to be the first international academic conference to bring together linguists and language educators in the languages of the Central Asian region, including both the Altaic and Eastern Indo-European languages spoken in the region, to focus on research into how these specific languages are represented formally, as well as acquired by second/foreign language learners, and also to present research-driven teaching methods.
Languages served by ConCALL include, but are not limited to: Azerbaijani, Dari, Karakalpak, Kazakh, Kyrgyz, Lokaabharan, Mari, Mongolian, Pamiri, Pashto, Persian, Russian, Shughnani, Tajiki, Tibetan, Tofalar, Tungusic, Turkish, Tuvan, Uyghur, Uzbek, Wakhi and more! The Conference on Central Asian Languages and Linguistics held at Indiana University on 16-17 May 2014 was made possible through the generosity of our sponsors: Center for Languages of the Central Asian Region (CeLCAR), Ostrom Grant Programs, IU's College of Arts and Humanities Center (CAHI), Inner Asian and Uralic National Resource Center (IAUNRC), IU's School of Global and International Studies (SGIS), IU's College of Arts and Sciences, Sinor Research Institute for Inner Asian Studies (SRIFIAS), IU's Department of Central Eurasian Studies (CEUS), and IU's Department of Linguistics.

    Elements, Government, and Licensing: Developments in phonology

    Elements, Government, and Licensing brings together new theoretical and empirical developments in phonology. It covers three principal domains of phonological representation: melody and segmental structure; tone, prosody and prosodic structure; and phonological relations, empty categories, and vowel-zero alternations. Theoretical topics covered include the formalisation of Element Theory, the hotly debated topic of structural recursion in phonology, and the empirical status of government. In addition, a wealth of new analyses and empirical evidence sheds new light on empty categories in phonology, the analysis of certain consonantal sequences, phonological and non-phonological alternation, the elemental composition of segments, and many more. Taking up long-standing empirical and theoretical issues informed by Government Phonology and Element Theory, this book provides theoretical advances while also bringing to light new empirical evidence and analysis challenging previous generalisations. The insights offered here will be equally exciting for phonologists working on related issues inside and outside the Principles & Parameters programme, such as researchers working in Optimality Theory or classical rule-based phonology.

    Compiling and annotating a learner corpus for a morphologically rich language: CzeSL, a corpus of non-native Czech

    Learner corpora, linguistic collections documenting a language as used by learners, provide an important empirical foundation for language acquisition research and teaching practice. This book presents CzeSL, a corpus of non-native Czech, against the background of theoretical and practical issues in current learner corpus research. Languages with rich morphology and relatively free word order, including Czech, are particularly challenging for the analysis of learner language. The authors address both the complexity of learner error annotation, describing three complementary annotation schemes, and the complexity of describing non-native Czech in terms of standard linguistic categories. The book discusses in detail the practical aspects of corpus creation: the process of collection and annotation itself, the supporting tools, the resulting data, their formats and search platforms. The chapter on use cases exemplifies the usefulness of learner corpora for teaching, language acquisition research, and computational linguistics. Any researcher developing learner corpora will surely appreciate the concluding chapter listing lessons learned and pitfalls to avoid.

    Paths through meaning and form: Festschrift offered to Klaus von Heusinger on the occasion of his 60th birthday

    “Paths through meaning and form. Festschrift offered to Klaus von Heusinger on the occasion of his 60th birthday” comprises 60 contributions by colleagues who have worked with Klaus von Heusinger over the course of his academic career. The topics addressed in the individual contributions include prominence, referentiality, quantification, case, language acquisition, and experimental psycholinguistics.

    An Investigation into Automatic Translation of Prepositions in IT Technical Documentation from English to Chinese

    Machine Translation (MT) technology has been widely used in the localisation industry to boost the productivity of professional translators. However, given the high quality of translation expected, the performance of an MT system in isolation is less than satisfactory because of the various errors it generates. This study focuses on the translation of prepositions from English into Chinese within technical documents in an industrial localisation context. The aim of the study is to reveal the salient errors in the translation of prepositions and to explore possible methods to remedy these errors. This study proposes three new approaches to improve the translation of prepositions. All approaches attempt to make use of the strengths of the two most popular MT architectures at the moment: Rule-Based MT (RBMT) and Statistical MT (SMT). The approaches are: first, building an automatic preposition dictionary for the RBMT system; second, exploring and modifying the process of Statistical Post-Editing (SPE); and third, pre-processing the source texts to better suit the RBMT system. Overall evaluation results (both human evaluation and automatic evaluation) show the potential of our new approaches in improving the translation of prepositions. In addition, the current study also reveals a new function of automatic metrics in assisting researchers to obtain more valid or purpose-specific human evaluation results.