289 research outputs found

    Survey on Publicly Available Sinhala Natural Language Processing Tools and Research

    Sinhala is the native language of the Sinhalese people, who make up the largest ethnic group of Sri Lanka. The language belongs to the globe-spanning Indo-European language tree. However, due to poverty in both linguistic and economic capital, Sinhala, from the perspective of Natural Language Processing tools and research, remains a resource-poor language that has neither the economic drive of its cousin English nor the sheer push of the law of numbers that a language such as Chinese has. A number of research groups from Sri Lanka have noticed this dearth and the resulting dire need for proper tools and research for Sinhala natural language processing. However, for various reasons, these attempts seem to lack coordination and awareness of each other. The objective of this paper is to fill that gap with a comprehensive literature survey of the publicly available Sinhala natural language tools and research, so that researchers working in this field can better utilize the contributions of their peers. As such, we shall be uploading this paper to arXiv and updating it periodically to reflect the advances made in the field.

    An Evaluation of Sinhala Language NLP Tools and Neural Network Based POS Taggers

    Part-of-speech (POS) tagging is a fundamental problem in the NLP domain, and POS taggers are used to address it. Although rule-based, probabilistic, or deep learning approaches can be used to develop a POS tagger, deep learning based POS taggers have shown better results. All POS tagging research carried out so far for the Sinhala language has used rule-based and probabilistic approaches. This research focuses on developing and evaluating deep learning based POS taggers for the Sinhala language using an LSTM network. In this research we trained five deep learning based POS tagging models on two different data sets and evaluated the results of those models. The evaluation results show that deep learning based POS taggers can be used for the Sinhala language and that their performance is better than that of the existing rule-based or probabilistic POS taggers.

    Syntactic Competence and Processing: Constraints on Long-distance A-bar Dependencies in Bilinguals.

    This dissertation investigates the syntactic competence and processing of A-bar dependencies by Sinhala native speakers in their L2 English. The specific focus is on wh-dependencies (wh-questions and relative clauses) and topicalization, given that these phenomena are syntactically distinct across the two languages. Presenting novel results from a series of psycholinguistic experiments, the study reevaluates the predictive and explanatory power of two recent hypotheses in generative SLA —the Feature Interpretability Hypothesis (FIH) and the Shallow Structure Hypothesis (SSH)— which concern the kind of ultimate attainment possible in post-childhood L2 acquisition, regarding syntactic competence and real-time processing. The first part of the dissertation is a re-evaluation of the FIH, in particular the claim that post-childhood L2 learners fail to develop native-like underlying mental representations for the target language syntax because their access to UG is restricted in the domain of uninterpretable syntactic features. Two experiments (Grammaticality Judgment and Truth-value Judgment tasks) were conducted with thirty-eight Sinhala L1/English L2 speakers and a control group of thirty-one English monolinguals. Our results are consistent with the hypothesis that highly proficient L2 speakers are capable of acquiring native-like syntactic competence even in those domains where L2 acquisition involves the mastery of a new uninterpretable feature. The fact that these L2ers have been able to overcome a poverty of the stimulus problem, imposed by both their L1 syntax and L2 input, implies that full access to UG is available in post-childhood L2 acquisition, against the predictions of the FIH. 
    The second part of the dissertation re-evaluates a tenet of the Shallow Structure Hypothesis that in real-time processing of the target language, L2 speakers fail to build full-fledged syntactic representations, but instead over-rely on non-syntactic information (lexical semantics and contextual cues), unlike native speakers of a target language. Our results from two Self-paced Reading experiments with thirty-six bilinguals and thirty-nine monolinguals support the conclusion that advanced L2 learners are capable of building complex native-like syntactic representations during their real-time comprehension of the target language. Thus, the study concludes that neither the FIH nor the SSH can be maintained in the experimental L2 acquisition domain investigated in this dissertation.
    PhD, Linguistics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/116655/1/sujeewa_1.pd

    A preliminary bibliography on focus

    [I]n its present form, the bibliography contains approximately 1100 entries. Bibliographical work is never complete, and the present one is still modest in a number of respects. It is not annotated, and it still contains a lot of mistakes and inconsistencies. It has nevertheless reached a stage which justifies considering the possibility of making it available to the public. The first step towards this is its pre-publication in the form of this working paper. […] The bibliography is less complete for earlier years. For works before 1970, the bibliographies of Firbas and Golkova 1975 and Tyl 1970, which have not been included here, may be consulted.

    Syllable Segmentation, Normalization, and Lexicographic Ordering of Myanmar Text by Formal Methods

    Nagaoka University of Technology (National University Corporation)

    Local Object Scrambling in Sinhala: Evidence for A-bar Movement

    This paper provides an analysis of local object scrambling that generates the Object (O), Subject (S), and Verb (V) word order in Sinhala, an Indo-Aryan isolate spoken in Sri Lanka. Even though it has been generally assumed in the limited generative literature on Sinhala that its OSV word order is derived through constituent scrambling, no prior study has systematically investigated the nature of the operation responsible for its derivation. This study reveals that local object scrambling (OSV) in Sinhala results from the syntactic merge, and it is uniformly an A-bar movement operation. The evidence comes from binding, reconstruction and parasitic gaps, the diagnostics standard in generative syntactic literature on scrambling. The analysis has implications for a generative theory on scrambling, a phenomenon that has remained a problem for the Minimalist Syntactic approach.
    KEYWORDS: scrambling, Sinhala, OSV word order, A-bar movement

    Grammatical properties of pronouns and their representation : an exposition

    This volume brings together a cross-section of recent research on the grammar and representation of pronouns, centering around the typology of pronominal paradigms, the generation of syntactic and semantic representations for constructions containing pronouns, and the neurological underpinnings of linguistic distinctions that are relevant for the production and interpretation of these constructions. In this introductory chapter we first give an exposition of our topic (section 2). Taking the interpretation of pronouns as a starting point, we discuss the basic parameters of pronominal representations, and draw a general picture of how morphological, semantic, discourse-pragmatic and syntactic aspects come together. In section 3, we sketch the different domains of research that are concerned with these phenomena and the particular questions they are interested in, and show how the papers in the present volume fit into the picture. Section 4 gives summaries of the individual papers and a short synopsis of their main points of convergence.