
    Crossings as a side effect of dependency lengths

    The syntactic structure of sentences exhibits a striking regularity: dependencies tend not to cross when drawn above the sentence. We investigate two competing explanations. The traditional hypothesis is that this trend arises from an independent principle of syntax that reduces crossings practically to zero. The alternative hypothesis is that crossings are a side effect of dependency lengths, i.e., sentences with shorter dependency lengths should tend to have fewer crossings. We are able to reject the traditional view in the majority of languages considered. The alternative hypothesis can lead to a more parsimonious theory of language. Comment: the discussion section has been expanded significantly; in press in Complexity (Wiley).
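    To make the two quantities in this abstract concrete, here is a minimal sketch, assuming a sentence's dependency tree is given as a head vector (position k holds the 1-based position of word k's head, 0 for the root). It uses a common definition: two arcs cross when exactly one endpoint of one lies strictly inside the other. The function names `arcs`, `total_dependency_length` and `crossings` are hypothetical, and this is not the authors' code.

```python
from itertools import combinations
from typing import List, Tuple


def arcs(heads: List[int]) -> List[Tuple[int, int]]:
    """Turn a head vector into arcs (left, right), skipping the root.

    heads[k] is the 1-based position of the head of word k + 1; 0 marks the root,
    which has no arc drawn above the sentence.
    """
    return [
        (min(dep, head), max(dep, head))
        for dep, head in enumerate(heads, start=1)
        if head != 0
    ]


def total_dependency_length(heads: List[int]) -> int:
    """Sum of |dependent - head| over all non-root dependencies."""
    return sum(right - left for left, right in arcs(heads))


def crossings(heads: List[int]) -> int:
    """Count pairs of arcs that cross when every arc is drawn above the sentence."""
    count = 0
    for (a, b), (c, d) in combinations(arcs(heads), 2):
        # Arcs cross iff one endpoint of one arc lies strictly inside the other
        # and the other endpoint lies strictly outside it.
        if a < c < b < d or c < a < d < b:
            count += 1
    return count


if __name__ == "__main__":
    # Hypothetical four-word sentence in which word 3 is the root.
    crossed = [3, 4, 0, 3]
    uncrossed = [2, 3, 0, 3]
    for name, heads in (("crossed", crossed), ("uncrossed", uncrossed)):
        print(name, crossings(heads), total_dependency_length(heads))
```

    In the hypothetical tree [3, 4, 0, 3], arcs (1, 3) and (2, 4) cross, giving one crossing and a total dependency length of 5; re-attaching words as in [2, 3, 0, 3] removes the crossing and shortens the total to 3. A single toy pair only illustrates the link the abstract tests statistically; it does not demonstrate it.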

    Constraint Based Hybrid Approach to Parsing Indian Languages

    PACLIC 23 / City University of Hong Kong / 3-5 December 200

    Conceptual structure representation of causative verb in Malay language and relation with syntax [Representasi argumen struktur konseptual bagi kata kerja kausatif Bahasa Melayu dan hubungannya dengan sintaksis]

    Causative verbs can denote an act that occurs because something else happens, since nothing happens without a specific reason; they can also express the granting of a favor or the justification of something. This study therefore examines how causative lexical verbs represent argument structure and how that representation relates to syntax. To do so, it focuses on the lexical verbs kill, force, allow and help, drawn from the organizational classes of semantic conceptual structure, using data taken from the corpus database of Dewan Bahasa dan Pustaka as examples. The analysis uses Ray Jackendoff's Conceptual Structure Theory (2011). Semantic representations are depicted as conceptual structures within the [Event] function, carrying the conceptual structure formula [Event CAUSE ([THING, x], [Event ([y], [x])])]. The findings show that the conceptual structures of the verbs kill, force, allow and help are represented by the constituents [Event CAUSE ([BE X], [Y])], NO [Event LET ([GO X], [Y])] and [Event HELP ([GO X], [Y])], linked to the constituents [THING], [PLACE] and [PATH]. For the mapping from conceptual structure to syntax, the data indicate a connection between the Head Rules and the Argument Rules, based on argument roles that form a subset of what the verb accepts and requires of the Noun Phrase. Overall, this study shows a systematic relationship between conceptual structure representation and syntax, especially for verbs.

    Representasi argumen gerak ruang Bahasa Melayu berdasarkan teori struktur konseptual - Representation of the spatial motion in the Malay language based on the Conceptual Structure Theory

    Motion is understood as a change from one location to another. This study therefore focuses on spatial motion, comprising directional, extensional, state and causative motion. Its samples are drawn from the Pangkalan Data Korpus (corpus database) of Dewan Bahasa dan Pustaka, Malaysia. The analysis employs Jackendoff's (1997 & 2011) Theory of Conceptual Structure as its framework. The mapping of arguments comprises three main stages: mapping to the conceptual structure representation, mapping to conceptual structure tree diagrams, and mapping to thematic roles. The results show that spatial motion occurs in the [Event] and [State] functions, which carry the basic formula [[[MOTION [x [Event] [Object] [Path] [Place]]]]. The lexical items balik 'return', berlari 'run', tinggal 'stay', terletak 'be located', berada 'be at', bunuh 'kill', paksa 'force', benar 'allow' and tolong 'help' each have their own argument representation, involving constituents such as [Event], [State], [Object], [Path], [Place], [Event CAUSE], [Event NO], [Event ALLOW] and [Event HELP]. At the thematic-role mapping stage, the roles involved are Actor, Theme, Goal, Source, Agent and Beneficiary (-). Accordingly, this study shows an adequate, systematic representation of conceptual structure, particularly in the spatial motion domain of the Malay language.

    Linguistically-Informed Neural Architectures for Lexical, Syntactic and Semantic Tasks in Sanskrit

    The primary focus of this thesis is to make Sanskrit manuscripts more accessible to end users through natural language technologies. The morphological richness, compounding, free word order, and low-resource nature of Sanskrit pose significant challenges for developing deep learning solutions. We identify four fundamental tasks that are crucial for developing robust NLP technology for Sanskrit: word segmentation, dependency parsing, compound type identification, and poetry analysis. The first task, Sanskrit Word Segmentation (SWS), is a fundamental text-processing step for all downstream applications. However, it is challenging because of sandhi, a phenomenon that modifies characters at word boundaries. Similarly, existing dependency parsing approaches struggle with morphologically rich, low-resource languages like Sanskrit. Compound type identification is also challenging for Sanskrit because the semantic relation between a compound's components is context-sensitive. All these challenges result in sub-optimal performance in NLP applications such as question answering and machine translation. Finally, Sanskrit poetry has not been extensively studied in computational linguistics. In addressing these challenges, this thesis makes several contributions: (1) it proposes linguistically-informed neural architectures for these tasks; (2) it showcases the interpretability and multilingual extension of the proposed systems; (3) the proposed systems report state-of-the-art performance; and (4) it presents SanskritShala, a neural toolkit in the form of a web-based application that provides real-time analysis of input for various NLP tasks. Overall, this thesis contributes to making Sanskrit manuscripts more accessible by developing robust NLP technology and releasing various resources, datasets, and a web-based toolkit. Comment: Ph.D. dissertation
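    To illustrate why sandhi makes segmentation hard, here is a toy sketch that models only three vowel-sandhi rules (a + i → e, a + u → o, a + a → ā) with a tiny hypothetical lexicon: rāma + iti surfaces as rāmeti, so the boundary is hidden inside a single vowel. It is a brute-force search, not the thesis's neural approach, and the helpers `join` and `segment` are hypothetical names; real Sanskrit sandhi is far richer.

```python
from functools import lru_cache
from typing import AbstractSet, Dict, List, Tuple

# Toy vowel-sandhi rules: (final vowel of left word, initial vowel of right word)
# -> the single vowel that appears on the surface. Real sandhi is far richer.
SANDHI: Dict[Tuple[str, str], str] = {("a", "i"): "e", ("a", "u"): "o", ("a", "a"): "ā"}

# Reverse map: surface vowel -> every (left_end, right_start) pair it could hide.
UNDO: Dict[str, List[Tuple[str, str]]] = {}
for (left_end, right_start), fused in SANDHI.items():
    UNDO.setdefault(fused, []).append((left_end, right_start))


def join(left: str, right: str) -> str:
    """Concatenate two words, applying a toy sandhi rule at the boundary if one matches."""
    key = (left[-1], right[0])
    if key in SANDHI:
        return left[:-1] + SANDHI[key] + right[1:]
    return left + right


def segment(surface: str, lexicon: AbstractSet[str]) -> List[List[str]]:
    """Return every split of `surface` into lexicon words, undoing toy sandhi."""

    @lru_cache(maxsize=None)
    def splits(text: str) -> List[List[str]]:
        if not text:
            return [[]]
        results: List[List[str]] = []
        for i in range(1, len(text) + 1):
            # Plain boundary: the prefix is already a complete word.
            if text[:i] in lexicon:
                for rest in splits(text[i:]):
                    results.append([text[:i]] + rest)
            # Sandhi boundary: text[i - 1] may be two vowels fused into one.
            for left_end, right_start in UNDO.get(text[i - 1], []):
                word = text[: i - 1] + left_end
                if word in lexicon:
                    for rest in splits(right_start + text[i:]):
                        results.append([word] + rest)
        return results

    return splits(surface)


if __name__ == "__main__":
    words = {"rāma", "iti", "deva", "indra"}
    print(join("rāma", "iti"))       # fused surface form
    print(segment("rāmeti", words))  # boundary recovered by undoing a + i -> e
```

    Returning every candidate split, rather than the first, mirrors the real problem: once more rules and a realistic lexicon are added, many splits become possible and a model must choose among them.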