2,846 research outputs found

    Distributed Representations for Compositional Semantics

    The mathematical representation of semantics is a key issue for Natural Language Processing (NLP). A lot of research has been devoted to finding ways of representing the semantics of individual words in vector spaces. Distributional approaches --- meaning distributed representations that exploit co-occurrence statistics of large corpora --- have proved popular and successful across a number of tasks. However, natural language usually comes in structures beyond the word level, with meaning arising not only from the individual words but also the structure they are contained in at the phrasal or sentential level. Modelling the compositional process by which the meaning of an utterance arises from the meaning of its parts is an equally fundamental task of NLP. This dissertation explores methods for learning distributed semantic representations and models for composing these into representations for larger linguistic units. Our underlying hypothesis is that neural models are a suitable vehicle for learning semantically rich representations and that such representations in turn are suitable vehicles for solving important tasks in natural language processing. The contribution of this thesis is a thorough evaluation of our hypothesis, as part of which we introduce several new approaches to representation learning and compositional semantics, as well as multiple state-of-the-art models which apply distributed semantic representations to various tasks in NLP. Comment: DPhil Thesis, University of Oxford, Submitted and accepted in 201

    (Anti-)Control in German: evidence from comparative, corpus- and psycholinguistic studies

    The present investigation targets the phenomenon commonly called control. Many languages, including German and Polish, employ non-finite clauses (besides finite clauses) as propositional complements. The subject of these complement clauses is left unexpressed and must generally be interpreted co-referentially with the subject or object of the matrix clause (subject or object control). However, there are also infinitive-selecting verbs that do not allow for a co-referential interpretation of the embedded subject; semantically, the embedded infinitives of these anti-control verbs are thus less dependent on or less unifiable with the matrix proposition. In Polish anti-control constructions, non-finite complements are overtly marked with the complementizer żeby, suggesting that they are structurally more complex (namely, containing a C-projection) than the non-finite complements in control constructions lacking żeby (modulo special contexts, viz. 'control switch'). In a comparative perspective, the paper brings corpus-linguistic and experimental evidence to bear on the question whether, surface appearances notwithstanding, the infinitival complements of anti-control verbs in German should similarly be analyzed as truly sentential, i.e., C-headed structures

    Syntactic diacrisis in a rigid and a free word order language

    The paper is concerned with some syntactic consequences of Polish being a synthetic language with a rich system of case inflections and English lacking morphological case (or having a residual form of it). It will be argued that this typologically significant grammatical difference provides an essential premise in a unified explanation for the clustering of a number of syntactic differences between the two languages. The argument is based on a set of functionally motivated constraints on grammatical representations. The constraints are proposed as a part of a theory of “syntactic diacrisis” and are claimed to result from a) the general nature of language as a semiotic system, and b) the specific properties of the human parsing mechanism. The paper consists of three sections. The first contains a brief discussion of the role and place of functional explanations in syntax and introduces the concept of a “parser’s requirement on structure” (PROS). The second section introduces and justifies some basic principles of “syntactic diacrisis”. The third focuses on several syntactic differences between English and Polish and shows how they could all be explained by reference to the interplay of the functional (theory of diacrisis) and grammatical factors

    Advances in formal Slavic linguistics 2017

    Advances in Formal Slavic Linguistics 2017 is a collection of fifteen articles that were prepared on the basis of talks given at the conference Formal Description of Slavic Languages 12.5, which was held on December 7-9, 2017, at the University of Nova Gorica. The volume covers a wide array of topics, such as control verbs, instrumental arguments, and perduratives in Russian, comparatives, negation, n-words, negative polarity items, and complementizer ellipsis in Czech, impersonal se-constructions and complementizer doubling in Slovenian, prosody and the morphology of multi-purpose suffixes in Serbo-Croatian, and indefinite numerals and the binding properties of dative arguments in Polish. Importantly, by exploring these phenomena in individual Slavic languages, the collection of articles in this volume makes a significant contribution to both Slavic linguistics and to linguistics in general

    Formal approaches to number in Slavic and beyond (Volume 5)

    The goal of this collective monograph is to explore the relationship between the cognitive notion of number and various grammatical devices expressing this concept in natural language with a special focus on Slavic. The book aims at investigating different morphosyntactic and semantic categories including plurality and number-marking, individuation and countability, cumulativity, distributivity and collectivity, numerals, numeral modifiers and classifiers, as well as other quantifiers. It gathers 19 contributions tackling the main themes from different theoretical and methodological perspectives in order to contribute to our understanding of cross-linguistic patterns both in Slavic and non-Slavic languages

    Experimental Evidence for the Syntax of Phrasal Comparatives in Polish

    Pancheva (2009) argues that phrasal comparatives in Polish exhibit a subject-island effect. She proposes an account of the island effect as a combination of several factors: than has a small clause complement in phrasal comparatives; wh-movement turns the than-clause into a degree predicate; wh-movement of the vP subject is prohibited by an anti-locality constraint; sub-extraction of the vP subject is then the only option, but it causes an island violation. Informally elicited judgments support this proposal but there is a fair amount of variability among and even within speakers. Given this variability in speakers’ responses, we need to elicit judgments in controlled conditions allowing subsequent quantitative analysis. We conducted two acceptability-rating studies on Polish comparatives following standard experimental procedures and testing a large number of speakers. The results support the small clause analysis of phrasal comparatives

    Adopting ISO 24617-8 for Discourse Relations Annotation in Polish: Challenges and Future Directions

    This paper explores a discourse relations annotation project carried out under the CLARIN-PL initiative, leveraging the ISO 24617-8 standard. The goal is to boost research interoperability and foster multilingual research. Our team of three linguist-annotators tackled the annotation of a Polish-language corpus spanning several genres, including literature and press articles. This effort was guided by a project expert and external linguists from the CLARIN-PL language technology research infrastructure. Several significant challenges emerged during the process. Ambiguities within the ISO standard’s relation categories, poorly worded definitions for certain relation categories, and the difficulty of identifying and annotating implicit discourse relations, which lack explicit discourse connectives or signaling devices, were among the key issues. To overcome these problems, we implemented strategies such as regular team meetings, collaborative annotation forms, and preliminary revisions to the annotation scheme. This paper presents the project and the annotation process, and offers initial annotation data on the discourse relations and connectives identified within the corpus. Looking forward, we discuss potential enhancements to the process, including additional revisions to the guidelines, and conclude with an overview of the project’s contributions and a discussion of our future development plans
