101 research outputs found

    Cross-Lingual Induction and Transfer of Verb Classes Based on Word Vector Space Specialisation

    Full text link
    Existing approaches to automatic VerbNet-style verb classification are heavily dependent on feature engineering and therefore limited to languages with mature NLP pipelines. In this work, we propose a novel cross-lingual transfer method for inducing VerbNets for multiple languages. To the best of our knowledge, this is the first study which demonstrates how the architectures for learning word embeddings can be applied to this challenging syntactic-semantic task. Our method uses cross-lingual translation pairs to tie each of the six target languages into a bilingual vector space with English, jointly specialising the representations to encode the relational information from English VerbNet. A standard clustering algorithm is then run on top of the VerbNet-specialised representations, using vector dimensions as features for learning verb classes. Our results show that the proposed cross-lingual transfer approach sets new state-of-the-art verb classification performance across all six target languages explored in this work.Comment: EMNLP 2017 (long paper

    Reflexive Space. A Constructionist Model of the Russian Reflexive Marker

    Get PDF
    This study examines the structure of the Russian Reflexive Marker ( ся/-сь) and offers a usage-based model building on Construction Grammar and a probabilistic view of linguistic structure. Traditionally, reflexive verbs are accounted for relative to non-reflexive verbs. These accounts assume that linguistic structures emerge as pairs. Furthermore, these accounts assume directionality where the semantics and structure of a reflexive verb can be derived from the non-reflexive verb. However, this directionality does not necessarily hold diachronically. Additionally, the semantics and the patterns associated with a particular reflexive verb are not always shared with the non-reflexive verb. Thus, a model is proposed that can accommodate the traditional pairs as well as for the possible deviations without postulating different systems. A random sample of 2000 instances marked with the Reflexive Marker was extracted from the Russian National Corpus and the sample used in this study contains 819 unique reflexive verbs. This study moves away from the traditional pair account and introduces the concept of Neighbor Verb. A neighbor verb exists for a reflexive verb if they share the same phonological form excluding the Reflexive Marker. It is claimed here that the Reflexive Marker constitutes a system in Russian and the relation between the reflexive and neighbor verbs constitutes a cross-paradigmatic relation. Furthermore, the relation between the reflexive and the neighbor verb is argued to be of symbolic connectivity rather than directionality. Effectively, the relation holding between particular instantiations can vary. The theoretical basis of the present study builds on this assumption. Several new variables are examined in order to systematically model variability of this symbolic connectivity, specifically the degree and strength of connectivity between items. In usage-based models, the lexicon does not constitute an unstructured list of items. Instead, items are assumed to be interconnected in a network. This interconnectedness is defined as Neighborhood in this study. Additionally, each verb carves its own niche within the Neighborhood and this interconnectedness is modeled through rhyme verbs constituting the degree of connectivity of a particular verb in the lexicon. The second component of the degree of connectivity concerns the status of a particular verb relative to its rhyme verbs. The connectivity within the neighborhood of a particular verb varies and this variability is quantified by using the Levenshtein distance. The second property of the lexical network is the strength of connectivity between items. Frequency of use has been one of the primary variables in functional linguistics used to probe this. In addition, a new variable called Constructional Entropy is introduced in this study building on information theory. It is a quantification of the amount of information carried by a particular reflexive verb in one or more argument constructions. The results of the lexical connectivity indicate that the reflexive verbs have statistically greater neighborhood distances than the neighbor verbs. This distributional property can be used to motivate the traditional observation that the reflexive verbs tend to have idiosyncratic properties. A set of argument constructions, generalizations over usage patterns, are proposed for the reflexive verbs in this study. In addition to the variables associated with the lexical connectivity, a number of variables proposed in the literature are explored and used as predictors in the model. The second part of this study introduces the use of a machine learning algorithm called Random Forests. The performance of the model indicates that it is capable, up to a degree, of disambiguating the proposed argument construction types of the Russian Reflexive Marker. Additionally, a global ranking of the predictors used in the model is offered. Finally, most construction grammars assume that argument construction form a network structure. A new method is proposed that establishes generalization over the argument constructions referred to as Linking Construction. In sum, this study explores the structural properties of the Russian Reflexive Marker and a new model is set forth that can accommodate both the traditional pairs and potential deviations from it in a principled manner.Siirretty Doriast

    Studies in the Morphosyntax of Native and Greek-Origin Verbs

    Get PDF
    The study clarifies certain details of the Coptic verbal system, such as diathetic classes of labile verbs, semantic classes of non-labile mutable verbs, stative: infinitive opposition, the functional range of the periphrastic construction, integration of Greek loan verbs into Coptic valency alternation system and the role of the loaned morphology in that system. In all these problems, we find manifested the interaction between two grammatical categories, transitivity and aspect. The introductory chapter briefly states the research objectives and gives a general overview of the linguistic material and theory employed. The first chapter studies major regularities in the transitivity alternations of native Egyptian verbs. Defining the Coptic conjugation system by two parameters, aspect and transitivity, I examine the functions of the absolute infinitive as the only unmarked form opposed, on the one hand, to transitive eventive construct forms, and on the other hand, to intransitive stative. The system of conjugation patterns is analyzed as a templatic system where a specific conjugation pattern ascribes not only tense, aspect, and modus, but also voice to an unmarked verbal form. Finally, the native verbs are classified into four groups based on the formal criteria of mutability and lability, and this classification is found to correlate with the semantic one based on the agentivity and telicity of verbal lexemes. I also look into the diachrony of the aspect-transitivity cluster and use the two-parameter model to explain various synchronic anomalies of Coptic verbal valency. The second chapter looks into semantic and grammatical factors triggering the use of the periphrastic pattern which is shown to fulfil the whole range of functions, from punctual passive to resultative, depending on the lexical properties of the verb. The third chapter explores the diathesis of Greek loan verbs in Sahidic. Valency-changing devices for Greek verbs are examined and compared with those operating on native verbs. The occasional use of Greek middle-passive suffix is analyzed as the vestige of parallel system borrowing

    Bootstrapping an Italian VerbNet: data-driven analysis of verb alternations

    Get PDF
    The goal of this paper is to propose a classification of the syntactic alternations admitted by the most frequent Italian verbs. The data-driven two-steps procedure exploited and the structure of the identified classes of alternations are presented in depth and discussed. Even if this classification has been developed with a practical application in mind, namely the semi-automatic building of a VerbNet-like lexicon for Italian verbs, partly following the methodology proposed in the context of the VerbNet project, its availability may have a positive impact on several related research topics and Natural Language Processing tasks

    Syntaxe computationnelle du hongrois : de l'analyse en chunks à la sous-catégorisation verbale

    Get PDF
    We present the creation of two resources for Hungarian NLP applications: a rule-based shallow parser and a database of verbal subcategorization frames. Hungarian, as a non-configurational language with a rich morphology, presents specific challenges for NLP at the level of morphological and syntactic processing. While efficient and precise morphological analyzers are already available, Hungarian is under-resourced with respect to syntactic analysis. Our work aimed at overcoming this problem by providing resources for syntactic processing. Hungarian language is characterized by a rich morphology and a non-configurational encoding of grammatical functions. These features imply that the syntactic processing of Hungarian has to rely on morphological features rather than on constituent order. The broader interest of our undertaking is to propose representations and methods that are adapted to these specific characteristics, and at the same time are in line with state of the art research methodologies. More concretely, we attempt to adapt current results in argument realization and lexical semantics to the task of labeling sentence constituents according to their syntactic function and semantic role in Hungarian. Syntax and semantics are not completely independent modules in linguistic analysis and language processing: it has been known for decades that semantic properties of words affect their syntactic distribution. Within the syntax-semantics interface, the field of argument realization deals with the (partial or complete) prediction of verbal subcategorization from semantic properties. Research on verbal lexical semantics and semantically motivated mapping has been concentrating on predicting the syntactic realization of arguments, taking for granted (either explicitly or implicitly) that the distinction between arguments and adjuncts is known, and that adjuncts' syntactic realization is governed by productive syntactic rules, not lexical properties. However, besides the correlation between verbal aspect or actionsart and time adverbs (e.g. Vendler, 1967 or Kiefer, 1992 for Hungarian), the distribution of adjuncts among verbs or verb classes did not receive significant attention, especially within the lexical semantics framework. We claim that contrary to the widely shared presumption, adjuncts are often not fully productive. We therefore propose a gradual notion of productivity, defined in relation to Levin-type lexical semantic verb classes (Levin, 1993; Levin and Rappaport-Hovav, 2005). The definition we propose for the argument-adjunct dichotomy is based on evidence from Hungarian and exploits the idea that lexical semantics not only influences complement structure but is the key to the argument-adjunct distinction and the realization of adjunctsLa linguistique informatique est un domaine de recherche qui se concentre sur les méthodes et les perspectives de la modélisation formelle (statistique ou symbolique) de la langue naturelle. La linguistique informatique, tout comme la linguistique théorique, est une discipline fortement modulaire : les niveaux d'analyse linguistique comprennent la segmentation, l'analyse morphologique, la désambiguïsation, l'analyse syntaxique et sémantique. Tandis qu'un nombre d'outils existent déjà pour les traitements de bas niveau (analyse morphologique, étiquetage grammatical), le hongrois peut être considéré comme une langue peu doté pour l'analyse syntaxique et sémantique. Le travail décrit dans la présente thèse vise à combler ce manque en créant des ressources pour le traitement syntaxique du hongrois : notamment, un analyseur en chunks et une base de données lexicale de schémas de sous-catégorisation verbale. La première partie de la recherche présentée ici se concentre sur la création d'un analyseur syntaxique de surface (ou analyseur en chunks) pour le hongrois. La sortie de l'analyseur de surface est conçue pour servir d'entrée pour un traitement ultérieur visant à annoter les relations de dépendance entre le prédicat et ses compléments essentiels et circonstanciels. L'analyseur profond est mis en œuvre dans NooJ (Silberztein, 2004) en tant qu'une cascade de grammaires. Le deuxième objectif de recherche était de proposer une représentation lexicale pour la structure argumentale en hongrois. Cette représentation doit pouvoir gérer la vaste gamme de phénomènes qui échappent à la dichotomie traditionnelle entre un complément essentiel et un circonstanciel (p. ex. des structures partiellement productives, des écarts entre la prédictibilité syntaxique et sémantique). Nous avons eu recours à des résultats de la recherche récente sur la réalisation d'arguments et choisi un cadre qui répond à nos critères et qui est adaptable à une langue non-configurationnelle. Nous avons utilisé la classification sémantique de Levin (1993) comme modèle. Nous avons adapté les notions relatives à cette classification, à savoir celle de la composante sémantique et celle de l'alternance syntaxique, ainsi que la méthodologie d'explorer et de décrire le comportement des prédicats à l'aide de cette représentation, à la tâche de construire une représentation lexicale des verbes dans une langue non-configurationnelle. La première étape consistait à définir les règles de codage et de construire un vaste base de données lexicale pour les verbes et leurs compléments. Par la suite, nous avons entrepris deux expériences pour l'enrichissement de ce lexique avec des informations sémantiques lexicales afin de formaliser des généralisations syntaxiques et sémantiques pertinentes sur les classes de prédicats sous-jacentes. La première approche que nous avons testée consistait en une élaboration manuelle de classification de verbes en fonction de leur structure de compléments et de l'attribution de rôles sémantiques à ces compléments. Nous avons cherché la réponse aux questions suivantes: quelles sont les composants sémantiques pertinents pour définir une classification sémantique des prédicats hongrois? Quelles sont les implications syntaxiques spécifiques à ces classes? Et, plus généralement, quelle est la nature des alternances spécifiques aux classes verbales en hongrois ? Dans la phase finale de la recherche, nous avons étudié le potentiel de l'acquisition automatique pour extraire des classes de verbes à partir de corpus. Nous avons effectué une classification non supervisée, basée sur des données distributionnelles, pour obtenir une classification sémantique pertinente des verbes hongrois. Nous avons également testé la méthode de classification non supervisée sur des données françaises

    A distributional investigation of German verbs

    Get PDF
    Diese Dissertation bietet eine empirische Untersuchung deutscher Verben auf der Grundlage statistischer Beschreibungen, die aus einem großen deutschen Textkorpus gewonnen wurden. In einem kurzen Überblick über linguistische Theorien zur lexikalischen Semantik von Verben skizziere ich die Idee, dass die Verbbedeutung wesentlich von seiner Argumentstruktur (der Anzahl und Art der Argumente, die zusammen mit dem Verb auftreten) und seiner Aspektstruktur (Eigenschaften, die den zeitlichen Ablauf des vom Verb denotierten Ereignisses bestimmen) abhängt. Anschließend erstelle ich statistische Beschreibungen von Verben, die auf diesen beiden unterschiedlichen Bedeutungsfacetten basieren. Insbesondere untersuche ich verbale Subkategorisierung, Selektionspräferenzen und Aspekt. Alle diese Modellierungsstrategien werden anhand einer gemeinsamen Aufgabe, der Verbklassifikation, bewertet. Ich zeige, dass im Rahmen von maschinellem Lernen erworbene Merkmale, die verbale lexikalische Aspekte erfassen, für eine Anwendung von Vorteil sind, die Argumentstrukturen betrifft, nämlich semantische Rollenkennzeichnung. Darüber hinaus zeige ich, dass Merkmale, die die verbale Argumentstruktur erfassen, bei der Aufgabe, ein Verb nach seiner Aspektklasse zu klassifizieren, gut funktionieren. Diese Ergebnisse bestätigen, dass diese beiden Facetten der Verbbedeutung auf grundsätzliche Weise zusammenhängen.This dissertation provides an empirical investigation of German verbs conducted on the basis of statistical descriptions acquired from a large corpus of German text. In a brief overview of the linguistic theory pertaining to the lexical semantics of verbs, I outline the idea that verb meaning is composed of argument structure (the number and types of arguments that co-occur with a verb) and aspectual structure (properties describing the temporal progression of an event referenced by the verb). I then produce statistical descriptions of verbs according to these two distinct facets of meaning: In particular, I examine verbal subcategorisation, selectional preferences, and aspectual type. All three of these modelling strategies are evaluated on a common task, automatic verb classification. I demonstrate that automatically acquired features capturing verbal lexical aspect are beneficial for an application that concerns argument structure, namely semantic role labelling. Furthermore, I demonstrate that features capturing verbal argument structure perform well on the task of classifying a verb for its aspectual type. These findings suggest that these two facets of verb meaning are related in an underlying way
    corecore