
    A clustering approach to automatic verb classification incorporating selectional preferences: model, implementation, and user manual

    This report presents two variations of an innovative, complex approach to semantic verb classes that relies on selectional preferences as verb properties. The underlying linguistic assumption of this verb class model is that verbs which agree in their selectional preferences belong to a common semantic class. The model is implemented as a soft-clustering approach in order to capture the polysemy of the verbs. The training procedure uses the Expectation-Maximisation (EM) algorithm (Baum, 1972) to iteratively improve the probabilistic parameters of the model, and applies the Minimum Description Length (MDL) principle (Rissanen, 1978) to induce WordNet-based selectional preferences for arguments within subcategorisation frames. One variation of the MDL principle replicates the standard MDL approach of Li and Abe (1998); the other presents an improved pruning strategy that considerably outperforms the standard implementation. Our model is potentially useful for lexical induction (e.g., of verb senses, subcategorisation and selectional preferences, collocations, and verb alternations) and for NLP applications in sparse-data situations. We demonstrate the usefulness of the model in a standard evaluation (pseudo-word disambiguation) and in three applications (selectional preference induction, verb sense disambiguation, and semi-supervised sense labelling).
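    For a concrete picture of the soft-clustering idea, the following is a minimal, self-contained sketch of an EM loop over a toy mixture of multinomials; the verbs, argument heads, counts, and two-class setup are invented for illustration and do not reproduce the report's actual model (which works over subcategorisation frames, WordNet-based selectional preferences, and MDL pruning).

```python
# Illustrative sketch only: soft clustering of verbs with EM over a toy
# mixture of multinomials. Data and parameterisation are assumptions,
# not the report's model.
import numpy as np

rng = np.random.default_rng(0)

verbs = ["eat", "drink", "devour", "write", "read"]
args = ["food", "liquid", "text"]
# Toy verb x argument-head co-occurrence counts.
counts = np.array([
    [20,  2,  1],   # eat
    [ 1, 25,  0],   # drink
    [15,  3,  0],   # devour
    [ 0,  1, 30],   # write
    [ 2,  0, 22],   # read
], dtype=float)

K = 2                                   # number of latent verb classes
V, A = counts.shape
pi = np.full(K, 1.0 / K)                # class priors
theta = rng.dirichlet(np.ones(A), K)    # per-class argument distributions

for _ in range(50):
    # E-step: responsibilities p(class | verb) from per-verb log-likelihoods.
    log_lik = counts @ np.log(theta).T + np.log(pi)       # shape (V, K)
    log_lik -= log_lik.max(axis=1, keepdims=True)
    resp = np.exp(log_lik)
    resp /= resp.sum(axis=1, keepdims=True)

    # M-step: re-estimate class priors and argument distributions.
    pi = resp.mean(axis=0)
    theta = resp.T @ counts + 1e-6                        # smoothed counts
    theta /= theta.sum(axis=1, keepdims=True)

for verb, r in zip(verbs, resp):
    print(verb, np.round(r, 3))         # soft class membership per verb
```

    The soft responsibilities are what allow a polysemous verb to belong to more than one class, which is the property the report's model is designed to capture.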

    D6.1: Technologies and Tools for Lexical Acquisition

    This report describes the technologies and tools to be used for Lexical Acquisition in PANACEA. It includes descriptions of existing technologies and tools which can be built on and improved within PANACEA, as well as of new technologies and tools to be developed and integrated into the PANACEA platform. The report also specifies the Lexical Resources to be produced. Four main areas of lexical acquisition are covered: Subcategorization Frames (SCFs), Selectional Preferences (SPs), Lexical-semantic Classes (LCs) for both nouns and verbs, and Multi-Word Expressions (MWEs).

    A distributional investigation of German verbs

    This dissertation provides an empirical investigation of German verbs conducted on the basis of statistical descriptions acquired from a large corpus of German text. In a brief overview of the linguistic theory pertaining to the lexical semantics of verbs, I outline the idea that verb meaning is composed of argument structure (the number and types of arguments that co-occur with a verb) and aspectual structure (properties describing the temporal progression of an event referenced by the verb). I then produce statistical descriptions of verbs according to these two distinct facets of meaning: in particular, I examine verbal subcategorisation, selectional preferences, and aspectual type. All three of these modelling strategies are evaluated on a common task, automatic verb classification. I demonstrate that automatically acquired features capturing verbal lexical aspect are beneficial for an application that concerns argument structure, namely semantic role labelling. Furthermore, I demonstrate that features capturing verbal argument structure perform well on the task of classifying a verb for its aspectual type. These findings suggest that these two facets of verb meaning are related in an underlying way.
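    As an illustration of the kind of argument-structure features evaluated on the verb classification task, here is a hedged sketch that represents verbs by normalised subcategorisation-frame distributions and clusters them into classes; the frame inventory, the counts, and the use of KMeans are toy assumptions, not the dissertation's actual feature set or classifier.

```python
# Illustrative sketch only: verbs represented by relative frequencies of
# subcategorisation frames, then grouped into classes. Toy data and a
# generic clustering algorithm are assumptions for exposition.
import numpy as np
from sklearn.cluster import KMeans

frames = ["subj", "subj+obj", "subj+pp", "subj+clause"]
verbs = ["laufen", "essen", "glauben", "sagen", "kaufen", "rennen"]
frame_counts = np.array([
    [40,  2,  8,  0],   # laufen  'run'
    [ 5, 30,  4,  0],   # essen   'eat'
    [ 3,  2,  6, 25],   # glauben 'believe'
    [ 2,  4,  3, 35],   # sagen   'say'
    [ 4, 28,  6,  0],   # kaufen  'buy'
    [38,  1,  9,  0],   # rennen  'run'
], dtype=float)

# Relative frame frequencies serve as one feature vector per verb.
features = frame_counts / frame_counts.sum(axis=1, keepdims=True)

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(features)
for verb, label in zip(verbs, labels):
    print(f"{verb}: class {label}")
```

    In the same spirit, selectional-preference and aspectual features can be appended to the feature vectors and evaluated on the identical classification task, which is how the dissertation compares the different facets of verb meaning.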

    Proceedings of the Conference on Natural Language Processing 2010

    This book contains state-of-the-art contributions to the 10th Conference on Natural Language Processing, KONVENS 2010 (Konferenz zur Verarbeitung natürlicher Sprache), with a focus on semantic processing. KONVENS generally aims at offering a broad perspective on current research and developments within the interdisciplinary field of natural language processing. The central theme draws specific attention to linguistic aspects of meaning, covering deep as well as shallow approaches to semantic processing. The contributions address both knowledge-based and data-driven methods for modelling and acquiring semantic information, and discuss the role of semantic information in applications of language technology. The articles demonstrate the importance of semantic processing and present novel and creative approaches to natural language processing in general. Some contributions focus on developing and improving NLP systems for tasks like Named Entity Recognition or Word Sense Disambiguation, on semantic knowledge acquisition and exploitation with respect to collaboratively built resources, or on harvesting semantic information in virtual games. Others are set within the context of real-world applications, such as Authoring Aids, Text Summarisation, and Information Retrieval. The collection highlights the importance of semantic processing for different areas and applications in Natural Language Processing, and provides the reader with an overview of current research in this field.

    Implicit indefinite objects at the syntax-semantics-pragmatics interface: a probabilistic model of acceptability judgments

    Optionally transitive verbs, whose Patient participant is semantically obligatory but syntactically optional (e.g., to eat, to drink, to write), deviate from the transitive prototype defined by Hopper and Thompson (1980). Following Fillmore (1986), unexpressed objects may be either indefinite (referring to prototypical Patients of a verb, whose actual entity is unknown or irrelevant) or definite (with a referent available in the immediate intra- or extra-linguistic context). This thesis centered on indefinite null objects, which the literature argues to be a gradient, non-categorical phenomenon possible with virtually any transitive verb (to different degrees depending on the verb's semantics), favored or hindered by several semantic, aspectual, pragmatic, and discourse factors. In particular, the probabilistic model of the grammaticality of indefinite null objects discussed here takes into account one continuous factor (semantic selectivity, as a proxy for object recoverability) and four binary factors (telicity, perfectivity, iterativity, and manner specification). This work was inspired by Medina (2007), who modeled the effect of three predictors (semantic selectivity, telicity, and perfectivity) on the grammaticality of indefinite null objects (as gauged via Likert-scale acceptability judgments elicited from native speakers of English) within the framework of Stochastic Optimality Theory. In her variant of the framework, the constraints receive floating rankings based on the input verb's semantic selectivity, which she modeled via the Selectional Preference Strength measure of Resnik (1993, 1996). I expanded Medina's model by modeling implicit indefinite objects in two languages (English and Italian), by using three different measures of semantic selectivity (Resnik's SPS; Behavioral PISA, inspired by Medina's Object Similarity measure; and Computational PISA, a novel similarity-based measure by Cappelli and Lenci (2020) based on distributional semantics), and by adding iterativity and manner specification as new predictors in the model. Both the English and the Italian five-predictor models based on Behavioral PISA explain almost half of the variance in the data, improving on the Medina-like three-predictor models based on Resnik's SPS. Moreover, they have a comparable range of predicted object-dropping probabilities (30-100% in English, 30-90% in Italian), and the predictors perform consistently with the theoretical literature on object drop. Indeed, in both models, atelic imperfective iterative manner-specified inputs are the most likely to drop their object (between 80% and 90%), while telic perfective non-iterative manner-unspecified inputs are the least likely (between 30% and 40%). The constraint re-ranking probabilities are always directly proportional to semantic selectivity, with the exception of Telic End in Italian. Both models show a main effect of telicity, but the second most relevant factor is perfectivity in English and manner specification in Italian.
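    To make the factor structure concrete, here is an illustrative sketch using plain logistic regression over one continuous and four binary predictors; the simulated data and the regression stand-in are assumptions for exposition only, since the thesis itself works within Stochastic Optimality Theory with floating constraint rankings rather than logistic regression.

```python
# Illustrative sketch only: a logistic-regression stand-in for modelling the
# probability of object drop from one continuous predictor (semantic
# selectivity) and four binary predictors. The data below are simulated toy
# values, not results from the thesis.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 200

selectivity = rng.uniform(0.0, 1.0, n)          # proxy for object recoverability
telic       = rng.integers(0, 2, n)
perfective  = rng.integers(0, 2, n)
iterative   = rng.integers(0, 2, n)
manner_spec = rng.integers(0, 2, n)

# Simulated acceptability of object drop, shaped roughly like the abstract's
# description: higher with selectivity, iterativity, and manner specification;
# lower with telicity and perfectivity.
logit = (2.0 * selectivity - 1.2 * telic - 0.8 * perfective
         + 0.6 * iterative + 0.7 * manner_spec - 0.3)
dropped = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

X = np.column_stack([selectivity, telic, perfective, iterative, manner_spec])
model = LogisticRegression().fit(X, dropped)

names = ["selectivity", "telic", "perfective", "iterative", "manner_spec"]
for name, coef in zip(names, model.coef_[0]):
    print(f"{name}: {coef:+.2f}")       # sign indicates direction of effect
```

    The sign pattern of the fitted coefficients mirrors the qualitative predictions reported in the abstract; the thesis derives comparable effects from constraint re-ranking probabilities rather than regression weights.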

    An Approach for Automatic Generation of on-line Information Systems based on the Integration of Natural Language Processing and Adaptive Hypermedia Techniques

    Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Ingeniería Informática. Date of defense: 29-05-200