
17 research outputs found

A clustering approach to automatic verb classification incorporating selectional preferences: model, implementation, and user manual

Author: Hying Christian
Scheible Christian
Schmid Helmut
Schulte im Walde Sabine
Wagner Wiebke
Publication date: 12/12/2015

This report presents two variations of an innovative, complex approach to semantic verb classes that relies on selectional preferences as verb properties. The underlying linguistic assumption for this verb class model is that verbs which agree on their selectional preferences belong to a common semantic class. The model is implemented as a soft-clustering approach, in order to capture the polysemy of the verbs. The training procedure uses the Expectation-Maximisation (EM) algorithm (Baum, 1972) to iteratively improve the probabilistic parameters of the model, and applies the Minimum Description Length (MDL) principle (Rissanen, 1978) to induce WordNet-based selectional preferences for arguments within subcategorisation frames. One variation of the MDL principle replicates a standard MDL approach by Li and Abe (1998), the other variation presents an improved pruning strategy that outperforms the standard implementation considerably. Our model is potentially useful for lexical induction (e.g., verb senses, subcategorisation and selectional preferences, collocations, and verb alternations), and for NLP applications in sparse data situations. We demonstrate the usefulness of the model by a standard evaluation (pseudo-word disambiguation), and three applications (selectional preference induction, verb sense disambiguation, and semi-supervised sense labelling)

D6.1: Technologies and Tools for Lexical Acquisition

Author: Abrate Matteo
Bacciu Clara
Bel Nuria
Caselli Tommaso
Gavrilidou Maria
Korhonen Anna
Monachini Monica
Padró Muntsa
Poibeau Thierry
Prokopidis Prokopis
Quochi Valeria
Revilla Eva
Rimell Laura
Tesconi Maurizio

This report describes the technologies and tools to be used for Lexical Acquisition in PANACEA. It includes descriptions of existing technologies and tools which can be built on and improved within PANACEA, as well as of new technologies and tools to be developed and integrated in the PANACEA platform. The report also specifies the Lexical Resources to be produced. Four main areas of lexical acquisition are included: Subcategorization Frames (SCFs), Selectional Preferences (SPs), Lexical-semantic Classes (LCs), for both nouns and verbs, and Multi-Word Expressions (MWEs).

PUblication MAnagement

A distributional investigation of German verbs

Author: Roberts William
Publication venue: Humboldt-Universität zu Berlin
Publication date: 14/06/2023

This dissertation provides an empirical investigation of German verbs conducted on the basis of statistical descriptions acquired from a large corpus of German text. In a brief overview of the linguistic theory pertaining to the lexical semantics of verbs, I outline the idea that verb meaning is composed of argument structure (the number and types of arguments that co-occur with a verb) and aspectual structure (properties describing the temporal progression of an event referenced by the verb). I then produce statistical descriptions of verbs according to these two distinct facets of meaning. In particular, I examine verbal subcategorisation, selectional preferences, and aspectual type. All three of these modelling strategies are evaluated on a common task, automatic verb classification. I demonstrate that automatically acquired features capturing verbal lexical aspect are beneficial for an application that concerns argument structure, namely semantic role labelling. Furthermore, I demonstrate that features capturing verbal argument structure perform well on the task of classifying a verb for its aspectual type. These findings suggest that these two facets of verb meaning are related in an underlying way.

Dokumenten-Publikationsserver der Humboldt-Universität zu Berlin


Automatic induction of verb classes using clustering

Author: Sun Lin
Publication venue: University of Cambridge
Publication date: 30/04/2013

Verb classifications have attracted a great deal of interest in both linguistics and natural language processing (NLP). They have proved useful for important tasks and applications, including e.g. computational lexicography, parsing, word sense disambiguation, semantic role labelling, information extraction, question-answering, and machine translation (Swier and Stevenson, 2004; Dang, 2004; Shi and Mihalcea, 2005; Kipper et al., 2008; Zapirain et al., 2008; Rios et al., 2011). Particularly useful are classes which capture generalizations about a range of linguistic properties (e.g. lexical, (morpho-)syntactic, semantic), such as those proposed by Beth Levin (1993). However, full exploitation of such classes in real-world tasks has been limited because no comprehensive or domain-specific lexical classification is available. This thesis investigates how Levin-style lexical semantic classes could be learned automatically from corpus data. Automatic acquisition is cost-effective when it involves either no or minimal supervision and it can be applied to any domain of interest where adequate corpus data is available. We improve on earlier work on automatic verb clustering. We introduce new features and new clustering methods to improve the accuracy and coverage. We evaluate our methods and features on well-established cross-domain datasets in English, on a specific domain of English (the biomedical) and on another language (French), reporting promising results. Finally, our task-based evaluation demonstrates that the automatically acquired lexical classes enable new approaches to some NLP tasks (e.g. metaphor identification) and help to improve the accuracy of existing ones (e.g. argumentative zoning). This work was supported by a Dorothy Hodgkin PhD Scholarship.

Apollo (Cambridge)

Acquisition and modeling of lexical knowledge: a corpus-based investigation of systematic polysemy

Author: Lapata Maria
Publication venue: The University of Edinburgh
Publication date: 01/01/2000

Edinburgh Research Archive

Proceedings of the Conference on Natural Language Processing 2010

Publication venue: Walter de Gruyter GmbH
Publication date: 01/01/2010

This book contains state-of-the-art contributions to the 10th conference on Natural Language Processing, KONVENS 2010 (Konferenz zur Verarbeitung natürlicher Sprache), with a focus on semantic processing. KONVENS in general aims at offering a broad perspective on current research and developments within the interdisciplinary field of natural language processing. The central theme draws specific attention towards addressing linguistic aspects of meaning, covering deep as well as shallow approaches to semantic processing. The contributions address both knowledge-based and data-driven methods for modelling and acquiring semantic information, and discuss the role of semantic information in applications of language technology. The articles demonstrate the importance of semantic processing, and present novel and creative approaches to natural language processing in general. Some contributions put their focus on developing and improving NLP systems for tasks like Named Entity Recognition or Word Sense Disambiguation, or focus on semantic knowledge acquisition and exploitation with respect to collaboratively built resources, or harvesting semantic information in virtual games. Others are set within the context of real-world applications, such as Authoring Aids, Text Summarisation and Information Retrieval. The collection highlights the importance of semantic processing for different areas and applications in Natural Language Processing, and provides the reader with an overview of current research in this field.

Acronym

Implicit indefinite objects at the syntax-semantics-pragmatics interface: a probabilistic model of acceptability judgments

Author: Cappelli Giulia
Publication venue: Scuola Normale Superiore - Edizioni della Normale
Publication date: 21/10/2022

Optionally transitive verbs, whose Patient participant is semantically obligatory but syntactically optional (e.g., to eat, to drink, to write), deviate from the transitive prototype defined by Hopper and Thompson (1980). Following Fillmore (1986), unexpressed objects may be either indefinite (referring to prototypical Patients of a verb, whose actual entity is unknown or irrelevant) or definite (with a referent available in the immediate intra- or extra-linguistic context). This thesis centered on indefinite null objects, which the literature argues to be a gradient, non-categorical phenomenon possible with virtually any transitive verb (in different degrees depending on the verb semantics), favored or hindered by several semantic, aspectual, pragmatic, and discourse factors. In particular, the probabilistic model of the grammaticality of indefinite null objects discussed here takes into account a continuous factor (semantic selectivity, as a proxy for object recoverability) and four binary factors (telicity, perfectivity, iterativity, and manner specification). This work was inspired by Medina (2007), who modeled the effect of three predictors (semantic selectivity, telicity, and perfectivity) on the grammaticality of indefinite null objects (as gauged via Likert-scale acceptability judgments elicited from native speakers of English) within the framework of Stochastic Optimality Theory. In her variant of the framework, the constraints get floating rankings based on the input verb’s semantic selectivity, which she modeled via the Selectional Preference Strength measure by Resnik (1993, 1996).
I expanded Medina’s model by modeling implicit indefinite objects in two languages (English and Italian), by using three different measures of semantic selectivity (Resnik’s SPS; Behavioral PISA, inspired by Medina’s Object Similarity measure; and Computational PISA, a novel similarity-based measure by Cappelli and Lenci (2020) based on distributional semantics), and by adding iterativity and manner specification as new predictors in the model. Both the English and the Italian five-predictor models based on Behavioral PISA explain almost half of the variance in the data, improving on the Medina-like three-predictor models based on Resnik’s SPS. Moreover, they have a comparable range of predicted object-dropping probabilities (30-100% in English, 30-90% in Italian), and the predictors perform consistently with theoretical literature on object drop. Indeed, in both models, atelic imperfective iterative manner-specified inputs are the most likely to drop their object (between 80% and 90%), while telic perfective non-iterative manner-unspecified inputs are the least likely (between 30% and 40%). The constraint re-ranking probabilities are always directly proportional to semantic selectivity, with the exception of Telic End in Italian. Both models show a main effect of telicity, but the second most relevant factor in the model is perfectivity in English and manner specification in Italian

Archivio istituzionale della Ricerca - Scuola Normale Superiore

An Approach for Automatic Generation of on-line Information Systems based on the Integration of Natural Language Processing and Adaptive Hypermedia Techniques

Author: Alfonseca Cubero Enrique
Publication date: 01/01/2003

Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Ingeniería Informática. Date of defense: 29-05-200

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Biblos-e Archivo