Distributional analysis of copredication: towards distinguishing systematic polysemy from coercion
National audience. In this paper we argue that the account of the notion of complex type based on copredication tests is problematic, because copredication is also possible, albeit less frequent, with expressions which exhibit polysemy due to coercion. We show through a distributional and lexico-syntactic pattern-based corpus analysis that the variability of copredication contexts is the key to distinguishing complex-type nouns from nouns subject to coercion
What lexical sets tell us about conceptual categories
It is common practice in computational linguistics to attempt to use selectional constraints and semantic type hierarchies as primary knowledge resources to perform word sense disambiguation (cf. Jurafsky and Martin 2000). The most widely adopted methodology is to start from a given ontology of types (e.g. WordNet, cf. Miller and Fellbaum 2007) and try to use its implied conceptual categories to specify the combinatorial constraints on lexical items. Semantic typing information about selectional preferences is then used to guide the induction of senses for both nouns and verbs in texts. Practical results have shown, however, that there are a number of problems with such an approach. For instance, as corpus-driven pattern analysis shows (cf. Hanks et al. 2007), the paradigmatic sets of words that populate specific argument slots within the same verb sense do not map neatly onto conceptual categories, as they often include words belonging to different types. Also, the internal composition of these sets changes from verb to verb, so that no stable generalization seems possible as to which lexemes belong to which semantic type (cf. Hanks and Jezek 2008). In this paper, we claim that these are not accidental facts related to the contingencies of a given ontology, but rather the result of an attempt to map distributional language behaviour onto semantic type systems that are not sufficiently grounded in real corpus data. We report the efforts made within the CPA project (cf. Hanks 2009) to build an ontology which satisfies such requirements and explore its advantages in terms of empirical validity over more speculative ontologies
Dati empirici e risorse lessicali
Introduction to the CrOCEVIA monographic issue "Dati empirici e risorse lessicali"
Distributional Analysis of Verbal Neologisms: Task Definition and Dataset Construction
In this paper we introduce the task of interpreting verbal neologisms (VNeo) for the Italian language, making use of a highly context-sensitive distributional semantic model (DSM). The task is commonly performed manually by lexicographers, who verify the contexts in which the VNeo appear. Developing such a task is likely to be of use from a cognitive, social and linguistic perspective. In the following, we first outline the motivation for our study and our goal, then focus on the construction of the dataset and the definition of the task
Lexical Opposition in Discourse Contrast
We investigate the connection between lexical opposition and discourse relations, with a focus on the relation of contrast, in order to evaluate whether opposition participates in discourse relations. Through a corpus-based analysis of Italian documents, we show that the relation between opposition and contrast is not crucial, although not insignificant in the case of implicit relations. The correlation is even weaker when other discourse relations are taken into account
Opposition relations among verb frames
In this paper we propose a scheme for annotating opposition relations among verb frames in lexical resources. The scheme is tested on the T-PAS resource, an inventory of typed predicate argument structures for Italian, conceived for both linguistic research and computational tasks. After discussing opposition relations from a linguistic point of view and listing the tags we decided to use, we report the results of the experiment we performed to test the annotation scheme, in terms of inter-annotator agreement and linguistic analysis of the annotated data
Corpus Patterns for Semantic Processing
This tutorial presents a corpus-driven, pattern-based empirical approach to meaning representation and computation. Patterns in text are everywhere, but techniques for identifying and processing them are still rudimentary. Patterns are not merely syntactic but syntagmatic: each pattern identifies a lexico-semantic clause structure consisting of a predicator (verb or predicative adjective) together with open-ended lexical sets of collocates in different clause roles (subject, object, prepositional argument, etc.). If NLP is to make progress in identifying and processing text meaning, pattern recognition and collocational analysis will play an essential role, because
Designing a Methodology for Semantic Type Tagging of Argument Positions
A verb argument position can be described by the semantic type that characterizes the words filling that position. We investigate a number of linguistic issues underlying the tagging of an Italian corpus with the semantic types provided by the T-PAS (Typed Predicate-Argument Structure) resource. Our main interest is to evaluate whether our annotation methodology can be employed effectively for the extension of the annotation of the corpus associated with the resource. In order to achieve this goal we compare quantitative data about the tagging and qualitative data derived from the Inter-Annotator Agreement