    What lexical sets tell us about conceptual categories

    It is common practice in computational linguistics to use selectional constraints and semantic type hierarchies as primary knowledge resources for word sense disambiguation (cf. Jurafsky and Martin 2000). The most widely adopted methodology is to start from a given ontology of types (e.g. WordNet, cf. Miller and Fellbaum 2007) and to use its implied conceptual categories to specify the combinatorial constraints on lexical items. Semantic typing information about selectional preferences is then used to guide the induction of senses for both nouns and verbs in texts. Practical results have shown, however, that such an approach runs into a number of problems. For instance, as corpus-driven pattern analysis shows (cf. Hanks et al. 2007), the paradigmatic sets of words that populate specific argument slots within the same verb sense do not map neatly onto conceptual categories, as they often include words belonging to different types. Moreover, the internal composition of these sets changes from verb to verb, so that no stable generalization seems possible as to which lexemes belong to which semantic type (cf. Hanks and Jezek 2008). In this paper, we claim that these are not accidental facts due to the contingencies of a given ontology, but rather the result of attempting to map distributional language behaviour onto semantic type systems that are not sufficiently grounded in real corpus data. We report on the efforts made within the CPA project (cf. Hanks 2009) to build an ontology that satisfies such requirements, and we explore its advantages in terms of empirical validity over more speculative ontologies.
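
    As a concrete illustration of the mismatch described in this abstract, the sketch below groups the nouns filling the object slot of a verb by their WordNet lexicographer categories, using NLTK's WordNet interface and a naive first-sense heuristic. The verb ('attend') and its lexical set are invented for illustration and are not taken from the paper's data.

    ```python
    # Check whether the lexical set filling one argument slot maps onto a single
    # WordNet category (it typically does not).
    # Requires: pip install nltk, then nltk.download("wordnet").
    from collections import defaultdict

    from nltk.corpus import wordnet as wn

    # Hypothetical lexical set: typical direct objects of "attend" in a corpus.
    lexical_set = ["school", "meeting", "funeral", "concert", "church", "lecture"]

    by_type = defaultdict(list)
    for noun in lexical_set:
        synsets = wn.synsets(noun, pos=wn.NOUN)
        if not synsets:
            continue
        # First-sense heuristic: use the most frequent sense's lexicographer file
        # (a coarse semantic type such as noun.group, noun.event, noun.artifact).
        by_type[synsets[0].lexname()].append(noun)

    for sem_type, members in sorted(by_type.items()):
        print(f"{sem_type}: {members}")
    # The single verb slot is filled by nouns of several different semantic types,
    # which is the kind of corpus evidence the paper draws on.
    ```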

    Wie man aus Wörtern Bedeutungen macht: Semantische Typen treffen Valenzen (How to make meanings from words: semantic types meet valencies)

    How does a hearer or reader understand the meaning intended by a speaker or writer? Syntactic structures are too general to express fine distinctions of meaning. Words are often highly ambiguous and therefore unreliable as a "guide to meaning". Corpus pattern analysis, by contrast, shows that most utterances are built up from patterns of comparatively low ambiguity. This raises the question: what is a pattern? Patterns are frequently used building blocks of language that consist of two elements: valencies and collocations. While valencies are relatively stable, collocations are extremely variable. In corpus pattern analysis, a large number of usage examples of each word is studied, and its collocations are assigned to lexical sets according to their semantic types. Every word of a language is part of at least one pattern. If it is part of more than one pattern, the meanings of its patterns can usually be distinguished by their different collocational preferences. Creative uses are deviations from normal patterns of use, but the deviations are themselves rule-governed. A theory of norms and deviations is therefore needed. Since the two rule systems interact, we can describe the theory as a "double helix".
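
    The characterization of patterns as pairings of relatively stable valencies with variable collocations, grouped into lexical sets by semantic type, lends itself to a simple data-structure sketch. The fragment below is a minimal, hypothetical illustration assuming a toy inventory of two patterns for the verb 'fire' with invented lexical sets; it shows how collocational preferences could select among pattern meanings, not how Corpus Pattern Analysis is actually implemented.

    ```python
    # Toy model: a verb pattern pairs an argument slot (here only the direct
    # object) with a lexical set, and the whole pattern carries a meaning.
    from dataclasses import dataclass


    @dataclass
    class Pattern:
        verb: str
        object_set: set   # lexical set licensed in the object slot (invented)
        gloss: str        # meaning associated with the pattern as a whole


    PATTERNS = [
        Pattern("fire", {"gun", "rifle", "pistol", "shot"}, "discharge a firearm"),
        Pattern("fire", {"employee", "manager", "worker", "clerk"}, "dismiss from a job"),
    ]


    def assign_sense(verb: str, direct_object: str) -> str:
        """Pick the pattern whose lexical set contains the observed collocate."""
        for pattern in PATTERNS:
            if pattern.verb == verb and direct_object in pattern.object_set:
                return pattern.gloss
        return "no matching pattern (creative use or unlisted collocate)"


    print(assign_sense("fire", "rifle"))    # discharge a firearm
    print(assign_sense("fire", "manager"))  # dismiss from a job
    ```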

    Proceedings of the Eighth Italian Conference on Computational Linguistics CLiC-it 2021

    The eighth edition of the Italian Conference on Computational Linguistics (CLiC-it 2021) was held at Università degli Studi di Milano-Bicocca from 26 to 28 January 2022. After the 2020 edition, which was held fully online because of the Covid-19 health emergency, CLiC-it 2021 was the first opportunity for the Italian computational linguistics research community to meet in person after more than a year of full or partial lockdown.

    CLUB Working Papers in Linguistics Volume 2

    This second volume of the CLUB Working Papers in Linguistics series collects some of the contributions presented during the second year of activity of CLUB – Circolo Linguistico dell’Università di Bologna (academic year 2016/2017). The volume contains ten essays by Fabio Ardolino (winner of the CLUB ‘Una tesi in linguistica’ prize for 2017), Ilaria Fiorentini, Giuliana Fiorentino, Chiara Gianollo, Eugenio Goria, Elisabetta Jezek, Alberto Manco, Caterina Mauri, Francesco Olivucci, and Andrea Sansò.

    Automated Induction of Sense in Context (James Pustejovsky)

    In this paper, we introduce a model for sense assignment which relies on assigning senses to the contexts within which words appear, rather than to the words themselves. We argue that word senses as such are not directly encoded in the lexicon of the language. Rather, each word is associated with one or more stereotypical syntagmatic patterns, which we call selection contexts. Each selection context is associated with a meaning, which can be expressed in any of various formal or computational manifestations. We present a formalism for encoding contexts that help to determine the semantic contribution of a word in an utterance. Further, we develop a methodology through which such stereotypical contexts for words and phrases can be identified from very large corpora and subsequently structured into a selection context dictionary, encoding both stereotypical syntactic and semantic information. We present some preliminary results.
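
    The methodology summarized in this abstract (identify stereotypical syntagmatic contexts in large corpora, then organize them into a selection context dictionary) can be approximated very roughly in a few lines. The fragment below is a hypothetical sketch rather than the authors' system: it assumes spaCy's en_core_web_sm model for dependency parsing and simply counts verb plus direct-object combinations, keeping frequent ones as candidate selection contexts.

    ```python
    # Rough sketch: harvest candidate selection contexts (here just verb +
    # direct-object pairs) from raw text by dependency parsing and counting.
    # Assumes: pip install spacy && python -m spacy download en_core_web_sm
    from collections import Counter

    import spacy

    nlp = spacy.load("en_core_web_sm")


    def verb_object_pairs(texts):
        """Yield (verb_lemma, object_lemma) pairs from parsed sentences."""
        for doc in nlp.pipe(texts):
            for token in doc:
                if token.dep_ == "dobj" and token.head.pos_ == "VERB":
                    yield (token.head.lemma_, token.lemma_)


    # Toy corpus standing in for the "very large corpora" of the abstract.
    corpus = [
        "She attended the meeting and then attended a lecture.",
        "He fired the rifle twice.",
        "The company fired three managers last year.",
    ]

    counts = Counter(verb_object_pairs(corpus))

    # Keep combinations seen at least min_freq times as candidate contexts;
    # the threshold is 1 here only because the corpus is tiny.
    min_freq = 1
    selection_contexts = {pair: n for pair, n in counts.items() if n >= min_freq}
    print(selection_contexts)
    ```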