20 research outputs found

    Class-based probability estimation using a semantic hierarchy

    This article concerns the estimation of a particular kind of probability, namely, the probability of a noun sense appearing as a particular argument of a predicate. In order to overcome the accompanying sparse-data problem, the proposal here is to define the probabilities in terms of senses from a semantic hierarchy and exploit the fact that the senses can be grouped into classes consisting of semantically similar senses. There is a particular focus on the problem of how to determine a suitable class for a given sense, or, alternatively, how to determine a suitable level of generalization in the hierarchy. A procedure is developed that uses a chi-square test to determine a suitable level of generalization. In order to test the performance of the estimation method, a pseudo-disambiguation task is used, together with two alternative estimation methods. Each method uses a different generalization procedure; the first alternative uses the minimum description length principle, and the second uses Resnik's measure of selectional preference. In addition, the performance of our method is investigated using both the standard Pearson chi-square statistic and the log-likelihood chi-square statistic.
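
    The two statistics named in the abstract above can be illustrated with a minimal sketch. The abstract does not give the layout of the contingency table, so the layout here is an assumption: one row per child class of a candidate node in the hierarchy, and two columns counting occurrences with and without the predicate. The function names are hypothetical, and this is not the article's procedure, only the two textbook statistics it compares.

```python
import math

def expected_counts(table):
    """Expected cell counts under independence of rows and columns.

    Assumes every row total and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[rt * ct / total for ct in col_totals] for rt in row_totals]

def pearson_chi_square(table):
    """Pearson statistic: sum over cells of (O - E)^2 / E."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

def log_likelihood_chi_square(table):
    """Log-likelihood statistic (G^2): 2 * sum over cells of O * ln(O / E).

    Cells with an observed count of zero contribute nothing to the sum.
    """
    expected = expected_counts(table)
    return 2.0 * sum(
        o * math.log(o / e)
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
        if o > 0
    )

# Hypothetical counts: rows are child classes, columns are
# (seen with the predicate, seen without the predicate).
counts = [[10, 20], [20, 10]]
print(pearson_chi_square(counts), log_likelihood_chi_square(counts))
```

    Either value would then be compared against a chi-square critical value with (rows - 1) x (columns - 1) degrees of freedom. How the outcome of that test decides whether to generalize further up the hierarchy is not described in the abstract and is left out of the sketch.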

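    What do we drink? Automatic extension of the Hungarian WordNet with cross-part-of-speech relations representing selectional preferences [Mit iszunk? : a Magyar WordNet automatikus kiterjesztése szelekciós preferenciákat ábrázoló szófajközi relációkkal]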

    The aim of the ongoing work presented in this article is to automatically extend the Hungarian WordNet with new verb-noun relations that represent the selectional preferences of different argument positions. We present an algorithm that, based on corpus frequency data and the hierarchical structure of WordNet, attempts to identify the HuWN hypernym subgraphs that best represent the semantic types of argument positions. With this procedure we aim to cover every verbal argument position found in the corpus that is marked by a case suffix or a postposition. Our goal is not to designate unambiguous, exclusive categories; instead, we use weighted lists to enumerate the most frequent types that can be generalized from the observed examples. We hope the results will serve as a useful resource not only for users of the Hungarian WordNet but also for the syntactic parser we are developing. We present some preliminary results and discuss some of the questions that arise.

    Verb selection preferences: a computational approach

    This article aims to provide a representation of verb selectional preferences for Italian. The experiment follows corpus-based methodologies and proceeds in two steps: first, extracting arguments from the selected corpora; then, generalizing the selectional preferences using a lexical ontology. The resources used are LexIt, a valence lexicon for Italian verbs, as the lexical resource, and MultiWordNet as the ontology. The aim is to provide a detailed level of representation of verbal behaviour by navigating the entire semantic network and bringing out more specific behaviours in the selectional preferences of verb arguments.

    D7.1. Criteria for evaluation of resources, technology and integration.

    This deliverable defines how evaluation is carried out at each integration cycle in the PANACEA project. As PANACEA aims at producing large-scale resources, evaluation becomes a critical and challenging issue. Critical because it is important to assess the quality of the results that should be delivered to users. Challenging because we are exploring rather new areas, and doing so through a technical platform: some new methodologies will have to be explored, or old ones adapted.

    D6.1: Technologies and Tools for Lexical Acquisition

    This report describes the technologies and tools to be used for Lexical Acquisition in PANACEA. It includes descriptions of existing technologies and tools which can be built on and improved within PANACEA, as well as of new technologies and tools to be developed and integrated in the PANACEA platform. The report also specifies the Lexical Resources to be produced. Four main areas of lexical acquisition are included: Subcategorization Frames (SCFs), Selectional Preferences (SPs), Lexical-semantic Classes (LCs) for both nouns and verbs, and Multi-Word Expressions (MWEs).

    A Probabilistic Model of Semantic Plausibility in Sentence Processing

    Experimental research shows that human sentence processing uses information from different levels of linguistic analysis, for example lexical and syntactic preferences as well as semantic plausibility. Existing computational models of human sentence processing, however, have focused primarily on lexico-syntactic factors. Those models that do account for semantic plausibility effects lack a general model of human plausibility intuitions at the sentence level. Within a probabilistic framework, we propose a wide-coverage model that both assigns thematic roles to verb-argument pairs and determines a preferred interpretation by evaluating the plausibility of the resulting (verb, role, argument) triples. The model is trained on a corpus of role-annotated language data. We also present a transparent integration of the semantic model with an incremental probabilistic parser. We demonstrate that both the semantic plausibility model and the combined syntax/semantics model predict judgment and reading time data from the experimental literature.

    A clustering approach to automatic verb classification incorporating selectional preferences: model, implementation, and user manual

    This report presents two variations of an innovative, complex approach to semantic verb classes that relies on selectional preferences as verb properties. The underlying linguistic assumption for this verb class model is that verbs which agree on their selectional preferences belong to a common semantic class. The model is implemented as a soft-clustering approach, in order to capture the polysemy of the verbs. The training procedure uses the Expectation-Maximisation (EM) algorithm (Baum, 1972) to iteratively improve the probabilistic parameters of the model, and applies the Minimum Description Length (MDL) principle (Rissanen, 1978) to induce WordNet-based selectional preferences for arguments within subcategorisation frames. One variation of the MDL principle replicates a standard MDL approach by Li and Abe (1998), the other variation presents an improved pruning strategy that outperforms the standard implementation considerably. Our model is potentially useful for lexical induction (e.g., verb senses, subcategorisation and selectional preferences, collocations, and verb alternations), and for NLP applications in sparse data situations. We demonstrate the usefulness of the model by a standard evaluation (pseudo-word disambiguation), and three applications (selectional preference induction, verb sense disambiguation, and semi-supervised sense labelling).

    Learning Ontology Relations by Combining Corpus-Based Techniques and Reasoning on Data from Semantic Web Sources

    The manual construction of formal domain conceptualizations (ontologies) is labor-intensive. Ontology learning, by contrast, provides (semi-)automatic ontology generation from input data such as domain text. This thesis proposes a novel approach for learning labels of non-taxonomic ontology relations. It combines corpus-based techniques with reasoning on Semantic Web data. Corpus-based methods apply vector space similarity of verbs co-occurring with labeled and unlabeled relations to calculate relation label suggestions from a set of candidates. A meta ontology in combination with Semantic Web sources such as DBpedia and OpenCyc allows reasoning to improve the suggested labels. An extensive formal evaluation demonstrates the superior accuracy of the presented hybrid approach.
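
    The corpus-based step mentioned in the abstract above can be illustrated with a minimal sketch. The abstract says only "vector space similarity", so the use of cosine similarity over raw verb counts is an assumption, as are the function names and the toy data. It is not the thesis's implementation, and the reasoning step over Semantic Web sources is not covered.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (dict-like)."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def suggest_labels(unlabeled_verbs, labeled_relations):
    """Rank candidate labels by verb-vector similarity, best first.

    unlabeled_verbs: verbs observed with the unlabeled relation.
    labeled_relations: maps each candidate label to the verbs observed
    with relations that already carry that label.
    """
    target = Counter(unlabeled_verbs)
    scores = {
        label: cosine(target, Counter(verbs))
        for label, verbs in labeled_relations.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical toy data, not taken from the thesis.
candidates = {
    "locatedIn": ["lie", "lie", "border"],
    "produces": ["emit", "generate"],
}
print(suggest_labels(["lie", "border"], candidates))
```

    A ranked list of this kind would be the input that the reasoning step then refines.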