77 research outputs found

    MELODI : Semantic Similarity of Words and Compositional Phrases using Latent Vector Weighting

    In this paper we present our system for the SemEval 2013 Task 5a on semantic similarity of words and compositional phrases. Our system uses a dependency-based vector space model, in combination with a technique called latent vector weighting. The system computes the similarity between a particular noun instance and the head noun of a particular noun phrase, which was weighted according to the semantics of the modifier. The system is entirely unsupervised; one single parameter, the similarity threshold, was tuned using the training data.

    A Tensor-based Factorization Model of Semantic Compositionality

    In this paper, we present a novel method for the computation of compositionality within a distributional framework. The key idea is that compositionality is modeled as a multi-way interaction between latent factors, which are automatically constructed from corpus data. We use our method to model the composition of subject-verb-object triples. The method consists of two steps. First, we compute a latent factor model for nouns from standard co-occurrence data. Next, the latent factors are used to induce a latent model of three-way subject-verb-object interactions. Our model has been evaluated on a similarity task for transitive phrases, in which it exceeds the state of the art.

    MELODI : A Supervised Distributional Approach for Free Paraphrasing of Noun Compounds

    This paper describes the system submitted by the MELODI team for the SemEval-2013 Task 4: Free Paraphrases of Noun Compounds (Hendrickx et al., 2013). Our approach combines the strength of an unsupervised distributional word space model with a supervised maximum-entropy classification model; the distributional model yields a feature representation for a particular compound noun, which is subsequently used by the classifier to induce a number of appropriate paraphrases.

    A Quantitative Evaluation of Global Word Sense Induction

    Word sense induction (WSI) is the task of automatically identifying the senses of words in texts, without the need for handcrafted resources or annotated data. Until now, most WSI algorithms have extracted the different senses of a word 'locally' on a per-word basis, i.e. the different senses for each word are determined separately. In this paper, we compare the performance of such algorithms to an algorithm that uses a 'global' approach, i.e. the different senses of a particular word are determined by comparing them to, and demarcating them from, the senses of other words in a full-blown word space model. We adopt the evaluation framework proposed in the SemEval-2010 Word Sense Induction & Disambiguation task. All systems that participated in this task use a local scheme for determining the different senses of a word. We compare their results to the ones obtained by the global approach, and discuss the advantages and weaknesses of both approaches.

    Discourse-Based Evaluation of Language Understanding

    We introduce DiscEval, a compilation of 11 evaluation datasets with a focus on discourse, that can be used for evaluation of English Natural Language Understanding when considering meaning as use. We make the case that evaluation with discourse tasks is overlooked and that Natural Language Inference (NLI) pretraining may not lead to the learning of truly universal representations. DiscEval can also be used as supplementary training data for multi-task learning-based systems, and is publicly available, alongside the code for gathering and preprocessing the datasets.

    Automatic Relation Extraction — Can Synonym Extraction Benefit from Antonym Knowledge?

    Proceedings of the NODALIDA 2009 workshop WordNets and other Lexical Semantic Resources — between Lexical Semantics, Lexicography, Terminology and Formal Ontologies. Editors: Bolette Sandford Pedersen, Anna Braasch, Sanni Nimb and Ruth Vatvedt Fjeld. NEALT Proceedings Series, Vol. 7 (2009), 17-20. © 2009 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/9209

    Apprentissage non-supervisé pour l'appariement et l'étiquetage de cas cliniques en français

    We present the system used by the Synapse/IRIT team in the DEFT2019 competition, which covered two tasks on clinical cases written in French: one on matching clinical cases with discussions, the other on keyword extraction. A distinctive feature is the use of unsupervised learning for both tasks, on a corpus built specifically for the medical domain in French.

    Présentation de l'atelier SemDis 2014 : sémantique distributionnelle pour la substitution lexicale et l'exploration de corpus spécialisés

    This is an introductory paper for the proceedings of the SemDis 2014 workshop, dedicated to distributional semantics methods with a focus on the construction of French distributional resources. We describe the two tasks that have been set up. The first one is competitive: it is a French lexical substitution task, based on the FRWAC corpus. The second one is a more exploratory task, which consists in the analysis of a specific corpus in the NLP field. We report an evaluation of the systems participating in the competitive task, and give a broad overview for both tasks of the diverse methods that have been used by the participants.

    Unsupervised compositionality prediction of nominal compounds

    Nominal compounds such as red wine and nut case display a continuum of compositionality, with varying contributions from the components of the compound to its semantics. This article proposes a framework for compound compositionality prediction using distributional semantic models, evaluating to what extent they capture idiomaticity compared to human judgments. For evaluation, we introduce data sets containing human judgments in three languages: English, French, and Portuguese. The results obtained reveal a high agreement between the models and human predictions, suggesting that they are able to incorporate information about idiomaticity. We also present an in-depth evaluation of various factors that can affect prediction, such as model and corpus parameters and compositionality operations. General crosslingual analyses reveal the impact of morphological variation and corpus size on the ability of the model to predict compositionality, and of a uniform combination of the components for best results.