77 research outputs found

MELODI : Semantic Similarity of Words and Compositional Phrases using Latent Vector Weighting

Author: Afantenos Stergos
Muller Philippe
Van De Cruys Tim
Publication venue: HAL CCSD
Publication date: 01/01/2013

In this paper we present our system for the SemEval 2013 Task 5a on semantic similarity of words and compositional phrases. Our system uses a dependency-based vector space model, in combination with a technique called latent vector weighting. The system computes the similarity between a particular noun instance and the head noun of a particular noun phrase, which was weighted according to the semantics of the modifier. The system is entirely unsupervised; one single parameter, the similarity threshold, was tuned using the training data.

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte

A Tensor-based Factorization Model of Semantic Compositionality

Author: Korhonen Anna
Poibeau Thierry
Van De Cruys Tim
Publication venue: HAL CCSD
Publication date: 01/01/2013

In this paper, we present a novel method for the computation of compositionality within a distributional framework. The key idea is that compositionality is modeled as a multi-way interaction between latent factors, which are automatically constructed from corpus data. We use our method to model the composition of subject-verb-object triples. The method consists of two steps. First, we compute a latent factor model for nouns from standard co-occurrence data. Next, the latent factors are used to induce a latent model of three-way subject-verb-object interactions. Our model has been evaluated on a similarity task for transitive phrases, in which it exceeds the state of the art.

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte

MELODI : A Supervised Distributional Approach for Free Paraphrasing of Noun Compounds

Author: Afantenos Stergos
Muller Philippe
Van De Cruys Tim
Publication venue: HAL CCSD
Publication date: 01/01/2013

This paper describes the system submitted by the MELODI team for the SemEval-2013 Task 4: Free Paraphrases of Noun Compounds (Hendrickx et al., 2013). Our approach combines the strength of an unsupervised distributional word space model with a supervised maximum-entropy classification model; the distributional model yields a feature representation for a particular compound noun, which is subsequently used by the classifier to induce a number of appropriate paraphrases.

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte

A Quantitative Evaluation of Global Word Sense Induction

Author: Apidianaki Marianna
Van De Cruys Tim
Publication venue: Springer Science and Business Media LLC
Publication date: 20/02/2011

Word sense induction (WSI) is the task of automatically identifying the senses of words in texts, without the need for handcrafted resources or annotated data. Until now, most WSI algorithms have extracted the different senses of a word 'locally' on a per-word basis, i.e. the different senses for each word are determined separately. In this paper, we compare the performance of such algorithms to an algorithm that uses a 'global' approach, i.e. the different senses of a particular word are determined by comparing them to, and demarcating them from, the senses of other words in a full-blown word space model. We adopt the evaluation framework proposed in the SemEval-2010 Word Sense Induction & Disambiguation task. All systems that participated in this task use a local scheme for determining the different senses of a word. We compare their results to the ones obtained by the global approach, and discuss the advantages and weaknesses of both approaches.

INRIA a CCSD electronic archive server

Discourse-Based Evaluation of Language Understanding

Author: Muller Philippe
Pradel Camille
Sileo Damien
Van-de-Cruys Tim
Publication date: 19/07/2019

We introduce DiscEval, a compilation of 11 evaluation datasets with a focus on discourse, that can be used for evaluation of English Natural Language Understanding when considering meaning as use. We make the case that evaluation with discourse tasks is overlooked and that Natural Language Inference (NLI) pretraining may not lead to learning truly universal representations. DiscEval can also be used as supplementary training data for multi-task learning-based systems, and is publicly available, alongside the code for gathering and preprocessing the datasets.

arXiv.org e-Print Archive

Scientific Publications of the University of Toulouse II Le Mirail

Automatic Relation Extraction — Can Synonym Extraction Benefit from Antonym Knowledge?

Author: Anna Lobanova
Erik Tjong Kim Sang
Jennifer Spenader
Tim Van De Cruys
Tom Van Der Kleij
Publication date: 01/01/2009

Proceedings of the NODALIDA 2009 workshop WordNets and other Lexical Semantic Resources — between Lexical Semantics, Lexicography, Terminology and Formal Ontologies. Editors: Bolette Sandford Pedersen, Anna Braasch, Sanni Nimb and Ruth Vatvedt Fjeld. NEALT Proceedings Series, Vol. 7 (2009), 17-20. © 2009 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/9209

CiteSeerX

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

DSpace at Tartu University Library

Dissertations of the University of Groningen

Apprentissage non-supervisé pour l'appariement et l'étiquetage de cas cliniques en français

Author: Muller Philippe
Pradel Camille
Sileo Damien
Van de Cruys Tim
Publication venue: Association pour le Traitement Automatique des Langues (ATALA)
Publication date: 02/07/2019

We present the system used by the Synapse/IRIT team in the DEFT2019 competition, which covered two tasks on clinical cases written in French: matching clinical cases with discussions, and keyword extraction. One distinctive feature is the use of unsupervised learning for both tasks, on a corpus built specifically for the French medical domain.

Open Archive Toulouse Archive Ouverte

Présentation de l'atelier SemDis 2014 : sémantique distributionnelle pour la substitution lexicale et l'exploration de corpus spécialisés

Author: Fabre Cécile
Hathout Nabil
Ho-Dac Lydia-Mai
Morlane-Hondère François
Muller Philippe
Sajous Franck
Tanguy Ludovic
Van De Cruys Tim
Publication venue: HAL CCSD
Publication date: 01/01/2014

This is an introductory paper for the proceedings of the SemDis 2014 workshop, dedicated to distributional semantics methods with a focus on the construction of French distributional resources. We describe the two tasks that were set up. The first is a competitive French lexical substitution task, based on the FRWAC corpus. The second is a more exploratory task, which consists of the analysis of a specific corpus in the NLP field. We report an evaluation of the systems participating in the competitive task, and give a broad overview, for both tasks, of the diverse methods used by the participants.

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte

HAL Descartes

Unsupervised compositionality prediction of nominal compounds

Nominal compounds such as red wine and nut case display a continuum of compositionality, with varying contributions from the components of the compound to its semantics. This article proposes a framework for compound compositionality prediction using distributional semantic models, evaluating to what extent they capture idiomaticity compared to human judgments. For evaluation, we introduce data sets containing human judgments in three languages: English, French, and Portuguese. The results obtained reveal a high agreement between the models and human predictions, suggesting that they are able to incorporate information about idiomaticity. We also present an in-depth evaluation of various factors that can affect prediction, such as model and corpus parameters and compositionality operations. General crosslingual analyses reveal the impact of morphological variation and corpus size in the ability of the model to predict compositionality, and of a uniform combination of the components for best results

University of Essex Research Repository

Crossref

HAL AMU

White Rose Research Online