790 research outputs found

Using parallel corpora for word sense disambiguation

Author: De Cock Martine
Hoste Veronique
Lefever Els
Publication date: 01/01/2011

A false friend exercise with authentic material retrieved from a corpus

Author: Wagner Joachim
Publication venue: International Speech Communication Association
Publication date: 17/06/2004

This paper presents a CALL exercise that aims to raise the learner's awareness of false friends. In the exercise, the learner is asked to mark words in a text that are similar in form to a word in his or her native language and then to classify these words according to three levels of meaning correspondence. Text is randomly selected from a corpus and integrated into the exercise. A preliminary evaluation shows that mature students understand the exercise well

Irish Universities

DCU Online Research Access Service

Synapse at CAp 2017 NER challenge: Fasttext CRF

Author: Alexandra J. Weisberg (4234153)
Briana S. Bullington (4234156)
Eric R. Moore (4234150)
Jeff Chang (228277)
Kimberly H. Halsey (208487)
Yuan Jiang (296541)
Publication date: 01/01/2017

We present our system for the CAp 2017 NER challenge which is about named entity recognition on French tweets. Our system leverages unsupervised learning on a larger dataset of French tweets to learn features feeding a CRF model. It was ranked first without using any gazetteer or structured external data, with an F-measure of 58.89%. To the best of our knowledge, it is the first system to use fasttext embeddings (which include subword representations) and an embedding-based sentence representation for NER
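The F-measure quoted in the abstract above can be illustrated with a minimal sketch. It assumes the usual balanced F1 (the harmonic mean of precision and recall) computed from entity-level counts; the challenge's exact averaging scheme is not stated here, and the function name `f_measure` and the counts in the usage line are hypothetical.

```python
def f_measure(true_positives, false_positives, false_negatives):
    """Return (precision, recall, F1) from entity-level counts.

    Assumption: balanced F1, i.e. the harmonic mean of precision and recall.
    """
    predicted = true_positives + false_positives  # entities the system output
    gold = true_positives + false_negatives       # entities in the reference
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / gold if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts, for illustration only (not taken from the paper).
precision, recall, f1 = f_measure(6, 2, 4)
```

With these illustrative counts, precision is 6/8 and recall is 6/10; the F1 value sits between the two, closer to the lower one.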

arXiv.org e-Print Archive

Directory of Open Access Journals

FigShare

CompiLIG at SemEval-2017 Task 1: Cross-Language Plagiarism Detection Methods for Semantic Textual Similarity

Author: Agnes Frederic
Besacier Laurent
Ferrero Jeremy
Schwab Didier
Publication date: 01/01/2017

We present our submitted systems for Semantic Textual Similarity (STS) Track 4 at SemEval-2017. Given a pair of Spanish-English sentences, each system must estimate their semantic similarity by a score between 0 and 5. In our submission, we use syntax-based, dictionary-based, context-based, and MT-based methods. We also combine these methods in unsupervised and supervised ways. Our best run ranked 1st on track 4a with a correlation of 83.02% with human annotations
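The correlation with human annotations mentioned in the abstract above can be sketched as follows. This assumes the Pearson correlation coefficient between system scores and gold scores on the 0 to 5 scale; the function name `pearson` and the sample scores are hypothetical and are not taken from the paper.

```python
import math


def pearson(system_scores, human_scores):
    """Pearson correlation between two equally long lists of scores."""
    if len(system_scores) != len(human_scores) or len(system_scores) < 2:
        raise ValueError("need two score lists of equal length >= 2")
    n = len(system_scores)
    mean_sys = sum(system_scores) / n
    mean_hum = sum(human_scores) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((s - mean_sys) * (h - mean_hum)
              for s, h in zip(system_scores, human_scores))
    var_sys = sum((s - mean_sys) ** 2 for s in system_scores)
    var_hum = sum((h - mean_hum) ** 2 for h in human_scores)
    if var_sys == 0 or var_hum == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_sys * var_hum)


# Hypothetical similarity scores on the 0-5 scale, for illustration only.
system = [4.8, 1.2, 3.0, 0.5]
human = [5.0, 1.0, 3.5, 0.0]
r = pearson(system, human)
```

A reported figure such as 83.02% would correspond to a coefficient of about 0.83 under this reading.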

arXiv.org e-Print Archive

Crossref

Hal - Université Grenoble Alpes

SupWSD: a flexible toolkit for supervised word sense disambiguation

Author: DELLI BOVI Claudio
Papandrea Simone
Raganato Alessandro
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2017

In this demonstration we present SupWSD, a Java API for supervised Word Sense Disambiguation (WSD). This toolkit includes the implementation of a state-of-the-art supervised WSD system, together with a Natural Language Processing pipeline for preprocessing and feature extraction. Our aim is to provide an easy-to-use tool for the research community, designed to be modular, fast and scalable for training and testing on large datasets. The source code of SupWSD is available at http://github.com/SI3P/SupWSD

Archivio della ricerca- Università di Roma La Sapienza

Experiments in Clustering Homogeneous XML Documents to Validate an Existing Typology

Author: Despeyroux Thierry
Lechevallier Yves
Trousse Brigitte
Vercoustre Anne-Marie
Publication date: 01/01/2005

This paper presents some experiments in clustering homogeneous XML documents to validate an existing classification or more generally an organisational structure. Our approach integrates techniques for extracting knowledge from documents with unsupervised classification (clustering) of documents. We focus on the feature selection used for representing documents and its impact on the emerging classification. We mix the selection of structured features with fine textual selection based on syntactic characteristics. We illustrate and evaluate this approach with a collection of Inria activity reports for the year 2003. The objective is to cluster projects into larger groups (Themes), based on the keywords or different chapters of these activity reports. We then compare the results of clustering using different feature selections, with the official theme structure used by Inria. Comment: (postprint); This version corrects a couple of errors in authors' names in the bibliograph

arXiv.org e-Print Archive

CiteSeerX

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Exploring Text Virality in Social Networks

Author: Guerini Marco
Ozbal Gozde
Strapparava Carlo
Publication date: 01/01/2011

This paper aims to shed some light on the concept of virality - especially in social networks - and to provide new insights on its structure. We argue that: (a) virality is a phenomenon strictly connected to the nature of the content being spread, rather than to the influencers who spread it, (b) virality is a phenomenon with many facets, i.e. under this generic term several different effects of persuasive communication are comprised and they only partially overlap. To give ground to our claims, we provide initial experiments in a machine learning framework to show how various aspects of virality can be independently predicted according to content features

arXiv.org e-Print Archive

Archivio della ricerca - Fondazione Bruno Kessler

Association for the Advancement of Artificial Intelligence: AAAI Publications

MIsA : multilingual 'IsA' extraction from Corpora

Author: Faralli Stefano
Lefever Els
Paolo Ponzetto Simone
Publication venue: European Language Resources Association (ELRA)
Publication date: 01/01/2018

Ghent University Academic Bibliography

MAnnheim DOCument Server