A false friend exercise with authentic material retrieved from a corpus
This paper presents a CALL exercise that aims to raise the learner's awareness of false friends. In the exercise, the learner is asked to mark words in a text that are similar in form to a word in his or her native language and then to classify these words according to three levels of meaning correspondence. Text is randomly selected from a corpus and integrated into the exercise. A preliminary evaluation shows that mature students understand the exercise well.
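The abstract leaves open how words of similar form are identified when the learner's marks are checked. Below is a minimal sketch, assuming a normalized Levenshtein similarity and a hypothetical cut-off of 0.7 (neither is taken from the paper), of how candidate pairs between a text and a native-language word list could be pre-flagged. Classifying a pair into one of the three levels of meaning correspondence would still need a lexicon or the learner's judgement.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the characters match)
            ))
        prev = curr
    return prev[-1]


def form_similarity(a, b):
    """Case-insensitive similarity in [0, 1]: 1.0 for identical spellings."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def candidate_pairs(text_words, native_lexicon, threshold=0.7):
    """Pair each text word with native-language words of similar form.

    `threshold` is a hypothetical cut-off chosen for illustration only.
    """
    return [(w, n) for w in text_words for n in native_lexicon
            if form_similarity(w, n) >= threshold]


print(candidate_pairs(["gift", "house"], ["Gift", "Haus"]))  # [('gift', 'Gift')]
```

English "gift" and German "Gift" (poison) are flagged as a candidate pair, while "house"/"Haus" (similarity 0.6) falls just below the assumed threshold, which shows how sensitive such a cut-off is.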
Synapse at CAp 2017 NER challenge: Fasttext CRF
We present our system for the CAp 2017 NER challenge which is about named
entity recognition on French tweets. Our system leverages unsupervised learning
on a larger dataset of French tweets to learn features feeding a CRF model. It
was ranked first without using any gazetteer or structured external data, with
an F-measure of 58.89%. To the best of our knowledge, it is the first system
to use fastText embeddings (which include subword representations) and an
embedding-based sentence representation for NER.
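fastText builds a word's vector from vectors of its character n-grams, which lets it produce vectors even for words unseen during training. The sketch below shows only the n-gram extraction, using fastText's `<` and `>` word-boundary markers and its default n-gram range for unsupervised training (minn=3, maxn=6). As an assumption about the "embedding-based sentence representation", it also shows a plain average of word vectors. The toy vectors are invented, and the CRF itself is not shown.

```python
def char_ngrams(word, minn=3, maxn=6):
    """Character n-grams in the fastText style.

    The word is wrapped in the boundary markers '<' and '>' and every
    substring of length minn..maxn is returned, shortest n-grams first.
    """
    wrapped = "<" + word + ">"
    grams = []
    for n in range(minn, maxn + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


def mean_vector(vectors):
    """Component-wise average of equally sized vectors (assumed sentence representation)."""
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


# Toy 2-d word vectors; real ones would come from a trained fastText model.
toy = {"bonjour": [1.0, 2.0], "paris": [3.0, 4.0]}
print(char_ngrams("chat", 3, 3))                    # ['<ch', 'cha', 'hat', 'at>']
print(mean_vector([toy["bonjour"], toy["paris"]]))  # [2.0, 3.0]
```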
CompiLIG at SemEval-2017 Task 1: Cross-Language Plagiarism Detection Methods for Semantic Textual Similarity
We present our submitted systems for Semantic Textual Similarity (STS) Track
4 at SemEval-2017. Given a pair of Spanish-English sentences, each system must
estimate their semantic similarity as a score between 0 and 5. In our
submission, we use syntax-based, dictionary-based, context-based, and MT-based
methods. We also combine these methods in unsupervised and supervised ways. Our
best run ranked 1st on track 4a with a correlation of 83.02% with human
annotations.
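Systems in this track are evaluated by Pearson correlation with the human similarity scores. Below is a minimal sketch of that metric together with one possible unsupervised combination, assumed here to be a plain average of per-method scores that are already on the 0-5 scale. The method names and scores are hypothetical. The supervised combination mentioned in the abstract would instead learn weights from annotated training pairs.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def combine_unsupervised(method_scores):
    """Average the scores of several methods for each sentence pair.

    method_scores[m][i] is method m's score for sentence pair i.
    Plain averaging is an assumed combination scheme, not the paper's.
    """
    return [sum(pair) / len(pair) for pair in zip(*method_scores)]


# Hypothetical scores from two methods for three Spanish-English pairs.
syntax_based = [1.0, 3.0, 5.0]
mt_based = [2.0, 3.0, 4.0]
gold = [1.5, 3.0, 4.5]
combined = combine_unsupervised([syntax_based, mt_based])
print(combined)  # [1.5, 3.0, 4.5]
print(pearson(combined, gold))
```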
SupWSD: a flexible toolkit for supervised word sense disambiguation
In this demonstration we present SupWSD, a Java API for supervised Word Sense Disambiguation (WSD). This toolkit includes the implementation of a state-of-the-art supervised WSD system, together with a Natural Language Processing pipeline for preprocessing and feature extraction. Our aim is to provide an easy-to-use tool for the research community, designed to be modular, fast and scalable for training and testing on large datasets. The source code of SupWSD is available at http://github.com/SI3P/SupWSD
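SupWSD itself is a Java API, and nothing below reproduces its classes or methods. As an illustration of what feature extraction and supervised training mean in WSD, here is a Python sketch that assumes surrounding-word features in a window of two tokens on each side and a simple overlap-count classifier in place of the toolkit's actual learner. All function names, sentences, and sense labels are hypothetical.

```python
from collections import Counter, defaultdict


def window_features(tokens, target_index, size=2):
    """Surrounding-word features around the target token, padded at sentence edges."""
    feats = {}
    for offset in range(-size, size + 1):
        if offset == 0:
            continue  # skip the ambiguous word itself
        j = target_index + offset
        word = tokens[j].lower() if 0 <= j < len(tokens) else "<pad>"
        feats[f"w[{offset:+d}]"] = word
    return feats


def train(examples):
    """Count (feature, value) items per sense; examples are (features, sense) pairs."""
    profiles = defaultdict(Counter)
    for feats, sense in examples:
        profiles[sense].update(feats.items())
    return profiles


def predict(profiles, feats):
    """Pick the sense whose training profile shares the most feature items."""
    return max(profiles, key=lambda s: sum(profiles[s][item] for item in feats.items()))


training = [
    (window_features(["the", "river", "bank", "was", "muddy"], 2), "shore"),
    (window_features(["my", "savings", "bank", "was", "closed"], 2), "finance"),
]
model = train(training)
print(predict(model, window_features(["the", "river", "bank", "flooded"], 2)))  # shore
```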
Experiments in Clustering Homogeneous XML Documents to Validate an Existing Typology
This paper presents some experiments in clustering homogeneous XML documents
to validate an existing classification or, more generally, an organisational
structure. Our approach integrates techniques for extracting knowledge from
documents with unsupervised classification (clustering) of documents. We focus
on the feature selection used for representing documents and its impact on the
emerging classification. We mix the selection of structured features with fine
textual selection based on syntactic characteristics. We illustrate and evaluate
this approach with a collection of Inria activity reports for the year 2003.
The objective is to cluster projects into larger groups (Themes), based on the
keywords or different chapters of these activity reports. We then compare the
results of clustering using different feature selections, with the official
theme structure used by Inria.
Comment: (postprint); This version corrects a couple of errors in authors'
names in the bibliography.
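The abstract does not state which measure is used to compare the clusters with Inria's official themes. Below is a minimal sketch, assuming keyword elements named `keyword` (a hypothetical tag) as the textual features and cluster purity as the comparison measure. Project identifiers and theme labels are invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def keyword_features(xml_text, tag="keyword"):
    """Bag of lower-cased keywords taken from every element named `tag`."""
    root = ET.fromstring(xml_text)
    return Counter(el.text.strip().lower() for el in root.iter(tag) if el.text)


def purity(clusters, reference):
    """Share of documents that belong to the majority theme of their cluster.

    `clusters` and `reference` both map a document id to a label.
    """
    members = defaultdict(list)
    for doc, cluster in clusters.items():
        members[cluster].append(reference[doc])
    majority_total = sum(Counter(labels).most_common(1)[0][1] for labels in members.values())
    return majority_total / len(clusters)


report = "<report><keyword>Parsing</keyword><keyword>XML</keyword><keyword>xml</keyword></report>"
print(keyword_features(report))  # Counter({'xml': 2, 'parsing': 1})

clusters = {"projA": 0, "projB": 0, "projC": 1, "projD": 1}
themes = {"projA": "Theme1", "projB": "Theme1", "projC": "Theme1", "projD": "Theme2"}
print(purity(clusters, themes))  # 0.75
```

Purity rewards clusters that each map onto a single theme, so it suits the "validate an existing typology" goal. However, it reaches 1.0 trivially when every project is its own cluster, so it should be read alongside the number of clusters.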
Exploring Text Virality in Social Networks
This paper aims to shed some light on the concept of virality - especially in
social networks - and to provide new insights on its structure. We argue that:
(a) virality is a phenomenon strictly connected to the nature of the content
being spread, rather than to the influencers who spread it, (b) virality is a
phenomenon with many facets, i.e., this generic term covers several distinct
effects of persuasive communication that only partially overlap. To support
our claims, we provide initial experiments in a machine learning framework
showing how various aspects of virality can be independently predicted from
content features.
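Claim (b) treats each facet of virality as a separate prediction target. Below is a minimal sketch of that set-up, assuming a bag-of-words representation and one independent binary perceptron per facet. The facet name, documents, and labels are hypothetical, and the abstract does not specify the paper's actual features or learner.

```python
from collections import defaultdict


def bag_of_words(text):
    """Lower-cased whitespace tokens used as content features."""
    return text.lower().split()


def train_perceptron(docs, labels, epochs=10):
    """Binary perceptron over word features; labels are +1 or -1."""
    w = defaultdict(float)
    b = 0.0
    for _ in range(epochs):
        for doc, y in zip(docs, labels):
            feats = bag_of_words(doc)
            score = b + sum(w[f] for f in feats)
            if y * score <= 0:  # misclassified or on the boundary: update
                for f in feats:
                    w[f] += y
                b += y
    return w, b


def predict(model, doc):
    w, b = model
    return 1 if b + sum(w.get(f, 0.0) for f in bag_of_words(doc)) > 0 else -1


docs = ["win a free prize now", "meeting notes attached",
        "free prize inside", "quarterly report attached"]
# One label vector per facet; each facet gets its own, independent model.
facet_labels = {"widely_shared": [1, -1, 1, -1]}
models = {facet: train_perceptron(docs, y) for facet, y in facet_labels.items()}
print(predict(models["widely_shared"], "free prize"))  # 1
```

Because each facet has its own model, adding another facet only means training another perceptron on that facet's labels. This mirrors the claim that the facets only partially overlap.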