
69 research outputs found

A French Fairy Tale Corpus syntactically and semantically annotated.

Author: El Maarouf Ismaïl
Villaneau Jeanne
Publication venue: HAL CCSD
Publication date: 23/05/2012

International audience. Fairy tales, folktales and, more generally, children's stories have lately attracted the Natural Language Processing (NLP) community. However, very few corpora exist and linguistic resources are lacking. The work presented in this paper aims to fill this gap by presenting a syntactically and semantically annotated corpus. It focuses on the linguistic analysis of a Fairy Tales Corpus and describes the syntactic and semantic resources developed for Information Extraction. Resources include syntactic dependency relation annotation for 120 verbs; referential annotation, which annotates each anaphoric occurrence and Proper Name with the most specific noun in the text; ontology matching for a substantial part of the nouns in the corpus; and semantic role labelling for 41 verbs using the FrameNet database. The article also sums up previous analyses of this corpus and indicates possible uses of it for the NLP community.

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Rennes 1

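Evaluation de la détection des émotions, des opinions ou des sentiments : dictature de la majorité ou respect de la diversité d'opinions ? [Evaluating the detection of emotions, opinions or sentiments: dictatorship of the majority or respect for the diversity of opinions?]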

Author: Antoine Jean-Yves
Le Tallec Marc
Villaneau Jeanne
Publication venue: HAL CCSD
Publication date: 27/06/2011

National audience. Emotion detection, opinion mining and sentiment analysis are generally evaluated by comparing the responses of the system under test with those contained in a reference corpus. The questions raised in this article concern both the definition of the reference and the reliability of the metrics most frequently used for this comparison. The experiments carried out to evaluate the emotion detection system EmoLogus serve as a basis for reflection on these two problems. The analysis of EmoLogus's results and the comparison between the different metrics call into question the choice of majority voting as the reference. They also show the need for more advanced statistical tools than those generally used in order to obtain reliable evaluations of systems that work on intrinsically subjective and uncertain data.

HAL-Université de Bretagne Occidentale

HAL Université de Tours

Affective Interaction with a Companion Robot for Hospitalized Children: a Linguistically based Model for Emotion Detection

Author: Antoine Jean-Yves
Duhaut Dominique
Le Tallec Marc
Villaneau Jeanne
Publication venue: HAL CCSD
Publication date: 25/11/2011

6 pages. International audience. This paper presents a system which aims at characterizing emotions in speech by considering linguistic content only. It is based on the assumption that emotions can be compound: simple lexical words have an intrinsic emotional value, while verbal and adjectival predicates act as a function on the emotional values of their arguments. The paper describes the compositional algorithm that computes the emotion, as well as the emotional lexicons used by this algorithm. A quantitative and qualitative analysis of the differences between system outputs and expert annotations is given, which shows satisfactory results, with a good detection of emotional valence in 82.8% of the test utterances.
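The compositional idea in this abstract can be illustrated with a minimal sketch. The lexicon values, the predicate names and the `valence` helper are all hypothetical and are not taken from the paper; they only show how a predicate can act as a function on the emotional values of its arguments.

```python
from typing import Callable, Dict, List, Tuple, Union

# A term is either a lexical word or a (predicate, arguments) pair.
Term = Union[str, Tuple[str, List["Term"]]]

# Hypothetical intrinsic valences in [-1, 1]; illustrative values only.
WORD_VALENCE: Dict[str, float] = {"toy": 0.6, "wolf": -0.7, "mom": 0.8}

# Hypothetical predicates: each maps the valences of its arguments to a valence.
PREDICATES: Dict[str, Callable[[List[float]], float]] = {
    "find": lambda args: args[0],   # finding X keeps the valence of X
    "lose": lambda args: -args[0],  # losing X inverts the valence of X
    "not": lambda args: -args[0],   # negation inverts the valence
}


def valence(term: Term) -> float:
    """Compute the valence of a term bottom-up."""
    if isinstance(term, str):
        # Words missing from the lexicon are treated as neutral.
        return WORD_VALENCE.get(term, 0.0)
    name, args = term
    return PREDICATES[name]([valence(arg) for arg in args])


print(valence(("lose", ["toy"])))
print(valence(("not", [("lose", ["wolf"])])))
```

Under these assumptions, losing something pleasant comes out negative, and negating the loss of something unpleasant comes out negative as well, because each predicate simply transforms the value computed for its argument.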

HAL-Université de Bretagne Occidentale

HAL Université de Tours

Word Order Phenomena in Spoken French : a Study on Four Corpora of Task-Oriented Dialogue and its Consequences on Language Processing

Author: Antoine Jean-Yves
Goulian Jerome
Le Tallec Marc
Villaneau Jeanne
Publication venue: HAL CCSD
Publication date: 20/07/2009

International audience. This paper presents a corpus study that investigates word order variations (WOVs) in spontaneous spoken French and their consequences for the parsing techniques used in Natural Language Processing. We have studied four task-oriented spoken dialogue corpora which concern different application tasks (air transport or tourism information, switchboard calls). Two corpora concern phone conversations while the other two correspond to direct interaction. Every word order variation has been manually annotated by 3 experts, following a cross-validation procedure. Our results show that, while conversational spoken French should be highly affected by WOVs, it should also still be considered as a rigid order language: WOVs follow some impressive structural regularity and they result very rarely in discontinuous syntactic structures. As a result, non-projective parsers remain well adapted to conversational spoken French.
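In dependency terms, a discontinuous syntactic structure corresponds to a non-projective tree, that is, a tree in which two arcs cross. A minimal sketch of that check follows; the head-array encoding and the function name `is_projective` are assumptions made for illustration, not the annotation format used in the study.

```python
from typing import List


def is_projective(heads: List[int]) -> bool:
    """Return True if no two dependency arcs cross.

    heads[i] is the 1-based position of the head of token i + 1,
    with 0 standing for an artificial root placed before the sentence.
    The input is assumed to encode a well-formed dependency tree.
    """
    # Store each arc as an ordered (left, right) pair of positions.
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for a, b in arcs:
        for c, d in arcs:
            # Two arcs cross when exactly one endpoint of the second
            # lies strictly inside the span of the first.
            if a < c < b < d:
                return False
    return True


print(is_projective([2, 0, 2]))
print(is_projective([3, 4, 0, 3]))
```

The first tree has a root at position 2 with one dependent on each side, so no arcs cross. In the second, the arc between tokens 1 and 3 crosses the arc between tokens 2 and 4.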

Hal - Université Grenoble Alpes

HAL Université de Tours

Weighted Krippendorff's alpha is a more reliable metrics for multi-coders ordinal annotations: experimental studies on emotion, opinion and coreference annotation.

Author: Antoine Jean-Yves
Lefeuvre Anaïs
Villaneau Jeanne
Publication venue: HAL CCSD
Publication date: 26/04/2014

http://www.aclweb.org/anthology/E14-1058

International audience. The question of data reliability is of first importance to assess the quality of manually annotated corpora. Although Cohen's κ is the prevailing reliability measure used in NLP, alternative statistics have been proposed. This paper presents an experimental study with four measures (Cohen's κ, Scott's π, binary and weighted Krippendorff's α) on three tasks: emotion, opinion and coreference annotation. The reported studies investigate the factors of influence (annotator bias, category prevalence, number of coders, number of categories) that should affect reliability estimation. Results show that the use of a weighted measure restricts this influence on ordinal annotations. They suggest that weighted α is the most reliable metrics for such an annotation scheme.
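To make the contrast between unweighted and weighted measures concrete, here is a minimal sketch of Cohen's κ and of Krippendorff's α with a pluggable distance function. It assumes complete data (every coder labels every item) and uses the squared difference as the weighting; the abstract does not say which weighting the paper uses, so that choice, the function names and the toy data are assumptions.

```python
from collections import Counter
from typing import Callable, Hashable, List, Sequence


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two coders who labelled the same items."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    freq_a, freq_b = Counter(a), Counter(b)
    # Chance agreement from each coder's own label distribution.
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)


def krippendorff_alpha(units: List[List], delta2: Callable) -> float:
    """Krippendorff's alpha: 1 - observed / expected disagreement.

    units holds one list of labels per item (one label per coder).
    Undefined (division by zero) if only one category is ever used.
    """
    coincidences: Counter = Counter()
    for values in units:
        m = len(values)
        if m < 2:
            continue  # an item with a single label cannot be paired
        for i, c in enumerate(values):
            for j, k in enumerate(values):
                if i != j:
                    coincidences[(c, k)] += 1 / (m - 1)
    totals: Counter = Counter()
    for (c, _), weight in coincidences.items():
        totals[c] += weight
    n = sum(totals.values())
    d_o = sum(w * delta2(c, k) for (c, k), w in coincidences.items()) / n
    d_e = sum(
        totals[c] * totals[k] * delta2(c, k) for c in totals for k in totals
    ) / (n * (n - 1))
    return 1 - d_o / d_e


def nominal(c, k) -> float:
    """Unweighted distance: any disagreement counts the same."""
    return 0.0 if c == k else 1.0


def interval(c, k) -> float:
    """Weighted distance (assumed here): squared difference of ranks."""
    return float((c - k) ** 2)


# Toy data: two coders, four items, one disagreement between adjacent ranks.
units = [[1, 1], [2, 2], [3, 3], [1, 2]]
print(cohen_kappa([1, 2, 3, 1], [1, 2, 3, 2]))
print(krippendorff_alpha(units, nominal))
print(krippendorff_alpha(units, interval))
```

On this toy data the weighted α is higher than the unweighted one, because the single disagreement is between neighbouring values on the scale and is therefore penalised less than the disagreement expected by chance.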

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL Université de Tours

HAL-Rennes 1

Extraction de patrons sémantiques appliquée à la classification d'Entités Nommées [Semantic pattern extraction applied to named entity classification]

Author: El Maarouf Ismaïl
Rosset Sophie
Villaneau Jeanne
Publication venue: HAL CCSD
Publication date: 01/01/2011

International audience. Corpus variation is a major problem for named entity recognition systems. One possible way to tackle this problem is to use linguistic approaches to adapt these systems to unseen contexts: building semantic patterns may help to disambiguate named entities by structuring their syntactic and semantic environment. This article presents a preliminary implementation of a correction system on a press corpus. After a segmentation step based on surface discourse clues, the system extracts and weights the patterns linked to a named entity class provided by an analyzer. Despite still relatively elementary models, the results obtained are promising and point to the need for a more in-depth treatment of the Organisation class.

Keywords: named entities, semantic patterns, surface discourse segmentation

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Rennes 1

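Détection hors contexte des émotions à partir du contenu linguistique d'énoncés oraux : le système EmoLogus [Out-of-context detection of emotions from the linguistic content of spoken utterances: the EmoLogus system]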

Author: Antoine Jean-Yves
Le Tallec Marc
Savary A.
Syssau A.
Villaneau Jeanne
Publication venue: HAL CCSD
Publication date: 01/01/2010

The ANR Emotirob project aims at detecting emotions in an original application context: realizing an emotional companion robot for weakened children. This paper presents a system which aims at characterizing emotions by only considering linguistic content. It is based on the assumption that emotions can be compound: simple lexical words have an intrinsic emotional value, while verbal and adjectival predicates act as a function on the emotional values of their arguments. The paper describes the algorithm of compositional computation of the emotion and the lexical emotional norm used by this algorithm. A quantitative and qualitative analysis of the differences between system outputs and expert annotations is given, which shows satisfactory results, with a good detection of emotional valency in 90.0% of the test utterances

HAL Descartes

HAL Université de Tours

Hal-Diderot

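Détection des émotions à partir du contenu linguistique d'énoncés oraux : application à un robot compagnon pour enfants fragilisés [Emotion detection from the linguistic content of spoken utterances: application to a companion robot for vulnerable children]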

Author: Antoine Jean-Yves
Le Tallec Marc
Savary Agata
Syssau-Vaccarella Arielle
Villaneau Jeanne
Publication venue: HAL CCSD
Publication date: 24/06/2009

International audience. The ANR Emotirob project aims at detecting emotions from an original point of view: realizing an emotional companion robot for weakened children. In our approach, linguistic detection and prosody are combined. Our experiments show that human beings can reliably estimate the emotional value of an utterance from its propositional content. We have therefore implemented a first model of linguistic detection, based on the principle that emotions can be compound: lexical words have an emotional value, while predicates can modify the emotional values of their arguments. This paper gives a short description of the logical understanding system, the outputs of which are used for the final computation of the emotional value. Then, the creation of a lexical emotional reference standard is presented, together with an ontology of emotional predicate classes for children aged between 5 and 7.

HAL Descartes

HAL Université de Tours

Hal-Diderot

Logical Approach to Natural Language Understanding in a Spoken Dialogue System

Author: Antoine Jean-Yves
Ridoux Olivier
Villaneau Jeanne
Publication venue: HAL CCSD
Publication date: 08/09/2004

International audience. We present a logical approach to spoken language understanding for a human-machine dialogue system. The aim of the analysis is to provide a logical formula, or a conceptual graph, by assembling concepts related to a delimited application domain. This flexible structure is gradually built during an incremental parsing, which is meant to combine syntactic and semantic criteria. Then, a contextual understanding step leads to completing this structure. The evaluations of the current system are encouraging. This approach is a preliminar

INRIA a CCSD electronic archive server

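ANCOR, premier corpus de français parlé d'envergure annoté en coréférence et distribué librement [ANCOR, the first large-scale, freely distributed corpus of spoken French annotated for coreference]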

Author: Antoine Jean-Yves
Eshkol Iris
Lefeuvre Anaïs
Maurel Denis
Muzerelle Judith
Schang Emmanuel
Villaneau Jeanne
Publication venue: HAL CCSD
Publication date: 17/06/2011

National audience. This article presents the creation of ANCOR, which by its size (453,000 words) is the first French-language corpus annotated for anaphora and coreference that allows the development of data-driven approaches to anaphora resolution and other coreference processing. The annotation was carried out on three corpora of conversational speech (Accueil_UBS, OTG and ESLO), which makes it particularly suited to spoken language processing. In the absence of an equivalent for written language, it is nonetheless likely to be of interest to the whole NLP community. Moreover, the annotation scheme adopted is rich enough to support studies in corpus linguistics. The corpus will be freely distributed in mid-2013 under a Creative Commons BY-NC-SA license. This article focuses on its implementation and briefly describes some results obtained on the part of the resource that has already been annotated.

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL Université de Tours

HAL-Rennes 1