69 research outputs found

    A French Fairy Tale Corpus syntactically and semantically annotated.

    International audience. Fairy tales, folktales and, more generally, children's stories have lately attracted the attention of the Natural Language Processing (NLP) community. Yet very few corpora exist and linguistic resources are lacking. The work presented in this paper aims to fill this gap by presenting a syntactically and semantically annotated corpus. It focuses on the linguistic analysis of a Fairy Tales Corpus and describes the syntactic and semantic resources developed for Information Extraction. These resources include: syntactic dependency relation annotation for 120 verbs; referential annotation, which annotates each anaphoric occurrence and proper name with the most specific noun in the text; ontology matching for a substantial part of the nouns in the corpus; and semantic role labelling for 41 verbs using the FrameNet database. The article also summarizes previous analyses of this corpus and indicates possible uses of it for the NLP community.

    Evaluation of emotion, opinion or sentiment detection: dictatorship of the majority or respect for diversity of opinion?

    National audience. Emotion detection, opinion mining and sentiment analysis are generally evaluated by comparing the outputs of the system under study with those contained in a reference corpus. The questions raised in this article concern both the definition of the reference and the reliability of the metrics most frequently used for this comparison. The experiments carried out to evaluate the EmoLogus emotion detection system serve as a basis for reflection on these two problems. The analysis of the EmoLogus results and the comparison between the different metrics call into question the choice of the majority vote as the reference. They also show the need for more advanced statistical tools than those generally used, in order to obtain reliable evaluations of systems that work on intrinsically subjective and uncertain data.
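    To make the majority-vote issue concrete, here is a minimal sketch (not the EmoLogus evaluation code) that scores a system's labels in two ways: against a majority-vote reference, where items with no majority are dropped, and as the mean accuracy against each individual annotator. The function names, the label strings and the tie-handling rule are illustrative assumptions.

    ```python
    from collections import Counter
    from statistics import mean
    from typing import List, Optional, Sequence


    def majority_label(labels: Sequence[str]) -> Optional[str]:
        """Return the majority label, or None when the top labels are tied."""
        ranked = Counter(labels).most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            return None  # no majority: the item disappears from the reference
        return ranked[0][0]


    def accuracy_vs_majority(system: Sequence[str],
                             annotations: Sequence[Sequence[str]]) -> float:
        """Accuracy against the majority-vote reference, ignoring tied items."""
        pairs = [(s, majority_label(a)) for s, a in zip(system, annotations)]
        scored = [(s, m) for s, m in pairs if m is not None]
        return sum(s == m for s, m in scored) / len(scored)


    def mean_accuracy_vs_annotators(system: Sequence[str],
                                    annotations: Sequence[Sequence[str]]) -> float:
        """Average of the accuracies computed against each annotator separately."""
        n_coders = len(annotations[0])
        per_coder: List[float] = [
            sum(s == a[k] for s, a in zip(system, annotations)) / len(system)
            for k in range(n_coders)
        ]
        return mean(per_coder)
    ```

    With three annotators labelling `[["pos", "pos", "neg"], ["neg", "neg", "neg"], ["pos", "neg", "neu"]]` and a system answering `["neg", "neg", "neu"]`, the majority-vote score is 0.5 (the third item has no majority and is discarded), while the per-annotator mean is about 0.56, because the system fully agrees with the third annotator's minority view.
    
    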

    Affective Interaction with a Companion Robot for Hospitalized Children: a Linguistically based Model for Emotion Detection

    6 pages. International audience. This paper presents a system that characterizes emotions in speech by considering only linguistic content. It is based on the assumption that emotions can be compound: simple lexical words have an intrinsic emotional value, while verbal and adjectival predicates act as functions on the emotional values of their arguments. The paper describes the algorithm for the compositional computation of emotion, as well as the emotional lexicons used by this algorithm. A quantitative and qualitative analysis of the differences between system outputs and expert annotations shows satisfactory results, with good detection of emotional valence in 82.8% of the test utterances.
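    The compositional principle described above can be illustrated with a toy evaluator. In this sketch, content words take an intrinsic valence from a lexicon, and predicates are functions applied to the valences of their arguments. The lexicon entries, the -2..+2 scale and the three predicate functions are hypothetical stand-ins, not the authors' emotional norm or predicate classes.

    ```python
    from typing import Callable, Dict, Tuple, Union

    # An expression is either a word or a tuple (predicate, arg1, arg2, ...).
    Expr = Union[str, Tuple]

    # Hypothetical lexical norm: intrinsic valence of content words on a -2..+2 scale.
    LEXICON: Dict[str, int] = {"toy": 1, "gift": 2, "injection": -1, "pain": -2, "mum": 1}


    def clamp(x: int) -> int:
        """Keep a computed valence within the assumed -2..+2 scale."""
        return max(-2, min(2, x))


    # Hypothetical predicate functions over the valences of (subject, object).
    PREDICATES: Dict[str, Callable[..., int]] = {
        "have": lambda subj, obj: obj,          # having something inherits its valence
        "lose": lambda subj, obj: clamp(-obj),  # losing something good is negative
        "stop": lambda subj, obj: clamp(-obj),  # stopping something bad is positive
    }


    def valence(expr: Expr) -> int:
        """Recursively compute the valence of a word or predicate-argument structure."""
        if isinstance(expr, str):
            return LEXICON.get(expr, 0)  # unknown words are treated as neutral
        head, *args = expr
        return PREDICATES[head](*(valence(a) for a in args))


    def polarity(expr: Expr) -> str:
        v = valence(expr)
        return "positive" if v > 0 else "negative" if v < 0 else "neutral"
    ```

    Because predicates take already-computed valences as input, nesting works naturally: `valence(("stop", "i", ("have", "i", "pain")))` first scores "have pain" as -2, then "stop" turns it into +2.
    
    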

    Word Order Phenomena in Spoken French: a Study on Four Corpora of Task-Oriented Dialogue and its Consequences on Language Processing

    International audience. This paper presents a corpus study that investigates word order variations (WOV) in spontaneous spoken French and their consequences for the parsing techniques used in Natural Language Processing. We studied four task-oriented spoken dialogue corpora covering different application tasks (air transport or tourism information, switchboard calls). Two corpora consist of phone conversations, while the other two correspond to direct interaction. Every word order variation was manually annotated by three experts following a cross-validation procedure. Our results show that, although conversational spoken French may be highly affected by WOVs, it should still be considered a rigid order language: WOVs show striking structural regularity and very rarely result in discontinuous syntactic structures. As a result, projective parsers remain well adapted to conversational spoken French.
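    In dependency terms, a discontinuous syntactic structure corresponds to a non-projective tree, one in which two arcs cross when drawn above the sentence. The following is a standard projectivity check, offered as a sketch rather than the authors' annotation tooling. It assumes the CoNLL-style convention that `heads[i-1]` is the head of token `i` and that 0 denotes an artificial root placed before the first word.

    ```python
    from typing import List, Sequence, Tuple


    def arcs(heads: Sequence[int]) -> List[Tuple[int, int]]:
        """Return each dependency arc as (left position, right position)."""
        return [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]


    def is_projective(heads: Sequence[int]) -> bool:
        """A tree rooted at position 0 is projective iff no two arcs cross."""
        spans = arcs(heads)
        for i, (l1, r1) in enumerate(spans):
            for l2, r2 in spans[i + 1:]:
                # Strict inequalities: arcs sharing an endpoint do not cross.
                if l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1:
                    return False
        return True
    ```

    For "John saw Mary" (`heads = [2, 0, 2]`) the check returns `True`. For the classic English example "A hearing is scheduled on the issue today" (`heads = [2, 3, 0, 3, 2, 7, 5, 4]`), the arc from "hearing" to "on" crosses the arc from "scheduled" to "today", so the tree is non-projective.
    
    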

    Weighted Krippendorff's alpha is a more reliable metrics for multi-coders ordinal annotations: experimental studies on emotion, opinion and coreference annotation.

    http://www.aclweb.org/anthology/E14-1058. International audience. The question of data reliability is of prime importance when assessing the quality of manually annotated corpora. Although Cohen's κ is the prevailing reliability measure in NLP, alternative statistics have been proposed. This paper presents an experimental study with four measures (Cohen's κ, Scott's π, and binary and weighted Krippendorff's α) on three tasks: emotion, opinion and coreference annotation. The reported studies investigate the factors (annotator bias, category prevalence, number of coders, number of categories) that may influence reliability estimation. Results show that using a weighted measure restricts this influence on ordinal annotations, and suggest that weighted α is the most reliable metric for such an annotation scheme.
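    The difference between binary and weighted α comes down to the distance function δ in α = 1 − D_o / D_e. Here, D_o is the observed disagreement within units, and D_e is the disagreement expected across all pairable values. The sketch below is a generic multi-coder implementation that tolerates missing values and accepts a pluggable δ. It uses the squared-difference (interval) metric as the "weighted" variant. This is an assumption: Krippendorff also defines a dedicated ordinal metric based on marginal frequencies, and the exact weighting used in the paper is not given here.

    ```python
    from itertools import permutations
    from typing import Callable, Optional, Sequence

    Distance = Callable[[float, float], float]


    def nominal(a: float, b: float) -> float:
        """Binary metric: any disagreement costs 1."""
        return 0.0 if a == b else 1.0


    def interval(a: float, b: float) -> float:
        """Squared-difference metric: distant ordinal categories cost more."""
        return (a - b) ** 2


    def krippendorff_alpha(units: Sequence[Sequence[Optional[float]]],
                           delta: Distance = interval) -> float:
        """units[u] holds one value per coder for item u (None = not coded)."""
        # Drop missing values and keep only units coded by at least two coders.
        pairable = [[v for v in unit if v is not None] for unit in units]
        pairable = [u for u in pairable if len(u) >= 2]
        values = [v for u in pairable for v in u]
        n = len(values)
        if n < 2:
            raise ValueError("need at least one unit with two or more values")
        # Observed disagreement: ordered within-unit pairs, weighted by 1/(m_u - 1).
        observed = sum(
            sum(delta(a, b) for a, b in permutations(u, 2)) / (len(u) - 1)
            for u in pairable)
        # Expected disagreement: ordered pairs over all pairable values.
        expected = sum(delta(a, b) for a, b in permutations(values, 2))
        if expected == 0:
            return 1.0  # all values identical: nothing to disagree on
        return 1.0 - (n - 1) * observed / expected
    ```

    On two toy datasets that differ only in whether one disagreement is between adjacent categories (1 vs 2) or distant ones (1 vs 3), the binary metric gives the same α (about 0.67) to both. The squared-difference metric gives about 0.82 and 0.42 respectively. This is the kind of sensitivity to ordinal structure that a weighted measure provides.
    
    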

    Semantic pattern extraction applied to Named Entity classification

    International audience. Corpus variation is a major problem for named entity recognition systems. One possible way to tackle this problem is to use linguistic approaches to adapt these systems to unseen contexts: building semantic patterns may help disambiguate named entities by structuring their syntactic and semantic environment. This article presents a preliminary implementation of a correction system on a press corpus. After a segmentation step based on surface discourse clues, the system extracts and weights the patterns linked to a named entity class provided by an analyzer. Despite still relatively elementary models, the results obtained are encouraging and point to the need for a more thorough treatment of the Organisation class. Keywords: named entities, semantic patterns, surface discourse segmentation. Authors: ISMAÏL EL MAAROUF, JEANNE VILLANEAU, SOPHIE ROSSE
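    As a rough illustration of "extracting and weighting patterns linked to a named entity class", the sketch below uses a deliberately simplified pattern. Each pattern is just the word immediately before and after the entity, and its weight is the relative frequency of each class for that pattern. The weights can then overturn an analyzer's label when one class clearly dominates. The pattern shape, the 0.5 threshold and all function names are hypothetical, and the discourse-segmentation step is omitted entirely. This is not the authors' system.

    ```python
    from collections import Counter, defaultdict
    from typing import Dict, Iterable, List, Tuple

    Pattern = Tuple[str, str]


    def context_pattern(tokens: List[str], start: int, end: int) -> Pattern:
        """Left and right neighbour words of the entity tokens[start:end]."""
        left = tokens[start - 1].lower() if start > 0 else "<s>"
        right = tokens[end].lower() if end < len(tokens) else "</s>"
        return (left, right)


    def learn_pattern_weights(
            examples: Iterable[Tuple[List[str], int, int, str]]
    ) -> Dict[Pattern, Dict[str, float]]:
        """Weight each pattern by the relative frequency of each NE class."""
        counts: Dict[Pattern, Counter] = defaultdict(Counter)
        for tokens, start, end, ne_class in examples:
            counts[context_pattern(tokens, start, end)][ne_class] += 1
        return {p: {c: k / sum(by_class.values()) for c, k in by_class.items()}
                for p, by_class in counts.items()}


    def correct_class(tokens: List[str], start: int, end: int, predicted: str,
                      weights: Dict[Pattern, Dict[str, float]],
                      threshold: float = 0.5) -> str:
        """Replace the analyzer's class if a pattern strongly favours another one."""
        dist = weights.get(context_pattern(tokens, start, end))
        if not dist:
            return predicted  # unseen pattern: keep the analyzer's decision
        best, score = max(dist.items(), key=lambda kv: kv[1])
        return best if score > threshold else predicted
    ```

    For example, if "the board of X met" was seen twice with an Organisation and once with a Location, an analyzer labelling "Danone" as a Location in that context would be corrected to the Organisation class.
    
    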

    Out-of-context detection of emotions from the linguistic content of spoken utterances: the EmoLogus system

    The ANR Emotirob project aims at detecting emotions in an original application context: building an emotional companion robot for vulnerable children. This paper presents a system that characterizes emotions by considering only linguistic content. It is based on the assumption that emotions can be compound: simple lexical words have an intrinsic emotional value, while verbal and adjectival predicates act as functions on the emotional values of their arguments. The paper describes the algorithm for the compositional computation of emotion and the emotional lexical norm used by this algorithm. A quantitative and qualitative analysis of the differences between system outputs and expert annotations shows satisfactory results, with good detection of emotional valence in 90.0% of the test utterances.

    Emotion detection from the linguistic content of spoken utterances: application to a companion robot for vulnerable children

    International audience. The ANR Emotirob project aims at detecting emotions from an original point of view: building an emotional companion robot for vulnerable children. In our approach, linguistic detection and prosody are combined. Our experiments show that human beings can reliably estimate the emotional value of an utterance from its propositional content. We have therefore implemented a first model of linguistic detection, based on the principle that emotions can be compound: lexical words have an emotional value, while predicates can modify the emotional values of their arguments. This paper gives a short description of the logical understanding system whose outputs are used to compute the final emotional value. It then presents the creation of a lexical emotional reference standard, together with an ontology of emotional predicate classes for children aged 5 to 7.

    Logical Approach to Natural Language Understanding in a Spoken Dialogue System

    International audience. We present a logical approach to spoken language understanding for a human-machine dialogue system. The aim of the analysis is to produce a logical formula, or a conceptual graph, by assembling concepts related to a delimited application domain. This flexible structure is built gradually during an incremental parse that combines syntactic and semantic criteria. A contextual understanding step then completes this structure. Evaluations of the current system are encouraging. This approach is a preliminar
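    The idea of building a conceptual graph incrementally and then completing it from context can be sketched on a toy domain. This is only an illustration under strong assumptions. The concept lexicon and allowed relations are a hypothetical hotel-booking ontology. Only the semantic attachment criterion is modelled, with no syntax. The "context" is simply a dictionary of concepts mentioned in earlier dialogue turns.

    ```python
    from dataclasses import dataclass, field
    from typing import Dict, List, Tuple

    # Hypothetical domain ontology: word -> concept, and allowed (head, dependent) relations.
    CONCEPTS: Dict[str, str] = {"book": "Reservation", "hotel": "Hotel",
                                "cheap": "Price", "paris": "City", "lyon": "City"}
    RELATIONS: Dict[Tuple[str, str], str] = {("Reservation", "Hotel"): "object",
                                             ("Hotel", "City"): "location",
                                             ("Hotel", "Price"): "price"}


    @dataclass
    class Graph:
        nodes: List[Tuple[str, str]] = field(default_factory=list)      # (concept, word)
        edges: List[Tuple[int, str, int]] = field(default_factory=list)  # (head, relation, dependent)


    def parse_incrementally(tokens: List[str]) -> Graph:
        """Add one concept node per known word and attach it to compatible nodes."""
        g = Graph()
        for tok in tokens:
            concept = CONCEPTS.get(tok.lower())
            if concept is None:
                continue  # words outside the domain contribute no concept
            g.nodes.append((concept, tok))
            new = len(g.nodes) - 1
            for old, (old_concept, _) in enumerate(g.nodes[:-1]):
                if (old_concept, concept) in RELATIONS:
                    g.edges.append((old, RELATIONS[(old_concept, concept)], new))
                elif (concept, old_concept) in RELATIONS:
                    g.edges.append((new, RELATIONS[(concept, old_concept)], old))
        return g


    def complete_from_context(g: Graph, context: Dict[str, str]) -> Graph:
        """Fill a missing dependent concept from earlier dialogue turns."""
        present = {c for c, _ in g.nodes}
        for (head_c, dep_c), rel in RELATIONS.items():
            if head_c in present and dep_c not in present and dep_c in context:
                g.nodes.append((dep_c, context[dep_c]))
                head = next(i for i, (c, _) in enumerate(g.nodes) if c == head_c)
                g.edges.append((head, rel, len(g.nodes) - 1))
                present.add(dep_c)
        return g
    ```

    Parsing "I want to book a cheap hotel" yields a Reservation whose object is a Hotel with a Price. If an earlier turn mentioned Paris, the contextual step then attaches a City node to the Hotel as its location.
    
    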

    ANCOR, the first large-scale spoken French corpus annotated for coreference and freely distributed

    National audience. This article presents the construction of ANCOR, which, given its size (453,000 words), is the first French-language corpus annotated for anaphora and coreference that enables the development of data-driven approaches to anaphora resolution and other coreference processing. The annotation was carried out on three corpora of conversational speech (Accueil_UBS, OTG and ESLO), which makes it particularly suited to spoken language processing. However, in the absence of an equivalent resource for written language, it may be of interest to the whole NLP community. Moreover, the annotation scheme adopted is rich enough to support studies in corpus linguistics. The corpus will be freely distributed in mid-2013 under a Creative Commons BY-NC-SA license. This article focuses on its implementation and briefly describes some results obtained on the part of the resource that has already been annotated.