
48 research outputs found

ANNODIS : une approche outillée de l'annotation de structures discursives

Authors: Asher Nicholas; Benamara Farah; Bras Myriam; Enjalbert Patrice; Fabre Cécile; Ferrari Stéphane; Ho-Dac Lydia-Mai; Le Draoulec Anne; Mathet Yann; Muller Philippe; Prévot Laurent; Péry-Woodley Marie-Paule; Rebeyrolle Josette; Tanguy Ludovic; Vergez-Couret Marianne; Vieu Laure; Widlöcher Antoine
Publication venue: HAL CCSD
Publication date: 24/06/2009

International audience. The ANNODIS project has two interconnected objectives: to produce a corpus of texts annotated at the discourse level, and to develop tools for corpus annotation and exploitation. Two sets of annotations are proposed, representing two complementary perspectives on discourse organisation: a bottom-up approach that starts from minimal discourse units and builds complex structures via a set of discourse relations, and a top-down approach that considers the text as a whole and uses pre-identified cues to detect discourse macro-structures. The construction of the corpus goes hand in hand with the development of two interfaces: the first supports manual annotation of discourse relations and structures and offers different views of the texts based on NLP pre-processing; the second will support the exploitation of the annotations. We present the discourse models and annotation protocols, and the interface that embodies them.

HAL - Normandie Université

Scientific Publications of the University of Toulouse II Le Mirail

HAL AMU

HAL Descartes

A contribution to Computational Linguistics and Natural Language Processing: From the Semantics of Space and Time to Annotations and Agreement Measures

Author: Mathet Yann
Publication venue: HAL CCSD
Publication date: 05/12/2017

HAL - Normandie Université

Thèses en Ligne

Une approche cognitive de l'itération et sa modélisation

Author: Mathet Yann
Publication venue: HAL CCSD
Publication date: 01/01/2007

@inproceedings{AC-MATHET-2007,
  author = {Mathet, Y.},
  title = {Une approche cognitive de l'itération et sa modélisation},
  booktitle = {Atelier Représentation et Raisonnement sur le Temps et l'Espace associé à la plateforme AFIA'07},
  address = {Grenoble, France},
  year = {2007},
  month = {juillet}
}

International audience

HAL - Normandie Université

Sémantique de l'Espace et du déplacement

Author: Mathet Yann
Publication venue: Hermès Sciences, Lavoisier
Publication date: 01/01/2005

@incollection{OL-MATHET-2005,
  author = {Yann Mathet},
  title = {{Sémantique de l'Espace et du déplacement}},
  chapter = {6},
  pages = {215-265},
  booktitle = {Sémantique et traitement automatique des langues},
  editor = {Patrice Enjalbert},
  series = {Traité IC2, série Cognition et traitement de l'information},
  publisher = {Hermès Sciences, Lavoisier},
  year = {2005}
}

HAL - Normandie Université

The Agreement Measure Gamma-Cat : a Complement to Gamma Focused on Categorization of a Continuum

Author: Mathet Yann
Publication venue: MIT Press - Journals
Publication date: 01/06/2017

International audience. Agreement on unitizing, where several annotators freely place units of various sizes and categories on a continuum, is difficult to assess because of simultaneous discrepancies in positioning and categorizing. The recent agreement measure γ offers an overall solution that takes both positions and categories into account. In this article, I propose the additional coefficient γcat, which complements γ by assessing agreement on the categorization of a continuum while setting positional discrepancies aside. When applied to pure categorization (with predefined units), γcat behaves the same way as Krippendorff's α, the well-known measure dedicated to that task, even with missing values, which supports its consistency. A variation of γcat is also proposed that provides an in-depth assessment of categorizing for each individual category. The entire family of γ coefficients is implemented in free software.

HAL - Normandie Université

Crossref

Directory of Open Access Journals

The Agreement Measure γ cat

Author: Yann Mathet
Publication venue: MIT Press - Journals

Crossref

Évaluation des annotations : ses principes et ses pièges

Authors: Mathet Yann; Widlöcher Antoine
Publication venue: Associacio catalana de Salut Laboral
Publication date: 01/12/2016

National audience. A great deal of data is produced by NLP (automatic systems) and for NLP (reference corpora, for computational linguistics or for machine learning), and it should be publicly released only once its consistency has been established. The growing effort made in this direction over the past two decades is encouraging, for example the increasing use of inter-annotator agreement measures such as kappa, but it is not always accompanied by sufficient knowledge of the principles underlying evaluation or by the rigor required to apply them. The aim of this paper is to present and question the basic concepts and principles of the domain (e.g., should agreement measures be "corrected for chance", and if so, how?), and to illustrate with concrete, quantified examples the consequences of an approximate practice of evaluation.

HAL - Normandie Université

Annotation, évaluation et mesure d’accord en linguistique de corpus

Authors: Mathet Yann; Widlöcher Antoine
Publication venue: CAIRN
Publication date: 01/01/2019

National audience

HAL - Normandie Université

GlozzQL : un langage de requêtes incrémental pour les textes annotés

Authors: Mathet Yann; Widlöcher Antoine
Publication venue: HAL CCSD
Publication date: 29/06/2011

National audience

HAL - Normandie Université

La plate-forme Glozz: environnement d'annotation et d'exploration de corpus

Authors: Mathet Yann; Widlöcher Antoine
Publication venue: HAL CCSD
Publication date: 24/06/2009

National audience. The need for systematic interaction between models, processing and corpora requires reference annotations against which models and processing can be tested. Producing such annotations requires both a formal framework able to represent varied linguistic objects, and applications that let the annotator locate occurrences of the observed phenomena in a corpus and characterise them. Although several annotation tools already exist, they often remain closely tied to a particular theoretical model and to particular linguistic objects, and offer only marginal support for exploring structures that have more recently come under experimental study, notably at high levels of granularity and in discourse analysis. The Glozz platform addresses these constraints and provides a highly configurable corpus exploration and annotation environment that is not limited a priori to the discourse context in which it originated.

HAL - Normandie Université