84 research outputs found

Un hommage à Thierry Chanier

Author: Poudat Céline
Publication venue: 'OpenEdition'
Publication date: 28/01/2020

We dedicate this issue on complex corpora to our colleague Thierry Chanier, who played a major role in the Corpus Écrits consortium (2011-2015), notably by carrying out the CoMeRe project, one of the consortium's greatest successes, and who ended his academic activities in March 2017. Let us first briefly look back at the career of Thierry Chanier, who was one of the first in France to develop links between Artificial Intelligence (AI) and Traitement Automa..

OpenEdition

French Wikipedia Talk Pages: Profiling and Conflict Detection

Authors: Ho-Dac Lydia-Mai; Laippala Veronika; Poudat Céline; Tanguy Ludovic
Publication venue: HAL CCSD
Publication date: 27/09/2016

Wikipedia is a popular and extremely useful resource for studies in both linguistics and natural language processing (Yano and Kang, 2008; Ferschke et al., 2013). This paper introduces a new language resource based on the French Wikipedia online discussion pages, the WikiTalk corpus. The publicly available corpus includes 160M words and 3M posts structured into 1M thematic sections and has been syntactically parsed with the Talismane toolkit (Urieli, 2013). In this paper, we present the first results of experiments aiming at classifying and profiling the talk pages and threads in order to determine criteria for selecting discussions with conflicts.

Scientific Publications of the University of Toulouse II Le Mirail

HAL-UNICE

HAL Descartes

Managing conflicts between users in Wikipedia

Authors: Auray Nicolas; Hurault-Plantet Martine; Jacquemin Bernard; Lauf Aurélien; Poudat Céline
Publication date: 01/05/2008

Wikipedia is nowadays a widely used encyclopedia, and one of the most visible sites on the Internet. Its strong principle of collaborative work and free editing sometimes generates disputes due to disagreements between users. In this article we study how the Wikipedian community resolves conflicts and which roles Wikipedians choose in this process. We observed users' behavior both in the article talk pages and in the Arbitration Committee pages specifically dedicated to serious disputes. We first set up a user typology according to users' involvement in conflicts and their publishing and management activity in the encyclopedia. We then used those user types to describe users' behavior in contributing to articles that are tagged by the Wikipedian community as being in conflict with the official guidelines of Wikipedia, or conversely as being well featured. Comment: 12 p

arXiv.org e-Print Archive

HAL-UNICE

Controverses du changement climatique : la représentation des paroles d’autrui dans les pages de discussion sur Wikipédia francophone et norvégien

Authors: Gjerstad Øyvind; Gjesdal Anje Müller; Poudat Céline
Publication venue: Universitetet i Bergen
Publication date: 01/01/2023

This article explores the collaborative encyclopedia Wikipedia through the lens of two aspects that have particularly interested Kjersti Fløttum throughout her work: (i) the climate crisis, which will be examined through the analysis of a corpus of Wikipedia pages related to climate change as well as the relevant discussion pages. The corpus contains both French and Norwegian versions of the relevant pages; in this sense, it is a comparable corpus that allows us to shed light on differences both in the treatment and textualization of the contents, and in the nature of the discussions carried out; (ii) the plurality of voices, more specifically from the perspective developed by J. Authier-Revuz (2020) (la représentation du discours autre - RDA), which allows us to analyze the representation of the speech of others, and linguistic polyphony (Ducrot 1984, Nølke 2017), with a focus on polemic negation in contexts of RDA, which allows us to analyze the expression of controversy.

HIØ Brage

The CoMeRe corpus for French: structuring and annotating heterogeneous CMC genres

Authors: Antoniadis Georges; Chanier Thierry; Hriba Linda; Longhi Julien; Poudat Céline; Sagot Benoît; Seddah Djamé; Wigham Ciara R.
Publication venue: GSCL (Gesellschaft für Sprachtechnologie und Computerlinguistik)
Publication date: 01/01/2014

Final version for the Special Issue of JLCL (Journal of Language Technology and Computational Linguistics, http://jlcl.org/): BUILDING AND ANNOTATING CORPORA OF COMPUTER-MEDIATED DISCOURSE: Issues and Challenges at the Interface of Corpus and Computational Linguistics (ed. by Michael Beißwenger, Nelleke Oostdijk, Angelika Storrer & Henk van den Heuvel). The CoMeRe project aims to build a kernel corpus of different Computer-Mediated Communication (CMC) genres with interactions in French as the main language, by assembling interactions stemming from networks such as the Internet or telecommunication, as well as mono and multimodal, synchronous and asynchronous communications. Corpora are assembled in a standard form, using the TEI (Text Encoding Initiative) format. This implies extending, through a European endeavor, the TEI model of text, in order to encompass the richest and most complex CMC genres. This paper presents the Interaction Space model. We explain how this model has been encoded within the TEI corpus header and body. The model is then instantiated through the first four corpora we have processed: three corpora where interactions occurred in single-modality environments (text chat, or SMS systems) and a fourth corpus where text chat, email and forum modalities were used simultaneously. The CoMeRe project has two main research perspectives: Discourse Analysis, only alluded to in this paper, and the linguistic study of idiolects occurring in different CMC genres. As NLP algorithms are an indispensable prerequisite for such research, we present our motivations for applying an automatic annotation process to the CoMeRe corpora. Our wish to guarantee generic annotations meant we did not consider any processing beyond morphosyntactic labelling, but prioritized the automatic annotation of any freely variant elements within the corpora. We then turn to decisions made concerning which annotations to make for which units and describe the processing pipeline for adding these. All CoMeRe corpora are verified through a staged quality control process, designed to allow corpora to move from one project phase to the next. Public release of the CoMeRe corpora is a short-term goal: corpora will be integrated into the forthcoming French National Reference Corpus, and disseminated through the national linguistic infrastructure ORTOLANG. We therefore highlight issues and decisions made concerning the OpenData perspective.

HAL-ENS-LYON

Hal - Université Grenoble Alpes

HAL Clermont Université

INRIA a CCSD electronic archive server

Exploration du récit de vie d’adolescents

Authors: Boulard Aurore; Poudat Céline
Publication date: 05/06/2014

Peer reviewed. The present paper follows on from the research we presented at a previous JADT (Boulard, Poudat, Gauthier 2012). We still focus on the development of narrative competence (Habermas and Bluck 2000) in children and adolescents. Although children develop narrative skills, life narratives only emerge in adolescence. On the basis of a corpus of spontaneous oral speech, we had empirically demonstrated that pre-adolescents aged 12 had developed stabilized narrative skills enabling them to produce life narratives. Here, we propose a first exploration of the overall structure of adolescent life stories, based on a corpus of 268 oral self-narratives complemented by a questionnaire.

Open Repository and Bibliography - Liège