84 research outputs found
Un hommage à Thierry Chanier (A Tribute to Thierry Chanier)
We dedicate this issue on complex corpora to our colleague Thierry Chanier, who played a major role in the Corpus Écrits consortium (2011-2015), notably by carrying out the CoMeRe project, one of the consortium's greatest successes, and who has been retired from academic activity since March 2017. Let us first briefly retrace the career of Thierry Chanier, who was one of the first in France to develop links between Artificial Intelligence (AI) and Traitement Automa..
French Wikipedia Talk Pages: Profiling and Conflict Detection
Wikipedia is a popular and extremely useful resource for studies in both linguistics and natural language processing (Yano and Kang, 2008; Ferschke et al., 2013). This paper introduces a new language resource based on the online discussion pages of the French Wikipedia, the WikiTalk corpus. The publicly available corpus includes 160M words and 3M posts structured into 1M thematic sections, and has been syntactically parsed with the Talismane toolkit (Urieli, 2013). We present the first results of experiments that classify and profile talk pages and threads in order to determine criteria for selecting discussions that contain conflicts.
Managing conflicts between users in Wikipedia
Wikipedia is now a widely used encyclopedia and one of the most visible
sites on the Internet. Its strong principle of collaborative work and free
editing sometimes generates disputes due to disagreements between users. In
this article we study how the Wikipedian community resolves conflicts and
which roles Wikipedians choose in this process. We observed users'
behavior both in the article talk pages and in the Arbitration Committee pages
specifically dedicated to serious disputes. We first set up a user typology
according to users' involvement in conflicts and their publishing and
management activity in the encyclopedia. We then used those user types to
describe users' behavior when contributing to articles that the Wikipedian
community has tagged either as being in conflict with the official guidelines
of Wikipedia or, conversely, as featured articles.
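Controverses du changement climatique : la représentation des paroles d'autrui dans les pages de discussion sur Wikipédia francophone et norvégien (Climate change controversies: the representation of others' speech in the talk pages of French- and Norwegian-language Wikipedia)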
This article explores the collaborative encyclopedia Wikipedia through the lens of two aspects that have particularly interested Kjersti Fløttum throughout her work:
(i) the climate crisis, which is examined through the analysis of a sampled corpus of Wikipedia articles related to climate change together with their associated discussion pages. The corpus contains both the French and the Norwegian versions of the selected pages; in this sense, it is a comparable corpus that allows us to shed light on differences both in the treatment and textualization of the contents and in the nature of the discussions carried out;
(ii) the plurality of voices, more specifically from the perspective developed by J. Authier-Revuz (2020) (la représentation du discours autre - RDA, the representation of other discourse), which allows us to analyze the representation of the speech of others, and linguistic polyphony (Ducrot 1984, Nølke 2017), with a focus on polemic negation in contexts of RDA, which allows us to analyze the expression of controversy.
The CoMeRe corpus for French: structuring and annotating heterogeneous CMC genres
Final version for the special issue of the Journal of Language Technology and Computational Linguistics (JLCL, http://jlcl.org/): Building and Annotating Corpora of Computer-Mediated Discourse: Issues and Challenges at the Interface of Corpus and Computational Linguistics (ed. by Michael Beißwenger, Nelleke Oostdijk, Angelika Storrer & Henk van den Heuvel).
The CoMeRe project aims to build a kernel corpus of different Computer-Mediated Communication (CMC) genres with interactions in French as the main language, by assembling interactions stemming from networks such as the Internet or telecommunication, as well as monomodal and multimodal, synchronous and asynchronous communications. Corpora are assembled using a standard, the TEI (Text Encoding Initiative) format. This implies extending, through a European endeavor, the TEI model of text in order to encompass the richest and most complex CMC genres. This paper presents the Interaction Space model. We explain how this model has been encoded within the TEI corpus header and body. The model is then instantiated through the first four corpora we have processed: three corpora where interactions occurred in single-modality environments (text chat or SMS systems) and a fourth corpus where text chat, email and forum modalities were used simultaneously. The CoMeRe project has two main research perspectives: Discourse Analysis, only alluded to in this paper, and the linguistic study of idiolects occurring in different CMC genres. As NLP algorithms are an indispensable prerequisite for such research, we present our motivations for applying an automatic annotation process to the CoMeRe corpora. Our wish to guarantee generic annotations meant we did not consider any processing beyond morphosyntactic labelling, but prioritized the automatic annotation of any freely variant elements within the corpora. We then turn to decisions made concerning which annotations to make for which units and describe the processing pipeline for adding these. All CoMeRe corpora are verified through a staged quality control process, designed to allow corpora to move from one project phase to the next. Public release of the CoMeRe corpora is a short-term goal: corpora will be integrated into the forthcoming French National Reference Corpus and disseminated through the national linguistic infrastructure ORTOLANG. We therefore highlight issues and decisions made concerning the OpenData perspective.
Exploration du récit de vie d'adolescents (Exploring adolescents' life narratives)
The present paper follows on from the research we presented at a previous JADT (Boulard, Poudat, Gauthier
2012). We still focus on the development of narrative competence (Habermas and Bluck 2000) in children and
adolescents. Although children develop narrative skills, life narratives only emerge in adolescence. On the basis
of a corpus of spontaneous oral speech, we had demonstrated empirically that pre-adolescents aged 12
had developed stabilized narrative skills enabling them to produce life narratives. Here, we propose a first
exploration of the overall structure of adolescent life stories, based on a corpus of 268 oral self-narratives
supplemented with a questionnaire.
Du corpus au genre : l'exemple de linguistique (From corpus to genre: the example of linguistics)