84 research outputs found

    A tribute to Thierry Chanier

    We dedicate this issue on complex corpora to our colleague Thierry Chanier, who played a major role in the Corpus Écrits consortium (2011-2015), notably by carrying out the CoMeRe project, one of the consortium's greatest successes, and who ended his academic activities in March 2017. Let us first briefly revisit the career of Thierry Chanier, who was one of the first in France to develop links between Artificial Intelligence (AI) and automatic processing…

    French Wikipedia Talk Pages: Profiling and Conflict Detection

    International audience
    Wikipedia is a popular and extremely useful resource for studies in both linguistics and natural language processing (Yano and Kang, 2008; Ferschke et al., 2013). This paper introduces a new language resource based on the French Wikipedia online discussion pages: the WikiTalk corpus. The publicly available corpus contains 160M words and 3M posts, structured into 1M thematic sections, and has been syntactically parsed with the Talismane toolkit (Urieli, 2013). We present the first results of experiments that classify and profile talk pages and threads in order to determine criteria for selecting discussions that contain conflicts.

    Managing conflicts between users in Wikipedia

    Wikipedia is nowadays a widely used encyclopedia and one of the most visible sites on the Internet. Its strong principle of collaborative work and free editing sometimes generates disputes arising from disagreements between users. In this article, we study how the Wikipedian community resolves conflicts and which roles Wikipedians take on in this process. We observed user behavior both on article talk pages and on the Arbitration Committee pages, which are specifically dedicated to serious disputes. We first established a typology of users according to their involvement in conflicts and their publishing and management activity in the encyclopedia. We then used these user types to describe how users contribute to articles that the Wikipedian community has tagged as conflicting with Wikipedia's official guidelines or, conversely, as featured articles.
    Comment: 12 pages

    Climate change controversies: the representation of others' speech in the discussion pages of French- and Norwegian-language Wikipedia

    This article explores the collaborative encyclopedia Wikipedia through two aspects that have particularly interested Kjersti Fløttum throughout her work: (i) the climate crisis, examined through a corpus of Wikipedia articles related to climate change together with their associated discussion pages. The corpus contains both the French and the Norwegian versions of these pages; it is thus a comparable corpus that allows us to bring out differences both in how the content is treated and put into text, and in the nature of the discussions held; (ii) the plurality of voices, more specifically from the perspective of the representation of others' discourse (la représentation du discours autre, RDA) developed by J. Authier-Revuz (2020), which allows us to analyze how the speech of others is represented, and of linguistic polyphony (Ducrot 1984, Nølke 2017), with a focus on polemical negation in RDA contexts, which allows us to analyze the expression of controversy.
    Published version

    The CoMeRe corpus for French: structuring and annotating heterogeneous CMC genres

    Final version for the Special Issue of JLCL (Journal of Language Technology and Computational Linguistics, http://jlcl.org/): Building and Annotating Corpora of Computer-Mediated Discourse: Issues and Challenges at the Interface of Corpus and Computational Linguistics (ed. by Michael Beißwenger, Nelleke Oostdijk, Angelika Storrer & Henk van den Heuvel).
    International audience
    The CoMeRe project aims to build a kernel corpus of different Computer-Mediated Communication (CMC) genres with French as the main language of interaction, assembling interactions from networks such as the Internet or telecommunication networks and covering mono- and multimodal, synchronous and asynchronous communication. The corpora are assembled in a standard format, the TEI (Text Encoding Initiative). This requires extending the TEI text model, through a European effort, so that it encompasses the richest and most complex CMC genres. This paper presents the Interaction Space model and explains how it has been encoded in the TEI corpus header and body. The model is then instantiated through the first four corpora we have processed: three corpora in which interactions occurred in single-modality environments (text chat or SMS systems) and a fourth in which text chat, email and forum modalities were used simultaneously. The CoMeRe project has two main research perspectives: discourse analysis, only alluded to in this paper, and the linguistic study of idiolects occurring in different CMC genres. Since NLP algorithms are an indispensable prerequisite for such research, we present our motivations for applying an automatic annotation process to the CoMeRe corpora. Our wish to guarantee generic annotations meant that we did not consider any processing beyond morphosyntactic labelling, but prioritized the automatic annotation of any freely variant elements within the corpora.
    We then turn to the decisions made about which annotations to apply to which units, and describe the processing pipeline that adds them. All CoMeRe corpora are verified through a staged quality-control process designed to let corpora move from one project phase to the next. Public release of the CoMeRe corpora is a short-term goal: the corpora will be integrated into the forthcoming French National Reference Corpus and disseminated through the national linguistic infrastructure ORTOLANG. We therefore highlight the issues raised and decisions made with respect to the OpenData perspective.

    Exploring adolescents' life narratives

    Peer reviewed
    This paper follows on from the research we presented at the previous JADT (Boulard, Poudat, Gauthier 2012). We still focus on the development of narrative competence (Habermas and Bluck 2000) in children and adolescents. Although children develop narrative skills, life narratives only emerge in adolescence. On the basis of a corpus of spontaneous oral speech, we had empirically demonstrated that 12-year-old pre-adolescents had already developed stable narrative skills enabling them to produce life narratives. Here, we propose a first exploration of the overall structure of adolescent life stories, based on a corpus of 268 oral self-narratives supplemented by a questionnaire.

    From corpus to genre: the example of linguistics

    International audience