1,007 research outputs found

    Damage Detection and Mitigation in Open Collaboration Applications

    Collaborative functionality is changing the way information is amassed, refined, and disseminated in online environments. A subclass of these systems characterized by open collaboration uniquely allows participants to *modify* content with low barriers-to-entry. A prominent example and our case study, English Wikipedia, exemplifies the vulnerabilities: 7%+ of its edits are blatantly unconstructive. Our measurement studies show this damage manifests in novel socio-technical forms, limiting the effectiveness of computational detection strategies from related domains. In turn this has made much mitigation the responsibility of a poorly organized and ill-routed human workforce. We aim to improve all facets of this incident response workflow. Complementing language-based solutions, we first develop content-agnostic predictors of damage. We implicitly glean reputations for system entities and overcome sparse behavioral histories with a spatial reputation model combining evidence from multiple granularities. We also identify simple yet indicative metadata features that capture participatory dynamics and content maturation. When brought to bear over damage corpora our contributions: (1) advance benchmarks over a broad set of security issues (vandalism), (2) perform well in the first anti-spam specific approach, and (3) demonstrate their portability over diverse open collaboration use cases. Probabilities generated by our classifiers can also intelligently route human assets using prioritization schemes optimized for capture rate or impact minimization. Organizational primitives are introduced that improve workforce efficiency. The whole of these strategies is then implemented in a tool (STiki) that has been used to revert 350,000+ damaging instances from Wikipedia. These uses are analyzed to learn about human aspects of the edit review process, properties including scalability, motivation, and latency.
Finally, we conclude by measuring the practical impacts of this work, discussing how to better integrate our solutions, and revealing outstanding vulnerabilities that speak to research challenges for open collaboration security.

    INCREASING THE WILLINGNESS TO COLLABORATE ONLINE: AN ANALYSIS OF SENTIMENT-DRIVEN INTERACTIONS IN PEER CONTENT PRODUCTION

    We investigate mechanisms that trigger collaborative work behavior in online peer communities. We regard the collaboration among Wikipedia editors as a social process influenced by specific communication practices. We analyze and quantify the way Wikipedia editors communicate their feedback and support toward each other's work in the form of sentiments and opinions, and explore to what extent this influences online trust among them. We show that peer content production in Wikipedia is influenced by sharing sentiments during discussions among editors. At the global level, sharing sentiments positively influences the level of online trust. We also find a significant difference in the amount of online trust among editors who share mainly positive or mainly negative sentiments. We further suggest that providing and receiving especially supportive feedback, expressed in the form of positive sentiments and opinions, may be beneficial in terms of virtual teamwork.

    Calculating and Presenting Trust in Collaborative Content

    Collaborative functionality is increasingly prevalent in Internet applications. Such functionality permits individuals to add -- and sometimes modify -- web content, often with minimal barriers to entry. Ideally, large bodies of knowledge can be amassed and shared in this manner. However, such software also provides a medium for biased individuals, spammers, and nefarious persons to operate. By computing trust/reputation for participating agents and/or the content they generate, one can identify quality contributions. In this work, we survey the state-of-the-art for calculating trust in collaborative content. In particular, we examine four proposals from literature based on: (1) content persistence, (2) natural-language processing, (3) metadata properties, and (4) incoming link quantity. Though each technique can be applied broadly, Wikipedia provides a focal point for discussion. Finally, having critiqued how trust values are calculated, we analyze how the presentation of these values can benefit end-users and application security.

    Trust in Collaborative Web Applications

    Collaborative functionality is increasingly prevalent in web applications. Such functionality permits individuals to add - and sometimes modify - web content, often with minimal barriers to entry. Ideally, large bodies of knowledge can be amassed and shared in this manner. However, such software also provides a medium for nefarious persons to operate. By determining the extent to which participating content/agents can be trusted, one can identify useful contributions. In this work, we define the notion of trust for Collaborative Web Applications and survey the state-of-the-art for calculating, interpreting, and presenting trust values. Though techniques can be applied broadly, Wikipedia's archetypal nature makes it a focal point for discussion.

    Automatically Neutralizing Subjective Bias in Text

    Texts like news, encyclopedias, and some social media strive for objectivity. Yet bias in the form of inappropriate subjectivity - introducing attitudes via framing, presupposing truth, and casting doubt - remains ubiquitous. This kind of bias erodes our collective trust and fuels social conflict. To address this issue, we introduce a novel testbed for natural language generation: automatically bringing inappropriately subjective text into a neutral point of view ("neutralizing" biased text). We also offer the first parallel corpus of biased language. The corpus contains 180,000 sentence pairs and originates from Wikipedia edits that removed various framings, presuppositions, and attitudes from biased sentences. Last, we propose two strong encoder-decoder baselines for the task. A straightforward yet opaque CONCURRENT system uses a BERT encoder to identify subjective words as part of the generation process. An interpretable and controllable MODULAR algorithm separates these steps, using (1) a BERT-based classifier to identify problematic words and (2) a novel join embedding through which the classifier can edit the hidden states of the encoder. Large-scale human evaluation across four domains (encyclopedias, news headlines, books, and political speeches) suggests that these algorithms are a first step towards the automatic identification and reduction of bias. Comment: To appear at AAAI 202

    Are We There Yet?: The Development of a Corpus Annotated for Social Acts in Multilingual Online Discourse

    We present the AAWD and AACD corpora, a collection of discussions drawn from Wikipedia talk pages and small group IRC discussions in English, Russian and Mandarin. Our datasets are annotated with labels capturing two kinds of social acts: alignment moves and authority claims. We describe these social acts, describe our annotation process, highlight challenges we encountered and strategies we employed during annotation, and present some analyses of the resulting data set which illustrate the utility of our corpus and identify interactions among social acts and between participant status and social acts in online discourse.

    A Data Mining Toolbox for Collaborative Writing Processes

    Collaborative writing (CW) is an essential skill in academia and industry. Providing support during the process of CW can be useful not only for achieving better quality documents, but also for improving the CW skills of the writers. In order to properly support collaborative writing, it is essential to understand how ideas and concepts are developed during the writing process, which consists of a series of steps of writing activities. These steps can be considered as sequence patterns comprising both time events and the semantics of the changes made during those steps. Two techniques can be combined to examine those patterns: process mining, which focuses on extracting process-related knowledge from event logs recorded by an information system; and semantic analysis, which focuses on extracting knowledge about what the student wrote or edited. This thesis contributes (i) techniques to automatically extract process models of collaborative writing processes and (ii) visualisations to describe aspects of collaborative writing. These two techniques form a data mining toolbox for collaborative writing by using process mining, probabilistic graphical models, and text mining. First, I created a framework, WriteProc, for investigating collaborative writing processes, integrated with the existing cloud computing writing tools in Google Docs. Secondly, I created a new heuristic to extract the semantic nature of text edits that occur in the document revisions and automatically identify the corresponding writing activities. Thirdly, based on sequences of writing activities, I propose methods to discover the writing process models and transitional state diagrams using a process mining algorithm, Heuristics Miner, and Hidden Markov Models, respectively. Finally, I designed three types of visualisations and made contributions to their underlying techniques for analysing writing processes.
All components of the toolbox are validated against annotated writing activities of real documents and a synthetic dataset. I also illustrate how the automatically discovered process models and visualisations are used in the process analysis with real documents written by groups of graduate students. I discuss how the analyses can be used to gain further insight into how students work and create their collaborative documents.