    Collecting Diverse Natural Language Inference Problems for Sentence Representation Evaluation

    We present a large-scale collection of diverse natural language inference (NLI) datasets that help provide insight into how well a sentence representation captures distinct types of reasoning. The collection results from recasting 13 existing datasets from 7 semantic phenomena into a common NLI structure, resulting in over half a million labeled context-hypothesis pairs in total. We refer to our collection as the DNC: Diverse Natural Language Inference Collection. The DNC is available online at https://www.decomp.net, and will grow over time as additional resources are recast and added from novel sources. Comment: To be presented at EMNLP 2018. 15 pages

    Toward Real Event Detection

    News agencies and other news providers or consumers are confronted with the task of extracting events from news articles. This is done either (i) to monitor, and hence stay informed about, events of specific kinds over time, and/or (ii) to react to events immediately. In the past, several promising approaches to extracting events from text have been proposed. Besides purely statistical approaches, there are methods that represent events in a semantically structured form, such as graphs containing actions (predicates), participants (entities), etc. However, it turns out to be very difficult to automatically determine whether an event is real or not. In this paper, we give an overview of approaches that have proposed solutions to this research problem. We show that there is no gold standard dataset where real events are annotated in text documents in a fine-grained, semantically enriched way. We present a methodology for creating such a dataset with the help of crowdsourcing and present preliminary results.

    Automatic Detection of Modality with ITGETARUNS

    In this paper we present a system for modality detection which is then used for Subjectivity and Factuality evaluation. The system was recently tested on a task for Subjectivity and Irony detection in Italian tweets, where it ranked 10th and 4th, respectively, out of 27 participants. We focus our paper on an internal evaluation in which we considered three national newspapers: Il Corriere, Repubblica, and Libero. This task was prompted by a project on the evaluation of press stylistic features in political discourse. The project used newspaper articles from the same sources over a period of three months, thus including the latest 2013 governmental crisis. We intended to produce a similar experiment and evaluate results in comparison with the previous 2011 crisis. In this evaluation, we focused on Subjectivity, Polarity and Factuality, which include Modality evaluation. The graphs at the end of the paper show results confirming our previous findings about differences in style, with Il Corriere emerging as the most atypical.

    Predicting Sentence-Level Factuality of News and Bias of Media Outlets

    Predicting the factuality of news reporting and bias of media outlets is surely relevant for automated news credibility and fact-checking. While prior work has focused on the veracity of news, we propose a fine-grained reliability analysis of the entire media. Specifically, we study the prediction of sentence-level factuality of news reporting and bias of media outlets, which may explain more accurately the overall reliability of the entire source. We first manually produced a large sentence-level dataset, titled "FactNews", composed of 6,191 sentences expertly annotated according to factuality and media bias definitions from AllSides. As a result, baseline models for sentence-level factuality prediction were presented by fine-tuning BERT. Finally, due to the severity of fake news and political polarization in Brazil, both dataset and baseline were proposed for Portuguese. However, our approach may be applied to any other language.

    Impact of Stricter Content Moderation on Parler's Users' Discourse

    Social media platforms employ various content moderation techniques to remove harmful, offensive, and hate speech content. The moderation level varies across platforms, and it can even evolve over time within a platform. For example, Parler, a fringe social media platform popular among conservative users, was known to have the least restrictive moderation policies, claiming to have open discussion spaces for their users. However, after linking the 2021 US Capitol Riots and the activity of some groups on Parler, such as QAnon and Proud Boys, on January 12, 2021, Parler was removed from the Apple and Google App Store and suspended from Amazon Cloud hosting service. Parler would have to modify their moderation policies to return to these online stores. After a month of downtime, Parler was back online with a new set of user guidelines, which reflected stricter content moderation, especially regarding the hate speech policy. In this paper, we studied the moderation changes performed by Parler and their effect on the toxicity of its content. We collected a large longitudinal Parler dataset with 17M parleys from 432K active users from February 2021 to January 2022, after its return to the Internet and App Store. To the best of our knowledge, this is the first study investigating the effectiveness of content moderation techniques using data-driven approaches and also the first Parler dataset after its brief hiatus. Our quasi-experimental time series analysis indicates that after the change in Parler's moderation, the severe forms of toxicity (above a threshold of 0.5) immediately decreased and sustained. In contrast, the trend did not change for less severe threats and insults (a threshold between 0.5 and 0.7). Finally, we found an increase in the factuality of the news sites being shared, as well as a decrease in the number of conspiracy or pseudoscience sources being shared.