1,165 research outputs found

Collecting Diverse Natural Language Inference Problems for Sentence Representation Evaluation

Author: Haldar Aparajita
Hu J. Edward
Pavlick Ellie
Poliak Adam
Rudinger Rachel
Van Durme Benjamin
White Aaron Steven
Publication date: 01/01/2018

We present a large-scale collection of diverse natural language inference (NLI) datasets that help provide insight into how well a sentence representation captures distinct types of reasoning. The collection results from recasting 13 existing datasets from 7 semantic phenomena into a common NLI structure, resulting in over half a million labeled context-hypothesis pairs in total. We refer to our collection as the DNC: Diverse Natural Language Inference Collection. The DNC is available online at https://www.decomp.net, and will grow over time as additional resources are recast and added from novel sources. Comment: To be presented at EMNLP 2018. 15 page
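As a rough illustration of what "recasting into a common NLI structure" can look like, the sketch below turns one sentiment-labeled review into two context-hypothesis pairs. It is not the DNC pipeline: the class `NLIPair`, the function `recast_sentiment`, the sentence templates, and the label strings are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NLIPair:
    """One labeled context-hypothesis pair (hypothetical schema)."""
    context: str
    hypothesis: str
    label: str        # assumed label set: "entailed" / "not-entailed"
    phenomenon: str   # which semantic phenomenon the pair probes


def recast_sentiment(review: str, is_positive: bool) -> List[NLIPair]:
    """Recast one sentiment-annotated review into two NLI pairs.

    The templates are illustrative only; a real recasting effort would
    use templates designed and validated for each source dataset.
    """
    context = f'When asked about the product, the reviewer said "{review}"'
    liked = "The reviewer liked the product."
    disliked = "The reviewer did not like the product."
    # The hypothesis matching the gold sentiment is entailed; the other is not.
    entailed, not_entailed = (liked, disliked) if is_positive else (disliked, liked)
    return [
        NLIPair(context, entailed, "entailed", "sentiment"),
        NLIPair(context, not_entailed, "not-entailed", "sentiment"),
    ]


if __name__ == "__main__":
    for pair in recast_sentiment("The battery lasts all day.", is_positive=True):
        print(pair.label, "|", pair.hypothesis)
```

Keeping the phenomenon on each pair lets an evaluation report accuracy per reasoning type rather than as a single aggregate number.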

arXiv.org e-Print Archive

Crossref

Scholarship, Research, and Creative Work at Bryn Mawr College | Bryn Mawr College Research

Toward Real Event Detection

Author: Färber Michael
Rettinger Achim
Publication venue: RWTH Aachen
Publication date: 01/01/2015

News agencies and other news providers or consumers are confronted with the task of extracting events from news articles. This is done either (i) to monitor events of specific kinds over time and thus stay informed about them, and/or (ii) to react to events immediately. In the past, several promising approaches to extracting events from text have been proposed. Besides purely statistical approaches, there are methods to represent events in a semantically structured form, such as graphs containing actions (predicates), participants (entities), etc. However, it turns out to be very difficult to automatically determine whether an event is real or not. In this paper, we give an overview of approaches that propose solutions to this research problem. We show that there is no gold standard dataset where real events are annotated in text documents in a fine-grained, semantically enriched way. We present a methodology for creating such a dataset with the help of crowdsourcing and present preliminary results

KITopen

Automatic Detection of Modality with ITGETARUNS

Author: Byrne Nuala
Gibson Alice
Hills Andrew
King Neil
Roekenes Jessica
Sainsbury-Salis Amanda
Seimon Radhika
Wood Rachel
Zhu Benjamin
Zibellini Jessica
Publication venue: Springer
Publication date: 01/01/2015

In this paper we present a system for modality detection which is then used for Subjectivity and Factuality evaluation. The system was recently tested on a task for Subjectivity and Irony detection in Italian tweets, where it ranked 10th and 4th, respectively, among 27 participants overall. We focus our paper on an internal evaluation in which we considered three national newspapers: Il Corriere, Repubblica, and Libero. This task was prompted by a project on the evaluation of press stylistic features in political discourse. The project used newspaper articles from the same sources over a period of three months, thus including the latest (2013) governmental crisis. We intended to produce a similar experiment and evaluate the results in comparison with the previous 2011 crisis. In this evaluation, we focused on Subjectivity, Polarity, and Factuality, which include Modality evaluation. Final graphs at the end of the paper show results confirming our previous findings about differences in style, with Il Corriere emerging as the most atypical

Archivio Ricerca Ca'Foscari

Crossref

Queensland University of Technology ePrints Archive

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

University of Queensland eSpace

Predicting Sentence-Level Factuality of News and Bias of Media Outlets

Author: Benevenuto Fabrício
Jaidka Kokil
Pardo Thiago A. S.
Vargas Francielle
Publication date: 26/04/2023

Predicting the factuality of news reporting and bias of media outlets is surely relevant for automated news credibility and fact-checking. While prior work has focused on the veracity of news, we propose a fine-grained reliability analysis of the entire media. Specifically, we study the prediction of sentence-level factuality of news reporting and bias of media outlets, which may explain more accurately the overall reliability of the entire source. We first manually produced a large sentence-level dataset, titled "FactNews", composed of 6,191 sentences expertly annotated according to factuality and media bias definitions from AllSides. As a result, baseline models for sentence-level factuality prediction were presented by fine-tuning BERT. Finally, due to the severity of fake news and political polarization in Brazil, both dataset and baseline were proposed for Portuguese. However, our approach may be applied to any other language

arXiv.org e-Print Archive

Impact of Stricter Content Moderation on Parler's Users' Discourse

Author: Kumarswamy Nihal
Nilizadeh Shirin
Singhal Mohit
Publication date: 13/10/2023

Social media platforms employ various content moderation techniques to remove harmful, offensive, and hate speech content. The moderation level varies across platforms; even over time, it can evolve in a platform. For example, Parler, a fringe social media platform popular among conservative users, was known to have the least restrictive moderation policies, claiming to have open discussion spaces for their users. However, after linking the 2021 US Capitol Riots and the activity of some groups on Parler, such as QAnon and Proud Boys, on January 12, 2021, Parler was removed from the Apple and Google App Store and suspended from Amazon Cloud hosting service. Parler would have to modify their moderation policies to return to these online stores. After a month of downtime, Parler was back online with a new set of user guidelines, which reflected stricter content moderation, especially regarding the hate speech policy. In this paper, we studied the moderation changes performed by Parler and their effect on the toxicity of its content. We collected a large longitudinal Parler dataset with 17M parleys from 432K active users from February 2021 to January 2022, after its return to the Internet and App Store. To the best of our knowledge, this is the first study investigating the effectiveness of content moderation techniques using data-driven approaches and also the first Parler dataset after its brief hiatus. Our quasi-experimental time series analysis indicates that after the change in Parler's moderation, the severe forms of toxicity (above a threshold of 0.5) immediately decreased and sustained. In contrast, the trend did not change for less severe threats and insults (a threshold between 0.5 - 0.7). Finally, we found an increase in the factuality of the news sites being shared, as well as a decrease in the number of conspiracy or pseudoscience sources being shared
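To make the threshold-based comparison more concrete, here is a minimal sketch that computes the share of posts whose toxicity score exceeds a threshold before and after a policy change. It is a plain before/after comparison, not the authors' quasi-experimental time series analysis; the function names, the cutoff date, and the toy scores are all assumptions made for illustration.

```python
from datetime import date
from typing import Iterable, List, Tuple

Post = Tuple[date, float]  # (posting date, toxicity score in [0, 1]); hypothetical schema


def share_above(scores: List[float], threshold: float) -> float:
    """Fraction of scores strictly above the threshold (0.0 for an empty list)."""
    if not scores:
        return 0.0
    return sum(score > threshold for score in scores) / len(scores)


def share_before_after(
    posts: Iterable[Post], change_date: date, threshold: float
) -> Tuple[float, float]:
    """Split posts at change_date and return the above-threshold share on each side."""
    posts = list(posts)
    before = [score for day, score in posts if day < change_date]
    after = [score for day, score in posts if day >= change_date]
    return share_above(before, threshold), share_above(after, threshold)


if __name__ == "__main__":
    # Toy data invented for this example; not drawn from the Parler dataset.
    toy_posts = [
        (date(2021, 1, 5), 0.9),
        (date(2021, 1, 6), 0.2),
        (date(2021, 3, 1), 0.1),
        (date(2021, 3, 2), 0.3),
    ]
    before, after = share_before_after(toy_posts, date(2021, 2, 15), threshold=0.5)
    print(f"share above threshold: before={before:.2f}, after={after:.2f}")
```

A raw before/after difference like this cannot separate the effect of the policy change from a pre-existing trend, which is the gap a time series design with a trend term is meant to address.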

arXiv.org e-Print Archive