The POLUSA Dataset: 0.9M Political News Articles Balanced by Time and Outlet Popularity
News articles covering policy issues are an essential source of information
in the social sciences and are also frequently used for other use cases, e.g.,
to train NLP language models. To derive meaningful insights from the analysis
of news, large datasets are required that represent real-world distributions,
e.g., with respect to the popularity of the contained outlets, the topics
covered, or the publication time. Information on the political leanings of
media publishers is often
needed, e.g., to study differences in news reporting across the political
spectrum, which is one of the prime use cases in the social sciences when
studying media bias and related societal issues. Concerning these requirements,
existing datasets have major flaws, resulting in redundant and cumbersome
dataset-creation efforts across the research community. To fill this gap, we
present POLUSA, a dataset that represents the online media landscape as
perceived by an average US news consumer. The dataset contains 0.9M articles
covering policy topics published between Jan. 2017 and Aug. 2019 by 18 news
outlets representing the political spectrum. Each outlet is labeled by its
political leaning, which we derive using a systematic aggregation of eight data
sources. The news dataset is balanced with respect to publication date and
outlet popularity. POLUSA enables studying a variety of subjects, e.g., media
effects and political partisanship. Due to its size, the dataset allows the
use of data-intensive deep learning methods.
Comment: 2 pages, 1 table
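Balancing with respect to publication date and outlet popularity can be sketched as stratified down-sampling. The per-month budget rule, the popularity weights, and the field names below are illustrative assumptions, not the authors' documented procedure:

```python
import random
from collections import defaultdict

def balance_articles(articles, popularity, seed=0):
    """Down-sample articles so each month contributes a comparable number
    of articles, split across outlets in proportion to outlet popularity."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for a in articles:
        by_stratum[(a["month"], a["outlet"])].append(a)

    months = {m for (m, _) in by_stratum}
    # Use the thinnest month as the shared per-month article budget.
    month_budget = min(
        sum(len(by_stratum.get((m, o), [])) for o in popularity)
        for m in months
    )
    total_pop = sum(popularity.values())

    sample = []
    for m in months:
        for outlet, pop in popularity.items():
            pool = by_stratum.get((m, outlet), [])
            k = min(len(pool), round(month_budget * pop / total_pop))
            sample.extend(rng.sample(pool, k))
    return sample
```

The popularity weighting keeps the sample close to what an average consumer would encounter, while the shared monthly budget prevents heavily covered periods from dominating the corpus.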
Detecting frames in news headlines and its application to analyzing news framing trends surrounding U.S. gun violence
Different news articles about the same topic often offer a variety of perspectives: an article written about gun violence might emphasize gun control, while another might promote 2nd Amendment rights, and yet a third might focus on mental health issues. In communication research, these different perspectives are known as “frames”, which, when used in news media, influence the opinions of readers in multiple ways. In this paper, we present a method for effectively detecting frames in news headlines. Our training and performance evaluation are based on a new dataset of news
headlines related to the issue of gun violence in the United States. This Gun Violence Frame
Corpus (GVFC) was curated and annotated by
journalism and communication experts. Our
proposed approach sets a new state-of-the-art
performance for multiclass news frame detection, significantly outperforming a recent baseline by 35.9% absolute difference in accuracy. We apply our frame detection approach in a large-scale study of 88k news headlines about the coverage of gun violence in the U.S. between 2016 and 2018.
Published version
Newsalyze: Effective Communication of Person-Targeting Biases in News Articles
Media bias and its extreme form, fake news, can decisively affect public opinion. Especially when reporting on policy issues, slanted news coverage may strongly influence societal decisions, e.g., in democratic elections. Our paper makes three contributions to address this issue. First, we present a system for bias identification, which combines state-of-the-art methods from natural language understanding. Second, we devise bias-sensitive visualizations to communicate bias in news articles to non-expert news consumers. Third, our main contribution is a large-scale user study that measures bias-awareness in a setting that approximates daily news consumption, e.g., we present respondents with a news overview and individual articles. We not only measure the visualizations' effect on respondents' bias-awareness, but we can also pinpoint the effects on individual components of the visualizations by employing a conjoint design. Our bias-sensitive overviews strongly and significantly increase bias-awareness in respondents. Our study further suggests that our content-driven identification method detects groups of similarly slanted news articles due to substantial biases present in individual news articles. In contrast, the prior work we review merely increases the visibility of biases, e.g., by distinguishing left- and right-wing outlets.
A Multidimensional Dataset Based on Crowdsourcing for Analyzing and Detecting News Bias
The automatic detection of bias in news articles can have a high impact on society because undiscovered news bias may influence the political opinions, social views, and emotional feelings of readers. While various analyses and approaches to news bias detection have been proposed, large data sets with rich bias annotations on a fine-grained level are still missing. In this paper, we firstly aggregate the aspects of news bias in related works by proposing a new annotation schema for labeling news bias. This schema covers the overall bias, as well as the bias dimensions (1) hidden assumptions, (2) subjectivity, and (3) representation tendencies. Secondly, we propose a methodology based on crowdsourcing for obtaining a large data set for news bias analysis and identification. We then use our methodology to create a dataset consisting of more than 2,000 sentences annotated with 43,000 bias and bias dimension labels. Thirdly, we perform an in-depth analysis of the collected data. We show that the annotation task is difficult with respect to bias and specific bias dimensions. While crowdworkers' labels of representation tendencies correlate with experts' bias labels for articles, subjectivity and hidden assumptions do not correlate with experts' bias labels and, thus, seem to be less relevant when creating data sets with crowdworkers. The experts' article labels better match the inferred crowdworkers' article labels than the crowdworkers' sentence labels. The crowdworkers' countries of origin seem to affect their judgements. In our study, non-Western crowdworkers tend to annotate more bias either directly or in the form of bias dimensions (e.g., subjectivity) than Western crowdworkers do.
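Inferring article-level labels from crowdworkers' sentence labels can be sketched as majority-vote aggregation; the voting rule and the 0.3 threshold are hypothetical choices for illustration, not the paper's method:

```python
from collections import Counter

def infer_article_label(sentence_labels, threshold=0.3):
    """Majority-vote each sentence's crowd labels, then call the article
    'biased' if the share of biased sentences exceeds the threshold."""
    majority = []
    for labels in sentence_labels:  # one list of worker labels per sentence
        top_label, _ = Counter(labels).most_common(1)[0]
        majority.append(top_label)
    biased_share = majority.count("biased") / len(majority)
    return "biased" if biased_share > threshold else "unbiased"
```

Aggregating over many noisy sentence annotations is what allows inferred article labels to match expert judgments more closely than any individual sentence label does.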
Quootstrap: Scalable Unsupervised Extraction of Quotation-Speaker Pairs from Large News Corpora via Bootstrapping
We propose Quootstrap, a method for extracting quotations, as well as the
names of the speakers who uttered them, from large news corpora. Whereas prior
work has addressed this problem primarily with supervised machine learning, our
approach follows a fully unsupervised bootstrapping paradigm. It leverages the
redundancy present in large news corpora, more precisely, the fact that the
same quotation often appears across multiple news articles in slightly
different contexts. Starting from a few seed patterns, such as ["Q", said S.],
our method extracts a set of quotation-speaker pairs (Q, S), which are in turn
used for discovering new patterns expressing the same quotations; the process
is then repeated with the larger pattern set. Our algorithm is highly scalable,
which we demonstrate by running it on the large ICWSM 2011 Spinn3r corpus.
Validating our results against a crowdsourced ground truth, we obtain 90%
precision at 40% recall using a single seed pattern, with significantly higher
recall values for more frequently reported (and thus likely more interesting)
quotations. Finally, we showcase the usefulness of our algorithm's output for
computational social science by analyzing the sentiment expressed in our
extracted quotations.
Comment: Accepted at the 12th International Conference on Web and Social Media
(ICWSM), 2018
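The bootstrapping loop described above (match patterns, collect (Q, S) pairs, mine new patterns from other contexts of known quotations, repeat) can be sketched as follows. The `mine_pattern` helper, the speaker-name regex, and the toy corpus are simplifying assumptions, not the paper's implementation:

```python
import re

def mine_pattern(sentence, q, s):
    """Abstract a known quotation q and speaker s out of a sentence,
    yielding a reusable extraction pattern (or None if either is absent)."""
    if q not in sentence or s not in sentence:
        return None
    pat = re.escape(sentence)
    pat = pat.replace(re.escape(q), '(?P<q>[^"]+)')
    pat = pat.replace(re.escape(s), r"(?P<s>[A-Z][a-z]+ [A-Z][a-z]+)")
    return pat

def bootstrap(corpus, seed_patterns, rounds=2):
    """Alternate between extracting (quotation, speaker) pairs with the
    current patterns and mining new patterns from other sentences that
    repeat a known quotation."""
    pairs, patterns = set(), set(seed_patterns)
    for _ in range(rounds):
        for sent in corpus:                 # step 1: extract pairs
            for pat in patterns:
                m = re.search(pat, sent)
                if m:
                    pairs.add((m.group("q"), m.group("s")))
        for q, s in list(pairs):            # step 2: mine new patterns
            for sent in corpus:
                pat = mine_pattern(sent, q, s)
                if pat:
                    patterns.add(pat)
    return pairs
```

Each round, newly mined contexts let the extractor generalize beyond the seed's "said" to other reporting constructions, which is what the redundancy of large news corpora makes possible.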