A Multidimensional Dataset Based on Crowdsourcing for Analyzing and Detecting News Bias
The automatic detection of bias in news articles can have a high impact on society because undiscovered news bias may influence the political opinions, social views, and emotional feelings of readers. While various analyses and approaches to news bias detection have been proposed, large data sets with rich bias annotations on a fine-grained level are still missing. In this paper, we firstly aggregate the aspects of news bias in related works by proposing a new annotation schema for labeling news bias. This schema covers the overall bias, as well as the bias dimensions (1) hidden assumptions, (2) subjectivity, and (3) representation tendencies. Secondly, we propose a methodology based on crowdsourcing for obtaining a large data set for news bias analysis and identification. We then use our methodology to create a dataset consisting of more than 2,000 sentences annotated with 43,000 bias and bias dimension labels. Thirdly, we perform an in-depth analysis of the collected data. We show that the annotation task is difficult with respect to bias and specific bias dimensions. While crowdworkers' labels of representation tendencies correlate with experts' bias labels for articles, subjectivity and hidden assumptions do not correlate with experts' bias labels and, thus, seem to be less relevant when creating data sets with crowdworkers. The experts' article labels better match the inferred crowdworkers' article labels than the crowdworkers' sentence labels. The crowdworkers' countries of origin seem to affect their judgements. In our study, non-Western crowdworkers tend to annotate more bias, either directly or in the form of bias dimensions (e.g., subjectivity), than Western crowdworkers do.
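As an illustration of how such crowd judgements might be combined, the sketch below shows a minimal majority-vote aggregation of sentence labels and a simple rule for inferring an article label from them; the field names, the aggregation rule, and the threshold are assumptions for illustration, not the paper's actual procedure.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List

# Hypothetical record for one crowdsourced judgement of a sentence.
@dataclass
class Judgement:
    sentence_id: str
    worker_id: str
    overall_bias: bool             # overall bias label
    hidden_assumptions: bool       # bias dimension (1)
    subjectivity: bool             # bias dimension (2)
    representation_tendency: bool  # bias dimension (3)

def aggregate_sentence_label(judgements: List[Judgement]) -> bool:
    """Majority vote over the overall-bias labels collected for one sentence."""
    votes = Counter(j.overall_bias for j in judgements)
    return votes.most_common(1)[0][0]

def infer_article_label(sentence_labels: List[bool], threshold: float = 0.3) -> bool:
    """Mark an article as biased if a sufficient share of its sentences are biased
    (the threshold is an arbitrary placeholder)."""
    if not sentence_labels:
        return False
    return sum(sentence_labels) / len(sentence_labels) >= threshold
```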
Neural Based Statement Classification for Biased Language
Biased language commonly occurs around topics of a controversial nature,
stirring disagreement between the different parties involved in a discussion.
This is because stances on how language, and specific phrases in particular,
are understood and used are cohesive within a given group. However, such
cohesiveness does not hold across groups.
In collaborative environments or environments where impartial language is
desired (e.g. Wikipedia, news media), statements and the language therein
should represent the involved parties equally and be neutrally phrased. Biased
language is introduced through the presence of inflammatory words or phrases,
or through statements that may be incorrect or one-sided, thus violating this
consensus.
In this work, we focus on the specific case of phrasing bias, which may be
introduced through specific inflammatory words or phrases in a statement. For
this purpose, we propose an approach that relies on recurrent neural networks
to capture the inter-dependencies between the words in a phrase that introduce
bias.
We perform a thorough experimental evaluation, in which we show the advantages
of a neural-based approach over competitors that rely on word lexicons and
other hand-crafted features in detecting biased language. We are able to
distinguish biased statements with a precision of P=0.92, thus significantly
outperforming baseline models with an improvement of over 30%. Finally, we
release the largest corpus of statements annotated for biased language.
Comment: The Twelfth ACM International Conference on Web Search and Data
Mining, February 11–15, 2019, Melbourne, VIC, Australia
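To make the idea of capturing word inter-dependencies concrete, here is a minimal PyTorch sketch of a bidirectional GRU statement classifier; the layer sizes and architecture details are assumptions and do not reproduce the paper's exact model.

```python
import torch
import torch.nn as nn

class BiasedStatementClassifier(nn.Module):
    """GRU-based classifier that scores a statement as biased vs. neutral.
    Layer sizes are illustrative, not the paper's configuration."""
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, 2)  # biased vs. neutral

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)     # (batch, seq_len, embed_dim)
        _, hidden = self.gru(embedded)           # hidden: (2, batch, hidden_dim)
        sentence_repr = torch.cat([hidden[0], hidden[1]], dim=-1)
        return self.classifier(sentence_repr)    # unnormalized class scores

# Usage sketch:
# model = BiasedStatementClassifier(vocab_size=50_000)
# logits = model(torch.randint(1, 50_000, (8, 40)))  # batch of 8 token sequences
```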
Empirical Methodology for Crowdsourcing Ground Truth
The process of gathering ground truth data through human annotation is a
major bottleneck in the use of information extraction methods for populating
the Semantic Web. Crowdsourcing-based approaches are gaining popularity in the
attempt to solve the issues related to volume of data and lack of annotators.
Typically these practices use inter-annotator agreement as a measure of
quality. However, in many domains, such as event detection, there is ambiguity
in the data, as well as a multitude of perspectives on the information
examples. We present an empirically derived methodology for efficiently
gathering ground truth data in a diverse set of use cases covering a variety
of domains and annotation tasks. Central to our approach is the use of
CrowdTruth metrics that capture inter-annotator disagreement. We show that
measuring disagreement is essential for acquiring a high quality ground truth.
We achieve this by comparing the quality of the data aggregated with CrowdTruth
metrics against majority vote, over a set of diverse crowdsourcing tasks:
Medical Relation Extraction, Twitter Event Identification, News Event
Extraction, and Sound Interpretation. We also show that an increased number of
crowd workers leads to growth and stabilization in the quality of annotations,
going against the usual practice of employing a small number of annotators.
Comment: in publication at the Semantic Web Journal
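The sketch below contrasts a simplified, disagreement-aware score in the spirit of the CrowdTruth sentence-level metrics with plain majority vote; the actual CrowdTruth metrics are richer (they also weight workers and labels), so this is only an illustrative approximation.

```python
import numpy as np

def sentence_label_scores(worker_vectors: np.ndarray) -> np.ndarray:
    """Disagreement-aware scores in the spirit of CrowdTruth: for each label,
    the cosine between the aggregated sentence vector (sum of worker annotation
    vectors) and that label's unit vector. worker_vectors has shape
    (n_workers, n_labels) with 0/1 entries."""
    sentence_vector = worker_vectors.sum(axis=0).astype(float)
    norm = np.linalg.norm(sentence_vector)
    return sentence_vector / norm if norm > 0 else sentence_vector

def majority_vote(worker_vectors: np.ndarray) -> np.ndarray:
    """Baseline: a label is accepted if more than half of the workers chose it."""
    return (worker_vectors.mean(axis=0) > 0.5).astype(int)

# Example: 5 workers, 3 candidate labels for one sentence.
annotations = np.array([[1, 0, 0], [1, 1, 0], [1, 0, 0], [0, 1, 0], [1, 0, 0]])
print(sentence_label_scores(annotations))  # graded scores per label
print(majority_vote(annotations))          # hard 0/1 decisions
```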
Exploiting Transformer-based Multitask Learning for the Detection of Media Bias in News Articles
Media has a substantial impact on the public perception of events. A
one-sided or polarizing perspective on any topic is usually described as media
bias. One way in which bias can be introduced into news articles is by
altering word choice. Biased word choices are not always obvious, nor do they
exhibit high context-dependency. Hence, detecting bias is often difficult. We
propose a Transformer-based deep learning architecture trained via Multi-Task
Learning on six bias-related data sets to tackle the media bias detection
problem. Our best-performing implementation achieves a macro F1 score of 0.776,
a performance boost of 3% compared to our baseline, outperforming existing
methods. Our results indicate that Multi-Task Learning is a promising
alternative for improving existing baseline models in identifying slanted
reporting.
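A minimal sketch of what such a Multi-Task Learning setup could look like is shown below: a shared Transformer encoder with one classification head per task; the task names, label counts, and checkpoint are placeholders rather than the paper's configuration.

```python
import torch.nn as nn
from transformers import AutoModel

class MultiTaskBiasModel(nn.Module):
    """Shared Transformer encoder with one classification head per task.
    Task names and head sizes are illustrative assumptions."""
    def __init__(self, model_name="bert-base-uncased", task_num_labels=None):
        super().__init__()
        task_num_labels = task_num_labels or {"media_bias": 2, "subjectivity": 2}
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        self.heads = nn.ModuleDict(
            {task: nn.Linear(hidden, n) for task, n in task_num_labels.items()}
        )

    def forward(self, input_ids, attention_mask, task):
        output = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls_repr = output.last_hidden_state[:, 0]  # [CLS] token representation
        return self.heads[task](cls_repr)          # logits for the requested task
```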
Methods for detecting and mitigating linguistic bias in text corpora
As the Web continues to spread into every aspect of daily life, bias in the
form of partiality and hidden opinions is becoming an increasingly challenging
problem. A widespread manifestation is bias in text data. To counteract this,
the online encyclopedia Wikipedia introduced the Neutral Point of View (NPOV)
principle, which prescribes the use of neutral language and the avoidance of
one-sided or subjective formulations. While studies have shown that the quality
of Wikipedia articles is comparable to that of articles in classical
encyclopedias, research also shows that Wikipedia is susceptible to various
types of NPOV violations. Identifying bias can be a challenging task, even for
humans, and with millions of articles and a declining number of contributors,
this task becomes ever more difficult. If bias is not contained, it can not
only lead to polarization and conflict between opinion groups but can also
negatively affect readers in forming their own opinions. In addition, bias in
texts and in ground truth data can adversely affect machine learning models
trained on such data, which can lead to discriminatory model behavior.
In this thesis, we address bias by focusing on three central aspects: biased
content in the form of written statements, bias of crowdworkers during data
annotation, and bias in word embedding representations.
We present two approaches for identifying biased statements in text collections
such as Wikipedia. Our feature-based approach uses bag-of-words features,
including a list of bias words that we compiled by identifying clusters of bias
words in the vector space of word embeddings. Our improved neural approach uses
gated recurrent neural networks to capture context dependencies and further
improve model performance.
Our study on crowd worker bias reveals biased behavior of crowdworkers holding
extreme opinions on a given topic and shows that this behavior influences the
resulting ground truth labels, which in turn affects the creation of datasets
for tasks such as bias identification or sentiment analysis. We present
approaches for mitigating worker bias that raise awareness among workers and
apply the concept of social projection.
Finally, we address the problem of bias in word embeddings, focusing on the
example of varying sentiment scores for names. We show that bias in the
training data is captured by the embeddings and passed on to downstream models.
In this context, we present a debiasing approach that reduces the bias effect
and has a positive impact on the labels produced by a downstream sentiment
classifier.
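As a rough illustration of the lexicon-construction step described above (identifying clusters of bias words in word-embedding space), the following sketch expands a small seed list via nearest neighbours in a pre-trained embedding space; the seed words and the file path are placeholders, not the thesis's actual resources.

```python
from gensim.models import KeyedVectors

def expand_bias_lexicon(embeddings: KeyedVectors, seed_words, topn=20):
    """Expand a small seed list of bias words by collecting their nearest
    neighbours in the word-embedding space; a rough stand-in for locating
    clusters of bias words."""
    lexicon = set(w for w in seed_words if w in embeddings.key_to_index)
    for word in list(lexicon):
        for neighbour, _score in embeddings.most_similar(word, topn=topn):
            lexicon.add(neighbour)
    return sorted(lexicon)

# Usage sketch (assumes pre-trained vectors on disk; the path is a placeholder):
# vectors = KeyedVectors.load_word2vec_format("embeddings.bin", binary=True)
# bias_lexicon = expand_bias_lexicon(vectors, ["outrageous", "notorious", "so-called"])
```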
Connotation Frames: A Data-Driven Investigation
Through a particular choice of a predicate (e.g., "x violated y"), a writer
can subtly connote a range of implied sentiments and presupposed facts about
the entities x and y: (1) writer's perspective: projecting x as an
"antagonist" and y as a "victim", (2) entities' perspective: y probably dislikes
x, (3) effect: something bad happened to y, (4) value: y is something valuable,
and (5) mental state: y is distressed by the event. We introduce connotation
frames as a representation formalism to organize these rich dimensions of
connotation using typed relations. First, we investigate the feasibility of
obtaining connotative labels through crowdsourcing experiments. We then present
models for predicting the connotation frames of verb predicates based on their
distributional word representations and the interplay between different types
of connotative relations. Empirical results confirm that connotation frames can
be induced from various data sources that reflect how people use language and
give rise to the connotative meanings. We conclude with analytical results that
show the potential use of connotation frames for analyzing subtle biases in
online news media.
Comment: 11 pages, published in Proceedings of ACL 2016
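A connotation frame can be thought of as a small typed record attached to a predicate. The sketch below encodes the dimensions listed in the abstract as illustrative polarity fields; the field names and value ranges are assumptions rather than the paper's exact schema.

```python
from dataclasses import dataclass

# Illustrative encoding of a connotation frame for a verb predicate,
# with polarities in [-1, 1].
@dataclass
class ConnotationFrame:
    predicate: str
    writer_perspective_x: float       # writer's attitude toward x
    writer_perspective_y: float       # writer's attitude toward y
    entity_perspective_y_to_x: float  # how y likely feels about x
    effect_on_y: float                # was the event good or bad for y
    value_of_y: float                 # is y presented as valuable
    mental_state_y: float             # resulting mental state of y

violate = ConnotationFrame(
    predicate="x violated y",
    writer_perspective_x=-0.8,        # x framed as an antagonist
    writer_perspective_y=0.6,         # y framed as a victim
    entity_perspective_y_to_x=-0.7,
    effect_on_y=-0.9,
    value_of_y=0.8,
    mental_state_y=-0.6,
)
```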
Predicting Sentence-Level Factuality of News and Bias of Media Outlets
Predicting the factuality of news reporting and the bias of media outlets is
highly relevant for automated news credibility assessment and fact-checking.
While prior work has focused on the veracity of news, we propose a fine-grained
reliability analysis of entire media outlets. Specifically, we study the
prediction of sentence-level factuality of news reporting and bias of media
outlets, which may explain the overall reliability of the source more
accurately. We first manually produced a large sentence-level dataset, titled
"FactNews", composed of 6,191 sentences expertly annotated according to the
factuality and media bias definitions from AllSides. We then present baseline
models for sentence-level factuality prediction obtained by fine-tuning BERT.
Finally, due to the severity of fake news and political polarization in Brazil,
both the dataset and the baselines are provided for Portuguese; however, our
approach may be applied to any other language.
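A minimal fine-tuning sketch along these lines, using the Hugging Face Trainer, is shown below; the Portuguese checkpoint, label set, and training hyperparameters are assumptions, not the FactNews baselines themselves.

```python
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

# Assumed Portuguese BERT checkpoint and a binary factual/non-factual label set.
model_name = "neuralmind/bert-base-portuguese-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

def encode(batch):
    """Tokenize a batch with a hypothetical 'sentence' text field."""
    return tokenizer(batch["sentence"], truncation=True, padding="max_length",
                     max_length=128)

# `train_dataset` / `eval_dataset` stand for tokenized splits of FactNews-style
# data (sentences with factuality labels), e.g. produced via Dataset.map(encode).
# trainer = Trainer(
#     model=model,
#     args=TrainingArguments(output_dir="factuality-bert", num_train_epochs=3,
#                            per_device_train_batch_size=16),
#     train_dataset=train_dataset,
#     eval_dataset=eval_dataset,
# )
# trainer.train()
```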
- …