Political Text Scaling Meets Computational Semantics
During the last fifteen years, automatic text scaling has become one of the
key tools of the Text as Data community in political science. Prominent text
scaling algorithms, however, rely on the assumption that latent positions can
be captured solely from word frequencies in the documents under study. We
challenge this traditional view and present a new, semantically aware text
scaling algorithm, SemScale, which combines recent developments in
computational linguistics with unsupervised graph-based clustering. We conduct
an extensive quantitative analysis of speeches from the European Parliament in
five languages and from two legislative terms, and show that a scaling approach
relying on semantic document representations is often better at capturing known
underlying political dimensions than the established frequency-based (i.e.,
symbolic) scaling method. We further validate our findings through a series of
experiments on text preprocessing and feature selection, document
representation, scaling of party manifestos, and a supervised extension of our
algorithm. To catalyze further research on this new branch of text scaling
methods, we release a Python implementation of SemScale with all included data
sets and evaluation procedures. Comment: Updated version, accepted for Transactions on Data Science (TDS
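SemScale's central idea, placing documents on a scale by semantic similarity rather than by word counts, can be illustrated with a small graph-based sketch. It rests on assumptions not stated in the abstract: pairwise semantic similarities between documents are already computed (for instance from word embeddings), the two least similar documents serve as the poles of the scale, and all other positions are obtained by harmonic label propagation over the similarity graph. `harmonic_scale` and the example similarities are hypothetical; this is not the released SemScale code.

```python
from itertools import combinations


def harmonic_scale(sim, iterations=500):
    """Place documents on a 0-1 scale from a symmetric similarity matrix.

    Hypothetical sketch: the two least similar documents are clamped as the
    poles (0.0 and 1.0); every other score is repeatedly replaced by the
    similarity-weighted average of all other scores (harmonic propagation).
    Which pole counts as "left" or "right" is arbitrary.
    """
    n = len(sim)
    # Poles: the pair of documents with the lowest pairwise similarity.
    left, right = min(combinations(range(n), 2), key=lambda p: sim[p[0]][p[1]])
    scores = [0.5] * n
    scores[left], scores[right] = 0.0, 1.0
    for _ in range(iterations):
        updated = scores[:]
        for i in range(n):
            if i in (left, right):
                continue  # pole scores stay clamped
            weights = [sim[i][j] for j in range(n) if j != i]
            values = [scores[j] for j in range(n) if j != i]
            updated[i] = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        scores = updated
    return scores


# Hypothetical similarities among four speeches (symmetric, positive).
EXAMPLE_SIM = [
    [1.0, 0.8, 0.4, 0.1],
    [0.8, 1.0, 0.5, 0.2],
    [0.4, 0.5, 1.0, 0.7],
    [0.1, 0.2, 0.7, 1.0],
]
positions = harmonic_scale(EXAMPLE_SIM)
print([round(p, 3) for p in positions])  # → [0.0, 0.312, 0.535, 1.0]
```

Speeches 1 and 2 end up between the poles in the order implied by their similarities, with no word-frequency model involved.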
Team QCRI-MIT at SemEval-2019 Task 4: Propaganda Analysis Meets Hyperpartisan News Detection
In this paper, we describe our submission to SemEval-2019 Task 4 on
Hyperpartisan News Detection. Our system relies on a variety of engineered
features originally used to detect propaganda. This is based on the assumption
that biased messages are propagandistic in the sense that they promote a
particular political cause or viewpoint. We trained a logistic regression model
with features ranging from simple bag-of-words to vocabulary richness and text
readability. Our system achieved 72.9% accuracy on the manually annotated test
data and 60.8% on the test data annotated via distant supervision. Additional
experiments showed that significant performance improvements can be achieved
with better feature pre-processing. Comment: Hyperpartisanship, propaganda, news media, fake news, SemEval-201
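The abstract names vocabulary richness and readability among the engineered features but does not give their formulas. A minimal sketch, assuming the type-token ratio as the richness measure and the Flesch reading-ease score (with a rough vowel-group syllable counter) as the readability measure; these are common stand-ins, not necessarily the team's exact features, and the function names are hypothetical. Feature dictionaries like this one could be combined with bag-of-words counts as input to a logistic regression.

```python
import re


def count_syllables(word):
    """Rough heuristic: number of vowel groups in the word (at least one)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def stylistic_features(text):
    """Return illustrative vocabulary-richness and readability features."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))          # avoid division by zero on empty text
    n_sentences = max(1, len(sentences))
    syllables = sum(count_syllables(w) for w in words)
    return {
        # Vocabulary richness: distinct word forms / total words.
        "type_token_ratio": len({w.lower() for w in words}) / n_words,
        # Flesch reading ease: higher means easier to read.
        "flesch_reading_ease": 206.835
        - 1.015 * (n_words / n_sentences)
        - 84.6 * (syllables / n_words),
    }


features = stylistic_features("The cat sat. The cat ran!")
print(features)
```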
Difficult forms: critical practices of design and research
As a kind of 'criticism from within', conceptual and critical design inquire into what design is about: how the market operates, what is considered 'good design', and how the design and development of technology typically work. Tracing the relations of conceptual and critical design to (post-)critical architecture and anti-design, we discuss a series of issues related to the operational and intellectual basis for 'critical practice', and how these might open up a new kind of development of the conceptual and theoretical frameworks of design. Rather than prescribing a practice from theoretical considerations, these critical practices seem to build an intellectual basis for design out of its own modes of operation: a kind of theoretical development that happens through, and from within, design practice, not by means of external descriptions or analyses of its practices and products.
Multilingual estimation of political-party positioning: From label aggregation to long-input Transformers
Scaling analysis is a technique in computational political science that
assigns a political actor (e.g., a politician or party) a score on a predefined
scale based on a (typically long) body of text (e.g., a parliamentary speech or
an election manifesto). For example, political scientists have often used the
left-right scale to systematically analyse political landscapes of different
countries. NLP methods for automatic scaling analysis can find broad
application provided they (i) are able to deal with long texts and (ii) work
robustly across domains and languages. In this work, we implement and compare
two approaches to automatic scaling analysis of political-party manifestos:
label aggregation, a pipeline strategy relying on annotations of individual
statements from the manifestos, and long-input-Transformer-based models, which
compute scaling values directly from raw text. We analyse the
Comparative Manifestos Project dataset, covering 41 countries and 27 languages, and
find that the task can be efficiently solved by state-of-the-art models, with
label aggregation producing the best results. Comment: Accepted to EMNLP 202
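Label aggregation can be pictured as a two-step pipeline: classify each manifesto statement, then turn the label counts into one position. A minimal sketch, assuming statements are already labelled with coarse "left", "right", or "other" categories and using a RILE-style score (percentage of right statements minus percentage of left statements); the paper's actual label set and aggregation formula may differ, and `aggregate_position` is a hypothetical helper.

```python
from collections import Counter


def aggregate_position(statement_labels):
    """Aggregate per-statement labels into one left-right score in [-100, 100].

    Negative values lean left, positive values lean right; statements
    labelled "other" count toward the total but toward neither side.
    """
    if not statement_labels:
        raise ValueError("need at least one labelled statement")
    counts = Counter(statement_labels)
    total = len(statement_labels)
    return 100.0 * (counts["right"] - counts["left"]) / total


score = aggregate_position(["left", "right", "right", "other"])
print(score)  # → 25.0
```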
The Ethical Need for Watermarks in Machine-Generated Language
Watermarks should be introduced in the natural language outputs of AI systems
in order to maintain the distinction between human and machine-generated text.
The ethical imperative to not blur this distinction arises from the asemantic
nature of large language models and from human projections of emotional and
cognitive states onto machines, possibly leading to manipulation, the spread of
falsehoods, or emotional distress. Enforcing this distinction requires
unintrusive, yet easily accessible marks of the machine origin. We propose to
implement a code based on equidistant letter sequences. While no such code
exists in human-written texts, its appearance in machine-generated ones would
prove helpful for ethical reasons.
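An equidistant letter sequence (ELS) is a word spelled by letters taken at a fixed skip through a text. A minimal detection sketch, assuming letters are counted after dropping spaces and punctuation and that the checker knows the marker word and a maximum skip; the marker "bot", the skip limit, and the function names are hypothetical, not the code proposed in the paper.

```python
def letters_only(text):
    """Keep alphabetic characters only, lower-cased (spaces/punctuation skipped)."""
    return [c.lower() for c in text if c.isalpha()]


def find_els(text, word, max_skip=50):
    """Return (start, skip) pairs where `word` occurs as an ELS in `text`.

    An ELS with start s and skip d places word[k] at letter index s + k * d.
    """
    letters = letters_only(text)
    target = word.lower()
    hits = []
    for skip in range(1, max_skip + 1):
        span = skip * (len(target) - 1)
        # The last letter of the sequence must still fall inside the text.
        for start in range(len(letters) - span):
            if all(letters[start + k * skip] == ch for k, ch in enumerate(target)):
                hits.append((start, skip))
    return hits


print(find_els("big old tree", "bot"))  # → [(0, 3)]
```

Because short words occur as ELS by chance in long texts, a practical scheme would presumably need a longer marker and a pre-agreed skip.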