4,182 research outputs found

Political Text Scaling Meets Computational Semantics

Authors: Glavas Goran; Nanni Federico; Ponzetto Simone Paolo; Rehbein Ines; Stuckenschmidt Heiner
Publication date: 01/01/2021

During the last fifteen years, automatic text scaling has become one of the key tools of the Text as Data community in political science. Prominent text scaling algorithms, however, rely on the assumption that latent positions can be captured just by leveraging the information about word frequencies in documents under study. We challenge this traditional view and present a new, semantically aware text scaling algorithm, SemScale, which combines recent developments in the area of computational linguistics with unsupervised graph-based clustering. We conduct an extensive quantitative analysis over a collection of speeches from the European Parliament in five different languages and from two different legislative terms, and show that a scaling approach relying on semantic document representations is often better at capturing known underlying political dimensions than the established frequency-based (i.e., symbolic) scaling method. We further validate our findings through a series of experiments focused on text preprocessing and feature selection, document representation, scaling of party manifestos, and a supervised extension of our algorithm. To catalyze further research on this new branch of text scaling methods, we release a Python implementation of SemScale with all included data sets and evaluation procedures.

Comment: Updated version - accepted for Transactions on Data Science (TDS

arXiv.org e-Print Archive

MAnnheim DOCument Server

Team QCRI-MIT at SemEval-2019 Task 4: Propaganda Analysis Meets Hyperpartisan News Detection

Authors: Baly Ramy; Barrón-Cedeño Alberto; Glass James; Da San Martino Giovanni; Mohtarami Mitra; Nakov Preslav; Saleh Abdelrhman
Publication date: 01/01/2019

In this paper, we describe our submission to SemEval-2019 Task 4 on Hyperpartisan News Detection. Our system relies on a variety of engineered features originally used to detect propaganda. This is based on the assumption that biased messages are propagandistic in the sense that they promote a particular political cause or viewpoint. We trained a logistic regression model with features ranging from simple bag-of-words to vocabulary richness and text readability features. Our system achieved 72.9% accuracy on the test data that is annotated manually and 60.8% on the test data that is annotated with distant supervision. Additional experiments showed that significant performance improvements can be achieved with better feature pre-processing.

Comment: Hyperpartisanship, propaganda, news media, fake news, SemEval-201

arXiv.org e-Print Archive

Crossref

Difficult forms: critical practices of design and research

Authors: Mazé Ramia; Redström Johan
Publication venue: IASDR / The Hong Kong Polytechnic University, School of Design
Publication date: 01/01/2007

As a kind of 'criticism from within', conceptual and critical design inquire into what design is about – how the market operates, what is considered 'good design', and how the design and development of technology typically works. Tracing relations of conceptual and critical design to (post-)critical architecture and anti-design, we discuss a series of issues related to the operational and intellectual basis for 'critical practice', and how these might open up for a new kind of development of the conceptual and theoretical frameworks of design. Rather than prescribing a practice on the basis of theoretical considerations, these critical practices seem to build an intellectual basis for design on the basis of its own modes of operation, a kind of theoretical development that happens through, and from within, design practice and not by means of external descriptions or analyses of its practices and products.

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Software institutes' Online Digital Archive

Multilingual estimation of political-party positioning: From label aggregation to long-input Transformers

Authors: Ceron Tanise; Nikolaev Dmitry; Padó Sebastian
Publication date: 19/10/2023

Scaling analysis is a technique in computational political science that assigns a political actor (e.g. politician or party) a score on a predefined scale based on a (typically long) body of text (e.g. a parliamentary speech or an election manifesto). For example, political scientists have often used the left-right scale to systematically analyse political landscapes of different countries. NLP methods for automatic scaling analysis can find broad application provided they (i) are able to deal with long texts and (ii) work robustly across domains and languages. In this work, we implement and compare two approaches to automatic scaling analysis of political-party manifestos: label aggregation, a pipeline strategy relying on annotations of individual statements from the manifestos, and long-input-Transformer-based models, which compute scaling values directly from raw text. We carry out the analysis of the Comparative Manifestos Project dataset across 41 countries and 27 languages and find that the task can be efficiently solved by state-of-the-art models, with label aggregation producing the best results.

Comment: Accepted to EMNLP 202

arXiv.org e-Print Archive

The Ethical Need for Watermarks in Machine-Generated Language

Authors: Adomaitis Laurynas; Grinbaum Alexei
Publication date: 01/01/2022

Watermarks should be introduced in the natural language outputs of AI systems in order to maintain the distinction between human and machine-generated text. The ethical imperative to not blur this distinction arises from the asemantic nature of large language models and from human projections of emotional and cognitive states on machines, possibly leading to manipulation, spreading falsehoods or emotional distress. Enforcing this distinction requires unintrusive, yet easily accessible marks of the machine origin. We propose to implement a code based on equidistant letter sequences. While no such code exists in human-written texts, its appearance in machine-generated ones would prove helpful for ethical reasons.

arXiv.org e-Print Archive

HAL-CEA

Text mining for central banks: handbook

Authors: Bholat David; Hansen Stephen; Santos Pedro; Schonhardt-Bailey Cheryl
Publication venue: Bank of England
Publication date: 01/01/2015

LSE Research Online