6 research outputs found

    Unsupervised cross-lingual scaling of political texts

    Political Text Scaling Meets Computational Semantics

    During the last fifteen years, automatic text scaling has become one of the key tools of the Text as Data community in political science. Prominent text scaling algorithms, however, rely on the assumption that latent positions can be captured just by leveraging the information about word frequencies in documents under study. We challenge this traditional view and present a new, semantically aware text scaling algorithm, SemScale, which combines recent developments in the area of computational linguistics with unsupervised graph-based clustering. We conduct an extensive quantitative analysis over a collection of speeches from the European Parliament in five different languages and from two different legislative terms, and show that a scaling approach relying on semantic document representations is often better at capturing known underlying political dimensions than the established frequency-based (i.e., symbolic) scaling method. We further validate our findings through a series of experiments focused on text preprocessing and feature selection, document representation, scaling of party manifestos, and a supervised extension of our algorithm. To catalyze further research on this new branch of text scaling methods, we release a Python implementation of SemScale with all included data sets and evaluation procedures.
    Comment: Updated version - accepted for Transactions on Data Science (TDS)

    Highlighting supranational institutions? An automated analysis of EU politicisation (2002–2017)

    This article uses automated text analyses to examine EU politicisation in the media of six Eurozone countries (Belgium, Germany, Greece, Ireland, Portugal and Spain) between 2002 and 2017. By contrasting creditor and debtor countries, the article analyses how the Eurozone crisis affected the politicisation of the EU and its institutions, using a unique dataset of 165,341 articles from 12 newspapers. The results show that the Eurozone crisis increased the politicisation of the EU, particularly in the countries that were at the forefront of the Eurozone bailouts. Importantly, the crisis also contributed to a more multifaceted news coverage of the European Union, with greater emphasis given to supranational institutions vis-à-vis intergovernmental ones. Yet this supranational coverage was associated with an increasingly negative tone of articles. To that extent, this study shows that greater mention of EU institutions may not necessarily contribute to a Europeanisation of public debates. Supplemental data for this article can be accessed online at: https://doi.org/10.1080/01402382.2021.1910778

    Data from the paper: Unsupervised Cross-Lingual Scaling of Political Texts

    Code and data from the paper "Unsupervised Cross-Lingual Scaling of Political Texts", presented at EACL 2017