31 research outputs found

    Transfer Topic Labeling with Domain-Specific Knowledge Base: An Analysis of UK House of Commons Speeches 1935-2014

    Topic models are widely used in natural language processing, allowing researchers to estimate the underlying themes in a collection of documents. Most topic models use unsupervised methods and hence require the additional step of attaching meaningful labels to estimated topics. This process of manual labeling is not scalable and suffers from human bias. We present a semi-automatic transfer topic labeling method that seeks to remedy these problems. Domain-specific codebooks form the knowledge base for automated topic labeling. We demonstrate our approach with a dynamic topic model analysis of the complete corpus of UK House of Commons speeches 1935-2014, using the coding instructions of the Comparative Agendas Project to label topics. We show that our method works well for a majority of the topics we estimate, but we also find that institution-specific topics, in particular on subnational governance, require manual input. We validate our results using human expert coding.
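
    The idea of labeling an estimated topic from a codebook can be illustrated with a minimal sketch. This is not the authors' method: it simply assumes that each codebook category has a short text description and assigns the category whose description has the highest bag-of-words cosine similarity to the topic's top words. The names `label_topic`, `CODEBOOK`, and `min_score`, as well as the toy category descriptions, are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def label_topic(top_words, codebook, min_score=0.2):
    """Return (label, score); label is None when no category is close enough.

    min_score is an assumed cut-off below which the topic is left
    for manual labeling.
    """
    topic_vec = Counter(w.lower() for w in top_words)
    scores = {
        label: cosine(topic_vec, Counter(tokenize(description)))
        for label, description in codebook.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] < min_score:
        return None, scores[best]
    return best, scores[best]


# Toy codebook (hypothetical descriptions, not Comparative Agendas Project text).
CODEBOOK = {
    "Health": "health care hospitals doctors nurses medical insurance",
    "Defence": "defence armed forces military weapons army navy",
    "Education": "education schools teachers universities students",
}

print(label_topic(["hospitals", "nurses", "doctors", "patients"], CODEBOOK))
print(label_topic(["scotland", "devolution", "assembly", "wales"], CODEBOOK))
```

    In this toy setup the first topic matches the "Health" category, while the second shares no vocabulary with any description and is returned unlabeled, which loosely mirrors the abstract's point that institution-specific topics may need manual input.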

    Analysis of Computational Science Papers from ICCS 2001-2016 using Topic Modeling and Graph Theory

    This paper presents results of topic modeling and network models of topics using the International Conference on Computational Science corpus, which contains domain-specific (computational science) papers over sixteen years (a total of 5695 papers). We discuss the topical structures of the International Conference on Computational Science, how these topics evolve over time in response to the topicality of various problems, technologies and methods, and how all these topics relate to one another. This analysis illustrates multidisciplinary research and collaborations among scientific communities by constructing static and dynamic networks from the topic modeling results and the keywords of authors. The results of this study give insights about the past and future trends of core discussion topics in computational science. We used the Non-negative Matrix Factorization topic modeling algorithm to discover topics, and we labeled and grouped the results hierarchically. Comment: Accepted by International Conference on Computational Science (ICCS) 2017 which will be held in Zurich, Switzerland from June 11-June 1
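
    One way to picture a static topic network is sketched below. It is an illustrative assumption rather than the paper's procedure: given per-document topic weights (such as the rows of a document-topic matrix from a factorization like NMF), two topics are linked whenever both exceed a threshold in the same document, and the edge weight counts such documents. The function `topic_cooccurrence`, the `threshold` value, and the toy matrix `DOC_TOPIC` are hypothetical.

```python
from itertools import combinations


def topic_cooccurrence(doc_topic, threshold=0.2):
    """Count, for each topic pair, the documents in which both are active.

    doc_topic is a list of rows, one per document, holding topic weights.
    A topic counts as active in a document when its weight is at least
    `threshold` (an assumed cut-off).
    """
    edges = {}
    for weights in doc_topic:
        active = [k for k, w in enumerate(weights) if w >= threshold]
        for i, j in combinations(active, 2):
            edges[(i, j)] = edges.get((i, j), 0) + 1
    return edges


# Toy document-topic weights for four documents and three topics.
DOC_TOPIC = [
    [0.6, 0.3, 0.1],
    [0.5, 0.0, 0.5],
    [0.1, 0.45, 0.45],
    [0.5, 0.4, 0.1],
]

print(topic_cooccurrence(DOC_TOPIC))
```

    Running the same function on documents grouped by year would give one network per period, which is one simple way to obtain a dynamic view.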

    Event-based Access to Historical Italian War Memoirs

    The progressive digitization of historical archives provides new, often domain-specific, textual resources that report on facts and events which have happened in the past; among these, memoirs are a very common type of primary source. In this paper, we present an approach for extracting information from Italian historical war memoirs and turning it into structured knowledge. This is based on the semantic notions of events, participants and roles. We quantitatively evaluate each of the key steps of our approach and provide a graph-based representation of the extracted knowledge, which allows readers to move between a Close and a Distant Reading of the collection. Comment: 23 pages, 6 figures

    Scalable Cross-lingual Document Similarity through Language-specific Concept Hierarchies

    With the ongoing growth in the number of digital articles across a widening set of languages and the expanding use of different languages, we need annotation methods that enable browsing multilingual corpora. Multilingual probabilistic topic models have recently emerged as a group of semi-supervised machine learning models that can be used to perform thematic explorations on collections of texts in multiple languages. However, these approaches require theme-aligned training data to create a language-independent space. This constraint limits the number of scenarios to which the technique can be applied and makes it difficult to scale up to situations where a huge collection of multilingual documents is required during the training phase. This paper presents an unsupervised document similarity algorithm that does not require parallel or comparable corpora, or any other type of translation resource. The algorithm annotates topics automatically created from documents in a single language with cross-lingual labels and describes documents by hierarchies of multilingual concepts from independently trained models. Experiments performed on the English, Spanish and French editions of the JRC-Acquis corpus reveal promising results on classifying and sorting documents by similar content. Comment: Accepted at the 10th International Conference on Knowledge Capture (K-CAP 2019)

    Two Computational Models for Analyzing Political Attention in Social Media

    Understanding how political attention is divided and over what subjects is crucial for research on areas such as agenda setting, framing, and political rhetoric. However, existing methods for measuring attention, such as manual labeling according to established codebooks, are expensive and restrictive. We describe two computational models that automatically distinguish topics in politicians’ social media content. Our models, one supervised classifier and one unsupervised topic model, provide different benefits. The supervised classifier reduces the labor required to classify content according to pre-determined topic lists. However, tweets do more than communicate policy positions. Our unsupervised model uncovers both political topics and other Twitter uses (e.g., constituent service). Together, these models are effective, inexpensive computational tools for political communication and social media research. We demonstrate their utility and discuss the different analyses they afford by applying both models to the tweets posted by members of the 115th U.S. Congress. This material is based upon work supported by the National Science Foundation under Grant No. 1822228.

    Speaking in unison? Explaining the role of agenda-setter constellations in the ECB policy agenda using a network-based approach

    Policy agendas are a well-studied institutional-level phenomenon that captures the set of policy issues that an institution pays attention to over time. They are emergent in nature in that individual behaviour shapes institutional-level outcomes when policy makers allocate attention to policy issues. To examine the link between individual-level actions and system-level outcomes, we introduce the concept of the agenda-setting constellation, defined as a group of policy makers paying attention to a set of policy issues. Taking the European Central Bank as a case study, and using a combination of text-analysis and network-analysis techniques, we demonstrate how these meso-level structures shape the evolving policy agenda. We then examine the roles of personal experience, institutional constraints, and policy context in driving agenda-setter constellation membership. Our results show the value of studying policy agendas as networked processes and the key role that agenda-setter constellations play in driving policy agenda dynamics.