Search CORE

11 research outputs found

Transfer Topic Labeling with Domain-Specific Knowledge Base: An Analysis of UK House of Commons Speeches 1935-2014

Author: Herzog Alexander
John Peter
Mikhaylov Slava Jankin
Publication date: 03/06/2018

Topic models are widely used in natural language processing, allowing researchers to estimate the underlying themes in a collection of documents. Most topic models use unsupervised methods and hence require the additional step of attaching meaningful labels to estimated topics. This process of manual labeling is not scalable and suffers from human bias. We present a semi-automatic transfer topic labeling method that seeks to remedy these problems. Domain-specific codebooks form the knowledge-base for automated topic labeling. We demonstrate our approach with a dynamic topic model analysis of the complete corpus of UK House of Commons speeches 1935-2014, using the coding instructions of the Comparative Agendas Project to label topics. We show that our method works well for a majority of the topics we estimate; but we also find that institution-specific topics, in particular on subnational governance, require manual input. We validate our results using human expert coding.

arXiv.org e-Print Archive

University of Birmingham Research Portal

Two Computational Models for Analyzing Political Attention in Social Media

Author: Hemphill Libby
Schöpke-Gonzalez Angela
Publication date: 01/01/2019

Understanding how political attention is divided and over what subjects is crucial for research on areas such as agenda setting, framing, and political rhetoric. However, existing methods for measuring attention, such as manual labeling according to established codebooks, are expensive and restrictive. We describe two computational models that automatically distinguish topics in politicians’ social media content. Our models - one supervised classifier and one unsupervised topic model - provide different benefits. The supervised classifier reduces the labor required to classify content according to pre-determined topic lists. However, tweets do more than communicate policy positions. Our unsupervised model uncovers both political topics and other Twitter uses (e.g., constituent service). Together, these models are effective, inexpensive computational tools for political communication and social media research. We demonstrate their utility and discuss the different analyses they afford by applying both models to the tweets posted by members of the 115th U.S. Congress. This material is based upon work supported by the National Science Foundation under Grant No. 1822228.

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Deep Blue Documents at the University of Michigan

Multi-Dimensional Explanation of Target Variables from Documents

Author: Antognini Diego
Faltings Boi
Musat Claudiu
Publication date: 21/12/2020

Automated predictions require explanations to be interpretable by humans. Past work used attention and rationale mechanisms to find words that predict the target variable of a document. Often though, they result in a tradeoff between noisy explanations and a drop in accuracy. Furthermore, rationale methods cannot capture the multi-faceted nature of justifications for multiple targets, because of the non-probabilistic nature of the mask. In this paper, we propose the Multi-Target Masker (MTM) to address these shortcomings. The novelty lies in the soft multi-dimensional mask that models a relevance probability distribution over the set of target variables to handle ambiguities. Additionally, two regularizers guide MTM to induce long, meaningful explanations. We evaluate MTM on two datasets and show, using standard metrics and human annotations, that the resulting masks are more accurate and coherent than those generated by the state-of-the-art methods. Moreover, MTM is the first to also achieve the highest F1 scores for all the target variables simultaneously. Comment: Accepted in AAAI 2021. 18 pages, 14 figures, 9 tables

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Association for the Advancement of Artificial Intelligence: AAAI Publications

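Propuesta de aprendizaje ontológico a partir de datos textuales que aporte a la construcción del carácter adaptativo de una conceptualización unificadora y formal del dominio de liderazgo [Proposal for ontology learning from textual data that contributes to building the adaptive character of a unifying and formal conceptualization of the leadership domain]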

Author: Gómez Carrillo Carolina
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2022

Leadership is an important area of research that has been approached from different perspectives. This has generated multiple explanations or interpretations, leading to overlap and ambiguity in the constructed information. Consequently, it has been difficult to arrive at a unifying, adequately constructed, concise and unambiguous conceptualization that provides a global understanding of this area. This work presents an ontology learning proposal based on textual data to establish the vocabulary and concepts of the leadership domain. In this way, it contributes to the construction of the adaptive character of a unifying and formal conceptualization of this domain, one that allows identifying and expressing the changes that the domain experiences.

Degree: Master's (Magíster en Investigación Operativa y Estadística)

Contents
Chapter 1. Introduction
Chapter 2. Project description
2.1. Research problem statement and justification
Chapter 3. Frame of reference
3.1. Background and state of the art
3.1.1. Leadership
3.1.2. Schematization of leadership
3.1.3. Ontology learning
3.1.4. Topic models
Chapter 4. Objectives
General objective
Specific objectives
Chapter 5. Methodology
5.1. Phase I: Preparation of textual data
5.1.1. Document preprocessing
5.2. Phase II: Evaluation of corpus labeling
5.3. Phase III: Construction and evaluation of a vocabulary of relevant terms
5.3.1. Vocabulary construction and evaluation
5.3.2. Sentence classification
5.4. Phase IV: Concepts
5.4.1. Topic models
5.4.2. Gold-standard-based evaluation
5.4.3. Topic coherence (TC)
5.4.4. Document classification
Chapter 6. Results and discussion
6.1. Phase I: Preprocessing
6.2. Phase II: Evaluation of the labels
6.3. Phase III: Terms - Vocabulary construction and evaluation
6.3.1. Gold-standard-based evaluation
6.3.2. Sentence classification
6.4. Phase IV: Concepts
6.4.1. Evaluation of ontological structures
Chapter 7. Conclusions
References

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio academico de la Universidad Tecnológica de Pereira

Macro-micro approach for mining public sociopolitical opinion from social media

Author: Wang Bo

During the past decade, we have witnessed the emergence of social media, which has gained prominence as a means for the general public to exchange opinions towards a broad range of topics. Furthermore, its social and temporal dimensions make it a rich resource for policy makers and organisations to understand public opinion. In this thesis, we present our research in understanding public opinion on Twitter along three dimensions: sentiment, topics and summary. In the first line of our work, we study how to classify public sentiment on Twitter. We focus on the task of multi-target-specific sentiment recognition on Twitter, and propose an approach which utilises the syntactic information from the parse tree in conjunction with the left-right context of the target. We show state-of-the-art performance on two datasets, including a multi-target Twitter corpus on UK elections which we make publicly available for the research community. Additionally, we conduct two preliminary studies: cross-domain emotion classification on discourse around arts and cultural experiences, and social spam detection to improve the signal-to-noise ratio of our sentiment corpus. Our second line of work focuses on automatic topical clustering of tweets. Our aim is to group tweets into a number of clusters, with each cluster representing a meaningful topic, story, event or a reason behind a particular choice of sentiment. We explore various ways of tackling this challenge and propose a two-stage hierarchical topic modelling system that is efficient and effective in achieving our goal. Lastly, for our third line of work, we study the task of summarising tweets on common topics, with the goal of providing informative summaries for real-world events/stories or explanations underlying the sentiment expressed towards an issue/entity. As most existing tweet summarisation approaches rely on extractive methods, we propose to apply a state-of-the-art neural abstractive summarisation model to tweets. We also tackle the challenge of cross-medium supervised summarisation with no target-medium training resources. To the best of our knowledge, there is no existing work on studying neural abstractive summarisation on tweets. In addition, we present a system for providing interactive visualisation of topic-entity sentiments and the corresponding summaries in chronological order. Throughout the work presented in this thesis, we conduct experiments to evaluate and verify the effectiveness of our proposed models, comparing them to relevant baseline methods. Most of our evaluations are quantitative; however, we do perform qualitative analyses where appropriate. This thesis provides insights and findings that can be used for better understanding public opinion in social media.

Warwick Research Archives Portal Repository

A graph theoretical perspective for the unsupervised clustering of free text corpora

Author: Altuncu Muhammed Tarık
Publication venue: Mathematics, Imperial College London
Publication date: 01/09/2021

This thesis introduces a robust end-to-end topic discovery framework that extracts a set of coherent topics stemming intrinsically from document similarities. Some topic clustering methods can support embedded vectors instead of the traditional Bag-of-Words (BoW) representation. Some do not require the number of topics as a hyperparameter, and others can extract a multi-scale relation between topics. However, no topic clustering method supports all these properties together. This thesis addresses this gap in the literature by designing a framework that supports any type of document-level features, especially embedded vectors. This framework does not require any uninformed decisions about the underlying data, such as the number of topics; instead, it extracts topics at multiple resolutions. To achieve this goal, we combine existing methods from natural language processing (NLP) for feature generation and from graph theory, first for graph construction based on semantic document similarities, then for graph partitioning to extract corresponding topics at multiple resolutions. Finally, we use specific methods from statistical machine learning to obtain highly generalisable supervised models and deploy them as topic classifiers for real-time topic extraction. Our applications on both a noisy and specialised corpus of medical records (i.e., descriptions of patient incidents within the NHS) and public news articles in everyday language show that our framework extracts coherent topics that have better quantitative benchmark scores than other methods in most cases. The resulting multi-scale topics in both applications enable us to capture specific details more easily and choose the relevant resolutions for the specific objective. This study contributes to the topic clustering literature by introducing a novel graph theoretical perspective that provides a combination of new properties. These properties are multiple resolutions, independence from uninformed decisions about the corpus, and usage of recent NLP features, such as vector embeddings. Open Access

Spiral - Imperial College Digital Repository