7,417 research outputs found

    Topic Modelling of Everyday Sexism Project Entries

    The Everyday Sexism Project documents everyday examples of sexism reported by volunteer contributors from all around the world. It collected 100,000 entries in 13+ languages within the first 3 years of its existence. The content of reports in various languages submitted to Everyday Sexism is a valuable source of crowdsourced information with great potential for feminist and gender studies. In this paper, we take a computational approach to analyze the content of reports. We use topic-modelling techniques to extract emerging topics and concepts from the reports, and to map the semantic relations between those topics. The resulting picture closely resembles and adds to that arrived at through qualitative analysis, showing that this form of topic modelling could be useful for sifting through datasets that have not previously been subject to any analysis. More precisely, we derive a map of topics for two different resolutions of our topic model and discuss the connections between the identified topics. In the low-resolution picture, for instance, we found Public space/Street, Online, Work related/Office, Transport, School, Media harassment, and Domestic abuse. Among these, the strongest connection is between Public space/Street harassment and Domestic abuse and sexism in personal relationships. The strength of the relationships between topics illustrates the fluid and ubiquitous nature of sexism, with no single experience being unrelated to another. Comment: preprint, under review

    Dirichlet belief networks for topic structure learning

    Recently, considerable research effort has been devoted to developing deep architectures for topic models to learn topic structures. Although several deep models have been proposed to learn better topic proportions of documents, how to leverage the benefits of deep structures for learning word distributions of topics has not yet been rigorously studied. Here we propose a new multi-layer generative process on word distributions of topics, where each layer consists of a set of topics and each topic is drawn from a mixture of the topics of the layer above. As the topics in all layers can be directly interpreted by words, the proposed model is able to discover interpretable topic hierarchies. As a self-contained module, our model can be flexibly adapted to different kinds of topic models to improve their modelling accuracy and interpretability. Extensive experiments on text corpora demonstrate the advantages of the proposed model. Comment: accepted in NIPS 201
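
The multi-layer generative process described in this abstract can be illustrated with a minimal sketch. The abstract only states that each topic is drawn from a mixture of the topics in the layer above; the Dirichlet draw, the gamma-distributed mixing weights, the hyperparameters `eta` and `gamma`, and every function name below are assumptions made for illustration, not the paper's specification.

```python
import random

TINY = 1e-12  # numerical floor so that parameters and draws stay strictly positive


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet(alpha) sample by normalising gamma variates."""
    draws = [max(rng.gammavariate(max(a, TINY), 1.0), TINY) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_topic_hierarchy(vocab_size, layer_sizes, eta=0.5, gamma=1.0, seed=0):
    """Sample word distributions of topics layer by layer, top layer first.

    layer_sizes[0] is the number of topics in the top layer. The values of
    eta and gamma, and the gamma-distributed weights, are assumptions.
    """
    rng = random.Random(seed)
    # Top layer: topics drawn from a symmetric Dirichlet over the vocabulary.
    top = [sample_dirichlet([eta] * vocab_size, rng) for _ in range(layer_sizes[0])]
    layers = [top]
    for n_topics in layer_sizes[1:]:
        above = layers[-1]
        current = []
        for _ in range(n_topics):
            # Non-negative weights: how much each topic above contributes.
            weights = [rng.gammavariate(gamma, 1.0) for _ in above]
            # Dirichlet parameter = weighted mixture of the topics above.
            concentration = [
                sum(w * parent[v] for w, parent in zip(weights, above))
                for v in range(vocab_size)
            ]
            current.append(sample_dirichlet(concentration, rng))
        layers.append(current)
    return layers


layers = sample_topic_hierarchy(vocab_size=6, layer_sizes=[2, 3, 4], seed=1)
```

Because every topic in every layer is a distribution over the same vocabulary, each one can be read off directly as a ranked word list, which is the interpretability property the abstract highlights.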

    Tweeting your Destiny: Profiling Users in the Twitter Landscape around an Online Game

    Social media has become a major communication channel for communities centered around video games. Consequently, social media offers a rich data source to study online communities and the discussions evolving around games. Towards this end, we explore a large-scale dataset consisting of over 1 million tweets related to the online multiplayer shooter Destiny and spanning a time period of about 14 months using unsupervised clustering and topic modelling. Furthermore, we correlate Twitter activity of over 3,000 players with their playtime. Our results contribute to the understanding of online player communities by identifying distinct player groups with respect to their Twitter characteristics, describing subgroups within the Destiny community, and uncovering broad topics of community interest. Comment: Accepted at IEEE Conference on Games 201

    Spatio-temporal distribution analysis of brand interest in social networks

    Social network applications such as Facebook and Twitter have become part of many people’s lives and are used daily by millions of users. On such platforms, users share their emotions, opinions, experiences, and thoughts. Twitter, in particular, is used to discuss diverse topics, including brands and their products and services. In this thesis, we analyse how brand interest is reflected on Twitter and how this platform can be used to monitor what people say about specific brands, as an indicator of brand interest. Brand interest can be defined as the level of interest one has in a brand, and the level of curiosity one has to learn more about a brand. For this work, the volume of tweets is used as a measure of brand interest. Our methodology is based on time, location, and the number of brand-related tweets to perform a spatio-temporal analysis. Additionally, we propose a framework for discovering latent patterns (topics) in a large dataset of grouped short messages to analyse brand interest, using Twitter as a data source. We applied a well-known Text Mining technique called Topic Modelling, an unsupervised learning technique for text data that uncovers topics in a collection of documents. This technique provides a convenient way to retrieve information from unstructured text. Topic Modelling has been applied to track events and trends and to uncover topics in domains such as academia, public health, and marketing. The framework consists of training LDA (Latent Dirichlet Allocation) topic models on aggregated tweets, and then applying the model to different documents, also composed of grouped Twitter posts. Furthermore, we describe a set of pre-processing tasks that helped to improve the performance of the topic models, enabling us to obtain better output and thus a better analysis.
The experiments demonstrated that Topic Modelling can successfully track people’s discussions on Social Networks even in massive datasets such as the one used in the current work, and capture topics triggered by real-life events.

Nowadays, platforms such as Twitter and Facebook are part of many people’s daily lives and are used by millions of users. On these platforms, known as Social Networks, users share information including opinions, feelings, experiences, and thoughts. Twitter, in particular, is used to discuss diverse topics, which may include discussions about brands and their products and/or services. The present study analyses how interest in a brand is reflected on Twitter and presents a methodology for using Twitter as an information source to monitor what users say about particular brands. Brand interest can be defined as the level of interest an individual has in a brand, and the level of curiosity that leads an individual to learn more about that brand. In this study, the number of published tweets is used to measure interest in the chosen brands. The methodology relies on the date a tweet was published, its location, and the number of posts to perform a spatio-temporal analysis. Additionally, we present a framework for exploring a vast dataset, with the aim of revealing latent patterns and analysing interest in the selected brands, using Twitter as a data source. To this end, we applied Topic Modelling, a Text Mining technique widely used to discover topics in unstructured text. Topic Modelling algorithms have been widely used to monitor events and trends and to discover topics in areas such as education, marketing, and health, among others.
The framework consists of training the LDA (Latent Dirichlet Allocation) topic model on grouped tweets (according to a given criterion) and then applying the trained model to another set of grouped tweets (according to a different criterion). We describe a set of data pre-processing tasks that helped to improve the performance of the models, obtain better results and, consequently, carry out a better analysis. The experiments reveal that Topic Modelling makes it possible to track discussions among Social Network users over a long period of time, and to capture changes related to real-life events.
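
The grouping step of this framework, in which tweets are pooled into longer documents under one criterion for training and under another for applying the model, can be sketched as follows. This is only an illustration: the record fields (`text`, `day`, `city`), the whitespace tokeniser, and the helper name `pool_tweets` are hypothetical, and fitting the LDA model itself is left to a topic-modelling library and is not shown.

```python
from collections import Counter, defaultdict


def pool_tweets(tweets, key):
    """Group tweets by `key` and merge each group into one bag-of-words document."""
    pooled = defaultdict(Counter)
    for tweet in tweets:
        # Naive lowercase/whitespace tokenisation; real pre-processing would do more.
        tokens = tweet["text"].lower().split()
        pooled[tweet[key]].update(tokens)
    return dict(pooled)


# Hypothetical records; the field names are assumptions for illustration.
tweets = [
    {"text": "new phone launch today", "day": "2017-01-01", "city": "Lisbon"},
    {"text": "phone battery is great", "day": "2017-01-01", "city": "Porto"},
    {"text": "store queue for launch", "day": "2017-01-02", "city": "Lisbon"},
]

train_docs = pool_tweets(tweets, key="day")   # documents a topic model would be fitted on
apply_docs = pool_tweets(tweets, key="city")  # documents the fitted model would be applied to
```

Pooling short messages into larger documents gives the topic model more word co-occurrence evidence per document than individual tweets would provide.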