Topic Modelling of Everyday Sexism Project Entries
The Everyday Sexism Project documents everyday examples of sexism reported by
volunteer contributors from all around the world. It collected 100,000 entries
in 13+ languages within the first 3 years of its existence. The content of
reports in various languages submitted to Everyday Sexism is a valuable source
of crowdsourced information with great potential for feminist and gender
studies. In this paper, we take a computational approach to analyze the content
of reports. We use topic-modelling techniques to extract emerging topics and
concepts from the reports, and to map the semantic relations between those
topics. The resulting picture closely resembles and adds to that arrived at
through qualitative analysis, showing that this form of topic modeling could be
useful for sifting through datasets that had not previously been subject to any
analysis. More precisely, we come up with a map of topics for two different
resolutions of our topic model and discuss the connection between the
identified topics. In the low resolution picture, for instance, we found Public
space/Street, Online, Work related/Office, Transport, School, Media harassment,
and Domestic abuse. Among these, the strongest connection is between Public
space/Street harassment and Domestic abuse and sexism in personal
relationships. The strength of the relationships between topics illustrates the
fluid and ubiquitous nature of sexism, with no single experience being
unrelated to another.
Comment: preprint, under revie
Dirichlet belief networks for topic structure learning
Recently, considerable research effort has been devoted to developing deep
architectures for topic models to learn topic structures. Although several deep
models have been proposed to learn better topic proportions of documents, how
to leverage the benefits of deep structures for learning word distributions of
topics has not yet been rigorously studied. Here we propose a new multi-layer
generative process on word distributions of topics, where each layer consists
of a set of topics and each topic is drawn from a mixture of the topics of the
layer above. As the topics in all layers can be directly interpreted by words,
the proposed model is able to discover interpretable topic hierarchies. As a
self-contained module, our model can be flexibly adapted to different kinds of
topic models to improve their modelling accuracy and interpretability.
Extensive experiments on text corpora demonstrate the advantages of the
proposed model.
Comment: accepted in NIPS 201
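The layered generative process described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract says each topic is drawn from a mixture of the topics in the layer above, but the Dirichlet draws, the gamma-distributed mixing weights, and the names `sample_topic_hierarchy`, `prior`, `concentration`, and `floor` are choices made here for the sketch, not details taken from the paper.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet(alpha) sample by normalising gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_topic_hierarchy(vocab_size, topics_per_layer, seed=0,
                           prior=0.5, concentration=10.0, floor=0.01):
    """Sample word distributions of topics, layer by layer (top layer first).

    Assumed for illustration: symmetric Dirichlet prior on the top layer,
    gamma(1, 1) mixing weights, and a Dirichlet draw centred on each mixture.
    """
    rng = random.Random(seed)
    # Top layer: topics drawn from a symmetric Dirichlet over the vocabulary.
    layers = [[sample_dirichlet([prior] * vocab_size, rng)
               for _ in range(topics_per_layer[0])]]
    for n_topics in topics_per_layer[1:]:
        above = layers[-1]
        layer = []
        for _ in range(n_topics):
            # Non-negative weights saying how much each parent topic contributes.
            weights = [rng.gammavariate(1.0, 1.0) for _ in above]
            # Weighted mixture of the parent topics, used as Dirichlet parameter.
            # The floor keeps every parameter strictly positive.
            mixture = [
                floor + concentration * sum(w * topic[v]
                                            for w, topic in zip(weights, above))
                for v in range(vocab_size)
            ]
            layer.append(sample_dirichlet(mixture, rng))
        layers.append(layer)
    return layers


# Three layers with 2, 3 and 4 topics over a 6-word vocabulary.
hierarchy = sample_topic_hierarchy(vocab_size=6, topics_per_layer=[2, 3, 4], seed=1)
```

Every topic in every layer is a distribution over the same vocabulary, which is why each layer can be read directly in terms of words.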
Tweeting your Destiny: Profiling Users in the Twitter Landscape around an Online Game
Social media has become a major communication channel for communities
centered around video games. Consequently, social media offers a rich data
source to study online communities and the discussions evolving around games.
Towards this end, we explore a large-scale dataset consisting of over 1 million
tweets related to the online multiplayer shooter Destiny and spanning a time
period of about 14 months using unsupervised clustering and topic modelling.
Furthermore, we correlate Twitter activity of over 3,000 players with their
playtime. Our results contribute to the understanding of online player
communities by identifying distinct player groups with respect to their Twitter
characteristics, describing subgroups within the Destiny community, and
uncovering broad topics of community interest.
Comment: Accepted at IEEE Conference on Games 201
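Correlating Twitter activity with playtime, as mentioned in this abstract, could look something like the sketch below. The abstract does not say which correlation measure was used; Pearson's coefficient and the function name `pearson` are assumptions made here purely for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)
```

In use, `xs` would hold one activity figure per player (for example a tweet count) and `ys` the matching playtime, in the same player order.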
Spatio-temporal distribution analysis of brand interest in social networks
Social network applications such as Facebook and Twitter have become part of many people’s
lives and are used daily by millions of users. On such platforms, users share their emotions,
opinions, experiences, and thoughts. Twitter, in particular, is used to discuss diverse topics,
including brands and their products and services. In this thesis, we analyse how brand interest is
reflected on Twitter and how this platform can be used to monitor what people say about specific
brands, as an indicator of brand interest. Brand interest can be defined as the level of interest
one has in a brand, and the level of curiosity one has to learn more about a brand. For this work,
the volume of tweets is used as a measure of brand interest. Our methodology is based on time,
location, and the number of brand-related tweets to perform a spatio-temporal analysis.
Additionally, we propose a framework for discovering latent patterns (topics) in a large
dataset of grouped short messages to analyse brand interest, using Twitter as a data source. We
applied a well-known text mining technique called topic modelling, an unsupervised
learning technique for text data that uncovers topics in a collection
of documents. This technique provides a convenient way to retrieve information from unstructured
text. Topic modelling has been applied to track events and trends and to uncover topics
in domains such as academia, public health, and marketing. The framework consists of
training LDA (Latent Dirichlet Allocation) topic models on aggregated tweets and then
applying the model to different documents, also composed of grouped Twitter posts. Furthermore,
we describe a set of pre-processing tasks that helped to improve the performance of the topic
models, yielding better output and therefore a better analysis. The experiments
demonstrated that topic modelling can successfully track people’s discussions on social
networks, even in massive datasets such as the one used in the current work, and capture those
topics triggered by real-life events.

Nowadays, platforms such as Twitter and Facebook are part of many people’s daily lives and are used by millions of users. On these platforms, known as social networks,
users share information including opinions, feelings, experiences, and thoughts. Twitter, in particular, is used to share a variety of topics, which may
include discussions about brands and their products and/or services. This study analyses how
interest in a brand is reflected on Twitter and presents a methodology for using Twitter as a source of information to monitor what users say
about particular brands. Brand interest can be defined as the level of
interest an individual has in a brand, and the level of curiosity that
leads an individual to learn more about that brand. In this study, the number of published tweets
is used to measure interest in the chosen brands. The methodology relies on the date
the tweet was published, the location, and the number of posts to carry out a
spatio-temporal analysis.
Additionally, we present a framework for exploring a large
dataset with the aim of revealing latent patterns and analysing interest
in the selected brands, using Twitter as the data source. To this end, we applied Topic
Modelling, a Text Mining technique widely used to discover topics in unstructured
text. Topic Modelling algorithms have been widely used to monitor
events and trends and to discover topics in areas such as education, marketing, and health, among others. The framework consists of training the LDA (Latent Dirichlet Allocation) topic model
on grouped tweets (according to a given criterion) and then applying the trained model to another set of grouped tweets (according to a different criterion). We describe a
set of data pre-processing tasks that helped to improve the models’ performance, obtain better results and, consequently, carry out a better analysis. The experiments show that Topic Modelling makes it possible to track discussions among
social network users over a long period of time and to capture changes related to real events.
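
The two data-preparation ideas in this abstract, counting tweets by brand, place, and time as the interest measure, and grouping tweets into larger documents before topic modelling, can be sketched as below. The records, brand names, field layout, and grouping keys are all hypothetical and invented for illustration. The abstract does not state which grouping criteria were used, and the LDA training step itself is not shown.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (brand, country, ISO date, text). Not real data.
TWEETS = [
    ("acme", "PT", "2017-03-01", "new acme phone looks great"),
    ("acme", "PT", "2017-03-01", "acme phone battery issue"),
    ("acme", "BR", "2017-03-02", "waiting for the acme launch"),
    ("globex", "PT", "2017-03-01", "globex store opening today"),
]


def tweet_volume(tweets):
    """Count tweets per (brand, country, day) as a volume-based interest measure."""
    return Counter((brand, country, day) for brand, country, day, _ in tweets)


def group_tweets(tweets, key):
    """Concatenate the texts of tweets sharing a key into one pseudo-document."""
    grouped = defaultdict(list)
    for tweet in tweets:
        grouped[key(tweet)].append(tweet[3])
    return {k: " ".join(texts) for k, texts in grouped.items()}


# One grouping for training a topic model, another for applying it
# (both criteria are assumptions chosen for this sketch).
train_docs = group_tweets(TWEETS, key=lambda t: (t[0], t[2]))  # brand + day
apply_docs = group_tweets(TWEETS, key=lambda t: (t[0], t[1]))  # brand + country
```

Grouping short messages this way gives a topic model longer documents to work with than single tweets, which is the motivation for aggregating before training.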