Exploring the meaning behind twitter hashtags through clustering
Abstract. Social networks generate large amounts of data produced by users, who are not restricted in the content of the information they exchange. The data generated can be a good indicator of trends and topic preferences among users. In our paper we focus on analyzing and representing hashtags by the corpus in which they appear. We cluster a large set of hashtags using K-means on MapReduce in order to process the data in a distributed manner. Our intention is to retrieve connections that might exist between different hashtags and their textual representations, and to grasp their semantics through the main topics with which they occur.
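The abstract does not show the distributed implementation, so the following is only a single-process sketch of how one K-means iteration maps onto MapReduce: the mapper assigns each hashtag vector to its nearest centroid, and the reducer averages each cluster into a new centroid. The hashtag vectors (term counts over the tweets each tag appears in) and all function names are hypothetical illustrations, not the authors' code.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

Vector = Tuple[float, ...]


def nearest(point: Vector, centroids: Sequence[Vector]) -> int:
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_phase(points: Iterable[Vector], centroids: Sequence[Vector]) -> Iterator[Tuple[int, Tuple[Vector, int]]]:
    """Mapper: emit (cluster_id, (point, 1)) for every input point."""
    for point in points:
        yield nearest(point, centroids), (point, 1)


def reduce_phase(pairs: Iterable[Tuple[int, Tuple[Vector, int]]]) -> Dict[int, Vector]:
    """Reducer: sum vectors and counts per cluster, then average them.

    A value may also be a partial sum from a combiner, with n the number of
    points it covers; summing vectors and counts handles both cases.
    """
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = defaultdict(int)
    for cid, (point, n) in pairs:
        acc = sums.setdefault(cid, [0.0] * len(point))
        for j, x in enumerate(point):
            acc[j] += x
        counts[cid] += n
    return {cid: tuple(s / counts[cid] for s in acc) for cid, acc in sums.items()}


def kmeans(points: List[Vector], k: int, iterations: int = 20, seed: int = 0):
    """Driver: repeat map + reduce until the centroids stop moving."""
    centroids = random.Random(seed).sample(points, k)
    for _ in range(iterations):
        new = reduce_phase(map_phase(points, centroids))
        # A cluster that received no points keeps its previous centroid.
        updated = [new.get(i, centroids[i]) for i in range(k)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, [nearest(p, centroids) for p in points]


# Hypothetical hashtag vectors: counts of three terms in each tag's tweets.
tags = ["#nba", "#lakers", "#election", "#vote"]
vectors = [(5.0, 0.0, 1.0), (4.0, 1.0, 0.0), (0.0, 6.0, 5.0), (1.0, 5.0, 6.0)]
centroids, labels = kmeans(vectors, k=2)
```

In a real MapReduce job each iteration would be a separate job, with the current centroids broadcast to every mapper; here the generator pipeline simply stands in for the shuffle between the two phases.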
Temporal word embeddings for dynamic user profiling in Twitter
The research described in this paper explores the domain of user profiling, a nascent and contentious technology that has been steadily attracting increasing interest from the research community as its potential for providing personalised digital services is realised. An extensive review of related literature revealed that limited research has been conducted into how temporal aspects of users can be captured using user profiling techniques. This, coupled with the notable lack of research into the use of word embedding techniques to capture temporal variances in language, revealed an opportunity to extend the Random Indexing word embedding technique so that the interests of users could be modelled based on their use of language. To achieve this, this work extends an existing implementation of Temporal Random Indexing to model Twitter users across multiple granularities of time based on their use of language. The product is a novel technique for temporal user profiling, in which a set of vectors describes the evolution of a Twitter user’s interests over time through their use of language. The vectors produced were evaluated against a temporal implementation of another state-of-the-art word embedding technique, the Word2Vec Dynamic Independent Skip-gram model, and Temporal Random Indexing was found to outperform Word2Vec in the generation of temporal user profiles.
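To make the idea concrete, here is a minimal sketch of one common formulation of Temporal Random Indexing, not the implementation extended in this work: every word gets a sparse random ternary index vector shared across all time periods, a separate space of context vectors is built for each period, and a user's profile for a period is the sum of the context vectors of the words they used in it. The dimensionality, sparsity, window size, period labels and example tweets are all assumed values chosen for illustration.

```python
import math
import random
import zlib
from collections import defaultdict
from typing import Dict, List

DIM = 512      # vector dimensionality (assumed value)
NONZERO = 8    # non-zero (+1/-1) entries per index vector (assumed value)
WINDOW = 2     # context words considered on each side (assumed value)


def index_vector(word: str) -> Dict[int, int]:
    """Sparse ternary random index vector, derived deterministically from the word.

    Because it depends only on the word, every time period shares the same
    index vectors, which keeps the per-period spaces comparable.
    """
    rng = random.Random(zlib.crc32(word.encode("utf-8")))
    return {pos: rng.choice((1, -1)) for pos in rng.sample(range(DIM), NONZERO)}


def train_period(sentences: List[List[str]]) -> Dict[str, List[float]]:
    """Random Indexing over one time slice: a word's context vector is the sum
    of the index vectors of the words appearing within WINDOW of it."""
    context: Dict[str, List[float]] = defaultdict(lambda: [0.0] * DIM)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            for j in range(max(0, i - WINDOW), min(len(tokens), i + WINDOW + 1)):
                if j != i:
                    vec = context[word]
                    for pos, sign in index_vector(tokens[j]).items():
                        vec[pos] += sign
    return dict(context)


def user_profile(tokens: List[str], space: Dict[str, List[float]]) -> List[float]:
    """A user's profile for one period: the sum of the context vectors of the
    words they used (words unseen in that period are skipped)."""
    profile = [0.0] * DIM
    for word in tokens:
        for k, x in enumerate(space.get(word, [])):
            profile[k] += x
    return profile


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical tweets from two periods; one space is trained per period.
periods = {
    "2016-Q1": [["watching", "football", "tonight"], ["great", "football", "match"]],
    "2016-Q2": [["learning", "python", "tonight"], ["new", "python", "release"]],
}
spaces = {p: train_period(s) for p, s in periods.items()}
profiles = {p: user_profile(["football", "tonight"], spaces[p]) for p in spaces}
```

Sharing the index vectors is the design choice that matters: since every period's space is built from the same random basis, a user's profiles from different periods can be compared directly (for example with `cosine`) to trace how their interests drift.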
Hate is not Binary: Studying Abusive Behavior of #GamerGate on Twitter
Over the past few years, online bullying and aggression have become
increasingly prominent, and manifested in many different forms on social media.
However, there is little work analyzing the characteristics of abusive users
and what distinguishes them from typical social media users. In this paper, we
start addressing this gap by analyzing tweets containing a large amount
of abusive content. We focus on a Twitter dataset revolving around the Gamergate
controversy, which led to many incidents of cyberbullying and cyberaggression
on various gaming and social media platforms. We study the properties of the
users tweeting about Gamergate, the content they post, and the differences in
their behavior compared to typical Twitter users.
We find that while their tweets are often seemingly about aggressive and
hateful subjects, "Gamergaters" do not exhibit common expressions of online
anger, and in fact primarily differ from typical users in that their tweets are
less joyful. They are also more engaged than typical Twitter users, which is an
indication as to how and why this controversy is still ongoing. Surprisingly,
we find that Gamergaters are less likely to be suspended by Twitter; we therefore
analyze their properties to identify differences from typical users and what
may have led to their suspension. We perform an unsupervised machine learning
analysis to detect clusters of users who, though currently active, could be
considered for suspension since they exhibit behaviors similar to those of suspended
users. Finally, we confirm the usefulness of our analyzed features by emulating
the Twitter suspension mechanism with a supervised learning method, achieving
very good precision and recall.
Comment: In 28th ACM Conference on Hypertext and Social Media (ACM HyperText 2017)
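The abstract does not specify the clustering algorithm, features, or classifier, so the sketch below only illustrates, under stated assumptions, two generic steps it describes: flagging currently active users who fall in clusters dominated by suspended accounts, and scoring a suspension classifier by precision and recall. The cluster assignments, the 0.5 threshold, the user IDs, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Precision and recall for the positive class (1 = suspended)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def flag_active_users(
    cluster_of: Dict[str, int], suspended: Dict[str, bool], threshold: float = 0.5
) -> List[str]:
    """Active users whose cluster has at least `threshold` suspended members."""
    size = Counter(cluster_of.values())
    n_suspended = Counter(c for u, c in cluster_of.items() if suspended[u])
    return sorted(
        u for u, c in cluster_of.items()
        if not suspended[u] and n_suspended[c] / size[c] >= threshold
    )


# Hypothetical example: cluster 0 is two-thirds suspended, so its active member is flagged.
flagged = flag_active_users(
    {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1},
    {"a": True, "b": True, "c": False, "d": False, "e": False},
)
```

Precision answers "of the accounts the model would suspend, how many Twitter actually suspended?", while recall answers "of the accounts Twitter suspended, how many did the model catch?"; emulating a moderation mechanism well requires both to be high.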
The social sciences and the web: From ‘Lurking’ to interdisciplinary ‘Big Data’ research
Acknowledgements: This research is supported by the award made by the RCUK Digital Economy theme to the dot.rural Digital Economy Hub (award reference: EP/G066051/1) and by the UK Economic & Social Research Council (ESRC) (award reference: ES/M001628/1). Peer reviewed.
Interactive Search and Exploration in Online Discussion Forums Using Multimodal Embeddings
In this paper we present a novel interactive multimodal learning system,
which facilitates search and exploration in large networks of social multimedia
users. It allows the analyst to identify and select users of interest, and to
find similar users in an interactive learning setting. Our approach is based on
novel multimodal representations of users, words and concepts, which we
simultaneously learn by deploying a general-purpose neural embedding model. We
show these representations to be useful not only for categorizing users, but
also for automatically generating user and community profiles. Inspired by
traditional summarization approaches, we create the profiles by selecting
diverse and representative content from all available modalities, i.e., the
text, image, and user modalities. The usefulness of the approach is evaluated
using artificial actors, which simulate user behavior in a relevance feedback
scenario. Multiple experiments were conducted in order to evaluate the quality
of our multimodal representations, to compare different embedding strategies,
and to determine the importance of different modalities. We demonstrate the
capabilities of the proposed approach on two different multimedia collections
originating from the violent online extremism forum Stormfront and the
microblogging platform Twitter, which are particularly interesting due to the
high semantic level of the discussions they feature.
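The abstract does not say how analyst feedback updates the search, so the sketch below swaps in a classic, well-known technique, Rocchio relevance feedback, purely to illustrate the "find similar users interactively" loop over user embeddings. It is not the authors' method; the embeddings are tiny hypothetical vectors, and the weights alpha = 1.0, beta = 0.75, gamma = 0.15 are assumed textbook-style defaults.

```python
import math
from typing import Dict, Iterable, List, Sequence

Vec = List[float]


def normalize(v: Sequence[float]) -> Vec:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def centroid(vectors: Sequence[Sequence[float]], dim: int) -> Vec:
    """Mean of the given vectors (zero vector if there are none)."""
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def rocchio(query, relevant, irrelevant, alpha=1.0, beta=0.75, gamma=0.15) -> Vec:
    """Move the query toward users marked relevant and away from irrelevant ones."""
    dim = len(query)
    pos, neg = centroid(relevant, dim), centroid(irrelevant, dim)
    return [alpha * q + beta * p - gamma * n for q, p, n in zip(query, pos, neg)]


def rank_users(query, embeddings: Dict[str, Sequence[float]], exclude: Iterable[str] = ()) -> List[str]:
    """Rank users by cosine similarity to the query vector."""
    q = normalize(query)
    skip = set(exclude)
    scores = {
        user: sum(a * b for a, b in zip(q, normalize(vec)))
        for user, vec in embeddings.items() if user not in skip
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical 3-d user embeddings (real ones would come from the learned model).
embeddings = {
    "u1": [1.0, 0.1, 0.0], "u2": [0.9, 0.2, 0.1], "u3": [0.8, 0.0, 0.3],
    "u4": [0.0, 1.0, 0.2], "u5": [0.1, 0.9, 0.0],
}
# The analyst starts from u1, marks u2 as relevant and u5 as irrelevant.
query = rocchio(embeddings["u1"], [embeddings["u2"]], [embeddings["u5"]])
ranking = rank_users(query, embeddings, exclude={"u1", "u2", "u5"})
```

An "artificial actor" in the sense of the abstract could be simulated by replacing the analyst with a rule that marks users relevant when their ground-truth label matches the target class, then repeating the update-and-rank loop.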
Towards a Computational Model of Narrative on Social Media
This thesis describes a variety of approaches to developing a computational model of narrative on social media. Our goal is to use such a narrative model to identify efforts to manipulate public opinion on social media platforms such as Twitter. We present a model in which the narratives in a collection of tweets are represented as a graph. Elements of each tweet that are relevant to potential narratives become nodes in the graph; in this thesis, we populate graph nodes with tweets’ authors, hashtags, named entities (people, locations, organizations, etc.), and moral foundations (central moral values framing the discussion). Two nodes are connected by an edge if the narrative elements they represent appear together in one or more tweets, with the edge weight equal to the number of tweets in which these elements co-occur. We then explore several deep learning and graph analysis methods for identifying narratives in a collection of tweets, including clustering of language embeddings, topic modeling, community detection and random walks on our narrative graph, training a graph neural network to identify narratives in the graph, and training a graph embedding model to generate vector embeddings of graph nodes. While much work remains to be done in this area, several of our techniques, especially the generation and clustering of graph embeddings, were able to identify groups of related and connected nodes that might form the beginnings of narratives. Further study of these or other techniques could allow for the reliable identification of full narratives and information operations on social media.
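The graph construction described above is concrete enough to sketch. The following is a minimal illustration, assuming each tweet has already been reduced to its author, hashtags, named entities, and moral foundations (the extraction steps are out of scope); the dictionary keys, node-type labels, and function names are hypothetical. Each pair of elements co-occurring in a tweet adds 1 to that edge's weight, so the final weight counts the tweets in which the two elements appear together.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Tuple

Node = Tuple[str, str]  # (node type, value), e.g. ("hashtag", "#vote")


def tweet_nodes(tweet: Dict) -> List[Node]:
    """Narrative elements of one tweet, deduplicated and sorted so that each
    unordered pair of nodes always maps to the same edge key."""
    nodes = {("author", tweet["author"])}
    nodes.update(("hashtag", h.lower()) for h in tweet.get("hashtags", []))
    nodes.update(("entity", e) for e in tweet.get("entities", []))
    nodes.update(("moral", m) for m in tweet.get("morals", []))
    return sorted(nodes)


def build_narrative_graph(tweets: Iterable[Dict]) -> Counter:
    """Edge weight = number of tweets in which both elements appear together."""
    edges: Counter = Counter()
    for tweet in tweets:
        for a, b in combinations(tweet_nodes(tweet), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical pre-extracted tweets.
tweets = [
    {"author": "alice", "hashtags": ["#Vote"], "entities": ["Ohio"], "morals": ["fairness"]},
    {"author": "bob", "hashtags": ["#vote"], "entities": ["Ohio"], "morals": []},
]
graph = build_narrative_graph(tweets)
```

The resulting weighted edge list can be handed to any of the downstream methods the thesis mentions, such as community detection or random walks, by loading it into a graph library of choice.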
Quantitative intersectional data (QUINTA): a #metoo case study
This research began as an investigation of the #metoo movement, with the initial impetus to illuminate the voices located on the margins: those who often go unheard or are never recognized. This work aimed to understand the intersectional aspects of how variations of the hashtag #metoo (e.g., #metoomosque, #churchtoo, #metoodisable, #metooqueer, #metoochina) reveal the inequities of the #metoo movement on Twitter. Scholars have often ignored the proliferation of these hashtag variations, which have therefore been absorbed into the larger #metoo conversation on Twitter. The term ‘hashtag derivative’ was therefore created to describe a variation on the theme of its original hashtag, one that strongly reflects the original’s composition.
Moreover, a critical theory such as Intersectionality is well equipped to explore how overlapping identities encounter and structure social reality in relationship to power. Amid a pandemic and racial unrest, the true capabilities of Intersectionality to describe inequities and injustices beyond the singular social positions of race and gender are not widely understood. Data science is not absolved of its role in inequities and injustices merely by dint of being a quantitative field that lays claim to “objectivity”. Social scientists have illuminated the racism, sexism, ableism, transphobia, homophobia, prejudice, bigotry, and bias embedded in data science’s technology, tools, and algorithms. This has, directly and indirectly, grave consequences for society as a whole as well as for marginalized communities.
The application of Intersectionality to a quantitative field can provide researchers with a formal structure for being more conscientious about how they critique, develop, and design their data science processes, while also reckoning with their own positioning in relationship to the data. In this way, Intersectionality is inclusive in terms of data equity yet adds a further layer of accountability for the researcher. This research makes three critical contributions: (1) creating a more concise term, hashtag derivatives, to describe the phenomenon of hashtag variation; (2) defining the historical context of Intersectionality and building a formal case for it to be properly contextualized in the Computer Science field (in particular Data Science); and (3) developing the Quantitative Intersectional Data (QUINTA) Framework, which data scientists and scholars can use to be more equitable, inclusive, and accountable for their role in the data science process.
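As a starting point for locating hashtag derivatives in a tweet corpus, one might count candidate tags with a simple string heuristic. The rule below is an assumption made for illustration, not the QUINTA definition: a tag is a candidate if it differs from the root `metoo` and either contains the root or ends in `too` (which catches forms like #churchtoo). The function name and sample texts are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

HASHTAG = re.compile(r"#(\w+)")


def candidate_derivatives(texts: Iterable[str], root: str = "metoo", suffix: str = "too") -> Counter:
    """Count hashtags that look like variations on `root` (heuristic, case-insensitive)."""
    counts: Counter = Counter()
    for text in texts:
        for tag in HASHTAG.findall(text):
            tag = tag.lower()
            # Candidate: not the root itself, but built on it or echoing its "-too" ending.
            if tag != root and (root in tag or tag.endswith(suffix)):
                counts["#" + tag] += 1
    return counts


# Hypothetical tweet texts.
counts = candidate_derivatives([
    "Speaking out #MeToo #MeTooMosque",
    "#ChurchToo is real",
    "#metooqueer #metoochina #MeTooMosque",
])
```

The heuristic over-matches (for example, #tattoo also ends in "too"), so its output is best treated as a candidate list for manual, context-aware review rather than as a final classification of derivatives.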
- …