1,730 research outputs found

    Exploring the meaning behind twitter hashtags through clustering

    Get PDF
    Abstract. Social networks generate large amounts of data produced by users, who are not limited in the content of the information they exchange. The generated data can be a good indicator of trends and topic preferences among users. In our paper, we focus on analyzing hashtags and representing them by the corpus in which they appear. We cluster a large set of hashtags using K-means on MapReduce in order to process the data in a distributed manner. Our intention is to retrieve connections that might exist between different hashtags and their textual representations, and to grasp their semantics through the main topics they occur with.
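    As a rough illustration of how K-means fits the MapReduce model, the sketch below simulates one mapper and one reducer in a single Python process. It assumes each hashtag has already been turned into a numeric vector (for instance, term frequencies of the tweets it appears in); the names `map_step`, `reduce_step` and `kmeans` are hypothetical, and this is not the authors' implementation. In a real Hadoop or Spark job the mapper would run over partitions of the data, and the framework would group the emitted pairs by cluster id before the reducer runs.

```python
import math
from collections import defaultdict
from typing import Iterable, Iterator, List, Sequence, Tuple

Vector = Tuple[float, ...]


def map_step(points: Iterable[Vector], centroids: Sequence[Vector]) -> Iterator[Tuple[int, Tuple[Vector, int]]]:
    """Mapper: emit (nearest centroid id, (point, count=1)) for every hashtag vector."""
    for p in points:
        cid = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        yield cid, (p, 1)


def reduce_step(pairs, centroids: Sequence[Vector]) -> List[Vector]:
    """Reducer: sum the points assigned to each centroid and divide by their count."""
    sums = {}
    counts = defaultdict(int)
    for cid, (p, n) in pairs:
        acc = sums.setdefault(cid, [0.0] * len(p))
        for j, x in enumerate(p):
            acc[j] += x
        counts[cid] += n
    # A cluster that received no points keeps its previous centroid.
    return [tuple(s / counts[i] for s in sums[i]) if counts[i] else c
            for i, c in enumerate(centroids)]


def kmeans(points: Sequence[Vector], initial: Sequence[Vector], max_iter: int = 20):
    """Repeat map + reduce until no centroid moves (or max_iter is reached)."""
    centroids = [tuple(c) for c in initial]
    for _ in range(max_iter):
        updated = reduce_step(map_step(points, centroids), centroids)
        if updated == centroids:
            break
        centroids = updated
    labels = [cid for cid, _ in map_step(points, centroids)]
    return centroids, labels
```

    For example, `kmeans([(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)], [(0.0, 0.0), (10.0, 10.0)])` separates the two pairs of points into clusters 0 and 1. Because the reducer only needs per-cluster sums and counts, a combiner can pre-aggregate them on each mapper, which is what makes K-means a natural fit for MapReduce.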

    Temporal word embeddings for dynamic user profiling in Twitter

    Get PDF
    The research described in this paper focused on exploring the domain of user profiling, a nascent and contentious technology that has been steadily attracting interest from the research community as its potential for providing personalised digital services is realised. An extensive review of related literature revealed that limited research has been conducted into how temporal aspects of users can be captured using user profiling techniques. This, coupled with the notable lack of research into the use of word embedding techniques to capture temporal variances in language, revealed an opportunity to extend the Random Indexing word embedding technique so that the interests of users could be modelled based on their use of language. To achieve this, this work extended an existing implementation of Temporal Random Indexing to model Twitter users across multiple granularities of time based on their use of language. The result is a novel technique for temporal user profiling, in which a set of vectors describes the evolution of a Twitter user's interests over time through their use of language. The vectors produced were evaluated against a temporal implementation of another state-of-the-art word embedding technique, the Word2Vec Dynamic Independent Skip-gram model, and Temporal Random Indexing was found to outperform Word2Vec in generating temporal user profiles.
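    A minimal sketch of the basic Random Indexing idea behind Temporal Random Indexing, assuming tokenised tweets already grouped by time period. Each word receives a sparse random "index vector" that is shared across all periods, and each period's semantic vectors are built by summing the index vectors of neighbouring words, so vectors from different periods live in the same space and can be compared. The per-period user profile (the sum of the semantic vectors of the words a user wrote) is a hypothetical choice for illustration, as are `DIM`, `NONZERO`, the window size and every function name; this is not the implementation evaluated in the paper.

```python
import math
import random
from collections import defaultdict
from functools import lru_cache
from typing import Dict, List, Sequence, Tuple

DIM = 256      # hypothetical vector dimensionality
NONZERO = 6    # hypothetical number of +1/-1 entries per index vector


@lru_cache(maxsize=None)
def index_vector(word: str) -> Tuple[int, ...]:
    """Sparse ternary random vector; seeding with the word makes it identical in every period."""
    rng = random.Random(word)
    vec = [0] * DIM
    for pos in rng.sample(range(DIM), NONZERO):
        vec[pos] = rng.choice((-1, 1))
    return tuple(vec)


def train_period(docs: Sequence[Sequence[str]], window: int = 2) -> Dict[str, List[int]]:
    """Semantic vectors for one time period: each word accumulates its neighbours' index vectors."""
    semantic: Dict[str, List[int]] = defaultdict(lambda: [0] * DIM)
    for tokens in docs:
        for i, word in enumerate(tokens):
            acc = semantic[word]
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    for k, x in enumerate(index_vector(tokens[j])):
                        acc[k] += x
    return dict(semantic)


def user_profile(tokens: Sequence[str], semantic: Dict[str, List[int]]) -> List[int]:
    """Hypothetical per-period profile: sum of the semantic vectors of the user's words."""
    profile = [0] * DIM
    for word in tokens:
        for k, x in enumerate(semantic.get(word, ())):
            profile[k] += x
    return profile


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

    Under these assumptions, a drop in `cosine(user_profile(words_t1, period_t1), user_profile(words_t2, period_t2))` between two periods would signal a shift in the user's language, and hence in their inferred interests.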

    Hate is not Binary: Studying Abusive Behavior of #GamerGate on Twitter

    Get PDF
    Over the past few years, online bullying and aggression have become increasingly prominent and have manifested in many different forms on social media. However, there is little work analyzing the characteristics of abusive users and what distinguishes them from typical social media users. In this paper, we start addressing this gap by analyzing tweets containing a large amount of abusiveness. We focus on a Twitter dataset revolving around the Gamergate controversy, which led to many incidents of cyberbullying and cyberaggression on various gaming and social media platforms. We study the properties of the users tweeting about Gamergate, the content they post, and the differences in their behavior compared to typical Twitter users. We find that while their tweets are often seemingly about aggressive and hateful subjects, "Gamergaters" do not exhibit common expressions of online anger, and in fact primarily differ from typical users in that their tweets are less joyful. They are also more engaged than typical Twitter users, which is an indication of how and why this controversy is still ongoing. Surprisingly, we find that Gamergaters are less likely to be suspended by Twitter; we therefore analyze their properties to identify differences from typical users and what may have led to their suspension. We perform an unsupervised machine learning analysis to detect clusters of users who, though currently active, could be considered for suspension because they exhibit behaviors similar to those of suspended users. Finally, we confirm the usefulness of our analyzed features by emulating the Twitter suspension mechanism with a supervised learning method, achieving very good precision and recall. Comment: In 28th ACM Conference on Hypertext and Social Media (ACM HyperText 2017).

    The social sciences and the web : From ‘Lurking’ to interdisciplinary ‘Big Data’ research

    Get PDF
    Acknowledgements: This research is supported by the award made by the RCUK Digital Economy theme to the dot.rural Digital Economy Hub (award reference: EP/G066051/1) and the UK Economic & Social Research Council (ESRC) (award reference: ES/M001628/1). Peer reviewed. Publisher PDF.

    Interactive Search and Exploration in Online Discussion Forums Using Multimodal Embeddings

    Get PDF
    In this paper, we present a novel interactive multimodal learning system that facilitates search and exploration in large networks of social multimedia users. It allows the analyst to identify and select users of interest, and to find similar users in an interactive learning setting. Our approach is based on novel multimodal representations of users, words and concepts, which we learn simultaneously by deploying a general-purpose neural embedding model. We show these representations to be useful not only for categorizing users, but also for automatically generating user and community profiles. Inspired by traditional summarization approaches, we create the profiles by selecting diverse and representative content from all available modalities, i.e., the text, image and user modalities. The usefulness of the approach is evaluated using artificial actors, which simulate user behavior in a relevance feedback scenario. Multiple experiments were conducted to evaluate the quality of our multimodal representations, to compare different embedding strategies, and to determine the importance of different modalities. We demonstrate the capabilities of the proposed approach on two different multimedia collections, originating from the violent online extremism forum Stormfront and the microblogging platform Twitter, which are particularly interesting due to the high semantic level of the discussions they feature.

    Towards a Computational Model of Narrative on Social Media

    Get PDF
    This thesis describes a variety of approaches to developing a computational model of narrative on social media. Our goal is to use such a narrative model to identify efforts to manipulate public opinion on social media platforms like Twitter. We present a model in which narratives in a collection of tweets are represented as a graph. Elements from each tweet that are relevant to potential narratives are made into nodes in the graph; for this thesis, we populate graph nodes with tweets' authors, hashtags, named entities (people, locations, organizations, etc.), and moral foundations (central moral values framing the discussion). Two nodes are connected by an edge if the narrative elements they represent appear together in one or more tweets, with the edge weight corresponding to the number of tweets in which these elements co-occur. We then explore multiple deep learning and graph analysis methods for identifying narratives in a collection of tweets, including clustering of language embeddings, topic modeling, community detection and random walks on our narrative graph, training a graph neural network to identify narratives in the graph, and training a graph embedding model to generate vector embeddings of graph nodes. While much work remains to be done in this area, several of our techniques, especially the generation and clustering of graph embeddings, were able to identify groups of related and connected nodes that might form the beginnings of narratives. Further study of these or other techniques could allow for the reliable identification of full narratives and information operations on social media.
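    The graph construction described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, assuming each tweet has already been reduced to a dict with hypothetical keys `author`, `hashtags`, `entities` and `moral_foundations` (the named-entity and moral-foundation extraction steps are not shown); the function names are likewise hypothetical, and this is not the thesis's code.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

Node = Tuple[str, str]  # (element type, value), e.g. ("hashtag", "#vote")


def tweet_elements(tweet: Dict) -> Set[Node]:
    """Collect the narrative elements of one tweet (dict keys are hypothetical)."""
    elements = {("author", tweet["author"])}
    elements |= {("hashtag", h) for h in tweet.get("hashtags", [])}
    elements |= {("entity", e) for e in tweet.get("entities", [])}
    elements |= {("moral", m) for m in tweet.get("moral_foundations", [])}
    return elements


def build_narrative_graph(tweets: Iterable[Dict]) -> Tuple[Set[Node], Counter]:
    """Nodes are narrative elements; edge weight = number of tweets containing both elements."""
    nodes: Set[Node] = set()
    weights: Counter = Counter()
    for tweet in tweets:
        elements = tweet_elements(tweet)
        nodes |= elements
        # Sorting gives each undirected edge one canonical key;
        # using a set means a tweet adds at most 1 to any given edge.
        for a, b in combinations(sorted(elements), 2):
            weights[(a, b)] += 1
    return nodes, weights
```

    The resulting weighted edge list could then be handed to a graph library for the community detection, random walks or graph embeddings the thesis explores; keying nodes by element type keeps, say, an author and a hashtag with the same text from collapsing into one node.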

    Quantitative intersectional data (QUINTA): a #metoo case study

    Get PDF
    This research began as an investigation of the #metoo movement, with the initial impetus of illuminating the voices located on the margins, those that often go unheard or are never recognized. This work aimed to understand, through an intersectional lens, how variations of the hashtag #metoo (e.g., #metoomosque, #churchtoo, #metoodisable, #metooqueer, #metoochina) reveal the inequities of the #metoo movement on Twitter. The proliferation of these hashtag variations has often been ignored by scholars, and the variations have therefore been absorbed into the larger #metoo conversation on Twitter. The term 'hashtag derivative' was therefore created to describe a variation on the theme of an original hashtag that strongly reflects the original's composition. Moreover, a critical theory such as Intersectionality is well equipped to explore how overlapping identities encounter structural social realities in relation to power. Amid a pandemic and racial unrest, the full capability of Intersectionality to describe inequities and injustices beyond the singular social positions of race and gender is not widely understood. Data science is not absolved of its role in inequities and injustices merely by virtue of being a quantitative field that lays claim to "objectivity". Social scientists have illuminated the racism, sexism, ableism, transphobia, homophobia, prejudice, bigotry, and bias embedded in data science's technology, tools, and algorithms. These have, directly and indirectly, grave consequences for the community as a whole as well as for marginalized communities. Applying Intersectionality to a quantitative field can provide researchers with a formal structure for being more conscientious about how they critique, develop, and design their data science processes, while also reckoning with their own positioning in relation to the data. In this way, Intersectionality is inclusive in terms of data equity yet adds a further layer of accountability for the researcher.
    This research leads to three critical contributions: (1) creating more concise terminology, hashtag derivatives, to describe the phenomenon of hashtag variation; (2) defining the historical context of Intersectionality and building a formal case for it to be properly contextualized in the Computer Science field (in particular Data Science); and (3) developing the Quantitative Intersectional Data (QUINTA) Framework, which data scientists and scholars can use to be more equitable, inclusive, and accountable for their role in the data science process.
    corecore