10 research outputs found

    TwitterMancer: predicting interactions on Twitter accurately

    This paper investigates the interplay between different types of user interactions on Twitter, with respect to predicting missing or unseen interactions. For example, given a set of retweet interactions between Twitter users, how accurately can we predict reply interactions? Is it more difficult to predict retweet or quote interactions between a pair of accounts? Also, how important is time locality, and which features of interaction patterns are most important to enable accurate prediction of specific Twitter interactions? Our empirical study of Twitter interactions contributes initial answers to these questions. We have crawled an extensive data set of Greek-speaking Twitter accounts and their follow, quote, retweet, and reply interactions over a period of a month. We find we can accurately predict many interactions of Twitter users. Interestingly, the most predictive features vary with the user profiles, and are not the same across all users. For example, for a pair of users that interact with a large number of other Twitter users, we find that certain “higher-dimensional” triads, i.e., triads that involve multiple types of interactions, are very informative, whereas for less active Twitter users, certain in-degrees and out-degrees play a major role. Finally, we provide various other insights on Twitter user behavior. Our code and data are available at https://github.com/twittermancer/. Accepted manuscript

    TwitterMancer: Predicting Interactions on Twitter Accurately

    This paper investigates the interplay between different types of user interactions on Twitter, with respect to predicting missing or unseen interactions. For example, given a set of retweet interactions between Twitter users, how accurately can we predict reply interactions? Is it more difficult to predict retweet or quote interactions between a pair of accounts? Also, how important is time locality, and which features of interaction patterns are most important to enable accurate prediction of specific Twitter interactions? Our empirical study of Twitter interactions contributes initial answers to these questions. We have crawled an extensive dataset of Greek-speaking Twitter accounts and their follow, quote, retweet, reply interactions over a period of a month. We find we can accurately predict many interactions of Twitter users. Interestingly, the most predictive features vary with the user profiles, and are not the same across all users. For example, for a pair of users that interact with a large number of other Twitter users, we find that certain "higher-dimensional" triads, i.e., triads that involve multiple types of interactions, are very informative, whereas for less active Twitter users, certain in-degrees and out-degrees play a major role. Finally, we provide various other insights on Twitter user behavior. Our code and data are available at https://github.com/twittermancer/. Keywords: Graph mining, machine learning, social media, social network
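The abstract above names two families of features: in-degrees and out-degrees, and triads that mix interaction types. The sketch below is a hypothetical illustration of how such features could be computed for a candidate pair of accounts. The function names, the edge format, and the feature definitions are assumptions made here, not the feature set used in the paper.

```python
from collections import defaultdict


def build_index(edges):
    """Index typed, directed interactions.

    edges: iterable of (source, target, kind) tuples, e.g. ("a", "b", "retweet").
    The tuple layout is an assumption for this sketch.
    """
    out_nbrs = defaultdict(set)  # (node, kind) -> accounts the node points to
    in_nbrs = defaultdict(set)   # (node, kind) -> accounts pointing to the node
    for source, target, kind in edges:
        out_nbrs[(source, kind)].add(target)
        in_nbrs[(target, kind)].add(source)
    return out_nbrs, in_nbrs


def pair_features(u, v, kinds, out_nbrs, in_nbrs):
    """Degree and mixed-type triad counts for the candidate pair (u, v)."""
    feats = {}
    for kind in kinds:
        # Per-type out-degree of u and in-degree of v.
        feats[f"out_degree[{kind}]"] = len(out_nbrs.get((u, kind), ()))
        feats[f"in_degree[{kind}]"] = len(in_nbrs.get((v, kind), ()))
    for a in kinds:
        for b in kinds:
            # Intermediaries w with u -a-> w and w -b-> v; a != b mixes types.
            via = out_nbrs.get((u, a), set()) & in_nbrs.get((v, b), set())
            feats[f"path[{a}>{b}]"] = len(via)
    return feats


edges = [
    ("a", "b", "retweet"),
    ("b", "c", "reply"),
    ("a", "d", "retweet"),
    ("d", "c", "retweet"),
]
out_nbrs, in_nbrs = build_index(edges)
features = pair_features("a", "c", ["retweet", "reply"], out_nbrs, in_nbrs)
```

A feature dictionary like this could be fed to any standard classifier to predict whether an interaction of a given type exists between `u` and `v`.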

    Raumgeographische Verteilung von Twitter-Hashtags im deutschen Sprachraum

    This study examines the spatial distribution of hashtags in a corpus of German-language tweets by considering three kinds of user location information: exact location encoded as latitude-longitude coordinates, a "place" attribute selected from a Twitter-maintained list of places, or a free-form entry in the user profile. Hashtags in tweets with exact locations are slightly more likely to show spatial concentration, compared to hashtags with place or user location information, which may reflect the use of mobile devices to publish tweets. While spatial autocorrelation analysis shows that most hashtags do not exhibit a strong spatial tendency, those that do are likely to be toponyms, appellatives, or proper nouns associated with specific places, as can be shown by mapping autocorrelation values. In addition, some hashtags that exhibit a spatial tendency describe localized geographical or meteorological phenomena.
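The abstract does not say which spatial autocorrelation statistic was used. As a minimal sketch, the code below computes global Moran's I, one common choice, with binary distance-band weights. The function name, the planar Euclidean distance, and the `bandwidth` parameter are assumptions for illustration only.

```python
from math import hypot


def morans_i(values, coords, bandwidth):
    """Global Moran's I with binary weights: w_ij = 1 if distance <= bandwidth.

    values: one number per location (e.g. relative frequency of a hashtag).
    coords: (x, y) pairs in a planar projection (an assumption of this sketch).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    weight_sum = 0.0
    cross = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dist = hypot(coords[i][0] - coords[j][0], coords[i][1] - coords[j][1])
            if dist <= bandwidth:
                weight_sum += 1.0
                cross += dev[i] * dev[j]
    variance_sum = sum(d * d for d in dev)
    if weight_sum == 0 or variance_sum == 0:
        raise ValueError("Moran's I is undefined: no neighbours or constant values")
    return (n / weight_sum) * (cross / variance_sum)


# Four locations on a line, one unit apart.
line = [(0, 0), (1, 0), (2, 0), (3, 0)]
clustered = morans_i([1, 1, 0, 0], line, bandwidth=1.0)    # similar values adjacent
alternating = morans_i([1, 0, 1, 0], line, bandwidth=1.0)  # dissimilar values adjacent
```

Values above zero indicate that similar values sit near each other, and values below zero indicate that neighbouring locations tend to differ.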

    Claves para analizar datos en Twitter. Recolección y procesamiento de corpus

    This paper presents a methodological proposal for analyzing Twitter data with a mixed-methods approach. Specifically, the procedure for collecting and processing the information draws on both qualitative and quantitative resources and builds a manageable corpus for subsequent qualitative analysis. The procedure for approaching digital discourse on Twitter consists of: 1) recording the virtual ethnography; 2) collecting the data through the Twitter API using Python; 3) visualizing and filtering the data with Open Refine; 4) building the corpus; 5) categorizing and tagging the verbal-iconic utterances with Atlas.ti. The paper reconstructs the methodological path followed in an ongoing doctoral research project with a qualitative approach, from which the examples are drawn, in order to offer an accessible route that can be replicated in research with this type of data.

    Calculated vs. ad hoc publics in the #Brexit discourse on Twitter and the role of business actors

    Mobilization theory posits that social media gives a voice to non-traditional actors in socio-political discourse. This study uses network analytics to understand the underlying structure of the Brexit discourse and whether the main sub-networks identify new publics and influencers in political participation, and specifically industry stakeholders. Content analytics and peak detection analysis are used to provide greater explanatory value to the organizing themes for these sub-networks. Our findings suggest that the Brexit discourse on Twitter can be largely explained by calculated publics organized around the two campaigns and political parties. Ad hoc communities were identified based on (i) the media, (ii) geo-location, and (iii) the US presidential election. Other than the media, significant sub-communities did not form around industry as a whole or around individual sectors or leaders. Participation by business accounts in the Twitter discourse had limited impact.

    O Papel Do Twitter Nos Resultados Dos Referendos: Casos de estudo: Brexit, no Reino Unido e Referendo Pela Paz, na Colômbia

    This master's dissertation aims to contribute to the debate on the relationship between new information and communication technologies and democratic systems in Western societies. Based on an analysis of the role of the digital platform Twitter in plebiscitary democratic processes, it takes as case studies two referendums held in 2016: the United Kingdom referendum on leaving the European Union, known as Brexit, and the referendum on approving the implementation of the final peace process agreement in Colombia. The use and impact of the principal actors in each case are analysed with reference to the winner/loser duality. The analysis considers the content published on the social network as well as the possible impact and reach of that content within the Twitter community. The result seeks to contribute to the discussion by correlating the use of the platform with one of the two main positions in the debate: either democracy is strengthened by the presence of these digital technologies or, conversely, the democratic system is weakened by the new media.

    A Twitter sentiment gold standard for the Brexit referendum

    Manuela Hürlimann, Brian Davis (Insight Centre for Data Analytics, National University of Ireland Galway, Ireland; {first.last}@insight-centre.org); Keith Cortis, André Freitas, Siegfried Handschuh (University of Passau, Germany; {first.last}@uni-passau.de); Sergio Fernández (Redlink GmbH, Salzburg, Austria). Abstract: In this paper, we present a sentiment-annotated Twitter gold standard for the Brexit referendum. The data set consists of 2,000 Twitter messages (“tweets”) annotated with information about the sentiment expressed, the strength of the sentiment, and context dependence. This is a valuable resource for social media-based opinion mining in the context of political events.

    Adapting to Change: The Temporal Persistence of Text Classifiers in the Context of Longitudinally Evolving Data

    This thesis delves into the evolving landscape of NLP, particularly focusing on the temporal persistence of text classifiers amid the dynamic nature of language use. The primary objective is to understand how changes in language patterns over time impact the performance of text classification models and to develop methodologies for maintaining their effectiveness. The research begins by establishing a theoretical foundation for text classification and temporal data analysis, highlighting the challenges posed by the evolving use of language and its implications for NLP models. A detailed exploration of various datasets, including the stance detection and sentiment analysis datasets, sets the stage for examining these dynamics. The characteristics of the datasets, such as linguistic variations and temporal vocabulary growth, are carefully examined to understand their influence on the performance of the text classifier. A series of experiments are conducted to evaluate the performance of text classifiers across different temporal scenarios. The findings reveal a general trend of performance degradation over time, emphasizing the need for classifiers that can adapt to linguistic changes. The experiments assess models' ability to estimate past and future performance based on their current efficacy and linguistic dataset characteristics, leading to valuable insights into the factors influencing model longevity. Innovative solutions are proposed to address the observed performance decline and adapt to temporal changes in language use over time. These include incorporating temporal information into word embeddings and comparing various methods across temporal gaps. The Incremental Temporal Alignment (ITA) method emerges as a significant contributor to enhancing classifier performance in same-period experiments, although it faces challenges in maintaining effectiveness over longer temporal gaps. 
    Furthermore, the exploration of machine learning and statistical methods highlights their potential to maintain classifier accuracy in the face of longitudinally evolving data. The thesis culminates in a shared task evaluation, where participant-submitted models are compared against baseline models to assess their classifiers' temporal persistence. This comparison provides a comprehensive understanding of the short-term, long-term, and overall persistence of their models, providing valuable information to the field. The research identifies several future directions, including interdisciplinary approaches that integrate linguistics and sociology, tracking textual shifts on online platforms, extending the analysis to other classification tasks, and investigating the ethical implications of evolving language in NLP applications. This thesis contributes to the NLP field by highlighting the importance of evaluating text classifiers' temporal persistence and offering methodologies to enhance their sustainability in dynamically evolving language environments. The findings and proposed approaches pave the way for future research, aiming at the development of more robust, reliable, and temporally persistent text classification models.