359 research outputs found

    Tracking public opinion on social media

    The increasing popularity of social media has changed the web from a static repository of information into a dynamic forum with continuously changing information. Social media platforms have given people the ability to express and share their thoughts and opinions on the web in a very simple way. This so-called User Generated Content is a good source of users' opinions, and mining it can be very useful for a wide variety of applications that require understanding public opinion about a concept. For example, enterprises can capture the negative or positive opinions of customers about their services or products and improve their quality accordingly. The dynamic nature of social media, with its constantly changing vocabulary, makes developing tools that can automatically track public opinion a challenge. To help users better understand public opinion towards an entity or a topic, it is important to: a) find the related documents and the sentiment polarity expressed in them; b) identify the important time intervals where there is a change in opinion; c) identify the causes of the opinion change; d) estimate the number of people that hold a certain opinion about the entity; and e) measure the impact of public opinion towards the entity. In this thesis we focus on the problem of tracking public opinion on social media, and we propose and develop methods to address the different subproblems. First, we analyse the topical distribution of tweets to determine the number of topics that are discussed in a single tweet. Next, we propose a topic-specific stylistic method to retrieve tweets that are relevant to a topic and also express an opinion about it. Then, we explore the effectiveness of time series methodologies to track and forecast the evolution of sentiment towards a specific topic over time. In addition, we propose the LDA & KL-divergence approach to extract and rank the likely causes of sentiment spikes. We create a test collection that can be used to evaluate methodologies for ranking the likely reasons for sentiment spikes. To estimate the number of people that hold a certain opinion about an entity, we propose an approach that uses pre-publication and post-publication features extracted from news posts and users' comments, respectively. Finally, we propose an approach that propagates sentiment signals to measure the impact of public opinion on the entity's reputation. We evaluate our proposed methods on standard evaluation collections and provide evidence that they improve on the performance of state-of-the-art approaches to tracking public opinion on social media.

    How Does Twitter Account Moderation Work? Dynamics of Account Creation and Suspension During Major Geopolitical Events

    Social media moderation policies are often at the center of public debate, and their implementation and enactment are sometimes surrounded by a veil of mystery. Unsurprisingly, due to limited platform transparency and data access, relatively little research has been devoted to characterizing moderation dynamics, especially in the context of controversial events and the platform activity associated with them. Here, we study the dynamics of account creation and suspension on Twitter during two global political events: Russia's invasion of Ukraine and the 2022 French Presidential election. Leveraging a large-scale dataset of 270M tweets shared by 16M users in multiple languages over several months, we identify peaks of suspicious account creation and suspension, and we characterize behaviours that more frequently lead to account suspension. We show that large numbers of accounts are suspended within days of their creation. Suspended accounts tend to interact mostly with legitimate users, as opposed to other suspicious accounts, often making unwarranted and excessive use of reply and mention features, and predominantly sharing spam and harmful content. While we are only able to speculate about the specific causes leading to a given account suspension, our findings shed light on patterns of platform abuse and subsequent moderation during major events.

    Breadth analysis of Online Social Networks

    This thesis is mainly motivated by the analysis, understanding, and prediction of human behaviour through the study of people's digital fingerprints. Unlike a classical PhD thesis, which picks one topic and analyses it in depth, we carried out a breadth analysis of the research topic of complex networks, such as those that humans create themselves through their relationships and interactions. These kinds of digital communities, where humans interact and create relationships, are commonly called Online Social Networks. First, (i) we collected users' interactions, in the form of the text messages they share with each other, in order to analyze the sentiment and topic of such messages. We applied state-of-the-art techniques for Natural Language Processing, widely developed and tested on English texts, to a collection of Spanish Tweets and compared the results. Next, (ii) we focused on Topic Detection, creating our own classifier and applying it to the same Tweets dataset. The contributions are twofold: our classifier relies on text-graphs built from the input text, and we achieved 70% accuracy, outperforming previous results. After that, (iii) we moved on to analyzing the network structure (or topology) and its data values to detect outliers. We hypothesize that in social networks there is a large mass of users that behaves similarly, while a small set of them behaves differently. Within this last group, we try to separate users with high activity, low activity, or any other parameter/feature that makes them belong to different kinds of outliers. We aim to detect influential users in one of these outlier sets. We propose a new unsupervised method, Massive Unsupervised Outlier Detection (MUOD), which labels the detected outliers as shape, magnitude, or amplitude outliers, or a combination of those. We applied this method to a subset of roughly 400 million Google+ users, automatically identifying and discriminating sets of outlier users. Finally, (iv) we addressed the monitoring of real complex networks. We created a framework to dynamically adapt the temporality of large-scale dynamic networks, reducing compute overhead by at least 76%, data volume by 60% and overall cloud costs by at least 54%, while always maintaining accuracy above 88%. Published. Doctoral Programme in Mathematical Engineering, Universidad Carlos III de Madrid. President: Rosa María Benito Zafrilla. Secretary: Ángel Cuevas Rumín. Member: José Ernesto Jiménez Merin

    Text-mining in macroeconomics: the wealth of words

    The coming to life of the Royal Society in 1660 surely represented an important milestone in the history of science, not least in Economics. Yet its founding motto, "Nullius in verba", could be somewhat misleading. Words may in fact play an important role in Economics. In order to extract the relevant information that words provide, this thesis relies on state-of-the-art methods from the information retrieval and computer science communities. Chapter 1 shows how policy uncertainty indices can be constructed via unsupervised machine learning models. Using unsupervised algorithms proves useful in terms of the time and resources needed to compute these indices. The unsupervised machine learning algorithm, called Latent Dirichlet Allocation (LDA), allows the different themes in documents to be obtained without any prior information about their context. Given that this algorithm is widely used throughout this thesis, this chapter offers a detailed yet intuitive description of its underlying mechanics. Chapter 2 uses the LDA algorithm to categorize the political uncertainty embedded in the Scottish media. In particular, it models the uncertainty regarding Brexit and the Scottish referendum for independence. These referendum-related indices are compared with the Google search queries "Scottish independence" and "Brexit", showing strong similarities. The second part of the chapter examines the relationship between these indices and investment in a longitudinal panel dataset of 2,589 Scottish firms over the period 2008-2017. It presents evidence of greater sensitivity for firms that are financially constrained or whose investment is to a greater degree irreversible. Additionally, it is found that Scottish companies located on the border with England have a stronger negative correlation with Scottish political uncertainty than those operating in the rest of the country. Contrary to expectations, we notice that investment coming from manufacturing companies appears less sensitive to political uncertainty. Chapter 3 builds eight different policy-related uncertainty indicators for the four largest euro area countries using press media in German, French, Italian and Spanish from January 2000 until May 2019. This is done in two steps. Firstly, a continuous bag-of-words model is used to obtain words semantically similar to "economy" and "uncertainty" across the four languages and contexts. This allows for the retrieval of all news articles relevant to economic uncertainty. Secondly, LDA is again employed to model the different sources of uncertainty for each country, highlighting how easily LDA can adapt to different languages and contexts. Using a Bayesian Structural Vector Autoregressive (BSVAR) setup, a strong heterogeneity in the relationship between uncertainty and investment in machinery and equipment is then documented. For example, while investment in France, Italy and Spain reacts heavily to political uncertainty shocks, in Germany it is more sensitive to trade uncertainty shocks. Finally, Chapter 4 analyses English-language media from Europe, India and the United States, augmented by a sentiment analysis, to study how different narratives concerning cryptocurrencies influence their prices. The time span ranges from April 2013 to December 2018, a period in which cryptocurrency prices experienced a parabolic behaviour. In addition, this case study is motivated by Shiller's belief that narratives around cryptocurrencies might have led to this price behaviour. Nonetheless, the relationship between narratives and prices ought to be driven by complex interactions. For example, articles written in the media about a specific phenomenon will attract or deter new investors depending on their content and tone (sentiment). Moreover, the press might also react to price changes by increasing the coverage of a given topic. For this reason, a recent causal model, Convergent Cross Mapping (CCM), suited to discovering causal relationships in complex dynamical ecosystems, is used. I find bidirectional causal relationships between narratives concerning investment and regulation, while a mild unidirectional causal association exists in narratives that relate technology and security to prices.