115 research outputs found

    Detecting and Tracking the Spread of Astroturf Memes in Microblog Streams

    Online social media are complementing and in some cases replacing person-to-person social interaction and redefining the diffusion of information. In particular, microblogs have become crucial grounds on which public relations, marketing, and political battles are fought. We introduce an extensible framework that will enable the real-time analysis of meme diffusion in social media by mining, visualizing, mapping, classifying, and modeling massive streams of public microblogging events. We describe a Web service that leverages this framework to track political memes in Twitter and help detect astroturfing, smear campaigns, and other misinformation in the context of U.S. political elections. We present some cases of abusive behaviors uncovered by our service. Finally, we discuss promising preliminary results on the detection of suspicious memes via supervised learning based on features extracted from the topology of the diffusion networks, sentiment analysis, and crowdsourced annotations.

    Temporal search in document streams

    In this thesis, we address major challenges in searching temporal document collections. In such collections, documents are created and/or edited over time. Examples of temporal document collections are web archives, news archives, blogs, personal emails and enterprise documents. Unfortunately, traditional IR approaches based on term matching alone can give unsatisfactory results when searching temporal document collections. The reason for this is twofold: the contents of documents are strongly time-dependent, i.e., documents are about events that happened in particular time periods, and a query representing an information need can be time-dependent as well, i.e., a temporal query. On the other hand, time-only-based methods fall short when it comes to reasoning about events in social media. In the last few years, users have been creating chronologically ordered documents about topics that draw their attention at an ever-increasing pace. However, with the vast adoption of social media, new types of marketing campaigns have been developed in order to promote content, e.g., brands, products, celebrities, etc.

    Hashtagging in the Advertising Texts of International Glossy Magazines

    The article examines the process of hashtagging and the functioning of hashtags in print advertising. The sources for the study were the Russian editions of international magazines from 2019-2022: "Tatler", "Elle Decoration", "Cosmopolitan", "Elle" and others. It is concluded that hashtags in the advertising communication of international magazines are characterized by functional differentiation and polyfunctionality.

    StockEmotions: Discover Investor Emotions for Financial Sentiment Analysis and Multivariate Time Series

    There has been growing interest in applying NLP techniques in the financial domain; however, resources are extremely limited. This paper introduces StockEmotions, a new dataset for detecting emotions in the stock market that consists of 10,000 English comments collected from StockTwits, a financial social media platform. Inspired by behavioral finance, it proposes 12 fine-grained emotion classes that span the roller coaster of investor emotion. Unlike existing financial sentiment datasets, StockEmotions presents granular features such as investor sentiment classes, fine-grained emotions, emojis, and time series data. To demonstrate the usability of the dataset, we perform a dataset analysis and conduct experimental downstream tasks. For financial sentiment/emotion classification tasks, DistilBERT outperforms other baselines, and for multivariate time series forecasting, a Temporal Attention LSTM model combining price index, text, and emotion features performs better than models using a single feature. Comment: Preprint for the AAAI-23 Bridge Program (AI for Financial Services)

    A Momentum Theory for Hot Topic Life-cycle: A Case Study of Hot Hashtag Emerging in Twitter

    The existing work on mining of hot topics is mainly based on topic multiplicity and attention from users in unit time. With the advent of social networking, the weight has been put on the hot topics which can effectively describe the importance and hotness of a topic. However, research on the influence exerted by the accumulation of attention towards hot topics and on the alternation between hot topics and outdated ones is still relatively weak. In this paper, a novel algorithm for calculating the hotness of topics is proposed based on momentum. Not only the number of participants but also the long-tail effect of historical accumulation on the topic is taken into consideration. Through this algorithm, we can accurately build a model for hot topics during their emerging growth period and effectively describe the whole life cycle of the topic. Additionally, the change between hot topics and old ones can be distinguished efficiently. Our experiments show that the process of a topic growing into a hot topic can be detected explicitly. Potential hot topics can be explored and overdue ones can be rejected.

    Quantitative intersectional data (QUINTA): a #metoo case study

    This research began as an investigation of the #metoo movement, with the initial impetus to illuminate the voices located on the margins, those who often go unheard or are never recognized. This work aimed to understand the intersectional aspects of how variations of the hashtag #metoo (i.e. #metoomosque, #churchtoo, #metoodisable, #metooqueer, #metoochina, etc.) reveal the inequities of the #metoo movement on Twitter. The proliferation of these hashtag variations has often been ignored by scholars, and therefore absorbed into the larger #metoo movement conversation on Twitter. Therefore, the term 'hashtag derivative' was created to describe the variation on the theme of its original hashtag, strongly reflecting its composition. Moreover, a critical theory such as Intersectionality is well-equipped to explore how overlapping identities encounter structure social reality relationship to power. Amid a pandemic and racial unrest, the true capabilities of Intersectionality to describe inequities and injustices beyond the singular social position of race and gender are not widely understood. Data science is not absolved of its role in inequities and injustices merely by dint of being a quantitative field that claims to "objectivity". Social scientists have illuminated the racism, sexism, ableism, transphobia, homophobia, prejudice, bigotry, and bias embedded in data science's technology, tools, and algorithms. This has, directly and indirectly, grave consequences for the community as a whole as well as for marginalized communities. The application of Intersectionality to a quantitative field can provide researchers a formal structure to be more conscientious about how to critique, develop, and design their data science processes, while also reckoning with their own positioning in relationship to the data. In this way, Intersectionality is inclusive in terms of data equity yet adds an additional layer of accountability to the researcher.
This research leads to the three critical contributions of this work: (1) creating a more concise terminology to describe the phenomenon of hashtag variation, known as hashtag derivatives, (2) defining the historical context of Intersectionality and building a formal case for this to be properly contextualized in the Computer Science field (in particular Data Science), and (3) developing the Quantitative Intersectional Data (QUINTA) Framework, which data scientists and scholars can use to be more equitable, inclusive and accountable for their role in the data science process.

    Understanding the Real World through the Analysis of User Behavior and Topics in Online Social Media

    Physical events happening in the real world usually trigger reactions and discussions in the digital world; a world most often represented by Online Social Media such as Twitter or Facebook. Mining these reactions through social sensors offers a fast and low-cost way to explain what is happening in the physical world. A thorough understanding of these discussions and the context behind them has become critical for many applications like business or political analysis. This context includes the characteristics of the population participating in a discussion, or when it is being discussed, or why. As an example, we demonstrate how the time of the day affects the prediction of traffic on highways through the analysis of social media content. Obtaining an understanding of what is happening online and the ramifications on the real world can be enabled through the automatic summarization of Social Media. Trending topics are offered as a high-level content recommendation system where users are encouraged to view related content if they deem the displayed topics interesting. However, identifying the characteristics of the users focused on each topic can boost the importance even for topics that might not be popular or bursty. We define a way to characterize groups of users that are focused on such topics and propose an efficient and accurate algorithm to extract such communities. Through qualitative and quantitative experimentation we observe that topics with a strong community focus are interesting and more likely to catch the attention of users. Consequently, as trending topic extraction algorithms become more sophisticated and report additional information like the characteristics of the users that participate in a trend, significant and novel privacy issues arise. We introduce a statistical attack to infer sensitive attribute values of Online Social Networks users that utilizes such reported community-aware trending topics.
Additionally, we provide an algorithmic methodology that alters an existing community-aware trending topic algorithm so that it can preserve the privacy of the involved users while still reporting trending topics with a satisfactory level of utility. From the user’s perspective, we explore the idea of a cyborg that can constantly monitor its owner’s privacy and alert them when necessary. However, apart from individuals, the notion of privacy can also extend to a group of people (or community). We study how non-private behavior of individuals can lead to exposure of the identity of a larger group. This exposure poses certain dangers, like online harassment targeted at the members of a group, potential physical attacks, group identity shift, etc. We discuss how this new privacy notion can be modeled and identify a set of core challenges and potential solutions.

    Data mining Twitter for cancer, diabetes, and asthma insights

    Twitter may be a data resource to support healthcare research. Literature on the potential of Twitter data in healthcare is still limited. The purpose of this study was to contrast the processes by which a large collection of unstructured disease-related tweets could be converted into structured data to be further analyzed. This was done with the objective of gaining insights into the content and behavioral patterns associated with disease-specific communications on Twitter. Twelve months of Twitter data related to cancer, diabetes, and asthma were collected to form a baseline dataset containing over 34 million tweets. As Twitter data in its raw form would have been difficult to manage, three separate data reduction methods were contrasted to identify a method to generate analysis files, maximizing classification precision and data retention. Each of the disease files was then run through a CHAID (chi-square automatic interaction detector) analysis to demonstrate how user behavior insights vary by disease. CHAID is a technique created by Gordon V. Kass in 1980 and is used to discover the relationship between variables. This study followed the standard CRISP-DM data mining approach and demonstrates how the practice of mining Twitter data fits into this six-stage iterative framework. The study produced insights that provide a new lens into the potential Twitter data has as a valuable healthcare data source as well as the nuances involved in working with the data.