8,814 research outputs found
Overcoming data scarcity of Twitter: using tweets as bootstrap with application to autism-related topic content analysis
Notwithstanding recent work which has demonstrated the potential of using
Twitter messages for content-specific data mining and analysis, the depth of
such analysis is inherently limited by the scarcity of data imposed by the 140
character tweet limit. In this paper we describe a novel approach for targeted
knowledge exploration which uses tweet content analysis as a preliminary step.
This step is used to bootstrap more sophisticated data collection from directly
related but much richer content sources. In particular we demonstrate that
valuable information can be collected by following URLs included in tweets. We
automatically extract content from the corresponding web pages and treating
each web page as a document linked to the original tweet show how a temporal
topic model based on a hierarchical Dirichlet process can be used to track the
evolution of a complex topic structure of a Twitter community. Using
autism-related tweets we demonstrate that our method is capable of capturing a
much more meaningful picture of information exchange than user-chosen hashtags.Comment: IEEE/ACM International Conference on Advances in Social Networks
Analysis and Mining, 201
Recommended from our members
A conceptual framework for studying collective reactions to events in location-based social media
Events are a core concept of spatial information, but location-based social media (LBSM) provide information on reactions to events. Individuals have varied degrees of agency in initiating, reacting to or modifying the course of events, and reactions include observations of occurrence, expressions containing sentiment or emotions, or a call to action. Key characteristics of reactions include referent events and information about who reacted, when, where and how. Collective reactions are composed of multiple individual reactions sharing common referents. They can be characterized according to the following dimensions: spatial, temporal, social, thematic and interlinkage. We present a conceptual framework, which allows characterization and comparison of collective reactions. For a thematically well-defined class of event such as storms, we can explore differences and similarities in collective attribution of meaning across space and time. Other events may have very complex spatio-temporal signatures (e.g. political processes such as Brexit or elections), which can be decomposed into series of individual events (e.g. a temporal window around the result of a vote). The purpose of our framework is to explore ways in which collective reactions to events in LBSM can be described and underpin the development of methods for analysing and understanding collective reactions to events
An Investigation of Domain-based Social Influence on ChatGPT-Related Twitter Data
Recently, the word ChatGPT has made its way into the vocabulary of various fields that are important to society, including healthcare, education, governance, and robotics. This study examines the various reasons for social media influences using the Social Influence Theory. The study used tweets connected to ChatGPT collected from January 1st, 2023 to March 31st, 2023, totaling 402, 965 tweets. Initially, the key domains of ChatGPT discussions were identified along with the variations in the sentiments over the period. Subsequently drawing the literature related to Social Influence Theory, this study explored the relationship between subjective norm, social identity, user commitment, and features of the tweet content with social influence on the readership of a ChatGPT-related tweet. Subjective norm, identity, sentiment and domain are significant factors for social influence. The user commitment suggests a negative relationship with the social influence. Finally, the theoretical and practical implications are explained
A survey of data mining techniques for social media analysis
Social network has gained remarkable attention in the last decade. Accessing social network sites such as Twitter, Facebook LinkedIn and Google+ through the internet and the web 2.0 technologies has become more affordable. People are becoming more interested in and relying on social network for information, news and opinion of other users on diverse subject matters. The heavy reliance on social network sites causes them to generate massive data characterised by three computational issues namely; size, noise and dynamism. These issues often make social network data very complex to analyse manually, resulting in the pertinent use of computational means of analysing them. Data mining provides a wide range of techniques for detecting useful knowledge from massive datasets like trends, patterns and rules [44]. Data mining techniques are used for information retrieval, statistical modelling and machine learning. These techniques employ data pre-processing, data analysis, and data interpretation processes in the course of data analysis. This survey discusses different data mining techniques used in mining diverse aspects of the social network over decades going from the historical techniques to the up-to-date models, including our novel technique named TRCM. All the techniques covered in this survey are listed in the Table.1 including the tools employed as well as names of their authors
MoralStrength: Exploiting a Moral Lexicon and Embedding Similarity for Moral Foundations Prediction
Moral rhetoric plays a fundamental role in how we perceive and interpret the
information we receive, greatly influencing our decision-making process.
Especially when it comes to controversial social and political issues, our
opinions and attitudes are hardly ever based on evidence alone. The Moral
Foundations Dictionary (MFD) was developed to operationalize moral values in
the text. In this study, we present MoralStrength, a lexicon of approximately
1,000 lemmas, obtained as an extension of the Moral Foundations Dictionary,
based on WordNet synsets. Moreover, for each lemma it provides with a
crowdsourced numeric assessment of Moral Valence, indicating the strength with
which a lemma is expressing the specific value. We evaluated the predictive
potentials of this moral lexicon, defining three utilization approaches of
increased complexity, ranging from lemmas' statistical properties to a deep
learning approach of word embeddings based on semantic similarity. Logistic
regression models trained on the features extracted from MoralStrength,
significantly outperformed the current state-of-the-art, reaching an F1-score
of 87.6% over the previous 62.4% (p-value<0.01), and an average F1-Score of
86.25% over six different datasets. Such findings pave the way for further
research, allowing for an in-depth understanding of moral narratives in text
for a wide range of social issues
- …