Discovering Organizational Correlations from Twitter
Organizational relationships are usually very complex in real life. It is
difficult or impossible to directly measure such correlations among different
organizations, because important information is usually not publicly available
(e.g., the correlations of terrorist organizations). Nowadays, an increasing
amount of organizational information can be posted online by individuals and
spread instantly through Twitter. Such information can be crucial for detecting
organizational correlations. In this paper, we study the problem of discovering
correlations among organizations from Twitter. Mining organizational
correlations is a very challenging task due to the following reasons: a) Data
in Twitter occurs as large volumes of mixed information. The most relevant
information about organizations is often buried. Thus, the organizational
correlations can be scattered in multiple places, represented by different
forms; b) Making use of information from Twitter collectively and judiciously
is difficult because of the multiple representations of organizational
correlations that are extracted. In order to address these issues, we propose
multi-CG (multiple Correlation Graphs based model), an unsupervised framework
that can learn a consensus of correlations among organizations based on
multiple representations extracted from Twitter, which is more accurate and
robust than correlations based on a single representation. Empirical study
shows that the consensus graph extracted from Twitter can capture the
organizational correlations effectively.
Comment: 11 pages, 4 figures
From Micro to Macro: Uncovering and Predicting Information Cascading Process with Behavioral Dynamics
Cascades are ubiquitous in various network environments. How to predict these
cascades is highly nontrivial in several vital applications, such as viral
marketing, epidemic prevention and traffic management. Most previous works
mainly focus on predicting the final cascade sizes. As cascades are typical
dynamic processes, it is always interesting and important to predict the
cascade size at any time, or predict the time when a cascade will reach a
certain size (e.g. a threshold for outbreak). In this paper, we unify all
these tasks into a fundamental problem: cascading process prediction. That is,
given the early stage of a cascade, how to predict its cumulative cascade size
at any later time? For such a challenging problem, understanding the micro
mechanism that drives and generates the macro phenomena (i.e. cascading
processes) is essential. Here we introduce behavioral dynamics as the micro
mechanism to describe the dynamic process by which a node's neighbors get
infected by a cascade after this node gets infected (i.e. one-hop subcascades). Through
data-driven analysis, we find out the common principles and patterns lying in
behavioral dynamics and propose a novel Networked Weibull Regression model for
behavioral dynamics modeling. After that we propose a novel method for
predicting cascading processes by effectively aggregating behavioral dynamics,
and propose a scalable solution to approximate the cascading process with a
theoretical guarantee. We extensively evaluate the proposed method on a large
scale social network dataset. The results demonstrate that the proposed method
can significantly outperform other state-of-the-art baselines in multiple tasks
including cascade size prediction, outbreak time prediction and cascading
process prediction.
Comment: 10 pages, 11 figures
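As a rough illustration of the aggregation idea in the abstract above, the sketch below assumes that the delay between a node's infection and each of its neighbors' infections follows a Weibull distribution, and that each infected node has an estimated final one-hop subcascade size. The function names, the tuple layout, and the simple summation are assumptions made for illustration; they are not the paper's Networked Weibull Regression model or its scalable approximation.

```python
import math


def weibull_cdf(elapsed, scale, shape):
    """Probability that a Weibull(scale, shape) delay is at most `elapsed`."""
    if elapsed <= 0:
        return 0.0
    return 1.0 - math.exp(-((elapsed / scale) ** shape))


def expected_subcascade_infections(t, infected_nodes):
    """Expected number of one-hop infections triggered by time t.

    infected_nodes is a list of hypothetical tuples:
    (infected_at, scale, shape, final_subcascade_size).
    Each node contributes its final subcascade size weighted by the
    fraction of its subcascade expected to have happened by time t.
    """
    total = 0.0
    for infected_at, scale, shape, final_size in infected_nodes:
        total += final_size * weibull_cdf(t - infected_at, scale, shape)
    return total


# Two early adopters with different (made-up) behavioral parameters.
nodes = [(0.0, 1.0, 1.0, 10.0), (0.5, 2.0, 1.5, 4.0)]
for t in (0.0, 1.0, 5.0, 50.0):
    print(t, round(expected_subcascade_infections(t, nodes), 3))
```

With parameters like these, the estimate starts at zero, grows monotonically with t, and approaches the sum of the final subcascade sizes.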
RTbust: Exploiting Temporal Patterns for Botnet Detection on Twitter
Within OSNs, many of our supposedly online friends may instead be fake
accounts called social bots, part of large groups that purposely re-share
targeted content. Here, we study retweeting behaviors on Twitter, with the
ultimate goal of detecting retweeting social bots. We collect a dataset of 10M
retweets. We design a novel visualization that we leverage to highlight benign
and malicious patterns of retweeting activity. In this way, we uncover a
'normal' retweeting pattern that is peculiar to human-operated accounts, and 3
suspicious patterns related to bot activities. Then, we propose a bot detection
technique that stems from the previous exploration of retweeting behaviors. Our
technique, called Retweet-Buster (RTbust), leverages unsupervised feature
extraction and clustering. An LSTM autoencoder converts the retweet time series
into compact and informative latent feature vectors, which are then clustered
with a hierarchical density-based algorithm. Accounts belonging to large
clusters characterized by malicious retweeting patterns are labeled as bots.
RTbust obtains excellent detection results, with F1 = 0.87, whereas competitors
achieve F1 < 0.76. Finally, we apply RTbust to a large dataset of retweets,
uncovering 2 previously unknown active botnets with hundreds of accounts.
Leveraging Twitter data to analyze the virality of Covid-19 tweets: a text mining approach
As the novel coronavirus spreads across the world, work, pleasure, entertainment, social interactions, and meetings have shifted online. The conversations on social media have spiked, and given the uncertainties and new policies, COVID-19 remains the trending topic on all such platforms, including Twitter. This research explores the factors that affect COVID-19 content-sharing by Twitter users. The analysis was conducted using more than 57,000 tweets that mentioned COVID-19 and related keywords. The tweets were subjected to Natural Language Processing (NLP) techniques such as Topic modelling, Named Entity-Relationship, Emotion & Sentiment analysis, and Linguistic feature extraction. These methods generated features that could help explain the retweet count of the tweets. The results indicate that tweets with named entities (person, organisation, and location), expression of negative emotions (anger, disgust, fear, and sadness), reference to mental health, optimistic content, and greater length have higher chances of being shared (retweeted). On the other hand, tweets with more hashtags and user mentions are less likely to be shared.
Modeling Adoption and Usage of Competing Products
The emergence and wide-spread use of online social networks has led to a
dramatic increase in the availability of social activity data. Importantly,
this data can be exploited to investigate, at a microscopic level, some of the
problems that have captured the attention of economists, marketers and
sociologists for decades, such as product adoption, usage and
competition.
In this paper, we propose a continuous-time probabilistic model, based on
temporal point processes, for the adoption and frequency of use of competing
products, where the frequency of use of one product can be modulated by those
of others. This model allows us to efficiently simulate the adoption and
recurrent usages of competing products, and generate traces in which we can
easily recognize the effect of social influence, recency and competition. We
then develop an inference method to efficiently fit the model parameters by
solving a convex program. The problem decouples into a collection of smaller
subproblems, thus scaling easily to networks with hundreds of thousands of
nodes. We validate our model over synthetic and real diffusion data gathered
from Twitter, and show that the proposed model not only provides a good
fit to the data and more accurate predictions than alternatives but also
provides interpretable model parameters, which allow us to gain insights into
some of the factors driving product adoption and frequency of use.
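To make the notion of simulating a temporal point process more concrete, here is a minimal sketch of Ogata's thinning algorithm for a single self-exciting (Hawkes) process with an exponential kernel. This is a generic textbook construction used only to illustrate recency effects in usage events; it is not the competing-products model described in the abstract, and the function name and the parameters `mu`, `alpha`, `beta` are assumptions chosen for the example.

```python
import math
import random


def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate event times on [0, horizon) by Ogata's thinning.

    Intensity: lambda(t) = mu + alpha * sum(exp(-beta * (t - t_i)))
    over past events t_i, so each event temporarily raises the rate
    of further events (a simple recency effect). Requires mu > 0.
    """
    rng = random.Random(seed)
    events = []
    t = 0.0

    def intensity(now):
        return mu + alpha * sum(math.exp(-beta * (now - s)) for s in events)

    while True:
        # The kernel only decays between events, so the intensity at the
        # current time bounds the intensity until the next accepted event.
        upper = intensity(t)
        t += rng.expovariate(upper)
        if t >= horizon:
            break
        # Accept the candidate with probability lambda(t) / upper.
        if rng.random() * upper <= intensity(t):
            events.append(t)
    return events


usage_times = simulate_hawkes(mu=0.5, alpha=0.8, beta=1.2, horizon=20.0, seed=42)
print(len(usage_times), [round(x, 2) for x in usage_times[:5]])
```

Setting `alpha` to 0 reduces this to a homogeneous Poisson process with rate `mu`, which is a convenient sanity check. Keeping `alpha` below `beta` keeps the process from exploding.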
Incorporating social role theory into topic models for social media content analysis
In this paper, we explore the idea of social role theory (SRT) and propose a novel regularized topic model which incorporates SRT into the generative process of social media content. We assume that a user can play multiple social roles, and each social role serves to fulfil different duties and is associated with a role-driven distribution over latent topics. In particular, we focus on social roles corresponding to the most common social activities on social networks. Our model is instantiated on microblogs, i.e., Twitter, and community question-answering (cQA), i.e., Yahoo! Answers, where social roles on Twitter include "originators" and "propagators", and roles on cQA are "askers" and "answerers". Both explicit and implicit interactions between users are taken into account and modeled as regularization factors. To evaluate the performance of our proposed method, we have conducted extensive experiments on two Twitter datasets and two cQA datasets. Furthermore, we also consider multi-role modeling for scientific papers where an author's research expertise area is considered as a social role. We also present a novel application that detects users' research interests through topical keyword labeling based on the results of our multi-role model. The evaluation results show the feasibility and effectiveness of our model.