13 research outputs found
The Complete Picture of the Twitter Social Graph
In this work, we collected the entire Twitter social graph, which consists of 537 million Twitter accounts connected by 23.95 billion links, and performed a preliminary analysis of the collected data. To collect the social graph, we implemented a distributed crawler on the PlanetLab infrastructure that gathered all the information in 4 months. Our preliminary analysis already revealed some interesting properties. Whereas there are 537 million Twitter accounts, only 268 million have ever sent at least one tweet and no more than 54 million have been recently active. In addition, 40% of the accounts are not followed by anybody and 25% do not follow anybody. Finally, we found that Twitter's policies, as well as social conventions (such as the follow-back convention), have a huge impact on the structure of the Twitter social graph.
LiveRank: How to Refresh Old Datasets
This paper considers the problem of refreshing a dataset. More precisely,
given a collection of nodes gathered at some time (Web pages, users from an
online social network) along with some structure (hyperlinks, social
relationships), we want to identify a significant fraction of the nodes that
still exist at present time. The liveness of an old node can be tested through
an online query at present time. We call LiveRank a ranking of the old pages so
that active nodes are more likely to appear first. The quality of a LiveRank is
measured by the number of queries necessary to identify a given fraction of the
active nodes when using the LiveRank order. We study different scenarios from a
static setting where the LiveRank is computed before any query is made, to
dynamic settings where the LiveRank can be updated as queries are processed.
Our results show that building on PageRank can lead to efficient LiveRanks,
for Web graphs as well as for online social networks.
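The quality measure described in this abstract (the number of queries needed to identify a given fraction of the active nodes when following the ranking order) can be illustrated with a minimal sketch. The function name `queries_needed`, the toy node names, and the liveness table are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math


def queries_needed(ranking, is_alive, fraction):
    """Count the liveness queries, issued in ranking order, needed to
    find the given fraction of all alive nodes (illustrative sketch)."""
    total_alive = sum(1 for node in ranking if is_alive[node])
    target = math.ceil(fraction * total_alive)
    if target == 0:
        return 0
    found = 0
    for queries, node in enumerate(ranking, start=1):
        if is_alive[node]:  # one online query per node tested
            found += 1
            if found >= target:
                return queries
    return len(ranking)


# Hypothetical toy data: four nodes are still alive, two are gone.
alive = {"a": True, "b": False, "c": True, "d": False, "e": True, "f": True}
good_ranking = ["a", "c", "e", "f", "b", "d"]  # alive nodes ranked first
poor_ranking = ["b", "d", "a", "c", "e", "f"]  # dead nodes ranked first

cost_good = queries_needed(good_ranking, alive, 0.5)
cost_poor = queries_needed(poor_ranking, alive, 0.5)
```

A ranking that places active nodes earlier reaches the target fraction with fewer queries, which is the sense in which one LiveRank is better than another.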
A Random Growth Model with any Real or Theoretical Degree Distribution
The degree distributions of complex networks are usually considered to be
power law. However, this is not the case for a large number of them. We thus
propose a new model able to build random growing networks with (almost) any
desired degree distribution. The degree distribution can either be theoretical
or extracted from a real-world network. The main idea is to invert the
recurrence equation commonly used to compute the degree distribution in order
to find a convenient attachment function for node connections - commonly chosen
as linear. We compute this attachment function for some classical
distributions, such as the power-law, broken power-law, geometric and Poisson
distributions. We also use the model on an undirected version of the Twitter
network, for which the degree distribution has an unusual shape. We finally
show that the divergence of chosen attachment functions is heavily linked to the
heavy-tailed property of the obtained degree distributions. Comment: 23 pages, 3 figures
Trollslayer: Crowdsourcing and Characterization of Abusive Birds in Twitter
As of today, abuse is a pressing issue to participants and administrators of
Online Social Networks (OSN). Abuse in Twitter can stem from arguments
generated for influencing outcomes of a political election, the use of bots to
automatically spread misinformation, and generally speaking, activities that
deny, disrupt, degrade or deceive other participants and/or the network. Given
the difficulty in finding and accessing a large enough sample of abuse ground
truth from the Twitter platform, we built and deployed a custom crawler that we
use to judiciously collect a new dataset from the Twitter platform with the aim
of characterizing the nature of abusive users, a.k.a. abusive birds, in the
wild. We provide a comprehensive set of features based on users' attributes, as
well as social-graph metadata. The former includes metadata about the account
itself, while the latter is computed from the social graph among the sender and
the receiver of each message. Attribute-based features are useful to
characterize user accounts in OSN, while graph-based features can reveal the
dynamics of information dissemination across the network. In particular, we
derive the Jaccard index as a key feature to reveal the benign or malicious
nature of directed messages in Twitter. To the best of our knowledge, we are
the first to propose such a similarity metric to characterize abuse in Twitter. Comment: SNAMS 201
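The Jaccard index mentioned in this abstract is the size of the intersection of two sets divided by the size of their union. The sketch below applies it to two neighbour sets; which sets the authors actually compare for a sender and a receiver is not stated in the abstract, so the neighbour sets and account names here are assumptions made for illustration only.

```python
def jaccard_index(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections.
    Returns 0.0 when both are empty (a convention chosen for this sketch)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical neighbour sets of the sender and the receiver of a message.
sender_neighbours = {"u1", "u2", "u3"}
receiver_neighbours = {"u2", "u3", "u4", "u5"}

similarity = jaccard_index(sender_neighbours, receiver_neighbours)
```

Here the two accounts share 2 of 5 distinct neighbours. A value near 0 means the sender and receiver have almost no common neighbourhood, and a value near 1 means their neighbourhoods largely coincide.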
Un modèle de graphes aléatoires croissants pour n'importe quelle distribution des degrés
The degree distributions of complex networks are usually considered to be power law. However, this is not the case for a large number of them. We thus propose a new model able to build random growing networks with almost any desired degree distribution. The degree distribution can either be theoretical or extracted from a real-world network. The main idea is to invert the recurrence equation commonly used to compute the degree distribution in order to find a convenient attachment function for node connections - commonly chosen as linear. We compute this attachment function for some classical distributions, such as the power-law, broken power-law, and geometric distributions. We also use the model on an undirected version of the Twitter network, for which the degree distribution has an unusual shape.
Unlocking the power of Twitter communities for startups
Peixoto, A. R., Almeida, A. D., António, N., Batista, F., Ribeiro, R., & Cardoso, E. (2023). Unlocking the power of Twitter communities for startups. Applied Network Science, 8, 1-21. [66]. https://doi.org/10.21203/rs.3.rs-3062630/v1, https://doi.org/10.1007/s41109-023-00593-0
This work was partially supported by Fundação para a Ciência e a Tecnologia, I.P. (FCT) namely by UIDB/04466/2020 and UIDP/04466/2020 (ISTAR_Iscte); UIDB/04152/2020 (MagIC/NOVA IMS); UIDB/50021/2020 (INESC-ID); and UIDB/03126/2020 (CIES_Iscte).
Social media platforms offer cost-effective digital marketing opportunities to monitor the market, create user communities, and spread positive opinions. They allow companies with smaller budgets, like startups, to achieve their goals and grow. In fact, studies found that startups with active engagement on those platforms have a higher chance of succeeding and receiving funding from venture capitalists. Our study explores how startups utilize social media platforms to foster social communities. We also aim to characterize the individuals within these communities. The findings from this study underscore the importance of social media for startups. We used network analysis and visualization techniques to investigate the communities of Portuguese IT startups through their Twitter data. For that, a social digraph was created, and its visualization shows that each startup created a community with a degree of intersecting followers and following users. We characterized those users using node-level measures. The results indicate that users who are followed by or follow Portuguese IT startups are of these types: “Person,” “Company,” “Blog,” “Venture Capital/Investor,” “IT Event,” “Incubators/Accelerators,” “Startup,” and “University.” Furthermore, startups follow users who post high volumes of tweets and have high popularity levels, while those who follow them have low activity and are unpopular. The attained results reveal the power of Twitter communities and offer essential insights for startups to consider when building their social media strategies. Lastly, this study proposes a methodological process for social media community analysis on platforms like Twitter.
Interest Clustering Coefficient: a New Metric for Directed Networks like Twitter
We study here the clustering of directed social graphs. The clustering
coefficient was introduced to capture the social phenomenon that a friend
of a friend tends to be my friend. This metric has been widely studied and has been
shown to be of great interest for describing the characteristics of a social
graph. In fact, the clustering coefficient is suited to a graph in which the
links are undirected, such as friendship links (Facebook) or professional links
(LinkedIn). For a graph in which links are directed from a source of
information to a consumer of information, it is no longer adequate. We show that
former studies have missed much of the information contained in the directed
part of such graphs. We thus introduce a new metric to measure the clustering
of a directed social graph with interest links, namely the interest clustering
coefficient. We compute it (exactly and using sampling methods) on a very large
social graph, a Twitter snapshot with 505 million users and 23 billion links.
We additionally provide the values of the formerly introduced directed and
undirected metrics, a first on such a large snapshot. We show that the
interest clustering coefficient is larger than classic directed clustering
coefficients introduced in the literature. This shows the relevance of the
metric for capturing the informational aspects of directed graphs. Comment: 15 pages, 9 figures
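For context, the classic undirected clustering coefficient that this abstract contrasts against ("a friend of a friend tends to be my friend") can be sketched as follows. This is the standard local clustering coefficient, not the interest clustering coefficient, whose definition is not given in the abstract. The adjacency-dictionary representation and the toy graph are assumptions made for illustration.

```python
from itertools import combinations


def local_clustering(adj, node):
    """Fraction of pairs of neighbours of `node` that are themselves linked,
    for an undirected graph given as a dict mapping node -> set of neighbours."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # no pair of neighbours exists
    linked_pairs = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2 * linked_pairs / (k * (k - 1))


# Hypothetical undirected toy graph: triangle a-b-c plus a pendant node d.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

cc_a = local_clustering(adj, "a")
cc_b = local_clustering(adj, "b")
```

Node `a` has three neighbours of which only one pair (`b`, `c`) is linked, while both neighbours of `b` are linked to each other. Because this measure ignores link direction, it cannot distinguish a source of information from a consumer, which is the gap the abstract sets out to address.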
Discovery, retrieval, and analysis of the 'Star Wars' botnet in Twitter
It is known that many Twitter users are bots, which are accounts controlled and sometimes created by computers. Twitter bots can send spam tweets, manipulate public opinion and be used for online fraud. Here we report the discovery, retrieval, and analysis of the “Star Wars” botnet in Twitter, which consists of more than 350,000 bots tweeting random quotations exclusively from Star Wars novels. The botnet contains a single type of bot, showing exactly the same properties throughout the botnet. It is unusually large, many times larger than other available datasets. It provides a valuable source of ground truth for research on Twitter bots. We analysed and revealed rich details on how the botnet was designed and created. As of this writing, the Star Wars bots are still alive in Twitter. They have survived since their creation in 2013, despite the increasing efforts in recent years to detect and remove Twitter bots. We also reflect on the “unconventional” way in which we discovered the Star Wars bots, and discuss the current problems and future challenges of Twitter bot detection.
Interest clustering coefficient: a new metric for directed networks like Twitter
The clustering coefficient was introduced to capture the social phenomenon that a friend of a friend tends to be my friend. This metric has been widely studied and has been shown to be of great interest for describing the characteristics of a social graph. However, the clustering coefficient is originally defined for a graph in which the links are undirected, such as friendship links (Facebook) or professional links (LinkedIn). For a graph in which links are directed from a source of information to a consumer of information, it is no longer adequate. We show that former studies have missed much of the information contained in the directed part of such graphs. In this article, we introduce a new metric to measure the clustering of directed social graphs with interest links, namely the interest clustering coefficient. We compute it (exactly and using sampling methods) on a very large social graph, a Twitter snapshot with 505 million users and 23 billion links, as well as on various other datasets. We additionally provide the values of the formerly introduced directed and undirected metrics, a first on such a large snapshot. We observe a higher value of the interest clustering coefficient than of classic directed clustering coefficients, showing the importance of this metric. By studying the bidirectional edges of the Twitter graph, we also show that the interest clustering coefficient is better suited to capturing the interest part of the graph, while classic ones are better suited to capturing the social part. We also introduce a new model able to build random networks with a high value of interest clustering coefficient. We finally discuss the value of this new metric for link recommendation.
Message Propagation and Social Influence in Twitter
Twitter data has potentially unlimited value and numerous applications, and Twitter is known for its increase in users over time. Twitter facilitates information diffusion at an exponential rate and also the creation of networks of users with a common interest. People reacting to the spread of an epidemic or a natural disaster are greatly influenced by the information diffusion in social media. Twitter, being a popular micro-blogging network, provides an effective way to measure diffusion in terms of speed and strength. Our research is based on previous work on models related to topic diffusion and user influence. A topic is defined by a set of keywords. This research concentrates on the implementation of algorithms for computing the diffusion of a topic in Twitter. The degree of influence of the users who tweet on the topic is also addressed. We present two different approaches to compute user influence based on topic potential. We compare two diffusion models proposed in the literature, namely potentials and connections. For testing and empirical analyses we use tweets related to “flu”, “food poisoning”, and “politics”.