13 research outputs found

    The Complete Picture of the Twitter Social Graph

    In this work, we collected the entire Twitter social graph, which consists of 537 million Twitter accounts connected by 23.95 billion links, and performed a preliminary analysis of the collected data. To collect the social graph, we implemented a distributed crawler on the PlanetLab infrastructure that collected all information in 4 months. Our preliminary analysis already revealed some interesting properties. Whereas there are 537 million Twitter accounts, only 268 million have sent at least one tweet and no more than 54 million have been recently active. In addition, 40% of the accounts are not followed by anybody and 25% do not follow anybody. Finally, we found that the Twitter policies, as well as social conventions (like the follow-back convention), have a huge impact on the structure of the Twitter social graph.
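    The shares of accounts with no followers and of accounts following nobody are in-degree and out-degree statistics of the follow graph. The following is a minimal sketch, assuming the graph is available as a list of (follower, followee) pairs; the function name `degree_summary` and the toy data are illustrative and do not come from the paper.

```python
from collections import Counter


def degree_summary(accounts, follows):
    """Return (share with no followers, share following nobody).

    accounts: iterable of account ids (assumed to include every id in follows)
    follows:  iterable of (follower, followee) pairs
    """
    followers = Counter()  # in-degree: how many accounts follow this one
    following = Counter()  # out-degree: how many accounts this one follows
    for src, dst in follows:
        following[src] += 1
        followers[dst] += 1

    accounts = list(accounts)
    total = len(accounts)
    if total == 0:
        return 0.0, 0.0
    no_followers = sum(1 for a in accounts if followers[a] == 0)
    follows_nobody = sum(1 for a in accounts if following[a] == 0)
    return no_followers / total, follows_nobody / total


# Toy example (made-up data): "d" is isolated, "c" follows but is not followed.
toy_accounts = ["a", "b", "c", "d"]
toy_follows = [("a", "b"), ("b", "a"), ("c", "a")]
print(degree_summary(toy_accounts, toy_follows))
```

    At the scale reported in the abstract, the same counting would have to be done out of core or on a distributed system; the sketch only shows the definition.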

    LiveRank: How to Refresh Old Datasets

    This paper considers the problem of refreshing a dataset. More precisely, given a collection of nodes gathered at some time (Web pages, users from an online social network) along with some structure (hyperlinks, social relationships), we want to identify a significant fraction of the nodes that still exist at present time. The liveness of an old node can be tested through an online query at present time. We call LiveRank a ranking of the old pages so that active nodes are more likely to appear first. The quality of a LiveRank is measured by the number of queries necessary to identify a given fraction of the active nodes when using the LiveRank order. We study different scenarios, from a static setting where the LiveRank is computed before any query is made, to dynamic settings where the LiveRank can be updated as queries are processed. Our results show that building on PageRank can lead to efficient LiveRanks, for Web graphs as well as for online social networks.
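    The static setting and its quality measure can be sketched as follows. This is an illustration only: it assumes a plain power-iteration PageRank on the old graph as the ranking and a known set of live nodes for evaluation. The paper's actual LiveRank variants are not reproduced, and the names `pagerank`, `liverank_order` and `queries_needed` are hypothetical.

```python
def pagerank(nodes, edges, damping=0.85, iterations=50):
    """Plain power-iteration PageRank; dangling mass is spread uniformly."""
    nodes = list(nodes)
    out = {u: [] for u in nodes}
    for u, v in edges:
        out[u].append(v)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        dangling = sum(rank[u] for u in nodes if not out[u])
        new = {u: (1 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            for v in out[u]:
                new[v] += damping * rank[u] / len(out[u])
        rank = new
    return rank


def liverank_order(nodes, edges):
    """Static ordering: old nodes sorted by decreasing PageRank."""
    rank = pagerank(nodes, edges)
    return sorted(nodes, key=lambda u: -rank[u])


def queries_needed(order, alive, fraction):
    """Number of liveness queries, issued in `order`, needed to find
    `fraction` of the live nodes; None if the order never reaches it."""
    target = fraction * len(alive)
    found = 0
    for count, node in enumerate(order, start=1):
        if node in alive:
            found += 1
        if found >= target:
            return count
    return None


# Toy old snapshot (made-up): "a" and "d" are the nodes still alive today.
old_nodes = ["a", "b", "c", "d", "e"]
old_edges = [("b", "a"), ("c", "a"), ("d", "a"), ("e", "d")]
order = liverank_order(old_nodes, old_edges)
print(order, queries_needed(order, {"a", "d"}, 1.0))
```

    A lower query count for the same fraction means a better ordering, which is how two candidate rankings would be compared.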

    A Random Growth Model with any Real or Theoretical Degree Distribution

    The degree distributions of complex networks are usually considered to be power law. However, this is not the case for a large number of them. We thus propose a new model able to build random growing networks with (almost) any wanted degree distribution. The degree distribution can either be theoretical or extracted from a real-world network. The main idea is to invert the recurrence equation commonly used to compute the degree distribution in order to find a convenient attachment function for node connections - commonly chosen as linear. We compute this attachment function for some classical distributions, such as the power-law, broken power-law, geometric and Poisson distributions. We also use the model on an undirected version of the Twitter network, for which the degree distribution has an unusual shape. We finally show that the divergence of the chosen attachment functions is closely linked to the heavy-tailed property of the obtained degree distributions. Comment: 23 pages, 3 figures
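    To make the role of the attachment function concrete, here is a minimal sketch of a growing network in which each new node links to one existing node chosen with probability proportional to f(degree). It assumes one edge per new node and a two-node seed graph; it does not reproduce the paper's inversion of the recurrence equation, and `grow_network` is a hypothetical name.

```python
import random


def grow_network(steps, attachment, seed=0):
    """Grow a graph one node at a time.

    attachment: function f(k) > 0 giving the unnormalised probability
                that a node of degree k receives the new edge.
    Returns (degree list, edge list).
    """
    rng = random.Random(seed)
    degree = [1, 1]          # seed graph: two nodes joined by one edge
    edges = [(0, 1)]
    for _ in range(steps):
        weights = [attachment(k) for k in degree]
        target = rng.choices(range(len(degree)), weights=weights, k=1)[0]
        new_node = len(degree)
        degree.append(1)     # the new node arrives with its single edge
        degree[target] += 1
        edges.append((new_node, target))
    return degree, edges


# Linear attachment f(k) = k is the common choice mentioned in the abstract.
degrees, edge_list = grow_network(200, attachment=lambda k: k)
print(len(degrees), len(edge_list), max(degrees))
```

    Swapping the lambda for another positive function changes the resulting degree distribution, which is the lever the abstract describes.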

    Trollslayer: Crowdsourcing and Characterization of Abusive Birds in Twitter

    As of today, abuse is a pressing issue to participants and administrators of Online Social Networks (OSN). Abuse in Twitter can stem from arguments generated for influencing outcomes of a political election, the use of bots to automatically spread misinformation, and generally speaking, activities that deny, disrupt, degrade or deceive other participants and/or the network. Given the difficulty in finding and accessing a large enough sample of abuse ground truth from the Twitter platform, we built and deployed a custom crawler that we use to judiciously collect a new dataset from the Twitter platform with the aim of characterizing the nature of abusive users, a.k.a. abusive birds, in the wild. We provide a comprehensive set of features based on users' attributes, as well as social-graph metadata. The former includes metadata about the account itself, while the latter is computed from the social graph among the sender and the receiver of each message. Attribute-based features are useful to characterize users' accounts in OSN, while graph-based features can reveal the dynamics of information dissemination across the network. In particular, we derive the Jaccard index as a key feature to reveal the benign or malicious nature of directed messages in Twitter. To the best of our knowledge, we are the first to propose such a similarity metric to characterize abuse in Twitter. Comment: SNAMS 201
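    The Jaccard index of two sets is the size of their intersection divided by the size of their union. A minimal sketch follows, assuming the sets being compared are the social-graph neighbourhoods of a message's sender and receiver; the abstract does not say exactly which sets the authors use, so that choice and the toy data are assumptions.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical neighbourhoods of a sender and a receiver.
sender_neighbours = {"u1", "u2", "u3"}
receiver_neighbours = {"u2", "u3", "u4"}
print(jaccard(sender_neighbours, receiver_neighbours))
```

    A value near 0 means the two accounts share almost no neighbours, while a value near 1 means their neighbourhoods largely coincide.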

    Un modèle de graphes aléatoires croissants pour n'importe quelle distribution des degrés [A random growing graph model for any degree distribution]

    The degree distributions of complex networks are usually considered to be power law. However, this is not the case for a large number of them. We thus propose a new model able to build random growing networks with almost any wanted degree distribution. The degree distribution can either be theoretical or extracted from a real-world network. The main idea is to invert the recurrence equation commonly used to compute the degree distribution in order to find a convenient attachment function for node connections - commonly chosen as linear. We compute this attachment function for some classical distributions, such as the power-law, broken power-law, and geometric distributions. We also use the model on an undirected version of the Twitter follow network, for which the degree distribution has an unusual shape.

    Unlocking the power of Twitter communities for startups

    Peixoto, A. R., Almeida, A. D., António, N., Batista, F., Ribeiro, R., & Cardoso, E. (2023). Unlocking the power of Twitter communities for startups. Applied Network Science, 8, 1-21. [66]. https://doi.org/10.21203/rs.3.rs-3062630/v1, https://doi.org/10.1007/s41109-023-00593-0

    This work was partially supported by Fundação para a Ciência e a Tecnologia, I.P. (FCT), namely by UIDB/04466/2020 and UIDP/04466/2020 (ISTAR_Iscte); UIDB/04152/2020 (MagIC/NOVA IMS); UIDB/50021/2020 (INESC-ID); and UIDB/03126/2020 (CIES_Iscte).

    Social media platforms offer cost-effective digital marketing opportunities to monitor the market, create user communities, and spread positive opinions. They allow companies with smaller budgets, like startups, to achieve their goals and grow. In fact, studies found that startups with active engagement on those platforms have a higher chance of succeeding and receiving funding from venture capitalists. Our study explores how startups utilize social media platforms to foster social communities. We also aim to characterize the individuals within these communities. The findings from this study underscore the importance of social media for startups. We used network analysis and visualization techniques to investigate the communities of Portuguese IT startups through their Twitter data. For that, a social digraph has been created, and its visualization shows that each startup created a community with a degree of intersecting followers and following users. We characterized those users using user node-level measures. The results indicate that users who are followed by or follow Portuguese IT startups are of these types: “Person”, “Company,” “Blog,” “Venture Capital/Investor,” “IT Event,” “Incubators/Accelerators,” “Startup,” and “University.” Furthermore, startups follow users who post high volumes of tweets and have high popularity levels, while those who follow them have low activity and are unpopular. The attained results reveal the power of Twitter communities and offer essential insights for startups to consider when building their social media strategies. Lastly, this study proposes a methodological process for social media community analysis on platforms like Twitter.

    Interest Clustering Coefficient: a New Metric for Directed Networks like Twitter

    We study here the clustering of directed social graphs. The clustering coefficient was introduced to capture the social phenomenon that a friend of a friend tends to be my friend. This metric has been widely studied and has been shown to be of great interest to describe the characteristics of a social graph. In fact, the clustering coefficient is suited to a graph in which the links are undirected, such as friendship links (Facebook) or professional links (LinkedIn). For a graph in which links are directed from a source of information to a consumer of information, it is no longer adequate. We show that former studies have missed much of the information contained in the directed part of such graphs. We thus introduce a new metric to measure the clustering of a directed social graph with interest links, namely the interest clustering coefficient. We compute it (exactly and using sampling methods) on a very large social graph, a Twitter snapshot with 505 million users and 23 billion links. We additionally provide the values of the formerly introduced directed and undirected metrics, a first on such a large snapshot. We show that the interest clustering coefficient is larger than the classic directed clustering coefficients introduced in the literature. This shows the relevance of the metric for capturing the informational aspects of directed graphs. Comment: 15 pages, 9 figures
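    For reference, the classic undirected notion the abstract starts from ("a friend of a friend tends to be my friend") can be computed as the share of connected triples that are closed into triangles. The sketch below implements only this classic global coefficient on an undirected edge list; the interest clustering coefficient itself is not defined in the abstract and is not reproduced here. The function name is illustrative.

```python
from itertools import combinations


def global_clustering(edges):
    """Classic undirected global clustering coefficient:
    closed triples / all connected triples (0.0 if there are none)."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    closed = 0   # triples centred on a node whose two endpoints are linked
    triples = 0  # all pairs of neighbours around each node
    for neighbours in adj.values():
        for a, b in combinations(neighbours, 2):
            triples += 1
            if b in adj[a]:
                closed += 1
    return closed / triples if triples else 0.0


# Toy graph (made-up): a triangle a-b-c with a pendant node d attached to a.
print(global_clustering([("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")]))
```

    Each triangle is counted three times, once per centre node, which is why no explicit factor of 3 appears in the ratio.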

    Discovery, retrieval, and analysis of the 'Star wars' botnet in twitter

    It is known that many Twitter users are bots, which are accounts controlled and sometimes created by computers. Twitter bots can send spam tweets, manipulate public opinion and be used for online fraud. Here we report the discovery, retrieval, and analysis of the ‘Star Wars’ botnet in Twitter, which consists of more than 350,000 bots tweeting random quotations exclusively from Star Wars novels. The botnet contains a single type of bot, showing exactly the same properties throughout the botnet. It is unusually large, many times larger than other available datasets. It provides a valuable source of ground truth for research on Twitter bots. We analysed and revealed rich details on how the botnet was designed and created. As of this writing, the Star Wars bots are still alive in Twitter. They have survived since their creation in 2013, despite the increasing efforts in recent years to detect and remove Twitter bots. We also reflect on the ‘unconventional’ way in which we discovered the Star Wars bots, and discuss the current problems and future challenges of Twitter bot detection.

    Interest clustering coefficient: a new metric for directed networks like Twitter

    The clustering coefficient was introduced to capture the social phenomenon that a friend of a friend tends to be my friend. This metric has been widely studied and has been shown to be of great interest to describe the characteristics of a social graph. However, the clustering coefficient is originally defined for a graph in which the links are undirected, such as friendship links (Facebook) or professional links (LinkedIn). For a graph in which links are directed from a source of information to a consumer of information, it is no longer adequate. We show that former studies have missed much of the information contained in the directed part of such graphs. In this article, we introduce a new metric to measure the clustering of directed social graphs with interest links, namely the interest clustering coefficient. We compute it (exactly and using sampling methods) on a very large social graph, a Twitter snapshot with 505 million users and 23 billion links, as well as on various other datasets. We additionally provide the values of the formerly introduced directed and undirected metrics, a first on such a large snapshot. We observe a higher value of the interest clustering coefficient than of the classic directed clustering coefficients, showing the importance of this metric. By studying the bidirectional edges of the Twitter graph, we also show that the interest clustering coefficient is better suited to capturing the interest part of the graph, while the classic ones are better suited to capturing the social part. We also introduce a new model able to build random networks with a high value of the interest clustering coefficient. We finally discuss the value of this new metric for link recommendation.

    Message Propagation and Social Influence in Twitter

    Twitter data has potentially unlimited value and numerous applications, and the platform is known for its growth in users over time. Twitter facilitates information diffusion at an exponential rate and also the creation of networks of users with a common interest. People reacting to the spread of an epidemic or a natural disaster are greatly influenced by the information diffusion in social media. Twitter, being a popular micro-blogging network, provides an effective way to measure diffusion in terms of speed and strength. Our research is based on previous work on models related to topic diffusion and user influence. A topic is defined by a set of keywords. This research concentrates on the implementation of algorithms for computing the diffusion of a topic in Twitter. The degree of influence of the users who tweet on the topic is also addressed. We present two different approaches to compute user influence based on topic potential. We compare two diffusion models proposed in the literature, namely potentials and connections. For testing and empirical analyses we use tweets related to “flu”, “food poisoning”, and “politics”.