91 research outputs found

    Pattern Analysis of Money Flow in the Bitcoin Blockchain

    Bitcoin is the first and highest-valued cryptocurrency, and it stores transactions in a publicly distributed ledger called the blockchain. Understanding the activity and behavior of Bitcoin actors is a crucial research topic, as they are pseudonymous in the transaction network. In this article, we propose a method based on taint analysis to extract taint flows: dynamic networks representing the sequence of bitcoins transferred from an initial source to other actors until dissolution. Then, we apply graph embedding methods to characterize taint flows. We evaluate our embedding method with taint flows from top mining pools and show that it can classify mining pools with high accuracy. We also found that taint flows from the same period show high similarity. Our work shows that tracing money flows can be a promising approach to classifying source actors and characterizing different money flow patterns.
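
The idea of following tainted coins from a source until the taint dissolves can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the account-style `(time, sender, receiver, amount)` records (real Bitcoin uses transaction inputs and outputs), the proportional "haircut" propagation rule, the `min_taint` dissolution threshold, and the function name `taint_flow`. It is not the authors' algorithm.

```python
from collections import defaultdict

def taint_flow(transactions, source, min_taint=0.01):
    """Return time-ordered edges (time, sender, receiver, tainted_amount)
    carrying taint that originates from `source`.

    Assumption: taint spreads proportionally to the tainted share of the
    sender's balance; amounts below `min_taint` count as dissolved.
    """
    balance = defaultdict(float)  # coins held by each actor
    tainted = defaultdict(float)  # tainted part of that balance
    flow = []
    for time, sender, receiver, amount in sorted(transactions):
        if sender == source:
            moved = amount  # coins leaving the source are fully tainted
        else:
            held = balance[sender]
            share = tainted[sender] / held if held > 0 else 0.0
            moved = min(amount * share, tainted[sender])
            balance[sender] -= min(amount, held)
            tainted[sender] -= moved
        if moved < min_taint:
            moved = 0.0  # taint is considered dissolved on this edge
        else:
            flow.append((time, sender, receiver, moved))
        balance[receiver] += amount
        tainted[receiver] += moved
    return flow

# Hypothetical toy data: "pool" pays "a", whose coins are then mixed and passed on.
example = [
    (1, "pool", "a", 10.0),
    (2, "x", "a", 10.0),
    (3, "a", "b", 10.0),
    (4, "b", "c", 5.0),
]
edges = taint_flow(example, "pool")
```

The returned edge list is a small dynamic network: each edge carries a timestamp and the tainted amount, which is the kind of structure a graph embedding method could then take as input.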

    Recent Decisions

    Community structure is one of the most prominent features of complex networks. Detecting it is of great importance because it provides insights into network structure and function. Most proposals focus on static networks. However, finding communities in a dynamic network is even more challenging, especially when communities overlap with each other. In this article, we present an online algorithm, called OLCPM, based on clique percolation and label propagation methods. OLCPM can detect overlapping communities and works on temporal networks with a fine granularity. By locally updating the community structure, OLCPM delivers a significant improvement in running time compared with previous clique percolation techniques. Experimental results on both synthetic and real-world networks illustrate the effectiveness of the method.
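
For readers unfamiliar with clique percolation, the following is a minimal sketch of the classic static variant for k = 3, not of OLCPM itself: it has no online updates and no label propagation step. Communities are unions of triangles that share an edge, so a node can belong to several communities. The function name `triangle_percolation` is an illustrative assumption, and the brute-force triangle enumeration is only suitable for small graphs.

```python
from itertools import combinations

def triangle_percolation(edges):
    """Static clique percolation with k = 3: merge triangles sharing 2 nodes."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Enumerate all triangles (brute force, fine for a small example).
    triangles = [
        frozenset(t)
        for t in combinations(sorted(adj), 3)
        if all(b in adj[a] for a, b in combinations(t, 2))
    ]
    # Union-find over triangles: adjacent triangles share k - 1 = 2 nodes.
    parent = list(range(len(triangles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(triangles)), 2):
        if len(triangles[i] & triangles[j]) == 2:
            parent[find(i)] = find(j)
    groups = {}
    for i, tri in enumerate(triangles):
        groups.setdefault(find(i), set()).update(tri)
    return sorted(sorted(c) for c in groups.values())

# Toy graph: two triangles sharing an edge, plus a third touching at node 4 only.
communities = triangle_percolation(
    [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (4, 5), (4, 6), (5, 6)]
)
```

In this toy graph node 4 ends up in both communities, which is the overlapping behavior the abstract refers to. An online method would avoid recomputing all triangles after each edge change and update only the affected region.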

    Detection of dynamic communities in temporal networks (Détection de communautés dynamiques dans des réseaux temporels)

    Community detection in networks is now a field with an abundant literature. Since the work of Girvan and Newman in 2002, hundreds of studies have addressed the subject, notably proposing a large number of increasingly elaborate algorithms. While the division was initially a partition (each node belonged to one and only one community, unique and static), later methods showed the value of interlocking or hierarchically decomposed communities. Even more recently, some work has begun to consider communities in temporal networks, that is, communities that evolve over time as the network changes. This thesis is devoted to that new problem. My state of the art, after presenting the best-known static methods, examines the few methods already proposed for dynamic community detection, many of which were published during the years of my thesis work, along with their strengths and weaknesses. In the second part, I propose a framework (iLCD) for detecting persistent communities in strongly evolving networks represented as interval graphs (each link exists for one or more given periods). This framework is designed to handle large graphs, possibly in real time. I then propose two implementations of this framework, the first being limited to networks in which links never disappear (such as citation networks). The last part of this chapter is devoted to the practical aspects of dynamic community detection, in particular how to handle the input data (temporal networks) and the output data (dynamic communities), which are more complex than in the static case.
Two tools for visualizing dynamic communities are proposed, the need for them having become apparent during my thesis. The problem for any community detection algorithm is to demonstrate the relevance of the results it finds. I therefore devoted the third part of the thesis to this problem. This led me to question the definition of what a good community is, and in particular I distinguished what I call intrinsic communities from communities defined relative to the network. To validate the relevance of the results, I then compared the communities produced by my method with those produced by the best-known static algorithms. Being particularly interested in applications to real graphs, and since comparisons with other algorithms are usually made on generated graphs, I then proposed two original approaches for comparing communities on real graphs. The first, based on experimentation, asks Facebook users to compare the communities found in their personal network by different solutions. The second uses two complementary metrics to compare the solutions provided by different algorithms on the same network. Finally, in the last part, I present two applications of this algorithm to real dynamic networks. These applications have a twofold aim: on the one hand, to show the practical value of the dynamic approach, and on the other, to validate the applicability of the proposed algorithm to real networks. The first network, a small one, concerns the evolution of groups within a population of social animals, studied over a period of more than fifteen years. This work was carried out together with ethologists who had already worked on these data in a static manner.
The second application concerns a much larger network: the complete network of a Japanese YouTube-like video-sharing platform called Nico Nico Douga. In both cases, a detailed analysis of the results is provided, which shows the value of my approach.

    Minimum entropy stochastic block models neglect edge distribution heterogeneity

    The statistical inference of stochastic block models has emerged as a mathematically principled method for identifying communities inside networks. Its objective is to find the node partition and the block-to-block adjacency matrix of maximum likelihood, i.e., the one most likely to have generated the observed network. In practice, in the so-called microcanonical ensemble, it is frequently assumed that when comparing two models that have the same number and sizes of communities, the best one is the one of minimum entropy, i.e., the one that can generate the fewest different networks. In this paper, we show that there are situations in which the minimum entropy model does not identify the most significant communities in terms of edge distribution, even though it generates the observed graph with a higher probability.
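
As a concrete reading of "the model that can generate the fewest different networks", the sketch below counts the simple undirected graphs compatible with a given partition and block-to-block edge counts, and returns the logarithm of that count. It assumes the standard microcanonical stochastic block model without degree correction, without self-loops or multi-edges; the function name `sbm_entropy` and the toy graph are illustrative, not taken from the paper.

```python
from collections import Counter
from math import comb, log

def sbm_entropy(edges, block_of):
    """Log of the number of simple graphs with the same block sizes and the
    same number of edges inside and between blocks as the observed graph."""
    sizes = Counter(block_of.values())
    counts = Counter()
    for u, v in edges:
        r, s = sorted((block_of[u], block_of[v]))
        counts[(r, s)] += 1
    blocks = sorted(sizes)
    entropy = 0.0
    for i, r in enumerate(blocks):
        for s in blocks[i:]:
            # Number of node pairs available for edges between blocks r and s.
            if r == s:
                pairs = sizes[r] * (sizes[r] - 1) // 2
            else:
                pairs = sizes[r] * sizes[s]
            entropy += log(comb(pairs, counts[(r, s)]))
    return entropy

# Toy graph: two triangles joined by a single edge.
toy_edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
by_triangle = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
mixed = {0: 0, 1: 0, 3: 0, 2: 1, 4: 1, 5: 1}
h_triangle = sbm_entropy(toy_edges, by_triangle)
h_mixed = sbm_entropy(toy_edges, mixed)
```

Both partitions have two blocks of three nodes, so they are comparable in the sense of the abstract. Here the minimum entropy criterion prefers the partition that follows the triangles; the paper's point is that this criterion does not always agree with the most significant edge distribution.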

    Graph space: using both geometric and probabilistic structure to evaluate statistical graph models

    Statistical graph models aim at modeling graphs as random realizations drawn from a set of possible graphs. One issue is to evaluate whether or not a graph is likely to have been generated by one particular model. In this paper, we introduce the edit distance expected value (EDEV) and compare it with other methods such as entropy and distance to the barycenter. We show that, contrary to them, EDEV is able to distinguish between graphs that have a typical structure with respect to a model and those that do not. Finally, we introduce a statistical hypothesis testing methodology based on this distance to evaluate the relevance of a candidate model with respect to an observed graph.
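
One plausible reading of an expected edit distance can be sketched as follows. The assumptions are loud ones: the edit distance is taken to be the number of edge insertions and deletions between labeled graphs, the model is assumed to draw each edge independently with probability `edge_prob(u, v)`, and the function name is hypothetical. The paper's exact definition of EDEV may differ.

```python
from itertools import combinations

def expected_edit_distance(nodes, edges, edge_prob):
    """Expected number of edge edits between the observed graph and a graph
    drawn from an independent-edge model (assumption, see lead-in)."""
    present = {frozenset(e) for e in edges}
    total = 0.0
    for u, v in combinations(nodes, 2):
        p = edge_prob(u, v)
        # An observed edge differs when the model omits it (prob. 1 - p);
        # an observed non-edge differs when the model creates it (prob. p).
        total += (1.0 - p) if frozenset((u, v)) in present else p
    return total

# Toy example: 4 nodes, 2 edges, uniform edge probability 0.25.
value = expected_edit_distance(range(4), [(0, 1), (2, 3)], lambda u, v: 0.25)
```

Under these assumptions the quantity is a plain sum over node pairs, so it is cheap to compute; models with dependent edges would need sampling instead.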

    Contextual Subgraph Discovery With Mobility Models

    Starting from a relational database that gathers information on people's mobility – such as origin/destination places, date and time, means of transport – as well as demographic data, we adopt a graph-based representation that results from the aggregation of individual travels. In such a graph, the vertices are places or points of interest (POI) and the edges stand for the trips. Travel information and user demographics are labels associated with the edges. We tackle the problem of discovering exceptional contextual subgraphs, i.e., subgraphs related to a context (a restriction on the attribute values) that are unexpected according to a model. Previous work considers a simple model based on the number of trips associated with an edge, without taking into account its length or the surrounding demography. In this article, we consider richer models based on statistical physics and demonstrate their ability to capture complex phenomena that were previously ignored.

    Fingerprinting Bitcoin entities using money flow representation learning

    Deanonymization is one of the major research challenges in the Bitcoin blockchain, as entities are pseudonymous and cannot be identified from the on-chain data. Various approaches exist to identify multiple addresses of the same entity, i.e., address clustering. However, these approaches are known to find several clusters for the same actor. In this work, we propose to assign a fingerprint to entities based on the dynamic graph of the taint flow of money originating from them, with the idea that multiple clusters of addresses belonging to the same entity could be identified by their similar fingerprints. We experiment with different configurations to generate substructure patterns from taint flows before embedding them using representation learning models. To evaluate our method, we train classification models to identify entities from their fingerprints. Experiments show that our approach can accurately classify entities on three datasets. We compare different fingerprint strategies and show that including the temporality of transactions improves classification accuracy and that following the flow for too long impairs performance. Our work demonstrates that out-flow fingerprinting is a valid approach for recognizing multiple clusters of the same entity.

    Pattern Analysis of Money Flows in the Bitcoin Blockchain

    Bitcoin is a cryptocurrency that stores transaction records in a public distributed ledger called the blockchain. All transactions that have occurred since the beginning of Bitcoin in 2009 can therefore be consulted by anyone. This unique dataset allows us to study financial transaction networks among pseudonymous participants. Several works have analyzed static transaction networks but did not consider the flow of money over time. In this work, we focus on the analysis of flows, a challenging task given the scale of the data (hundreds of millions of transactions). We propose a method based on taint analysis to track Bitcoin money flow from initial starting points to the dissolution of the taint. The algorithm derives the dynamic subgraphs passing through known entities in the transaction network. We study the pattern of money flowing from different starting points: we taint coins minted by different mining pools in one-day periods between 2013 and 2016, and use graph embeddings from three representations of the data: (1) static network, (2) dynamic network, and (3) money flow pattern tree. Both qualitative and quantitative analyses show that mining pools have different diffusion patterns and that those patterns evolve over time. Based on this initial result, we are developing a method to select critical entities and to extend our unsupervised approach to characterize other money flow patterns, in particular those related to illegal and cybercrime activities.