171 research outputs found

    Decentralized K-means using randomized Gossip protocols for clustering large datasets

    Get PDF
    International audienceIn this paper, we consider the clustering of very large datasets distributed over a network of computational units using a decentralized K-means algorithm. To obtain the same codebook at each node of the network, we use a randomized gossip aggregation protocol where only small messages are ex- changed. We theoretically show the equivalence of the algorithm with a centralized K-means, provided a bound on the number of messages each node has to send is met. We provide experiments showing that the consensus is reached for a number of messages consistent with the bound, but also for a smaller number of messages, albeit with a less smooth evolution of the objective function

    Hide & Share: Landmark-based Similarity for Private KNN Computation

    Get PDF
    International audienceComputing k-nearest-neighbor graphs constitutes a fundamental operation in a variety of data-mining applications. As a prominent example, user-based collaborative-filtering provides recommendations by identifying the items appreciated by the closest neighbors of a target user. As this kind of applications evolve, they will require KNN algorithms to operate on more and more sensitive data. This has prompted researchers to propose decentralized peer-to-peer KNN solutions that avoid concentrating all information in the hands of one central organization. Unfortunately , such decentralized solutions remain vulnerable to malicious peers that attempt to collect and exploit information on participating users. In this paper, we seek to overcome this limitation by proposing H&S (Hide & Share), a novel landmark-based similarity mechanism for decentralized KNN computation. Landmarks allow users (and the associated peers) to estimate how close they lay to one another without disclosing their individual profiles. We evaluate H&S in the context of a user-based collaborative-filtering recommender with publicly available traces from existing recommendation systems. We show that although landmark-based similarity does disturb similarity values (to ensure privacy), the quality of the recommendations is not as significantly hampered. We also show that the mere fact of disturbing similarity values turns out to be an asset because it prevents a malicious user from performing a profile reconstruction attack against other users, thus reinforcing users' privacy. Finally, we provide a formal privacy guarantee by computing an upper bound on the amount of information revealed by H&S about a user's profile

    GDCluster: a general decentralized clustering algorithm

    Get PDF
    In many popular applications like peer-to-peer systems, large amounts of data are distributed among multiple sources. Analysis of this data and identifying clusters is challenging due to processing, storage, and transmission costs. In this paper, we propose GDCluster, a general fully decentralized clustering method, which is capable of clustering dynamic and distributed data sets. Nodes continuously cooperate through decentralized gossip-based communication to maintain summarized views of the data set. We customize GDCluster for execution of the partition-based and density-based clustering methods on the summarized views, and also offer enhancements to the basic algorithm. Coping with dynamic data is made possible by gradually adapting the clustering model. Our experimental evaluations show that GDCluster can discover the clusters efficiently with scalable transmission cost, and also expose its supremacy in comparison to the popular method LSP2P

    Who started this rumor? Quantifying the natural differential privacy guarantees of gossip protocols

    Get PDF
    Gossip protocols are widely used to disseminate information in massive peer-to-peer networks. These protocols are often claimed to guarantee privacy because of the uncertainty they introduce on the node that started the dissemination. But is that claim really true? Can the source of a gossip safely hide in the crowd? This paper examines, for the first time, gossip protocols through a rigorous mathematical framework based on differential privacy to determine the extent to which the source of a gossip can be traceable. Considering the case of a complete graph in which a subset of the nodes are curious, we study a family of gossip protocols parameterized by a ``muting'' parameter ss: nodes stop emitting after each communication with a fixed probability 1−s1-s. We first prove that the standard push protocol, corresponding to the case s=1s=1, does not satisfy differential privacy for large graphs. In contrast, the protocol with s=0s=0 achieves optimal privacy guarantees but at the cost of a drastic increase in the spreading time compared to standard push, revealing an interesting tension between privacy and spreading time. Yet, surprisingly, we show that some choices of the muting parameter ss lead to protocols that achieve an optimal order of magnitude in both privacy and speed. We also confirm empirically that, with appropriate choices of ss, we indeed obtain protocols that are very robust against concrete source location attacks while spreading the information almost as fast as the standard (and non-private) push protocol

    Risk assessment in centralized and decentralized online social network.

    Get PDF
    One of the main concerns in centralized and decentralized OSNs is related to the fact that OSNs users establish new relationships with unknown people with the result of exposing a huge amount of personal data. This can attract the variety of attackers that try to propagate malwares and malicious items in the network to misuse the personal information of users. Therefore, there have been several research studies to detect specific kinds of attacks by focusing on the topology of the graph [159, 158, 32, 148, 157]. On the other hand, there are several solutions to detect specific kinds of attackers based on the behavior of users. But, most of these approaches either focus on just the topology of the graph [159, 158] or the detection of anomalous users by exploiting supervised learning techniques [157, 47, 86, 125]. However, we have to note that the main issue of supervised learning is that they are not able to detect new attacker's behaviors, since the classifier is trained based on the known behavioral patterns. Literature also offers approaches to detect anomalous users in OSNs that use unsupervised learning approaches [150, 153, 36, 146] or a combination of supervised and unsupervised techniques [153]. But, existing attack defenses are designed to cope with just one specific type of attack. Although several solutions to detect specific kinds of attacks have been recently proposed, there is no general solution to cope with the main privacy/security attacks in OSNs. In such a scenario, it would be very beneficial to have a solution that can cope with the main privacy/security attacks that can be perpetrated using the social network graph. Our main contribution is proposing a unique unsupervised approach that helps OSNs providers and users to have a global understanding of risky users and detect them. We believe that the core of such a solution is a mechanism able to assign a risk score to each OSNs account. Over the last three years, we have done significant research efforts in analyzing user's behavior to detect risky users included some kinds of well known attacks in centralized and decentralized online social networks. Our research started by proposing a risk assessment approach based on the idea that the more a user behavior diverges from normal behavior, the more it should be considered risky. In our proposed approach, we monitor and analyze the combination of interaction or activity patterns and friendship patterns of users and build the risk estimation model in order to detect and identify those risky users who follow the behavioral patterns of attackers. Since, users in OSNs follow different behavioral patterns, it is not possible to define a unique standard behavioral model that fits all OSNs users' behaviors. Towards this goal, we propose a two-phase risk assessment approach by grouping users in the first phase to find similar users that share the same behavioral patterns and, then in the second phase, for each identified group, building some normal behavior models and compute for each user the level of divergency from these normal behaviors. Then, we extend this approach for Decentralized Online Social Networks (i.e., DOSNs). In the following of this approach, we propose a solution in defining a risk measure to help users in OSNs to judge their direct contacts so as to avoid friendship with malicious users. Finally, we monitor dynamically the friendship patterns of users in a large social graph over time for any anomalous changes reflecting attacker's behaviors. In this thesis, we will describe all the solutions that we proposed

    LayStream: composing standard gossip protocols for live video streaming

    Get PDF
    Gossip-based live streaming is a popular topic, as attested by the vast literature on the subject. Despite the particular merits of each proposal, all need to implement and deal with common challenges such as membership management, topology construction and video packets dissemination. Well-principled gossip-based protocols have been proposed in the literature for each of these aspects. Our goal is to assess the feasibility of building a live streaming system, \sys, as a composition of these existing protocols, to deploy the resulting system on real testbeds, and report on lessons learned in the process. Unlike previous evaluations conducted by simulations and considering each protocol independently, we use real deployments. We evaluate protocols both independently and as a layered composition, and unearth specific problems and challenges associated with deployment and composition. We discuss and present solutions for these, such as a novel topology construction mechanism able to cope with the specificities of a large-scale and delay-sensitive environment, but also with requirements from the upper layer. Our implementation and data are openly available to support experimental reproducibility

    Who Started This Rumor? Quantifying the Natural Differential Privacy of Gossip Protocols

    Get PDF
    Gossip protocols (also called rumor spreading or epidemic protocols) are widely used to disseminate information in massive peer-to-peer networks. These protocols are often claimed to guarantee privacy because of the uncertainty they introduce on the node that started the dissemination. But is that claim really true? Can the source of a gossip safely hide in the crowd? This paper examines, for the first time, gossip protocols through a rigorous mathematical framework based on differential privacy to determine the extent to which the source of a gossip can be traceable. Considering the case of a complete graph in which a subset of the nodes are curious, we study a family of gossip protocols parameterized by a "muting" parameter s: nodes stop emitting after each communication with a fixed probability 1-s. We first prove that the standard push protocol, corresponding to the case s = 1, does not satisfy differential privacy for large graphs. In contrast, the protocol with s = 0 (nodes forward only once) achieves optimal privacy guarantees but at the cost of a drastic increase in the spreading time compared to standard push, revealing an interesting tension between privacy and spreading time. Yet, surprisingly, we show that some choices of the muting parameter s lead to protocols that achieve an optimal order of magnitude in both privacy and speed. Privacy guarantees are obtained by showing that only a small fraction of the possible observations by curious nodes have different probabilities when two different nodes start the gossip, since the source node rapidly stops emitting when s is small. The speed is established by analyzing the mean dynamics of the protocol, and leveraging concentration inequalities to bound the deviations from this mean behavior. We also confirm empirically that, with appropriate choices of s, we indeed obtain protocols that are very robust against concrete source location attacks (such as maximum a posteriori estimates) while spreading the information almost as fast as the standard (and non-private) push protocol

    Risk assessment in centralized and decentralized online social network.

    Get PDF
    One of the main concerns in centralized and decentralized OSNs is related to the fact that OSNs users establish new relationships with unknown people with the result of exposing a huge amount of personal data. This can attract the variety of attackers that try to propagate malwares and malicious items in the network to misuse the personal information of users. Therefore, there have been several research studies to detect specific kinds of attacks by focusing on the topology of the graph [159, 158, 32, 148, 157]. On the other hand, there are several solutions to detect specific kinds of attackers based on the behavior of users. But, most of these approaches either focus on just the topology of the graph [159, 158] or the detection of anomalous users by exploiting supervised learning techniques [157, 47, 86, 125]. However, we have to note that the main issue of supervised learning is that they are not able to detect new attacker's behaviors, since the classifier is trained based on the known behavioral patterns. Literature also offers approaches to detect anomalous users in OSNs that use unsupervised learning approaches [150, 153, 36, 146] or a combination of supervised and unsupervised techniques [153]. But, existing attack defenses are designed to cope with just one specific type of attack. Although several solutions to detect specific kinds of attacks have been recently proposed, there is no general solution to cope with the main privacy/security attacks in OSNs. In such a scenario, it would be very beneficial to have a solution that can cope with the main privacy/security attacks that can be perpetrated using the social network graph. Our main contribution is proposing a unique unsupervised approach that helps OSNs providers and users to have a global understanding of risky users and detect them. We believe that the core of such a solution is a mechanism able to assign a risk score to each OSNs account. Over the last three years, we have done significant research efforts in analyzing user's behavior to detect risky users included some kinds of well known attacks in centralized and decentralized online social networks. Our research started by proposing a risk assessment approach based on the idea that the more a user behavior diverges from normal behavior, the more it should be considered risky. In our proposed approach, we monitor and analyze the combination of interaction or activity patterns and friendship patterns of users and build the risk estimation model in order to detect and identify those risky users who follow the behavioral patterns of attackers. Since, users in OSNs follow different behavioral patterns, it is not possible to define a unique standard behavioral model that fits all OSNs users' behaviors. Towards this goal, we propose a two-phase risk assessment approach by grouping users in the first phase to find similar users that share the same behavioral patterns and, then in the second phase, for each identified group, building some normal behavior models and compute for each user the level of divergency from these normal behaviors. Then, we extend this approach for Decentralized Online Social Networks (i.e., DOSNs). In the following of this approach, we propose a solution in defining a risk measure to help users in OSNs to judge their direct contacts so as to avoid friendship with malicious users. Finally, we monitor dynamically the friendship patterns of users in a large social graph over time for any anomalous changes reflecting attacker's behaviors. In this thesis, we will describe all the solutions that we proposed
    • 

    corecore